BUG: Motion hangs if webcam is removed
If I remove the webcam while Motion is running (which accidentally happens every now and then), Motion becomes unkillable. Ctrl-c and kill -1, -2, -15 don't help, and I have to use kill -9.
Last and only error message:
[1] v4l2_next: VIDIOC_DQBUF: EIO (s->pframe 0)
Environment
| Motion version: | 3.2.9 |
| ffmpeg version: | |
| Shared libraries: | ffmpeg |
| Server OS: | Ubuntu (Hardy), kernel 2.6.24 |
--
PekkaVuorela - 17 May 2008
Follow up
Try to reproduce it in 3.2.10. Even if it can still happen, I've tested with webcams using v4l2, and Motion's watchdog recovers from that situation.
--
AngelCarpintero - 18 May 2008
I tried SVN snapshot (20080517) which behaved almost similarly:
[1] v4l2_next: VIDIOC_DQBUF: EIO (s->pframe 1)
[0] Thread 1 - Watchdog timeout, trying to do a graceful restart
[0] Thread 1 - Watchdog timeout, did NOT restart graceful,killing it!
[0] Calling vid_close() from motion_cleanup
[0] Closing video device /dev/video0
Nevertheless, Motion is unkillable.
--
PekkaVuorela - 18 May 2008
Ok, I tried 3.2.10 with similar results again, and dug a little deeper. What was happening seems to be the following:
Thread 1 tries to fetch an image in v4l2_next(), where ioctl() fails with EIO when the webcam is removed. The thread loops back in the huge motion_loop() and tries to get an image again, this time blocking forever. This is why ctrl-c doesn't do anything: it only asks thread 1 to stop, but the thread is blocked in ioctl(). Now thread 0 comes to the rescue: it notices that thread 1 isn't functioning properly, tries to stop it and cleans up the mess with motion_cleanup(), but, alas, blocks itself in vid_close()->...close(fd).
So, I added poll() to v4l2_next() to check whether the video device has input, returning a non-fatal error if none appears within 5 seconds. It provided better interactivity, but unfortunately VIDIOC_QBUF didn't like being called twice without a VIDIOC_DQBUF in between, so the patch isn't really ideal. I'll attach it anyway as a quick fix.
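For readers without the attachment, the poll() check described above could look roughly like the sketch below. This is not the attached patch: the helper name wait_for_frame() and its return convention are assumptions made for illustration, and the 5-second limit would be passed as timeout_ms = 5000.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper, not taken from the attached patch.
 * Waits up to timeout_ms for fd to become readable.
 * Returns 1 if a frame can be dequeued, 0 if nothing arrived in time
 * (non-fatal), -1 if poll() failed or the device reported an error. */
static int wait_for_frame(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret == -1)
        return (errno == EINTR) ? 0 : -1;  /* a signal is not a device error */
    if (ret == 0)
        return 0;                          /* timeout: no image ready */
    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
        return -1;                         /* device error or gone */
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

The caller would only issue VIDIOC_DQBUF when this returns 1, so the thread never sits in a blocking ioctl() and can still react to ctrl-c.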
patch:
motion-3.2.10-webcam-disappear-hang-fix.patch
This is what happens with it (ctrl-c pressed after opening errors):
[1] v4l2_next: VIDIOC_DQBUF: EIO (s->pframe 0)
[1] v4l2_next: Video device doesn't have an image ready
[1] v4l2_next: VIDIOC_QBUF: Invalid argument
[1] Video device fatal error - Closing video device
[1] Closing video device /dev/video0
[1] Retrying until successful connection with camera
[1] Failed to open video device /dev/video0: No such file or directory
[1] Retrying until successful connection with camera
[1] Failed to open video device /dev/video0: No such file or directory
[1] Thread exiting
[0] Motion terminating
--
PekkaVuorela - 21 May 2008
Take two: added int image_queued to struct src_v4l2_t so v4l2_next() knows if VIDIOC_QBUF has been already made. Now Motion doesn't close the device if input is not available.
motion-3.2.10-webcam-disappear-hang-fix2.patch
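The idea behind the image_queued flag can be sketched as follows. Only the flag name comes from the post above; the struct layout, the callback-based stand-ins for the ioctl() calls and the fake device are assumptions made so that the example runs without a camera. It is not the code from the patch.

```c
/* Sketch only: qbuf/dqbuf stand in for the VIDIOC_QBUF and
 * poll() + VIDIOC_DQBUF steps in v4l2_next(). */
struct capture {
    int image_queued;           /* flag named in the post */
    int (*qbuf)(void *dev);     /* 0 on success, -1 on error */
    int (*dqbuf)(void *dev);    /* 0 = frame, 1 = none yet, -1 = error */
    void *dev;
};

/* Returns 0 with a frame, 1 if no frame is ready (non-fatal), -1 on error. */
static int capture_next(struct capture *c)
{
    int r;

    if (!c->image_queued) {
        if (c->qbuf(c->dev) != 0)
            return -1;
        c->image_queued = 1;
    }
    r = c->dqbuf(c->dev);
    if (r == 1)
        return 1;               /* buffer is still queued: do not queue again */
    if (r != 0)
        return -1;
    c->image_queued = 0;        /* buffer dequeued: next call may queue */
    return 0;
}

/* Fake device that, like the driver, rejects a second queue request. */
struct fake_dev { int queued; int frames_ready; int qbuf_calls; };

static int fake_qbuf(void *p)
{
    struct fake_dev *d = p;
    d->qbuf_calls++;
    if (d->queued)
        return -1;              /* corresponds to "Invalid argument" */
    d->queued = 1;
    return 0;
}

static int fake_dqbuf(void *p)
{
    struct fake_dev *d = p;
    if (!d->queued)
        return -1;
    if (d->frames_ready == 0)
        return 1;
    d->frames_ready--;
    d->queued = 0;
    return 0;
}
```

With the flag in place, repeated calls while no image is available issue only one queue request, which avoids the "VIDIOC_QBUF: Invalid argument" failure seen in the first patch's log.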
Btw. v4l2_next() does some signal blocking stuff. To me it seems like some old cruft. The function has many exit points, including mine, where the masking isn't undone.
--
PekkaVuorela - 21 May 2008
Pekka, your approach looks good, but let's also add the obvious fix for what you spotted: the signal mask is not undone :-/
Here is what I got with this patch when unplugging my webcam and plugging it back in, so the pwc module evidently does not have this issue:
[1] Using V4L2
[1] Resizing pre_capture buffer to 1 items
[1] Started stream webcam server in port 8081
[1] File of type 8 saved to: /tmp/cam1/01-20080522043744.avi
[1] v4l2_next: VIDIOC_QBUF: No such device
[1] Video device fatal error - Closing video device
[1] Closing video device /dev/video0
[1] Retrying until successful connection with camera
[1] cap.driver: "pwc"
[1] cap.card: "Logitech QuickCam Orbit"
[1] cap.bus_info: "usb-0000:00:02.0-4"
[1] cap.capabilities=0x05000001
[1] - VIDEO_CAPTURE
[1] - READWRITE
Btw, could you tell me which webcam and driver you are using?
Thanks
--
AngelCarpintero - 22 May 2008
All right, this also does the unmasking within the function:
motion-3.2.10-webcam-disappear-hang-fix3.patch
I'm using Logitech Quickcam Pro for Notebooks, I think. UVC driver anyway. It seems that this driver returns 0 from ioctl(VIDIOC_QBUF) and pwc returns -1 when the webcam is removed. I wonder if this could be considered as a bug in the UVC driver.
--
PekkaVuorela - 22 May 2008
From uvc_dequeue_buffer() / uvc_queue_buffer()
uvcvideo svn r209
/*
* Dequeue a video buffer. If nonblocking is false, block until a buffer is
* available.
*/
So I can prepare a patch that opens the device using:
open(dev->video_device, O_RDWR | O_NONBLOCK, 0);
and checks for EAGAIN in VIDIOC_DQBUF.
from uvc_queue_buffer()
/*
* Queue a video buffer. Attempting to queue a buffer that has already been
* queued will return -EINVAL.
*/
Anyway, let me try to reproduce this to find out what the exact problem is, because I guess this issue depends on the state of the buffer.
We could also try to work around it by debugging uvcvideo. But first let's see if we can isolate the issue with a simple approach: please test my last patch.
--
AngelCarpintero - 22 May 2008
The UVC driver works as specified in the V4L2 API regarding blocking. However, to me opening the device with O_NONBLOCK seems a bit dangerous. The image has to be almost instantly available, otherwise it might be skipped? At least some drivers could take some time between QBUF and DQBUF to get the image?
Anyway, with your patch:
[1] Using V4L2
[1] Resizing pre_capture buffer to 1 items
[1] v4l2_next: VIDIOC_DQBUF: EAGAIN (s->pframe -1)
[1] v4l2_next: VIDIOC_DQBUF: EAGAIN (s->pframe -1)
[1] File of type 8 saved to: /home/pvuorela/motion/2008-05-22/avi/22-47-10.avi
[1] File of type 1 saved to: /home/pvuorela/motion/2008-05-22/jpg/22-47-10-00.jpg
...
[1] v4l2_next: VIDIOC_DQBUF: EAGAIN (s->pframe 3)
[1] v4l2_next: VIDIOC_QBUF: Invalid argument
[1] Video device fatal error - Closing video device
[1] Closing video device /dev/video0
[1] Retrying until successful connection with camera
[1] Failed to open video device /dev/video0: No such file or directory
It seems to take a little time in the beginning before my webcam is able to deliver images. After that it works better and is able to survive the webcam disappearing. However, I think QBUF fails because it's called twice in a row (QBUF -> DQBUF+EAGAIN -> QBUF). I applied the patch to vanilla 3.2.10, so it didn't have the check I added for these cases.
I think the whole thing comes down to the UVC driver, which doesn't return an error from QBUF when the webcam has been removed. But anyway, either poll()/select() or O_NONBLOCK should probably be used to avoid blocking. I'll probably do some more testing with this when I have more time.
--
PekkaVuorela - 22 May 2008
Fix record