Netcam Stability Patch
Introduction
This is a new clean page to start working on getting the
StreamingNetcamWithoutCurl stable.
Description of Patch
This is a major rewrite of the code in order to make the Network camera support more stable.
Although this code is somewhat experimental, please give it a try and report your success/failures. If you are reporting issues, please attach your config files and relevant lines from your syslog/screen capture. If possible, please use
MjpegSniffer to grab a small capture from your netcam as well.
Currently, a small sub-patch is being distributed independently of the main patch for both 3.1.19 and 3.2.1. This is for video.c and appears to solve a race condition during netcam startup. If you are experiencing issues when starting motion with a netcam, please try this patch.
The main patch is currently for 3.2.1 only. Backports to 3.1.19 are planned, but no release schedule has been set. Please check back occasionally to see if 3.1.19 patches are available.
For 3.1, a patch is assumed to apply to the latest stable release. For 3.2, it applies to the latest snapshot (specify it in the
PatchForVersion field).
FOR DEVELOPERS: The mjpegserver is a netcam simulator that mimics the behavior/quirks of various netcam models. This is currently being used to test the stream formats when using motion. Currently, mjpegserver simulates streams from the following netcams:
- D-Link DCS-900/DCS-900W
- Axis 205/211/2100/2110
To incorporate additional simulated types, please submit raw data captures from your netcam using
MjpegSniffer.
A special TWiki topic for attaching network camera streams has been created:
NetcamMjpegStreamDumps. Just go there and click the
Attach Image or Document link and add it. Give it a name and comment so we know which camera it is.
Installation of Patch
Each patch exists in a 3.1 and/or 3.2 version.
- Download the Motion sources the patch was made for and extract the tar.gz file as described for normal installation in the MotionGuide.
- Download the patch file and extract it in the Motion source file directory.
- Run
patch < name_of_patch_file
- Re-run the configure, make clean, make and make install.
Changelog of Patch
- 3.1.20-snap5-post1
- 3.2.1-snap12-post1
- rewrote url parser, better syntax checking and error handling of urls
- userpass now allowed in url (http://user:pass@example.com/)
netcam_userpass has precedence, it will override a userpass embedded in the url
- 3.1.20-snap4-post4 (integrated with 3.1.20-snap5)
- 3.2.1-snap11-post4 (integrated with 3.1.20-snap12)
- reworked thread signal/wait conditions, should fix some race conditions
- use gettimeofday() to determine thread timeouts, results in better accuracy
- adjusted condition timeouts to smaller values due to usage of gettimeofday() and rework of thread signal/wait conditions
- adjusted reconnection retries to 60 (every minute for an hour)
- 3.1.20-snap4-post3
- 3.2.1-snap11-post3
- really fix bug where motion will not quit if requested when reconnecting
- 3.1.20-snap4-post2
- 3.2.1-snap11-post2
- fix bug where motion will not quit if requested when reconnecting
- 3.1.20-snap4-post1
- 3.2.1-snap11-post1
- cruft, feature creep and redundant code removed
- consolidated reconnection capability into a unified netcam_reconnect function
- rework netcam_start logic, minimize startup variables
- rework netcam_stream_read and netcam_single_read logic
- minor changes to netcam_next logic
- fix bug in streaming camera without content-length, which a recent modification broke
- fix bug in startup of single image reads without content-length
- 3.1.20-snap3-post6 (integrated with 3.1.20-snap4)
- 3.2.1-snap10-post6 (integrated with 3.1.20-snap11)
- rearranged timeout assignments for pthread_cond_timedwait() calls
- adjusted TIMEOUT_COND_WHICH to 4 seconds
- 3.1.20-snap3-post5
- 3.2.1-snap10-post5
- added additional headers in http request
- added back header validation (should fix netcam_read_header lockups)
- detect when there is no data on socket in netcam_read_ functions (should fix netcam_read_image_contentlength() and netcam_read_image_no_contentlength() lockups)
- 3.1.20-snap3-post4
- 3.2.1-snap10-post4
- removed additional header validation check
- limited times headers will be checked
- removed mutex lock around netcam_start() in video.c, hopefully race conditions are fixed
- 3.1.20-snap3-post3
- 3.2.1-snap10-post3
- added additional header validation check
- changed a couple fd references to use RBUF_FD
- added error message if jpeglib error occurs
- 3.1.20-snap3-post2
- 3.2.1-snap10-post2
- reworked reconnection for netcam_start() - disabled by default, see source for INIT_RECONNECT_RETRIES
- break some long lines in code
- replaced sleep with nanosleep per suggestion by KennethLavrsen
- 3.1.20-snap3-post1
- 3.2.1-snap10-post1
- destroy mutexes in netcam_cleanup()
- add reconnection for netcam_start() - this may block other cameras from starting up!
- added additional defines for reconnect retries
- change reconnection timeouts to 60 seconds
- reworked close(sock) in netcam_connect, to ensure future changes won't forget to close the socket
- 3.1.19-post1 (integrated with 3.1.20-snap1)
- backported 3.2.1-snap7-post1 to 3.1.19
- 3.2.1-snap7-post1 (integrated with snap8)
- added support for non-streaming (image based) netcams without content-length header
- 3.2.1-snap6-post1 (integrated with snap7)
- added support for netcams without content-length header (streaming only)
- remove memmem from netcam_wget.[c|h] (no longer used)
- several miscellaneous code cosmetic changes
- TODO: remove tests for memmem from configure
- 3.1.19-video.c
- 3.2.1-snap5_post1_video.c (integrated with snap6)
- Fixed netcam startup race condition
- 3.2.1-snap5-post1 (integrated with snap6)
- refactored image handling back to single unified function
- refactored reconnection algorithm
- jpeg only based connections should now use less cpu time
- temporarily removed support for devices that do not support content-length (in progress)
- synced syslog/printf style to new motion standard
- added developer debug trace defines/code
- defines now used for many constants
NOTE: Added version numbers to patches, since I am starting to backport to 3.1.19.
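Several changelog entries above mention using gettimeofday() to compute the timeouts passed to pthread_cond_timedwait(). The sketch below illustrates that general technique only; it is not taken from the patch. The function names deadline_from_now and wait_for_flag are hypothetical, and the flag-based wait is an assumption made to keep the example small.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/time.h>
#include <time.h>

/* Fill *abstime with "now + seconds", taking "now" from gettimeofday().
 * (Hypothetical helper, not a function from the patch.) */
void deadline_from_now(struct timespec *abstime, long seconds)
{
    struct timeval now;

    gettimeofday(&now, NULL);
    abstime->tv_sec = now.tv_sec + seconds;
    abstime->tv_nsec = now.tv_usec * 1000L;   /* microseconds -> nanoseconds */
}

/*
 * Wait until *flag becomes non-zero or timeout_sec seconds have passed.
 * Returns 0 if the flag is set, otherwise the error from
 * pthread_cond_timedwait() (normally ETIMEDOUT).
 * The caller must not hold the mutex when calling this.
 */
int wait_for_flag(pthread_mutex_t *mutex, pthread_cond_t *cond,
                  const int *flag, long timeout_sec)
{
    struct timespec abstime;
    int rc = 0;

    /* The deadline is absolute, so it is computed once, before the loop. */
    deadline_from_now(&abstime, timeout_sec);

    pthread_mutex_lock(mutex);
    while (!*flag && rc == 0)
        rc = pthread_cond_timedwait(cond, mutex, &abstime);
    if (*flag)
        rc = 0;
    pthread_mutex_unlock(mutex);

    return rc;
}
```

Because the deadline is an absolute time, a spurious wakeup simply re-enters the wait with the same deadline instead of restarting the full timeout. A caller could pass 4 for timeout_sec to mirror the TIMEOUT_COND_WHICH value mentioned in the changelog.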
Configuration switch: Force to stream mode
Raised as a feature request, but it is actually more of a missing feature or maybe even a bug in the Netcam code.
The D-Link DCS-900 has a 'stream' interface that can be reached at /video.cgi. It gives a continuous stream of images like this:
--video boundary--
Content-length: 23131
Content-type: image/jpeg
[Data here]
--video boundary--
Content-length: 23103
Content-type: image/jpeg
[Data here]
[This goes on for ever]
Now I need a way to tell motion that this is really a stream, not a single image, even if the content-type says otherwise.
--
KennethLavrsen - 22 Mar 2005 for
TommiRouvali
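In the stream dump above, each part carries its own Content-length field, so a reader can tell how many bytes of image data follow each boundary. As a rough illustration only (this is not the Netcam code, and the function name part_content_length is hypothetical), a part's header lines could be scanned like this:

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/*
 * Scan a block of header lines (separated by "\n" or "\r\n") for a
 * Content-length field, compared case-insensitively.
 * Returns the length, or -1 if the field is missing or malformed.
 */
long part_content_length(const char *headers)
{
    static const char field[] = "content-length:";
    const size_t field_len = sizeof(field) - 1;
    const char *line = headers;

    while (line != NULL && *line != '\0') {
        if (strncasecmp(line, field, field_len) == 0) {
            char *end;
            long value = strtol(line + field_len, &end, 10);

            if (end == line + field_len || value < 0)
                return -1;          /* no digits, or a negative length */
            return value;
        }
        line = strchr(line, '\n');  /* advance to the next header line */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

For the first part shown above this would return 23131. A return value of -1 corresponds to the "without content-length" cameras mentioned in the changelog, which need a different way of finding the end of each image.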
The D-Link DCS-900 should work with the 3.2.1-snap6 code base, as this is similar to the camera I am using (DCS-900W). Is this issue with a 3.1.x release? The configuration file for the camera should contain something like:
netcam_url http://localhost/video.cgi
netcam_userpass username:password
If this isn't the problem, could we get a copy of the config files? I don't think I'll need a stream capture for this one, but it wouldn't hurt.
--
ChristopherPrice - 22 Mar 2005
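According to the changelog, credentials may also be embedded in the URL (http://user:pass@example.com/), with netcam_userpass taking precedence when both are given. The following sketch shows that precedence rule in isolation. It is an assumption-laden illustration rather than the patch's URL parser, and effective_userpass is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the credentials to use into out (at most out_len - 1 characters;
 * longer values are silently truncated in this sketch).
 * A non-empty config_userpass wins; otherwise a "user:pass" embedded in the
 * URL is used. Returns 1 if credentials were found, 0 if there are none.
 */
int effective_userpass(const char *url, const char *config_userpass,
                       char *out, size_t out_len)
{
    const char *start;
    const char *at;
    size_t len;

    if (out_len == 0)
        return 0;
    out[0] = '\0';

    /* The configuration option overrides anything embedded in the URL. */
    if (config_userpass != NULL && config_userpass[0] != '\0') {
        snprintf(out, out_len, "%s", config_userpass);
        return 1;
    }

    start = strstr(url, "://");
    if (start == NULL)
        return 0;
    start += 3;

    /* The userinfo part ends at '@' and must appear before the first '/'. */
    len = strcspn(start, "/");
    at = memchr(start, '@', len);
    if (at == NULL || at == start)
        return 0;

    len = (size_t)(at - start);
    if (len >= out_len)
        len = out_len - 1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

With the URL http://user:pass@example.com/ and no configured value, out receives "user:pass"; with a configured value of "admin:secret", out receives "admin:secret" regardless of the URL.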
--
BruceDurham - 24 Mar 2005
Oops. Hit the "Add comment" button first by mistake. Here's a bug report:
My 4 cameras (which all work with motion-3.1.16) are:
Thread[1] Panasonic BL-C10A
Thread[2] Panasonic KX-HCM280
Thread[3] Panasonic KX-HCM270
Thread[4] Toshiba IK-WB01A
Here is the output from my run of motion-3.2.1_snap6 with this patch installed on RH 2.4EL:
Output from motion:
[cpe1 config]$ /usr/motion/bin/motion
Processing thread 0 - config file motion.conf
Processing thread 1 - config file /usr/config/camera1.properties
Processing thread 2 - config file /usr/config/camera2.properties
Processing thread 3 - config file /usr/config/camera3.properties
Processing thread 4 - config file /usr/config/camera4.properties
Thread 1 PID: 4253
Thread 2 PID: 4253
Thread 3 PID: 4253
Thread 4 PID: 4253
motion-httpd running, accepting connections
waiting for data on port TCP 8080
Thread 1 exiting
Thread 2 exiting
[4] netcam: unsupported network camera (in progress)
[4] Capture error Success
Thread 4 finishing...
[cpe1 config]$
My /var/log/messages shows:
Mar 23 20:39:46 cpe1 motion: Processing thread 0 - config file motion.conf
Mar 23 20:39:46 cpe1 motion: Processing thread 1 - config file /usr/config/camera1.properties
Mar 23 20:39:46 cpe1 motion: Processing thread 2 - config file /usr/config/camera2.properties
Mar 23 20:39:46 cpe1 motion: Processing thread 3 - config file /usr/config/camera3.properties
Mar 23 20:39:46 cpe1 motion: Processing thread 4 - config file /usr/config/camera4.properties
Mar 23 20:39:51 cpe1 motion: Somebody stole the video device, lets hope we got his picture
Mar 23 20:39:53 cpe1 motion: Somebody stole the video device, lets hope we got his picture
Mar 23 20:39:53 cpe1 motion: [4] netcam: unsupported network camera (in progress)
Mar 23 20:39:53 cpe1 motion: [4] Capture error Success
Mar 23 20:39:58 cpe1 ntpd[4000]: time reset -0.474696 s
Mar 23 20:39:58 cpe1 ntpd[4000]: kernel time discipline status change 41
Mar 23 20:39:58 cpe1 ntpd[4000]: synchronisation lost
Mar 23 20:44:30 cpe1 ntpd[4000]: kernel time discipline status change 1
[cpe1 config]#
I can try to capture info from them with the mjpeg-sniffer tool if it is necessary.
--
BruceDurham - 24 Mar 2005
Ah, I see that thread 4 is a camera using the image only mode that doesn't support content length. I'm still working on that.
I will hopefully have that code in place this weekend.
Could you try disabling thread 4 (Toshiba IK-WB01A) and see if the other threads work or not?
And, yes, could you send me the MjpegSniffer output for each of the camera models? I'm finishing up an mjpegserver program that simulates the streams of multiple camera types, so I don't have to have direct access to the actual hardware.
--
ChristopherPrice - 24 Mar 2005
Christopher,
I've just tried out motion-3.1.20_snap3 and am getting the problem with the Toshiba Cam image not being recognized as JPEG (I logged a bug there, and subsequently saw the message to log it here) which causes motion to quit.
By the way, the URL I'm using for each of these cameras returns a single JPEG, not MJPEG. Will the sniffer still be necessary?
I think that in previous versions disabling the Toshiba cam cleared up the problem; All of them work with motion 3.1.16 though.
--
BruceDurham - 28 Mar 2005
I think it is a good idea to upload some sniffing results on
NetcamMjpegStreamDumps. Upload both the mjpeg and jpeg types. It helps us test the Netcam code without having the actual cameras.
--
KennethLavrsen - 28 Mar 2005
Kit.
Watch out for the use of sleep() in your netcam code.
sleep, usleep and nanosleep all get interrupted by signals. Just one of the other threads executing an external program which then dies and sends a SIGCHLD makes the sleep function end before you intended.
I just fixed a bug like this in motion.c, which is why I am paying attention to it.
This code is more reliable.
struct timespec delay_time, remaining_time;
delay_time.tv_sec = RECONNECT_TIMEOUT;
delay_time.tv_nsec = 0;
while (nanosleep(&delay_time, &remaining_time) == -1)
{
    delay_time.tv_sec = remaining_time.tv_sec;
    delay_time.tv_nsec = remaining_time.tv_nsec;
}
--
KennethLavrsen - 29 Mar 2005
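A self-contained variant of the loop above is sketched here. It additionally checks errno, so that the loop resumes only after a signal interruption (EINTR) and does not spin if nanosleep() fails for another reason. The function name sleep_full is hypothetical and this is not the code that went into the patch.

```c
#include <errno.h>
#include <time.h>

/*
 * Sleep for the full number of seconds, resuming after signal interruptions.
 * Returns 0 on success, or -1 if nanosleep() fails for a reason other
 * than EINTR.
 */
int sleep_full(time_t seconds)
{
    struct timespec delay = { .tv_sec = seconds, .tv_nsec = 0 };
    struct timespec remaining;

    while (nanosleep(&delay, &remaining) == -1) {
        if (errno != EINTR)
            return -1;      /* e.g. EINVAL: give up instead of looping */
        delay = remaining;  /* sleep only for the time still owed */
    }
    return 0;
}
```

A reconnect loop could then call sleep_full(RECONNECT_TIMEOUT) between attempts and still wait the intended time when another thread's child process exits and raises SIGCHLD.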
Thanks! I was looking into a few issues that looked like race conditions, but this may have been the problem. I am replacing the sleeps with the code you provided.
--
ChristopherPrice - 29 Mar 2005
Christopher,
I've run mjpegsnif on my four netcams and attached them to the
NetcamMjpegStreamDumps topic. Please note that these are single-jpeg image dumps.
--
BruceDurham - 30 Mar 2005
I applied the 3.1.20_snap3_post4 patch, but one by one, the 4 cameras stop working after a couple of hours. "Stop working" means no images are saved (and no mpeg4 file is created).
The motion processes keep running (and keep the cpu busy!).
When I hit ctrl-c, one netcam thread that was still running exits, but the other threads don't stop until I kill them (kill -9)
Mar 31 09:42:31 argus motion: [3] netcam: freeing data...
Mar 31 09:42:31 argus motion: [3] netcam: exiting loop...
I don't see any specific error in the logfiles.
--
BrechtSamyn - 31 Mar 2005
I compiled motion (netcam.c) again with
#define NETCAM_DEBUG
When a thread stops working, it "stops" this way in the logs:
[1] netcam: header [HTTP/1.0 401 Unauthorized]
[1] netcam: header [Date: Thu, 31 Mar 2005 13:25:51 GMT]
[1] netcam: header [Server: Boa/0.92o]
[1] netcam: header [Content-Type: text/html]
[1] netcam: header [WWW-Authenticate: Basic realm="axview"]
[1] netcam: header []
[1] netcam: exit netcam_read_header() = 5
[1] netcam: enter netcam_read_image(), which = 0
[1] netcam: enter netcam_read_image_contentlength(), which = 0
Before this header [HTTP/1.0 401 Unauthorized], there were 2841 header [HTTP/1.0 200 OK] messages, so I guess there's nothing wrong with the account.
--
BrechtSamyn - 31 Mar 2005
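The debug log above shows a 401 response being followed directly by a call to netcam_read_image_contentlength(). One possible safeguard, sketched here purely as an illustration and not as a description of how the patch behaves, is to look at the status line before trying to read image data. The function names http_status_code and http_status_ok are hypothetical.

```c
#include <stdio.h>

/*
 * Extract the numeric status code from an HTTP status line such as
 * "HTTP/1.0 401 Unauthorized". Returns the code, or -1 if the line does
 * not look like a status line.
 */
int http_status_code(const char *line)
{
    int major, minor, code;

    if (sscanf(line, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    if (code < 100 || code > 599)
        return -1;
    return code;
}

/* Only a 2xx response is expected to be followed by image data. */
int http_status_ok(const char *line)
{
    int code = http_status_code(line);

    return code >= 200 && code <= 299;
}
```

With a check like this, the 401 line in the log would be rejected before any attempt to read an image body, while the preceding 200 OK responses would pass.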
I believe I have recently tracked down this problem and am currently working on a fix. Under certain conditions, calling the netcam_read_image_xxx functions will result in an endless loop trying to read data from the socket. I am preparing a patch to be released shortly.
Thanks for the report!
--
ChristopherPrice - 31 Mar 2005
FYI - I plan to merge in whatever is your latest Netcam patch together with a patch that
JoergWeber is working on. I will probably make the snaps Saturday (I am out Friday).
--
KennethLavrsen - 31 Mar 2005
OK, here’s my bug.
First, what I'm running: Linux From Scratch v5.0 with a 2.4.22 kernel running dhcpd, bind and postfix. I've got two Axis 205 netcams on the network that get their IP addresses from DHCP. I can browse to the cameras OK using their assigned names from my XP PCs.
motion version 3.1.20snap3 with netcam patch post4.
Problem 1
I get the following errors out to the log every 1 - 6 minutes with the [1] varying between [1] and [2] apparently randomly.
motion: [1] netcam: connect() : Connection refused
motion: [1] netcam: error reading image
I've done some network testing and I don't appear to be having any problems in that area.
Problem 2
Sometimes I get a sequence of 4 pictures purporting to be from one camera (normally my study camera, number 1 in motion) where the first image is from camera 2 and the following 3 are from camera 1. All four pictures are named as per camera 1 settings. Email notification is turned on for this camera and a message is sent.
It does happen the other way as well but not as often. When the images say they are from camera 2 no email message is sent (and this is turned off for camera 2).
Problem 3
Sometimes an image is missing from the motion sequence output and substituted for an image earlier in the sequence. Imagine I was walking up a flight of stairs and motion was detected at each stair. You would expect to see output for stair 1, then 2, then 3, 4, 6, 7 etc. However what you get is the sequence stair 1, 2, 3, 5, 3, 6, 7 etc. I'm sure that output has also been in the wrong order (not a repeated frame) but I can't find the output to prove that.
Problems 2 and 3 are best illustrated by seeing the files. Please contact me directly and I'll email them to the developers (I just don't want them around for all the world to see - picky, I know).
If there is more I can do to help diagnose this problem, be it running some debug option or similar then please let me know. I'll do what I can to help.
--
TheSpike - 01 Apr 2005
TheSpike,
First of all, I don't think your base config is an issue, as until recently I used kernel 2.4.22, 2.4.25 and 2.4.27 and experienced no major issues. Currently, I am running kernel 2.6.10. I upgraded since 2.6.x supports Native Posix Threads.
Problem 1:
Are you using the mjpeg streaming or jpeg image method? This sounds like it's a problem with the jpeg image method.
Problem 2:
This is interesting, as I have seen something similar before, but not at all how you describe it. What I encountered was that, upon netcam startup, two motion netcams connected to a single hardware netcam stream. This almost sounds like a reentrancy problem; I'll have to investigate.
Problem 3:
I believe I know what is causing this. Currently, the netcam_next function times out after a certain period and goes ahead and processes the current image. Basically, I have a logic problem that has crept in while trying to avoid deadlock situations. I believe I can fix this with some careful manipulation of the code in the netcam_next and netcam reader functions.
I'll be looking into these issues. Thanks for the report.
--
ChristopherPrice - 02 Apr 2005
Thanks for the response.
Problem 1 : I was using this address, http://camera-1.xx.xxx/axis-cgi/image.cgi to get a single jpg image. I also had user authentication turned on.
I've been back through the Axis.com site and found an alternative address, http://camera-1.xx.xxx/jpg/image.jpg to serve a single image. I've also turned off the user authentication in case that was causing the problem.
I've been running motion for the last 10 minutes and in that time there were 4 instances of the connection refused message, 3 from camera 1 and 1 from camera 2.
Problem 2 : If it's of any help, when the first image is output (from the "wrong" camera) the pixel count is always in the 175000 range as if the entire picture was detected as changed.
Just had a thought. I could turn on precapture and see what sort of output that gives?
I'll let you know..
Put a <NOP> in front of a URL that you do not want twiki to interpret as a URL. Especially if it ends with .jpg or .gif because then Twiki tries to show the picture instead. It is a smart feature normally but in this case not what you wanted -- KennethLavrsen - 02 Apr 2005
--
TheSpike - 02 Apr 2005
TheSpike,
Have you tried using the mjpeg stream using (this should be accurate for Axis cameras):
http://camera-1.xx.xxx/axis-cgi/mjpg/video.cgi?showlength=1
--
ChristopherPrice - 03 Apr 2005
Afternoon..
I've changed the URL to the one you have suggested and the connection refused messages have stopped. 1 down 2 to go!
Problem 2 : I've turned on the pre_capture setting on both of the cameras and set it to 1. What happens now is that there is an image from camera-1, then camera-2, then three images from camera-1. (Probably should have mentioned before that I've got post_capture set to 1 as well - sorry).
I'm going to go back to running version 3.1.17 for the time being as I need reliable detection most of the time. I'll keep an eye on this page for patches and try them out when they appear. If there is anything I can do to help then please let me know.
Regards,
--
TheSpike - 03 Apr 2005
I've just tried v3.1.20_snap4: it works with my Panasonic KX-HCM280, KX-HCM270 and
BL-C10A, but not my Toshiba IK-WB01A (the Toshiba works with v3.1.16 but crashes all subsequent versions at startup; I've put a mjpegsnif-jpeg dump from it at the mjpegsnif upload area).
This version (v3.1.20_snap4) didn't log any error messages on the console or at /var/log/messages.
--
BruceDurham - 04 Apr 2005
I tried 3.1.20_snap4 this morning.
when I start motion, it says:
...
Thread4 device: http://cam4/axis-cgi/jpg/image.cgi?showlength=1 input: -1
Not a JPEG file: starts with 0x3c 0x48 (repeated 12 times)
Processing thread 0 - config file /usr/local/etc/motion.conf
...
[4] netcam: entering loop...
[2] netcam: jpeglib decompression failed (repeated 12 times)
...
But camera2 starts getting pictures.
What is maybe more important: I have the same problem as
TheSpike: I see pictures (well, I found one so far) from camera 2 in the filetree of camera 3. I don't know how I can give more info about that?
/2005/04/04/10/49/25-00.jpg
is the first picture of a series in the cam3 filetree, but taken from camera 2 (the wrong file has no "motion" on the picture),
/2005/04/04/10/49/25-01.jpg
/2005/04/04/10/49/25-02.jpg
/2005/04/04/10/49/25-03.jpg
are pictures of the right camera: cam3.
--
BrechtSamyn - 04 Apr 2005
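The bytes 0x3c 0x48 in the "Not a JPEG file" message above are the ASCII characters '<' and 'H', which suggests (but does not prove) that the camera returned an HTML page instead of an image. A JPEG file begins with the start-of-image marker 0xFF 0xD8, so a buffer can be checked before it is handed to jpeglib. This is a minimal sketch with a hypothetical function name, not code from the patch:

```c
#include <stddef.h>

/* Return 1 if the buffer starts with the JPEG SOI marker (0xFF 0xD8), else 0. */
int looks_like_jpeg(const unsigned char *buf, size_t len)
{
    return len >= 2 && buf[0] == 0xFF && buf[1] == 0xD8;
}
```

A reader could use such a check to log the unexpected response and retry, rather than passing non-image data on to the decompressor.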
Sorry to say but things are taking a turn for the worse for me.
I'm running motion-3.1.20snap4 with netcam patch motion-3.1.20snap-post3 with my two Axis 205 cameras.
Normally when motion is running I can see two motion processes running at about 40-50% CPU. This is the case when I start things up. Motion appears to be captured OK. Two hours later when I come back to my machine all 7 threads are still running but none are using any CPU. When I run motion-control to try and quit motion, while no error is issued motion does not exit and I have to kill -9 things.
I can see nothing in the logs to indicate a problem.
I had top running for that time, outputting every 10 seconds to a file. I think I'm right in saying there are two motion processes for each camera. Things start off with all four processes using CPU (two a lot more than the other two). Very quickly two of the processes stop wanting CPU.
The other two processes then carry on (again, one using a lot more CPU than the other) until they too stop (after 57 minutes 52 seconds of processing).
(All through this the amount of CPU being used by that process steadily declined. This could however have been due to the fact that it was getting dark and there was less going on in the image, I don't know).
Point is, eventually all the threads seemed to just hang.
A side point to this. With a quick comparison to the last stable version that I have (3.1.17) a lot more CPU is now being used. Is this a result of the new code or might some gremlins have got in? The jump in CPU concerns me a little as when both the streams were running in the new version my box was at 100% for most of the time, leaving few clicks for anything else to run (and also impacting motion I would guess).
I can't really say if problems 2 and 3 from my previous posts have been fixed as motion seems only to run for about an hour before hanging. (And it's dark at the moment!)
I have the 'top' output for every 10 seconds over this time period if that will be of any use.
I don't know what else I can tell you or do that might be of help?
Regards,
--
TheSpike - 06 Apr 2005
With the newer motion releases, there are two threads per netcam. One is the stream/image reader. The second is the process thread (converts jpeg to internal format).
Also, the reader thread tries to read as fast as the netcam can push data. I don't know if there is any way you can limit the frame rate of the Axis 205 camera, but I'd recommend limiting the frame rate if at all possible. For instance, my cameras output one frame per second and my cpu usage is much lower than that.
So, basically, the comparison of the 3.1.17 version to the 3.1.18+ versions is almost comparing apples and oranges. The older curl based netcam code was awfully slow (because of curl). The newer code just plain runs faster and crunches a lot more data in a smaller amount of time, therefore high cpu usage.
If any thread is running at the cpu usage you mentioned, there's generally a problem. I'm still getting reports of lockups and high cpu usage, so there must be a few critical paths in the code that cause this. I'll do a visual trace of the code this weekend to see if I can find anywhere this might be happening.
--
ChristopherPrice - 07 Apr 2005
OK.. There is a throttle on the Axis 205 camera. I've set it to 3 frames per second and this has given me acceptable CPU usage. Excellent.
Running the new software. Motion-3.1.19snap4 with netcam patch post 4.
Been running for a day. Kept running all day. No problems. Excellent.
Detected motion. Output the pictures all in the correct order. Excellent.
When there was no motion it didn't detect any (problem 2 from 1st April). Excellent.
I have to say that I think, as it is at the moment, it all seems to work! Motion is being detected correctly and output. I don't appear to be getting dead threads. The motion-control interface appears to be working OK. I'm not getting random snapshots taken.
It all appears to be good. I think the netcam code is at last stable (at least on my system for today). I'll keep running and let you know of any issues.
Thanks again for all the effort and I hope you don’t think I was just moaning all the time!
--
TheSpike - 10 Apr 2005
There is a support question related to the new Netcam code:
SupportQuestion2005x04x24x073352.
I assigned it to you Kit. I also asked him to provide a binary dump.
--
KennethLavrsen - 24 Apr 2005
New bug and with a proposed easy fix.
BugReport2005x05x06x174416
I will implement this in my sources in the next snap.
--
KennethLavrsen - 09 May 2005