Motion - Bug Report 2006x 04x 07x 164654

BUG: Random child threads appear to get killed

After an undetermined amount of time, roughly 1 hour up to 48 hours, one or more camera threads will suddenly stop functioning. The "webcam_port" will stop, netcam requests will cease to be made on the network connection, and motion output will cease for the thread.

After running motion using the following command line for over 24 hours I was able to capture some logged text referring to a thread going offline.

/usr/bin/motion -n -d 99 2>&1 | tee motion-20060406.log

Here is part of the output log that shows the last messages emitted by the thread before it went offline.

[18] Received first header
[18] Received first header
[18] Received first header
[18] Received first header
[18] Received first header
[18] Non-streaming camera
[18] Received first header
[18] Content-length present
[18] Calculated frame time 272312.531250
[18] ***new pic delay successful***
[18] avcodec_open - could not open codec: Connection reset by peer
[NULL @ 0x7a7610]insufficient thread locking around avcodec_open/close()
[18] ffopen_open error creating file [/var/www/html/cam/ac-cis3/2006/04/07/092638.avi]: Connection reset by peer
[18] netcam camera handler: finish set, exiting
[18] Thread exiting

Test case

I'm not sure if there is any way that I can exacerbate the problem, this simply occurs at random intervals and to random netcam threads. The above errors exist even though there are many other threads that are functioning normally and do not get affected. The only way to bring the netcam thread back online is to restart motion entirely.

I just saw the bug rel="nofollow" entered by MadjidLaouar. This may be the same bug. I have not been able to resolve it, and I've captured some data on the failure. I have too many netcams to be able to run multiple instances of motion and manage them that way.

Environment

Motion version: 3.2.6
ffmpeg version: 0.4.9
Shared libraries:  
Server OS: Red Hat Enterprise Linux AS release 3 (Taroon Update 6), kernel version 2.4.21-27.0.2.ELsmp

-- PeterS - 07 Apr 2006

Follow up

No crashes over this weekend. Perhaps this is entirely related to ffmpeg and how many movies have/are generated.

-- PeterS - 10 Apr 2006

Fix record

Check out the fix just made for BugReport2006x03x10x000151.

For your network cameras this is probably the main root cause why the camera threads freeze.

-- KennethLavrsen - 16 Apr 2006

Follow up

I now have the new (recommended) code in place and running. We will see. If I lose a thread, I'll have to restart it in debug mode (last time it created about 2.4GB of logs.) Please keep this open in the meantime. I'm not 100% sold on this being the fix as the "NULL" seems, to me, to be related to the movie creation and/or ffmpeg rather than with netcam access--however I do not discount that this may be the fix. I'll be watching it.

-- PeterS - 17 Apr 2006

Follow up

Bad news. A thread died on me about 30 minutes ago (09:55-ish USCDT).. I'm about to restart Motion with debug logging turned on (very painful--it tends to fill up syslog..) Looks like the newer version doesn't necessarily fix this issue, but I don't have the debug log to verify quite yet.

-- PeterS - 19 Apr 2006

Comment

Sounds a lot to me like http://www.lavrsen.dk/twiki/bin/view/Motion/BugReport2005x11x22x172816 , don't know if you could be the original poster of this problem? - I'm the 'Other Bug' there.

PeterS do you use timelapse, if yes with what interval - How many Cams do you run?

-- TheOtherBug - 19 Apr 2006

Follow up

Actually no, I'm not the original poster of that problem and also I was not aware of that bug report. It does sort-of sound like the same thing except in my case I'm running about 18 cameras off one instance now, only one of the cameras is doing timelapse, and also Motion is running as root, so permissions are not likely to be a problem. But thank you for noticing. (Btw I fixed the formatting of your comment. Only a minor change really.)

-- PeterS - 19 Apr 2006

I reviewed the bug report a bit more carefully and I see why you'd think they are similar. This Guest user is getting the same error message as I've been getting, namely: avcodec_open - could not open codec: Connection reset by peer

So in that respect it is similar. What to do? I don't know. I still think it has something to do with ffmpeg and/or avcodec..

-- PeterS - 19 Apr 2006

I am practically convinced it is an issue with ffmpeg rather than Motion itself. It seems someone has had some similar issues as I in the past [1]. Another disconcerting thing I found in the "documentation" for ffmpeg [2]:

int avcodec_open...
...
not thread save!
...

I take this to mean that potentially under specific circumstances, you might have trouble if you call avcodec_open twice at the same time.. Comments?

[1]  http://koala.ilog.fr/twikiirc/bin/irclogger_log/motion?date=2005-06-25,Sat&sel=283#l279
[2]  http://mplayerhq.hu/~michael/ffmpeg-doxy/libavcodec_2utils_8c.html#a44

-- PeterS - 19 Apr 2006

Digging through the ffmpeg code I see the following.

int avcodec_open(AVCodecContext *avctx, AVCodec *codec)
{
    int ret= -1;

    entangled_thread_counter++;
    if(entangled_thread_counter != 1){
        av_log(avctx, AV_LOG_ERROR, "insufficient thread locking around avcodec_open/close()\n");
        goto end;
    }
...

Ok, so let's suppose that Motion doesn't provide sufficient thread locking (I don't know yet as I haven't looked at where Motion is actually calling up avcodec_open..) And let's also suppose that AV_LOG_ERROR goes nowhere (it sure would be nice to have Motion capture that output...) So in the odd case that two avcodec_open's go off at the same moment from Motion would someone expect at least one of the avcodec_open's to return a "-1", crash its thread with a NULL pointer, and emit "avcodec_open - could not open codec: Connection reset by peer"??

-- PeterS - 19 Apr 2006

I do not believe we have any thread locking protecting the avcodec_open/close().

Naturally we cannot let Motion threads waiting for the entire avcodec_open and avcodec_close to have happened. That would mean that one thread recording a movie would block all the others. But maybe it is only the avcodec_open and avcodec_close functions that have to be protected? It could seem so. And then it would be possible to protect just those calls by a mutex.

I am not so much expert in mutexes but perhaps Angel or Bill feels like having a quick go. Or someone else are welcome to post a patch here. See MotionSubversion for the easy way to get the latest sources and produce a patch file.

-- KennethLavrsen - 20 Apr 2006

Follow up

Although I generally have to restart Motion about once a day now, today a thread died in an entirely new way which I've not seen before.

Apr 26 09:35:11 swlx12 motion: [2] avcodec_open - could not open codec: File exists
Apr 26 09:35:11 swlx12 motion: [2] ffopen_open error creating file [/var/www/html/cam/ac-dc1-630-2/2006/04/26/093510.avi]: File exists
Apr 26 09:35:11 swlx12 motion: [2] netcam camera handler: finish set, exiting

The really strange thing is that file does not exist at all. There are a few .avi's before that time and obviously none after. I hope this might provide a bit more information on the problem I'm experiencing.

-- PeterS - 26 Apr 2006

You guys are going to love this. I'm giving the following a try right now on my massive # of cameras system.

diff -ur motion-20060420-051001-A/ffmpeg.c motion-20060420-051001-B/ffmpeg.c
--- motion-20060420-051001-A/ffmpeg.c   2006-04-19 22:10:02.000000000 -0500
+++ motion-20060420-051001-B/ffmpeg.c   2006-04-26 16:52:08.000000000 -0500
@@ -317,11 +317,19 @@
        /* Set the picture format - need in ffmpeg starting round April-May 2005 */
        c->pix_fmt = PIX_FMT_YUV420P;
 
+       /* Get a mutex lock. */
+       pthread_mutex_lock(&global_lock);
+
        /* open the codec */
        if (avcodec_open(c, codec) < 0) {
+               /* Release the lock. */
+               pthread_mutex_unlock(&global_lock);
                motion_log(LOG_ERR, 1, "avcodec_open - could not open codec");
                ffmpeg_cleanups(ffmpeg);
                return (NULL);
+       } else {
+               /* Release the lock. */
+               pthread_mutex_unlock(&global_lock);
        }
 
        ffmpeg->video_outbuf = NULL;

I seriously have no idea if it will work, only time will tell. Hopefully I don't have to reboot this system at some point because of my change!

-- PeterS - 26 Apr 2006

Follow up

Hi Peter

I think you are right about that it has to do with something accessing ffmpeg at the same time.

After my comment of the 19/4 i added som threads to the system i'm testing on with the nessesary differences but getting the image stream from the same camera.

I did it to simulate having more cams and the next day one of the threads were gone dead with this in the log:

Apr 20 10:01:34 localhost motion: [5] avcodec_open - could not open codec: Operation now in progress
Apr 20 10:01:34 localhost motion: [5] ffopen_open error creating file [/home/klaus/motion/20060420100134-01-cam5.avi]: Operation now in progress
Apr 20 10:01:35 localhost motion: [5] netcam camera handler: finish set, exiting

At this time one of the other threads from that cam has recorded a cat walking by. This is points in the direction that it happens of the reason you think and it also make sense to the problems i have reportet in that other bugreport if timelapse sometimes gets to ffmpeg at the same time, the problem is that i can not provoke this error not matter how hard i try it just happen by itself sometimes.

We know that there are no problem in recording motion from more threads at the same time so it must be something that happen in some filecreate function or similar in ffmpeg if this is called at the absolutely same time from more threads.

I have no idea what to do if it's something in ffmpeg but a possible solution could be making motion create/touch the file before calling ffmpeg when a event starts or timelapse filerollover is made?

The big problem is still that the thread dies totaly!

Hope this writing makes sense to anybody. wink

-- TheOtherBug - 27 Apr 2006

Did anyone try Peter's patch? I never see the error so I cannot say if it works.

-- KennethLavrsen - 03 May 2006

Sorry no - I need to get it cut out in big letters what exactly to do to try it out .. but i would like to give it a try smile

-- TheOtherBug - 04 May 2006

Bad news--one of my threads died today with the same error message. I'm currently trying to trace down where this message is coming from exactly.

May  8 12:31:34 swlx12 motion: [19] avcodec_open - could not open codec: Connection reset by peer
May  8 12:31:34 swlx12 motion: [19] ffopen_open error creating file [/var/www/html/cam/ac-cis3/2006/05/08/123134.avi]: Connection reset by peer
May  8 12:31:34 swlx12 motion: [19] netcam camera handler: finish set, exiting

-- PeterS - 08 May 2006

I'm determined to get the appropriate information out of the ffmpeg libraries now. I'm toying with hooking the avcodec log into motion's log with the following code. I just don't think you can rely on "errno" for the thread with regards to avcodec. I've added/changed the following bits in the hopes of implementing an ffmpeg log handler routine. I've compiled it locally here and am currently running it in production (20060509 20:09 CST) so we'll see how it goes!!!

diff -u motion-20060509-051001C/ffmpeg.c motion-3.2.7/ffmpeg.c
--- motion-20060509-051001C/ffmpeg.c    2006-05-08 22:10:02.000000000 -0500
+++ motion-3.2.7/ffmpeg.c       2006-05-09 20:03:15.000000000 -0500
@@ -129,6 +129,7 @@
 void ffmpeg_init()
 {
        av_register_all();
+       av_log_set_callback( ffmpeg_avcodec_log );

        /* Copy the functions to use for the append file protocol from the standard
         * file protocol.

@@ -129,6 +129,7 @@
 void ffmpeg_init()
 {
        av_register_all();
+       av_log_set_callback( ffmpeg_avcodec_log );

        /* Copy the functions to use for the append file protocol from the standard
         * file protocol.
@@ -618,4 +627,11 @@
        return;
 }

+static callback *ffmpeg_avcodec_log(void *, int val, const char * msg, va_list ap)
+{
+       motion_log(LOG_ERR, val, msg, ap);
+}
+
+
+
 #endif /* HAVE_FFMPEG */

diff -u motion-20060509-051001C/ffmpeg.h motion-3.2.7/ffmpeg.h
--- motion-20060509-051001C/ffmpeg.h    2006-05-08 22:10:02.000000000 -0500
+++ motion-3.2.7/ffmpeg.h       2006-05-09 19:50:45.000000000 -0500
@@ -66,4 +66,7 @@
 /*Deinterlace the image. */
 void ffmpeg_deinterlace(unsigned char *, int, int);

+/*Setup an avcodec log handler. */
+static callback *ffmpeg_avcodec_log(void *, int val, const char * msg, va_list ap);
+
 #endif /* _INCLUDE_FFMPEG_H_ */

-- PeterS - 10 May 2006

I am following this topic with great interest. It will be interesting to see what you find out. I personally never see the same issue so I cannot do much to help reproducing it.

I fixed some formatting and restored the TopicTitle in the form below. I wonder why it disappears all the time.

I have noticed that you both fight with TWiki to get the formatting right and I see a lot of <BR> tags. They are rarely needed. To get a new paragraph just make an empty line.

<verbatim> tags are the right thing to use when showing code, patches, error messages. But they only work right when they have their own line. If you want to show a few words in monotype inside a paragraph you can use equal signs instead.

Good luck with the bug hunting

-- KennethLavrsen - 11 May 2006

Good news. I haven't had a single glitch all week. As of now, 20060519 17:05 (CDT) I haven't seen any problems since I started Motion at 20060509 19:56 (CDT).. That is pretty impressive. Will keep all posted on the progress. If it lasts another week I'd say the changes are golden. Although I haven't seen any avicodec log output yet.

-- PeterS - 19 May 2006

Still running solid as of today. No glitches. I'm thinking these changes are good.

-- PeterS - 22 May 2006


I have committed both patches to SVN. And I will ask people on the mailing list to give it a good testing.

Thanks a lot for the patches.

SVN 74 - daily snap from 20060523 CANCELLED!!

-- KennethLavrsen - 22 May 2006

Hmmm

Just tried to build it.

I get errors.

Version:  3.2.7
Platform: Linux (if this is incorrect, please read README.FreeBSD)

Compiling Motion object files...
--------------------------------------------------------------------------------
gcc -g -O2 -DHAVE_FFMPEG -I/usr/include/ffmpeg -DHAVE_MYSQL -I /usr/include/mysql -DHAVE_PGSQL -I/usr/include -DTYPE_32BIT="int" -DHAVE_BSWAP   -march=i686 -Wall -DVERSION=\"3.2.7\" -D_REENTRANT -Dsysconfdir=\"/usr/local/etc\"   -c -o motion.o motion.c
In file included from motion.c:9:
ffmpeg.h:70: error: syntax error before '*' token
ffmpeg.h:70: warning: type defaults to 'int' in declaration of 'ffmpeg_avcodec_log'
ffmpeg.h:70: warning: data definition has no type or storage class
make: *** [motion.o] Error 1

Even when I tried to fix that I get plenty of more errors.

The patch that redefines the logging is completely wrong. The function definition lacks a variable name for the first parameter. The type mismatches. The type callback is not found anywhere.

This you have to done all over.

I reverted the changes back to SVN 73 and this is then 75.

-- KennethLavrsen - 22 May 2006

You are correct, the logging patch needs work. The mutex patch is good, however, and will likely not change. I will work on the logging patch. I've looked at the code I've been running for the past two weeks and it is indeed only the mutex patch. Still I feel the logging patch is needed and the two patches do not rely on each other.

-- PeterS - 22 May 2006

No worries. It was not too hard to fix. I have BOTH patches included in SVN 76. Daily snap motion-20060523-015920

-- KennethLavrsen - 22 May 2006

Thank you, Kenneth. I was wondering what I was going to do with regards to the logging. Like you've said before, the ffmpeg documentation is very spotty. I'm getting interesting messages from the avcodec library with that build. Do you have any idea what this means?

May 22 19:10:40 swlx12 motion: [15] warning, clipping 210209244 dct coefficients to 1..65536 : Operation now in progress
May 22 19:10:40 swlx12 motion: [15] warning, clipping 210209244 dct coefficients to -655359..983038 : Operation now in progress

I'm 99% sure we can ignore the " : Operation now in progress" and it would probably be a good idea to turn off the errno() code with regards to avcodec.. But "clipping X coefficients" ? Is this a problem? I'm getting quite a lot of them. I'm also now seeing this:

May 22 19:13:49 swlx12 motion: [1] warning, clipping 26749404 dct coefficients to -327665..-131083 : Cannot allocate memory
May 22 19:13:50 swlx12 motion: [1] warning, clipping 26749404 dct coefficients to 65538..0 : Cannot allocate memory

Perhaps some of this provides an insight into part of the thread dieing problem? Curious. Within the next day I'll massage the ffmpeg logging code to prefix these messages with where they came from, they just sort of hang out there currently.

-- PeterS - 23 May 2006

Fixed up the message outbound from the avcodec stuff (FfmpegAvicodecLoggingFix2.diff). Please review it. I wasn't sure what anyone would like to see as the prefix text, so I just used something. The avcodec messages now look like the following.

May 23 09:52:46 swlx12 motion: [15] ffmpeg_avcodec_log: warning, clipping 193087964 dct coefficients to -327844..-65524
May 23 09:52:46 swlx12 motion: [15] ffmpeg_avcodec_log: warning, clipping 193087964 dct coefficients to -4522114..1441829

-- PeterS - 23 May 2006

I really don't know why, exactly, but motion does not seem to like to stay up after it makes the above messages. I'm looking into that. Hmm. Actually it doesn't stay up during the first few seconds of running. ? Strange. It is quietly exitting and I do not know why.

-- PeterS - 23 May 2006

Ok, I think the way I'm handling memory, etc in that last patch is just not acceptible. If anyone can come up with a better way of manipulating the error message coming in from the avcodec to add/prefix our message before outputting it, I'd like to know, because my version just doesn't work. I think. Well, actually it does output something, but like I said, it seems to break motion overall. The limitted version of it that doesn't prefix the data seems to work fine, it just is not as verbose.

-- PeterS - 23 May 2006

Kenneth, in the interest of simplicity, I've created a patch thread topic for the logging portion of this. I feel that the mutex locking issue has been fixed, while the logging of the avicodec output is seperate. If you'd like to edit this page and clean out some of the text I've added regarding just the logging, that is great. Logging Patch Thread

-- PeterS - 23 May 2006

Follow up

Still looking good. Haven't had any thread failures in quite a while.

-- PeterS - 30 May 2006

Follow up

I have tried to implement this patch, but it seems like instead of random threads getting killed, it all gets killed.. No error messages, so I dont know where to find the problem..

-- GunnarSkjold - 15 Aug 2006

Did you try a daily snapshot? A lot has happened since 3.2.6 in the netcam code.

-- KennethLavrsen - 15 Aug 2006

Sorry about that. Tried a daily snapshot and now it works perfect. Thanks.

-- GunnarSkjold - 19 Aug 2006

Fix up

Fixed from svn 111 version , please reopen with detailled information if problem remains.

-- AngelCarpintero - 03 Oct 2006

In 3.2.7

-- KennethLavrsen - 20 Oct 2006

BugReportForm edit

TopicTitle Random child threads appear to get killed
BugStatus Released
AssignedBugTo PeterS
SubmittedBy PeterS
Topic attachments
I Attachment Action Size Date Who Comment
FfmpegAvicodecLoggingFix1.diffdiff FfmpegAvicodecLoggingFix1.diff manage 1 K 22 May 2006 - 18:42 UnknownUser Ffmpeg avicodec logging fix or add
FfmpegAvicodecLoggingFix2.diffdiff FfmpegAvicodecLoggingFix2.diff manage 692 bytes 23 May 2006 - 14:56 UnknownUser Part two
FfmpegMutexLocking1.diffdiff FfmpegMutexLocking1.diff manage 647 bytes 22 May 2006 - 18:48 UnknownUser Ffmpeg code mutex locking fix
Topic revision: r32 - 20 Oct 2006, KennethLavrsen
Copyright © 1999-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Please do not email Kenneth for support questions (read why). Use the Support Requests page or join the Mailing List.
This website only use harmless session cookies. See Cookie Policy for details. By using this website you accept the use of these cookies.