Motion - Bug Report 2004x 10x 24x 140610

BUG: Zombie processes from onsave and onmpeg scripts

I'm still getting a lot of zombie processes from onsave and onmpeg when I run motion 3.1.17 in daemon (-D) mode. Here are a few examples from my 'ps -ef':
bruce    13341 11891  0 18:44 ?        00:00:00 [onsave.sh <defunct>]
bruce    13427 11891  0 18:45 ?        00:00:00 [onsave.sh <defunct>]
bruce    13493 11891  0 18:46 ?        00:00:00 [onsave.sh <defunct>]
bruce    13722 11891  0 18:48 ?        00:00:00 [onsave.sh <defunct>]
bruce    13814 11891  0 18:49 ?        00:00:00 [onmpeg.sh <defunct>]
bruce    13816 11891  0 18:49 ?        00:00:00 [onsave.sh <defunct>]
.
.
.
bruce    26191 11891  0 21:00 ?        00:00:00 [onsave.sh <defunct>]
bruce    26630 11891  0 21:05 ?        00:00:00 [onsave.sh <defunct>]
bruce    26726 11891  0 21:06 ?        00:00:00 [onmpeg.sh <defunct>]
bruce    26728 11891  0 21:06 ?        00:00:00 [onsave.sh <defunct>]
My scripts only do string manipulation and save the result to a file; I'm running: Linux 2.4.21-15.0.4.EL #1 Sat Jul 31 01:33:50 EDT 2004 i686 i686 i386 GNU/Linux

-brucedur@pacbellPLEASENOSPAM.net

-- BruceDurham - 17 Oct 2004
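
For readers unfamiliar with the `<defunct>` entries in the listing above: a child shows up that way once it has exited but its parent (PID 11891 in the listing) has not yet collected its exit status with wait() or waitpid(). The following is only a minimal sketch of that state, not code from Motion. The helper names make_zombie() and child_is_zombie() are hypothetical and exist only for this illustration; the sketch assumes a POSIX system that provides waitid() with WNOWAIT.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if child `pid` has exited but has not
 * been reaped yet (a zombie), 0 if it is still running, and -1 if
 * waitid() fails (for example ECHILD: no such child any more). */
static int child_is_zombie(pid_t pid)
{
    siginfo_t info;

    assert(pid > 0);
    memset(&info, 0, sizeof info);  /* si_pid stays 0 if nothing is waitable */
    /* WNOWAIT peeks at the child's state without reaping it. */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOHANG | WNOWAIT) != 0)
        return -1;
    return info.si_pid == pid ? 1 : 0;
}

/* Hypothetical helper: fork a child that exits at once, then block until
 * it has exited WITHOUT reaping it. Returns the child's PID, or -1. */
static pid_t make_zombie(void)
{
    pid_t pid = fork();

    if (pid == 0)
        _exit(0);                   /* child: exit immediately */
    if (pid > 0) {
        siginfo_t info;
        /* WNOWAIT leaves the exited child in the process table. */
        if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
            return -1;
    }
    return pid;
}
```

Calling make_zombie() and then child_is_zombie() on the returned PID should report the zombie state; a plain waitpid(pid, NULL, 0) afterwards reaps the child, and it is this reaping step that a parent must perform for the `<defunct>` entry to disappear.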

I really cannot reproduce this condition. Motion hangs when I have USB errors but I never have problems with my onsave perl scripts -- KennethLavrsen - 22 Oct 2004

Test case

Environment

Motion version: 3.1.17
Shared libraries: curl, xmlrpc, ffmpeg, mysql, postgresql
Server OS: Linux 2.4.21-15.0.4.EL

-- KennethLavrsen - 24 Oct 2004 on behalf of BruceDurham

Follow up

Note that you can attach your shell scripts and motion config files to the bug topic.

-- KennethLavrsen - 25 Oct 2004


I'm using bash scripts.

-- BruceDurham - 27 Oct 2004


I need an example of one of the bash scripts that hang. I use perl scripts and they never hang like this.

-- KennethLavrsen - 27 Oct 2004

Here is an example of the 'onsave' script:

#!/bin/bash

jpeg_file=$1                             # arg1 is motion supplied 'jpeg' path.
camera_root=${jpeg_file%/*/*/*/*/*/*}    # Clip trailing YY/MM/DD/HH/MM/file.ext
tmp_trans_file="$camera_root/tmp.trans"  # Define temporary transaction file.

echo "<jpeg>$jpeg_file</jpeg>" >> $tmp_trans_file # Add jpeg file info.
exit

-- BruceDurham - 22 Nov 2004

Ugh...line-feed snafu. Sorry about that; let me know if I need to repost.

-- BruceDurham - 22 Nov 2004

Thanks. I will try when I have time for the experiments.

Note that you can just press the "edit" link and correct anything. The text box and "Add Comment" button are an easy way to add to the page, but this web site is a Wiki and you can edit anything anyone has entered by pressing "edit".

When you want to paste code it is smart to enclose it in <VERBATIM> </VERBATIM>. Then it is displayed with its line breaks and spacing intact.

-- KennethLavrsen - 25 Nov 2004


I have tried this script which is similar to yours.
#!/bin/bash
jpeg_file=$1
camera_root=/usr/local/apache2/htdocs/cam5
tmp_trans_file="$camera_root/tmp.trans"
echo "<jpeg>$jpeg_file</jpeg>" >> $tmp_trans_file # Add jpeg file info.
exit

And I do not get any hanging processes. So I cannot see that it is Motion that creates the problem.

-- KennethLavrsen - 14 Dec 2004

I have some more info: I looked at /var/log/messages and found a lot of these:

Jan  9 04:03:49 cpe1 kernel: application bug: motion(4290) has SIGCHLD set to SIG_IGN but calls wait().
Jan  9 04:03:49 cpe1 kernel: (see the NOTES section of 'man 2 wait'). Workaround activated.

My guess is that this message is listed for every zombie created.

The 'man 2 wait' command yields:

...
       The original POSIX standard left the behaviour of  setting  SIGCHLD  to
       SIG_IGN  unspecified.   Later  standards,  including  SUSv2  and  POSIX
       1003.1-2001 specify the behaviour just described as  an  XSI-compliance
       option.   Linux  does  not conform to the second of the two points just
       described: if a wait() or waitpid() call is made while SIGCHLD is being
       ignored,  the  call  behaves  just  as  though  SIGCHLD  were not being
       ignored, that is, the call blocks until the next child  terminates  and
       then returns the PID and status of that child.
...

Additional info: I use onsave, onmpeg and onffmpegclose concurrently. Might there be race/collision problems that only occur if some combination of these script-launching (exec) features is used on RedHat?

-- BruceDurham - 11 Jan 2005


I am not sure this is a fix but we changed the signal handler to resolve the RedHat bug you mention.

Give the new MotionRelease3x1x18snap10 a try and see if this changed something for your problem.

-- KennethLavrsen - 15 Jan 2005

Bruce has reported that initial tests show that the problem has been resolved with 3.1.18_snap10.

-- KennethLavrsen - 16 Jan 2005

Hi, I don't know if this issue has been finally solved, but I get exactly the same problem on RedHat 9 (Motion version: 3.2.3 in daemon (-D) mode, Server OS: Linux 2.4.31-grsec), with a [sh <defunct>] entry each time 'on_picture_save echo -en "\007">/dev/console' is called (to make a simple beep).

If there's a solution, please let me know. Thanks... FRed

-- FredericLOIRETTE - 06 Oct 2005

Fix record

In motion.c, the sig_handler() function got this case entry added:


    case SIGCHLD: {
#ifdef WNOHANG
        while (waitpid(-1, NULL, WNOHANG) > 0) {};
#endif /* WNOHANG */
        signal(SIGCHLD, sig_handler);
        return;
    }

and the calls to signal(SIGCHLD, SIG_IGN); were replaced by signal(SIGCHLD, sig_handler);

This makes the handling of signals from a terminating child process POSIX compliant and also seems to be more robust in removing zombies.

-- KennethLavrsen - 16 Jan 2005
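
The reaping pattern in the fix record can be tried outside Motion with a small standalone sketch. This is not Motion's code: it uses sigaction() rather than signal() so the handler does not need to be re-installed, and the names reap_children(), install_sigchld_handler() and spawn_and_reap() are hypothetical, chosen only for this illustration. It assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Count of children reaped by the handler since the last reset. */
static volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler: collect every child that has already exited.
 * Several exits can be merged into one signal, hence the loop. */
static void reap_children(int signo)
{
    int saved_errno = errno;        /* waitpid() may overwrite errno */

    (void)signo;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_children++;
    errno = saved_errno;
}

/* Install the handler. Returns 0 on success, -1 on error. */
static int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Hypothetical demo helper: fork `count` children that exit at once,
 * then poll (for at most about 5 seconds) until the handler has reaped
 * them. Returns the number reaped, or -1 if fork() fails. */
static int spawn_and_reap(int count)
{
    const struct timespec tick = { 0, 1000000L };   /* 1 ms */
    int i, polls;

    assert(count >= 0);
    reaped_children = 0;
    for (i = 0; i < count; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);               /* child: exit immediately */
    }
    for (polls = 0; reaped_children < count && polls < 5000; polls++)
        nanosleep(&tick, NULL);     /* may return early on SIGCHLD */
    return (int)reaped_children;
}
```

After install_sigchld_handler() succeeds, spawn_and_reap(5) should come back with all five children collected and no `<defunct>` entries left behind. The WNOHANG loop matters because one SIGCHLD delivery can stand for several exited children, which would match a setup where onsave and onmpeg scripts finish close together.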
Topic revision: r15 - 06 Oct 2005, FredericLOIRETTE
Copyright © 1999-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.