BUG: Zombie processes from onsave and onmpeg scripts
I'm still getting a lot of zombie processes from onsave and onmpeg when I run motion 3.1.17 in daemon (-D) mode. Here are a few examples from my 'ps -ef':
bruce 13341 11891 0 18:44 ? 00:00:00 [onsave.sh <defunct>]
bruce 13427 11891 0 18:45 ? 00:00:00 [onsave.sh <defunct>]
bruce 13493 11891 0 18:46 ? 00:00:00 [onsave.sh <defunct>]
bruce 13722 11891 0 18:48 ? 00:00:00 [onsave.sh <defunct>]
bruce 13814 11891 0 18:49 ? 00:00:00 [onmpeg.sh <defunct>]
bruce 13816 11891 0 18:49 ? 00:00:00 [onsave.sh <defunct>]
.
.
.
bruce 26191 11891 0 21:00 ? 00:00:00 [onsave.sh <defunct>]
bruce 26630 11891 0 21:05 ? 00:00:00 [onsave.sh <defunct>]
bruce 26726 11891 0 21:06 ? 00:00:00 [onmpeg.sh <defunct>]
bruce 26728 11891 0 21:06 ? 00:00:00 [onsave.sh <defunct>]
My scripts only do string manipulation and save the result to a file; I'm running: Linux 2.4.21-15.0.4.EL #1 Sat Jul 31 01:33:50 EDT 2004 i686 i686 i386 GNU/Linux
-brucedur@pacbell.net
--
BruceDurham - 17 Oct 2004
I really cannot reproduce this condition. Motion hangs when I have USB errors but I never have problems with my onsave perl scripts -- KennethLavrsen - 22 Oct 2004
Test case
Environment
Motion version: 3.1.17
Shared libraries: curl, xmlrpc, ffmpeg, mysql, postgresql
Server OS: Linux 2.4.21-15.0.4.EL
--
KennethLavrsen - 24 Oct 2004 on behalf of BruceDurham
Follow up
Note that you can attach your shell scripts and motion config files to the bug topic.
--
KennethLavrsen - 25 Oct 2004
I'm using bash scripts.
--
BruceDurham - 27 Oct 2004
I need an example bash script of those that hang. I use perl scripts and they never hang like this.
--
KennethLavrsen - 27 Oct 2004
Here is an example of the 'onsave' script:
#!/bin/bash
jpeg_file=$1 # arg1 is motion supplied 'jpeg' path.
camera_root=${jpeg_file%/*/*/*/*/*/*} # Clip trailing YY/MM/DD/HH/MM/file.ext
tmp_trans_file="$camera_root/tmp.trans" # Define temporary transaction file.
echo "<jpeg>$jpeg_file</jpeg>" >> $tmp_trans_file # Add jpeg file info.
exit
--
BruceDurham - 22 Nov 2004
Ugh...line-feed snafu. Sorry about that; let me know if I need to repost.
--
BruceDurham - 22 Nov 2004
Thanks. I will try when I have time for the experiments.
Note that you can just press the "edit" link and correct anything. The text box and "Add Comment" button are an easy way to add to the page, but this web site is a Wiki and you can edit anything anyone has entered by pressing "edit".
When you want to paste code it is smart to enclose it in <VERBATIM> </VERBATIM>. Then it keeps its line breaks and is easier to read.
--
KennethLavrsen - 25 Nov 2004
I have tried this script which is similar to yours.
#!/bin/bash
jpeg_file=$1
camera_root=/usr/local/apache2/htdocs/cam5
tmp_trans_file="$camera_root/tmp.trans"
echo "<jpeg>$jpeg_file</jpeg>" >> $tmp_trans_file # Add jpeg file info.
exit
And I do not get any hanging processes. So I cannot see that it is Motion that creates the problem.
--
KennethLavrsen - 14 Dec 2004
I have some more info: I looked at /var/log/messages and found a lot of these:
Jan 9 04:03:49 cpe1 kernel: application bug: motion(4290) has SIGCHLD set to SIG_IGN but calls wait().
Jan 9 04:03:49 cpe1 kernel: (see the NOTES section of 'man 2 wait'). Workaround activated.
My guess is that this message is listed for every zombie created.
The 'man 2 wait' command yields:
...
The original POSIX standard left the behaviour of setting SIGCHLD to
SIG_IGN unspecified. Later standards, including SUSv2 and POSIX
1003.1-2001 specify the behaviour just described as an XSI-compliance
option. Linux does not conform to the second of the two points just
described: if a wait() or waitpid() call is made while SIGCHLD is being
ignored, the call behaves just as though SIGCHLD were not being
ignored, that is, the call blocks until the next child terminates and
then returns the PID and status of that child.
...
Additional info: I use onsave, onmpeg and onffmpegclose concurrently. Might there be race/collision problems that only occur if some combination of these script (exec) launching features is used on RedHat?
--
BruceDurham - 11 Jan 2005
I am not sure this is a fix but we changed the signal handler to resolve the RedHat bug you mention.
Give the new MotionRelease3x1x18snap10 a try and see if this changed something for your problem.
--
KennethLavrsen - 15 Jan 2005
Bruce has reported that initial tests show that the problem has been resolved with 3.1.18_snap10.
--
KennethLavrsen - 16 Jan 2005
Hi,
I don't know if this issue has finally been solved, but I get exactly the same problem on RedHat 9 (Motion version: 3.2.3 in daemon (-D) mode, Server OS: Linux 2.4.31-grsec), with [sh] showing up each time 'on_picture_save echo -en "\007">/dev/console' is called (to make a simple beep).
If there is a solution, please let me know.
FRed
-- FredericLOIRETTE - 06 Oct 2005
Fix record
In motion.c, sig_handler() got this case entry added:
case SIGCHLD: {
#ifdef WNOHANG
while (waitpid(-1, NULL, WNOHANG) > 0) {};
#endif /* WNOHANG */
signal(SIGCHLD, sig_handler);
return;
}
and the calls to signal(SIGCHLD, SIG_IGN); are replaced by signal(SIGCHLD, sig_handler);
This makes the handling of signals from a terminating child process POSIX compliant and also seems to be more robust in removing zombies.
-- KennethLavrsen - 16 Jan 2005