BUG: File Descriptor leak
While working with the onmpeg parameter, I found that some
file descriptors are leaked to the sub-process.
This can have several effects:
- a well-behaved subprocess can wait forever for a file to be released (this is my case)
- netcam may not restart after a crash (device is busy)
- an evil or buggy subprocess can tamper with the netcam or the open jpg, mpeg... files
The buggy functions in event.c are exec_command, send_mail and send_sms
(the functions that call fork and an exec* function).
In the child, before calling exec*, the three functions need to call:
- ffmpeg_close(cnt->ffmpeg_new)
- ffmpeg_close(cnt->ffmpeg_motion)
- ffmpeg_close(cnt->ffmpeg_timelapse)
- netcam_close(netcam)
send_sms and send_mail also forgot to call vid_close().
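To see why these descriptors reach the sub-process at all: a child created by fork starts with a copy of every descriptor open in the parent, and exec* keeps each one that is not marked close-on-exec. A minimal sketch that checks the first half of that, assuming a POSIX system; fd_open_in_child is a hypothetical helper for illustration, not a Motion function:

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if fd is still open in a freshly
 * forked child, 0 if it is not, and -1 on error.
 */
int fd_open_in_child(int fd)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: F_GETFD fails with EBADF when fd is not open. */
        _exit(fcntl(fd, F_GETFD) != -1 ? 1 : 0);
    }
    /* Parent: collect the child's answer from its exit status. */
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Called on a descriptor the parent has open, this should return 1, which is the situation lsof reports in the test case below.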
Test case
Add to motion.conf:
onmpeg /usr/sbin/lsof -n >> /tmp/lsof.out
Example
motion 6691 kmaster 5w REG 8,24 722190 4514570 /mnt/data/webcam/2005/0321/20050321-081240-03.avi
sh 6716 kmaster 5w REG 8,24 722190 4514570 /mnt/data/webcam/2005/0321/20050321-081240-03.avi
Environment
Motion version: 3.1.19
ffmpeg version: 0.4.9-pre1
Shared libraries: curl, xmlrpc, ffmpeg, mysql, postgresql
Server OS: Fedora Core 3, kernel 2.6.10
--
ChristopheGRENIER - 21 Mar 2005
Follow up
I have been playing with this for several hours.
I tried this code in exec_command():
#ifdef HAVE_FFMPEG
if (cnt->ffmpeg_new) ffmpeg_close(cnt->ffmpeg_new);
if (cnt->ffmpeg_motion) ffmpeg_close(cnt->ffmpeg_motion);
if (cnt->ffmpeg_timelapse) ffmpeg_close(cnt->ffmpeg_timelapse);
#endif /* HAVE_FFMPEG */
This is not enough. In this example I have timelapse enabled on 9 cameras, and it seems every forked exec process carries the same file descriptors, which then need to be closed in each of them (output from your test case):
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
motion 16806 root 34w REG 253,0 18034 9930483 /usr/local/apache2/htdocs/cam5/2005032523-timelapse.mpg
sh 16833 root 34w REG 253,0 18034 9930483 /usr/local/apache2/htdocs/cam5/2005032523-timelapse.mpg
sh 16836 root 34w REG 253,0 18034 9930483 /usr/local/apache2/htdocs/cam5/2005032523-timelapse.mpg
sh 16837 root 34w REG 253,0 18034 9930483 /usr/local/apache2/htdocs/cam5/2005032523-timelapse.mpg
sh 16842 root 34w REG 253,0 18034 9930483 /usr/local/apache2/htdocs/cam5/2005032523-timelapse.mpg
sh 16843 root 34w REG 253,0 18034 9930483 /usr/local/apache2/htdocs/cam5/2005032523-timelapse.mpg
sh 16848 root 34w REG 253,0 18034 9930483 /usr/local/apache2/htdocs/cam5/2005032523-timelapse.mpg
sh 16849 root 34w REG 253,0 18034 9930483 /usr/local/apache2/htdocs/cam5/2005032523-timelapse.mpg
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
motion 16806 root 41w REG 253,0 25898 9228223 /usr/local/apache2/htdocs/cam4/2005032523-timelapse.mpg
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
motion 16806 root 38w REG 253,0 9515 10716748 /usr/local/apache2/htdocs/cam2/2005032523-timelapse.mpg
sh 16843 root 38w REG 253,0 9515 10716748 /usr/local/apache2/htdocs/cam2/2005032523-timelapse.mpg
sh 16848 root 38w REG 253,0 9515 10716748 /usr/local/apache2/htdocs/cam2/2005032523-timelapse.mpg
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
motion 16806 root 35w REG 253,0 4409 8735556 /usr/local/apache2/htdocs/cam6/2005032523-timelapse.mpg
sh 16836 root 35w REG 253,0 4409 8735556 /usr/local/apache2/htdocs/cam6/2005032523-timelapse.mpg
sh 16837 root 35w REG 253,0 4409 8735556 /usr/local/apache2/htdocs/cam6/2005032523-timelapse.mpg
sh 16843 root 35w REG 253,0 4409 8735556 /usr/local/apache2/htdocs/cam6/2005032523-timelapse.mpg
sh 16848 root 35w REG 253,0 4409 8735556 /usr/local/apache2/htdocs/cam6/2005032523-timelapse.mpg
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
motion 16806 root 33w REG 253,0 38604 8507904 /usr/local/apache2/htdocs/cam9/2005032523-timelapse.mpg
sh 16833 root 33w REG 253,0 38604 8507904 /usr/local/apache2/htdocs/cam9/2005032523-timelapse.mpg
sh 16836 root 33w REG 253,0 38604 8507904 /usr/local/apache2/htdocs/cam9/2005032523-timelapse.mpg
sh 16837 root 33w REG 253,0 38604 8507904 /usr/local/apache2/htdocs/cam9/2005032523-timelapse.mpg
sh 16843 root 33w REG 253,0 38604 8507904 /usr/local/apache2/htdocs/cam9/2005032523-timelapse.mpg
sh 16848 root 33w REG 253,0 38604 8507904 /usr/local/apache2/htdocs/cam9/2005032523-timelapse.mpg
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
motion 16806 root 37w REG 253,0 12931 10094976 /usr/local/apache2/htdocs/cam1/2005032523-timelapse.mpg
sh 16843 root 37w REG 253,0 12931 10094976 /usr/local/apache2/htdocs/cam1/2005032523-timelapse.mpg
sh 16848 root 37w REG 253,0 12931 10094976 /usr/local/apache2/htdocs/cam1/2005032523-timelapse.mpg
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
motion 16806 root 36w REG 253,0 16806 9522034 /usr/local/apache2/htdocs/cam7/2005032523-timelapse.mpg
sh 16843 root 36w REG 253,0 16806 9522034 /usr/local/apache2/htdocs/cam7/2005032523-timelapse.mpg
sh 16848 root 36w REG 253,0 16806 9522034 /usr/local/apache2/htdocs/cam7/2005032523-timelapse.mpg
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
motion 16806 root 39w REG 253,0 13586 10814151 /usr/local/apache2/htdocs/cam3/2005032523-timelapse.mpg
sh 16848 root 39w REG 253,0 13586 10814151 /usr/local/apache2/htdocs/cam3/2005032523-timelapse.mpg
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
motion 16806 root 40w REG 253,0 24541 10749052 /usr/local/apache2/htdocs/cam8/2005032523-timelapse.mpg
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
motion 16806 root 42w REG 253,0 17268 10094991 /usr/local/apache2/htdocs/cam1/01-20050325233012.avi
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
motion 16806 root 43w REG 253,0 15802 10094994 /usr/local/apache2/htdocs/cam1/01-20050325233012m.avi
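The listing shows descriptors from several cameras appearing in each forked sh. A different approach from closing them one by one, and not the one tried here, is to mark a descriptor close-on-exec right after it is opened, so that it is dropped automatically when the child calls exec*. A rough sketch, assuming POSIX fcntl; set_cloexec is a hypothetical helper, and descriptors opened inside libraries such as ffmpeg would only be covered if the caller can get at them:

```c
#include <fcntl.h>

/*
 * Hypothetical helper: mark fd close-on-exec.
 * Returns 0 on success, -1 on error.
 */
int set_cloexec(int fd)
{
    /* Read the current descriptor flags so they are preserved. */
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```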
--
KennethLavrsen - 25 Mar 2005
Fix record
This fix will be known as the BFG9000 fix (ever played Doom?).
In send_sms, send_mail and exec_command I added this:
if (!fork()) {
    int i;
    /* Detach from parent */
    setsid();
    /* Close every file descriptor except the console (0, 1, 2), because we still want to see error messages */
    for (i=getdtablesize(); i>2; --i)
        close(i);
    exec....
That cleans up any mess effectively and, I believe, without too much CPU overhead.
Implemented in motion-3.2.1_snap10 and motion-3.1.20_snap3.