Profiling Motion
What is Profiling?
From the
gprof documentation: "Profiling allows you to learn where your program spent its time and which functions called which other functions while it was executing." In other words, profiling is important to know where there is a need for optimization. Optimizations are good, but there's no point in optimizing rarely called code, or code that already is fast. Instead, optimization should be directed at bottlenecks and CPU-hogging code.
How is it Done?
First of all, you need a profiler. Here, I'm going to use gprof, the GNU profiler, which should be available on most Unix-based systems. To profile a program using gprof, here is what you do:
- Alter the makefile, adding
-pg
to both the compile flags and the link flags.
- Compile and build.
- Run the program. A file named
gmon.out
will be produced (provided that the program exits gracefully).
- Run gprof:
gprof ./prog gmon.out
, where prog
of course is the program executable.
(Note:
gmon.out
is the default profile filename, and can be omitted.)
The output appearing on screen should begin with a "flat profile", which shows the amount of time spent in each function, and how often the function was called. For example, the following excerpt from a flat profile shows that the
draw_textn
function was called 2886 times during execution of Motion, and each call took 49ms. The function accounted for 1.9% of the total execution time.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
...
1.90 74.03 1.41 2886 0.49 0.49 draw_textn
...
Profiling Multi-Threaded Programs
The problem with profiling Motion is that Motion is multi-threaded. If you try to profile Motion the way that is described above, you will get a pretty useless flat profile, since all times will be zero. A simple and very elegant solution is outlined in
HOWTO: using gprof with multithreaded applications. Basically, you build a preload library that will wrap each call to
pthread_create
, preparing the newly created thread for profiling. So, the procedure for Motion becomes:
- Build the preload library, gprof-helper (I've attached an archive containing source code and a makefile).
- Alter
Makefile
(after running configure
), adding -pg
to both CFLAGS
and LDFLAGS
.
- Compile and build.
- Run Motion using
LD_PRELOAD=<path to gprof-helper.so> ./motion
(assuming you're in Motion's source directory).
- Exit Motion using, for example, motion-control (to force a graceful exit).
- Run gprof:
gprof ./motion gmon.out
(again, gmon.out
can be omitted).
Now you should see the true flat profile for Motion (as in the short example above).
Example Profiles
Below are two example profiles from profiling Motion 3.1.17 with the updated rotate patch applied. To keep it short, only the first 10 or so entries are included.
Profile when idle
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
69.43 305.93 305.93 motion_loop
14.57 370.14 64.21 8681 7.40 7.40 alg_diff_fast
12.98 427.36 57.21 26046 2.20 2.20 reverse_inplace_quad
2.40 437.94 10.59 26043 0.41 0.41 draw_textn
0.14 438.58 0.64 8682 0.07 6.75 vid_next
0.08 438.91 0.34 8682 0.04 0.04 v4l_next
0.07 439.20 0.29 8682 0.03 6.62 rotate_map
0.05 439.44 0.24 8681 0.03 0.03 webcam_put
0.04 439.61 0.17 8682 0.02 0.02 v4l_set_input
0.03 439.75 0.14 17365 0.01 0.03 event
0.03 439.88 0.13 17362 0.01 0.62 draw_text
Nothing strange here. The heaviest function is
alg_diff_fast
, but 7.4 ms per call isn't too bad, I think.
Profile when detecting motion
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
31.51 60.43 60.43 motion_loop
18.83 96.56 36.12 193 187.16 187.16 dilate9
12.29 120.13 23.57 193 122.13 132.65 alg_labeling
9.60 138.55 18.42 193 95.45 95.45 dilate5
8.86 155.55 17.00 442 38.46 38.46 alg_diff_standard
5.40 165.91 10.36 4854 2.13 2.13 reverse_inplace_quad
4.47 174.49 8.58 1197 7.17 7.17 alg_diff_fast
2.26 178.82 4.33 353 12.27 12.27 erode9
1.65 181.99 3.17 104 30.48 30.48 alg_locate_center
1.41 184.69 2.70 235 11.49 11.49 erode5
1.16 186.91 2.22 104 21.35 21.35 alg_locate
1.14 189.09 2.18 4851 0.45 0.45 draw_textn
Now it's getting more interesting. Apparently
dilate9
,
alg_labeling
and
dilate5
are pretty CPU-intensive functions. In other words, they are the prime candidates for optimizations. The
alg_diff_standard
function is five times slower than
alg_diff_fast
, but then again it's also more precise.
Notes
- For FreeBSD, read the comments below and use the separately attached makefile.
- Don't stop Motion using
kill -9
or by pressing Ctrl-C (in non-daemon mode). Then the profile file won't be created.
--
PerJonsson - 19 Oct 2004
Nice doc Per !
Unfortunately i didn't get working gprof in FreeBSD when i tested ( there's not any gprof-helper version ).
--
AngelCarpintero - 19 Oct 2004
Thanks
So the source cannot be built on FreeBSD? What errors do you get?
--
PerJonsson - 19 Oct 2004
Great doc. I moved it under Misc info. The menu on the front page is getting a little crowded.
I will have good use of this info.
--
KennethLavrsen - 19 Oct 2004
gprof-helper doesn't want to build , i've added the right includes and LD value to compile against linuxthreads but that's not the problem , here it's :
/usr/libexec/elf/ld: cannot find -ldl
--
AngelCarpintero - 20 Oct 2004
I just read something about FreeBSD having the dl* functions in libc, which means that you could remove
-ldl
from the build line. Does that work?
--
PerJonsson - 20 Oct 2004
Yes ! you're right
i've attached the FreeBSD Makefile .
To run in FreeBSD basically is the same :
- -.
LD_PRELOAD=/usr/home/motion/gprof-helper/gprof-helper.so ./motion
- -.
kill -15 <first_motion_pid>
- -.
gprof ./motion motion.gmon > profile.motion
--
AngelCarpintero - 21 Oct 2004
Another important remark about the different steps during the profiling process:
When motion is started like 'LD_PRELOAD=/usr/home/motion/gprof-helper/gprof-helper.so ./motion', the actual profiling data is captured. To see the results, you have to terminate Motion and start 'gprof ./motion'.
--
JoergWeber - 25 Sep 2007
From this information, do you have any recommendations for us normal users to decrease our CPU load? -- Should I disable label or dilute or any other specific despeckle filter?
--
BobSaggeth - 26 Sep 2007
When it comes to CPU usage, the despeckle option is right on top of the list, yes. To reduce load, you can start disabling labelling. For indoor cams, you should not need despeckle anyway - except for very cheap and noisy cams. Smartmask may also need some resources and should be disabled when not needed.
--
JoergWeber - 26 Sep 2007
I did some system profiling while motion was recording an event and I had the webcam open -- On a 1,8ghz
Core2Duo, the system was 50% busy from just 1 camera recording at 25fps 640x480.
Here is the profiling information:
CPU: Core 2, speed 1862.56 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples % image name app name symbol name
47835 26.5178 libjpeg.so.62.0.0 libjpeg.so.62.0.0 (no symbols)
40494 22.4483 libavcodec.so.1d.51.38.0 libavcodec.so.1d.51.38.0 (no symbols)
21368 11.8456 motion motion alg_update_reference_frame
18182 10.0794 libc-2.6.1.so libc-2.6.1.so (no symbols)
9593 5.3180 motion motion alg_despeckle
6873 3.8101 mplayer mplayer (no symbols)
4796 2.6587 motion motion alg_locate_center_size
3482 1.9303 libGLcore.so.100.14.11 libGLcore.so.100.14.11 (no symbols)
1792 0.9934 motion motion alg_diff_standard
1790 0.9923 libmozjs.so libmozjs.so (no symbols)
1607 0.8909 motion motion alg_diff
1483 0.8221 motion motion iflood
1450 0.8038 nvidia_drv.so nvidia_drv.so (no symbols)
1204 0.6675 libflashplayer.so libflashplayer.so (no symbols)
1133 0.6281 Xorg Xorg (no symbols)
1013 0.5616 motion motion draw_textn
812 0.4501 nvidia nvidia (no symbols)
651 0.3609 saa7134 saa7134 (no symbols)
621 0.3443 libglib-2.0.so.0.1400.1 libglib-2.0.so.0.1400.1 (no symbols)
376 0.2084 libvte.so.9.2.13 libvte.so.9.2.13 (no symbols)
374 0.2073 jbd jbd (no symbols)
366 0.2029 oprofiled oprofiled (no symbols)
364 0.2018 ext3 ext3 (no symbols)
290 0.1608 vmlinux vmlinux unix_poll
270 0.1497 libfb.so libfb.so fbCopyAreammx
260 0.1441 compiz.real compiz.real (no symbols)
243 0.1347 libglib-2.0.so.0.1400.1 libglib-2.0.so.0.1400.1 (no symbols)
--
RomanGaufman - 04 Dec 2007
And here's the same without the webcam being accessed - why is mpeg4 encoding eating so much CPU? -- mpeg4 encoding done by the windows software takes a mere 2-3% cpu on the same system with the same card.
CPU: Core 2, speed 1862.56 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples % image name app name symbol name
201611 67.4810 libavcodec.so.1d.51.38.0 libavcodec.so.1d.51.38.0 (no symbols)
23815 7.9711 motion motion alg_update_reference_frame
18878 6.3186 libc-2.6.1.so libc-2.6.1.so (no symbols)
9200 3.0793 motion motion alg_despeckle
7422 2.4842 libjpeg.so.62.0.0 libjpeg.so.62.0.0 (no symbols)
2831 0.9476 motion motion alg_locate_center_size
2635 0.8820 libmozjs.so libmozjs.so (no symbols)
2330 0.7799 motion motion alg_diff
2021 0.6764 motion motion alg_diff_standard
1377 0.4609 libflashplayer.so libflashplayer.so (no symbols)
1118 0.3742 libGLcore.so.100.14.11 libGLcore.so.100.14.11 (no symbols)
1083 0.3625 motion motion draw_textn
1053 0.3524 libgklayout.so libgklayout.so (no symbols)
848 0.2838 nvidia nvidia (no symbols)
833 0.2788 Xorg Xorg (no symbols)
747 0.2500 ext3 ext3 (no symbols)
719 0.2407 jbd jbd (no symbols)
717 0.2400 saa7134 saa7134 (no symbols)
599 0.2005 oprofiled oprofiled (no symbols)
583 0.1951 libxpcom_core.so libxpcom_core.so (no symbols)
553 0.1851 vmlinux vmlinux copy_user_generic_string
539 0.1804 libglib-2.0.so.0.1400.1 libglib-2.0.so.0.1400.1 (no symbols)
477 0.1597 libxpconnect.so libxpconnect.so (no symbols)
431 0.1443 libavformat.so.1d.51.10.0 libavformat.so.1d.51.10.0 (no symbols)
365 0.1222 nvidia_drv.so nvidia_drv.so (no symbols)
306 0.1024 libglib-2.0.so.0.1400.1 libglib-2.0.so.0.1400.1 (no symbols)
300 0.1004 vmlinux vmlinux __d_lookup
--
RomanGaufman - 04 Dec 2007
BTW, I used oprofile for the above profiling - couldn't get gprof to work straight away so decided to use the next best thing
--
RomanGaufman - 04 Dec 2007
Hmm, I'm having the same with realtime FPS, has anyone tried to use something other than ffmpeg? - Or use ffmpeg differently? -- Wonder how the windows software manages to get the CPU usage so low as the videos are also h263 or divx.
--
BobSaggeth - 27 Jan 2008