Dilate9 Speed Patch
Introduction
This patch was created to improve the speed of the dilate9 function in alg.c. As can be seen in the execution profile when motion is being detected, the function is pretty CPU intensive:
  %    cumulative    self               self     total
 time    seconds    seconds    calls  ms/call  ms/call  name
18.83      96.56      36.12      193   187.16   187.16  dilate9
(See MotionProfiling for more info and the example profile from which the above snippet was taken.)
While writing the patch, I found a bug in the original function. The patch contains a fix for it.
Description of Patch
The patch has two purposes. The first and foremost purpose is to increase the speed of the dilate9 function. The current function is slow, mainly because it performs the same calculations several times. I wrote a test program that compares the current/old function and the optimized/new function:
Running 2000 iterations of old_dilate9 with image size 320x240: 42.75 ms/iteration
Running 2000 iterations of new_dilate9 with image size 320x240: 10.89 ms/iteration
We can also compare the entry in the execution profile (see above):
  %    cumulative    self               self     total
 time    seconds    seconds    calls  ms/call  ms/call  name
 9.51      54.04       7.05      144    48.96    48.96  dilate9
As can be seen, the optimized function takes roughly 75% less time per call (about a fourfold speedup) regardless of how we measure. The speed improvement is achieved mainly by cutting down on the number of statements executed in the inner loop.
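One common way to avoid repeating the same calculations in a 3x3 dilation is to compute the maximum of each window column once and slide it along the row. The following is a minimal sketch of that idea, not the code from the patch: the function name dilate3x3_sliding, the out-of-place src/dst interface and the border handling (pixels outside the image are ignored) are assumptions made for illustration.

```c
#include <stddef.h>

/* Larger of two unsigned bytes. */
static unsigned char max_u8(unsigned char a, unsigned char b)
{
    return a > b ? a : b;
}

/* Maximum of column x over rows y-1..y+1, skipping rows outside the image. */
static unsigned char column_max(const unsigned char *src, int w, int h,
                                int x, int y)
{
    unsigned char m = src[(size_t)y * w + x];

    if (y > 0)
        m = max_u8(m, src[(size_t)(y - 1) * w + x]);
    if (y + 1 < h)
        m = max_u8(m, src[(size_t)(y + 1) * w + x]);
    return m;
}

/*
 * 3x3 dilation (hypothetical helper, not the patched dilate9).
 * Each column maximum is computed once per row and then reused by the
 * three neighbouring output pixels that share it.
 */
void dilate3x3_sliding(const unsigned char *src, unsigned char *dst,
                       int w, int h)
{
    if (w <= 0 || h <= 0)
        return;

    for (int y = 0; y < h; y++) {
        /* No column exists left of x = 0; 0 is neutral for a maximum. */
        unsigned char left = 0;
        unsigned char mid = column_max(src, w, h, 0, y);

        for (int x = 0; x < w; x++) {
            /* Only the column entering the window is computed here. */
            unsigned char right =
                (x + 1 < w) ? column_max(src, w, h, x + 1, y) : 0;

            dst[(size_t)y * w + x] = max_u8(left, max_u8(mid, right));

            /* Slide the window one pixel to the right. */
            left = mid;
            mid = right;
        }
    }
}
```

With this arrangement each output pixel costs one new column maximum plus two comparisons, instead of a fresh scan of all nine neighbours.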
The second, and also very important, purpose is to fix a bug detected in the current dilate9 function. The bug occurs because the function treats the image as an array of (signed) char, and uses the macro MAX(x, y) which compares the absolute values of its two operands. This has the effect that luminance (Y) values above 127 may be considered smaller than luminance values below 127 in some cases. See the mailing list discussion for more info.
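The effect can be reproduced in isolation. The sketch below is an illustration only: ABS_MAX is a hypothetical stand-in for a macro that compares absolute values, and max_as_signed and max_as_unsigned are made-up helper names. None of them is taken from alg.c.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a MAX macro that compares absolute values. */
#define ABS_MAX(x, y) (abs(x) > abs(y) ? (x) : (y))

/* Picks the "larger" of two luminance bytes the buggy way: as signed chars. */
unsigned char max_as_signed(unsigned char a, unsigned char b)
{
    /* On two's-complement platforms 200 becomes -56 and 130 becomes -126. */
    signed char sa = (signed char)a;
    signed char sb = (signed char)b;

    return (unsigned char)ABS_MAX(sa, sb);
}

/* Picks the larger of two luminance bytes correctly: as unsigned values. */
unsigned char max_as_unsigned(unsigned char a, unsigned char b)
{
    return a > b ? a : b;
}
```

Under these assumptions max_as_signed(200, 100) yields 100, because 200 is read as -56 and |-56| is smaller than 100, while max_as_signed(130, 100) still yields 130, because 130 is read as -126. This matches the "in some cases" behaviour described above; max_as_unsigned returns the true maximum in both cases.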
Installation of Patch
The installation is very straightforward:
- tar xzf motion-3.1.18_snap4.tar.gz
- cd motion-3.1.18
- zcat ../motion-3.1.18_snap4-dilate9.patch.gz | patch -p1
- ./configure and make.
Testing and Validation
Since I don't have any test pictures for which I know the expected result after running the function, I created a program that randomly generates a picture, runs both the old and the new functions on it, and compares the results:
Testing accuracy of new_dilate9 compared to old_dilate9; 15000 iterations with image size 320x240: all ok
In other words, the new function generated the same result as the old in 15000 random cases. Note that in this test, the bug mentioned above had been fixed in the old function as well.
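A randomized comparison of this kind can be sketched as follows. This is not the test program used above: the names dilate_fn, dilate_naive, dilate_two_pass and count_mismatches, the scratch-buffer parameter and the border handling are all assumptions for illustration. The two-pass variant merely plays the role of an optimized candidate to compare against a straightforward reference.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical common signature; scratch holds w * h bytes of work space. */
typedef void (*dilate_fn)(const unsigned char *src, unsigned char *dst,
                          unsigned char *scratch, int w, int h);

/* Reference: scan all (up to nine) neighbours of every pixel. */
void dilate_naive(const unsigned char *src, unsigned char *dst,
                  unsigned char *scratch, int w, int h)
{
    (void)scratch; /* not needed by the reference version */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            unsigned char m = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int yy = y + dy, xx = x + dx;
                    if (yy >= 0 && yy < h && xx >= 0 && xx < w &&
                        src[yy * w + xx] > m)
                        m = src[yy * w + xx];
                }
            dst[y * w + x] = m;
        }
}

/* Candidate: horizontal 3-pixel maximum, then vertical 3-pixel maximum. */
void dilate_two_pass(const unsigned char *src, unsigned char *dst,
                     unsigned char *scratch, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            unsigned char m = src[y * w + x];
            if (x > 0 && src[y * w + x - 1] > m)
                m = src[y * w + x - 1];
            if (x + 1 < w && src[y * w + x + 1] > m)
                m = src[y * w + x + 1];
            scratch[y * w + x] = m;
        }
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            unsigned char m = scratch[y * w + x];
            if (y > 0 && scratch[(y - 1) * w + x] > m)
                m = scratch[(y - 1) * w + x];
            if (y + 1 < h && scratch[(y + 1) * w + x] > m)
                m = scratch[(y + 1) * w + x];
            dst[y * w + x] = m;
        }
}

/*
 * Runs both functions on `iterations` random images and returns how many
 * images produced different output, or -1 if memory could not be allocated.
 */
int count_mismatches(dilate_fn a, dilate_fn b, int w, int h,
                     int iterations, unsigned seed)
{
    size_t n = (size_t)w * h;
    unsigned char *src = malloc(n), *out_a = malloc(n);
    unsigned char *out_b = malloc(n), *scratch = malloc(n);
    int mismatches = -1;

    if (src && out_a && out_b && scratch) {
        mismatches = 0;
        srand(seed); /* fixed seed makes a failing run reproducible */
        for (int it = 0; it < iterations; it++) {
            for (size_t i = 0; i < n; i++)
                src[i] = (unsigned char)(rand() % 256);
            a(src, out_a, scratch, w, h);
            b(src, out_b, scratch, w, h);
            if (memcmp(out_a, out_b, n) != 0)
                mismatches++;
        }
    }
    free(src);
    free(out_a);
    free(out_b);
    free(scratch);
    return mismatches;
}
```

A call such as count_mismatches(dilate_naive, dilate_two_pass, 320, 240, 15000, 1) would mirror the image size and iteration count quoted above; a return value of 0 corresponds to "all ok". Such a test only shows that two implementations agree with each other, so a bug shared by both would go unnoticed.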
-- PerJonsson - 11 Nov 2004
I will post a patch for the dilate5 function next week as well. It can be optimized in a similar way to the dilate9 function.
-- PerJonsson - 12 Nov 2004
I have already added this patch to my source tree and released a snapshot with it:
http://www.lavrsen.dk/twiki/bin/view/Motion/MotionRelease3x1x18snap6
Excellent job.
-- KennethLavrsen - 12 Nov 2004