Dilate5 Speed Patch
Introduction
This patch is similar to the DilateNineSpeedPatch, but improves the speed of the dilate5 function instead. The execution profile taken while motion is being detected shows that the function accounts for 9.60% of the run time:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  9.60    138.55    18.42      193    95.45    95.45  dilate5
(See MotionProfiling for more info and the example profile where the above snippet was taken from.)
As with the dilate9 function, this function contains a bug due to the use of signed chars (see DilateNineSpeedPatch). The patch contains a fix for the bug.
Description of Patch
The speed gain ranges from around 40% to around 60%. When dilating a picture with much information, the speed gain is lower, but when dilating a picture that is mostly empty (as the motion images are), the speed gain is higher:
Running 2000 iterations of old_dilate5 with image size 320x240: 23.46 ms/iteration
Running 2000 iterations of new_dilate5 with image size 320x240: 13.40 ms/iteration
(Speed gain: 43%)
Using cleared test buffers for speed test.
Running 2000 iterations of old_dilate5 with image size 320x240: 22.56 ms/iteration
Running 2000 iterations of new_dilate5 with image size 320x240: 8.35 ms/iteration
(Speed gain: 63%)
We can also compare the entry in the execution profile (see above):
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 13.31     29.64     5.49      131    41.91    41.91  dilate5
In this "real" case, the optimized function takes 56% less time per call than the original function (41.91 ms versus 95.45 ms).
Installation of Patch
The installation is very straightforward:
- tar xzf motion-3.1.18_snap6.tar.gz
- cd motion-3.1.18
- zcat ../motion-3.1.18_snap6-dilate5.patch.gz | patch -p1
- ./configure and make.
Testing and Validation
The patch was tested by running both the old and the new function on randomly generated images:
Testing accuracy of new_dilate5 compared to old_dilate5; 15000 iterations with image size 320x240: all ok
In other words, the new function generated the same result as the old in 15000 random cases. Note that in this test, the bug mentioned above had been fixed in the old function as well.
--
PerJonsson - 13 Nov 2004
One additional note: The patch also removes the MAX macro in favor of the MAX2 macro. The difference is that MAX2 doesn't use the abs function, which makes it faster.
One consequence of this is that the patch changes one line in dilate9 (yes, the optimized version) as well.
--
PerJonsson - 13 Nov 2004