Motion - Dilate Five Speed Patch

Dilate5 Speed Patch

Introduction

This patch is similar to the DilateNineSpeedPatch, but improves the speed of the dilate5 function instead. As the execution profile below shows, dilate5 accounts for about 9.6% of the CPU time while motion is being detected:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
  9.60    138.55    18.42      193    95.45    95.45  dilate5
(See MotionProfiling for more info and the example profile where the above snippet was taken from.)

As with the dilate9 function, this function contains a bug due to the use of signed chars (see DilateNineSpeedPatch). The patch contains a fix for the bug.
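As a rough illustration of how a signed char can break a maximum calculation, consider the sketch below. It is not the code from Motion or from the patch: the helper names max_signed and max_unsigned are hypothetical, and the wrap-around to negative values assumes a typical two's-complement platform.

```c
/* Hypothetical helper: picks the larger of two pixel values after storing
 * them in signed chars, the way buggy code might do. */
static unsigned char max_signed(unsigned char a, unsigned char b)
{
    signed char sa = (signed char)a;   /* 200 becomes -56 on typical platforms */
    signed char sb = (signed char)b;
    return (unsigned char)(sa > sb ? sa : sb);
}

/* Hypothetical helper: the same comparison done on unsigned chars. */
static unsigned char max_unsigned(unsigned char a, unsigned char b)
{
    return a > b ? a : b;
}
```

On such a platform, max_signed(200, 10) yields 10, because 200 is seen as -56, while max_unsigned(200, 10) yields 200. In a dilation this would mean that a bright pixel loses against a dim neighbour.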

Description of Patch

The speed gain, measured as the reduction in time per iteration, ranges from around 40% to around 60%. When dilating a picture that contains a lot of detail, the speed gain is lower, but when dilating a picture that is mostly empty (as the motion images are), the speed gain is higher:

Running 2000 iterations of old_dilate5 with image size 320x240: 23.46 ms/iteration
Running 2000 iterations of new_dilate5 with image size 320x240: 13.40 ms/iteration
(Speed gain: 43%)

Using cleared test buffers for speed test.
Running 2000 iterations of old_dilate5 with image size 320x240: 22.56 ms/iteration
Running 2000 iterations of new_dilate5 with image size 320x240: 8.35 ms/iteration
(Speed gain: 63%)

We can also compare the entry in the execution profile (see above):

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 13.31     29.64     5.49      131    41.91    41.91  dilate5

In this "real" case, the optimized function takes about 56% less time per call than the original function (41.91 ms versus 95.45 ms).
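The percentages on this page are consistent with a relative reduction in time per call, (old - new) / old. The following is a minimal sketch of that arithmetic; the function name speed_gain_percent is hypothetical and not part of Motion.

```c
/* Hypothetical helper: relative reduction in time per call, in percent.
 * old_ms and new_ms are the times per call before and after the patch;
 * old_ms must be greater than zero. */
static double speed_gain_percent(double old_ms, double new_ms)
{
    return (old_ms - new_ms) / old_ms * 100.0;
}
```

With the figures quoted above, speed_gain_percent(23.46, 13.40) is about 43, speed_gain_percent(22.56, 8.35) is about 63, and speed_gain_percent(95.45, 41.91) is about 56.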

Installation of Patch

The installation is very straightforward:

  1. tar xzf motion-3.1.18_snap6.tar.gz
  2. cd motion-3.1.18
  3. zcat ../motion-3.1.18_snap6-dilate5.patch.gz | patch -p1
  4. ./configure && make

Testing and Validation

The patch was tested by running both the old and the new function on randomly generated images:

Testing accuracy of new_dilate5 compared to old_dilate5; 15000 iterations with image size 320x240: all ok

In other words, the new function generated the same result as the old one in all 15000 random cases. Note that in this test, the bug mentioned above had been fixed in the old function as well.
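A comparison test of this kind could be sketched as follows. This is not the test program used for the patch. All names (ref_dilate5, scatter_dilate5, compare_dilate) and the function signature are hypothetical, and the sketch assumes that dilate5 is a dilation with a "+" shaped neighbourhood in which neighbours outside the image are ignored.

```c
#include <stdlib.h>
#include <string.h>

typedef void (*dilate_fn)(const unsigned char *src, unsigned char *dst,
                          int w, int h);

/* Reference version: every output pixel is the maximum of the pixel itself
 * and its left, right, upper and lower neighbours. */
static void ref_dilate5(const unsigned char *src, unsigned char *dst,
                        int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int i = y * w + x;
            unsigned char m = src[i];
            if (x > 0     && src[i - 1] > m) m = src[i - 1];
            if (x < w - 1 && src[i + 1] > m) m = src[i + 1];
            if (y > 0     && src[i - w] > m) m = src[i - w];
            if (y < h - 1 && src[i + w] > m) m = src[i + w];
            dst[i] = m;
        }
    }
}

/* Alternative version: each non-zero pixel pushes its value onto its four
 * neighbours. Empty pixels are skipped. */
static void scatter_dilate5(const unsigned char *src, unsigned char *dst,
                            int w, int h)
{
    memcpy(dst, src, (size_t)w * (size_t)h);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int i = y * w + x;
            unsigned char v = src[i];
            if (v == 0) continue;  /* cannot raise any neighbour */
            if (x > 0     && dst[i - 1] < v) dst[i - 1] = v;
            if (x < w - 1 && dst[i + 1] < v) dst[i + 1] = v;
            if (y > 0     && dst[i - w] < v) dst[i - w] = v;
            if (y < h - 1 && dst[i + w] < v) dst[i + w] = v;
        }
    }
}

/* Runs both functions on random images and returns the number of
 * iterations in which the results differed, or -1 on allocation failure. */
static int compare_dilate(dilate_fn a, dilate_fn b, int w, int h,
                          int iterations, unsigned seed)
{
    size_t n = (size_t)w * (size_t)h;
    unsigned char *src = malloc(n);
    unsigned char *out_a = malloc(n);
    unsigned char *out_b = malloc(n);
    int mismatches = 0;

    if (!src || !out_a || !out_b) {
        free(src); free(out_a); free(out_b);
        return -1;
    }
    srand(seed);
    for (int it = 0; it < iterations; it++) {
        for (size_t i = 0; i < n; i++)
            src[i] = (unsigned char)(rand() & 0xFF);
        a(src, out_a, w, h);
        b(src, out_b, w, h);
        if (memcmp(out_a, out_b, n) != 0)
            mismatches++;
    }
    free(src); free(out_a); free(out_b);
    return mismatches;
}
```

A call such as compare_dilate(ref_dilate5, scatter_dilate5, 320, 240, 15000, 1) returns 0 when the two versions agree on every generated image. The fixed seed makes a failing run reproducible.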

-- PerJonsson - 13 Nov 2004

Discussion and Comments


One additional note: The patch also removes the MAX macro in favor of the MAX2 macro. The difference is that MAX2 doesn't use the abs function, which makes it faster.

One consequence of this is that the patch changes one row in dilate9 (yes, the optimized version) as well.
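To illustrate the difference between the two kinds of macro, here is a minimal sketch. The exact definitions in the Motion source may differ: MAX_ABS is a hypothetical stand-in for an abs-based maximum, the MAX2 definition shown is an assumption, and macros_agree is a hypothetical check function.

```c
#include <stdlib.h>  /* abs */

/* Assumed abs-based maximum: max(x, y) = (x + y + |x - y|) / 2 */
#define MAX_ABS(x, y) (((x) + (y) + abs((x) - (y))) / 2)

/* Assumed comparison-based maximum: a single compare, no call to abs */
#define MAX2(x, y) ((x) > (y) ? (x) : (y))

/* Returns 1 if both macros give the same result for all pixel value pairs. */
static int macros_agree(void)
{
    for (int x = 0; x <= 255; x++)
        for (int y = 0; y <= 255; y++)
            if (MAX_ABS(x, y) != MAX2(x, y))
                return 0;
    return 1;
}
```

Both forms give the same result for pixel values in the range 0 to 255, so replacing one with the other does not change the output, only the amount of work per comparison.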

-- PerJonsson - 13 Nov 2004
Topic revision: r3 - 30 Jan 2005, KennethLavrsen