Motion - Alg Diff Standard Mmx Patch

alg_diff_standard MMX patch

Introduction

This patch adds MMX assembler code to the alg_diff_standard function. The benefit is a major speed improvement, as can be seen below. The old non-MMX code remains in the function and handles the remaining pixels when an unusual resolution that isn't divisible by 8 is used.

Note: This patch is experimental for now, please apply with care!

Bugs

There are two bugs in the old alg_diff_standard that are fixed in this patch:

  1. The cast to char in unsigned register char curdiff=(int)(abs((char)(*ref-*new))); truncates the result from the subtraction.
  2. The cast to char in curdiff=((int)((char)curdiff * *mask++)/255); truncates curdiff.

Description of Patch

The patch contains a new header file, mmx.h (copied from the ffmpeg distribution), and some extra MMX code in alg_diff_standard. The MMX code is thoroughly commented and is hopefully not too hard to understand :-).

Let's look at some numbers.

Default case

In the default case, there is no mask, and the smartmask feature is not active. This is the output from my test program:

5000 iterations of old_alg_diff_standard w/ img size 320x240: 43614 ms => 8.72 ms/iter
5000 iterations of new_alg_diff_standard w/ img size 320x240: 8774 ms => 1.75 ms/iter
Testing accuracy of new_alg_diff_standard compared to old_alg_diff_standard; 1000 iterations with image size 320x240:
differed on avg in 0.00% of the pixels

Note that this is the output from a single run of the test program. Because of caching and system load, the numbers may differ slightly from run to run. Still, they show that the MMX version needs about 80% less time per iteration than the non-MMX version (8.72 ms vs. 1.75 ms, roughly 5 times faster). Also, the accuracy is 100%.

With mask

Adding the use of a (static) mask, the performance numbers are as follows:

3500 iterations of old_alg_diff_standard w/ img size 320x240: 41268 ms => 11.79 ms/iter
3500 iterations of new_alg_diff_standard w/ img size 320x240: 7791 ms => 2.23 ms/iter
Testing accuracy of new_alg_diff_standard compared to old_alg_diff_standard; 1000 iterations with image size 320x240:
differed on avg in 0.00% of the pixels

In other words, the time per iteration drops by about 81% (11.79 ms vs. 2.23 ms, roughly 5.3 times faster) when a static mask is used. The accuracy is 100% in this case as well.

With mask and smartmask

My test program does not emulate the smartmask. Thus, I had to use profiling instead. Here is an excerpt from using the non-MMX version (when using both mask and smartmask, i.e. the "worst" case):

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 37.47     33.24    33.24      280   118.72   118.72  alg_diff_standard

Here is an excerpt from using the MMX version:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
...
 17.21     24.80    11.55      251    46.02    46.02  alg_diff_standard

These numbers show about 61% less time per call (118.72 ms vs. 46.02 ms, roughly 2.6 times faster) when using both mask and smartmask. Note that the figures cannot be compared directly with the ones from my test program: because of profiling overhead, they are much higher than when not profiling.

I have tested the accuracy of the smartmask code by tweaking Motion to run both the non-MMX version and the MMX version of alg_diff_standard in parallel. In all my tests, the two versions produced the same contents in both imgs.out and imgs.smartmask_buffer.

Installation of Patch

Note: This patch is experimental for now, please apply with care!

The installation is very straightforward:

  1. tar xzf motion-3.1.18_snap8.tar.gz
  2. cd motion-3.1.18
  3. zcat ../motion-3.1.18_snap8-algdiffstd_v2.patch.gz | patch -p1
  4. ./configure && make

Discussion and Comments


I should add this: The patch is experimental because I haven't really tested the smartmask part (I normally don't use smartmask, so I don't know how to test it).

I have tested the non-mask case and using an ordinary mask, though. Both cases seem to work fine.

-- PerJonsson - 31 Dec 2004

I have installed your patch and it has been running for 2 days without problems. Smartmask is also running fine as far as I can see. Thank you for this major performance improvement!

-- JoergWeber - 02 Jan 2005

Sounds great, thanks for testing!

Update: I found a bug in my test program - it reported too high precision loss when using a mask. The real figure is 0.05% mismatching pixels instead of 0.30%.

Just for the fun of it, though, I'm working on a version of the patch without precision loss in the mask application.

-- PerJonsson - 02 Jan 2005

Don't waste too much time and CPU on it. It is already much more precise than necessary. If the result differs 1/255... who cares:-)

BTW: smartmask only uses set or not set for a pixel.

-- JoergWeber - 02 Jan 2005

Too late, I already did it :-)

I found a bug as well. Haven't fixed it yet, but I'm working on it!

-- PerJonsson - 03 Jan 2005

I have uploaded a new version of the patch. I fixed the two bugs listed above and also some bugs in the MMX code. Moreover, there is no longer a loss in precision (compared to a bugfixed version of the old code) when running with a static mask.

Joerg, the smartmask code works in my tests, but I would appreciate it if you could test it as well.

-- PerJonsson - 03 Jan 2005

Even though the testing is limited, I included the patch in 3.1.18_snap9 and changed the status. Otherwise I lose track of what is included and what is not.

Again. Great job.

-- KennethLavrsen - 03 Jan 2005

Per, I have had it running for a few hours and cannot see any bad side effects on smartmask. It's working well.

-- JoergWeber - 04 Jan 2005

I'm trying to reduce motion's load on my CPU and this seems like the right step forward. Has this been added to motion already? Is it still being looked at?

-- BobSaggeth - 30 Sep 2007

It was added in 3.1.18.

You can see this in the form at the bottom

-- KennethLavrsen - 30 Sep 2007
Topic revision: r11 - 30 Sep 2007, KennethLavrsen