Faster computation of the sum of squared differences between an (M, N) image and a (3, 3) template for template matching?

Problem description

I am implementing an algorithm for Texture Synthesis as outlined here. For this I need to calculate the Sum of Squared Differences, a metric to estimate the error between the template and different positions across the image. I have a slow working implementation in place as follows:

total_weight = valid_mask.sum()
for i in range(input_image.shape[0]):
    for j in range(input_image.shape[1]):
        sample = image[i:i + window, j:j + window]
        dist = (template - sample) ** 2
        ssd[i, j] = (dist * valid_mask).sum() / total_weight

Here, total_weight is just for normalisation. Some pixels have unknown intensities, so I use valid_mask for masking them. This nested loop lies inside of 2 loops, so that's 4 nested loops which is obviously a performance killer!

Is there a way to make this faster in NumPy or Python, as a replacement for this nested loop? Is vectorization possible? I need to compare each (3, 3) patch of the image with the (3, 3) template.

I am subsequently going to implement this in Cython, so the faster I can get it to work using just NumPy, the better.

You can find the complete code here. Lines 62-67 are quoted above.

Thanks, Chintak

Recommended answer

This is basically an improvement over Warren Weckesser's answer. The way to go is clearly with a multidimensional windowed view of the original array, but you want to keep that view from triggering a copy. If you expand your sum((a-b)**2), you can turn it into sum(a**2) + sum(b**2) - 2*sum(a*b), and these multiply-then-reduce-with-a-sum operations can be performed with linear algebra operators, with a substantial improvement in both performance and memory use:
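The expansion can be checked numerically on a single window. This is only a minimal sketch: the arrays `a` and `b` are made-up stand-ins for one image patch and the template, not data from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 3))  # hypothetical image patch
b = rng.random((3, 3))  # hypothetical template

# Direct definition of the sum of squared differences.
direct = np.sum((a - b) ** 2)

# Expanded form: sum(a**2) + sum(b**2) - 2*sum(a*b).
expanded = np.sum(a ** 2) + np.sum(b ** 2) - 2 * np.sum(a * b)

# Each sum-of-products term is a contraction, so einsum can evaluate it
# without first materialising the elementwise product.
expanded_einsum = (np.einsum('ij,ij', a, a) + np.einsum('ij,ij', b, b)
                   - 2 * np.einsum('ij,ij', a, b))
```

The three values agree up to floating-point rounding, which is why the per-window sums can be replaced by contractions over a windowed view.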

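import numpy as np
from numpy.lib.stride_tricks import as_strided

def sumsqdiff3(input_image, template):
    window_size = template.shape
    # 4-D view of every template-sized window; as_strided makes no copy
    y = as_strided(input_image,
                    shape=(input_image.shape[0] - window_size[0] + 1,
                           input_image.shape[1] - window_size[1] + 1,) +
                          window_size,
                    strides=input_image.strides * 2)
    ssd = np.einsum('ijkl,kl->ij', y, template)     # sum(a*b) per window
    ssd *= - 2
    ssd += np.einsum('ijkl,ijkl->ij', y, y)         # sum(a**2) per window
    ssd += np.einsum('ij,ij', template, template)   # sum(b**2), a scalar

    return ssd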

In [288]: img = np.random.rand(500, 500)

In [289]: template = np.random.rand(3, 3)

In [290]: %timeit a = sumsqdiff2(img, template) # Warren's function
10 loops, best of 3: 59.4 ms per loop

In [291]: %timeit b = sumsqdiff3(img, template)
100 loops, best of 3: 18.2 ms per loop

In [292]: np.allclose(a, b)
Out[292]: True
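
On NumPy 1.20 or later, the same windowed view can probably be built more safely with `sliding_window_view`, which computes the shape and strides itself and returns a read-only view. The sketch below is not part of the original answer; the names `ssd_sliding` and `ssd_loops` are made up for illustration, and the loop version exists only as a reference to check against.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def ssd_sliding(input_image, template):
    """SSD between `template` and every same-sized window of `input_image`."""
    # Read-only strided view of shape (rows, cols, th, tw); no data is copied.
    windows = sliding_window_view(input_image, template.shape)
    ssd = -2.0 * np.einsum('ijkl,kl->ij', windows, template)  # -2*sum(a*b)
    ssd += np.einsum('ijkl,ijkl->ij', windows, windows)       # sum(a**2)
    ssd += np.einsum('kl,kl', template, template)             # sum(b**2)
    return ssd


def ssd_loops(input_image, template):
    """Reference implementation with explicit loops, for checking only."""
    th, tw = template.shape
    rows = input_image.shape[0] - th + 1
    cols = input_image.shape[1] - tw + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = input_image[i:i + th, j:j + tw]
            out[i, j] = np.sum((patch - template) ** 2)
    return out
```

Comparing the two with `np.allclose(ssd_sliding(img, template), ssd_loops(img, template))` on a small random image is a cheap way to confirm the vectorised version before porting anything to Cython.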

I have left the valid_mask parameter out on purpose, because I don't fully understand how you would use it. In principle, just zeroing the corresponding values in template and/or input_image should do the same trick.
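
One possible way to fold the mask into the same expansion is sketched below. It assumes, as in the loop from the question, that `valid_mask` has the template's shape and is the same for every window, and that unknown pixels hold some finite placeholder value (a NaN would propagate through the sums). The function name `masked_ssd` is hypothetical, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def masked_ssd(image, template, valid_mask):
    """Mask-weighted, normalised SSD for every window of `image`."""
    mask = valid_mask.astype(float)
    windows = sliding_window_view(image, template.shape)
    # With t = template, s = window, m = mask:
    # sum(m * (t - s)**2) = sum(m*t*t) - 2*sum(m*t*s) + sum(m*s*s)
    ssd = -2.0 * np.einsum('ijkl,kl->ij', windows, mask * template)
    ssd += np.einsum('ijkl,ijkl,kl->ij', windows, windows, mask)
    ssd += np.sum(mask * template * template)
    # Same normalisation as total_weight in the question's loop.
    return ssd / mask.sum()
```

The mask enters each of the three terms separately, so the result should match `(dist * valid_mask).sum() / total_weight` from the original loop for every window position.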
