Computing the sum of squared differences between an image (M, N) and a template (3, 3) faster for template matching?

Problem description

I am implementing an algorithm for Texture Synthesis as outlined here. For this I need to calculate the Sum of Squared Differences, a metric to estimate the error between the template and different positions across the image. I have a slow working implementation in place as follows:

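# Masked, normalised SSD between the template and the window at each (i, j).
# Assumes `image` is large enough (e.g. padded) for every window to fit.
total_weight = valid_mask.sum()
for i in range(input_image.shape[0]):  # `xrange` on Python 2
    for j in range(input_image.shape[1]):
        sample = image[i:i + window, j:j + window]
        dist = (template - sample) ** 2
        ssd[i, j] = (dist * valid_mask).sum() / total_weight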

Here, total_weight is just for normalisation. Some pixels have unknown intensities, so I use valid_mask for masking them. This nested loop lies inside of 2 loops, so that's 4 nested loops which is obviously a performance killer!

Is there a way to make this faster in NumPy or Python, as a replacement for this nested loop? Is vectorization possible? I need to compare each (3, 3) part of the image with the (3, 3) template.

I am subsequently going to implement this in Cython, so the faster I can get it to work using just NumPy, the better.

You can find the complete code here. Lines 62-67 are quoted here.

Thanks,
Chintak

Recommended answer

This is basically an improvement over Warren Weckesser's answer. The way to go is clearly a multidimensional windowed view of the original array, but you want to keep that view from triggering a copy. If you expand sum((a - b)**2), you get sum(a**2) + sum(b**2) - 2*sum(a*b). Each of these multiply-then-sum operations can be done with linear algebra operators such as np.einsum, which substantially improves both performance and memory use:

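import numpy as np
from numpy.lib.stride_tricks import as_strided

def sumsqdiff3(input_image, template):
    window_size = template.shape
    # 4-D view of every window position; strides are reused, so nothing is copied
    y = as_strided(input_image,
                    shape=(input_image.shape[0] - window_size[0] + 1,
                           input_image.shape[1] - window_size[1] + 1,) +
                          window_size,
                    strides=input_image.strides * 2)
    # -2 * sum(a * b) for each window
    ssd = np.einsum('ijkl,kl->ij', y, template)
    ssd *= -2
    # + sum(a ** 2) for each window
    ssd += np.einsum('ijkl,ijkl->ij', y, y)
    # + sum(b ** 2), a scalar
    ssd += np.einsum('ij,ij', template, template)

    return ssd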

In [288]: img = np.random.rand(500, 500)

In [289]: template = np.random.rand(3, 3)

In [290]: %timeit a = sumsqdiff2(img, template) # Warren's function
10 loops, best of 3: 59.4 ms per loop

In [291]: %timeit b = sumsqdiff3(img, template)
100 loops, best of 3: 18.2 ms per loop

In [292]: np.allclose(a, b)
Out[292]: True
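
For readers on NumPy 1.20 or later, the same windowed view can be built with `sliding_window_view` instead of hand-written strides. The following is only a sketch of that variant, not the code timed above: the names `ssd_sliding` and `ssd_naive` are made up for illustration, and the small random inputs are just there to check the result against a plain double loop.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # needs NumPy >= 1.20


def ssd_sliding(input_image, template):
    """SSD between `template` and every window of `input_image` (hypothetical helper)."""
    # Read-only view of shape (M - h + 1, N - w + 1, h, w); no data is copied.
    windows = sliding_window_view(input_image, template.shape)
    # sum((a - b)**2) == sum(a**2) - 2*sum(a*b) + sum(b**2)
    ssd = np.einsum('ijkl,ijkl->ij', windows, windows)
    ssd -= 2 * np.einsum('ijkl,kl->ij', windows, template)
    ssd += np.einsum('kl,kl->', template, template)
    return ssd


def ssd_naive(input_image, template):
    """Reference double loop, used only to check the vectorised result."""
    h, w = template.shape
    rows = input_image.shape[0] - h + 1
    cols = input_image.shape[1] - w + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = ((input_image[i:i + h, j:j + w] - template) ** 2).sum()
    return out


rng = np.random.default_rng(0)
img = rng.random((20, 30))
template = rng.random((3, 3))
print(np.allclose(ssd_sliding(img, template), ssd_naive(img, template)))
```

The output has one entry per position where the whole template fits, so a (20, 30) image and a (3, 3) template give an (18, 28) result.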

I have left the valid_mask parameter out on purpose, because I don't fully understand how you would use it. In principle, just zeroing the corresponding values in template and/or input_image should do the same trick.
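
One possible way to fold the mask in is sketched below, assuming `valid_mask` has the same shape as the template and is applied identically to every window, as in the question's loop. In the expanded form the mask has to weight the sum(a**2) term as well, which zeroing the template alone does not do. The function name `masked_ssd` and the test data are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def masked_ssd(input_image, template, valid_mask):
    """Masked, normalised SSD for every window (hypothetical helper).

    Assumes `valid_mask` matches `template.shape` and has at least one
    valid (non-zero) entry, so the normalisation does not divide by zero.
    """
    h, w = template.shape
    windows = as_strided(
        input_image,
        shape=(input_image.shape[0] - h + 1, input_image.shape[1] - w + 1, h, w),
        strides=input_image.strides * 2,
        writeable=False,  # the view aliases input_image, so keep it read-only
    )
    mask = valid_mask.astype(input_image.dtype)
    masked_template = template * mask
    # sum(m*(a - b)**2) == sum(m*a**2) - 2*sum(a*(m*b)) + sum(m*b**2)
    ssd = np.einsum('ijkl,ijkl,kl->ij', windows, windows, mask)
    ssd -= 2 * np.einsum('ijkl,kl->ij', windows, masked_template)
    ssd += np.einsum('kl,kl->', masked_template, template)
    return ssd / mask.sum()


rng = np.random.default_rng(1)
img = rng.random((12, 15))
template = rng.random((3, 3))
valid_mask = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1]], dtype=bool)
result = masked_ssd(img, template, valid_mask)
print(result.shape)
```

Dividing by `mask.sum()` mirrors the `total_weight` normalisation in the question; drop it if only the relative ordering of the errors matters.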
