Problem description
I am implementing an algorithm for Texture Synthesis as outlined here. For this I need to calculate the Sum of Squared Differences (SSD), a metric that estimates the error between the template and each candidate position across the image. I have a slow but working implementation as follows:
total_weight = valid_mask.sum()
for i in xrange(input_image.shape[0]):
    for j in xrange(input_image.shape[1]):
        sample = image[i:i + window, j:j + window]
        dist = (template - sample) ** 2
        ssd[i, j] = (dist * valid_mask).sum() / total_weight
Here, total_weight is just for normalisation. Some pixels have unknown intensities, so I use valid_mask to mask them out. This nested loop sits inside 2 other loops, so that makes 4 nested loops, which is obviously a performance killer!
Is there a way to make this faster in NumPy or Python, as a replacement for this nested loop? Is vectorization possible? I need to compare each (3, 3) part of the image with the (3, 3) template.
I am subsequently going to implement this in Cython, so the faster I can get it to work using just NumPy, the better.
You can find the complete code here. Lines 62-67 are quoted above.
Thanks,
Chintak
Recommended answer
This is basically an improvement over Warren Weckesser's answer. The way to go is clearly a multidimensional windowed view of the original array, but you want to keep that view from triggering a copy. If you expand sum((a-b)**2), you can turn it into sum(a**2) + sum(b**2) - 2*sum(a*b). Each of these multiply-then-reduce-with-a-sum operations can be performed with linear algebra operators, with a substantial improvement in both performance and memory use:
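import numpy as np
from numpy.lib.stride_tricks import as_strided

def sumsqdiff3(input_image, template):
    window_size = template.shape
    # 4-D windowed view of the image; as_strided reuses the buffer, so no copy is made
    y = as_strided(input_image,
                   shape=(input_image.shape[0] - window_size[0] + 1,
                          input_image.shape[1] - window_size[1] + 1,) +
                         window_size,
                   strides=input_image.strides * 2)
    # -2 * sum(a * b) for every window position
    ssd = np.einsum('ijkl,kl->ij', y, template)
    ssd *= -2
    # + sum(a ** 2) for every window
    ssd += np.einsum('ijkl,ijkl->ij', y, y)
    # + sum(b ** 2), a scalar shared by all positions
    ssd += np.einsum('ij,ij', template, template)
    return ssd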
In [288]: img = np.random.rand(500, 500)
In [289]: template = np.random.rand(3, 3)
In [290]: %timeit a = sumsqdiff2(img, template) # Warren's function
10 loops, best of 3: 59.4 ms per loop
In [291]: %timeit b = sumsqdiff3(img, template)
100 loops, best of 3: 18.2 ms per loop
In [292]: np.allclose(a, b)
Out[292]: True
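As a sanity check of the expansion, here is a minimal sketch that compares the einsum formulation against a plain double loop on a small random image. It assumes NumPy 1.20 or later and uses sliding_window_view in place of as_strided, because it computes the window shape and strides itself and returns a read-only view. The names ssd_naive and ssd_einsum are made up for this illustration and are not part of the original answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # assumes NumPy >= 1.20


def ssd_naive(image, template):
    """Reference implementation: explicit loop over every window position."""
    h, w = template.shape
    out = np.empty((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = ((image[i:i + h, j:j + w] - template) ** 2).sum()
    return out


def ssd_einsum(image, template):
    """Same quantity via sum(a**2) + sum(b**2) - 2*sum(a*b)."""
    # Read-only view of shape (rows, cols, h, w); the image data is not copied.
    windows = sliding_window_view(image, template.shape)
    cross = np.einsum('ijkl,kl->ij', windows, template)     # sum(a * b) per window
    win_sq = np.einsum('ijkl,ijkl->ij', windows, windows)   # sum(a ** 2) per window
    tmpl_sq = np.einsum('kl,kl->', template, template)      # sum(b ** 2), a scalar
    return win_sq + tmpl_sq - 2 * cross


rng = np.random.default_rng(0)
image = rng.random((20, 30))
template = rng.random((3, 3))
print(np.allclose(ssd_naive(image, template), ssd_einsum(image, template)))
```

Both functions return one value per valid window position, so the output has shape (M - 2, N - 2) for an (M, N) image and a (3, 3) template.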
I have left the valid_mask parameter out on purpose, because I don't fully understand how you would use it. In principle, just zeroing the corresponding values in template and/or input_image should do the same trick.
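If you prefer to make the masking explicit rather than rely on zeroed values, one option is to carry the mask through each term of the expansion. The sketch below is my reading of the loop in the question, assuming valid_mask has the same shape as template and that the result is divided by valid_mask.sum(). The function name masked_ssd is hypothetical, and sliding_window_view again assumes NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # assumes NumPy >= 1.20


def masked_ssd(image, template, valid_mask):
    """Normalised SSD that only counts positions where valid_mask is nonzero."""
    mask = valid_mask.astype(float)
    windows = sliding_window_view(image, template.shape)
    # sum(m * (a - b)**2) = sum(m * a**2) - 2 * sum(m * a * b) + sum(m * b**2)
    win_sq = np.einsum('ijkl,ijkl,kl->ij', windows, windows, mask)
    cross = np.einsum('ijkl,kl->ij', windows, mask * template)
    tmpl_sq = np.sum(mask * template ** 2)
    return (win_sq - 2 * cross + tmpl_sq) / mask.sum()


rng = np.random.default_rng(1)
image = rng.random((8, 9))
template = rng.random((3, 3))
valid_mask = np.array([[1, 1, 0],
                       [1, 1, 1],
                       [0, 1, 1]])
ssd = masked_ssd(image, template, valid_mask)
print(ssd.shape)  # (6, 7)
```

Because the mask multiplies every term, masked-out pixels contribute nothing to either the window or the template sums, which matches what (dist * valid_mask).sum() / total_weight computes in the original loop.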