Performance of numpy array masking in Cython

Question

As a follow-up to an earlier question (thanks to MSeifert for the help), I ran into the problem that I have to mask a numpy array new_values with an index array new_vals_idx before passing the masked array on to update val_dict.

I tried to add the array masking to the solutions proposed in MSeifert's answer to the old post, but the performance is not satisfactory.
The arrays and dicts I used for the following examples are:

import numpy as np
val_dict = {'a': 5.0, 'b': 18.8, 'c': -55/2}
for i in range(200):
    val_dict[str(i)] = i
    val_dict[i] = i**2

keys = ('b', 123, '89', 'c')  # dict keys to update
new_values = np.arange(1, 51, 1) / 1.0  # array with new values which has to be masked
new_vals_idx = np.array((0, 3, 5, -1))  # masking array
valarr = np.zeros((new_vals_idx.shape[0]))  # preallocation for masked array
length = new_vals_idx.shape[0]
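
For readers unfamiliar with the term, the "masking" used here is NumPy integer-array (fancy) indexing rather than a boolean mask. The following is a minimal sketch of what the setup above selects, reusing the question's array names; the boolean-mask variant is added only for comparison and is not part of the original code.

```python
import numpy as np

new_values = np.arange(1, 51, 1) / 1.0    # 1.0, 2.0, ..., 50.0
new_vals_idx = np.array((0, 3, 5, -1))    # positions to pick; -1 is the last element

# Integer-array indexing gathers the selected elements into a new array,
# in the order given by the index array.
selected = new_values[new_vals_idx]
print(selected.tolist())                  # [1.0, 4.0, 6.0, 50.0]

# A boolean mask (shown for comparison only) has the same length as the data
# and returns the marked elements in their original order.
bool_mask = np.zeros(new_values.shape[0], dtype=bool)
bool_mask[new_vals_idx] = True
from_mask = new_values[bool_mask]
print(from_mask.tolist())                 # [1.0, 4.0, 6.0, 50.0]
```

The two results coincide here only because the indices are unique and already in ascending order; in general a boolean mask cannot express repeated or reordered picks.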

To make my code snippets easier to compare with my old question, I'll stick to the function naming of MSeifert's answer. These are my attempts to get the best performance out of Python/Cython (the other answers were left out because their performance was too poor):

def old_for(val_dict, keys, new_values, new_vals_idx, length):
    for i in range(length):
        val_dict[keys[i]] = new_values[new_vals_idx[i]]
%timeit old_for(val_dict, keys, new_values, new_vals_idx, length)
# 1000000 loops, best of 3: 1.6 µs per loop

def old_for_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length):
    valarr = new_values[new_vals_idx]
    for i in range(length):
        val_dict[keys[i]] = valarr[i]
%timeit old_for_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length)
# 100000 loops, best of 3: 2.33 µs per loop

def new2_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length):
    valarr = new_values[new_vals_idx].tolist()
    for key, val in zip(keys, valarr):
        val_dict[key] = val
%timeit new2_w_valarr(val_dict, keys, new_values, valarr, new_vals_idx, length)
# 100000 loops, best of 3: 2.01 µs per loop

Cython functions:

%load_ext cython
%%cython
import numpy as np
cimport numpy as np
cpdef new3_cy(dict val_dict, tuple keys, double[:] new_values, int[:] new_vals_idx, Py_ssize_t length):
    cdef Py_ssize_t i
    cdef double val  # this gives about 10 µs speed boost compared to directly assigning it to val_dict
    for i in range(length):
        val = new_values[new_vals_idx[i]]
        val_dict[keys[i]] = val
%timeit new3_cy(val_dict, keys, new_values, new_vals_idx, length)
# 1000000 loops, best of 3: 1.38 µs per loop

cpdef new3_cy_mview(dict val_dict, tuple keys, double[:] new_values, int[:] new_vals_idx, Py_ssize_t length):
    cdef Py_ssize_t i
    cdef int[:] mview_idx = new_vals_idx
    cdef double [:] mview_vals = new_values
    for i in range(length):
        val_dict[keys[i]] = mview_vals[mview_idx[i]]
%timeit new3_cy_mview(val_dict, keys, new_values, new_vals_idx, length)
# 1000000 loops, best of 3: 1.38 µs per loop

# NOT WORKING (neither of the following two functions compiles):
cpdef new2_cy_mview(dict val_dict, tuple keys, double[:] new_values, int[:] new_vals_idx, Py_ssize_t length):
    cdef double [new_vals_idx] masked_vals = new_values
    for key, val in zip(keys, masked_vals.tolist()):
        val_dict[key] = val

cpdef new2_cy_mask(dict val_dict, tuple keys, double[:] new_values, valarr, int[:] new_vals_idx, Py_ssize_t length):
    valarr = new_values[new_vals_idx]
    for key, val in zip(keys, valarr.tolist()):
        val_dict[key] = val

The Cython functions new3_cy and new3_cy_mview do not seem to be considerably faster than old_for. Passing valarr to avoid array construction inside the function (as it is going to be called several million times) even seems to slow it down.
Masking in new2_cy_mask with the new_vals_idx array in Cython gives me the error: 'Invalid index for memoryview specified, type int[:]'. Is there a type like Py_ssize_t for arrays of indices?
Trying to create a masked memoryview in new2_cy_mview gives me the error 'Cannot assign type 'double[:]' to 'double [__pyx_v_new_vals_idx]''. Is there even such a thing as a masked memoryview? I wasn't able to find any information on this topic.
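
On the index-type question, a hedged note: as far as I know, a typed memoryview accepts only integers and slices as indices, so an index array cannot be applied to it in one step, and the gather has to be written as an explicit loop (as new3_cy does) or performed on a NumPy array. The element type of the index buffer also has to match the declaration: `int[:]` expects C `int` (NumPy's `np.intc`), while a `Py_ssize_t[:]` declaration would pair with `np.intp`. The plain-Python sketch below only illustrates the dtype conversion and the equivalent loop; `gather_loop` is a hypothetical helper name, not something from the original post.

```python
import numpy as np

new_values = np.arange(1, 51, 1) / 1.0
new_vals_idx = np.array((0, 3, 5, -1))

# The default integer dtype is platform dependent, so convert explicitly.
idx_c_int = new_vals_idx.astype(np.intc)   # C int, for a buffer declared as int[:]
idx_ssize = new_vals_idx.astype(np.intp)   # pointer-sized index type, for Py_ssize_t[:]

def gather_loop(values, idx):
    """Hypothetical helper: element-by-element gather, the loop form a memoryview requires."""
    out = np.empty(idx.shape[0], dtype=values.dtype)
    for i in range(idx.shape[0]):
        out[i] = values[idx[i]]
    return out

looped = gather_loop(new_values, idx_ssize)
print(looped.tolist() == new_values[new_vals_idx].tolist())
```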

Comparing these timings with those from my old question, I guess that the array masking is what takes up most of the time. As it is most likely already highly optimized in numpy, there is probably not much left to gain. But the slowdown is so large that there must (hopefully) be a better way to do it.
Any help is appreciated! Thanks in advance!
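
One way to check the guess that the indexing step dominates is to time it in isolation. The sketch below uses the standard-library `timeit` module instead of the `%timeit` magic so that it runs outside IPython; the helper names `fancy_index` and `scalar_index` and the call count are illustrative assumptions, and the numbers it prints will vary by machine.

```python
import timeit
import numpy as np

new_values = np.arange(1, 51, 1) / 1.0
new_vals_idx = np.array((0, 3, 5, -1))

def fancy_index():
    # One vectorized gather that allocates a new 4-element array.
    return new_values[new_vals_idx]

def scalar_index():
    # Four separate scalar lookups, as in the plain for-loop variants.
    return [new_values[j] for j in (0, 3, 5, -1)]

n_calls = 20_000
t_fancy = timeit.timeit(fancy_index, number=n_calls) / n_calls
t_scalar = timeit.timeit(scalar_index, number=n_calls) / n_calls
print(f"fancy indexing : {t_fancy * 1e6:.3f} µs per call")
print(f"scalar indexing: {t_scalar * 1e6:.3f} µs per call")
```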

Recommended answer

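One thing you can do in the current construction is turn off bounds checking (if it's safe to!). The code below also disables wraparound, so negative indices such as the -1 in the example new_vals_idx would need to be converted to non-negative ones first. It won't make a huge difference, but it gives an incremental speedup.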

%%cython
import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef new4_cy(dict val_dict, tuple keys, double[:] new_values, int[:] new_vals_idx, Py_ssize_t length):
    cdef Py_ssize_t i
    cdef double val  # this gives about 10 µs speed boost compared to directly assigning it to val_dict
    for i in range(length):
        val = new_values[new_vals_idx[i]]
        val_dict[keys[i]] = val

In [36]: %timeit new3_cy(val_dict, keys, new_values, new_vals_idx, length)
1.76 µs ± 209 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [37]: %timeit new4_cy(val_dict, keys, new_values, new_vals_idx, length)
1.45 µs ± 31.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
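
A caveat on the "if it's safe to" condition: with `wraparound(False)` a negative index is no longer counted from the end, and with `boundscheck(False)` an out-of-range index is not caught. A minimal sketch of one way to prepare the index array once, on the Python side, before calling such a function; `safe_idx` is a name introduced here for illustration.

```python
import numpy as np

new_values = np.arange(1, 51, 1) / 1.0
new_vals_idx = np.array((0, 3, 5, -1))

n = new_values.shape[0]
# Map negative indices to their non-negative equivalents, e.g. -1 -> n - 1.
safe_idx = np.where(new_vals_idx < 0, new_vals_idx + n, new_vals_idx)

# Validate once up front, since the compiled loop would no longer check bounds.
if safe_idx.min() < 0 or safe_idx.max() >= n:
    raise IndexError("index out of range for new_values")

print(safe_idx.tolist())   # [0, 3, 5, 49]
```

Doing this once outside the hot loop keeps the per-call cost unchanged while making the unchecked indexing well defined.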

