问题描述
我有一个带有整数值的NumPy数组.矩阵的值范围从0到矩阵中的最大元素(换句话说,其中显示的所有数字从0到最大数据元素).我需要建立有效的(有效手段快速完全矢量化的解决方案),以搜索每行中的元素数量并根据矩阵值对其进行编码.
I have a NumPy array with integer values. Values of matrix range from 0 to max element in matrix(in other words, all numbers from 0 to max data element presented in it). I need to build effective( effective means fast fully-vectorized solution) for searching number of elements in each row and encode them according to matrix values.
我找不到类似的问题,或者以某种方式帮助解决了这个问题.
I could not find a similar question, or a question that somehow helped to solve this.
所以,如果我在输入中输入了这个data
:
So if i have this data
in input:
# shape is (N0=4, m0=4)
1 1 0 4
2 4 2 1
1 2 3 5
4 4 4 1
所需的输出是:
# shape(N=N0, m=data.max()+1):
1 2 0 0 1 0
0 1 2 0 1 0
0 1 1 1 0 1
0 1 0 0 3 0
我知道如何解决这个问题,方法是简单地逐行迭代data
的每一行中的唯一值,然后考虑到data
数组中的所有可能值,组合结果.
I know how to solve this by simply counting unique values in each row of data
iterating one by one, and then combining results taking in account all possible values in data
array.
尽管使用NumPy将其矢量化,但是关键的问题是逐个搜索每个数字的速度很慢,并且假设存在很多唯一的数字,但这并不是有效的解决方案.通常,N
和唯一数字计数都很大(顺便说一句,N
似乎比唯一数字计数大).
While using NumPy for vectorizing this the key problem is that searching each number one by one is slow and assuming that there are a lot of unique numbers presented, this can not be effective solution. Generally both N
and unique numbers count is rather large(by the way, N
seem to be larger than unique numbers count).
有人有好主意吗?)
推荐答案
np.bincount
用于1D
数组.但是,我们需要在每行上迭代使用它(简单地考虑一下).为了使其向量化,我们可以使每行偏移最大数量.想法是每行具有不同的bin,以使它们不受编号相同的其他行元素的影响.
Well that's basically what does np.bincount
does with 1D
arrays. But, we need to use it on each row iteratively (thinking of it simply). To make it vectorized, we could offset each row by that max number. The idea is to have different bins for each row such that they are not affected by other row elements with same numbers.
因此,实现应为-
# Vectorized solution
def bincount2D_vectorized(a):
N = a.max()+1
a_offs = a + np.arange(a.shape[0])[:,None]*N
return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)
样品运行-
In [189]: a
Out[189]:
array([[1, 1, 0, 4],
[2, 4, 2, 1],
[1, 2, 3, 5],
[4, 4, 4, 1]])
In [190]: bincount2D_vectorized(a)
Out[190]:
array([[1, 2, 0, 0, 1, 0],
[0, 1, 2, 0, 1, 0],
[0, 1, 1, 1, 0, 1],
[0, 1, 0, 0, 3, 0]])
Numba调整
我们可以引入 numba
以进一步提高速度.现在,numba
允许一些调整.
We can bring in numba
for further speedups. Now, numba
allows few tweaks.
-
首先,它允许JIT编译.
First off, it allows JIT compilation.
最近,他们还引入了实验性 会自动并行化已知具有并行语义的函数中的操作.
Also, recently they had introduced experimental parallel
that automatically parallelizes operations in the function known to have parallel semantics.
最终的调整是使用 prange
作为range
的替代.文档指出,该循环并行运行,类似于OpenMP并行循环和Cython的prange. prange
在较大的数据集上表现良好,这可能是由于设置并行工作所需的开销.
Final tweak would be to use prange
as a subsititute for range
. The docs state that this runs loops in parallel, similar to OpenMP parallel for loops and Cython’s prange. prange
performs well with larger datasets, which probably is because of the overhead needed to setup the parallel work.
因此,通过这两项新调整以及 对于非Python模式,我们将有三个变体-
So, with these new two tweaks along with the njit
for no-Python mode, we would have three variants -
# Numba solutions
def bincount2D_numba(a, use_parallel=False, use_prange=False):
N = a.max()+1
m,n = a.shape
out = np.zeros((m,N),dtype=int)
# Choose fucntion based on args
func = bincount2D_numba_func0
if use_parallel:
if use_prange:
func = bincount2D_numba_func2
else:
func = bincount2D_numba_func1
# Run chosen function on input data and output
func(a, out, m, n)
return out
@njit
def bincount2D_numba_func0(a, out, m, n):
for i in range(m):
for j in range(n):
out[i,a[i,j]] += 1
@njit(parallel=True)
def bincount2D_numba_func1(a, out, m, n):
for i in range(m):
for j in range(n):
out[i,a[i,j]] += 1
@njit(parallel=True)
def bincount2D_numba_func2(a, out, m, n):
for i in prange(m):
for j in prange(n):
out[i,a[i,j]] += 1
为了完整性和以后进行测试,该循环版本为-
For completeness and testing out later on, the loopy version would be -
# Loopy solution
def bincount2D_loopy(a):
N = a.max()+1
m,n = a.shape
out = np.zeros((m,N),dtype=int)
for i in range(m):
out[i] = np.bincount(a[i], minlength=N)
return out
运行时测试
案例1:
In [312]: a = np.random.randint(0,100,(100,100))
In [313]: %timeit bincount2D_loopy(a)
...: %timeit bincount2D_vectorized(a)
...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
10000 loops, best of 3: 115 µs per loop
10000 loops, best of 3: 36.7 µs per loop
10000 loops, best of 3: 22.6 µs per loop
10000 loops, best of 3: 22.7 µs per loop
10000 loops, best of 3: 39.9 µs per loop
案例2:
In [316]: a = np.random.randint(0,100,(1000,1000))
In [317]: %timeit bincount2D_loopy(a)
...: %timeit bincount2D_vectorized(a)
...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
100 loops, best of 3: 2.97 ms per loop
100 loops, best of 3: 3.54 ms per loop
1000 loops, best of 3: 1.83 ms per loop
100 loops, best of 3: 1.78 ms per loop
1000 loops, best of 3: 1.4 ms per loop
案例3:
In [318]: a = np.random.randint(0,1000,(1000,1000))
In [319]: %timeit bincount2D_loopy(a)
...: %timeit bincount2D_vectorized(a)
...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
100 loops, best of 3: 4.01 ms per loop
100 loops, best of 3: 4.86 ms per loop
100 loops, best of 3: 3.21 ms per loop
100 loops, best of 3: 3.18 ms per loop
100 loops, best of 3: 2.45 ms per loop
似乎像numba
变体一样,表现非常好.从三个变体中选择一个将取决于输入数组的形状参数,并在某种程度上取决于其中的唯一元素的数量.
Seems like the numba
variants are performing very well. Choosing one out of the three variants would depend on the input array shape parameters and to some extent on the number of unique elements in it.
这篇关于每行的Bin元素-NumPy的矢量化2D Bincount的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!