本文介绍了每行的Bin元素-NumPy的矢量化2D Bincount的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个带有整数值的NumPy数组.矩阵的值范围从0到矩阵中的最大元素(换句话说,其中显示的所有数字从0到最大数据元素).我需要建立有效的(有效手段快速完全矢量化的解决方案),以搜索每行中的元素数量并根据矩阵值对其进行编码.

I have a NumPy array with integer values. Values of matrix range from 0 to max element in matrix(in other words, all numbers from 0 to max data element presented in it). I need to build effective( effective means fast fully-vectorized solution) for searching number of elements in each row and encode them according to matrix values.

我找不到类似的问题,或者以某种方式帮助解决了这个问题.

I could not find a similar question, or a question that somehow helped to solve this.

所以,如果我在输入中输入了这个data:

So if i have this data in input:

# shape is (N0=4, m0=4) 
1   1   0   4
2   4   2   1
1   2   3   5
4   4   4   1

所需的输出是:

# shape(N=N0, m=data.max()+1):
1   2   0   0   1   0
0   1   2   0   1   0
0   1   1   1   0   1
0   1   0   0   3   0

我知道如何解决这个问题,方法是简单地逐行迭代data的每一行中的唯一值,然后考虑到data数组中的所有可能值,组合结果.

I know how to solve this by simply counting unique values in each row of data iterating one by one, and then combining results taking in account all possible values in data array.

尽管使用NumPy将其矢量化,但是关键的问题是逐个搜索每个数字的速度很慢,并且假设存在很多唯一的数字,但这并不是有效的解决方案.通常,N和唯一数字计数都很大(顺便说一句,N似乎比唯一数字计数大).

While using NumPy for vectorizing this the key problem is that searching each number one by one is slow and assuming that there are a lot of unique numbers presented, this can not be effective solution. Generally both N and unique numbers count is rather large(by the way, N seem to be larger than unique numbers count).

有人有好主意吗?)

推荐答案

np.bincount 用于1D数组.但是,我们需要在每行上迭代使用它(简单地考虑一下).为了使其向量化,我们可以使每行偏移最大数量.想法是每行具有不同的bin,以使它们不受编号相同的其他行元素的影响.

Well that's basically what does np.bincount does with 1D arrays. But, we need to use it on each row iteratively (thinking of it simply). To make it vectorized, we could offset each row by that max number. The idea is to have different bins for each row such that they are not affected by other row elements with same numbers.

因此,实现应为-

# Vectorized solution
def bincount2D_vectorized(a):    
    N = a.max()+1
    a_offs = a + np.arange(a.shape[0])[:,None]*N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)

样品运行-

In [189]: a
Out[189]: 
array([[1, 1, 0, 4],
       [2, 4, 2, 1],
       [1, 2, 3, 5],
       [4, 4, 4, 1]])

In [190]: bincount2D_vectorized(a)
Out[190]: 
array([[1, 2, 0, 0, 1, 0],
       [0, 1, 2, 0, 1, 0],
       [0, 1, 1, 1, 0, 1],
       [0, 1, 0, 0, 3, 0]])

Numba调整

我们可以引入 numba 以进一步提高速度.现在,numba允许一些调整.

We can bring in numba for further speedups. Now, numba allows few tweaks.

  • 首先,它允许JIT编译.

  • First off, it allows JIT compilation.

最近,他们还引入了实验性 会自动并行化已知具有并行语义的函数中的操作.

Also, recently they had introduced experimental parallel that automatically parallelizes operations in the function known to have parallel semantics.

最终的调整是使用 prange 作为range的替代.文档指出,该循环并行运行,类似于OpenMP并行循环和Cython的prange. prange在较大的数据集上表现良好,这可能是由于设置并行工作所需的开销.

Final tweak would be to use prange as a subsititute for range. The docs state that this runs loops in parallel, similar to OpenMP parallel for loops and Cython’s prange. prange performs well with larger datasets, which probably is because of the overhead needed to setup the parallel work.

因此,通过这两项新调整以及 对于非Python模式,我们将有三个变体-

So, with these new two tweaks along with the njit for no-Python mode, we would have three variants -

# Numba solutions
def bincount2D_numba(a, use_parallel=False, use_prange=False):
    N = a.max()+1
    m,n = a.shape
    out = np.zeros((m,N),dtype=int)

    # Choose fucntion based on args
    func = bincount2D_numba_func0
    if use_parallel:
        if use_prange:
            func = bincount2D_numba_func2
        else:
            func = bincount2D_numba_func1
    # Run chosen function on input data and output
    func(a, out, m, n)
    return out

@njit
def bincount2D_numba_func0(a, out, m, n):
    for i in range(m):
        for j in range(n):
            out[i,a[i,j]] += 1

@njit(parallel=True)
def bincount2D_numba_func1(a, out, m, n):
    for i in range(m):
        for j in range(n):
            out[i,a[i,j]] += 1

@njit(parallel=True)
def bincount2D_numba_func2(a, out, m, n):
    for i in prange(m):
        for j in prange(n):
            out[i,a[i,j]] += 1

为了完整性和以后进行测试,该循环版本为-

For completeness and testing out later on, the loopy version would be -

# Loopy solution
def bincount2D_loopy(a):
    N = a.max()+1
    m,n = a.shape
    out = np.zeros((m,N),dtype=int)
    for i in range(m):
        out[i] = np.bincount(a[i], minlength=N)
    return out 

运行时测试

案例1:

In [312]: a = np.random.randint(0,100,(100,100))

In [313]: %timeit bincount2D_loopy(a)
     ...: %timeit bincount2D_vectorized(a)
     ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
10000 loops, best of 3: 115 µs per loop
10000 loops, best of 3: 36.7 µs per loop
10000 loops, best of 3: 22.6 µs per loop
10000 loops, best of 3: 22.7 µs per loop
10000 loops, best of 3: 39.9 µs per loop

案例2:

In [316]: a = np.random.randint(0,100,(1000,1000))

In [317]: %timeit bincount2D_loopy(a)
     ...: %timeit bincount2D_vectorized(a)
     ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
100 loops, best of 3: 2.97 ms per loop
100 loops, best of 3: 3.54 ms per loop
1000 loops, best of 3: 1.83 ms per loop
100 loops, best of 3: 1.78 ms per loop
1000 loops, best of 3: 1.4 ms per loop

案例3:

In [318]: a = np.random.randint(0,1000,(1000,1000))

In [319]: %timeit bincount2D_loopy(a)
     ...: %timeit bincount2D_vectorized(a)
     ...: %timeit bincount2D_numba(a, use_parallel=False, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=False)
     ...: %timeit bincount2D_numba(a, use_parallel=True, use_prange=True)
100 loops, best of 3: 4.01 ms per loop
100 loops, best of 3: 4.86 ms per loop
100 loops, best of 3: 3.21 ms per loop
100 loops, best of 3: 3.18 ms per loop
100 loops, best of 3: 2.45 ms per loop

似乎像numba变体一样,表现非常好.从三个变体中选择一个将取决于输入数组的形状参数,并在某种程度上取决于其中的唯一元素的数量.

Seems like the numba variants are performing very well. Choosing one out of the three variants would depend on the input array shape parameters and to some extent on the number of unique elements in it.

这篇关于每行的Bin元素-NumPy的矢量化2D Bincount的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

10-17 01:02