How to vectorize numpy.searchsorted over the rows of a 2D array
Problem description
I have a 2D array (a) for lookup and a 1D array (v) of values. For each row of a, I want the index at which the corresponding element of v should be inserted:
import numpy as np
# [EDIT] Add more records which contain NaNs
a = np.array(
[[0., 923.9943, 996.8978, 1063.9064, 1125.639, 1184.3985, 1259.9854, 1339.6107, 1503.4462, 2035.6527],
[0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
[0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
[0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578],
[0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578],
[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
[0., 923.9943, 996.8978, 1063.9064, 1125.639, 1184.3985, 1259.9854, 1339.6107, 1503.4462, 2035.6527],
[0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
[0., 1593.6196, 1885.2442, 2152.956, 2419.0038, 2843.517, 3551.225, 5423.009, 18930.8694, 70472.4002],
[0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578],
[0., 1084.8388, 1132.6918, 1172.2278, 1215.7986, 1259.062, 1334.4778, 1430.738, 1650.4502, 3966.1578]])
v = np.array([641.954, 56554.498, 168078.307, 1331.692, 2233.327, 1120.03, 641.954, 56554.498, 168078.307, 1331.692, 2233.327])
This is the result I want to get:
[1, 9, 10, 6, 9, 0, 1, 9, 10, 6, 9]
Obviously, with a for loop I can index the arrays a and v row by row like this:
for i, _ in enumerate(a):
    print(np.searchsorted(a[i], v[i]))
Is there a vectorized way to do this that is more efficient?
Recommended answer
Inspired by "Vectorized searchsorted numpy" for the underlying idea, here's a version that works between a 2D array and a 1D array:
def searchsorted2d(a, b):
    # Inputs: a is an (m, n) 2D array with sorted rows, b is an (m,) 1D array.
    # Finds np.searchsorted(a[i], b[i]) for every row i in a vectorized way by
    # scaling/offsetting both inputs and then calling searchsorted once.
    # Note: the offsets are derived from each row's range, so this suits data
    # like the sample (rows start at 0, values non-negative); otherwise the
    # shifted rows can overlap.
    # Get scaling offset and then scale inputs
    s = np.r_[0, (np.maximum(a.max(1) - a.min(1) + 1, b) + 1).cumsum()[:-1]]
    a_scaled = (a + s[:, None]).ravel()
    b_scaled = b + s
    # Use searchsorted on scaled ones and then subtract offsets
    return np.searchsorted(a_scaled, b_scaled) - np.arange(len(s)) * a.shape[1]
Output for the given sample (the five rows from before the question's edit that added the NaN records):
In [101]: searchsorted2d(a,v)
Out[101]: array([ 1, 9, 10, 6, 9])
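To see why the offsets work, here is a minimal sketch on a tiny made-up array. The values and the names a_small and b_small are illustrative assumptions, not part of the question. Each row is shifted far enough that the flattened array stays globally sorted, so a single searchsorted call can serve every row.

```python
import numpy as np

# Hypothetical toy inputs: every row is sorted and starts at 0.
a_small = np.array([[0., 2., 5.],
                    [0., 1., 3.]])
b_small = np.array([4., 2.])

# Same offset formula as in searchsorted2d.
row_span = a_small.max(1) - a_small.min(1) + 1
s = np.r_[0, (np.maximum(row_span, b_small) + 1).cumsum()[:-1]]
a_scaled = (a_small + s[:, None]).ravel()
b_scaled = b_small + s

# The shifted rows do not overlap, so the flattened array is strictly increasing.
flat_is_sorted = bool(np.all(np.diff(a_scaled) > 0))

# Positions in the flattened array, then converted back to per-row positions.
flat_idx = np.searchsorted(a_scaled, b_scaled)
row_idx = flat_idx - np.arange(len(s)) * a_small.shape[1]

# Reference result from the plain per-row loop.
loop_idx = np.array([np.searchsorted(a_small[i], b_small[i])
                     for i in range(len(a_small))])

print(s)         # offsets per row
print(a_scaled)  # flattened, shifted lookup values
print(row_idx, loop_idx)
```

Here the second row is shifted by 7, giving the flattened array 0, 2, 5, 7, 8, 10, and both approaches return index 2 for each row.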
Case with all-NaN rows
To extend this so that it also works when some rows are all NaN, we need a few more steps:
valid_mask = ~np.isnan(a).any(1)
out = np.zeros(len(a), dtype=int)
out[valid_mask] = searchsorted2d(a[valid_mask],v[valid_mask])
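As a rough sanity check, the masking steps can be wrapped in a helper and compared against the plain loop. The wrapper name searchsorted2d_nan_rows and the small demo arrays below are assumptions made for illustration only. The function searchsorted2d is repeated so the snippet runs on its own.

```python
import numpy as np

def searchsorted2d(a, b):
    # Row-wise searchsorted via offsets, as in the answer.
    s = np.r_[0, (np.maximum(a.max(1) - a.min(1) + 1, b) + 1).cumsum()[:-1]]
    a_scaled = (a + s[:, None]).ravel()
    b_scaled = b + s
    return np.searchsorted(a_scaled, b_scaled) - np.arange(len(s)) * a.shape[1]

def searchsorted2d_nan_rows(a, v):
    # Hypothetical wrapper: rows containing any NaN keep the default index 0.
    valid_mask = ~np.isnan(a).any(1)
    out = np.zeros(len(a), dtype=int)
    # Skip the vectorized call when every row contains NaN.
    if valid_mask.any():
        out[valid_mask] = searchsorted2d(a[valid_mask], v[valid_mask])
    return out

# Made-up demo data: sorted rows starting at 0, plus one all-NaN row.
a_demo = np.array([[0., 10., 20., 30.],
                   [np.nan, np.nan, np.nan, np.nan],
                   [0., 5., 6., 7.]])
v_demo = np.array([15., 3., 100.])

vectorized = searchsorted2d_nan_rows(a_demo, v_demo)
looped = np.array([np.searchsorted(a_demo[i], v_demo[i])
                   for i in range(len(a_demo))])

print(vectorized, looped)
```

For this demo data both approaches agree, including index 0 for the all-NaN row, which matches the expected result given in the question.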