Best way to pass a repeated parameter to a Numpy vectorized function

Question

So, continuing from the discussion @TheBlackCat and I were having in this answer, I would like to know the best way to pass arguments to a Numpy vectorized function. The function in question is defined thus:

vect_dist_funct = np.vectorize(lambda p1, p2: vincenty(p1, p2).meters)

where vincenty comes from the Geopy package.

I currently call vect_dist_funct in this manner:

def pointer(point, centroid, tree_idx):
    intersect = list(tree_idx.intersection(point))
    if len(intersect) > 0:
        points = pd.Series([point]*len(intersect)).values
        polygons = centroid.loc[intersect].values
        dist = vect_dist_funct(points, polygons)
        return pd.Series(dist, index=intersect, name='Dist').sort_values()
    else:
        return pd.Series(np.nan, index=[0], name='Dist')

points['geometry'].apply(lambda x: pointer(point=x.coords[0], centroid=line['centroid'], tree_idx=tree_idx))

(Please refer to the question here: Labelled datatypes Python)

My question pertains to what happens inside the function pointer. The reason I am converting points to a pandas.Series and then getting the values (in the 4th line, just under the if statement) is to make it in the same shape as polygons. If I merely call points either as points = [point]*len(intersect) or as points = itertools.repeat(point, len(intersect)), Numpy complains that it "cannot broadcast arrays of size (n,2) and size (n,) together" (n is the length of intersect).

If I call vect_dist_funct like so: dist = vect_dist_funct(itertools.repeat(points, len(intersect)), polygons), vincenty complains that I have passed it too many arguments. I am at a complete loss to understand the difference between the two.

Note that these are coordinates, so they will always come in pairs. Here are examples of what point and polygons look like:

point = (-104.950752, 39.854744)  # Passed directly to the function like this.
polygons = array([(-104.21750802451864, 37.84052458697633),
                  (-105.01017084789603, 39.82012158954065),
                  (-105.03965315742742, 40.669867471420886),
                  (-104.90353460825702, 39.837631505433706),
                  (-104.8650601872832, 39.870796282334744)], dtype=object)
           # As returned by statement centroid.loc[intersect].values

What is the best way to call vect_dist_funct in this circumstance, such that I can have a vectorized call, and both Numpy and vincenty will not complain that I am passing wrong arguments? Also, techniques that result in minimum memory consumption, and increased speed are sought. The goal is to compute distance between the point to each polygon centroid.

Recommended answer

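np.vectorize doesn't really help you here. As per the documentation:

"The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."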

In fact, vectorize actively hurts you, since it converts the inputs into numpy arrays, doing an unnecessary and expensive type conversion and producing the errors you are seeing. You are much better off using a function with a for loop.
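
The shape mismatch behind the broadcasting error can be reproduced without geopy or pandas. The sketch below is only illustrative: it rebuilds a 1-D object array of tuples by hand (assumed to match what centroid.loc[intersect].values returns) and compares it with what Numpy makes of a plain list of repeated tuples. The names centroids, repeated and broadcast_ok are made up for the example.

```python
import numpy as np

point = (-104.950752, 39.854744)
centroids = [
    (-104.21750802451864, 37.84052458697633),
    (-105.01017084789603, 39.82012158954065),
    (-105.03965315742742, 40.669867471420886),
]

# 1-D object array whose elements are tuples, filled one element at a
# time so that Numpy does not unpack the tuples into a second axis.
polygons = np.empty(len(centroids), dtype=object)
for i, c in enumerate(centroids):
    polygons[i] = c

# A plain list of repeated tuples is unpacked into a 2-D array instead.
repeated = np.asarray([point] * len(centroids))

print(polygons.shape)   # one axis: one tuple per element
print(repeated.shape)   # two axes: rows of (x, y)

# The two shapes cannot be broadcast against each other.
try:
    np.broadcast(repeated, polygons)
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)
```

Wrapping the repeated point in a pandas.Series and taking .values sidesteps the error only because it also produces a 1-D object array of tuples.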

It is also better to use a function rather than a lambda for a top-level function, since it lets you have a docstring.

So this is how I would implement what you are doing:

def vect_dist_funct(p1, p2):
    """Apply `vincenty` to `p1` and each element of `p2`.

    Iterate over `p2`, returning `vincenty` with the first argument
    as `p1` and the second as the current element of `p2`.  Returns
    a list where each value is the result, in meters, of the `vincenty`
    function call for the corresponding element of `p2`.
    """
    return [vincenty(p1, p2i).meters for p2i in p2]
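
To try the loop-based approach without installing geopy (newer geopy releases dropped vincenty in favor of geodesic), here is a minimal sketch that swaps in a haversine great-circle distance as a stand-in. The functions haversine_m and dist_to_each are hypothetical names, the Earth radius is an assumed mean value, and points are assumed to be (longitude, latitude) pairs in degrees, as the sample data suggests. The results will differ slightly from what vincenty would return.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in meters


def haversine_m(p1, p2):
    """Stand-in for `vincenty(p1, p2).meters`: great-circle distance.

    Both points are assumed to be (longitude, latitude) in degrees.
    """
    lon1, lat1 = math.radians(p1[0]), math.radians(p1[1])
    lon2, lat2 = math.radians(p2[0]), math.radians(p2[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dist_to_each(p1, p2):
    """Return a list with the distance from `p1` to each element of `p2`."""
    return [haversine_m(p1, p2i) for p2i in p2]


point = (-104.950752, 39.854744)
polygons = [
    (-104.21750802451864, 37.84052458697633),
    (-105.01017084789603, 39.82012158954065),
    (-104.90353460825702, 39.837631505433706),
]
dist = dist_to_each(point, polygons)
print(dist)
```

The point is passed once and reused on every iteration, so no repeated copy of it is ever built.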

If you really want to use vectorize, you can use the excluded argument to not vectorize the p1 argument, or better yet set up a lambda that wraps vincenty and only vectorizes the second argument:

def vect_dist_funct(p1, p2):
    """Apply `vincenty` to `p1` and each element of `p2`.

    Iterate over `p2`, returning `vincenty` with the first argument
    as `p1` and the second as the current element of `p2`.  Returns
    a numpy array where each element is the result, in meters, of the
    `vincenty` function call for the corresponding element of `p2`.
    """
    # Wrap vincenty so that only the second argument is vectorized.
    vinc_p = lambda x: vincenty(p1, x).meters
    return np.vectorize(vinc_p)(p2)
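
The other option mentioned above, the excluded argument, might look roughly like the following. This is a sketch under stated assumptions: flat_dist is a hypothetical planar-distance placeholder standing in for vincenty(p1, p2).meters so the example runs without geopy, and polygons is built by hand as a 1-D object array of tuples.

```python
import math

import numpy as np


def flat_dist(p1, p2):
    """Hypothetical placeholder for `vincenty(p1, p2).meters`."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])


# excluded={0} tells np.vectorize to pass positional argument 0 (p1)
# through untouched instead of converting it to an array and looping
# over it. otypes fixes the output dtype up front.
vect_dist = np.vectorize(flat_dist, excluded={0}, otypes=[float])

point = (0.0, 0.0)
pairs = [(3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]

# 1-D object array of tuples, filled element by element so that each
# tuple stays a single element rather than becoming a row.
polygons = np.empty(len(pairs), dtype=object)
for i, pair in enumerate(pairs):
    polygons[i] = pair

dist = vect_dist(point, polygons)
print(dist)
```

Note that the second argument still has to be a 1-D object array of tuples; a plain list of tuples would be unpacked into a 2-D array and the wrapped function would receive single numbers.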
