Start:stop slicing inconsistencies between numpy and Pandas?

Question

I am a bit surprised/confused about the following difference between numpy and Pandas:

import numpy as np
import pandas as pd
a = np.random.randn(10,10)

> a[:3, 0, np.newaxis]

array([[-1.91687144],
       [-0.6399471 ],
       [-0.10005721]])

But:

b = pd.DataFrame(a)

> b.ix[:3,0]

0   -1.916871
1   -0.639947
2   -0.100057
3    0.251988

In other words, numpy does not include the stop index in start:stop notation, but Pandas does. I thought Pandas was based on Numpy. Is this a bug, or is it intentional?

Recommended answer

This is documented, and it's part of Advanced Indexing. The key here is that you're not using a stop index at all.

The ix attribute is a special thing that lets you do various kinds of advanced indexing by label: choosing a list of labels, using an inclusive range of labels instead of a half-exclusive range of indices, and various other things.
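To make the label-versus-position distinction concrete, here is a minimal sketch. Note that `.ix` was deprecated in pandas 0.20 and removed in pandas 1.0, so this sketch substitutes `.loc` (label-based) and `.iloc` (position-based) for it; the variable names `np_rows`, `pos_rows`, and `label_rows` are only illustrative.

```python
import numpy as np
import pandas as pd

a = np.random.randn(10, 10)
b = pd.DataFrame(a)  # default integer labels 0..9 on both axes

# NumPy: half-open positional slice, the stop index is excluded
np_rows = a[:3, 0]

# pandas .iloc: positional and half-open, like NumPy
pos_rows = b.iloc[:3, 0]

# pandas .loc: label-based, the stop label 3 is included
label_rows = b.loc[:3, 0]

print(len(np_rows), len(pos_rows), len(label_rows))  # 3 3 4
```

With the default integer labels, the label slice `:3` covers labels 0, 1, 2, and 3, which is the four-row result shown in the question.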

If you don't want that, just don't use it:

In [191]: b[:3][0]
Out[191]: 
0   -0.209386
1    0.050345
2    0.318414
Name: 0
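As a rough check of why this works, the sketch below assumes a frame built like the one in the question: plain `b[:3]` slices rows by position, and `[0]` then picks the column labeled 0. The names `first` and `same` are illustrative only.

```python
import numpy as np
import pandas as pd

b = pd.DataFrame(np.random.randn(10, 10))

first = b[:3][0]      # slice three rows by position, then pick column label 0
same = b.iloc[:3, 0]  # one-step positional equivalent for this frame

print(len(first))          # 3
print(first.equals(same))  # True
```

The two agree here only because column label 0 also sits at position 0; with other column labels, `.iloc` and plain `[]` column selection would not be interchangeable.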

If you play with this a bit more without reading the docs, you'll probably come up with a case where your labels are, say, 'A', 'B', 'C', 'D' instead of 0, 1, 2, 3, and suddenly b.ix[:3] will return only 3 rows instead of 4, and you'll be baffled all over again.

The difference is that in that case, b.ix[:3] is a slice on indices (positions), not on labels, because the integer 3 cannot be one of the string labels.

What you've requested in your code is actually ambiguous between "all labels up to and including 3" and "all indices up to but not including 3", and labels always win with ix (because if you don't want label slicing, you don't have to use ix in the first place). That's why I said the problem is that you're not using a stop index at all.
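The string-label case can be sketched without any ambiguity by stating the intent explicitly. This example assumes a hypothetical frame `c` with row labels 'A' through 'J', and again uses `.iloc` and `.loc` because `.ix` is no longer available in current pandas.

```python
import numpy as np
import pandas as pd

labels = list("ABCDEFGHIJ")  # hypothetical string row labels
c = pd.DataFrame(np.random.randn(10, 10), index=labels)

by_position = c.iloc[:3]  # positions 0, 1, 2 -> rows A, B, C
by_label = c.loc[:"D"]    # labels A through D, the stop label is included

print(list(by_position.index))  # ['A', 'B', 'C']
print(list(by_label.index))     # ['A', 'B', 'C', 'D']
```

Choosing the accessor up front means the reader never has to guess whether `3` was meant as a label or as a position.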
