pandas .at versus .loc

Question

I've been exploring how to optimize my code and ran across the pandas .at method. Per the documentation:

Similarly to loc, at provides label based scalar lookups. You can also set using these indexers.
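
To make the quoted sentence concrete, here is a minimal sketch of reading and writing one cell with .at. The frame and the names toy, value are hypothetical and are not the data used in the question.

```python
import pandas as pd

# Hypothetical toy frame, used only to illustrate scalar access.
toy = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]}, index=["a", "b"])

# Read one cell by row label and column label.
value = toy.at["a", "y"]

# Write one cell with the same indexer.
toy.at["b", "x"] = 20.0

print(value)
print(toy)
```

Both the row key and the column key must identify exactly one label, which is what "scalar lookup" means here.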

So I ran some samples:

import pandas as pd
import numpy as np
from string import ascii_lowercase, ascii_uppercase

lc = list(ascii_lowercase)
uc = list(ascii_uppercase)

def gdf(rows, cols, seed=None):
    """rows and cols are what you'd pass
    to pd.MultiIndex.from_product()"""
    gmi = pd.MultiIndex.from_product
    index, columns = gmi(rows), gmi(cols)
    np.random.seed(seed)
    # Build the frame directly from the random values so the dtype is float
    data = np.random.rand(len(index), len(columns))
    return pd.DataFrame(data, index=index, columns=columns)

seed = [3, 1415]
df = gdf([lc, uc], [lc, uc], seed)

# Show the top-left 5 x 5 corner
print(df.iloc[:5, :5])

df looks like:

            a                                        
            A         B         C         D         E
a A  0.444939  0.407554  0.460148  0.465239  0.462691
  B  0.032746  0.485650  0.503892  0.351520  0.061569
  C  0.777350  0.047677  0.250667  0.602878  0.570528
  D  0.927783  0.653868  0.381103  0.959544  0.033253
  E  0.191985  0.304597  0.195106  0.370921  0.631576

Let's use .at and .loc and make sure I get the same thing:

print("using .loc", df.loc[('a', 'A'), ('c', 'C')])
print("using .at ", df.at[('a', 'A'), ('c', 'C')])

using .loc 0.37374090276
using .at  0.37374090276

Using .loc

%%timeit
df.loc[('a', 'A'), ('c', 'C')]

10000 loops, best of 3: 180 µs per loop

Using .at

%%timeit
df.at[('a', 'A'), ('c', 'C')]

The slowest run took 6.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8 µs per loop

This looks to be a huge speed increase. Even at the caching stage, 6.11 * 8 µs (roughly 49 µs) is a lot faster than 180 µs.
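
Outside IPython, a comparison like the one above can be approximated with the standard timeit module. This is only a sketch: the frame small, the loop count n, and the chosen cell are assumptions, and the absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical frame with default integer labels; the question's frame
# is larger and uses MultiIndex axes.
rng = np.random.default_rng(0)
small = pd.DataFrame(rng.random((100, 100)))

n = 1000  # number of repeated lookups per measurement

# Average time per single-cell lookup, in seconds.
t_loc = timeit.timeit(lambda: small.loc[5, 7], number=n) / n
t_at = timeit.timeit(lambda: small.at[5, 7], number=n) / n

print(f".loc: {t_loc * 1e6:.1f} us per lookup")
print(f".at:  {t_at * 1e6:.1f} us per lookup")
```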

What are the limitations of .at? I'm motivated to use it. The documentation says it's similar to .loc, but it doesn't behave the same way. Example:

# small df
sdf = gdf([lc[:2]], [uc[:2]], seed)

print(sdf.loc[:, :])

          A         B
a  0.444939  0.407554
b  0.460148  0.465239

whereas sdf.at[:, :] results in TypeError: unhashable type

So the two are obviously not the same, even if the intent is to be similar.

That said, can anyone provide guidance on what can and cannot be done with the .at method?

Recommended answer

Update: df.get_value is deprecated (as of pandas 0.21.0). Using df.at or df.iat is the recommended approach going forward.
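
Since the update points to df.at and df.iat, here is a short sketch of how the two differ: .at takes labels and .iat takes integer positions. The frame and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical frame for illustration only.
frame = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["a", "b"])

# Label-based scalar access: row label "b", column label "A".
by_label = frame.at["b", "A"]

# Position-based scalar access: second row, first column.
by_position = frame.iat[1, 0]

print(by_label, by_position)
```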

df.at can only access a single value at a time.

df.loc can select multiple rows and/or columns.
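
The difference in what each indexer returns can be sketched as follows. The frame and the names block, column, cell are hypothetical and only illustrate the two statements above.

```python
import pandas as pd

# Hypothetical frame for illustration only.
frame = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["a", "b", "c"])

# .loc accepts lists and slices, so it can return a DataFrame or a Series.
block = frame.loc[["a", "b"], :]   # two rows, all columns
column = frame.loc[:, "A"]         # one column as a Series

# .at accepts exactly one row label and one column label and returns a scalar.
cell = frame.at["c", "B"]

print(type(block).__name__, type(column).__name__, cell)
```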

Note that there is also df.get_value, which may be even quicker at accessing single values:

In [25]: %timeit df.loc[('a', 'A'), ('c', 'C')]
10000 loops, best of 3: 187 µs per loop

In [26]: %timeit df.at[('a', 'A'), ('c', 'C')]
100000 loops, best of 3: 8.33 µs per loop

In [35]: %timeit df.get_value(('a', 'A'), ('c', 'C'))
100000 loops, best of 3: 3.62 µs per loop


Under the hood, df.at[...] calls df.get_value, but it also does some type checking on the keys.

