OHLC aggregator doesn't work with a pandas DataFrame?

Problem description

I am not sure whether this is a bug or by design; perhaps I am missing something and the ohlc aggregator isn't supposed to work with DataFrames. Perhaps this behavior is by design because a DataFrame with anything other than an index column and a price column could yield strange results? Other aggregators (mean, standard deviation, etc.) work with a DataFrame. In any case, I'm trying to get OHLC from this data, and converting to a TimeSeries doesn't seem to work either.

Here's an example:

import pandas as pd
from numpy.random import randint

rng = pd.date_range('1/1/2012', periods=1000, freq='S')

ts = pd.Series(randint(0, 500, len(rng)), index=rng)
df = pd.DataFrame(randint(0, 500, len(rng)), index=rng)

ts.resample('5Min', how='ohlc')  # works great
df.resample('5Min', how='ohlc')  # throws a "NotImplementedError"

newts = pd.TimeSeries(df)  # am I missing an index command in this line?
# the above line yields this error: "TypeError: Only valid with DatetimeIndex or PeriodIndex"


Full NotImplementedError paste:

NotImplementedError                       Traceback (most recent call last)
/home/jeff/<ipython-input-7-85a274cc0d8c> in <module>()
----> 1 df.resample('5Min', how='ohlc')

/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/core/generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
    231                               fill_method=fill_method, convention=convention,
    232                               limit=limit, base=base)
--> 233         return sampler.resample(self)
    234 
    235     def first(self, offset):

/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/tseries/resample.pyc in resample(self, obj)
     66 
     67         if isinstance(axis, DatetimeIndex):
---> 68             rs = self._resample_timestamps(obj)
     69         elif isinstance(axis, PeriodIndex):
     70             offset = to_offset(self.freq)

/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/tseries/resample.pyc in _resample_timestamps(self, obj)
    189             if len(grouper.binlabels) < len(axlabels) or self.how is not None:
    190                 grouped = obj.groupby(grouper, axis=self.axis)
--> 191                 result = grouped.aggregate(self._agg_method)
    192             else:
    193                 # upsampling shortcut


/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
   1538         """
   1539         if isinstance(arg, basestring):
-> 1540             return getattr(self, arg)(*args, **kwargs)
   1541 
   1542         result = {}

/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in ohlc(self)
    384         For multiple groupings, the result index will be a MultiIndex
    385         """
--> 386         return self._cython_agg_general('ohlc')
    387 
    388     def nth(self, n):

/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only)
   1452 
   1453     def _cython_agg_general(self, how, numeric_only=True):
-> 1454         new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
   1455         return self._wrap_agged_blocks(new_blocks)
   1456 

/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in _cython_agg_blocks(self, how, numeric_only)
   1490                 values = com.ensure_float(values)
   1491 
-> 1492             result, _ = self.grouper.aggregate(values, how, axis=agg_axis)
   1493             newb = make_block(result, block.items, block.ref_items)
   1494             new_blocks.append(newb)

/usr/local/lib/python2.7/dist-packages/pandas-0.9.2.dev-py2.7-linux-x86_64.egg/pandas/core/groupby.pyc in aggregate(self, values, how, axis)
    730                 values = values.swapaxes(0, axis)
    731             if arity > 1:
--> 732                 raise NotImplementedError
    733             out_shape = (self.ngroups,) + values.shape[1:]
    734 

NotImplementedError: 
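Conceptually, OHLC for a single column is just four ordinary aggregations per bin: the first, maximum, minimum and last value, which is why the other aggregators can stand in for it. A small sketch, assuming a current pandas release (method-call `.ohlc()` instead of `how='ohlc'`) and using made-up prices, checks that `ohlc()` agrees with `first`/`max`/`min`/`last`:

```python
import pandas as pd

# Six one-minute prices spanning two 5-minute bins (illustrative data only).
idx = pd.date_range("2012-01-01 00:03", periods=6, freq="min")
prices = pd.Series([10.0, 12.0, 9.0, 11.0, 15.0, 8.0], index=idx)

# Built-in OHLC aggregation.
bars = prices.resample("5min").ohlc()

# The same bars assembled from generic aggregators: open=first, high=max, low=min, close=last.
manual = prices.resample("5min").agg(["first", "max", "min", "last"])
manual.columns = ["open", "high", "low", "close"]

print(bars)
print(manual)
```

Here the 00:00 bin holds the 00:03 and 00:04 prices (open 10, high 12, low 10, close 12) and the 00:05 bin holds the remaining four (open 9, high 15, low 8, close 8).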

Solution

You can resample an individual column (since each column is a time series):

In [9]: df[0].resample('5Min', how='ohlc')
Out[9]: 
                     open  high  low  close
2012-01-01 00:00:00   136   136  136    136
2012-01-01 00:05:00   462   499    0    451
2012-01-01 00:10:00   209   499    0    495
2012-01-01 00:15:00    25   499    0    344
2012-01-01 00:20:00   200   498    0    199


In [10]: type(df[0])
Out[10]: pandas.core.series.TimeSeries
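`df[0]` works because selecting one column of a DataFrame already returns a Series that keeps the DatetimeIndex, which is what the `pd.TimeSeries(df)` line in the question was trying to achieve. A minimal sketch of the same steps, assuming a current pandas release (where `pd.TimeSeries` no longer exists and `how='ohlc'` has become the `.ohlc()` method); the seeded `numpy.random.default_rng` generator is an addition only to make the data reproducible. Current defaults label bins by their left edge, so the rows may not line up exactly with the 0.9-era output shown above.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2012-01-01", periods=1000, freq="s")
df = pd.DataFrame(np.random.default_rng(0).integers(0, 500, len(rng)), index=rng)

# Selecting a single column (by label here, or by position with df.iloc[:, 0])
# returns a Series that keeps the DatetimeIndex, so no explicit conversion is needed.
series = df[0]

# .ohlc() on the resampler replaces the old how='ohlc' keyword.
bars = series.resample("5min").ohlc()
print(bars)
```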

It's not clear to me what the output should look like for larger DataFrames (with multiple columns), but perhaps you could make a Panel:

In [11]: newts = pd.Panel(dict((col, df[col].resample('5Min', how='ohlc'))
                                   for col in df.columns))

In [12]: newts[0]
Out[12]: 
                     open  high  low  close
2012-01-01 00:00:00   136   136  136    136
2012-01-01 00:05:00   462   499    0    451
2012-01-01 00:10:00   209   499    0    495
2012-01-01 00:15:00    25   499    0    344
2012-01-01 00:20:00   200   498    0    199
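`Panel` was later deprecated and removed from pandas (it is gone as of 1.0), so as a sketch assuming a current pandas release, the same per-column idea can be expressed with `pd.concat` over a dict of per-column OHLC frames. The dict keys become the outer level of a two-level column index, giving one DataFrame with (column, field) columns. The column names `'a'` and `'b'` and the seeded random data are hypothetical.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2012-01-01", periods=1000, freq="s")
gen = np.random.default_rng(0)
# Two hypothetical price columns, 'a' and 'b'.
df = pd.DataFrame({"a": gen.integers(0, 500, len(rng)),
                   "b": gen.integers(0, 500, len(rng))}, index=rng)

# Resample each column separately, then place the results side by side.
# The dict keys become the outer level of the column MultiIndex.
bars = pd.concat({col: df[col].resample("5min").ohlc() for col in df.columns}, axis=1)

print(bars["a"])  # plays the role of newts[0] in the Panel version
```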

Note: Perhaps there is a canonical output for resampling a DataFrame and it is yet to be implemented?
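In current pandas releases, calling `.ohlc()` on a resampled DataFrame is supported and returns exactly that kind of layout: a two-level column index of (column, open/high/low/close). A sketch, assuming pandas 1.x or later; the column names `'a'` and `'b'` and the seeded data are hypothetical:

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2012-01-01", periods=1000, freq="s")
gen = np.random.default_rng(1)
df = pd.DataFrame({"a": gen.integers(0, 500, len(rng)),
                   "b": gen.integers(0, 500, len(rng))}, index=rng)

# One call for the whole frame: columns become (column, open/high/low/close).
bars = df.resample("5min").ohlc()

# The per-column route, for comparison; it should hold the same numbers.
per_column = pd.concat({c: df[c].resample("5min").ohlc() for c in df.columns}, axis=1)

print(bars.head())
```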


10-28 18:54