sklearn TimeSeriesSplit error: KeyError: '[0 1 2 ...] not in index'


Question

I want to use TimeSeriesSplit from sklearn on a dataframe to predict its sum column.

To prepare X and y, I do the following:

X = df.drop(['sum'],axis=1)
y = df['sum']

I then feed these two to:

for train_index, test_index in tscv.split(X):
    X_train01, X_test01 = X[train_index], X[test_index]
    y_train01, y_test01 = y[train_index], y[test_index]

Doing so, I get the following error:

KeyError: '[ 0  1  2 ...] not in index'

Here X is a dataframe, and apparently this causes the error, because if I convert X to an array as follows:

X = X.values

then it works. However, I need X as a dataframe for later evaluation of the model. Is there any way to keep X as a dataframe and feed it to tscv without converting it to an array?

Recommended answer

As @Jarad rightly said, if you have an updated version of pandas, plain [] indexing will not automatically switch to integer-based indexing as was possible in previous versions. The index arrays returned by tscv.split(X) are integer positions, so you need to use .iloc explicitly for integer-based slicing:

for train_index, test_index in tscv.split(X):
    X_train01, X_test01 = X.iloc[train_index], X.iloc[test_index]
    y_train01, y_test01 = y.iloc[train_index], y.iloc[test_index]

See https://pandas.pydata.org/pandas-docs/stable/indexing.html
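
For reference, here is a minimal end-to-end sketch of the .iloc approach. The original dataframe is not shown in the question, so the columns a and b, the 12 rows, and n_splits=3 are all made up for illustration; only the sum column name, X, y, tscv, and the loop come from the question and answer.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for the question's dataframe (its real columns are not shown).
df = pd.DataFrame({"a": np.arange(12), "b": np.arange(12) * 2})
df["sum"] = df["a"] + df["b"]

X = df.drop(["sum"], axis=1)
y = df["sum"]

# n_splits=3 is an arbitrary choice for this sketch.
tscv = TimeSeriesSplit(n_splits=3)

folds = []
for train_index, test_index in tscv.split(X):
    # train_index / test_index are integer positions, so select rows with .iloc.
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    folds.append((X_train, X_test, y_train, y_test))
    print(type(X_train).__name__, X_train.shape, X_test.shape)
```

Each X_train and X_test here stays a DataFrame with its column names and original row index, which is what the question needs for later evaluation.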
