sklearn TimeSeriesSplit error: KeyError: '[0 1 2 ...] not in index'



I want to use TimeSeriesSplit from sklearn on the following dataframe to predict sum:


So to prepare X and y I do the following:

X = df.drop(['sum'],axis=1)
y = df['sum']


and then feed these two to:

for train_index, test_index in tscv.split(X):
    X_train01, X_test01 = X[train_index], X[test_index]
    y_train01, y_test01 = y[train_index], y[test_index]


Doing so gives the following error:

KeyError: '[ 0  1  2 ...] not in index'


Here X is a dataframe, and apparently this causes the error, because if I convert X to an array as follows:

X = X.values


then it works. However, I need X as a dataframe to evaluate the model later. Is there a way to keep X as a dataframe and pass it to tscv without converting it to an array?



As @Jarad rightly said, newer versions of pandas no longer switch to integer-position indexing automatically, as earlier versions could. On a dataframe, X[train_index] is treated as a lookup by label, so the integer array returned by tscv.split raises the KeyError. You need to use .iloc explicitly for position-based selection:

for train_index, test_index in tscv.split(X):
    X_train01, X_test01 = X.iloc[train_index], X.iloc[test_index]
    y_train01, y_test01 = y.iloc[train_index], y.iloc[test_index]
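The question does not show the dataframe or how tscv was created, so here is a minimal end-to-end sketch under assumptions: the columns "a" and "b", the date index, the values, and n_splits=3 are all made up for illustration. It only shows that the .iloc approach keeps every fold as a dataframe with its column names intact.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical data: the original dataframe is not shown, so the column
# names "a" and "b", the date index and the values are invented here.
df = pd.DataFrame(
    {
        "a": np.arange(12, dtype=float),
        "b": np.arange(12, dtype=float) * 2.0,
    },
    index=pd.date_range("2020-01-01", periods=12, freq="D"),
)
df["sum"] = df["a"] + df["b"]

X = df.drop(["sum"], axis=1)
y = df["sum"]

# n_splits=3 is an arbitrary choice for this sketch.
tscv = TimeSeriesSplit(n_splits=3)

folds = []
for train_index, test_index in tscv.split(X):
    # split() yields integer positions, so select rows with .iloc
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    folds.append((X_train, X_test, y_train, y_test))

for X_train, X_test, _, _ in folds:
    print(type(X_train).__name__, X_train.shape, X_test.shape)
```

Because the folds are still dataframes, the original index and column names remain available when you evaluate the model afterwards.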

See https://pandas.pydata.org/pandas-docs/stable/indexing.html
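To see why the plain [] form fails, the following small sketch may help. It uses a hypothetical two-column dataframe (the names "a" and "b" are not from the question) and contrasts X[positions], which looks the integers up as column labels, with X.iloc[positions], which selects rows by position.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the column names and values are illustrative only.
X = pd.DataFrame({"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]})
positions = np.array([0, 1, 2])

try:
    X[positions]  # interpreted as column labels 0, 1, 2, which do not exist
    raised = False
except KeyError:
    raised = True

# .iloc interprets the same array as row positions
subset = X.iloc[positions]

print("KeyError raised by X[positions]:", raised)
print(subset)
```

The exact wording of the KeyError message differs between pandas versions, so the sketch checks only the exception type.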

