sklearn.model_selection.train_test_split raises ValueError: Found input variables with inconsistent numbers of samples: [416858, 398427]

Question

The number of labels doesn't match the number of samples. One solution would be to remove some of the sample data, but I don't think that is good practice in general.

Here is my code:

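import numpy as np
from sklearn.model_selection import train_test_split

# Load the feature matrix and the labels from text files
X = np.loadtxt('/Users/myname/PycharmProjects/my_project/X.txt')
y = np.loadtxt('/Users/myname/PycharmProjects/my_project/y.txt')

# The first dimension (number of samples) must be the same for X and y
print(np.shape(X))
print(np.shape(y))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)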

I get this error:

ValueError: Found input variables with inconsistent numbers of samples: [416858, 398427]

Can anyone explain what I need to do to fix it?

Recommended answer

What do np.shape(X) and np.shape(y) print? That may help you locate the problem. If you don't have a target value for every input row, you have to fix that first. Simply deleting rows can be problematic: if the values are not missing at random, dropping them will influence the outcome of your model. Your best option would be to perform imputation. See the Wikipedia page for more information.
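To make the advice above concrete, here is a minimal sketch on small synthetic arrays, not the asker's files. It first compares the sample counts and reproduces the error, then imputes missing labels. Two things in it are assumptions: that the labels can be aligned row by row with X once you trace how the files were produced (the counts alone do not tell you which rows are unlabeled), and that filling gaps with the most frequent class is acceptable for your problem. SimpleImputer is just one possible choice here.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for X.txt and y.txt: 10 samples but only 8 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
y = np.array([0, 1, 1, 0, 1, 0, 0, 1], dtype=float)

# The first axis is the sample axis; these two numbers must be equal.
n_X, n_y = X.shape[0], y.shape[0]
print("samples in X:", n_X, "labels in y:", n_y)

# train_test_split rejects mismatched lengths with the same kind of ValueError.
try:
    train_test_split(X, y, test_size=0.3)
    error_message = None
except ValueError as err:
    error_message = str(err)
    print(error_message)

# Assumption: the labels have been re-aligned with the rows of X, and NaN
# marks the rows for which no label exists.
y_aligned = np.array([0, 1, np.nan, 1, 0, 1, 1, np.nan, 0, 1])

# Impute the missing labels with the most frequent class (one simple strategy).
imputer = SimpleImputer(strategy="most_frequent")
y_imputed = imputer.fit_transform(y_aligned.reshape(-1, 1)).ravel()

# With equal lengths the split succeeds.
X_train, X_test, y_train, y_test = train_test_split(
    X, y_imputed, test_size=0.3, random_state=0
)
print(X_train.shape, X_test.shape)
```

If the unlabeled rows cannot be identified at all, neither deleting nor imputing is safe, and the files need to be regenerated so that each row of X has a known label or a known gap.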

10-26 19:54