Problem description
This question seems to have been asked before, but I can't comment on the accepted answer to ask for clarification, and I couldn't work out the solution it provided.
I am trying to learn how to use sklearn with my own data. I have the annual percentage change in GDP for two different countries over the past 100 years, and for now I am just trying to learn with a single variable. What I want to do is use sklearn to predict the GDP percentage change for country A given the percentage change in country B's GDP.
The problem is that I receive an error saying:
ValueError: Found arrays with inconsistent numbers of samples: [ 1 107]
Here is my code:
import sklearn.linear_model as lm
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
def bytespdate2num(fmt, encoding='utf-8'):  # function to convert bytes to string for the dates.
    strconverter = mdates.strpdate2num(fmt)
    def bytesconverter(b):
        s = b.decode(encoding)
        return strconverter(s)
    return bytesconverter
dataCSV = open('combined_data.csv')
comb_data = []
for line in dataCSV:
    comb_data.append(line)
date, chngdpchange, ausgdpchange = np.loadtxt(comb_data, delimiter=',', unpack=True, converters={0: bytespdate2num('%d/%m/%Y')})
chntrain = chngdpchange[:-1]
chntest = chngdpchange[-1:]
austrain = ausgdpchange[:-1]
austest = ausgdpchange[-1:]
regr = lm.LinearRegression()
regr.fit(chntrain, austrain)
print('Coefficients: \n', regr.coef_)
print("Residual sum of squares: %.2f"
      % np.mean((regr.predict(chntest) - austest) ** 2))
print('Variance score: %.2f' % regr.score(chntest, austest))
plt.scatter(chntest, austest, color='black')
plt.plot(chntest, regr.predict(chntest), color='blue')
plt.xticks(())
plt.yticks(())
plt.show()
What am I doing wrong? I essentially tried to apply the sklearn tutorial (which uses a diabetes data set) to my own simple data. My data contains just the date, country A's percentage change in GDP for that year, and country B's percentage change in GDP for the same year.
I tried the solutions here and here (basically trying to find out more about the solution in the first link), but I just receive the exact same error.
Here is the full traceback in case you want to see it:
Traceback (most recent call last):
  File "D:\My Stuff\Dropbox\Python\Python projects\test regression\tester.py", line 34, in <module>
    regr.fit(chntrain, austrain)
  File "D:\Programs\Installed\Python34\lib\site-packages\sklearn\linear_model\base.py", line 376, in fit
    y_numeric=True, multi_output=True)
  File "D:\Programs\Installed\Python34\lib\site-packages\sklearn\utils\validation.py", line 454, in check_X_y
    check_consistent_length(X, y)
  File "D:\Programs\Installed\Python34\lib\site-packages\sklearn\utils\validation.py", line 174, in check_consistent_length
    "%s" % str(uniques))
ValueError: Found arrays with inconsistent numbers of samples: [ 1 107]
Recommended answer
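In fit(X, y), the input parameter X is supposed to be a 2-D array of shape (n_samples, n_features). Here a 1-D array of 107 values is passed as X, and it is treated as 1 sample with 107 features while y has 107 samples, which is what the [ 1 107] in the error message reports. If X in your data has only one dimension, you can reshape it into a 2-D array with a single column like this:
regr.fit(chntrain_X.reshape(len(chntrain_X), 1), chntrain_Y)
Here chntrain_X and chntrain_Y stand for the training input and target, which are chntrain and austrain in the question's code. The array passed to predict and score (chntest) needs the same reshape.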
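As a minimal sketch of the reshape described above, the following uses synthetic numbers in place of the two GDP series. The generated data, the seed, and the names X_train and X_test are assumptions made for illustration; only the variable names chntrain, austrain, chntest and austest and the train/test split come from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the two GDP series (an assumption for illustration;
# the real values would be loaded from combined_data.csv).
rng = np.random.default_rng(0)
chngdpchange = rng.normal(loc=5.0, scale=2.0, size=108)
ausgdpchange = 0.5 * chngdpchange + 1.0 + rng.normal(scale=0.1, size=108)

# Same split as in the question: the last observation is held out.
chntrain, chntest = chngdpchange[:-1], chngdpchange[-1:]
austrain, austest = ausgdpchange[:-1], ausgdpchange[-1:]

# X must be 2-D with shape (n_samples, n_features); y may stay 1-D.
X_train = chntrain.reshape(-1, 1)  # 107 rows, 1 column
X_test = chntest.reshape(-1, 1)    # 1 row, 1 column

regr = LinearRegression()
regr.fit(X_train, austrain)

prediction = regr.predict(X_test)
print('Coefficients:', regr.coef_)
print('Predicted:', prediction, 'Actual:', austest)
```

reshape(-1, 1) is equivalent to reshape(len(chntrain), 1); the -1 lets NumPy infer the number of rows. Note that with a single held-out sample the R² returned by score is not meaningful, so a larger test split is probably needed before evaluating the fit that way.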