本文介绍了Python Scikit随机森林回归器错误的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试从csv加载训练和测试数据,在scikit/sklearn中运行随机森林回归器,然后预测测试文件的输出.

I am trying to load training and test data from a csv, run the random forest regressor in scikit/sklearn, and then predict the output from the test file.

TrainLoanData.csv文件包含5列;第一列是输出,接下来的四列是要素. TestLoanData.csv包含4列-功能.

The TrainLoanData.csv file contains 5 columns; the first column is the output and the next 4 columns are the features. The TestLoanData.csv contains 4 columns - the features.

运行代码时,出现错误:

When I run the code, I get error:

    predicted_probs = ["%f" % x[1] for x in predicted_probs]
IndexError: invalid index to scalar variable.

这是什么意思?

这是我的代码:

import numpy, scipy, sklearn, csv_io //csv_io from https://raw.github.com/benhamner/BioResponse/master/Benchmarks/csv_io.py
from sklearn import datasets
from sklearn.ensemble import RandomForestRegressor

def main():
    #read in the training file
    train = csv_io.read_data("TrainLoanData.csv")
    #set the training responses
    target = [x[0] for x in train]
    #set the training features
    train = [x[1:] for x in train]
    #read in the test file
    realtest = csv_io.read_data("TestLoanData.csv")

    # random forest code
    rf = RandomForestRegressor(n_estimators=10, min_samples_split=2, n_jobs=-1)
    # fit the training data
    print('fitting the model')
    rf.fit(train, target)
    # run model against test data
    predicted_probs = rf.predict(realtest)
    print predicted_probs
    predicted_probs = ["%f" % x[1] for x in predicted_probs]
    csv_io.write_delimited_file("random_forest_solution.csv", predicted_probs)

main()

推荐答案

RandomForestRegressor的返回值是一个浮点数组:

The return value from a RandomForestRegressor is an array of floats:

In [3]: rf = RandomForestRegressor(n_estimators=10, min_samples_split=2, n_jobs=-1)

In [4]: rf.fit([[1,2,3],[4,5,6]],[-1,1])
Out[4]:
RandomForestRegressor(bootstrap=True, compute_importances=False,
           criterion='mse', max_depth=None, max_features='auto',
           min_density=0.1, min_samples_leaf=1, min_samples_split=2,
           n_estimators=10, n_jobs=-1, oob_score=False,
           random_state=<mtrand.RandomState object at 0x7fd894d59528>,
           verbose=0)

In [5]: rf.predict([1,2,3])
Out[5]: array([-0.6])

In [6]: rf.predict([[1,2,3],[4,5,6]])
Out[6]: array([-0.6,  0.4])

因此,您正在尝试为(-0.6)[1]这样的浮点索引,这是不可能的.

So you're trying to index a float like (-0.6)[1], which is not possible.

请注意,该模型不返回概率.

As a side note, the model does not return probabilities.

这篇关于Python Scikit随机森林回归器错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

05-23 03:35