This article looks at how to deal with the confusing probabilities returned by predict_proba in scikit-learn's support vector machines; it may serve as a useful reference if you run into the same problem.

Problem Description

My purpose is to draw the PR curve from the sorted per-sample probabilities for a specific class. However, I found that the probabilities returned by svm's predict_proba() behave in two different ways when I use two different standard datasets: iris and digits.

The first case is evaluated on the iris dataset with the Python code below, and it works as expected: the predicted class gets the highest probability.
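
The code below calls a small helper, proba_to_class, that the question never shows. A minimal sketch of what it presumably does, mapping each row of probabilities to the class with the highest probability, could look like this (a hypothetical reconstruction, not the asker's actual code):

import numpy as np

# Hypothetical reconstruction of the asker's helper: for each sample,
# return the class whose predicted probability is highest.
def proba_to_class(proba, classes):
    return np.asarray(classes)[np.argmax(proba, axis=1)]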

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

D = datasets.load_iris()
clf = SVC(kernel=chi2_kernel, probability=True).fit(D.data, D.target)
output_predict = clf.predict(D.data)
output_proba = clf.predict_proba(D.data)
output_decision_function = clf.decision_function(D.data)
output_my = proba_to_class(output_proba, clf.classes_)

print(D.data.shape, D.target.shape)
print("target:", D.target[:2])
print("class:", clf.classes_)
print("output_predict:", output_predict[:2])
print("output_proba:", output_proba[:2])

Next, it produces the output below. Apparently, the highest probability for each sample matches the output of predict(): 0.97181088 for sample #1 and 0.96961523 for sample #2.

(150, 4) (150,)
target: [0 0]
class: [0 1 2]
output_predict: [0 0]
output_proba: [[ 0.97181088  0.01558693  0.01260218]
[ 0.96961523  0.01702481  0.01335995]]

However, when I change the dataset to digits with the following code, the probabilities show the opposite behavior: the label output by predict() corresponds to the lowest probability of each sample, 0.00190932 for sample #1 and 0.00220549 for sample #2.

D = datasets.load_digits()

Output:

(1797, 64) (1797,)
target: [0 1]
class: [0 1 2 3 4 5 6 7 8 9]
output_predict: [0 1]
output_proba: [[ 0.00190932  0.11212957  0.1092459   0.11262532  0.11150733
   0.11208733  0.11156622  0.11043403  0.10747514  0.11101985]
 [ 0.10991574  0.00220549  0.10944998  0.11288081  0.11178518
   0.11234661  0.11182221  0.11065663  0.10770783  0.11122952]]
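
To quantify the mismatch, one can count how often predict() agrees with the argmax of predict_proba(). This is a sketch reusing the clf and D fitted on the digits data above:

import numpy as np

# Fraction of samples where the predicted label equals the class with the
# highest predict_proba() score; on digits this is expected to be well below 1.0.
proba_argmax = clf.classes_[np.argmax(clf.predict_proba(D.data), axis=1)]
agreement = np.mean(clf.predict(D.data) == proba_argmax)
print("predict vs. argmax(predict_proba) agreement:", agreement)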

I've read this post, which suggests the solution of using a linear SVM with decision_function(). However, because of my task, I still have to focus on the chi-squared kernel for SVM.

Is there any solution?

Recommended Answer

As the documentation states, there is no guarantee that predict_proba and predict will give consistent results on SVC: the probabilities come from Platt scaling, which is fit with an internal cross-validation and can disagree with the decision function. You can simply use decision_function instead. That is true for both linear and kernel SVM.
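
A minimal sketch of that workaround for the asker's PR-curve goal, ranking samples by the decision_function score of one class instead of by predict_proba; precision_recall_curve is scikit-learn's standard utility, and the choice of class 0 is only illustrative:

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.metrics import precision_recall_curve
from sklearn.metrics.pairwise import chi2_kernel

D = datasets.load_digits()
# probability=True is not needed when ranking by decision_function.
clf = SVC(kernel=chi2_kernel, decision_function_shape="ovr").fit(D.data, D.target)

target_class = 0  # illustrative: draw the PR curve for class 0
# With decision_function_shape="ovr" (the default), decision_function returns
# one score column per class, ordered like clf.classes_.
scores = clf.decision_function(D.data)[:, list(clf.classes_).index(target_class)]
precision, recall, thresholds = precision_recall_curve(D.target == target_class, scores)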

