Problem description
I have a binary classification problem and I want to calculate the roc_auc of the results. I did this in two different ways using sklearn. My code is as follows.
Code 1:
import numpy as np
from sklearn.metrics import make_scorer, roc_auc_score
from sklearn.model_selection import cross_validate
# Scorer that passes predicted probabilities to roc_auc_score
myscore = make_scorer(roc_auc_score, needs_proba=True)
# clf, X and y are defined elsewhere; one AUC is computed per fold
my_value = cross_validate(clf, X, y, cv=10, scoring=myscore)
print(np.mean(my_value['test_score']))
I get an output of 0.60.
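Code 2:
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import cross_val_predict
# Out-of-fold class probabilities, one column per class
y_score = cross_val_predict(clf, X, y, cv=k_fold, method="predict_proba")
fpr = dict()
tpr = dict()
roc_auc = dict()
# One ROC curve per probability column
for i in range(2):
    fpr[i], tpr[i], _ = roc_curve(y, y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
print(roc_auc)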
I get an output of {0: 0.41, 1: 0.59}.
I am confused because the two pieces of code give me two different scores. Please let me know why this difference occurs and what the correct way of doing this is.
I am happy to provide more details if needed.
Recommended answer
It seems that you used part of my code from another answer, so I thought I would answer this question as well.
In a binary classification problem you have 2 classes, and one of them is the positive class.
For example, see here: pos_label is the label of the positive class. When pos_label=None, if y_true is in {-1, 1} or {0, 1}, pos_label is set to 1; otherwise an error will be raised.
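To illustrate why the question's second snippet prints two values that sum to 1 ({0: 0.41, 1: 0.59}), here is a minimal sketch on synthetic data. The labels and probabilities are made up for illustration and are not the asker's data. It assumes the default pos_label behaviour quoted above, so scoring the class-0 probability column against labels whose positive class is 1 gives the complement of the AUC.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)

# Hypothetical binary labels and a noisy probability for class 1
y_true = rng.integers(0, 2, size=200)
logits = 1.0 * (y_true - 0.5) + rng.normal(0.0, 1.5, size=200)
p1 = 1.0 / (1.0 + np.exp(-logits))

# Same layout as predict_proba: column 0 = P(class 0), column 1 = P(class 1)
y_score = np.column_stack([1.0 - p1, p1])

# Default pos_label: class 1 is treated as positive for BOTH columns
fpr0, tpr0, _ = roc_curve(y_true, y_score[:, 0])
fpr1, tpr1, _ = roc_curve(y_true, y_score[:, 1])
auc_col0 = auc(fpr0, tpr0)
auc_col1 = auc(fpr1, tpr1)

# Telling roc_curve that class 0 is the positive class for column 0
fpr0p, tpr0p, _ = roc_curve(y_true, y_score[:, 0], pos_label=0)
auc_col0_pos0 = auc(fpr0p, tpr0p)

print(auc_col0, auc_col1, auc_col0_pos0)
```

In this sketch auc_col0 + auc_col1 equals 1, and auc_col0_pos0 matches auc_col1. So only the column of the positive class (here column 1) needs to be scored, which is what the code below does.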
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LogisticRegression
import numpy as np
iris = datasets.load_iris()
X = iris.data
y = iris.target
mask = (y!=2)
y = y[mask]
X = X[mask,:]
print(y)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
positive_class = 1
clf = OneVsRestClassifier(LogisticRegression())
y_score = cross_val_predict(clf, X, y, cv=10 , method='predict_proba')
fpr = dict()
tpr = dict()
roc_auc = dict()
fpr[positive_class], tpr[positive_class], _ = roc_curve(y, y_score[:, positive_class])
roc_auc[positive_class] = auc(fpr[positive_class], tpr[positive_class])
print(roc_auc)
{1: 1.0}
and
from sklearn.metrics import make_scorer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_validate
myscore = make_scorer(roc_auc_score, needs_proba=True)
clf = OneVsRestClassifier(LogisticRegression())
my_value = cross_validate(clf, X, y, cv=10, scoring = myscore)
print(np.mean(my_value['test_score'].tolist()))
1.0
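On this toy data both approaches give 1.0, but on harder data they can differ slightly, as with 0.60 versus 0.59 in the question. One likely reason is that cross_validate computes one AUC per fold and those are then averaged, whereas cross_val_predict pools the out-of-fold predictions into a single ROC curve. The sketch below compares the two on a synthetic dataset from make_classification. The dataset, the classifier settings and the fold setup are assumptions chosen for illustration, not the asker's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_predict,
    cross_validate,
)

# Hypothetical noisy binary problem so that the AUC is well below 1.0
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, flip_y=0.3, random_state=0
)
clf = LogisticRegression(max_iter=1000)

# Fixed splitter so both approaches see exactly the same folds
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Approach 1: one AUC per fold, then the mean of the 10 values
fold_scores = cross_validate(clf, X, y, cv=cv, scoring="roc_auc")["test_score"]
mean_fold_auc = fold_scores.mean()

# Approach 2: pool all out-of-fold probabilities, then a single AUC
y_score = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")
pooled_auc = roc_auc_score(y, y_score[:, 1])

# Per-fold AUCs recomputed from the pooled predictions
manual = [roc_auc_score(y[test], y_score[test, 1]) for _, test in cv.split(X, y)]

print(mean_fold_auc, pooled_auc, np.mean(manual))
```

Either number is a reasonable summary. The per-fold mean also lets you report a spread across folds, while the pooled value corresponds to a single ROC curve that can be plotted.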