I am trying to craft a custom scorer function for cross-validating my (binary classification) model in scikit-learn (Python). Some examples of my raw test data:

Source  Feature1  Feature2  Feature3
123     0.1       0.2       0.3
123     0.4       0.5       0.6
456     0.7       0.8       0.9

Assuming that any fold might contain multiple test examples that come from the same source...

Then, for the set of examples with the same source, I want my custom scorer to "decide" that the "winner" is the example for which the model emitted the highest probability. In other words, there can be only one correct prediction per source, but if my model claims that more than one evaluation example was "correct" (label=1), I want the example with the highest probability to be the one my scorer matches against the truth.

My problem is that the scorer function requires the signature:

score_func(y_true, y_pred, **kwargs)

where y_true and y_pred contain only the probability/label. However, what I really need is:

score_func(y_true_with_source, y_pred_with_source, **kwargs)

so I can group the y_pred_with_source examples by their source and choose the winner to match against that of the y_true_with_source truth. Then I can carry on to calculate my precision, for example.

Is there a way I can pass in this information somehow? Maybe the examples' indices?

Solution

It sounds like you have a learning-to-rank problem here. You are trying to find the highest-ranked instance out of each group of instances. Learning-to-rank isn't directly supported in scikit-learn right now (scikit-learn pretty much assumes i.i.d. instances), so you'll have to do some extra work. My first suggestion is to drop down a level in the API and use the cross-validation iterators.
These just generate indices for the training and validation folds. You would subset your data with those indices, call fit and predict on the subsets with the Source column removed, and then score the predictions using the Source column.

You can probably hack this into the cross_val_score approach, but it's trickier. In scikit-learn there is a distinction between the score function, which is what you showed above, and the scoring object (which can be a function) taken by cross_val_score. The scoring object is a callable with the signature scorer(estimator, X, y). It looks to me like you can define a scoring object that works for your metric: you just have to remove the Source column before sending the data to the estimator, and then use that column when computing your metric. If you go this route, I think you will have to wrap the classifier, too, so that its fit method skips the Source column.

Hope that helps... Good luck!
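A minimal sketch of the first suggestion (the manual cross-validation loop) might look like the following. The toy data, the use of LogisticRegression as a stand-in classifier, and the choice of GroupKFold (so that examples from the same source stay within one test fold) are my assumptions, not from the original answer; the per-source "winner" metric follows the question's description.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold

# Made-up data in the shape described in the question: a Source column
# plus feature columns, with exactly one true "winner" (label=1) per source.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Source":   [123, 123, 456, 456, 789, 789],
    "Feature1": rng.random(6),
    "Feature2": rng.random(6),
    "Feature3": rng.random(6),
})
y = np.array([1, 0, 1, 0, 0, 1])

# Drop Source before fitting, but keep it around for scoring.
X = df.drop(columns="Source").to_numpy()
groups = df["Source"].to_numpy()

scores = []
cv = GroupKFold(n_splits=3)  # keeps each source inside a single fold
for train_idx, test_idx in cv.split(X, y, groups):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    proba = clf.predict_proba(X[test_idx])[:, 1]

    # For each source in the test fold, the example with the highest
    # predicted probability is the predicted "winner".
    fold = pd.DataFrame({
        "source": groups[test_idx],
        "proba": proba,
        "truth": y[test_idx],
    })
    winners = fold.loc[fold.groupby("source")["proba"].idxmax()]

    # Fraction of sources whose predicted winner is the true winner.
    scores.append(winners["truth"].mean())

print(np.mean(scores))
```

The same grouping logic could be moved into a scorer(estimator, X, y) callable for cross_val_score, as the answer's second suggestion describes, provided the classifier is wrapped so that fit ignores the Source column.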