How to include a blocking factor in makeClassifTask() from the mlr package?

Problem description

In some classification tasks using the mlr package, I need to deal with a data.frame similar to this one:

set.seed(pi)
# Dummy data frame
df <- data.frame(
   # Repeated values ID
   ID = sort(sample(c(0:20), 100, replace = TRUE)),
   # Some variables
   X1 = runif(10, 1, 10),
   # Some Label
   Label = sample(c(0,1), 100, replace = TRUE)
   )
df 

I need to cross-validate the model while keeping all rows with the same ID together (in the same fold). I know from the tutorial that this is possible:

https://mlr-org.github.io/mlr-tutorial/release/html/task/index.html#further-settings

The question is: how can I include this blocking factor in makeClassifTask()?

Unfortunately, I couldn't find any examples.

Recommended answer

Which version of mlr do you have? Blocking has been part of mlr for a while: it is available directly as the blocking argument of makeClassifTask(), which takes a factor with one entry per row of the task data.

Here is an example using your data:

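library(mlr)

# The blocking factor must be a factor with one entry per row
df$ID = as.factor(df$ID)
df2 = df
# Drop ID from the features so the model does not train on it
df2$ID = NULL
# Classification targets must be factors
df2$Label = as.factor(df$Label)
tsk = makeClassifTask(data = df2, target = "Label", blocking = df$ID)
res = resample("classif.rpart", tsk, resampling = cv10)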

# Check that blocking worked: in each of the 10 folds, no ID level
# should appear in both the training set and the test set
lapply(1:10, function(i) {
  blocks.training = df$ID[res$pred$instance$train.inds[[i]]]
  blocks.testing = df$ID[res$pred$instance$test.inds[[i]]]
  intersect(blocks.testing, blocks.training)
})
# All entries are empty, so blocking indeed works!
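To illustrate what blocked cross-validation does conceptually, here is a minimal base-R sketch. It does not use mlr and is not mlr's internal implementation; the helper blocked_folds() is hypothetical. Each ID level is assigned as a whole to one of k folds, so every row sharing an ID lands in the same test set, and the same overlap check as above then finds nothing.

```r
# Hypothetical helper, not part of mlr: blocked k-fold CV in base R.
# Every level of the blocking factor is assigned to exactly one fold,
# so all rows that share an ID end up in the same test set.
blocked_folds <- function(blocks, k = 10) {
  blocks <- droplevels(as.factor(blocks))
  lv <- levels(blocks)
  # Randomly spread the ID levels (not the rows) over k folds
  fold_of_level <- sample(rep_len(seq_len(k), length(lv)))
  names(fold_of_level) <- lv
  # Each row inherits the fold of its ID level
  fold_of_row <- fold_of_level[as.character(blocks)]
  lapply(seq_len(k), function(i) {
    list(train = which(fold_of_row != i), test = which(fold_of_row == i))
  })
}

set.seed(pi)
ID <- sort(sample(0:20, 100, replace = TRUE))
folds <- blocked_folds(ID, k = 10)

# Number of IDs shared between training and test set, per fold
overlaps <- sapply(folds, function(f) length(intersect(ID[f$train], ID[f$test])))
print(overlaps)
```

Assigning levels rather than rows to folds is the whole point of blocking; the price is that fold sizes are uneven when some IDs have many more rows than others.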


09-25 07:10