This post covers how to handle the error raised when exporting a caret random forest model to PMML.

Problem description

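I would like to export a caret random forest model using the pmml library so that I can use it for predictions in Java. Here is a reproduction of the error I am getting.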

library(caret)
library(pmml)
data(iris)

# Placeholder values: the original post does not define these constants.
NUMBER_OF_CV    <- 10
REPEATES        <- 3
NUMBER_OF_TREES <- 100

rfGrid2 <- expand.grid(.mtry = c(1, 2))
fitControl2 <- trainControl(
  method = "repeatedcv",
  number = NUMBER_OF_CV,
  repeats = REPEATES)

model.Test <- train(Species ~ .,
  data = iris,
  method = "rf",
  trControl = fitControl2,
  ntree = NUMBER_OF_TREES,
  importance = TRUE,
  tuneGrid = rfGrid2)

print(model.Test)
pmml(model.Test)  # this call raises the error shown below

Error in UseMethod("pmml") :
  no applicable method for 'pmml' applied to an object of class "c('train', 'train.formula')"

I searched for a while and found very little information about exporting to PMML in general. The pmml library does list a randomForest method:

methods(pmml)
 [1] pmml.ada          pmml.coxph        pmml.cv.glmnet    pmml.glm          pmml.hclust       pmml.itemsets     pmml.kmeans
 [8] pmml.ksvm         pmml.lm           pmml.multinom     pmml.naiveBayes   pmml.nnet         pmml.randomForest pmml.rfsrc
[15] pmml.rpart        pmml.rules        pmml.svm

It works with a model fitted directly by randomForest, but not with the one trained through caret.

library(randomForest)
iris.rf <- randomForest(Species ~ ., data=iris, ntree=20)
# Convert to pmml
pmml(iris.rf)
# this works!!!
str(iris.rf)

List of 19
 $ call           : language randomForest(formula = Species ~ ., data = iris, ntree = 20)
 $ type           : chr "classification"
 $ predicted      : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
...

str(model.Test)
List of 22
 $ method      : chr "rf"
 $ modelInfo   :List of 14
  ..$ label     : chr "Random Forest"
  ..$ library   : chr "randomForest"
  ..$ loop      : NULL
  ..$ type      : chr [1:2] "Classification" "Regression"
...
Solution

You cannot invoke the pmml method on an object of class train or train.formula (that is the class of your model.Test object).
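The error comes from S3 dispatch: pmml is a generic, and no method is registered for either class of the wrapper object. Here is a minimal base-R sketch of the same situation. The generic export_model and the stand-in objects are hypothetical, invented only for illustration; they are not part of caret or pmml.

```r
# Hypothetical generic that dispatches on class, the same way pmml() does.
export_model <- function(model, ...) UseMethod("export_model")

# A method exists only for the inner model class.
export_model.randomForest <- function(model, ...) "exported"

# Stand-in for a caret result: a wrapper whose classes have no method,
# holding an inner object whose class does.
wrapper <- structure(
  list(finalModel = structure(list(), class = "randomForest")),
  class = c("train", "train.formula"))

# Dispatch on the wrapper fails; capture the message instead of stopping.
res <- tryCatch(export_model(wrapper),
                error = function(e) conditionMessage(e))
print(res)

# Dispatch on the inner object finds the randomForest method.
inner <- export_model(wrapper$finalModel)
print(inner)
```

This only shows why the call fails on the wrapper. It does not say whether the real pmml method accepts the inner object.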

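The caret documentation for the train method says that you can access the best model through the finalModel field. You can then invoke the pmml method on that object.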

rf = model.Test$finalModel
pmml(rf)

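Unfortunately, it turns out that caret fits the RF model through the "matrix interface" (i.e. by setting the x and y fields), not through the more common "formula interface" (i.e. by setting the formula field). AFAIK, the "pmml" package does not support the export of such RF models.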
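To see the difference between the two interfaces without caret, here is a small sketch that fits the same forest both ways with randomForest and compares the resulting objects. The variable names are illustrative. Whether the missing terms element is what actually blocks the export is an assumption and is not verified here.

```r
library(randomForest)
data(iris)
set.seed(1)  # arbitrary seed, only for repeatability

# Formula interface: the fitted object records the model terms.
rf_formula <- randomForest(Species ~ ., data = iris, ntree = 20)

# Matrix (x/y) interface: the same data, passed as predictors and response.
rf_xy <- randomForest(x = iris[, 1:4], y = iris$Species, ntree = 20)

print(class(rf_formula))  # includes "randomForest.formula"
print(class(rf_xy))       # no formula class

# Only the formula fit carries a terms element.
has_terms_formula <- !is.null(rf_formula[["terms"]])
has_terms_xy      <- !is.null(rf_xy[["terms"]])
print(c(formula = has_terms_formula, xy = has_terms_xy))
```

The same checks can be run on model.Test$finalModel to confirm which interface caret used.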

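So it looks like your best option is a two-stage approach. First, use the caret package to find the most appropriate RF parametrization for your dataset. Second, train the final RF model manually through the "formula interface" with this parametrization.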
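The two stages might look like the sketch below. The resampling settings, grid, tree count, seed and output file name are all illustrative assumptions rather than values from the question. Writing the file with XML::saveXML assumes that pmml() returns an XML node object, as the pmml documentation describes.

```r
library(caret)
library(randomForest)
library(pmml)

data(iris)
set.seed(42)  # arbitrary seed so the resampling is repeatable

# Stage 1: let caret search over mtry (settings are illustrative).
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 2)
tuned <- train(Species ~ ., data = iris,
               method = "rf",
               trControl = ctrl,
               ntree = 50,
               tuneGrid = expand.grid(mtry = c(1, 2)))
best_mtry <- tuned$bestTune$mtry

# Stage 2: refit through the formula interface with the selected mtry.
final_rf <- randomForest(Species ~ ., data = iris,
                         mtry = best_mtry, ntree = 50)

# Export the refitted model; this object has a formula, unlike finalModel.
final_pmml <- pmml(final_rf)

# Write the PMML document to disk so a Java consumer can load it.
pmml_path <- file.path(tempdir(), "iris_rf.pmml")
XML::saveXML(final_pmml, file = pmml_path)
```

Note that final_rf is a fresh fit, so its trees will differ from those in tuned$finalModel even though mtry and ntree match.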

08-13 19:22