Spark ML pipeline logistic regression produces worse predictions than R GLM

Problem description

I ran a logistic regression model using an ML pipeline, but for some reason I got worse results than R. I did some research, and the only post I found related to this problem is this one. It seems that R's glm function uses maximum likelihood. The Spark model only got 71.3% of the records right, while R was able to correctly predict 95.55% of the cases. I was wondering if I did something wrong in the setup, and whether there is a way to improve the prediction. Below are my Spark code and R code -

Spark code

Partial model_input
label,age,sex,Q1,Q2,Q3,Q4,Q5,DET_AGE_SQ
1.0,39,0,0,1,0,0,1,31.55709342560551
1.0,54,0,0,0,0,0,0,83.38062283737028
0.0,51,0,1,1,1,0,0,35.61591695501733

def trainModel(df: DataFrame): PipelineModel = {
  val lr = new LogisticRegression().setMaxIter(100000).setTol(0.0000000000000001)
  val pipeline = new Pipeline().setStages(Array(lr))
  pipeline.fit(df)
}

val meta = NominalAttribute.defaultAttr.withName("label").withValues(Array("a", "b")).toMetadata

val assembler = new VectorAssembler().
  setInputCols(Array("age", "sex", "DET_AGE_SQ",
    "QA1", "QA2", "QA3", "QA4", "QA5")).
  setOutputCol("features")

val model = trainModel(model_input)
val pred = model.transform(model_input)
pred.filter("label != prediction").count

R code

lr <- model_input %>% glm(data = ., formula = label ~ age + sex + Q1 + Q2 + Q3 + Q4 + Q5 + DET_AGE_SQ,
        family = binomial)
pred <- data.frame(y = model_input$label, p = fitted(lr))
table(pred$y, pred$p > 0.5)

Feel free to let me know if you need any other information. Thanks!

Edit 9/18/2015: I tried increasing the maximum iterations and decreasing the tolerance dramatically. Unfortunately, it did not improve the prediction. It seems the model converged to a local minimum instead of the global minimum.


Solution

Minimization of a loss function is pretty much the definition of linear models, and both glm and ml.classification.LogisticRegression are no different here. The fundamental difference between the two is the way this minimization is achieved.

All linear models from ML/MLlib are based on some variant of gradient descent. The quality of a model generated this way varies on a case-by-case basis and depends on the gradient descent and regularization parameters.
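To make the sensitivity concrete, here is a minimal sketch (in Python/NumPy for brevity, not the actual MLlib implementation): plain batch gradient descent on the logistic loss, where the step size and iteration count alone decide how close the result gets to the optimum.

```python
import numpy as np

def fit_logistic_gd(X, y, lr=0.1, iters=1000):
    """Plain batch gradient descent on the mean logistic loss."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        grad = X.T @ (p - y) / len(y)      # gradient of the mean log-loss
        w -= lr * grad
    return w

def logloss(w, X, y):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Toy data: an intercept column plus one noisy feature.
rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.normal(size=200)]
y = ((X[:, 1] + rng.normal(scale=0.5, size=200)) > 0).astype(float)

# Same algorithm, different settings: a short, timid run barely moves
# from the zero starting point, while a longer run with a larger step
# size gets much closer to the optimum.
w_short = fit_logistic_gd(X, y, lr=0.01, iters=10)
w_long = fit_logistic_gd(X, y, lr=0.5, iters=5000)
print(logloss(w_short, X, y), logloss(w_long, X, y))
```

The loss of the long run is far below that of the short run on the same data, even though both ran the exact same update rule.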

R, on the other hand, computes an exact solution which, given its time complexity, is not well suited for large datasets.
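R's glm fits logistic regression with iteratively reweighted least squares (Fisher scoring), i.e. Newton's method: each step solves a full linear system, which is what makes it both very accurate and expensive on wide or huge data. A sketch of that idea (again in Python/NumPy, not R's actual implementation):

```python
import numpy as np

def fit_logistic_irls(X, y, iters=8):
    """Newton's method / IRLS for logistic regression: each step solves
    a weighted least-squares problem using the exact gradient and Hessian."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        W = p * (1.0 - p)                 # IRLS weights, one per row
        H = X.T @ (X * W[:, None])        # d x d Hessian: O(n * d^2) to build
        g = X.T @ (p - y)                 # gradient
        w -= np.linalg.solve(H, g)        # O(d^3) linear solve per step
    return w

# Same style of toy data as before.
rng = np.random.default_rng(0)
X = np.c_[np.ones(200), rng.normal(size=200)]
y = ((X[:, 1] + rng.normal(scale=0.5, size=200)) > 0).astype(float)

w = fit_logistic_irls(X, y, iters=8)
# At the optimum the gradient of the log-likelihood is (numerically) zero.
grad_norm = np.linalg.norm(X.T @ ((1.0 / (1.0 + np.exp(-X @ w))) - y))
print(grad_norm)
```

A handful of Newton steps drive the gradient essentially to zero, but each step touches every row and solves a d-by-d system, which is the time-complexity cost the answer refers to.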

As I've mentioned above, the quality of a model generated using GD depends on the input parameters, so the typical way to improve it is to perform hyperparameter optimization. Unfortunately, the ML version is rather limited here compared to MLlib, but for starters you can increase the number of iterations.
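As a language-neutral sketch of what hyperparameter optimization means here (hypothetical toy data, Python rather than Spark): try each combination of step size and iteration budget and keep whichever gives the lowest loss on held-out data.

```python
import numpy as np
from itertools import product

def fit_gd(X, y, lr, iters):
    """Batch gradient descent on the mean logistic loss."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def logloss(w, X, y):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

rng = np.random.default_rng(1)
X = np.c_[np.ones(400), rng.normal(size=400)]
y = ((X[:, 1] + rng.normal(scale=0.5, size=400)) > 0).astype(float)
X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

# Grid over (step size, iteration budget); keep the best held-out loss.
grid = product([0.01, 0.1, 0.5], [100, 2000])
best_lr, best_iters = min(grid, key=lambda g: logloss(fit_gd(X_tr, y_tr, *g), X_te, y_te))
print("best settings:", best_lr, best_iters)
```

In Spark itself the analogous machinery is ParamGridBuilder plus CrossValidator from org.apache.spark.ml.tuning over the pipeline; setMaxIter and setTol, which the question already adjusts, are among the parameters worth searching.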

