围绕JPMML-SkLearn命令行应用程序的薄包装.有关受支持的Scikit-Learn Estimator和Transformer类型的列表,请参阅JPMML-SkLearn项目的文档. a thin wrapper around the JPMML-SkLearn command-line application. For a list of supported Scikit-Learn Estimator and Transformer types, please refer to the documentation of the JPMML-SkLearn project.正如@ user1808924所指出的,它支持Python 2.7或3.4+.它还需要Java 1.7 +As @user1808924 notes, it supports Python 2.7 or 3.4+. It also requires Java 1.7+通过以下方式安装:(需要 git )Installed via: (requires git)pip install git+https://github.com/jpmml/sklearn2pmml.git 如何将分类器树导出到PMML的示例.首先生长该树:# example tree & viz from http://scikit-learn.org/stable/modules/tree.htmlfrom sklearn import datasets, treeiris = datasets.load_iris()clf = tree.DecisionTreeClassifier()clf = clf.fit(iris.data, iris.target) SkLearn2PMML转换有两个部分,一个是估计器(我们的clf),另一个是映射器(用于离散化或PCA等预处理步骤).我们的映射器非常基础,因为我们没有进行任何转换.There are two parts to an SkLearn2PMML conversion, an estimator (our clf) and a mapper (for preprocessing steps such as discretization or PCA). Our mapper is pretty basic, since we are not doing any transformations.from sklearn_pandas import DataFrameMapperdefault_mapper = DataFrameMapper([(i, None) for i in iris.feature_names + ['Species']])from sklearn2pmml import sklearn2pmmlsklearn2pmml(estimator=clf, mapper=default_mapper, pmml="D:/workspace/IrisClassificationTree.pmml")可以(虽然没有记录)通过mapper=None,但是您会看到预测变量名称丢失了(返回x1而不是sepal length等).It is possible (though not documented) to pass mapper=None, but you will see that the predictor names get lost (returning x1 not sepal length etc.).让我们看一下.pmml文件:<?xml version="1.0" encoding="UTF-8" standalone="yes"?><PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3"> <Header> <Application name="JPMML-SkLearn" version="1.1.1"/> <Timestamp>2016-09-26T19:21:43Z</Timestamp> </Header> <DataDictionary> <DataField name="sepal length (cm)" optype="continuous" dataType="float"/> <DataField name="sepal width (cm)" optype="continuous" dataType="float"/> <DataField name="petal length (cm)" optype="continuous" dataType="float"/> <DataField name="petal width (cm)" optype="continuous" dataType="float"/> <DataField name="Species" optype="categorical" dataType="string"> <Value value="setosa"/> <Value value="versicolor"/> <Value value="virginica"/> </DataField> </DataDictionary> <TreeModel functionName="classification" splitCharacteristic="binarySplit"> <MiningSchema> <MiningField name="Species" usageType="target"/> <MiningField name="sepal length (cm)"/> <MiningField name="sepal width (cm)"/> <MiningField name="petal length (cm)"/> <MiningField name="petal width (cm)"/> </MiningSchema> <Output> <OutputField name="probability_setosa" dataType="double" feature="probability" value="setosa"/> <OutputField name="probability_versicolor" dataType="double" feature="probability" value="versicolor"/> <OutputField name="probability_virginica" dataType="double" feature="probability" value="virginica"/> </Output> <Node id="1"> <True/> <Node id="2" score="setosa" recordCount="50.0"> <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="0.8"/> <ScoreDistribution value="setosa" recordCount="50.0"/> <ScoreDistribution value="versicolor" recordCount="0.0"/> <ScoreDistribution value="virginica" recordCount="0.0"/> </Node> <Node id="3"> <SimplePredicate field="petal width (cm)" operator="greaterThan" value="0.8"/> <Node id="4"> <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.75"/> <Node id="5"> <SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.95"/> <Node id="6" score="versicolor" recordCount="47.0"> <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.6500001"/> <ScoreDistribution value="setosa" recordCount="0.0"/> <ScoreDistribution value="versicolor" recordCount="47.0"/> <ScoreDistribution value="virginica" recordCount="0.0"/> </Node> <Node id="7" score="virginica" recordCount="1.0"> <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.6500001"/> <ScoreDistribution value="setosa" recordCount="0.0"/> <ScoreDistribution value="versicolor" recordCount="0.0"/> <ScoreDistribution value="virginica" recordCount="1.0"/> </Node> </Node> <Node id="8"> <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.95"/> <Node id="9" score="virginica" recordCount="3.0"> <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.55"/> <ScoreDistribution value="setosa" recordCount="0.0"/> <ScoreDistribution value="versicolor" recordCount="0.0"/> <ScoreDistribution value="virginica" recordCount="3.0"/> </Node> <Node id="10"> <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.55"/> <Node id="11" score="versicolor" recordCount="2.0"> <SimplePredicate field="sepal length (cm)" operator="lessOrEqual" value="6.95"/> <ScoreDistribution value="setosa" recordCount="0.0"/> <ScoreDistribution value="versicolor" recordCount="2.0"/> <ScoreDistribution value="virginica" recordCount="0.0"/> </Node> <Node id="12" score="virginica" recordCount="1.0"> <SimplePredicate field="sepal length (cm)" operator="greaterThan" value="6.95"/> <ScoreDistribution value="setosa" recordCount="0.0"/> <ScoreDistribution value="versicolor" recordCount="0.0"/> <ScoreDistribution value="virginica" recordCount="1.0"/> </Node> </Node> </Node> </Node> <Node id="13"> <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.75"/> <Node id="14"> <SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.8500004"/> <Node id="15" score="virginica" recordCount="2.0"> <SimplePredicate field="sepal width (cm)" operator="lessOrEqual" value="3.1"/> <ScoreDistribution value="setosa" recordCount="0.0"/> <ScoreDistribution value="versicolor" recordCount="0.0"/> <ScoreDistribution value="virginica" recordCount="2.0"/> </Node> <Node id="16" score="versicolor" recordCount="1.0"> <SimplePredicate field="sepal width (cm)" operator="greaterThan" value="3.1"/> <ScoreDistribution value="setosa" recordCount="0.0"/> <ScoreDistribution value="versicolor" recordCount="1.0"/> <ScoreDistribution value="virginica" recordCount="0.0"/> </Node> </Node> <Node id="17" score="virginica" recordCount="43.0"> <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.8500004"/> <ScoreDistribution value="setosa" recordCount="0.0"/> <ScoreDistribution value="versicolor" recordCount="0.0"/> <ScoreDistribution value="virginica" recordCount="43.0"/> </Node> </Node> </Node> </Node> </TreeModel></PMML>第一个分割(节点1)的花瓣宽度为0.8.节点2(花瓣宽度The first split (Node 1) is on petal width at 0.8. Node 2 (petal width <= 0.8) captures all of the setosa, with nothing else.您可以将pmml输出与graphviz输出进行比较:You can compare the pmml output to the graphviz output:from sklearn.externals.six import StringIOimport pydotplus # this might be pydot for python 2.7dot_data = StringIO()tree.export_graphviz(clf, out_file=dot_data, feature_names=iris.feature_names, class_names=iris.target_names, filled=True, rounded=True, special_characters=True)graph = pydotplus.graph_from_dot_data(dot_data.getvalue())graph.write_pdf("D:/workspace/iris.pdf")# for in-line display, you can also do:# from IPython.display import Image# Image(graph.create_png()) 这篇关于将python scikit学习模型导出到pmml的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持! 上岸,阿里云!
08-13 18:52