
Problem Description

I am trying to perform a PLSRegression using the code from sklearn, and I want to keep only those components that explain a certain level of variance, as in PCA.

Is there a way to know how much variance is explained by each component in PLS?
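For context, PCA exposes this directly through its explained_variance_ratio_ attribute, while PLSRegression has no equivalent attribute, which is what prompts the question. A minimal sketch, using synthetic data purely for illustration:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))          # synthetic data, just to have something to fit

pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)   # fraction of X's variance captured by each component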

Solution

I had the same requirement of calculating each component's explained variance. I am new to PLS and not a native English speaker, so please take my solution just as a reference.

Background: if you choose 'regression' as the 'deflation_mode' (the default option), the estimated Y can be computed from this expression in PLSRegression [1]:

Y = TQ' + Err

where T is x_scores_ and Q is y_loadings_. This expression gives the estimated Y from all of the components. So, to find out how much variance the first component explains, we can use the first column of x_scores_ and y_loadings_ to calculate the estimated Y1:

Y1 = T[:, 0] Q[:, 0]' + Err
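To make the shapes concrete: T = x_scores_ has shape (n_samples, n_components) and Q = y_loadings_ has shape (n_targets, n_components), so one component's contribution is the outer product of the corresponding columns. A small sketch with made-up dimensions (not part of the original answer):

import numpy as np

n_samples, n_targets, n_components = 100, 2, 3   # arbitrary illustration sizes
rng = np.random.default_rng(0)
T = rng.normal(size=(n_samples, n_components))   # stands in for pls.x_scores_
Q = rng.normal(size=(n_targets, n_components))   # stands in for pls.y_loadings_

Y1_scaled = np.outer(T[:, 0], Q[:, 0])           # first component only
Y_scaled = T @ Q.T                               # all components together
print(Y1_scaled.shape, Y_scaled.shape)           # both (n_samples, n_targets)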

Please see the Python code below, which calculates each component's R squared in this way.
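The snippet assumes X and Y_true are already defined. Purely to make it runnable end to end, a small synthetic data set could be generated first; the shapes, noise level, and seed below are my own arbitrary choices, not part of the original answer:

import numpy as np

rng = np.random.RandomState(42)
X = rng.normal(size=(200, 10))                        # 200 samples, 10 predictors
coef = rng.normal(size=(10, 2))                       # linear map to 2 targets
Y_true = X @ coef + 0.5 * rng.normal(size=(200, 2))   # add a little noise

With X and Y_true in place, the calculation itself is: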

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score

pls = PLSRegression(n_components=3)
pls.fit(X, Y_true)

r2_sum = 0
for i in range(pls.n_components):
    # Rank-one estimate of Y from component i alone (scores times loadings),
    # mapped back from the internally scaled space to the original scale of Y.
    Y_pred = (np.dot(pls.x_scores_[:, i].reshape(-1, 1),
                     pls.y_loadings_[:, i].reshape(-1, 1).T)
              * Y_true.std(axis=0, ddof=1) + Y_true.mean(axis=0))
    r2 = round(r2_score(Y_true, Y_pred), 3)
    r2_sum += r2
    print('R2 for component %d: %g' % (i + 1, r2))

print('R2 for all components (sum of the above): %g' % r2_sum)
print('R2 for all components (from pls.predict): %g'
      % round(r2_score(Y_true, pls.predict(X)), 3))

Output:

R2 for component 1: 0.633
R2 for component 2: 0.221
R2 for component 3: 0.104
R2 for all components (sum of the above): 0.958
R2 for all components (from pls.predict): 0.958

[1] Please be aware of this expression: the terminology and the values of 'score', 'weight', and 'loading' may differ slightly between calculation methods.
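As a quick consistency check on the convention used here (reusing pls, X, and Y_true from the snippet above), the reconstruction from all components should agree with pls.predict(X) on the training data, which is also what the two identical R2 values above suggest:

import numpy as np

# Reconstruct Y from all score/loading columns and map it back to Y's original scale.
Y_rec = pls.x_scores_ @ pls.y_loadings_.T * Y_true.std(axis=0, ddof=1) + Y_true.mean(axis=0)
print(np.allclose(Y_rec, pls.predict(X)))   # expected: True (up to numerical precision)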
