Problem description
I want the correlations between individual variables and principal components in Python. I am using PCA in sklearn. I don't understand how I can obtain the loading matrix after I have decomposed my data. My code is here:
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
data, y = iris.data, iris.target
pca = PCA(n_components=2)
transformed_data = pca.fit(data).transform(data)
eigenValues = pca.explained_variance_ratio_
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html doesn't mention how to achieve this.
Recommended answer
I think that @RickardSjogren is describing the eigenvectors, while @BigPanda is giving the loadings. There's a big difference: Loadings vs eigenvectors in PCA: when to use one or another?.
I use the loadings approach. Loadings, as given by pca.components_ * np.sqrt(pca.explained_variance_), are more analogous to coefficients in a multiple linear regression. I don't use .T here because in the PCA class linked above, the components are already transposed. numpy.linalg.svd produces u, s, and vt, where vt is the Hermitian transpose, so you first need to recover v with vt.T.
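Putting this together for the iris example above, a minimal sketch (assuming scikit-learn's standard PCA class, whose components_ array has shape (n_components, n_features), so here the transpose *is* needed):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

data = load_iris().data  # shape (150, 4)

pca = PCA(n_components=2)
pca.fit(data)

# Loadings: eigenvectors scaled by the standard deviation of each component.
# sklearn stores components_ as (n_components, n_features), so transpose to
# get one row per original variable.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
print(loadings.shape)  # (4, 2): one row per variable, one column per PC
```

If the variables are standardized to unit variance before fitting, these loadings are (up to a small ddof factor) the correlations between each variable and each principal component, which is what the question asks for.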
There is also one other important detail: the signs (positive/negative) on the components and loadings in sklearn.PCA may differ from packages such as R. More on that here:
In sklearn.decomposition.PCA, why are components_ negative?.
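The sign of each component is mathematically arbitrary (an eigenvector multiplied by -1 spans the same axis), so one way to make results comparable across packages is to adopt a fixed convention. The sketch below flips each component so its largest-magnitude coefficient is positive; this is just one illustrative convention, not something sklearn does for you:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

pca = PCA(n_components=2).fit(load_iris().data)
components = pca.components_.copy()

# Flip any component whose largest-magnitude coefficient is negative,
# making the signs deterministic regardless of the underlying SVD.
for i in range(components.shape[0]):
    j = np.abs(components[i]).argmax()
    if components[i, j] < 0:
        components[i] = -components[i]
```

If you flip a component, remember to also flip the corresponding column of the transformed scores, so the product of scores and components still reconstructs the centered data.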