Is it possible to add covariates (controlling for nuisance variables) to an SVM model?

Question

I'm very new to machine learning and Python, and I'm trying to build a model to predict patients (N=200) vs. controls (N=200) from structural neuroimaging data. After the initial preprocessing, in which I reshaped the neuroimaging data into a 2D array, I built the following model:

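import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# img: 2D array (subjects x features) and labels: patient/control labels,
# both produced by the preprocessing step described above
svc = SVC(C=1.0, kernel='linear')

# Tune C over a grid starting at 0.1 in steps of 0.1, with 10-fold cross-validation
k_range = np.arange(0.1, 10, 0.1)
param_grid = dict(C=k_range)
grid = GridSearchCV(svc, param_grid, cv=10, scoring='accuracy')
grid.fit(img, labels)
grid.cv_results_  # per-candidate scores (replaces the removed grid_scores_)
print(grid.best_score_)
print(grid.best_params_)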

This gives me a decent result, but I'd like to control for the fact that different images were acquired with different scanners (e.g. subjects 1 through 150 were scanned with scanner 1, subjects 101 through 300 were scanned with scanner 2 and subjects 301 through 400 were scanned with scanner 3). Is there any way this could be added to the model above?

I read that doing feature selection beforehand might help. However, I don't want to simply extract meaningful features when those features might be related to the scanner. In fact, I want to classify patients and controls NOT based on the scanner (i.e. controlling for scanner).

Any thoughts on this would be appreciated. Thank you.

Recommended answer

For diagnostics, you could look at how your data are distributed per scanner to see whether the direction you are pursuing is promising. Normalization (e.g., of the mean and variance per scanner) is one option, as someone already suggested. Another option is to add 3 dimensions to your feature set as a one-hot encoding of the scanner used (i.e., each example has a 1 in the position of its scanner and 0 in the others).
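The two options in the answer can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic stand-ins for `img` and `labels`, the `scanner` array and its group sizes are made up, and the helper names `standardize_per_scanner` and `one_hot_scanner` are hypothetical, not part of scikit-learn.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-ins for the asker's data (assumption): 400 subjects, 50 features.
rng = np.random.default_rng(0)
n_subjects, n_features = 400, 50
labels = rng.integers(0, 2, size=n_subjects)      # 0 = control, 1 = patient
scanner = np.repeat([0, 1, 2], [100, 200, 100])   # hypothetical scanner IDs and sizes
img = rng.normal(size=(n_subjects, n_features))
img += labels[:, None] * 0.3                      # weak group effect
img += scanner[:, None] * 2.0                     # strong scanner offset


def standardize_per_scanner(X, scanner_ids):
    """Z-score every feature separately within each scanner."""
    X_out = np.empty_like(X, dtype=float)
    for s in np.unique(scanner_ids):
        rows = scanner_ids == s
        mean = X[rows].mean(axis=0)
        std = X[rows].std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant features
        X_out[rows] = (X[rows] - mean) / std
    return X_out


def one_hot_scanner(scanner_ids):
    """Return an (n_subjects, n_scanners) 0/1 indicator matrix."""
    levels = np.unique(scanner_ids)
    return (scanner_ids[:, None] == levels[None, :]).astype(float)


# Diagnostic: compare the overall feature level per scanner.
for s in np.unique(scanner):
    print(f"scanner {s}: mean feature value = {img[scanner == s].mean():.2f}")

# Option 1: remove per-scanner mean and variance differences.
img_norm = standardize_per_scanner(img, scanner)

# Option 2: append one indicator column per scanner to the features.
img_aug = np.hstack([img, one_hot_scanner(scanner)])

# Either matrix can replace `img` in the grid search (small grid for brevity).
grid = GridSearchCV(SVC(kernel="linear"), {"C": [0.1, 1.0, 10.0]},
                    cv=5, scoring="accuracy")
grid.fit(img_norm, labels)
print(grid.best_params_, grid.best_score_)
```

Two caveats apply to this sketch. Computing the per-scanner statistics on the full data set before cross-validation is a simplification; a stricter setup would estimate them on the training folds only. Also, if the proportion of patients differs between scanners, per-scanner standardization can remove part of the genuine group difference along with the scanner effect.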
