Why can vector normalization improve the accuracy of clustering and classification?

Question

It is described in Mahout in Action that normalization can slightly improve the accuracy. Can anyone explain the reason? Thanks!

Recommended answer

Normalization is not always required, but it rarely hurts.

Some examples:

K-means:

Example in Matlab:

X = [randn(100,2)+ones(100,2);...
     randn(100,2)-ones(100,2)];

% Introduce denormalization: uncomment the next line to rescale the second
% feature and compare the resulting clusters
% X(:, 2) = X(:, 2) * 1000 + 500;

opts = statset('Display','final');

[idx,ctrs] = kmeans(X,2,...
                    'Distance','city',...
                    'Replicates',5,...
                    'Options',opts);

plot(X(idx==1,1),X(idx==1,2),'r.','MarkerSize',12)
hold on
plot(X(idx==2,1),X(idx==2,2),'b.','MarkerSize',12)
plot(ctrs(:,1),ctrs(:,2),'kx',...
     'MarkerSize',12,'LineWidth',2)
plot(ctrs(:,1),ctrs(:,2),'ko',...
     'MarkerSize',12,'LineWidth',2)
legend('Cluster 1','Cluster 2','Centroids',...
       'Location','NW')
title('K-means with normalization')
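
To see why the commented-out rescaling matters, here is a minimal sketch (not part of the original answer) that measures how much each feature contributes to the city-block distance from the data mean, before and after z-scoring. The names `Xraw`, `Z`, `shareRaw` and `shareZ` are made up for this illustration, and only base functions are used.

```matlab
% Same two-blob data as above, with the second feature rescaled.
X = [randn(100,2)+ones(100,2); randn(100,2)-ones(100,2)];
Xraw = X;
Xraw(:,2) = Xraw(:,2) * 1000 + 500;   % the "denormalization" step

% z-score each column: subtract its mean, divide by its standard deviation.
mu    = mean(Xraw, 1);
sigma = std(Xraw, 0, 1);
Z = bsxfun(@rdivide, bsxfun(@minus, Xraw, mu), sigma);

% Per-feature absolute deviation from the data mean (Z already has mean 0).
dRaw = abs(bsxfun(@minus, Xraw, mu));
dZ   = abs(Z);

% Share of the total city-block distance contributed by each feature.
shareRaw = sum(dRaw, 1) / sum(dRaw(:));
shareZ   = sum(dZ, 1)   / sum(dZ(:));

disp(shareRaw)   % the second feature carries almost all of the distance
disp(shareZ)     % both features contribute on a comparable scale
```

With the raw data, a distance-based method such as k-means effectively sees only the second feature; after z-scoring, both features influence the assignment.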

(FYI: How can I detect if my dataset is clustered or unclustered (i.e. forming one single cluster)?)

Distributed clustering:

Artificial neural network (inputs):

Artificial neural network (inputs/outputs):

Standardizing either input or target variables tends to make the training process better behaved by improving the numerical condition (see ftp://ftp.sas.com/pub/neural/illcond/illcond.html) of the optimization problem and ensuring that various default values involved in initialization and termination are appropriate. Standardizing targets can also affect the objective function.
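
As a rough illustration of "numerical condition", the sketch below uses a plain linear unit with squared error as a stand-in for a neural network (an assumption made for simplicity, not something the quoted text specifies). It compares the condition number of the matrix that governs the optimization before and after standardizing the inputs. The data and variable names are invented for the example.

```matlab
% Two inputs on very different scales and with large offsets (made-up data).
n = 200;
X = [randn(n,1) + 100, 1000 * randn(n,1) + 500];

% Standardize each input column to zero mean and unit standard deviation.
Z = bsxfun(@rdivide, bsxfun(@minus, X, mean(X,1)), std(X,0,1));

% For a linear unit with a bias term and squared error, the Hessian is
% proportional to A' * A, where A is the input matrix with a column of ones.
Araw = [ones(n,1), X];
Astd = [ones(n,1), Z];

condRaw = cond(Araw' * Araw);
condStd = cond(Astd' * Astd);

fprintf('condition number, raw inputs:          %g\n', condRaw);
fprintf('condition number, standardized inputs: %g\n', condStd);
```

A large condition number means the error surface is stretched far more in some directions than in others, which is what makes gradient-based training slow or unstable.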

Standardization of cases should be approached with caution because it discards information. If that information is irrelevant, then standardizing cases can be quite helpful. If that information is important, then standardizing cases can be disastrous.
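
A small sketch of what "discards information" can mean here, assuming that standardizing a case means scaling each row to unit Euclidean length (one common choice, and not the only one). The values are made up.

```matlab
% Each row is one case. Rows 1 and 2 point in the same direction but
% differ tenfold in magnitude; row 3 points in a different direction.
X = [1 2; 10 20; 2 1];

% Normalize every case (row) to unit Euclidean length.
rowNorm = sqrt(sum(X.^2, 2));
U = bsxfun(@rdivide, X, rowNorm);

disp(U)
% Rows 1 and 2 of U now coincide: the magnitude difference is gone.
% Row 3 stays distinct because its direction differs.
```

If magnitude is noise (for example, document length in text data), this helps; if magnitude is the signal, the two cases can no longer be told apart.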


Interestingly, changing the measurement units may even lead one to see a very different clustering structure:

To avoid this dependence on the choice of measurement units, one has the option of standardizing the data. This converts the original measurements to unitless variables.
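
The unit independence can be checked directly. The following sketch, using made-up heights and weights and a hypothetical helper `zs`, standardizes the same data with height given once in meters and once in centimeters.

```matlab
% Heights in meters and weights in kilograms (made-up values).
X_m = [1.60 55; 1.75 70; 1.82 90; 1.68 62];

% The same data with height expressed in centimeters instead.
X_cm = X_m;
X_cm(:,1) = X_cm(:,1) * 100;

% Column-wise z-score, written with base functions only.
zs = @(A) bsxfun(@rdivide, bsxfun(@minus, A, mean(A,1)), std(A,0,1));

% Largest difference between the two standardized data sets.
maxDiff = max(max(abs(zs(X_m) - zs(X_cm))));
disp(maxDiff)   % at rounding-error level: the unit choice no longer matters
```

Any distance computed on the standardized values is therefore the same whichever unit was used for the raw measurements.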

Kaufman et al. continue with some interesting considerations (page 11):
