Transform data to fit a normal distribution

Problem description

I have a rather easy-to-understand question.

I have a set of data and I want to estimate how well it fits a standard normal distribution. To do so, I start with this code:

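[f_p,m_p] = hist(data,128);          % bin counts and bin centers (128 bins)
f_p = f_p/trapz(m_p,f_p);            % normalize so the histogram integrates to 1

x_th = min(data):.001:max(data);     % fine grid over the data range
y_th = normpdf(x_th,0,1);            % standard normal pdf on that grid

figure(1)
bar(m_p,f_p)                         % empirical density of the data
hold on
plot(x_th,y_th,'r','LineWidth',2.5)  % N(0,1) reference curve
grid on
hold off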

Fig. 1 will look like the one below:

It is easy to see that the fit is quite poor, although the bell shape can be spotted. The main problem therefore lies in the variance of my data.
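Since the mismatch is attributed to the variance, the simplest manipulation to try is a linear rescaling (z-score). This sketch is not from the original post and uses made-up exponential data; it illustrates that standardizing fixes the mean and variance but leaves the shape (here, the skewness) untouched, so a nonlinear transform is needed whenever the histogram is not already bell-shaped.

```matlab
% Made-up example data (assumption): a shifted, scaled exponential sample,
% clearly non-Gaussian, with mean about 8 and standard deviation about 3.
n    = 5000;
data = 5 - 3*log(rand(n, 1));

% Z-score: subtract the sample mean, divide by the sample standard deviation.
% For a vector, zscore(data) from the Statistics Toolbox gives the same result.
z = (data - mean(data)) / std(data);

% Location and scale now match N(0,1) ...
fprintf('mean(z) = %.2e, std(z) = %.4f\n', mean(z), std(z));

% ... but the shape does not: skewness is invariant under a linear rescaling.
% mean(z.^3) is an approximate sample skewness (std above uses N-1).
fprintf('approx. skewness of z = %.2f\n', mean(z.^3));
```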

To find out the proper number of occurrences my data bins should have, I do this:

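f_p_th = interp1(x_th,y_th,m_p,'spline','extrap'); % N(0,1) pdf at the bin centers
                                     % (normpdf(m_p,0,1) gives essentially the same values directly)
figure(2)
bar(m_p,f_p_th)                      % bar heights the data "should" have
hold on
plot(x_th,y_th,'r','LineWidth',2.5)  % N(0,1) reference curve
grid on
hold off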

which results in the following figure:

Hence, the question is: how can I scale my data block to match the Gaussian distribution, as in Fig. 2?

CAUTION

I want to stress one point: I don't want to find the best distribution fitting the data. The problem is the reverse: starting from my data, I'd like to manipulate it in such a way that, in the end, its distribution reasonably fits the Gaussian one.

Unfortunately, at the moment I have no real idea how to perform this data "filter", "transform" or "manipulation".

Any support would be welcome.

Solution

Maybe what you are interested in is the rank-based inverse normal transformation. Basically, you rank the data first and then convert the ranks to a normal distribution:

r = tiedrank( data );            % ranks 1..n; tied values get their average rank
p = r / ( length(r) + 1 );       %# +1 to avoid Inf for the max point
newdata = norminv( p, 0, 1 );    % inverse standard normal CDF of the ranks
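The answer's code needs the Statistics and Machine Learning Toolbox (`tiedrank`, `norminv`). As a sketch under the assumption that the toolbox is not available, the same three steps can be written in base MATLAB: average ranks r_i (tied values share the mean of their positions, as with `tiedrank`), p_i = r_i/(n+1), and newdata_i = Φ⁻¹(p_i), using the identity Φ⁻¹(p) = -√2·erfcinv(2p). The exponential input is made-up example data and the variable names are illustrative.

```matlab
% Made-up example data (assumption): a skewed exponential sample.
n    = 5000;
data = -log(rand(n, 1));                     % column vector, non-Gaussian

% 1) Average ranks: tied values share the mean of the positions they occupy
%    (the same convention tiedrank uses).
[sortedData, order] = sort(data);
pos = (1:n)';                                % positions in sorted order
[~, ~, grp] = unique(sortedData);            % group index of each sorted value
avgPos = accumarray(grp, pos, [], @mean);    % mean position per group of ties
rnk = zeros(n, 1);
rnk(order) = avgPos(grp);                    % map ranks back to original order

% 2) Ranks to probabilities strictly inside (0,1); +1 avoids Inf at the max.
p = rnk / (n + 1);

% 3) Inverse standard normal CDF, equivalent to norminv(p, 0, 1).
newdata = -sqrt(2) * erfcinv(2 * p);

fprintf('mean = %.2e, std = %.4f\n', mean(newdata), std(newdata));
```

Because the output depends only on the ordering of `data`, its histogram is close to N(0,1) whatever the input shape; the price is that the original spacing between values is discarded. Rerunning the Fig. 1 code from the question with `newdata` in place of `data` is a quick visual check.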
