This article describes how to transform data so that it fits a normal distribution; it may serve as a useful reference for readers facing the same problem.

Problem description

I have a rather easy-to-understand question.

I have a set of data and I want to estimate how well this data fits a standard normal distribution. To do so, I start with this code:

[f_p,m_p] = hist(data,128);
f_p = f_p/trapz(m_p,f_p);

x_th = min(data):.001:max(data);
y_th = normpdf(x_th,0,1);

figure(1)
bar(m_p,f_p)
hold on
plot(x_th,y_th,'r','LineWidth',2.5)
grid on
hold off
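As a side note on the plot above, here is a minimal sketch of the same comparison using `histogram` with `'Normalization','pdf'`, which scales the bars to unit area without the manual `trapz` step. It assumes a MATLAB release that provides `histogram` (R2014b or later) and the Statistics Toolbox for `normpdf`; the `data` vector is a synthetic stand-in, not the poster's data.

```matlab
% Assumption: synthetic stand-in for the poster's data vector.
rng(0);
data = 3*randn(5000,1);

% Density-normalised histogram: the bar areas sum to 1.
h = histogram(data, 128, 'Normalization', 'pdf');
hold on
x_th = linspace(min(data), max(data), 1000);
plot(x_th, normpdf(x_th, 0, 1), 'r', 'LineWidth', 2.5)  % standard normal PDF
grid on
hold off

% Sanity check: total area under the bars.
area = sum(h.Values) * h.BinWidth;
```

Here `area` should come out as 1 up to rounding, which is what makes the bars directly comparable to the PDF curve.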

Fig. 1 will look like the one below:

It is easy to see that the fit is quite poor, although the bell shape can be spotted. The main problem therefore lies in the variance of my data.
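If the mismatch really is only one of location and scale, plain standardisation may already be enough. The snippet below is a minimal sketch under that assumption; `data` is again a synthetic stand-in and `z` is a name introduced here for illustration.

```matlab
% Assumption: bell-shaped data with the wrong mean and variance.
rng(1);
data = 2.5*randn(10000,1) + 0.7;

% Standardise: subtract the sample mean, divide by the sample standard deviation.
z = (data - mean(data)) / std(data);

fprintf('mean(z) = %.3g, std(z) = %.3g\n', mean(z), std(z));
```

This only shifts and rescales the data. It does not change the shape of the distribution, so skewed or heavy-tailed data will stay skewed or heavy-tailed.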

To find out the proper number of occurrences my data bins should have, I do this:

f_p_th = interp1(x_th,y_th,m_p,'spline','extrap');
figure(2)
bar(m_p,f_p_th)
hold on
plot(x_th,y_th,'r','LineWidth',2.5)
grid on
hold off

which results in the following figure:

Hence, the question is: how can I scale my data block to match the Gaussian distribution as in Fig. 2?

CAUTION

I want to stress one point: I don't want to find the best distribution fitting the data. The problem is reversed: starting from my data, I'd like to manipulate it in such a way that, in the end, its distribution reasonably fits the Gaussian one.

Unfortunately, at the moment, I don't have a real idea on how to perform this data "filter", "transform" or "manipulation".

Any support would be welcome.

Solution

Maybe what you are interested in is a rank-based inverse normal transformation. Basically, you rank the data first and then convert the ranks to quantiles of the normal distribution:

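r = tiedrank( data );            % ranks; tied values get their average rank (r avoids shadowing the built-in rank)
p = r / ( length(r) + 1 );       % +1 keeps p strictly inside (0,1), avoiding Inf for the max point
newdata = norminv( p, 0, 1 );    % map the probabilities to standard normal quantiles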
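To see what the transformation does, here is a small sketch that applies it to deliberately skewed data and checks the result. The exponential sample is an assumption made for illustration only; `tiedrank`, `norminv`, `skewness` and `kstest` all come from the Statistics Toolbox.

```matlab
% Assumption: strongly right-skewed synthetic data (exponential via inverse CDF).
rng(42);
data = -log(rand(2000,1));

% Rank-based inverse normal transformation, as in the answer above.
r = tiedrank(data);
p = r / (numel(r) + 1);
newdata = norminv(p, 0, 1);

% The mapping is monotonic, so the ordering of the samples is unchanged.
[~, idxBefore] = sort(data);
[~, idxAfter]  = sort(newdata);
orderPreserved = isequal(idxBefore, idxAfter);

% Skewness before and after, plus a Kolmogorov-Smirnov test against N(0,1).
fprintf('skewness before: %.3f, after: %.3f\n', skewness(data), skewness(newdata));
[h, pval] = kstest(newdata);   % h == 0 means N(0,1) is not rejected
```

Keep in mind that this forces the marginal distribution to look normal by construction. The spacing between values is changed nonlinearly and only their order is kept, so whether that is acceptable depends on what the data is used for afterwards.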


08-14 10:57