Problem description
I have a rather easy-to-understand question.
I have a set of data and I want to estimate how well it fits a standard normal distribution. To do so, I start with this code:
[f_p,m_p] = hist(data,128);
f_p = f_p/trapz(m_p,f_p);
x_th = min(data):.001:max(data);
y_th = normpdf(x_th,0,1);
figure(1)
bar(m_p,f_p)
hold on
plot(x_th,y_th,'r','LineWidth',2.5)
grid on
hold off
Fig. 1 will look like the one below:
It is easy to see that the fit is quite poor, although the bell shape can be spotted. The main problem therefore lies in the variance of my data.
To find out the proper number of occurrences my data bins should have, I do this:
f_p_th = interp1(x_th,y_th,m_p,'spline','extrap');
figure(2)
bar(m_p,f_p_th)
hold on
plot(x_th,y_th,'r','LineWidth',2.5)
grid on
hold off
which results in the following figure:
Hence, the question is: how can I scale my data block to match the Gaussian distribution as in Fig. 2?
CAUTION
I want to stress one point: I don't want to find the distribution that best fits the data. The problem is reversed: starting from my data, I'd like to manipulate it so that, in the end, its distribution reasonably fits the Gaussian one.
Unfortunately, at the moment, I don't have a real idea of how to perform this data "filter", "transform" or "manipulation".
Any support would be welcome.
Maybe what you are interested in is the rank-based inverse normal transformation. Basically, you rank the data first and then convert the ranks to normal quantiles:
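r = tiedrank( data );            %# ranks 1..n; tied values share their average rank
p = r / ( length(r) + 1 );       %# +1 keeps p < 1, avoiding Inf for the max point
newdata = norminv( p, 0, 1 );    %# map the probabilities to standard normal quantiles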
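To see what the transform does in practice, here is a minimal sketch that applies it to a synthetic, clearly non-Gaussian sample and reuses the plotting code from the question. The log-normal test data, the fixed seed, and the variable names are assumptions made only for illustration; they are not part of the original question.

```matlab
% Hypothetical stand-in for the asker's data: a right-skewed log-normal sample
rng(0);                             % fixed seed, only for reproducibility
data = exp(randn(5000, 1));

% Rank-based inverse normal transform
r = tiedrank(data);                 % ranks 1..n, ties get their average rank
p = r / (numel(r) + 1);             % probabilities strictly inside (0,1)
newdata = norminv(p, 0, 1);         % standard normal quantiles

% The mapping is monotone, so the ordering of the samples is preserved
[~, i_old] = sort(data);
[~, i_new] = sort(newdata);
assert(isequal(i_old, i_new));

% Same comparison plot as in the question, now on the transformed data
[f_p, m_p] = hist(newdata, 128);
f_p = f_p / trapz(m_p, f_p);        % normalise the histogram to unit area
x_th = min(newdata):.001:max(newdata);
y_th = normpdf(x_th, 0, 1);
figure(3)
bar(m_p, f_p)
hold on
plot(x_th, y_th, 'r', 'LineWidth', 2.5)
grid on
hold off

fprintf('mean = %.4f, std = %.4f\n', mean(newdata), std(newdata));
```

Note that `tiedrank`, `norminv` and `normpdf` come from the Statistics and Machine Learning Toolbox. If that is not available, `-sqrt(2) * erfcinv(2 * p)` computes the same standard normal quantiles in base MATLAB. Keep in mind that this transform forces the shape of the distribution and only preserves the ordering of the values. If the variance really is the only mismatch, plain standardization, `(data - mean(data)) / std(data)`, is a gentler alternative because it is linear.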