Log-normal distribution in Python


Problem description

I have seen several questions on Stack Overflow about how to fit a log-normal distribution. Still, there are two points I need clarified.

I have sample data whose logarithm follows a normal distribution, so I can fit the data using scipy.stats.lognorm.fit (i.e., a log-normal distribution).

The fit works fine and also gives me the standard deviation. Here is my code and the results.

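import numpy as np
from scipy import stats

# data: the sample array (not shown in the question)
sample = np.log10(data)  # take log10 of the data

# lognorm.fit returns (shape, loc, scale); the third value is the scale, not the mean
scatter, loc, mean = stats.lognorm.fit(sample)  # parameters of the fit

x_fit = np.linspace(13.0, 15.0, 100)
pdf_fitted = stats.lognorm.pdf(x_fit, scatter, loc, mean)  # PDF of the fit

print("scatter for data is %s" % scatter)
print("mean of data is %s" % mean)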

Results

scatter for data is 0.186415047243
mean for data is 1.15731050926

From the plot you can clearly see that the mean is around 14.2, but I get 1.15?! Why is that? Clearly log(mean) is not close to 14.2 either!

In this post and in this question, it is mentioned that log(mean) is the actual mean.

But as you can see from my code above, the fit I obtained uses sample = log(data), and it also seems to fit well. However, when I tried

sample = data
pdf_fitted = stats.lognorm.pdf(x_fit, scatter, loc, np.log10(mean))

the fit does not seem to work.

1) Why is the mean not 14.2?

2) How can I fill the region, or draw vertical lines, showing the 1-sigma confidence region?

Recommended answer

You said:

"I have sample data whose logarithm follows a normal distribution."

Suppose data is the array containing the samples. To fit this data to a log-normal distribution using scipy.stats.lognorm, use:

s, loc, scale = stats.lognorm.fit(data, floc=0)

Now suppose mu and sigma are the mean and standard deviation of the underlying normal distribution. To get estimates of those values from this fit, use:

estimated_mu = np.log(scale)
estimated_sigma = s
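
As a quick sanity check, here is a minimal sketch on synthetic data. It assumes the samples are drawn with numpy's Generator.lognormal from an underlying normal distribution with mu = 2.0 and sigma = 0.5 (values chosen only for illustration, not taken from the question). Fitting with floc=0 and taking np.log of the fitted scale should recover roughly those values:

```python
import numpy as np
from scipy import stats

# Synthetic sample (an assumption for illustration, not the asker's data):
# log(data) ~ Normal(mu=2.0, sigma=0.5)
true_mu, true_sigma = 2.0, 0.5
rng = np.random.default_rng(0)
data = rng.lognormal(mean=true_mu, sigma=true_sigma, size=10_000)

# Two-parameter log-normal fit: loc is fixed at 0
s, loc, scale = stats.lognorm.fit(data, floc=0)

# Convert scipy's (s, scale) into the parameters of the underlying normal
estimated_mu = np.log(scale)
estimated_sigma = s

print(f"mu ~ {estimated_mu:.3f}, sigma ~ {estimated_sigma:.3f}")
```

Fixing loc with floc=0 keeps the fit to the usual two-parameter log-normal; leaving loc free adds a third parameter that the optimizer can trade off against scale, which makes np.log(scale) harder to interpret.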

(These are not estimates of the mean and standard deviation of the samples in data. See the Wikipedia page on the log-normal distribution for the formulas for its mean and variance in terms of mu and sigma.)
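
For reference, those formulas are E[X] = exp(mu + sigma^2/2) and Var[X] = (exp(sigma^2) - 1) * exp(2*mu + sigma^2). The sketch below checks them against scipy's own lognorm.stats, using scipy's mapping s = sigma, scale = exp(mu), loc = 0. The helper name lognormal_mean_var and the values mu = 2.0, sigma = 0.5 are hypothetical, chosen only for illustration:

```python
import numpy as np
from scipy import stats

def lognormal_mean_var(mu, sigma):
    """Mean and variance of X when log(X) ~ Normal(mu, sigma**2)."""
    mean = np.exp(mu + sigma**2 / 2)
    var = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
    return mean, var

# Example parameters of the underlying normal distribution (assumed)
mu, sigma = 2.0, 0.5
mean, var = lognormal_mean_var(mu, sigma)

# The same moments from scipy: shape s = sigma, scale = exp(mu), loc = 0
scipy_mean, scipy_var = stats.lognorm.stats(sigma, scale=np.exp(mu), moments="mv")

print(mean, var)
print(float(scipy_mean), float(scipy_var))
```

This shows why the fitted parameters do not directly equal the sample mean. Even the median of the log-normal, exp(mu), sits below its mean by a factor of exp(sigma^2/2).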

To combine the histogram and the PDF, you can use, for example:

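import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Normalized histogram of the raw data (density=True replaces the removed normed=True)
plt.hist(data, bins=50, density=True, color='c', alpha=0.75)
xmin = data.min()
xmax = data.max()
x = np.linspace(xmin, xmax, 100)
# Log-normal PDF with the fitted shape s and scale (loc fixed at 0)
pdf = stats.lognorm.pdf(x, s, scale=scale)
plt.plot(x, pdf, 'k')
plt.show()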

If you want to look at the log of the data, you could do something like the following. Note that the PDF of the normal distribution is used here.

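logdata = np.log(data)
# Histogram of log(data); it should look approximately normal
plt.hist(logdata, bins=40, density=True, color='c', alpha=0.75)
xmin = logdata.min()
xmax = logdata.max()
x = np.linspace(xmin, xmax, 100)
# Normal PDF with the parameters estimated from the log-normal fit
pdf = stats.norm.pdf(x, loc=estimated_mu, scale=estimated_sigma)
plt.plot(x, pdf, 'k')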
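
For question 2, one possible approach is to work in log space, where the 1-sigma region is simply the interval [mu - sigma, mu + sigma]. The sketch below shades it with matplotlib's axvspan and marks its center and edges with axvline. It uses a synthetic sample (mu = 2.0, sigma = 0.5, an assumption for illustration) and the Agg backend so it runs without a display; drop the matplotlib.use line and call plt.show() for an interactive window:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
from scipy import stats

# Synthetic log-normal sample (assumption; replace with your own data)
rng = np.random.default_rng(1)
data = rng.lognormal(mean=2.0, sigma=0.5, size=5_000)

logdata = np.log(data)
estimated_mu, estimated_sigma = stats.norm.fit(logdata)

# 1-sigma interval of the underlying normal (about 68% of the log-values)
lo, hi = estimated_mu - estimated_sigma, estimated_mu + estimated_sigma

fig, ax = plt.subplots()
ax.hist(logdata, bins=40, density=True, color="c", alpha=0.75)
x = np.linspace(logdata.min(), logdata.max(), 100)
ax.plot(x, stats.norm.pdf(x, loc=estimated_mu, scale=estimated_sigma), "k")

ax.axvspan(lo, hi, color="gray", alpha=0.3, label="1-sigma region")  # shaded band
ax.axvline(estimated_mu, color="r", linestyle="--", label="mu")      # center
ax.axvline(lo, color="k", linestyle=":")                              # lower edge
ax.axvline(hi, color="k", linestyle=":")                              # upper edge
ax.legend()

# Because exp is monotonic, the same band in the original units is [exp(lo), exp(hi)]
print(np.exp(lo), np.exp(hi))
```

Note that the band is symmetric only in log space; in the original units, [exp(mu - sigma), exp(mu + sigma)] is skewed to the right of exp(mu).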

By the way, an alternative to fitting with stats.lognorm is to fit log(data) using stats.norm.fit:

logdata = np.log(data)
estimated_mu, estimated_sigma = stats.norm.fit(logdata)
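
To see that the two routes agree, here is a sketch on a synthetic sample (assumed mu = 1.0, sigma = 0.3, for illustration only). Both are maximum-likelihood fits, and norm.fit returns the population standard deviation (ddof=0), so the estimates should match up to the optimizer's tolerance. scipy's lognorm is defined in terms of the natural logarithm, which is why np.log rather than np.log10 is used here:

```python
import numpy as np
from scipy import stats

# Synthetic sample (assumption): log(data) ~ Normal(1.0, 0.3**2)
rng = np.random.default_rng(2)
data = rng.lognormal(mean=1.0, sigma=0.3, size=2_000)

# Route 1: two-parameter log-normal fit with loc fixed at 0
s, loc, scale = stats.lognorm.fit(data, floc=0)
mu_1, sigma_1 = np.log(scale), s

# Route 2: normal fit on the natural log of the data
mu_2, sigma_2 = stats.norm.fit(np.log(data))

print(mu_1, mu_2)
print(sigma_1, sigma_2)
```

If the logarithm was taken in base 10 instead, as in the question, the fitted normal parameters are in log10 units and must be multiplied by np.log(10) before being used with stats.lognorm.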