Problem description
I have a data set containing values from 0 to 1e-5. I suspect the data can be described by a log-normal distribution, so I use scipy.stats.lognorm to fit it and want to plot the original data and the fitted distribution on the same figure using matplotlib.
First, I plot the sample as a histogram:
Then I add the fitted distribution as a line plot. However, this changes the y-axis to a very large range:
So the original data (the sample) cannot be seen on the figure!
I've checked all the variables and found that pdf_fitted is very large (>1e7). I really don't understand why a simple fit with scistats.lognorm.fit to a sample generated from the same distribution with scistats.lognorm.pdf doesn't work. Here is the code that demonstrates my problem:
from matplotlib import pyplot as plt
from scipy import stats as scistats
import numpy as np
# generate a sample for x between 0 and 1e-5
x = np.linspace(0, 1e-5, num=1000)
y = scistats.lognorm.pdf(x, 3, loc=0, scale=np.exp(10))
h = plt.hist(y, bins=40) # plot the sample by histogram
# plt.show()
# fit the sample by using Log Normal distribution
param = scistats.lognorm.fit(y)
print("Log-normal distribution parameters : ", param)
pdf_fitted = scistats.lognorm.pdf(
    x, *param[:-2], loc=param[-2], scale=param[-1])
plt.plot(x, pdf_fitted, label="Fitted Lognormal distribution")
plt.ticklabel_format(style='sci', scilimits=(-3, 4), axis='x')
plt.legend()
plt.show()
Recommended answer
The problem
The immediate problem is that your fit is really, really bad. You can see this by setting the x and y scales of the plot to log, for example with plt.xscale('log') and plt.yscale('log'). This lets you see both your histogram and your fitted curve on a single plot:
So it's off by many orders of magnitude in both directions.
Your whole approach to generating a sample from the probability distribution represented by stats.lognorm and fitting it was wrong: the values returned by lognorm.pdf on a grid are density values, not a random sample drawn from the distribution. Here's a correct way to do it, drawing the sample with lognorm.rvs and using the same loc and scale as in your question (note that the shape parameter here is 0.1 rather than 3):
from matplotlib import pyplot as plt
from scipy import stats as scistats
import numpy as np
plt.figure(figsize=(12,7))
realparam = [.1, 0, np.exp(10)]
# generate pdf data around the scale parameter (the median when loc=0)
m = realparam[2]
x = np.linspace(m*.6, m*1.4, num=10000)
y = scistats.lognorm.pdf(x, *realparam)
# generate a matching random sample
sample = scistats.lognorm.rvs(*realparam, size=100000)
# plot the sample by histogram
h = plt.hist(sample, bins=100, density=True)
# fit the sample by using Log Normal distribution
param = scistats.lognorm.fit(sample)
print("Log-normal distribution parameters : ", param)
pdf_fitted = scistats.lognorm.pdf(x, *param)
plt.plot(x, pdf_fitted, lw=5, label="Fitted Lognormal distribution")
plt.legend()
plt.show()
Output:
Log-normal distribution parameters : (0.09916091013245995, -215.9562383088556, 22245.970148671593)
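In the output above, the fitted shape (about 0.099) is close to the true 0.1, but loc and scale drift (about -216 and 22246, versus 0 and exp(10) ≈ 22026) because fit estimates all three parameters freely. If the location is known to be zero, it can be held fixed with floc=0. The following is a minimal sketch of that variant on data in the 0 to 1e-5 range mentioned in the question; the scale of 5e-6, the seed, and the sample size are illustrative assumptions, not values from the original post.

```python
import numpy as np
from scipy import stats as scistats

# Illustrative parameters (assumptions): shape 0.1 as in the answer above,
# and a hypothetical scale of 5e-6 so the values fall between 0 and 1e-5.
true_shape, true_scale = 0.1, 5e-6

# Draw a random sample from the distribution (not pdf values on a grid).
rng = np.random.default_rng(0)
sample = scistats.lognorm.rvs(true_shape, loc=0, scale=true_scale,
                              size=100_000, random_state=rng)

# Hold loc fixed at 0 so only the shape and scale are estimated.
shape, loc, scale = scistats.lognorm.fit(sample, floc=0)
print("shape, loc, scale:", shape, loc, scale)

# Evaluate the fitted pdf over the range covered by the sample.
x = np.linspace(sample.min(), sample.max(), 1000)
pdf_fitted = scistats.lognorm.pdf(x, shape, loc=loc, scale=scale)
```

To compare the two on one figure, plot the sample with plt.hist(sample, bins=100, density=True) and the curve with plt.plot(x, pdf_fitted). With values this small, density heights in the hundreds of thousands are expected rather than a sign of a bad fit, since the pdf has to integrate to 1 over a very narrow interval.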