This article shows how to fit data with very small values using SciPy's lognormal distribution (scipy.stats.lognorm) and display the result in matplotlib.

Problem description

I have a data set that contains values from 0 to 1e-5. I suspect the data can be described by a lognormal distribution, so I use scipy.stats.lognorm to fit my data and want to plot the original data and the fitted distribution on the same figure using matplotlib.

First, I plot the sample as a histogram:

Then I add the fitted distribution as a line plot. However, this rescales the y-axis to very large numbers:

So the original data (the sample) can no longer be seen on the figure!

I've checked all the variables and found that pdf_fitted is very large (>1e7). I really don't understand why a simple fit with scistats.lognorm.fit to a sample generated from the same distribution with scistats.lognorm.pdf doesn't work. Here is the code that demonstrates my problem:

from matplotlib import pyplot as plt
from scipy import stats as scistats
import numpy as np

# generate a sample for x between 0 and 1e-5
x = np.linspace(0, 1e-5, num=1000)
y = scistats.lognorm.pdf(x, 3, loc=0, scale=np.exp(10))
h = plt.hist(y, bins=40) # plot the sample by histogram
# plt.show()

# fit the sample by using Log Normal distribution
param = scistats.lognorm.fit(y)
print("Log-normal distribution parameters : ", param)
pdf_fitted = scistats.lognorm.pdf(
    x, *param[:-2], loc=param[-2], scale=param[-1])
plt.plot(x, pdf_fitted, label="Fitted Lognormal distribution")
plt.ticklabel_format(style='sci', scilimits=(-3, 4), axis='x')
plt.legend()
plt.show()

Recommended answer

The problem

The immediate problem is that your fit is very bad. You can see this by setting both axes of the plot to a log scale, with plt.xscale('log') and plt.yscale('log'). That lets you see both the histogram and the fitted curve on a single plot:

So the fit is off by many orders of magnitude in both directions.
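As a rough illustration of the log-scale idea, the sketch below plots a histogram and a fitted curve on logarithmic axes. The sample, its parameters (shape 1.0, scale 1e-6), the log-spaced bins, and the use of floc=0 are assumptions chosen for this example; they are not the data from the question.

```python
import numpy as np
from matplotlib import pyplot as plt
from scipy import stats as scistats

# Hypothetical small-valued sample (illustrative parameters, not the asker's data)
sample = scistats.lognorm.rvs(1.0, loc=0, scale=1e-6, size=5000, random_state=0)

# Fit with the location pinned at 0 (an assumption made for this sketch)
param = scistats.lognorm.fit(sample, floc=0)

# Log-spaced bins and grid so that each decade is resolved evenly
lo, hi = np.log10(sample.min()), np.log10(sample.max())
bins = np.logspace(lo, hi, 40)
grid = np.logspace(lo, hi, 400)

plt.figure()
plt.hist(sample, bins=bins, density=True, label="Sample")
plt.plot(grid, scistats.lognorm.pdf(grid, *param), label="Fitted lognormal")

# Logarithmic axes keep very small and very large values visible together
plt.xscale("log")
plt.yscale("log")
plt.legend()
```

Call plt.show() afterwards to display the figure. Note that density=True normalizes the histogram so that it is comparable to a probability density; for data this small, density values far above 1 are expected.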

Your whole approach to generating a sample from the probability distribution represented by stats.lognorm and fitting it was wrong. scistats.lognorm.pdf returns density values evaluated at the points you pass in, not random draws from the distribution, so y in your code is not a sample; use scistats.lognorm.rvs to draw one. Here's a correct way to do it, using the same loc and scale for the lognorm distribution that you supplied in your question (the shape parameter here is 0.1, not 3):

from matplotlib import pyplot as plt
from scipy import stats as scistats
import numpy as np

plt.figure(figsize=(12,7))
realparam = [.1, 0, np.exp(10)]

# generate pdf data around the median (with loc=0, the scale parameter is the median)
m = realparam[2]
x = np.linspace(m*.6, m*1.4, num=10000)
y = scistats.lognorm.pdf(x, *realparam)

# generate a matching random sample
sample = scistats.lognorm.rvs(*realparam, size=100000)
# plot the sample by histogram
h = plt.hist(sample, bins=100, density=True)

# fit the sample by using Log Normal distribution
param = scistats.lognorm.fit(sample)
print("Log-normal distribution parameters : ", param)
pdf_fitted = scistats.lognorm.pdf(x, *param)
plt.plot(x, pdf_fitted, lw=5, label="Fitted Lognormal distribution")
plt.legend()
plt.show()

Output:

Log-normal distribution parameters :  (0.09916091013245995, -215.9562383088556, 22245.970148671593)
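In the output above the fitted loc comes out near -216 even though the sample was drawn with loc=0, because fit estimates all three parameters by default. If you know the location should be zero, one option is to pin it with the floc keyword of fit. The following is a minimal sketch under that assumption; the variable names, the smaller sample size, and the random seed are illustrative choices, not part of the original answer.

```python
import numpy as np
from scipy import stats as scistats

# Parameters used to draw the sample: shape, loc, scale
true_shape, true_loc, true_scale = 0.1, 0.0, np.exp(10)

# Draw a reproducible random sample (size and seed are arbitrary choices)
sample = scistats.lognorm.rvs(
    true_shape, loc=true_loc, scale=true_scale, size=10000, random_state=0
)

# floc=0 keeps the location fixed at 0, so only shape and scale are estimated
shape_fit, loc_fit, scale_fit = scistats.lognorm.fit(sample, floc=0)

print("true parameters  :", (true_shape, true_loc, true_scale))
print("fitted parameters:", (shape_fit, loc_fit, scale_fit))
```

Whether fixing loc is appropriate depends on the data: it only makes sense when the distribution is known to start at zero.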
