Scaling the fitted PDF of a log-normal distribution to the histogram in Python

Question

I have a log-normally distributed set of samples and want to fit a distribution to it. I then want to plot both the histogram of the samples and the fitted PDF in one plot, keeping the histogram's original scaling (raw counts).

My question: how can I directly scale the PDF so that it is visible in the histogram plot?

The code is as follows:

import numpy as np
import scipy.stats

# generate log-normal distributed set of samples
samples   = np.random.lognormal( mean=1., sigma=.4, size=10000 )

# make a fit to the samples and generate the resulting PDF
shape, loc, scale = scipy.stats.lognorm.fit( samples, floc=0 )
x_fit       = np.linspace( samples.min(), samples.max(), 100 )
samples_fit = scipy.stats.lognorm.pdf( x_fit, shape, loc=loc, scale=scale )

To make clearer what I mean, here is the figure:

My question is whether there is a parameter that easily scales the PDF to the histogram (I haven't found one, but that does not mean much), so that the PDF is visible in the middle plot.

Recommended answer

What you are asking for is a plot of the expected histogram.

Suppose [a, b] is one of the x intervals (bins) of the histogram. For a random sample of size n, the expected number of samples in the interval is

(cdf(b) - cdf(a))*n

where cdf(x) is the cumulative distribution function. To plot the expected histogram, you compute that value for each bin.
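The per-bin formula above can be sketched as a small helper. This is only an illustration: the function name `expected_counts`, the bin edges, and the parameter values are assumptions chosen for the example, not part of the original answer. It relies on the `scipy.stats.lognorm` parameterization, in which the shape parameter is sigma and `scale` is exp(mean) of the underlying normal distribution.

```python
import numpy as np
import scipy.stats


def expected_counts(edges, n, shape, loc=0.0, scale=1.0):
    """Expected number of samples per bin: (cdf(b) - cdf(a)) * n.

    `expected_counts` is a hypothetical helper name used for illustration.
    """
    cdf = scipy.stats.lognorm.cdf(edges, shape, loc=loc, scale=scale)
    return n * np.diff(cdf)


# Illustrative values (assumptions): sigma = 0.4 and mean = 1.0 of the
# underlying normal, i.e. shape = 0.4 and scale = exp(1.0) for scipy.
n = 10000
edges = np.linspace(0.5, 10.0, 20)  # 20 edges -> 19 bins
expected = expected_counts(edges, n, 0.4, scale=np.exp(1.0))

print(expected.round(1))
```

The result has one entry per bin, and its sum is at most n because the chosen edges do not cover the whole support of the distribution.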

The script below shows one way to plot the expected histogram on top of a matplotlib histogram. It generates this plot:

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt


# Generate log-normal distributed set of samples
np.random.seed(1234)
samples = np.random.lognormal(mean=1., sigma=.4, size=10000)

# Make a fit to the samples.
shape, loc, scale = scipy.stats.lognorm.fit(samples, floc=0)

# Create the histogram plot using matplotlib.  The first two values in
# the tuple returned by hist are the number of samples in each bin and
# the values of the histogram's bin edges.  counts has length num_bins,
# and edges has length num_bins + 1.
num_bins = 50
clr = '#FFE090'
counts, edges, patches = plt.hist(samples, bins=num_bins, color=clr, label='Sample histogram')

# Create an array of length num_bins containing the center of each bin.
centers = 0.5*(edges[:-1] + edges[1:])

# Compute the CDF at the edges. Then prob, the array of differences,
# is the probability of a sample being in the corresponding bin.
cdf = scipy.stats.lognorm.cdf(edges, shape, loc=loc, scale=scale)
prob = np.diff(cdf)

plt.plot(centers, samples.size*prob, 'k-', linewidth=2, label='Expected histogram')

# prob can also be approximated using the PDF at the centers multiplied
# by the width of the bin:
# p = scipy.stats.lognorm.pdf(centers, shape, loc=loc, scale=scale)
# prob = p*(edges[1] - edges[0])
# plt.plot(centers, samples.size*prob, 'r')

plt.legend()

plt.show()

Note: Since the PDF is the derivative of the CDF, you can approximate cdf(b) - cdf(a) as

cdf(b) - cdf(a) ≈ pdf(m)*(b - a)

where m is, say, the midpoint of the interval [a, b]. So the answer to the exact question you asked is: scale the PDF by multiplying it by the sample size and the histogram bin width. The script contains some commented-out code that shows how the expected histogram can be plotted using the scaled PDF. But since the CDF is also available for the lognormal distribution, you might as well use it.
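As a rough check of how close the two approaches are, the sketch below computes the expected counts both ways for the same kind of data as in the script. It is a minimal illustration under assumptions: it uses `np.histogram` instead of `plt.hist` so that nothing is drawn, `np.random.default_rng` instead of `np.random.seed`, and the variable names `exact`, `approx`, and `max_abs_diff` are made up for this example.

```python
import numpy as np
import scipy.stats

# Same kind of data as in the script; the seed value is arbitrary.
rng = np.random.default_rng(1234)
samples = rng.lognormal(mean=1.0, sigma=0.4, size=10000)
shape, loc, scale = scipy.stats.lognorm.fit(samples, floc=0)

# Bin the samples without plotting.
counts, edges = np.histogram(samples, bins=50)
centers = 0.5 * (edges[:-1] + edges[1:])
widths = np.diff(edges)  # per-bin widths, so unequal bins also work

# Expected counts from CDF differences.
exact = samples.size * np.diff(
    scipy.stats.lognorm.cdf(edges, shape, loc=loc, scale=scale)
)

# Approximation: PDF at the bin center times bin width times sample size.
approx = samples.size * widths * scipy.stats.lognorm.pdf(
    centers, shape, loc=loc, scale=scale
)

max_abs_diff = np.max(np.abs(exact - approx))
print(max_abs_diff, exact.max())
```

With 50 bins the two curves should differ only slightly compared with the height of the histogram; the gap grows as the bins get wider, which is one reason to prefer the CDF version when it is available.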

