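I have a 2D data set and I want to plot a 2D histogram in which each cell represents the probability of a data point falling in it. To get probabilities, I need to normalize the histogram so that it sums to 1. Here is the example I adapted from the histogram2d documentation: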

import numpy as np
import matplotlib.pyplot as plt

# create edges of bins
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]

# create random data points
x = np.random.normal(2, 1, 100)
y = np.random.normal(1, 1, 100)
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges))
# setting normed=True in histogram2d doesn't seem to do what I need

# weirdly, histogram2d swaps the x and y axes, so transpose to restore them
H = H.T

fig = plt.figure(figsize=(7, 3))
# origin must be 'lower' (not 'low') for row 0 to be drawn at the bottom
plt.imshow(H, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.show()


Resulting plot

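First, np.sum(H) gives something like 86. I want each cell to represent the probability of the data falling in that bin, so the cells should all sum to 1. Also, how do I draw a legend with imshow that maps color intensity to its value?

Thanks!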

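Best answer

Try using the normed parameter (newer NumPy versions call it density). According to the docs, the values in H are then computed as bin_count / sample_count / bin_area. So we compute the area of each bin and multiply it by H to get the probability of each bin.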

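import numpy as np
import matplotlib.pyplot as plt

# create edges of bins
xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]

x = np.random.normal(2, 1, 100)  # create random data points
y = np.random.normal(1, 1, 100)
# density=True is the current spelling of normed=True, which recent NumPy releases no longer accept
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges), density=True)
# area of each bin, with the same shape as H: (number of x bins, number of y bins)
areas = np.matmul(np.array([np.diff(xedges)]).T, np.array([np.diff(yedges)]))
# density * area is the probability per bin; transpose so x runs along the horizontal axis in imshow
prob = (H * areas).T

fig = plt.figure(figsize=(7, 3))
im = plt.imshow(prob, interpolation='nearest', origin='lower', extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
plt.colorbar(im)  # legend mapping color intensity to probability
plt.show()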
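
As a cross-check on the approach above, here is a minimal sketch (not part of the original answer) comparing density times bin area against simply dividing the raw counts by their sum. The fixed seed and the variable names are assumptions chosen only for illustration. It also suggests why np.sum(H) came out near 86 rather than 100 in the question: samples that fall outside the given bin edges are not counted, so the probabilities here are relative to the samples that land inside the grid.

```python
import numpy as np

# Fixed seed so the sketch is reproducible (an assumption; the original uses unseeded data)
rng = np.random.default_rng(0)

xedges = [0, 1, 3, 5]
yedges = [0, 2, 3, 4, 6]
x = rng.normal(2, 1, 100)
y = rng.normal(1, 1, 100)

# Raw counts: samples outside the bin edges are dropped, so the total can be below 100
counts, _, _ = np.histogram2d(x, y, bins=(xedges, yedges))
prob_from_counts = counts / counts.sum()

# Density route: bin_count / sample_count / bin_area, then multiply the area back in
density, _, _ = np.histogram2d(x, y, bins=(xedges, yedges), density=True)
areas = np.outer(np.diff(xedges), np.diff(yedges))  # shape (x bins, y bins), same as density
prob_from_density = density * areas

print(counts.sum())            # number of samples that landed inside the grid
print(prob_from_counts.sum())  # total probability over all bins
print(np.allclose(prob_from_counts, prob_from_density))
```

If only the probabilities are needed, dividing the counts by counts.sum() avoids computing bin areas at all. Either array can be passed to imshow (transposed, as above) together with plt.colorbar.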

This Q&A on "python - 2D histogram normalized to probabilities" comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/50939778/
