This article covers how to fix a normal-distribution PDF that looks too dense when plotted with matplotlib.

Problem description

I am trying to estimate the probability density function of my data. In my case, the data is a satellite image with shape 8200 x 8100. Below is my code for the PDF (the function 'is_outlier' is borrowed from someone who posted it here). As figure 1 shows, the PDF is far too dense. I guess this is due to the thousands of pixels that make up the satellite image. It looks very ugly.

My question is: how can I plot a PDF that is not this dense, for example something like figure 2?

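import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
from PIL import Image

# Load the image as a float array (assumes a single-band TIFF that Pillow can open)
lst = np.array(Image.open('satellite_img.tif'), dtype=float)
lst_flat = lst.flatten()  # create 1D array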

# The function below flags outliers (modified z-score test); the returned mask is used to drop them
def is_outlier(points, thres=3.5):

    if len(points.shape) == 1:
        points = points[:,None]
    median = np.median(points, axis=0)
    diff = np.sum((points - median)**2, axis=-1)
    diff = np.sqrt(diff)
    med_abs_deviation = np.median(diff)

    modified_z_score = 0.6745 * diff / med_abs_deviation

    return modified_z_score > thres


lst_flat = np.r_[lst_flat]
lst_flat_filtered = lst_flat[~is_outlier(lst_flat)]
fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))

plt.plot(lst_flat_filtered, fit)
plt.hist(lst_flat_filtered, bins=30, density=True)  # density replaces the removed 'normed' argument
plt.show()

Figure 1

Figure 2
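The 'is_outlier' helper in the question is a modified z-score test: a point is flagged when 0.6745 * |x - median| / MAD > 3.5, where MAD is the median absolute deviation. As a quick sanity check (the toy array below is made up for illustration), the function copied from the question flags only the obvious outlier:

```python
import numpy as np


def is_outlier(points, thres=3.5):
    """Flag points whose modified z-score exceeds `thres` (same logic as the question)."""
    if len(points.shape) == 1:
        points = points[:, None]
    median = np.median(points, axis=0)
    # Distance of each point from the median
    diff = np.sqrt(np.sum((points - median) ** 2, axis=-1))
    # Median absolute deviation (MAD)
    med_abs_deviation = np.median(diff)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score for normal data
    modified_z_score = 0.6745 * diff / med_abs_deviation
    return modified_z_score > thres


data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
mask = is_outlier(data)
print(mask)         # [False False False False  True]
print(data[~mask])  # [1. 2. 3. 4.]
```

Note that if more than half of the values are identical (for example a large no-data area in an image), the MAD is 0 and the division produces inf or nan scores.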

Recommended answer

The problem is that the x values in the PDF plot are not sorted, so the plotted line jumps back and forth between random points, creating the mess you see.

Two options:

  1. Don't plot the line, just plot points (not great if you have lots of points, but it will confirm whether what I said above is right):

plt.plot(lst_flat_filtered, fit, 'bo')

  2. Sort the lst_flat_filtered array before computing and plotting the PDF:

    lst_flat = np.r_[lst_flat]
    lst_flat_filtered = np.sort(lst_flat[~is_outlier(lst_flat)])  # Changed this line
    fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
    
    plt.plot(lst_flat_filtered, fit)
    

Here are some minimal examples showing these behaviours:

    # Example 1: unsorted x values, so the PDF line zigzags between random points
    import numpy as np
    import scipy.stats as stats
    import matplotlib.pyplot as plt
    
    lst_flat_filtered = np.random.normal(7, 5, 1000)
    
    fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
    
    plt.hist(lst_flat_filtered, bins=30, density=True)  # density replaces the removed 'normed' argument
    
    plt.plot(lst_flat_filtered, fit)
    
    plt.show()
    
    # Example 2 (option 1): plot markers only, so no connecting line is drawn
    import numpy as np
    import scipy.stats as stats
    import matplotlib.pyplot as plt
    
    lst_flat_filtered = np.random.normal(7, 5, 1000)
    
    fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
    
    plt.hist(lst_flat_filtered, bins=30, density=True)  # density replaces the removed 'normed' argument
    
    plt.plot(lst_flat_filtered, fit, 'bo')
    
    plt.show()
    
    # Example 3 (option 2): sort first, so the line is drawn left to right
    import numpy as np
    import scipy.stats as stats
    import matplotlib.pyplot as plt
    
    lst_flat_filtered = np.sort(np.random.normal(7, 5, 1000))
    
    fit = stats.norm.pdf(lst_flat_filtered, np.mean(lst_flat_filtered), np.std(lst_flat_filtered))
    
    plt.hist(lst_flat_filtered, bins=30, density=True)  # density replaces the removed 'normed' argument
    
    plt.plot(lst_flat_filtered, fit)
    
    plt.show()
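Sorting works, but with an 8200 x 8100 image it means sorting and plotting tens of millions of values (about 66 million pixels before outlier removal). A common alternative, not part of the original answer, is to evaluate the fitted normal PDF on a small evenly spaced grid built with np.linspace. The sketch below uses synthetic data as a stand-in for lst_flat_filtered, and the grid size of 200 is an arbitrary choice:

```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

data = np.random.normal(7, 5, 1000)      # stand-in for lst_flat_filtered
mu, sigma = np.mean(data), np.std(data)  # same fitted parameters as in the answer

# Evaluate the fitted PDF on 200 evenly spaced x values instead of on every data point.
# The grid is already in increasing order, so the data array itself never needs sorting.
x_grid = np.linspace(data.min(), data.max(), 200)
fit = stats.norm.pdf(x_grid, mu, sigma)

plt.hist(data, bins=30, density=True)
plt.plot(x_grid, fit)
plt.show()
```

The histogram still uses all of the data; only the fitted curve is drawn from the grid, so the line is smooth and cheap to render regardless of image size.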
    


    06-21 12:58