Gaussian summation for a 2D scatter plot using Python

Question

I am trying to build what people would loosely call a homemade KDE. I want to estimate the density of a rather large set of data points. In particular, since the scatter plot contains many points, I want to indicate the density using a color gradient (see link below).

As an example, I provide random (x, y) data below. The real data will be spread over different scales, hence the difference in X and Y grid point spacing.

import numpy as np
from matplotlib import pyplot as plt

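def homemadeKDE(x, xgrid, y, ygrid, sigmaX=1, sigmaY=1):
    # kernel weights between every grid point (rows) and every data point (columns)
    a = np.exp(-((xgrid[:, None] - x) / (2 * sigmaX)) ** 2)
    b = np.exp(-((ygrid[:, None] - y) / (2 * sigmaY)) ** 2)

    # kernel-weighted sums of the data values, one value per grid point
    xweights = np.dot(a, x.T) / np.sum(a)
    yweights = np.dot(b, y.T) / np.sum(b)

    return xweights, yweights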

x = np.random.rand(10000)
x.sort()
y = np.random.rand(10000)

xGrid = np.linspace(0, 500, 501)
yGrid = np.linspace(0, 10, 11)

newX, newY = homemadeKDE(x, xGrid, y, yGrid)

What I am stuck on is how to project these values back onto the original x and y vectors, so that I can plot a 2D scatter plot of (x, y) with a z value for the density, colored by a given color map like so:

plt.scatter(x, y, c = z, cmap = "jet")

The plotting and KDE approach is in fact inspired by this great answer

EDIT 1: To clear up some confusion, the idea is to do a Gaussian KDE on a much coarser grid. sigmaX and sigmaY reflect the bandwidth of the kernel in the x and y directions, respectively.
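For reference, a separable Gaussian KDE of the kind described here could be sketched as follows. This is only an illustration under stated assumptions, not the asker's code: it uses the textbook kernel exp(-d^2 / (2 sigma^2)) with its 1 / (sigma * sqrt(2 pi)) normalization, which differs from the exp(-(d / (2 sigma))^2) form in homemadeKDE, and the names gaussian_kernel_1d and separable_kde are hypothetical.

```python
import numpy as np

def gaussian_kernel_1d(grid, data, sigma):
    """Normalized Gaussian weights, shape (len(grid), len(data)). Hypothetical helper."""
    d = grid[:, None] - data[None, :]
    return np.exp(-0.5 * (d / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def separable_kde(x, y, xgrid, ygrid, sigma_x, sigma_y):
    """Density estimate on the grid, shape (len(ygrid), len(xgrid))."""
    kx = gaussian_kernel_1d(xgrid, x, sigma_x)  # (nx, n)
    ky = gaussian_kernel_1d(ygrid, y, sigma_y)  # (ny, n)
    # Summing kx * ky over the data points is a single matrix product.
    return ky @ kx.T / x.size

# Assumed test data: x and y deliberately live on different scales.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 2000)
y = rng.normal(0.0, 50.0, 2000)

# Coarse grids with different spacing per axis, one bandwidth per axis.
xgrid = np.linspace(-6.0, 6.0, 61)
ygrid = np.linspace(-300.0, 300.0, 41)
density = separable_kde(x, y, xgrid, ygrid, sigma_x=0.3, sigma_y=15.0)

# Riemann sum over the grid; it should be close to 1 when the grid covers the data.
dx = xgrid[1] - xgrid[0]
dy = ygrid[1] - ygrid[0]
total = density.sum() * dx * dy
```

Because each axis gets its own bandwidth, data on very different x and y scales can be smoothed sensibly without rescaling it first.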

Recommended answer

With a little bit of thinking, I was actually able to solve the problem on my own. Thanks also for the help and the insightful comments.

import numpy as np
from matplotlib import pyplot as plt

def gaussianSum1D(gridpoints, datapoints, sigma=1):
    #Gaussian weight of every data point (columns) at every grid point (rows)
    a = np.exp( -((gridpoints[:,None]-datapoints)/sigma)**2 )

    return a

#some test data
x = np.random.rand(10000)
y = np.random.rand(10000)

#create grids
gridSize = 100
xedges = np.linspace(np.min(x), np.max(x), gridSize)
yedges = np.linspace(np.min(y), np.max(y), gridSize)

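#calculate weights for both dimensions separately
a = gaussianSum1D(xedges, x, sigma=2)
b = gaussianSum1D(yedges, y, sigma=0.1)

#sum the products of the x and y weights over all data points;
#after the transpose, rows of Z follow y and columns follow x
Z = np.dot(a, b.T).T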

#plot original data
fig, ax = plt.subplots()
ax.scatter(x, y, s = 1)
#overlay data with contours
ax.contour(xedges, yedges, Z, cmap = "jet")
plt.show()
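The original question asked for a per-point value z to pass to plt.scatter(x, y, c = z). One way to get it from a grid such as Z is to look up the nearest grid node for each data point. The sketch below assumes equally spaced grids like xedges and yedges; the helper density_at_points is hypothetical, the test data and sigma values are chosen only for illustration, and nearest-node lookup is a simplification (interpolating between nodes would give smoother colors).

```python
import numpy as np

def gaussian_sum_1d(gridpoints, datapoints, sigma=1.0):
    # Same kernel form as gaussianSum1D in the answer.
    return np.exp(-((gridpoints[:, None] - datapoints) / sigma) ** 2)

def density_at_points(x, y, xedges, yedges, Z):
    """Value of Z (rows follow y, columns follow x) at the grid node nearest each point."""
    dx = xedges[1] - xedges[0]
    dy = yedges[1] - yedges[0]
    # Round each coordinate to the index of the closest grid node.
    ix = np.clip(np.rint((x - xedges[0]) / dx).astype(int), 0, xedges.size - 1)
    iy = np.clip(np.rint((y - yedges[0]) / dy).astype(int), 0, yedges.size - 1)
    return Z[iy, ix]

# Assumed test data: one cluster centered at (0.5, 0.5).
rng = np.random.default_rng(1)
x = rng.normal(0.5, 0.1, 5000)
y = rng.normal(0.5, 0.1, 5000)

grid_size = 50
xedges = np.linspace(x.min(), x.max(), grid_size)
yedges = np.linspace(y.min(), y.max(), grid_size)

a = gaussian_sum_1d(xedges, x, sigma=0.05)
b = gaussian_sum_1d(yedges, y, sigma=0.05)
Z = (a @ b.T).T  # shape (len(yedges), len(xedges))

# One density value per original data point, usable as a color.
z = density_at_points(x, y, xedges, yedges, Z)
```

With matplotlib imported as in the answer, the result can then be drawn with plt.scatter(x, y, c = z, s = 1, cmap = "jet").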


09-14 05:34