Multivariate kernel density estimation in Python

Question

I am trying to use SciPy's gaussian_kde function to estimate the density of multivariate data. In the code below I sample from a 3D multivariate normal distribution and fit the kernel density estimate, but I'm not sure how to evaluate the fit.

import numpy as np
from scipy import stats

mu = np.array([1, 10, 20])
sigma = np.array([[4, 10, 0], [10, 25, 0], [0, 0, 100]])  # np.matrix is deprecated; a plain array works the same here
data = np.random.multivariate_normal(mu, sigma, 1000)
values = data.T
kernel = stats.gaussian_kde(values)

I saw this, but I'm not sure how to extend it to 3D.

I'm also not sure how to even begin evaluating the fitted density. How do I visualize it?

Recommended answer

There are several ways you might visualize the results in 3D.

The easiest is to evaluate the Gaussian KDE at the points that you used to generate it, and then color the points by the density estimate.

For example:

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
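from mpl_toolkits.mplot3d import Axes3D  # only needed on older Matplotlib versions to register the 3D projection

mu = np.array([1, 10, 20])
# x and y are perfectly correlated in this covariance, so the samples all lie in a plane.
# Some recent SciPy releases may reject such degenerate data with a LinAlgError.
sigma = np.array([[4, 10, 0], [10, 25, 0], [0, 0, 100]])
data = np.random.multivariate_normal(mu, sigma, 1000)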
values = data.T

kde = stats.gaussian_kde(values)
density = kde(values)

fig, ax = plt.subplots(subplot_kw=dict(projection='3d'))
x, y, z = values
ax.scatter(x, y, z, c=density)
plt.show()
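As a minimal sketch of the evaluation step on its own (no plotting), the snippet below shows the array shapes gaussian_kde expects and how to query the estimate at points other than the samples. The fixed seed, the query points, and the variable names `queries`, `q_density`, and `order` are assumptions made for illustration. It uses the non-degenerate covariance from the second example so that the KDE is well defined.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

mu = np.array([1.0, 10.0, 20.0])
# Non-degenerate covariance (the one used in the isosurface example)
sigma = np.array([[20.0, 10.0, 10.0],
                  [10.0, 25.0, 1.0],
                  [10.0, 1.0, 50.0]])
data = rng.multivariate_normal(mu, sigma, size=1000)  # shape (1000, 3)

# gaussian_kde expects one row per dimension: shape (n_dims, n_points)
kde = stats.gaussian_kde(data.T)

# Density estimate at the sample points themselves
density = kde(data.T)  # shape (1000,)

# Density estimate at arbitrary query points, again as (n_dims, n_queries)
queries = np.array([mu, mu + 30.0]).T  # the mean, and a point far from it
q_density = kde(queries)               # shape (2,)

# Sorting by density lets a scatter plot draw the densest points last (on top)
order = np.argsort(density)

print("estimated density at the mean:", q_density[0])
print("estimated density far away:   ", q_density[1])
```

Passing `data[order]` and `density[order]` to `ax.scatter` would then keep the high-density points from being hidden behind low-density ones.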

If you had a more complex distribution (i.e. one where the points do not all lie in a plane), then you might want to evaluate the KDE on a regular 3D grid and visualize isosurfaces (3D contours) of the volume. It's easiest to use Mayavi for the visualization:

import numpy as np
from scipy import stats
from mayavi import mlab

mu = np.array([1, 10, 20])
# Let's change this so that the points won't all lie in a plane...
sigma = np.array([[20, 10, 10],
                  [10, 25, 1],
                  [10, 1, 50]])

data = np.random.multivariate_normal(mu, sigma, 1000)
values = data.T

kde = stats.gaussian_kde(values)

# Create a regular 3D grid with 50 points in each dimension
xmin, ymin, zmin = data.min(axis=0)
xmax, ymax, zmax = data.max(axis=0)
xi, yi, zi = np.mgrid[xmin:xmax:50j, ymin:ymax:50j, zmin:zmax:50j]

# Evaluate the KDE on a regular grid...
coords = np.vstack([item.ravel() for item in [xi, yi, zi]])
density = kde(coords).reshape(xi.shape)

# Visualize the density estimate as isosurfaces
mlab.contour3d(xi, yi, zi, density, opacity=0.5)
mlab.axes()
mlab.show()
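The question also asks how to evaluate the fit. Because the samples here come from a known distribution, one possible check is to compare the KDE against the true density on held-out samples. The sketch below is only one way to do that, not the approach of the original answer: the train/test split sizes, the seed, and the names `train`, `test`, `true_dist`, and `corr` are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

mu = np.array([1.0, 10.0, 20.0])
sigma = np.array([[20.0, 10.0, 10.0],
                  [10.0, 25.0, 1.0],
                  [10.0, 1.0, 50.0]])

# Fit on one sample, evaluate on a separate held-out sample
train = rng.multivariate_normal(mu, sigma, size=1000)
test = rng.multivariate_normal(mu, sigma, size=500)

kde = stats.gaussian_kde(train.T)
true_dist = stats.multivariate_normal(mean=mu, cov=sigma)

# Pointwise densities at the held-out points
kde_pdf = kde(test.T)           # gaussian_kde wants shape (n_dims, n_points)
true_pdf = true_dist.pdf(test)  # multivariate_normal wants shape (n_points, n_dims)

# Mean held-out log-likelihood: higher means the model explains new data better
kde_ll = np.mean(kde.logpdf(test.T))
true_ll = np.mean(true_dist.logpdf(test))

# Correlation between estimated and true density values
corr = np.corrcoef(kde_pdf, true_pdf)[0, 1]

print("mean log-likelihood (KDE): ", kde_ll)
print("mean log-likelihood (true):", true_ll)
print("correlation of densities:  ", corr)
```

With real data the true density is unknown, so only the held-out log-likelihood part carries over; it can be used, for example, to compare different `bw_method` settings of gaussian_kde.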
