Fitting a partial Gaussian

Question

I'm trying to fit a sum of Gaussians using scikit-learn, because scikit-learn's GaussianMixture seems much more robust than curve_fit.

Problem: it doesn't do a great job of fitting a truncated part of even a single Gaussian peak:

from sklearn import mixture
import matplotlib.pyplot as plt
from scipy.stats import norm
import numpy as np

clf = mixture.GaussianMixture(n_components=1, covariance_type='full')
data = np.random.randn(10000)
data = [[x] for x in data]  # GaussianMixture expects shape (n_samples, n_features)
clf.fit(data)
data = [item for sublist in data for item in sublist]  # flatten back to 1-D
rangeMin = int(np.floor(np.min(data)))
rangeMax = int(np.ceil(np.max(data)))
# density=True replaces the normed=True argument removed from newer Matplotlib
h = plt.hist(data, range=(rangeMin, rangeMax), density=True)
# matplotlib.mlab.normpdf is gone from newer Matplotlib; scipy.stats.norm.pdf is equivalent
x = np.linspace(rangeMin, rangeMax)
plt.plot(x, norm.pdf(x, clf.means_[0, 0], np.sqrt(clf.covariances_[0, 0, 0])))
plt.show()

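This gives a plot of the fit (image not preserved). Now, changing data = [[x] for x in data] to data = [[x] for x in data if x < 0] in order to truncate the distribution returns a visibly worse fit (image not preserved). Any ideas how to get the truncated distribution fitted properly?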
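A likely reason for the poor fit, sketched here under stated assumptions: a one-component GaussianMixture with covariance_type='full' converges to the sample mean and sample variance of whatever data it receives (plus the tiny reg_covar term), so on the truncated sample it reports the moments of the truncated half-normal rather than those of the parent Gaussian. The snippet uses plain NumPy sample moments as a stand-in for the GaussianMixture fit and compares them with the known half-normal values, mean -sqrt(2/pi) and standard deviation sqrt(1 - 2/pi). The seed and sample size are arbitrary illustration choices.

```python
import numpy as np

# Arbitrary seed and sample size, chosen only for reproducibility
rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)
left_half = samples[samples < 0]  # same truncation as in the question

# A single full-covariance component converges to these two moments,
# so they approximate what clf.means_ and clf.covariances_ would report
naive_mean = left_half.mean()
naive_std = left_half.std()

# Theoretical moments of a standard normal restricted to x < 0
expected_mean = -np.sqrt(2 / np.pi)    # about -0.798
expected_std = np.sqrt(1 - 2 / np.pi)  # about 0.603

print(f"naive fit:   mean={naive_mean:.3f}, std={naive_std:.3f}")
print(f"theoretical: mean={expected_mean:.3f}, std={expected_std:.3f}")
```

In other words, the fit is "correct" for the data it sees; the model simply does not know the data was cut.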

Note: the distribution isn't necessarily truncated in the middle; anywhere between 50% and 100% of the full distribution may remain.
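If the cut point is known, one option (a different technique from the mirroring trick in the answer) is to fit the parent Gaussian by maximum likelihood, dividing each sample's density by the probability mass that lies below the cut. This works for any cut position, including the 50% to 100% range in the note. A minimal sketch with SciPy, assuming a single one-sided upper cut; the helper name fit_right_truncated_normal and the example parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def fit_right_truncated_normal(samples, cut):
    """Hypothetical helper: ML estimate of (mu, sigma) for a normal
    that is only observed below a known cut point (x < cut)."""
    samples = np.asarray(samples, dtype=float)

    def neg_log_likelihood(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)  # optimise log(sigma) so sigma stays positive
        # Truncated density: pdf(x) / P(X < cut)
        log_pdf = norm.logpdf(samples, loc=mu, scale=sigma)
        log_mass = norm.logcdf(cut, loc=mu, scale=sigma)
        return -(log_pdf.sum() - samples.size * log_mass)

    # Start from the (biased) moments of the truncated sample
    start = [samples.mean(), np.log(samples.std())]
    result = minimize(neg_log_likelihood, start, method="Nelder-Mead")
    mu, log_sigma = result.x
    return mu, np.exp(log_sigma)


# Illustrative data: mu=2.0, sigma=1.5, cut at 3.0 (roughly 75% survives)
rng = np.random.default_rng(1)
samples = rng.normal(loc=2.0, scale=1.5, size=50_000)
cut = 3.0
kept = samples[samples < cut]
mu_hat, sigma_hat = fit_right_truncated_normal(kept, cut)
print(f"mu={mu_hat:.3f}, sigma={sigma_hat:.3f}")
```

Nelder-Mead is used because it needs no gradients. For a two-sided cut, the normaliser would become the difference of two CDFs instead of a single logcdf.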

I would also be happy if anyone can point me to alternative packages. I've only tried curve_fit but couldn't get it to do anything useful as soon as more than two peaks are involved.
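On the curve_fit side, multi-peak fits often fail because the optimiser starts far from a sensible solution. A sketch of fitting a sum of three Gaussians to a histogram with scipy.optimize.curve_fit, supplying rough starting guesses (p0) and bounds that keep amplitudes and widths positive. The three-peak sample, the bin count, the guesses and the helper name gaussian_sum are all made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian_sum(x, *params):
    """Hypothetical model: sum of Gaussians, params = (amp, mu, sigma) * n_peaks."""
    y = np.zeros_like(x, dtype=float)
    for amp, mu, sigma in zip(params[0::3], params[1::3], params[2::3]):
        y += amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    return y


# Made-up three-peak sample, used only to illustrate the fit
rng = np.random.default_rng(2)
samples = np.concatenate([
    rng.normal(-4.0, 0.8, 20_000),
    rng.normal(0.0, 1.0, 30_000),
    rng.normal(5.0, 1.2, 25_000),
])
counts, edges = np.histogram(samples, bins=200, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Rough (amp, mu, sigma) guesses per peak; curve_fit infers the
# number of parameters from len(p0) because the model uses *params
p0 = [0.1, -3.0, 1.0, 0.1, 1.0, 1.0, 0.1, 4.0, 1.0]
lower = [0.0, -np.inf, 1e-3] * 3  # amplitudes >= 0, widths > 0
upper = [np.inf, np.inf, np.inf] * 3
popt, _ = curve_fit(gaussian_sum, centers, counts, p0=p0, bounds=(lower, upper))

fitted_means = sorted(popt[1::3])
print(np.round(fitted_means, 2))
```

For raw, unbinned samples, GaussianMixture(n_components=3) fits the same kind of model without choosing bins or starting amplitudes.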

Recommended answer

A bit brutish but simple solution would be to split the curve into two halves (data = [[x] for x in data if x < 0]), mirror the left part (data.append([-data[d][0]])), and then do the regular Gaussian fit.

import numpy as np
from scipy.stats import norm
from sklearn import mixture
import matplotlib.pyplot as plt

np.random.seed(seed=42)
n = 10000

clf = mixture.GaussianMixture(n_components=1, covariance_type='full')

# keep only the left half (x < 0), then mirror it about 0
data = np.random.randn(n)
data = [[x] for x in data if x < 0]
n = len(data)  # number of points in the original, truncated half
for d in range(n):
    data.append([-data[d][0]])

clf.fit(data)
data = [item for sublist in data for item in sublist]
rangeMin = int(np.floor(np.min(data)))
rangeMax = int(np.ceil(np.max(data)))
# histogram of the original left half only (density=True replaces normed=True)
h = plt.hist(data[0:n], bins=20, range=(rangeMin, rangeMax), density=True)
x = np.linspace(rangeMin, rangeMax)
mean = clf.means_[0, 0]
std = np.sqrt(clf.covariances_[0, 0, 0])
# the left half alone integrates to 1, so its height is twice the parent pdf
plt.plot(x, norm.pdf(x, mean, std) * 2)

plt.show()
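Mirroring about 0 works here because the cut sits exactly at the mean of the standard normal. Mirroring about an arbitrary cut forces the fitted mean to equal the cut, so the trick is biased whenever the cut is not at the true mean, which is the situation described in the question's note (50% to 100% remaining). The hypothetical helper mirror_fit below shows this with plain NumPy, using sample moments as a stand-in for the one-component GaussianMixture fit; seed and sample size are arbitrary.

```python
import numpy as np


def mirror_fit(samples, cut):
    """Hypothetical helper: mirror samples (all < cut) about cut,
    then return the mean and std of the symmetrised data."""
    samples = np.asarray(samples, dtype=float)
    mirrored = np.concatenate([samples, 2 * cut - samples])
    return mirrored.mean(), mirrored.std()


rng = np.random.default_rng(3)
samples = rng.standard_normal(100_000)

# Cut at the true mean (0): recovers mean 0 and std close to 1
mean_at_0, std_at_0 = mirror_fit(samples[samples < 0.0], cut=0.0)

# Cut away from the mean (0.5): the mirrored mean is forced to 0.5
mean_at_05, std_at_05 = mirror_fit(samples[samples < 0.5], cut=0.5)

print(f"cut=0.0: mean={mean_at_0:.3f}, std={std_at_0:.3f}")
print(f"cut=0.5: mean={mean_at_05:.3f}, std={std_at_05:.3f}")
```

When the cut is not at the mean, a likelihood that accounts for the truncation avoids this bias.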
