Problem description
I'm trying to fit a sum of Gaussians using scikit-learn, because scikit-learn's GaussianMixture seems much more robust than curve_fit.
Problem: it doesn't do a great job of fitting a truncated part of even a single Gaussian peak:
from sklearn import mixture
from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np

clf = mixture.GaussianMixture(n_components=1, covariance_type='full')
data = np.random.randn(10000)
data = [[x] for x in data]  # scikit-learn expects shape (n_samples, n_features)
clf.fit(data)

data = [item for sublist in data for item in sublist]  # flatten again for plotting
rangeMin = int(np.floor(np.min(data)))
rangeMax = int(np.ceil(np.max(data)))
xs = np.linspace(rangeMin, rangeMax)
h = plt.hist(data, range=(rangeMin, rangeMax), density=True)
# overlay the fitted Gaussian: means_ has shape (1, 1), covariances_ has shape (1, 1, 1)
plt.plot(xs, norm.pdf(xs, clf.means_[0, 0], np.sqrt(clf.covariances_[0, 0, 0])))
gives [figure: histogram of the data with the fitted Gaussian]. Now changing data = [[x] for x in data]
to data = [[x] for x in data if x < 0]
in order to truncate the distribution returns [figure: histogram of the truncated data with the fitted Gaussian]. Any ideas how to get the truncation fitted properly?
Note: The distribution isn't necessarily truncated in the middle; anywhere between 50% and 100% of the full distribution could be left.
I would also be happy if anyone could point me to alternative packages. I've only tried curve_fit, but couldn't get it to do anything useful as soon as more than two peaks are involved.
Recommended answer
A somewhat crude but simple solution would be to split the curve into two halves (data = [[x] for x in data if x < 0]), mirror the left part (data.append([-data[d][0]])), and then do the regular Gaussian fit.
import numpy as np
from scipy.stats import norm
from sklearn import mixture
import matplotlib.pyplot as plt

np.random.seed(seed=42)
n = 10000
clf = mixture.GaussianMixture(n_components=1, covariance_type='full')

# keep only the left half of the data, then mirror it about zero
data = np.random.randn(n)
data = [[x] for x in data if x < 0]
n = len(data)
for d in range(n):
    data.append([-data[d][0]])
clf.fit(data)

data = [item for sublist in data for item in sublist]
rangeMin = int(np.floor(np.min(data)))
rangeMax = int(np.ceil(np.max(data)))
xs = np.linspace(rangeMin, rangeMax)
# histogram of the original (truncated) samples only
h = plt.hist(data[0:n], bins=20, range=(rangeMin, rangeMax), density=True)
# the half-distribution histogram is normalised to unit area, so the fitted density is doubled to match
plt.plot(xs, norm.pdf(xs, clf.means_[0, 0], np.sqrt(clf.covariances_[0, 0, 0])) * 2)
plt.show()
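Mirroring about zero relies on the cut sitting exactly at the peak. The question notes that the cut may fall anywhere, so here is a rough sketch of one alternative for a single peak: a maximum-likelihood fit of a normal distribution truncated from above. It is not part of the original answer. The helper name fit_upper_truncated_normal, the cut-off value 0.5, and the assumption that the cut-off is known in advance are all illustrative.

```python
import numpy as np
from scipy import optimize, stats


def fit_upper_truncated_normal(samples, upper):
    """Estimate (mu, sigma) of a normal observed only below a known cut-off.

    Hypothetical helper for illustration; assumes `upper` is known.
    """
    samples = np.asarray(samples, dtype=float)

    def negative_log_likelihood(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)  # optimise log(sigma) so sigma stays positive
        # Truncated-normal density: pdf(x) / cdf(upper) for x < upper.
        log_pdf = stats.norm.logpdf(samples, loc=mu, scale=sigma)
        log_mass = stats.norm.logcdf(upper, loc=mu, scale=sigma)
        return -(log_pdf.sum() - samples.size * log_mass)

    # Start from the (biased) moments of the truncated sample.
    start = [samples.mean(), np.log(samples.std())]
    result = optimize.minimize(negative_log_likelihood, start, method="Nelder-Mead")
    mu_hat, log_sigma_hat = result.x
    return mu_hat, np.exp(log_sigma_hat)


rng = np.random.default_rng(42)
full = rng.normal(loc=0.0, scale=1.0, size=10000)
cutoff = 0.5  # assumed, illustrative truncation point
kept = full[full < cutoff]

mu_hat, sigma_hat = fit_upper_truncated_normal(kept, cutoff)
print(mu_hat, sigma_hat)
```

The estimates should land near the true values of 0 and 1. A sum of several truncated peaks would need a different likelihood, which this sketch does not attempt.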