Speeding up drawing random values from superimposed truncated normal distributions

Problem description

I want to draw N random samples from a distribution that is the sum of two truncated normal distributions. I get what I want by subclassing the rv_continuous class from scipy.stats and providing a pdf that is the mean of the two given pdfs:

import numpy as np
from scipy import stats

my_lim = [0.05, 7]  # lower and upper limit
my_loc = [1.2, 3]  # loc values of the two truncated normal distributions
my_scale = [0.6, 2]  # scale values of the two truncated normal distributions

class sum_truncnorm(stats.rv_continuous):
    def _pdf(self, x):
        return (stats.truncnorm.pdf(x,
                                    a=(my_lim[0] - my_loc[0]) / my_scale[0],
                                    b=(my_lim[1] - my_loc[0]) / my_scale[0],
                                    loc=my_loc[0],
                                    scale=my_scale[0]) +
                stats.truncnorm.pdf(x,
                                    a=(my_lim[0] - my_loc[1]) / my_scale[1],
                                    b=(my_lim[1] - my_loc[1]) / my_scale[1],
                                    loc=my_loc[1],
                                    scale=my_scale[1])) / 2

However, using:

my_dist = sum_truncnorm()
my_rvs = my_dist.rvs(size=10)

is very slow and takes about 5 seconds per random value.

I'm sure this can be done much faster, but I'm not sure how. Should I instead define my distribution as a sum of (non-truncated) normal distributions and force the truncation afterwards? I ran some tests in this direction, but that was only about 10x faster and thus still way too slow.

Google told me that I probably need to use inverse transform sampling and override the _rvs method, but I failed to make this work for my truncated distributions.

Recommended answer

First, you're going to have to make sure _pdf is normalized. The framework does not check it, and silently produces nonsense otherwise.
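One way to check the normalization is to integrate the density numerically over its support. The following is only a sketch: it reuses the limits, loc and scale values from the question, the helper name mixture_pdf is made up for illustration, and it assumes the density is meant to be the mean of the two truncated normal pdfs.

```python
from scipy import integrate, stats

lim = (0.05, 7.0)    # lower and upper limit, as in the question
locs = (1.2, 3.0)    # loc values of the two components
scales = (0.6, 2.0)  # scale values of the two components


def mixture_pdf(x):
    """Mean of the two truncated normal pdfs (hypothetical helper)."""
    total = 0.0
    for loc, scale in zip(locs, scales):
        # truncnorm expects its bounds in standardized units
        a = (lim[0] - loc) / scale
        b = (lim[1] - loc) / scale
        total = total + stats.truncnorm.pdf(x, a=a, b=b, loc=loc, scale=scale)
    return total / len(locs)


# A properly normalized density integrates to 1 over its support.
area, abs_err = integrate.quad(mixture_pdf, lim[0], lim[1])
print(area)
```

If the printed area is not close to 1, the _pdf needs to be rescaled before it is handed to rv_continuous.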

Second, to make drawing variates fast, you need to implement a _ppf or _rvs. With only _pdf, it goes through the generic code path (numeric integration and root-finding), which is why your current version is slow.
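As an illustration of overriding _rvs, here is a minimal sketch that is not part of the original answer. It relies on the assumption that the density is an equal-weight mixture of the two truncated normals: each sample can then be drawn by picking one component with probability 1/2 and sampling from it directly. It also assumes a reasonably recent SciPy in which rvs passes size and random_state to _rvs as keyword arguments; the names comp0, comp1 and samples are made up for the example.

```python
import numpy as np
from scipy import stats

my_lim = [0.05, 7]    # lower and upper limit
my_loc = [1.2, 3]     # loc values of the two truncated normal distributions
my_scale = [0.6, 2]   # scale values of the two truncated normal distributions

# Frozen component distributions (bounds given in standardized units).
comp0 = stats.truncnorm((my_lim[0] - my_loc[0]) / my_scale[0],
                        (my_lim[1] - my_loc[0]) / my_scale[0],
                        loc=my_loc[0], scale=my_scale[0])
comp1 = stats.truncnorm((my_lim[0] - my_loc[1]) / my_scale[1],
                        (my_lim[1] - my_loc[1]) / my_scale[1],
                        loc=my_loc[1], scale=my_scale[1])


class sum_truncnorm(stats.rv_continuous):
    def _pdf(self, x):
        # Mean of the two component pdfs, so the result stays normalized.
        return (comp0.pdf(x) + comp1.pdf(x)) / 2

    def _rvs(self, size=None, random_state=None):
        # Assumption: equal-weight mixture. Pick a component per sample,
        # drawing both components in full and selecting elementwise.
        pick_first = random_state.uniform(size=size) < 0.5
        first = comp0.rvs(size=size, random_state=random_state)
        second = comp1.rvs(size=size, random_state=random_state)
        return np.where(pick_first, first, second)


my_dist = sum_truncnorm(a=my_lim[0], b=my_lim[1])
samples = my_dist.rvs(size=10000, random_state=0)
print(samples.min(), samples.max())
```

Drawing both components in full and selecting with np.where wastes half of the draws, but it keeps the code short and fully vectorized, so it avoids the numeric integration and root-finding of the generic path.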


06-21 12:53