c++ - 从Poisson分布中提取数字的性能较低

为了从C++中的泊松分布中提取随机数，通常建议使用

RNG_type rng;
std::poisson_distribution<size_t> d(1e-6);
auto r = d(rng);

在std::poisson_distribution对象的每次调用中，整个随机位序列被消耗(例如std::mt19937为32位，std::mt19937_64为64位)。令我吃惊的是，在如此低的平均值(mean = 1e-6)下，绝大多数时候，只有几位足以确定要返回的值为0。然后可以将其他位缓存起来以备后用。

假设设置为true的位序列与Poisson分布的高返回值相关联，则当使用1e-6的均值时，任何不以19 true开头的序列都必然返回零!确实，

1 - 1/2^19 < P(0, 1e-6) < 1 - 1/2^20

，其中P(n, r)表示从具有平均值n的泊松分布中提取r的概率。不浪费位的算法将使用一半的时间，一半的时间使用两位，四分之一的时间，三分之一使用三位....

是否有一种算法可以通过绘制泊松数时消耗尽可能少的位来提高性能？当我们认为平均值较低时，与std::poisson_distribution相比，还有其他方法可以提高性能吗？

回应@ Jarod42的评论谁说

我认为这不会破坏等概率。在模糊测试中，我考虑了一个具有简单bernoulli分布的相同问题。我以概率1/2^4采样为true，以概率1 - 1/2^4采样为false。一旦在高速缓存中看到真值，函数drawWithoutWastingBits就会停止，并且不管这些位是什么，函数drawWastingBits都会消耗4位。

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <random>

bool drawWithoutWastingBits(std::vector<bool>& cache, size_t& cache_index)
{
    /*
        Get a true with probability 1/2^4 (=1/16=0.0625) and a false otherwise
    */

    size_t nbTrues = 0;
    while (cache[cache_index])
    {
        ++nbTrues;
        ++cache_index;
        if (nbTrues == 4)
        {
            return true;
        }
    }
    ++cache_index;
    return false;
}


bool drawWastingBits(std::vector<bool>& cache, size_t& cache_index)
{
    /*
        Get a true with probability 1/2^4 (=1/16=0.0625) and a false otherwise
    */

    bool isAnyTrue = false;
    for (size_t i = 0 ; i < 4; ++i)
    {
        if (cache[cache_index])
        {
            isAnyTrue = true;
        }
        ++cache_index;
    }
    return !isAnyTrue;
}

int main()
{
    /*
        Just cache a lot of bits in advance in `cache`. The same sequence of bits will be used by both function.
        I am just caching way enough bits to make sure they don't run out of bits below
        I made sure to have the same number of zeros and ones so that any deviation is caused by the methodology and not by the RNG
    */

    // Produce cache
    std::vector<bool> cache;
    size_t nbBitsToCache = 1e7;
    cache.reserve(nbBitsToCache);
    for (size_t i = 0 ; i < nbBitsToCache/2 ; ++i)
    {
        cache.push_back(false);
        cache.push_back(true);
    }
    // Shuffle cache
    {
        std::mt19937 mt(std::random_device{}());
        std::shuffle(cache.begin(), cache.end(), mt);
    }


    // Draw without wasting bits
    {
        size_t nbDraws = 1e6;
        size_t cache_index = 0;
        std::pair<size_t, size_t> outcomes = {0,0};
        for (size_t r = 0 ; r < nbDraws ; ++r)
        {
            drawWithoutWastingBits(cache, cache_index) ? ++outcomes.first : ++outcomes.second;
            assert(cache_index <= cache.size());
        }

        assert(outcomes.first + outcomes.second == nbDraws);
        std::cout << "Draw Without Wasting Bits: prob true = " << (double)outcomes.first / nbDraws << "\n";
    }


    // Draw wasting bits
    {
        size_t nbDraws = 1e6;
        size_t cache_index = 0;
        std::pair<size_t, size_t> outcomes = {0,0};
        for (size_t r = 0 ; r < nbDraws ; ++r)
        {
            drawWastingBits(cache, cache_index) ? ++outcomes.first : ++outcomes.second;
            assert(cache_index <= cache.size());
        }

        assert(outcomes.first + outcomes.second == nbDraws);
        std::cout << "Draw Wit Wasting Bits: prob true = " << (double)outcomes.first / nbDraws << "\n";
    }
}

可能的输出

Draw Without Wasting Bits: prob true = 0.062832
Draw Wit Wasting Bits: prob true = 0.062363

最佳答案

Devroye的Non-Uniform Random Variate Generation(第505页和第86页)提到了按顺序搜索算法进行的反演。

基于该算法，如果您知道mean大大小于1，则如果在[0，1]中生成统一的随机数u，则如果u <= exp(-mean)，则Poisson变量将为0，否则为0以上。

如果均值较低并且可以忍受近似分布，则可以使用以下方法(请参阅“The Discrete Gaussian for Differential Privacy”的附录A):

以有理数形式mean / numer表示denom。例如，如果mean是固定值，则可以相应地(例如在编译时)预先计算numer和denom。

随机生成一个Bernoulli(numer / denom)编号(以numer / denom的概率生成1，否则生成0)。如果以这种方式生成1，则对Bernoulli(numer / (denom * 2))，Bernoulli(numer / (denom * 3))等重复此步骤，直到以这种方式生成0。使用最小化位浪费的算法来生成这些数字，例如Lumbroso的Fast Dice Roller论文(2013)附录B中提到的算法，或从那里修改并在Boolean conditions中给出的“ZeroToOne”方法。另请参见this question。

如果步骤2产生的偶数个偶数，则Poisson变量正好为0。

如果步骤2产生的奇数个奇数，则Poisson变量大于0，并且需要“慢速”算法来仅对大于0的Poisson变量进行采样。

例如，假设平均值为1e-6(1/1000000)，生成一个Bernoulli(1/1000000)的数字，然后生成Bernoulli(1/2000000)，依此类推，直到您以这种方式生成0。如果生成了偶数，则泊松变量恰好为0。否则，泊松变量为1或更大，并且需要“更慢”的算法。

下面的算法是一个示例，该算法基于第505页和第86页的算法，但仅对Poisson变量1或更大的样本进行采样:

METHOD Poisson1OrGreater(mean)
 sum=Math.exp(-mean)
 prod=sum
 u=RNDRANGE(sum, 1)
 i=0
 while i==0 or u>sum
   prod*=mean/(i+1)
   sum+=prod
   i=i+1
 end
 return i
END METHOD

但是，此方法不是很健壮，特别是因为它使用接近1(浮点空间更稀疏)的数字而不是接近0的数字。

编辑(5月7日):

请注意，与n无关的Poisson(mean)随机数的总和是分布的Poisson(mean*n)(p。501)。因此，只要n乘以它们的均值，此答案中的上述讨论适用于n泊松随机数之和。例如，要生成平均值为1e-6的1000个泊松随机数之和，只需生成平均值为0.001的单个泊松随机数。这将大大节省对随机数生成器的调用。

编辑(5月13日):一般进行编辑。

关于c++ - 从Poisson分布中提取数字的性能较低，我们在Stack Overflow上找到一个类似的问题：https://stackoverflow.com/questions/61614458/