Approximating a normal distribution by adding random numbers

Question

I would like to generate some random numbers which are normally distributed. It's not mission critical, so a simple algorithm will suffice. I would then like to supply my own mean and standard deviation.

From what I have been able to read, according to the Central Limit Theorem, I should be able to approximate normally distributed random numbers by adding random numbers together.

For example:

rand()+rand()+rand()+rand()+rand()+rand()

where rand() returns a uniformly distributed random number from 0 to 1, is a reasonable approximation. (I am aware that technically it's 0 ≤ rand() < 1.)

The expected mean is 6*0.5, so I get to the desired mean with something like this:

(rand()+rand()+rand()+rand()+rand()+rand()-3) + mean

But what is the standard deviation?

Once I know that, would setting an arbitrary standard deviation simply be a matter of multiplying?

Update

By experimenting, I found that

(rand()+rand()+rand()+rand()+rand()+rand()-3)*sqrt(2)*sd+mean

gives me a set of data with the desired standard deviation and mean. I have tested this in a database (PostgreSQL) with 10 million rows, using the stddev() and avg() aggregate functions, and typical results are accurate to within 2 decimal places, which isn't too bad.

I have no idea why sqrt(2) is involved …

Solution

OK, thanks to Severin Pappadeux below, I have an answer.

I can generate a reasonable result with:

(rand() + … + rand() - n/2) / sqrt(n/12) * sd + mean

where n is the number of rand() calls I am prepared to make.
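As a minimal sketch of the formula above, assuming Python's random.random() stands in for rand() (the function name approx_normal and its parameters are made up for illustration), the n-term version could look like this:

```python
import math
import random
import statistics


def approx_normal(mean, sd, n=6, rng=random):
    """Approximate one draw from N(mean, sd) by summing n uniform [0, 1) values."""
    total = sum(rng.random() for _ in range(n))
    # The sum has mean n/2 and variance n/12, so center it and rescale it.
    return (total - n / 2) / math.sqrt(n / 12) * sd + mean


rng = random.Random(12345)  # fixed seed so the check is repeatable
samples = [approx_normal(10.0, 2.0, n=6, rng=rng) for _ in range(100_000)]

# Sample mean and standard deviation should land close to 10 and 2.
print(round(statistics.fmean(samples), 2), round(statistics.stdev(samples), 2))
```

This mirrors the stddev()/avg() check described in the question, just on a smaller in-memory sample.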

Recommended answer

That is a correct approach. The only problem is to carefully analyze the tails you're missing.

Let's consider generating N(0,1), a Gaussian distribution with mean 0 and standard deviation 1. Any other Gaussian N(\mu, \sigma) is then just a scaled and shifted N(0,1).

So the proposed algorithm for G(0,1), which is an approximation of N(0,1), starts from the sum

G(0,1) = U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1)

where U(0,1) is a uniformly distributed random number in the [0,1) range. Let's take a look at the mean.

E(G(0,1)) = 6*E(U(0,1)) = 6*0.5 = 3

which is exactly what you've got. So, to get a mean of 0 for G(0,1) we have to subtract 3. Let's now check the variance of G(0,1), which we have to make equal to 1.

V(G(0,1)) = 6*V(U(0,1)) = 6*(1/12) = 1/2

The standard deviation (σ) is the square root of the variance, so to make it 1 you have to divide by sqrt(1/2).
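The value 1/12 for the variance of a single U(0,1) is used above without derivation. As a short worked step (standard results, with the draws assumed independent so that their variances add):

```latex
\[
\operatorname{Var}(U) = E(U^2) - \bigl(E(U)\bigr)^2
  = \int_0^1 x^2\,dx - \left(\tfrac{1}{2}\right)^2
  = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12}
\]
\[
\operatorname{Var}\Bigl(\sum_{i=1}^{6} U_i\Bigr) = 6 \cdot \tfrac{1}{12} = \tfrac{1}{2},
\qquad
\sigma = \sqrt{\tfrac{1}{2}} \approx 0.7071
\]
```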

So, the final expression is

G(0,1) = (U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1) - 3)/sqrt(1/2)

and it is a reasonably good approximation of N(0,1).
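On the tails mentioned at the start of this answer: a sum of n uniforms is bounded, so the rescaled value can never exceed a fixed number of standard deviations. A small sketch of that bound (the helper name max_abs_value is hypothetical):

```python
import math


def max_abs_value(n):
    """Largest magnitude the centered, rescaled sum of n uniforms can reach."""
    # The raw sum lies in [0, n), so the centered sum lies in [-n/2, n/2).
    # Dividing by the standard deviation sqrt(n/12) gives the bound sqrt(3*n).
    return (n / 2) / math.sqrt(n / 12)


for n in (6, 12, 24):
    print(n, round(max_abs_value(n), 3))
```

With n = 6 the approximation cannot produce anything beyond roughly 4.24 standard deviations from the mean, while a true normal distribution has no such limit.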

"I have no idea why sqrt(2) is involved …"

Dividing by sqrt(1/2) is the same as multiplying by sqrt(2). Now I hope you know where it came from.

A simple corollary: for a sum of some other number n of U(0,1) values, the standard deviation of the sum is sqrt(n/12), so that is the term to divide by.

Another simple corollary: because V(U(0,1)) is equal to 1/12, summing twelve U(0,1) values does not require any multiplier:

G(0,1) = Sum_1^12 U(0,1) - 6

This form is often cited in older books and papers on sampling recipes.
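The twelve-term form needs no square root at all. A minimal sketch, again assuming Python's random.random() as the uniform source (the function name is illustrative only):

```python
import random
import statistics


def approx_std_normal_12(rng=random):
    """Approximate one N(0,1) draw as the sum of twelve uniforms minus 6."""
    # Twelve uniforms have total variance 12 * (1/12) = 1, so no scaling is needed.
    return sum(rng.random() for _ in range(12)) - 6.0


rng = random.Random(2024)  # fixed seed so the check is repeatable
draws = [approx_std_normal_12(rng) for _ in range(100_000)]

# Sample mean should be near 0 and sample standard deviation near 1.
print(round(statistics.fmean(draws), 2), round(statistics.pstdev(draws), 2))
```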

You might also want to take a look at the related Irwin-Hall distribution and Bates distribution.

Update

I've thought about a simplification of the approach. Suppose we want to sum an even number of U(0,1) values, so n=2m. Again, treating G(0,1) as an approximation of N(0,1):

G(0,1) = (Sum_1^2m U(0,1) - m ) / sqrt(m/6)

We rewrite this as

G(0,1) = (Sum_1^m U(0,1) - (m - Sum_1^m U(0,1)))/sqrt(m/6) =
       = (Sum_1^m U(0,1) - Sum_1^m(1 - U(0,1)))/sqrt(m/6)

Because 1 - U(0,1) has the same distribution as U(0,1), we can write G(0,1) in symmetric form, where each U(0,1) is an independent draw:

G(0,1) = (Sum_1^m U(0,1) - Sum_1^m U(0,1))/sqrt(m/6) =
       = Sum_1^m (U(0,1) - U(0,1)) / sqrt(m/6)
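A sketch of the symmetric form, under the assumption that the two U(0,1) terms in each difference are separate, independent draws (otherwise every difference would be zero). The function name is hypothetical:

```python
import math
import random
import statistics


def approx_std_normal_symmetric(m, rng=random):
    """Approximate N(0,1) as a scaled sum of m differences of independent uniforms."""
    # Each difference U1 - U2 has mean 0 and variance 1/12 + 1/12 = 1/6,
    # so the sum of m differences has variance m/6 and needs no centering.
    total = sum(rng.random() - rng.random() for _ in range(m))
    return total / math.sqrt(m / 6)


rng = random.Random(7)  # fixed seed so the check is repeatable
values = [approx_std_normal_symmetric(3, rng) for _ in range(100_000)]
print(round(statistics.fmean(values), 2), round(statistics.pstdev(values), 2))
```

With m = 3 this uses six uniform draws, the same cost as the six-term sum in the question, but the subtraction of 3 disappears.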
