Problem description
I would like to generate some random numbers which are normally distributed. It's not mission critical, so a simple algorithm will suffice. I would then like to supply my own mean and standard deviation.
From what I have been able to read, according to the Central Limit Theorem, I should be able to approximate normally distributed random numbers by adding uniform random numbers together.
For example:
rand()+rand()+rand()+rand()+rand()+rand()
where rand() returns a uniformly distributed random number from 0 to 1, is a reasonable approximation. (I am aware that technically it's 0 ≤ rand() < 1.)
The expected mean is 6*0.5 = 3, so I get to the desired mean with something like this:
(rand()+rand()+rand()+rand()+rand()+rand()-3) + mean
But what is the standard deviation? Once I know that, would setting an arbitrary standard deviation simply be a matter of multiplying?
Update
By experimenting, I found that
(rand()+rand()+rand()+rand()+rand()+rand()-3)*sqrt(2)*sd+mean
gives me a set of data with the desired standard deviation and mean. I have tested this in a database (PostgreSQL) with 10 million rows, using the stddev() and avg() aggregate functions, and typical results are accurate to within 2 decimal places, which isn't too bad.
I have no idea why sqrt(2) is involved …
Solution
OK, thanks to Severin Pappadeux below, I have an answer.
I can generate a reasonable result with:
(rand() + … + rand() - n/2) / sqrt(n/12) * sd + mean
where n is the number of rand() calls I am prepared to make.
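The expression above can be sketched in Python as follows. This is only an illustration: the function name approx_normal, its parameter names, and the default n=12 are assumptions made for the example, and Python's random.random() stands in for rand().

```python
import math
import random


def approx_normal(mean, sd, n=12, rng=random):
    """Approximate a normal draw by summing n uniform [0, 1) values.

    The name, parameters, and default n are hypothetical choices
    for this sketch, not part of the original post.
    """
    total = sum(rng.random() for _ in range(n))
    # The sum has mean n/2 and standard deviation sqrt(n/12),
    # so this centers and rescales it to roughly N(0, 1).
    z = (total - n / 2) / math.sqrt(n / 12)
    # Scale and shift to the requested standard deviation and mean.
    return z * sd + mean
```

For instance, approx_normal(100, 15, n=6) corresponds to the six-call version from the question with mean 100 and sd 15.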
Recommended answer
That is a correct approach. The only problem is to carefully analyze the tails you're missing.
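One way to get a rough feel for the missing tails: a sum of n values in [0, 1) is bounded, so after centering and scaling it can never exceed sqrt(3n) in absolute value. The sketch below (the helper name tail_coverage is made up for this example) computes that bound and how much probability a true N(0,1) places beyond it. It only measures the truncation, not how the shape differs inside the reachable range.

```python
import math


def tail_coverage(n):
    """Return the largest |z| an n-uniform sum can reach after
    standardizing, and the N(0,1) probability mass beyond it."""
    # The raw sum lies in [0, n), so after subtracting n/2 and dividing
    # by sqrt(n/12) we get |z| <= (n/2) / sqrt(n/12) = sqrt(3n).
    z_max = math.sqrt(3 * n)
    # Two-sided normal tail: P(|Z| > z_max) = erfc(z_max / sqrt(2)).
    missing = math.erfc(z_max / math.sqrt(2))
    return z_max, missing


for n in (6, 12):
    z_max, missing = tail_coverage(n)
    print(f"n={n}: |z| <= {z_max:.3f}, normal mass beyond = {missing:.2e}")
```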
Let's consider making N(0,1), a Gaussian distribution with mean 0 and standard deviation 1. Any other Gaussian N(\mu, \sigma) is then just a scale and shift of N(0,1).
So, the proposed algorithm for G(0,1) (which is an approximation for N(0,1)) is
G(0,1) = U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1)
where U(0,1) is a uniformly distributed random number in the [0,1) range. Let's take a look at the mean.
E(G(0,1)) = 6*E(U(0,1)) = 6*0.5 = 3
which is exactly what you've got. So, to get a mean of 0 for G(0,1) we have to subtract 3. Let's now check the variance of G(0,1), which we have to make equal to 1.
V(G(0,1)) = 6*V(U(0,1)) = 6*(1/12) = 1/2
The standard deviation (σ) is the square root of the variance, so to make it 1 you have to divide by sqrt(1/2).
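For readers wondering where the 1/12 comes from, here is a short derivation of the variance of a single U(0,1), followed by the variance of the sum of six independent ones (the symbols U and U_i are just notation for this note):

```latex
\begin{aligned}
E[U]   &= \int_0^1 x \, dx = \tfrac{1}{2}, \qquad
E[U^2]  = \int_0^1 x^2 \, dx = \tfrac{1}{3}, \\
V(U)   &= E[U^2] - (E[U])^2 = \tfrac{1}{3} - \tfrac{1}{4} = \tfrac{1}{12}, \\
V\Big(\sum_{i=1}^{6} U_i\Big) &= 6 \cdot \tfrac{1}{12} = \tfrac{1}{2},
\qquad \sigma = \sqrt{\tfrac{1}{2}} .
\end{aligned}
```

The third line relies on the draws being independent, which is what lets the variances add.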
So, the final expression is
G(0,1) = (U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1) + U(0,1) - 3)/sqrt(1/2)
and it is a reasonably good approximation of N(0,1).
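A quick empirical check of this expression might look like the sketch below. The helper names g01 and summarize, the sample size, and the seed are arbitrary choices for illustration. For a true N(0,1), about 68% of values fall within one unit of zero, so the third number gives a rough sense of how close the shape is.

```python
import math
import random


def g01(rng=random):
    """One draw of G(0,1) built from six uniform [0, 1) values."""
    return (sum(rng.random() for _ in range(6)) - 3) / math.sqrt(1 / 2)


def summarize(samples):
    """Return sample mean, sample standard deviation, and the share of
    values within one unit of zero (about 0.68 for a true N(0,1))."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    within_one = sum(1 for x in samples if abs(x) < 1) / n
    return mean, sd, within_one


rng = random.Random(2024)  # fixed seed, chosen arbitrarily
print(summarize([g01(rng) for _ in range(100_000)]))
```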
"I have no idea why sqrt(2) is involved …"
Dividing by sqrt(1/2) is the same as multiplying by sqrt(2). Now I hope you know where it came from.
A simple corollary: for a sum of n U(0,1) values, the variance is n/12, so the divisor becomes sqrt(n/12).
Another simple corollary: because V(U(0,1)) is equal to 1/12, summing twelve U(0,1) values does not require any multiplier:
G(0,1) = Sum_1^12 U(0,1) - 6
This form is often cited in older sampling recipe books and papers.
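As a minimal sketch, the twelve-term version needs no square root at all (the function name g01_twelve is hypothetical):

```python
import random


def g01_twelve(rng=random):
    """Sum twelve uniform [0, 1) values and subtract the mean of 6."""
    # Variance is 12 * (1/12) = 1, so no scaling factor is needed.
    return sum(rng.random() for _ in range(12)) - 6
```

Note that the result is always in [-6, 6), so values further out than six standard deviations can never occur.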
You might also want to take a look at the related Irwin-Hall distribution and Bates distribution.
Update
I've thought about a simplification of the approach. Suppose we sum an even number of U(0,1) values, so n=2m. Again, taking G(0,1) as an approximation for N(0,1):
G(0,1) = (Sum_1^2m U(0,1) - m ) / sqrt(m/6)
which we rewrite as
G(0,1) = (Sum_1^m U(0,1) - (m - Sum_1^m U(0,1)))/sqrt(m/6) =
= (Sum_1^m U(0,1) - Sum_1^m(1 - U(0,1)))/sqrt(m/6)
Because 1 - U(0,1) has the same distribution as U(0,1), we can write G(0,1) in symmetric form (each U(0,1) below denotes an independent draw):
G(0,1) = (Sum_1^m U(0,1) - Sum_1^m U(0,1))/sqrt(m/6) =
= Sum_1^m (U(0,1) - U(0,1)) / sqrt(m/6)
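The symmetric form could be sketched like this. The function name symmetric_normal is an assumption for the example, and the two rng.random() calls in each term are independent draws, so the difference is not identically zero.

```python
import math
import random


def symmetric_normal(m, rng=random):
    """Approximate N(0,1) from m differences of independent uniforms."""
    # Each difference U - U' has mean 0 and variance 1/12 + 1/12 = 1/6,
    # so the sum of m of them has standard deviation sqrt(m/6).
    total = sum(rng.random() - rng.random() for _ in range(m))
    return total / math.sqrt(m / 6)
```

This uses 2m uniform draws in total, the same as the n=2m version above, but there is no mean to subtract.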