Problem description

I am trying to implement a Metropolis-Hastings algorithm in CUDA. For this algorithm, I need to be able to generate many uniform random numbers with varying ranges. Therefore, I would like to have a function called runif(min, max) that returns a uniformly distributed number in the given range. This function has to be called multiple times inside another function that actually implements the algorithm.

Based on this post, I tried to put the code shown there into a function (see below). If I understand this correctly, the same state leads to the same sequence of numbers. So, if the state doesn't change, I always get the same output. One alternative would be to generate a new state inside the runif function, so that each call uses a different state. As I've heard, though, this is not advisable because it makes the function slow.

So, what would be the best implementation of such a function? Should I generate a new state inside the function or generate a new one outside each time I call the function? Or is there yet another approach?

__device__ float runif(float a, float b, curandState state)
{
  float myrandf = curand_uniform_double(&state);
  myrandf *=  (b - a + 0.999999);
  myrandf += a;
  return myrandf;
}
Solution

How it works

The curand_uniform* family of functions takes a pointer to a curandState object, uses it to produce a number, and modifies it, so that the next call to a curand_uniform* function with the same state object produces the next number in the sequence instead of repeating the previous one.

The important point here is: in order to get meaningful results, the changes to curandState must be written back.

Wrong way 1

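Right now you are passing curandState by value, so runif modifies only its own copy and the state changes are lost as soon as the function returns. Every call therefore starts from the same state. On top of that, copying the state on each call is an unnecessary waste of time.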
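As a minimal sketch of the pointer-based alternative, the code below passes the state by pointer and draws twice from it. The scaling a + (b - a) * u, the use of curand_uniform (float) instead of curand_uniform_double, the seed value, and the helper names init_state, draw_two and runif_pointer_demo are all assumptions made for illustration, not part of the original post.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Uniform draw between a and b. The state is passed by pointer, so the
// update made by curand_uniform is visible to the caller.
__device__ float runif(float a, float b, curandState *state)
{
    float u = curand_uniform(state);   // u lies in (0, 1]
    return a + (b - a) * u;            // may round to exactly a in float
}

// Hypothetical helper: initialize a single state once, outside runif.
__global__ void init_state(curandState *state, unsigned long long seed)
{
    curand_init(seed, 0, 0, state);
}

// Hypothetical demo kernel: one thread draws twice from the same state.
__global__ void draw_two(curandState *state, float a, float b, float *out)
{
    out[0] = runif(a, b, state);
    out[1] = runif(a, b, state);       // the state has advanced: a new draw
}

// Hypothetical host wrapper; returns 0 on success.
int runif_pointer_demo(float a, float b, float out[2])
{
    curandState *d_state = nullptr;
    float *d_out = nullptr;
    if (cudaMalloc(&d_state, sizeof(curandState)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, 2 * sizeof(float)) != cudaSuccess) {
        cudaFree(d_state);
        return 1;
    }
    init_state<<<1, 1>>>(d_state, 1234ULL);   // arbitrary seed
    draw_two<<<1, 1>>>(d_state, a, b, d_out);
    cudaError_t err = cudaMemcpy(out, d_out, 2 * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    cudaFree(d_state);
    return err == cudaSuccess ? 0 : 1;
}
```

With the by-value signature from the question, both calls inside draw_two would start from the same state and return the same number.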

Wrong way 2

Creating and initializing a new local state inside the kernel will not only kill performance (and defeat the purpose of using CUDA) but will also give you the wrong distribution.

Right way

In the sample code you linked, curandState is passed by pointer, which guarantees that the modifications are saved (in whatever memory the pointer points to).

Usually, you would allocate and initialize an array of random states once in your program (before launching any kernels that require RNG). Then, to generate numbers, you access this array from your kernels, with indices based on thread IDs. Multiple (many) states are required to avoid data races: you need at least one state per concurrently running curand_uniform* call.
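The pattern described above can be sketched roughly as follows: one kernel initializes one state per thread, and a sampling kernel loads its state, draws, and writes the state back so that a later launch continues the sequence. The kernel names setup_states and sample, the host wrapper sample_twice, the seed, and the block size of 128 are hypothetical choices for this example.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Run once: give every thread its own state (same seed, different subsequence).
__global__ void setup_states(curandState *states, unsigned long long seed, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < n) curand_init(seed, id, 0, &states[id]);
}

// Each thread draws one number between a and b from its own state.
__global__ void sample(curandState *states, float a, float b, float *out, int n)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= n) return;
    curandState local = states[id];                  // work on a local copy
    out[id] = a + (b - a) * curand_uniform(&local);  // curand_uniform is in (0, 1]
    states[id] = local;                              // write the advanced state back
}

// Hypothetical host wrapper: initializes states once, then samples twice.
// Returns 0 on success; first and second must each hold n floats.
int sample_twice(float a, float b, int n, float *first, float *second)
{
    curandState *d_states = nullptr;
    float *d_out = nullptr;
    if (cudaMalloc(&d_states, n * sizeof(curandState)) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_states);
        return 1;
    }
    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    setup_states<<<blocks, threads>>>(d_states, 1234ULL, n);  // arbitrary seed

    sample<<<blocks, threads>>>(d_states, a, b, d_out, n);
    cudaError_t e1 = cudaMemcpy(first, d_out, n * sizeof(float),
                                cudaMemcpyDeviceToHost);

    // Second launch reuses the stored states, so it continues each sequence.
    sample<<<blocks, threads>>>(d_states, a, b, d_out, n);
    cudaError_t e2 = cudaMemcpy(second, d_out, n * sizeof(float),
                                cudaMemcpyDeviceToHost);

    cudaFree(d_out);
    cudaFree(d_states);
    return (e1 == cudaSuccess && e2 == cudaSuccess) ? 0 : 1;
}
```

Because each thread touches only states[id], no two threads update the same state concurrently, and the relatively expensive curand_init call happens only once per state.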

This way you avoid the performance cost of copies and state initialization, and you get the distribution you expect.

See the cuRAND documentation for more information and sample code.
