Problem description

I'm quite new to CUDA and I'm having quite a few problems with something I'm trying to create. The problem is the following: I have a square matrix (for now it's 5x5, but it will be much bigger, like 1k x 1k). The matrix is filled with random numbers and then I pass it to the device, where it will do some work (for now it only applies some thresholds). The code is the following:

#include <stdio.h>
#include <stdlib.h>

#define N 3
#define MINTHRESHOLD 100
#define MAXTHRESHOLD 200
#define THREADS 128

__global__ void applyThresh(int *d_base, int *d_thresh) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;

    while(tid < N) {
        if(d_base[tid] > MAXTHRESHOLD) {
            d_thresh[tid] = MAXTHRESHOLD;
        } else if(d_base[tid] < MINTHRESHOLD) {
            d_thresh[tid] = MINTHRESHOLD;
        } else {
            d_thresh[tid] = d_base[tid];
        }
        tid += stride;
    }
}

int main( void ) {
    cudaError_t err;
    int *base, *thresh, *d_base, *d_thresh, i;

    base = (int*)malloc((N * N) * sizeof(int));
    thresh = (int*)malloc((N * N) * sizeof(int));

    err = cudaMalloc((void**)&d_base, (N * N) * sizeof(int));
    if(err != cudaSuccess) {printf("ERROR 1"); return -1;}
    err = cudaMalloc((void**)&d_thresh, (N * N) * sizeof(int));
    if(err != cudaSuccess) {printf("ERROR 2"); return -1;}


    for(i = 0; i < N * N; i++) {
        base[i] = rand() % 256;
        thresh[i] = 0;
    }

    err = cudaMemcpy(d_base, base, (N * N) * sizeof(int), cudaMemcpyHostToDevice);
    if(err != cudaSuccess){printf("ERROR 3"); return -1;}

    applyThresh<<<(N + THREADS - 1)/THREADS , THREADS>>>(d_base, d_thresh);

    err = cudaMemcpy(thresh, d_thresh, (N * N) * sizeof(int), cudaMemcpyDeviceToHost);
    if(err != cudaSuccess) {printf("ERROR 4"); return -1;}

    for(i = 0; i < N *N; i++) {
        printf("%d -> ", base[i]);
        printf("%d\n", thresh[i]);
    }

    free(base);
    free(thresh);
    cudaFree(d_base);
    cudaFree(d_thresh);

    return 0;
}

The output of the program is the following:

41 -> 100
35 -> 100
190 -> 190
132 -> 132
225 -> 200
108 -> -1082130432
214 -> -1082130432
174 -> 1007746492
82 ->  100509168

I really can't understand the problem... I think it might be caused by the index I'm using to access the matrices, but I really can't find a solution :(

Recommended answer

while(tid < N) {

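You are only processing the first N elements of the array, so the remaining entries of d_thresh are never written and are printed as whatever happened to be in that device memory. Change the condition to tid < N * N.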
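One way to avoid mixing up N and N * N again is to pass the element count to the kernel instead of reading the macro inside it. The following is only a sketch of that idea, not the original poster's code: the `count` parameter, the `clampToRange` helper and the `thresholdOnDevice` wrapper are names made up for illustration, and it assumes `count` is greater than zero. It also checks `cudaGetLastError()` after the launch, which the question's code does not do.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda_runtime.h>

#define MINTHRESHOLD 100
#define MAXTHRESHOLD 200
#define THREADS 128

// Hypothetical helper (not in the question): clamp one value to the
// [MINTHRESHOLD, MAXTHRESHOLD] range. Callable from host and device.
__host__ __device__ inline int clampToRange(int v) {
    if (v > MAXTHRESHOLD) return MAXTHRESHOLD;
    if (v < MINTHRESHOLD) return MINTHRESHOLD;
    return v;
}

// Same grid-stride loop as in the question, but bounded by the total
// number of elements (count = N * N for an N x N matrix), not by N.
__global__ void applyThresh(const int *d_base, int *d_thresh, int count) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (; tid < count; tid += stride) {
        d_thresh[tid] = clampToRange(d_base[tid]);
    }
}

// Hypothetical host wrapper: copies `count` ints to the device, runs the
// kernel, and copies the result back. Returns the first CUDA error seen.
// Assumes count > 0 (a zero-block launch would be rejected).
cudaError_t thresholdOnDevice(const int *base, int *thresh, int count) {
    int *d_base = NULL, *d_thresh = NULL;
    size_t bytes = (size_t)count * sizeof(int);

    cudaError_t err = cudaMalloc((void**)&d_base, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void**)&d_thresh, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_base, base, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int blocks = (count + THREADS - 1) / THREADS;
        applyThresh<<<blocks, THREADS>>>(d_base, d_thresh, count);
        err = cudaGetLastError();  // catches launch configuration errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(thresh, d_thresh, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_base);    // cudaFree(NULL) is a harmless no-op
    cudaFree(d_thresh);
    return err;
}
```

From `main` in the question this would be called as `thresholdOnDevice(base, thresh, N * N)`, so the matrix size appears in exactly one place.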

