Nvidia NPP nppiFilter produces garbage when convolving with a 2D kernel

Problem description


Nvidia Performance Primitives (NPP) provides the nppiFilter function for convolving a user-provided image with a user-provided kernel. For 1D convolution kernels, nppiFilter works properly. However, nppiFilter is producing a garbage image for 2D kernels.

I used the typical Lena image as input:


Here's my experiment with a 1D convolution kernel, which produces good output.

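#include <iostream>       // std::cout, std::endl
#include <cuda_runtime.h> // cudaMallocPitch, cudaMemcpy2D
#include <npp.h> // provided in CUDA SDK
#include <ImagesCPU.h> // these image libraries are also in CUDA SDK
#include <ImagesNPP.h>
#include <ImageIO.h>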

void test_nppiFilter()
{
    npp::ImageCPU_8u_C1 oHostSrc;
    npp::loadImage("Lena.pgm", oHostSrc);
    npp::ImageNPP_8u_C1 oDeviceSrc(oHostSrc); // malloc and memcpy to GPU 
    NppiSize kernelSize = {3, 1}; // dimensions of convolution kernel (filter)
    NppiSize oSizeROI = {oHostSrc.width() - kernelSize.width + 1, oHostSrc.height() - kernelSize.height + 1};
    npp::ImageNPP_8u_C1 oDeviceDst(oSizeROI.width, oSizeROI.height); // allocate device image of appropriately reduced size
    npp::ImageCPU_8u_C1 oHostDst(oDeviceDst.size());
    NppiPoint oAnchor = {2, 1}; // found that oAnchor = {2,1} or {3,1} works for kernel [-1 0 1] 
    NppStatus eStatusNPP;

    Npp32s hostKernel[3] = {-1, 0, 1}; // convolving with this should do edge detection
    Npp32s* deviceKernel;
    size_t deviceKernelPitch;
    cudaMallocPitch((void**)&deviceKernel, &deviceKernelPitch, kernelSize.width*sizeof(Npp32s), kernelSize.height*sizeof(Npp32s));
    cudaMemcpy2D(deviceKernel, deviceKernelPitch, hostKernel,
                     sizeof(Npp32s)*kernelSize.width, // sPitch
                     sizeof(Npp32s)*kernelSize.width, // width
                     kernelSize.height, // height
                     cudaMemcpyHostToDevice);
    Npp32s divisor = 1; // no scaling

    eStatusNPP = nppiFilter_8u_C1R(oDeviceSrc.data(), oDeviceSrc.pitch(),
                                          oDeviceDst.data(), oDeviceDst.pitch(),
                                          oSizeROI, deviceKernel, kernelSize, oAnchor, divisor);

    std::cout << "NppiFilter error status " << eStatusNPP << std::endl; // prints 0 (no errors)
    oDeviceDst.copyTo(oHostDst.data(), oHostDst.pitch()); // memcpy to host
    saveImage("Lena_filter_1d.pgm", oHostDst); 
}

Output of the above code with kernel [-1 0 1] -- it looks like a reasonable gradient image:


However, nppiFilter outputs a garbage image if I use a 2D convolution kernel. Here are the things that I changed from the above code to run with the 2D kernel [-1 0 1; -1 0 1; -1 0 1]:

NppiSize kernelSize = {3, 3};
Npp32s hostKernel[9] = {-1, 0, 1, -1, 0, 1, -1, 0, 1};
NppiPoint oAnchor = {2, 2}; // note: using anchor {1,1} or {0,0} causes error -24 (NPP_TEXTURE_BIND_ERROR)
saveImage("Lena_filter_2d.pgm", oHostDst);

Below is the output image using the 2D kernel [-1 0 1; -1 0 1; -1 0 1].

What am I doing wrong?

This StackOverflow post describes a similar problem, as shown in user Steenstrup's image: http://1ordrup.dk/kasper/image/Lena_boxFilter5.jpg


A few final notes:

  • With the 2D kernel, for certain anchor values (e.g. NppiPoint oAnchor = {0, 0} or {1, 1}), I get error -24, which translates to NPP_TEXTURE_BIND_ERROR according to the NPP User Guide. This issue was mentioned briefly in this StackOverflow post.
  • This code is very verbose. This isn't the main question, but does anyone have any suggestions for how to make this code more concise?

Solution

You are using a 2D memory allocator for the kernel array. Kernel arrays are dense 1D arrays, not pitched (strided) 2D arrays like a typical NPP image.

Replace the 2D allocation (cudaMallocPitch) with a plain cudaMalloc of kernelWidth*kernelHeight*sizeof(Npp32s) bytes, and copy the data with a plain cudaMemcpy instead of cudaMemcpy2D.

//1D instead of 2D
cudaMalloc((void**)&deviceKernel, kernelSize.width * kernelSize.height * sizeof(Npp32s));
cudaMemcpy(deviceKernel, hostKernel, kernelSize.width * kernelSize.height * sizeof(Npp32s), cudaMemcpyHostToDevice);
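To illustrate why a pitched buffer breaks a dense kernel, here is a minimal host-only sketch (no CUDA needed). It is not NPP code: the helper names copyToPitched and readAsDense are hypothetical, and the pitch is whatever value the caller passes in, standing in for the device-dependent pitch that cudaMallocPitch would return. Padding bytes are filled with 0x7F so they are easy to spot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy a dense row-major kernel into a buffer whose rows
// start every `pitchBytes` bytes, mimicking the layout cudaMemcpy2D produces.
std::vector<unsigned char> copyToPitched(const std::int32_t* dense, int width,
                                         int height, std::size_t pitchBytes) {
    const std::size_t rowBytes = width * sizeof(std::int32_t);
    assert(pitchBytes >= rowBytes);  // a pitch is never smaller than one row
    std::vector<unsigned char> buf(pitchBytes * height, 0x7F);  // 0x7F = padding
    for (int r = 0; r < height; ++r) {
        std::memcpy(buf.data() + r * pitchBytes, dense + r * width, rowBytes);
    }
    return buf;
}

// Hypothetical helper: read the first width*height values as if the buffer
// were dense, which is how the answer says a kernel array is interpreted.
std::vector<std::int32_t> readAsDense(const std::vector<unsigned char>& buf,
                                      int width, int height) {
    std::vector<std::int32_t> out(width * height);
    std::memcpy(out.data(), buf.data(), out.size() * sizeof(std::int32_t));
    return out;
}
```

With the 3x3 kernel from the question and an assumed pitch of 512 bytes, readAsDense returns the first row (-1, 0, 1) correctly, but the remaining six coefficients come from padding rather than from rows two and three. This would also be consistent with the 3x1 kernel working: with a single row, a dense read never reaches the padding.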

As an aside, a "scale factor" of 1 does not mean no scaling. Scaling is applied as a factor of 2^(-ScaleFactor).
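As a small worked example of the 2^(-ScaleFactor) rule, the sketch below applies a power-of-two scale factor to a single value. The helper name applyScaleFactor is hypothetical, and the rounding and saturation that NPP applies to integer results are deliberately not modelled here; which NPP functions take a scale factor rather than a divisor is something to check in the NPP documentation.

```cpp
#include <cmath>

// Hypothetical helper: scale a raw result by 2^(-scaleFactor).
// Illustration only; NPP's integer rounding and saturation are not modelled.
double applyScaleFactor(double value, int scaleFactor) {
    return std::ldexp(value, -scaleFactor);  // computes value * 2^(-scaleFactor)
}
```

Under this rule a scale factor of 0 leaves a value of 200 unchanged, while a scale factor of 1 halves it to 100.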
