This article covers OpenGL-OpenCL interop transfer times and texturing a quad from a bitmap.

Problem Description

A two-part question:

I'm working on a school project that uses Conway's Game of Life as a vehicle to experiment with GPGPU. I'm using OpenCL and OpenGL for real-time visualization, and the goal is to make the simulation as large and as fast as possible. Profiling shows that frame time is dominated by CL acquiring and releasing the GL buffers, and that this cost is directly proportional to the size of the buffer.

1) Is this normal? Why should this be? To the best of my understanding, the buffer never leaves device memory, and the CL Acquire/Release acts like a mutex. Does OpenCL lock/unlock each byte individually or something?

To get around this, I've shrunk from 32-bit RGBA color (8 bits per channel, which I understand is OpenGL's preferred format) to 8-bit RGB color. This gave a major speedup, but after tuning my kernel, the transfer times dominate again.
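
Since the measured acquire/release cost grows with buffer size, it helps to see how much the per-cell format changes the byte count. RGBA at 8 bits per channel occupies 32 bits per cell, an 8-bit pixmap 8 bits, and a bitmap 1 bit. A minimal C sketch of that arithmetic; the function names and the 4096 x 4096 grid size used below are hypothetical, not from the question:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bytes needed for a width x height grid at bits_per_cell bits per cell,
   rounded up to a whole number of bytes. */
size_t grid_bytes(size_t width, size_t height, size_t bits_per_cell) {
    assert(bits_per_cell > 0);
    return (width * height * bits_per_cell + 7) / 8;
}

/* Print one frame's footprint for the three cell formats in the question. */
void print_footprints(size_t width, size_t height) {
    printf("RGBA8  (32 bits/cell): %zu bytes\n", grid_bytes(width, height, 32));
    printf("8-bit  ( 8 bits/cell): %zu bytes\n", grid_bytes(width, height, 8));
    printf("bitmap ( 1 bit/cell):  %zu bytes\n", grid_bytes(width, height, 1));
}
```

For a 4096 x 4096 grid, print_footprints(4096, 4096) reports 67108864, 16777216 and 2097152 bytes (64, 16 and 2 MiB). If transfer time really does scale with bytes, moving from the 8-bit pixmap to a bitmap would cut it by roughly 8x.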

In the absence of any ideas on how to eliminate the transfer times entirely (short of porting my kernel from OpenCL to GLSL, which would exceed the original scope of the project), I now figure that my best bet is to write to a bitmap (1 bit per cell, as opposed to the 8-bit pixmap I'm currently using) and then use that bitmap with a color index to texture a quad.
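
A bitmap stores one cell per bit, so eight cells share each byte. If separate work-items in the OpenCL kernel each wrote a single bit, those writes would race on the shared byte; one common way around that (an assumption here, not something the question specifies) is to have each work-item produce one whole output byte. The C sketch below models that per-byte packing on the CPU. The function names are hypothetical. It assumes the grid width is a multiple of 8 and packs the most significant bit first, which matches OpenGL's default GL_UNPACK_LSB_FIRST = GL_FALSE for 1-bit pixel data:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build packed byte number byte_index from eight consecutive cells
   (0 = dead, nonzero = alive). In a kernel, one work-item would own this
   byte, so no two work-items ever write the same byte. */
uint8_t pack_byte(const uint8_t *cells, size_t byte_index) {
    uint8_t out = 0;
    for (int bit = 0; bit < 8; ++bit) {
        if (cells[byte_index * 8 + bit])
            out |= (uint8_t)(0x80u >> bit); /* first cell -> most significant bit */
    }
    return out;
}

/* Pack a row-major width x height grid into width*height/8 bytes. */
void pack_grid(const uint8_t *cells, size_t width, size_t height, uint8_t *bits) {
    assert(width % 8 == 0); /* keeps every row byte-aligned */
    size_t n_bytes = width * height / 8;
    for (size_t i = 0; i < n_bytes; ++i)
        bits[i] = pack_byte(cells, i);
}
```

For an 8 x 2 grid whose first row is 1,0,0,0,0,0,0,1 and second row is 0,1,1,0,0,0,0,0, pack_grid produces the bytes 0x81 and 0x60.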

2) Can I texture a quad directly using a bitmap? I have considered using glBitmap to draw to an auxiliary buffer, and then using this buffer to texture my quad, but I would prefer to use a more direct route if one is available.
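
One possible direct route, offered as an assumption rather than as what the question or the answer describes, is to upload the packed bytes as a single-channel 8-bit texture (for example an integer GL_R8UI texture, available from OpenGL 3.0) and let a fragment shader pick out each cell's bit and map it through a two-entry color table. The per-texel logic is easy to get wrong, so here is a CPU model of it in C. The palette and function name are hypothetical, and it assumes MSB-first bit order within each byte and a grid width that is a multiple of 8:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } rgba8;

/* Hypothetical color table: index 0 = dead cell, index 1 = live cell. */
static const rgba8 PALETTE[2] = {
    {  0,   0,   0, 255},  /* dead: opaque black */
    {255, 255, 255, 255},  /* alive: opaque white */
};

/* Color of cell (x, y) in an MSB-first packed bitmap whose rows are
   width/8 bytes long. A fragment shader sampling the packed texture
   would do the same byte fetch, shift and mask for each fragment. */
rgba8 texel_color(const uint8_t *bits, size_t width, size_t x, size_t y) {
    assert(width % 8 == 0 && x < width);
    uint8_t byte = bits[y * (width / 8) + x / 8];
    unsigned index = (byte >> (7 - x % 8)) & 1u;
    return PALETTE[index];
}
```

With the bytes 0x81 and 0x60 as an 8 x 2 grid, cells (0, 0), (7, 0) and (1, 1) come out white and cells (1, 0) and (3, 1) come out black.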

Recommended Answer

The design intent behind the CL/GL interop acquire and release calls was that they be simple ownership transfers. However, many early implementations actually copied the images from CL to GL and back.

Unless you use the OpenCL 1.1 sync object extensions (cl_khr_gl_event on the CL side, GL_ARB_cl_event on the GL side), you need to call glFinish before acquiring the objects in CL, and clFinish after releasing them, before GL uses them again. You will see a lot of time spent in these calls because all queued work has to finish before they return. On some platforms you can use clFlush instead of clFinish; check your vendor's OpenCL documentation.
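
To make the ordering concrete, here is a toy C model of one frame's hand-off. It calls no OpenCL or OpenGL functions; every name is hypothetical, and each stand-in is commented with the real call it represents. It follows the OpenCL 1.1 specification's rules for shared objects without the sync extensions: GL work on the buffer must be complete (glFinish) before clEnqueueAcquireGLObjects, and CL work on it, including the release itself, must be complete (clFinish) before GL uses it again.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { OWNER_GL, OWNER_CL } owner_t;

typedef struct {
    owner_t owner;
    int gl_pending;  /* queued GL commands that touch the shared buffer */
    int cl_pending;  /* queued CL commands that touch the shared buffer */
} shared_buffer;

/* Stand-in for glFinish(): the real call blocks until queued GL work is
   done; here it just clears the counter. */
void model_gl_finish(shared_buffer *b) { b->gl_pending = 0; }

/* Stand-in for clFinish(queue): same idea for queued CL work. */
void model_cl_finish(shared_buffer *b) { b->cl_pending = 0; }

/* Stand-in for clEnqueueAcquireGLObjects(): GL must be done with the buffer. */
bool model_acquire(shared_buffer *b) {
    if (b->owner != OWNER_GL || b->gl_pending > 0) return false;
    b->owner = OWNER_CL;
    b->cl_pending++;  /* the acquire is itself a queued CL command */
    return true;
}

/* Stand-in for clEnqueueNDRangeKernel(): legal only while CL owns the buffer. */
bool model_run_kernel(shared_buffer *b) {
    if (b->owner != OWNER_CL) return false;
    b->cl_pending++;
    return true;
}

/* Stand-in for clEnqueueReleaseGLObjects(): hands ownership back to GL. */
bool model_release(shared_buffer *b) {
    if (b->owner != OWNER_CL) return false;
    b->owner = OWNER_GL;
    b->cl_pending++;  /* the release is queued too, not immediate */
    return true;
}

/* A GL draw that reads the buffer: CL work on it must be complete first. */
bool model_gl_draw(shared_buffer *b) {
    if (b->owner != OWNER_GL || b->cl_pending > 0) return false;
    b->gl_pending++;
    return true;
}

/* One frame in the required order; returns false if any step is illegal. */
bool model_frame(shared_buffer *b) {
    assert(b->gl_pending >= 0 && b->cl_pending >= 0); /* counters never go negative */
    model_gl_finish(b);
    if (!model_acquire(b) || !model_run_kernel(b) || !model_release(b)) return false;
    model_cl_finish(b);
    return model_gl_draw(b);
}
```

The model mirrors the answer's point: acquire and release only flip ownership, but the finish calls around them wait for everything still queued. A profiler attributes the kernel and rendering time still in flight to those synchronization points, which would be consistent with the cost growing alongside the buffer size in the question.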

With the latest NVIDIA and AMD drivers on reasonably recent hardware, I'm seeing the acquire and release calls complete quickly for HD-video-sized images.

10-12 15:40