What is a good interface for a Linux device driver to a coprocessing peripheral?

Question

I've written some Linux device drivers but I am still at the level of newbie hack. I can get them working but that's all I can claim. So far, I've been able to work them into a model of write data using write() and read data using read(). I occasionally use ioctl for more fine-tuned control.

Now I want to build a coprocessing block in FPGA logic and write a device driver for the ARM processor in that same FPGA to offload work from the ARM to the FPGA. I'm having a hard time working out how best to design this interface.

If access to the coprocessor were exclusive, data could be written to the driver, the processing would happen in the FPGA fabric, and the data would be retrieved with a call to read. However, exclusive access to the coprocessing hardware would be a waste. Ideally, any user-space process could use the hardware if it's available. I believe it would work if policy required user-space processes to open the device, write data, read results, then close the file. But it seems like the overhead of opening and closing the file each time the coprocessor needs to be accessed offsets the benefit of offloading the work in the first place.

I understand that there is a world of issues to be dealt with inside the device driver code to safely handle multiple access to the hardware. But just from a high level, I would love to see a concept that would make this interface work and adhere to good practices for Linux device drivers.

Temporarily sweeping aside all complications, the ideal seems like a system where any process can open the device and have an access point: data is written to the device, perhaps in a blocking call, and results are read back after the coprocessor does its magic. The driver would handle the hardware accesses, and each calling process could keep the device file open for as long as it's needed. Absolutely any insights or guidance would be greatly appreciated!
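The pattern described above can be sketched as a small user-space wrapper: open the device once, then perform repeated write-then-read transactions against the open descriptor. The device node name `/dev/coproc` and the transaction shape are hypothetical, not part of any real driver:

```c
/* Sketch of the open-once, transact-many pattern described above.
 * The device node /dev/coproc and the block sizes are hypothetical. */
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* One offload transaction: write the input block, then block in read()
 * until the coprocessor has produced the result. */
ssize_t coproc_transact(int fd, const void *in, size_t in_len,
                        void *out, size_t out_len)
{
    ssize_t n = write(fd, in, in_len);
    if (n < 0 || (size_t)n != in_len)
        return -1;
    return read(fd, out, out_len);  /* blocks until results are ready */
}
```

A process would call `open("/dev/coproc", O_RDWR)` once, invoke `coproc_transact()` as many times as needed, and `close()` only when finished, which amortizes the open/close overhead the question worries about.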

This is all extra information in case anyone cares or it's somehow useful or interesting:

This particular FPGA is a Zynq device from Xilinx. It has a dual-core ARM Cortex-A9 on the same silicon as the FPGA fabric (which is based on their Kintex family). The system is running Arch Linux ARM and has done so quite beautifully for a year now. I use the generic name "coprocessor hardware" because the idea is that this chunk of hardware will gain capability over time while the user-space interface to its device driver remains fairly constant. You will be able to, for example, write 1024 samples and have this block perform a low-pass filtering operation, an FFT, etc., and get the results faster than the processor could have done on its own.

Thank you! This is my first question here so I apologize for breaches of protocol and inherent ignorance.

- Tim

Answer

My team has been working on this sort of thing for a couple of years. To enable the lowest latency between the CPU and the programmable logic, we memory map the hardware into the application process so that it can directly communicate with the hardware. This eliminates the OS overhead after the initial connection.
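The memory-mapping approach can be sketched with `mmap()`. One common route on Zynq is the kernel's UIO (userspace I/O) framework, which exposes a device's register region through a `/dev/uioN` node; the path and region size below are placeholders, not part of the answer's actual setup:

```c
/* Sketch: map a hardware register region into the process, assuming a
 * UIO driver exposes the block at /dev/uio0.  Path and size are
 * placeholders for this example. */
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

volatile uint32_t *map_regs(const char *path, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}
```

After `volatile uint32_t *regs = map_regs("/dev/uio0", 4096);`, register access is a plain load or store (`regs[0] = cmd;`) with no syscall per access, which is what removes the OS overhead after the initial connection.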

Still, we find that CPU -> accelerator and back is at least 1 microsecond. This leads us to offload bigger chunks of work or use this path to configure data acquisition operations that write results directly to the system DRAM.

Depending on the mix of work, there are a variety of ways you can arrange for the accelerator to be shared.

  1. You can have a mutex protecting the hardware, so that each process using it has exclusive access.

  2. You can have a daemon with exclusive access, and have it multiplex requests and demultiplex responses.

  3. Your accelerator can provide multiple independent ports that can be used simultaneously by different processes. You need a way to assign the ports to processes and to reclaim them afterward.
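For the first option, one user-space way to get the mutual exclusion is an advisory `flock()` on the device node (or a companion lock file); alternatively the driver could take an in-kernel mutex in its `open()`/`release()` handlers. This is a sketch under that assumption, not the answer's actual mechanism:

```c
/* Sketch of option 1 in userspace: serialize access with an advisory
 * flock() on the device node or a companion lock file. */
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

int coproc_acquire(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) { /* blocks until we hold the lock */
        close(fd);
        return -1;
    }
    return fd;                     /* lock is released on close(fd) */
}
```

Because `flock()` locks belong to the open file description, the lock is dropped automatically if the process exits or crashes, which makes cleanup simpler than a shared-memory mutex.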

If your accelerator has request and response queues, they can be accessed either by programmed IO (memory map hardware registers) or by shared memory queues in system DRAM (and DMA to/from the programmable logic).
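The shared-memory queue idea can be illustrated with a minimal single-producer, single-consumer ring of fixed-size descriptors. In a real design the ring would live in DMA-coherent DRAM and the index updates would need memory barriers and cache maintenance; this sketch shows only the layout and index arithmetic:

```c
/* Minimal SPSC ring of request descriptors, as might sit in system
 * DRAM with the CPU as producer and the FPGA logic as consumer.
 * Barriers and cache maintenance are deliberately omitted here. */
#include <stdint.h>

#define RING_SLOTS 16              /* power of two for cheap wrap-around */

struct ring {
    uint32_t head;                 /* written by the producer (CPU) */
    uint32_t tail;                 /* written by the consumer (FPGA) */
    uint64_t slot[RING_SLOTS];     /* request descriptors */
};

/* Returns 0 on success, -1 if the ring is full. */
int ring_push(struct ring *r, uint64_t req)
{
    if (r->head - r->tail == RING_SLOTS)
        return -1;                 /* full */
    r->slot[r->head % RING_SLOTS] = req;
    r->head++;                     /* publish after the slot is written */
    return 0;
}

/* Returns 0 on success, -1 if the ring is empty. */
int ring_pop(struct ring *r, uint64_t *req)
{
    if (r->head == r->tail)
        return -1;                 /* empty */
    *req = r->slot[r->tail % RING_SLOTS];
    r->tail++;
    return 0;
}
```

With this layout the CPU only ever writes `head` and the logic only ever writes `tail`, so neither side needs a lock to use its end of the queue.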

See our FPGA 2015 paper for some more discussion of these approaches: http://www.connectal.org/connectal-fpga2015.pdf
