PCI Express and DMA

I am looking for some assistance writing a driver and FPGA code to handle DMA on a PCI Express system. The FPGA is a Xilinx Virtex-II Pro (V2P) with a Xilinx x4 PCIe LogiCORE (v3.0).

I've scoured the entire PCI Express Base Specification v2.0 (the Solari/Intel book) and DMA isn't mentioned once, as far as I can tell. I suppose it is at a higher level than the base spec covers. The Xilinx manuals don't mention it, either. I've also googled everywhere (websites, groups, etc.) for any mention of PCI Express and DMA, to no avail.

Where should I go to find out how PCI Express handles DMA? What should the TLP messages look like? Are there any reference designs / sample code available?

I look forward to hearing from the community about this issue.

Thank you,

--Alex Gross

Reply to
SongDragon
[ ... ]

PCI (Express or otherwise) doesn't really support DMA as such. Looking for bus mastering is much more likely to get you useful information.

--
    Later,
    Jerry.

The universe is a figment of its own imagination.
Reply to
Jerry Coffin

The DMA isn't done by PCI Express itself - it's done by the surrounding layers. PCI, PCI-X and PCIe all have the ability to act as a master in a burst transaction. For your FPGA to DMA to another system, it needs to issue a request to the core to master a transaction. Once granted, the transaction will specify the location for the data transfer, which has to be coordinated in your system, not in the PCIe core. The transaction can provide a complete payload or may be interrupted (at least in PCI/X land) to allow other higher-priority transactions to occur.
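
Conceptually, the mastering side just chops the transfer up into payload-sized writes. A rough C-style pseudocode sketch of the idea (the names here are invented and the real logic would of course live in your HDL, not in C):

#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Invented stand-in for "present one master memory-write to the core".
 * In a real design this is the FPGA backend driving the core's transmit
 * interface; here it just reports what it would do. */
static void issue_mem_write(uint64_t host_addr, size_t len)
{
    printf("master write: %zu bytes to host address 0x%llx\n",
           len, (unsigned long long)host_addr);
}

/* Conceptual master-write loop: split the buffer into chunks no larger than
 * the maximum payload and issue one memory-write transaction per chunk. */
static void dma_write_to_host(uint64_t host_addr, size_t total, size_t max_payload)
{
    size_t done = 0;
    while (done < total) {
        size_t chunk = total - done;
        if (chunk > max_payload)
            chunk = max_payload;
        issue_mem_write(host_addr + done, chunk);
        done += chunk;
    }
}

int main(void)
{
    dma_write_to_host(0x12340000ULL, 4096, 128);  /* 128-byte max payload */
    return 0;
}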

Look at mastering transactions and post again with further questions.

Reply to
John_H

Assuming the LogiCORE is capable of bus mastering, you need to instantiate a 'DMA controller' in your FPGA; either your own design or borrowed from another source.

A 'DMA controller' can simply be a set of registers (sometimes referred to as 'descriptors') mapped into the host address space that allow the software to set up a DMA transfer - source address, destination address, transfer size, control/status, etc. - hit a 'GO' bit, and generate an interrupt when it's done. If you want to get more fancy, add multiple channels, scatter-gather descriptors, request queuing, etc.
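
As a purely illustrative example, such a register block might look something like this from the host's side. The names, offsets and bit assignments below are invented for the sake of the sketch, not taken from any particular core:

#include <stdint.h>

/* Hypothetical DMA-controller register map, as seen through a BAR. */
enum {
    DMA_REG_SRC_ADDR = 0x00,  /* source address                           */
    DMA_REG_DST_ADDR = 0x04,  /* destination address                      */
    DMA_REG_LENGTH   = 0x08,  /* transfer size in bytes                   */
    DMA_REG_CTRL     = 0x0C,  /* control: GO, direction, interrupt enable */
    DMA_REG_STATUS   = 0x10   /* status: busy, done, error                */
};

/* Control register bits (again, invented). */
#define DMA_CTRL_GO       (1u << 0)
#define DMA_CTRL_DIR_READ (1u << 1)  /* 1 = device reads from host memory */
#define DMA_CTRL_IRQ_EN   (1u << 2)

/* Status register bits. */
#define DMA_STAT_BUSY     (1u << 0)
#define DMA_STAT_DONE     (1u << 1)  /* write 1 to clear / acknowledge */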

From the back side of the PCIe core, all the DMA controller does is request the bus and issue a standard (burst, in PCI-land) read/write to/from the source/destination addresses in the registers. PCIe itself has no concept of 'DMA' - all it sees is another PCIe transfer.

Exactly how you establish the transfer in the core is dependent on the backend interface of the LogiCORE. You shouldn't have to worry about the format of the TLP at all if there's a decent backend interface.

A DMA controller IP core for PCI would still illustrate the concepts and give some insight into what you're up for. At the risk of muddying the waters further, there's a Wishbone DMA core on OpenCores which can ultimately be used for PCI DMA transfers when connected to a PCI core (the OpenCores PCI core is a Wishbone bridge, so it bolts straight on). Might even be worth just looking at the doco for it.

As for the driver, that will depend on what class of device you're implementing, especially if you're talking about Windows. Your best bet there is to find an open-source/example driver for a similar device. If you're doing Windows and need a 'grass-roots' high-performance driver, prepare yourself for a frustrating and challenging time.

Regards,

--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, 
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Reply to
Mark McDougall

Thanks for the helpful responses from everyone.

The basic idea seems to be as follows:

1) Device driver (let's say for Linux 2.6.x) requests some kernel-level physical memory.
2) Device driver performs a MEMWRITE32 (length = 1) to a register ("destination descriptor") on the PCIe device, setting the destination address in that memory.
3) Device driver performs a MEMWRITE32 (length = 1) to a register ("length descriptor") on the PCIe device, setting length "N". (We'll say this also signals "GO".)
4) PCIe device sends MEMWRITE32s (each length = up to 128 bytes at a time) to _______ (what is the destination?) until length N is reached.
5) PCIe device sends an interrupt (for now, let's say INTA ... it could be MSI, though).
6) Device driver services the interrupt and writes a zero to a register ("serviced descriptor"), telling the PCIe device the interrupt has been fielded.
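
In rough code, I picture the driver side (steps 1-3 and 6) looking something like the sketch below. The register offsets are names I've made up, error handling is omitted, and the interrupt flags/handler prototype differ a bit between early and late 2.6 kernels, so treat it as a sketch only:

#include <linux/pci.h>
#include <linux/interrupt.h>
#include <linux/io.h>

/* Hypothetical register offsets within BAR0 (invented for this sketch). */
#define DMA_REG_DST_ADDR  0x04
#define DMA_REG_LENGTH    0x08
#define DMA_REG_STATUS    0x10

#define BUF_SIZE          (64 * 1024)

static void __iomem *regs;   /* BAR0 register mapping */
static void *buf;            /* kernel virtual address of the DMA buffer */
static dma_addr_t buf_bus;   /* bus address the device will write to */

/* Step 6: the device has finished writing the buffer. */
static irqreturn_t dma_irq(int irq, void *dev_id)
{
    iowrite32(0, regs + DMA_REG_STATUS);   /* tell the device it's been fielded */
    /* ... hand the filled buffer off to whoever is waiting ... */
    return IRQ_HANDLED;
}

/* Steps 1-3, assuming pci_enable_device() has already been called. */
static int start_dma(struct pci_dev *pdev)
{
    pci_set_master(pdev);                     /* allow the device to master */
    regs = pci_iomap(pdev, 0, 0);             /* map the device registers   */

    /* Step 1: coherent buffer the device can write into. */
    buf = pci_alloc_consistent(pdev, BUF_SIZE, &buf_bus);

    request_irq(pdev->irq, dma_irq, IRQF_SHARED, "mydma", pdev);

    iowrite32((u32)buf_bus, regs + DMA_REG_DST_ADDR);  /* step 2: destination */
    iowrite32(BUF_SIZE, regs + DMA_REG_LENGTH);        /* step 3: length + GO */
    return 0;
}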

I have a number of questions regarding this. First and foremost, is this view of the transaction correct? Is this actually "bus mastering"? It seems like for PCIe, since there is no "bus", there are no additional requirements to handle other devices "requesting" the bus. So I shouldn't have to perform any bus arbitration (listen in to see if any of the other INT pins are being triggered, etc.). Is this assumption correct?

In PCI Express, you have to specify a bunch of things in the TLP header, including bus #, device #, function #, and tag. I'm not sure what these values should be. If the CPU were requesting a MEMREAD32, the values for these fields in the MEMREAD32_COMPLETION response would be set to the same values as were included in the MEMREAD32. However, since the PCIe device is actually sending out a MEMWRITE32 command, the values for these fields are not clear to me.
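
For reference, those fields pack into the 16-bit Requester ID in the TLP header: bits [15:8] are the bus number, [7:3] the device number and [2:0] the function number. A trivial sketch of the packing, with a made-up helper name:

#include <stdint.h>
#include <stdio.h>

/* Pack bus/device/function into the 16-bit Requester ID field used in TLP
 * headers: [15:8] bus, [7:3] device, [2:0] function. */
static uint16_t make_requester_id(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x07));
}

int main(void)
{
    /* e.g. an endpoint that was enumerated as bus 2, device 0, function 0 */
    printf("Requester ID = 0x%04x\n", make_requester_id(2, 0, 0));
    return 0;
}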

Thanks,

--Alex

Reply to
SongDragon

(snip snip)

Your description of events is pretty much correct. The exact registers and sequencing will of course depend on your implementation of a DMA controller.

You'll need a source register too unless the data is being supplied by a FIFO or I/O "pipe" on the device.

"Bus mastering" is a PCI term and refers to the ability to initiate a PCI transfer - which also implies the capability to request the bus.

In PCIe nomenclature, an entity that can initiate a transfer is referred to as a "requester", and you're right, there's no arbitration involved as such. But this is the equivalent of a PCI bus master, I suppose. The target of the request is called the "completer".

This is where my knowledge of PCIe becomes thinner, as I'm currently in the process of ramping up for a PCIe project myself. But I have worked on several PCI projects so I think my foundations are valid.

For example, using a (bus-mastering) PCI core you wouldn't have to 'worry about' requesting the bus etc - initiating a request via the back-end of the core would trigger that functionality in the core transparently for you. As far as your device is concerned, you have "exclusive" use of the bus - you may just have to wait a bit to get to use it (and you may get interrupted occasionally). Arbitration etc is not your problem.

This is where I'll have to defer to others...

Regards,

--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, 
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Reply to
Mark McDougall

BTW if you're writing Linux device drivers as opposed to Windows drivers, you're in for a *much* easier ride! :)

Regards,

--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, 
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Reply to
Mark McDougall

easier ride? how much easier?

Just yesterday I wrote a test application that allocates a system DMA buffer and sends its physical address to the PCI target, which then starts a master transaction.

The PCI backend logic needed about 20 lines of Verilog; for the WinXP test application I wrote about 15 lines of Delphi code.

you say on linux it would be easier?

well if you have linux box in your bedroom then maybe :)

Antti

PS: Actually Linux device drivers are fun, I agree, but quick-and-dirty direct hardware programming on WinXP is simple as well.

Reply to
Antti

As I said:

There are several options these days to make life a lot easier on Windows, for example the Jungo tools, TVICPCI, etc. But to some extent it depends on what type of driver you're writing, what performance you need, and what versions of Windows you need to support.

A big part of the time/effort is simply ramping up on Windows device drivers - working out what *type* of driver you need to write (is it WDM? native kernel mode? VxD? upper/lower filter? HID?) - sometimes you even need two drivers! - and how it fits into the whole mess.

Years and years ago I spent *months* writing a SCSI miniport driver for 95/NT4/2K, which included support calls to M$. Once I'd finished, it took me 3 days to get a basic port running on Linux, and I'd *never* written a Linux device driver before.

Regards,

--
Mark McDougall, Engineer
Virtual Logic Pty Ltd, 
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Reply to
Mark McDougall

Alex,

I was wondering if you made any more progress with the PCI Express DMA problem. I have a similar problem, but it concerns bursting of data from the host to the endpoint. My Windows driver sets up a buffer of data to be sent to the endpoint and initiates a block transfer. The chipset, however, breaks this block into multiple single-DW transfers, effectively killing performance. I believe that allowing the endpoint to become the bus master and initiate block transfers by reading from the allocated buffer on the host will lead to better bus utilization. Do you have any ideas about this, or any updates on your progress with DMA?

Thanks --Kevin

Reply to
Kevin Irick
