Question on PCI-express versus Standard PCI performance

Benjamin Couillard 14 years ago

Hi everyone,

I'm working on a conversion project where we needed to convert a PCI acquisition card to a PCI-express (x1) acquisition card. The project is essentially the same, except that the new acquisition card is a PCI-express endpoint instead of a standard-PCI endpoint. The project is implemented on a Xilinx FPGA, but I don't think my issue is Xilinx-specific.

The conversion has worked fine on all levels except one: the read latency of PCI-express is about 4 times higher than that of standard PCI. For example, on the old product it takes about 0.9 us to perform a 1-DWORD read. With the PCI-express product it takes about 3-4 us to perform a 1-DWORD read. I've seen this read latency both in real life (with a real board) and in VHDL simulation, so I don't think that this is a driver issue. Have any of you experienced similar performance issues?

Don't get me wrong, for me PCI-express is a major step ahead: the write burst and read burst performance is way better than standard PCI. Perhaps this is the reason: since most PCI-express cards are mostly used in burst transactions, the read latency does not really matter, so the designers sacrificed some read latency in order to obtain better burst performance.

Best regards

Robert Wessel 14 years ago

One-lane PCIe 1.x should be able to turn a word read around in about 250 ns, assuming not too much else is going on. Of course, an excessive number of switches (or slow switches), or slow hardware on either end, are possible issues. But PCIe is certainly capable of reading a word much faster than 3-4 us.
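
As a rough sanity check on that figure, the time the two packets spend on the wire can be estimated from the link rate alone. The sketch below assumes 3-DWORD TLP headers, no ECRC, and zero processing time in the root complex, any switches, and the endpoint; none of those assumptions come from the thread, so treat the result as a lower bound rather than a prediction.

```python
# Back-of-envelope wire time for a 1-DWORD read on a PCIe 1.x x1 link.
# Assumptions (not from the thread): 3-DWORD headers, no ECRC, and no
# switch, root-complex, or endpoint processing time.

LINE_RATE_GTPS = 2.5          # PCIe 1.x raw rate per lane, in GT/s
ENCODING_BITS_PER_BYTE = 10   # 8b/10b: 10 line bits carry 1 data byte
NS_PER_BYTE = ENCODING_BITS_PER_BYTE / LINE_RATE_GTPS  # 4 ns per byte on x1

FRAMING_BYTES = 1 + 2 + 4 + 1  # STP + sequence number + LCRC + END
HEADER_BYTES = 12              # 3-DWORD TLP header


def tlp_wire_ns(payload_bytes: int) -> float:
    """Time to serialize one TLP onto a single lane, in nanoseconds."""
    return (FRAMING_BYTES + HEADER_BYTES + payload_bytes) * NS_PER_BYTE


read_request_ns = tlp_wire_ns(0)   # a memory read request carries no payload
completion_ns = tlp_wire_ns(4)     # the completion returns one DWORD
round_trip_wire_ns = read_request_ns + completion_ns

print(f"request {read_request_ns:.0f} ns, completion {completion_ns:.0f} ns, "
      f"total {round_trip_wire_ns:.0f} ns")
```

Under these assumptions serialization accounts for about 176 ns, so a round trip in the region of 250 ns leaves little room for processing, while 3-4 us suggests most of the time is being spent somewhere other than on the link itself.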

rupertlssmith 14 years ago

I have no actual experience of experimenting with this. However, I have been interested in a latency-sensitive device that may potentially use PCI-E, so I have been looking around for answers.

Have a look at this write-up comparing HyperTransport and PCI-E. The authors claim around 250 nanoseconds (page 9) to read the first byte:

[link not preserved]

It would be interesting to hear what is causing you to see 3-4 us. That would kill off my potential project, so I am hoping to be able to match the results in the above paper.

Could there be some inaccuracy in your measurements? How do you measure the latency?
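
One way to make a software-side measurement less ambiguous is to time many back-to-back reads and look at the distribution rather than a single number. The sketch below is only an illustration: `fake_register_read` is a hypothetical stand-in for whatever call performs the real 1-DWORD read, and in Python the interpreter overhead would swamp a sub-microsecond read, so on real hardware the same loop would more likely live in a driver or a C test program.

```python
import statistics
import time


def measure_read_latency_ns(read_once, repeats=10_000):
    """Time `read_once()` back to back and summarize the samples in ns."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        read_once()
        samples.append(time.perf_counter_ns() - start)
    samples.sort()
    return {
        "min": samples[0],
        "median": statistics.median(samples),
        "p99": samples[int(0.99 * (len(samples) - 1))],
        "max": samples[-1],
    }


# Hypothetical stand-in for a real register read; replace it with whatever
# call actually performs the 1-DWORD read on the system under test.
def fake_register_read():
    return 0


stats = measure_read_latency_ns(fake_register_read, repeats=1000)
print(stats)
```

The minimum is usually the figure closest to the hardware round trip, while the gap between the median and the tail gives a feel for how much jitter the software side adds.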

Rupert

Kolja Sulimma 14 years ago

When designing with PCI or PCIe you should really try to avoid reads as much as possible. What do you need them for anyway? In a multitasking operating system you are going to have microseconds of jitter on the software side in kernel mode and tens of milliseconds in user mode anyway. So I am wondering what scenario benefits from sub-microsecond latency for software reads.

Kolja cronologic.de

John Adair 14 years ago

Generally speaking, PCI Express is much more prone to latency than conventional PCI because packets have to be constructed, passed through a structure of nodes, and checked at most levels. Data checking, and therefore onward transmission, isn't completed until the last data arrives and the CRCs are checked.

If you do a "read", there will be one packet going out and one coming back, so it is doubly worse. If you can, do a DMA-like operation where data is sent from the data source, and then interrupt your system to use the data in memory.

The latency will also vary from system to system because routing structures differ between motherboards. The amount of other things going on will also affect latency, as different things contend for the data pipes. Generally speaking, anything real-time is something of a nightmare if you are planning to use the host motherboard processor for control functions.

You can try to make the latency smaller by using smaller packet sizes, and this sometimes helps. Ultimately, if there is a real-time element to this, then putting the processing and/or control on your card is probably best for performance and accuracy.
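
To put rough numbers on the read-versus-DMA point, the sketch below compares host-initiated single-DWORD reads with the card pushing the same data to host memory as posted writes. It uses the 3-4 us read latency reported at the top of the thread and assumes a PCIe 1.x x1 link, 3-DWORD headers, and a 128-byte maximum payload; the payload size is system dependent, and the write figure counts wire time only, ignoring flow control, acknowledgements, and interrupt latency.

```python
import math

NS_PER_BYTE = 4.0            # PCIe 1.x x1: 2.5 GT/s with 8b/10b encoding
TLP_OVERHEAD_BYTES = 20      # framing + sequence + 3-DWORD header + LCRC
MAX_PAYLOAD_BYTES = 128      # assumed Max_Payload_Size; system dependent
READ_LATENCY_NS = 3500       # midpoint of the 3-4 us reported in the thread


def programmed_read_ns(dwords: int) -> float:
    """Host reads each DWORD individually and waits for every completion."""
    return dwords * READ_LATENCY_NS


def device_write_ns(dwords: int) -> float:
    """Device pushes the same data to host memory as posted write TLPs."""
    payload = dwords * 4
    tlps = math.ceil(payload / MAX_PAYLOAD_BYTES)
    return (payload + tlps * TLP_OVERHEAD_BYTES) * NS_PER_BYTE


for n in (1, 16, 256):
    print(f"{n:4d} DWORDs: reads {programmed_read_ns(n) / 1000:8.1f} us, "
          f"device writes {device_write_ns(n) / 1000:6.3f} us")
```

Even with generous allowances for the overheads left out here, the gap grows with the block size, which is why moving data with device-initiated writes tends to be preferred over polling registers.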

John Adair Home of Raggedstone2. The Spartan-6 PCIe Development Board.

rupertlssmith 14 years ago

In the paper I posted a link to, I think the times are for an interrupt, or for DMA, not a software-initiated "read". Thanks for explaining the difference.

Rupert

Morten Leikvoll 14 years ago

Is it possible that time-stamping the data would decouple you somewhat from the latency problem? Usually data can't be processed and presented in real time at those speeds anyway.
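
A minimal sketch of the time-stamping idea, assuming the card latches a free-running counter alongside each sample: the 100 MHz counter frequency, the `Sample` layout, and the example values are all hypothetical, and counter wraparound is ignored for brevity.

```python
from dataclasses import dataclass

SAMPLE_CLOCK_HZ = 100_000_000  # assumed 100 MHz free-running counter on the card


@dataclass
class Sample:
    ticks: int   # counter value latched by the card at acquisition time
    value: int   # acquired data word


def acquisition_time_us(sample: Sample, first_ticks: int) -> float:
    """Acquisition time relative to the first sample, from card ticks alone."""
    return (sample.ticks - first_ticks) * 1e6 / SAMPLE_CLOCK_HZ


# Samples as they might arrive in host memory; when the host gets around to
# reading them does not matter, because the timing comes from `ticks`.
received = [Sample(1_000, 17), Sample(1_100, 19), Sample(1_250, 23)]
times_us = [acquisition_time_us(s, received[0].ticks) for s in received]
print(times_us)
```

Because the timing is carried with the data, the host can reconstruct when each sample was taken no matter how long the transfer or the software took to deliver it.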
