PCI-X DMA problem w/ Xeon?

If anyone can offer help with this (or a pointer to a more suitable forum) it would be greatly appreciated. I am posting here because I know there are some folks with expertise who frequent this group (and we're using a Xilinx core :)).

We are attempting to perform 64B burst PCI-X DMA write transfers from our add-in card into host memory on a dual Xeon system.

Our Linux device driver (kernel 2.4.x/2.6.x) is notified via an interrupt and a single dword "doorbell" write that the larger 64B entry is available for processing.

The order of operation on the PCI-X bus is as follows:

64B data write --> 4B doorbell write --> interrupt.

Upon receiving the interrupt, the device driver polls the location in memory where the 4B doorbell write is expected to show up. Once he sees the doorbell location written, he reads the 64B data location. PCI ordering should guarantee that the 64B data location is written to system memory before the 4B doorbell write is.
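The consumer side of this doorbell scheme can be sketched in plain user-space C as below. This is only an illustration of the intended ordering, not the actual driver: the `dma_slot` layout, the `consume_entry` name, and the `max_polls` bound are all hypothetical, and C11 atomics stand in for what a kernel driver would do with a read barrier such as `rmb()` between the doorbell read and the payload read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define ENTRY_BYTES 64

/* Hypothetical layout of the host buffer that the card writes into. */
struct dma_slot {
    uint8_t entry[ENTRY_BYTES];  /* 64B payload, written first by the device */
    _Atomic uint32_t doorbell;   /* 4B doorbell, written second */
};

/*
 * Poll the doorbell at most max_polls times. If its value differs from
 * last_seen, copy the 64B entry into out and return 1; otherwise return 0.
 */
int consume_entry(struct dma_slot *slot, uint32_t last_seen,
                  uint8_t out[ENTRY_BYTES], unsigned max_polls)
{
    assert(slot != NULL && out != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        /* Acquire load: the payload read below cannot be moved above it. */
        uint32_t db = atomic_load_explicit(&slot->doorbell,
                                           memory_order_acquire);
        if (db != last_seen) {
            memcpy(out, slot->entry, ENTRY_BYTES);
            return 1;
        }
    }
    return 0;
}
```

The barrier only rules out CPU-side or compiler-side reordering of the two reads. It cannot help if the 64B write never reaches memory at all.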

The above writes are performed as Memory Write Block transactions (we have also tried Memory Write transactions), the No Snoop bit is cleared, and the Relaxed Ordering bit is cleared.

We consistently encounter a situation where the device driver correctly receives the interrupt & single dword doorbell write, but the 64B write fails to appear in memory. Instead, the device driver reads a stale 64B block of data (data from the last successful DMA write).

As a debug measure, we had the FPGA on our add-in card perform a readback (Memory Read Block) of the 64B entry immediately after writing it. We observed that the data read back was stale and matched the stale data that the device driver saw. E.g.:

1) Location 0xABCDEF00 is known to contain stale 64B data 0xAAAA....AAAA.
2) FPGA does Memory Write Block 64B 0xBBBB....BBBB at address 0xABCDEF00.
3) FPGA does Memory Read Block 64B at address 0xABCDEF00 (Split Response).
4) Split Completion is returned by bridge with data 0xAAAA....AAAA.

This appears to be a violation of PCI ordering rules. Again, the No Snoop and Relaxed Order bits are cleared for all of these transactions.

The device driver *never* writes to the 64B location, so there should be no possibility of a collision occurring where he writes/flushes stale data that overwrites the incoming DMA write.

This tells me that the location is NOT getting written because, according to PCI ordering rules, the FPGA read *must* push the Memory Write Block into system memory before reading back the location.

We observe this behaviour in dual Xeon systems with both the Intel E7501 chipset and the Broadcom Serverworks GC-LE chipset.

We observe this in SMP and single processor configurations.

When bus traffic is light at 133MHz, or whenever the bus is running at 66MHz, we do *not* observe this problem. We occasionally observe the problem when the bus is running at 100MHz with heavy traffic. This suggests that we are hitting a narrow timing window at higher bus speeds.

We suspect that we might be encountering a cache erratum in the Xeon, and are wondering if anyone can confirm this and possibly provide a workaround?

We've been banging our heads on this for a couple of weeks now.

Mark

Mark Schellhorn

I observed on the scope that the non-DMA from-device writes were taking anywhere between 30 and 40 bus clock cycles for the lower dword to transfer. I would recommend doing something along these lines:

host writes to DMA trigger register ==>

device causes DMA transaction ==>

on done device causes interrupt ==>

driver reads doorbell register to check transaction completion size, etc.
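One reason the last step can help is that a register read from the card returns along the same path as the card's posted writes and is not allowed to pass them. The toy model below sketches that rule in user-space C. It assumes the doorbell register lives on the card, and every name in it (`toy_bridge`, `card_post_write`, `host_read_doorbell_reg`) is made up for illustration; it does not model any real bridge.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_DEPTH 8
#define HOST_WORDS 16

/* One DMA write from the card that has not yet reached host memory. */
struct posted_write {
    unsigned index;
    uint32_t value;
};

/* Toy bridge: posted writes wait in a FIFO until something drains them. */
struct toy_bridge {
    struct posted_write fifo[FIFO_DEPTH];
    unsigned count;
    uint32_t host_mem[HOST_WORDS];
    uint32_t doorbell_reg;  /* register on the card (assumption) */
};

/* Card side: queue a write toward host memory. Returns 0 if the FIFO is full. */
int card_post_write(struct toy_bridge *b, unsigned index, uint32_t value)
{
    assert(index < HOST_WORDS);
    if (b->count == FIFO_DEPTH)
        return 0;
    b->fifo[b->count].index = index;
    b->fifo[b->count].value = value;
    b->count++;
    return 1;
}

/*
 * Host side: read the register on the card. The model drains the FIFO in
 * order first, because the read completion may not pass earlier posted writes.
 */
uint32_t host_read_doorbell_reg(struct toy_bridge *b)
{
    for (unsigned i = 0; i < b->count; i++)
        b->host_mem[b->fifo[i].index] = b->fifo[i].value;
    b->count = 0;
    return b->doorbell_reg;
}
```

In this model, host memory still holds the old value after `card_post_write` and only shows the new one once `host_read_doorbell_reg` has returned.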

Brannon King

I have a similar application. We perform PCI-X DMA with 4Kbyte bursts from our add-in card into the host's memory (Memory Write = command x"07"). This works well in a dual Xeon system with the Intel E7501 chipset @ 133 MHz; we get a data rate of about 420 MB/s. It took us a while to get the handshake between driver and hardware working properly. We do it as follows:

--> add-in card performs an interrupt and sets an interrupt bit

--> the device driver writes a valid address for the DMA into a BAR register

--> the device driver clears the interrupt bit!

--> the user application in the FPGA starts the DMA and attempts to write the desired burst length to this address. The user application keeps track of the address. In case of a DMA abort it makes a new transaction request till the 4Kbyte are finished.

--> add-in card performs an interrupt and so on ....
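The address-tracking and retry step above can be sketched as a small loop. This is a user-space illustration in C, not the FPGA logic itself: `burst_fn`, `dma_write_resumable`, `max_attempts`, and the `toy_burst` target (which accepts at most 512 bytes per transaction and aborts every third one) are all hypothetical names and numbers chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bus hook: tries to write len bytes at addr and returns how
 * many bytes the target accepted before the transaction ended (0..len). */
typedef size_t (*burst_fn)(uint64_t addr, const uint8_t *src, size_t len,
                           void *ctx);

/* Re-issue transactions from the last accepted byte until total bytes are
 * written. Returns the number of transactions used, or -1 if max_attempts
 * transactions were not enough. */
int dma_write_resumable(burst_fn burst, void *ctx, uint64_t addr,
                        const uint8_t *src, size_t total, int max_attempts)
{
    size_t done = 0;
    int attempts = 0;
    assert(burst != NULL && src != NULL);
    while (done < total) {
        if (attempts == max_attempts)
            return -1;
        attempts++;
        size_t accepted = burst(addr + done, src + done, total - done, ctx);
        assert(accepted <= total - done);
        done += accepted;  /* an abort with no data leaves done unchanged */
    }
    return attempts;
}

/* Toy target for experiments only. */
struct toy_target {
    uint8_t mem[4096];
    unsigned calls;
};

size_t toy_burst(uint64_t addr, const uint8_t *src, size_t len, void *ctx)
{
    struct toy_target *t = ctx;
    t->calls++;
    if (t->calls % 3 == 0)
        return 0;  /* aborted before any data was transferred */
    size_t n = len < 512 ? len : 512;
    memcpy(&t->mem[addr], src, n);
    return n;
}
```

The point of the sketch is that an abort before the first data phase is handled the same way as a partial transfer: the next request simply starts again at `addr + done`.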

Within a 4Kbyte burst we see about 10 aborts, depending on bus traffic and system. I have also measured aborts right at the start of the DMA, before any data had been written to the host's memory! If you are not handling aborts, this could also be your problem!

Matthias

--
Matthias Müller
Fraunhofer Institut Integrierte Schaltungen IIS -Bildsensorik-
Am Wolfsmantel 33, D-91058 Erlangen
Tel: +49 (0)9131-776-554, Fax: +49 (0)9131-776-598
