If anyone can offer help with this (or a pointer to a more suitable forum) it would be greatly appreciated. I am posting here because I know there are some folks with expertise who frequent this group (and we're using a Xilinx core :)).
We are attempting to perform 64B burst PCI-X DMA write transfers from our add-in card into host memory on a dual Xeon system.
Our Linux device driver (kernel 2.4.x/2.6.x) is notified via an interrupt and a single-dword "doorbell" write that the larger 64B entry is available for processing.
The order of operation on the PCI-X bus is as follows:
64B data write --> 4B doorbell write --> interrupt.
Upon receiving the interrupt, the device driver polls the location in memory where the 4B doorbell write is expected to show up. Once he sees the doorbell location written, he reads the 64B data location. PCI ordering should guarantee that the 64B data location is written to system memory before the 4B doorbell write is.
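To make the driver-side sequence concrete, here is a minimal user-space sketch of it, not our actual driver code. The struct layout, the sequence-number doorbell, and the names `dma_slot`, `card_post` and `driver_consume` are all made up for illustration, and C11 acquire/release atomics stand in for the ordering that the bus is supposed to provide. In a real Linux driver the slot would live in a coherent DMA buffer and a read barrier such as `rmb()` would sit between the doorbell read and the entry read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ENTRY_BYTES 64

/* Hypothetical layout of one slot in host memory (illustrative only). */
struct dma_slot {
    uint8_t entry[ENTRY_BYTES];   /* written first by the card (64B burst) */
    _Atomic uint32_t doorbell;    /* written second by the card (4B)       */
};

/* Stand-in for the card: 64B data write, then 4B doorbell write.
 * The release store models "data is visible before the doorbell". */
static void card_post(struct dma_slot *s, const uint8_t *data, uint32_t seq)
{
    memcpy(s->entry, data, ENTRY_BYTES);
    atomic_store_explicit(&s->doorbell, seq, memory_order_release);
}

/* Driver side, run after the interrupt: poll the doorbell a bounded
 * number of times, and only then copy out the 64B entry.
 * Returns false if the doorbell never showed the expected value. */
static bool driver_consume(struct dma_slot *s, uint32_t expected_seq,
                           uint8_t *out, unsigned max_polls)
{
    assert(s != NULL && out != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        /* The acquire load plays the role of the read barrier. */
        if (atomic_load_explicit(&s->doorbell,
                                 memory_order_acquire) == expected_seq) {
            memcpy(out, s->entry, ENTRY_BYTES);
            return true;
        }
    }
    return false;
}
```

If the ordering assumption holds, `driver_consume` can never return true and still hand back the previous entry. That is exactly the case we are seeing on the real hardware.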
The above writes are performed as Memory Write Block transactions (we have also tried Memory Write transactions), the No Snoop bit is cleared, and the Relaxed Ordering bit is cleared.
We consistently encounter a situation where the device driver correctly receives the interrupt & single dword doorbell write, but the 64B write fails to appear in memory. Instead, the device driver reads a stale 64B block of data (data from the last successful DMA write).
As a debug measure, we had the FPGA on our add-in card perform a readback (Memory Read Block) of the 64B entry immediately after writing it. We observed that the data read back was stale and matched the stale data that the device driver saw. E.g.:
1) Location 0xABCDEF00 is known to contain stale 64B data 0xAAAA....AAAA.
2) FPGA does Memory Write Block 64B 0xBBBB....BBBB at address 0xABCDEF00.
3) FPGA does Memory Read Block 64B at address 0xABCDEF00 (Split Response).
4) Split Completion is returned by bridge with data 0xAAAA....AAAA.
This appears to be a violation of PCI ordering rules. Again, the No Snoop and Relaxed Ordering bits are cleared for all of these transactions.
The device driver *never* writes to the 64B location, so there should be no possibility of a collision occurring where he writes/flushes stale data that overwrites the incoming DMA write.
This tells me that the location is NOT getting written because, according to PCI ordering rules, the FPGA read *must* push the Memory Write Block into system memory before reading back the location.
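For anyone who wants to reproduce the readback check in software, the comparison amounts to something like the sketch below. The function and enum names are invented for this post. It only classifies a 64B readback against the pattern just written and the pattern that was there before, and it says nothing about why the data is stale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes for the readback check (illustrative only). */
enum readback_result {
    RB_NEW,    /* readback matches the data just written           */
    RB_STALE,  /* readback matches the previous contents           */
    RB_OTHER   /* readback matches neither (e.g. a partial update) */
};

/* Compare the readback buffer with the written and previous patterns.
 * If written and previous happen to be identical, RB_NEW is reported. */
static enum readback_result classify_readback(const uint8_t *readback,
                                              const uint8_t *written,
                                              const uint8_t *previous,
                                              size_t len)
{
    assert(readback != NULL && written != NULL && previous != NULL);
    if (memcmp(readback, written, len) == 0)
        return RB_NEW;
    if (memcmp(readback, previous, len) == 0)
        return RB_STALE;
    return RB_OTHER;
}
```

In the failing case above, the Split Completion data would classify as `RB_STALE` every time. We have not seen a mix of old and new bytes.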
We observe this behaviour in dual Xeon systems with both the Intel E7501 chipset and the Broadcom Serverworks GC-LE chipset.
We observe this in SMP and single processor configurations.
When bus traffic is light at 133MHz, or whenever the bus is running at 66MHz, we do *not* observe this problem. We occasionally observe the problem when the bus is running at 100MHz with heavy traffic. This suggests that we are hitting a narrow timing window at higher bus speeds.
We suspect that we might be encountering a cache erratum in the Xeon, and are wondering if anyone can confirm this and possibly provide a workaround?
We've been banging our heads on this for a couple of weeks now.
Mark