Params:
Xilinx's PCIX core for PCI64/PCIX at 66MHz
2v4000-4 running the controller core with 40 FIFOs (10 targets, 2 channels, r/w) and a busmaster wrapper
Tyan 2721 MB w/ Xeon 2.6GHz w/ 4GB RAM
Win2k Server SP4
No scatter/gather support in driver
Exact same software and hardware for both reads and writes
Bus commands 1110 and 1111
Results:
Max host write speed: 70MB/s
Max host read speed: 230MB/s
Development time: six months w/ two engineers for both driver and core wrapper
The timer does not include the memory allocations. Any ideas why the write speed is so much slower? Would it be the latency parameters in the core? An OS issue?
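For context, the measured rates can be compared against the raw limit of a 64-bit bus at 66 MHz. The sketch below is only a back-of-the-envelope check: it assumes one 8-byte data phase per clock with no arbitration, address, or turnaround overhead, and it uses the nominal 66 MHz figure from the post. The function names are made up for illustration.

```python
# Rough comparison of the measured rates with the theoretical bus limit.
# Assumption: one 64-bit data phase every clock, no protocol overhead.
BUS_WIDTH_BYTES = 8   # 64-bit data path
BUS_CLOCK_MHZ = 66    # nominal clock quoted in the post

def peak_mb_per_s(width_bytes=BUS_WIDTH_BYTES, clock_mhz=BUS_CLOCK_MHZ):
    """Theoretical peak in MB/s (bytes per clock times clocks per microsecond)."""
    return width_bytes * clock_mhz

def bus_utilization(measured_mb_per_s):
    """Fraction of the theoretical peak that a measured rate represents."""
    return measured_mb_per_s / peak_mb_per_s()

if __name__ == "__main__":
    for label, rate in (("host write", 70), ("host read", 230)):
        print(f"{label}: {rate} MB/s is {bus_utilization(rate):.0%} "
              f"of the {peak_mb_per_s()} MB/s peak")
```

Under these assumptions the host write figure is roughly 13% of the peak and the host read figure roughly 44%, so both directions have headroom, but the write direction is far further from the limit.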
When you say "write speed" do you refer to your device becoming bus master and doing memory writes to the system RAM behind the host bridge? Likewise, by the term "read speed" do you refer to your device becoming bus master and doing memory reads of the system RAM behind the host bridge?
I just want to make sure I didn't misinterpret your question before I try to answer it. Or did I get it backwards?
Is the bus operating in PCI or PCIX mode? If it's in PCI mode then you are seeing the disadvantage of not being able to post read requests. Your device is getting told to retry while the chipset fetches the read data.
If it's in PCIX mode then you should make sure that your DMA engine is issuing as many posted read requests as possible of as large a size as possible.
Mark
Brann> To clarify one issue, host write refers to DMA busmaster read (the busmaster
I think Mark described it well in his post. If this is PCI mode, it isn't entirely surprising. If this is in PCI-X mode, and you are using split transactions (supporting multiple outstanding transactions is best), then you may need to do some hunting.
The best tool for this is a bus analyzer, if you have one (or maybe you can borrow one from a vendor to "evaluate" it?). There could be all manner of secondary issues that cause problems:
bus traffic from other agents
you are behind a bridge
your byte counts are small
Sorry I don't have a more specific answer for you.
Eric
For those speed tests the device was in PCI mode. I was assuming it would be the same speed as PCIX (at the same bus speed) because the timing diagrams all looked compatible between the two. Please explain what you mean by "post read requests". Is there some workaround for this to make the PCI mode handle this better?
Actually I shouldn't have called them "posted reads". Posting a transaction means that the initiator never gets an explicit acknowledgement that the transaction reached its destination (like posting a letter in the mail). PCI writes are posted. A PCI read by definition is non-posted because the initiator must receive an acknowledgement (the read data).
What I should have said was that the PCI-X protocol allows the initiator to pipeline reads. If you have a copy, the PCI-X spec explains it pretty well. Here's the short version:
In PCI-X, the target of a transaction can terminate the transaction with a split response, which tells the initiator that the target will get back to him later with a completion transaction (data if it's a read). The request is tagged with a 5-bit number that will come back with the completion so that the initiator can match completions to outstanding requests. The initiator is allowed to have up to 32 split requests outstanding in the pipeline at any one time. Each read request can be for up to 4kB of data. The throughput of a system that takes full advantage of split transactions is highest when the amount of data being transferred is large and the latency is small enough that 32 tags can keep the pipeline full.
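A rough way to see why the number of outstanding requests matters is to bound the read rate by the data in flight divided by the round-trip latency, capped at the bus peak. This is only a simplified model, not something taken from the PCI-X spec; the 2 us latency and the 528 MB/s peak used in the example are assumed values, and the function name is hypothetical.

```python
def split_read_throughput(outstanding, request_bytes, latency_us,
                          peak_mb_per_s=528.0):
    """Upper bound on read throughput in MB/s for pipelined split reads.

    outstanding   -- number of split requests kept in flight (1 to 32)
    request_bytes -- size of each read request (up to 4096)
    latency_us    -- assumed request-to-completion latency in microseconds
    Bytes per microsecond equals MB/s when 1 MB is taken as 10**6 bytes.
    """
    if not 1 <= outstanding <= 32:
        raise ValueError("a 5-bit tag allows 1 to 32 outstanding requests")
    if not 1 <= request_bytes <= 4096:
        raise ValueError("a request can cover at most 4096 bytes")
    in_flight_bytes = outstanding * request_bytes
    latency_bound = in_flight_bytes / latency_us
    return min(peak_mb_per_s, latency_bound)

if __name__ == "__main__":
    # Hypothetical 2 us completion latency.
    print(split_read_throughput(1, 512, 2.0))    # one small request at a time
    print(split_read_throughput(32, 4096, 2.0))  # all tags, maximum size
```

With the assumed numbers, a single 512-byte request in flight is limited by latency, while 32 requests of 4kB each are limited only by the bus itself.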
In PCI, the target of a read transaction must either respond with data immediately, or repeatedly terminate the read attempts with retry while he goes off and fetches the data. Once he's fetched it, he will be able to respond immediately to the initiator on the initiator's next attempt. This is very inefficient because there is only one transaction in the pipeline at a time. If the latency is large (the initiator has to retry many times), the throughput is much lower than when pipelined reads are used.
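The cost of having only one read in flight can be sketched the same way: each burst pays the full fetch latency before any data moves, so the effective rate is the burst size divided by latency plus transfer time. This is a simplified model that ignores arbitration and the bus cycles spent on the retries themselves; the burst sizes and the 1 us latency are assumed values, not measurements from this thread.

```python
def delayed_read_throughput(burst_bytes, fetch_latency_us, peak_mb_per_s=528.0):
    """Estimated MB/s when only one read transaction is in flight at a time.

    burst_bytes      -- data returned per completed read transaction
    fetch_latency_us -- assumed time spent being retried while the target
                        fetches the data, in microseconds
    """
    transfer_us = burst_bytes / peak_mb_per_s  # time to move the burst itself
    return burst_bytes / (fetch_latency_us + transfer_us)

if __name__ == "__main__":
    # Hypothetical 1 us fetch latency, two different burst lengths.
    for burst in (256, 1024):
        rate = delayed_read_throughput(burst, 1.0)
        print(f"{burst}-byte bursts: about {rate:.0f} MB/s")
```

In this model a longer burst amortizes the same latency over more data, which is why burst length and prefetch settings in the chipset can make a visible difference in PCI mode.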
If PCI-X mode is available, use it. Or, there may be chipset settings that you can use to improve PCI mode performance. The chipset may be able to do pre-fetching of data in anticipation of you reading it. There may also be burst length settings that allow you to increase the amount of data transferred in a single transaction. You need to read the specs for the chipset you are using and figure out what can be tweaked.
Mark