Virtex-II Switch Matrix Performance

- Gunter Knittel
  

Fri, May 13, 2005 8:17 PM

Hi,

I'm trying hard to speed-optimize an arithmetic function on a Virtex-II 4000 -4 device, using ISE 7.1. I have minimized the logic down to a few layers of 4-input LUTs. However, the routed design spends much more time in the wires than in the LUTs. In particular, communication from one slice to the next in the same CLB through the Switch Matrix can be slow, sometimes on the order of 1 ns.

So my question is: is there no fast private communication between slices in the same CLB (other than shift and carry)? Is there any documentation available about the performance of the Switch Matrix, and about how I should arrange the logic so that the fastest interconnects are used?

Thanks a lot,
Gunter

- John Adair
  

Sat, May 14, 2005 12:47 PM

Particular routes from a LUT output to particular LUT inputs are faster than others. From memory, you can get down to about 10 ps. We did exactly this on our CRC32 core, which has multiple paths with 3 levels of LUT and can run at about 380 MHz in Virtex-II. To do this, though, you have to spend a lot of time in FPGA Editor, going back and forth to get the timing. You get different timings from a LUT output to the different inputs of the same destination LUT, so controlling which output links to which input reduces flight times.
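To put the 380 MHz figure in perspective, here is a rough timing-budget sketch in Python. It only does the arithmetic: the LUT delay and the clock-to-out plus setup overhead are hypothetical placeholder values, not datasheet numbers, and the function names are made up for illustration. It also assumes one routed net per LUT level, with the last LUT feeding a flip-flop in the same slice.

```python
def period_ns(f_mhz: float) -> float:
    """Clock period in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / f_mhz


def routing_budget_per_net_ns(f_mhz: float, lut_levels: int,
                              t_lut_ns: float, t_overhead_ns: float) -> float:
    """Average routing delay available per net on a register-to-register path.

    Assumption: one routed net per LUT level (register -> LUT1, LUT1 -> LUT2,
    ...), and the final LUT drives a flip-flop in its own slice without using
    general routing.
    """
    logic_ns = lut_levels * t_lut_ns
    return (period_ns(f_mhz) - logic_ns - t_overhead_ns) / lut_levels


# Placeholder delays, NOT taken from the Virtex-II data sheet.
T_LUT_NS = 0.4        # hypothetical LUT propagation delay
T_OVERHEAD_NS = 0.6   # hypothetical clock-to-out plus setup time

period = period_ns(380.0)  # about 2.63 ns
budget = routing_budget_per_net_ns(380.0, 3, T_LUT_NS, T_OVERHEAD_NS)  # about 0.28 ns

print(f"period: {period:.2f} ns, routing budget per net: {budget:.2f} ns")
```

With these placeholder numbers, each net gets well under 0.3 ns on average, which is why a single slice-to-slice hop of around 1 ns rules out this kind of clock rate.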

Starting from the simple end, you can floorplan your LUTs roughly, based on spatial distance, before going to the extremes I describe above. You can also use the multiple place and route feature of the tools, and that might get you close enough. Another thing to try is setting your synthesiser to optimise for area. It seems contradictory, but it sometimes gives a faster result than speed optimisation. Try all of these first, then try the more difficult approach.

John Adair Enterpoint Ltd. - Home of Spartan-3 PCI Development Boards.


- Gunter Knittel
  

Wed, May 18, 2005 9:40 PM

John,

thanks for your answer. What you describe is pretty much what I did: I used FMAPs, LOCs, RLOCs and PROHIBITs to force PAR to do what I thought was best, but without comprehensive documentation about the switch matrix I never got beyond a certain performance. Needless to say, I spent far too much time on that. People at Xilinx have been very helpful, but the ultimate solution has yet to be found. What I experienced were flight-time ratios on the order of several hundred, on connections which looked very similar in FPGA Editor. I'm still hoping to get a good solution for this problem.

Cheers,
Gunter
