Hi,
I'm trying hard to speed-optimize an arithmetic function on a Virtex-II 4000 device (-4 speed grade), using ISE 7.1. I have minimized the logic down to a few layers of 4-input LUTs. However, the routed design spends much more time in the routing than in the LUTs. In particular, a connection from one slice to the next in the same CLB through the Switch Matrix can be slow, sometimes on the order of 1 ns.
So my question is: is there no fast, dedicated connection between slices in the same CLB (other than the shift and carry chains)? Is there any documentation available on the timing of the Switch Matrix, and on how I should arrange the logic so that the fastest interconnect is used?
Thanks a lot,
Gunter