Virtex4 shift register layout: Horizontal or vertical?

Summary: Should high-speed shift registers (purely DFFs) flow left->right, top->bottom, or does it matter?

I'm working on a Virtex4 (LX100) design. I wound up with high-speed inputs (clocks and serial data) on the IOBs that run vertically down the center of the die. That's OK for clocks, since the DCMs and BUFGs are also in that center column.

Regarding data: I deserialize the inputs, and pass parallel data to some large datapath blocks that start on the left edge of the die. I constrain all my deserializer logic to use sites near the center column to minimize clock skew. (Sorry, I can't use the on-chip deserializers; they max out at 10 bits, and I need 12.)

Kind of annoying to take my inputs from the center over to the left, and then start flowing left-to-right, but that's just how the layout wound up...

Anyway, older Xilinx docs say you should flow left->right, and numerical operations should have their MSB flowing downward (?) to make best use of carry chains. Is that still the case?

Back to my deserializers: I'm using 20 of them, so that's 40 6-bit shift registers (6 bits on rising edge, 6 bits on falling edge), which I plan to do manual placement on. I don't want to use the SRL16's (yet; long story...), but just DFFs.
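To make the structure concrete, here is a minimal behavioral sketch in Python (not HDL) of one such deserializer: one 6-bit shift register catches the rising-edge bits, the other catches the falling-edge bits, and the two are interleaved into a 12-bit word. The function names, the MSB-first bit order, and the assumption that the first bit of each word arrives on a rising edge are illustrative only and are not taken from the post.

```python
def to_bits(value, width=12):
    """Return the bits of value, MSB first (assumed wire order)."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def ddr_deserialize_12(bits):
    """Model a 12-bit DDR deserializer built from two 6-bit shift registers.

    Even-indexed bits are treated as rising-edge samples and odd-indexed
    bits as falling-edge samples (an assumption made for illustration).
    """
    if len(bits) % 12 != 0:
        raise ValueError("expected a multiple of 12 bits")
    words = []
    for start in range(0, len(bits), 12):
        frame = bits[start:start + 12]
        rise_sr = frame[0::2]  # 6 bits captured on rising edges
        fall_sr = frame[1::2]  # 6 bits captured on falling edges
        word = 0
        # Interleave the two registers back into the original bit order.
        for r, f in zip(rise_sr, fall_sr):
            word = (word << 2) | (r << 1) | f
        words.append(word)
    return words
```

For example, `ddr_deserialize_12(to_bits(0xABC))` gives back `[0xABC]`; each shift register only ever holds every other bit, which is why each one clocks at half the line rate.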

I thought left->right flow was perfect for shift registers, but ISE seems to place all the DFFs for a downward flow. Actually, since my inputs are all down the center column (20 serial data streams) and need to go left, horizontal alignment flowing right-to-left makes for best placement. Not sure about routing though...

Any suggestions? The data is coming in close to 1Gbps, so the shift regs run close to 500 MHz...

Thanks again for all the help; I'll appreciate your input as I work another weekend...


A 6-bit shift register fits nicely into a single CLB, with ALL the routing implemented in the switchbox, i.e. with no local routing involved at all. It is so compact that there is no top-to-bottom or left-to-right issue. So as far as the shift register itself goes, there is no issue.

But at a 500 MHz clock, careful design is certainly required.

So for your deserializer:

1) Use IDELAY so you can fine-adjust the routing delay/difference from the IOB to the shift DFF 'cluster'.
1a) Use the DDR input flops in the IOB for the first stage.
2) Place the parallel holding register 'around' the serializer cluster; this could be possible with short local routing.
2a) Implement a 2-bit serial-to-parallel cluster (that uses only switchbox routing) and place the 2-bit clusters so that they reach each other by local routing.

You only need some RLOCs in your macros and the whole thing is done. Sounds like a fun thing to do. I was amazed myself when I measured actual real clock speeds of 970 MHz in the slowest speed grade V4 :)


Antti Lukats

How about 6-bit with additional user logic at 1/6 the input frequency to get to 12 bits?

Kolja Sulimma

Kolja Sulimma wrote:

You can still use the deserializers. Use them as 6 bits and unload them at 2x the rate. Since your data is ~1Gbps, that gets you to an unload frequency at about 170 MHz, which is quite easy to deal with. The SRL16's wouldn't help you much for a deserializer.
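The arithmetic behind that figure, and the step of gluing two 6-bit unloads into one 12-bit word, can be sketched in Python as follows. This is a behavioral illustration only; the function names and the choice that the first 6-bit word forms the upper half are assumptions, not something stated in the thread.

```python
def unload_rate_mhz(line_rate_mbps, word_bits):
    """Rate at which parallel words of word_bits leave the deserializer."""
    return line_rate_mbps / word_bits


def pair_6bit_words(words6):
    """Combine consecutive 6-bit words into 12-bit words.

    The first word of each pair is taken as the upper six bits
    (an assumed ordering, for illustration).
    """
    if len(words6) % 2 != 0:
        raise ValueError("need an even number of 6-bit words")
    return [((hi & 0x3F) << 6) | (lo & 0x3F)
            for hi, lo in zip(words6[0::2], words6[1::2])]


if __name__ == "__main__":
    print(round(unload_rate_mhz(1000, 6), 1))   # 6-bit unloads at ~1 Gbps
    print(round(unload_rate_mhz(1000, 12), 1))  # resulting 12-bit word rate
    print(hex(pair_6bit_words([0b101010, 0b111100])[0]))
```

At roughly 1 Gbps the 6-bit unloads come out near 167 MHz, and the paired 12-bit words at half that, so the pairing logic runs well below the serial clock.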

As far as the layout goes, unless you are using the carry chains, there is no reason to arrange the registers for horizontal flow. In fact, with the V4SX family, you'll find that you'll have more of a vertical flow to use the DSP48s anyway. Direct register-to-register connections, even without an optimal place and route, are pretty fast in V4, and I think you'll find that 500 MHz is fairly easy to achieve if there is no logic between the flip-flops, even in a -10 part.

For high clock rates, you'll want to avoid the carry chains. The router doesn't do a very good job of getting on and off the chains, even when the driving flip-flop is in an adjacent slice, so even if you get lucky and make timing with one PAR run, there is a good chance you won't make timing after making design changes. As long as you stay off the carry chains and keep your logic to one level between flip-flops, the fabric will run faster than the maximum clock spec of the BRAM and DSP48 slices. That max clock is 400 MHz for a -10 part, 500 MHz in a -12. If you avoid logic in front of the FFs you can get well past 500 MHz in a -10 part.

Be careful that the synthesis doesn't use the sync reset input, though; that input is dreadfully slow: IIRC more than 2 ns tsu. Synplicity has a nasty habit of using the sync reset for logic (which would be a good thing if it could be overridden with a switch), which kills the performance of a high clock rate design.
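To see why the sync reset input matters so much at these rates, a quick back-of-the-envelope check in Python helps. The 2.0 ns setup figure below is only the "IIRC" number quoted above, not a datasheet value, and the helper names are made up for illustration.

```python
def period_ns(f_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / f_mhz


def max_clock_mhz(path_delay_ns):
    """Highest clock a path with the given total delay could support."""
    return 1000.0 / path_delay_ns


if __name__ == "__main__":
    target_mhz = 500.0
    sync_reset_tsu_ns = 2.0  # assumed: the from-memory figure in the post
    print("period at target:", period_ns(target_mhz), "ns")
    # Setup time alone, ignoring clock-to-out and routing delay:
    print("ceiling from tsu alone:", max_clock_mhz(sync_reset_tsu_ns), "MHz")
```

With a 2 ns period at 500 MHz, a setup time of 2 ns or more uses up the whole budget before any clock-to-out or routing delay is counted, so any path that lands on the sync reset pin cannot meet timing at that clock.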

--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
