Xilinx hi-speed interconnect/routing question

I'm working on a V2Pro design that needs to have a small portion operate at over 400 MHz. As I've looked into the timing, I've noticed that similar routing between slices seems to have different timing delays. For example:

Location            Delay type        Delay(ns)
-------------------------------------------------
SLICE_X34Y1.YQ      Tcko              0.374
SLICE_X34Y3.BY      net (fanout=1)    0.614
SLICE_X34Y3.CLK     Tdick             0.202
-------------------------------------------------
Total                                 1.190ns

**************************************************

Location            Delay type        Delay(ns)
-------------------------------------------------
SLICE_X66Y42.YQ     Tcko              0.374
SLICE_X66Y44.BY     net (fanout=1)    0.407
SLICE_X66Y44.CLK    Tdick             0.202
-------------------------------------------------
Total                                 0.983ns

Note that both circuits route from a YQ output, jump two slices, then go to a BY input. Yet the net delays differ by about 200 ps.
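For readers who want to check the arithmetic, here is a minimal sketch that sums the three components of each path and compares the net delays. The figures are copied from the two timing reports in this post, converted to integer picoseconds; the names `PATH_A`, `PATH_B`, `total_ps` and `net_difference_ps` are made up for illustration.

```python
# Delays in integer picoseconds, copied from the two timing reports.
PATH_A = {"Tcko": 374, "net": 614, "Tdick": 202}  # SLICE_X34Y1.YQ -> SLICE_X34Y3.BY
PATH_B = {"Tcko": 374, "net": 407, "Tdick": 202}  # SLICE_X66Y42.YQ -> SLICE_X66Y44.BY


def total_ps(path):
    """Sum the clock-to-out, net, and setup components of one path."""
    return sum(path.values())


def net_difference_ps(a, b):
    """Difference in routing (net) delay between two paths."""
    return a["net"] - b["net"]


if __name__ == "__main__":
    print(total_ps(PATH_A), total_ps(PATH_B))  # 1190 983
    print(net_difference_ps(PATH_A, PATH_B))   # 207
```

Tcko and Tdick are identical in both reports, so the whole 207 ps gap comes from the net (routing) component.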

Ideally, I'd pack the 2 flip-flops in one slice, but in my design they are clocked by opposite clock edges as I convert a DDR signal from the negedge into the posedge domain.

Can anyone explain the difference in interconnect delay? Does Xilinx publish anything that really explains how to get the shortest routing delay?

Thanks!

John Providenza

johnp

Johnp,

The simple answer is, no, we don't publish the information you are asking for, as we have practically no reason to support 'hand crafted' designs (results in too many unhappy people -- been there, done that).

Does the path in question span a BRAM column? That would be one reason for the difference.

Generally, differences are real, and we know they are there, and they are usually there for a very good reason (that is the way it was in layout).

The "accepted" way of doing this is to create a macro or block with its own constraints, hard fixed or relatively fixed, and let the tools place it properly... but I admit it is tough to fight the tools that way to squeeze picoseconds out of a design. Resorting to FPGA_Editor and just placing it exactly where it belongs and works is easier. It is just hell to support and maintain.

There are many on this forum who know how to squeeze and navigate, and do what you need done, but I suspect they get paid for that knowledge...

Austin

Austin Lesea

Austin -

Thanks for the comments. I'm just frustrated because if I run multi-pass P&R, the tools can find a solution that meets timing, but I don't see any hints as to how to use RLOCs to force the critical cells into magical alignments that produce the smaller interconnect delay.

John Providenza

johnp

Hi John,

If you don't trust the timing constraints, you can use FPGA Editor's 'Directed Routing' feature to tie down your net. Either pick a routed configuration that meets your spec, or hand route the net in FPGA Editor. Select the net, then go to Tools -> Directed Routing Constraints. You can use the dialogue box to save some text to paste into your UCF file, which will route the net you selected the same way every time. You can even have it make relative constraints.

It turns out that Xilinx use this themselves in some of their IP products, so it has a fair chance of continuing to work with new software releases!

HTH, and good luck!

Syms.

Symon

Since Austin has revealed these are real numbers, could you somehow fine-tune the constraints, so the longer paths do not (quite) make the cut?

-jg

Jim Granville

Ideal placement no longer guarantees an ideal route, sorry to say. In releases before the 5.1 "escape", there used to be a delay-based clean-up, so that if you gave it a perfect placement, the router did a darned good job of getting the routing correct. Since then, however, the router only works as hard as it has to for the whole design to meet timing. The thing is, it no longer picks the low-hanging fruit (i.e. the direct connects) consistently, which in turn congests the other routing resources. As a result, the router winds up stepping all over itself trying to get something that meets timing; if you are pushing the performance hard, the router will often not be able to find a solution that meets timing in a dense design unless you happen to have the right cost table (MPPR iterates using different cost tables to affect the routing order).

In those cases, about your only option, assuming you've already tried setting the router to maximum effort and continue on impossible, is to use directed routing... basically hand routing it with the FPGA editor and then exporting the routing info into a UCF. I don't wish that on my worst enemy, as it is a gruelling task for anything more than a small design or macro.

Ray Andraka

If a PCB router did this, it would be laughed out of the market...

-jg

Jim Granville

Hi,

Can you do this: take the tricky part of the design into a new blank design on its own. Get the P&R to churn away on that until it gets something that works (should be easier, as there is nothing from a huge design to get in its way). Then export that routing to a UCF, as suggested by Ray, and finally rerun with this UCF for the full design. I could be completely wrong, but it might save you the hassle of hand placing.

Maybe something like PlanAhead also - that seems to be incredible for constraining the P&R tools in a graphical way - check out the Demos on Demand for it on the Xilinx website.

John McGrath

One thing I've done in 5.x and 6.x ( but haven't tried in 7.x or 8.x ) that seems to work OK is to go into FPGA editor with a simple test design, find the delays for the direct connect paths I want it to use, then stick a MAXDELAY on those nets to force the router to use those connections.

This has worked well in conjunction with placed logic without resorting to the directed routing constraints ( at least for the small sections of critical logic that I've used it for so far, I'm not sure if a horde of MAXDELAYs would blow up P&R for a big RPM ).

Brian

Brian Davis

Are you using local inversion for that? I.e. are you using the same clock in both cases to enter the slice, and then use pos-/neg-edge stuff inside the slices? If so, in one case the local inverter for the clock adds some delay, in the other case it's not used.

Maybe the difference in the delay originates in the clock delay introduced by the inverter.

cu, Sean

Sean Durkin

Thanks for all the thoughts about this issue.

With my design, I put tight (but achievable) constraints on my critical signal and used RLOCs to lock the flip-flops in reasonable positions.

If I use the multi-pass P&R, sometimes the tool makes timing on the critical net, sometimes it just misses. So having a 'good' constraint won't force P&R to perform correctly.

Given that the identical Verilog is used for my tests, Sean's comments about local clock inversion probably don't apply. It appears to depend purely on P&R.

I'll give Brian's MAXDELAY tip a try next.

I'll keep you posted on results.

John Providenza

johnp

No, unfortunately that doesn't work too well. The router does no better for small designs... it just isn't as noticeable because the routes that go outside the used LUTs aren't impeded by congestion from neighboring logic. If you tried to plug that back into a dense design, you would get route collisions.

PlanAhead and floorplanner also don't help for routing. They can specify placement, but not routing.

Anyway, the hand placing isn't too much of a hassle if you do it hierarchically in your source code. Look at the gallery on my website for some (admittedly dated, I haven't updated it in a while) examples of designs that were floorplanned using RLOCs embedded in the VHDL.

Ray Andraka

Brian, I've tried that. For one or two it seems to work. For many, it slows PAR way down and usually won't find a solution where all the maxdelays get met, not to mention making the UCF a nightmare.

Ray Andraka

Ok, that sounds (dare I say) about par for the course...

Have you ever gotten anywhere in convincing Xilinx to add a flag to the router to restore the old, more consistent, behavior?

I just stuck the MAXDELAYs in the source near the net keep directives.

I have enough UCF nightmares already; on the bright side, at least they haven't made the UCF a binary file yet :)

Brian

Brian Davis

From a Software Architecture viewpoint, the _sensible_ thing would be to allow a selection of router by resource, and also an order.

That way, you could tell it to use a simple, direct path router on the fast nets first, and then the speed-driven router on the other nets..

I fear the 'old router' code is long lost, in the mists of time...

ssssshhh ! - someone in Xilinx might think that's a good idea!

-jg

Jim Granville

NO, and the answer I get back is "ain't gonna happen".

I'll have to try that. I've put other xilinx attributes like HU_SET and TNM in the source, but hadn't thought about putting the MAXDELAYs in the source.

Ssshhhh! Don't give them any ideas.

Ray Andraka

I believe that the MAXDELAY constraint applies to the actual net delay and does not include clock->Q propagation delay or setup time. So if your target is 400MHz, 2.5 ns is NOT the right value.
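To make that concrete, here is a rough sketch of how much of the clock period is left for the net once clock-to-out and setup are subtracted. It is a back-of-envelope calculation, not what the timing analyzer does: it assumes a 50% duty cycle and ignores clock skew and jitter, and the function names are made up. The Tcko (374 ps) and Tdick (202 ps) values are the ones from the timing reports in the first post. The `half_cycle` flag covers the negedge-to-posedge transfer described there, where only half a period is available.

```python
def period_ps(freq_mhz):
    """Clock period in picoseconds for a frequency given in MHz."""
    return round(1_000_000 / freq_mhz)


def net_budget_ps(freq_mhz, tcko_ps, setup_ps, half_cycle=False):
    """Time left for the net after clock-to-out and setup are subtracted.

    Assumes 50% duty cycle and no clock skew or jitter (simplification).
    """
    window = period_ps(freq_mhz)
    if half_cycle:
        window //= 2  # opposite-edge transfer: only half a period is available
    return window - tcko_ps - setup_ps


if __name__ == "__main__":
    print(net_budget_ps(400, 374, 202))                   # 1924
    print(net_budget_ps(400, 374, 202, half_cycle=True))  # 674
```

Under these assumptions, a full-cycle transfer at 400 MHz leaves about 1.9 ns for the net and a half-cycle transfer about 0.67 ns, so the 0.614 ns route only just fits and anything above 400 MHz tightens it further.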

This trick may have worked for me, but because sometimes PAR does OK and sometimes it doesn't it's hard to say for sure that MAXDELAY is a sure fire work-around to force PAR to do the right thing.

It is very annoying that, having told PAR how to place two blocks so that it can succeed, it then messes up a very simple route and misses timing.

This sure seems like low-lying fruit that the Xilinx s/w folks could pick.

John Providenza

johnp

Right, it doesn't replace the timing constraints, just forces the router to look for a route with delay shorter than the constraint you've set for that net.

Sorry if my explanation was a bit murky, let me try again:

If the value you set is less than physically possible in the part, PAR will bail out saying such a route does not exist; that's why I'd mentioned looking up the delays in fpga_editor first: then you set MAXDELAY just barely above the delay of the routing path you want to hit, so that the router has no other available choice.

It may also help to make the delay values a set of constants so you can change them easily if moving parts or speed grades.
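As an illustration of keeping the delay values in one place, here is a small sketch that stores measured direct-connect delays per part and speed grade and prints a UCF line with a small margin on top. Only the 407 ps figure comes from this thread (the faster route in the first post). The part/grade key, the net name, the 20 ps margin and the helper names are placeholders, and the `NET ... MAXDELAY = ... ns;` form should be checked against the Constraints Guide for your ISE version.

```python
# Measured direct-connect delays in picoseconds, keyed by (part, speed grade).
# The key below is a placeholder; 407 ps is the faster route from the first post.
MEASURED_PS = {("example_part", "example_grade"): 407}

MARGIN_PS = 20  # hypothetical margin, so the target sits just above the measured delay


def maxdelay_ps(part, grade, margin_ps=MARGIN_PS):
    """MAXDELAY target: measured direct-connect delay plus a small margin."""
    return MEASURED_PS[(part, grade)] + margin_ps


def ucf_line(net_name, delay_ps):
    """Format one MAXDELAY constraint line for a UCF file."""
    return f'NET "{net_name}" MAXDELAY = {delay_ps / 1000:.3f} ns;'


if __name__ == "__main__":
    delay = maxdelay_ps("example_part", "example_grade")
    print(ucf_line("ddr_sync_q", delay))  # net name is hypothetical
```

Changing part or speed grade then means adding one table entry, after re-measuring the path in FPGA Editor, rather than editing every constraint by hand.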

I've used it just for small critical bits of logic ( like synchronizers, DDR capture stages), and the ones I've checked have looked OK.

Given the "physically impossible route" warnings from PAR, my hope is that this constraint is checked early on before the first attempted route of the net, thus forcing usage of the shortest interconnections from the very beginning of the routing process.

Ray has pointed out that he's seen problems with wholesale usage of the MAXDELAY constraint choking PAR, so it doesn't seem to be suitable for mass application.

Brian

Brian Davis

and don't forget speed file upgrades.....:)

-jg

Jim Granville

OK, but the actual implementation of calc_max_delay is left as an exercise for the reader :)

attribute maxdelay of my_net : signal is calc_max_delay ( desired_path_type, part_type, speed_grade, ise_version, speed_file_version, phase_of_moon );

Brian

Brian Davis
