ISE Toolflow : hardmacro, incremental or modular


Daniel 21 years ago

Hi,

We are doing a large V2P design with some blocks requiring very tight timings. Up to now we have run the design through PAR and then used the ngc file as a guide for the critical routed nets and placed components.

This is not really convenient since it doesn't allow us to separate the different blocks easily.

I wonder now which of the following flows might be the best solution for us: hardmacro, incremental or modular design.

We need to have fixed routings and placement. Is this possible with all flows?

Are those flows stable for a large design (a V2P70 that will reach at least 60-70% utilization)?

Thanks for your help

Best regards Daniel


Bret Wade 21 years ago

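> Up to now we run the design through par and then used the ngc file to guide for the critical routed nets and placed components. This is not really convenient since it doesn't allow us to separate the different blocks easily.
>
> I wonder now which of the following flows might be the best solution for us: hardmacro, incremental or modular design.
>
> Are those flows stable for a large design (V2P70 which will get at least 60-70% of usage)?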

Rather than using a hard macro, I suggest combining the use of an RPM macro with the Directed Routing feature. This gives you all of the control of the hard macro with none of the drawbacks. IMO, hard macros should now only be used to force configurations that MAP won't accept. That and for Partial Reconfig bus macros I suppose.

Bret


Daniel 21 years ago

@Bret Thanks for the suggestion. I didn't know that it's possible to fix the routing within RPM macros as well.

Are RPM macros useful for design modules as large as mine (376 CLB slices (752 flip-flops, 260 FGs), 8 BRAMs and 1 global buffer), or does this lead to problems with performance or even unstable workflows?

How fixed is the routing with Directed Routing in the later PAR of the whole design? Is it possible that the router changes anything?

Best regards Daniel


Bret Wade 21 years ago

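> Didn't know that it's possible to fix the routing also within RPM macros.
>
> Are RPM macros useful for design modules as large as mine (376 CLB slices (752 flip-flops, 260 FGs), 8 BRAMs and 1 global buffer), or does this lead to problems with performance or even unstable workflows?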

Yes, large RPM macros can be used effectively. Multiple small RPMs can be assembled with offsets defined by hierarchical RLOCs to assemble a large RPM. Automatic placement of large RPMs can be a challenge so it may be necessary to locate the macro. I believe that Ray Andraka has posted on the subject of large RPMs in the past. You may want to Google for that.
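As a rough illustration of how hierarchical RLOCs compose, the sketch below adds the offset given to each sub-macro to the RLOCs of the components inside it, which yields their positions within the enclosing RPM. The macro names, the offsets and the `flatten` helper are hypothetical and only model the arithmetic; they are not part of any Xilinx tool.

```python
# Hypothetical model of hierarchical RLOC composition: the offset applied to a
# sub-macro is added to the relative location of everything inside it.

def flatten(node, offset=(0, 0), prefix=""):
    """Return {instance_path: (x, y)} for every leaf component under `node`.

    `node` is a dict with an "rloc" (x, y) tuple and, for sub-macros, a
    "children" dict mapping instance names to further nodes.
    """
    x = offset[0] + node["rloc"][0]
    y = offset[1] + node["rloc"][1]
    children = node.get("children")
    if not children:
        return {prefix: (x, y)}
    placed = {}
    for name, child in children.items():
        path = f"{prefix}/{name}" if prefix else name
        placed.update(flatten(child, (x, y), path))
    return placed


# Two copies of a small two-slice RPM; the second is offset 4 columns right.
small = {"lut_a": {"rloc": (0, 0)}, "lut_b": {"rloc": (0, 1)}}
big = {
    "rloc": (0, 0),
    "children": {
        "u0": {"rloc": (0, 0), "children": small},
        "u1": {"rloc": (4, 0), "children": small},
    },
}

placed = flatten(big)
print(placed["u1/lut_b"])  # (4, 1)
```

The same small RPM is reused unchanged at both positions; only the offset on the enclosing instance differs.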

> How fixed is the routing with Directed Routing in the later PAR of the whole design? Is it possible that the router changes anything?

Directed Routing is a relative constraint, and so the constraints will be valid wherever the macro ends up, providing that the relative location of the component pins is consistent with the routing constraints. It may be necessary to use BEL constraints as well as RLOC constraints to ensure the pin locations are correct. Once the routing constraint is successfully applied, the routing is fixed.

Since you mentioned that a large macro is involved, I should point out that Directed Routing is not recommended for use with large numbers of signals on the order of hundreds. Guide should be used instead. That recommendation depends on the nature of the routing involved and the level of congestion around the locked routing.

Some more information on Directed Routing here:

formatting link

Regards, Bret


Brian Drummond 21 years ago

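> Didn't know that it's possible to fix the routing also within RPM macros.
>
> Are RPM macros useful for design modules as large as mine (376 CLB slices (752 flip-flops, 260 FGs), 8 BRAMs and 1 global buffer), or does this lead to problems with performance or even unstable workflows?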

I am working with larger RPMs than this, with some degree of success (cautiously expressed; the design is not yet complete)

There *may* be problems with BRAMs and multipliers in RPMs.

There are problems using the floorplanner to create RPMs from smaller ones, mostly associated with tools issues (the floorplanner alone has two mutually incompatible understandings of RLOC_ORIGIN, the mapper moves the origin left by 1 location under some (apparently undefined) circumstances, the placer reports errors on some correct RLOC_ORIGIN constraints and silently deletes others altogether, and so on).

formatting link

contains a test case for a few of the problems, where only one of eight RPMs is placed correctly.

There are other problems too, including floorplanner swapping BELs within a CLB, and crashing when writing RPM UCF files.

Maybe FPGA editor is a more stable tool for floorplanning? I would try it but only have WebPack in current software.

But if you can identify and work round the tools limitations, it looks tantalisingly close to workable, with RPMs considerably larger than yours above, and composed hierarchically of RPMs several levels deep.

One of the (undocumented? - at least I haven't found it anywhere) workarounds seems to be to keep a component in the lower left hand corner of the smallest bounding box that can surround the RPM, which should site this corner at RLOC=X0Y0 - a condition violated deliberately in the test case above (and accidentally in several of my RPMs!)

Another is to "replace all with placement" (or "constrain from" as appropriate) which corrects randomly swapped BEL elements, should they occur.

I would love to see some others, or an App Note on this process...

And I'm hoping some of these floorplanner bugs can be fixed.

> How fixed is the routing with Directed Routing in the later PAR of the whole design? Is it possible that the router changes anything?

Guide is interesting, both because I don't have FPGA editor, and apparently directed routing isn't intended for large RPMs.

Can you point me in the direction of a flow that would use guided routing for a top level module composed of several pre-routed RPM modules? Preferably where at least one of the RPMs has itself been created in this way?

I am currently finding it difficult to maintain timings achieved on lower level modules when they are combined together, and scope for routing congestion obviously increases.

- Brian


Bret Wade 21 years ago

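> Didn't know that it's possible to fix the routing also within RPM macros.
>
> Are RPM macros useful for design modules as large as mine (376 CLB slices (752 flip-flops, 260 FGs), 8 BRAMs and 1 global buffer), or does this lead to problems with performance or even unstable workflows?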

There is a certain amount of complexity added when an RPM combines multiple component types (Heterogeneous RPM) due to the fact that for the default grid system, the various component types are on different grids. This is only a problem if your RPM requires normalization. If your RPM uses the X0Y0 slice and does not use any negative RLOC values, then no normalization occurs and there is no problem. If normalization is necessary, then the RPM must be implemented with the RPM Grid system. More on normalization and the RPM Grid here:

formatting link

RLOC_ORIGIN values must take into account the normalization of the RPM. More on that in the appnote. The mapper converts RLOC_ORIGINs into MACRO LOCATE constraints in the PCF file, so strictly speaking it's not possible for the placer to ignore RLOC_ORIGIN constraints because it never sees them.

A separate issue is that the RPM needs to be placed in the same "slice type" that it was created about. There are four slice types represented by the four slices in a CLB, S0-S3. For simplicity's sake, it is best to construct a macro about the X0Y0 slice and then always place it in an S0 slice type if possible. If the RPM is placed in a different slice type, the relative placement will be broken, which can lead to placement failures.

I suspect that these two issues explain some of the behavior that you're seeing. I can't speak to what the Floorplanner is doing in constructing your RPMs.

FPGA Editor is not a floorplanner. It is an editor for displaying and modifying the physical design and applying some physical constraints. The Floorplanner is an editor for applying constraints to the logical design. FPGA Editor is the only tool for applying Directed Routing constraints and it is useful for obtaining grid values for both the standard and RPM Grid systems. FPGA Editor is also a great tool for understanding the details of possible component, placement and routing configurations within the FPGA.

This is the normalization issue again. You don't have to build the RPM around the X0Y0 slice, but if you don't, you have to account for normalization.

> How fixed is the routing with Directed Routing in the later PAR of the whole design? Is it possible that the router changes anything?

Directed Routing is not incompatible with the guided flows. There's no reason why you can't combine them. You'll just need to get FPGA Editor and try it.

There is one problem area that I should mention. The placer can currently place an unconstrained slice in conflict with Directed Routing, where the locked routing blocks switchbox access to BX and BY pins on the slice. I've only seen it on one design so far that had a lot of Directed Routing that was using switchbox bank shots. Beware using routing constraints like that in areas where the slice utilization is uncontrolled. A workaround is to prohibit the affected slice site. We're looking at a placer fix in the 7.1i time frame.

Regards, Bret


Brian Drummond 21 years ago

This is _very_ good information on RPMs including BRAMS or multipliers or such (IOBs?) which live on different grids.

I note S0 and S1 share the same RPM_GRID X value though (unless I misunderstand the floorplanner) they appear in adjacent columns (x, x+1) in floorplanning, e.g.

floorplanner and standard grid S2 S3 S0 S1

RPM_GRID S3 S2 S1 S0

The translation given from SLICE_X26Y40 to RPM Grid X42Y84 in the appnote seems to bear this out.

It's not clear to me quite how that relates to the testcase, which only uses LUTS, SRL16s, and FFs, which are all on the same grid (Spartan-3 restrictions on SRL16 notwithstanding). R0C0 is used, though some elements have negative X values (the floorplanner doesn't give you any choice about this if you don't use the lower left hand corner of the bounding box surrounding the RPM).

Is normalisation still an issue in this case? It seems to me that the normalisation is onto the same grid since RPM_GRID is not being used, so I don't see where the problem lies.

And outside Xapp416 and this message, I haven't seen any mention of normalisation. Is it described anywhere for the standard grid? Aha! Searching on that word gives the useful looking TechXclusive "Relationally Placed Macros 08/30/2002 "

formatting link

True! Strictly speaking the placer ignores the MACRO LOCATE constraints instead. They are in the PCF file, I just checked - see comment above regarding the mapper moving the origin, it does so in the same constraint conversion.

This may be the problem, but I don't see why the limitation exists. Hand placement of the same components onto the other slice types (again, excepting SRL16s in "odd X" locations) seems to work fine, though not placement of RPMs.

I have used it (3.1 era) but don't have a current one.

Interesting, but I think I have been warned off Directed Routing for the size of macros I am using.

Is there a way of using routed versions (NCDs?) of several RPMs as (multiple) guide files for a design incorporating them? Your earlier recommendation that "Guide should be used instead" seems to imply that there is, but I can't see it.

Many thanks for your answers and help,

- Brian


Bret Wade 21 years ago

You're correct that the two grid systems increment differently. The RPM Grid corresponds to the actual placement grid. The original grid system was created so that designers could easily specify column based RPMs such as carry chains using increments of one. This discrepancy between the original grid and the placement grid is what causes problems when an RPM is not placed in the correct slice type. There are inherent problems anyway wrt shifting logic across CLB boundaries in something other than full CLB increments.

Yes, normalization needs to be taken into account even when there is only one component grid involved if you are using RLOC_ORIGIN. If you don't calculate an offset to compensate for normalization, then the macro won't get placed where you expect it.

formatting link

I was consulted on that section of the document, so you still have no evidence that I know what I'm talking about. :-)

The placer doesn't often ignore LOCATE constraints. It's more likely that you are just getting unexpected results because of the normalization issue. Note the difference between your RLOC_ORIGIN and the resulting LOCATE constraint. That difference is due to normalization.

The best way to illustrate this is to manually place RPMs in FPGA Editor using different slice types. You'll quickly see how the relative position gets corrupted and if you crunch the grid numbers, you'll understand why. The importance of the relative position varies with the logic involved. It's critical for wide-gate structures that depend on dedicated routing resources between F5 and F6 muxes in different slices. It's relatively unimportant for generic LUT/FF slices.

You've been warned off using it for hundreds of nets. You could still consider using it for the most critical paths or in any case where there is only one suitable routing resource for a signal.

That's what Modular Design does. Separate guide files are used during the assembly phase to guide the overall design from the various module implementations.

You're welcome. Bret


Brian Drummond 21 years ago

ah! so in these generations (VII, S3) the carry increment is 2? Then I can see that reflecting the true organisation in the floorplanner would be problematic.

That may be part of what my test case is exposing.

Incidentally I DID find a recommendation to place RPMs such that they began at R0C0 in the answers database ... but it went on to say "this problem will be fixed in 4.2"!

Oh I calculate an offset. The problems are that the tools appear to modify that offset in undefined ways or ignore them.

I'm pretty sure you do, but it can get pretty convoluted so I have no evidence I understand you :-)

Possible, but I would expect the placer report (.par) file to contain "RESOLVED that be placed at " messages, but I only get 6 for 8 constraints (this file is included in the testcase), and the normalisation for the other two was X20Y22 and X18Y-12, for modules 6x9 in size. Seems unlikely that this is just normalisation.

The other 6 were placed within a couple of CLBs of the expected location, I am trying to reconcile the differences with what you have told me about normalisation.

Part of this exercise has been to see how far I could get with the free tools and the S3/1500, but now I'm convinced it's time to upgrade.

Again, thanks. Having the right term to search for makes all the difference!

- Brian


Bret Wade 21 years ago

This sounds as though you've searched the Answer Archive and found an old obsolete Answer Record. If not and that's an active Answer Record please let me know what the number is. Some aspect of this problem probably was fixed in version 4.2. It's always a challenge to write an Answer Record that won't be misapplied to similar but different problems.

Here's one that is applicable to your situation:

formatting link

I took a look at your test case and do agree that there is some unexpected behavior resulting from the normalization of your negative RLOC constraints. Focusing on the macro "I1/hset", you have a number of instances RLOC'd into column 0 beginning with X0Y0, so far so good. Then you have some instances RLOC'd to X-2Y8, X-4Y8 and X-5Y8.

The first two don't cause a problem because they are S0 slices. The last one does cause a problem because it's an S3 slice and the normalization pushes everything else into the wrong slice type. If I disable that RLOC with the following UCF constraint, the macro starts behaving like a good citizen again:

INST "I1/int_delay1" USE_RLOC=FALSE ; # LUT2 at X-5Y8
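A quick way to spot RLOCs like the X-5Y8 one is to check coordinate parity. The sketch below is only an illustration: it assumes that, for a macro built around X0Y0, an RLOC stays on the S0 slice type when both X and Y are even. That rule is inferred from the coordinates quoted in this thread rather than taken from Xilinx documentation, and the sketch does not try to name the other three slice types. The helper names are hypothetical.

```python
import re

# Matches relative locations written as X<int>Y<int>, e.g. "X-5Y8".
RLOC_RE = re.compile(r"X(-?\d+)Y(-?\d+)$")


def parse_rloc(text):
    """Split an RLOC string such as 'X-5Y8' into integer (x, y)."""
    match = RLOC_RE.match(text)
    if match is None:
        raise ValueError(f"not an XnYm location: {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_s0_aligned(text):
    """ASSUMPTION: S0-type positions have even X and even Y."""
    x, y = parse_rloc(text)
    # Python's % returns a non-negative result for a positive modulus,
    # so negative coordinates are handled correctly.
    return x % 2 == 0 and y % 2 == 0


# RLOCs quoted above for the macro I1/hset.
rlocs = ["X0Y0", "X-2Y8", "X-4Y8", "X-5Y8"]
off_type = [r for r in rlocs if not is_s0_aligned(r)]
print(off_type)  # ['X-5Y8']
```

Only the LUT2 at X-5Y8 is flagged, which matches the instance disabled with USE_RLOC=FALSE above.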

I looked at this and found a messaging issue. All eight macros were locked, but the "RESOLVED" messages only listed five of them. Note that there are three macros missing from that list and three macros that generate the following warning about alignment:

WARNING:Place:206 - This design contains an RPM macro for which a specific alignment on the CLB grid was desired. The macro can not be aligned in this specific way. The placer will disregard this alignment.

I can understand that. I have a Linux PVR project going at home.

Regards, Bret


Bret Wade 21 years ago

formatting link

Brian,

I've looked at these issues closer and found that there are indeed two tool bugs. First the RLOC_ORIGIN of the RPM was being incorrectly translated into a Macro Locate constraint by MAP. Second the Placer was ignoring the bad LOCATE constraint, although it was printing a warning.

Happily, both problems are already fixed for version 7.1i. Meanwhile to work around the issue, you can modify the macro (as mentioned above) or you could manually fix the macro constraint in the PCF file.

You can make the PCF fix permanent by moving the corrected PCF constraint below the "SCHEMATIC END" line, removing the UCF constraint, and then using the existing PCF as input to MAP. MAP will rewrite the PCF, saving everything below the "SCHEMATIC END" line.

I still stand by what I said previously about normalization, but I was incorrect to assume that it applied to your macros which don't require normalization since the X0Y0 slice is the reference comp. The reference comp is the lower left most comp, with lower beating out left most in this case.
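The reference-comp rule can be written down in a few lines. This is a minimal sketch of my reading of the paragraph above, not of the tools' actual algorithm: the reference comp is taken to be the lowest component, with ties broken by the leftmost, and a macro is assumed to need normalization only when that component is not at X0Y0. Both function names are hypothetical.

```python
def reference_comp(rlocs):
    """Pick the lowest (x, y) RLOC, breaking ties with the leftmost one."""
    return min(rlocs, key=lambda xy: (xy[1], xy[0]))


def needs_normalization(rlocs):
    """ASSUMPTION: no normalization when the reference comp is X0Y0."""
    return reference_comp(rlocs) != (0, 0)


# The RLOCs quoted earlier in the thread for I1/hset.
hset = [(0, 0), (-2, 8), (-4, 8), (-5, 8)]
print(reference_comp(hset))       # (0, 0)
print(needs_normalization(hset))  # False

# A hypothetical macro whose lowest row extends left of X0Y0.
other = [(-1, 0), (0, 0), (0, 1)]
print(reference_comp(other))       # (-1, 0)
print(needs_normalization(other))  # True
```

For I1/hset the negative X values all sit in row 8, so X0Y0 remains the reference comp even though it is not the leftmost component.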

Regards, Bret


Brian Drummond 21 years ago

Yes, re-checking it (13684), I see "Status:Archive", I had found it on a "whole site" search and hadn't noticed that before.

Incidentally, a minor "wish" would be that searching the answers database for 17217 returns Answer Record 17217. OK, I can edit URLs, but...

[...]

Yes, it's a pathological example! I will look at how USE_RLOC=FALSE changes the behaviour. This was suggested as the (partial) solution to another problem.

[...]

I agree there are three "Place:206" warnings, which don't count as ignoring constraints in my book - however, they apply to three of the (6, not 5) "RESOLVED" constraints.

The two remaining MACRO LOCATE constraints (for I1 and I5) have neither "RESOLVED" nor "ERROR" nor Place:206 warning. As seen in the PAR report in the test case (from 6.3sp2).

Which PAR version are you using?

Though I have to say for free tools, and for the price of the Spartan-3 development kit (Avnet PCI) it does astonishingly well...

PVR - what's that, if I may ask?

- Brian


Brian Drummond 21 years ago

looks like my last message crossed with this one...

It still looks to me that the Place:206 warnings are NOT about the "ignored" constraints but 3 that had been "RESOLVED" earlier in the placer and later discovered to be wrong. These ones only move a little step to the right (+/-1 slice), not the huge jump (to the left?) I mentioned earlier.

Any light you can shed on the missing 2 (I1 and I5)? If you're getting results different to the ".par.saved" file, which PAR are you using?

Excellent! Which will be out ... when?

The other issue has been acknowledged as a bug, it'll be my third CR on the floorplanner.

I've been modifying PCF constraints as a last resort only.

aha! another "trick of the trade" to file away in case I need it...

That was maybe why I was getting confused... I didn't dare try a test case with the reference comp _off_ X0Y0! Maybe one day when I get some time...

I know all this may seem like nitpicking...

but I've always pretty much used "pushbutton mode" in the past, and re-pipelined if I didn't quite meet targets. This is an exercise to see how much better results I can get from floorplanning, without changing my VHDL coding style to any great extent.

I've tried floorplanning in the past (3.1 era!) but while it's looked OK for individually instantiated elements (as per Ray Andraka's classic approach) it got ugly pretty fast from regular VHDL.

But the tools seem to have come along to the point where it _almost_ works, though I miss the "Congestion" plot in the old 3.1 floorplanner! Modulo these bugs, (and others I won't mention because I can work round them) and my slow learning...

Up till now I've been spending more time trying to get the tools to behave (or looking at it the other way, learning which buttons NOT to touch!) than working, but I can take a substantial block, say several hundred components, (fairly) quickly floorplan it, RPM it, and black box it in the higher levels. If you're floorplanning, hierarchy is your friend...

These larger blocks are easily 50% faster than the "pushbutton" flow achieves.

Combining several of these into a larger RPM gave some difficulty at first, but seems to work with care. Assembling the whole lot into a completed design without losing some of that extra speed is another matter. So far.

But 50% extra speed for (eventually, I hope!) little extra work is pretty worthwhile...

- Brian


Bret Wade 21 years ago

If I fix all of the MACRO LOCATE statements by hand editing the PCF to lock them to the closest S0 slice, I get the following in the .par file:

Resolved that macro must be placed at site SLICE_X20Y20.
Resolved that macro must be placed at site SLICE_X40Y20.
Resolved that macro must be placed at site SLICE_X10Y10.
Resolved that macro must be placed at site SLICE_X30Y10.
Resolved that macro must be placed at site SLICE_X10Y20.
Resolved that macro must be placed at site SLICE_X20Y10.
Resolved that macro must be placed at site SLICE_X30Y20.
Resolved that macro must be placed at site SLICE_X40Y10.

If I change the LOCATE constraint for I1/hset back to SLICE_X9Y10, 6.3i will ignore the constraint and I only have seven "Resolved" messages. If I run 7.1i PAR, I get a hard error message with a detailed description of why that constraint won't work. This indicates to me that the bad LOCATE constraints are the root of the problems. 7.1i MAP no longer generates the bad constraints and 7.1i will correctly reject them.
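Counting the "Resolved" messages by hand gets tedious, so a small script can compare them against the sites you expect. The sketch below assumes the message wording shown above; the macro name appears to have been lost from the quoted text, so the pattern allows any text between "Resolved that" and "must be placed at site", and real reports may be worded differently. The function names are hypothetical.

```python
import re

# Wording taken from the messages quoted in this post; anything between
# "Resolved that" and "must be placed at site" (e.g. a macro name) is allowed.
RESOLVED_RE = re.compile(
    r"Resolved that .*?must be placed at site (SLICE_X\d+Y\d+)",
    re.IGNORECASE,
)


def resolved_sites(report_text):
    """Return every site named in a 'Resolved' message, in report order."""
    return RESOLVED_RE.findall(report_text)


def missing_sites(report_text, expected_sites):
    """Return the expected sites that have no 'Resolved' message."""
    seen = set(resolved_sites(report_text))
    return [site for site in expected_sites if site not in seen]


report = """\
Resolved that macro must be placed at site SLICE_X20Y20.
Resolved that macro must be placed at site SLICE_X40Y20.
"""
expected = ["SLICE_X20Y20", "SLICE_X40Y20", "SLICE_X10Y10"]
print(missing_sites(report, expected))  # ['SLICE_X10Y10']
```

A site that shows up as missing is a silently dropped constraint of the kind 6.3i produces here, and is worth checking against the PCF.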

Currently scheduled for FCS in early March.

Yes. We were just discussing in a meeting the other day about how obsolete this keyword is and whether it should be changed.

Not at all.

Glad to hear you're seeing some good results.

Bret


Brian Drummond 21 years ago

as expected...

Excellent news again!

I had no doubt the constraints were responsible ... but it's not obvious why they are "bad".

What I was trying to understand in simple terms, with this test case, is that hand placement "works" for six of the eight possible alignments (4 out of 4 LUT/FF, 2 out of 4 LUT/SRL16/FF) and the failures are easily understood.

Yet placement by RLOC_ORIGIN can't reproduce this, and I'm still not clear why not. OK, there are differences (temporarily) down to tools issues, but - beyond that - why not?

I naively expected that the mapper would be transparent to RLOC_ORIGIN, (=> MACRO LOCATE) and the placer, in expanding the macro, would translate each BEL appropriately (e.g. for a shift right, S0 -> S1 and S1 -> X+1.S0 rather than X.S2), only failing if that BEL didn't exist (e.g. no SRL16 in S1) or some other conflict occurred.

i.e. if it can be done by hand, why don't the tools do it? Or, if it can't be done, why did the hand placement get through PAR?

Possible answers are a) "7.1 does exactly that" ... in which case, great!

b) "It's a pointless exercise that might satisfy naive floorplanner users but adds complexity and/or bugs to the placer without improving packing or performance" is probably an acceptable answer, but IMO needs health warnings in the floorplanner manual!

c) "There's a fundamental flaw you overlooked" ... ok, but where? I thought normalisation might have been it, but apparently not.

d) "The simpler translation (S0->S1, S1->S2) is better because" ... from which there's probably something useful to learn about floorplanning.

e)???

Time to pre-order...

And I'm very glad to hear ... coming back to topic ... that the RPM toolflow is only going to improve.

- Brian
