Place and Route Algorithms

Hello everyone.

Does anyone know where I can find information about the place and route algorithms used for FPGAs, and what kind of work has been done or is being done to accelerate these algorithms? Books and/or links to websites will be greatly appreciated.

Regards, Marco.

Reply to
marco

Marco, I am sure that you will not find anything beyond very basic tutorial information. These are the "crown jewels" of any FPGA company, and these jewels are well guarded, but also polished daily. The quality of these tools determines the success of our companies, and each of us wants to be at least a step ahead of the other company. BTW, the continuous investment by companies like Xilinx and Altera (to name just the two biggest) is enormous, and it is unlikely that an individual engineer can provide significant improvements. Unless you are a genius and address the problem in a very unconventional way. Peter Alfke

Reply to
Peter Alfke

formatting link

Reply to
Mike Treseler

Maybe it's just better not to know. [Remark for all users:] have you never noticed that setting some options sometimes has the opposite effect of what their names suggest?

I always told myself: the proof that the tool vendors are not comfortable with the complexity of their own software is named Xplorer...

I'm not blaming anyone, and I'm no troll! I acknowledge the great job your teams do. But intrinsic knowledge of the tools should allow experienced users to set the right options and get predictable behavior.

In that case, create your company and get acquired by one of those "biggest" or submit a CV ;-)

Reply to
Stephane

Peter Alfke schrieb:

I totally disagree.

There is a difference between an algorithm and an implementation with tweaked parameters for a given architecture. I have been doing EDA-algorithm research for quite some time, and most substantial progress has been made by individuals or very small teams.

These people do not provide industrial-grade implementations that can do a better job on any given FPGA than the vendor tools, but they provide enough evidence that a certain approach might be better suited.

I hear PathFinder variants are used a lot for FPGA routing. Two authors only.

formatting link
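
To make that concrete: the core of PathFinder (McMurchie and Ebeling) is a negotiated-congestion loop. Every net is ripped up and re-routed each iteration by a shortest-path search whose node costs grow with both present and historical overuse, until no routing node is overused. A minimal Python sketch, with a made-up graph format and single-sink nets for brevity - real routers work on the device's routing-resource graph:

    import heapq

    def shortest_path(graph, src, dst, node_cost):
        # Dijkstra over the routing-resource graph; costs live on nodes.
        dist, prev = {src: node_cost(src)}, {}
        pq = [(dist[src], src)]
        while pq:
            d, n = heapq.heappop(pq)
            if n == dst:
                break
            if d > dist[n]:
                continue                      # stale queue entry
            for m in graph[n]:
                nd = d + node_cost(m)
                if nd < dist.get(m, float("inf")):
                    dist[m], prev[m] = nd, n
                    heapq.heappush(pq, (nd, m))
        path = [dst]                          # assumes dst is reachable
        while path[-1] != src:
            path.append(prev[path[-1]])
        return path[::-1]

    def pathfinder(graph, nets, capacity, max_iters=50):
        history = {n: 0.0 for n in graph}     # accumulated congestion cost
        for it in range(1, max_iters + 1):
            usage = {n: 0 for n in graph}
            routes = {}
            for name, (src, dst) in nets.items():
                def cost(n):
                    # the present-congestion penalty sharpens each iteration
                    over = max(0, usage[n] + 1 - capacity[n])
                    return (1.0 + history[n]) * (1.0 + over * it)
                routes[name] = shortest_path(graph, src, dst, cost)
                for n in routes[name]:
                    usage[n] += 1
            if all(usage[n] <= capacity[n] for n in graph):
                return routes                 # congestion fully negotiated
            for n in graph:
                if usage[n] > capacity[n]:
                    history[n] += 1.0         # overused nodes get pricier
        return None                           # did not converge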

And without FlowMap, Xilinx would probably be building MUX-based FPGAs now.

You can be grateful that Leiserson and Saxe did not patent retiming and sell it to Altera.

formatting link
Otherwise ISE would have to be a web-based application with servers running in Europe.

Xilinx used placement based on simulated annealing in the past (UC Berkeley; Carl Sechen, Sangiovanni-Vincentelli, et al.). What is it now? Quadratic placement imported from Munich?
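
The annealing loop itself (the TimberWolf/VPR family) fits in a few lines. A toy sketch, assuming a caller-supplied wirelength estimator and at least as many slots as cells; real placers add incremental cost updates, range-limited moves, and a tuned cooling schedule:

    import math, random

    def anneal_placement(cells, slots, wirelength, t=100.0, alpha=0.95, moves=1000):
        # start from a random assignment of cells to slots
        place = dict(zip(cells, random.sample(slots, len(cells))))
        cost = wirelength(place)
        while t > 0.01:
            for _ in range(moves):
                a, b = random.sample(cells, 2)
                place[a], place[b] = place[b], place[a]      # propose a swap
                new = wirelength(place)
                if new < cost or random.random() < math.exp((cost - new) / t):
                    cost = new                               # accept the move
                else:
                    place[a], place[b] = place[b], place[a]  # undo
            t *= alpha                                       # cool down
        return place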

The OP should turn to the University of Toronto, where a lot of FPGA placement and routing research has been done. Also, scholar.google.com tells me that André DeHon at Caltech has published in the area of hardware-accelerated placement and routing recently (2003).

Kolja Sulimma

Reply to
Kolja Sulimma

Thank you very much for the information, Kolja. I greatly appreciate what you included in your reply. Best Regards, Marco.

Reply to
marco

I stand corrected. My response was directed at the suggestion "I can come up with a better P&R implementation for Xilinx or Altera chips". As Kolja might also agree, an industrial-strength complete solution involves massive effort that only the major companies can afford. But a novel algorithm, a completely different way of attacking the problem, can more easily come from a few individuals who are not burdened with maintaining legacy designs, or with the expectation of completing their project to industrial strength. Without such burdens they can explore more freely, and such exploration has been fruitful in the past, and might well be in the future. Peter Alfke

Reply to
Peter Alfke

Peter Alfke schrieb:

Full ACK.

For my contributions to that area, it is usually necessary to check the benchmarks beforehand for special cases the tools cannot handle. E.g., it is just not worthwhile for research to handle wires that go directly from inputs to outputs if your data model for some reason requires a cell connected to each net. An industrial tool would hack in some workaround; I just delete the wire.

Even large companies sometimes take a while to handle such cases. I regularly manage to crash industrial-grade synthesis tools with zero-input designs, for example. There are VLSI placers that cannot handle single-cell designs, because the loop condition is checked after the loop. One-element arrays in high-level synthesis sometimes are fun, too.
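
The single-cell failure is easy to picture: a hypothetical min-cut placer whose termination test runs only after the body has already tried to bipartition (do-while style) dies on exactly that degenerate input.

    def split(region):
        # stand-in for a min-cut bipartitioner; real ones need >= 2 cells
        assert len(region) >= 2, "cannot bipartition fewer than two cells"
        mid = len(region) // 2
        return [region[:mid], region[mid:]]

    def place_buggy(cells):
        regions = [cells]
        while True:
            # the body runs before the condition is ever checked...
            regions = [half for r in regions for half in split(r)]
            if all(len(r) == 1 for r in regions):
                break                  # ...so a 1-cell design asserts above
        return regions

    def place_fixed(cells):
        regions = [cells]
        while any(len(r) > 1 for r in regions):   # test *before* the body
            regions = [half for r in regions for half in split(r)]
        return regions

    place_fixed(["lut0"])   # fine: [['lut0']]
    place_buggy(["lut0"])   # AssertionError on a single-cell design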

Another issue is that many algorithms have a lot of tunable parameters that need to be adjusted across a whole range of devices for optimum results. The developer of the tool can do that manually on a case-by-case basis. But if you want to sell a push-button solution to thousands of customers....

Still, there are many small companies that develop special-case EDA algorithm implementations and sell them to tool chain vendors, who incorporate them into their flows. A lot of what Xilinx uses was not developed in-house.

Kolja Sulimma

Reply to
Kolja Sulimma

If these are so polished, then why do manual tools like PlanAhead still give significant gains? Until the automated tools get within a few % of good manual design, to me that demonstrates ample scope for improved SW.

There also seem to be steady annual claims of tool improvements well into the double digits, which also suggests these are far from mastered (or are these claims inflated by marketing fluff...?)

A cynic might also point out that the PlanAhead tools are $$$, and so Xilinx has something of a conflict of interest - if the free tools get too close to the full suite, their revenue stream will drop....

-jg

Reply to
Jim Granville

jg,

I'll probably regret putting in my 2 cents, but here goes:

It is true that "significant gains" are still (often, not always) realized by some careful hand placement and some careful partitioning.

That suggests that the design languages lack something important, as the intent of the designer is not being communicated to the tools.

Teasing that intent from the HDL, through clever constraints, additional tools (like PlanAhead), or inference (or even guessing) from clever code, can provide improvement.

The software folks here at Xilinx are amazing: every generation they have managed to improve performance while reducing the time to compile designs; all the while we in IC design follow Moore's law and double the density. Not to mention we add more custom IP, and customers are getting more demanding.

It matters little if IC Design or software cleverness led to low power, and high speed/performance. It is the sum total that is magic: the next version of the part (and the tools) provides that cost/benefit that is required for a new generation of applications.

So, to recap this ramble: HDLs are not good at telling the tools the best way to actually do what the designer wanted; lots of room for improvement here. Tools still have cleverness left to discover, as we seem to keep getting more clever even after seven products (that I was personally involved in). Adjunct tools (like PlanAhead) are very valuable for users who just must get the absolute best from the tools (without hiring a consultant who hand-places and tries to tease out the performance, which is not always successful and certainly can't be predicted), and because they are so useful, they provide direct benefit that is worth something (so they justify their price). Finally, it is the combination of tools and silicon that solves the problem; neither stands alone.

Austin

Reply to
Austin Lesea

Austin,

The purpose of high-level languages (for logic generation or for writing software) is to allow cheaper programming; the loss factor I have witnessed has varied between 10 and >1000. If one has the resources to do things at a lower level, that is always the better choice. It does not take longer, it does not cost more text (except for very low-complexity works, where this is a non-issue anyway), it "only" takes more skill. No translating tool can replace direct access to the programmed hardware. I wish silicon vendors were less unwilling to allow third parties to program their silicon, thus removing a major roadblock to progress. But this is unlikely to come true; it has some political background which eludes me (economically, the chip makers would benefit from such openness).

Dimiter

------------------------------------------------------
Dimiter Popoff
Transgalactic Instruments

formatting link

------------------------------------------------------

Reply to
dp

I appreciate the quality of the tools compared to where they were years ago. I regularly curse the software, however, because the design intent "I want to achieve 200 MHz performance" can make it through the synthesis tools after some coaxing, but ends up coming through the place & route software with brain-dead 2.5 ns nets on both sides of a single LUT. This is the extreme example of when good routes go bad.

I've tried to push the idea that a "critical path routing" approach could make designs run as fast as n levels of logic with appropriate primitive-specific delays can go. The tool shows no sign that it looks at what is possible before deciding how to throw everything together.
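
To illustrate the kind of "what is possible" check I mean: with made-up, data-sheet-style numbers (none of these are real device figures), the best period a path with n LUT levels could ever achieve is trivial to bound.

    def min_period_ns(lut_levels, t_clk_to_q=0.4, t_lut=0.2,
                      t_net_best=0.3, t_setup=0.3):
        # best case: every net is a direct hop to a neighboring cell
        return t_clk_to_q + lut_levels * (t_lut + t_net_best) + t_setup

    # 4 LUT levels: 0.4 + 4*(0.2 + 0.3) + 0.3 = 2.7 ns, about 370 MHz.
    # A single brain-dead 2.5 ns net on that path nearly doubles the
    # period - exactly the failure complained about above.
    print(round(min_period_ns(4), 2))

A router that computed this bound up front would know that a 2.5 ns net on a 200 MHz path is a self-inflicted wound, not a necessity.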

My opinion is that mapping as a process separate from place & route is archaic (to use kind words), and that spreading the logic out so each slice has just one LUT is *not* the way to alleviate the problem. The tools should have the intelligence to unbundle and rebundle unrelated logic as necessary to keep the logic "tight" (delays low), even in a low-utilization design.

The tools *can* do so much more; the evolutionary development of the tools has hampered true progress. The silicon is *amazing* in what can be accomplished. "Pushing the rope" to improve results with synthesis is bad enough; having place & route software that doesn't understand what it takes to produce good results every time is sad. I can pass with total timing compliance, then lose by 1.5 nanoseconds after changing non-critical logic. I prefer not to curse my tools.

Respectfully,

- John_H

Reply to
John_H

Yes. Xilinx has added "map -timing" to do just that. Mapping of logic is now done together with placement, and the result works rather better.

--
Phil Hays
Reply to
Phil Hays

...

Austin,

What is missing is the geographic relationship between parts of the circuit. Perhaps the biggest piece missing in the current tools is use of the hierarchy in a design. The current Xilinx tools flatten the design before they even start on the place and route problem, which greatly increases the workload and time to complete while also degrading performance. The tools have an opportunity to treat each hierarchical layer as a mini-design, essentially breaking the problem into smaller problems in a way that is consistent with the way the designer broke up the design. Going to a true hierarchical place and route would improve both the quality of results and the run times.
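
The mini-design idea can be sketched directly. A toy recursive placer, assuming a trivial slice-the-region partitioner; a real tool would floorplan the child regions properly and legalize across module boundaries afterwards:

    from dataclasses import dataclass, field

    @dataclass
    class Module:
        cells: list = field(default_factory=list)      # leaf cells
        children: list = field(default_factory=list)   # sub-modules

    def place_hierarchical(mod, x0, y0, w, h, out):
        # each level of hierarchy is placed as its own mini-design
        if not mod.children:
            # leaf module: drop its cells row-major into its own region
            for i, cell in enumerate(mod.cells):
                out[cell] = (x0 + i % w, y0 + i // w)
            return out
        # carve the region into one vertical slice per child (toy partition)
        slice_w = w // len(mod.children)
        for i, child in enumerate(mod.children):
            place_hierarchical(child, x0 + i * slice_w, y0, slice_w, h, out)
        return out

    # two sub-modules, each placed independently inside its own region:
    top = Module(children=[Module(cells=["a0", "a1"]), Module(cells=["b0"])])
    print(place_hierarchical(top, 0, 0, 8, 4, {}))
    # {'a0': (0, 0), 'a1': (1, 0), 'b0': (4, 0)}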

Now, I do disagree with your assertion that each generation of the tools improves both run time and quality of results. I have indeed seen improvements in run time, but more often than not the quality of results has taken a step backwards with each major release of ISE. Yes, I suppose for flat RTL-only designs the results have gotten somewhat better, but that is mostly due to large improvements in synthesis, and small incremental improvements in the automatic placement (which, BTW, still does a dismal job of placing non-LUT resources, placing data paths, and placing logic that has multiple levels of LUTs between FFs). In the meantime, the routing algorithms have gotten lazy, apparently in the interest of speeding up run times. For designs with poor placement, the effects of poor routing are not as apparent as they are for well-placed (e.g., carefully floorplanned) designs. For my floorplanned designs, I have seen a steady erosion in the max speeds attainable by the tools with each new release since 4.1.

One of the biggest steps backward came from eliminating delay-based clean-up (IIRC that happened in 5.2). The result is that the tools just stop as soon as all routes meet timing, so every route in the design is potentially a critical route. Routes to nearby destinations often take circuitous paths that congest the routing resources and unnecessarily drive the power dissipation up considerably. With the current emphasis on power dissipation, I would think the Xilinx team would be looking at reinstating delay-based clean-up. Based on my empirical observations, that could pick up a 15-20% improvement in power dissipation for designs that are clocked in the upper half of the performance envelope.
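
Conceptually the clean-up pass is simple: once timing closes, revisit every net that has slack and trade that slack for a shorter, lower-capacitance route. A sketch, with the router, delay model, and slack calculation abstracted behind hypothetical callables:

    def delay_cleanup(routes, shortest_route, route_delay, slack):
        # routes: net name -> list of routing segments currently used
        for net, old in routes.items():
            if slack(net) <= 0:
                continue                    # critical net: leave it alone
            new = shortest_route(net)       # direct, low-capacitance path
            if route_delay(new) <= route_delay(old) + slack(net):
                routes[net] = new           # shorter wire, timing still met
        return routes

    # toy usage: a non-critical net on a 5-segment detour with 2 ns slack
    print(delay_cleanup(
        {"n1": ["s1", "s2", "s3", "s4", "s5"]},
        shortest_route=lambda net: ["s1", "s5"],
        route_delay=lambda segs: 0.5 * len(segs),   # 0.5 ns per segment
        slack=lambda net: 2.0,
    ))   # {'n1': ['s1', 's5']}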

Reply to
Ray Andraka

The map -timing work is still done ONLY in the map phase, WITHOUT iterating back and forth with place & route. The attempt is made to group related logic together to get "tighter" logic, but this little user-selectable switch doesn't make up for the rest of the problems with a mapper that is disjoint from P&R.

Reply to
John_H

Hmm... it seems the real world is rather removed from that of Xilinx employees - engineers should know better than to believe their own marketing fluff?

Peter, what Ray suggests sounds very sensible, and not what I'd call "a very unconventional way".

Austin, perhaps if you used engineering measurements for SW results, rather than words like "wizards" and "magic", then the SW might have a chance to really improve with each release?


Yikes! One wonders how _CAN_ SW make a carefully floorplanned design go backwards? By how much?

Is the lazy routing so bad that it actually finds a longer path than earlier SW did?

I did wonder how Altera suddenly found power savings in SOFTWARE - perhaps they now do exactly this, and clean up messy but timing-legal routes? Anyone at Altera care to comment?

PCB routing software has for years used cleanup and optimise passes, and only rarely (these days) goes outside a bounding rectangle on paths.

PCB routers also routinely route the 'most obvious' traces first, and so are very unlikely to go backwards on carefully floorplanned designs. They also allow net priority, where important nets can get first bite at resources.

Q: Does Xilinx SW scan floorplanned areas and tag those nets as (probably) important, so they get first-bite privileges in routing?

Given the enormous investment the companies claim, these field results seem rather abysmal - it seems the HW is carrying the SW?

Still, it does seem there is indeed a lot of 'fat' in Place & Route SW, so we can expect further 'double digit improvement' claims.... :)

-jg

Reply to
Jim Granville

Ray,

Some comments,

Austin

-snip-

As I said, there is a lot of room for improvement here. You are assuming that the hierarchy is well done, and that working on each piece alone will give better results. I just don't know if that is true. A good area for work, I would agree.

I have to differ here. I understand your issues, but when we deal with the ever-expanding "standard suite" of test designs with better performance and better run time, I have to assert that it is better. Is everything better? Of course not.

I happen to agree with you here; my personal opinion is that the tools should allow you to choose to put in the extra effort to find the best paths, and not stop as soon as the aggregate requirements are met (or stop and give up if they can't be met). I think you will appreciate that what was done did provide a much faster time through the tools.

We do make the parts bigger every generation, and as you may have noticed, processor power is not keeping up anymore.

Reply to
Austin Lesea

Jim,

Some comments,

Austin

-snip-

The software group has a very rigorous quality-of-results metrics (measurement) system for evaluating their work. I get to use the superlatives; they do not.

We still beat them on power; ask your FAE for the presentation. They took a really lousy situation and made it just plain lousy. We still have a 1 to 5 watt advantage AFTER they run their power clean-up.

Rather, the software is now (often) carrying the hardware. It is very hard to get the latest technology to be any faster than the previous one without architecture and software. If the software buys a speed grade, that is all the customer cares about. The silicon gets less expensive with the shrink to the next generation. Who cares if the performance came from software, hardware, or both?

I agree. Until the tools do a better job than a room full of experts, the tools are just not 'there'. It reminds me of compilers for high-level languages many years ago: there was a time I could write assembly code that was faster, better, and smaller than any compiled high-level language (anyone recall PL/M-86?). Then after a while the compilers got better and better, until finally I had to agree that all that work was not worth it: often the compiler yielded a better solution than my hand-written assembly code.

Reply to
Austin Lesea

Enough that a design which passed timing with the earlier tools will not pass timing with the newer tools, no matter what you do short of hand routing it. About a 10% loss in performance, on average, in each major revision. There was a huge hit going to 5.2. 7.1 seems to have a much smaller degradation from 6.3.

Yes, the routing got lazy, so that it actually finds a longer path than it did with earlier software. Quite often it will not find the direct connection to a neighboring cell, and instead routes the net all over the place, which adds delay, increases power consumption, and congests the routing resources so that other nets also get circuitous routes and the overall timing is degraded even further.

From what I understand, Altera is moving toward more delay-based clean-up. Xilinx has moved away from it, and is instead pursuing capacitance-based clean-up to reduce power... which not only may miss the mark, but also requires toggle-rate information for each net.
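
The toggle-rate requirement falls straight out of the dynamic power equation P = a * C * V^2 * f (conventions vary on where the factor of 1/2 goes; here it is folded into the activity factor a). Illustrative numbers only, not device data:

    def net_power_mw(c_pf, toggle=0.25, vdd=1.2, f_mhz=200.0):
        # dynamic power P = a * C * V^2 * f, converted to milliwatts
        return toggle * (c_pf * 1e-12) * vdd**2 * (f_mhz * 1e6) * 1e3

    # a circuitous 2 pF route vs a direct 0.5 pF one, per net:
    saved = net_power_mw(2.0) - net_power_mw(0.5)
    print(round(saved, 3), "mW")   # ~0.108 mW; thousands of nets add up

Without per-net toggle rates, the tool has no way to know which high-capacitance routes are actually burning power.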

Reply to
Ray Andraka

Austin,

From what I have seen, folks who use hierarchy generally do a decent job of it. You really have to work hard to make a hierarchical design worse than a flat design. Hierarchy puts organization into the design, and because crossing levels of hierarchy is a little bit painful, it forces the designer to think in terms of components and to group related logic together. Even in a poor example of hierarchy there is at least a little bit of grouping done, and therefore information the tools can use. I and others have been asking Xilinx for hierarchical tools for close to 15 years. I honestly don't think Xilinx understands why using hierarchy is a good thing.

Austin> I have to differ here. I understand your issues, but if we deal with

Fine, but improvements shouldn't break existing designs. Nearly every one of my designs over the past 5 years gets better results with the tool version that was current at the start of the project than it does with later versions. I could accept a low rate of recidivism, but close to 100% is criminal. I know I am not the only "power user" running into this; in fact it regularly comes up as a subject here at each major release of the tools.

Ummm, well, no. The tools give you a faster time to completion for one run through the tools, but that doesn't help if the design does not meet the timing requirements. It actually takes longer to complete a design, because you need to iterate on the place and route much more than when there was a predictable routing solution for a good placement. Faster completion in the tools does not equal faster time to get the design done.

Yup, and hierarchy can help you tremendously here. Routing complexity (and therefore the effort needed) goes up with roughly the square of the design size measured in LUTs, primitives, cells, etc. By breaking a size-k problem into N hierarchical sub-assemblies you get N smaller problems of size k/N, so the total effort scales as N*(k/N)^2 = k^2/N rather than k^2 - a 10-way split cuts the effort roughly tenfold.

Reply to
Ray Andraka
