I've been reading papers about routing in island-style FPGAs. Most cite Xilinx architectures, though I've looked at a few papers about H-tree networks. Often, there is a simplified model being used. There is mention (kind of dated) that commercial routers use derivatives of maze routing, with some more recent mention of channel routing. Is there some papers that can give a good idea of how the real industry software does global and detailed routing, what algorithms are actually used? What is the typical lag time between the advent of certain approaches in conference/journal papers versus uptake in commercial routers? I'm kind of curious how much I can should trust the papers as an indication of actual practice. As well, I am still in rummaging mode, and have yet to rummage into a paper that shows how the switches in the switch boxes are actually explored to get detailed routing, given a non-full crossbar. I've looked at
Wu &Tsukiyama et al.: Graph analysis of 2D FPGA routing
but I'm hoping to rummage into something more applied.
Thanks.
Fred
--
Fred Ma
Dept. of Electronics, Carleton University
These days, the best academic routers do combined global + detailed routing using negototiated congestion. The base algorithm employed is known as Pathfinder -- see "Placement and Routing Tools for the Triptych FPGA" by C. Ebeling, L. McMurchie, S.A. Hauck, and S. Burns. A widely used tool is VPR (by Vaughn Betz), which performs clustering, followed by placement (via simulated annealing), and then timing-driven negotiated-congestion based routing. It uses an improved form of the Pathfinder algorithm, but what exact improvements it adds I forget, since VPR was commercialized and I no longer remember what was in the original academic version! I *think* that VPR (or some improved version of it) is still the best academic router, quickly providing small channel widths and good timing.
There is a book "Architecture and CAD for Deep-Submicron FPGAs" by Betz, Rose, and Marquardt that covers the algortihms employed in VPR and T-VPACK (a timing-driven clusterer by Marquardt). It also covers the architecture of island-style FPGAs and various experiments on the detailed routing design. And there are a slew of papers at FPGA, ICCAD, FPL and other conferences on FPGA routing. Go to
formatting link
to see information on the "purple book" plus a list of papers on VPR by Vaughn -- the refernces in these papers should serve as good starting point.
I did in fact encounter Vaughn Betz's papers on VPR. There is pretty intuitive description of the VPR, and Pathfinder is described as an iterative application of shortest path. I'm still gnawing breadth-first on promising references from web-found papers. I'll look more closely at Pathfinder to see if it details how the switchbox settings are determined. Thanks for the pointer to Vaughn's book.
If anyone can comment on how relatively wide-spring are the various algorithms, both in academia and industry, that would be helpful. Papers often reference other papers, but don't actually indicate which algorithms are used by which commercial tools, and how prevalent are the various commercial tools. Vaughn's website says Right Track is now part of Altera, so maybe Altera's own tools may start using ideas from VPR. I have yet to come across information about what was the algorithm prior to this. What about Xilinx? Is it the case that they don't like to disclose the actual inner workings of their native tools? How common is it to use 3rd party tools for place-and-route rather than those from the vendor? In general, information about which commercial tool (vendors) use which algorithms seems much rarer than just the algorithms presented in isolation.
Fred
--
Fred Ma
Dept. of Electronics, Carleton University
I'm not quite sure what you mean by this. If you're referring to how the topology of the switchbox is determined (an architectural decision), there are some other papers I can point you to.
From a routing algorithm perspective, it is really quite simple. You represent the entire chip as a graph where each node represents a routing resource -- a block input, block output, or routing wire. Directed edges are placed between these nodes to represent the presence of a (programmable) connection from one resource to another. So a swich box that is not a complete cross-bar is encoded in this graph as edges. For each net, the router starts at the source block output, traverses the edges from that node, and assigns a desirability/cost to each of the nodes seen and places them in a heap. It then removes the best node from the heap and repeats from there until it hits the desired destination (a block input). This is known as an a-star or best-first traversal, and clearly all the work/intelligence lies in how you cost the nodes (including a prediction of future cost), how you guide the router to correct congestion, etc. For a multi-terminal net, the routing thus far is placed on the heap and the router expands from there. There are optimizations that can be done to reduce the amount of re-expansion of the wavefront, and other techniques for quickly routing high-fanout nets.
Global routing (not worrying about precise wires and switch box settings) is done using the same algorithm, except that rather than nodes representing one wire, you represent a group of wires with a node of capacity = channel width. A well-written global + detailed router does not require a global route and performs fast enough that there really isn't any point to look at global routing (in FPGAs) anymore.
We don't like to tell each other what we do in our tools for obvious reasons :-) VPR served as the basis for Right Track CAD's work, and Right Track provided Altera with enhanced place & route results for their FLEX10K products in MaxPlus II. What the algorithms were before or what they've become since is not a matter of public record.
Do any placement algorithms try to make regular structures? For datapaths, for example? This is how a human would do it, and might give excellent results in some cases.
Although, you certainly want to architect FPGAs with enough routing resources so that regular structures are not needed.
--
/* jhallen@world.std.com (192.74.137.5) */ /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
I sort of got lost with the heap, but don't worry, I was just getting a rough idea. If needed, I will look up A* (I have papers on it). The representation for wires/switches above matches that in Pathfinder, and I noticed that Betz's VPR also has tricks to cut down some work in restarting the wave front for multiterminal nets. The use of arcs to represent switches seems to get rid of the division between detailed and global routing.
Yes, I was noticing that in recent papers.
Aaawwww. OK. Thanks. I understanding why there is a paucity of such information.
Fred
--
Fred Ma
Dept. of Electronics, Carleton University
Joseph, I'm no expert in the area, but seem to recall seeing this. Look up the authors of C compilers for Garp, I think one of them works on a fast linear placement algorithm. It might be from arraying bit slices. I expect the granularity of one's slice to be highly dependent on the target platforms array architecture.
That's the constant balance, it seems. Deciding ow to focused to make your application domain, which determines the suite of applications (and kinds of operations) you need to support, which helps you avoid excessive flexibility in the array architecture. It's pretty vague, but it seems to be the general motivation for reconfigurable logic.
Fred
--
Fred Ma
Dept. of Electronics, Carleton University
ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here.
All logos and trade names are the property of their respective owners.