Why is router software not multi-threaded?

Excuse my ignorance if there is an obvious answer ...

Is it true that the routing algorithms used in Altera's and Xilinx' tools are single-threaded? If so, what is the primary difficulty with making them multi-threaded? Is the algorithm particularly difficult to partition into parallel units? Or has it simply not been a priority for companies making the tools?

Now that some of the projects I am working on are taking quite a while to fit, I'm thinking it would be nice to parcel out a fit across five machines sometime in the future, when/if the software supports it.

-- Pete

Reply to
Peter Sommerfeld

Look at the PAR section of dev.pdf (Developers Reference Guide) in your Xilinx doc directory. The -m option of par lets you specify a list of workstations to farm place-and-route runs out to. I used this three years ago in an environment of Solaris workstations; the effort needed to get it working was minimal.

This does not mean that par is multi-threaded. It just means that you can run N different place-and-route jobs, each from a different placement starting point (cost table entry), across multiple machines.
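For what it's worth, here is a minimal sketch in Python of that multi-seed strategy: launch N independent place-and-route runs, each with a different cost-table seed, spread over a pool of workstations via ssh. The host names and the exact par command line below are placeholders, not something to copy verbatim (check the documentation for your tool version); the point is just that the tool itself stays single-threaded and you simply run N copies of it.

```python
# Sketch only: farm out N independent place-and-route runs, each with a
# different cost-table seed, to a list of workstations over ssh.  The host
# names and the exact `par` command line are placeholders -- check the
# documentation for your tool version for the real options.
import subprocess
from concurrent.futures import ThreadPoolExecutor

HOSTS = ["ws1", "ws2", "ws3", "ws4", "ws5"]        # machines you can ssh into
SEEDS = range(1, 11)                               # cost table entries to try

def run_par(job):
    host, seed = job
    # Hypothetical command line; substitute your real design/output files.
    cmd = f"par -t {seed} design.ncd routed_{seed}.ncd design.pcf"
    return seed, subprocess.run(["ssh", host, cmd]).returncode

jobs = [(HOSTS[i % len(HOSTS)], seed) for i, seed in enumerate(SEEDS)]
with ThreadPoolExecutor(max_workers=len(HOSTS)) as pool:
    for seed, rc in pool.map(run_par, jobs):
        print(f"seed {seed}: {'ok' if rc == 0 else 'failed'}")
```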

--
Regards,
John Retta
Owner and Designer, Retta Technical Consulting Inc.

email : snipped-for-privacy@rtc-inc.com

Reply to
John Retta

This would be nice... Xilinx willing to comment?

Reply to
fabbl

I think that, almost by definition, a routing task is a single-threaded process, although there may be some ways to use multitasking. I believe a routing problem is what computer scientists call NP-complete. It is a process of exploring a tree (a very large tree) to find the shortest path to a leaf (which represents a completed route). The only algorithm I know of that can find the best path without exploring the entire tree is to prune branches once you decide they are already longer than the shortest path you have found so far. So at any given node there are multiple paths that could be followed, which could be partitioned out as multiple tasks, but if this is done at a low level each task will be very short-lived. Even near the top levels, any given subtask could reach a dead end very quickly.
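As a toy illustration of the pruning scheme described above (in Python, and nothing to do with the vendors' actual routers): a branch-and-bound search over a small weighted graph that abandons any partial path as soon as it is already no shorter than the best complete path found so far.

```python
# Toy branch-and-bound: find the shortest path from 'src' to 'dst' in a
# weighted graph, pruning any partial path whose length already meets or
# exceeds the best complete path seen so far.
def shortest_path(graph, src, dst):
    best_len = float("inf")
    best_path = None

    def explore(node, path, length):
        nonlocal best_len, best_path
        if length >= best_len:          # prune: can't beat the incumbent
            return
        if node == dst:                 # reached a leaf (a completed route)
            best_len, best_path = length, path
            return
        for nxt, w in graph.get(node, []):
            if nxt not in path:         # avoid cycles
                explore(nxt, path + [nxt], length + w)

    explore(src, [src], 0)
    return best_path, best_len

# Example: several routes from A to D; pruning cuts off the long ones early.
g = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)]}
print(shortest_path(g, "A", "D"))       # (['A', 'B', 'C', 'D'], 3)
```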

In the end, no one ever finds the best path. Algorithms are used that try the most likely *good* paths and hope for the best.

I expect the work involved in splitting off subtasks would be significant compared to the basic function of searching the tree. But certainly this is worth a look unless it has already been determined that the overhead is too large.

If you are just trying to get a solution, then multiprocessing would be a benefit. But for most people who are looking to improve their fit, multiple runs work pretty well.

Reply to
Ralph Malph

This feature (-m) is called the Turns Engine and will be available under Linux beginning with version 7.1i. No multi-threaded support yet, but it's being considered for a future release.

Regards, Bret

Reply to
Bret Wade

(reverse order -- information first, pseudo-rant second :-))

Quartus has a mode known as Fast Fit. This turns down the fitter effort levels and disables some of the fancier but expensive algorithms. The result is a 50% faster compile at the expense of about 10% performance. If you're hitting your performance target, you might as well enable this option.

You can also parallelize multiple compiles across multiple processors (trying different options, etc.). This doesn't help with your big design's compile time unless you break it up into pieces and use the modular design features such as LogicLock to compile each piece and then recombine them in a final compile. But this would likely lose you some performance too...

Parallelizing a routing (or placement) algorithm is very difficult. Routing different nets in different threads/processors may seem like a natural division, except that the routing of one net is affected by the routing of another. You could partition the nets of the device into groups that are unlikely to interact with one another, route those groups in parallel, and after each iteration repartition and try to reconcile conflicts (two nets using the same wire) on the next pass. But I imagine you would rapidly hit the situation where, after the first few iterations, everything left to resolve is interconnected in some way and you can no longer parallelize...
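For illustration only, here is a very stripped-down sketch of that partition-and-reconcile idea -- a toy grid, BFS maze routing, and Python threads standing in for separate processors; it is not how any vendor's router actually works. Nets are routed in parallel against a snapshot of the committed wiring, any wire claimed by more than one net causes the losing nets to be ripped up, and those nets are retried on the next pass.

```python
# Toy partition-and-reconcile router.  "Wires" are edges of a small grid;
# nets are routed by BFS against a snapshot of the committed wiring, and
# nets whose proposed routes collide on a wire are retried next pass.
# Threads stand in here for separate processors/machines.
from collections import deque
from concurrent.futures import ThreadPoolExecutor

W = H = 8                                          # toy routing grid

def neighbors(p):
    x, y = p
    for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= n[0] < W and 0 <= n[1] < H:
            yield n

def wire(a, b):                                    # undirected edge id
    return (min(a, b), max(a, b))

def route(net, blocked):
    """BFS from source to sink, avoiding wires committed to other nets."""
    src, dst = net
    prev, frontier = {src: None}, deque([src])
    while frontier:
        p = frontier.popleft()
        if p == dst:                               # trace the route back
            path, q = set(), p
            while prev[q] is not None:
                path.add(wire(q, prev[q]))
                q = prev[q]
            return net, path
        for n in neighbors(p):
            if n not in prev and wire(p, n) not in blocked:
                prev[n] = p
                frontier.append(n)
    return net, None                               # unroutable this pass

nets = [((0, 0), (7, 7)), ((0, 7), (7, 0)), ((3, 0), (3, 7)), ((0, 3), (7, 3))]
committed, pending = {}, list(nets)

for sweep in range(10):                            # cap the rip-up/retry passes
    if not pending:
        break
    blocked = set().union(*committed.values())
    with ThreadPoolExecutor() as pool:             # route pending nets "in parallel"
        results = list(pool.map(lambda n: route(n, blocked), pending))
    usage, pending = {}, []
    for net, wires in results:
        if wires is None or any(w in usage for w in wires):
            pending.append(net)                    # conflict: rip up, retry next pass
        else:
            usage.update(dict.fromkeys(wires, net))
            committed[net] = wires
    print(f"pass {sweep}: {len(committed)}/{len(nets)} nets committed")
```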

On top of this, we'd quickly hit Amdahl's law -- if you parallelize routing across N processors, run time will quickly be dominated by some other piece of P&R, so the compile will not get faster by a factor of N. If you look at everything done during a compile -- synthesis, tech-mapping, clustering, placement, routing, delay annotation/computation, timing analysis at various steps, etc. -- that's a lot of very different algorithms that would all need to be parallelized in order to realize a significant gain in compile time.
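To put rough numbers on the Amdahl's law point (the 50%-routing figure below is made up for illustration, not an Altera measurement): if routing were half of the total compile time and only that half were parallelized across N processors, the overall speedup saturates quickly.

```python
# Amdahl's law: overall speedup when only a fraction p of the run time is
# parallelized across n processors.  The 50% routing share is illustrative.
def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for n in (2, 4, 8, 16):
    print(f"{n:2d} processors: {speedup(0.5, n):.2f}x overall")
# Even with 16 processors the whole compile gets less than 2x faster
# if the other half (synthesis, timing analysis, etc.) stays serial.
```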

In addition, the approximations/limitations we would need to impose on the algorithms in order to parallelize them may very well result in a reduction in achieved performance and fitting. Let's say parallel place & route achieves 10% worse performance for a 60% run-time gain. Would that be worth it? Not if you can instead enable "Fast Fit" mode in serial place & route, try a little less hard, and lose 10% performance for a 50% gain...

There is a fair bit of academic literature on the subject. I can't recall having seen any implementations of parallel place & route that competed well with the best academic serial place & route tools, but then again I haven't looked that hard.

Regards,

Paul Leventis Altera Corp.

Reply to
Paul Leventis (at home)
