why such fast placement?

I've recently read some articles stating that a really fast place and route algorithm for fpgas would be a good thing. In this case, "really fast" means sub-second or even sub-millisecond. For what kind of applications would one need such fast "compile" times? Once one has done the compiling, how fast can one program an fpga anyway?

Reply to
Michael Hennebry

I suppose that if you can p&r really fast, the tool can try lots and lots of different placements to see which is best, giving better results more of the time.

Reply to
Mike Harrison

Compile time is a completely different issue. In using an FPGA as a processor (for netlists), the practice of dynamic linking becomes just as useful for FPGAs as it is for processors. Consider that your operating system is loaded with hundreds or thousands of .dll and .so objects for libraries and small modules. It's actually more useful, as the amount of "memory" (AKA LUTs) in an FPGA is smaller, a LOT smaller, so the practice of swapping/paging in smaller netlist segments will be a necessary tool to avoid completely reloading an entire FPGA image in the form of a fresh bitstream.
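To make the analogy concrete, here's a minimal sketch in plain C. The software half uses the real dlopen()/dlsym() calls; the FPGA half appears only as hypothetical pseudo-API in comments (fpga_load_netlist() and friends are made-up names, not any vendor's actual interface), just to show where a sub-second place and route would have to sit in a load-and-go flow.

/* Software side of the analogy: map a shared object at run time
 * and bind one symbol, the way ld.so does for every program.
 * Build with:  cc demo.c -ldl   (POSIX systems) */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *so = dlopen("libm.so.6", RTLD_LAZY);
    if (!so) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    double (*cosine)(double) = (double (*)(double))dlsym(so, "cos");
    printf("cos(0) = %f\n", cosine(0.0));
    dlclose(so);

    /* FPGA analogue -- hypothetical calls, shown only as comments:
     *
     *   fpga_module *m = fpga_load_netlist("cos_core.net");
     *   fpga_region *r = fpga_alloc_region(dev, m->luts_needed);
     *   fpga_place_and_route(m, r);     // this is the sub-second part
     *   fpga_configure_region(dev, r);  // partial bitstream, a few columns
     *
     * i.e. swap in a small netlist segment on demand instead of
     * regenerating and reloading the whole FPGA image. */
    return 0;
}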

How many times do you compile a program? ... How many times do you execute/dynamically link an object to run it? Link and go times are much more important than compile times, except maybe for very large programs that a programmer is debugging.

Loading an entire bit stream is very time intensive ... loading a few columns is a LOT faster.

Reply to
fpga_toys

A VERY GOOD P&R is necessary to optimize a hardware design.

A VERY FAST P&R that is pretty good, is necessary to use an FPGA as a netlist processor.

Reply to
fpga_toys

Look in the data sheets. There will be a max clock rate and a table of number-of-config bits vs chip size.
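For a rough sense of the numbers, the arithmetic is just bitstream size divided by (configuration clock times bus width). The figures below are illustrative placeholders, not quoted from any particular data sheet:

/* Back-of-the-envelope configuration time from data sheet numbers.
 * The constants are examples only, not taken from a real device. */
#include <stdio.h>

int main(void)
{
    double config_bits   = 21e6;   /* e.g. ~21 Mbit full bitstream      */
    double config_clk_hz = 50e6;   /* e.g. 50 MHz configuration clock   */
    double bus_width     = 8.0;    /* byte-wide (SelectMAP-style) load  */

    double full_ms = config_bits / (config_clk_hz * bus_width) * 1e3;
    printf("full bitstream: %.1f ms\n", full_ms);

    /* Partial reconfiguration of a few columns is proportionally
     * faster, which is the point of "loading a few columns". */
    double partial_fraction = 0.02;   /* reload 2% of the frames */
    printf("2%% of the frames: %.2f ms\n", full_ms * partial_fraction);
    return 0;
}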


Reply to
Hal Murray

If one has an fpga set up to do interesting things with cosines and wants to move on to doing those same interesting things with hyperbolic cotangents, one can declare the cosine LUTs unassigned, then place and route LUTs for the hyperbolic cotangent, along with their connections to the rest of the fpga, leaving the already assigned LUTs with their previous assignment. This would have to happen automatically if the decision to go from cosine to hyperbolic cotangent happened automatically. It might not be preplannable if one does not necessarily know what other sets of canned functions are currently connected. This is an example of why one might want sub-millisecond place and route times.
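A rough sketch of that incremental idea, with made-up data structures: only the retired function's LUT sites are freed and re-placed, everything already assigned stays put. This is an illustration, not a real placer or router.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct { bool assigned; } lut_site;
typedef struct { lut_site *sites; size_t n; } fabric;

/* Free the sites currently holding the retired function (cosine). */
static void release_function(fabric *f, const size_t *luts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        f->sites[luts[i]].assigned = false;
}

/* Greedy placement of the replacement function (coth) into free
 * sites.  A real tool must also route the new nets, which is the
 * part that has to finish in well under a millisecond here. */
static size_t place_function(fabric *f, size_t needed, size_t *out)
{
    size_t placed = 0;
    for (size_t i = 0; i < f->n && placed < needed; i++)
        if (!f->sites[i].assigned) {
            f->sites[i].assigned = true;
            out[placed++] = i;
        }
    return placed;   /* caller checks placed == needed */
}

int main(void)
{
    lut_site pool[64] = {0};
    fabric f = { pool, 64 };
    size_t cos_luts[8], coth_luts[8];

    place_function(&f, 8, cos_luts);      /* cosine occupies 8 LUTs   */
    release_function(&f, cos_luts, 8);    /* decide to switch to coth */
    size_t got = place_function(&f, 8, coth_luts);
    printf("placed %zu coth LUTs into freed sites\n", got);
    return 0;
}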

My example above corresponds to a link and go.

There are tools that will allow loading a few columns.

Reply to
Michael Hennebry

Yep ... dynamic linking/loading of library modules is both powerful and important for fpga computers, just as it has been for traditional computers. Dynamic linking for an fpga is essentially a very fast place and route directly to the fpga. That is NOT today's ISE PAR and bitgen tools.

There are tools that do it, but not with officially documented interfaces, independent of par/bitgen, that open source developers can use. Some of these tools, like the JHDL place and route functions which do a load and go, as well as the ADB wire database and router that was integrated into JHDLBits, are meant to be open source and were done in cooperation with Xilinx, but they mix NDA access and publicly documented access to the technology.

In theory those interfaces can be reverse engineered independently of the Xilinx EULA/NDA and just used, but there is a grey area: in doing so while also using the Xilinx tools, the open source developers need to avoid accidentally violating the EULA/NDA terms. There is plenty of room for Xilinx to litigate to force the open source teams to abandon that access, unless Xilinx just documents the interfaces used by JHDL and JHDLBits (and other similar projects like VPR for Virtex). It might mean getting a commitment from the FSF, or another open source advocate, to back any possible litigation so an open source team can freely document the interfaces used by JHDL, JHDLBits, and similar projects in spite of Xilinx's willingness to litigate.

When you look at where the sales are for Xilinx over the last 5 years, reconfigurable computing isn't even on the map. People have been talking about it for more than a decade, and it just hasn't been a serious market niche. I believe there is a huge chicken and egg problem, which revolves around not having tool chains that make it work as easily as programming traditional ISA machines. No market, no good tools. No good tools, no market. Doing FpgaC with a good load and go back end that supports dynamic linking, and then building FpgaC-centric libraries with established APIs, is a first step in breaking the chicken and egg problem.

It's conceivable today to build high end FPGA machines which can break 10 petaflops using tools which produce highly parallel approaches for certain major simulation applications. It's actually not even that hard. The hard part is not having tool chains which can factor the computationally intense kernels into both netlists and compact state machines or traditional ISA code, and distribute the application across a few tens of thousands of very large FPGAs and a few thousand state machines or cpu cores. Preferably for some parallel dialects of Fortran, as well as C. I've actually pitched this several times last year as a straw man just to get the reaction and a list of problems with building machines of this class. The SEU problem is high on the list, but easily manageable for XC4V's with some novel packaging. The bleeding edge problem, i.e. nobody has done it, is certainly a problem for conservative data center types, along with some NIH issues. The biggest problem is the lack of tools today which can scale large application netlists across 2,000-30,000 FPGAs to load and go, while handling the implicit communications needs, which are tightly coupled to the architecture.

For certain supercomputer sites, I believe the core work for their applications could be knocked off in as little as 10 man-years, over less than a calendar year, using some very strict KISS design and a well focused development team. It would take another couple of years for the facility to completely mature by porting a traditional supercomputer software infrastructure to the hybrid architecture, and doing some extensive training about how to best use this resource. As supercomputer site projects go, it's a relatively small project. Since there are no huge FPGA machines, and no tools today, it's just an intellectual exercise to second-guess what you can really do.

Dreaming aside, it would be relatively straightforward today to write those math functions needed in any of the fpga C tools (Handel-C, Impulse C, or FpgaC) as traditional functions. There are existing ISA C functions for them that would only take a few days to port to FPGAs. I have a road map for FpgaC to bring floating point support into the compiler and target runtime environment, making it math friendly and efficient by mid to late summer anyway. Besides the pipelining and process features in the Feb 2006 beta-2 for FpgaC (nearly ready for release), there are a few other FpgaC special features we'll be adding for the next two betas, to better make FpgaC a good host language for math intensive highly parallel applications. Most of that can be done easily today, without direct compiler/linker support.
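As an example of the kind of traditional, integer-only C function that ports naturally to these tools, here is a table-driven cosine in Q15 fixed point. The table size and scaling are arbitrary illustrative choices, not anything taken from FpgaC, Handel-C, or Impulse C.

/* Table-driven cosine in Q15 fixed point.  On a host this builds
 * the table with libm; inside an FPGA the table would simply
 * become a ROM/LUT.  Build with:  cc cosdemo.c -lm */
#include <stdio.h>
#include <math.h>

#define TABLE_BITS 8
#define TABLE_SIZE (1 << TABLE_BITS)       /* 256-entry full wave */

static short cos_table[TABLE_SIZE];

/* Host-side table build (illustrative sizes, not a vendor spec). */
static void build_table(void)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        cos_table[i] = (short)lrint(32767.0 *
                        cos(2.0 * M_PI * i / TABLE_SIZE));
}

/* phase: 16-bit unsigned, full circle = 65536; result: Q15 cosine. */
static short fixed_cos(unsigned short phase)
{
    return cos_table[phase >> (16 - TABLE_BITS)];
}

int main(void)
{
    build_table();
    printf("cos(0)  ~ %d\n", fixed_cos(0));       /* ~ 32767  */
    printf("cos(pi) ~ %d\n", fixed_cos(32768));   /* ~ -32767 */
    return 0;
}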

Reply to
fpga_toys
