high level languages for synthesis

True to some extent, but the argument doesn't scale; otherwise it would also be true for programming without software tools at all, in 1's and 0's. So language, and its level of functional abstraction, does make a difference - and a large one.

Mostly because of the complexity/cost to build new library parts.

Reply to
fpga_toys

I agree that every line of code is an opportunity for design error, but in my opinion the various languages being bandied about for doing hardware design can be used equally to create succinct code that describes the functionality concisely, or misused to describe the functionality 'bit at a time' as you say (I'm interpreting that to mean bloated: lots of code that could've been written more concisely).

Whether you get the concise code or the bloated code for a particular hardware design is, I've found, simply a function of:

- The skill level of the person with the hardware design language that they are using.

- The skill level of the person in hardware design.

I could be wrong, but I haven't seen anything to indicate that the actual language used itself is an important factor.

KJ

Reply to
KJ

There seems to be little attempt here to categorise the tools themselves. Are we talking solely about cycle-accurate HLLs or not? The differences between a tool like Handel-C and a tool like Impulse-C are enormous. I would argue that the non-cycle-accurate high-level languages are more distinct from their cycle-accurate cousins than either is from HDLs.

The cycle-accurate languages seem to suit a spiral methodology that begins with high levels of abstraction. Functional models precede a hardware-software partition, and the fine details are filled in gradually, with functional testing at every stage. At the end of the design process you are almost as close to the hardware as you would have been had you designed in VHDL in the first place, still having to understand what's happening on every clock cycle. Handel-C and SystemC fit into this category. The advantage of these languages is that you are working at the system level from the outset. It's not so straightforward to use a spiral design methodology with VHDL, which is much better suited to building and testing all your components fully before you bring them together. That's a nasty time to find that there are problems with the system-level design, as the components themselves may need to be redesigned. These languages make sense for large projects where timescales are tight, budgets are large and you want to mitigate the risks of design respins. See Jonathan Feifarak's paper at last year's MAPLD conference for an example:

formatting link

Non-cycle-accurate languages are targeted at an entirely different crowd, and are more suited to general-purpose reconfigurable computing. Examples of these languages are Impulse-C, SRC's Carte Programming Environment, Mitrionics' MitrionC, and Nallatech's DIME-C. They get their performance speed-ups from a mixture of spatial and temporal parallelism.

These languages are aimed at users who are not necessarily familiar with FPGA development, or even low-level programming. High-Performance Computing users are those who have the most to gain from general-purpose reconfigurable computing. Paradoxically these users, who have the greatest need of high performance, are often not the power users one would expect, because they are also application-domain specialists. They may be used to writing C, but as an alternative to FORTRAN and not as an alternative to assembly language. To expect such people to take quickly to VHDL is to misunderstand who these users are. These people do not want to design PCI cores; they want to compare genetic sequences, run CFD simulations and implement a wide variety of other scientific algorithms. These algorithms exist today in C and FORTRAN, and people are now looking to port them to FPGAs. The tools do not make it possible to simply compile pre-existing C applications and receive a 500X speed-up; instead they offer an environment in which all the nasty complexities of FPGA design are abstracted away. Users can then adapt their algorithms to best suit the hardware compiler. No clock periods, no PCI cores to design, no memory controllers to worry about, no pins to worry about. The tool only makes sense as part of an integrated platform that makes all of this possible. Many of these tools are now beginning to lean towards a library-based development process, where the most frequently used functions are implemented, ideally as pipelined cores, in a traditional HDL flow and then integrated into the tool.

This kind of development also has a place within High-Performance Embedded Computing. Your application may interface to sensors, actuators and the like, but in the world of the billion-plus-transistor FPGA some seriously complicated algorithms can be implemented. At that point it makes sense to develop the external interfaces in a traditional HDL manner, but to develop the main algorithm in a high-level language. This allows for a lot more experimentation with the main algorithm. You don't give up half as much performance as you might think in doing things this way, and you can react to market opportunities a lot more quickly. Once you've settled on the structure of your algorithm, you have the option of recreating that architecture in HDL for a performance increase. You could either do this before you release the first version of your design to users, or you could get to market first with the HLL version and follow it up later with the optimised version.

A final point is that we really need to nail down this fallacy that an FPGA-targeting C-syntax HLL is somehow inherently sequential. It's not. The compiler determines that. FPGAs are best for algorithms with large data sets. They offer the best performance when the algorithms (or their main computational loops) are pipelinable. For-loops should be pipelined as a matter of course wherever possible. Below is a quick example of what I mean. It's a DIME-C design that implements the probability density function. The array sizes are 8192 here, as I've implemented the memory in BRAM, though I could have had these arrays packed into SRAMs, so they could be a lot bigger. The entire for loop is pipelined, so the whole thing will take N + latency cycles to execute. In this case the latency is about 85 cycles. I would expect a clock rate of 120-150MHz on V4 for this, but I haven't actually built it.

It took me 15 minutes to do this, and I'm a real slowcoach. I wonder how long it would take me to do with VHDL...

    /* Project to implement probability density function on V4 device
       Robin Bruce 28/08/06 */

    #include "math.h"

    #define SIZE     8192
    #define PI       3.1415926535897
    #define ROOT_2PI 2.506628274

    void probability_density(float x[SIZE], float mu[SIZE],
                             float sigma[SIZE], float phi[SIZE], int N)
    {
        int i = 0;
        float sigma_sqr    = 0.0;
        float dif_x_mu_sqr = 0.0;
        float dif_x_mu     = 0.0;
        float sigma_local  = 0.0;
        float x_local      = 0.0;
        float mu_local     = 0.0;

        /* loop body: per-element Gaussian pdf (the quoted post was
           truncated here; this is an assumed reconstruction from the
           declared variables) */
        for (i = 0; i < N; i++) {
            x_local      = x[i];
            mu_local     = mu[i];
            sigma_local  = sigma[i];
            dif_x_mu     = x_local - mu_local;
            dif_x_mu_sqr = dif_x_mu * dif_x_mu;
            sigma_sqr    = sigma_local * sigma_local;
            phi[i] = exp(-dif_x_mu_sqr / (2.0 * sigma_sqr))
                     / (sigma_local * ROOT_2PI);
        }
    }

Reply to
Robin Bruce

This is certainly one of several key determinants, as is the specific task and one's experience with it.

Here we differ, even if we're just discussing VHDL/Verilog, on how FSMs are best constructed for clarity. I've read nearly 100 different coding styles, too many of them so convoluted that it's very difficult to isolate exactly what the FSM really is.

This is one area where language design makes a significant difference in maintainability. FSMs in C are just a natural byproduct of the for/do/while constructs coupled with if/then/else constructs (or switch/case/break/continue where supported). Here, the C language is about describing flow control (the task at hand), not about describing instantiation (where flow control must be designed by construction). While it might be possible to build indirect FSMs in C constructs the way you do in VHDL/Verilog, it's certainly not the native style of any C coder, even those who are also hardware trained.
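As a hypothetical sketch of "FSMs as a natural byproduct of flow control" (my example, not from the thread): here the machine's state is simply the position in the code, with no explicit state variable or case statement.

```c
/* Illustrative only: count occurrences of the substring "ab" in a
   string. The "seen 'a'" state is encoded by where we are in the
   code, not by an enum plus a switch. */
#include <assert.h>

int count_ab(const char *s)
{
    int count = 0;
    while (*s) {
        if (*s == 'a') {      /* implicit state: just saw an 'a' */
            s++;
            if (*s == 'b')    /* implicit state: saw "ab" */
                count++;
        } else {
            s++;              /* implicit state: idle */
        }
    }
    return count;
}
```

The same machine written VHDL-style would need an explicit state register and a case statement over states; in C the control flow is the state.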

But again ... language is something of a religious war ... the real topic is reducing expression complexity, which is greatly abused in nearly all the VHDL/Verilog designs I've read. This variation is certainly visible in the current thread on FSM construction, and in the clearly different ideas about how an FSM should be expressed.

Reply to
fpga_toys

Take Robin's example: one thing that compilers need is a hint about available resources. That for-loop can easily be unrolled by a fair factor and each instance executed in parallel. Even with N a variable, the compiler can chop the loop into large parallel chunks, and finish the last increment with a partial chunk execution.
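A hand-written sketch of the chunking being described (a hypothetical illustration; a compiler given a resource hint could do this automatically): a loop with a variable trip count N is split into chunks of four independent operations, plus a partial chunk for the remainder.

```c
/* Illustrative only: unroll a scaling loop by 4. On an FPGA the four
   statements in the main body are independent and could each map to
   their own multiplier, executing in parallel. */
#include <assert.h>

void scale4(const float *in, float *out, int n, float k)
{
    int i;
    /* full chunks of 4: independent operations per iteration */
    for (i = 0; i + 4 <= n; i += 4) {
        out[i]     = in[i]     * k;
        out[i + 1] = in[i + 1] * k;
        out[i + 2] = in[i + 2] * k;
        out[i + 3] = in[i + 3] * k;
    }
    /* partial chunk: the last n % 4 elements */
    for (; i < n; i++)
        out[i] = in[i] * k;
}
```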

I disagree here ... they can also easily replicate MANY instances of small parallel processes, just as they can replicate parallel instances of any other data-handling code block. For example, the code block of a typical MPI parallel process can be replicated on an FPGA dozens/hundreds/thousands of times, something that would require huge clusters of machines built on sequential instruction-set processors.

Best? Not clear ... but pipelining is certainly an advantage for getting the clock rate up. Another alternative is pure async construction, or better yet, fully async FPGA fabrics.

There are a lot of things a good compiler can do here to improve synthesis and performance, IF the compiler is tuned for FP-based data algorithms. Those are optimizations that would never be put into a pure hardware description language, as they would seldom be needed there. As such, the optimizations in the compiler are leveraged across many applications.

If this code block were the kernel of a simulation engine with thousands of cells, it would be best for the compiler to implement it bit- or digit-serially, using a dataflow linkage to avoid the multiplier expansion. Literally run thousands in parallel, with a word-length latency in clocks.
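As a rough software model of the bit-serial idea (my illustration, not FpgaC or DIME-C output): each "clock" produces one result bit, with the carry held over in what would be a single flip-flop, so a cell costs very little logic and the full result emerges after a word-length number of clocks.

```c
/* Illustrative only: a C model of a 32-bit bit-serial adder. In
   hardware this is one full adder plus one carry flip-flop per cell,
   clocked 32 times; thousands of such cells can run in parallel. */
#include <assert.h>
#include <stdint.h>

uint32_t bitserial_add(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    unsigned carry = 0;                      /* models the carry FF */
    for (int clk = 0; clk < 32; clk++) {     /* one bit per clock */
        unsigned abit = (a >> clk) & 1;
        unsigned bbit = (b >> clk) & 1;
        unsigned s = abit ^ bbit ^ carry;    /* full-adder sum */
        carry = (abit & bbit) | (carry & (abit ^ bbit));
        sum |= (uint32_t)s << clk;
    }
    return sum;                              /* final carry dropped */
}
```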

In many cases, C programmers use variables instead of constants. There is a clear advantage for FPGA compilers in being presented with constants instead. This favors using manifest constants at compile time, rather than per-run constants in the data. Frequently a recompile is faster than running the fully parameterized version of the code.
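A small hypothetical example of the difference a manifest constant makes: a multiply by a compile-time constant like 10 can be reduced to shifts and adds, while a run-time variable gain forces the compiler to instantiate a general multiplier.

```c
/* Illustrative only: what a hardware compiler can do with a manifest
   constant versus a run-time variable. */
#include <assert.h>

#define GAIN 10   /* manifest constant: x*10 == (x<<3) + (x<<1) */

/* shift-and-add form a compiler can synthesize for non-negative x */
int scale_const(int x)
{
    return (x << 3) + (x << 1);
}

/* variable gain: needs a full general-purpose multiplier */
int scale_var(int x, int gain)
{
    return x * gain;
}
```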

Reply to
fpga_toys

Right both times, sort of :-)

Neal Stephenson wrote "in the beginning was the command line"

formatting link

Cheers, Martin

--
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html
Reply to
Martin Thompson

Can I suggest you stick the docs up on the actual web somewhere? Last time I looked, they were all in SVN. I checked them out and there was some very interesting stuff in there, but I imagine most people would like to be able to read what you have more easily than that...

Are you going to start making use of the carry chain? Last time I tried a counter, it didn't go anywhere near it, just a big lump of LUT4s - took me right back to uni, doing next-state logic minimisations for a counter!

Have you considered doing a higher-level, technology-independent VHDL/Verilog backend, which could then be thrown into a normal synthesizer to let it figure out the best adders and suchlike - they've been practising that for a number of years now :-)

Cheers, Martin

--
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html
Reply to
Martin Thompson

On a sunny day (29 Aug 2006 11:42:00 +0100) it happened Martin Thompson wrote in :

I have just read part of that, he really gets out there.... :-)

For me, one thing the 'command line' interface does is allow me to speak normal language to a computer. This is important in interfacing to computers with speech.

For example I have a script (Linux Bash) called 'show', and a PCI satellite card in the PC.

formatting link
When I _type_ 'show BBC2' at the command prompt, then it will steer the dish to one of the Astra satellites and start mplayer.

And when I _say_ 'show bbc2' it will do the same, using perlbox voice.

The ability to make commands that are simply spelled as English words is perhaps the most powerful thing: 'random access'.

You often _know_ what program or action you want the computer to perform, and going through menu after menu to dig deeper and deeper is a barrier; it takes time.

I run X and fvwm with 9 xterms, 9 virtual screens.

MS was dead to me when they killed the DOS command line in Win 95 just to get rid of DR-DOS...

Same in FPGA, use makefiles, scripts.

To pester MS Windows users I sometimes say: 'My PC speaks English'.

Reply to
Jan Panteltje

The docs are available under the Doc page; those are the Beta-2 docs. Current work in progress will always be in SVN. I try to update the docs as we check things in, so it's not a frenzy at the next beta freeze.

There isn't really any technology-specific generation at the moment, so it doesn't use the LUT carry resources. In fact, the current internal technology makes it nearly impossible. That is something we are addressing soon ... probably in Beta-4.

There is a VHDL output, fully functional I believe in Beta-1 ... it got broken in Beta-2, and will be fixed again in Beta-3 (which should be very soon).

Reply to
fpga_toys

Ahh, I see where they are - I was looking for them off the

formatting link
page. There's a sentence in there about docs, which might benefit from a hyperlink to the actual docs?

I think that's also low-level though, LUTs and wiring? I was thinking of something higher-level, where if you have an adder, you put a

Reply to
Martin Thompson

Yeah :-)

Yep, get it done once and it's easy all the other times!

I've often thought that good engineers are fundamentally lazy - they only want to do something once, do it well, then get on with the next thing, having made the first thing so easy that they don't have to think about it! When I can make that less waffley (sp?), it might become a .sig...

Neat- I shall have to play with that!

It's OK the first time when you are exploring something new, the guidance can be a boon. But when you have to do it every time, it gets a bit annoying.

Ahh, back on topic :-) Ditto for me. I wouldn't be without python now!

How do they respond to "goading" like that?

Cheers, Martin

--
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html
Reply to
Martin Thompson

That might make sense for a language translator, but that isn't the synthesis goal for FpgaC.

For TMCC (the starting point for FpgaC) the goal was just to get a portable form, optimized for C. It wasn't particularly optimal, just something that worked. The goal was an HDL with C language semantics, as an educational exercise.

The FpgaC project is targeting the use of FPGAs to provide a VERY fast execution environment for ANSI C. As it matures it will do some things in a very different way. One, for starters, will be to keep all expressions and variable assignments in "carry save" format right up to the point they are committed to being saved in their final form as DFFs. This may use a little more logic to gain performance, and it avoids unnecessary ripple-carry resolution of intermediate terms. With LUT term sharing and packing, it may actually use the same amount of logic or less, by folding several operations into a single LUT.
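For illustration, here is a hypothetical C model of the carry-save idea (my sketch, not actual FpgaC internals): each addition compresses three input bits per position into a separate sum bit and carry bit with no carry propagation, and the slow ripple-carry resolution happens only once, when the result is committed.

```c
/* Illustrative only: carry-save accumulation. Intermediate results
   are held as (sum, carry) vector pairs; only csa_resolve performs a
   real carry-propagating add, mirroring the "commit to DFFs" step. */
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t sum, carry; } csa_t;

/* 3:2 compressor: fold x into (sum, carry) with no carry ripple */
static csa_t csa_add(csa_t acc, uint32_t x)
{
    csa_t r;
    r.sum   = acc.sum ^ acc.carry ^ x;                 /* per-bit sum */
    r.carry = ((acc.sum & acc.carry)
              | (x & (acc.sum ^ acc.carry))) << 1;     /* per-bit majority */
    return r;
}

/* commit: one ripple-carry resolution at the very end */
static uint32_t csa_resolve(csa_t acc)
{
    return acc.sum + acc.carry;
}

uint32_t csa_sum4(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    csa_t acc = { a, 0 };
    acc = csa_add(acc, b);
    acc = csa_add(acc, c);
    acc = csa_add(acc, d);
    return csa_resolve(acc);
}
```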

Reply to
fpga_toys
