FPGA C Compiler on sourceforge.net (TMCC derivative)

Jim,

I agree with you about the value of mixing and matching HLL and HDL solutions in your system. You might want to design the core of your algorithm using an HLL tool, then link it up to control and memory systems that you've designed using HDL.

I can also see the constant change of the underlying hardware being a challenge for those developing C-to-hardware tools. I think it perhaps favours those who target their systems at a single architecture, like SRC with their Carte programming environment. Handel-C has seen a lot of real success; the most notable in my mind is its use by Lockheed Martin in an effort to create a space system to dock with Hubble:

formatting link
My understanding of Handel-C is that the basic package contains generic HDL routines for the common operations (data array storage and retrieval, fixed-point multiplication etc.) but the user is given the option to supplement these with implementation-specific routines. So in Xilinx Virtex-II this would be BRAMs for data array storage and use of the 18x18 multipliers for the fixed-point stuff.
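To make that concrete, here's a rough sketch in plain C (my own illustration, not actual Handel-C library code) of the kind of fixed-point multiply such a routine wraps; a generic library builds the product out of LUT logic, while a Virtex-II-specific one maps it onto a single dedicated multiplier block:

/* Q1.15 fixed-point multiply: 1 sign bit, 15 fraction bits.
   The 16x16 -> 32-bit product fits comfortably in one of the
   Virtex-II 18x18 multiplier blocks. */
typedef short q15_t;

q15_t q15_mul(q15_t a, q15_t b)
{
    long p = (long) a * (long) b;   /* full 32-bit product */
    return (q15_t) (p >> 15);       /* renormalize back to Q1.15 */
}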

With each new generation of FPGA, you'll need to update your underlying library routines. As I recall Peter Alfke saying at this year's FPL, to get the best out of FPGAs you need to target specific architectures; generic just won't cut it (apologies to Peter if that's not what he was getting at).

So I guess I agree with you Jim, in that C -> registers is not the best approach.

Apologies for my ignorance, but can I ask you to expand on "alternative of HLL -> FPGA Running HLL and the best tool set"? I wasn't sure what you meant.

Cheers,

Robin

Reply to
Robin Bruce

air snipped-for-privacy@yahoo.com wrote:

Yes, but Java and C# have essentially the same syntax without the side-effect issues that make C so damn hard to synthesize yet serve no real purpose anyway.

But instead of using an existing standard language that can be synthesized, they define a new language of their own and call it "C, but you are not allowed to use this and that".

BTW, VHDL has the same issues.

Yes, but it is also a clear advantage to have a language that allows for explicit parallelism, processes, signals and events at a finer grain than threads.

Such languages have existed for traditional processors for a long time now; Algol comes to mind as a very old example. You know, it is very easy to map explicit parallelism onto a serial machine, but it is very hard to extract parallelism from a serial description.
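A tiny C fragment (my example, chosen to illustrate the point) shows why extraction is hard:

/* Each iteration is independent, so the loop is trivially parallel.
   But with pointers the compiler cannot prove that dst and src do not
   overlap, so from this serial description it must assume a
   loop-carried dependence and schedule the iterations in order. */
void scale(float *dst, const float *src, int n, float k)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = k * src[i];
}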

I am a big fan of high level synthesis from algorithmic descriptions that do not describe the hardware details, but I am sure that C is "A Really Bad Choice (TM)".

Heck, C is a really bad choice for serial CPUs to begin with. If you build hardware, you want your compiler to do as many compile time checks as possible. There's not much that you can catch at compile time with plain C.

Kolja Sulimma

Reply to
Kolja Sulimma

I was not suggesting that C-syntax HDLs will obsolete VHDL/Verilog and other technology-specific HDLs, but just as you clearly state, the goal is to produce "efficient circuits from nearly arbitrary C", which from the presentations by the group is very, very close to that of a good HDL with a good to excellent coder. With FPGAs increasing in size at roughly the same Moore's Law rate, and with similar performance increases, the end result is that "surprisingly efficient circuits from nearly arbitrary C" is more than good enough to replace VHDL/Verilog for the majority of algorithmic and systems level designs to allow faster time to market, with a larger pool of talent (being able to draw on C coders), with good enough performance and fit so that expensive fine tuning and coding in VHDL/Verilog will NOT be required.

The clear intent is to produce acceptable circuits with less talented engineers for a variety of target applications ranging from full systems to reconfigurable computing.

Reply to
air_bits

The issue is having to subset a language and its expected runtime environment. Both Java and C# borrow heavily from the same subset of C syntax, and both have similar problems porting arbitrary code from a traditional sequential execution environment to FPGAs. The lack of "real addressable memory" results in difficulties with dynamic allocation and with runtime architectures that expect pointers to create and manage data objects.
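As a hedged illustration (mine, not from any particular tool), the contrast looks like this in C:

#include <stdlib.h>

#define N 1024

/* Assumes a runtime heap and a flat address space: the pointer is an
   address into "real addressable memory" that an FPGA fabric simply
   does not have. */
void heap_style(void)
{
    int *buf = (int *) malloc(N * sizeof(int));
    /* ... use buf ... */
    free(buf);
}

/* Size fixed at compile time: this maps directly onto a block RAM,
   with no allocator or address space required. */
void fpga_style(void)
{
    static int buf[N];
    /* ... use buf ... */
}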

Starting a "my language is better than your language debate" is really non-productive, as the real test is does it exist in a usable form today for this application, and the assumption or assertions are in the end only validated by availability and use as the true test of what is the best for target applications. One clear standard is access to trained labor pools, as frequently the "best" tools for briliant experienced engineers create unmanagable complexities for less talented, skilled, experienced engineers that will have to maintain the project over it's life. Managing concurrency has always been a tough one for less skilled engineers unable to grasp global system architecture and state enough to protect from the hazards and deadlocks.

There is a lot of room in this new HLL-to-netlist market ... we would ALL like to see affordable, usable tools that do a better job. Bitching that C tools are not up to some higher standard is pretty non-productive when the existing broadly used tools are at an even lower standard.

There are few affordable open source FPGA tools for students, hobbyists, and small development shops .... I don't see any that meet your minimum requirements; maybe your talents will make them available?

Reply to
air_bits

from nearly >arbitrary C" is more than good enough to replace VHDL/Verilog for the majority of >algorithmic and systems level designs to allow faster time to market, with a larger pool >of talent (being able to draw on C coders), with good enough performance and fit so >that expensive fine tuning and coding in VHDL/Verilog will NOT be required.

Wow, that's a long sentence :) and one I broadly agree with. I'm not so sure about the phrase "nearly arbitrary C". I don't know all the tools though, so I'm presenting my limited experiences here, not what I think is universally true... So no flaming. :)

Is there any tool out there that can produce code that rivals good VHDL when written by a "C coder"? I'm currently working with a very bright undergrad who has never used VHDL before, and I've got them using a C-to-VHDL tool. To use the tool effectively, they're having to understand why they need shortcuts that avoid "/, % and *" as much as possible. They need to think about memory management in a very new way, in a land where BRAM is not limitless. They're also having to consider the differences between BRAM, SRAM and registered values and their effects on performance. They need to understand what will break the pipeline and what won't, what will result in big pipeline latency and what won't. They need explanations as to why a few small changes can quarter the final slice count. To work effectively you need to understand what kind of hardware you are inferring by what you write: two functionally equivalent statements can compile to two very different VHDL projects. In my experience, HLLs free you from the drudgeries of HDLs, but they don't yet free you from the need to understand the underlying hardware. With nothing but C knowledge you can get something big and slow, but for small and fast, you need to know what you're inferring.
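For what it's worth, here's the flavour of shortcut I mean (a generic sketch, not our actual project code): for power-of-two sizes, % and / collapse into masks and shifts, which cost a few LUTs instead of a large divider circuit:

#define SIZE 1024   /* power of two, so the rewrites below are exact */

unsigned wrap_slow(unsigned i) { return i % SIZE; }        /* infers a divider   */
unsigned wrap_fast(unsigned i) { return i & (SIZE - 1); }  /* infers an AND mask */

unsigned half_slow(unsigned i) { return i / 2;  }          /* divider again      */
unsigned half_fast(unsigned i) { return i >> 1; }          /* pure wiring        */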

Cheers,

Robin

Reply to
Robin Bruce

That's a clear problem even with C coders doing any kind of device driver work on a traditional system. The pool of C coders who understand hardware well enough to write drivers is a small percentage of actual coders. The difference is that it's relatively easy to teach the high-level aspects of hardware from a systems perspective to train new device driver writers and maintainers, and one good low-level engineer can mentor some half dozen others with less skill and maintain the quality level necessary for production work. I've done so on several cross-architecture porting projects with undergrad students.

I believe there is a similar leverage in using C coders for FPGA work: you do not need EEs with logic-level design skills to develop FPGA projects, but you do need to teach C coders about the hardware architecture models that the HLL is going to produce after synthesis. Even today it's necessary to teach C coders about cycle counting, as assembly language coding is no longer a basic skill. Performance-sensitive designs require teaching about working sets, cache performance issues, CPU pipelines, and a number of issues that a typical C++ or other object-oriented coder remains clueless about. There is a continuous design space:

BitLevelLogic <--------------------------------> AbstractAlgorithms

Schematic designs sit on the left, VHDL/Verilog somewhere left of center, C-based HLLs somewhere just right of center, with better HLLs to come even farther right of center. EEs tend to design to the left of center and HLL coders to the right of center under this model, and the more efficiently the HLL hides the hardware, the farther to the right we move.

So the bottom line is that just as EEs were the entire computer development staff in the 50's, today EEs are a fraction of the product development team. With FPGAs becoming common along with HLLs, we will see that same trend.

During the 70's we saw a lot of old salt EEs and systems types crying about HLLs and computer performance, which is nearly a moot point today. Ditto for the EEs and systems types who will be crying about large commodity FPGAs similarly not being efficiently used by HLLs generating FPGA designs from abstract language tools.

Reply to
air_bits

Simple code examples and synthesis results are what is missing from the web sites and discussions I have seen. If you've got some, consider posting a link so that all interested can have a look.

-- Mike Treseler

Reply to
Mike Treseler

For those that haven't looked at this stuff, it's the next-generation HLL FPGA environment, two steps above C, with a cute GUI-based system-level abstraction tool .... very cool :)

formatting link

Reply to
air_bits

Jim Granville wrote:

"FPGA Running HLL" is a terse way of saying a SoftCPU (can be DSP enhanced) running opcodes (ex HLL), on the FPGA. ie a FPGA CPU

e.g. NIOS has an interesting opcode extension scheme - you can code in C, run in C, and then grab ONLY the tight stuff for expansion into hardware, and call it with an opcode.
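To sketch that flow in C (the intrinsic and macro names below are hypothetical, standing in for the vendor's mechanism, not the actual NIOS API):

/* Step 1: write and run everything as plain C on the soft CPU. */
unsigned crc_byte(unsigned crc, unsigned char byte)
{
    int i;
    crc ^= byte;
    for (i = 0; i < 8; i++)                       /* the tight inner op */
        crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    return crc;
}

/* Step 2: only that kernel moves into hardware, reached through a new
   opcode; everything else stays as C running on the soft CPU. */
#ifdef HAVE_CRC_OPCODE
#define CRC_BYTE(c, b) __custom_insn_2(CRC_OP, (c), (b))  /* hardware  */
#else
#define CRC_BYTE(c, b) crc_byte((c), (b))                 /* software  */
#endif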

The FPGA vendors seem to be favouring the 'sea of DSP' and standard tool flows over the 'sea of programmers' approach :)

-jg

Reply to
Jim Granville

You are perfectly right. As I wrote in my post, I think that you need some features, like explicit parallelism, that none of the mainstream languages offer, albeit there are languages available that would be suitable.

But apparently all those highly trained, clever software engineers cannot be bothered to learn another language. At least this argument always comes up at that point. (Maybe I can find an engineer in India who is still capable of learning?) So if you really need C syntax, as many believe - I don't - at least use a modern C-derived language that is easier to compile. With Java, essentially only the "new" operator is a problem. With C, well, look at the SystemC restrictions.

On the other hand: what's so hard about dynamic allocation? Tell the designer that it will be slow, and if he uses it, simply synthesize to a MicroBlaze implementation. You will not meet the timing constraint, but it can be synthesized. Or even use profiling to find a typical number of allocated objects and create them in hardware. If more are used, halt execution. That is exactly what a sequential processor would do. You can't call malloc a billion times in C, and maybe you cannot call it 16 times in hardware C. It is the same type of constraint: imposed not by the language but by the implementation fabric, and the designer needs to know the capabilities of his system before implementing.
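A minimal sketch of the profiled-pool idea (my own construction, not from any particular tool):

#define MAX_OBJS 16   /* the "typical number" found by profiling */

typedef struct { float re, im; } obj_t;

static obj_t pool[MAX_OBJS];     /* becomes registers or BRAM, not a heap */
static unsigned next_free = 0;

/* "malloc" against a profiled static pool: if the profile was right,
   this never fails; if it was wrong, halt execution, which is exactly
   the constraint-violation behaviour described above. */
obj_t *hw_alloc(void)
{
    if (next_free >= MAX_OBJS)
        for (;;) ;               /* out of objects: halt */
    return &pool[next_free++];
}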

Kolja Sulimma

Reply to
Kolja Sulimma

The reality is that forms of parallelism emerge when using C as an HLL for FPGAs. First, the compiler is free to parallelize statements as much as can be done; this alone is typically enough to bring the performance of a 300MHz FPGA clock near the performance of a several-GHz RISC/CISC CPU for code bodies that have a significant inner loop. Second, explicit parallelism is available by replicating these inner loops, creating threads with the same code body and using established MPI code structures and libraries. Third, the compiler is free to unroll inner loops. Fourth, the compiler is free to flatten the netlists to gain additional parallelism. All this and more is obtained without abandoning stable, mature development tools, without learning a new development environment that might add a few percent higher performance, and without significant unwarranted risks for many projects.
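A trivial illustration (a generic sketch, not any specific tool's output) of the kind of inner loop where those wins come from:

#define TAPS 64

/* The independent multiplies of each iteration can be scheduled side
   by side, the loop can be unrolled further for more products in
   flight, and the whole function can be replicated per data channel. */
int fir(const short x[TAPS], const short h[TAPS])
{
    int acc = 0;
    int i;
    for (i = 0; i < TAPS; i += 2)
        acc += x[i] * h[i] + x[i + 1] * h[i + 1];
    return acc;
}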

After several decades of managing large development projects across multiple facility and platform evolutions, we have learned to mitigate risks and maximize human potential across a large number of projects, teams, and technologies while repeatedly delivering results with acceptable tradeoffs judged by our experience.

Many have also advocated radical changes in language and development styles. We have gained experience in this process after watching radical changes fail over human and technology elements that the proposals never considered. We do learn from those that succeed, and incorporate their lessons with a reasoned process to mitigate risks.

One critical risk not foreseen by many of these brash proposed changes comes from observing that individuals have different degrees of ability to manage state in designs. Some with high natural ability can learn to safely manage a very large amount of state with concurrency; many more lack this ability and, even after the best training, can only handle significantly smaller amounts of concurrency in a design. This is not a training issue, and it is not an experience issue: natural ability is developed with training and experience, but the maximum for each individual is independent of both. Managing these differences in natural ability forces tradeoffs in complexity that may not be best for some individuals, but are best for organizations over time. It's not uncommon to see brilliant designs that are completely unmaintainable by mortals.

Time, and time alone, judges successes and failures. Not idealism and insults.

Reply to
air_bits

Open source knows no borders, no race, no religion, no ethnicity, no politics, no barriers to who can contribute to the world.

Since you KNOW the answer, share it. We will be looking for your work on sourceforge, and your announcement here. Good ideas that are never realized are always worthless failures.

Good designs have excellent space-time tradeoffs. CPU cores are large, and take you right back to serial execution with poor parallelism, which in many cases would have been done better with a VLSI CPU, either as a hard core or an external device. Likewise, pointer-based memory takes you right back to serial access of that memory as a critical-path resource. Dynamic allocation is implicitly serial by design.

A language designed around dynamic allocation of objects and classes is implicitly unusable if limited to 16 such allocations, since only trivial code bodies can be realized.

The multiplexors needed to emulate a memory pool of statically allocated objects are both huge and, once made hazard-free for concurrent access, implicitly serial. This takes us right back to poor space-time tradeoffs and a loss of the implicit parallelism that static objects offer.

Obviously you see right past these problems, and we are waiting for your magic to be realized as a much better language offering on sourceforge.net.

Since the programmers in India are by your assertion superior, please show us by results, and time may prove you right.

Reply to
air_bits

Yes. All of the next-gen websites are cute. Why is a working code example so hard to find?

-- Mike Treseler

Reply to
Mike Treseler

You can always ask the various sites, or some user. Robin seems to be using and happy with the DIME stuff; email him for some samples. Have you tried talking with the company?

Impulse C offers a full-featured 30-day trial, and they are pretty cool to talk with, and have done a good job of productizing Streams-C.

Streams C is free for non-commercial use, and is available from

formatting link

SA-C by the Colorado State team (Wim Bohm) looks like it isn't intended to be made publicly available, except to companies funding their research projects.

ASH by the CMU guys isn't likely to get an open source release either, and is likely to end up licensed to someone for a revenue stream, from what I was told by one person last year ... but I haven't seen even that yet. Mihai Budiu appears to now be at Microsoft, and is publishing papers from there on the technology, so maybe Microsoft will be licensing the technology, or working from Mihai's development independently or in partnership with CMU. The papers have been very cool, but until it's publicly available or a product, it's hard to judge just how useful it is for others. The ASH team offered training at a conference earlier this year, and may do more.

Celoxica isn't quite as easy to get a demo copy from, but some Xilinx reps seem to have a copy, and they were offering training seminars with Xilinx across the country.

FpgaC has some examples in the download image, is free to run your own tests with, and has no restrictions against commercial use.

It seems pretty easy to get working examples simply by downloading or asking the sales guys ... who didn't respond to your asking?

Reply to
air_bits

An interesting point with this, of course, is that it's just splitting the work differently - instead of going

C -> object code -> Processor (with out of order execution),

this would seem to be a case where the management of out-of-order-execution-type things is done statically at compile time, rather than dynamically by the processor.

It could be interesting to see how far this could go - Compile to code+processor, where the processor architecture is implemented by the compiler subject to the requirements of the design.

My 2c, Jeremy

Reply to
Jeremy Stringer

I think that is being done already, though at the simple end of the scale; it does prove it is possible. IIRC, it involved compiling the design twice: once to generate the Core+Codes, and again to remove unused portions of the core. It can introduce other problems - if the CPU changes every time, that complicates things more, and what looks like a few lines of code might enable a new block of the CPU and have an unexpected hit on both % usage and speed.

Cores themselves are not too large these days, the bigger bottleneck is on chip code memory.

-jg

Reply to
Jim Granville

That's interesting :) ... whose tools are doing that?

The other extreme are Sarah's HarPE tools which even optimize away pretty much the whole core into logic.

Reply to
air_bits

The main advantage that C has over the HDLs is that many software engineers know C; not many know VHDL/Verilog. Perhaps the goal of targeting FPGAs with C is to allow lots of software engineers to develop algorithms that can be accelerated in an FPGA.

Of course a lot of software engineers do not prefer C these days....

Phil

Reply to
Phil Tomson

OK, here's an example from a few months back. It's a functional block that can carry out either an FFT, an IFFT or a complex multiply. The ensemble makes a pulse compressor. The tool has changed a little since then, so I wouldn't write it quite like this again. For example, there was a problem with the % operator back then. Plus, now that I know a little better what I'm doing, I'd work my index variables differently. I'd also make it all one loop, so as to better exploit the pipelining; I maybe add about 100*log2(SIZE) cycles by not having it as one loop. I seem to remember this compiled to around 16000 slices of an XC2V6000, and I could clock it at 120MHz (slightly more than ISE said, but you can get away with that sort of thing in the lab).

#define FFT_FORWARD  -1
#define FFT_BACKWARD  1
#define CMPLX_MULT    2
#define SIZE       4096
#define SIZE_X2    8192  /* 2*SIZE; the original "2xSIZE" is not a legal C macro name */

void PC4096Opt(IEEE754 realA_result[SIZE], IEEE754 imagA_result[SIZE],
               IEEE754 realB[SIZE], IEEE754 imagB[SIZE],
               IEEE754 Root_u1[SIZE_X2], IEEE754 Root_u2[SIZE_X2],
               int shuffle[SIZE], int nn, int m, char mode, IEEE754 scale)
{
    int toggle;
    float c1, c2, scaleLocal, t1, t2, u1, u2, xi1, xi, xIn, yi1, yi, yIn,
          xA[SIZE], yA[SIZE], xB[SIZE], yB[SIZE];
    int w, z;
    int i, j, i1, l, l1, l2, count, index, offset, shuff;

    if ((mode == FFT_FORWARD) || (mode == FFT_BACKWARD)) {

        // forward and inverse twiddles sit in the two halves of the Root tables
        if (mode == FFT_FORWARD) offset = 0; else offset = nn;

        scaleLocal = (float) scale;

        // bit-reversal reordering into the working arrays
        for (i = 0; i < nn; i++) {
            shuff = shuffle[i];
            xIn = (float) realA_result[shuff]; xA[i] = xIn;
            yIn = (float) imagA_result[shuff]; yA[i] = yIn;
        }

        // Compute the FFT: m radix-2 stages, ping-ponging between the A
        // and B arrays. The web archive ate the text between "<" and ">"
        // in the original post; the loop bounds and twiddle-index shift
        // below follow the standard radix-2 structure implied by the
        // surviving fragments.

        c1 = -1.0; c2 = 0.0; l2 = 1; count = offset; toggle = 0;
        // (c1, c2 are set but unused: leftovers from a recurrence-based
        //  twiddle version, since replaced by the Root table lookups)

        for (l = 0; l < m; l++) {
            l1 = l2;
            l2 = l2 << 1;
            for (j = 0; j < (nn >> 1); j++) {
                index = count + (j >> ((m - l) - 1));
                u1 = (float) Root_u1[index];
                u2 = (float) Root_u2[index];
                w = l2 * j;
                z = w / (nn - 1);
                i = w - (nn - 1) * z; // Really, should be: i = (l2*j) % (nn-1)
                i1 = i + l1;

                if (toggle == 0) {
                    xi1 = xA[i1]; xi = xA[i]; yi1 = yA[i1]; yi = yA[i];
                    t1 = u1 * xi1 - u2 * yi1;
                    t2 = u1 * yi1 + u2 * xi1;
                    xB[i1] = xi - t1; yB[i1] = yi - t2;
                    xB[i] = xi + t1;  yB[i] = yi + t2;
                } else {
                    xi1 = xB[i1]; xi = xB[i]; yi1 = yB[i1]; yi = yB[i];
                    t1 = u1 * xi1 - u2 * yi1;
                    t2 = u1 * yi1 + u2 * xi1;
                    xA[i1] = xi - t1; yA[i1] = yi - t2;
                    xA[i] = xi + t1;  yA[i] = yi + t2;
                }
            }
            count = (l2 - 1) + offset;
            toggle = toggle ^ 1;
        }

        // Scaling for forward transform. The original post was truncated
        // from here on; this write-back is a minimal reconstruction of the
        // obvious intent (toggle says which ping-pong buffer holds the
        // final stage's results).
        for (i = 0; i < nn; i++) {
            if (toggle == 0) { xIn = xA[i]; yIn = yA[i]; }
            else             { xIn = xB[i]; yIn = yB[i]; }
            realA_result[i] = (IEEE754) (xIn * scaleLocal);
            imagA_result[i] = (IEEE754) (yIn * scaleLocal);
        }

    } else if (mode == CMPLX_MULT) {
        // Also lost to the truncation: an elementwise complex multiply,
        // A = A * B, reconstructed here from the description of the block.
        for (i = 0; i < nn; i++) {
            xi = (float) realA_result[i]; yi = (float) imagA_result[i];
            xi1 = (float) realB[i];       yi1 = (float) imagB[i];
            realA_result[i] = (IEEE754) (xi * xi1 - yi * yi1);
            imagA_result[i] = (IEEE754) (yi * xi1 + xi * yi1);
        }
    }
}

Reply to
Robin Bruce

When we look at FPGAs for reconfigurable computing, that is certainly the draw. Look at C-like offerings for FPGAs from firms like Mitrionics, whose product Mitrion-C directly targets moving High Performance Computing (HPC) applications to FPGAs:

formatting link
formatting link

with the claim of 20 times faster execution on FPGAs, it's pretty clear that FPGAs have a new volume market. If you google reconfigurable computing, there are links to hundreds of firms and projects with this goal.

There are more different types of software engineers than there are hardware engineers. Most software engineers have never liked plain vanilla C, just as most hardware engineers don't like RF and power supply engineering.

That has always been true. C has always been the systems programming language of choice for low-level implementation, as a direct substitute for assembly language. This has been true since the early days in the 1970s, when C was derived from the B programming language, moving from a threaded interpreter to a fully compiled language usable to replace almost all the assembly in the UNIX operating system and utilities during the V5, to V6, to V7 migrations.

Higher-level languages, with better database and GUI interfaces and other application development libraries, have always been the languages of choice for higher-level applications. These days that means a large number of higher-level object-oriented or application-specific languages, including C++ and Java. While C++ and Java resemble C syntax, that is about where the similarity ends. Much like apples, oranges, and bananas all grow on plants known as trees, and that is where the similarity ends.

C as a low-level assembly language replacement is primarily used by a subset of programmers doing systems-level programming and a small group of applications programmers doing hardware interfacing and performance-sensitive optimizations. These programmers frequently have the skill sets to understand interfacing to hardware at a high level, and they are the target of many current C-based reconfigurable computing tool projects. While some high-level applications programmers used to coding in C++, Java, and other production languages may be trainable to do low-level C work on FPGAs, in general they will find C about as primitive as a VHDL/Verilog designer finds schematics.

So, like it or not, there are two very different markets for FPGA hardware and tools: those building hardware, and those building applications for HPC platforms. And a lot of grey area in between.

Reply to
air_bits
