FpgaC developers wanted :)

- F
- fpga_toys
  
  Contact options for registered users
posted
18 years ago

Thu, Mar 30, 2006 2:53 AM

The open source FpgaC project is looking for a few more developers in various technology areas and skill levels. One project is updating the generic LUT only technology mapping code that came with TMCC written a decade ago, to a newer algorithm that better fits current technology FPGAs today. We've started this with several changes, and there is lots more fun work to go.

The next part of this project, is given a truth table, is to find the best way to decompose it into LUT's and MUX's. This project is available for anyone that would like it. Since it's pretty issolated from the compiler guts, and will be implemented in the device specific output functions, it's a good project for someone wanting to get their feet wet as an internals developer.

This isn't exactly easy to do well, but almost any implementation will be better than the current fitting strategy for LUT only mapping, so refinement can happen over time as the developers skills build in this technology area. This is an area of active and competitive research, so it might make an interesting undergraduate project, or graduate level thesis.

Interesting reading is related to the Quine-McCluskey algorithm (implemented in FpgaC) and general logic synthesis algorithms. See:

formatting link

Good resume builder project for someone interested in EDA software tools as a job.

formatting link

Have fun, John

- I
- Isaac Bosompem
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Mar 30, 2006 5:33 AM

Hi John,

I have to ask are you the head developer of this project?

Also I see University of Toronto is heavily involved with your project. While I do not go to that school, I go to a lesser known University in the area, do you have any contacts there?

Thanks

-Isaac

- F
- fpga_toys
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Mar 30, 2006 5:45 AM

I founded the FpgaC project, and drive it to some extent, but view it as a group effort.

They have not been active in FpgaC, other than providing BSD licensed TMCC sources we used as a foundation. I exchange emails with the original author from time to time, but he is too busy in other projects to participate at this time.

- P
- Phil Tomson
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Mar 30, 2006 7:52 AM

What format is the truth table in? (BLIF, PLA, some other internal data structure)

So is this the tech mapping step? You've already created the truth table that represents some part of the design and you need to map it into the underlying architecture. Sounds interesting.

Ah, yes, Alan is on the cutting edge with this And-Inverter Graph stuff. He's also a very good programmer. Can you make use of what is already in ABC? While I would hesitate to use code from some other university projects that have appeared in the past, I would not hesistate to use Alan's code; it's generally engineered very well.

Phil

- F
- fpga_toys
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Mar 30, 2006 3:52 PM

Currently just a 2^n bit table, with a linked list for the associated var's.

Internally, each signal in TMCC/FpgaC has always been a truth table with var list, limited to 4 inputs, and mapped to LUTs at netlist output time. Simple optimizations, like don't cares and duplicate removal have been applied to this 4-lut set to generate a reasonable netlist, that's a little deeper than can be achieved by allowing the internal representation to be wider than a 4-LUT. Widening up the internal representation allows for F5/F6 MUX's to be easily used, controlling the depth of the LUT tree, and better don't care removal, at the cost of more effort to decompose the truth table at the technology mapping step.

I've looked at a number of routing and mapping solutions over the last couple years with conflicting project goals for FpgaC. Both excellent static map/route and excellent fast dynamic compile load and go or Just-In-Time styles seem to be needed. Because of frequent larger word widths, there are some additional twists, like replication to minimize fan-out which become useful at some point. Where to start has always been an interesting problem. It seems that maybe just looking at the technology mapping in a general way for specific devices is a good start.

The ideas in ABC, and related educational projects, are certainly a good grounding to start with.

- F
- fpga_toys
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Mar 30, 2006 4:03 PM

I should probably note that I didn't want to make an implementation choice, allowing whoever wants this project to make that choice, which might include their own research work. This could easily be a one, two or more person project as it can quickly expand to meet broader needs ... or it could be a class project with a lasting legacy.

- E
- Erik Widding
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Mar 30, 2006 6:02 PM

Does this bit table represent two or three states per bit? (i.e. True, False, or True, False, Don't Care)

While on a sequential processor it might not make sense to worry about the don't care state and simply enumerate the table, in logic it can make a very big difference. You have probably already thought of this. The use of "bit" just threw up a red-flag for me, as the bit type in C generally has only two states.

Regards, Erik.

--
Erik Widding
President
Birger Engineering, Inc.

 (mail) 100 Boylston St #1070; Boston, MA 02116
(voice) 617.695.9233
  (fax) 617.695.9234
  (web) http://www.birger.com

- F
- fpga_toys
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Mar 30, 2006 6:50 PM

It's bivalued, since that is the only defined number set for C boolean operations, logicals, and arithmetics used by C. Multivalued sets have little value otherwise for FpgaC.

Because the boolean and arithmentic operations are defined as bivalue operations, nothing else is necessary for FpgaC

Which, is enough. Even for boolean and arithmetics for other HDL and HLL's RTL generation. Extended sets for simulation, fuzzy logic, tristate busses, and other domains is a completely different problem.

- E
- Erik Widding
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Mar 30, 2006 9:26 PM

John,

I think I may have done a poor job communicating the issue. So I will offer a little more detail.

It is common occurance in code to have a series of independent "if" statements with no correponding "else" all operating on the same set of variables. Atleast this is common in code that I have written. In VHDL and C this can be a very efficient way to code. In either language the last assignment wins. And in many such instances, the truth table is then only populated with a very small subset of answers. The remainder of the table can be assumed to have a value of "don't care", in many languages this requires a first (or default) assignment to the state of "don't care" that may or may not be overridden.

When minimizing logic into anything other than a fully populated truth table the ability to manipulate the "don't cares" can allow for the drastic reduction in logic at mapping. Any first year digital design course teaches a technique called "Karnaugh Mapping" as an analysis tool and visual means for exploring this.

If a default value is placed into the table, it artificially over specifies the logic, and will result in a suboptimal result. There is no problem with equations of no more than four terms as your basic unit is a 4-lut. But when you start getting into complex state machines with dozens of terms, this can result in being an order of magnitude off in utilization in specific areas of a design. I bring this up now, as the early test cases are going to be simpler, so it may be some time before you see that there is really a problem.

So I am not disagreeing with you that FPGAC may not care about this for now. I am suggesting that you should allow for this future optimization that you will almost certainly need, by considering that you should specify your fitter to take as input the additional state of "don't care". All of the work out of Berkely (espresso, etc) supports this addional state as it is crucial to quality of result.

This is me trying to constructively offer an example to the earlier suggestion I made that "what you want to do [in the area of reconfigurable computing] is a whole lot more similar to the current use of the devices than you realize."

This post is intended to be supportive. If you have taken it any other way, please simply ignore it.

Regards, Erik.

--
Erik Widding
President
Birger Engineering, Inc.


 (mail) 100 Boylston St #1070; Boston, MA 02116
(voice) 617.695.9233
  (fax) 617.695.9234 
  (web) http://www.birger.com

- J
- Jim Granville
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Mar 30, 2006 10:32 PM

I'll expand this a little, by adding that there can also be an 'inferred else' operation, that depends on the register used.

Thus a .D ff state engine, will tend to 00000 state, if it hits a non-covered instance, whilst a .T ff state engine will stay where it was. If you want to include glitch/noise recovery, that can matter...

-jg

- F
- fpga_toys
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Thu, Mar 30, 2006 11:44 PM

Actually, there are a small number of implicants, and a large number of default zeros.

Not true in C. Assuming a don't care, would allow covering a term which doesn't have an explict implicant, and potentially change a variable that was remotely set. That brings us to two separate cases:

1) globals always have a default value of zero, so there is never a don't care for this case. 2) function variables have an undefined value initially, and it's considered poor programming practice to leave the variable unassigned before any reference, as the value will have a nondeterminate state (possibly it's prior state, prior stack contents, but implementation dependent).

As such, there isn't any asserted term that can properly be a don't care by your construction in C, as assignments imply implicants, and lack of assignments/implicants imply retained state (no modification of the variable).

The design by Dave Galloway (TMCC's author) models this correctly, using the ffs_zero_at_powerup model. The simple if and nested if statements are then reduced as follows:

1) if(c) a = b; populates the truth table for "a" with the function b & c, where b AND c becomes an implicant for "a". 2) if(c1) {if(c2) a = b;} populates the truth table for "a" with the function b & c1 & c2, where b AND c1 AND c2 becomes an implicant for "a".

If there are other assignments to "a", it may have other implicants, including it's self if there is a reference to "a" before it's first assignment (resulting in a FF with either a gated clock, or feedback term).

Very true, except that the C model precludes this. Assuming don't cares, would "create" assignments where there are not implicants in the current context.

- E
- Erik Widding
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Fri, Mar 31, 2006 3:55 AM

How do you handle the case when a local variable (very local, as in local to the block) is being used to store a partial calculation?

Many coding standards insist that a line of code not wrap (at least as a general rule). The point of this is not all variables have a notion of state (as in a flop), as they are created locally, assigned once and only read once locally. I would assume there are a lot of instances where a local variable is used for the readability of the code, rather than with the intent of creating another stage in a pipeline. How does FpgaC differentiate between variables that are meant to be another stage in a pipeline (i.e. a flop) and those that are merely for convenience (i.e. a combinatorial node)?

Many coding standards insist that global variables not be used, except when used as semaphores or in a similarly limited way to share data between two threads. As I understand it the few remaining places where it is considered acceptable to use a lot of global variables is in resource limited (i.e. 8bit micros with little ram) systems. So for the most part I would think the "changed outside of the scope" argument is moot. Most compilers assume that a variable can not be changed outside of the scope unless it is declared VOLATILE. So, I don't buy the argument as anything more than a special case.

In the usage of variables with scope no more global than the single sequential path through the a block, is it more important to take the strict ansi-C interpretation that anything declared is implicitly zero, and follow the logic tree through; or is it worth understanding when the logic has nodes that truly have a "don't care" component and optimizing as such?

If the possibility to optimize is ignored, I fear you are creating a tool that may have some academic interest, but not widespread practical use.

I will be the first to admit that I have not played with FpgaC at all - and have no intention to. But I assume there have to be pragmas to direct the compiler to deal with pipeline and concurrency issues, as there is no support in the language for this. Much of the use of a sequential language for representing sequential logic has to implicitly told to the compiler so that it can understand which variables represent combinatorial nodes, and which represent sequential nodes.

Being only C-like and not actually ansi-C, please enlighten me why it is important to hold on to one of the aspects of the language (implicit zeros which is not universal across all types of variables in the language, just in most cases) that kills efficiency when targetting hardware, when so many other aspects have to be bastardized to make the use of this language for this purpose even possible?

I ask, these questions because my cursiosity has been piqued - and I obviously clearly don't understand.

Regards, Erik.

--
Erik Widding
President
Birger Engineering, Inc.


 (mail) 100 Boylston St #1070; Boston, MA 02116
(voice) 617.695.9233
  (fax) 617.695.9234 
  (web) http://www.birger.com

- F
- fpga_toys
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Fri, Mar 31, 2006 6:22 AM

If there has been a value stored, then what's the problem? It will be correct when referenced. If however, there may be a value stored, and contain an undefined value at other times, then there is a programming error.

A variable that is always referenced after an assignment, is reduced to a combinatorial expression, and doesn't get latched in a FF. If the variable is referenced in a loop, before being assigned in that loop, then it's latched in a FF.

Similarly, global's in FpgaC are frequently applied to the same coding standard, for similar reasons. It's isn't however moot, since the value is global, it may (and most likely is) expected not to change state magically, just because it was easier for the compiler to cover a "don't care" that would assign it errantly by including a broader implicant than the conditionals explictily represent.

If there is a don't care, then there is an uninitialized variable case, which is in correct programming. Show me a clear case where it is not.

There is NOT a possiblity to optimize here that is being ignored. As I understand your example, you are presenting an undefined variable case, and wanting to define it as a don't care, which is not correct C, and has clear side effects which can be wrong.

While some C based HDL's and HLL's use pragma's to control concurrance and pipelining, it is not required by the C language definition and expected sequential semantics. Just as multi-issue processors are free to concurrently execute multiple statements in parallel, so does FpgaC, using identical strategies. The limitation to this, is that the apparent senquencing must be preserved. So in TMCC/FpgaC every block is assumed to be concurrent, and clocks are observed for looping and function calls.

In Handel-C they choose to make each statement a cycle, and use "par" blocks for concurrancy. FpgaC doesn't use that strategy, choosing instead to make looped blocks a cycle.

In Streams-C they use pragmas to have the compiler automatically unroll loops and insert any required retiming for the resulting pipeline. FpgaC doesn't provide that, but offers instead that the programmer may directly control the degree of unrolling and pipelining by the structure and sequence of the statements. Pipeline stages are explicitly created by reversing statement order, in a way that the pipelines are formed using C semantics. Consider: fpgac_input iport; fpgac_output oport; while(1){ int a, b, c; a = iport; b = a; c = b; oport = c; }

this creates a combinatorial from iport to oport, transfering new values each clock. pipelines are created by reversing the statement order:

oport = c; c = b; b = a; a = iport;

so that there is no combinatorial chain from iport to oport, but rather a 4 cycle/stage pipeline where the values in iport require successive clocks to walk thru a, to b, to c, and finally to oport.

This code is functionally correct executing on an FPGA, with identical looping/pipeline behavior on a traditional processor. Including any retiming that may be necessary for references outside a particular pipeline stage.

Actually, the goal for FpgaC is to become a proper subset of ansi-C, with a few minor additions. And to that end it is NOT correct to violate certain conventions like zero initialized memory, and making assignments which are incorrect by the language standard.

I believe you are very wrong here. The efficiency you claim is being lost, only exists for illegal programming practice of using an undefined variable value in a poorly (incorrectly) formed statement sequence.

I do believe that is the case. Show me a valid programming example otherwise.

- E
- Erik Widding
  
  Contact options for registered users
Vote on answer
posted
18 years ago

Fri, Mar 31, 2006 6:46 PM

John,

I want to thank you for explaining this for me. It has been educational. I wrote in my second post:

The implication in this was that C was one of those languages that would require the implicit assignment to "don't care". If this were accounted for in your analysis of my question it would have resulted in a different understanding of where my question was headed. Though I do realize from your response that being ansi-C compliant is one of the important driving principles in your work.

It has been an interesting conversation. I am coming at this from the perspective of an engineer that has degrees in Electrical Engineering and in Computer Science Engineering, that has used at least a dozen languages as a practicing engineer, and seen at least as many in an academic setting. My questions come from the perspective of an end user of the language, who wants to know in advance what the limitations of that language are.

We use VHDL rather than Verilog, having used both, because some of the things that we want to do are not easily supported in Verilog, and tolerate the more cumbersome nature of the VHDL language because it does not have these limitations. We do our algorithm development and much of our software in Ansi-C though we use the C++ extensions in the compilers because the strict type checking often catches unintended mistakes. We often write algorithms a first time in C to prove they work, write them a second time in C using the structure that the hardware will actually have, and a third time in VHDL reflecting the exact structure that we want.

If FPGAC allowed the third writing to simply be an optimization of the second using the same source code, this would be interesting to us. But from my perspective I will still get the best result for my work with the current strategy, as we need to be able to specify optimizations that are, as you have pointed out, contrary to the c model. I do not mean to insult you by saying that it is interesting work, and may have great application one day for those who would not have even considered an FPGA for the solution, but for me coming from the perspective that an FPGA is a highly probable target, I need the design methodology that is most efficient from both a design cost and silicon cost perspective.

Regards, Erik.

--
Erik Widding
President
Birger Engineering, Inc.


 (mail) 100 Boylston St #1070; Boston, MA 02116
(voice) 617.695.9233
  (fax) 617.695.9234 
  (web) http://www.birger.com