Know any good public FPGA projects to contribute to?

signaltap 11 years ago

Hi all,

Can you suggest any good FPGA projects I could contribute to? I have some free time and want to work on something challenging and interesting. Instead of starting something myself, I'm wondering where to find some cool projects that already exist and need help.

Thanks!


mnentwig 11 years ago

Hi,

here, for example, is one.

formatting link

In general, there are many CPUs but a shortage of simple (!) "Hello world" examples to actually use them without spending a week first.

This blog nails it, more or less:

formatting link

(that said: I managed to get the "small" variant of the ZPU in question working on a Spartan 6, here.

formatting link

It is slow but fairly small, about 12 % on a Spartan 6 LX9)

Another interesting project is "minSoc". It appears to be very well maintained.

A simulation worked right out of the box when I tried yesterday - it even includes its own iverilog simulator - but I wasn't able to build on Spartan 6 as the JTAG block is not supported.

A minimal OpenRISC "hello world" example could be useful for many - nothing but a processor, on-chip RAM with initial values for program code, and an LED.


mnentwig 11 years ago

wrong link: blog.tube42.se/?p=105


mnentwig 11 years ago

well... as fascinating as this candy business is, I was trying to link to "Tubologue | The sad state of OSS hardware (part 1)" but usenet won't let me... Lost in quotation...


jim.brakefield 11 years ago

Have an experimental processor core that needs the VHDL for the control logic to be written.

Jim Brakefield


rickman 11 years ago

There are a million processor cores out there. What is interesting about yours?

Rick


jim.brakefield 11 years ago

Hybrid between stack, accumulator and memory-oriented instruction sets (1 to 4 stack pointers with offset addressing, frame & thread pointers, single block RAM; data size orthogonal, single to quad issue capable, fast interrupts).

Intent is that it can be used as an accumulator machine, a stack machine or a C machine. Everything except a RISC machine. All pointer registers (including PC) are in a LUT RAM, stacks are in the block RAM (at least in a minimal implementation).


Tom Gardner 11 years ago

To be a bit belligerent, those are all internal /features/, not /benefits/ visible to a user of the (black-box) processor. Certainly they are all more-or-less useless without tool support (e.g. compiler, debuggers).

Now if you had said that the processor used minimal power, or had fixed execution times for all instructions (so that the compiler/IDE could define the execution time of each block/loop/subroutine), then that might have been of benefit to the user of the black box.


jim.brakefield 11 years ago

It does have fixed execution times. It is intended for the hard real-time embedded market. Power should be minimal: estimating 300 Spartan 6 LUTs + multiplier for the 16-bit version.


glen herrmannsfeldt 11 years ago

(snip)

Well, for a high-level language programmer, I suppose. But for assembly programmers, those are mostly still visible.

Now, we could all say that it doesn't matter, that Intel won the world, but I might believe that there is still something left out there, especially if the goal isn't to get rich.

Also, there might still be some room for new ideas in soft processors.

-- glen


Tom Gardner 11 years ago

OK, that's a benefit in some situations. I wonder how it compares to the XMOS processors, which claim the same advantage and are commercially available at Digikey

formatting link

They have good tool support.


Tom Gardner 11 years ago

True, but it doesn't really invalidate my point.

No, they've carved themselves out a large lucrative niche :)

Very definitely. But then I've worked a few miles from the origin of other commercial processors, e.g. ARM in Cambridge and XMOS in Bristol.

A very valid goal, but please be explicit about that so that other people can quickly assess its viability.

Very definitely. Over the past half decade there's been an explosion of new commercial processor families. Most will fall by the wayside, but some will succeed.


rickman 11 years ago

I'm not quite understanding. First, I don't really know what an "accumulator" machine is other than one which has very limited instructions that don't let you do much other than move stuff through an accumulator. Is there some advantage to an accumulator CPU?

So it is a stack machine with stack pointers into on chip memory. You feel it is useful as a platform for C. Do you have any plans to provide a C compiler for this?

I would find it interesting if you could compare this to the ZPU, a 32 bit soft core designed for C which can be quite small. I believe the small version fits in around 500 LUT4s. I'm not sure how to compare LUT4s to the LUT6s found in the Spartan 6. But the ZPU is quite slow when running C code and possibly any code. It takes a lot of CPU cycles to do nearly anything. Do you have any timing info on your design?

Rick


jim.brakefield 11 years ago

For me the classic accumulator machine is the CDC 1604. Each instruction addresses a memory location, and the result of the operation between memory and accumulator is left either in the accumulator or in memory. The CDC 1604 had six index registers.

The stack pointers can be used either as stack pointers or as index registers. The 2nd operand uses a pointer + offset address. So the second operand can be somewhere on any of the stacks, or at an absolute address, a relative address, or an immediate.

Would like to have a C compiler. Probably beyond my ability. Am comfortable with assembler. Consider the programming model up for grabs. E.g., this can be considered a research machine.

ZPU has a limited instruction set. Here, have tried to put as much functionality as possible into each instruction, so each instruction does the work of several RISC instructions while keeping code density high.

Typically it takes about 1.5 4LUTs to equal a 6LUT or an Altera ALUT.

formatting link

Click on family comparison link.
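As a rough illustration of the 1.5:1 rule of thumb above, the ZPU figure quoted earlier in the thread (around 500 LUT4s) can be converted to an approximate LUT6 count. This is only a sketch: the function name is made up for the example, and the ratio is the rule of thumb from this post, not a vendor figure.

```python
def lut4_to_lut6(lut4_count, lut4_per_lut6=1.5):
    """Estimate the LUT6 count equivalent to a given LUT4 count.

    lut4_per_lut6 is the rule-of-thumb ratio from the post (about 1.5
    4-input LUTs per 6-input LUT); real designs will vary.
    """
    return lut4_count / lut4_per_lut6


# ZPU "small" variant, as quoted in the thread: around 500 LUT4s.
zpu_lut6 = lut4_to_lut6(500)
print(f"ZPU small: roughly {zpu_lut6:.0f} LUT6s")
```

By this estimate the small ZPU would land in the same range as the 300 Spartan 6 LUTs estimated for the 16-bit version of the core discussed here, though the two have different word sizes.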

As currently designed, instructions take 2, 3 or 4 clock cycles with a weighted average of 3.25 clock cycles (branches take 2, arithmetic 4). Am aiming to get 200 MHz clock frequency on a Kintex-7 part. Straightforward to double this by executing two instructions one clock apart using dual-port block RAM.
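The arithmetic behind those figures can be sketched as follows. The instruction mix below is an assumption chosen only because it reproduces the stated 3.25-cycle average; the post does not give the actual mix. The function names are likewise hypothetical.

```python
def weighted_cpi(mix):
    """Average clocks per instruction for a {cycles: fraction} mix."""
    total = sum(mix.values())
    return sum(cycles * frac for cycles, frac in mix.items()) / total


def instructions_per_second(clock_hz, cpi, issue_width=1):
    """Instruction rate for a given clock, average CPI and issue width."""
    return clock_hz * issue_width / cpi


# Assumed mix (not from the post): 25% 2-cycle, 25% 3-cycle, 50% 4-cycle.
ASSUMED_MIX = {2: 0.25, 3: 0.25, 4: 0.50}

cpi = weighted_cpi(ASSUMED_MIX)
single = instructions_per_second(200e6, cpi)
dual = instructions_per_second(200e6, cpi, issue_width=2)
print(f"CPI {cpi:.2f}: {single / 1e6:.1f} M instr/s single issue, "
      f"{dual / 1e6:.1f} M instr/s dual issue")
```

At 200 MHz and 3.25 cycles per instruction this works out to about 61.5 million instructions per second, or about 123 million if the dual-issue scheme doubles throughput as hoped.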


rickman 11 years ago

Ok, so you are shooting for high code density. Have you done any comparisons with other machines? Saying "each instruction does the work of several RISC instructions" is just shooting from the hip.

Rick


jim.brakefield 11 years ago

There are several sources of code inefficiency in the standard RISC instruction set:
A) 16-bit immediates and displacements when they are mostly under 8 bits.
B) 15 bits per instruction for register locations (3x5).
C) Load and store instructions in addition to calculation instructions.
D) Separate address modification instructions.
and worst of all:
E) Subroutine overhead.

A thru D: Besides the normal code density advantage of non-RISC and compacted RISC versus standard RISC, the architecture supports instruction byte granularity, with the single-byte instructions being stack instructions. Don't have any statistics.

E: The standard C model for subroutines has the effect of discouraging short subroutines. This is where stack machines gain a big advantage.


rickman 11 years ago

Ok, that is compared to RISC in a subjective manner. How about other ISA types? MISC? Other CISC designs?

I find it interesting that you refer to issues in using C while you have no intent to work toward having your CPU supported by a C compiler. Is anything to do with C important then?

I would point out that even if you reach 200 MHz operation on an FPGA that will be approximately the same as running at 57 MHz if your instructions used a single clock cycle, not a tricky goal usually. In addition that makes many aspects of the machine simpler.

Rick


jim.brakefield 11 years ago

Tend to consider C as typical of using a single memory stack to hold subroutine frames, parameters, result, previous frame pointer and return address. However, some compilers probably manage to keep some of this in registers. Original Fortran used parameter lists and globally allocated memory for non-recursive subroutines. On a dual stack machine, parameters can be moved to the return stack to create a frame. My intent is that the ISA support any of these memory usages and others as well.
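A toy model of the dual-stack idea, in which parameters are moved from the data stack to the return stack to form a frame, might look like this. It is a sketch under assumed conventions (class name, method names, frame layout and offset addressing are all invented here) and says nothing about how the actual ISA encodes this.

```python
class DualStackMachine:
    """Toy model: a data stack plus a return stack that also holds frames."""

    def __init__(self):
        self.data = []  # operand (data) stack
        self.ret = []   # return stack: return addresses and frames

    def call(self, return_addr, n_params):
        """Save the return address, then move n_params items from the
        data stack to the return stack to form the callee's frame."""
        self.ret.append(return_addr)
        for _ in range(n_params):
            self.ret.append(self.data.pop())

    def param(self, offset):
        """Read a frame entry by offset from the top of the return stack."""
        return self.ret[-1 - offset]

    def leave(self, n_params):
        """Drop the frame and hand back the saved return address."""
        del self.ret[len(self.ret) - n_params:]
        return self.ret.pop()


m = DualStackMachine()
m.data.extend([10, 20])        # caller pushes two parameters
m.call(return_addr=0x100, n_params=2)
first, second = m.param(0), m.param(1)
resume_at = m.leave(n_params=2)
```

After the call the data stack is empty and free for the callee's own expression evaluation, while the parameters stay addressable by offset, which is the property that makes short subroutines cheap on such a machine.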

If each instruction does twice as much as a RISC instruction and if the dual issue does not increase the LUT count significantly, then will have a 200 MHz RISC equivalent. Without pipelining.

Also possible to use LUT RAM for the stacks and increase the execution rate. For now content to go with the slower design. Yes, RISCs are simple. Keeping all the stacks in memory and all the pointers in LUT RAM is also simple. For this instruction set, address ALU is 100% busy and data ALU is 30% busy. With dual issue one needs a second address ALU, and data ALU is 60% busy.
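The utilization figures follow from simple scaling: doubling the issue rate doubles the demand on each kind of ALU, and any demand above 100% needs another unit. A minimal sketch, with a made-up helper name and percentages taken from the paragraph above:

```python
def units_needed(busy_pct, issue_width):
    """Return (number of units, per-unit busy %) for a functional unit
    that is busy_pct percent busy at single issue."""
    demand = busy_pct * issue_width          # total demand in percent
    units = max(1, -(-demand // 100))        # ceiling division, at least 1
    return units, demand / units


addr_units, addr_busy = units_needed(100, issue_width=2)
data_units, data_busy = units_needed(30, issue_width=2)
print(f"address ALUs: {addr_units} at {addr_busy:.0f}% busy")
print(f"data ALUs:    {data_units} at {data_busy:.0f}% busy")
```

This reproduces the numbers in the post: two address ALUs, each fully busy, and a single data ALU at 60%.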

Am aiming for a low LUT count, single block RAM design. My figure of merit is instructions per second per LUT (with adjustment for word size). Very easy to add a few features and double the LUT count. There is an extensive comparison of soft core processors at:

formatting link

click on best of each design link

Jim Brakefield


rickman 11 years ago

Ok, let us know how it shakes out.

Rick


David Brown 11 years ago

There is no requirement to have a stack for C - the standards don't even mention the word. And there are C implementations for machines that have little or no stack (perhaps just a short hardware return stack). But the most common arrangement is a single stack for frames, parameters, and return addresses, with passed data and local variables in registers where possible and on the stack when necessary.

The key reason for having a single stack is not processor efficiency, but for simplicity of memory management. Starting from low memory, you have program code, then statically allocated data, and then the heap (for dynamic memory) grows up into free space. The stack starts at the top of memory and grows downwards, until it hits the heap and the system crashes.
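The layout described above can be modelled in a few lines. This is a sketch with invented names and sizes; unlike many real embedded systems, it checks for the collision and raises an error instead of silently corrupting memory.

```python
class FlatMemory:
    """Single address space: code, static data, heap growing up,
    stack growing down from the top of memory."""

    def __init__(self, size, code_size, static_size):
        self.heap_top = code_size + static_size  # next free heap address
        self.stack_ptr = size                    # stack starts at the top

    def free_bytes(self):
        return self.stack_ptr - self.heap_top

    def heap_alloc(self, nbytes):
        """Grow the heap upwards; fail if it would run into the stack."""
        if nbytes > self.free_bytes():
            raise MemoryError("heap would collide with stack")
        addr = self.heap_top
        self.heap_top += nbytes
        return addr

    def stack_push(self, nbytes):
        """Grow the stack downwards; fail if it would run into the heap."""
        if nbytes > self.free_bytes():
            raise MemoryError("stack would collide with heap")
        self.stack_ptr -= nbytes
        return self.stack_ptr


mem = FlatMemory(size=1024, code_size=256, static_size=128)
block = mem.heap_alloc(100)   # heap now ends at 484
frame = mem.stack_push(500)   # stack now starts at 524
remaining = mem.free_bytes()  # 40 bytes left between them
```

With a single stack, the only bookkeeping needed is the gap between two pointers, which is the simplicity argument made in the paragraph above; each additional stack needs its own region and its own limit.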

If you are happy with a segmented memory, then it may be more efficient to have multiple stacks for different purposes. This is particularly true if the hardware can access the stacks simultaneously.

There is quite a bit of information available on the net for multiple stack systems. Many target Forth rather than C, as Forth is a highly stack-oriented language.

Superscalar execution (handling multiple instructions simultaneously) is usually considered much more advanced and complex to implement than pipelining, which has been common on CPU cores for decades.
