Soft core processors: RISC versus stack/accumulator for equal FPGA resources

It would appear there are very similar resource needs for either RISC or stack/accumulator architectures when both are of the "load/store" classification: the same multi-port LUT RAM serves as either the RISC register file or the dual stacks, with a DSP block for multiply and block RAM for main memory. "Load/store" here refers to using distinct instructions for moving data between LUT RAM and block RAM.

Has anyone studied this situation? It would appear that stack/accumulator program code would be denser? And that multiple instruction issue would be simpler with RISC?

Jim Brakefield

Reply to
jim.brakefield

I've done a little investigation, and the code for a stack processor was not much denser than the code for the RISC CPU I compared it to. I don't recall which one it was.

A lot depends on the code you use for comparison. I was using loops that move data. Many stack processors suffer some inefficiency because of the stack juggling required in some code. Proponents usually say the code can be written to reduce the juggling of operands, which I have found to be mostly true. If you code to reduce the parameter juggling, stack processors can be somewhat more efficient in terms of code space usage.
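As a rough illustration of that juggling cost, here is a toy stack VM invented for this sketch (the opcodes and programs are hypothetical, not any particular processor): the same subtraction needs an extra SWAP when earlier code leaves the operands on the stack in the unfavorable order, and no juggling at all when the producer pushes them in the order the consumer wants.

/* Toy stack VM used only to illustrate operand "juggling".
   Opcodes, programs, and numbers are invented for this sketch. */
#include <stdio.h>

enum op { PUSH, SWAP, SUB, HALT };
struct insn { enum op op; int imm; };

static int run(const struct insn *p, int *count)
{
    int stack[8], sp = 0;               /* sp = number of items on the stack */
    *count = 0;
    for (;; p++) {
        (*count)++;
        switch (p->op) {
        case PUSH: stack[sp++] = p->imm; break;
        case SWAP: { int t = stack[sp-1]; stack[sp-1] = stack[sp-2]; stack[sp-2] = t; } break;
        case SUB:  { int t = stack[--sp]; stack[sp-1] -= t; } break;  /* next-on-stack minus top */
        case HALT: (*count)--; return stack[sp-1];
        }
    }
}

int main(void)
{
    /* Both programs compute a - b with a = 7, b = 3. */
    struct insn juggled[]  = { {PUSH,3}, {PUSH,7}, {SWAP,0}, {SUB,0}, {HALT,0} };
    struct insn arranged[] = { {PUSH,7}, {PUSH,3}, {SUB,0},  {HALT,0} };
    int n, r;
    r = run(juggled, &n);
    printf("juggled:  result %d in %d ops\n", r, n);   /* 4 in 4 ops */
    r = run(arranged, &n);
    printf("arranged: result %d in %d ops\n", r, n);   /* 4 in 3 ops */
    return 0;
}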

I have looked at a couple of things as alternatives. One is to use VLIW to allow as much parallelism as possible among the execution units within the processor: the data unit, the address unit, and the instruction unit. This presents some inherent inefficiency in that a fixed-size instruction field is used to control the instruction unit even though most IU instructions are just "next", for example. But it allows both the address unit and the data unit to be doing work at the same time, for example moving data to/from memory while counting a loop iteration.
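A sketch of how such a word might be packed, one slot per execution unit; the 16-bit width, field positions, and opcode names here are invented for illustration rather than taken from the design described above:

/* Hypothetical 16-bit VLIW word with one slot per execution unit. */
#include <stdint.h>
#include <stdio.h>

#define IU_FIELD(w)  ((uint16_t)(w) & 0x7u)          /* instruction unit: next/jump/call/return      */
#define AU_FIELD(w)  (((uint16_t)(w) >> 3) & 0x1Fu)  /* address unit: address arithmetic, loop count */
#define DU_FIELD(w)  (((uint16_t)(w) >> 8) & 0xFFu)  /* data unit: ALU op on the data stack          */

int main(void)
{
    /* The IU slot is just "next" (0), the common case whose fixed field
       is the inefficiency mentioned above, yet the AU and DU slots still
       carry useful work in the same cycle. */
    uint16_t w = (uint16_t)((0x42u << 8) | (0x11u << 3) | 0x0u);
    printf("IU=%u AU=%u DU=0x%02X\n",
           (unsigned)IU_FIELD(w), (unsigned)AU_FIELD(w), (unsigned)DU_FIELD(w));
    return 0;
}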

Another potential stack optimization I have looked at is combining register and stack concepts by allowing very short offsets from top of stack for a given operand, along with variable-size stack adjustments. I didn't pursue this very far, but I think it has the potential to virtually eliminate operand juggling, making the stack processor much faster. I'm not sure of the effect on code size because of the larger instruction size.
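A minimal sketch of that register/stack hybrid, assuming an invented encoding in which each ALU instruction names its second operand by a small offset below the top of stack and carries a signed stack adjustment; the field names and semantics are assumptions, not a worked-out ISA:

#include <stdio.h>

struct insn {
    enum { ADD, SUB } op;
    int offs;   /* second operand sits 'offs' items below next-on-stack (0 = next-on-stack) */
    int adj;    /* net change in stack depth after the operation                            */
};

static int stack[16], sp;   /* sp indexes the top-of-stack item */

static void exec(struct insn i)
{
    int a = stack[sp];              /* implicit operand: top of stack           */
    int b = stack[sp - 1 - i.offs]; /* addressed operand: short offset below TOS */
    int r = (i.op == ADD) ? a + b : a - b;
    sp += i.adj;                    /* variable-size stack adjustment            */
    stack[sp] = r;                  /* result lands on the new top of stack      */
}

int main(void)
{
    /* Stack (bottom..top): 10 20 5.  One instruction adds the buried 10 to
       the 5 on top with no OVER/SWAP juggling: the 10 is named by offset 1,
       the depth shrinks by one, and the sum overwrites the new top, leaving
       (bottom..top): 10 15. */
    stack[0] = 10; stack[1] = 20; stack[2] = 5; sp = 2;
    exec((struct insn){ ADD, 1, -1 });
    printf("top = %d, depth = %d\n", stack[sp], sp + 1);   /* top = 15, depth = 2 */
    return 0;
}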

--

Rick
Reply to
rickman

On Sunday, September 27, 2015 at 3:37:24 AM UTC+9:30, snipped-for-privacy@ieee.org wrote:

I worked with the 1980s Lilith computer and its Modula-2 compiler, which used a stack-based architecture. Christian Jacobi includes a detailed analysis of the code generated in his dissertation titled "Code Generation and the Lilith Architecture". You can download a copy from my website:

formatting link
I am currently working on the 2015 RISC equivalent - the FPGA RISC5 Oberon compiler used in Project Oberon:

formatting link

The code generation is described in detail in the included documentation.

I have both systems in operation and have some very similar test programs for both. I'll experiment to see if the results give any surprises. Any comparison would have to take into account the fact that the Lilith was a 16-bit architecture whereas RISC5 is 32-bit, so it might be tricky.

Regards, Chris Burrows CFB Software

formatting link

Reply to
cfbsoftware

Have considered multiple stacks as a form of VLIW: each stack having its own part of the VLIW instruction or, if it has nothing to do, providing future immediates for any of the other stacks' instructions.

Also, this is a way to improve the processing rate, as there are fewer instructions than in "pure" stack code (each instruction has a stack/accumulator operation and a small offset for the other operand). While one is at it, one can add various instruction bits for "return", stack/accumulator mode, replace operation, stack pointer selector, ...
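One possible shape for such a two-slot word, sketched with C bit-fields; the field names, widths, and the convention that an idle slot's bits supply an immediate for the other slot are all invented here for illustration:

#include <stdio.h>

struct slot {                 /* one 16-bit slot per stack */
    unsigned op      : 4;     /* stack/accumulator operation (0 = idle)         */
    unsigned offs    : 3;     /* small offset to the other operand              */
    unsigned ret     : 1;     /* fold a subroutine return into this instruction */
    unsigned sel     : 1;     /* stack pointer selector                         */
    unsigned replace : 1;     /* replace top of stack instead of pushing        */
    unsigned imm     : 6;     /* immediate payload when the slot is idle        */
};

struct vliw_word { struct slot a, b; };   /* one slot per stack */

int main(void)
{
    /* Slot 'a' has nothing to do, so its bits feed the literal 42 to slot 'b'. */
    struct vliw_word w = { .a = { .op = 0, .imm = 42 },
                           .b = { .op = 3, .offs = 1, .ret = 1 } };
    if (w.a.op == 0)
        printf("slot b: op %u, offset %u, return=%u, immediate %u\n",
               (unsigned)w.b.op, (unsigned)w.b.offs, (unsigned)w.b.ret, (unsigned)w.a.imm);
    return 0;
}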

Personally, don't have hard numbers for any of this (there are open-source stack machines with small offsets and various instruction bits; what is needed is compilers so that comparisons can be done). And don't want to duplicate any work (AKA research) that has already been done.

Jim Brakefield

Reply to
jim.brakefield

On Saturday, September 26, 2015 at 8:19:29 PM UTC-5, snipped-for-privacy@gmail.com wrote:

And in the 1980s main memory access time was a smaller multiple of the clock period than with today's DRAMs. However, the main memory for the RISC5 FPGA card is asynchronous static RAM with a fast access time, and so comparable to the main memory of the Lilith?

Jim Brakefield

Reply to
jim.brakefield

I assume you mean two data stacks? I was trying hard not to expand on the hardware significantly. The common stack machine typically has two stacks, one for data and one for return addresses. In Forth the return stack is also used for loop counting. My derivation uses the return stack for addresses such as memory accesses as well as jumps/calls, so I call it the address stack. This lets it do minimal arithmetic (loop counting and incrementing addresses) and reduces stack ops on the data stack, such as the two drops required for a memory write.
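A toy model of that memory-write case (the function names store_classic and store_astack are invented for this sketch): the classic store drops both the address and the value from the data stack, whereas keeping the address on the address stack costs only the value and lets the address post-increment for data-move loops.

#include <stdio.h>

static int mem[256];
static int dstack[16], dsp = -1;    /* data stack              */
static int astack[16], asp = -1;    /* address / return stack  */

/* Classic two-stack Forth-style store: address and value both live on
   the data stack, so the store consumes ("drops") two items from it.  */
static void store_classic(void)
{
    int addr = dstack[dsp--];
    int val  = dstack[dsp--];
    mem[addr] = val;
}

/* Store with the address kept on the address stack: only the value is
   popped from the data stack, and the address is post-incremented in
   the same step, which is handy in data-move loops.                   */
static void store_astack(void)
{
    mem[astack[asp]++] = dstack[dsp--];
}

int main(void)
{
    dstack[++dsp] = 7; dstack[++dsp] = 100;       /* value 7, address 100 */
    store_classic();
    astack[++asp] = 101; dstack[++dsp] = 8;       /* address 101, value 8 */
    store_astack();
    printf("mem[100]=%d mem[101]=%d astack top now %d\n",
           mem[100], mem[101], astack[asp]);
    return 0;
}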

Yes, returns are common, so it can be useful to provide minimal instruction overhead for them. The other things can require extra hardware.

--

Rick
Reply to
rickman

Reply:

Yes, in particular integer arithmetic on one and floating-point on the other.

On using the return stack for memory addresses as well as jumps/calls (the "address stack"): OK.

With FPGA 6LUTs one can have several read ports (4LUT RAM can do it also, it's just not as efficient). At one operation per clock, and mapping both data and address stacks to the same LUT RAM, one has two ports for operand reads, one port for the result write, and one port for the "return" address read. Just about any stack or accumulator operation that fits these constraints is possible with appropriate instruction decode and ALU. The SWAP operation requires two writes, so one would need to make TOS a separate register to do it in one clock (other implementations are possible using two multi-port LUT RAMs).
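A toy single-cycle model of that SWAP case, assuming TOS is held in a separate register with the rest of the stack in LUT RAM; under that assumption SWAP needs only one RAM read and one RAM write per clock (port counts and names are illustrative, this is C, not a netlist):

#include <stdio.h>

static int lutram[64];   /* stack body in LUT RAM       */
static int sp;           /* points at next-on-stack     */
static int tos;          /* top of stack in a register  */

static void op_push(int v) { lutram[++sp] = tos; tos = v; }   /* 1 write */

static void op_swap(void)                                     /* 1 read + 1 write */
{
    int nos = lutram[sp];    /* read port:  next-on-stack            */
    lutram[sp] = tos;        /* write port: old TOS takes its place  */
    tos = nos;               /* register update, no extra RAM port   */
}

int main(void)
{
    op_push(1); op_push(2);                       /* top of stack: 2, next: 1 */
    op_swap();                                    /* top of stack: 1, next: 2 */
    printf("tos=%d nos=%d\n", tos, lutram[sp]);   /* tos=1 nos=2 */
    return 0;
}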

Jim

Reply to
jim.brakefield

Yes, if you need floating point a separate stack is often used.

I used a TOS register for each stack, and a write port and read port for each stack in one block RAM; the write and read ports share the address. A read happens on each cycle automatically, and in all the parts I have used the RAM can be set so that data written in a cycle shows up on the read port, so the read port holds the next-on-stack at all times.

Managing the stack pointers can get a bit complex if an effort is not made to keep them simple. As it was, the stack pointer was in the critical timing path, which ended in the flag registers: the stack pointers set error flags in the CPU status register for overflow and underflow. I thought this would be useful for debugging, but there are likely ways to minimize the timing overhead.
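A minimal sketch of that stack-pointer bookkeeping, where the pointer update also derives sticky overflow and underflow flags for a status register; the depth, names, and saturate-and-flag behaviour are assumptions for illustration:

#include <stdio.h>
#include <stdbool.h>

#define DEPTH 32

struct stack_ctl {
    unsigned sp;          /* number of items currently on the stack */
    bool ovf, unf;        /* sticky status flags for debugging      */
};

static void sp_push(struct stack_ctl *s)
{
    if (s->sp == DEPTH) s->ovf = true;   /* would exceed the RAM: flag it */
    else                s->sp++;
}

static void sp_pop(struct stack_ctl *s)
{
    if (s->sp == 0) s->unf = true;       /* nothing to pop: flag it */
    else            s->sp--;
}

int main(void)
{
    struct stack_ctl s = { 0, false, false };
    sp_pop(&s);                                       /* underflow on purpose */
    for (int i = 0; i < DEPTH + 1; i++) sp_push(&s);  /* overflow on purpose  */
    printf("sp=%u ovf=%d unf=%d\n", s.sp, s.ovf, s.unf);
    return 0;
}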

--

Rick
Reply to
rickman

Rather than trying to paraphrase the information and risk getting it wrong I refer you to a detailed description of the Lilith memory organisation in the 'Lilith Computer Hardware Manual'. You can download a copy of this and several other related documents from BitSavers:

formatting link

Regards, Chris

Reply to
cfbsoftware
