Soft core processors: RISC versus stack/accumulator for equal FPGA resources

It would appear there are very similar resource needs for either RISC or stack/accumulator architectures when both are of the "load/store" classification.
Herein, the same multi-port LUT RAM serves as either the RISC register file or the dual stacks, with a DSP for multiply and block RAM for main memory.  "Load/store" refers to using distinct instructions for moving data between LUT RAM and block RAM.

Has someone studied this situation?
Would appear the stack/accumulator program code would be denser?
Would appear multiple instruction issue would be simpler with RISC?

Jim Brakefield

Re: Soft core processors: RISC versus stack/accumulator for equal FPGA resources
On 9/26/2015 2:07 PM, snipped-for-privacy@ieee.org wrote:

I've done a little investigation and the instruction set for a stack  
processor was not much denser than the instruction set for the RISC CPU  
I compared it to.  I don't recall which one it was.

A lot depends on the code you use for comparison.  I was using loops that move data.  Many stack processors have some inefficiency because of the stack juggling required in some code.  Proponents usually say the code can be written to reduce the juggling of operands, which I have found to be mostly true.  If you code to reduce the parameter juggling, stack processors can be somewhat more efficient in terms of code space usage.

I have looked at a couple of things as alternatives.  One is to use VLIW to allow as much parallelism as possible among the execution units within the processor: the data unit, the address unit and the instruction unit.  This presents some inherent inefficiency in that a fixed-size instruction field is used to control the instruction unit even though most IU instructions are just "next".  But it allows the address unit and the data unit to be doing work at the same time, for example moving data to/from memory while counting a loop iteration.
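
As a rough illustration of that trade-off (the 16-bit size, field widths and mnemonics below are hypothetical, not the actual encoding), a three-slot word might be packed like this in C:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical 3-slot VLIW word; field widths and mnemonics are illustrative. */
enum iu_op { IU_NEXT = 0, IU_JMP, IU_CALL, IU_RET };   /* instruction unit */
enum au_op { AU_NOP  = 0, AU_INC, AU_LOAD, AU_STORE }; /* address unit     */
enum du_op { DU_NOP  = 0, DU_ADD, DU_SUB, DU_AND };    /* data unit        */

/* Pack the three slots into one 16-bit word: [15:12]=IU, [11:6]=AU, [5:0]=DU.
   The IU field is carried in every word even when it only says "next". */
static uint16_t pack_vliw(unsigned iu, unsigned au, unsigned du)
{
    return (uint16_t)(((iu & 0xFu) << 12) | ((au & 0x3Fu) << 6) | (du & 0x3Fu));
}

int main(void)
{
    /* One word keeps the address unit (loop counting) and the data unit
       (the data move) busy in the same cycle; the IU slot is spent on "next". */
    uint16_t w = pack_vliw(IU_NEXT, AU_INC, DU_ADD);
    printf("word = 0x%04X\n", (unsigned)w);
    return 0;
}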

Another potential stack optimization I have looked at is combining register and stack concepts by allowing very short offsets from the top of stack to be used for a given operand, along with variable-size stack adjustments.  I didn't pursue this very far, but I think it has the potential to virtually eliminate operand juggling, making a stack processor much faster.  I'm not sure of the effect on code size because of the larger instruction size.
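
A minimal sketch of the idea in C (the stack model, the offsets and the ROT+ADD comparison are illustrative only, not an actual instruction set):

#include <stdio.h>

/* Toy data stack: sp is the number of cells in use, TOS is stack[sp-1]. */
static int stack[32];
static int sp;

static void push(int v) { stack[sp++] = v; }
static int  pop(void)   { return stack[--sp]; }

/* Pure stack style: to add the cell two down to TOS you first juggle it
   to the top, e.g. ROT ( a b c -- b c a ) followed by ADD. */
static void rot(void) { int c = pop(), b = pop(), a = pop(); push(b); push(c); push(a); }
static void add(void) { int y = pop(), x = pop(); push(x + y); }

/* Offset style: one instruction names the second operand 'off' cells below
   TOS and replaces TOS with the sum; no juggling needed. */
static void add_off(int off) { stack[sp - 1] += stack[sp - 1 - off]; }

int main(void)
{
    push(10); push(20); push(30);   /* stack: 10 20 30 */
    rot(); add();                   /* two instructions, result 40 on top */
    printf("pure stack: %d\n", pop());

    sp = 0;
    push(10); push(20); push(30);   /* stack: 10 20 30 */
    add_off(2);                     /* one instruction, result 40 on top */
    printf("with offset: %d\n", pop());
    return 0;
}

The offset form does in one instruction what took a ROT plus an ADD, at the cost of carrying the offset bits in every instruction, which is where the code size question comes in.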

--  

Rick

Re: Soft core processors: RISC versus stack/accumulator for equal FPGA resources
On Saturday, September 26, 2015 at 3:02:27 PM UTC-5, rickman wrote:
Have considered multiple stacks as a form of VLIW: each stack having its own part of the VLIW instruction, or if nothing to do, providing future immediates for any of the other stack instructions.

Also this is a way to improve processing rate as there are fewer instructions than "pure" stack code (each instruction has a stack/accumulator operation and a small offset for the other operand).  While one is at it, one can add various instruction bits for "return", stack/accumulator mode, replace operation, stack pointer selector, ...
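
Purely as an illustration of how few bits such a format needs (a hypothetical 16-bit layout; the widths and field names are guesses, not taken from any of the open source cores):

#include <stdio.h>

/* Hypothetical 16-bit stack/accumulator instruction layout. */
typedef struct {
    unsigned opcode  : 5;  /* ALU / load / store operation                    */
    unsigned offset  : 4;  /* second operand, 0..15 cells below top of stack  */
    unsigned replace : 1;  /* 1 = overwrite that cell, 0 = push the result    */
    unsigned mode    : 1;  /* 0 = stack discipline, 1 = accumulator           */
    unsigned stksel  : 1;  /* which stack pointer the operation uses          */
    unsigned ret     : 1;  /* fold a subroutine return into this instruction  */
    unsigned pad     : 3;  /* spare */
} stk_insn;

int main(void)
{
    /* An add that reaches two cells down, overwrites that cell, and also
       performs a return, all in one instruction word. */
    stk_insn i = { .opcode = 1, .offset = 2, .replace = 1, .ret = 1 };
    printf("opcode=%u offset=%u replace=%u ret=%u\n",
           (unsigned)i.opcode, (unsigned)i.offset,
           (unsigned)i.replace, (unsigned)i.ret);
    return 0;
}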

Personally, don't have hard numbers for any of this (there are open source stack machines with small offsets and various instruction bits, what is needed is compilers so that comparisons can be done).  And don't want to duplicate any work (AKA research) that has already been done.

Jim Brakefield

Re: Soft core processors: RISC versus stack/accumulator for equal FPGA resources
On 9/27/2015 8:30 PM, snipped-for-privacy@ieee.org wrote:


I assume you mean two data stacks? I was trying hard not to expand on  
the hardware significantly. The common stack machine is typically two  
stacks, one for data and one for return addresses. In Forth the return  
stack is also used for loop counting. My derivation uses the return  
stack for addresses such as memory accesses as well as jump/calls, so I  
call it the address stack. This lets you do minimal arithmetic (loop  
counting and incrementing addresses) and reduces stack ops on the data  
stack such as the two drops required for a memory write.
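
A toy model of that difference (my own sketch, not the actual hardware): with only a data stack a store consumes both the address and the value from it, while with a separate address stack the data stack gives up only the value and the address can stay put, post-incremented, for the next access:

#include <stdio.h>

static int mem[256];

/* Toy machine with a data stack and an address stack. */
static int dstack[16], dsp;   /* data stack    */
static int astack[16], asp;   /* address stack */

static void dpush(int v) { dstack[dsp++] = v; }
static int  dpop(void)   { return dstack[--dsp]; }
static void apush(int v) { astack[asp++] = v; }

/* Classic single-stack store ( addr value -- ): both operands come off
   the data stack, i.e. two drops per write. */
static void store_classic(void)
{
    int value = dpop();
    int addr  = dpop();
    mem[addr] = value;
}

/* Store using the address stack: only the value leaves the data stack;
   the address stays on the address stack and is post-incremented, so
   the next write needs no address work on the data stack at all. */
static void store_astack(void)
{
    mem[astack[asp - 1]++] = dpop();
}

int main(void)
{
    dpush(100); dpush(42); store_classic();   /* two data-stack drops */
    apush(101); dpush(43); store_astack();    /* one data-stack drop  */
    dpush(44);             store_astack();    /* address already set  */
    printf("%d %d %d\n", mem[100], mem[101], mem[102]);
    return 0;
}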


Yes, returns are common so it can be useful to provide a minimal  
instruction overhead for that.  The other things can require extra  
hardware.


--  

Rick

Re: Soft core processors: RISC versus stack/accumulator for equal FPGA resources
On Sunday, September 27, 2015 at 10:20:39 PM UTC-5, rickman wrote:

Yes, in particular integer arithmetic on one and floating-point on the other.

OK on using the return stack for memory addresses as well as jump/calls, i.e. the address stack.

With FPGA 6LUTs one can have several read ports (4LUT RAM can do it also, it's just not as efficient).  At one operation per clock and mapping both data and address stacks to the same LUT RAM, one has two ports for operand reads, one port for the result write and one port for the "return" address read.  Just about any stack or accumulator operation that fits these constraints is possible with appropriate instruction decode and ALU.  The SWAP operation requires two writes, so one would need to make TOS a separate register to do it in one clock (other implementations are possible using two multiport LUT RAMs).
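
A rough software model of that port budget (hypothetical, not any particular core):

#include <stdio.h>

/* Two read ports feed the ALU, one write port takes the result; a fourth
   port would read the "return" address from the same LUT RAM. */
static int lutram[64];
static int dsp = -1;   /* data stack pointer, grows upward */

static int add(int a, int b) { return a + b; }

/* A binary ALU op fits the port budget in one clock: TOS and NOS come from
   the two read ports, the result goes back through the single write port
   to where NOS was, and the stack shrinks by one. */
static void alu2(int (*op)(int, int))
{
    int tos = lutram[dsp];            /* read port 1        */
    int nos = lutram[dsp - 1];        /* read port 2        */
    lutram[dsp - 1] = op(nos, tos);   /* the one write port */
    dsp--;
}

int main(void)
{
    lutram[++dsp] = 3;
    lutram[++dsp] = 4;
    alu2(add);
    printf("%d\n", lutram[dsp]);      /* 7 */
    /* SWAP would need two writes in the same clock, which the single write
       port cannot do, hence making TOS a separate register. */
    return 0;
}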

Jim

Re: Soft core processors: RISC versus stack/accumulator for equal FPGA resources
On 9/28/2015 12:31 AM, snipped-for-privacy@ieee.org wrote:

Yes, if you need floating point a separate stack is often used.



I used a TOS register for each stack and a write port and read port for each stack in one block RAM.  The write/read ports share the address.  A read happens on each cycle automatically, and in all the parts I have used the RAM can be set so the data written in a cycle shows up on the read port, so the next on stack is available at all times.
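
An approximate model of that arrangement (the push/pop sequencing here is a sketch, not the actual HDL): the top of stack lives in a register, the block RAM holds next-on-stack and below, and the write-first read behaviour means next-on-stack is always visible:

#include <stdio.h>

/* One stack: TOS in a register, everything below it in a block RAM with
   one write port and one read port sharing the same pointer.  Write-first
   read behaviour means the cell written this cycle is what the read port
   shows, so next-on-stack is always available. */
static int tos;        /* top-of-stack register          */
static int ram[256];   /* block RAM: next-on-stack down  */
static int ptr;        /* shared read/write address      */

static void push(int v)
{
    ram[++ptr] = tos;  /* old TOS becomes next-on-stack via the write port */
    tos = v;           /* new value lands in the TOS register              */
}

static int pop(void)
{
    int v = tos;
    tos = ram[ptr--];  /* next-on-stack (read port) refills the register   */
    return v;
}

int main(void)
{
    push(1); push(2); push(3);
    printf("%d\n", pop());   /* 3 */
    printf("%d\n", pop());   /* 2 */
    printf("%d\n", pop());   /* 1 */
    return 0;
}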

Managing the stack pointers can get a bit complex if an effort to keep it simple is not made.  As it was, the stack pointer was in the critical timing path, which ended in the flag registers.  The stack pointers set error flags in the CPU status register for overflow and underflow.  I thought this would be useful for debugging, but there are likely ways to minimize the timing overhead.

--  

Rick

Re: Soft core processors: RISC versus stack/accumulator for equal FPGA resources
On Sunday, September 27, 2015 at 3:37:24 AM UTC+9:30, snipped-for-privacy@ieee.org wrote:

I worked with the 1980s Lilith computer and its Modula-2 compiler, which used a stack-based architecture. Christian Jacobi includes a detailed analysis of the code generated in his dissertation titled "Code Generation and the Lilith Architecture". You can download a copy from my website:

http://www.cfbsoftware.com/modula2/

I am currently working on the 2015 RISC equivalent - the FPGA RISC5 Oberon compiler used in Project Oberon:

http://www.projectoberon.com

The code generation is described in detail in the included documentation.
  

I have both systems in operation and have some very similar test programs for both. I'll experiment to see if the results give any surprises. Any comparison would have to take into account the fact that the Lilith was a 16-bit architecture whereas RISC5 is 32-bit, so it might be tricky.

Regards,
Chris Burrows
CFB Software
http://www.astrobe.com


Re: Soft core processors: RISC versus stack/accumulator for equal FPGA resources
On Saturday, September 26, 2015 at 8:19:29 PM UTC-5, snipped-for-privacy@gmail.com wrote:
And in the 1980s main memory access time was a smaller multiple of the clock period than with today's DRAMs.  However, the main memory for the RISC5 FPGA card is asynchronous static RAM with a fast access time; is it comparable to the main memory of the Lilith?

Jim Brakefield

Re: Soft core processors: RISC versus stack/accumulator for equal FPGA resources
On Monday, September 28, 2015 at 10:49:47 AM UTC+9:30, snipped-for-privacy@ieee.org wrote:

Rather than trying to paraphrase the information and risk getting it wrong  
I refer you to a detailed description of the Lilith memory organisation in  
the 'Lilith Computer Hardware Manual'. You can download a copy of this and  
several other related documents from BitSavers:

http://www.bitsavers.org/pdf/eth/lilith/

Regards,
Chris

