A Challenge for serialized processor design and implementation

Hi

I have been thinking about, and working part time towards, the goal of making a usable and useful serialized processor. The idea is that it should:

1) be VERY small when implemented in any modern FPGA (less than 25% of the smallest device, 1 BRAM)
2) be supported by a high-level compiler (C?)
3) execute code in place from either serial flash (Winbond quad-speed SPI memory delivers 320 Mbit/s!) or from a file on an SD card

A serial implementation would be smaller and run at higher clock speeds, so 128 clocks per machine cycle would already mean 2 MIPS, which would be acceptable for many applications.
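As a rough check of that arithmetic: 2 MIPS at 128 clocks per machine cycle implies a bit clock of 256 MHz. That clock figure is an inference from the two numbers given, not something stated in the post, and the 16-bit opcode width used for the flash limit is purely an assumption. The function names are made up for illustration.

```python
def mips(clock_hz, clocks_per_instruction):
    """Millions of instructions per second for a fixed-length machine cycle."""
    return clock_hz / clocks_per_instruction / 1e6


def required_clock_hz(target_mips, clocks_per_instruction):
    """Clock needed to reach target_mips with a fixed-length machine cycle."""
    return target_mips * 1e6 * clocks_per_instruction


def flash_fetch_limit_mips(flash_bits_per_s, opcode_bits):
    """Upper bound on instruction rate set by sequential flash bandwidth alone."""
    return flash_bits_per_s / opcode_bits / 1e6


print(required_clock_hz(2, 128) / 1e6)    # clock in MHz implied by 2 MIPS
print(mips(256e6, 128))                   # MIPS at that clock
print(flash_fetch_limit_mips(320e6, 16))  # assumes 16-bit opcodes (hypothetical)
```

If these assumptions hold, the quoted 320 Mbit/s flash stream would not be the bottleneck for straight-line code; the serial datapath would be.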

The Parallax BASIC Stamp I executes only 2 KIPS, so an ultra-light serial processor in an FPGA with 2 MIPS would be, eh, for me it's something to dream of :)

I have poked around this idea for some years, but never got the "final kick" to really go and complete the design and development of this processor.

So I decided to offer a bounty to maybe motivate others to work towards this goal and dream. The current list of items available to developers from my own funding is listed here (I hope to add items, and maybe some $, over time):

formatting link

There is also a very preliminary spec/goal document.

Antti Lukats

Reply to
Antti

The idea is not new. According to Denyer and Renshaw, "VLSI Signal Processing: A Bit-Serial Approach" (Addison-Wesley, 1985), there were such machines back in the 1950s and 1960s (Pilot ACE, 1953). A typical application is described in "Speech Codec Architecture for Pan-European Digital Mobile Radio Using Bit-Serial Signal Processing" (Nokia & Tampere University), in Brodersen, "VLSI Signal Processing III" (IEEE Press, 1988). That's a DSP for GSM, a fixed application. But serial 8-bit microprocessors like the MC6803 or ST62xx were a slow nuisance.

MfG JRD

Reply to
Rafael Deliano

That means an existing core ?

Did someone mention an 8080 earlier - how small is that ?

To work best with serial memory, a smart set of skip opcodes is needed, e.g. conditional skip of 1, 2, 3 or 4 opcodes. Jumps are very slow, though to accept existing cores, I suppose some (small enough?) extra logic could be added that 'sniffed' the jump opcodes and diverted the very small forward ones to a Skip-Instead block?
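A rough cost model shows why short forward skips can beat jumps on a serial flash: a skip just keeps streaming and discards opcodes, while a jump has to restart the read with a new command and address. All the clock counts below (8 command clocks, a 24-bit address, 4 dummy clocks, 16-bit opcodes on a 4-bit bus) are placeholder assumptions, not figures from any data sheet; real parts and read modes differ.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def skip_cost_clocks(n_opcodes, opcode_bits=16, bus_bits=4):
    """Clocks spent streaming past n opcodes without executing them."""
    return n_opcodes * ceil_div(opcode_bits, bus_bits)


def jump_cost_clocks(cmd_clocks=8, addr_bits=24, dummy_clocks=4, bus_bits=4):
    """Clocks to restart a read at a new address, before any opcode bit arrives."""
    return cmd_clocks + ceil_div(addr_bits, bus_bits) + dummy_clocks


def max_profitable_skip(opcode_bits=16, bus_bits=4, **jump_args):
    """Largest forward distance (in opcodes) where skipping costs no more than jumping."""
    jump = jump_cost_clocks(bus_bits=bus_bits, **jump_args)
    return jump // ceil_div(opcode_bits, bus_bits)


print(skip_cost_clocks(4), jump_cost_clocks(), max_profitable_skip())
```

Plugging in the timing of the actual flash part would give the break-even distance that a Skip-Instead block should cover.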

What about adding this for DATA memory option ? SPI SRAM.

formatting link

I'd target the Winbond Quad parts as a primary target, and look at 1, 2, 4 bit serial choices. (1 bit may not be the smallest?)

4 bit (nibble serial) would look to have merit, as it matches the memory, and you can also do 8/16/(24?)/32-bit cores simply with wider busses. IIRC, NatSemi used 10 clocks on their COPs, which gave 8 clocks to transport data and 2 clocks for opcode action. 2 Winbond devices would morph this to 4 DT clocks + 2 opcode, for a 6-clock core.
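To make the 1/2/4-bit trade-off concrete, here is a small behavioural model of a digit-serial adder: the same carry-propagating loop handles any digit width, and only the number of clocks per word changes. It is a software sketch, not a description of any particular core, and `digit_serial_add` is a name invented here.

```python
def digit_serial_add(a, b, width_bits, digit_bits):
    """Add two unsigned width_bits values, digit_bits per clock, LSB digit first.

    Returns (sum modulo 2**width_bits, carry_out, clocks_used).
    """
    if width_bits % digit_bits:
        raise ValueError("word width must be a multiple of the digit width")
    mask = (1 << digit_bits) - 1
    carry = 0
    result = 0
    clocks = width_bits // digit_bits
    for i in range(clocks):
        shift = i * digit_bits
        # One clock: add one digit of each operand plus the saved carry.
        s = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (s & mask) << shift
        carry = s >> digit_bits  # single carry flip-flop between clocks
    return result, carry, clocks


for digit_bits in (1, 2, 4):
    total, carry, clocks = digit_serial_add(0x12345678, 0x0FEDCBA9, 32, digit_bits)
    print(digit_bits, hex(total), carry, clocks)
```

The datapath logic grows with the digit width while the clocks per operation shrink, which is why 1 bit is not automatically the smallest overall once control logic is counted.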

-jg

Reply to
Jim Granville

"Antti" wrote in message news: snipped-for-privacy@p73g2000hsd.googlegroups.com...

You mean a COP800 FPGA core :-)

The COP800 uses 10 clocks, Clock 1 = Precharge Clock 2-9 process 8 bits of data. Clock 10 ?

--
Best Regards,
Ulf Samuelsson
This is intended to be my personal opinion which may,
or may not be shared by my employer Atmel Nordic AB
Reply to
Ulf Samuelsson

With the COP800 well past its commercial life, what is the status of 'abandonware' C compilers for it? How small is a COP800 in an FPGA?

Opcode execute ?

With BRAM being cheap, and DATA SRAM being slow/relatively expensive, the core needs to use many registers well. Features like Bank swap, or register frame pointer aka Z8/XC166 etc. Also a Push NN/PopNN would avoid costly DATA ram thrashing.

-jg

Reply to
Jim Granville

We have a COP8 compiler. The COP8 certainly fits the description because its original implementation was almost identical.

8051, Z8 and 6804 would also be on my historical short list.

Walter..

Reply to
Walter Banks

Antti,

my impression is that a subset of the 32 bit Transputers with a serial implementation would be the best tradeoff in terms of complexity and performance. The 3 element hardware stack and ALU would be very compact on the FPGAs and the instruction set would give you reasonable access to a block RAM (you don't want to write too much to a Flash and it is nice to avoid reading random data from it to keep the instruction stream flowing).
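A minimal sketch of what a three-register evaluation stack looks like in software, loosely in the Transputer style (A is the top, pushes shift A to B to C). It is not instruction- or cycle-accurate, and the class and method names are invented for illustration.

```python
MASK32 = 0xFFFFFFFF


class EvalStack3:
    """Three-register evaluation stack; a is the top of stack."""

    def __init__(self):
        self.a = self.b = self.c = 0

    def push(self, value):
        # The old C value falls off the bottom and is lost.
        self.c, self.b, self.a = self.b, self.a, value & MASK32

    def pop(self):
        value = self.a
        self.a, self.b = self.b, self.c
        return value

    def add(self):
        rhs, lhs = self.pop(), self.pop()
        self.push(lhs + rhs)

    def mul(self):
        rhs, lhs = self.pop(), self.pop()
        self.push(lhs * rhs)


# (2 + 3) * 4 evaluated with no memory traffic at all
s = EvalStack3()
s.push(2)
s.push(3)
s.add()
s.push(4)
s.mul()
print(s.a)
```

With only three registers the state to shift serially is small, which is the compactness argument made above; anything deeper has to be spilled to block RAM by the compiler.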

Depending on how compatible it was (I wouldn't include the multitasking and message sending stuff) you would have these languages available for it:

formatting link

A serial implementation of a classic RISC (like MIPS or DLX) would be somewhat more awkward, but it would be doable.

A well known serial processor was the Motorola MC14500B, which is probably as simple as it can get - see VHDL at

formatting link

Sadly, it is far too simple and wouldn't do a good job for the applications you are probably interested in. Since you mentioned Basic Stamps, a very unorthodox approach would be to implement in hardware the virtual machine for TinyBasic, put the interpreter in a block RAM and have your Flash hold the source (or tokenized source) directly.
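To show just how simple a one-bit machine of the MC14500B kind is, here is a sketch of an interpreter for a subset of its logic instructions, with a single 1-bit result register. The mnemonics follow the MC14500B as best I recall them; the enable, jump, return and skip instructions are left out, and the program representation (a list of mnemonic/address pairs) is an assumption made for this example.

```python
def run_one_bit(program, io):
    """Run a list of (mnemonic, address) pairs against io, a dict of address -> 0/1."""
    rr = 0  # the single 1-bit result register
    for op, addr in program:
        d = io.get(addr, 0)
        if op == "LD":
            rr = d
        elif op == "LDC":
            rr = d ^ 1
        elif op == "AND":
            rr &= d
        elif op == "ANDC":
            rr &= d ^ 1
        elif op == "OR":
            rr |= d
        elif op == "ORC":
            rr |= d ^ 1
        elif op == "XNOR":
            rr = (rr ^ d) ^ 1
        elif op == "STO":
            io[addr] = rr
        elif op == "STOC":
            io[addr] = rr ^ 1
        else:
            raise ValueError(f"unsupported mnemonic: {op}")
    return io


# out[8] = (in[0] AND in[1]) OR NOT in[2]
ladder = [("LD", 0), ("AND", 1), ("ORC", 2), ("STO", 8)]
print(run_one_bit(ladder, {0: 1, 1: 0, 2: 1})[8])
```

Every operation touches one bit, so even 8-bit arithmetic has to be spelled out bit by bit in the program, which is the limitation described above.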

Sorry about only giving advice instead of actual help, but you know how hard it is to find time for so many projects.

-- Jecel

Reply to
Jecel

Do you still sell any ?

The target has to be smaller than PicoBlaze/Mico8

- otherwise, why bother ?

So, 8051s are too large for this project, and I thought of the modern RS08 (as that does have C compilers ;), but I get the feeling all the multi-byte opcodes and address variants would not map well onto an FPGA (so it would fail on size).

Z8 - Nice Reg-Reg scheme, but likely to be close to 8051 in FPGA resource usage.

8048 ? maybe, but no compilers for this ?

The best-mapped FPGA small CPUs use 18 bit opcodes, so that makes finding some old chip, with a C-Compiler, unlikely!

There is CoolRisc 816, which uses a 22 bit opcode, and does have a GNU C compiler, and some infrastructure ? Again, could be too large...

A 24 bit opcode is also possible. It WOULD give faster operation, and more opcodes/KB, than 32 bit opcodes. Not sure how many C compilers, for 24 Bit opcodes ?

-jg

Reply to
Jim Granville

Maybe I am missing something, but I have seen CPUs in FPGAs as small as 600 LUTs. I am pretty sure the picoBlaze is about that size. Isn't there a C compiler for that?

The OP asked for something that would use no more than 25% of the smallest FPGA in a given family. That is still nearly 1000 LUTs. So why go with anything else?

A bit serial CPU might be smaller than an 8 bit CPU, but what is the driving need for something that small? 600 LUTs is not much in a 3000 LUT FPGA!

Reply to
rickman

I think it is smaller, about 200 LUTs:

formatting link

Could be interesting to pack it in a Max II, where the smallest device has 240 LEs. Sometimes you need some high speed logic and some more complex tasks, but which can be low speed (keyboard sampling, output to LCD text display). If you can get an additional low speed CPU for free, you could save an external microcontroller.
--
Frank Buss, fb@frank-buss.de
http://www.frank-buss.de, http://www.it4-systems.de
Reply to
Frank Buss

And also the similar Mico8 ~240-323 LUT

formatting link

The serial code memory is part of the appeal. FPGA cores are easy enough, but they are like stone soup, and you need to add code execution storage, = many pins, and EMC and PCB area issues. Single chip uC are a tough nut to crack, as they have FLASH+Analog, and higher volumes and growths than the FPGA sector.

-jg

Reply to
Jim Granville

I can see cases where this could be useful, where there's a need for a complex state machine, but not necessarily super fast processing.

IMO, the most critical aspect of such a core is that it's easy to target a real C compiler, like LLVM, for it. Thus I strongly suggest a plain-jane orthogonal *32-bit* RISC, ideally without delay slots or special purpose registers, etc.

Maybe one of the existing ones (MIPS, MB, Nios II, Mico32, etc) is already suitable enough, which would save the effort of building a new tool chain, but they may be too big.

AVR is an interesting example as it was explicitly designed to support C compilation (and it does, far better than PIC). However, 32-bit code ends up suffering badly from having to do everything in 8-bit steps and there aren't really enough registers.

Finally, the most impressive little core I've seen is Bernd Paysan's b16:

formatting link

It appears there was some effort to port GCC to it, but I don't know the status.

Good luck, Tommy

Reply to
Tommy Thorn

Or something similar, more modern:

formatting link

9mW per core at 1GHz.

Kolja Sulimma

Reply to
Kolja Sulimma

I don't think the 0.80 USD is what's bothersome. Having to make space on an already tight PCB for another chip, arranging the power supply and decoupling, routing properly without crosstalk, finding a reliable component source, etc. are the factors that make it attractive to use as few components as possible.

Reply to
sky465nm

In article , Jim Granville writes

Nooooo... please, please noooo... I still have nightmares....

--
Steve Goodwin...  www.p2cl.co.uk (includes contact details)
Reply to
Steve Goodwin

Actually yes, the COP8 is still being used as the core of some special purpose self-contained devices. The COP8 has low analog noise and works well in hybrid systems.

As a mainstream part, as Ulf said, it is now rarely used.

Walter..

Reply to
Walter Banks

Might want to look at 6804. Jack Ganssle's newsletter was recently talking about bit serial processors and I dug out the documentation for the 6804 and was surprised just how similar it was to RS08.

The interesting thing about multibyte opcodes is they are generally either address or data fields that get routed to some register or alu and I don't think would be all that complex to implement.

Walter..

Reply to
Walter Banks

You too? Perhaps we should start a support & recovery group.

Didn't you just *love* how you could insert a line of assembler, and trigger half a dozen errors elsewhere due to code crossing the page boundaries?

--

John Devereux
Reply to
John Devereux

I fear denial is the only remedy...

We ended up with so much code that we had to add our own hardware bank switching on top of the pages... until *finally* we persuaded the company to change to a Z80 and C...

Happy days ... :)

--
Steve Goodwin...  www.p2cl.co.uk (includes contact details)
Reply to
Steve Goodwin

Bugs we learned to hate. Byte Craft was new, mostly consulting then, when one of our customers was having a problem with their 8048 controller just up and failing. A static stack checker found 14 nested functions in an 8 level stack. (Lots of little functions so the code would fit.) Processors like that paid the rent.
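The core of such a static stack check can be sketched as a longest-path search over the call graph. This is only an illustration of the idea, not the tool described above; the call graph format (a dict of caller to callees) and all function names are hypothetical, and it ignores interrupts, which would need stack levels of their own.

```python
def max_call_depth(call_graph, entry):
    """Largest number of return addresses on the stack at once, starting from entry."""
    memo = {}
    active = set()

    def depth(fn):
        if fn in memo:
            return memo[fn]
        if fn in active:
            raise ValueError(f"recursion through {fn!r}: depth is unbounded")
        active.add(fn)
        # Each call pushes one return address on top of the callee's own needs.
        d = max((1 + depth(callee) for callee in call_graph.get(fn, ())), default=0)
        active.discard(fn)
        memo[fn] = d
        return d

    return depth(entry)


STACK_LEVELS = 8  # the 8 level stack mentioned above

# A chain of many small helpers, each calling the next: f0 -> f1 -> ... -> f14
chain = {f"f{i}": [f"f{i + 1}"] for i in range(14)}
needed = max_call_depth(chain, "f0")
print(needed, "levels needed,", "OVERFLOW" if needed > STACK_LEVELS else "ok")
```

Because the check is static, it flags the overflow even if the deepest path is rarely taken at run time, which is exactly the kind of failure that looks like a controller "just up and failing".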

w..

Reply to
Walter Banks
