40 core embeddified processor

Martin Griffith 17 years ago

hmmm

formatting link

martin

Wilco Dijkstra 17 years ago

Definitely deserves a prize for the wackiest microprocessor. You need a nop after every addition, there is no subtract (just figure out how to do it using xor and add), 16x16 multiply takes 40 instructions, only supports Forth...

Wilco

Jim Stewart 17 years ago

Sorta like an ILLIAC IV on a chip.

Martin Griffith 17 years ago

I'm really out of my depth on this sort of thing and was just wondering what the pros here think. That's why I put a "hmmm" in my OP. I wonder what the dev system is like; nope, can't be bothered to D/L it.

martin

Walter Banks 17 years ago

More like ILLIAC XL

w..

Walter Banks 17 years ago

Martin

It is a weak instruction set. 64 words each of RAM and ROM per processor is strange. It isn't clear (to me) whether the RAM also holds the stacks. The total ROM is 2560 words for 40 processors, which limits the size of applications that can be described (I know instructions are packed, though that is offset somewhat by packing and execution-order limits).
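For scale, here are the chip-wide totals implied by these figures (40 processors, 64 words each of RAM and ROM). The 18-bit word width is my assumption, not something stated in this thread, so treat the bit counts as a sketch.

```python
# Chip-wide memory tally using the per-node figures quoted in this thread.
NODES = 40
RAM_WORDS_PER_NODE = 64
ROM_WORDS_PER_NODE = 64
WORD_BITS = 18  # assumption: the word width is not stated in the thread


def chip_totals(nodes: int = NODES) -> dict:
    """Return total RAM/ROM across all nodes, in words and in bits."""
    ram_words = nodes * RAM_WORDS_PER_NODE
    rom_words = nodes * ROM_WORDS_PER_NODE
    return {
        "ram_words": ram_words,
        "rom_words": rom_words,
        "ram_bits": ram_words * WORD_BITS,
        "rom_bits": rom_words * WORD_BITS,
    }


totals = chip_totals()
print(totals["rom_words"])  # 2560 words of ROM, matching the figure above
```

Under the word-width assumption, that is 46,080 bits (5,760 bytes) of ROM for the whole chip.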

The web site doesn't make a compelling case for applications that could effectively use this processor. Compare this processor with the Parallax Propeller, for example. The Parallax processor is far more flexible.

Walter..

Martin Griffith 17 years ago

Thanks for the Parallax Propeller pointer; it will give me another chance to improve my incompetence again :) It looks fun.

martin

Alex Colvin 17 years ago

See also

formatting link

64 cores with 64K cache each.

mac the naïf

Wilco Dijkstra 17 years ago

The return and data stacks are separate and have 9 and 10 entries respectively.
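Here is a minimal sketch of a small stack built from a ring of registers, as a way to picture 9- and 10-entry hardware stacks. I'm assuming the stacks wrap around (no overflow or underflow trap; the oldest entry is silently overwritten), and I'm ignoring any split into dedicated top-of-stack registers. Neither detail is stated in this thread.

```python
class CircularStack:
    """Fixed-depth stack made from a ring of registers, with no overflow/underflow traps."""

    def __init__(self, depth: int) -> None:
        self.regs = [0] * depth
        self.top = 0  # index of the register currently holding top-of-stack

    def push(self, value: int) -> None:
        # Move to the next register in the ring; when full, this overwrites the oldest value.
        self.top = (self.top + 1) % len(self.regs)
        self.regs[self.top] = value

    def pop(self) -> int:
        # Return the top value and step back; over-popping just yields stale register contents.
        value = self.regs[self.top]
        self.top = (self.top - 1) % len(self.regs)
        return value


data_stack = CircularStack(10)
return_stack = CircularStack(9)

for v in range(11):  # one more push than the data stack can hold
    data_stack.push(v)
print([data_stack.pop() for _ in range(10)])  # the value 0 has been overwritten
```

Because nothing traps, software has to track stack depth itself. The upside is that push and pop need no address calculation.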

The ROMs contain library calls for even the most basic operations (like subtract or multiply).

The 64-word RAMs probably give around 128 instructions after all the packing losses, required nops and inline literals. Given that each instruction does very little, this is not much at all.
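The packing losses come from how opcodes share a word. The sketch below packs opcodes into a single word. The slot layout (an 18-bit word holding three 5-bit slots and a final 3-bit slot that can only take opcodes whose two low bits are zero) is my understanding of the design rather than something stated here, so treat it as an assumption. Every nop, inline literal, or opcode that cannot fit the short slot eats into the four-per-word ceiling.

```python
SLOT_WIDTHS = (5, 5, 5, 3)  # assumption: three 5-bit slots plus one 3-bit slot
WORD_BITS = sum(SLOT_WIDTHS)  # 18


def pack_word(opcodes: list[int]) -> int:
    """Pack up to four 5-bit opcodes into one word, first opcode in the high slot.

    Unused trailing slots are left as zero bits; what zero decodes to is
    part-specific, so a real assembler would choose the filler deliberately.
    """
    if len(opcodes) > len(SLOT_WIDTHS):
        raise ValueError("at most four opcodes fit in one word")
    word = 0
    shift = WORD_BITS
    for op, width in zip(opcodes, SLOT_WIDTHS):
        if not 0 <= op < 32:
            raise ValueError(f"opcode {op} is not a 5-bit value")
        if width == 3:
            # The short last slot has no room for the two low bits,
            # so only opcodes that are multiples of 4 can go there.
            if op % 4:
                raise ValueError(f"opcode {op} cannot go in the 3-bit slot")
            op >>= 2
        shift -= width
        word |= op << shift
    return word


def max_opcodes(ram_words: int = 64) -> int:
    """Upper bound on opcodes held in RAM if every slot is filled."""
    return ram_words * len(SLOT_WIDTHS)


print(max_opcodes())  # 256, matching the "up to 256 opcodes" figure in this thread
```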

The Propeller is pretty odd as well (2 operands, 9-bit direct memory address) and also uses an unconventional programming language. Personally, I'd prefer a fully featured core with single-cycle instructions capable of running C.

Wilco

42Bastian Schick 17 years ago

:-) All these designs sound like someone wants to make the wheel a bit rounder than others (and fails). There must be a reason why no new design (apart from DSPs) has really made it into the mainstream market.

42Bastian Do not email to bastian42@yahoo.com, it's a spam-only account :-) Use @monlynx.de instead !

sodaant 17 years ago

I had high hopes for the Propeller until I read the data sheet and discovered its limitations.

In addition to the unconventional programming language (Spin), it also lacks in the following areas:

Only 2K of RAM per cog (32K RAM/ROM main memory is only accessible in interpreted Spin code)
No integrated peripherals (except for timers and GPIO). Everything else (e.g. I2C, SPI, ADC, USART) has to be either bit-banged or added externally
No JTAG debug support. This is a big one for me. The Parallax development platform (Propeller Tool) has no debugging capabilities (single step, breakpoints, etc.)
No interrupts, although the workaround here is to dedicate one or more cogs to polling for events
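To make "bit banged" concrete, here is a minimal sketch of an SPI mode 0 master transfer done entirely in software, shifting MSB first. The three pin callbacks are hypothetical stand-ins for whatever GPIO access the chip provides. Real code would also have to meet the slave's timing requirements.

```python
from typing import Callable


def spi_transfer_mode0(byte_out: int,
                       set_sclk: Callable[[int], None],
                       set_mosi: Callable[[int], None],
                       read_miso: Callable[[], int]) -> int:
    """Shift one byte out on MOSI and return the byte shifted in on MISO (CPOL=0, CPHA=0)."""
    byte_in = 0
    for bit in range(7, -1, -1):               # MSB first
        set_mosi((byte_out >> bit) & 1)        # present the bit while SCLK is low
        set_sclk(1)                            # rising edge: both sides sample
        byte_in = (byte_in << 1) | (read_miso() & 1)
        set_sclk(0)                            # falling edge: the slave shifts out its next bit
    return byte_in


# Loopback demo: MISO is wired straight to MOSI, so the byte comes back unchanged.
pins = {"sclk": 0, "mosi": 0}


def set_sclk(v: int) -> None:
    pins["sclk"] = v


def set_mosi(v: int) -> None:
    pins["mosi"] = v


def read_miso() -> int:
    return pins["mosi"]


print(hex(spi_transfer_mode0(0xA5, set_sclk, set_mosi, read_miso)))  # 0xa5
```

Every clock edge costs instructions, which is why a cog is typically dedicated to this kind of job.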

Leon 17 years ago

I've played with it, and the Propeller can do some interesting things. Each cog has graphics hardware, and it is quite feasible to generate graphics and sound simultaneously, and do lots of other things at the same time. It mainly appeals to hobbyists.

I'm more interested in the XMOS chips:

formatting link

It's designed by David May, who designed the transputer. I used to develop transputer-based systems, and it will be nice to work with something similar.

I have a contact at XMOS, and he has told me that they already have some design wins. I'm going to one of their seminars next month: $99, including a 1600 MIPS dev kit.

Leon

sodaant 17 years ago

I agree. That's where its marketing appears to be directed. Even the "propeller head" logo looks amateurish.

Anyone know if the Propeller is being used in a shipping commercial product?

Walter Banks 17 years ago

My understanding is that some are being used in low volume applications. Parallax have a long history of supporting advanced hobbyists and low volume industrial users with their products. Their customer support is very good.

Walter..

vinnie 17 years ago

A C compiler and OS that can exploit a multi-core processor are a big plus: they give developers a way to get more performance in an embedded system without having to look at 1 GHz+ chips from Intel, etc.

Renesas ships SH4A and SH2A RISC micros with a superscalar core (2 instructions per cycle) and will offer general purpose dual core versions to the general market very soon. (Already shipping for specific OEM markets.)

formatting link

Debugging is done via JTAG, and support for multiple OS/RTOS is being done.

Jeff Fox 17 years ago

No. You need a nop *before* some addition opcodes depending on the previous state of carry bits and the size of your arguments.

The fact that a nop is sometimes needed before an addition is due to one of the patented architectural features: instruction execution and instruction decoding happen in parallel (on each core), as opposed to the more conventional approach of having to decode operand bits and set up paths before opcodes can be executed.
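Some background on why operand values matter: in a ripple-carry adder, the time needed to settle grows with the longest run of bit positions a carry has to travel through. This sketch measures that run for a given addition, assuming an 18-bit width. It illustrates the general ripple-carry effect and is not the chip's actual rule for when a nop is required.

```python
def longest_carry_chain(a: int, b: int, width: int = 18) -> int:
    """Longest run of consecutive bit positions that produce a carry out in a + b."""
    carry = 0
    run = 0
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        # Full-adder carry: generated (x & y) or propagated (carry into x ^ y).
        carry = (x & y) | (carry & (x ^ y))
        run = run + 1 if carry else 0  # the ripple continues only while carries keep coming
        longest = max(longest, run)
    return longest


print(longest_carry_chain(1, 7))  # 3: the carry ripples through bits 0, 1 and 2
print(longest_carry_chain(2, 4))  # 0: no bit position produces a carry
```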

There is no subtraction opcode, but there is subtraction. ;-) You can use NOT and ADD, or the more traditional NOT, ADD, and ADD 1.
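Both routes rest on two's-complement identities, and NOT is the same as XOR with all ones, which is the xor route mentioned earlier. In the sketch below, I assume an 18-bit word (the width isn't given in this thread) and mask results to mimic fixed-width hardware. The traditional form is a + NOT b + 1, while NOT(NOT a + b) reaches the same result with only NOT and ADD. Which sequence the real instruction set favors isn't spelled out here.

```python
WIDTH = 18                 # assumed word size
MASK = (1 << WIDTH) - 1


def not_(x: int) -> int:
    """Bitwise NOT within the word; the same as XOR with all ones."""
    return x ^ MASK


def add(x: int, y: int) -> int:
    """Wraparound add: the carry out of the top bit is discarded."""
    return (x + y) & MASK


def sub_traditional(a: int, b: int) -> int:
    # a - b == a + NOT b + 1 (two's complement negation of b, then add)
    return add(add(a, not_(b)), 1)


def sub_not_add_not(a: int, b: int) -> int:
    # a - b == NOT(NOT a + b): needs only NOT and ADD
    return not_(add(not_(a), b))


print(sub_traditional(5, 3), sub_not_add_not(5, 3))  # 2 2
print(sub_not_add_not(3, 5) == MASK - 1)             # True: -2 as an 18-bit pattern
```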

A 16x16 multiply takes 16 steps, which can be done as a two-instruction loop, or as a four-instruction loop repeated eight times. Setting up the loop takes a couple of instructions, plus a few shifts if one of the arguments is not pre-scaled.
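The 16 steps are the classic shift-and-add multiply. The sketch below shows the generic algorithm, with one conditional add and one right shift per step, written as eight passes of a two-step loop body. It is not a claim about the chip's actual multiply-step opcode, and the pre-scaling shifts are not modeled.

```python
def multiply_step(hi: int, lo: int, multiplicand: int) -> tuple[int, int]:
    """One shift-and-add step on a 16-bit (hi, lo) register pair."""
    if lo & 1:
        hi += multiplicand  # may produce a 17th (carry) bit in hi
    # Shift the combined carry:hi:lo value right by one bit.
    lo = (lo >> 1) | ((hi & 1) << 15)
    hi >>= 1
    return hi, lo


def multiply_16x16(a: int, b: int) -> int:
    """Unsigned 16x16 -> 32-bit product in 16 steps (8 passes of a 2-step loop body)."""
    if not (0 <= a < 1 << 16 and 0 <= b < 1 << 16):
        raise ValueError("operands must be unsigned 16-bit values")
    hi, lo = 0, b  # the multiplier starts in the low half
    for _ in range(8):
        hi, lo = multiply_step(hi, lo, a)
        hi, lo = multiply_step(hi, lo, a)
    return (hi << 16) | lo


print(multiply_16x16(3, 5))  # 15
```

Each step consumes one multiplier bit from the bottom of `lo`, while product bits are shifted in at the top. After 16 steps, the pair holds the full 32-bit result.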

That's silly. Strictly speaking, all processors execute only their native code, but people don't say that ARM only executes ARM instructions or that the Pentium only executes Pentium instructions. It is true that the native code is a Forth dialect. This makes Forth about the easiest high-level language to implement, since the instruction set is designed to cover about 80% of dynamic Forth execution with small, fast five-bit native opcodes. The other 20% of Forth words, by execution frequency, have to be macros or calls made up of more than one five-bit opcode.

The S40 follows the S24, which you declared impossible last year when you learned that it had neither pipelines nor cache and was getting 750 MIPS per 20k-transistor core in 0.18 µm.

I assume that to you "wacky" just means it doesn't look like a typical processor designed for C. That also helps explain why you made four points and four errors in understanding the design.

Best Wishes

Jeff Fox 17 years ago

It is probably more like 40 transputers on a chip, each sped up by about 75x, with some A/D, D/A, and digital I/O on chip.

Best Wishes

Jeff Fox 17 years ago

The instruction set was based on more than a decade's research into instruction profiles on real programs to get as much of the Forth language (in dynamic execution of real programs) into small opcodes and to minimize macro sequences for the rest. The size of RAM and ROM is based on cost, and speed tradeoffs and on what real programs needed in real applications.

It is a design that offers lots of processing power but not much local RAM and it is fine tuned for the sort of applications described in the literature.

The size of ROM isn't too much of a factor. ROM code is masked and for the most part not something users can change unless they order a custom chip variation. ROM contains boot code, math and I/O drivers etc. which can be called from RAM code to reduce the size of that RAM code.

The size of programs that can be executed is limited by external memory, just as with other processors. Processors can execute streams, and if you stream in a ten-gigabyte program, you can execute a ten-gigabyte program. It is similar to saying that computers can execute programs larger than what fits in their local cache memory.
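Here is a minimal sketch of executing a stream instead of stored code. The node pulls each instruction from an input port (a Python iterator here, standing in for a neighbor node or external memory) and executes it at once, so program length is limited by whatever feeds the stream rather than by local RAM. The three-instruction set (`lit`, `add`, `out`) and the `run_stream` helper are hypothetical.

```python
from typing import Iterable, Iterator


def run_stream(port: Iterable[tuple]) -> list[int]:
    """Execute instructions as they arrive from a port; none are stored locally."""
    stack: list[int] = []
    output: list[int] = []
    for op, *args in port:  # fetch the next instruction from the stream
        if op == "lit":
            stack.append(args[0])
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "out":
            output.append(stack.pop())
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return output


def counting_program(n: int) -> Iterator[tuple]:
    """Generate a 2n+2 instruction program lazily; it never exists in memory all at once."""
    yield ("lit", 0)
    for _ in range(n):
        yield ("lit", 1)
        yield ("add",)
    yield ("out",)


print(run_stream(counting_program(100_000)))  # [100000]
```

The program above is 200,002 instructions long, yet the executor holds only a two-entry stack at any moment.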

That is correct. The stacks are banks of registers. The stack registers are faster than addressable registers because they don't require addressing. That's one of the architectural features.

The idea is that, when speed is important, each node executes only instructions in its RAM, up to 256 opcodes. Each node is supposed to do very little. As a size indicator, 64 words is more than is needed to implement packet boot from SPI, or asynchronous or synchronous external interfaces.

There was a Forth written for the Parallax Propeller, and it claimed to require only ten Propeller opcodes for each Forth primitive. That means you get eight processors, each running at two Forth MIPS and time-sharing one I/O interface. That's very different from forty processors running at 750 Forth MIPS each in a smaller chip.
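The comparison above is simple arithmetic. The sketch below uses the figures quoted in this post. The 20 native MIPS per cog is my assumption (an 80 MHz clock at 4 clocks per instruction), chosen because it reproduces the quoted two Forth MIPS at ten opcodes per primitive.

```python
def forth_mips(native_mips: float, opcodes_per_primitive: float) -> float:
    """Millions of Forth primitives per second when each costs several native opcodes."""
    return native_mips / opcodes_per_primitive


# Assumption: 20 native MIPS per cog (80 MHz clock, 4 clocks per instruction).
propeller_per_cog = forth_mips(20.0, 10)  # ten opcodes per primitive, as quoted
propeller_total = 8 * propeller_per_cog   # eight cogs
s40_total = 40 * 750.0                    # forty nodes at the quoted 750 Forth MIPS

print(propeller_per_cog, propeller_total, s40_total)  # 2.0 16.0 30000.0
```

These are peak figures only. They say nothing about memory limits or I/O sharing.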

We knew that you prefer bigger things designed to run C code. To each their own.

Best Wishes

Jim Granville 17 years ago

ROM seems to be very small. Or is it OTP? Given the very low area cost of true ROM, you would normally expect more of it.

So this can execute from SPI, or even QuadSPI memory?

Any prices on the devices and development systems?

How does debugging work? I see no mention of ICE/single step, but I do see mention of simulation.

Target markets?

-jg

Walter Banks 17 years ago

What was the design rationale for requiring a nop in the instruction stream rather than having the processor logic insert a pseudo-nop or an extra cycle when needed?

Regards

-- Walter Banks Byte Craft Limited
