Parallax Propeller

For some reason I found myself reading these newsgroups, which I haven't really used for many years. Although the Parallax Propeller web forums are extremely active, there never seems to be much discussion elsewhere. Aren't you all curious as to what you are missing out on?

I have used just about every kind of micro and architecture over the decades, and while I might still use some small "one buck" micros for certain tasks and some special ARM chips for higher-end tasks, I have almost exclusively been using Propeller chips for everything else. The reason is very simple: they are so simple to work with, and there are no "peripheral modules" to worry about other than the counters and video hardware per cog. Every pin is general-purpose and any one of the eight 32-bit cores can use them; you don't have to worry about trying to route a special pin, or face the dilemma of using a special pin for one function but not the other. The 32-bit CPUs, or cogs, are a breeze to program, and you can work with a variety of languages besides the easy-to-use Spin compiler that comes with it.

For you Forth enthusiasts, there has been a big flurry of Forth-related threads on the Propeller forums of late, and there are several versions of Forth, including my own Tachyon Forth. IMHO this is the best way to get into the Propeller and its hardware. I just can't be bothered trying to cram functions and debug interrupts on other chips when I can set a cog to work on a task and still have plenty of resources left over, including video on-chip.

If you are intrigued or just plain curious, it won't kill you (unless you are a cat) to have a look at my introduction page, which has links to projects and the forum etc.


There is a souped-up version of the Propeller, the Propeller II, coming out in the following months, which has 96 I/O, 8 cogs, 128K RAM, and 160 MIPS/core.

BTW, I use Silabs C8051F chips as specialised peripherals (ADC etc) that hang off the "I2C bus" from the Propeller and I can load new firmware into each chip using just one extra line to program them in-circuit.

*peter*
Reply to
Peter Jakacki

Had a quick look (again) - it's not very fast and it's not very cheap (compared with a CortexM4 clocked at 150MHz or more). Then there is all the pain of programming and synching multiple cores which is (I think) worse than the pain of managing DMA on a typical M4 based micro-controller. For some applications I'm sure it's a really good fit but I haven't hit one yet. It's similar in concept to XMOS and Greenchip offerings but like them suffers from a less than first rank supplier which would worry most of my customers.

MK

Reply to
MK

Since I've had a lot of experience with the Propeller and ARMs, and I'm familiar with M4s (still pushing out a design), I can say that unless you actually sit down with it for an hour or two you will probably continue to have these misconceptions. Is that the way the Propeller comes across? I've always wondered about that.

No, your idea of the Propeller is so off the mark. I'm saying that not to offend, just to set the record straight, although I appreciate the honest opinion, as it gives me something to work on in explaining what it is and what it is not.

First off, don't try to compare this with an XMOS or GreenArrays (I think you mean), as they are very different. The Propeller is more like 8 identical 32-bit CPUs, each with counter and video hardware and 512 longs of RAM, sharing all 32 I/O in common and coupled to a multi-port 32 kB RAM (the hub). Sounds strange? Okay, but the eight cogs are not really utilised for "parallel processing"; think of each cog as a CPU, a virtual peripheral, or both. The way it's used is that you might set up one cog as an intelligent quad-port UART, another as the VGA cog, another as keyboard and mouse etc., while maybe only one cog is processing the application. When you see it in action it is very simple to program, even for a beginner.
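A rough way to picture the cog-as-virtual-peripheral idea, sketched here in Python with one thread standing in for one cog (the names and the queue-based "pins" are hypothetical, purely for illustration): the "cog" owns its one job outright, with no scheduler or interrupt in between.

```python
import threading
import queue

def uart_cog(rx: queue.Queue, tx: queue.Queue) -> None:
    # A "cog" dedicated as a virtual UART peripheral: it does nothing
    # but move bytes, so no other code ever disturbs its timing.
    while True:
        byte = rx.get()          # block until a byte arrives
        if byte is None:
            break                # shutdown sentinel
        tx.put(byte)             # echo it back

rx, tx = queue.Queue(), queue.Queue()
cog = threading.Thread(target=uart_cog, args=(rx, tx), daemon=True)
cog.start()

for b in b"hello":
    rx.put(b)
rx.put(None)                     # tell the "cog" to stop

echoed = bytes(tx.get() for _ in range(5))
```

On the real chip each cog is of course true hardware, not a thread, but the programming model is the same: one dedicated loop per job.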

Seeing that each cog has only 512 longs of RAM, which it addresses directly as both source and destination in each 32-bit instruction, it becomes necessary to run a virtual machine for high-level application code. This is what Spin does: it compiles to bytecode in hub RAM while one or more cogs have the Spin interpreter loaded. My Tachyon Forth does something similar but has a much faster runtime, and of course the target hosts a Forth development and runtime system. So I use bytecodes that directly address the code in the first 256 longs of the Tachyon VM. Each cog runs at a maximum of 2.5M Forth bytecode instructions/second, and since there are no interrupts (nor any need for them), each cog runs unencumbered. Realtime software development and testing has never been easier.
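The bytecode-as-direct-address trick can be sketched like this in Python (the opcodes and their semantics are invented for illustration, not actual Tachyon bytecodes): the fetched byte is used directly as an index into a handler table, so there is no decode step at all.

```python
# Minimal sketch of direct-indexed bytecode dispatch.
stack = []

def op_lit5():
    stack.append(5)                        # push the literal 5

def op_dup():
    stack.append(stack[-1])                # duplicate top of stack

def op_add():
    stack.append(stack.pop() + stack.pop())  # add top two items

# The bytecode IS the table index: fetch, index, call. No decoding.
table = [op_lit5, op_dup, op_add]

def run(code: bytes) -> None:
    for b in code:
        table[b]()                         # one indexed call per bytecode

run(bytes([0, 1, 2]))                      # LIT5, DUP, ADD  ->  5 + 5
```

In the real VM the "table" is the first 256 longs of cog RAM and the call is a jump, which is why the dispatch overhead is so low.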

Hopefully I've put it in some kind of nutshell. What do you think? Weird? Yes, but it works really well :)

*peter*
Reply to
Peter Jakacki

Almost forgot, as for "first rank supplier" I can only say that all the big companies are the ones that let you down because they upgrade and suddenly the chip you are using is no longer current and may become hard to get and of course expensive. Parallax have supported the little guy just as well as the big and they have a track record of continuing to support their product (in many ways) over the years with assurances as well. How many companies these days are willing to make their chips available in DIP as well as SMD simply for "that" market?

I have STM32F4 designs I'm working on, mainly for Ethernet, as well as other vendors' ARMs, and while the chips are impressive you can't really compare their raw speed with what the Propeller does and how it does it.

Price is relative too, but 8 bucks for eight 32-bit fully utilisable cores is very good value, and the price goes down to under $4.50 in volume.

*peter*
Reply to
Peter Jakacki

I think this looks really good. Fast and simple multi-tasking with even less overhead than the simplest Forth scheduler. Extremely low latency. Cheap, simple. I'm sorry I didn't hear about this years ago.

Andrew.

Reply to
Andrew Haley

Yes, when you run a task in a cog, that is all it has to do. There is no need for task switching or interrupts etc. Some of my biggest headaches had to do with mysterious glitches which always ended up being traced back to the wrong interrupts at the wrong time, but only sometimes, just to make them harder to find.

The P2, which is due to be released soon, allows each cog to run up to 4 tasks with zero switching overhead; you can even set the switching priority pattern in a 32-bit mask. And the P2 runs 8 times faster than the P1 in terms of IPS alone, without taking into account the many enhancements. Although the P2 has not yet been released, many of us are testing code for it on FPGA boards such as the DE2-115 or Nanos, because Parallax made the FPGA binary for the P2 openly available. Can you beat that? I've never heard of any chip company doing anything even close to that.
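Here's a guess at how a 32-bit task-slot mask could be decoded, assuming sixteen 2-bit fields, each naming one of the 4 hardware tasks (the field layout is an assumption for illustration; the real encoding is whatever the P2 documentation specifies):

```python
def slot_schedule(mask: int) -> list:
    # Decode a 32-bit mask as sixteen 2-bit fields; each field says
    # which of the 4 hardware tasks runs in that time slot.
    # Field layout (LSB-first) is assumed for illustration only.
    return [(mask >> (2 * i)) & 0b11 for i in range(16)]

# An all-zero mask gives task 0 every slot (single-task behaviour);
# a repeating 01/00 pattern alternates tasks 0 and 1 evenly.
alternate = int("0100" * 8, 2)
```

The point of a pattern register like this is that the "priority" is a fixed, inspectable schedule rather than a preemption mechanism, so timing stays deterministic.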

*peter*
Reply to
Peter Jakacki

It sounds like GA144 done properly.

--
Albert van der Horst, UTRECHT,THE NETHERLANDS 
Economic growth -- being exponential -- ultimately falters. 
Reply to
Albert van der Horst

I don't know about GreenArrays, but this sounds very much like XMOS devices. There are differences in the details, and the main languages for the XMOS are C, C++ and XC (sort of C with a few bits removed, and some parallel processing and XMOS features added) rather than Forth. But it's the same idea - you have multiple CPUs so that you can make your peripherals in software in a CPU rather than as dedicated hardware.

And I'd imagine it is similarly "easy" to work with - some things /will/ be easy, but other things will be a lot harder in practice. In particular, 8 cores/threads sounds great when you start, and you can write very elegant UARTs, flashing lights, etc. But in a real application you need more than 8 cores, and you start having to combine tasks within a single core/thread, and the elegance, clarity, and ease of development go out the window. The other big problem is the small memories - you start off thinking how cool these devices are, so fast that you can make USB or Ethernet peripherals in software cores, using ready-made "software components" from the company's website. But then you discover that to actually /use/ them for something more than a flashing-lights demo takes more RAM than the chips have.

So it's a nice idea for some types of problem - but it is not unique, and it is only really good for a small number of applications.

Reply to
David Brown

An unusual/non-mainstream architecture will always be initially misjudged.

But some of the misconceptions are due to Parallax: if it has a smallish video generator, then it's probably a retro arcade game machine chip? While the Spin language is usable, there is nothing people like less than learning new weird languages. (formatting link: "Spin was inspired by portions of C, Delphi, and Python, and a host of problem/solution scenarios ...") It could just as easily have been an extended BASIC or Forth.

There are issues in the implementation: the upper half of the 64 kbyte memory map is a 32 kbyte ROM that contains the small 4 kbyte Spin interpreter. That ROM is rather underutilised, mostly filled with data tables. The Spin interpreter firmware is encrypted (formatting link), which certainly makes implementing other languages on top of it much fun. And there is not much copy protection for the user, as the chip boots the application from an external serial EEPROM into its internal SRAM.

There are issues in the basic concept: the round-robin access doesn't scale very well. If you go from 8 to 16 cores, the clock has to double, otherwise speed goes down; so a 100-pin chip can hardly have 64 cores. One can think of COG RAM versus HUB RAM as analogous to the zero page of an old 6502, but the speed penalty is much worse: 4 clocks for COG versus 8...23 for HUB. And identical cores are one-size-fits-all: the Propeller may be flexible and fast at I/O and bit-banging, but an I/O core that interprets bytecode is probably no match for even the smallest ARM in executing a high-level language.
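A simplified model of that scaling argument, using the P1-like numbers quoted above (the slot width and fixed instruction cost are assumptions of the model, not datasheet figures): a cog that just missed its hub slot has to wait a full round of all the cogs before its next turn, so the worst case grows linearly with the core count.

```python
def worst_case_hub_wait(cogs: int, clocks_per_slot: int = 2,
                        op_clocks: int = 7) -> int:
    # Worst case: the cog just missed its slot, so it waits one full
    # round (cogs * clocks_per_slot clocks) and then pays the hub
    # instruction cost itself.
    return cogs * clocks_per_slot + op_clocks

eight   = worst_case_hub_wait(8)    # matches the top of the 8...23 range
sixteen = worst_case_hub_wait(16)   # the waiting window roughly doubles
```

This is why doubling the core count at the same clock roughly doubles worst-case hub latency, which is the heart of the "doesn't scale" objection.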

MfG JRD

Reply to
Rafael Deliano

Well, it seems that unless you try it you can't really judge it, that's for sure, and in trying to judge it everyone is way off the mark. In terms of critique of your critique, I would have to give you a very poor mark though. To mention flashing lights followed by "etc" is indicative of an early negative response and an attempt at minimising, as if you were Bill Gates talking about Linux. As for real applications, I am always producing real commercial products with this chip, so I think I know (actually, I do know) what goes into real apps, having worked with a very wide variety of small to large micros through the decades (and still do).

Having read the rest of your comments I can't see where your fascination for flashing lights is coming from but I hope you are cured soon :)

BTW, the Propeller does very nice blinking lights, along with simultaneous VGA and graphics rendering, audio, SD, HID etc., while still handling critical I/O in deterministic real time. The language and tools were the easiest to learn and use of any processor I've worked with.

Can't you see I'm not trying to sell them; I just didn't want to keep being selfish, keeping this good thing to myself and the secret Propeller "Illuminati".

*peter*
Reply to
Peter Jakacki


Round-robin access doesn't have to scale; it is what it is, and trying to judge it based on how it might operate with 64 cores is weird!! Just judge it as it is. Besides, the successor chip still has 8 cores but allows 8 longs to be accessed in one "round-robin" cycle. The fixed round-robin access provides a guaranteed access time for each cog, which might otherwise have been upset by some resource-crazed object.

The smallest or largest ARM has to handle more than one task plus interrupts etc., so it never achieves its full speed. If everything fits in the 512 longs of a very code-efficient cog then it is very fast, all the more so because it's not bogged down like an ARM. I can't write any ARM code that doesn't have to do everything including washing the dishes, although the ARM does have a large code space, without a doubt. The high-level code rarely needs to run fast though; it's all the low-level stuff that needs priority, which on an ARM requires some complex interrupt processing to hopefully service that special interrupt in time. No such problem ever with the Propeller.
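The no-interrupts argument in miniature, as a Python toy (`read_pin` and `on_event` are hypothetical stand-ins for real pin I/O): a loop that does nothing but watch one input has a worst-case response of exactly one iteration, because nothing can preempt it.

```python
def wait_for_pin(read_pin, on_event, max_polls: int = 1000):
    # A dedicated "cog" loop: poll one input and react. Worst-case
    # latency is one loop iteration -- there is no other task and no
    # interrupt that can get in the way.
    for _ in range(max_polls):
        if read_pin():
            return on_event()
    return None                      # gave up: pin never went high

# Simulate a pin that goes high on the fourth poll.
events = iter([0, 0, 0, 1])
result = wait_for_pin(lambda: next(events), lambda: "served")
```

On a conventional single-core micro, the same guarantee requires reasoning about interrupt priorities and worst-case handler nesting; here the guarantee falls out of the structure.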

When I first developed code for the ARM it was on the LPC2148, which I ended up entering into the Philips/NXP 2005 ARM design contest as the "noPC", bit-banging VGA graphics under interrupts along with SD access, audio generation, PS/2 keyboard & mouse, as well as serial etc. That was a lot of hard work on the ARM and left very little processing time for the application, but doing the same thing on a Propeller chip is better, easier, and faster.

Perhaps you have been Googling and clicking on some very old links, but the Spin interpreter source code etc. is openly available and has been for years, although I don't know why its being encrypted or not should have been worthy of any mention.

Higher-end CPU users and those who require secured firmware should take a look at the forthcoming Propeller 2 due out sometime in the next few months or so.

*peter*
Reply to
Peter Jakacki

I have worked with XMOS chips - but I never claimed to have used the Parallax (or GreenArray). I have just read some of the information from the web site, and considered possible applications and problems based on my XMOS experience - since the XMOS and the Parallax have a similar philosophy.

I have no doubt that you can do lots of fun things with a Parallax - and I have no doubt that you can do commercial projects with it. But I also have no doubt that 8 cores/threads is far too few to be able to dedicate a core to each task in non-trivial real world applications. And when you have to multiplex tasks within each core/thread, you have lost much of the elegance that you had by using the multi-core system.

Did you not know that every electronics card needs a flashing light, so that customers will know that it is working?

Seriously, it was obviously just a reference to test or demo software rather than fully functional software.

That's fine - and I am not trying to be critical to the devices (or your work). I am just pointing out a few realities about the devices, such as their similarities to the XMOS and limitations that I think users will quickly encounter. They look like an interesting architecture - and I am always a fan of considering new architectures and languages - but I am not convinced that they are a great idea for general use.

Reply to
David Brown

From what I know about XMOS, and also from those who work with the chip, it is quite a different beast from the Propeller. The Propeller appears to be a simpler and easier-to-use chip, and its eight cogs are eight CPUs, not tasks.

Indeed I know that every PCB needs a flashing light!! How is it that there are so many designs out there that lack such a rudimentary indicator, just to tell you that there is power and activity/status? But indicators do not require the resources of a whole CPU; normally I run these directly from code positions, or from a more general-purpose timer "task" which looks after multiple timeouts and actions including low-priority polling. Not sure what you are getting at by referring to "non-trivial" real-world applications :) How non-trivial is this? Are all the real-time industrial control functions, communications and network protocols (wired, RF, and SM fibre), H-bridge motor and microstepping control, graphic display and input processing, SD file systems etc. trivial? Okay, if so then I think you might be assuming that just because the Propeller has eight cores it is some kind of parallel processing "beastie", but it's actually classed as a "microcontroller", not a PC killer.

*peter*
Reply to
Peter Jakacki

So you're PREVENTED from using interrupts to make programming easier.

... still has zero interrupts.

Yes, I first saw the Propeller mentioned years ago. The "8 32-bit cores" thing sounds nice, but no interrupts was a deal killer for me. A year or two back (with maybe an earlier mention of the P2) I looked on the "official" support/discussion forums for the Propeller and saw a longish thread on "why doesn't it have interrupts", and there were posts there that covered every objection I've had or seen to a microcontroller not having interrupts, even "why not add interrupts? It would take very little silicon and you don't have to use 'em if you don't want to." It's against that designer guru guy's religion or something.

As far as I know there's no other microcontroller that doesn't have interrupts, and I can't recall one that didn't. The Apple ][ didn't use interrupts even though the 6502 had them. Maybe there were some 4-bit microprocessors that didn't have any interrupts.
Reply to
Ben Bradley

One question you should ask yourself is why you think a parallel processor (particularly the mesh-organised ones) really needs to include interrupts. You have many processors, all the same, simple, no frills. You can afford to dedicate a processor to deal with inputs that need to be responded to rapidly, without the need for interrupts. I know that, to some, it might seem a waste of a processor, but with heavily parallel chips you can afford to think about how you allocate I/O around the processor array.

Check back in the history of processors and you will see why the interrupt was thought necessary in the first place. With the world heading toward much more use of multi-parallel processors, I suspect the need for interrupts will wane.

--
******************************************************************** 
Paul E. Bennett............... 
Reply to
Paul E. Bennett

Why not let the end user decide whether interrupts are useful ? Adding the support is not that complicated, and allows you to have a single processor performing multiple tasks, and still achieve low latency response to external events.

If I need a fast and predictable (but very simple) response to an external event, it is a waste to dedicate an entire processor to the job. If you don't care about wasting silicon, why not waste some on some interrupt logic, and offer the best of both worlds ?

Reply to
Arlet Ottens

(Please do not snip newsgroups by using "followup to", unless the post really is off-topic in a group.)

Were there not small Microchip PIC devices without interrupts? The PIC12 series, or something like that (I didn't use them myself).

That argument might have merit /if/ this chip had many processors. It only has 8. I think the idea of splitting a design into multiple simple, semi-independent tasks that all work on their own cpu/core/thread, in their own little worlds, is very elegant. It can give great modularisation, re-use of software-components, and easy testing. But you need /many/ more than 8 threads for such a system with real-world programs - otherwise you have to combine tasks within the same thread, and you have lost all the elegance.

So then you might have a system with lots more cores - say 64 cores. Then you have enough to do quite a number of tasks. But to get that with a realistic price, power and size, these cores will be very simple and slow - which means that you can't do tasks that require a single core running quickly.

What makes a lot more sense is to have a cpu that has hardware support for a RTOS, and is able to switch rapidly between different tasks. That way demanding tasks can get the cpu time they need, while you can also have lots of very simple tasks that give you the modularisation in code without having to dedicate lots of silicon. The XMOS does a bit of this, in that it has 8 threads per cpu that can run up to 100 MIPS each (IIRC), but with a total of 500 MIPS per cpu, and it also has inter-process communication in hardware.
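The XMOS figures quoted above (from memory, so treat the numbers as assumptions rather than datasheet values) imply a simple throughput model: each thread gets a round-robin share of the pipeline, capped per thread.

```python
def xmos_thread_mips(active_threads: int, total_mips: int = 500,
                     per_thread_cap: int = 100) -> float:
    # Rough model of the quoted figures: the pipeline delivers
    # total_mips shared round-robin among active threads, but one
    # thread can never exceed per_thread_cap on its own.
    return min(per_thread_cap, total_mips / active_threads)

four  = xmos_thread_mips(4)   # few threads: each hits the per-thread cap
eight = xmos_thread_mips(8)   # all eight: each gets an equal 1/8 share
```

This is the trade-off being argued: a shared fast pipeline degrades gracefully as you add tasks, whereas fixed identical cores give each task the same speed whether you use one or all of them.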

Look how many interrupts a modern PC or large embedded system has - they outnumber the number of cores by 50 to 1 at least. Interrupts are not going away.

Reply to
David Brown

A modern CPU has 10 cores. So you say that they have far more than 500 interrupts? An Intel with an interrupt vector table of 500? The largest IRQ I have seen fiddling in the BIOS startup screen is 15. I may look at this the wrong way, but 500+ seems excessive.

Groetjes Albert

--
Albert van der Horst, UTRECHT,THE NETHERLANDS 
Economic growth -- being exponential -- ultimately falters. 
Reply to
Albert van der Horst

There has been much discussion in comp.arch re: this very question. The consensus has been that interrupts are extremely difficult to implement properly (in the hardware and/or microcode), and most chips don't do it right, leading to the occasional unavoidable glitch even when handler code is written correctly per the CPU documentation.

There also has been much discussion of non-interrupting systems where cores can be devoted to device handling. The consensus there is that interrupts per se are not necessary, but such systems still require inter-processor signaling. There has been considerable debate about the form(s) such signaling should take.

George

Reply to
George Neuner

The GA144 has no interrupts since you just dedicate a processor to the event you want to listen for. The processors have i/o ports (to external pins or to adjacent processors on the chip) that block on read, so the processor doesn't burn power while waiting for data.

Reply to
Paul Rubin
