Parallax Propeller

Of course you could design a core which restarts every time an external or internal interrupt occurs (such as a request packet sent by another core), runs to completion, and then puts the core back into a low-power halt state.

This works for some problems, but sooner or later you end up with a hellish state machine that remembers where you were when the previous interrupt occurred.
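To make that concrete, here is a minimal sketch (plain C, every name hypothetical) of the explicit state variable such a run-to-completion core ends up dragging around between wake-ups:

enum event      { EV_REQUEST, EV_RX_BYTE, EV_RX_DONE, EV_ACK };
enum xfer_state { IDLE, AWAIT_HEADER, AWAIT_PAYLOAD, AWAIT_ACK };

static enum xfer_state state = IDLE;

/* Called once per wake-up; the core halts again when it returns. */
void on_wakeup(enum event ev)
{
    switch (state) {
    case IDLE:          if (ev == EV_REQUEST) state = AWAIT_HEADER;  break;
    case AWAIT_HEADER:  if (ev == EV_RX_BYTE) state = AWAIT_PAYLOAD; break;
    case AWAIT_PAYLOAD: if (ev == EV_RX_DONE) state = AWAIT_ACK;     break;
    case AWAIT_ACK:     if (ev == EV_ACK)     state = IDLE;          break;
    }
    /* ...and this is before error paths, timeouts and nested requests. */
}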

Reply to
upsidedown

It does look as though the AVR TWI interface was designed to be controlled just that way.

Mel.

Reply to
Mel Wilson


I'm inclined to agree, though I've only had experience with 'classic' micro-processors in this regard, so maybe my thoughts on the issue are simply out of date. I can see that if you have a lot of cores you can effectively make your own interrupt controller by dedicating a core or more to it. That idea seems to make sense on a simple device like the GA devices, where each core is very primitive in its own right, so one can argue that the 'cost' of assigning a core to the task of interrupt detection is low. However, the idea does not sit well with me when talking about complex devices such as the Propeller. Dedicating a cog to interrupt control sounds bonkers to me, especially when a cog has its own video controller - that's real overkill.

I get the impression that the Propeller is somewhat dumbed-down for the hobbyist market. I cite its programming language, and the lack of interrupts, as two examples. Why couldn't they add a 9th super-simple core just for interrupts, one that could pipe certain types of interrupts to certain cogs? Best of both worlds.

The TMS99xx family of processors (very old) has 16 prioritised cascading interrupts. Probably inherited from mini-computer architecture. Very very powerful for its day. Since they were prioritised, a lower level interrupt would not interrupt a higher level interrupt until the higher level ISR terminated. Makes serving multiple interrupts an absolute doddle. Not bad for 1976.
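A rough sketch of that priority scheme, in illustrative C only (the real TMS99xx does this in hardware via the status-register mask; the names here are made up):

#include <stdint.h>

/* Level 0 is the highest priority. Accepting a level-n interrupt drops the
 * mask to n-1, so only strictly higher-priority requests can preempt the
 * running ISR; everything else stays pending until it returns. */
static volatile uint8_t int_mask = 15;          /* accept levels 0..15 */

void dispatch(uint8_t level, void (*isr)(void))
{
    if (level > int_mask)
        return;                                 /* lower priority: stays pending */

    uint8_t saved = int_mask;
    int_mask = (level == 0) ? 0 : level - 1;    /* block equal and lower levels */
    isr();
    int_mask = saved;                           /* mask restored on return */
}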

Reply to
Mark Wills

I played with the PropTerm and found that the CPU-generated VGA bit stream (a CPU got dedicated to the task) resulted in displays which always had a little bit of fuzziness. It worked, and was quite readable, but the sharpness of a regular PC display really made me aware of the limits of a pure software approach to analog generation.

Andy

Reply to
None

Doddle? I've never heard that word before. Is a doddle good or bad?

Reply to
Hugh Aguilar

doddle = extremely simple/easy

"Did you manage to fix that bug?" "Yeah, it was a doddle!"

:-)

Reply to
Mark Wills

How do you know the display "fuzziness" was due to software timing? I would expect software timing on a clocked processor to be on par with other means of timing. There are other aspects of design that could cause fuzziness or timing ambiguities in the signal.

Rick

Reply to
rickman

I have worked on a couple of event-driven ISA designs. Jitter is visible on displays, but it is equally problematic in control systems. The best solution that I have seen/used is to have the hardware transfer out a precomputed value, or latch an input, on the event interrupt trigger. Output values are almost always known in advance.

This minor change has essentially no impact on processor silicon complexity.

A second important performance win is to have an easily accessed data area associated with each interrupt source. It means that a lot of common code (PWM, AC phase control...) can be a single executable. In some hardware, preloading an index register with the start of that interrupt's data area gives a significant performance improvement.
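Something like this, as a hedged sketch (hypothetical names; the point is one handler body serving many sources through a per-source context block):

#include <stdint.h>

struct pwm_ctx {
    volatile uint32_t *out_reg;   /* where the precomputed value gets written */
    uint32_t next_value;          /* computed ahead of the event */
    uint32_t period;
};

static struct pwm_ctx channels[8];   /* one context per interrupt source */

/* 'c' is preloaded (by hardware or a thin dispatch stub) with the data area
 * of whichever source fired, so this one routine serves every channel. */
void pwm_isr(struct pwm_ctx *c)
{
    *c->out_reg   = c->next_value;                      /* output known in advance */
    c->next_value = (c->next_value + 1) % c->period;    /* precompute the next one */
}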

Walter Banks Byte Craft Limited

Reply to
Walter Banks

On Thu, 24 Jan 2013 23:03:32 -0800 (PST), Mark Wills wrote:

As we say: "een fluitje van een cent" (a one-cent whistle). A one-cent whistle costs nearly nothing and can be made for nearly nothing. There is a herb (Anthriscus sylvestris) we call Fluitekruid. Due to purism rather like the French (hash-tag vs. mot-dièse), we must write Fluitenkruid, as if it were plural.

--
Coos 

CHForth, 16 bit DOS applications 
http://home.hccnet.nl/j.j.haak/forth.html
Reply to
Coos Haak

If you look at the inner loop driving the output pin, you can do a min/max skew calculation which ends up with quite a bit of jitter on the table. The product is the PockeTerm; you can pick one up at:

formatting link

It's open source, VGA_HiRes_Text.spin is the low level driver for VGA output. Note it actually uses *two* CPUs, and is some pretty darn cool assembly code--written by the president of the Propeller company!

Andy Valencia Home page:

formatting link
To contact me:
formatting link

Reply to
None

I don't follow what causes the skew you mention. Instruction timings are deterministic, no? If not, trying to time using code is hopeless. If the timings are deterministic, the skew should not be cumulative since they are all based on the CPU clock. Is the CPU clock from an accurate oscillator like a crystal? If it is using an internal RC clock, again timing to sufficient accuracy is hopeless.

Rick

Reply to
rickman

The chip has a lower level bit stream engine which the higher level CPU ("cog") is feeding. Well, a pair of cogs. Each cog has local memory and then a really expensive path through a central arbiter ("hub"). It fills its image of the scanlines from the shared memory, then has to feed it via waitvid into the lower level. Note that it's a bit stream engine *per cog*, so you also have to worry about their sync.

So yes, instruction timings are deterministic (although your shared memory accesses will vary modulo the hub round-robin count). You need to reach the waitvid before it's your turn to supply the next value. But given that, this is much like the old wait state sync feeding bytes to a floppy controller. PLL and waitvid sync are achieved with magic incantations from Parallax, and it is not 100%.

The board has a CPU clock from which the PLL derives the video output frequency. I recall the CPU clock being based on a crystal, but not one with any consideration for video intervals. And the PLLs are per cog, hence my earlier comment about the (potential) lack of global sync.

Anyway, you should buy one and check it out. I'd be curious to hear if (1) you also observe the same video quality, and (2) if you think it's the waitvid mechanism, more the PLL->SVGA generation, or the sync issues of the paired video generators. They even supply the schematic, FWIW.

Andy Valencia Home page:

formatting link
To contact me:
formatting link

Reply to
None

The instruction times are deterministic (presumably; I've never written code on the Propeller), but when generating video in software, all possible code paths *per scan line* have to add up to the same number of cycles in order to completely avoid jitter. That's very hard to do.

Consider a single scan line that contains text interspersed with spaces. For the current horizontal position the software has to:

  • Determine if background or foreground (i.e. a pixel of text colour) should be drawn
  • If background:
      • Select background colour to video output register
  • If foreground:
      • Determine character under current horizontal position
      • Determine offset (in pixels) into the current line of the character
      • Is a pixel to be drawn?
      • If yes, load pixel colour
      • Otherwise, load background colour

The second code path is a lot more complex, containing many more instructions, yet both code paths have to balance in terms of execution time. This is just one example.
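For illustration only (a real driver does this in cycle-counted assembly, and every name below is made up), the shape of the problem looks like this: the cheap branch has to be padded until it costs the same as the expensive one.

#include <stdint.h>

extern volatile uint32_t VIDEO_OUT;        /* hypothetical pixel output register */

static inline void pad_cycles(int n)       /* stand-in for a counted run of NOPs */
{
    while (n--)
        __asm__ volatile ("nop");
}

void emit_pixel(int is_text, uint8_t glyph_row, int bit,
                uint32_t fg, uint32_t bg)
{
    if (!is_text) {
        VIDEO_OUT = bg;                    /* cheap path...                       */
        pad_cycles(6);                     /* ...padded to match the other branch */
    } else {
        uint32_t px = (glyph_row >> bit) & 1u;   /* is this column's pixel set?   */
        VIDEO_OUT = px ? fg : bg;
    }
}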

This is how video is done on the original Atari VCS console: 100% software, with the hardware only providing horizontal interrupts (one per scan line) and VBLANK interrupts, IIRC.

Caveat: The above assumes that there is no interrupt per horizontal pixel. With interrupts, it's much easier. The Propeller doesn't have any interrupts so software video generation would be non-trivial to say the least. The easiest way would be to provide a pixel clock and use an I/O pin to sync to, as Chuck found out for himself when implementing video on the GA144.

Reply to
Mark Wills

I really don't understand why you say this. You need to be able to sync to a timing pulse; whether this is done with interrupts doesn't matter.

Andrew.

Reply to
Andrew Haley

On the Atari VCS the software did not have to send out the individual pixels. The TIA chip had memory for a single scan line, which it converted to a video signal autonomously. The software just had to make sure that the right data was loaded into the TIA chip in time for each scan line; it could finish doing that before the end of the scan line, but not after. The TIA chip also has a function to stall the CPU until the start of the next scan line. In other words, the software had to be fast enough for each possible execution flow, but did not have to complete in the exact same number of cycles.
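The per-line loop looks roughly like this (sketched in C rather than 6502 assembly; WSYNC, the playfield registers and COLUBK are the usual TIA names, the rest is made up):

#include <stdint.h>

extern volatile uint8_t WSYNC;             /* any write stalls the CPU until the next scan line */
extern volatile uint8_t PF0, PF1, PF2;     /* playfield graphics registers */
extern volatile uint8_t COLUBK;            /* background colour */

void draw_frame(const uint8_t pf[][3], const uint8_t bk[], int lines)
{
    for (int y = 0; y < lines; y++) {
        COLUBK = bk[y];                    /* any code path is fine, as long as */
        PF0    = pf[y][0];                 /* it finishes before the line ends  */
        PF1    = pf[y][1];
        PF2    = pf[y][2];
        WSYNC  = 0;                        /* stall here until the next line starts */
    }
}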

Reply to
Dombo

Maybe the reason why we don't have "doddle" or any similar word in America, is because we never do anything the simple/easy way here! :-)

Reply to
Hugh Aguilar

I'm not getting it. I guess the software had to be done this way to optimize the CPU utilization. The "proper" way to time in software is to have the video data already calculated in a frame buffer and use spin loops to time when pixels are shifted out. That way you don't have lots of processing to figure the timing around. But you spend most of your processing time in spin loops. Why was it done this way? To save a few bucks on video hardware? That's just not an issue nowadays... unless you are really obsessive about not using hardware where hardware is warranted.
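Roughly this shape, as a sketch under invented names (a real driver would count exact cycles rather than use a crude busy-wait):

#include <stdint.h>

#define PIXEL_SPACING 4u                   /* hypothetical: padding per pixel */

extern volatile uint32_t PIXEL_OUT;        /* hypothetical video output register */
extern uint32_t framebuf[480][640];        /* already rendered elsewhere */

static inline void spin(unsigned n)        /* crude pacing loop */
{
    while (n--)
        __asm__ volatile ("");
}

void scanout_line(int y)
{
    for (int x = 0; x < 640; x++) {
        PIXEL_OUT = framebuf[y][x];        /* no per-pixel decisions left to make */
        spin(PIXEL_SPACING);               /* pad out to the pixel period */
    }
}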

I would have to go back and reread the web pages, but I think Chuck's original attempt was to time the *entire* frame timing in software with NO hardware timing at all. He found the timings drifted too much from temperature (that's what async processors do after all, they are timed by the silicon delays which vary with temp) so that with the monitor he was using it would stop working once the board warmed up. I'm surprised he had to build it to find that out. But I guess he didn't have specs on the monitor.

His "compromise" to hardware timing was to use a horizontal *line* interrupt (with a casual use of the word "interrupt", it is really a wait for a signal) which was driven from the 10 MHz oscillator node, like you described the Atari VCS. He still did the pixel timing in a software loop. With 144 processors it is no big deal to do that... *OR* he could have sprinkled a few counters around the chip to be used for

*really* low power timing. Each CPU core uses 5 mW when it is running a simple timing loop. One of the big goals of the chip is to be low power and software timing is the antithesis of low power in my opinion. But then you would need an oscillator and a clock tree...

I think there is an optimal compromise between a chip with fully async CPUs, with teeny tiny memories, no clocks, no peripherals (including nearly no real memory interface) and a chip with a very small number of huge CPUs, major clock trees running at very high clock rates, massive memories (multiple types), a plethora of hardware peripherals and a maximal bandwidth memory interface. How about an array of many small CPUs, much like the F18 (or an F32, which rumor has it is under development), each one with a few kB of memory, with a dedicated idle timer connected to lower speed clock trees (is one or two small clock trees a real power problem?), some real hardware peripherals for the higher speed I/O standards like 100/1000 Mbps Ethernet, real USB (including USB 3.0), some amount of on chip block RAM and some *real* memory interface which works at 200 or 300 MHz clock rates?

I get where Chuck is coming from with the minimal CPU thing. I have said before that I think it is a useful chip in many ways. But so far I haven't been able to use it. One project faced the memory interface limitation and another found the chip to be too hard to use in the low power modes it is supposed to be capable of, just not when you need to do real time stuff at real low power. It only needs a few small improvements including *real* I/O that can work at a number of voltages rather than just the core voltage.

Oh yeah, some real documentation on the development system would be useful too. I think you have to read some three or more documents just to get started with the tools. I know it was pretty hard to figure it all out, not that I *actually* figured it out.

Rick

Reply to
rickman

Weird, your posts all show up in my reader as replies to your own messages rather than replies to my posts. The trimming made it hard for me to figure out just what we were talking about with the odd connections in my reader.

I can't picture the processing with this description. I don't know about the higher level and lower level CPUs you describe. Are you saying there is some sort of dedicated hardware in each CPU for video? Or is this separate from the CPUs? Why a *pair* of COGs? I assume a COG is the Propeller term for a CPU?

Not 100%? What does that mean? Magic? I guess this is the magic smoke you want to keep from getting out of the chip?

I still don't know enough about the architecture to know what this means. I don't care if the CPUs are not coordinated closely. If you have a video engine providing the clock timing, why would the CPU timing matter?

I appreciate your enthusiasm, but I have my own goals and projects. I am currently oriented towards absurdly low power levels in digital designs and am working on a design that will require no explicit power source, it will scavenge power from the environment. I don't think a Propeller is suitable for such a task is it?

Rick

Reply to
rickman

I am not sure, but I think "doddle" is perhaps a Scottish term. As far as I could tell from an online dictionary, the origins are from the German word for bagpipe...

Another very useful term is "fangle", which is the Scottish word that perfectly describes the organisation of cables on most embedded developers' desks.

Reply to
David Brown

Sorry. I'm assuming your reader is threading via the "References" field? It looks like my posting software is preserving that.

Yes, each cog has its own PLL and "video" bit stream engine (quotes because they claim it can be used for any sort of analog stream in general). They needed to use a pair of cogs (CPUs) because of the time it takes to pull from screen memory as conceived by the ANSI emulation and generate the scan lines to represent the font plus underline plus cursor. So the idea is one is doing all that while the other is painting scan lines. Double buffering, basically.
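As a hedged sketch of that split, written as two C loops sharing a pair of line buffers (the real thing is Propeller assembly; render_scanline and waitvid_line are stand-ins I've invented):

#include <stdint.h>

extern void render_scanline(uint32_t *buf);       /* stand-in: font + underline + cursor */
extern void waitvid_line(const uint32_t *buf);    /* stand-in: feed the bit stream engine */

static uint32_t linebuf[2][640];
static volatile int ready = -1;                   /* index of the buffer ready to paint */

void renderer_cog(void)                           /* runs on the first cog */
{
    int which = 0;
    for (;;) {
        render_scanline(linebuf[which]);          /* build the next line */
        while (ready >= 0)                        /* wait until the painter is done */
            ;
        ready = which;                            /* hand it over */
        which ^= 1;                               /* and start on the other buffer */
    }
}

void painter_cog(void)                            /* runs on the second cog */
{
    for (;;) {
        while (ready < 0)                         /* wait for a finished line */
            ;
        waitvid_line(linebuf[ready]);             /* paint it */
        ready = -1;                               /* buffer free for reuse */
    }
}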

Yes, there is no formal/deterministic way to lock the PLLs of the two cogs. Everybody uses the sample code Parallax provided, and it has definitely been shown that their "lock" can be skewed.

They have *two* video engines. Each is generated from its own PLL, so the first global clock is a crystal oscillator.

Darn, because I'm pretty sure you are much better equipped to drill down into this than I. :-> But no way; a Propeller is definitely a traditional CPU for your purposes.

Andy Valencia Home page:

formatting link
To contact me:
formatting link

Reply to
None
