New embedded CPU architecture - Page 2

Re: New embedded CPU architecture

On the point of how many general purpose registers there should be,
I'd like to just point out that the Infineon TriCore does a full
context save in 2 clock cycles!  This includes a total of 32 x 32-bit
registers.  At the same time, if you don't need all the registers,
half the context is automatically saved as the processor starts to
respond to an interrupt.  So, you're talking about sub-100 ns context
saves/restores on a 40 MHz clock.
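
A quick sanity check of that arithmetic, as a minimal Python sketch (the
function name is just illustrative, and it only covers the 2-cycle save
itself, not any memory traffic behind it):

```python
def cycles_to_ns(cycles, clock_hz):
    """Convert a cycle count to nanoseconds at a given clock frequency."""
    return cycles * 1e9 / clock_hz

# 2 cycles at 40 MHz, the figures quoted above for a context save
save_ns = cycles_to_ns(2, 40e6)
print(save_ns)  # 50.0
```

So a save followed by a restore at 2 cycles each would come to about
100 ns at this clock, and either one alone is well under it.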

It's a good architecture to look at for some very advanced
architectural features, involving response time and performance.  It's
also not as stack-dependent as older architectures tend to be.


Join the Infineon TriCore Users Group Mailing List
Details - /

Re: New embedded CPU architecture

That simply can't be the full story. Presumably you mean 2 core clock
cycles, not memory clocks? All that data will probably have to hit RAM at
some time, outside of your control, and then the real cost comes (writing
128 bytes at memory speed).

Re: New embedded CPU architecture

If you look at the registers as being an on-chip separate memory,
with no connection with real memory, you simplify things.  I would
make the underlying machine easy to implement a stack machine on
by placing a portion (top 4 or 8 items) of the stack in
registers.  These can be thought of as a queue, the idea being to
minimize actual stack access.  A push, when full, will empty, say,
4 of them by automatically writing to the external stack.  A pop,
when empty, will reload 4 again.
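
The spill/refill behaviour described above can be sketched in a few
lines of Python.  This is only a rough model under assumptions: the
class name, the 8-register size and the counter of external transfers
are made up for illustration, and the burst of 4 is the "say, 4" from
the text.

```python
class StackCache:
    """Top of stack held in a few on-chip registers, backed by external memory."""

    def __init__(self, size=8, burst=4):
        self.size = size        # number of on-chip stack registers (assumed 8)
        self.burst = burst      # items moved per spill or refill (assumed 4)
        self.regs = []          # cached items, oldest first, TOS last
        self.memory = []        # external stack, oldest first
        self.mem_accesses = 0   # external word transfers so far

    def push(self, value):
        if len(self.regs) == self.size:
            # Full: write the oldest `burst` items out to the external stack.
            self.memory.extend(self.regs[:self.burst])
            del self.regs[:self.burst]
            self.mem_accesses += self.burst
        self.regs.append(value)

    def pop(self):
        if not self.regs:
            if not self.memory:
                raise IndexError("stack underflow")
            # Empty: reload up to `burst` items from the external stack.
            n = min(self.burst, len(self.memory))
            self.regs = self.memory[-n:]
            del self.memory[-n:]
            self.mem_accesses += n
        return self.regs.pop()


s = StackCache()
for i in range(10):
    s.push(i)
popped = [s.pop() for _ in range(10)]
print(popped)          # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(s.mem_accesses)  # 8
```

Twenty stack operations cost eight external word transfers here, and
shallow push/pop sequences never touch external memory at all.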

This leaves 24 of the proposed on chip registers.  Make 16 real gp
registers.  Of the remaining 8 we have:

   program status (PSW)
   instruction counter
   stack pointer
   local base pointer
   global data pointer

with space for future expansion.  The status word includes the sp
offset into the 8 on chip stack registers, and the marker for
junction with memory there.  Both 4 bits.  Also a field to
identify the register group in use.

Bear in mind this is just a rough outline.  There will be gotchas.

Now look at the action of an interrupt.  Each discrete (say 4)
interrupt specifies only a new set of registers.  The PSW is
copied to a dedicated register, the register set switched, and the
saved PSW pushed.  No need to save stack pointers, instruction
counter, etc.  The interrupt exit is simply to pop that saved PSW,
returning the original register set etc.
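
As a rough simulation of that interrupt entry and exit (a sketch only:
the `Machine` class, the dictionary layout and the choice of five
register sets are assumptions, and each set is assumed to carry its own
stack since the stack pointer lives in the switched registers):

```python
class Machine:
    def __init__(self, num_sets=5):
        # Each register set has its own gp registers, instruction
        # counter and stack (hypothetical layout).
        self.sets = [{"ic": 0, "gp": [0] * 16, "stack": []}
                     for _ in range(num_sets)]
        self.psw = {"reg_set": 0, "flags": 0}

    @property
    def current(self):
        return self.sets[self.psw["reg_set"]]

    def interrupt(self, reg_set):
        saved = dict(self.psw)               # PSW copied to a dedicated register
        self.psw["reg_set"] = reg_set        # register set switched
        self.current["stack"].append(saved)  # saved PSW pushed on the new stack

    def interrupt_exit(self):
        # Popping the saved PSW brings back the original register set.
        self.psw = self.current["stack"].pop()


m = Machine()
m.current["gp"][0] = 42      # state of the interrupted code
m.interrupt(2)
m.current["gp"][0] = 7       # handler uses its own registers
m.interrupt_exit()
print(m.psw["reg_set"], m.current["gp"][0])  # 0 42
```

Nothing belonging to the interrupted code is copied anywhere except
the PSW itself.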

Things have to be set up, so the register addressing has to be
able to write into all those other registers.  This is privileged,
by using it in register set 0 only.

Possibly a pair of non-switched registers are needed, for process
to process communication.

The instruction set basically operates on the stack, which is
usually in registers.  Add is effectively pop, pop, sum, push,
etc.  Load converts an address on the stack to a value.  Store moves
top of stack (TOS) to the address below it, and pops both values.
It may be convenient to have those instructions able to impose a
small stack offset, which if non-zero leaves the address on stack.
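
The three basic operations could look like this in Python (a minimal
sketch: the function names are invented, a plain list stands in for the
stack, a dict stands in for memory, and the optional small-offset
variant is left out):

```python
def op_add(stack):
    # pop, pop, sum, push
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)

def op_load(stack, memory):
    # Replace the address on top of the stack with the value stored there.
    addr = stack.pop()
    stack.append(memory[addr])

def op_store(stack, memory):
    # Move TOS to the address just below it, popping both.
    value = stack.pop()
    addr = stack.pop()
    memory[addr] = value


memory = {100: 5, 104: 0}
stack = [104, 100]       # destination address, then source address
op_load(stack, memory)   # stack is now [104, 5]
stack.append(3)          # push a literal
op_add(stack)            # stack is now [104, 8]
op_store(stack, memory)  # memory[104] = 8, stack is empty
print(memory[104], stack)  # 8 []
```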

This system also provides for multi-processing, and context saving
is already handled. All the executive has to do is decide which
process runs next, and set the appropriate reg. set value to
return to.

The whole system is fairly easily simulated.  It makes it easy to
avoid accessing external memory, whether cached or not.
Instruction access can be through an on chip cache.

Chuck F
   Available for consulting/temporary embedded and systems.
Re: New embedded CPU architecture

You're right, it isn't the full story, of course :-)  The 2-cycles
apply to context saving/ restoring from on-chip memory.  In the case
of the TC1775 implementation, there is 32KBytes of on-chip memory that
can be used for this purpose.

It takes 64 bytes (16 x 32-bit registers) for a half context.  Often
this is enough, though no OS will do that for you directly - it would
require manual coding to benefit from this.

If you decide to use, say 25K of the on-chip 32K for contexts, you are
looking at storing 400 half-contexts or 200 full contexts.  Depending
on the needs of your application, this may be sufficient and extremely
attractive!  But, yes, the numbers will change if it has to go to
off-chip memory.
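
The capacity figures work out as follows (a small Python check,
assuming "25K" means 25 x 1024 bytes; the constant names are only for
illustration):

```python
REG_BYTES = 4                   # 32-bit registers
HALF_CONTEXT = 16 * REG_BYTES   # 64 bytes
FULL_CONTEXT = 32 * REG_BYTES   # 128 bytes
budget = 25 * 1024              # 25K of the 32K on-chip memory

half_contexts = budget // HALF_CONTEXT
full_contexts = budget // FULL_CONTEXT
print(half_contexts, full_contexts)  # 400 200
```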

But, the main point I wanted to make was that context save/ restore on
the TriCore architecture is hardware-assisted and is potentially very
fast - sub-100 ns context switches do make the mouth drool a tad bit.


Join the TriCore Users Group Mailing List
Details - /

chip sales numbers

I am looking for recent figures on microprocessor
sales.  Detail is not important (ie not interested
in break down by geographic region, or market sector),
but I would like some degree of authority (ie not the
usual quote from 5 years ago about what will be
happening in 5 years time).

I want to be able to cite the source in a paper.  So
it must be publicly available.


Re: New embedded CPU architecture

Sounds like homework to me...

First draw up a list of current microcontrollers.
Then list : what they have in common, what they have that
is unique, and what market share each has.

List market share against release date of original core, and
software maturity.

 Next, define your target users, and what
MIPS/CODE/RAM/PIN/Peripheral/Price numbers they will need.

 Once you have CODE/RAM defined, you can derive opcodes to efficiently
access those areas.
 Conversely, if you do not define CODE/RAM targets first, you cannot
design efficient opcodes.

 'Do efficient opcodes really matter' can be next semester's homework...

Re: New embedded CPU architecture

Nope, I have actually designed and implemented the CPU and toolchain
already. Just looking for a few new ideas to make it stand (further)
out from the crowd when I release it.

Say what you want, and you might just get it!


Re: New embedded CPU architecture
In reply to Jon Beniston:

Look at the Mcore and NIOS instruction sets to see what instructions
they thought were important enough to keep in.

Certain bit operations are hard to do in software but trivial in
hardware. For example, find-first-bit is just a priority encoder. Nice
to have if you simulate floating point. Also, a REP instruction
(repeat next instruction n times) doesn't take much hardware.
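
To show what the software fallback costs, here is a minimal Python
sketch of find-first-bit and the normalisation step it feeds in a
soft-float routine.  It assumes "first" means the most significant set
bit and a 32-bit word; the function names are invented.

```python
def find_first_set(word, width=32):
    """Index of the most significant set bit, or -1 if word is zero."""
    for bit in range(width - 1, -1, -1):   # up to `width` iterations in software
        if word & (1 << bit):
            return bit
    return -1

def normalize(mantissa, width=32):
    """Shift mantissa left until its top bit is set; return (mantissa, shift)."""
    msb = find_first_set(mantissa, width)
    if msb < 0:
        return 0, 0
    shift = width - 1 - msb
    return (mantissa << shift) & ((1 << width) - 1), shift


print(find_first_set(0x00010000))   # 16
print(normalize(0x00010000))        # (2147483648, 15)
```

The loop is what a priority encoder replaces with a single
combinational lookup.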

You can check out the instruction set of my soft CPU, which is being used
in a 0.35u SoC, so it won't be too long before I find out how well it works
for real.

Brad Eckert

Re: New embedded CPU architecture

REP should be avoided like the plague - it might not take much hardware to
implement the execution, but it would play havoc with interrupts.  You'd
either have to wait for it to complete (giving you non-deterministically long
interrupt delays) or make it restartable (leading to much more complex
interrupt handling overhead just to cover this one case).  It's far better
to have a sort of a DBNZ (Decrement and branch if not zero) instruction to
repeat the *previous* instruction, using a general register as the counter
which you have loaded earlier.  If you have an instruction prefetch queue a
few instructions long, you can get this to run at top speed with no memory
accesses to read the program - without having to make any changes to the
interrupt system.  (The 68332 has this "loop mode" - it is totally
transparent to the user, and to almost all of the processor's hardware, but
greatly speeds up memory copy loops.)
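
A toy model of the latency argument, in Python.  The assumptions are
loud ones: interrupts are only taken at instruction boundaries, every
instruction takes one cycle, and the REP is the non-restartable kind.
None of these numbers come from a real part.

```python
def worst_case_wait(instruction_cycles):
    """Longest time (in cycles) an interrupt can be held off, assuming
    interrupts are only taken at instruction boundaries."""
    return max(instruction_cycles)

N = 1000  # words to copy (arbitrary)

# Non-restartable REP: the whole repeat behaves as one long instruction.
rep_copy = [N * 1]

# MOVE + DBNZ pair executed N times: every instruction is a boundary.
dbnz_copy = [1, 1] * N

print(worst_case_wait(rep_copy))   # 1000
print(worst_case_wait(dbnz_copy))  # 1
```

The DBNZ version executes more instructions in total, which is the
cost a prefetch-queue "loop mode" is meant to hide.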

Re: New embedded CPU architecture

Is this going to be a soft core, or fabbed in silicon (and if so, in what
process)?

What are the target RAM/CODE sizes/speeds it will work with?

Does it need to fetch code from external memory?

 One feature, seen in the Z8, and C166 but missing in the AVR,
is a register frame pointer.
 Some 80C51 variants are being talked about with a RAM frame pointer,
and Ubicom have a natural extension to task switch, by using
HW to time-slice such Frame pointers.


Re: New embedded CPU architecture

It will be available as a soft core (Verilog). It will be free to
hobbyists / academics. It will be in silicon before the end of the
year as part of a licensee's SoC.

Work on the compiler is continuing, but currently code size is at
least as good as m6811/arm-thumb/mips-16. Clock-for-clock, performance
is better.

As it's a soft-core, the RAM interface is up to you.

You can have it on or off chip, it's up to you.


Re: New embedded CPU architecture

   There's probably a few more things: maximum clock speed, minimum
clock speed (zero (static) is best), total power at max and min
speeds, MIPS/milliwatt, ALU features such as MAC (as an instruction in
the CPU core, not as a peripheral as the 430 does it), barrel


Re: New embedded CPU architecture

How about direct support in hardware for multi-threading?  Multiple
program counters etc., with mechanisms where specific interrupts
can be allocated to a specific thread.

   Anton Erasmus

Re: New embedded CPU architecture

Doesn't sound too good to me: You would limit the number of threads in HW or
your OS kernel would have to revert to the conventional context switching -
with the additional overhead of checking if there's enough HW resource. You
would also have to design/manufacture/pay for HW for the high (constant)
number of threads even if your particular application would use only a
fraction of those.

Many CPU architectures support alternative register sets for interrupt
handlers (ADSP 21xx and ARM come to my mind) but that's not exactly what
you're talking about, it's far from being generic.

Some RISC architectures support a moving window over their register bank (I
don't know the right term) so that the caller and the callee can work on a
different set of registers, thus implementing some limited stack in the
registers. That's also somewhat different from your idea though.

Andras Tantos

Re: New embedded CPU architecture
andras says...
Yes and no.  The applications I've done have had only a few threads that
required the fastest response.  The 80c196 (Intel) and 80c166/ST10 (SGS &
Infineon) use mechanisms similar to this.  Although I've yet to use an
RTOS that takes advantage of them, they are very useful for fast interrupt
responses since you only have a few registers to save for context rather
than tens.  Something like:

Receive interrupt
save current context pointer
load interrupt context pointer
do interrupt processing
restore previous context pointer
return from interrupt

Where the context pointer register points to a bank (window) of registers
that contains the stack pointer, ALU registers and a working register.

Compare that to the more usual

Receive interrupt
save current register set (all registers in set)
do interrupt processing
restore previous register set
return from interrupt

The key question of course is how many registers you have in the working
set.  If it's only a few you've gained nothing, but if it's on the
order of tens there is the potential for a much faster context switch.
Also the interrupt gets its own stack rather than needing to reserve
space in all the task stacks for interrupt overhead.
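
The difference between the two sequences can be sketched in Python.
This is a simplified model, not the 80c196 or 80c166 programming
model: the class, the bank size of 16 and the `words_moved` counter are
assumptions for illustration.

```python
class BankedCPU:
    def __init__(self, ram_words=64, bank_size=16):
        self.ram = [0] * ram_words   # register banks live in on-chip RAM
        self.bank_size = bank_size
        self.cp = 0                  # context pointer: base of current bank
        self.saved_cp = []           # one saved word per nested interrupt
        self.words_moved = 0         # words saved/restored so far

    def reg(self, n):
        return self.ram[self.cp + n]

    def set_reg(self, n, value):
        self.ram[self.cp + n] = value

    def enter_interrupt(self, bank):
        self.saved_cp.append(self.cp)     # save current context pointer
        self.cp = bank * self.bank_size   # load interrupt context pointer
        self.words_moved += 1

    def leave_interrupt(self):
        self.cp = self.saved_cp.pop()     # restore previous context pointer
        self.words_moved += 1


cpu = BankedCPU()
cpu.set_reg(3, 99)        # task state in bank 0
cpu.enter_interrupt(1)
cpu.set_reg(3, 1)         # handler works in bank 1
cpu.leave_interrupt()
print(cpu.reg(3), cpu.words_moved)  # 99 2
```

Saving and restoring all 16 registers of a bank the conventional way
would move 32 words for the same interrupt instead of 2.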

If you extend this context switch to task level (and nothing in either of
the architectures I've mentioned would appear to make that difficult),
you would presumably reserve the limited available register contexts for
your most critical tasks, which will need the extra planning overhead.


Re: New embedded CPU architecture
On Wed, 1 Oct 2003 11:49:46 -0700, "Andras Tantos"

My idea is that you keep the hardware threads for device driver type
code. One can still run a normal RTOS, with arbitrary number of tasks
as one of the threads. Also I do not mean that one should try and
speed up context switching.  The hardware threads must be able to
execute concurrently, i.e. either interleave instructions of different
threads, or have some parallel hardware to execute code from
different threads simultaneously.
One thread can execute a TCP/IP stack, another thread can
execute some real time filter on audio data, or a software modem.
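
One way to picture the interleaved option is a fixed round-robin over
per-thread program counters.  The Python below is only a sketch of
that one possibility; the function, the thread names and the
instruction labels are all made up, and a real design might schedule
slots differently.

```python
from itertools import cycle

def interleave(threads, total_slots):
    """Issue one instruction per slot, rotating round-robin over the
    hardware threads.  `threads` maps a name to a non-empty list of
    instructions, which is treated as an endless loop."""
    pcs = {name: 0 for name in threads}   # one program counter per thread
    trace = []
    for name in cycle(threads):
        if len(trace) == total_slots:
            break
        code = threads[name]
        trace.append((name, code[pcs[name] % len(code)]))
        pcs[name] += 1
    return trace


threads = {"tcpip": ["rx", "csum"], "filter": ["mac"]}
trace = interleave(threads, 4)
print(trace)
# [('tcpip', 'rx'), ('filter', 'mac'), ('tcpip', 'csum'), ('filter', 'mac')]
```

In this model the filter thread gets every second slot no matter how
busy the TCP/IP thread is, and no context is ever saved or restored.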

With something like a TCP/IP stack on current processors, the amount
of CPU cycles needed by this code is highly dependent on net traffic.
Most of this traffic might even be discarded by the stack.

All these architectures try and minimise the time for a context
switch. I think if one can get away with no context switching for
at least some or most of the device driver type tasks, then the
overall speed will be improved. I also think that one can make things
a bit more deterministic for protocol stacks etc.

   Anton Erasmus

Re: New embedded CPU architecture

I am in total agreement with you...
A multithreaded processor would be real cool.
I did a white paper on multithreading at National, but they are just
focusing on analog nowadays, it seems.
The National MCU management/apps people have since left the company, and I
believe they are doing MT technology nowadays...
The reason why it has not taken off, and why Intel's HT technology only gives
you a few %, is the evil cache-thrashing effect.
If you always execute from internal memory, you are safe and clear.

Best Regards,
Ulf Samuelsson
Re: New embedded CPU architecture

They are very cool - I have a couple sitting on the desk in front of me  ;-)


Re: New embedded CPU architecture


That is where the National Semiconductor guys I spoke to in 1995-1996 ended up.

Best Regards,
Ulf Samuelsson
Re: New embedded CPU architecture
On Fri, 03 Oct 2003 23:30:19 +0100, Dave Hudson


I had a look and it looks quite nice. Is there a port of gcc to this
device available? I have found some references to ip3000
optimisations in gcc, but on the gnu site, the closest target is
ip2k-*-elf. Are there any sites that have some info on using the
gnu tools for the ip2000 and ip3000?

   Anton Erasmus
