New embedded CPU architecture

- Dave Hudson

Fri, Oct 3, 2003 10:30 PM

They are very cool - I have a couple sitting on the desk in front of me ;-)

formatting link

Regards, Dave

- Jim Granville

Sat, Oct 4, 2003 12:47 AM

You should put as much on chip as the resource allows.

Since we have established that this "New embedded CPU architecture" is a soft-core, and not intended as a single-chip uC, you should look at the most common environment, and optimise for that.

Soft-cores can expect BlockRAM, so they should be able to context switch their register set within that block RAM with minimal cycle overhead. That can extend to hardware-based slot assignment, and support for dual-port RAM to communicate between these now isolated/weighted threads.
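
A minimal sketch of the idea, assuming every thread's register set lives in one block RAM and the active set is selected by a base offset. The class name, the 8 contexts and the 16 registers per context are illustrative assumptions, not figures from this thread.

```python
class BankedRegisterFile:
    """Hypothetical model: all register sets share one block RAM."""

    def __init__(self, contexts=8, regs_per_context=16):
        self.regs_per_context = regs_per_context
        # One flat array stands in for the block RAM.
        self.ram = [0] * (contexts * regs_per_context)
        self.base = 0  # RAM address of register 0 of the active context

    def switch_to(self, context):
        # A context switch only changes the base; no register is copied.
        self.base = context * self.regs_per_context

    def read(self, reg):
        return self.ram[self.base + reg]

    def write(self, reg, value):
        self.ram[self.base + reg] = value


rf = BankedRegisterFile()
rf.write(3, 11)      # context 0 sets its r3
rf.switch_to(1)      # "context switch"
rf.write(3, 22)      # context 1 has its own r3
rf.switch_to(0)
print(rf.read(3))    # prints 11
```

In this model the switch costs one assignment regardless of how many registers a context holds, which is the "minimal cycle overhead" being argued for.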

Some uP schemes allow partial register-set overlap, so a smart compiler can pass procedure parameters this way.

For some good examples of Custom soft cores, Industry Std soft cores, and mapping to FPGA fabrics, see

formatting link

They started with a Custom 9 bit opcode device, optimised for the 9 bit Block RAM, but have moved to support industry std cores. (8051/6805/8085/z8..)

-jg

- Morris Dovey

Sat, Oct 4, 2003 2:44 AM

Ok. It's your Si - how many can I have? :-)

--
Morris Dovey
West Des Moines, Iowa USA
C links at http://www.iedu.com/c

- Ulf Samuelsson

Sat, Oct 4, 2003 2:16 PM

"Dave Hudson" wrote in message news:blks3r$q3l$1$ snipped-for-privacy@news.demon.co.uk...

;-)

That is where the National Semiconductor guys I spoke to in 1995-1996 ended up working...

--
Best Regards,
Ulf Samuelsson   ulf@a-t-m-e-l.com
This is a personal view which may or may not be
shared by my Employer Atmel Nordic AB


> Regards,
> Dave
>

- Guy Macon

Sat, Oct 4, 2003 8:42 PM

...

For some reason, whenever I do a web search on "soft core" I get a bunch of results that have *nothing* to do with embedded CPUs...

:)

--
Guy Macon, Electronics Engineer & Project Manager.  Remember Doc Brown 
from the 'Back to the Future' movies?  Do you have an "impossible" 
engineering project that only someone like Doc Brown can solve? 
Check out my resume at http://www.guymacon.com/resume.html/

- Anton Erasmus

Sun, Oct 5, 2003 8:30 AM

Hi,

I had a look and it looks quite nice. Is there a port of gcc to this device available? I have found some references to ip3000 optimisations in gcc, but on the GNU site, the closest target is ip2k-*-elf. Are there any sites that have some info on using the GNU tools for the ip2000 and ip3000?

Regards Anton Erasmus

- Dave Hudson

Sun, Oct 5, 2003 9:34 AM

The IP3000 isn't shipping yet although it should be before the end of the year. Once the part is fully in production we'll commit the IP3k backend code to the FSF and it will be part of future gcc releases.

FWIW the IP2000 code that's in the FSF tree isn't terribly good at optimizing and needs some work. The version that we supply with our SDK generates significantly better code (*much* better in fact) but requires some tweaks to the FSF tree to do it and so far we haven't had time to come up with the necessary generic improvements to merge this back to the FSF tree.

Regards, Dave

- Mohit Sindhwani

Thu, Oct 9, 2003 11:36 AM

*snip*

On the point of how many general purpose registers there should be, I'd just like to point out that the Infineon TriCore does a full context save in 2 clock cycles! This includes a total of 32 x 32-bit registers. At the same time, if you don't need all the registers, half the context is automatically saved as the processor starts to respond to an interrupt. So, you're talking about sub-100 ns context saves/restores on a 40 MHz clock.
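
The arithmetic behind that figure can be checked with a short sketch. It uses only the numbers quoted in the post (2 cycles, 40 MHz, 32 x 32-bit registers); the assumption that a restore costs the same 2 cycles as a save is mine, and nothing here is taken from a TriCore datasheet.

```python
# Figures quoted in the post above.
CLOCK_HZ = 40_000_000   # 40 MHz clock
SAVE_CYCLES = 2         # cycles claimed for a full context save
REGISTERS = 32          # 32 registers ...
REGISTER_BITS = 32      # ... of 32 bits each

cycle_ns = 1_000_000_000 // CLOCK_HZ            # one clock period in ns
save_ns = SAVE_CYCLES * cycle_ns                # time for one save
round_trip_ns = 2 * save_ns                     # save + restore (assumed symmetric)
context_bytes = REGISTERS * REGISTER_BITS // 8  # size of a full context

print(cycle_ns, save_ns, round_trip_ns, context_bytes)  # prints 25 50 100 128
```

So a single save is 50 ns at 40 MHz, and the full context that has to live somewhere is 128 bytes.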

It's a good architecture to look at for some very advanced architectural features, involving response time and performance. It's also not as stack-dependent as older architectures tend to be.

Cheers, Mohit.

-- Join the Infineon TriCore Users Group Mailing List Details -

formatting link

- Tim Clacy

Fri, Oct 10, 2003 1:49 PM

That simply can't be the full story. Presumably you mean 2 core clock cycles, not memory clocks? All that data will probably have to hit RAM at some time, outside of your control, and then the real cost comes (writing 128 bytes at memory speed).

- Tim Clacy

Fri, Oct 10, 2003 1:57 PM

Hmm, it does seem like it wouldn't be very useful :-) However, consider that all of those operations that you listed can be implemented by state-machines and that a state-machine simply requires a move and a copy operation. I'll be posting the VM on CodeProject before the end of the year, then an FPGA core later; have a look then and see what you think.

Tim

- CBFalconer

Sat, Oct 11, 2003 12:19 AM

If you look at the registers as being an on-chip separate memory, with no connection with real memory, you simplify things. I would make it easy to implement a stack machine on the underlying machine by placing a portion (top 4 or 8 items) of the stack in registers. These can be thought of as a queue, the idea being to minimize actual stack access. A push, when full, will empty, say, 4 of them by automatically writing to the external stack. A pop, when empty, will reload 4 again.
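
The spill and refill behaviour described above can be simulated in a few lines. This is a rough sketch under stated assumptions: 8 on-chip stack registers, bursts of 4, and a plain list standing in for the external stack. The class and attribute names are hypothetical.

```python
class CachedStack:
    """Top of stack held in 'registers'; older items live in external memory."""

    def __init__(self, window=8, burst=4):
        self.window = window    # number of on-chip stack registers
        self.burst = burst      # items moved per spill or refill
        self.regs = []          # on-chip items, top of stack is regs[-1]
        self.memory = []        # external stack, newest item is memory[-1]
        self.mem_accesses = 0   # external reads + writes, for comparison

    def push(self, value):
        if len(self.regs) == self.window:
            # Full: write the oldest `burst` items out to the external stack.
            self.memory.extend(self.regs[:self.burst])
            del self.regs[:self.burst]
            self.mem_accesses += self.burst
        self.regs.append(value)

    def pop(self):
        if not self.regs:
            # Empty: reload up to `burst` items from the external stack.
            n = min(self.burst, len(self.memory))
            if n == 0:
                raise IndexError("pop from empty stack")
            self.regs = self.memory[-n:]
            del self.memory[-n:]
            self.mem_accesses += n
        return self.regs.pop()


s = CachedStack()
for i in range(1, 11):
    s.push(i)                         # the 9th push spills items 1..4
popped = [s.pop() for _ in range(10)]
print(popped[0], popped[-1], s.mem_accesses)  # prints 10 1 8
```

Ten pushes and ten pops cost 8 external accesses here instead of 20, and a workload that stays within the 8-register window would cost none.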

This leaves 24 of the proposed on chip registers. Make 16 real gp registers. Of the remaining 8 we have:

    program status (PSW)
    instruction counter
    stack pointer
    local base pointer
    global data pointer

with space for future expansion. The status word includes the sp offset into the 8 on chip stack registers, and the marker for junction with memory there. Both 4 bits. Also a field to identify the register group in use.

Bear in mind this is just a rough outline. There will be gotchas.

Now look at the action of an interrupt. Each discrete (say 4) interrupt specifies only a new set of registers. The PSW is copied to a dedicated register, the register set switched, and the saved PSW pushed. No need to save stack pointers, instruction counter, etc. The interrupt exit is simply to pop that saved PSW, returning the original register set etc.
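
A toy model of that interrupt sequence, offered only as a sketch: it assumes five register sets (set 0 plus one per interrupt), that each set keeps its own instruction counter and stack, and that the saved PSW is pushed on the new set's stack. The outline leaves these details open, so all names and choices below are hypothetical.

```python
class RegisterSet:
    def __init__(self):
        self.pc = 0      # each set keeps its own instruction counter
        self.stack = []  # and its own stack


class Cpu:
    def __init__(self, num_sets=5):
        self.sets = [RegisterSet() for _ in range(num_sets)]
        self.active = 0  # PSW field: register group in use
        self.flags = 0   # stands in for the remaining PSW bits

    def interrupt(self, level):
        saved_psw = (self.active, self.flags)     # PSW copied to a dedicated register
        self.active = level                       # register set switched
        self.sets[level].stack.append(saved_psw)  # saved PSW pushed

    def interrupt_return(self):
        # Popping the saved PSW restores the original register set.
        self.active, self.flags = self.sets[self.active].stack.pop()


cpu = Cpu()
cpu.flags = 3
cpu.interrupt(2)        # enter interrupt 2
cpu.interrupt(4)        # nested interrupt 4
cpu.interrupt_return()  # back in set 2
cpu.interrupt_return()  # back in set 0
print(cpu.active, cpu.flags)  # prints 0 3
```

Note that neither entry nor exit touches the instruction counter or stack pointer of the interrupted set; only the PSW moves.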

Things have to be set up, so the register addressing has to be able to write into all those other registers. This is privileged, by using it in register set 0 only.

Possibly a pair of non-switched registers are needed, for process to process communication.

The instruction set basically operates on the stack, which is usually in registers. Add is effectively pop, pop, sum, push, etc. Load converts an address on the stack to a value. Store moves the top of stack (TOS) to the address below it, and pops both values. It may be convenient to have those instructions able to impose a small stack offset, which if non-zero leaves the address on the stack.
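
The three operations can be sketched as follows, assuming a Python list as the stack and a dict as memory. The function names are hypothetical, and the optional stack offset mentioned above is deliberately not modelled.

```python
def op_add(stack):
    # pop, pop, sum, push
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def op_load(stack, memory):
    # Replace the address on top of the stack with the value stored there.
    addr = stack.pop()
    stack.append(memory[addr])


def op_store(stack, memory):
    # Move TOS to the address just below it, then drop both.
    value = stack.pop()
    addr = stack.pop()
    memory[addr] = value


# memory[200] = memory[100] + 5
memory = {100: 7}
stack = [200, 100]       # destination address, then source address
op_load(stack, memory)   # stack: [200, 7]
stack.append(5)          # stack: [200, 7, 5]
op_add(stack)            # stack: [200, 12]
op_store(stack, memory)  # stack: []
print(memory[200], stack)  # prints 12 []
```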

This system also provides for multi-processing, and context saving is already handled. All the executive has to do is decide which process runs next, and set the appropriate reg. set value to return to.

The whole system is fairly easily simulated. It makes it easy to avoid accessing external memory, whether cached or not. Instruction access can be through an on chip cache.

--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
     USE worldnet address!

- Mohit Sindhwani

Sat, Oct 11, 2003 5:21 AM

You're right, it isn't the full story, of course :-) The 2-cycles apply to context saving/ restoring from on-chip memory. In the case of the TC1775 implementation, there is 32KBytes of on-chip memory that can be used for this purpose.

It takes 64 bytes (16 x 32-bit registers) for a half context. Often, this is enough, though no OS will do that for you directly - it would require manual coding to benefit from this.

If you decide to use, say 25K of the on-chip 32K for contexts, you are looking at storing 400 half-contexts or 200 full contexts. Depending on the needs of your application, this may be sufficient and extremely attractive! But, yes, the numbers will change if it has to go to off-chip memory.
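
A quick check of those numbers, assuming "25K" means 25 x 1024 bytes and that a full context is two half contexts (32 x 32-bit registers):

```python
HALF_CONTEXT_BYTES = 16 * 32 // 8             # 16 registers of 32 bits
FULL_CONTEXT_BYTES = 2 * HALF_CONTEXT_BYTES   # 32 registers of 32 bits
BUDGET_BYTES = 25 * 1024                      # 25K of the 32K on-chip memory

half_contexts = BUDGET_BYTES // HALF_CONTEXT_BYTES
full_contexts = BUDGET_BYTES // FULL_CONTEXT_BYTES

print(HALF_CONTEXT_BYTES, FULL_CONTEXT_BYTES)  # prints 64 128
print(half_contexts, full_contexts)            # prints 400 200
```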

But, the main point I wanted to make was that context save/ restore on the TriCore architecture is hardware-assisted and is potentially very fast - sub 100ns context switches do make the mouth drool a tad bit :-)

Cheers, Mohit.

-- Join the TriCore Users Group Mailing List Details -

formatting link

- Derek M Jones

Fri, Oct 17, 2003 12:54 AM

All,

I am looking for recent figures on microprocessor sales. Detail is not important (ie not interested in break down by geographic region, or market sector), but I would like some degree of authority (ie not the usual quote from 5 years ago about what will be happening in 5 years time).

I want to be able to cite the source in a paper, so it must be publicly available.

Thanks.