40 core embeddified processor

Jeff Fox 17 years ago

It is small. It is masked ROM, not OTP. It does not contain user code but contains boot code, utility code, and functions like fractional math routines.

One expects more of everything than on the smallest core for multi-core around. The simple answer might be: as much as was needed after coding things like async autobaud, sync, and SPI packet boot.

All nodes have synchronized wake/sleep communication ports with neighboring processors. Multiple ports may be addressed at a single address. Ports are executable. Instruction streams from off-chip via async, sync, SPI, I2C, parallel ports, or an external memory interface are thus directly executable without using local memory.
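As a toy illustration of what "ports are executable" means, the sketch below models a node that takes its next instruction from whatever source it is pointed at, here a port, without copying the program into local memory first. The opcode names (`push`, `add`, `dup`) and the functions `execute` and `port_reader` are hypothetical and only for illustration; they are not the chip's instruction set.

```python
from collections import deque

def execute(stream, stack=None):
    """Run (opcode, argument) pairs from any iterable source, one at a time."""
    stack = [] if stack is None else stack
    for op, arg in stream:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "dup":
            stack.append(stack[-1])
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack

def port_reader(port):
    """Yield words as they are taken from a port (modelled here as a deque)."""
    while port:
        yield port.popleft()

# The program lives only in the port; nothing is stored before it is executed.
port = deque([("push", 2), ("push", 3), ("add", None), ("dup", None), ("add", None)])
result = execute(port_reader(port))
```

The same `execute` function would accept a list standing in for local memory, which is the point: in this model the instruction source is interchangeable.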

Quoting prices is not my job. They had been giving away the thumbdrive evaluation systems. Some applications were developed with free tools and the simulator before being run on real hardware. Some things need real hardware because simulation is too slow.

It depends on the application. Some apps can easily be written and debugged without real hardware and simulation provides profiles and views of anything.

Some apps require instrumentation and debugging of modules separately. Use of data flow templates, data flow algebra, and data flow testbeds allows debugging of some processing modules.

Sometimes a process can multitask to process debugging command messages or to provide debugging command messages. It depends on the nature of the process. The tightest realtime code won't tolerate that sort of intrusion.
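A minimal sketch of a process that interleaves its own work with debugging command messages, assuming a simple message queue. The command names `total?` and `reset` and the function `worker` are made up for illustration.

```python
from collections import deque

def worker(samples, debug_commands):
    """Sum samples, answering pending debug commands between work items."""
    total = 0
    replies = []
    for sample in samples:
        total += sample
        # Handle at most one command per work item so the work is never starved.
        if debug_commands:
            command = debug_commands.popleft()
            if command == "total?":
                replies.append(total)
            elif command == "reset":
                total = 0
    return total, replies

total, replies = worker([1, 2, 3, 4], deque(["total?", "reset", "total?"]))
```

Every poll of the queue costs time inside the loop, which is exactly the intrusion that the tightest realtime code cannot afford.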

The stack-based code and limited memory often require that code be rigorously factored into very small pieces, as is the tradition in Forth. The longer functions are, the more time is spent debugging them. There is a threshold of size and complexity below which bugs tend not to happen. When code is factored that way, debugging is minimized.
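To make the factoring idea concrete, here is a small sketch in Python rather than Forth. The functions `clamp`, `scale`, and `to_percent` are hypothetical examples, not routines from the chip's ROM. Each piece is small enough to check on its own before it is combined with the others.

```python
def clamp(value, low, high):
    """Limit value to the range low..high."""
    return max(low, min(high, value))

def scale(value, numerator, denominator):
    """Integer scaling: value * numerator / denominator, rounded down."""
    return value * numerator // denominator

def to_percent(raw, full_scale):
    """Convert a raw reading to 0..100 using the two pieces above."""
    return clamp(scale(raw, 100, full_scale), 0, 100)

# Each piece is tested as soon as it is written.
assert clamp(5, 0, 3) == 3
assert scale(512, 100, 1024) == 50
assert to_percent(256, 1024) == 25
```

Because `clamp` and `scale` are verified first, any failure in `to_percent` can only come from the one line that combines them.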

The tradition is to maximize productivity by minimizing the need for debugging and by using simple methods and by debugging very small pieces of code incrementally. Most programmers find that they don't need ICE or single step environments with this approach. This approach has been one of Forth's advantages in developing software for new hardware over the years.

Best Wishes

Leon 17 years ago

It seems to be intended for similar applications to the XMOS chips. The latter only deliver 1600 MIPS, but they intend to put a lot more cores on each chip, eventually. They are programmed in XC and C, which will make them a lot more popular.

Leon

Jeff Fox 17 years ago

Part of the speed is achieved because all instructions execute at the same time that the current instruction is being decoded. In an asynchronous design like this, for all instructions to run at the same speed they must all run at the speed of the slowest instruction. Since on-core memory is small, pointers to it are small, and addition on them may produce correct results without the need for a leading nop (depending on the previous state of the carry bits, of course).
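The point about small pointers can be illustrated with a toy model of carry propagation. This is a sketch only, not a description of the actual adder circuit: the 18-bit word width is an assumption, the function `longest_carry_chain` is hypothetical, and the model ignores the prior carry state mentioned above. It counts how many consecutive bit positions a carry has to travel through, which is what determines how long an addition takes to settle.

```python
def longest_carry_chain(a, b, width=18):
    """Return the longest run of bit positions a single carry travels through."""
    longest = 0
    run = 0      # positions the current carry has travelled so far
    carry = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:                  # both bits set: a carry is generated here
            carry = 1
            run = 1
        elif (x ^ y) and carry:    # exactly one bit set: an incoming carry moves on
            run += 1
        else:                      # the carry stops (or there was none)
            carry = 0
            run = 0
        longest = max(longest, run)
    return longest

small_pointer = longest_carry_chain(0x3F, 1)      # 6-bit value plus one
full_word = longest_carry_chain(0x3FFFF, 1)       # all 18 bits set plus one
```

In this model a short operand bounds the carry chain by its own length, while a full-width operand can force the carry across the whole word, which is the worst case that would otherwise need extra settling time.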

It would have been possible to eliminate the option to use addition at the same speed as all the stack instructions, but the cost would be that all the stack instructions would have to be slowed down by a factor of two or more to match. On the S40, the cost of adding logic to eliminate the need for the programmer or compiler to occasionally insert a nop before an addition would be the loss of about 15000 MIPS in performance. The reasoning was that requiring an occasional nop was not as great a cost as the loss of 15000 MIPS or more per small cluster chip.

Best Wishes
