Tiny CPUs for Slow Logic

On Tuesday, March 19, 2019 at 1:13:38 AM UTC+1, snipped-for-privacy@gmail.com wrote:

... that don't need to happen at high speed. Simple CPUs can be built into an FPGA using a very small footprint much like the ALU blocks. There are stack based processors that are very small, smaller than even a few kB of memory.

... anyone be interested? Or is a C compiler mandatory even for processors running very small programs?

... years ago on an I/O board for an array processor which had its own assembler. It was very simple and easy to use, but very much not a high level language. This would have a language that was high level, just not C, rather something extensible and simple to use and potentially interactive.

picoblaze is such a small cpu and I would like to program it in something other than its assembler language.

--
svenn
Reply to
Svenn Are Bjerkem

A reasonable question.

A major problem with lots of communicating sequential processors (such as the OP suggests) is how to /think/ about orchestrating them so they compute and communicate to produce a useful result.

Once you have such a conceptual framework, thereafter you can develop tools to help.

Oddly enough, that occurred to C. A. R. (Tony) Hoare back in the 70s, and he produced the CSP (communicating sequential processes) calculus.

In the 80s that was embodied in hardware and software, the transputers and occam respectively. The modern variant is the xCORE processors and xC.
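The CSP idea embodied in occam and xC — independent sequential processes that share nothing and interact only over channels — can be sketched in a few lines. A minimal Python illustration (threads standing in for processes, a queue for the channel; all names are my own invention, not occam or xC syntax):

```python
import threading
import queue

def producer(ch):
    # Sequential process #1: compute squares and send them over the channel.
    for i in range(5):
        ch.put(i * i)
    ch.put(None)  # sentinel: end of stream

def consumer(ch, results):
    # Sequential process #2: receive values and accumulate a running sum.
    total = 0
    while True:
        v = ch.get()
        if v is None:
            break
        total += v
    results.append(total)

ch = queue.Queue()  # the "channel" connecting the two processes
results = []
threads = [threading.Thread(target=producer, args=(ch,)),
           threading.Thread(target=consumer, args=(ch, results))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results[0])  # 0 + 1 + 4 + 9 + 16 = 30
```

The point of the framework is exactly what Hoare provided: the only interaction is the channel, so one can reason about each process in isolation.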

They provide a concrete demonstration of one set of tools and techniques that allow a cloud of processors to do useful work.

That's something the GA144 conspicuously failed to achieve.

The OP appears to have a vague concept of something running through his head, but appears unwilling to understand what has been tried, what has failed, and where the /conceptual/ practical problems lie.

Overall the OP is a bit like the UK Parliament at the moment. Both know what they don't want, but can't articulate/decide what they do want.

The UK Parliament is an unmitigated dysfunctional mess.

I don't think it is difficult to combine those, any more or less than it is difficult to combine current traditional hardware and software.

Through long experience, I'm wary of any single tool that claims to do everything from top to bottom. They always work well for things that fit their constraints, but badly otherwise.

N.B. that includes a single programming style from top to bottom of a software application. I've used top-level FSMs expressed in GC'ed OOP languages that had procedural runtimes. Why? Because the application domain was inherently FSM based, the GC'ed OOP tools were the best way to create distributed high availability systems, and the procedural language was the best way to create the runtime.
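As a minimal sketch of what a top-level FSM in a GC'ed OOP language can look like — the states and events below are invented for illustration, not taken from the system described above:

```python
class Fsm:
    """A minimal table-driven finite state machine."""
    def __init__(self, initial, transitions):
        # transitions maps (state, event) -> next state
        self.state = initial
        self.transitions = transitions

    def handle(self, event):
        # Unknown (state, event) pairs leave the state unchanged.
        self.state = self.transitions.get((self.state, event), self.state)
        return self.state

# Hypothetical connection-handling states, purely for illustration.
fsm = Fsm("idle", {
    ("idle", "connect"): "active",
    ("active", "timeout"): "idle",
    ("active", "close"): "closed",
})
fsm.handle("connect")
fsm.handle("close")
print(fsm.state)  # closed
```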

I have comparable examples involving hardware all the way from low-noise analogue electronics upwards.

Moral: choose the right conceptual framework for each part of the problem.

It is up to the OP to give us a clue as to example problems and solutions, and why his concepts are significantly better than existing techniques.

Yup.

The "granularity" of the computation and communication will be a key to understanding what the OP is thinking.

Reply to
Tom Gardner


Which had surprisingly small influence on how the majority (not majority in the sense of 70%, majority in the sense of 99.7%) of the industry solve their problems.

The same as above.

Do you prefer dysfunctional mesh ;)


I don't know what Rick had in mind. I personally would go for one "hard-CPU" block per 4000-5000 6-input logic elements (i.e. Altera ALMs or Xilinx CLBs). Each block could be configured either as one 64-bit core or as a pair of 32-bit cores. The block would contain hard instruction decoders/ALUs/shifters and hard register files. It can optionally borrow adjacent DSP blocks for multipliers. Adjacent embedded memory blocks can be used for data memory. Code memory should be a bit more flexible, giving the designer a choice between embedded memory blocks or distributed memory (X)/MLABs (A).

Reply to
already5chosen

That's principally because Moore's "law" enabled people to avoid confronting the issues. Now that Moore's "law" has run out of steam, the future becomes more interesting.

Note that TI included some of the concepts in its DSP processors.

Golang has included some of the concepts.

Many libraries included some of the concepts.

:) I'll settle for anything that /works/ predictably :(

It would be interesting to find an application level description (i.e. language constructs) that:
- could be automatically mapped onto those primitives by a toolset
- was useful for more than a niche subset of applications
- was significantly better than existing tools

I wouldn't hold my breath :)

Reply to
Tom Gardner


When people talk about things like "software running on such heterogeneous cores" it makes me think they don't really understand how this could be used. If you treat these small cores like logic elements, you don't have such lofty descriptions of "system software", since the software isn't created out of some global software package. Each core is designed to do a specific job, just like any other piece of hardware, and it has discrete inputs and outputs just like any other piece of hardware. If the hardware clock is not too fast, the software can synchronize with and literally function like hardware, but implementing more complex logic than the same area of FPGA fabric might.

There is no need to think about how the CPUs would communicate unless there is a specific need for them to do so. The F18A uses a handshaked parallel port in their design. They seem to have done a pretty slick job of it and can actually hang the processor waiting for the acknowledgement, saving power and getting an instantaneous wake up following the handshake. This can be used with other CPUs or
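The handshaked port described here can be modelled in software: the receiver blocks (doing no work, like the F18A's halted processor) until the request arrives, and the sender blocks until it is acknowledged. A hedged Python sketch; the req/ack/mailbox names are my own, not F18A terminology:

```python
import threading

req = threading.Event()  # "request" wire: sender -> receiver
ack = threading.Event()  # "acknowledge" wire: receiver -> sender
mailbox = []             # the one-word parallel port between the two cores

def send(word):
    mailbox.append(word)
    req.set()   # raise request: data is valid on the port
    ack.wait()  # block (consuming no cycles) until the receiver acknowledges

def receive(out):
    req.wait()  # block until a request arrives, then wake instantly
    out.append(mailbox.pop())
    ack.set()   # acknowledge, waking the sender

out = []
receiver = threading.Thread(target=receive, args=(out,))
receiver.start()
send(0x42)
receiver.join()
print(hex(out[0]))  # 0x42
```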

Not sure what is hard to think about. It's a CPU, a small CPU with limited memory to implement small tasks, that can do rather complex operations compared to a state machine really, and includes memory, arithmetic and logic as well as I/O, without having to write a single line of HDL. Only the actual app needs to be written.

Yes, clearly not getting the concept. Asking about heterogeneous performance is totally antithetical to this idea.

Rick C.

Reply to
gnuarm.deletethisbit


Yes, it is small. How large is the program you are interested in?

Rick C.

Reply to
gnuarm.deletethisbit

That is software.

If you want to try to get cycle-precise control of the software and use that precision for direct hardware interfacing, you are almost certainly going to have a poor, inefficient and difficult design. It doesn't matter if you say "think of it like logic" - it is /not/ logic, it is software, and you don't use that for cycle-precise control. You use it when you need flexibility, calculations, and decisions.

Fair enough.

Reply to
David Brown

On 19.03.19 at 16:24, Svenn Are Bjerkem wrote:

It would be possible to write a C compiler for it (with some restrictions, such as functions being non-reentrant). The architecture doesn't seem any worse than PIC. And there are / were pic14 and pic16 backends in SDCC.

Philipp

Reply to
Philipp Klaus Krause


I don't know about Svenn Are Bjerkem, but can tell you about myself. Last time when I considered something like that and wrote enough of the program to make measurements, the program contained ~250 Nios2 instructions. I'd guess, on a minimalistic stack machine it would take 350-400 instructions. At the end, I didn't do it in software. Coding the same functionality in HDL turned out to be not hard, which probably suggests that my case was smaller than average.

Another extreme, where I did end up using a "small" soft core, was much more like "real" software: 2300 Nios2 instructions.

Reply to
already5chosen

The UK political system is completely off-topic in comp.arch.fpga. However, I'd say that IMHO right now your parliament is facing an unusually difficult problem on one hand, but at the same time it's not really a "life or death" sort of problem. Having troubles and appearing non-decisive in such a situation is normal. It does not mean that the system is broken.

Reply to
already5chosen

The point is that we need to understand what the whole system is doing. In the XMOS case, we can look at a piece of software with N threads, running across the cores provided on the chip. One piece of software, distributed over the hardware resource available - the system is doing one thing.

Your bottom-up approach means it's difficult to see the big picture of what's going on. That means it's hard to understand the whole system, and to program from a whole-system perspective.

Here are the semantic descriptions of basic logic elements:

LUT: q = f(x,y,z)
FF: q
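To make the contrast concrete: the LUT semantics above (q = f(x,y,z)) can be exhaustively enumerated in eight cases, which is precisely what a processor's behaviour does not allow. A Python sketch of a 3-input LUT and a D flip-flop (my own models, with a majority-gate truth table chosen as an example):

```python
def lut3(truth_table, x, y, z):
    # A 3-input LUT is an 8-entry truth table indexed by its inputs.
    return truth_table[(x << 2) | (y << 1) | z]

class DFlipFlop:
    """q takes the value of d on each clock edge."""
    def __init__(self):
        self.q = 0
    def clock(self, d):
        self.q = d
        return self.q

# A LUT programmed as a majority gate: output 1 when >= 2 inputs are high.
majority = [0, 0, 0, 1, 0, 1, 1, 1]
cases = [(x, y, z, lut3(majority, x, y, z))
         for x in (0, 1) for y in (0, 1) for z in (0, 1)]
print(len(cases))  # 8 cases enumerate the element's entire behaviour
```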

Reply to
Theo


I think you are looking at it from the wrong angle. One doesn't really need new tools to design and simulate such things. What's needed is a combination of existing tools - compilers, assemblers, probably software simulator plug-ins into existing HDL simulators, but the latter is just a luxury for speeding up simulations; in principle, feeding the HDL simulator with an RTL model of the CPU core will work too.

As to niches, all "hard" blocks that we currently have in FPGAs are about niches. It's extremely rare that a user's design uses all or a majority of the features of a given FPGA device and needs LUTs, embedded memories, PLLs, multipliers, SERDESes, DDR DRAM I/O blocks etc. in exactly the amounts appearing in the device. It still makes sense, economically, to have them all built in, because masks and other NREs are mighty expensive while silicon itself is relatively cheap. Multiple small hard CPU cores are really not very different from the features mentioned above.

Reply to
already5chosen

Firstly, you chose to snip the analogy, thus removing the context.

Secondly, actually currently there are /very/ plausible reasons to believe it might be life or death for my 98yo mother, and may hasten my death. No, I'm not going to elaborate on a public forum.

I will note that Operation Yellowhammer will, barring miracles, be started on Monday, and that a prominent *brexiteer* (Michael Gove) is shit scared of a no-deal exit because all the chemicals required to purify our drinking water come from Europe.

Reply to
Tom Gardner

A lot of these 'niches' have been proven in soft-logic.

Implement your system in soft-logic, discover that there's lots of multiply-adds and they're slow and take up area. A DSP block is thus an 'accelerator' (or 'most compact representation') of the same concept in soft-logic.

The same goes for BRAMs (can be implemented via registers but too much area), adders (slow when implemented with generic LUTs), etc.

Other features (SERDES, PLLs, DDR, etc) can't be done at all without hard-logic support. If you want those features, you need the hard logic, simple as that.

Through analysis of existing designs we can have a provable win of the hard over soft logic, to make it worthwhile putting it on the silicon and integrating into the tools. In some of these cases, I'd guess the win over the soft-logic is 10x or more saving in area.

Rick's idea can be done today in soft-logic. So someone could build a proof of concept and measure the cases where it improves things over the baseline. If that case is compelling, let's put it in the hard logic.

But thus far we haven't seen a clear case for why someone should build a proof of concept. I'm not saying it doesn't exist, but we need a clear elucidation of the problem that it might solve.

Theo

Reply to
Theo Markettos

That would be one perfectly acceptable embodiment of a toolset that I mentioned.

But more difficult than creating such a toolset is defining an application level description that a toolset can munge.

So, define (initially by example, later more formally) inputs to the toolset and outputs from it. Then we can judge whether the concepts are more than handwaving wishes.

All the blocks you mention have a simple API and easily enumerated set of behaviour.

The whole point of processors is that they enable much more complex behaviour that is practically impossible to enumerate.

Alternatively, if it is possible to enumerate the behaviour of a processor, then it would be easy and more efficient to implement the behaviour in conventional logic blocks.

Reply to
Tom Gardner

I don't understand what you are asking for.

If I had such thing, I'd use it in exactly the same way that I use soft cor es (Nios2) today. I will just use them more frequently, because today it co sts me logic resources (often acceptable, but not always) and synthesis and fitter time (and that what I really hate). On the other hand, "hard" core would be almost free in both aspects. It would be as expensive as "soft" or even costlier, in HDL simulations, bu t until now I managed to avoid "full system" simulations that cover everyth ing including CPU core and the program that runs on it. Or may be, I did it once or twice years ago and already don't remember. Anyway, for me it's no t an important concern and I consider myself rather heavy user of soft core s.

Also, theoretically, if performance of the hard core is non-trivially higher than that of soft cores, either due to higher IPC (I didn't measure, but would guess that for the majority of tasks Nios2-f IPC is 20-30% lower than ARM Cortex-M4) or due to higher clock rate, then it will open up even more niches. However, I'd expect that the performance factor would be less important for me, personally, than the other factors mentioned above.

Reply to
already5chosen

Go back and read the parts of my post that you chose to snip.

Give a handwaving indication of the concepts that avoid the conceptual problems that I mentioned.

Or better still, get the OP to do it.

Reply to
Tom Gardner

I suppose you can make anything difficult if you try hard enough.

The point is you don't have to make it difficult by talking about "software running on such heterogeneous cores". Just talk about it being a small hunk of software that is doing a specific job. Then the mystery is gone and the task can be made as easy as the task is.

In VHDL this would be a process(). VHDL programs are typically chock full of processes and no one wrings their hands worrying about how they will design the "software running on such heterogeneous cores".
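As a sketch of the analogy: a VHDL process with one input and one output signal maps naturally onto a small, self-contained piece of software. A hedged Python illustration of such a "process" - here a rising-edge counter, a job picked purely for illustration:

```python
import threading
import queue

def edge_counter(d_in, count_out, n_samples):
    # One small, self-contained job with discrete inputs and outputs,
    # analogous to a VHDL process: count rising edges on the input stream.
    prev = 0
    count = 0
    for _ in range(n_samples):
        sample = d_in.get()
        if sample and not prev:
            count += 1
        prev = sample
    count_out.put(count)

d_in, count_out = queue.Queue(), queue.Queue()
core = threading.Thread(target=edge_counter, args=(d_in, count_out, 8))
core.start()
for bit in [0, 1, 1, 0, 1, 0, 0, 1]:  # the "waveform" driven into the block
    d_in.put(bit)
core.join()
result = count_out.get()
print(result)  # 3 rising edges
```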

BTW, VHDL is software too.

Ok, that's a start.

Rick C.

Reply to
gnuarm.deletethisbit

Frankly, it starts to sound like you have never used soft CPU cores in your designs. So, for somebody like myself, who has used them routinely for different tasks since 2006, you are really not easy to understand. Concept? Concepts are good for new things, not for something that is a variation of something old, routine and obviously working.

With that part I agree.

Reply to
already5chosen

On Wednesday, March 20, 2019 at 6:29:50 AM UTC-4, snipped-for-privacy@yahoo.com wrote:


What sorts of applications were these?

Rick C.

Reply to
gnuarm.deletethisbit
