Tiny CPUs for Slow Logic

Most of us have implemented small processors for logic operations that don't need to happen at high speed. Simple CPUs can be built into an FPGA using a very small footprint, much like the ALU blocks. There are stack-based processors that are very small, smaller than even a few kB of memory.

If they were easily programmable in something other than C would anyone be interested? Or is a C compiler mandatory even for processors running very small programs?

I am picturing this not terribly unlike the sequencer I used many years ago on an I/O board for an array processor which had its own assembler. It was very simple and easy to use, but very much not a high level language. This would have a language that was high level, just not C; rather, something extensible, simple to use and potentially interactive.

Rick C.

Reply to

If it is going to appeal to software developers, you need C. And it has to be reasonable, standard C, even if it is for small devices - programmers are fed up with the pains needed for special device-specific C on 8051, AVR, PIC, etc. That does not necessarily mean it has to be fast, but it should work with standard language. Having 16-bit size rather than 8-bit size makes a huge difference to how programmers feel about the device - aim for something like the msp430.

You might, however, want to look at extensions for CSP-style communication between cpus - something like XMOS XC.

If it is to appeal to hardware (FPGA) developers, C might not be as essential. Some other kind of high level language, perhaps centred around state machines, might work.

But when I see "extensible, simple to use and potentially interactive", I fear someone is thinking of Forth. People who are very used to Forth find it a great language - but you need to understand that /nobody/ wants to learn it. Most programmers would rather work in assembler than Forth. You can argue that this attitude is irrational, and that Forth is not harder than other languages - you might be right. But that doesn't change matters.

Reply to
David Brown

Certainly this would be like Forth, but the reality is I'm thinking of a Forth-like CPU because they can be designed so simply.

The F18A stack processor designed by Charles Moore is used in the GA144 chip. There are 144 of them with unusual interconnections that allow the CPU to halt waiting for communications, saving power. The CPU is so small that it could be included in an FPGA as what would be equivalent to a logic element.

In the same way that the other functional logic elements like the block RAMs and DSP blocks are used for custom functionality which requires the designer to program by whatever means is devised, these tiny CPUs would not need a high level language like C. The code in them would be small enough to be considered "logic" and developed at the assembly level.

People have mindsets about things and I believe this is one of them. The GA144 is not so easy to program because people want to use it for the sort of large programs they write for other fast CPUs. In an FPGA a very fast processor can be part of the logic rather than an uber-controller riding herd over the whole chip. But this would require designers to change their thinking of how to use CPUs. The F18A runs at 700 MIPS peak rate in a 180 nm process. Instead of one or two in the FPGA like the ARMs in other FPGAs, there would be hundreds, each one running at some GHz.

Rick C.

Reply to

The problem this boils down to is programmability.

If you have a small core, you can therefore have lots of them. But writing software for and managing dozens or hundreds of cores is troublesome. At this level, you have enough headache with the inter-core communication that you'd rather not throw a strange assembler-only core architecture into the mix. A core like this would need a simple inter-core programming model so it's easy to reason about system behaviour (example: systolic arrays)

There's a certain merit in having a CPU as a building block, like a LAB, BRAM or DSP block. I'm not familiar with the literature in this space, but it's the sort of thing that turns up at the 'FPGA' conference regularly (keyword: CGRA). That merely punts the issue to now being a tools problem - the tools know how to make use of a DSP block, but how to make use of a CPU block? How to turn HDL into 'software'? Can you chain the blocks together to make wider logic?

I suppose there's also a niche at the ultra-cheap end of the spectrum - for $1 gadgets with an 8051 because a 16 bit CPU would be too expensive (and a Cortex M0 would have licence fees). But if this is an ASIC then I don't think there's a whole lot more to pay to get a C-compatible processor (even in 8 bits). And it's unclear how much speed penalty you'd pay for that.

How much code/data memory would you expect to have? Would that dwarf the size of your core?

Finally I can see the use as a 'state machine implementation engine' for say a CPLD. But for that you need tools (taking HDL or state-transition diagrams) to allow the programmer to describe their state machine. And your competition is the regular HDL synthesiser which will just make it out of flip flops. I'm unclear how often you'd win in these circumstances.

And I can't really see 'interactive' as a feature - either you have only one core, in which case you could equally hook up JTAG (or equivalent) to something larger for interactive debugging, or you have many cores, in which case I can't see how you'd interact sensibly with dozens at once.


Reply to
Theo Markettos

Who cares about yet another processor programmed in the same old language. It would not have a *U*SP. In fact it would be "back to the 80s" :)

However, if you want to make it interesting enough to pass the elevator test, ensure it can do things that existing systems find difficult.

You should have a look at how the XMOS hardware and software complement each other, so that the combination allows hard real time operation programming in multicore systems. (Hard means guaranteed-by-design latencies between successive i/o activities)

Reply to
Tom Gardner

I appreciate that.

I can only tell you how /I/ would feel here, and let you use that for what you think it is worth. I don't claim to speak for all software developers, but unless other people are giving you feedback too, then this is the best you've got :-) Remember, I am not trying to argue about the pros and cons of different designs or languages, or challenge you to persuade me of anything - I'm just showing you how software developers might react to your design ideas.

Yes, but look how popular the chip is - it is barely a blip in the landscape. There is no doubt that this is a technologically fascinating device. However, it is very difficult to program such chips - almost no one is experienced with such multi-cpu arrangements, and the design requires a completely different way of thinking from existing software design. Add to that a language that works backwards, and a syntax that looks like the cat walked across the keyboard, and you have something that has programmers running away.

My experience with Forth is small and outdated, but not non-existent. I've worked with dozens of programming languages over the years - I've studied CSP, programmed in Occam, functional programming languages, lots of assemblies, a small amount of CPLD/FPGA work in various languages, and many other kinds of coding. (Most of my work for the past years has been C, C++ and Python.) I'm not afraid of learning new things. But when I looked at some of the examples for the GA144, three things struck me. One is that it was amazing how much they got on the device. Another is to wonder about the limitations you get from this sort of architecture. (That is a big turn-off with the XMOS. It's fantastically easy to make nice software-based peripherals using hardware threads. And fantastically easy to run out of hardware threads before you've made the basic peripherals you get in a $0.50 microcontroller.) And the third thing that comes across is how totally and utterly incomprehensible the software design and the programming examples are. The GA144 is squarely in the category of technology that is cool, impressive, and useless in the real world where developers have to do a job, not play with toys.

Sure, it would be possible to learn this. But there is no way I could justify the investment in time and effort that would entail.

And there is no way I would want to go to a language with less safety, poorer typing, weaker tools, harder testing, more limited static checking than the development tools I can use now with C and C++.

The modern way to use the DSP blocks on FPGAs is either with ready-made logic blocks, code generator tools like Matlab, or C to hardware converters. They are not configured manually at a low level. Even when they are generated directly from VHDL or Verilog, the developer writes "x = y * z + w" with the required number of bits in each element, and the tools turn that into whatever DSP blocks are needed.
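To make "the required number of bits in each element" concrete, here is a minimal sketch in Python of the width bookkeeping for a multiply-add. It assumes unsigned operands and a full-precision result; the function name is made up for illustration and is not part of any FPGA tool.

```python
def mac_result_width(y_bits: int, z_bits: int, w_bits: int) -> int:
    """Bits sufficient to hold y * z + w without overflow (unsigned operands assumed)."""
    product_bits = y_bits + z_bits           # a full-precision product needs the sum of the widths
    return max(product_bits, w_bits) + 1     # one extra bit absorbs the carry out of the adder


def fits(y_bits: int, z_bits: int, w_bits: int) -> bool:
    """Check the claim against the largest possible operand values."""
    largest = (2**y_bits - 1) * (2**z_bits - 1) + (2**w_bits - 1)
    return largest < 2 ** mac_result_width(y_bits, z_bits, w_bits)
```

For two 18-bit operands and a 36-bit addend this gives 37 bits, which is the sort of sizing the synthesis tools do on the designer's behalf when mapping the expression onto DSP blocks.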

The key thing you have to think about here, is who would use these tiny cpus, and why. Is there a reason for using a few of them scattered around the device, programmed in assembly (or worse, Forth) ? Why would the developer want to do that instead of just adding another software thread to the embedded ARM processor, where development is so much more familiar? Why would the hardware designer want them, instead of writing a little state machine in the language of their choice (VHDL, Verilog, System C, MyHDL, C-to-HDL compiler, whatever)?

I am missing the compelling use-cases here. Yes, it is possible to make small and simple cpu units with a stack machine architecture, and fit lots of them in an FPGA. But I don't see /why/ I would want them - certainly not why they are better than alternatives, and worth the learning curve.

Exactly. And you have a choice here - work with people with the mindsets they have, or give /seriously/ compelling reasons why they should invest in the time and effort needed to change those mindsets. Wishful thinking is not the answer.

It is not easy to program because it is not easy to program. Multi-threaded or multi-process software is harder than single-threaded code.

The tools and language here for the GA144 - based on Forth - are two generations behind the times. They are totally unfamiliar to almost any current software developer.

And yes, there is the question of what kind of software you would want to write. People either want to write small, dedicated software - in which case they want a language that is familiar and they want to keep the code simple. Or they want bigger projects, reusing existing code - in which case they /need/ a language that is standard.

Look at the GA144 site. Apart from the immediate fact that it is pretty much a dead site, and clearly a company that has failed to take off, look at the examples. A 10 Mb software Ethernet MAC ? Who wants /that/ in software? A PS/2 keyboard controller? An MD5 hash generator running in 16 cpus? You can download a 100-line md5 function for C and run it on any processor.

It has long been established that lots of tiny processors running really fast are far less use than a few big processors running really fast.

700 MIPS sounds marvellous, until you realise how simple and limited each of these instructions is.

At each step here, you have been entirely right about what can be done. Yes, you can make small and simple processors - so small and simple that you can have lots of them at high clock speeds.

And you have been right that using these would need a change in mindset, programming language, and development practice to use them.

But nowhere do I see any good reason /why/. No good use-cases. If you want to turn the software and FPGA development world on its head, you need an extraordinarily good case for it.

Reply to
David Brown

Yup. The hardware is easy. Programming is painful, but there are known techniques to control it...

There's an existing commercially successful set of products in this domain. You get 32-core 4000MIPS processors, and the IDE guarantees the hard real-time performance.

Programming uses techniques created in the 70s, first implemented in the 80s, and which continually reappear, e.g. TI's DSP engines, Rust, Go etc.

Understand XMOS's xCORE processors and xC language, see how they complement and support each other. I found the net result stunningly easy to get working first time, without having to continually read obscure errata!

Reply to
Tom Gardner






So how do they design with the many other functional elements in an FPGA? Is it really that hard to program the various logic functions in an FPGA because of the difficulty in defining their communications?



Wow! Makes you wonder how FPGAs ever get designed at all.

"Inter-core programming model", not sure what that means.

I think you are overthinking this, much as people do when using large CPUs, not to say they are overthinking for those designs. The whole point is that these CPUs would be used like logic blocks, not like CPUs. Small, limited memory with programs written to be easy to debug and/or designed to simply work by being simple.

I'm not sure how software people think really. I worked with one guy to try to solve a problem and they were using a subroutine to do a memory access. I suppose this was because it needed to be this specific code to get the access to work the way it needed to. But then the guy kept looking at those five lines of code for some serious time. It was pretty clear to me what it was doing. Or he could have run the code or simulated it and looked to see what it did. Actually that would have been a valid use of JTAG, to single step through those five lines a few times. But he just kept reading those five lines. Wow!





I don't see the difficulty. I'm not so familiar with Verilog, but in VHDL you have sequential code. It wouldn't be hard to program a CPU using VHDL I think. If nothing else, there should be a way to code in assembler and embed the code similarly to what is done for ROM-like functions in HDL.



I believe we have officially reached the point where $1 processors are 32 bit ARMs and you have to get below $0.50 before you consider needing 8 bit processors. Not sure what this has to do with adding CPU functional elements to FPGAs.


You are thinking of something totally different from using a CPU as logic. Processors like ARMs are too large to have hundreds in an FPGA (unless it is a really large chip). Their architectural capabilities are much more than what is required for this. I suppose a small 8 bit CPU could be used, but why use such a tiny data path with such limited capability? The architectural simplicity of a stack machine allows it to be designed to run very fast. With speed comes a certain flexibility to keep up with the discrete logic.

Small, very small. Maybe 256 words of RAM. Instructions on the F18A are only 5 bits and so pack four per 18-bit word. The last instruction is only 3 bits wide, expressing a subset of the 32 instructions otherwise coded for. Round the word width up to 20 bits or even 32.
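The packing described above (three 5-bit slots plus one 3-bit slot in an 18-bit word) can be sketched in a few lines of Python. This is only an illustration: the choice of which eight opcodes fit the short slot is an assumption here (those whose two low bits are zero, so only the top three bits are stored), and the function names are hypothetical. Any further details of the real F18A encoding are not modelled.

```python
def pack_word(ops):
    """Pack four 5-bit opcodes into one 18-bit word as 5 + 5 + 5 + 3 bits."""
    if len(ops) != 4:
        raise ValueError("exactly four opcodes per word")
    if any(not 0 <= op < 32 for op in ops):
        raise ValueError("opcodes are 5 bits wide")
    if ops[3] & 0b11:
        # Assumption: the short slot stores only the top 3 bits of the opcode.
        raise ValueError("last slot only holds opcodes that are multiples of 4")
    return (ops[0] << 13) | (ops[1] << 8) | (ops[2] << 3) | (ops[3] >> 2)


def unpack_word(word):
    """Recover the four opcodes from an 18-bit word packed by pack_word."""
    return [
        (word >> 13) & 0x1F,
        (word >> 8) & 0x1F,
        (word >> 3) & 0x1F,
        (word & 0x7) << 2,   # restore the two implied zero bits
    ]
```

Rounding the word up to 20 bits, as suggested above, would let all four slots be a full 5 bits and remove the special case for the last slot.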

I'm not sure what happens if this actual processor is shrunk from 180 nm to something like 20 nm. It was highly optimized for the 180 nm process it is built in and it may require some tweaks to work well at smaller processes. The F18A has no external clock and different instructions time differently, with basic logic instructions running very fast and memory accesses taking more time. You can think of it as an async processor.



If you have logic that is well implemented sequentially (at a very high speed, likely multiple GIPS) it will save a lot of room in the FPGA, just as multipliers and other function blocks do. Hard cores are much more efficient, and sequential code is most efficient in a CPU-type design which leverages the size advantage of memory over logic.



If you have to use JTAG to debug something like this you are pretty much doomed. I haven't used JTAG for anything other than programming FPGAs in decades.

In general FPGAs are 99.9% debugged in simulation. The odd 0.1% requires pretty special thinking anyway and I don't find JTAG to be very useful. My best debugging tool is wetware.

The point of interactivity is to allow the code to be tested one definition at a time. But then that is a Forth concept and I'm pretty sure not a familiar concept with most people.

Rick C.

Reply to

On Tuesday, March 19, 2019 at 2:13:38 AM UTC+2, snipped-for-privacy@gmail.com wrote:


It is clear that you have Forth in mind. It is less clear why you don't say it straight.

Reply to




Sorry, I don't get what any of this means.

Yeah, I think the XMOS model is way more complex than what I am describing. The XMOS processors are actually very complex and use lots of gates. They also don't run all that fast. Their claim to fame is to be able to communicate through shared memory as if the other CPUs were not there, in a good way. Otherwise they are conventional processors, programmed in conventional ways.

The emphasis here is for the CPU to be nearly invisible as a CPU and much more like a function block. You just have to "configure" the operation by writing a bit of code. That's why 'C' is not desirable; it would be too cumbersome for small code blocks.

Rick C.

Reply to

That is 'C' world, conventional thinking. If you can write a hello world program without using a JTAG debugger, you should be able to write and debug most programs for this core in the simulator with 100% correctness. We aren't talking about TCP/IP stacks.

And they are designed to provide MIPS, not logic functions.

I don't want to go too far into the GA144 since this is not what I'm talking about inserting into an FPGA, but only as an analogy. One of the criticisms of that device is how hard it is to get all 144 processors cranking at full MIPS. But the chip is not intended to utilize "the full MIPS" possible. It is intended to be like an FPGA where you have CPUs available to do what you want without regard to squeezing out every possible MIPS. No small number of these processors will do nothing other than passing data and control to their neighbors while mostly idling, because that is the way they are wired together.

The above mentioned 4000 MIPS processor is clearly intended to utilize every last MIPS. Not at all the same and it will be programmed very differently.

But not at all relevant here since their focus is vastly different from providing logic functions efficiently.

Rick C.

Reply to



Because this is not about Forth. It is about very small processors. I would not really bother with Forth as the programming language specifically because that would be a layer on top of what you are doing and to be efficient it would need to be programmed in assembly.

That said, the assembly language for a stack processor is much like Forth since Forth uses a virtual stack machine as its programming model. So yes, it would be similar to Forth. I most likely would use Forth to write programs for these, but that is just my preference since that is the language I program in.

But the key here is to program the CPUs in their stack oriented assembly. That's not really Forth even if it is "Forth like".
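For readers who have not met a stack machine, a minimal sketch in Python of what "stack oriented assembly" looks like may help. The instruction names below are the usual Forth-style ones, chosen for illustration; they are not the instruction set of the F18A or of any particular core.

```python
def run(program, stack=None):
    """Execute a list of tokens on a data stack and return the final stack."""
    stack = list(stack or [])
    for token in program:
        if isinstance(token, int):
            stack.append(token)              # literals are pushed
        elif token == "dup":
            stack.append(stack[-1])          # duplicate top of stack
        elif token == "drop":
            stack.pop()                      # discard top of stack
        elif token == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif token == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown instruction: {token!r}")
    return stack


# 3 squared plus 4: operands come first, operators consume them.
print(run([3, "dup", "*", 4, "+"]))  # prints [13]
```

Because no instruction names a register, each opcode needs no operand fields, which is part of why such cores can be so small.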

Is that what you wanted to know?

Rick C.

Reply to

Individually not especially fast, aggregate fast.

Not just shared memory, *far* more interesting than that.

Up to 8 cores in a "tile" share memory. Comms between tiles is via an interconnection network. Comms with i/o is via the same interconnection network.

At the program level there is *no* difference between comms via shared memory and comms via interconnection network. Nor is there any difference between comms with i/o and comms with other cores.

All comms is via channels. That's one thing that makes the hardware+software environment unique.
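As a rough illustration of the channel style (this is not xC, and it says nothing about how XMOS implements it), here is a sketch in Python where threads stand in for cores and a bounded `queue.Queue` stands in for a channel. The function names and the `None` end-of-stream marker are assumptions made for the example.

```python
import queue
import threading


def producer(out_ch, values):
    """Send each value down the channel, then an end-of-stream marker."""
    for v in values:
        out_ch.put(v)
    out_ch.put(None)


def doubler(in_ch, out_ch):
    """Receive from one channel, transform, and forward on another."""
    while (v := in_ch.get()) is not None:
        out_ch.put(2 * v)
    out_ch.put(None)


def run_pipeline(values):
    # maxsize=1 makes senders block until the receiver has taken the previous item.
    a, b = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
    workers = [
        threading.Thread(target=producer, args=(a, values)),
        threading.Thread(target=doubler, args=(a, b)),
    ]
    for t in workers:
        t.start()
    results = []
    while (v := b.get()) is not None:
        results.append(v)
    for t in workers:
        t.join()
    return results
```

Neither task knows whether its peer is another thread, another process or a peripheral; it only sees the channel, which is the property being described above.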

No. You are missing the key differentiating points...

Conventional processors and programming treats multicore programming as an advanced add on library - explicitly so in the case of C. And a right old mess that is.

xC+xCORE *start* by presuming multicore systems, and use a set of harmonious concepts to make multicore programming relatively easy and predictable.

Why bother? What would be the *benefit*?

Yes, you can use a screw instead of a nail, but that doesn't mean there is a benefit. Unless, of course, you can't use a hammer.

Reply to
Tom Gardner

That alone is a misunderstanding of what I am suggesting. I see no reason to involve "programmers". I don't think any FPGA designer would have any trouble using these processors and "programmers" are not required. Heck, the last company I worked for designed FPGAs in the software department, so everyone writing HDL for FPGAs was a "programmer", so maybe the distinction is less than I realize.

That's not the issue, I'm not proposing anyone use a GA144.

Again, that's not what I am proposing. They have hundreds of multipliers and DSP blocks in FPGAs with no one worrying about how they will tie together. These CPUs would be similar.

Now you are interjecting your own thoughts. I never suggested that cats be used to program these CPUs.

Too bad this isn't about Forth.

There are many areas where a "little" knowledge is a dangerous thing. I think programming languages and especially FPGA design are among those areas.

I see why you started your comments with the big caveat. You seem to have a bone to pick with Forth and the GA144, neither of which are what I am talking about. You've gotten ahead of yourself.

Yes, well, good thing you would never be the person who wrote any code for this. No "programmers" allowed, only FPGA designers... and no amateurs allowed either. ;)


I guess I'm not modern then. I use VHDL and like it... Yes, I actually said I like VHDL. The HDL so many love to hate.

I see no reason why these devices couldn't be programmed using VHDL, but it would be harder to debug. But then I expect you are the JTAG sort as well. That's not really what I'm proposing and I think you are overstating the case for "press the magic button" FPGA design.

Because an ARM can't keep up with the logic. An ARM is very hard to interface usefully as a *part* of the logic. That's the entire point of the F18A CPUs. Each one is small enough to be dedicated to the task at hand (like in the XMOS) while running at a very high speed, enough to keep up with 100 MHz logic.
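A quick back-of-envelope check of what "keeping up" means, using the two figures quoted in this thread (700 MIPS peak for the F18A at 180 nm, and 100 MHz logic). The helper below is a hypothetical illustration in Python; peak rate is an upper bound, since memory accesses take longer than basic instructions.

```python
def instructions_per_logic_clock(cpu_mips: float, logic_mhz: float) -> float:
    """Peak number of CPU instructions available per cycle of the surrounding logic."""
    return cpu_mips / logic_mhz


def utilisation(instr_per_event: int, events_per_second: float, cpu_mips: float) -> float:
    """Fraction of the CPU's peak instruction rate consumed by a task."""
    return instr_per_event * events_per_second / (cpu_mips * 1e6)


print(instructions_per_logic_clock(700, 100))  # 7.0 instructions per 100 MHz clock
```

So a core at that peak rate has at most about seven simple instructions to spend per logic clock, and correspondingly more when it only has to respond every Nth clock.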

That depends on what the state machine is doing. State machines are all ad-hoc and produce their own little microcosm needing support. You talk about the issues of programming CPUs. State machines are like designing your own CPU but without any arithmetic. Add arithmetic, data movements, etc. and you have now officially designed your own CPU when you could have just used an existing CPU.

That's fine, if it is what you intended. Many FPGA users add their own soft core CPU to an FPGA. Having these cores would make that unnecessary.

The question is why would an FPGA designer want to roll their own FSM when they can use the one in the CPU?

Yes, but you aren't really an FPGA designer, no? I can see your concerns as a Python programmer.

You are a programmer, not an FPGA designer. I won't try to convince you of the value of many small CPUs in an FPGA.

I can see that you don't understand the GA144. If you are working on a design that suits the GA144 (not that there are tons of those) it's not a bad device. If I were working on a hearing aid app, I would give serious consideration to this chip. It is well suited to many types of signal processing. I once did a first pass of an oscilloscope design for it (strictly low bandwidth). There are a number of apps that suit the GA144, but otherwise, yes, it would be a bear to adapt to other apps.

But this is not about the GA144. My point was to illustrate that you don't need to be locked into the mindset of utilizing every last instruction cycle. Rather these CPUs have cycles to spare, so feel free to waste them. That's what FPGAs are all about, wasting resources. FPGAs have some small percentage of the die used for logic and most of the rest used for routing, most of which is not used. Much of the logic is also not used. Waste, waste, waste! So a little CPU that is only used at 1% of its MIPS capacity is not wasteful if it saves a bunch of logic elsewhere in the FPGA.

That's the point of discussing the GA144.

And they are not relevant to this discussion.

Who is "they" again? I'm not picturing this being programmed by the programming department. To do so would mean two people would need to do a job for one person.

Wow! You are really fixated on the GA144.

Again, you are pursuing a MIPS argument. It's not about using all the MIPS. The MIPS are there to allow the CPU to do its job in a short time to keep up with logic. All the MIPS don't need to be used.

"A few big processors" would suck in being embedded in the logic. They just can't switch around fast enough. You must be thinking of many SLOW processors compared to one fast processor. Or maybe you are thinking of doing work which is suited for a single processor like in a PC.

Yeah, you can use one of the ARMs in the Zynq to run Linux and then use the other to interface to "real time" hardware. But this is a far cry from what I am describing.

"On its head" is a powerful statement. I'm just talking here. I'm not writing a business plan. I'm asking open minded FPGA designers what they would use these CPUs for.

Rick C.

Reply to

On Tuesday, March 19, 2019 at 1:14:56 PM UTC+2, snipped-for-privacy@gmail.com wrote:



I wanted to understand if there is a PR element involved. Like, are you afraid that if you say "Forth" then most potential readers will immediately stop reading?

I am not a PR consultant, but if I were, I'd suggest removing the word "interactive" from the description of the language that you have in mind.

BTW, I agree that coding in HDLs sucks for many sorts of sequential tasks. And I agree that having a CPU that is *not* narrow in its data paths and optionally not narrow in external addresses, but small/configurable in everything else, could be a good way to "offload" such parts of a design away from HDL. I am much less sure that a stack processor is a good choice for such tasks.

Reply to



Yes, I said that.

Implementation details I don't really care about.

Implementation details I don't really care about.

Implementation details I don't really care about.

Implementation details I don't really care about and it has no relevance to the topic of embedding in an FPGA.

Irrelevant in this context since this would never be used in the same way of scattering many CPUs around an FPGA die.



Isn't that obvious? It could do the work of a lot of FPGA logic in the same way that MCUs are used rather than FPGAs. It's the same reason why multipliers, DSP blocks and even memory are included in FPGAs: because they are much more efficient than using the fabric logic.

I guess no one uses screws, eh?

Rick C.

Reply to




No, this is not about Forth.


That is one of the advantages of this idea. Why is "interactive" a bad thing?


Stack processors can be made very simply. That is the main reason to suggest them. There are simple register processors, but I find them more difficult to program.

I do use Forth for programming this sort of task. I find it easy to develop in. I understand that many are so used to programming in more complicated languages... or I should say, using more complicated tools, so they aren't comfortable working closer to the hardware. But when the task you are programming up is so simple, then you don't need the training wheels. But that is not what I am talking about here. This is about a small processor that can be made very efficiently on the FPGA die.

Would a small, hard core CPU likely run at GIPS in an FPGA?

Rick C.

Reply to

It is clear that you want other people to validate your ideas, but you have no interest in
- understanding what is available
- understanding in what way your (vague) concepts would enable designers to do their job better than using existing technology
- explaining your concept's USP

The first of those is a cardinal sin in my book, since you are likely to waste your time (don't care) reinventing a square wheel, and waste other people's time (do care) figuring out that you aren't enabling anything new.

Good luck.

Reply to
Tom Gardner

I can see the merits of the XMOS approach. But I'm unclear how this relates to the OP's proposal, which (I think) is having tiny CPUs as hard logic blocks on an FPGA, like DSP blocks.

I completely understand the problem of running out of hardware threads, so a means of 'just add another one' is handy. But the issue is how to combine such things with other synthesised logic.

The XMOS approach is fine when the hardware is uniform and the software sits on top, but when the hardware is synthesised and the 'CPUs' sit as pieces in a fabric containing random logic (as I think the OP is suggesting) it becomes a lot harder to reason about what the system is doing and what the software running on such heterogeneous cores should look like. Only the FPGA tools have a full view of what the system looks like, and it seems stretching them to have them also generate software to run on these cores.

We are not talking about a multi- or many-core chip here, with the CPUs as the primary element of compute. Having the CPUs scattered around as 'state machine elements' just ups the complexity and makes it harder to understand compared with the same thing synthesised out of flip-flops.

I would be interested to know what applications might use heterogeneous many-cores and what performance is achievable.


Reply to
Theo Markettos

FPGA designers already have at least one foot in the "programmer" camp. An increasing proportion (AFAIUI) of FPGA design is done from a software viewpoint, not a hardware viewpoint. People use C-to-HDL, Matlab, high-level languages (Scala, Python, etc.) for their FPGA designs. Thinking in terms of wires, registers, logic elements, etc., does not scale - the "hardware" part is often dominated by choosing and connecting the right modules. (Yes, there are other parts too, such as clock design, IO, etc.)

I am not convinced that there really is a significant group of hardware designers who want to program small, limited cpus using low-level languages, but who don't want to be "mere programmers" working in C or other common programming languages.

Again, I am failing to see the use-cases you have in mind. It's hard to guess what you might be talking about if you don't say.

Fair enough. It was an example you gave, so I ran with it.

You don't need to program the multipliers or DSP blocks.

Now, if you can find a way to avoid any programming of these tiny cpu cores, you might be on to something. When the VHDL or Verilog synthesis tool meets a calculation with a multiply, it automatically puts in the DSP blocks that are needed. When it meets a large array, it automatically infers ram blocks. If you can ensure that when it meets some complex sequential logic, or a state machine, that it infers a tiny cpu and the program for it, /then/ you will have something immediately useful.

I'm telling you how things look.

You say that, yet it seems to be entirely about Forth. Or at least, about programming cpus in assembly where the stack-based design means the assembly language is practically Forth.

Of course, even though these devices might have a Forth-like assembly, it would be possible to have other languages on top. Do you have any existing ones in mind?

And there are many areas where a little knowledge is a useful thing - programming languages and FPGA design are amongst them. I am aware of the limitations of my knowledge - but the breadth is a useful thing here.

To sum up this conversation so far:

Rick: What do people think about tiny processors in an FPGA? Will programmers like it even if it does not support C? Opinions, please.

David: Programmers will want C, possibly with extensions. They won't want Forth.

Rick: Look at the GA144, and how great it is, programmed in Forth. The programming world is bad because people are stuck in the wrong mindset of using existing major programming languages and existing major programming platforms. They should all change to this kind of chip because it can run at 700 MIPS with little power.

David: I can only tell you my opinion as a software developer. The GA144 is technically interesting, but a total failure in the marketplace. No one wants to use Forth. The cpus may have high clock speeds, but do almost nothing in each cycle. If you want people to use tiny cpus, you have to have a good reason and good use-cases.

Rick: If I want your opinion, I'll insult you for it. You are clearly wrong. This is nothing like the GA144 cpus - I mentioned them to confuse people. They won't be programmed in Forth - they will be programmed in an assembly that looks almost exactly like Forth. You should be basing your answers on reading my mind, not my posts.

What is it you actually want here? Posts that confirm that you are on the right track, and you are poised to change the FPGA world? I am giving you /my/ opinions, based on /my/ experience and /my/ understanding of how programmers would likely want to work - including programmers who happen to do FPGA work. If those opinions are of interest to you, then great. If you want a fight, or a sycophant, then let me know so I can bow out of the thread.

So let's get this straight. I don't have any "bones to pick" with the GA144, Forth, small cpus, or anything else. I don't have any biases against them. I have facts, and I have opinions based on experience. If your opinions and experiences are different, that's okay - but don't tell me I am ignorant, or have dangerously little knowledge, or that I have bones to pick.

Feel free to rule out every other possible user too - especially those that are interested in code quality. There is a reason why software developers want good tools and good languages - and it's not laziness or incompetence.

Sure, there is nothing wrong with that. But if you want to make something that appeals to other people, you need to be looking for "better than the modern choices" - not "worse than the old choices".

I use JTAG debugging when that is the appropriate choice. I use other types of debugging at other times, or simulations, testing on other platforms, etc. Dismissing JTAG debugging as a tool is just as bad as relying upon it for everything.

When you are thinking of a new way of doing design here, then debugging and testing should be of prime concern. I don't think doing it all in VHDL will cut it. I can agree that JTAG debugging is not going to work well for multiple small processors - but you have to think about what /would/ work well, rather than what won't work.

The F18A devices don't keep up with the logic - not when you are doing something more than toggling a pin at high speed. They do so very little each clock cycle.

But the big question here - the elephant in the room that you keep ignoring - is what you want these devices to /do/. Why are you trying to make them "keep up with the logic"? Use hardware to do the things that hardware is good at - fast, predictable timing, wide data, parallel actions. Use software for what software is good at - flexible, sequential, conditional. Combine them appropriately - use software to control the hardware parts, use buffers to avoid the latency variations in the software. Use hardware state machines for the small, simple, fast sequential parts.
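To make the point about buffers absorbing software latency a little more concrete, here is a toy model in Python. It assumes the hardware writes one word into a FIFO per tick, and the software wakes up at irregular intervals and drains at most a fixed number of words each time. The function name, the parameters and the numbers are all hypothetical, chosen only for illustration.

```python
def peak_fifo_occupancy(wake_gaps, drain_per_wake):
    """Return the deepest the FIFO gets under the toy model.

    wake_gaps: ticks elapsed before each software wake-up
               (hardware writes one word per tick).
    drain_per_wake: maximum words the software removes per wake-up.
    """
    occupancy = 0
    peak = 0
    for gap in wake_gaps:
        occupancy += gap                      # hardware filled the FIFO while software was away
        peak = max(peak, occupancy)           # the worst case occurs just before draining
        occupancy = max(0, occupancy - drain_per_wake)
    return peak


# Software that is mostly prompt but occasionally 10 ticks late
# needs a FIFO at least 10 words deep to avoid losing data.
depth_needed = peak_fifo_occupancy([2, 3, 10, 2, 1], drain_per_wake=8)
```

The hardware side keeps its fixed timing, and the FIFO depth is sized for the worst software delay rather than the average one. If the software drains more slowly than the hardware fills on average, no finite depth is enough.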

Where do you see your new cpus being used? Give us some examples.

Equally, having soft core CPUs makes your cores unnecessary. Sure, a real soft core CPU is usually bigger than the cpus you imagine - but they can do so much more. And they can do so /today/, using tools available /today/, that are familiar to programmers /today/. That massive benefit in developer efficiency outweighs the cost in logic cells (if it doesn't, you should not be using FPGAs except to prototype your ASICs).

Equally, why should they want a special purpose mini cpu core, when they can write their state machines as they always have done, using pure VHDL or Verilog, or additional state machine design software?

Pretend I am an FPGA designer, and then try to explain the point of them. As far as I can see, this whole thing is a solution in search of a problem. Convince me otherwise - or at least, convince me that /you/ think otherwise.

(And while I would not classify myself as an FPGA designer, I have done a few FPGA designs. I am by no means an expert, but I am familiar with the principles and the technology.)

I can see that you didn't understand what I wrote - or perhaps you don't understand programming and software development as much as you think. Let me try again - the GA144 is not easy to program. I didn't say anything about what apps it might be good for - I said it is not easy to program. That is partly because it is difficult to make designs for such a many-processor system, partly because the language is awkward, and partly because the skill set needed does not match well with skill sets of most current programmers. I am basing this on having read through some of the material on their site, thinking that this is such a cool chip I'd like to find an excuse to use it on a project. But I couldn't find any application where the time, cost and risk could be remotely justified.

Agreed. I am happy with that - and I am not locked into a mindset here. I can't see what might have given you that impression.

FPGA development is usually about wasting space - you are using only a small proportion of the die, but using it hard. Software development is usually about wasting time - in most systems, the cpu is only doing something useful for a few percent of the time, but is running hard in that time spot. It is not actually that different in principle.

I do understand that most of the cpus on a chip like that are "wasted" - they are doing very little. And that's fine.

Again, /you/ brought it up. You are trying to promote this idea of lots of small, fast cpus on a chip. You repeatedly refuse to give any sort of indication what these might be used for - but you point to the existing GA144, which is a chip with lots of small, fast cpus. Can't you understand why people might think that it is an example of what you might be thinking about? Those examples and application notes are about the only examples I can find of uses for multiple small, fast cpus, since you refuse to suggest any - and they are /pointless/.

So, please, tell us what you want to do with your cpus - and why they would be so much better than existing solutions (like bigger cpus, hard cores and soft cores, and ad hoc state machines generated by existing tools). I will happily leave the GA144 behind.

I understand that - but you are missing the point. Even if all the cpu needs to do is take a 32-bit data item from /here/ and send it out /there/, a core like the F18A is lost. A 700 MHz clock does /not/ let you keep up with the logic if you can't do anything useful without lots of clock cycles - then it is better to have a slower clock and real functionality.

An ARM Cortex M1 at 100 MHz is going to do a great deal more than an F18A-style core at 700 MHz (though I don't expect it to get anything like that as a soft core on an FPGA).
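The arithmetic behind this is simply clock rate divided by cycles needed per useful task. A small sketch in Python follows; the cycle counts are invented purely to show the shape of the argument and are not measurements of an F18A, a Cortex M1 or any other core.

```python
def tasks_per_second(clock_hz, cycles_per_task):
    """Useful tasks completed per second for a core that needs
    `cycles_per_task` clock cycles to finish one task."""
    return clock_hz / cycles_per_task


# Hypothetical figures, for illustration only:
# a tiny core at 700 MHz that needs 20 cycles to move one data item,
# versus a bigger core at 100 MHz that needs 2 cycles for the same job.
tiny_core = tasks_per_second(700_000_000, 20)   # 35 million tasks per second
big_core = tasks_per_second(100_000_000, 2)     # 50 million tasks per second
```

With those made-up cycle counts the slower-clocked core comes out ahead. The real comparison depends entirely on how many cycles each core actually needs for the task in question.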

The SpinalHDL / VexRiscv RISC-V processor can run at 346 MHz from 481 LUTs on the Artix 7, in its smallest version. Bigger versions have slower clock speeds, but faster overall execution (with more variation in latency - which may not be what you want).

And you can program these in a dozen different languages of your choice.

Yes, I know that is not what you are describing. (And "big cpu" does not mean "Linux" - you can run a real time OS like FreeRTOS, or pure bare metal.)

I am trying to be open minded, despite how often you tell me you think I am closed. But I am failing to see an overwhelming case for these sorts of cpus - I think they fall between too many chairs, and they fail to be a better choice than existing solutions. If you can give me use-cases where they are a big benefit, then I will be happy to reconsider - but if you are asking where I think they have a good use, then currently I can't think of an answer.

Reply to
David Brown
