64-bit embedded computing is here and now

Sometimes things move faster than expected. As someone with an embedded background, I was caught by surprise:

Terabyte microSD cards are readily available and getting cheaper. Heck, you can carry ten of them in a credit card pouch. They're likely to move into the same price range as hard disks ($20/TB).

That means that a 2+ square inch PCB can hold a 64-bit processor and enough storage for memory-mapped files larger than 4 GB.

Is the 32-bit embedded processor cost-vulnerable to 64-bit 7 nm devices as the fabs mature? Will video data move to the IoT edge? Will AI move to the edge? Will every embedded CPU have a built-in radio?

Wait a few years and find out.

Reply to
James Brakefield

I don't care what the people say--

32 bits are here to stay.
Reply to
Paul Rubin

8-bit microcontrollers are still far more common than 32-bit devices in the embedded world (and 4-bit devices are not gone yet). At the other end, 64-bit devices have been used for a decade or two in some kinds of embedded systems.

We'll see 64-bit take a greater proportion of the embedded systems that demand high throughput or processing power (network devices, hard cores in expensive FPGAs, etc.) where the extra cost in dollars, power, complexity, and board design is not a problem. They will probably become more common in embedded Linux systems, as the core itself is not usually the biggest part of the cost. And such systems are definitely on the increase.

But for microcontrollers - which dominate embedded systems - there has been a lot to gain by going from 8-bit and 16-bit to 32-bit for little cost. There is almost nothing to gain from a move to 64-bit, but the cost would be a good deal higher. So it is not going to happen - at least not more than a very small and very gradual change.

The OP sounds more like a salesman than someone who actually works with embedded development in reality.

Reply to
David Brown

I think there will be divergence in what people mean by an N-bit system:

- Register size
- Unit of logical/arithmetical processing
- Memory address/pointer size
- Memory bus/cache width

I think we will increasingly see parts which have different sizes in one area but not another.

For example, for doing some kinds of logical operations (eg crypto), having 64-bit registers and ALU makes sense, but you might only need kilobytes of memory, so only have a small (16- or 24-bit) address space.
Reply to
Theo

There have always been different ways to measure the width of a cpu, and different people have different preferences.

Yes, that is common.

As is that. Sometimes the width supported by general instructions differs from the ALU width, however, resulting in classifications like 8/16-bit for the Z80 and 16/32-bit for the 68000.

Yes, also common.

No, that is not a common way to measure cpu "width", for many reasons. A chip is likely to have many buses outside the cpu core itself (and the cache(s) may or may not be considered part of the core). It's common to have 64-bit wide buses on 32-bit processors; it's also common to have 16-bit external data buses on a microcontroller. And the cache might be 128 bits wide.

That has always been the case.

Agreed.

32-bit processors have often had 64-bit registers for floating point, and 64-bit operations of various sorts. It is not new.
Reply to
David Brown

I agree with your points and those of Theo, but isn't the cache basically as wide as the registers? Logically, that is; a cache line is several times that, which is probably what you refer to. Not that it makes much difference to the fact that 64-bit data buses/registers in an MCU (apart from FPU registers; 32-bit FPUs are useless to me) are unlikely to attract much interest - nothing of significance to be gained, as you said.

To me 64-bit CPUs are of interest of course, and thankfully there are some available, but this goes somewhat past what we call "embedded". Not long ago, in a chat with a guy who knew some 64-bit ARM, I gathered there is some real mess with their out-of-order execution: one needs to do... hmmm... "sync", whatever they call it, all the time, and there is a huge performance cost because of that. Has anybody heard anything about it? (I only know what I was told.)

Dimiter

Reply to
Dimiter_Popoff

(General) Register size is the primary driver.

A processor can have very different "size" subcomponents. E.g., a Z80 is an 8b processor -- registers are nominally 8b. However, it supports 16b operations -- on register PAIRs (an implicit acknowledgement that the REGISTER is smaller than the register pair). This is common on many smaller processors. The address space is 16b -- with a separate 16b address space for I/Os. The Z180 extends the PHYSICAL address space to 20b but the logical address space remains unchanged at 16b (if you want to specify a physical address, you must use 20+ bits to represent it -- and invoke a separate mechanism to access it!). The ALU is *4* bits.

Cache? Which one? I or D? L1/2/3/?

What about the oddballs -- 12b? 1b?

But you don't buy MCUs with a-la-carte pricing. How much does an extra timer cost me? What if I want it to also serve as a *counter*? What cost for 100K of internal ROM? 200K?

[It would be an interesting exercise to try to do a linear analysis of product prices with an idea of trying to tease out the "costs" (to the developer) for each feature in EXISTING products!]

Instead, you see a *price* that is reflective of how widely used the device happens to be, today. You are reliant on the preferences of others to determine which is the most cost effective product -- for *you*.

E.g., most of my devices have no "display" -- yet, the MCU I've chosen has hardware support for same. It would obviously cost me more to select a device WITHOUT that added capability -- because most purchasers *want* a display (and *they* drive the production economies).

I could, potentially, use a 2A03 for some applications. But, the "TCO" of such an approach would exceed that of a 32b (or larger) processor!

[What a crazy world!]

Reply to
Don Y

sync instructions of various types can be needed to handle thread/process synchronisation, atomic accesses, and coordination between software and hardware registers. Software normally runs with the idea that it is the only thing running, and the cpu can re-order and re-arrange the instructions and execution as long as it maintains the illusion that the assembly instructions in the current thread are executed one after the other. These re-arrangements and parallel execution can give very large performance benefits.

But it also means that when you need to coordinate with other things, you need syncs, perhaps cache flushes, etc. Full syncs can take hundreds of cycles to execute on large processors. So you need to distinguish between reads and writes, acquires and releases, syncs on single addresses or general memory syncs. Big processors are optimised for throughput, not latency or quick reaction to hardware events.

There are good reasons why big cpus are often paired with a Cortex-M core in SOCs.

Reply to
David Brown

Is it, though? What's driving that? Why do you want larger registers without a larger ALU width?

I don't think register size is of itself a primary pressure. On larger CPUs with lots of rename or vector registers, they have kilobytes of SRAM to hold the registers, and increasing the size is a cost. On a basic in-order MCU with 16 or 32 registers, is the register width an issue? We aren't designing them on 10 micron technology any more.

I would expect datapath width to be more critical, but again that's relatively small on an in-order CPU, especially compared with on-chip SRAM.

This is not really the world of a current 32-bit MCU, which has a 32 bit datapath and 32 bit registers. Maybe it does 64 bit arithmetic in 32 bit chunks, which then leads to the question of which MCU workloads require 64 bit arithmetic?

Sure, what you buy is a 'highest common denominator' - you get things you don't use, but that other people do. But it still depends on a significant chunk of the market demanding those features. It's then a cost function of how much the market wants a feature against how much it'll cost to implement (and at runtime). If the cost is tiny, it may well get implemented even if almost nobody asked for it.

If there's a use case, people will pay for it. (although maybe not enough)

Theo

Reply to
Theo

You can use a smaller ALU (in the days when silicon was expensive) to do the work of a larger one -- if you spread the operation over time.

It's just how people think of CPU widths. If there's no cost to register width, then why didn't 8b CPUs have 64 bit accumulators (and register files)?

Correct. I was just illustrating how you can have different "widths" in a single architecture; yet a single "CPU width" has to be used to describe it.

I treat time as a 64b entity (32b being inadequate). IPv6 addresses won't fit in 32b. There are also algorithms that can benefit from processing data in wider chunks (e.g., counting the number of set bits in a 64b array goes faster with a 64b register than with a 32b one). My BigRationals would be noticeably faster if I could process 64b at a time, instead of 32. [This, of course, assumes the D cache can hold "as much data" in each case.]

And you don't always need the full width of a register -- do you use all 32b of a register when you use it to keep track of the remaining number of iterations of a loop? Or, the index into an array? Or the time remaining until an upcoming deadline? Or processing characters in a string?

Yes. Or, an application domain that consumes lots of parts.

You also have to remember that the seller isn't the sole actor in that negotiation. Charge too much and the customer can opt for a different (possibly "second choice") implementation.

So, it is in the seller's interest to make his product as cost-effectively as possible. *Or*, have something that can't be obtained elsewhere.

Nowadays, there are no second sources as there were in decades past. OTOH, I can find *another* ARM (for example) that may be "close enough" to what I need and largely compatible with my existing codebase. So, try to "hold me up" (overcharge) and I may find myself motivated to visit one of your competitors.

[As HLLs are increasingly used, it's considerably easier to port a design to a different processor family entirely! Not so when you had 100K of ASM to leverage]

I worked in a Motogorilla shop, years ago. When I started my design, I brought in folks from other vendors. The Motogorilla rep got spooked; to lose a design to another house would require answering some serious questions from his superiors ("How did you lose the account?"). He was especially nervous that the only Moto offering that I was considering was second sourced by 7 or 8 other vendors... so, even if the device got the design, he would likely have competitors keeping his pricing in line.

Designers often have somewhat arbitrary criteria for their decisions. Maybe you're looking for something that will be available for at least a decade. Or, have alternate sources that could be called upon in case your fab was compromised or oversold (nothing worse than hearing parts are "on allocation"!)

So, a vendor can't assume he has the "right" solution (or price) for a given application. Maybe the designer has a "history" with a particular vendor or product line and can leverage that experience in ways that wouldn't apply to a different vendor.

A vendor's goal should always be to produce the best device for his perceived/targeted audience at the best price point. Then, get it into their hands so they are ready to embrace it when the opportunity presents.

Microchip took an interesting approach trying to buy into "hobbyists" with cheap evaluation boards and tools. I'm sure these were loss leaders. But, if they ended up winning a design (or two) because the "hobbyist" was in a position to influence a purchasing decision...

Reply to
Don Y

Of course I know all that, David; I have been using power processors which do things out of order for over 20 years now. What I was told was something about a real mess, like system memory accesses going wrong because of out-of-order execution, hence plenty of syncs needed to keep the thing working. I have not even tried to verify that; only someone with experience with 64-bit ARM can do that - so far no one here seems to have that.

Dimiter

======================================================
Dimiter Popoff, TGI
======================================================

Reply to
Dimiter_Popoff

It depends on the actual PPCs in question - with single-core devices targeted for embedded systems, you don't need much of that at all. Perhaps an occasional sync of some sort in connection with using DMA, but that's about it. Key to this is, of course, having your MPU set up right to make sure hardware register accesses are in-order and not cached.

If the person programming the device has made incorrect assumptions, or incorrect setup, then yes, things can go wrong if something other than the current core is affected by the reads or writes.

Reply to
David Brown

You *do* need it enough to know what there is to know about it - I have been through it all. How big the latency is is irrelevant to the point.

Maybe the person's assumptions were wrong. Or maybe your assumption that their assumptions were wrong is wrong. Neither of us knows which it is.

Dimiter


Reply to
Dimiter_Popoff
