the secret sauce

It's well known that the ARM design goes into the bulk of mobile devices, presumably because it's the most efficient, energy-wise.

However, Intel has the best chip designers in the world. So how did ARM outwit them? Anybody here have any insight into the guts of their design? And why is Intel unable to replicate their techniques?

Similarly, I heard a seminar on drones where they said they're using a Qualcomm device. Again, we can presume it's for power efficiency, the dominant factor in drone design.

But I wonder what else makes it so attractive. Is it because it has mixed-signal analog/digital functions? Qualcomm is known for CDM, is that useful in a drone?

-- Rich

Reply to
RichD

Which CDM would that be exactly?

- Cash deposit machine, in banks
- Clean Development Mechanism, a mechanism in the Kyoto Protocol for reducing emissions
- Ceramic discharge metal-halide lamp, a lamp/light source
- Charged-device model, used in electrostatic discharge testing
- Clinical data management
- Code division multiplexing
- Cold dark matter, a scientific theory
- Combining Diacritical Marks, for keyboards
- Combustion detection module
- Common Diagnostic Model, a standard of the Distributed Management Task Force
- Conceptual data model
- Content Decryption Module
- Customer data management, software and behaviors for businesses to handle customer data
- Continuous Diagnostics and Mitigation, a program of the Department of Homeland Security; see Federal Systems Integration and Management Center
- Congenital dermal melanocytosis, a benign birthmark

Inquiring minds want to know.

--

Rick C. 

- Get 1,000 miles of free Supercharging 
- Tesla referral code - https://ts.la/richard11209
Reply to
Rick C

ARM have just been lucky to be there with the power-efficient 32-bit architecture. Flawed as it is (just 12 GP registers in a load/store machine), it is still much, much better than x86, which has been a mess on top of a previous mess, many layers of the above. Power is the best architecture I know of by a great margin, but in the '90s the lowest-power Power part (PPC back then) was the 603e, much too large/power-hungry to be considered for a mobile phone.

So it is more a matter of having had the right part at the right time *the first time*; legacy takes care of it for decades to come. I am far from sure 64-bit ARM is any good (I had just a vague look at it a long time ago, no memories left), but because ARM32 took over the mobile market, ARM64 just lives on.

Dimiter

====================================================== Dimiter Popoff, TGI

formatting link
======================================================
formatting link

Reply to
Dimiter_Popoff

Simple. ARM doesn't have to emulate the x86 architecture.

Back when MIPS, HP, and DEC were all making chips of comparable power to the Pentiums, the MIPS chips were 2 mask layers, the HP chips were 3, the Alpha was 4, and the Pentium...

... wait for it...

was 17 mask layers. Every one with a finite yield < 100%.

It's amazing they were able to stay in business.

It's amazing that Intel didn't push their StrongARM chips into the limelight 20 years ago. Instead their arrogance and momentum (and undeniable competence) carried them to this juncture.

Clifford Heath

Reply to
Clifford Heath

You count mask levels for Intel and metal levels for the others. You cannot do 2 metal levels with 2 masks, and that was entry level even then.

When Intel pumped out the 200 MHz Pentium into the mass market, that was the end for DEC Alpha, PowerPC, Moto, Clipper, HP Snakes, SPARC and MIPS.

And HP's New Architecture, which HP could not turn into silicon and which became Itanium, was nothing but a big flop. It turned out that an instruction bundle decoder + a RISC CPU with 200 renaming registers worked just as well with an x86 instruction stream.

It took ARM 20 years to catch up speed-wise, and the shootout is still to happen. And with the 960, Intel had the best-selling RISC chip for some years.

Intel has become lazy, rested on its laurels, and has been taken over by the bean counters. No more leading-edge process technology. They are backporting chips to 14 nm because that is what they can actually make. :-(

Gerhard

(who liked the 432. It was ahead of its time. With today's cache technology and multi-issue it could be a hot performer.)

Reply to
Gerhard Hoffmann

Their need to service the X86 legacy market was quite an anchor around their necks. They did try to move away from it with the Itanium/EPIC architecture, but (as I understand it) this was targeted at high-end servers and not at low-power mass-deployment applications.

Some years ago, Intel tried to gin up a set of chips which would be attractive candidates for the consumer-electronics business - set-top cable boxes, DVRs, and so forth. As I recall these were basically Atom-type CPU cores with added, specialized peripherals.

They were hoping to compete against Broadcom (with its MIPS- architecture SOCs). I don't think ARM was a big player in that product space yet.

The company I was working for did an evaluation, and worked with them on a proposal for a next-generation chip. Nothing really came of it - their offering wasn't cost-attractive for us vs. the Broadcom chips we were using at the time. If I recall correctly we did do a proof-of-concept port of our software onto the Intel architecture and got it running on their eval boards, but abandoned the work when Intel cancelled the program.

Reply to
Dave Platt

Intel has an apparently emotional commitment to X86. They have taken shots in both directions, super-CISC and RISC, even had an ARM license, but flubbed them all.

They used to get away with a dreadful CPU architecture by having the best fabs. They don't any more. AMD even does x86 better.

RISC-V will be interesting.

Reply to
John Larkin

On a sunny day (Fri, 8 Jan 2021 09:31:14 +1100) it happened Clifford Heath wrote in :

Linus on Intel:

formatting link

I really do not know, this laptop is a Core i5, and runs several Linux distros, never a problem, is REAL fast, also has Radeon AND Intel display hardware, but not very long battery life. Rest of the PCs is AMD, but wait, now 4 Raspberry Pis with ARM running, 3 of those 24/7, only take a few Watts, the latest one an RPi4B 8GB, just weeks old.

ARM has won here on the desktop, I asked in the raspi newsgroup if they should add ECC memory correction :-) After all we will, no doubt, take raspi and ARM to space at some point.

Intel has done very good work over the years, not sure why people are all of a sudden so against it. Sure, keeping compatible with the old x86 was a must for some people, or maybe a selling point.

Just move on. If it is open source and written in C, then no matter what comes after ARM it will be no problem. I think there is now, since ARM was sold to investors a few years ago, a fight to prevent having Nvidia own it?

formatting link
when investors and hedge funds and those that do 'the profit thing only' get control, there is no telling where it will go.

Maybe specialized hardware _again_

formatting link

Who controls the anti-Intel media? ;-)

Just a thought.

Reply to
Jan Panteltje

It was also ahead of the tools and mindsets of folks developing systems, back then.

An affordable F/T/J-11 would have been an instant hit!

Personally, I liked the 16032 et al. -- buggy as they were. They were also relatively easy to design (hardware) into systems at a lower overall cost-point.

Reply to
Don Y

What makes you think that?

ARM had better ideas and better brains and no compulsion to provide legacy backwards compatibility. It was a novel 1990s architecture (which admittedly had some faults, but not as many as the ancient x86).

Perhaps your assumptions are invalid. Being stuck with an old x86 architecture that was a bodge on the 8080 CPU ties their hands a bit...

ARM used a RISC instruction set long before it was fashionable - in part inspired by the humble 6502, which was the core of early home computer products like the Atom and the hugely successful BBC computer (aka the Beeb). ARM was a spin-off from that in the 1990s.

Right combination of compute power and comms.

It remains to be seen if being taken over by Nvidia totally wrecks their business model. I was not impressed when they were sold to Softbank.

--
Regards, 
Martin Brown
Reply to
Martin Brown

That was when it was Acorn Computers, which went bust in 1985 and got rescued by a takeover by Olivetti.

formatting link

Long after the Olivetti takeover.

Being taken over by Olivetti didn't seem to put much of a crimp in Acorn's longer-term projects. Development is an expensive business, and financing it can be tricky.

--
Bill Sloman, Sydney
Reply to
Bill Sloman

PPC is a nice ISA, certainly, but has some weird quirks like its backwards bit numbering. And performance per watt has never been its strong point, nor performance per dollar. It has also suffered by being a very closed environment - even today, for a lot of PPC devices you need expensive and restricted compilers to get good support.

The biggest competitor for low-power cores was MIPS. Even now, they have cores that match many of ARM's cores but beat them on at least one of the key aspects (speed, power, size). And yet most chip designs use ARM, not MIPS. I wonder why that is?

Part of it is momentum, of course - ARM got in there early, and it's then always easier to continue with the existing core or choose the market leader than to pick something else.

Part of it is, I think, that ARM was independent for such a long time, while MIPS was owned by various different companies at different times. Generally speaking, if a processor architecture is owned by a parent company that makes end products with those cores, everyone else will be reluctant to use the cores. MIPS has mostly been owned by companies that want to /use/ MIPS cores, not companies that want to /sell/ MIPS cores.

ARM was free of that - they could sell to anyone, and anyone could buy from them. Even when that Japanese fund bought them, this remained the case. But if Nvidia buys them, it's a whole different story. Nvidia makes processors - so they will be in direct competition with any other company that wants to use ARM cores in their processors. The same goes for graphics systems, tablets, car parts, and anything else Nvidia makes now or in the future.

The only long-term way forward for many core users will be RISC-V.

Reply to
David Brown

It is not just the ISA, it is the register model, the MMU, the overall idea, the 32 -> 64 bit vision behind the design. The guy who did it has been really good. The mnemonics are horrible of course if you are limited to their toolchains. The bit numbering is simply big endian, but it can be confusing as few of us count the bits this way. Which is why in our toolchain you can have it either way at any moment. It has not been an issue for me ever (almost 20 years that I have had and actively used our toolchain for Power, and I have never strayed from it).

This is wrong; you cannot name a part which can deliver better performance per watt using the same nm technology clocked at the same speed. In fact none can come near (try 200 64-bit FPU MMACs at a 400 MHz clock in 120 nm, plus plenty of parallel integer unit work, plus networking, lots of serdes in operation via smart DMA etc., all that within 1 W).

This is true of course. I am not sure *why* they were so determined to keep the architecture under a lid only for chosen ones (I am probably the only one who has managed to get around that), but it is a fact. Probably someone somewhere thought they were preserving an advantage this way - which may have even worked for a while. It is yet to be seen whether opening the Power architecture (which IBM did a few months ago) will have any effect; neither of us is too optimistic about that, but it will take a few years to know.

This is probably valid for core choice for new product lines, but the ARM inertia driven by Linux/Android etc. is no smaller than the Windows/x86 inertia, and the vast majority of the processors go into that sort of thing.

I gave up trying to predict the future a long time ago; for now I can get the parts I need (not RISC-V; some day I may check whether it allows both little- *and* big-endian move instructions of equal efficiency the way Power does, and if it does I might be interested).

Dimiter

====================================================== Dimiter Popoff, TGI

formatting link
======================================================
formatting link

Reply to
Dimiter_Popoff

Wrong! x86 was a bodge on the 8008 which was a bodge on the 4004.

Instruction sets have no inherent dollar value, so RISC-V will take over in many cases. ARM will be valuable only for licensing complex silicon subsystems: CPUs with cache controllers and DMA engines and MACs and such. RISC-V will eventually take over a lot of that too as it evolves.

formatting link

--

John Larkin      Highland Technology, Inc 

The best designs are necessarily accidental.
Reply to
jlarkin

There is only one set of assembly mnemonics for the PPC (for every PPC programmer, except you). But you don't often work with that kind of device in assembly.

Anyway, the EIEIO instruction has to count as the best mnemonic ever used!

The bit numbering on the PPC is, AFAIK, unique - at least on cores used in microcontrollers. There are plenty of cores with big-endian byte ordering (m68k and original MIPS, for example). But on the PPC, "bit 0" is the most significant bit.

That means if you have a purely 32-bit PPC, "bit 0" represents 2 ^ 31, and the LSB is "bit 31". If you have a 64-bit device, the LSB is now "bit 63". That includes chips (such as most NXP PPC microcontrollers) that are primarily 32-bit in practice but have support for a few 64-bit operations: on those, your registers are bits 32 down to 63. Documentation and reference manuals are therefore seriously confusing.

On the other hand, in order to be "consistent", the data bus and address bus fix "bit 0" as 2 ^ 31 and "bit 31" as the LSB. This means you have to swap all your numbering on any schematics in order to fit with any other device on the planet. And if you have a PPC chip capable of addressing more than 4GB, the MSBs of the address are labelled A-1, A-2, etc. A 64-bit external data bus has LSB D31 and MSB D-32, with D0 in the middle - but those bits are internally numbered from D63 at the LSB up to D0 for the MSB.

It is utter insanity.
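To make the numbering concrete, here is a little C sketch (purely illustrative; the helper names are made up, not from any vendor header) of translating the PPC "bit 0 is the MSB" convention into ordinary shift counts:

#include <stdint.h>

/* PPC manuals number bits from the MSB: "bit 0" is the most significant
   bit of the register width, so on a 32-bit register PPC bit 0 is 2^31
   and PPC bit 31 is the LSB; on a 64-bit register the LSB is bit 63. */

/* Convert a PPC-style bit number to a conventional shift count. */
static inline unsigned ppc_bit_to_shift(unsigned ppc_bit, unsigned reg_width)
{
    return reg_width - 1u - ppc_bit;
}

/* Mask for "bit n" as a 32-bit PPC reference manual describes it. */
static inline uint32_t ppc32_bit_mask(unsigned ppc_bit)
{
    return UINT32_C(1) << ppc_bit_to_shift(ppc_bit, 32);
}

/* ppc32_bit_mask(0)  == 0x80000000
   ppc32_bit_mask(31) == 0x00000001
   The same LSB is called "bit 63" in a 64-bit manual, which is exactly
   where the confusion starts. */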

Sure, when you are programming in C, C++, Ada, or whatever, most of that is hidden. But if you are designing electronics with these things or doing low-level programming with a PPC microcontroller, you can't escape it.

Right - so you take one bad system and mix it with a good system so that no one knows what's going on.

You can get used to anything with enough practice. That can make it difficult to see how bad it is in the first place.

I strongly doubt that - but to be fair, none of the PPC microcontrollers I have used have targeted low power usage. None of the PPC microcontrollers I have /seen/ have been for low power - they are all big, high power and high cost, with large packages. They are fast, and have a reputation for serious reliability and safety - fine chips for their target usage.

It is entirely possible that low power PPC cores can be made with similar performance per watt to, say, ARM or MIPS. I just don't see it in the PPC chips that are actually available.

The target users for PPC microcontrollers are primarily in the automotive industry - who see high tool costs as an advantage (it gives them more arse-covering credit in court). There simply aren't enough small users to justify NXP, IBM, etc., paying Red Hat or others to fully support the various PPC cores and varieties in gcc toolchains. gcc supports real Power devices (because IBM likes Linux), PPC chips that used to be common in workstations and Macs, and a few varieties of embedded cores.

It is too little, too late. Power has a reputation that can't be eliminated by just "opening the architecture". And people need /very/ good reason to switch from existing cores, or ones that are clearly simpler, more open and more popular and suitable for the future (RISC-V).

That's true in the mobile world. For the embedded world, it's easier to change core as you don't have the same requirements for binary compatibility. Yet MIPS has not caught on there either. (I blame Microchip for at least part of that.)

Out of curiosity, what do you need the efficient reverse-endian moves for?

Reply to
David Brown

The lowest level mnemonics - which are the only ones you know - exist for me as well, though I have to deal with them only during tracing. The only modification I have made is the source/destination ordering, to be consistent with the higher levels (the destination is the rightmost operand, not the leftmost as in the original IBM mnemonics).

I have not seen that sort of thing; 32 bits are 0 to 31, 64 bits are 0 to 63. It is just that if you count the bits big endian, the LSB will have the highest bit number, nothing too hard about that. Then nobody is forcing you to accept their bit numbering unless you are implementing a compiler (which I have done for our vpa, and it cost me no additional effort of any significance).

You may want to reread what you are commenting on. Otherwise you will have to explain how I got used to it - since I said at the beginning that I never had a problem with it. Hopefully you understand that here you just don't know what you are talking about.

Not just similar. Better.

To move data equally efficiently when it is ordered either way, and to be able to know how bytes-words-longwords-doublewords are ordered in memory and thus be able to access, say, a byte of a doubleword without having to access the entire doubleword. Not a huge thing, but I need a better reason to give these up than anybody's prediction of the future (mine included).
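For example (a plain C sketch, nothing to do with vpa itself), the memory offset of "byte k" of a doubleword depends entirely on which way the data is stored, so you have to know the ordering before you can touch a single byte:

#include <stdint.h>

/* Return byte k (0 = least significant) of a 64-bit value stored in
   memory, without loading the whole doubleword. */
static uint8_t byte_of_doubleword(const uint8_t *mem, unsigned k, int big_endian)
{
    /* little-endian: LS byte at offset 0; big-endian: LS byte at offset 7 */
    return big_endian ? mem[7u - k] : mem[k];
}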

Dimiter

====================================================== Dimiter Popoff, TGI

formatting link
======================================================
formatting link

Reply to
Dimiter_Popoff

If you have to read the documentation for a PPC microcontroller, then you /are/ forced to accept their bit numbering. The same applies to using their headers.

Sure, if you are doing everything yourself, writing your own headers, and writing your own tools, and you are using the same device for a decade so you don't need the manual - then you can ignore the odd numbering and use whatever suits you best. But that doesn't apply to most developers.

If /you/ don't find the PPC bit numbering odd and awkward, and never have done, then that's great. All I can say is that /I/ find it odd and awkward, I find it inconvenient and quickly confusing in schematics, in the reference manuals (of the Freescale PPC microcontrollers I have used), and in writing low-level code for the devices. I can live with it - I've done five PPC microcontroller based boards that I can remember. But it's still horrible - and that opinion is shared by everyone I have worked alongside in connection with these projects.

Surely all the data in memory is ordered in the way that is natural for the processor? It is only when you have to share data with something outside - file formats, network protocols, interface to external chips - that you'd want something else. I can accept that controlled endian access is sometimes convenient - maybe you want to access a buffer as both a byte-oriented buffer for transmission on a network and as a structure of data of different sizes and types, and that only works if you have control of the endianness. But I'd view the existence of a "load 32-bit with reversed endianness" instruction, rather than using "load 32-bit with native endianness; swap endianness" pairs, as a very minor point.
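For reference, the "native load plus swap" pattern in portable C might look like the sketch below (it uses the GCC/Clang __builtin_bswap32 builtin and the predefined byte-order macros; a PPC compiler may or may not fold it into a single byte-reversed load, that is up to the compiler):

#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from memory on any host:
   load natively, then swap only if the host is big-endian. */
static inline uint32_t load_le32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);              /* native-endian load */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    v = __builtin_bswap32(v);             /* swap on big-endian hosts */
#endif
    return v;
}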

Reply to
David Brown

I have only used some of their SOCs, never looked into their small Power-based MCUs; for those I used ColdFire, HC11 previously, some TI DSP etc. (hey, I am even about to use a Kinetis M0 for a toy I want to make :). I can imagine that if the bits of the peripherals for an MCU are partly ordered big-endian and partly little-endian this can be annoying; I am not sure I ever saw that for the SOCs I use, though. And the bits for the peripherals are ordered little-endian, I think. Then under vpa (for the record: virtual processor assembler, soon to be renamed mia - machine independent assembler) you can happily live almost all the time without knowing there is such a thing as big-endian bit numbering (though it can be handy at times and is of course available).

I am certainly writing my own headers, of course. If by "most developers" you mean people pouring something into an MCU after a week or two of coding, I have nothing to say; I just don't remember having done something like that in a short while. Most of what I do goes into the dps environment, and has been for nearly 30 years now - with very little of what has been done thrown away or forgotten. So spending a day or two writing the headers for a device I will be using for years to come is not the biggest of issues when it comes to the huge SOCs (huge headers, respectively), and writing the header for a small MCU takes hours, not days; probably more efficient than learning/adapting to another author's naming ideas etc.

Well, when I first moved from 68k to Power - around 2000 - and faced their bit ordering in the core, it was just a matter of raised eyebrows, nothing more. It was negligible, not least because it was the tiniest part of everything else I was doing, I suppose.

On a small MCU system this is minor indeed. On the larger systems I do it is less so; the internet is generally big-endian, but not without exceptions, and so are many file formats etc. So I have both "move" (big-endian) and "mover" (move reversed, little-endian), and occasionally I want to access bytes or words, not entire longwords or doublewords. Power (the parts I use) can do this with equal efficiency; not that it would cost that much to do at compilation time, of course, so the issue is survivable, but not minor enough for me not to take it into account.

Dimiter

====================================================== Dimiter Popoff, TGI

formatting link
======================================================
formatting link

Reply to
Dimiter_Popoff
[snip]

There is a long history. This is from the early days, when the Endian Wars broke out:

.

Joe Gwinn

Reply to
Joe Gwinn

That history doesn't go far enough back in time.

Many of the first computers were serial machines, to limit the number of tubes required. Logical operations like AND, OR and NOT affect only one bit position, so it is irrelevant in which order the different bit positions are processed.

However, additions/subtractions are more complicated, since a carry can be produced from one bit position, which must be propagated to the _more_ significant bit position. Thus the least significant bit positions must be processed first, so it is natural to use a Little-Endian format, circulating an integer through acoustic delay-line registers with the LSbit first.
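A toy C model of that bit-serial addition (just to illustrate the point; a real delay-line machine did this in hardware, not in a loop):

#include <stdint.h>

/* Add two words one bit at a time, least significant bit first, the way
   a serial machine would: the carry out of each bit position feeds the
   next, more significant, position. */
static uint32_t serial_add(uint32_t a, uint32_t b)
{
    uint32_t sum = 0, carry = 0;
    for (unsigned i = 0; i < 32; i++) {   /* LSB must arrive first */
        uint32_t abit = (a >> i) & 1u;
        uint32_t bbit = (b >> i) & 1u;
        sum  |= (abit ^ bbit ^ carry) << i;
        carry = (abit & bbit) | (carry & (abit ^ bbit));
    }
    return sum;
}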

Later decimal computers were implemented around a 4 bit BCD adder and the decimal digits were fed serially with least significant decimal digit first, i.e. Little-Endian, so that the decimal carry was properly handled.

Even in microprocessors with an 8-bit data bus but generating larger than 8-bit effective addresses (e.g. index register + offset), the LSByte addition must be done first, which can be done in parallel while fetching the most significant byte from (code) memory, i.e. Little-Endian.

Of course, if all computer ALUs, registers, main memory and communication channels had the same width, say 36 bits, and memory were only addressed as 36-bit words, then endianness would be irrelevant.

Reply to
upsidedown
