Compare ARM MCU Vendors

Cortex-M3 includes a three-word prefetch buffer, which in the best case can hold six instructions.

The macrocell also provides signals that export certain internal pipeline states for controlling "flash accelerators".

Best regards Marcus

--
note that "property" can also be used as syntactic sugar to reference
a property, breaking the clean design of verilog; [...]

             (seen on http://www.veripool.com/verilog-mode_news.html)
Marcus Harnisch

A significant point missing in the above comment you made, David, is that the _environment_ was different then as now.

For example, when the MIPS R2000 RISC processor was being designed and then instanced, they didn't have access to the highest density FABs... which were only found at Intel and Motorola, at the time. For obvious commercial reasons. Instead, they had to live with much coarser feature sizes and lesser capabilities and _still_ field a competitive product.

Also, in and around that period of time, the number of inverters and transmission gates (aka 'transistors') available was much, much less than now. Even if you were Intel or Motorola. As time passed, that capability grew to the point where folks weren't at all strapped and began wondering what else they could do with all those extras sitting around. Which opened the door for making design decisions that were impossible, earlier. Such as the PPro/P2 choice of decoding CISC instructions into RISC instructions, executed out of order and re-assembled later on. That simply wasn't possible, earlier.

I wouldn't characterize decisions made at that time as "wrong." The options available to a designer back then were very little like what is available now.

Jon

Jon Kirwan

Since I'm intimately familiar with both, I'd be interested in discussing some of those decisions made in the 68k case if you are open to the idea in a public space.

The 9900 was very interesting, though I never had the chance to actually program one of them. If I recall correctly, it supported the visibility requirements of Pascal's local variables within nested functions. I'm not entirely sure I appreciate your comment about data flows because the Intel x86 did add some function prologue/epilogue instructions later on to also support that feature... but you are probably referring to some other aspect I know zero about.

Good books on this subject, too. I remember talking with one of the founders of MIPS (Hennessy) about his analysis of the 68020 (and this is part of why I'd very much enjoy a discussion on that topic, because of what he shared with me that long day) and discussing even such 'insignificant' details as why they chose to not flag registers as 'busy' in the R2000. There was a cycle-length cost to it because it added delay to a combinatorial chain, which reduced the clock rate possible. Instead, the next instruction would NOT wait until a write completed. They left that to the compiler to worry over. I was blessed to hear about many other such interesting design decisions they made on the R2000, that day. (It was a personal 1:1 meeting.)

PIC and 6502 being examples?

As in the Intel REP MOV or the DEC PDP-11's MOV (R5)+, (R6)+ as two very different examples?

8051 being a classic here?

I think I remember some comments you made about this, earlier.

Thanks, Walter. I enjoyed reading this.

Jon

Jon Kirwan

That is precisely why I put the inverted commas around "wrong". ISAs like Thumb2 or ColdFire are more efficient than the x86 or ARM instruction sets using modern design and fabrication techniques - I fully agree that the situation was different 30 years ago (well, 21 years ago for ARM, IIRC). There is also the minor matter of trial-and-error - we know better now based on experience from past designs.

However, it is still valid to compare the x86 and the m68k architectures because they were from a similar time. The x86 was considered poor and old-fashioned when it was first made, and not suitable for powerful computing. The m68k was considered an elegant and modern design, with forward-looking design choices (such as support for 32-bit in the ISA, even though it only had a 16-bit ALU in the implementation). It was not without reason that the IBM engineers wanted the m68k for their new "PC".

David Brown

In my mind I'd made the cut between specialized-register machines (i8080 etc., Z80, TI32000 DSPs, DEC 8, GE-635) and general-register machines (m68k, mc6800 -- sort of, m88000 DSPs, DEC 11, IBM360). Specialized were easier to code for because after you'd figured out which registers your data had to be in the low-level code was pretty much determined; general registers needed a conscious resource-allocation step. In code performance, the only head-to-head comparison I've done was to rewrite identical functions for Z180 and MC68HC11. The functions always took less code memory and less execution time on the HC11. Partly because there was less data-shuffling (and partly because HC11 conditional-branch operations were handier.)

Mel.

Mel

The 68k guys certainly did that - and they created the 68k assembly language, which has been the most efficient native CPU assembly language I have seen since.

But the power architecture designers did a great job, too - apart from the barely if at all usable native assembly mnemonics.

I can hardly think of something I miss from the 68k on the PPC; and you will be surprised how little code size difference there is if you write with the PPC in mind (on VPA, which is what I call the language I built over the 68k assembly to produce PPC code).

Some examples:

68k byte or word operands affect only the byte or word in the destination register. This usually costs either a preceding extra clr.l or a subsequent ext.l; it is very rare that one can take advantage of the preserved upper part of the data.
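To make that cost concrete, here is a toy Python model of a word-sized move into a 32-bit data register. It is only a sketch: the helper names `move_w` and `ext_l` are made up for illustration, registers are plain integers, and condition codes are ignored.

```python
def move_w(dest, value):
    """Model a 68k word move into a data register: only the low 16 bits change."""
    return (dest & 0xFFFF0000) | (value & 0xFFFF)

def ext_l(reg):
    """Model ext.l: sign-extend the low word of the register to 32 bits."""
    low = reg & 0xFFFF
    return (low | 0xFFFF0000) if (low & 0x8000) else low

stale = 0xDEADBEEF                  # whatever the register held before
after_move = move_w(stale, 0x0042)  # upper half still carries the old value
clean = ext_l(after_move)           # the extra instruction that yields a usable long
```

Without the extra step the register reads 0xDEAD0042 as a long, which is why the clr.l or ext.l ends up in the code.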

Power takes advantage of its 32 bit opcode size and lets us leave the condition codes unaffected if needed; not so on the 68k, this also costs code size (I am not talking speed at all here which is clearly in favour of power).

What I do miss from the 68k is the movem instruction with a bit per register which will be moved; VPA does this on power by simply doing a single move per register. I suppose this is one of the major code space eaters I have in my code. They do have a native move multiple opcode, but only for sequential registers and I do not use it much.

But the rlwinm, rlwnm and rlwimi PPC instructions are a major help. While the 68020 did have bitfield instructions, they did not make it to the cpu32 and were probably too bulky to implement; these 3 opcodes with a little help from others can really do a lot, I would not give them up, not without a fight :-) .

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments


Didi

That and insisting on numbering bits the opposite way to everyone else.

-a

Anders.Montonen

ROFL, I don't know whether to strongly agree or to strongly disagree with you :D .

Strictly, since it is big-endian, the numbering they use is correct. Heck, even the RFCs do it "officially" so. The technical implication is that bit, byte, word, long word, dual word etc. addressing is done only using shifting and AND, no complements need to be calculated (my vpa has to do that on rare occasions for 68k emulation, like:

00000001 00000000 20C9 001F 5D46 3001   btst d1,d2
00000000: 20C9001F   subfic 31,r9,r6
00000004: 5D463001   rlwnm. r10,r6,0,0,r6
r6 being by convention wastable ) .

OTOH after many years of using both notations, even though I am now used to having them simultaneously, it is still a pain. Routinely manageable, but a pain. E.g. many times I do not go the extra mile to halve the code size of the example above, just use it as it is there to avoid the headache.
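For readers who want to see why the btst emulation needs the complement, here is a rough Python model of the two-instruction sequence. It is a sketch under assumptions: registers are plain integers, only the non-wrapping mask case is handled, and the function names are illustrative rather than anything VPA provides.

```python
MASK32 = 0xFFFFFFFF

def rotl32(value, count):
    """Rotate a 32-bit value left by count (taken modulo 32)."""
    count &= 31
    return ((value << count) | (value >> (32 - count))) & MASK32

def ppc_mask(mb, me):
    """Ones from bit mb through bit me in PPC numbering (bit 0 = MSB), mb <= me only."""
    return ((1 << (me - mb + 1)) - 1) << (31 - me)

def btst_via_ppc(bit, value):
    """Test bit `bit` (68k numbering, bit 0 = LSB) the way the listing does."""
    r6 = (31 - bit) & MASK32                  # subfic: r6 = 31 - bit number
    r6 = rotl32(value, r6) & ppc_mask(0, 0)   # rlwnm.: rotate the bit into the MSB, keep only it
    return r6 != 0                            # nonzero result means the bit was set
```

Rotating left by 31 - n moves the 68k-style bit n into the top position, which is PPC bit 0, so the mask 0,0 isolates exactly that bit.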

Dimiter

Didi

HC11 and HC08 both have reasonably good code density. There are many arguments for and against conditional branch vs conditional skip. In benchmarks conditional branch generally wins but on some very time critical code conditional skip wins.

Motorola's conditional branch set is complete but has limited range resulting in "5" byte two instruction long branches in some code.

I am surprised that the 6809 has yet to show up in this discussion

Regards,

w..

-- Walter Banks Byte Craft Limited

Walter Banks

:D

Indeed. But even I - having grown up on it - have long since put it out of use :-).

6800/11 code can be very dense indeed. Some 25+ years ago I used to count every single byte of code: I used cpx # to skip 2 bytes and cmpa # (or did I use cmpb #?) to skip 1 byte, and plenty of other things, most of which were applicable also to the 09.

But since you asked for the 6809, here it is - 3 instances of it emulated in DPS windows, the tiniest being as large as the graphics board I had built back then.. only here in colours; that first graphics board of mine had 2 bits/pixel, which I used to view on a mono monitor...

A 400 MHz PPC does the emulation, something like 80+ times faster than the original 2 MHz 6809 system which is emulated, never cared to really measure it precisely.

Oh no, I can't believe all these years have passed. Walter, you really should not have asked that :D :D .

Dimiter

Didi

I consider the 68000 to be a nightmare compared to a real orthogonal machine like the Series 32000...

This did not fix the location of the fields though, so you would get a messy instruction decoder. The immediates were 7, 14 or 30 bits wide, with the size encoded in either the top bit or the two top bits.
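A small Python sketch of that kind of prefix-coded field may help show why the decoder gets messy: the length of the field is only known after looking at its first byte. The prefix patterns follow the description above; the byte order (most significant byte first) and the signed interpretation are assumptions of this sketch, and `decode_field` is a made-up name.

```python
def decode_field(data, offset=0):
    """Return (value, length_in_bytes) for a prefix-coded field at data[offset]."""
    first = data[offset]
    if first & 0x80 == 0:         # 0xxxxxxx: 7-bit field in one byte
        nbytes, bits = 1, 7
    elif first & 0xC0 == 0x80:    # 10xxxxxx: 14-bit field in two bytes
        nbytes, bits = 2, 14
    else:                         # 11xxxxxx: 30-bit field in four bytes
        nbytes, bits = 4, 30
    raw = int.from_bytes(data[offset:offset + nbytes], "big")
    raw &= (1 << bits) - 1        # strip the size prefix
    if raw & (1 << (bits - 1)):   # sign-extend from the field width (assumed signed)
        raw -= 1 << bits
    return raw, nbytes
```

Whatever follows the field in the instruction stream moves by 1, 2 or 4 bytes depending on that first byte, which is the "location not fixed" problem.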

I think variable size, fixed location of fields seems to be most efficient. The National CompactRISC was one of the first implementations of this idea.

16/32/48 bit instructions. "Quick" 5 bit immediates with reserved values that indicated that an extension word or two followed the instruction.

Internally, the decoder would decode directly the 16/32/48 bit instruction for a simple pipeline. For multiple clock functions like interrupts & exceptions there were 7-8 state machines that could override the instruction decoder.

Since the datapath was controlled by ~90 signals, this turned out to be a significant part of the chip.

90 x (9->1 mux) + state logic...

Turned out that this could be simplified further. The reason for the state machines was that you need to do operations which are not supported by the instruction set.

When you enter an interrupt, you need to clear the interrupt flag. This is really an (unsupported) instruction

AND $IRQMASK, PSR
i.e. AND 0xFF7F, PSR
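In Python terms, that internal operation amounts to the following. The bit position is inferred from the 0xFF7F constant above, and the names are illustrative only.

```python
IRQ_FLAG_BIT = 7                            # assumed: the one bit that 0xFF7F clears
IRQMASK = 0xFFFF & ~(1 << IRQ_FLAG_BIT)     # all ones except the interrupt flag

def enter_interrupt(psr):
    """Clear the interrupt flag in a 16-bit PSR, leaving every other bit alone."""
    return psr & IRQMASK
```

The point is that this is an ordinary AND with the PSR as destination, which normal user instructions cannot express because the PSR is not one of their addressable registers.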

The problem we found was that you need to do operations on registers which are not accessible, and you need a few more instructions.

The problem was solved by extending each register address from 4 to 6 bits, allowing all registers (including PSR) in the CPU to be directly addressable by all instructions operating on registers. The opcode was extended by two bits allowing more instructions to be directly handled.

The normal instruction was extended from 16 bits to 22 bits, but user code would only use the normal 16 bits. If a multiclock function was needed, then the instruction decoder was fed from a 22 bit wide ROM which ran for a few clock cycles.

8 x 90 bit state machines + 90 x 9->1 multiplexers were replaced by a 32-64 x 22 bit ROM and 22 x 2->1 muxes.
--
Best Regards
Ulf Samuelsson
These are my own personal opinions, which may (or may not)
be shared by my employer Atmel Nordic AB
Ulf Samuelsson

The issue "big endian" vs "small endian" was resolved years ago, when it was finally realized that "the only good endian is a dead endian"

Ulf Samuelsson

Some of the instructions are barely understandable in assembly - but it's designed to be used with a compiler, not assembly.

And the EIEIO instruction is wonderful.

Yes, the PPC's bit numbering does not make it easy to understand. I've used a couple of the MPC5xxx devices, and the reference manuals are cryptic enough without having to deal with bit 31 being the LSB. It gets even more fun when some registers are 64 bit, and the LSB becomes bit 63. And if you have a part with more than 32 address lines, the LSB stays A31 for consistency - and the lines above A0 are numbered A-1, A-2, etc.
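The mapping between the two conventions is just a subtraction, as this small Python sketch shows. The function names are made up for illustration; the address-line naming follows the scheme described above, where the least significant line stays A31.

```python
def msb0_index(lsb0_bit, width=32):
    """Convert a conventional bit number (0 = LSB) to MSB-first numbering."""
    return (width - 1) - lsb0_bit

def address_line_name(lsb0_bit):
    """Name of an address line when the least significant line is always A31."""
    return f"A{31 - lsb0_bit}"
```

For example, `msb0_index(0)` gives 31 and `msb0_index(0, 64)` gives 63, while the 33rd address line (conventional bit 32) comes out as "A-1".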

David Brown

Indeed, but in VPA the same instructions are as easy to use and understand as in 68k assembly, so it is a matter of the mnemonics they have chosen, not the instruction set. For example:

00000000 8971 0004            movez.b- (4,a1),d3
00000004 3940 1234            move.l-  #$1234,d2
00000008 3D80 1234 618C 5678  move.l-  #($12345678).l,d4
00000010 61AD 0008            bset.l-  #3,d5
00000014 71C6 0008 61CE 0008  bset.l   #3,d6

these look pretty familiar and "normal".

Below is the list of the same with the native PPC mnemonics VPA generates, now these do not look that straightforward :-).

00000000 *
00000000 8971 0004            movez.b- (4,a1),d3
00000000: 89710004   lbz 4,r17,r11
00000004 3940 1234            move.l-  #$1234,d2
00000004: 39401234   addi $1234,r0,r10
00000008 3D80 1234 618C 5678  move.l-  #($12345678).l,d4
00000008: 3D801234   addis MSW_of_($12345678),r0,r12
0000000C: 618C5678   ori LSW_of_($12345678),r12,r12
00000010 61AD 0008            bset.l-  #3,d5
00000010: 61AD0008   ori 1!
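The opcode words in the listing can be reproduced with a few lines of Python, which also shows how a 32-bit constant gets split across two instructions. This is a sketch, not VPA's code generator: the helper names are made up, and the register number 12 is simply what the listing shows for d4.

```python
ADDIS, ORI = 15, 24    # PowerPC primary opcodes

def d_form(opcode, r1, r2, imm16):
    """Pack a D-form word: 6-bit opcode, two 5-bit register fields, 16-bit immediate."""
    return (opcode << 26) | (r1 << 21) | (r2 << 16) | (imm16 & 0xFFFF)

def load_imm32(reg, value):
    """Return the two instruction words that load a 32-bit constant into reg."""
    high, low = (value >> 16) & 0xFFFF, value & 0xFFFF
    return [
        d_form(ADDIS, reg, 0, high),   # addis: reg = high << 16 (base register field 0 means zero)
        d_form(ORI, reg, reg, low),    # ori: reg |= low; the immediate is zero-extended
    ]

words = load_imm32(12, 0x12345678)     # matches 3D801234 and 618C5678 in the listing
```

Because ori zero-extends its immediate, the upper half needs no adjustment for the sign of the lower half, which keeps the split a plain shift and mask.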
Didi

On Sat, 25 Sep 2010 00:28:10 +0200, Ulf Samuelsson wrote:

I prefer the term "cul-de-sacian"

--
Made with Opera's revolutionary e-mail program:
http://www.opera.com/mail/
(remove the obvious prefix to reply by mail)
Boudewijn Dijkstra

But then what do you call two of them: culs-de-sacian or cul-de-sacians? 8-)

"ded enjuns" is not only appropriate, but funny too.

George

George Neuner

On Mon, 27 Sep 2010 19:09:12 +0200, George Neuner wrote:

I hope not a lot of people will find genocide appropriate and/or funny.

Boudewijn Dijkstra

Grow a sense of humor.

And watch where you sling the "G" word ... your ancestors have as much to answer for as do anyone else's.

George

George Neuner

Sorry. I apologize for the last message ... I did not mean to imply anything in particular with respect to your family, which I have no knowledge of.

Rather I badly phrased my thought that, if you go back far enough, everyone has regrettable violence in their personal family history. It is a sad truth that people just don't have to go back as far.

George

George Neuner

On Tue, 28 Sep 2010 21:33:59 +0200, George Neuner wrote:

Read my previous posts and judge again. Hint: I seem to be funniest in April.

Anyway, I did not wish to spoil your fun, I was merely trying to emphasize the fact that we are making fun with dead Indians, and not making fun of their deaths.

The (alleged) actions of anyone's ancestors should not have any influence on their opinion of genocide.

Boudewijn Dijkstra
