New ARM Cortex Microcontroller Product Family from STMicroelectronics

The data is embedded inside the instruction and gets loaded into R0 using immediate addressing mode. I bet there is no Harvard architecture that disallows constants embedded in instructions, so if you accept that, then you also accept that there is a path from the instruction bus to the internal registers without having to connect the data bus.

What you "like" is really irrelevant to the discussion of whether moving data from the instruction bus to the ALU deviates from original Harvard or not.

If the microarchitecture connects the instruction memory to the ALU using an internal mux in the CPU core, it is original Harvard. If it connects the data bus to the instruction memory to fetch data, it isn't.

--
Best Regards,
Ulf Samuelsson
This is intended to be my personal opinion which may,
or may not be shared by my employer Atmel Nordic AB
Reply to
Ulf Samuelsson


That still doesn't give you data. If I have 1 KByte of data, can you encode it in 1 KB of code? If the answer is no, then they are not equivalent.

Sure, this path always exists for immediates. But this path can't read any random byte from the instruction memory; it can only be used for immediates. You need special instructions like MOVC to read actual data from a random address. The key "feature" of the original Harvards was that they could not treat instructions as data, unlike von Neumann.

I don't see why you'd want to separate them based on how the buses are implemented; this is a microarchitecture detail. There are various options: a mux inside the core, two separate buses with a mux to unify them, or an independent bus used only for code->data transfers. Each of these options behaves the same from a programmer's perspective, so there is no obvious way to differentiate them.

Wilco

Reply to
wilco.dijkstra

Copying data between different memories always involves some kind of temporary buffer, usually a register. Whether there is an instruction that can do it in one go is irrelevant, the implementation would sequence it into a load and a store anyway.

Put simply, the question is: "can I read data from a random address in code memory?". If the answer is yes, there is a connection between the code and data memories, even if the path goes via the accumulator. If so, you can implement general pointers in C (albeit not very efficiently). If not, you cannot do general C pointers at all and you need to transform all constant data into executable code (very inefficient).
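
The code-pointer vs. data-pointer distinction can be sketched in a few lines of Python. This is a toy model, not any real ISA: movc() and the space-tagged "general pointer" are hypothetical names standing in for the 8051-style mechanism described above.

```python
# Toy Harvard machine, not any real ISA. Code and data are separate
# memories, so the same numeric address exists in both. movc() models
# the accumulator path from code space; generic_read() shows how a C
# compiler could build "general" pointers by tagging the memory space.

CODE = bytes([0x74, 0x2A, 0x00])   # imaginary program image holding a constant
DATA = bytearray(16)               # separate data memory, same addresses

def movc(addr):
    """Read one byte from code space (the MOVC-style path)."""
    return CODE[addr]

def generic_read(tagged_ptr):
    """A 'general' pointer: (space tag, address)."""
    space, addr = tagged_ptr
    return movc(addr) if space == "code" else DATA[addr]

DATA[1] = 0x55
assert generic_read(("code", 1)) == 0x2A   # constant read out of code memory
assert generic_read(("data", 1)) == 0x55   # same address, data memory
```

Without the movc() path, the "code" branch of generic_read() is simply impossible, which is exactly the "cannot do general C pointers at all" case.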

Wilco

Reply to
Wilco Dijkstra

I seem to recall you saying that "pure Harvard" is the case where this is not allowed (code space cannot be read as data, at all) and also saying that such beasts aren't found, today. Doesn't this suggest to you that you are defending a distinction without a difference?

When I think of a processor as being "Harvard," several meaningful modern distinctions come to mind. One is more a matter of software, where I am mentally aware that (1) I probably cannot run code from RAM, should I want to do so; for example, because I'd like to program the code flash and there is only one memory controller for the code flash (no separate blocks, for example), and (2) the depth and breadth of the data access methods coded in the instruction set should not be construed to indicate anything about the code access methods. Two is more a matter of hardware, where I'm mentally aware that there may be (often are) separate data buses and address lines; and if neither of these, then control lines indicating separate spaces, to be considered together with any intended software; for example, I may _want_ to subvert the architecture and overlay the memories. And if there is no external bus, I may still simply have to take that into account in considering data structure design.

A distinction matters, in other words. And I don't find much value in your conflation of architectures which have a single memory viewpoint, where code and data share a common address space and all the memory addressing mechanisms can be used for reading either code or data, with architectures which have several memories with material differences in their access from an instruction point of view. It seems to me that you think there is little meaningful difference, so you just lump them together. I find that kind of conflation to be without useful discerning value.

Jon

Reply to
Jonathan Kirwan

If I can add a few cents on this: the term "Harvard" architecture did originally refer to two completely different buses (with overlapping address spaces), but like the term RISC, has been overloaded and adapted over the years. With big registers (32 bits), the need to reuse addresses just is not there (unlike 12 bits (of old) or the 16 bits of DSPs and 8-bit processors).

The true Harvard devices have to add special move instructions which refocus the address in the register, since 0x100 (say) will point to two different locations: one instruction and one data. In some DSPs, this also means different sized buses. The advantage was that b 0x100 clearly meant the instruction bus and load R0,[0x100] clearly meant data. The bad thing was when you needed to read data from the code bus (constants and literals) or run from RAM (such as for external boot).

For Cortex-M3, the term Harvard is still in reference to two buses (that can operate in parallel), but for convenience, the address mapping is such that they do not overlap. So, addresses 0 to 1/2GB are mapped to one bus, and addresses 1/2GB on up to one or more other buses (system and RAM/peripheral). The processor can take advantage of this by pre-fetching instructions while also reading/writing memory through the LSU. If you run code from RAM or you read/write memory from the code space, an arbiter simply prioritizes contending operations (favoring data over pre-fetch, since it is just pre-fetch). Running code from RAM does not run at half speed since LSU operations usually make up less than 1/5th of the instruction stream. Further, Thumb-2 has many 16-bit instructions, so one pre-fetched word can serve two instructions.
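
The non-overlapping split described here can be illustrated with a trivial address decoder. A toy sketch (the 0.5 GB boundary matches the Cortex-M3 memory map; the function name is made up for illustration):

```python
# Toy decoder for the non-overlapping split described above: the lower
# 0.5 GB of the 4 GB map goes to the code bus, everything above it to
# the system side. The arbiter logic and bus names are simplifications.

HALF_GB = 0x2000_0000

def bus_for(addr):
    """Which bus serves this address in the simplified model."""
    return "code" if addr < HALF_GB else "system"

assert bus_for(0x0000_0000) == "code"     # flash / code region
assert bus_for(0x2000_0000) == "system"   # SRAM, peripherals, etc.
```

Because the two regions never alias, address 0x100 means one thing only, which is exactly what spares the programmer the "refocusing" move instructions of true Harvard machines.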

The value is seen over ARM7 and the like simply because code and data are not normally competing for the bus, so LSU operations are fast. Further, the LSU can pipeline back-to-back loads and stores (as well as load-multiple and store-multiple). Finally, a store buffer hides the wait time on the store to complete (unless another load or store comes along too soon).

The other big advantage is for interrupt processing. On an interrupt, the pre-fetcher is loading the instructions of the ISR while the LSU is handling stacking (saving return_PC, xPSR, LR, R0-R3, and R12). Because they are running in parallel, the user's first instruction is executed that much faster. And note, this is a normal C function, since the 5 scratch registers have been saved and the LR properly set up. On return from exception, if not tail-chaining, it can use the same trick of loading code while popping the stack. This gets back to the interrupted code that much faster.

The original term was "Harvard style" architecture. I think it is fair to say that it avoids the nastiness of true Harvard, since you can address any location (0 to 2^32-1) for any purpose from any instruction (other than range limitations).

Hope this clarifies. Paul

Reply to
PaulInCa

I don't see how that is true at all. When you add a waitstate you freeze all stages of the pipeline while you wait for the Flash to finish the access.

I don't have the option of increasing the width of the Flash. Besides, your statement is just plain wrong. If you add waitstates, the Flash width is not relevant. If you increase the Flash width, you can use fewer wait states, but that is entirely different from what you are saying. What did you mean to say?

Sure, if you want to turn off the flash entirely you can get the power down, or if you want to stop the clocks you can get the power down. My point is that under normal operating conditions, the part has a hefty static power consumption so that running at half speed does not get you near half current. Still, it is a lot better than the Luminary parts, but not so near the Atmel ARM7 parts.

I don't see that figure anywhere. On what page did you read this? Regardless, if they are doing things to reduce power consumption, like running from RAM, then the comparison is still not apples to apples. It just depends on what you want to compare.

Ok, has anyone else on the planet published similar results? I have not even heard anyone else make this claim, much less be able to support it.

I'm not even considering that. I am reporting what I have read as measured results. But then I don't know of any published benchmarks. I guess that is what is required.

How do you make the memory wider? I would love to be able to do that with a lot of the MCU chips I have used in the past. Can you use this Flash stretcher tool on Atmel parts as well? Then they could run at 55 MHz with no wait states!!!
Reply to
rickman

I don't know exactly how the Cortex works, but I worked on the internals of another 32-bit RISC core. This core had a 16-byte FIFO in the first pipeline stage. The prefetch mechanism loaded 32 bits into this FIFO on each access. The memory controller could add waitstates to this access if necessary.

The first pipeline stage did a simple decoding of the top halfword of the FIFO to determine the length of the instruction (1-3 halfwords) and if the FIFO had enough valid content, the full instruction was made available to the second decoding stage, otherwise a "not valid" signal was asserted.

The second stage would either execute the instruction, reading 1-3 halfwords from the FIFO, or if the "not valid" was asserted, the second stage would execute a NOP instruction.

Since most instructions are 16 bits and you read 32 bits at a time, zero-waitstate operation allows fetching almost two instructions per cycle. The FIFO will quite soon be filled, and if the odd 32/48-bit instruction pops up, it won't hurt your performance.

If you have one waitstate, you will see that the bandwidth is still high enough that 1MIPS/MHz can be maintained as long as you only execute 16 bit instructions. You will be hurt by fetching a 32 bit instruction since that takes 2 clocks.
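
A rough way to check the 1 MIPS/MHz claim is to model the FIFO as a producer/consumer: the bus delivers 32 bits per access (plus waitstates) and the core retires one instruction per cycle when enough bits are buffered. A toy Python model, with all parameters illustrative:

```python
# Toy throughput model of the prefetch FIFO described above. The bus
# delivers one 32-bit word per access (each access taking 1 + waitstates
# cycles); the core consumes one instruction per cycle when the FIFO
# holds enough bits. FIFO depth and cycle count are arbitrary.

def sustained_ipc(waitstates, instr_bits, cycles=1000):
    """Instructions retired per cycle over a straight-line run."""
    fifo_bits, done, timer = 0, 0, 0
    for _ in range(cycles):
        timer += 1
        if timer > waitstates:            # one 32-bit fetch completes
            fifo_bits = min(fifo_bits + 32, 128)
            timer = 0
        if fifo_bits >= instr_bits:       # issue one instruction
            fifo_bits -= instr_bits
            done += 1
    return done / cycles

assert sustained_ipc(0, 16) > 0.99    # zero waitstates, 16-bit instructions
assert sustained_ipc(1, 16) > 0.99    # one waitstate still sustains ~1 MIPS/MHz
assert sustained_ipc(1, 32) < 0.6     # a stream of 32-bit instructions: ~2 clocks each
```

The model confirms the post's point: with 16-bit instructions, a 32-bit fetch every two cycles keeps pace with the core, and only the wider instructions expose the waitstate.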

I have run the SAM7 at 48 MHz, zero waitstate. It does not work over the full temp range, though. The AVR32 will support 1.2 MIPS/MHz @ 1-waitstate operation @ 66 MHz due to its 33 MHz two-way interleaved flash memory. (The first access after a jump is two clocks; subsequent accesses are 1 clock.)

Reply to
Ulf Samuelsson


Thanks for the rundown on this alternative CPU. Sounds a bit like the National 32 bit CPU with variable length instructions. That was supposed to be a fast CPU, but not a commercial success. If there had been a longer term commitment, it may have grown in popularity. But the realities of the commercial CPU market allowed it to pass on to the CPU boneyard.

...snip...

No, the "odd" 48 bit instruction won't hurt performance, but the FIFO already has had a negative influence anytime the instruction sequence is not linear. It is, in terms of the negative effect, like adding pipeline stages. The entire FIFO has to be flushed anytime you branch.

Even executing 16 bit instructions takes a 1 clock cycle hit on a branch. Instead of having the next instruction in the FIFO, you have to wait 2 clock cycles before you can start decoding it.

How does that compare to the Cortex M3 running at 50 MHz with no waitstates and no branch penalty?

Reply to
rickman

It was not a Series 32000 CPU. The Series 32000 had (IIRC) instruction sizes which varied between 2 and 21 bytes.

I.E. movzbd x(y(sp))[r0:d], z(w(sb))[r1:d]

with all displacements being 30 bits.

The FIFO is implemented using flip-flops, and you had a simple three-stage pipeline (fetch, decode, execute), so your latency was not dramatic.

Yes, but jumps are probably only 10-20% of all instructions, so you lose only between 10-20% of the performance instead of 50%. The AVR32 loses less than 10% on average.

The UC3000 is claimed at 80 MIPS at 66 MHz. For the Cortex M3 to reach 80 MIPS at 50 MHz, you have to have 80/50 = 1.6 MIPS per MHz. I think that ARM does not claim that the Cortex is close to 1.6 MIPS per MHz.

The AVR32 is decidedly better on DSP algorithms due to its single cycle MAC and also it has faster access to SRAM. Reading internal SRAM is a one clock cycle operation on the AVR32. Bit banging will be one of the strengths of the UC3000.

Reply to
Ulf Samuelsson

That is not the point. By prefetching the instructions, you are setting up for a bigger dump and subsequent loss of instruction memory bandwidth when you branch. FIFOs or instruction prefetching are not a perfect solution. It is much better to just have single cycle memory.

But you are comparing apples and oranges. A processor that has no wait states doesn't have to deal with this no matter what the instruction mix is. It is just much simpler to not have to consider memory latencies.

Oh, this is marketing stuff. I thought you might have run some real benchmarks or someone else at Atmel might have. Certainly they have looked hard at the Cortex. But if it competes too well against the AVR32, I can see why it would not be pushed at Atmel. Certainly there will be a lot of sockets that will be won by an ARM device over a sole source part like the AVR32. At this point I don't think anyone can say whether the AVR32 has legs and will be around in 5 years. It has been out for what, a year or so?

Isn't reading internal SRAM a single cycle on *all* processors? I can't think of any that require wait states. In fact, most processors try to cram as much SRAM onto the chip as possible because it is so fast. Did you say what you meant to say?

Reply to
rickman

Actually it is not, because if you try to decode your instruction in the same stage as the fetch, your clock frequency will go down significantly. The prefetching will work with single-cycle memory and with memory having waitstates.

Prefetching, decoding and execution will each take one clock. If you execute at 66 MHz with a three-stage pipeline, then you probably will execute around ~40 MHz with a two-stage pipeline (just a guess).

If you execute blocks of 5 instructions including one jump, each block will use 7 cycles (3 + 1 + 1 + 1 + 1) @ 66 MHz in a three-stage pipeline, for ~10 blocks/us.

In a two-stage pipeline, you could use 2 clocks for a jump, so you execute (2 + 1 + 1 + 1 + 1) = 6 cycles per block @ 40 MHz, which is ~6.7 blocks/us: clearly slower.
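
The block arithmetic above can be written out directly; the frequencies are the poster's estimates, not measurements:

```python
# Block timing from the post, spelled out. A "block" is five instructions
# ending in one jump; the jump costs the pipeline-refill cycles, the rest
# one clock each. The 66 MHz and 40 MHz figures are guesses from the post.

def blocks_per_us(freq_mhz, jump_cycles, straight_line_instrs=4):
    """Blocks executed per microsecond for the given jump penalty."""
    cycles_per_block = jump_cycles + straight_line_instrs  # e.g. 3+1+1+1+1 = 7
    return freq_mhz / cycles_per_block

three_stage = blocks_per_us(66, 3)   # 66/7, roughly 9.4 blocks/us
two_stage = blocks_per_us(40, 2)     # 40/6, roughly 6.7 blocks/us
assert three_stage > two_stage
```

So even with the longer branch penalty, the higher clock of the deeper pipeline wins under these assumptions.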

A processor running from flash without waitstates will be limited in performance by the memory. A processor which reads multiple instructions per access, with waitstates, will be able to execute faster due to its higher bandwidth to memory.

They have run benchmarks on the AVR32, but I think people are relying on official figures for the Cortex.

And hopefully ARM device from Atmel :-)

Fortunately there are plenty of sockets around, and some will go AVR32.

On the UC3000 family, loading from internal SRAM will take one clock in the execution stage. Using single cycle SRAM does not mean that the load instruction is 1 clock.

Reply to
Ulf Samuelsson

Of course, yes it "is much better to just have single cycle memory" - but in the real world, chip designers have to settle on what they can get, and right now, FLASH access speeds are a real bottleneck on uC performance.

Width of FLASH access (or interleave) can have MORE impact on final speed than any subtlety in the core itself.

You can say (almost) the same for the Cortex-M3? It too is quite new, and I've not seen any multi-sourced (pin/peripheral compatible) offerings. Will it hit 'critical mass'? From a porting viewpoint, an Atmel ARM7 user could find it less of a jump to go to AVR32 (or the coming Atmel Flash ARM9s) than Cortex-M3, as the Atmel peripherals are very similar.

The AVR32 I see as having a long life, it seems to have low cost tool flows, and good debug support. (Don't underestimate the importance of good debug support.)

The actual uC cores matter less and less: package and peripherals have determined our shortlists in the latest projects, and the ST Cortex even made it onto the list on that basis, until we found their serious oops: that CAN and USB were mutually exclusive ?!?

Then, there is the new Coldfire V1 core from Freescale. Choices, Choices....

-jg

Reply to
Jim Granville

What are you talking about??? How is slow memory faster than fast memory???

Since when do I get to design my own processor??? Everything you have just written is based on your own assumptions. This is a pointless discussion since everything you say is based on *your* assumptions! In addition, you only consider the parts of the issue that you choose to include. You did a timing analysis on paper that does not include the effect of branches. Clearly not accurate regardless of your assumptions!

Again you are assuming facts that are not in evidence. Where do you get the higher bandwidth from memory if it is running with wait states? Oh, right, you are *assuming* that there is something different in the design that will make that one faster. Something that is not part of a slower Flash that requires wait states.

"People" being "you"?

There are a number of sockets that Atmel won't win if they don't have a CM3 device. There are two companies with the new core in production and a third on their heels. I am sure sales of the ARM7 devices won't drop off a cliff. But this business is all about design wins and I stand by my earlier post in another thread that the CM3 will start to steal significant numbers of design wins by the end of this year and by the end of next year they will overshadow the ARM7 design wins in the off the shelf MCU market.

Is that the plan for the AVR32, to take *some* sockets? You know as well as I do that if the AVR32 does not get significant market penetration within two years from now, it will be put on the back burner and eventually discontinued. Atmel has no reason to keep making a part that consumes significant resources and does not make significant profit. Look at what happened to Atmel programmable logic. When was the last time they added a new FPGA to the product line? How many FPSLICs have been designed into new sockets?

Like I said, aren't all internal SRAMs in all processors single cycle???

Reply to
rickman

If you have a memory capable of running at 50 MHz and you put that in a CPU capable of running at 25 MHz, then you will run slower.

In a two-stage pipeline, you do "fetch-decode" and "execute". If memory access, decoding and execution each take 20 ns, then it will take 20 + 20 = 40 ns to handle the "fetch-decode" stage, so the CPU can run at 25 MHz.

In a three stage pipeline, you do "fetch", "decode", "execute". If all three stages take 20 ns, then you will be able to run at 50 MHz.
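
In numbers, the cycle time is set by the slowest stage, so merging fetch and decode into one stage doubles that stage's delay and halves the clock. A sketch of the arithmetic, using the 20 ns figure from the post:

```python
# The frequency argument in numbers: the achievable clock rate is set by
# the critical (slowest) pipeline stage. The 20 ns per-operation delay
# is the post's assumption, not a real device's timing.

def max_freq_mhz(stage_delays_ns):
    """Clock rate allowed by the slowest pipeline stage."""
    return 1000 / max(stage_delays_ns)

two_stage = max_freq_mhz([20 + 20, 20])   # fetch+decode merged into a 40 ns stage
three_stage = max_freq_mhz([20, 20, 20])  # fetch, decode, execute split evenly
assert two_stage == 25
assert three_stage == 50
```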

Statistics are likely to show that branches are normally not frequent enough for you to gain speed by having a shorter pipeline.

By making it wider.

No, Atmel marketing.

And maybe the ARM9 designs overshadow the ARM7 and CM3 as well. I see most high-volume designs nowadays require 200 MHz+ operation. The large customers (1M+) requiring low power seem to focus on 1.8V SAM7s or AVR32s. This is of course only 5% of the total MCU market normally, so things could be different in your region.

A company selecting a binary compatible family, will still be better off with ARM than with Cortex, due to larger performance span.

Maybe so, but from a performance point of view, you are more interested in how many cycles it takes to load from SRAM into a register: whether it takes 1 clock cycle due to a 1-clock load instruction, or 3 clock cycles due to a 3-clock load instruction (from a 1-clock-cycle SRAM). There you do see a performance difference.
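
The load-latency point can be made concrete with a back-of-the-envelope CPI calculation; the 20% load mix is an illustrative assumption, not a measured figure:

```python
# Back-of-the-envelope CPI: what matters is the latency of the load
# *instruction*, not the SRAM array alone. The 20% load fraction is an
# illustrative assumption about the instruction mix.

def cpi(load_fraction, load_cycles, other_cycles=1):
    """Average cycles per instruction for a two-class instruction mix."""
    return load_fraction * load_cycles + (1 - load_fraction) * other_cycles

assert abs(cpi(0.2, 1) - 1.0) < 1e-9   # 1-clock loads: no penalty
assert abs(cpi(0.2, 3) - 1.4) < 1e-9   # 3-clock loads: 40% more cycles overall
```

So two cores attached to identical single-cycle SRAM can still differ by tens of percent in throughput purely on load-instruction latency.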

Reply to
Ulf Samuelsson

Branch frequency is highly dependent on the application domain and coding style. However, it has been reported that in control-type applications branch instructions can be 20% to 30% of the total.

Stephen

--
Stephen Pelc, stephenXXX@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads
Reply to
Stephen Pelc


This conversation has become pointless. It started discussing the loss of performance in processors that use slow Flash memory, and you have turned it into a discussion of processor design. You are way off topic and your comments are irrelevant to the original point. The bottom line is that if all other things are equal, a processor with faster Flash memory will run faster. The Stellaris CM3 running at 50 MHz with no wait states from Flash will be faster for most apps than a processor running at 70 MHz with one or two wait states, like the STM parts we were discussing. It may also be faster in many apps than a processor running at 70 MHz using a wide Flash bus interface to overcome the required wait states, because the lookahead fetch is often wasted when the instruction flow changes.

You can dance around that, but those are the facts.

Funny, you are bringing in both statistics *and* probability. That is the type of language I hear all the time in commercials where they want you to think they have just told you a fact when in fact they have said pretty close to nothing.

Ahhh, *marketing*! That makes it very clear now. We can all have complete trust in benchmark figures from *marketing*!

Yes, the swan song of the truly desperate. If anyone connected to the ARM7 feels threatened by the CM3, they simply bring in the ARM9, which is a totally unsuited processor for most of the apps that the ARM7 and CM3 target. The ARM9 will never fit the sockets that the ARM7 and CM3 fill. However, the CM3 fills most of those sockets much better than the ARM7, and that is my point.

If they can shoe horn it onto their board! An ARM9 may be the right choice for a router, but not for a controller. The CM3 is targeted to the lower end bumping up against the 8 bit devices and eating into their market segment. The ARM9 will never compete in that area. It is too large of a chip and will always be uncompetitive at the low end.

I see you ignored this comment. There are any number of "good ideas" that have totally failed in the marketplace. It is very possible that the AVR32 will be one of them.

What processor uses only 3-clock instructions to access 1-clock memory? My understanding is that many processors not only use faster instructions to load, but can use memory operands in other instructions, which allows single-cycle back-to-back memory accesses.

Besides, no one feature ever makes or breaks a processor chip. There are literally dozens of distinguishing points between different processors and only marketing and salesmen try to narrow an engineer's focus to a small number of features. I care about the overall utility of a processor and one of the big selling points to me is the ubiquitousness of the ARM chips. Very soon that will include the CM3 devices which will take over the low end squeezing the ARM7 between the CM3 and the ARM9.

Reply to
rickman

Couple of teensy weeny problems with that sweeping statement: for something to hope to "fill most of those sockets", it needs to be pin and code compatible. Alas, the M3 is neither.

I note that NXP has licensed the Cortex A8, but simply not bothered with the M3. [Likely their 128 bit fetch ARM7, makes the M3 too small a change]

Many designers will think the same. I don't see many taking an ARM7 out of a released product, just for the fun of dropping in a M3.

So, the M3 competes for new designs, and Ulf is right that the leading edge will want a bigger new-design jump than ARM7->M3, so that leaves the M3 chasing a narrow aperture of design wins. There, it competes against all the other 32-bit offerings, and it competes on peripherals, package and power as much as on the core.

We looked at the new ST M3's : Great I thought, a Small MCU, with USB and CAN (notice the actual core is not even on this selection list! )

-Oops, it seems ST have designed a part that is USB _or_ CAN. Even a good 8-bit core would run USB & CAN, so we don't actually care about a 25% performance window.

Atmel are adding new CPLDs (but their FPGAs are in stable design mode). They have the new CAP series, with ARM7 and ARM9. The new family looks well placed to pick up 'cost-down design passes' on products that started commercial life in FPGAs but, as volume (and competition) ramps, need more efficient silicon.

I'm guessing you actually meant to say AVR32 here ;)

I see AVR32 and FpSLIC as very different animals.

FpSLIC: - a "good idea" ? Hmm... It was clear (to me, at least) even from release, the FpSLIC had problems, which was that it LOOKED to be very flexible to someone in marketing, but to a designer was actually very constraining:

You had to KNOW your code was NEVER going to go above the (16K?) chip limit, and you had to have an application too big for a CPLD and small enough to use the FPGA portion (but never exceed it). Then you notice that an application small enough to fit in 16K, but that ALSO needs a small-to-moderate FPGA, is becoming a tiny segment indeed.

AVR32: This is a much simpler design choice. The high-end uC design choice is based mainly on the 4 P's: Peripherals, Power, Package & Price. Success is helped a lot by low-cost tools, and good on-chip debug will be important, as will a strong eco-system.

Atmel's road map on this is looking pretty good. [So do Freescale's, and Infineon's, and none of these use M3...]

-jg

Reply to
Jim Granville

Nope it isn't; the AVR32 running at 66 MHz will run mostly at zero waitstates due to its interleaved flash controller design. Each flash access done by the memory controller will have 1 waitstate, but since the memory controller can do two accesses in parallel, the CPU will only see waitstates during jumps, and no waitstates during non-jump instructions. If you do jumps 20% of the time, then the average number of waitstates is 0.2. On top of that you will be able to perform data accesses to the flash while eating from the instruction queue without any performance penalty.
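
The average-waitstate figure is just an expected value: only the first fetch after a jump pays the single waitstate. A sketch of the arithmetic, using the figures from the post (not measurements):

```python
# Expected-value sketch of the interleaved-flash arithmetic above.
# Only the first fetch after a jump sees the single waitstate; sequential
# fetches are hidden by the two-way interleave. The jump fraction and
# one-waitstate penalty are the post's figures, not measured data.

def effective_mips_per_mhz(jump_fraction, ws_per_jump=1):
    """Throughput assuming 1 cycle/instruction plus waitstates on jumps."""
    return 1 / (1 + jump_fraction * ws_per_jump)

assert abs(effective_mips_per_mhz(0.2) - 1 / 1.2) < 1e-9   # ~0.83 MIPS/MHz
assert effective_mips_per_mhz(0.0) == 1.0                  # no jumps, no stalls
```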

The ARM9 will fit almost any socket where the user requires an external bus.

You'd be surprised how often ARM9 fits the bill.

The simple three stage pipeline processors (and the CM3) normally use a few clocks in the execution stage to load data, but the uC3 family does not.

Reply to
Ulf Samuelsson

No, when I say "fill the sockets" I am not talking about new chips being used in old designs; I am talking about the new chips being used in new designs that would otherwise make use of the other MCUs. So when new designs are started, a designer who considers the CM3 will see that it is a better choice for most designs where he would otherwise use an ARM7. Likewise, designs that would otherwise use an ARM9 will mostly continue to use the ARM9. I see the ARM7/CM3 as fitting different sockets than the ARM9, with little overlap.

So please try to read my words carefully. I know you can figure out what I mean since we have discussed this before and I am saying the same things I have said before. I guess I should reconsider my purpose in continuing to discuss this with you since you don't seem to pick up on what I am saying and the meaning seems to get twisted a lot.

I agree, it would be silly to pull back a released product just to change the MCU when it is working just fine.

I don't know what you mean by "leading-edge". New designs cover a wide range of requirements for the MCU from tiny 8 bit devices that give the lowest cost to huge 32 bit processors that nearly keep up with x86 CPUs. The application range of the ARM7/CM3 has little overlap with the ARM9. The most significant separator is cost. Most ARM9s do not include program storage requiring external Flash. The one ARM9 family that includes Flash runs much slower than the other ARM9s and is only a slight speed (or any other) improvement over the ARM7 or CM3. The CM3 has several advantages over both the ARM7 and ARM9 which you seem to want to dismiss while focusing on how the ARM9 is a very different processor with more advanced capabilities targeting a different market. Using an ARM9 in many applications is like using a mortar to hunt rabbits. There may be more features in the ARM9s than the CM3, but if you don't need them, why pay for them?

Why do you continue to try to compare the ARM9 to the CM3? They address different markets and there is very little overlap.

Now you are going off into left field. My point was to compare the single source AVR32 to other single source products such as the FPSLIC which has failed in the market and will leave someone high and dry when it is discontinued. You bring an ASIC into the discussion as if it were somehow relevant. What was your point???

Yes, my slip...

Yes, they are different, but they have a significant common point: they are both single source with very stiff competition. It will be very easy for the AVR32 to slowly die just like the FPSLIC, the Transcend processors and many other products that just could not compete in the market. It is especially interesting that Atmel continues to introduce new ARM processors alongside the AVR32. I seem to recall Intel doing that with various processors like the 860, 960 and others, all of which died off and left users high and dry. I believe the 860 was a popular product in the military camp and was designed into a number of systems with 10 to 20 year lifespans. Then 3 years in, the family was discontinued, so customers didn't even have similar chips to upgrade to. I can see the AVR32 going this same route.

This sounds like a specious argument. *EVERY* CPU has limitations which you have to accept when you use it. At the time the FPSLIC was introduced, some 10 years or more ago, 16 kB was a generous amount of RAM for an 8-bit MCU. This memory is RAM, not Flash; the Flash was stored off-chip in the FPSLIC. Regardless, it does not matter what flaws the product had; the point is that this type of product was sole-sourced, which had a lot to do with the product's failure. It is not just a matter of pin compatibility: there was no one else making devices remotely like the FPSLIC. That was actually the reason I did not use it in a design it was perfectly suited to. Likewise, switching from an AVR32 to another processor will require a lot more work than just switching between ARMs.

That rolls off the tongue well, but there are significant differences between CPUs. You seem to point that out in spades when you compare the ARM7 to its sibling CM3, but completely dismiss it when you compare the AVR32 to all the other 32-bit processors. Staying within a family saves a lot of work. The ARM family has a great deal of commonality between all of its members, with a wide target range, while the AVR32 has a limited target range and requires switching families to go outside it. The bottom line is that the ARM chips have legs that other, proprietary products don't. Even ignoring the technical issues, the ARM has momentum which will capture a lot of design wins in close races.

I seem to recall that the ARMs are a big part of Atmel's road map. That is my point, the CM3 is a better ARM than the ARM7 is. Everything the ARM7 does, the CM3 does better. The designs they target are not a good match to the ARM9 because of higher power consumption, larger physical size or higher cost. The CM3 out competes the ARM7 in every area except for the number of implementations which I am saying will be changing over the next few years. Finally, I don't see the AVR32 having any real advantages over the ARM processors unless there is an app which just happens to fit the AVR32 details better than any of the ARMs. The number of apps for which this is true will be very small indeed.

So with more makers announcing new CM3 chips, I see the crossover point (more design wins of off the shelf MCUs) for the CM3 vs the ARM7 to be within the next year and may be by the end of this year.

Reply to
rickman


That is pointless. It does not matter how large the FIFO is, if you are pulling data out at a given rate and you can only put data in at that same rate, as soon as you have to stop instruction reads to do a data read, you will not be filling the FIFO as fast as it is being emptied and performance will suffer. Run through a simulation and see if that is not true. Based on the info you provided, this is the result.
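
The starvation argument here can be checked with a toy simulation: one shared bus, one access per cycle, and a core that wants one instruction per cycle. Every bus cycle spent on a data read is fetch bandwidth the FIFO never recovers (all numbers illustrative):

```python
# Toy model of the argument above: the bus can do one access per cycle,
# instruction fetch and data reads share it, and the core retires one
# instruction per cycle when the FIFO is non-empty. The data-read rate
# is an arbitrary illustrative parameter.

def instructions_retired(cycles, data_read_every):
    """Instructions retired when every Nth bus cycle goes to a data read."""
    fifo, retired = 0, 0
    for t in range(cycles):
        if data_read_every and t % data_read_every == 0:
            pass                    # bus busy with a data read, no fetch
        else:
            fifo += 1               # fetch one instruction's worth
        if fifo > 0:
            fifo -= 1
            retired += 1
    return retired

assert instructions_retired(1000, 0) == 1000   # no data traffic: full rate
assert instructions_retired(1000, 5) < 1000    # 20% data reads: FIFO starves
```

Since fill rate equals drain rate at best, no FIFO depth can buy the lost cycles back, which is the point being made.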

So you are agreeing with me that the ARM9 is not a good match for most ARM7 or CM3 designs? The ARM9 may "fit" the design, but it will not be as good a fit if the ARM7 or CM3 can do the job. If nothing else, the cost and power consumption will be higher with the ARM9. In most cases the package size will be larger for the ARM9. Why use a shotgun when a slingshot will do the job?

No, I think I have a pretty good handle on the differences between Atmel's ARM9 processors and the CM3 product line. They are similar CPUs with very different interfaces to the outside world for two very different target ranges. Anyone who thinks there is much overlap is kidding themselves.

Ok, I have to assume that you don't have any examples. Regardless, this seems like a red herring in this discussion anyway.

I stand by my analysis of the competitiveness of the CM3.

Reply to
rickman
