Any ARMs with hardware divide? - Page 2

Re: Any ARMs with hardware divide?
Quoted text here. Click to load it

  Fully agree, but some Cortex cores are compatible and some are not.

  So, imagine a hypothetical one, done properly by ARM, so that it has
all the nice features of Thumb2, BUT also operates (choke free) on
any ARM opcodes that may arrive (reduced speed is fine,
especially if that saves silicon).

  [i.e. with the design weighted so that it is smaller than the -A and -R
variants, but not as broken as the -M variant.]

  Surely THAT would have to interest the ARM_uC vendors like Atmel,
Philips, ST, AnalogDevices, STm etc?
  Once one of them had it, the others would be forced to play catch-up?
  i.e. then the value of the Cortex_uCFIX core becomes higher, and the
older ARM7s become the significantly lower-value parts....

  We will see how this plays out over the next 18 months.

-jg




Re: Any ARMs with hardware divide?
Quoted text here. Click to load it

No one would put a Cortex-M3 in a phone or PDA. It is not the market it is
aimed at. They would most probably use a Cortex-A series core.

-p
--
 "What goes up must come down, ask any system administrator"
--------------------------------------------------------------------

Re: Any ARMs with hardware divide?


Quoted text here. Click to load it

Do you have a reference? The only benchmark source I can find
on Cortex-M3 is the comparison with other MCUs (1MByte):
http://www.arm.com/Multimedia/DevCon2004_presentation.pdf

The reason I find this statement surprising is that in fact Thumb-2 works
best on wider interfaces (>= 32-bits), as it uses 32-bit instructions. It is
faster than ARM when using the same flash interface since it fetches less
code.


Quoted text here. Click to load it

Yes, it looks promising, but without further details it is difficult
to figure out why those numbers look suspiciously good. It's obvious
that a prefetch buffer can hide the fetch latency in straight-line code.
However, typical code branches a lot and the latency of non-sequential
accesses can only be hidden by a cache. Maybe that is what it does...

Note that a wide interface will speed up not only ARM, but also Thumb-2.

Quoted text here. Click to load it

I said that *within* families CPUs are 100% compatible. The
paragraphs are consistent. To clarify with a detailed example:

Suppose we have 2 different Cortex-M cores: M3 and M4. These
are fully binary compatible in that you should be able to run an M3
binary on the M4 and vice versa [1][2]. If newer versions provide
the performance and features you want then you'll never want to move
to another Cortex family (ie. binary compatibility is a non-issue).

However say we also have an R5 core. You should be able to
run M3 and M4 binaries on the R5 [1]. However you will need
to do some more porting and recompilation to get the best out of the
new CPU [3]. The same is true today when you move from an ARM7
to an ARM11.

Alternatively you can also run R5 binaries that have been compiled
with downwards compatibility in mind (ie. no ARM code, no
R5-specific features etc) on the M3 and M4. Doing this requires
a bit of care of course, but no more than you need today for code
that is designed to run on many architectures (eg. C libraries).

So moving *within* a Cortex family is generally trivial - you'll get
full binary compatibility. Moving *between* Cortex families may
require some porting and care to get full binary compatibility.
In all cases a recompilation is highly desirable as the compiler can
then optimise for that particular CPU.

So... where do you want to migrate to today? (tm)


[1] Of course this level of compatibility only applies to the instruction
set - most MCUs have lots of peripherals which cause another level of
incompatibility. For example any code that runs on the AT91 series can't
run on the LPC2000 series (or vice versa). Even with identical interfaces
one chip may have 2 timers and another 8. So if you use a purist definition
of "binary compatible" no 2 chips are compatible.

[2] Of course while your M3 code runs fine on newer versions, the
pipeline may be a little different, and so your code doesn't run as
fast as it could (unless you recompile it - you may not care, but your
competitor might).

[3] You'll end up running with the caches disabled as the M3 doesn't
have a cache and thus has no code to enable it. So you're not getting
full use of the new features - and the difference between potential
and actual performance is likely much larger than [2].


Quoted text here. Click to load it

We already knew that the M3 does not run ARM code natively, however
it does run existing Thumb and Thumb-2 code, so it is binary compatible
with that. So you could port your OS to the M3 (which is something you
would have to do even if the M3 supported ARM), then relink your existing
Thumb objects/libraries. If you did have any ARM objects without source
you could disassemble them and reassemble for Thumb-2 without too much
effort. Not 100% compatible, but close enough.

Also, a key goal of the M3 is to aid migration of non-ARM 8/16-bit MCU users to
the ARM world. The ARM world is totally incompatible of course, but if the
gain is worth more than the cost, people will move. The M3 tries to lower
the entry barrier as much as possible by removing features that cause new
users trouble (like ARM/Thumb interworking, the OS model), and introducing
features that make things easier (Thumb-2, DIV, faster interrupts, more flash
for a given die size).


Quoted text here. Click to load it

I'd expect tools to automatically detect incompatibilities:

(a) when linking (automatically select compatible libs, error if incompatible)
(b) when simulating/debugging an image
(c) when burning an image into flash
(d) when running on hardware (trap when executing an incompatible instruction)

This is basic stuff. You could even emulate unsupported instructions if you
absolutely needed it.
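A minimal sketch of check (a), assuming each object file is tagged with the instruction-set features it uses. The feature names, the CORE_FEATURES table and the check_link/link helpers are hypothetical simplifications for illustration; real toolchains record this kind of information in object-file build attributes, which this sketch does not model.

```python
# Hypothetical, simplified feature sets per core (not an exhaustive description).
CORE_FEATURES = {
    "ARM7TDMI": {"arm", "thumb"},
    "Cortex-M3": {"thumb", "thumb2", "hwdiv"},
}


def check_link(core, objects):
    """Return (object name, missing features) for every object the core cannot run."""
    supported = CORE_FEATURES[core]
    problems = []
    for name, required in sorted(objects.items()):
        missing = sorted(set(required) - supported)
        if missing:
            problems.append((name, missing))
    return problems


def link(core, objects):
    """Refuse to 'link' if any object needs a feature the core lacks."""
    problems = check_link(core, objects)
    if problems:
        details = "; ".join(f"{name} needs {', '.join(missing)}" for name, missing in problems)
        raise ValueError(f"cannot link for {core}: {details}")
    return sorted(objects)  # stand-in for the real link step


objects = {"startup.o": {"arm"}, "app.o": {"thumb"}}
print(check_link("Cortex-M3", objects))
```

Here the ARM-state startup object is flagged for the Cortex-M3 while the Thumb application object passes; against the ARM7TDMI both pass.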

Quoted text here. Click to load it

Given that M3 outperforms the good old ARM7tdmi by such a large
margin on all aspects and Cortex has Thumb-2 written all over it, what do
you think may quietly get "de-emphasised"? :-)

Wilco







Re: Any ARMs with hardware divide?
Quoted text here. Click to load it

Yes, the ARM info is sparse and poorly detailed, but what they have
published shows Thumb2 to have LOWER performance than ARM, but better
code density. Thumb2 _does_ decrease the step effect between
ARM and Thumb, and adds smarter embedded opcodes.
They state it is a mix of 16 bit and 32 bit opcodes.
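As a sketch of what a mix of 16-bit and 32-bit opcodes means for a decoder: in Thumb-2 the width of each instruction is decided by its first halfword. If bits [15:11] are 0b11101, 0b11110 or 0b11111 it begins a 32-bit instruction; otherwise it is a 16-bit one. The function names are illustrative, and the sample encodings (MOVS r0, #1; SDIV r0, r0, r1; BX lr) were assembled by hand.

```python
def starts_32bit(halfword):
    """True if this halfword is the first half of a 32-bit Thumb-2 instruction."""
    return (halfword >> 11) in (0b11101, 0b11110, 0b11111)


def split_instructions(halfwords):
    """Group a stream of 16-bit halfwords into 16- and 32-bit Thumb-2 instructions."""
    instructions = []
    i = 0
    while i < len(halfwords):
        if starts_32bit(halfwords[i]):
            if i + 1 >= len(halfwords):
                raise ValueError("stream ends in the middle of a 32-bit instruction")
            instructions.append((halfwords[i], halfwords[i + 1]))
            i += 2
        else:
            instructions.append((halfwords[i],))
            i += 1
    return instructions


# MOVS r0, #1 (16-bit), SDIV r0, r0, r1 (32-bit), BX lr (16-bit)
stream = [0x2001, 0xFB90, 0xF0F1, 0x4770]
for insn in split_instructions(stream):
    print(" ".join(f"{h:04X}" for h in insn))
```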


Quoted text here. Click to load it

Yes, but the biggest effect is to reduce the usual hit that 32-bit opcode
fetch encounters. It is an issue of opcode bandwidth, and of matching that
to memory bandwidth.

Quoted text here. Click to load it

These verbal gymnastics aptly demonstrate my point that calling M3
something clearly different would have helped. When you have to
underline the difference between 'within' and 'between', then
perhaps a clearer name scheme would have been smarter.


Quoted text here. Click to load it

Binary compatible means what it does on the 80C51: NO opcode choking.
Very simple. SFR and peripheral compatibility are easier to manage.

Quoted text here. Click to load it

'Close enough' for who ?
ARM users will make that call, not ARM marketing.

Quoted text here. Click to load it

And this seems to be the crux of the problem. ARM seem to think they can
replace the 8051/8 bit sector with this new variant. Instead, they have
lost focus on what attracts users to ARM ( see Ulf's comments ).
Atmel, Philips et al _already_ have sub $3 offerings, so there is
substantial overlap into the 8/16 bit arena now. And this with an
ARM/Thumb offering.

  Mostly, the uC selection decisions I see hinge on peripherals and
FLASH/RAM, NOT the core itself. As Ulf says, they choose ARMs
_because_ they are binary [opcode] compatible.

  Philips seem to have a HW solution that simply and effectively
reduces the ARM/Thumb step effect. Thus any "new core" benchmarks that
exclude this solution lack credibility.

  It is better to talk about the better embedded opcodes/features in
Cortex, and the A and R variants _include_ ARM opcodes.

  After all, code size is steadily getting both larger and cheaper,
with FLASH ARMs now well clear of 8/16 bit models in FLASH resource.


Quoted text here. Click to load it

Key words here are 'expect' and 'could'. We are talking about existing,
proven tools in use right now, not horizonware.

Quoted text here. Click to load it

That's easy: the lack of binary [opcode] compatibility.

Will Ulf be pushing Atmel to release a -M3 microcontroller : I doubt it!


  I simply don't see the 'such a large margin on all aspects' in ARM's
published information at all?

  These graphs show Thumb-2 as being LARGER than Thumb, and SLOWER than
ARM ?!  [but also smaller than ARM, and faster than Thumb]
  Their example claim of a system Size saving of a (mere) 9% also
avoids any comments on Speed. Hmmmm...?

  To me, Thumb2 is a sensible middle ground between ARM and Thumb
(it fixes some of the older core's shortcomings), but the removal of ARM
binary compatibility on the M3, and the apparent pitch into a space users
are leaving void, is poorly researched.

  Time will show who is right :)

-jg



Re: Any ARMs with hardware divide?

Quoted text here. Click to load it


Those are ARM1156T2-S benchmarks - not M3. The first Thumb-2 compiler
indeed generates code that is almost 1% larger than Thumb (still 34% smaller
than ARM!). The difference in performance is less than 3% on the ARM1156
(the first Thumb-2 CPU). So it is pretty close to the marketing statement
"ARM performance at Thumb codesize". The next compiler release will
without a doubt improve upon this and close the gap if not bridge it.

Quoted text here. Click to load it

I'd say Cortex-M and Cortex-R are clearly different names. They are
Cortex because they all support the same base instruction set (Thumb-2).

Quoted text here. Click to load it

Yes, these are nice parts, but they don't compete with many of the cheap
MCUs like the 8051. The M3 can compete much better and maybe get down
to the $1 price range.

Quoted text here. Click to load it

Yes that is true.

Anyone moving to ARM from the 8051 or similar simply won't care
whether the M3 supports the ARM instruction set or not as long as it
doesn't make porting harder. The resulting code is of course binary
compatible with any other Cortex CPU as I explained.

Quoted text here. Click to load it

There are no benchmarks on narrow flash for the M3 AFAIK. When running
Thumb-2 on a wide flash it will run faster than ARM because of its smaller
codesize. If the performance penalty of running from flash is 15% for ARM,
it would be 10% for Thumb-2. If we use the current figure of Thumb-2 being
3% slower than ARM using perfect memory, it would be 2% faster on flash.
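The arithmetic in this paragraph can be reproduced as a rough model. It assumes (an assumption for illustration, not a measured result) that the flash penalty scales with the number of bytes fetched, so Thumb-2 code that is about 34% smaller than ARM code pays about 0.66 times 15%, i.e. roughly 10%. Combining the percentages multiplicatively gives a Thumb-2 advantage of about 1.6%; simply adding them (15 - 10 - 3) gives the 2% quoted.

```python
ARM_FLASH_PENALTY = 0.15      # figure used in the post
THUMB2_SIZE_RATIO = 1 - 0.34  # Thumb-2 code ~34% smaller than ARM (quoted above)
THUMB2_CORE_SLOWDOWN = 0.03   # Thumb-2 ~3% slower than ARM on perfect memory


def relative_time(core_slowdown, flash_penalty):
    """Run time relative to ARM code on zero-wait-state memory."""
    return (1 + core_slowdown) * (1 + flash_penalty)


# Assumption: the flash penalty is proportional to the number of bytes fetched.
thumb2_flash_penalty = ARM_FLASH_PENALTY * THUMB2_SIZE_RATIO

arm_time = relative_time(0.0, ARM_FLASH_PENALTY)
thumb2_time = relative_time(THUMB2_CORE_SLOWDOWN, thumb2_flash_penalty)
thumb2_speedup = arm_time / thumb2_time - 1

print(f"Thumb-2 flash penalty: {thumb2_flash_penalty:.1%}")
print(f"Thumb-2 speedup over ARM from flash: {thumb2_speedup:.1%}")
```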

Quoted text here. Click to load it
...

ARM's tools have had this feature for over 5 years now (since ADS): any potential
incompatibilities are immediately fed back to the user. It is not the incompatibilities
themselves that cause the trouble; the real issue is the hours wasted on trivial
mistakes that tools don't spot. Loading a big-endian image on a CPU configured for
little-endian is something I've done many times, but it never took me more than a
second to correct the mistake, as the debugger simply refused to run the image...
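A minimal sketch of the kind of check that makes such a mistake cost only a second, assuming the image is an ELF file. It relies on two ELF header fields: byte 5 of e_ident (EI_DATA) is 1 for little-endian and 2 for big-endian, and e_machine at offset 18 is 40 (EM_ARM) for ARM. The check_image and fake_header helpers are hypothetical and do not reproduce any particular debugger's behaviour.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELFDATA2LSB, ELFDATA2MSB = 1, 2  # values of e_ident[EI_DATA]
EM_ARM = 40                      # e_machine value for ARM


def check_image(header, target_little_endian):
    """Raise ValueError if an ELF header does not match an ARM target's endianness."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF image")
    data = header[5]
    if data not in (ELFDATA2LSB, ELFDATA2MSB):
        raise ValueError(f"unknown ELF data encoding {data}")
    image_little = data == ELFDATA2LSB
    # e_machine is stored in the file's own byte order.
    (machine,) = struct.unpack_from("<H" if image_little else ">H", header, 18)
    if machine != EM_ARM:
        raise ValueError(f"not an ARM image (e_machine={machine})")
    if image_little != target_little_endian:
        image = "little" if image_little else "big"
        target = "little" if target_little_endian else "big"
        raise ValueError(f"{image}-endian image, {target}-endian target: refusing to run")


def fake_header(little):
    """Build just enough of a 32-bit ARM ELF header for the check (illustration only)."""
    order = "<" if little else ">"
    ident = ELF_MAGIC + bytes([1, ELFDATA2LSB if little else ELFDATA2MSB, 1]) + bytes(9)
    return ident + struct.pack(order + "HH", 2, EM_ARM)  # e_type = ET_EXEC, e_machine


try:
    check_image(fake_header(little=False), target_little_endian=True)
except ValueError as err:
    print(err)
```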

Quoted text here. Click to load it

Or rather your perceived lack thereof. I don't understand how the lack of ARM
instruction set support can be crucial while differences in peripherals are
somehow excluded from binary compatibility issues... In the real world both
stop you from running the same binary on different cores.

Quoted text here. Click to load it

You're looking at the wrong information. On an ARM7tdmi with perfect
memory, Thumb gives about 0.74 MIPS/MHz, ARM does 0.9. The M3 gives
1.2 - about as fast as ARM code running on an ARM9. That's about 60%
performance improvement over the 7tdmi using Thumb (at Thumb codesize)
or 30% when using ARM (with a 35% codesize gain).
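The percentages follow directly from the quoted MIPS/MHz figures. A short check (the dictionary and the gain helper are illustrative names) gives 1.2/0.74, about 62% over Thumb on the ARM7TDMI, and 1.2/0.9, about 33% over ARM code, consistent with the rounded "about 60%" and "30%".

```python
# MIPS/MHz figures as quoted in the post (perfect memory).
MIPS_PER_MHZ = {
    "ARM7TDMI, Thumb": 0.74,
    "ARM7TDMI, ARM": 0.9,
    "Cortex-M3, Thumb-2": 1.2,
}


def gain(new, old):
    """Fractional performance gain of `new` over `old` at the same clock."""
    return MIPS_PER_MHZ[new] / MIPS_PER_MHZ[old] - 1


for old in ("ARM7TDMI, Thumb", "ARM7TDMI, ARM"):
    print(f"Cortex-M3 vs {old}: {gain('Cortex-M3, Thumb-2', old):+.0%}")
```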

Then there is the power consumption and die size which are less than half that
of the ARM7tdmi, the much better interrupt latency and multiply/division
performance, unaligned access, simplified OS model etc.

Quoted text here. Click to load it

You mean the gatecount here? The saving over ARM7tdmi with the same set
of peripherals is about 37K gates (70K - 33K). Assuming a gate is equivalent
to 16 bits of flash (probably too conservative), that is an extra 74KBytes of
flash for free. You'd need 820KBytes of flash before this becomes a mere 9%
saving, and that is definitely not a low-end MCU. You could build an M3 with
1K SRAM and 16KBytes of flash and _still_ be smaller than a bare ARM7tdmi!
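The gate-count arithmetic, spelled out under the post's own assumption that one gate occupies roughly the area of 16 bits of flash (decimal K units, as in the text; the variable names are illustrative). It yields 37,000 gates saved, about 74 KB of flash, and a break-even point of about 822 KB for a 9% saving.

```python
ARM7TDMI_GATES = 70_000
CORTEX_M3_GATES = 33_000
FLASH_BITS_PER_GATE = 16  # the post's (conservative) area equivalence
SIZE_SAVING = 0.09        # the "mere 9%" system saving under discussion

saved_gates = ARM7TDMI_GATES - CORTEX_M3_GATES
free_flash_bytes = saved_gates * FLASH_BITS_PER_GATE // 8
# Flash size at which the freed area is only 9% of the total.
break_even_flash_bytes = free_flash_bytes / SIZE_SAVING

print(f"gates saved: {saved_gates}")
print(f"equivalent flash: {free_flash_bytes / 1000:.0f} KB")
print(f"9% point: {break_even_flash_bytes / 1000:.0f} KB")
```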

Quoted text here. Click to load it

Thumb-2 is not "middle" ground - it combines the best features of ARM with
the best features of Thumb, effectively superseding both. Why do you think
Cortex is based around Thumb-2?

Quoted text here. Click to load it

Sure - I bet there are many people working hard to try to prove you wrong :-)

Wilco





Re: Any ARMs with hardware divide?
Quoted text here. Click to load it

?! - but the M3 is Thumb-2, and you have just confirmed "not quite ARM
performance yet..."

<snip>
Quoted text here. Click to load it

Not the users I talk with.
Binary compatible is near the top of their lists, _especially_
80C51 users.

With Cortex-M, as Ulf says, they may as well also look at the raft of
other 'new core' alternatives, like CyanTech, MAXQ and the many new
Flash DSPs.....
  Gamble: Choose which ones will not hit critical mass, and survive only
one generation.

 > The resulting code is of course binary
Quoted text here. Click to load it

Well, we'll agree to differ on our definition of Binary compatible.

Could one write code that ran fine on a Cortex-R, but choked a Cortex-M ?

I call that NOT binary [opcode] compatible.

Other users are free to apply their own definitions.


<snip>
Quoted text here. Click to load it

To help you with that distinction, I stated binary [opcode] compatibility.

80C51 designers are fully versed in peripheral porting, but they
also expect [even demand?] to have one stable/proven/mature tool chain.

Quoted text here. Click to load it

I was looking at ARM's own web data on Thumb-2.

If that is wrong, then we'll wait for it to be corrected.

Your own numbers above agree that Cortex is struggling to match
ARM performance on Speed -[ real soon now... just need another compiler
pass...]

Quoted text here. Click to load it

No, code size. They somehow 'missed' mention of the speed numbers?


Quoted text here. Click to load it
<snip>

  The more important comparison is the -M versus the -R and -A gate counts;
then you are comparing the same design generation.

  Better still, give us the incremental cost of adding size-optimised
ARM compatible execution to M3 [ie: can be a little slower, NO-choke is
the design brief] ?


  Summary: Thumb-2 has performance merits, but the -M variant
risks 'falling between two stools' - instead of building on their
strengths, they seem to be trying to be all things to all users.
That's a pity, as the talent and resource could be better applied.

  Probably time to end this thread, and wait 18 months for the users to
vote.. :)

-jg


Re: Any ARMs with hardware divide?


Quoted text here. Click to load it

Yes you can. A program can be a Thumb-only application, a library or
a DLL or similar. The OS interface doesn't change for new hardware -
a basic principle of OS design!

With the ARM1156T2-S however it is possible to create a Thumb-only
system, including the OS.

Quoted text here. Click to load it

Yes until Thumb-2 you generally need some ARM code in your OS
(but not necessarily in your application).

Quoted text here. Click to load it

Operating systems need to be ported to new CPUs. What's new?

Quoted text here. Click to load it

Perhaps Cortex is so new that none of the CPUs have been released yet?

Quoted text here. Click to load it

No, existing tools will continue to work fine for Cortex. For Cortex-M
you'll need Thumb-2 capable tools of course, but these either exist today
or are close to being released by your favorite compiler vendor.
Cortex-A and -R can continue to use any existing tools of course.

Wilco



Re: Any ARMs with hardware divide?
Quoted text here. Click to load it

And these will be made available to existing ARM tool customers free of
charge?
One reason why people selected ARM was that they did not want to continue
to spend money on switching tools.


Quoted text here. Click to load it


--
Best Regards,
Ulf Samuelsson
Re: Any ARMs with hardware divide?

Quoted text here. Click to load it

Ah, but if you look at ARM's annual reports you'll see that in recent years
they've made as much money from tools sales as from chip licencing. Making
chips you need a new compiler for fits neatly into that business model.

--
Kevin Bracey, Principal Software Engineer
Tematic Ltd                                   Tel: +44 (0) 1223 503464
Re: Any ARMs with hardware divide?

Quoted text here. Click to load it

If you use GNU yes - it's still free last time I heard. However if you want
support for the latest and greatest CPUs and you want it *now* then you'll
have to pay for it. In what way is this different from the introduction of
Thumb-1, Thumb-2, DSP, VFP, Media or any other architectural extension?

Quoted text here. Click to load it

There is a big difference between upgrading and switching tools. Upgrading
is typically an order of magnitude cheaper (cost and effort wise) than switching
to a new toolkit. The ARM tools business is healthy with over 20 compilers
available, so you can choose whatever suits you best. Note that saving money
on the cost of a toolkit is false economy in most cases - even a small
improvement in the per-unit-cost/feature set of a product or programmer
productivity will pay for it.

Wilco



Re: Any ARMs with hardware divide?
On Wed, 4 May 2005 23:36:32 +0200, "Ulf Samuelsson"

Quoted text here. Click to load it

Doesn't Thumb-2 just add to Thumb what is needed to do these
things?

Quoted text here. Click to load it

I can't speak for others, but I am pretty sure, I'll have our RTOS
ported to a new core within a week or less (and this is pure
assembly).


Quoted text here. Click to load it

AFAIK Thumb-2 is just an extension of Thumb, or am I wrong? So the new
opcodes could easily be hand-coded to begin with.
--
42Bastian
Do not email to snipped-for-privacy@yahoo.com, it's a spam-only account :-)
Re: Any ARMs with hardware divide?
mnoone.uiuc.edu@127.0.0.1 says...
Quoted text here. Click to load it

Not sure if you ever got an actual answer to your question.
The Philips/NXP LPC3180 is based on the ARM926EJ-S, and has a vector
floating point co-processor (not in the core, but at least it's on the
same chip...)

"This CPU coprocessor provides full support for single-precision and
double-precision add, subtract, multiply, divide, and multiply-
accumulate operations at CPU clock speeds. It is compliant with the IEEE
754 standard, and enables advanced Motor control and DSP applications.
The VFP has three separate pipelines for floating-point MAC operations,
divide or square root operations, and load/store operations. These
pipelines can operate in parallel and can complete execution out of
order. All single-precision instructions, except divide and square root,
take one cycle and double-precision multiply and multiply-accumulate
instructions take two cycles. The VFP also provides format
conversions between floating-point and integer word formats."

--Gene

Re: Any ARMs with hardware divide?
Quoted text here. Click to load it


The answer is that ARMv7-M and ARMv7-R architecture processors have hardware
divide. The Cortex M3 implements ARMv7-M and the Cortex R4 implements
ARMv7-R.
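For contrast, cores without these instructions (the ARM7TDMI, for example) call a run-time helper for each division, such as `__aeabi_uidiv` in the ARM EABI run-time. Below is a minimal sketch of the textbook shift-and-subtract algorithm that such helpers are typically variations of; real library routines are hand-optimised assembly. The loop produces one quotient bit per iteration, which is why a single-instruction hardware divide is attractive.

```python
def udiv32(dividend, divisor):
    """Unsigned 32-bit division by shift-and-subtract.

    Returns (quotient, remainder), producing one quotient bit per loop iteration.
    """
    if not (0 <= dividend < 2**32 and 0 < divisor < 2**32):
        raise ValueError("need unsigned 32-bit operands and a non-zero divisor")
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Bring down the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        # Subtract the divisor whenever it fits, and record a 1 in the quotient.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder


print(udiv32(100, 7))
```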

-p
--
"Unix is user friendly, it's just picky about who its friends are."
 - Anonymous
