Arm Cortex-M7 - still single precision FPU

Looks like ARM has not moved to double-precision yet with M7. ST announced M7 parts that look really impressive except for this issue, and claims to have preview parts available. Wonder why still single precision?? Hmmm...

Dave Nadler

Actually Wikipedia claims that the M7 will have both SP and DP floating point:

formatting link

--
I flex my muscles and I am in the void.
dalai lamah

I suspect that the size of the FPU goes up as bits^2 or bits^3 or something obnoxious like that.

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com
Tim Wescott

Yeah, something like that: four x*x multiplies to do a 2x*2x multiply, plus a barrel shifter with twice the bits and twice the reach.
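
A rough sketch of that decomposition, using plain Python integers rather than real FPU significands. The function name and the `half_bits` parameter are made up for illustration. It only shows how four narrow products combine into one wide product; it says nothing about how any actual Cortex-M FPU is built. (A double significand is 53 bits against 24 for single, so the real ratio is a bit more than 2x.)

```python
def mul_from_halves(a, b, half_bits):
    """Multiply two unsigned (2*half_bits)-bit integers with four narrow multiplies."""
    if a < 0 or b < 0 or a >> (2 * half_bits) or b >> (2 * half_bits):
        raise ValueError("operands must fit in 2*half_bits unsigned bits")
    mask = (1 << half_bits) - 1
    a_lo, a_hi = a & mask, a >> half_bits
    b_lo, b_hi = b & mask, b >> half_bits
    # Four half_bits x half_bits products.
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi
    # Shift each partial product into place and add them up.
    return ll + ((lh + hl) << half_bits) + (hh << (2 * half_bits))
```

The same four products could be computed one after another on a single narrow multiplier, which trades hardware for clock cycles.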

-Lasse

langwadt

Depends on how it is implemented. They can do the work by adding more hardware or they can do the work by taking more time. It is not uncommon to do double precision floating point by using multiple clock cycles in single precision hardware.

--

Rick
rickman

ST data sheet says single precision...

Dave Nadler

Page 12 here:

formatting link

Dave Nadler

However, ARM says it has double precision here:

formatting link

Dave Nadler

The ARM info is marketing... notice all that they *don't* say. I believe even on the M7 floating point is optional. I'd believe the data sheet.

Anyone looked at the R series? I searched about a bit and didn't find anything with an FPU. TI and Spansion are the only ones I found making them.

--

Rick
rickman

A full Wallace tree (combinatorial) multiplier is bits^2, but that is rare.

A fully pipelined multiplier is also bits^2, but can produce a new product every clock cycle once the pipeline is full. (Nice for vector processors.) A less than fully pipelined one produces a product every N cycles with O(bits^2/N) logic.
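
To put rough numbers on that rule of thumb: IEEE 754 single precision has a 24-bit significand and double has 53 bits (both counting the implicit leading bit). The sketch below just evaluates bits^2/N for those widths. `multiplier_cost` is a made-up helper, and the result is a relative figure under this simple model, not a gate count for any real FPU.

```python
SINGLE_SIGNIFICAND_BITS = 24  # IEEE 754 binary32, including the implicit bit
DOUBLE_SIGNIFICAND_BITS = 53  # IEEE 754 binary64, including the implicit bit

def multiplier_cost(bits, cycles_per_product=1):
    """Relative multiplier logic under the bits^2 / N rule of thumb."""
    if bits < 1 or cycles_per_product < 1:
        raise ValueError("bits and cycles_per_product must be >= 1")
    return bits * bits / cycles_per_product

# Full-rate double multiplier compared with full-rate single multiplier.
ratio = multiplier_cost(DOUBLE_SIGNIFICAND_BITS) / multiplier_cost(SINGLE_SIGNIFICAND_BITS)
```

Under this model `ratio` comes out a little under 5, and a double-precision multiplier that is allowed 5 cycles per product needs slightly less logic than a single-precision one that delivers a product every cycle.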

Usual dividers are O(bits) space and O(bits) time.

Newton-Raphson dividers use the pipelined multiplier, and produce a quotient in a small multiple of the number of cycles needed to run the multiplier.
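
A minimal sketch of the Newton-Raphson idea in ordinary Python floats, to show why the cost is a few multiplier passes. The function names, the iteration count, and the linear first guess (48/17 - 32/17 * m, a common textbook choice for m in [0.5, 1)) are assumptions for illustration and not taken from any particular FPU design.

```python
import math

def nr_reciprocal(d, iterations=4):
    """Approximate 1/d using only multiplies and subtracts after a first guess."""
    if d == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    m, e = math.frexp(abs(d))              # abs(d) == m * 2**e with 0.5 <= m < 1
    x = 48.0 / 17.0 - (32.0 / 17.0) * m    # linear first guess for 1/m
    for _ in range(iterations):
        x = x * (2.0 - m * x)              # Newton step: two multiplies, one subtract
    r = math.ldexp(x, -e)                  # undo the scaling: 1/d = (1/m) * 2**-e
    return -r if d < 0 else r

def nr_divide(n, d, iterations=4):
    """Approximate n/d as n times the Newton-Raphson reciprocal of d."""
    return n * nr_reciprocal(d, iterations)
```

Each step roughly squares the relative error, so a handful of iterations gets close to double precision. A hardware divider would also need a final correction step to get correctly rounded results, which this sketch leaves out.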

For specific examples, the IBM 360/91 and Cray-1 are favorites in books on pipelined processors.

formatting link

-- glen

glen herrmannsfeldt

I think all TI R5 have an FPU, at least these do:

formatting link
formatting link

If you really need a fast DP FPU, the Renesas RZ may be an option. The onboard RAM sounded really appealing, but unfortunately production was not in time for our product, so we switched to Xilinx Zynq: dual-core A9 with a fast DP FPU and an FPGA along with it. It requires external DDR3, however, so we opted to use a MicroZed module, at least for the first series.

--
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail) 

Sooner or later you must pay for your sins. 
(Those who have already paid may disregard this cookie).
Stef

On Thu, 25 Sep 2014 03:39:36 +0200, rickman wrote:

About half of the available TMS570 devices have an FPU.

--
(Remove the obvious prefix to reply privately.) 
Made with Opera's e-mail program: http://www.opera.com/mail/
Boudewijn Dijkstra

I believe that, like a few things on the Cortex-M family, it is optional and up to the actual manufacturer to decide whether to include single or double (and maybe even none at all as per one of the Cortex-M4 options).

...of course, FreeRTOS already supports the M7 ;o)

Regards, Richard.

  • formatting link
    Designed for microcontrollers. More than 107000 downloads in 2013.

  • formatting link
    IoT, Trace, Certification, FAT FS, TCP/IP, Training, and more...

FreeRTOS info

And you win the jackpot! TI doesn't make it easy to find what these parts do without downloading the data sheets. Either many of their R4/R5 parts don't have FPU or they are keeping it a secret. They have changed their web site over the last year or so and it seems to be much more marketing and less info. I did not see one selection guide that included this info. I get the impression they are making significant changes in this product line and much of the info is out of date.

Also, this series which is called "Hercules" under the "Safety" line seems to be all about the dual CPUs which I think are intended to run duplicate code as a redundant backup. They talk about running them in "lockstep" which means you can use logic to tell if they differ which would indicate a failure.

So far I haven't seen any speed requirements stated, just a request for double precision.

--

Rick
rickman

Only two CPUs?

If they disagree, how does the logic tell which one gives the correct result and which doesn't?

You would need at least three CPUs for any voting system.

Do these even have separate power supply pins for each CPU, so that you could run them from two separate power supplies?

upsidedown

Error detection, not correction.
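
In toy form, a two-way lockstep comparison looks something like the sketch below. The names `run_lockstep`, `step_a` and `step_b` are invented for illustration, and the cores are modelled as plain Python functions. Real parts compare outputs in hardware every cycle, and I am not describing how the Hercules comparator actually works.

```python
def run_lockstep(step_a, step_b, inputs):
    """Feed the same inputs to two 'cores' and stop at the first disagreement."""
    results = []
    for cycle, x in enumerate(inputs):
        out_a = step_a(x)
        out_b = step_b(x)
        if out_a != out_b:
            # A mismatch proves something is wrong, but not which core is wrong.
            return ("fault", cycle)
        results.append(out_a)
    return ("ok", results)
```

With only two copies, the comparison can say "stop, something failed" but cannot pick a winner.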

Protection from power failure is up to the rest of the design. It's for safety where you want things to not do damage I think. Not sure that it is required to continue working. But I'm not sure, just interpreting what I see.

--

Rick
rickman

I should have qualified my statement somehow, to indicate "for the same speed" or whatnot.

I read someplace that most of the area of an IEEE-compliant hardware FPU (and most of the lines of code for a similar software FP library) is involved in error trapping and exception handling. I dunno if the complexity of that goes as bits^1, bits^2, or bits^gawdaful.

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com
Tim Wescott

This kind of safety most often comes down to keeping a system in a state where nobody dies. If the lock-step processor fails, this generally signals a lower-level circuit to bring the system to a non-operational, but still safe, state.

--
(Remove the obvious prefix to reply privately.) 
Made with Opera's e-mail program: http://www.opera.com/mail/
Boudewijn Dijkstra

True error correction is a lot harder than just "majority voting" of three cpus.

Correct.

I haven't read the details of this chip, but there can also be other detection mechanisms, such as ECC on buses, that are used to spot which cpu is having trouble. If the hardware can identify which cpu has failed, then the other one can continue in a "limp" mode.

Indeed. There are many systems where it is sufficient to stop everything if there is a critical failure. There are also many that can have a simple safe mode (such as a car - if a failure is detected, you keep the brakes and steering going but bring the engine to a controlled stop). A chip like this can detect chip failures and then perhaps run in a limp mode, such as at lower speed or with cache disabled (cache errors are a substantial part of single-event upsets).

I heard somewhere a little about the physical layout of chips like the Hercules (though I may well be mixing this up with similar lock-step chips from Freescale's MPC range) - they do things like lay out the two cpus at 90 degrees and upside down, so that electrical interference will affect the two cpus differently. Sometimes the second cpu layout is done entirely separately by a different group from the first layout.

If you need full operation after critical failure, you need a more complex system. I believe that in the aircraft industry, they use majority voting from three processor boards - but each board has a different type of processor, running software from different development teams, so that systematic errors in one design will not affect the others.
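
The voting step itself is the easy part, which is rather the point. A bare-bones sketch, with `majority_vote` as an invented name and the board outputs assumed to be directly comparable values:

```python
from collections import Counter

def majority_vote(results):
    """Return the value reported by more than half of the channels."""
    if not results:
        raise ValueError("no results to vote on")
    value, count = Counter(results).most_common(1)[0]
    if count * 2 > len(results):
        return value
    # No strict majority: the voter cannot decide which channel to trust.
    raise RuntimeError("no majority among channels")
```

This ignores everything that makes real systems hard: keeping the channels synchronised, comparing results that are close but not bit-identical, and making sure the voter is not itself a single point of failure.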

David Brown

Follow-up: ST clarifies that their M7 is indeed SINGLE precision FP. ST claims (conversation this AM with local ST specialist):

- ARM couldn't deliver IP for double

- competitors Freescale and Atmel will also only deliver single in short-term

- expect double FP ST parts in ~one year

Best Regards, Dave

Dave Nadler
