Two FPUs compared

Colin's style of quotation is ugly and non-standard, messes up threading and branches, and is completely useless when there is more than one level of quotation. He has been asked many times, by many people, to stop using it - and has always disregarded them. So please do not attempt to copy it.

Reply to
David Brown

It looks like HTML quoting rendered into plain text. It could be the editor, and he doesn't know how to turn it off.

George

Reply to
George Neuner

He's using a text-based email client/newsreader (alpine, a descendant of the venerable pine), so it is unlikely that HTML is involved directly - although it could be a poor attempt at imitating HTML quotations.

Either he thinks it is "cool" to be different from everyone else, or someone else has configured his newsreader and he doesn't know how to fix it.

Reply to
David Brown

Yes, this is the primary purpose of FP. I just gave an example of how it can be practical to use an FPU for some other reason than the primary one: one can make use of the horsepower it provides in parallel with the integer unit, even though the calculations can be done quite well in integers (the first time I implemented that same conversion algorithm of mine was on a TI 5420 DSP, using integers).

I did not realize you were talking about integer divide; I thought you were referring to FP division, hence my being puzzled. FP (IEEE FP at least) maintains the mantissa as an absolute value and the sign is not part of it, so there is no remainder etc. :-).

Dimiter


Reply to
Dimiter_Popoff

Oh, everybody, please do not start such nonsense; I thought we were past that.

Dimiter

Reply to
Dimiter_Popoff

(snip, I wrote)

Yes, but you want to be sure that it gives the right answer.

If a problem is defined in fixed point, and you want to do it in floating point instead, you want to be sure it will give the right result. Floating point rounding can give a different result than integer divide would, after converting the result to integer.

The discussion on remainder has to do with the way integer divide works.

It should always be true that (a/b)*b+a%b==a (C notation). That is, divide and modulo (remainder) have to be consistent, though in the case of negative operands there are two possible choices.

But even with both positive, floating point can round in an unexpected way.

Assuming you avoid the problems of negative division, fixed point should always give the same result, but floating point might not.
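To make both points concrete, here is a minimal C sketch (the particular values and the single-precision detour are my own, chosen purely for illustration):

/* Minimal sketch, assuming C99: the divide/modulo identity, plus a case
 * where going through single-precision floating point disagrees with
 * integer division because the 32-bit operand loses bits on conversion. */
#include <stdio.h>

int main(void)
{
    /* C integer division truncates toward zero, and a == (a/b)*b + a%b
     * must hold even for negative operands. */
    int pairs[][2] = { {7, 3}, {-7, 3}, {7, -3} };
    for (int i = 0; i < 3; i++) {
        int a = pairs[i][0], b = pairs[i][1];
        printf("a=%3d b=%3d  a/b=%3d a%%b=%3d  identity holds: %d\n",
               a, b, a / b, a % b, (a / b) * b + a % b == a);
    }

    /* 1000000004 is not exactly representable in a 24-bit mantissa, so
     * the truncated float quotient differs from the integer quotient. */
    int a = 1000000004, b = 7;
    printf("integer divide: %d\n", a / b);
    printf("via float:      %d\n", (int)((float)a / (float)b));
    return 0;
}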

-- glen

Reply to
glen herrmannsfeldt

Well of course. But then arithmetic is not such a high science, you know. All you have to do is do the respective homework :-).

Hmmmm, not really. As long as there is no precision lost during normalization/denormalization, the result should be the same, rounded to nearest (or to whatever the rounding mode has been set, if settable). Integer divide would round to the smaller value if it returns quotient and remainder; it is up to the programmer to resolve the remainder and round to nearest (I usually do it this way unless there are other considerations).
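For what it's worth, a minimal C sketch of that "resolve the remainder" step for non-negative operands (the function name and the tie handling are my own choices):

/* Round-to-nearest integer division built from a truncating divide plus
 * the remainder; ties round up.  Written as r >= b - r to avoid any
 * overflow in computing 2*r. */
static unsigned div_round_nearest(unsigned a, unsigned b)
{
    unsigned q = a / b;
    unsigned r = a % b;
    return (r >= b - r) ? q + 1 : q;   /* bump the quotient when 2*r >= b */
}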

Yes, I have lost count of the divides I have written for various processors over the years; there is nothing special about it.

Ouch, no. Different, yes; unexpected, no - one has to know how the FPU behaves, otherwise it is just broken.

Hmmm, I can't think of a way the FPU would yield two different results for the same calculation without its controls being altered. Especially if the operation is doable using integer division, this would mean no mantissa bits will be lost during normalization/denormalization (assuming 32-bit integer divide; obviously one can write an integer divide for a word longer than the typical 53-bit FP mantissa).

Dimiter


Reply to
Dimiter_Popoff

(snip regarding integer divide)

(snip)

I haven't thought of the fine details recently, but I believe that if you have round to nearest floating point, it is not so easy to get the appropriate truncated integer quotient.

From the double rounding rule, I suspect that it can be done if the floating point quotient is more than twice the needed length. Of course, for efficient use of hardware you don't want it that long.

(snip)

Given a 53 bit quotient of two integers, can you find the correct 32 bit integer quotient?

The favorite for many years was the x87 temporary real. You couldn't (from a high-level language) be sure that your values were consistently kept at 53 bits or 64 bits, so you sometimes got different results.

As the compiler might keep values in registers between statements, even assigning to a double wasn't always enough.

Otherwise, an older favorite was the Cray machine with non-commutative multiply. A*B-B*A might not be zero.

-- glen

Reply to
glen herrmannsfeldt

On 18.02.2015 at 17:52, glen herrmannsfeldt wrote:

If memory serves Seymour Cray also gained some notoriety for building machines where A*1 might not be equal to A.

Reply to
Hans-Bernhard Bröker

Absolutely correct of course. Setting the FPU rounding mode "to zero" would solve this (if available; otherwise it would take some work). I got bitten not so long ago by a similar, simpler error I had made; instead of using "convert FP to integer and round to zero" (there is such a Power architecture opcode) I had used just "move FP to integer". The latter rounds to nearest, and I had to locate and fix it.... :-).
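In C terms the two conversions look roughly like this (a sketch; on Power the underlying opcodes would be fctiwz versus fctiw, if I have the mnemonics right):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = 2.75;
    /* A cast truncates toward zero; lrint() honours the current rounding
     * mode, which defaults to round-to-nearest. */
    printf("(long)x = %ld, lrint(x) = %ld\n", (long)x, lrint(x)); /* 2 vs 3 */
    return 0;
}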

Hmmmm, you make me scratch my head. I think yes, using the correct rounding modes etc., but I would not claim anything without thinking about it in "doing work" mode, which I can't at the moment (head busy doing other things).

Well this is a compiler issue, not an FP one, although related.

Hah! Now such a bug could drive one insane if one has to discover it :D .

Dimiter


Reply to
Dimiter_Popoff

While I'm not sure A*1 was ever not equal to A, there were certainly cases where A*B was not equal to B*A.

Reply to
Robert Wessel

LOL, head must have been busy with other nonsense indeed. Of course you can. On the Power FPU - if it is a 32-bit Power - you can get the correct integer quotient if it fits into 32 bits (signed, so 31 bits really). All one has to do is the divide, then convert to integer rounding towards zero (there is such an opcode); for the remainder, a multiply and a subtract will do it. On 64-bit Power I think the limit is 64 (63) bits, but I am not sure; I have yet to lay my hands on a 64-bit beast.
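A portable-C sketch of that sequence (my own naming; on Power the conversion step would be the round-towards-zero convert opcode rather than a C cast):

#include <stdio.h>

/* Divide 32-bit integers via the FPU, assuming 32-bit int and IEEE
 * doubles: both operands are exact in a 53-bit mantissa, the cast
 * truncates toward zero, and the remainder falls out of one multiply
 * and one subtract. */
static void div_via_fpu(int a, int b, int *q, int *r)
{
    *q = (int)((double)a / (double)b);
    *r = a - *q * b;
}

int main(void)
{
    int q, r;
    div_via_fpu(1000000004, 7, &q, &r);
    printf("q=%d r=%d (integer divide: q=%d r=%d)\n",
           q, r, 1000000004 / 7, 1000000004 % 7);
    return 0;
}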

My head would have to work a bit more about the limits/data loss re the remainder though, some other time :D .

Dimiter


Reply to
Dimiter_Popoff

The classical example is 10.0 * 0.1. It is due to 0.1 having an infinitely long binary fraction (0.000110011001100...).
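A quick demonstration with ordinary IEEE-754 doubles (not Cray arithmetic); the printed digits are what I would expect on a typical machine:

#include <stdio.h>

int main(void)
{
    double tenth = 0.1;              /* nearest double is slightly above 0.1 */
    double sum = 0.0;
    for (int i = 0; i < 10; i++)
        sum += tenth;

    printf("0.1           = %.17g\n", tenth);        /* 0.10000000000000001 */
    printf("10.0 * 0.1    = %.17g\n", 10.0 * tenth); /* happens to round to 1 */
    printf("ten additions = %.17g\n", sum);          /* 0.99999999999999989 */
    return 0;
}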

--

-TV
Reply to
Tauno Voipio

I remember that one version had PI wrong in their Fortran real*16 routines. A colleague found this when his calculation differed between Cray and IBM-370.

--
Reinhardt
Reply to
Reinhardt Behm

(snip, I wrote)

(and I also wrote)

The assumption of round to nearest from above was supposed to apply to this case.

Traditionally, you got what the hardware gave you.

IBM S/360 and successor hexadecimal floating point truncates on division (except on the 360/91 where it rounds).

Many other processors from before IEEE 754 round only.

For discussion of double rounding, see:

formatting link

-- glen

Reply to
glen herrmannsfeldt

That's a general issue with FP. And on most machines you're going to get the same (slightly off) result whether you do 10*.1 or .1*10.

We're talking about some early Cray machines, which had a tendency to curdle a few low bits of various FP operations, so that while 10*.1 and .1*10 would still not produce 1.0 exactly, they'd produce slightly *different* values.

Numerical analysis on the early Crays was a contact sport.

Reply to
Robert Wessel

Hmmmm, while I did not know what they call "double rounding" obviously I knew what it is, as well as the obvious consequences.

On the Power FPU there is the "round towards zero" mode, though, meaning the absolute value of the infinitely precise result is always rounded down; IOW it will not be rounded up after the operation (at least this is how I understand it, and it seems clear enough). So doing the divide such that the integer part of the quotient fits in the 53 bits - or let's say in 31 bits for simplicity - will always be precise (unrounded). Once one turns "round to nearest" on (which is how I keep it most if not all of the time), then obviously, prior to recording the result in the FP register, a 1 gets added to the bit just below the LSB and all of the problems you point to apply.

I am not that frightened by that sort of thing because I write in VPA, with the register model in mind, and if I begin to scratch my head about how practical it is to use FP for something, I will just not use it, or I will take the applicable rounding and other side effects into account. In the example I gave earlier (the netmca-3) I used the FPU as a DSP, doing plenty of MAC and other messy calculations, where the LSB of the result (14 bits are used, though more are available) is thinner than the noise of the incoming signal; I don't remember even having to care much about rounding. I have had my moments with it, obviously, but I don't remember when or doing what .... :-) .

Dimiter


Reply to
Dimiter_Popoff

Or maybe they just want to be sure that they have a *precise* specification in case they ever need to source perfectly-compatible parts, or perform simulations or analysis whose results aren't rendered meaningless by "minor" differences between implementations.

There are situations where you need different implementations to be able to perform identical calculations on identical inputs and obtain *identical* results, down to the least-significant bit. In such cases, whether the results are "correct" is often less relevant than whether they're consistent.

Key things to avoid are unsafe optimisations (-ffast-math etc., i.e. anything which allows the compiler to pretend that floats observe the same rules as reals), extended intermediate precision (use -ffloat-store and/or -fexcess-precision=standard), fused multiply-add (use -ffp-contract=off and/or -mno-fused-madd), and transcendental functions (these invariably have to be implemented in software or avoided altogether).
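At the source level, C99's FP_CONTRACT pragma expresses the fused-multiply-add part of this, though compiler support varies, so the command-line flags above are still needed in practice; a sketch:

#pragma STDC FP_CONTRACT OFF    /* forbid contracting a*b + c into one fma */

double dot2(double a, double b, double c, double d)
{
    /* With contraction off, each multiply and the add are rounded
     * separately, which is what bit-for-bit reproducibility requires. */
    return a * b + c * d;
}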

Reply to
Nobody
