# Damned floating point

• posted

I assume the answer is "yes", but is it normal behavior in an x86 processor to get slightly different answers when you add up the same short vector of doubles at different times?

(Very slightly different -- basically so that sum1 == sum2 isn't true, but not noticeable in any other way.)

I assume this has something to do with how the floating-point processor state is treated when a calculation is done. Does anyone know the details?

TIA.

```
--
Tim Wescott
Wescott Design Services
```
• posted

Floating point addition is not commutative. Never has been, never will be. IOW, (a+b)+c is *not* the same as a+(b+c).

In short, it's a problem with the rounding of intermediate results. Consider a three-digit decimal FP system in which you add 100, -100, and .01. If you compute (100 + -100) + .01, you'll get .01. If you compute (100 + .01) + -100, you get 0, since (100 + .01) rounds back to 100. It's actually fairly easy to get into a situation (like the one I just illustrated) where the result is *entirely* noise. People doing long sums (matrix multiplication, for example) often take considerable pains to minimize the accumulated error, including carefully controlling the order in which things are added.

In general, there are few cases where two FP numbers can be meaningfully compared for equality. Usually the correct approach is to do something like "fabs(a-b) < epsilon", although picking epsilon can be a bit tricky.

"What Every Computer Scientist Should Know About Floating-Point Arithmetic":

• posted

Can't believe I wrote commutative... Argh. FP addition does not follow the usual rules of associativity. OTOH, there have been machines where elementary FP operations were not commutative (Cray-1 FP multiplication, for example, where a*b often didn't equal b*a).

• posted

In this case, though, I'm computing score = a + b + c. Strictly in that order. Twice, with other floating point calculations in between. And getting different answers.

```
--
Tim Wescott
Wescott Design Services
```
• posted

That's not at all unusual. There are extra guard bits in the FPU, and if an intermediate value at any time gets put in a CPU register or memory, that can affect the rounding. I'll 2nd the idea of reading the docs.oracle.com link. (And never* test floating point values for equality.) (*For limited definitions of never.)

• posted

Make sure you are not mixing floats and doubles. If you are casting ints to floats, be careful with your cast ordering. Are you messing with a, b, or c between the two calculations? There was a test we used to run on Prime Computers that went something like: float f1=1.0 for (i=1;i

• posted

In addition to Robert Wessel's excellent response:

If the operations are performed in the same order every time, then the results *could* be the same. However ...

The FPU registers carry extra bits internally which are lost (zeroed) when the values are stored to memory and then reloaded. This happens, e.g., when the compiler runs out of registers and needs to spill temporaries.

It also can happen during a context switch if you multitask, although this is less predictable because registers can be spilled lazily and (from your POV) non-deterministically depending on the states of the other processes using the FPU.

In general, if you want to guarantee consistent results, you need to control when and where intermediate results are stored to memory. In the most extreme cases that could mean storing the result of every individual operation.

George

• posted

No, not if you add up the same numbers in the same order with the same code. Can you post some sample code? If you're using gcc and the summations are in different parts of the code, did you try -ffloat-store? That can get rid of inconsistencies that can result from doing some computations in the 80-bit FP registers and others in 64-bit machine words. But it can slow down the code.

Can you refactor the summation into a non-inlined function (call the same function from both places) and see what happens? Something like

```
noinline double sum(int n, double vec[]) { ... }
```

The noinline should make sure it's the same code called from both places.

• posted

Until now I had only ever done it to test if a value had been copied (i.e., a = b). It seemed safe...

```
--
Tim Wescott
Wescott Design Services
```
• posted

I appreciate what you're trying to say, but it's too much work -- I found a way to do the comparison the "right" way (fabs(a - b) < tolerance) and still meet my other goals, and that's what I'm running with.

```
--
Tim Wescott
Wescott Design Services
```
• posted

Are a & b & c identical both times? Maybe it's rounding 1/2 up half the time and 1/2 down the other half...

Hul


• posted

In general, if you want to do this, it means you're doing something wrong.

And it is not easy to do. It's going to involve never doing more than one operation per statement and declaring all sorts of stuff volatile and/or putting in memory barriers between the statements. It'll get ugly fast, and you'll never really know it's going to continue to work if the compiler or surrounding code changes.

To address the context-switch problem you might have to disable interrupts or lock the scheduler -- unless you know for a fact that your OS preserves guard bits during FPU context switches.

```
--
Grant Edwards               grant.b.edwards        Yow! Am I SHOPLIFTING?
at
```
• posted

If you're doing this on the x87 FPU, Mark is right, but his explanation is a bit off. Normally the x87 does everything with 80-bit temp reals, which can lead to odd results if the compiler decides in some cases to store a value back to memory (in which case it'll use the defined format, 64 bits for a double, and you'll lose some precision that might have been there the first time the value was used). So if you have:

```
a = ...;  b = ...;  c = ...;
score  = a + b + c;
/* ...other code... */
score2 = a + b + c;
```

The first sum might use values of a, b, and/or c that are still in registers, in the 80-bit format, from the calculations that produced them. But by the time you get to the second sum, those values may be gone, and the code will reload the saved values of a/b/c from memory, where they will have been stored rounded to the 64-bit format.

There are ways to set the precision of the FPU (so that instead of 64 bits of precision you get 53 or 24), but compilers often don't do that (MSVC gives you _control87() so you can do it yourself). Your compiler might also have a "strict IEEE math" option, which forces stores and (re-)loads to avoid the issue (IOW, everything is forced to the 64-bit format); that's "-fp:strict" in MSVC.

If your compiler has an option to do FP with SSE2, that will also eliminate the variable precision problem (and that should be true of all compilers generating x86-64 code).

Also note that the two sums do *not* have to be performed in the same order, and that's actually at least somewhat common in practice if one of the values is still in a register. For example, if c happened to be used in "...other code...", that register might still be live, and the second sum might just add a and b (from memory) to the c still in the register, whereas the first sum might have added the three values in the "normal" order.

• posted

Hi Tim, you already got all these "excellent answers", but I am not sure we know how you are testing this. If you are testing at a high level, we don't know whether this is an FPU rounding issue or a compiler shortcoming. Basically, you want to go down to the lowest level to see what's up (obviously I know you don't need to be told that). I don't know the x86, but it must also have an FPSCR (floating point status and control register) which gives you control over rounding modes etc.

Generally, in an isolated environment (i.e. no task switches which might fail to save/restore all of the FPU context, or might be unable to if the FPU design won't allow it -- say, the guard bits; I have not seen that, but I would not be too surprised to) you have to get the same result, all other things being the same. The key being "all other"...

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI
------------------------------------------------------

• posted

The x87 floating point stack is small, so you may run out of stack registers with complex arithmetic expressions. If part of the expression is calculated in a floating point function, you can easily run out of stack registers. If the function is defined locally, the compiler will know whether the whole calculation can be done in the FP stack. However, if the function is defined externally, the compiler doesn't know its stack usage, so it may need to save FP stack registers across the call.

Floating point processors also have various rounding/truncation modes set by some flags. If these bits are altered between calculations, the result might be different.

The MMX multimedia instructions use the same registers as the x87 FP stack, so the float values need to be saved during MMX calculations, and the compiler may fail to restore the original rounding/truncation mode when it restores the registers.

• posted

It's on a PC, so I'm pretty much going to treat it as an appliance, say "oh well, I can't compare for equality in this instance", and move on.

I'm just curious about what might be going on under the hood.

When I revamped the code I actually ended up making it tighter, which was a happy happenstance.

```
--
www.wescottdesign.com
```
• posted

Oh, it is the right thing to do anyway: instead of comparing FP values for equality, the practical thing is to compare the difference between the two values against some threshold of your choice.

I am not sure there is a practical way to do it as things seem to be, it just won't be worth the effort.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI
------------------------------------------------------

• posted

OK, I'm baffled. At least three people have mentioned failure to store guard bits as a possible source for this error. Unless there's a *very* unconventional usage equating the extra bits of precision in the x87's default 80-bit format with "guard bits", it's just not so.

Guard bits are used within a single FP operation to prevent loss of precision when the two operands need to be scaled. They are never visible in the ISA, and never persist between operations (except in the indirect sense that an operation may have a more precise result than it would otherwise).

Context switches are also non-issues. You can save the state of the entire x87 FPU and restore it later. That stores the contents of all 80 bits of each FP register. Now, it's theoretically possible that an OS might manually step through the individual registers and store only (say) 64-bit values for each, but I've never heard of such a thing; not only would it be a lot more complex, it would likely be quite a bit slower too, especially when the context needed to be restored.

• posted

In general yes. However there are applications that really do need floating point, high speed, and reproducible results.

As you noted, getting consistent results from the hardware is not easy. But fixed point using the ALU is not always a viable option, and almost any use of software FP will be slower than even the slowest FPU code.

Case in point: some years ago I worked on a system for rendering MRI data. At that time, MRI "pixels" were 32-bit floats; now they are 64-bit doubles. There are differing levels of USFDA certification for tools ... not all require reproducible results ... but receiving certification for diagnostic/surgical use does require that identical inputs produce identical outputs. And because MRI sometimes is used _during_ surgery, there is a medical need to produce pictures as quickly as possible.

George

• posted

When the compiler needs to spill what it sees as a "double" value, it will save only 64 bits, even when the x87 is in (default) extended mode. Unless you deliberately set the x87 into double mode, you don't just lose a few guard bits, you lose 11 bits of actual value when the 64-bit extended mantissa is truncated to 53 bits for storage.

Not all x86 compilers recognize "long double" as an 80-bit value, some treat it as a synonym for "double". And AFAIK, every x86-64 compiler treats "long double" = "double" ... they don't use the x87 at all but rather use XMM instead.

The extra ~3 bits of guard precision are available as long as the value remains in an FPU register.

Both Windows and Linux save XMM registers lazily - but they can do this because there are [many] hidden rename registers. As long as the CPU doesn't run out of rename registers, it can preserve XMM values across a context switch without spilling them to memory.

This goes for ALU values also. On x86-64, at least theoretically, it is possible to context switch without spilling any values at all. In practice it is CPU model and code dependent [and likely very hard to achieve in any case].

I don't know whether x87 state is similarly protected. I don't know whether the early Pentiums ever did FPU renaming, and since the Pentium 4 (SSE2), x87 use has been highly discouraged, so I doubt there has been much (or any) enhancement since then.

George
