Extending floating point precision

I keep thinking that maybe, just maybe there is a way...

Let's look at it again; if you start with this (done in the hardware FPU)...

Double-precision 64 bit: 1 sign bit, 11 exponent bits, 52 mantissa bits
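
The 1/11/52 layout can be seen directly by copying a double's bits into a 64-bit integer. This is a minimal sketch, assuming the platform's double is IEEE 754 binary64 and shares its byte order with 64-bit integers; the names double_fields and unpack_double are made up for illustration and appear nowhere in the thread.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 binary64 value (assumed layout). */
struct double_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 11 bits, stored with a bias of 1023 */
    uint64_t fraction;  /* 52 bits; normal numbers have an implied leading 1 */
};

/* Hypothetical helper: pull the sign, exponent and fraction out of d. */
struct double_fields unpack_double(double d)
{
    uint64_t bits;
    struct double_fields f;

    memcpy(&bits, &d, sizeof bits);   /* well-defined way to read the bit pattern */
    f.sign     = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}
```

For 1.0 the stored exponent is the bias itself, 1023, and the fraction is zero, because the leading 1 is implied rather than stored.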
Reply to
Guy Macon

[...]

I think that's still a loser, because some numbers, irrational or repeating decimals, do not have whole bit boundaries at all. If I'm wrong about that, then I need to think some more.

Reply to
Bryan Hackney

No. I maintain it doesn't work. If you round one segment you have to know precisely what effect that had on all the lesser insignificant digits, and you have no idea of that because they were never calculated.

--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
     USE worldnet address!
Reply to
CBFalconer

[...]

Again... Thanks RR...

Reply to
Bryan Hackney

This first routine will split up a double value into two values that retain only enough mantissa bits so that a multiplication between two double values can be exactly rounded. (I dug it up out of some old QBASIC code I wrote, years back.)

SUB split (a AS DOUBLE, hi AS DOUBLE, lo AS DOUBLE)

    DIM x AS DOUBLE, y AS DOUBLE

    LET x = a * 134217729#
    LET y = a - x
    LET hi = y + x
    LET lo = a - hi

END SUB

Once split up, a value is carried as the pair (hi, lo), where the original value is simply hi+lo. To multiply two of these values together, say P and Q, you would first split P into (Phi, Plo) and Q into (Qhi, Qlo) and then perform the obvious computation:
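
A C rendering of the split step may make its two properties easier to check: hi + lo gives back the original value exactly, and hi keeps only the top 26 significand bits. This is a sketch, assuming IEEE 754 binary64 doubles, round-to-nearest, every operation rounded to double (no extended intermediate precision), and no overflow in a * 134217729. The helper low_fraction_bits is a hypothetical name added only for the check, and the volatile qualifier is an added precaution against the compiler fusing the multiply into the following subtraction.

```c
#include <stdint.h>
#include <string.h>

/* Split a into hi + lo, following the QBASIC routine. */
void split(double a, double *hi, double *lo)
{
    /* 134217729 = 2^27 + 1. volatile forces the product to be rounded
       to double and stored before it is reused. */
    volatile double x = a * 134217729.0;
    double y = a - x;

    *hi = y + x;      /* a rounded to its top 26 significand bits */
    *lo = a - *hi;    /* the remainder, computed without rounding error */
}

/* Hypothetical helper: the low 27 bits of d's 52-bit fraction field.
   These are zero when d fits in a 26-bit significand. */
uint64_t low_fraction_bits(double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof bits);
    return bits & ((UINT64_C(1) << 27) - 1);
}
```

Calling split on a value such as 0.1 and then testing hi + lo == a and low_fraction_bits(hi) == 0 exercises both properties.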

FUNCTION mult# (p AS DOUBLE, q AS DOUBLE)

    DIM phi AS DOUBLE, plo AS DOUBLE
    DIM qhi AS DOUBLE, qlo AS DOUBLE
    DIM r AS DOUBLE, dr AS DOUBLE

    split p, phi, plo
    split q, qhi, qlo

    LET r = p * q

    LET dr = phi * qhi - r
    LET dr = dr + phi * qlo + plo * qhi
    LET dr = dr + plo * qlo

    LET mult = r + dr

END FUNCTION

This combination will produce EXACT rounding, if I got it right back when I wrote it.
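
One way to test the claim is to keep r and dr separate instead of adding them, and compare the pair against an exact integer product. The sketch below is a C rendering of the two routines, assuming IEEE 754 binary64 arithmetic with each operation rounded to double and no overflow or underflow. The name two_product and the out-parameter err are illustrative choices, not part of the original code, and the volatile qualifiers are an added precaution against fused multiply-add.

```c
/* Split a into hi + lo, as in the QBASIC routine. */
void split(double a, double *hi, double *lo)
{
    volatile double x = a * 134217729.0;   /* 2^27 + 1 */
    double y = a - x;

    *hi = y + x;
    *lo = a - *hi;
}

/* Return r = p * q rounded to double and store the rounding error in *err,
   so that r + *err is the exact product under the stated assumptions. */
double two_product(double p, double q, double *err)
{
    double phi, plo, qhi, qlo, dr;
    volatile double r = p * q;             /* the ordinary rounded product */

    split(p, &phi, &plo);
    split(q, &qhi, &qlo);

    /* Each partial product below has at most 52 significant bits,
       so none of these steps rounds. */
    dr = phi * qhi - r;
    dr = dr + phi * qlo + plo * qhi;
    dr = dr + plo * qlo;

    *err = dr;
    return r;
}
```

For p = q = 134217729 the exact product is 2^54 + 2^28 + 1, which needs 55 bits. The rounded product r comes out as 2^54 + 2^28, and err comes out as 1, the bit that the rounding dropped.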

Consequently, I don't find your argument persuasive since the above code demonstrates the very case of exact rounding you say cannot be achieved.

Jon

Reply to
Jonathan Kirwan

I suspect somewhere we are talking about different things. You can't do that split in the first place. Floats are not neatly broken into maskable components. There is no such thing as an exact round - any round says "this is the nearest representation to the actual value" and drops the other information, for the simple reason that there is no place to hold that extra information. Remember that a float consists of the following components, represented in some form or other:

    An exponent unit, usually (but not restricted to) 2.
    A significand, representing all values below exp. unit to 1.
    An exponent, the integral power for the exp. unit that forms a multiplier for the significand.

The whole thing represents a value, and has an intrinsic error. Knuth has a good discussion of it. The above neglects the gradual underflow technique.

You have no control over the representation, apart from the usual additive and multiplicative operations (that includes subtract and division). Other things are formed from those basics.

That is why you have to build the FP system up from the integral system, upon whose precision the precision of the FP depends.

--
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article.  Click on 
 "show options" at the top of the article, then click on the 
 "Reply" at the bottom of the article headers." - Keith Thompson
Reply to
CBFalconer

Certainly, that could be true.

Why?

I didn't mask anything. The number used depends only on the fact that the representation is binary, which the IEEE formats are. Actually, that method I've shown goes back at least to 1971. Long before IEEE's standard arrived on the scene.

You are repeating yourself, but not demonstrating anything here. Speaking louder or more frequently doesn't help. Facts do.

Did you test the code? It actually does break a double into two less precise portions, quite accurately and well, which can be re-added to provide the original value. Further, once the lower mantissa bits have been set to zero (as happens in that code), multiplications can be exact (in that, if you multiply two numbers with, say, 26 bits of valid mantissa and the remainder all zero bits, then the result will be something with potentially up to 52 bits of valid mantissa, which just happens to fit into a double with two guard bits to spare.)

The result is that the rounding does take place correctly under those circumstances... and exactly, given the two original numbers being multiplied are considered exact to begin with.
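
The point about short mantissas can be checked with integers, where the exact product is easy to compute independently. This is a small sketch, assuming binary64 doubles; product_is_exact is a hypothetical helper, not part of the posted code.

```c
#include <stdint.h>

/* Return 1 if multiplying a and b as doubles gives exactly a * b. */
int product_is_exact(uint32_t a, uint32_t b)
{
    uint64_t exact = (uint64_t)a * (uint64_t)b;   /* exact in 64-bit integers */
    double   prod  = (double)a * (double)b;       /* rounded to 53 bits */

    return (uint64_t)prod == exact;
}
```

Two 26-bit operands such as 67108863 and 67108861 give a product below 2^52, so nothing is lost. By contrast, 134217729 squared needs 55 bits and its trailing 1 is rounded away, which is the situation the split is meant to avoid.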

If you look at 'r' and 'dr' in the routine before they are added, they make up a pair of doubles with combined mantissas that yield all of the required precision for a full multiply of the two original numbers.

At least, I believe that is true given the article I'd read many years ago.

I am more than a little aware of this, having written at least some modest floating point packages of my own. Both recently and many years ago. However, it was for practical need and not as some theoretical dissertation on the subject, so I'm no expert on the subject. But I am intimately familiar with the mechanisms to make such a system work with practical effect.

But the question was put on the table by the OP...

How so? I didn't include some snippets of additional code that deal with large exponents (those greater than something like 10^292 and less than the limit of around 10^308), but I thought that would only confuse matters more. But I would like to see a specific case (or a carefully crafted general argument) that illustrates your point better, as I don't follow you just now.
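
The omitted large-exponent handling is not shown anywhere in the thread, so the following is only one plausible way to guard the split, not the author's code. It assumes binary64 doubles and C99 hexadecimal floating constants. The 2^996 threshold and the 2^28 scale factors are assumptions chosen so that a * 134217729 cannot overflow, and split_scaled is a made-up name.

```c
/* Plain split, as in the QBASIC routine; overflows for very large a. */
void split(double a, double *hi, double *lo)
{
    volatile double x = a * 134217729.0;   /* 2^27 + 1 */
    double y = a - x;

    *hi = y + x;
    *lo = a - *hi;
}

/* Hypothetical guarded variant: scale large inputs down by 2^28 before
   splitting and scale the two parts back up afterwards. Multiplying by a
   power of two is exact as long as nothing overflows or underflows. */
void split_scaled(double a, double *hi, double *lo)
{
    double mag = a < 0 ? -a : a;

    if (mag > 0x1p996) {                   /* assumed threshold */
        split(a * 0x1p-28, hi, lo);
        *hi *= 0x1p28;
        *lo *= 0x1p28;
    } else {
        split(a, hi, lo);
    }
}
```

With a = 1e305 the plain split overflows in its first multiplication and hi ends up as NaN, while the scaled version returns a finite pair that adds back to a. Values so close to the largest double that hi would round up past it still overflow; this sketch does not handle that, and it does nothing about overflow in the multiplication itself.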

By the way, I've since discovered that it was Dekker, in 1971, who wrote the article I'd read so many years back. I'd forgotten who it was until I did a search. He has dealt with these issues using carefully crafted arguments and I relied upon these for what I'd tested, before. So I really would like to see what you are seeing here.

Different methods would apply, should you have a decimal based system. Of course. But I think the routine mentioned above is rather portable across systems using binary exponents applied to their binary mantissa bits. Which pretty much is everything, these days.

My preference, too. But the question by the OP remains and I've tried to deal with it, squarely.

But we could still be talking at cross-purposes. And I could, of course, be entirely mis-remembering the implications of Dekker's article. So I'll leave those doors open.

Jon

Reply to
Jonathan Kirwan

... snip ...

No, I didn't really look at the code, so I may be missing something. I have saved your message for possible future examination. I never use Basic, so translating it to C or Pascal will be a strain. You can find one of my floating point packages in back issues of DDJ - March and April 1979. All 8080 assembly code.

Reply to
CBFalconer

Jonathan Kirwan wrote:

A nice fellow sent me some information by email that applies to this:

| The Berkeley folks have cracked this one with their double double quad
| package.
|
| formatting link
|
| There's an early paper/web page from about '95-'97 over in the UK.
|
| formatting link
|
| Rumor has it this is how the latest nvidia GPUs do 128b floats.
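
For context, a "double double" value is an unevaluated sum of two doubles, much like the (r, dr) pair discussed earlier in the thread. The sketch below shows the usual error-free addition step (often credited to Knuth) that such packages build on. It is not taken from the package mentioned in the email, and the names dd and two_sum are illustrative. It assumes IEEE 754 binary64 arithmetic with round-to-nearest and no extra intermediate precision.

```c
/* A value carried as hi + lo, with lo much smaller than hi. */
struct dd {
    double hi;
    double lo;
};

/* Add two doubles and keep the rounding error:
   hi is the rounded sum, lo is what the rounding threw away. */
struct dd two_sum(double a, double b)
{
    struct dd s;
    double bb;

    s.hi = a + b;
    bb   = s.hi - a;                        /* the part of b that made it into hi */
    s.lo = (a - (s.hi - bb)) + (b - bb);    /* the part that did not */
    return s;
}
```

Adding 1.0 to 1e16 illustrates the idea: the sum does not fit in one double, so hi stays at 1e16 and the lost 1.0 is carried in lo.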

-------------------------------------------------------------------

For those who missed it, here is Jonathan Kirwan's code:

: This first routine will split up a double value into two values that
: retain only enough mantissa bits so that a multiplication between two
: double values can be exactly rounded. (I dug it up out of some old
: QBASIC code I wrote, years back.)
:
: SUB split (a AS DOUBLE, hi AS DOUBLE, lo AS DOUBLE)
:
: DIM x AS DOUBLE, y AS DOUBLE
:
: LET x = a * 134217729#
: LET y = a - x
: LET hi = y + x
: LET lo = a - hi
:
: END SUB
:
: Once split up, a value is carried as the pair (hi, lo), where the
: original value is simply hi+lo. To multiply two of these values
: together, say P and Q, you would first split P into (Phi, Plo) and Q
: into (Qhi, Qlo) and then perform the obvious computation:
:
: FUNCTION mult# (p AS DOUBLE, q AS DOUBLE)
:
: DIM phi AS DOUBLE, plo AS DOUBLE
: DIM qhi AS DOUBLE, qlo AS DOUBLE
: DIM r AS DOUBLE, dr AS DOUBLE
:
: split p, phi, plo
: split q, qhi, qlo
:
: LET r = p * q
:
: LET dr = phi * qhi - r
: LET dr = dr + phi * qlo + plo * qhi
: LET dr = dr + plo * qlo
:
: LET mult = r + dr
:
: END FUNCTION
:
: This combination will produce EXACT rounding, if I got it right back
: when I wrote it.
:
: Consequently, I don't find your argument persuasive since the above
: code demonstrates the very case of exact rounding you say cannot be
: achieved.

-------------------------------------------------------------------

[Meta] Just in case one of you changes his mind about posting to misc.business.product-dev, I have set the modbot to autoapprove anything by Jonathan Kirwan or CBFalconer. You can verify this with a test post; it will be approved in less than a second - far too short for a human to change it or interfere with it in any way.

You can also check with Igor Chudov and confirm that his moderation software does not allow the moderator to make any changes to a post (headers or body) other than adding a clearly labeled moderator's comment at the bottom or rejecting the post, and that once a poster is put on the whitelist, even those actions are impossible.

If there is anything else that I can do to assure you that your posts will not be in any way under the control of anyone else, let me know.

Reply to
Guy Macon

... snip ...

Before I go any further, this is my interpretation of the (ugh) Basic code, which makes no sense to me. The magic number is 8000001 in hex by my count, needing 28 unsigned bits to express. It will probably be expressed exactly in some doubles. This means the product may be the result of adding a value and the value shifted left by 28 places, in a binary system.

     1xxxx...xxxxx1
                  1xxxx...xxxxx1
    ============================
                  ^       ^--- about here is end of reg.
                  only possible carry comes from here.

This may result in:

    1000000000000001111111111000
                  ^       ^-- truncated, no register room
                  1 + 1 yields 0 with carry here
                  and the truncation might cause the 0 sum bit
                  to become 1 due to rounding.

which the FP will immediately shift right in normalizing, assuming all the significand bits marked x were set. I don't see that all this does anything for anybody.

    void split(double flt, double *hi, double *lo)
    {
       double x, y;

       x = flt * 134217729;
       y = flt - x;
       *hi = y + x;
       *lo = flt - *hi;
    } /* split */

    double mult(double p, double q)
    {
       double phi, plo;
       double qhi, qlo;
       double r, dr;

       split(p, &phi, &plo);
       split(q, &qhi, &qlo);
       r = p * q;
       dr = phi * qhi - r;
       dr = dr + (phi * qlo) + (plo * qhi);
       dr = dr + (plo * qlo);
       return (r + dr);
    } /* mult */
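
The hex reading of the constant is easy to confirm: 134217729 is 0x8000001, that is 2^27 + 1. It takes 28 bits to write and its top bit is bit 27, so multiplying by it adds the value to a copy of itself shifted left by 27 places. The following is a small integer sketch of that identity; MAGIC and times_magic are hypothetical names.

```c
#include <stdint.h>

#define MAGIC UINT64_C(134217729)   /* 0x8000001 = 2^27 + 1 */

/* Multiply by 2^27 + 1 using a shift and an add (no overflow check). */
uint64_t times_magic(uint64_t v)
{
    return (v << 27) + v;
}
```

Because the constant has only 28 significant bits, it is exactly representable in a binary64 double, so the only rounding in the first line of split comes from the multiplication itself.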

... snip ...

What is this about? Is the cross-post moderated and delaying? I am in no rush.

Reply to
CBFalconer

I am the moderator of misc.business.product-dev. Jonathan Kirwan is concerned about having his posts be subject to moderation, so he snips misc.business.product-dev when he replies. (All of this is perfectly acceptable, of course; his post, his choice). This causes replies from those who don't have a problem with moderation to also not go to misc.business.product-dev.

In an attempt to persuade Jonathan to post to misc.business.product-dev (emphasis on the "persuade"; it's his choice to make), I have whitelisted him and those who reply to him so that anything he or they post goes right to the newsgroup without any possibility of moderation. I also offered to make him a moderator if that would help.

Reply to
Guy Macon

You seem to have something against the form.
What reason, though, I really could not say.
E'en floating point instructions can be fine
if learning code technique is what you seek,
or perhaps bringing brain cells greater fit-
ness.  (Use a rhyming meter?  Not this one.)

mlp

Reply to
Mark L Pappin

Third try to post this silly response.

Reply to
Bryan Hackney
[...]

After trying 3 times to reply, and failing to propagate beyond my home server, I'm quite done.

Reply to
Bryan Hackney

Which must be why this one got through :-)

Reply to
CBFalconer

This isn't the answer to what you're asking, but I've got something on my website (this page is six years old) that you may find interesting. I use the FP multiply instruction to do multiplies of 8 (decimal) digits by 8 digits with 16 digit result and "carry," which is then used for multiprecision integer multiplication. Go here and scroll down to Factorials:

formatting link

-----

formatting link

Reply to
Ben Bradley
