Do you have a question? Post it now! No Registration Necessary
 Guy Macon
June 30, 2005, 10:50 pm
My project team was having a discussion about doing math in
assembly language, and an interesting sideissue question was
raised that nobody could answer.
We all know how to extend all sorts of math operations in fixed
point with more bits than are native to the processor. and we
all know how to plug in a floating point library or to use a
floating point coprocessor if there is one, but is it possible
to extend floating point math operations to have more bits than
are native to the coprocessor?
For example, let's say that I have a math coprocessor that handles
ANSI/IEEE Std 7541985 floating point in singleprecision 32 bit
and doubleprecision 64 bit:
Singleprecision 32 bit
1 8 23 < Width
++++
Si Exp  Mantissa 
++++
3130 2322 0 < Bit #
length 32 bits
sign 1 bit
exponent 8 bits
mantissa 23 bits(+1)
bias 127
Doubleprecision 64 bit
1 11 52 < Width
++++
Si Exp  Mantissa 
++++
6362 5251 0 < Bit #
length 64 bits
sign 1 bit
exponent 11 bits
mantissa 52 bits(+1)
bias 1,023
Is it possible, through some fiendishly clever scheme
of coprocessor operations, carries, multiplies, and
black magic, to write a routine to do math in "Quadruple
precision" like this?
1 22 105
++++
Si  Exp  Mantissa 
++++
127126 105104 0
length 128 bits
sign 1 bit
exponent 22 bits
mantissa 105 bits(+1)
bias 2,097,151
(Or maybe exponent23% bits and mantissa10%4 bits(+1)..)
To answer the obvious questions, no, I don't have a use for
such precision (nobody has asked me to measure the width of
the universe with a precision of the width of a neutron
lately), and yes, I do know how to do this with a FP routine
written in integer math. What I am curious about is whether
I can do it through a clever scheme of carries, multiplies,
etc. on the results from 64bit floating point hardware.

Guy Macon <http://www.guymacon.com/
Guy Macon <http://www.guymacon.com/
Re: Extending floating point precision
On 30 Jun 2005 22:50:08 GMT, Guy Macon
Yes, if by "clever" you mean largely obvious. Just using a vector of
floats instead of a scalar and requiring "exact rounding" and a few
simple rules about the operations. Can be coded portably, I believe.
Must be a lot of web pages on the subject, I'd imagine. I believe I
read about this some 15 years ago and I'm sure it was old, then.
Jon
Re: Extending floating point precision
(1) Because I don't care to have my posts able to be in any way under
the control of anyone else  even if that control remains unexercised
 where that's possible. (2) And perhaps more especially, when under
the specific circumstances with which I was presented in this case.
...
Regardless, do you see the obvious? Or would you like some easy
examples to help illustrate, more precisely? I think I can remember
enough to provide some concrete examples.
Jon
Re: Extending floating point precision
Jonathan Kirwan wrote:
I understand. If you ever wish to post there without any chance
of outside control, I will be happy to make you a moderator.
Maybe I am just being dumb today, but I don't see the obvious, and
neither did the other two engineers who I was having lunch with.
That being said, just knowing that it's possible is probably enough
for me to get it, so let me work on it tomorrow when I am fresh.
Thanks!
Re: Extending floating point precision
On Fri, 01 Jul 2005 03:47:12 +0000, Guy Macon
My life is filled enough as it is....
That's the right spirit!
A clue: you implied that you want to be able to use the existing low
level operators to combine values, yet to do so it must be done
without any rational chance of exceeding the final precision in the
result. This result is, of course, usually of the same size as the
values supplied to the binary operators. Looking at multiplication
should stress this point more clearly than addition, by the way. The
way to achieve the extension should be clearer, once you stare at that
problem, squarely in the face of it, and figure out how to deal with
it. Just focus on multiplication of two values.
But I'm also sure that an almost trivial search of the web will pop up
more than a few somethings on this subject. I'm sure it's just too
important to have somehow escaped getting some moderate level of
attention there.
Jon
My life is filled enough as it is....
That's the right spirit!
A clue: you implied that you want to be able to use the existing low
level operators to combine values, yet to do so it must be done
without any rational chance of exceeding the final precision in the
result. This result is, of course, usually of the same size as the
values supplied to the binary operators. Looking at multiplication
should stress this point more clearly than addition, by the way. The
way to achieve the extension should be clearer, once you stare at that
problem, squarely in the face of it, and figure out how to deal with
it. Just focus on multiplication of two values.
But I'm also sure that an almost trivial search of the web will pop up
more than a few somethings on this subject. I'm sure it's just too
important to have somehow escaped getting some moderate level of
attention there.
Jon
Re: Extending floating point precision
... snip ...
Are you are talking about the usual practice of expressing A as
a = a0 + a1 * 2^N
where the ai are in the precision we have, and the a in the
precision we want, and N quantifies the precision we have. We can
then do multiplications etc. and combine the portions. The problem
in the floating world is that we can't control the normalization
and rounding done. That action on the more significant portion
will mask anything from the less significant portions. At least as
I see it.
What we can do is to build an arbitrary precision set of integer
operations, and then build a single floating point mechanism upon
that. To me that means there is no point in having a hardware
floating point processor at all, if all we need is the extended
float.

Chuck F ( snippedforprivacy@yahoo.com) ( snippedforprivacy@worldnet.att.net)
Available for consulting/temporary embedded and systems.
Chuck F ( snippedforprivacy@yahoo.com) ( snippedforprivacy@worldnet.att.net)
Available for consulting/temporary embedded and systems.
We've slightly trimmed the long signature. Click to see the full one.
Re: Extending floating point precision
Alright, what are you talking about. Rounding is inherently
inexact. I see no way of using the usual FP system to extend its
own precision. Range, yes. Precision, no.

Chuck F ( snippedforprivacy@yahoo.com) ( snippedforprivacy@worldnet.att.net)
Available for consulting/temporary embedded and systems.
Chuck F ( snippedforprivacy@yahoo.com) ( snippedforprivacy@worldnet.att.net)
Available for consulting/temporary embedded and systems.
We've slightly trimmed the long signature. Click to see the full one.
Re: Extending floating point precision
Hmm. If you reduce the precision of the numbers you store, though a
technique of splitting up the precision, then calculations on them
will have enough bits to have absolutely precise results and you will
be able to _precisely_ round them back to your split precision. Does
this help?
Jon
Re: Extending floating point precision
No. I maintain it doesn't work. If you round one segment you have
to know precisely what effect that had on all the lesser
insignificant digits, and you have no idea of that because they
were never calculated.

Chuck F ( snippedforprivacy@yahoo.com) ( snippedforprivacy@worldnet.att.net)
Available for consulting/temporary embedded and systems.
Chuck F ( snippedforprivacy@yahoo.com) ( snippedforprivacy@worldnet.att.net)
Available for consulting/temporary embedded and systems.
We've slightly trimmed the long signature. Click to see the full one.
Re: Extending floating point precision
This first routine will split up a double value into two values that
retain only enough mantissa bits so that a multiplication between two
double values can be exactly rounded. (I dug it up out of some old
QBASIC code I wrote, years back.)
SUB split (a AS DOUBLE, hi AS DOUBLE, lo AS DOUBLE)
DIM x AS DOUBLE, y AS DOUBLE
LET x = a * 134217729#
LET y = a  x
LET hi = y + x
LET lo = a  hi
END SUB
Once split up, a value is carried as the pair (hi, lo), where the
original value is simply hi+lo. To multiply two of these values
together, say P and Q, you would first split P into (Phi, Plo) and Q
into (Qhi, Qlo) and then perform the obvious computation:
FUNCTION mult# (p AS DOUBLE, q AS DOUBLE)
DIM phi AS DOUBLE, plo AS DOUBLE
DIM qhi AS DOUBLE, qlo AS DOUBLE
DIM r AS DOUBLE, dr AS DOUBLE
split p, phi, plo
split q, qhi, qlo
LET r = p * q
LET dr = phi * qhi  r
LET dr = dr + phi * qlo + plo * qhi
LET dr = dr + plo * qlo
LET mult = r + dr
END FUNCTION
This combination will produce EXACT rounding, if I got it right back
when I wrote it.
Consequently, I don't find your argument persuasive since the above
code demonstrates the very case of exact rounding you say cannot be
achieved.
Jon
Re: Extending floating point precision
I suspect somewhere we are talking about different things. You
can't do that split in the first place. Floats are not neatly
broken into maskable components. There is no such thing as an
exact round  any round says "this is the nearest representation to
the actual value" and drops the other information, for the simple
reason that there is no place to hold that extra information.
Remember that a float consists of the following components,
represented in some form or other:
An exponent unit, usually (but not restricted to) 2.
A significand, representing all values below exp. unit to 1.
An exponent, the integral power for the exp. unit that forms
a multiplier for the significand.
The whole thing represents a value, and has an intrinsic error.
Knuth has a good discussion of it. The above neglects the gradual
underflow technique.
You have no control over the representation, apart from the usual
additive and multiplicative operations (that includes subtract and
division). Other things are formed from those basics.
That is why you have to build the FP system up from the integral
system, upon whose precision the precision of the FP depends.

"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
We've slightly trimmed the long signature. Click to see the full one.
Re: Extending floating point precision
Certainly, that could be true.
Why?
I didn't mask anything. The number used depends only on the fact that
the representation is binary, which the IEEE formats are. Actually,
that method I've shown goes back at least to 1971. Long before IEEE's
standard arrived on the scene.
You are repeating yourself, but not demonstrating anything here.
Speaking louder or more frequently doesn't help. Facts do.
Did you test the code? It actually does break a double into two less
precise portions, quite accurately and well, which can be readded to
provide the original value. Further, once the lower mantissa bits
have been set to zero (as happens in that code), multiplications can
be exact (in that, if you multiply two numbers with, say, 26 bits of
valid mantissa and the remainder all zero bits, then the result will
be something with potentially up to 52 bits of valid mantissa, which
just happens to fit into a double with two guard bits to spare.)
The result is that the rounding does take place correctly under those
circumstances... and exactly, given the two original numbers being
multiplied are considered exact to begin with.
If you look at 'r' and 'dr' in the routine before they are added, they
make up a pair of doubles with combined mantissas that yield all of
the required precision for a full multiply of the two original
numbers.
At least, I believe that is true given the article I'd read many years
ago.
I am more than a little aware of this, having written a least some
modest floating point packages of my own. Both recently and many
years ago. However, it was for practical need and not as some
theoretical dissertation on the subject, so I'm no expert on the
subject. But I am intimately familiar with the mechanisms to make
such a system work with practical effect.
But the question was put on the table by the OP...
How so? I didn't include some snippets of additional code that deal
with large exponents (those greater than something like 10^292 and
less than the limit of around 10^308, but I thought that would only
confuse matters more.) But I would like to see a specific case (or a
carefully crafted general argument) that illustrates your point
better, as I don't follow you just now.
By the way, I've since discovered that it was Dekker, in 1971, who
wrote the article I'd read so many years back. I'd forgotten who it
was until I did a search. He has dealt with these issues using
carefully crafted arguments and I relied upon these for what I'd
tested, before. So I really would like to see what you are seeing
here.
Different methods would apply, should you have a decimal based system.
Of course. But I think the routine mentioned above is rather portable
across systems using binary exponents applied to their binary mantissa
bits. Which pretty much is everything, these days.
My preference, too. But the question by the OP remains and I've tried
to deal with it, squarely.
But we could still be talking at crosspurposes. And I could, of
course, be entirely misremembering the implications of Dekker's
article. So I'll leave those doors open.
Jon
Re: Extending floating point precision
... snip ...
No, I didn't really look at the code, so I may be missing
something. I have saved your message for possible future
examination. I never use Basic, so translating it to C or Pascal
will be a strain. You can find one of my floating point packages
in back issues of DDJ  March and April 1979. All 8080 assembly
code.

"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
We've slightly trimmed the long signature. Click to see the full one.
Re: Extending floating point precision
A nice fellow sent me some information by email that apples to this:

 The Berkley folks have cracked this one with there double double quad
 package.
 http://crd.lbl.gov/~dhbailey/mpdist /

 There's an early paper/web page from about '95'97 over in the UK.
 http://members.lycos.co.uk/keithmbriggs/doubledouble.html

 Rumor has it this is how the latest nvidia GPUs do 128b floats.


For those who missed it, here is Jonathan Kirwan's code:
: This first routine will split up a double value into two values that
: retain only enough mantissa bits so that a multiplication between two
: double values can be exactly rounded. (I dug it up out of some old
: QBASIC code I wrote, years back.)
:
: SUB split (a AS DOUBLE, hi AS DOUBLE, lo AS DOUBLE)
:
: DIM x AS DOUBLE, y AS DOUBLE
:
: LET x = a * 134217729#
: LET y = a  x
: LET hi = y + x
: LET lo = a  hi
:
: END SUB
:
: Once split up, a value is carried as the pair (hi, lo), where the
: original value is simply hi+lo. To multiply two of these values
: together, say P and Q, you would first split P into (Phi, Plo) and Q
: into (Qhi, Qlo) and then perform the obvious computation:
:
: FUNCTION mult# (p AS DOUBLE, q AS DOUBLE)
:
: DIM phi AS DOUBLE, plo AS DOUBLE
: DIM qhi AS DOUBLE, qlo AS DOUBLE
: DIM r AS DOUBLE, dr AS DOUBLE
:
: split p, phi, plo
: split q, qhi, qlo
:
: LET r = p * q
:
: LET dr = phi * qhi  r
: LET dr = dr + phi * qlo + plo * qhi
: LET dr = dr + plo * qlo
:
: LET mult = r + dr
:
: END FUNCTION
:
: This combination will produce EXACT rounding, if I got it right back
: when I wrote it.
:
: Consequently, I don't find your argument persuasive since the above
: code demonstrates the very case of exact rounding you say cannot be
: achieved.

[Meta] Just in case one of you changes his mind about posting to
misc.business.productdev, I have set the modbot to autoapprove
anything by Jonathan Kirwan or CBFalconer. You can verify this
with a test post; it will be approved in less than a second  far
too short for a human to change it or interfere with it in any way.
that his moderation software does not allow the moderator to make any
changes to a post (headers or body) other than adding a clearly labeled
moderators comment at the bottom or rejecting the post, and that once
a poster is put on the whitelist, even those actions are impossible.
If there is anything else that I can do to assure you that your posts
will not be in any way under the control of anyone else, let me know.
Re: Extending floating point precision
... snip ...
Before I go any further, this is my interpretation of the (ugh)
Basic code, which makes no sense to me. The magic number is
8000001 in hex by my count, needing 28 unsigned bits to express.
It will probable be expressed exactly in some doubles. This means
the product may be the result of adding a value and the value
shifted left by 28 places, in a binary system.
1xxxx...xxxxx1
1xxxx...xxxxx1
============================
^ ^ about here is end of reg.
only possible carry comes from here. This may result in:
1000000000000001111111111000
^ ^ truncated, no register room
1 + 1 yields 0 with carry here
and the truncation might cause the 0 sum bit to become 1 due to
rounding.
which the FP will immediately shift right in normalizing, assuming
all the significand bits marked x were set. I don't see that all
this does anything for anybody.
void split(double flt, double *hi, double *lo)
{
double x, y;
x = flt * 134217729;
y = flt  x;
*hi = y + x;
*lo = flt  *hi;
} /* split */
double mult(double p, double q)
{
double phi, plo;
double qhi, qlo;
double r, dr;
split(p, &phi, &plo);
split(q, &qhi, &qlo);
r = p * q;
dr = phi * qhi  r;
dr = dr + (phi * qlo) + (plo * qhi);
dr = dr + (plo * qlo);
return (r + dr);
} /* mult */
... snip ...
What is this about? Is the crosspost moderated and delaying? I
am in no rush.

"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
We've slightly trimmed the long signature. Click to see the full one.
Re: Extending floating point precision
CBFalconer wrote:
I am the moderator of misc.business.productdev. Jonathan Kirwan
is concerned about having his posts be subject to moderation, so
he snips misc.business.productdev when he replies. (All of this
is perfectly acceptable, of course; his post, his choice). This
causes replies from those who don't have a problem with moderation
to also not go to misc.business.productdev.
In an attempt to persuade Jonathan to post to misc.business.productdev
(empasis on the "persuade"; it's his choice to make), I have whitelisted
him and those who reply to him so that anything he or they post goes
right to the newsgroup without any possibility of moderation. I also
offered to make him a moderator if that would help.
Re: Extending floating point precision
CBFalconer wrote:
Ah. Yes, I see how to do that  build an arbitrary precision set
of integer operations, and then build a single floating point
mechanism upon that. That's probably the most efficient way to
do it as well, under normal circumstances. Then again, some of
the newer processors can pipeline FPU instructions without slowing
down the rest of the CPU, so building extended precision math out
of existing floating point instructions would be essentially free.
The way I write code, the FPU has a *lot* of time on it's hands...
Re: Extending floating point precision
[...]
I think that's exactly it. As soon as you lose a quantum of precision
on a boundary, everything to the right of that boundary is useless.
It may appear OK, and that would be a dangerous assumption, because
after many dozens or thousands of additional operations, the less
significant portion of the numbers would be unrecognizable as well
as being wrong.
I think that's probably the only way.
Site Timeline
 » Reverse current into a lithium battery
 — Next thread in » Embedded Programming
 » MSP430, gcc > IAR
 — Previous thread in » Embedded Programming
 » [AVR] What assembler to use?
 — Newest thread in » Embedded Programming
 » Serial EEPROM or Serial Flash?
 — Last Updated thread in » Embedded Programming
 » Pb Pile
 — The site's Newest Thread. Posted in » Electronics (French)