**posted on**

June 25, 2005, 3:09 pm

I wonder if you could help me with this.

Because of system restrictions, I have to convert input data (floating
point, 32 bit, IEEE format) to a 16 bit format (for example: 1 sign
bit, 5 exponent bits, 10 mantissa bits) and, after processing, back to
IEEE 32 bits.

What issues can I expect in terms of dynamic range, clipping etc? Also,

what would be the most efficient way to convert between the two

formats?

Many thanks.

Re: Floating point issues

Come on Tom, the guy's probably a computing 101 student, asking for advice.

Assuming only +ive values ... You've assumed that you have to convert fp32

to fp16.

One commonly used approach is to convert the FP data into unsigned 16 bit

integer, that is, multiply the fp number by, say, 1000, do the math

processing, then convert the integer back to fp by dividing by 1000.

Since unsigned 16 bit integer has range 0 - 65535, the conversion gives you

an effective fp range of 0 - 65. If you use a factor of 100, you get an

effective fp range of 0 - 655. Get the picture?
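A minimal sketch of the scaling idea above, assuming non-negative inputs and a fixed scale factor. The names `SCALE`, `to_u16` and `from_u16` are made up for illustration, and the clipping policy (saturate at 0 and 65535) is only one possible choice:

```python
SCALE = 1000      # hypothetical scale factor: 3 decimal digits of resolution
U16_MAX = 65535   # largest unsigned 16 bit value

def to_u16(x, scale=SCALE):
    """Scale a float and clip it into the unsigned 16 bit range."""
    q = round(x * scale)
    return max(0, min(U16_MAX, q))   # anything outside 0..65535 is clipped

def from_u16(q, scale=SCALE):
    """Undo the scaling to get back an approximate float."""
    return q / scale

print(to_u16(1.5))                # 1500
print(to_u16(70.0))               # 65535, clipped: 70.0 is outside 0 - 65.535
print(to_u16(600.0, scale=100))   # 60000, fits when the factor is 100
```

The resolution is 1/scale and the largest representable value is 65535/scale, so the two are traded directly against each other.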

Try thinking outside the box a bit: are you limited to ALWAYS having
just one 16 bit integer for each fp32 value? There's nothing wrong with an
encoding scheme that might use an occasional extra 16 bit value to define
something useful. For example, what about scanning each line (or buffer
full) to find the minimum value per line, then storing each integer value as
a delta from that minimum, i.e. base_value (int16), delta_values (int16).
This will increase your effective dynamic range considerably, and make very
little difference to the "system restrictions" you mention. Your algorithm
sets the line or buffer size.

example: base value is 3000, and all buffer values are the delta * 1000 (as

described above).

This scheme could be modified to set the base value to the data mean, then

have signed integer delta values.

You should be able to work out dynamic range and clipping issues from this.
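Here is a rough sketch of the base-plus-delta scheme, assuming the base is stored as a whole number and the deltas are unsigned 16 bit values scaled by 1000 as in the example. The function names and the saturating clip are assumptions for illustration only:

```python
import math

def encode_block(values, scale=1000):
    """Encode one buffer as (base, deltas): an integer base plus scaled offsets."""
    base = math.floor(min(values))   # whole-number base, must itself fit in 16 bits
    # Each delta is the scaled offset from the base, clipped to 0..65535.
    deltas = [min(65535, round((v - base) * scale)) for v in values]
    return base, deltas

def decode_block(base, deltas, scale=1000):
    """Rebuild approximate floats from the base and the scaled deltas."""
    return [base + d / scale for d in deltas]

base, deltas = encode_block([3000.25, 3010.5, 3065.0])
print(base, deltas)                 # 3000 [250, 10500, 65000]
print(decode_block(base, deltas))   # [3000.25, 3010.5, 3065.0]
```

With a scale of 1000 each buffer still only spans about 65.5 units above its base, so clipping now depends on the spread of values within a buffer rather than on their absolute size.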

--

regards,

Stewart DIBBS


Re: Floating point issues

We've had at least one long thread about FP formats here recently.

The math(s) may be trivial, but an appreciation of the trade-offs

seems to be distinctly non-trivial. If I were choosing or designing

a 16-bit format, I'd want some reassurance that others who had been

down the same road had made similar decisions.

Re: Floating point issues

Well, just to enumerate the primary decisions:

1. Do negative values exist. If so, 1 bit used.

2. Range required.

2a. Exponent base. Affects resolution.

3. Resolution required. Affects range.

For most purposes a system with binary exponents and significands
is likely to be optimum. Other exponent bases do not allow the use of
the implied leading one bit, which gets the sign bit for free, or can be
considered to give one extra bit of resolution.

--

"A man who is right every time is not likely to do very much."

-- Francis Crick, co-discover of DNA

Re: Floating point issues

Google for "half float format". It is used by ATI and nVidia to save memory

and gain speed while rendering floating point. Also, the file format OpenEXR

supports half floats and is well documented, including discussions on half

float advantages over fixed point and 32 bit float. The OpenEXR libraries

are open source and include half float conversion and, IIRC, math.
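For the conversion itself, a simplified sketch of going between fp32 and the 1-5-10 half float layout by rebiasing the exponent and dropping mantissa bits. This is not the OpenEXR code; the function names are made up, and the simplifications are assumptions: the mantissa is truncated rather than rounded, values too small for a normal half are flushed to zero, and anything too large (including infinity and NaN) is clipped to the largest finite half, 65504:

```python
import struct

def float_to_half_bits(x):
    """Return the 16 bit pattern (1 sign, 5 exponent, 10 mantissa) for x."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))   # fp32 bit pattern
    sign = (bits >> 16) & 0x8000
    exp = ((bits >> 23) & 0xFF) - 127 + 15   # rebias 8 bit exponent to 5 bits
    mant = (bits >> 13) & 0x3FF              # keep the top 10 of 23 mantissa bits
    if exp <= 0:
        return sign                          # underflow: flush to signed zero
    if exp >= 31:
        return sign | 0x7BFF                 # overflow: clip to +/-65504
    return sign | (exp << 10) | mant

def half_bits_to_float(h):
    """Expand a 16 bit pattern produced above back to a float."""
    sign = (h & 0x8000) << 16
    exp = (h >> 10) & 0x1F
    mant = h & 0x3FF
    if exp == 0:
        return -0.0 if sign else 0.0         # subnormals are treated as zero here
    bits = sign | ((exp - 15 + 127) << 23) | (mant << 13)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(hex(float_to_half_bits(1.0)))                    # 0x3c00
print(half_bits_to_float(float_to_half_bits(1.0e6)))   # 65504.0, clipped
print(half_bits_to_float(float_to_half_bits(1.0e-6)))  # 0.0, underflow
```

With 10 stored mantissa bits you get roughly 3 decimal digits, and the normal range runs from about 6.1e-5 to 65504, which answers most of the dynamic range and clipping question for this particular layout.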

Re: Floating point issues

*no-spame-matt> Google for "half float format". It is used by ATI*

*no-spame-matt> and nVidia to save memory and gain speed [ ... ]*

*no-spame-matt> [ ... ] including discussions on half float*

*no-spame-matt> advantages over fixed point and 32 bit float. [*

*no-spame-matt> ... ]*

Quite interesting. But for small number of bits I was quite

impressed by FOCUS and other logarithm based ''floating

point'' formats:

http://DBLP.Uni-Trier.DE/rec/bibtex/journals/cacm/EdgarL79

"Communications of the ACM", v22 n3, March 1979: "FOCUS

Microcomputer Number System"; Albert Edgar, Samuel Lee.

FOCUS is a number system and supporting computational

algorithms especially useful for microcomputer control

and other signal processing applications.

FOCUS has the wide-ranging character of floating-point

numbers with a uniformity of state distributions that

give FOCUS better than a twofold accuracy advantage over

an equal word length floating-point system.

FOCUS computations are typically five times faster than

single precision fixed-point or integer arithmetic for a

mixture of operations, comparable in speed with hardware

arithmetic for many applications. Algorithms for 8-bit and

16-bit implementations of FOCUS are included.

They require a different programming attitude though, probably

the reason why GPUs use them.

IIRC Some UK academic had started a few years ago a company for

hardware accelerated logarithm based ''floating point'', but it

does not seem to have achieved world domination yet, which may

be a pity.

Re: Floating point issues

Peter Grandi wrote:

Sounds really interesting.

Is there any more detailed information about the principle available (online and

for free)?

The only link I found so far is

<http://portal.acm.org/citation.cfm?id=359080.359085>

which is a commercial portal - absolutely no information unless you pay... :-(

--

Dipl.-Ing. Tilmann Reh

http://www.autometer.de - Elektronik nach Maß.


Re: Floating point issues

The generic term is probably LNS, "logarithmic number system";
there are many variants. Apart from software, implementations
in hardware/VLSI are common too.

The basic simple variant is:

Addition and subtraction are easy with integers.
The most straightforward way to get from integer
to logarithmic format and back is tables. In logarithmic
format one has strength reduction: multiplication and division
become addition and subtraction again.

Tables can be appropriate with low-resolution applications,

I know of a flight-simulator for very early x86 PCs.
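To illustrate the strength reduction, a small sketch that stores a positive number as a fixed-point log2 and multiplies by adding integers. It is not FOCUS or any particular LNS product; `FRAC_BITS` and the function names are hypothetical, and `math.log2` stands in for the lookup tables a real low-resolution implementation would use:

```python
import math

FRAC_BITS = 8   # hypothetical: fractional bits kept in the stored log2

def to_lns(x):
    """Encode a positive float as an integer holding log2(x) in fixed point."""
    return round(math.log2(x) * (1 << FRAC_BITS))

def from_lns(l):
    """Decode the fixed-point log2 back to a float."""
    return 2.0 ** (l / (1 << FRAC_BITS))

def lns_mul(a, b):
    return a + b   # multiplication becomes integer addition

def lns_div(a, b):
    return a - b   # division becomes integer subtraction

print(from_lns(lns_mul(to_lns(8.0), to_lns(4.0))))   # 32.0
print(from_lns(lns_mul(to_lns(3.0), to_lns(5.0))))   # close to 15, not exact
```

Sign and zero need separate handling (a sign bit and a reserved code, for example), and addition of two LNS values is the expensive operation that the table schemes try to make cheap.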

But for high-resolution the tables get out of hand, so

from time to time someone comes up with a new fix for that

problem. Often with a two word format similar to float.

Log Point Technologies had a version in 1997 that

claimed tables from 17k - 55k bytes. For 8 bit

microprocessors and DSPs with 32x32 integer multiply.

An old Ph.D. thesis:

Stouraitis, "Logarithmic Number System, Theory, Analysis and Design",
University of Florida.

Searching Google for "stouraitis logarithmic" gives about 250 hits
one can examine.

Another starting point would be groups.google on

comp.arch.arithmetic with "logarithmic"

That's the oldest (?) and certainly the most often quoted. I think

there is a description of Focus as a chapter in a book too.

Regards, JRD

Re: Floating point issues

*pg_nh> Quite interesting. But for small number of bits I was quite*

*pg_nh> impressed by FOCUS and other logarithm based ''floating point''*

*pg_nh> formats: [ ... ] They require a different programming attitude*

*pg_nh> though, probably the reason why GPUs use them. [ ... ]*

Oops, that should have read "the reason why CPUs
***don't*** use them".

Re: Floating point issues

When using smaller floating point formats, you first have to define
the required dynamic range: what is the biggest number,
what is the smallest number? Then take as many bits as are needed to
represent this range in the exponent, and the rest is left for the
mantissa.
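As a back-of-the-envelope sketch of that procedure, assuming a binary exponent, no exponent codes reserved for zero, infinity or NaN, and magnitudes greater than zero. The function name and parameters are made up for illustration:

```python
import math

def split_bits(min_mag, max_mag, total_bits=16, signed=True):
    """Return (exponent_bits, mantissa_bits) covering min_mag..max_mag."""
    # Number of distinct binary exponents between the smallest and largest value.
    exp_span = math.floor(math.log2(max_mag)) - math.floor(math.log2(min_mag)) + 1
    exp_bits = math.ceil(math.log2(exp_span))
    mant_bits = total_bits - exp_bits - (1 if signed else 0)
    return exp_bits, mant_bits

print(split_bits(2.0 ** -14, 65504.0))          # (5, 10)
print(split_bits(1.0, 255.0, signed=False))     # (3, 13)
```

The first call reproduces the 1-5-10 split from the original question; the second shows how a narrow, unsigned range frees up bits for resolution.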

Rene

--

Ing.Buero R.Tschaggelar - http://www.ibrtses.com

& commercial newsgroups - http://www.talkto.net

