# Integer/fixed-point to 32-bit float

• posted

Currently I'm in the process of replacing a custom compass / accelerometer with an ST LSM303D. The 'old' custom device produced single-precision floats. Without parsing the values, they were just passed along inside a UDP packet.

Unfortunately, the LSM303D produces 16-bit signed integers, so the embedded system needs to convert these values to floats. The scaling itself is quite simple: values -32768..32767 need to be scaled to [-2,2).

Now, my hardware has no floating point support. However, doing the following:

```c
float output = ((float)input) / 16384.0f;
```

will require quite a bit of FP magic. I would imagine that this:

```c
const float scale = 1.0f / 16384.0f;
float output = ((float)input) * scale;
```

may be faster, but still requires FP multiply support.

Is there a simple, fast way to convert these integers to floats without the aid of an FP library? I have not found much code on this.

Vincent

• posted

Of course there is; it will cost a few operations. You can look up the IEEE 32-bit floating point format and figure out how to do it. Basically, it will take several shifts left (you will have to count these) of your 16-bit integer until its MSB gets positioned as bit 23; subtract the count of shifts it took to get there from 150 ($96) and insert this result into bits 23 to 32 (this will overwrite the MSB which got to position 23). For negative input numbers, negate first, convert as above, and set bit 31 of the final result to 1.

Disclaimer: this is what I just made up at the moment by memory, I can't guarantee its correctness. It is meant to show you the direction, not to be a 1:1 algorithm you can use. You should understand the FP format yourself and figure out how to do it.
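In the same spirit as the disclaimer above, here is one way that description could look in C. The function name is made up, and zero and the sign are handled separately as the post suggests; treat it as a sketch to check against the format, not a drop-in routine:

```c
#include <stdint.h>

/* Shift the (positive) input left until its most significant set bit
 * sits at bit 23, then store (150 - shift count) in the exponent field
 * (bits 23..30).  Zero and the sign are special-cased.  Sketch only. */
static uint32_t int16_to_float_bits(int16_t input)
{
    uint32_t sign = 0;
    uint32_t m;
    int shifts = 0;

    if (input == 0)
        return 0;                         /* 0.0f is the all-zero bit pattern */
    if (input < 0) {
        sign = 0x80000000u;
        m = (uint32_t)(-(int32_t)input);  /* widen first so -32768 is safe */
    } else {
        m = (uint32_t)input;
    }
    while ((m & 0x00800000u) == 0) {      /* move the MSB up to bit 23 */
        m <<= 1;
        shifts++;
    }
    m &= 0x007FFFFFu;                     /* drop the implicit leading 1 */
    return sign | ((uint32_t)(150 - shifts) << 23) | m;
}
```

This produces the unscaled float (e.g. input 16384 becomes 16384.0f); the 2/32768 scaling discussed later in the thread is a separate exponent adjustment.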

Dimiter

Dimiter Popoff, TGI

• posted

Oops, sorry for the brain-damaged "bits 23 to 32"; it should read "bits 23 to 30", of course (my fingers here _again_ typed 32 and I had to correct it...).

• posted

[Remember to check for a negative input value, convert it to positive, and then later set the sign bit in the FP result]

That converts the value to FP.

To scale it by a factor of 2/32768, you should be able to simply subtract 14 from the exponent.

Don't forget that the exponent is stored in an excess-127 representation (not 2's complement or sign-magnitude).
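As a desktop illustration of that exponent trick (the function name is made up, and it assumes the input started life as a 16-bit integer, so the subtraction can never underflow the exponent field):

```c
#include <stdint.h>
#include <string.h>

/* Dividing an IEEE-754 single by 2^14 is just a subtraction of 14 from
 * the biased exponent (bits 23..30).  Zero must be skipped, since its
 * exponent field is not a real exponent. */
static float scale_by_2pow14(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* type-pun without aliasing trouble */
    if ((bits & 0x7FFFFFFFu) != 0)    /* leave +/-0.0 alone */
        bits -= 14u << 23;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Only integer operations touch the bits; the float type here is just for the desktop test harness.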

Grant Edwards
• posted

... and start by writing a C test program to debug/test on your desktop, where you have the float library available to compare against...

• posted

On 06-Apr-16 at 3:17 PM, Vincent vB wrote:

Reality check: what is the use of having the values in floating point format, if you want to avoid using an FP library? What are you going to do with those values?

Wouter van Ooijen

• posted

Assuming you're starting with an IEEE float containing an integer -32768..32767, you can just subtract 14 from the exponent, *but* you have to handle zero as a special case.

OTOH, that sounds like extra work, just convert it correctly in the first place. Starting with a 16 bit signed integer:

1. handle zero as a special case
2. consider handling -32768 as a special case
3. remove (and save) the sign (IOW, take the absolute value)
4. count the number of leading zeros (lz)
   4a. the result will be 1-15 (0-15 if you didn't special-case -32768)
5. put the result in a 32-bit unsigned integer
6. shift left so the MSB lands at bit 23, i.e. by (8 + lz) places
7. set the 8 exponent bits (30-23) to (127 + (15 - lz))
   7a. note that this overlays one bit from step 6
   7b. approximately: ui32 &= 0x007fffff; ui32 |= (uint32_t)(127 + (15 - lz)) << 23;
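One possible rendering of those steps in C, with the "subtract 14" scaling (noted in the follow-up post) folded into step 7 so the result is input/16384.0 directly. The function name is mine, and -32768 is handled by widening rather than as a separate case; sketch only, verify on your target:

```c
#include <stdint.h>

/* Integer-only int16 -> scaled IEEE-754 single (returned as raw bits). */
static uint32_t scaled_int16_to_float_bits(int16_t input)
{
    uint32_t sign = 0, ui32;
    int lz = 0;

    if (input == 0)                         /* step 1: zero special case */
        return 0;
    if (input < 0) {                        /* steps 2-3: save sign, abs() */
        sign = 0x80000000u;
        ui32 = (uint32_t)(-(int32_t)input); /* widen so -32768 is safe */
    } else {
        ui32 = (uint32_t)input;
    }
    while ((ui32 & 0x8000u) == 0) {         /* step 4: count leading zeros */
        ui32 <<= 1;
        lz++;
    }
    ui32 <<= 8;                             /* steps 5-6: MSB now at bit 23 */
    ui32 &= 0x007FFFFFu;                    /* step 7a: drop the implicit 1 */
    /* step 7, minus the 14 for the 2/32768 scaling */
    ui32 |= (uint32_t)(127 + (15 - lz) - 14) << 23;
    return sign | ui32;
}
```

For example, input 16384 yields the bit pattern of 1.0f, and -32768 yields -2.0f, matching the [-2,2) target range.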
• posted

And duh, subtract the 14 in step 7 above...

• posted

And then write a test program that tests and verifies the conversion function for all 65536 possible input values.

You could try to figure out a minimal set of inputs that provides 100% test coverage, but

a) you'll be wrong[*]

b) with only a 16-bit input space, it'll be faster and easier to just test all possible inputs

[*] At least that's been my experience with stuff like this.
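A desktop harness along those lines might look like this. The `convert` routine here is just a stand-in for whatever you actually implement; the point is the exhaustive loop comparing every input against the host's FP library:

```c
#include <stdint.h>
#include <string.h>

/* Stand-in device-side conversion under test (integer-only). */
static uint32_t convert(int16_t in)
{
    uint32_t sign = 0, m;
    int lz = 0;
    if (in == 0) return 0;
    if (in < 0) { sign = 0x80000000u; m = (uint32_t)(-(int32_t)in); }
    else        { m = (uint32_t)in; }
    while ((m & 0x8000u) == 0) { m <<= 1; lz++; }
    m = (m << 8) & 0x007FFFFFu;
    return sign | m | ((uint32_t)(127 + (15 - lz) - 14) << 23);
}

/* Compare all 65536 inputs against the host's own conversion and
 * scaling.  Returns the number of mismatches (0 on success). */
static int test_all_inputs(void)
{
    int failures = 0;
    for (int32_t i = -32768; i <= 32767; i++) {
        float expect = (float)i / 16384.0f;
        uint32_t expect_bits;
        memcpy(&expect_bits, &expect, sizeof expect_bits);
        if (convert((int16_t)i) != expect_bits)
            failures++;
    }
    return failures;
}
```

Both the division by a power of two and every 16-bit integer are exactly representable in single precision, so the comparison can demand bit-exact equality.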
Grant Edwards
• posted

The OP said that the rest of the system expects the direction values in single-precision floating point format.

-TV
• posted

Presumably the OP is just shipping the FP values out via some interface to another piece of the system that does have FP support of some sort. That's pretty common for embedded firmware in "sensor" type applications: you do all of the data acquisition and signal conditioning using fixed point, and then convert it to IEEE FP before shipping it out the door. On devices like that, often the only FP operation that's available is a custom-written "convert from fixed point to floating point with some hard-wired scaling" function.

Grant Edwards
• posted

Oh come on, the only special case in a 16-bit integer -> 32-bit IEEE FP conversion is 0, which is obvious enough (or will make itself obvious if one tries to shift left and count until finding the 1...).

The scenario where you can be wrong is possible if we throw in all the NaNs and tiny (denormal) values and rounding etc., but these just do not apply to this case; it is straightforward.

Dimiter

• posted

What processor are you using that does not have easy to find floating point software? I would expect that to come with the tools.

IIRC, the IEEE floating point standard has a whole bunch of stuff that you need to do to handle exceptions, but doing your conversion should be pretty direct. You'll need to find the leading significant digit (which will probably be what takes the longest), shift, and then shove mantissa, sign, and exponent into floating point format.

```--
Tim Wescott
Wescott Design Services ```
• posted

Hi Tim, finding the leading 1 does not necessarily take the longest; on Power it is 1 cycle (cntlz, count leading zeroes). The 68020 had "bit field find first one" (I don't remember the exact mnemonic; I did miss it on the CPU32, which was more or less a downgraded '020). I expect ARM has that too in some form (?); it is key to being able to, say, find the largest contiguous block in a bitmap-allocated space.

Dimiter


• posted

CLZ. Available for ARM v.5 and above in full instruction mode (not available in Thumb).

Rob Gaddi, Highland Technology -- www.highlandtechnology.com
• posted

gcc has `int __builtin_clz (unsigned int x)`, also long and long-long versions. These map to whatever is most efficient on your hardware.
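A quick illustration of how the builtin relates to the lz count discussed earlier in the thread. This assumes a 32-bit `unsigned int` (hence the -16 adjustment for a 16-bit magnitude), and note that `__builtin_clz(0)` is undefined, so zero must be excluded:

```c
#include <stdint.h>

/* Leading-zero count of a nonzero 16-bit magnitude, via gcc's builtin
 * on a 32-bit unsigned int (assumption: int is 32 bits on this host). */
static int lz16(uint16_t m)
{
    return __builtin_clz((unsigned int)m) - 16;
}
```

On targets with a hardware count-leading-zeros instruction this compiles to one or two instructions, replacing the shift-and-count loop.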

It's a pity that there's no integer equivalent of ldexp; maybe called ldiexp.

To the OP: If your endian-ness and compiler bit-fields work out, you can use this (works for me on x64 with gcc) for building and breaking float values.

```c
typedef union {
    float f;
    struct {
        uint32_t mantissa : 23;
        uint32_t exponent : 8;
        uint32_t sign     : 1;
    };
} FloatU;
```

Note that building a floating point value like this is likely to be slower than just saying "(float)l" with any decent compiler. But it will help you understand what's going on.
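A small demo of that union in use (the union is repeated here so the snippet compiles on its own; the same layout assumptions apply: little-endian, bit-fields allocated low-to-high, as on x86-64 with gcc):

```c
#include <stdint.h>

typedef union {
    float f;
    struct {
        uint32_t mantissa : 23;
        uint32_t exponent : 8;
        uint32_t sign     : 1;
    };
} FloatU;

/* Decompose -2.0f: sign = 1, biased exponent = 128 (127 + 1),
 * mantissa = 0 (the leading 1 is implicit). */
static void decompose(float x, unsigned *s, unsigned *e, unsigned *m)
{
    FloatU u;
    u.f = x;
    *s = u.sign;
    *e = u.exponent;
    *m = u.mantissa;
}
```

Going the other way (assigning the three fields and reading `u.f`) is exactly the bit-packing the conversion routines in this thread perform by hand.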

Clifford Heath.

• posted

Based solely on the fact that he doesn't have floating point support already, I'm assuming that he's using some dire little 8-bitter.

Which could be horribly wrong, but -- no floating point support?

Tim Wescott
• posted

Tweak the exponent to get the scale to +-2.

-Lasse

• posted

Indeed, I'm afraid I cannot change the output format. The older modules also need to be supported.

• posted

Thanks for the link! Very informative
