Integer/Fixedpoint to 32 bit float

Vincent vB · 2016-04-06T13:17:05+00:00

Currently I'm in the process of replacing a custom compass / accelerometer with an ST LSM303D. The 'old' custom device produced single-precision floats. These values were not parsed; they were just passed along inside a UDP packet. Unfortunately the LSM303D produces 16-bit signed integers, so the embedded system needs to convert these values to floats. The scaling itself is quite simple: values -32768..32767 need to be scaled to [-2, 2).

Now, my hardware has no floating-point support. Doing the following:

float output = ((float)input) / 16384.0f;

will require quite a bit of FP magic. I would imagine that this:

const float scale = 1.0f / 16384.0f;
float output = ((float)input) * scale;

may be faster, but it still requires FP multiply support.

Is there a simple and fast way to convert these integers to floats without the aid of an FP library? I have not found much code in this respect.

Vincent

Vincent vB 10 years ago

I'll start with a 16 bit signed integer. Thanks for your step-by-step explanation!

Vincent vB 10 years ago

Well, it's not really a microcontroller. It is a LatticeMico32 on a Xilinx FPGA. I think it's a big-endian processor, so this may work out. I've tried doing the (float)l, but then the LM32 compiler really attempts to convert the integer into a float. Horrible tricks like this would work (except for the compiler screaming 'murder and fire', as we say in Dutch):

uint32_t l = ...; float f; f = *(float*)&l;
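A pointer cast like the one above breaks C's strict-aliasing rules, which is probably what the compiler objects to. A minimal sketch of the usual alternative is to copy the bits with memcpy. The helper name u32_bits_to_float is made up for illustration, and the sketch assumes float is a 32-bit IEEE 754 single:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float without
 * converting the value. Assumes sizeof(float) == 4 and IEEE 754 layout. */
static inline float u32_bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);  /* copies bytes; no int-to-float conversion */
    return f;
}
```

With optimisation enabled, gcc usually reduces a fixed-size memcpy like this to a plain move, so it should not cost a library call.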

David Brown 10 years ago

You handle this by writing:

float output = ((float) input) * 2.0f / 32768;

Then you let the /compiler/ generate code that works. Ignore everyone here who has suggested "count leading ones", "handle 0 as a special case", "subtract 14 from the exponent", etc. That is not your job - other people have already figured out this stuff long ago, and the bugs have been ironed out.

The LatticeMico32 compiler is gcc. Use the "-ffast-math" option to tell it that you are happy with some obvious code rearrangements rather than insisting on perfect IEEE operation - this lets you write "* 2.0f / 32768" to clearly express your intent in the code, while the /compiler/ turns it into a single multiplication by the constant (2.0f / 32768.0f).

Write your code clearly and correctly, and let the tools do the work.

Then all you need to do is make sure that you give the tools the best chance to generate fast code (such as -O2 -ffast-math, and whatever LM32 flags such as -mbarrel-shift-enabled and -mmultiply-enabled are appropriate for your particular cpu).
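As a minimal sketch of the approach described above (the function name scale_i16 is made up for illustration): writing the scale factor as one parenthesised constant lets the compiler fold it at compile time, so the run-time work is one integer-to-float conversion and one multiply. On a CPU without an FPU, both are expected to become soft-float library calls.

```c
#include <stdint.h>

/* Scale a raw 16-bit sample to the range [-2, 2).
 * (2.0f / 32768.0f) is a constant expression, so no division is
 * expected at run time. */
float scale_i16(int16_t input)
{
    return (float)input * (2.0f / 32768.0f);
}
```

For example, scale_i16(16384) gives 1.0f and scale_i16(-32768) gives -2.0f.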

Dimiter_Popoff 10 years ago

Are you saying this will work without the compiler bringing in an FP library?

Dimiter

David Brown 10 years ago

Almost certainly the compiler will bring in parts of its FP library. But as long as you have /any/ floating point in the code, that is usually the case anyway. And assuming your library is constructed reasonably, you will only get the required functions linked in, not the entire library.

Of course it would be possible to write a dedicated and optimised function to handle this conversion from integer to floating point, combined with scaling. But you would be doing an enormous amount of work in order to save a few KB of code space and/or a few microseconds of run time - not to mention the significant effort in testing, the risk of code having bugs or portability issues, and the maintenance effort when the scale factors change.

Therefore my advice is to write the code simply, cleanly and in an obviously correct manner. Understand your tools and how to help them generate optimal code. And then get on with the rest of the project, having handled this task in a few minutes rather than days.

Dimiter_Popoff 10 years ago

The question you replied to was how to do the conversion _without_ bringing in an FP library.

Dimiter

Vincent vB 10 years ago

I finally came to this code:

===
#include <stdint.h>
#include <stdbool.h>

int clz(uint32_t val)
{
    int t = 0;
    if ((val & 0xFFFF0000) == 0) t += 16; else val >>= 16;
    if ((val & 0x0000FF00) == 0) t += 8; else val >>= 8;
    if ((val & 0x000000F0) == 0) t += 4; else val >>= 4;
    if ((val & 0x0000000C) == 0) t += 2; else val >>= 2;
    if ((val & 0x00000002) == 0) t += 1;
    return t;
}

static inline float castU32ToFloat(uint32_t f)
{
    void * v = &f;
    return *((float *)v);
}

float fltFromI16(int16_t val, int fBits)
{
    bool sign;
    uint32_t ival;
    int zeros;

    if (val == 0) return 0.0f;

    if (val < 0) { ival = -val; sign = true; }
    else { ival = val; sign = false; }

    zeros = clz(ival) - 16;

    ival = ival

Vincent vB 10 years ago

There is an error in this line:

> ival = ival

ival = (ival
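The posted code is cut off, so the following is not a reconstruction of it. It is a separate minimal sketch of the same idea, assuming IEEE 754 single precision; the names i16_to_float_bits and i16_to_float are made up for illustration. It finds the highest set bit, derives the biased exponent from its position, and shifts the remaining bits into the 23-bit mantissa field. A 16-bit input always fits in the mantissa, so no rounding is needed.

```c
#include <stdint.h>
#include <string.h>

/* Build the IEEE 754 single-precision bit pattern for val / 2^frac_bits
 * using integer operations only. Assumes frac_bits keeps the biased
 * exponent in the normal range (true for frac_bits = 14). */
static uint32_t i16_to_float_bits(int16_t val, int frac_bits)
{
    if (val == 0)
        return 0;                                /* +0.0f */

    uint32_t sign = (val < 0) ? 0x80000000u : 0u;
    /* Negate as a 32-bit int so that -32768 becomes 32768 without overflow. */
    uint32_t mag = (val < 0) ? (uint32_t)(-(int32_t)val) : (uint32_t)val;

    /* Position of the highest set bit, 0..15. */
    int msb = 15;
    while ((mag & (1u << msb)) == 0)
        msb--;

    /* The value is 1.xxx * 2^(msb - frac_bits); add the exponent bias. */
    uint32_t exponent = (uint32_t)(127 + msb - frac_bits);

    /* Move the leading 1 up to bit 23, then mask it off (implicit bit). */
    uint32_t mantissa = (mag << (23 - msb)) & 0x007FFFFFu;

    return sign | (exponent << 23) | mantissa;
}

/* Same conversion, returned as a float by copying the bits. */
static float i16_to_float(int16_t val, int frac_bits)
{
    uint32_t bits = i16_to_float_bits(val, frac_bits);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With frac_bits = 14 this maps -32768..32767 to [-2, 2). Since the values are only forwarded in a UDP packet, the uint32_t pattern could probably be written into the packet directly, in whatever byte order the receiver expects.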

David Brown 10 years ago

The OP said that his cpu had no hardware floating point, and then he said he wanted to do the conversion "without the aid of an FP library". And yes, my recommendation uses floating point library code (technically it is part of the compiler language support library, rather than being part of the standard C library or other library, but it is still library code).

I should really have first asked the OP exactly why he requires the code without using the compiler's library. Usually when people say they don't want an FP library, it is because they have a fixed idea that software FP is always big and slow - but they have not properly considered or tested whether it is /too/ big or /too/ slow for the job, nor thought enough about the complications (size, time, development effort and risk) of alternatives. Of course, it may be that the OP /has/ worked through this and concluded that even the small amount of library code needed for the conversion is too large.

Vincent vB 10 years ago

I wrote a test, creating the fltFromI16 using floats, as suggested by expert David Brown.

I think I wrote my code clearly and correctly and let the /compiler/ do the work, with the following additional objects as result:

/libgcc.a(_mul_sf.o)
/libgcc.a(_div_sf.o)
/libgcc.a(_si_to_sf.o)
/libgcc.a(_thenan_sf.o)
/libgcc.a(_muldi3.o)
/libgcc.a(_lshrdi3.o)
/libgcc.a(_clzsi2.o)
/libgcc.a(_pack_sf.o)
/libgcc.a(_unpack_sf.o)
/libgcc.a(_mulsi3.o)
/libgcc.a(_udivmodsi4.o)
/libgcc.a(_clz.o)

Used GCC flags that matter:

-mbarrel-shift-enabled -mmultiply-enabled -msign-extend-enabled -Os -ffast-math

'The tools' also required 2736 more bytes for the same task than my highly flawed and inferior code.

Vincent

David Brown 10 years ago

Whether your own code is flawed or not depends on whether or not it works, and what coding standards or practices you follow. And whether it is inferior or not depends on your requirements - some would say that baking the scaling constants into a somewhat complicated manually optimised function is inferior, while others would say that requiring a couple of extra KB of library code is inferior. I can't tell you what /you/ need for /your/ design.

But I can ask you - are you so short on code space that this 2.7 KB is a significant issue? Or is it just that it "feels wrong" to "waste" this space? If it is the former (which is not unlikely if the code is in "rom" inside a small FPGA), then that's fine. If it is the latter, then think about what is really the /best/ solution for you, the project, and the customer.

Tim Wescott 10 years ago

That gives you the template, but not the values to stick therein.

If it's on a Xilinx, and if the processor doesn't have a clz instruction, and if it's really not fast enough to do it "by hand", how about making that functionality out in FPGA-land? You could either do it as an extension to the instruction set (if Lattice makes it easy), or as a peripheral that coughs up an answer somewhere in the memory map when the operand is written somewhere in the memory map.

This seems to be a fairly easy thing to do in an FPGA; I'd be tempted to make an entire int-to-float converter "out there" if it really needed to be that fast.

www.wescottdesign.com

Dimiter_Popoff 10 years ago

I spent a few minutes to do it in my vpa (for power architecture), takes 60 bytes, destroys 2 registers. Might be some help, here it is:

formatting link

Nobody 10 years ago

If you have 256 kiB to spare, you could use a lookup table.

The following should work on any 32-bit architecture. count_bits() can be optimised, possibly to a single instruction.

static int count_bits(unsigned int x)
{
    for (int i = 0; i < 32; i++)
        if (x>>i == 0)
            return i;
    abort();
}

static void int_to_float(int x, float *result)
{
    if (x == 0) {
        *(uint32_t*)result = 0;
        return;
    }
    int negative = x < 0;
    if (negative)
        x = -x;
    int exponent = count_bits(x) - 1; // 0 .. 15
    uint32_t bits = x
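The lookup-table option mentioned at the top of this post could be sketched as follows. This is an illustration under assumptions, not code from the thread: the names lut, lut_init and lut_i16_to_float_bits are made up, and the table is filled at start-up with floating-point code only to keep the example short. On a target without FP support the table would more likely be generated offline on a host and linked in as constant data.

```c
#include <stdint.h>
#include <string.h>

/* 65536 entries * 4 bytes = 256 KiB of float bit patterns. */
static uint32_t lut[65536];

/* Hypothetical generator: meant to run where floating point is available
 * (a host PC or a build step), not on the FP-less target. */
static void lut_init(void)
{
    for (int v = -32768; v <= 32767; v++) {
        float f = (float)v / 16384.0f;            /* scale to [-2, 2) */
        memcpy(&lut[(uint16_t)v], &f, sizeof f);  /* store the raw bits */
    }
}

/* Run-time conversion: one indexed load, no arithmetic on the value. */
static inline uint32_t lut_i16_to_float_bits(int16_t v)
{
    return lut[(uint16_t)v];
}
```

The index is the sample reinterpreted as an unsigned 16-bit number, so negative samples land in the upper half of the table.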
