While those who make heavy use of floating-point computations in embedded systems probably go for chips with hardware floating-point support, there are plenty of systems without such support out there. And the existence of soft-float implementations indicates that there is a real need for floating-point computation on systems without hardware support. I'd like to know a bit more about that.
What kind of computations do people use soft float for? Any real-world examples?
Which types of soft float do they use? IEEE 32- or 64-bit? Some other C standard-compliant type? 16-bit IEEE? Other 16-bit?
On 20.5.2018 12:56, Philipp Klaus Krause wrote:
I can only speak for myself, of course, but here goes.
Some 20 years ago, on our first Nukeman, I needed floating point. The need came late in the project: during spectrum evaluation one has to calculate isotope decay during the time the acquisition took place (some isotopes have short half-lives, so this matters a lot). This took e^x or something like that, and the processor was an MC68340, a CPU32 core (sort of a downgraded 68020) with no FPU. So I wrote the FPU functions I needed, and just those: basic arithmetic plus the exponential, I believe, and maybe something else I don't remember at the moment. The format was the one the 68K series used, called "extended precision", 96 bits. However, the precision was far, far from that: the 68340 had a 32*32->64 bit multiplication, so I left the mantissa precision at just 32 bits, which was plenty.

Later, when our spectrometers (multichannel analyzers) moved to Power Architecture, I had to deal with FP emulation again. Power has a very nice and fast FPU, but it only does the fundamentals, i.e. mul, div, add, sub etc. Well, and FMAD (multiply/add) of course, which was crucial (it could do what a 5420 DSP used to do formerly). I had to do square root, e^x etc. again (so the old 68340 code would run when compiled with vpa). I also had to handle the .x (extended precision, 96-bit) format, since the FPU on the PPC core does standard 64-bit FP; again, the precision I implemented was that of a 64-bit IEEE number, just a format conversion without extending the mantissa precision.

A few more years later I did a sort of complete FP library. I was not in dire need of it, but I suppose I thought a respectable OS like DPS, and the entire environment, would be less respectable without one. I did plenty: sin, sinh, log etc., I don't remember it all now. It comes in handy though, and is often used in shell scripts too...
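To illustrate what "32-bit mantissa with a 32*32->64 multiply" buys you, here is a minimal sketch of a soft-float multiply in that spirit. The `sf32` type, its field layout and `sf32_mul` are hypothetical (this is not the poster's actual code); the result is truncated rather than rounded, and overflow, infinities and NaNs are not handled.

```c
#include <stdint.h>

/* Hypothetical working format: value = (-1)^neg * (mant / 2^31) * 2^exp,
   with mant normalized to [2^31, 2^32) so that mant / 2^31 is in [1, 2).
   mant == 0 represents zero. */
typedef struct {
    uint8_t  neg;
    int32_t  exp;
    uint32_t mant;
} sf32;

/* Multiply using a single 32x32->64 bit integer product.
   Truncates instead of rounding; exponent overflow is not checked. */
static sf32 sf32_mul(sf32 a, sf32 b)
{
    sf32 r;
    r.neg = (uint8_t)(a.neg ^ b.neg);
    if (a.mant == 0 || b.mant == 0) {
        r.exp = 0;
        r.mant = 0;
        return r;
    }
    /* Product of two values in [2^31, 2^32) lies in [2^62, 2^64). */
    uint64_t p = (uint64_t)a.mant * b.mant;
    if (p >> 63) {
        /* Significand product in [2, 4): shift one extra bit, bump exponent. */
        r.mant = (uint32_t)(p >> 32);
        r.exp = a.exp + b.exp + 1;
    } else {
        /* Significand product already in [1, 2). */
        r.mant = (uint32_t)(p >> 31);
        r.exp = a.exp + b.exp;
    }
    return r;
}
```

For example, 1.5 is {0, 0, 0xC0000000}; squaring it gives {0, 1, 0x90000000}, i.e. 1.125 * 2^1 = 2.25. Storing such a value in a 96-bit extended layout is then only a matter of packing the fields.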
How much precision did you even need? If you have a regular clock tick and not too many decays in parallel, you can do exponential decay just by subtracting some fraction (shift and subtract) on each cycle.
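A minimal sketch of the shift-and-subtract decay, assuming an unsigned 32-bit state and a decay factor of the form 1 - 2^-k (the function name and parameters are illustrative only):

```c
#include <assert.h>
#include <stdint.h>

/* One tick of exponential decay by shift and subtract:
   x <- x - x/2^k, i.e. x *= (1 - 2^-k) per tick.
   After n ticks x is about x0 * exp(-n / 2^k), so the half-life is
   roughly 2^k * ln 2 ticks (about 177 ticks for k = 8). */
static uint32_t decay_tick(uint32_t x, unsigned k)
{
    assert(k > 0 && k < 32);
    return x - (x >> k);
}
```

Because the shift truncates, decay stops once x drops below 2^k; keeping some extra fraction bits in x (scaling it up) delays that point.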
Most recently, I replaced a division with a multiplication by a precomputed fraction to speed up the calculation of DDS delta words. The master clock is fixed, and so is the calibration factor, so you can calculate how many bits of headroom you have and do the division as a multiplication followed by a shift back by the headroom amount. That's a kind of floating point.
Even a crappy little Arduino UNO can do that in 100 µs, which isn't bad for what is effectively a 64-bit divide. On average, the error is better than 1/10 LSB compared to the full divide.
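A sketch of the same idea (not the poster's code), assuming a 32-bit phase accumulator and a hypothetical fixed 125 MHz master clock. The tuning word f_out * 2^32 / f_clk is computed once with a real divide for reference and once with a precomputed reciprocal plus a shift:

```c
#include <stdint.h>

/* Assumed, illustrative numbers: 32-bit phase accumulator and a fixed
   125 MHz master clock. Tuning word = f_out * 2^32 / F_CLK_HZ. */
#define F_CLK_HZ   125000000ULL
#define ACC_BITS   32
#define HEADROOM   30   /* extra fraction bits kept in the reciprocal */

/* Reciprocal computed once: 2^(ACC_BITS + HEADROOM) / F_CLK_HZ.
   A fixed calibration factor could be folded into this constant too. */
static const uint64_t RECIP = (1ULL << (ACC_BITS + HEADROOM)) / F_CLK_HZ;

/* Reference: the full 64-bit divide. Requires f_out_hz < F_CLK_HZ. */
static uint32_t tuning_word_div(uint32_t f_out_hz)
{
    return (uint32_t)(((uint64_t)f_out_hz << ACC_BITS) / F_CLK_HZ);
}

/* Fast path: one 64-bit multiply and a shift back by the headroom.
   f_out_hz must stay below F_CLK_HZ so the product fits in 64 bits. */
static uint32_t tuning_word_mul(uint32_t f_out_hz)
{
    return (uint32_t)(((uint64_t)f_out_hz * RECIP) >> HEADROOM);
}
```

Since RECIP is truncated, the fast result is either equal to the divided one or one LSB lower; with 30 bits of headroom the underlying error stays around a tenth of an LSB even near the top of the range. On an 8-bit AVR the 64-bit multiply is still done in software, but it is typically much cheaper than a 64-bit division.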
It is not so simple, far from it really. During spectrum acquisition, when you have a "clock tick" you don't know what is in the spectrum, and acquisition times of many hours, or days, are normal. So all you have is the acquisition time, and the half-lives once you have identified the isotopes. Then the operator needs to be able to calculate activities referenced to any particular moment, not the moment of acquisition. Etc. etc. Steering away from FP makes sense as long as you don't have an FPU on chip, of course, but sometimes using FP, even emulated, is just the easiest way to go. Back then, before writing the evaluation mentioned, I had managed to do curve fitting to spectrum peaks using fixed point; it works quite well to this day, although I have migrated some (not sure how much) of the arithmetic to FP since I have an FPU anyway.
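For reference, the textbook decay-correction relations look roughly as follows, assuming a single isotope with half-life T_{1/2}, counting efficiency ε (including emission probability), no dead-time losses, and N net peak counts collected over an acquisition starting at t_1 and lasting t_c. This is a generic sketch, not a description of what the Nukeman software actually does:

```latex
\begin{aligned}
\lambda &= \frac{\ln 2}{T_{1/2}} \\
N &= \varepsilon \int_{t_1}^{t_1 + t_c} A(t_1)\, e^{-\lambda (t - t_1)}\, dt
   = \frac{\varepsilon A(t_1)}{\lambda} \left(1 - e^{-\lambda t_c}\right) \\
A(t_1) &= \frac{\lambda N}{\varepsilon \left(1 - e^{-\lambda t_c}\right)} \\
A(t_{\mathrm{ref}}) &= A(t_1)\, e^{-\lambda (t_{\mathrm{ref}} - t_1)}
\end{aligned}
```

The last line works for a reference time before or after the acquisition, and every such correction needs e^x over a wide argument range. When λ t_c is small, 1 - e^{-λ t_c} suffers cancellation, which is where an expm1-style function helps.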
Any C compiler will provide software floating point for IEEE 32-bit "float". Most also provide 64-bit "double" support, but for 8-bit targets it is not unusual to have non-standard 32-bit "double".
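If code depends on the width of double, a compile-time check against <float.h> catches the short-double case early. A sketch assuming a C11 compiler (for _Static_assert); the function name is illustrative:

```c
#include <float.h>

/* IEEE binary64 has a 53-bit significand, binary32 has 24. ISO C requires
   DBL_DIG >= 10, which a 32-bit "double" (about 6 decimal digits) cannot
   meet, so this refuses to build on such targets. */
_Static_assert(DBL_MANT_DIG >= 53, "double is narrower than IEEE binary64 on this target");

/* Runtime variant, e.g. for a self-test or a diagnostic printout. */
static int double_is_binary64_sized(void)
{
    return DBL_MANT_DIG == 53 && sizeof(double) == 8;
}
```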
I would expect the vast majority of software floating point usage to come directly from a C compiler. A few people write their own functions optimised for their particular needs (such as unusual accuracy requirements), but that is rare. For the basic arithmetic, you are unlikely to write significantly better general-purpose floating point functions than the ones provided by the compiler, and the compiler's versions have the huge advantage of convenience.
Maths functions (like trig functions) are a different matter. In embedded systems, even those with floating point hardware, it is not uncommon to have your own functions with a more appropriate balance between speed and accuracy than the slow but IEEE bit-perfect standard functions.
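As an illustration of trading accuracy for speed, here is a sketch of a sine good to roughly 2e-4. The name `fast_sinf` and the use of a plain Taylor polynomial are assumptions; a minimax polynomial of the same degree would do better.

```c
/* Low-accuracy sine: range reduction to [-pi/2, pi/2] followed by the
   Taylor polynomial up to x^7. Absolute error is below about 2e-4,
   plenty for, say, a 10-bit PWM (1/1024 is about 1e-3).
   Only meant for moderate |x|: the reduction uses a float multiply
   and a conversion to long. */
#define PI_F      3.14159265f
#define HALF_PI_F 1.57079633f
#define TWO_PI_F  6.28318531f

static float fast_sinf(float x)
{
    /* Reduce to about [-pi, pi] by subtracting the nearest multiple of 2*pi. */
    long k = (long)(x / TWO_PI_F + (x >= 0.0f ? 0.5f : -0.5f));
    x -= (float)k * TWO_PI_F;

    /* Fold into [-pi/2, pi/2] using sin(pi - x) = sin(x). */
    if (x > HALF_PI_F)
        x = PI_F - x;
    else if (x < -HALF_PI_F)
        x = -PI_F - x;

    /* x - x^3/3! + x^5/5! - x^7/7!, evaluated in Horner form in x^2. */
    float x2 = x * x;
    return x * (1.0f + x2 * (-1.0f / 6.0f + x2 * (1.0f / 120.0f + x2 * (-1.0f / 5040.0f))));
}
```

The worst case is at the fold point |x| = pi/2, where the first omitted term (pi/2)^9/9! is about 1.6e-4.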
What is so "bit perfect" about IEEE floats? The only thing that I can think of is the handling of infinity, denorms and other special values. It seems that the IEEE standard committee thought it was completely OK to have INF intermediate results and still continue doing further calculations with such sick values.
Some "IEEE" soft libraries handle ordinary values quite well, but might not handle these special cases properly.
In an embedded system you usually try to clean up the mathematics before coding, in order to avoid division by zero or extremely small (less than 1E-38) intermediate results.
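A minimal sketch of that "clean before you compute" approach, assuming the caller has a sensible fallback when a ratio is rejected (the function name is hypothetical):

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* IEEE arithmetic lets 1/0 become INF and INF - INF become NaN, and both
   then propagate silently through later calculations. This helper refuses
   to produce them: it rejects zero or subnormal divisors and results that
   are not finite, returning false without writing *out. */
static bool checked_ratio(float num, float den, float *out)
{
    if (den < FLT_MIN && den > -FLT_MIN)   /* zero or subnormal divisor */
        return false;
    float q = num / den;
    if (!isfinite(q))                      /* overflow, or NaN inputs */
        return false;
    *out = q;
    return true;
}
```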
Before IEEE floats, each computer manufacturer had its own floating point representation. Some used a sign/magnitude significand while others used 2's complement; some used a biased exponent while others used a 2's complement or sign/magnitude exponent; some used base 2 with or without hidden-bit normalization while others used base 16. Some had one more bit in the significand, while others used an extra exponent bit to extend the range beyond 1E38. Converting computer tapes containing binary floats was a mess: usually the data on the sending machine was converted to printable ASCII (or EBCDIC) characters, like the string "1234.56", and on the receiving side it was converted to binary in the other system's format. Of course, this meant a loss of 1-3 least significant bits.
If you could do this directly in binary, you could earn wizard status :-). But a special conversion was still required for each pair of computer manufacturers.
When the IEEE format was adopted by most manufacturers, most of these conversion problems were solved. If some still used a proprietary float format, the manufacturer provided conversion between their private format and IEEE. Conversion between two private formats was now done via IEEE, losing only about 1 bit of accuracy, compared to 1-3 bits via printable decimals. This century, I have only needed to do one such binary conversion, for some ancient hardware.
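As a concrete example of a base-16 format, the IBM System/360 single-precision layout is 1 sign bit, a 7-bit excess-64 exponent giving a power of 16, and a 24-bit fraction 0.f. Because 16 is a power of two, every such value converts to IEEE double exactly. A sketch (the function name is hypothetical; narrowing the result to float can still overflow or lose bits):

```c
#include <stdint.h>

/* Decode an IBM System/360 single-precision hexadecimal float:
   value = (-1)^sign * 0.f * 16^(exp - 64), with a 24-bit fraction f. */
static double ibm32_to_double(uint32_t w)
{
    int sign = (int)((w >> 31) & 1);
    int exp16 = (int)((w >> 24) & 0x7F) - 64;   /* power of 16 */
    uint32_t frac = w & 0x00FFFFFFu;            /* 0.f as a 24-bit integer */

    double v = (double)frac / 16777216.0;       /* frac * 2^-24, exact */

    /* Scale by 16^exp16 without libm; each step is exact in binary
       and the smallest result (2^-280) is well within double range. */
    while (exp16 > 0) { v *= 16.0; exp16--; }
    while (exp16 < 0) { v /= 16.0; exp16++; }

    return sign ? -v : v;
}
```

For example, 0x41100000 decodes to 1.0 and 0xC276A000 to -118.625. Hex normalization leaves up to three leading zero bits in the fraction, which is why base-16 formats have "wobbling" precision.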
In practice, it is not a big issue using any proprietary (non IEEE) float library internally, unless you need to handle INF or other special values or need to communicate binary floats to/from the external world.
Exactly how rounding and intermediate values interact is a sticky issue. For example, the old Intel 8087 had 80-bit internal FP registers. If you do a series of floating point operations in the most obvious/efficient way (with 80-bit intermediate values), the result you get might not be IEEE "bit-perfect" correct.
Grant Edwards
Not only that, but the x87 has a very limited 8-level stack, which can overflow on very complex expressions.
Consider a situation in which some FP calculations are done in the main program, and the expression calls a function that also does some complex FP calculations on its parameters.
If the function is defined in the same module before the main program, the compiler _might_ (or might not, depending on optimizer settings) inline the function and do all the calculation on the 80-bit hardware stack. However, if the function is in an external module, the compiler can't inline it, so some parameters will be passed as 32-bit values.
Of course, the same problem exists if 64-bit IEEE is used internally and some intermediate values are passed as 32-bit IEEE.
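A small sketch of the effect, using double vs float intermediates (the same thing happens with 80-bit x87 intermediates vs 64-bit double). Function names are illustrative, and `add_in_float` merely stands in for a function compiled in another module:

```c
/* Stands in for a function in another module: its float parameters and
   float return value force rounding to 32 bits. */
static float add_in_float(float a, float b)
{
    return a + b;
}

/* Intermediate kept in double: with big = 1e8 and small = 1,
   1e8 + 1 is exact in double, so the result is 1. */
static double wide_path(float big, float small)
{
    return ((double)big + small) - big;
}

/* Intermediate passed through a 32-bit float: near 1e8 adjacent floats
   are 8 apart, so 1e8 + 1 rounds back to 1e8 and the 1 is lost.
   volatile forces a real 32-bit store even where the compiler would
   otherwise keep extra precision in registers (e.g. on x87). */
static double narrow_path(float big, float small)
{
    volatile float t = add_in_float(big, small);
    return (double)t - big;
}
```

With big = 1e8f and small = 1.0f, wide_path returns 1.0 while narrow_path returns 0.0.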
However, for typical embedded applications this might not be much of an issue, since both ADCs and DACs are typically 8, 12 or 16 bits. In practice, the intermediate precision only becomes critical when accumulating long series of values. Even in such cases, choosing the order in which values are accumulated (such as accumulating every millisecond into an hourly sum and then accumulating hours into months, instead of accumulating millisecond values directly into monthly sums) will solve the problem.
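A toy sketch of why the accumulation order matters, assuming float accumulators evaluated in float precision (FLT_EVAL_METHOD 0, as on typical SSE or soft-float targets) and a constant sample value so the exact answer is known. Names and the block size are illustrative:

```c
/* Toy model: n identical samples (e.g. one per millisecond) summed
   straight into a float total. */
static float sum_direct(float sample, long n)
{
    float total = 0.0f;
    for (long i = 0; i < n; i++)
        total += sample;
    return total;
}

/* Same data, accumulated in two stages: samples into a short-period
   partial sum, then partial sums into the long-term total. */
static float sum_two_level(float sample, long n, long block)
{
    float total = 0.0f, partial = 0.0f;
    long in_block = 0;
    for (long i = 0; i < n; i++) {
        partial += sample;
        if (++in_block == block) {
            total += partial;
            partial = 0.0f;
            in_block = 0;
        }
    }
    return total + partial;
}
```

With 20,000,000 samples of 1.0f, sum_direct stops at 16777216 (2^24, where adding 1 no longer changes a float), while sum_two_level with a block of 1000 returns exactly 20000000.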
All true. Soft FP isn't as bad as some may think. But in the specific situation I mentioned, the integer-based approach is still many times faster. And I've made good use of exponential decay (IIR filters) done with integers too. It's just another tool in the math toolbox.
IEEE specifies rounding, errors and limits for all sorts of operations. The idea is that if you have two IEEE-compatible implementations (hardware, software, doesn't matter) and pick the same rounding settings, you will get exactly the same results (or perhaps an LSB out) for many operations.
In particular, if you have a maths library that is IEEE compliant and use a function like "sin", you will get an accuracy to within a bit or two. Following C standards, IEEE standards, and common practice, the "sin" function is typically done at 64-bit double resolution - thus your "sin" function will be bit-perfect to 52 bits. This is, of course, utterly pointless if you are using that function for driving a motor and have a 10-bit PWM resolution.
Using IEEE makes it far simpler to work with tools, because they all give the same results. In particular, it means your compiler (if it is smart enough, and the right optimisations are enabled) can often do calculations at compile time, knowing the results will be the same as at run time.
Very often you only actually need /approximate/ IEEE compatibility. For gcc (and clang), the flag "-ffast-math" is very useful - it tells the compiler that you don't care about infinities, denormals, and other such awkwardness, and you are happy to assume the floating point maths is associative, commutative, etc. This can make results vary marginally depending on the optimisation details, but can give you much smaller and faster code.
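Compensated (Kahan) summation is a classic example of code that relies on the compiler not reassociating: algebraically the correction term c is always zero, so once the compiler may treat floating point as associative it is allowed to simplify c away. A sketch using a constant-sample toy model (the name is illustrative):

```c
/* Kahan summation of n copies of `sample`. Correct only if the compiler
   evaluates (t - sum) - y exactly as written; value-changing
   optimizations such as those enabled by -ffast-math may remove the
   compensation and degrade this to a plain running sum. */
static float kahan_sum(float sample, long n)
{
    float sum = 0.0f;
    float c = 0.0f;              /* running compensation for lost low bits */
    for (long i = 0; i < n; i++) {
        float y = sample - c;
        float t = sum + y;       /* low-order bits of y may be lost here */
        c = (t - sum) - y;       /* recover what was lost (negated) */
        sum = t;
    }
    return sum;
}
```

Built without reassociating optimizations, this returns close to 20,000,000 for 20,000,000 samples of 1.0f, where a plain float running sum would stall at 16,777,216.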
You still need to write your own transcendental functions if you need speed, however.