In industrial control systems, 8-16 additional bits are often transferred with the actual measurement all the way from the sensor throughout the system to an operator display or controller. These extra bits are often called data quality or fault bits. Such bits could indicate sensor cable open/shorted, out of range, etc., and an overflowing intermediate calculation could add an overflow bit to this bit mask.

The final data user then has to determine how to react to this data. On the operator's display, questionable data could be shown in a different colour, or a controller could discard questionable data from a control loop.
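
A measurement-plus-quality-bits record of this sort might be sketched in C as follows; the particular bit assignments and the names (QUAL_*, measurement_t) are illustrative, not from any fieldbus standard:

```c
#include <stdint.h>

/* Hypothetical quality/fault bit assignments -- illustrative only,
   not from any particular standard. */
enum {
    QUAL_CABLE_OPEN    = 1u << 0,
    QUAL_CABLE_SHORTED = 1u << 1,
    QUAL_OUT_OF_RANGE  = 1u << 2,
    QUAL_OVERFLOW      = 1u << 3,   /* set by an overflowing intermediate calc */
};

typedef struct {
    int32_t  value;    /* the measurement itself */
    uint16_t quality;  /* OR of QUAL_* bits; 0 means "good" */
} measurement_t;

/* The final data user decides how to react: e.g. a control loop
   drops questionable data, a display shows it in a different colour. */
static int usable_for_control(const measurement_t *m)
{
    return m->quality == 0;
}
```

The point of the struct is that the quality bits ride along with the value through every hop, instead of being stripped off at the sensor interface.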

In the Harrisburg case, there was a lot of confusion about which sensors produced reliable values and which didn't. In some situations, the quality of the data may be even more important than the actual value.

Now the question is: is it sufficient to code some special values into the FP representation, or to add one or two lines in an FPGA implementation?

If not, then there's no need to wire it up. It seems to me that some applications would do best with saturating arithmetic, where overflow generates the largest value and underflow generates zero. That is probably better than wrapping.
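
A minimal sketch of that behaviour for unsigned 16-bit arithmetic in C (function names are mine): overflow clamps to the largest representable value, underflow clamps to zero, instead of wrapping around.

```c
#include <stdint.h>

/* Saturating unsigned add: clamp overflow to the maximum. */
static uint16_t sat_add_u16(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;          /* widen so the true sum fits */
    return s > UINT16_MAX ? UINT16_MAX : (uint16_t)s;
}

/* Saturating unsigned subtract: clamp underflow to zero. */
static uint16_t sat_sub_u16(uint16_t a, uint16_t b)
{
    return a > b ? (uint16_t)(a - b) : 0;
}
```

Compare with the wrapping alternative, where 60000 + 10000 would silently come out as 4464.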

My favorite for FPGA is the systolic array. Some algorithms are easy to convert to systolic array form, others not so easy.

For S/360 style (still available with z/Architecture) radix 16, they would only come from Add Unnormalized and Subtract Unnormalized. For other instructions, on post-normalization underflow they either generate zero or wrap the exponent and interrupt, depending on a mask bit.

For S/360 style add/subtract, prenormalization is done based on the exponent value and not the bits. For the Fortran AINT function (truncate to integer, but keep in floating point form) you just add X'47000000', that is, 0*16**7. During prenormalization the appropriate bits will be shifted out, zeros added, and the result is then post-normalized.

Note also how easy it is to read S/360 style floating point values in a hex dump. The first (leftmost) two hex digits are the sign and seven-bit biased exponent, then six or 14 hex digits of fraction in base 16.
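
For illustration, decoding such a 32-bit word from a dump can be sketched in C (the function name is mine): sign bit, 7-bit exponent biased by 64 as a power of 16, then a 24-bit fraction in [0, 1).

```c
#include <math.h>
#include <stdint.h>

/* Decode an S/360-style single precision hex float. */
static double s360_to_double(uint32_t w)
{
    int    sign = (int)(w >> 31) & 1;
    int    exp  = (int)((w >> 24) & 0x7F) - 64;            /* power of 16 */
    double frac = (double)(w & 0x00FFFFFFu) / 16777216.0;  /* fraction / 2^24 */
    double v    = ldexp(frac, 4 * exp);                    /* frac * 16^exp */
    return sign ? -v : v;
}
```

So X'41100000' reads off directly as +, exponent 16^1, fraction 0.1 hex, i.e. 1.0.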

So, yes, if you stored a denormal it would work right in add/subtract. I am not sure what multiply and divide would do with one.

The function may have had a pole at some point where you ran it. In this case the algorithm is faulty if the floating point architecture doesn't handle infinity properly. IEEE-754 was designed to make such algorithms non-faulty. That said, there are a lot of sharp corners to the behavior, so if you're going to write a non-buggy algorithm that makes use of it, it helps to really know what you're doing with the intricacies of the standard.

Interesting paper. If I'm correct in reading that his main thesis is "be paranoid and don't trust the math to be correct" then I'm already there, although there are certainly details in there that round out my understanding.

It's interesting to contrast his dislike for single-precision floating point vs. the abundance of processors on the market with single-precision floating point hardware.

Yes. The same *could* be true in software if, for example, floating point data types were *tuples*: treat the "flags" as something different from the "value". 754 just tries to jam everything into the smallest space possible (presumably, to allow more bits to be used for numeric data!)

The same can be done in software, as well. But it makes the code more tedious to write and maintain - littered with conditionals, etc. The joy (?) of 754 is that you can defer worrying about intermediate results (assuming you can afford the time to continue processing something that has been meaningless since very early in the computation!).

[Actually, I thought the pun effective! :> ]

Again, ditto for software. Esp if you are conscious of the implementation costs to use floating point (whether hardware or software).

When tackling a numerically intense algorithm in software, I immediately focus on how to "cheapen" operations to keep my product cost/size/complexity lower. It's no big deal keeping track of two Q14.1's being multiplied and treated as a Q20.2 (assuming you can discard high bits due to range limitations inherent in the data!). Then, summing 8 of these and opting to represent the total as a Q23.1 (etc.).
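
The Q-format bookkeeping above can be sketched in C (function names are mine); the range assumption - that the data guarantees the discarded high bits of the product are zero - is carried over from the text:

```c
#include <stdint.h>

/* A Qm.n value is an integer holding value * 2^n.  Two Q14.1 operands
   multiply to a full Q28.2 product; if the data's range guarantees the
   top 8 bits are zero, it can be handled as a Q20.2. */
static int32_t mul_q14_1(int16_t a, int16_t b)   /* Q14.1 * Q14.1 -> Q28.2 */
{
    return (int32_t)a * b;                       /* fraction bits add: 1 + 1 = 2 */
}

static int32_t sum8_q20_2_to_q23_1(const int32_t p[8])
{
    int64_t acc = 0;                             /* summing 8 needs 3 more integer bits */
    for (int i = 0; i < 8; i++)
        acc += p[i];
    return (int32_t)(acc >> 1);                  /* drop one fraction bit: -> Q23.1 */
}
```

For example, 3.5 and 2.0 in Q14.1 are the integers 7 and 4; their product, 28, is 7.0 in Q.2 form.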

In hardware, you make these decisions all the time because each "bit" in a register eats up some resource, increases wiring complexity, etc. "Do I *really* need this, here? I'm just going to divide by 1000 in the next step and discard the fractional component..."

In software, there are seldom any gains to be made by this sort of thinking *if* you've already adopted a "real" data type! (Same sort of reasoning applies to floating point DSP's)

I have to say that the whole concept of serious floating point numerics happening in dinky embedded CPUs (much less FPGAs) is a new and in some sense astonishing development. Most numerics work, including probably IEEE-754 itself, was done for the world of scientific and engineering computation running on big computers or at least PC-class workstations. Very small implementations weren't really a consideration.

Indeed, but we now have on a chip more processing power than many people had back then in an entire room - and sometimes we can make use of the resources we have (as opposed to making a couple of GHz-range cores too slow to respond to a pressed button, which we are also (too?) familiar with... :-) ). Obviously we still do not need FP to do the DSP-ing, but sometimes this is just what we have at hand. Not so long ago I did the netMCA-3 (

formatting link

); I had made a similar device using a TI DSP (the 5420) some years back. This time I used an MPC5200B, where the only way to do the MAC-intensive work was to use the FPU (32 64-bit FP regs); I could have a 64-bit MAC done in about 5.5 ns. Obviously I had to go through lots of considerations - which data to keep as 16-bit integers, which as .s FP (32 bit, that is), and which as .d FP (64 bit) - as data movement and conversion on the way to the MAC were anything but negligible. But the FPU came to good use; there was no need for a second processor chip at all, the whole OS, networking, windows etc. needed just this one processor. ( board:

When it comes to dark corners of floating point issues, my "go to" reference is (this is a reprint, not the original):

I suspect there are more recent, similar writings out there. But this served me well ages ago (mid 90's) when I first started pushing the limits of "real" data types in "appliances" (i.e., where you don't always have someone handy to NOTICE screwups!)

I wouldn't describe it as "lightly and amusingly written" by a long shot! But, if you've ever written a floating point library and/or floating point algorithms that had to work in ALL cases, it is an excellent discussion of many of the "why's" and the "issues to be wary of".

Well worth the time to read, IMnsHO. And, *reread* if you actually need to *understand* these issues!

I've scanned most of my paper documents out of necessity. Just take up too damned much *space*! Unfortunately, they aren't searchable in that state -- but, they weren't searchable as paper, either! :>

Boils down to understanding your algorithms and where they are likely to blow up. It's just too easy for folks to "write an equation" and, because they're using "floats", *assume* it's going to work ("I checked for division by zero so all is well...").

The same exercises that protect you from shooting off your foot here also have value when trying to economize on *fixed* point algorithms. E.g., reordering operations to avoid overflow, preserve as much precision as possible, etc.
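
A classic instance of that reordering, sketched in C: computing a midpoint without the oversized intermediate sum (this assumes hi - lo itself fits in the type, e.g. both operands have the same sign):

```c
#include <stdint.h>

/* (lo + hi) / 2 can overflow even when the true midpoint is
   representable; lo + (hi - lo) / 2 computes the same midpoint
   without the oversized intermediate.  Requires lo <= hi. */
static int32_t midpoint(int32_t lo, int32_t hi)
{
    return lo + (hi - lo) / 2;
}
```

The naive form fails precisely near the top of the range, where (lo + hi) wraps before the division can rescue it.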

I don't trust bookmarks. Online stuff tends to "disappear" over time. So, I download anything interesting and move it to my "References" archive.

Makes it harder to point *other* people to those documents without doing a "current" web search (I kept misremembering the author's name as "David GREENberg" and finally had to spin up my archive to find his *real* name :-/ )

Also, really hard to come up with effective ways of tracking *large* numbers of documents (without being able to access their content in searches!)

No, "fast-math" is not about "withstanding wrong answers" any more than "numerical maths on computers is about withstanding wrong answers". It is about being less fussy about details that are irrelevant to your application. If you are writing software on a microcontroller for positioning a motor, then insisting on strict IEEE might give you a few molecule widths' worth of accuracy, at the cost of not getting the calculations done in time. Rough floating point - such as "-ffast-math" - is about a different balance between the accuracy and the cost of the calculations.

I think I have said in several places that this is about /embedded/ applications - /obviously/ you have to pick your implementation details according to your needs.

The kind of applications where IEEE really gets useful is when you need accurate control of the order of calculations - such as summing power series or inverting large matrices. These sorts of calculations can quickly result in rubbish even with DP floating point, if you are not precise about calculation ordering.
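
A standard example of such order-sensitive summation is Kahan's compensated sum, sketched here in C as a generic illustration (not something from the thread). Note that it depends on strict FP semantics: a compiler in fast-math mode is free to reassociate and optimise the correction term away.

```c
/* Naive left-to-right summation, for comparison. */
static double naive_sum(const double *x, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* Kahan (compensated) summation: carry a running correction term so
   rounding error does not accumulate as it does in the naive loop. */
static double kahan_sum(const double *x, int n)
{
    double sum = 0.0, c = 0.0;    /* c compensates for lost low bits */
    for (int i = 0; i < n; i++) {
        double y = x[i] - c;      /* corrected next term */
        double t = sum + y;       /* low bits of y may be lost here */
        c = (t - sum) - y;        /* recover what was lost */
        sum = t;
    }
    return sum;
}
```

With inputs like {1e16, 1.0, 1.0}, the naive loop drops both small terms entirely, while the compensated version keeps them.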

Note that these occur on a regular basis in HPC, but far less so in embedded systems. Different types of problem, different solutions.

If you don't know your program works, it is a useless program.

That does not mean you "verify all possible inputs" - it means you write correct code, think about different possible cases, consider corner cases and extreme cases, and write test code to confirm your theories.

If you don't know how to do that for the program in hand, you are not qualified to write the code. That means if you are writing code that does floating point calculations including things like matrix inversions, and you don't know how to be sure your values remain accurate (to within your requirements) and avoid nasty things like dividing by zero, subtracting nearly-equal numbers, etc., then get someone else to write the code.

In the real world, NaN's are nonsense. At best, they represent mistakes. Embedded systems are about the real world. Hence, NaN's are nonsense in c.a.e. and c.dsp. (Infinities might turn up in the mathematical theory behind the code - that's fine, because they can be useful mathematical concepts. But in floating point code, they tell you that your algorithm or your implementation is either misused or broken.)
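
A sketch of the kind of trap a control loop might apply before a value reaches an actuator (the names are mine):

```c
#include <math.h>

/* A NaN or Inf coming out of a calculation means the algorithm or its
   inputs are broken, so it must never be allowed to reach an actuator;
   while we're at it, enforce the physical range too. */
static int result_is_sane(double v, double lo, double hi)
{
    if (isnan(v) || isinf(v))
        return 0;                 /* misused or broken algorithm */
    return v >= lo && v <= hi;
}
```

(NaN would in fact fail the range comparison on its own, since all comparisons with NaN are false, but the explicit check documents the intent.)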

Provided, of course, that your compiler or combination of compiler flags doesn't decide to "optimise" the computation.

True in the past, but now embedded systems are sufficiently powerful to contain complex numerical algorithms that previously were the domain of HPC experts. There are, I am told by people that I respect, very good reasons why FORTRAN still rules in numerical computing. See some of the references in other postings for reasons, e.g.

Dunno. I don't use google, normally. And, tend to be pretty good at coming up with keywords that quickly refine my searches.

E.g., I was able to quickly find the Goldberg article once I *stopped* looking for GREENberg! :>

With my own archive, I just have to remember where I likely *put* something! E.g., will source code for device X be stored with device X-specific files? Or, with source code for the sorts of apps that X *performs*??

I'm also (belatedly) learning to create composite file names for things that are named unintuitively. E.g., x23498.pdf may mean something to its author, but not to me! In the past, renaming this as "Watchamacallit User Manual.pdf" worked well. Until I found myself with files that were duplicates ONLY ON CLOSER INSPECTION. "Gee, the Watchamacallit User Manual and x23498 sure *seem* to be the same document..."

So, new naming scheme: "Watchamacallit User Manual (x23498).pdf" (too bad you can't tie metainformation to *all* files in a portable manner!)

To get back to hardware, I don't know if it's possibly useful but Intel has been plugging a board called the Minnowboard as their answer to the Raspberry Pi, Beaglebone type of board. It uses a 1 GHz Atom processor and uses something like 3.5W of power (IIRC, I may be mistaken). It's all FOSS, has GPIO pins etc. like a Raspberry, and has hardware floating point including double precision. Plus it has gigabit ethernet. Main drawback is it's rather expensive at $200 or so.

I think $200 puts it entirely out of the running for a mainstream popular product. Heck, I've bought a netbook for $230. I don't think this is even remotely an "answer" to either of these products. This price puts it solely in the domain of an eval board or reference design for corporate users.

I've been *really* disappointed with the results! You have to double-check everything to ensure nothing has been "dropped".

I've seen programs that would *skip* large blocks of text. Or, get confused over images with callouts, complex formats, etc.

So, I've opted to just preserve an *image* of the page in the hope that, someday, tools get smarter (or, I find myself with gobs and gobs of time to proofread tens of thousands of scanned pages :-/ )

Disk space is cheap -- $0.10/GB? It's not worth the downside risk.
