Small, fast, resource-rich processor

The number of computations in a Kalman filter is something on the order of 6 * n^3, where n is the number of states you're trying to estimate. For more than a few states (and I'm well over ten here), you'd need a Really Big FPGA to do the whole computation in parallel.

If you look in finer detail, it's a bunch of matrix multiplies and adds, and one division. Matrix multiplies are nothing but a collection of dot products, just like a FIR filter is one dot product, so that part is known.
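
For the dot-product part, a rough sketch in plain C (names mine, purely illustrative) -- each output element of the product is one FIR-style multiply-accumulate loop:

/* An n x n matrix multiply is just n*n dot products, each the
   same MAC loop as an n-tap FIR. */
void matmul(int n, const double A[], const double B[], double C[])
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double acc = 0.0;                 /* one dot product */
            for (int k = 0; k < n; k++)
                acc += A[i*n + k] * B[k*n + j];
            C[i*n + j] = acc;
        }
    }
}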

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com
Reply to
Tim Wescott

My understanding is that it's way more than a handful -- you'd need to ask someone who's actually done it, though.

What I was told, ages ago, was that in an IEEE-compliant floating-point coprocessor, there's more silicon area devoted to detecting and handling the exceptions than there is to doing an actual computation.

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com
Reply to
Tim Wescott

I dunno, at 45 nm, a handful is a whole bunch!

--
Randy Yates 
Digital Signal Labs 
http://www.digitalsignallabs.com
Reply to
Randy Yates

of course. in fact, that conversion of the mantissa from sign-magnitude format to 2's-complement is something you would have to do if implementing math functions (at least for addition) in some hardware context like an ASIC or FPGA. at least i don't understand very well how you could add two IEEE floats without doing it.

the issue for me is that it might have been nice if the IEEE guys had defined 754 to have a twos-complement rendering of negative values rather than having a sign-magnitude rendering. then no need to pull the sonuvabitch apart to do a compare with an integer machine.
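
For what it's worth, the standard workaround -- a C sketch, my names, assuming 32-bit floats and no NaNs -- is to remap the sign-magnitude bit pattern onto a monotonically ordered integer before comparing:

#include <stdint.h>
#include <string.h>

/* Map an IEEE-754 single's bits to an unsigned key such that
   float order == integer order: negatives get all bits flipped,
   positives just get the sign bit set. */
static uint32_t float_key(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);          /* raw bit pattern */
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

/* then: a < b as floats iff float_key(a) < float_key(b) */

which is exactly the "pull it apart" step that a two's-complement rendering would have made unnecessary.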

--

r b-j                  rbj@audioimagination.com 

"Imagination is more important than knowledge."
Reply to
robert bristow-johnson

Exactly. Even neglecting the "cost" (time/silicon/etc.) involved.

Far more frequently overlooked is a critical analysis of the actual algorithm ("equations") used and how it behaves in the "dark corners". I.e., do you understand the implementation of the data type well enough to know what sorts of test cases to throw at it to *really* stress its performance in conditions that *may* well pop up in the Real World?

(This is akin to someone naively testing for equality among floats -- though much more nefarious!)
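
The canonical demonstration, as a quick C sketch (the helper name is mine, and the tolerance is application-dependent):

#include <math.h>

/* Naive: (0.1 + 0.2 == 0.3) evaluates to false in binary floating
   point. Usual workaround: compare against a relative tolerance. */
int nearly_equal(double a, double b, double tol)
{
    return fabs(a - b) <= tol * fmax(fabs(a), fabs(b));
}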

Reply to
Don Y

(snip on systolic array for FIR)

Yes, you don't do that. I believe that there is literature on systolic array matrix multiply. Most obvious is N processing units, so N**3 multiplies in O(N**2) clock cycles. For smaller N, some might do a 2D array. Easiest when N is constant (known at compile time) and the array is appropriately arranged.
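
A crude software model of that schedule (names mine): N PEs, PE i owning row i of the result, each "clock" every PE retiring one multiply-accumulate, so the N**3 multiplies finish in N**2 clocks:

/* Software model of a 1-D systolic arrangement: the inner loop
   over 'pe' stands in for n units working in parallel. */
void systolic_matmul_model(int n, const double A[], const double B[],
                           double C[])
{
    for (int jk = 0; jk < n * n; jk++) {   /* n*n clock cycles */
        int j = jk / n, k = jk % n;
        for (int pe = 0; pe < n; pe++) {   /* n PEs, in parallel */
            if (k == 0)
                C[pe*n + j] = 0.0;
            C[pe*n + j] += A[pe*n + k] * B[k*n + j];
        }
    }
}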

How big is N, and is it somewhat constant? (within a small range)

-- glen

Reply to
glen herrmannsfeldt

A favorite example over the years has been the quadratic formula.

It seems so easy, but with the appropriate values (say, b^2 much larger than 4ac) you get enough cancellation in -b + sqrt(b^2 - 4ac) that the results are very far off. Also, the values don't look so obviously exceptional.

(snip)

-- glen

Reply to
glen herrmannsfeldt

Yes. Any time you're mixing very big and very small. Solve one type of quadratic equation one way; another a *different* way -- based on a, b and c.

(The first time this happens to you, you stare at the results and the code and wonder what the hell went wrong! Everything *appears* correct -- except the "answer" :-/ Thereafter, you learn not to be so naive when approaching these problems! Regardless of single, double, extended, quad, etc. precision formats!! Sometimes, numbers aren't *just* numbers...)
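
The textbook remedy, as a C sketch (assuming real roots; helper name mine): compute the root whose terms do *not* cancel, then recover the other one from the product of the roots, since x1*x2 = c/a.

#include <math.h>

/* When b^2 >> 4ac, the quadratic formula's -b + sqrt(b^2 - 4ac)
   subtracts nearly equal numbers and loses most of its digits.
   So: compute the "safe" root first, get the other from c/q. */
void quadratic_roots(double a, double b, double c,
                     double *x1, double *x2)
{
    double disc = sqrt(b*b - 4.0*a*c);          /* assumes >= 0 */
    double q = -0.5 * (b + copysign(disc, b));  /* no cancellation */
    *x1 = q / a;
    *x2 = c / q;                                /* product of roots */
}
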
Reply to
Don Y

"Most things"??? What does that mean? How do you know if your app fits into the definition of "most things?

Ok

--

Rick
Reply to
rickman

I don't know how many times I have to explain this.

There are two kinds of numerical algorithms:

1) poorly behaved (unstable, working near singularities, etc.)
2) well-behaved (everything else)

It usually isn't that hard to tell which kind you're dealing with in a given application. Well-behaved algorithms are characterized by a slow buildup of errors rather than massive oscillation or whatever.

With double precision, you're potentially in bad shape with #1 and there's not much you can do, but you're generally in good shape with #2, for reasons already explained. You can tolerate a lot of 1-ulp error buildup before the imprecision reaches into the realm of physical measurement.

With single precision, you can get clobbered even in situation #2, like in that Patriot missile example. That's the difference.
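
The effect is easy to reproduce -- a minimal C sketch of Patriot-style drift (this accumulates binary 0.1, not the actual 24-bit fixed-point hardware, so the numbers differ, but the lesson is the same):

#include <stdio.h>

/* 0.1 has no exact binary representation; in single precision the
   per-tick error compounds quickly. Accumulate 0.1 s ticks for
   100 hours and compare. */
int main(void)
{
    float  tf = 0.0f;
    double td = 0.0;
    for (long i = 0; i < 100L * 3600L * 10L; i++) {
        tf += 0.1f;              /* drifts visibly off 360000 */
        td += 0.1;               /* error stays far below anything
                                    you could physically measure */
    }
    printf("float:  %.2f s\nexact:  360000.00 s\ndouble: %.2f s\n",
           tf, td);
    return 0;
}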

Also, situation #2 is more common than #1, because real-world numerics applications are pretty routine and use algorithms designed by numerical analysts to avoid bad behavior. Like including the step of finding the largest pivot and swapping rows in the Gaussian elimination article that I mentioned. That took some sophistication to figure out, but they teach it in math class now, so everyone knows about it unless they skipped class.
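
That step is small enough to show in a sketch (bare-bones C, no singularity handling -- a textbook partial-pivoting pass, not anyone's production code):

#include <math.h>

/* Before eliminating column k, find the row with the largest |pivot|
   and swap it into place. A tiny pivot would amplify roundoff in
   every row below it. */
void forward_eliminate(int n, double A[])   /* n x n, row-major */
{
    for (int k = 0; k < n - 1; k++) {
        int p = k;                          /* partial pivoting */
        for (int i = k + 1; i < n; i++)
            if (fabs(A[i*n + k]) > fabs(A[p*n + k]))
                p = i;
        if (p != k)
            for (int j = 0; j < n; j++) {   /* swap rows p and k */
                double t = A[k*n + j];
                A[k*n + j] = A[p*n + j];
                A[p*n + j] = t;
            }
        for (int i = k + 1; i < n; i++) {   /* eliminate below */
            double m = A[i*n + k] / A[k*n + k];
            for (int j = k; j < n; j++)
                A[i*n + j] -= m * A[k*n + j];
        }
    }
}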

Of course, if you've got someone who skipped math class AND is designing numerical algorithms anyway AND chooses low precision for those badly designed algorithms to save 5 cents worth of transistors, well, all bets are off.

Reply to
Paul Rubin

+1. I had a colleague who would periodically "re-bug" our floating point library. All in the hope of eking out a tiny bit more performance or trimming a few bytes out of the code. :<

Well, its biggest advantage is that it presents *a* "standard". Previously, the capabilities of the "floating point library" would vary from project to project, environment to environment, etc. 754 can be overkill for lots of applications (esp. if you know you don't need NaNs, denormals, etc.) but it's often easier just to implement it all and forget about it.

OTOH, amusingly, Limbo was created with a single "real" data type -- doubles. Yet, some concessions were made to efficiency -- no support for denormals/gradual underflow, a different set of traps, etc.

So, it's interesting to see how folks juggle requirements and capabilities over time!

Reply to
Don Y

That was the subject of a very long debate mentioned in the interview I linked in another post, but consensus finally emerged at the time, and appears to have held up in retrospect, that denormals (I think this is what they mean by gradual underflow) were the right thing.

Really, they are used in calculations. You can run a calculation without a lot of intermediate tests because you can check at the end whether a NaN came out. Similarly, in cases where you can get real answers despite the appearance of Inf in some intermediate result (e.g. since 1/Inf = 0), you can rely on it working. That isn't someone abusing the standard in some way that's too smart for their own good. The standard was designed to make that type of calculation work in the determinate cases and give NaN in the indeterminate cases.
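
A minimal sketch of that pattern in C (assuming default IEEE-754 semantics, i.e. no trapping and no -ffast-math):

#include <math.h>
#include <stdio.h>

int main(void)
{
    volatile double zero = 0.0;  /* keep the compiler honest */
    double x = 1.0 / zero;       /* +Inf, not a fault */
    double y = 1.0 / x;          /* 1/Inf == 0: a real answer */
    double z = zero / zero;      /* indeterminate: NaN */
    double w = y + z * 42.0;     /* NaN propagates through */

    /* one test at the end instead of a test at every step */
    if (isnan(w))
        printf("indeterminate somewhere upstream\n");
    else
        printf("w = %g\n", w);
    printf("y = %g\n", y);       /* prints 0 */
    return 0;
}
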
Reply to
Paul Rubin

(snip)

I think you snipped out too much.

In an FPGA implementation, it is easier to just run a separate line saying that the value is Inf or NaN. That is faster and easier than coding it into 64 bits, and then decoding it again just a little later.

It is the bit representation that isn't needed, not the concept.

(I suppose that was confusing, since I was also suggesting that the concept of denormals wasn't needed.)
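
Something like this, as a C-ish sketch of the datapath record (layout and names mine, purely illustrative):

/* Inside the pipeline, carry the specials as separate flag bits
   ("wires") next to an ordinary sign/exponent/mantissa; pack them
   back into the IEEE-754 encoding only at the boundary. */
typedef struct {
    unsigned sign   : 1;
    unsigned is_inf : 1;         /* checked directly, no decode */
    unsigned is_nan : 1;
    unsigned exp    : 11;
    unsigned long long mant;     /* 52-bit fraction, room to spare */
} pipeline_float;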

-- glen

Reply to
glen herrmannsfeldt

I'd have to say that is a misconception. It's not for repeatability: IEEE imposes those requirements in order to help numerical algorithms give the right answers. I.e., bypassing the requirements leads to wrong answers. Mathematics is not modern art or literary criticism, where everything is subjective and there's no right or wrong answer. Numerical problems actually do have wrong answers. IEEE-754 was designed by mathematicians with a huge amount of experience of previous systems; they were tired of getting wrong answers, and they decided to (somewhat) fix the situation.

If your application can withstand wrong answers, then sure. It's something you have to be judicious about rather than generalizing.

Verification (in the sense of certifying that the program does the right thing for ALL POSSIBLE inputs, not just for your test vectors) is a very complicated subject. I know a little about how it's done for integer math, but floating point adds another level of complexity, and I don't have any clue how the high-assurance community deals with it. It goes way beyond the topic at hand, though.

There is nothing nonsensical about NaN's. They are part of the standard and algorithms intended to run on standard-conformant hardware can and do use NaN and Inf on purpose, expecting them to propagate through calculations the way the standard says they should. If your floating point implementation requires avoiding them, then you're imposing restrictions on the user's algorithm choices.

Reply to
Paul Rubin

I see, yeah, that makes some sense, as long as the algorithm can make use of the features. I'm still pretty unclear about how a conventionally presented algorithm is supposed to be translated into FPGA form.

Reply to
Paul Rubin

I highly recommend reading this set of notes:
- lightly and amusingly written
- information dense
- a university course in "how to avoid being bitten by computer arithmetic"
- theoretical: why features are there and how features interact
- practical: how various languages get it right/wrong
- written by somebody who has been on the sharp end of diagnosing corner-case HPC "issues" over the last 40 years

Even a cursory inspection will cut through some of the arrogant guff that has appeared in this thread.

formatting link
formatting link

Reply to
Tom Gardner

Everybody on this thread should, at the very least, speed read

formatting link
formatting link

It is amusingly written by someone that's been on the sharp end of diagnosing numerical problems with HPC since the 60s.


Reply to
Tom Gardner

Read, learn and inwardly digest the (amusingly written) contents of

formatting link
formatting link
Amusingly written by someone who has been on the sharp end of numerical problems since the 1960s.

Then you will begin to have an understanding of where the dragons lie.

Reply to
Tom Gardner

The denormal question only exists because of the hidden-bit normalization in a radix-2 system. Since a radix-8 or radix-16 floating-point representation does not have that hidden bit, were denormals ever a problem on those machines?

Reply to
upsidedown

Sounds like sweeping the problem under the carpet :-).

If you really get infinity at some intermediate stage, it really looks like the algorithm was faulty from the beginning, or not valid for the range of arguments.

Relying on 1/Inf=0 may hide much more fundamental problems.

Reply to
upsidedown
