Small, fast, resource-rich processor

It is surprisingly difficult to get rid of the corner cases, even where humans aren't involved.

When confronted by suits who think time is easy and quick, I've found it useful to put them on the spot by asking questions like:

How many:
- days in a week
- months in a year
- days in a month
- days in a year
- minutes in an hour
- seconds in a minute
- hours in a day

Everybody gets at least one of the answers right. Most eventually get three right. Most get two or three wrong.

Reply to
Tom Gardner

The problem is that these "units" only have nominal definitions *and* can vary culturally (as well as over time). Everyone *thinks* they know what time is -- until pressed for details.

Sort of like the "Sure! It tastes like milk!" commercial...

What weighs more: a pound of feathers or a pound of gold? (Gold is traditionally weighed in troy pounds and feathers in avoirdupois pounds, so the two "pounds" aren't even the same unit - the feathers win.)

Unfortunately, a machine needs a *real*/formal definition, not just some handwaving that kinda/sorta/might be correct.

Reply to
Don Y

It is worse than that. They have multiple definitions that mutually conflict with each other.

Just so :(

Reply to
Tom Gardner

Meh, at 13 MAC per result (due to the dumb algorithm) that's a little over 2 Gflop, a yawner. Here's a DSP that is over 5x that fast:

formatting link

That is a 33 dollar part, so relatively high end, but I'd like to know how much the FPGA costs and how the power consumption compares at comparable speed (2 Gflop). TI also has other DSPs that are even faster and cost more. If you can live with 0.9 GFlop there is a 6 dollar one (that can do double precision at a lower speed). And of course consumer GPUs are orders of magnitude faster.
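(A back-of-envelope for the "little over 2 Gflop" figure above, assuming one result per clock cycle: 13 MACs per result is 26 floating point operations, so 2.1 Gflop/s corresponds to a clock in the neighbourhood of 2.1e9 / 26 ≈ 80 MHz.)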

What mainstream tools? That C thing doesn't sound mainstream and using it doesn't sound easy, per that blog post from a while back:

formatting link

Reply to
Paul Rubin

Well, pressing a few buttons to get the result cannot be a nightmare, indeed - not to the operator pressing them :-). But the result itself can be, and in this case it is.

33 pipeline stages for a single precision FPU, ROFL. The one I use on the processor I have is double precision and has 6 pipeline stages. It does single-clock .s FMUL, FMADD etc.; two clocks for .d.

Now don't get me wrong, it is a nightmare to *me*; maybe it is just me who perceives it that way - this trend of smearing it all on top of what has been smeared in years past and getting away with it because things are too complicated for anyone to be curious and capable enough to notice. The trend is by far not an FPGA phenomenon; maybe it is least pronounced there. Obviously this has been done in 10 minutes not by logic synthesis but by some tool recognizing what is asked for and copying readily available designs. While I am all for powerful programmable logic and good synthesis tools - allowing one to go to as low a level as he *wants* - I am all against training people to think that, because everything has been done already, all they have to do is move boxes, with no need to learn anything more complicated than that.

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments

formatting link

------------------------------------------------------

formatting link

Reply to
dp

If the final answer is Inf then it's invalid, but you might have an intermediate result be Inf, and IEEE arithmetic says what's supposed to happen then (e.g. 1/Inf = 0). Kahan has given some examples of calculations designed to make use of this property.

I remember a trick question from math class: where are the singularities of the cotangent function? Obvious answer: cot x = cos x / sin x, so it has poles at the roots of sin x: 0, 180 degrees, etc. Trick answer: cot x is actually defined as 1/tan x, so it also has removable singularities at the places where cos x = 0. Inf in IEEE arithmetic can let you treat the function as continuous at points like that, which you might want to do.

Not at all, Inf is valid in IEEE arithmetic as described above, and NaN just means you did an invalid calculation, maybe on purpose, in which case it's not "wrong". For example, think of a general purpose numerical root-finding algorithm which you give an arbitrary function f and an initial guess x. It starts making other guesses near x, and if it gets a NaN, it says "ok that guess was outside the function domain, I'll make the next guess somewhere else". HP calculators of the 1990's had a rootfinder that did that, using IEEE 854 arithmetic.
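A minimal C sketch of both points - 1/Inf being a perfectly valid result, and a rootfinder treating NaN as "that guess was outside the domain". The test function f() is hypothetical (log(x) - 1, which produces NaN for negative guesses, root at x = e), and the bisection is deliberately crude; this is not the HP algorithm, just the principle:

#include <math.h>
#include <stdio.h>

/* Hypothetical test function with a restricted domain: log(x) only
   exists for x > 0, so any guess at x < 0 produces a NaN. Root at x = e. */
static double f(double x)
{
    return log(x) - 1.0;
}

int main(void)
{
    /* 1/Inf is a perfectly valid IEEE result: exactly 0. */
    printf("1/Inf = %g\n", 1.0 / INFINITY);

    /* Crude bisection: a NaN from f() just means "that guess was outside
       the domain", so the bracket is moved instead of aborting. */
    double lo = -5.0, hi = 3.0;
    for (int i = 0; i < 40; i++) {
        double mid = 0.5 * (lo + hi);
        double y = f(mid);
        if (isnan(y) || y < 0.0)
            lo = mid;          /* out of domain, or below the root: go right */
        else
            hi = mid;
    }
    printf("root near %.10g (e = %.10g)\n", hi, exp(1.0));
    return 0;
}

Here the NaN steers the bracket instead of aborting the search, which is essentially the behaviour described above.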

Reply to
Paul Rubin

You're saying in that application, speed is more important than accuracy, i.e. it can withstand answers that are wrong according to the expectations one would place on general purpose numerics.

I like to think we're moving towards a world where floating point hardware is standard even in embedded MCU's used for anything numerical. So no need for anything like -ffast-math.

Wasn't this thread originally about a Kalman filter? Those involve matrix inverses (or at least pseudo-inverses) and are used all the time in embedded systems such as GPS navigation.
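(For reference on where the inverse shows up: the Kalman gain update is K = P Ht (H P Ht + R)^-1, with Ht the transpose of the measurement matrix H. In practice one usually solves the corresponding linear system rather than forming the inverse explicitly, but the numerical concerns are the same.)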

"Verification" in the fussier parts of the software assurance world means something more specific: it means you produce a machine-checked mathematical theorem that the program does what its specification says for all possible inputs, e.g. using something like Coq, or Spark/Ada, or maybe the DBC stuff in Ada 2012. That's what I was saying was above my pay grade for floating point programs. Obviously tons of real-world software is written without these methods, but they're becoming more accessible, and they're mandatory for some critical systems.

Indeed, I've had (some) formal training in the subject, but yes, at my current level of knowledge I wouldn't want to work on serious numerics code without expert advice. At least I have enough sense to recognize this situation. I have my doubts about certain other people around here.

By the way, one of the numerical analysis professors at my school used to say any idiot could write a good matrix inversion routine, but to compute eigenvalues properly you had to know what you were doing.

Not at all. Say you have a navigation system: that's embedded, ok? Say you give it some coordinates you want to go to, with some constraints on the route that turn out to not have a solution. That could perfectly well result in some calculation giving a NaN, at which point the device tells you to try again. All working as intended. Or if you have a machine tool that does motion planning and you tell it a shape you want it to make, the same situation could arise. Embedded is not all about flushing toilets.

Reply to
Paul Rubin

I hadn't seen those before. They are nice, thanks.

Reply to
Paul Rubin

snip

:) Just so. Know the feeling :)

Reply to
Tom Gardner

There are problems where you need somewhat less precision than what is available, and a motor controller might be one.

For a larger example, some iterative partial differential equation solvers are fairly insensitive to errors, as long as they can average out. Faster allows for a finer discretization, and so, in the end, more accurate results. Those are the kind of problems that Cray was designing for.

Maybe, but in many cases more speed is still useful.

The important word above is large. Matrix inversion, and problems related to it, easily lose precision as the matrix gets bigger. Available techniques, such as partial pivoting, allow one to keep more of the precision along the way, and so handle larger matrices before all precision is gone. (snip)

Probably true. For one, actual matrix inversion tends to be used only for small problems, and hopefully well conditioned problems. Otherwise, LU decomposition is more often used.
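For what it's worth, a minimal C sketch of the "solve, don't invert" point: Gaussian elimination with partial pivoting for a small fixed-size system. N, the in-place overwriting of A and b, and the zero-pivot test are all just choices made for the sketch, not anything from the thread.

#include <math.h>

#define N 4    /* hypothetical small system size */

/* Solve A x = b in place; returns -1 if a pivot is exactly zero. */
static int solve(double A[N][N], double b[N], double x[N])
{
    for (int k = 0; k < N; k++) {
        /* Partial pivoting: pick the largest remaining entry in column k
           so the multipliers below are all <= 1 in magnitude. */
        int p = k;
        for (int i = k + 1; i < N; i++)
            if (fabs(A[i][k]) > fabs(A[p][k]))
                p = i;
        if (fabs(A[p][k]) == 0.0)
            return -1;                     /* singular to working precision */
        if (p != k) {
            for (int j = 0; j < N; j++) {
                double t = A[k][j]; A[k][j] = A[p][j]; A[p][j] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        /* Eliminate below the pivot. */
        for (int i = k + 1; i < N; i++) {
            double m = A[i][k] / A[k][k];
            for (int j = k; j < N; j++)
                A[i][j] -= m * A[k][j];
            b[i] -= m * b[k];
        }
    }
    /* Back substitution. */
    for (int i = N - 1; i >= 0; i--) {
        double s = b[i];
        for (int j = i + 1; j < N; j++)
            s -= A[i][j] * x[j];
        x[i] = s / A[i][i];
    }
    return 0;
}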

(snip)

-- glen

Reply to
glen herrmannsfeldt

I think we're in agreement. A general purpose solution has to be suitable for a wide range of applications, but a given specific application might be able to get by with something more limited.

Yeah, I remember Kahan mentioning that, Cray arithmetic was so inaccurate that you had to use very numerically robust algorithms on it.

Found it:

...and not ask the guys designing H-bombs or supersonic wings, because these latter don't give much of a damn about floating point. Their algorithms are very robust, they can tolerate all sorts of floating point, after all, they run on Crays! Anything that runs on a Cray doesn't care how you round, because a Cray rounds in a way that beggars description.

I wonder whether fancier algorithms with lower computational complexity are equally numerically robust.

Hardware sounds faster than software no matter what. In another post I mentioned a six dollar TI DSP that does around 0.8 GFlops in IEEE single precision or (IIRC) maybe a third of that in double precision. I wonder if there is any CPU that can simulate floating point with -ffast-math at that speed.

Hmm, ok. I seem to remember hearing the original CAT scanners solved big linear systems (maybe not with actual inversion) so maybe that counts as an embedded application with large matrices. Later they did it more efficiently with the Radon transform. I don't know about now.

Reply to
Paul Rubin

Entertaining and useful reference, thanks.

A good glimpse of the deficiencies of various languages and hardware. I liked these snippets about the "here there be dragons" issue...

DDJ: So for those who are interested in the mathematical accuracy of their praxis, there are tools available for free that allow them to achieve this.
WK: It is remarkable how infrequently these tools are used by people who ought to know better.
DDJ: Like who?
WK: Like Microsoft. But let's get back to John Palmer.

...

WK: Most numerical computation doesn't matter. I know that sounds perverse, but in my experience, looking over the shoulders of other people, at least nine-tenths of what they compute gets thrown away, much of it without ever being looked at. They have only to get a glimpse of something to realize they never should have computed it in the first place. Most numerical computation doesn't matter, therefore a great deal of it can be wrong without deflecting the course of history. Some numerical computation matters a lot. We don't usually know what it is until afterwards. We may not know until too late. How do you know the answer is wrong unless you know the right answer? And if you knew the right one, why would you compute the wrong one?

...

WK: Error creeps in. You have to learn the techniques of error analysis and decide what degree of error is tolerable. There are three good books... But look at the thickness of these books! Look at the bibliography in this one -- 1134 citations! People generally are not going to read these books.

...

WK: So finally, in exasperation, [John Palmer] made them a sort of "put up or shut up" argument. He said, "I'll tell you what. I'll relinquish my salary, provided you'll write down your number of how many you expect to sell, then give me a dollar for every one you sell beyond that." They didn't do it, but if they had, John Palmer would not have to think of working for a living.

Reply to
Tom Gardner

Why? Are you designing a system? What are your requirements?

Did you read the article I linked to? It gives you all the info you could want on the tools - it is from the tool vendor! If you are interested, read it.

--

Rick
Reply to
rickman

You didn't even read the article. This isn't an FPU, this is a sin() calculation using a series expansion as an example of how to use the tool. This example uses a number of multiply and add operations to produce a result on every clock cycle.
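For readers who haven't seen this kind of thing, a minimal sketch of the general shape of such a calculation (not the article's actual code; plain Taylor coefficients rather than a tuned minimax polynomial, usable only after range reduction to small |x|): a polynomial approximation of sin(x) evaluated with Horner's rule, where each step is one multiply-add and therefore maps directly onto MAC/FMA hardware or onto a pipelined chain of multiply-adds in an FPGA.

/* Hypothetical example: polynomial sin() for |x| <= pi/2 (after range
   reduction), evaluated with Horner's rule. Each line of the Horner
   chain is one multiply-add. */
static float sin_poly(float x)
{
    const float c3 = -1.0f / 6.0f;       /* -1/3! */
    const float c5 =  1.0f / 120.0f;     /*  1/5! */
    const float c7 = -1.0f / 5040.0f;    /* -1/7! */
    const float c9 =  1.0f / 362880.0f;  /*  1/9! */

    float x2 = x * x;
    float p = c9;
    p = p * x2 + c7;
    p = p * x2 + c5;
    p = p * x2 + c3;
    p = p * x2 + 1.0f;
    return p * x;   /* sin(x) ~= x*(1 + c3*x^2 + c5*x^4 + c7*x^6 + c9*x^8) */
}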

Yes, that is *exactly* my point. Many people have a prejudice because of a lack of knowledge about how to use FPGAs.

What?

What trend? You lost me.

Really? You have a tremendous insight into complex software. You also are good at making unsupported assumptions.

--

Rick
Reply to
rickman

You do not get /workable/ removable singularities in calculations using simple floating point numerical approximations, such as IEEE describes. It may be possible to find particular examples when things happen to work out correctly, but that's just luck. Write your code /correctly/ within the limits of the numerical model you are using, or use a different model that provides enough detail and flexibility to be able to correctly model the types of numbers or sequences you want.

As for cot, you can define it as cos/sin and there are no singularities of any sort at cos x = 0. Your maths teacher just defined it as 1/tan in order to illustrate that some functions need more careful definitions in order to work as you first think. And how it is defined bears no relationship to how it is calculated in a numerical approximation - and it is the approximation algorithm that is key when you want to get useful results. If you think you can just write "1/tan(x)" and rely on the "magic" of IEEE's infinities to make things work at cos x = 0, you are kidding yourself.

Such "rootfinder" algorithms are the only reasonable justification I have seen for NaN's. And yes, all they tell you is precisely "you've done something wrong". For rootfinders, such information can be useful

- since you have (presumably) already tested the correctness of the test function when you are within the valid input domain, they then tell you you are outside that domain.

Reply to
David Brown

I wonder if you are confusing accuracy and resolution. Floating point is /always/ approximate - as are pretty much all measurements and outputs. There is no such thing as "the expectations one would place on general purpose numerics" once you move outside the purely integer domain. Right or wrong is about being accurate /enough/ - once you are accurate enough, everything else is a waste of time and resources.
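A trivial illustration of "always approximate", for anyone who wants to see it: 0.1 has no exact binary floating point representation, so even the input constants carry rounding error before any arithmetic happens.

#include <stdio.h>

int main(void)
{
    /* 0.1 is not exactly representable in binary floating point; the
       stored double is the nearest representable value. */
    printf("%.20f\n", 0.1);   /* prints 0.10000000000000000555... */
    return 0;
}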

You can like to think whatever you want, but the reality is that the floating point hardware seen in embedded MCU's generally does not follow IEEE. It follows a "-ffast-math" style world - often with just single precision. At best, you get flags or can enable software exceptions to get the full IEEE (although these are seldom if ever used). In fact, since many software floating point libraries /do/ support IEEE, the move towards hardware floating point is a move /away/ from NaNs and friends.

Yes, matrix inverses are used "all the time" in some embedded applications - although you should remember that these represent a tiny proportion of the world of embedded systems.

But why are you worrying about matrix inverses? It is true that if your algorithms are implemented badly, then you can quickly lose precision when inverting large matrices. But if you think that IEEE - rather than -ffast-math - makes any significant difference, you are wrong. At best, it pushes the limit of "large" to very slightly larger. If your algorithms are good, and well-tuned to the data you are working with (you are not making general matrix inversions here - the data is application-specific), then your results will be good. If your algorithms are bad, then your results will be bad - and no amount of IEEE compliance will save you.

Perhaps I misread you - I thought you meant actively running through all possible inputs and checking that the results are as expected. This is sometimes done - I know of processors where the floating point hardware (single-precision only) has been verified in this way.

The level of verification required varies enormously according to requirements - sometimes a few test vectors is enough, sometimes a formal mathematical proof is required (with or without the help of computerised theorem checking).

Verification at the appropriate level for the job in hand should not be "above your pay grade" - it should be a critical part of all coding you do (in some jobs, most of the verification may be done by a separate person - but you certainly should be capable of doing it yourself).

Knowing your limits is, of course, vital knowledge - and something a lot of people have trouble with!

Garbage in, garbage out. A NaN cannot turn garbage into something useful. At best, it can tell you you are getting garbage out - but it is much better if your code can figure out that the input data makes no sense, and take appropriate action.

Consider "divide by zero" errors. When you rely on IEEE NaNs and infinities, you only get your NaN or inf when you are actually dividing by zero - when you are dividing by something close to zero, you get technically "correct" but equally meaningless values that propagate into more errors and nonsense numbers, causing all sorts of havoc along the way such as ruining your running averages or accumulators.

On the other hand, if you simply check that the value makes sense (such as checking for a minimum absolute value), you can quickly and reliably identify the problem before anything goes wrong.
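A minimal sketch of that style of guard, in C; the threshold EPS_DIVISOR is made up for the example, since what counts as "too close to zero" depends entirely on the application and the scale of the data:

#include <math.h>

#define EPS_DIVISOR 1e-9   /* hypothetical application-specific limit */

/* Refuse to divide by anything too close to zero; returns 0 on success,
   -1 if the result would be meaningless. */
static int safe_div(double num, double den, double *out)
{
    if (fabs(den) < EPS_DIVISOR)
        return -1;
    *out = num / den;
    return 0;
}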

The only concrete thing you can say about embedded systems is that they vary enormously. So I am sure there exist embedded applications where strict IEEE-compliant floating point (single or double) makes sense. But I am confident that it is very rare.

Reply to
David Brown

As an aside, has there been any movement towards open source tools for FPGA development? The last time I looked at this a couple or so years ago it was all vendor specific and closed source.

A quick look now didn't seem to turn up much in the open source area apart from the Papilio board range (which claims to be open source, but I cannot see any evidence the FPGA tools themselves fall into that category).

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP 
Microsoft: Bringing you 1980s technology to a 21st century world
Reply to
Simon Clubley

The place and route plus post-route timing simulation requires intimate knowledge of the FPGA internals. I can't see the chip manufacturers /ever/ divulging that to (non-governmental) third parties.

There are quite a few open source FPGA boards.

Reply to
Tom Gardner

I had a glance at the article; of course I did not waste all day on it. So do you have a decent FPU to demonstrate - with a reasonable number of pipeline stages - or do you not? I don't think you can impress many people in this group with the fact that something is readily available (10 minutes of clicking).

You quote me out of context. While I agree with what you say here, what you quoted was aimed at the implementation itself (I perceived it as messy), not at the difficulty of the task.

Which word exactly do you have a problem with?

Oh it is not that complicated, come on. Just read my former post.

Yes.

So try and prove me wrong. I offer my opinions here for free, feel free to take them or to ignore them. It is up to you to evaluate their validity for yourself, you get them here for free, remember.

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments

formatting link

------------------------------------------------------

formatting link

Reply to
dp

Well, maybe they will open these data some day (when the sky turns yellow?) :D . My first (very very naive) attempt to get these data was back in the LCA era (1990 or so). Then, some years later - 1998-9 maybe - I managed to get the data on a CPLD - the CoolRunner, then still Philips. Months if not days before they sold it to Xilinx (after which getting the data would have been mission impossible). Or at least I thought I had all the data. When I sat down to write my logic compiler tool for the purpose I discovered I did not have a major part, describing a multiplexor inside the chip. Took me some time (a month? - don't remember) to write some code to do all the reverse engineering it took. Here is the result:

formatting link

More recently I needed a CoolRunner again (had to be that part for the 5V tolerant inputs) but the old version was no longer being made - just the Xilinx one. I had to settle for their logic compiler (used some ABEL thing they had which worked OK), yet it took me a nightmare of NDAs and waiting (months IIRC) until I got the data I needed to be able to JTAG program the thing on the board (I wanted to have it reprogrammable over the net). Here is the board of the netmca in question:

formatting link
(coolrunner in the bottom left corner).

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments

formatting link

------------------------------------------------------

formatting link

Reply to
dp
