Small, fast, resource-rich processor

Re-read your posts of the past few days, in this thread and the "AREF bypass capacitance" thread. Fair enough, you haven't explicitly claimed that FPGAs are the best solution in all ways for all problems - I exaggerated. But you have still made a range of absurd claims about how good they are in many circumstances, despite evidence to the contrary.

FPGAs have their uses - there are things you can do with them that would be near-impossible, and very expensive, to do in any other way. And there is an overlap of problems that can be solved by either processors/microcontrollers or FPGAs. But face facts - there are lots of problems that are more efficiently done in software, and that means a microcontroller or processor (or soft processor /if/ you already need an FPGA for other things).

You will be a better advocate of FPGAs if you are realistic about them - at the moment, you are scaring off the fence-sitters.

I haven't implemented a Kalman filter myself, though I trust Tim's judgement here. But you don't need any experience to look at the Wikipedia page (or any other webpage or book) and see that this is a complex algorithm, and is best done step by step.

What you seem to be missing entirely here is that no one is saying Kalman filters cannot be implemented on FPGAs - we are saying it is vastly more difficult to do so. It takes a lot of work to learn to understand these things, and to test and debug the code step by step. It is orders of magnitude easier with software that is easy to start and stop, debug, view data, print out logs of data, feed with test data, run on a PC rather than the target, etc. (And if you think to mention FPGA "simulation" - or even "co-simulation" - don't bother, for reasons that are obvious to everyone else. If you want to talk about MyHDL or Lava, I'll be happy to hear your new ideas.)
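To make the point concrete, here is a minimal sketch - a one-state filter with made-up noise parameters, nothing like the OP's equalizer problem - of the kind of code you can compile on a PC, single-step in a debugger, and feed with test data straight away:

/* Minimal one-state Kalman filter -- the sort of thing that is easy to
   compile on a PC, step through, and feed with test data before any
   target (CPU or FPGA) is involved.  The noise parameters are made up. */
#include <stdio.h>

typedef struct {
    double x;   /* state estimate    */
    double p;   /* estimate variance */
    double q;   /* process noise     */
    double r;   /* measurement noise */
} kf1_t;

static double kf1_update(kf1_t *kf, double z)
{
    /* Predict: the state is modelled as constant, so only the variance grows. */
    kf->p += kf->q;

    /* Update: blend prediction and measurement using the Kalman gain. */
    double k = kf->p / (kf->p + kf->r);
    kf->x += k * (z - kf->x);
    kf->p *= (1.0 - k);
    return kf->x;
}

int main(void)
{
    kf1_t kf = { .x = 0.0, .p = 1.0, .q = 1e-4, .r = 1e-2 };
    const double z[] = { 0.9, 1.1, 1.0, 0.95, 1.05 };   /* fake test samples */
    for (unsigned i = 0; i < sizeof z / sizeof z[0]; i++)
        printf("%u: %f\n", i, kf1_update(&kf, z[i]));
    return 0;
}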

When you have a good, working Kalman implementation in software, and you need to run it 100 times faster with little regard for hardware costs - /then/ it is time to break out the FPGA and transfer it over.

Yes, people implemented Kalman filters in FPGAs before there were hard ARM cores - there were other hard CPU cores before that (PowerPC, and older, weaker ARMs) as well as a multitude of soft CPU cores. And as I say, it /is/ possible to implement Kalman in "pure" FPGA. People implement these things for a doctoral thesis - while in the pure software world, people knock them up in a couple of days using software downloaded from the net. /That/ is the difference.

Reply to
David Brown
[KMS experiences]

To be fair, this "problem" isn't confined to "hardware engineers". It happens when *any* influence dominates design decisions -- be they hardware engineers, software engineers, marketeers, bean counters, etc.

If "everything looks like a nail", then you *will* feel justified using "your hammer"! (software folk have just as much tunnel-vision as hardware folks: "I'll write this in ...")

Good design teams include a balanced mix of all the factors/parties "with skin in the game".

I can design a clever/inexpensive piece of hardware that would greatly increase the time (cost) for a software developer to implement a given algorithm/application. Or, seriously impact his ability to verify its correctness!

I can write an elegant piece of software that would require very expensive (or bleeding edge!) hardware to run properly.

I can write a specification that will cause both of the above to expend lots of resources for "dubious" ends.

etc.

The *best* projects I've worked on have had diverse personnel with respect for the capabilities of their peers. And, a very blunt, forthcoming attitude towards questioning claims as well as "demanding" explanations/justifications (which means being able to unequivocally admit you don't know something!).

I've often been in meetings where some claim is made (honestly, genuinely) that, when asked for clarification, falls apart under closer scrutiny -- not because of any arrogance or incompetence on the original claimant's part but because of a closer inspection of the *actual* requirements of the project and their consequences wrt the issues claimed: "Um, Tom, that won't work because..." "Argh! I didn't realize that was an issue, here! Nevermind..."

And, ultimately, you need a decision making process that respects these various inputs and knows how to balance them within the framework of the organization/project. "Value added" comes from knowing how and when to draw from each particular area of expertise.

Reply to
Don Y

That just about sums up my perception as well.

As I noted in a prelude to a note on the AREF thread: 'My starting point is to be amused by anybody that implicitly extrapolates from "my previous constraints and requirements" to "everybody's constraints and requirements". Having got that out of the way...'

Reply to
Tom Gardner

Yup.

One of my bozo filters is to listen for embarrassing "no" answers when glib "yes" answers are the easy, simple default.

If I hear an embarrassing "no" from an individual then I'll tend to believe any "yes" answer from that individual.

Quickly sorts corporate drones and political lackeys into the bozo pile.

Reply to
Tom Gardner

A close associate worked with me on a particular job. He had worked out a "clever" solution with the client for one particular aspect of the project. "They" (he and client) presented it to me after a few weeks of working out the details.

I was puzzled as to why *I* couldn't see the cleverness! Oh, I could see what *looked* like it would be "clever" but there was a huge flaw in the technique.

When I questioned my colleague (in front of the client), I drew his attention to this and watched the "Oh, Shit!" spread across his visage.

No "face" lost. Just *time*. We (he and I) both knew that he would come up with a good solution -- probably even *more* clever! But, he was chagrined at all the time he had invested without noticing this (obvious?) flaw in his approach.

"Forest"... "trees"...

Reply to
Don Y

Don Y wrote: [ ... ]

Daniel Kahneman's recent book _Thinking, Fast and Slow_, about the "personal economics" of hard+slow and easy+fast thinking, has some interesting things to say in this regard.

Mel.

Reply to
Mel Wilson

Yes! Yes yes yes! Yes to all of it.

When one is graced with a group to work within, the best part comes if the members of the group aren't offended by such blunt questioning, and are willing to undertake it themselves.

--
Tim Wescott 
Control system and signal processing consulting 
Reply to
Tim Wescott

Xilinx DSP48s are 48 bits. Altera's accumulators are similar. That's not narrow in my book.

Regards,

Mark

Reply to
Mark Curry

...

and i'm totally in agreement, too. sometimes software people are just too unaware of the memory costs (usually resulting in the unnecessary copying of signals and other large data blocks, often by "passing by value" or just another use of "new" in C++) that hardware guys would be quite aware of. it's why DSP guys need to be good at math, good at programming, and at least competent enough to be aware of what is costly in hardware (and not just CPU MIPS, but memory size and costs, bus bandwidth and the like). i think i can turn any decent mathematician into a decent DSPer, which is more than i could say about a decent hardware engineer or a decent analog engineer or a decent programmer. but, to be a decent DSPer, we would have to get the mathematician to know about and be concerned about what the resources are, what costs dearly and what costs less dearly. and if the mathematician writes crappy, convoluted "Microsoft" code, i would have to slap him down.
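(a trivial C illustration of the kind of hidden copy i mean -- the names and sizes are made up, and in C++ the same thing happens through copy construction:)

/* Passing a big block by value copies all of it on every call;
   passing a pointer copies 4 or 8 bytes.  Names and sizes made up. */
#include <stddef.h>

#define BLOCK_LEN 4096

typedef struct {
    float samples[BLOCK_LEN];
} block_t;

/* every call copies BLOCK_LEN floats onto the stack */
float mean_by_value(block_t b)
{
    float sum = 0.0f;
    for (size_t i = 0; i < BLOCK_LEN; i++)
        sum += b.samples[i];
    return sum / BLOCK_LEN;
}

/* every call copies one pointer */
float mean_by_pointer(const block_t *b)
{
    float sum = 0.0f;
    for (size_t i = 0; i < BLOCK_LEN; i++)
        sum += b->samples[i];
    return sum / BLOCK_LEN;
}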

--
r b-j                  rbj@audioimagination.com 

"Imagination is more important than knowledge."
Reply to
robert bristow-johnson

This is a more reasonable conversation, but you still make unsupported claims. What did I say that was "absurd"?

Again, I have never said anything to the contrary. My point is that the line between the tasks more usefully done on an FPGA and the tasks more usefully done on an MCU is not where most people (at least in this thread) think it is. There is a *lot* of prejudice and bias about FPGAs and the effort required to use them. This mess started when I questioned the use of the word "nightmare" to describe the FPGA development process. I don't think you are trying to support that claim, are you? No, you are trying to make the point that it's not the opposite, because you seem to feel I am saying FPGAs make everything easy. I'm not saying that, and that is obvious if you read what I actually write.

Again, you are making a claim about my position without any support. What did I say that was unrealistic?

Why does that make it hard in an FPGA?

Yes, they say it is *MUCH* harder to do in an FPGA... without *any* supporting evidence. Your bias is pretty clear from this paragraph alone. You list the software process in detail and then negate, without ***any*** evidence, the utility of HDL simulation. In fact, you decry it specifically, stating that you don't need to provide any evidence because it is "obvious"!!! That is *exactly* the sort of bias I am addressing.

Have you done FPGA work?

/then/ it is time to break out the FPGA and transfer it over.

More bias... this assumes that the *only* utility of an FPGA is to make things run very fast and that FPGA hardware costs are much higher than CPU hardware costs. Really? I have FPGA boards that sell for under $50. I believe the hardware the OP talked about was an SBC of some sort, which is not likely to be under $50.

And yet, no one can tell me what aspect of a Kalman filter makes it so hard to implement in an FPGA...

--

Rick
Reply to
rickman

Double precision floating point is not a difficulty in an FPGA... next!

Why is this hard in an FPGA?

I appreciate you taking the time to break the algorithm down like this, thanks.

What part of this is unsuited for an FPGA? It sounds like a lot of computations to me. Computations are easy in FPGAs. FPGAs are very well suited to computations. It also sounds like there is some control structure guiding the computations, that is not hard in an FPGA.

I am speculating that because there are a *lot* of computations and, in the OP's case, the sample rate is (relatively) low, the algorithm is suited to serial computation, which is what CPUs do. None of this makes it *unsuited* to FPGAs, nor does it mean it would be a "nightmare" to implement.

Is there any part of the above algorithm that you see as hard to do in hardware in an FPGA?

--

Rick
Reply to
rickman

Thanks for this info. I have some questions.

First, without any details of *how* a KF works, what does it do? I'm asking for the 10,000 foot view. Or maybe the "why" is more appropriate? Or both?

So am I correct in assuming you have an input vector and an output vector in addition to the state vector? Or are the input and output scalars?

Please help me understand. X is an input or an output or both?

I can't say if fixed point or floating point would be better. I'm mostly agnostic on that matter and would leave that decision to the algorithm designer. I'm a little confused about your mention of "huge precision" of 64 bits. Isn't the double precision floating point also 64 bits? I think I'd actually prefer the floating point because the mantissa is smaller than the fixed point. Fixed point can become very complicated if block floating point or a similar functionality is used to compensate for the lack of range compared to floating point. The length of 64 bits in itself is not a significant issue IMHO.
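(For reference, a quick check of the widths being compared here -- a double is 64 bits overall, but the significand that actually feeds the multiplier is 53 bits:)

#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("double: %zu bits total\n", sizeof(double) * 8);
    printf("double: %d significand bits\n", DBL_MANT_DIG);  /* 53 for IEEE-754 doubles */
    return 0;
}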
--

Rick
Reply to
rickman

Since you seem to be familiar with these and to save me digging up a data sheet, what is the width of the inputs to the multiplier in the DSP48? Is the accumulator 48 bits or is it wider?

I was helping a colleague learn VHDL and he was telling me about the Altera DSP blocks, but I don't recall the details. He did mention multiple accumulators for each multiplier, which I expect helps in many apps. I just don't recall all the bit widths: inputs, product, accumulators.

--

Rick
Reply to
rickman

But this requires people who are "secure" in their abilities and shortcomings. Not "embarrassed" to admit when they don't understand something. And, not "gloating"/smug when one of their peers "exposes his soft underbelly". And, similarly, respectful of what their colleagues' capabilities (and limitations) happen to be.

In this kind of environment, it's hard NOT to learn something!

Unfortunately, the flip-side of this is also true: when forced to work with a group of insecure, closed-minded, defensive fools, *nothing* gets done -- besides lots of finger-pointing and ass-covering.
Reply to
Don Y

The accumulator is 48 bits. The inputs to the accumulator are:
* The accumulator itself (or an accumulator from another DSP48)
* A shifted version of the accumulator (or an accumulator from another DSP48)
* The output of a 25 bit x 18 bit multiply (43 bits)
* A full 48 bit input from the FPGA fabric

There are restrictions on which inputs can be used when (i.e. not all of the above can be used at once). There are other variations too, but the above captures the high-level picture.

The above is for Virtex7. For previous generations, the multiplier was only 18x18.
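(A rough C model of those widths -- a 25x18 multiply feeding a 48-bit accumulator -- just to show the bit growth. The function names are mine, and it ignores the input muxing and the other DSP48 features:)

#include <stdint.h>

#define ACC_BITS 48

/* Keep only the low 48 bits of a value, sign-extended back to 64 bits. */
static int64_t wrap48(int64_t v)
{
    uint64_t u = (uint64_t)v & ((UINT64_C(1) << ACC_BITS) - 1);
    if (u & (UINT64_C(1) << (ACC_BITS - 1)))
        u |= ~((UINT64_C(1) << ACC_BITS) - 1);
    return (int64_t)u;
}

/* One multiply-accumulate step: a fits in 25 signed bits, b in 18 signed
   bits, so the product fits in 43 bits; the running sum wraps at 48. */
int64_t mac48(int64_t acc, int32_t a, int32_t b)
{
    return wrap48(acc + (int64_t)a * b);
}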

Regards,

Mark

Reply to
Mark Curry

I think a reason "people building THINGS" (e.g., embedded systems) tend to listen to the hardware folks is that they deal with tangible objects that have (recurring) price tags on them. One can assume the nonrecurring costs will "eventually" disappear. So, the recurring costs (and pricing) determine your profit.

Flip the bias around and let the "software" guys drive the design:

"OK, what resources do you *need* ('want' is not an option!) in the product? How much RAM? What sort of computational ability? How much ROM? Clock rate? etc. We'll have the hardware guys start working on THAT design next week -- and the marketing guys figuring out how we can set the price point to make money on this. Of course, if you have a faulty estimate, we can't go back and make a second pass at it, later!"

Ask yourself how well *you* can estimate the TEXT + DATA sizes of a project "on day 1"? And, how much horsepower it will take to get the computations done in a "timely" fashion? And, *justify* the costs of this as they will directly impact the selling price...

Ask a hardware guy what the costs of "some design" he has in his head are likely to be and he'll tend to be pretty close to the final costs. (Whether the design can be effectively *developed* is a different issue.)

Now, imagine you're Manglement and have to listen to these two guys. Who are you more likely to feel comfortable following?

Reply to
Don Y

Thanks, I just added it to my library list (I'm too old to be purchasing any more books! Even reading just one a week I'd have run out of shelf space ages ago!! :-/ ).

I enjoyed Lehrer's _How We Decide_ a few months back. And, Pink's _Drive: The Surprising Truth About What Motivates Us_ before that. (A few other similar titles escape me at the moment).

Currently chasing down Gregory's _Stupid History: Tales of Stupidity, Strangeness, and Mythconceptions Throughout the Ages_ and Poundstone's _Priceless: The Myth of Fair Value_.

[What the hell is it with colons in book titles, nowadays?? :> ]

Always amusing to see how and why we/others think the way they do. Esp in light of contrary evidence.

--don

Reply to
Don Y

Thanks

--

Rick
Reply to
rickman

With apologies if I seem to be abstruse here -- I don't mean to be; it's just the nature of the KF that it lends itself to abstruse answers:

From the mathematician's point of view, the Kalman Filter is making the Best Damned Possible estimate of the state of the system at each step, as long as the system involved is perfectly known, perfectly linear, and subject only to Gaussian noise.

Alas, there are exactly no real-world systems that match those criteria.

From the engineer's point of view, the Kalman Filter is a good way to make a pretty good estimate of the state of a system at each step, even when intuition fails. Basically, if you have no intuitive grasp of how to proceed, but you've got the math chops to do a bunch of opaque mathematical operations, you can get an algorithm that does what you want.

In this particular case, the system description changes at every step, because the "state" being estimated is a set of equalizer taps, while the system output matrix is taken directly from the input data stream.

The input and output are generally vectors, but can be scalars. In the Kalman Filter literature a scalar is just taken to be a 1x1 vector, although in practical applications you can, of course, get some improvements in efficiency by recognizing that they're scalars.

Neither. The vector x is the system state. A concrete example of this would be a KF application to merge GPS and IMU information: in that case, x would be the position, velocity, and orientation of the vehicle. With three dimensions to play in, that means that x would represent nine states.
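Just to make that layout concrete, a purely hypothetical C sketch (the names and the three-angle orientation are mine, for illustration only):

/* Nine-element state vector x for the GPS/IMU example:
   position, velocity and orientation, three axes each. */
typedef struct {
    double position[3];     /* x, y, z          */
    double velocity[3];     /* vx, vy, vz       */
    double orientation[3];  /* roll, pitch, yaw */
} nav_state_t;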

The output is taken to be a function of the state (and possibly the input), but it is not the state per se unless it's a pretty boring Kalman Filter.

In this case, the output is an ever-changing function of the state, just to make life interesting.

64 bits is more precision than I've ever needed for fixed point. Therefore it is "huge". I usually need 32 bits when I'm doing control stuff in fixed point. Therefore 64-bit floating point precision is "wasteful".

Life's so much easier when you're egocentric.

For this particular application 64-bit fixed point could be used without block floating point. Since in an FPGA one isn't constrained to 2^n sizes, one could probably trim it down to 48 bits and still get decent performance.
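Purely as a sketch, plain (non-block) fixed point at that sort of width looks like this in C -- the Q split and the names are arbitrary, and in the FPGA the word would really be 48 bits wide rather than sitting in a 64-bit container:

#include <stdint.h>

/* 48-bit fixed point carried in an int64_t, here split as 16 integer
   and 32 fractional bits.  Pick the split to suit the data. */
#define FRAC_BITS 32

typedef int64_t q48_t;

static q48_t q48_from_double(double v)
{
    return (q48_t)(v * (double)(INT64_C(1) << FRAC_BITS));
}

static q48_t q48_mul(q48_t a, q48_t b)
{
    /* The full product can be up to 96 bits, so use a 128-bit
       intermediate (a GCC/Clang extension) and rescale back. */
    return (q48_t)(((__int128)a * b) >> FRAC_BITS);
}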

--
Tim Wescott 
Wescott Design Services 
Reply to
Tim Wescott

I think you err both in assuming that recurring costs are always king, and that the software guys are always unaware of them.

I did a lot of work at a company making $500,000 systems where we might build 200 units over the lifetime of the system. In that case, $50,000 worth of engineering time spent to take $100 out of the bill of materials cost was $30,000 wasted (more if you were looking for a 1-year ROI).

I've got the same circumstance with this system: I can shoe-horn this into a $100 board, but we're only going to amortize it over a few units, so I'll be throwing a lot of money into a rathole if I try to make it all fit.

Sometimes a high-dollar bill of materials _is_ an ingredient in the least expensive solution.

--
Tim Wescott 
Wescott Design Services 
Reply to
Tim Wescott
