You can do a lot in HDLs - but not everything. First, let's get the easy points out of the way (I hope!), even though they don't apply in the particular case of the Kalman Filter. There are several "C to HDL" tools available these days (that these tools exist at all suggests it is faster and easier to write and debug code in C, then move it into FPGA hardware for speed). By looking at some of the limitations of such software, we can get an idea of what maps well to FPGAs and what does not. Now, I have not made an exhaustive search of all such tools - I've just had a quick google and read around. But these constructs are typically difficult or impossible to translate:
- Dynamic memory (malloc, free, etc.)
- General pointers (pointers are usually limited to within a specific array implemented in a memory block)
- Recursion (unless it can be completely determined at compile time)
Algorithms that require a lot of dynamic or unpredictable behaviour, or that require complex data structures, are going to be hard to implement in the FPGA. Not impossible, of course, but hard.
As a general point, if you have an algorithm that is expressed sequentially, and it does not involve such dynamic behaviour, then you can probably code it in a fairly straightforward way in an HDL. But what good does that do you? Your system is doing one step at a time, even though you are probably using lots of resources - you will end up with a lot of blocks being instantiated for arithmetic and other functions as they are used in the algorithm, even though each is only active for a few cycles out of each multi-cycle loop. You've just created a large, expensive and not particularly fast sequential system (albeit one with very predictable timing). To make the whole thing worth the effort, you will have to work at re-arranging the code to make good use of the FPGA, with more of the blocks working more of the time.
My guess is that you don't see the issues here because you do this so often that you don't really think about it (just as you say about me below) - but that does not make them any less real.
It's true that there are balances and trade-offs in software too. But I think these sorts of decisions have far less impact in software than in FPGA design, in terms of the types of changes they need in the source code, the time taken to make those changes, and the effect the changes have. And I think the "obvious" implementation is more often going to be the optimal one. In the example of where to put your variables, the obvious answer is to make them automatic variables - the compiler will put them in registers, on the stack, or eliminate them entirely. You don't have to think about anything here. If you are talking about an array of data, you begin to have real choices - put the array on the stack, on the heap, or in statically allocated memory. But the code and run-time impact of the choice is minimal in most cases.
But I'll accept that my experience makes me see these things as a bit more obvious than they would be to non-experts.
HDLs work at a much lower level than software, and the tools and simulators match that. This is both a benefit and a disadvantage - it can give you far more control and more precise timing - but it also makes it harder to see the wood for the trees. (This is not entirely different from the age-old assembly vs. HLL debates.)
It is /much/ simpler than that.
If a system has to read 6 input signals, generate 10 output signals, and handle telegrams on an RS-485 bus, then it can handle them one at a time sequentially - and as long as the timing is good enough, the result is the same as if everything were done in parallel.
Sometimes an operating system of some sort (I presume that's what you mean by the "additional software") can make this task easier, but it is certainly not a requirement.
Of course, if timing is very tight or there are difficult synchronisation requirements, then the job might be done more easily using "real" parallelism - an FPGA.
I'm guessing that the IP blocks Altera sells (or re-sells, for the third-party blocks) are reasonably optimised for their devices. The number of LEs used varies according to the device, so I assume that on the more advanced devices the DSP blocks can do more of the work.
And certainly there will always be tradeoffs in terms of the resources used and the time used, and there will always be scope for optimisation based on more precise usage of the blocks. I am just using specific examples, from what I assume is a realistic source, to get rough figures.
I agree that I am not proficient at FPGA work - and I have no doubts at all that efficient implementation of a Kalman Filter in an FPGA would be a lot more time-consuming for me than for experienced FPGA developers such as yourself (assuming equal understanding of the maths and the algorithm, of course). But I believe I have enough understanding and experience of FPGA work to have an idea of the challenges involved, the costs, and the time and effort required - as well as the benefits of FPGAs for some types of problem. I have not managed to convince you of this, and I don't think we will ever agree here. But Usenet discussions are about exchanging ideas - no one ever really expects other people to change their minds!
If I had the time and opportunity, I would love to do more FPGA development. Perhaps my opinions would change if I did - I will be that open-minded at least. But it has been a while since we've had a project for which an FPGA was ever a serious contender.
Better, cheaper, faster, easier, more flexible - is that too demanding?
It must be said that the devices and the tools for programmable logic have improved enormously over the years, making FPGAs suitable for a wider range of applications than they used to be (many years ago, I worked on a CPLD design that took 6 to 8 hours for place and route at each trial, and was debugged using a couple of flashing LEDs). On the other hand, microcontrollers have got enormously faster and cheaper too, and applications that used to require expensive FPGAs can now be done on cheap micros.
$3 will get you a Cortex M4 (Freescale K10) at 72 MHz. You don't need hardware floating point when you use fixed-point arithmetic (and to be fair, I know that converting the algorithm to fixed point would make an FPGA implementation much easier too). Although these devices don't have floating point (single-precision FP means a Cortex M4F, at about $6+), they have MAC-type instructions. Obviously I have no details of the OP's project, but I expect such chips to be fast enough with a bit of tuning of the algorithm. (He didn't want to have to tune the algorithm or the implementation, as development time was more relevant than hardware costs.)
You don't need any support circuitry for these chips other than a single power supply (1.7 - 3.6V, about 200 mA for full speed), a header for programming and/or debugging, and perhaps a couple of capacitors. You don't even need a crystal for many applications - the internal oscillator is accurate to well within 1% at room temperature. If your inputs and outputs are analogue, you have an ADC and DAC built in.
In comparison, you usually need a lot more support circuitry for an FPGA - typically you need multiple voltage rails with significantly more current and tighter tolerances, some sort of accurate clock source, and external flash. (I know there are some FPGAs with fewer requirements.) You may also need external RAM - you don't get nearly as much built-in RAM for your money with FPGAs.
I leave you with one more thought. When you google for Kalman Filter software, it's easy to find ready-made implementations in C to download and use. When you look for Kalman Filters in FPGAs, results are mostly academic papers, and mostly without any code. The impression is that Kalman in C is so easy (/if/ you understand the algorithm!) that it can be given away - but implementing them in an FPGA is an undertaking worthy of a major project at university, and anyone making them outside of academic circles considers the results too valuable to share.