Small, fast, resource-rich processor

I would like to second the above sentiments, with regret.

Why regret? Because Mr Rickman appears to have some useful expertise and has gone out of his way to be helpful (by posting here).

The impression I form is that:
- Mr Rickman has considerable expertise with FPGAs and so finds developing solutions using FPGAs easy. Fair enough.
- Mr Rickman has less expertise outside that area, and so finds those topics more difficult - or harder to understand where he has no experience. Fair enough.
- Mr Rickman castigates those with less expertise in FPGAs for thinking FPGAs are more difficult to use.

And therein lies a certain degree of irony.

A variant of Amdahl's Law :)

Not true for modern processors - where there can be many independent processors in a single chip.

Anyway, the parallelism argument is not a good argument for FPGAs. Parallelism can be bought by the application of $.

*Latency*, however, can be much lower and more predictable in FPGAs, and can't be bought with $.

In the networking fraternity there's an aphorism: "bandwidth is determined by dollars, latency is determined by physics". Same's true at this level too :)

With DSP algorithms, better edge-of-envelope performance can sometimes be obtained by "clipping" fixed point values when they go out of range. (It always amazes me how much clipping some types of spread spectrum systems can tolerate without loss of performance. Sometimes it feels like the front-ends only need one or two bits!)

How is fixed-point "clipping" specified within a C program or library? (Bog-standard C allows silent overflows, of course)

Reply to
Tom Gardner

The much more significant difference is that numerical calculations work correctly in IEEE that were broken in earlier formats. That is why Prof. Kahan (the designer) got the Turing award. It wasn't for data interchangeability, it was for finally getting the math right. As he put it, he designed IEEE 754 to make the world safe for floating point hardware.

Kahan's site gives some of the rationale for the standard. Other pages on his site are also interesting if you care about numerics.

Reply to
Paul Rubin

I think in the case of the Kalman filter, this would be either unworkable or highly suspicious. If the KF were being used for anything important, some serious mathematical justification would be warranted before going ahead with such an approach.

You'd use intrinsics for saturating arithmetic if your CPU supported it. For example, the XMM instructions on the x86, or comparable multimedia instructions on the fancy ARMs.

Reply to
Paul Rubin

I'm going from hearsay here, but:

The biggest argument for using non-IEEE-compliant floating point is that, by and large, the largest expenditure of logic (or code and clock ticks in emulation) in implementing 100% compliant floating point is properly dealing with all possible exceptions and combinations thereof.

If your algorithm is debugged and verified to the point where you can be sure of never, ever hitting an exception, then your floating point processing costs go down.

The second-biggest argument is a strong advantage of FPGAs: you can tailor your data path to your data.

And yes, you'd really want a numerics expert on board, or to do your design conservatively, if you were going to proceed. Which is just yet another tax on your project if you're going to use FPGAs instead of standard processors.

--
Tim Wescott 
Wescott Design Services 
Reply to
Tim Wescott

Trimming the post didn't change what I read. If you want to be rude, then please don't bother to post.

I read what you wrote, all of it. I don't know what you are trying to say with "enterprise software". If you have a point to make, please make it in a different way.

No, just your post not being clear. "it" is used poorly here. Try writing more clearly and I won't need to read "it" so many times.

No, just another example of your poor use of the language. If you just want to rag on me, please don't bother. If you really want to communicate, please rewrite your post more clearly.

--

Rick
Reply to
rickman

Really? If you don't implement the full IEEE-754 standard your code is "hacky"? I don't even know what that means. It is useful in software so that code is easily ported between processors. Otherwise there is nothing magical about the format.

If a design uses 22 bits of mantissa instead of 24 bits, why is that a problem? Or if a designer decides the app needs 30 bits of mantissa, how is that an issue?

--

Rick
Reply to
rickman

Thank you for having the honesty to confirm my suspicion.

Communication requires that something be written and read. Not speed-read so fast that relevant parts are omitted.

Reply to
Tom Gardner

I learned that a long time ago working on an array processor, the ST-100. It had two rack cabinets of circuitry, one was full of conventional TTL/ECL and the other was three boards of ECL custom gate arrays (the bulk was for the cooling). Two of the three boards were the "compute head", all the logic for doing the floating point math - two adders, two multipliers and a square root/divide circuit. The third board was the SMP, Storage Move Processor. It was responsible for keeping the cache memory filled with data the compute head needed to operate on and tucking away the results into main memory.

At the time I was very impressed with the fact that the SMP was a full 50% of the size of the actual ALUs. It made me realize the importance of data movement and how that actually defined the capabilities of a DSP machine.
--

Rick
Reply to
rickman

Hmmm, doing the calculations to see how many bits you need, how you begin to accumulate error in FP if the mantissa is too short etc. hardly takes more than high-school grade maths...

Dimiter

Reply to
dp

loss of 12 dB?

i'm basically with you here, Rick. (BTW, i started a response to the other thing, but there are too many points to back-and-forth on, so i dunno if i'm gonna finish it.) when it's a specialized hardware, you can use whatever format you like. i think that ultimately, you have to represent the mantissa as either a twos-complement (or maybe a sign-magnitude) integer and the exponent as an integer and do the math with those. at least that's how i pulled apart a float to do math on, i think it would be about the same in hardware (but you would have issues regarding the NaN and INF exceptions).

one thing i'll say for IEEE-754 is that the biased exponent placed immediately left of the mantissa (with the leading 1 removed for normalized values), and then with the biased exponent held at its smallest value (i think it's 0 which corresponds to 2^(-127)) while the mantissa just continues to count down into the denorms, that design is nice so that if you compare two floats with the same hardware of comparing two ints, the compare operation comes out the same.

except, in IEEE-754, if both values are negative, then the twos-comp compare result is opposite of what it should be. this could have been corrected nicely if the IEEE-754 guys had been thinking the same as the guys that designed the DEC PDP-10. what they should have done, rather than to represent negative numbers as a sign-magnitude representation, they should have twos-complemented the negative numbers from their positive counterparts. then, positive or zero or negative, normalized or denormalized, the mapping of the ostensible fixed-point value of some bit pattern in the word to the floating-point value (of the same bits) would be a strictly increasing function (it would look like a piece-wise linear approximation to the sinh() function). a compare of the bits as twos-comp would result in the same compare decision.

--
r b-j                  rbj@audioimagination.com 

"Imagination is more important than knowledge."
Reply to
robert bristow-johnson

Ok, a *factual* statement that can be discussed, even if it is unsupported. You make a claim that you have to reorganize the algorithm to suit HDLs. What aspect of HLLs is missing from HDL? They include conditionals, looping, branching, etc. Why is it hard to implement any aspect of an algorithm in an HDL?

Why is this significantly different from software? The fact that there are multiple types of resources? Software has the same issues. Do you put the 64 bit variables on the stack dynamically or allocate static memory? Software can be implemented in many, many ways as well, and that is where experience comes in. Many people have lots of experience with software and are comfortable with these trade-offs. So comfortable that they don't even see them as issues... such as in your case, apparently.

Why can't you single step the system in a simulator? Actually, it is better if you don't. If you had more familiarity with the simulators used for HDL design, you would realize that they work very much like software debuggers, but with a significant advantage (at least to me): in addition to all the standard displays of memory, signals, etc., they are very visual, showing graphs of the signals changing over time. I can run a simulation and go forwards and backwards in time to see what caused a given result. Single stepping is great, but it is hard to go back to an earlier time.

...no comment...

It is not a matter of perception when you are writing the software. Implementing virtual parallelism on a sequential processor requires a lot of additional work somewhere, by someone. Much of it may have been done for you, but if your app needs to use that parallelism, you need to address that in ways which are unique to this situation. When things are *actually* run in parallel this all goes away.

The point is that "natural" has no meaning, in food or in this discussion, unless you give it one. I'm asking for the meaning you intended.

This statement is accurate and has no bias. But it says nothing about what the "natural fit" is for FPGAs.

Yes, I believe that is what I wrote above, but in more detail.

Unless you use the dedicated multipliers as they are intended.

I don't see what is difficult and you have not explained it. You are trying to make a point that FPGAs are poor at decision making, which is not true. You have tried to make a point that FPGAs are poor at floating point, which is not true.

Ok, I think this sums it up. You have done just enough FPGA work to appreciate that it is not the same as coding in an HLL, but clearly you have not become proficient in it. More importantly, I think you are a bit stuck in the sequential code mindset. This might not be a problem using FPGAs, but it is nice if you open up a little more and see the full capabilities.

I once gave some free advice to a software guy who had been tasked by his company with porting a design to an FPGA. I forget the details but he wanted to start out with a "hello world" program. A number of us advised him that this was not so simple a task in an FPGA, but he motored on and proved us wrong. With just a little coaching he was able to complete his task and I was not able to turn it into a consulting gig. He was rather grateful for my support and convinced his employer to send me a $500 check.

So clearly if you are open to what can be done with an FPGA, they aren't so hard to use after all.

Ok, I won't dispute that for many users, FPGA design is harder than software. My statement was that calling it a "nightmare" is inaccurate. I have provided plenty of evidence and you have as well. You said that FPGAs are Turing complete. HDLs are as well. They are also very facile (if you don't mind strong typing in the case of VHDL) and the debug tools are excellent. What more do you want?

The OP tested the algorithm on an ARM and said it wasn't fast enough. I seriously doubt that you would be able to run it adequately on a $3 ARM since that would be the low, slow end of ARM devices. I'm pretty sure you would have a hard time getting hardware floating point on a $3 device, much less double precision. BTW, by the time you add all the support circuitry and put it on a board, that $3 chip will have a retail cost of $50 or close to it.

How long is a piece of string? We can't really say what the OP's algorithm will run on specifically, can we? Although he did say something about wanting 1 MFLOPS IIRC. You might be able to run that on a $35 Raspberry Pi.

--

Rick
Reply to
rickman

Not really fair. I have experience with MCUs and even PC programming. I will admit I am not current in the technology however as I have gone over to the "dark" side... I program in Forth now. I don't find software very hard to understand and I don't find it *difficult* at all.

Hmmm... castigate sounds like a loaded word... I am simply disputing the point stated that implementing a Kalman Filter with an FPGA would be "a nightmare". I maintain that FPGAs are largely as easy to use as MCUs and CPUs, but that they are less well understood and appreciated by those who are making the statements about how hard they are to use.

Is that what you mean by "castigate"?

I've already excepted the multi-core chips elsewhere.

I am not trying to debate the issue of what is the best way to solve problem X. I am simply trying to get people to understand that FPGAs are not a "nightmare" to use.

--

Rick
Reply to
rickman

I have never said it would be "zero" effort. I am disputing the claim that implementing a Kalman Filter in an FPGA would be a "nightmare". Why does everyone extend my statements beyond the stated scope?

Uh, it would be the same in Verilog as in C. It would even look much the same, no?

--

Rick
Reply to
rickman

I'm sure that's true for some classes of software, and equally sure not for others such as application frameworks (J2EE/JAIN etc), distributed caches, enterprise service busses, map-reduce frameworks, ACID and non-ACID databases, software transactional memory, CORBA, webservices, REST, AJAX and so on.

Mind you, many of the people that program the "beans" etc. in those environments have vanishingly little concept of even what a compiler emits.

I'm sure it would be more successful teaching "low-level" people like us about enterprise stuff than vice versa.

Your statements have been interpreted by many in your audience as being far wider ranging and black-and-white than that.

Having seen the problems "average" s/w bodies have with basic concepts such as state machines ("they're something to do with parsing languages, aren't they?"), I believe most software people will have more problems with casting their problems into FPGAs than hardware people casting them into software.

That's a fair objective, but you have (IMNSHO) over-egged your argument.

Reply to
Tom Gardner

The Cortex M4F on the TI Stellaris Launchpad board has IEEE floating point, but unfortunately I think it's only single precision or else I would have suggested it for this app.

That Stellaris board is $12.95 retail but I got two of them on pre-release promotion for $5 each.

Reply to
Paul Rubin

I don't know, I'm not the Verilog guy here. I know that Verilog looks like C in the sense that both have curly braces. But I'd like to know how to implement an algorithm like that in Verilog, and how big an FPGA it would take to run it, and what you think development time would be. I've implemented that algorithm a couple of times as math homework and it took a few hours each time.

Reply to
Paul Rubin

That is correct - processors with double precision floating point hardware are usually quite a bit more expensive than those with just single precision (partly because a DP unit is significantly bigger and more power-hungry than an SP unit, and partly from the economies of scale - SP is usually enough).

To run the algorithm on a $3 chip means changing to fixed point rather than DP FP - or possibly dividing the algorithm into parts that need DP and parts where SP is sufficient, and doing the DP in software (which is obviously slower - but perhaps still fast enough).

Reply to
David Brown

Which is exactly what I didn't want to screw with for just a few units...

--
Tim Wescott 
Wescott Design Services 
Reply to
Tim Wescott

OK. This is too much. An FPGA expert is implying that if it can be implemented in Verilog then it'll just synthesize cleanly and efficiently.

All the _other_ FPGA experts that I know, and my own small experience with FPGAs, indicate that if you can express it in HDL then you're about as far along as if you can express it in any software HLL. It is absolutely no guarantee that you can synthesize it, that it will operate trouble-free on an FPGA, that it'll meet timing, etc.

--
Tim Wescott 
Wescott Design Services 
Reply to
Tim Wescott

Verilog uses begin/end, not curly braces :P

-Lasse

Reply to
langwadt
