All-real FFT for FPGA

I would be very leery of an optimizer that felt free to optimize things so that they are no longer bit-exact -- and for some combinations of bits, I'm pretty sure that -(a * b) is not necessarily (a * -b).

--
Tim Wescott 
Control systems, embedded software and circuit design 
I'm looking for work!  See my website if you're interested 
http://www.wescottdesign.com
Reply to
Tim Wescott

Of course, synthesizers need to be bit-exact and conform to the HDL language spec

That is not the example I gave, but in either example you still would not need two multipliers, just one multiplier and a small amount of logic; any reasonable synthesizer would not spend two multipliers' worth of gates.

Steve

Reply to
Steve Pope

Another way to understand it is that the DFT is a linear-algebra multiplication of the length-N input vector with the NxN transform matrix. Before electronics were prevalent astronomers would hand-calculate DFTs (for orbit calculations, I think), and after doing this a lot Gauss observed that many computations were repeated in a pattern which allowed the transform matrix to be factored to save redundant calculations. This is why Gauss is often credited with "inventing" the FFT, and others much later rediscovered the same thing.
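This linear-algebra view is easy to demonstrate: build the N x N transform matrix explicitly and check that the matrix-vector product matches the factored (FFT) computation. A minimal sketch in Python with numpy (the function names are mine):

```python
import numpy as np

def dft_matrix(N):
    # N x N DFT matrix with entries exp(-2j*pi*n*k/N):
    # the O(N^2) "linear algebra" view of the transform.
    n = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(n, n) / N)

N = 8
x = np.random.default_rng(0).standard_normal(N)
X_matrix = dft_matrix(N) @ x        # direct matrix-vector product
X_fft = np.fft.fft(x)               # factored computation (the FFT)
assert np.allclose(X_matrix, X_fft) # same transform, far fewer operations
```

The FFT's factorization of this matrix is exactly the removal of the repeated computations Gauss noticed.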

There are multiple ways to do non-power-of-two-sized FFTs, some are more efficient than others. Alternative radix or even multiple radix transforms are sometimes used.
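As a sketch of the mixed-radix idea, here is one Cooley-Tukey step for a composite length N = N1*N2, checked against a direct DFT (pure Python; the helper names and decomposition choice are mine, not anything specific from the thread):

```python
import cmath

def dft(x):
    # Direct O(N^2) DFT, used as the reference
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def fft_composite(x, N1, N2):
    # One Cooley-Tukey step for N = N1*N2: N2 inner DFTs of length N1,
    # twiddle factors, then N1 outer DFTs of length N2.
    N = N1 * N2
    inner = [dft([x[N2 * n1 + n2] for n1 in range(N1)]) for n2 in range(N2)]
    X = [0j] * N
    for k1 in range(N1):
        col = [inner[n2][k1] * cmath.exp(-2j * cmath.pi * n2 * k1 / N)
               for n2 in range(N2)]
        outer = dft(col)
        for k2 in range(N2):
            X[k1 + N1 * k2] = outer[k2]
    return X

x = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft_composite(x, 2, 3), dft(x)))
```

Applying the same split recursively to each factor gives the usual mixed-radix FFT; nothing requires the factors to be 2.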

--
This email has been checked for viruses by Avast antivirus software. 
https://www.avast.com/antivirus
Reply to
eric.jacobsen

The results of synthesizer optimization on something like an FFT will depend a LOT on the architecture used. A highly parallel architecture, or one where the FFT stages are independently pipelined, would likely benefit more from optimization when the imaginary inputs are static zeroes than a serial architecture would.

--
Reply to
eric.jacobsen

I agree. Sequential designs, especially those that resemble a microcoded architecture, will not optimize well.

An FFT built around a single multiplier-accumulator would not optimize; OTOH, it isn't very high gate-count to begin with.

Steve

Reply to
Steve Pope

How is that not the example you gave? If the tool is going to use a single multiplier for calculating (a * b) and (a * -b), that implies it calculates (a * b) and then negates that to get (a * -b) as -(a * b), no?

I don't know that -(a * b) wouldn't be exactly the same result as (a * -b).

--

Rick C
Reply to
rickman

Am 14.02.17 um 20:21 schrieb rickman:

It is bit-exact the same. In IEEE float, the sign is signalled by a single bit; it doesn't use two's complement or similar. Therefore, -x simply inverts the sign bit, and it doesn't matter whether you do this before or after the multiplication. The only possible corner cases are special values like NaN and Inf; I believe that even then the result is bit-exact the same, but I'm too lazy to check the standard.
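For finite values this claim can be checked directly by comparing raw bit patterns; a minimal sketch in Python (whose floats are IEEE-754 doubles), leaving the NaN/Inf corner cases aside as Christian does:

```python
import struct

def bits(x):
    # Raw IEEE-754 double bit pattern, for bit-exact comparison
    return struct.pack('<d', x)

# The sign is a separate bit and rounding acts only on the magnitude,
# so negating before or after the multiply gives the same bit pattern.
# Test values (my own picks) include a subnormal and a signed zero.
for a, b in [(1.5, -2.25), (1e-310, 3.0), (0.1, 0.3), (-7.0, 0.0)]:
    assert bits(-(a * b)) == bits(a * -b)
```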

Christian

Reply to
Christian Gollwitzer

While the use of floats in FPGAs is getting more and more common, I suspect that there are still loads of FFTs getting done in fixed-point.

I would need to do some very tedious searching to know for sure, but I suspect that if there's truncation, potential rollover, or the two's complement value 'b1000...0 involved, then my condition may be violated.
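That suspicion is easy to illustrate with a toy fixed-point model; a hedged sketch (the shift amount, 8-bit width, and saturation policy are my own assumptions, not anything from the thread):

```python
def sat_neg(x, bits=8):
    # Saturating two's-complement negate: -(-128) overflows an 8-bit
    # result, so clamp to the representable range [-128, +127]
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return min(max(-x, lo), hi)

# Case 1: truncation (arithmetic right shift) after the multiply.
# An arithmetic shift rounds toward -inf, which does not commute
# with negation.
a, b, shift = 1, 1, 1
lhs = -((a * b) >> shift)   # -(a*b) then scale:  -(1 >> 1) = 0
rhs = (a * -b) >> shift     # (a * -b) scaled:    (-1) >> 1 = -1
assert lhs != rhs

# Case 2: the most negative 8-bit value ('b1000_0000 = -128).
# Negating b first saturates; negating the wide product does not.
a, b = 3, -128
assert -(a * b) == 384
assert a * sat_neg(b) == 381   # 3 * 127
```

With pure full-width wraparound (no truncation, rounding, or saturation) the two forms do agree, which is why the answer depends on the exact bit widths and overflow semantics in the HDL.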

And I don't know how much an optimizer is going to analyze what's going on inside of a DSP block -- if you instantiate one in your design and feed it all zeros, is the optimizer smart enough to take it out?

--
Tim Wescott 
Control systems, embedded software and circuit design 
I'm looking for work!  See my website if you're interested 
http://www.wescottdesign.com
Reply to
Tim Wescott

Okay so far

Not necessarily exactly this.

You just answered your own question.

Steve

Reply to
Steve Pope

If not this, then what? Why are you being so coy?

No, that's a separate question. In fact, as Chris pointed out, the IEEE floating point format is sign-magnitude. So it doesn't matter when you flip the sign bit; the rest of the number will be the same. For two's complement the only issue would be the max negative int, where there is no positive number with the same magnitude. But I can't construct a case where that would be a problem for this example. Still, if that is a problem for one combination, then there will be cases that fail for the other combination. Typically this is avoided by not using the max negative value, regardless of the optimizations.

--

Rick C
Reply to
rickman

There is such a thing as "hard" IP, meaning a block of logic that is already mapped, placed and routed. Such blocks are rare, and they are not subject to any optimizations. Otherwise yes, if you feed a constant into any block of logic, synthesis will not only replace that logic with constants on the output, it will also remove all the registers that may be used to improve the throughput of the constants. There are synthesis commands to avoid that, but otherwise the tool does what it sees as optimal.

The significant optimizations in an FFT come from recognizing all the redundant computations which the optimizer will *not* be able to see as they are dispersed across columns and rows or time. Logic optimizations are much more obvious and basic. Otherwise you could just feed the tool a DFT and let it work out the details. Heck, that might get you some significant savings. In a DFT done all at once, the tool might actually spot that you are multiplying a given input sample by the same coefficient multiple times. But that's still not an FFT which is much more subtle.
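The redundancy Rick describes is visible even at the coefficient level: the N-squared entries of the DFT matrix take only N distinct values, since the exponent of the twiddle factor only matters mod N. A small illustration (variable names are mine):

```python
import cmath

N = 16
# The DFT coefficient for input n and output k is W_N^(n*k), and the
# exponent only matters mod N: N*N matrix entries, at most N distinct values.
exponents = {(n * k) % N for n in range(N) for k in range(N)}
assert len(exponents) == N  # 16 distinct twiddles instead of 256 entries
twiddles = [cmath.exp(-2j * cmath.pi * e / N) for e in sorted(exponents)]
```

Spotting that a product is repeated is the easy part; the FFT's deeper trick of regrouping those products across stages is what no logic optimizer will find.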

--

Rick C
Reply to
rickman

Note: I did not say that in an HDL, -(a * b) is equal in a bit-exact sense to (a * -b).

We agree

Because one would have to look at the HDL language definition, the declared bit widths and declared data types and possibly other stuff.

But you still wouldn't need two multipliers to compute the two values in my example; just one multiplier plus some other logic (much less than the gate count of a second multiplier) to take care of the extremal cases to make the output exactly match.
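As a sketch of what that shared structure might look like (this is my reading of Steve's description, not his actual design): compute the full-precision product once, then derive a * (-b) from it with a negate plus one extremal-case mux, covering the corner where an n-bit negate of b wraps instead of negating:

```python
def wrap(x, bits):
    # Interpret x as a two's-complement value of the given width
    m = 1 << bits
    x &= m - 1
    return x - m if x >= 1 << (bits - 1) else x

def shared_products(a, b, bits=8):
    # One multiplier computes p = a*b; a * (-b) is derived from p by a
    # negate plus one mux, instead of a second multiplier.
    p = a * b
    neg_b = wrap(-b, bits)                     # what an n-bit negate yields
    q = p if b == -(1 << (bits - 1)) else -p   # corner: -(-128) wraps to -128
    assert q == a * neg_b
    return p, q

shared_products(3, -128)   # extremal case: both outputs equal a*b
shared_products(5, 7)      # ordinary case: outputs are p and -p
```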

Steve

Reply to
Steve Pope

It was a *long* way around the woods to get here. I seriously doubt any logic synthesis tool is going to replace two multipliers and an adder with a single multiplier and a bunch of other logic. Have you seen a tool that was that smart? If so, that is a pretty advanced tool.

BTW, what *are* the extremal[sic] cases?

--

Rick C
Reply to
rickman

I very strongly disagree ... minimizing purely combinatorial logic is the _sine_qua_non_ of a synthesis tool.

Lots of other things synthesizers try to do are more intricate and less predictable, but not this.

Steve

Reply to
Steve Pope

I hear you, but I don't think the tools will be able to "see" the simplifications as they are not strictly logic simplifications. Most of it is trig.

Take a look at just how the FFT works. The combinations of multiplications work out because of properties of the sine function, not because of Boolean logic.
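The point can be made concrete with a radix-2 decimation-in-time sketch: the saving comes from the identity W_N^(k + N/2) = -W_N^k, a symmetry of the complex exponential (i.e., trig), not from Boolean simplification. A minimal Python illustration, checked against a direct DFT (helper names are mine; the length must be a power of two):

```python
import cmath

def dft(x):
    # Direct O(N^2) DFT, used as the reference
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def fft2(x):
    # Radix-2 decimation-in-time; requires len(x) to be a power of two
    N = len(x)
    if N == 1:
        return list(x)
    E, O = fft2(x[0::2]), fft2(x[1::2])
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * O[k]
        X[k] = E[k] + t
        X[k + N // 2] = E[k] - t  # same product reused, sign flipped: the trig identity
    return X

x = [0.5, 1.0, -2.0, 3.0, 0.0, -1.5, 2.5, 4.0]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft2(x), dft(x)))
```

The butterfly reuses each twiddle product twice with only a sign change, which is precisely the kind of structural knowledge a gate-level optimizer has no way to recover.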

--

Rick C
Reply to
rickman
