#### Do you have a question? Post it now! No Registration Necessary

**posted on**

- Tim Wescott

February 12, 2017, 6:05 pm

So, there are algorithms out there to perform an FFT on real data, that

save (I think) roughly 2x the calculations of FFTs for complex data.

I did a quick search, but didn't find any that are made specifically for

FPGAs. Was my search too quick, or are there no IP sources to do this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to include

these algorithms in their FFT libraries.

save (I think) roughly 2x the calculations of FFTs for complex data.

I did a quick search, but didn't find any that are made specifically for

FPGAs. Was my search too quick, or are there no IP sources to do this?

It would seem like a slam-dunk for Xilinx and Intel/Altera to include

these algorithms in their FFT libraries.

--

Tim Wescott

Wescott Design Services

Tim Wescott

Wescott Design Services

We've slightly trimmed the long signature. Click to see the full one.

Re: All-real FFT for FPGA

On 2/12/2017 1:05 PM, Tim Wescott wrote:

I thought I replied to this, but my computer has been crashing a bit so

maybe I lost that one.

An FFT is inherently a complex function. Real data can be processed by

taking advantage of some of the symmetries in the FFT. I don't recall

the details as it has been over a decade since I worked with this, but

you can fold half of the real data into the imaginary portion, run a

size N/2 FFT and then unfold the results. I believe you have to run a

final N sized complex pass on these results to get your final answer. I

do recall it saved a

I thought I replied to this, but my computer has been crashing a bit so

maybe I lost that one.

An FFT is inherently a complex function. Real data can be processed by

taking advantage of some of the symmetries in the FFT. I don't recall

the details as it has been over a decade since I worked with this, but

you can fold half of the real data into the imaginary portion, run a

size N/2 FFT and then unfold the results. I believe you have to run a

final N sized complex pass on these results to get your final answer. I

do recall it saved a

***lot***when performing FFTs, nearly 50%.
--

Rick C

Rick C

Re: All-real FFT for FPGA

My understanding is that there were some software packages that baked

that into the algorithm, for what savings I don't know. I was wondering

if it was done for FFTs as well.

--

Tim Wescott

Wescott Design Services

Tim Wescott

Wescott Design Services

We've slightly trimmed the long signature. Click to see the full one.

Re: All-real FFT for FPGA

On 2/13/2017 12:03 AM, Tim Wescott wrote:

I'm not sure what you mean. When you say "baking" it into the

algorithm, it would do pretty much what I described. That

algorithm. I haven't heard of any other optimizations. The savings is

in time/size. Essentially it does an N/2 size FFT with an extra pass,

so instead of N

I'm not sure what you mean. When you say "baking" it into the

algorithm, it would do pretty much what I described. That

***is***thealgorithm. I haven't heard of any other optimizations. The savings is

in time/size. Essentially it does an N/2 size FFT with an extra pass,

so instead of N

***log(N) step it takes (N+1)***log(N/2) steps.
--

Rick C

Rick C

Re: All-real FFT for FPGA

On 2/13/2017 1:48 AM, rickman wrote:

Opps, I wrote that wrong. The optimized result would be order N/2 *

(log(N)+1). Just to be clear (or less clear depending on how you read

this), there are actually N/2 butterflies in each pass of the FFT. I

didn't show that since the constant 1/2 applies to both the standard FFT

and the optimized FFT. The point is the number of butterflies is cut in

half on each pass of the FFT for the optimized approach.

Opps, I wrote that wrong. The optimized result would be order N/2 *

(log(N)+1). Just to be clear (or less clear depending on how you read

this), there are actually N/2 butterflies in each pass of the FFT. I

didn't show that since the constant 1/2 applies to both the standard FFT

and the optimized FFT. The point is the number of butterflies is cut in

half on each pass of the FFT for the optimized approach.

--

Rick C

Rick C

Re: All-real FFT for FPGA

There's a distinct symmetry to the Fourier transform that you could use

at each step of the way instead of doing the whole thing and fixing it up

at the end.

I don't know if it would save steps, but it would certainly be easier on

someone who just wants to apply an algorithm.

--

Tim Wescott

Wescott Design Services

Tim Wescott

Wescott Design Services

We've slightly trimmed the long signature. Click to see the full one.

Re: All-real FFT for FPGA

On 2/13/2017 12:56 PM, Tim Wescott wrote:

Are you referring to the nature of the FFT on real signals (i.e. data

sets that have zeros for the imaginary data)? That would only help with

the first pass of butterflies. Once your perform those the imaginary

part is not zero anymore.

That's why you get very little optimization by plugging in the zero data

and letting the tools work it out.

On the other hand, the vendor's FFT logic generator should support real

data inputs. It is a common enough need.

Are you referring to the nature of the FFT on real signals (i.e. data

sets that have zeros for the imaginary data)? That would only help with

the first pass of butterflies. Once your perform those the imaginary

part is not zero anymore.

That's why you get very little optimization by plugging in the zero data

and letting the tools work it out.

On the other hand, the vendor's FFT logic generator should support real

data inputs. It is a common enough need.

--

Rick C

Rick C

Re: All-real FFT for FPGA

On Mon, 13 Feb 2017 16:08:00 -0500, rickman wrote:

Yes, but the imaginary and real parts are still symmetrical around f = 0,

so you should neither have to compute nor use them.

Yes, unless the optimizers get

I agree, but when I looked I didn't see it.

The Xilinx logic generators also seem to only support 2^n vector sizes.

I don't know how convoluted things get, but I know that with a general-

purpose processor, a prime-value-sized FFT is nearly as fast as a 2^n

one. Don't ask me how -- it's just a box with magic inside for me at

this point.

Yes, but the imaginary and real parts are still symmetrical around f = 0,

so you should neither have to compute nor use them.

Yes, unless the optimizers get

___really___smart.I agree, but when I looked I didn't see it.

The Xilinx logic generators also seem to only support 2^n vector sizes.

I don't know how convoluted things get, but I know that with a general-

purpose processor, a prime-value-sized FFT is nearly as fast as a 2^n

one. Don't ask me how -- it's just a box with magic inside for me at

this point.

--

Tim Wescott

Control systems, embedded software and circuit design

Tim Wescott

Control systems, embedded software and circuit design

We've slightly trimmed the long signature. Click to see the full one.

Re: All-real FFT for FPGA

On 2/13/2017 9:39 PM, Tim Wescott wrote:

We are talking two different things. The HDL synthesis optimizers are

optimizing

logic for.

When I first learned FFTs, I did it by looking at the equations in terms

of the twiddle factors each input was multiplied by as it contributed to

the final value. It works out to be the same calculation as the DFT.

It is taking advantage of the cyclic nature of the twiddle factors to

avoid redundant calculations. The same basis provides the various

symmetries.

I can't say how prime sized data sets work out. I'm very surprised they

even exist. I can't picture how the butterflies would work.

We are talking two different things. The HDL synthesis optimizers are

optimizing

***logic***. They don't know diddly about what you are using thelogic for.

When I first learned FFTs, I did it by looking at the equations in terms

of the twiddle factors each input was multiplied by as it contributed to

the final value. It works out to be the same calculation as the DFT.

It is taking advantage of the cyclic nature of the twiddle factors to

avoid redundant calculations. The same basis provides the various

symmetries.

I can't say how prime sized data sets work out. I'm very surprised they

even exist. I can't picture how the butterflies would work.

--

Rick C

Rick C

Re: All-real FFT for FPGA

On Mon, 13 Feb 2017 22:36:41 -0500, rickman wrote:

Well, yes, but these computers are getting smarter all the time. When

you tell Synopsis to synthesize and it says "your fly is unzipped",

you'll know that the optimizers are too damned smart.

--

Tim Wescott

Wescott Design Services

Well, yes, but these computers are getting smarter all the time. When

you tell Synopsis to synthesize and it says "your fly is unzipped",

you'll know that the optimizers are too damned smart.

Tim Wescott

Wescott Design Services

We've slightly trimmed the long signature. Click to see the full one.

Re: All-real FFT for FPGA

If for example you compute (a

*** b) and also compute (a ***-b),

the synthesizer is smart enough to know there are not two full

multipliers needed.

I'm very unclear that it would not optimize past the first butterfly.

They can also optimize across pipeline stages and also modify

the number of bits of state and their logic values within a pipeline

register.

But, to know for sure, try running it and see what happens.

(And you want to use Synopsis or the equivalent, and probably not

Xilinx/Altera native synthesizers...)

If faced with this design I would certainly give it a shot and

see if the synthesis is good enough, rather than assuming it isn't

and embarking on a major bit of perhaps unneeded design work.

Steve

Re: All-real FFT for FPGA

On Tue, 14 Feb 2017 06:52:32 +0000, Steve Pope wrote:

I would be very leery of an optimizer that felt free to optimize things

so that they are no longer bit-exact -- and for some combinations of

bits, I'm pretty sure that -(a

I would be very leery of an optimizer that felt free to optimize things

so that they are no longer bit-exact -- and for some combinations of

bits, I'm pretty sure that -(a

*** b) is not necessarily (a ***-b).
--

Tim Wescott

Control systems, embedded software and circuit design

Tim Wescott

Control systems, embedded software and circuit design

We've slightly trimmed the long signature. Click to see the full one.

Re: All-real FFT for FPGA

Of course, synthesizers need to be bit-exact and conform to the

HDL language spec

That is not the example I gave, but in either example you still would

not need two multipliers, just one multiplier and some small amount of

logic; any reasonable synthesizer would not use two multipliers

worth of gates.

Steve

Re: All-real FFT for FPGA

On 2/14/2017 11:39 AM, Steve Pope wrote:

How is that not the example you gave? If the tool is going to use a

single multiplier for calculating (a

calculates (a

I don't know that -(a

-b).

How is that not the example you gave? If the tool is going to use a

single multiplier for calculating (a

*** b) and (a ***-b), that implies itcalculates (a

*** b) and then negates that to get (a ***-b) as -(a * b), no?I don't know that -(a

*** b) wouldn't be exactly the same result as (a ***-b).

--

Rick C

Rick C

Re: All-real FFT for FPGA

Am 14.02.17 um 20:21 schrieb rickman:

It is bitexact the same. In IEEE float, the sign is signalled by a

single bit, it doesn't use two's complement or similar. Therefore, -x

simply inverts the sign bit, and it doesnt matter if you do this before

or after the multiplication. The only possible corner cases re special

values like NaN and Inf; I believe that even then, the result is

bitexact the same, but I'm too lazy to check the standard.

Christian

It is bitexact the same. In IEEE float, the sign is signalled by a

single bit, it doesn't use two's complement or similar. Therefore, -x

simply inverts the sign bit, and it doesnt matter if you do this before

or after the multiplication. The only possible corner cases re special

values like NaN and Inf; I believe that even then, the result is

bitexact the same, but I'm too lazy to check the standard.

Christian

Re: All-real FFT for FPGA

On Tue, 14 Feb 2017 20:42:12 +0100, Christian Gollwitzer wrote:

While the use of floats in FPGAs is getting more and more common, I

suspect that there are still loads of FFTs getting done in fixed-point.

I would need to do some very tedious searching to know for sure, but I

suspect that if there's truncation, potential rollover, or 2's compliment

values of 'b1000...0 involved, then my condition may be violated.

And I don't know how much an optimizer is going to analyze what's going

on inside of a DSP block -- if you instantiate one in your design and

feed it all zeros, is the optimizer smart enough to take it out?

While the use of floats in FPGAs is getting more and more common, I

suspect that there are still loads of FFTs getting done in fixed-point.

I would need to do some very tedious searching to know for sure, but I

suspect that if there's truncation, potential rollover, or 2's compliment

values of 'b1000...0 involved, then my condition may be violated.

And I don't know how much an optimizer is going to analyze what's going

on inside of a DSP block -- if you instantiate one in your design and

feed it all zeros, is the optimizer smart enough to take it out?

--

Tim Wescott

Control systems, embedded software and circuit design

Tim Wescott

Control systems, embedded software and circuit design

We've slightly trimmed the long signature. Click to see the full one.

Re: All-real FFT for FPGA

On 2/14/2017 3:22 PM, Tim Wescott wrote:

There is such a thing as "hard" IP which means a block of logic that is

already mapped, placed and routed. But they are rare, but not subject

to any optimizations. Otherwise yes, if you feed a constant into any

block of logic synthesis will not only replace that logic with constants

on the output, it will replace all the registers that may be used to

improve the throughput of the constants. There are synthesis commands

to avoid that, but otherwise the tool does as it sees optimal.

The significant optimizations in an FFT come from recognizing all the

redundant computations which the optimizer will

they are dispersed across columns and rows or time. Logic optimizations

are much more obvious and basic. Otherwise you could just feed the tool

a DFT and let it work out the details. Heck, that might get you some

significant savings. In a DFT done all at once, the tool might actually

spot that you are multiplying a given input sample by the same

coefficient multiple times. But that's still not an FFT which is much

more subtle.

There is such a thing as "hard" IP which means a block of logic that is

already mapped, placed and routed. But they are rare, but not subject

to any optimizations. Otherwise yes, if you feed a constant into any

block of logic synthesis will not only replace that logic with constants

on the output, it will replace all the registers that may be used to

improve the throughput of the constants. There are synthesis commands

to avoid that, but otherwise the tool does as it sees optimal.

The significant optimizations in an FFT come from recognizing all the

redundant computations which the optimizer will

***not***be able to see asthey are dispersed across columns and rows or time. Logic optimizations

are much more obvious and basic. Otherwise you could just feed the tool

a DFT and let it work out the details. Heck, that might get you some

significant savings. In a DFT done all at once, the tool might actually

spot that you are multiplying a given input sample by the same

coefficient multiple times. But that's still not an FFT which is much

more subtle.

--

Rick C

Rick C

#### Site Timeline

- » Intel (Altera) announces Cyclone-10
- — Next thread in » Field-Programmable Gate Arrays

- » Go to church
- — Previous thread in » Field-Programmable Gate Arrays

- » No more gate-level simulation. for Cyclone V !!!
- — Newest thread in » Field-Programmable Gate Arrays

- » Re: MÅ‚ody technik-roczniki, archiwum
- — The site's Newest Thread. Posted in » Electronics (Polish)

- » Huawei E3372h-153 USB 4G stick working on Rpi4
- — The site's Last Updated Thread. Posted in » Raspberry Pi Group