CORDIC for digital downconversion

bgaughan 20 years ago

I started with Ray Andraka's papers, "A Survey of CORDIC algorithms...", and "High Performance DDC for FPGAs". From the second paper it seems that DDS approaches with phase resolution larger than 10 bits or so should be done with other methods, such as CORDIC.

Using the CORDIC should also give you the complex mix for free. I'm thinking that you would put your real input into x, set y to zero, and feed the phase accumulator value for your desired channel LO into z. Then you can get a "de-rotated" I and Q from the X and Y outputs. There's some quadrant mapping in there too.
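
A minimal floating-point sketch of the mixing idea in this paragraph, assuming a real sample on x, zero on y, and a phase in radians driving the rotation. The names `cordic_rotate`, `mix`, and `downconvert` are illustrative and do not come from any Xilinx block; hardware would replace the multiplications by 2^-i with shifts and would usually fold the gain correction into later scaling.

```python
import math

def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians (|angle| <= pi/2) using CORDIC micro-rotations."""
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotation direction drives z toward zero
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        z -= d * math.atan(2.0**-i)
    # each micro-rotation stretches the vector; divide out the accumulated gain
    gain = math.prod(math.sqrt(1 + 4.0**-i) for i in range(iterations))
    return x / gain, y / gain

def mix(sample, phase, iterations=16):
    """De-rotate a real sample by `phase` radians; returns (I, Q)."""
    phase = math.remainder(phase, 2 * math.pi)   # wrap to [-pi, pi]
    x, y = sample, 0.0
    # quadrant mapping: pre-rotate by 180 degrees when the phase is outside +/- pi/2
    if abs(phase) > math.pi / 2:
        x, y = -x, -y
        phase -= math.copysign(math.pi, phase)
    return cordic_rotate(x, y, -phase, iterations)

def downconvert(samples, f_lo, f_s):
    """Mix a list of real samples against an LO at f_lo, sampled at f_s."""
    step = 2 * math.pi * f_lo / f_s              # phase increment per sample
    return [mix(s, n * step) for n, s in enumerate(samples)]
```

With this sign convention I is `sample * cos(phase)` and Q is `-sample * sin(phase)`, which corresponds to multiplying by exp(-j*phase).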

I'm going to start hacking on one of the Xilinx System Generator CORDIC blocks (SINCOS?) to get what I need. In practice, how many iterations or PEs do you need to get a significant SFDR such as 96dB? With DDS functions there's a formula ceil(SFDR/6) for the phase width. Is there something similar for a CORDIC implementation?
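
As a quick illustration of the ceil(SFDR/6) rule of thumb quoted above, here is a small sketch that turns a target SFDR into a phase width and the resulting table depths. The helper name and the 6 dB-per-bit default are assumptions for illustration; it says nothing about how a particular core packs its tables into Block RAM.

```python
import math

def phase_bits_for_sfdr(sfdr_db, db_per_bit=6.0):
    """Phase width from the ceil(SFDR/6) rule of thumb."""
    return math.ceil(sfdr_db / db_per_bit)

bits = phase_bits_for_sfdr(96)
full_wave_depth = 2 ** bits              # one table entry per phase value
quarter_wave_depth = 2 ** (bits - 2)     # top two phase bits select the quadrant
print(bits, full_wave_depth, quarter_wave_depth)
```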

For the 96dB phase-dithered DDS, I'm seeing 15 Block RAMs required, which is expensive. I'm assuming the cores (Sysgen blocks) are using quarter-wave tables? The Taylor-series DDS cuts down on RAM but requires multipliers.

Thanks for any suggestions, Brady

John_H 20 years ago

I was playing with a CORDIC implementation approach but with lower (12-bit) phase resolution with a very tiny footprint (no memory). Is your demodulation from 50MS/s+ or 10MS/s- data? I'd be interested to see if I could extend the performance to your needs. Basically I'm pursuing bit-serial manipulation of I/Q data *and* phase to give lower error in the LSbits of the CORDIC stages. I was considering a 16X clock relative to the I/Q data but saw that I might double the performance without doubling the footprint.

The errors from the multiple add/subtract stages need to be included in the SFDR to figure the number of bits of precision needed and should be your overriding "tweak." Adding another 5 stages of fine phase resolution doesn't do you any good if the arithmetic errors are large. I don't know of - but haven't looked for - a general formula for the SFDR, but one of Ray's papers discusses the arithmetic errors.

Care to share your requirements?

If you use 1 BlockRAM for 9 bits of phase in a 1/8th wave sine/cos lookup (for a total of 12 phase bits) you can have a second 9 bits of lookup (because you have 9 bits available at the BlockRAM address) for the sine/cos of the phase remainder in a second BlockRAM. Standard sin(x+d) transform. The total number of multiplies should be 6 and the total memory: 2 BlockRAMs. If you can spare *those* resources, the approach is a smaller footprint than a 1-stage DDS demodulator and provides exceptional dynamic range.
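
A rough sketch of the two-table sin(x+d) scheme described above, assuming a 12-bit coarse phase and a 9-bit fine remainder. For brevity the coarse table here holds all 2^12 points directly; a real design would fold it into a 512-entry 1/8-wave table plus 3 octant bits. Counting four multiplies for the angle sum and two for the mix is one reading of the "6 multiplies" figure, not something the post spells out.

```python
import math

COARSE_BITS, FINE_BITS = 12, 9
TOTAL_BITS = COARSE_BITS + FINE_BITS

def _sincos_table(entries, turns_per_step):
    """(sin, cos) pairs at multiples of `turns_per_step` of a full circle."""
    return [(math.sin(2 * math.pi * k * turns_per_step),
             math.cos(2 * math.pi * k * turns_per_step)) for k in range(entries)]

coarse = _sincos_table(2**COARSE_BITS, 1.0 / 2**COARSE_BITS)   # x: coarse angle
fine = _sincos_table(2**FINE_BITS, 1.0 / 2**TOTAL_BITS)        # d: phase remainder

def sincos(phase):
    """sin/cos for a TOTAL_BITS-wide unsigned integer phase via the angle-sum identities."""
    sx, cx = coarse[phase >> FINE_BITS]
    sd, cd = fine[phase & (2**FINE_BITS - 1)]
    # sin(x+d) = sin x cos d + cos x sin d ; cos(x+d) = cos x cos d - sin x sin d
    return sx * cd + cx * sd, cx * cd - sx * sd    # four multiplies

def mix(sample, phase):
    """Mix a real sample with the generated LO: two more multiplies."""
    s, c = sincos(phase)
    return sample * c, -sample * s
```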

Ray Andraka 20 years ago

Brady,

The larger BRAMs in recent devices have pushed the phase resolution achievable using BRAMs up a little. 12-14 bits is possible.

The CORDIC rotator phase resolution improves by one bit per iteration. For 96 dB, you'll need iterations 0-16 (off the top of my head here). It is basically the same formula, ceil(SFDR/6), but using the combination of the phase error of the CORDIC and the phase resolution of your phase input.
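
A small numeric check of the one-bit-per-iteration statement, using the standard CORDIC convergence bound: for an input angle within the convergence range, the angle left un-rotated after iteration i is at most atan(2^-i). The function names are illustrative, and the sketch looks only at phase error, not at the arithmetic error in the x/y datapath.

```python
import math

def residual_angle(i):
    """Upper bound (radians) on the angle left un-rotated after iteration i."""
    return math.atan(2.0 ** -i)

def bits_gained(i):
    """How many bits the bound tightens going from iteration i to i + 1."""
    return math.log2(residual_angle(i) / residual_angle(i + 1))

for i in range(0, 17, 4):
    print(i, residual_angle(i), round(bits_gained(i), 3))
```

The gain is a little under one bit for the first few iterations and settles very close to one bit per iteration after that.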

The quadrant mapping can be done with either of the reduction rotations I mentioned in the CORDIC survey paper. I'm not sure whether the Xilinx CORDIC core does that (I have my own).

Ray Andraka 20 years ago

You do have to make sure you carry enough extra LSBs in the CORDIC rotator to keep the rounding or truncation errors from accumulating into your output lsb. It turns out it is fairly easy to round at each stage in the CORDIC only adding one extra bit to the bottom of each adder. This works because one input to each adder is shifted, while the other is not. You are only rounding off the lsbs of the shifted input, so adding a '1' below the lsb of the unshifted input reduces the bias. As long as you keep the accumulated I and Q path rounding errors below the lsb error at the output, those won't contribute significantly to the sfdr. The accumulated error is bounded by roughly log2(iterations) bits (I've got it in a spreadsheet somewhere)... that means adding log2(iterations) extra lsbs under the output lsb.
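
One way to read the rounding trick above, sketched with Python integers: keep one extra bit of the shifted operand, append a '1' below the LSB of the unshifted operand, add or subtract, then drop the extra bit. The function names are hypothetical, and `guard_bits` simply applies the log2(iterations) estimate from this post.

```python
import math

def add_shifted_rounded(x, y, shift, subtract=False):
    """Compute x +/- (y / 2**shift), rounding the shifted operand via one extra LSB."""
    if shift == 0:
        return x - y if subtract else x + y      # nothing is shifted out, nothing to round
    y_ext = y >> (shift - 1)                     # shifted operand with one extra bit kept
    x_ext = (x << 1) | 1                         # unshifted operand with a '1' below its LSB
    total = x_ext - y_ext if subtract else x_ext + y_ext
    return total >> 1                            # drop the extra bit again

def guard_bits(iterations):
    """Extra LSBs to carry below the output LSB, per the log2(iterations) estimate."""
    return math.ceil(math.log2(iterations))

# compare the average error of plain truncation with the rounded version
ys = range(-1024, 1024)
trunc_bias = sum((y >> 3) - y / 8 for y in ys) / len(ys)
round_bias = sum(add_shifted_rounded(0, y, 3) - y / 8 for y in ys) / len(ys)
print(trunc_bias, round_bias, guard_bits(17))
```

Plain truncation always errs toward minus infinity, so its bias adds up across stages; the rounded version leaves a much smaller average error per stage.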

For the look-up table approach, you can use a dual port block RAM with a sine table in it. You access one port with the sine address, and the other port with the cosine address, which is modified from the sine address. For a full cycle table, the address modification is just adding 1 to the top two bits. For a quarter cycle table it is a little more complicated. By using both ports to access different points in the same table you cut the real estate needed for your sine/cos tables in half.
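
A minimal model of the full-cycle case described above: one sine table read through two addresses, with the cosine address formed by adding 1 to the top two address bits (a quarter cycle). The 10-bit table size and the names are assumptions for illustration; the quarter-cycle-table variant is not shown.

```python
import math

ADDR_BITS = 10
SIZE = 1 << ADDR_BITS
SINE_TABLE = [math.sin(2 * math.pi * k / SIZE) for k in range(SIZE)]

def sin_cos_lookup(addr):
    """Read sine and cosine from the same table, as two ports of one RAM would."""
    sin_addr = addr & (SIZE - 1)
    # adding 1 to the top two address bits advances the phase by a quarter cycle,
    # and sin(theta + pi/2) == cos(theta)
    cos_addr = (sin_addr + (1 << (ADDR_BITS - 2))) & (SIZE - 1)
    return SINE_TABLE[sin_addr], SINE_TABLE[cos_addr]
```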

In Virtex-4, the CORDIC is at a significant performance disadvantage because the fabric carry chains are quite slow compared to the speed of the DSP48 slices and memory. The DSP48s are really not set up for doing CORDIC, and if you did use them that way, you'd give up all those multipliers.

Brady Gaughan 20 years ago

Thanks for the reply, John_H. My demod is from about 10MHz to 35MHz with a sample rate of 117MHz. I don't need much frequency resolution, say 1Hz or so, but I do require significant SFDR, 96dB. I started with the Xilinx System Generator CORDIC SINCOS block and modified it to take in a Z input and increased the number of iterations or PEs. It looks like I may be running into arithmetic errors as you suggested.
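
For the numbers in this post (117MHz sample rate, roughly 1Hz resolution), a short sketch of how wide the phase accumulator would need to be and what the tuning word for a 10MHz LO looks like. The helper names are illustrative. Typically only the upper bits of the accumulator would feed the CORDIC or lookup table, which is a separate choice from the accumulator width.

```python
import math

def accumulator_bits(f_s, resolution_hz):
    """Smallest accumulator width whose frequency step is at most resolution_hz."""
    return math.ceil(math.log2(f_s / resolution_hz))

def tuning_word(f_lo, f_s, bits):
    """Phase increment per sample, in accumulator LSBs, nearest to f_lo."""
    return round(f_lo / f_s * 2**bits)

F_S = 117e6
bits = accumulator_bits(F_S, 1.0)
step_hz = F_S / 2**bits                  # frequency resolution actually obtained
word = tuning_word(10e6, F_S, bits)
actual_hz = word * F_S / 2**bits         # LO frequency after quantizing the tuning word
print(bits, step_hz, word, actual_hz)
```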

I don't know if the Xilinx cores are using quarter-wave tables or not, but I do know that their data sheets claim the lower phase resolution blocks use full-wave tables and the larger phase DDSs do not. I don't know if it's half-wave or quarter-wave or what, but I would think the ROM address behavior should give me a clue.

Brady Gaughan 20 years ago

Thanks for the reply, Ray. I think the 6dB per bit makes sense intuitively. I will be experimenting with it. I believe I am running into rounding errors with the Xilinx reference design as is, and need to control the LSBs as you mentioned above.

I will also look at smarter look-up table approaches as well.

This application is for a channelizer and synthesizer that is moderately large (24x2 downconverted channels, 24 upconverted channels) and may be too large for a traditional DDC type approach. I am looking at FFT approaches as well such as Polyphase DFT and WOLA. However, I don't want to completely rule out a "smart" DDC design, especially with very large oversampling, 400+.

Ray Andraka 20 years ago

Brady, What kind of sample rates are you dealing with? 24 channels isn't much if the sample rates are reasonable. 24 Channels is fairly light for a polyphase channelizer, so the FFT can be physically quite small even for high data rates. The polyphase channelizer means the channels are equally spaced instead of being independently tuned, but you already know that.

For example, I did a 160 channel DDC for a customer about 2 years ago in a 2v6000 that had input sample rates at 2M and output sample rates at 50K. Each channel is independently tuned, and the output rate is the same for all 160 channels.

I just finished a 10 channel DDC design in a V4SX55 that samples at 500MHz and has variable downsampling from 62.5MHz on down for a beamforming receiver. Each channel is independently tuned (each FPGA services one antenna element). It's a 70% or so full 4VSX55, clocked at 250 MHz.

Brady Gaughan 20 years ago

Ray, the first channelizer/combiner will be GSM at 1X symbol rate, 270.833kHz, with 200kHz spacing. So channel spacing and sample rate are not a simple integer relation. One of the things that I have been looking at is a mixed-radix FFT, such as 3-3-5-13, or 585 points; 117MHz/585 gives the 200kHz channel spacing for a polyphase DFT. I have also been looking at resampling approaches such as Farrow interpolation and weighted-sinc interpolation.

Still, I don't want to rule out a DDC approach. The decimation is large, 432, so I can definitely take advantage of multiplexing and distributed arithmetic.
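
A quick exact-arithmetic check of the rates in this post, assuming the 117MHz sample rate mentioned earlier in the thread. It shows what a 585-point transform and a decimation of 432 each yield, and the non-integer ratio between the two.

```python
from fractions import Fraction

F_S = 117_000_000                        # sample rate in Hz (from earlier in the thread)
fft_len = 3 * 3 * 5 * 13                 # mixed-radix length
bin_spacing = Fraction(F_S, fft_len)     # FFT bin spacing in Hz
symbol_rate = Fraction(F_S, 432)         # output rate after decimating by 432
ratio = symbol_rate / bin_spacing        # output rate relative to channel spacing

print(fft_len, float(bin_spacing), float(symbol_rate), ratio)
```

Under these numbers the bins land on the 200kHz channel spacing, while the 432 decimation lands on the 270.833kHz symbol rate, so a polyphase DFT at this length would need an output rate of 65/48 times the bin spacing.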

Brady Gaughan 20 years ago

Ray, the channel bandwidth is also fairly large, 22MHz, so while there are only 24 channels, I potentially would need more than 110 FFT bins. It would not have to be as big as I put above though, since I could decimate the above by 3 or 5 or 9 as part of the complex downconvert.
