DDS - ElectronDepot

- maxascent

Mon, Mar 2, 2015 3:03 PM

What is the best way to implement a multi-channel DDS? I need a DDS that has 8 time-multiplexed channels. I am using a Spartan 6.

Thanks


- langwadt

Tue, Mar 3, 2015 12:31 AM

On Monday, March 2, 2015 at 16:03:14 UTC+1, maxascent wrote:

what's wrong with this?

formatting link

-Lasse

- rickman

Tue, Mar 3, 2015 2:47 AM

A DDS circuit is not as simple as some would think and it is not as complex as others would lead you to believe. What you need to be aware of is that they can produce spurs if not carefully designed.

That said, I'm not sure what a "multichannel" DDS is. If you mean 8 DDS circuits, then ok, that is clear enough. Are you looking for some way to share the circuitry? The circuitry is not overly complex - it usually consists of a counter or adder to set the phase and a sine look up table to convert the phase to a sine value for the output.

If you don't wish to fully duplicate this circuit and your speed requirements are such that you can multiplex the logic, you only need to duplicate the phase step size register and the phase accumulator and add some circuitry to multiplex them through the adder and lookup table. This is easy to do in an FPGA by using LUTs as an 8-register bank.

If you have a phase offset register, that needs to be replicated and muxed as well.
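To make the sharing idea concrete, here is a rough behavioural model in Python, not HDL. It assumes a 32-bit accumulator, a 1024-entry sine table and 16-bit output; none of those widths come from the thread, and the class and method names are made up for illustration. Each clock services one channel: its step and accumulator are read from the register bank, passed through the single shared adder and table, and the accumulator is written back.

```python
import math

NUM_CHANNELS = 8        # from the original question
ACC_BITS = 32           # phase accumulator width (assumed)
LUT_ADDR_BITS = 10      # sine table address width (assumed)
AMPLITUDE = 2**15 - 1   # 16-bit signed output (assumed)

# One sine table shared by all channels.
SINE_LUT = [round(AMPLITUDE * math.sin(2 * math.pi * i / 2**LUT_ADDR_BITS))
            for i in range(2**LUT_ADDR_BITS)]


class MultiChannelDDS:
    """Behavioural model: one adder and one table, time-shared by all channels."""

    def __init__(self, phase_steps, phase_offsets=None):
        # These three lists stand in for the per-channel register banks.
        self.phase_steps = list(phase_steps)
        self.phase_offsets = list(phase_offsets or [0] * len(self.phase_steps))
        self.accumulators = [0] * len(self.phase_steps)
        self.slot = 0  # channel serviced on the next clock

    def clock(self):
        """Service one channel and return (channel, sample)."""
        ch = self.slot
        mask = (1 << ACC_BITS) - 1
        # Offset is added to the lookup phase only, not to the stored accumulator.
        phase = (self.accumulators[ch] + self.phase_offsets[ch]) & mask
        sample = SINE_LUT[phase >> (ACC_BITS - LUT_ADDR_BITS)]
        # Shared adder: advance this channel's accumulator and write it back.
        self.accumulators[ch] = (self.accumulators[ch] + self.phase_steps[ch]) & mask
        self.slot = (self.slot + 1) % len(self.phase_steps)
        return ch, sample
```

With 8 channels, each channel gets a new sample every 8 clocks, so the per-channel sample rate is the clock rate divided by 8.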

If you need low spurs and high resolution to your DAC, you can approximate the sine value using one of the trig identities

sin(A+B) = sin A cos B + cos A sin B or cos(A+B) = cos A cos B - sin A sin B

Either one will do once we make the following approximations...

The split is based on A being the MSBs and B being the LSBs, so A is the coarse angle and B is a small angle. The main approximation is that cos B will be very close to 1, so you replace it with 1. Then the first term in each equation will just be one trig lookup for the coarse value of sin A or cos A.

The second term can be looked up using the same first table for A and a second table for B which has the fine values, then multiply to get the product. Or you can make another approximation. Since the value of sin B is very small, the second term is going to be very small. This means the error from truncating the A and B inputs to the second term will also be small. So instead of a full size trig table for B you can use a single table with a truncated A and truncated B input with the second term as the output saving the multiplier. This will give you larger spurs than using the full look up tables and performing the multiply, but will save some hardware. In either case the spurs will be *much* smaller than if you simply use the truncated term sin A or cos A.
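A quick numerical check of how much the correction term buys, as a sketch only. It assumes a 16-bit phase split 8/8 into A and B, and it uses full floating-point precision in place of quantized tables, so it shows the approximation error alone, not real table or DAC quantization. The function names are illustrative.

```python
import math

COARSE_BITS = 8   # MSBs of the phase word, angle A (assumed split)
FINE_BITS = 8     # LSBs of the phase word, angle B (assumed split)
PHASE_BITS = COARSE_BITS + FINE_BITS


def split(phase):
    """Return (A, B) in radians for an integer phase word."""
    a = (phase >> FINE_BITS) * 2 * math.pi / 2**COARSE_BITS
    b = (phase & (2**FINE_BITS - 1)) * 2 * math.pi / 2**PHASE_BITS
    return a, b


def sine_coarse(phase):
    """Truncated lookup: sin(A) only, ignoring the LSBs."""
    a, _ = split(phase)
    return math.sin(a)


def sine_corrected(phase):
    """sin(A+B) ~= sin A + cos A sin B, with cos B replaced by 1."""
    a, b = split(phase)
    return math.sin(a) + math.cos(a) * math.sin(b)


def max_error(fn):
    """Worst-case absolute error over every phase word."""
    return max(abs(fn(p) - math.sin(2 * math.pi * p / 2**PHASE_BITS))
               for p in range(2**PHASE_BITS))
```

Under these assumptions the truncated lookup is off by up to about 0.024 of full scale, while the corrected version is off by up to about 0.0003. The remaining error is the sin A (cos B - 1) term that was dropped when cos B was set to 1.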

--

Rick

- mnentwig

Thu, Mar 5, 2015 9:43 PM

One possibility is to implement the waveform as a polynomial / spline.

The Horner scheme on Spartan 6 works nicely with four cycles pipeline delay. In other words, you can run four independent channels using the same multiplier and one port of a dual-port RAM for coefficients. The second RAM port can serve a second multiplier => 8 channels.
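For readers who have not met the Horner scheme: it evaluates a polynomial with one multiply and one add per coefficient, which is why it maps onto a single multiplier. A minimal floating-point sketch follows. The Taylor coefficients are a stand-in chosen for illustration; they are not the fixed-point coefficients produced by the Matlab script mentioned in this post.

```python
import math


def horner(coeffs, x):
    """Evaluate a polynomial at x; coeffs are ordered highest power first."""
    acc = 0.0
    for c in coeffs:
        # One multiply-add per coefficient: the only arithmetic needed.
        acc = acc * x + c
    return acc


# x - x^3/6 + x^5/120: a 5th-order Taylor approximation of sin(x).
# Illustrative only; a fitted polynomial would do better over a full period.
SIN_TAYLOR = [1 / 120, 0.0, -1 / 6, 0.0, 1.0, 0.0]
```

For example, `horner(SIN_TAYLOR, 0.5)` agrees with `math.sin(0.5)` to roughly 2e-6.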

A "mainstream" DDS would be my first choice - don't fix it if it ain't broken. The above might work, depending on your application's needs, and would be fairly compact.

You can find example Verilog code for a four-variable pipelined polynomial interpolator here, at the bottom of the post ("Pipelined RTL implementation")

formatting link

There is a matlab script included to calculate the fixed point coefficients, e.g. edit the "chirp example"

y = cos(2*pi*x.^2*5); to a plain sine wave.


- rickman

Thu, Mar 5, 2015 10:10 PM

The problem with the mainstream DDS is that for any app where spurs close to the carrier are a problem, it *is* broken. That's why I suggested the calculations to extend the precision of the LUT method. Of course they are only needed if the phase noise is a problem.

I don't follow your notation. What does the period after the 'x' mean?

What sort of phase noise does your polynomial generate? Are there spurs close to the carrier? Many apps need spurs to be -120 dB or so from the carrier. For some apps the spurs need to be either that low to start with or are far enough in frequency from the carrier so they can be filtered to that level.

--

Rick

- mnentwig

Fri, Mar 6, 2015 9:40 AM

Hi Rick,

I don't have any hard data on the signal quality as I used this for audio-frequency (modeling a Vox Continental electronic organ with something like 96 independent oscillators). Spurs "should" be an implementation issue, but then most things are...

Pocket calculators use polynomials for function approximation, so the method itself doesn't worry me. Increasing polynomial order is usually quite effective, compared to increasing lookup table resolution.

It might help to use a wider multiplier and more (e.g. 8-stage) pipelining. With 18-bit arithmetic in my example implementation, the total SNR over the whole bandwidth can't exceed 18*6+1.7 ~ 110 dB and the implementation is probably 10 dB worse than that (e.g. 1 LSB error would be 6 dB loss).
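The "6 dB per bit plus 1.7" figure is the usual rule of thumb for an ideal quantizer driven by a full-scale sine. As a small sketch (the function name is made up for illustration), it can be computed from the ratio of sine power to uniform quantization noise power:

```python
import math


def ideal_quantization_snr_db(bits):
    """SNR of an ideal quantizer with a full-scale sine input, in dB.

    Equivalent to the familiar 6.02 * bits + 1.76 rule of thumb.
    """
    return 20 * math.log10(2**bits) + 10 * math.log10(1.5)
```

For 18 bits this gives about 110 dB. It is an upper bound for ideal rounding only and says nothing about where the error energy lands in the spectrum, which is the spur question.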

What makes the method attractive is that multi-channel operation can exploit the pipelining, which is needed anyway to manage the critical path in the Horner scheme calculation.

The period after the 'x' comes from the Matlab script that calculates the polynomial coefficients (link from the blog article). `x.^2` is Matlab notation for "square every vector entry individually". This example creates a chirp waveform with linearly increasing frequency. To create any other waveform, e.g. a plain sinewave, put it into the script at that line.
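For anyone who does not read Matlab, the same line can be written out element by element. This is a plain-Python rendering of `y = cos(2*pi*x.^2*5)`. The range of `x` used by the script is not stated in the thread, so the caller supplies it, and the plain-sine replacement shown is just one plausible choice.

```python
import math


def chirp(x_values):
    """Element-wise y = cos(2*pi * x^2 * 5), one output per input sample."""
    return [math.cos(2 * math.pi * x**2 * 5) for x in x_values]


def plain_sine(x_values):
    """A possible replacement waveform: one sine cycle per unit of x (assumed)."""
    return [math.sin(2 * math.pi * x) for x in x_values]
```

The phase of the chirp is 5x² cycles, so its instantaneous frequency is 10x cycles per unit of x, which is the "linearly increasing frequency" mentioned above.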

Spur performance, I didn't analyze this. Intuitively, I don't see anything that couldn't be "cleaned up" - if nothing else helps, dither the phase accumulator before it goes into the polynomial, lose some wideband SNR but get rid of discrete spurs.

In other words: I have used this for heavily multi-channel tone generation, but not to communications-quality requirements. I don't see any hard obstacles, but the proof is left to the reader.


- rickman

Fri, Mar 6, 2015 7:44 PM

Of course, but the method used imposes costs for any given requirement for spurs and that is the issue. How complex does the logic need to be for a given quality of signal and, in particular, how do the specifics of that quality level affect a given application?

That depends on the number of taps in the polynomial, which I assume you equate to the number of stages in your pipeline. In that case the number of stages is the number of multipliers, i.e. the cost in terms of logic.

Utilizing pipelining is a separate issue really. Nearly any method can do that, even a table lookup.

I can't say I follow the notation.

Ok, thanks.

--

Rick

- mnentwig

Sun, Mar 8, 2015 9:28 AM

Hi,

> [...] for spurs and that is the issue. How complex does the logic need to be for a given quality of signal and in particular, how the specifics of that quality level affects a given application.

well, the answer could use some better requirements and a couple of days' working time :-)

> [...] equate to the number of stages in your pipeline. In that case the number of stages is the number of multipliers, e.g. the cost in terms of logic.

Not necessarily. My example implementation (previous link) maps four channels and all polynomial coefficients to a single multiplier. As an audio-frequency example, clock it at 100 MHz for a 96 kHz sample rate => roughly 1000 cycles per sample. Use 10 cycles per waveform (e.g. 5th order polynomial plus some overhead) and I can generate 100 independent waveforms using a single multiplier.
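The cycle budget above is simple integer arithmetic. A small sketch (function and parameter names are illustrative, not from the example implementation):

```python
def waveforms_per_multiplier(clock_hz, sample_rate_hz, cycles_per_waveform):
    """Return (cycles available per sample, waveforms one multiplier can serve)."""
    cycles_per_sample = clock_hz // sample_rate_hz
    return cycles_per_sample, cycles_per_sample // cycles_per_waveform


# 100 MHz clock, 96 kHz sample rate, 10 cycles per waveform.
budget = waveforms_per_multiplier(100_000_000, 96_000, 10)
```

With the exact numbers the budget is 1041 cycles per sample and 104 waveforms, consistent with the rounded figures of 1000 and 100.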

This can be very compact even with fully independent coefficients, because the overhead is fairly cheap, when address selection in a block ram muxes most of the wide signals.

Using one multiplier per polynomial term would also be possible for a higher output rate.

Never mind that. It's just a line from the Matlab example (link) where I can put the function, for which I want the fixed-point polynomial coefficients.

It's fairly straightforward in the context of the Matlab script.

> [...] do that, even a table lookup.

Well, yes, true. But for the polynomial, the critical path is comparatively long (i.e. four multiplications in series, each using four cycles delay) so the pipelining makes a big difference.

Cheers

Markus


- rickman

Mon, Mar 9, 2015 11:35 PM

Yes, here your cost is time rather than logic. That is the tradeoff. If you were doing faster calculations like radar, for example, you would not have the option of multiplexing the hardware. You would need to burn more logic.

The tradeoff remains. You have to do more work to get a higher precision result with a polynomial, not just in terms of the order of the polynomial, but increased resolution in the multiplies as well.

Why not just explain your equation in terms I can understand? Are you trying to obfuscate it?

You just gave an example where you had time to perform 1000 multiplies for each result and so used no pipelining. Still, the point remains that pipelining has nothing to do with resolution really.

--

Rick

- mnentwig

Sat, Mar 14, 2015 9:39 PM

Hi Rick,

when the critical path of an operation is x cycles long, you have a choice that includes the options of

a) utilizing your hardware 1 cycle out of x cycles, effectively wasting (x-1) cycles

and b) pipelining x independent operations and utilizing the hardware x cycles out of x.

For higher-order polynomial interpolation, x is relatively high (say, 20 cycles), that's where pipelining comes in in the context of this thread (multi-channel DDS).
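Option b) can be illustrated with a toy cycle-level model, sketched here under simplifying assumptions: a single multiply-add unit with a fixed 4-cycle latency, exactly one channel per pipeline stage, and floating-point arithmetic. The names are made up and this is not the Verilog from the blog post. Each channel's next Horner step depends on its previous result, so a lone channel can only issue every 4th cycle. Four interleaved channels fill the gaps.

```python
from collections import deque

LATENCY = 4  # pipeline depth of the shared multiply-add unit (assumed)


def interleaved_horner(coeffs, xs):
    """Evaluate one polynomial at LATENCY points through a shared pipelined unit.

    Returns (results, busy_cycles, total_cycles); coeffs are highest power first.
    """
    if len(xs) != LATENCY or not coeffs:
        raise ValueError("this sketch assumes one channel per pipeline stage")
    in_flight = deque([None] * LATENCY)   # operations inside the pipeline
    acc = [0.0] * LATENCY                 # per-channel running Horner value
    issued = [0] * LATENCY                # coefficients consumed per channel
    results = [None] * LATENCY
    cycle = busy_cycles = 0
    while any(r is None for r in results):
        # The operation issued LATENCY cycles ago leaves the pipeline now.
        finished = in_flight.popleft()
        if finished is not None:
            ch_done, value = finished
            acc[ch_done] = value
            if issued[ch_done] == len(coeffs):
                results[ch_done] = value
        # Round-robin issue: this slot belongs to channel (cycle mod LATENCY).
        ch = cycle % LATENCY
        if issued[ch] < len(coeffs):
            in_flight.append((ch, acc[ch] * xs[ch] + coeffs[issued[ch]]))
            issued[ch] += 1
            busy_cycles += 1
        else:
            in_flight.append(None)        # bubble while the pipeline drains
        cycle += 1
    return results, busy_cycles, cycle
```

For a 3-coefficient polynomial the unit is busy on 12 consecutive cycles and all four results are out after 16 cycles. A single channel needs nearly as long (about 13 cycles here) for one result, with the unit idle 3 cycles out of every 4.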

With regard to the equation, if you can't disentangle it, please start a new thread. It is an example describing an arbitrary chirp function, which is off-topic for this discussion, other than pointing out the line where to put your desired function into my matlab script.


- rickman

Fri, Mar 20, 2015 5:01 PM

Where did these cycles come from? Logic takes some amount of time to process. I can make my clock cycles match my logic if I choose. I'm not sure where you are going with this. I believe we all understand pipelining.

But that depends on many things such as the relative timing of your clock and your logic. You seem to be supposing that each calculation in your algorithm requires a register, a clock cycle and a pipeline stage. The logic *can* be linear without registers. It depends on the application.

I think I asked you to explain your script rather than my learning Matlab. Any chance of using a more conventional notation?

--

Rick