Parallel NCO (DDS) in Spartan3 for clock synthesis - highest possible speed?

Gurus,

I have built and tested a numerically-controlled oscillator (clock generator) using a simple phase accumulator (adder) and two registers. One register contains the tuning word (N), and the other is used in the feedback loop into the second input of the adder.

I take the MSB of the feedback register as my synthesised clock. I am generating sub-50 kHz clock frequencies by clocking the feedback register at 100 MHz. The accumulator is a 32-bit adder, as is the feedback register (of course). It works nicely on a board (my tuning word comes from a processor chip, and my spectrum analyzer tells the truth when I look at my MSB-generated clock).
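For readers who want to see where the jitter comes from, here is a minimal Python model of the accumulator described above. It is only a sketch: the function names are made up for illustration, and the tuning-word formula N = f_out * 2^32 / f_clk is the usual DDS relation rather than anything taken from PeterC's actual design.

```python
ACC_BITS = 32
F_CLK = 100_000_000  # accumulator clock in Hz, as quoted in the post


def tuning_word(f_out, f_clk=F_CLK, bits=ACC_BITS):
    """Tuning word N such that f_out is approximately N * f_clk / 2**bits."""
    return round(f_out * (1 << bits) / f_clk)


def nco_msb(n, cycles, bits=ACC_BITS):
    """Yield the accumulator MSB after each clock cycle."""
    mask = (1 << bits) - 1
    acc = 0
    for _ in range(cycles):
        acc = (acc + n) & mask      # phase accumulator wraps at 2**bits
        yield acc >> (bits - 1)     # MSB is the synthesised clock


def rising_edge_times(msb_stream):
    """Clock-cycle indices at which the MSB goes from 0 to 1."""
    prev, times = 0, []
    for t, bit in enumerate(msb_stream):
        if bit and not prev:
            times.append(t)
        prev = bit
    return times


if __name__ == "__main__":
    n = tuning_word(44_100)
    edges = rising_edge_times(nco_msb(n, 100_000))
    periods = sorted({b - a for a, b in zip(edges, edges[1:])})
    print("tuning word:", n, "periods seen (in 10 ns clocks):", periods)
```

The output period can only be a whole number of 10 ns clocks, so for a frequency that does not divide the clock evenly the period alternates between two adjacent values. That one-clock wobble is the jitter the parallel-accumulator idea is meant to reduce.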

To reduce the jitter I would like to run two or more phase accumulators in parallel, each clock-enabled on every other clock cycle (as per Ray Andraka's suggestion from the "how to speed up my accumulator" post by Moti in Dec 2004), and then switch between the MSBs of the accumulators using a MUX.

The problem then comes down to how fast I can switch the MUX - the faster the better.

  1. Is the Xilinx CoreGen 1-bit MUX a good option?

  2. For a 4-input, 1-output MUX I would need a 2-bit counter stepping the select word through the sequence 00, 01, 10, 11, 00, ... How fast could this be done?

  3. What about using a fast parallel-to-serial converter approach? (Feeding the outputs of each NCO into a shift register and then blasting out the bits really fast to a pin, effectively doing round-robin switching between the MSBs of the NCOs.)

I have designed (but not yet implemented) this scheme, and I would like some advice on how best to do this.

I look forward to everyone's replies!

Cheers, PeterC.

PeterC

Peter, you need to get the MSB out of the FPGA, right? Look at using the double data rate FFs in the IOBs. I think the FF is called FDDRCPE in the libraries guide. This will let you get data out of the FPGA at twice your clock rate, one bit on the rising edge, one on the falling edge. This is probably the fastest, and certainly the easiest and most reliable, way to mux data out of the part. Also, use a DCM for your clock to make sure you have a 50% duty factor on the clock. Check out XAPP265. That guy gets 840 Mbps out of an LVDS output. HTH and good luck, Syms.

Symon

"PeterC" wrote in message news: snipped-for-privacy@g47g2000cwa.googlegroups.com...

What about going analog? That is: build an R2R DAC on 4 outputs with 2 CAT16 resistor packs and then anti-alias filter it.

I use this approach to generate a high-spectral-purity 27 MHz ±1% clock with a 48-bit DDS running at 100 MHz. I generate a 5-bit sine value out of a 16-entry ROM and dither this to 4 bits at 200 MHz (with the help of the DDR IOB FFs).

I connect the output node of the R2R DAC to an LC parallel resonant circuit (the filter) at 27 MHz. Compared with a multiple-order low-pass, this filter has the additional advantage of removing more than just the aliasing frequencies (it also removes the quantization noise away from the 27 MHz filter center) _and_ of being much simpler and cheaper.

This is then fed to the "receiving gate", an LVDS transmitter in my case, which turns the analog sine wave into a good digital signal.

The spectral purity can get quite high.

And there is still room for additional improvement.

Raymund Hofmann

Raymund Hofmann

The DCM for 50% duty cycle correction is great. I'd add two things: 1) the phase accumulator can be staged so you have 4 8-bit adders instead of 1 32-bit adder, allowing higher accumulator speeds, and 2) don't implement the full phase accumulator for the multiple NCO copies; use one phase accumulator but add different phase values (N/4, N/2, 3N/4, N) for the different MSBs. This way your accumulators will never be mis-synchronized. If your frequency range is always tight (e.g., 25-50kHz) you can even reduce the resolution of the non-accumulating adders (N/4, N/2, 3N/4).

For real *fun* you can use bit-serial arithmetic to do a 32-bit NCO, then do a bit-serial divider to figure out what fraction of N the accumulator had when (and only when) it rolled over. While this isn't your typical 30-minute design session, it can be a great learning experience! I designed a bit-serial NCO a while back and know how to do nice pipelined dividers but haven't yet implemented those as bit-serial elements. Since your 50kHz or lower speed gives 2k cycles at 100MHz (or 8k cycles at 400MHz) you could use the technique to give you the maximum achievable DDR output rate the chip can support. Bit-serial is really amazing in this respect.

In any case, the speed of the MUX you choose shouldn't be the limiting factor in your design. With the DDR IO register and pipelining, the MUX functionality can be 1 LUT of logic between registers at the maximum chip speed.

John_H

At your sub-50 kHz rates, what frequency step can you tolerate? You can trade off average precision for purity.

DDS gives a numerical frequency whose average has many digits, but as you have found, it has a lot of phase jitter. The alternative is a simple divide by N: for 100 MHz down to 50 kHz, N = 2000, so your next frequency step (/2001) is just under 25 Hz away. For audio, that's probably tolerable?

(You can think of the DDS as dithering between these two values)

At 1KHz, steps are much smaller.
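The step sizes quoted above are easy to check numerically. The following is a small illustrative sketch (the function name is hypothetical) that computes the gap between two adjacent integer divisors for a given clock and target frequency.

```python
def divider_step_hz(f_clk, f_out):
    """Frequency gap between divide-by-N and divide-by-(N+1),
    where N is the integer divisor closest to f_clk / f_out."""
    n = round(f_clk / f_out)
    return f_clk / n - f_clk / (n + 1)


if __name__ == "__main__":
    for f_clk, f_out in [(100e6, 50e3), (100e6, 1e3), (200e6, 14e3)]:
        step = divider_step_hz(f_clk, f_out)
        print(f"{f_clk / 1e6:.0f} MHz -> {f_out / 1e3:.0f} kHz: step = {step:.4f} Hz")
```

The step is roughly f_out^2 / f_clk, which is why it shrinks so quickly at lower output frequencies and why a faster master clock helps.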

More complex is to use a DPLL and create Fo = M/N, where you scale both M and N. You will pick up the DPLL jitter as well, but that's usually much smaller than system clk times.

-jg

Jim Granville

Thank you for your detailed system description, Raymund - unfortunately cost is critical, and I simply don't have the option of using any external components - hence the desire to synthesize usable audio clocks completely in the FPGA, ideally from a cheap crystal (or the crystal already used by the processor chip, as I'm doing now).

PeterC.

PeterC

Symon,

Yes, I need the MSB out of the FPGA, to drive an audio DAC. Its value only really changes at 50kHz or so, but to reduce the jitter associated with this low-frequency transition, the clock that drives it out needs to be as fast as possible (obviously). 840 Mbps would give 1.2 ns of jitter, which would be more than good enough. The problem is that the same NCO must generate an (approx) 12 MHz and a 24 MHz signal - a few ns of jitter on these is unacceptable. I will look at the FDDRCPE in the IOBs - great hint and much appreciated.
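As a quick sanity check of these figures: the edge of an NCO output can only move in steps of one output-register update period, so the timing granularity is simply the reciprocal of the update rate. The helper below is an illustrative sketch (the name is made up), not part of anyone's design.

```python
def edge_granularity_ns(update_rate_hz):
    """Smallest step by which an output edge can move, in nanoseconds."""
    return 1e9 / update_rate_hz


if __name__ == "__main__":
    for rate in (100e6, 200e6, 400e6, 840e6):
        g = edge_granularity_ns(rate)
        # Express the granularity as a share of a 12 MHz clock period too.
        share = g * 1e-9 * 12e6 * 100
        print(f"{rate / 1e6:>4.0f} MHz: {g:5.2f} ns ({share:4.1f}% of a 12 MHz period)")
```

At 840 Mbps the granularity is about 1.19 ns, matching the 1.2 ns quoted above, and at the plain 100 MHz fabric clock it is 10 ns.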

I'm considering introducing 4 bits of dither, using four 30-bit LFSRs (linear feedback shift registers), which would give a nice and long (in terms of repeat cycles) pseudo-random 4-bit word sequence, to spread out my side-bands (I can live with the raised noise floor).

Cheers, Peter C.

PeterC

John -

I will certainly look at pipelining the accumulators. This should be simple, since they have plain ripple-carry chains; I will try 8-bit and then 4-bit granularity if needed.

On your point (2), I'm not sure I understand completely - this would require MUXing both inputs of a single adder - both the feedback and the input tuning words, adding an additional MUX delay? Yes, my tuning range spans about 10kHz around the 50 kHz point, and I would like to do this with single Hz resolution. If you can send a quick sketch to peter (at) geckoaudio (dot) com that would be great.

By "reducing the resolution of the non-accumulating adders" I take it to mean that since N/4 etc will be a relatively small number, it certainly would not need to sit in a 32-bit register?

The bit serial approach is interesting, but I think the internal fabric clock limit is around 300 MHz anyway, and an 8-bit or 4-bit pipelined adder would probably run at close to this anyway (I'm guessing here)?

On the topic of *fun* - how does knowing the ratio of the contents of the accumulator to the tuning word (N) after it turns over help? Excuse my ignorance, but I don't see how this is useful.

Cheers, PeterC.

PeterC

Jim,

I can tolerate a 1 Hz step (I need real-time tuning with at least this resolution, as well as a small number of "coarse" steps of about 5kHz). Apologies for not posting this initially to eliminate it as a candidate: I have thought about simple integer division, but my range and tuning require DDS. As much as I'd like to, I can't use a PLL due to cost!

Cheers, Peter.

PeterC

The DPLL I meant was the clock module inside the FPGA, not an external one.

A simple divider, from ~200MHz, gives better than 1Hz dF below 14KHz Fo. Could that be good enough? [It will have very low jitter]

-jg

Jim Granville

Peter, first of all you have to decide on frequency resolution and on acceptable jitter or phase noise. Resolution is easy with DDS: just make the accumulator long enough. Jitter is fundamentally determined by the clock frequency. I would try 200 MHz or (in Virtex-4) 400+ MHz. That gets you to a few ns. If you need better, you can struggle for a factor of 2, but anything below 2 ns is tough, and below 1 ns is impossible unless you use MGTs. There you can get 300 ps granularity from the 3 Gbps outputs, which sounds like 150 ps jitter. It takes some trickery and some duplication of resources, so it's not all that cheap. And Spartan and its friends do not have 3 Gbps transmitters... Find out first what you really need. Jitter is your enemy. And fighting it is never cheap.

Peter Alfke, Xilinx (from home)
Peter Alfke

Jim - the numbers you have chosen are of course correct, but I'm missing the point - 14,000.7000 Hz = 200 MHz / 14285. The next divisor is 14286, which gives 13999.7200 Hz, so yes 0.7 Hz control is possible for a 14 kHz output frequency. But sub 1 Hz adjustment is also possible Fo = 15 kHz for example.

I do need the same degree of control around 50 kHz (ideally even better than 1 Hz, down to as low as 0.1 Hz) so I don't think a simple integer division is acceptable.

As far as Peter's comments - I simply don't know exactly what the jitter spec and freq resolution should be - it all depends on other parts of the system which are being simultaneously designed. It comes down to a certain amount of experimentation to see how the audio DAC output spectrum will behave with jittery clocks.

PeterC

The accumulator feeds four values but only one of those values (Acc+N) feeds back to the accumulator. The other three values (Acc+N/4, Acc+N/2, Acc+3N/4) are simple adders where all you care about are the MSbits. Are you a Verilog guy? It would be much easier to send you a few lines of code rather than a sketch.
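In lieu of Verilog, the structure can be modelled in a few lines of Python. This is only a behavioural sketch under stated assumptions: the function names are invented, N is taken to be a multiple of 4 so the quarter steps are exact, and the three side adders are kept at full width (the post suggests they could be narrower). It checks that one accumulator plus three non-feedback adders produces the same MSB stream as a single accumulator clocked four times faster with step N/4.

```python
BITS = 32
MASK = (1 << BITS) - 1


def four_phase_msbs(n, cycles):
    """One accumulator stepping by n per master clock. Three extra adders
    give the MSB at the quarter points of each master-clock period."""
    acc, out = 0, []
    for _ in range(cycles):
        for k in (1, 2, 3, 4):
            # Acc+N/4, Acc+N/2, Acc+3N/4, Acc+N: only the MSB is used
            out.append(((acc + k * n // 4) & MASK) >> (BITS - 1))
        acc = (acc + n) & MASK  # only Acc+N feeds back
    return out


def fast_reference(n, cycles):
    """Reference model: a single accumulator at 4x the clock, step n/4."""
    acc, out = 0, []
    for _ in range(4 * cycles):
        acc = (acc + n // 4) & MASK
        out.append(acc >> (BITS - 1))
    return out


if __name__ == "__main__":
    n = 20_000_004  # arbitrary example tuning word, divisible by 4
    same = four_phase_msbs(n, 2000) == fast_reference(n, 2000)
    print("four-phase output matches 4x-rate accumulator:", same)
```

The four MSBs per master clock are what would then be serialised to the pin, for example through a MUX or the DDR output registers mentioned earlier in the thread.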

It's not about the size of the number, it's that you don't need 32 bits of precision to decide if the edge should go at the quarter-period point or at the half-period. If your jitter is already at about 1/4 cycle, your adders don't have to be much more precise than that to give you 4 MSBits for 4 clock phases with no (noticeably) additive jitter.

If you're running a phase accumulator at a master clock rate, at the update of that clock when the MSBit changes will have LSBits in the range of 0

John_H

Would it be possible to use the FPGA to interpolate the values going to the DAC to compensate for the clock jitter? Use the previous and current uncorrected DAC values and the phase error for the current clock pulse to estimate the corrected DAC value. This would ease your clock jitter requirements greatly.

Daniel Lang

Daniel Lang

1.2ns of jitter would be good for about 14 bits accuracy on a 20kHz signal, assuming a traditional multi-bit DAC. Oversampling, noise-shaping DACs ("1-bit" outputs) tend to be less tolerant of jitter as it affects the entire audio spectrum instead of mainly high frequency signals.

- Brian.

Brian Drummond

Hi, PeterC Here is a method that gives very fine resolution at low cost, but is not so easy to tune:

Use the DCM in frequency synthesis mode. Let's assume a 100 MHz input clock. By using various mixtures of Multiply and Divide, you can generate, for example, 16 different frequencies between 103.22 MHz and 106.6 MHz (starting with 32/31, then 31/30, then 30/29 etc). Yes, you can multiply 100 MHz by 32 if you simultaneously also divide it down to a reasonable value.

You can divide down to 50 kHz with a granularity of a few Hz (perhaps even below 1 Hz if you have some patience playing with the numbers). The output frequency is 100 MHz x (A/B) / C, where A and B are any integer up to 32, and C is any integer that you need. Jitter will be low, but there might be some wander. Best check for spectral purity.

The problem is that this approach is not straightforward. It needs pre-computation. But if you need just a few hundred frequencies, you can store the constants in a BlockRAM. And you have to reconfigure the DCM whenever you need to change A or B. That may be the biggest drawback. I remember that you needed to convert between specific audio frequencies last summer. This method might be interesting for that. I did not think of it then.
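The pre-computation mentioned above could look something like the brute-force search below. It is a sketch only: the function name is hypothetical, it takes A and B as any integers from 1 to 32 as stated in the post, and it ignores the limits a real DCM places on its input and synthesised output frequencies, which would rule out many of the combinations it tries.

```python
from fractions import Fraction


def best_dcm_settings(f_target_hz, f_in_hz=100_000_000, max_ab=32):
    """Search A, B in 1..max_ab and an integer C that minimise
    |f_in * (A/B) / C - f_target|. Returns (A, B, C, error_in_Hz)."""
    target = Fraction(f_target_hz)
    best = None
    for a in range(1, max_ab + 1):
        for b in range(1, max_ab + 1):
            synth = Fraction(f_in_hz * a, b)       # DCM output frequency
            q = synth / target
            c_floor = max(1, q.numerator // q.denominator)
            for c in (c_floor, c_floor + 1):       # the two nearest divisors
                err = abs(synth / c - target)
                if best is None or err < best[0]:
                    best = (err, a, b, c)
    err, a, b, c = best
    return a, b, c, float(err)


if __name__ == "__main__":
    for f in (44_100, 48_000, 50_001):
        a, b, c, err = best_dcm_settings(f)
        print(f"{f} Hz: A={a} B={b} C={c} error={err:.4f} Hz")
```

The resulting (A, B, C) triples are the constants that would be stored in a BlockRAM table, one entry per required output frequency.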

Peter Alfke, Xilinx Applications

Peter Alfke

Daniel - this may well be an option (excellent suggestion - thank you!), and is essentially the same technique as used in some sample-rate converters (for arbitrary up-sample rate ratios). Except in this case the output rate is the same as the input rate, which simplifies the design substantially as a simple linear interpolation between adjacent samples would suffice.

I guess the DAC being driven by a clock which has a highly jittery period should be spectrally correct if the data is interpolated in time to the word clock edges which drive it.
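A minimal sketch of that interpolation step follows. The conventions are assumptions, not something stated in the thread: the actual clock edge is taken to arrive late by a known fraction of one sample period, the output is delayed by one sample so that only the previous and current samples are needed, and the function name is invented for illustration.

```python
def corrected_sample(prev, cur, late_fraction):
    """Linearly interpolate between the previous and current samples.

    late_fraction is how late the real DAC clock edge is, as a fraction
    (0..1) of one sample period, relative to the ideal instant at which
    `prev` would have been correct."""
    return prev + (cur - prev) * late_fraction


if __name__ == "__main__":
    # For a signal that is a straight line, linear interpolation is exact.
    def signal(t):
        return 3.0 * t + 1.0

    late = 0.25
    print(corrected_sample(signal(4), signal(5), late), "vs ideal", signal(4 + late))
```

For real audio, linear interpolation is only an approximation, and as noted elsewhere in the thread the phase error for each edge has to be known. In an NCO it might be derived from the accumulator contents at rollover, though that is not worked out here.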

I will give this more thought over the coming weeks and may even try an implementation if time allows.

Cheers, PeterC.

PeterC

This would depend on the DAC designs - some have quite long settling times, so this might not work too well on those. Plus, you need to know the exact nature of your jitter ?

-jg

Jim Granville
