how to speed up my accumulator ??

- R
- rickman
  

Thu, Dec 9, 2004 3:26 PM

Well, I wouldn't go *that* far.. :)


You have a *much* better memory than I do. I think I had looked into this, but my idea was rejected by the higher-ups in favor of a specialized chip that actually used the top N bits of the accumulator to drive a DAC. The resulting sine wave was then filtered and fed back to the chip for clipping via a comparator.

I looked at the posts that you refer to. That post has some defunct links to other posts or web pages. Heck, a couple of them are to AltaVista, which doesn't even redirect you to whoever bought them. Things change fast on the internet.

What phase detectors *are* linear?

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

- A
- Allan Herriman
  

Thu, Dec 9, 2004 5:03 PM


That is a sound way of solving the problem. In addition, it's fairly well understood. It's no surprise that the higher ups liked it.

And now we have groups-beta.google.com. I liked the old interface.

Here's the missing link (about phase noise in analog PLLs):

formatting link

Aargh! groups-beta.google.com now uses a variable-width font for ASCII art. Bad! Bad!

Digital PFDs with charge pump outputs are about as good as it gets.

An analog multiplier or a diode-ring DBM might be quite linear for small-level inputs, but these have the disadvantage of a sinusoidal characteristic (i.e. they're quite non-linear for jitter inputs above about 0.1 UI; a post-NCO PD will typically see much more than that) and don't come with a built-in frequency detector.
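To put a rough number on the sinusoidal characteristic, here is a minimal Python sketch. It assumes an idealized multiplier-type detector whose normalized output is sin(2*pi*x), with x the phase offset in UI measured from the lock point, and compares it with the straight line 2*pi*x that has the same small-signal slope. The function names are made up for illustration and do not describe any particular part.

```python
import math


def mixer_pd(phase_ui):
    """Idealized multiplier-type detector output (assumed model), phase in UI."""
    return math.sin(2 * math.pi * phase_ui)


def linear_pd(phase_ui):
    """Straight line with the same slope as mixer_pd at zero phase offset."""
    return 2 * math.pi * phase_ui


def relative_error(phase_ui):
    """Fraction by which the sinusoidal output falls short of the linear one."""
    ideal = linear_pd(phase_ui)
    return (ideal - mixer_pd(phase_ui)) / ideal


# Print the shortfall at a few phase offsets.
for ui in (0.01, 0.05, 0.1, 0.2, 0.25):
    print(f"{ui:5.2f} UI: shortfall {100 * relative_error(ui):5.1f}%")
```

Under this model the shortfall is negligible at 0.01 UI, a few percent at 0.1 UI, and over a third at 0.25 UI, which is the sense in which the detector stops being linear for larger jitter.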

Regards, Allan

- R
- rickman
  

Thu, Dec 9, 2004 9:36 PM

On this design I don't have the luxury of enough space for this chip. I can provide a PLL and put the NCO in the FPGA however.


Thanks for the link. I found the font to be fixed. But some of the drawings did wrap. Maybe the font can be specified in the browser?

PFD means Phase-Frequency-Detector?

I believe you said that the digital phase detectors are not very linear. Are you saying that there are *no* good detectors? What if a digital phase detector were connected to analog current sources so that the pullup and pulldown were balanced? Would that be linear enough? I don't think that would be hard to design.


- J
- John_H
  

Thu, Dec 9, 2004 11:14 PM

Ayup.

One example of a linear phase detector is an XOR gate. Its output sits at its center level at a 90 degree phase shift, it becomes nonlinear as the jitter approaches +/- 0.25 UI, and the circuit gives up in the "other" 180 degrees of phase. I'd worry about implementing an XOR-type phase detector in an FPGA because of problems not only with jitter but also with feedthrough from other functions in the FPGA onto the Voh and Vol levels.

One PFD I used about 10 years ago that appeared to have good specs was the Analog Devices AD9901, which also produced a voltage output rather than the charge-pump output we tend to be more familiar with.

- R
- Ray Andraka
  

Fri, Dec 10, 2004 12:35 AM

Moti,

There are a couple of things you can do. First off, if you look closely at an accumulator, the feedback from a particular bit only affects that bit and bits with greater significance. That suggests that you can perform partial sums and then combine the results.

One simple trick that takes 2x the resources of the straight accumulator is to break your 32 bit accumulator into two 16 bit accumulators. The carry out of the first gets registered and fed into the carry in of the second. Note that by doing that, the second follows the first by a clock cycle, so you need to delay the upper half of the addend (the new value getting added, not the feedback value) by a clock cycle using a register so that it arrives at the upper half of the accumulator at the same time as the registered carry from the lower half. Likewise, the lower half sum output (but not the feedback) has to be delayed by a clock cycle to align it with the upper half sum.

On the surface, that would seem to permit almost double the clock speed (and it did in older Xilinx devices); however, in Virtex devices the propagation time to get on and off the carry chain is an order of magnitude larger than the bit to bit propagation times, so in reality the gain from this trick is rather small until you get into truly huge accumulator widths.
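The alignment of the two halves is easy to get wrong, so here is a small cycle-level Python model of the split accumulator described above. It is only a behavioral sketch, not HDL: the class and register names are invented for illustration, and it says nothing about timing. It checks that the split version produces the same values as a plain 32 bit accumulator, one clock later.

```python
import random

MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


class SplitAccumulator:
    """Behavioral model: two 16 bit halves joined by a registered carry."""

    def __init__(self):
        self.lo = 0           # lower-half accumulator register
        self.hi = 0           # upper-half accumulator register
        self.carry = 0        # registered carry out of the lower half
        self.addend_hi_d = 0  # upper addend bits, delayed one clock
        self.lo_out_d = 0     # lower sum, delayed one clock for the output

    def clock(self, addend):
        """Advance one clock edge and return the aligned 32 bit output."""
        lo_sum = self.lo + (addend & MASK16)
        next_hi = (self.hi + self.addend_hi_d + self.carry) & MASK16
        # Every register samples the values computed before this edge.
        self.lo_out_d = self.lo
        self.addend_hi_d = (addend >> 16) & MASK16
        self.carry = lo_sum >> 16
        self.lo = lo_sum & MASK16
        self.hi = next_hi
        return (self.hi << 16) | self.lo_out_d


def run_split(addends):
    acc = SplitAccumulator()
    return [acc.clock(a) for a in addends]


def run_reference(addends):
    """Plain 32 bit accumulator, output taken after each clock."""
    total, out = 0, []
    for a in addends:
        total = (total + a) & MASK32
        out.append(total)
    return out


rng = random.Random(0)
stimulus = [rng.getrandbits(32) for _ in range(2000)]
split = run_split(stimulus)
reference = run_reference(stimulus)
# The split accumulator lags the plain one by exactly one clock.
print(split[1:] == reference[:-1])
```

The extra clock of latency is the price of the registered carry; the feedback paths inside each half are untouched.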

A more usable trick requires a little more attention to the design implementation. The carry chains are typically the critical path (mostly because the times to get on and off the chain are on par with the LUT delay). You can't do much of anything about the delay in the carry chain or the intrinsic delay for getting on and off the chain. You can, however, minimize the delays on the signals connecting to the carry chain input. This means making sure that you only have one level of logic (i.e., flip-flops are directly driving the LUTs that feed the carry chain), and you need to make sure those flip-flops are placed in close proximity to the carry chain (ideally either in the same CLB, or in an adjacent CLB so you can use the direct connect wires). Note that the automatic placement is not particularly good at making sure those flip-flops are placed this way. The accumulator feedback doesn't need to be pipelined because it is connecting back around to the same bit (assuming you've reduced the logic to 1 level), which means it is already pipelined as much as it could be. You may need to pipeline the new addend path in order to achieve the one level of logic at the accumulator and keep the driving flip-flops in adjacent slices, but that is OK as it doesn't affect the accumulator operation.

You should normally use active-high resets because that is what is native to the FPGA. In this case, I don't think it is affecting your timing, however, because it is an asynchronous reset. Had it been a synchronous reset, some synthesizers would have inserted a gate between the carry chain and the register rather than inverting the resetn signal, which would have added an extra LUT delay to the input path.

Hope this helps


- R
- Ray Andraka
  

Fri, Dec 10, 2004 12:50 AM

Moti,

Another trick for NCOs that will double the speed, assuming you are careful about how you update the tuning word (phase increment), is to use two accumulators running clock enabled on every other clock. One is initialized with 0, the other is initialized with the phase increment value, and then both are incremented by 2x the phase increment on every other clock. This allows two clock periods for the accumulator carry to propagate, and you get the current and next phase out at the same time on every other clock. A 2:1 mux switching at the clock rate selects the output from the two accumulators on alternate clocks.
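As a sanity check on the arithmetic, the following Python sketch models the two interleaved accumulators and compares the muxed output against a single accumulator clocked every cycle. It assumes a fixed tuning word (updating it mid-stream is exactly the part that needs care and is not modeled), and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def single_nco(increment, cycles):
    """Ordinary phase accumulator: one addition per clock."""
    phase, out = 0, []
    for _ in range(cycles):
        out.append(phase)
        phase = (phase + increment) & MASK32
    return out


def interleaved_nco(increment, cycles):
    """Two accumulators, each enabled on every other clock, plus a 2:1 mux."""
    even = 0                       # phases 0, 2, 4, ... times the increment
    odd = increment & MASK32       # phases 1, 3, 5, ... times the increment
    step = (2 * increment) & MASK32
    out = []
    for clk in range(cycles):
        # The mux toggles at the full clock rate.
        out.append(odd if clk & 1 else even)
        if clk & 1:
            # Both accumulators update once per two clocks, so each
            # carry chain has two clock periods to settle.
            even = (even + step) & MASK32
            odd = (odd + step) & MASK32
    return out


tuning_word = 0x1234ABCD  # arbitrary example value
print(interleaved_nco(tuning_word, 1000) == single_nco(tuning_word, 1000))
```

Doubling the increment wraps modulo 2^32 just like the phase itself, so no extra bits are needed in either accumulator.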

I've used this trick also for cases where the mixer it is driving can't run at the full clock rate (I usually use a CORDIC rotator there, see my XCELL article about that). In that case, you use duplicate copies of the mixer, one for even samples and the other for odd samples. The phase for both is incremented by 2x the sample phase increment, with one offset by 1x the phase increment.

I don't think you'll have to resort to this to get to 200 MHz with a Spartan3, however. I'm pretty sure careful floorplanning combined with making sure you have just one level of LUT logic in the critical path (new addend through carry chain to accumulator register) will get you 200 MHz with a very comfortable margin at 32 bits.

--

--Ray Andraka, P.E. President, the Andraka Consulting Group, Inc.

401/884-7930 Fax 401/884-7950 email snipped-for-privacy@andraka.com


"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759

- A
- Allan Herriman
  

Fri, Dec 10, 2004 6:58 AM

I made an error earlier - the mixer type phase detectors only have a sinusoidal PD characteristic when fed with sine waves. When fed with square waves (such as from the output of an FPGA or divider) they have performance basically identical to that of an XOR gate, with a triangular PD characteristic. An XOR gate is much cheaper and easier to use than a multiplier, of course. One can work around the Voh & Vol problems by using an external CMOS gate fed from filtered supplies.
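The triangular characteristic can be seen with a few lines of Python. This is only an idealized sketch: it assumes two perfect 50% duty cycle square waves of the same frequency, samples one period, and reports the fraction of time the XOR output is high as a function of the phase offset in UI. The function name and sample count are arbitrary choices.

```python
def xor_pd_average(phase_ui, samples=1000):
    """Average XOR output (0..1) for two ideal square waves offset by phase_ui."""
    shift = round(phase_ui * samples) % samples
    half = samples // 2
    high = 0
    for i in range(samples):
        a = i < half                         # reference square wave
        b = ((i - shift) % samples) < half   # same wave, delayed by the offset
        high += a != b                       # XOR of the two inputs
    return high / samples


# Sweep the offset over one full period.
for ui in (0.0, 0.1, 0.25, 0.4, 0.5, 0.6, 0.75, 0.9):
    print(f"{ui:4.2f} UI -> average output {xor_pd_average(ui):.2f}")
```

The average rises linearly from 0 at 0 UI to 1 at 0.5 UI and falls linearly back to 0 at 1 UI, with the mid-level at 0.25 UI (90 degrees); the slope reverses in the other half of the period.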

One disadvantage of the XOR gate (apart from the lack of frequency discrimination) is that, at lock, it has a square wave on its output. This results in reference spurs at the PLL output that can be difficult to remove. Since the square wave is predictable, it is possible to come up with compensation schemes (e.g. by adding a fixed duty cycle square wave of opposite phase) to reduce the level of the spurs.

The AD9901 is basically an XOR gate with an added frequency discriminator.

Regards, Allan

- A
- Allan Herriman
  

Fri, Dec 10, 2004 8:30 AM

... and T flip flops on the XOR inputs to ensure 50% duty cycle.

Allan