Delaying a pulse train

- m

Tue, Aug 14, 2007 6:52 PM

I have to delay a pulse train by a given number of clocks on the same domain as the pulse to be delayed.

The best approach I can think of is to run a counter of sufficient width and log pulse transitions and states into a circular "pulse transition list" of sufficient depth. After waiting for the desired number of clock transitions (the delay) an output counter of the same width as the input sample counter is allowed to start counting. This counter is used to address the "pulse transition list" to generate a delayed output that matches the input.

Using SelectRAM memory for delay is out of the question as too much memory would be required and it is needed elsewhere in the design.

Can anyone suggest a better way to do this? The incoming pulses are relatively regular and can be of any duration, from a few clocks to hundreds.

Thanks,

-Martin

- Peter Alfke

Tue, Aug 14, 2007 7:01 PM

Martin, have you thought of using the dual-ported BlockRAMs? Use a free-running counter to address one port, where you write the incoming data. Use another counter, appropriately offset, (or a subtractor) to read the data out on the other port.
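The two-counter scheme can be modelled behaviourally. What follows is a minimal Python sketch rather than HDL: the function name `delay_with_ring_buffer` and its parameters are illustrative assumptions, and the list `ram` merely stands in for a dual-port block RAM.

```python
def delay_with_ring_buffer(samples, delay, depth):
    """Delay a stream of 0/1 samples by `delay` clocks using a ring buffer.

    Models a dual-port RAM: one port writes at a free-running address,
    the other reads at that address minus the delay.
    """
    if not 0 < delay < depth:
        raise ValueError("delay must be between 1 and depth - 1")
    ram = [0] * depth      # stand-in for the RAM contents, initially all zero
    write_addr = 0         # free-running write counter
    out = []
    for bit in samples:
        # The subtractor forms the read address from the write address.
        read_addr = (write_addr - delay) % depth
        out.append(ram[read_addr])   # read port: sample from `delay` clocks ago
        ram[write_addr] = bit        # write port: current sample
        write_addr = (write_addr + 1) % depth
    return out
```

The memory cost of this approach grows with the delay itself (one bit per clock of delay), regardless of how few edges the pulse train contains.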

As l> I have to delay a pulse train by a given number of clocks on the same

- m

Tue, Aug 14, 2007 7:10 PM

It would consume too many BRAMs. I need delays in the order of 500us at 150MHz. That's about 5 BRAMs (Virtex2). I need them for other portions of the design.
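The figure of about 5 BRAMs can be checked with a short calculation. This sketch assumes each 18 Kbit Virtex-II block RAM is used in its 16K x 1 configuration (parity bits unused) and that one bit is stored per clock; the constant names are illustrative.

```python
import math

CLOCK_HZ = 150_000_000        # 150 MHz clock, from the post
DELAY_US = 500                # 500 us delay, from the post
BRAM_DATA_BITS = 16 * 1024    # assumption: 16K x 1 mode, parity bits unused

# Number of clock periods the pulse train must be held for.
delay_clocks = CLOCK_HZ * DELAY_US // 1_000_000

# One stored bit per clock, so the bit count equals the clock count.
brams_per_stream = math.ceil(delay_clocks / BRAM_DATA_BITS)

print(delay_clocks, brams_per_stream)
```

Under these assumptions the delay is 75000 clocks, which needs 5 block RAMs per delayed stream.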

It'd sure be easy though!

-M

- John_H

Tue, Aug 14, 2007 7:28 PM

Each pulse can be "a few" to "hundreds" of clocks long and the overall delay can be 500us*150MHz=75000 periods. How many pulses can be in transit at any one time? Hundreds? Only one? The transition list might be the lowest-resource approach, especially when using SRLs to maintain the list. I'm just not sure if 16 (or 32 or 64) elements are enough for your needs. You'll need a 17-bit counter to reach 500 us at 150 MHz. It might be handy to guarantee clean operation by adding an 18th bit for pulse polarity. The number of entries you'll need is twice the number of pulses you must accommodate, since each pulse contributes two edges.
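The sizing described here can be worked through in a few lines. This is a rough sketch only: `transition_list_size` is a hypothetical helper, and the 16-pulse figure in the usage line is an arbitrary example, not a number from the thread.

```python
def transition_list_size(delay_clocks, max_pulses_in_flight):
    """Return (entry_bits, entries, total_bits) for an edge-timestamp list."""
    # The counter modulus must exceed the delay so that wrapped
    # timestamps remain unambiguous.
    timestamp_bits = delay_clocks.bit_length()
    entry_bits = timestamp_bits + 1        # one extra bit for pulse polarity
    entries = 2 * max_pulses_in_flight     # a rising and a falling edge per pulse
    return entry_bits, entries, entry_bits * entries


# Hypothetical example: 75000-clock delay, at most 16 pulses in transit.
print(transition_list_size(75000, 16))
```

For the 75000-clock delay this gives a 17-bit timestamp plus a polarity bit, so 18 bits per entry.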

- Peter Alfke

Tue, Aug 14, 2007 8:25 PM

- John_H

Tue, Aug 14, 2007 8:54 PM

His original suggestion was to mark the edges on the input and replicate the edges to the output with a counter shifted by the needed delay, accumulating these transitions in a list. As long as the number of transitions in the delay is small (compared to 75k, at least) his suggested approach is wholly adequate.
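A behavioural Python sketch of this edge-list scheme, as it is described in the thread, is given here for illustration. It is not Martin's actual module: the names `delay_by_edge_list`, `list_depth` and `counter_bits` are assumptions, and a `deque` stands in for the circular transition list.

```python
from collections import deque


def delay_by_edge_list(samples, delay, list_depth, counter_bits):
    """Delay a 0/1 stream by logging edges with timestamps and replaying them."""
    modulus = 1 << counter_bits
    if not 0 < delay < modulus:
        raise ValueError("delay must be positive and fit in the counter")
    edges = deque()    # FIFO of (timestamp, new_level) entries
    prev_in = 0        # last input level seen
    level_out = 0      # current output level
    out = []
    for t, bit in enumerate(samples):
        now = t % modulus                     # free-running counter value
        if bit != prev_in:                    # input edge: log it
            if len(edges) == list_depth:
                raise OverflowError("transition list is full")
            edges.append((now, bit))
            prev_in = bit
        # Replay the oldest edge once it is exactly `delay` clocks old.
        if edges and (now - edges[0][0]) % modulus == delay:
            level_out = edges.popleft()[1]
        out.append(level_out)
    return out
```

Storage here scales with the number of edges in flight rather than with the length of the delay, which is why the approach works only while the transition count stays small.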

- m

Tue, Aug 14, 2007 8:56 PM

I have the module I described in the original post working just fine right now. So, for cases where the number of transitions during the delay period isn't excessive, yes, there is another way. I agree though, that, if the intent was to store any random pulse train you'd have no choice but to store and "replay" at a later time...requiring BRAM.

-M

- Jim Granville

Tue, Aug 14, 2007 9:15 PM

I think his edge-tag scheme is valid and is often used in logic analysers to get better apparent dynamic range. Of course it is more complex than a simple spinning delay buffer, but it may be that the logic/BRAM trade-off is worth it?

-jg

- Jim Granville

Tue, Aug 14, 2007 9:16 PM

What is your expected maximum edge count in that 500us?

-jg

- Peter Alfke

Tue, Aug 14, 2007 10:42 PM

Huffman run-length encoding has been used successfully in early-generation fax machines. Whether that or any other compression (e.g. edge-detection) scheme is good depends on the characteristics of the bitstream in question.

Peter Alfke

- m

Wed, Aug 15, 2007 1:17 AM

I should have also mentioned that there's a need to delay more than one such pulse stream (at least four). That would require about twenty 18K BRAMs...I can't use that many without going to a significantly larger device.

-M

- Jim Granville

Wed, Aug 15, 2007 2:13 AM

If you have the time headroom, you might be able to compress what you have now with a simple dictionary-type lookup and dT storage. That adds one more level of lookup, but the average storage per edge can drop. How many bits do you store now per edge?

-jg

- Peter Alfke

Wed, Aug 15, 2007 4:27 AM

- lb.edc

Wed, Aug 15, 2007 12:33 PM

Martin,

In the end this will result in using LUTs as a long shift register or a DPRAM as a FIFO, won't it? A memory-based approach saves LUTs. I think it's best to implement this using block RAMs.

Luc

- nospam

Wed, Aug 15, 2007 3:59 PM

I think the description of your solution in your original post was rather confused.

To me your "circular pulse transition list" needs to be a FIFO. If you only record transitions you need wide counters and FIFO to handle wide pulse spacing. A very long 1 bit FIFO is the opposite extreme.

The optimum will be run length encoding with a FIFO width of something in between (assuming there is no pattern to the pulse data which a compression scheme could exploit).

Use 1 bit to indicate the data state and n bits to count the clock cycles spent in that state. For multiple streams you can add more data bits and use only one set of encoding/decoding logic and one larger FIFO.
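The run-length idea can be sketched in Python as follows. This is an illustrative model only: the function names are assumptions, the FIFO is represented by a plain list, and the delay is modelled by prepending idle clocks rather than by a separately timed decoder as real hardware would use.

```python
def rle_encode(samples, count_bits):
    """Encode a 0/1 stream as (level, run_length) entries with an n-bit count."""
    max_run = (1 << count_bits) - 1    # largest run one entry can hold
    runs = []
    for bit in samples:
        if runs and runs[-1][0] == bit and runs[-1][1] < max_run:
            runs[-1] = (bit, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((bit, 1))               # new level, or count saturated
    return runs


def rle_decode(runs):
    """Expand (level, run_length) entries back into a 0/1 stream."""
    out = []
    for level, count in runs:
        out.extend([level] * count)
    return out


def delay_with_rle(samples, delay, count_bits):
    """Return the delayed stream and the number of FIFO entries it needed."""
    runs = rle_encode(samples, count_bits)
    delayed = [0] * delay + rle_decode(runs)
    return delayed[:len(samples)], len(runs)
```

A narrow count field splits long runs into several entries, while a wide one wastes bits on short runs, which is the trade-off between the two extremes described above.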

- Jim Granville

Wed, Aug 15, 2007 7:40 PM

A good idea, but what if you want coincident edges? Perhaps a small phase-nudge field that passes to the IO block, allowing sequential FIFO unloads but coincident IO edges?

-jg

- John_H

Wed, Aug 15, 2007 10:52 PM

The multi-bit data approach supports coincident edges. An event at time t presents a data value of 4'b0010 for a single pulse active. The event at time t+delta is 4'b0101 resulting in 2 pulses going active the same time as the original pulse deasserts. No nudging required.
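To illustrate, here is a small Python sketch of the multi-bit event approach, using the 4-bit values from the example above. The function names and the timing of the events are assumptions made for the illustration; each event stores the complete state of all four streams, so any number of bits may change in the same clock.

```python
def record_events(vectors):
    """Log (time, vector) whenever any bit of the multi-stream vector changes."""
    events = []
    prev = 0
    for t, vec in enumerate(vectors):
        if vec != prev:
            events.append((t, vec))
            prev = vec
    return events


def replay_events(events, delay, length):
    """Replay the logged vectors `delay` clocks after they were recorded."""
    out = []
    level = 0
    idx = 0    # index of the next event to replay
    for t in range(length):
        if idx < len(events) and events[idx][0] + delay == t:
            level = events[idx][1]
            idx += 1
        out.append(level)
    return out


# One pulse active (0b0010), then it deasserts as two others assert (0b0101).
vectors = [0b0000, 0b0010, 0b0010, 0b0101, 0b0101, 0b0000, 0, 0, 0]
events = record_events(vectors)
delayed = replay_events(events, 2, len(vectors))
```

The transition from 0b0010 to 0b0101 is stored as a single event, so the three edges it contains come out in the same clock on replay.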

- Jim Granville

Wed, Aug 15, 2007 11:34 PM

Yes, you are correct. I was thinking of more compressed port-pointer storage, but if you store one bit per port, then all ports can change at once. It becomes very like a dT logic analyser in playback mode.

-jg

- m

Thu, Aug 16, 2007 5:41 PM

Depends on the pulse in question.

I've implemented five different variants utilizing different techniques. The first is a "brute force" BRAM-based approach. The other variants use a-priori knowledge of the pulse patterns to implement delays using counters, pulse lengths, etc.

It's a good problem. It is clear that low resource solutions are possible (and desirable) if pulses are relatively cyclic and this knowledge can be coded into the logic from the start. The case of a randomly changing pulse with a random number of transitions per unit of time is probably one that almost requires a BRAM buffer approach.

Thanks for your suggestions. I think I have a couple of low-resource solutions that work well now. Valuable BRAM resources have been preserved for the rest of the design.

-M

- Peter Alfke

Thu, Aug 16, 2007 5:50 PM