Implementing a very fast counter in Virtex-II

Hi,

I'm trying to write some code for a 64-bit counter for a Virtex-II.

The problem I'm facing is that it has to run at 200 MHz at least, and therefore a simple "a = a + 1" doesn't work (Xilinx rates the 64-bit counter at 114 MHz).

I've tried a split approach with four smaller counters and a selector depending on the carry out of the previous stages, but it only got me to about 180 MHz.

Has anyone had a similar problem and solved it? Unfortunately I'm not familiar with pipelined implementations, but I'll be happy to learn one.

Many thanks, Erez.

Reply to
Erez Birenzwig

"Counter" can mean many things. If you need a synchronous counter that gives you the updated value before the next count pulse comes in, that is a demanding design and may have timing problems at 200 MHz.

If, at the other extreme, you just need a counter that can resolve 200 (or 500+) MHz, and you can wait some nanoseconds before you read the final count value, that is trivial. In the extreme case you would just concatenate 2-bit Johnson counters (at least at the input end), one slice clocking the next. And there are many variations on this theme. I built a 400 MHz frequency counter 5 years ago with an XC4002XL. Playing around, aiming at 1 GHz now.

Peter Alfke, Xilinx Applications

Reply to
Peter Alfke

To be more precise, the implementation requires the calculation of a = a + 1, where a is a 64-bit vector, every clock cycle at 200 MHz, using a Virtex-II -6 FPGA.

Erez.

Reply to
Erez Birenzwig

Reply to
Peter Alfke

Then, when you read the counter every clock cycle, once every 64K counts you'll get a wrong result. I don't think that's good enough. Remember the FMUL bug?

Anyway, I got a good answer from another list:

1) Build a fast 2-bit counter
2) Build a slow 62-bit counter, with enable
3) Use enable = q[1]&q[0]
4) Latch the slow counter using the enable as well

You get a full 4 cycles for the carry to ripple through the upper 62 bits. Be careful in timing analysis. Some systems let you specify that the carry chain is a multi-cycle path. Others force you to ignore these paths with false-path commands.
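
Steps 1) to 3) could be sketched in VHDL roughly as follows. This is only a minimal sketch and not the code from the other list: the entity name prescaled_counter, the port names, and the active-high synchronous reset are all assumptions made for illustration. Since hi only changes on the edge where lo wraps from 3 to 0, the hi + 1 carry chain gets four clock periods to settle, which is what a multi-cycle constraint would have to express. The path from lo to the enable is still a single-cycle path.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: names, reset style and the 2/62 split are illustrative.
entity prescaled_counter is
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;              -- synchronous, active high (assumed)
    count : out unsigned(63 downto 0)
  );
end entity prescaled_counter;

architecture rtl of prescaled_counter is
  signal lo : unsigned(1 downto 0)  := (others => '0');  -- step 1: fast 2-bit counter
  signal hi : unsigned(61 downto 0) := (others => '0');  -- step 2: slow 62-bit counter
  signal en : std_logic;
begin
  -- Step 3: the enable is high only while the low bits are "11".
  en <= lo(1) and lo(0);

  process (clk) is
  begin
    if rising_edge(clk) then
      if rst = '1' then
        lo <= (others => '0');
        hi <= (others => '0');
      else
        lo <= lo + 1;            -- wraps 3 -> 0 on the same edge that hi increments
        if en = '1' then
          hi <= hi + 1;          -- long carry chain, used once every four clocks
        end if;
      end if;
    end if;
  end process;

  -- Both halves are flip-flop outputs, so the concatenation is the full count.
  count <= hi & lo;
end architecture rtl;
```

How the multi-cycle path is declared depends on the tool and is deliberately left out here. A load input would also change which paths may be treated as multi-cycle.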

(Thanks Bill for the help).

Erez.

Reply to
Erez Birenzwig

Here is an example of a 64-bit pipelined counter built from two 32-bit counters. (Please view in a fixed-width font, e.g. Courier.)

-- (+) : adder (count)
-- [ ] : dff
--
--     +-----------+
--     |           |
-- 1--(+)-+-[31:0]-+---------[31:0]-----+
--        |                             |
--        +-[carry]-+                   |
--                  |                   |
--                  |  +------------+   |
--                  |  |            |   |
--                  +-(+)---[63:32]-+---+---cnt(63:0)

ps_cnt_pipeline : process(i_rst_an, i_clk) is
  constant C_CNT_WIDTH2 : positive := G_CNT_WIDTH / 2;
  variable v_cnt1_d   : unsigned(C_CNT_WIDTH2 downto 0);
  variable v_cnt1_q1  : unsigned(C_CNT_WIDTH2-1 downto 0);
  variable v_cnt1_q2  : unsigned(C_CNT_WIDTH2-1 downto 0);
  variable v_carry_q1 : unsigned(0 downto 0);
  variable v_cnt2_q2  : unsigned(G_CNT_WIDTH-C_CNT_WIDTH2-1 downto 0);
begin
  if i_rst_an = '0' then
    v_carry_q1 := (others => '0');
    v_cnt1_q1  := (others => '0');
    v_cnt1_q2  := (others => '0');
    v_cnt2_q2  := (others => '0');
  elsif rising_edge(i_clk) then
    v_cnt1_d      := ('0' & v_cnt1_q1) + 1;
    v_cnt1_q2     := v_cnt1_q1;
    v_cnt2_q2     := v_cnt2_q2 + v_carry_q1;
    v_carry_q1(0) := v_cnt1_d(C_CNT_WIDTH2);
    v_cnt1_q1     := v_cnt1_d(C_CNT_WIDTH2-1 downto 0);
  end if;
  o_cnt_q

Reply to
fe

How do you use the counter? Do you read the current counter value, or are you only interested in when it reaches zero?

Göran

Reply to
Goran Bilski

Argh.

Why do so many people fullquote? Is it so hard to delete the unneeded parts of the posting they refer to?

Fullquoting spoils the archives and makes them harder to use. So people don't search the archive but instead ask the same questions all over again. That makes the newsgroup hard to use.

Bye

--
Uwe Bonnes                bon@elektron.ikp.physik.tu-darmstadt.de

Institut fuer Kernphysik  Schlossgartenstrasse 9  64289 Darmstadt
--------- Tel. 06151 162516 -------- Fax. 06151 164321 ----------
Reply to
Uwe Bonnes

Every clock cycle? Not gated/enabled or cleared?

You split the counter into halves earlier, correct? Pre-decode the terminal count from the fast counter so you have a synchronous TC signal when the 32'hffff_ffff arrives (or 8'hff or whatever size you choose). With the synchronous TC fed to the enable of the top, slow counter you can achieve great speeds.

Only a little more coordination is needed for proper enables, clears, or loads, but the design should be easy to achieve once you understand the "tricks" that will help you "retime" your synchronous designs.
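
One possible reading of the pre-decoded terminal count, as a minimal VHDL sketch. The entity name tc_counter, the generics, the 8/56 split and the synchronous reset are hypothetical and chosen only for illustration. TC is registered one cycle early (when the fast counter is one below all-ones), so the slow counter's enable comes straight from a flip-flop instead of a wide decode.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: names, widths and reset style are illustrative.
entity tc_counter is
  generic (
    G_LO_WIDTH : positive := 8;    -- fast counter width (assumed)
    G_HI_WIDTH : positive := 56    -- slow counter width (assumed)
  );
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;         -- synchronous, active high (assumed)
    count : out unsigned(G_HI_WIDTH + G_LO_WIDTH - 1 downto 0)
  );
end entity tc_counter;

architecture rtl of tc_counter is
  constant C_LO_MAX : unsigned(G_LO_WIDTH - 1 downto 0) := (others => '1');
  signal lo : unsigned(G_LO_WIDTH - 1 downto 0) := (others => '0');
  signal hi : unsigned(G_HI_WIDTH - 1 downto 0) := (others => '0');
  signal tc : std_logic := '0';    -- registered terminal count
begin
  process (clk) is
  begin
    if rising_edge(clk) then
      if rst = '1' then
        lo <= (others => '0');
        hi <= (others => '0');
        tc <= '0';
      else
        lo <= lo + 1;
        -- Pre-decode: lo is about to become all ones, so tc will be
        -- high during exactly the cycle in which lo is all ones.
        if lo = C_LO_MAX - 1 then
          tc <= '1';
        else
          tc <= '0';
        end if;
        -- The slow counter advances on the edge where lo wraps to zero.
        if tc = '1' then
          hi <= hi + 1;
        end if;
      end if;
    end if;
  end process;

  count <= hi & lo;
end architecture tc_counter_rtl_end_marker;
```

Note that a load or clear would also have to set tc consistently with the loaded value, which is the extra coordination mentioned above and is not shown.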

Reply to
John_H

Reply to
Peter Alfke

I think item 4) was meant to cover capture of the counter at any instant, and to cover the carry ripple. I'm with Peter in questioning 4). Carry ripple is certainly long, but this is on the .D side, and determines the NEXT clock delay. However, any capture is on the .Q side, and all Qs will be fully synchronous (no ripple adders). Capture of both the prescaler and the long counter can be clock-synchronous, and does not need any enables.

Capture and Clear (which can be more useful in applications) can be done in a single clock with a little more .D side logic.

Capture of fractional clocks, to push the time resolution finer than 1/clock speed, is challenging, but looks possible in modern FPGAs.

-jg

Reply to
Jim Granville

4) is to cover the fact that once you enable the +1 on the long carry chain you can't sample it at the next clock cycle (it won't be ready by then), so you have to latch the previous value, which is the new count value.

The problem is that I must be able to sample the counter on any arbitrary clock cycle, therefore it must be glitch-free.

The counter that I need doesn't require a clear, but thanks for the thought. It does need a load, though.

Reply to
Erez Birenzwig

We may be differing in topology. A +1 is normally done on the register INPUT side (.D), not on the register OUTPUT (.Q) side. On an FPGA, you use carry logic / +1 maths; on a CPLD, you use wide-AND and toggle flip-flops.

So it does not matter if the long carry chain results are not ready (and you are right, they will not be ready), because the latch sample is taken from the .Q, whilst the carry results drive the .D. You DO need to enable the counter clock, as that requires a fully settled summation result.
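
A minimal sketch of such a .Q-side capture, assuming a hypothetical count_capture entity whose count_q input is wired to the counter's flip-flop outputs (all names here are illustrative, not from any post in this thread). The capture register never sees the carry chain, only settled register outputs.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: names and the 64-bit width are illustrative.
entity count_capture is
  port (
    clk      : in  std_logic;
    capture  : in  std_logic;               -- sample request, may come on any cycle
    count_q  : in  unsigned(63 downto 0);   -- counter flip-flop outputs (.Q side)
    snapshot : out unsigned(63 downto 0)
  );
end entity count_capture;

architecture rtl of count_capture is
  signal snap_r : unsigned(63 downto 0) := (others => '0');
begin
  process (clk) is
  begin
    if rising_edge(clk) then
      -- Register-to-register transfer: the carry logic that feeds the
      -- counter's .D inputs is not part of this path.
      if capture = '1' then
        snap_r <= count_q;
      end if;
    end if;
  end process;

  snapshot <= snap_r;
end architecture rtl;
```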

-jg

Reply to
Jim Granville

Reply to
Peter Alfke

Sorry, you're right on the spot there. My mistake.

Reply to
Erez Birenzwig

Wouldn't it be easier to always latch the low part? That is, you build a circuit with a latency of one cycle to load a, and then you have a new a+1 result on every cycle.

Something like this:

process (...) begin if rising_edge(clk) then a_low >

Reply to
Francisco Rodriguez

Reply to
Peter Alfke

Just a thought: run a 63-bit counter at clk/2 (100 MHz). Reclock the 63-bit result at 200 MHz, with a clock enable on the second 200 MHz clock in the 100 MHz domain. Use a logic-accessible 100 MHz clock as bit 0. Should work, shouldn't it?

Reply to
Morten Leikvoll

Why are we re-inventing complicated solutions to a simple problem? A perfectly synchronous counter solution has been described, using a 2- (or 3-) bit front end driving the enable for the rest of the bits. If you want a fast counter (not incrementer!) then there is nothing to be improved or fixed. It's the best solution!

Peter Alfke

Reply to
Peter Alfke
