Optimizing an inferred counter

Hello everyone,

After banging our heads for the last few weeks (sometimes literally), I figure I'll query the group of experts here. We have a design that is functionally correct (ModelSim test bench) but it appears to be very iffy when it gets on the real chip. I have a couple of copies of "identical" boards with Virtex2-1000 chips on them. I'll check again soon, but I believe they are -6 parts. We've been synthesizing the design in Synplify Pro (v7.5, though others are available; this design has some history of working fine).

Sometimes it works on one or more boards; other times, after I load it (and verify) with iMPACT, this counter acts screwy (it messes up critical timing, and it also looks all wrong on Chipscope). Today I got brave enough to load it into the EEPROM; it worked this afternoon but who knows tomorrow (grrr). Looking at the Synplify timing report (with the -4 speed setting in Synplify), the timing is marginal around this path for a specified clock of 100 MHz, but the chip is really running at 66 MHz (PCI clock).

The key code is very simple (some syntax may be a bit off since I'm doing it from memory). We tried trimming the size of the counter from 32 bits down to 20 and it seems to help some.

signal my_counter : std_logic_vector(COUNT_WIDTH-1 downto 0);

countdown_process: process (CLK)
begin
  if rising_edge(CLK) then  -- do everything synchronous
    if RESET = '1' then
      my_counter

Marty Ryba

Combine the two processes into one.

-- Mike Treseler

Mike Treseler

I've been banging my head as well for the last month, trying to improve poor legacy code to pass timing at 250 MHz. Based on this experience, I'd suggest registering *everything*... well, as much as possible. Make sure your making_data is a flop, 'cos if it is combinatorial and based on my_counter, that's a recipe for failure.

HTH,

-P@

PatC

I notice neither arith_std or numeric_std

That's because ++ and -- are nothing special; it still requires an adder with the 1st input as the registered output of the adder, and the 2nd tied to +-1. An FPGA is just an array of LUTs, flip-flops and RAMs, not a lot more (some FPGAs may have dedicated multipliers too).

As for the "making_data" becoming the 2nd adder input, I'm surprised. It might be better if you can try to force it to synthesize "making_data" as the adder's register enable rather than the 2nd adder input, and then you can keep the 2nd adder input as a constant -1. As to how to do this, I'm not sure. How about changing "my_counter" into an unsigned instead (or signed, makes no difference) using the numeric_std package (implementation is IEEE defined) instead of the std_logic_arith package (implementation is vendor defined, and non-standard)?
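A minimal sketch of that suggestion, assuming the signal names mentioned elsewhere in the thread (RESET, counter_load, making_data). The entity name, the load_value port and the count output are hypothetical, added only so the example is self-contained. Whether the tool actually maps making_data onto the flip-flop clock enable is up to the synthesizer, but this coding style at least keeps the decrement a constant.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- IEEE-defined arithmetic on unsigned

-- Hypothetical wrapper entity; only the signal names come from the thread.
entity countdown is
  generic (
    COUNT_WIDTH : positive := 20
  );
  port (
    CLK          : in  std_logic;
    RESET        : in  std_logic;
    counter_load : in  std_logic;
    making_data  : in  std_logic;
    load_value   : in  unsigned(COUNT_WIDTH-1 downto 0);  -- assumed port
    count        : out unsigned(COUNT_WIDTH-1 downto 0)   -- assumed port
  );
end entity countdown;

architecture rtl of countdown is
  signal my_counter : unsigned(COUNT_WIDTH-1 downto 0) := (others => '0');
begin
  countdown_process : process (CLK)
  begin
    if rising_edge(CLK) then
      if RESET = '1' then
        my_counter <= (others => '0');
      elsif counter_load = '1' then
        my_counter <= load_value;
      elsif making_data = '1' then
        -- making_data only gates the update; the subtrahend stays a constant 1
        my_counter <= my_counter - 1;
      end if;
    end if;
  end process countdown_process;

  count <= my_counter;
end architecture rtl;
```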

Tricky

When you have symptoms like this, which suggest the real limit is lower than the tools report, have you tried varying the clock speed to check whether it DOES work properly at 10 MHz or 1 MHz?

In a counter, you usually need to 'see' the state of the lower bits to decide when to toggle the upper bit - and in an FPGA the carry chain is often faster than other paths, so that makes adders a natural counter solution. Certainly easy to write.

For long counters, the carry pathway can limit the speed, then you can split it and make it more complex, but faster. Look at 74161 for a faster carry scheme.

-jg

Jim Granville

What does the Xilinx timing report say? Have you constrained the clock correctly (or indeed at all :-)?

Synplify's report is an educated guess on the part of the tools. Xilinx's represents what they think the absolute worst-case is, so if it thinks you meet your timing constraints, then any chip you get will run that design. Of course, that depends on your constraints being right :-)

You say this design has a history of working fine - what's changed since then?

Cheers, Martin

--
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html
Martin Thompson

Just some ideas: Are all control signals synchronous to "CLK" ? Is the clock "clean"? What about supply voltage (DC-level, ripple, decoupling etc)? Could it be a board layout problem (insufficient ground plane, crosstalk)? If the long carry chain is the problem, you may divide the counter into 2 smaller counters with a pipelined carry chain.
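The last idea could be sketched roughly as follows. This is one possible variant, not necessarily what was meant above: the borrow from the low half into the high half is decided from a registered flag (lo_is_zero) that is computed one count ahead, so the high half never waits for the low half's carry chain. The entity name, the generics, the load_value port and the count output are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: a down-counter split into two halves.
entity split_down_counter is
  generic (
    LO_WIDTH : positive := 16;
    HI_WIDTH : positive := 16
  );
  port (
    CLK          : in  std_logic;
    RESET        : in  std_logic;
    counter_load : in  std_logic;
    making_data  : in  std_logic;
    load_value   : in  unsigned(HI_WIDTH+LO_WIDTH-1 downto 0);
    count        : out unsigned(HI_WIDTH+LO_WIDTH-1 downto 0)
  );
end entity split_down_counter;

architecture rtl of split_down_counter is
  signal lo         : unsigned(LO_WIDTH-1 downto 0) := (others => '0');
  signal hi         : unsigned(HI_WIDTH-1 downto 0) := (others => '0');
  signal lo_is_zero : std_logic := '1';  -- registered copy of "lo = 0"
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if RESET = '1' then
        lo         <= (others => '0');
        hi         <= (others => '0');
        lo_is_zero <= '1';
      elsif counter_load = '1' then
        lo <= load_value(LO_WIDTH-1 downto 0);
        hi <= load_value(HI_WIDTH+LO_WIDTH-1 downto LO_WIDTH);
        if load_value(LO_WIDTH-1 downto 0) = 0 then
          lo_is_zero <= '1';
        else
          lo_is_zero <= '0';
        end if;
      elsif making_data = '1' then
        lo <= lo - 1;
        -- The high half borrows only when the low half is about to wrap;
        -- that decision comes from a flop, not from the low carry chain.
        if lo_is_zero = '1' then
          hi <= hi - 1;
        end if;
        -- Look one count ahead: lo will be zero after this clock if it is 1 now.
        if lo = 1 then
          lo_is_zero <= '1';
        else
          lo_is_zero <= '0';
        end if;
      end if;
    end if;
  end process;

  count <= hi & lo;
end architecture rtl;
```

The concatenated value behaves like a single down-counter of HI_WIDTH+LO_WIDTH bits, at the cost of one extra flop and a LO_WIDTH-bit compare that feeds only a register.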

/Peter

Peter

^^^ What he said.

Syms.

Symon

Which design are you referring to here that has some history of working fine? The PCB design or the FPGA design?

You might want to clarify what you mean by 'messes up critical timing' and 'looks all wrong'. I'm assuming here that the counter starts off correctly and just doesn't decrement properly, which would lead one to suspect problems related in some way to the signal 'making_data', but again you should clarify this.

In any case, the problem is one of the following (not in any particular order):

  1. Inadequate power supply. Check the Vcc at the chip with a high speed scope and good probing techniques, make sure that you're within spec. If only 'slightly' out that's not likely the cause of your symptoms but is still something that needs to be addressed.
  2. Timing. Are the signals 'RESET', 'counter_load' and 'making_data' all synchronized to 'CLK'? As I said, I'm not sure which symptoms you're exactly seeing but I'm guessing that it resets and initializes properly it's just not counting correctly in which case 'making_data' is the likely culprit.
  3. More timing. There is more to timing than just clock frequency. There are also setup/hold time requirements. Do 'counter_load' or 'making_data' come from external I/O pins? If so, then
     - Do the signals on the board meet the timing requirements that you specified?
     - You did specify a timing requirement on the inputs?
     - Did the computed setup time from the P&R timing report (not Synplify's estimated timing) meet all requirements?
  4. Yet more timing. 'CLK' isn't a gated clock is it?
  5. Clock signal quality. Put a scope on the input clock. Is it absolutely monotonic through the entire Vih voltage range? Both edges? No dips and bounces anywhere between Vih(min) and Vih(max)?

Go through the above checklist and I'm fairly confident that you'll find the cause.
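If one of those control signals turns out not to be synchronous to 'CLK', a common remedy is a two-flop synchronizer in front of the counter logic. The following is a minimal sketch under that assumption; the entity and port names are hypothetical, and it only suits single-bit, slowly changing signals (not multi-bit buses).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: brings one asynchronous input into the CLK domain.
entity input_sync is
  port (
    CLK      : in  std_logic;
    async_in : in  std_logic;  -- e.g. an external version of making_data
    sync_out : out std_logic
  );
end entity input_sync;

architecture rtl of input_sync is
  signal meta   : std_logic := '0';  -- first stage, may go metastable
  signal stable : std_logic := '0';  -- second stage, safe to use in logic
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      meta   <= async_in;
      stable <= meta;
    end if;
  end process;

  sync_out <= stable;
end architecture rtl;
```

The output lags the input by up to two clock periods, so any timing budget that depends on 'making_data' has to account for that.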

This is a symptom of failing timing (see items #2, 3 and 4) or double clocking (see item #5 above).

This is a process block that is clocked by 'CLK' I presume?

How about the inputs into that process block? The same sort of timing considerations mentioned previously apply here as well. Violating timing may cause 'making_data' to miss or double hit occasionally.

The RTL viewer is a graphical view of your SOURCE code; it is not a view of the final routed design. Have no fear, the adder that adds -1 and the mux that selects the final output will get optimized appropriately.

Good luck

Kevin Jennings

KJ

One thought: Are you using anything like a DCM or (since it's a Virtex) DLL to clean up the clock?

The PCI clock can be stopped, and switched between 33 and 66 MHz, during a PC's boot sequence (I have watched this on a scope). This can confuse a DLL; you may need a means to reset it after any such change, or use an alternative (constant frequency) clock.

- Brian

Brian Drummond

My experience agrees with the common wisdom that it's almost always clocks or power. But sometimes in mysterious, non-obvious ways :-)

Basic things to check on the timing side:

- how do you *know* that this is the process that's giving you trouble? I'm assuming that there's a lot more logic than that in a V2-1000 :-) Step back and ask yourself why you're so sure this is the culprit.

- do you know that the clock is OK inside the chip? Have you brought the clock out to a pin (or even a divide-by-two or by-four version, using a simple flop, *not* a DCM/DLL) and scoped it out?

- are any of the signals in the process generated or used by a different clock? If so, fix that first, and make sure any clock crossing logic is (1) designed right, and (2) laced with do_not_replicate and do_not_retime synthesis attributes (details differ by synthesis tool).

- Xilinx DLLs and DCMs tend to exhibit peculiar behaviour in that their LOCK output can assert even though the output clock is completely unstable, or possibly just running at a harmonic, like half-rate. If you are using one of these devices, you need to manually implement your own "LOCK" output, in a frequency measurement module using a pair of counters, one from the DCM and one from a known rock-solid (XTAL input) reference, and verify that they count up at the right rate relative to each other. Otherwise you need to keep hitting them with a reset. Then once everything else is stable, reset the rest of the chip. This problem can happen when input clocks are changed, as Brian mentioned, and can be made worse if you have a cascade of DLLs, depending on each other, producing a series of clocks.

- do you have enough synthesis constraints on the clock and any pin inputs that may drive signals going into the state machine?

- have you tried running a post-place-and-route timing annotated simulation to see if any timing errors show up there?
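The home-made "LOCK" check from the list above might look something like the sketch below. It is one possible arrangement, not a Xilinx-recommended circuit: a counter on the DCM output divides that clock down, and a counter on the reference clock measures the period of the divided signal. All names and the default limits are hypothetical; the defaults assume both clocks are nominally the same frequency, so one divided period should last about 2**DIV_BITS = 256 reference cycles.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical frequency check; FREQ_OK lives in the REF_CLK domain.
entity dcm_freq_check is
  generic (
    DIV_BITS  : positive := 8;    -- DCM clock is divided by 2**DIV_BITS
    MIN_COUNT : natural  := 250;  -- shortest acceptable period, in REF_CLK cycles
    MAX_COUNT : natural  := 262   -- longest acceptable period, in REF_CLK cycles
  );
  port (
    REF_CLK : in  std_logic;      -- known-good reference (e.g. crystal)
    DCM_CLK : in  std_logic;      -- clock under test
    FREQ_OK : out std_logic := '0'
  );
end entity dcm_freq_check;

architecture rtl of dcm_freq_check is
  signal div_cnt : unsigned(DIV_BITS-1 downto 0) := (others => '0');
  signal sync    : std_logic_vector(2 downto 0)  := (others => '0');
  signal period  : natural range 0 to MAX_COUNT + 1 := 0;
begin
  -- Counter 1: free-running divider in the DCM clock domain.
  process (DCM_CLK)
  begin
    if rising_edge(DCM_CLK) then
      div_cnt <= div_cnt + 1;
    end if;
  end process;

  -- Counter 2: measures the divided clock's period in reference cycles.
  process (REF_CLK)
  begin
    if rising_edge(REF_CLK) then
      -- Synchronize the divider's top bit; sync(0) may go metastable.
      sync <= sync(1 downto 0) & div_cnt(DIV_BITS-1);

      if sync(2) = '0' and sync(1) = '1' then
        -- Rising edge of the divided clock: judge the period just measured.
        if period >= MIN_COUNT and period <= MAX_COUNT then
          FREQ_OK <= '1';
        else
          FREQ_OK <= '0';
        end if;
        period <= 1;
      elsif period > MAX_COUNT then
        -- No edge for too long (clock slow or stopped): flag it and hold.
        FREQ_OK <= '0';
      else
        period <= period + 1;
      end if;
    end if;
  end process;
end architecture rtl;
```

Because the verdict is produced in the reference domain, a completely dead DCM output still drives FREQ_OK low. The first measurement after power-up starts from an arbitrary phase and should not be trusted.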

Good luck,

- Kenn

kennheinrich

Marty, for the past 15 years, all Xilinx FPGAs have had a built-in ripple-carry structure which assures that a counter takes only one flip-flop per bit, up to 32 bits and beyond. If I were you, I would test this counter with a clean clock and adjust the clock frequency up until the counter fails. That should eliminate (or illuminate) the alleged frequency dependence. If your clock comes from PCI, then you should be very leery about running it through a DCM, which inherently does not tolerate abrupt frequency changes or excessive jitter. But the raw counter itself is very forgiving.

Peter Alfke, Xilinx

Peter Alfke

Mike is spot-on here. In my experience, the comparison(s) is (are) the problem, not the counter operation.

You need to be a little clever and predict when the counter will be zero (or whatever):

elsif making_data = '1' then my_counter
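Since the snippet above is cut off, here is a separate, self-contained sketch of what "predicting" zero could look like. It is an illustration only, not the code that was posted: the wide compare (my_counter = 1) feeds a flip-flop, so downstream logic sees a registered flag instead of a 32-bit comparator output. The entity name, the LOAD_VALUE generic and the at_zero port are hypothetical, and stopping at zero is an assumed behaviour.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: down-counter with a registered "is zero" flag.
entity countdown_predict is
  generic (
    COUNT_WIDTH : positive := 20;
    LOAD_VALUE  : natural  := 1000  -- assumed constant reload value
  );
  port (
    CLK          : in  std_logic;
    RESET        : in  std_logic;
    counter_load : in  std_logic;
    making_data  : in  std_logic;
    at_zero      : out std_logic
  );
end entity countdown_predict;

architecture rtl of countdown_predict is
  signal my_counter    : unsigned(COUNT_WIDTH-1 downto 0) := (others => '0');
  signal count_is_zero : std_logic := '1';
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if RESET = '1' then
        my_counter    <= (others => '0');
        count_is_zero <= '1';
      elsif counter_load = '1' then
        my_counter <= to_unsigned(LOAD_VALUE, COUNT_WIDTH);
        if LOAD_VALUE = 0 then
          count_is_zero <= '1';
        else
          count_is_zero <= '0';
        end if;
      elsif making_data = '1' and count_is_zero = '0' then
        my_counter <= my_counter - 1;
        -- Predict: the counter will be zero after this clock edge.
        if my_counter = 1 then
          count_is_zero <= '1';
        end if;
      end if;
    end if;
  end process;

  at_zero <= count_is_zero;
end architecture rtl;
```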

ghelbig

More generally, and from a hardware point of view: when the counter approaches zero, the most significant bits go to zero first, and it is only the least significant bits that determine the final count of zero. So most of the inputs to your big 32-bit comparator are stable long before the zero count is reached. That means you can divide up, and even pipeline, your 32-bit comparator. You might also be clever and detect underflow at the most significant bit, and avoid the detector AND gate altogether (but after one additional clock pulse).

Peter Alfke, Xilinx
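The underflow trick can be sketched by making the counter one bit wider than the count it holds, so that the extra top bit flips when the count passes through zero. This is a minimal illustration under assumed names (the entity, load_value and expired are hypothetical), and freezing the counter after underflow is an added assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: terminal count detected from the top (borrow) bit.
entity underflow_counter is
  generic (
    COUNT_WIDTH : positive := 32
  );
  port (
    CLK          : in  std_logic;
    RESET        : in  std_logic;
    counter_load : in  std_logic;
    making_data  : in  std_logic;
    load_value   : in  unsigned(COUNT_WIDTH-1 downto 0);
    expired      : out std_logic
  );
end entity underflow_counter;

architecture rtl of underflow_counter is
  -- One extra bit on top of the count acts as the underflow flag.
  signal cnt : unsigned(COUNT_WIDTH downto 0) := (others => '0');
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if RESET = '1' then
        cnt <= (others => '0');
      elsif counter_load = '1' then
        cnt <= resize(load_value, COUNT_WIDTH + 1);  -- top bit cleared
      elsif making_data = '1' and cnt(COUNT_WIDTH) = '0' then
        -- Decrementing from zero wraps to all ones, which sets the top bit.
        cnt <= cnt - 1;
      end if;
    end if;
  end process;

  -- Goes high one enabled clock after the count reaches zero; no wide compare.
  expired <= cnt(COUNT_WIDTH);
end architecture rtl;
```

Loading N therefore asserts expired after N+1 enabled clocks, which matches the "one additional clock pulse" caveat; loading N-1 instead compensates if the exact cycle matters.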
Peter Alfke

Really!? What's the point of the LOCKED output then? Does that flaw not make them a bit useless?

Cheers, Martin

--
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html
Martin Thompson

There are input clock specifications for which the DCM operation is guaranteed, including the operation of the locked signal. Violate those specs, and the DCM is no longer guaranteed to function "properly". This is no different than any other clocked electronic device, and is far from rendering the device/function "a bit useless".

Andy

Andy

Ahh, that makes more sense. I misunderstood the original statement to be a bit wider than that!

Thanks, Martin

--
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html
Martin Thompson

That's kind of what I thought, too :-(

- Kenn

kennheinrich

I thought so, too :-( And in case anyone thinks I'm making this up, see for example Xilinx answer record #9451 (Virtex-2), containing the magic words:

"...the DLL will produce an unreliable lock signal and unreliable output clock. To recover from this condition, the DLL must be manually reset."

Or Answer record #30306 (Spartan):

"The LOCK output is to indicate when the DCM outputs are valid. In some cases it may not go LOW to indicate the DCM has lost lock."

..and I'm sure there are many others.

This is not an attempt to slag Xilinx, I'm just pointing out something to watch for. It's an instance of one of my pet peeves, that in 99.999 percent of cases, an analog "thing" (like locked-ness, or signal level, or sync pulse detection, or what have you) gets done wrong (often through design oversimplification) when translated into a digital "true/false" output. PLL lock signals are classic examples, as are the outputs of video sync separators.

This is digressing from pure VHDL, so I'll stop while I'm ahead.

- Kenn

kennheinrich
