Post-map simulation: timing violation and delays

sdaau 15 years ago

Hi all,

I am trying to implement a custom counter (with clock and enable inputs); synthesis and behavioral & post-translate simulation pass just fine (using ISE WebPack 13.2). On post-map simulation, I get this:

at 271179 ps(5), Instance /my_counter_test/UUT/c_0/ : Warning: /X_FF SETUP High VIOLATION ON CE WITH RESPECT TO CLK;

Expected := 0.428 ns; Observed := 0.144 ns; At : 271.179 ns

.. as well as X values in my output. I am already aware that I can avoid the X's by doing `INST "c_0" ASYNC_REG = TRUE;` in the constraints .ucf file; but that simply gets rid of the X's (in which case I do get correct values) - however, I'd like to tackle the timing violation.

I was looking into this for a while, and I interpret it like so: the variable c that I have in my counter code has been converted by the synthesis process into (at least) one flipflop for each 'bit', c_0 being the FF corresponding to bit 0 (of variable c). After some searching, I found that this FF has its own clk and CE inputs - the relationship between these signals and the 'master' clock and enable is shown in the below screenshot from isim:

formatting link

So, I can see that:

wclk and wenbl are the 'master' signals, and they are synchronous (they both rise at exactly the same time)

The delay between wclk(wenbl) and c_0.ce is some 1.035 ns
The delay between wclk(wenbl) and c_0.clk is some 1.179 ns
(Thus, the delay between is c_0.ce and c_0.clk is 0.144 ns)

.. and so, I gather, the violation tells us that c_0.ce must be high for at least 0.428 ns before c_0.clk goes high (i.e. the setup time); however that state lasted for only 0.144 ns in the simulation, so the simulator complains.

Now, the most obvious thing would be to insert a delay of at least 0.428 - 0.144 = 0.284 ns between c_0.clk and c_0.ce (or between c_0.clk and wclk), and I guess then the timing violation would be gone, is that correct?
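The arithmetic behind that question can be written out as below. This only restates the numbers from the simulator warning and the measured delays; the symbol names are introduced here for illustration, and whether adding delay is an appropriate fix is a separate question.

```latex
\begin{aligned}
t_{\mathrm{CE \to CLK}} &= 1.179\,\mathrm{ns} - 1.035\,\mathrm{ns} = 0.144\,\mathrm{ns}
  && \text{(observed)} \\
t_{\mathrm{setup}} &= 0.428\,\mathrm{ns}
  && \text{(expected by the X\_FF model)} \\
t_{\mathrm{shortfall}} &= t_{\mathrm{setup}} - t_{\mathrm{CE \to CLK}}
  = 0.428\,\mathrm{ns} - 0.144\,\mathrm{ns} = 0.284\,\mathrm{ns}
\end{aligned}
```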

However, the problem is that I would not want to move the first clk after enable in the next period using the state machine - and I have no idea how to otherwise implement such a delay of ~ 0.3 ns.

I was thinking that timing constraints in the .ucf file would help, and I was experimenting with some `OFFSET = IN 8 ns VALID 6 ns BEFORE "clk" RISING;` - and while this helps with Timing Analysis report errors, there is no change in the simulator. Then again, here I want to *increase* the (minimum) delay - and as far as I can tell, timing constraints in the .ucf file serve to limit ("decrease") the (maximum) delay. If that is so, then .ucf file constraints cannot help much with introducing delay, I guess...

So I was wondering - what would be the appropriate method to handle these timing violations? And have I understood the above situation correctly?

Thanks in advance for any answers, Cheers!

sdaau 15 years ago

Hi again,

Well, I remembered the old "two inverters in cascade = delay"; so I decided to try that:

...
-- simulate delay for enable with two inverters
ATTRIBUTE keep : STRING;
SIGNAL wi1_enbl, wi2_enbl: STD_LOGIC := 'Z';
ATTRIBUTE keep of wi1_enbl, wi2_enbl: SIGNAL IS "true";
begin
wi1_enbl

glen herrmannsfeldt 15 years ago

(snip)

I presume you forced it to keep the inverters, otherwise they will usually be optimized away. You might try with only one forced, in which case it will optimize the other by inverting the signal somewhere else. Or with a forced non-inverting gate. (snip)

-- glen

KJ 15 years ago

No, the most obvious thing would be to check your testbench and validate that your inputs meet the timing requirements because that's where the problem likely lies.

No it is not correct...unless you're only interested in covering up the problem and pushing it down the road to be fixed later.

In FPGAs, you can't implement controlled time delays. Delay lines are not a primitive element in the device.

Timing constraints should have already been specified, but if you haven't done so yet, then yes you should specify them.

I'm guessing based on what you described from the error message to signals in your design that you may understand the failing path, but what you're not understanding is what really needs to be fixed. The problem could very likely be in your testbench rather than the design but below I've listed the basic steps you need to follow:

1. Did you enter setup time constraints for all inputs? Did you set up clock to output delay time constraints for all outputs? (Note: For your particular problem, the cause is likely on the 'input' side)
2. What is the basis for the time constraints in #1? The correct answer to this question is the datasheet(s) of any device(s) that are connected to the FPGA.
3. Are you sure you used the datasheet(s) timing constraints properly? Setup time (Tsu) for the FPGA will be the clock period (T) less the clock to output (Tco) of the external device and less any clock skew (Tskew). In other words, the UCF file needs to specify a setup time constraint of Tsu = T - Tskew - Tco. Repeat for each input. Do a similar procedure for FPGA outputs.
4. Did the FPGA's timing report state that it meets all timing constraints? The correct answer here is 'yes'. If not, iterate #1-4 until you have the correct answers to each question.
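As a worked instance of the setup-time relation in the steps above: the numbers below are assumptions chosen only for illustration (a 20 ns clock period, and a hypothetical external device with 7 ns clock-to-output and 1 ns of clock skew). Real values have to come from the datasheet of whatever drives the FPGA input.

```latex
\begin{aligned}
T_{su} &= T - T_{skew} - T_{co} \\
       &= 20\,\mathrm{ns} - 1\,\mathrm{ns} - 7\,\mathrm{ns} \\
       &= 12\,\mathrm{ns}
\end{aligned}
```

Under these assumed numbers, the data would be valid at the FPGA pin 12 ns before the capturing clock edge, and that is the figure an input setup constraint would be written against.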

On the assumption that you've properly made it through #1-4 (and assuming that there are no clock domain crossings), then your design is OK. Since the design is OK, this implies the cause of a timing failure must be the testbench. The basic triage here is:

Verify that the inputs to the FPGA meet the requirements listed in the FPGA's timing report output. As an example, if you have some input that is generated synchronously, like this... Some_Inp

jt_eaton 14 years ago

Nothing in post route rises at exactly the same time. Are these input signals driven from your testbench? If so you need to spec a hold time from wclk->wenable and change your testbench to add this.

Clock enables are derived from the clock so they will have a clk->Q delay that gives them hold time. The easiest way to model this is to resync the wenable to the falling edge of wclk.
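A minimal testbench sketch of that idea, assuming a 20 ns clock period: the enable is re-registered on the falling edge of wclk, so it always changes half a period away from the rising edge that samples it. The signal names wclk and wenbl come from the original post; `enbl_req`, `done`, and the entity name are hypothetical and exist only for this illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity tb_enable_negedge is
end entity;

architecture sim of tb_enable_negedge is
  constant PERIOD : time := 20 ns;        -- assumed clock period
  signal wclk     : std_logic := '0';
  signal wenbl    : std_logic := '0';     -- enable as seen by the design
  signal enbl_req : std_logic := '0';     -- hypothetical "raw" enable request
  signal done     : boolean   := false;   -- stops the clock at the end
begin
  -- Free-running clock: rising edges at 10 ns, 30 ns, 50 ns, ...
  clk_gen : process
  begin
    while not done loop
      wclk <= '0'; wait for PERIOD / 2;
      wclk <= '1'; wait for PERIOD / 2;
    end loop;
    wait;
  end process;

  -- Resynchronise the enable to the falling edge of wclk, so wenbl
  -- never changes at the same instant as a rising edge.
  resync : process (wclk)
  begin
    if falling_edge(wclk) then
      wenbl <= enbl_req;
    end if;
  end process;

  -- The raw request may change at any arbitrary time.
  stim : process
  begin
    wait for 3 * PERIOD + 3 ns;
    enbl_req <= '1';
    wait for 5 * PERIOD;
    enbl_req <= '0';
    wait for 2 * PERIOD;
    done <= true;
    wait;
  end process;

  -- The unit under test would be instantiated here with wclk and wenbl.
end architecture;
```

With this arrangement wenbl has roughly half a period of setup and half a period of hold relative to each rising edge of wclk, regardless of when `enbl_req` itself changes.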

The scary thing is that I think your simulation is catching the enable on the same wclk that creates the wenable. If that's so then everything is happening one cycle before it should. In real life if a clk creates an enable then the enabled act occurs on the next clock.

John

sdaau 14 years ago

Hi all,

First of all, thank you all for the very prompt responses, and sorry I couldn't respond earlier. I think the crux of the matter is summed up in @jt_eaton's comment:

.. but I believe I should try to explain a bit, what it is I'm looking after. A bit of a mammoth post follows - apologies in advance.

For one, I have only partial knowledge of HDL, but so far I manage somehow. My biggest problem is, basically, that when I start coding, usually I end up confused in the "things happening in the next clock cycle" thing.

From my sequential programming background, say when I see "a=2;" in C; I read that as: "_after the program counter passes this statement, a holds value 2_" ... I try to relate that to HDL as in "_after the simulator passes this posedge, a holds value 2_" - so when I code stuff with this expectation, and I see 'action on next cycle' in simulator, I get confused thoroughly. Then I do all my best to defeat that in behavioral simulation - and usually I manage; then I come to post-map sim, and I realize most of that does NOT really work.
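The difference from the C reading can be seen in a small simulation-only sketch. A VHDL variable behaves like the C assignment (updated as soon as the statement executes), while a signal assignment only schedules an update that takes effect once the process suspends. The entity, signal, and variable names here are hypothetical and not taken from the posted files.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity tb_signal_update is
end entity;

architecture sim of tb_signal_update is
  signal a : integer := 0;
begin
  demo : process
    variable v : integer := 0;
  begin
    v := 2;   -- variable: holds 2 as soon as this statement executes
    a <= 2;   -- signal: the update is only scheduled at this point

    assert v = 2 report "variable was not updated" severity failure;
    assert a = 0 report "signal was updated early" severity failure;

    wait for 0 ns;  -- suspend for one delta cycle so the update is applied

    assert a = 2 report "signal was not updated" severity failure;
    wait;           -- stop the process
  end process;
end architecture;
```

The same rule applies inside a clocked process: every signal read at a rising edge still shows the value from before that edge, which is why a chain of registers appears to act 'on the next clock cycle'.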

So, I decided to study this a bit on a simpler example; for instance, for a chip interface, I'll need a clocked counter with enable and reset. The concept would be simple: when enable high, do increase count on clock posedge; on reset high, do not increase count and set count to 0. For instance, that is exactly the kind of device which is given here:

formatting link

I modified that code a bit (counter_aw.vhd), and used my own testbench (test_twb.vhd), which I put here (along with some screenshots I'll refer to):

formatting link

Clock is 50 MHz (period 20 ns). The "Counter_Design_Block" is architecture 'behav' in the 'counter_aw.vhd' file (uncommented). This one works under behavioral simulation as I expect it to (aw_orig_beh_sim.png); that is, reset of counter to 0 and its increase happen at the posedges I expect. Same results are for post-translate simulation (aw_orig_post-trans_sim.png) - however, post place and route sim (post-par_sim_delayed.png) is 'delayed' - e.g. from posedge of enable, to when cout becomes high, is like 30 ns (~3 clock semiperiods); however that is not the same delay throughout the sim run!

Since I encountered this before, I tried to code "my own" counter (architecture my_starting_point, commented), and I immediately made some mistakes - first, the final assignment to the output port was within an 'IF', so even behavioral simulation showed everything delayed to next clock cycle (aw_startp_beh_sim_delayed.png); after fixing that, this counter behaves more-less the same as the previous example (aw_startp_beh_sim_ok.png) - but the problem with it, is that it is not synthesizable (as far as I can see, the problem is using rising_edge twice on different signals in the same process).

So, after solving that, I basically ended up with the problem described in the original post - unfortunately, I cannot reconstruct the conditions with the X's (that appear approx 4 ns after rise of wclk) that I got in the original post (then again, that day my PC did crash a couple of times, so maybe that had something to do with problems with memory for ise or isim?). Then I got to the inverter thing, removed some of the timing violations with it; and found that to avoid the final timing violations, 'reset' internally would have to be effectuated 'first', 'enable' second and the 'clk' last - so I delayed the clk twice (four inverters), and enable once - and I got to architecture my_ending_point (commented).

With my_ending_point code, the behavioral simulation (aw_endp_beh_sim_delayed_no-ucf.png) seems fine, except that the very first count after enable happens in "next" clock cycle -- however, post-par sim (aw_endp_post-par_sim_delayed_ucf.png) shows that, in addition, there are glitches - and there is almost 10 ns delay (the 'effectuation' of the count happens almost on clk negedge)!! For the post-map sim (post-map_sim_delayed_ucf.png) this delay seems to be less (though still 5 < x < 10 ns) , but glitches are still there.

While I'm at the glitches, "Xilinx Synthesis and Simulation Design Guide" notes:

When it says "Xilinx *models*", does it mean that the glitches will be there present "by design" of the HDL code circuit - or is it something the simulator introduces? Meaning, should I try to eliminate them through design, or should I just be careful if they "propagate"? Then again - I wasn't really aware of this until now - I was reading a bit more on this, and it turns out, from basics, that minimal configurations of asynchronous (as in combinatorial/unclocked) circuits (Mealy/Moore ?!) are *by default* glitchy, and one is advised to "buffer" the result with a (clocked) FF - which results in the actual 'effectuation' occurring on the next clock cycle; so maybe the glitches in the sim just try to illustrate this effect?

Anyways - I'm sure in my initial code I used to get somewhat less than 5 ns delay for post-map (which is why I'm surprised slightly at the above results), but I can't reconstruct that anymore. Which, of course, means I haven't done something right :) I guess my question would be down to - what am I missing, so that I can get somewhat like the aw_orig_beh_sim.png results in post-par sim, but delayed by no more than quarter period? That, for me, would be a confirmation that the engine should more or less work reliably on the chip as well - but is that a correct assumption? (if not, then I probably shouldn't bother getting so "ideal" post-map/par results, ideal as in "results almost like behavioral sim").

I've tried putting in some timing constraints (aw_endp_counter.ucf), while trying to get rid of static timing and ise warnings as well (synthesizer doesn't like outputs of combinatorial logic [due to use of inverters] to be used as clock) - but I'm not really sure what I'm doing; since as far as I can remember, changing the constraint values didn't really result with much difference in post-map/par simulation.

Well, I guess this is as detailed as I can formulate my problem for now ...

Interesting trick about keeping only one forced - I just used "attribute KEEP" on all of the involved signals, that seems to have worked..

Got it - thanks to this comment, I started looking into timing constraints as ISE understands them (in .ucf file), but I still cannot get a proper understanding of those..

Got that too - but could one consider two inverters to behave as a somewhat controlled delay (as in, the actual delay obtained by them is dependent on how they end up being routed - but we can still know they'll insert, say, approx 0.4 ns?)

Exactly - this is 100% correct :)

That could indeed be the problem - @jt_eaton seems to agree ...

Thanks for taking the time to write those up, @KJ, much appreciated!

I didn't at first; then I tried, but as I said, I'm not sure I understand it. For instance, I have:

OFFSET = IN 6 ns VALID 8 ns BEFORE "clk" RISING;

ISE draws a sort of a diagram, and the way I interpret the diagram, the above sentence should mean "do not allow that a data signal synchronous with rising edge of CLK, propagates outside of 2 < x < 4 ns range"; which is likely not correct, since I couldn't perceive anything to that effect in simulation results.

Well, I have the wrong answer, unfortunately :/ Essentially, I saw the above timing violations, and simply tried to 'translate' them to timing constraints (as I understood them above) - that probably was not the right way to do it. Other than that, I'm running clock @50 MHz, so I tried to make the testbench for that - and to make the timing constraints relate to 100 MHz clock (as in - "if it works @100, it will work for 50 MHz too"); the device I'm intending to use this counter with, however, may require a much slower counter (kHz).

Thanks for this - I'll need to chew on this a bit more, I wasn't aware of the "setup time constraint".

Thanks for this too - I found the Implement Design/Map/"Analyze Post-Map Static Timing"; at first it was complaining (showed red X's), then I got it to stop (but for the most part, I was just trying different numbers around based on the messages I got, not sure what I actually did there :) )

Actually, now that I come back to it, I can see a fail:

--------------------------------------------------------------------------------

I guess from this, if I put OFFSET = OUT 6.2 ns, it will pass? Or is there another way to force the synthesizer to conform to 5 ns?

Talking about clock domain crossings - would inverting a clock four times, and "declaring" that signal as clock as well, constitute clock domain crossing?

Many thanks for writing this up as well :)

Thanks for this (emphasis mine) - as it can be seen in test_twb.vhd (from link above), what I do is simply:

... wenbl

2. Although not relevant to your current problem, one would also want

Would this be related to glitches too? I.e. if glitches occur close to posedge sampling clock transition, I may want to 'buffer' the output, until the next negedge for instance?

Could it be, that the synthesizer recognizes the "twice inverted" clock signal as a clock from a second domain?

Would that resynchronization be like the 'buffering' for the minimal Moore/Mealy glitching mentioned above? If so, then it would 'delay' the 'effectuation' of values until next clock cycle, right?

Thanks for that - I guess now, I'm better aware of that; but when the thread started I wasn't. Can this also be interpreted as: "Nothing in post route should rise at exactly the same time" (as far as signals from the testbench are concerned)?

Yup.

Many thanks for that - see, *that* I wasn't aware of ... Will have to look that up.

Ok, that makes sense - much appreciated :)

Makes a lot of sense now - will give it a shot. I know the answer is probably yes - but in that case, do I again have to worry about timing constraints?

I think that is correct - actually, it seems it does perceive some delay between the wenable and the wclk, but (I guess) not enough.

Thanks for that - the occurring on "next clock" was exactly what I wanted to avoid; and it seems, with all the "inverter delays" and such, what I managed to do is move everything to happen "one cycle before it should" :)

In any case, to sum up - while I'm starting to see why "update on next clock" is so important - is it still possible (or smart) to aim for updates occurring at least earlier than a semiperiod *before* the 'next' clock (and this is simply for my own perceptual ease in reading simulation results: then it would be easier for me to read, if I get the value I expect in *this* cycle)?

Thanks again for the awesome guidance, Cheers!

sdaau 14 years ago

Hi all,

Just a followup to the previous post, as it seems I got some kind of a closure:

I have now uploaded counter_aw2.vhd, test_twb2.vhd and aw2.ucf on the same location (no changes in previous files).

First of all, I got a hint at "Buffer_type BUFG ignored? #8"

formatting link

about synthesis of asynchronous vs synchronous reset counter. So I decided to take a look at the synchronous version from the start (counter_aw2.vhd, uncommented part).
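For orientation, a counter with clock enable and synchronous reset is commonly written along the following lines. This is a generic sketch and not the contents of counter_aw2.vhd; the entity name, the `WIDTH` generic, and the internal signal `c` are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity counter_sketch is
  generic (WIDTH : positive := 16);  -- assumed width
  port (
    clk    : in  std_logic;
    reset  : in  std_logic;
    enable : in  std_logic;
    cout   : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity;

architecture rtl of counter_sketch is
  signal c : unsigned(WIDTH - 1 downto 0) := (others => '0');
begin
  -- Only clk is in the sensitivity list: reset and enable are
  -- sampled on the rising edge, so both are synchronous.
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        c <= (others => '0');   -- synchronous reset has priority
      elsif enable = '1' then
        c <= c + 1;             -- count only while enabled
      end if;
    end if;
  end process;

  -- Unconditional concurrent assignment, so cout follows the register
  -- without an extra cycle of delay.
  cout <= std_logic_vector(c);
end architecture;
```

Because reset and enable are only looked at on the clock edge, both become ordinary synchronous inputs with setup and hold requirements relative to clk.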

Again behavioral sim was as I expected it to be, and post-map sim showed X's and timing violation. Now, the thing is that I wrote the testbench more-less randomly, just tossing arbitrary signal changes and WAIT delays, just to see 'in general' how the resulting circuit would behave. So, I had ended up enforcing 'clk' and 'enable' testbench signals to change at the *same* moment in time. That may be good enough for behavioral sim, but not for post-map - and to confirm the previous posts, this is the essence of the problem.

So I just inserted a 'WAIT for 1 ns' delay in the testbench (test_twb2.vhd), and looped the rest of the signals - and there were no more timing violations in post-map. (Note that without such an explicit loop, the whole process restarts after its last statement, shifting the phase by 1 ns on each iteration; eventually the phase between 'enable' and 'clk' becomes zero again, causing periodic timing violations.) Then I tried delaying for PERIOD-1ns (just to have the enable rise just before clk) - and that worked fine as well.
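One way to keep the enable at a fixed offset from the clock without any phase drift is to re-align the stimulus to a clock edge on every pass, as in the sketch below. This is not the code from test_twb2.vhd; the entity name, `T_OFFSET`, and the loop count are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity tb_enable_offset is
end entity;

architecture sim of tb_enable_offset is
  constant PERIOD   : time := 20 ns;  -- 50 MHz testbench clock
  constant T_OFFSET : time := 1 ns;   -- assumed offset after the clock edge
  signal clk    : std_logic := '0';
  signal enable : std_logic := '0';
begin
  -- Free-running clock; run the simulation for a fixed time.
  clk <= not clk after PERIOD / 2;

  stim : process
  begin
    -- Align to a rising edge first, then apply the offset, so enable
    -- always changes T_OFFSET after the edge.
    wait until rising_edge(clk);
    wait for T_OFFSET;
    enable <= '1';

    -- Keep enable high for eight clock edges.
    for i in 1 to 8 loop
      wait until rising_edge(clk);
    end loop;
    wait for T_OFFSET;
    enable <= '0';

    wait until rising_edge(clk);
    -- The process restarts from the top here and re-aligns to the
    -- next edge, so the offset does not accumulate across iterations.
  end process;

  -- The counter under test would be instantiated here with clk and enable.
end architecture;
```

Setting `T_OFFSET` to `PERIOD - 1 ns` gives the other case tried above, where enable changes just before the next rising edge.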

Now, my guess is that, if I was working with external clk and enable, I couldn't just delay the signals for as many nanoseconds as I please, just to avoid timing violations - so I'd have to work according to some spec. However, in my case, the only external signal is the clock, and enable and reset would be calculated from it - hence there will be some inherent delay between clk and enable; and that would further limit the usage of the counter. While talking of 'limitations': expanding the data out in bit lines and zooming in (in isim) will reveal that the glitches are due to different propagation times of individual bit changes (so they occur only between particular value transitions) - so, in fact, nothing strange there (as I first thought) :)

Measuring (in isim) the time between the clk posedge and change of data (cout) will reveal about 5.5 ns delay. Just for the heck of it, I tried to limit that with a timing constraint in the .ucf file:

... TIMEGRP "couts" OFFSET = OUT 4 ns AFTER COMP "clk";

.. and immediately after, post-map static analysis failed:

-- Timing constraint: TIMEGRP "couts" OFFSET = OUT 4 ns AFTER COMP "clk";

-- 16 paths analyzed, 16 endpoints analyzed, 16 failing endpoints

-- 16 timing errors detected.

-- Minimum allowable offset is 5.878ns.

So, I guess this tells me: if I sample the cout 6 ns after clk posedge, I should have safe cout data; so for the testbench clock @50 MHz, I could initiate count at clk posedge, and consider to have the right data @ next negedge, 10 ns after. However, even with a separate process:

process(clk) begin if falling_edge(clk) then cout [...] different delays -> glitch. So the min 6 ns wait would be needed in respect to 'when does the rest of the engine read this data' (rather than, 'when to read for a buffer' to avoid seeing glitches altogether).
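The margins implied by the timing report can be written out as follows, using only the figures quoted above (the 4 ns constraint, the reported 5.878 ns minimum offset, and the 10 ns half period of the 50 MHz testbench clock). The symbol names are introduced here for illustration.

```latex
\begin{aligned}
t_{\mathrm{violation}} &= 5.878\,\mathrm{ns} - 4\,\mathrm{ns} = 1.878\,\mathrm{ns}
  && \text{(by how much the 4 ns constraint is missed)} \\
t_{\mathrm{margin}} &= \tfrac{1}{2}\cdot 20\,\mathrm{ns} - 5.878\,\mathrm{ns} = 4.122\,\mathrm{ns}
  && \text{(slack when sampling at the next negedge)}
\end{aligned}
```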

In the end, even if I could somehow mask the glitches and avoid seeing them, metastability (formatting link) is still inherent in reality; so I guess this is as good as it gets in post-map sim (given that my testbench is 'arbitrarily written'; and the only constraint I have in the aw2.ucf is clock @100 MHz).

And, of course: getting rid of the X's and timing violations in post-map sim, doesn't mean that post-route sim will be just as well behaving :) But at least I have some sort of understanding from a simple example to go along with, when tackling that - thank you all for the help!

Cheers!
