Completely puzzled: Strange shift register behaviour

o pere o 14 years ago

Hi,

I am not an FPGA expert although this is not my first design. The problem that I am having for two days now, is that I am observing different results when simulating a design in GHDL and in Modelsim ALTERA starter edition.

The design includes a shift register. The significant code is:

    read_sro : process(s_sample)
    begin
      if rising_edge(s_sample) then
        if ... --irrelevant here
        end if;
        if ...
        end if;
        if sr_burst_ena = '1' then
          --First phase: store data in shift_reg
          new_data

KJ 14 years ago

By generating an internal clock signal you're not following synchronous design practice. In an FPGA design that spells trouble. What you need to do to make it synchronous is:

- Change the 'sr_signals' process that currently generates 's_sample'. What you'll want is for 's_sample' to be a one clock cycle pulse at the appropriate time. Modify your if statement to accomplish this.

- Modify the 'read_src' process to use the free running clock and 's_sample' like this

    read_src : process(clk)
    begin
      if rising_edge(clk) then
        if (s_sample = '1') then
          -- Put all of the code you have that was previously within
          -- the "if rising_edge(s_sample) then" statement here
        end if;
      end if;
    end process read_src;

The reason that generated clocks are bad news in an FPGA is that you have essentially zero control over signal skew. A signal generated in one clock domain can, depending on the routing, make it over into the other clock domain even before the generated clock gets there, or perhaps after. In your case, you really have no way to say what skew there is between 'clk' and 's_sample'. The signals 'sr_burst_ena' and 'dec_shift_ena' are used in both of your processes and yet each can only be valid in one clock domain (your posted code does not show those signal assignments so it is not clear in which domain they have been generated, but I suspect it is in 'clk').
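To make the two changes concrete, here is a minimal sketch of how they might fit together. It is not your actual design: the entity name, the generics 'c_period' and 'c_burst_no', the port 'serial_in' and the counter 'count' are placeholders I made up, and the shift itself stands in for whatever your 'read_src' process really does. The point is only that 's_sample' becomes a one-cycle enable and every flip-flop is clocked by 'clk'.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sample_enable_sr is
  generic (
    c_period   : positive := 8;   -- assumed: clk cycles between samples
    c_burst_no : positive := 16   -- assumed: shift register length
  );
  port (
    clk       : in  std_logic;
    serial_in : in  std_logic;
    shift_reg : out std_logic_vector(c_burst_no-1 downto 0)
  );
end entity sample_enable_sr;

architecture rtl of sample_enable_sr is
  signal count    : natural range 0 to c_period-1 := 0;
  signal s_sample : std_logic := '0';
  signal sr       : std_logic_vector(c_burst_no-1 downto 0) := (others => '0');
begin

  -- s_sample is high for exactly one clk cycle every c_period cycles.
  sr_signals : process(clk)
  begin
    if rising_edge(clk) then
      if count = c_period-1 then
        count    <= 0;
        s_sample <= '1';
      else
        count    <= count + 1;
        s_sample <= '0';
      end if;
    end if;
  end process sr_signals;

  -- Clocked by clk only; s_sample merely enables the update.
  read_src : process(clk)
  begin
    if rising_edge(clk) then
      if s_sample = '1' then
        sr <= sr(c_burst_no-2 downto 0) & serial_in;
      end if;
    end if;
  end process read_src;

  shift_reg <= sr;

end architecture rtl;
```

With this structure the timing analyzer only has one clock to reason about, and the enable is just another registered signal in that domain.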

Kevin Jennings

o pere o 14 years ago

Kevin

Thanks for your input!

I know that this does not result in a synchronous design, and I was already thinking of rewriting it with clock enables, as you suggested.

However, this was done because, in this case, there is no signal generated in the "clk" domain that is read by "s_sample". Actually "s_sample" is only used to read some signals from the outside world and the shift register more or less tries to correlate what is read in the present cycle with the previous ones.

Is it correct to say that, in this case, "it doesn't matter"?

Now, completely agreeing with your observation, I am still puzzled by the fact that even the RTL simulation (which I assume does not take any delays into account) is not working properly. If you have a look at the simulation results, this is absolutely not a shift register!

It seems that Altera's synthesis tool infers something quite different... but why??

Pere

glen herrmannsfeldt 14 years ago

(snip)

You don't show where dec_shift_ena comes from.

(snip)

There are designs where the HDL code can simulate differently from the usual synthesis. In that case, one could expect differences from different simulators.

Unless you are doing post-route simulation, there are no delays in use, but some signals are known to come after others.

The output of a FF will change after, never at the same time, as the input.

You can't make any assumption, though, on the outputs of two FFs with the same clock input. In simulation, one may be considered before or after the other. In actual logic, with actual wire delays, the result can be a race condition, where the result is uncertain.

I didn't see in the logic where that happened, but maybe in the part that wasn't shown. Especially dec_shift_ena.

-- glen

rickman 14 years ago

Your design is synchronous, but you have more than one clock, so you need to be sure of the timing of signals that cross the clock domains. For example, the signals sr_clear, sr_burst_ena and dec_shift_ena seem to be used in both the clk and the s_sample clock domains without being synchronized. I guess that might be ok if you can make sure the delays in the s_sample clock don't result in one of these signals changing on the edge of either clock.

If the simulation is not what you want, then there is likely something wrong with the code. But remember that while the simulation does not account for logic delays, there are delta delays that mean signals clocked by clk will have already changed by the time the remaining code sees the change in s_sample.
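The delta delay effect can be seen in a tiny testbench. This is only a sketch, not your code: the names 'data' and 'captured' and the toggle logic are invented for illustration. Both 's_sample' and 'data' are registered on clk, and 'captured' is registered on the rising edge of 's_sample'. Because both clk-domain signals update in the same delta cycle, the s_sample process reads the new value of 'data', not the value it held before the clk edge.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delta_race_tb is
end entity delta_race_tb;

architecture sim of delta_race_tb is
  signal clk      : std_logic := '0';
  signal s_sample : std_logic := '0';  -- derived "clock", registered on clk
  signal data     : std_logic := '0';  -- ordinary signal, registered on clk
  signal captured : std_logic := '0';  -- registered on s_sample
begin

  -- A few clk cycles, then stop so the simulation ends by itself.
  clk_gen : process
  begin
    for i in 1 to 4 loop
      clk <= '0'; wait for 5 ns;
      clk <= '1'; wait for 5 ns;
    end loop;
    wait;
  end process clk_gen;

  -- Both signals change one delta after the rising edge of clk.
  gen : process(clk)
  begin
    if rising_edge(clk) then
      s_sample <= not s_sample;
      data     <= not data;
    end if;
  end process gen;

  -- Runs in the delta where s_sample rises; data has already changed.
  capture : process(s_sample)
  begin
    if rising_edge(s_sample) then
      captured <= data;
    end if;
  end process capture;

  -- After the first s_sample edge, captured holds the NEW data ('1').
  check : process
  begin
    wait until rising_edge(s_sample);
    wait for 1 ns;
    assert captured = '1'
      report "captured the old value of data"
      severity error;
    report "captured = " & std_logic'image(captured);
    wait;
  end process check;

end architecture sim;
```

If you were picturing the s_sample flip-flop grabbing the value 'data' had before the clk edge, this is where the simulation and the mental model part ways.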

Rick

o pere o 14 years ago

Sorry for the omission!

dec_shift_ena is generated as follows. First, I have a counter that is reset periodically:

    clock_counter: process(clk) --counter counting over the symbol length
    begin
      if rising_edge(clk) then
        if tc_symbol = '1' then
          clk_count

The GHDL simulation shows up ok, with the register correctly loaded with

I get the same results with RTL and with post-route simulation (except for minor delays, as expected). I guess Altera's synthesis tool is doing something strange exactly at the midpoint of the shift register new_data: the top half shifts correctly, but the bottom half looks as if it were inverted. I can't figure out how a timing issue could have this outcome, but I am probably missing something obvious, and hopefully someone can point this out :) !!

Pere

o pere o 14 years ago

Hi, Rick

The signals you mention are updated in the same process(clk) that creates s_sample. They are only read in the process(s_sample) as shown. However, no timing issues with this show up in the post-route simulation (in hindsight, perhaps I was too lucky with this).

OTOH, the main point is that one simulation does what I want (and corresponds with the intended behaviour: the GHDL result) but the other one does not! GHDL simulates the semantics of VHDL, and Modelsim simulates what Quartus' synthesis has inferred, which does not seem to be the same!

Pere

glen herrmannsfeldt 14 years ago

(snip, I wrote)

I read verilog much better, but this sounds suspicious. Why does it need resynchronizing?

-- glen

o pere o 14 years ago

Being a combinational assignment, it is subject to glitches. The instants where they happen seemed to upset the post-route simulation, with complaints of hold time too short (iirc). BTW, the timing analyser did not complain at all. Making them a registered signal (and pre-compensating the one-cycle delay) cleared the glitches and moved the transitions so that the simulation goes ok.

Pere

rickman 14 years ago

I don't know why you are expecting to see your problems or solutions in any simulation. A post route simulation uses typical timings and can't be expected to be 100% realistic; even one chip can't be expected to be the same as another chip. That is why they use synchronous concepts: meet the cycle timing, and all other timing issues are handled by the chip qualification and the design tools.

If the signal is generated in the clk domain and s_sample is generated in the clk domain, you can't predict which delay will be longer and you have a race condition between all the signals and the s_sample clock. One thing that would help is to generate the signals on one edge of clk and generate s_sample on the other edge of clk. Then you know the timing on a coarse level and can constrain the timing of paths from the clk to the s_sample clock domain... or just use an enable instead of generating a new clock.

If you don't believe the timing is a problem, draw a timing diagram showing *all* delays. You will see the routing delay of s_sample can be giving you trouble.

Rick

rickman 14 years ago

There is a problem. dec_shift_ena was transitioning in the setup/hold time window of s_sample because the logic/routing delays of the two paths were too close to equal. The signal needs to be synced, but not to the clk domain, to the s_sample domain! Better to change the timing altogether by generating s_sample on the opposite edge of clk... or like I said above, use an enable rather than a new clock.

Rick

KJ 14 years ago

I pointed out exactly these points (use a clock enable and that he was using two signals in two different clock domains) in the very first reply to his post...maybe it will click this time.

KJ

o pere o 14 years ago

Thanks to everybody for your inputs. Making the whole design work on only one clock (with suitable enables) solves the issue. This is really not a surprise.

However, I still have some questions on this. If I built my design from, let's say, discrete components, I might have problems with setup and hold times at the shift register (SR) input, but I would NEVER EVER get the results that showed up in the simulation:

The SR was initially cleared. The serial input to the SR is zero at the first clock but, surprisingly, a one appears at the 9th (!!) SR output after the first edge of the clock. Can anybody find a plausible explanation for this?

The second question is, is there a way to tell Quartus that it should synthesize something that resembles the physical circuit? I did tell Quartus that s_sample was a clock (and told the frequency), however this was obviously not sufficient. Any thoughts?

Pere

rickman 14 years ago

I can't answer question one, but the answer is in the code and the simulation. If necessary you will need to step through the code that assigns a value to the SR, but the answer will be there. The real point is that it doesn't matter really. The simulation told you something was wrong and you found the problem. Is it important to understand how the error was generated?

The second question is easy. You don't tell the tool what the "physical circuit" is really. The tool tells you! Telling Quartus that your clock is a clock shouldn't be needed as it can figure that out because the signal is clocking FFs. There was no problem with the tool. The problem was exactly as KJ described in his first post: once you create a clock from logic, you can't control the skew between the two clocks and so lose control over setup and hold times. The tool also loses control over this as it depends on things the tool can't control or even know about, the exact detailed timing of the chip.

In summary, generating clocks is bad ju-ju. Don't cross the beams.

Rick

o pere o 14 years ago

Well, I am now just curious to know what happened to cause this behavior. Most of us learn much more from errors than from blindly following the rules. And I already looked at the code (actually I posted the lines that update the SR) and I also looked at the synthesized circuit. Nothing special there!

I can follow this reasoning. However, the result of this should be a setup or hold time violation signaled by the simulator, shouldn't it? But the simulator did NOT show any of them. Instead, a completely bogus behavior was observed. Nothing special was reported except the result being completely wrong.

Ok. I will highlight this dogma once more ;)

Thank you for your time!

Pere

KJ 14 years ago

Yes you could, if you built a circuit with exactly the timing delays that were the result of simulation and did so with the same type of components used in the FPGA implementation (i.e. LUTs and FFs). It's probably not very likely that anyone could actually physically build such an accurate model out of discrete parts, but the point is that the simulator is not lying to you. It simulated a model that was provided to it.

Timing violation. In this case, it was most likely that a setup time requirement was violated. The cause of this violation was due to the skew between the two clock domains and the signals that crossed from one to the other.

Since you have the detailed model, you would have to track that one down and since you are rightly interested in getting to the bottom of this you should do so. The method I would suggest you follow is to use the original RTL that works as you expect as the 'golden' model and compare the simulation results to that of the 'failing' model. One way to do this is to

- Create a testbench that instantiates both models

- Connect all inputs to both models

- Run the simulation up until the outputs of the two models diverge

- Look at the input to the logic that creates the failing output and determine why the output is wrong (which is generally because the input is wrong...which means you iterate on this step until you finally come across the root cause signal that started the cascade of bad)
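As a rough sketch of the steps above, the testbench could look something like the following. Everything here is a stand-in: 'sr_model' is a made-up 4-bit shift register, its 'golden' architecture plays the role of your working RTL, and its 'failing' architecture is deliberately broken to play the role of the post-synthesis netlist. In your case you would instantiate your real RTL and the netlist that Quartus writes out instead.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical stand-in for the design under test.
entity sr_model is
  port (clk, din : in  std_logic;
        q        : out std_logic_vector(3 downto 0));
end entity sr_model;

-- Plays the role of the RTL that behaves as intended.
architecture golden of sr_model is
  signal sr : std_logic_vector(3 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      sr <= sr(2 downto 0) & din;
    end if;
  end process;
  q <= sr;
end architecture golden;

-- Deliberately wrong; plays the role of the failing model.
architecture failing of sr_model is
  signal sr : std_logic_vector(3 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      sr <= sr(2 downto 0) & (not din);
    end if;
  end process;
  q <= sr;
end architecture failing;

library ieee;
use ieee.std_logic_1164.all;

entity compare_tb is
end entity compare_tb;

architecture sim of compare_tb is
  signal clk, din       : std_logic := '0';
  signal q_gold, q_fail : std_logic_vector(3 downto 0);
begin
  -- Same inputs drive both models.
  u_gold : entity work.sr_model(golden)
    port map (clk => clk, din => din, q => q_gold);
  u_fail : entity work.sr_model(failing)
    port map (clk => clk, din => din, q => q_fail);

  -- Clock both models and flag the first time the outputs differ.
  stim : process
  begin
    for i in 1 to 8 loop
      clk <= '0'; wait for 5 ns;
      clk <= '1'; wait for 5 ns;
      assert q_gold = q_fail
        report "models diverge at " & time'image(now)
        severity error;
    end loop;
    wait;
  end process stim;

end architecture sim;
```

The time of the first assertion failure tells you where to start looking in the waveform; from there you walk backwards through the logic feeding the bad output.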

Unless you describe your logic in the form of lookup tables and flip flops, Quartus will never be able to "synthesize something that resembles the physical circuit". Your source code describes a logic function description. Synthesis takes that description as input and produces a bitstream that can be loaded into a device that will then implement that function. If you create timing problems that's a design issue (i.e. you described a function that has a design flaw). In many cases, such design issues are beyond the scope of the tool.

Kevin Jennings

KJ 14 years ago

Simulators don't necessarily catch timing violations. That is the job for a static timing analysis tool. However, even STA doesn't catch problems when the constraints are not properly defined. In this case, did you tell Quartus to ignore clock domain crossings when performing timing analysis? If I recall correctly, Quartus defaults to this setting and you have to actually tell it to perform the cross clock domain timing analysis.

In short, there are a lot more requirements that need to be defined correctly in order for static timing analysis to be correctly performed (and no good way to validate that you have really got those requirements correctly specified). But when you do specify all the requirements correctly, the timing analysis tool will catch and report the problems. A simulator might get lucky and catch something, but it will miss far more errors because it is not equipped to simulate function in a manner equivalent to how you perform static timing analysis.

Kevin Jennings

glen herrmannsfeldt 14 years ago

(snip)

If you wrote it in verilog, I might be able to figure it out.

My only guess is that there is a glitch on the generated clock.

That is, it clocks twice when you think it clocks only once.

Simulated clocks can have infinitesimal delay and still clock the FF.

Real gates won't be able to do that, at least not infinitesimally. With enough delay, you can get a real glitch that can clock a real FF. Another reason for synchronous logic.

The old favorite from asynchronous logic days was generating a counter reset from the output of a ripple counter. In some cases, the reset pulse is too short to reset all the FFs, though that depends on the logic being especially fast and the FFs slow. Usually it won't glitch, but I am not sure it isn't possible. (That is, when the reset is also used to clock the next FF.)

(Use a 74L74 FF and 74S00 to generate the (not) reset. Except that I forget which FFs have an active low reset.)

-- glen

glen herrmannsfeldt 14 years ago

(snip)

What does it do in the case of internally generated clocks?

If you have multiple clock inputs to the logic, then I might see what it would do for static timing. If you generate a clock internally, then it isn't so obvious.

-- glen

Jon Elson 14 years ago

In FPGA hardware, easily! Xilinx guarantees their LUTs to be glitch-free by themselves, but as has been pointed out here before, when multiple LUTs are strung together with routing delays between them, all bets are off. Can't say about glitch behavior on Altera, but I suspect similar things. Then, the tools pack logic into the LUTs as they fit best, without any attention to making sure glitches can't propagate.

Now, why this behavior showed up in simulation is not clear. But, again, if you do things that are really wrong in the HDL, it is possible the simulator just fouls up.

To some extent, but it reveals you are trying to do something that is bending the tools and FPGA in a way they were not designed for, and will usually lead to trouble. Some people do special things in very special cases such as all-silicon ring oscillators where they are using special knowledge of the internal workings to do what the tools can't handle, but they then have to test the results carefully. It is far better to describe in the HDL the behavior you want and let the tools fit that onto the chip's internal architecture.

Jon
