FSM in illegal state

Jerker Hammarberg (DST) · 2004-07-07T17:51:32+00:00

Hi all! I'd like to once again bring up the subject of state machines running into illegal states (illegal in the sense that the state vector does not correspond to any of the states defined in the VHDL code), because despite having spent half a day googling and reading related threads, I'm still left with a couple of questions:

1. Most discussions cover how to recover from illegal states, but few cover how it actually happens. What are the (I presume electrical) reasons that a state machine runs into an illegal state in the first place? Is there anything one can do to reduce the risk? Assume all FSM inputs connected to I/O pins are synchronized with one FF each, and the whole design is synchronous. Does anyone know of a good tutorial on this issue? I could add that in my case, the transition into an illegal state, if it happens at all, almost always happens immediately upon startup of the system.

2. How can I force Xilinx XST (6.2 SP3) to produce a safe FSM that recovers from an illegal state? A "when others => state

rickman 22 years ago

The noise could be on the power rail or ground bounce. It does not need to be on an input signal.

That depends entirely on your timing specs. If you have none, then they are not likely to be wrong ;) XST will be trying to make every path meet single clock timing.

As others have suggested, if it fails on startup, it could easily be a race between the release of the asynchronous reset and the clock. I think you made two bad assumptions, judging from the way you describe your initial state. You indicated you used "initial values by declarations". I don't think synthesis tools use initial values as reset values. I have never asked whether XST does this or not, since I don't depend on it. The accepted style is to put the reset explicitly in your HDL code, like this...

ByteFlag: process (SysClk, Reset)
begin
  if (Reset = '1') then
    membyte

I am not a fan of dealing with this type of problem by illegal state

You can directly force the FSM FFs into any state you wish. I have not done it often, but I have used this feature. Read up on your simulator.

Rick "rickman" Collins rick.collins@XYarius.com Ignore the reply address. To email me use the above address with the XY removed. Arius - A Signal Processing Solutions Company Specializing in DSP and FPGA design URL http://www.arius.com 4 King Ave 301-682-7772 Voice Frederick, MD 21701-3110 301-682-7666 FAX

rickman 22 years ago

One other thing you can do is to add the logic manually. If you are using an enumerated data type, you can get the encoded value. Write your own logic on this state value and detect the illegal combinations.
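A rough sketch of that idea, written as a Python behavioral model rather than synthesizable VHDL. The two encodings below (a hypothetical 3-bit binary code for five states, and an N-bit one-hot code) are illustrative assumptions, not what XST would necessarily pick; in hardware each check would become a small block of logic on the state register's outputs.

```python
# Behavioral model (not HDL): flag illegal values of an encoded state vector.
# Both encodings are hypothetical examples.

# Hypothetical binary encoding: 5 legal states in 3 bits, codes 5..7 unused.
BINARY_LEGAL = {0b000, 0b001, 0b010, 0b011, 0b100}


def binary_illegal(state_bits: int) -> bool:
    """True if the 3-bit state vector holds a code assigned to no state."""
    return state_bits not in BINARY_LEGAL


def one_hot_illegal(state_bits: int, width: int) -> bool:
    """True unless exactly one of the `width` state bits is set."""
    if not 0 <= state_bits < (1 << width):
        raise ValueError("state vector wider than `width` bits")
    return bin(state_bits).count("1") != 1


if __name__ == "__main__":
    print(binary_illegal(0b101))       # unused code
    print(one_hot_illegal(0b0100, 4))  # exactly one state active
    print(one_hot_illegal(0b0000, 4))  # no state active ("dead")
    print(one_hot_illegal(0b0110, 4))  # two states active
```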

Jerker Hammarberg (DST) 22 years ago

Hi Philip, and thank you for your enlightening and extensive reply!

Right, I should say that this design, running at 20 MHz with a 1-in-10 failure rate, is just one of several designs that I have problems with. I don't only want to solve the problem for that particular design; I want to learn as much as possible about this issue, so that I can solve the problem for all my designs and avoid it in the future.

For that 20 MHz design, I could add that it doesn't seem to happen when the board's power supply is less loaded. Considering that it almost always happens on startup, when components all over the board need a lot of current, I suspect it might have something to do with voltage dips. But I don't see why that would only result in a state machine hanging; I would expect to see more diverse errors. In any case, I will examine the supply voltage during power-up.

I actually read the FAQ before posting, like a good newsgroup poster should! I will add a second synchronizer, but it hurts a little, because one of the inputs is an SPI clock and the FPGA is supposed to react as fast as possible to its edges.

I have to admit I haven't cared much about static timing, since I believed that if I specify a clock period constraint, all FF-to-FF paths would be constrained by it, and that's all I need in a completely synchronous design. Is that wrong? Anyway, I realize that I need to learn more about timing... or finally tell my boss I shouldn't be doing FPGA design without proper education.

I will check the unconstrained paths, but I expect to find only input-to-FF and FF-to-output paths. I don't care about output delay, since the outputs are only connected to LEDs and SPI units working at a considerably slower rate. In particular, these paths can't affect my state machines. And the inputs can arrive at any time anyway. Would you agree that, given this, I could rule out timing issues (apart from the following)?

This was totally new to me. I take it ALL state machines that may transition on the first clock pulse need this kind of protection.

Thanks again, Philip (and Phil)!

/Jerker

Jerker Hammarberg (DST) 22 years ago

I'm not so sure... again, XST finds all unreachable states and removes the logic for them, even states that are legal and have their own "when" clause. I've even seen in another thread on comp.lang.vhdl the suggestion to connect an input pin to ground, then add code to the state machine like

if ground_signal = '1' then state

Jerker Hammarberg (DST) 22 years ago

Sure! I have specified a clock period constraint, which I thought would be enough. See further my reply to Philip.

But this is a well-documented feature in XST; see the XST User Guide, Chapter 6, Initial Values. It says "When you give a register an initial value in a declaration, XST sets this value on the output of the register at global reset, or at power up.". So if the feature is there, it must be OK to use it, and to skip explicit resets?

/Jerker

Jerker Hammarberg (DST) 22 years ago

Or maybe you mean I should add separate logic outside of the state machine? That could work... I'll try that.

/Jerker

Jim Granville 22 years ago

logic for them, even states that are legal and have their own "when" clause. I've even seen in another thread on comp.lang.vhdl the suggestion to connect an input pin to ground, then add code to the state machine like

was hoping there were better solutions...

To avoid any reliance on 'what xxxx tools might choose to do', the best solution is to code ELSE state

Duane Clark 22 years ago

Yes, I am talking about one-hot encoding. Where I have seen the problem is on power when, for whatever reason, none of the states are selected. In that case, the state machine is dead.

Xilinx XST.

My real email is akamail.com@dclark (or something like that).

Duane Clark 22 years ago

Uggh, that should say power-on.

Hal Murray 22 years ago

Everything I've seen so far points to troubles with the initial state and the first transition - the time when reset during configuration goes away.

The basic problem you have to understand is that the global reset signal is asynchronous. The next problem is that much of the software doesn't beat you over the head with that fact. (Every FF in the chip has this problem. Mostly we just ignore it and get away with it.)

FPGAs work well with one-hot state machines. But if you use that encoding, then it takes a lot of logic to detect an invalid state. If you want to detect invalid states, it's probably less total logic to use a denser encoding. It's free if you use a ROM. At slow speeds with 16 or fewer states, a binary encoding may be fine.
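To put rough numbers on that: with N states, a binary encoding needs ceil(log2 N) flip-flops and leaves 2^b - N codes unused, while one-hot needs N flip-flops and leaves 2^N - N codes unused. The Python snippet below only does this arithmetic; it is not a model of actual LUT usage, which depends on the tool and the device.

```python
def encoding_cost(n_states: int) -> dict:
    """Flip-flop count and number of unused (illegal) codes for n_states."""
    if n_states < 2:
        raise ValueError("need at least two states")
    bin_ffs = (n_states - 1).bit_length()  # equals ceil(log2(n_states))
    return {
        "binary_ffs": bin_ffs,
        "binary_unused": 2 ** bin_ffs - n_states,
        "onehot_ffs": n_states,
        "onehot_unused": 2 ** n_states - n_states,
    }


for n in (5, 8, 12, 16):
    print(n, encoding_cost(n))
```

At exactly 16 states, the 4-bit binary code has no unused codes at all, so there is nothing illegal left to detect.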

So the other approach is to avoid getting into invalid states. The first key step is to make sure that all your normal logic inputs meet setup/hold times. Metastability comes in here.

My rule of thumb is to use 2 FFs (costs a whole cycle) so I don't have to think about it. Well, you actually do have to think about it: they must be near each other. If the placer puts them on opposite sides of the chip, the routing delay can cause problems. You should be able to avoid that with timing constraints.
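A minimal cycle-level sketch of the two-FF synchronizer, assuming an ideal clock and no metastability. It shows only the extra cycle of latency, not the failure mechanism the second FF guards against; the function name is illustrative.

```python
def two_ff_sync(samples):
    """Cycle model of a two-flip-flop synchronizer.

    `samples` is the asynchronous input as seen at each rising clock edge.
    Returns the second flip-flop's output after each edge. Metastability
    is not modelled; only the extra cycle of latency shows up.
    """
    ff1 = ff2 = 0              # both flip-flops start cleared
    out = []
    for d in samples:
        ff1, ff2 = d, ff1      # both FFs capture on the same clock edge
        out.append(ff2)
    return out


# A single FF would output [0, 1, 1, 1, 0, 0]; the second FF adds one cycle.
print(two_ff_sync([0, 1, 1, 1, 0, 0]))  # [0, 0, 1, 1, 1, 0]
```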

If you don't want to wait that extra clock cycle then you have to think carefully. At 20 MHz with a modern chip it should be easy. Check the app notes and such.

Now all that's left is that pesky initial reset.

If your state machine starts in state IDLE and stays there until some external event comes along then you are probably safe with no extra logic. You just have to make sure that no "event" happens until several clocks after initialization. Many times that happens for free.

If your events do happen early, you can add a simple state machine to mask them for a few cycles. You get that right by inspection - that is, you design it as logic rather than as an FSM.

Some state machines start off running, for example to generate a pattern of constantly running timing pulses. You can make them safe by modifying them to wait for an artificial event. If you feed in an event of "1", things reduce to the previous paragraph.
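The masking idea in the last two paragraphs could be sketched like this, as a Python model of a saturating counter, cleared by the global reset, that gates the FSM's event input. The mask length of 4 clocks and all names are hypothetical. Feeding a constant 1 as the event models the free-running case: the machine gets its first "go" only after the mask has expired.

```python
def masked_events(events, mask_cycles=4):
    """Model of a start-up mask for an FSM's event input.

    A counter cleared by the global reset counts clock edges and stops at
    `mask_cycles`; until it gets there the event is forced to 0. The value
    of mask_cycles is an arbitrary example. Returns the gated event the FSM
    sees on each clock edge.
    """
    count = 0
    gated = []
    for ev in events:
        enabled = count >= mask_cycles
        gated.append(1 if (ev and enabled) else 0)
        if not enabled:
            count += 1         # stops counting once the mask has expired
    return gated


# Event-driven FSM: an early request is swallowed, a later one gets through.
print(masked_events([0, 1, 0, 0, 0, 1], mask_cycles=2))
# Free-running FSM fed a constant "1" event: it starts only after the mask.
print(masked_events([1] * 6))
```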

For a simple one-hot encoding that starts off running, you can detect the illegal states that result when things go wrong without a lot of logic. That is, don't catch all illegal states, just the ones that can happen due to asynchronous reset troubles.
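As one cheap example in this spirit, here is a Python model of the classic self-correcting one-hot ring counter. It is a standard textbook technique swapped in for illustration, not necessarily what Hal had in mind: instead of feeding the top bit back to bit 0, bit 0 is set only when bits 0 through N-2 are all clear, which costs a single NOR term. The 4-state width is an assumption.

```python
WIDTH = 4  # hypothetical 4-state one-hot ring: bit i set means state i


def next_state(s: int) -> int:
    """Next state of a self-correcting one-hot ring counter.

    Every bit shifts up one position per clock. Bit 0 is not fed from the
    top bit; it is set only when bits 0..WIDTH-2 are all clear (one NOR term).
    """
    shifted = (s << 1) & ((1 << WIDTH) - 1)
    low_bits = s & ((1 << (WIDTH - 1)) - 1)   # bits 0..WIDTH-2
    return shifted | (1 if low_bits == 0 else 0)


def run(s: int, cycles: int) -> list:
    """State after reset followed by `cycles` clock edges."""
    trace = [s]
    for _ in range(cycles):
        s = next_state(s)
        trace.append(s)
    return trace


print([format(x, "04b") for x in run(0b0001, 4)])  # legal sequence
print([format(x, "04b") for x in run(0b0000, 4)])  # no bit set
print([format(x, "04b") for x in run(0b0011, 5)])  # two bits set
```

Both the all-zero state and a state with two adjacent bits set, which are the two failure modes an uneven reset release produces in a plain ring, work their way back to a single hot bit within a few clocks here.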

This has been an interesting discussion. Seems like good bait for an app-note.

The suespammers.org mail server is located in California. So are all my other mailboxes. Please do not send unsolicited bulk e-mail or unsolicited commercial e-mail to my suespammers.org address or any of my other addresses. These are my opinions, not necessarily my employer's. I hate spam.

rickman 22 years ago

Ok, if you are sure this works for XST, then that is ok. But it won't work with other synthesis tools which will make your code not portable. If you later want to use a better tool, you will have to go back and use the standard method on every signal you are initializing.

Either way, if it is being reset, then that is covered. But this does not necessarily mean it will come out of reset correctly. Since the clock is async with respect to the reset, and the reset has variable delays throughout the chip, you can release reset on different parts of the chip (including different FFs in the same state machine) on different clock cycles.
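To see how this breaks a one-hot machine, here is a small Python model of a plain 4-bit one-hot ring counter (top bit fed back to bit 0) whose flip-flops come out of reset on different clock edges. The per-bit release edges are made-up inputs; real delays depend on the device and placement.

```python
def ring_with_skewed_reset(release, width=4, cycles=6):
    """Plain one-hot ring counter (top bit feeds bit 0) whose flip-flops
    leave reset on different clock edges.

    release[i] is the first clock edge (counting from 0) on which flip-flop
    i may change; before that it is held at its reset value. Bit 0 is
    preset (the initial state), the other bits are cleared.
    """
    if len(release) != width:
        raise ValueError("need one release edge per flip-flop")
    mask = (1 << width) - 1
    reset_val = 0b1
    s = reset_val
    trace = [s]
    for edge in range(cycles):
        nxt = ((s << 1) | (s >> (width - 1))) & mask  # rotate one position
        for i in range(width):
            if edge < release[i]:                     # still held in reset
                held = (reset_val >> i) & 1
                nxt = (nxt & ~(1 << i)) | (held << i)
        s = nxt
        trace.append(s)
    return trace


for rel in ([0, 0, 0, 0], [0, 1, 1, 1], [1, 0, 0, 0]):
    print(rel, [format(x, "04b") for x in ring_with_skewed_reset(rel)])
```

With release=[0, 1, 1, 1], bit 0 moves on while bit 1 is still held clear, so the ring ends up with no bit set and is dead. With release=[1, 0, 0, 0], bit 1 advances while bit 0 is still held set, and two bits then circulate forever.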

One way to fix this is to make sure the global reset path delay is less than one clock cycle and to sync the reset to the global clock. I believe there is a config bit stream option to synchronize the end of reset with your global clock. Check the docs. This is something you will set when you generate the bit stream. I am not sure how to find out how long the reset delay paths are.

DrB 22 years ago

Phil

Your allusion to "common" DCM / DLL issues has aroused my curiosity.

Could you elucidate, please?

Martin

Phil Hays 22 years ago

Sure. I'm not sure "common" was the correct word, but I suspected that the original poster might have run into the following issue. I've seen it in someone else's design. These problems are curable by reading the fine manual (RTFM).

RTFM # 1 "Do not use the DCM output clock signals until after activation of the LOCKED signal. Prior to the activation of the LOCKED signal, the DCM output clocks are not valid and can exhibit glitches, spikes, or other spurious movement."

You can keep the part in configuration until the DCMs are locked by a bitgen option, or you can use the synchronized locked signal to force reset to all state machines and all other critical logic. Better do one of the two. The bitgen option is probably best, unless you may be resetting the DCMs after configuration, in which case you need to do the second.

I didn't think he had this issue, but I've seen it as well. In my own design.

RTFM # 2 "To ensure consistent locking, if a DCM is configured with external feedback, applying a reset after configuration is strongly recommended."

"Strongly recommended" isn't quite correct. "Required" would be better wording.

-- Phil Hays Phil-hays at posting domain should work for email

DrB 22 years ago

Phil

Thank you for the dits

If only one could RTFM (Read The Full Manual) - it never seems to get published these days

An F* problem in its own right; but forewarned is forearmed.

Martin

Jerker Hammarberg (DST) 22 years ago

I have now implemented the protection against race condition as a result of asynchronous reset, as proposed by Phil and others. The machine hasn't locked after that, so hopefully that was the solution.

I learnt a lot through this discussion. Thanks a lot to all who contributed!

/Jerker

Peter Alfke 22 years ago

A Xilinx App Note was suggested. Here is an attempt. Let me know if it makes sense.

Beware of Start-Up Synchronization Errors
Peter Alfke, July 12, 2004

All Xilinx FPGAs provide global (p)reset to all internal flip-flops and latches, which means that every flip-flop is either set or cleared (configuration option) when the FPGA goes active after configuration has been completed. This attractive feature requires no general routing resources. It is "for free", but as with many "free" offers, there can be strange side effects.

- The (p)reset signal is distributed across the whole chip, and it is not very fast. Tens of nanoseconds of delay are common on large chips.
- The (p)reset signal is synchronous with either CCLK or, better, with the user clock, but there may be more than one user clock.
- The user clock is normally running while the FPGA is being configured. This is only tolerable because flip-flops are being held (p)reset, nodes are forced High, and outputs are being held inactive, as described in the configuration documentation.

These three features combined can cause unreliable start-up after configuration, when the trailing edge of the asynchronous (p)reset signal has so much skew or uncertainty with respect to any flip-flop clock, that some flip-flops can begin operating on different clock cycles. This might lead to an irrelevant start-up glitch, but it might also cause a state machine to enter an illegal state, or even to freeze up.

There are several alternative solutions to this problem:

- Disable all clocking until about 100 ns after the end of GSR, the automatic global signal that asynchronously (p)resets all flip-flops.
- Distribute a synchronous CE (clock enable) signal with a tight distribution delay of less than one clock period.
- Analyze the design for sensitive circuitry, e.g. state machines, and create a localized synchronous CE signal that delays operation for several clock cycles after the end of GSR.

A convenient reset synchronizer and stretcher consists of a flip-flop with the usual GSR reset, with a High on its D input, and with its Q output driving the SRL16 input to its own LUT. The SRL16 output then goes High a controlled number of clock pulses after the end of GSR. This is a good signal to use for driving the CE input of critical state machines.
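A cycle-level Python sketch of that stretcher, under stated assumptions: the SRL16 contents start at all zeros, its depth is set by a hypothetical `length` parameter (1 to 16), and clocking begins on the first edge after GSR is released. This models behavior only; it is not an instantiation template.

```python
def srl16_stretcher(edges_after_gsr, length=16):
    """Cycle model of the FF + SRL16 start-up enable.

    - The flip-flop is cleared by GSR and has D tied High, so it goes High
      on the first clock edge after GSR is released.
    - Its Q drives the input of an SRL16 used as a `length`-deep shift
      register (1..16). The SRL16 contents are assumed to start at zero.
    Returns the SRL16 output after each clock edge following GSR.
    """
    if not 1 <= length <= 16:
        raise ValueError("an SRL16 is 1 to 16 stages deep")
    ff = 0
    srl = [0] * length            # srl[0] is the input end, srl[-1] the output
    out = []
    for _ in range(edges_after_gsr):
        srl = [ff] + srl[:-1]     # shifts in the FF's Q from before this edge
        ff = 1                    # the FF captures its High D on the same edge
        out.append(srl[-1])
    return out


enable = srl16_stretcher(8, length=4)
print(enable)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The output first reads High after `length` + 1 edges; in hardware it would drive CE of the critical state machines.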

Ken Chapman published a longer and more entertaining description in TechXclusives. Click on this insanely long URL:

Peter Alfke

Jim Lewis 22 years ago

Peter,

Since the Xilinx parts have an internal POR, it seems a shame not to use it. However, since GSR is slow, we are cautioned against using it because of problems like these.

Is there a way to connect the output of GSR to the input of the circuit that Ken Chapman suggested in the section "Strategy for the 0.01% of cases" of his TechXclusive, and leave off the asynchronous reset input? Would this be a way to effectively cause all of the resets (or the critical resets) for a clock domain to be deasserted at the same time, without needing a redundant external POR?

If this is possible, is there a VHDL coding methodology that you can recommend? The big missing piece for me is how to access the output of GSR:
#1: Explicitly, through instantiation?
#2: Implicitly, by initializing the VHDL signals that create the registers? Although this technique seems possible in XST, would it also work the same way in other synthesis tools?
#3: Put the reset logic in a separate hierarchical block and explicitly code an asynchronous reset, but at the next level of the design tie the asynchronous reset to a constant inactive value?

If I have multiple FPGAs and they are all connected in a serial fashion for configuration, is the internal POR released at approximately the same time?

The details of POR and GSR functionality have always been a little troubling to me. Is there a good (really detailed) application note that you can recommend?

Regards, Jim Lewis SynthWorks VHDL Training

Hal Murray 22 years ago

I think your summary was good, but I also think I understand the problem. (I got bit years ago. It's one of my hot buttons.) The real question is how to get this problem on every designer's mental checklist so that they are thinking about it at the right time.

My request for an app-note was actually misleading. I should have asked for support in the tools. They should be smart enough to complain about state machines that don't have some mechanism to come out of reset cleanly.

What do ASIC/CPU designers do? My guess is that they don't have a global reset, so their version of this problem is a bit different.

This seems like a good open source project. Or maybe a masters thesis. All the info is in the EDIF/netlist. Right?

Is the GSR in the netlist? Or would a tool have to add it by magic?

Ken's TechXclusive article is good. Thanks.

It actually doesn't cover the mechanism I think I would try to use. He holds the whole state machine in reset. That chews up local resources. Many FSMs stay in the idle state until they get poked, for example a memory controller waits for a read or write request. I was thinking of blocking that sort of request signal.

Jim Lewis 22 years ago

Hal,

In ASIC design, we would connect the asynchronous reset to the circuit Ken showed (or one like it). Reset for each clock domain is separately synchronized. Reset would need to settle in a clock period. Routing generally allows this. If not, you work it out (build a fanout tree with buffers or registers). Going back some time, on my first ASIC we had to manually balance the clock tree.

See my post. GSR would generally be a lot more useful if we could quantify that it settles within X to Y amount of time, and then apply its output to a circuit like the one Ken shows to stabilize all logic or all control logic. From the difference between Y and X we could calculate how many stages we needed to delay the "critical" reset so that it occurs last. Four registers is kind of ad hoc and means something very different for different clocks (20 MHz vs. 200 MHz).
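As a rough sketch of that calculation, assuming a simple rule (one stage per clock period of spread between the earliest release X and the latest release Y, plus one stage of margin) and made-up values of X = 10 ns and Y = 60 ns. Neither the rule nor the numbers come from Xilinx data.

```python
import math


def stages_needed(x_ns, y_ns, clk_mhz, margin=1):
    """Hypothetical sizing rule: enough clock periods to cover the spread
    between the earliest (x_ns) and latest (y_ns) GSR release, plus margin."""
    period_ns = 1000.0 / clk_mhz
    return math.ceil((y_ns - x_ns) / period_ns) + margin


def time_covered_ns(stages, clk_mhz):
    """How long a fixed number of register stages delays the reset."""
    return stages * 1000.0 / clk_mhz


X_NS, Y_NS = 10, 60   # made-up GSR release window, for illustration only
for f in (20, 200):
    print(f, "MHz:", stages_needed(X_NS, Y_NS, f), "stages needed;",
          "four stages cover", time_covered_ns(4, f), "ns")
```

With these numbers, four registers give 200 ns of margin at 20 MHz but only 20 ns at 200 MHz, which is the point about "four registers" meaning very different things at different clock rates.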

Of course, going a step further, since synthesis tools can and do recognize state machines, and they know the clock frequency of a particular clock, they could automatically do the right thing for Xilinx (insert Ken's circuit). However, I like being in control of these types of things, and I am not sure I would like this solution.

Cheers, Jim

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Jim Lewis Director of Training mailto:Jim@SynthWorks.com SynthWorks Design Inc. http://www.SynthWorks.com 1-503-590-4787 Expert VHDL Training for Hardware Design and Verification ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

rickman 22 years ago

I don't think the problem with GSR synchronization amounts to a recommendation not to use the GSR. The point is to use it with "caution" and to take the appropriate design steps.

Reread Peter's post. It is a good start and I think you misunderstood some of the points in it. You don't need an external reset and most likely you don't need a sync reset that is "global" to a clock domain. Typically there are subsets of the design which are sensitive to synchronous release of GSR. These circuits are typically FSM, counters and the like. Normally the data path does not care about reset since it will immediately go to the correct state on the next clock.

One circuit you can add to any FSM to provide a sync reset is just one or a pair of FFs which work just like a metastability reduction circuit. Have the GSR hold these FFs in the '1' state and wire the D input of the first one to a '0'. The last output will be your sync reset. The sync reset will be held for one or two clock edges after the release of GSR. This will only need to be used by the first two bits of a one-hot FSM, the first changed bit of a Gray-coded counter, the first bit of a Johnson ring counter (JRC), or even the first bit of a binary counter (assuming it starts at 0); so it will not overly complicate them or slow them down.
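A cycle-level Python model of the flip-flop chain described above: GSR presets each FF to '1', the first FF's D input is tied to '0', and the function returns the reset value the FSM samples on each clock edge after GSR is released. It assumes all FFs in the chain leave GSR on the same edge; the function name is illustrative.

```python
def sync_reset_chain(edges_after_gsr, stages=2):
    """Cycle model of the GSR-preset flip-flop chain.

    GSR presets every FF to '1', the first FF's D input is tied to '0',
    and the last FF's Q is the synchronous reset. Returns the reset value
    the FSM samples on each clock edge after GSR is released.
    """
    chain = [1] * stages           # held at '1' by GSR
    sampled = []
    for _ in range(edges_after_gsr):
        sampled.append(chain[-1])  # value the FSM sees on this edge
        chain = [0] + chain[:-1]   # the tied-low D shifts in on the same edge
    return sampled


print(sync_reset_chain(4, stages=1))  # [1, 0, 0, 0]
print(sync_reset_chain(4, stages=2))  # [1, 1, 0, 0]
```

With two stages the FSM sees reset on the first two edges; with one stage, only on the first, matching the "one or two clock edges" above.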

I have done lots of FPGA designs and have never needed anything complex to start the chip up. Just pay attention to the startup and don't assume that all FFs are released at the same time, between FSMs, not just within them.

This is something that should be added to Peter's app note.

You don't need to access the GSR signal explicitly. If you provide an async reset control that is common to most/all of your sequential logic, I believe the tools will infer the GSR.

run_en_reg: PROCESS (sysclk, reset)
BEGIN
  IF (reset = '1') THEN
    run_en

in a serial fashion for configuration, is the internal

Approximate is the key word. There is a common sync signal to provide this, but you still have the wide range of delay inside each chip and you need to select what clock is used to control it.

You can learn a lot about it by reading how the end of configuration works. But the main thing is to stop assuming that the end of GSR is a synchronous event. On other chips we get used to the reset just working (like magic). In the FPGA we need to give it a little thought, even if it is not really hard.
