bizarre state machine behavior

Hello,

I ran into a bizarre state machine problem last week.  I had a fairly
simple state machine written in VHDL, with an enumerated type and 5
states.  The code is of the form :

if clock'event and clock='1' then
    . . .
   if state = a then
     if inputa = '1' then
       state <= b;
       outputa <= '1';
     end if;
   end if;

   if state = b then
   . . .

The whole thing is synchronous, running at 40 MHz on a Spartan 2E,
except for a couple of external inputs such as the "inputa" above.
What I was seeing was the state machine locking up, so I added a process
to decode the valid states and send them to LEDs.  I could see that the
lockups left signal state in some non-valid condition, i.e. NOT one of
the enumerated values of the type.

I theorized it was encoding this as "one hot" as the software was set.
So, I forced the enumeration to have specific binary-coded values, and
enumerated all 8 possible codes, providing an if for the unused states
to go back to the reset state.  This fixed the problem as far as I can
tell.  I think this is telling me that in the "one hot" mode, it was
somehow getting more than one bit set true at a time.

Well, I was careful to set up the if's for each specific state to have
nested ifs in such a way so that any combination of input conditions
could only satisfy one of the lowest nested ifs.  So, I just can't
figure out how it could possibly set more than one bit high at a time.
There was a state where it could assign either of 2 different values to
state, depending on the ifs.  But, if all the ifs were nested, it could
never try to assign both values at the same time on the same clock.
But, that seems to be what was happening.  The software is ISE 4.2
(I am still supporting some 5V Spartan stuff, and I'm also cheap).

Any hints about what was happening would be greatly appreciated!

Jon


Re: bizarre state machine behavior

 > <snip>
 > <snip>

That right there could be your problem.  If those inputs aren't
synchronous then you could get into some trouble if they change just
before a clock edge happens.  Some of your state machine flops get the
new message, some get the old one, and you've magically got an illegal
state.

Can you register those signals for a clock before you use them?
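
A minimal sketch of what registering an input could look like. The names
"clock" and "inputa" come from the original post; the entity name
input_sync and the signals meta, sync and inputa_sync are made up for
illustration. It uses two flops rather than one, which is a common
variation and not necessarily what the poster had in mind. The point is
that only a single flop ever samples the raw asynchronous signal, so
every state flop downstream sees the same value.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity input_sync is
  port (
    clock       : in  std_logic;
    inputa      : in  std_logic;  -- raw asynchronous external input
    inputa_sync : out std_logic   -- registered copy for the state machine
  );
end entity input_sync;

architecture rtl of input_sync is
  signal meta : std_logic := '0';  -- first stage: the only flop that sees the raw input
  signal sync : std_logic := '0';  -- second stage: gives the first a clock period to settle
begin
  process (clock)
  begin
    if rising_edge(clock) then
      meta <= inputa;
      sync <= meta;
    end if;
  end process;

  inputa_sync <= sync;
end architecture rtl;
```

The state machine would then test inputa_sync instead of inputa, at the
cost of two clocks of added latency.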

--
Rob Gaddi, Highland Technology
Email address is currently out of order

Re: bizarre state machine behavior

In addition to registering all inputs, you also should make sure that
the state machine is initialized with a synchronous reset after your
DLLs have locked.

-Jeff

Re: bizarre state machine behavior


No DLLs, just a plain single clock.  The state machine and all other
hardware does initialize perfectly.

As for registering the inputs, that DOES seem to be the right thing to
do, but the binary-coded state version works fine without.  Also, the
clock rates on this are so low, it seems that this malfunction is
happening too frequently.  I hadn't thought about the possibility of
there being multiple gating paths from the syntax

if state = x then
   if inputa = '1' then
     state <= y;

to the actual flip-flops of signal "state", but I can see how that would
synthesize to such a condition.  A pretty narrow window for this to
happen, but certainly conceivable.

Thanks, I will do the extra registering of the asynch inputs on the next
rev of this!

Jon


Re: bizarre state machine behavior


It is.


So far, but wait a while.
The temperature may change.


It's the frequency *difference*
that sweeps the setup times
and throws the race the wrong way.

      -- Mike Treseler

Re: bizarre state machine behavior

That may have more to do with the implicit ELSE handling.
i.e. one state engine locks solid, the other will recover
in a few clocks (which means you may not notice, or have not
yet noticed the effects!)

Even with input registering, you should cover ALL states,
(including the 'illegal' ones) in your state code.
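
A sketch of one way to cover the unused codes, assuming binary-coded
states held in a std_logic_vector rather than an enumerated type. The
entity name fsm_safe, the ST_* constants and every transition other than
the a-to-b one quoted in the original post are placeholders, since the
real transitions were not shown. The input is assumed to be registered
already.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fsm_safe is
  port (
    clock   : in  std_logic;
    inputa  : in  std_logic;         -- assumed already synchronized to clock
    outputa : out std_logic := '0'
  );
end entity fsm_safe;

architecture rtl of fsm_safe is
  subtype state_t is std_logic_vector(2 downto 0);
  constant ST_A : state_t := "000";  -- reset state
  constant ST_B : state_t := "001";
  constant ST_C : state_t := "010";
  constant ST_D : state_t := "011";
  constant ST_E : state_t := "100";
  signal state : state_t := ST_A;
begin
  process (clock)
  begin
    if rising_edge(clock) then
      case state is
        when ST_A =>
          if inputa = '1' then
            state   <= ST_B;
            outputa <= '1';
          end if;
        when ST_B => state <= ST_C;  -- placeholder transition
        when ST_C => state <= ST_D;  -- placeholder transition
        when ST_D => state <= ST_E;  -- placeholder transition
        when ST_E =>
          state   <= ST_A;           -- placeholder transition
          outputa <= '0';
        when others =>
          -- codes "101", "110", "111": recover instead of locking up
          state   <= ST_A;
          outputa <= '0';
      end case;
    end if;
  end process;
end architecture rtl;
```

Because the three unused codes are ordinary values of the vector, the
"when others" branch describes real hardware. This only makes the
machine recover; it does not stop it from landing in a wrong state in
the first place.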



Can you clarify 'too frequently' ?
With a 25ns clock, a couple of inputs and 5 choices, let's
take a nice round 100ns input sample rate. (10MHz)

An aperture effect of 1ns would be hit 1:100, or average 10us.

A more likely 100ps aperture, would hit 1:1000, or
average 100us, or 10,000 times a second. (assumes random hits)

Take your true IP sample rate, and reported timing skews, and
get a more accurate prediction.



There could be a case for the tools to
a) Warn on async state conditions
b) Warn that illegal/ELSE options are not covered


-jg


Re: bizarre state machine behavior


That's pretty easy to do with binary coded states, but with one-hot,
and enumerating the type, how do you even SPECIFY the illegal states,
as those, by definition, would be the ones with two or more bits "hot"?


The external signals, all two of them are from a mechanical system,
and change slowly.

Thanks,

Jon


Re: bizarre state machine behavior


If you choose to use enumerated types in your source code you don't have a
way to specify 'illegal' states.  But that misses the point, which is that
you should have a design that can not get into an 'illegal' state.  In your
case you got there by violating setup time by bringing your asynchronous
input signal into more than one flop (i.e. the multiple flops that make up
your state machine).


The frequency of the signals from your mechanical system is irrelevant
unless they are 0 Hz.  That input signal will not be changing at any
particular time relative to the clock of your state machine so you are
guaranteed to have instances that do not meet the setup time requirements.
It doesn't matter how frequently you think those things are changing, you
have to meet setup/hold time requirements on each and every clock cycle.

Kevin Jennings



Re: bizarre state machine behavior

True, it becomes more a tools issue.

Out of interest, how did the resource/speed reports compare, with
the two coding schemes ?


but you should be able to do an aperture calculation, to see if
your observed lock-ups, match the prediction (roughly).
Do these IPs bounce ?

-jg


Re: bizarre state machine behavior
Hi Jon,
You do realise that every build can have different timing? If you're saying
DOES because of your P&R results, you MAY be mistaken.
HTH., Syms.



Re: bizarre state machine behavior


Just be aware that even without DLLs to worry about, the internal reset
is asynchronous and can cause problems if the state bits see it go away
on different clock pulses, i.e. it is another asynchronous input to your
state machine. I generally always do something like this:

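signal resetv: std_logic_vector(2 downto 0) := "000";

process(clk)
begin
   if rising_edge(clk) then
     -- shift in '1's; resetv(2) goes high on the third clock edge
     resetv <= resetv(1 downto 0) & '1';
   end if;
end process;

-- active-high reset, released synchronously to clk
state_machine_reset <= not resetv(2);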

-Jeff

Re: bizarre state machine behavior



I never had a problem with the reset/initial state, that has worked fine
all the time.  But, I'll keep this in mind!

Jon


Re: bizarre state machine behavior

Hi Jon,
That's your problem! You have a one hot state machine with five states. This
is implemented as five flipflops(FF). Your external inputs are asynchronous,
and so if their transitions happen to be close to the clock transition, you
will have a race condition where the signal can get to one/some FF/s, but not
others.
Try this link http://en.wikipedia.org/wiki/Race_condition
Cheers, Syms.
p.s. For completeness, I should mention there is a difference between race
hazards and metastability. Your circuit can suffer from both, but in your
case the race condition is many orders of magnitude more observable than the
'm' word! See CAF passim!


