FSM state minimization with ISE?

- B
- backhus
  
  Contact options for registered users
posted
17 years ago

Tue, Jun 13, 2006 8:54 AM

Hi everybody, I just tried to minimize a simple statemachine with ISE8.1 and wondered that there is no effect, while the machine itself is an ideal target for state minimization. (It's a lecture example fur just that purpose)

I tried several synthesis properties with no effect.

Anyone knows how to do state minimizing with ISE, or what the reasons are for not minimizing the given FSM? It works with Design Compiler!

Best regards Eilert

-- example FSM source

-- analyzes a serial bitstream and

-- detects non-BCD values in 4-bit segments (tetrades)

LIBRARY ieee; USE ieee.std_logic_1164.ALL;

ENTITY pseudotetradedetector IS PORT ( X : IN STD_LOGIC; Reset : IN STD_LOGIC; CLK : IN STD_LOGIC; Y : OUT STD_LOGIC ); END pseudotetradedetector;

ARCHITECTURE ptd_2p OF pseudotetradedetector IS

TYPE state IS (S0, S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12, S13, S14); SIGNAL cs,ns : state;

BEGIN

PROCESS (Reset, CLK) BEGIN IF Reset = '1' THEN cs

- M
- Mike Treseler
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Tue, Jun 13, 2006 7:30 PM

If state encoding is set to AUTO you are getting a one-hot encoding which is not optimum in this case. Might be a good subject for a lecture :)

Quartus says

13 flops and 17 luts for one-hot 4 flops and 13 luts for binary

What did you get?

-- Mike Treseler

- J
- JustJohn
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Tue, Jun 13, 2006 11:28 PM

(Interesting post. Eilert asks about ISE 8.1, Mike [who usually gives good answers] responds with wrong info on AUTO setting, and odd info on Quartus...it's way past April 1st)

Eilert,

I agree with Mike that you should give your results. If you can't get any improvement, maybe you already have the best you can get.

For your posted circuit, XST in ISE 8.1 with global setting "-fsm_encoding auto" gives:

Number of Slice Flip Flops: 4 Number of 4 input LUTs: 10 when compiled to a virtex2 (Assigned states are binary sequential).

Given 15 states, you can't do any better than 4 FFs!

Asking for "one-hot" gives: Number of Slice Flip Flops: 15 Number of 4 input LUTs: 19

Eilert, RTFM (Read The Fine Manual). In the XST manual, you can find out how to individually specify the state encoding, with a selection of:

"auto, one-hot, compact, sequential, gray, johnson, speed1, and user".

Mike, Does Quartus really make a one-hot w/ 13 FFs for a 15 state machine?

- T
- Tommy Thorn
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Tue, Jun 13, 2006 11:44 PM

...

... > Does Quartus really make a one-hot w/ 13 FFs for a 15 state machine?

I could be wrong, but I think you missed the point. The OP's FSM is redundant (which is the whole point) and can be reduced to a smaller FSM with the same behaviour. This minimization is similar to how you derive a DFSM from an NFSM (widely used in regexp).

Without having gone through the whole exercise, the Quartus results suggests that you can merge at least three state of the original machine.

Mike post was relevant and informative.

Tommy

- J
- JustJohn
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Wed, Jun 14, 2006 12:08 AM

Thanks Tommy! I stand corrected... But note that the state minimization increases LUT usage...I suppose it might decrease decoding H/W elsewhere... but how is does one specify exactly what sort of optimization is wanted? Personally, I wouldn't want states eliminated unless I explicitly said to do it.

- T
- Tommy Thorn
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Wed, Jun 14, 2006 12:14 AM

Where did you get the LUT usage increase from? As we don't have the number of LUTs for the original FSM w/o minimization, I don't follow the conclusion.

AFAICT, state minimization should always result in fewer or same number of LUTs, assuming the same encoding strategy.

IMO, redundant FSM are unlikely when written by hand, but much more likely when autogenerated.

Tommy

- T
- Tommy Thorn
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Wed, Jun 14, 2006 1:01 AM

I guess that ISE and Quartus just doesn't do state minimization (probably because real designs aren't redundant).

I was waiting for a compile, so I tried it by hand. S8, S9, S10, S12, S13, and S14 are clearly identical so they can all be replaced by S8, which leads to new identical state etc. My final machine had only six states:

TYPE state IS (S0, S1, S3, S4, S7, S8); .... WHEN S0 => Y

- B
- backhus
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Wed, Jun 14, 2006 6:45 AM

Hi Tommy, while your solution is not really correct(in s0 you have to branch into two states according to X), at least you got the point. In fact States 7 to 10 and 11 to 14 are equivalent and each group can be reduced to on state. Furtermore states 3,4 and 5,6 are equivalent too.

After all it results in 7 States.

ISE Generates

15 FFs in OneHot and 4FFs (with 15states encoded) in the binary encodings.

Quartus seems to eliminate two states, possibly the pairs 3,4 and 5,6. (Yes Mike, it is a subject for a lecture. Our students have to do the minimization. By pen&paper in the 3rd semester and with DesignCompiler in the higher semesters)

So, what I expected from ISE is that it performs at least as good as a

3rd semester student. :-)

With OneHot the FFs should be reduced to 7 which saves over 50%. And the combinatonal Increase, well for this example I dobt that there is any significant.

For binary encoding only one FF is saved, but the encoding will fit into one LUT (3 Statebits + X) instead of two LUTs (4 Statebits + X) in the non-minimized solution.

(John, I mentioned in my first posting that dc does a minimization, so a saving in FFs could be expected. Furthermore I read the manual before and found a clear statement, that "Xilinx® recommends that you use enumerated type encoding to specify the states and use the Finite State Machine (FSM) extraction commands to extract and encode the state machine as well as to perform state minimization and optimization algorithms." [ 8.1 synthesis and Simulation Design Guide, p.128])

The mentioned part about "extraction commands" seems to come from some older ABEL Manual for CPLDs. Everything else there describes HDL designs. Is it a copy&paste error of the manual editor?

So, the original Question still needs an answer. Is ISE capable of performing State minimization? The above cited manual says so, but the experiance holds against it.

Best regards Eilert

Tommy Thorn schrieb:

- B
- backhus
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Wed, Jun 14, 2006 7:49 AM

Hi Tommy, John and Mike I posted some results discussion as a reply to Tommys direct reply for my OP. Mike, interesting result from Quartus. At least it seems to do some optimization. Is there anything in the synthesis report that gives information about what exactly was done to the FSM?

John, if a synthesis Tool optimizes away equivalent states you only will have to worry about it when you try to compare your timing sim with the behavioral sim. It causes some nasty work though, but from minimization one should expect area savings or speed increase, which might be useful if you are designing on the edge of performance.

Tommy, I mostly agree that handwritten/designed FSMs seldom have equivalent states. Because they are often (not always) quite small. But when one has to design a large and more complicated algorithm and want to keep the models source readable or for an easier analysis of the simulation results, then equivalent states may appear intended or accidental. Autogeneration of FSMs is surely a source of such things.

Best regards Eilert

Tommy Thorn schrieb:

- M
- Mike Treseler
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Wed, Jun 14, 2006 5:41 PM

It may be of academic interest, but binary encoding is the clear winner here, as it usually is in small cases. Even in large cases the difference is mostly one of speed, not area.

looks like it pitched 7 and 11:

formatting link

-- Mike Treseler

- B
- backhus
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Thu, Jun 15, 2006 9:00 AM

Hi Mike, what do you mean with winner? Lowest overall FF count? Sure, but there was never a doubt about it. The question was if a state minimzation can be performed with ISE regardless of the encoding style.

The Quartus results are quite interesting. In pseudo_states.pdf we can see the states before(!) minimization. From the autogenerated diagram it's kind of hard to see that the states build a decision tree. (Well... tools :-) ) It's funny to see that S7 and S11 are in double circles unlike the other States above S6. The only difference between the double circled states and the single circled states is, that Y remains constantly '0'.

The ptd-fsm checks a serial bitstream X for non-BCD tetrades with LSB first (at S0) and MSB last (at S(>6)), setting the output Y at that moment. Now what happens in the QUARTUS created FSM when it should branch to S7 or S11. Does it remain in S3 or S5 for 2 clock cycles? How? Or has it really kicked away these states, thus running out of synchronization with the data stream. (I doubt, but who knows.) May be it goes over states S8 or S12 and does a tricky output decoding.

At least it does some minimization, even if it wasn't expected that way. Thank you for presenting these results.

Best regards Eilert

Mike Treseler schrieb:

- J
- JustJohn
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Fri, Jun 16, 2006 2:49 AM

Hi All, First, apologies to Eilert and Mike:

Eilert, I posted in haste on Tuesday, not giving full consideration to your original post, thought you were the student, not the instructor, and so the unfortunate RTM comment. Mike, I tried this out at home yesterday on my laptop, and "auto" used to produce one-hots in ISE 6.1, so you were right then.

Tommy, when I said "LUT usage increase", I was (wrongly) comparing apples to oranges, i.e., ISE to Quartus. ISE 8.1 gave 10 LUTs (auto, with sequential binary coding) where Quartus gave 13 LUTs.

Trying to contribute a little more:

Today I tried ISE with "compact" encoding, and it produced a solution w/ 4 FFs, and 6 LUTs.

Finally, I tried the "hand" reduction myself. Not using pen/paper, but with an editor, eye-balling for equivalent states, and replacing all equivalent states with the lowest numbered state in an eqiv group. It takes more than one pass, and I ended up with a different final number than the 7 states that Eilert posted (Eilert, since this is a class exercise, I'll send you the final result direct). Putting the final reduced state machine through ISE w/ compact encoding resulted in: 3 FFs, 3 LUTs. (I think XST gave the same speed estimate for both the 4 FF and the 3 FF solution, and I am not going go further and try hand placing and routing)

This simple example is a very good demonstration that state reduction can produce dramatic resource reduction: 4 FFs => 3 FFs, 6 LUTs => 3 LUTs. I don't get excited about anything less than a factor of 2 lately, but the LUT results clearly indicate that this can be a very valuable feature for a synthesis tool, in spite of the difficulty reconciling functional sim with post synth sim...after all, the states can remain hidden. It's certainly an easy process algorithmically, given a canonical represention of each state...

So I've reversed my opinion, because I'm often wrong, and remain, Just John (aka John Smith (Not to be confused with a certain noisy Viking))

- B
- backhus
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Fri, Jun 16, 2006 6:50 AM

Hi Tommy, your solution is perfectly correct! While the example comes from the script and has 6 states in the end, my (then absent) mind was fixed to the laboratory assignment, which is slightly different and has the mentioned 7 States.

Sorry for causing confusion.

Best Regards Eilert

- B
- backhus
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Fri, Jun 16, 2006 7:30 AM

Hi John, you are the second that comes up with a 6 state solution. That made me doubt about my posted code, so I fed it into DesignCompiler and got the same result. I nearly went mad, looking for the mistake when I finally took a look in the mirror, and there it was! :-)

I have to apologize.

Tommys solution (and yours probably too) is correct. I confused the script exercise (which I posted) with the slightly different lab exercise. The lab exercise reduces to 7 States.

The solution to the posted script exercise is in the script anyway (That's why we have created a different version for the lab), and the VHDL version has already been posted by Tommy.

If you want to send me your results use this link to get my real email-adress:

formatting link

After all it was a nice brain teaser, wasn't it? It's correct that more than one pass is needed to do a complete minimization. And now that you have posted some results about saved resources everyone should ask why these algorithms are not available in ISE XST. (I already opened a webcast, since none of the XILINX guys in this Newsgroup wrote any statement)

Best Regards Eilert

- M
- Mike Treseler
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Sat, Jun 17, 2006 9:02 AM

Sorry. I thought you meant converting a graph to a netlist and minimizing the logic, maybe like this:

formatting link

Minimizing an arbitrary digraph is a hard problem, so I'm not surprised the tools punt it.

Yes, it drew them for me without even being asked.

That makes some sense, as zero outputs require no state decodes for d-flop synthesis.

Run a sim on the netlist. I expect that it is functionally equivalent.

You are welcome.

-- Mike Treseler

- B
- backhus
  
  Contact options for registered users
Vote on answer
posted
17 years ago

Mon, Jun 19, 2006 8:57 AM

Hi everybody, XILINX answered my webcast:

"I've spoken to our XST expert and he informed me that FSM minimization isn't a feature of XST. If you have two states that are totally identical then the tool will optimise them, otherwise the FSM will be encoded with the encoding algorithm specified within the process properties. "

Bad news for this topic. I'm going to check out Precision Synthesis RTL on this.

Best regards Eilert