Style of coding complex logic (particularly state machines)

- Eli Bendersky

Thu, Aug 24, 2006 5:28 AM

Hello all,

In a recent thread (where the O.P. was looking for an HDL substitute for "Code Complete") an interesting discussion arose regarding the style of coding state machines. Unfortunately, the discussion was mostly academic, without many real examples, so I think there's room for another discussion on this topic, this time with real examples displaying the various coding styles. I have also cross-posted this to c.l.vhdl since my examples are in VHDL.

I have written quite a lot of VHDL (both for synthesis and simulation TBs) in the past few years, and have adopted a fairly consistent coding style (so consistent, in fact, that I use Perl scripts to generate some of my code :-). My own style for writing complex logic, and state machines in particular, is to use separate clocked processes, like the following:

type my_state_type is ( wait, act, test );

signal my_state: my_state_type;
signal my_output;

...
...

my_state_proc: process(clk, reset_n)
begin
  if (reset_n = '0') then
    my_state <= [...]
        if (some_input = some_value) then
          my_state <= [...]
        ...
      when test =>
        ...
      when others =>
        my_state <= [...]
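The listing above was cut short in the archive, so here is a minimal sketch of the two-clocked-process style being described, not the original code. The entity name fsm_two_proc, the std_logic port types, and the transitions out of st_act and st_test are assumptions made for illustration; the state literals are prefixed with st_ because wait is a reserved word in VHDL.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity wrapping the example; names and types are assumptions.
entity fsm_two_proc is
  port (
    clk        : in  std_logic;
    reset_n    : in  std_logic;
    some_input : in  std_logic;
    my_output  : out std_logic
  );
end entity fsm_two_proc;

architecture rtl of fsm_two_proc is
  type my_state_type is (st_wait, st_act, st_test);
  signal my_state : my_state_type;
begin

  -- First clocked process: state transitions only.
  my_state_proc : process (clk, reset_n)
  begin
    if reset_n = '0' then
      my_state <= st_wait;
    elsif rising_edge(clk) then
      case my_state is
        when st_wait =>
          if some_input = '1' then
            my_state <= st_act;
          end if;
        when st_act =>
          my_state <= st_test;   -- assumed transition
        when st_test =>
          my_state <= st_wait;   -- assumed transition
      end case;
    end if;
  end process my_state_proc;

  -- Second clocked process: outputs registered from the current state.
  my_output_proc : process (clk, reset_n)
  begin
    if reset_n = '0' then
      my_output <= '0';
    elsif rising_edge(clk) then
      if my_state = st_act then
        my_output <= '1';
      else
        my_output <= '0';
      end if;
    end if;
  end process my_output_proc;

end architecture rtl;
```

Because my_output is registered from my_state, it changes one clock after the state does.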

- backhus

Thu, Aug 24, 2006 6:04 AM

Hi Eli, discussion about styles is not really satisfying. You find it in this newsgroup again and again, but in the end most people stick to the style they know best. Style is more a personal question than a technical one.

Just to give you an example: the 2-process FSM you gave as an example always creates the registered outputs one clock after the state changes. That would drive me crazy when checking the simulation.

Why are you using if-(elsif?) in the second process? If you have an enumerated state type you could use a case there as well. Would look much nicer in the source, too.

Now... Will you change your style to overcome these "flaws" or are you still satisfied with it, because you are used to it?

Either is OK. :-)

Anyway, each style has its pros and cons, and it always depends on what you want to do.

-- does the synthesis result have to be very fast or very small?

-- do you need to speed up your simulation?

-- do you want easily readable source code? (That is also very personal: what one person considers "readable" may just look like Greek to someone else.)

-- etc. etc.

So, there will be no common consensus.

Best regards Eilert


- Eli Bendersky

Thu, Aug 24, 2006 2:34 PM

I guess this indeed is a matter of style. It doesn't drive me crazy mostly because I'm used to it. Except in rare cases, this single clock cycle doesn't change anything. However, the benefit IMHO is that the separation is cleaner, especially when a lot of signals depend on the state.

I prefer to use if..else if there is only one "if". When there are "elsif"s, case is preferable.

In my original post I had no intention to reach a common consensus. I wanted to see practical code examples which demonstrate the various techniques and discuss their relative merits and disadvantages.

Kind regards, Eli

- Andy

Thu, Aug 24, 2006 7:14 PM

Very interesting coding style. I'm curious why there are separate clocked processes. You could just tack on the output code to the bottom of the state transition process, but that is only a nit.

As long as I'm using registered outputs, I would personally prefer a combined process, but that's just how I approach the problem. I want to know everything that happens in conjunction with a state by looking in one place, not by looking here to see where/when the next state goes, and then looking there to see what outputs are generated.

To illustrate, by modifying the original example:

my_state_proc: process(clk, reset_n)
  type my_state_type is (wait, act, test);
  variable my_state: my_state_type;
begin
  if (reset_n = '0') then
    my_state := wait;
    my_output <= [...]
        if (some_input = some_value) then
          my_state := act;
        end if;
        ...
        ...
      when act =>
        if some_input = some_other_val then
          my_output <= [...]
        ...
      when others =>
        my_state := wait;
    end case;
  end if;
end process;

The only time I would use separate logic code for outputs is if I wanted to have combinatorial outputs (from registered variables, not from inputs). Then I would put the output logic code after the clocked clause, inside the process. I try to avoid combinatorial input-to-output paths if at all possible.

Then it would look like this:

my_state_proc: process(clk, reset_n)
  type my_state_type is (wait, act, test);
  variable my_state: my_state_type;
begin
  if (reset_n = '0') then
    my_state := wait;
    my_output <= [...]
        if (some_input = some_value) then
          my_state := act;
        end if;
        ...
        ...
      when act =>
        if some_input = some_other_val then
          my_output <= [...]
        ...
      when others =>
        my_state := wait;
    end case;
  end if;
  if my_state = act then -- cannot use process inputs here
    my_output <= [...]
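Since the archived listing is truncated, the following is a hedged sketch of the variant just described: a single process holding the state in a variable, with the output decoded after the clocked clause. The entity name fsm_single_proc, the port types, and the transitions out of st_act and st_test are assumptions for illustration, and the literals carry an st_ prefix because wait is a reserved word in VHDL.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; names and types are assumptions.
entity fsm_single_proc is
  port (
    clk        : in  std_logic;
    reset_n    : in  std_logic;
    some_input : in  std_logic;
    my_output  : out std_logic
  );
end entity fsm_single_proc;

architecture rtl of fsm_single_proc is
begin

  my_state_proc : process (clk, reset_n)
    type my_state_type is (st_wait, st_act, st_test);
    variable my_state : my_state_type;
  begin
    -- Clocked clause: the variable my_state behaves as the state register.
    if reset_n = '0' then
      my_state := st_wait;
    elsif rising_edge(clk) then
      case my_state is
        when st_wait =>
          if some_input = '1' then
            my_state := st_act;
          end if;
        when st_act =>
          my_state := st_test;   -- assumed transition
        when st_test =>
          my_state := st_wait;   -- assumed transition
      end case;
    end if;

    -- Output clause, placed after the clocked clause. It reads only the
    -- registered variable, never the process inputs, so the output is a
    -- decode of the state register with no input-to-output path.
    if my_state = st_act then
      my_output <= '1';
    else
      my_output <= '0';
    end if;
  end process my_state_proc;

end architecture rtl;
```

In simulation the output here changes on the same clock edge that updates the state, which is the difference from the separate registered-output process.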

- mikegurche

Thu, Aug 24, 2006 7:36 PM

I usually separate the state register and combinational logic for the following reason.

First, I think that the term "coding style" is very misleading. It is more like "design style". My approach for designing a system (not just FSM) is

- Study the specification and think about the hardware architecture

- Draw a sketch of top-level block diagram and determine the functionalities of the blocks.

- Repeat this process recursively if a block is too complex

- Derive HDL code according to the block diagram and perform synthesis. This approach is based on the observation that synthesis software is weak on architecture-level manipulation but good at gate-level logic minimization. It allows me to have full control of the system architecture (e.g., I can easily identify the key components, optimize critical path etc.).

The basic block diagram of FSM (and most sequential circuits) consists of a register, next-state logic and output logic. Based on my design style, it is natural to describe each block in a process or a concurrent signal assignment. The number of segments (process and concurrent signal assignments etc.) is really not an issue. It is just a by-product of this design style.

The advantage of this approach is that I have better control over the final hardware implementation. Instead of blindly relying on synthesis software and testing code on a trial-and-error basis, I can consistently get what I want, regardless of which synthesis software is used. On the downside, this approach requires more time in the initial design phase and the code is less compact. The VHDL code itself can sometimes be cumbersome. But it is clear and easy to comprehend when presented with the block diagram.

One interesting example in FSM design is the look-ahead output buffer discussed in section 10.7.2 of "RTL Hardware Design Using VHDL", the book mentioned in the previous thread. It is a clever scheme to obtain a buffered Moore output without the one-clock delay penalty. The code follows the block diagram and uses four processes: one for the state register, one for the output buffer, one for the next-state logic and one for the look-ahead output logic. Although it is somewhat lengthy, it is easy to understand. I believe the circuit can be described using one clocked process with a proper mix of signals and variables, reducing the code length by three quarters, but I feel it would be difficult to relate the code to the actual circuit diagram and vice versa.
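To make the four-segment structure concrete, here is a rough sketch of a look-ahead output buffer. It is not the code from the book: the entity lookahead_fsm, the go and busy ports, the three states and their transitions are all invented for illustration. The point it tries to show is that the output buffer is loaded from a decode of the next state, so the registered output lines up with the state register instead of lagging it by a clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; every name here is an assumption.
entity lookahead_fsm is
  port (
    clk   : in  std_logic;
    reset : in  std_logic;
    go    : in  std_logic;
    busy  : out std_logic
  );
end entity lookahead_fsm;

architecture rtl of lookahead_fsm is
  type state_type is (st_idle, st_run, st_done);
  signal state_reg, state_next : state_type;
  signal busy_buf, busy_next   : std_logic;
begin

  -- Segment 1: state register
  process (clk, reset)
  begin
    if reset = '1' then
      state_reg <= st_idle;
    elsif rising_edge(clk) then
      state_reg <= state_next;
    end if;
  end process;

  -- Segment 2: output buffer (reset value matches the Moore output of st_idle)
  process (clk, reset)
  begin
    if reset = '1' then
      busy_buf <= '0';
    elsif rising_edge(clk) then
      busy_buf <= busy_next;
    end if;
  end process;

  -- Segment 3: next-state logic
  process (state_reg, go)
  begin
    state_next <= state_reg;  -- default assignment avoids inferred latches
    case state_reg is
      when st_idle =>
        if go = '1' then
          state_next <= st_run;
        end if;
      when st_run =>
        state_next <= st_done;
      when st_done =>
        state_next <= st_idle;
    end case;
  end process;

  -- Segment 4: look-ahead output logic, a Moore decode of the NEXT state
  process (state_next)
  begin
    if state_next = st_run then
      busy_next <= '1';
    else
      busy_next <= '0';
    end if;
  end process;

  busy <= busy_buf;

end architecture rtl;
```

The cost is that the next-state logic and the output decode now sit in series in front of the output register, which lengthens that path.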

My 2 cents.

Mike G.

- David Ashley

Thu, Aug 24, 2006 8:05 PM

Combining the input and registered state this way allows for a non registered path from input to output. Is this ok? Or is there an assumption that the device connected to the output is itself latching on the clock edge?

-Dave

--
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

- Mike Treseler

Thu, Aug 24, 2006 8:43 PM

I have observed that synthesis software does what it is told. If I describe two gates and a flop, that is what I get. If I describe a fifo or an array of counters, that is what I get.

What I want is a netlist that sims the same as my code and makes reasonable use of the device resources. Synthesis does a good job of this with the right design rules. Trial and error would only come into play if I were to run synthesis without simulation.

I prefer clean, readable code, verified by simulation and static timing. I use the rtl viewer to convert my logical description to a structural one for review.

-- Mike Treseler

- mikegurche

Thu, Aug 24, 2006 9:10 PM

Mike T.,

This issue has been debated in many threads and I don't want to do it again. The original poster, Eli, stated:

". . . I had no intention to reach a common consensus. I wanted to see practical code examples which demonstrate the various techniques and discuss their relative merits and disadvantages"

I expressed my opinion and gave an example from a book. You can do the same. Whatever method you choose is fine with me, but I am irritated that you always think your way is THE WAY.

Mike G.

- Mike Treseler

Thu, Aug 24, 2006 9:45 PM

I've already shared my examples.

My posting was intended as part of the "discussion of their relative merits and disadvantages"

The vast majority of designers use your style, not mine.

Backhus said it best:

"discussion about styles is not really satisfying. You find it in this newsgroup again and again, but in the end most people stick to the style they know best. Style is a personal question than a technical one."

-- Mike Treseler

- Eli Bendersky

Fri, Aug 25, 2006 5:18 AM

[...]

Thanks for this example. I have always tried to avoid variables for things like this, and it's interesting to see them used correctly.

The problem I see with the approach comes in complicated code where several signals depend on my_state (say 3-4 is enough). Then, the single-process-handling-everything becomes rather convoluted. Besides, since my_state is a variable local to the process, you can't see it outside so you can't use it to drive other signals. So basically you force all code dealing with my_state to be in one process. Another thing is that I prefer out-of-process statements for combinatorial logic, because IMHO it makes a cleaner separation (I immediately see it's combinatorial, without the need to see if it has some extra "end if"s below it that signify it's clocked).

comb_out <= [...]

- backhus

Fri, Aug 25, 2006 6:14 AM

Hi Eli, OK, that's something different. That deserves a contribution from my side :-)

My example uses 3 processes. The first one is the simple state register, the second is the combinatorial branch selection, and the third creates the registered outputs.

Note that the third process uses NextState for the case selection. Advantage: outputs change at exactly the same time as the states do. Disadvantage: the branch logic is connected to the output logic, causing longer delays. Workaround: if a one-clock delay of the outputs doesn't matter, CurrentState can be used instead.

The only critical part I see is the second process. Because it's combinatorial, some synthesis tools might generate latches here if the designer doesn't write proper code. But we all should know how to write latch-free code, shouldn't we? ;-)

The structure is very regular, which makes it a useful template for autogenerated code.

Have a nice synthesis Eilert

ENTITY Example_Regout_FSM IS
  PORT (Clock : IN  STD_LOGIC;
        Reset : IN  STD_LOGIC;
        A     : IN  STD_LOGIC;
        B     : IN  STD_LOGIC;
        Y     : OUT STD_LOGIC;
        Z     : OUT STD_LOGIC);
END Example_Regout_FSM;

ARCHITECTURE RTL_3_Process_Model_undelayed OF Example_Regout_FSM IS
  TYPE State_type IS (Start, Middle, Stop);
  SIGNAL CurrentState : State_Type;
  SIGNAL NextState : State_Type;

BEGIN

  FSM_sync : PROCESS(Clock, Reset)
  BEGIN -- CurrentState register
    IF Reset = '1' THEN
      CurrentState <= [...]

- mikegurche

Fri, Aug 25, 2006 11:37 AM

Hi, Eilert,

I generally use this style but with a different output segment. I have three output logic templates:

Template 1: vanilla, unbuffered output

-- FSM with unbuffered output
-- Can be used for Mealy/Moore output
-- (include input in sensitivity list for Mealy)
FSM_unbuf_out : PROCESS(CurrentState)
  Y <= [...]
  Y <= [...]

- Martin Gagnon

Fri, Aug 25, 2006 3:46 PM

[snip]

Hi. I've read this PDF and it looks very interesting: it shows many different types of state machine implementations. But the way I code my state machines is different from all of them. I don't know if it's good, and I'm not sure which one mine is equivalent to. My state machine is in one single process, but it is different from the single-process version shown in the rtl_chap10_fsm.pdf file (the one that is supposed to be bad). Here's an example of one of my state machines.

type txgen_states_t is ( st_idle, st_gotsync, st_tx_delay, st_tx_startcharge,
                         st_tx_stopcharge, st_tx_fire, st_wait_min_period );

...

constant zero32: std_logic_vector(31 downto 0) := (others=>'0');
signal prev_state_buf, cur_state_buf : txgen_states_t ;

...

txgen_state_machine_proc: process(clk, reset_n)
begin
  if reset_n = '0' then
    prev_state_buf <= [...]

- KJ

Fri, Aug 25, 2006 8:43 PM

Rather than posting code, I'll refer to yours since it is roughly along the lines of what I do. Instead I'll hope that my explanation is clear enough that one can follow my reasoning (whether you agree or disagree with it) without any more than occasional snippets of code.

First off, I don't make any religious distinction between 'state machine' signals and 'output' signals so I wouldn't feel compelled to have a separate process for outputs, so I might choose to simply combine them into a single process. The advantage (IMO): generally less code, somewhat more readable and maintainable since in many cases, it is much easier to follow the logic that says "if x then goto this state and set this output to this value", end of story.

Having said that though, I do tend to have multiple clocked processes. I base what goes into each process on the somewhat fuzzy definition of what things are in some sense 'related'. Things to me are 'related' if I'm replicating code to implement them in separate processes. An example would be if I have three signals A, B, C that all are of the form "if (x = '1') then.... else.... end if;" then I would most likely have A, B, C in a single process. Of course A, B, C being different would have some additional logic associated uniquely with them so within the overall "if (x = '1')...else...end if;" statement there would be additional logic 'if', 'case', whatever that go into defining them.

Outputs that depend on 'next' state will tend to get implemented in with the state machine for the simple reason that they meet the 'related' criteria. Outputs that depend on current state will tend to get implemented elsewhere because they are not 'related'. Again, no heartburn here because I'm being pragmatic rather than dogmatic about source code positioning, I let the relationships drive how it appears. This tends to produce more robust code (IMO) since there tends to be less duplicated logic that will over time start to diverge because something changed 'up there' but forgot to be changed 'down there'. By physically grouping related things, it is easier to see implications of the change I'm contemplating on other related signals and whether there is a relationship that should be maintained or severed somewhat.

I then try to balance that out with the again somewhat fuzzy term of 'readability'. A single process of 1000 lines of anything to me is too long, I aim for it to fit on a screen....maybe one with somewhat high resolution but that's the basic idea. Scrolling back and forth while you're trying to understand code is not productive and is disruptive I think.

Another criterion I use for whether things should be together in a single process is the number of signals going in and out of that process. I happen to really like the Modelsim 'Dataflow' window and how it integrates with the source and wave windows, so that as I'm debugging I can immediately see the inputs that go into producing the one signal that I'm moseying through in order to find the root cause of whatever it is I'm debugging. The single monolithic process that has 100 inputs and 100 outputs will show up as just a large block with all those I/O when I click on it. But if the equation is simply A <= [...]

- rickman

Fri, Aug 25, 2006 9:32 PM

I have not seen the reference, but I do FSMs one of two ways. If I need to truly optimize things for speed or size or both, I separate my logic from the register; otherwise I use a single clocked process for both. I always register my outputs just like the state, and in essence use lookahead for that. But this happens in the same logic, so it is very easy to see.

I define the state diagram as a pseudo Mealy machine. By pseudo Mealy machine I mean that you define your outputs on the transitions rather than the states with the realization that the output is only reflected when the state changes. Given a cur_state value, the transitions in the diagram and the code both indicate the next_state and the next_output. The coding matches the diagram so coding is easier.
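As a rough illustration of that idea (not the poster's code), the sketch below uses one clocked process in which each transition assigns the next state and the next output together, so the output register is updated on the same edge as the state. The entity pseudo_mealy_fsm, the req/ack handshake and the two states are hypothetical names chosen for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; names and behaviour are assumptions.
entity pseudo_mealy_fsm is
  port (
    clk   : in  std_logic;
    reset : in  std_logic;
    req   : in  std_logic;
    ack   : out std_logic
  );
end entity pseudo_mealy_fsm;

architecture rtl of pseudo_mealy_fsm is
  type state_type is (st_idle, st_busy);
  signal cur_state : state_type;
begin

  fsm : process (clk, reset)
  begin
    if reset = '1' then
      cur_state <= st_idle;
      ack       <= '0';
    elsif rising_edge(clk) then
      case cur_state is
        when st_idle =>
          if req = '1' then
            -- Transition st_idle -> st_busy: state and output set together.
            cur_state <= st_busy;
            ack       <= '1';
          end if;
        when st_busy =>
          if req = '0' then
            -- Transition st_busy -> st_idle.
            cur_state <= st_idle;
            ack       <= '0';
          end if;
      end case;
    end if;
  end process fsm;

end architecture rtl;
```

Each branch of the case statement corresponds to one arrow in the state diagram, labelled with its condition and the output it produces.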

- mikegurche

Fri, Aug 25, 2006 10:00 PM

Love the enemy :) (I hope the code is right)

ENTITY Example_Regout_FSM IS
  PORT (Clock : IN  STD_LOGIC;
        Reset : IN  STD_LOGIC;
        A     : IN  STD_LOGIC;
        B     : IN  STD_LOGIC;
        Y     : OUT STD_LOGIC;
        Z     : OUT STD_LOGIC);
END Example_Regout_FSM;

ARCHITECTURE RTL_1_Process_Model_undelayed OF Example_Regout_FSM IS
  TYPE State_type IS (Start, Middle, Stop);
  SIGNAL CurrentState: State_Type;
BEGIN
  FSM_one_for_all: PROCESS(Clock, Reset)
    VARIABLE NextState: State_Type;
  BEGIN
    IF Reset = '1' THEN
      CurrentState <= [...]

- Eli Bendersky

Mon, Aug 28, 2006 5:33 AM

[snip]

This is what I use as well, also avoiding combinatorial processes. Their merit is probably faster simulation time, but it comes at the price of inferior readability and those "latch avoidance" side effects.

I also try to avoid variables for another reason (in addition to the ones you stated). Somehow, when variables are used I can't be 100% sure if the resulting code is synthesizable, because it can turn out not to be. Additionally, since I do use signals, variables create the mixup of "update now" and "update later" statements which make the process more difficult to understand. With signals only it's all "update later".

Don't you run into fanout problems for that single flip-flop that pushes the sync reset signal to all other FFs in the design, or does the synthesis tool take care of this? I tend to use async resets, but my whole design is usually synchronized to the same clock so there are no reset problems.

Can you point out a few common problems with async resets? In particular, what is using them "appropriately" and what isn't?

Eli

- KJ

Mon, Aug 28, 2006 11:08 AM

The fanout of the reset signal is the same regardless of whether you use synchronous or asynchronous resets. In either case, the reset signal still needs to be synchronized to the clock (see further down for more info) and in both cases the reset signal itself must meet timing constraints. If the reset signal doesn't meet timing constraints due to fanout (and the synthesis tool didn't pick up on this and add the needed buffers automatically) then most fitters for FPGAs give some method for limiting fanout with some vendor specific attribute that can be added to the signal.

1. Forgetting (or not realizing) that the reset signal does in fact need to be synchronized to the clock(s). Whether using async or sync resets in the design, the timing of the trailing edge of reset must be synchronized to the appropriate clock. Simply ask yourself: what happens when the reset signal goes away just prior to the rising edge of the clock and violates the setup time of a particular flip flop? The answer is that, well, you can get anything, and each flip flop that gets this signal can respond differently. What state, then, do you think your 7-state, one-hot state machine will be in after this clock? Quite possibly you might find two hot states instead of just one.

2. Somewhat related to #1: forgetting that your 'synchronized to the clock' reset signal is only synchronized within the one clock domain, and using it in some other clock domain, which puts you right back into the situation of #1, where the reset signal can violate timing. This is really a clock domain crossing problem and would occur whether async or sync resets were used, but I thought I'd toss it in. It does mean that you need separate shift chains (one for each clock domain that needs a reset), but again, you need this regardless of whether the rest of the design uses reset synchronously or asynchronously.

3. On a board that distributes the reset signal to whoever needs it, having that reset signal pick up noise that gets coupled over from some other signal on the board. By using the reset signal synchronously internal to the device, you can minimize (and often eliminate) what otherwise would have been 'inadvertent' resets caused by noise coupling. If the board design happens to be a single clock design, then this 'noise' would most likely be occurring just after the clock when all the outputs are switching, but if you use the reset signal in a synchronous manner then it is just like any other signal and doesn't need any special care when routing the board. You can't say the same for an async reset signal on the board: routing of the 'reset' signal can be an issue, and one that you won't be able to give any really good guidance about to the PCB designer who is trying to route this signal.

4. Overuse of reset, without thinking about just which signals really need to be 'reset'. This is somewhat related to #3 and is also a function of the designer. Some feel that every blasted flip flop needs to be reset, with no reason that can be traced back to the specification for what the board is supposed to do; it's just something 'they always do'. Inside an FPGA this may not matter much, since we're implicitly trusting the FPGA vendors to distribute a noise-free signal that we can use for the async reset, but on a board this can lead to distributing 'reset' to a whole bunch of devices, which just gives that signal much more opportunity to pick up the noise mentioned in #3. If you're lucky, the part that gets the really crappy, noisy reset signal is the one where you look at the function and realize that no, nothing in here 'really' needs to get reset when the 'reset' signal comes in. At worst, though, you see that yes, the reset is needed, and you may start band-aiding stuff onto the board to get rid of the noise, or filter it digitally inside the device if you can, etc. Bottom line, though, is that if more (some?) thought had been put in up front, the reset signal wouldn't have been distributed with such wild abandon in the first place.

5. There was also a post either here or in comp.lang.vhdl in the past couple of months that talked about how using the generally listed template can result in gated clocks getting synthesized when you have some signals that you want reset and other signals that you don't. With both kinds of signal in the same process, the original poster found that gated clocks were being synthesized in order to implement this logic. The correct form of the template (that rarely gets used by anyone posting to either this group or the vhdl group) is of the form

process(clk, reset)
begin
  if rising_edge(clk) then
    s1 <= [...]
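The template above is cut off in the archive. One commonly used way to mix reset and non-reset registers in a single process is to put the reset test after the clocked assignments, as a separate if statement, so that the registers left out of the reset branch are not forced to hold their value while reset is active. The sketch below is an assumption about the kind of template meant, not the poster's exact code; apart from s1, clk and reset, every name (mixed_reset_regs, s2, d1, d2) is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; only s1, clk and reset come from the truncated post.
entity mixed_reset_regs is
  port (
    clk   : in  std_logic;
    reset : in  std_logic;
    d1    : in  std_logic;
    d2    : in  std_logic;
    s1    : out std_logic;  -- register that needs a reset
    s2    : out std_logic   -- register deliberately left without a reset
  );
end entity mixed_reset_regs;

architecture rtl of mixed_reset_regs is
begin

  regs : process (clk, reset)
  begin
    if rising_edge(clk) then
      s1 <= d1;
      s2 <= d2;
    end if;
    -- The reset comes last and overrides only the signals it names.
    -- s2 is not mentioned, so it keeps clocking in d2 during reset
    -- instead of acquiring an implied hold condition tied to reset.
    if reset = '1' then
      s1 <= '0';
    end if;
  end process regs;

end architecture rtl;
```

With the more familiar "if reset ... elsif rising_edge(clk)" form, leaving s2 out of the reset branch means s2 must hold its value whenever reset is asserted, and that implied condition is where the extra enable or gating logic comes from. Tool support for the form shown here should be checked against the synthesis tool in use.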

- KJ

Mon, Aug 28, 2006 12:53 PM

I overstated somewhat. There are times when external interfaces require async reset behaviour. Generally, though, the behaviour that is required is for the outputs to 'shut off', 'tri-state' or something of that flavor. In those situations you are of course required to async reset those outputs, but that in no way implies that the async reset needs to go anywhere else (like into the state machines that have the logic that drives those outputs).

So use those async reset flip flops where it is actually required per specification and nowhere else is probably closer to the truth about my actual usage.

KJ

- rickman

Mon, Aug 28, 2006 1:11 PM

The fanout of an async reset in an FPGA is not an issue because the signal is a dedicated net. The timing is an issue, as all the FFs have to be released in a way that does not allow counters and state machines to run part of their FFs before the others. But this can be handled by ways other than controlling the release of the reset. Typically these circuits only require local synchronization, which can be handled easily by the normal enable in the circuit. For example, most state machines do nothing until an input arrives. So synchronization of the release of the reset is not important if the inputs are not asserted. Of course this is design dependent, and you must be careful to analyze your design with regard to the release of the reset.

That is what I addressed above. Whether the circuit will malfunction depends on the circuit as well as the inputs present. It is often not hard to assure that one or the other prevents the circuit from changing any state while the reset is released.

Since the dedicated global reset cannot be synchronized to a clock of even moderately high speed, you can provide local synchronous resets to any logic that actually must be brought out of reset cleanly. I typically use three FFs in a chain that are reset to zero and require three clock cycles to clock a one through to the last FF.
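A minimal sketch of such a three-FF chain is shown below, assuming an active-low asynchronous reset input and an active-low synchronized output. The entity reset_sync and its port names are hypothetical. The chain is cleared asynchronously and then shifts a one through, so the local reset is released three rising clock edges after the global reset goes away.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; names and polarities are assumptions.
entity reset_sync is
  port (
    clk           : in  std_logic;
    async_reset_n : in  std_logic;  -- global reset, active low
    sync_reset_n  : out std_logic   -- local reset, released synchronously
  );
end entity reset_sync;

architecture rtl of reset_sync is
  signal chain : std_logic_vector(2 downto 0);
begin

  process (clk, async_reset_n)
  begin
    if async_reset_n = '0' then
      -- Asserting reset clears all three FFs immediately.
      chain <= (others => '0');
    elsif rising_edge(clk) then
      -- After release, a '1' is shifted in on each rising edge:
      -- "001", then "011", then "111".
      chain <= chain(1 downto 0) & '1';
    end if;
  end process;

  -- The last FF drives the local reset: low during reset, high after
  -- the third rising edge following release.
  sync_reset_n <= chain(2);

end architecture rtl;
```

One such chain would be needed per clock domain, with sync_reset_n used as an ordinary synchronous reset by the logic in that domain.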

This is not a problem when you use the dedicated reset net. Even though there are FFs that do not need a reset, it does not hurt to put the entire device in a known state every time. It is not hard to miss an FF that needs to be reset otherwise.

Personally I think the noise issue is a red herring. If you have noise problems on the board, changing your reset to sync will not help in general. You would be much better off designing a board so it does not have noise problems.