Style of coding complex logic (particularly state machines)

My point was that if timing is not met due to the large fanout, the typical fitter will let the user limit the fanout if necessary. But to directly answer the original question: no, I haven't had reset signal fanout as a problem, but if I did I know I could fix it by limiting the fanout on the fitter side without having to change the source code. I also tend to reset only those things that really need resetting, which by itself cuts down on the fanout as well.

I agree.

But simply synchronizing the reset in the first place will do that as well...two different approaches to the problem, each equally valid.

Agreed, but one can also view these locally generated resets as simply synchronized versions of the original reset. In fact, the synthesizer would probably treat them just that way. Say you have (for example) four places throughout the design where you've generated a local reset that is simply the raw reset signal brought through a flip-flop chain (I think that's what you're describing). The synthesizer would likely take those four instances, generate a single shift chain, and send one local reset signal to all four places. So all you've really done is written the source code for the local reset three more times than needed. Had you taken the view that the reset input to those four places must be a synchronized reset signal in the first place, you probably would have written the reset shift chain logic once at the top level and connected it to those four inputs yourself, rather than writing it on the receiver side.
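Roughly, each of those local resets amounts to something like this untested sketch (names made up): the raw reset asserts the chain asynchronously, and a '0' is shifted in so the release is synchronous to the local clock.

library ieee;
use ieee.std_logic_1164.all;

entity reset_sync is
  port (
    clk      : in  std_logic;
    raw_rst  : in  std_logic;   -- raw asynchronous reset, active high
    sync_rst : out std_logic);  -- locally synchronized version
end entity reset_sync;

architecture rtl of reset_sync is
  signal sr : std_logic_vector(1 downto 0) := (others => '1');
begin
  process (clk, raw_rst)
  begin
    if raw_rst = '1' then
      sr <= (others => '1');   -- assert immediately
    elsif rising_edge(clk) then
      sr <= sr(0) & '0';       -- release two clocks after raw_rst goes away
    end if;
  end process;
  sync_rst <= sr(1);
end architecture rtl;

Write that once at the top level and fan it out, rather than once per receiver.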

I agree, but I was referring more to the reset signal distribution on a board rather than inside an FPGA.

OK, it doesn't 'hurt', but it doesn't 'help' either in the sense that both approaches would meet the exact same requirements of the functional specification for that part.

Inside the FPGA it doesn't matter since if you discover something that you now realize needs to be reset you re-route and get a new file. Not routing it to a part on a board and then discovering you need it is a bit more of an issue. Resolving that issue by routing reset to every part and then using it asynchronously is where problems have come up when there are a lot of parts on the board.

If it's a red herring then I can safely say that I have slain several red herrings over my career...but actually not many of late...not since a certain couple of designers moved on to greener pastures, to be brutally honest.

Maybe. But remember the scenario when you're brought in to fix a problem with an existing board that you trace back to some issue with reset. In that situation, a programmable logic change is more likely the more cost effective solution.

KJ

Reply to
KJ

I don't know exactly what you mean by fanout. If a sync reset has to go to 100 FFs, then there is nothing you can do to tell the fitter to change that. The async reset is free, or actually already paid for, so if it does the job why not use it?

Both valid, but typically I find the async reset takes less effort and resources. Only a small portion of my typical design has to be controlled coming out of reset.

Yes, that is exactly how I think of it, a local sync'd reset. Putting it where it is needed is both very clear and saves resources. I never use this in place of the async reset, but rather to supplement it for synchronization. Much of the logic has to be reset, but very little of it has to be synchronously released from reset.

I understand, but noise still can upset a sync reset. This is just not a workable solution to noise.

I don't agree. By globally resetting the device, you have handled all FFs so that if your requirement misses one, you don't find out about it after the unit is in the field.

The question is when do you find out about the missing reset? It is easy for this sort of thing to slip totally through testing and only show up in the users' hands.

I assume you mean board designers who were not producing quiet boards?

I am in a fairly long thread in comp.arch.embedded about how to design boards so that you don't have SI and EMI issues. I think this sort of problem should be dealt with before you make the board, not after it is in the field. Too many engineers learn to cover their butts rather than to produce good designs. I am tired of working that way and not really knowing if my design will work before it is shipped. The one universal rule I learned very early on is that you can not prove a product works correctly by testing. It has to be designed to work correctly by using design methods based on understanding what you are doing. I have never seen a board noise issue that could be fixed by an FPGA design change.

Reply to
rickman

Yes you can. If for example, the timing analysis failed because of reset then you can tell the fitter to limit fanout to say, 20. Then what the fitter would do is replicate the flip flop that generates the reset signal so that there are 5 of them and distribute those 5 (logically identical) resets to those 100 loads.
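For what it's worth, the same thing can usually be requested from the source side with a tool-specific attribute instead of a global fitter setting. This is just a sketch; the attribute below is the Xilinx XST spelling, and other tools spell it differently, so check your fitter's documentation:

library ieee;
use ieee.std_logic_1164.all;

entity fanout_demo is
  port (
    clk     : in  std_logic;
    raw_rst : in  std_logic;
    d       : in  std_logic_vector(99 downto 0);
    q       : out std_logic_vector(99 downto 0));
end entity fanout_demo;

architecture rtl of fanout_demo is
  signal sync_rst : std_logic;
  attribute max_fanout : integer;
  attribute max_fanout of sync_rst : signal is 20;  -- fitter replicates to ~5 copies
begin
  process (clk)
  begin
    if rising_edge(clk) then
      sync_rst <= raw_rst;        -- one register nominally driving 100 loads
      if sync_rst = '1' then
        q <= (others => '0');
      else
        q <= d;
      end if;
    end if;
  end process;
end architecture rtl;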

We can debate the extra resource usage of those 4 extra flops or that maybe there wouldn't have been 100 in the first place, but I think we've both made our points already.

At what point do you want to find out that the answer to "if it does the job..." is actually "No, it doesn't do the job"? Say the designer of some hunk of code that you're integrating didn't pay as close attention to resets as they should have; the way that code 'used' the reset implied it could be asynchronous, but it really needed to be synchronous after all. (Either that, or you 'fix' the errant hunk of code, of course.)

And unless that small portion is actually zero, you'll need a synchronizer somewhere. In that case, I've found the resource difference is negligible or non-existent. I'll accept that you may have seen differences, and I don't want to get into the nitty gritty, but I'll bet those differences were pretty small as well. If not, it would be interesting to know what you attributed the large difference to.

As for effort, the only effort I see in either case is the coding, which is identical. It's just a question of where you physically put the "if (Reset = '1') then"...or is there some other effort that you mean?
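To make it concrete, here are the two placements side by side (an untested sketch, names made up); the typing is the same either way:

library ieee;
use ieee.std_logic_1164.all;

entity reset_styles is
  port (
    Clock, Reset, d : in  std_logic;
    q_async, q_sync : out std_logic);
end entity reset_styles;

architecture rtl of reset_styles is
begin
  -- Asynchronous: Reset tested before the clock edge.
  async_ff : process (Clock, Reset)
  begin
    if Reset = '1' then
      q_async <= '0';
    elsif rising_edge(Clock) then
      q_async <= d;
    end if;
  end process async_ff;

  -- Synchronous: Reset tested inside the clocked branch.
  sync_ff : process (Clock)
  begin
    if rising_edge(Clock) then
      if Reset = '1' then
        q_sync <= '0';
      else
        q_sync <= d;
      end if;
    end if;
  end process sync_ff;
end architecture rtl;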

I'm not sure what solution you're referring to. All I'm saying is that use of a synchronous reset is less susceptible to a noise issue than an asynchronous one, because it requires the noise to be somewhat coincident with the clock in order to have any effect. On a given board design, though, that coincidence will tend to be either near 0% or near 100%...but those near-0% ones don't need to be fixed, because they're not broken if used synchronously.

Only if your requirement is that the flip-flop be set to the state you happened to have coded, and not the other state. In any case, it's not sporting to say that one design approach is better because it has a chance of having happened to code correctly for a missing requirement. Actual reset signal requirements are usually pretty benign, and in many cases NOT coding a reset as a matter of course can lead you to find that missing requirement earlier, during simulation.

The scenario I'm thinking of: the functional requirements have an as yet unidentified reset state. Based on that, I code the design and do not do anything to signal 'ABC' as a result of reset. During simulation I find that I just can't get signal 'ABC' into the proper state (since it is unknown at the end of reset), and I need to, because the logic tree it feeds into requires 'ABC' to be in the proper state. The simulation has immediately hit on the missing functional requirement and you can investigate. Coding to a specific value instead, you have a chance of getting it right or not, and of not finding out until product is in the field. Starting with 'U' states in simulation and seeing your system simulation model drive the 'U' out as a result of signals other than 'reset' is a good indicator of such things, I've found.

Simulation and the 'U' value in the std_logic type are, I've found, the key to getting all initialization issues properly identified really early on, long before prototypes.
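A trivial made-up example of what the simulator flags for you: 'abc' below has no reset and no initializer, so it powers up 'U' in simulation and stays 'U' until something other than reset drives it, which points you straight at the missing requirement.

library ieee;
use ieee.std_logic_1164.all;

entity u_demo is
  port (
    clk, mode_bit : in  std_logic;
    downstream    : out std_logic);
end entity u_demo;

architecture sim of u_demo is
  signal abc : std_logic;  -- deliberately no reset, no initializer
begin
  process (clk)
  begin
    if rising_edge(clk) then
      abc        <= abc xor mode_bit;  -- 'U' xor anything is still 'U'
      downstream <= abc;               -- the 'U' propagates downstream
    end if;
  end process;
end architecture sim;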

And that then had problems that needed to be fixed.

Totally agree. But being realistic here, if you DO have boards out in the field with this problem, there is also the issue of what is the cost effective way to fix the problem from the perspective of both your company and your customer?

Agreed.

Here's a hypothetical one (but not far from what I've seen) for you then. You've got a 'blip' on reset where it gets above threshold and that lasts for...maybe 1 ns at the receiver. You trace it down and find out exactly what output switching condition is causing the blip to happen. You can also characterize and analyze it to say that it will never be able to couple and cause this blip to exist for more than 2 ns. On the receiver you have a clocked device that receives this reset signal.

The 'proper' solution of course is to re-route the board to get the reset away from the noise initiator, guard it appropriately, etc.....the 'soft' design change is to change the code in the receiving device to ignore resets that last for only two clocks or less (or whatever works for you). Granted, the reset response of the device has been degraded (by that clock cycle or two) but in many cases, that's OK as well. You need to investigate it of course to validate but under the right circumstances it would work just as flawlessly as the PCB re-route.
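As a sketch (untested, names made up), the 'soft' filter can be as simple as requiring reset to be sampled asserted on two consecutive clock edges before acting on it:

library ieee;
use ieee.std_logic_1164.all;

entity reset_filter is
  port (
    clk          : in  std_logic;
    noisy_rst    : in  std_logic;    -- the board reset with the coupled blip
    filtered_rst : out std_logic);   -- what the rest of the device sees
end entity reset_filter;

architecture rtl of reset_filter is
  signal history : std_logic_vector(1 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      history      <= history(0) & noisy_rst;     -- two consecutive samples
      filtered_rst <= history(1) and history(0);  -- both must be asserted
    end if;
  end process;
end architecture rtl;

A 1 or 2 ns blip can only be coincident with one clock edge, so it never makes it through; the price is the couple clocks of added reset latency mentioned above.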

The point being that just because a solution does not tackle the root cause does not necessarily imply that it is in any way less robust. And I'll also accept that in some (possibly many) situations there may be no 'soft' solution...if you'll also accept that in some (possibly many) situations that there really might.

Now, you've got "N" boards in the field. What is the 'best' solution, not only from the perspective of your company (presumably the 'soft' update is easier to distribute) but from your customers as well (who would have down time to swap out the PCBA....oops, that board is in a deep sea sensor? On the way to Mars? Inside average Joe user's PC?)

KJ

Reply to
KJ

If you mean that a variable does not always infer a register I agree. If you mean that synthesis does not always produce a netlist that simulates the same as the code, I disagree.

I agree, and this is exactly why I do not declare any signals for synthesis.

-- Mike Treseler

Reply to
Mike Treseler

I would be interested in whether anyone has theories on why variables would simulate faster than signals. And whether this behavior has been seen on different simulators, or only Modelsim.

Reply to
Duane Clark

KJ,

I may be the previous poster you are speaking of...

The standard template with "if reset then... elsif rising_edge(clk) then ..." will not cause a gated clock, but rather a clock enabled register, disabled during reset, for those signals not reset in the reset clause. This is also independent of whether reset is coded as a synchronous or an asynchronous input (because of the elsif). The template you used above would allow the normal clocked statements to execute, and then override those signals that are reset, leaving the unreset ones to retain their normal clocked behavior, thus avoiding the need to disable them during reset.
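A minimal illustration of the difference (made-up names, one reset register and one deliberately unreset one):

library ieee;
use ieee.std_logic_1164.all;

entity template_demo is
  port (
    clk, rst, d1, d2 : in  std_logic;
    a_std, b_std     : out std_logic;   -- standard template
    a_ovr, b_ovr     : out std_logic);  -- override-at-the-end template
end entity template_demo;

architecture rtl of template_demo is
begin
  -- Standard template: while rst is asserted, b_std must hold its value,
  -- so synthesis gives it a clock enable that is disabled during reset.
  standard : process (clk, rst)
  begin
    if rst = '1' then
      a_std <= '0';             -- b_std not mentioned: it must not change here
    elsif rising_edge(clk) then
      a_std <= d1;
      b_std <= d2;
    end if;
  end process standard;

  -- Override template: the clocked assignments always execute, then the
  -- reset overrides a_ovr; b_ovr keeps its normal clocked behavior, so no
  -- clock enable is needed. (Shown here with a synchronous reset.)
  override : process (clk)
  begin
    if rising_edge(clk) then
      a_ovr <= d1;
      b_ovr <= d2;
      if rst = '1' then
        a_ovr <= '0';
      end if;
    end if;
  end process override;
end architecture rtl;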

Other comments on this thread:

If one disables all retiming and other sequential optimizations, then there is definite merit in a descriptive style that explicitly describes combinatorial behavior separately from registered behavior (i.e. combinatorial processes or concurrent statements separate from clocked processes). But once retiming, etc. are enabled, all bets are off. In those cases, I believe one is better off focusing on the behavioral (temporal and logical) description and getting it right, and not paying so much attention to specific gates and registers which will not exist in the final result anyway. Since I enable retiming by default, I use single, clocked processes by default as well.

One aspect that has not been touched upon is data scoping. One convenient aspect of using variables is that their scope limits their visibility to within the same process. The comment about "related" functions being described in the same process is important in this aspect. There is no need for unrelated functions to have visibility to unrelated signals. Within "one big process" for the whole architecture, scoping can be implemented with blocks, functions, or procedures declared within the process to create islands of related functionality, with limited visibility. I generally prefer to separate unrelated functions to separate processes, but all my processes are clocked.

State variables are one such scoping application. I generally don't want any process but the state machine process itself to have any idea of what states I am using, and what they mean (the concept of "information hiding" comes to mind). If I need something external to happen when the state machine is in a certain state, I do it from within the state machine process, either by handling it directly (e.g. adding one to a count), or "exporting" a signal to indicate when it should happen. The same effect can be accomplished with local signal declarations inside a block statement that contains the combinatorial next state process, the output process (if applicable), and the state register process.
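For example (a made-up sketch), the state type and variable can live entirely inside the process, with only the exported pulse visible outside:

library ieee;
use ieee.std_logic_1164.all;

entity fsm_demo is
  port (
    clk, rst, go : in  std_logic;
    start_count  : out std_logic);  -- the exported indication
end entity fsm_demo;

architecture rtl of fsm_demo is
begin
  fsm : process (clk, rst)
    type state_t is (idle, run, done);  -- invisible outside this process
    variable state : state_t;
  begin
    if rst = '1' then
      state := idle;
      start_count <= '0';
    elsif rising_edge(clk) then
      start_count <= '0';               -- default, overridden below
      case state is
        when idle =>
          if go = '1' then
            state := run;
            start_count <= '1';         -- export the event, not the state
          end if;
        when run  => state := done;
        when done => state := idle;
      end case;
    end if;
  end process fsm;
end architecture rtl;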

Andy

Reply to
Andy

Variables simulate faster because there is no scheduling of a later value update, as with signals (signal values do not actually update until after the assigning process suspends). If the signal has processes that are sensitive to it (i.e. separate combinatorial and registered processes), then there is the process invocation overhead as well.

Most modern simulators also merge all processes that are sensitive to the same signal(s), to avoid the duplicate overhead of separate process invocations. Combinatorial processes, because of their widely varying sensitivity lists, foil this optimization.

By using only clocked processes with variables, one can write synthesizable RTL that simulates at speeds approaching that of cycle-based code on cycle accurate simulators.

Andy

Duane Clark wrote:

Reply to
Andy

Exactly. Synthesis will go through asynchronous contortions to *prevent* a register from being reset. This is why I reset all registers the same way and why I don't touch my process template between _begin_ and _end_.

Well said.

-- Mike Treseler

Reply to
Mike Treseler

No, you weren't the one Andy although you and I did discuss resets on that thread as well. The one I'm referring to is from June 15 in comp.lang.vhdl called "alternate synchronous process template" started by "Jens" (all that just in case the link below doesn't work)

formatting link

At the time, nobody seemed to dispute Jens's claim that the gated clock could be created....I dunno, don't use them async resets ;)

I'm not sure what merit you see in that. I'm describing the functionality of the entity. If there is some need for what amounts to a combinatorial function of the current state, I'll do it with a concurrent statement whereas you and Mike T will do it with a variable. In either case, we would be trying to implement the same function whether optimizations were on or off.

The "focusing on the behavioral (temporal and logical) description..." is what I'm focused on as well. I also couldn't care less about "specific gates and registers which will not exist in the final result anyway". I'm just trying to get the function and timing to meet the goal, if it all gets mushed together in the synthesis process that's fine...that's what I pay for the tool to do....

Either that or I'm missing what your point is, I've been known to do that.

I'll agree, but add that that is somewhat of a 'religious' statement. Taken to the other extreme, yes, you have a huge mass of only global signals (and I'm not advocating that), but if one breaks the problem down into manageable sized entities you don't (or should I say, I don't) tend to have hundreds of signals in the architecture either. It's a manageable number, say from 0 to 2 dozen as a rough guess.

This wouldn't address the issue I brought up about the use of Modelsim's Dataflow window as a debug aid, but OK....my islands of related functionality are the multiple processes and the concurrent statements.

As do I.

I would consider that to be a 'religion' thing. I wouldn't draw the somewhat arbitrary boundary; I consider all of the logic implemented in an entity to be closely enough related that it can at least talk amongst itself if that helps get the overall function implemented. Not really disagreeing with you, just saying that there is no reason that relates back to the functional spec that would justify this hiding, so I wouldn't necessarily break them apart unless the 'process fits on a screen' fuzzy rule starts kicking in.

And that tends to muddy the waters somewhat for someone following the code, since they can't perceive the interaction between the state machine and the outputs in one fell swoop, as they could if it were all put together (and didn't violate the 'process fits on a screen' fuzzy rule).

Good points. I don't necessarily disagree with the idea of local scoping and information hiding as a general guiding principle, but it can also be taken as dogma, with the result of hiding things from those who have a need to know (i.e. those other statements, processes, etc. that are all within the same entity/architecture).

If you view all of those statements and processes in an entity as being part of the same 'team' doing their little bit to get the overall function of the entity implemented none is really more important than the other, they all live and die together. By that rather crude sports analogy the idea of 'information hiding' should be taken with some suspicion. And yes, I realize that VHDL has nothing to do with sports just thought I'd toss out an unrelated analogy to break up the day.

But which approach one takes is definitely a function of just how 'big' the function is being implemented. One with hundreds of signals would be far worse than multiple processes with local variables all scoped properly. But if you have hundreds of signals I'll bet you have thousands of lines of code all within one architecture and I'll bet would be a good candidate for some refactoring and breaking it down into multiple subentities that could be understood individually instead of only as some large collective.

KJ

Reply to
KJ

Wow, I never even noticed he said "gated clock" in the OP of that other thread. I have never seen that, just the clock-disabled registers (which creates a problem when the reset asynchronously asserts, all mine synchronously deassert anyway).

The synthesis tool is not just trying to keep those unreset registers from resetting, it is keeping them from doing anything else while the other registers are reset, which is exactly the way the code simulates, because of the elsif. Avoiding the elsif by using a parallel if statement (whether synchronous or asynchronous) at the end avoids the clock-disables. The main place where I have run into this is when inferring memories from arrays. The array cannot be reset, otherwise you get a slew of registers. But if it is in a process with other registers that do get reset, then that creates a problem, which is solved by putting the reset clause in parallel, at the end. Occasionally, resets cause optimization or routing problems when I'm trying to squeeze the most performance from a design, and I'll remove the reset from those registers as well if it is not needed. My general preference is to reset everything though, and I generally use the traditional form since it will give me a warning if something is not reset.
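A stripped-down sketch of the memory case (made-up names): the array never appears in a reset clause, while the registered output is reset by the parallel clause at the end.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram_demo is
  port (
    clk, rst, we : in  std_logic;
    addr         : in  unsigned(3 downto 0);
    din          : in  std_logic_vector(7 downto 0);
    dout         : out std_logic_vector(7 downto 0));
end entity ram_demo;

architecture rtl of ram_demo is
begin
  process (clk)
    type ram_t is array (0 to 15) of std_logic_vector(7 downto 0);
    variable ram : ram_t;
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(addr)) := din;  -- never reset: infers a RAM
      end if;
      dout <= ram(to_integer(addr));   -- registered read data
      if rst = '1' then
        dout <= (others => '0');       -- parallel reset at the end: resets
      end if;                          -- the output register, not the array
    end if;
  end process;
end architecture rtl;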

I don't take data scoping to a religious level, but I do keep it in mind, even below the architecture level.

When coding state machines and their outputs, I prefer to see everything associated with one state in one place. If it is not there, it does not have visibility of the state anyway, the way I code it. That way, if I change my mind about the organization or naming of the states, the effects of such a change are limited to one place and one place only. It is more for maintenance than anything else, to limit the extent to which all those signals are interwoven and impossible to untangle. VHDL makes it relatively easy to see what all the inputs to a function are, but finding all the places where a signal goes is another matter. That's what the text search function is for...

As to when to isolate different processes in a separate entity/architecture, that is a touchy-feely type of decision. I usually know it when I see it, but trying to describe a set of rules for it is much more difficult than just doing it. Because my coding styles are generally more compact than those with separate processes for combo and registered logic, I generally get more in an architecture before it gets too big. So a lightweight scoping mechanism is useful to deal with more complexity within a given architecture. Let's just say it helps keep a borderline too-complex description from overflowing into multiple entity/architectures.

I like your "what fits on a screen" standard for processes. That seems to work well for me too. That could be extended to functions and procedures too, although mine are not usually anywhere near that long, and they are usually defined within the process anyway.

My point about merits of separate combinatorial and clocked processes is that most proponents of that style like the fact that they can easily visualize what is gates and what is registers. I try to encourage them to lift their visual ceilings (and floors, to some extent) and focus on behavior since, especially with retiming and other sequential optimizations, their original description will have little in common with the synthesis output, except for the behavior which is often obscured by the separation of registers from gates in the first place. The same argument applies to using variables for registers and/or combinatorial logic.

Thanks for the ideas...

Andy

KJ wrote:


Reply to
Andy

Is all code using variables always synthesizable, and can you tell at a single glance how many clock cycles the update of all values takes? I'd really appreciate a simple example or two. Thanks in advance.

Reply to
Eli Bendersky

In VHDL, a variable is a more "abstract" construct. Unlike a signal, which maps to a wire or a wire with memory (i.e., a latch or FF), a variable has no direct hardware counterpart, and the synthesized circuit depends on the context in which the variable is used.

The variable in VHDL is "static", which means that its value is kept between process invocations. This implies a variable may need to keep its previous value. Thus, a variable infers a latch or an FF if it is "used before assigned a value" in a process, and infers a combinational circuit if it is "assigned a value before used" (a short example follows the list below). In this respect, a variable is usually synthesizable. I personally use variables in a very restricted way:

- don't use variable to infer memory

- avoid self reference (e.g., n := n+1).

- use it as shorthand for a function.
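A small made-up example of the rule (note the second assignment is exactly the self-referencing style I avoid; it's there only to show the inference):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity var_infer is
  port (
    clk  : in  std_logic;
    a, b : in  unsigned(7 downto 0);
    q    : out unsigned(7 downto 0));
end entity var_infer;

architecture rtl of var_infer is
begin
  process (clk)
    variable tmp : unsigned(7 downto 0);  -- shorthand for a function
    variable acc : unsigned(7 downto 0);  -- must remember its last value
  begin
    if rising_edge(clk) then
      tmp := a + b;      -- assigned before used: combinational only
      acc := acc + tmp;  -- used before assigned: infers an FF
      q <= acc;
    end if;
  end process;
end architecture rtl;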

Although I don't do it, this approach can even be used in a clocked process to obtain combinational, unbuffered output (see my previous post with a 1-process FSM example).

In synthesis, the problem is normally the abuse of sequential statements, rather than the use of variables. I have seen people trying to convert a C code segment into a VHDL process (you can have variables, for loops, while loops, if, case, and even exit, the VHDL analog of break, inside a process) and expecting the synthesis software to figure out everything.

My 2 cents.

Mike G

Reply to
mikegurche

The variables are updated every clock but that "update" may be to keep the same value.

The advantages of a variable logic description *increase* with complexity, so a persuasive yet simple example is a challenge.

My favorite simple example is the "clock enabled counters" source here:

formatting link

The focus is on updating values for simulation rather than recipes for gates and flops. The procedure "update_regs" only describes value updates required for the slow, medium and fast counts. Note that I read carry bits and immediately clear them without worrying about what that means in gates or flops.

Note in the RTL schematic view that synthesis does just fine working out how the carries and enables work and where registers are not needed. Also note that a process-per-block description using this view would be more complicated than the example source.
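In case the link doesn't work, the flavor of it is something like this rough approximation (not the actual source; names and widths made up):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity counters is
  port (
    clk, rst : in  std_logic;
    slow_msb : out std_logic);
end entity counters;

architecture rtl of counters is
begin
  process (clk, rst)
    variable fast, medium, slow : unsigned(3 downto 0);
    variable carry : boolean;
    procedure update_regs is
    begin
      fast  := fast + 1;
      carry := (fast = 0);       -- read the carry...
      if carry then
        medium := medium + 1;
        carry  := (medium = 0);  -- ...and immediately clear/reuse it
        if carry then
          slow := slow + 1;
        end if;
      end if;
    end procedure update_regs;
  begin
    if rst = '1' then
      fast   := (others => '0');
      medium := (others => '0');
      slow   := (others => '0');
    elsif rising_edge(clk) then
      update_regs;
    end if;
    slow_msb <= slow(slow'high);  -- only the value update is described
  end process;
end architecture rtl;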

-- Mike Treseler

Reply to
Mike Treseler

Personally I think most problems in using HDLs in this way come not directly from the way signals or variables are used, but rather from the use of an HDL to describe the solution in an abstract way. I nearly always design in terms of registers and "clouds" for the logic. I get a feel for how large the design is and if I need to optimize at this block diagram level. I can even get an idea of how complex the logic part is by looking at the equations that describe it. Then I use an HDL to "describe" the hardware rather than describing the functionality and letting the tool decide what hardware to invoke.

If I know I want a register, I add the code that will infer a register. If I need certain logic, I can include those equations in the register process or I can use combinatorial descriptions separately. I never start writing the HDL before I have a clear understanding of what the hardware should look like. To me the HDL is just the lowest level description of a successive decomposition of the design. The HDL is never used to "program" a solution. This seldom results in the types of problems you are discussing.

Just for the record, I do use integer variables for memory or other sequential logic like counters. Memories simulate much faster when coded with integer variables. This is both because of the integer and the variable, IIRC.
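For example, something like this made-up sketch keeps the storage as an array of integers in a process variable, avoiding both the nine-value std_logic bookkeeping and the signal update scheduling in simulation:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity int_ram is
  port (
    clk, we : in  std_logic;
    addr    : in  unsigned(3 downto 0);
    din     : in  unsigned(7 downto 0);
    dout    : out unsigned(7 downto 0));
end entity int_ram;

architecture rtl of int_ram is
begin
  process (clk)
    type mem_t is array (0 to 15) of integer range 0 to 255;
    variable mem : mem_t;  -- integers simulate much faster than vectors
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(addr)) := to_integer(din);
      end if;
      dout <= to_unsigned(mem(to_integer(addr)), dout'length);
    end if;
  end process;
end architecture rtl;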

Reply to
rickman

I agree with you completely. What I am trying to say is that a variable may not be synthesizable if you write the code with a "C mentality."

Mike G.

Reply to
mikegurche

The problem is not that it can't be done but rather a lack of tradition and good examples of what the present generation of synthesis tools can do if I let them.

-- Mike Treseler

Reply to
Mike Treseler

I'm not sure I agree that variables are the problem at all. There are many ways to write code that is not synthesizable, with signals as well as with variables. The difference between signals and variables is just that the value of a variable is updated immediately, just like a 'C' variable, while signals are only updated at the end of the process. So if you make an assignment to a variable and then use that value in a calculation in the same process, the new value will be used; do the same thing with a signal, and the old value of the signal will be used. I don't know of a way that this can be unsynthesizable.

Variables cannot exist outside of a process, IIRC, so the variable must be assigned to a signal in order for it to affect anything outside the process. In reality, it can only be used as an intermediate value in an assignment to a signal.
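A two-line illustration of the difference (made-up names):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity var_vs_sig is
  port (
    clk          : in  std_logic;
    a            : in  unsigned(7 downto 0);
    y_var, y_sig : out unsigned(7 downto 0));
end entity var_vs_sig;

architecture rtl of var_vs_sig is
  signal s : unsigned(7 downto 0);
begin
  process (clk)
    variable v : unsigned(7 downto 0);
  begin
    if rising_edge(clk) then
      v := a + 1;
      y_var <= v + 1;  -- new value of v used at once: y_var gets a + 2
      s <= a + 1;
      y_sig <= s + 1;  -- old value of s used: y_sig lags by a clock
    end if;
  end process;
end architecture rtl;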

Do you have an example of unsynthesizable code using a variable that would be synthesizable with a signal?

Reply to
rickman

Why not do this? Synthesis software is good at figuring all this out. If it does what you need it to and meets timing, you're done. Move on to the next problem.

Personally, I have seen people spend far too long doing very explicit coding of detailed stuff, giving the synth tool very little to do, which for a relatively low-performance design (still in the tens of MHz, though) was a waste of effort. The so-called "naive" approach of writing code in a natural "softwary" way and letting the synth sort it out would have left us more time to sort out the one nitty-gritty bit of code which did have a performance problem.

Sure, if you are pushing the performance envelope, you're going to have to put more work in. If you are doing a high-volume design then you might fit into a smaller part and save some money by putting the effort in. But that's just an engineering trade-off like any other. Softies do it all the time, optimising their hardcore interrupt handlers and leaving the rest to the tools. I assume civil engineers do similar things with their bridges as well :-)

My tuppence :-)

Cheers, Martin

--
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html
Reply to
Martin Thompson

Is TRW still around? I thought they were bought by Northrop Grumman. I guess some part of TRW was not part of that deal? I used to work in Defense Systems in McLean, or whatever they called it that week.

I guess I am too old school to feel good about using 'C' like code. Sure if it works, do it. But I always think in terms of hardware and like to know what I am building before I let the tool build it. I guess I would not want to debug a design where I didn't know what the tool was doing. Then I would be debugging software and not hardware. Maybe that works for some people, but I like to know the hardware I am building so I know exactly how to debug it. That also includes avoiding certain types of bugs that are caused by poorly designed hardware. If the tool generated the hardware then I can't say it doesn't have race conditions and such.

Reply to
rickman
