State Machines....

Hi ... I have a question for the experts. I am doing a post mortem of my last project. It was a communication processor that was basically a lot of dataflow paths controlled by several rather complex state machines (100-200 states). I did the design by thinking out the control, drawing the state diagrams, and then coding them into VHDL for Quartus and into a Stratix. It worked: after handing over the design to the test department they loaded a board and signed off on the design within a week. My fellow engineers were rather impressed since they knew how complex the control was.

The downside here was that the state machines became complex and took quite a while to figure out. Then transcribing them into VHDL, and also drawing a pretty state chart for documenting the design, took a while. Yes, I know some say start right away with typing VHDL, but I find that hard to conceptualize. Since this is a small company, big $$ tools are out of the question.

So my question: are there tools out there that can make this process faster? Like drawing the state charts on the screen and outputting VHDL? Or other suggestions? Again, if these are $5-10K tools I won't be getting them in this company so shareware or

Reply to
stan

I think you do need to find another job with a better manager! An error in scheduling made by an engineer doesn't deserve a pay cut. If anyone deserves a pay cut, it's a manager who didn't know a basic management rule: take an engineer's estimate, multiply it by 2 and then use the next available unit :) If he knew the rule he would celebrate the work finished way faster than expected!

/Mikhail

Reply to
MM
[...]

I don't know of any really free tools, but isn't there still a state chart editor in the Xilinx tools? However...

Obviously I don't know the insides of your application, but 100-200 states sounds like a real behemoth. Isn't there some way you could partition it and make it hierarchical? Alternatively, it might make sense to think in terms of a microcoded solution - a custom state sequencer engine, and a little ROM containing the sequence information.

Way back in the bad old days of the late 80s there was a nifty little thing from AMD called the 29PL141 that would probably have helped. One of my "must do one of these fine days" jobs is to write an HDL implementation of that - I still have all the original AMD docs on one of my bookshelves at home.

Anyhow, here's my main point: state diagrams are supposed to be clear and self-evident; if they're too big to be clear and self-evident, then perhaps they are the wrong tool for the problem at hand. Similarly, if you have a state diagram of modest size, converting it into VHDL or Verilog is pretty much a no-brainer.

Now, if you managed to get this leviathan to work first time, you obviously know what you're doing and I'm sure you thought of these things for yourself. So, can you offer a clue about *why* your state machines needed to be so huge? and *why* you couldn't make them hierarchical? It would be very interesting to hear your experiences.

Cheers

-- Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how VHDL * Verilog * SystemC * Perl * Tcl/Tk * Verification * Project Services

Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, Hampshire, BH24 1AW, UK Tel: +44 (0)1425 471223 mail: snipped-for-privacy@doulos.com Fax: +44 (0)1425 471573

The contents of this message may contain personal views which are not the views of Doulos Ltd., unless specifically stated.

Reply to
Jonathan Bromley

I've tried Statecad, which I believe is the progenitor of today's Xilinx state machine graphical entry tool. Rather than saving me time, the tool made me guess as to just what I'd have to do to get what I wanted in hardware. Maybe if you spent more time at it than I did it'd be worth it, but I find it easier to design in Verilog. The fewer tools between you and the hardware, the better.

I'd have to agree with others who said that you probably should have given more thought to decomposing your FSMs into smaller ones. This is more art than science, and I don't know of any good references that explain how to do this; most people learn from someone else, or from experiences such as yours, after which they vow, "No more huge state machines."

You're to be congratulated on doing a thorough enough job on design that debug went smoothly. Perhaps you deserve a better manager.

Bob Perlman Cambrian Design Works

Reply to
Bob Perlman

My first rule, when describing a complex FSM, is to minimize the number of signals generated by any one FSM, and to write many concurrent FSMs instead.

One process per generated signal, and then replicate the process as needed.

A normal FSM construction can have two or three processes. Beyond that, do not generate more than 3-5 signals (or vectors) from one FSM.

If I have to generate 100 signals, I will write 20 concurrent FSMs.

Laurent

Reply to
Amontec Team, Laurent Gauch

Stan -

I agree with Jonathan that 200 states is nuts. Very error prone and probably very difficult to maintain/modify. Were you parsing the incoming data stream and making state decisions based on that? If so, a programmable communications processor or an embedded processor (like Nios for Altera) is a much better and more flexible choice. Some tasks just aren't meant to be done in hardware, and parsing data streams is one of them.

Nice reward for delivering a working design. I hope you find a way to exit that situation.

Robert

Reply to
Robert Sefton

Reply to
Peter Alfke

"Peter Alfke" ha scritto nel messaggio news: snipped-for-privacy@xilinx.com...

Just read the article... I like the idea, I'll try to remember the trick next design.

Makes medium/big fsm simplicity itself: no routing issues, etc.. very clever. But... too much work for translating to numeric "init" format...

Why don't you try to "whisper" a bit in the ears of the ISE StateCAD maintainers?

Maybe outputting INIT tables from a StateCad graph could boost the use of this particular trick.

Reply to
Antonio Pasini

Some hardware objects are not efficiently described as a state machine. For example, an 8 bit shift register could be described using a single variable assignment or as a case of 256 variable states.
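To make the shift register comparison concrete, here is a minimal Python sketch (a software model, not HDL; the function names are made up for illustration). It writes the same 8-bit shift register once as a single assignment and once as an explicit state table, which needs 256 states times 2 input values, or 512 transitions.

```python
def shift_step(reg: int, serial_in: int) -> int:
    """One clock of an 8-bit left shift register: a single assignment."""
    return ((reg << 1) | (serial_in & 1)) & 0xFF


def build_state_table() -> dict:
    """The same register spelled out as an explicit 256-state machine."""
    table = {}
    for state in range(256):
        for bit in (0, 1):
            table[(state, bit)] = ((state << 1) | bit) & 0xFF
    return table


STATE_TABLE = build_state_table()


def run_both(bits, start=0):
    """Clock both descriptions with the same serial input stream."""
    by_assignment = by_table = start
    for bit in bits:
        by_assignment = shift_step(by_assignment, bit)
        by_table = STATE_TABLE[(by_table, bit)]
    return by_assignment, by_table
```

Both descriptions behave identically; the difference is only in how much has to be written down and reviewed.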

Complex sequential logic is easiest for me to describe using clocked processes with lots of local variables.

The state machine model is a clocked process with a single enumerated local variable or signal. With a state editor as your design entry, this is all you can do.

With synthesis tools, you can have many local variables of many types in a single clocked process. You might have a local counter register and a shift register, etc. One page of clear code replaces pages of circles and arrows.

I expect that he could not really find anyone faster. Keep honing your skills, and politely ignore the "schedule as whip" tactic.

-- Mike Treseler

Reply to
Mike Treseler

Do what I did when I had to implement the Rijndael S-boxes: Write a small program that takes an abstract representation and dumps the init strings. Heck, I may end up having to write one myself, if so, I'd release it.
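A small program of that kind might look like the Python sketch below. It is only an illustration under stated assumptions: the ROM contents are a plain list of integers, each init string is 256 bits wide and printed as 64 hex digits, and the lowest address sits in the rightmost digits. The function name and the printed attribute syntax are hypothetical, so check the exact format against your vendor's block RAM documentation.

```python
def pack_init_strings(words, word_bits=16, init_bits=256):
    """Pack ROM words into fixed-width hex strings, lowest address rightmost."""
    if init_bits % word_bits:
        raise ValueError("word width must divide the init string width")
    per_string = init_bits // word_bits
    mask = (1 << word_bits) - 1
    # Pad with zeros so the last init string is completely filled.
    padded = list(words) + [0] * (-len(words) % per_string)
    strings = []
    for base in range(0, len(padded), per_string):
        value = 0
        for offset, word in enumerate(padded[base:base + per_string]):
            value |= (word & mask) << (offset * word_bits)
        strings.append(f"{value:0{init_bits // 4}X}")
    return strings


# Example: three 16-bit ROM words (arbitrary illustrative values).
for index, text in enumerate(pack_init_strings([0x1234, 0xABCD, 0x00FF])):
    print(f'INIT_{index:02X} => X"{text}"')
```

The abstract representation (state names, transitions, outputs) would be turned into the `words` list by a separate step; only the packing is shown here.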

--
Nicholas C. Weaver                                 nweaver@cs.berkeley.edu
Reply to
Nicholas C. Weaver

Using RAM/ROM for a FSM is about as basic as it gets. I remember using that method some 20 years ago about the same time fuse PLDs were starting to appear. Back then fuse ROMs were just about this same size.

Peter Alfke wrote:

--
Rick "rickman" Collins

rick.collins@XYarius.com
Reply to
rickman

Reply to
Peter Alfke

Hey Peter -

I think it's a great solution, especially with the large number of outputs that can be generated. I also like the multiplier as barrel-shifter. I designed a 32-bit barrel shifter running at 27MHz in an XC4000 part ten years ago and it took up the whole part.

Robert

Reply to
Robert Sefton

Some even included registers, to make single chip state engines ...

"pencil

It is a good idea, but the SW tool side could need work to help it take off.. :) It should be FAST, and quite tolerant of state code revisions - with good tools, the state table could be revised without a P&R :)

Re the OP of large state machines, some form of nested engines might be a better idea - 200 states is plausible, but they are not likely to have fully random path-links, so a simpler 'umbrella state', with waits from 'task states' could be easier to manage...

-jg

Reply to
Jim Granville

Did you think my comment was a criticism? It clearly is a good way to implement a complex or even not so complex FSM. As long as you won't miss the block RAM I can't see anything wrong. It would be very easy to set up a simple table of present state, next state with outputs. In fact, any time I design a complex FSM, I do that anyway. HDL is nice, but often a table makes the output assignments a lot more clear.
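As an illustration of such a table, here is a minimal Python sketch. The three states, the single input, and the `req` and `busy` outputs are invented for the example; the outputs in each row are taken to be the values registered together with the next state. Building the table also checks that no row is duplicated and that every state has a row for every input value, which is the kind of gap a table makes easy to spot.

```python
ROWS = [
    # present,   inp, next,      req, busy
    ("IDLE",     0,   "IDLE",    0,   0),
    ("IDLE",     1,   "REQUEST", 1,   1),
    ("REQUEST",  0,   "REQUEST", 1,   1),
    ("REQUEST",  1,   "DONE",    0,   1),
    ("DONE",     0,   "IDLE",    0,   0),
    ("DONE",     1,   "IDLE",    0,   0),
]


def build(rows):
    """Turn the rows into a lookup table and check it for completeness."""
    table = {}
    for present, inp, nxt, req, busy in rows:
        key = (present, inp)
        if key in table:
            raise ValueError(f"duplicate row for {key}")
        table[key] = (nxt, {"req": req, "busy": busy})
    states = {p for p, _ in table} | {n for n, _ in table.values()}
    missing = [(s, i) for s in sorted(states) for i in (0, 1)
               if (s, i) not in table]
    if missing:
        raise ValueError(f"no row for {missing}")
    return table


def run(table, inputs, state="IDLE"):
    """Step the machine once per input and record state and outputs."""
    trace = []
    for inp in inputs:
        state, outputs = table[(state, inp)]
        trace.append((state, outputs["req"], outputs["busy"]))
    return trace


TABLE = build(ROWS)
```

The same rows could be fed to a generator that emits HDL or ROM contents, so the table stays the single source of truth.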

Peter Alfke wrote:

--
Rick "rickman" Collins

rick.collins@XYarius.com
Reply to
rickman

I think it's a great idea. But I'm probably biased because I learned hardware back in the days when people built that sort of state machine out of real ROMs and various gates and I think writing that type of firmware is fun.

(Registered ROMs were a big advance - they saved a whole chip.)

I think the main advantage of this approach is that it changes a hardware problem (state machine) into a software problem (microcode). At some level of complexity (number of states or instructions) it just seems easier to think about the problem as software.

Your idea would probably get used a lot more if there was an example all worked out. In particular, you need an assembler so that other people can use it as a skeleton. And as others have mentioned, you need an example that shows how the microcode gets through the tool chain and merged into the FPGA bit stream.

What's a good toy example? Can we think of something semi-useful (toy) that would run on a demo board? It would need 50-100 states. I might be willing to hack together some software if somebody would do the rest of the work.

Has anybody considered using LUT sized ROMs for state machines? It doesn't seem likely to be practical but might make an interesting exercise. The classic traffic light controller or vending machine might fit.

--
The suespammers.org mail server is located in California.  So are all my
other mailboxes.  Please do not send unsolicited bulk e-mail or unsolicited
Reply to
Hal Murray

If you have an FPGA with lots of RAM, doing it in microcode is surely feasible.

256 states
3 possible next states for each state (8 additional bits per "possible new state")
100 outputs (rounding the word up to 128 bits leaves 4 spare bits, so 104 outputs is OK)

256 * (104 + (3 * 8)) = 256 * 128 = 32768 bits or 4 KB.

If the clock frequency does not need to be extremely high, you can build the state machine using deeper, narrower SRAMs:

512 * 64 @ 2 clocks per state machine update
1024 * 32 @ 4 clocks per state machine update
2048 * 16 @ 8 clocks per state machine update
4096 * 8 @ 16 clocks per state machine update
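The arithmetic above is easy to redo for other sizes. The Python sketch below (function names are made up for illustration) computes the microcode word width and total ROM size from the number of states, next-state candidates, and outputs, then lists the deeper, narrower organisations together with the clocks needed per update.

```python
def rom_bits(states, next_choices, outputs):
    """Return (state_bits, word_bits, total_bits) for the microcode ROM."""
    state_bits = max(1, (states - 1).bit_length())
    word_bits = outputs + next_choices * state_bits
    return state_bits, word_bits, states * word_bits


def aspect_ratios(total_bits, full_width, narrowest=8):
    """List (depth, width, clocks_per_update), halving the width each step."""
    shapes = []
    width, clocks = full_width, 1
    while width >= narrowest:
        shapes.append((total_bits // width, width, clocks))
        width //= 2
        clocks *= 2
    return shapes


state_bits, word_bits, total = rom_bits(states=256, next_choices=3, outputs=104)
print(f"{state_bits} state bits, {word_bits}-bit words, "
      f"{total} bits = {total // 8 // 1024} KB")
for depth, width, clocks in aspect_ratios(total, word_bits):
    print(f"{depth} * {width} @ {clocks} clocks per state machine update")
```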

The update will read the SRAM into a shadow register, one slice at a time, and will update the outputs simultaneously during the last clock. The next state is determined from the inputs, which select one of the three possible "next" states (S1, S2, S3) by controlling a 3->1 multiplexer. The output of the multiplexer can be written to the "state" register. If you want to stay in the same state, then you do not write to the state register.

You can reduce the number of inputs you react to by having additional output bits in the RAM which controls input multiplexers. You need to figure out the maximum number of inputs any state will react to and create that many input multiplexers.

The outputs of those muxes and the "current" state can then be used as the address of a second, 4-bit SRAM. Its output is used to select the next state (Keep, S1, S2, S3).
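A behavioural model can help check such a scheme before committing it to RAM contents. The Python sketch below is one possible reading of the description, with single-cycle updates and two input multiplexers; the class names, the three-state toy program, and the input numbering (0 = start, 1 = error, 2 = done) are all assumptions made for the example.

```python
from dataclasses import dataclass

KEEP, S1, S2, S3 = range(4)


@dataclass(frozen=True)
class MicroWord:
    outputs: int         # output bits driven while in this state
    next_states: tuple   # the three candidate next states (S1, S2, S3)
    input_select: tuple  # which external input each input mux passes on


class Sequencer:
    def __init__(self, rom, decision_ram, state=0):
        self.rom = rom
        self.decision_ram = decision_ram  # models the second small SRAM
        self.state = state

    def clock(self, inputs):
        word = self.rom[self.state]
        selected = tuple(inputs[i] for i in word.input_select)
        choice = self.decision_ram.get((self.state, selected), KEEP)
        if choice != KEEP:  # Keep means the state register is not written
            self.state = word.next_states[choice - 1]
        return self.rom[self.state].outputs


ROM = [
    MicroWord(0b001, (1, 0, 0), (0, 0)),  # idle: both muxes watch "start"
    MicroWord(0b010, (2, 0, 0), (2, 1)),  # busy: watch "done" and "error"
    MicroWord(0b100, (0, 0, 0), (0, 0)),  # finish: always back to idle
]
DECISION = {
    (0, (1, 1)): S1,                   # start seen: go to busy
    (1, (1, 0)): S1,                   # done, no error: go to finish
    (1, (0, 1)): S2,                   # error: back to idle
    (1, (1, 1)): S2,                   # error wins over done
    (2, (0, 0)): S1, (2, (1, 1)): S1,  # unconditional
}


def run(input_vectors):
    seq = Sequencer(ROM, DECISION)
    return [seq.clock(vector) for vector in input_vectors]
```

Any (state, inputs) combination missing from the decision table defaults to Keep, which mirrors not writing the state register.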

Lots of SRAM and very little logic. Would work with a 5k-gate FPSLIC...

--
Best Regards
Ulf at atmel dot com
Reply to
Ulf Samuelsson

Hal Murray wrote: [...]

How about something better than just a toy, but nearly as easy to describe? Something that could be donated to opencores and would actually be reused would surely be more worth someone's time (while still useful to students and/or others wanting to see a real world example of this). I'd be interested in helping on a project like that.

Here is my idea that takes up a non-trivial amount of space in an FPGA:

Ethernet (especially GbE) has the ability to send PAUSE frames (I'll just call them packets since that is what many call them). While a device is receiving one of these packets, it must verify that it is valid (say 16 bits at a time, for 64 bytes, including CRC). Once verified, it outputs a hold signal for the amount of time specified in the PAUSE packet. The hold signal can be used to stop transmitting data in the opposite direction.

This is a simple state machine and could easily be put into BRAM(s), using the BRAM to compare and validate each 16 bit word. There are actually two different packets that are valid to receive (they differ only in the first six bytes), hence there are two valid states for the first couple of words, after which they will recombine to a single valid state thereafter.
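A software model of that word-by-word check might look like the Python sketch below. The field values are the standard ones for a PAUSE frame (reserved multicast destination 01-80-C2-00-00-01, MAC Control type 0x8808, PAUSE opcode 0x0001). Everything else is an assumption for illustration: the function name, treating the source address and padding as don't-care, and taking the CRC result from a separate checker via `crc_ok`.

```python
PAUSE_MULTICAST = (0x0180, 0xC200, 0x0001)  # 01-80-C2-00-00-01 as 16-bit words
MAC_CONTROL_TYPE = 0x8808
PAUSE_OPCODE = 0x0001
FRAME_WORDS = 32                            # 64 bytes, 16 bits at a time


def check_pause_frame(words, station_mac_words, crc_ok=True):
    """Return the pause_time field if the frame is a valid PAUSE, else None."""
    if len(words) != FRAME_WORDS:
        return None
    # Two parallel paths through the destination address, then they merge.
    multicast_path = unicast_path = True
    pause_time = None
    for index, word in enumerate(words):
        if index < 3:
            multicast_path = multicast_path and word == PAUSE_MULTICAST[index]
            unicast_path = unicast_path and word == station_mac_words[index]
            if not (multicast_path or unicast_path):
                return None
        elif index == 6 and word != MAC_CONTROL_TYPE:
            return None
        elif index == 7 and word != PAUSE_OPCODE:
            return None
        elif index == 8:
            pause_time = word
    return pause_time if crc_ok else None
```

In a BRAM version, the word index and the path flags would form the state, and the comparison values would be the ROM contents.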

Now imagine this for a multi-port system. 24 ports aren't uncommon on systems anymore, and you'd need a state machine for each port (or one larger [and much faster] one that does context switches) to verify the reception of the packets (possibly simultaneously).

You also need a "multi-port" timer that signals when it is ok to start transmitting data again. Having 24 stand-alone timers seems like quite a waste (although they can be quite slow since the unit of measure of the "pause time" field is 512 bit times, which is 512 ns at GbE rates).
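One way to picture a shared timer is a small memory holding the remaining pause time for each port, with a single decrementer that visits every port once per pause quantum. The Python sketch below models only that behaviour; the class and method names are hypothetical, and in hardware the loop would be a time-multiplexed walk through a RAM rather than 24 separate counters.

```python
class SharedPauseTimer:
    """One decrementer plus a per-port memory instead of 24 counters."""

    def __init__(self, ports=24):
        self.remaining = [0] * ports  # models a small RAM, one word per port

    def load(self, port, pause_quanta):
        # A newly received PAUSE frame replaces any time still remaining.
        self.remaining[port] = pause_quanta

    def tick_quantum(self):
        """Call once per pause quantum; visits every port in turn."""
        for port, value in enumerate(self.remaining):
            if value:
                self.remaining[port] = value - 1

    def hold(self, port):
        """True while the port must stop transmitting."""
        return self.remaining[port] > 0
```

A pause time of zero clears the hold immediately, which matches loading a zero into the port's entry.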

I'm not sure why it wouldn't be practical, except that the amount of resources saved (a couple LUTS) may not be worth the effort involved (unless there was a program that just spit out the LUT contents for you, as you have been discussing).

Marc

Reply to
Marc Randolph

Especially bearing this in mind...

"it worked , after handing over the design to the test department they loaded a board and signed off on the design within a week. My fellow engineers were rather impressed since they knew how complex the control was"

This would seem to demonstrate that careful up front design saved time in test and verification. I'd point that out to your 'manager' and his boss before telling him where to shove his job (after you've found something better).

Nial

------------------------------------------------ Nial Stewart Developments Ltd FPGA and High Speed Digital Design

Reply to
Nial Stewart
