Forking in One-Hot FSMs

Having two bits hot in a one-hot FSM would normally be a bad thing. But I was wondering if anybody does this purposely, in order to fork, which might be a syntactically nicer way to make a concurrent FSM. It would imply that multiple states in the FSM could be active at once. Here is an example:

parameter STATE1=1, STATE2=2, STATE3=3, ... ; // state defs

casex (state)
...
if (state[STATE1]) begin
   if (condition) begin
      ...

Reply to
Kevin Neilson

Sorry; there was not supposed to be a case statement here.

Reply to
Kevin Neilson


The example might not show this well, but you may want to fork from several different states, and the length of the fork, before it dies, could be several states. So if I just have the single state, I would have to manually branch out through the state tree and figure out all the states I could possibly be in while the "fork" was operating, and then add the fork logic to all those states. If that makes sense. Anyway, it's completely unmaintainable, because when you add a new state to the machine you have to figure out whether the pipeline is supposed to be full at that time and remember to add in the logic for that.
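
A minimal sketch of the kind of fork and join being described; the state names, 'condition', and the module wrapper are made-up placeholders, not actual code from the design:

module fork_join_fsm (input clk, input rst, input condition, output done);
   // One-hot state register in which more than one bit may be hot.
   parameter MAIN=0, A1=1, A2=2, B1=3, B2=4, JOIN=5;
   reg [5:0] state;
   assign done = state[JOIN];
   always @(posedge clk)
      if (rst)
         state <= 6'b000001;                  // only MAIN hot
      else begin
         if (state[MAIN] && condition) begin
            state[MAIN] <= 1'b0;
            state[A1]   <= 1'b1;              // fork: two bits go hot
            state[B1]   <= 1'b1;
         end
         if (state[A1]) begin state[A1] <= 1'b0; state[A2] <= 1'b1; end
         if (state[B1]) begin state[B1] <= 1'b0; state[B2] <= 1'b1; end
         if (state[A2] && state[B2]) begin    // join: wait for both
            state[A2]   <= 1'b0;
            state[B2]   <= 1'b0;
            state[JOIN] <= 1'b1;
         end
      end
endmodule

If the branches have different lengths, the bit of the branch that finishes first simply parks at its last state until the other arrives; that is the join.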

-Kevin

Reply to
Kevin Neilson

Kevin,

I wonder about this too. I am currently doing a pipeline, and some code is shown below. I wrote out the states without an array so that when ModelSim comes up I don't have to expand the states to see them. I also group signals that I want to see associated with the states in the declarative region, so I don't have to futz too much with ModelSim.

Some states stay on longer, governed by the state_count condition. I read one header record on state31 and follow it with three data records of another type on state32.

Now the "branching" happens because these states address a NoBL SRAM and there is a two cycle lag between the address and the data. Not show below, I also have clock delays on these states, state32_1, state32_2, and so on so when the address goes out on state32, I then have data to process on state32_2.

In my Zen thinking about this, I always have a When state associated with every What.

It actually gets deeper than that because there are FIFOs involved as well. You'll need FIFOs in your design if you are going to tackle a Sobel function. Here the trick is to start thinking about your processes from the READ data: figure out how many delays you need to deliver an answer, then figure out where the WRITE data should marry into the flow. I now have states out to _7.
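
One hedged way to picture that bookkeeping is a matching delay line; the seven-stage depth comes from the post, everything else here is invented:

module delay_match #(parameter DEPTH=7, W=8)
   (input clk, input [W-1:0] wr_data, output [W-1:0] wr_data_7);
   // Delay the WRITE-side data DEPTH cycles so it "marries" the
   // READ-side answer, which takes DEPTH cycles to come back
   // (RAM lag plus processing stages, out to _7).
   reg [W-1:0] pipe [0:DEPTH-1];
   integer i;
   always @(posedge clk) begin
      pipe[0] <= wr_data;
      for (i = 1; i < DEPTH; i = i + 1)
         pipe[i] <= pipe[i-1];
   end
   assign wr_data_7 = pipe[DEPTH-1];
endmodule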

Perhaps someone could suggest a better term than state machine "forking"? And are there any guidelines on how to code and debug a pipelined architecture? I'm with Kevin; it gets real messy, real soon.

Brad Smallridge AiVision

inner_cell_state_machine : process(clk)
begin
  if (clk'event and clk='1') then
    inner_cell_restart ...

Reply to
Brad Smallridge

DEC used that style of design in the PDP-16 Register Transfer Modules. Possibly also in the control units of some of their asynchronous processors such as the PDP-6 and KA10.

Reply to
Eric Smith

There is no requirement that a process/block must update only a single register named 'State'.

When I look at large, textbook-style state machine examples, like the ones in this thread, I imagine a much simpler process that updates several smaller registers. Maybe an input reg, an output reg, a couple of counters, and a few well-named booleans.
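
A rough Verilog sketch of that style, with an invented record format: one clocked block, no big state vector, just a few small, well-named registers:

module small_regs (input clk, input rst,
                   input [7:0] din, input din_valid,
                   output reg [7:0] dout, output reg dout_valid);
   reg       header_seen;  // saw the header record
   reg [1:0] word_cnt;     // counts the three data records
   always @(posedge clk)
      if (rst) begin
         header_seen <= 1'b0;
         word_cnt    <= 2'd0;
         dout_valid  <= 1'b0;
      end else begin
         dout_valid <= 1'b0;
         if (din_valid) begin
            if (!header_seen)
               header_seen <= 1'b1;        // header record
            else begin
               dout       <= din;          // data record passes through
               dout_valid <= 1'b1;
               word_cnt   <= word_cnt + 1;
               if (word_cnt == 2'd2) begin // third of three
                  header_seen <= 1'b0;
                  word_cnt    <= 2'd0;
               end
            end
         end
      end
endmodule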

-- Mike Treseler

Reply to
Mike Treseler

That has been on my mind because there is a DRAM on my board. Not only will the DRAM require more cycles, but perhaps also a varying number of cycles, depending on whether the addressing is sequential or random.

I have seen controllers on the Xilinx site, but nothing that talks about several ports and how the handshaking is handled. My FAE has said that some multiport examples are available.

Brad Smallridge AiVision

Reply to
Brad Smallridge


Except for the most special case examples, DRAM access will be a variable delay because of page changes and memory refresh.

Trying to design a state machine that simply *accesses* memory directly for some algorithmic purpose would likely result in a difficult-to-maintain design.

Designing a request/acknowledge interface to some other process or entity (in this case the 'other' being a DRAM controller) results in a much easier-to-maintain design.
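
A minimal sketch of the requesting side of such an interface; the signal names are assumed, and a command is taken on any edge where req is high and waitrequest is low:

module req_side (input clk, input rst,
                 input start, input [23:0] start_addr,
                 input waitrequest,          // from the controller
                 output reg req, output reg [23:0] addr);
   // The FSM never counts controller cycles; it only watches the
   // handshake.
   wire accepted = req && !waitrequest;
   always @(posedge clk)
      if (rst)
         req <= 1'b0;
      else if (start) begin
         req  <= 1'b1;                      // present a command...
         addr <= start_addr;
      end else if (accepted)
         req <= 1'b0;                       // ...and drop it once taken
endmodule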

Using the exact same interface signal functionality whether one is talking to internal FPGA memory, NoBL SRAM, SDRAM, or SPI results in a design that can be reused, retargeted, and improved upon if necessary.

Using the same signal naming and functionality as an existing documented specification (e.g. Avalon, Wishbone) allows others to (re)use your design without getting bogged down in details that they are not currently interested in, and allows them (and you, when you re-use the design) to be more productive.

Figure out where you are and where you want to be in the design productivity chain. The synthesis cost in terms of logic resources is zero; the upfront learning cost will start to pay back in the form of quicker debug and reusable designs.

Kevin Jennings

Reply to
KJ

That's interesting--I'm not even familiar with an "asynchronous processor". What does that mean? -Kevin

Reply to
Kevin Neilson

...

...

This is a great example, because switching from one type of RAM to another means you *do* have to change everything, if you want the controller to be good. You can certainly modularize the code and make concurrent SMs with handshaking, and this is easy to maintain. And a lot of DRAM controllers are designed this way. But here is the problem: while you are waiting around for acknowledges, you have just wasted a bunch of memory bandwidth. If you want to make better use of your bandwidth, you can't use handshaking. You have to start another burst while one is in the pipe. You have to look ahead in the command FIFO to see if the next request is going to be in the same row/bank, to see if you need to close the row during this burst and precharge, or if you can continue in the same open row in a different bank, etc. If I do all that with handshaking, I'm frittering away cycles. And to do this in a way that doesn't fritter away cycles with standard methodology means everything is so tightly bound together that changing from SDRAM to some other type of RAM means I have to tear up most of the design.
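
A sketch of that look-ahead, with made-up names; a real SDRAM controller's decision logic is considerably hairier:

module row_lookahead (input clk, input burst_active, input fifo_valid,
                      input [1:0]  cur_bank,  input [12:0] cur_row,
                      input [1:0]  next_bank, input [12:0] next_row,
                      output reg keep_row_open, output reg precharge_req);
   // Peek at the next command in the FIFO while the current burst is
   // still running, and decide what to do with the open row.
   wire row_hit = (next_bank == cur_bank) && (next_row == cur_row);
   always @(posedge clk)
      if (burst_active && fifo_valid) begin
         keep_row_open <=  row_hit;   // continue in the same open row
         precharge_req <= !row_hit;   // close it during this burst
      end else begin
         keep_row_open <= 1'b0;
         precharge_req <= 1'b0;
      end
endmodule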

Another issue I came up with today in the design of my current SM is that I updated a value x and then in the next cycle realized I wanted the old value of x. But I hadn't really updated x; I had issued a request that gets put into a matching delay line and then goes to a concurrent FSM, which then updates x. So even though I had "updated" x, I could still use the old value for a few cycles and didn't need a temporary storage register. Again, I can't just send the request to update x and then wait for an ack, because the SM has to keep on trucking. This is confusing, and I'd like to have some sort of methodology that would be as efficient as what I'm doing but somewhat more abstract.
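
A hedged sketch of that delay-line trick, with DELAY and the widths as placeholders: the update request and the new value ride down matching pipes, so readers of x keep seeing the old value until the request pops out the far end:

module deferred_update #(parameter DELAY=3, W=16)
   (input clk, input upd_req, input [W-1:0] x_new,
    output reg [W-1:0] x);             // readers still see the old x here
   reg [DELAY-1:0] req_pipe;
   reg [W-1:0]     val_pipe [0:DELAY-1];
   integer i;
   always @(posedge clk) begin
      req_pipe    <= {req_pipe[DELAY-2:0], upd_req};
      val_pipe[0] <= x_new;
      for (i = 1; i < DELAY; i = i + 1)
         val_pipe[i] <= val_pipe[i-1];
      if (req_pipe[DELAY-1])
         x <= val_pipe[DELAY-1];       // x changes only DELAY cycles later
   end
endmodule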

-Kevin

Reply to
Kevin Neilson

The methodology I use makes use of every clock cycle: DRAMs running full tilt, transfers from a fast FPGA through a PCI bus to some other processor, etc., the whole nine yards.

Then you're waiting for the wrong acknowledgement. Taking the DRAM again as an example, every data transfer consists of two parts: address/command and data. During a memory write, all of this happens on the same clock cycle. When the controller 'fills up', it asserts the wait request to hold things off until it can accept more commands (reads or writes).

During a read, though, the address/command portion happens on one clock cycle; the actual delivery of the data back to the requestor occurs sometime later. The state machine that requests the read does not necessarily have to wait for the data to come back before starting up the next read. The acknowledge that comes back from a 'memory read' command means that the request to read has been accepted and another command (read or write) can now be started. There are also situations where one really does need to wait until the data is returned to continue on, but in many data processing applications the data can lag significantly with no real impact on performance; the read requests can be queued up as fast as the controller can accept them.

Although I've been using the DRAM as an example, nothing in the handshaking or methodology is 'DRAM specific', it is simply having to do with transmitting information (i.e. writing) and requesting information (i.e. reading) and having a protocol that separates the request for information from the delivery of that information (i.e. specifically allowing for latency and allowing multiple commands to be queued up).
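
A sketch of the requesting side under that kind of protocol (names assumed): read commands are issued back-to-back, and the data phase is tracked separately by a valid flag:

module queued_reads (input clk, input rst,
                     input rd_req, input waitrequest,
                     input rd_valid, input [31:0] rd_data,
                     output reg [3:0] in_flight,
                     output reg [31:0] dout);
   // Command phase: a read is accepted whenever rd_req=1 and
   // waitrequest=0. Data phase: rd_valid flags the returned word,
   // possibly many cycles later. Neither phase blocks the other.
   wire accepted = rd_req && !waitrequest;
   always @(posedge clk)
      if (rst)
         in_flight <= 4'd0;
      else begin
         case ({accepted, rd_valid})
            2'b10:   in_flight <= in_flight + 1;
            2'b01:   in_flight <= in_flight - 1;
            default: ;                 // both or neither: no change
         endcase
         if (rd_valid)
            dout <= rd_data;
      end
endmodule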

I disagree.

That's correct...but you can't start one if the pipe is full (which can happen when a memory refresh or a page change occurs and the pipe fills up waiting while those things get serviced). The handshake tells you that the pipe is full, and you absolutely need to have it. The 'pipe full' signal is a handshake: when it is full, it says 'wait'; when it is not full, it says 'got it'.

OK

Then you're not doing it properly. It's called pipelining, not frittering.

Latency can matter in certain situations; in others it doesn't. If there were some situation where latency mattered, one would have to come up with a way for the requestor to start up the read cycle earlier...but if there is such a way to start it up earlier, then that change could be applied equally well to the lower-latency situation, which means that you could have a common design.

Kevin Jennings

Reply to
KJ

Kev> That's interesting--I'm not even familiar with an "asynchronous
Kev> processor". What does that mean?

There's no central clock. At any given time, one particular "unit" in the computer is active. When it completes its work, it sends a pulse to the next unit that needs to do something, thus handing off control.

In some situations, a unit might trigger two other units. Usually in such a case, a later unit implements a "join" between the two paths, by waiting for both to complete.

The logic implementing such a control system looks just like a flowchart.

There were quite a few asynchronous computers in the old days, but the world settled on synchronous designs for various reasons. In recent years there has been a resurgence of interest in asynchronous designs, partly due to the possibility of power savings. There are still no mainstream asynchronous processors, though.

Reply to
Eric Smith

Eric Smith wrote: (snip)

Sometimes also known as "self-timed logic", and probably easier to search under that name. (snip)

There are rumors of asynchronous functional modules, such as multipliers or dividers. That might make more sense in current systems than a completely asynchronous design.

-- glen

Reply to
glen herrmannsfeldt

The industry trend for the last few years has been GALS, Globally Asynchronous, Locally Synchronous, for many good reasons, including:

- managing the clock skew across a large design is hard, expensive, and power hungry.

- it's a natural paradigm when you want to run islands at different speeds, or even power them down for power savings.

I expect it could also be a life saver, isolating a speed path to just its island rather than impacting the whole chip.

I don't know how common this is in FPGA design, but the LPRP reference design uses a handful of clocks.

Tommy

Reply to
Tommy Thorn
