pipelined algorithm, flow control

Hi,

one RTL coding style for pipelined processing goes as follows:

- set the arguments of a function that has latency, e.g. a memory lookup or a multiplication

- set a trigger bit (or multi-bit word) in a parallel shift register

- when the trigger arrives at the output, continue processing

- cascade multiple stages, e.g. a memory lookup first, whose output triggers a multiplication, and so on
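
The steps above can be sketched as a cycle-by-cycle behavioural model. This is Python rather than RTL, and everything named here (`PipelinedStage`, `clock`, the `ROM` contents, the 2-cycle lookup and 3-cycle multiply latencies) is an assumption made up for illustration, not taken from any particular design. Each stage delays its result by a fixed number of clocks while a parallel shift register carries the trigger bit, and the trigger leaving one stage starts the next.

```python
from collections import deque


class PipelinedStage:
    """Fixed-latency function with a trigger shift register running in parallel."""

    def __init__(self, func, latency):
        self.func = func
        # One slot per clock of latency: data pipeline and trigger shift register.
        self.data = deque([None] * latency, maxlen=latency)
        self.trig = deque([False] * latency, maxlen=latency)

    def clock(self, arg, trigger):
        """Advance one clock; return (result, trigger_out) leaving the stage."""
        out, out_trig = self.data[0], self.trig[0]
        # Appending to a full deque drops the oldest (leftmost) entry.
        self.data.append(self.func(arg) if trigger else None)
        self.trig.append(trigger)
        return out, out_trig


ROM = [10, 20, 30, 40]  # hypothetical lookup table
lookup = PipelinedStage(lambda addr: ROM[addr], latency=2)
scale = PipelinedStage(lambda x: x * 3, latency=3)

stimulus = {0: 1, 1: 3, 4: 0}  # cycle -> address presented together with a trigger
results = []                   # (cycle, value) pairs seen at the final output
for cycle in range(12):
    addr = stimulus.get(cycle)
    mem_out, mem_trig = lookup.clock(addr, addr is not None)
    # The trigger coming out of the lookup starts the multiplication.
    prod, prod_trig = scale.clock(mem_out, mem_trig)
    if prod_trig:
        results.append((cycle, prod))
```

With these assumed latencies every result appears 2 + 3 = 5 clocks after its trigger, so `results` ends up as `[(5, 60), (6, 120), (9, 30)]`. Several samples are in flight at once, and no stage needs to know anything about its neighbours beyond the trigger.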

My question is: is there a commonly accepted and proven way to code this in RTL? I see the above emerging as a common thread in my own code. Maybe there are some "RTL design patterns", or a 10-volume "The Art Of RTL Coding", that discuss such ideas?

mnentwig

I'm wondering how others handle this too. I've done lots of pipelined designs, but don't have a consistent design style for these kinds of things. I've used spreadsheets with "time" along one axis and state along the other, block diagrams, diagramming state values alongside the registers, and other haphazard strategies.

I don't usually code an explicit handshake - after all, the pipeline delays are fixed in the end. Calculating the "fixed" value can sometimes be tricky, but you end up with a fixed delay between "data in valid" and "data out valid". (This may vary, for instance with a parameterized number of stages, but it is still fixed once the parameters are set.) You do have to be careful to match latencies where paths come together, which is again tricky, but fixed.
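
The latency-matching bookkeeping can be written down as a small calculation. The sketch below is Python, not RTL, and the helper `balance`, the branch names and their latencies are made-up examples: each path that must arrive at a common point is padded with extra registers up to the slowest branch, and the valid delay is the padded depth.

```python
def balance(latencies):
    """Return (padding per branch, common latency) so all branches line up."""
    total = max(latencies.values())
    padding = {name: total - lat for name, lat in latencies.items()}
    return padding, total


# Hypothetical branches feeding one adder, latencies in clock cycles.
branches = {"coeff_rom": 2, "multiplier": 3, "bypass": 0}
padding, valid_delay = balance(branches)
```

Here `padding` comes out as `{"coeff_rom": 1, "multiplier": 0, "bypass": 3}` and `valid_delay` as 3. When the stage count is a parameter, the same calculation is simply redone per configuration.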

So I often lay down the "datapath" with the "din_valid" -> "dout_valid" delay simply running alongside it in an SRL with a tuneable depth.

I often struggle with the "definition" of registers: which ones are just "pipeline" registers, and which are the actual "z^-1" delays of the filter you're trying to design. There's some kind of trick here that I know I'm just missing. (I'm usually designing with a non-systolic clock, i.e. my processing clock has no relation to my sampling "clock".)

So, not much advice here, just noting that I see the same issues....

Regards,

Mark

Mark Curry

Hi Mark,

Thanks for the comments. I did one design twice: once with a hard-coded 15-state FSM, and once with shift registers. The first is more readable, the second slightly smaller. But that may be because I usually end up with most registers unused, while LUTs are the bottleneck.

Keeping multiple samples "in flight" for the hardware multiplier is so much work that I'm looking into bit-serial multipliers, maybe that's more rapid-prototyping-friendly for audio.
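
For comparison, the serial approach can be modelled in a few lines. This is a Python sketch of the shift-and-add form, which consumes one multiplier bit per clock. The function name, the unsigned operands and the `width` parameter are assumptions for illustration. A fully bit-serial design would also stream the multiplicand and the result one bit at a time, and audio samples would normally need signed handling.

```python
def bit_serial_multiply(a, b, width):
    """Multiply unsigned a by unsigned b, consuming one bit of b per clock."""
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    acc = 0
    for cycle in range(width):      # one loop iteration models one clock
        if (b >> cycle) & 1:        # current multiplier bit, LSB first
            acc += a << cycle       # add the shifted multiplicand
    return acc
```

The trade is `width` clocks per product in exchange for a single adder and no in-flight bookkeeping, which can be acceptable when the processing clock is much faster than the sample rate.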

Cheers

Markus

mnentwig
