Wishbone comments

After implementing the Wishbone interface for main memory access from JOP, I see several issues with the Wishbone specification that make it not the best choice for a SoC interconnect.

The Wishbone interface specification is still in the tradition of microcomputer or backplane buses. However, for a SoC interconnect, which is usually point-to-point, this is not the best approach.

The master is required to hold the address and data valid throughout the whole read or write cycle. This complicates the connection to a master that has the data valid for only one cycle. In this case the address and data have to be registered *before* the Wishbone interconnect, or an expensive (in time and resources) MUX has to be used. A register results in one additional cycle of latency. A better approach would be to register the address and data in the slave. Then there is also time to perform the address decoding in the slave (before the address register).
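
A minimal sketch of what I have in mind, with made-up signal names (sel is assumed to be the address decode result, computed before the register):

    library ieee;
    use ieee.std_logic_1164.all;

    -- Sketch of a slave that registers address and write data itself.
    -- The address decode (sel) happens before the register, so the
    -- master needs to present address and data for a single cycle only.
    entity slave_reg is
      port (
        clk     : in  std_logic;
        sel     : in  std_logic;                      -- address decode result
        stb     : in  std_logic;                      -- strobe from the master
        addr_in : in  std_logic_vector(17 downto 0);
        din     : in  std_logic_vector(31 downto 0);
        addr_q  : out std_logic_vector(17 downto 0);  -- registered address
        dout_q  : out std_logic_vector(31 downto 0)   -- registered write data
      );
    end slave_reg;

    architecture rtl of slave_reg is
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if sel = '1' and stb = '1' then
            addr_q <= addr_in;
            dout_q <= din;
          end if;
        end if;
      end process;
    end rtl;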

There is a similar issue for the output data from the slave: as it is only valid for a single cycle, it has to be registered by the master when the processor is not reading it immediately. Therefore, the slave should keep the last valid data at its output even when wb.stb is no longer asserted (which costs no additional hardware).
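
Again a sketch, assuming the memory device flags with rd_valid (a made-up name) the single cycle in which new data arrives:

    library ieee;
    use ieee.std_logic_1164.all;

    entity slave_rd_hold is
      port (
        clk      : in  std_logic;
        rd_valid : in  std_logic;                     -- new data this cycle
        rd_data  : in  std_logic_vector(31 downto 0); -- data from the device
        dout     : out std_logic_vector(31 downto 0)  -- held slave output
      );
    end slave_rd_hold;

    architecture rtl of slave_rd_hold is
    begin
      -- The output register is loaded only when new data arrives and
      -- holds the last valid value afterwards, so the master can pick
      -- it up later without an extra register of its own.
      process (clk)
      begin
        if rising_edge(clk) then
          if rd_valid = '1' then
            dout <= rd_data;
          end if;
        end if;
      end process;
    end rtl;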

The Wishbone connection for JOP resulted in an unregistered Wishbone memory interface and registers for the address and data in the Wishbone master. However, for a fast address and control output (tco) and a short setup time (tsu) we want the registers in the IO pads of the FPGA. With the registers buried in the WB master it takes some effort to set the right constraints for the synthesizer to implement such IO registers.
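
For Quartus, one way I know of is the Altera-specific useioff attribute on the output port (a sketch with made-up names; check the Quartus documentation for the details):

    library ieee;
    use ieee.std_logic_1164.all;

    entity io_reg_example is
      port (
        clk       : in  std_logic;
        addr_int  : in  std_logic_vector(17 downto 0);
        sram_addr : out std_logic_vector(17 downto 0)
      );
    end io_reg_example;

    architecture rtl of io_reg_example is
      -- Tell Quartus to place the output register into the IO cell
      -- for a minimal tco.
      attribute useioff : boolean;
      attribute useioff of sram_addr : signal is true;
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          sram_addr <= addr_int;
        end if;
      end process;
    end rtl;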

The same issue holds for the control signals: the translation from the wb.cyc, wb.stb and wb.we signals to ncs, noe and nwe for the SRAM is on the critical path.
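
One way around this (not what the WB spec prescribes) is to register the decoded control signals in the slave so that ncs, noe and nwe come straight from flip-flops, ideally in the IO cells. The sketch below assumes a sel signal from the address decoder; the price is that the state machine has to issue the controls one cycle earlier:

    library ieee;
    use ieee.std_logic_1164.all;

    entity sram_ctrl is
      port (
        clk : in  std_logic;
        sel : in  std_logic;   -- address decode for this SRAM
        stb : in  std_logic;
        we  : in  std_logic;
        ncs : out std_logic;
        noe : out std_logic;
        nwe : out std_logic
      );
    end sram_ctrl;

    architecture rtl of sram_ctrl is
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          ncs <= not (sel and stb);
          noe <= not (sel and stb and not we);
          nwe <= not (sel and stb and we);
        end if;
      end process;
    end rtl;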

The ack signal arrives too late for a pipelined master. We would need to know *earlier* when the next data will be available, and this is possible, as the slave knows when the data from the SRAM will arrive. A workaround is a non-WB-conforming early ack signal.
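
A sketch of such an early ack for a fixed two-cycle SRAM access; start is assumed to be a single-cycle pulse derived from cyc/stb (edge detection not shown):

    library ieee;
    use ieee.std_logic_1164.all;

    entity early_ack is
      port (
        clk       : in  std_logic;
        reset     : in  std_logic;
        start     : in  std_logic;   -- first cycle of the access
        ack_early : out std_logic;   -- one cycle before the data
        ack       : out std_logic    -- classic ack, with the data
      );
    end early_ack;

    architecture rtl of early_ack is
      signal pipe : std_logic_vector(1 downto 0);
    begin
      -- The request is simply shifted through; for the fixed latency
      -- the slave knows in advance when the data will arrive.
      process (clk, reset)
      begin
        if reset = '1' then
          pipe <= (others => '0');
        elsif rising_edge(clk) then
          pipe <= pipe(0) & start;
        end if;
      end process;
      ack_early <= pipe(0);
      ack       <= pipe(1);
    end rtl;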

Because the data registers are not inside the WB interface, we need an extra WB interface for the Flash/NAND interface (on the Cyclone board). We cannot afford the address decoding and a MUX in the data read path without registers; this would add an extra cycle to the memory read due to the combinational delay.

In the WB specification (AFAIK) there is no way to perform pipelined reads or writes. However, for blocked memory transfers (e.g. a cache load) this is the usual way to get good performance.

Conclusion -- I would prefer:

  • Address and data (in/out) registers in the slave
  • A way to know earlier when data will be available (or a write has finished)
  • Pipelining in the slave

As a result of this experience I'm working on a new SoC interconnect definition (working name SimpCon) that should avoid the mentioned issues and still be easy to implement on both the master and the slave side.

As there are so many projects available that implement the WB interface, I will provide bridges between SimpCon and WB. For IO devices the arguments above do not apply to the same extent, as the pressure for low-latency access and pipelining is not high. Therefore, a bridge to WB IO devices can be a practical solution for design reuse.

A question to the group: what SoC interconnect are you using? A standard one for the peripheral devices and a 'home-brewed' one for the more demanding connections (e.g. external RAM access)?

Martin

Reply to
Martin Schoeberl

AMBA AXI.

Cheers, Jon

Reply to
Jon Beniston

A follow-up to my lamenting ;-)

I started to define and implement a 'new' SoC interconnect (yes, just another interconnect 'standard'). However, we will see how far this gets.

The idea for (some) pipeline support is twofold:

1.) The slave will provide more information than a single ack signal or wait states. It will (if it is capable) signal to the master the number of clock cycles remaining until the read data is available (or the write has finished). This feature allows a pipelined master to prepare for the upcoming read (see the sketch after this list).

2.) If the slave can provide pipelining, the master can use overlapped wr or rd requests. The slave has a static output port that tells how many pipeline stages are available. I call this the 'pipeline level': 0 means non-overlapping; at level 1 a new rd/wr request can be issued in the same cycle in which the former data is read; at level 2 one cycle earlier; and 3 is the maximum level, where you get full pipelining on the basic read cycle with one wait state (command - read - wait - result). A timing example follows below.
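
To make (1) concrete, here is a sketch of such a countdown for an SRAM slave with one wait state; rdy_cnt is my working name for the signal, and the details may still change:

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity sc_slave_sketch is
      port (
        clk     : in  std_logic;
        reset   : in  std_logic;
        rd      : in  std_logic;           -- single-cycle read command
        rdy_cnt : out unsigned(1 downto 0) -- cycles till data is valid
      );
    end sc_slave_sketch;

    architecture rtl of sc_slave_sketch is
      signal cnt : unsigned(1 downto 0);
    begin
      -- A read takes command - read - wait - result, so the counter
      -- starts at 2 and counts down; 0 means the data is available.
      -- When a new command is accepted the counter restarts, and the
      -- data of the previous command is delivered in the cycle where
      -- its own count would have reached 0.
      process (clk, reset)
      begin
        if reset = '1' then
          cnt <= "00";
        elsif rising_edge(clk) then
          if rd = '1' then
            cnt <= "10";
          elsif cnt /= 0 then
            cnt <= cnt - 1;
          end if;
        end if;
      end process;
      rdy_cnt <= cnt;
    end rtl;

And for (2), pipeline level 2 with the same slave would look something like this (simplified):

    cycle   :   1     2     3     4     5     6
    command : rd A        rd B        rd C
    rdy_cnt :   -     2     1     2     1     2
    data    :                     A           B

The master issues the next read while the previous one is still in flight (when rdy_cnt is 1), so a new word arrives every two cycles instead of every three.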

At the moment the draft of the spec is a few sketches on real paper - it takes some time to draw all the diagrams for a document (BTW, does anybody know a tool for quickly drawing timing diagrams?).

The spec is still not written, but I've implemented it in JOP for the Cyclone and for the Spartan-3. Both sub-projects now use the same memory interface, and the Spartan version benefits from the bytecode block cache that was up to now only available in the Cyclone version.

If you are interested in the implementation, download the sources from [1] or from the CVS [2] at opencores.org. You can find the SimpCon master in mem_sc.vhd and a slave for a 32-bit SRAM in sc_sram32.vhd. The master uses a pipeline level of 2 for the bytecode cache load. There is also a ModelSim simulation available at modelsim/sc.bat.

Comments are very welcome, Martin

[1] formatting link
[2] cvs -d :pserver: snipped-for-privacy@cvs.opencores.org:/cvsroot/anonymous -z9 co -P jop
Reply to
Martin Schoeberl

A draft of the specification is available at:

formatting link

Martin

Reply to
Martin Schoeberl
