After implementing the Wishbone interface for main memory access from JOP, I see several issues with the Wishbone specification that make it less than ideal as a SoC interconnect.
The Wishbone interface specification is still in the tradition of microcomputer or backplane buses. However, a SoC interconnect is usually point-to-point, and for that a bus-style approach is not the best fit.
The master is required to hold the address and data valid throughout the whole read or write cycle. This complicates the connection to a master that has the data valid for only one cycle. In that case the address and data have to be registered *before* the Wishbone connection, or an expensive (in time and resources) MUX has to be used. A register results in one additional cycle of latency. A better approach would be to register the address and data in the slave. Then there is also time to perform address decoding in the slave (before the address register).
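To make the latency argument concrete, here is a minimal cycle-level sketch in Python. It is a hypothetical model, not code from JOP or from the Wishbone specification; the function `pins_over_time` and its `reg_stages` parameter are made up for illustration. The master drives the address for one cycle only (`None` stands for "not valid"), and each register stage is assumed to have an enable that loads only a valid input.

```python
from typing import List, Optional

def pins_over_time(master_out: List[Optional[int]], reg_stages: int) -> List[Optional[int]]:
    """Address seen at the SRAM pins in each clock cycle (hypothetical model).

    master_out: address driven by the master per cycle (None = not valid).
    reg_stages: number of address registers between master and pins.
                Each register loads only when its input is valid.
    """
    regs: List[Optional[int]] = [None] * reg_stages
    pins: List[Optional[int]] = []
    for value in master_out:
        # During the cycle the pins show the last register (or the raw bus).
        pins.append(regs[-1] if regs else value)
        # Clock edge: update from the last stage backwards so each stage
        # takes the value its predecessor held *before* this edge.
        for i in reversed(range(reg_stages)):
            src = value if i == 0 else regs[i - 1]
            if src is not None:
                regs[i] = src
    return pins
```

With no register the address is gone after the first cycle, which is why Wishbone requires the master to hold it. A single register, placed in the slave where it can also serve as the IO-pad register, brings the address to the pins one cycle later. A register in the master followed by a second register in the slave costs yet another cycle.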
There is a similar issue with the output data from the slave: as it is only valid for a single cycle, it has to be registered by the master when the processor does not read it immediately. Therefore, the slave should keep the last valid data at its output even when wb.stb is no longer asserted (which is not an issue in terms of hardware complexity).
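A minimal sketch of the difference, again as a hypothetical Python model: `read_data_out` and `hold_last` are illustrative names, and `None` models read data that is not guaranteed to be valid outside the ack cycle.

```python
from typing import List, Optional

def read_data_out(sram_data: List[Optional[int]], ack: List[bool],
                  hold_last: bool) -> List[Optional[int]]:
    """Value on the slave's read-data output in each clock cycle.

    hold_last=False: data is only valid in the ack cycle (None otherwise),
                     so a processor that reads later needs a master register.
    hold_last=True:  the slave's data register keeps the last value,
                     so the processor can read it whenever it is ready.
    """
    out: List[Optional[int]] = []
    last: Optional[int] = None
    for data, a in zip(sram_data, ack):
        if a:
            last = data  # load the slave's data register on ack
            out.append(data)
        else:
            out.append(last if hold_last else None)
    return out
```

For a read acknowledged in cycle 2 that the processor consumes in cycle 4, only the `hold_last=True` variant still presents the data without an extra register in the master.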
The Wishbone connection for JOP resulted in an unregistered Wishbone memory interface, with registers for the address and data in the Wishbone master. However, for a fast clock-to-output time (tco) on the address and control signals and a short setup time (tsu), we want the registers in the IO pads of the FPGA. With the registers buried in the WB master, it takes some effort to set the right constraints for the synthesizer to implement them as IO registers.
The same issue applies to the control signals. The translation from the wb.cyc, wb.stb and wb.we signals to ncs, noe and nwe for the SRAM is on the critical path.
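For illustration, here is a plausible combinational decode of the Wishbone control signals into the active-low SRAM strobes, written as a Python function. The exact mapping is an assumption, not JOP's actual VHDL, and it ignores SRAM write-pulse timing. In an unregistered WB slave this logic sits between the master's registers and the pins, so its delay adds directly to tco.

```python
from typing import Tuple

def sram_controls(cyc: bool, stb: bool, we: bool) -> Tuple[bool, bool, bool]:
    """Map wb.cyc/wb.stb/wb.we to active-low (ncs, noe, nwe).

    For these active-low signals, True means the pin is high (inactive).
    """
    access = cyc and stb            # a valid bus cycle is in progress
    ncs = not access                # chip select for any access
    noe = not (access and not we)   # output enable only for reads
    nwe = not (access and we)       # write enable only for writes
    return ncs, noe, nwe
```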
The ack signal comes too late for a pipelined master. We would need to know *earlier* when the next data will be available, and this is possible, as the slave knows when the data from the SRAM will arrive. A workaround is a non-WB-conforming early ack signal.
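The slave can know this in advance because the SRAM access time is fixed. The following hypothetical Python sketch models a slave that counts down the remaining wait states and derives both signals from that counter; `slave_timeline`, `wait_states` and `early` are illustrative names, and `early_ack` stands for the non-WB-conforming early ack.

```python
from typing import List, Tuple

def slave_timeline(wait_states: int, early: int) -> List[Tuple[int, bool, bool]]:
    """Per-cycle (remaining, ack, early_ack) for one SRAM read started in cycle 0.

    remaining: cycles until the read data is valid (known to the slave).
    ack:       conventional acknowledge, asserted when the data is valid.
    early_ack: non-WB signal, asserted `early` cycles before ack so that a
               pipelined master can prepare its next step in time.
    """
    if not 0 < early <= wait_states:
        raise ValueError("early must be in 1..wait_states")
    timeline = []
    for cycle in range(wait_states + 1):
        remaining = wait_states - cycle
        timeline.append((remaining, remaining == 0, remaining == early))
    return timeline
```

With three wait states and `early=1`, early_ack appears in cycle 2 and ack in cycle 3, giving the master one cycle of advance notice.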
Because the data registers are not inside the WB interface, we need an extra WB interface for the Flash/NAND interface (on the Cyclone board). We cannot afford address decoding and a MUX in the data read path without registers: the combinational delay would result in an extra cycle for the memory read.
In the WB specification there is (AFAIK) no way to perform pipelined reads or writes. However, for block memory transfers (e.g., a cache load) pipelining is the usual way to get good performance.
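A back-of-the-envelope model shows why pipelining matters for a cache load. This is an idealized Python sketch under stated assumptions: every access has the same `latency` in cycles, a non-pipelined master issues the next request only after the previous ack, and a pipelined slave accepts one new request per cycle. The function name and parameters are hypothetical.

```python
def block_read_cycles(n_words: int, latency: int, pipelined: bool) -> int:
    """Idealized cycle count for reading n_words consecutive words."""
    issue = 0  # cycle in which the current request is issued
    done = 0   # cycle by which the current request has completed
    for _ in range(n_words):
        done = issue + latency
        # Non-pipelined: wait for the ack. Pipelined: issue again next cycle.
        issue = issue + 1 if pipelined else done
    return done
```

For an 8-word cache line and a latency of 3 cycles this gives 24 cycles without pipelining versus 10 with it, i.e., n * latency versus latency + n - 1.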
In conclusion, I would prefer:
- Address and data (in/out) registers in the slave
- A way to know earlier when data will be available (or a write has finished)
- Pipelining in the slave
As a result of this experience, I'm working on the definition of a new SoC interconnect (working name SimpCon) that should avoid the issues mentioned above while keeping both master and slave easy to implement.
As there are so many projects available that implement the WB interface, I will provide bridges between SimpCon and WB. For IO devices the arguments above do not apply to the same extent, as the demand for low-latency access and pipelining is not as high. Therefore, a bridge to WB IO devices can be a practical solution for design reuse.
A question to the group: what SoC interconnect are you using? A standard one for the peripheral devices and a 'home-brewed' one for more demanding connections (e.g., external RAM access)?
Martin