In a basic ring, only N segments are needed to close a loop of N nodes, memory controller included. Double that (2N) for a fully bidirectional, doubly linked ring bus.
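The segment count is simple arithmetic, sketched below. The 36-bit link width is an assumption (it matches the memory controller's 36-bit port and ignores any extra valid/flow-control wires), and the helper names are hypothetical.

```python
def ring_segments(n_nodes: int, bidirectional: bool = False) -> int:
    """Point-to-point links needed to close a ring of n_nodes (memory controller included)."""
    if n_nodes < 2:
        raise ValueError("a ring needs at least two nodes")
    # One link from each node to its downstream neighbour closes the loop.
    per_direction = n_nodes
    # A doubly linked ring repeats the loop in the opposite direction.
    return 2 * per_direction if bidirectional else per_direction


def ring_link_wires(n_nodes: int, link_width: int = 36, bidirectional: bool = False) -> int:
    """Total data wires, ASSUMING every segment is link_width bits wide."""
    return ring_segments(n_nodes, bidirectional) * link_width


print(ring_segments(8))                      # 8
print(ring_segments(8, bidirectional=True))  # 16
print(ring_link_wires(8))                    # 288
```

Note that the wire count per node stays constant as nodes are added; only the number of segments grows.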
Why use a ring bus?
- Nearly immune to wire delays, since each node inserts pipelining flip-flops (FFs) into the bus and buffer control is distributed, so no wire has to span more than one hop per clock cycle (a big plus for ASICs)
- Low signal count (all things being relative) at the memory controller:
  - 36 bits input (muxed command/address/data/etc.)
  - 36 bits output (muxed command/address/data/etc.)
- Same interface regardless of how many memory clients are on the bus
- Can double as a general-purpose modular interconnect, which is useful for node-to-node burst transfers such as DMA
- Bandwidth and latency can be tailored by shuffling components, inserting extra memory controller taps or adding rings as necessary
- Basic arbitration is provided for free by node ordering
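To illustrate how per-node pipeline registers and position-based arbitration interact, here is a minimal cycle-level sketch of a unidirectional slotted ring. It assumes one register per node (one hop per clock) and a simple rule: traffic already on the ring always passes through, and a node may insert only into an empty slot. The names and the rule are illustrative assumptions, not any specific design.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    src: int
    dst: int
    tag: str
    injected_at: int


def simulate_ring(n_nodes, queues, cycles):
    """queues maps node -> list of (dst, tag). Returns (cycle, node, packet) deliveries."""
    # regs[i] is the pipeline register at node i's output, feeding node (i + 1) % n_nodes.
    regs = [None] * n_nodes
    pending = {i: deque(queues.get(i, [])) for i in range(n_nodes)}
    delivered = []
    for cycle in range(cycles):
        new_regs = [None] * n_nodes
        for i in range(n_nodes):
            incoming = regs[(i - 1) % n_nodes]  # what reaches node i this cycle
            if incoming is not None and incoming.dst == i:
                delivered.append((cycle, i, incoming))
                incoming = None  # the slot is freed at its destination
            if incoming is not None:
                new_regs[i] = incoming  # pass-through traffic always wins
            elif pending[i]:
                dst, tag = pending[i].popleft()  # empty slot: insert local traffic
                new_regs[i] = Packet(i, dst, tag, cycle)
        regs = new_regs
    return delivered


# Node 0 is the memory controller; nodes 1..3 each send two requests to it.
queues = {1: [(0, "A0"), (0, "A1")], 2: [(0, "B0"), (0, "B1")], 3: [(0, "C0"), (0, "C1")]}
for cycle, node, pkt in simulate_ring(4, queues, 10):
    print(cycle, node, pkt.tag)
```

Requests arrive in the order C0, B0, A0, A1, B1, C1 on cycles 1 to 6: node 3, sitting just upstream of the controller, must wait until upstream traffic drains before inserting C1. That ordering falls out of node placement with no central arbiter; under sustained load this simple rule can starve downstream nodes, so practical rings often add some fairness or throttling scheme.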
The only major downside to ring buses is worst-case latency. That is not much of an issue for me, since my primary interest is video processing/streaming: I can simply preload one line ahead and pretty much forget about latency.
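A rough worst-case latency budget shows why one line of preload is plenty. The sketch below assumes one clock per hop, a 200 MHz ring clock and a 40-cycle memory-controller latency; those numbers are hypothetical placeholders. The 1080p60 line period (2200 total pixels per line at 148.5 MHz) is the standard timing.

```python
def round_trip_hops(n_nodes: int, client: int, controller: int = 0,
                    bidirectional: bool = False) -> int:
    if client == controller:
        raise ValueError("client and controller must be different nodes")
    # Hops travelling downstream (increasing index, wrapping) from client to controller.
    d = (controller - client) % n_nodes
    if not bidirectional:
        # Request: d hops. The response keeps going the same way: n_nodes - d hops.
        # Every client therefore sees exactly one full trip around the loop.
        return d + (n_nodes - d)
    # Bidirectional ring: each leg takes the shorter direction.
    return 2 * min(d, n_nodes - d)


def worst_case_latency_s(n_nodes, clock_hz, controller_cycles,
                         cycles_per_hop=1, bidirectional=False):
    worst_hops = max(round_trip_hops(n_nodes, c, 0, bidirectional)
                     for c in range(1, n_nodes))
    return (worst_hops * cycles_per_hop + controller_cycles) / clock_hz


latency = worst_case_latency_s(8, 200e6, 40)   # ASSUMED clock and controller latency
line_period = 2200 / 148.5e6                   # 1080p60 line period, about 14.8 us
print(f"{latency * 1e9:.0f} ns worst case, {line_period / latency:.1f}x headroom")
# 240 ns worst case, 61.7x headroom
```

Under these assumptions, even the worst-case round trip is a small fraction of one video line, so a one-line prefetch hides it completely.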
Flexibility, scalability and routability are what make ring buses so popular in modern large-scale, high-bandwidth ASICs and systems. It is all a matter of trading some up-front complexity and latency for long-term gain.
Since high-speed parallel buses end up needing pipelining to meet timing anyway, the complexity and area gap between multiple parallel buses and a ring-bus topology is shrinking.