DDR with multiple users

A basic ring needs only N segments to close the loop around N nodes, memory controller included. Double that for a fully doubly-linked bus.

Why use a ring bus?

- Nearly immune to wire delays since each node inserts bus pipelining FFs with distributed buffer control (big plus for ASICs)

- Low signal count (all things being relative) at the memory controller:
  - 36 bits input (muxed command/address/data/etc.)
  - 36 bits output (muxed command/address/data/etc.)

- Same interface regardless of how many memory clients are on the bus

- Can double as a general-purpose modular interconnect, which is useful for node-to-node burst transfers such as DMA

- Bandwidth and latency can be tailored by shuffling components, inserting extra memory controller taps or adding rings as necessary

- Basic arbitration is provided for free by node ordering (see the sketch after this list)
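
To make the pipelining and arbitration points above concrete, here is a minimal behavioral sketch in Python. The names (Flit, RingNode) and the one-flit-slot-per-hop model are assumptions for illustration, not the actual design:

    # Behavioral sketch of a slotted ring bus.  Each node owns one pipeline
    # register, so wire delay never spans more than one segment, and
    # arbitration falls out of node ordering: a node may only inject into
    # a slot that arrives empty.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Flit:
        src: int        # injecting node
        dst: int        # target node (e.g. 0 = memory controller)
        payload: int    # muxed command/address/data word

    class RingNode:
        def __init__(self, node_id: int):
            self.node_id = node_id
            self.reg: Optional[Flit] = None   # the node's bus pipelining FF
            self.tx = []                      # flits waiting to inject
            self.rx = []                      # flits delivered to this node

    def tick(nodes):
        """One bus clock: all registers shift at once, like real FFs."""
        slots = [nodes[i - 1].reg for i in range(len(nodes))]  # sample first
        for node, slot in zip(nodes, slots):
            if slot is not None and slot.dst == node.node_id:
                node.rx.append(slot)          # eject: flit has arrived
                slot = None                   # the slot frees up
            if slot is None and node.tx:
                slot = node.tx.pop(0)         # inject into the empty slot
            node.reg = slot                   # latch for the next hop

    # Node 0 plays the memory controller; nodes 1..3 are memory clients.
    ring = [RingNode(i) for i in range(4)]
    ring[1].tx.append(Flit(src=1, dst=0, payload=0xA5))
    ring[3].tx.append(Flit(src=3, dst=0, payload=0x5A))
    for _ in range(8):
        tick(ring)
    print([f.src for f in ring[0].rx])  # [3, 1]: arrival follows ring distance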

The only major downside to ring buses is worst-case latency. That is not much of an issue for me, since my primary interest is video processing/streaming: I can simply preload one line ahead and pretty much forget about latency.
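
To put rough numbers on that, a quick back-of-the-envelope check; the ring size, bus clock, and video timing below are assumed for illustration, not figures from any real design:

    # Worst-case ring latency vs. one video line of preload
    # (all numbers assumed for illustration).
    nodes = 8                      # assumed ring size, controller included
    bus_clock_mhz = 200.0          # assumed bus clock
    worst_case_cycles = nodes - 1  # a request may cross almost the whole ring
    worst_case_ns = worst_case_cycles / bus_clock_mhz * 1000.0

    active_pixels = 1920           # assumed video mode (1080p)
    pixel_clock_mhz = 148.5        # standard 1080p60 pixel clock
    line_time_ns = active_pixels / pixel_clock_mhz * 1000.0

    print(f"worst-case ring hop latency: {worst_case_ns:.1f} ns")  # ~35 ns
    print(f"one line of preload:         {line_time_ns:.1f} ns")   # ~12929 ns
    # Even doubled for a round trip, the ring's worst case disappears
    # inside a one-line prefetch buffer.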

Flexibility, scalability and routability are what makes ring buses so popular in modern large-scale, high-bandwidth ASICs and systems. It is all a matter of trading some up-front complexity and latency for long-term gain.

Since high-speed parallel buses end up needing pipelining to meet timing anyway, the complexity and area delta between multiple parallel buses and ring-bus topologies is shrinking.

--
Daniel Sauvageau
moc.xortam@egavuasd
Matrox Graphics Inc.
1155 St-Regis, Dorval, Qc, Canada
514-822-6000

Hi Daniel, it is very interesting to learn that there is a ring bus structure in use over there.

"Flexibility, scalability and routability are what makes ring buses so popular in modern large-scale, high-bandwidth ASICs and systems"

Could you please send me some reference papers about ring bus applications in ASICs or FPGAs?

Normally, what a designer is most concerned about in a bus structure is data latency. Thank you.

Weng


Weng,

It occurred to me that your circuit was identical to the ring buffer one. N users each had a FIFO going to the DDR. Then there was one stream coming out of the DDR, so that's N+1 interfaces. But then you said each user needs its own FIFO so it can store and forward data from the DDR. So you've effectively got 2N interfaces. The new FIFOs are just moved into the user's realm instead of being part of the DDR controller.

My point is the same circuitry exists in both cases. You've just exercised some creative accounting :).

-Dave

--
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Hi David, I am really surprised by what you recognized.

They are totally different.

This is ring topology:

A --> B --> C --> D
^                 |
+-----------------+

This is my topology:

A --> E --> | --> A
B --> E --> | --> B
C --> E --> | --> C
D --> E --> | --> D

Weng


Real-world ring-buses:

- IBM Power4 Multi-Chip-Module core-to-core interconnect

- IBM Power4 MCM-to-MCM interconnect

- IBM Power4 system-to-system interconnect

- ATI X1600/X1800 memory ring-bus

IBM made lots of noise about its ring bus architecture a few years ago, but I am pretty sure I read about something similar many years earlier. I am guessing Power5 must be using much of the same, even though IBM did not make as much noise about it.

--
Daniel Sauvageau
moc.xortam@egavuasd
Matrox Graphics Inc.
1155 St-Regis, Dorval, Qc, Canada
514-822-6000

Right, ok, I didn't understand what ring topology was, sorry. Snip out my reference to ring topology then, but my observation still stands. E is the DDR, right? Anyway, you don't show the FIFOs associated with A, B, C, D on the right side.

-Dave

--
David Ashley                http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

Hi David, yes, every component has a read FIFO internally. I don't have to show them in the interface; they are not part of the interface.

  1. We are talking about how to implement a DDR controller interface for multiple users;
  2. We are talking about its performance efficiency;
  3. We are talking about the minimum number of wires of the DDR controller interface;

An individual read FIFO for each component gives designers freedom: a buffer that isolates the many differences between the DDR and the components.

For example, they may use different clocks, different data rates, and so on.
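
A rough Python sketch of that arrangement (class and method names are hypothetical): controller E fans its single return stream out to a private read FIFO per client, and each FIFO is the elasticity buffer between the DDR rate and the client rate:

    from collections import deque

    class ReadFifo:
        """Per-client store-and-forward buffer on the return path."""
        def __init__(self, depth: int):
            self.q = deque()
            self.depth = depth

        def push(self, word) -> bool:         # filled at the DDR-side rate
            if len(self.q) >= self.depth:
                return False                  # backpressure to the controller
            self.q.append(word)
            return True

        def pop(self):                        # drained at the client's rate
            return self.q.popleft() if self.q else None

    class ControllerE:
        """Single DDR return stream fanned out to per-client read FIFOs."""
        def __init__(self, n_clients: int, fifo_depth: int = 16):
            self.fifos = [ReadFifo(fifo_depth) for _ in range(n_clients)]

        def return_data(self, client_id: int, word) -> bool:
            # One outgoing stream; a routing tag selects the client's FIFO.
            return self.fifos[client_id].push(word)

    e = ControllerE(n_clients=4)
    e.return_data(2, 0xDEAD)       # burst word tagged for client C
    print(hex(e.fifos[2].pop()))   # client C drains at its own pace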

Weng


Hi Daniel,

  1. IBM uses ring buses in its core-to-core interconnect. It is a good option there because latency among cores is not critical, and it saves designing a fast switch among multiple CPUs. Imagine, if there are 16 cores, how difficult it is to design the fastest switch among them! (See the sketch after this list.) And IBM has good experience with ring network systems.

  2. I guess you cannot list a second example of a ring bus being used inside a CPU, among its tens of registers. Why? Internally, nobody could afford the clock latency between any two registers.

  3. Your choice to use ring buses in your application is justified: you are not concerned about latency.
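
A quick wiring count makes the point in item 1 (illustrative arithmetic only, not IBM's actual figures):

    # Full crossbar vs. single ring among N cores.
    n = 16
    crossbar_paths = n * (n - 1)  # dedicated point-to-point paths
    ring_segments = n             # one segment per node closes the loop
    print(f"{n} cores: ~{crossbar_paths} crossbar paths vs {ring_segments} ring segments")
    # 240 paths vs. 16 segments: the ring trades worst-case latency for a
    # dramatically simpler physical design.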

Thank you for ring bus information.

Weng

