Block RAM vs Distributed RAM

Hi,

I have a query regarding RAM usage in FPGAs. Please help me understand it.

What's the rule of thumb for choosing block RAM vs. distributed RAM?

When should each be preferred, and why?

Thanks

Ashish


- If it must be dual-ported with a different clock on each port, use block RAM

- If it is large (more than 1 kbit), use block RAM

- If you have unused block RAM, use it ...

Use distributed RAM only when you can't use block RAM

Zara

BTW, these are guidelines. Like any guideline, they may be followed or not. It is your call, and you may choose differently if things such as timing or clock distribution indicate you should.
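The guidelines in this post can be sketched as a small decision helper. This is only an illustration in Python, not anything from a vendor tool: the function name `choose_ram` and its parameters are hypothetical, and the 1024-bit threshold simply mirrors the "more than 1 kbit" figure above.

```python
def choose_ram(bits, dual_clock=False, free_block_rams=0, threshold_bits=1024):
    """Return (choice, reason) following the three guidelines in order.

    All names here are hypothetical. threshold_bits=1024 mirrors the
    "more than 1 kbit" guideline and is not a hard limit.
    """
    if free_block_rams <= 0:
        # "Use distributed RAM only when you can't use block RAM"
        return ("distributed", "no block RAM left")
    if dual_clock:
        return ("block", "dual-ported with a different clock per port")
    if bits > threshold_bits:
        return ("block", "large memory")
    # Third guideline: unused block RAM is there to be used
    return ("block", "unused block RAM is available")


print(choose_ram(16 * 8, free_block_rams=0))
print(choose_ram(4096, free_block_rams=2))
```

As the post says, timing or clock distribution may justify overriding whatever such a helper returns.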


With each Virtex FPGA you get a certain number of BlockRAMs. Until you have used all of them, they are incrementally free, so use them, even if it may look wasteful (18 K or 36 K bit capacity).

Nice features include a programmable width/depth ratio, dual-ported operation, a read-before-write option, etc. You can even do one read-modify-write operation per clock cycle (using both ports).

One caveat: Reading is a synchronous operation; it only occurs after a clock edge.

Distributed RAM uses 16-bit LUTs and reads in a combinatorial way, and is faster, but requires more design effort.

Peter Alfke, Xilinx Applications
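To make the synchronous vs. combinatorial read distinction concrete, here is a minimal behavioral sketch in Python (not HDL, and not a model of any specific device). The class names `BlockRam` and `DistributedRam` and the default depths are assumptions made for illustration; the block RAM model assumes the read-before-write option mentioned above.

```python
class DistributedRam:
    """LUT RAM sketch: clocked write, combinatorial read."""

    def __init__(self, depth=16):
        self.mem = [0] * depth

    def read(self, addr):
        # Data follows the address directly; no clock edge is needed.
        return self.mem[addr]

    def clock(self, addr=0, we=False, din=0):
        # Writes still happen on a clock edge.
        if we:
            self.mem[addr] = din


class BlockRam:
    """Block RAM sketch: the output changes only on a clock edge."""

    def __init__(self, depth=1024):
        self.mem = [0] * depth
        self.dout = 0  # output latch, updated on each clock edge

    def clock(self, addr, we=False, din=0):
        # Read-before-write (assumed mode): capture the old contents first.
        self.dout = self.mem[addr]
        if we:
            self.mem[addr] = din


bram = BlockRam()
bram.clock(addr=3, we=True, din=42)
old = bram.dout        # old contents of address 3
bram.clock(addr=3)
new = bram.dout        # written value, visible one edge later

lut = DistributedRam()
lut.clock(addr=3, we=True, din=42)
now = lut.read(3)      # available without another clock edge
print(old, new, now)   # 0 42 42
```

The extra clock edge needed before `bram.dout` shows the written value is the one-cycle read latency that the caveat refers to.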


This is my main query. How does speed matter here? What is the rule of thumb from an operating-speed point of view?

I have some logic which uses counters, some configuration registers and a FIFO. Fortunately the same clock is used for all of them.

What would be the best choice for implementing these: BRAM or distributed RAM? Please advise.

I am using a Spartan-II device.

Thanks

Ashish


RAMs and that will be very difficult to avoid if your block RAM usage is scattered. I tried to implement a 16-port RAM design suggested earlier in this group, but with a lot of effort could only get a 10 ns delay.

Sumesh V S

The big advantage of distributed RAM is that it's everywhere on the chip, which means that PAR can put it near the logic that it's driving. Routing delays are the dominant factor in determining the overall performance of a design. When I look at the worst-case paths in .twr reports I generally see that 75% of the delay is due to interconnect, so anything that can be done to simplify PAR's placement job is helpful.

Sixteen-word FIFOs are deep enough for many applications; for example, I generally use them to couple different stages of my pipelines together.

The other advantage of distributed RAM is lower latency. The read port on distributed RAM is asynchronous, which saves a minimum of one clock cycle. In high-speed designs it's frequently necessary to double-pipeline the output of a Block RAM, i.e., use the output register that's included in V4 Block RAM plus an additional D flip-flop register to handle the interconnect delay, so the difference between distributed RAM and Block RAM can be as much as three cycles (although I would generally pipeline the output of the distributed RAM in those applications also, so the difference is two cycles).

In my experience it has always been pretty obvious which type of memory to use. Either you need a lot, in which case you use Block RAM, or you need a little, in which case you use distributed RAM. I've never run into a situation where I needed something intermediate: either 16 or, rarely, 32 words is enough, or I need 1K or more.

Josh Rosen
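The cycle counts in the post above can be checked with a few lines of arithmetic. This is a sketch only: the helper `read_latency_cycles` and its parameter names are hypothetical, and it counts register stages rather than modelling any real timing.

```python
def read_latency_cycles(sync_read, output_register=False, extra_pipeline_regs=0):
    """Clock cycles from presenting an address to having usable data.

    Each register stage on the read path costs one cycle (hypothetical model).
    """
    return (1 if sync_read else 0) + (1 if output_register else 0) + extra_pipeline_regs


# Block RAM: synchronous read + built-in output register + one extra flop
block = read_latency_cycles(sync_read=True, output_register=True, extra_pipeline_regs=1)

# Distributed RAM: asynchronous read, unpipelined and with one pipeline register
dist_raw = read_latency_cycles(sync_read=False)
dist_piped = read_latency_cycles(sync_read=False, extra_pipeline_regs=1)

print(block - dist_raw, block - dist_piped)  # 3 2
```

The two differences correspond to the "as much as three cycles" and "two cycles" figures quoted in the post.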
