VGA and framebuffer interface (Waste of BlockRAM)


Isaac Bosompem 20 years ago

Hi everyone, I have recently purchased an XC3S200-based board with 256KB Flash, 256KB platform flash and 32KB SRAM. Out of interest I figured I would design a simple SoC as a learning exercise. I have designed a VGA framebuffer which outputs 640x480 (but uses pixel doubling, so the stored image is 320x240 at 2 bits per pixel). A complete framebuffer is ~19KB.

At this point I decided I would have to read the framebuffer a line at a time. A scanline in this mode needs 80 bytes of memory.

Naturally I decided to infer a block RAM with 8-bit data width (well, 9-bit, but I am not using parity).

The problem though is that when the Block RAM is 8 bits wide, you get almost 2KB of space!! So that means I am wasting more than 90% of the space!!
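For reference, the sizes quoted above can be checked with a few lines of arithmetic. This is only a sketch: the constant names are made up for illustration, and the 2048-byte figure assumes the data portion of one block RAM in its byte-wide (2K x 8) configuration, with the parity bits ignored.

```python
# Back-of-the-envelope check of the buffer sizes discussed above.
# All names are illustrative; BRAM_BYTES assumes a block RAM used as 2048 x 8.
WIDTH, HEIGHT, BITS_PER_PIXEL = 320, 240, 2
BRAM_BYTES = 2048

line_bytes = WIDTH * BITS_PER_PIXEL // 8          # bytes in one scanline
frame_bytes = line_bytes * HEIGHT                 # bytes in a full frame
unused_fraction = 1 - line_bytes / BRAM_BYTES     # share of the BRAM left idle

print(f"scanline: {line_bytes} bytes")
print(f"frame: {frame_bytes} bytes ({frame_bytes / 1024:.2f} KB)")
print(f"unused block RAM with one line buffered: {unused_fraction:.1%}")
```

With a single 80-byte line buffered, roughly 96% of the byte-wide block RAM is left unused, which is consistent with the "more than 90%" estimate.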

I was looking into using eight 128x1 distributed RAMs and wiring them in parallel to extend the data word to 8 bits. I am not certain how much of my logic resources this would eat up.

I am fairly new to FPGAs, so I'm not certain if these are the best methods to buffer such a small amount of memory. What would you do if you were in my situation?

Regards

-Isaac


Isaac Bosompem 20 years ago

I just would like to also add that the unit is working perfectly. I just would like some suggestions. Thanks


wv9557 20 years ago

I don't know about Spartan-3. On a Spartan-II, you can hook 8 blocks of RAMB4_S1 (4096x1 each) in parallel, giving you a 4096x8 memory, i.e. a total of 4 KBytes.


abgoyal 20 years ago

Hi,

I am not sure I understand your architecture. Can you please describe exactly what you are doing with a single BRAM?

Pending further information, one thing is for sure: you don't want to use D-RAM as long as you can avoid it.

I may have completely misunderstood, but why can't you address the entire block RAM (with its 2K depth) using a combination of the horizontal and vertical address lines?

So suppose you keep your current form factor for the BRAM block (8 bits wide). Then you could hook the high 7 bits of the horizontal pixel counter up to the low seven address lines of the BRAM, giving you 128 addressable locations. Each location contains 4 pixels, which you can then multiplex onto the output with additional logic (as I presume you are already doing).

The high 4 address lines of the BRAM block can be connected to the low 4 bits of the vertical line counter. Thus, you would be using 16*80 = 1280 bytes of the total available 2K.

You would still have 4 additional vertical counter bits remaining, which means you will have to use a total of 16 BRAM blocks. This also implies you will need a 16-to-1 8-bit wide MUX.

Hope this helps.
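The address mapping described in this post can be sketched as a small function. This is an illustration only: `map_pixel` and its argument names are hypothetical, and it assumes the counters have already been divided down for pixel doubling, so `x` runs over 0..319 and `y` over 0..239.

```python
# Sketch of the proposed BRAM addressing (hypothetical names throughout).
# Assumes x and y are framebuffer coordinates after pixel doubling.
def map_pixel(x, y):
    """Return (bram_select, bram_addr, pixel_in_byte) for pixel (x, y)."""
    byte_in_line = x >> 2                         # high 7 bits of x: 0..79
    pixel_in_byte = x & 0x3                       # which of the 4 pixels in the byte
    bram_addr = ((y & 0xF) << 7) | byte_in_line   # low 4 bits of y on address bits 10..7
    bram_select = y >> 4                          # remaining 4 bits of y choose the BRAM
    return bram_select, bram_addr, pixel_in_byte

# Count how many byte addresses one BRAM actually uses for its 16 lines.
used = {map_pixel(x, y)[1] for y in range(16) for x in range(320)}
print(len(used))            # bytes used per BRAM
print(map_pixel(319, 239))  # last pixel of the frame
```

Each block RAM holds 16 lines at a stride of 128 bytes, so 1280 of its 2048 bytes carry pixel data. Under these assumptions a 240-line frame only reaches select values 0 to 14.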


Sylvain Munaut 20 years ago

You have two choices:

- Keep the 1-line prefetch architecture and use distributed RAM. 128x8 will take you 16 slices, but you then need a 4:1 mux to select between pixels.
- Continue with the block RAM and fetch 8 lines at a time; then you can use the asymmetric port width feature of the BRAM to select between pixels.
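To illustrate the second option, here is a toy software model of a memory written through a byte-wide port and read through a 2-bit port, so that the narrow port's address picks the pixel directly and no external mux is needed. The class and method names are invented for this sketch, and the bit ordering (pixel 0 in the least significant bits) is an assumption that should be checked against the device documentation.

```python
# Toy model of asymmetric port widths: 8-bit write port, 2-bit read port.
# Names are hypothetical; pixel ordering within a byte is an assumption.
class AsymmetricRam:
    def __init__(self, depth_bytes=2048):
        self.mem = [0] * depth_bytes

    def write_byte(self, addr, value):
        """Wide port: store one byte (four 2-bit pixels)."""
        self.mem[addr] = value & 0xFF

    def read_pixel(self, addr):
        """Narrow port: address = byte address * 4 + pixel index."""
        byte = self.mem[addr >> 2]
        return (byte >> (2 * (addr & 0x3))) & 0x3

ram = AsymmetricRam()
ram.write_byte(0, 0b11_10_01_00)   # pixels 0..3 hold the values 0, 1, 2, 3
pixels = [ram.read_pixel(i) for i in range(4)]
print(pixels)
```

In this model the two low bits of the read address do the job of the 4:1 pixel mux mentioned in the first option.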


John Adair 20 years ago

Memory in all FPGAs is relatively expensive and generally limited in size.

One Xilinx feature that is generally useful for video applications is the SRL16 mode of the LUTs. You get 16 bits of storage per LUT. With these you can build a line FIFO in either x8 or x1 format very efficiently. We often use these in conjunction with external memory for some of the video work we do and have 2 or 3 lines of data stored within the FPGA.

Be careful of the 128x1 macro. I am not sure if this is supported in Spartan-3 due to the fact that only half the LUTs can be configured as RAM in the Spartan-3.

John Adair Enterpoint Ltd. - Home of Raggedstone1. The Low Cost Spartan3 Development Board.


Sylvain Munaut 20 years ago

er ... 32 slices not 16 ...

but as John pointed out, the 128x1 macro might not work in spartan3/virtex4 ...


John_H 20 years ago

If your frame buffer is in the off-chip SRAM and you want the BlockRAM as the line buffer, don't look at the BlockRAM as wasting 90% of the space. Most people don't end up using all their BlockRAM making this an ideal use. If you implement the buffer in logic to avoid wasting the BlockRAM, you end up wasting 100% of the BlockRAM by not using it rather than the 90% you were concerned about. If you're trying to use the BlockRAMs for other functionality and are concerned about running out of memory *but* you have plenty of logic resources then the 40-45 LUTs (for 8-9 bits at 80 byte depth) is a good tradeoff.
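The 40-45 LUT estimate follows from each LUT providing 16 bits of storage. A rough sketch of that count, with a hypothetical helper name and ignoring any extra logic needed for address decoding or output multiplexing:

```python
import math

# Assumption: one LUT implements a 16 x 1 memory (distributed RAM or SRL16).
LUT_RAM_DEPTH = 16

def luts_for_buffer(depth, width):
    """Storage LUTs for a depth x width buffer; decode/mux logic not counted."""
    return math.ceil(depth / LUT_RAM_DEPTH) * width

print(luts_for_buffer(80, 8), luts_for_buffer(80, 9))
```

An 80-deep buffer needs 5 LUTs per data bit, which gives 40 LUTs for 8 bits and 45 for 9 bits.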

Another consideration: if the BlockRAM is used as a single port (you use one address to write during the blanking and that same address to read when it's active) you have a second single-port memory in that same BlockRAM to access the remainder of that 2kByte memory. To share resources like this usually requires that you instantiate the BlockRAM primitive rather than inferring the memory.

Another suggestion since you're concerned about reading the data during the blanking period: are you pushing the SRAM near its maximum clock rate? (Probably not if you're doing pixel doubling.) If you increase the clock speed with the DCM, you can increase the data throughput into and out of the external SRAM. The BlockRAM can take in data at the SRAM's maximum rate with ease (as will the SRLs). Using a DCM (Digital Clock Manager, I believe) requires a little more care in your design with the suggestion that you use the 1X clock output from the DCM to "phase match" the higher speed clock to the 1X clock rather than using the input clock that feeds the DCM.



Isaac Bosompem 20 years ago

Yes I am concerned about potentially running out of on-chip space. It is not a major issue right now. But I would like to see how you guys would handle it and I am very happy with the responses I got!

I'm sorry for confusing you guys, but I did instantiate the BlockRAM; I did not infer it.

I was thinking of allowing external modules to access the rest of the memory through the 2nd port. That would allow me to access the rest of the space in the BlockRAM when I need it in the future.

I am not pushing the SRAM to its maximum speed. The SRAM on my board has a 20 ns access time, so I get a little less than 50 MHz when taking setup and hold times into account. I might be able to use a lot more of the BlockRAM at that speed, but that would require me to utilize the 2nd port, making it unavailable to external entities. The framebuffer reader will make use of the 2nd port to stay ahead of the raster counters.

I will try and see if a clock multiply will help; thanks for the tip about the DCM. If you had not told me that, I would have used the original signal for the parts that run at 25 MHz.


John_H 20 years ago

The multiply makes life much easier. You can even run the entire design at 50 MHz but use a clock enable every other clock to get 25 MHz processing downline. With the clock-enabled configuration, it might become more obvious how you can still use the BlockRAM at 50 MHz without using the second port to do it.
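One possible reading of this suggestion, sketched as a cycle-by-cycle software model rather than HDL: everything is clocked at 50 MHz, the pixel logic advances only on cycles where the enable is high, and the remaining cycles are free for filling the line buffer through the same BlockRAM port. The function name and the slot labels are invented for the sketch, and the time-sharing of one port is an interpretation, not something stated in the post.

```python
# Software model of a 50 MHz design with a toggling clock enable.
# "display-read" and "fill-write" are hypothetical labels for the two slots.
def simulate(cycles):
    enable = False
    pixel_count = 0
    schedule = []
    for _ in range(cycles):
        enable = not enable              # enable is high every other 50 MHz cycle
        if enable:
            pixel_count += 1             # pixel logic effectively runs at 25 MHz
            schedule.append("display-read")
        else:
            schedule.append("fill-write")  # idle slot, usable for SRAM data
    return pixel_count, schedule

count, schedule = simulate(8)
print(count, schedule)
```

Over 8 fast cycles the pixel counter advances 4 times, and the port is used alternately for display reads and buffer fills.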
