You'd be better off on comp.arch.fpga - I've crossposted there and set followups there.
The answer to your questions depends on which of Xilinx's many chips you are going to use. You can do what you want with 9 Virtex-II architecture BlockRAMs (assuming each element needs only single-port access). There's plenty of space in each BRAM (each can hold 2048 bytes), and each BRAM has two ports. This will fit in even a smallish Spartan-3 device. Check the data sheets for the precise number of BRAMs in each device.