I am reworking a design I did a couple of years ago to fit a newer part. The original design used async RAM blocks since they fit the application better. Now I am forced to use registered synchronous block RAMs. This will add an extra clock cycle of delay on reads unless I find a way around it. But, clever guy that I am, I have come up with a couple of alternatives.
The source of the address is a register that will often be updated just before the cycle that needs to do the memory access. Call the old memory access cycle 1 and the cycle that calculates the address cycle 0. The write enable is not valid until clock cycle 1, but during clock cycle 0 the address is already on the input to the address register; call it "next address". If I run "next address" to the address inputs on the memory, I can start the memory read on clock cycle 0 and the timing works out. I can't do the same trick with the read or write enable since they depend on decoding that will not take place until cycle 1, where the memory access used to happen. Since the enable signal is not available early, the read will have to take place on every clock cycle, wasting some power whether I need the data or not. So it looks like I will have to do a read on every cycle using the "next address" and a write only when I need it using the "current address".
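To make the cycle counting concrete, here is a small behavioral sketch in Python (not HDL, and not tied to any vendor primitive). It compares the old arrangement, an address register feeding an async RAM, with a synchronous RAM whose address port is driven either by "next address" or, naively, by the address register output. The function names, the reset value of 0, and the memory contents are all made up for illustration.

```python
# Cycle-level sketch. Each loop iteration is one clock cycle: first we
# record what the logic sees during the cycle, then we apply the clock edge.

def async_ram_reads(mem, next_addrs):
    """Old design: address register followed by an async (combinational) RAM.

    next_addrs[t] is the value on the address register's input during cycle t.
    """
    addr_reg = 0                       # assumed reset value
    seen = []
    for next_addr in next_addrs:
        seen.append(mem[addr_reg])     # async read of the current address
        addr_reg = next_addr           # clock edge: register next address
    return seen


def sync_ram_reads(mem, next_addrs, use_next_address):
    """New design: the RAM samples its address port at the clock edge and
    presents the read data during the following cycle."""
    addr_reg = 0                       # assumed reset value
    dout = mem[0]                      # assumed to match the reset address
    seen = []
    for next_addr in next_addrs:
        seen.append(dout)
        # Clock edge: everything samples its pre-edge input.
        ram_addr_in = next_addr if use_next_address else addr_reg
        dout = mem[ram_addr_in]        # read is performed on every edge
        addr_reg = next_addr
    return seen


mem = [10, 11, 12, 13]
next_addrs = [3, 1, 2, 0]
print(async_ram_reads(mem, next_addrs))                         # [10, 13, 11, 12]
print(sync_ram_reads(mem, next_addrs, use_next_address=True))   # [10, 13, 11, 12]
print(sync_ram_reads(mem, next_addrs, use_next_address=False))  # [10, 10, 13, 11]
```

In this model, driving the RAM from "next address" gives the same data on the same cycles as the async RAM, while driving it from the address register output delivers each word one cycle late.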
I can use a dual port memory and connect one port for the read and one for the write. This can even be done on the same port if the address has a separate enable. Then I can use the address input register of the block RAM as the address register. The address is updated on every clock cycle and a read performed, except when the logic signals a write; then the write enable is asserted on cycle 1 and the address enable is removed so as to keep the same address that was latched on cycle 0. I see the Altera Cyclone 2 parts have an address enable that will let me hold the last address. I don't see an address enable on the Xilinx Spartan 3 parts and I am not sure about the Lattice ECP2 parts as I don't have the full data sheet.
The only downside to this "trick" is that it adds a bit of time to the logic path that updates the address register. But the actual RAM setup time seems to be pretty small and I expect the routing can be kept pretty minimal as well. So this timing impact by itself may not make the address setup the critical path. But I expect the overall timing to change significantly since the instruction fetch and other internal memory access will be greatly improved using the sync block RAM. So the address update may end up as the critical path after all.
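The single-port variant with an address enable can be sketched the same way. This is again a behavioral Python model, not a description of any real primitive: the class name, port names, reset values, and the read-during-write behavior (new data appears on the output) are assumptions and will differ between devices.

```python
class BlockRamWithAddrEnable:
    """Hypothetical single-port sync RAM with an address-register enable."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.addr = 0      # internal address register (assumed reset to 0)
        self.dout = 0      # read data visible during the current cycle

    def clock(self, addr_in, addr_en, we, din):
        """Apply one clock edge using the pre-edge input values."""
        if addr_en:
            self.addr = addr_in            # capture "next address"
        # With addr_en low, the previously latched address is held.
        if we:
            self.mem[self.addr] = din      # write to the held address
        # Assumption: read-during-write returns the newly written data.
        self.dout = self.mem[self.addr]


ram = BlockRamWithAddrEnable(8)
ram.mem[5] = 50

# Edge ending cycle 0: "next address" is 5, so the read starts early.
ram.clock(addr_in=5, addr_en=True, we=False, din=0)
print(ram.dout)          # read data available during cycle 1

# Cycle 1: decode asks for a write to the current address (5).
# The address enable is dropped, so the 2 on the address port is ignored.
ram.clock(addr_in=2, addr_en=False, we=True, din=99)
print(ram.mem[5], ram.mem[2])

# Following cycle: address enable restored, reads resume.
ram.clock(addr_in=2, addr_en=True, we=False, din=0)
print(ram.addr, ram.dout)
```

Note that in this model the address presented during the write cycle is not captured, so it would have to be presented again on the following cycle before its read can start.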
Am I missing something about how to best use a block ram? Any other ideas on how to do the read without adding a clock cycle?