I am using Avnet's Virtex-II Pro development kit. They have included a simple memory project with the board, which uses Xilinx EDK. I want to scrap their memory controller and EDK-based project (all the implementations are hidden behind countless wrappers) and design my own simple memory controller for the on-board SRAM, plus an FSM to handle transactions with their PCI bus controller (Spartan-IIE). Does anyone have a suggestion as to how to approach this? I have never designed a memory controller before. I am trying to use the board as a coprocessor in a rapid image classification system - it needs to receive data via the PCI bus and operate on it, then put it in a shared SRAM so that it can be accessed via the PCI bus.
Masters Student in Electrical/Computer Engineering
Those wrappers are inserted by EDK; they were not put there by Avnet. The actual source code that is being wrapped is in your EDK directory in the hw/XilinxProcessorIPLib/pcores directory. The wrapper used for the SRAM is the opb_emc core, as specified in the system.mhs file:
So to see the source, in the pcores library, go to the opb_emc_v1_10_b/hdl/vhdl directory. And there it is.
Of course, all this assumes you are using a processor. In the case of the Avnet projects, they use the built-in PPC processor. It sounds like this is overkill for your case.
Go to the Cypress website. Download the datasheet, and the VHDL/Verilog models for the SRAM. Put the SRAM HDL model in your testbench and write your interface code. An SRAM is simple to use; it barely justifies being called a "memory controller". You should read the datasheet and do it yourself; you will learn a lot more than by asking here.
The problem is that you generate an address, it takes some finite time to actually get out to the pins, then more time to get to the SRAM, then the 10 ns to get valid data, then more time to get back to the FPGA, and then a setup time to acquire latched_data. All of which adds up to more than the 10 ns cycle time of your clock. But because the tAA is a worst-case number, it looks like the actual times are adding up to very close to 10 ns, and hence you get unreliable data.
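Written out as a budget, the chain of delays described above looks roughly like this. The symbol names are my own shorthand, not datasheet names: t_co is the FPGA clock-to-pin delay, t_out and t_in are the board delays to and from the SRAM, t_su is the FPGA input setup time, and N is the number of clock periods allowed for the read. Only t_AA = 10 ns and T_clk = 10 ns come from the thread.

```latex
% Read timing budget for an asynchronous SRAM (symbol names are assumptions)
t_{co} + t_{out} + t_{AA} + t_{in} + t_{su} \le N \cdot T_{clk}

% N = 1, T_{clk} = 10\,\text{ns}, worst-case t_{AA} = 10\,\text{ns}:
t_{co} + t_{out} + t_{in} + t_{su} \le 0\,\text{ns} \quad \text{(cannot be met)}

% N = 2, same clock and device:
t_{co} + t_{out} + t_{in} + t_{su} \le 10\,\text{ns}
```

Since every term on the left is positive, the single-cycle case only appears to work when the real part is faster than its worst-case t_AA.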
With that clock rate and that device, you should use two clocks to do a read. Timing constraints won't fix it.
If you are trying to supply a new address on every clock and capture the data, then you need a faster device. For that kind of operation with a 100MHz clock, I would suggest that you use a synchronous SRAM.
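For the two-clock read suggested above, a minimal sketch might look like the following. The module name, port names, and widths are all hypothetical (nothing here is taken from the Avnet or Cypress files), and it is read-only for brevity: the address is registered on one edge, held for a full extra clock, and the data is captured two edges after the address was launched.

```verilog
// Hypothetical two-clock read of an asynchronous SRAM (sketch only).
// All names and widths are assumptions, not taken from any board file.
module sram_read_2cycle #(
    parameter AW = 18,                 // address width (assumed)
    parameter DW = 16                  // data width (assumed)
) (
    input  wire          clk,
    input  wire          rst,          // synchronous reset, active high
    input  wire          rd_req,       // high for one clock to start a read
    input  wire [AW-1:0] rd_addr,
    output reg  [AW-1:0] sram_addr,
    output wire          sram_ce_n,
    output wire          sram_oe_n,
    output wire          sram_we_n,
    input  wire [DW-1:0] sram_dq,      // read-only sketch: bus is never driven
    output reg  [DW-1:0] rd_data,
    output reg           rd_valid      // high for one clock when rd_data is new
);
    // Read-only: chip selected, outputs enabled, writes disabled.
    assign sram_ce_n = 1'b0;
    assign sram_oe_n = 1'b0;
    assign sram_we_n = 1'b1;

    localparam S_IDLE    = 2'd0,
               S_WAIT    = 2'd1,
               S_CAPTURE = 2'd2;

    reg [1:0] state;

    always @(posedge clk) begin
        if (rst) begin
            state    <= S_IDLE;
            rd_valid <= 1'b0;
        end else begin
            rd_valid <= 1'b0;                  // default: no new data
            case (state)
                S_IDLE: if (rd_req) begin
                    sram_addr <= rd_addr;      // address launched on this edge
                    state     <= S_WAIT;
                end
                S_WAIT:                        // address held for one more clock
                    state <= S_CAPTURE;
                S_CAPTURE: begin               // two clocks after the launch edge
                    rd_data  <= sram_dq;
                    rd_valid <= 1'b1;
                    state    <= S_IDLE;
                end
                default: state <= S_IDLE;
            endcase
        end
    end
endmodule
```

This gives one read every three clocks as written. Overlapping the next address launch with the capture state would tighten that, at the cost of a slightly more involved state machine.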
Thanks, but I think you're missing my point. I want to use the SRAM in a pipelined manner, capturing the data from the previous read in the same cycle that I present a new address. Assuming the delay of getting my address to the SRAM is roughly equivalent to the delay of getting the data back, I should only have to worry about the tOH (data hold time) of the SRAM being large enough. At 4 ns I'm assuming it is.
And in fact, at 50MHz I can kind of do it. Here's a snippet of my state machine:
S_READ: begin if (data_latched != data_expected) state
Hmmm... reading what you have below, I don't think so...
I think you have a fundamental misunderstanding there. The two delays mentioned add up, and along with the input setup time, need to be less than the clock cycle time. For the 100MHz clock previously mentioned, they do not.
Also these delays create an artificial hold time that makes the tOH of the memory chip completely irrelevant.
You seem to have switched boards on me, and I don't have Altera specs handy. But on the previous board mentioned with a Virtex2P chip, the registered inputs do not even have a hold time requirement, only the setup time.
Yes, 20 ns should be enough if the only thing there is the SRAM, and assuming the Altera timing is similar to the Xilinx timing. But you need to check several things. First, when you say the flash and ethernet are fully disabled, do you mean they are never enabled? Have you verified that is really the case?
Have you verified that all signals to and from the SRAM are registered within the IOBs of the FPGA? Including the data output enable signals? I assume Altera has something similar to FPGA editor to make sure this is really the case.
Assuming that ALL inputs and outputs, including output enables, are registered, then timing constraints on external pins are completely irrelevant, and will have absolutely no effect. The timing is fixed, and can be obtained from the FPGA data sheet. In general, you should be registering all these signals within the IOBs. You should need a very good reason not to.
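As an illustration of what "everything registered" means at the HDL level, here is a minimal sketch in which every SRAM-facing signal, including the data output enable, comes straight from or goes straight into a flip-flop. The module and signal names are hypothetical, and whether the tools actually place these flops in the I/O cells depends on tool settings and constraints that are not shown here.

```verilog
// Hypothetical I/O register stage for an asynchronous SRAM (sketch only).
// Names and widths are assumptions. No reset is shown, for brevity.
module sram_io_regs #(
    parameter AW = 18,                 // address width (assumed)
    parameter DW = 16                  // data width (assumed)
) (
    input  wire          clk,
    input  wire [AW-1:0] addr_next,    // address for the next cycle
    input  wire [DW-1:0] wdata_next,   // write data for the next cycle
    input  wire          drive_next,   // 1 = FPGA drives the data bus next cycle
    input  wire          we_n_next,    // write strobe for the next cycle
    output reg  [AW-1:0] sram_addr,
    output reg           sram_we_n,
    inout  wire [DW-1:0] sram_dq,
    output reg  [DW-1:0] rdata         // registered copy of the data pins
);
    reg [DW-1:0] dq_out;               // registered write data
    reg          dq_oe;                // registered output enable

    always @(posedge clk) begin
        sram_addr <= addr_next;        // output flop
        sram_we_n <= we_n_next;        // output flop
        dq_out    <= wdata_next;       // output flop
        dq_oe     <= drive_next;       // tri-state control flop
        rdata     <= sram_dq;          // input flop, no logic before it
    end

    // No combinational logic between the flops and the pins,
    // only the tri-state buffer itself.
    assign sram_dq = dq_oe ? dq_out : {DW{1'bz}};
endmodule
```

The point of keeping logic out of the path between flop and pin is that the pin timing then comes from the device data sheet rather than from routing, which is why constraints on those pins stop mattering.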
I think I'm starting to understand it. While I'm still using it in a pipelined manner (latching the previous data while presenting a new address), I see now that I need to allow for additional timing margin. Is it correctly understood though, that if I _didn't_ use it this way, but instead held the address stable while I latched the data, I wouldn't be able to achieve the same memory bandwidth?
So I wonder what Altera means when they write zero-wait-states? In other words, what kind of bandwidth does Nios see when using this 10 ns async SRAM?
Sorry, no trickery intended. I have two Xilinx boards, but it's my Altera board I use the most. AFAICT, both X and A are very similar and both have zero hold time.
To the best of my knowledge:
...
assign flash_cs_n = 1'b1;
assign flash_oe_n = 1'b1; // Unnecessary?
assign enet_aen = 1'b1;
assign enet_be_n = 4'b1111; // Unnecessary?
...
where
That would be using asynchronous design techniques. Actually, asynchronous design techniques allow for much faster designs. In a synchronous design, every stage takes exactly one clock period, even though the signals take less than one clock period to go through the intervening logic and propagate.
In an asynchronous design, each stage can be run at whatever the speed of the slowest signal is.
The reason very few people use asynchronous designs is that they can be a royal pain to get the timing right. And in an FPGA, almost impossible. In part because the FPGA tools were developed assuming a synchronous design.
I can't really speak for either Altera or Nios, but presumably zero wait states would require a clock of somewhat less than 100MHz.