sram

I'm talking about the time the address must remain stable. Your calculations above show it is at a minimum T/2.

When running with fast SRAM it can be very hard to get this to work properly. The devil is in the details of the chips.

Great! You were able to use it on one device at an unknown speed. What was the clock period?

Did you supply the WE from the external clock (same as to the FPGA) or a copy of the clock from inside the FPGA? In the case of the former the total delay through the chip of the signals can be a significant part of the setup margin. If the latter it is hard to control the routing delays.

--

Rick C
Reply to
rickman


routing delays on data vs. WE". Do you have any basis for that statement?

to get your clock out to your WE pin you first have to get off the clock network and out to an IO, how are you going to guarantee that delay is the same as the data going from an output flop to an io?

74X374 flip flops, nothing more, nothing less. How is that a 'headache'?

with a string of 374s you also have to make sure the delay on the clock is controlled with regard to the data

Reply to
lasselangwadtchristensen

On Sunday, August 6, 2017 at 2:08:09 PM UTC-4, snipped-for-privacy@gmail.com wrote:

routing delays on data vs. WE". Do you have any basis for that statement?

network and out to an IO, how are you going to guarantee that delay is the same as the data going from an output flop to an io?

One does not need to "guarantee that delay is the same as the data going from an output flop to an io" in order to get a working design as you stated. Instead, one can design it such that the clock to the flip flops that generate the address/data/control signals arrives simultaneously, within some applicable design tolerance, with the clock signal (aka WE) arriving at the SRAM.

In fact, since there are tolerances, if you design it such that the nominal data delay matches the nominal clock delay as you suggest you are essentially crossing your fingers hoping that you don't run across a 'fast' data path and a 'slow' clock path over full PVT range...either that or you are lengthening the data path on the PCBA to guarantee that it never beats the clock. Yes you can do that to get a guaranteed working design, but that would seem to be more of the 'headache' that you mentioned than my approach of just routing them on the shortest path as one would probably normally do anyway.

two 74X374 flip flops, nothing more, nothing less. How is that a 'headache'?

No. The delay of data relative to clock in a string of flip flops is not important at all if every flip flop receives the same rising edge. Getting multiple receivers to receive the same clock signal (to within some tolerance) is something that a designer does have control over. Relying on the control of skew between two or more signals, not so much.

This simultaneous receipt of the clock signal is essentially what goes on inside every FPGA. You can send any FF output to the input of any other FF on the device because they design the clock network to produce this simultaneous action. It's not because they added data routing delays.

Kevin Jennings

Reply to
KJ

The ancient Cypress CY62256N lists both of these requirements (Tsa and Tha) as 0 ns [1].

My calculation is T/2 - Tco(max). As long as Tco(max) is less than T/2 there is positive margin.

When running with fast SRAM it can be very hard to get this to work

Speaking for myself I can say that no it was not hard at all, it worked right at the start. I'm not sure where you see the difficulty.

And I've provided the details. More so than you.

Thanks!

You're making assumptions here that are incorrect.

I dunno, that was 20+ years ago but it was using the fastest available CMOS SRAMs of the mid to late 1990s. But the clock speed is not relevant, the technique is still valid. The biggest limiting factor is going to be the read/write speed of the async SRAM.
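The T/2 - Tco(max) margin discussed in this exchange can be sanity-checked with a few lines of arithmetic. The 50 MHz clock and 6 ns Tco(max) below are purely illustrative assumptions, not figures from any datasheet:

```python
# Address-setup margin for the clock-as-WE scheme: the address must
# remain stable for at least T/2, and the FPGA's clock-to-output
# delay Tco(max) eats into that half period.
def setup_margin_ns(clock_mhz, tco_max_ns):
    period_ns = 1000.0 / clock_mhz
    return period_ns / 2 - tco_max_ns

# Illustrative numbers only: a 50 MHz clock (T = 20 ns) and a
# hypothetical Tco(max) of 6 ns leave 4 ns of margin; at 100 MHz
# the same Tco(max) would leave the margin negative.
print(setup_margin_ns(50, 6.0))   # 4.0
print(setup_margin_ns(100, 6.0))  # -1.0
```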

Kevin Jennings

Reply to
KJ

I think, if I understand what you are proposing, one big issue is you seem to be assuming that the clock that you are using as WE starts external to the FPGA (or at least comes out and goes back in) so that you know the clock rises before the data on the address bus can change. From my experience, in very many cases, this is NOT true for an FPGA design, but some slower clock comes in, and the highest speed clocks are generated by PLLs in the FPGA.

A second really big issue is how do you do a read cycle if the write comes ungated from the clock. The best I can figure is you are assuming you can get a read done in 1/2 a clock cycle and just rewrite the data. In most such rams WE overrides OE, and the Selects kill both read and write. Unless you had a part with both a WE and WS (where WE could disable the WS, but did not itself need to have the required setup/hold to address) I can't see how you do reads with the clock anywhere close to cycle time, and having a WOM (Write only Memory) isn't that useful here.

Reply to
Richard Damon

routing delays on data vs. WE". Do you have any basis for that statement?

network and out to an IO, how are you going to guarantee that delay is the same as the data going from an output flop to an io?

One does not need to "guarantee that delay is the same as the data going from an output flop to an io" in order to get a working design as you stated. Instead, one can design it such that the clock to the flip flops that generate the address/data/control signals arrives simultaneously, within some applicable design tolerance, with the clock signal (aka WE) arriving at the SRAM.

how are you going to control the delay from output ff to io vs. clock getting off the clock tree to io?

In fact, since there are tolerances, if you design it such that the nominal data delay matches the nominal clock delay as you suggest you are essentially crossing your fingers hoping that you don't run across a 'fast' data path and a 'slow' clock path over full PVT range...either that or you are lengthening the data path on the PCBA to guarantee that it never beats the clock. Yes you can do that to get a guaranteed working design, but that would seem to be more of the 'headache' that you mentioned than my approach of just routing them on the shortest path as one would probably normally do anyway.

using a DDR output, data and WE all have the same path to io and should thus track over PVT

using the clock directly is pretty much guaranteed to add more delay than the clock to out on the output ffs

two 74X374 flip flops, nothing more, nothing less. How is that a 'headache'?

The delay of data relative to clock in a string of flip flops is not important at all if every flip flop receives the same rising edge. Getting multiple receivers to receive the same clock signal (to within some tolerance) is something that a designer does have control over. Relying on the control of skew between two or more signals, not so much.

This simultaneous receipt of the clock signal is essentially what goes on inside every FPGA. You can send any FF output to the input of any other FF on the device because they design the clock network to produce this simultaneous action. It's not because they added data routing delays.

FF out to FF in is safe by design, once you mix in clock used as "data" you add an unknown delay

Reply to
lasselangwadtchristensen

No that was not my assumption. The clocking situation is no different than how one synchronizes the internal clock and the external clock in SDRAM or DDR. Even before there were DDR parts and DDR flops in FPGAs, there were single clock SDRAMs and they had FPGA controllers. Clock synchronization between the FPGA and the SDRAM is required there as well and would use the same control technique.

Well, looking around now that does seem to be the case today which sort of makes me wonder which SRAM I was using back then. At that time, CE and OE enabled the I/O drivers independent of WE. Writing to memory was sometimes (depending on the part) inhibited if OE was active. I don't believe I relied on any bus-hold circuit or any sort of other trickery like that. I will say that the design did work and was in production for several years without issue but, in any case, my solution does not seem applicable today. Interesting, good catch.

Kevin Jennings

Reply to
KJ

By using the phase control of the PLL to adjust the clock leaving the chip relative to the clock internal to the chip. That can be done in a way to guarantee operation.

'Should' is an important word there...but practically speaking I agree that there is probably 'slim' chances of failure.

Mixing in clock as data was not what I was doing. In any case, based on my reply to Richard Damon's post, my approach, while it worked back in the day, wouldn't work now.

Kevin

Reply to
KJ

Hmmm, looking at a current data sheet I don't see where you can gate the write cycle with OE. WE, the byte enables and CE, but not OE.

--

Rick C
Reply to
rickman

As has been explained to you in detail by several other posters, your method is not 'easier' with modern FPGAs and SRAMs.

The simplest way to get a high speed clock {gated or not} off the chip, coincident with other registered I/O signals, is to use the dual-edge IOB flip-flops as I suggested.

The DDR technique I mentioned would run synchronous single-cycle read or write cycles at 50 MHz on a Spartan-3 Starter kit with an (IIRC) 10 ns SRAM, 66 MHz if using a duty-cycle-skewed clock to meet the WE pulse width requirements.
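The duty-cycle-skewed clock mentioned above can be motivated with back-of-envelope numbers. The 8 ns write-pulse-width figure below is a hypothetical SRAM spec chosen for illustration, not from any datasheet:

```python
# With the DDR trick, WE is asserted for one half of the clock
# period, so the WE pulse width is duty_fraction * T.
def we_pulse_ns(clock_mhz, duty_fraction=0.5):
    return duty_fraction * 1000.0 / clock_mhz

# Hypothetical SRAM write-pulse-width requirement of 8 ns:
# 50 MHz gives a 10 ns pulse (fine), a symmetric 66 MHz clock
# gives ~7.6 ns (a violation), and skewing the duty cycle to 55%
# at 66 MHz recovers ~8.3 ns.
print(we_pulse_ns(50))
print(we_pulse_ns(66))
print(we_pulse_ns(66, 0.55))
```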

Another advantage of the 'forwarding' method is that one can use the internal FPGA clock resources for clock multiply/divides etc. without needing to also manage the board-level low-skew clock distribution needed by your method.

-Brian

Reply to
brimdavis

I can't say I follow what you are proposing. How do you get the clock out of the FPGA with a defined time relationship to the signals clocked through the IOB? Is this done with feedback from the output clock using the internal clocking circuits?

--

Rick C
Reply to
rickman

About a decade back, mainstream FPGAs gained greatly expanded IOB clocking abilities to support DDR RAM (and other interfaces such as RGMII). In particular, one can forward a clock out of an FPGA pin phase aligned with data on other pins. You can also use one of the internal PLLs to generate phase shifted clocks, and thus have a phase shift on the pins between two data signals or between the clock and the data signals.

This can be done without needing feedback from the pins.

You should try reading a datasheet occasionally - they can be very informative. Just in case someone has blocked Google where you are: here's an example:

formatting link
selectio.pdf

Allan

Reply to
Allan Herriman

The links I gave in my original post explain the technique:

Allan Herriman wrote:

Nearly twenty years now!

Xilinx parts had ODDR equivalents in Virtex-E using hard macros; then the actual ODDR primitive stuff appeared in Virtex-2.

-Brian

Reply to
brimdavis

Thank you for the link to the 356 page document. No, I have not researched how every brand of FPGA implements DDR interfaces mostly because I have not designed a DDR memory interface in an FPGA. I did look at the document and didn't find info on how the timing delays through the IOB might be synchronized with the output clock.

So how exactly does the tight alignment of a clock exiting a Xilinx FPGA maintain alignment with data exiting the FPGA over time and differential temperature? What will the timing relationship be and how tightly can it be maintained?

Just waving your hands and saying things can be aligned doesn't explain how it works. This is a discussion. If you aren't interested in discussing, then please don't bother to reply.

--

Rick C
Reply to
rickman

I haven't used a Xilinx part in something like 15 years, so I don't recall all the details. I don't follow how you achieve the timing margin needed between the address, control and data signals which are passing through the IOB while the WE signal pulse is being generated in the IOB DDR. Even with a hold time requirement of 0 ns something has to be done to prevent a race condition.

Your posts seem to say you used different drive strengths to use the trace capacitance to create different delays in signal timing. If you can't use a data sheet to produce a timing analysis, it would seem to be a fairly sketchy method that you can't count on to work under all conditions. I suppose you could qualify the circuit over temperature and voltage and then make some assumptions about process variability, but as I say, sketchy.

--

Rick C
Reply to
rickman

Thinking about it, YES, FPGAs normally have a few pins that can be configured as dedicated clock drivers, and it will generally be guaranteed that if those pins are driving out a global clock, then any other pin with output clocked by that clock will change so as to have a known hold time (over specified operating conditions). This being the way to run a typical synchronous interface.

Since this method requires the WE signal to be the clock, you need to find a part that has either a write mask signal, or perhaps is multi-ported so this port could be dedicated to writes and another port could be used to read what is needed (the original part for this thread wouldn't be usable with this method).

Reply to
Richard Damon

I'm not sure you read the full thread. The method for generating the WE signal is to use the two DDR FFs to drive a one level during one half of the clock and to drive the write signal during the other half of the clock. I misspoke above when I called it a "clock". The *other* method involved using the actual clock as WE and gating it with the OE signal which won't work on all async RAMs.

So with the DDR method *all* of the signals will exit the chip with a nominal zero timing delay relative to each other. This is literally the edge of the async RAM spec. So you need to have some delays on the other signals relative to the WE to allow for variation in timing of individual outputs. It seems the method suggested is to drive the CS and WE signals hard and lighten the drive on the other outputs.

This is a method that is not relying on any guaranteed spec from the FPGA maker. This method uses trace capacitance to create delta t = delta v * c / i to speed or slow the rising edge of the various outputs. This relies on over compensating the FPGA spec by means that depend on details of the board layout. It reminds me of the early days of generating timing signals for DRAM with logic delays.
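As a rough sketch of the delta t = delta V * C / I reasoning above (all numbers below are assumptions for illustration, not FPGA or board specs):

```python
# Time for a constant drive current I to slew a capacitive load C
# through a voltage swing delta_v -- the delta t = delta V * C / I
# estimate from the text.
def slew_delay_ns(c_pf, delta_v, i_ma):
    return (c_pf * 1e-12) * delta_v / (i_ma * 1e-3) * 1e9

# Assumed values: 10 pF of trace capacitance, a 1.65 V swing to
# the input threshold, and 24 mA vs 8 mA drive settings.
strong = slew_delay_ns(10, 1.65, 24)  # WE/CS driven hard
weak = slew_delay_ns(10, 1.65, 8)     # address/data driven lightly
print(weak - strong)  # ~1.4 ns of extra delay on the weak outputs
```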

Yeah, you might get it to work, but the layout will need to be treated with care and respect even more so than an impedance controlled trace. It will need to be characterized over temperature and voltage and you will have to design in enough margin to allow for process variations.

--

Rick C
Reply to
rickman

messages/2076

messages/2177

Nearly twenty years! Doesn't time fly when you're having fun.

Thinking back, the last time I connected an async SRAM to an FPGA was in

1997, using a Xilinx 5200 series device.

The 5200 was a low cost family, a bit like the XC4000 series, but with even worse routing resources, and (keeping it on-topic for this thread) NO IOB FF. Yes, that's right, to get repeatable IO timing, one had to LOC a fabric FF near the pin and do manual routing from that FF to the pin. (The manual routing could be saved as a string in a constraints file, IIRC).

Still, I managed to meet all the SRAM timing requirements, but only by using two clocks for each RAM read or write. The write strobe used a negative edge triggered FF.

"And if you tell that to the young people today, they won't believe you"

Regards, Allan

Reply to
Allan Herriman


As you say you've never done DDR, I'll give a simple explanation here, using Xilinx primitives as an example.

The clock forwarding is not the same as connecting an internal clock net to an output pin. Instead, it is output through an ODDR, in exactly the same way that the DDR output data is produced. (Except in this case, instead of outputting two data phases, D1 and D2, it just outputs two constants, '1' and '0' (or '0' and '1' if you want the opposite phase) to produce a square wave.)
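A behavioral sketch of that idea (a toy model, not Xilinx code): an ODDR-style register outputs D1 during the high phase of each clock cycle and D2 during the low phase, so the constants D1=1, D2=0 reproduce the clock on the pin, through the same output path as the DDR data:

```python
# Toy model of ODDR-style clock forwarding: list the pin value
# once per clock phase (high phase first, then low phase).
def oddr(d1, d2, n_cycles):
    out = []
    for _ in range(n_cycles):
        out += [d1, d2]  # value during high phase, then low phase
    return out

print(oddr(1, 0, 4))  # [1, 0, 1, 0, 1, 0, 1, 0] -- a forwarded clock
print(oddr(0, 1, 4))  # the opposite phase
```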

The clock-forwarding output and the data output ODDR blocks are all clocked from the same clock on a low skew internal clock net. This will typically have some tens of ps (to hundreds of ps, depending on the particular clocking resource) skew. There will also be skew due to the different trace lengths for each signal in the BGA interposer, but these are known and can be compensated for in the PCB design.

Perhaps you want deliberate skew between the clock and data (e.g. for RGMII) - there are two ways of doing that:

  1. Use an ODELAY block on (a subset of) the outputs; ODELAY sits between the ODDR output and the input of the OBUF pin driver. The ODELAY is calibrated by a reference clock, and thus is stable against PVT. It has a delay programmable between ~0 and a few ns. It has an accuracy of some tens of ps, and produces some tens of ps jitter on the signal passing through it.
  2. Use a PLL (or MMCM) to produce deliberately skewed system clocks inside the FPGA. These will need separate clocking resources to get to the IO blocks (leading to some hundreds of ps of additional, unknown skew).
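For option 2, the required PLL phase shift follows directly from the desired skew and the clock period. The numbers below are illustrative (RGMII nominally wants about 2 ns of clock-to-data skew at 125 MHz):

```python
# Convert a desired clock-to-data skew into a PLL/MMCM phase
# shift setting, expressed in degrees of the clock period.
def phase_shift_deg(skew_ns, clock_mhz):
    period_ns = 1000.0 / clock_mhz
    return 360.0 * skew_ns / period_ns

# 2 ns of skew on a 125 MHz clock (8 ns period) is a quarter
# period, i.e. a 90 degree phase shift.
print(phase_shift_deg(2.0, 125))  # 90.0
```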

More details can be found in the user guide that I linked earlier.

Allan

Reply to
Allan Herriman

Then maybe you shouldn't post comments like this:

Xilinx characterizes and publishes I/O buffer switching parameters vs. IOSTANDARD/SLEW/DRIVE settings; this information is both summarized in the datasheet and used in generating the timing reports, providing the base delay of the I/O buffer independent of any external capacitive loading [1].

The I/O drive values I used in my S3 testing provided an I/O buffer delay difference of about 1 ns (at the fast device corner) between WE and the address/data lines.

While these I/O pins will be slowed further by any board level loading, for any reasonable board layout it is improbable that this loading will somehow reverse the WE timing relationship and violate the zero-ns hold requirement.
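A minimal sketch of that hold-margin argument, with assumed numbers: the ~1 ns buffer delay difference comes from the post above, but the absolute delay values below are hypothetical.

```python
# The SRAM's address/data hold requirement is 0 ns, so the design
# holds up as long as WE reaches the pin before address/data
# change: margin = (addr delay - WE delay) - t_hold.
def hold_margin_ns(addr_delay_ns, we_delay_ns, t_hold_ns=0.0):
    return (addr_delay_ns - we_delay_ns) - t_hold_ns

# Hypothetical buffer delays differing by ~1 ns, zero-ns hold spec:
print(hold_margin_ns(addr_delay_ns=3.0, we_delay_ns=2.0))  # 1.0
```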

My original 2004 posts clearly specified what was (timing at FPGA pins) and wasn't (board level signal integrity issues) covered in my example:

For purposes of a demo example design, I'm perfectly happy with an address/data hold of 10% of the SRAM minimum cycle time, given that the SRAM hold specification is zero ns.

If a design needs more precise control, many of the newer parts have calibrated I/O delays (already mentioned by Allan) that can be used to produce known time delays; in the older S3 family, the easiest way to provide an adjustable time delay would be to use a DCM to phase shift the clock to the OFDDRRSE flip-flop primitive driving WE.

-Brian

[1] UG199 S3 data sheet v3.1
formatting link
page 83: "The Output timing for all standards, as published in the speed files and the data sheet, is always based on a CL value of zero."
Reply to
Brian Davis
