sram

kristoff 9 years ago

Hi,

OK, I left the LoRa chips aside for a while, so now back to FPGAs.

I have two Olimex iCE40 boards where I would like to use the onboard SRAM. The RAM chip is a Samsung K5R4016V1B-10 (256K words x 16 bits).

The datasheets are here:

formatting link

The most important pages are page 7 (for "read"), pages 8 and 9 (for "write") and page 10 (for the functional description of the pins).

I am trying to interpret the datasheets to see how to use the chip. I think I understand how to read or write one word, but I am still puzzled about how to do bulk-write transfers.

For read, it seems to be simple: set /WE high and /OE low (*)

1/ put the address on the address-bus

2/ 10 ns later, read the data from the data-out

(*) ignoring the /CS, /LB and /UB pins to keep things simple.

In bulk transfer, it is like this:

- set address 1 on the Address bus

- 10 ns later:

-> read the data of address 1 from data-out

-> (at the same time) set address 2 on the address bus

- 10 ns later:

-> read the data of address 2 from data-out

-> (at the same time) set address 3 on the address bus (etc)
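The overlapped read schedule above can be sketched as a toy model. This is only an illustration in Python, not a cycle-accurate simulation: the 10 ns step comes from the "-10" speed grade mentioned in the post, while the function name `pipelined_read` and the memory contents are made up for the example.

```python
# Toy model of back-to-back asynchronous SRAM reads (not cycle-accurate).
# Assumption: one step = one access time, and the data for an address is
# valid exactly one step after that address is put on the bus.

ACCESS_NS = 10  # assumed step length, from the "-10" speed grade

def pipelined_read(memory, addresses):
    """Return a list of (time_ns, address, data), one entry per word read."""
    results = []
    pending = None  # address that was presented in the previous step
    # The trailing None is a final step that only collects the last word.
    for step, addr in enumerate(list(addresses) + [None]):
        if pending is not None:
            # Sample the data of the previous address while the next
            # address is being driven onto the address bus.
            results.append((step * ACCESS_NS, pending, memory[pending]))
        pending = addr
    return results

memory = {1: 0xAAAA, 2: 0xBBBB, 3: 0xCCCC}  # hypothetical contents
reads = pipelined_read(memory, [1, 2, 3])
for t, addr, data in reads:
    print(f"t={t:3d} ns  addr={addr}  data=0x{data:04X}")
```

In this model three words take four steps: one step to present the first address, then one word per step.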

For write, to write one single word, I think it goes like this

1/ set /WE low and /OE high to go to "write" mode

-> at the same time set the address on the address bus

-> do not yet put the data on the databus (as it is still in "output" mode)

2/ 10 ns later:

-> put the data on the data-bus (by then, the data-bus has switched to "data-in")

3/ another 10 ns later:

-> set /WE high and /OE low to leave "write" mode
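The three write steps above can be written down as a sequence of bus states. This is a minimal Python sketch of the sequence as described in this post, not a verified interface: the `BusState` type and `single_write` helper are hypothetical names, and releasing the data pins in step 3 is an assumption (the post does not say when the FPGA stops driving the bus).

```python
from typing import NamedTuple, Optional

class BusState(NamedTuple):
    t_ns: int            # time at which this state is applied
    we_n: int            # /WE level (0 = asserted)
    oe_n: int            # /OE level (0 = asserted)
    addr: int            # value on the address bus
    data: Optional[int]  # value driven by the FPGA, None = not driving

def single_write(addr: int, data: int, step_ns: int = 10) -> list:
    """Bus states for one word write, one state per 10 ns step."""
    return [
        # 1/ enter write mode and set the address, data pins not driven yet
        BusState(0, 0, 1, addr, None),
        # 2/ drive the data once the SRAM has released the data bus
        BusState(step_ns, 0, 1, addr, data),
        # 3/ leave write mode (assumption: data pins are released here)
        BusState(2 * step_ns, 1, 0, addr, None),
    ]

sequence = single_write(addr=0x00010, data=0x1234)
for state in sequence:
    print(state)
```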

But I am still puzzled about how to do a "bulk write" of data. The datasheets do not mention anything about what happens if you leave the chip in "write" mode and just change the address on the address-bus (as is done for bulk-read).

If there is no separate bulk-write protocol, it looks like a write to the chip takes three times as many steps as a bulk-read (3 steps compared to one single step).

Is this a correct interpretation of the datasheet?

Can somebody who has already interfaced an FPGA with SRAM confirm or deny this? Or is there another trick for doing a bulk-write on an SRAM chip?

Cheerio! Kr. Bonne.

Cecil Bayona 9 years ago

Static RAM chips do not have a bulk mode; it's not needed, you write to them one word at a time. It's EEPROM, flash, and similar memories, with their complicated setup, that need a bulk mode: they are slow and bulk mode is faster, and some only have bulk mode.

Cecil - k5nwa

kristoff 9 years ago

Hi Cecil,

Thanks for your reply.

I agree it's not a bulk-mode as such.

What I meant was that when doing multiple reads one after the other you can stitch them together.

Correct me if I am wrong, but as I interpret the datasheets, the "read data from the data-bus" can be done at the same time as the "set next address on address-bus". This, I think, means you can "overlap" two consecutive reads, resulting in one read per clock cycle.

At least, that is, I guess, what "tOH" (Output Hold from Address Change) means in the "read cycle (1)" timing waveform on page 7 of the datasheet.

But I do not see how (or if) something similar can be done for "write" operations, but perhaps I am missing something.

Kristoff

Richard Damon 9 years ago

This looks to be a fairly standard asynchronous static ram.

The basic requirement for a write cycle is that there is a Tas (Address stable), the time for which the address bus must be stable before you can pull the WE line low, a Twp, the minimum length of time you need to hold the WE signal low, and a Taw address hold, the time for which you need to hold the address bus stable after WE goes high.

Since Tas >= 0 and Taw >= 0, it is easy to think that you can just clock the WE signal on the same clock edge as the address, but that requires that the FPGA and the board layout have ZERO skew, which is basically impossible.

As you note, it is easy to read at full speed, cycle after cycle: you just need to clock new addresses and one cycle later you can read the results. Note, this is not really a 'burst' operation, but just running full cycles one after the other (the burst terminology tends to imply there is some setup you do and after that you can read a given number of locations without needing to do the setup again).

For write with this sort of part there are several options:

1) Simplest: do everything on rising edges and take 3 clock cycles to write. Cycle 1: change address. Cycle 2: drop WE. Cycle 3: raise WE and hold the address.

2) Slightly more complicated, again do things on rising edges, but have something to delay the WE signal slightly. 2 Cycles, 1) Set Address, and with slight delay drop WE. 2) Hold address, and after a slight delay raise WE.

3) Instead of a slight delay in WE, drive WE on the falling edge of the clock, again 2 Cycles as above with the slight delay being the 1/2 cycle delay of the falling edge.

4) Discrete Pulse generation logic, have logic on the board with delay lines to generate the write pulse, so that WE will pulse low shortly after the address is stable, and comes back high shortly before the address might change again. This lets you do a write every cycle.

5) Like the Discrete Pulse Generation, but in the FPGA using a higher speed clock. If you can be sure whether the WE pulse is faster or slower than the address bus (including FPGA skew), you could use a 400-500 MHz clock and create a 7.5/8 ns pulse on WE. If you cannot be sure of that, you can use a 700 MHz clock and generate a 5 clock cycle pulse (7.14 ns) in the middle of the 10 ns cycle.
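The pulse widths quoted in option 5 can be checked with a little arithmetic. The sketch below is only an illustration in Python: the pulse lengths of 3 and 4 fast clocks for the 400 and 500 MHz cases are inferred from the quoted 7.5 and 8 ns figures, and the helper name `we_pulse_ns` is made up for the example.

```python
CYCLE_NS = 10.0  # one SRAM write cycle, as in the options above

def we_pulse_ns(clock_mhz: float, pulse_clocks: int) -> float:
    """Width in ns of a WE pulse that lasts pulse_clocks fast-clock periods."""
    return pulse_clocks * 1000.0 / clock_mhz

# (fast clock in MHz, WE pulse length in fast clocks); the 3 and 4 are inferred
cases = [(400, 3), (500, 4), (700, 5)]
for mhz, clocks in cases:
    width = we_pulse_ns(mhz, clocks)
    slack = CYCLE_NS - width  # what is left for address setup plus hold
    print(f"{mhz} MHz x {clocks} clocks: WE low {width:.2f} ns, "
          f"{slack:.2f} ns left around the pulse")
```

At 700 MHz there are exactly 7 fast clocks in 10 ns, so a 5-clock pulse leaves one fast clock (about 1.43 ns) on each side of WE.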

This is one of the limitations of asynchronous RAMs: write cycles take more 'edges' to perform, so you need either more cycles or something to generate higher speed edges.

lasselangwadtchristensen 9 years ago

SRAM doesn't have a clock, you just have to comply with the required timing

write happens on the rising edge on /WR

-Lasse

Richard Damon 9 years ago

Actually, with asynchronous parts, things don't happen 'on edges' but on levels (you measure timing requirements edge to edge). Asynchronous SRAMs tend to be a sea of RS flip-flops, and when write is low, the addressed flip-flops will have their set or reset line asserted, so if you wanted to talk of a time when the write happened, it was on the falling edge, with a propagation delay/hold requirement.

Toh is the minimum guaranteed propagation delay from address to data, just like Taa is the maximum delay from address to data. (Trc actually isn't a critical parameter for the RAM itself, but is a nominal system parameter. With asynchronous SRAM, changing the address inputs faster than Trc won't cause any problems, except for the fact that you won't get valid data out until you stop doing it.)
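Read this way, Toh and Taa together bound the window in which the data for one address is guaranteed valid during back-to-back reads. A minimal Python sketch of that window follows; the Toh value of 3 ns is a hypothetical placeholder (take the real one from the datasheet), and `data_valid_window` is a made-up helper name.

```python
def data_valid_window(t_cycle_ns, t_aa_ns, t_oh_ns):
    """Window, measured from an address change, in which the data for that
    address is guaranteed valid when the next address follows t_cycle_ns later.
    """
    start = t_aa_ns              # latest moment the new data appears
    end = t_cycle_ns + t_oh_ns   # earliest moment it may go away again
    return start, end

T_AA_NS = 10.0  # from the "-10" speed grade
T_OH_NS = 3.0   # hypothetical value, for illustration only

for t_cycle in (10.0, 15.0):
    start, end = data_valid_window(t_cycle, T_AA_NS, T_OH_NS)
    print(f"cycle {t_cycle:4.1f} ns: data valid from {start:.1f} to "
          f"{end:.1f} ns ({end - start:.1f} ns wide)")
```

With the cycle time equal to Taa, the only guaranteed sampling window is Toh wide, which is why the output hold time matters for reading at full speed.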

kristoff 8 years ago

Hi Richard,

Thank you for your reply.

Your message really helped to better understand the timing waveforms.

I'll start with the simplest setup and after that experiment with using the falling edge of the clock to clear the /WE signal (option 3).

Kristoff

Richard Damon 8 years ago

One thing to keep in mind: having a 10 ns memory part does NOT mean you can talk to it with a 100 MHz (10 ns) clock. You will need to add in the clock-to-output time on your address bus and the setup time needed on the data bus coming in. For the best performance, both of these should if possible use the flip-flops in the I/O blocks of the FPGA, as those will have much lower propagation delays.
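That budget can be put into numbers. The following is a rough Python sketch, assuming a simple additive model; the clock-to-output and setup values are hypothetical placeholders rather than figures from any FPGA datasheet, and `min_read_period_ns` is a made-up helper name.

```python
def min_read_period_ns(t_co_ns, t_aa_ns, t_su_ns, t_board_ns=0.0):
    """Shortest clock period for a registered-address, registered-data read.

    t_co_ns    : FPGA clock-to-output delay on the address pins
    t_aa_ns    : SRAM address access time
    t_su_ns    : FPGA input setup time on the data pins
    t_board_ns : one-way board delay, counted once out and once back
    """
    return t_co_ns + t_board_ns + t_aa_ns + t_board_ns + t_su_ns

# Hypothetical FPGA numbers, for illustration only.
period = min_read_period_ns(t_co_ns=3.0, t_aa_ns=10.0, t_su_ns=2.0)
print(f"minimum period {period:.1f} ns, i.e. about {1000.0 / period:.1f} MHz")
```

With these example numbers a "10 ns" part ends up on a 15 ns clock, well short of 100 MHz.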

Asynchronous devices can be harder to use, but can give you significantly improved read performance if you are worried about latency, as synchronous interfaces can cost a clock cycle. (On the other hand, synchronous interfaces can often write faster, as you can often just stream the data and the latency isn't important.)

rickman 8 years ago

I think what Richard wrote is the clearest explanation of why there is no bulk write with async RAM. The write is controlled by the level of the AND of WR- and CS-, so while these two signals are low it is expected that the address does *not* change. If the address changes, the RAM cell selected will change and extraneous cells can be selected as the address lines settle. By writing to location 3 and then 4 without removing WR or CS you can be writing to any combination of 0 to 7 during the switch. Since none of this meets timing, what gets written will be random garbage and not even the data you are trying to write to locations 3 and 4.
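The 3-to-4 example can be checked by enumerating the transient addresses. This is a small Python sketch under a simplifying assumption: each changing address bit flips exactly once, in an arbitrary order, with no extra glitches. The function name `transient_addresses` is made up for the illustration.

```python
from itertools import permutations

def transient_addresses(old, new, width=3):
    """All addresses that can appear while the bus settles from old to new,
    assuming every changing bit flips exactly once, in any order."""
    changing = [bit for bit in range(width) if ((old ^ new) >> bit) & 1]
    seen = {old}
    for order in permutations(changing):
        addr = old
        for bit in order:
            addr ^= 1 << bit  # this address line switches next
            seen.add(addr)
    return sorted(seen)

print(transient_addresses(3, 4))  # 011 -> 100: all three bits change
print(transient_addresses(2, 3))  # 010 -> 011: only one bit changes
```

Going from 3 (011) to 4 (100) changes all three low address bits, so depending on which line settles first every location from 0 to 7 can be selected along the way.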

When both WR and CS are asserted, keep the address stable and keep the data stable for the last N ns before either control line is deasserted.

Rick C

Mike Perkins 8 years ago

There are some forms of clocked SRAM. ZBT was one type introduced by IDT.

I assume it still exists?

Mike Perkins Video Solutions Ltd www.videosolutions.ltd.uk

Richard Damon 8 years ago

The datasheet pointed to was a classical Asynchronous Static Ram, which doesn't have a clock.

There ARE Synchronous Static RAMs which do have a clock pin. Synchronous devices tend to be a bit easier to interface to synchronous systems, which most FPGA systems tend to be. Sometimes you lose a bit in latency when using them though.

rickman 8 years ago

I don't know the details of how SRAM is constructed, but there was a strong market for it until maybe about 10 years ago. Then growth of SRAM sizes pretty much stopped as new devices dwindled. DRAM has continued to improve at the cutting edge of semiconductor technology along with Flash, but SRAM is now the red headed stepchild. I guess the functionality of SRAM has largely been incorporated internally in FPGAs. If more size is needed than is convenient in FPGAs, DRAM is used. It may have longer latency, but speed is certainly not lacking.

Rick C

David Brown 8 years ago

Roughly speaking, DRAM needs one transistor and a capacitor for a cell - SRAM needs more transistors (4, I think). So SRAM costs a good deal more per bit than DRAM. Once speeds reached the point where bus speeds were the limiting factor for throughput rather than the memory speed, and after DRAM started having internal refresh rather than external refresh (needing active read/re-write cycles from the memory controller), DRAM was almost as fast as SRAM but much cheaper. SRAM still wins out on latency (and lower standby power), but as you say the SRAM has moved on-chip (FPGAs, caches in processors, on-chip RAM in microcontrollers) for even lower latency.

brimdavis 8 years ago

Generating a synchronously gated WE in a single cycle with a 1x clock can be done fairly easily by using the FPGA's dual-edge output flip-flop primitives.

I posted some notes on this technique (for a Spartan-3) to the fpga-cpu group many years ago:

formatting link

That S3 example code can be found here:

formatting link

The dual-edge I/O primitive for the ICE family would be SB_IO or SB_IO_OD, see:

formatting link

-Brian

KJ 8 years ago

It's even easier than that to synchronously control a standard async SRAM. Simply connect WE to the clock and hold OE active all the time except for cycles where you want to write something new into the SRAM.

Kevin Jennings

rickman 8 years ago

That would depend a *lot* on the details of the setup and hold times for the async SRAM, no? You can do what you want with data for much of the clock cycle, but the address has to meet setup and hold for the entire WE time. That's typically more than half a clock cycle and makes it hard to use it on every clock cycle.

Rick C

lasselangwadtchristensen 8 years ago

and just using the clock gives you the headache of trying to control routing delays on data vs. WE

using the dual edge output flipflop makes it all much more controllable

KJ 8 years ago

Address (and data) setup and hold times are easily met. As a first order approximation, the setup time will be T/2-Tco(max). The address hold time will be Tco(min).
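Those two first-order approximations are easy to evaluate. The Python sketch below is only an illustration: the clock period and Tco figures are hypothetical, the zero requirements are the Tsa and Tha values cited in this post, and `we_on_clock_margins` is a made-up helper name.

```python
def we_on_clock_margins(period_ns, t_co_min_ns, t_co_max_ns,
                        t_sa_req_ns=0.0, t_ha_req_ns=0.0):
    """First-order address timing margins when WE is driven by the clock.

    setup provided = T/2 - Tco(max), hold provided = Tco(min); each margin is
    what is provided minus what the SRAM requires (Tsa, Tha).
    """
    setup_ns = period_ns / 2.0 - t_co_max_ns
    hold_ns = t_co_min_ns
    return setup_ns - t_sa_req_ns, hold_ns - t_ha_req_ns

# Hypothetical example: 50 MHz clock, Tco between 1 and 6 ns.
setup_margin, hold_margin = we_on_clock_margins(20.0, 1.0, 6.0)
print(f"address setup margin {setup_margin:.1f} ns, "
      f"hold margin {hold_margin:.1f} ns")
```

A negative margin would mean the approximation predicts a violation at that clock period; board and routing skew are not modelled here.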

What is your source for the statement "That's typically more than half a clock cycle"? The ancient Cypress CY62256N lists both of these requirements (Tsa and Tha) as 0 ns [1].

The technique works. You get single cycle read or write on 100% of the clock cycles, timing is met, period...and it worked 20+ years ago on a product I designed [2].

Kevin Jennings

[1] formatting link, page 7.

[2] USPTO 6,169,703 (Patent status = Expired): formatting link

KJ 8 years ago

On Sunday, August 6, 2017 at 1:30:46 PM UTC-4, snipped-for-privacy@gmail.com wrote:

Not true. There is nothing special that needs to be done to "control routing delays on data vs. WE". Do you have any basis for that statement?

Using the method I described is absolutely the same as connecting up two 74X374 flip flops, nothing more, nothing less. How is that a 'headache'?

Kevin Jennings

rickman 8 years ago

As long as the signals are registered in the output FFs that's true. But you can't register the clock! So the routing delays will be *very* important if running a fast async SRAM.

If the dual edge output flipflops are used, the clock can in fact be registered, in essence giving all signals the same delays within a tolerance.

Huh?

Rick C
