[NIOS-II SOPC] SDRAM Read Burst Cycle Length ...

Hi all [SOPC users],

Is there a way I can configure the read burst length of the standard SDRAM controller within SOPC 4.1?

Best Regards Markus

Markus Meng

Hi Markus,

You might try asking this over on the Nios Forum.

I'd like to know the answer as well. I looked through the controller's class.ptf file and even the Verilog source and don't see anything.

On writes, however, I'm getting bursts of at least 480 long words at one clock per word. (My system is running at 75 MHz.)

Ken

Kenneth Land

Did you have to do anything special to achieve that? I have a custom peripheral that is writing as fast as it can to the sdram, but I'm getting one 32-bit write every 3 clocks. With the prototype system I have at the moment, that's good enough, but I'd like to improve on it when we start making the real thing. When reading, I'm getting one read every 2 clocks - again, it's not ideal but it works. I'd expect one read/write per clock for most of the burst, with some waits while changing banks or refreshing.

Also, my reader and writer peripherals are independent, so sometimes they coincide. The Avalon bus arbitration apparently cannot take bursting into account, and swaps between the two accesses. Is there any way this can be improved upon, or do I have to implement my own mini-arbitrator to control the two peripherals?

David Brown

Hi David,

Apparently we have similar problems. I am designing my own peripheral that needs to read/write a word in every clock cycle. This peripheral is connected to the SDRAM controller as a master on the Avalon bus. What I see in simulation, when I am connecting to the SDRAM controller, is bursts of 2 words and then 2 or 3 clocks of delay, etc. I think the difference between my application (and I guess yours too) and Kenneth's is that Kenneth is using a DMA to burst data, whereas we are using simple master-to-slave transactions. Maybe by instantiating the DMA in SOPC Builder, the SDRAM controller is configured for longer bursts. I started developing my own SDRAM controller that will support full-page bursts, but if I find a way to get the standard SDRAM controller working, I might return to it.

Zohar

zg

Hi David,

There are a number of factors that affect SDRAM performance in the general sense. Typically you'll achieve best performance (approaching one clock per word read or write) if the accesses to SDRAM that are presented by your peripheral are burst-like. That is, you are reading from or writing to sequential addresses without interruption; this applies to our SDRAM controller. Even when transactions are optimal, you'll still face the occasional bank-switch delay when your address causes a bank change, and of course, the inevitable refresh delay every so often.

By contrast "thrashing" SDRAM, like thrashing a microprocessor cache, will have negative performance consequences... by thrash I mean accesses that are all over the place, requiring the SDRAM controller to take time switching banks continuously.

To address your concerns: first, we have some enhancements to Avalon in the works that will address a lot of these problems for the case you present (wanting to achieve burst-performance between multiple peripherals). If the results you're getting are good enough for now, I'd suggest waiting for the next SOPC Builder release as it will include these Avalon enhancements.

If you need better performance right now with your setup, feel free to send me an email and I'd be happy to give you a few pointers (about Avalon arbitration and a couple of other things) that may help.

Jesse Kempa Altera Corp. jkempa at altera dot com

Jesse Kempa

Have you guys looked at increasing the arbitration priority for the SDRAM controller? The NIOS needs to fetch opcodes every once in a while.

Ben

Ben Twijnstra

I'm not using the standard DMA peripheral, but I am using intelligent masters. Both the reader and the writer are designed to work with continuous bursting - they feed to or from fifos with a depth of 256 32-bit words. In each case, the other end of the fifo is slower than this - the idea is to use bursting so that they occupy the minimum bus bandwidth. When running them on a different board with a fast synchronous ram, each runs one transfer per cycle (the reader is latency-aware).

David Brown

The transfers should be optimal (except when the reader and writer coincide, or another master uses the same memory). I've tested the reader on a board with a synchronous ram, and it runs one transfer per cycle (it is latency-aware). I haven't done as intensive testing on the write master, but it should work fine at one transfer per clock.

I've got a few things still to look at. My design is purely synchronous on the rising edge of the system clock - perhaps the waitRequest signal is glitching briefly and is being captured as high. So I've got a bit of testing with an oscilloscope to do first - SignalTap uses the same clock to capture bits, and therefore would suffer from the same problem.

I'm aware of that, and it's not a problem. I also have the Nios II program and data in the same memory device, so that will grab occasional cycles (not many, since the Nios has cache and is busy-waiting during the transfers).

Sounds good - that should save me from handling the overlap between the reader and writer (and other masters). I have fifos connected to both bursting masters, giving me a fair amount of slack - my main interest in bursting is to cut down the bus time used to fill/empty the fifos.

I think I have a fair overview of the arbitration - conflicts when both masters want the bus at the same time are not my biggest problem, and the enhancements you're planning to SOPC Builder will help there. It's more the bursting that I haven't got 100% yet.

I suppose another possibility for me would be to use a standard DMA component to transfer between the fifo and the main memory, with a little component in between to act as an Avalon slave tied to the fifo and changing "fifo used" information into streaming control signals. Do you think it would be easier to get full-speed bursting using the standard DMA component?

David david at westcontrol dot com

David Brown

Yes - it's not the source of the problem (for me, anyway). The Nios is running an empty loop most of the time during these bursts, and it's configured with instruction cache.

David Brown

Hi David,

Sorry for the delay; I've been out of town with no access.

I'm using the standard included DMA peripheral to empty a standard single-clock fifo. I did not modify any of the master priorities.

I can post a test project to the Nios Forum that runs on the Cyclone devkit board, but I don't have time to document it very well.

If anyone is in a bind they can contact me through the forum and I will email them a .zip of the project. (~4MB)

The credit for this perfect performance goes mostly to an engineer at Altera who worked with me for over a week until we had it perfect.

One thing was that Read_Latency="1" had to be added to the Interface to User Logic settings. (see below for the entire settings list)

Ken

SYSTEM_BUILDER_INFO
{
   Bus_Type = "avalon";
   Address_Alignment = "dynamic";
   Address_Width = "2";
   Data_Width = "32";
   Has_IRQ = "1";
   Base_Address = "0x010019A0";
   Has_Base_Address = "1";
   Read_Latency = "1";
   Read_Wait_States = "0.0cycles";
   Write_Wait_States = "0.0cycles";
   Setup_Time = "0.0cycles";
   Hold_Time = "0.0cycles";
   Is_Memory_Device = "1";
   Uses_Tri_State_Data_Bus = "0";
   Is_Enabled = "1";
   MASTERED_BY SCAN_IN_DMA/read_master
   {
      priority = "1";
   }
   IRQ_MASTER cpu/data_master
   {
      IRQ_Number = "0";
   }
   MASTERED_BY cpu/data_master
   {
      priority = "1";
   }
}

Kenneth Land

So you have a DMA device set up to read from a fifo as a slave. Did you make a class.ptf file for the fifo slave that you could post here? Am I right in thinking you used a standard fifo (i.e., readReq triggers a new read) rather than a read-ahead fifo (i.e., readReq works as an acknowledge)? I think the read-ahead version would eliminate your need for a read latency. Also, were you feeding data into the fifo at full speed? In my application, the other end of the fifo is much slower (roughly 1/6 of the system speed), so I'd like to let the fifo build up a bit before starting a burst - perhaps triggering a burst start when it is half-full, and stopping again when it is empty. I expect to write my own slave, wrapping the fifo and providing such control signals (I have much of the code for that from my original version using my own master). Would I be correct in thinking that the way to handle this is for the DMA to be set up to transfer a given number of words, and have the slave assert waitRequest to pause the DMA reads until the fifo is half-full?

David

David Brown

I have just thought of another possibility for my non-optimal bursts. I have been running the cpu at 60 MHz rather than the default 50 MHz for the development board. There is no problem running at that speed (and it could probably go a fair amount faster), as long as I'm lucky with the sdram clock synchronisation (the clock setup on the Cyclone Nios development card is daft, IMHO).

However, I notice that some of the timing parameters for the Micron sdram chip are at 20 ns. The sdram controller will round these up to integer clock cycles - at a 50 MHz clock, that's one clock, while at 60 MHz they would be two clocks. As far as I can see, the two timings in question ("Active to read or write delay trcd" and "duration of precharge trp") should cause an extra cycle during bank activation, but not during bursts - the sdram supports full speed bursts up to 143 MHz. I'm also wondering if there is a particular reason for having the default CAS Latency at 3 - the sdram chip supports CAS 2 up to 100 MHz.
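
The rounding described above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the controller takes the smallest whole number of clocks that covers each 20 ns parameter, and the function name is made up for illustration.

```python
def cycles_for(t_ns: int, clock_mhz: int) -> int:
    """Smallest whole number of clock cycles that covers t_ns nanoseconds."""
    # t_ns * clock_mhz / 1000 is the delay measured in clock periods;
    # adding 999 before the integer division rounds it up.
    return (t_ns * clock_mhz + 999) // 1000


T_RCD_NS = 20  # "active to read or write delay" quoted in the post
T_RP_NS = 20   # "duration of precharge" quoted in the post

for mhz in (50, 60, 75):
    print(f"{mhz} MHz: trcd = {cycles_for(T_RCD_NS, mhz)} clk, "
          f"trp = {cycles_for(T_RP_NS, mhz)} clk")
```

At 50 MHz both parameters fit in one clock, while at 60 MHz and 75 MHz each needs two, which matches the extra cycle expected around bank activation.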

David Brown

You've got it. I generate an IRQ based on the fifo level, and then in the ISR I set up the DMA to dump the fifo to an sdram buffer. The fifo is being filled at about one word every 700 ns, and when the fifo level reaches 480 the IRQ dump process is triggered. The interrupt latency is about 8 µs and the DMA itself takes approximately 485 system clocks. (Other code is hammering the sdram in parallel, plus sdram refresh etc.)
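
As a rough sanity check on those numbers, the sketch below estimates how full the 512-word fifo could get before the DMA has drained it. It is deliberately pessimistic: it assumes nothing is removed from the fifo until the whole DMA transfer has finished, and it uses the 75 MHz system clock mentioned in this thread. The helper name is hypothetical.

```python
import math

FIFO_DEPTH = 512        # LPM_NUMWORDS of the scfifo in this post
IRQ_LEVEL = 480         # fill level that raises the IRQ
FILL_PERIOD_NS = 700    # about one word arrives every 700 ns
IRQ_LATENCY_NS = 8_000  # about 8 us from IRQ to DMA start
DMA_CLOCKS = 485        # approximate DMA duration in system clocks
SYSCLK_MHZ = 75         # system clock quoted in the thread


def words_arriving(duration_ns: float) -> int:
    """Upper bound on the words written into the fifo during duration_ns."""
    return math.ceil(duration_ns / FILL_PERIOD_NS)


dma_ns = DMA_CLOCKS * 1000 / SYSCLK_MHZ
# Pessimistic bound: count every word that arrives during the interrupt
# latency and the DMA transfer as if the fifo were not being drained.
worst_level = IRQ_LEVEL + words_arriving(IRQ_LATENCY_NS + dma_ns)

print(f"DMA duration: {dma_ns:.0f} ns")
print(f"worst-case fill level: {worst_level} of {FIFO_DEPTH}")
print(f"headroom: {FIFO_DEPTH - worst_level} words")
```

Even under that pessimistic assumption the level stays below the fifo depth, so a trigger level of 480 leaves some margin at this fill rate.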

Here is the fifo instantiation:

NFIFO : scfifo WITH (
    INTENDED_DEVICE_FAMILY = "Cyclone",
    LPM_WIDTH = 36,
    LPM_NUMWORDS = 512,
    LPM_WIDTHU = 9,
    LPM_TYPE = "scfifo",
    LPM_SHOWAHEAD = "ON",
    OVERFLOW_CHECKING = "OFF",
    UNDERFLOW_CHECKING = "OFF",
    USE_EAB = "ON",
    ADD_RAM_OUTPUT_REGISTER = "ON"
);

Then I hook the chipselect (CS) and read signals from the IUL port thusly:

RD_DN.d = SCAN_IN_CS & SCAN_IN_RE;
RD_DN.clk = CLK100M;

NFIFO.rdreq = RD_DN;
NF_OUT[35..0] = NFIFO.q[35..0];

NF_OUT[31..0] is connected directly to the data in port of IUL.

CLK100M is actually the sysclk which is 75 MHz right now. Timing analysis says I'm good to 97MHz, but I haven't pushed it yet.

I'm pretty sure this info, together with the system builder IUL port settings I posted earlier, is all you need. Hope this helps.

Ken

Kenneth Land

I've now tried something similar - I removed my own master for my reader, and replaced it with a standard DMA component. This feeds a fifo (via a slave port on my reader component, which controls the "waitRequest" line in order to stall the DMA when the fifo is fairly full until it is nearly empty again). I actually implemented a small master to control the DMA, starting transfers automatically and independently of the Nios. It all works fine, except... I get one read every two cycles, exactly as when using my own master.

When I ran at 50 MHz instead of 60 MHz, the bursts were one read per cycle. I was getting some corruption, but I suspect that was a matter of the phase for the sdram clock being not quite right for 50 MHz, so I'm worried there. However, it looks very much like the standard Nios II sdram controller with the dev kit sdram and default timings won't run properly at 60 MHz. Could that be the case?

David

David Brown

Are you reading sdram or writing? I'm writing sdram in one clock @75MHz. Since the sc_fifo is good to 200+MHz and has look ahead, in theory it should be easy to achieve one clock access into (or out of) any Nios/SOPC system.

Your DMA setup sounds useful. How do you set up the DMA independently of the Nios? Do you have external logic writing to its control registers? Also, when you stall the DMA with waitRequest, it doesn't stall the Avalon bus, does it?

I think I'll try to set that up in my system so I can dma in chunks that make sense to my application instead of fifo almost full.

Ken

Kenneth Land

I've got both in the system - at the moment, I've been working on the reader. My writer is also sub-optimal - it seems to be 3 cycles per word at 60 MHz, and 2 cycles per word at 50 MHz, although I haven't tried fine-tuning it as much (and I haven't tried using a standard DMA component).
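
To put the cycles-per-word figures reported in this thread into bandwidth terms, here is a small worked calculation. It assumes 32-bit (4-byte) words and counts 1 MB as 10^6 bytes; the function name is illustrative only.

```python
BYTES_PER_WORD = 4  # 32-bit words, as used throughout the thread


def throughput_mb_s(clock_mhz: int, cycles_per_word: int) -> float:
    """Sustained burst throughput in MB/s (1 MB = 10**6 bytes)."""
    return clock_mhz * BYTES_PER_WORD / cycles_per_word


# (clock in MHz, cycles per word) as reported in the posts
cases = {
    "write at 60 MHz": (60, 3),
    "write at 50 MHz": (50, 2),
    "read at 60 MHz": (60, 2),
    "read at 50 MHz": (50, 1),
}

for name, (mhz, cpw) in cases.items():
    print(f"{name}: {throughput_mb_s(mhz, cpw):.0f} MB/s")
```

With these figures the slower 50 MHz clock actually moves more data per second than 60 MHz in both directions, because the extra wait cycle costs more than the higher clock gains.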

My reader component has 3 Avalon bus interfaces - two slaves and a master. One slave is used for control and setup, and is connected to the Nios II data master. The other slave is for the fifo input - it has no address decoding, and simply accepts words written to it and feeds them into the fifo. It generates a waitRequest signal as necessary to stall its master when the fifo does not want more data (after the fifo is mostly filled, waitRequest is asserted until the fifo is mostly empty again). This slave is connected to the dma's writer master. The dma's reader master is connected to the sdram slave (as are the Nios masters), while the dma's slave is connected to the reader's master.

Thus the whole system has three independent Avalon buses - there is the main one used by the Nios and the dma reader, there is a private one between the dma writer and the fifo slave, and a second private one between the reader's master and the dma's slave port. The reader master port simply reloads the dma's configuration and starts it running on a regular basis.
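
The waitRequest behaviour of the fifo input slave described above is a simple hysteresis on the fill level. The sketch below is a behavioural model only, not the Confluence source: the class name and the two thresholds are assumptions chosen for a 256-word fifo.

```python
class FifoGate:
    """Hysteresis on fifo fill level: stall when mostly full, release when mostly empty."""

    def __init__(self, high: int, low: int) -> None:
        if low >= high:
            raise ValueError("low threshold must be below high threshold")
        self.high = high
        self.low = low
        self.wait_request = False

    def update(self, level: int) -> bool:
        """Return the waitRequest state for the current fill level."""
        if level >= self.high:
            self.wait_request = True    # mostly full: stall the DMA write master
        elif level <= self.low:
            self.wait_request = False   # mostly empty: let the burst resume
        # Between the thresholds the previous state is held.
        return self.wait_request


# Thresholds are assumptions for illustration, not values from the post.
gate = FifoGate(high=224, low=32)
trace = [gate.update(level) for level in (0, 100, 224, 150, 40, 32, 100)]
print(trace)
```

Holding the state between the two thresholds is what turns the slow drain on the far side of the fifo into a few long bursts rather than many short ones.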

Being a private bus, there is no problem with stalling the dma writer master with a waitRequest - the other buses run regardless.

The whole thing is a little under 100 lines of Confluence code (plus another 150 lines for the component at the other end of the fifo).

David

David Brown
