Altera's altsyncram MAXIMUM_DEPTH

What does this generic mean?

I am wondering if I am missing out on a possible memory optimization.

Altera's docs are decidedly vague and a search on their website brings up nothing.

-- Pete

Peter Sommerfeld

Hi Peter,

Yes, you are.

Quartus allocates memory depth-first: a 1024x8-bit memory therefore uses two M4Ks in 1024x4 mode. If your memory width and depth are powers of two, the allocation order doesn't matter except for some speed details. But a 1300x8-bit memory is much better allocated by width than by depth, because only 3 M4Ks are needed for the former compared to 4 for the latter (see [link] for further details). MAXIMUM_DEPTH should help you force Quartus not to waste this additional memory block.
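The block arithmetic behind this can be sketched in a few lines of Python. This is only a rough model, not what Quartus actually runs: it assumes the data-only M4K modes (4096x1 down to 128x32, parity bits ignored), and the helper names `next_pow2` and `m4k_blocks` are made up for illustration. It uses the 1300x8 case from the service request quoted further down.

```python
import math

# Data-only M4K configurations: slice depth -> data bits per word.
# Parity bits are ignored in this sketch.
M4K_MODES = {4096: 1, 2048: 2, 1024: 4, 512: 8, 256: 16, 128: 32}


def next_pow2(n):
    """Smallest power of two that is >= n."""
    return 1 << (n - 1).bit_length()


def m4k_blocks(words, width, slice_depth=None):
    """Estimate the number of M4Ks for a words x width memory.

    slice_depth=None models the depth-first default: the depth is rounded
    up to the next power of two (clamped to 128..4096) and used as the
    block depth. Otherwise the memory is built from slices of slice_depth.
    """
    if slice_depth is None:
        slice_depth = min(max(next_pow2(words), 128), 4096)
    bits_per_block = M4K_MODES[slice_depth]
    rows = math.ceil(words / slice_depth)      # slices stacked in depth
    cols = math.ceil(width / bits_per_block)   # blocks side by side in width
    return rows * cols


print(m4k_blocks(1300, 8))        # depth-first, 2048x2 slices
print(m4k_blocks(1300, 8, 512))   # by width, 512x8 slices
```

Under these assumptions the depth-first layout comes out at 4 blocks and the 512-deep layout at 3, which is the one-block difference described above.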

Unfortunately it doesn't work. Not even the way Altera thinks it should work. I had a long (and somewhat bizarre) service request, the last entry of which was the following:

-- Altera wrote:
This is to let you know that a software problem request has been filed in order to reflect this issue. I will let you know as soon as the software group gets back to me with any information or when a resolution is made.

-- Altera wrote much more, but [snip]

This was written on 25 August, and the service request was closed without further comment. About one month ago I posted an additional request asking for the current status of the problem request and did not receive any answer. Either Altera doesn't care, or they don't want to state that this is an issue before they are able to ship the new Quartus 4.0 (hopefully fixing this and a lot of other things). Who knows? If anyone in the group thinks they can help on this topic or has further details, I would be thankful to hear about it, as Quartus wastes a lot of my memory and this has to change!

I have to say that my experience with Altera mySupport has been mixed. Answers are generally quick and friendly (which is already a lot), but they are generally only helpful when problems are simple. Whenever the problem gets more complex or there is a bug, things get very slow (or even stop).

Regards, Manfred

BTW: "Release notes for Service Pack 2 will be released on Friday, October 24, 2003." (seen on [link] qii30sp2.jsp on 17 November)

======= Service Request Detail (reordered for your convenience)
Request #: 10363308
Status: Closed
Date Opened (PDT): 8/19/03 9:03 AM
Date Closed (PDT): 9/4/03 6:52 PM
Inquiry Type: Product Question
Device Family: CYCLONE
Device:
Title: FIFO implementation size

Description: I have created a 1300-word by 8-bit FIFO (scfifo). The implementation of this needs 16384 memory bits. Why?

The FIFO size should result in about 1300x8=10400 memory bits. As the block size of the embedded RAM in Cyclone is 4096 bits, which can be organized as 512x8, I expect Quartus to use three M4Ks, resulting in 4096*3=12288 bits. Obviously it uses a fourth block. Why?

Regards, Manfred

------ 8/19/03 3:17 PM To Customer

Hello Manfred,

This is to let you know that I am currently looking into this. I will let you know as soon as I am able to verify the problem as you have described and come to a resolution.

------ 8/19/03 4:20 PM To Customer

Hello Manfred,

Since 1300 is larger than 1K, it'll use 2Kx2 mode for best performance. To get the x8 mode you'll need 4 M4Ks. Click Custom on page 6 of 8 of the MegaWizard; you then get an option to set the maximum depth. If you set it to 512, it'll use that mode and should only need 3 M4Ks.

For more information on this, you may refer to the following link: [link]

------ 8/20/03 12:36 AM From Customer

Hello Marlon,

Thanks for your quick and helpful reply. Now the behaviour of Quartus is clear to me. Unfortunately, setting the parameter max. block depth to 512 in the MegaWizard Plug-In Manager as you proposed does not result in smaller memory consumption. I have attached the packed project for your convenience. Setting this parameter adds the following line to the scfifo instantiation code:

    maximum_depth => 512,

However, this parameter is not described in the Quartus II help page for the scfifo megafunction. Why?

Regards, Manfred

------ 8/20/03 9:47 AM To Customer

Hello Manfred,

The MAXIMUM_DEPTH parameter is an internal parameter so there won't be any information on this in the Quartus II Help or Megawizard.

------ 8/20/03 11:26 PM From Customer

Hello Marlon,

Again: unfortunately, setting the parameter max. block depth to 512 in the MegaWizard Plug-In Manager as you proposed does NOT result in smaller memory consumption. Why? Please check with the attached project file.

Regards, Manfred

------ 8/21/03 5:08 PM To Customer

Hello Manfred,

Sorry for the inconvenience, but actually, in order to get the x8 mode you'll need 4 M4Ks.

------ 8/21/03 11:49 PM From Customer

Hello Marlon,

Could you please specify why it is not possible to implement a 1300x8 FIFO in 3 M4K blocks? This contradicts both your first advice and the support database page you mentioned ([link]). What exactly is the maximum block depth parameter for, then?

Regards, Manfred

------ 8/25/03 6:50 PM To Customer

Hello Manfred,

This is to let you know that a software problem request has been filed in order to reflect this issue. I will let you know as soon as the software group gets back to me with any information or when a resolution is made.

Manfred Mücke

MAXIMUM_DEPTH controls the underlying RAM block size that will be used to construct the user's altsyncram memory. By default, the altsyncram megafunction will round up the memory depth to the next power-of-2, and use that as a RAM block size. For example, if you ask for a 3K-word memory, altsyncram will normally construct it from 4K RAM blocks, because this gives the best performance. If you are running short of RAM blocks, you could specify MAXIMUM_DEPTH=1024 for this example, and the altsyncram megafunction will construct the 3K memory from 1K-word RAM blocks, which might potentially use 1/4 fewer RAM blocks. The penalty for doing this is that the 3K-word memory constructed from 1K-word RAM blocks will need LEs to mux and de-mux the data, and will also run slower as a result.

In summary, MAXIMUM_DEPTH is a control to increase memory efficiency for non-power-of-2 memory depths, but at a cost of lower memory performance, and a few LEs to stitch the smaller RAM blocks together. MAXIMUM_DEPTH can only take power-of-2 values, with 32 being the smallest meaningful value, since it corresponds to the shallowest M512 memory block configuration.
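As a rough illustration of the 3K-word example, the Python sketch below counts M4Ks and the number of depth slices that would have to be muxed together. It is a simplified model under stated assumptions, not the megafunction's actual algorithm: the 8-bit width is an arbitrary choice, parity bits are ignored, only M4K depths (128 to 4096) are modelled, and the function names are hypothetical.

```python
import math

M4K_DATA_BITS = 4096  # data bits per M4K, parity ignored in this sketch


def is_valid_maximum_depth(value):
    """Power of two and at least 32, as stated in the summary above."""
    return value >= 32 and (value & (value - 1)) == 0


def build_from_slices(words, width, slice_depth):
    """Return (m4k_blocks, slices_to_mux) for a words x width memory."""
    if not is_valid_maximum_depth(slice_depth):
        raise ValueError("MAXIMUM_DEPTH must be a power of two >= 32")
    if not 128 <= slice_depth <= 4096:
        raise ValueError("this sketch only models M4K depths 128..4096")
    bits_per_block = M4K_DATA_BITS // slice_depth
    rows = math.ceil(words / slice_depth)     # depth slices, muxed in LEs
    cols = math.ceil(width / bits_per_block)  # blocks side by side in width
    return rows * cols, rows


default = build_from_slices(3 * 1024, 8, 4096)  # default: 4K-word blocks
limited = build_from_slices(3 * 1024, 8, 1024)  # MAXIMUM_DEPTH=1024
print(default, limited)
```

For the assumed 8-bit width this gives 8 blocks with no muxing against 6 blocks with a 3-way mux, which matches the "1/4 fewer RAM blocks" figure and shows where the extra LEs come from.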

- Subroto Datta Altera Corp.


Hi Manfred, Subroto:

Thank you very much for your in-depth replies. I'm happy to see that MAXIMUM_DEPTH does what I was hoping it does, because I need many RAMs with non-power-of-2 sizes, and I'm feeling a little too lazy to write my own muxing logic.

Manfred, I compiled a design that had one depth-first and one width-first RAM block, each being 1,089 x 32 bits. The depth-first one used 16 M4Ks as 2048x2, and the width-first one used 9 M4Ks as 128x32, so the functionality appears to be working for me. Perhaps certain memory configurations work properly with MAXIMUM_DEPTH, while others (i.e. yours) do not?

As expected the critical path was in the width-first logic, but was still 220 MHz+.
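To see how the block count moves with the slice depth for a 1,089 x 32 RAM, here is a small Python sweep over the data-only M4K modes. It is only an estimate under the assumption that every slice uses the same mode and parity bits are unused; `sweep` is a made-up helper, not a Quartus feature.

```python
import math

# Data-only M4K configurations as (slice depth, data bits per word).
M4K_MODES = [(4096, 1), (2048, 2), (1024, 4), (512, 8), (256, 16), (128, 32)]


def sweep(words, width):
    """Map each candidate slice depth to the estimated number of M4Ks."""
    return {
        depth: math.ceil(words / depth) * math.ceil(width / bits)
        for depth, bits in M4K_MODES
    }


counts = sweep(1089, 32)
for depth, blocks in counts.items():
    print(f"slice depth {depth:4d}: {blocks:2d} M4Ks")

best_depth = min(counts, key=counts.get)
print("fewest blocks at slice depth", best_depth)
```

In this model the 2048-deep layout needs 16 blocks and the 128-deep layout needs 9, the same counts as reported above, with the intermediate depths falling in between.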

I am using Quartus II 3.0 SP2. I found the release notes at [link].

Thanks again,

-- Pete

Peter Sommerfeld

Hi Manfred, Peter,

The MAXIMUM_DEPTH description that was posted in my previous reply applies to the altsyncram megafunction, and indirectly to scfifo and dcfifo megafunctions. The FIFO megafunctions do not support non-power-of-2 depths, so the memory example I gave does not apply. In Quartus II 4.0, the FIFO MegaWizard plug-in will not allow you to enter non-power-of-2 depths.

The only reason for specifying a MAXIMUM_DEPTH parameter in a FIFO megafunction in pre-4.0 versions of Quartus would be to enforce a smaller RAM block size to give added freedom to the fitter. MAXIMUM_DEPTH values of 128, 256, and 512 can fit in either M512 blocks or M4K blocks. A MAXIMUM_DEPTH value of 4096 can fit in either an M4K block or an M-RAM.

Here's an example: I have a 2K word FIFO, and I don't care if it goes into M4K blocks or M512 blocks. If I set MAXIMUM_DEPTH=512, the FIFO will be constructed from 512-word RAM slices, which gives the fitter the flexibility to place the FIFOs in either M512 blocks or M4K blocks.
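The fitter freedom described here can be pictured as a lookup from slice depth to the block types that support it. The Python sketch below encodes the depth ranges mentioned in this thread (32 as the shallowest M512 depth; 128, 256 and 512 fitting both M512 and M4K; 4096 fitting both M4K and M-RAM). The 64K upper bound for M-RAM and the treatment of each range as contiguous are assumptions for illustration, and `candidate_blocks` is a hypothetical name.

```python
# Supported slice depths per block type as (min_depth, max_depth).
# The M-RAM upper bound of 65536 is an assumption, not from this thread.
DEPTH_RANGE = {
    "M512": (32, 512),
    "M4K": (128, 4096),
    "M-RAM": (4096, 65536),
}


def candidate_blocks(maximum_depth):
    """Block types whose depth range covers the given slice depth."""
    return [
        name
        for name, (lo, hi) in DEPTH_RANGE.items()
        if lo <= maximum_depth <= hi
    ]


for depth in (32, 512, 2048, 4096):
    print(depth, candidate_blocks(depth))
```

With MAXIMUM_DEPTH=512 the sketch lists both M512 and M4K, which is the flexibility the 2K-word FIFO example relies on; a 2048-word slice leaves only M4K.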

- Subroto Datta Altera Corp.


Hi Subroto,

This is a very clean answer to a very long service request issue. It would have saved me a lot of time to get the very same answer from Altera mySupport. Instead they left me with a dangling service request and the information that there is a potential bug in Quartus. Is it possible for you to look into that, or to share your knowledge with your support team? I would appreciate getting an official answer from mySupport that really closes my service request. BTW: Why do you restrict FIFO depths to powers of two? Allowing other depths would permit trading memory usage against implementation speed (as with altsyncram).

Regards, Manfred

Manfred Mücke

Probably because FIFO storage is based on a ram, and ram comes in increments of one address bit.

As Subroto said, the extra space from altsyncram MAXIMUM_DEPTH to the top could not be used as RAM in any case.

-- Mike Treseler


True as long as the size of the memory/FIFO is smaller than the memory blocks available in the device. A Cyclone, for example, uses M4K memory blocks with 4096 bits each (as the name suggests). So for RAMs/FIFOs larger than 4096 bits, the M4K block is the smallest building unit, allowing you to implement a RAM/FIFO using 3*4096=12288 bits from 3 M4K blocks (depending on the FIFO width). Because address decoding is easier when aligning by depth, and to improve speed, it can make sense to use more M4K blocks (four in our example), wasting some memory, but it is by no means a necessity. This limitation does not apply to RAM but only to FIFOs, and will be introduced in Quartus 4.0, as Subroto said. However, RAM and FIFOs are both implemented in the very same memory blocks, so it's up to the wizard/module designer to allow or restrict the depth. Restricting FIFO depths to powers of two is a choice, but as long as there is no special FIFO RAM block, it is not a must. My question was why this limitation, which restricts potential savings on memory bit consumption, will be introduced.

Regards, Manfred

Manfred Mücke

Followup to: By author: Manfred Mücke

In newsgroup: comp.arch.fpga

There is another issue, which is that the RAMs are actually 4608 bits, not 4096. I have seen Quartus refuse to use those extra bits in situations where it could have, because it prefers to organize by depth, and there is apparently no way to work around this.

I would really like to see:

(a) support of non-power-of-two memory sizes; (b) ability to optimize for RAM consumption at the expense of timing.

This in particular was an issue when I tried to create a 16384 x 9 bit ROM, and yes, I needed all 9 bits...
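The numbers for that ROM can be worked through with a short Python sketch. It assumes the usual M4K modes including the parity bits (4096x1, 2048x2, 1024x4, 512x9, 256x18, 128x36) and that a depth-first layout would pick the deepest mode, 4096x1; that choice is an assumption about the tool, not something I have confirmed.

```python
import math

M4K_TOTAL_BITS = 4608  # 4096 data bits plus 512 parity bits

# M4K configurations with parity bits usable: slice depth -> bits per word.
M4K_PARITY_MODES = {4096: 1, 2048: 2, 1024: 4, 512: 9, 256: 18, 128: 36}


def blocks(words, width, slice_depth):
    """Estimated M4Ks when the memory is built from slice_depth-word slices."""
    bits_per_block = M4K_PARITY_MODES[slice_depth]
    return math.ceil(words / slice_depth) * math.ceil(width / bits_per_block)


words, width = 16384, 9
lower_bound = math.ceil(words * width / M4K_TOTAL_BITS)
print("theoretical minimum:", lower_bound)
print("4096x1 slices:", blocks(words, width, 4096))
print("512x9 slices: ", blocks(words, width, 512))
```

Under these assumptions the 512x9 layout hits the theoretical minimum of 32 blocks exactly, while the 4096x1 layout needs 36, so the ninth bit costs four extra blocks whenever the parity bits go unused.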

-hpa

H. Peter Anvin

Hi Subroto,

I would like to renew my question: Why do you restrict FIFO depths to powers of two? I can't see the need for that.

Regards, Manfred

Manfred Mücke

Hi Manfred, There is no real need to restrict it. There have been several requests to relax this condition and we will get to it in a future release.

- Subroto Datta Altera Corp.


Hi Manfred,

The dual-clock FIFO internally uses a Gray counter, which is fairly trivial to write for a power of two, plus the fact that counter rollover happens with a single-bit transition as well.

The Gray counter greatly reduces the risk of the Other Side (the one in the different clock domain) seeing inaccurate counter values: the count is either the same, or only one bit has changed. For a normal counter, due to variations in the delay path between the various counter bits, part of the logic in the other clock domain might see a number of counter bits still having the old value, and a number that have the new value, resulting in a nonsense value. When there's a large difference between reader and writer clock frequencies, more than one bit may change between samples, but at least the number of transitions is minimized over time.

I haven't studied Gray counters deeply enough to see whether it's feasible, or even possible to write a Gray counter generator algorithm that can _efficiently_ do single-bit-transition counter rollover on an arbitrary (though pre-computed) value. If this is possible without going into long combinatorial chains (which would reduce operating frequency) it should definitely be feasible to remove this power-of-two restriction.
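The rollover problem is easy to check numerically. The Python sketch below uses the standard binary-reflected Gray code and measures the worst-case number of bits that flip between consecutive counts, including the wrap. It also tries one known trick for even moduli, taking the middle N codes of the full 2^k sequence, which keeps the wrap at a single bit because the reflected code is symmetric about its midpoint. This says nothing about hardware cost or about what the dcfifo megafunction does internally, and the function names are illustrative only.

```python
def to_gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def worst_step(codes):
    """Most bits flipped between cyclically consecutive codes, wrap included."""
    count = len(codes)
    return max(bits_changed(codes[i], codes[(i + 1) % count]) for i in range(count))


def naive_sequence(modulus):
    """Gray codes of 0..modulus-1, wrapping straight back to 0."""
    return [to_gray(i) for i in range(modulus)]


def centred_sequence(modulus, width):
    """Even modulus only: the middle `modulus` codes of the 2**width sequence."""
    if modulus % 2 or modulus > (1 << width):
        raise ValueError("modulus must be even and fit in the given width")
    skip = ((1 << width) - modulus) // 2
    return [to_gray(i) for i in range(skip, skip + modulus)]


print(worst_step(naive_sequence(8)))       # power of two
print(worst_step(naive_sequence(6)))       # naive wrap from 5 back to 0
print(worst_step(centred_sequence(6, 3)))  # middle six codes of the 3-bit sequence
```

For a depth of 8 every step, including the rollover, flips one bit. Wrapping naively at 6 flips three bits at the rollover, while the centred sequence for 6 stays at one bit per step; odd depths are not covered by this trick.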

For the single-clock version - hey, why not?

Just my $.02

Ben Twijnstra
