Cyclone Problem with Large Shift Register

- John

Fri, Dec 3, 2004 10:29 AM

I have been running a shift register design through a web version of Quartus 4.1 (SP2). Depending on the size of the shift register, either the tools don't complete (I waited 30 mins and gave up) or, for smaller shifts of 720, I get a design that is large and takes a long time to implement.

Has anyone else seen this problem? Or does anyone know of any tool switches that need to be set to solve it?

I have selected a large enough Cyclone part, and before anyone asks, I am running a reasonable machine: an Athlon64 3000 with 512 MByte of memory, for those who want the detail. I have run the same design (large version) on Spartan-3 / ISE, and it takes less than 3 minutes to do the same.

John

- Subroto Datta

Sat, Dec 4, 2004 12:30 AM

John, we would like to investigate this further and help you. It would help if you could send me the source you used or post it here. Thanks

- Subroto Datta Altera Corp.

- Walter Gallegos

Sat, Dec 4, 2004 12:28 PM

A 720-stage shift register needs 45 Xilinx CLBs, but how many Altera LEs?

Walter.

"Subroto Datta" wrote in message news:snipped-for-privacy@posting.google.com...


- John

Mon, Dec 6, 2004 10:28 AM

I've modified it slightly, and the 720 version isn't bad now, but run times on the following setup are still bad. It looks a bit pointless as a design, but as you can probably guess, it was aimed at getting a real, in-the-field power consumption figure.

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.STD_LOGIC_ARITH.ALL;
    use IEEE.STD_LOGIC_UNSIGNED.ALL;

    entity TEST_POWER is
      Port ( OUT_LINES : out std_logic_vector(260 downto 1);
             CLOCK     : in  std_logic;
             RESET     : in  std_logic);
    end TEST_POWER;

    architecture a0 of TEST_POWER is

      CONSTANT VEC_SIZE : INTEGER := 7150;
      CONSTANT IOSIZE   : INTEGER := 260;

      SIGNAL XOR_INPUT : STD_LOGIC_VECTOR(VEC_SIZE DOWNTO 1);
      SIGNAL SHIFT_REG : STD_LOGIC_VECTOR(VEC_SIZE DOWNTO 0);

    begin

      XORGEN : FOR I IN 1 TO VEC_SIZE GENERATE BEGIN
        XOR_INPUT(I) <= ...


- John

Mon, Dec 6, 2004 10:34 AM

It is only 45 CLBs if you don't use a reset. The point of this design was to act as a power test: fill a device, and then get a genuinely accurate power reading.

Xilinx have a huge advantage on shift registers with SRL16s, which I believe Altera can't easily mimic due to patent issues. Someone from Altera can tell me if I am wrong about this.

The 720 version was built with the help of the Wizards, not the code posted elsewhere, and came to about 1100 LEs. With a variation of the posted code, it started to get better.
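A minimal VHDL sketch of the reset point made above. The entity name ff_shift_reg and its generics LENGTH and USE_RESET are hypothetical and are not taken from the posted TEST_POWER design. The SRL16 primitive keeps its contents in LUT memory that has no reset input, so a description that clears every stage (USE_RESET = true) can only be built from flip-flops. Without the reset, the same chain becomes a candidate for SRL16 (Xilinx) or RAM (Altera) mapping, although what a given tool version actually does should be checked in its fitter report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: a LENGTH-stage, 1-bit shift register.
entity ff_shift_reg is
  generic (
    LENGTH    : positive := 720;   -- number of stages
    USE_RESET : boolean  := false  -- true: synchronous clear of every stage
  );
  port (
    clk   : in  std_logic;
    reset : in  std_logic;         -- ignored when USE_RESET = false
    din   : in  std_logic;
    dout  : out std_logic
  );
end entity;

architecture rtl of ff_shift_reg is
  signal sr : std_logic_vector(LENGTH downto 1) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if USE_RESET and reset = '1' then
        -- Clearing all stages at once is something SRL16 storage cannot do,
        -- so this branch forces one flip-flop per stage.
        sr <= (others => '0');
      else
        -- Shift toward the MSB; din enters stage 1.
        sr <= sr(LENGTH-1 downto 1) & din;
      end if;
    end if;
  end process;

  dout <= sr(LENGTH);
end architecture;
```

With USE_RESET left at false, the reset branch disappears entirely, which is what makes the chain eligible for those storage-based mappings.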

John

- Vaughn Betz

Tue, Dec 14, 2004 5:37 AM

Hi John,

I'd disagree that Altera is at any disadvantage on shift registers -- we just have a different approach.

Building large shift registers out of LE registers is very inefficient in both area and power, and isn't the way we build them in Altera devices. Instead, large shift registers are automatically converted to RAM-based FIFOs. If you don't like to rely on automatic conversion, you can instantiate the altshift_taps megafunction yourself (it implements all sorts of RAM-based shift registers).
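The RAM-based FIFO conversion described above is, in essence, a circular buffer: a single address counter walks through the RAM, and on each clock the oldest word is read out and overwritten with the new input. The sketch below assumes an illustrative entity ram_shift_reg with generics DEPTH and WIDTH. It is not the altshift_taps megafunction or its port list, and whether a particular Quartus version infers an M4K from it depends on that tool's RAM-inference rules.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative sketch (names are assumptions): a DEPTH-clock delay line
-- built as a circular buffer in RAM instead of a flip-flop chain.
entity ram_shift_reg is
  generic (
    DEPTH : positive := 720;  -- total delay in clocks (must be >= 2)
    WIDTH : positive := 1     -- bits per stage
  );
  port (
    clk  : in  std_logic;
    din  : in  std_logic_vector(WIDTH-1 downto 0);
    dout : out std_logic_vector(WIDTH-1 downto 0)
  );
end entity;

architecture rtl of ram_shift_reg is
  -- DEPTH-1 words: the registered read below supplies the final stage.
  type ram_t is array (0 to DEPTH-2) of std_logic_vector(WIDTH-1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
  signal ptr : natural range 0 to DEPTH-2 := 0;  -- shared read/write address
begin
  assert DEPTH >= 2
    report "ram_shift_reg: DEPTH must be at least 2" severity failure;

  process (clk)
  begin
    if rising_edge(clk) then
      -- Signal semantics: the read sees the word as it was before this edge,
      -- i.e. the sample written DEPTH-1 clocks ago ("old data" on collision).
      dout     <= ram(ptr);
      ram(ptr) <= din;
      -- Advance the pointer, wrapping at the end of the buffer.
      if ptr = DEPTH-2 then
        ptr <= 0;
      else
        ptr <= ptr + 1;
      end if;
    end if;
  end process;
end architecture;
```

Per clock, only one RAM word is read and one is written, plus the pointer update, which is the power argument made in this post; a DEPTH-stage flip-flop chain instead clocks every stage.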

I just coded up a 721-bit shift register in VHDL, and it takes 1 M4K RAM and 15 Logic Cells in Stratix -- vastly less area and power than the alternative of using 721 logic cell registers. This is also less area and power than the Xilinx SRL16 solution for large shift registers. For example, a 4096-bit shift register takes 1 M4K RAM and 17 Stratix Logic cells, which is a lot smaller than 256 SRL16s. In terms of power, the altshift_taps implementation results in only one entry in the RAM being read and one written each cycle (plus a small amount of switching in the FIFO pointer counters), instead of having each of 4096 registers toggle.
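The 4096-bit comparison reduces to a little arithmetic with the figures given in this thread (16 bits per SRL16, 4.5 kbit = 4608 bits per M4K). The pointer-width line is an added assumption: a circular-buffer FIFO of depth N needs roughly a ceil(log2 N)-bit address counter, which is where the "small amount of switching" comes from.

```latex
\begin{aligned}
N_{\text{SRL16}} &= \left\lceil 4096 / 16 \right\rceil = 256 \\
4096 &\le 4608 \quad \text{(fits in one M4K)} \\
\text{pointer width} &\approx \left\lceil \log_2 4096 \right\rceil = 12 \text{ bits} \\
\text{storage accesses per clock:}\quad & \text{FIFO: } 1 \text{ read} + 1 \text{ write};
  \quad \text{register chain: up to } 4096 \text{ flip-flop toggles}
\end{aligned}
```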

I know in your case you're trying to make a high-power design to make a power measurement, but that is definitely not what most of our customers are trying to do, so the power efficiency of FIFOs is pretty compelling.

Building clever structures like this out of RAM is why we have 3 different sizes of RAM, and lots of RAM, in our devices. The M512 lets us build moderate size shift registers efficiently, the M4K lets us build big ones efficiently, the MRAM lets us build very big shift registers efficiently, and the register cascade chain feature in the Stratix/StratixII lets us "recycle" unused registers (there are almost always a lot of registers left over in the FPGA devices, since most designs use more LUTs than registers) to build any small shift registers (e.g. 4-bit shift) needed.

As for patents, Altera and Xilinx have cross-licensed their patent portfolios, so there's no patent barrier on this. However, for the reasons I've listed above we don't think SRL16 is compelling, so we've judged it not worth the area it adds to the LUT.

Vaughn Altera v b e t z (at) altera.com [remove spaces and use proper @ to reach me]

- Peter

Wed, Dec 15, 2004 1:39 AM

It's amazing how everything can become a Xilinx vs Altera battle. It seems to me that the original posting was not really looking for the most compact solution. Both Altera and Xilinx can of course provide RAM-based shift registers, and as long as you stay below 16K length, the A and X solutions are indistinguishable.

But let me fix one bad misstatement: it does of course take 45 SRL16s to implement a 720-bit shift register, but these 45 SRL16s fit in less than six CLBs, since there are eight LUTs in a CLB. That takes less silicon area than any big RAM in either Altera or Xilinx chips...
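The correction above is a two-step division using the figures stated in the thread (16 bits per SRL16, 8 LUTs per CLB). The result is under six CLBs' worth of LUTs, which is what "less than six CLBs" refers to; the packing a tool achieves in practice may differ.

```latex
\begin{aligned}
N_{\text{SRL16}} &= \left\lceil 720 / 16 \right\rceil = 45 \\
N_{\text{CLB}} &= \frac{45 \ \text{LUTs}}{8 \ \text{LUTs/CLB}} = 5.625
\end{aligned}
```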

Peter Alfke

- Vaughn Betz

Wed, Dec 15, 2004 5:01 AM

Peter,

If you check my post, you will see that I was replying to the poster's question of whether or not Altera was at a disadvantage on shift registers vs. Xilinx.

The area of a Xilinx BlockRAM (18 kbit RAM) is of course a lot larger than that of either an Altera M512 RAM (576 bits) or an Altera M4K RAM (4.5 kbit). All the extra RAM area beyond what the shift register needs is wasted, so having a variety of RAM sizes and a lot of smaller RAMs makes for a more efficient FIFO mapping of shift registers than having only a relatively small number of BlockRAMs of one size. Now, of course, Xilinx has the SRL16s to implement smaller shift registers, so it's less crucial for it to have small RAMs around to build FIFOs for moderate-size shift registers. The solutions are different, and you can argue about which is better, but they are certainly not indistinguishable.
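The wasted-area argument can be put in numbers for John's 720-bit, 1-bit-wide case, using the nominal capacities quoted here (18 kbit = 18432 bits for a BlockRAM, 4.5 kbit = 4608 bits for an M4K, 576 bits for an M512). This is a simplification: it counts parity bits as usable and ignores which aspect ratios each RAM actually supports, so real utilization may be lower.

```latex
\begin{aligned}
\text{BlockRAM:}\quad & 720 / 18432 \approx 3.9\% \text{ used}, \quad 18432 - 720 = 17712 \text{ bits idle} \\
\text{M4K:}\quad & 720 / 4608 \approx 15.6\% \text{ used}, \quad 4608 - 720 = 3888 \text{ bits idle} \\
\text{M512:}\quad & 720 > 576 \quad \text{(does not fit in one block)}
\end{aligned}
```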

An M4K RAM also takes less area than 6 CLBs and can do a significantly bigger shift register than 720 bits. An M512 can't quite do 720 bits (it can do 512-bit shifts) and has an area somewhere in the 1 to 2 CLB range. So I don't agree with your argument that area is lower for SRL16s in this case.

Vaughn Betz Altera v b e t z (at) altera.com [Remove spaces and put in proper @ to reach me]

- glen herrmannsfeldt

Wed, Dec 15, 2004 6:40 PM

For the user, though, it is more complicated. If the RAMs are otherwise unused then they go to waste in the SRL16 case. If one is short on RAM resources, brand A may be at a disadvantage.

In the beginning, FPGAs had only one type of cell, and the only question was how many were required. Now one has to balance different designs based on the numbers of CLBs, RAMs, and anything else that may be added.

Also, as pointed out in another post, different FPGAs may have different sized RAMs which may affect the optimal solutions to these problems.

-- glen

- Peter

Wed, Dec 15, 2004 7:03 PM

Agreed. Years ago, when we offered unstructured "sandboxes" full of LUTs and flip-flops, it was easy to benchmark and compare. Now all FPGAs offer many features that are both more powerful and more dedicated. This gives the user higher performance at a lower cost, but it makes comparisons more complicated.

Users should take benchmarks published by any one of the competitors with a big grain of salt, especially when they claim a vast superiority of their own product. That is often Marketing at its worst.

If you are serious about evaluating X vs A, then look beyond the LUTs and memories, dig deeper into the architecture, and investigate the systems-oriented functions...

Peter Alfke