How many Altera LEs to Xilinx slices?

Hello All,

I've been designing with Xilinx FPGAs for a while, so I'm used to the "slice" concept. I'm looking at Altera's MAX II as a possible solution for a design.

I took my VHDL code and it synthesized to 40 slices in a Spartan-3. Then I synthesized the same code for a MAX II (now using Quartus II) and it came to 71 LEs.

I realize that a blanket statement like "71 LEs is approximately 40 slices" is totally dependent on how the code is synthesized.

But is 1 slice = 2 LEs a reasonably close all-around estimate?

Thanks Eric

Guitarman

Hi Eric,

Give or take ~10% as a design-dependent margin and you should be OK.
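Applied to Eric's numbers, the rule of thumb plus Ben's margin gives a band rather than a single figure. A minimal sketch, assuming the 2 LEs per slice ratio and a symmetric ±10% margin (both parameters are just the thread's rough figures, not vendor data):

```python
def estimate_les(slices: int, les_per_slice: float = 2.0,
                 margin: float = 0.10) -> tuple[float, float, float]:
    """Return (low, nominal, high) LE estimates for a Xilinx slice count.

    Assumes the rule of thumb from this thread (1 slice ~ 2 LEs) and a
    symmetric design-dependent margin (Ben suggests about 10%).
    """
    nominal = slices * les_per_slice
    return nominal * (1 - margin), nominal, nominal * (1 + margin)


low, nominal, high = estimate_les(40)
print(f"40 slices -> {low:.0f} to {high:.0f} LEs (nominal {nominal:.0f})")
```

Eric's measured 71 LEs lands just under the low end of that 72 to 88 band.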

Best regards,

Ben

Ben Twijnstra

The problem is not a hardware issue but a granularity issue. Slices are not a good measure of how much logic your design is using. A slice has two LUTs and two FFs, and if even one FF is used, the whole slice is counted as used. You are better off determining how many LUTs and FFs each design uses. Those numbers are much more comparable, although there will be family-dependent differences in how well the designs pack into the larger granules. Generally, the newer parts pack logic and FFs more densely than the older parts.
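To illustrate the granularity point, here is a small sketch (the per-slice occupancy data is hypothetical) showing how the same four flip-flops can be reported as four slices or two, depending only on how they were packed, while the FF count stays the same:

```python
from dataclasses import dataclass


@dataclass
class SliceUsage:
    """Occupancy of one slice, which offers 2 LUTs and 2 FFs."""
    luts: int  # LUTs in use, 0..2
    ffs: int   # flip-flops in use, 0..2


def summarize(slices: list[SliceUsage]) -> dict[str, int]:
    # A slice is reported as used if ANY of its LUTs or FFs is occupied.
    used = sum(1 for s in slices if s.luts or s.ffs)
    return {
        "slices_reported": used,
        "luts": sum(s.luts for s in slices),
        "ffs": sum(s.ffs for s in slices),
    }


# Hypothetical placements of the same four flip-flops (no LUTs).
sparse = [SliceUsage(luts=0, ffs=1) for _ in range(4)]  # one FF per slice
dense = [SliceUsage(luts=0, ffs=2) for _ in range(2)]   # two FFs per slice
print(summarize(sparse))  # {'slices_reported': 4, 'luts': 0, 'ffs': 4}
print(summarize(dense))   # {'slices_reported': 2, 'luts': 0, 'ffs': 4}
```

The LUT and FF totals are identical in both cases, which is why they make a better basis for comparison than the slice count.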

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
rickman

Followup to: By author: snipped-for-privacy@hotmail.com (Guitarman) In newsgroup: comp.arch.fpga

Well, given that 1 slice = 2 LUTs + 2 FFs + some more logic, and 1 LE = 1 LUT + 1 FF + some more logic, it would be expected.
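Starting from LUT and FF counts, the capacities hpa quotes give simple bounds. A sketch, assuming one LE holds exactly 1 LUT + 1 FF and one slice 2 LUTs + 2 FFs (ignoring the "some more logic" on both sides); the 60/50 counts are made up for illustration:

```python
import math


def le_bounds(luts: int, ffs: int) -> tuple[int, int]:
    """Best- and worst-case LE counts, given one LE = 1 LUT + 1 FF."""
    best = max(luts, ffs)   # every FF shares an LE with some LUT
    worst = luts + ffs      # nothing shares: each LUT and FF gets its own LE
    return best, worst


def min_slices(luts: int, ffs: int) -> int:
    """Best-case slice count, given one slice = 2 LUTs + 2 FFs."""
    return math.ceil(max(luts, ffs) / 2)


luts, ffs = 60, 50  # hypothetical post-synthesis counts
print(le_bounds(luts, ffs))   # (60, 110)
print(min_slices(luts, ffs))  # 30
```

With perfect packing on both sides the LE count is twice the slice count, which is where the 2:1 expectation comes from; real results sit somewhere between the bounds.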

-hpa

H. Peter Anvin

Hi Eric,

Yes, that's a good 1st order estimate. We believe that 1 Slice is equal to about 1.8 LEs based on average results across a suite of designs, but mileage will vary from design to design -- this lines up well with your result though.
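A quick check of how the 2.0 and 1.8 factors compare against Eric's single data point (40 slices, 71 LEs); this is arithmetic on numbers from the thread only:

```python
def predicted_les(slices: int, les_per_slice: float) -> float:
    return slices * les_per_slice


def relative_error(predicted: float, actual: float) -> float:
    return (predicted - actual) / actual


SLICES, ACTUAL_LES = 40, 71  # Eric's Spartan-3 vs MAX II results
print(f"observed ratio: {ACTUAL_LES / SLICES:.3f} LEs per slice")
for factor in (2.0, 1.8):
    p = predicted_les(SLICES, factor)
    print(f"{factor} LEs/slice -> {p:.0f} LEs ({relative_error(p, ACTUAL_LES):+.1%})")
```

The 2:1 rule overestimates this design by about 13%, while 1.8 lands within about 1.5%.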

One thing you should do is ensure that the CAD tool is trying to use as few LEs (and slices for Xilinx) as possible. When you are not filling up the device, Quartus will not try too hard to put LUTs and FFs into the same LE -- if there's any chance it will hurt rather than help timing, it will avoid it. When you start filling the device close to capacity, Quartus will try to pack more aggressively. This is the default "auto" setting for register packing.

To artificially force Quartus to pack as aggressively as possible into LEs, go to the menu Assignments/Settings... select the Fitter Settings tab, and click the "More Settings..." button. There is a setting called "Auto Packed Registers -- Max II". Setting this to Minimize Area w/Chains will cause the most aggressive packing.

Also, under the Analysis & Synthesis Settings tab, you can try the "area" optimization technique, which heuristically cares more about area than delay, though it does not always reduce the LE count.

Regards,

Paul Leventis Altera Corp.

Paul Leventis (at home)

"Guitarman" wrote in message news: snipped-for-privacy@posting.google.com...

I disagree: the two architectures are different, and you can't compare them this way. How many slices does the following code use?

    .....
    DI    : in  std_logic;
    DO    : out std_logic;
    CLOCK : in  std_logic;
    .....
    signal temp : std_logic_vector(15 downto 0);
    ......
    begin

    Demo : process(CLOCK)
    begin
      if rising_edge(CLOCK) then
        temp ...

Walter Gallegos

What would make the timing better if the LUT and FF are not packed in the same LE?

I'm assuming that there is a very good path connecting the LUT/FF in the same LE because it is such a common case. What makes not using that faster?

--
The suespammers.org mail server is located in California.  So are all my
other mailboxes.  Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's.  I hate spam.
Hal Murray

He is not talking about a LUT and FF that are connected; he means ones that are separate, like a FF whose D input is connected to the output of another FF, and a LUT whose output goes only to another LUT. Unless there is a shortage of I/O in the LAB, they can share the same LE. The same is true of the Xilinx slice. Because of routing congestion, however, keeping them separate may result in a faster design.

rickman

Hi Hal, Rick:

Rick's got it mostly right. The Stratix/Cyclone/Max II LE/ALMs can have a number of register/LUT pairings:

  1. LUT feeds FF
  2. FF feeds LUT
  3. Unrelated FF and 3-input LUT
  4. FF->FF connection from an adjacent LE and a 4-input LUT (a register chain). For example, we could pack an 8-bit shift register together with seven 4-LUTs and one 3-LUT to form 8 LEs.
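The 8-LE figure in #4 can be reproduced with a toy counting model. This is a sketch under stated assumptions, not Quartus's packing algorithm: an LE is taken to have 4 data inputs, a FF fed from general routing uses one of them (leaving room for an unrelated 3-LUT, pairing #3), and with the register chain every FF after the first is fed from its neighbour (leaving all 4 inputs free, pairing #4). The function name is hypothetical.

```python
def shift_register_les(bits: int, lut_sizes: list[int],
                       register_chain: bool) -> int:
    """Toy LE count for a `bits`-long shift register packed alongside
    unrelated LUTs (sizes 1..4 inputs), following pairings #3 and #4.

    Assumed model: an LE has 4 data inputs. A FF whose D comes from general
    routing uses one of them, so it can share its LE with an unrelated LUT
    of at most 3 inputs. With the register chain, every FF after the first
    is fed from the neighbouring LE, leaving all 4 inputs for the LUT.
    """
    # Free LUT inputs left over in each FF's LE.
    slots = sorted(
        (4 if register_chain and i > 0 else 3 for i in range(bits)),
        reverse=True,
    )
    les = 0
    leftover_luts = 0
    for size in sorted(lut_sizes, reverse=True):  # place the widest LUTs first
        for j, free_inputs in enumerate(slots):
            if free_inputs >= size:
                slots.pop(j)   # this LUT shares the FF's LE
                les += 1
                break
        else:
            leftover_luts += 1  # no compatible FF: the LUT needs its own LE
    # Shared LEs + FFs left alone + LUTs left alone.
    return les + len(slots) + leftover_luts


luts = [4] * 7 + [3]
print(shift_register_les(8, luts, register_chain=True))   # 8
print(shift_register_les(8, luts, register_chain=False))  # 15
```

In this model, the first FF of the chain is the one that can only take the 3-LUT, which matches the "seven 4-LUTs and one 3-LUT" mix above; without the chain only the 3-LUT finds a partner.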

As Hal observed, it seems like doing #1 (or #2) is always a win. If you look at one FF, in our architecture we can choose to pack it with its fan-in (#1) or fan-out (#2). For example, if the critical path of the design is on the output of the FF, through only one of its LUTs, using packing #2 is the better choice for that flop. So there is an interesting optimization problem here.
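The fan-in versus fan-out choice can be framed as a small optimization per flip-flop. A toy sketch, assuming (hypothetically) that a connection kept inside one LE costs nothing and every other connection costs a fixed `route_delay`; real tools use far richer timing models:

```python
def best_packing(fanin_path: int, fanout_paths: list[int],
                 route_delay: int) -> tuple[str, int]:
    """Pick pairing #1 or #2 for one FF in a toy timing model.

    fanin_path:   delay of the path ending at the FF, excluding its last route
    fanout_paths: delay of each path starting at the FF, excluding its first route
    route_delay:  assumed cost of a connection that leaves the LE; a connection
                  kept inside one LE is modelled as free.
    Returns the chosen pairing and the resulting worst path delay.
    """
    if not fanout_paths:
        return "#1", fanin_path
    # #1: FF shares an LE with the LUT feeding it; every fan-out is routed.
    opt1 = max([fanin_path] + [p + route_delay for p in fanout_paths])
    # #2: FF shares an LE with the LUT on its slowest fan-out path only.
    slowest = max(range(len(fanout_paths)), key=fanout_paths.__getitem__)
    outs = [p if i == slowest else p + route_delay
            for i, p in enumerate(fanout_paths)]
    opt2 = max([fanin_path + route_delay] + outs)
    return ("#1", opt1) if opt1 <= opt2 else ("#2", opt2)


print(best_packing(3, [5, 2], 1))  # ('#2', 5): the output side is critical
print(best_packing(6, [2, 2], 1))  # ('#1', 6): the input side is critical
```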

Some of the LEs created by #1 or #2 will have two separate LE outputs (the flop and the LUT) in the event that the FF/LUT connection is not single fanout. In theory, these multiple-output LEs create a bit more routing pressure, so you may hurt timing more by making one than you gain by bringing the FF and LUT together. But our routing architecture has been designed to tolerate aggressive packing.

One way that using packing #1 or #2 can be sub-optimal is when the flop really wants to be placed somewhere in between all the things that it feeds and that feed it. Packing it with either its source or its destination might help one path, but hurt others more than if you had just left the FF in a separate LE and thus been free to move it where it wanted to be during placement.
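A one-dimensional toy makes the "in between" case concrete. It assumes wire length is a usable proxy for delay and uses made-up positions: the fan-in LUT at 0 and two fan-out LUTs at 8 and 10.

```python
def worst_wire(ff_pos: float, neighbours: list[float]) -> float:
    """Longest connection from a FF at ff_pos to its fan-in/fan-out LEs."""
    return max(abs(ff_pos - x) for x in neighbours)


def free_placement(neighbours: list[float]) -> tuple[float, float]:
    """Place the FF in its own LE to minimise its longest connection."""
    pos = (min(neighbours) + max(neighbours)) / 2
    return pos, worst_wire(pos, neighbours)


# Hypothetical 1-D positions: fan-in LUT at 0, fan-out LUTs at 8 and 10.
neighbours = [0, 8, 10]
for packed_with in neighbours:
    print(f"packed with LE at {packed_with}: worst wire "
          f"{worst_wire(packed_with, neighbours)}")
print("free placement (position, worst wire):", free_placement(neighbours))
```

Packing with any of the three LEs pins the FF to that location (worst wire 8 or 10), while leaving it unpacked lets placement put it at 5 with a worst wire of 5.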

Now, when you look at #3, you must be intelligent in how you pack. If you take two unrelated functions that otherwise would want to be in opposite corners of the chip and put them together, you can hurt timing. Also, as Rick points out, LEs of this type will have 4 inputs and 2 outputs; if you make many of them you can start stressing the routing and this can lead to lower performance. Incidentally, this packing problem also arises on Stratix II when it comes to packing multiple functions into an ALM -- if they are unrelated, you must choose pairings wisely to not hurt performance.
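One way to "choose pairings wisely" for #3 is to pair only functions whose preferred locations are already close. The sketch below is a hypothetical greedy heuristic, not what Quartus does: each function gets an assumed preferred (x, y) location, and pairs farther apart than `max_dist` are refused.

```python
import math
from itertools import product


def pair_unrelated(ffs: dict[str, tuple[float, float]],
                   luts: dict[str, tuple[float, float]],
                   max_dist: float) -> tuple[list[tuple[str, str]], int]:
    """Greedy pairing #3: match unrelated FFs and 3-input LUTs whose preferred
    (x, y) locations are close, refusing pairs farther apart than max_dist.
    Returns the chosen pairs and the resulting LE count."""
    candidates = sorted(
        (math.dist(ffs[ff], luts[lut]), ff, lut) for ff, lut in product(ffs, luts)
    )
    used_ffs, used_luts, pairs = set(), set(), []
    for dist, ff, lut in candidates:
        if dist > max_dist:
            break  # sorted by distance, so every remaining pair is farther
        if ff not in used_ffs and lut not in used_luts:
            pairs.append((ff, lut))
            used_ffs.add(ff)
            used_luts.add(lut)
    # Each pair shares one LE; everything left unpaired gets its own LE.
    return pairs, len(ffs) + len(luts) - len(pairs)


ffs = {"ff_a": (0, 0), "ff_b": (9, 9)}
luts = {"lut_x": (1, 0), "lut_y": (0, 20)}
print(pair_unrelated(ffs, luts, max_dist=2))  # ([('ff_a', 'lut_x')], 3)
```

The design choice is to give up one LE of area rather than force ff_b and lut_y, which want to be far apart, into the same LE.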

Packing #4 is particularly nasty from a CAD perspective. Creating these packings implies a group of LEs that must all be placed in the same LAB (register chain) and must move as a group. This further restricts placement and routing choices, and thus has the largest chance of being a net negative. But it can also help reduce the number of LEs in some designs.

Note: The more you pack together into LEs, the closer in general you can place the LEs of a design, so doing these packings can also help performance :-)

The trade-offs are likely different here. The Virtex-II (VII) slice has some FF packing capabilities. It can do #1, but #2 requires use of local routing (I think). It's not clear to me from the slice diagram whether packing #3 can be done. #4 is not possible. Also, I'm not sure how well the architecture responds to slices with multiple outputs (using the Y and Q outputs at the same time). If it was not architected for heavy use of both outputs, there could be more of a routing/performance trade-off here. This is all speculation.

What I do know is when we compare half-slice vs. LE counts on a suite of designs, we find a ~9% advantage for Quartus + LEs over ISE + slices. We believe that the primary reason for this difference is the increased flop packing density available in the Altera LE.
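The ~9% figure lines up with the 1.8 LEs per slice quoted earlier. The post does not define "advantage" precisely, so this sketch shows both plausible readings, assuming one half-slice (1 LUT + 1 FF) corresponds to one LE:

```python
def les_per_slice_if_fewer(advantage: float) -> float:
    """Reading A: the LE count is `advantage` smaller than the half-slice count."""
    return 2 * (1 - advantage)


def les_per_slice_if_more(advantage: float) -> float:
    """Reading B: the half-slice count is `advantage` larger than the LE count."""
    return 2 / (1 + advantage)


print(f"{les_per_slice_if_fewer(0.09):.2f}")  # 1.82
print(f"{les_per_slice_if_more(0.09):.2f}")   # 1.83
```

Either way the result rounds to about 1.8 LEs per slice.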

Regards,

Paul Leventis Altera Corp.

Paul Leventis (at home)

Not just routing, but also placement: the separate pieces (FFs, LUTs, etc.) are not placed independently, but are packed together and then placed. Thus if unrelated logic is packed together inappropriately, the placement for the packed component may be significantly worse than if each component were placed separately.

--
Nicholas C. Weaver.  to reply email to "nweaver" at the domain
icsi.berkeley.edu
Nicholas Weaver

The answer is:

1 slice in a Spartan-3, 16 LEs in a MAX II.

Can you compare these architectures as 1 slice = 2 LEs?
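Walter's numbers make sense if the truncated listing is a serial-in, serial-out 16-bit shift register (the 16-bit `temp` vector with single DI/DO ports suggests so, but the code is cut off, so treat this as an assumption). A Spartan-3 LUT can act as a 16-bit shift register (SRL16), whereas a MAX II LE has no such mode and needs one flip-flop per bit. A sketch of the resulting counts:

```python
import math


def spartan3_slices(bits: int) -> int:
    """Slices for a serial-in/serial-out shift register on Spartan-3.

    Assumes each LUT is used in SRL16 mode (up to 16 bits per LUT), that the
    register has no reset or parallel taps, and that a partly filled slice
    (two LUTs) still counts as one slice. Ignores that only SLICEM slices
    support SRL16.
    """
    luts = math.ceil(bits / 16)
    return math.ceil(luts / 2)


def max2_les(bits: int) -> int:
    """MAX II LEs have no shift-register LUT mode: one FF, so one LE, per bit."""
    return bits


for bits in (16, 64):
    s, les = spartan3_slices(bits), max2_les(bits)
    print(f"{bits}-bit shift register: {s} slice(s) vs {les} LEs "
          f"({les / s:.0f} LEs per slice)")
```

For this kind of logic the ratio is 16 or even 32 LEs per slice rather than 2, which is exactly the design dependence the 2:1 rule averages away.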

Walter.

"Walter Gallegos" wrote in message news: snipped-for-privacy@news.supernews.com...

Walter Gallegos

I agree that there are some areas where you can't simply compare the two architectures. For example, I had an old design for the Altera 10K series that used a fully asynchronous RAM block. Move it to a Spartan-3 and you find that you would need the whole chip just to build that block of async RAM!

However, it is perfectly understandable that a user might need to compare the available options, and to do that he or she needs rough estimates for comparing a Xilinx device with an Altera one. For example, I recently had an interesting offer for an FPGA prototype board at the same price of $99 with either an Altera EP1C12 or a Xilinx XC3S400. I want to use a prototype board for very different designs, so I had to compare the two chips. As I program in VHDL and use synthesis tools, I don't really care about any specific architecture (unless something like your example or my example above comes up), and in cases like that all that matters is finding the BIGGER FPGA. To do that you need to compare, and to compare you can only use rough estimates.

Personally, I find the simple equation 1 slice = 2 LEs a very good rough estimate, and for many designs it gives a good answer. Do you have a very specific design and need a precise answer? Fire up your synthesis tool and see how many resources you really need!
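Arash's board comparison can be written down directly with the 1 slice = 2 LEs rule. The device sizes plugged in below (12,060 LEs for the EP1C12, 3,584 slices for the XC3S400) are data-sheet figures as best recalled here and should be checked before relying on the result; the comparison also ignores RAM blocks, multipliers, and the architecture-specific cases raised earlier in the thread.

```python
def le_equivalents(slices: int, les_per_slice: float = 2.0) -> float:
    """Convert a Xilinx slice count to rough Altera-LE equivalents."""
    return slices * les_per_slice


def bigger_device(altera_les: int, xilinx_slices: int,
                  les_per_slice: float = 2.0) -> str:
    """Rough logic-capacity comparison only: ignores RAM blocks, multipliers,
    I/O and architecture-specific features such as async RAM or SRL16."""
    xilinx = le_equivalents(xilinx_slices, les_per_slice)
    if altera_les == xilinx:
        return "about equal"
    return "Altera" if altera_les > xilinx else "Xilinx"


# Data-sheet logic sizes as recalled here; verify before relying on them.
EP1C12_LES = 12_060
XC3S400_SLICES = 3_584
print(le_equivalents(XC3S400_SLICES))             # 7168.0
print(bigger_device(EP1C12_LES, XC3S400_SLICES))  # Altera
```

Using Paul's 1.8 factor instead of 2.0 only widens the gap, so for plain logic capacity the answer does not hinge on which rule of thumb is used.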

Arash Salarian

--

--Ray Andraka, P.E. President, the Andraka Consulting Group, Inc.

401/884-7930 Fax 401/884-7950 email snipped-for-privacy@andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759

Ray Andraka

(snip)

I still miss the XC4000 series where the carry chain was separate from the LUTs, for convenient implementation of saturating adders and MAX(a,b) functions by feeding the carry out or overflow back to an LUT input.
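The two functions glen mentions both come down to using the adder's carry out as a select signal. A bit-level sketch in Python (unsigned operands only; it models the logic, not XC4000 hardware, and the function names are illustrative):

```python
def add_with_carry(a: int, b: int, width: int) -> tuple[int, int]:
    """Unsigned add of two width-bit values: returns (sum bits, carry out)."""
    total = a + b
    return total & ((1 << width) - 1), total >> width


def saturating_add(a: int, b: int, width: int = 8) -> int:
    """Unsigned saturating add: the carry out forces the all-ones result."""
    s, carry = add_with_carry(a, b, width)
    return (1 << width) - 1 if carry else s


def max_unsigned(a: int, b: int, width: int = 8) -> int:
    """MAX(a, b): compute a - b as a + ~b + 1; carry out = 1 means a >= b."""
    mask = (1 << width) - 1
    # The "+ 1" stands in for a carry-in of 1 on the hardware adder.
    _, carry = add_with_carry(a, (~b & mask) + 1, width)
    return a if carry else b


print(saturating_add(200, 100))                # 255
print(saturating_add(100, 100))                # 200
print(max_unsigned(3, 7), max_unsigned(7, 3))  # 7 7
```

In both cases the carry out feeds a LUT that selects the final result, which is the feedback path glen describes.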

-- glen

glen herrmannsfeldt


Yes, this is my point: both structures have different resources, and when you write your code, your coding style makes the difference.

Walter.

"Arash Salarian" wrote in message news:417397f6$ snipped-for-privacy@epflnews.epfl.ch...

Walter Gallegos

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.