Adding "super-LUTs" to FPGA, good idea ?

Hi,

A thought crossed my mind...

I've been working a lot on Virtex4 lately, and getting fast (~300-350 MHz)
logic for the datapath isn't really hard. But making the control stuff
go that fast is a whole lot trickier: even a 10-bit comparator
becomes "a lot" at that speed... and some control signals have high
fanout, which brings the net delay into the 1-1.5 ns range, which is
half of the period...

So what if every now and then in the FPGA fabric there was a small
cluster of, say, 1 CLB with "Super LUTs" that would have much
faster logic (but no special functions like SRL and distributed RAM) and
"bigger" drivers to charge/discharge the net faster to propagate the
control signals?

Maybe it's infeasible for some reason, it's just a thought...


    Sylvain

Re: Adding "super-LUTs" to FPGA, good idea ?

I guess Altera would claim they have it in the Stratix ALM.
AL
(Antti Lukats)





Re: Adding "super-LUTs" to FPGA, good idea ?


They do ?
I'm gonna check that out ...


    Sylvain

Re: Adding "super-LUTs" to FPGA, good idea ?

Not quite, but they claim 7-input LUT capability for better
logic optimization.

antti






Re: Adding "super-LUTs" to FPGA, good idea ?



I think if you look at the logic that is not making speed, it is
probably using the carry chain (comparators over 7 bits do, for
example).  General logic is quite fast in V4.  The carry chain is very
slow comparatively, which has been a beef of mine.  Simply speeding up
the carry chain so that reasonable sized adders (16-24 bits) can run at
speeds similar to the block rams and DSP slices would make all the
difference.  (yes Austin, I know the "simply" isn't all that easy).
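For comparison, an equality-only compare (as opposed to a magnitude compare) needs no carry chain at all. The Python sketch below is a behavioural model, not synthesis output: it assumes a tree of plain 4-input LUTs, where each first-level LUT compares 2 bits of each operand and every later level ANDs up to 4 partial "equal" flags. It counts the resulting LUT levels and checks that the tree agrees with a direct compare. Whether a given tool actually maps an equality compare this way, rather than onto the carry chain, is not something this sketch shows.

```python
import math

LUT_INPUTS = 4                   # assumption: plain 4-input LUTs, no MUXF5 or carry used
BITS_PER_LEAF = LUT_INPUTS // 2  # each leaf LUT sees 2 bits of A and 2 bits of B


def lut_levels_for_equality(width: int) -> int:
    """LUT levels needed for an equality compare of two width-bit words."""
    flags = math.ceil(width / BITS_PER_LEAF)  # outputs of the leaf level
    levels = 1
    while flags > 1:                          # AND-reduce up to 4 flags per LUT
        flags = math.ceil(flags / LUT_INPUTS)
        levels += 1
    return levels


def equal_via_lut_tree(a: int, b: int, width: int) -> bool:
    """Behavioural model of the same tree; should always match a == b."""
    flags = []
    for lo in range(0, width, BITS_PER_LEAF):
        mask = (1 << min(BITS_PER_LEAF, width - lo)) - 1
        flags.append(((a >> lo) & mask) == ((b >> lo) & mask))
    while len(flags) > 1:
        flags = [all(flags[i:i + LUT_INPUTS]) for i in range(0, len(flags), LUT_INPUTS)]
    return flags[0]


for w in (8, 10, 16):
    print(w, "bits ->", lut_levels_for_equality(w), "LUT levels")
```

Under these assumptions an 8-bit equality compare is two LUT levels deep and a 10-bit one is three.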

You already do have "super LUTs" in the Virtex4.  They are called
RAMB16, and can be used for logic functions with up to 14 inputs, at
clock rates of 400 MHz in a -10 part.

The other option you do have is to optimize your control logic to reduce
the reliance on difficult structures such as carry.  For example, if
your control is using a compare to decode a count, consider instead
using a down counter so that the terminal count is the most significant
bit.  Also consider other counter architectures, such as linear feedback
shift register counters to eliminate wide logic functions.
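A minimal Python model of the down-counter suggestion, under these assumptions: the counter is one bit wider than the count needs, it is loaded with `period - 2`, and it reloads when its MSB goes high. Counting down past zero wraps to all ones, so the terminal count is just the MSB and no wide compare is needed (the decrement itself is unchanged). `simulate_down_counter` is a hypothetical name for illustration.

```python
def simulate_down_counter(period: int, width: int, cycles: int) -> list[int]:
    """Return the terminal-count flag for each clock of a reloading down counter.

    The counter is loaded with period - 2 and decrements every clock. Going
    below zero wraps to all ones, so the MSB alone marks the terminal count.
    """
    if not 2 <= period <= (1 << (width - 1)) + 1:
        raise ValueError("period - 2 must fit below the MSB")
    mask = (1 << width) - 1
    msb = 1 << (width - 1)
    count = period - 2
    flags = []
    for _ in range(cycles):
        terminal = 1 if count & msb else 0  # the only "decode": a single bit
        flags.append(terminal)
        count = period - 2 if terminal else (count - 1) & mask
    return flags


print(simulate_down_counter(period=5, width=4, cycles=10))
```

With `period=5` and `width=4` the flag comes out on every fifth clock. Loading `period - 2` rather than `period - 1` accounts for the one clock spent in the all-ones state before the reload.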

Re: Adding "super-LUTs" to FPGA, good idea ?


...Well, 400 MHz if you register both sides and don't have too much logic
before and after.

A block RAM without the output register has about 2.1 ns clock-to-out, plus around
0.5 ns of net delay after it. With the output register it's 0.9 ns clock-to-out.
But sometimes you just can't afford 1 or 2 clock cycles of latency...

And here I was referring more to the drive strength than to the number of
input nets. For example, if you have to generate a clock enable
combinatorially (just a single LUT level, but still) and it controls about
50 FFs, the net takes about 1.5 ns to propagate... half of my period...
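A quick back-of-the-envelope check of the numbers quoted above, in Python. It only subtracts the listed delays from the clock period; setup time, clock skew and jitter are ignored, which makes the result optimistic. The 333 MHz figure for the clock-enable case is a hypothetical point inside the 300-350 MHz range from the first post.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a clock frequency in MHz."""
    return 1000.0 / freq_mhz


def remaining_ns(freq_mhz: float, *delays_ns: float) -> float:
    """Period minus the listed delays. Setup, skew and jitter are ignored."""
    return period_ns(freq_mhz) - sum(delays_ns)


# BRAM read path at 400 MHz (2.5 ns period), figures from the post above
no_do_reg = remaining_ns(400, 2.1, 0.5)    # clock-to-out without output register + net
with_do_reg = remaining_ns(400, 0.9, 0.5)  # clock-to-out with output register + net

# Combinatorial clock enable to ~50 FFs: 1.5 ns of net delay at a hypothetical 333 MHz
ce_fraction = 1.5 / period_ns(333.0)

print(f"BRAM, no output reg:   {no_do_reg:+.2f} ns left")
print(f"BRAM, with output reg: {with_do_reg:+.2f} ns left")
print(f"CE net alone uses {ce_fraction:.0%} of the period")
```

Without the output register the BRAM path is already about 0.1 ns over budget at 400 MHz before any logic is added; with it, roughly 1.1 ns is left for logic, routing and setup.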



Well, yes, optimizing control is good but sometimes very hard... I've
basically spent the last few days doing just that to finally meet
timing. My comparators are not for counters but to detect an "empty"
condition in a FIFO-like block ('FIFO-like' because it's considerably more
complicated than a simple FIFO).



    Sylvain


Re: Adding "super-LUTs" to FPGA, good idea ?
Agreed about the BRAM speed.  You pretty much have to use the DO_Reg for
a 400 MHz design in a -10 part.  There shouldn't be any logic between
the previous register and inputs to the BRAM, and the outputs can go
through a single level of logic, but placement isn't critical.

As I said, the real stumbling block for fast fabric stuff is the carry
chain.  If you are using an SX part, you can use the DSP48's to get
faster arithmetic, but at a considerable cost.

I stand by my contention that if the carry chains were faster (more
specifically, the time to get on and off them), you'd probably find it a
lot easier to make timing in your design.

Re: Adding "super-LUTs" to FPGA, good idea ?




I don't know if your initial idea is feasible or not, but it sounds
good to me.

In the meantime, you can reduce the fanout (at a cost) by using logic
duplication. If you duplicate the signal and drive only half the flip-flops
from each copy, that should improve your timing (at the cost of increased
area).
You can do that in one of two ways:
1. Manually (in your code) create two signals, and set options so that
your synthesis tool does not optimize away the redundant logic
2. Turn on logic duplication, and hope the synthesis tool will recognize
that the critical path can be improved by duplicating that piece of
logic
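A toy Python sketch of what duplication does conceptually: split the loads of one high-fanout net across enough copies of its driver that no copy exceeds a fanout limit. Loads are dealt out round-robin here, which ignores placement; a real tool would presumably group loads by location. The `ce_dup` naming and the `max_fanout` limit are hypothetical.

```python
import math


def duplicate_driver(loads: list[str], max_fanout: int, base: str = "ce_dup") -> dict[str, list[str]]:
    """Split loads across ceil(len(loads) / max_fanout) copies of one driver.

    Loads are dealt round-robin, so copy sizes differ by at most one.
    """
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    copies = max(1, math.ceil(len(loads) / max_fanout))
    groups: dict[str, list[str]] = {f"{base}{i}": [] for i in range(copies)}
    for idx, load in enumerate(loads):
        groups[f"{base}{idx % copies}"].append(load)
    return groups


# 50 flip-flops on one clock enable, as in the example earlier in the thread
ffs = [f"ff{i}" for i in range(50)]
split = duplicate_driver(ffs, max_fanout=25)
print({name: len(group) for name, group in split.items()})
```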

Fred


Re: Adding "super-LUTs" to FPGA, good idea ?

.... (stuff deleted)

I have the same problem with the high clock-to-output. I found that when
putting the signal through a delay (SRL16s), I can detect the zero
condition BEFORE the data goes in, and shift the zero flag along with the same delay. In
theory I can check some of the bits at each shift, making it very fast. When using
RAMB, I can also detect the zero condition on any port and reserve a bit for
that.
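One way to read the trick above, as a Python model: the word travels down a `depth`-stage delay line together with a zero flag, and stage k only checks bit chunk k of the word, ANDing the result into the flag it received. Because the word doesn't change as it shifts, the flag at the output equals `word == 0`, yet no stage needs a full-width compare. The delay is modelled as plain register stages, which is an assumption: an SRL16 does not expose every internal stage, so per-stage checks would need ordinary flip-flops between stages. `delayed_with_zero_flag` is a hypothetical name.

```python
def delayed_with_zero_flag(samples, depth, width, chunk):
    """Delay each word by `depth` clocks and pair it with a (word == 0) flag.

    Stage k checks only bits [k*chunk, (k+1)*chunk) of the word passing
    through it and ANDs that into the flag received from stage k-1.
    Returns one entry per input clock: None until the pipeline fills, then
    (word, is_zero) for the word that entered `depth` clocks earlier.
    """
    n_chunks = -(-width // chunk)          # ceil(width / chunk)
    if n_chunks > depth:
        raise ValueError("need at least one pipeline stage per chunk")
    mask = (1 << chunk) - 1
    stages = [None] * depth                # stages[k] holds (word, zero_so_far)
    out = []
    for word in samples:
        out.append(stages[-1])             # value leaving the last stage
        new_stages = [None] * depth
        for k in range(depth):
            src = (word, True) if k == 0 else stages[k - 1]
            if src is not None and k < n_chunks:
                w, zero = src
                src = (w, zero and ((w >> (k * chunk)) & mask) == 0)
            new_stages[k] = src
        stages = new_stages
    return out


words = [0, 5, 0, 256, 0x0F00, 0]
out = delayed_with_zero_flag(words + [0] * 4, depth=4, width=16, chunk=4)
for entry in out[4:]:
    print(entry)
```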



Re: Adding "super-LUTs" to FPGA, good idea ?
Sylvain Munaut wrote:
Well, there are a couple of 14-input LUTs in their newer devices. The
speed is about 2 ns in Virtex-4.
They call them BRAMs.

Kolja Sulimma

Re: Adding "super-LUTs" to FPGA, good idea ?
And one of those dual-ported BRAMs can be either two identical, but
independently addressable, 14-input LUTs, or two completely different,
independent 13-input LUTs.
Naturally...
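A Python sketch of how the truth tables could be laid out, assuming the 16K x 1 aspect ratio of a Virtex-4 RAMB16 (2^14 = 16384 one-bit entries, parity bits unused). For a single 14-input function the address is simply the 14 inputs, and both ports can read the same table with independent addresses. For two different 13-input functions, the lower half holds one table and the upper half the other, and each port ties address bit 13 to a constant (0 on port A, 1 on port B). The example functions are arbitrary, and turning the list into the primitive's INIT_xx attributes is not shown.

```python
from typing import Callable

Bits = list[int]
ADDR_BITS = 14  # assumption: 16K x 1 configuration


def bits_of(value: int, n: int) -> Bits:
    """The n low bits of value, bit 0 first."""
    return [(value >> i) & 1 for i in range(n)]


def truth_table(f: Callable[[Bits], int], n_inputs: int) -> list[int]:
    """One output bit per address: entry a holds f(inputs encoded by a)."""
    return [f(bits_of(a, n_inputs)) & 1 for a in range(1 << n_inputs)]


def one_14_input_lut(f: Callable[[Bits], int]) -> list[int]:
    return truth_table(f, ADDR_BITS)


def two_13_input_luts(f: Callable[[Bits], int], g: Callable[[Bits], int]) -> list[int]:
    # Lower 8K entries hold f, upper 8K entries hold g.
    return truth_table(f, ADDR_BITS - 1) + truth_table(g, ADDR_BITS - 1)


def port_a(rom: list[int], inputs13: int) -> int:
    return rom[inputs13]                          # address bit 13 tied to 0


def port_b(rom: list[int], inputs13: int) -> int:
    return rom[(1 << (ADDR_BITS - 1)) | inputs13]  # address bit 13 tied to 1


def parity(bits: Bits) -> int:   # arbitrary example function
    return sum(bits) % 2


def all_ones(bits: Bits) -> int:  # arbitrary example function
    return int(all(bits))


rom14 = one_14_input_lut(parity)
rom13x2 = two_13_input_luts(parity, all_ones)
print(len(rom14), port_a(rom13x2, 0b111), port_b(rom13x2, 0x1FFF))
```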
Peter Alfke


Important BRAM safety tip ( was: Adding "super-LUTs" to FPGA, good idea ?)

An important "Danger Will Robinson" observation on using BRAMS:

 If you violate setup/hold on the address inputs of an enabled BRAM,
EVEN IF WE IS INACTIVE, BRAM contents can (will) be corrupted.

This means:
 - No multicycles (unless you use EN).
 - No async inputs.
 - TIMING CONSTRAINTS ARE A NECESSITY!!!

IF BRAM TIMING CONSTRAINTS ARE NOT SET PROPERLY,
AND MET, BRAM CONTENTS WILL BE CORRUPTED!!!

See Answer Record 21870
"Virtex-II/-II Pro/-4 block RAM - Do the setup/hold times for the
Address inputs need to be met, even if the output is unused and
WE is deasserted?"

Brian


Re: Important BRAM safety tip ( was: Adding "super-LUTs" to FPGA, good idea ?)
Brian, I think you overdramatize this.
(I was involved in finding and explaining this behavior a few weeks
ago).

Anybody who writes into the BRAM must of course abide by the address
set-up time requirement.
Anybody who reads from the BRAM must also abide by the address set-up
time requirement.
The surprising, non-obvious requirement is that, if the BRAM is
enabled, a violation of the address set-up time can corrupt data, even
though WE remained disabled.
So, do NOT change the address right before the enabled active clock
edge.
You would obviously not do this when you are writing, and you wouldn't
do it when you are reading, but you must also not do it when you have
the BRAM clock-enabled and read-enabled and you really do not care
about the result of the uncontrolled read operation. The easy way out
of it is to disable the clock, not just WE.

This is a highly unusual (but explainable) restriction, so unusual that
neither Xilinx nor any customer found it for many years.
Peter Alfke, Xilinx Applications


Re: Important BRAM safety tip ( was: Adding "super-LUTs" to FPGA, good idea ?)
Hardly- it should be mentioned front and center in the BRAM
sections of the datasheet and user guides; in bold print; with
circles and arrows and a paragraph on the back explaining
the problem.

 Adopting the same head-in-the-sand, "it's in an Answer Record
somewhere", mentality that, of recent years, has pervaded
Xilinx's approach to documenting serious problems, does not
help your customers one whit.

The thread in question was about using BRAMS as logic.

Who would expect a ROM to clobber its own contents due to an
address setup violation?

Brian


Re: Important BRAM safety tip ( was: Adding "super-LUTs" to FPGA, good idea ?)
<snip>

Valid point.
  Reminds me of an oops Philips made in their UARTs:
a test mode that was kicked into by a READ [?!] of a certain address.
  So, yes, without care on the selection lines, this could go
out-to-lunch. Took them a while to admit to it.....

  Another issue here is, if this IS loaded/used as a ROM,
what happens during brownout, when it is quite possible that timing MAY
be violated?
  Sounds like there could be a lot of ?? space between
the 'Let's Reconfigure' decision point, and the 'Inside Specs'
operate point ?

-jg


Re: Important BRAM safety tip ( was: Adding "super-LUTs" to FPGA, good idea ?)

 Or if the BlockROM clock is sourced by a DCM which goes unlocked,
thus rendering all BlockROM contents unreliable until the device has
been reconfigured.

Oops.

Better not use that XST BRAM_MAP logic-into-BRAM mapping option
any time soon...

Brian


Re: Important BRAM safety tip ( was: Adding "super-LUTs" to FPGA, good idea ?)


I have to agree with Brian.  This is a big deal.  I expected that
violating read address setup time would screw up the read result that
cycle; I was amazed to find out that violating address timing could
actually change the contents of the RAM.  I imagine that anyone using
the BRAM as a ROM, with WE arc-welded to ground, would be doubly
surprised.

I have a design in which different buses supply the read and write
addresses to a large number of BRAMs.  The write address is
synchronized; the read address isn't, because I saw no reason to, at
least until I saw Answer Record 21870 (which I saw only by accident,
thanks to a tip from another designer).  Even then, it took about a
week and a half working with the Hotline and an FAE before I found out
what the Answer Record actually meant, the original version having
been more vague than the current one.  So I've got a design that I
have to redo.

The "easy way out of the problem" is easy only if you know there's a
problem in the first place.

I heard from the Hotline that the data sheets for the affected
families would be amended.  If amended data sheets haven't been
released already, I hope they will be soon.

I guess the thing that bothers me the most is that once the problem
was identified, no one at Xilinx seemed to know that when RAMs don't
work like RAMs, it's potentially a Big Damn Deal for at least some
designers, and deserving of something more than to be hidden away in
an Answer Record that you might or might not see.

Bob Perlman
Cambrian Design Works

Re: Important BRAM safety tip ( was: Adding "super-LUTs" to FPGA, good idea ?)


I'm with Brian and Bob on this. As designers, we need limitations
like this, as well as those on the FIFO16s, printed in bold right in
the user guides so that they can be avoided by design rather than
discovered in the lab. Finding them in the lab is too late in the design
cycle. The question is, what other gems like this are hidden away in
obscure answer records?
