tri-state in altera

Hi,
I'm still in the planning phase of my design and was just looking at Xilinx
and Altera devices. Xilinx provides tri-state buffers as well as
tri-state lines, whereas Altera doesn't and suggests using muxes
instead of tri-state buffers.
Now assume that I have a bus in my design with lots of drivers
driving it through tri-state buffers. I am just wondering
what will happen if I implement this design in Altera. I'll have to
bring all the drivers to one place, put in a mux and re-route its output
to all the sinks. Won't that affect timing considerably?
Considering this, Xilinx becomes the obvious choice because of its tri-state
buffers and lines. Does anyone have any other opinion or observation on the
topic?

Re: tri-state in altera

The older Xilinx chips have lots of tristate buffers.  But they have
been phasing them out for the last two generations and have completed
that task with the Spartan 3 chips.  The internal tristate buffer is
dead!  

BTW, if you think routing signals to a common mux is slow, you should
check the timing numbers on the tbufs driving long lines which then run
around the chip.  If you do a really good job of placement, you can
minimize the speed penalty.  But tristate buffers will *always* be slow
due to the nature of a passive pullup.  

Altera has a cascade backbone inside their LABs that will AND the
outputs of the LUTs at a very high speed.  This can implement a very
wide AND-OR gate for wide muxes at high speed.  
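
To make the AND-OR idea concrete, here is a minimal behavioral sketch in Python of a wide mux built from per-source enables, the structure that stands in for a tri-state bus. The name `and_or_mux` and the 16-bit width are assumptions for illustration only; nothing here is taken from Altera documentation.

```python
def and_or_mux(sources, enables, width=16):
    """Behavioral model of an AND-OR mux: gate each source with its
    enable bit, then OR all the gated values together."""
    if len(sources) != len(enables):
        raise ValueError("one enable per source is required")
    mask = (1 << width) - 1
    bus = 0
    for value, enable in zip(sources, enables):
        gated = (value & mask) if enable else 0  # the AND stage
        bus |= gated                             # the wide OR stage
    return bus


# Four drivers that would each have had a tri-state buffer on a shared bus.
drivers = [0x1234, 0xABCD, 0x00FF, 0x8001]
selected = and_or_mux(drivers, [0, 1, 0, 0])   # one-hot enable picks driver 1
idle = and_or_mux(drivers, [0, 0, 0, 0])       # no driver enabled gives 0
```

With one-hot enables the result is the selected source; with no enable set the bus simply reads as zero rather than floating.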

--

Rick "rickman" Collins

snipped-for-privacy@XYarius.com
Re: tri-state in altera
Yup,

Tristate is actually slower.

The tristate buffers in Virtex and all subsequent families are in fact
separate bidirectional logic structures that simulate the behavior of a
tristate bus.

http://www.xilinx.com/bvdocs/appnotes/xapp466.pdf

see page 11:  Spartan 3 is faster and less expensive without any
tristate elements at all!

Austin

Re: tri-state in altera
Is Spartan 3 still faster and less expensive when there are 100+
16/32-bit registers on a bus?

-qlyus


Re: tri-state in altera
That's hardly a typical application. Also, I'm not sure there ever were
any FPGAs of any flavour that had long lines with 100+ tristate drivers on
them. You should put those registers in a BlockRAM, mate!
Cheers, Syms.

Re: tri-state in altera
>> Is Spartan 3 still faster and less expensive when there are 100+
>> 16/32-bit registers on a bus?
>>
>> -qlyus

Perhaps we aren't typical, but we have done quite a few FPGAs that had
over 100 separate 16-bit (or wider) control or status registers.  We try
to use BRAMs for stuff like this when it makes sense to, but most just
end up out in the sea of gates.

    Marc

Re: tri-state in altera
With that many registers, you might look at alternative architectures.  If
sequential access is normally used, a shift register works well.  If random
access is needed, the troublesome part is readback.  You might consider
readback through a dual-port RAM such that the external world has access to
one port and the FPGA internals have access to the other port.
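
A minimal behavioral sketch of the dual-port readback idea, in Python. The class name, method names, depth and width are all hypothetical; the point is only that the host-side port reads by address, so the RAM's own address decoding takes the place of a wide readback mux in the fabric.

```python
class DualPortStatusRam:
    """Behavioral model of a dual-port RAM used for register readback.
    Port A belongs to the FPGA internals, port B to the external host."""

    def __init__(self, depth=128, width=16):
        self.mask = (1 << width) - 1
        self.mem = [0] * depth

    def internal_write(self, addr, value):
        # Port A: the logic that owns a status value stores a copy here.
        self.mem[addr] = value & self.mask

    def host_read(self, addr):
        # Port B: the host reads by address; no readback mux is needed.
        return self.mem[addr]


ram = DualPortStatusRam()
ram.internal_write(5, 0xBEEF)      # internal logic updates status word 5
ram.internal_write(100, 0x12345)   # value is truncated to 16 bits
word5 = ram.host_read(5)
word100 = ram.host_read(100)
```

One consequence this model makes visible: port A accepts a single write at a time, so the internal sources have to take turns updating their copies.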

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
Re: tri-state in altera

hmm.. I guess that means when using tri-states these days I don't need
to worry about heating up a part by turning on multiple drivers fighting
on a bus. It is simply a matter of data corruption?

Jeff


Re: tri-state in altera and xilinx
Jeff,

Yes.  There is no possibility of contention and no "X" (unknown) value.  In
fact it was a real challenge to simulate an "X" condition so that a user
felt better.  Calling it a 0 or a 1 (which is what really results) and
not even having a "z" condition (tri-state) made a few quite
uncomfortable when simulating.  We had to emulate the tristate behavior
in simulation runs.....yuch!
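
The difference can be sketched in Python by resolving the same set of drivers two ways: as a 4-state simulator would treat a real tri-state wire, and as plain logic would. The wired-AND rule in `resolve_emulated` (a disabled driver contributes 1) is an assumption chosen for illustration; the thread does not say which value the silicon actually produces.

```python
def resolve_true_tristate(drivers):
    """drivers: list of (enable, bit) pairs. Returns 'Z', 'X', 0 or 1 the
    way a 4-state simulator resolves a real tri-state wire."""
    active = {bit for enable, bit in drivers if enable}
    if not active:
        return "Z"          # nobody driving: high impedance
    if len(active) > 1:
        return "X"          # enabled drivers disagree: contention
    return active.pop()


def resolve_emulated(drivers):
    """Same drivers through plain logic. Assumption: a disabled driver
    contributes 1 and the bus is the AND of all contributions."""
    bus = 1
    for enable, bit in drivers:
        bus &= bit if enable else 1
    return bus


floating = [(0, 1), (0, 0)]   # no driver enabled
fighting = [(1, 1), (1, 0)]   # two enabled drivers disagree
normal = [(1, 0), (0, 1)]     # exactly one driver enabled
```

In this model the emulated bus always returns a definite 0 or 1, which is the behavior that the simulation models then had to dress up as "Z" and "X".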

As to who did the "right" thing, Altera recognized early on that
tristate muxes were hogs, and were slow, and didn't addict an entire
generation to them with a successful product line that had them, whereas
Xilinx has had to wean folks off of using them (in effect, break a bad
habit) because we had a large number of users who used them, and liked
them but they were inefficient and slower than using logic already there.

The perception of being efficient or not is an interesting one:  if we
had dedicated more area to logic and less to tristate circuits, which is
more efficient?  Just another reason why you can argue just about any
angle of FPGA architecture as being "good", or "bad".

Definitely a "glass half empty or glass half full" problem.  Not a whole
lot to get excited about.

At the level most people design at now (VHDL or Verilog), instantiating a
tristate structure will automatically get mapped to logic anyway (if
you let it) or give you an error message (if you do not allow it and the
target has no tristate blocks).

Austin

Re: tri-state in altera and xilinx
When you're trying to squeeze a pipelined RISC processor into a small tile
(say 4Rx6C of CLBs + 1 BRAM), (because you intend to tile dozens or hundreds
of processors per FPGA), and your result bus needs to mux amongst 4+
sources, and you have to burn several LUTs/bit just for lousy *muxes*, fer
gosh sakes, THEN you will shed a nostalgic tear for TBUFs passed (or other
non-LUT resources for wide horizontal muxes).

The xr16 profitably used a TBUF for every LUT site in the datapath.
[http://www.fpgacpu.org/xsoc/cc.html ]

The loss isn't so bad once you learn the trick to implement
  o = a + b ? c;
or even
  o = mux(sel1, sel2){a + b, a - b, a & b, a ^ b};
in one LUT per bit. [http://www.fpgacpu.org/log/nov00.html#001112 ]

Jan Gray
Gray Research LLC



Re: tri-state in altera and xilinx

Oops.  I meant:
  o = sel ? (a + b) : c;
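
One plausible way this fits in a single 4-input LUT per bit is sketched below as a bit-level Python model. It assumes a Virtex-style slice: the LUT sees a, b, c and sel, a dedicated AND gate feeds the carry mux, and the carry-chain XOR produces the output. This structure is an assumption for illustration, and the linked fpgacpu.org note may describe it differently.

```python
def add_or_pass(a, b, c, sel, width=16):
    """Bit-level model of o = sel ? (a + b) : c using one 4-input LUT
    per bit plus a carry chain (assumed structure, not a netlist)."""
    carry = 0                      # carry-in of bit 0 tied low
    out = 0
    for i in range(width):
        ai, bi, ci = (a >> i) & 1, (b >> i) & 1, (c >> i) & 1
        lut = (ai ^ bi) if sel else ci     # the only LUT used for this bit
        gen = ai & sel                     # dedicated AND gate, not a LUT
        out |= (lut ^ carry) << i          # carry-chain XOR gives the output
        carry = carry if lut else gen      # carry-chain mux
    return out
```

When sel is 0 the AND gate forces the carry to stay at 0, so the XOR passes c through unchanged; when sel is 1 the same hardware behaves as an ordinary ripple-carry adder.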

Jan.



Re: tri-state in altera and xilinx
Jan,

Also, if you put one block RAM per processor, you get an area of at least 8x20
CLBs for each block RAM.  I don't miss the TBUFs as much as I thought I
would...most of the time.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
Re: tri-state in altera and xilinx
Oh yeah, one other trick that sometimes helps.  If resets are available on the
flip-flops leading into a 4:1 mux, you can use the resets as the select so that
the mux reduces to a 4-input OR.  Sometimes works for pipelined stuff, but
probably not good for your processor.
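
A small Python model of the reset-as-select trick, assuming a synchronous reset that forces a register to 0. The names `ResettableReg` and `clock_stage` are hypothetical.

```python
class ResettableReg:
    """Flip-flop bank with a synchronous reset that forces the output to 0."""

    def __init__(self):
        self.q = 0

    def clock(self, d, reset):
        self.q = 0 if reset else d


def clock_stage(regs, inputs, select):
    # Hold every register except the selected one in reset for this clock.
    for index, (reg, d) in enumerate(zip(regs, inputs)):
        reg.clock(d, reset=(index != select))
    # With at most one nonzero register, the 4:1 mux is just an OR.
    result = 0
    for reg in regs:
        result |= reg.q
    return result


regs = [ResettableReg() for _ in range(4)]
picked = clock_stage(regs, [0x11, 0x22, 0x33, 0x44], select=2)
```

Note that the select is consumed at the register stage, one clock before the data reaches the OR, which is presumably why the trick suits pipelined datapaths better than a processor's result bus.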

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
Re: tri-state in altera and xilinx

Ray, there are (not uncoincidentally) 4Rx6C of CLB / BRAM+mult in Virtex-II
Pro devices, yes?  And up to 444 BRAMs per device? :-)

Also, for good old XCV600E, (NB half as many slices per CLB), I used 8Rx6C
per processor, floorplanning 60 16-bit CPU + BRAM tiles or 36 32-bit CPUs +
2 BRAM tiles. [http://www.fpgacpu.org/log/mar02.html#020302 ]

TBUFs R.I.P.

Jan Gray
Gray Research LLC



Re: tri-state in altera and xilinx
Jan,

Of course!  I was thinking in terms of V2 not V2P.  Don't get to use the latter
as much as the former because of the nature of the clients I've been dealing
with (several space and military projects).

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
Re: tri-state in altera and xilinx

One interesting aspect of the TBUFs is that they went onto long lines,
which were, well, long.  That helped simplify floor planning.

Assume that I have a design in mind where I would have used TBUFs.
Is there some layout pattern that works well after I switch to
using MUXes?  Do I just toss it on the chip in some sensible
looking way and assume the routing will be good enough?  What if
I'm pushing the speed or density envelope?

I guess I'm slightly surprised that some quirky feature hasn't
evolved to fill that niche - something like a 2:1 mux or 2-input
OR tied to special routing (with a pitch to match an adder
using the dedicated carry logic).  Maybe the routing is just good
enough for the old type of design and newer chips are big enough
that the typical design is a different sort of project.

Re: tri-state in altera and xilinx

Keep in mind that the newer Xilinx chips have a MUXF6, which allows up to
8-input muxes to be made with a single level of delay.  That compares
well with the 16-input mux you can make from an Altera LAB.  Routing is
an issue, but the speed of the tbufs driving long lines makes them pretty
impractical for the newer chips running at high speeds.  If you don't
need speed, you can use a single wire with a serial bus to reduce the
amount of logic and routing used.  What the newer chips provide is speed,
and lots of it.  That can do a lot to reduce the size of a design.
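
For readers who have not used the dedicated muxes, here is a one-bit Python model of an 8:1 mux in the usual arrangement as I understand it: four LUTs each acting as a 2:1 mux, two MUXF5s pairing them up, and one MUXF6 on top. Function names are illustrative and do not model the real primitive ports.

```python
def lut_mux2(d0, d1, s):
    # A 4-input LUT programmed as a 2:1 mux (uses 3 of its inputs).
    return d1 if s else d0


def muxf(i0, i1, s):
    # Dedicated 2:1 mux (MUXF5 or MUXF6) combining two neighbours.
    return i1 if s else i0


def mux8(data, sel):
    """8:1 mux for one bit: four LUTs, two MUXF5s and one MUXF6."""
    s0, s1, s2 = sel & 1, (sel >> 1) & 1, (sel >> 2) & 1
    luts = [lut_mux2(data[2 * k], data[2 * k + 1], s0) for k in range(4)]
    f5 = [muxf(luts[0], luts[1], s1), muxf(luts[2], luts[3], s1)]
    return muxf(f5[0], f5[1], s2)
```

Only the first stage consumes LUTs; the upper two select bits are handled by the dedicated muxes, which is where the single LUT level of delay comes from.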

--

Rick "rickman" Collins

snipped-for-privacy@XYarius.com
Re: tri-state in altera and xilinx
The MUXF5s and MUXF6s have the wrong pitch to match up to arithmetic, which
makes them a pain in the tail to use on heavily arithmetic designs.  The mux
pitch has been a consistent complaint about the Virtex architecture.  Routing
to them, as you point out, is also an issue.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
Re: tri-state in altera and xilinx

I'm not sure that "lots of speed" translates into "don't need tbufs".
I'd expect that designers' expectations and goals would grow to use
all available resources - both space and time.

Yes, if I'm using a modern/fast part to implement an old design,
I may be able to make speed/space tradeoffs.  But I could also be
speeding up the whole project and expecting a state machine that
used to run at X MHz to now run at 3X or 5X.  (Adjust your goals
to match the age of your design.)

Is there something fundamentally evil with tbufs?  Or is the problem
that they don't scale because the chips are getting bigger (when
measured in gates, not microns).

Suppose I design a FPGA with old fashioned tbufs and long lines, but
don't cover the width of the whole chip, but just X LUT/FF units.
Would that track other speed improvements as silicon gets faster?

Re: tri-state in altera and xilinx

<Snip>

No, interconnect does not scale.  Interconnect gets slower as the
device geometry gets smaller.

Transistors scale.  As they get smaller, the operating voltage
decreases and the switching speed increases.  Nice, eh?

Interconnect has a bulk resistivity set by the material.  The end-to-end
resistance is (resistivity*length)/(width*thickness).  If the
ratios between length : width : thickness are constant, the resistance
doubles if the size halves.  This is why interconnect was almost
ignorable at 3 micron geometry and is a major source of delay at 0.09
micron (90 nm), even after changing to copper with a lower bulk resistivity.
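
The scaling step can be written out explicitly. Assuming, as above, that the resistivity rho is fixed by the material and that length L, width W and thickness T all shrink by the same factor s:

```latex
R = \frac{\rho L}{W T}
\qquad\Longrightarrow\qquad
R' = \frac{\rho\,(sL)}{(sW)(sT)}
   = \frac{1}{s}\cdot\frac{\rho L}{W T}
   = \frac{R}{s}
```

With s = 1/2 this gives R' = 2R: a wire shrunk in proportion has twice the end-to-end resistance, even though it is only half as long.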


--
Phil Hays
Phil_hays at posting domain should work for email

