tri-state in altera

- D
- digari
  

Wed, Jun 2, 2004 5:25 AM

Hi, I'm still in the planning phase of my design and have been comparing Xilinx and Altera devices. Xilinx provides tri-state buffers as well as tri-state lines, whereas Altera doesn't and suggests using muxes instead of tri-state buffers. Now assume I have a bus in my design with lots of drivers, all driving the bus through tri-state buffers. I am wondering what will happen if I implement this design in Altera. I'll have to bring all the drivers to one place, put in a mux, and re-route its output to all the sinks. Won't that affect timing considerably? Considering this, Xilinx becomes the obvious choice because of its tri-state buffers and lines. Does anyone have another opinion or observation on the topic?

- R
- rickman
  

Wed, Jun 2, 2004 4:11 PM

The older Xilinx chips have lots of tristate buffers. But they have been phasing them out for the last two generations and have completed that task with the Spartan 3 chips. The internal tristate buffer is dead!

BTW, if you think routing signals to a common mux is slow, you should check the timing numbers on the tbufs driving long lines which then run around the chip. If you do a really good job of placement, you can minimize the speed penalty. But tristate buffers will *always* be slow due to the nature of a passive pullup.

Altera has a cascade backbone inside their LABs that will AND the outputs of the LUTs at a very high speed. This can implement a very wide AND-OR gate for wide muxes at high speed.
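As a rough illustration of the AND-OR style of wide mux described above, here is a minimal behavioral sketch in Python. It does not model the Altera cascade chain or any timing; the function name `and_or_mux`, the one-hot `enables` list, and the 16-bit default width are assumptions made for the example only.

```python
from functools import reduce

def and_or_mux(sources, enables, width=16):
    """Select one source word using one-hot enables, AND-OR style."""
    if len(sources) != len(enables):
        raise ValueError("need exactly one enable per source")
    mask = (1 << width) - 1
    # AND stage: each source word is gated by its own enable bit.
    gated = [(word & mask) if en else 0 for word, en in zip(sources, enables)]
    # OR stage: one wide OR merges the gated words into the bus value.
    return reduce(lambda acc, word: acc | word, gated, 0)

regs = [0x1111, 0x2222, 0x3333, 0x4444]
bus = and_or_mux(regs, [0, 0, 1, 0])  # only the third driver is enabled
```

Each former tri-state enable becomes one AND term, so the structure maps one-to-one onto a design that was written with a driver-enable per source.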

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX

- A
- Austin Lesea
  

Wed, Jun 2, 2004 5:13 PM

Yup,

Tristate is actually slower.

The tristate buffers in Virtex and all subsequent families are in fact separate bidirectional logic structures that simulate the behavior of a tristate bus.

formatting link

see page 11: Spartan 3 is faster and less expensive without any tristate elements at all!

Austin

- V
- Vaughn Betz
  

Thu, Jun 3, 2004 4:23 PM

On-chip tri-state is pretty near dead. As metal has gotten slower relative to transistors, making a long wire with multiple tri-state drivers on it has become very slow, since there's no easy way to re-buffer the signal (the signal could be flowing in one of two directions, since there are multiple drive points).

For that reason, all Altera devices have relied on multiplexers to make on-chip buses, rather than on-chip tri-states.

This approach is clearly winning out. Recently a company (I forget the name) filed a patent for SoC designs on ASICs where they use multiplexers rather than tri-state buses. So ASICs are following in FPGA footsteps here. Xilinx has also gradually abandoned on-chip tri-states (the 4K had real tri-states, Virtex-2 has a dedicated distributed mux, and Spartan-3 gets rid of that and relies on the regular logic). Altera has always used the regular logic approach.

So don't worry about the lack of on-chip tri-states in Altera devices: muxes are the way to go. Note that once you're using muxes to implement your buses, it also opens up the possibility of using more general switching fabrics (in the limit a crossbar) rather than one centralized bus.
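To make the bus-versus-crossbar distinction concrete, here is a small behavioral sketch in Python. The names `shared_bus` and `crossbar` and the sample values are illustrative assumptions, and nothing here models routing cost or delay.

```python
def shared_bus(sources, select):
    """One central mux: every sink sees the same selected source."""
    return sources[select]

def crossbar(sources, sink_selects):
    """One mux per sink: each sink picks its own source independently."""
    return [sources[sel] for sel in sink_selects]

sources = [0xA0, 0xB1, 0xC2]
same_for_all = shared_bus(sources, 1)        # all sinks receive source 1
per_sink = crossbar(sources, [2, 0, 2, 1])   # four sinks, four separate choices
```

The crossbar costs one mux per sink instead of one in total, which is the trade-off against the extra concurrency it allows.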

Vaughn Altera

- A
- Austin Lesea
  

Thu, Jun 3, 2004 4:52 PM

Vaughn,

Yes, we do agree. Nice to let folks know that we agree on many things.

Austin

- Q
- qlyus
  

Thu, Jun 3, 2004 6:36 PM

Is Spartan 3 still faster and less expensive when there are 100+ 16/32-bit registers on a bus?

-qlyus

- S
- Symon
  

Thu, Jun 3, 2004 7:09 PM

- R
- Ray Andraka
  

Thu, Jun 3, 2004 7:10 PM

--

--Ray Andraka, P.E. President, the Andraka Consulting Group, Inc.

401/884-7930 Fax 401/884-7950 email snipped-for-privacy@andraka.com

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759

- Q
- qlyus
  

Thu, Jun 3, 2004 10:05 PM

I always thought the internal tri-state bus in Xilinx was an advantage over Altera's devices. I started using this feature in the 4K series, where the savings in gates on such low-density parts were more obvious when building a bus with 10+ registers.

I have designed many projects with an internal tri-state bus. The latest one implements 93-tap FIR filters. Each coefficient is a 16-bit register. With other registers and memory blocks, it is very easy to need 100+ random-access locations. The target device is a V-II Pro. Speed is not an issue, as the clock for register access can be separate from the data stream clock.

With Xilinx abandoning tri-state and Altera never having offered it, I am confused about who was smarter.

If Xilinx keeps getting rid of its unique features (such as SRL and tri-state), or Altera offers more features that used to be unique to Xilinx (Stratix-II started to offer 100+ multipliers), I have to ask: why use Xilinx devices anymore?

-qlyus

- E
- Eric Crabill
  

Thu, Jun 3, 2004 10:28 PM

Hi,

In devices where you are constrained by the number of LUTs, and not performance, TBUFs are great. And you can still code with TBUFs if you want, because the synthesis and mapping tools can automatically transform those into some other (logically equivalent) representation.

I am sure there are at least two answers to this question.

Flame on, dude! Eric

- J
- Jeff Cunningham
  

Thu, Jun 3, 2004 10:46 PM

hmm.. I guess that means when using tri-states these days I don't need to worry about heating up a part by turning on multiple drivers fighting on a bus. It is simply a matter of data corruption?

Jeff

- A
- Austin Lesea
  

Thu, Jun 3, 2004 11:09 PM

Jeff,

Yes. No possibility of contention and a "X" value (unknown). In fact that was a real challenge to simulate a "X" condition so that a user felt better. Calling it a 0 or a 1 (which is what really results) and not even having a "z" condition (tri-state) made a few quite uncomfortable when simulating. We had to emulate the tristate behavior in simulation runs.....yuch!
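The difference between a true tri-state bus and its logic emulation can be sketched behaviorally in Python. Everything here is an assumption for illustration: `tristate_resolve` mimics what an HDL simulator reports for a resolved signal, and `and_or_emulation` assumes an AND-OR mapping in which an undriven bus reads 0 and enabled drivers are ORed together. The actual default value and conflict result depend on the device and tools.

```python
def tristate_resolve(drivers):
    """drivers: list of (enable, bit). Returns '0', '1', 'Z' or 'X'."""
    active = {bit for en, bit in drivers if en}
    if not active:
        return "Z"   # nobody drives: the bus floats
    if len(active) > 1:
        return "X"   # enabled drivers disagree: contention
    return str(active.pop())

def and_or_emulation(drivers):
    """Same drivers mapped to gates: the result is always a plain 0 or 1."""
    result = 0
    for en, bit in drivers:
        result |= en & bit   # assumed mapping: gate each driver, then OR
    return result

fight = [(1, 0), (1, 1)]   # two enabled drivers with opposite values
idle = [(0, 1), (0, 0)]    # no driver enabled
```

With `fight`, the resolved bus reports "X" while the emulation quietly returns 1; with `idle`, "Z" becomes 0. That is the gap a simulation model has to paper over if users expect to see "X" and "z".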

As to who did the "right" thing, Altera recognized early on that tristate muxes were hogs, and were slow, and didn't addict an entire generation to them with a successful product line that had them, whereas Xilinx has had to wean folks off of using them (in effect, break a bad habit) because we had a large number of users who used them, and liked them but they were inefficient and slower than using logic already there.

The perception of being efficient or not is an interesting one: if we had dedicated more area to logic and less to tristate circuits, which is more efficient? Just another reason why you can argue just about any angle of FPGA architecture as being "good", or "bad".

Definitely a "glass half empty or glass half full" problem. Not a whole lot to get excited about.

At the level most people design at now (VHDL or Verilog), instantiating a tristate structure will automatically get mapped to logic anyway (if you let it) or give you an error message (if you do not allow it and the target has no tristate blocks).

Austin

- M
- Marc Randolph
  

Fri, Jun 4, 2004 2:36 AM

Perhaps we aren't typical, but we have done quite a few FPGAs that had over 100 separate 16-bit (or wider) control or status registers. We try to use BRAMs for stuff like this when it makes sense to, but most just end up out in the sea of gates.

Marc

- J
- Jan Gray
  

Fri, Jun 4, 2004 2:42 AM

When you're trying to squeeze a pipelined RISC processor into a small tile (say 4Rx6C of CLBs + 1 BRAM), (because you intend to tile dozens or hundreds of processors per FPGA), and your result bus needs to mux amongst 4+ sources, and you have to burn several LUTs/bit just for lousy *muxes*, fer gosh sakes, THEN you will shed a nostalgic tear for TBUFs passed (or other non-LUT resources for wide horizontal muxes).

The xr16 profitably used a TBUF for every LUT site in the datapath.

formatting link

The loss isn't so bad once you learn the trick to implement o = a + b ? c; or even o = mux(sel1, sel2){a + b, a - b, a & b, a ^ b}; in one LUT per bit.

formatting link

Jan Gray Gray Research LLC

- J
- Jan Gray
  

Fri, Jun 4, 2004 2:44 AM

Oops. I meant: o = sel ? (a + b) : c;
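The linked write-up is not preserved here, so the following is only one plausible reading of how o = sel ? (a + b) : c can fit in one LUT per bit. It is a bit-level Python model that assumes a Virtex-style slice: a 4-input LUT, a carry mux, a carry-chain XOR, and a dedicated AND gate feeding the carry mux. The names `lut` and `add_or_pass` and the 16-bit default width are made up for the sketch.

```python
def lut(sel, a_bit, b_bit, c_bit):
    """4-input LUT: adder propagate when sel is 1, else pass c through."""
    return (a_bit ^ b_bit) if sel else c_bit

def add_or_pass(sel, a, b, c, width=16):
    """Compute o = sel ? (a + b) : c with one LUT per bit plus carry logic."""
    carry = 0   # carry-in to bit 0
    out = 0
    for i in range(width):
        a_bit, b_bit, c_bit = (a >> i) & 1, (b >> i) & 1, (c >> i) & 1
        s = lut(sel, a_bit, b_bit, c_bit)
        out |= (s ^ carry) << i       # carry-chain XOR produces the output bit
        gen = a_bit & sel             # assumed dedicated AND: kills carries when sel is 0
        carry = carry if s else gen   # carry mux: propagate carry-in or generate
    return out
```

When sel is 0 the AND gate forces every generated carry to 0, so the XOR simply passes the c bit from the LUT; when sel is 1 the chain behaves as an ordinary ripple-carry adder, truncated to `width` bits.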

Jan.

- R
- Ray Andraka
  

Fri, Jun 4, 2004 12:15 PM

Jan,

Also, if you put

> When you're trying to squeeze a pipelined RISC processor into a small tile


- R
- Ray Andraka
  

Fri, Jun 4, 2004 12:17 PM


- J
- Jan Gray
  

Fri, Jun 4, 2004 2:54 PM

8x20

Ray, there are (not uncoincidentally) 4Rx6C of CLB / BRAM+mult in Virtex-II Pro devices, yes? And up to 444 BRAMs per device? :-)

Also, for good old XCV600E (NB half as many slices per CLB), I used 8Rx6C per processor, floorplanning 60 16-bit CPU + BRAM tiles or 36 32-bit CPUs + 2 BRAM tiles.

formatting link

TBUFs R.I.P.

Jan Gray Gray Research LLC

- H
- Hal Murray
  

Fri, Jun 4, 2004 4:37 PM

One interesting aspect of the TBUFs is that they went onto long lines, which were, well, long. That helped simplify floor planning.

Assume that I have a design in mind where I would have used TBUFs. Is there some layout pattern that works well after I switch to using MUXes? Do I just toss it on the chip in some sensible looking way and assume the routing will be good enough? What if I'm pushing the speed or density envelope?

I guess I'm slightly surprised that some quirky feature hasn't evolved to fill that niche - something like a 2:1 mux or 2-input OR tied to special routing (with a pitch to match an adder using the dedicated carry logic). Maybe the routing is just good enough for the old type of design and newer chips are big enough so that the typical design is a different sort of project.

--
The suespammers.org mail server is located in California.  So are all my
other mailboxes.  Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's.  I hate spam.

- R
- rickman
  

Fri, Jun 4, 2004 4:58 PM

Keep in mind that the newer Xilinx chips have a MUXF6, which allows up to 8-input muxes to be made with a single level of delay. That compares well with the 16-input mux you can make from an Altera LAB. Routing is an issue, but the speed of the tbufs driving long lines makes them pretty impractical for the newer chips running at high speeds. If you don't need speed, you can use a single wire with a serial bus to reduce the amount of logic and routing used. What the newer chips provide is speed and lots of it. That can do a lot to reduce the size of a design.
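For readers unfamiliar with how an 8-input mux is assembled from these pieces, here is a behavioral Python sketch. It assumes the usual Virtex-family arrangement (each 4-input LUT used as a 2:1 mux, pairs of LUTs combined by a MUXF5, pairs of MUXF5s combined by a MUXF6); the function names are made up and no delay is modeled.

```python
def lut_mux2(d0, d1, s0):
    """A 4-input LUT used as a 2:1 mux (two data inputs, one select)."""
    return d1 if s0 else d0

def mux8(data, sel):
    """8:1 mux built from four LUT 2:1 muxes, two MUXF5s and one MUXF6."""
    if len(data) != 8 or not 0 <= sel < 8:
        raise ValueError("need 8 data bits and a 3-bit select")
    s0, s1, s2 = sel & 1, (sel >> 1) & 1, (sel >> 2) & 1
    # LUT level: the low select bit picks within each adjacent pair.
    luts = [lut_mux2(data[i], data[i + 1], s0) for i in range(0, 8, 2)]
    # MUXF5 level: the middle select bit picks between two LUT outputs.
    f5 = [luts[1] if s1 else luts[0], luts[3] if s1 else luts[2]]
    # MUXF6 level: the high select bit picks between the two MUXF5 outputs.
    return f5[1] if s2 else f5[0]

data = [0, 1, 1, 0, 1, 0, 0, 1]
picked = [mux8(data, s) for s in range(8)]
```

A 16-bit bus needs this structure once per bit, so wide buses with many sources still cost several LUTs per bit, which is the area concern raised earlier in the thread.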
