Virtex 4 released today

- A
- Austin Lesea
  
posted
19 years ago

Mon, Sep 13, 2004 6:32 PM

All,

As Peter would say, the teasing is over: V4 is ALIVE.

formatting link

for all of the details.

Now I can finally talk about it.

Austin

- J
- Jim Granville
  
posted
19 years ago

Mon, Sep 13, 2004 9:15 PM

.. and one of Austin's shortest posts ever ... ;)

-jg

- A
- Austin Lesea
  
posted
19 years ago

Mon, Sep 13, 2004 9:59 PM

Jim,

Thanks for the encouragement,

(long-winded, who me?)

Austin

- J
- Jim Granville
  
posted
19 years ago

Mon, Sep 13, 2004 11:06 PM

You're welcome :)

To give you a chance to wind up on Virtex-4, here are a couple of questions:

Virtex-4 does not seem to be supported in WebPACK. When is this planned?

Virtex-4 seems to be available only in large BGA packages. When do we expect to see Spartan-4, in TQFP and anything < 360 pins?

Xilinx has a nice Spartan-3 Eval PCB for $99; what is the Virtex-4 Eval PCB status? ($99? :)

-jg

- K
- Kevin Neilson
  
posted
19 years ago

Tue, Sep 14, 2004 3:12 AM

Whoa; you can't expect the paperback to come out at the same time as the hardcover. -Kevin

- J
- Jon Beniston
  
posted
19 years ago

Tue, Sep 14, 2004 11:31 AM

Austin,

How fast will the MicroBlaze run in the fastest speed grade?

Cheers, JonB

- G
- Goran Bilski
  
posted
19 years ago

Tue, Sep 14, 2004 12:53 PM

Hi,

I think it's better that I answer that. MicroBlaze will run at about 185 MHz in speed grade -12. With the new Virtex-4 architecture, I will need to create a different aspect ratio for the RPM block, since this architecture is smaller and higher. VII and V2Pro were more rectangular in shape. With the new floorplan I achieve 165 MHz in -11, and this will give us around 185 MHz in -12.

Göran Bilski

- A
- Austin Lesea
  
posted
19 years ago

Tue, Sep 14, 2004 3:04 PM

Jim,

Here we go, (below)

Austin

Don't know. I am not in the software world at all, so maybe someone else can answer.

Yes, V4 is a flip-chip-only device due to the ASMBL architecture (no wirebond packages possible). Flip chip does offer real advantages in the SI realm: the advanced packaging used in V4 has 1/3 to 1/4 the inductance of any other competitive FPGA packaging to date, which will lead to much less ground bounce (L dI/dt). SparseChevron(tm) power and ground pins are always within 2-3 pins of an IO pin, leading to far better SI. Since almost all problems with the most advanced FPGAs have to do with SI (signal integrity engineering on the part of the customer), we decided that we would do what we could to make their job easier.
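
As a rough illustration of the L dI/dt point, here is a back-of-envelope sketch in Python. The inductance, current step and edge time are made-up placeholder values, not Xilinx data; only the 1/3 to 1/4 inductance ratio comes from the post.

```python
def ground_bounce_volts(inductance_h, delta_i_a, delta_t_s):
    """Voltage across a package inductance for one switching event: V = L * dI/dt."""
    return inductance_h * delta_i_a / delta_t_s


# Hypothetical numbers, for illustration only.
L_OTHER = 3e-9    # assumed package inductance of some other package, in henries
DELTA_I = 0.02    # assumed current step of one output, in amperes
DELTA_T = 0.5e-9  # assumed edge time, in seconds

baseline = ground_bounce_volts(L_OTHER, DELTA_I, DELTA_T)
one_third = ground_bounce_volts(L_OTHER / 3, DELTA_I, DELTA_T)
one_quarter = ground_bounce_volts(L_OTHER / 4, DELTA_I, DELTA_T)

print(f"baseline: {baseline * 1e3:.0f} mV")
print(f"1/3 L:    {one_third * 1e3:.0f} mV")
print(f"1/4 L:    {one_quarter * 1e3:.0f} mV")
```

Since the bounce voltage is linear in L, the same switching event produces 1/3 to 1/4 of the bounce, whatever the actual current and edge rate are.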

The triple oxide allows us to keep the advantages of the low leakage memory cells for config at 130 nm (and soft error resistance), without any penalty for speed. Using three different oxide thicknesses, and low Vt and high Vt transistors, means that an IC designer's choices are much better when it comes to designing for speed and power. No one else has this triple oxide process, and those that have a 100% 90 nm core may have issues with thermal runaway due to leakage, as well as serious power issues. But when customers crank the clock up to 500 MHz, and run all that stuff, even at 40% less dynamic power than V2P, there is 2X the logic per area, so that BGA package is needed to get the heat out! (higher clock rate + more logic in the same space = more heat -- nothing is free).

GSD (Spartan) is its own division now, with its own set of customers (set top boxes, big screen TVs, automobiles, etc.), and their needs are quite different. I suspect that Spartan [n+1] devices will look less and less like the APD (Virtex) division devices due to the customer needs. Spartan is all about gates/$ and IO/$. The more, the better. MHz/ns/speed is almost ignored, as long as the devices meet the bare minimum that folks seem to want. There are also issues of low power in these markets, although V4 is 40% less power than V2P for fabric, and 1 mW/100 MHz for the DSP48 MAC blocks (measured on a full FFT from a column of DSP48s), which puts it in the best-in-the-world class, as no one is building 90 nm DSPs (yet), let alone with this kind of performance. The 405PPC core was a low power version for V2P that was redesigned for V4, so it too is extremely low power. I think that V4 is the lowest power part around right now. Spartan plans to catch up, and beat it, however.

In fact, putting a V4 in a wire bond package (even if it were possible, which it is not) would be like putting a 700 hp V12 in an Austin Mini Cooper: it would fly apart. Sorry, if you want all that performance, it will only come in the package that can handle it.

Unknown. Sampling of LX25's is open immediately, with stock. Other parts to follow shortly after I get done verifying and characterizing them (which is pretty easy after we've done the LX25).

There is a network board demo'ing 1 GB/s network interfaces, etc with automatic per bit deskew (now built into every IOB). There is a memory interfaces demo board also which shows the superiority of all of the memory interfaces in speed and ease of use.

Contact your local FAE or disti.

And, there are tons of other boards folks are doing as well, that I do not know about.

- A
- Austin Lesea
  
posted
19 years ago

Tue, Sep 14, 2004 5:15 PM

Antti,

The MGT's are designed to address the same standards as V2 Pro and V2 Pro X.

That said, the ppm frequency shift of SATA when using spread spectrum clocking (0 to -5000 ppm) is not addressed.

Austin

- I
- IgI
  
posted
19 years ago

Tue, Sep 14, 2004 9:03 PM

Hi!

A Xilinx representative came today to the company where I work and gave a short but very informative Virtex-4 presentation. What I find very useful is that Xilinx finally put FIFO control logic on the BRAMs and significantly increased their performance. The ability to cascade FIFOs will also be very useful for me. I was also hoping to see 256-deep, 64-bit-wide BRAMs, but I guess we'll have to wait a while for that feature?

Several times in the past I bumped into the 8 global clocks limitation on Virtex-II. That's why I was very excited to hear that I can use up to 32 global clocks, but after reading the Virtex 4 User's guide (page 21) my excitement cooled down a bit. There is a statement: "However, only eight different clocks can be driven in a single clock region. A clock region is a branch of the clock tree consisting of eight CLB rows up and eight CLB rows down. A clock region only spans halfway across the device." If I understand this correctly, there is still a limitation of 8 global clocks per device; that is, a maximum of 8 different and completely unrelated clocks can be used in all regions of the chip? Please, tell me I'm wrong? ;)

While reading the documentation further, I found there are several new variable phase-shifting modes available. What's got me worried (about my last Virtex-II design) is the following sentence: "Using the variable-positive and variable-center modes the phase can be dynamically and repetitively moved forward and backwards by 1/256 of the clock period.". In my last Virtex-II design I used 2 variable phase shifted clocks, and I'm adjusting the phase dynamically all the time. So far the design is working, but can I expect, for example, that after one million adjustments (for the sake of simplicity, let's say each adjustment increases the phase by 100 steps and then decreases it by the same number of steps) the clock phase will still be 0? I know there are many parameters that can influence the stability of a phase adjusted clock, but have you measured how repeatable fine phase adjustment is in Virtex-II compared to Virtex-4?

I believe most of the new features will be very useful; just bring us the production chips (not ES) as soon as possible, so we won't have to wait too long, as was the case with Virtex-II.

I will probably come up with some new questions tomorrow, because I first have to go over the docs/app notes I downloaded today...

Regards, Igor Bizjak

- A
- Austin Lesea
  
posted
19 years ago

Tue, Sep 14, 2004 9:30 PM

IgI,

See below,

Austin

Yup. Not this time. Aside, do we have enough BRAM?

Sorry, you are correct. Eight clocks per region. I don't understand why folks want so many different clocks -- ever heard of synchronous design? Regions are smaller than they used to be (smaller than quadrants), so having different clocks for different regions still allows a real nightmare of asynchronous clock designs.

That combined with the additional local clocks for IOs gives you even more opportunities to cross asynchronous clock domains.........

Yes we have.

V4 has even finer steps than V2P, and comes back to the same place +/- 12 ps. With V2P it was +/- 25 ps. Absolute. You worry about the oddest things......the DCM is all digital and a state machine, so it has no choice when it comes to where it should be....

One million or one trillion clocks, a flip flop doesn't care, and neither does the DCM.
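
The counting argument can be sketched in a few lines of Python. This is a toy model under stated assumptions: the phase setting is treated as a plain integer tap count, and the 1/256 step comes from the user guide sentence quoted earlier in the thread. It says nothing about the analog repeatability figures (+/- 12 ps, +/- 25 ps), and the class and function names are made up for illustration.

```python
STEPS_PER_PERIOD = 256  # from the quoted user guide sentence


def step_size_ps(clock_mhz):
    """Nominal size of one phase step in picoseconds for a given clock."""
    period_ps = 1e6 / clock_mhz
    return period_ps / STEPS_PER_PERIOD


class PhaseCounter:
    """Toy stand-in for a digital phase setting: just an integer tap count."""

    def __init__(self):
        self.taps = 0

    def adjust(self, steps):
        self.taps += steps


def run_cycles(cycles, swing=100):
    """Move the phase up by `swing` taps and back down, `cycles` times."""
    counter = PhaseCounter()
    for _ in range(cycles):
        counter.adjust(+swing)
        counter.adjust(-swing)
    return counter.taps


print(step_size_ps(100.0))    # nominal step at an assumed 100 MHz clock
print(run_cycles(1_000_000))  # final tap count after a million up/down cycles
```

An integer counter accumulates no drift, which is the sense in which a digital state machine "has no choice" about where it ends up.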

Virtex II went cleanly into production with no delays. It was Virtex II Pro, and the issue of low K dielectrics that delayed that product family. We don't care to use low-K anymore unless it gets production qualified on someone else's dime.

Happy reading!

- H
- Hal Murray
  
posted
19 years ago

Tue, Sep 14, 2004 10:12 PM

How good is current software at identifying signals/paths that cross clock domains?

It seems as though there should be some mechanism where you can tag a signal as asynchronous and/or tag the first FF as a synchronizer and specify how much extra time you need. Anything that crosses clock domains without being tagged should get flagged as an error. (Or something like that.)
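
The kind of check described above might look roughly like this in Python. It is a minimal sketch of the idea only: the `Path` record, the synchronizer tag set and the signal names are all hypothetical, and it does not reflect how any existing tool represents timing paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One register-to-register path with the clock of each endpoint."""
    src_ff: str
    src_clk: str
    dst_ff: str
    dst_clk: str


def untagged_crossings(paths, synchronizers):
    """Return paths that change clock domain without landing on a tagged synchronizer."""
    return [
        p for p in paths
        if p.src_clk != p.dst_clk and p.dst_ff not in synchronizers
    ]


paths = [
    Path("a_reg", "clk_a", "b_reg", "clk_a"),  # same domain: fine
    Path("a_reg", "clk_a", "sync0", "clk_b"),  # crossing into a tagged synchronizer: fine
    Path("c_reg", "clk_a", "d_reg", "clk_b"),  # untagged crossing: should be flagged
]

errors = untagged_crossings(paths, synchronizers={"sync0"})
for p in errors:
    print(f"ERROR: {p.src_ff} ({p.src_clk}) -> {p.dst_ff} ({p.dst_clk}) is not tagged")
```

The "how much extra time you need" part of the suggestion would be an additional per-synchronizer attribute; it is left out here to keep the sketch short.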

--
The suespammers.org mail server is located in California.  So are all my
other mailboxes.  Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's.  I hate spam.

- J
- Jon Beniston
  
posted
19 years ago

Tue, Sep 14, 2004 10:16 PM

So that's only 10% faster than a vIIp then?

Don't get me wrong, a 185 MHz CPU is pretty fantastic, it just doesn't seem that the v4 is giving it that much of a kick. Cheers, JonB

- S
- Symon
  
posted
19 years ago

Tue, Sep 14, 2004 10:20 PM

Indeed, when I saw that there are 32 global clocks in V4, my heart sank. Unlike Igor, I'm sick and tired of fixing shoddy designs that are the result of inexperienced designers throwing as many clocks as possible at their designs. Go synchronous, young man! Cheers, Syms.

- V
- Vic Vadi
  
posted
19 years ago

Tue, Sep 14, 2004 11:49 PM

Actually you are wrong. Virtex 4 has several different clock regions. An LX25, for example, has 12 different clock regions. Each of those 12 clock regions can only get 8 global clocks, but those 8 clocks in each of those 12 regions can come from any of the 32 global clock buffers. Plus you also get 2 regional clocks in each clock region for Source-Synchronous type applications.

Enjoy,

- Vic
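
The rule Vic describes (at most 8 global clocks per region, drawn from any of the 32 buffers) can be written down as a small checker. This is only a sketch in Python: the region names, the buffer numbering and the example assignment are invented for illustration, and the 2 regional clocks per region are not modelled.

```python
MAX_CLOCKS_PER_REGION = 8
NUM_GLOBAL_BUFFERS = 32


def check_regions(region_clocks):
    """region_clocks maps a region name to the set of global buffer indices used there.

    Returns a list of human-readable problems; an empty list means the plan fits.
    """
    problems = []
    for region, clocks in sorted(region_clocks.items()):
        bad = [c for c in clocks if not 0 <= c < NUM_GLOBAL_BUFFERS]
        if bad:
            problems.append(f"{region}: unknown buffers {sorted(bad)}")
        if len(clocks) > MAX_CLOCKS_PER_REGION:
            problems.append(
                f"{region}: {len(clocks)} clocks, limit is {MAX_CLOCKS_PER_REGION}"
            )
    return problems


# 12 regions, as in the LX25 example; each region uses a different window of 8 buffers.
plan = {f"region{r}": {(r * 3 + k) % NUM_GLOBAL_BUFFERS for k in range(8)}
        for r in range(12)}
used = set().union(*plan.values())

print(check_regions(plan))                      # problems found in the example plan
print(len(used))                                # distinct global buffers used device-wide
print(check_regions({"r": set(range(9))}))      # a region with 9 clocks
```

In this made-up plan every region stays within its 8-clock limit while the device as a whole uses more than 8 distinct global clocks, which is the distinction the post is drawing.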

- G
- glen herrmannsfeldt
  
posted
19 years ago

Tue, Sep 14, 2004 11:52 PM

(snip regarding 32 global clocks in four regions)

A large number of clocks doesn't preclude a synchronous design, but I have to agree that more than eight should be pretty unusual.

(There were some designs discussed recently that used multiple phase clocks, all from the same clock generator. That could still qualify as a synchronous design.)

It would be interesting, though, similar to the way register windows work on the SPARC processor, to have some clock inputs global across the chip, some overlap in different combinations, and the rest local to a region.

-- glen

- V
- Vic Vadi
  
posted
19 years ago

Tue, Sep 14, 2004 11:58 PM

Part of the reasoning behind having 32 global clock buffers is that customers may want to route other high fanout nets, such as resets and clock enables, on global clock buffers. (The value of this depends on your design style, of course.)

- Vic

- G
- General Schvantzkoph
  
posted
19 years ago

Wed, Sep 15, 2004 12:25 AM

There are lots of situations where you need multiple clocks. If you are interfacing to a number of devices that each have their own clock you need a separate clock for each of them. Even if the devices are all using the same source clock their output clocks (for clock forwarded interfaces) are all phase shifted by an indeterminate amount so they have to be treated as asynchronous. I typically capture each device's data in an async FIFO and then transfer it into a primary clock domain. The great majority of the slices are in the primary domain so the device is mostly synchronous but there is still a need for a large number of clocks.

- J
- Jim Granville
  
posted
19 years ago

Wed, Sep 15, 2004 1:14 AM

So, supposing someone wants to create a 'phased array' of clocks, to push their time resolution well below 1/clock (as in other threads), what is the practical limit in Virtex-4 for the number of DCM controlled phases that can be generated, AND dispersed thru the chip? (IIRC earlier DCMs had just 4 phases.) Can each of the 32 global clock buffers be driven to +1/32 phase advance, for example?

-jg
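
For reference, the arithmetic behind the +1/32 question can be sketched in Python. This is pure arithmetic, not a claim about what the hardware can generate or route: the 100 MHz clock is an assumed example value, and the 1/256 step is the figure from the user guide sentence quoted earlier in the thread.

```python
def phase_plan(n_phases, clock_mhz, steps_per_period=256):
    """List (phase index, tap count, ideal offset in ps, quantized offset in ps)."""
    period_ps = 1e6 / clock_mhz
    plan = []
    for k in range(n_phases):
        ideal_ps = k * period_ps / n_phases
        taps = round(k * steps_per_period / n_phases)
        actual_ps = taps * period_ps / steps_per_period
        plan.append((k, taps, ideal_ps, actual_ps))
    return plan


plan32 = phase_plan(32, 100.0)          # 32 phases of an assumed 100 MHz clock
resolution_ps = plan32[1][2]            # spacing between adjacent phases

for k, taps, ideal_ps, actual_ps in plan32[:4]:
    print(f"phase {k}: {taps} taps, ideal {ideal_ps} ps, quantized {actual_ps} ps")
print(f"resolution: {resolution_ps} ps")
```

With 256 steps per period, 32 evenly spaced phases land on whole tap counts (multiples of 8), so the quantized and ideal offsets coincide in this idealized model.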