Virtex4 and ISE reality check?

I've heard a few painful stories about implementing Xilinx Virtex4 designs (bad SDF, bad bitfiles, power supply requirements, etc...)

Are the parts and tool flows really ready for production designs with the V4? Should I bet the farm on getting a V4 design into production by July?

Our vendor promises industrial-temp parts by June. We're merging 3 mid-size V2Pro chips into one V4. The only tricky part is our 12-bit DDR deserializers on the front end: 5 interfaces at 360 MHz (times four 720 Mbps data streams on each interface).

How accurate/reliable is the back-annotated SDF out of ISE (v7.1.01i)?

How accurate/reliable are the timing reports from trce?

If trce generates the SDF, they should correlate well, but do they? Is either one really trustworthy for analyzing/debugging? (I'm still trying to understand some of what I'm seeing on the V2Pro designs.)

If I use the timing prorate options (max voltage, coldest temperature) in an attempt to analyze min timing, are the trce reports and SDF output realistic?

With fast DDR inputs, hold times are important too. The additional IOB delay element (~1.1ns if enabled?) also adds uncertainty to the setup/hold requirements, making for tighter data valid window requirements, and appears to do more harm than good in this case.
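For scale, the figures quoted in this post can be worked through in a few lines. This is only a rough sketch: the count of four streams per interface is one reading of the description above, and the 1.1 ns delay is the approximate value given in the post, not a datasheet number.

```python
# Figures quoted in the post; the per-interface stream count is an assumed reading.
CLOCK_MHZ = 360.0
INTERFACES = 5
STREAMS_PER_INTERFACE = 4
IOB_DELAY_NS = 1.1  # approximate value from the post, not a datasheet figure

bit_rate_mbps = 2.0 * CLOCK_MHZ              # DDR: one bit on each clock edge
bit_period_ns = 1000.0 / bit_rate_mbps       # unit interval of one stream
total_streams = INTERFACES * STREAMS_PER_INTERFACE
aggregate_gbps = total_streams * bit_rate_mbps / 1000.0
delay_fraction = IOB_DELAY_NS / bit_period_ns

print(f"bit rate per stream : {bit_rate_mbps:.0f} Mbps")
print(f"bit period          : {bit_period_ns:.3f} ns")
print(f"aggregate input rate: {aggregate_gbps:.1f} Gbps over {total_streams} streams")
print(f"IOB delay / bit time: {delay_fraction:.0%}")
```

The fixed delay is a large fraction of one bit time, which is why any uncertainty in it matters so much here.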

OTHER TOOLS:

Does anyone have specific comments regarding Synplicity (Synplify Pro) or Synopsys (DC-FPGA and PrimeTime) on a Virtex4 design?

Thanks very, very much for your help.

mj

jjohnson

Hi

I bet no one can answer all your questions at the present time. For sure V4 DDR designs work, so the tools must work too, but...

You asked a lot of questions about timing, reports, and their accuracy, so I assume you do not have an FPGA test board for your design. Looking at your schedule (production start July 2005?), there is one thing I can say for sure: with your deadline, the only way to be somewhat confident that the design will work is the following:

Go buy the ML461 memory reference board next Monday, implement the tricky part on it ASAP, and let it run a continuous in-FPGA memory test, check for temperature influence, etc. Only based on those results can you decide. If your management doesn't approve that investment (buying the ML461 board), then forget about V4 with a DDR design for production start in July.

Looking at the timing reports will not give you the confidence level needed (at least that is my opinion). I'm sure some/many will disagree here.

Antti

PS: As for ISE vs Synplify, Synplify is known to be better, but not always. At the moment I have a design where ISE 6.3 meets PCI-X 100 MHz timing, but Synplify reports 31 levels of logic (vs 11 with XST) and timing fails totally. No idea why this happens with that design; just a note that Synplify doesn't necessarily always do things better than XST synthesis.

Antti Lukats

Antti Lukats wrote: [...]

Not to mention that he was asking for I-temp parts:

I don't know if "our vendor" means that the distributor is making these promises, but I'd want it in writing with financial penalties, because I'm pretty sure there is no way for them to meet that date: I-temp parts typically aren't available for a number of months after C-temp production.

Considering that the design is supposedly already proven out, with one exception, I'll go ahead and disagree with you :-). Confirmation via timing analysis should be enough, with the exception that one or more of the errata may cause some grief (depending on his design). As the OP hints at, the data and clock relationship for the DDR interface (2x360 MHz, I think) provides a challenge as well, but if simulated accurately, shouldn't pose major issues. I'm basing this on the fact that my V4 interfaces (most of which migrated from V2Pro) came right up.

But unlike the OP, I can say all of this because I have the luxury of not having to face the music he will have to face if/when something goes wrong and there is no time in this seemingly insane schedule to fix it.

Speaking of which, why is there such an accelerated schedule from a .edu?

Ken McElvain from Synplicity (who usually hangs out here) would probably be VERY interested in seeing your design, assuming this still happens in 8.0.

Have fun,

Marc

Marc Randolph

"Marc Randolph" wrote in newsgroup message news: snipped-for-privacy@o13g2000cwo.googlegroups.com...

Hi Marc,

Good to see someone disagreeing with me ;) Yes and no to your comments. My comment was mainly because of the, as you said, 'insane schedule'; otherwise I would be more relaxed. I did some very small V4 stuff and did see weird things, and it's all bleeding edge, etc. All that triggered my comment.

And as for Synplify, the problem occurs with V8.0!

Antti

Antti Lukats

Hi Antti,

Testing out a system on a board only proves that the system will work on that chip -- (accurate/correct) timing reports with proper timing constraints are the only way to guarantee that all chips will work across all specified conditions. So while it is good advice to see if things work on a board, this only rules out a very bad timing problem, and won't identify marginal timing issues.

Regards,

Paul Leventis Altera Corp.

Paul Leventis (at home)

You wrote: "With fast DDR inputs, hold times are important too. The additional IOB delay element (~1.1ns if enabled?) also adds uncertainty to the setup/hold requirements, making for tighter data valid window requirements, and appears to do more harm than good in this case."

This is a big misunderstanding. The IDELAY feature allows you to adjust the input timing by moving either the clock or the data with 75 picosecond granularity. We put this programmable (and servo-stabilized) delay feature on every pin, exactly to solve challenges like yours. I am at home now, but I will send you the pertinent information tomorrow.

1 Gbps DDR @ 500 MHz clock rate is what we (conservatively) aimed for.

Peter Alfke, Xilinx Applications.
Peter Alfke

Take a look at XAPP802 (overview), XAPP702 (memory interfaces), and XAPP700 (network interfaces). Just enter these names in the search window in the upper right-hand corner of the Xilinx website. Here is a (not so short) description of a powerful approach:

Capturing the Input Data Valid Window.

Let's assume a continuously running clock and a 16-wide data input bus. Let's assume the clock is source-synchronous, i.e. its rising transition is aligned with the data transitions, and all these transitions have little skew.

The user faces the problem of aligning the clock with respect to the data in such a way that set-up- and hold-time specs are obeyed and (hopefully) data is captured close to the center of the data valid window. Given the fairly wide spread between worst-case set-up- and hold-time as specified by the IC manufacturer, a carefully worst-cased design will achieve only modest performance, since the designer is forced to accommodate the specified extreme set-up and hold time values of the input capture flip-flops. Typical values are positive 300 ps set-up time, negative 100 ps hold time, which implies a 200 ps window. The actual capture window is only a small fraction of a picosecond, but, depending on temperature, supply voltage or device processing, it might be positioned anywhere inside the specified wide window.
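The window arithmetic above can be checked with a short sketch. The 300 ps and -100 ps values are the typical figures quoted in the paragraph; the three data-valid widths are hypothetical inputs chosen only to show when a worst-cased static design runs out of margin.

```python
def uncertainty_window_ps(setup_ps, hold_ps):
    """Width of the specified set-up/hold window (hold may be negative)."""
    return setup_ps + hold_ps


def worst_case_margin_ps(data_valid_ps, setup_ps, hold_ps):
    """Slack left when the whole specified window must fit inside the data-valid time."""
    return data_valid_ps - uncertainty_window_ps(setup_ps, hold_ps)


SETUP_PS = 300.0   # typical value quoted in the text
HOLD_PS = -100.0   # typical value quoted in the text

window_ps = uncertainty_window_ps(SETUP_PS, HOLD_PS)
print(f"specified window: {window_ps:.0f} ps")

# Hypothetical data-valid widths, for illustration only.
for data_valid_ps in (500.0, 250.0, 180.0):
    margin = worst_case_margin_ps(data_valid_ps, SETUP_PS, HOLD_PS)
    verdict = "ok" if margin >= 0 else "fails worst-case analysis"
    print(f"data valid {data_valid_ps:.0f} ps -> margin {margin:+.0f} ps ({verdict})")
```

A negative margin does not mean the interface cannot work. It means a static, worst-cased design cannot guarantee it, and that is the gap the self-calibrating approach is meant to close.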

Here is a self-calibrating design approach that achieves much better performance by largely eliminating the uncertainty of the flip-flop characteristics.

This approach assumes reasonable tracking of the input flip-flops driven by the data and clock inputs, and assumes programmable delay elements at each input buffer.

The incoming clock is buffered and used to clock all data input flip-flops. The incoming clock is also used as if it were data, run through its own delay element X, then driving the D input of a clocked flip-flop. Its output is then used to control a state machine that manipulates X to find the two edges of the valid window, where the flip-flop output changes. Note that changing X has no impact on the bus data capture operation, it only affects the control flip-flop. Once both edges are found, the state machine calculates the center value, and applies this in common to all data input delays.
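The search described above can be modelled in software to make the steps concrete. This is a behavioural sketch under stated assumptions, not the state machine from the application notes. The 75 ps tap size and the 500 MHz clock come from this thread. The 64-tap range, the idealised 50% duty-cycle clock, the 130 ps flip-flop offset, and all function names are assumptions made for illustration.

```python
TAP_PS = 75              # delay granularity quoted in the thread
NUM_TAPS = 64            # assumed size of the delay line
CLOCK_PERIOD_PS = 2000   # 500 MHz clock, i.e. 1 Gbps DDR
FF_OFFSET_PS = 130       # hypothetical, unknown-to-the-design sampling offset


def make_control_ff(period_ps, tap_ps, ff_offset_ps):
    """Model the control flip-flop: it samples the clock delayed by `tap` steps."""
    def sample(tap):
        # Phase of the delayed clock at the sampling instant (ideal 50% duty cycle).
        phase = (ff_offset_ps - tap * tap_ps) % period_ps
        return phase < period_ps / 2
    return sample


def find_window_edges(sample, num_taps):
    """Sweep the delay and return the first two taps where the output flips."""
    edges = []
    previous = sample(0)
    for tap in range(1, num_taps):
        current = sample(tap)
        if current != previous:
            edges.append(tap)
            if len(edges) == 2:
                return edges[0], edges[1]
        previous = current
    raise ValueError("fewer than two edges found within the delay range")


control_ff = make_control_ff(CLOCK_PERIOD_PS, TAP_PS, FF_OFFSET_PS)
first_edge, second_edge = find_window_edges(control_ff, NUM_TAPS)
center_tap = (first_edge + second_edge) // 2   # value applied to all data delays

print(f"edges at taps {first_edge} and {second_edge}, center tap {center_tap} "
      f"({center_tap * TAP_PS} ps)")
```

The sweep never needs to know `FF_OFFSET_PS`. It only observes where the control flip-flop's output flips, so the flip-flop's actual behaviour is measured rather than taken from the worst-case spec.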

This auto-calibration circuit can run continuously (or non-continuously), since it does not interfere with normal operation. It means that the user can completely ignore the flip-flop set-up and hold time specifications, the spread between set-up and hold-times, and their possible variation with temperature and Vcc. This circuit does not compensate for skew between data lines, or any skew between data and clock, and it assumes good tracking between all input flip-flops, and relies on a reasonably fine granularity in the delay adjustments. Fundamentally, this auto-calibration reduces the data capture uncertainty from a first-order problem to a second-order issue, thus permitting substantially higher data rates and/or higher reliability of operation. Virtex-4 programmable input delays have 75 picosecond granularity. A low-skew data bus can thus be captured at bus data rates in excess of 1 Gbps, even when the data valid window is smaller than 200 ps.

Peter Alfke 3-31-05
Peter Alfke
