Virtex-4 running at 360 MHz DDR

fastgreen2000 · 2005-05-10T18:48:57+00:00

I'm about to use Virtex-4 and wonder if this is achievable. All the literature seems to indicate that it is, but I'd like to hear what others think, and perhaps have you point out where I need to be careful in the design.

I'd be receiving an LVDS clock pair at 360 MHz and running part of the internal logic at 360 MHz. This internal logic includes DSP48 slices, but the adder needs to be pipelined in the fabric, since I need a 'C' input wider than 48 bits. Preliminary testing indicates that it can go above 360 MHz with light user intervention. One thing I'm cautious about is that the rest of the logic runs much slower, at 90 MHz. Initially I was thinking of using a /4 version of the clock, but Peter Alfke's post about added skew due to loading differences between DCM outputs is making me think about it carefully.

For output, I'd be using ODDR to multiplex the 360 MHz logic and send the data out at 360 MHz DDR (so the data can look like a 360 MHz 'clock'). The data is LVDS, and so is the forwarded clock pair at 360 MHz. The receiving device will use both edges of the forwarded 360 MHz clock to sample the data. Clock-to-output delay is not good, 3+ ns, but since the clock is forwarded and incurs effectively the same delay as the data (apart from IOB-to-IOB clock skew), it should be OK as long as I send out a 180-degree version of the internal 360 MHz clock using ODDR. I'm not sure what kind of SI issues there will be, however.

I have the option of running at 180 MHz if 360 MHz is risky, but the external device would then be different. Am I playing it too safe by going to 180 MHz? Will 360 MHz be a challenge?

I'd appreciate feedback.
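A quick sketch of the timing arithmetic behind the question, in Python. It assumes only the figures in the post (a 360 MHz or 180 MHz clock, DDR output so two bits per clock, slow logic at 90 MHz); the function name `ddr_budget` is illustrative and not from any tool.

```python
def ddr_budget(clock_hz: float, slow_clock_hz: float) -> dict:
    """Basic numbers for a DDR source-synchronous output (illustrative only)."""
    bit_rate_bps = 2 * clock_hz  # DDR: one bit on each clock edge
    return {
        "bit_rate_mbps": bit_rate_bps / 1e6,
        "unit_interval_ps": 1e12 / bit_rate_bps,  # time available per bit
        "fast_period_ps": 1e12 / clock_hz,        # fabric period at the fast clock
        "clock_ratio": clock_hz / slow_clock_hz,  # fast clocks per slow clock
    }


if __name__ == "__main__":
    for f in (360e6, 180e6):
        b = ddr_budget(f, slow_clock_hz=90e6)
        print(
            f"{f / 1e6:.0f} MHz: {b['bit_rate_mbps']:.0f} Mbps, "
            f"UI {b['unit_interval_ps']:.0f} ps, "
            f"fabric period {b['fast_period_ps']:.0f} ps, "
            f"ratio {b['clock_ratio']:.0f}:1"
        )
```

At 360 MHz the receiver gets roughly 1.39 ns per bit and the 360 MHz fabric logic roughly 2.78 ns per cycle, with a 4:1 ratio to the 90 MHz domain; the 180 MHz option doubles both times and drops the ratio to 2:1.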

Symon 21 years ago

"Austin Lesea" wrote in message news:d5teda$ snipped-for-privacy@cliff.xsj.xilinx.com...

OK, I'm not sure where that came from, but let me explain. Each pin of the pair is driven by a 50 ohm line. Taken together, these two 50 ohm lines make a 100 ohm differential pair. At a single pin you've got a 12.5 pF capacitor being driven from 50 ohms. If you view the pair together, you've got 6.25 pF driven by 100 ohms. The return loss is the same in both cases, as is the rise time (RC = 625 ps). So Altera's 6.1 pF per pin turns into 3.05 pF when viewed as the termination to a pair. In the 1 Gbit region, their return loss is much better, and their rise time is twice as fast (RC = 305 ps).

Now, pay attention. I can make your c(r)appy LVDS work better. Given that the problem is a large Cpin (because, I guess, of all the other attached goodies), one way to improve things is to drive this capacitance from a lower impedance. For an FPGA-to-FPGA connection, use a 50 ohm differential transmission line instead of 100 ohms. If you place an extra external 100 ohm differential termination resistor at _BOTH_ ends of this t-line, you have a pretty well matched 50 ohm connection. The pole caused by the parasitic capacitance moves out to double its original frequency. Of course, the signal amplitude is halved, which tends to close the eye vertically, but the eye opens horizontally, and that is the limiting factor in this case. Maybe LVDS_EXT would be a good idea too.

There are also other methods to open the eye at specific bit rates if the t-line characteristics and the parasitics of the parts are known: you use an application-specific filter at the end of the t-line.

HTH, Syms.
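Symon's numbers follow from a single-pole RC model: tau = R*C, with the pole at f = 1/(2*pi*R*C). The Python sketch below only reproduces that arithmetic; treating each pin as one lumped capacitor driven by the line impedance is a simplifying assumption, not a full SI model.

```python
import math


def rc_pole(r_ohms: float, c_farads: float) -> tuple[float, float]:
    """Return (time constant in s, -3 dB pole frequency in Hz) of an RC low-pass."""
    tau = r_ohms * c_farads
    return tau, 1.0 / (2.0 * math.pi * tau)


CASES = [
    ("12.5 pF pin driven from 50 ohm", 50.0, 12.5e-12),
    ("6.25 pF differential, 100 ohm pair", 100.0, 6.25e-12),
    ("6.1 pF pin driven from 50 ohm", 50.0, 6.1e-12),
    ("6.25 pF differential, 50 ohm pair", 50.0, 6.25e-12),
]

if __name__ == "__main__":
    for label, r, c in CASES:
        tau, f_pole = rc_pole(r, c)
        print(f"{label}: tau = {tau * 1e12:.1f} ps, pole = {f_pole / 1e6:.0f} MHz")
```

The first two cases both give 625 ps, which is the point that the single-ended and differential views agree; the last case shows the pole moving to twice the frequency when the differential impedance is halved to 50 ohms.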

Austin Lesea 21 years ago

Symon,

All true.

I would suggest that you should have more of a true differential line rather than two single-ended 50 ohm lines, but it doesn't change anything at all (you still end up differentially terminated at the receiver, with 6.25 pF across 100 ohms).

The eye is plenty good for up to 1 Gbps (see the ML450).

It does not work at 1.3 Gbps because we didn't design it to; that is what the MGTs are for.

If there is a 'beauty contest' for the 'best LVDS eye pattern', I will admit we come in second (due to the higher Cpin), but I will not admit that it matters as far as use, function, or anything else important is concerned. The IDELAY feature, which allows independent skew adjustment for each I/O pin (pair) to center the eye sampling point to within +/-78 ps, is a far more useful feature than having 'pretty eyes'.

Austin

Brian Davis 21 years ago

Nice explanation...

Best wishes on getting Austin to stop with his "but it's really half, differentially" handwaving.

I've tried before, with results similar to that "but it goes to Eleven" bit from "Spinal Tap".

Also, when you've got plenty of drive margin, a differential attenuator ahead of the FPGA (with internal termination) works nicely to attenuate the reflection, and also makes for a convenient differential probe point. If you have 6 dB to spare, even the most horrible of loads presents at least 12 dB of return loss, with the probe seeing 1/4 of the reflection voltage of the original circuit. (However, the attenuator doesn't lower the drive impedance as your suggestion does.)

Brian
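Brian's attenuator figures can be checked with decibel arithmetic: the reflection passes through the pad twice, so an A dB pad adds 2A dB of return loss. A minimal Python sketch, assuming an ideal, perfectly matched pad; the helper names are hypothetical.

```python
import math


def db_to_voltage_ratio(db: float) -> float:
    """Convert a loss in dB to a voltage ratio (0 dB -> 1.0)."""
    return 10.0 ** (-db / 20.0)


def reflection_through_pad(load_gamma: float, pad_db: float) -> tuple[float, float]:
    """Reflection seen on the source side of an ideal matched attenuator.

    The incident wave and the reflected wave each pass through the pad once,
    so the reflected voltage is reduced by 2 * pad_db in total.
    Returns (reflection magnitude, return loss in dB).
    """
    gamma_seen = abs(load_gamma) * db_to_voltage_ratio(2.0 * pad_db)
    return gamma_seen, -20.0 * math.log10(gamma_seen)


if __name__ == "__main__":
    # Worst case: a total reflection (|gamma| = 1) behind a 6 dB pad.
    gamma, rl = reflection_through_pad(1.0, 6.0)
    print(f"|gamma| seen = {gamma:.3f}, return loss = {rl:.1f} dB")
```

With a total reflection behind a 6 dB pad the source side sees about 0.25 of the original reflected voltage, i.e. 12 dB of return loss; 6 dB is a voltage factor of about 0.5, so "1/4" is a close approximation rather than exact.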

Brian Davis 21 years ago

Austin,

Then why not publish them, along with a comparison of IBIS/HSPICE simulations versus the real world measurements?

Let's see if I've got this straight [1]:

A) Xilinx publicly posts in FPGA and SI forums touting their real world X vs. A package testing, and asks for feedback [2]

B) Forum users post some suggested measurements, which a certain Xilinx employee says they can make

C) Two months later, when asked when said measurements might be published, the very same Xilinx employee cops an attitude

That would be the same manual (UG077 v1.2) that mentions a HyperTransport-compliant DUT interface connector, without pointing out that the specified V4 FPGA Cin is 5x the maximum Cin HyperTransport allows for a 1 Gbps part?

As to why that matters: a HyperTransport test probe attempting to monitor the input link to the FPGA can't function properly because Cin reflections off the FPGA would prevent the probe from properly clocking the data at the mid T-line probe sampling point.

There are ways around this, but life would be easier if Xilinx actually bothered to meet the spec in the first place.

Lacking that, proper documentation of your part's shortcomings, and how and when to work around them, would be appropriate.

Brian

[1] Speaking of those unable to perform the SI engineering that is required of them: when might we expect publication of characterized static DCI power and DCI impedance modulation limits for the five-year-old Virtex2 FPGA family?

[2]

formatting link

Austin Lesea 21 years ago

Brian,

All I am trying to point out is that the load is 6.25 pF + 100 ohms, not 12.5 pF + 100 ohms.

When folks wave their arms and state that 12.5 pF is the LVDS load, they are misstating it.

Simple point.

And once you do the simulations, or look at the actual waveforms, you realize that it is mostly just a beauty contest. In communications theory, excess bandwidth in the channel only adds to the error rate (due to noise). Some band limiting is a good thing. Too much is a bad thing (e.g., using LVDS at 1.3 Gbps, where it wasn't designed to be used; that is where our MGTs are meant to be used).

Austin
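The 6.25 pF figure comes from the two pin capacitances appearing in series between the P and N pins. A small Python sketch, assuming each pin is modelled as a single lumped capacitor to ground; for contrast it also shows the common-mode case (both pins moving together), where the two capacitors act in parallel, which is the distinction Brian draws later in the thread.

```python
def differential_capacitance(c_p: float, c_n: float) -> float:
    """Capacitance seen between P and N: the two pin-to-ground caps in series."""
    return c_p * c_n / (c_p + c_n)


def common_mode_capacitance(c_p: float, c_n: float) -> float:
    """Capacitance seen when both pins are driven together: caps in parallel."""
    return c_p + c_n


if __name__ == "__main__":
    c_pin = 12.5e-12  # single-ended pin capacitance used in this thread
    print(f"differential: {differential_capacitance(c_pin, c_pin) * 1e12:.2f} pF")
    print(f"common mode:  {common_mode_capacitance(c_pin, c_pin) * 1e12:.2f} pF")
```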

Austin Lesea 21 years ago

Brian,

Sigh.

See below.

All I can say is that they are coming; it just takes a while. Right now we have much more important things to do: tout our power advantage, our static current advantage, our speed advantage, our MGT advantage, our PPC advantage, our SI packaging breakthrough... Showing an IBIS simulation of a five-year-old interface is just not high on our list; too many customers use it and are perfectly delighted with it. We do not want to lose focus and stop pointing out the areas where we are clearly superior.

Sure.

I did. Yes.

OK, so I was snippy. I am told that the measurements will be done, but again, it isn't a high priority.

True: we are not an ASIC/ASSP. That is the one area where they win (they can make these specs as tight as they please). But guess what? We are growing and increasing sales, and ASICs are not. Our real competition is no longer other FPGA companies; it is the ASIC/ASSP providers. We can supply features and circuits on technologies they can't (yet). Who has 10 Gbps transceivers? Who has the lowest-power 405 PPC? Who has the lowest-power/highest-performance DSP48 blocks for DSP applications? We do, they don't.

I claim that in a real system, with a compliant transmitter, there will be sufficient return loss matching to make the eye visible and useful. But I agree that in some cases what you see is not what you get. That can happen with a simple single-ended input pin, and is definitely true around 1 Gbps, where observing the signal often breaks it. I think there is a whole class of people out there who have to see it to believe it. OK. But they should get used to the fact that none of the test equipment is really fast enough to show them what they want to see. And it is only getting worse.

Already explained why we can't do that: an IOB that supports 35 I/O standards on one pin has to make some compromises.

We got all that. That is what the user's guide is for. That is what the datasheet is for. Should we place a billboard on 101 South that states the IOB pin capacitance is ~ 12pF? It is already in the datasheet. So is the MGT, PPC, DSP48, etc. What do you think we should spend time on?

I think all this is now covered between data sheets, user's guides, and technical answers on our website. Let me know if there is something missing between those three resources.

Generally speaking, if we don't specify it, then you are on your own if you use it there. For example, if you choose to set the resistance to 100 ohms to match a 100 ohm single-ended line, we are not going to claim we meet any standard (there isn't one), and we aren't going to spend time characterizing all the silicon for it. I believe we state the range of the resistance as 40 ohms to 150 ohms, but when you use it at anything other than 50 ohms, you are required to check it out yourself, as that is not one of the 35 I/O standards we designed the IOB to support. (I would run the SPICE simulations; you may request DCI SPICE models at impedances other than 50 ohms. If I recall correctly, the ones we have are 40, 50, 68 and 75 ohms.)

A small change, such as using DCI at 68 ohms instead of 50 ohms, is used by quite a few customers (to save power). You can characterize it if you need to, and if you feel there is a benefit you can derive, but using a feature in an area it was not intended for (i.e. not specified) is not guaranteed.
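To see why 68 ohms saves power, here is a rough Python estimate. It assumes a split (Thevenin) parallel termination built from two resistors of 2R, one to VCCO and one to ground, and counts only the static current through that divider; the 1.5 V VCCO is a hypothetical example value, not a figure from this thread.

```python
def split_termination_static_power(vcco: float, r_thevenin: float) -> float:
    """Static power (W) of a split termination with Thevenin resistance R.

    Two resistors of 2R in series from VCCO to ground draw VCCO / (4R),
    independent of the signal, so P = VCCO**2 / (4R).
    """
    return vcco ** 2 / (4.0 * r_thevenin)


if __name__ == "__main__":
    VCCO = 1.5  # hypothetical supply voltage for illustration only
    p50 = split_termination_static_power(VCCO, 50.0)
    p68 = split_termination_static_power(VCCO, 68.0)
    print(f"50 ohm: {p50 * 1e3:.2f} mW, 68 ohm: {p68 * 1e3:.2f} mW, "
          f"saving {100 * (1 - p68 / p50):.0f}% per pin")
```

Under these assumptions the relative saving is 1 - 50/68, roughly 26%, whatever VCCO is; signal-dependent current and the driver itself are not modelled.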

fastgreen2000 21 years ago

Thank you all for the responses. I didn't mean to set off sparks, even though I was curious about what Stratix/II could do in comparison.

I'm now leaning toward doing it: parts of the core at 360 MHz, DDR data output at 360 MHz (720 Mbps, effectively) along with a forwarded clock at 360 MHz. I'll be running simulations to make sure there aren't any big issues at 720 Mbps, but since that's much lower than 1.2 Gbps, I'm optimistic.

I can't say Altera is out of the running, however. I just wanted to make sure I could do it in some FPGA device before committing to the interface.

Thanks again.

Symon 21 years ago

Brian, LOL, I'm beginning to feel the same way. It's interesting that almost all the PCB differential pairs I've seen are edge-coupled striplines or microstrips; very few are broadside-coupled. Of course, with edge-coupled lines most of the coupling is usually to the ground plane, with very little between the conductors. So it's much more like two 50 Ohm lines than a 100 Ohm pair. Not that it makes any difference, of course.

Yep! Cheers, Syms.

Jim Granville 21 years ago

Perhaps it is time to make some pins less "jack of all trades, master of none", and provide some with more focus ?

-jg

Austin Lesea 21 years ago

Jim,

It is something we agonize over every time we look at a new family.

Should we add I/O-standard-specific IOBs? How many? How should they be organized?

What should the IO/CLB ratio be?

Or, should we continue with the present plans (if it ain't broke, don't fix it)?

What business did we lose because we could not meet a customer's requirement? How do we know we even lost any business at all?

We did add MGTs (and PPCs, EMACs, DSP48s, ECC_BRAMs, FIFO_BRAMs, etc.), so it isn't as if we are not looking at adding new things or mixing things up (the patented ASMBL architecture, for example).

360 MHz, 720 Mbps DDR LVDS is now over five years old as something that either X or A has provided with their devices. One can argue the fine points, but as a gross capability, it has been there for quite a while.

Brian Davis 21 years ago

Austin,

Which of the following posts regarding Cin is more helpful for both Xilinx and its customers:

Austin [1]:

Brian [2]:

The only I/O capacitance number published in your datasheet is a single-ended parameter called Cin (or if you prefer, C_comp from the IBIS files).

Quoting this published datasheet Cin value is perfectly valid, and does not require "correction".

Comparing that number against the single ended Cin's of other devices, or against a single ended spec, is also perfectly valid.

I have never said the differential load is 12.5 pF; it is clear from my posts that I understand this, and also understand that the assumption Cdiff_effective = 1/2 Cin_single_ended applies only to the differential component of the signals on the T-line.

I find it rather inconsistent that in past discussions of Xilinx's newly onerous SSO limits for the current-mode output drivers, you've been quite insistent that real-world paths are NOT perfectly balanced.

Yet when discussing the effects of high Cin, you posit that everything is perfectly balanced back to a perfect source termination, so that a 50-60% voltage reflection off of your input pins is never a problem.

If only all FPGA input buffers could live happily ever after there in Austin's world, where all connections are ideal differential point-point links, all drivers have perfect back terminations, and no probing or multidrops are ever allowed.

And massive, coherent input reflections do not fit the AWGN assumptions of most channel models, now do they?
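The "50-60%" reflection figure can be sanity-checked with a lumped model: an ideal termination resistor in parallel with the pin capacitance at the end of a matched line, evaluated at a single frequency. The Python sketch below is an assumption-laden estimate (no package inductance, no driver behaviour, one tone); 360 MHz is chosen here as the fundamental of a 720 Mbps stream.

```python
import math


def load_reflection(z0: float, r_term: float, c_load: float, f_hz: float) -> complex:
    """Reflection coefficient of R || C terminating a line of impedance z0."""
    y_load = 1.0 / r_term + 1j * 2.0 * math.pi * f_hz * c_load  # admittance of R || C
    y0 = 1.0 / z0
    # gamma = (Z_L - Z0) / (Z_L + Z0), rewritten in terms of admittances
    return (y0 - y_load) / (y0 + y_load)


if __name__ == "__main__":
    for f in (360e6, 500e6):
        diff = load_reflection(100.0, 100.0, 6.25e-12, f)  # differential view
        se = load_reflection(50.0, 50.0, 12.5e-12, f)      # per-pin view
        print(f"{f / 1e6:.0f} MHz: |gamma| differential = {abs(diff):.2f}, "
              f"single-ended = {abs(se):.2f}")
```

In this idealized, perfectly balanced case the per-pin and differential views give the same magnitude (roughly 0.58 at 360 MHz), which fits both the equivalence argument and the 50-60% estimate; unbalanced and common-mode effects are outside this model.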

Brian

p.s. As for your other post, I'll reply once I finish recovering from a hard drive crash at home and can find my old files again.

[1]

formatting link

[2]

formatting link

Paul Leventis (at home) 21 years ago

Hi Austin,

Well, things are getting a little less busy with my day job, so I finally have time to start replying again... I figured I'd start with an easy one.

Yes. Stratix II has LVDS running at 1.3 Gbps reliably across process, temperature and voltage, with beautiful eye diagrams in simulation and on the board. And as noted here

formatting link

we will be increasing the spec to 1.25 Gbps in an upcoming version of Quartus II.

BTW, our simulations line up very well with board measurements. We offer accurate IBIS models that we proudly stand behind.

Regards,

Paul Leventis Altera Corp.

Paul Leventis (at home) 21 years ago

Austin:

According to our engineer who ran the sims, we did use on-chip termination for both V4 and Stratix II. I read the whitepaper again

formatting link

and I can't find anywhere where it says we didn't use on-chip termination.

Sorry to hammer on this again, but the above mentioned whitepaper does show some beautiful eye diagrams for SII and some ugly ones for V4. It also shows how nicely our lab measurement (of 1.3 Gbps LVDS on Stratix II) compares to the IBIS simulation.

Regards,

Paul Leventis Altera Corp.

fastgreen2000 21 years ago

Wait a minute, don't oversimplify the original design criteria: 720 Mbps DDR LVDS is only part of my question. The design also needs to run the internals at 360 MHz, and that includes a portion of the fabric, not just the DSP48s, etc. Five years ago? I don't think so. Maybe in a lab somewhere, but not as an available product.

Austin Lesea 21 years ago

No problem.

That is what all of the wonderful features in V4 are for (SSIO, IODLY, DCM, etc.). All of the above go a long way toward supporting the fabric. Even though the fabric will run at 500 MHz, it is far easier to mux the data down to 200 MHz or 100 MHz (using the built-in SSIO features), which makes place and route easier and also provides a lot of margin.

Just go buy the ML450 board (network interfaces), and you will get a fully working platform to test out all of your ~ 400 MHz up to 500 MHz DDR interfaces.

Austin
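The "mux it down" idea is plain deserialization: N bits collected from the pin become one N-bit word, so the fabric runs at the line rate divided by N. A behavioral Python sketch of that idea follows; it does not model any specific Virtex-4 primitive, and the bit ordering (first bit received at index 0) is an arbitrary choice.

```python
def deserialize(bits: list[int], width: int) -> list[tuple[int, ...]]:
    """Group a serial bit stream into parallel words of `width` bits.

    The first bit received ends up at index 0 of each word; trailing bits that
    do not fill a whole word are dropped.
    """
    usable = len(bits) - len(bits) % width
    return [tuple(bits[i:i + width]) for i in range(0, usable, width)]


def fabric_clock_hz(line_rate_bps: float, width: int) -> float:
    """Parallel (fabric) clock needed to keep up with the line rate."""
    return line_rate_bps / width


if __name__ == "__main__":
    print(deserialize([1, 0, 1, 1, 0, 0, 1, 0], 4))
    for w in (2, 4, 8):
        print(f"720 Mbps / {w} -> {fabric_clock_hz(720e6, w) / 1e6:.0f} MHz")
```

A width of 8 at 720 Mbps lands on 90 MHz, the rate of the original poster's slow logic.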

Symon 21 years ago

Paul, Does that mean in Stratix II I could run an internal clock at 625MHz and use the I/O DDR to move data out at 1.25Gbps? Thanks, Syms.

Paul Leventis 21 years ago

Hi Symon,

There is hard serializer/deserializer circuitry available for the left and right LVDS I/O banks. These SERDES blocks allow you to serialize/deserialize by any factor between 4x and 10x. For example, you could bring in a 4x data bus running at 312.5 MHz. Or you can bypass the SERDES block and use the DDR registers for a 2x SERDES, or bypass it completely for 1x... but not at 1.25 Gbps. I don't know what speed the SERDES/DDR I/O clock can or will run at when we update this specification. I'm sure it will be published at that time.

We also have dedicated Dynamic Phase Alignment (DPA) circuitry for source-synchronous applications. The DPA block enables you to eliminate channel-to-channel and clock-to-channel skew. It achieves this by selecting the best clock phase to use for each I/O pair, centering the sampling window in the eye.

Regards,

Paul Leventis Altera Corp.

John M 21 years ago

Symon,

According to the data sheet, you can run the LVDS I/O up to 500 MHz in the fastest speed grade part. That would get you 1 Gbps. More likely, you would use the SERDES. For example, at 130 MHz and using x8 serialization, you get 1.04 Gbps per pair. Here is a link to the DPA datasheet:

formatting link

John
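The line rates quoted by Paul and John are just parallel clock times serialization factor (a DDR register is the 2x case). A tiny Python check using only the numbers from these posts:

```python
def line_rate_gbps(parallel_clock_mhz: float, factor: int) -> float:
    """Serial line rate for a given parallel clock and serialization factor."""
    return parallel_clock_mhz * factor / 1000.0


EXAMPLES = [
    ("Stratix II SERDES, 4x at 312.5 MHz", 312.5, 4),
    ("DDR I/O registers (2x) at 500 MHz", 500.0, 2),
    ("Stratix II SERDES, 8x at 130 MHz", 130.0, 8),
]

if __name__ == "__main__":
    for label, clk, factor in EXAMPLES:
        print(f"{label}: {line_rate_gbps(clk, factor):.3g} Gbps")
```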

Symon 21 years ago

Paul and John,

Thanks very much for your replies! So, for 1.25 Gbps I'd need to use the SERDES. I guess that means I have to use the PLL circuit to make the clock? If I had more than one of these links, how easy is it to ensure that they're all synchronised together? For example:

I want to send bits a_1, a_2, a_3, a_4, etc. on I/O LVDS_A.
I want to send bits b_1, b_2, b_3, b_4, etc. on I/O LVDS_B.
I use the SERDES to do this.

Can I ensure that a_n appears at (more or less) the same time as b_n, i.e. that the shift registers in the two SERDES are aligned? I know, I should read the bloody manual more carefully, but I couldn't find this on a first pass.

Thanks, Syms.

Paul Leventis (at home) 21 years ago

Hi Symon,

Sorry for taking so long to reply.

That's what the SERDES block is for. You just need to instantiate an altlvds_rx (receiver) or altlvds_tx (transmitter) with the number of channels you want in the link. All the channels share a common PLL, so they share a common clock and the enable pulses derived from that clock.

And if you want to give the manual another stab ;-), I've been told that volume 2, chapter 5 of the Stratix II handbook, "High-Speed Differential I/O Interfaces with DPA in Stratix II Devices"

formatting link

is helpful. Figures 5-2, 5-11 and 5-12 are most applicable in this case.

Regards,

Paul Leventis Altera Corp.
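Paul's answer (one PLL and one load enable shared by every channel) is what keeps a_n and b_n aligned. A behavioral Python sketch of that idea, assuming an idealized serializer per channel that loads a parallel word on a shared strobe and then shifts out one bit per fast clock; it is not a model of the actual altlvds_tx internals.

```python
def serialize_shared_load(channels: list[list[tuple]], factor: int) -> list[tuple]:
    """Serialize several channels that share one fast clock and one load strobe.

    channels[c][k] is the k-th parallel word (a tuple of `factor` bits) for
    channel c. Returns one tuple per fast-clock cycle holding the bit that each
    channel drives on that cycle.
    """
    n_words = len(channels[0])
    assert all(len(words) == n_words for words in channels), "equal word counts"
    shift_regs: list[list] = [[] for _ in channels]
    timeline = []
    for cycle in range(n_words * factor):
        if cycle % factor == 0:  # shared load-enable pulse derived from the common PLL
            k = cycle // factor
            for reg, words in zip(shift_regs, channels):
                assert len(words[k]) == factor, "word width must match factor"
                reg[:] = list(words[k])
        # every channel shifts out one bit on the same fast-clock edge
        timeline.append(tuple(reg.pop(0) for reg in shift_regs))
    return timeline


if __name__ == "__main__":
    lvds_a = [("a_1", "a_2", "a_3", "a_4")]
    lvds_b = [("b_1", "b_2", "b_3", "b_4")]
    for cycle, bits in enumerate(serialize_shared_load([lvds_a, lvds_b], 4)):
        print(cycle, bits)
```

Because the load strobe is common, a_n and b_n always leave on the same cycle; with independent strobes, one channel could end up offset by whole bit times, which is the misalignment Symon was asking about.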
