180 differential inputs, each at 800 Mbps, using V5

We have an application where I need to feed 160 differential data inputs and 20 differential clock inputs into a high-end V5 FPGA. There is one differential clock for every 8 differential data inputs. I would like to use LVPECL inputs for this.

The 160 differential data inputs will be running at 800 Mbps each.

The 20 differential clock inputs will be running at 400 MHz, in order to provide a DDR clock for the 160 data channels.

Inside the FPGA we would like to use a 1:4 demux and convert the data bus to 640 bits wide running at 200 MHz, so the FPGA fabric can handle it. Then the data will be fed either to a 640-bit-wide internal FIFO (as deep as I can get) or, if I had a choice, to an external high-density FIFO via the 640-bit-wide bus running at 200 MHz. This may require another FPGA.
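Spelled out, the width/rate conversion works out as follows (a quick sanity check in Python; the variable names are just for illustration):

```python
# Sanity check of the proposed 1:4 width/rate conversion.
# Numbers are taken from the description above.
DATA_PAIRS = 160         # differential data inputs
LINE_RATE_MBPS = 800     # per pair (DDR, from a 400 MHz forwarded clock)
DESER_RATIO = 4          # 1:4 deserialization at the input

fabric_bus_bits = DATA_PAIRS * DESER_RATIO           # 640-bit fabric bus
fabric_clk_mhz = LINE_RATE_MBPS // DESER_RATIO       # 200 MHz fabric clock
aggregate_gbps = DATA_PAIRS * LINE_RATE_MBPS / 1000  # 128 Gb/s total payload
```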

I have seen a Xilinx application note supporting such a configuration, but in our application we need a lot more data inputs.

I would like to get your input on this. Is this possible using a high-end V5 FPGA?

Thanks.

Reply to
Test01

That sounds like a lot of ADS6425 parts ;-)

Look through the DDR/DDR2/QDR app notes. The problem is very similar to latching incoming byte lanes from DDR etc. memory.

Reply to
Tim

I am a bit concerned about the sheer number of channels: 160 differential inputs, each running at 800 Mbps. Is this possible using a V5 LX110 device?

Reply to
Test01

Any input on this will be great

Reply to
Test01

Since you work at Xilinx, wouldn't there be many people there who can answer your question? I'm sure they know their own devices better than any of us.

---Matthew Hicks

Reply to
Matthew Hicks

I wish I did work at Xilinx. Any input will be great.

Reply to
Test01

Woo, look, I can post from

formatting link

too. Does that mean I work for Xilinx?

Reply to
notaxilinx employee

Yeah, I figured that if I made that assertion I would quickly get a response along the lines of the one you gave. Upon further searching (www.xtra.xilinx.com just forwards to xilinx.com), I see that they provide forums. I stand corrected.

To answer the question: the pure number of differential signals isn't a problem, but that's a lot of clocks; in fact, the Virtex-5 only has 20 clock inputs. Assuming you don't need another clock in your design, you would be fine. You might be able to save the special clock routing channels by using placement constraints and well-thought-out pin assignments to minimize clock routing over the standard routing channels.

Things of concern are metastability when trying to group all of the data channels together under a single fabric clock, and the phase differences between all 20 of the clocks.

---Matthew Hicks

Reply to
Matthew Hicks

Using I/O clocks instead of global clocks would be a solution. I'm not sure how many I/O clocks a V5 of that size has, though. But Jim Wu's ADEPT can help with that:

formatting link

I/O clocks can only clock the ISERDES inside adjacent IOBs, but that is exactly what is needed in this type of application. So you would clock the ISERDES with the I/O clock, do the 1:4 deserialization there, and use a divided-by-4 clock as the write clock to a nearby FIFO. Have a look at xapp866:

formatting link
which describes a similar application.
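For intuition, the 1:4 step can be modeled behaviorally in a few lines of Python. This is a sketch of the data reorganization only, not of the ISERDES primitive itself; the bit ordering chosen here (first-sampled bit as LSB) is an assumption, as the real ordering depends on the primitive's configuration.

```python
def deserialize_1to4(samples):
    """Group a stream of DDR-sampled bits into 4-bit words, modeling the
    width expansion a 1:4 deserializer performs. Ordering (first sample
    becomes the LSB) is an assumption, not documented primitive behavior."""
    words = []
    # Only complete 4-bit groups are emitted; a partial tail is dropped.
    for i in range(0, len(samples) - len(samples) % 4, 4):
        word = 0
        for bit_pos, bit in enumerate(samples[i:i + 4]):
            word |= bit << bit_pos
        words.append(word)
    return words

# Eight serial samples become two 4-bit words at a quarter of the rate.
print(deserialize_1to4([1, 0, 1, 1, 0, 0, 0, 1]))  # [13, 8]
```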

The problem there is that you have to connect the clock to specific pins, and the corresponding data pairs to a specific group of pins (again, ADEPT will help with the pin assignment). This usually makes PCB layout harder, because you can't just connect everything so that you get matched-length differential pairs without vias and such, which is what you would usually try to achieve.

HTH, Sean

--
My email address is only valid until the end of the month.
Try figuring out what the address is going to be after that...
Reply to
Sean Durkin

Thanks for your feedback. I will utilize the recommended tool and the application note in my application.

I do have a couple of questions:

(1) With the internal 1:4 deserializer as recommended, the internal data bus will be 640 bits wide at 200 MHz. I would like to buffer this externally using a deep FIFO. If the FPGA has enough pins, is it OK to also include the 640-bit-wide, 200 MHz external FIFO interface in the same FPGA as the 180-differential-input, 800 Mbps interface? I was thinking about using a Virtex-5 LX110 device for this application. It would be great to get your feedback.

(2) For similarly configured differential outputs from another FPGA (I call it the transmitter FPGA), are there any concerns about simultaneous switching noise, i.e. 180 differential outputs switching simultaneously at 800 Mbps? As per my understanding, simultaneous switching noise applies more to single-ended outputs and not as much to differential outputs. The common-mode (SS) noise on the differential pairs will be there, but it will be subtracted out by the receiver. Is that correct?
Reply to
Test01

That would be 360 pins for the LVDS and 1280 pins for the FIFO (read and write). That's a lot.
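Spelled out (illustrative arithmetic only, names made up for clarity):

```python
# Pin budget for putting both interfaces on one FPGA.
lvds_pairs = 180
lvds_pins = lvds_pairs * 2          # 360 pins for the differential inputs
fifo_bus_bits = 640
fifo_pins = fifo_bus_bits * 2       # separate 640-bit write and read buses
total_pins = lvds_pins + fifo_pins  # 1640 signal pins before any control/clocks
```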

BTW: the price of FPGAs grows faster than linearly with size. You are probably better off with multiple small chips. Also, the PCB should be easier to route if you have fewer FIFO chips per FPGA.

Kolja Sulimma

I was thinking that, to unload the FIFO, I will use a separate FPGA, as that side is not on the critical path.

For the receiver FPGA, the write to the FIFO is most critical. Thus, ideally, I would like to have the receiver FPGA contain the 180 diff inputs (800 Mbps each) and the 640 data outputs plus FIFO control outputs (200 Mbps each).

As per my understanding, the biggest Virtex-5 FPGA contains about 1200 user I/Os.

What are the issues that you all see with this approach?

Reply to
Test01

Test01,

I would study the networking cores in the IP library.

They require one forwarded clock for every N data lines, where N gets smaller as the speed goes up.

I think that for 180 paths, you may need as many as 10 clocks (each 18 bits get a clock). That makes 190 LVDS pairs and 10 BUFG global clocks. Banking will be an issue: in V5 I think we have banks of 20 and banks of 40 (pins; LVDS takes two pins per I/O), so a bank of 20 is too small for one set of 18 signals + clock, and a bank of 40 holds one set of 18 signals + clock.

That makes for 10 banks of 40 I/Os, so you need to size the part to match.
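The clock-forwarding and banking arithmetic, written out (numbers from the discussion above; the bank sizes should be checked against the data sheet):

```python
import math

data_pairs = 180                    # all 180 paths treated as data lanes here
pairs_per_clock = 18                # one forwarded clock per 18 bits
clocks = math.ceil(data_pairs / pairs_per_clock)   # 10 forwarded clocks
total_pairs = data_pairs + clocks                  # 190 LVDS pairs in total

pins_per_bank = 40                  # the larger V5 bank size (per the post)
pairs_per_bank = pins_per_bank // 2                # LVDS takes 2 pins per pair
banks = math.ceil(total_pairs / pairs_per_bank)    # 10 banks of 40
```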

I would not do this from scratch, but follow a networking core (e.g. SPI POS 4.X).

PCB layout will be a real challenge, as you will need to make all traces the same length (including the traces in the package to the die).

There are advanced interfaces that will automatically adjust all the delays to get the bits all centered (pcb traces can be anything you like), but I don't think you can fit that many of these cores in one chip (you run out of something, like DCM's, or serial IO resources).

Local clocks only go to 200 MHz, so they can be used with the per-pin serializer/deserializer; each clock period then covers 4 bits in time.

Again, this is a real challenge, and puts you in the rarefied world of extremely high-speed design. The PCB, after layout, has to be extracted and run through an SI tool, or else you are pretty much doomed to fail.

Only after you prove that all signals get from here to there, verify that all the SI is acceptable, and simulate an eye pattern on each and every one of the 180 lines and 10 clocks, are you ready to fab the PCB.

Austin

Reply to
austin

Austin,

Thanks for your input on this one, as I am a bit puzzled about where to start. I was looking at xapp705 to use the ISERDES and OSERDES and build upon that. It seems that you have something even better: the SPI/networking cores. I will look into those as well.

When I said 180 diff pairs at 800 Mbps, I meant 160 diff pairs for the data bus and 20 diff pairs for the 400 MHz DDR clocks; thus the sum total will be 180 diff pairs into the FPGA. There will be a clock for every eight bits. The FPGA will use a 1:4 deserializer to further widen the data bus to 640 bits at 200 MHz. I could use internal FIFOs inside the FPGA to store the 640-bit-wide data, but ideally I would like to store it in an external FIFO due to the size advantage. The receiver FPGA, containing the 180 diff pairs and the 640-bit, 200 Mbps outputs to the FIFOs, will be involved in writing to the FIFO; reading can be done by some other FPGA.

Please let me know your thoughts on the external FIFO interface.

Reply to
Test01

Test01,

If it were me, I would figure out how to do everything inside the FPGA. Moving the data out to a FIFO and back in again, you still have all of that data running at 800 Mbps per pair.

The FPGA fabric is the perfect place to expand to a huge wide bus and go into the 36K BRAMs (used as FIFOs, or used with read/write pointers on their dual ports).

What are you going to do with all this data once you have received it, and placed it in a FIFO?

Obviously, there is some kind of "output" from this design, what is it?

If the amount of storage on the FPGA is insufficient, then yes, you will have to put it somewhere else. A natural fit for this is DDR SDRAM.

You may build the FIFO functionality from any memory: use the fastest and least expensive memory you can.

Austin

Reply to
austin

Austin,

Once I capture the data into the external memory, I will retrieve it using some back-door communication to the memory. My preference was to keep the external memory interface as simple as possible, as this is the first time I am attempting this level of complexity on an 800 Mbps interface. Ultimately, the 160 data channels at 800 Mbps each funnel down to 20 channels at 6.4 Gbps each. That interface is a bit more complex, so we wanted to minimize the complexity on the memory side.

A DDR interface has some overhead associated with it, and we would need more than one DDR interface to keep up with the data rate. This is why I liked the FIFO solution.
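To put a rough number on "more than one DDR interface": assuming, purely for illustration, a 64-bit DDR2-800 interface and a guessed usable efficiency (both the memory grade and the efficiency figure are my assumptions, not from the thread), the interface count works out like this:

```python
import math

stream_gbps = 640 * 200e6 / 1e9     # 128 Gb/s that must be absorbed
ddr2_peak_gbps = 64 * 800e6 / 1e9   # 51.2 Gb/s peak for a 64-bit DDR2-800 bus
efficiency = 0.7                    # assumed usable fraction after refresh/turnaround
usable_gbps = ddr2_peak_gbps * efficiency
interfaces = math.ceil(stream_gbps / usable_gbps)  # 4 with these assumptions
```

With different assumptions the count changes, but several parallel DDR controllers are needed either way, which is the overhead being avoided with the FIFO approach.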

But is it feasible to have 180 diff pairs at 800 Mbps and 640+ single-ended I/Os at 200 Mbps each in one FPGA? Inside the FPGA fabric there is not much going on. I am not sure if the V5 LX110 can meet the I/O and internal logic requirements.

Thanks for your input.

Reply to
Test01

Test01,

You did not answer my question.

Where are the 20 6.4 Gbps transceivers? How do you get to them? Are you assuming that these 20 6.4 Gbps transceivers are already inside the FPGA? Why leave the FPGA, only to come back into the FPGA? 640 × 200E6 = 128 Gbps, and 20 × 6.4 = 128 Gbps, but that implies that you have no encoding or overhead, which is not practical. Where do you find 6.4 Gbps transceivers? I suspect you need 8B/10B coding, or at least 64B/66B coding, which then bumps you up to 8 Gbps or 6.6 Gbps per channel, respectively.
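The bandwidth-matching arithmetic, written out (the 10/8 and 66/64 factors are the standard 8B/10B and 64B/66B coding overheads):

```python
fabric_gbps = 640 * 200e6 / 1e9   # 128 Gb/s leaving the fabric
serial_gbps = 20 * 6.4            # 128 Gb/s across 20 links: zero margin
line_8b10b = 6.4 * 10 / 8         # 8.0 Gb/s line rate per link with 8B/10B
line_64b66b = 6.4 * 66 / 64       # 6.6 Gb/s line rate per link with 64B/66B
```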

You will need channel bonding to concatenate 30 MGT channels together, and some kind of protocol (Aurora core is free). I still don't know why you are doing any of this...fun? video? data? radio? Is it a state secret?

Asking: "can the V5 do XXX" is of no use to anyone. If you wish to have a system architected by a committee, and designed by rumor, opinion, and hearsay, then I have to hit the "ignore" as this project will never succeed, and provide any revenue to Xilinx (it is, a complete waste of my time).

I am sure that whatever it is can be done by proper use of FPGA technology. How you are going about it is very odd. You seem to be architecting your system around a bunch of LVDS pairs; why?

Perhaps it would be best if you contracted with one of the Xilinx expert consultants out there, or perhaps with Xilinx Design Services? What you are asking is non-trivial, and it is very difficult even when done correctly by experts. 800 Mbps LVDS with as many wires as you need is a lot of work to get done right (as I have previously described).

Sorry, I have lost interest in this, as you are not answering my direct questions (what are you doing? why?).

Sorry, "ignore" is now ON.

Austin

Reply to
austin

Austin,

The short answer is, I am not using the MGTs or GTPs to achieve 6.4 Gbps per channel. There are external discrete components to achieve this. We are going to have a training algorithm to achieve bit alignment and packet alignment. It did not seem relevant to my original question, so I did not mention it. Also, the very high speed (6.4 Gbps) part is not something I am dealing with myself; someone else is looking at it.

I hope this answered your question.

Reply to
Test01

I would like to add to my previous reply. I apologize if I did not answer your direct question. It is quite possible that I did not understand the question.

The FPGA is just one of the components of the project that I am responsible for, and I have some understanding of its limitations, but not a full one.

Again I am sorry if I did not answer the questions directly.

Reply to
Test01

Where do you find the package internal trace lengths documented, so that you can allow for them in your PCB trace length calculations?

Answer Record 18078 says

formatting link

with respect to the common FG or BG packages, "since these package structures do not lend themselves to this kind of analysis, Xilinx does not have this information available."

For other packages it refers to AR15321, where you find that for Virtex-II era devices (in unspecified packages) the information IS available but you have to open a webcase to get it.

- Brian

Reply to
Brian Drummond
