ML300 and GigE Experiences

I am curious if anyone here has had success maintaining a very low BER link using the fiber connections on the ML300 boards.

We have implemented an Aurora protocol PLB core for the ML300 (adding an interface FIFO and FSMs to the Aurora CoreGen v2 core). It is currently a single-lane system using GigE-0 on the ML300 board (MGT X3Y1). We were having small issues using the 156.25 MHz BREF clock, so we are currently using a 100 MHz clock (just the PLB clock, plb_clk, out of the Clock0 module of the EDK2 reference system). Clock compensation occurs about every 2500 reference clocks (I tried 5000, with the same, if not worse, problems). Best results were with Diffswing=800 mV, Pre-Em=33%.
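For a sense of scale on the clock compensation interval, here is a rough sketch of how far two free-running reference clocks drift apart between compensation events. The 200 ppm worst-case offset (two +/-100 ppm oscillators) and the figure of 2 characters per reference clock (20 serial bits per refclk, 10 bits per 8b/10b character) are assumptions for illustration, not values taken from the ML300 or Aurora documentation.

```python
# Sketch: drift between two free-running reference clocks, measured in
# 8b/10b characters, over one clock-compensation interval.
# Every default below is an assumption, not a datasheet value.

def drift_in_characters(cc_interval_refclks, ppm_offset, chars_per_refclk=2):
    """Characters of slip accumulated between two compensation events."""
    return cc_interval_refclks * chars_per_refclk * ppm_offset * 1e-6

# Assumed worst case: two +/-100 ppm oscillators, i.e. 200 ppm apart.
for interval in (2500, 5000):
    slip = drift_in_characters(interval, ppm_offset=200)
    print(f"{interval} refclks between compensation: {slip:.1f} characters of drift")
```

Under these assumptions the drift is on the order of one or two characters per interval at either setting, which would be consistent with the interval change making little difference.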

Unfortunately our link has problems staying up for more than 20 minutes (it will spontaneously lose link and channel until an MGT reset on both partners kicks them off again). Additionally, the Aurora core reports large numbers of HARD and SOFT errors. I do not send any data; I just let the Aurora core auto-idle. These are the results:

DIFFSW=800 PREEM=33% Stays up: 30+ minutes, ~5 soft errors/sec
DIFFSW=700 PREEM=33% Stays up: 30+ minutes, ~10 soft errors/sec
DIFFSW=600 PREEM=33% Stays up: not tested, ~20 soft errors/sec (explodes to 200-300 errors/sec at about 13 minutes)
DIFFSW=500 PREEM=33% Stays up: not tested, ~30 soft errors/sec (explodes to 200-300 errors/sec at about 13 minutes)

DIFFSW=800 PREEM=25% Stays up: not tested, ~200-300 soft errors/sec
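To put these rates in perspective, the sketch below converts soft errors per second into a rough floor on the error ratio. It assumes the MGT serial rate is 20x the 100 MHz reference clock (2 Gb/s) and that each reported soft error corresponds to at least one bit in error; both are assumptions, and the true bit error ratio could be higher if several bit errors are reported as a single event.

```python
# Sketch: rough floor on the error ratio implied by a soft-error rate.
# Assumption: serial line rate = 20 x refclk = 20 x 100 MHz = 2 Gb/s.
LINE_RATE_BPS = 20 * 100e6

def error_ratio_floor(errors_per_sec, line_rate_bps=LINE_RATE_BPS):
    """Errors per transmitted bit, counting each soft error as one bit error."""
    return errors_per_sec / line_rate_bps

# Soft-error rates reported above (errors per second).
observed = {
    "DIFFSW=800 PREEM=33%": 5,
    "DIFFSW=700 PREEM=33%": 10,
    "DIFFSW=600 PREEM=33%": 20,
    "DIFFSW=500 PREEM=33%": 30,
}

for setting, rate in observed.items():
    print(f"{setting}: at least {error_ratio_floor(rate):.1e} errors per bit")
```

Even the best setting works out to a few errors per billion bits under these assumptions.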

- In loopback mode (serial or parallel) the channel/lane are crisp and clean as ever.

- When the boards start up, the error rates in each situation are low, but they grow over time. I don't know if this is a function of board/chip temperature (I put a heat sink on and it seems to slow the increase of the error rate), or if for some reason the Aurora core cannot compensate for some clock skew and jitter.


Could any of you guys steer me in the right direction?

Is using the more heavily loaded plb_clk as my ref_clk a source of the problem? Has anybody been able to get low error rates?

Thanks, Tony

Reply to
Tony

Tony,

A well-designed link should be error free (i.e. many, many hours without a single bit in error). Contact the hotline for details about MGT support on specific ML300 series boards: some early versions were not designed to support links above 1 Gb/s, as they were designed to show off the 405PPC (tm IBM) instead.
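As a rough guide to what "many hours" buys statistically, the sketch below uses the standard zero-error confidence bound: with no errors observed in n bits, the bit error ratio is below a target value at confidence CL once n >= -ln(1 - CL) / target. The 2 Gb/s line rate (20x a 100 MHz refclk) and the 95% confidence level are assumptions chosen for illustration.

```python
import math

def error_free_seconds(target_ber, line_rate_bps, confidence=0.95):
    """Error-free run time needed to claim BER < target_ber at the given confidence."""
    bits_needed = -math.log(1.0 - confidence) / target_ber
    return bits_needed / line_rate_bps

# Assumed 2 Gb/s line rate; the targets are illustrative only.
for target in (1e-10, 1e-12, 1e-14):
    seconds = error_free_seconds(target, line_rate_bps=2e9)
    print(f"BER < {target:.0e}: {seconds / 3600:.2f} hours error free")
```

Each factor of 100 in the target costs a factor of 100 in test time, so very low targets quickly require runs lasting hours or days.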

So, there are a hundred things to check once you find out if your board was built for MGT usage, but you have to start somewhere:

1) is your refclk meeting the jitter spec? The MGTs require a very low jitter refclk. You can check this by observing a 1,0 pattern from the outputs of the MGTs and seeing how much jitter is there. Should be much less than 10% of a unit interval (bit period). If it is more than this, you have a tx jitter problem. If you loop with a bad jitter rx clock, everything is OK because the receiver is getting exactly the same bad clock to work with.

2) is your logic error free when looped back? I think you said yes, but often timing constraints may be missing, and the fabric is the source of errors.

3) are your errors in bursts? or single? Bursts may indicate FIFO overflow/underflow (refclks far apart in frequency, and no means to deal with it, or the means is not working in logic -- when looped, the same clock is used, so no problem).

4) what is the channel? Coax cables are not a differential channel; common mode noise will roar right into the receiver if the channel is not differential. Usually the coax cables are used to connect the TX and RX pairs to a XAUI adapter module and from there to the actual backplane (still not ideal, but at least most of the channel is differential).

5) what does the received eye pattern look like? This will tell you if you have a jitter problem, or an amplitude/loss problem. If the eye looks fantastic, that takes you right back to the digital processing, and takes away the analog side of things again....

6) have you tried a far end loopback? Loop the digital data directly back to the far end tx from the far end rx to go back to the near end.

7) contact an FAE, and arrange to go to one of our 15 world wide RocketLabs(tm) locations where we have all of the equipment and resources to debug your board, and compare it with our own boards and designs in the labs.
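For item 1, the 10% figure can be turned into picoseconds once the line rate is known. This is a minimal sketch, assuming the serial rate is 20x the reference clock; the three refclk values are the ones mentioned in this thread.

```python
def unit_interval_ps(line_rate_bps):
    """Duration of one bit period (unit interval) in picoseconds."""
    return 1e12 / line_rate_bps

def jitter_budget_ps(line_rate_bps, fraction=0.10):
    """Jitter allowance as a fraction of the unit interval, in picoseconds."""
    return fraction * unit_interval_ps(line_rate_bps)

# Assumption: serial line rate = 20 x refclk.
for refclk_hz in (100e6, 125e6, 156.25e6):
    rate = 20 * refclk_hz
    print(f"refclk {refclk_hz / 1e6:g} MHz: UI = {unit_interval_ps(rate):.0f} ps, "
          f"10% of UI = {jitter_budget_ps(rate):.0f} ps")
```

At a 100 MHz refclk this gives a 500 ps unit interval, so the jitter seen on a 1,0 pattern should be well under 50 ps.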

Austin

Reply to
Austin Lesea

Tony, I used a HW GMAC core on the ML300. I believe we used a differential clock input (62.5 * 2 = 125 MHz). Maybe you can use this clock instead. This signal is provided on the ML300 board. I don't have the docs in front of me, but I believe it comes in on either pins B13, C13 or B14, C14.

My other experiences with the GMAC core and corresponding reference designs have been VERY bad at best, and Xilinx support in that area is no better. Maybe using the gig ports with the PPC is a little better, but...

Matt

Reply to
Matthew E Rosenthal

Matt,

Do you have a case number?

I like to follow up on any less than happy experiences so that we can do better.

Did you have an FAE visit? Did you visit a RocketLab?

Please reply to me directly, (austin (at) xilinx.com)

There is a case now open for Tony (yup, it took that long), and we are zeroing in on his issue.

Thank you,

Austin

Reply to
Austin Lesea

Thanks Austin!

The hotline is getting back to me today or Monday with respect to the MGT Gb/s capability of our boards.

1) Our clock is probably dirty. It is the initial DCM output that goes to the plb_clk of the reference design. I noticed the DDR clock is fed from another DCM that de-skews and cleans up the first DCM, so I will do a quick switch to that to see if there is improvement. I am more and more convinced that the dedicated 156.25 MHz BREF clock going straight to the MGT is the cleanest signal, and will also give that a try. I have to get a scope from the other lab to test the 0->1 jitter characteristics.

2) I am using the Aurora core v2 from CoreGen, so I am comfortable saying the fabric is stable. These errors occur when idling (no PDUs/NFCs/UFCs), so it is not a synchronization problem with the Aurora core.

3) I haven't yet developed a test for this. Right now we are picking off falling edges of the HARD_ERROR, SOFT_ERROR, and FRAME_ERROR signals from the Aurora core and generating interrupts to the PPC405 core, which then prints to the screen every 100 interrupts. So there is significant delay, but it is more than sufficient to gather error rate statistics in the ~100/sec range.

4) Fiber, the ones that come with the ML300 kit.

5) I have to get a scope from the other lab to test this.

6) Far end loopback? Do you mean the serial-mode loop back where it goes to the pads? Yes that works flawlessly.

7) I was planning a trip just to check out the labs anyway, should be fun!
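For the burst-versus-single question in item 3, one possible test is to timestamp each error event and group events that fall close together. This is only a sketch: it assumes per-event timestamps can be captured at all (for example from a free-running counter latched on each error), and the 1 ms gap threshold and the sample timestamps are hypothetical values for illustration.

```python
def group_bursts(timestamps, gap=1e-3):
    """Group error timestamps (seconds) into bursts.

    Consecutive events no more than `gap` seconds apart land in the same burst.
    """
    bursts = []
    for t in sorted(timestamps):
        if bursts and t - bursts[-1][-1] <= gap:
            bursts[-1].append(t)   # close to the previous event: same burst
        else:
            bursts.append([t])     # isolated event or start of a new burst
    return bursts

# Hypothetical capture: a burst of three, a single error, then a burst of two.
sample = [0.0, 0.0002, 0.0004, 1.5, 3.0, 3.0001]
sizes = [len(b) for b in group_bursts(sample)]
print("burst sizes:", sizes)
```

Mostly single-event groups would point toward random bit errors, while large groups would point toward the FIFO overflow/underflow case described in the earlier reply.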

I'll reply with the result of switching to the DDR 100 MHz clock, and the 156.25 MHz clock.

Regards, Tony

Reply to
Tony

I should say response time has been extremely fast and the people I spoke with were great to work with. I called the hotline and they opened up a case. (Austin, I am not sure if this is the same case, but I left your email and name with them.) I haven't used the GigE core, but the PLB interface version seems very clean cut.

Regards, Tony

Reply to
Tony

Matt,

The ML300 also supplies a 156.25 MHz differential clock, but if that gives problems, the direct diff clock at 125 MHz would indeed be a step in the right direction. Thanks for the info!

Tony

Reply to
Tony

Tony,

See my comments below:

> Thanks Austin!

You're welcome. Happy to help out.

They contacted me, and I gave them the right names who know this stuff. I may not be that smart, but at least I know who is!

Driving the MGT from the DCM definitely does not meet the MGT clock input specifications at 2.5 Gb/s and higher. We have heard that some folks can do this without trouble at 622 Mb/s and 1 Gb/s, but it still is not recommended. Driving it from two DCMs in tandem is even worse.

Sounds good.

Have you thought of using the XBERT design for link characterization? If you are getting lost frame indications, then that is something far worse than a few bit errors......

OK

OK

No, I was thinking of looping it back at the far end receive digital end to go back towards the receiver, but I do not think you need to do this.

Yes, we have a lot of fun. Since the equipment is there 24 X 7, the FAEs get to play with it, and they get proficient with it. Time gets saved because set up is sometimes the hardest part of any verification or measurement. Knowing the equipment, and the setup, and using it benefits everyone.

If that is easy, that might have a real big benefit.

Reply to
Austin Lesea
