Virtex-4FX embedded MAC and Rocket-IO data corruption??

Question

Hi,

After a very enjoyable few days trying to sort out a test design involving Ethernet and Virtex-4, I thought it time to ask for some advice.

I have an FX60's embedded MAC using RocketIO to talk 1000Base-X to an SFP module (Cat5 copper); this is then connected by crossover cable to an Intel E1000 network card. There are no switches involved in the link. The design is based on the 1.4.1 reference design that comes with CoreGen 8.1 for the embedded MAC. I am not using the Host Interface or suchlike, just some basic logic hooked on the end of the MAC, with it statically configured using the tie-offs.

I can get the PC with the Intel card to send raw Ethernet frames to my Virtex-4's MAC address, and they seem to get through about 40% of the time, when I see the "frame good" signal; the rest of the time I see a "bad frame". On looking at more signals I am seeing a large number of disparity errors and NotInTable errors coming from the RocketIO module, and what looks like corruption or bit shifts in the data.

I even see this when there is no traffic on the link and there are just idle frames being sent. It seems semi-repetitive, with, say, on the order of a few hundred good bytes between small bursts of bad ones.

Anyone got any ideas?

I have checked the network card and the cables (long ones, short ones, good quality ones and bad ones). All cards and cables pass large statistical back-to-back testing at 1 Gig between PCs with very few dropped frames.

My initial thoughts were the reference clock being fed to the RocketIO (which I think is okay; it's the ref clock from a PCIexpress socket going via an ICS9DB202 to clean it up and make it 125 MHz). I have considered heat too, so we bolted a larger heatsink and fan onto the chip...

Words of wisdom will be gratefully received...

--
//rc Kelly..
Just your average physicist trying to get by in a world full of normal people...

Sylvain Munaut · Accepted Answer

Have you seen the release notes and applied all the fixes?

Also, I don't know about 1000Base-X, but with SGMII you need an MDIO clock, either internal (by configuring the divider through the host interface) or external (via the MDIO clock input pin). Without it, I could never get the SGMII to autonegotiate with the PHY ...

I'm working with EMAC and SGMII, and I had errors because my sgmii board already has AC coupling capacitors in the rx path (at the phy end), so with the on-chip ac coupling, the signal was just too attenuated. Switching off the internal ac coupling did the trick.

If it's from a PCIexpress socket, are you sure its spread spectrum is deactivated? If not, even with a PLL, for a few ms it will be quite far off 125 MHz ...
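To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes a PCIexpress down-spread of up to 0.5%, a Gigabit Ethernet clock tolerance of about ±100 ppm, and a PLL that tracks the spread while multiplying up to 125 MHz. These figures and the function name are illustrative assumptions, not values taken from the board or the ICS9DB202 datasheet.

```python
def ssc_offset_ppm(spread_percent: float) -> float:
    """Worst-case frequency offset, in ppm, for a given down-spread percentage."""
    return spread_percent / 100.0 * 1e6

NOMINAL_HZ = 125e6          # RocketIO reference clock for 1000Base-X
ETHERNET_TOL_PPM = 100.0    # assumed Gigabit Ethernet clock tolerance
SSC_DOWN_SPREAD = 0.5       # assumed PCIexpress down-spread, in percent

offset_ppm = ssc_offset_ppm(SSC_DOWN_SPREAD)
lowest_hz = NOMINAL_HZ * (1 - SSC_DOWN_SPREAD / 100.0)

print(f"worst-case offset: {offset_ppm:.0f} ppm")
print(f"lowest reference frequency: {lowest_hz / 1e6:.3f} MHz")
print(f"within tolerance: {offset_ppm <= ETHERNET_TOL_PPM}")
```

Under those assumptions the reference can sit far outside what the link partner expects, which is why it is worth ruling out before anything else.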

Sylvain

MikeJ · Answer

Marc,

Have you designed the board with the Xilinx on it in-house? After checking the quality of the reference clock, I would look at the power supply to the MGTs and the quality of the PCB routing between the Xilinx and the SFP. Any way you can look at the RocketIO signals and check the eye pattern?

/MikeJ (x-physicist)

Marc Kelly · Answer

It's actually a prototyping board bought from PLDA (their XpressFX60 board) with the SFP sockets already on it. Their main selling point is its PCIexpress capability, but we're after it for the high-speed IO currently.

It's possible, I think. I would have to get our hardware people onto it, as I do mostly firmware and so they keep all the very high-spec scopes hidden from me :)

Marc Kelly · Answer

Yes, I had been hoping that one of them would magically fix things, sadly not.

I believe things are negotiating, although I may be wrong. The fact that I am seeing real frames that work, and the system can echo them back to the PC as well, gave me some confidence. I will check the MDIO clock issue, however..

Ah, that does sound interesting. I will have to check the schematics for the board(s) tomorrow at work and see.. It does seem to be the kind of thing that might be causing it.. if so, then I owe you many beers...

Yeah, we turned off all the spread spectrum settings in a moment of inspiration.. sadly it didn't seem to have any effect. I even tried clocking it from a DCM-generated 125 MHz clock, just to see what happened.. same effect as the proper ref clock.

MikeJ · Answer

Ok, I know of that board - they should know what they are doing and I assume they have looked at the eye pattern.

I know that feeling - having had some colleagues break/misplace expensive probes :)

Maybe it's worth replacing the reference clock with a quality low-jitter differential oscillator to see if it makes any difference? You can get away with some carefully matched-length twisted-pair mod wire (my company does it quite often), but make sure the oscillator power supply is good, and put an SMD cap across the pins of the oscillator at least.

/MikeJ

MikeJ · Answer

Just to clarify, that is a cap across the power pins of the oscillator!

/Mike

Marc Kelly · Answer

The board actually has a spare place for mounting an oscillator that feeds into one of the MGT reference clocks. I need to check whether it feeds the correct column to drive the RocketIO I need..

'tis the joys of playing with such fun hardware I guess..

bh · Answer

Well, this is a long shot, but is it possible that one end is fixed at full duplex (no negotiation) while the other is trying to negotiate and falls back to half duplex? This is a fairly common problem: the link appears to work for simple 'pings', but any real traffic has massive amounts of errors. That is because the full-duplex end transmits while receiving, and the half-duplex end sees this as a collision.

Just a possibility... and I'm only SURE that this happens with 10baseTX and 100baseTX, not with 1000baseX.

-bh

Marc Kelly · Answer

I have tried with both ends forced to full duplex, just in case, and made a new crossover cable too, just in case. I had some good success with turning off the internal AC-coupling caps as someone mentioned, and things look more sane. For small packets, ~65-128 bytes long, I get good transmission with maybe 1-2% packet loss; larger packets seem to be a problem, however.
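As a rough sanity check on why larger packets would suffer more, here is a minimal sketch that scales a measured loss figure to other frame lengths. It assumes byte errors are independent and that any single bad byte kills the frame. Real errors here look bursty, so treat the output as an order-of-magnitude guide only. The function name and the 1.5% at 100 bytes starting point are illustrative picks from the range quoted above.

```python
def scaled_frame_loss(loss_at_ref: float, ref_len: int, new_len: int) -> float:
    """Scale a measured frame-loss fraction to another frame length,
    assuming independent byte errors and that one bad byte drops the frame."""
    survive_ref = 1.0 - loss_at_ref
    return 1.0 - survive_ref ** (new_len / ref_len)

# Illustrative starting point: 1.5% loss for frames of about 100 bytes
for length in (64, 128, 512, 1500):
    loss = scaled_frame_loss(0.015, 100, length)
    print(f"{length:5d} bytes: ~{loss * 100:.1f}% loss")
```

With those assumptions a full-size 1500-byte frame comes out at roughly 20% loss, so a modest loss rate on small packets is already consistent with large packets struggling.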

With an idle link I see the /K28.5/D16.2/ idle pattern fine, but sometimes the /D16.2/ is corrupt, and gives a "notintable" error from the MGT, always with what seems to be the same pattern.
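For anyone matching this against raw bytes in a capture, the 8b/10b code-group names map to byte values as `Dx.y` or `Kx.y` = `(y << 5) | x`. A minimal sketch follows; the helper name is made up for illustration, and the K/D distinction is not part of the byte value, it comes from the separate "is K" flag on the receive side.

```python
def code_group_to_byte(name: str) -> int:
    """Convert an 8b/10b code-group name such as 'K28.5' or 'D16.2' to its byte value."""
    x, y = name[1:].split(".")   # x = low 5 bits, y = high 3 bits
    return (int(y) << 5) | int(x)

for name in ("K28.5", "D16.2", "D5.6"):
    print(f"/{name}/ -> 0x{code_group_to_byte(name):02X}")
```

So the idle pair above should appear as 0xBC followed by 0x50. The other 1000Base-X idle, /K28.5/D5.6/, would appear as 0xBC followed by 0xC5.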

I need to get Synplicity's Identify_debugger to play nicely tomorrow with a nice long sample memory to check how regular this is happening. The external logic analyser I have access to currently doesn't have the depth when running at a decent speed.
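Once a long capture is exported, the regularity check could be as simple as measuring the spacing between error bursts. A minimal sketch, assuming the capture is available as one 0/1 "notintable" flag per received byte; the capture below is made-up data, and the function name is hypothetical.

```python
from collections import Counter

def error_gaps(flags):
    """Return the number of samples between the starts of successive error bursts."""
    starts = [i for i, f in enumerate(flags) if f and (i == 0 or not flags[i - 1])]
    return [b - a for a, b in zip(starts, starts[1:])]

# Made-up capture: 1 = notintable asserted on that byte
capture = [0] * 300 + [1, 1] + [0] * 298 + [1] + [0] * 299 + [1, 1, 1]

gaps = error_gaps(capture)
print("gaps between bursts:", gaps)
print("most common gap:", Counter(gaps).most_common(1))
```

A tight cluster of gaps would point at something periodic, such as a clock or power-supply effect, rather than random noise.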

Maybe a possible issue with the MGT itself? I can move to another one, I think, and test that.
