Machine check while reading DMA Status Registers

I am repeatedly seeing a machine check when reading one of the DMA Status Registers (DMASRx). Our board has an MPC8248 processor running Linux 2.6.26. We are using PCI DMA as described in section 9.13 of the MPC8272 PowerQUICC II Family Reference Manual. The machine check occurs after the system has been running anywhere from 8 to 50 hours, by which time the status registers have been read successfully billions of times. I am using all 4 DMA engines in chaining mode.

Has anyone else seen this? Any suggestions on how to debug it?

Washington Ratso

Not this and not on the same platform, but I had a similar experience on the MPC5200B. When pushed harder (FTP downloading at 100 Mbps), some silicon timing problem comes up and corrupts the ATA (IDE) transfer while writing to disk. It happens once or twice per gigabyte, and that is after I discovered another bunch of timing issues on the chip and worked around them to get it that good. Another benefit of doing it all in-house (SDMA microcode included) was that the 1-2 errors per GB turned out to be recoverable: I managed to just retry the faulty disk write, and at that low rate the retries go unnoticed.

When I asked their support what to do about the problem, I was told that Linux did not use DMA for ATA, and could I not also use PIO please... It turned out I had been right never to even consider Linux or some other mickey-mouse stuff for real work.

The bottom line: today's SoC parts are way too complex for the silicon manufacturers to test 100%. They just don't have the software to do it, so the chip gets released and tested in the field by customers. Obviously there will be hardware issues, and even Freescale will have them (I doubt there is a vendor who delivers SoC silicon with fewer bugs per gate than they do; in fact I doubt anyone matches them above a certain complexity).

If you can understand what causes the failure, then you can fix it in software. It may well be something other than a silicon error; I am just fresh out of wrestling some of these, so that is where I go first. Especially if you get the machine check exception only for that one reason, you can abort and retry the corrupted transfer or something similar. Boy, I had not written such a long post in a long time :-).
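The abort-and-retry idea can be sketched roughly as below. This is only a minimal illustration in plain C, not code from either system discussed here: `transfer_with_retry`, `xfer_fn`, `struct retry_stats` and the `flaky_xfer` stand-in are all made-up names, and it assumes the fault can be caught and reported as a status code rather than taking the machine down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; not taken from any real driver. */
enum xfer_status { XFER_OK, XFER_FAULT };

/* Hypothetical callback that performs one complete transfer attempt. */
typedef enum xfer_status (*xfer_fn)(void *ctx);

/* Counters so that the (hopefully rare) faults stay visible. */
struct retry_stats {
    unsigned attempts;
    unsigned faults;
};

/*
 * Run do_xfer up to max_attempts times and stop at the first success.
 * Returns true on success, false if every attempt faulted.
 */
bool transfer_with_retry(xfer_fn do_xfer, void *ctx,
                         unsigned max_attempts, struct retry_stats *stats)
{
    assert(do_xfer != NULL && max_attempts > 0);

    for (unsigned i = 0; i < max_attempts; i++) {
        if (stats)
            stats->attempts++;
        if (do_xfer(ctx) == XFER_OK)
            return true;
        if (stats)
            stats->faults++;   /* faulted: loop around and redo it */
    }
    return false;
}

/* Stand-in transfer for illustration: faults on its first
 * fail_first calls, then succeeds. */
struct flaky {
    unsigned calls;
    unsigned fail_first;
};

enum xfer_status flaky_xfer(void *ctx)
{
    struct flaky *f = ctx;
    return (f->calls++ < f->fail_first) ? XFER_FAULT : XFER_OK;
}
```

Keeping the counters matters: a bounded retry hides the fault from the user, but a fault rate that starts climbing is still worth knowing about.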

Dimiter

------------------------------------------------------
Dimiter Popoff
Transgalactic Instruments
------------------------------------------------------
