Interpreting SRR1 and OOPS

I am getting the OOPS message that follows and have been having a very difficult time determining what is causing it. According to "PowerPC Microprocessor Family: The Programming Environments for 32-Bit Microprocessors", "When an exception occurs, bits 1-4 and 10-15 of SRR1 are loaded with exception specific information."

SRR1 is 00089032, so bits 1-4 are 0000 and bits 10-15 are 001000. Unfortunately, I cannot find anywhere what the "exception specific information" contained in these bits is.
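The bit arithmetic above is easy to get wrong, because PowerPC manuals number bits from the most significant end (bit 0 is the MSB of the 32-bit register). A minimal Python sketch of that extraction follows; the helper name `ppc_bits` is made up for illustration and does not come from any manual or kernel source.

```python
def ppc_bits(value, first, last, width=32):
    """Return bits first..last (inclusive) of value, counting bit 0 as the MSB."""
    count = last - first + 1
    shift = width - 1 - last          # distance of the field from the LSB
    return (value >> shift) & ((1 << count) - 1)

srr1 = 0x00089032                     # value taken from the oops message
print(format(ppc_bits(srr1, 1, 4), "04b"))    # bits 1-4
print(format(ppc_bits(srr1, 10, 15), "06b"))  # bits 10-15
print(ppc_bits(srr1, 12, 12))                 # bit 12 on its own
```

With this numbering, the only exception-specific bit set in the value is bit 12.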

Any information on this exception or interpreting an OOPS message in general on PPC would be greatly appreciated.

Eclipse # Machine check in kernel mode.
Caused by SRR0=0xC0005D28
Caused by (from SRR1=89032): Machine check signal
Oops: machine check, sig: 7
NIP: C3095218 XER: 00000000 LR: C30951BC SP: C015E240 REGS: c015e190 TRAP: 0200 Not tainted
MSR: 00089032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c015c470[0] 'swapper' Last syscall: 120
last math c1db4000 last altivec 00000000
GPR00: 00000000 C015E240 C015C470 C32E6EB8 00001032 000000C6 0000008C 00000000
GPR08: C3110000 C36EF000 C310FA94 C0269600 00000175 1010E944 01FFD000 00000001
GPR16: FFFFFFFF 00000000 00000000 01FF7A0C 00001032 00000002 00000002 C3110000
GPR24: 00000001 C01B0000 C0140000 C0140000 00000002 00000002 00000000 00010000
Call backtrace:
C30951BC C30A81BC C001D25C C001D008 C0006D0C C0005B20 C00071D0 C00071EC C0003948 C01705D8 000035F0
Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
Bill

Please specify which PowerPC processor is involved. For instance, if it is an MPC603e (or G2), then SRR1 bit 12 indicates "Machine check signal caused exception" for vector 0x200, which is the exception in your case.

David Gabbay DoGav Systems

dg

See the chapter on exception processing, chapter 6.

Machine check exception is described in 6.4.2 in my copy:

SRR1 Bit 30 is loaded from MSR[RI] if the processor is in a recoverable state. Otherwise cleared. The setting of all other SRR1 bits is implementation-dependent.

So you may need to look at the user manual of your CPU.

Rob

Rob Windgassen

I looked at section 6.4.2 but did not find it very helpful. My register settings do not match those listed. I have:

POW 0 FP 0 BE 0 DR 1 ILE 0 ME 1 FE1 0 RI 1 EE 1 FE0 0 IP 0 LE 0 PR 0 SE 0 IR 1
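As a cross-check, the flags listed above can be recomputed from the value 00089032 printed in the oops. The sketch below assumes the 32-bit MSR layout from the Programming Environments manual (bit 0 is the MSB); the names `MSR_BITS` and `decode_msr` are illustrative only, and individual processors may define additional bits.

```python
# Architected 32-bit MSR bit positions (PowerPC numbering, bit 0 = MSB).
MSR_BITS = {
    "POW": 13, "ILE": 15, "EE": 16, "PR": 17, "FP": 18, "ME": 19,
    "FE0": 20, "SE": 21, "BE": 22, "FE1": 23, "IP": 25, "IR": 26,
    "DR": 27, "RI": 30, "LE": 31,
}

def decode_msr(msr):
    """Map each named MSR flag to 0 or 1 for the given 32-bit value."""
    return {name: (msr >> (31 - bit)) & 1 for name, bit in MSR_BITS.items()}

fields = decode_msr(0x00089032)       # value printed as "MSR:" in the oops
print(" ".join(f"{name} {value}" for name, value in fields.items()))
```

The decoded flags agree with the list above. The one set bit the table does not name is bit 12, which is the SRR1 bit mentioned earlier in the thread as "Machine check signal caused exception".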

Bill

0-11 Cleared
12 core_mcp - Machine check signal caused exception

Check the SIU's register TESCR1 (offset 0x10040) for the specific cause.

David Gabbay DoGav Systems

dg

Should I add printing the value of this register to the OOPS message? Is there a better way to read that register before a crash?

Bill

Reading TESCR1 revealed a PCI machine check. Reading the ESR then showed a PCI read data parity error, which had gone undetected because the parity error response bit in the PCI Bus Command Register was set to 0. Once this bit was set to 1, the presence of the parity error was confirmed.
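For readers who want to check the same setting, here is a small Python sketch of the bit manipulation involved. It assumes the register follows the standard PCI configuration-space Command register layout, where Parity Error Response is bit 6; PCI numbers bits from the least significant end, the opposite of the PowerPC convention used for SRR1. The function names and the sample value 0x0006 are hypothetical, so check the bridge's manual for the actual register offset and access method.

```python
# Standard PCI Command register: bit 6 is Parity Error Response (LSB = bit 0).
PCI_COMMAND_PARITY_ERROR_RESPONSE = 1 << 6

def parity_response_enabled(command):
    """True if the Parity Error Response bit is set in a Command register value."""
    return bool(command & PCI_COMMAND_PARITY_ERROR_RESPONSE)

def enable_parity_response(command):
    """Return the Command register value with Parity Error Response set."""
    return command | PCI_COMMAND_PARITY_ERROR_RESPONSE

command = 0x0006                      # hypothetical value read from the register
print(parity_response_enabled(command))
print(hex(enable_parity_response(command)))
```

The sketch only computes the new register value; writing it back to the hardware depends on the platform.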

Thank you very much. Now we know what is causing the oops and can go about fixing it.

Bill

This is totally meaningless. Google is not usenet - it is only a poor imitation of an interface to the system. Read the links in my sig. below.

--
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article.  Click on 
CBFalconer
