Understanding PPC405 execution.

Hi all.

I'm trying to figure out how long a piece of code like the one below takes to execute. I'm simulating a PPC405 processor (FPGA Virtex-4 FX12) in ModelSim.

ffffc1d0 :
    ...
    ffffc1fc: 7c 00 01 24    mtmsr   r0
    ffffc200: 80 1f 00 10    lwz     r0,16(r31)
    ffffc204: 7c 03 03 78    mr      r3,r0
    ffffc208: 48 00 01 89    bl      ffffc390
    ffffc20c: 7c 60 1b 78    mr      r0,r3
    ffffc210: 90 1f 00 20    stw     r0,32(r31)
    ffffc214: 38 00 00 00    li      r0,0
    ...

ffffc390 :
    ffffc390: 54 64 c4 2e    rlwinm  r4,r3,24,16,23
    ffffc394: 54 65 c0 0e    rlwinm  r5,r3,24,0,7
    ffffc398: 50 65 42 1e    rlwimi  r5,r3,8,8,15
    ffffc39c: 50 64 46 3e    rlwimi  r4,r3,8,24,31
    ffffc3a0: 7c a3 2b 78    mr      r3,r5
    ffffc3a4: 50 83 04 3e    rlwimi  r3,r4,0,16,31
    ffffc3a8: 4e 80 00 20    blr

As you can see, the main() function calls swap() when pc = ffffc208.
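For reference, here is a minimal Python sketch of what the swap() listing above appears to compute. It is not taken from the original build: the helper names (rotl32, mask, rlwinm, rlwimi, swap) are made up for illustration, and the rotate-and-mask behaviour follows the usual PowerPC definition, where bit 0 is the most significant bit. Under those assumptions the seven instructions amount to a 32-bit byte swap of r3.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32

def mask(mb, me):
    """PowerPC-style mask: ones from bit mb to bit me, bit 0 being the MSB."""
    hi = MASK32 >> mb                      # ones from bit mb down to bit 31
    lo = (MASK32 << (31 - me)) & MASK32    # ones from bit 0 down to bit me
    return hi & lo if mb <= me else hi | lo

def rlwinm(rs, sh, mb, me):
    """Rotate left word immediate then AND with mask."""
    return rotl32(rs, sh) & mask(mb, me)

def rlwimi(ra, rs, sh, mb, me):
    """Rotate left word immediate then insert under mask into ra."""
    m = mask(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m & MASK32)

def swap(r3):
    """Mirror of the instruction sequence at ffffc390..ffffc3a8."""
    r4 = rlwinm(r3, 24, 16, 23)
    r5 = rlwinm(r3, 24, 0, 7)
    r5 = rlwimi(r5, r3, 8, 8, 15)
    r4 = rlwimi(r4, r3, 8, 24, 31)
    r3 = r5                                # mr r3,r5
    r3 = rlwimi(r3, r4, 0, 16, 31)
    return r3

if __name__ == "__main__":
    print(hex(swap(0xAABBCCDD)))
```

Comparing intermediate values of r4 and r5 from such a model against exeResult in the waveform is one way to confirm which instruction each execute-stage sample belongs to.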

In ModelSim I am watching these signals:

  • exeAddr = Instruction address in the exe stage (program counter execution)
  • exeFull = There is a valid instruction in the exe stage (execution in exec stage)
  • dcdAddr = Address of instruction at decode stage (program counter decode phase)
  • dcdData = Instruction at decode stage (opcode in decode phase)
  • exeAReg = Operand A
  • exeBReg = Operand B
  • exeResult = Result of the operation

I turned off all optimizations in ModelSim. I'm looking at how many clock cycles each instruction needs to execute. Normally each rlw* instruction should take around 2 clock cycles (according to the documentation), and it does, with one exception.

It takes 13 clock cycles to get from exeAddr = ffffc39c to ffffc3a0. Here is briefly what happens in ModelSim when entering that address:

  cycle 1: exeAddr = ffffc39c | exeFull = 1 | dcdAddr = ffffc3a0 | dcdData = 801f0010
  cycle 2: changes: exeFull = 0 | exe{Result, AReg, BReg}
  cycle 3: changes in gpr4
  cycles 4..11: no changes
  cycle 12: changes in dcdData = 7ca32b78
  cycle 13: exeAddr = ffffc3a4 ... - next instruction.
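To make this kind of counting less error-prone, the cycles spent at each execute-stage address can be tallied from one sample per clock. The sketch below is only an illustration: the function name residency and the sample list are hypothetical, and the samples are typed in by hand from the description above (assuming exeAddr stays at ffffc39c through cycle 12), not exported from ModelSim.

```python
from itertools import groupby

def residency(samples):
    """Summarise how long each address sat in the execute stage.

    samples: one (exe_addr, exe_full) tuple per clock cycle, in order.
    Returns a list of (exe_addr, total_cycles, cycles_with_exe_full).
    """
    out = []
    for addr, run in groupby(samples, key=lambda s: s[0]):
        run = list(run)
        total = len(run)
        busy = sum(1 for _, full in run if full)
        out.append((addr, total, busy))
    return out

# Hand-written samples for cycles 1..13 as described in the post (assumption:
# exeAddr does not change while exeFull is 0).
samples = [(0xFFFFC39C, 1)] + [(0xFFFFC39C, 0)] * 11 + [(0xFFFFC3A4, 1)]

if __name__ == "__main__":
    for addr, total, busy in residency(samples):
        print(f"{addr:08x}: {total} cycles in exe, exeFull high for {busy}")
```

Separating total cycles from cycles with exeFull = 1 shows at a glance whether an instruction is really executing or whether the execute stage is just holding a stale address while it waits.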

My interpretation of that is:

  cycle 1: the instruction starts executing
  cycle 2: it leaves the execution stage, and we have the results
  cycle 3: the results are stored to gpr4
  cycles 4-12: dcdData shows opcode 801f0010 in the decode stage (lwz r0,16(r31), from address ffffc200)

Are these 8 cycles used to execute that instruction in the background? r0 is used by mr r3,r0, and swap() uses r3 a lot, so why is the processor doing that now rather than earlier?

Can someone help me interpret this debug information, or point me to some materials where I can figure out what is going on? How long does that instruction take to execute? Could there be a problem with the PPC405 SmartModel? I have other questions about this debug output that confuse me quite a lot, but I will leave them for later discussion.

Thank you in advance, Mariusz.

-- mg.
