Hi all.
I'm trying to figure out how long the code below takes to execute. I'm simulating a PPC405 processor (Virtex-4 FX12 FPGA) in ModelSim.
ffffc1d0 :
...
ffffc1fc: 7c 00 01 24    mtmsr  r0
ffffc200: 80 1f 00 10    lwz    r0,16(r31)
ffffc204: 7c 03 03 78    mr     r3,r0
ffffc208: 48 00 01 89    bl     ffffc390
ffffc20c: 7c 60 1b 78    mr     r0,r3
ffffc210: 90 1f 00 20    stw    r0,32(r31)
ffffc214: 38 00 00 00    li     r0,0
...
ffffc390 :
ffffc390: 54 64 c4 2e    rlwinm r4,r3,24,16,23
ffffc394: 54 65 c0 0e    rlwinm r5,r3,24,0,7
ffffc398: 50 65 42 1e    rlwimi r5,r3,8,8,15
ffffc39c: 50 64 46 3e    rlwimi r4,r3,8,24,31
ffffc3a0: 7c a3 2b 78    mr     r3,r5
ffffc3a4: 50 83 04 3e    rlwimi r3,r4,0,16,31
ffffc3a8: 4e 80 00 20    blr
As you can see, in main() there is a function call (to swap) at pc = ffffc208.
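For anyone following along, here is a rough C model of what the swap routine above computes. It is only a sketch: the helper names (rotl32, ppc_mask, rlwinm, rlwimi, swap_model) are made up for illustration, the mask helper assumes MB <= ME (true for every instruction in the listing), and it models the register results only, not timing.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Mask with ones from bit mb through bit me, using PowerPC numbering
 * (bit 0 is the most significant bit). Assumes mb <= me, no wrap-around. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    return (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31u - me));
}

/* rlwinm rA,rS,SH,MB,ME: rotate left, then AND with the mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* rlwimi rA,rS,SH,MB,ME: rotate left, then insert under the mask into rA. */
static uint32_t rlwimi(uint32_t ra, uint32_t rs,
                       unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* Hypothetical model of the routine at ffffc390, one statement per instruction. */
uint32_t swap_model(uint32_t r3)
{
    uint32_t r4 = rlwinm(r3, 24, 16, 23);   /* ffffc390 */
    uint32_t r5 = rlwinm(r3, 24, 0, 7);     /* ffffc394 */
    r5 = rlwimi(r5, r3, 8, 8, 15);          /* ffffc398 */
    r4 = rlwimi(r4, r3, 8, 24, 31);         /* ffffc39c */
    r3 = r5;                                /* ffffc3a0: mr r3,r5 */
    r3 = rlwimi(r3, r4, 0, 16, 31);         /* ffffc3a4 */
    return r3;                              /* ffffc3a8: blr */
}
```

Calling swap_model(0x11223344) gives 0x44332211, i.e. the routine reverses the byte order of r3. The instruction at ffffc39c only touches r4, which matches the gpr4 change visible in the waveform.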
In ModelSim I am watching these signals:
- exeAddr = Instruction address in the exe stage (program counter execution)
- exeFull = There is a valid instruction in the exe stage (execution in exec stage)
- dcdAddr = Address of instruction at decode stage (program counter decode phase)
- dcdData = Instruction at decode stage (opcode in decode phase)
- exeAReg = Operand A
- exeBReg = Operand B
- exeResult = Result of the operation
I turned off all optimizations in ModelSim. I'm looking at how many clock cycles each instruction needs to execute. According to the documentation, each rlw* instruction should take around 2 clock cycles, and it does, with one exception.
It takes 13 clock cycles to get from exeAddr = ffffc39c to ffffc3a0. Briefly, this is what I see in ModelSim on entering that address:
cycle 1: exeAddr = ffffc39c | exeFull = 1 | dcdAddr = ffffc3a0 | dcdData = 801f0010
cycle 2: changes: exeFull = 0 | exe{Result, AReg, BReg}
cycle 3: changes in gpr4
cycles 4..11: no changes
cycle 12: changes in dcdData = 7ca32b78
cycle 13: exeAddr = ffffc3a4 ... - next instruction.
My interpretation of that is:
cycle 1 - the instruction starts executing
cycle 2 - it leaves the execution stage; we have the results
cycle 3 - the result is stored to gpr4
cycles 4-12 - looking at dcdData, the decode stage holds opcode 801f0010 (from address ffffc200)
Are these 8 cycles used to execute that instruction in the background? One can see that r0 is used by mr r3,r0 and that swap() uses r3 a lot, so why is the processor doing that now and not earlier?
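To double-check which instruction a dcdData value corresponds to, the word can be split into its fields by hand. Below is a minimal C sketch for D-form instructions only (the format used by lwz, stw and addi/li); the struct and function names are hypothetical, and X-form words such as 7ca32b78 (mr) would need a different field layout.

```c
#include <stdint.h>

/* Fields of a PowerPC D-form instruction (names chosen for illustration). */
struct dform {
    unsigned opcd;  /* primary opcode, the top 6 bits */
    unsigned rt;    /* target (or source) register number */
    unsigned ra;    /* base register number */
    int      d;     /* sign-extended 16-bit displacement or immediate */
};

/* Split a 32-bit instruction word into D-form fields. */
struct dform decode_dform(uint32_t insn)
{
    struct dform f;
    int d = (int)(insn & 0xFFFFu);

    if (d & 0x8000)          /* sign-extend the low 16 bits */
        d -= 0x10000;

    f.opcd = insn >> 26;
    f.rt   = (insn >> 21) & 0x1Fu;
    f.ra   = (insn >> 16) & 0x1Fu;
    f.d    = d;
    return f;
}
```

For 0x801f0010 this yields opcd = 32, rt = 0, ra = 31, d = 16, which agrees with the lwz r0,16(r31) shown at ffffc200 in the listing.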
Can someone help me interpret this debug information, or point me to some material where I can figure out what is going on? How long does that instruction actually take to execute? Could there be a problem with the PPC405 SmartModel? I have other questions about this debug output that confuse me quite a lot, but I will leave them for a later discussion.
Thank you in advance, Mariusz.
-- mg.