Hi,
I'm experimenting with an instruction set simulator for an Infineon DSP.
The simulator models the execution of code on a DSP that has an instruction cache (16 KByte, 2-way set associative,
256 bits per line, LRU replacement); instructions are either 16 or 32 bits wide. What I do is compile a program (given in assembly) in its original version and run it through the simulator. Next, I arbitrarily reorder the routines in the assembly code, compile the code again, and finally run it through the simulator once more.
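As a side note, here is how I read the cache geometry from those parameters (this is just my own sanity check, not taken from the simulator source):

```python
# Derive the cache geometry from the stated parameters:
# 16 KByte, 2-way set associative, 256-bit lines.
CACHE_SIZE = 16 * 1024   # bytes
LINE_SIZE = 256 // 8     # 256 bits = 32 bytes per line
WAYS = 2                 # 2-way set associative

lines = CACHE_SIZE // LINE_SIZE           # total cache lines
sets = lines // WAYS                      # sets in the cache
offset_bits = LINE_SIZE.bit_length() - 1  # bits selecting the byte within a line
index_bits = sets.bit_length() - 1        # bits selecting the set

def cache_set(addr):
    """Set index an instruction address maps to (assumed address mapping)."""
    return (addr >> offset_bits) & (sets - 1)

print(lines, sets, offset_bits, index_bits)  # 512 256 5 8
```

So routines whose addresses are a multiple of 8 KByte (one way) apart compete for the same sets, which is why reordering can change the conflict behavior.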
My goal is to see how the cache performance changes after the routines have been reordered. It is well known that functions that call each other frequently should be placed close to each other in memory to (possibly) improve cache behavior.
Each simulation produces statistics. For the original code I get:

Total number of executed instructions = 615545
Total number of cycles = 1304469
Instruction cache hit rate: 99.98%
Total instruction cache accesses: 615501
Total instruction cache hits: 615364
After arbitrary reordering of the routines, the statistics look as follows:

Total number of executed instructions = 615545
Total number of cycles = 727801
Instruction cache hit rate: 99.89%
Total instruction cache accesses: 39072
Total instruction cache hits: 39028
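For what it's worth, I checked that the reported hit rates are consistent with the raw counters:

```python
# Verify the reported hit rates against the raw access/hit counters.
orig_hits, orig_accesses = 615364, 615501
reord_hits, reord_accesses = 39028, 39072

orig_rate = round(100 * orig_hits / orig_accesses, 2)
reord_rate = round(100 * reord_hits / reord_accesses, 2)
print(orig_rate, reord_rate)  # 99.98 99.89
```

So both runs miss 44 times; only the number of accesses differs.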
So, the execution time of the reordered code dropped from 1304469 cycles to 727801 cycles. The reason, as can be seen above, is the number of total instruction cache accesses, which fell from 615501 to
39072 (here, each cache access has a latency of 1 cycle). How is this possible? Why, for the same number of executed instructions, did the number of cache accesses decrease so drastically?

Best regards,
Tim