Hi,
In recent days I have been reading an article describing how to optimise instruction cache behavior by modifying the placement of functions in memory. The article can be found under:
I'm a little confused about the effectiveness of the memory layout produced by the described algorithm. Unfortunately, there is no detailed reasoning about why the chains it finds (functions placed contiguously in memory) improve instruction cache performance.
For example, I don't see the advantage of the third chain, 3-8-12-18 (see figure 3). After node 8 executes, a large number of instructions from other nodes will be fetched into the cache before node 12 is executed. So prefetching would not help here (unless one assumes that all the functions under consideration are very small, which is not realistic). Can anyone shed some light on this?
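To make my confusion concrete, here is a toy direct-mapped I-cache model. The line size, cache size, and node sizes are my own made-up numbers, not taken from the article:

```python
LINE = 64      # bytes per cache line (assumed)
SETS = 512     # 32 KiB direct-mapped I-cache (assumed)

cache = {}     # set index -> tag of the line currently held

def touch(start, size):
    """Fetch every cache line of the byte range [start, start+size)."""
    for line in range(start // LINE, (start + size - 1) // LINE + 1):
        cache[line % SETS] = line

def lines_present(start, size):
    """Count how many lines of the range are still in the cache."""
    rng = range(start // LINE, (start + size - 1) // LINE + 1)
    return sum(cache.get(l % SETS) == l for l in rng)

# Chain 3-8-12-18 laid out contiguously; sizes are invented for illustration.
node8  = (0, 4096)        # node 8: 4 KiB of code
node12 = (4096, 512)      # node 12 placed immediately after: 512 B

touch(*node8)             # node 8 executes
touch(*node12)            # assume sequential prefetch even pulled in node 12
touch(1 << 20, 64 * 1024) # 64 KiB of other functions run before node 12 is called
print(lines_present(*node12))   # -> 0: the prefetched lines are gone
```

Even though node 12 sits right behind node 8 in memory, every line of it brought in alongside node 8 has been evicted by the code that runs in between. This is exactly why I don't see how the chain helps unless the functions are tiny.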
Thank you.
Regards, Tim