Good document on ARM cache policies w.r.t. timing?

Once I finish preparing the materials, I'm giving a short lunchtime presentation at work about estimating CPU loading and latency.

I'd like to add some discussion about the difficulty of doing a simple count-the-cycles analysis on multi-cached, pipelined RISC architectures, where things start to get nondeterministic. I'm particularly keen to describe how this might affect ARM7[xxx] and ARM9 designs, because a lot of teams here are starting to migrate 8051 and other 8-bit designs into ARM micros.

I don't mind groveling through the ARM ARM and working it out from first principles if I have to, but is there a reference that already discusses these issues? For instance, if you're running with the MMU in full swing, with L1 and L2 page tables in use, can you lock your ISR's table entries in the TLBs so the MMU doesn't have to touch RAM to look them up? How do you lock code into the cache? Is a cache line fill aligned on a hard memory boundary, or will it fill from an arbitrary starting address, based on where you just touched memory?

larwe

That may be entertaining, but it's not particularly useful.

  1. Take a CPU which is good enough so you don't have to account for every bit and every cycle.
  2. Find the average load.
  3. Assume the peak load is 3 to 4 times higher than the average.

That's it.

> I'm particularly keen to describe how this might affect ARM7[xxx] and ARM9 designs

If it fit in an 8051, you don't have to worry about whether it fits in the ARM.

Don't do that. The whole point of caches and TLBs is that you don't have to worry about how code and data are accessed. If you do have to, go find a faster CPU.

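> Is a cache line fill aligned on a hard memory boundary, or will it fill from an arbitrary starting address, based on where you just touched memory?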

Cache lines are aligned on a boundary equal to their size.

Vladimir Vassilevsky

DSP and Mixed Signal Design Consultant



Latency is usually more important than overall loading. The presentation is really about working on the small 8-bitters, but I want to illustrate in as much detail as possible how the simple analysis methods break down on complex micros.

Expansion and numerous complicated options are the reason they are migrating to ARM :)

Not a good answer if you also have to guarantee latency with boundaries on both sides of the window.

larwe

Somewhere I have a document that goes into this in some detail (done at Stanford, IIRC), and a doc from Broadcom that looks at it for the parts I used (MIPS-core-based), because it impacted the internal bus significantly.

Even though it's MIPS-based, one would think the basic issues are the same.

I'll dig around and see if I can find them.

Cheers

PeteS


MIPS is fine. I am mostly illustrating problems, definitely not going into fine bit-level "this is how you fix it" solutions.

If you come across this document by Wednesday or so, please let me know. (I'm planning the presentation for sometime in the latter half of next week.)

Thanks!

larwe
