XScale: Is Separate I and D Cache Appropriate for PC-Relative Variables?

Aren't separate instruction and data caches inappropriate for the ARM architecture?

The ARM compilers make good use of PC-relative pools for local variables, but doesn't this then mean that the same physical locations are going to end up in both caches, effectively wasting half of the available cache memory? In other words, the same code and data will be in both the I and the D cache.

Surely a simple 'memory' cache is all that's required with the option to lock down a region or two; why the distinction between code and data when both will always be in the same physical area?

Is there a good reason for forcing this apparently grossly inefficient design choice?

Tim

Reply to
Tim Clacy

"Tim Clacy" schrieb im Newsbeitrag news:3f586dc1$0$267$ snipped-for-privacy@news.dk.uu.net...

How do you arrive at that assumption? As I observe it, GCC does some PC-relative addressing with the ldr instruction mainly to load "constants", which the compiler puts into the .text segment (and which therefore go into ROM).
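
Roughly, for a 32-bit constant that won't fit in an immediate field, the compiler emits something like this (a hand-written sketch rather than actual GCC output; the value and the labels are made up):

        .text
    get_magic:
        ldr   r0, .Lpool        @ PC-relative load from the literal pool
        bx    lr
    .Lpool:
        .word 0x12345678        @ the "constant" sits in .text, right next to the code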

The ARM9/XScale is a Harvard architecture; it gets its performance from reading instructions and reading/writing data simultaneously. Since the I side is mainly read-only instructions in ROM/flash or a distinct RAM area, and the D side is mainly .data/.bss/heap in RAM, there is little "wasting".

Are you into self-modifying code for AI?

I feel like the amount of overlap can usually be ignored.

Raymund Hofmann

Reply to
raymund hofmann

I should have written 'assembler' instead of 'compiler', not that it really matters, since the compilers I use can also place data in PC-relative pools. Besides, it doesn't have to be just constants; take the interrupt vector table, for example.
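
The vector table is a fair example of that mixture; as a generic sketch (handler names are placeholders, not taken from any particular project), the target addresses sit as data words right beside the instructions that load them:

        .section .vectors, "ax"
    vectors:
        ldr   pc, reset_addr        @ 0x00: reset
        ldr   pc, undef_addr        @ 0x04: undefined instruction
        ldr   pc, swi_addr          @ 0x08: software interrupt
        @ ... prefetch/data abort, IRQ and FIQ entries omitted ...

    reset_addr: .word Reset_Handler @ these words are data, fetched
    undef_addr: .word Undef_Handler @ PC-relative by the instructions
    swi_addr:   .word SWI_Handler   @ just above, so the same cache lines
                                    @ carry both code and data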

That wasn't the reason I posted but, as it happens, yes... kind of... but not for AI. I'm into binary vector switches; that is, a branch that can go one of two ways and flips the junction over so that it goes the other way next time; this is one side of an event latch. In other words, the most efficient context switch possible, using only one instruction (OK, two, because of these cursed split load-store beasts). But that's by the by... :-)
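
Just for illustration, the idea might look something like this (purely a sketch, one possible way to realise it; the labels and the two target routines are invented, and it assumes the code runs from RAM):

    dispatch:
        ldr   pc, switch_word       @ the one-instruction branch through the latch
    switch_word:
        .word path_a                @ currently dispatches to path_a

    flip:                           @ make the next dispatch take path_b instead
        ldr   r0, =path_b
        ldr   r1, =switch_word
        str   r0, [r1]              @ store the new target
        bx    lr
        .ltorg

    @ switch_word sits in the middle of the code, so its cache line can end up
    @ in the I-cache (instruction fetches around 'dispatch') and in the D-cache
    @ (the ldr/str accesses) -- the duplication the original post is about.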

Cheers for chipping in.

Reply to
Tim Clacy

...oh, I forgot the original point :-) The reason for the post was that, in this particular case, it would have been more useful to have a non-split cache, since fly-by DMA doesn't work in the PXA processors and we have tens of megabytes to shift around. It's not the end of the world, just an uncomfortable straitjacket.

Reply to
Tim Clacy

You would probably feel far more straitjacketed by how much your performance would be limited by not being able to fetch code and data in parallel when executing instructions on the CPU. It would simplify matters if the SA architecture (and, I am assuming, the XScale architecture as well) had an "icbi"-type instruction, but having to execute a series of nops to flush the instruction cache isn't much overhead relative to moving tens of megabytes, certainly.
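
(For reference, on StrongARM/XScale that maintenance goes through CP15. A minimal sketch, assuming privileged mode and that any freshly written code has already had its D-cache lines cleaned:)

    sync_icache:
        mov   r0, #0
        mcr   p15, 0, r0, c7, c10, 4   @ drain the write buffer
        mcr   p15, 0, r0, c7, c5, 0    @ invalidate the entire I-cache
        nop                            @ a few nops (or a branch) let any
        nop                            @ already-prefetched instructions drain
        nop                            @ before the new code is relied upon
        bx    lr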

--
Skipper Smith                         Helpful Knowledge Consulting
Worldwide         Microprocessor       Architecture       Training
PowerPC, ColdFire, 68K, CPU32                Hardware and Software

Reply to
Skipper Smith

What if you're not executing out of RAM? Also, how 'inefficient' it is will depend on program design: if your locals can all be kept in registers, then having a cache located around the PC isn't going to help. Maybe a chip designer will chime in too, because I have a feeling there may be architectural reasons for keeping the two caches separate.
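
To put it concretely, a made-up leaf routine like this never touches a literal pool at all; its "locals" are just registers:

    sum16:                          @ r0 = pointer to 16 words; sum returned in r0
        mov   r2, #16               @ loop counter lives in a register
        mov   r1, #0                @ running total lives in a register
    1:  ldr   r3, [r0], #4          @ only the caller's buffer goes through the D-cache
        add   r1, r1, r3
        subs  r2, r2, #1
        bne   1b
        mov   r0, r1
        bx    lr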

Reply to
Dingo

Dingo, hi. I'm aware of the benefits of split caches in traditional, sequential, single-threaded code where code and local data are physically distant, but the post was specifically about PC-relative data.

Reply to
Tim
