Greetings,
this question is for ARM experts; in particular, it is about the ARM926 core (which is used in TI's DM6467 DaVinci processor).
I want to use the cache to speed up processing of a video buffer that holds one YCbCr 4:2:0 1080p frame (1920x1088x1.5 bytes). Normally the buffer is not cached, since it is shared between ARM code, the C64 DSP core and an additional PCI master. The data flow is as follows: an external PCI master fills in a raw uncompressed frame -> we apply several processing steps (layout building, background, some graphics and OSD text blending) -> the whole resulting frame is passed to the DSP for compression.
The ARM core runs MontaVista Linux 4.0.1 with kernel 2.6.18 (MV-patched, from the MontaVista 5.0 distribution).
I'd like to enable caching on the ARM for this buffer, process it in chunks of 4K (the D-cache on the DM6467's ARM core is 8K, 4-way set-associative, so I want to leave at least two ways for caching other program data and the stack), and then call a kernel module that will write back and invalidate each 4K chunk. That way, once the last chunk has been written back and invalidated, the whole buffer will be consistent in external RAM and ready for DSP processing (obviously, before starting such processing, the whole D-cache will have to be invalidated without write-back).
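To make the chunking concrete, here is a minimal sketch in C of the loop I have in mind. The names `frame_bytes`, `process_frame_in_chunks` and the `chunk_fn` hook type are made up for illustration; the `wbinv` hook stands in for the call into the (not yet written) kernel module, and passing NULL for a hook just skips that step.

```c
#include <assert.h>
#include <stddef.h>

enum {
    FRAME_WIDTH  = 1920,
    FRAME_HEIGHT = 1088,
    CHUNK_BYTES  = 4096,  /* half of the 8K D-cache, i.e. two of its four ways */
    LINE_BYTES   = 32     /* ARM926 D-cache line size */
};

/* Hypothetical hook type: receives the chunk's byte offset and length. */
typedef void (*chunk_fn)(size_t offset, size_t len, void *ctx);

/* Size of one YCbCr 4:2:0 frame: full-size luma plus half-size chroma. */
static size_t frame_bytes(void)
{
    return (size_t)FRAME_WIDTH * FRAME_HEIGHT * 3 / 2;
}

/*
 * Walk a buffer of `len` bytes in CHUNK_BYTES steps. For each chunk, run the
 * processing hook first and the write-back-and-invalidate hook second, so a
 * chunk is pushed out to RAM before the next one is touched.
 * Returns the number of chunks visited.
 */
static size_t process_frame_in_chunks(size_t len, chunk_fn process,
                                      chunk_fn wbinv, void *ctx)
{
    size_t chunks = 0;

    /* A chunk must cover whole cache lines for per-line maintenance. */
    assert(CHUNK_BYTES % LINE_BYTES == 0);

    for (size_t off = 0; off < len; off += CHUNK_BYTES) {
        size_t n = (len - off < CHUNK_BYTES) ? len - off : CHUNK_BYTES;

        if (process)
            process(off, n, ctx);  /* blending etc. on cached data */
        if (wbinv)
            wbinv(off, n, ctx);    /* stand-in for the kernel module call */
        chunks++;
    }
    return chunks;
}
```

For the frame size above this works out to 3,133,440 bytes, i.e. exactly 765 chunks of 4K, each covering 128 cache lines.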
Sounds good, but according to the ARM926 TRM I see problems with doing so (or maybe I just misunderstand).
The ARM caches data in 32-byte lines tagged with the Modified Virtual Address (MVA). The MVA is formed by placing the FCSE PID field from CP15 register c13 into the upper bits of the program's virtual address if that address is below 32M; if the address is at or above 32M, no substitution takes place and VA = MVA (that's what happens in kernel mode). User-mode programs are mapped into the lower 32M of VA space and hence use the FCSE PID. I tried to use user-mode pointers in kernel mode and got inconsistent data in user-mode buffers; apparently, the kernel changes the FCSE PID on system call entry or just disables it.
Now the TRM says: "FCSE translation is not applied for addresses used for entry based cache or TLB maintenance operations. For these operations VA = MVA." That is, I can use the VA-based cache maintenance CP15 instructions in kernel mode without caring about the FCSE PID.
Now, if that is true, suppose that data from 3 different processes is currently cached for the same VA, differing only in PID, and we invalidate the cache entry for that VA via CP15 register c7, for which "translation is not applied". Which of the 3 entries will get invalidated? All of them?
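To pin down the MVA arithmetic described above, here is a minimal sketch in C of how I understand the TRM. The helper names `fcse_mva` and `line_base` are mine, and `pid` is assumed to be the 7-bit process ID value (the field that sits in the top bits of the c13 FCSE PID register), not the raw register contents.

```c
#include <assert.h>
#include <stdint.h>

#define FCSE_WINDOW    0x02000000u  /* 32M: only VAs below this are relocated */
#define FCSE_PID_SHIFT 25           /* the PID lands in address bits [31:25] */

/*
 * Form the Modified Virtual Address as I read the TRM: a VA below 32M gets
 * the 7-bit FCSE PID placed in its top bits, any other VA passes unchanged.
 */
static uint32_t fcse_mva(uint32_t va, uint32_t pid)
{
    assert(pid < 128u);  /* assumption: pid is the 7-bit field value */

    if (va < FCSE_WINDOW)
        return va | (pid << FCSE_PID_SHIFT);
    return va;
}

/* Base address of the 32-byte cache line that contains addr. */
static uint32_t line_base(uint32_t addr)
{
    return addr & ~31u;
}
```

Under these assumptions, user VA 0x00001000 seen from PIDs 1, 2 and 3 gives MVAs 0x02001000, 0x04001000 and 0x06001000, so the three cached lines carry three different tags, while a kernel address such as 0xC0001000 is the same for every PID.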
Thanks, Daniel