Finding root cause of performance loss

- Noob

Wed, Jun 25, 2014 2:01 PM

[ NB: cross-posted to comp.arch.embedded and comp.arch ]

Hello everyone,

I'm currently working on a "typical" set-top box project (digital TV).

The system can be considered a "heterogeneous computing" system, with various "processing elements" for different tasks:

- an SH4 (ST40) "system" CPU, where the app runs on top of a mini OS

- a micro-controller for watchdog and low-power/stand-by functions

- a co-processor for audio decoding

- another co-processor and/or ASIC for video decoding (the media decoders are not well documented)

- a few DMA engines

- a blitter gizmo for UI whiz-bang

- a crypto co-processor

- stuff I don't even know about

All of these access a shared resource: RAM (through a shared bus??)

The ODM provides minimal profiling tools (instruction pointer sampling, and a post-processing script to parse the symbol table, matching each sampled IP with the corresponding function).
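For readers unfamiliar with this kind of tool, here is a minimal sketch of what such a post-processing step might look like. It is not the ODM's script: the `nm`-style symbol listing, the addresses, the function names, and the sample values are all made up for illustration.

```python
import bisect
from collections import Counter

def parse_symbols(nm_text):
    """Return (starts, names) for code symbols, sorted by start address."""
    pairs = []
    for line in nm_text.splitlines():
        parts = line.split()
        # Keep only text-section symbols: "<hex address> <T|t> <name>".
        if len(parts) == 3 and parts[1] in ("T", "t"):
            pairs.append((int(parts[0], 16), parts[2]))
    pairs.sort()
    return [addr for addr, _ in pairs], [name for _, name in pairs]

def resolve(starts, names, ip):
    """Map one sampled IP to the last symbol starting at or below it."""
    i = bisect.bisect_right(starts, ip) - 1
    return names[i] if i >= 0 else "<unknown>"

def histogram(starts, names, samples):
    """Count samples per function; each share approximates CPU time spent."""
    return Counter(resolve(starts, names, ip) for ip in samples)

# Hypothetical symbol table and IP samples, for illustration only.
NM_TEXT = """\
88001000 T video_sync
88001400 T audio_feed
88001a00 t memcpy_fast
"""
SAMPLES = [0x88001010, 0x88001404, 0x880013FC, 0x88001A20, 0x88000FF0]

starts, names = parse_symbols(NM_TEXT)
profile = histogram(starts, names, SAMPLES)
for name, count in profile.most_common():
    print(f"{name:12s} {100.0 * count / len(SAMPLES):5.1f}%")
```

Because this sketch ignores symbol sizes, any IP above the last known symbol is attributed to that symbol; a real tool would probably bound each function by its size.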

Problem is, these tools only profile the "system" CPU. The rest of the system is a giant black-box to me.

Profiling shows that merely decoding one HD channel (audio and video) pegs the system CPU to 50%, which is unexpected, because all the heavy lifting is done elsewhere.

If I disable the audio, the load drops to 25%... even though audio tasks were far from taking 25%. When audio is disabled, the system CPU spends less time in ALL other parts of the software.

This would seem to incriminate some kind of bus contention for a shared resource, and I'm thinking main memory.

Drop audio decoding => bus contention drops => everything runs smoother.
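The "everything runs smoother" observation can be made quantitative by comparing the two profiles function by function. The sketch below assumes each profile is a mapping from function name to CPU share in percent; the function names and numbers are hypothetical, not measurements from this system.

```python
def slowdown_by_function(with_audio, without_audio):
    """Return {function: CPU share with audio / CPU share without audio}."""
    ratios = {}
    for name, share_off in without_audio.items():
        share_on = with_audio.get(name)
        # Skip functions that only appear in one of the two profiles.
        if share_on is not None and share_off > 0:
            ratios[name] = share_on / share_off
    return ratios

# Hypothetical CPU shares (percent of all samples), NOT measured data.
WITH_AUDIO = {"video_sync": 12.0, "demux": 9.0, "ui_task": 6.0, "audio_feed": 5.0}
WITHOUT_AUDIO = {"video_sync": 8.0, "demux": 6.0, "ui_task": 4.0}

ratios = slowdown_by_function(WITH_AUDIO, WITHOUT_AUDIO)
spread = max(ratios.values()) - min(ratios.values())

for name, ratio in sorted(ratios.items()):
    print(f"{name:12s} x{ratio:.2f}")
print(f"spread between functions: {spread:.2f}")
```

A similar ratio across unrelated functions would be consistent with a global effect such as memory contention, but it does not prove it: anything else that slows all code uniformly would produce the same picture.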

Does this theory make sense/hold water?

More importantly, how would I validate/invalidate it?

For this to be a credible explanation, the CPU must simply stall when it needs to access RAM while the bus is held by another entity, instead of switching to a different task.

I'm thinking maybe I can use the perfcounters to highlight the CPU twiddling its thumbs while waiting for RAM access?


Hmmm, there is a "ram" event, but it's only a counting event, so no cigar. Perhaps using cache misses? pfi and pfo (Pipeline Freeze due to cache miss Instruction/Operand)

But the problem is not really SHARING the memory, but merely accessing it. In the limit, each processor could have its own little private part of RAM, but only one processor can access RAM at a time. That would still increase the latency of the pipeline freeze on a miss. (Sorry for thinking out loud, I'm really in the dark here.)
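If the counters can report both the cycles spent frozen on cache misses and the number of misses, the idea above boils down to comparing the average freeze per miss with audio on and off. The sketch below only does that arithmetic. It assumes the freeze event counts cycles and that a miss count is available, neither of which is confirmed here, and all readings are invented.

```python
from dataclasses import dataclass

@dataclass
class CounterReading:
    """One measurement window; field meanings are assumptions, see above."""
    freeze_cycles: int  # cycles the pipeline was frozen on cache misses
    miss_count: int     # number of cache misses in the window
    total_cycles: int   # length of the window in CPU cycles

    def cycles_per_miss(self):
        """Average miss penalty: rises if RAM accesses have to queue."""
        return self.freeze_cycles / self.miss_count

    def stall_fraction(self):
        """Share of the window spent waiting on memory."""
        return self.freeze_cycles / self.total_cycles

# Hypothetical readings, NOT measured on the real hardware.
audio_on = CounterReading(freeze_cycles=6_000_000, miss_count=100_000,
                          total_cycles=40_000_000)
audio_off = CounterReading(freeze_cycles=4_000_000, miss_count=100_000,
                           total_cycles=40_000_000)

penalty_ratio = audio_on.cycles_per_miss() / audio_off.cycles_per_miss()
print(f"cycles/miss, audio on : {audio_on.cycles_per_miss():.1f}")
print(f"cycles/miss, audio off: {audio_off.cycles_per_miss():.1f}")
print(f"miss penalty ratio    : {penalty_ratio:.2f}")
```

A miss penalty that grows when audio decoding runs, with a roughly unchanged miss count, would point at contention on the path to RAM. A penalty that stays flat would argue against it.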

Anyway, I'm open to suggestions / advice / warnings / etc.

Regards.

- Theo Markettos

Wed, Jun 25, 2014 10:16 PM

What's the cache architecture? Is there only one cache, shared between all these processors doing different things, or does each processor have its own cache(s)?

Theo

- Noob

Thu, Jun 26, 2014 8:31 AM

After /much/ digging around, it turns out that some moron on the team decided to disable compiler optimizations for the support libraries.

Changing optimization level back to -Os drops the system CPU load to 27%.

Disabling audio on the optimized build drops the CPU load to 14.5%. The phenomenon I described still occurs (the system spends less time in video-related tasks) but it is much less of a factor than I had originally assumed.
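To put the four load figures side by side, here is a small sketch of the arithmetic. It uses only the numbers quoted in this thread (50% and 25% from the earlier measurements, 27% and 14.5% on the -Os build); the helper name is made up.

```python
def relative_drop(before, after):
    """Fraction by which the load decreases going from `before` to `after`."""
    return (before - after) / before

# System CPU load in percent, as reported in this thread.
LOAD = {
    ("unoptimized", "audio_on"): 50.0,
    ("unoptimized", "audio_off"): 25.0,
    ("optimized", "audio_on"): 27.0,
    ("optimized", "audio_off"): 14.5,
}

# Effect of re-enabling -Os, with and without audio.
opt_gain_audio_on = relative_drop(LOAD["unoptimized", "audio_on"],
                                  LOAD["optimized", "audio_on"])
opt_gain_audio_off = relative_drop(LOAD["unoptimized", "audio_off"],
                                   LOAD["optimized", "audio_off"])

# Absolute cost of audio (percentage points of CPU) in each build.
audio_cost_unopt = LOAD["unoptimized", "audio_on"] - LOAD["unoptimized", "audio_off"]
audio_cost_opt = LOAD["optimized", "audio_on"] - LOAD["optimized", "audio_off"]

print(f"-Os gain, audio on : {opt_gain_audio_on:.0%}")
print(f"-Os gain, audio off: {opt_gain_audio_off:.0%}")
print(f"audio cost: {audio_cost_unopt:.1f} points unoptimized, "
      f"{audio_cost_opt:.1f} points optimized")
```

In absolute terms the cost of enabling audio halves, from 25 to 12.5 percentage points, once the libraries are built with optimization.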

Also someone suggested that disabling audio means no longer needing to perform audio/video synchronization, which lowers system load even further.

Sorry for the noise.

- Noob

Thu, Jun 26, 2014 8:38 AM

The system CPU has its own cache hierarchy (L1+L2). I don't know much about the other processors on the SoC. Anyway, data shared by several processors are placed in non-cached memory.

Also, cf. my other message: the main problem came from someone disabling optimizations for the build :-(

Regards.

- Mel Wilson

Thu, Jun 26, 2014 12:52 PM

Good example of problem-solving at least.

Mel.

- Tim Wescott

Thu, Jun 26, 2014 3:52 PM

Hey, sometimes you just have to publicly admit that you're stumped before the answer jumps out at you.

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com