How I tripped over sched_rt_runtime_us

Do you have a question? Post it now! No Registration Necessary

Translate This Thread From English to

                How I Tripped Over sched_rt_runtime_us

             Don't Try to Control What You Can't Measure

I recently spent almost two weeks banging my head on the wall trying
to figure out why some software I was working on was failing.  There
were a number of SCHED_FIFO tasks triggered by timers.  They ran with
periods of typically 4,8,10 and 20 ms.  Each woke up based on a
periodic timerfd and ran for a few hundred microseconds.  Most times
when I would start the program, everything ran fine -- and would run
fine indefinitely.

But, about 1 time out of 20 maybe, I would start up the programs, and
they just wouldn't work right.  Hardware devices would repeatedly
time-out because they weren't being serviced.  There would be gaps in
message streams, etc.

After poking about for a few days, I determined that a thread that was
supposed to be waking up every 4ms (for example), would run fine for
"a while" and then just "not run" for many tens of ms.  Sometimes it
just didn't wake up, and sometimes it got suspended while it was
running. When running, it wasn't doing anything that could block.
When it got suspended while running, the location where it got
suspended was pretty much random.

After instrumenting code a bit more, I determined that the most common
failure state resulted in all my SCHED_FIFO threads being suspended
for 50ms once every second.  _Exactly_ 50ms, _exactly_ once every

Changing to SCHED_RR didn't change anything.  Rearranging the
priorities didn't change anything.  However when I changed one of the
threads to SCHED_OTHER, it started running without interruption.

The SCHED_OTHER thread was running _while_ SCHED_FIFO threads were
being suspended "against their will".

I had read about the real-time throttling feature in the kernel, but
it was only supposed to kick in when realtime tasks were using up 95%
of CPU.  I had just measured my CPU usage, and the realtime tasks were
barely using 5% -- not anywhere close to the 95% limit.

But still, SCHED_FIFO threads being suspended for 50ms every 1s while
SCHED_OTHER threads continue to run.  That sure smells like somebody is
intentionally blocking all realtime tasks 5% of the time.

Then it dawned on my that (at least on the ARM9), the mechanism used
by the kernel to measure per-thread CPU usage is based on the
assumption that tasks execute _randomly_ and their execution times are
not correlated with kernel timers.

Mine _are_ controlled by kernel timers.

If you're running threads triggered by timers, then the kernel's CPU
usage numbers are complete garbage[*].  I knew this, and in the past had
to work out my own CPU usage measurement schemes because of it.

_If_ a task is a realtime task, _and_ it happens to be in phase with
the CPU usage sampling, then the realtime usage looks like it's 100%
and the sched_rt_runtime_us limiter kicks in and locks out all
realtime threads for 50ms once every 1s.  In this case, that breaks
all sorts of seemingly unrelated stuff.

The moral to the story: if you're are using an architecture with
without _real_ per-thread CPU usage measurements and you are
triggering realtime tasks using timers, you _must_ disable the
realtime limiter by setting sched_rt_runtime_us to -1 because the
limiter is being fed CPU measurements that are, in effect, random.

* If you have a thread that uses 2% of the CPU time, but it's
  synchronized with the CPU usage sampling, then it will usually
  appear to use either 0% or 100% depending on the phase difference
  between the sampling and the task's execution.  If the phase
  difference is _just_ right you will see a number that varies wildly
  between 0 and 100.

Grant Edwards               grant.b.edwards        Yow! I want the presidency
                                  at               so bad I can already taste
We've slightly trimmed the long signature. Click to see the full one.

Site Timeline