semop and SCHED_RR under higher CPU loads

Hello group!

When running our application at full performance we encounter occasional delays of 100 ms. Above 80% CPU load they appear very often, about once every few seconds. At lower load the delays still appear, but less often.

Our investigation shows that the delays occur in semop() when acquiring a semaphore (this semaphore protects a critical section that is executed very intensively, ~4200 times/s). Further investigation led us to conclude that the delays are caused by the round-robin scheduler (sched_rr_get_interval() returns a period of exactly 100 ms, and the application processes run at RT priorities with the SCHED_RR policy). When a process exhausts its time slice, the scheduler preempts it for one RR interval. If this happens while the process holds the semaphore, nobody can release it for 100 ms. In the meantime many processes try to execute the same critical section, but they are blocked. As a consequence all other processes, including those with higher priority, are blocked while running their own transitions (they should be processing messages from the message queue). This results in full queues (>1000 messages) for the blocked processes, which makes them run for longer periods (to empty their queues) and gives them a good chance of being preempted by the RR scheduler again. The circle is closed.

When the scheduling policy is changed to SCHED_FIFO none of this happens any more, but the application runs more in bursts, which is not desirable. Do you agree with our findings above? Do you have any suggestions for how to prevent the described problem while retaining the SCHED_RR policy?

Best regards,

Sani

Reply to
Sani Rus

Linux is not intended to be a hard realtime OS.

You can never rely on any time frame to be met.

Kernel 2.6 improved the soft-realtime behavior (meaning that the delay you see might occur less often than with Kernel 2.4).

You can enable the "preemptive Kernel" feature in the Kernel configuration to activate this.

There are some "soft realtime patches" that might reduce the problem a bit more (but you still need to be aware that delays of several hundred ms can occur now and then).

If you need hard realtime you could try RTAI or commercial offers by TimeSys, MontaVista or SysGo.

-Michael

Reply to
Michael Schnell

Hello,

thank you for the answer. In fact we are using the 2.6.10 kernel provided by MontaVista. I agree with you on the real-time characteristics, but for this platform our main goal is performance: improving the real-time characteristics would have a negative impact on overall performance because of the additional overhead. We want to keep SCHED_RR so that the overall throughput is not affected.

Best regards,

Sani

Reply to
Sani Rus

This might prevent you from using the preemptive Kernel feature, or even suggest using Kernel 2.4 (better overall performance, worse soft-realtime behavior).

OK. But SCHED_RR processes preempt other processes only as long as these are not being serviced by the Kernel. And a Kernel service (e.g. file I/O) might prevent preemption for a long period.

The "preemptive Kernel" feature would help in many (but not all) cases but of course it reduces the overall performance.

-Michael

Reply to
Michael Schnell

I just read that MontaVista launched "Pro 5.0", which they say offers latencies as low as 5 µs.

-Michael

Reply to
Michael Schnell

What this latency stands for?

Reply to
Bin Chen

It doesn't "stands" for anything.

Reply to
ellis
