From cooperative to preemptive scheduler: a real example

I noticed my previous post about preemptive OSes drew in many people and started many discussions, most of them theoretical.

Someone wrote that synchronizing tasks under a preemptive scheduler is not so difficult once you understand a few things. Others suggested abandoning the preemptive scheduler altogether, considering its pitfalls.

Because I know my limits, I don't think I can produce a well-written preemptive system. However, I'd like to understand a little more about them, starting from an example.

Suppose my system is a display where a message is written. The message can be customized over a serial line. With a cooperative approach, I would write something like:

--- main.c ---

...
while (1) {
    task_display();
    task_serial();
}

--- end of main.c ---

--- display.c ---

static char msg[32];

void display_set_message(const char *new_msg)
{
    strncpy(msg, new_msg, sizeof(msg));
}

void task_display(void)
{
    if (refresh_is_needed()) {
        display_printat(0, 0, msg);
    }
}

--- end of display.c ---

--- serial.c ---

static unsigned char rxbuf[64];
static size_t rxlen;

void task_serial(void)
{
    int b = serial_rx();
    if (b != EOF) {
        rxbuf[rxlen++] = (unsigned char)b;
        if (frame_is_complete(rxbuf, rxlen)) {
            char new_msg[32];
            /* decode new message from the frame received on the serial line */
            display_set_message(new_msg);
            rxlen = 0;
        }
    }
}

--- end of serial.c ---

The display needs to be refreshed. display_printat() is blocking: when it returns, the whole display has been refreshed. So the display always shows a complete message: there's no risk of showing part of the previous message together with part of the new one.

How would these two tasks be converted to a preemptive scheduler? Which priorities should be assigned to them?

The simplest approach is...

--- display.c ---

static char msg[32];

void display_set_message(const char *new_msg)
{
    strncpy(msg, new_msg, sizeof(msg));
}

void task_display(void)
{
    while (1) {
        if (refresh_is_needed()) {
            display_printat(0, 0, msg);
        }
    }
}

--- end of display.c ---

--- serial.c ---

static unsigned char rxbuf[64];
static size_t rxlen;

void task_serial(void)
{
    while (1) {
        int b = serial_rx();
        if (b != EOF) {
            rxbuf[rxlen++] = (unsigned char)b;
            if (frame_is_complete(rxbuf, rxlen)) {
                char new_msg[32];
                /* decode new message from the frame received on the serial line */
                display_set_message(new_msg);
                rxlen = 0;
            }
        }
    }
}

--- end of serial.c ---

This code works most of the time, but the display can sometimes show a mix of the old and new messages. This happens if the display task is interrupted during a refresh by the serial task, which calls display_set_message(), or when display_set_message() is interrupted by the display task and a refresh occurs.

If I assigned a higher priority to the display task, the problem would remain: display_printat() couldn't be interrupted, but display_set_message() still could.

Here the solution is to take a binary semaphore before using the shared resource (and give the semaphore after the job is done).

void display_set_message(const char *new_msg)
{
    semaphore_take_forever();
    strncpy(msg, new_msg, sizeof(msg));
    semaphore_give();
}

...
if (frame_is_complete(rxbuf, rxlen)) {
    char new_msg[32];
    /* decode new message from the frame received on the serial line */
    semaphore_take_forever();
    display_set_message(new_msg);
    semaphore_give();
    rxlen = 0;
}
...

My impression is that very simple code gets cluttered with synchronization details that decrease readability and maintainability and increase complexity. Why? Just to use preemption?

Again my impression is that preemption is NOT GOOD and must be avoided if it isn't required.

So the question is: when is a preemptive scheduler needed? Could you give a real example?

From what I have understood, preemption could help meet real-time requirements.

Suppose display_printat() takes a long time to finish. This increases the worst-case superloop duration and could delay some system reaction. For example, if display_printat() takes 1 second to finish, the system could react as much as 1 second after an event (the press of a button, for example).

If this isn't acceptable, preemption could help. Is it correct?

Reply to
pozz
[ 8< ]

The "clutter" is introduced because your "problem" inherently involves conflict; you're allowing two competing uses for a single resource.

The use of the synchronization primitive OVERTLY acknowledges this issue/possibility -- lest another (subsequent) developer fail to recognize that the possibility exists (i.e., a "latent bug").

"Multiplication is NOT GOOD and must be avoided if it isn't required (i.e., if you can use repeated ADDITIONs, instead)"

The "scheduler" is present in any multitasking system -- cooperative or preemptive. SOMETHING has to decide who to give the processor to when the currently executing task gives up control (or, has it removed from it)

In your cooperative examples, the "while()" is used to implement the scheduler: when one task() "returns", the one listed on the next line (of the while loop) is given control... "scheduled" to run.

Don't conflate "big loop" with "cooperative".

Preemption, like any capability, brings with it assets and liabilities. Imagine you were tasked with building a box that blinked lights (XMAS lights!) at different/varying rates. The box has a dozen solid state switches that control the individual lights (or, "light strands").

It would be really easy -- and intuitive -- to write:

void lights1()
{
    while (FOREVER) {
        light(1, ON);
        sleep(500ms);
        light(1, OFF);
        sleep(279ms);
    }
}

void lights2()
{
    while (FOREVER) {
        light(2, ON);
        sleep(100ms);
        light(2, OFF);
        sleep(50ms);
    }
}

void lights3()
{
    while (FOREVER) {
        ontime = (10.0 * rand()) / RAND_MAX;
        light(3, ON);
        sleep(ontime);
        light(3, OFF);
        sleep(10.0 - ontime);
    }
}

etc. No silly "yields" to get in the way. No need for synchronization primitives, either, because nothing is SHARED!

[Contrived example but you'll find that there are many cases of tasks co-executing that are NOT sharing anything (other than the processor)]

There are other classes of problems where the problem lends itself, naturally, to "peaceful" sharing -- where you're not in conflict with another. And, other techniques to hide the sharing mitigation in other mechanisms.

Preemption lets you code AS IF you were the sole owner of the processor... EXCEPT when you need to share something (which would imply that you are NOT the sole owner -- at least at THAT time! :> )

The downside to cooperative multitasking is that *it* clutters your code -- with all those yield()s -- and requires you to keep track of how "long" you've hogged the CPU in the time since your last yield (because that time gets reflected to all subsequent task runnings).

When I write code in a cooperative environment, I *litter* the code with yield()s to keep *reaction* times (of other tasks) short. This then means yield() has to run like greased lightning lest it impact overall performance (because it is pure overhead!)
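
E.g. (a contrived fragment, just to illustrate; process_chunk() and N_CHUNKS are placeholder names):

void crunch(void)
{
    for (int i = 0; i < N_CHUNKS; i++) {
        process_chunk(i);   /* keep each slice of work short...         */
        yield();            /* ...and hand the CPU back early and often */
    }
}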

Reply to
Don Y

I made some such statement.

So, this system consists of a display and a serial input line and has requirements as follows:

  1. The display shall at all times show a message, of at most 31 characters.

     - To be defined: what the initial message should be at system reset.

  2. The SW shall receive characters from the serial line, buffering them in a "frame buffer" in memory, which can hold up to 64 characters.

  3. After each received (and buffered) serial-line character, the SW shall check if the buffered characters form a complete "frame".

     - To be defined: what to do if the frame buffer is full but does not form a complete frame. (This may of course be impossible by design of the "frame_is_complete" function.)

  4. When the buffered characters form a complete frame, the SW shall convert (decode) the contents of the frame into a message, of at most 31 characters, display that message until another, new frame is received, and erase the frame buffer in preparation for the next frame.

The real-time aspects are undefined, except that each message is displayed until the next frame is received.

Before that conversion one must think about the real-time requirements: deadlines, response-times. This is difficult for this example, because you have not stated any requirements.

Let's assume these requirements and properties of the environment:

A. The function "serial_rx" polls the one-character reception buffer of the serial line once, and returns the received character, if any, and EOF otherwise. It must be called at least as often as characters arrive (that is, depending on baud rate) to avoid overrun and loss of some characters.

B. A pause in the serial-line character arrival cannot be assumed after the completion of a frame. The first character of the next frame can arrive as quickly as the baud rate allows.

C. The functions "frame_is_complete" and "display_set_message" take, together, so much less time than the serial-line character period that the whole "task_serial" function also takes less time than the character period.

D. The function "display_printat" can take longer than the serial-line character period.

Under these assumptions, the cooperative solution does not work, because when a frame is completed, "display_printat" is called which may mean too much delay for the next "serial_rx" call and cause loss of input characters.

Because the "msg" variable in display.c is accessed from both tasks, as an unprotected shared variable.

In addition to that problem, you have written both tasks to use polling, with no delay, which wastes processor resources, especially for "task_display". The "task_serial" task does need to poll "serial_rx" (per the assumptions above), but it could certainly do so at some non-zero period, computed from the baud rate and the execution times to ensure that "serial_rx" calls are frequent enough to avoid loss of input data.
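
To make the computation concrete (illustrative figures of my own, not from the original post):

/* Assume 115200 baud with 8N1 framing, i.e. 10 bits per character. */
#define BAUD            115200u
#define BITS_PER_CHAR   10u
#define CHAR_PERIOD_US  ((1000000u * BITS_PER_CHAR) / BAUD)   /* about 86 us */

/* task_serial must then call serial_rx() at least once per CHAR_PERIOD_US
   (minus the worst-case execution time of the rest of the loop), otherwise
   a received character can be overwritten before it is read. */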

Of course, a serious design would use serial-line interrupts and trigger the "task_serial" only when a character has been received.

For "task_display", you could replace the "refresh is needed" flag with another semaphore, which is initially zero, is "given" in "task_serial" when a new message is to be displayed, and is "taken" by "task_display" before it displays the new message. Then "task_display" consumes no processing resources until it actually has to.

Here you need some code to set "refresh is needed" to true. That flag is also a shared variable.

If you have semaphore calls here, in "display_set_message", ...

... then you do not need them (and should not have them) here, around the call of "display_set_message".

You also need to use the mutex semaphore from "task_display", for example as follows:

void task_display(void)
{
    while (1) {
        if (refresh_is_needed()) {
            char new_msg[32];
            semaphore_take_forever();
            strncpy(new_msg, msg, sizeof(new_msg));
            // Here something to set "refresh is needed" to false.
            semaphore_give();
            display_printat(0, 0, new_msg);
        }
    }
}

Otherwise the "task_serial" could still overwrite the message with a new one, during the call of "display_printat".

To assign priorities, you look at the deadlines of the tasks:

- task_serial: deadline = serial-line character period (actually one-half of it)

- task_display: no deadline defined: infinite deadline.

Then you assign priorities in order of deadlines: higher priorities for shorter deadlines, hence "task_serial" will have higher priority than "task_display". The numerical values of the priorities do not matter, only their ordering.

With "task_serial" having a higher priority, it can pre-empt the slow "display_printat" function whenever it needs to, and thus call "serial_rx" often enough.

No -- to make the SW work, where the cooperative design did not work.

Maintenance is eased because the pre-emptive design continues to work even if the execution time of "display_printat" was initially short, but then increased to become longer than the serial-line character period.

In larger programs there are important advantages of preemption in helping decouple modules from each other.

Or it could lose serial input data (under my assumptions).

Yes.

--
Niklas Holsti 
Tidorum Ltd 
niklas holsti tidorum fi 
       .      @       .
Reply to
Niklas Holsti

However the shared-resource complexity is present only when preemption is used.

I know all approaches have pros and cons. What I meant is that preemption is used too often, even when it isn't really required. With FreeRTOS the preemptive scheduler is often enabled. It seems to me that many times preemption is used only to show off.

Yes, my superloop is an example of a *very simple* cooperative scheduler, but a cooperative scheduler can be implemented in other ways (FreeRTOS, for example, can also be configured to run cooperatively).

In the superloop cooperative approach:

void lights1()
{
    if (state_ON && timer_expired()) {
        light(1, OFF);
        timer_arm(279ms);
        state_ON = false;
    } else if (!state_ON && timer_expired()) {
        light(1, ON);
        timer_arm(500ms);
        state_ON = true;
    }
}

This is a state machine and I admit it's harder to write than with a preemptive scheduler.

I suspect many real applications need this synchronization mess (with real risks if you don't know the pitfalls of multitasking very well). And in those cases I'm not sure whether it's simpler to code with preemption/blocking/synchronization or with cooperative/non-blocking/state-machine style.

If you use non-blocking state machines, there isn't any downside to cooperative multitasking. There are no explicit yield()s; they are hidden in the return from the task function.

Reply to
pozz

The only real-time requirement is that a new message sent over the serial line appears on the display within a reasonable time: 100ms? 1s? Something like that.

The second requirement is that the display mustn't show a hybrid message composed of parts of two successive messages.

No, the serial driver works in interrupt mode and already uses a sufficiently large FIFO buffer. serial_rx() pops a single element from the FIFO, if any.

Why? I think the user changes the message only when he wants to. Normally there is no activity on the serial line.

The serial driver's interrupts guarantee no loss of input during display_printat() or other functions.

I was thinking of a refresh at regular intervals, such as every 100ms.

Reply to
pozz

[snip code]

You asked about possible advantages of pre-emption; I made my assumptions, above, such that the (incomplete) example you gave shows this advantage, under these assumptions (which could be true for other, otherwise similar example applications).

Ah, then your *system* is intrinsically pre-emptive (the interrupts pre-empt the tasks), even if the *code you showed* does not show this pre-emption.

I won't reply to your other comments on my assumptions, as they are irrelevant to the point of where and when pre-emption can be good for you.

Right, because it is pre-emptive. So there you see the advantage.

In some systems that could result in annoying flickering of the display, which could even be dangerous (seizure-inducing) to some users.

--
Niklas Holsti 
Tidorum Ltd 
niklas holsti tidorum fi 
       .      @       .
Reply to
Niklas Holsti

Ah yes, interrupts are preemptive and I use them a lot, but they are confined to their own jobs; they are very lightweight and fast.

The discussion here concerns medium- to high-complexity tasks.

Reply to
pozz

[snip]

That's a design decision. Some systems do most of their work in interrupt handlers, and use "background" processing only for some non-critical housekeeping tasks.

I don't recall you saying so, and your example (perhaps naturally, for an example) did not have such tasks. Many interrupt handlers are more complex than the tasks in your example.

In a pre-emptive system there is no logical difference between interrupts and tasks. The only practical difference is that interrupts are "tasks" that are scheduled by the processor HW (moderated by SW control of interrupt masking and disabling) while the tasks are scheduled by the RTOS.

The advantages (and complications) of pre-emptive scheduling are mostly the same for tasks as for interrupts.

--
Niklas Holsti 
Tidorum Ltd 
niklas holsti tidorum fi 
       .      @       .
Reply to
Niklas Holsti

Yes, I imagine there is a plethora of possibilities. Anyway, I thought there were two typical approaches: a superloop that continuously calls non-blocking functions (an example of a cooperative scheduler) and a fully preemptive scheduler (most of the time a full RTOS).

Interrupts are always present, even in a superloop.

Of course the example was simple. Its goal was to discuss the complexity (clutter) that a preemptive scheduler adds to the code of the tasks (not to the code of the interrupts, which is almost the same in both cases).

Maybe I am wrong, but I implement ISRs with great care and attention, and they are very small and limited. Most of the time they are already implemented in drivers released by the silicon vendor. Anyway, they are so limited that the preemption issues (synchronization) are very well defined and confined.

Normally the biggest part of the firmware is related to the application/tasks, not interrupts.

Reply to
pozz

Another problem I found with a preemptive RTOS is how to size the task stacks.

In my superloop cooperative scheduler:

while (1) {
    task1();   /* non-blocking, fast, state-machine task */
    task2();   /* non-blocking, fast, state-machine task */
}

the stack gets all the unused memory available in the system (that is, all the memory minus the variables and the heap, if you use one).

If two tasks use a stack-intensive function (such as a printf-like function), this doesn't increase the overall stack requirement. For example, if the stack-intensive function needs 2kB of stack, the global stack can be 2kB (plus whatever the tasks need for other operations).

With a preemptive scheduler, tasks can be interrupted at any point, even inside the printf-like function. So *each* task needs a 2kB stack, giving a total stack requirement of 4kB.

Another issue with the preemptive approach is that you have to be smart enough to size N stacks (where N is the number of tasks). With the superloop architecture above, you only have to size *one* global stack, which can be calculated from the single task that needs the most stack.

Does this make sense?

Reply to
pozz

On 14.01.2020 at 13:27, pozz wrote:

That trouble is not at all particular to preemptive scheduling.

And how do you know that that's sufficient?

If you can't calculate the stack for those small-ish tasks, then you can't do it for the functions called by your super-loop, either.

If you can judge which of the super-loop's sub-tasks consumes the most stack, and how much that is, then you can do it for preemptive scheduling, too.

The difficulty of (reliably) computing stack usage is the same, regardless what tasking concept is used.

Reply to
Hans-Bernhard Bröker

I'm not smart enough to estimate the stack usage of a function (I know the compiler can produce some useful info about this, but you also have to add interrupt stack usage and so on), so my approach is testing only.

I fill the memory with a known value, run the application for many hours/days and then check the memory content to find the highest (or lowest) address reached by the stack.

This isn't a simple job, but I only do it once, because there's a single global stack in the superloop architecture.
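
Something like this, just as a sketch (the __stack_start/__stack_end symbols are placeholders for whatever the linker script actually provides):

#include <stddef.h>

#define STACK_FILL 0xAAu

extern unsigned char __stack_start[], __stack_end[];   /* placeholder linker symbols */

/* Call early at boot (in real code, stop below the current stack pointer,
   since this function is itself running on the stack it is painting). */
void stack_paint(void)
{
    for (unsigned char *p = __stack_start; p < __stack_end; p++)
        *p = STACK_FILL;
}

/* Call after a long test run: returns the worst-case stack usage observed,
   assuming a full-descending stack growing down from __stack_end. */
size_t stack_high_water(void)
{
    unsigned char *p = __stack_start;
    while (p < __stack_end && *p == STACK_FILL)
        p++;                                /* first byte ever overwritten */
    return (size_t)(__stack_end - p);
}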

I don't agree. Most of the time I can guess which task (or which few tasks) consumes the most stack, so the effort to estimate stack usage is limited.

With a preemptive scheduler, you would need to multiply that effort, estimating the stack usage of every single task to avoid wasting precious memory.

Anyway, suppose we are smart enough to calculate the stack usage of each task:

- task1 needs 10kB

- task2 needs 10kB

With a preemptive scheduler you need to reserve 20kB for the stacks; with the superloop you can reserve only 10kB.

Reply to
pozz

On 2020-01-15 pozz wrote in comp.arch.embedded:

But in the superloop you need to declare a lot of stuff static to preserve it for the next loop, so it can't be on the stack like in the preemptive case. So you will more likely need 1kB of stack and 2 * 10kB of statics in the superloop case.
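
For example (made-up names, just to illustrate Stef's point):

/* In a non-blocking task the working data must survive between calls,
   so it has to be static (or global) rather than on the stack. */
void task2(void)
{
    static char     workbuf[10 * 1024];   /* the 10kB from the example above */
    static unsigned state;

    switch (state) {
    case 0: /* start filling workbuf      */ state = 1; break;
    case 1: /* continue / finish the work */ state = 0; break;
    }
}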

--
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail) 

Been Transferred Lately?
Reply to
Stef

On 15/01/2020 11:30, Stef wrote:

This is right if the stack-consuming function is slow and has to be split into steps to avoid a long-running task (i.e., converted to a state machine).

However, I could have a fast stack-consuming function that doesn't need to be split.

while (1) {
    task1();   /* calls StackConsumingFunc() */
    task2();   /* calls StackConsumingFunc() */
}

If StackConsumingFunc() needs 10kB of stack, what is the stack allocation for the superloop and for the preemptive scheduler?

I think I can allocate only 10kB of stack if the superloop is used, but I need 20kB of stack if a preemptive scheduler is used.

Reply to
pozz

The approach where all real-time "tasks" are interrupts is called "foreground-background", I believe. It was popular in the microcomputer era, before real-time kernels became popular -- it gave most of the same benefits (pre-emption) but scheduled by HW (interrupt priorities).

Again a design choice. Some critical systems don't want *any* pre-emption, even by interrupts -- simplicity reigns. So they poll, and accept the performance hit. For example, some colleagues of mine recently completed the "recovery SW" for ESA's ExoMars rover, which takes over if the "real" SW crashes and is intended to be super-safe and just support problem analysis and recovery. Using any kind of real-time kernel was forbidden -- only sequential coding was allowed. Any new, nice SW "feature" that my colleagues suggested, for example to speed up communication with Earth, was denied by the customer because it would complicate the SW a little bit -- perhaps one more conditional statement. No-no.

Of course you are right to use great care and attention, and especially in ISRs.

Many tasks in pre-emptive systems are also small and limited -- for example, one of my designs has a task whose main loop just takes a data-packet from an input queue, computes its CRC and appends it to the packet, and puts the updated packet in an output queue. A handful of source lines.
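
Something along these lines (a sketch with hypothetical queue primitives and types, not the actual code):

void task_crc(void *arg)
{
    (void)arg;
    for (;;) {
        packet_t pkt;
        queue_receive(in_queue, &pkt);   /* blocks until a packet arrives      */
        pkt.data[pkt.len] = crc8(pkt.data, pkt.len);
        pkt.len++;
        queue_send(out_queue, &pkt);     /* blocks if the output queue is full */
    }
}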

I would say that having a pre-emptive system relieves the designer of the great care and attention needed to slice computations into (state-machine-driven) "tasks" that execute their computations little by little.

Except in the aforementioned foreground-background designs.

--
Niklas Holsti 
Tidorum Ltd 
niklas holsti tidorum fi 
       .      @       .
Reply to
Niklas Holsti

That is quite true, if you allow pre-emption in the printf-like function. Also, depending on how your interrupt handlers are designed, you may have to allocate interrupt-handler space on each of the per-task stacks.

However, any computation that needs a lot of stack also tends to use a lot of time, so you might have to slice it up into a sequence of states (state machine approach) and then the data must be statically allocated rather than stack-allocated, as Stef pointed out, and moreover each task that uses that computation must have its own copy of the data.

Having a single stack rather than per-task stacks is easier if one has to program the high-water-mark method for measuring stack usage separately and manually for each stack.

Static analysis of stack usage bounds, either by a dedicated tool from the machine code or as an extra function of the compiler and linker, is really so simple that I see little excuse for embedded-SW programming environments not to offer this analysis as a matter of course. And then per-task stacks are just as easy as one stack.

Yes, the main-loop approach can use less stack than the pre-emptive approach.

--
Niklas Holsti 
Tidorum Ltd 
niklas holsti tidorum fi 
       .      @       .
Reply to
Niklas Holsti

On 15.01.2020 at 09:36, pozz wrote:

As the saying goes: testing proves diddly-squat. You'll hardly ever know whether your test cases came anywhere near exercising the code path of worst stack usage.

You disagree with a different statement than the one I made. I was talking about this job being difficult. You object based on it being a lot of work.

Guessing is about the only approach to this that is even less reliable than testing.

The effort doesn't really multiply, because it'll all be handled by loops in the code anyway. Doing the same thing multiple times is what CPUs are supposed to be good at, after all.

Reply to
Hans-Bernhard Bröker

On 15.01.2020 at 20:26, Niklas Holsti wrote:

No. It's exactly as easy. You just do the same thing for each stack as you would for a single one.

Objection. It is simple only in a limited class of use cases. As soon as there is any use of function pointers or the merest hint of possible recursion, static stack analysis becomes equivalent to Turing's Halting Problem. Problems don't really get much harder than that one.

Reply to
Hans-Bernhard Bröker

In terms of intrinsic difficulty, yes. In terms of the amount of manual work, no.

In theory, yes. In practice, no. The Halting Problem is impossible to solve for *all* programs. It doesn't mean that solutions are impossible for a *large* class of *common* programs.

Most programs are designed cleanly enough to make stack-usage analysis possible. An exact value is not needed; an upper bound is enough, if it isn't very pessimistic. Assuming that the type system is not broken, a whole-program analysis can figure out all possible target functions of any function pointer.

For recursion, I agree that some manual support (provide a bound on the depth of recursion) is usually needed. However, embedded programs rarely use recursion in any non-trivial way. And most functions have stack-frames of static size; it is rare to allocate dynamic amounts of stack.

--
Niklas Holsti 
Tidorum Ltd 
niklas holsti tidorum fi 
       .      @       .
Reply to
Niklas Holsti

Yes "foreground-background" is another name, but it says nothing about the blocking/non-blocking of background (non interrupts) tasks. If you have blocking (long running) tasks, you will not have a reactive system (only interrupts will be reactive). If the tasks are coded as non-blocking (maybe with the help of state-machines), you will be able to have a reactive system even for the non-interrupts tasks.

I don't think this is an example that can be considered "typical" and I know there are "extreme cases" where a singular approach is needed.

I don't think the CRC computation takes so long that it needs to be split into smaller and faster steps (a state machine). I think you could write that task without a state machine in a superloop architecture:

void task_CRC(void)
{
    uint8_t *packet;
    size_t packet_len;

    int err = queue_pop(&packet, &packet_len);
    if (!err) {
        packet[packet_len] = crc(packet, packet_len);
        packet_len++;
        queue_push(packet, packet_len);
    }
}

And here you don't have to think about multithreading issues (race conditions, shared resources, ...).

Even in foreground-background, ISRs are limited to critical real-time events.

Reply to
pozz
