From cooperative to preemptive scheduler: a real example

pozz · 2020-01-07T01:08:52+00:00

I noticed my previous post about preemptive OS involved many people and started many discussions, most of them theoric. Someone wrote the synchronization of tasks in preemptive scheduler is not so difficult, after understanding some things. Others suggested to abandon at all preemptive scheduler, considering its pitfalls. Because I know my limits, I don't think I can produce a well-written preemption system. However I'd like to understand a little more about them. Starting from an example. Suppose my system is a display where a message is written. The message can be customized by a serial line. In cooperative approach, I would write something: --- main.c --- ... while(1) { task_display(); task_serial(); } --- end of main.c --- --- display.c --- static const char msg[32]; void display_set_message(const char *new_msg) { strncpy(msg, new_msg, sizeof(msg)); } void task_display(void) { if (refresh_is_needed()) { display_printat(0, 0, msg); } } --- end of display.c --- --- serial.c --- static unsigned char rxbuf[64]; static size_t rxlen; void task_serial(void) { unsigned char b = serial_rx(); if (b != EOF) { rxbuf[rxlen++] = b; if (frame_is_complete(rxbuf, rxlen)) { char new_msg[32]; /* decode new message from received frame from serial line */ display_set_message(new_msg); rxlen = 0; } } } --- end of serial.c --- The display needs to be refreshed. display_printat() is blocking: when it returns, all the display was refreshed. So the display always shows the entire message: there's no risk the display shows a part of the previous message and a part of the new message. How to convert these two tasks in a preemptive scheduler? Which priority to assign to them? The simplest approach is... --- display.c --- static const char msg[32]; void display_set_message(const char *new_msg) { strncpy(msg, new_msg, sizeof(msg)); } void task_display(void) { while(1) { if (refresh_is_needed()) { ...

U

upsidedown 6 years ago

What is the problem ?

In the character receiving ISR calculate the partial CRC/checksum for each received character. At the end of the message, you do need to process the last received character to check if the CRC matches.

In fact I have done a Modbus RTU line monitor for a hardware without accurate timing information.

After receiving a byte calculate a partial CRC and compare it with the last two bytes. If the check matches, Assume it was he end of frame. Then perform a sanity check that the frame is longer than the minimum length and that it makes sense according to the protocol specification. If these additional tests fail, assume that this was a premature CRC. Continue reading bytes until a new CRC match is obtained. Perform the same sanity test.

If the received frame is longer than maximum valid frame, move the start position one byte ahead and perform a long CRC calculation.

There are surprisingly lot of false CRC triggerings, but resynchronisation is usually quite quickly obtained.

Vote

N

Niklas Holsti 6 years ago

...

Reactivity was the very point of the architecture - anything that needed quick response executed at interrupt level. The background was just a sequential process that handled all the non-time-critical things (often stuff delegated from the interrupt levels).

Those machines usually had prioritized interrupt systems where a higher-priority interrupt could interrupt (and thus pre-empt) a running lower-priority interrupt handler. So the systems were definitely reactive.

Not needed in a foreground-background system.

Ok, but you said "always". I remember other examples of systems not using interrupts -- usually small, quick-and-dirty-but-good-enough applications.

If the result is fast enough, compared to the shortest deadline, of course you can. But you can't be sure it will be fast enough if you don't know (a) the maximum length of the packets (b) the speed of the processor (c) the shortest deadline that would be at risk.

Note also that even if the CRC computation is fast enough not to endanger other deadlines, if you make it one "task" in a superloop with

100 other "tasks", it will be called only once per 100 "task" executions (once per superloop iteration) which will limit the rate of packet flow, which may violate a hard performance requirement.

To fix that problem, you would have to go to a "major cycle - minor cycle" design in which the superloop is the "major" cycle, but consists of some number of "minor" cycles each of which calls the CRC computation "task", to ensure that it is called often enough. This was the standard approach to real-time systems before pre-emption became acceptable.

I only gave the CRC task as an example of a "task" that is as short as or shorter than typical interrupt handlers. But perhaps that was going off-topic.

(You may ask, why was that CRC calculation in a task of its own, and not just in a subroutine called from other tasks? Because the processor had a HW CRC unit, which the task used, and the CRC task serialized accesses to that HW unit. So the task in fact blocked, waiting for the HW unit to compute the CRC.)

The multithreading issues in the CRC task are solved by the queue data-structure, which is "task-safe" and used all around the system. No specific multithreading problems for the CRC task. Five minutes to design, 15 to implement.

You still don't get the point. In a foreground-background system, the "interrupt service routines" (the foreground) *are* the tasks and implement all of the main system functions. The background process is used only for the "left-overs" -- say, memory scrubbing in a space-based system. Of course there may be a considerable amount of "left-overs" if deadlines are tight and some non-urgent but interrupt-triggered activities can't fit in the foreground processing.

Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .

Vote

P

pozz 6 years ago

I was using reactive word with another meaning.

Real-time response is critical and must be managed by interrupts (that can preempt background tasks). If a char is received from a UART with a single-byte FIFO, you *need* to move it in RAM as soon as possible, to avoid discarding the next byte that could arrive immediately after.

Reactivity is much less critical. You press a button on a keyboard and you expect to see that character on the display as soon as possible. It can appear after 10ms and you will be very happy (wow, this PC is very reactive), but can appear, maybe a few times, after 500ms and you will be less happy (s*it, how slow is this PC?). In both cases, the system does its job.

I usually manage reactivity of the system in non-interrupt background tasks. In order to have high reactivity, I need to maintain the superloop duration short (of course, if the tasks are maximum 10), so my first rule when implementing a task: it mustn't block waiting for I/O or mustn't spend too much time in calculations. If this happens, I try to transform the task in state-machine and I can say the reactivity of the system is normally very good.

Of course, I know there are some extreme cases where you have 100 tasks (I admit it's impossible to maintain a good general reactivity in this case, even if every single task is fast) or when a task is very CPU intensive (FFT, encryption, ...). I think those are examples of a few applications where preemptive multitasking approach is the right answer to have a good reactivity.

But in many many other applications you can guarantee real-time (through interruts) and reactivity (through non-blocking background tasks) with a simple superloop cooperative approach, without multitask issues (...and critical bugs) but with an additional complexity (state-machine).

What is not needed? Reactivity?

My shame... often.

Again you show examples that appears to me at minimum "extreme", not "typical". How many times you write applications with 100 tasks?

Ok, maybe it's better to say that I usually work on MCUs with internal Flash and RAM, just to say Cortex-Mx. My experience is with MCUs that run at maximum 100MHz and manage Ethernet or TFT, not together.

Now those MCUs are little monsters that can manage at the same time Ethernet, SD cards with FAT filesystems, USB, TFT display and so on. In those complex applications, I admit a preemptive approach could be a better choice.

[OT] Anyway I don't know how much space those little monsters have in the market. Should I design a board with a lot of fast interfaces and the cost or power consumption weren't so important, most probably I would choose an embedded Linux SOM with a lot of Flash and RAM. And maybe I will choose Python as the language that speed up a lot the development. [/OT]

Maybe multitasking issues aren't too complex... issues, after acquiring some experience and knowledge. At the moment they appear to me very complex.

What I mean with "foreground-background" system is another thing. A few critical real-time requirements are managed in foreground interrupts routines, the biggest part of the application, that should be reactive for the user, are managed by some background tasks.

I have an example of an IoT board based on a Cortex-M3 MCU by NXP. The MCU communicates with a remote device through a RS485 bus, and with a Cloud system (actually AWS) through Ethernet. In this application I have three main foreground ISR-based drivers (timer, Ethernet and UART) and some background tasks: lwip TCP/IP stack, RS485 protocol parser, LEDs and some other minors.

The real-time requirements are only for communication links: we shouldn't loose Ethernet frames nor RS485 characters. When some events are communicated over RS485, a message is published on the Cloud as soon as possible.

This system is implemented with a superloop cooperative approach and I can say the reactivity is good.

The Internet protocol is MQTTs, so the security layer mbedTLS runs together with lwip. It hasn't been a simple job, because mbedTLS needed a lot of RAM during TLS session handshake. lwip can work in "no OS" systems by calling sys_check_timeouts() function regularly... and I simply call it in my superloop.

MQTT client state-machine is not so complex, there are only a few main states:

- waiting for DNS replay (to resolve server domain address)

- waiting for MQTT connection ack from the server

- MQTT connected (where the message are published)

I admit mbedTLS is very slow during TLS session startup, because of asymmetric encryption (the MCU doesn't have hw crypto engine). During this period reactivity isn't good. I don't loose chars from RS485 (thanks to interrupts), but a timeout expires on the RS485 remote device that thinks the Ethernet board is dead. Luckily this happens only during TLS startup and the problem can be filtered in a simple way.

Recently I have seen that mbedTLS added a feature to split CPU intensive calculations in smaller steps, exactly to avoid blocking in cooperative systems[*]. I think they implent somewhat similar to a state-machine.

Just to say that this isn't so illogical approach.

[*]

formatting link

Vote

N

Niklas Holsti 6 years ago

So you are making a difference between hard and soft deadlines. Ok. And certainly the "background" task in a foreground-background design can be designed as a master-loop (or major/minor cycle loop) to give shorter response times for this task, resulting in a hybrid design.

I would not call use the term "reactivity" to denote this difference, but I suppose you can if you want to.

No, not "impossible". It depends on your quantitative definition of "good", and on whether you have a single master loop or a major/minor hierarchical loop.

An example application is on its way to orbit Mercury, on-board the Bepi-Colombo spacecraft: the on-board SW for the the MIXS/SIXS X-ray instruments. My colleagues who implemented this SW wanted to avoid a real-time, multi-tasking kernel (for a reason never really explained to me) but they had real-time deadlines to meet. They decided on a one-second major cycle and a one-millisecond time quantum. They made a fixed schedule saying exactly what the SW should execute in each of the

1000 milliseconds in each second. Some jobs are allocated more than one millisecond, so I think they have about 300 "jobs" to execute in sequence in each second. The schedule allocates several millisecond slots, evenly spaced, for jobs with deadlines shorter than one second. (I forget what the shortest deadline is, but they also use interrupts for the very short ones.) The SW works, of course.

Almost all my applications are in that class, so I see them as more than a few.

You still have "multitask issues" between the interrupt handlers and the background tasks.

But I do agree that some applications can be implemented as master-loop or major/minor-cycle loops without excessive complication.

In the MIXS/SIXS example I mentioned above, the designers had to slice several SW loops, which would have taken too long if executed in one job from start to end, into pieces executed in several jobs, with some variables keeping track of how far the loop has progressed.

For myself, I would be more worried about errors in such algorithmic distortions than about bugs from multi-task issues.

A concern for real-time responsiveness of the background part.

Perhaps this is a matter of definition. I would call something a foreground-background system only if it implements _all_ its real-time deadlines (whether hard or soft) in interrupt handlers, and the background task only has to provide enough average throughput to avoid a build-up and overflow of background jobs.

30 tasks is common for me. If I would have to divide them into "non-blocking" operations to be called from a master loop, and limit myself to very simple interrupt handlers, it could grow to more.

There are people in my domain (space SW) who take that approach, too. But it has drawbacks...

I was recently involved in a microsatellite design where the system designers at first intended to have two Linux-based, two-core, high-speed boards with nearly 100 MB of RAM, one for application (image) processing, and the other for system control functions. But when they added up their electrical power needs they found a problem... and the system-control computer was reduced to a Cortex microcontroller with 256 KB of RAM. But it still had to run a CAN-bus and two UART interfaces in parallel, as well as execute a time-tagged schedule, as well as monitor its own health and other data, etc. Python would not do, I think.

...

I would say, if you have mastered writing interrupt handlers, and the interactions between interrupt handlers and background tasks, you will surely be able to master pre-emptive tasking, where one can use high-level primitives (message-passing, rendez-vous, ...) which are often not allowed at interrupt levels.

Good, then. But it seems you do not quantify "good".

I haven't programmed such devices, or any Internet-connected devices on those levels, so I can't comment. My experience is limited to the "socket" level.

With all the security bugs and problems that have been and are being found in network protocols, it seems wrong to add such unnecessary complications in the algorithms and usage.

It is often used, sure, but I always wonder why, given that free, small, efficient, real-time, pre-emptive kernels are so easily available today. I can understand it for systems that are really squeezed for RAM or flash space, but not otherwise.

Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .

Vote

From cooperative to preemptive scheduler: a real example

Join the Discussion

Didn't find your answer?