Task priorities in non-strictly real-time systems

I have always worked on non-real-time systems, i.e. systems where the reaction to an event should occur within a reasonable time. For example, when the user presses a button, a motor should start rotating. Whether the motor starts after 100 us or 100 ms is not so important.

I have never used an RTOS, so the architecture of my firmware is based on the "superloop" technique (background tasks running in a loop, plus interrupts).

while(1) { task1(); task2(); ... }

None of the tasks ever blocks. As a rule of thumb, I accept a maximum blocking time of 100 us to 1 ms. When a task would need to block for longer, I try to implement it as a state machine, avoiding the blocking.
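For example, a "start motor, run for 500 ticks, stop" job can be recast as a state machine so the task never blocks the superloop. A minimal portable sketch (the names motor_task(), motor_start() and the tick source are made up for illustration):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical tick source, normally incremented by a timer ISR. */
static volatile uint32_t g_ticks;
static uint32_t get_ticks(void) { return g_ticks; }

/* Instead of "start motor; delay 500 ticks; stop motor" (which would
 * block the superloop), the wait becomes a state with a deadline. */
typedef enum { MOTOR_IDLE, MOTOR_RUNNING } motor_state_t;

static motor_state_t state = MOTOR_IDLE;
static uint32_t deadline;
static bool motor_on;

void motor_start(void)
{
    motor_on = true;
    deadline = get_ticks() + 500;   /* run for 500 ticks */
    state = MOTOR_RUNNING;
}

/* Called from the superloop; returns immediately in every state. */
void motor_task(void)
{
    if (state == MOTOR_RUNNING &&
        (int32_t)(get_ticks() - deadline) >= 0) {  /* wrap-safe compare */
        motor_on = false;
        state = MOTOR_IDLE;
    }
}
```

The signed subtraction makes the deadline comparison safe across tick-counter wrap-around.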

The ISRs are very lightweight: they only set/get some flags or push/pop a byte to/from FIFO queues. With 32-bit MCUs (modern Cortex-M devices), I can change 32-bit variables (mainly the system tick count) in ISRs without worrying about the race conditions that would occur on 8-bit MCUs when the background tasks access the same variables.
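The ISR/background split described above can be sketched as a single-producer/single-consumer byte FIFO: on a single core, with each index written by only one side, no lock is needed. A hypothetical sketch, not taken from any library:

```c
#include <stdint.h>
#include <stdbool.h>

/* Minimal SPSC byte FIFO: the ISR only pushes, the background task
 * only pops, so each index has exactly one writer and no lock is
 * needed on a single core.  Size must be a power of two so the
 * modulo stays cheap.  All names are illustrative. */
#define FIFO_SIZE 64u

static volatile uint8_t  fifo_buf[FIFO_SIZE];
static volatile uint32_t fifo_head;   /* written only by the ISR  */
static volatile uint32_t fifo_tail;   /* written only by the task */

bool fifo_push(uint8_t b)             /* called from the ISR */
{
    if (fifo_head - fifo_tail >= FIFO_SIZE)
        return false;                 /* full: drop the byte */
    fifo_buf[fifo_head % FIFO_SIZE] = b;
    fifo_head++;
    return true;
}

bool fifo_pop(uint8_t *b)             /* called from the superloop */
{
    if (fifo_head == fifo_tail)
        return false;                 /* empty */
    *b = fifo_buf[fifo_tail % FIFO_SIZE];
    fifo_tail++;
    return true;
}
```

The free-running indices (never wrapped explicitly) make full/empty detection unambiguous.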

In the past I have used this architecture with success even in medium-complexity systems featuring Ethernet, lwIP, mbedTLS, USB and a touchscreen (emWin).

Honestly, I think this architecture is good enough for all non-real-time systems, so I don't understand why an RTOS should be used in those cases. However, I need to use an RTOS (FreeRTOS) for the next project, because it is one of the requirements. It isn't a real-time system, but the RTOS is required.

I think I can convert my architecture to an RTOS by creating a task for each of the functions I call in the superloop and then starting the OS scheduler. However, now the task function must not return, so I can write it in the following way:

void task1_main(void) { while(1) { task1(); } }

task1() can be the *same* function as in the superloop architecture.

I can assign each task the same priority: in this case, FreeRTOS will use round-robin scheduling, giving all the tasks the same opportunity to run.

Is it correct?


RTOSes have their advantages and disadvantages. They can make it easier to guarantee particular timing requirements for high-priority tasks, but harder for low-priority tasks. They can make it easier to write individual tasks, but harder to write efficient inter-task data sharing. They can make it easier to modularise and separate the code, but harder to debug.

An RTOS is /not/ necessary for real-time coding. Conversely, an RTOS can be useful even when you don't need real-time guarantees.

You might be better off using cooperative scheduling and:

void task1_main(void) { while(1) { task1(); taskYIELD(); } }

With cooperative scheduling, you know exactly when the current task can be changed - it can happen when /you/ want it to, due to a yield or a blocking OS call. With pre-emptive scheduling, you will have to go through your existing code and make very sure that you have locks or synchronisation in place for any shared resources or data.

Reply to
David Brown

... which can complicate the task's logical design, of course. That is the penalty you pay for the sequential-super-loop design.

And the same (state-machine) approach has to be used if a task contains some long sequential computation that must be divided into shorter stages (in which case the state-machine approach can have a much worse impact on the task's design). Again, that is the price...
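As an illustration of dividing a long computation into shorter stages: a long checksum over a big buffer can be cut into bounded chunks, one chunk per superloop pass. A hypothetical sketch (all names are made up):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* A long computation (here: summing a large buffer) divided into
 * bounded stages so that each superloop pass stays short. */
#define CHUNK 256u

static const uint8_t *job_data;
static size_t job_len, job_pos;
static uint32_t job_sum;

void sum_start(const uint8_t *data, size_t len)
{
    job_data = data;
    job_len = len;
    job_pos = 0;
    job_sum = 0;
}

/* One bounded step; returns true (and the result) when finished. */
bool sum_step(uint32_t *result)
{
    size_t n = job_len - job_pos;
    if (n > CHUNK)
        n = CHUNK;
    for (size_t i = 0; i < n; i++)
        job_sum += job_data[job_pos + i];
    job_pos += n;
    if (job_pos < job_len)
        return false;           /* more work on the next pass */
    *result = job_sum;
    return true;
}
```

This is essentially what the mbedTLS "incremental" option mentioned below does for its crypto computations.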


A much simpler solution is to create *one* task that contains the *whole* superloop.

An RTOS does not *require* multiple tasks.

Probably not.

If your task() functions have some interactions through data (variables), your superloop design probably assumes that the functions are called in a certain order, or at least that they do not pre-empt each other.

I am not very familiar with FreeRTOS, but to get round-robin scheduling of tasks of the same priority you probably have to insert "yield" points in the tasks (as David Brown's message explains) or you have to enable a pre-emptive time-slicing round-robin scheduling.

Neither approach gives you a predictable ordering of the task calls, and the time-slicing approach moreover allows the functions to pre-empt each other, which probably messes up the data-flow of your program.

If the superloop design gives you sufficient performance and reactivity, just encapsulate it in a single task.

Niklas Holsti 
Tidorum Ltd 
Reply to
Niklas Holsti

This is why I prefer to code in an HDL where parallel tasks all run concurrently with no swapping of memory or sharing of resources. Life is so much simpler that way and systems are so less trouble prone.

  Rick C. 

Reply to
Rick C

When you have a priority-based pre-emptive kernel available, why put everything at a single priority and run round-robin between the tasks? Some RTOSes do not support round-robin at all; each task must have a different priority.

Designing an RT application is not hard.

First you check how time-critical each task is and how long it executes. For non-critical or long-execution-time tasks, assign a priority level as _low_ as possible. When most long-execution-time tasks have received some low priority, there are only a few tasks left that will execute at high priority. Conceptually an interrupt service routine is just a task with the highest priority, although of course the ISR implementation is different.

If there is a task that is time-critical but has a long execution time, it is better to split it into two: drop the priority of the long-execution-time part, and the remaining time-critical but short-execution part can then have a high priority.

Most of the time, tasks should just be waiting for some event, such as an external event or a message from another task. In a typical RTOS application, often all tasks are just waiting for an event and hence consume no CPU power. Now and then a task becomes runnable, does its job and then starts to wait for a new event.

On rare occasions a low-priority task becomes runnable. After a while a high-priority task becomes runnable and pre-empts the low-priority task; the high-priority task does its job and waits for a new event, and the interrupted low-priority task is resumed and runs to completion.

This is quite similar to systems that allow nested interrupts. Of course nested interrupts can be avoided by moving the long-execution ISR work into a high-priority RTOS task.


It's been a couple of years since I used FreeRTOS. I downloaded the latest version recently but have not had time to look at it. FreeRTOS gives you multiple ways to control a task - semaphores, timers, message queues. The main entry point of your firmware is going to set up the various tasks and then start the scheduler. The scheduler will never return.

The scheduler is going to start each task. The task will do some initialization, if needed, and then sit in its loop. The preemptive scheduler can be a little tricky: you have to watch for task critical sections. If tasks have a natural blocking point on a semaphore, timer or queue, you can use the non-preemptive scheduler. Tasks can also yield. If I remember correctly, there are separate APIs for signalling semaphores and queues from within an ISR - you have to use the correct ones. An ENET ISR can do some housekeeping and signal the semaphore for the ENET driver task. Blinking an LED is a simple matter of the task setting the LED on, waiting on a timer, setting the LED off, and waiting on the timer again. FreeRTOS takes care of all the work of deciding when to wake up the task waiting on the timer.

There are arguments for and against RTOSes, but you said you have to use FreeRTOS. I think you will find it easy to use... BUT... read the docs and have them handy when working on your code.

Reply to
Joe Chisolm

I would likely make the loop:

while(1) { task1(); taskYIELD(); }

so that after task1() finishes what it was doing, you automatically roll on to the next task rather than looping task1() for a full tick period, then switching to task2(), and so on. If the tasks naturally block/yield within themselves, then that isn't as necessary.

Also, at that point you can undo all of the state-machining that was there just to return and then resume where you left off, which may make the code clearer.

The one difficulty is that, as has been mentioned, when two (or more) tasks share data, you need to add the needed protection to the accesses, as they can now be effectively simultaneous. (Not truly simultaneous on a single-core processor, but you can switch from one task to another at virtually arbitrary points.)

As others have said, you can perhaps start with a single task written as before, and as you find the reasons they wanted an RTOS, move just that part into a separate task, maybe at a higher priority level.

Reply to
Richard Damon

Yes, I agree upon everything.

You're right, cooperative scheduling is better if I want to reuse the functions from the superloop architecture (which is itself a cooperative scheduler).


Yes, converting a blocking task into a non-blocking state-machine task can be hard, but it's also complex to write tasks under a preemptive scheduler (you need to know when to use locks, semaphores, mutexes and so on).

Yes, but this isn't a very frequent situation, at least in my experience.

I remember a similar case with mbedTLS where a session setup took a very long time, because I didn't use a hardware crypto engine. For this reason, the mbedTLS developers added a macro to enable an "incremental" calculation (I don't remember now the exact name they used). In practice the big computation is divided into chunks.

Oh yes, this is another solution. However I'd like to migrate to a "full" RTOS approach step by step, and starting with the same tasks as the superloop should help along the way.

One after the other, I could work on each task and convert it into a "standard" task that blocks, if needed.

Only with a cooperative scheduler.

This is the "default" FreeRTOS configuration.

See my arguments against this approach above.


Because assigning different priorities to tasks while avoiding deadlocks could be tricky for me, since I have no experience with RTOSes. And because I don't really have higher-priority tasks or hard real-time requirements.

FreeRTOS supports round-robin in the preemptive scheduler when multiple tasks with the same priority keep running for too long (see configUSE_TIME_SLICING).

Yes, that is the theory. I have to apply the theory in practice, and practice could hold some surprises.

I am worried about deadlocks that could occur very infrequently, maybe when the product is in the field.


On 03/01/2020 15:19, David Brown wrote:

As I have already written many times, I don't have experience with RTOSes and task synchronisation mechanisms such as semaphores, locks, mutexes, message queues and so on. So I'm not able to tell when synchronisation is really needed.

Could you point me to some good, simple material to study (online or a book)?

For example, I often have a serial channel where some data are received. A frame parser decodes the "wire data" into variables, which are accessed by other tasks.

while (1) {
    serial_task();  // frame receiver/parser
    main_task();    // uses variables touched by frame parser
}

Supposing all the variables are of type int (i.e., they are changed atomically in serial_task()), do I need to protect them with locks, given that they are used by main_task() too?

I think a lock isn't needed, unless main_task() needs to see a coherent set of values across all the variables (all variables with new values, or all with old values).
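When a coherent set is needed, one common pattern is to copy all the variables into a local snapshot inside a short critical section. A sketch under the assumption that enter_critical()/exit_critical() would disable and restore interrupts (or take an RTOS mutex); here they are empty stubs so the fragment stays self-contained:

```c
#include <stdint.h>

/* Variables written by the frame parser (serial_task()). */
typedef struct { int32_t speed; int32_t position; } frame_data_t;
static volatile frame_data_t live;     /* updated by the parser */

/* Stubs: on a real target these would disable/restore interrupts,
 * or lock/unlock an RTOS mutex. */
static void enter_critical(void) { }
static void exit_critical(void)  { }

/* main_task() copies the whole set inside one critical section and
 * then works on the coherent local snapshot at its leisure. */
frame_data_t read_snapshot(void)
{
    frame_data_t snap;
    enter_critical();
    snap.speed    = live.speed;
    snap.position = live.position;
    exit_critical();
    return snap;
}
```

The critical section is only as long as the copy, so the blocking time stays bounded and tiny.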


I wish I could, but it is actually a frightfully difficult subject. Basically it's the same as thread-safe programming. Only about 1% of programmers think they can do it. Of those, only about 1% actually can.

It's the 0.99% that you have to worry about. At least some of them worked for Toyota. Don't be one of them!

However, this difficulty is precisely why Rust was created. Although I haven't yet done a project in Rust, I've done enough multi-threaded work in C++ to know that the ideas in Rust are a massive leap forwards, and anyone doing this kind of work (especially professionally) owes it to their users to learn it.

If "int" is your CPU's word size, you are using word alignment, and you don't have multiple CPUs with separate caches accessing the same RAM, you're probably OK for individual variables. However, you will come unstuck if you expect assignments and reads to be performed in the same order you wrote them. A modern compiler will freely re-order things in extremely ambitious and unexpected ways in order to keep the pipeline flowing.

I cannot emphasise this enough. The compiler will do what it can to make your program do what it thinks you have asked for - which will NOT be the same as what you think you have asked for.

You need to understand about basic mutex operations, preferably also semaphores, and beyond that to read and write barriers (if you want to write lock-free code). It's a big subject.

Clifford Heath.

Reply to
Clifford Heath

I have found it simplest to have tasks communicate by message passing, the so-called "CSP model" (communicating sequential processes), rather than fooling around with explicit locks. With locks you have to worry about lock inversion and all kinds of other madness, and your main hope of getting it right is formal methods, like Lamport used for the Paxos algorithm. Message passing incurs some cpu overhead because of the interprocess communication and context switches, but it gets rid of a lot of ways things go wrong.

If your RTOS supports message passing (look for "mailboxes" in the RTOS docs) then I'd say use them.
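As an illustration of the hand-off pattern (an RTOS would provide this ready-made - FreeRTOS, for instance, via its queue API), here is a hypothetical one-deep mailbox in portable C:

```c
#include <stdbool.h>

/* A one-deep mailbox in the spirit of an RTOS mailbox/queue.  This
 * portable sketch only shows the ownership hand-off; a real RTOS
 * version would also block/wake the tasks.  All names are made up. */
typedef struct { int payload; } msg_t;

static msg_t mbox;
static volatile bool mbox_full;

bool mbox_post(const msg_t *m)        /* producer task */
{
    if (mbox_full)
        return false;                 /* receiver hasn't caught up */
    mbox = *m;
    mbox_full = true;                 /* publish only after the copy */
    return true;
}

bool mbox_fetch(msg_t *m)             /* consumer task */
{
    if (!mbox_full)
        return false;
    *m = mbox;
    mbox_full = false;                /* release the slot */
    return true;
}
```

Because each side only ever writes the flag in one direction, the ownership of the slot is always unambiguous.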

The language most associated with CSP style is Erlang, which doesn't really fit on small embedded devices, but Erlang materials might still be a good place to learn about the style. Erlang inventor Joe Armstrong's book might be a good place to start:

formatting link

At the much lower end, you could check out Brad Rodriguez's articles about Forth multitaskers:

formatting link

and related ones at

formatting link

Reply to
Paul Rubin

You may need some double buffering in one form or another.

Assume you have a receiver byte buffer that can hold a full serial message, and a structure of integers that will receive the values decoded from the message.

When the serial task notices the end of a message, it immediately decodes the values into the integers in the struct. After this, the serial byte buffer is ready to start receiving the next message. The serial task can then inform the main task that new data is in the integer structure, and the main task can copy it to local variables.

Alternatively, if a serial byte buffer is not used but the received bytes are decoded into the integer fields on the fly, then a copy of the struct may be provided: e.g., after the last integer has been decoded, put the complete struct into a mailbox, if the RTOS provides mailbox support.

In both cases the main task has a full message transfer time to process a message before it has to process the next serial message. If the main task is incapable of processing the messages in time, then the program is faulty, at least in the hard real-time sense.
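The double-buffering idea above can be sketched as two decoded-message buffers with an index swap at end-of-frame, so the parser fills one while the main task reads the other. All names are illustrative:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Two decoded-message buffers: the serial task fills bufs[fill_idx]
 * while the main task reads the other one; they swap at end-of-frame. */
typedef struct { int32_t value[4]; } decoded_t;

static decoded_t bufs[2];
static int  fill_idx;                 /* buffer the parser writes into */
static volatile bool ready;           /* a complete message is waiting */

decoded_t *parser_buffer(void) { return &bufs[fill_idx]; }

void parser_message_done(void)        /* end of frame seen */
{
    ready = true;                     /* hand the filled buffer over */
    fill_idx ^= 1;                    /* parser moves to the other one */
}

const decoded_t *main_take_message(void)  /* NULL if nothing new */
{
    if (!ready)
        return NULL;
    ready = false;
    return &bufs[fill_idx ^ 1];       /* the buffer just completed */
}
```

If the main task can fall a full message behind, a deeper queue (or the mailbox above) is needed instead of just two buffers.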

Using a volatile declaration and turning off optimization will help. Better yet, use small assembler routines to have full control of the actual memory accesses.

At least with a small microcontroller, you can simply disable interrupts for a critical section. Of course the critical section must behave like a real interrupt: limit the number of instructions and do not call any library routines.


All good points.

"Safe" languages like Rust can help with simple issues, but won't give any benefit in the more challenging cases. If you understand the basics of multi-threading, and have a good, careful development methodology, you won't have the kind of problems that Rust would help you with. Maybe Rust will help in some cases, but don't believe that it is a game-changer.

Yes. Simple reads and writes of aligned data that is no bigger than the cpu's word size will be atomic without any more effort. But complex accesses (like "x++;") are not atomic on most processors. And you don't have any ordering unless you use "volatile", or memory fences of some kind.

A key mistake many people make is to think that non-volatile accesses are also ordered by volatile accesses - this is, of course, untrue.
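A minimal sketch of the correct way to publish data through a flag: using C11 release/acquire atomics for the flag orders the surrounding plain accesses too, which volatile alone does not guarantee.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* With a plain "volatile bool" flag, the compiler (and the CPU) may
 * reorder the non-volatile write to "data" relative to the flag, so a
 * consumer could see the flag set before the data is valid.  C11
 * release/acquire atomics make the hand-off well defined. */
static int data;                          /* plain, non-volatile */
static atomic_bool flag;                  /* zero-initialized: false */

void producer(int value)
{
    data = value;
    /* release: everything written above is visible to any thread that
     * later sees the flag with an acquire load */
    atomic_store_explicit(&flag, true, memory_order_release);
}

bool consumer(int *out)
{
    if (!atomic_load_explicit(&flag, memory_order_acquire))
        return false;
    *out = data;                          /* guaranteed to see value */
    return true;
}
```

The same pattern works between an ISR and a background task, or between two RTOS tasks.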

Well, it /will/ be the same as you think you asked for when you know what you are doing!


Or he can use cooperative multitasking, and avoid many of these issues!

Reply to
David Brown

Turning off optimisation is /never/ the answer! (Barring buggy compilers, of course.)

If your code "works with optimisation disabled" but not with it enabled, your code is /wrong/. In over 25 years in this business, I have never seen an exception.

Remember, there is no such thing as "disabling optimisations" - compilers can re-arrange code and apply whatever transformations they like, according to the C standards, with a total disregard for your choice of optimisation settings. These settings are guidelines, not part of the semantics of the language - the language and the freedoms the compiler has do not change (unless your compiler specifically documents the changes).

And even if you think it is a "workaround" that is good enough for now, you are creating a maintainability nightmare. Or worse - you are creating something that works fine during your testing and fails when deployed.

"Volatile", when used correctly, can be helpful.

Assembly routines for memory accesses are usually a bad idea - inefficient, inflexible and error-prone.

If you want a simple and relatively fool-proof system, all you really need are two functions (preferably inline) :

interrupt_status_t disableGlobalInterrupts(void);
void restoreGlobalInterrupts(interrupt_status_t old_status);

These must both act as full memory fences.

Then you can put whatever code needs atomic behaviour within a critical section bracketed by these functions.
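As a usage sketch: reading a 64-bit tick counter on a 32-bit MCU is a classic case that needs such a critical section, because an interrupt between the two word reads could yield a torn value. The disable/restore pair is stubbed here so the fragment compiles standalone; a real port would save and restore the interrupt mask:

```c
#include <stdint.h>

typedef uint32_t interrupt_status_t;

/* Stubs standing in for the target-specific pair described above. */
static interrupt_status_t disableGlobalInterrupts(void) { return 0; }
static void restoreGlobalInterrupts(interrupt_status_t s) { (void)s; }

/* A 64-bit tick counter incremented by a timer ISR on a 32-bit MCU:
 * reading it takes two word accesses, so it must be read inside a
 * critical section to avoid a torn value. */
static volatile uint64_t tick_count;

uint64_t get_ticks_atomic(void)
{
    interrupt_status_t old = disableGlobalInterrupts();
    uint64_t t = tick_count;
    restoreGlobalInterrupts(old);
    return t;
}
```

Saving and restoring the old status (rather than unconditionally re-enabling) makes the critical section safe to nest.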

You need more work if you have other memory masters (DMA, second processor, etc.).

Reply to
David Brown

Yes - message passing (whether asynchronous with queues, or synchronous with CSP style) is often a lot easier to get right than complicated locking mechanisms.

I've worked indirectly with Erlang (I made the microcontroller half of the system, in C, while someone else wrote the Linux half in Erlang). I was not impressed - he spent a lot of time figuring out things that should have been very simple. It is just one sample point, of course, and not enough to condemn a whole language - but it does mean Erlang is not high on my "languages to learn when I have time" list.

Far and away the most popular "CSP language" is Go, as I understand it. Another option is XC for XMOS devices, but that is hardware-specific.

Reply to
David Brown

The problem is the C standard, or actually the language lawyers (in most languages), who do not have an understanding of multithreading or multiprocessors.

On many hardware platforms, even getting a memory-location increment or decrement as a single instruction performing a read/modify/write memory access cycle is quite hard.

Even if you can't get an atomic R/M/W cycle, there are often similar tricks, e.g. using the lock prefix on x86.

Is this standard C in some recent standard variant ?

This has a lot to do with cache coherence.




Another language with CSP-style primitives is Ada (the "rendez-vous" feature), although AIUI most embedded Ada programs currently being implemented use the alternative "monitor"-like primitives (the "protected object" feature), which can be used to implement critical regions, or CSP-like message passing, or buffered (queued) message passing, or for many other styles.

Niklas Holsti 
Tidorum Ltd 
Reply to
Niklas Holsti

The main point of C11 is support for multi-threading and multi-processor systems. The standards, and the language lawyers, /do/ understand it.

The more advanced and progressive compilers support C11. Many embedded ones do not, but that is the fault of the compiler vendors, and perhaps of developers who don't realise that they should be insisting on it.

The big missing feature, however, is that you need an implementation of some of the functions in the C11 threading libraries, and the implementation must fit the OS in use. That's not too hard for Linux or Windows, but a different world in embedded systems. Still, it should be possible to make C11 library support for FreeRTOS, mbed, and any other RTOS you like.

Key points like atomics, fences, and language semantics for multi-threading are in place.
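For instance, the non-atomic "x++" hazard discussed earlier becomes a single well-defined operation with <stdatomic.h>, portably, without target intrinsics. A minimal sketch:

```c
#include <stdatomic.h>

/* C11 atomics give a portable atomic read-modify-write: safe to call
 * from an ISR and a background task (or from multiple threads)
 * without a critical section. */
static atomic_uint counter;           /* zero-initialized */

void isr_increment(void)
{
    /* relaxed ordering is enough for a plain event counter */
    atomic_fetch_add_explicit(&counter, 1u, memory_order_relaxed);
}

unsigned read_counter(void)
{
    return atomic_load_explicit(&counter, memory_order_relaxed);
}
```

On targets without native atomic instructions, the compiler/library typically implements these by briefly disabling interrupts, so the source stays portable either way.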

(And C++ is more helpful in providing higher level multi-threading features.)

So we are far from having nice multi-threading integration in C toolchains, but not nearly as far as you suggest.

/Some/ hardware platforms that let you do "x++" as a single instruction on memory, do so atomically. Many others do not. Typically, they are atomic on small 8-bit CISC microcontrollers. On larger processors, you rarely get such instructions at all (they don't exist on any kind of RISC cpu). And even when you /do/ get them, they may be implemented by multiple separate actions. Perhaps they are atomic with respect to other code on the same core (such as interrupts), but not with respect to DMA or other cores.

So this kind of thing can sometimes be acceptable on target-specific code for small microcontrollers, but not otherwise.

You do that using intrinsics or C11/C++11 atomics. You certainly don't do it with "volatile".

No C standard covers interrupts, or ways to disable and enable them - that is highly target-specific.

For example, on the ARM Cortex-M, you might use:

#include "core_cmFunc.h"

typedef uint32_t interrupt_status_t;

static inline interrupt_status_t disableGlobalInterrupts(void)
{
    interrupt_status_t old = __get_PRIMASK();
    __disable_irq();
    return old;
}

static inline void restoreGlobalInterrupts(interrupt_status_t old)
{
    __set_PRIMASK(old);
}

If you don't want to use the ARM core functions, you can use inline assembly - but that is compiler specific. For gcc, that would be:

typedef uint32_t interrupt_status_t;

static inline interrupt_status_t disableGlobalInterrupts(void)
{
    interrupt_status_t old;
    asm volatile ("mrs %0, primask" : "=r" (old));
    asm volatile ("cpsid i" : : : "memory");
    return old;
}

static inline void restoreGlobalInterrupts(interrupt_status_t old)
{
    asm volatile ("msr primask, %0" : : "r" (old) : "memory");
}

C11 provides standard support for a memory barrier, but since you need compiler-specific code for the implementation anyway, you might as well use the compiler-specific memory barrier.

That is one aspect, yes. But it is not the only one. For example, bigger processors can have write buffers with re-ordering.

Reply to
David Brown
