For cooperative multitasking, consider also Ethernut. It started on the AVR8, so it has 8-bit targets in mind, and it also includes AVR32 and several ARM ports. I have done a lot of work on the STM32. Since the STM32 peripherals evolved from the STM8 peripherals, work for the STM8 could perhaps profit from it. However, I must admit that my work on the STM32 peripherals didn't have the STM8 in mind.
Adapting Ethernut for SDCC will probably need some work, but the common code already caters for non-gcc compilers.
I will comment that RTOSes tend to use up considerably more memory and clock ticks than the usual "ISRs and a task loop" scheme that you see on
8-bit processors. In this day and age, if you're cost-constrained enough to use an 8-bitter, I would expect you'd be willing to go the extra mile and skip the RTOS, to save the expense of the more capable processor you'd need to host it.
Right you are, although the differences between run-to-completion and cooperative are quite subtle. "Run to completion" is really a strategy based on either a cooperative multitasker or a "big loop" (meaning "no bespoke OS") architecture.
You simply manage it. It's much less horrible than it sounds.
You set things up to where each run through a thread/task/routine must complete within a rigorous time budget.
You can also set things up to where there's global state that indicates that the high priority stuff needs to run.
It all depends on how deterministic you want the system to be. If there's no preemption, you can have the advantage of more deterministic behavior.
And yes - I'd be perfectly happy using (software) interrupts, perhaps triggered by a hardware/timer ISR to enable high-priority processing without the benefit of a kernel.
And rather than a full-on kernel, you might just have something equivalent to the Linux "ucontext" family of calls to separate the background processing from the high-priority stuff. But you still need to get in and get out at all times. If you're *careful*, you can have the "ucontext" switch bring in middleware driven by a timer/hardware interrupt.
Just about anything that can be done with a kernel can be done without one, up to a limit where it just gets too ugly to look at.
In reality, if you can get posix threads working, it's hardly worth worrying about. But on a small 8-bitter, you might need to think differently.
On the 8086/8088/80286/80386, it was not uncommon to fake multitasking on top of DOS, whether with Terminate and Stay Resident programs or just by bodging together a threading library which respected the DOS Critical Section Flag.
This was extremely deterministic, down to the jitter on the various clocks.
"Why on Earth would you do that???" you may ask - well, it was nice to have the programming just about done while the hardware guys did whatever it is they do. And once you got the basic infrastructure in place, it was eminently useful for things like test fixtures, or even products sometimes.
I have successfully used preemptive operating systems for 40 years.
Like Windows 3.x environment :-) :-)
The first thing in RTOS system design is to divide the application into manageable tasks, estimate how long each needs to run, and assign priorities to the tasks.
If there are performance problems with some tasks, never try to increase the priority of the starving task; instead, look for other tasks whose priority can be _lowered_ without suffering too much when interrupted by a short-run-time, high-priority task.
Use a server task for each resource, such as a serial port or a shared data structure, so there is little need for object locking (with its priority inversion etc. issues).
Another middle way that I often use is a three layer system. At the bottom, you have the main loop, which is where you have the longer-term tasks (they can be just as critical as anything else in the system, but not with critical timing). And at the top, you have your interrupts for immediate response (motor control, UART buffers, etc.). So far, that's "standard bare metal".
Now in the middle, you have a layer of "software timers". These are handled from a timer interrupt at a regular rate (such as 100 Hz or 1000 Hz, depending on the requirements and the hardware). The timer interrupt first makes sure that other interrupts are enabled, so that the critical interrupts can break in. Then you run each of the time-dependent or repetitive tasks as frequently as needed, from within the same context.
It makes it easy to have tasks that are shorter and more time-critical than the "background mainloop" tasks, without making them interrupt functions. It also makes it easy to know exactly what can pre-empt what: interrupts can pre-empt software timers and the mainloop, and software timers can pre-empt the mainloop
- but you can never get pre-emption the other way, or between software timer tasks. Sharing data between tasks becomes easy: between software timer tasks, you can use plain global data. Up and down between the mainloop and the software timers or interrupt functions, careful use of "volatile" and perhaps memory barriers is sufficient - no need for locks, queues, and the like.
And no need for multiple stacks, race conditions, deadlock, or the dreaded "it worked fine in all our testing, but fails in the field" challenges of using an RTOS.
You laugh, but Windows 3 was perfectly capable of HRT in the 10s of milliseconds range. On fast 80486, interrupt response to userspace tasks and [cooperative] task scheduling both were < 5us.
Between 1993 and 1996 I was implementing HRT machine vision QC/QA systems on Windows 3.1, running first on 80486 dx4 and then on Pentium. These systems had multiple tasks to perform with hard cycle times ranging from 150ms - 900ms.
There was additional hardware to do image processing heavy lifting [the CPUs were not capable] but that hardware required constant attention: it performed a single vector or kernel operation and then stopped, waiting for instruction. I used multiple processes running under Windows to control multiple hardware sets for different purposes.
The largest system performed inspections on 180 parts (30 each for 6 cameras) every 900ms. It had 6 sets of imaging hardware and 3 I/O boards for interfacing with the part conveyor. Each hardware set consisted of 5 boards: 2 slotted with 3 daughters attached, linked by a private bus. Needed a 20-slot chassis to fit them all. 6 Windows processes performed control and processing, and a 7th provided the GUI for operator control, logging and graphic display of results.
Win3 processes gave way to multi-threading in Win95 and then NT4, and finally the CPUs became capable of handling the image processing and the separate imaging hardware was eliminated.
These systems were sold commercially in the USA and Europe until mid-2004. AIUI, some still are available today from distributors and on secondary markets - the manufacturing equipment they were designed to work with still is in use in parts of the world.
I assume you used the message queue mechanism, in which all messages are sent to a single message queue and then dispatched to a specific "task" for processing. In that case, each "task" had a maximum allowed processing time, and if an operation could not be completed within that time, the "task" was required to submit a new message to the main message queue for the next step of the complex operation. A badly behaving program might not do that, but with all programs under your own control, this should not be a big issue.
This was very similar to early IBM mainframe CICS data entry system and similar features can be seen in web-servers (context saving, cookies) today.
Indeed. Come to think of it I remember doing something on the 6800 with a lot of tsx/txs (for those unfamiliar with the 6800: tsx sort of "transfers" sp to x ) but I am not sure what it was.... May be it was a waterflow/level meter, can't remember.
I did this on the HC11, but there one at least had the Y register and the D accumulator (IIRC - or was D only on the '09? No, I think it was there on the '11). That made things much, much easier than on the 6800, where retrieving an address from the stack and modifying it was outright impractical without going through some SWI call.
Oh yes, that's how I do it even today on smaller MCUs (e.g. the MCF52211, with its 16k RAM - a lot compared to the 512 bytes I had on the HC11, where I did it the same way, of course). I am trying to perceive that 80 MHz 68k part with 16k RAM - which is close to an oldie nowadays - from what must have been my 80s point of view, and I am not sure I manage...
The image processing and I/O boards were interrupt sources. The image hardware was single-step: it performed one operation, generated a completion interrupt, and then stopped waiting for a new instruction. The I/O boards could interrupt on any signal - depending on the environment they would be set to interrupt on part-in-place enables and/or strobe or frame-grab triggers.
I turned those interrupts into Windows messages to effect task scheduling: a task would initiate its next image operation, set a timer and yield waiting for the completion message. Time/operation was calculable by # of pixels and was guaranteed, so if the timer expired the task aborted because that usually meant the (very static sensitive) hardware was confused and would have to be re-initialized.
I reprogrammed the 8253 so that GDI software timers could be set down to 10ms with reasonable accuracy. I didn't care about the clock running fast, although a custom driver could have fixed that.
At the top level, messages were exchanged for configuration, changing modes, etc. - but the tasks performed their cyclic inspection work autonomously.
In Windows 3, tasks had no memory protection - they all were in the same space and could RPC each other's functions directly with little overhead (essentially just a far call segment change). My design exploited that mercilessly. The GUI task actually held all the code to drive the hardware, using it for off-line setup and diagnostic single camera inspections. The on-line display/inspection tasks effectively were userspace "threads" that RPC'd functions of the GUI task to do their work.
These designs ported quite nicely to Win95 with its real threads. Win95 needed a kernel-mode reflection driver for interrupts, but nothing special was needed for timers: Win95 itself set the 8253 to 1ms resolution, and its GDI software timers could reliably be set down to ~16ms, which was quite sufficient for the "watchdog" purpose.
Had to implement a timer driver for NT4, but that was the last OS that hosted the image hardware. Around 2000 the CPUs finally were fast enough to (re)implement using SSE [though some configurations required using a dual processor to achieve their timing].
I think you'd want to rephrase that. If system behaviour is unaltered by task priorities what are they for? I believe I can guess what you mean but I am not sure it is obvious enough to everyone.
Say you lower the priority of the task tracking the mouse pointer; below some point this will inevitably become noticeable. OTOH, as upsideddownunder suggests, the right thing is to set its priority very high knowing it takes a minute amount of system time so it always does get it when it needs it at an acceptable latency.
It certainly can be that - and it probably is that in the vast majority of cases. But the generalization is just wrong.
But we can never eliminate priority from our lives no matter what we do, and the same goes for our systems. They may be designed with fixed priorities, but the priorities are still there.
I personally do have use for varying priorities under DPS - for example, in the netMCA there are various tasks, some of which use a lot of the CPU resource; e.g. the filtering task can take up well above 50% of it, depending on the incoming signal. Then there are the VNC server tasks, which check the entire "display" framebuffer memory for changes, then compress and send when changes are detected; then there are the ethernet/IP inbound tasks, which have to cope with all the traffic... And then there is the user I/O, which must remain quick enough at all times regardless of the incoming signal, the network activity, and the disk activity. You simply have to set the priorities right for everything to work smoothly, and they are not equal for all tasks. Of course the DPS scheduler is a completely different animal from what you may know from Windows.
If you mean changing priorities within reason could kill the system I would agree with you I suppose.
Not many things, if any, are better than simplification, but simplification is bound by limits like anything else. Try to simplify the netMCA I talked about above down to equal priorities and it will become a lot more complex overall. Two priorities - maybe, though more likely not; and how much simplification really comes out of processing one bit vs. 16 bits for a priority? Once you have a good tool, you just use it.
One of my rules of thumb is to ban fiddling with priorities in hard real-time systems - because it usually is just fiddling. When pushed I'll accept "high priority" and "normal priority", and can eventually be forced to concede a little more flexibility.