There are all sorts of preemptive kernels, RTOSes, and frameworks out there, open source and commercial. There are priority-driven preemptive systems, time-sliced preemptive systems, and others.

But preemptive systems have their costs, namely context switching time and context storing RAM. Even cooperative coroutines have context switch and space overhead.

On the other end of the scale is pure cooperative multitasking, as exemplified by the classic "super loop":

    while (1)
    {
        task1();
        task2();
        /* ... */
        task99();
        task100();
    }

In most cases, each of these tasks calls some lower level routine (timer, communication interface, whatever) to see if there's actually any work for it to do. If not, it just returns. The advantage, of course, is that there is no context overhead, either space or time.

The super loop can be detrimental to response time if an event comes in for a task just after it has had its turn in the loop. The task won't see its event until after every other task has been called, whether the other tasks really have any useful work to do or not.

What I need, and I am planning to develop, is a cooperative task scheduler to replace the super loop. Inter-task communication, and ISR-to-task communication, will go through a kernel to set event flags, timer flags, put messages in queues, etc., and the kernel will track which tasks are actually ready because they have pending events. After any task returns, the scheduler will execute the highest priority ready task.

My question is, does anyone know of any kernel, RTOS, whatever, that's implemented this way? I'd look at the documentation and API details for any that exist, not their source code.

Knowing whether or not this wheel has already been invented will be handy when I reinvent it.

-- Jack Klein
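A minimal sketch of the kind of scheduler described above, assuming one ready bit per task and fixed priorities (slot 0 is the most urgent). All names here (`sched_register`, `sched_post`, `sched_run_once`, the demo tasks) are hypothetical and not taken from any existing kernel; it only illustrates "run the highest priority ready task after any task returns".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 8u   /* hypothetical limit: one ready bit per task */

typedef void (*task_fn)(void);

static task_fn task_table[MAX_TASKS];   /* index 0 = highest priority */
static volatile uint8_t ready_mask;     /* bit n set => task n has a pending event */

/* Register a task at a fixed priority slot. */
void sched_register(unsigned prio, task_fn fn)
{
    assert(prio < MAX_TASKS);
    task_table[prio] = fn;
}

/* Called by tasks or ISRs to mark a task ready. On a real target this
 * read-modify-write must be made atomic, e.g. by briefly masking interrupts. */
void sched_post(unsigned prio)
{
    if (prio < MAX_TASKS && task_table[prio] != NULL) {
        ready_mask |= (uint8_t)(1u << prio);
    }
}

/* Run the highest-priority ready task to completion.
 * Returns the priority that ran, or -1 if nothing was ready. */
int sched_run_once(void)
{
    for (unsigned prio = 0; prio < MAX_TASKS; prio++) {
        uint8_t bit = (uint8_t)(1u << prio);
        if (ready_mask & bit) {
            ready_mask &= (uint8_t)~bit;   /* consume the event before running */
            task_table[prio]();
            return (int)prio;
        }
    }
    return -1;
}

/* Demo tasks: each appends a letter to a trace buffer. */
char trace[8];
size_t trace_len;

void task_hi(void) { if (trace_len < sizeof trace) trace[trace_len++] = 'H'; }
void task_lo(void) { if (trace_len < sizeof trace) trace[trace_len++] = 'L'; }
```

The main loop would call `sched_run_once()` repeatedly and could idle or sleep whenever it returns -1. Because the scan restarts from slot 0 after every task returns, a newly posted high priority task waits for at most one task body instead of a full pass of the super loop.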

=== SNIP ===

Salvo?

-- Dan Henry

Has this wheel already been invented?


Stephen Pelc 18 years ago

Tee, hee. But actually we did write the manual in Forth. The manual is in the form of formal comments in the source code. These are then extracted by our DocGen tool and processed to generate PDF and HTML versions of the manual.

We're big fans of literate programming.

Stephen

Stephen Pelc, stephenXXX@mpeforth.com MicroProcessor Engineering Ltd - More Real, Less Time 133 Hill Lane, Southampton SO15 5AF, England tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691 web: http://www.mpeforth.com - free VFX Forth downloads


Mark Borgerson 18 years ago

I can't imagine any but the simplest system doing serial I/O without interrupts. Even my simplest 'super loops' just look in the queue for the data that the interrupt handler has put there.

The standard problem with preemptive RTOSs on small systems generally comes down to the amount of memory needed to save the machine state for each task. On an ARM, that could mean stacking and restoring 64 bytes plus whatever is used for local variables for each task.

A super loop, with a combined variable stack, need only save and restore a smaller subset of the registers as each function is called, so the memory footprint is probably smaller.

My most recent project combined the super loop and a 'scheduler' interrupt to handle some moderately complex interleaved data collection:

The main loop looked for input in the serial Input queue and processed commands as needed. It also checked for input data in the input queues and processed and stored that data as needed.

Serial I/O was handled with standard interrupt-driven queues.

Analog input was collected and time stamped by a state machine called from a 50Hz timer interrupt. The state machine broke the ADC collection into a series of request-data and read-data operations. The data collected was put into a statically allocated memory queue, from whence it was fetched by the main loop.
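As an illustration of the ISR-to-main-loop queue described here, a minimal single-producer, single-consumer ring buffer might look like the sketch below. The sample layout, the queue depth and the function names are assumptions, not details from the project above. It relies on the ISR being the only writer of `head` and the main loop being the only writer of `tail`; on cores that reorder memory accesses, barriers would also be needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 16u   /* hypothetical depth; power of two so wrap is a mask */

typedef struct {
    uint32_t timestamp;  /* tick count when the sample was taken */
    uint16_t value;      /* raw ADC reading */
} sample_t;

static sample_t queue[QUEUE_SIZE];   /* statically allocated */
static volatile uint16_t head;       /* written only by the ISR (producer) */
static volatile uint16_t tail;       /* written only by the main loop (consumer) */

/* Producer side: call from the timer ISR. Returns false if the queue is full. */
bool queue_put(sample_t s)
{
    uint16_t next = (uint16_t)((head + 1u) & (QUEUE_SIZE - 1u));
    if (next == tail) {
        return false;                /* full: drop rather than overwrite */
    }
    queue[head] = s;
    head = next;                     /* publish only after the data is stored */
    return true;
}

/* Consumer side: call from the main loop. Returns false if the queue is empty. */
bool queue_get(sample_t *out)
{
    assert(out != NULL);
    if (tail == head) {
        return false;
    }
    *out = queue[tail];
    tail = (uint16_t)((tail + 1u) & (QUEUE_SIZE - 1u));
    return true;
}
```

One slot is deliberately left unused so that "full" and "empty" can be told apart without a shared counter, which means this queue holds at most `QUEUE_SIZE - 1` samples.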

The longest-running state of the timer ISR took about 400 microseconds---or about 2% of the 20mSec between interrupts.

ADC collection was put in the timer ISR to maintain jitter-free data collection, as writing the data to a CF card can sometimes take more than 20mSec at those times when the FAT needs updating. Converting and displaying a subset of the values in floating-point format also sucked up the odd millisecond here and there! ;-)

This was on a 68K processor at 16MHz. Memory size was not an issue. The lack of an easily ported RTOS for the particular hardware did mandate some variation of the super loop approach.

Mark Borgerson


Mark Borgerson 18 years ago

When I'm in the mood to save power, I generally turn down the CPU clock speed in the main loop and turn it back up when there is data to be moved.

I've also used the clock interrupt to allow the system to save power like this:

    while (!done) {
        SleepUntilClock();      // low power sleep until clock tick
        while (DataAvailable()) {
            GetInput(&mydata);
            ProcessOutput(&mydata);
        }
    }

For this type of loop, you probably want a clock that ticks more than once per second! You also need interrupt-driven input handlers that are smart enough to go to sleep when they are done with the input. This approach is probably not suitable for very high speed data collection---where the collection takes more than 10 to 20% of the CPU bandwidth.

Mark Borgerson


Steve at fivetrees 18 years ago

Er, yeah. See

formatting link

;). I'm practically an expert ;) ;).

Eeep. That does indeed sound like a PITA.

The "discipline" thing about cooperative multitasking is that each task/process must relinquish control back to the superloop/roundrobin every N milliseconds, where N is the worst-case latency you want other tasks to have. Most of the time, this can be tens of milliseconds or less; but even this means that lengthy processes have to be broken up into state machines. This is a Good Thing. It means that you have to think more in system terms than in "it's all mine" terms.
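To make "lengthy processes have to be broken up into state machines" concrete, here is a hedged sketch of a long-running job (summing a buffer) split into bounded slices. The job, the chunk size and every name are made up for illustration; the idea is only that each call does a fixed amount of work and then returns to the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 32u   /* hypothetical: bytes per call, sized to fit the latency budget N */

typedef enum { JOB_IDLE, JOB_RUNNING, JOB_DONE } job_state_t;

typedef struct {
    job_state_t    state;
    const uint8_t *data;
    size_t         len;
    size_t         pos;    /* how far the job has progressed so far */
    uint32_t       sum;
} checksum_job_t;

/* Arm the job; no work is done here. */
void job_start(checksum_job_t *j, const uint8_t *data, size_t len)
{
    assert(j != NULL);
    j->state = JOB_RUNNING;
    j->data  = data;
    j->len   = len;
    j->pos   = 0;
    j->sum   = 0;
}

/* Do one bounded slice of work. Returns true once the whole job has finished. */
bool job_step(checksum_job_t *j)
{
    assert(j != NULL);
    if (j->state != JOB_RUNNING) {
        return j->state == JOB_DONE;
    }
    size_t end = j->pos + CHUNK;
    if (end > j->len) {
        end = j->len;
    }
    while (j->pos < end) {
        j->sum += j->data[j->pos++];
    }
    if (j->pos == j->len) {
        j->state = JOB_DONE;
    }
    return j->state == JOB_DONE;
}
```

The task body called from the superloop would simply call `job_step()` once per pass, so no single call ever holds the CPU for longer than the time needed to process `CHUNK` bytes.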

However, counting cycles sucks. Someone really made a poor architectural decision on that one. Please don't let that bad experience colour your thinking.

Steve


Steve at fivetrees 18 years ago

Wot he said. Mark and I clearly could work together quite happily ;).

More seriously: no, I don't mean that I'd put delays *anywhere*. What I meant was that I'd ensure that higher-priority tasks couldn't DoS tasks which, although lower priority, still need to be run regularly. And, as Mark said, things like serial and "heartbeat" (timer) interrupts make life much rosier.

Steve


msg 18 years ago

I am currently working on a DSP with a deterministic cycle-counting, time-slicing hard-realtime o/s; it does have provisions for 'background' non-deterministic processes, but the hard-realtime architecture is actually a refreshing change for me. I enjoy knowing the timing precisely for every task and every state vs. the complex considerations of timing in a pre-emptive system. I have no worries about jitter or vagaries of ISR timings. It is a different paradigm for me but it is elegant and simple.

FWIW, I would have liked to program the Singer System-10 which also was a hardware round-robin, hard interval real-time architecture.

Regards,

Michael


Mark Borgerson 18 years ago

For sure. I suppose that I'm set in my ways since I was teaching the advantages of interrupts to CS students in '84 and using them as part of all my projects since before then.

Two of the most important things I do with a new processor are:

1. Get interrupt-driven serial I/O working.

2. Get the timer ISR working to give me a regular interrupt---usually with an LED blink to confirm its operation.

#1 was a bit of a challenge on the AT91SAM7 series, since I wanted to use the DMA capabilities of the chip.

#2 is a key element if you need jitter-free data acquisition. (However truly jitter-free operation can require a thorough knowledge of interrupt priorities and masking if there is a lot of data moving through the serial port implemented in #1.)

This approach fills a large void between a foreground-only polling loop and a true RTOS. It's worked well for me for a couple of decades, but I'm not totally resistant to change. I have the Micro-C/OSII book and disk on the shelf waiting for a few free weeks.

Mark Borgerson


David R Brooks 18 years ago

An interesting machine, indeed. More data at

formatting link


Ulf Samuelsson 18 years ago

The hairiest thing I worked on was processing an incoming 2 Mbit E1 containing 32 timeslots of 8 bits on a 16 bit HPC16400E. (Basically a 3 MIPS machine)

The program used 32 tasks, each implementing a state machine. No buffering involved, so you had ~4 us to complete the task, including the task switch.
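The "~4 us" budget follows from the E1 frame structure: 8000 frames per second, each carrying 32 timeslots of 8 bits, gives 2.048 Mbit/s and 125 us per frame, so each timeslot lasts 125 / 32, or about 3.9 us. The small helper functions below (hypothetical names, written only to check the arithmetic) reproduce those figures and show that a 3 MIPS machine gets roughly 11 instructions per timeslot, task switch included.

```c
#include <assert.h>
#include <stdint.h>

/* Bit rate implied by the frame structure. */
uint32_t line_bit_rate(uint32_t frames_per_sec, uint32_t slots, uint32_t bits_per_slot)
{
    return frames_per_sec * slots * bits_per_slot;
}

/* Time available per timeslot in nanoseconds, rounded down. */
uint32_t slot_time_ns(uint32_t frames_per_sec, uint32_t slots)
{
    assert(frames_per_sec != 0u && slots != 0u);
    return (uint32_t)(UINT32_C(1000000000) / (frames_per_sec * slots));
}

/* Whole instructions that fit into one timeslot at a given instruction rate. */
uint32_t instr_per_slot(uint32_t instr_per_sec, uint32_t slot_ns)
{
    return (uint32_t)(((uint64_t)instr_per_sec * slot_ns) / UINT32_C(1000000000));
}
```

With `frames_per_sec = 8000` and `slots = 32`, `slot_time_ns()` gives 3906 ns, and `instr_per_slot(3000000, 3906)` gives 11.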

That used a Round Robin scheduler.

Best Regards, Ulf Samuelsson This is intended to be my personal opinion which may, or may not be shared by my employer Atmel Nordic AB


Ulf Samuelsson 18 years ago

    while (TRUE) {
        while (!(task_state && ANY_COMPUTABLE)) {
            sleep();    /* wait for interrupts */
        }
        cur = get_next_task();
        (cur->task)();
    }

is faster than

    while (TRUE) {
        task00(); task01(); task02(); task03(); task04();
        task05(); task06(); task07(); task08(); task09();
        ...
        task97(); task98(); task99();
    }

and is not pre-emptive, and does not need any context switch. Only tasks that need to be run are run.

The former is especially effective when you run the task with a variable frequency. I.e., you could use a task to process an incoming data stream over a serial port, a character at a time. You have no control over when the task needs to be run. The interrupt routine will just receive the data into a buffer and make the task computable, keeping the interrupt routine short. Other tasks set a S/W timer which in some cases needs to trigger the task at high frequency, and in other cases needs to trigger the task at low frequency.
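One possible way to get the "software timer makes the task computable" behaviour is sketched below. This is an assumption about how such a mechanism could be built, not the code from the system described; `timer_set`, `timer_tick`, `make_computable` and `take_next_computable` are hypothetical names. A task reprograms its own period at run time, which is what lets the same task run at a high rate in one phase and a low rate in another.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TASKS 4u   /* hypothetical task count */

static volatile uint8_t computable;        /* bit n set => task n should run */
static uint16_t timer_reload[NUM_TASKS];   /* period in ticks, 0 = timer off */
static uint16_t timer_count[NUM_TASKS];    /* ticks left until the next wake-up */

/* A task (re)programs its own wake-up period; 0 disables its timer. */
void timer_set(unsigned task, uint16_t period_ticks)
{
    assert(task < NUM_TASKS);
    timer_reload[task] = period_ticks;
    timer_count[task]  = period_ticks;
}

/* Call from the periodic tick ISR. */
void timer_tick(void)
{
    for (unsigned t = 0; t < NUM_TASKS; t++) {
        if (timer_count[t] != 0u && --timer_count[t] == 0u) {
            computable |= (uint8_t)(1u << t);
            timer_count[t] = timer_reload[t];   /* restart for the next period */
        }
    }
}

/* Call from e.g. a UART receive ISR after the character has been buffered. */
void make_computable(unsigned task)
{
    assert(task < NUM_TASKS);
    computable |= (uint8_t)(1u << task);
}

/* Scheduler side: fetch and clear the lowest-numbered computable task, or -1. */
int take_next_computable(void)
{
    for (unsigned t = 0; t < NUM_TASKS; t++) {
        uint8_t bit = (uint8_t)(1u << t);
        if (computable & bit) {
            computable &= (uint8_t)~bit;
            return (int)t;
        }
    }
    return -1;
}
```

On real hardware the updates to `computable` outside the ISR would need to be protected against the tick and receive interrupts.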

The simple round robin strategy sucks if you try to do the thing above.

Best Regards, Ulf Samuelsson This is intended to be my personal opinion which may, or may not be shared by my employer Atmel Nordic AB


Paul Carpenter 18 years ago

MOST of my comments below are aimed at the majority of the contributors to this thread.

Ulf just happens to be the one being followed up, instead of multiple repeats.

Assumption more generalisation

All embedded must save that much power, as the unit may be mains powered or has to be running at all times.

All architectures and applications will not have a problem with 'sleep' or 'halt' instructions stopping half of the peripherals on their microcontroller.

This implementation *can* also be used for round robin scheduling as no doubt the tasks exist in some list. Depending on application and architecture even the sleep can be incorporated.

get_task could be any form of task list(s) scanning.

It does not exactly look that preemptive to me.

Compared against the worst example of round robin scheduling, that I have never seen actually implemented.

Round Robin actually covers many different schemes, including multi layered and time delayed.

ALL TASK scheduling involves some form of list of tasks which have to be scanned

That can be done even on round robin.

If the architecture supports the low frequency without delays and potential character loss, when changing frequencies.

That is true of any asynchronous input to a system, want to try it with multiple video standards analog and digital format simultaneously.

So you are basically describing most regular timer driven events that may or may not be directly interrupt driven.

That is an implementation I have never seen implemented on many systems.

The simplest round robins have time delays of how many time ticks or I/O wait state checks have to be done. Often I implement round robin as two tables

Table 1 task start points

Table 2 Time count and flag (flag is positive for active 0 for stopped -ve for errored status)

This way any tasks needing a state machine, have the flag variable for the state machine 'counter'.

At each time tick the priority is sorted by order in the list, if this time tick requires the task to be processed (count match and valid flag), then the matching task entry is called.
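The two-table arrangement could be sketched roughly as follows. This is one reading of the description, with assumptions labeled: the "count match" is implemented as a per-task countdown, each task receives a pointer to its own flag so it can double as a state counter, and the three demo tasks and their periods are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TASKS 3u

/* Each task gets its flag so it can use it as a state machine counter. */
typedef void (*task_fn)(int8_t *flag);

typedef struct {
    uint16_t period;   /* run every 'period' ticks (must be >= 1) */
    uint16_t count;    /* ticks left until the next run */
    int8_t   flag;     /* > 0 active (and state), 0 stopped, < 0 errored */
} task_ctl_t;

unsigned runs[NUM_TASKS];   /* demo bookkeeping only */

static void task_a(int8_t *flag) { (void)flag; runs[0]++; }
static void task_b(int8_t *flag) { runs[1]++; if (++*flag > 3) *flag = 1; }  /* cycles states 1..3 */
static void task_c(int8_t *flag) { runs[2]++; *flag = -1; }                  /* reports an error */

/* Table 1: task start points, in priority order. */
static const task_fn task_entry[NUM_TASKS] = { task_a, task_b, task_c };

/* Table 2: time count and flag for each task. */
task_ctl_t task_ctl[NUM_TASKS] = {
    { 1u, 1u, 1 },
    { 2u, 2u, 1 },
    { 4u, 4u, 1 },
};

/* Call once per time tick: scan in list order and run whatever is due. */
void rr_tick(void)
{
    for (unsigned i = 0; i < NUM_TASKS; i++) {
        task_ctl_t *c = &task_ctl[i];
        assert(c->period != 0u);
        if (c->flag <= 0) {
            continue;               /* stopped or errored tasks are skipped */
        }
        if (--c->count == 0u) {
            c->count = c->period;
            task_entry[i](&c->flag);
        }
    }
}
```

After eight ticks in this demo, `task_a` has run eight times, `task_b` four times, and `task_c` once before marking itself errored and dropping out of the rotation.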

It could be possible to use the sleep/halt wait if time was left over, but often this adds unnecessary complexity.

THE ONLY DIFFERENCE SHOWN
==========================

So far everyone has gone on about power saving, claiming that it is the scheduling scheme that delivers it. Which is complete bunkum!!

The power saving is all about the idle task and if it can execute halt or sleep modes.

This may or may NOT work depending on

Application

Architecture

If, for example, nothing happens on the processor but the bus needs to be arbitrated for DMA, or other inbuilt peripherals still need to run, most architectures require the CPU to be in AWAKE mode to perform this. Many architectures do not actually have a HALT instruction.

Paul Carpenter | paul@pcserviceselectronics.co.uk PC Services GNU H8 & mailing list info For those web sites you hate


Ulf Samuelsson 18 years ago

From Wiki:

"Round-robin is one of the simplest scheduling algorithms for processes in an operating system, which assigns time slices to each process in equal portions and in order, handling all processes without priority"

--------------------------------------------

This is the same definition I am using. If anyone uses another definition of Round-Robin, then arguing with me on pros/cons is a waste of time, because we are talking about two different things.

Best Regards, Ulf Samuelsson This is intended to be my personal opinion which may, or may not be shared by my employer Atmel Nordic AB


Robert Adsett 18 years ago

I would have considered both of the above round robin, non preemptive, run to completion systems.

Robert

Posted via a free Usenet account from http://www.teranews.com


Ed Prochak 18 years ago

Some things I believe need to be made clear:

There is the multitasking model: preemptive versus cooperative,

and there is the scheduling algorithm: priority driven and round robin are two examples.

I am trying to get some thoughts together on this (and been back to work so time for those thoughts is limited). I'll post something soon.

Ed


mhahn 18 years ago

Take a look at csRtos. Originally developed for the atmel AVR by Glen Worstell, I ported a version to the PIC (16xxxx and 18xxxx families under CCS and Microchip C). Uses a lot less resources than a preemptive task switcher and allows for task priorities. It's a lot like the Salvo OS but has the big advantage of being free. The biggest restriction is you can't call OS subroutines from a subroutine, only from the task level, but that's not too hard to work around. I like using it because you wind up with simple linear code instead of large messy state machines (no flamewars please :-) , I still like state machines).

If interested, google for csRtos. You should find Glen's original article, source code for the PIC I posted to the CCS sourcecode forum, and AVRfreaks seems to have a more advanced version.

Mark


whygee 18 years ago

oh s*it, it looks like i have reinvented the wheel once again, but this time i figured out the hub :-)

more precisely, i (still) have to write some firmware for some machine, around a venerable 16F877 (with ALL the inherent limitations, grumblgrumbl) so a home-made "multi threaded kernel" is needed. I have a "working" system (proof of concept) with 3 independent threads, using less than 400 instruction words and i have no state machine anywhere :-) it's round-robin (though a thread could be added and removed dynamically, since the threads' linked list is in RAM), cooperative (NOT preemptive, but some IRQs communicate with some buffers) and i use many assembler tricks and lots of macros to make the code as small as possible, yet useful. I have a 1KHz (soft-)timer that can wake up threads. However, my application does not need priorities so i didn't implement them.

More interestingly, i have overcome one painful limitation. Any thread can call a routine which can yield(), and this routine can also be executed by another thread : there is some support for code reentry. Here i quote from the csRtos source :

> //1) Operating System calls that may result in task switches can occur
> // only at the top level of a task. That is, you cannot make os calls
> // from subroutines called within a task.

so i have addressed that. My technique (which is maybe already used by others ?) works in asm and not C (or the compiler would DIE). In fact, i use dirty hacks to circumvent PIC16F's single hardware stack, at the expense of some code bloat.

Finally, my client accepted that i release this multithreaded kernel to the public (because it can be split easily from the rest of the application). So i'll certainly post it on my website when it's ready.

happy hacking,

yg


vinnie 18 years ago

formatting link

I second both of these recommendations.

You can get full source code for the SST (now called QK) at the web site run by one of the authors of the article (Miro Samek)

formatting link
