That latter "job runs on a thread" is precisely the structure I've used, where a "job" is to process an event - and that processing can involve multiple machines made by companies/computers I don't know exist :)
In realtime systems I've never had a case where a thread was spawned for a job, and then discarded. I'd be highly suspicious of any such architecture.
I dislike such architectures; it can be difficult to predict/monitor/log how computations are progressing - or more accurately not progressing. Logging FSM events and FSM states is a very powerful tool, and the mathematicians have spent a lot of time/effort in understanding and modelling their behaviour.
With 2-4 tasks, that is at least 32 bytes of stack per task. This needs to fit the task subroutine return addresses and space for saving task context (such as program counter, index register(s) and accumulator(s)) and additionally local variables used by the ISR. Should be doable, since subroutine parameters can be passed through the remaining 384 bytes.
A simple RTOS just needs a fixed table for each task in priority order in which each element contains the task state and saved stack pointer.
When e.g. the ISR wants to activate a specific task, it simply sets the target task state to READY. After that, scan the task table and find first task in READY state, load the saved stack pointer and execute a return from interrupt from the new stack to restore that task context.
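The task table and scan described above can be sketched in C like this. The names (task_t, activate, schedule) and the table size are made up for illustration; a real kernel would load saved_sp and do the return-from-interrupt on the new stack rather than return an index:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch only: one fixed table entry per task,
   kept in priority order (index 0 = highest priority). */

typedef enum { TASK_WAITING, TASK_READY } task_state_t;

typedef struct {
    task_state_t state;     /* READY or WAITING */
    void        *saved_sp;  /* stack pointer saved at last context switch */
} task_t;

#define NUM_TASKS 4

static task_t task_table[NUM_TASKS];

/* Called (conceptually) from the ISR: mark the target task runnable. */
static void activate(int task_id) {
    task_table[task_id].state = TASK_READY;
}

/* Scan in priority order; return index of first READY task, or -1.
   The real scheduler would then switch to that task's saved stack. */
static int schedule(void) {
    for (int i = 0; i < NUM_TASKS; i++)
        if (task_table[i].state == TASK_READY)
            return i;
    return -1;
}
```

With the table in priority order, "find first READY" is the whole scheduling policy - no queues needed.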
Routines for e.g. sending and receiving messages between tasks are just syntactic sugar :-).
At least this one juror decides in favour of David B.
All programs are state machines, but the "state" can be represented in various ways: as data ("reified") or as control flow ("sequential", "blocking"). Which is better depends on the problem to be solved; usually I end up with a mixture.
Sometimes system constraints force one to use more data-state than is optimal, and the code becomes an awful mess. I have in mind my last project but one, where the SW controls several devices over a MIL-STD-1553 bus, with a cyclic, frame-based schedule, running several sporadically activated concurrent activities, each of which usually requires several carefully timed bus commands and responses, spread over several bus cycles.
The nicest design would dedicate one thread to each such activity; after sending a command, the thread would wait (block) for the response, and would know, from its position in the algorithm, what to do with the response and what command to send next. But system constraints prohibit this number of threads, and thread switches, so all the state of the activities is reified into multiple state machines, activated once on every bus cycle, with switch/case statements that change the global state, increment counters, detect end of loops, retry failed commands, and so on and on. Yuck.
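For flavour, here is a hypothetical sketch of one such reified activity - every name, the state set, and the retry policy are invented for illustration, not taken from the real system. The point is that the position a blocking thread would carry implicitly on its stack is spelled out as data instead, advanced once per bus cycle:

```c
#include <assert.h>
#include <stdbool.h>

/* Invented example: one activity as an explicit state machine. */
typedef enum { IDLE, AWAIT_RESP, DONE, FAILED } act_state_t;

static act_state_t state = IDLE;
static int retries = 0;

/* Stand-ins for the bus operations; the real code would issue
   1553 commands here and read back status words. */
static bool send_command(void) { return true; }
static bool response_ok(void)  { return true; }

/* Called once per bus cycle. */
static void activity_step(void) {
    switch (state) {
    case IDLE:
        if (send_command())
            state = AWAIT_RESP;
        break;
    case AWAIT_RESP:
        if (response_ok()) {
            state = DONE;
        } else if (++retries >= 3) {
            state = FAILED;        /* give up after three bus cycles */
        } else {
            (void)send_command();  /* retry on the next cycle */
        }
        break;
    case DONE:
    case FAILED:
        break;                     /* terminal until re-armed */
    }
}
```

Multiply this by every loop, every timeout, and every nested command sequence in every activity, and the "yuck" above becomes concrete.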
That applies equally in the other direction, of course.
The discussion has been interesting, but I really don't think either you or the paper's author have done yourselves justice here.
I have not seen any clear arguments from you about /why/ you think event-driven threads with state machines are so much better - "proof by repeated assertion" does not wash. I have not seen any counters to my alternative suggestions, nor any solid reasoning why strict event-driven threads with state machines are somehow easier than more flexible solutions. And I certainly have not seen any good argument for why using /one/ tool should be so much better than having that tool as an option amongst several.
I am of the opinion that having more options lets you pick better designs. (I actually think that in a great many cases where people use RTOS's, they would be better with a simpler non-OS design. I am not a believer in "everything should be RTOS".) But despite my clear opinions here, I think I could have come up with more and better arguments in your favour than you did.
Still, as I say, it has been an interesting discussion in many ways, and it is good to see this sort of thing in the newsgroup. It has been too quiet for too long.
I have never designed a chip - though in my teens I once designed key parts of a simple 4-bit cpu with a small instruction set. I drew it all out in two-input NAND gates, on graph paper - including the single-cycle multiplier.
They are a way to get convenient coding structures from very little software or hardware. I haven't had a use for them myself, but I suppose some people use them.
Mine was 6800 based. Just about everything was "suboptimal" - except that it worked and I learned a heck of a lot.
I later designed a single-purpose machine using 2900 bit-slices, but it was never implemented.
That's my (lack of) experience, but the requirement that all context switches are in the top level code (i.e. not a function) doesn't strike me as convenient. I know why it is "necessary", but that doesn't change the inconvenience.
Back in 1982 I was using cooperative multitasking with C threads on a Z80 (or PDP11 or whatever was convenient), with a little bit of assembler to save/restore the stacks and other context. That always seemed pretty natural - provided I used message passing (with timeouts) for large-scale flow control.
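On a modern POSIX host the same cooperative idea can be sketched with swapcontext(), which plays the role of that little assembler routine saving and restoring the stacks - not Z80 code, obviously, just the same shape: each task keeps its position on its own stack and yields explicitly:

```c
#include <assert.h>
#include <ucontext.h>

/* Cooperative switching sketch: swapcontext() stands in for the
   hand-written assembler that saved/restored stack and context. */

static ucontext_t main_ctx, task_ctx;
static char task_stack[16 * 1024];
static int steps = 0;

static void task(void) {
    steps++;                            /* do some work ...          */
    swapcontext(&task_ctx, &main_ctx);  /* ... then yield to main    */
    steps++;                            /* resumed exactly here, with
                                           all locals still intact   */
}

static void run_demo(void) {
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp   = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link          = &main_ctx;  /* where task() returns to */
    makecontext(&task_ctx, task, 0);

    swapcontext(&main_ctx, &task_ctx);  /* run task until first yield */
    swapcontext(&main_ctx, &task_ctx);  /* resume task to completion  */
}
```

The task "blocks" simply by yielding; when resumed it knows from its position in the code what to do next - exactly the property the reified state machines above have to reconstruct by hand.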