C++ threads versus PThreads for embedded Linux on ARM micro

On Wednesday, August 1, 2018 at 2:46:08 PM UTC-4, snipped-for-privacy@downunder.com wrote:

d already in the 1970's to implement real time systems under RSX-11 on PDP-11. Later on these principles were also used on real time systems under RMX-80 for 8080 and similar kernels.

I recommended Cummings's article only because most of the participants of this discussion seem to be firmly in the "sequential programming with sharing and blocking" camp. For this traditional way of thinking, the full immersion into the reactive programming principles might be too big of a shock. Cummings arrives at these principles by trial-and-error, intuition, and experience. He also does not call the concepts by strange names like "active object (actor) pattern" or "reactive programming".

But, yes, absolutely. The various pieces of reactive programming are being re-invented all the time. For example, even in this discussion, people recommended using heavyweight processes instead of lightweight threads. This recommendation is valuable, because it addresses two of the best practices: true encapsulation for concurrency and message-driven communication and synchronization. Heavyweight processes run in separate address spaces, which makes sharing of resources hard and the only ways of communication are pipes or sockets.

But the main point is that we don't need to re-invent this wheel anymore. The best practices of concurrent programming have been well known for decades (e.g., the "Real-Time Object Oriented Modeling (ROOM)" book was published in 1994). The most amazing thing is that the principles are so little known and are still being questioned and dismissed.

Reply to
StateMachineCOM

Yes, and many people had independently come to the same conclusion much earlier than that!

There was once an observation that C/C++ papers/publications tended to refer only to other C/C++ papers/publications, whereas those for other languages tended to refer to many different languages/environments. An implication is that there is a tendency for C/C++ practitioners to reinvent wheels.

I certainly noticed the phenomenon when I read Gosling's Java whitepaper. He repeatedly said X has been proven in M and N, and Y has been proven in P, and X and Y work together harmoniously.

Reply to
Tom Gardner

On Wednesday, August 1, 2018 at 2:46:08 PM UTC-4, snipped-for-privacy@downunder.com wrote:

d already in the 1970's to implement real time systems under RSX-11 on PDP-11. Later on these principles were also used on real time systems under RMX-80 for 8080 and similar kernels.

I recommended Cummings' article only because most of the participants of this discussion seem to be firmly in the "sequential programming with sharing and blocking" camp. For this traditional way of thinking, the full immersion into the reactive programming principles might be too big of a shock. Cummings arrives at these principles by trial-and-error, intuition, and experience. He also does not call the concepts by strange names like "active object (actor) pattern" or "reactive programming".

But, yes, absolutely. The various pieces of reactive programming are being re-invented all the time. For example, even in this discussion, people recommended using heavyweight processes instead of lightweight threads. This recommendation is valuable, because it addresses two of the best practices: true encapsulation for concurrency and message-driven communication and synchronization. Heavyweight processes run in separate address spaces, which makes sharing of resources hard and the only ways of communication are pipes or sockets.

But the main point is that we don't need to re-invent the wheel anymore. The best practices of concurrent programming have been tried in all sorts of systems, researched, published and taken several steps beyond the RMX-80. The architecture has been extended and combined with objects and modern state machines decades ago (e.g., the "Real-Time Object Oriented Modeling (ROOM)" book was published in 1994). The most amazing thing is that the principles are so little known and are still being questioned and dismissed.

Reply to
StateMachineCOM

I can't speak for anyone else in this discussion, but I am in the "Sequential programming with sharing and blocking is /one/ way to handle things, but not the only way. Indeed, there /is/ no single right way" camp.

He also gets something /seriously/ wrong. His model of "a typical thread" on page 4 is completely incorrect - and his whole argument breaks down because of that.

/Some/ threads are event driven services - they wait for messages coming in, process them, and then go back to waiting for another message. This is basically a cooperative multi-tasking system - the sort of thing we had to use before multi-threading and multi-tasking OS's were suitable for small embedded systems. (It's also what we had in Windows for long /after/ every other "big" OS was multi-tasking.)

Threads like that can be a very useful structure, and can be very efficient ways to model certain types of problem. They are excellent for handling user interaction, and are thus very popular for the main thread in GUI programs on desktops. They are also marvellous as pretty much the only thread (barring perhaps timers and interrupts) on many embedded systems - in particular, you know that when the thread is in the "Process message 1" part, it can't be in the "Process message 2" part, and thus you can avoid a great deal of locking or other synchronisation.

But a key point is that a thread like that should avoid (or at least minimise) any blocking or locking, and usually it should avoid any code that takes a long time to run (possibly by breaking it up, using a state machine).
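To make that structure concrete, here is a minimal sketch of such an event-driven service thread in portable C++ (the Message type and all the names are mine, for illustration only - not from the paper):

    #include <condition_variable>
    #include <mutex>
    #include <queue>

    struct Message { int type; int payload; };   // illustrative message format

    class ServiceThread {
        std::queue<Message> q_;
        std::mutex m_;
        std::condition_variable cv_;
    public:
        void post(const Message& msg) {           // called from other threads
            { std::lock_guard<std::mutex> lk(m_); q_.push(msg); }
            cv_.notify_one();
        }
        void run() {                              // the thread body
            for (;;) {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return !q_.empty(); }); // the ONLY blocking point
                Message msg = q_.front(); q_.pop();
                lk.unlock();                      // don't hold the lock while working
                dispatch(msg);                    // must return quickly - no blocking here
            }
        }
    private:
        void dispatch(const Message& msg) {
            switch (msg.type) {  /* "Process message 1", "Process message 2", ... */
            default: break;
            }
        }
    };

The single wait at the top of run() is the structure under discussion; everything below it is expected to be short and non-blocking.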

Other threads can have completely different structures. In particular, they /can/ have blocking operations because it is fine for them to block - the blocking operations are part of the normal flow of the sequential processes that are clear in the code and easy to track. That might mean you have more threads than you would otherwise need, and a corresponding loss in overall system efficiency, but that is the price you pay for better modularisation and clearer code flow. And you can have a range of different types of synchronisation - using whatever makes the best balance between efficiency, clear coding, and provably correct synchronisation (i.e., you can be absolutely sure you have no deadlocks, livelocks, etc.).

The "problem" described on pages 5-6 of the article stem from two errors by the author. One is to mix up the structures and responsibilities of the threads - he has thread A being an event-driven service thread that uses significant blocking in part of its processing. The second problem is that he has no clear picture of the control flow and the interaction between threads, resulting in circular dependencies. If you want an efficient (and, for hard real time, correct) multi-threaded system you have to know what your threads are, how they communicate, who waits for what, and what deadlines you have - and arrange things so that you don't have unnecessary waits.

The various incorrect "solutions" then given are clearly problematic, as he says - but he misses the obvious one, which is to restructure the threading and communication. For example, a better solution might be to introduce a new thread C whose job is to handle the work currently done in thread A in response to message 1. Thread A's handling of message 1 thus becomes "trigger thread C", and all the work - including blocking if that is the most convenient solution - is done in thread C. Thread A can go back to handling events. But for some reason, the author (and many other authors on such topics, IME) does not think of adding another thread.

He also discusses the challenges around the callback from thread B: "Additional challenges can arise if, for example, Thread A is in the middle of processing a large collection of data when the callback executes, and if the response from B includes an update to a portion of that collection of data."

Proper structuring of responsibilities and encapsulation of the data and code acting on it would mean that is not allowed - or that it can be done in such a simple atomic way that there is no problem.

Really, most or all of the problems he sees could be solved by saying "/Design/ your multi-threaded system carefully, and don't be afraid of splitting tasks into new threads" rather than coding first and thinking later, programming by trial and error, and using hacks instead of re-structuring as needed.

His solution to restructure all threads to wait on multiple queues is certainly one possibility, and can be the "best" for some problems. But it is not without its significant costs in complexity, especially if you assume it is the /only/ way to structure your threads. He helpfully gives some examples of these. But then he says "Confronting such complications is what often motivates people to take shortcuts. In-line blocking, for example...". No, confronting such complications is what should motivate people to realise this is not "one size fits all", and there are better ways to handle the problem than shoe-horning everything into your favourite fad model.

Exchanging your toolbox full of hammers for a toolbox full of screwdrivers does not make for good software development.

Pipes and sockets are most certainly /not/ the only way of communicating between processes - there are about a dozen other methods, depending on details of the OS. But certainly the stricter separation between processes, compared to threads, means you use more heavyweight synchronisation and communication mechanisms, you are less likely to "cheat", and you are more likely to have a clear modularisation and separation of tasks. The cost, of course, is more overheads and lower efficiency on a small system.

The most amazing thing to me is that so many people think they know the "best" method.

Reply to
David Brown

The method just describes how real time (i.e. with an upper limit for execution time) is usually done. Most (often over 99 %) of the time each thread/task/process is in the wait state, waiting for an event (such as a message queue, event flag etc., depending on the implementation). When an external event, such as a serial line or clock interrupt, occurs, some task is activated, possibly sending messages etc. to other tasks. When done, each task returns to the wait state. Usually there is only one runnable task at a time. If more tasks become runnable at the same time, they are executed in priority order. Very nice and clean.

Windows NT 3.5 and later was a full blown priority based multitasking system with a close resemblance to RSX-11 and VMS.

Processor capability is no excuse for not having priority based multitasking. In the 1970/80's practically every company using 8-bitters had their own RT kernels. The Intel RMX-80 for the 8080/85 might be familiar to some in this NG.

I usually put the (l)user interface in the null task (lowest priority task), so whatever the user does, it doesn't affect the really important parts of the system.

Think about the various cases as interrupt service routines. Compare this with a multiline serial card with a single interrupt. You first determine which subsystem caused the interrupt and handle it accordingly. Of course, the ISR should then be handled quickly, as should the various message processing. The return from interrupt is then comparable to going back to the wait state at the top of the loop.
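As an illustration (the register addresses, names, and layout here are invented, not from any real card), the demultiplexing ISR for such a multiline card might look like:

    enum { NUM_LINES = 4 };

    /* Hypothetical memory-mapped registers of a multiline serial card. */
    volatile unsigned char* const IRQ_STATUS =
        (volatile unsigned char*)0x4000;          /* one pending bit per line */
    volatile unsigned char* const LINE_DATA[NUM_LINES] = {
        (volatile unsigned char*)0x4001, (volatile unsigned char*)0x4002,
        (volatile unsigned char*)0x4003, (volatile unsigned char*)0x4004
    };

    extern void enqueue_rx(int line, unsigned char ch); /* hand off to a task's queue */

    void serial_isr(void)
    {
        unsigned char pending = *IRQ_STATUS;      /* which line(s) caused the interrupt? */
        for (int line = 0; line < NUM_LINES; line++) {
            if (pending & (1u << line)) {
                unsigned char ch = *LINE_DATA[line]; /* read the byte (acks the line) */
                enqueue_rx(line, ch);             /* keep it short; processing happens elsewhere */
            }
        }
    }   /* return from interrupt = back to the "wait state" */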

If you need complicated sequential processing, such as in the (l)user interface, put it into the null task.

Yes, this is a "self-inflicted headache" (hangover) :-)

I also wondered why.

What is the point of having a queue priority, if there is already a state machine for Msg1 processing ?

Callbacks from some system service make sense (i.e. translated interrupts), but I have seldom seen callbacks from one application to another.

I tend to limit the number of different tasks to 10, so that I can keep track of them with my fingers. Using more than that would also require me to use my toes, and my colleagues might not appreciate it if I took off my socks :-)

Reply to
upsidedown

Most of the time, threads (other than an idle or background thread) will be waiting, yes. But it is completely wrong to suggest that they will necessarily be waiting at the top of the thread function's outer loop, as his structure implies.

Yes, I understand how threading works. I also understand that it is /not/ a requirement that waiting threads are sitting in their outer loop waiting for a new message - they can make blocking calls at any point in their execution. What the author calls a "typical thread" is only one possible thread structure, and is certainly not a typical one.

I know. But in the days of Win 3.x, most "big" OS's were multi-tasking - Windows (and, perhaps, MacOS - I am not as familiar with that) was the exception. Other systems were running *nix variants, VMS, or other multi-tasking OS's. Windows did not catch up until NT.

I disagree. Multi-tasking OS's require a certain overhead of RAM and processing power. If you only really need to do one thing, or if the requirements can be stacked in a simple linear form (background task, with layers of clearly prioritised interrupts on top), then such a system makes much more efficient use of resources. And a cooperative multi-tasking system fills the middle ground between features and efficiency, if it is good enough for the requirements.

Certainly you /can/ make a multitasking RTOS for even the smallest of processors. That does not make it a sensible choice.

That is often a good idea!

Yes, that is clear - and like the ISR, such event-driven threads should process the messages quickly or hand them off to other threads when long-running work is needed. These other threads, however, do /not/ have to be structured as event-driven with message queues as their point of entry.

You may want the reply from B to be handled quickly, at higher priority than ordinary messages coming in to A. If so, you would either have a second queue that has a higher priority (but wait on both), or use a priority queue as your main event queue into A. Having a single priority queue is the most flexible as long as all your messages will work with it.
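If the RTOS does not provide one, a single priority queue of that kind is easy to sketch in C++11 (all the names here are invented for illustration):

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <vector>

    struct Event { int priority; int type; };

    struct ByPriority {                       // higher number = more urgent
        bool operator()(const Event& a, const Event& b) const {
            return a.priority < b.priority;
        }
    };

    class PriorityEventQueue {
        std::priority_queue<Event, std::vector<Event>, ByPriority> q_;
        std::mutex m_;
        std::condition_variable cv_;
    public:
        void post(const Event& e) {
            { std::lock_guard<std::mutex> lk(m_); q_.push(e); }
            cv_.notify_one();
        }
        Event wait() {                        // thread A's single blocking point
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return !q_.empty(); });
            Event e = q_.top(); q_.pop();     // most urgent event first
            return e;
        }
    };

Thread B's reply is then posted with a high priority value, so it overtakes ordinary messages already waiting for A.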

No one talked about callbacks between "applications" (do you mean "processes" here?).

There is a lot of sense in limiting a system - it should be as simple as possible, but no simpler. An artificial limit is not helpful, however, and encourages the kind of restricted "solutions" suggested by the author of that paper. If adding a new thread simplifies the structure of your program, then use it - even if you have to take your socks off.

Reply to
David Brown

[T]hey /can/ have blocking operations because it is fine for them to block - the blocking operations are part of the normal flow of the sequential processes that are clear in the code and easy to track.

There is no denying that a sequential solution to a sequential problem is the simplest and most efficient. For example (which I provide here for the benefit of the whole NG), if you have a thread that must handle a sequence of events ABC, you might hard-code the sequence in the following pseudo-code:

wait_for_semaphore_signaling_evt_A();
process_evt_A();
wait_for_queue_signaling_evt_B();
process_evt_B();
wait_for_evt_flag_signaling_evt_C();
process_evt_C();

But the problem is that *most* real-life problems are NOT sequential. So, in the example above, later in the development cycle it might become clear that the thread also needs to handle a (perhaps rare) sequence of events ABBC. At this point, the thread has to be completely re-designed, perhaps in the following way:

wait_for_semaphore_signaling_evt_A();
process_evt_A();
wait_for_queue_signaling_evt_B();
process_evt_B();
do {
    evt_type = wait_for_queue_signaling_evt_B_or_C();
    switch (evt_type) {
        case B: process_evt_B(); break;
        case C: process_evt_C(); break;
    }
} while (evt_type != C);

At this point, the thread structure becomes a "hybrid" of sequential and "event-driven". Specifically, B can be followed by another B or C, which requires a more generic OS mechanism to wait for both B and C *simultaneously* (which most likely is a queue rather than an event flag). Moreover, downstream of the generic wait for both B and C, the code needs to check which one actually arrived (hence the 'switch').

The main point is that people (including Dr. Cummings) have observed that sequential code almost always degenerates that way, so they propose a simple, generic thread structure that is *flexible* enough to accommodate any event sequence, which is the event-loop structure. Cummings' article stops at this, but of course real-life threads must "remember" certain event sequences, because the behavior depends on it. For example, if you build a vending machine, events A, B, and C might represent "product selection", "payment", and "product dispense". The sequence is obviously important and other sequences should not be allowed (e.g., AC -- selection and dispensing, but without payment). Here is where state machines come in, but this discussion is perhaps for another time.
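To make the vending-machine point concrete, here is a minimal sketch (the state and event names are invented) of how a state machine inside such an event-loop thread rejects illegal sequences such as AC:

    enum State     { IDLE, SELECTED, PAID };
    enum EventType { SELECT, PAY, DISPENSE };     /* the A, B, C of the example */

    static State state = IDLE;

    void handle_event(EventType e)                /* called from the event loop */
    {
        switch (state) {
        case IDLE:
            if (e == SELECT)   state = SELECTED;  /* A is the only legal event here */
            break;
        case SELECTED:
            if (e == PAY)      state = PAID;      /* B must follow A */
            break;
        case PAID:
            if (e == DISPENSE) state = IDLE;      /* C completes the cycle */
            break;
        }
        /* any other sequence (e.g. "AC" - dispensing without payment) falls
           through and is simply ignored */
    }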

two errors by the author. One is to mix up the structures and responsibilities of the threads...

This is a misunderstanding of the main premise of the article. Having found a generic thread structure (event-loop), Cummings (implicitly) assumes that this, and only this, generic structure is allowed. He then moves on to explaining how to use the thread structure correctly and how NOT to use it incorrectly. The event-driven thread structure is so superior to sequential code that he doesn't even consider that someone might still revert to the "old". It is a bit like how a programmer once exposed to structured programming will typically not consider going back to GOTOs.

we had to use before multi-threading and multi-tasking OS's were suitable for small embedded systems. (It's also what we had in Windows for long /after/ every other "big" OS was multi-tasking.)

This might be another misconception, which might be coming from the historical baggage surrounding event-driven systems. The generic thread structure recommended in Cummings' article *combines* preemptive multitasking with an event-driven thread structure. Threads don't need to "cooperate" to yield the CPU to each other. Instead, any longer processing in a low-priority thread can be preempted (many times if need be) by higher-priority threads. This is determined by the preemptive kernel and its scheduling policy, without any explicit coding on the application developer's part.

Indeed, if you stick with the "sequential programming based on shared-state concurrency and blocking", there is no "best" method. You need to devise the structure of each and every thread from scratch, carefully choosing your blocking mechanisms (semaphore vs. queue vs. event-flags vs. select, etc.). You then need to worry about race conditions and carefully apply mutual exclusion. The blocking threads tend to be unresponsive, so to evolve the system you need to keep adding new threads to be able to handle new event sequences. This proliferation of threads leads to more sharing, because now two threads that wanted to be one need to share large data structures.

Alternatively, you can choose to work at a *higher level of abstraction*, with encapsulated event-driven threads (active objects). The threads wait generically on an event queue at the top of the loop and don't block otherwise. This allows the threads to remain *responsive*, so adding new events is easy. This also means that such threads can handle much more functionality than sequential threads. This reduces the need for sharing (the "share nothing" principle). And finally, this thread structure offers a high-enough level of abstraction, and the *right* abstraction, to apply event-driven state machines, graphical modeling, code generation, and other such modern programming techniques.

Reply to
StateMachineCOM

The overhead doesn't have to be big. Of course, in a pre-emptive system, there must be a private stack for each thread. In addition to that, very little overhead is required. I once worked with a small kernel for the 6809 where, in addition to the stacks, there were 3 bytes of bookkeeping (thread status and saved stack pointer) for each thread.

All you need is the ability to have stacks in RAM, plus instructions for loading and storing the stack pointer from/to memory.
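A sketch of that bookkeeping in C-flavoured code (this shows the idea, not the actual 6809 kernel; the two stack-pointer stubs are assumed to be a couple of lines of assembly, which in reality would also save and restore the other registers):

    typedef unsigned char u8;

    enum { READY, WAITING };

    struct tcb {               /* 3 bytes per thread, as in the 6809 kernel */
        u8  status;            /* READY / WAITING */
        u8 *saved_sp;          /* 2 bytes: stack pointer saved at the last switch */
    };

    enum { NUM_THREADS = 4 };
    static struct tcb threads[NUM_THREADS];
    static int current;

    extern u8 *save_sp(void);      /* assembly stub: read the hardware SP */
    extern void load_sp(u8 *sp);   /* assembly stub: write the hardware SP */

    void schedule(void)
    {
        threads[current].saved_sp = save_sp();  /* store SP to memory */
        do {                                    /* round-robin scan */
            current = (current + 1) % NUM_THREADS;
        } while (threads[current].status != READY);
        load_sp(threads[current].saved_sp);     /* load SP from memory */
    }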

Reply to
upsidedown

And at that point the standard technique and design pattern is to code it as an FSM driven from a message queue or FIFO.

Personally I dislike if/switch statements, since in practice they always mutate like cancer to become deeply nested and unmaintainable.

Instead there are standard FSM design patterns for that, e.g. a 2D array of pointers to the process() functions with one dimension being the FSM state and the other being the event type.
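A hedged sketch of that pattern (reusing the vending-machine events from earlier in the thread; all names are invented):

    enum State { IDLE, SELECTED, PAID, NUM_STATES };
    enum Event { SELECT, PAY, DISPENSE, NUM_EVENTS };

    typedef State (*action_fn)(State current);  /* does the work, returns next state */

    static State ignore(State s)    { return s; }           /* illegal event: stay put */
    static State on_select(State)   { /* lock in choice */  return SELECTED; }
    static State on_pay(State)      { /* accept payment */  return PAID; }
    static State on_dispense(State) { /* release product */ return IDLE; }

    /* table[state][event] - one row per state, one column per event type */
    static const action_fn table[NUM_STATES][NUM_EVENTS] = {
        /*              SELECT     PAY     DISPENSE    */
        /* IDLE     */ { on_select, ignore, ignore      },
        /* SELECTED */ { ignore,    on_pay, ignore      },
        /* PAID     */ { ignore,    ignore, on_dispense },
    };

    State dispatch(State s, Event e) { return table[s][e](s); }

The nesting never deepens: new states or events add a row or a column, not another level of if/switch.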

I also like the State represented as a class, and the events as virtual functions. That allows complex behaviour to be represented simply, where a common super-state is represented as a common superclass.
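That variant might be sketched like this in C++ (again with invented names); the common super-state supplies the default "ignore this event" behaviour that the sub-states inherit:

    struct VendingState {                        /* the common super-state */
        virtual ~VendingState() {}
        /* default: ignore the event and stay in the same state */
        virtual VendingState* onSelect()   { return this; }
        virtual VendingState* onPay()      { return this; }
        virtual VendingState* onDispense() { return this; }
    };

    struct Idle     : VendingState { VendingState* onSelect()   override; };
    struct Selected : VendingState { VendingState* onPay()      override; };
    struct Paid     : VendingState { VendingState* onDispense() override; };

    static Idle     idle;
    static Selected selected;
    static Paid     paid;

    VendingState* Idle::onSelect()   { /* lock in choice */  return &selected; }
    VendingState* Selected::onPay()  { /* accept payment */  return &paid; }
    VendingState* Paid::onDispense() { /* release product */ return &idle; }

    /* The event loop then simply does: current = current->onPay(); etc. -
       every event not overridden in the current state is ignored by the base. */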

Those design patterns have served me well for all sorts of soft and hard realtime systems over the decades.

Reply to
Tom Gardner

Unfortunately that is difficult to implement in C, so youngsters don't think of it.

Reply to
Tom Gardner

To be fair, message passing does not force a traditional cooperative tasking model. It just requires that any mutable object only be accessible from a single message queue handler thread.

Reply to
Robert Wessel

Windows NT 3.1, actually.

Reply to
Robert Wessel

One serious problem is that too many programmers are much better at figuring out what CAN be done in parallel than they are at figuring out what SHOULD be done in parallel.

Having too many threads generally is worse than having too few.

I [am not sure but] am thinking that Cummings was intending to demonstrate blocking RPC by forcing B's reply to be a higher priority than other messages waiting for A.

As you alluded, the difficulty is that not all systems offer priority aware message queues.

E.g., VxWorks for a long time offered only FIFO message queues. If you needed priority messaging, you had to use multiple queues or implement something yourself. Moreover, at the time I was fighting with it [circa v4.x], message queues could not be monitored using select(). It was a real PITA juggling multiple queues and multiple network connections.

As I said above, the problem is figuring out what should be split off into a thread vs what can be.

+1

George

Reply to
George Neuner

:)

For "embarrassingly parallel" applications such as telecom systems, 1-4 "worker" threads per core is a good starting point.

Reply to
Tom Gardner

So use sequential-style coding for sequential problems, and use other styles in other situations.

If the logic is hybrid, then use a hybrid code structure.

The idea that there is only one correct way of writing your code is simply /wrong/.

Lots of RTOS's have support for waiting for multiple objects. Failing that, it is not hard to solve such challenges in other ways. You don't have to use a queue, and you don't have to re-structure everything around the possibility that one part of your code might have to wait for two different things.

The standard "vending machine" solution is done in a /completely/ different manner - it is the running example in the classic book "Communicating Sequential Processes" which has a far neater, clearer and more efficient handling of multiple threads than that paper. The book is available freely online - I would recommend it.

Why? There is no justification for such an assumption - implicitly or not. He has picked a single thread structure and somehow decided that's the only one to use, and further that there should only be two threads in his system no matter how he changes the tasks at hand. All the problems he gets are due to his poor choice of structure and stubbornness in sticking to it.

It's not that I misunderstand the premise of the article - it's simply that I think it is a pointless premise. It would be like deciding that I will program in C, but all my loops will be "do {...} while ()" loops - and then complaining that simple counting loops look ugly.

No, he goes on to talk about how to get something working - ugly, unclear and separate from the logical flow of the task at hand, but working - despite the self-imposed limitations.

When I first looked at the paper, I thought maybe he had a point. More thought, and the discussion here, has convinced me that he does not. No, event-driven thread structures are /not/ superior. They do have their uses and there are cases where they work well - fortunately I knew that before looking at the paper, because his efforts to force their usage inappropriately could easily convince people that they are always a poor choice.

Think a little bit harder, and you will see that it is the event-driven model that ends up as GOTOs. It is not even managing a good old-fashioned GOSUB from the BASIC years - you have to use global state variables to keep track of where you have been and where you are going.

I know what event-driven systems are.

I realise that. But it is much the same structure as for pre-emptive multi-tasking.

No, if you do proper development you understand there is no "best" method. Sometimes event-driven threads /are/ the best choice, or part of the best choice - but they are most certainly not /always/ the best choice. Please understand what I am saying here - I have nothing against event-driven threads used in the right place. I have everything against using them in the wrong place, or thinking that they are always the right choice.

Yes.

Yes.

You need to /design/ your code - you need to plan it, you need to be willing to change it to handle different requirements if that becomes necessary.

You need to separate your code into clearly defined parts, and understand their interactions.

This is /always/ the case. You can't just say "This guy said in a paper that event-driven threads were magic. So we'll just use them instead of thinking".

Nonsense. If your threads are not as responsive as you need, you will have to fix the design (or perhaps it can't be handled by the hardware you have). That applies whatever kind of blocking your threads have.

You are inventing problems faster than that paper's author did.

Nonsense. If you think your magic thread structures have eliminated sharing and synchronisation, or the need to think carefully about your design, you have misunderstood everything about multi-threaded coding. Using different structures, or different synchronisation primitives, does not change /anything/ about the fundamentals of what data passes around, what synchronisation is needed, and what parts need to wait for which other parts. It changes the details, and the choices can make a big difference on how clear and simple the code is, how efficient it is, and how much effort it takes to write, read and maintain the code. Event-driven threads are the best choice in some cases, and the worst choice in others as your mess of state variables means your logic is spread out all over the place instead of a clear, neat code flow.

Reply to
David Brown

This particular youngster grew up on small microcontrollers programmed in assembly. "All you need is the RAM" is not helpful when you have 512 bytes in total, and the stack pointer is limited to accessing the first 128 bytes of that. I have worked on microcontrollers where the context [...] more - a proper RTOS would be far too high overhead.

Of course you can have more minimal OS's with very limited features (all the way down to "protothreads"), and perhaps cooperative multi-tasking rather than pre-emptive. But then you are not going to have threads with multiple message queues like the ones under discussion.

Reply to
David Brown

Yes, of course. Win NT 3.1 was rarely seen in the wild, but it did exist.

Reply to
David Brown

Ah, that's a different matter. Here you are talking about threads that do a job, send out a result, and then close down. (Usually, for efficiency, you have a thread pool, and it is a "job" object that is activated, run, and closed down, rather than the whole thread. But logically, it is the same.) It doesn't matter which of these worker threads is running at any time; you simply want to make efficient use of the CPU resources and have everything completed in the end.

In an RTOS we are usually talking about threads that need to be alive at the same time, spend most of their time blocked somewhere, and which need to communicate and be able to wake each other. You typically only have one cpu core, but you might have dozens of threads.

Reply to
David Brown

@David Brown: As they say: "you can lead a horse to water, but you can't make it drink". I rest my case.

Reply to
StateMachineCOM

Of course some processors have that capability in their hardware, e.g.

    select {    // suspend until one of these events occurs and then
                // resume with 10ns latency
        message from channel A:
            do_this();
            break;
        message from channel B:
            do_that();
            break;
        timeout:
            scram_reactor();
            break;
        input on port C:
            read it and record time it arrived
            break;
        output completed on port D:
            do next output on port D at 12.96us after last output on port D
            break;
    }

(RTOS? What RTOS?)

Reply to
Tom Gardner
