Thread-based software architecture vs. process-based software architecture

Hi, I have a few queries on choosing the best possible software architecture.

Processes are heavyweight: they appear to occupy more memory, take more time to create/start, incur higher latency during context switches, and have separate memory spaces that necessitate heavy IPC mechanisms. Threads are lightweight and share a memory space. However, I realized that threads also contend for resources/memory because of what they share, which in turn becomes a kind of bottleneck for a multi-threaded architecture but not for a multiple-process architecture. Also, the workaround of using thread-local storage does not seem to be straightforward. This makes me believe that maintaining a multi-threaded application can be a bit more complex than a multiple-process architecture. It also seems that the performance of a multi-process architecture will be better due to the separate memory spaces (this avoids locking or serialization of execution in the multi-process case), and that this takes away the multi-threaded application's advantage of lower context-switch time!! Kindly let me know if this understanding is correct, or correct it with appropriate inputs.

I understand that the software architecture mainly depends on the type of application/requirement. Considering a development environment of Linux with C on single-core/multi-core processors, I would like to know for which types of applications we should go in for a multi-threaded software architecture, and for which types we should go in for a multiple-process software architecture. Is there any matrix sheet that maps/lists the types of requirements/applications to the possible software architectures for them?

Thanks in advance, Karthik

Reply to
Karthik Balaguru

You are mistaken in your notion that, just because threads explicitly share a memory space and other resources, they necessarily have more contention. Threads share a memory _space_, but they don't have to use the same bits of memory within that space -- it is easy to set things up so that each thread has its own chunk of memory that it uses.
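
A minimal sketch of that idea, assuming POSIX threads on Linux: each thread is handed its own disjoint chunk of the (shared) address space, so no locking is needed at all.

    /* Compile: gcc -pthread chunks.c */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4
    #define CHUNK    64

    static char chunks[NTHREADS][CHUNK];   /* one disjoint chunk per thread */

    static void *worker(void *arg)
    {
        char *mine = arg;                  /* this thread's private chunk   */
        int   id   = (int)((mine - chunks[0]) / CHUNK);
        snprintf(mine, CHUNK, "chunk %d written with no locking", id);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, chunks[i]);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        for (int i = 0; i < NTHREADS; i++)
            puts(chunks[i]);               /* every chunk intact, no races */
        return 0;
    }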

In a sense, once you get past the MMU, processes share the same memory space, too -- it's just that the MMU protects each process from having to know about the memory space occupied by other processes, or even, for that matter, from having to know what physical addresses it occupies.

The "processes have separate memory space" is an illusion, provided in hardware by the MMU. At the point where activity is going on in physical memory, all the processes have to access the same memory space, and so they contend for that resource. Ditto hard drive accesses, screen access, etc.

Really, the biggest thing that you give up with threads vs. processes is that -- assuming the OS is doing its job -- processes are safe from one another. Threads, however, can easily stomp on one another, simply by writing into some part of memory that some other thread is using and thinks isn't going to be disturbed.

For me, the dividing line between threads and processes is one of work load, processor loading, and trust: do I trust whoever is developing that software entity over there not to stomp on my stuff, and is it less work for both of us, using threads, to not stomp on each other's stuff than it is to just use processes? And can the job be done at all using processes? If the answer to the first two questions is "yes", then threads are indicated. If the answer to _either_ of the first two questions is "no", then processes are indicated -- and if the answer to the third question is then "no", the project is in jeopardy.

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com
Reply to
Tim Wescott

None -- without a clear definition of the application domain! :>

The easiest way to think of the distinction is: threads are active entities (i.e., they are the "things" that "execute code"). Processes are containers that hold resources -- which can include (one or more) *threads*!

I.e., a process is like its own little "machine" -- with its own memory, access privileges, priorities (in the context of the "machine" in which it resides), etc.

So, if the "system"/machine has certain shared resources (I/O devices, etc), it is the *process* that requests (by the actions of one of its threads) those resources and, eventually, gains ownership/access to it. (I.e., thread #1 in process A can request a resource and, when made available, thread #5 in process A can *use* that resource -- but none of the threads in process B can, at that time)

Given that processes contain threads (in this conceptualization), you can see why it is "more expensive" to switch processes than it is to switch threads.

You can also see why two threads in a process can compete to access a resource THAT THE PROCESS OWNS (either because it was explicitly requested from "the system" by "some thread" in that process; OR it was implicitly granted to that process when the process was instantiated: e.g., "shared memory" IN the process's address space). You are glossing over the potential case where two or more PROCESSES have to compete in "the system" for other "shared resources".

It's still a bottleneck. If two or more processes want to share some data, they either do so via "shared memory" (assuming the OS supports this between processes) -- which requires SOME form of "access/contention resolution" -- or by a more expensive solution (e.g., IPC/RPC). In each case, SOMETHING is handling the fact that contention can exist.

There is no concept of a thread's "(private) memory space" -- though you can easily arrange for this (e.g., each thread has its own pushdown stack! anything thread #1 does that is implemented via the stack is effectively private -- though a rogue thread can still scribble on it!).
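
For what it's worth, thread-local storage on Linux is less painful than the original post suggests. A minimal sketch using the C11 `_Thread_local` storage class (GCC's `__thread` is equivalent): each thread gets its own instance of the variable, with no sharing and no locks.

    /* Compile: gcc -pthread tls.c */
    #include <pthread.h>
    #include <stdio.h>

    static _Thread_local int counter;      /* one instance per thread */

    static void *worker(void *arg)
    {
        for (int i = 0; i < 1000; i++)
            counter++;                     /* touches only this thread's copy */
        printf("%s: counter = %d\n", (const char *)arg, counter);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, "thread A");
        pthread_create(&b, NULL, worker, "thread B");
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;                          /* each thread prints 1000 */
    }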

By contrast, each (single-threaded) process has a unique, disjoint memory space "guaranteed" by the OS at the process's instantiation (I am assuming you have a "real/nonTOY OS").

No. If there is no contention, there is no locking required beyond what is implicitly present when "thread #1" is scheduled to execute while the other threads are (temporarily) blocked.

Contention has costs, period. You can structure your code so that these costs are minimized. E.g., in a consumer/producer model of sharing, the two threads never actually compete for the same "object" -- an object that is being produced is invisible to a consumer waiting to consume it! Likewise, an object that has BEEN produced is no longer of interest to its producer!
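
A sketch of that producer/consumer property, as a single-producer/single-consumer ring using C11 atomics (the size and types are illustrative). The two threads share only the indices: a slot is invisible to the consumer until the producer publishes it, and of no further interest to the producer once consumed.

    #include <stdatomic.h>
    #include <stdbool.h>

    #define RING 16                        /* power of two */

    struct spsc {
        int item[RING];
        _Atomic unsigned head;             /* written only by the producer */
        _Atomic unsigned tail;             /* written only by the consumer */
    };

    bool produce(struct spsc *q, int v)
    {
        unsigned h = atomic_load_explicit(&q->head, memory_order_relaxed);
        unsigned t = atomic_load_explicit(&q->tail, memory_order_acquire);
        if (h - t == RING)
            return false;                  /* full */
        q->item[h % RING] = v;             /* slot not yet visible to consumer */
        atomic_store_explicit(&q->head, h + 1, memory_order_release);
        return true;                       /* now published */
    }

    bool consume(struct spsc *q, int *v)
    {
        unsigned t = atomic_load_explicit(&q->tail, memory_order_relaxed);
        unsigned h = atomic_load_explicit(&q->head, memory_order_acquire);
        if (t == h)
            return false;                  /* empty */
        *v = q->item[t % RING];            /* producer is done with this slot */
        atomic_store_explicit(&q->tail, t + 1, memory_order_release);
        return true;
    }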

Process model gives an (incorrect) illusion of greater separation only because it "makes sharing (between PROCESSES) harder". If you similarly impose the restrictions that different process spaces impose on your code (i.e., never compete for data for which you have no NEED to compete -- as if it was NOT POSSIBLE), then the costs of sharing are the same -- none.

(sigh) *BIG* (complex) question. Essentially, you have to look at the benefits of "tightly coupled" execution (threads) vs. more "loosely coupled" (processes). And, the overhead involved in each sharing case. Likewise, the potential for (the illusion of) concurrency and the periods involved.

E.g., any time an "execution context" (threaded or single-thread) has to block on (resource, user, i/o, etc.), then there is an opportunity for some other execution context to "do meaningful work" (note that this is not the same thing as a GUARANTEE that they will be able to do meaningful work!).

How often this occurs and the amount (percent?) of time that the blocked process is suspended -- relative to the rate at which "new work" arrives -- determines how much time you can afford to "waste" in the overhead of your model (thread v. process).

E.g., if work is represented by cars arriving at a toll booth (your job being to monitor the presence of individual cars, the receipt of appropriate payment from each and the control of the "gate" allowing paid vehicles to pass), you could (all else being equal) create a single-threaded process that: wait for car; wait for payment; raise gate; lather, rinse, repeat And, spawn N instances of this process -- one for each "lane" at the toll booth (binding the appropriate instances of "car sensor", "coin counter", "gate actuator" to each instance). The "procedure" (I am trying to avoid using the word "process") is inherently serial -- easily handled by a single thread.
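
That serial per-lane procedure, as a short sketch; `wait_for_car()`, `payment_complete()` and `raise_gate()` are hypothetical drivers bound at startup to this lane's car sensor, coin counter and gate actuator.

    /* Hypothetical per-lane drivers, bound to this lane's hardware. */
    extern void wait_for_car(int lane);
    extern int  payment_complete(int lane);
    extern void raise_gate(int lane);

    /* The per-lane "procedure" as one serial loop -- one instance per lane. */
    void lane_task(int lane)
    {
        for (;;) {
            wait_for_car(lane);            /* block until the sensor trips   */
            while (!payment_complete(lane))
                ;                          /* block/poll on the coin counter */
            raise_gate(lane);              /* let the paid vehicle through   */
        }
    }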

[A car doesn't arrive at lane 4, pay at gate 7 and then exit at gate 2!]

THE PROCESSES HAVE NOTHING TO SAY TO EACH OTHER! So, there is no contention *between* them.

Most of the time, a process is waiting for (the next thing) to happen. I.e., while waiting for payment, it doesn't have to deal with "another car" -- even though another car *may* be arriving in some other lane! So, the cost of multiple processes (time) is largely hidden in that "wait time".

You can, similarly, design this as a set of THREADS in the exact same way! Each thread has nothing to share with the other threads!

[Keep this in mind as you read each of the following examples. "Thread" can often be replaced by "process"; but you will have to think of everything else going on in the particular example to evaluate how (in)effective that solution might be!]

Imagine, instead, writing this process as a set of threads: one that waits for the car; another that waits for payment; a third that raises the gate (and, presumably, ensures the car has passed successfully). These threads need to share information -- you don't want the gate_raiser thread to raise the gate before the payment_received thread has vouched for the vehicle's compliance!

That shared information can be as simple as a shared "state" variable: {AWAITING_CAR, AWAITING_PAYMENT, RAISING_GATE}. Each thread can be responsible for monitoring the variable to determine when it is appropriate to "start" AND updating the variable when it has finished its assigned chore. I.e., only one thread is ever "holding" the variable (able to write to it!).
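
A sketch of that shared "state" variable for one lane -- here using a mutex/condvar pair so each thread blocks until its phase comes up, rather than polling (the thread function shown is illustrative).

    #include <pthread.h>

    enum phase { AWAITING_CAR, AWAITING_PAYMENT, RAISING_GATE };

    static enum phase      state = AWAITING_CAR;
    static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  tick  = PTHREAD_COND_INITIALIZER;

    /* Block until the shared variable says it is my turn. */
    static void await(enum phase mine)
    {
        pthread_mutex_lock(&lock);
        while (state != mine)
            pthread_cond_wait(&tick, &lock);
        pthread_mutex_unlock(&lock);
    }

    /* Chore finished: advance the state and wake the next thread. */
    static void advance(enum phase next)
    {
        pthread_mutex_lock(&lock);
        state = next;
        pthread_cond_broadcast(&tick);
        pthread_mutex_unlock(&lock);
    }

    /* The payment thread; the car_waiter and gate_raiser look alike. */
    static void *payment_thread(void *arg)
    {
        (void)arg;
        for (;;) {
            await(AWAITING_PAYMENT);   /* only "held" by one thread at a time */
            /* ... count coins until payment is complete ... */
            advance(RAISING_GATE);     /* hand off to the gate_raiser */
        }
    }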

[threads could also directly start/unblock each other in succession... lots of ways to skin this cat]

You could likewise use a set of *processes* to do this: each process (pedantically, the single thread *in* each process) responsible for blocking on a particular condition, etc. But, processes cost more and are heavier-footed than threads.

In the "process" implementation, the sharing has to happen through some OS-supported mechanism -- *if* processes are prohibited with accessing each other's (or *SYSTEM*!) resources. In the thread implementation, threads within a shared "container" can freely exchange information (relying on synchronization primitives provided by the OS

*or* by constraints inherent in the algorithm: "YOU set this, I will CLEAR it")

You could, also, have one giant "process" with lots of threads -- that handles the entire toll-booth. (again, lots of ways to skin this cat... I'll let you sort out the "more obvious" ones)

Threads could sit "awaiting events". A set of "accepting payment" threads (responsible for verifying proper payment) can sit waiting for "CAR_ARRIVED" events (messages). When such an event is detected, the first WAITING/blocked thread consumes it and begins execution (the event obviously has to specify the lane on which the waiting car was detected!).

[The next "accepting payment" thread -- IF ANY (possibly a configurable option... you might have fewer threads than lanes, etc. depends on the expected interarrival times of "cars") -- then steps up and awaits the NEXT "CAR_ARRIVED" event. This may be on the same lane as the immediately preceding event -- or, another lane entirely!]

The "accepting payment" thread recently activated (above), now sits waiting for "coin received" events (from the specific lane that it is monitoring!). When it has processed enough of these to indicate proper payment, it generates a PAYMENT_RECEIVED event (tagged with the corresponding lane number) and then goes back to waiting for another "CAR_ARRIVED" event.

[I.e., this flavor thread can only handle CAR_ARRIVED events!]

Similarly, another (set of one or more) "raising gate" threads sit waiting for PAYMENT_RECEIVED events and act accordingly.

Here, you need as many "raising gate" threads as there are gates that you want to be able to raise CONCURRENTLY! (e.g., if you don't mind letting other "paid customers" wait while you raise the gate for customer X, then you only need enough of those threads to raise *one* gate at a time!)

Yet another way of doing this is to have N copies of generic threads that are capable of processing *any* sort of event -- i.e., having a dispatch table (switch statement) at the start to route the event to the appropriate processing code fragment. In this case, you need only enough threads to handle the total number of "things" that can be happening at one time (i.e., one thing on each lane).
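
A sketch of such a generic worker; the blocking event queue (`next_event()`) and the per-event handlers are hypothetical.

    struct event { int type; int lane; };
    enum { CAR_ARRIVED, COIN_RECEIVED, PAYMENT_RECEIVED };

    /* Hypothetical: blocking event queue and per-event handlers. */
    extern struct event next_event(void);
    extern void handle_arrival(int lane);
    extern void count_coin(int lane);
    extern void open_gate(int lane);

    /* Any of N identical worker threads runs this; the switch is the
     * "dispatch table" that routes each event to its code fragment. */
    static void *generic_worker(void *arg)
    {
        (void)arg;
        for (;;) {
            struct event ev = next_event();  /* blocks until work arrives */
            switch (ev.type) {
            case CAR_ARRIVED:      handle_arrival(ev.lane); break;
            case COIN_RECEIVED:    count_coin(ev.lane);     break;
            case PAYMENT_RECEIVED: open_gate(ev.lane);      break;
            }
        }
    }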

Ah, but what is to prevent a c*ck-up in The System (or an exploit by a savvy user?) from letting the vehicle's initial arrival be immediately followed by a PAYMENT_RECEIVED event -- i.e., BEFORE the "accepting payment" thread has even been activated? (we have a technical term for this: "bug")

In the initial "serial process", this wasn't possible: the code that was executed after payment was received COULDN'T run until a car had been detected AND coins counted. The design of the code precluded that possibility. To "exploit" the system, a user would have to synthesize all of the preceding events to "advance" the algorithm to the point where it was ready to lift the gate.

OK, let's build a SHARED OBJECT that indicates the "state" of each of the lanes! That way, the "raising gate" thread won't invoke the actuator unless it sees all of the required prerequisites in place -- even if "signaled" by a PAYMENT_RECEIVED event!

Now, you have several entities trying to update that state AT THE SAME TIME THAT OTHERS ARE TRYING TO EXAMINE IT. "Contention" that affects the entire application's performance -- ONE bottleneck (instead of a "bottleneck per lane" -- or NO bottlenecks!)
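
A hedged sketch of one way to soften that bottleneck: give each lane its own state record and its own lock, so only threads working the SAME lane ever contend (NLANES and the field names are illustrative).

    #include <pthread.h>

    #define NLANES 8

    struct lane_state {
        pthread_mutex_t lock;      /* contention is now per-lane, not global */
        int car_present;
        int paid;
    };

    static struct lane_state lanes[NLANES];

    static void lanes_init(void)
    {
        for (int i = 0; i < NLANES; i++)
            pthread_mutex_init(&lanes[i].lock, NULL);
    }

    /* Are all prerequisites in place before invoking the actuator? */
    static int ok_to_raise(int lane)
    {
        struct lane_state *s = &lanes[lane];
        pthread_mutex_lock(&s->lock);
        int ok = s->car_present && s->paid;
        pthread_mutex_unlock(&s->lock);
        return ok;
    }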

Imagine if the cost of ATOMICALLY accessing this object was a fat system call -- because it resided somewhere that all PROCESSES could access (contrast with THREADS)!

In each case, you decide how much information you are sharing and who you are sharing it with. A single thread that runs a single lane from start to finish IMPLICITLY is sharing data with itself: it saw the car arrive on its assigned lane, it watched as the coins were deposited in the coin acceptor on that lane, then it raised the gate for that lane -- before returning to await the next arrival.

As you split the "chore" into finer pieces -- or, split the handling of it into different/disjoint "execution contexts" -- you need to pass more information between those objects. E.g., passing events of the form (<event>, <lane>) to a set of generic "handlers" moves the sharing into the "event system".

[whether this is a fifo, shared memory, IPC, etc.]

OTOH, you increase the possibilities for concurrency and more efficient use of resources. (Why have N "raise gate" processes if drivers can afford to wait for THEIR gate to be lifted? Perhaps the gate lift mechanism can ONLY lift a single gate at a time -- motor and gears/clutches.)

Sorry for the long-winded explanation. I will promptly be derided for it. But, hopefully it shows you different approaches (that exploit "potential parallelism/decomposition" in different ways) and the potential consequences of different approaches.

You have to look at your workload and see what approach makes the most sense. Interconnections are expensive in any algorithm!

Reply to
Don Y

An MMU is not a requirement of a process model. Nor is one excluded from a thread model (even in a single-process system).

There is no guarantee that processes are protected from each other. That is an implementation detail. I.e., you can adopt a "process model" and have everyone living in a single, unified, FLAT memory space.

The better way of thinking of processes is as resource containers (threads being one sort of resource). As such, they are bigger/heavier than threads -- that only have to remember their current processor state.

E.g., a *process* can hold a resource. A thread can not. (the process CONTAINING the thread holds it). So, a process handling your "console" can hold that console (hardware/software construct) and one thread in it can paint the screen while another thread is responsible for "ringing the bell" (which takes a sizable fraction of a second!)

In most modern OS's (I won't use the L-word!), an MMU enforces a separation (partitioning) of this "hardware" address space. *If* the processor has such hardware available.

(how "violations" are handled is another subject)

No, that isn't guaranteed -- unless you speak of a specific *port* of a specific OS (think of OS's that claim to run on hardware WITHOUT MMU's! The process model still applies -- you just lose the protections!)

For me, the criterion (if it has to be boiled down to a single one) is "communications". The more data that has to be "interactively" shared (or, the higher the frequency of sharing), the more of an annoyance process boundaries become. Because there are BOUNDARIES that must be crossed (at some cost) -- even if only in a conceptual sense (i.e., just because you don't have hardware protections doesn't mean scribbling in another process -- CONTAINER -- is "right")

Reply to
Don Y

Yes and no.

The best way to think of a process is as a resource container - a thread is a particular kind of resource (a computation resource) that a process can contain.

Processes also typically are protection boundaries whereas threads typically are not [though there are exceptions].

In Linux there is a system call "clone" (see clone(2)). Clone essentially creates new threads, but it permits detaching a new thread into a separate process and specifying with relatively fine control what parent resources should be copied to the child.

Clone wraps an even lower level call that provides even more control over the environment of the new thread. Using clone you can create very lightweight processes, e.g., just a thread with MMU protection.
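
A minimal sketch of clone(2) via the glibc wrapper (compile with gcc -D_GNU_SOURCE). The flags pick what the child shares with the parent: with CLONE_VM the child sees the parent's address space (thread-like); drop CLONE_VM and it gets a copy-on-write copy instead (process-like).

    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>

    static int shared = 0;

    static int child(void *arg)
    {
        (void)arg;
        shared = 42;            /* visible to the parent only under CLONE_VM */
        return 0;
    }

    int main(void)
    {
        char *stack = malloc(64 * 1024);
        if (!stack) return 1;

        /* Thread-like: share the address space.  Drop CLONE_VM and the
         * child gets its own copy-on-write copy instead. */
        pid_t pid = clone(child, stack + 64 * 1024,   /* stack grows down */
                          CLONE_VM | SIGCHLD, NULL);
        if (pid < 0) { perror("clone"); return 1; }

        waitpid(pid, NULL, 0);
        printf("shared = %d\n", shared);   /* 42 with CLONE_VM, 0 without */
        free(stack);
        return 0;
    }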

George

Reply to
George Neuner

Hi Don,

Thanks for your quick reply ! That was indeed a pretty long & an interesting explanation !!

Karthik

Reply to
Karthik Balaguru

Hi George,

Thanks for pointing it out. I agree that clone can be really handy in system architectures based on multiple threads running concurrently in a shared memory space, since it controls the different levels of sharing between the parent and child tasks using flags.

Karthik

Reply to
Karthik Balaguru

The point is to show how a single application can be approached in a variety of different ways. And, within those different approaches, how the "sharing"/contention can manifest -- or not.

Finally, the relative differences in costs between process vs. threaded implementations when it comes to that sharing.

Figuring out how to approach YOUR problem (how to decompose it) will be your first step to determining the "most effective" implementation.

Reply to
Don Y


Hi Tim, That is very practical input. Also, the point of view based on the skillset of the person involved -- whether they can be trusted not to stomp on another person's memory area -- is a really good one to consider in any kind of project management, and appears to be a real, practical implementation checkpoint.

Thanks, Karthik

Reply to
Karthik Balaguru

Why wonder about processes vs. threads? Use both, as I have done for decades.

I prefer keeping individual programs relatively small, with only a few threads within each address space, to help manageability, protection and updatability. For larger systems with multiple processes and address spaces, just create some shared memory areas and map these areas into multiple process address spaces.

For simple items (byte/word/dword) that can be accessed atomically, you don't need any synchronization. For complex items, process A moves xx megabytes of data into a shared memory area and then, using some OS-specific mechanism, tells process B: "I just uploaded xx megabytes, go ahead".
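
A minimal sketch of that "upload, then go-ahead" pattern, with fork() standing in for two cooperating processes, an anonymous shared mapping for the data, and a plain pipe as the OS-specific notification mechanism.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* Region visible to both parent and child after fork(). */
        char *shm = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (shm == MAP_FAILED) return 1;

        int go[2];
        if (pipe(go) < 0) return 1;

        if (fork() == 0) {                   /* process B: the consumer  */
            char c;
            read(go[0], &c, 1);              /* block until "go ahead"   */
            printf("B sees: %s\n", shm);
            _exit(0);
        }

        /* process A: move the data first, *then* send the notification */
        strcpy(shm, "xx megabytes of data (well, a few bytes)");
        write(go[1], "!", 1);
        wait(NULL);
        return 0;
    }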

On modern virtual memory OSes (Linux/Windows) shared regions are implemented as memory mapped files (with or without backup to a real file).

If the shared memory area is linked to a fixed virtual address, it must fit at that same virtual address in each process, and you can then use pointers within that memory area directly. If the shared memory can be mapped at a different virtual address in each process, pointers into that shared area must be recalculated by each process.

For applications intended for long (more than a decade) support, one must be careful about how that shared memory is structured. Put a version number and pointers to the key data structures at the absolute beginning of that shared area; that way, processes from different software versions can access the same data structures, and the composition of the shared data area can be changed at will.
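
A sketch of such a header (the field names are illustrative). Storing offsets rather than raw pointers doubles as the fix for the relocation problem above: an offset is valid at whatever virtual address each process happens to map the region.

    #include <stdint.h>

    struct shm_header {
        uint32_t magic;            /* sanity check: did we map the right thing? */
        uint32_t version;          /* layout revision of everything below       */
        uint64_t config_offset;    /* byte offsets from the start of the region */
        uint64_t ring_offset;      /* ... never absolute pointers               */
    };

    /* Resolve an offset relative to wherever *this* process mapped the region. */
    static inline void *shm_ptr(void *base, uint64_t offset)
    {
        return (char *)base + offset;
    }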

Reply to
upsidedown

I think it's hard to come to a consensus on many of these things. Too many different application domains at play, existing terminology, etc.

One of the most productive things you can do in a project, early on, is to come to an agreement on a particular lexicon that you will use throughout the project and its descriptions/documentation.

E.g., I have three different types of threads in my current system:

- kernel threads, onto which one or more

- user threads are multiplexed, each of which might support one or more

- application threads.

Each is a "thread" (using my previous definition of "thread") in its own context. But the contexts for kernel threads and user threads (or application threads!) are hugely different -- as is the implementation/API!

So, referring to different threads at different context levels is like comparing aardvarks to Buicks (performance, resources, guarantees, etc.). (System-level documentation gains significance as complexity increases.)

How do you refer to a group of "processes" that work together to achieve some common goal? (in much the same way that a group of threads WITHIN a process can be considered as implementing that *process's* goal)

And, how do you address components of a "solution" that exist on other processors -- possibly remote? (without a lexicon, you spend lots of verbiage trying to qualify the terms that you ARE using in a particular instance to accurately represent what you *intend* them to represent)

"Can't tell the players without a program!" :>

Reply to
Don Y

There are too many thread and thread-like nomenclatures: "user space" vs "green" vs "cthread" vs "activation". "Fiber" vs "coroutine". "OS" vs "kernel" [which, incidentally, may not be the same]. I'm sure I've forgotten someone's favorite term.

Few people anymore know the distinctions between "multiprogramming", "multithreading" and "multiprocessing" ... and "multitasking" is used liberally to describe any combination of them.

Given GM's problems, I'd rather drive an aardvark 8-(

A "gang", as in "gang scheduling" [even if scheduling doesn't apply].

George

Reply to
George Neuner

Hey George,

[Watch your mail.]

Yup. But each of those tends to have connotations regarding implementation specifics (e.g., how scheduled).

I have settled on referencing where in the "(machine) hierarchy" a thread executes to differentiate among them. And, separately describe their implementation(s).

This lets me focus on the sort of environment in which to execute that each can expect. E.g., kernel threads have far stricter RT guarantees than application threads (which, currently, are effectively just multitasked). Also, the sort of skillset expected to develop in each of those environments.

Shall we throw the RTOS/MTOS issue into the mix?? :> I guess that's why there are so many more "programmers" than "software engineers" :-/

And, microprocessor vs. microcomputer vs. microcontroller? (and we call this a *science*?? :< )

Poor ground clearance -- stubby legs!

I prefer "job" though note there is a discrepancy in "parts of speech" vs. "thread" and "process". Perhaps because the term originated (in my past) to differentiate between "jobs" and "tasks"?

OTOH, I try to avoid needing it by relying, instead, on "services" (and clients) as a more "natural" way of subdividing bigger "chores" (which also draws attention to the fact that a service may not be tied to a specific "job" but, rather, shared among many).

C's off to hike -- take advantage of the cooler weather (finally!). I guess I get to play "chauffeur" :-/

--don

Reply to
Don Y
