ISA Support for Multithreading

Eric · 2007-02-20T12:09:02+00:00

Hi,I'm looking for the minimum set of instructions that a processor mustimplement to support some kind of simultaneous multithreading. I foundthe MIPS MT extension. What do you think of the instructions/mechanisms that it offers? What would you change? Do you know otherISAs implementing other usefull features related to SMP?Many thanks for your help.Eric

J

Jules 19 years ago

Surely for *simultaneous* multithreading (i.e. technologies like Intel's Hyperthreading that provide a second thread running in parallel to the primary thread using additional processor resources), none of this is necessary? Admittedly, a processor limited to running just 2 threads wouldn't be particularly helpful, but would technically fit the requirements.

Vote

C

CBFalconer 19 years ago

Jules wrote: (and elided attributions - don't do that)

If you have two independent processors, you have precisely two threads available. Whether or not you create further artificial threads out of one of those is your business. Some people call that process timesharing.

Chuck F (cbfalconer at maineline dot net) Available for consulting/temporary embedded and systems.

Vote

B

Bill Todd 19 years ago

Nothing whatsoever. However, I interpreted the original question to relate to additional instructions not required on a conventional

*uniprocessor*, and responded in that vein.

You don't need special

You clearly need *some* way to interact with specifiable execution contexts - some way that was never required on a uniprocessor.

Perhaps it's as simple as having an interrupt per execution context, which would be (again, on less than 60 seconds' cogitation) about as close to traditional uniprocessor (and perhaps SMP as well, given its traditional interrupt per processor package) operation as one could get. So instead of an instruction at boot time that let you specify which context to run which thread on, you'd just issue the interrupt for that context and give it the thread (then track things thereafter such that when the thread had nothing to do you could tell from its state which processor it had been executing on and give it something, e.g., with the potential to leverage cache affinity, though that's more pertinent to SMP than to SMT). This also lets you kill (or just time-slice) threads by interrupting them individually (if the processor hardware doesn't handle the latter now).

The point is that you've got to have *some* way of interacting with an SMT core that you never needed on a traditional uniprocessor core. But you and Nick are correct in saying that it needn't be via the ISA itself (as long as you don't consider the interrupt control structure - including the ability to issue one under processor control to a specific context target - to be part of the ISA).

Or perhaps I'm just unusually dense this week and there's an even easier way of handling things: I have some understanding of how the innards of an OS work, but have never designed one from scratch.

- bill

Vote

A

Andrew Reilly 19 years ago

Yes. The usual SMP techique (at least that I've seen) involves off-chip (or at least off-core) hardware. Usually only one processor comes out of reset at boot time, and it does enough initialization to be able to tell the chipset hardware to bring the other(s) out of reset. They then boot in the environment that the first set up, and go on their merry way. They are then independent processors as far as core execution is concerned, except that they typically have at least one interrupt line (each) tied to some control pin that the other can tweak. That is then used by the per-core time-sharing OS on the interruptee to re-schedule tasks as necessary to preserve the illusion of shared processors. The closest that you get to having a whole core "wait" on a condition on another processor is executing a halt-until-interrupt instruction or the equivalent but more power-hungry infinite loop.

None of that is really much different than you have on a time-shared, interrupt-driven uniprocessor (which is what SMPs have traditionally been built out of...)

In an SMT system, though, it might be more efficient in various ways to replace all of that interrupt peripheral logic with some special instructions that do much the same thing, at the cost of needing a different OS (or at least different hardware abstraction layer bits) than you would run on the equivalent SMP.

Cheers,

Andrew

Vote

N

Nick Maclaren 19 years ago

In article , Bill Todd writes: |> |> Nothing whatsoever. However, I interpreted the original question to |> relate to additional instructions not required on a conventional |> *uniprocessor*, and responded in that vein. |> |> You clearly need *some* way to interact with specifiable execution |> contexts - some way that was never required on a uniprocessor.

But that is precisely what is wrong. You don't need anything like that if you are not going to implement preemptive multitasking, and can avoid certain types of interface protocol. But you need that if you want to implement preemptive multitasking or handle interface protocols that may generate time-critical interrupts.

Once you have that mechanism, multiple CPU support comes almost for free; merely having an extra field in a control structure or register to indicate the CPU is trivial.

Most of the extra instructions provided for multiple CPUs are either for direct communication between processes, or to fix up memory synchronisation with relaxed memory models. Neither are critical, in an absolute sense.

Regards, Nick Maclaren.

Vote

C

Chris Thomasson 19 years ago

Well, I have an idea that can possibly address virtually any synchronization issue that can come up wrt multithreading in general:

formatting link

(PDR on the chip!)

I am still waiting for a response from Andy. Anyway, you will most likely see a Patent that matches my description from Mr. Paul E. McKenney:

formatting link

I wonder if IBM has any ideas wrt (PDR On Hardware?)

P.S.

PDR stands for (Partial-Copy-On-Write Deferred Reclamation)... Search and/or post on comp.programming.threads for further information...

Vote

C

Chris Thomasson 19 years ago

DWCAS instruction and a hardware generated "heartbeat-like" interrupt per-synchronization epoch should be all you need to create lock-free algorithms that scale up to many thousands of cores...

Vote

A

Anne & Lynn Wheeler 19 years ago

CAS was invented by Charlie when he was doing work on fine-grain locking for cp67 (CAS was chosen because they are Charlie's initials ... afterwards had to come uqp with mnemonic that matched his initials).

the attempt to get CAS as part of 370 was met with some amount of resistance ... claiming that a smp specific instruction couldn't be justified for 370 (test&set was sufficient) ... and to get inclusion in 370 ... a non-SMP specific justification needed to be created. As a result, came up with the use of CAS as atomic update in a (software) multi-threaded environment (whether running single processor or multi-processor environment) ... along with the various programming notes that were included (originally) in the 370 principle of operations. Also as part of that exercise, both single word and double word version was done and the mnemonics changed to CS (compare-and-swap) and CDS (compare double and swap).

since then instructions have been extended for both 32bit and 64bit mode ... more recent compare and swap

formatting link

and compare double and swap

formatting link

appendix "Multiprogramming and Multiprocessing Examples" programming notes

formatting link

A.6.1 Example of a Program Failure Using OR Immediate
A.6.2 Conditional Swapping Instructions (CS, CDS)
A.6.3 Bypassing Post and Wait
A.6.4 Lock/Unlock
A.6.5 Free-Pool Manipulation
A.6.6 PERFORM LOCKED OPERATION (PLO)

... snip ...

and as above ... compare-and-swap instruction has been augmented with the "PERFORM LOCKED OPERATION (PLO)" instruction

described here:

formatting link

collected past posts mentioning multiprocessor and/or CAS instruction

formatting link

Anne & Lynn Wheeler | http://www.garlic.com/~lynn/

Vote

B

Bill Todd 19 years ago

I'm accustomed to hearing the phrase 'preemptive multitasking' used in the context of time-slicing (i.e., as the multitasking alternative to cooperative multitasking), not as something which also includes all

*other* interrupt mechanisms (which are required with cooperative multitasking as well and would seem to require the mechanism which I described).

and

Ah - perhaps you're saying the same thing after all, though the ability to, say, kill an application thread stuck in an infinite loop is not quite what I'd characterize as a 'time-critical interrupt' but still rather important to a general-purpose OS.

As I suggested in my original post. The one instruction that I suggested was needed is rather similar in effect to the use of the interrupt mechanism that appears to be the standard approach to the problem - but the interrupt mechanism effectively adds an additional processing level, whereas a conventional instruction would require allocating one of the contexts to execute it on (easy enough at boot time but not so easy thereafter - as I said, I gave the question less than a minute's thought at the time, and as best I can recall assumed that interrupts could be delivered to the individual hardware contexts without noticing that such an ability also addressed start-up needs).

- bill

Vote

N

Nick Maclaren 19 years ago

In article , Bill Todd writes: |> |> I'm accustomed to hearing the phrase 'preemptive multitasking' used in |> the context of time-slicing (i.e., as the multitasking alternative to |> cooperative multitasking), not as something which also includes all |> *other* interrupt mechanisms (which are required with cooperative |> multitasking as well and would seem to require the mechanism which I |> described).

But the mechanism is NOT required for other forms - that is the point!

|> Ah - perhaps you're saying the same thing after all, though the ability |> to, say, kill an application thread stuck in an infinite loop is not |> quite what I'd characterize as a 'time-critical interrupt' but still |> rather important to a general-purpose OS.

I wasn't including that, but referring specifically to cases where a device driver has to have its interrupt serviced within a certain time. As always, the problems with interrupts are not the interrupt itself but the resume; if you don't resume, it's easy.

Regards, Nick Maclaren.

Vote

C

Chris Thomasson 19 years ago

Interesting...

Kind of sounds like so-called 'punch card mentality'? CAS is a fairly versatile instruction...

I use those algorithms to this day. I think I remember Microsoft had a patent on a very slightly tweaked version... Instead of modifying the headers version counter when items were pushed onto the collection, it made the adjustments when items were removed... Trivial tweak, and they got a patent...

Just to clarify, DWCAS == CDS... BTW... Do you why the doubleword version of CAS was created? Did its ability to implement many different kinds of lock-free algorithms influence any decisions?

Vote

A

Anne & Lynn Wheeler 19 years ago

ISA Support for Multithreading

see the referenced programming notes for CDS example

example for "free-pool manipulation" using CDS

formatting link

bitsaver has copy of 370 principle of operations ga22-7000-4, 1sep75

formatting link

with basically has identical CDS free-pool manipulation description

Vote

G

glen herrmannsfeldt 19 years ago

(snip)

It would if there was a way to run two single task OSs on the system. I don't know that that is generally true.

-- glen

Vote

ISA Support for Multithreading

Join the Discussion

Didn't find your answer?