Implementing multiple interrupts

Hi all.

Some time ago, I designed a small and simple CPU to go into a project I am sort-of working on (whenever I can steal the time to do so). Now I have added an interrupt mechanism to it, and encountered a problem when I tried to turn the single-interrupt design into a multiple-interrupt one - the kind all other CPUs have these days.

Here's my gripe: suppose the CPU enters interrupt 1. While it is interrupted, IRQ2 comes in. And then IRQ3 comes in, followed by IRQ4. WTF am I supposed to do now? I want all IRQs to be handled. So, no ignoring.

My first idea was to make a stack and push the interrupt contexts onto the stack as new IRQs come in, and pop them as the software interrupt handlers return. But this is far too elaborate to implement with reasonably few resources.

My second idea was to have every IRQ assert a bit in a special register before asserting the unified IRQ signal. Then I would mandate that the software interrupt handler iterate over the register and handle the IRQs that it finds asserted. And when it is done, the interrupt handler would reiterate over the register to catch any interrupts that came in during the handling of the previous ones. I currently plan to use a variant of this method.
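A minimal C sketch of that scheme, with invented names - IRQ_PENDING, its address, the write-1-to-clear acknowledge, and the handler table are all assumptions, not a description of any particular hardware:

    #include <stdint.h>

    #define NUM_IRQS    8
    #define IRQ_PENDING (*(volatile uint32_t *)0x8000)  /* assumed address */

    extern void (*irq_handlers[NUM_IRQS])(void);

    void unified_irq_entry(void)
    {
        uint32_t pending;

        /* Re-scan until no source is pending, so requests that arrive
           while earlier ones are being handled are not lost. */
        while ((pending = IRQ_PENDING) != 0) {
            for (int i = 0; i < NUM_IRQS; i++) {
                if (pending & (1u << i)) {
                    IRQ_PENDING = (1u << i);   /* ack/clear this source */
                    irq_handlers[i]();
                }
            }
        }
    }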

While writing this I also came up with the option of deferring IRQs until the previous one's handler returns. Probably with some sort of a stack. If done correctly, it should take few resources.

What do other CPUs usually do in these situations?

Reply to
Aleksandar Kuktin

Normally you have an instruction to mask / unmask interrupts. When the CPU gets interrupted, the source that caused the interrupt gets cleared, but others may be active. In some CPUs, there is one guaranteed instruction before another interrupt can occur, and you must use the general "mask interrupts" instruction as the first instruction of the service routine. Others automatically set the global interrupt mask when the interrupt occurs, and the service routine only needs to unmask at the end, just before the return. Note that this means you need to allow one more instruction after the unmask, to make sure the return happens before the next interrupt, or you can have stack overruns. Another option is to have a special RTI instruction that enables interrupts and returns.

In addition to this simple global mask, there is usually a mask per IRQ, which allows you to have prioritized interrupts. An interrupt service routine in this case will first save the current state of the per-IRQ mask, then mask all lower-level IRQs and unmask interrupts globally. Thus only higher-level IRQs can interrupt this service routine. At the end, it must restore the state of the per-IRQ masks.
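As a C sketch of that sequence - irq_get_mask(), irq_set_mask(), the global enable/disable calls and the LEVEL3_AND_BELOW mask are invented stand-ins for whatever the hardware actually provides:

    #include <stdint.h>

    extern uint32_t irq_get_mask(void);
    extern void     irq_set_mask(uint32_t mask);
    extern void     global_irq_enable(void);
    extern void     global_irq_disable(void);

    #define LEVEL3_AND_BELOW 0x0Fu  /* assumed: bits of lower-priority IRQs */

    void level4_isr(void)   /* assumed entered with interrupts globally masked */
    {
        uint32_t saved = irq_get_mask();        /* save per-IRQ mask state */
        irq_set_mask(saved | LEVEL3_AND_BELOW); /* mask all lower levels   */
        global_irq_enable();                    /* allow higher levels in  */

        /* ... service the device ... */

        global_irq_disable();                   /* quiesce before restoring */
        irq_set_mask(saved);                    /* restore per-IRQ masks    */
        /* return via an RTI that re-enables global interrupts */
    }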

All this implies a machine with a stack to save current state. Here again there are differences in implementation. Some CPUs will automatically push registers onto the stack when the interrupt occurs; others require the service routine to save any state that it may change during operation. In the first case, you definitely need a separate RTI instruction that unwinds the stack.

--
Gabor
Reply to
Gabor

Yes. All those will work. Another alternative is to have another, probably smaller, register set that is used by the first-level interrupt handler. When an interrupt comes in, if it is allowed, further interrupts are disabled, the system is "switched" to use the special register set, and control is transferred to the ISR. A minimal amount of code is executed to do whatever is needed to save the details of the interrupt to memory, set flags, etc. Then a special instruction is executed (e.g. something like the RTI mentioned above) that returns to the instruction that would have been executed had the interrupt not occurred; further interrupts are enabled and the mode is switched back to the normal register set. The big advantage of this is faster interrupt response, as there is no need to save/restore registers, and no need to size a specific interrupt stack.

There are lots of alternatives depending on your specific requirements and available resources.

The key requirement for all of them is the ability to defer interrupts for a short time until the information about an interrupt is saved.

--
  - Stephen Fuld 
(e-mail address disguised to prevent spam)
Reply to
Stephen Fuld

Consider "level triggered" interrupts rather than edge triggered interrupts...

Every peripheral (etc) generates an interrupt which pulls a single signal high until that interrupt has been cleared by the relevant interrupt service routine (ISR) and the return from interrupt (RTI) instruction executed.

If a second interrupt arrives before the first one has been cleared, then it too pulls the same signal high - which has zero effect. When the ISR has cleared the first interrupt and the RTI executed, the processor immediately enters the ISR for the second interrupt.
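A sketch of what the single shared handler might look like in C under this scheme - the device table and status registers are invented; the key point is that servicing one requester and returning is enough, because the still-asserted level line re-interrupts immediately:

    #include <stdint.h>
    #include <stddef.h>

    struct device {
        volatile uint32_t *status;   /* nonzero while requesting         */
        void (*service)(void);       /* clears the request as a side effect */
    };

    extern struct device devices[];
    extern size_t num_devices;

    void shared_irq_entry(void)
    {
        for (size_t i = 0; i < num_devices; i++) {
            if (*devices[i].status) {
                devices[i].service();  /* service + clear one source */
                return;  /* line still high? CPU re-enters right after RTI */
            }
        }
    }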

Hence all the interrupts are dealt with serially without preemption. If you want preemption then you have to design a hierarchy of interrupts and be prepared to stack one context for each level in the hierarchy. Many processors manage with two levels, typically a maskable interrupt and a non-maskable interrupt.

Have a look at early microprocessors, e.g. the Z80 or 6800.

Reply to
Tom Gardner

First, you want level triggered interrupts, such that the interrupt line stays active until acknowledged. This was a problem with the ISA bus, fixed in PCI.

Q-bus, and probably the somewhat similar Unibus, use a daisy-chained interrupt acknowledge system. You should probably look up the details, but, more or less, it chains through the backplane slots, such that ones nearer the CPU have higher priority. When not interrupting, the board in each slot (or a jumper board if no actual board) passes the ACK down the line. If a board has requested an interrupt, it doesn't pass it along, and the CPU will then process that one. When it clears, the next one will get its chance. Note the requirement for level triggering, and that the interrupt routine (or hardware) must reset the interrupt line at the appropriate time.
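A software model of that daisy chain - purely illustrative, with invented names; in reality this priority resolution happens in backplane logic, not code:

    #include <stdbool.h>

    #define NUM_SLOTS 8

    struct slot {
        bool requesting;   /* level-triggered request line       */
        int  vector;       /* vector this board would supply     */
    };

    extern struct slot slots[NUM_SLOTS];  /* index 0 is nearest the CPU */

    int daisy_chain_ack(void)
    {
        for (int i = 0; i < NUM_SLOTS; i++) {
            if (slots[i].requesting)
                return slots[i].vector; /* board keeps ACK, supplies vector */
            /* otherwise the board (or jumper) passes ACK down the line */
        }
        return -1;                      /* spurious: no one claimed it */
    }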

-- glen

Reply to
glen herrmannsfeldt

(snip)

Yes.

Normally low, at least in the TTL days. Many buses are based on TTL level signals, even if implemented in MOS logic. TTL outputs pull down much harder than up.

-- glen

Reply to
glen herrmannsfeldt

Yes indeed; I originally wrote "low". Given the modern tendency for all i/o (and, less obviously, internal) signals to be active-high, I suspected the OP wouldn't be familiar with how to cheaply implement a wired-or function with active-low signals. And I didn't see any benefit in explaining the concept!

Active high i/o signals /still/ feel somewhat unnatural to me. Must be too old :(

Reply to
Tom Gardner

The last time I designed a Unibus board, I needed to get the bus pinout from an insider at DEC. The documentation was all written about the interface at the ends of a backplane where the bus extenders hook to the A and B slots. Plug-in cards all used the C, D, E and F slots. This pinout was a (sort-of) secret. If you look at the documentation for Unibus, you can't actually figure out how the IRQs are daisy-chained when there's only one pin assigned for each. That's of course only on the A B slots. Qbus is a better documented bus.

IIRC Qbus had four or five of these interrupt chains, one for each available interrupt priority level. So the CPU still had to deal with the interrupt priority. Also I believe there were many more soft interrupt levels on those CPUs. Probably not very interesting for a small compact CPU as mentioned in the OP. My original comments were based on my limited memory of the 8085 and 6800, which I worked with in the early days of my career, before VHDL, when design was pencil on vellum....

--
Gabor
Reply to
Gabor

Fun stuff, eh? I keep thinking that interrupt architecture is one of the areas of computer architecture that is most in need of cleanup. I want to do a writeup, a survey, for my comp-arch.net wiki -- understanding the state of the art is what I usually do before trying to advance it.

Anyway - what do other CPUs do?

Most CPUs allow interrupt preemption. They save the state of the preempted thread / interrupt handler somewhere. Where they save the state varies. I'll summarize that later, but this thread inspires me to start from the simplest thing.

---+ Interrupt Source

In the beginning there was an interrupt pin... (oooo, this will be a cool slideset. complementary to my tutorial on memory, which begins "This is a bit..." with a big dot in the middle of the page, which is still my favorite.)

Inevitably, people wanted multiple interrupt sources, multiple interrupt pins. Sometimes they treated them uniformly: INT0, INT1, INT2 ... . Sometimes they created different mechanisms for different pins: INT, NMI, SMI, FERR, MCERR ... Fundamentally, even if different mechanisms, the same issues arise - often it is as if each interrupt source takes a different combination of the options we are about to list.

But let's deal with interrupt sources. Pins, easy to understand. Level triggered. Edge triggered => edge detector. Detect on rising edge, falling edge, both. Glitches - level triggered interrupt pins that rise and then fall too quickly to actually trigger an interrupt. Similarly for edge triggered.

Non-pin interrupt sources: ECC on memory. Internal exceptions, not from outside world: cache or register parity.

Interrupt delivery mechanisms: an interrupt controller out in the system, that delivers and routes interrupts to various targets, e.g. different CPUs. Dedicated interrupt fabrics / busses, like the original Intel MPIC serial bus.

More non-pin: bus messages on the memory bus, like Intel XAPIC. Makes it easier to reason about ordering of interrupts and memory requests. Makes "level triggered" less meaningful, edge triggered more natural: level triggered requires messages to be sent on both high-to-low and low-to-high transitions. Edge triggered only requires one message.

I still think that interrupt "pins" are the fundamental primitive, and will be so long as we are working with wires. Even when you have bus message based interrupts, inevitably the messages are received by logic at the target, and then assert signals - the good old INT, NMI, etc, - into the CPU core or cores.

---+ Detecting interrupts

Pre-beginning: polling. Read the pin, possibly compare it to a previous value. Decide what to do.

Interrupts really begin when software doesn't have to poll. Dedicated logic does. And dedicated logic then tries to get software to do something.

---+ Interrupt delivery

OK, so what do you do when you detect an interrupt?

Pre-beginning: unconditionally force the PC to specific value. Throwing away whatever the old PC was.

This is limited, so we start thinking about saving the PC at point of interrupt somewhere.

0) dedicated save register

    Interrupted_PC := PC
    PC := interrupt_handler_PC

1) switch PC

State: current_PC; non_interrupt_PC; interrupt_handler_active_PC; interrupt_handler_start_PC (constant)

    on new interrupt:
        interrupt_handler_active_PC := interrupt_handler_start_PC (may be preloaded)
        current_PC points to interrupt_handler_active_PC

    on interrupt return:
        current_PC points to non_interrupt_PC

Note: fundamentally delivering an interrupt is an edge triggered action.

I think dedicated "save register" and "switch PC" are isomorphic, dual. Although the details matter. I will try to say "save/switch" interrupt state.

Many systems have a mix: some interrupt sources switch, others save.

Oftentimes, switching is suggested as the solution to all of the problems that saving has. Or vice versa. So we end up with inconsistent interrupt handling mechanisms. :-(

Either can be scaled to apply to multiple interrupt sources and save/switch PCs.

Nearly all systems save/switch not just the PC, but also some other registers. Typically some register that can be used in addressing memory. Often this is called an "interrupt handler stack pointer", but it doesn't have to be a stack.

Execution mode / privilege (although not applicable to simplest systems that do not have privilege).

More:

2) Save/switch sets of registers.

3) Save/switch registers to blocks of memory. Basically, to avoid having to build lots of hardware registers.

E.g. the Gould SEL machines had one of the most elegant save/switch mechanisms I have encountered. Perhaps not so much elegant as very simple, instructive. Each of the many interrupt "vectors" had a memory block. Interrupt delivery consisted of saving (a subset of) current register state in the memory block, and loading new register state from a different part of the memory block. Delivery of new interrupts using that same memory block was prevented until software had moved the state somewhere else, out of the memory block, at which point that interrupt source was unblocked.
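A rough C model of such an exchange block, as described above - field names and sizes are invented:

    #include <stdint.h>
    #include <stdbool.h>

    #define NREGS 8

    struct exchange_block {
        uint32_t old_pc, old_regs[NREGS];   /* written by delivery   */
        uint32_t new_pc, new_regs[NREGS];   /* preloaded by software */
        bool     in_use;                    /* blocks this vector    */
    };

    /* Models hardware delivery; returns false if the vector is blocked. */
    bool deliver(struct exchange_block *xb,
                 uint32_t *pc, uint32_t regs[NREGS])
    {
        if (xb->in_use)
            return false;                   /* defer until unblocked */
        xb->old_pc = *pc;
        for (int i = 0; i < NREGS; i++) xb->old_regs[i] = regs[i];
        *pc = xb->new_pc;
        for (int i = 0; i < NREGS; i++) regs[i] = xb->new_regs[i];
        xb->in_use = true;
        return true;
    }
    /* The handler later copies old_pc/old_regs elsewhere and clears
       in_use, unblocking the vector. */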

By the way: note "save old state / load new state". Basically a state exchange. CDC "exchange jump". Could be same memory, although IMHO that complicates state management.

Typically the old state save area is coupled with, related to, the interrupt vector, and the area from which new state for the interrupt handler is loaded. But it doesn't have to be this way. In some systems, you want the saved state of the interrupted thread/process to be protected from the interrupt handler - i.e. you do not necessarily assume that the interrupt handler is higher privilege than what got interrupted. The interrupt handler often does not need to know what got interrupted - think I/O interrupts. But many other uses need the handler to know what got interrupted - think timer interrupts for profiling, or OS context switching. Anyway, what I am trying to say is that some sort of linkage to what got interrupted needs to be saved, but not necessarily directly accessible.

Some systems with "Processor Control Blocks" or "Processor State Blocks" in memory save the interrupt state into that. But if every interrupt saves into the same PSB, then you cannot nest until that state has been moved somewhere else - unless the PSB itself is changed as a result of each interrupt.

Basically, the save area is something chosen from the cross product of Privilege_Domains X Interrupt_Sources.

(Of course, can always emulate complicated interrupt architectures like this on top of simpler architectures - like assuming interrupt handler is more privileged. And then let the privileged SW int handler switch to a less privileged actual interrupt handler. Similarly, we often say "interrupt return", as in "return to that which got interrupted, and resume". In reality, it is "end of interrupt handler", and then a scheduling mechanism (HW or SW) picks up whatever is most appropriate. Possibly return to that which got interrupted. But possibly switch.)

4) Save/switch onto stacks

It is quite common to push interrupts onto a stack - possibly THE stack. Avoids need for so many dedicated save areas. Typically, SW then moves stuff off the stack, ...

But once you start doing this, you quickly realize that you want multiple stacks. E.g. separate user / kernel stacks, separate by privilege. Which typically involves switching or saving registers, to change the stack pointer used, and then the memory accesses.

Basically, we can see that you have a dedicated stack per privilege domain (thread/process/user/supervisor). Nested interrupts / higher priority interrupts that run in the same privilege domain may be able to save on the same stack, pushing, incrementing the stack pointer. This avoids the need to allocate per interrupt source, i.e. to have the full cross product of Privilege_Domains X Interrupt_Sources.

GLEW OPINION / BKM:

0) Save/switch register state for the simplest systems.
1) If things get more complicated, save/switch to memory:
   1.1) old/new state per interrupt source, or
   1.2) save old state to a PSB, load new state, including PC and PSB, leaving TBD the issue of where to link the saved PSB.

Although switching stacks is common, IMHO it is complicated. For a simple implementation, I suggest one of the above. Save/switch through registers or memory.

---++ How much state to save?

Lots of brainpower has been spent on minimizing the amount of state to be loaded / saved / switched.

Obviously, must save/switch PC. Typically privilege. And usually a register used in memory addressing - stack pointer, or PSB.

You can leave other state in registers to be saved/restored by software, right?

Well...

a) Yes if int handler privileged... big issues if not

b) Yes, if the int handler knows about the state and how to save it.

Problems arise with long-lived computer architectures that have legacy concerns. E.g. an int handler that knows about 32-bit GPRs, but doesn't know about 64-bit GPRs. Or an int handler that knows about 128-bit XMM registers, but not about 256- or 512-bit YMM registers. Where accessing a 128-bit register zeroes out the upper bits of 512-bit registers.

This works if you assume that the OS software in an interrupt handler knows about the state. But oftentimes the reason we are doing virtual machines is to keep legacy OSes running.

These concerns are often coupled, related, to overlapped versus non-overlapped partial registers, etc. I.e. if you want the simpler-for-hardware approach of zero or sign extending small register writes to full register width, to avoid partial register issues, you may require that your privileged resource architecture save more registers, either in hardware or software.

Or, you can simplify interrupts by supporting partial register writes.

Or... various flavors of lazy save/restore (after an interrupt, trap if tricky registers are accessed. But then the trap handler just needs to know what to do about the state. There have been examples where lazy save/restore was not kept consistent with OS policies, producing security holes.)

Or... state saving instructions like Intel XSAVE.

---++ Threads

I can't quite work this into my taxonomy, so let me just say: one of the most common of the unusual or alternative interrupt schemes is thread based. Instead of logically switching threads, you start off with a multithreaded processor, and interrupt delivery conceptually just unblocks a stalled thread, the thread for the interrupt handler. Conceptually letting the interrupted thread continue execution.

Elegant...

If the threads are loaded into registers, there are scaling issues. But it is a nice approach for a dedicated system.

If the thread state is in memory - but then hardware is doing CPU scheduling function. I like it, but that's a whole other topic.

Issue: what about profiling interrupts, where the interrupt handler wants to inspect the thread that was interrupted.

Dedicating CPUs to interrupts - same thing. Same elegance, same scalability.

Up to the point where hardware is free, we will always have scaling issues. The issues of how to switch state will always be there - but techniques like saving/switching register sets, or activating interrupt handler threads or CPUs, put the issue off. Possibly off far enough that it can be deferred to software, and hardware doesn't have to worry.

But there will always be the need for an interrupt handler thread / CPU to preempt or access the state of another thread/CPU. That will always need a hardware mechanism - inter-thread or inter-processor interrupts. It can just be reduced in frequency.

And, I think on general purpose CPUs, we will probably need to multiplex multiple threads/processes for 10-20 years more. I suspect forever.

I.e. dedicated interrupt handling threads/CPUs are something for the toolbox. But are not general purpose. Not universal.

---+ Blocking or Masking interrupts

Must always be able to block interrupts, even with a single pin. Basically a bit that says "interrupt handler in progress, can't save/switch state". Inevitably extended to also be used to block interrupts when an interrupt handler is not in progress.

Blocking bit.

Multiple: blocking bitmask. Or, level - accepting ints of higher priority.

GLEW: I like bitmasks. One bit per save area. Levels are more compact. Allows you to have many more save areas in memory, without excessive bitmask storage. Levels can tie you up in knots - because, in practice, interrupt handlers change their priority during their lifetime. Bitmasks, though, add to state that must be saved / restored.
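The two blocking schemes in miniature (invented names, C):

    #include <stdint.h>
    #include <stdbool.h>

    /* Bitmask: one block bit per source/save area. */
    bool deliverable_bitmask(uint32_t pending, uint32_t blocked, int irq)
    {
        return (pending & ~blocked & (1u << irq)) != 0;
    }

    /* Level: accept only interrupts of higher priority than current. */
    bool deliverable_level(int irq_priority, int current_priority)
    {
        return irq_priority > current_priority;
    }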

Basically, mixing up, entwining, two concepts:
a) interrupts that must be blocked because the save area for them is in use (statically save/switched, regs or memory; not so much dynamically allocated, stacks);
b) interrupts that must be blocked because of simplistic mutual exclusion policies.

I suppose could have both.

Levels are always simplistic for mutual exclusion. Level/bitmask?

---++ What about NMIs?

What about NMIs? Non-Maskable Interrupts.

GLEW OBSERVATION:
a) First you add ordinary interrupts.
b) Then you add NMIs. NMIs are supposed to never return.
c) But then you want to have NMIs that return... e.g. profiling all code; e.g. a hypervisor wants to context switch an OS that uses NMIs. NOTE: must then guarantee that NMI state doesn't collide with ordinary ints.

d) ... eventually, it seems that most mature architectures in fact have a mask bit for non-maskable interrupts. A mask bit separate from the mask bit for ordinary interrupts.

---++ Masking interrupts at start of handler

Simplicity can supposedly be obtained by masking interrupts at the start of an interrupt handler - to give the OS a chance to save state before allowing others in.

Can be automatic - 1 or N-instruction window.

Or int handler may be trusted.

But... it breaks down when you start virtualizing, or when you have NMIs that you want to guarantee do not get masked.

I.e. automatic masking is an optimization. Does not work in general, but in specific cases.

---+ Queuing interrupts

With a single pin, level triggered, there isn't really much of a queue.

1 bit, interrupt pending. Another bit, interrupt active.

Issues wrt clearing the bits when returning. Nearly always a special interrupt return instruction. (Cf. x86's special "deferred" handling of the instruction after CLI/STI.)

With multiple interrupt sources, a bitmask.

Multiple arrivals sharing same bitmask -

0) don't do that (simplistic)
1) saturate - OR into existing.

Structure often seen: a 1-deep saturating queue (per bitmask position).

1 bitmask, but per int group, for "this interrupt is dispatched, handler has been started" (although it may have been preempted).

Second bitmask for pending interrupt that has not yet been dispatched.
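A sketch of that two-bitmask structure in C, loosely in the spirit of the 8259/APIC IRR/ISR pair - names invented, and a second arrival simply saturates into an already-set pending bit:

    #include <stdint.h>

    static uint32_t pending;     /* requested, not yet dispatched */
    static uint32_t in_service;  /* dispatched, handler started   */

    void raise_irq(int irq)      /* saturating: repeat arrivals merge */
    {
        pending |= 1u << irq;
    }

    int dispatch_next(void)      /* returns irq number, or -1 */
    {
        for (int i = 0; i < 32; i++) {
            uint32_t bit = 1u << i;
            if ((pending & bit) && !(in_service & bit)) {
                pending &= ~bit;
                in_service |= bit;
                return i;
            }
        }
        return -1;
    }

    void end_of_interrupt(int irq)
    {
        in_service &= ~(1u << irq);
    }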

Less often: 2-deep saturating queue. x86 APIC? (Why... circuits.)

Much less common: deeper queues, less combining. Some event driven RTOSes simplified by keeping interrupt requests separate, unmerged. Inevitably must handle possibility of interrupt overflow. But this can become an error condition, rather than common.

Issue: conceivable, but I am not aware of anyone doing it: a counter per interrupt group. I think it has the worst properties: you still need to poll if interrupt sources are multiplexed.

---+ Conclusion

Lots of combinations of these mechanisms.

I think I am stopping because of my own exhaustion, not because I have exhausted the possibilities that I want to document.

--
The content of this message is my personal opinion only. Although I am  
an employee (currently Imagination Technologies, which acquired MIPS; in  
the past of companies such as Intellectual Ventures/QIPS, Intel, AMD,  
Motorola, and Gould), I reveal this only so that the reader may account  
for any possible bias I may have towards my employers' products. The  
statements I make here in no way represent my employers' positions on  
the issue, nor am I authorized to speak on behalf of my employers, past  
or present.
Reply to
Andy (Super) Glew

What a mess. We decided early on that we didn't want to deal with it.

A distinction that was useful to us: a fault, a trap, and an interrupt are not the same thing.

  • A fault is internally generated by the running program, and has termination semantics. Example: illegalInstruction

  • A trap is internally generated, not necessarily associated with a particular program or execution, and has resumption semantics. Example: not-present-page trap.

  • An interrupt is externally generated and has resumption semantics. Example: i/o complete interrupt.

The difference between interrupt and trap boils down to whether the event can be precluded by having the CPU avoid doing anything that would raise it (a trap), or whether the event is triggered by something that is outside CPU control (interrupt).

For all three, handler invocation is an involuntary function call, and the handler is a function like any other. All three have dispatch vectors that are simply arrays of function pointers indexed by event kind; the vectors are in memory and the three vector roots are in hardware registers. Hardware detecting an event selects the appropriate root, indexes the vector with the event kind, and calls the function.
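A sketch of that dispatch in C - not actual Mill code, all names invented:

    typedef void (*handler_t)(int event_kind);

    enum event_class { FAULT, TRAP, INTERRUPT };

    /* Three vector roots, modelling the hardware registers: each points
       at an in-memory array of function pointers indexed by event kind. */
    static handler_t *vector_roots[3];

    /* What the hardware does on an event: select root, index, call. */
    static void hw_dispatch(enum event_class cls, int kind)
    {
        vector_roots[cls][kind](kind);  /* involuntary function call */
    }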

The expected use is that the fault vector and its root are in the executing turf and hence under the control of the faulting program (or more likely its run-time system). Trap and interrupt vectors are not expected to be in the application turf but instead in a turf used by the OS for event management. Trap and interrupt handlers are expected to be distant functions (the invocation switches turf); fault handlers may or may not be.

Cascaded faults (including an attempt to return from a fault handler) cause a trap. The trap handler clears up the rubble.

Reply to
Ivan Godard

I *really* like this idea.

I think I will implement a variant of queuing. Or, as I have begun to call it, "deferring". After posting the original post yesterday, I gave the idea some thought and figured out I can make a sufficiently small mechanism that does this.

Originally I was going to use a stack, but a stack is a FILO construct and I need a FIFO, so I'll probably use a ring buffer with two pointers (read and write) pointing into it. So an interrupt request makes the IRQ buffer unit write into the appropriate place that the IRQ has been received, as well as which IRQ it was. Actually, it will just write and increment the write pointer. Logic can figure out that an interrupt is pending because there is a used memory slot between the read and write pointers. Then, if the CPU is not in an interrupt, it gets interrupted. If it is in an interrupt, nothing happens until the interrupt handler returns, when the CPU immediately gets pushed back into an interrupt.

With a sufficiently big ring buffer, sufficiently fast interrupt handlers and sufficiently interspersed interrupt requests, this scheme should handle all of my needs for the conceivable future.
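A C sketch of the ring buffer scheme described above - a power-of-two ring, "pending" meaning the pointers differ, and overflow handling left out, as in the post:

    #include <stdint.h>
    #include <stdbool.h>

    #define RING_SIZE 16                 /* power of two */
    static uint8_t  ring[RING_SIZE];
    static unsigned rd, wr;

    void irq_arrived(uint8_t irq_num)    /* hardware side: record + bump */
    {
        ring[wr & (RING_SIZE - 1)] = irq_num;
        wr++;
    }

    bool irq_pending(void)               /* used slot between rd and wr? */
    {
        return rd != wr;
    }

    int irq_take(void)                   /* CPU side: consume next IRQ */
    {
        if (rd == wr)
            return -1;
        return ring[rd++ & (RING_SIZE - 1)];
    }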

Reply to
Aleksandar Kuktin

Interesting concept. I suppose I'll check out the Z80 and the Motorola. I've always had a suspicion I'd end up doing something with the Z80.

Reply to
Aleksandar Kuktin


One advantage of using co-routines instead of subroutines for handling interrupts is that the suspended PC encodes some state, while the subroutine (traditional) interrupt handler always starts at the exact same address and has to spend some time figuring out the current system state.

The TX-2 machine (MIT Lincoln Laboratory) was an early co-routine based design, and the Xerox PARC machines starting with the Alto used this extensively. I have used this in some of my designs (using the distributed RAM instead of the flip-flops in FPGA slices for registers allows you to have, for example, 16 PCs in the same area where you would normally just have one) and it can work very well.

At the software level, the Apertos (later Aperios) operating system from Sony Labs implemented its drivers as co-routines instead of subroutines.
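A minimal C sketch of the idea, with a function pointer standing in for the suspended PC - all names invented:

    typedef void (*resume_point_t)(void);

    static void wait_for_data(void);
    static void wait_for_idle(void);

    /* The coroutine's "suspended PC": where the handler resumes. */
    static resume_point_t resume = wait_for_data;

    static void wait_for_data(void)
    {
        /* ...consume a received byte... */
        resume = wait_for_idle;     /* "suspend" here until next IRQ */
    }

    static void wait_for_idle(void)
    {
        /* ...line idle, start the next transfer... */
        resume = wait_for_data;
    }

    void uart_irq(void)
    {
        /* A traditional handler would re-derive protocol state here;
           the coroutine-style handler just continues where it left off. */
        resume();
    }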


-- Jecel

Reply to
Jecel
