You're not thinking with HLLs in mind -- where a *tool* creates the software (how does the tool tell you, concisely, which registers it used?)
And, even if you know which registers were used, you don't know which were used SINCE THE LAST CONTEXT SWITCH!
The silicon is trivial: each load of a register (or register in a register file) forces a corresponding bit to be set in a collection of flags.
Then, a new "PUSH" opcode simply uses that "collection of flags" as the mask selecting which registers get saved.
If you had to "manually" examine the flags (bits) in that vector and conditionally save/restore registers, the overhead of doing so would swallow whatever you saved over just unconditionally performing the save/restore.
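To make the flag idea concrete, here is a rough software model of it in C. This is only a sketch under my own assumptions: the names (cpu_model, load_reg, push_dirty), the eight-register file, and the choice to clear the flags as part of the push are all hypothetical, and real silicon would do the selection in parallel rather than in a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 8              /* hypothetical register-file size */

typedef struct {
    uint32_t regs[NUM_REGS];
    uint8_t  dirty;             /* bit n set => register n loaded since last switch */
} cpu_model;

/* Every register load also sets the matching flag, as the hardware would. */
static void load_reg(cpu_model *cpu, unsigned n, uint32_t value)
{
    assert(n < NUM_REGS);
    cpu->regs[n] = value;
    cpu->dirty |= (uint8_t)(1u << n);
}

/*
 * Model of the proposed opcode: the flag vector acts as the register mask.
 * Only flagged registers are written to 'stack' (lowest register first).
 * The flags are then cleared (an assumption) so the next switch sees only
 * registers touched after this one. Returns the number of words pushed.
 */
static size_t push_dirty(cpu_model *cpu, uint32_t *stack)
{
    size_t sp = 0;
    for (unsigned n = 0; n < NUM_REGS; n++) {
        if (cpu->dirty & (1u << n))
            stack[sp++] = cpu->regs[n];
    }
    cpu->dirty = 0;
    return sp;
}
```

The point of doing this in one opcode is that the loop and the bit tests above cost nothing at run time; written out as ordinary instructions, they are exactly the "manual" overhead described above.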
In essence, this is what I do with my handling of the FPU context (see other post). I assume the FPU registers are NOT used and let the processor tell me, by invoking a TRAP handler, when a floating point instruction is invoked. (In the NS32k example, the FPU is an optional component in the early systems; if it is NOT present, the opcodes are implemented by traps to user-supplied emulation functions.)
Of course, I can use that notification (with or without a hardware FPU) to alert the OS to the fact that the additional state is being referenced and save/restore it, as appropriate.
[This only needs to happen at most once for each context switch]
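A rough C model of that trap-driven scheme follows. It is a sketch, not my actual code: the machine and task structures, the fpu_owner bookkeeping, and the function names are hypothetical, and the "trap" is simulated by a plain function call made when the FPU is marked disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FPU_REGS 8                  /* hypothetical FPU register count */

typedef struct task {
    double fpu_save[FPU_REGS];      /* per-task copy of the FPU state */
} task;

typedef struct {
    double   fpu[FPU_REGS];         /* live FPU registers */
    bool     fpu_enabled;           /* false => next FP instruction traps */
    task    *current;               /* running task */
    task    *fpu_owner;             /* task whose state is live in fpu[] */
    unsigned fpu_swaps;             /* how often FPU state was actually moved */
} machine;

/* The switch itself never touches FPU state; it only arms the trap. */
static void context_switch(machine *m, task *next)
{
    assert(next != NULL);
    m->current = next;
    m->fpu_enabled = (m->fpu_owner == next);
}

/* Trap handler: runs at most once per context switch, on first FP use. */
static void fpu_trap(machine *m)
{
    assert(m->current != NULL);
    if (m->fpu_owner != NULL)
        memcpy(m->fpu_owner->fpu_save, m->fpu, sizeof m->fpu);
    memcpy(m->fpu, m->current->fpu_save, sizeof m->fpu);
    m->fpu_owner = m->current;
    m->fpu_enabled = true;          /* later FP instructions run untrapped */
    m->fpu_swaps++;
}

/* Stand-in for a floating point instruction writing an FPU register. */
static void fp_store(machine *m, unsigned reg, double value)
{
    assert(reg < FPU_REGS);
    if (!m->fpu_enabled)
        fpu_trap(m);
    m->fpu[reg] = value;
}
```

In this model, a task that never executes a floating point instruction never causes the FPU state to be saved or restored, and switching away from a task and back again without any intervening FP use by another task costs nothing extra.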