Only if you need that information. Either way, especially on systems with a cache, accessing a state variable is (always?) faster than accessing peripheral I/O registers.
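A minimal sketch of keeping that information in a state variable (a "shadow" copy) instead of reading it back from the device. The register and the gpio_* names are hypothetical. The register is simulated with an ordinary volatile variable so the sketch runs anywhere. The pattern assumes this code is the only writer of the register and that the hardware does not change the bits on its own.

```c
#include <stdint.h>

/* Hypothetical output-control register. On real hardware this would point
 * at a memory-mapped address; here it is a plain volatile variable. */
static volatile uint32_t fake_gpio_out;
static volatile uint32_t *const gpio_out_reg = &fake_gpio_out;

/* Shadow copy of the last value written. Reading it is an ordinary
 * (possibly cached) memory access, not a bus cycle to the peripheral. */
static uint32_t gpio_out_shadow;

static void gpio_set_bits(uint32_t mask)
{
    gpio_out_shadow |= mask;          /* update the state variable */
    *gpio_out_reg = gpio_out_shadow;  /* one write to the device */
}

static void gpio_clear_bits(uint32_t mask)
{
    gpio_out_shadow &= ~mask;
    *gpio_out_reg = gpio_out_shadow;
}

/* Query the current output state without touching the I/O register. */
static uint32_t gpio_get_outputs(void)
{
    return gpio_out_shadow;
}
```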
It's really not an issue of optimization. These primitives only come into play at the lowest levels of I/O, but I usually generalize them into inline functions defined in a CPU-specific header file. Then I use them when writing the device driver as if they were required on any processor that may execute the driver. On platforms where their function is not required, the inline functions generate no code.
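A rough sketch of what such a CPU-specific header might look like. The header contents, the CPU_NEEDS_IO_BARRIER configuration macro and the io_* function names are all hypothetical. __sync_synchronize() is a real GCC/Clang builtin standing in for whatever barrier instruction a given port actually needs.

```c
#include <stdint.h>

/* Hypothetical "cpu_io.h". A per-CPU build would define
 * CPU_NEEDS_IO_BARRIER when the processor requires explicit ordering. */
#if defined(CPU_NEEDS_IO_BARRIER)
static inline void io_write_barrier(void)
{
    /* Full memory barrier (GCC/Clang builtin). A real port might use a
     * narrower instruction, e.g. "eieio" on PowerPC or "dmb" on ARM. */
    __sync_synchronize();
}
#else
static inline void io_write_barrier(void)
{
    /* Not required on this CPU: the call compiles to nothing. */
}
#endif

/* The driver always calls the primitive, whether or not it emits code. */
static inline void io_write32(volatile uint32_t *reg, uint32_t val)
{
    io_write_barrier();  /* order earlier writes before this one */
    *reg = val;
}
```

The driver source stays the same on every processor; only the header decides whether the barrier costs anything.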
PCI door-bell registers are a good example. Each master device on the PCI bus may be assigned one bit in the register, which it can use to raise an interrupt on the processor that owns the door-bell register.
When a master device wishes to interrupt the device with the door-bell register, it simply writes a bit mask to the register with only its assigned bit set to one.
Any of the other (up to 32) master devices on the PCI bus may do the same without contention.
For this function, there is no need to know the state of the interrupt bit.
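A simulated sketch of why no read-modify-write is needed, assuming write-one-to-set semantics (1 bits are merged into the pending set, 0 bits are ignored). The register is modelled in software, and the function names and the clear-on-acknowledge behaviour are assumptions for illustration, not any particular chip's interface.

```c
#include <stdint.h>

/* Simulated door-bell register. In hardware the merge below is done by the
 * register itself; this single-threaded model just stands in for it. */
static uint32_t doorbell_pending;

static void doorbell_hw_write(uint32_t value)
{
    doorbell_pending |= value;   /* write-one-to-set: zeros are ignored */
}

/* A master rings only its own bit (master_id must be 0..31). It never reads
 * the register first, so concurrent masters cannot erase each other's bit. */
static void ring_doorbell(unsigned master_id)
{
    doorbell_hw_write(UINT32_C(1) << master_id);
}

/* Interrupt handler on the owning processor: take a snapshot of pending
 * bits and acknowledge exactly those bits (assumed clear-on-ack). */
static uint32_t doorbell_take_pending(void)
{
    uint32_t snapshot = doorbell_pending;
    doorbell_pending &= ~snapshot;
    return snapshot;
}
```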
For PCI, using the door-bell register also eliminates the possibility of the interrupt arriving at the processor ahead of previously written data that may still be traversing the PCI bus ;-)
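A toy model of that ordering argument, not PCI itself. It assumes the data writes and the door-bell write from one master travel the same posted-write path and are retired strictly in order, so by the time the door-bell bit is visible, the data already is. The queue and all names are hypothetical, and a separate interrupt line (which could overtake the queue) is deliberately not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* One master's posted-write path, retired strictly in FIFO order. */
enum { QLEN = 16 };

struct posted_write { volatile uint32_t *addr; uint32_t value; };

static struct posted_write pw_queue[QLEN];
static size_t q_head, q_tail;

/* Target-side memory: a small shared buffer and the door-bell register. */
static volatile uint32_t shared_buf[4];
static volatile uint32_t doorbell;

/* Master side: queue a write and move on (no overflow check in this toy). */
static void post_write(volatile uint32_t *addr, uint32_t value)
{
    pw_queue[q_tail++ % QLEN] = (struct posted_write){ addr, value };
}

/* Target side: complete one queued write; returns 0 when the path is empty. */
static int retire_one(void)
{
    if (q_head == q_tail)
        return 0;
    struct posted_write w = pw_queue[q_head++ % QLEN];
    *w.addr = w.value;
    return 1;
}
```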
I have no idea. I suspect that there are military and other high-performance applications that require those kinds of embedded systems. Besides, as such systems become more available and less expensive, the applications will come. Today's multi-core desktop and laptop processors will pave the way.
If you build it ... they will come :-)
I don't know if abstraction is the right term, but it certainly increases the complexity of software systems when multi-threading and multiple processors are used, especially at the system level.
As an example, look at the game console industry now as it grapples with the three-core Xbox 360 PowerPC processor (SMP) and the asymmetric (but still multi-processor) Cell processor.
Why not? But ... only if I need it of course ;-)
What doubts about efficiency? I suppose you're referring to the OS here, but neither SMP nor obtaining the performance of multi-processor systems *requires* a big OS, nor an opaque distinction between user and kernel space (à la Linux).
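As one small illustration that sharing data between cores needs no kernel, here is a sketch of a bare-metal spinlock built only on C11 atomics. The bm_* names are hypothetical; atomic_flag_test_and_set_explicit and atomic_flag_clear_explicit are standard <stdatomic.h> calls. It assumes the target's C11 atomics map to real cross-core atomic instructions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A spinlock with no OS behind it: just an atomic test-and-set flag
 * visible to every core. */
typedef struct { atomic_flag locked; } bm_spinlock;

#define BM_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static bool bm_trylock(bm_spinlock *l)
{
    /* test_and_set returns the previous value: false means we took it. */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static void bm_lock(bm_spinlock *l)
{
    while (!bm_trylock(l)) {
        /* busy-wait; a real port might add a CPU "pause"/"wfe" hint here */
    }
}

static void bm_unlock(bm_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Acquire on lock and release on unlock are enough to make writes done inside the critical section visible to the next core that takes the lock.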
Code bloat is a function of programmers and feature-creep, not of the processor or OS.