The Semantics of 'volatile'
===========================
I've been meaning to get to this for a while; finally there's a suitable chunk of free time available to do so.
To explain the semantics of 'volatile', we consider several questions about the concept and how volatile variables behave, etc. The questions are:
1. What does volatile do?
2. What guarantees does using volatile provide? (What memory regimes must be affected by using volatile?)
3. What limits does the Standard set on how using volatile can affect program behavior?
4. When is it necessary to use volatile?
We will take up each question in the order above. The comments are intended to address both developers (those who write C code) and implementors (those who write C compilers and libraries).
What does volatile do?
----------------------
This question is easy to answer if we're willing to accept an answer that may seem somewhat nebulous. Volatile allows contact between execution internals, which are completely under control of the implementation, and external regimes (processes or other agents) not under control of the implementation. To provide such contact, and provide it in a well-defined way, using volatile must ensure a common model for how memory is accessed by the implementation and by the external regime(s) in question.
Subsequent answers will fill in the details around this higher-level one.
What guarantees does using volatile provide?
--------------------------------------------
The short answer is "None." That deserves some elaboration.
Another way of asking this question is, "What memory regimes must be affected by using volatile?" Let's consider some possibilities:
1. Accesses occur not just to registers but to process virtual memory (which might be just cache); threads running in the same process affect and are affected by these accesses.
2. Accesses occur not just to cache but are forced out into the inter-process memory (or "RAM"); other processes running on the same CPU core affect and are affected by these accesses.
3. Accesses occur not just to memory belonging to the one core but to memory shared by all the cores on a die; other processes running on the same CPU (but not necessarily the same core) affect and are affected by these accesses.
4. Accesses occur not just to memory belonging to one CPU but to memory shared by all the CPUs on the motherboard; processes running on the same motherboard (even if on another CPU on that motherboard) affect and are affected by these accesses.
5. Accesses occur not just to fast memory but also to some slower, more permanent memory (such as a "swap file"); other agents that access the "swap file" affect and are affected by these accesses.
The different examples are intended informally, and in many cases there is no distinction between several of the different layers. The point is that different choices of regime are possible (and I'm sure many readers can provide others, such as not only which memory is affected but what ordering guarantees are provided). Now the question again: which (if any) of these different regimes are /guaranteed/ to be included by a 'volatile' access?
The answer is none of the above. More specifically, the Standard leaves the choice completely up to the implementation. This specification is given in one sentence in 6.7.3 p 6, namely:
What constitutes an access to an object that has volatile-qualified type is implementation-defined.
So a volatile access could be defined as coordinating with any of the different memory regime alternatives listed above, or other, more exotic, memory regimes, or even (in the claims of some ISO committee participants) no particular other memory regimes at all (so a compiler would be free to ignore volatile completely)[*]. How extreme this range is may be open to debate, but I note that Larry Jones, for one, has stated unequivocally that the possibility of ignoring volatile completely is allowed under the proviso given above. The key point is that the Standard does not identify which memory regimes must be affected by using volatile, but leaves that decision to the implementation.
A corollary to the above is that any volatile-qualified access automatically introduces an implementation-defined aspect into a program.
[*] Possibly not counting the specific uses of 'volatile' that the Standard identifies for setjmp/longjmp and signals, but these are side issues.

What limits are there on how volatile access can affect program behavior?
-------------------------------------------------------------------------
The Standard says, in the same paragraph (6.7.3 p 6):
An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects.
Nowhere in the Standard are any limitations stated as to what such side effects might be. Since they aren't defined, they fall under the Standard's rule that the omission of any explicit definition of behavior indicates undefined behavior (clause 4 p 2). So any volatile-qualified access results in undefined behavior (in the sense that the Standard uses the term).
Some people are bothered by the idea that using volatile produces undefined behavior, but there really isn't any reason to be. At some level any C statement (or variable access) might behave in ways we don't expect or want. Program execution can always be affected by peculiar hardware, or a buggy OS, or cosmic rays, or anything else outside the realm of what the implementation knows about. It's always possible that there will be unexpected changes or side effects, in the sense that they are unexpected by the implementation, whether volatile is used or not. The difference is, using volatile interacts with these external forces in a more well-defined way; if volatile is omitted, there is no guarantee as to how external forces on particular parts of the physical machine might affect (or be affected by) changes in the abstract machine.
Somewhat more succinctly: using volatile doesn't affect the semantics of the abstract machine; it admits undefined behavior by unknown external forces, which isn't any different from the non-volatile case, except that using volatile adds some (implementation-defined) requirements about how the abstract machine maps onto the physical machine in the external forces' universe. However, since the Standard mentions unknown side effects explicitly, such things seem more "expectable" when volatile is used. (volatile == Expect the unexpected?)
When is it necessary to use volatile?
-------------------------------------
In terms of pragmatics this question is the most interesting of the four. Of course, as phrased the question asked is more of a developer question; for implementors, the phrasing would be something more like "What requirements must my implementation meet to satisfy developers who are using 'volatile' as the Standard expects?"
To get some details out of the way, there are two specific cases where it's necessary to use volatile, called out explicitly in the Standard, namely setjmp/longjmp (in 7.13.2.1 p 3) and accessing static objects in a signal handler (in 7.14.1.1 p 5). If you're a developer writing code for one of these situations, either use volatile, code around it so volatile isn't needed (this can be done for setjmp), or be sure that the particular code you're writing is covered by some implementation-defined guarantees (extensions or whatever). Similarly, if you're an implementor, be sure that using volatile in the specific cases mentioned produces code that works; what this means is that the volatile-using code should behave just like it would under regular, non-exotic control structures. Of course, it's even better if the implementation can do more than the minimum, such as: define and document some additional cases for signal handling code; make variable access in setjmp functions work without having to use volatile, or give warnings for potential transgressions (or both).
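The two cases the Standard calls out can be sketched in a few lines. This is a minimal illustration, not code from the Standard: the names `jump_env`, `value_after_longjmp`, `got_signal`, `on_signal`, and `signal_was_seen` are hypothetical. `raise` is used only so the signal case is deterministic; the `volatile sig_atomic_t` pattern is what matters when a signal arrives asynchronously.

```c
#include <setjmp.h>
#include <signal.h>

/* Case 1 (7.13.2.1 p 3): an automatic object changed between setjmp and
   longjmp has an indeterminate value after the jump unless it is volatile. */
static jmp_buf jump_env;

static void jump_back(void) {
    longjmp(jump_env, 1);        /* setjmp now appears to return 1 */
}

int value_after_longjmp(void) {
    volatile int counter = 0;    /* modified after setjmp, read after longjmp */
    if (setjmp(jump_env) == 0) { /* an allowed context for invoking setjmp */
        counter = 42;
        jump_back();             /* does not return */
    }
    return counter;              /* determinate only because counter is volatile */
}

/* Case 2 (7.14.1.1 p 5): for an asynchronous signal, a handler may touch a
   static object only by assigning to a volatile sig_atomic_t. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig) {
    (void)sig;                   /* parameter unused */
    got_signal = 1;
}

int signal_was_seen(void) {
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;               /* handler could not be installed */
    raise(SIGINT);               /* the handler runs before raise returns */
    return got_signal;
}
```

Here `value_after_longjmp()` returns 42 and `signal_was_seen()` returns 1. Dropping `volatile` from `counter` would make the first result indeterminate, even if a particular compiler happens to produce 42 anyway.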
The two specific cases are easy to identify, but of course the interesting cases are everything else! This area is one of the murkiest in C programming, and it's useful to take a moment to understand why. For implementors, there is a tension between code generation and what semantic interpretation the Standard requires, mostly because of optimization concerns. Nowhere is this tension felt more keenly than in translating 'volatile' references faithfully, because volatile exists to make actions in the abstract machine align with those occurring in the physical machine, and such alignment prevents many kinds of optimization. To appreciate the delicacy of the question, let's look at some different models for how implementations might behave.
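The tension between optimization and faithful translation of volatile shows up even in two back-to-back stores. In this minimal sketch (the function names are hypothetical), the only difference between the functions is the qualifier:

```c
/* Ordinary object: the first store is dead (nothing reads *p before it is
   overwritten), so an optimizer may emit only the store of 2. */
void store_twice_plain(int *p) {
    *p = 1;
    *p = 2;
}

/* Volatile object: 6.7.3 p 6 requires evaluation strictly according to the
   rules of the abstract machine, so both stores are expected to reach the
   physical machine, in order. (What counts as an "access" remains
   implementation-defined.) */
void store_twice_volatile(volatile int *p) {
    *p = 1;
    *p = 2;
}
```

Both functions leave 2 in `*p`. The difference lies only in what the physical machine is expected to see along the way, which is exactly what an external agent, such as a device watching that address, would observe.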
The first model is given as an Example in 5.1.2.3 p 8:
EXAMPLE 1 An implementation might define a one-to-one correspondence between abstract and actual semantics: at every sequence point, the values of the actual objects would agree with those specified by the abstract semantics.
We call this the "White Box model". When using implementations that follow the White Box model, it's never necessary to use volatile (as the Standard itself points out: "The keyword volatile would then be redundant.").
At the other end of the spectrum, a "Black Box model" can be inferred based on the statements in 5.1.2.3 p 5. Consider an implementation that secretly maintains "shadow memory" for all objects in a program execution. Regular memory addresses are used for address-taking or index calculation, but any actual memory accesses would access only the shadow memory (which is at a different location), except for volatile-qualified accesses which would load or store objects in the regular object memory (i.e., at the machine addresses produced by pointer arithmetic or the & operator, etc). Only the implementation would know how to turn a regular address into a "shadow" object access. Under the Black Box model, volatile objects, and only volatile objects, are usable in any useful way by any activity outside of or not under control of the implementation.
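To make the thought experiment concrete, here is a toy simulation of the Black Box model. It is purely illustrative: no real compiler is claimed to work this way, and the names `regular_memory`, `shadow_memory`, `bb_store`, and `bb_load` are hypothetical. Object "addresses" are array indices, and an `is_volatile` flag stands in for the volatile qualifier.

```c
#define NOBJS 8

/* "Regular" memory: what a debugger, the OS, or a device would see. */
int regular_memory[NOBJS];

/* Shadow memory: a private copy only the implementation knows how to reach. */
static int shadow_memory[NOBJS];

/* Store into object number obj (0 <= obj < NOBJS, unchecked);
   only volatile-qualified stores reach regular memory. */
void bb_store(int obj, int value, int is_volatile) {
    if (is_volatile)
        regular_memory[obj] = value;
    else
        shadow_memory[obj] = value;
}

/* Load from object number obj, following the same rule. */
int bb_load(int obj, int is_volatile) {
    return is_volatile ? regular_memory[obj] : shadow_memory[obj];
}
```

After `bb_store(0, 7, 0)` the program reads 7 back from object 0, yet an outside observer of `regular_memory[0]` still sees 0; after `bb_store(1, 9, 1)` the observer sees 9. That asymmetry is the whole of the Black Box model.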
At this point we might stop and say, well, let's just make a conservative assumption that the implementation is following the Black Box model, and that way we'll always be safe. The problem with this assumption is that it's too conservative; no sensible implementation would behave this way. Consider some of the ramifications:
- Couldn't use a debugger to examine variables (except volatile variables);
- Couldn't call an externally defined function written in assembly or another language, unless the function is declared with a prototype having volatile-qualified parameters (and even that case isn't completely clear, because of the rule at the end of 6.7.5.3 p 15 about how function types are compared and how composite types are formed);
- Couldn't call ordinary OS functions like read() and write() unless the memory buffers were accessed using volatile-qualified expressions.
These "impossible" conditions never happen because no implementation is silly enough to take the Black Box model literally. Technically, it would be allowed, but no one would use it because it breaks too many deep assumptions about how a C runtime interacts with its environment.
A more realistic model is one of many "Gray Box models" such as the example implementation mentioned in 5.1.2.3 p 9:
Alternatively, an implementation might perform various optimizations within each translation unit, such that the actual semantics would agree with the abstract semantics only when making function calls across translation unit boundaries. In such an implementation, at the time of each function entry and function return where the calling function and the called function are in different translation units, the values of all externally linked objects and of all objects accessible via pointers therein would agree with the abstract semantics. Furthermore, at the time of each such function entry the values of the parameters of the called function and of all objects accessible via pointers therein would agree with the abstract semantics. In this type of implementation, objects referred to by interrupt service routines activated by the signal function would require explicit specification of volatile storage, as well as other implementation-defined restrictions.
Here the implementation has made a design choice that makes volatile superfluous in many cases. To get variable values to store-synchronize, we need only call an appropriate function:
    extern void okey_dokey( void );
    extern int v;
    ...
    v = 49;         // storing into v is a "volatile" access
    okey_dokey();
    foo( v );       // this access is also "volatile"
Note that these "volatile" accesses work the way an actual volatile access does because of an implementation choice about calling functions defined in other translation units; obviously that's implementation-dependent.
Let's look at one more model, of interest because it comes up in operating systems, which are especially prone to want to do things that won't work without 'volatile'. In our hypothetical kernel code, we access common blocks by surrounding the access code with mutexes, which for simplicity are granted with spin locks. Access code might look like this:
    while( block_was_locked() ) { /*spin*/ }
    // getting here means we have the lock
    // access common block elements here
    // ... and access some more
    // ... and access some more
    // ... and access some more
    unlock_block();
Here it's understood that locking ('block_was_locked()') and unlocking ('unlock_block()') will be done using volatile, but the accesses inside the critical region of the mutex just use regular variable access, since the block access code is protected by the mutex.
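The text leaves the bodies of `block_was_locked()` and `unlock_block()` unspecified beyond "using volatile". Standard C has no way to perform an atomic test-and-set through a volatile object alone, so the sketch below swaps in C11's `<stdatomic.h>` `atomic_flag`. That is a different mechanism from volatile, with ordering guarantees the Standard specifies rather than leaving implementation-defined. The two function names come from the example; their bodies, the `shared` block, and `update_block` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word guarding the common block. */
static atomic_flag block_lock = ATOMIC_FLAG_INIT;

/* Hypothetical common block: accessed with ordinary, non-volatile loads and stores. */
struct common_block { int count; int last_writer; };
struct common_block shared;

/* Returns true if the block was already locked (caller keeps spinning),
   false if this call has just acquired the lock. */
bool block_was_locked(void) {
    return atomic_flag_test_and_set_explicit(&block_lock, memory_order_acquire);
}

/* Releases the lock; release ordering publishes the critical-region writes. */
void unlock_block(void) {
    atomic_flag_clear_explicit(&block_lock, memory_order_release);
}

/* The access pattern from the text, wrapped in a function. */
void update_block(int writer_id) {
    while (block_was_locked()) { /* spin */ }
    /* getting here means we have the lock */
    shared.count += 1;
    shared.last_writer = writer_id;
    unlock_block();
}
```

As in the model described above, the accesses inside the critical region are ordinary ones; under C11 it is the acquire/release ordering on the lock operations, not any qualifier on `shared`, that keeps them inside the region.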
If one is implementing a compiler to be used on operating system kernels, this model (only partially described, but I think the salient aspects are clear enough) is one worth considering. Of course, the discussion here is very much simplified; there are many more considerations when designing actual operating system locking mechanisms, but the basic scheme should be evident.
Taking a broader perspective, is it safe to assume this model holds in some unknown implementation(s) on our platforms of choice? No, of course it isn't. The behavior of volatile is implementation-dependent. The model here is relevant because many kernel developers unconsciously expect their assumptions about locks and critical regions, etc., to be satisfied by using volatile in this way. An implementation would be foolish to ignore such assumptions, especially if kernel developers were known to be in its target audience.
Returning to the original question, what answers can we give?
If you're an implementor, know that the Standard offers great latitude in what volatile is required to do, but choosing any of the extreme points is likely to be a losing strategy no matter what your target audience is. Think about what other execution regime(s) your target audience wants/needs to interact with; choose an appropriate model that allows volatile to interact with those regimes in a convenient way; document that model (as 6.7.3p6 requires for this implementation-defined aspect) and follow it faithfully in producing code for volatile access. Remember that you're implementing volatile to provide access to alternative execution regimes, not just because the Standard requires it, and it should work to provide that access, conveniently and without undue mental contortions. Depending on the extent of the regimes or the size of the target audience, several different models might be given under different compiler options (if so it would help to record which model is being followed in each object file, since the different models are likely not to intermix in a constructive way).
If you're a developer, and are intent on being absolutely portable across all implementations, the only safe assumption is the Black Box model, so just make every single variable and object access be volatile-qualified, and you'll be safe. More practically, however, a Gray Box model like one of the two described above probably holds for the implementation(s) you're using. Look for a description of what the safe assumptions are in the implementations' documentation, and follow that; and it would be good to let the implementors know if a suitable description isn't there or doesn't describe the requirements adequately.