The Semantics of 'volatile'

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

volatile int buffer_ready = 0;

of course!

;^)

Now, there is another issue. On NUMA systems with a non-cache-coherent network of processing nodes, the code above might not work. One may need to issue a special instruction in order to force the store issued on `buffer_ready' to propagate from the intra-node level up to the inter-node level. Think of `check_and_process_buffer()' running on `Node1-CpuA-Core2-Thread-3', and `buffer_init()' running on `Node4-CpuD-Core3-Thread-1', with the memory that makes up `buffer_ready' and `buffer' local to `Node4'. There is no guarantee that the store to `buffer_ready' will become visible to the CPUs on `Node1'. You may need to use special instructions, such as message passing via a channel interface or something. Think of the PPC wrt communication between the memory which belongs to the main PowerPC cores and the local private memory that belongs to each SPU. volatile alone is not going to help here, in any way, shape, or form...

Reply to
Chris M. Thomasson

Hmmmm... how can I say this gently? I think you may be confusing the notions of abstract machine and physical machine.

You say, in part, "[MEMBAR] maintains the state of the abstract machine to what the developer wrote." In fact MEMBAR is not necessary for correct functioning of the abstract machine. If MEMBAR is necessary at all, it's necessary only for producing appropriate physical machine semantics for use of volatile. If all MEMBAR's were taken out, and no variables were accessed externally, the program would still execute correctly. In other words the abstract machine would still get a faithful mapping -- it's only external accesses that might be affected, and such accesses are not part of the abstract machine.

Also, you talk about "the hardware level". There isn't a single hardware level. There are at least two, namely, the hardware state as seen by execution of a single instruction stream (where MEMBAR isn't needed), and the hardware state as seen by execution of another thread or process, perhaps on another CPU (where MEMBAR may be necessary to preserve some sort of partial ordering as seen by the single instruction stream "virtual machine"). It isn't required that volatile take the latter perspective -- it could just as well take the first perspective, under the provision that what constitutes a volatile-qualified access is implementation-defined. Depending on what environments the implementation is intended to support, either choice might be a good one.

I fully understand that hardware plays a role in "optimization" (used in a slightly different sense here) -- I mentioned store reordering in another posting, and there is also out-of-order execution, and even speculative execution, etc. However these "optimizations" are irrelevant as far as the implementation is concerned (for non-volatile access), because the state as viewed by the single executing instruction stream is carefully maintained to appear exactly as though storage ordering is preserved, instruction ordering is preserved, speculative branches that end up not being taken are suppressed, etc[*]. It's only when 'volatile' is involved that these effects might matter, because the program execution state is being viewed by an agent (process, thread, device logic, etc) external to this program's execution state.

Finally, to repeat/restate my earlier comment, it's only true that these effects /might/ matter, and not that they /must/ matter, because an implementation isn't obligated to take the other-process perspective as to what constitutes a volatile-qualified access. Depending on what choice is made for this, the hardware-level "optimizations" might or might not need to be taken into account for what volatile does.

Does that make a little more sense now?

[*] It's a different story for machines like MIPS where the underlying pipeline stages are exposed at the architectural level. But that's not important for this discussion.
Reply to
Tim Rentsch

Right, the different types of memory actions correspond to different memory regimes -- the intra-node level is one memory regime, and the inter-node level is another memory regime.

I take what you're saying here to mean that the implementation shown above, with MEMBAR's but not special instructions for inter-node stores, will not guarantee that the store to 'buffer_ready' will be visible, because so much depends on the specific memory architectures and how they interact.

Yes -- arbitrarily diverse memory architectures mean potentially arbitrarily complicated memory coherence mechanisms.

Most likely it won't, but in principle it could. Assuming first that the necessary memory linkage could be established, so memory in the 'buffer_init()' process could be accessed by code in the 'check_and_process_buffer()' process, an implementation could choose to implement volatile so it synchronized the two memories appropriately when the volatile accesses are done.

In practical terms I agree this sort of implementation isn't likely, but the Standard allows it -- in particular, as to how 'volatile' would behave in this respect, because that choice is implementation-defined.

Reply to
Tim Rentsch

Well, I ___expect___ MEMBAR to behave well within an intra-node point of view. AFAICT, for inter-node communications, on NON ccNUMA (e.g., real cache-incoherent NUMA), if you found MEMBAR pushing out coherency "pings" across inter-node boundaries, well, that would not be good, IMVVVHO at least!

However, what does ANY of that have to do with volatile?

A: NOTHING!

;^o

Indeed. Well, I totally disagree on a vibe I am getting from your statement. I get the vibe that you seem to think diverse highly specific memory models seem to potentially require complicated coherence... Well, the term `complicated' is in the eye/ear of the individual beholder, or perhaps softened across a plurality of a specific local group of beholders... Statistics are so precise!

Jesting of course... Perhaps? ;^D

Anyway, you're 100% correct. Sometimes a parallelization of an algorithm might simply require so many rendezvous of some, perhaps "dubious", sort that it simply cannot ever be made to scale in its present form.

Yes. Absolutely.

An implementation can do its thing and define volatile accordingly.

I PERSONALLY WANT volatile to be restricted to compiler optimizations wrt the context of the abstract virtual machine. Of course the abstract machine is single-threaded. Great! That means a physical machine can implement a million threads that each implement a single local abstract C machine. They never communicate until the end of the computation. Let's say that takes a month. That whole month is governed by the local abstract machines. Let's say they have the ability to network and cleverly rendezvous in a NUMA system after they are finished? I say yes... No volatile needed; well, volatile can probably be efficiently used by node-local-only code...

As for optimizations on loop conditions, well, that's newbie stuff...

;^o

Reply to
Chris M. Thomasson

Sorry, I guess I wasn't quite clear enough. My comment was meant basically as an implicit question, trying to clarify your intended meaning. Rephrasing, the two salient features are, one, using MEMBAR is enough to guarantee intra-node access consistency, and two, using MEMBAR (and nothing else) is not enough to guarantee inter-node access consistency. That's what I thought you meant before, and I read this response as confirming that.

I think it's relevant to the discussion (ie, of volatile) because there are two clearly distinct memory regimes (intra-node and inter-node), and it's perfectly reasonable to consider an implementation's volatile supporting one but not the other. Indeed, I take what you're saying to mean it's reasonable to /expect/ volatile to support intra-node access but not inter-node access in this case. And I think that's right, in the sense that many people experienced in such architectures would expect the same thing. (And I wouldn't presume to contradict them, even if I expected something else, which actually I don't.)

My statement was more in the nature of an abstract, "mathematical" conclusion than a comment on what architectures are actually out there. I think we're actually pretty much on the same page here. (OOPS! No pun intended...)

To say this another way, parallelizing an algorithm in a particular way might work well for one kind of synchronization (eg, intra-node coherence) but not for another kind of synchronization (eg, inter-node coherence).

I had to read this sentence over several times to try to make sense of it. I think I understand what you're saying; let me try saying it a different way and see if we're in sync. Optimizations don't happen in the abstract machine -- it's a single-thread, one-step-at-a-time model, exactly faithful to the original program source. However, in the course of running a program on an actual computer, there needs to be a degree of coherence between the abstract machine's "memory system" and the computer's memory system. (The abstract machine's "memory system" doesn't really exist except in some sort of conceptual sense, but it seems useful to pretend it exists, to talk about coherence between it and the actual computer memory.) The coherence between the abstract machine's memory system and the actual computer's memory doesn't have to be exact, it only has to match up to the point where the "as if" rule holds. Does that make sense?

Under this model, I think you're saying that you would like volatile to impose coherence between the abstract machine memory system and the "most local" physical machine memory system (ie, the same thread executing on the same CPU), and not more than that. This coherence is stronger than the non-volatile coherence, because the two memory systems must be completely in sync (and not just "as if" in sync) at points of volatile access.

In other words, the memory regime you're identifying (that volatile would or should align with) is the same-thread, same-CPU memory regime. Anything more than that, including inter-core (but still intra-CPU), or even inter-thread (but still intra-core and intra-CPU) would not be covered just by volatile. Is that what you mean, or do you mean to say something different?

To come at this a different way, let me ask it this way: which level of communication/coherence do you mean to say volatile should support?

(a) only same-thread, same core, same CPU, same node
(b) inter-thread, intra-core, intra-CPU, intra-node
(c) inter-thread, inter-core, intra-CPU, intra-node
(d) inter-thread, inter-core, inter-CPU, intra-node
(e) inter-thread, inter-core, inter-CPU, inter-node
(f) something else?

(I didn't even mention intra/inter-process...)

I first thought you meant (a), but now I'm not so sure.

I'm not sure what a same-thread/same-core/same-CPU/same-node definition of volatile buys us, except some sort of guarantee for variable access in intra-thread signal handlers. (There is also setjmp()/longjmp(), but I think that's incidental, since whatever guarantees there are there will be true no matter what memory regime volatile identifies.)

Certainly it's possible to do inter-thread or inter-process communication/synchronization using extra-linguistic mechanisms and not using volatile, even under model (a) above. Ideally an implementation would support several different choices of which volatile model it follows (eg, selected by a compiler flag), and developers could choose the model appropriate to the needs of the program being developed. Before that can happen, however, we have to have a language to talk about what the different choices mean. My intention and hope in this thread have been to start to develop that language, so that different choices can be identified, discussed, compared, and ideally selected -- easily.

Reply to
Tim Rentsch

[...]

First of all I need to read your entire detailed response carefully in order to give a complete response. However, I can answer the question above:

I choose `a'

I do not agree with MSVC's choice to automatically insert memory barriers on volatile accesses, because it can create unnecessary overhead. What happens if I don't need to use any membars at all, but still need to use volatile? Well, the damn MSVC compiler will insert the membars right under my nose. Also, what if I need a membar, but something not as strict as store-release and load-acquire? Again, the MSVC compiler will force the more expensive membars down my neck. Or, what if I need the membar, but in a different place than where the compiler automatically inserts them? I am screwed and have to code custom synchronization primitives in assembly language, turn link-time optimizations off, and use external function declarations so they are accessible to a C program.

So, I want volatile to only inhibit certain compiler optimizations. I do not want volatile to automatically stick in any membars!

;^o

Reply to
Chris M. Thomasson

Good, this makes clear (or at least mostly clear) what you want. I also think I understand why you want it; not that that's important necessarily, but to some degree the why clarifies the what in this case.

At the same time, I think many other developers would prefer other choices, including most of b-f above. It would be good to support other choices also, perhaps through compiler options or by using #pragma's. There isn't a single "right" choice for what volatile should do -- it depends a lot on what kind of program is being developed and on what assumptions hold for the environments in which the program, or programs, will run. Ideally both the development community and the implementation community will start to realize this (or, realize it more fully). After that happens, there needs to be a common language describing different possible meanings for volatile -- more specifically, language more precise than the kind of informal prose that's been used in the past -- so that developers and implementors can talk about the different choices, and identify which choices are available in which implementations.

Reply to
Tim Rentsch

===

One of the biggest analyses of 'volatile' I have come across :):)

Karthik Balaguru

Reply to
karthikbalaguru

[199 lines deleted]

Why did you feel the need to re-post the whole thing just to add a fairly meaningless one-line comment?

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  
Nokia
Reply to
Keith Thompson

And you _had_ to quote it in its entirety, adding nothing but that trivial remark, for _what_ reason?

Furrfu.

Richard

Reply to
Richard Bos

... snip ...

Agreed. Why do people do these silly things? Who said 'They do it only to annoy' in Alice in Wonderland?

--
 [mail]: Chuck F (cbfalconer at maineline dot net) 
 [page]: 
Reply to
CBFalconer

I believe it was the cook, in her song about sneezing children (from memory, so it might not be word-perfect):

Speak roughly to your little boy,
And beat him when he sneezes,
He only does it to annoy
Because he knows it teases.

Reply to
David Brown

Word-perfect, but failed on punctuation.

's_Adventures_in_Wonderland/Chapter_6

--
Made with Opera's revolutionary e-mail program:
http://www.opera.com/mail/
Reply to
Boudewijn Dijkstra

Great post!!! Thanks to the author.

I am new to the embedded world and I would like to hear from other developers how they handle volatile. Basically, I try to avoid it as much as possible. Anything that is meant to be volatile, I avoid declaring as such. Instead, I simply make sure that the accesses are performed from different translation units using helper functions (i.e., getters and setters), in the hope that many compilers will produce the result I am expecting. Am I safe, though? Probably not. Short of manually inspecting the assembly produced by the compiler, what else can one do? Simply avoid compiling with optimizations?

My question to you is: How do you deal with volatile? Do you use it at all? If not, what do you use instead? Memory barriers?

Thanks

Reply to
figurassa_mano

The key is: know your compiler.

I try to treat volatile as a hint to the compiler, not a command. So yes, sometimes you just need to get down into the assembly to find out what's really going on.

HTH, ed

Reply to
Ed Prochak

So you are trying to fool the compiler? By the ``as if'' rule, compilers are allowed to do the same optimizations however you divide your sources over files. Your approach may work in practice, though, because most linkers don't know how to optimize: they stand in the way of aggressive optimisation. (Linkers aren't necessarily used at all, or are next to invisible, e.g. in Turbo C.)

Compiling without an optimisation option does not necessarily help. AFAIK Turbo C had no knobs to turn w.r.t. optimisation.

You must inspect your compiler's documentation about volatile. Then try your best to use it in that vein where applicable. Most compilers do the same sensible thing, but volatile is by definition somewhat non-portable.

Greetings, Albert

--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
Reply to
Albert van der Horst
