Question About Sequence Points and Interrupt/Thread Safety

I've included a function below along with the generated STM8 assembly language. As it turns out (based on the assembly language), the function is interrupt-safe as intended.

My question is, let's assume I have this:

DI(); if (x) x--; EI();

where DI and EI just expand to the compiler's asm( ) feature to insert the right machine instruction to disable and enable interrupts, ...

Is there any reason that the compiler cannot delay writing "x" back so that I get effectively this:

DI(); cpu_register = x; if(cpu_register) cpu_register--; EI(); x = cpu_register;

???

It isn't clear to me if "volatile" is required on "x" or if there is any possibility of the write of the variable back to memory being delayed.

Thanks for any insight.

Function below.

The Lizard

----------

//--------------------------------------------------------------------------------
//DESCRIPTION
//   Decrements an array of zero or more 8-bit unsigned integers, but not
//   below zero.  This function is intended for software timers, but may have
//   other applications as well.
//
//INPUTS
//   in_arg
//      Pointer to first element to be decremented.  This pointer must
//      be valid if in_nelem > 0.
//
//   in_nelem
//      Number of elements to be decremented.  If this value is 0, in_arg
//      will not be dereferenced and may be NULL or otherwise invalid.
//
//INTERRUPT CONSIDERATIONS
//   This function must be called only with interrupts enabled (it uses simple
//   DI/EI protocol).
//
//   This function may be called from non-ISR software only.
//
//   In the case of software timers, individual software timers may be safely
//   shared with interrupt service, due to the critical section protocol.  So,
//   an ISR may safely set and test software timers.  Note that the behavior
//   of individual software timers is guaranteed by DI/EI, but the relationship
//   between timers is not, as an interrupt may occur while an array or sets
//   of arrays are being decremented.
//
//MNEMONIC
//   "dec"   :  decrement.
//   "u8"    :  unsigned 8-bit.
//   "arr"   :  array.
//   "nbz"   :  not below zero.
//
//UNIT TEST HISTORY
//
//
//--------------------------------------------------------------------------------
void MF_decu8arr_nbz(UINT8 *in_arg, UINT16 in_nelem)
{
   while (in_nelem)
   {
      DI();
      if (*in_arg)
         (*in_arg)--;
      EI();
      in_nelem--;
   }
}

700 ; 289 void MF_decu8arr_nbz(UINT8 *in_arg, UINT16 in_nelem)
700 ; 290 {
701 switch .text
702 00b4 f_MF_decu8arr_nbz:
704 00b4 89 pushw x
705 00000000 OFST: set 0
708 00b5 200d jra L552
709 00b7 L352:
710 ; 293 DI();
713 00b7 9b sim
715 ; 294 if (*in_arg)
717 00b8 1e01 ldw x,(OFST+1,sp)
718 00ba f6 ld a,(x)
719 00bb 2701 jreq L162
720 ; 295 (*in_arg)--;
722 00bd 7a dec (x)
723 00be L162:
724 ; 296 EI();
727 00be 9a rim
729 ; 297 in_nelem--;
731 00bf 1e06 ldw x,(OFST+6,sp)
732 00c1 5a decw x
733 00c2 1f06 ldw (OFST+6,sp),x
734 00c4 L552:
735 ; 291 while (in_nelem)
737 00c4 1e06 ldw x,(OFST+6,sp)
738 00c6 26ef jrne L352
739 ; 299 }
742 00c8 85 popw x
743 00c9 87 retf
Jujitsu Lizard

3.6: "A full expression is an expression that is not part of another expression. Each of the following is a full expression: an initializer; the expression in an expression statement; the controlling expression of a selection statement ( if or switch ); the controlling expression of a while or do statement; each of the three expressions of a for statement; the expression in a return statement. The end of a full expression is a sequence point."

Having said that, the "as if" rule applies. If the implementation "wants" to delay the assignment to x, it is permitted to do so /provided/ that a strictly conforming program can't tell the difference.

--
Richard Heathfield 

"tell the difference" is a bit of an ambiguous phrase.

I think you are saying that in this case the compiler is not free to delay the write of "x" because that would be a logical error -- a conforming ISR could in fact "tell the difference", and it would result in logical errors in the program.

Am I understanding your response correctly?

Thanks, The Lizard

Jujitsu Lizard

None of asm, interrupts or threads are actually part of standard C; they are extensions provided by some implementations, and how they work differs. You have cross-posted to comp.arch.embedded, where there are rather more people who know about this sort of thing than on comp.lang.c, where most of the issues are not really topical. I've set follow-ups to comp.arch.embedded for further discussion.

OK, so we know what these do, and we can make some educated guesses from the standard's point of view. However, it is possible that on some systems different threads could be running on different processor cores and so still be accessing x simultaneously even with interrupts disabled!

No reason at all, since as far as the C standard is concerned the only thing that could occur before the next access of x is a signal, and if it was acted on by a signal handler it would need to be "volatile sig_atomic_t" for the behaviour to be defined.

From the standard's point of view it needs to be at least volatile, and in my opinion it would be best for it to be "volatile sig_atomic_t".
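As a minimal sketch of the one asynchronous case the C standard itself defines, the following shares a "volatile sig_atomic_t" flag with a signal handler. The names tick_flag, on_tick and tick_demo are hypothetical, and raise() merely stands in for a genuinely asynchronous event; it says nothing about how a particular embedded compiler treats ISRs.

```c
#include <signal.h>

/* Shared between main-line code and the handler (hypothetical name). */
static volatile sig_atomic_t tick_flag = 0;

/* Hypothetical handler: it only stores to a volatile sig_atomic_t object,
   which is the access the standard defines for signal handlers. */
static void on_tick(int signum)
{
    (void)signum;
    tick_flag = 1;
}

/* Installs the handler, raises the signal, and returns the flag value
   observed afterwards (or -1 if the handler could not be installed). */
int tick_demo(void)
{
    tick_flag = 0;
    if (signal(SIGINT, on_tick) == SIG_ERR)
        return -1;
    raise(SIGINT);        /* handler runs before raise() returns */
    return tick_flag;     /* volatile: the value is read from the object here */
}
```

Because the flag is volatile, the final read cannot be satisfied from a value cached before the signal was raised.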

Since you are talking about threads you should look at what facilities the threading implementation you are using provides and what guarantees it provides.

--
Flash Gordon

I.e. if we delimit a region of code with some compiler-specific inline assembly magic, are there any requirements that we actually get a properly implemented critical region?

The answer is that the standard C language doesn't have any requirements in this area. The asm feature is an extension of your compiler (and a non-conforming one, if it is actually called asm rather than, say, __asm, since a C program can use the identifier asm).

Any requirements related to asm, such as the interaction between asm blocks and surrounding code, can only be found in your compiler's documentation.

For instance, if you are using GNU C, there are special things you must do in and around your __asm__ constructs to ensure that code is not improperly reordered. The GNU compiler allows inline assembly to be quite tightly integrated into the generated code, and can even allocate registers for you. I.e. in the inline assembly you can refer to virtual register names, rather than concrete ones, and associate them with operands denoted by C syntax. The compiler will find registers for those operands, and generate the loads and stores to dovetail them into the surrounding code.

If you want to use GNU C inline assembly for things like critical regions, where there are interactions with other threads or interrupts that are not obvious to the compiler, you have to inform it that there are ordering and memory issues.
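A minimal sketch of what that can look like, assuming GNU C extended asm: a "memory" clobber tells the compiler that the asm may read or write arbitrary memory, so it should not keep values cached in registers across it. The templates here are deliberately empty so the sketch builds on any GNU C target; a real port would put the target's interrupt-disable and interrupt-enable instructions in the strings. The names soft_timer and dec_timer_nbz are hypothetical.

```c
#include <stdint.h>

/* Assumption: GNU C extended asm syntax. The empty template emits no
   instruction; only the "memory" clobber (a compiler-level barrier) is
   demonstrated. A real port would name the target's instructions here. */
#define DI() __asm__ __volatile__("" ::: "memory")
#define EI() __asm__ __volatile__("" ::: "memory")

uint8_t soft_timer = 3;   /* hypothetical timer shared with an ISR */

/* Decrement the timer, but not below zero, inside the critical region. */
void dec_timer_nbz(void)
{
    DI();                 /* soft_timer must be (re)loaded after this point */
    if (soft_timer)
        soft_timer--;
    EI();                 /* the store to soft_timer must be emitted before this point */
}
```

Note that soft_timer itself is not volatile here; the ordering comes from the clobber on the asm statements, so accesses inside the region can still be optimized normally.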

The volatile keyword is not particularly useful for concurrency issues. It is defined by ISO C and has a couple of uses in the standard language, in relation to signal handlers and setjmp/longjmp.

Whether it's suitable for any other purpose is up to the implementations. So volatile may either be too weak to prevent the reordering that you are worried about, or, on the other extreme, it may be a blunt instrument---i.e. it may defeat all optimization of the object that is declared volatile!

So if you use volatile, you might actually get the code

cpu_register_0 = x

if (cpu_register_0 != 0) {
    cpu_register_1 = x
    cpu_register_1--
    x = cpu_register_1
}

I.e. since your code accesses x twice and stores once, and x is volatile, the generated code may also access twice.

But when you have critical regions of code accessing shared data, you don't want such unoptimized access to data. You only want to stop optimization across the entry and exit to the critical region, not everywhere! You want correct concurrency, but not at the cost of poor code.

This is why some of the popular multithreading interfaces, like POSIX threads and Win32, do not require volatile qualification on shared data. They stipulate that acquiring and releasing a proper synchronization object is enough. e.g. if you call pthread_mutex_lock, then writes performed before the call are settled, and no premature reads have taken place.

Ideally, you want your DI and EI macros to behave the same way. The DI macro should provide the assurance that accesses to objects prior to DI have completed, and accesses which happen after have not yet begun.

Your compiler's documentation must explain how to do this. If it does not, you can try your luck in various ways, like investigating the compiler's actual behavior.

It may be enough to put DI and EI into external functions (functions defined in a different translation unit from everything that calls them).

Since x is a shared variable, the compiler must suspect that an external function call may modify x --- unless it is devilishly clever and can prove otherwise. That is to say, the assignment to x cannot be delayed until after EI(), because EI is an external function which can interact with x. Even if x is a block-scope static variable, EI could conceivably recurse back into this function:

static void local_fun()
{
    extern void DI(void);
    extern void EI(void);

    static int shared_x;

    /* ... */

    DI();
    shared_x++;
    EI();

    /* ... */
}

Without knowing anything about EI and DI, we can't prove that they don't recurse into local_fun somehow, in which case shared_x must have the old value upon the call to DI, and the new value before the call to EI.

If you put the interrupt manipulation into external functions, there is a good likelihood that it will work. Of course, you have to review the generated code, and that would be a last resort, if you cannot coax the behavior out of the inlined versions.

Kaz Kylheku

And of course a program that uses DI() and EI(), assum[...]

In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).

And paragraph 4 may also be relevant here:

When the processing of the abstract machine is interrupted by receipt of a signal, only the values of objects as of the previous sequence point may be relied on. Objects that may be modified between the previous sequence point and the next sequence point need not have received their correct values yet.

But I think that a program that uses DI() and EI() isn't just not-strictly-conforming; I think it's all the way into the realm of undefined behavior (which merely means behavior that isn't defined by the standard). I think your question can be answered only by the documentation for your implementation.

--
Keith Thompson (The_Other_Keith) kst@mib.org  
Nokia

Okay, I'll amplify it a bit. Imagine one strictly conforming program, and two otherwise identical compilers, one of which delays the assignment to x and one of which does not. Compile it with each compiler and run it with identical input. If the two invocations of the program (one under each compiler) produce identical output, then the strictly conforming program failed to tell the difference, so the delay is okay. But if they don't, then the delay is not okay.

It's allowed to delay the x write /provided/ that the same results are produced as in an otherwise identical compiler that does not delay the x write.

--
Richard Heathfield 

My interpretation: since you used asm(), all bets are off from the point of view of the C standard. Among other things, the compiler doesn't know that the EI or DI instructions don't clobber registers that it's using. Even with some assembly languages and machine code, instructions are not necessarily executed in the order they are in the code. I suspect that your processor does not schedule instructions like this.

If you have a good compiler, it might assume that instructions unknown to it introduced with asm() potentially clobber almost everything, so it has to push most of the registers and restore them afterward. This may kill your speed but guarantee ordering.

gcc's version of asm() allows you to indicate what registers are potentially clobbered and which aren't. Done carefully, this lets you minimize unnecessary register-saving.

If you use threads, there's no guarantee that sequence points will work *between* threads. I'd expect that sequence points will work OK between some stuff and other stuff in the same thread.

No. And there's no reason at the end of this that it won't hold on to cpu_register as containing a valid copy of x for use later on even if storing the value back to x comes before EI().

It wouldn't hurt, except perhaps in performance, and it might help in the situation above where it remembers that cpu_register contains a valid copy of x when it might not be - volatile in that situation might eliminate the problem.

You really can't look to the C standard for this. You have to rely on guarantees made by the compiler, which usually aren't much.

Gordon Burditt

Addendum to my previous post ...

I did a little more dinking with the compiler, and I'm convinced now that it can't be fully trusted.

Here is the code (just dinking around):

UINT8 in_nelem22;

void MF_decu8arr_nbz(UINT8 *in_arg, UINT16 in_nelem)
{
   while (in_nelem)
   {
      DI();
      in_nelem22 += 3;
      EI();
      in_nelem22 += 5;
      in_nelem22 += 7;
      in_nelem++;
   }
}

and here is what it got me:

701 ; 291 void MF_decu8arr_nbz(UINT8 *in_arg, UINT16 in_nelem)
701 ; 292 {
702 switch .text
703 00b4 f_MF_decu8arr_nbz:
705 00b4 89 pushw x
706 00000000 OFST: set 0
709 00b5 200f jra L552
710 00b7 L352:
711 ; 295 DI();
714 00b7 9b sim
716 ; 296 in_nelem22 += 3;
718 00b8 c60000 ld a,_in_nelem22
719 00bb ab03 add a,#3
720 ; 297 EI();
723 00bd 9a rim
725 ; 298 in_nelem22 += 5;
727 00be ab0c add a,#12
728 ; 299 in_nelem22 += 7;
730 00c0 c70000 ld _in_nelem22,a
731 ; 300 in_nelem++;
733 00c3 5c incw x
734 00c4 1f06 ldw (OFST+6,sp),x
735 00c6 L552:
736 ; 293 while (in_nelem)
738 00c6 1e06 ldw x,(OFST+6,sp)
739 00c8 26ed jrne L352
740 ; 302 }
743 00ca 85 popw x
744 00cb 87 retf

Note that the EI() caused it to split up the additions into +3 and +12, but that IT DID NOT WRITE THE VARIABLE BACK TO MEMORY UNTIL LATER (it kept the contents in the accumulator).

This is a danger sign.

It means in a complex code sequence with DI() and EI(), it might get me!

Thanks for all the information and advice.

The Lizard

Jujitsu Lizard

What if you make DI and EI into external functions, in a separate compilation unit? That could just be the silver bullet for this compiler; no need to snapshot the assembly language.

The external calls are a cost (and add to the amount of time you spend with interrupts disabled!) but maybe it's a worthwhile tradeoff, if it works.

Also, in spite of what I wrote about volatile in the other article, it may also work under this compiler, and you can minimize the performance-hurting aspects of volatile by using non-volatile temporaries.

I.e. suppose we want this:

DI();
if (condition(x))
    x++;
EI();

If x is volatile int, the semantics is that there are two accesses to x and one store. But suppose you have a local variable temp of the same type as x, but not volatile:

DI();
{
    int temp = x;
    if (condition(temp))
        x = temp + 1;
}
EI();

Now we are back to one access and one store, the minimum required. We are hoping that the compiler can optimize away temp entirely.
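Applied to the software-timer case from the start of the thread, the same idea might look like the sketch below. The function name dec_u8_timers_nbz is hypothetical, and DI/EI are defined as no-op placeholders only so the sketch compiles anywhere; a real build would use the compiler's own interrupt disable/enable mechanism. Whether a given compiler keeps the volatile accesses between the two instructions is still something to confirm in its documentation or generated code.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholders (assumption): stand-ins for the compiler-specific
   interrupt disable/enable macros discussed in this thread. */
#define DI() ((void)0)
#define EI() ((void)0)

/* Decrement each timer in the array, but not below zero.
   Each element is read once and written at most once. */
void dec_u8_timers_nbz(volatile uint8_t *timers, size_t count)
{
    while (count--) {
        DI();
        {
            uint8_t temp = *timers;             /* the only volatile read */
            if (temp)
                *timers = (uint8_t)(temp - 1);  /* the only volatile write */
        }
        EI();
        timers++;                               /* advance to the next timer */
    }
}
```

A call such as dec_u8_timers_nbz(timers, 3) on an array holding 0, 1 and 200 leaves it holding 0, 0 and 199.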

It would be irksome to rewrite all critical region code this way, though.

It's obvious that no memory writes at all take place in the critical region between the ``sim'' and ``rim''.

Kaz Kylheku

[assembly code snipped]

What happens if you declare in_nelem22 as volatile?

--
Keith Thompson (The_Other_Keith) kst@mib.org  
Nokia

Kaz Kylheku writes: [...]

[...]

Quibble 1: a store is an access. C99 3.1 defines "access" as "to read or modify the value of an object", with a note:

Where [...]

What constitutes an access to an object that has volatile-qualified type is implementation-defined.

This doesn't contradict your advice, though; judicious use of the volatile qualifier just might be what the OP needs to get his program working with his particular compiler.

(Kaz, would you mind making your lines a trifle shorter? They often go past 80 columns when quoted with "> ". I usually format paragraphs to 70 or 72 columns.)
--
Keith Thompson (The_Other_Keith) kst@mib.org  
Nokia

You cannot make any assumptions. There is no guarantee that this piece of code behaves correctly. If you need to ensure atomicity, use the explicit critical-section functions provided by the OS, or write your own in assembly.

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

formatting link


Assembly language below. Looks a bit different!

BTW, the ld instructions are in the form

ld destination, source

which might help with clarity.

701 ; 291 void MF_decu8arr_nbz(UINT8 *in_arg, UINT16 in_nelem)
701 ; 292 {
702 switch .text
703 00b4 f_MF_decu8arr_nbz:
705 00b4 89 pushw x
706 00000000 OFST: set 0
709 00b5 201d jra L552
710 00b7 L352:
711 ; 295 DI();
714 00b7 9b sim
716 ; 296 in_nelem22 += 3;
718 00b8 c60000 ld a,_in_nelem22
719 00bb ab03 add a,#3
720 00bd c70000 ld _in_nelem22,a
721 ; 297 EI();
724 00c0 9a rim
726 ; 298 in_nelem22 += 5;
728 00c1 c60000 ld a,_in_nelem22
729 00c4 ab05 add a,#5
730 00c6 c70000 ld _in_nelem22,a
731 ; 299 in_nelem22 += 7;
733 00c9 c60000 ld a,_in_nelem22
734 00cc ab07 add a,#7
735 00ce c70000 ld _in_nelem22,a
736 ; 300 in_nelem++;
738 00d1 5c incw x
739 00d2 1f06 ldw (OFST+6,sp),x
740 00d4 L552:
741 ; 293 while (in_nelem)
743 00d4 1e06 ldw x,(OFST+6,sp)
744 00d6 26df jrne L352
745 ; 302 }
748 00d8 85 popw x
749 00d9 87 retf
Jujitsu Lizard

On Tue, 24 Feb 2009 20:23:56 -0500, "Jujitsu Lizard" wrote in comp.arch.embedded:

As others have pointed out, this is beyond the C standard and off-topic in comp.lang.c, but definitely topical in comp.arch.embedded.

You have no reason whatsoever to expect much of anything without a real function call, unless you define the object as volatile. A macro that expands to inline assembly language that does not include a subroutine call does not show the compiler that some external code is going to access the object.

The other problem is that it can be even worse than you think even if you do define it as volatile.

First, there's the fact that the definition of volatile is just plain broken in the C language and standard. It is no particular consolation that it is even more broken in C++.

The real crime is the fact there's a paper showing a number of compilers that actually have errors in their implementation of volatile that can be detected by a strictly conforming program. See:

formatting link

Hopefully, compilers for real embedded systems processor architectures do better than this.

--
Jack Klein
Home: http://JK-Technology.Com

But there is no such thing as a "conforming ISR" in C, because ISRs are, strictly speaking, outside of the C language's paradigm.

As are, for that matter, asm statements.

So you have one of three choices:

1: Write all your really critical stuff in assembly.
2: Write your critical stuff in C, check the compiler output, and be prepared for bugs if you get a different compiler -- even a new version of the same old.
3: (I hate these folks) Write your critical stuff in whatever, and move on to a new job before your sins catch up to you.
--
http://www.wescottdesign.com
Tim Wescott

The link-time optimizer, assuming the compiler in question has one, can still ruin everything.

Chris M. Thomasson

#3 shows that you are not under 30 years old.

You understand the dark side of the force.

Jujitsu Lizard

That is the single most depressing paper I've read recently.

But thanks for the link : )

The Lizard

Jujitsu Lizard

Jack,

In fact, the paper you linked to was SO depressing, I just had to try to match you ... try this one on for size:

formatting link

What always amazed me about the Therac affair is the initial denial of the existence of a problem.

The Lizard

Jujitsu Lizard
