gnu compiler optimizes out "asm" statements

Tim Wescott 11 years ago

This is related to my question about interrupts in an STM32F303 processor. It turns out that the problem is in the compiler (or I'm going insane, which is never outside the realm of possibility when I'm working on embedded software).

I'm coding in C++, and I'm using a clever dodge for protecting chunks of code from getting interrupted. Basically, I have a class that protects a block of code from being interrupted. The constructor saves the interrupt state then disables interrupts, and the destructor restores interrupts.

This has been reliable for me for years, but now the destructor is not being called. I suspect that the optimizer can't make sense of it because of the asm statements, and is throwing it away.

If someone knows the proper gnu-magic to tell the optimizer not to do that, I'd appreciate it. I'm going to look in my documentation, but I want to make sure I use the right method, and don't just stumble onto something that works for now but is deprecated, or fragile, or whatever.

Here's the "protect a block" class:

typedef class CProtect
{
public:

  CProtect(void)
  {
    int primask_copy;
    asm("mrs %[primask_copy], primask\n\t"   // save interrupt status
        "cpsid i\n\t"                        // disable interrupts
        : [primask_copy] "=r" (primask_copy));
    _primask = primask_copy;
  }

  ~CProtect()
  {
    int primask_copy = _primask;
    // Restore interrupts to their previous value
    asm("msr primask, %[primask_copy]" : : [primask_copy] "r" (primask_copy));
  }

private:
  volatile int _primask;
} CProtect;

and here's how it's used:

{
  CProtect protect;

  // critical code goes here
}

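The point of saving PRIMASK in the constructor, rather than having the destructor unconditionally re-enable interrupts, is that guards can nest safely. The following is a host-runnable sketch of that save/disable/restore bookkeeping only. It assumes a hypothetical global `g_primask` standing in for the Cortex-M PRIMASK register, and the names `ScopedIrqDisable` and `increment_shared` are illustrative, not from the thread; nothing here masks real interrupts.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical stand-in for the Cortex-M PRIMASK register
// (0 = interrupts enabled, 1 = interrupts masked). Simulation only:
// nothing here touches real hardware.
uint32_t g_primask = 0;

class ScopedIrqDisable {
public:
    ScopedIrqDisable() : saved_(g_primask) {  // save the caller's state
        g_primask = 1;                        // models "cpsid i"
    }
    ~ScopedIrqDisable() {
        // Sanity check: the guarded code must not have re-enabled interrupts.
        assert(g_primask == 1);
        g_primask = saved_;                   // models "msr primask, <saved>"
    }
    // A copied guard would restore the saved state twice, so forbid copies.
    ScopedIrqDisable(const ScopedIrqDisable&) = delete;
    ScopedIrqDisable& operator=(const ScopedIrqDisable&) = delete;

private:
    uint32_t saved_;
};

// Example critical section: the guard covers everything up to the closing brace.
void increment_shared(uint32_t& counter) {
    ScopedIrqDisable guard;
    ++counter;
}
```

Called with interrupts enabled, the guard restores the enabled state on exit; called from inside another guard, the inner one saves and restores the masked state, so the outer critical section stays protected until its own closing brace.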
Tim Wescott Wescott Design Services http://www.wescottdesign.com

Lanarcam 11 years ago

Are you sure that the constructor is called? If the variable "protect" is not used, it could be "optimized away".

Simon Clubley 11 years ago

Are you sure the destructor is not being called, or is it just the asm statement in the destructor that is being optimised away?

Try marking the asm statement itself as volatile.

See

formatting link

for some discussion.

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Tim Wescott 11 years ago

Yes -- the constructor is what disables interrupts.

Tim Wescott 11 years ago

Another data point: I'm optimizing at O1. When I build at O0, it works.

Simon Clubley 11 years ago

In that case, try my suggestion of marking the asm statement itself as volatile.

Simon.

Lanarcam 11 years ago

Could you try: Cprotect *protect = new Cprotect();

Tim Wescott 11 years ago

The compiler doesn't allow that.

Tim Wescott 11 years ago

I could, but here at Wescott Design Services we have a fairly hard to overcome rule that says "don't thrash the heap". My boss would kill me, which would hurt me twice because I'm my boss.

Tim Wescott 11 years ago

This works (with the optimize attribute specified for each function, and the level set at O0), but I would like some opinions on whether it is kosher. It works even when the overall optimization level is set to "O3", which is cool.

typedef class CProtect
{
public:

  CProtect(void) __attribute__ ((__optimize__ ("O0")))
  {
    int primask_copy;
    asm("mrs %[primask_copy], primask\n\t"   // save interrupt status
        "cpsid i\n\t"                        // disable interrupts
        : [primask_copy] "=r" (primask_copy));
    _primask = primask_copy;
  }

  ~CProtect() __attribute__ ((__optimize__ ("O0")))
  {
    int primask_copy = _primask;
    // Restore interrupts to their previous value
    asm("msr primask, %[primask_copy]" : : [primask_copy] "r" (primask_copy));
  }

private:
  volatile int _primask;
} CProtect;

Simon Clubley 11 years ago

It works for me in C. What syntax are you using?

Here's one example from a test program:

asm volatile("mrc p15, 0, %0, c10, c0, 1" : "=r" (tlbldn) : : "memory");

Simon.

Tim Wescott 11 years ago

I need to take my brain out and examine it under a microscope to see how large it is, apparently.

I was using "volatile asm". "asm volatile" compiles, and works great, to boot.

So -- more kosher than setting the "optimize" attribute of the whole function to "O0", do you think?

Tim Wescott 11 years ago

This also works (note commented-out optimize attributes, and "asm volatile"):

class CProtect
{
public:

  CProtect(void) // __attribute__ ((__optimize__ ("O0")))
  {
    int primask_copy;
    asm volatile ("mrs %[primask_copy], primask\n\t"   // save interrupt
                  "cpsid i\n\t"                        // disable interrupts
                  : [primask_copy] "=r" (primask_copy));
    _primask = primask_copy;
  }

  ~CProtect() // __attribute__ ((__optimize__ ("O0")))
  {
    int primask_copy = _primask;
    // Restore interrupts to their previous value
    asm volatile ("msr primask, %[primask_copy]" : : [primask_copy] "r" (primask_copy));
  }

private:
  volatile int _primask;
};

Simon Clubley 11 years ago

[Code example snipped.]

Sorry Tim, but my initial reaction, in a good-natured way, is yuck! :-)

The code feels to me like you are trying to trick the compiler instead of solving the core problem and the proposed solution feels "fragile".

Are you sure you can't use "asm volatile" with C++ code?

I don't know if that would solve your problem but if it did, it would feel more "legitimate" to me as volatile is documented to behave in certain ways as you can see from the page I pointed you to.

Simon.

Simon Clubley 11 years ago

Certainly (at least based on previous experience).

It will be interesting to see if others agree or if there's any issues I have not thought of.

Simon.

Tauno Voipio 11 years ago

For embedded code, my favorite is -Os.

-TV

Simon Clubley 11 years ago

Interesting. How does -Os change the behaviour of asm volatile?

Simon.

David Brown 11 years ago

That is almost certainly the issue.

Some compilers consider inline assembly as "volatile" - they view them as something scary, and make sure everything before them is completely finished before executing the secret assembly code, and basically turn off all optimisation around the inline assembly call.

gcc (and clang, and a few other compilers) is not like that - it provides ways for the programmer to tell the compiler exactly what the assembly code affects or depends on, so that it can optimise around it. This is extremely useful for some sorts of inline assembly, and it lets you make good use of processor instructions that cannot easily be expressed in C (such as a bit reverse instruction) with only the bare minimum being written in assembly. It also means you don't have to mess around with things like the "primask_copy" variable in this CProtect class - gcc understands these things, and makes copies in registers as needed.

The flipside is that you have to know the rules, and be very careful to apply them.

A key rule here is "volatile". A normal inline assembly statement is considered non-volatile - the compiler is free to omit it if it is dead code, and can re-order it as it finds convenient. (Inline assembly statements with no outputs, and whose inputs don't involve addresses, are considered "volatile" by default as they would be pointless if they didn't do something unknown to the compiler.) So step one is to make the inline assembly statements "volatile" so the compiler knows it has to execute them, and has to do so in order.

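For contrast, here is a sketch of the kind of inline assembly where non-volatile is exactly right: the bit-reverse case mentioned above. It assumes GCC targeting ARMv7-M or ARMv7E-M (detected via the `__ARM_ARCH_7M__` / `__ARM_ARCH_7EM__` macros), where the `rbit` instruction exists; on any other target a portable loop is compiled instead. The function name `bit_reverse` and the fallback are illustrative additions, not code from the thread.

```cpp
#include <cassert>
#include <stdint.h>

// Reverse the bit order of a 32-bit word.
// Assumption: on ARMv7-M / ARMv7E-M (e.g. Cortex-M3, Cortex-M4) use RBIT via
// extended inline asm; on every other target fall back to plain C++.
static inline uint32_t bit_reverse(uint32_t x) {
#if defined(__GNUC__) && (defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__))
    uint32_t result;
    // Deliberately NOT volatile: the result depends only on the input operand,
    // so GCC may hoist it, merge duplicate calls, or delete it if the result
    // is unused, just as it would for ordinary C code.
    asm("rbit %0, %1" : "=r" (result) : "r" (x));
    return result;
#else
    // Portable fallback: shift bits out of x and into result one at a time.
    uint32_t result = 0;
    for (int i = 0; i < 32; ++i) {
        result = (result << 1) | (x & 1u);
        x >>= 1;
    }
    return result;
#endif
}
```

Because the operands tell GCC precisely what the instruction reads and writes, it can treat `bit_reverse()` like any pure function. The CProtect asm statements are the opposite case: their whole purpose is a side effect the compiler cannot see, so they need "volatile".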
The second key rule is the interaction of "volatile" accesses (either volatile reads and writes, volatile inline assembly, or calls to unknown external code) and normal accesses. C does not specify this ordering in any way. So in code like this:

int a;
volatile int v;

void foo(void) {
    a = 0;
    v = 1;
    a++;
    v = 2;
    a++;
}

the compiler can re-arrange writes to "a" with respect to writes to "v". It can replace all accesses to "a" with a single "a = 2;", and it can put that before, in the middle of, or after the two volatile writes to "v".

The same applies to volatile assembly.

Consider this:

uint64_t big;

void atomic_write(uint64_t x) {
    asm volatile("disableInterrupts");
    big = x;
    asm volatile("enableInterrupts");
}

This will not work, except by luck - the compiler can re-order the write to "big" with respect to the interrupt disable/enable, and therefore destroy your hopes of making an atomic write.

The way to deal with this is either to make the write to "big" volatile, to add artificial volatile dependencies that enforce the order, or to use "clobbers" in the assembly statements. Clobbers can be quite sophisticated when you want to get the maximal performance (by using minimal clobbers), but the easiest and therefore safest method is to clobber "memory":

void atomic_write(uint64_t x) {
    asm volatile("disableInterrupts" ::: "memory");
    big = x;
    asm volatile("enableInterrupts" ::: "memory");
}

The memory clobber tells the compiler that the inline assembly might read or write memory in unexpected ways: all statements before the inline assembly that logically write something to memory must complete those writes before it, and any logical reads from memory after the inline assembly cannot be started until after it. Data from memory cannot be cached in registers across the assembly.

This is often used with an empty inline assembly:

static inline void compilerBarrier(void) { asm volatile("" ::: "memory"); }

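As a sketch of where such a barrier is useful outside interrupt masking, consider handing a buffer to an interrupt handler on the same core: the buffer is ordinary memory, the "ready" flag is volatile, and, as in the foo() example, the compiler may otherwise move the buffer stores past the flag store. The names `tx_buffer`, `tx_ready`, `kTxSize`, and `start_transmit` are hypothetical, and the sketch assumes a single-core part with no cache or DMA involvement, where constraining the compiler is sufficient. compilerBarrier() is repeated from above so the block stands alone.

```cpp
#include <cassert>
#include <stdint.h>

// Empty asm with a "memory" clobber: emits no instructions, but the compiler
// must complete earlier memory writes and may not cache memory values in
// registers across it. It does not order accesses seen by other cores or DMA.
static inline void compilerBarrier(void) { asm volatile("" ::: "memory"); }

// Hypothetical single-core handoff to an interrupt handler (not shown).
constexpr unsigned kTxSize = 8;
uint8_t tx_buffer[kTxSize];       // ordinary, non-volatile data
volatile bool tx_ready = false;   // flag the (hypothetical) ISR polls

void start_transmit(const uint8_t* data, unsigned len) {
    if (len > kTxSize) len = kTxSize;   // clamp to the buffer size
    for (unsigned i = 0; i < len; ++i)
        tx_buffer[i] = data[i];
    // Without the barrier, these non-volatile stores could legally be sunk
    // below the volatile store to tx_ready.
    compilerBarrier();
    tx_ready = true;
}
```

The alternative of making `tx_buffer` volatile would also work, but it forces every access to the buffer to be treated as volatile, whereas the barrier pins down only the one ordering point that matters.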
Once we have cleaned up the other minor issues in your class (the unnecessary "volatile" on the private member, the unnecessary typedef, the use of "int" instead of "uint32_t", and the use of reserved identifiers with leading underscores), we get this:

#include <stdint.h>

class CProtect {
public :
    CProtect(void) {
        asm volatile("mrs %[primask_], primask\n"
                     "cpsid i"
                     : [primask_] "=r" (primask_)
                     :
                     : "memory");
    }

    ~CProtect() {
        asm volatile("msr primask, %[primask_]"
                     :
                     : [primask_] "r" (primask_)
                     : "memory");
    }
private :
    uint32_t primask_;
};

extern uint64_t big;

void atomic_write(uint64_t x) {
    CProtect protect;
    big = x;
}

Compiling with this command line (using the usual optimisation setting -Os):

/opt/Freescale/KDS_2.0.0/toolchain/bin/arm-none-eabi-gcc -c a.cpp -Wall -Wextra -Wa,-ahdsl -Os -mcpu=cortex-m4 -mthumb

gives this assembly:

  21                _Z12atomic_writey:
  22                    .fnstart
  23                .LFB6:
  24                    @ args = 0, pretend = 0, frame = 0
  25                    @ frame_needed = 0, uses_anonymous_args = 0
  26                    @ link register save eliminated.
  27                    @ 9 "a.cpp" 1
  28 0000 EFF31083      mrs r3, primask
  29 0004 72B6          cpsid i
  30                    @ 0 "" 2
  31                    .thumb
  32 0006 034A          ldr r2, .L2
  33 0008 C2E90001      strd r0, [r2]
  34                    @ 15 "a.cpp" 1
  35 000c 83F31088      msr primask, r3
  36                    @ 0 "" 2
  37                    .thumb
  38 0010 7047          bx lr
  39                .L3:
  40 0012 00BF          .align 2
  41                .L2:
  42 0014 00000000      .word big
  43                    .cantunwind
  44                    .fnend

And that, I believe, is both correct and optimal.

Tim Wescott 11 years ago

Well, there's a reason I'm tossing it out to the group for comment!

Me, too. Actually, I had been compiling at -O1, possibly because with the Cortex M3 processor set it worked at that level but not higher.

I can. I just can't use "volatile asm". See my own reply that's parallel with yours.

"asm volatile" certainly seems to fix the issue (which ended up being that the optimizer had an extraneous call to part of the constructor, not a missing call to the destructor, BTW).

David Brown 11 years ago

The general rule is that if you think you need to reduce optimisation to make your code work, your code is wrong. Very occasionally, the compiler is broken - but that should be rare indeed.

"-Os" does most of the "-O2" optimisations, except for an emphasis on smaller size if the speed optimisation in "-O2" would expand the code significantly. (Note that you still get inlining and occasional loop unrolling - but only if the result is smaller code, or if you asked for the inlining explicitly.)

As always with optimisation flags, it keeps correct code correct - but makes it more likely that poor code (such as missing or incorrect volatiles) breaks dramatically.
