gnu compiler optimizes out "asm" statements

Tim Wescott · 2015-05-28T18:27:31+00:00

This is related to my question about interrupts in an STM32F303 processor. It turns out that the problem is in the compiler (or I'm going insane, which is never outside the realm of possibility when I'm working on embedded software). I'm coding in C++, and I'm using a clever dodge for protecting chunks of code from getting interrupted. Basically, I have a class that protects a block of code from being interrupted. The constructor saves the interrupt state then disables interrupts, and the destructor restores interrupts. This has been reliable for me for years, but now the destructor is not being called. I suspect that the optimizer can't make sense of it because of the asm statements, and is throwing it away. If someone knows the proper gnu-magic to tell the optimizer not to do that, I'd appreciate it. I'm going to look in my documentation, but I want to make sure I use the right method, and don't just stumble onto something that works for now but should be depreciated, or is fragile, or whatever. Here's the "protect a block" class: typedef class CProtect { public: CProtect(void) { int primask_copy; asm("mrs %[primask_copy], primask " // save interrupt status "cpsid i " // disable interrupts : [primask_copy] "=r" (primask_copy)); _primask = primask_copy; } ~CProtect() { int primask_copy = _primask; // Restore interrupts to their previous value asm("msr primask, %[primask_copy]" : : [primask_copy] "r" (primask_copy)); } private: volatile int _primask; } CProtect; and here's how it's used: { CProtect protect; // critical code goes here } -- Tim Wescott Wescott Design Services

D

David Brown 11 years ago

In addition, the code would be bigger, slower, open to out-of-memory or memory fragmentation issues, pull in a large chunk of library (if this is the only "new"), and it would still be wrong.

Vote

D

David Brown 11 years ago

It was a good question - I hope my other reply post is useful to others in this group too.

It can be legitimate to use optimize attributes to change the way code is generated (such as using "-Os" for most of the program, and "-O3" for a function where you really want loop unrolling). But changing optimisation levels to hide bugs in the code is rarely a good idea! Remember, optimisation levels are hints to the compiler - they are not commands, and they do not change the logic of your code.

The way you write that implies that the compiler was making a mistake - the compiler was generating legitimate code from the source given.

And don't forget that you almost certainly need the "memory" clobber, unless your protected code contains volatile accesses which force the ordering.

Vote

S

Simon Clubley 11 years ago

Sorry Tim, I was just trying to make a good natured comment. Sorry if it didn't come across that way.

Indeed. The memory clobber is a lesson I learnt long ago as you can see from the example line of code I posted. I should have remembered to mention that one as well.

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Vote

T

Tim Wescott 11 years ago

25

@ link

primask 29

31

strd r0,

msr primask, r3 36

0010 7047 bx

.cantunwind 44

Wow. Thank you for that. I try to understand the whys and the wherefores of "asm", but sometimes I fall short -- maybe you need to write compilers to get it all.

(OTOH, if you need a control loop closed -- call me).

That code has been around for a long time, but I dimly recall that part of the reason I used the "primask_copy" stuff was because if the object should be built on the stack rather than using a register, gcc or the assembler would bitch because the mrs instruction only works with registers. If "primask_" ends up in a memory location then the compiler or assembler would just halt with a nastygram to the effect that you're not doing your job right.

My assumption was that using "primask_copy" would keep the thing from bombing in the event that the object was created on the stack, but any side effects from using it (except for the volatile part, which was my fault) would get optimized away.

So wouldn't it be better to continue to use "primask_copy", but take away its volatility and that of "primask_"?

Tim Wescott Wescott Design Services http://www.wescottdesign.com

Vote

T

Tim Wescott 11 years ago

This is what I have at the moment.

class CProtect { public:

CProtect(void) { int primask_copy; asm volatile ("mrs %[primask_copy], primask\n\t" // save interrupt status "cpsid i\n\t" // disable interrupts : [primask_copy] "=r" (primask_copy) : : "memory"); _primask = primask_copy; }

~CProtect() { int primask_copy = _primask; // Restore interrupts to their previous value asm volatile ("msr primask, %[primask_copy]" : : [primask_copy] "r" (primask_copy) : "memory"); }

private: int _primask; };

It appears to generate sensible assembly code, and it works when I test it.

I haven't tried to force a build of a "CProtect" object on the stack (I'm not sure exactly how, for that matter, other than it would involve lots and lots of local variables).

Tim Wescott Wescott Design Services http://www.wescottdesign.com

Vote

D

DJ Delorie 11 years ago

I think you have your answer on this, but speaking as a gcc maintainer...

Here you have told gcc that you'd doing "something" that will produce a value, which gcc needs to deal with. Since gcc doesn't know what the value will be, and must store it in a volatile member, it has no choice but to run the "something".

In this case, you've told gcc that "something" needs the value of primask_copy, but doesn't do anything visible with it - you have specified no outputs or clobbers, so gcc concludes (based on what you told it!) that the instruction has no side effects and thus gcc is free to omit it if it chooses.

If you need gcc to keep your asm around, you need to give it a reason. Either:

Add an output constraint that gcc must deal with, like the first asm call. (not appropriate in this case)
Indicate that the asm has side effects via a clobber that describes those side effects. Clobbering "all of memory" is one way, but also tells gcc to forget every memory optimization it's made at that point, which is (duh) sub-optimal.
Tell gcc to keep the asm despite the lack of visible side effects, which is what "volatile" is for:

asm volatile ("something" . . .)

The "volatile" tells gcc that the asm does something "unspecified and unoptimizable" - which means not only will gcc keep the asm around, but it will also ensure that other volatile operations specified before the asm happen before the asm, and operations after happen after, i.e. preserving the order of volatile operations.

Note: If you use an old-style asm, which has no i/o parameters:

asm ("something")

gcc assumes the asm is doing something behind the compiler's back (because, in this case, there's no way to tell it otherwise), and leaves it alone. This is not the preferred way of doing things, since you and gcc have no idea if you're going to step on each other's toes.

Vote

T

Tim Wescott 11 years ago

Thanks DJ.

David's method is designed to force the two asm blocks to be before and after the "code in the middle", without asking the developer using CProtect (who is presumably naive, 'cuz it might be me!) to make all of the stuff in the middle volatile or anything else.

I think this is more or less in keeping with the intent of my CProtect class, because it turns off all interrupts, which should be done for an absolute minimal amount of time. (While I have occasional help from other developers, I do review code, and I'm persnickety about not egregiously turning off interrupts - you do it for a good reason, and you do it for as short an amount of time as you can.)

I don't know if there is a better way to meet this intent, but I'm open to suggestion, or even, occasionally, wisdom.

Tim Wescott Wescott Design Services http://www.wescottdesign.com

Vote

D

David Brown 11 years ago

That is exactly the point of using a memory clobber (and volatile). As DJ says, it is often overkill - you don't want to use a full memory clobber in the middle of a critical loop, for example. But here it is better to be extra safe than extra fast! And in practical situations for the CProtect class, I think it will seldom be significantly sub-optimal.

I know there are ways to use a more refined clobber list - I /think/ you can just list the variables that you know need to be ordered with respect to the assembly. But I am not experienced enough at them to give advice here, and using a less general clobber would mean removing the "memory" clobber from the CProtect class and requiring class users to have their own guards (asm volatile ("" ::: ); ). That would defeat the ease-of-use and safety of the class.

Vote

D

David Brown 11 years ago

You don't need to write compilers to understand this stuff (I would like to be involved in gcc development, but I simply haven't got the time. Maybe when the kids are all grown up). I just find this stuff interesting.

I know - I bought one of your books. Unfortunately, I lost it before I'd read much of it! But most of my programs only need fairly simple control systems.

The "r" in the assembly statement tells gcc that it needs to go in a register - there are other letters used for memory operands, or more general register-or-memory operands (not so useful for the ARM, but nice for m68k or other cpus). So gcc already knows it needs to move primask_ into a register if it is not there from before. Thus there is no need for you to make an extra copy.

Of course, the optimiser will remove the extra primask_copy if you have included it, so you still get optimal generated code. But I think it is neater without it in the source.

Vote

J

John Speth 11 years ago

It's clear that asm volatile() solves the problem but the reason why will not be obvious to a casual reader. I prefer to explicitly disable optimization using a pragma bracketing the code so that the asm statement (severely non-portable, by the way) will not be optimized away.

JJS

Vote

T

Tim Wescott 11 years ago

You perceive exactly how I want to use the class. Indeed, I'm willing to give up a bit of optimal behavior in return for generality and safety.

Straying off topic from this thread, but I find that things like this CProtect class are (a) invaluable, and (b) way dangerous in the hands of ninnies. I've seen them used by lazy developers around huge swaths of code "because it makes everything work" when what was really needed was a couple of wrappers around non-atomic operations on variables that were accessed in different threads of execution.

Of course, the "everything" that was "fixed" was the part that particular developer was rewarded for making work, not the other time-critical parts of the code that were broken in the process.

Tim Wescott Wescott Design Services http://www.wescottdesign.com

Vote

G

Grant Edwards 11 years ago

Sorry, that's just plain wrong.

The compiler is free to optimize as much as it wants in any manner it wants at _any_time_ _regardless_of_options_.

Sure, specifying -O0 might work today using version X.Y.Z of the compiler for this particular target on this particular host.

You are not justified in depending on the assumption it won't fail on for a different target, with a different compiler version (either older or newer), on a different host, or even on a different day. [OK the last one is slightly hyperbolc.]

Just write correct code.

Grant Edwards grant.b.edwards Yow! I always have fun at because I'm out of my gmail.com mind!!!

Vote

T

Tim Wescott 11 years ago

And besides, anything with "asm" in it is automatically weird chit and for the advanced programmer only; the less the casual reader is inclined to become a casual writer, the better.

I will admit that in this particular code base the intent of the code is rather under-commented, because at its original writing it was purely for me. I need to change that...

Tim Wescott Wescott Design Services http://www.wescottdesign.com

Vote

D

Dave Nadler 11 years ago

Your reply was great - thanks for the extremely clear and detailed response to a question for which its really hard to find a concise answer. Thanks! Best Regards, Dave

Vote

D

DJ Delorie 11 years ago

Case in point, gcc 5 *will* optimize some even at -O0.

Vote

L

Les Cargill 11 years ago

There are no side effects in the destructor.

Notes: - extern is not extern "C". You can dern tootin' make 'em extern "C" but I don't see the point. - SFAIK, the "extern...inline" combination works. I have no way to test it. "extern...inline' isn't critical; it's just more or less an apology for making things slightly more complex. - If you are not using separate compilation it gets even simpler.

I would:

#ifndef _CPROTECT_H_ #define _CPROTECT_H_ // CProtect.h

extern void inline interruptsOn(void); extern void inline interruptsOff(void);

typedef class CProtect { private:

public:

CProtect(void) { interruptsOff(); }

~CProtect() { interruptsOn(); }

} CProtect; #endif

// CProtect.cpp

#include

static int _primask;

void inline interruptsOff(void) { int primask_copy; asm("mrs %[primask_copy], primask\n\t" // save interrupt status "cpsid i\n\t" // disable interrupts : [primask_copy] "=r" (primask_copy)); _primask = primask_copy; }

void inline interruptsOn() { int primask_copy = _primask; // Restore interrupts to their previous value asm("msr primask, %[primask_copy]" : : [primask_copy] "r" (primask_copy)); }

Les Cargill

Vote

D

David Brown 11 years ago

It would be nice if there were some standard way to handle these things, to save people making them themselves (and either having to learn more than they wanted to know about compilers, or risk getting it subtly wrong). But there is no standard method in C or C++, because these languages have very little support for multiple threads of execution (either from an RTOS, or interrupt handlers). These also means solutions won't be cross-target, and are usually not even directly portable to different compilers on the same target.

There are several alternatives, however. First off - if you are using an RTOS (FreeRTOS, MQX, whatever), the OS will provide macros, functions, types, etc., for atomic access and for safely transferring data between threads. This should always be your first port of call when you have an RTOS.

Target libraries sometimes have support for interrupt control. avrlibc, which is usually used with avr-gcc, has macros for making uninterruptable blocks:

The msp430 gcc port has an elegant method - it implements a "critical" function attribute, which causes interrupt state to be saved and disabled at the start of the function, and restored at the end. It works fine with inline functions too. I would like to see that one implemented on other gcc targets (if you have the spare time, DJ ?).

There is the type "sig_atomic_t" in , that has existed since the dawn of C. You can use that to make guards around access to bigger data objects, with judicious use of volatile for ordering - the resulting code is a bit awkward, but it can be made portable and it does not need to disable interrupts. It is theoretically subject to livelock, however.

C11 and C++11 both provide some atomic and multi-threading support in the language and standard libraries. Unfortunately, these are not well supported, significantly (and subtly) different from each other, overly complicated, under-specified, optionally implemented, poor matches for existing OS's, and very ugly.

Vote

D

David Brown 11 years ago

Write /correct/ code - code that depends on optimisation settings for correctness, is wrong code.

If the reason for needing "volatile" in the asm statements is not clear to casual readers, then either comment it, or make sure casual readers don't fiddle with things they don't understand.

Vote

D

David Brown 11 years ago

This looks like you believe making an external function call enforces a memory order. This is simply incorrect - if the compiler knows the effects the function may have, it can use that information to re-order the code. In particular, it knows something about the data the function could access (it cannot legally access function-local or file static data if the data does not "escape" by passing pointers to it out to unknown code), and it may have full access to the source code through inlining or link-time optimisation.

So while the use of external functions might appear to work, and /will/ work in some contexts, it will work by accident of compiler flags and limited compiler capabilities, rather than by design. The use of "inline" (or "extern inline") here is utterly pointless - if the compiler is able to inline the code, then you lose all hope of it working. And when the code is /not/ inlined, you generate significantly bigger and slower code for something that should boil down to 2 instructions for the constructor and 1 instruction for the destructor.

Vote

D

Dave Nadler 11 years ago

Careful Les, that's a bug if entered in a context where interrupts are already disabled and must remain so...

Vote

gnu compiler optimizes out "asm" statements

Join the Discussion

Didn't find your answer?