gnu compiler optimizes out "asm" statements

Tim Wescott · 2015-05-28T18:27:31+00:00

This is related to my question about interrupts in an STM32F303 processor. It turns out that the problem is in the compiler (or I'm going insane, which is never outside the realm of possibility when I'm working on embedded software). I'm coding in C++, and I'm using a clever dodge for protecting chunks of code from getting interrupted. Basically, I have a class that protects a block of code from being interrupted. The constructor saves the interrupt state then disables interrupts, and the destructor restores interrupts. This has been reliable for me for years, but now the destructor is not being called. I suspect that the optimizer can't make sense of it because of the asm statements, and is throwing it away. If someone knows the proper gnu-magic to tell the optimizer not to do that, I'd appreciate it. I'm going to look in my documentation, but I want to make sure I use the right method, and don't just stumble onto something that works for now but should be depreciated, or is fragile, or whatever. Here's the "protect a block" class: typedef class CProtect { public: CProtect(void) { int primask_copy; asm("mrs %[primask_copy], primask " // save interrupt status "cpsid i " // disable interrupts : [primask_copy] "=r" (primask_copy)); _primask = primask_copy; } ~CProtect() { int primask_copy = _primask; // Restore interrupts to their previous value asm("msr primask, %[primask_copy]" : : [primask_copy] "r" (primask_copy)); } private: volatile int _primask; } CProtect; and here's how it's used: { CProtect protect; // critical code goes here } -- Tim Wescott Wescott Design Services

D

DJ Delorie 11 years ago

That's a highly target-specific attribute, so each target maintainer would have to add it separately. I think some other targets have similar features, though, but it's not something standard across all targets.

Vote

T

Tim Wescott 11 years ago

Yes there are. the MSR instruction loads the contents of a register (primask) into the mask register, setting the mask bit to whatever's in the register's bit 0.

Yes, very good. And the first time you hit the block with interrupts OFF, they will be (erroneously) ON when you exit it.

Hence, you want to save the interrupt state and restore it on exit.

Tim Wescott Wescott Design Services http://www.wescottdesign.com

Vote

D

David Brown 11 years ago

I know that it's a target-specific attribute, but in theory it could be made common across a range of targets (at least most typical embedded microcontrollers). If there were common functions such as __builtin_disable_interrupts returning the current interrupt mask, and __builtin_restore_interrupt_mask to restore it, you could make those target-specific while the "critical" attribute could be common to every target that implemented those builtins.

For those of us who work with a variety of gcc microcontroller targets, it can be frustrating to see useful extensions from one target unavailable on other targets. As well as "critical", another that I miss on some targets is "naked".

Vote

T

Tim Wescott 11 years ago

Part of the reason for the "CProtect" class is to allow me to wrap different things in the same syntax. It may not be portable outside of Wescott Design Services, but it means that my internal code can be portable across different processors.

Tim Wescott Wescott Design Services http://www.wescottdesign.com

Vote

L

Les Cargill 11 years ago

It's less that than that _primask is now a side effect. Again, I can't test this in his environment so the idea is to suggest a possible direction less than a destination.

It may well be that interruptsOn() doesn't quite work unless you write back to _primask. Or something.

Others have more directly addressed direct directives to tell the optimizer to hold off. I also understand not wanting to do that.

Of course.

Not so much. It'll work either way.

Life is like that. I would have used 'C' macros myself because those always work, but the use of a class to do this is pretty slick.

Les Cargill

Vote

L

Les Cargill 11 years ago

"Doctor, doctor is hurts when I do that..."

That is inherent in the design of this widget.

Les Cargill

Vote

L

Les Cargill 11 years ago

I do not think the compiler understands side effects in asm code. SFAIK, those are opaque to the optimizer.

If that were not true, it would not optimize the asm stuff out.

interruptsOn() is only called by the destructor. We presume the destructor is only called at the end of a block where a CProtect object has been instantiated.

I believe what I put up there is equivalent to what you'd put up. I just added some scaffolding to tweak some of the framing rules in C++.

If I made some mistake in that way, my bad - I have just used this technique with GNU before.

Les Cargill

Vote

D

David Brown 11 years ago

Please read the posts I have made in this thread, that give complete and correct answers to the problem, along with an explanation for why my recommendation is correct, and as close to optimal as possible given the ease-of-use constraint. You may also like to read other posts in this thread.

If you would prefer a list of issues with your class here:

It relies on having the interruptOn and interruptOff functions in an external module - as noted above, this is not safe.
It relies on a static object to hold the previous interrupt state. This means the method cannot be nested, making it useless for the task in hand.
It is seriously inefficient, meaning interrupts are disabled for much longer than necessary, and making critical code a costly operation rather than merely the 3 cycles or so that are actually needed. In embedded programming, this is important.
Even if the functions are treated as external black boxes, calling them will not enforce ordering on memory references which could not legally be accessed by an external function, thus defeating the purpose of the class.

Then why is it there in the first place?

No, it will not work. Rather than repeating myself here, I direct you to my other posts in this thread (as well as earlier in this thread) where I have explained this. If you find flaws in something I wrote there, please say. If you don't understand the issues, please ask.

If your C macros are similarly incorrect, then you are merely lucky that they have worked. Feel free to post those macros, and I will try to comment on them too.

Do you believe that my class, as posted earlier, has errors? If so, I would like to know - while I am confident that it is correct, I would be glad to hear of any suspected problems or situations where it would not work.

But assuming it /is/ correct, why would you prefer to use a solution that is more limited in its use cases, dependent on the compiler's optimisation flags, and big and slow in the cases when it might work?

Vote

D

David Brown 11 years ago

You are not in a position to be sarcastic here!

No, you are incorrect here. You either misunderstood Dave's comment, or you misunderstood your own code.

Dave was incorrect in his complaint - as long as the class is used without any nesting or recursion, it will preserve the interrupt status on construction, and restore the previous state on destruction.

However, it is misleading (and therefore bad coding) to call the destructor function "interruptsOn", because that is not what the function does - it should be called something like "interruptsRestore". Bad identifier naming is a crime in programming - it lead directly to Dave's misunderstanding here.

Vote

D

Dave Nadler 11 years ago

Yep, I thought he was re-enabling interrupts in all cases...

I've been using the ctor/dtor pattern for interrupt bracketing and similar resource management for decades, but I'll be fooled by poor naming every time!

Thanks for the clarification, Best Regards, Dave

Vote

L

Les Cargill 11 years ago

David - sorry.

I had apparently missed your post in the maelstrom. I was unfamiliar with the "memory clobber" idiom.

Thanks for that, David.

For what it is worth, the reference to extern class routines and variables was sufficient last time I'd faced this. As you say, I'd probably been lucky.

Nope - it was very clear once I'd found it. I would not have posted at all had I seen that.

Because it had worked for me ( when we were using inline asm to do something horrible ) once before. Like you say, it was probably just luck that it worked.

I did not find it particularly big and slow - the "iniline' did what you'd expect. We weren't exactly tight for cycles.

The "static" thing, though, was me not understanding the intention of Tim's original code. I really thought it was just "on or off" and didn't see any nesting needed.

Les Cargill

Vote

D

David Brown 11 years ago

Sorry if I was a bit crass with you in my posts last night - it hadn't occurred to me that you had simply failed to see my posts in the mass in this thread.

Compilers usually generate code in the order of operations in the source code, unless they have particular reason to re-order them. This means that you can very often get away with code that doesn't enforce the order specifically. And if you don't have LTO or other sorts of whole-program optimisation enabled, then external function calls form a barrier to a fair amount of re-ordering. Thus the external function calls here /usually/ work.

However, compilers are getting smarter and smarter - and they are optimising over a wider range with more cross-function and cross-module optimisation. Techniques that worked well enough before, are now failing. Some people think this is a bad thing - that compilers are getting "smart-ass" rather than "smart". Personally, I am a fan of compiler optimisation, but it means that we need to be more accurate in telling the compiler /exactly/ what we want. But it has the advantage of letting us write neater and clearer source code, knowing that the compiler will produce efficient object code in the end.

As well as varying between compilers (or versions of compilers, or selection of optimisation flags), these things also vary between targets. When you are using an ARM Cortex-M3/M4 (as in this thread), the cpu is pretty straightforward and does one thing at a time. Thus the compiler can order generated instructions in the same way as the source code, without any penalty - this makes debugging and single-stepping easier, as well as being easier to understand the generated code. But if you are using a PowerPC, or a MIPS processor, especially the higher-end devices with multiple issue super-scaler cpus, the compiler does a great deal of re-ordering and re-scheduling to get maximum overlap from instructions. With such targets, you need to be /very/ careful about your order enforcements (and may also need additional assembly instructions to flush the cpu's write buffers, or synchronise pipelines in some way).

It's nice to have the confirmation.

Tim didn't mention nesting anywhere (at least, not that I noticed) - I took it as an implicit requirement (in particular, it is one of the reasons you need to restore the previous interrupt state rather than assuming interrupts are always on at the constructor). Once you have thought of the idea of being able to nest the critical sections, it is obviously a useful feature. If nothing else, it helps make the class simpler to use.

Vote

L

Les Cargill 11 years ago

No apology needed - I was ... quite confused :) You were quite professional about it.

I saw your *postS*, but not the critical *post*.

As soon as I realized that there was a construct specifically for warning the toolchain that "here we made memory changes", I realized that that was probably the only way to reliably enforce this - this issue had been deliberated on, and the solution provided wasn't just there as decoration.

Yes. Emphasis "usually".

:)

Indeed.

Always good to find the "bug". :)

Sure.

Absolutely.

Les Cargill

Vote

gnu compiler optimizes out "asm" statements

Join the Discussion

Didn't find your answer?