Beware of GNU-ARM compiler for Cortex-M0/M0+/M1

The popular GNU-ARM toolset has had long-known issues for the Cortex-M0/M0+/M1 (ARMv6-M architecture). Specifically, people have reported very inefficient generated code; see "Cortex M0/M0+/M1/M23 BAD Optimisation in GCC", https://embdev.net/topic/426508 .

But while so far people have reported only inefficient code, I would like to make people aware of *incorrect* code generated by GNU-ARM for Cortex-M0/M0+.

The issue was detected with interrupt disabling and has been documented in a bug report for the QP framework, see

formatting link

84/ . The experiments performed with the latest available GNU-ARM (GNU Tools for ARM Embedded Processors 6-2017-q2-update, 6.3.1 20170620 release) clearly show incorrect code generated at optimization level -O, while the same code compiled at -O2 level seemed to be correct.

Please be careful with GNU-ARM for the ARMv6-M architecture and preferably avoid using it for these CPUs as long as the issue remains unresolved.

Miro Samek state-machine.com

Reply to
StateMachineCOM

Thanks Miro for bringing this to our attention...

Reply to
Dave Nadler

Do you have an example source snippet?

QP feels pretty heavy for Cortex-M0.

--

-TV
Reply to
Tauno Voipio

It is impossible for anyone to determine if this is a bug in the compiler or a bug in the QS macros without giving us the source of the test. Can you give us the source of these macros (or if they are proprietary, a roughly equivalent source that shows the same problems)? I'd like to see it, and try it on a simple case such as the example in the linked page.

void crit_section_test(void) {
    uint32_t i;
    for (i = 0; i < 10; i++) {
        QS_BEGIN(123, 0);
        QS_U32(8, 0);
        QS_END();
    }
}

My guess here is that there is a misunderstanding or error in the embedded assembly in these macros. gcc inline assembly can be a bit fiddly to get exactly right.

Reply to
David Brown

I just wonder if QP attempts to use the exclusive access instruction pairs (LDREX / STREX), which do not exist in M0 and M1.

--

-TV
Reply to
Tauno Voipio

From the link he gave, there is a screendump of the generated assembly; there is no LDREX or STREX there.

My guesses for the problem are missing "volatile" in the asm statements, multiple independent asm statements where there should be a single one, or incorrect dependency information in the asm statements or other code.

gcc does a lot of optimisation and re-arrangement of code, including with inline assembly. It is easy to get it wrong when you depend on the order of the code in a way that the compiler does not know about.

Reply to
David Brown

I think QP/C is an open-source project (even if it isn't free to use for commercial business).

The source code is here:

formatting link

QS_BEGIN and QS_END are defined in include/qs.h, but depend on many other macros.

Reply to
pozz

Yes, I saw the source was there - but I have no interest in the project, and no interest in digging through all the source of that project to try to find the problem. The OP is one of the people behind that project, as far as I can see - he should be able to provide a small self-contained equivalent definition for the macros so that we can get to the bottom of his problem.

My take on this at the moment is that it is most likely to be a flaw in the QP code, not the compiler. I am happy to help, whether it turns out to be a compiler problem or a QP problem.

But the OP has to do some work here, not just give a hit-and-run FUD about the compiler that is far and away the dominant tool for these microcontrollers. "Avoid using gcc for the M0/M0+" is advice to avoid those microcontrollers entirely.

Reply to
David Brown

Thank you everyone for your attention. There is really no need to be hostile. I'm NOT trying to sell you anything. I merely didn't have the time to distill the problem to be completely "context free".

But I was able to distill the problem to a relatively small snippet of code without any external dependencies or macros. I filed this information as an official bug report at GCC-ARM-Embedded, please see:

formatting link

As I experimented with this code, the excessive type casting in the condition for the if statement seems to be implicated (the bug goes away if I remove some of this type casting). The type casting was added in the first place to satisfy static analysis with PC-Lint for MISRA-C compliance.

--MMS

Reply to
StateMachineCOM

No hostility was intended - I just want to make sure that this issue is considered properly, and followed up properly. I have seen too many people drop into a newsgroup like this and make claims about compiler bugs, then disappear (perhaps in embarrassment) when it is their own code that is found faulty. I want to push you to follow the thread here and keep things updated.

Thank you for posting the test code (in the launchpad bug report). I can't see anything wrong with the code you wrote so far. I am a little short on time just now (it's dinner time here :-) ) but I will do some experiments with the code as soon as I get the chance, and get back to you.

Reply to
David Brown

What level is this -O ? All I know are -O0, -O1, -O2, -O3, -Os

Reply to
raimond.dragomir

-O is the same as -O1.

My preference for small embedded targets is -Os.

--

-TV
Reply to
Tauno Voipio

I have just added this comment to the gcc-arm-embedded bug report linked above:

I agree that it is surprising that "asm volatile" statements can be re-arranged with respect to each other, and with respect to other volatile accesses. This seems to be a particular problem with asm statements that return outputs - "volatile" is used primarily to tell the compiler that you can get different outputs at different times, even if the inputs (if any) are the same. For asm statements with no outputs, the compiler appears to assume they have a special function and should not be moved.

As far as I know, there is no way in C or as a gcc extension to specify ordering of statements or executable code - you can only specify ordering of memory (via volatile accesses). Even the traditional method of declaring a function (like "foo" in the sample) externally is in danger - with link-time optimisation, the compiler knows everything and can re-arrange bits of "foo" with respect to the asm statements or volatile accesses.

A related problem is well documented for the AVR gcc port:

formatting link

There is, however, a solution to all this. (I have told the avr-libc folks about it a number of times, but they have not added my solution to their webpage. I have given up trying to persuade them.)

I have three macros defined that I use in circumstances like these:

#define forceDependency(val) \
    asm volatile("" :: "" (val) : )

#define forgetCompilerKnowledge(v) \
    asm ("" : "+g" (v))

/* No trailing semicolon after while (0), so the macro behaves like a
   single statement inside if/else. */
#define forgetCompilerBlock(start, size) \
    do { \
        typedef struct { char x[size]; } XS; \
        XS *p = (XS *) (start); \
        asm ("" : "+m" (*p)); \
    } while (0)

The first one tells the compiler "I am using "val" here, so you have to evaluate it before this point". The second one tells the compiler "I am using "val" here, and I might change it, so you have to evaluate it before this point, but forget anything you know about it". The third one is just another version that can handle data of any size.
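As a rough illustration of how the first two macros might be used, here is a minimal sketch that builds on any GCC-compatible host compiler, not only for ARM. The names FORCE_DEPENDENCY, FORGET_KNOWLEDGE, scale and demo are hypothetical, and the explicit "g" input constraint is an assumption made for portability; it is not the empty constraint used in the macros above.

```c
#include <stdint.h>

/* Hypothetical variants of the macros discussed above. The "g" constraint
   (register, memory or immediate) is an assumption made for portability. */
#define FORCE_DEPENDENCY(val)  __asm__ volatile ("" :: "g" (val) : )
#define FORGET_KNOWLEDGE(v)    __asm__ ("" : "+g" (v))

/* Hypothetical helper standing in for some real computation. */
static uint32_t scale(uint32_t x) {
    return x * 3u + 1u;
}

uint32_t demo(uint32_t in) {
    uint32_t r = scale(in);

    /* The empty asm takes r as an input, so r must have been evaluated
       by the time this statement is reached. */
    FORCE_DEPENDENCY(r);

    /* The empty asm claims to read and modify r, so the compiler can no
       longer assume anything about its value. The value itself is
       unchanged because the asm body is empty. */
    FORGET_KNOWLEDGE(r);

    return r;
}
```

Neither macro emits any instructions. They only change what the optimizer is allowed to assume, so demo(4) still returns 13.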

Putting "forceDependency(status)" after the "mrs" instruction, but before the "foo()" call, ensures that the compiler has evaluated "status" before calling foo. It makes most sense to put it before the "cpsid i" instruction, but that does not appear to be critical. The neatest arrangement is to combine it with the cpsid:

uint32_t status;
asm volatile ("mrs %0, PRIMASK" : "=r" (status) :: );
asm volatile ("cpsid i" :: "" (status) :);

foo();

asm volatile ("msr PRIMASK, %0" :: "r" (status) : );
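The sequence above could be wrapped into a pair of helpers along the following lines. This is only a sketch under stated assumptions: crit_enter, crit_exit, sim_primask and counter_increment are hypothetical names, the "r" constraint and the "memory" clobber are additions that are not in the code above, and the non-ARM branch is a stand-in that simulates PRIMASK in a variable so the pattern can be built and tried on a host PC.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_6M__)
/* ARMv6-M (Cortex-M0/M0+/M1): real PRIMASK access. */
static inline uint32_t crit_enter(void) {
    uint32_t status;
    __asm__ volatile ("mrs %0, PRIMASK" : "=r" (status));
    /* status is passed as an artificial input so that the mrs result is
       consumed by an asm volatile statement that has no outputs. */
    __asm__ volatile ("cpsid i" :: "r" (status) : "memory");
    return status;
}

static inline void crit_exit(uint32_t status) {
    __asm__ volatile ("msr PRIMASK, %0" :: "r" (status) : "memory");
}
#else
/* Host stand-in (assumption): PRIMASK simulated by a variable,
   0 = interrupts enabled, 1 = interrupts disabled. */
static volatile uint32_t sim_primask;

static inline uint32_t crit_enter(void) {
    uint32_t status = sim_primask;
    __asm__ volatile ("" :: "r" (status) : "memory");
    sim_primask = 1u;
    return status;
}

static inline void crit_exit(uint32_t status) {
    __asm__ volatile ("" :: "r" (status) : "memory");
    sim_primask = status;
}
#endif

static uint32_t counter;

/* Example critical section: increment a shared counter. */
uint32_t counter_increment(void) {
    uint32_t status = crit_enter();
    uint32_t value = ++counter;
    crit_exit(status);
    return value;
}
```

Because the previous PRIMASK value is saved and restored instead of interrupts being unconditionally re-enabled, the helpers can be nested.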

Reply to
David Brown

I am tending to use -O2 more than -Os for Cortex-M targets. Often the code they generate is almost identical, but -Os can sometimes sacrifice quite a lot of performance for very minor space savings. It will depend on your target, your code, and your requirements, however.

Reply to
David Brown

Yes! This solution seems to fix the problem! The code compiles correctly at all optimization levels (no explicit optimization, -O, -O1, -O2, -O3, -Os).

Thanks a lot, David.

--MMS

Reply to
StateMachineCOM

You are very welcome. I think this kind of stuff is fun, and it's nice when it can help people.

David

Reply to
David Brown

Or:

uint32_t status;
asm volatile (
    "mrs %0, PRIMASK\n\t"
    "cpsid i"
    : "=r" (status));

Combining the instructions into a single asm statement does it.

--

-TV
Reply to
Tauno Voipio

No, it does not - not in this case. Putting the two instructions in a single asm means that these two instructions will follow each other. However, in this example it does not hinder the compiler from moving the "foo()" call above this asm statement, which is the wrong order.

The ordering enforcement of gcc's asm instructions is not very clear in the documentation. But it appears (from my testing and experience) that "asm volatile" statements with no output are not moved in relation to other statements (at least, volatile memory accesses, unknown external function calls, and other no-output asm volatile statements). "asm volatile" statements with an output are often moved.

So the first important point here is to have an asm volatile statement with no output (such as the "cpsid i" one, or a "forceDependency" macro) before "foo()". The second important point is to force the usage of the output of the "mrs" statement, by using its output as an artificial input to another asm statement.

Does that make it clearer? (This stuff is hard to understand, and hard to explain - especially when I can't claim to have a full understanding myself.)

You can try it out using the online compiler at . If we are lucky, the following link will have the example code:

Reply to
David Brown

Just to keep people updated - this has been confirmed as a gcc bug. "asm volatile" statements are not supposed to be moveable across other "asm volatile" statements.

Details can be seen at:

In addition, a memory clobber has been added to the CMSIS function "__get_PRIMASK" for gcc as a workaround.

Reply to
David Brown

To prevent compiler reordering I use this inline asm macro.

__asm__ __volatile("":::"memory");

I'm pretty sure that tells gcc that it should not reorder above and below that statement. So you might sprinkle those about?
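For illustration, a barrier like this is commonly wrapped in a macro and placed between a data write and the flag that announces it. The sketch below assumes hypothetical names (COMPILER_BARRIER, payload, ready, publish, consume). It only constrains the compiler's handling of memory accesses. As noted earlier in the thread, calculations that do not touch memory may still be moved across it, and it is not a hardware memory barrier.

```c
#include <stdint.h>

/* Empty asm with a "memory" clobber: the compiler must assume that any
   memory may be read or written here, so it does not cache values in
   registers across it or move memory accesses over it. */
#define COMPILER_BARRIER() __asm__ __volatile__ ("" ::: "memory")

static uint32_t payload;          /* ordinary, non-volatile data */
static volatile uint32_t ready;   /* flag polled elsewhere, e.g. by an ISR */

void publish(uint32_t value) {
    payload = value;
    COMPILER_BARRIER();           /* payload is stored before ready is set */
    ready = 1u;
}

/* Returns 1 and writes the payload to *out if data was available. */
int consume(uint32_t *out) {
    if (ready == 0u) {
        return 0;
    }
    COMPILER_BARRIER();           /* payload is read after ready was seen */
    *out = payload;
    ready = 0u;
    return 1;
}
```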

Reply to
mrfirmware
