Yes, the simple delay() function does not call anything. But still, interrupts can preempt it, which is quite likely because a function like this runs for a long time by design (and consumes a significant percentage of the CPU time).
In fact, I've checked it, and an interrupt preempting delay() must re-align the stack by using the "stack aligner". So the simple (no FPU) Cortex-M exception stack frame of 8 registers (32 bytes) becomes a bigger frame of 9 words (36 bytes): the same 8 registers plus one padding word. Please note that the Cortex-M CPU deals with it just fine and the program runs. But an RTOS or some other assembly code dealing with interrupts could break the system by making assumptions about the stack alignment. I thought that compatibility with interrupts was the primary reason why the ARM ABI stipulates 8-byte stack alignment.
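To illustrate the arithmetic, here is a small host-side C model of what I understand the hardware to do on exception entry when the STKALIGN bit is set. It is only a sketch: the type and function names are made up for this post, it ignores the FPU frame, and it models nothing beyond the stack pointer adjustment.

```c
#include <assert.h>
#include <stdint.h>

#define BASIC_FRAME_BYTES 32u   /* r0-r3, r12, lr, pc, xPSR */

/* Hypothetical result type, for illustration only. */
typedef struct {
    uint32_t frame_ptr;   /* SP value after the hardware has stacked the frame */
    uint32_t bytes_used;  /* stack consumed by exception entry */
    int      xpsr_bit9;   /* 1 if a padding word was inserted (flagged in stacked xPSR) */
} exc_entry_t;

/* Model of exception entry: sp is the interrupted code's SP,
 * stkalign mirrors the STKALIGN configuration bit. */
exc_entry_t exception_entry(uint32_t sp, int stkalign)
{
    exc_entry_t e;
    uint32_t pad;

    assert((sp & 3u) == 0u);  /* SP is always at least word aligned */

    /* If SP is only 4-byte aligned, one padding word is added so the
     * frame (and hence the handler's stack) ends up 8-byte aligned. */
    pad = (stkalign && (sp & 4u)) ? 4u : 0u;

    e.frame_ptr  = sp - BASIC_FRAME_BYTES - pad;
    e.bytes_used = BASIC_FRAME_BYTES + pad;
    e.xpsr_bit9  = pad ? 1 : 0;
    return e;
}
```

With an interrupted SP of 0x20001004 this model gives a 36-byte frame at 0x20000FE0; with 0x20001000 it gives the plain 32-byte frame. Code that walks the frame from assembly would have to look at bit 9 of the stacked xPSR to know which case it is in.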
Also, I've just checked ARM/KEIL Compiler 6 (based on LLVM), and that compiler generated 8-byte aligned code for delay():
    SUB SP, SP, #0x8
    ...
    ADD SP, SP, #0x8
    BX LR
Now, I don't have the time to investigate all compilers and various optimization levels. I thought that standards, like the ARM ABI, are supposed to settle things like that. I'm just a bit perplexed and couldn't find much information about that.
The hardware has to be able to cope with interrupts occurring while stacks are not 8-byte aligned. It's possible that it is marginally slower or results in a bigger stack frame, but it has to work.
The key reason for stack alignment is efficiency. It makes a bigger difference when you have caches and big internal buses, and an even bigger difference when this is combined with multiple cores. It's also possible that some vector and SIMD units require higher alignments. For embedded Cortex-M devices, it would not have made much difference (I believe the old EABI required 4 byte alignment), but requiring 8 byte alignment is a very minor cost that makes future compatibility much simpler. Getting it right early on avoids the kind of dog's dinner you see in the x86 world where the 64-bit Windows stack alignment is too small for the needs of SIMD instructions.
A leaf function can be fine with 4 byte stack alignment. A quick test shows gcc aligns on 8 bytes, while clang aligns at 4 bytes for a leaf function.
An extremely useful tool for investigating this kind of thing is the online compiler at
. It does not include many commercial compilers (though it has MSVC), but supports C, C++, and lots of languages on a very wide range of compilers and targets. Here you can see your code compiled for gcc and clang Cortex-M4 :
I'd like to do this, but I use this newsgroup so infrequently that I don't want to buy and install anything special. Is there some online tool you'd recommend?
Yes, thank you. It does indeed seem to be a useful tool for a quick look at the generated assembly.
But regarding the stack alignment requirements, the "ARM Procedure Call Standard for the ARM Architecture" (ARM IHI 0042E) says in Section 5.2.1.1 "Universal stack constraints" that "SP mod 4 = 0, The stack must at all times be aligned at word boundary". In the next Section, 5.2.1.2 "Stack constraints at a public interface", it strengthens the requirement to: "SP mod 8 = 0. The stack must be double-word aligned".
So the question now is: what do they mean by "public interface"?
Thunderbird is free, as are any of a dozen different newsreaders, depending on preferences and OS. Many other email programs also support Usenet. There are several free Usenet servers, at least for non-binary groups like those in comp.*; news.eternal-september.org is a popular one. Your ISP might also provide the service, as it used to be a standard part of any internet access package.
I don't know of any free online interfaces other than google groups, which is barely worth the price (although as always with google, it's good for searching). There are several paid-for services, mostly targeting binary groups (which used to be a popular way to spread pirated software and media, before bittorrent).
Technical groups are all text posts, and most have relatively few posts. Even if you start your newsreader once a month, it will take no more than a few seconds to download all posts in comp.arch.embedded to bring it up to date.
I use it all the time, for looking at code on different targets, comparing different options, checking complicated syntax (such as testing C++ features in the latest standards, newer than the compilers I have online), comparing the output of different compilers, sharing code with others via links, checking if the code I write gives exactly the assembly I want, amongst other things.
I guess that means when calling code, or being called from code, that is independently compiled. When it is within the same compiled code, you don't have to follow the standard ABI at all - you (meaning "the compiler") can make your own rules regarding parameter passing, volatile / non-volatile registers, etc.
Yes, the standard ABI defines what functions are allowed to presume when they are called by "unknown" code. That is what applies at a "public API": being public, anyone can call it.
Since routines are allowed to assume they are entered with a stack pointer aligned to a multiple of 8, the caller needs to ensure that (at least if its own entry at a public API also had the stack pointer properly aligned).
The purpose of this is that some common instructions require their source/destination to be 8-byte aligned, and it is a bit awkward to write a subroutine that has to realign the stack pointer itself because it might be called with a misaligned one (it typically costs a register to hold the old SP). So the ABI requires the stack to be aligned at the point of the call.
If a piece of code doesn't call any outside routines, then this isn't a problem, so the ABI doesn't impose the 8-byte restriction on the stack pointer at those times. This is important, as it isn't uncommon to want to temporarily push a single word onto the stack, and if the stack pointer needed to be kept at an alignment of 8, that operation would use up extra stack memory.
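To make that concrete, here is a minimal host-side sketch in C. The function names are invented for illustration, and it only models arithmetic on a stack pointer value for a full-descending stack; it is not what any particular compiler emits.

```c
#include <assert.h>
#include <stdint.h>

/* Constraint that must hold at all times: SP mod 4 == 0. */
int sp_ok_always(uint32_t sp)
{
    return (sp % 4u) == 0u;
}

/* Stronger constraint at a public interface: SP mod 8 == 0. */
int sp_ok_at_public_interface(uint32_t sp)
{
    return (sp % 8u) == 0u;
}

/* Temporarily push a single word; the stack grows downwards. */
uint32_t push_word(uint32_t sp)
{
    assert(sp_ok_always(sp));
    return sp - 4u;
}

/* Before calling independently compiled code, round SP down to a
 * multiple of 8. This wastes 4 bytes if SP was only word aligned. */
uint32_t align_for_public_call(uint32_t sp)
{
    return sp & ~7u;
}
```

Starting from an 8-byte aligned SP of 0x20002000, push_word() gives 0x20001FFC, which is fine inside a leaf routine but not at a public interface. Only if the code then wants to call an outside routine does it need the extra padding that align_for_public_call() models, moving SP down to 0x20001FF8.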