Short blocking delay in Cortex-M0+

- David Brown
  

Fri, Mar 24, 2017 2:21 PM

That is written in the old "basic" format for inline assembly, which for most purposes has been replaced by the "extended" format. As it says in the manual:

"Using extended asm (see Extended Asm) typically produces smaller, safer, and more efficient code, and in most cases it is a better solution than basic asm."

This would have been better written:

    __attribute__((section(".ramfunc")))
    void portable_delay_cycles(unsigned long n)
    {
        asm volatile (
            "   dmb\n"                  /* barrier once, before the loop */
            "1:\n"
            "   sub %[n], %[n], #1\n"
            "   cmp %[n], #0\n"
            "   bne 1b"
            : [n] "+r" (n) :: "cc");
    }

I would actually prefer such functions to be declared "static inline", and not as a separately compiled function in a different segment, to avoid function call overhead in the delay calculation. But I appreciate the thought of putting it into ram like this.

There is very little point in specifying an optimisation attribute for a function containing nothing but assembly!

When written as basic assembly, the compiler has fewer opportunities to do much with the code. Extended assembly gives the compiler details of exactly what you need, and exactly what you are doing. If the compiler can move the code around (such as for a "static inline" function, or if you have link-time optimisation enabled), then the compiler is able to generate better code. With the basic assembly, the compiler must assume that the assembly code will corrupt the volatile registers r0..r3, lr and the condition codes. With extended assembly, the compiler knows exactly which registers are in use - and it does not have to pick r0 for the loop counter. It also means the compiler knows exactly how "n" is going to be used.
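To make the operand syntax concrete, here is a minimal sketch that should build on any GCC or Clang target, because the assembly template is empty. The function name opaque_value is hypothetical and not from the original code. The only point is the [v] "+r" (v) operand, which tells the compiler that v sits in a register of its own choosing and is both read and written by the statement.

```c
/* Hypothetical helper, for illustration only: passes a value through an
 * empty extended asm statement. No instructions are emitted, but the
 * compiler must assume the register holding v was read and modified.
 * __asm__ is the same keyword as asm; it is also accepted under strict
 * modes such as -std=c99. */
static inline unsigned long opaque_value(unsigned long v)
{
    __asm__ volatile (
        ""                  /* empty template: no instructions */
        : [v] "+r" (v)      /* read-write operand in any general register */
    );                      /* no inputs and no clobbers listed */
    return v;
}
```

Basic asm has no way to express this: it takes no operands, so the register would have to be hard-coded in the template.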

Also, there is no need to put a "data memory barrier" /inside/ the loop! I don't quite see how Atmel count 7 clocks per loop iteration - I see 6 myself. But putting the DMB outside the loop saves two clocks per iteration, thus increasing the resolution.

Finally, they might like to note that:

    void portable_delay_cycles(unsigned long n)
    {
        while (n--) {
            asm volatile ("");
        }
    }

gives a faster loop, as well as actually being portable!

The best code for accurate delays is usually made by having static inline functions, so that the compiler can calculate and compensate for loop entry/exit directly, and use whatever registers make most sense for efficient code.
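As a rough sketch of that idea: with a static inline function and a constant argument, the conversion from cycles to loop iterations can be folded at compile time. The names cycles_to_iterations and delay_cycles, and the constants CYCLES_PER_ITERATION and SETUP_CYCLES, are placeholders invented for this example. Real values have to be measured, or read off the generated code, on the actual target.

```c
/* Assumed, illustrative numbers only: measure them on the real target. */
#define CYCLES_PER_ITERATION 3ul   /* hypothetical cost of one loop pass  */
#define SETUP_CYCLES         2ul   /* hypothetical loop entry/exit cost   */

/* Convert a cycle budget into loop iterations, rounding down and never
 * letting the subtraction wrap below zero. */
static inline unsigned long cycles_to_iterations(unsigned long cycles)
{
    if (cycles <= SETUP_CYCLES)
        return 0;
    return (cycles - SETUP_CYCLES) / CYCLES_PER_ITERATION;
}

/* Busy-wait for approximately `cycles` CPU cycles. */
static inline void delay_cycles(unsigned long cycles)
{
    unsigned long n = cycles_to_iterations(cycles);

    while (n--) {
        __asm__ volatile ("");   /* keeps the loop from being optimised away */
    }
}
```

Called as delay_cycles(32) with optimisation enabled, the compiler can work out the iteration count itself, so no division needs to happen at run time.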

- Tauno Voipio
  

Fri, Mar 24, 2017 7:26 PM

I would delete the cmp instruction and change the sub instruction to a subs instruction.
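A sketch of what that change might look like, applied to the extended-asm version and made static inline. It assumes a Thumb target and the GNU assembler's unified syntax (hence the .syntax unified line), and it uses the "l" constraint to ask for a low register. The name delay_loop and the fallback branch for other targets are additions for illustration, not part of the original code.

```c
/* n must be non-zero: a count of 0 wraps around and loops 2^32 times. */
static inline void delay_loop(unsigned long n)
{
#if defined(__arm__) && defined(__thumb__)
    __asm__ volatile (
        "   .syntax unified\n"      /* assumed: lets Thumb-1 accept "subs" */
        "   dmb\n"                  /* barrier once, outside the loop */
        "1:\n"
        "   subs %[n], %[n], #1\n"  /* decrement and set the Z flag */
        "   bne 1b"                 /* repeat until n reaches zero */
        : [n] "+l" (n)              /* "l": low register r0-r7 in Thumb state */
        :
        : "cc");
#else
    /* Fallback so the sketch also builds on other targets;
     * same contract as above, n must be non-zero. */
    do {
        __asm__ volatile ("");
    } while (--n);
#endif
}
```

The loop body is now two instructions instead of three, because subs already sets the flags that bne tests.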

One more vote for David's C version.

--

-TV

- David Brown
  

Sat, Mar 25, 2017 12:11 AM

Dropping the cmp and using subs is what gcc generates for the C loop above (I said it gave a faster loop, but did not post the generated code).

Yes.

The " asm volatile(""); " trick might be new to some people - it tells the compiler "pretend something important is happening here, even though there is no code". It is cheaper than the traditional idea of making the loop variable volatile to force the compiler to keep the loop, or the alternative of using an assembly "nop" instruction.
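For comparison, the three approaches side by side, as a sketch. The function names are made up for this example, and the "nop" variant assumes the target assembler has an instruction with that mnemonic.

```c
/* 1. Empty asm statement: the loop must be kept, but the counter can
 *    stay in a register and no extra instruction is emitted. */
static inline void delay_empty_asm(unsigned long n)
{
    while (n--) {
        __asm__ volatile ("");
    }
}

/* 2. Volatile counter: every read and write of i has to be performed,
 *    which typically means a load and a store per pass. */
static inline void delay_volatile_counter(unsigned long n)
{
    volatile unsigned long i = n;

    while (i) {
        i = i - 1;
    }
}

/* 3. Explicit nop: one extra instruction per pass. */
static inline void delay_nop(unsigned long n)
{
    while (n--) {
        __asm__ volatile ("nop");
    }
}
```

Comparing the generated assembly for the three (for example with gcc -O2 -S) shows the difference in loop bodies directly.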