During startup I need a short, imprecise delay before configuring clocks, timers and other peripherals (at startup the CPU runs from its internal clock).
What do you suggest?
I think there's a simpler method than configuring a hardware timer.
I need to check the status of an input pin, *after* enabling the internal pull-up. I'd like to introduce a short delay after enabling the pull-up, otherwise there's a risk I will read a transient level (which could be 0 or 1).
First thing after main() starts, configure your pin and run a spin-loop-based delay, then read the pin. There's probably no need for a timer at that stage of startup.
void delay(void)
{
    volatile int i;  // Use volatile so the optimizer will not nullify this code

    for (i = 0; i < YOUR_DELAY; i++)
        ;
}
Exactly! Short, reliable, and if you're going to do the startup in a single thread anyway, no harm done anywhere.
--
Tim Wescott
Control systems, embedded software and circuit design
I'm looking for work! See my website if you're interested
http://www.wescottdesign.com
"Short" can mean lots of different things to different people.
*Must* the delay expire before these configurations are *begun*? I.e., can you factor the time required to perform these configuration actions INTO the delay -- especially given that you don't care about its precision (presumably, the delay is expressed as "at least X" and not "at least X but not exceeding Y" or "exactly X").
I like to have timing services running just after I know I have a working stack. So, I'd simply use those services.
In your case, just spin in a loop immediately prior to reading the pin. The problem with spin loops is knowing how long each iteration of the loop WILL take to execute in light of hardware changes (XTAL freq, introduction of cache, etc.), software evolution (i.e., lifting that code section and reusing it in another design -- or, elsewhere in the same design), etc. So, you want to #define the constants governing the loop iteration in terms of actual hardware characteristics (XTAL_FREQ, CLOCK_MULTIPLIER, CYCLES_PER_ITERATION, DESIRED_DELAY, etc.) to document the desired goal *and* its derivation in a manner that makes it less dependent on the particulars of the design.
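A minimal sketch of deriving the loop count from named hardware constants, as described above. All the numeric values here are illustrative assumptions (an 8 MHz internal clock, no PLL yet, a guessed cost per loop pass), not figures for any particular part:

```c
#include <stdint.h>

#define XTAL_FREQ            8000000UL  /* Hz, internal RC at reset (assumed) */
#define CLOCK_MULTIPLIER     1UL        /* no PLL configured yet at startup */
#define CYCLES_PER_ITERATION 4UL        /* assumed cost of one loop pass; measure it */
#define DESIRED_DELAY_US     100UL      /* "at least" this many microseconds */

#define CPU_FREQ    (XTAL_FREQ * CLOCK_MULTIPLIER)
#define DELAY_LOOPS ((CPU_FREQ / 1000000UL) * DESIRED_DELAY_US / CYCLES_PER_ITERATION)

static void startup_delay(void)
{
    volatile uint32_t i;                /* volatile so the loop survives -O2 */
    for (i = 0; i < DELAY_LOOPS; i++)
        ;
}
```

The point is that the derivation, not just the final magic number, lives in the source, so a later hardware change only requires editing one constant.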
But, "serializing" delays in this manner means that NOTHING happens while the processor is twiddling its thumbs. You'll likely repeat the practice somewhere else in the startup code -- or, rationalize that you can use it for larger "short" delays, etc.
This is akin to lame code like:

    show_startup_banner()
    delay(FOR_USER_TO_VIEW_BANNER)
    do_rest_of_real_work()
A more "performant" approach is to use the expected time of some other activity to serve double-duty as your delay. This allows "work to get done" while the delay is being satisfied. E.g.:

    do_whatever()    // at start of delay interval
    clear_memory()   // eats up time
    etc()
    do_followup()    // after delay has passed

The risk with this approach is that it decouples the "delay" as a prerequisite for the "do_followup()" that relies on the delay. I.e., someone changing the ACTIONS that implement the delay AS A SIDE-EFFECT can alter the delay in ways that aren't obvious to the do_followup() invocation.
Timing services allow you to make the delay and the things that depend on it more explicit:

    start_timer(INITIALIZATION_TIMER, INITIALIZATION_DELAY)
    do_some_stuff()
    wait_timer(INITIALIZATION_TIMER)
    do_followup()    // requires INITIALIZATION_DELAY to have expired
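The timing-service pseudocode above could be sketched in C roughly as follows. The tick source is an assumption: pass in whatever free-running counter your platform provides (a hardware timer read, a SysTick-driven tick count, etc.); a host-only demo source is included so the idea can be exercised without hardware:

```c
#include <stdint.h>

typedef uint32_t (*tick_fn)(void);

static tick_fn  timer_now;          /* platform tick source, set at init */
static uint32_t timer_deadline[4];  /* one slot per named timer */

#define INITIALIZATION_TIMER 0

static void timer_init(tick_fn now) { timer_now = now; }

static void start_timer(int id, uint32_t ticks)
{
    timer_deadline[id] = timer_now() + ticks;
}

static void wait_timer(int id)
{
    /* a signed view of the difference tolerates counter wraparound */
    while ((int32_t)(timer_now() - timer_deadline[id]) < 0)
        ;
}

/* host-only demo tick source: advances on every read */
static uint32_t demo_ticks;
static uint32_t demo_now(void) { return demo_ticks++; }
```

On real hardware, `do_some_stuff()` would run between `start_timer()` and `wait_timer()`, so useful work absorbs the delay instead of a spin loop.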
So, you need to know the worst-case time constant for whatever is dangling on the pin (C) fed by the internal pullup. Is that likely to subtly change during manufacture (e.g., if you are sensing a switch closure on a cable that might get lengthened -- perhaps by the user after the sale?) or with different versions of your design?
Postpone the time at which you *need* to know the pin state to a point where you have more of the system (and its services) available for use. Even if you end up having "done" things that are now determined to have been unnecessary (based on your observation of the pin state), it hasn't COST you anything as you would have otherwise spent that time twiddling your thumbs.
If you opt to use a spin-loop, put it (and other hardware specific things) in a separate module (mach.c/mach.s) so you make the hardware dependencies very obvious to future developers.
Yes, it's a solution. But I haven't ANY idea how the value of YOUR_DELAY relates to the real delay.
Considering I'm talking about a specific processor (Cortex-M0+) and knowing the core clock at startup (in my case 8 MHz), is it possible to calculate the YOUR_DELAY value from the delay in microseconds?
Sure, it should be fairly easy to get enough info on the instruction timing to figure that out. But why not just code it up and measure it? Put an I/O instruction before and after and hang a scope on the I/O? Try a few permutations of options on the tools such as debug settings and optimizations to see how that impacts it.
After all, you did ask for a "not precise" delay. What is your concern?
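The conversion itself is simple arithmetic: at 8 MHz each cycle is 125 ns, so the only unknown is how many cycles one loop pass costs. A hedged sketch, where CYCLES_PER_LOOP is an assumption that must be verified on a scope exactly as suggested above (a volatile-counter loop on a Cortex-M0+ typically costs a handful of cycles per pass, including flash wait states):

```c
#include <stdint.h>

#define CORE_HZ         8000000UL
#define CYCLES_PER_LOOP 6UL        /* assumed; measure it on a scope! */

/* loop count for a delay of (at least) 'us' microseconds */
#define US_TO_COUNT(us) (((us) * (CORE_HZ / 1000000UL)) / CYCLES_PER_LOOP)
```

So with these assumed numbers, a 300 us delay would be `US_TO_COUNT(300)`, i.e. 400 iterations. Since the delay only needs to be "at least X", rounding the cycle cost down (overestimating the count) is the safe direction.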
I would put an "asm NOP;" (in whatever syntax is needed by your compiler) to better calculate the timing.
BUT: check the library files (the include files) that come with the toolchain. Usually you have some utility functions already defined/implemented, such as enabling/disabling interrupts, and also something like us_delay() or ms_delay() (they could be implemented as functions or macros).
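A NOP-padded loop along the lines suggested above might look like this. The `__asm__ volatile` form is the GCC/Clang syntax (other compilers differ, as the post notes), and `volatile` is what keeps the optimizer from deleting the NOP; how many cycles a NOP actually costs in your pipeline is something to verify on a scope:

```c
#include <stdint.h>

/* Burn roughly n NOPs' worth of time. Each pass also pays the loop
 * overhead (decrement, compare, branch), so calibrate, don't compute. */
static void delay_nops(uint32_t n)
{
    while (n--)
        __asm__ volatile ("nop");
}
```
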
1/ If during startup other things surely need setting up, or blocks of memory need initialising, or similar work must be put into a waiting-to-start stage, do those first. Then do some form of delay if needed.
2/ Does your processor have some form of SYS TICK hardware counter? If so, once you know that it is stable, take a reading of SYS TICK; after everything else that can be initialised has been, take another reading of SYS TICK and wait till SYS TICK has reached some threshold value.
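One wrinkle with the SysTick approach on Cortex-M parts: SysTick is a 24-bit DOWN-counter, so "elapsed ticks between two readings" is the start value minus the current value, masked to 24 bits (which also survives one wraparound). A small sketch of just that arithmetic, with the actual register read (`SysTick->VAL` in CMSIS terms) left out so it stays portable:

```c
#include <stdint.h>

#define SYSTICK_MASK 0x00FFFFFFUL   /* SysTick current-value register is 24 bits */

/* ticks elapsed since 'start_val' was sampled from the down-counter */
static uint32_t systick_elapsed(uint32_t start_val, uint32_t now_val)
{
    return (start_val - now_val) & SYSTICK_MASK;
}
```

The caller then spins until `systick_elapsed(start, read_systick())` passes the threshold.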
--
Paul Carpenter | paul@pcserviceselectronics.co.uk
PC Services
Logic Gates
Education
Raspberry Pi Add-ons
Timing Diagram Font
For those web sites you hate
If you want a piece of assembly to show up in your binary, write that piece of assembly (and not some piece of C code that happens to produce that piece of assembly today). Even with volatile, the compiler is free to decide between a register or a variable, possibly different increment and compare instructions, unrolling the loop, etc.
And if you're too lazy to implement parameter passing in assembler to get a variable delay, it's definitely not hard to write a function that burns 10000 cycles. Wrap that in a C loop, and if your compiler generates anything between 1 and 100 cycles for the C code, you'll still have 1% accuracy.
In theory, yes, but there's a huge number of variables having to do not only with the core used, but how the manufacturer of the particular part has laid out their flash.
If it's just one part, time it, then recalculate YOUR_DELAY. If that's not good enough, then use the core's clock.
--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
I'm looking for work -- see my website!
Even that is not enough; GCC will optimize out assembler NOPs if you're not careful. For Cortex-M4, NOPs get eaten in the pipeline; can't remember about M0.
Regardless, if you want a reliable delay:
1) You MUST check the delay on a scope, in RELEASE build with the optimization settings you expect for production.
2) You MUST recheck with the scope after any toolchain update or optimization setting change.
Hope that helps! Best Regards, Dave
PS: Just last week I had to do a short delay with a volatile iterator for loop, and check it on the scope ;-)
How is it legal for the compiler to optimize out the iterator? If it does, your compiler is broken. Don't confuse "optimized out" from gdb with "I removed your code." That for loop will execute because you told the compiler to loop for YOUR_DELAY counts. It must emit that code even if 'i' is not visible in the debugger. Volatile is only needed when the compiler cannot see any possible path for the variable to change within the current scope of execution, e.g. an ISR, another thread, or a hardware-mapped address.
As for a delay in this case, I'd read that GPIO port N times to create the delay. The port read will have a longer execution duration and you may be able to empirically determine how many reads it takes before it stabilizes. Often port reads can be rather CPU-clock independent and depend more on the port peripheral clock, so you may even be able to get reliable timing regardless of the CPU clock.
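A sketch of that "read the port N times" idea. The port address is whatever your device maps the GPIO input register to; here it is passed as a parameter (with a host demo variable standing in for the register) so the shape of the code can be shown without inventing a register map:

```c
#include <stdint.h>

/* Read a memory-mapped input register 'reads' times and return the last
 * value. Each volatile read costs real bus time, which is the delay. */
static uint32_t settle_by_reads(volatile const uint32_t *port, int reads)
{
    uint32_t last = 0;
    while (reads-- > 0)
        last = *port;
    return last;    /* final, hopefully settled, level */
}

/* host-only stand-in for the GPIO input register */
static volatile uint32_t demo_pin = 1;
```

On target, `port` would point at the GPIO input data register, and N would be found empirically as the post suggests.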
Gah! I forgot about optimizers. Yes the -O2 flag will remove the for loop. It's not correct C though to do that. The port read is immune to optimizer magic.
It is /entirely/ correct for the compiler to remove a delay loop like this if there is no "volatile" involved. For C, there are certain "observable behaviours" in a program. These are:
Accesses to volatile objects (reads or writes). Ordering is important, but timing is not.
Program start and exit.
Input and output of "interactive devices" through functions like printf(), fread() and fwrite().
The data written to files, at program termination.
Any function calls where the functions might do one of the above four things.
The compiler has to generate code that produces the same results with respect to these "observable behaviours" as the "C abstract machine" does.
And that is /it/. The compiler is free to do anything it wants, as long as these rules are followed.
For embedded systems, this usually boils down to just "program start" and "volatile accesses". Even when you use "printf", the key point is that this will eventually use volatile accesses to send data to UART hardware. And function calls are considered "observable behaviour" if the compiler does not know for sure that they involve no volatile accesses.
A loop that simply counts up does not access anything volatile - the compiler can freely remove it. The compiler can shuffle things around any way it wants, as long as it does not break the ordering of volatile accesses. It can remove or simplify any code it wants, as long as the results are that the same data is written out to volatile objects and the same reads are made of volatile objects.
It can do this regardless of any optimisation settings. Using -O0 or no optimisation flags does /not/ turn off optimisations - it merely tells the compiler not to work too hard on optimising. Turning on -O2 can never make correct code (in the C sense) into incorrect code - it just asks the compiler to work harder to generate more efficient code that does exactly the same thing as before.
And note that C has no concept of time - there is no way in C to express a delay, or to suggest that some code has to be faster or slower than other code. There is only ordering of volatiles and the expectation that a decent compiler won't insert extra slow code.
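The two cases described above, side by side. The first loop touches nothing volatile, so under the observable-behaviour rules the compiler may delete it entirely; the second performs one observable access per iteration, all of which must be emitted in order. Both functions return the same value either way, which is exactly why the first is removable:

```c
#include <stdint.h>

static uint32_t removable_loop(uint32_t n)
{
    uint32_t i;
    for (i = 0; i < n; i++)   /* no observable behaviour: may vanish */
        ;
    return i;                 /* compiler may just return n directly */
}

static uint32_t kept_loop(uint32_t n)
{
    volatile uint32_t i;      /* every access to i is observable */
    for (i = 0; i < n; i++)
        ;
    return i;
}
```

Comparing the generated assembly of the two at -O2 makes the point vivid: the first typically compiles to a single move, the second to an actual loop.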
Yeah, in retrospect what was I thinking? I guess this is why I just read or write the port pin multiple times to sample reliably/create a pulse. Or snapshot a free running hw timer and wait for it to exceed some offset from the snapshot time.