A timer driver for Cortex-M0+... it rarely doesn't work

- P
- pozz
  
  Contact options for registered users
posted
6 years ago

Wed, Apr 26, 2017 10:55 PM

I don't know if it is an issue specific to Atmel SAMC21 Cortex-M0+ devices or not.

I wrote a simple timer driver: a 32-bits hw counter clocked at 875kHz (14MHz/16) that triggers an interrupt on overflow (every 1h 21'). In the interrupt I increment the 32-bits global variable _ticks_high. The 64-bits number composed by _ticks_high (upper 32-bits) and the hw counter (lower 32-bits) is my system tick. This 64-bits software counter, incremented at 875kHz, will never overflow during my life, so it's good :-)

In timer.h I have:

--- Start of timer.h --- #include #include // Atmel specific

#define TIMER_FREQ 875000

typedef uint64_t Timer; extern uint32_t _ticks_high;

#define volatileAccess(v) *((volatile typeof((v)) *) &(v))

static inline uint64_t ticks(void) { uint32_t h1 = volatileAccess(_ticks_high); TC0->COUNT32.CTRLBSET.reg = TC_CTRLBSET_CMD(TC_CTRLBSET_CMD_READSYNC_Val); uint32_t l1 = TC0->COUNT32.COUNT.reg; uint32_t h2 = volatileAccess(_ticks_high); if (h2 != h1) return ((uint64_t)h2 COUNT32.INTFLAG.reg & TC_INTFLAG_OVF) { ++_ticks_high; TC0->COUNT32.INTFLAG.reg = TC_INTFLAG_OVF; } } ...

--- End of timer.c ---

The idea is simple (and stolen from a post appeared on this newsgroup). At first, the 64-bits software counter must be calculated disabling interrupts, because if a timer interrupt triggers during calculation, the overall software counter could be wrong. By reading _ticks_high before and after reading hw counter, we can avoid to disable the interrupts.

Now I have a code that uses timers. It's a simple state-machine that manages communication over a bus.

--- Start of bus.c --- ... int bus_task(void) { switch(bus.state) { case BUS_IDLE: if (TimerExpired(&bus.tmr_answer)) { /* Send new request on the bus */ ... TimerSet(&bus.tmr_answer, timeout_answer); bus.state = BUS_WAITING_ANSWER; } break;

case BUS_WAITING_ANSWER: if (TimerExpired(&bus.tmr_answer)) { /* No reply */ bus.state = BUS_IDLE; TimerSet(&bus.tmr_answer, 0); } else { if (reply_received() == true) { /* Analyze the reply */ bus.state = BUS_IDLE; TimerSet(&bus.tmr_answer, 0); } } break; } return 0; } ...

--- End of bus.c ---

I don't think I need to explain the code in bus.c. The only thing to specify is that bus_task() is called continuously in the main loop.

99% of the time this code works well. Unfortunately I have seen some strange events. Rarely (very rarely, one time in a week) the bus seems frozen for a time. After that it restarts the normal activity magically. There's a thing that relates those strange events to driver of timers: the bus stall time lasts exactly 1h 21', the overflow period of hw counter.

I suspect there's a problem in my low-level driver and sometimes, maybe near the overflow, the code doesn't work as I expect. Maybe the TimerSet() function sometimes sets a wrong value to the uint64_t timer, maybe a tick value that will happen only at the next overflow of the hw counter.

Do you see where is the problem?

- S
- Simon Clubley
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Wed, Apr 26, 2017 11:24 PM

[snip]

Does changing the optimisation level change the problem ?

Does a review of the generated code using objdump match what you would expect ?

Have you tried placing come kind of debug marker in TC0_Handler() (maybe turning on an LED) to see if TC0_Handler() is called without the interrupt flag being set ?

This last one is in case there's some kind of rare timing issue which causes the overflow handler to be called without TC_INTFLAG_OVF being set yet when INTFLAG is examined.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP 
Microsoft: Bringing you 1980s technology to a 21st century world

- J
- John Speth
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 12:17 AM

First you need to make ticks_high volatile. It doesn't make any sense to me why you'd cast it volatile in one execution context but not the other. If it's shared between two execution contexts, the compiler needs to know that.

It sounds like you have a concurrent access problem that you need to protect against. If it was me, I'd disable interrupts instead of your trick.

JJS

- P
- pozz
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 7:32 AM

As you can think, it's very difficult make a test and try, because I have to wait for the rare event that could happen after a week.

Anyway I think yes, disabling optimisation should solve, but it's not a solution.

Do you mean reading the output listing with assembler instructions? I'm not an expert of ARM assembler, anyway I tried to read it and it seems correct.

No, it would be very strange. TC0_Handler() function address is stored only in the vector table, so it is called only when TC0 peripheral requests an interrupt. Anyway, if the interrupt flag is not set, in TC0_Handler() there's a if and the increment of _ticks_high isn't done.

There's an if and the increment of _ticks_high wouldn't be done.

- P
- pozz
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 7:37 AM

Il 27/04/2017 02:17, John Speth ha scritto:

_ticks_high is accessed in TimerSet(), TimerExpired() and the ISR TC0_Handler(). Timerset() and TimerExpired() reads _ticks_high by calling ticks() function. So _ticks_high is accessed only in two points: ticks() and TC0_Handler().

ticks() is called during normal background flow (not interrupt), so the access to _ticks_high must be volatile. TC0_Handler() is the ISR, so I think a volatile access to _ticks_high is not necessary (the ISR can't be interrupted).

I liked this trick because avoids disabling interrupts. TimerExpired() is often called.

I found the post (from Wouter van Ooijen) where this trick is suggested:

formatting link

Don Y suggested other improvements in the same thread.

- D
- David Brown
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 8:01 AM

Identifiers that begin with an underscore are reserved for file scope - you are not supposed to use them for "extern" variables. I would be extremely surprised to find that a compiler treated them in any special

That looks familiar :-)

(I don't know what the CTRLBSET stuff is for - I am not familiar with Atmel's chips here.)

If the low part of the counter rolls over and the interrupt has not run (maybe interrupts are disabled, or you are already in a higher priority interrupt, or interrupts take a few cycles to work through the system) then this will be wrong - l1 will have rolled over, but h2 will not show an updated value yet.

So if the high parts don't match, you need to re-read the low part and re-check the high parts. It is perhaps easiest to express in a loop:

static inline uint64_t ticks(void) { uint32_t h1 = volatileAccess(_ticks_high); while (true) { uint32_t l1 = TC0->COUNT32.COUNT.reg; uint32_t h2 = volatileAccess(_ticks_high); if (h1 == h2) return ((uint64_t) h2 COUNT32.COUNT.reg; uint32_t h2 = volatileAccess(_ticks_high); if (h1 != h2) l1 = TC0->COUNT32.COUNT.reg; return ((uint64_t) h2

When you want a boolean result, return "bool", not "int". (This is not your problem, of course - it's just good habit.)

An alternative method here is to just use the low 32-bit part (with delays limited to 2^31 ticks), but be sure that you have considered wraparound. Keep everything in unsigned at first, to make overflows work as modulo arithmetic:

static uint32_t ticks32(void) { return TC0->COUNT32.COUNT.reg; }

static inline void TimerSet32(Timer32 *tmr, uint32_t delay) { *tmr = ticks32() + delay * TIMER_FREQ / 1000; }

static inline bool TimerExpired32(Timer32 *tmr) { int32_t d = ticks32() - *tmr; return (d >= 0); }

Note that doing the subtraction as unsigned, then converting to signed int, gives you the wraparound behaviour you need. Converting an unsigned int to a signed int when the result is out of range is

define it as modulo arithmetic:

"Good artists copy, great artists steal", as someone famous once said.

It sounds like you are missing a tick interrupt - maybe you are somehow blocking the overflow interrupt on occasion. Maybe the CTRLBSET line is the problem (since I don't know what it does!) But it could be the problem I noted above in ticks().

Diagnosing problems like this are a real pain - and demonstrating that you have fixed them is even worse. The key is to find some way to speed up the hardware timer so that you can provoke the problem regularly - then you can be sure when you have fixed it. Are you able to make this timer overflow at a lower bit count - say, 12 bits rather than 32 bits? If not, then try this for your hardware interrupt function:

void TC0_Handler(void) { if (TC0->COUNT32.INTFLAG.reg & TC_INTFLAG_OVF) { ++_ticks_high; TC0->COUNT32.INTFLAG.reg = TC_INTFLAG_OVF; TC0->COUNT32.COUNT.reg = 0xfffff000; } }

That will force overflows and _ticks_high counts to run much faster.

- D
- Dimiter_Popoff
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 8:02 AM

Hoping that the compiler will know how to fix this is not a very efficient approach. It is just a few lines of assembly, no matter how inconvenient the ARM assembly may be it will still save you weeks (months) of blind trial and error. You just need to _know_ how this is done, typing blindly this or that is unlikely to ever work.

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI

formatting link

------------------------------------------------------

formatting link

- D
- David Brown
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 8:13 AM

"volatile" does not mean that a variable is shared between two execution contexts - it is neither necessary nor sufficient to make such sharing work.

To get shared accesses right, you need to be sure of what is accessed at what times - it is the /accesses/ to the variable that need to be volatile. Marking a variable "volatile" is simply a shortcut for saying that /all/ accesses to it must be volatile accesses.

Yes, the volatile accesses here are fine. Omitting the volatile for _ticks_high does not give any benefits or disadvantages, because you are only doing a single read and write of the variable anyway - there is very little room for the compiler to optimise. The compiler can move the non-volatile accesses to _ticks_high to after the clearing of the overflow flag, maybe saving a cycle or two if it is a Cortex M7 that can benefit from more sophisticated instruction scheduling. But otherwise, it does not matter one way or the other.

- J
- Jack
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 8:15 AM

change the timer clock so it overflows faster than 1.21h...

Bye Jack

- D
- David Brown
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 8:27 AM

Looking over these again, I can see that they will have the same potential problem. They are fine for when the high part of the counter is also in hardware - but /not/ if it is dependent on interrupts which could be delayed. If you have a situation like this:

_ticks_high is 1 timer reg is 0xffff'fff0 interrupts are disabled timer reg rolls over, and is now 0x0000'0003 the ticks() function will return 0x0000'0001'0000'0003 instead of

0x0000'0002'0000'0003

You will need to check the hardware overflow flag to make this work.

static inline uint64_t ticks(void) { uint32_t h1 = volatileAccess(_ticks_high); uint32_t l1 = TC0->COUNT32.COUNT.reg; uint32_t h2 = volatileAccess(_ticks_high); if (h1 != h2) { // We just had an interrupt, so we know there is // no pending overflow l1 = TC0->COUNT32.COUNT.reg; // Or l1 = 0 for slightly faster code return ((uint64_t) h2 COUNT32.INTFLAG.reg & TC_INTFLAG_OVF) { // There has been an overflow, and it was not handled l1 = TC0->COUNT32.COUNT.reg; return ((uint64_t) (h2 + 1)

- D
- David Brown
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 9:06 AM

Why would you think it is easier to write this in assembly than C? The issue is to be sure that the /algorithm/ is entirely safe regardless of any ordering of interrupts, interrupt disables, timer overflows, etc. Exactly the same issues are relevant in assembly.

Another poster suggested fiddling with optimisation levels, which is equally odd here. There are enough "volatile" accesses here to ensure that the order of accesses in the C code will be kept when the code is compiled to assembly - writing it in assembly or changing optimisation levels will not affect that. At most, it will change the timing - making the problem appear more or less often. The problem is to ensure that the ordering of actions specified by the programmer and the algorithm is correct - not that it is being compiled as expected.

- P
- pozz
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 9:38 AM

Il 27/04/2017 10:01, David Brown ha scritto:

;-)

Atmel says you *need* to write CTRLB register before reading COUNT register from TC0 peripheral. I don't know why exactly, but if you don't, you can't read the correct COUNT value.

Yes, you are right and I thought about this possibility... but IMHO it's impossibile.

ticks() are called only by TimerSet() and TimerExpired() and those two functions are called only in normal background (not interrupt) code. This means ticks() always runs in a lower priority than TC0 ISR. Moreover, I never disable interrupts in any part of the code.

I don't understand what do you mean with "[...] or interrupts take a few cycles to work through the system". Is it possible to read a rolled-over hw counter value (0x00000003) and a not incremented _ticks_high?

uint32_t l1 = TC0->COUNT32.COUNT.reg; uint32_t h2 = volatileAccess(_ticks_high);

COUNT register is read before _ticks_high second read. If COUNT has rolled over and the rolled value is in l1, the interrupt was fired for sure... IMHO. So h2 should contain the incremented value.

This doesn't solve the problem you described.

_ticks_high = 1

90 > /* delay is in ms */

Yes, I know this method and I used it many times in the past. However it has two drawbacks:

- the maximum delay is 2^31 (in my case, around 40 minutes, but I have delays of 1h)

- many times you need an additional "timer running/active" flag

The second point is more important. If you want to switch on a LED after

10 minutes when a button is pressed:

bool tmr_led_active; Timer32 tmr_led; void button_pressed_callback(void) { TimerSet32(&tmr_led, 10 * 60 * 1000); tmr_led_active = true; } void main_loop(void) { ... if (tmr_led_active && TimerExpired32(&tmr_led)) { switch_on_led(); tmr_led_active = false; } }

If you don't check the timer flag, the led will switch on at random times.

Yes, I will try your ideas.

- P
- pozz
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 9:53 AM

Il 27/04/2017 10:27, David Brown ha scritto: >> [...]

Yes, but this situation it's impossible in my case. Who could disable interrupts here?

Atmel SAM TCx peripherals are 16-bits counters/timers, but they can be chained in a couple to have a 32-bits counter/timer. I already coupled TC0 with TC1 to have a 32-bits hw counter. I can't chain TC0/TC1 with TC2/TC3 to have a hardware 64-bits counter/timer.

Yes, those problems are fascinating :-)

- D
- David Brown
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 10:54 AM

You are sure you are not calling ticks() from within another interrupt function?

There is always a delay between an overflow in the timer, and the actual interrupt function being started. Depending on details of the chip, it is possible there will be several cpu cycles of the current instruction stream executed before the interrupt function is run. If I remember rightly, the M3/M4 has a 5 stage pipeline. When the interrupt is registered by the core, it effectively means that a "jump to interrupt vector" instruction is squeezed into the instruction stream - but these current 5 instructions must be completed before the interrupt call takes effect. You will also have a cycle or two delay in the NVIC, and perhaps a cycle or two delay getting the signal out of the timer block (especially if the timer does not run at full core speed). Cortex M interrupts are handled quickly - but not immediately.

Yes - hence my follow-up post.

In such cases, you rarely need the same level of accuracy. Use your accurate ticks32() counter to let you track a seconds counter in the main loop of your code - this can be read freely without worrying about synchronisation.

Another method is to make the timer overflow interrupt occur more often

- say, every millisecond. This increments a single millisecond counter (either 32-bit, or 64-bit, or split lo/hi 32-bit - whatever suits). Then reading that is easy, because the change to this counter is always done in the same context as an atomic operation - there is no complication of independent updates of the low and high halves. And most of the time, it is enough to just use the low 32-bit part.

So you have to add a flag - big deal. You are not using any more memory, your code is smaller, and it is simpler to be sure that everything is correct.

Good luck - it is not an easy task. Don't forget to let us know the source of the problem, and the solution!

- D
- Dimiter_Popoff
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 11:20 AM

Perhaps not "easier", just taking a few minutes rather than a few months. Doing an lwarx/stwcx. like thing on a timer in a high level language is simply a waste of someone's time - but then someone might find it easier to waste time than to write the < 10 lines, keep on wondering why todays compiler output does not work while yesterdays did, what is actually happening etc. The obvious thing to do here is:

-lock the 64 bit entity (test a flag and set it, if found set try again),

-read the 64 bit counter lwarx/stwcx.-ish,

-unlock the timer.

- ensure this is done more frequently than the lower 32 bits overflow.

And you want to write this in C (or whatever)?! Well, it is your time.

Well good, if your C compiler behaves in such a predictable way go ahead. And don't come crying a year later when you discover it did so indeed - almost :-).

The mere fact that I see so _many_ and _lengthy_ posts lately all related to wrestling the compiler to do what one wants it to do, this taking often _months_ of peoples time instead of a few minutes speaks for itself.

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI

formatting link

------------------------------------------------------

formatting link

- P
- pozz
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 11:51 AM

Il 27/04/2017 12:54, David Brown ha scritto:

Yes, sure. I am very careful during writing interrupts code, even if it wasn't sufficient in this case.

Really? So, the core is able to read the register of a peripheral (TC0->COUNT32.COUNT register, in my case) with a *new* (i.e., rolled) value, but the ISR hasn't run yet? In other words, a bus access can be done (TC0 is on the bus) while interrupt request is pending?

If this is the case, it is a mess :-(

Yes, of course. My point here is that you have to **remember** that the timers you are using can roll-over at any time in the future, so they can change from "not expired" to "expired".

After using TimerSet32() and after the timer expires, you could expect it stays "expired", until you arm it again with TimerSet32(). This is not true, because TimerExpired32() could returns "false" at a certain time in the future.

This is not a problem with timers that are repetitive (armed again as soon as they expire), but with one-shot timers.

I configured TC0 in 16-bis mode, so now TC0_Handler() is fired every

75ms. I changed accordingly ticks() to create a 64-bits by shifting _ticks_high for 16-bits.

if (h2 != h1) return ((uint64_t)h2 CTRLB is a **Write-Syncronized** register, i.e. the value you are writing will be really wrote after sync time. Maybe the sync loop I added waits for time needed to the CTRLB command to be executed... otherwise the COUNT value you read immediately after could be wrong (IMHO!)

After this... I think my code is always affected by the original problem if, as you explained, TC0 interrupt is delayed when I'm reading hw counter in ticks(). I don't have the expertise to understand this possibility happens really, so it's better to find another method.

I liked the idea to use a 64-bits counter for ticks that will never roll-over during the entire lifetime of the device.

- S
- Simon Clubley
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 12:03 PM

Not really. I suggested changing the optimisation level because if it works reliably with lower optimisation levels then that strongly implies an issue within the code itself instead of in the hardware.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP 
Microsoft: Bringing you 1980s technology to a 21st century world

- D
- David Brown
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 1:39 PM

I cannot say for sure that this can happen. But unless I can say for sure that it /cannot/ happen, I prefer to assume the worst is possible.

True. But perhaps that can be baked into your "Set" and "Expired" functions, or otherwise made "unforgettable". The aim is to simplify the code that /may/ have rare race conditions into something that cannot possibly have such problems - even if it means other code is bigger or less efficient.

If you can make your hardware timer function run every millisecond (or whatever accuracy you need), then use this:

extern volatile uint64_t tickCounter_;

static inline uint64_t ticks(void) { uint64_t a = tickCounter_; while (true) { uint64_t b = tickCounter_; if (a == b) return a; a = b; } }

void TC0_Handler(void) { if (TC0->COUNT32.INTFLAG.reg & TC_INTFLAG_OVF) { tickCounter++; TC0->COUNT32.INTFLAG.reg = TC_INTFLAG_OVF; } }

If ticks never accesses the timer hardware register, there cannot be a problem with synchronisation. There is no need to use the timer peripherals complicated synchronisation and locking mechanism, nor any concern about interrupt delays. Re-reading the 64-bit value until you have two identical reads is sufficient to ensure that you have a consistent value even if a timer interrupt occurs in the middle of the

64-bit read.

- D
- David Brown
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 2:16 PM

You don't think you are exaggerating just a /little/ ?

The problem has nothing to do with making an atomic 64 bit read of data

- that is /easy/ in C. The problem is the locking and synchronisation system for reading this timer peripheral in this particular microcontroller, which is inconvenient no matter what language is used. The other considerations are synchronisation between updates of the data in the interrupt function, the calling function, and the hardware timer - again, language independent.

A lwarx/stwcx can be useful for doing read/write/modify operations on an object - it is not necessary for atomically reading data which could be changed underway. The simplest way to handle that, in C /or/ in assembly, is to read it more than once:

volatile uint64_t count; uint64_t readCount(void) { uint64_t a = count; while (true) { uint64_t b = count; if (a == b) return a; a = b; } }

Alternatively, using C11, you handle it like this:

_Atomic uint64_t a_count; uint64_t readCount(void) { return a_count; }

I appreciate that you are very familiar with assembly and may write it faster than C, but the above function did not take me a month to figure out.

Using lwarx/stwcx (or ldrex/strex on ARM) will not be shorter or faster, and it will be more complex. Basically, you need to use these to create a lock for the access to the non-atomic type. They are always harder to use than something like C11's atomic access support, regardless of the language. And if you want to program in C, and don't have C11 atomics (or other help from your compiler), you can easily write inline assembly wrappers for those two instructions and write everything else in C.

My C compiler(s) behave in a predictable manner - because I know how C works. People who don't know all the fine details can get it wrong, but that applies to any language.

I follow most threads in this newsgroup, but I can't figure out what you are referring to here. There are occasional posts where people make mistakes due to misunderstandings about how C works and how their compiler works - but they are not /that/ common. And by what possible justification could you claim that writing the code in assembly would result in fewer bugs or shorter development times? You might reasonably say that the programmer would not make those particular mistakes - but they could easily make many, many more errors in their assembly code that they would not have made with C.

- D
- David Brown
  
  Contact options for registered users
Vote on answer
posted
6 years ago

Thu, Apr 27, 2017 2:18 PM

As in, "it is worth giving it a try to see if it affects behaviour, and therefore gives a clue to help debugging" ? Yes, I can agree with that. Once the OP has made changes that speed up the error rate to something workable, that kind of thing can be worth trying.