Help with interrupt software routine and non-atomic operations

P

pozz 10 years ago

I have an ISR that uses a struct with two members:

struct mystruct { const void *data; size_t size; } *curr, *next, curr_s, next_s;

void ISR(void) { static const unsigned char *d; if (next) { *curr = *next; next = NULL; d = curr->data; } if (curr) { /* Process curr structure, byte by byte */ ... if (d - (const unsigned char *)curr->data == curr->size) { curr = NULL; } } }

Now, to activate a new data structure I wrote the following routine:

void foo(const void *data, size_t size) { if ((volatile struct mystruct *)curr == NULL) { curr_s.data = data; curr_s.size = size; curr = &curr_s; } else { (*) next_s.data = data; next_s.size = size; next = &next_s; } }

If a current data structure is present, it should be stopped and the new one should take place.

The first if guarantees that there isn't a current structure, so curr and curr_s can be changed without any problem.

The second if happens when a current structure really exists. In this case, I can't change curr_s and curr directly, so I use a temporary data structure next_s and an atomic flag next (I'm working on 32-bits architecture, so the assignment is atomic). The next data will take place in the next ISR.

Now suppose current structure isn't present (curr == NULL) and I call foo() three times:

foo(d1, s1); foo(d2, s2); foo(d3, s3);

I would like to have finally (d3,s3). After first foo(), the first if is followed and curr_s data structure and curr flag is filled. After second foo(), the second if is followed and next_s/next are filled.

What happens with the third foo()? What happens if the ISR is fired exactly *after* line marked with (*) above? The ISR will use next_s data structure, but it contains uncoherent data values: data points to d3, but size is s2.

How to avoid this problem?

Just to explain my problem. The ISR is a timer interrupt that generates and audio stream at a sampling frequency on a DAC output. foo() can be named audio_play().

Vote

T

Tim Wescott 10 years ago

Normally you identify those bits of code that need to be atomic but aren't, you turn off interrupts for JUST THAT PART, and then you turn interrupts back on.

You have to accept that your interrupt latency will increase a bit (and you want keep a stick handy, to whack any developer who wraps a protection block around more than the absolute minimum of code).

www.wescottdesign.com

Vote

D

David Brown 10 years ago

Your code above does not guarantee that curr_s is set appropriately before curr is set. The compiler is free to store the address of curr_s in curr before setting curr_s.data and curr_s.size.

To force the desired ordering, you have two choices. You can make sure curr_s /and/ curr are both accessed as volatile (by using casts to volatile, or by simply declaring the variables as volatile). Or you can use a memory barrier to stop the movement of accesses back and forth:

asm volatile("" ::: "memory");

(assuming gcc or a compiler that supports gcc inline assembly).

Unless you are sure you know what you are doing with memory barriers, and with getting volatile casts in exactly the right places, it is usually easier to declare the relevant variables as volatile, at the risk of slightly suboptimal code generation. And that means /all/ the relevant variables - volatile ordering does not affect the ordering of non-volatile accesses.

Again, you need volatiles or a memory barrier to force the setting of next_s before the setting of next.

If the ISR has not fired, then next will be filled with d3, s3.

Trouble.

There are two ways to handle this that I can think of. One method is to disable interrupts (or at least the timer interrupt) around the critical section. As long as you remember to use volatile accesses or memory barriers appropriately, that should be easy to get right (for convenience, your interrupt enable/disable macros should include memory barriers).

Other than that, you basically have a two entry FIFO queue with one reader client (the ISR) and one writer client (foo). This can be implemented as a 3 entry ring buffer with separate head and tail pointers. The writer only ever writes to the tail pointer and the buffer, the reader only ever writes to the head pointer. As each part can only be written by one task, you can keep them consistent. However, a simple implementation will only work if you drop new values when your FIFO is full - it gets messy if you want to be able to overwrite the final entry with new data.

Vote

D

David Brown 10 years ago

Turning off interrupts for a bit is often the simplest way, but just be /very/ careful that you really do it correctly. Something like:

curr_s.data = data; curr_s.size = size; asm volatile("cpsid i"); curr = &curr_s; asm volatile("cpsie i");

is /not/ good enough. The compiler can freely move the writes to curr_s inside the critical zone (leading to unpleasant whacking), or it can move the write to curr outside the critical zone (leading to everything working perfectly until the customer demonstration, when it fails). It can even move the writes to curr_s after the write to curr, and after the critical zone - who knows what will happen then.

Make sure you have memory barriers, such as these (for ARM Cortex-M devices):

static inline void disableGlobalInterrupts(void) { asm volatile("cpsid i" ::: "memory"); }

static inline void enableGlobalInterrupts(void) { asm volatile("cpsie i" ::: "memory"); }

Vote

T

Tim Wescott 10 years ago

Something just went on my "to be fixed" queue. I assume that I've been getting away with it by luck.

At any rate, to provide the full-meal-deal you also save the current state of interrupts, so that you don't turn the turned-off interrupts off, and then turn them on inappropriately.

inline int disable_interrupts(void) { int primask_copy; asm("mrs %[primask_copy], primask\n\t" // save interrupt status "cpsid i\n\t" // disable interrupts : [primask_copy] "=r" (primask_copy) :: "memory");

return primask_copy; }

inline void enable_interrupts(int primask) { int primask_copy = primask;

// Restore interrupts to their previous value asm("msr primask, %[primask_copy]" : : [primask_copy] "r" (primask_copy) : "memory"); }

Then in the code you write:

int interrupt_state = disable_interrupts();

// Put your teeny bit of critical stuff here

enable_interrupts(interrupt_state);

I _hope_ that this works right for all cases -- I'm interested in what David has to say about it.

(It's not guaranteed to compile -- I just added the "memory" stuff; I need to go put it into my code)

Tim Wescott Control systems, embedded software and circuit design I'm looking for work! See my website if you're interested http://www.wescottdesign.com

Vote

P

pozz 10 years ago

You are right, I didn't thought about this situation.

Does the following code fix that problem?

void foo(const void *data, size_t size) { volatile struct mystruct *c = (volatile struct mystruct *)curr; if (c == NULL) { curr_s.data = data; curr_s.size = size; c = &curr_s; } else { /* ... */ } }

Now is the instruction "c = &curr_s;" *guaranteed* to be executed

*after* curr_s assignements?

Yes.

Yes, disabling interrupts is the simplest solution, but I usually try to find a more elegant solution... if any exists.

But I need to keep the last pushed data, not dropping it.

What about TWO next data structures? When a foo() is called and current structure is busy, the free next data structure is filled and the next pointer is then assigned to the new data structure.

struct mystruct { const void *data; size_t size; } *curr, *next, curr_s, next1_s, next2_s;

void foo(const void *data, size_t size) { volatile struct mystruct *c = (volatile struct mystruct *)curr; volatile struct mystruct *n = (volatile struct mystruct *)next; if (c == NULL) { curr_s.data = data; curr_s.size = size; c = &curr_s; } else if ((n == NULL) || (n == &next2_s)) { // Use next1 next1_s.data = data; next1_s.size = size; n = &next1_s; } else { // Use next2 next2_s.data = data; next2_s.size = size; n = &next2_s; } }

If volatile cast guarantees that the order of the three instructions inside if branches isn't changed, I think this solution works in any case.

Vote

D

David Brown 10 years ago

A great many people misunderstand about how memory accesses can be re-ordered in C. They are saved by several things:

If you use "volatile" generously, you will probably get the ordering you want. All volatile accesses (either because the variable is declared "volatile", or you use volatile casts for the access) are ordered with respect to each other. It is only non-volatile accesses that can change order.
Compilers don't usually move reads and writes around unless there is something to gain. If you have a fast superscaler processor with caches, it can happen a lot - in particular, compilers will often generate reads quite early so that the data is there in the register when it is actually needed. But for simpler, in-order processors, there is often no point in reading early or writing late, although the compiler may not bother writing out the data if it knows it will be writing new values to the variable shortly, and it has the free registers available.
Many developers don't really understand compiler settings and optimisations, and simply compile with optimisations disabled (how many times have you heard people say their code runs "correctly" when optimisations are disabled, but not when optimised?). Of course that means bigger and slower code - and in this branch, that can sometimes mean more expensive and power-hungry microcontrollers. And it also means code that may break on newer compilers - compilers are still free to optimise as they want despite your choice of compiler flags. "-O" settings are hints, not demands.
When you get this sort of thing wrong, you probably won't notice. Everything will work unless an interrupt hits at exactly the right point in the code, and the chances of that happening are typically small.

That all looks fine. However, generally you don't need to save interrupt state - because you /know/ whether you are in a critical section or not. If you have written:

foo(void) { // Do lots of stuff disableInterrupts(); // tiny critical section stuff enableInterrupts(); // Do lots more stuff }

then the only reason you would want to make your "disableInterrupts/enableInterrupts" store and restore the previous value is if "foo" is sometimes being called from within an interrupt function or another critical section. And if that is happening, your interrupts will be blocked during "lots of stuff" and "lots more stuff"

- your system is likely broken.

However, sometimes you have functions that you want to call from within interrupts, or other situations where nested critical sections could occur - and in that acse, you want to track and restore the interrupt status.

Vote

D

David Brown 10 years ago

No - volatile accesses are only ordered with respect to other volatile accesses (and other observable or potentially observable events, such as calls to external functions). The stores to curr_s can be moved after the store to c, since the stores to curr_s are not volatile.

(You could also add a memory barrier before "c = &curr_s;".)

Often, simple /is/ elegant!

Remember, it is extremely difficult to spot issues with this sort of thing in testing. You can examine generated assembly code - but perhaps at a later date, with slight changes to other code, compiler options, or compiler versions, the generated assembly no longer matches.

So it is better to be safe than sorry, even if that means marginally sub-optimal code. Typically for this sort of thing, adding extra volatiles is not going to cause inefficiency - you are going to do these reads and writes anyway. And disabling interrupts for a few cycles is rarely a real problem - it can be much better than using many more cycles for a non-locking solution.

Then it gets complicated!

Basically, you need a some indirection in order to be able to atomically swap the old data with the latest data.

With the note from earlier that your curr_s (and now next1_s and next2_s) need to be volatile, or accessed through volatile pointers, I think this is probably okay - but I have not studied it thoroughly.

I would recommend you declare the variables here as volatile - you have nothing to lose in efficiency, and it will make the code clearer.

I also recommend typedef'ing your struct and declaring variables separately - again, it will make the code clearer. But that's a style issue, not a correctness issue, and different people have different styles.

Vote

P

pozz 10 years ago

In that case (adding a memory barrier before "c = &curr_s", is it necessary to declare curr_s and c variables as volatile?

Yes, you're right again.

If I will decide to disable timer interrupt in foo(), is it ok to avoid declaring variables (curr, next, curr_s, ...) with keyword volatile?

I think yes, it should be ok. If interrupt (where the variables are potentially changed) is disabled, the variables aren't volatile.

Vote

T

Tim Wescott 10 years ago

I know. It's a a team-programming thing. Sometimes you sit there thinking "no one but a dumbass would use my code in this way" -- then you realize that: (A) sometimes your manager* hires dumbasses; (B) when a dumbass breaks your code, you'll be the first one blamed, and even when the smoke clears some of that blame will be stuck to you; and finally (C), sometimes, on a bad day, you're the dumbass that breaks your own code.

Well, I believe that David owns his company, but still, mistakes happen.

Tim Wescott Control systems, embedded software and circuit design I'm looking for work! See my website if you're interested http://www.wescottdesign.com

Vote

G

George Neuner 10 years ago

That's only partly correct.

It is true that the compiler *should* never reorder reads or writes across a declared memory barrier, however ...

... memory barriers (aka fences) are actual CPU instructions that prevent reordering reads and/or writes in the hardware. OoO CPUs require general fencing capability, and in-order CPUs require (the moral equivalent of) store fences if they perform write combining.

The asm barrier declaration should generate the corresponding barrier instruction for the target CPU.

George

Vote

D

David Brown 10 years ago

I actually only own a small bit of my company (from its early days 20+ years ago, when the company couldn't afford competitive salaries and thus gave a few key employees some shares instead). But we are a fairly small group of programmers (the company is mainly electronics production), most projects are handled by only one or two programmers, our department manager is smart enough not to hire dumbasses (mistakes happen, of course, but not often), and I have a lot of influence in what goes on in the programming group.

So usually when I have to work with idiotic code, it is because the code came from outside. I might have to pick up the pieces, but I can usually avoid the blame!

Your point is valid, of course - how you write your code and functions can be heavily influenced by who will be using them. And you don't want hidden assumptions in your functions - they should be part of the code where possible, part of the name, and if all else fails, clearly part of the comments.

So I would suggest you either use my functions, whose names make it clear what is going on:

static inline void disableGlobalInterrupts(void) { asm volatile("cpsid i" ::: "memory"); }

static inline void enableGlobalInterrupts(void) { asm volatile("cpsie i" ::: "memory"); }

Or you use your functions, but give them better (IMHO) names:

static int pauseInterrupts(void) __attribute__((warn_unused_result)); static inline int pauseInterrupts(void) { int primask_copy; asm("mrs %[primask_copy], primask\n\t" // save interrupt status "cpsid i\n\t" // disable interrupts : [primask_copy] "=r" (primask_copy) :: "memory");

return primask_copy; }

static inline void restoreIntterupts(int primask) { int primask_copy = primask;

// Restore interrupts to their previous value asm("msr primask, %[primask_copy]" : : [primask_copy] "r" (primask_copy) : "memory"); }

You might also like "enterCriticalSection" and "exitCriticalSection" as names.

And of course, pick camelcase, underscores, etc., according to preference.

(I must remember to use "__attribute((warn_unused_result))" more often in my own code.)

Vote

D

David Brown 10 years ago

If you include appropriate assembly instructions along with the memory clobber in the inline assembly function, then you will get a cpu-level barrier. Certainly when you are using super-scaler OoO cpus with long pipelines, or have caches or write buffers, then you need to be using such instructions. There are some examples in the functions I posted earlier.

Exactly what counts as a "memory barrier" is a question of definitions, as it can apply at various levels. On "big" systems, such as the x86 or PPC processors, you usually include such assembly code in your definition of barriers. (And you use "volatile" much less, because it is purely a compiler-level concept - this is why Linus Torvalds shuns the use of "volatile" except in the definition of things like barrier macros. It is not powerful enough to do the job on typical Linux cpus, and when you do things correctly with the OS primitives, it is not necessary.) On the smaller systems and microcontrollers typical of c.a.e. members, "memory barrier" usually just means the compiler level memory clobber, as that is all that is needed.

Vote

P

pozz 10 years ago

After learning interesting things from David about volatile keyword use, memory barriers and so on, I want to show three solutions to my original problem, with pros and cons.

I can't be defined a guru, so my solutions could be wrong and I will be happy if someone will write if they are right or wrong, giving suggestions. I'm writing those solutions mainly because explaining something to others is the best method to check if you have understood the topic.

Here I will try to better define the original problem.

A simple audio library should be written based around the main function play(). The audio stream to play is in RAM and is simply an array of samples to generate at the output of a DAC peripheral, embedded in the CPU. If the application calls play() when an audio stream is actually playing, the library should stop the actual stream and starts playing the new stream. The samples are generated at the output at the sampling frequency, generated by a timer interrupt.

--- audio.c --- typedef struct { const uint16_t *samples; size_t size; uint16_t *s; bool init; } AudioStream;

static AudioStream curr_s; // current stream in playing static AudioStream next1_s, next2_s; // next streams to play static AudioStream *curr, *next; // used mainly as atomic flags

void Timer_ISR(void) { uint16_t sample; bool dac_refresh = false; if (next) { // a new stream is pending curr_s = *next; curr = &curr_s; curr->init = 0; next = NULL; } if (curr) { // we are playing a stream if (!curr->init) { curr->s = curr->samples; curr->init = 1; } sample = *(curr->s); dac_refresh = true; if (++curr->s - curr->samples == curr->size) { curr = NULL; // last sample processed } } if (dac_refresh) DAC_VAL = sample; }

void play(const uint16_t *samples, size_t size) { volatile AudioStream **const c = (volatile AudioStream **)&curr; volatile AudioStream **const n = (volatile AudioStream **)&next; if (*c == NULL) { volatile AudioStream *const s = (volatile AudioStream *)&curr_s; s->samples = samples; s->size = size; s->init = 0; *c = &curr_s; } else if (*n != &next1_s) { volatile AudioStream *const s = (volatile AudioStream *)&next1_s; s->samples = samples; s->size = size; s->init = 0; *n = &next1_s; } else { volatile AudioStream *const s = (volatile AudioStream *)&next2_s; s->samples = samples; s->size = size; s->init = 0; *n = &next2_s; } }

--- audio.c ---

This is my preferred solution, because it doesn't use memory barriers at all, so it is perfectly portable on different platforms and completely standard.

I use two buffers for the next audio stream to play. They are necessary in situation where three (or more) calls to play() functions happens in main application, without any timer interrupt occurred:

play(audio1, audio1_size); ... play(audio2, audio2_size); ... play(audio3, audio3_size);

With only one next buffer, there is a real possibility that troubles could happen if timer interrupt triggers exactly at a precise point (that is my original question in my original post).

The static variables that are changed in ISR aren't declared volatile. This is to avoid stopping optimizing the ISR (having a small and fast ISR is surely a benefit).

The volatile memory accesses should be done in main/background application, i.e. in play() functions. This is why I don't access directly to curr, curr_s, next, next1_s and next2_s. I declared mirror pointers to volatile objects.

As pointed by David, it's important to use volatile for *all* the variables involved in play() function. Let's take a look at one part of it:

if (*c == NULL) { volatile AudioStream *s = (volatile AudioStream *)&curr_s; s->samples = samples; s->size = size; s->init = 0; *c = &curr_s;

If *c (that is curr) is NULL, we aren't playing any stream and ISR doesn't change any variable. It could appear safe to not use volatile in this first branch in the following *wrong* way:

if (*c == NULL) { curr_s.samples = samples; curr_s.size = size; curr_s.init = 0; *c = &curr_s; ...

Here curr_s isn't volatile so compiler could reorder instructions. If the last statement "*c = &curr_s" is moved at the top (and this is allowed) we will be in trouble if interrupt triggers exactly after.

In order to avoid reordering of instructions, we *must* use *only* volatile accesses. As David explained, compiler can't change the order of volatile memory accesses.

The only drawback of this approach is the programmer must carefully check the critical code (function play), checking if an interrupt trigger *at any point* could generate problems. To avoid problems, you will discover you need two next buffers and an atomic flag that switches from one buffer to the other.

Regarding curr and next flags, they are pointers. So I'm supposing the architecture is capable to atomically read/write pointers. If this isn't the case (for example, AVR architectures), you could change pointers to unsigned char: curr would be current_stream_is_active (0 or 1) and next could be next_stream_pending (0 for no pending stream, 1 for stream pending in next1_s, 2 for stream pending in next2_s).

I hope I haven't written too wrong things.

Vote

P

pozz 10 years ago

Il 30/04/2016 17:39, David Brown ha scritto: > [...]

avr-libc (in atomic.h) defines ATOMIC_BLOCK macro that accepts ATOMIC_FORCEON or ATOMIC_RESTORESTATE. It's gcc, so I think those macros can be reused for Cortex-M (and gcc compiler), of course changing accordingly assembler instructions that enable/disable interrupts.

Vote

P

pozz 10 years ago

I admit that following code instruction by instruction for checking if an interrupt trigger could bring to problems is tricky. The use of a double buffer (next1_s and next2_s) is obscure at first.

Maybe it could be simpler to disable and enable interrupts, without worrying about next pending audio stream.

--- audio.c --- typedef struct { const uint16_t *samples; size_t size; uint16_t *s; } AudioStream;

static AudioStream curr_s; // current stream in playing static AudioStream *curr; // used mainly as atomic flags

void Timer_ISR(void) { uint16_t sample; bool dac_refresh = false; if (curr) { // we are playing a stream if (curr->s == NULL) { curr->s = curr->samples; } sample = *(curr->s); dac_refresh = true; if (++curr->s - curr->samples == curr->size) { curr = NULL; // last sample processed } } if (dac_refresh) DAC_VAL = sample; }

void play(const uint16_t *samples, size_t size) { disableInterrupts(); // start of critical section curr_s.samples = samples; curr_s.size = size; curr_s.s = NULL; curr = &curr_s; enableInterrupts(); // end of critical section }

--- audio.c ---

In this solution, I don't need the init member in AudioStream structure: I use s member to understand if it is initialized or not.

Here I don't need volatile for any variables managed in play(): after entering critical section, all the variables can't be changed asyncronously (interrupts are disabled), so their access can be normally optimized. [It is possible to disable only the Timer interrupt and not all the interrupts.]

The only problem is to avoid optimizer to move instructions from inside the critical section to outside (before disableInterrupts() or after enableInterrupts()). This is one thing I never taked care until David explanation.

One solution to avoid reordering is to embed a memory barrier in disableInterrupts() and enableInterrupts(). The memory barrier guarantees that the memory accesses before and after it (inside critical section in our case) aren't optimized (reordered), even if curr_s and curr aren't volatile (David, is this right?).

If the code is for Atmel AVR architecture, I think it is better to use macros from :

#include ... void play(const uint16_t *samples, size_t size) { ATOMIC_BLOC(ATOMIC_FORCEON) { curr_s.samples = samples; curr_s.size = size; curr_s.s = NULL; curr = &curr_s; } }

The macro ATOMIC_BLOCK automatically disable interrupts at the beginning (using a memory barrier) and enable interrupt at the end (using again a memory barrier).

Incidentally, I noticed a strange thing in and :

--- --- ... #define sei() __asm__ __volatile__ ("sei" ::: "memory") ...

--- ---

--- --- ... static __inline__ void __iSeiParam(const uint8_t *__s) { sei(); __asm__ volatile ("" ::: "memory"); (void)__s; } ...

--- ---

Why adding a memory barrier after sei() in __iSeiParam() if sei() macro alreay has a memory barrier?

Anyway, reading is very instructive, because many tricks of C language and gcc compiler are used there (for example, cleanup attribute of a variable).

If the code is for Cortex-Mx architectures, macros proposed by Tim or David can be used. I don't know if it is necessary to add DSB and ISB hw memory barriers after "cpsid i", to be sure pending interrupts won't really trigger after disabling interrupts instruction.

Vote

D

David Brown 10 years ago

Certainly the principles behind these macros could be adapted and reused. Whether or not you like the format of those macros is a matter of taste, but the idea is that a local variable (typically held in a register, not the stack) gets a copy of the old interrupt status, then interrupts are disabled. A gcc extension to allow a "clean up" function is used to make sure a function (to enable, disable or restore interrupts as appropriate) is called when the local variable goes out of scope. Personally, I think the method is quite nice - it avoids the need to match your "enter critical block" with an "exit critical block".

For those using C++, it is perhaps neater to use a C++ class with a destructor to handle this.

Vote

P

pozz 10 years ago

Maybe another possibility is to switch off completely the optimizer for a certain funcion:

void critical_function(...) __attribute__((optimize("O0"))) { curr_s.data = data; curr_s.size = size; asm ("cpsid i"); curr = &curr_s; asm volatile("cpsie i"); }

In this way, I don't think volatile attribute for inline assembly code is necessary anymore.

However gcc manual says regarding this attribute:

"This attribute should be used for debugging purposes only. It is not suitable in production code."

Why?

Vote

R

Reinhardt Behm 10 years ago

Switching optimize off does not guarantee that the compiler respects it and even then the compiler is still free to rearrange everything. May not now, but the next version.

What we really need for some embedded stuff is a compiler option or pragma to tell the compiler "keep your fingers off my code" and a guaranty that this is respected.

Reinhardt

Vote

T

Tauno Voipio 10 years ago

The GCC tool for that is __asm__ volatile ().

-TV

Vote

Help with interrupt software routine and non-atomic operations

Join the Discussion

Didn't find your answer?