Use of volatile in structure definitions to force word accesses

T

Tim Wescott 10 years ago

This question came up in the thread titled "Re: Ada, was: Re: Fundamental C question about "if" statements".

I felt it deserved its own thread, because it's far off topic to what's become a very interesting, but far off topic thread.

I, for > >>> Good luck doing that in a reliable way on ARM with gcc when you are

>> accessing device registers.
>>>
>>> If you access a bitfield in the lower 8 bits of a register, the gcc
>>> optimiser can turn that into 8-bit ldrb/strb opcodes instead of 32-bit
>>> ldr/str opcodes and thereby causing the program to break if you need
>>> to access your registers in units of greater than 8 bits.
>>
>> That's a very interesting comment, because that's basically How It Is
>> Done here, and I've had several years worth of success on both ST and
>> TI/
>> Luminary devices. I'm not sure if that's because of the way I define
>> my bitfields (absolutely everything is defined as a struct of
>> bitfields, then a union of that struct and a 32-bit integer), or if
>> I've been lucky,
>> or what.
>>
>>
> Do your MCUs _require_ register access in units of 16/32 bits ?
>
> If so, you may be doing something which works for now, but may not work
> in the future.
>
> It might be worth having a quick look with objdump just to be sure.
>
> I have personally had this happen to me and it broke my code due to the
> register access size constraints no longer been true.
>
>> So it looks like:
>>
>> struct SPeriphRegisterBits {
>> unsigned int bit0 : 1; unsigned int bits1_10 : 10; unsigned
>> int bit11 : 1; unsigned int : 16; unsigned int
>> bits28_31 : 4;
>> };
>>
>> union UPerihRegister {
>> struct SPeriphRegisterBits bits;
>> unsigned int all;
>> }
>>
>> All of the various register definitions then get collected into a
>> structure for the peripheral, which gets declared as "extern const",
>> and defined in the linker command file.
>>
>>
> This is a bit more involved than what I was doing. I wonder if this is
> what is saving you for now.

Yes, my MCU's require access in units of 16 or 32 bits, at least in places -- at least, if the ST data book is to be believed that's the case.

I would REALLY like one of the compiler guys to weigh in on this, because I'm wondering if collecting all those unions into a struct and calling it volatile isn't forcing a 32-bit access. It would be nice to know if I'm just bamboozling the optimizer into doing what I want, or if I'm actually _telling_ it to do the right thing.

So, the final definitions (that I left out) is

struct SPeripheralRegs { UPeriphRegister1 REG1; UPeriphRegister2 REG2; -- etc -- };

extern volatile SPeripheralRegs THIS_PERIPHERAL;

Then I access things by

THIS_PERIPHERAL.REGISTER.bits.whatever = someth UPeriphRegister A;

A.all = static_castTHIS_PERIPHERAL.REGISTER.all;

A.bits.whatever = something;

static_castTHIS_PERIPHERAL.REGISTER.all = A.all;

I could see how it might do this by the intent of the language, but I could see that I may just be bamboozling the optimizer, rather than intelligently directing it. So -- I'm curious, in a more-than-idly sort of way.

Tim Wescott Wescott Design Services http://www.wescottdesign.com

Vote

S

Simon Clubley 10 years ago

I've duplicated this using a variant of your structs and caused gcc to generate invalid code for accessing registers.

I wasn't able to duplicate the range of cases I thought I could, but I found this in one of my header files I created some years ago:

* Using bitfield structs versus bitmasks to access register bitfields: * My original plan was to use bitfield structs to access the register * bitfields. However, gcc is generating code to access the bitfields * at a byte, and not longword, level. As the MMIO fields are longword * access only, this results in invalid code been generated. * * Marking the enclosing 32-bit struct, instead of the underlying bitfields, * as volatile results in code been correctly generated for most cases. * However, when the field to be modified is a byte in the least significant * 8 bits of the register, then gcc still generates a byte level store * even though it performed a longword level read. There may also be other * cases which I have not yet come across. * * As a result, I have switched to using bitmasks to access bitfields in a * register. *

so it's clear that _where_ you place the volatile attribute is critical to how the generated code looks (and how broken it is). A longword is

32 bits BTW; that's my DEC background showing. :-)

Here's a self contained example:

============================================================================ /* * Compiled with: * * arm-none-eabi-gcc -g -c -O2 -mcpu=arm926ej-s -o tim.o tim.c */

struct SPeriphRegisterBits { // unsigned int bit0 : 1; // unsigned int bits1_10 : 10; unsigned int bit0 : 8; unsigned int bits1_10 : 3; unsigned int bit11 : 1; unsigned int : 16; unsigned int bits28_31 : 4; };

union UPerihRegister { struct SPeriphRegisterBits bits; unsigned int all; };

struct SPeripheralRegs { union UPerihRegister REG1; union UPerihRegister REG2; };

//extern volatile struct SPeripheralRegs THIS_PERIPHERAL;

// //Then I access things by // //THIS_PERIPHERAL.REGISTER.bits.whatever = something;

volatile union UPerihRegister *ur1 = (union UPerihRegister *) 0x80001000;

volatile struct SPeriphRegisterBits *sr1 = (struct SPeriphRegisterBits *) 0x80001000;

void tim_test(void) { unsigned int val;

// val = ur1->bits.bit0; // ur1->bits.bit0 = 1;

val = sr1->bit0;

sr1->bit0 = sr1->bit0 + 3;

THIS_PERIPHERAL.REG1.bits.bit0 = 3; return; } ============================================================================

and here's the objdump output:

============================================================================ tim.o: file format elf32-littlearm

Disassembly of section .text:

00000000 : unsigned int val;

// val = ur1->bits.bit0; // ur1->bits.bit0 = 1;

val = sr1->bit0; 0: e59f3024 ldr r3, [pc, #36] ; 2c 4: e5933000 ldr r3, [r3] 8: e5d32000 ldrb r2, [r3]

sr1->bit0 = sr1->bit0 + 3; c: e5d32000 ldrb r2, [r3] 10: e2822003 add r2, r2, #3 14: e20220ff and r2, r2, #255 ; 0xff 18: e5c32000 strb r2, [r3]

THIS_PERIPHERAL.REG1.bits.bit0 = 3; 1c: e59f300c ldr r3, [pc, #12] ; 30 20: e3a02003 mov r2, #3 24: e5c32000 strb r2, [r3] return; } 28: e12fff1e bx lr ... ============================================================================

Note the use of ldrb and strb instead of ldr and str in the generated code.

So it's confirmed that having a 8-bit bitfield in the LSBs of the struct causes the generated code to be broken, at least in the version of gcc I am using.

My notes from a few years ago reflect my memory that I was able to generate broken code for other cases as well, depending on exactly what was marked as volatile (which admittedly was a factor I had forgotten).

You don't happen to have any structs with the lower 8 bits being a single bitfield by any chance do you ? It would be interesting to see what objdump says in your case.

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Vote

S

Simon Clubley 10 years ago

I've now got gcc to generate bad code (at least as far as accessing registers are concerned) whenever the bitfield is an 8-bit bitfield which lies on a byte boundary.

A variant of the first example:

============================================================================ /* * Compiled with: * * arm-none-eabi-gcc -g -c -O2 -mcpu=arm926ej-s -o tim.o tim.c */

struct SPeriphRegisterBits { // unsigned int bit0 : 1; // unsigned int bits1_10 : 10; unsigned int : 8; unsigned int : 8; unsigned int bits1_10 : 3; unsigned int bit11 : 1; unsigned int bits28_31 : 4; unsigned int bit0 : 8; };

union UPerihRegister { struct SPeriphRegisterBits bits; unsigned int all; };

struct SPeripheralRegs { union UPerihRegister REG1; union UPerihRegister REG2; };

//extern volatile struct SPeripheralRegs THIS_PERIPHERAL;

// //Then I access things by // //THIS_PERIPHERAL.REGISTER.bits.whatever = something;

volatile union UPerihRegister *ur1 = (union UPerihRegister *) 0x80001000;

volatile struct SPeriphRegisterBits *sr1 = (struct SPeriphRegisterBits *) 0x80001000;

void tim_test(void) { unsigned int val;

// val = ur1->bits.bit0; // ur1->bits.bit0 = 1;

val = sr1->bit0;

sr1->bit0 = sr1->bit0 + 3;

THIS_PERIPHERAL.REG1.bits.bit0 = 3; return; } ============================================================================

and the code it generates:

============================================================================ tim.o: file format elf32-littlearm

Disassembly of section .text:

00000000 : unsigned int val;

// val = ur1->bits.bit0; // ur1->bits.bit0 = 1;

val = sr1->bit0; 0: e59f3024 ldr r3, [pc, #36] ; 2c 4: e5933000 ldr r3, [r3] 8: e5d32003 ldrb r2, [r3, #3]

sr1->bit0 = sr1->bit0 + 3; c: e5d32003 ldrb r2, [r3, #3] 10: e2822003 add r2, r2, #3 14: e20220ff and r2, r2, #255 ; 0xff 18: e5c32003 strb r2, [r3, #3]

THIS_PERIPHERAL.REG1.bits.bit0 = 3; 1c: e59f300c ldr r3, [pc, #12] ; 30 20: e3a02003 mov r2, #3 24: e5c32003 strb r2, [r3, #3] return; } 28: e12fff1e bx lr ... ============================================================================

Do any of your structs have any 8-bit bitfields which are on a byte boundary ? It would be interesting to see what gcc is doing in your case.

The cross compiler I am using is gcc 4.5.{something} although, based on the other problem reports I have come across, I suspect it's still a problem in at least some newer versions as well.

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Vote

L

Les Cargill 10 years ago

objdump is faster than we are at this :)

Les Cargill

Vote

D

David Brown 10 years ago

On my testing, this second read is ldr. (Note that the first "ldr r3, [r3]" is just because the sr1 pointer is not made const or static.)

On my testing, both accesses are 32-bit. This leads to a little more code between the load and the store, to get the masking right.

Same again - it's all 32-bit accesses, but the code is more complex for the masking.

All we have confirmed here is that you have a gcc version with the old volatile bitfield bug. (Note that arguably it was not a bug, since the C standards don't say anything about what size accesses to a bitfield should use. gcc used the smallest access because it gave shorter and faster code. But even if it is not technically a bug, it is still a surprise - and leads to problems on architectures that expect accesses to be 32-bit in such circumstances. Hence the option was created, and made standard on targets such as ARM that need it.)

What version of gcc are you using here?

Vote

D

David Brown 10 years ago

That explains it - the issue was fixed in 4.6.

There are a few questionable issues in later versions of the compiler, in regard to particularly complex bitfield structures. The trouble here is that volatile accesses are not well defined in C - much of it is left as "implementation defined" (meaning the compiler can choose, but it should be consistent and preferably documented) or even "unspecified" (meaning it has to follow certain behaviour patterns, but the compiler can choose which to use - and it can choose arbitrarily from case to case).

As with all uses of volatile, if you try to make the code unreasonably complicated, you run the risk of code that does not do as you hoped - usually because you don't understand it yourself, but sometimes because of imprecise standards definitions and sometimes because of compiler bugs.

But as long as you stick to rational usage of volatile bitfields, and gcc 4.6 or later, you should not see any problems.

Vote

A

Anders.Montonen 10 years ago

I'm not a compiler guy, but having looked at bitfields a short time ago, my understanding is that the language standard always allows the compiler to access bitfields in smaller units if it feels like it. GCC has a '-fstrict-volatile-bitfields' flag for disabling this optimization.

-a

Vote

D

David Brown 10 years ago

Maybe I should have read this post before answering in the other thread...

Let me try (see below).

It is the case for many MCU's, including most ARM devices.

As long as you are using gcc 4.6 (or any other compiler that implements volatile bitfield accesses with exactly the field size given in the bitfield definition), you are fine. C does not specify what size such accesses should have, but leaves it up to the implementation. So gcc

4.5 and earlier were technically correct to the C standards - the implementation picked a size that gave smaller and faster code, but that unfortunately did not work on many ARM microcontrollers.

That's pretty much what happens, yes. With each RMW set done independently for each bitfield access.

It is implementation dependent (defined by the compiler), but gcc 4.6 onwards defines it to work in the way you expect here.

I've pasted some bits from my replies >

A few comments here.

Don't use "unsigned int" - use uint32_t. Make it all absolutely clear and fully specified.

You don't need to do this in two parts. Put it altogether, using an anonymous struct (this is a very common extension to C99, which has existed in gcc since close to the dawn of time, and it is now part of C11).

I prefer to make typedefs of all my types, so that I don't need union or struct tags later. But that is a matter of style. And I strongly dislike systems Hungarian style, such as prefixing a type with U just because it happens to be a union (especially if you haven't used typedef, so that you have to have the "union" tag too).

Make your padding bits explicit with a name. The C++ memory model says that unnamed fields may not be changed by any write operations (C is more lenient), which can cause trouble for fields. It's better to make all your padding explicit.

I also like to add a static assertion to check that there are no mistakes when defining structures like these. Of preference I also use "-Wpadded" in gcc to warn of any unexpected padding - but that does not play well with most manufacturer-supplied headers.

typedef union { struct { uint32_t bit0 : 1; uint32_t bits1_10 : 10; uint32_t bit11 : 1; uint32_t padding : 16; uint32_t bits28_31 : 4; }; uint32_t all; } periphRegister_t;

static_assert(sizeof(periphRegister_t) == 4, "Check bitfield definition for PeriphRegister");

Having said all that, your definition will also work correctly (assuming you've got the volatile there somewhere, and are using gcc 4.6 or above which fixes the volatile bitfield problem).

I presume you mean "extern volatile const", and that's only for the read-only registers. It's common to define registers like this:

#define periphRegister (*((volatile periphRegister_t*) 0x12345678))

(Registers can, of course, be collected in larger structures first.)

The advantage of that is that everything is defined in a C header file - you don't need to mess around with linker files too. And since the compiler knows exactly what addresses are used, it can generate more efficient code than if they have to be sorted out at link time.

If you write:

void test1(void) { periphRegister.bit0 = 1; periphRegister.bit11 = 0; }

then the compiler will (correctly) generate two independent RMW sequences, optimising only the calculations of the address to use.

If you want both assignments to be done in a single RMW operation, you must do so explicitly - that's what the "all" field is for.

void test2(void) { periphRegister_t cache; cache.all = periphRegister.all; cache.bit0 = 1; cache.bit11 = 0; periphRegister.all = cache.all; }

Vote

J

John Devereux 10 years ago

The *C language* standard may allow it, but on a particular machine architecture there can be an ABI that specifies these things. For example the ARM eabi. The OP was using the bitfield feature for accessing hardware registers on microcontrollers. So the architecture will be pretty much fixed by definition and it should be safe to use (compilers bugs aside).

John Devereux

Vote

T

Tim Wescott 10 years ago

Or, as I alluded to in my original post, you run the risk of making code that's complicated to confuse the optimizer enough that it throws up its hands and does what you want at the moment, without guaranteeing that future optimizers wouldn't be able to unwind what you did and break it by making it "right".

Which is why I was asking about the intent of the volatile keyword WRT unions.

Tim Wescott Wescott Design Services http://www.wescottdesign.com

Vote

A

Anders.Montonen 10 years ago

Yep, but why learn bad habits that will break your code when you change platforms in the next project?

-a

Vote

S

Simon Clubley 10 years ago

Interesting.

One of the things which has clearly thrown me here is that I went and looked at the comp.lang.ada thread from a few weeks ago before commenting here about bitfields.

In that thread, the OP is using a reasonably recent version of GNAT (the name for the Ada gcc front end) from ACT. The last time I checked, ACT were using gcc 4.7 for their own internal branch of the gcc tree so I thought the problem was present upto at least gcc 4.7.

However, it's possible that the Ada front end is still exposing the problem in a way that the C front end no longer is.

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Vote

S

Simon Clubley 10 years ago

You have to do this in Ada as well at the moment.

There's currently an Ada Issue which talks about adding the ability to the next version of Ada to list the bitfields to be updated in a single assignment statement so that the Ada version of the above would become a single RMW sequence without having to use a temporary variable.

It's here if you are interested:

formatting link

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Vote

H

Hans-Bernhard Bröker 10 years ago

Am 23.09.2015 um 21:55 schrieb Tim Wescott:

As far as the language definition is concerned, "volatile" is not really supposed to have anything to do with that.

Volatile specifies that all accesses to an object have to actually happen, in the order specified by the source. So no optimize-out of reads or writes, and no shuffling of volatile accesses among each other, among other restrictions.

But it says nothing at all about the width of accesses. In th Standardese language, the reason is that it's "implementation-defined what constitutes access to a volatile-qualified object". One of the meanings of that rather opaque statement is that they left open whether access to a single element of the compound also constitutes an "access" to the whole thing. And only such accesses would be protected by the specified effect of "volatile". So no, the outer "volatile" is not really specified to have any effect on referral to the inner elements.

The only way I can see to somewhat portably control access width is to do away with the all the structs and bitfields altogether, and actually use bitwise operators on volatile-qualified object of the necessary size. I.e. if you write

volatile uint32_t *pointer_to_register = some_address; *pointer_to_register |= some_mask;

then that _does_ force the compiler to actually do a 32-bit access, because the object you're accessing is, beyond reasonable doubt, qualified "volatile", so the rules do apply here: this object has to be read, and it has to be written.

Bit fields can not portably achieve the same effect, because access to them does not necessarily constitute access to the containing compound data structure.

Vote

D

David Brown 10 years ago

The intent is, I think, pretty clear - a volatile access is the same regardless of whether or not a union is involved. The issue with volatile bitfield access was that the size of the access was not well defined (the standards don't give any clues), and for many purposes the use of differently sized accesses can give significantly more efficient code that is perfectly correct on that platform.

There are, I think, three main issues with regard to getting volatile to work as the programmer intended. If you understand these, you should be fine.

Programmers often don't understand volatile. Some believe it makes accesses atomic - so that on an 8-bit micro, they confuse making a uint16_t volatile with making it safe for swapping data between contexts (different threads or interrupt functions). And some believe that volatile accesses enforce certain orders on non-volatile accesses, or calculations.
The standards are unclear - "what constitutes a volatile access is implementation dependent". If v is a volatile variable, and your target supports arithmetic operation directly on memory addresses, then v++ may do a read, modify, then write, or it may do a single add to the memory address. If you ask to read a volatile variable of a certain size, then the compiler may read it in multiple parts, or it may read additional data beside the actual variable, or it may read only part of the variable (if it only needs part of it). Unless your compiler makes guarantees in this area (such as gcc's -fstrict-volatile-bitfields), you don't know.
Compilers may have bugs. The typical case is when complex volatile operations interact with complex optimisations, and the compiler does code movement or simplification that is not allowed for volatiles.

A key way to minimise the risks of anything going on is to keep statements and expressions involving volatiles relatively simple. Try to stick to a single read or a single write of a volatile at a time. Then the code is clear to a human reader, and clear to the compiler. (Having the volatile item deep within nested structs and unions is not a problem, nor does it matter if the volatile qualifier is on the innermost item, the outermost structure, a cast wrapper on the outside, or anything inbetween.) If vp is a volatile pointer to a volatile int, and vi is a volatile int, with foo() being a function, then writing "vp[vi = foo()]++" may be legal C, but it is asking for trouble.

Vote

D

David Brown 10 years ago

I am afraid I can't answer for GNAT - the tiny amount of Ada that I have tried does not touch on this sort of thing.

One other point is that only some targets have

-fstrict-volatile-bitfield enabled by default (ARM, for example) - on some targets, you need to enable it manually if you want that behaviour.

Vote

S

Simon Clubley 10 years ago

Yes, the OP in that thread was using ARM. I'm assuming there's something in the Ada front end which is causing this to still be an issue for Ada code.

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Vote

D

David Brown 10 years ago

I am not sure of that final point in regard to the specifications - but I am very sure of how it works in all compilers I have seen. A volatile qualifier on a struct, union or array applies equally to all members of that struct, union or array (the same applies to the const qualifier). So there is no difference between:

struct { volatile uint32_t a; volatile uint32_t b; } s1;

and

volatile struct { uint32_t a; uint32_t b; } s2;

No, that does not force the compiler to use 32-bit accesses. It forces the compiler to access all parts of the 32-bit object, but the compiler is free to divide it into 4 8-bit accesses if it wants, and it can order those accesses as it likes. It is perfectly acceptable (to the standards) for the compiler to do 4 byte-sized RMW operations - or 4 byte-sized reads, followed by the or, then 2 16-bit writes in the opposite order.

Of course, a compiler is not going to do that (unless alignment issues or the bit-size of the processor forces it to break up the accesses). But it could - according to the standards.

You are right that such full-object manipulation is less likely to bump into compiler bugs than volatile bitfield operation, but volatile bitfield operation is considered reliable enough for most targets and most modern mainstream compilers (such as gcc post 4.6) - that is why it is a common arrangement for manufacturer-supplied header files.

We are talking here about accessing hardware registers - it is inherently non-portable. Even portability across compilers (for those few that now use anything other than gcc for ARM) is unlikely.

Vote

D

DJ Delorie 10 years ago

Well, I'm the resident compiler guy who fixed it in gcc :-)

As others have noted, you need the version of gcc that supports

-fstrict-volatile-bitfields, *and* you need to have a well-defined (i.e. unambigious) struct (no holes, consistent types, etc), *and* you need to have a port that supports it (ARM does), *and* it has to be a size/alignment that the hardware supports (duh), *and* the

-fstrict-volatile-bitfields options needs to be enabled (a later version of gcc has a third setting that means "unless the ABI says otherwise" but the result is the same *if* your struct is well-defined) (-fs-v-b might (should?) be the default for ports that support it, I don't recall if they argued to change that).

So a struct like this should be OK:

volatile struct foo { int32_t a:17; int32_t b:5; int32_t c:10; };

but this would not be:

volatile struct foo { char a:8; int32_t b:24; };

But, the only portable reliable way to guarantee the right type of access is to not use bitfields.

Vote

T

Tim Wescott 10 years ago

Thanks DJ, for the work as well as the comments.

I'm willing to put up with the non-portability caused by bitfields, because at this point I don't see moving away from gcc for any reason (well, unless some client pays me to, and if so they can pay for all the rest, too).

Tim Wescott Wescott Design Services http://www.wescottdesign.com

Vote

Use of volatile in structure definitions to force word accesses

Join the Discussion

Didn't find your answer?