This question came up in the thread titled "Re: Ada, was: Re: Fundamental C question about "if" statements".
I felt it deserved its own thread, because it's far off topic for what's become a very interesting, if already off-topic, thread.
>>> Good luck doing that in a reliable way on ARM with gcc when you are
>>> accessing device registers.
>>>
>>> If you access a bitfield in the lower 8 bits of a register, the gcc
>>> optimiser can turn that into 8-bit ldrb/strb opcodes instead of 32-bit
>>> ldr/str opcodes, thereby causing the program to break if you need
>>> to access your registers in units greater than 8 bits.
>>
>> That's a very interesting comment, because that's basically How It Is
>> Done here, and I've had several years worth of success on both ST and
>> TI/
>> Luminary devices. I'm not sure if that's because of the way I define
>> my bitfields (absolutely everything is defined as a struct of
>> bitfields, then a union of that struct and a 32-bit integer), or if
>> I've been lucky,
>> or what.
>>
>>
> Do your MCUs _require_ register access in units of 16/32 bits ?
>
> If so, you may be doing something which works for now, but may not work
> in the future.
>
> It might be worth having a quick look with objdump just to be sure.
>
> I have personally had this happen to me and it broke my code due to the
> register access size constraints no longer being honoured.
>
>> So it looks like:
>>
>> struct SPeriphRegisterBits {
>>     unsigned int bit0      : 1;
>>     unsigned int bits1_10  : 10;
>>     unsigned int bit11     : 1;
>>     unsigned int           : 16;
>>     unsigned int bits28_31 : 4;
>> };
>>
>> union UPeriphRegister {
>>     struct SPeriphRegisterBits bits;
>>     unsigned int all;
>> };
>>
>> All of the various register definitions then get collected into a
>> structure for the peripheral, which gets declared as "extern const",
>> and defined in the linker command file.
>>
>>
> This is a bit more involved than what I was doing. I wonder if this is
> what is saving you for now.
Yes, my MCUs require access in units of 16 or 32 bits, at least in places -- if the ST data book is to be believed.
I would REALLY like one of the compiler guys to weigh in on this, because I'm wondering if collecting all those unions into a struct and calling it volatile isn't forcing a 32-bit access. It would be nice to know whether I'm actually _telling_ the compiler to do the right thing by the intent of the language, or just bamboozling the optimizer into doing what I want at the moment. So -- I'm curious, in a more-than-idly sort of way.
--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
I've duplicated this using a variant of your structs and caused gcc to generate invalid code for accessing registers.
I wasn't able to duplicate the range of cases I thought I could, but I found this in one of my header files I created some years ago:
 * Using bitfield structs versus bitmasks to access register bitfields:
 *
 * My original plan was to use bitfield structs to access the register
 * bitfields. However, gcc is generating code to access the bitfields
 * at a byte, and not longword, level. As the MMIO fields are longword
 * access only, this results in invalid code being generated.
 *
 * Marking the enclosing 32-bit struct, instead of the underlying bitfields,
 * as volatile results in code being correctly generated for most cases.
 * However, when the field to be modified is a byte in the least significant
 * 8 bits of the register, then gcc still generates a byte-level store
 * even though it performed a longword-level read. There may also be other
 * cases which I have not yet come across.
 *
 * As a result, I have switched to using bitmasks to access bitfields in a
 * register.
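The bitmask style the note ends up recommending looks roughly like this (register name, field positions, and the backing variable are all invented for illustration; on real hardware CTRL would map to a fixed MMIO address from the data sheet):

```c
#include <stdint.h>

/* Hypothetical layout: an ENABLE bit and a 4-bit PRESCALER field. */
#define CTRL_ENABLE        (1u << 0)
#define CTRL_PRESCALER_POS 4u
#define CTRL_PRESCALER_MSK (0xFu << CTRL_PRESCALER_POS)

/* On hardware this would be a fixed address, e.g.
 *   #define CTRL (*(volatile uint32_t *)0x40001000u)
 * A plain variable stands in so the sketch can run anywhere. */
static uint32_t ctrl_backing;
#define CTRL (*(volatile uint32_t *)&ctrl_backing)

/* Whole-register read-modify-write: the accessed object is a
 * volatile uint32_t, so the access width is unambiguously 32 bits. */
static void ctrl_set_prescaler(uint32_t div)
{
    CTRL = (CTRL & ~CTRL_PRESCALER_MSK)
         | ((div << CTRL_PRESCALER_POS) & CTRL_PRESCALER_MSK)
         | CTRL_ENABLE;
}
```

The access width no longer depends on how the compiler chooses to handle bitfields, only on the declared type.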
So it's clear that _where_ you place the volatile attribute is critical to how the generated code looks (and how broken it is). A longword is
32 bits BTW; that's my DEC background showing. :-)
Note the use of ldrb and strb instead of ldr and str in the generated code.
So it's confirmed that having an 8-bit bitfield in the LSBs of the struct causes the generated code to be broken, at least in the version of gcc I am using.
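A minimal test case of the shape being described might look like this (struct and names invented); compiled for ARM at -O2 with an affected gcc, the store can come out as a byte-wide strb even though the read used a 32-bit ldr:

```c
#include <stdint.h>

/* A register whose low 8 bits are one byte-sized, byte-aligned field.
 * Layout is illustrative, not taken from any real device. */
struct ctrl_bits {
    uint32_t low_byte : 8;    /* sits exactly in bits 0..7 */
    uint32_t rest     : 24;
};

volatile struct ctrl_bits ctrl;   /* volatile on the whole struct */

void set_low(uint8_t v)
{
    /* With a pre-4.6 gcc (and no -fstrict-volatile-bitfields), this
     * store can be narrowed to strb -- fatal for word-only MMIO. */
    ctrl.low_byte = v;
}
```

The value-level behaviour is correct either way; the breakage is purely in the access width, which is why objdump is needed to see it.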
My notes from a few years ago reflect my memory that I was able to generate broken code for other cases as well, depending on exactly what was marked as volatile (which admittedly was a factor I had forgotten).
You don't happen to have any structs with the lower 8 bits being a single bitfield, by any chance, do you ? It would be interesting to see what objdump says in your case.
Simon.
--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
I've now got gcc to generate bad code (at least as far as accessing registers is concerned) whenever the bitfield is an 8-bit bitfield which lies on a byte boundary.
Do any of your structs have any 8-bit bitfields which are on a byte boundary ? It would be interesting to see what gcc is doing in your case.
The cross compiler I am using is gcc 4.5.{something} although, based on the other problem reports I have come across, I suspect it's still a problem in at least some newer versions as well.
Simon.
--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
On my testing, this second read is ldr. (Note that the first "ldr r3, [r3]" is just because the sr1 pointer is not made const or static.)
On my testing, both accesses are 32-bit. This leads to a little more code between the load and the store, to get the masking right.
Same again - it's all 32-bit accesses, but the code is more complex for the masking.
All we have confirmed here is that you have a gcc version with the old volatile bitfield bug. (Note that arguably it was not a bug, since the C standards don't say anything about what size accesses to a bitfield should use. gcc used the smallest access because it gave shorter and faster code. But even if it is not technically a bug, it is still a surprise - and leads to problems on architectures that expect accesses to be 32-bit in such circumstances. Hence the option was created, and made standard on targets such as ARM that need it.)
There are a few questionable issues in later versions of the compiler, in regard to particularly complex bitfield structures. The trouble here is that volatile accesses are not well defined in C - much of it is left as "implementation defined" (meaning the compiler can choose, but it should be consistent and preferably documented) or even "unspecified" (meaning it has to follow certain behaviour patterns, but the compiler can choose which to use - and it can choose arbitrarily from case to case).
As with all uses of volatile, if you try to make the code unreasonably complicated, you run the risk of code that does not do as you hoped - usually because you don't understand it yourself, but sometimes because of imprecise standards definitions and sometimes because of compiler bugs.
But as long as you stick to rational usage of volatile bitfields, and gcc 4.6 or later, you should not see any problems.
I'm not a compiler guy, but having looked at bitfields a short time ago, my understanding is that the language standard always allows the compiler to access bitfields in smaller units if it feels like it. GCC has a '-fstrict-volatile-bitfields' flag for disabling this optimization.
Maybe I should have read this post before answering in the other thread...
Let me try (see below).
It is the case for many MCU's, including most ARM devices.
As long as you are using gcc 4.6 (or any other compiler that implements volatile bitfield accesses with exactly the field size given in the bitfield definition), you are fine. C does not specify what size such accesses should have, but leaves it up to the implementation. So gcc
4.5 and earlier were technically correct to the C standards - the implementation picked a size that gave smaller and faster code, but that unfortunately did not work on many ARM microcontrollers.
That's pretty much what happens, yes -- with each RMW sequence done independently for each bitfield access.
It is implementation dependent (defined by the compiler), but gcc 4.6 onwards defines it to work in the way you expect here.
I've pasted some bits from my replies below.
A few comments here.
Don't use "unsigned int" - use uint32_t. Make it all absolutely clear and fully specified.
You don't need to do this in two parts. Put it all together, using an anonymous struct (this is a very common extension to C99, which has existed in gcc since close to the dawn of time, and is now part of C11).
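A sketch of that combined form, reusing Tim's field names (illustrative, not a real device header):

```c
#include <stdint.h>

/* One-part definition: a C11 anonymous struct inside the union, so the
 * bitfields are reached directly as reg.bit0 rather than reg.bits.bit0. */
typedef union {
    struct {
        uint32_t bit0      : 1;
        uint32_t bits1_10  : 10;
        uint32_t bit11     : 1;
        uint32_t reserved  : 16;   /* padding made explicit, as advised */
        uint32_t bits28_31 : 4;
    };
    uint32_t all;
} periphRegister_t;

_Static_assert(sizeof(periphRegister_t) == 4,
               "Check bitfield definition for periphRegister_t");
```

The typedef also removes the need for union/struct tags at every use site, as suggested in the style remarks that follow.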
I prefer to make typedefs of all my types, so that I don't need union or struct tags later. But that is a matter of style. And I strongly dislike systems Hungarian style, such as prefixing a type with U just because it happens to be a union (especially if you haven't used typedef, so that you have to have the "union" tag too).
Make your padding bits explicit with a name. The C++ memory model says that unnamed fields may not be changed by any write operations (C is more lenient), which can cause trouble with unnamed padding. It's better to make all your padding explicit.
I also like to add a static assertion to check that there are no mistakes when defining structures like these. Of preference I also use "-Wpadded" in gcc to warn of any unexpected padding - but that does not play well with most manufacturer-supplied headers.
static_assert(sizeof(periphRegister_t) == 4, "Check bitfield definition for PeriphRegister");
Having said all that, your definition will also work correctly (assuming you've got the volatile there somewhere, and are using gcc 4.6 or above which fixes the volatile bitfield problem).
I presume you mean "extern volatile const", and that's only for the read-only registers. It's common to define registers like this:
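A typical version of that pattern looks something like the following (the address, register type, and names are all illustrative; here a static variable stands in for the MMIO address so the sketch can compile and run on a host):

```c
#include <stdint.h>

typedef union {
    struct {
        uint32_t enable : 1;
        uint32_t mode   : 3;
        uint32_t rest   : 28;
    } bits;
    uint32_t all;
} periphRegister_t;             /* names invented for illustration */

/* On hardware: #define PERIPH_BASE 0x40001000u (from the data sheet).
 * Here a file-scope variable plays the register so the code runs. */
static periphRegister_t periph_backing;
#define PERIPH (*(volatile periphRegister_t *)&periph_backing)
```

With this, `PERIPH.bits.enable = 1;` compiles to an access at an address known at compile time, with no linker-file entry required.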
(Registers can, of course, be collected in larger structures first.)
The advantage of that is that everything is defined in a C header file - you don't need to mess around with linker files too. And since the compiler knows exactly what addresses are used, it can generate more efficient code than if they have to be sorted out at link time.
The *C language* standard may allow it, but on a particular machine architecture there can be an ABI that specifies these things, for example the ARM EABI. The OP was using the bitfield feature for accessing hardware registers on microcontrollers, so the architecture will be pretty much fixed by definition and it should be safe to use (compiler bugs aside).
Or, as I alluded to in my original post, you run the risk of writing code that's just complicated enough to confuse the optimizer into throwing up its hands and doing what you want for the moment, with no guarantee that a future optimizer won't unwind what you did and break it by making it "right".
Which is why I was asking about the intent of the volatile keyword WRT unions.
--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
One of the things which has clearly thrown me here is that I went and looked at the comp.lang.ada thread from a few weeks ago before commenting here about bitfields.
In that thread, the OP is using a reasonably recent version of GNAT (the name for the Ada gcc front end) from ACT. The last time I checked, ACT were using gcc 4.7 for their own internal branch of the gcc tree, so I thought the problem was present up to at least gcc 4.7.
However, it's possible that the Ada front end is still exposing the problem in a way that the C front end no longer is.
Simon.
--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
There's currently an Ada Issue which proposes adding, in the next version of Ada, the ability to list the bitfields to be updated in a single assignment statement, so that the Ada version of the above would become a single RMW sequence without having to use a temporary variable.
It's here if you are interested:
formatting link
Simon.
--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
As far as the language definition is concerned, "volatile" is not really supposed to have anything to do with that.
Volatile specifies that all accesses to an object have to actually happen, in the order specified by the source. So no optimize-out of reads or writes, and no shuffling of volatile accesses among each other, among other restrictions.
But it says nothing at all about the width of accesses. In the Standardese language, the reason is that it's "implementation-defined what constitutes access to a volatile-qualified object". One of the meanings of that rather opaque statement is that they left open whether access to a single element of the compound also constitutes an "access" to the whole thing. And only such accesses would be protected by the specified effect of "volatile". So no, the outer "volatile" is not really specified to have any effect on referral to the inner elements.
The only way I can see to somewhat portably control access width is to do away with all the structs and bitfields altogether, and actually use bitwise operators on a volatile-qualified object of the necessary size. I.e. if you write
then that _does_ force the compiler to actually do a 32-bit access, because the object you're accessing is, beyond reasonable doubt, qualified "volatile", so the rules do apply here: this object has to be read, and it has to be written.
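Concretely, the kind of full-width access meant here might be (register name and address are illustrative; a static variable stands in for the MMIO address so the sketch can run on a host):

```c
#include <stdint.h>

/* Stand-in for the MMIO register; on hardware you would write
 *   #define DEVREG (*(volatile uint32_t *)0x40001000u)
 * with the address taken from the data sheet. */
static uint32_t devreg_backing;
#define DEVREG (*(volatile uint32_t *)&devreg_backing)

void enable_feature(void)
{
    /* One read and one write of a volatile uint32_t: both accesses
     * must happen, in this order, per the rules for volatile. */
    DEVREG |= (1u << 8);
}
```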
Bit fields can not portably achieve the same effect, because access to them does not necessarily constitute access to the containing compound data structure.
The intent is, I think, pretty clear - a volatile access is the same regardless of whether or not a union is involved. The issue with volatile bitfield access was that the size of the access was not well defined (the standards don't give any clues), and for many purposes the use of differently sized accesses can give significantly more efficient code that is perfectly correct on that platform.
There are, I think, three main issues with regard to getting volatile to work as the programmer intended. If you understand these, you should be fine.
Programmers often don't understand volatile. Some believe it makes accesses atomic - so that on an 8-bit micro, they confuse making a uint16_t volatile with making it safe for swapping data between contexts (different threads or interrupt functions). And some believe that volatile accesses enforce certain orders on non-volatile accesses, or calculations.
The standards are unclear - "what constitutes a volatile access is implementation dependent". If v is a volatile variable, and your target supports arithmetic operation directly on memory addresses, then v++ may do a read, modify, then write, or it may do a single add to the memory address. If you ask to read a volatile variable of a certain size, then the compiler may read it in multiple parts, or it may read additional data beside the actual variable, or it may read only part of the variable (if it only needs part of it). Unless your compiler makes guarantees in this area (such as gcc's -fstrict-volatile-bitfields), you don't know.
Compilers may have bugs. The typical case is when complex volatile operations interact with complex optimisations, and the compiler does code movement or simplification that is not allowed for volatiles.
A key way to minimise the risks of anything going on is to keep statements and expressions involving volatiles relatively simple. Try to stick to a single read or a single write of a volatile at a time. Then the code is clear to a human reader, and clear to the compiler. (Having the volatile item deep within nested structs and unions is not a problem, nor does it matter if the volatile qualifier is on the innermost item, the outermost structure, a cast wrapper on the outside, or anything inbetween.) If vp is a volatile pointer to a volatile int, and vi is a volatile int, with foo() being a function, then writing "vp[vi = foo()]++" may be legal C, but it is asking for trouble.
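The contrast being drawn can be sketched like this (names invented, with an array in place of the volatile pointer for brevity, and foo() a hypothetical stand-in function):

```c
#include <stdint.h>

static volatile int table[4];   /* illustrative volatile objects */
static volatile int vi;

static int foo(void) { return 2; }   /* hypothetical */

/* Risky: several volatile accesses tangled into one expression. */
void bad(void)
{
    table[vi = foo()]++;
}

/* Clearer: one volatile read or write per statement. */
void good(void)
{
    int idx = foo();
    int tmp;
    vi = idx;              /* one volatile write */
    tmp = table[idx];      /* one volatile read  */
    table[idx] = tmp + 1;  /* one volatile write */
}
```

Both versions are legal C; the second leaves no room for surprises about access order or count.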
I am not sure of that final point in regard to the specifications - but I am very sure of how it works in all compilers I have seen. A volatile qualifier on a struct, union or array applies equally to all members of that struct, union or array (the same applies to the const qualifier). So there is no difference between:
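The comparison being made is presumably along these lines (field names invented):

```c
#include <stdint.h>

/* volatile on the outer struct... */
volatile struct { uint32_t f : 8; uint32_t rest : 24; } rega;

/* ...versus volatile on the individual members: */
struct { volatile uint32_t f : 8; volatile uint32_t rest : 24; } regb;
```

Per the paragraph above, compilers treat accesses to rega.f and regb.f identically as volatile accesses.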
No, that does not force the compiler to use 32-bit accesses. It forces the compiler to access all parts of the 32-bit object, but the compiler is free to divide it into 4 8-bit accesses if it wants, and it can order those accesses as it likes. It is perfectly acceptable (to the standards) for the compiler to do 4 byte-sized RMW operations - or 4 byte-sized reads, followed by the or, then 2 16-bit writes in the opposite order.
Of course, a compiler is not going to do that (unless alignment issues or the bit-size of the processor forces it to break up the accesses). But it could - according to the standards.
You are right that such full-object manipulation is less likely to bump into compiler bugs than volatile bitfield operation, but volatile bitfield operation is considered reliable enough for most targets and most modern mainstream compilers (such as gcc post 4.6) - that is why it is a common arrangement for manufacturer-supplied header files.
We are talking here about accessing hardware registers - it is inherently non-portable. Even portability across compilers (for those few that now use anything other than gcc for ARM) is unlikely.
Well, I'm the resident compiler guy who fixed it in gcc :-)
As others have noted, you need the version of gcc that supports
-fstrict-volatile-bitfields, *and* you need to have a well-defined (i.e. unambiguous) struct (no holes, consistent types, etc), *and* you need to have a port that supports it (ARM does), *and* it has to be a size/alignment that the hardware supports (duh), *and* the
-fstrict-volatile-bitfields option needs to be enabled. (A later version of gcc has a third setting that means "unless the ABI says otherwise", but the result is the same *if* your struct is well-defined. -fs-v-b might (should?) be the default for ports that support it; I don't recall if they argued to change that.)
I'm willing to put up with the non-portability caused by bitfields, because at this point I don't see moving away from gcc for any reason (well, unless some client pays me to, and if so they can pay for all the rest, too).
--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com