Absolute addressing on the ARM

Nils M Holm · 2014-03-16T20:22:06+00:00

Hi and sorry about butting in out of nowhere. I have a question about absolute addressing on ARMv6 processors as used in the Raspi. Recently I have written a back end for said processor and wondered about the best method for loading a value from an absolute address into a register when the absolute address cannot be known at compile time (i.e. cannot be placed in range for PC-relative addressing). I came up with the following code to load a value from X: .data X: .long 0 /* arbitrary distance here */ .data L1: .long X .text ldr r0,L1 ldr r0,[r0] which works fine. Now someone told me that it might be possible to construct absolute addresses with MOV/MOVT and let the linker fix the gory stuff. I doubt that because of the limitations the ARM seems to place on immediate values in MOV and MOVT. If I understand the manual correctly, immediate operands of MOV and friends must be 8-bit values that can be shifted to the left by up to eight bits. Wouldn't this limitation make MOV/MOVT unsuitable for loading absolute addresses that cannot be known at compile time? Or am I missing something? Any hints would be welcome! -- Nils M Holm

N

Nils M Holm 12 years ago

As I already said in a different post: my mistake, it works fine.

Nils M Holm < n m h @ t 3 x . o r g > www.t3x.org

Vote

L

Lasse Langwadt Christensen 12 years ago

formatting link

-Lasse

Vote

L

Lasse Langwadt Christensen 12 years ago

why not:

#define RALPH (*(volatile SomePeripheralOrAnother *) 0x40039400)

evrything in one place

-Lasse

Vote

S

Simon Clubley 12 years ago

The implicit assumption I am about to make is that at the core of SSomePeripheralOrAnotherRegs is a struct along the lines of:

struct SSomePeripheralOrAnotherRegs_t { dev_reg_1_def; dev_reg_2_def; dev_reg_3_def; };

and _without_ the volatile attribute on each register definition.

I don't understand why the struct based datatype is defined as volatile in this case instead of the registers within the struct.

OTOH, if the registers _are_ defined as volatile, then I don't see why it's needed on the struct based datatype itself.

As far as I can see, the struct itself is not volatile; it's just a language level construct to gather together a set of device specific registers. It's the registers defined within the struct, and only the registers within the struct, which need to be marked as volatile in this case.

This is how I define these structs and it has always worked for me across a range of architectures.

What am I missing ?

Simon.

PS: BTW, I also use the above #define approach as well, and yes, it nicely keeps everything in one place.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Vote

D

David Brown 12 years ago

The key thing to understand about volatile is that there is no such thing as a real "volatile" type, struct, variable or other object in C - it is /accesses/ that are volatile. When you give a variable the volatile qualifier, you are simply telling the compiler that all accesses to that object must be volatile accesses.

You can mark a variable as volatile either in the declaration of the individual variable (including an extern declaration), or by declaring it to be a type that is volatile qualified. If it is an aggregated type (struct or union), it does not matter if the individual items are marked volatile, or the whole type is marked volatile. And it does not matter how many "volatiles" you have.

Thus the effect is the same if you define the struct to be volatile, or the individual items as volatile, or both. And you can make the type volatile, or the declared variable volatile, or both.

Where you prefer to put the volatile is a matter of style (usually picked for you by the compiler or microcontroller vendor when they make the header files). Putting it in each field in the struct is a bit verbose, but gives the possibility of having some fields volatile and some non-volatile. Putting it on the struct definition is neater, and means the volatile qualifier cannot be forgotten when the struct is used

- but it means you can't avoid using it. Putting it on the declaration of the object itself makes it explicit at the point of declaration, and also means that you can use the struct in a non-volatile way (such as for a local cached copy of the data in question). Finally, you can put the volatile on the access of the data, making it explicit when it is used, but forcing you to remember to use it.

Note that there is a subtle difference between declaring RALPH as an extern volatile, and making it from a cast from an integer literal to a pointer-to-volatile. When a variable is declared as volatile, it is undefined behaviour to use a cast to remove the volatile qualification. But the pointer cast macro does not declare any volatile data, merely a pointer - you can legally use casts to remove the volatile aspect.

Vote

T

Tim Wescott 12 years ago

Mostly, because I think it looks ugly. I can't really defend it further

-- and I promise that if you show me your code, I won't barf on it.

Tim Wescott Wescott Design Services http://www.wescottdesign.com

Vote

G

George Neuner 12 years ago

would

Hi Simon,

Just a postscript to David's excellent response. Remember that a struct doesn't really "contain" anything, but rather is just an organizational template for "viewing" a block of address space [not necessarily "memory"] in a particular way. That template then is "placed" over a particular address space to view it correspondingly with the declaration.

[some people don't like the "view" notion of structs, but consider that you can place multiple different structs at the same location to yield different views of the same underlying address space.]

Moreover, a more generic declaration might be useful elsewhere without the requirement for volatile access.

The handling of volatiles is poorly defined in the C standard [any version - take your pick]. Access to a volatile variable through a non-volatile pointer may implicitly cast away the volatile qualifier on the variable. E.g.,

volatile int v; int *pv = &v; /* bad */ volatile int *vpv = &v; /* much better */

However, given

struct _s { int i } s; struct _s *ps = &s; int *pi = &s.i; int x;

"x = ps->i" is semantically different from "x = *pi" even though the result [at least here] is equivalent. The struct reference contains an implicit cast to int*: i.e. it acts as if you really had written

x = *((int*)((char*)ps + offsetof(_s,i)))

The implicit cast on struct member references is a source of confusion for compiler implementers regarding volatile members: does it or does it not cast away a volatile qualifier on the member? IME it is compiler dependent what happens if you access a volatile struct member through a non-volatile struct pointer.

across

MMV. I would expect that if all the compilers use the same base: e.g., all are GCC (or whatever) derived. But you shouldn't count on it across different compilers.

IME the result even may depend on how the definitions are written: e.g.,

struct peripheral_t { volatile uint8_t reg1, ... } *RALPH = 0x40039400;

may work, whereas

struct peripheral_t { volatile uint8_t reg1, ... }; : peripheral_t *RALPH = 0x40039400;

may fail.

You can say that's a stupid compiler, and I would agree ... but you have to work with what you've got.

George

Vote

T

Tauno Voipio 12 years ago

This works, and makes the I/O accessible from GDB, too:

--- clip clip ---

/* Cortex-M3 system tick definitions */ /* Semantics Oy, Tauno Voipio */ /* $Id: cxm3tick.h $ */

#ifndef CXM3TICK_H #define CXM3TICK_H

#include

struct systick { uint32_t ctrl; /* control / status */ uint32_t reload; /* reload value */ uint32_t value; /* counter value */ };

extern volatile struct systick cxm3tick; /* 0xe000e010 */

/* .ctrl: */

#define TICK_EN (1

Vote

D

David Brown 12 years ago

That's a useful idea, but you need to be a bit careful - as far as the C language (and therefore the compiler) is concerned, each piece of memory you access is part of a unique object with a defined type. The only legal exceptions are unions, pointers to char, and structs which share the first few fields.

So if you have a pointer pS1 to struct S1 and a pointer pS2 to struct S2, the compiler "knows" that they cannot both point to the same object in memory - this is know as type-based alias analysis. If you use pS1 to modify the object, then read the memory via pS2, you may get unexpected results - since the compilers "knows" that these point to different areas of memory, the loads and stores are independent and the compiler may re-arrange them.

Of course, when you are using volatile accesses, the compiler cannot re-arrange such accesses.

Other ways to avoid type-based alias analysis (other than disabling it, which limits other compiler optimisations) include using memory barriers, accessing data through pointer-to-char (a good compiler should optimise memcpy and similar functions as inline loops using larger-than-char accesses if the alignments are suitable), and accessing data through unions (known as "type punning unions"). Instead of pointers directly to struct S1 and struct S2 types, you could use a single pointer to a union of the two structs.

I agree that volatiles are poorly defined in the standard - but the standards are clear on this point. It is undefined behaviour to cast away the volatile qualifier. So "int *pv = &v" is not just bad, it is illegal in C.

I believe that "x = ps->i" is semantically identical to "x = *pi" - it is valid to take a pointer to an element of a struct. I can't find the reference in the C standards, but we could always cross-post to comp.lang.c for an "official" ruling (at the risk of annoying everyone else here with the serious pedantry that would be posted).

You are correct in how "x = ps->i" is interpreted - but since "int *pi = &s.i" is interpreted as

int *pi = ((int*)((char*)ps + offsetof(_s,i)))

then "x = *pi" is exactly the same as "x = ps->i".

It is definitely compiler dependent, because it is undefined behaviour - you cannot cast away the volatile qualifier (or the const qualifier) and expect it to work properly. Compilers will generally warn about this.

Given this code:

typedef struct { volatile int x; volatile int y; } S;

S s;

void foo1(void) { s.x = 1; }

void foo2(void) { volatile int *pvi = &s.x; *pvi = 1; }

void foo3(void) { int *pi = &s.x; *pi = 1; }

gcc will complain "warning: initialization discards qualifiers from pointer target type" even if you don't enable warnings. It will still generate identical code for each function - but it could produce nasal daemons for foo3().

Neither of these are valid C - you can't turn an integer literal into a pointer without a cast. If they were cleaned up appropriately, including casts, then the results are identical. I can't imagine any compiler treating them differently (although I /can/ imagine a poor-quality compiler having bugs in its implementation of volatile struct fields).

Vote

D

David Brown 12 years ago

I won't argue with you about looking ugly, but it's okay to have ugly code hidden away - the definition of RALPH as a macro here is perfectly clear.

Using macros like this is far and away the most common method in use today. It has two main advantages over using an "extern" definition.

First, it keeps everything in C rather than having essential parts of your code in linker scripts (sometimes you need project-specific linker scripts, but most people prefer to avoid them), or having assembly modules (which most people also prefer to avoid).

Secondly, it gives the compiler a lot more opportunity for generating better code. This is particularly noticeable on architectures like ARM

- with the macro definitions, the compiler can load a pointer to one peripheral into a register and use register+offset addressing for other peripherals (within the same function). Using extern definitions, the compiler has to use large and slow absolute addressing for each different peripheral used.

Vote

S

Simon Clubley 12 years ago

Thinking about it a bit more, I think what _really_ caught my attention when I saw Tim's code was thinking about what is the scope of the initial read in a read/modify/write cycle when the struct itself is the volatile component ?

In the approach I prefer, the struct contains a collection of independent volatile variables and the compiler does not read another variable when generating code to update a specific volatile variable within the struct.

(This assumes it's physically possible to perform the requested access to just the variable within the constraints of the MCU's architecture.)

However, when the struct itself is the volatile component, I can see only one volatile component, the struct. The struct may have multiple variables but the volatile attribute applies to the struct itself.

In such a struct, is it 100% guaranteed by the C language standards that, when referencing a specific variable within the struct, only the memory allocated to that variable is accessed in the generated code or is the compiler allowed to access the memory of neighbouring variables ?

(Or is it even allowed to read the whole struct when doing a R/M/W cycle when the volatile attribute is on the struct itself ?)

Quite a bit of the time, I just throw away the manufacturer's headers and create my own from the MCU's technical reference manual. Sometimes it's because of the manufacturer's copyrights on the headers (Microchip for example) or because I think the headers are junk and I can do better. Doing this as a hobby gives you that level of freedom. :-)

(The TRM register layouts are usually in a form which makes them suitable for editing with a few emacs keyboard macros after you have pasted them from the TRM PDF into a emacs buffer.)

I use my own data type, which has the volatile attribute as part of the type, when defining register variables so being verbose isn't a problem in this case. (I have a fondness for user defined data types at multiple levels when creating headers.)

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Vote

D

David Brown 12 years ago

I suppose you could say it would be more consistent if accessing a volatile struct meant fully reading or writing the whole struct each time. However, that is not what happens either in theory (i.e., the C standards) or in practice (real-world compilers).

The C standards consider a "volatile struct" to be a struct in which each member is volatile (the same applies to unions). See the examples in section 6.5.2.3 in the C11 standards (the latest draft, which is virtually identical to the final standard, is freely available online - look for document N1570).

Thus a volatile struct and a struct of volatiles are identical to C.

When you are looking for guarantees about volatiles, the key sentence in the standard is "What constitutes an access to an object that has volatile-qualified type is implementation-defined". There are /no/ guarantees from the standards. So you have no guarantee that a volatile read will not also read neighbouring data, nor that it will be carried out as a single read rather than multiple smaller reads. Writes may not write to neighbouring data (except within bitfield operations, which are even less clearly specified), but they can use multiple small writes.

Compiler implementations can, of course, give specific guarantees. And almost all compilers will generate code that does not split the access into smaller accesses, assuming alignments and sizes match the hardware. But some compilers will generate larger read accesses than specified - in particular, it was common in pre-Cortex ARM code to implement 8-bit and 16-bit reads as a 32-bit read with shifting and masking. Also note that if you are doing something like assigning one struct to another struct, the compiler can combine small reads into larger ones even if they are made of small volatile fields.

_Atomic accesses are better specified, if your compiler supports them - then there /is/ a difference between an _Atomic struct and a struct of _Atomic.

I have sometimes had to correct manufacturer's headers (fixing typos, or removing the volatile qualifier on struct members because of the limitations noted above). But I don't write my own if I can avoid it - doing this as a professional means I can't spend time on such niceties without good reason.

Vote

S

Simon Clubley 12 years ago

And it would be a Very Bad Thing (TM) if it did full struct reads. :-)

Thank you, I will.

And that answers my query rather nicely thanks.

Oh, I'm fully aware about the issues surrounding bitfields. :-) I did some experiments a year or two ago with using bitfields instead of masks in some ARM header definitions.

I gave up and went back to using masks. Some things worked rather nicely but I could never 100% guarantee the generated code would do what I expected it to do with regards to reads/writes to the underlying 32-bit register.

In my day job, I am a commercial systems programmer/sysadmin and it's quite usual for me to work with high quality code and documentation. For example, I have a VMS background (and a general DEC background) and while VMS is dated by today's standards. the documentation and example code is very good.

It was quite revealing seeing some of things the various MCU manufacturers put out as code/documentation when examined in light of my day job experiences.

However, I understand exactly where you are coming from and understand that you have to work with what you have when someone is paying you for your time.

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Vote

B

BartC 12 years ago

Why not? There are considerable advantages to having a simple, small compiler, some of which the OP has pointed out.

I've always used my own compilers *and* languages, and they have been used to create real, commercial products too.

I think there is a place for a simple, static compiled language other than the same boring choice of always using C (or sometimes, C++; same thing really). It can be simpler and tidier too because it will have less baggage. (Although my own effort is likely to stay private.)

As for speed, my last working unoptimised compiler, for x86, was on a par with gcc -O0 (and an experimental optimised version could just match other non-gcc optimising compilers). However, because typically I make it very, very easy to have inline asm code, it is a simple matter to optimise specific routines this way, and approach or surpass gcc -O1.

As for ARM, I haven't had a go at that yet. I also noticed the lack of absolute addressing. But what puts me off though is, after generating ARM ASM, I end up having to use gcc anyway! (I assume the gnu assembler that's been mentioned is the one inside gcc.)

But a funny thing about ARM (specifically the one in the 'Pi') and gcc: the first C program I tried, even compiled with -O3, ran at about one third the expected speed. This was because gcc, thinking some pointer values were misaligned (they weren't misaligned in my program, and this model of ARM didn't have that issue anyway) was doing byte-at-a-time accesses to load and store values! It didn't bother to mention this small detail. I can tell you that any code of mine, no matter how bad, at least wouldn't have done that!

Bartc

Vote

D

David Brown 12 years ago

Normally that's true - although sometimes people misunderstand volatile and think it gives them atomic access. Imagine a struct containing a count of days, and a count of seconds, with an interrupt routine updating the counters. If the main loop reads this without due consideration, you get a chance of everything going horribly wrong once per 24 hours - and volatile on the struct or the fields will not help. The _Atomic qualifier on the struct will solve the problem - /if/ you've got a compiler that supports it.

It is not the most exciting bedtime reading, but it can be useful. It is also a bit depressing how badly some things are specified (or left as implementation defined), and how many things we take for granted that are actually not required by the standards.

It is all "implementation defined". So if your compiler says it implements volatile bitfields in a particular way (such as gcc with the

-fvolatile-bitfields flag), you have your guarantees. But things can change on a different compiler, different versions, or different targets. As a general rule, if the compiler vendor provides headers with registers defined using bitfields, you can be confident that it will implement volatile bitfields in the most user-friendly manner of using the bitfield specified size for all reads and writes (except when you read or write the structure as a whole, of course).

At the risk of making unfair generalisations, I think a lot of the example code produced by MCU and compiler vendors seems to be produced by students doing summer jobs - certainly the code is seldom of the quality you would expect for serious embedded systems.

Vote

D

David Brown 12 years ago

Yes, there are advantages to having a small and simple compiler in some contexts. But this particular compiler is very limited in the subset of C it supports, and that greatly reduces its usefulness of normal work. And since the generated code is about half the speed of gcc on -O0, it will be a factor of 5 to 10 times slower than /real/ code generated by /real/ compilers using /real/ compiler flags. Would you be happy for all the software on your computer to run at 10-20% of normal speed, just because the software vendor wanted to use a simpler compiler? Of course not. So for normal work, you use a normal compiler - for educational work or other specialised usage, you might want to use a more niche compiler.

There are vast numbers of programming languages, many of which are supported by different tools - and there are many good reasons for using them. Very often you will pick the right programming language for the task (occasionally making your own language if that's the best solution), then pick the compiler or interpreter to suit (again, occasionally writing your own). But if you pick a major language - such as C - for your program, you do not then pick a very small and very limited compiler as your development tool unless you have very specialised needs - such as being able to run it on a tiny host, or being able to understand the compiler's source code.

I don't disagree with that (except that C and C++ are not the same thing at all - and the distance between them has been growing rapidly). I don't think the complexity or size of C compilers is the reason for this, however - the aim would be to make a better statically compiled language than C (for whatever value of "better" suits your purpose).

With inline assembly, you are no longer working in C (obviously) and so is of no relevance in a speed contest.

There are many reasons why one might usefully write a C compiler - for fun, education (either your own or other peoples'), for specialised processors, for specialised hosts, or as a basis for an "extended C" language. But you don't write one for a standard processor architecture and aim to be fast, unless you have the resources to compete with the big names (gcc, llvm, Intel, MS, and big embedded toolchain vendors) - unless it is for fun or education. Otherwise you need to do an enormous amount of work (probably tens of man-years) to get close on speed, features, and correctness, for a tool that almost no one will ever use.

The gnu assembler is part of the gnu binutils project (along with the linker and librarian) - it is not part of gcc. And of course if you are writing your own compiler, you can write your own assembler and linker - it's a lot easier than writing an optimising compiler.

gcc will assume that pointers are properly aligned according to their types - you have to go out of your way to lose that (such as by casting pointers). If you have code that takes a pointer-to-char and you pass it pointers to 32-bit values, then of course the compiler has to generate byte-sized code, because that is the only legal choice in C. There are several ways to tell the compiler about larger alignment and access sizes, but you have to give the compiler the information before it can generate such code. (And if your compiler breaks the relevant rules for C and/or the hardware, that's your choice - but it is not then a C compiler.)

Vote

T

Tim Wescott 12 years ago

That's an interesting point. The peripherals on the part I'm using tend to be in widely-spaced blocks, so I'm not sure if that really applies here. But it could. I'll need to think about that.

Most of my customers have low enough volume sales that my goal in choosing a processor is to get one that's big enough and fast enough so that I don't have to be sparing with clock ticks or memory when I'm writing code: shoe-horning always seems to consume lots of expensive engineering hours; in this age of big fast inexpensive processors, I don't see the payback unless you're building 100k units at a time.

Tim Wescott Wescott Design Services http://www.wescottdesign.com

Vote

S

Simon Clubley 12 years ago

Quite true. Don't the Free Pascal people use binutils just as gcc does ?

I've seen this byte-at-a-time issue before on a ARM target when __attribute__ ((packed)) was specified for a struct. Even though gcc had enough information to generate reasonable code, and to guarantee the struct was aligned correctly, the code it generated was rather stupid. It even generated byte-at-a-time code for 32-bit volatile variables (used for device registers).

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world

Vote

N

Nils M Holm 12 years ago

Of course I would. I would be happy to trade execution speed for compilation speed in many cases. Back in the days we had a project that took almost an hour to compile using MSC and four minutes to compile with Turbo C. We were quite happy to use a faster compiler, even if the resulting executable was a bit larger and slower.

BTW, when comparing my own compiler with GCC -O3, it's more 35% of the optimal speed, and for many programs that I use, both in private and at work, that does not make any difference.

Nils M Holm < n m h @ t 3 x . o r g > www.t3x.org

Vote

B

BartC 12 years ago

I think that was it (iirc): 1-byte alignment was applied to a struct such as:

{int; double; int;}

so that you had field widths of (4, 8, 4) bytes, and a total size of 16. Without the packing, I think it would align the double on an 8-byte boundary (which seemed pointless on the 32-bit ARM) and end up with a 20-byte struct.

In the end I just moved the fields around so that it was {int; int; double;}. (The actual struct was much more complex.)

Bartc

Vote

Absolute addressing on the ARM

Join the Discussion

Didn't find your answer?