Absolute addressing on the ARM

Nils M Holm · 2014-03-16T20:22:06+00:00

Hi and sorry about butting in out of nowhere. I have a question about absolute addressing on ARMv6 processors as used in the Raspi.

Recently I have written a back end for said processor and wondered about the best method for loading a value from an absolute address into a register when the absolute address cannot be known at compile time (i.e. cannot be placed in range for PC-relative addressing). I came up with the following code to load a value from X:

        .data
    X:  .long 0

        /* arbitrary distance here */

        .data
    L1: .long X
        .text
        ldr r0,L1
        ldr r0,[r0]

which works fine.

Now someone told me that it might be possible to construct absolute addresses with MOV/MOVT and let the linker fix the gory stuff. I doubt that because of the limitations the ARM seems to place on immediate values in MOV and MOVT. If I understand the manual correctly, immediate operands of MOV and friends must be 8-bit values that can be shifted to the left by up to eight bits. Wouldn't this limitation make MOV/MOVT unsuitable for loading absolute addresses that cannot be known at compile time? Or am I missing something?

Any hints would be welcome!

-- Nils M Holm
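As a rough aid for the immediate-operand question, here is a small C sketch of the encoding rule as the ARM Architecture Reference Manual describes it, to the best of my knowledge: a classic data-processing immediate is an 8-bit value rotated right by an even number of bits (0 to 30). That is more flexible than a left shift of up to eight bits, but it still cannot express an arbitrary 32-bit address. A MOVW/MOVT pair, by contrast, carries 16 bits per instruction; those two instructions exist from ARMv6T2/ARMv7 onwards, so as far as I know they are not available on the ARM1176 core of the original Raspberry Pi. All function names below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n in 0..31). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    assert(n < 32);
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Return 1 if v fits the classic ARM data-processing immediate form:
   an 8-bit value rotated right by an even number of bits (0..30). */
static int arm_imm_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by 'rot' and see whether 8 bits remain. */
        if (rotl32(v, rot) <= 0xFFu)
            return 1;
    }
    return 0;
}

/* Split an address into the two 16-bit halves that a MOVW/MOVT pair
   would carry (MOVW takes the low half, MOVT the high half). */
static void split_movw_movt(uint32_t addr, uint16_t *lo, uint16_t *hi)
{
    assert(lo != 0 && hi != 0);
    *lo = (uint16_t)(addr & 0xFFFFu);
    *hi = (uint16_t)(addr >> 16);
}
```

On cores that do have MOVW/MOVT, the GNU assembler accepts operands of the form `#:lower16:X` and `#:upper16:X`, which leave the two halves to the linker. On ARMv6 proper, a literal-pool load like the `ldr r0,L1` sequence above remains the usual approach.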


BartC 12 years ago

That 5-10 times would be the absolutely worst case, when you are running tight integer benchmarks.

Real code would be different. On my interpreter program, the difference between -O0 and -O3 is less than 2:1. And even *that* is when interpreting integer benchmarks!

And unless all the software (OS, drivers, libraries) is recompiled with the same slower compiler, the slowdown will only apply to the application.

Also, development speed might be more important than the maximum possible execution speed; you can always run gcc with -O3 when it's finished! It might also be that a simple optimising pass, if desired, could make a quick improvement on that 2x slower than -O0.

For embedded work especially, being able to express integer constants in binary (together with separators for readability) is one of many dozens of simple enhancements that can make life easier. (Maybe gcc has some non-standard extension, and an obscure switch to enable it, for doing that. That's not the same.) Type-specifiers that you simply read and write left-to-right like English is another (I doubt there's a gcc switch for that!).

You will still be working predominantly in C (or whatever comparable high level is used). The inline asm stuff is just a technique. After all gcc C compilers seem to depend on a plethora of weird-looking attributes, a thousand compiler options, and an entirely separate language in make files in order to do what they do. A few lines of ASM is tame!

And if done properly, you will provide both asm and regular versions of the code, so that you can switch off the asm when necessary.

OK, I will take a look. The ARM assembly code posted here looked like that assembled by gcc when you give it a .s file. (I'm not interested these days in writing assemblers and linkers...)

Bartc


Simon Clubley 12 years ago

If you are talking about my code, gcc compiled a C program by generating a file containing ARM assembly source code which it handed off to the assembler which is a part of binutils. The binutils assembler then turned that generated source code into a binary .o object file.

The binutils assembler can be used just fine outside of gcc; it's just that most people, myself included, invoke the binutils assembler by using the gcc frontend.

objdump, the utility used to disassemble the .elf file in my example code, is also a part of binutils. It has an option to include the source code at the correct points in the disassembled output.

However, with various optimisation levels turned on, that means you usually see the same sections of source code included more than once which happened in my example.

Simon.

Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP Microsoft: Bringing you 1980s technology to a 21st century world


David Brown 12 years ago

They do indeed, although Free Pascal is a relatively small community. gcc itself supports several languages in mainline (C, C++, Ada, Java, Fortran, Go) as well as a number of other out-of-mainline languages (D, Pascal, Mercury, Cobol, Modula-2, VHDL, PL/1 - and probably many more). And pretty much any other compiler in the Linux (or *nix) world will use binutils rather than re-inventing the assembler and linker.

And of course, don't forget assembler programmers!

Yes, the "packed" attribute can cause this - you really want to avoid that unless you have very good reason. Mostly it generates good code, but sometimes you get byte-at-a-time access. There can be several factors involved here, such as the compiler version, the flags used, and how the data is being accessed. But it is quite possible to get slower-than-necessary code for packed struct accesses - and it is also possible to get misaligned accesses, which are illegal on some hardware, but you usually have to cheat a little to force that (such as by taking a pointer to a packed struct member).

In many cases, you can re-order your struct to avoid gaps without resorting to "packed". Personally, I often like to put in any padding explicitly (and use the "-Wpadded" flag to check).
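A minimal sketch of the re-ordering idea, assuming a typical ABI where a 32-bit integer needs 4-byte alignment. The struct and helper names are invented for illustration, and the exact sizes depend on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Members in "natural" order: the compiler inserts a gap so that
   'b' is suitably aligned, and pads the tail for array use. */
struct sloppy {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
};

/* Same members, largest first: the gaps collapse, and the remaining
   padding is spelled out so that -Wpadded has nothing to report. */
struct tidy {
    uint32_t b;
    uint8_t  a;
    uint8_t  c;
    uint8_t  pad[2];   /* explicit padding instead of "packed" */
};

/* Helper: how many bytes of a struct are not accounted for by the
   members the caller knows about. */
static size_t wasted_bytes(size_t struct_size, size_t member_bytes)
{
    assert(struct_size >= member_bytes);
    return struct_size - member_bytes;
}
```

On common 32-bit and 64-bit targets `struct sloppy` usually comes out at 12 bytes and `struct tidy` at 8, with every member still naturally aligned.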


George Neuner 12 years ago

Absolutely.

But my point was more esoteric: namely that a struct is a reification of a view of the address space and not a "container" as such. You really have to think about it in this way to correctly (ab)use unions.


Yes, you can take the address of a struct member.

But here's a question for you: how does the compiler get an rvalue from memory? Answer: it computes the corresponding lvalue and then dereferences it.

Next question: what does the type checking for that look like?

The type "int *pi" is trivially ptr(int) and the type of the rvalue expression "*pi", also trivially, is deref(ptr(int)) => int.

However the type of the rvalue expression "ps->i" is not trivially int, but rather something like:

member( _s,int(i) ) => int

which is compatible with and substitutable for int at a higher level, but it is not itself int because a struct defines a separate namespace that must be qualified to decide the type of the expression. Applying the address operator to the above, i.e. "&ps->i", makes a pointer and gives:

ptr( member( _s,int(i) ) => int ) => ptr(int)

which then is assignment compatible with "int *pi".

A different level of semantics.

However, it is trivial to remove the volatile qualifier. E.g.,

    void foo4(void)
    {
        *((int*)((char*)&S + offsetof(S,y))) = 42;
    }

foo4() constructs and uses a non-volatile pointer to S.y, but the compiler won't utter a peep about it.

Btw: this isn't an exercise in fooling the compiler - I've seen this sort of thing done in the real world to do things like application level scatter/gather I/O, generic field access to dynamically defined record structures, etc. E.g.,

    size_t read_int_member( FILE* fp, void* base, int offset )
    {
        int *p = (int*)((char*)base + offset);
        return fread( p, sizeof(int), 1, fp );
    }
     :
    read_int_member( file1, &S, offsetof( S, x ) );
    read_int_member( file2, &S, offsetof( S, y ) );
     :

YMMV, George


David Brown 12 years ago

The alignment of a double is normally determined by the ABI for the target - gcc follows the standard set down by ARM (as will any other ARM compiler). I am guessing that on chips with floating point hardware, the 8-byte alignment here /does/ matter - and for consistency, it is kept the same across the whole architecture.

Your {int; double; int;} struct would actually be 24 bytes, not 20 bytes - the end of the struct is padded so that you can use it in an array and have the correct 8-byte alignment for all elements.
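The tail-padding arithmetic can be checked with a short C sketch. It assumes a 4-byte int and an ABI that aligns double to 8 bytes inside structs (as the ARM EABI does); the struct and helper names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

struct idi {
    int    a;
    double d;
    int    b;
};

/* Size the struct would have if only the gaps between members were
   added, with no padding after the last member. */
static size_t size_without_tail_padding(void)
{
    return offsetof(struct idi, b) + sizeof(int);
}

/* Round n up to the next multiple of align (align must be non-zero). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0);
    return (n + align - 1) / align * align;
}
```

Under those assumptions the members end at byte 20, and rounding up to the struct's 8-byte alignment gives the 24 bytes reported by sizeof, so that element 1 of an array starts on an 8-byte boundary just like element 0.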

In most cases, optimal code will still be produced even with "packed" if the compiler knows that the alignment is good - my brief testing of a packed "int; double; int" struct was fine. But sometimes it can be messy - you do not want to use "packed" unless you really have to. (That applies to all targets and all compilers, not just arm and gcc.)


David Brown 12 years ago

Any tradeoff has its balance points - when there is such a huge difference in compilation times, you are going to pick the faster tool.

For many programs, a slowdown of a factor of 3 is not going to be a big issue - but for many others, it would be very significant. Different types of program have different balances between issues like run-time speed and ease of development.

The biggest reason against using your compiler for real work is not the speed - it is the features. "SUBC" is such a limited subset of C89 (which itself is 25 years outdated - C90, C99 and C11 have at least a few useful new features) that you cannot call it a "C compiler", as it cannot compile normal C code. You might feel it is a big enough subset to be useful, which is fair enough - and I'm sure it is very useful for educational purposes (I haven't looked at the source, but I believe you when you say it is clear and understandable). But it's not a realistic tool for serious C programming.


David Brown 12 years ago

For small and medium sized projects, compiler speed is rarely a problem - and it is often easily solved by using a faster development PC and using "make -j" for parallel compilation. For more complicated projects, decent makefiles (rather than an IDE's project manager) can make a big difference, and tools like ccache can speed up compilation.

For larger projects, you generally need a mainstream compiler to get the features you need.

I agree on the principle, of course - the speed and convenience of development tools is very important. And sometimes that also means your tools or options vary through the development process - such as using -O1 for fast compilation and -O2 for fast run-time. I just don't agree that using a simple, limited C compiler rather than gcc (or llvm, MSVC for C++, Intel, etc.) is going to be a good choice.

(gcc on -O3 usually produces slower code than -O2, as the larger code makes less use of caches. Only go above -O2 for particular types of code - it is usually best to use -O2 with hand-picked optimisations that match the particular code in use.)

gcc supports binary constants in the form 0b0010, and has done for quite some time - without any sort of "obscure switch". It /is/ an extension, in that it is not valid standard C syntax, but it is very much a "simple enhancement". It is also planned for the C++14 standard, and hopefully will one day make it into the C standards.
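A small sketch of what that looks like in embedded-style code, assuming gcc or clang (the 0b prefix is an extension in older C modes and, as far as I know, part of standard C only since C23). The register layout and names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical control-register layout, for illustration only:
   bits 7..6 mode, bit 3 enable, bits 2..0 prescaler. */
#define CTRL_MODE_MASK  0b11000000u
#define CTRL_ENABLE     0b00001000u
#define CTRL_PRESC_MASK 0b00000111u

/* Build a register value from its fields. */
static uint8_t make_ctrl(unsigned mode, int enable, unsigned presc)
{
    assert(mode <= 3 && presc <= 7);
    return (uint8_t)((mode << 6) | (enable ? CTRL_ENABLE : 0u) | presc);
}
```

Written in binary, the masks show directly which bit positions each field occupies, which is the readability argument made for embedded work.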

gcc has many other enhancements to make life easier for its users. It is not uncommon for them to make their way into newer C and C++ standards - it is easier to get a feature into the standards if you can point to an existing successful and popular feature from a major toolchain.

Of course, there are many other possible enhancements or extensions that gcc does not support. And if you have a lot of such features that you need, there can be some sense in making your own compiler that supports them. Of course, you could also consider adding them to gcc (it's much easier these days, due to the plugin support in gcc) - then you get all the benefits of gcc, and you get your desired enhancements, and you have avoided the work of writing the rest of the compiler.

Exaggeration is the root of all evil!


Nils M Holm 12 years ago

Absolutely. There are certainly areas where you have to squeeze out every cycle you can get.

First of all: I do not intend to propagate SubC for "real" development.

But then I do not agree with your implied definitions of "real work" and "serious programming". What is "real" work? Working on complex problems? Getting paid? Being creative? Having fun?

In the field where I work, algorithms are more important than using the latest compilers. We all know that quicksort with -O0 beats bubblesort with any fancy optimization flags you could throw at it, hands down, given the input set is big enough.

In fact I am using an interpreter most of the time and a slow one at that plus a language that predates C89 by almost 20 years. I do research and I get paid for it. So is this "real" work or not?

I think a narrow definition of "real work" and "serious programming" (in whatever language) marginalizes people who think outside of the box and are not afraid to do their own thing. I think this is a pity, because it propagates uniformity, and uniformity kills creativity (and joy, and many other things that make us human).

Nils M Holm < n m h @ t 3 x . o r g > www.t3x.org


David Brown 12 years ago

OK...

It is true that "ps->i" is not /trivially/ an int in the same way as *pi was, but it is nonetheless an int - that is how structs work in C.

A struct is a collection of members, which can be of (almost) any type. Accessing them, and identifying the types of members, is applied recursively through all layers of struct, union, array, and pointer. It's a little more complicated than a simple type, but it is not fundamentally different.

You don't need anything like that sort of "offsetof" mess. A simple cast is enough "int *pi = (int*) &s.y; *pi = 42;".

The syntax is correct, and the compiler does not warn about it (other compilers might, of course), and the generated code is as expected. But it is still undefined behaviour according to the C standards - and no amount of messing around with offsetof macros will make the slightest difference. Accessing a volatile qualified object through a non-volatile lvalue is undefined behaviour - the standards are very clear on this.

Yes, it /is/ an exercise in fooling the compiler - even if the code author did not know it.

It is legal to construct and use a pointer-to-int from a pointer-to-char like this (especially since a pointer-to-char can alias anything else), as long as you make sure that any alignment requirements are satisfied.

But it is not legal to cast away the volatile qualifier from an object or type (though perfectly legal to add it). It is the same situation as you have with "const" qualifiers.
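For contrast, a minimal sketch of the well-defined direction: keep the qualifier on the pointer, or add it where it was absent. The struct layout and function names are assumptions made for the example, not the code discussed in the thread.

```c
#include <assert.h>

struct regs {
    int          x;
    volatile int y;   /* e.g. a location that hardware may change */
};

static struct regs S;

/* Keeps the qualifier: the store goes through a volatile lvalue. */
static void store_y(int value)
{
    volatile int *py = &S.y;   /* no cast needed, qualifier preserved */
    assert(py != 0);
    *py = value;
}

/* Adding volatile is always allowed, so a plain member can be read
   through a volatile-qualified pointer as well. */
static int load_x_volatile(void)
{
    volatile int *px = &S.x;
    return *px;
}
```

A generic offset-based accessor in the style of read_int_member() could follow the same pattern by taking a `volatile void *` base when it may be pointed at volatile objects.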


BartC 12 years ago

(On x86, -O2 isn't faster, not for my main project anyway. But I'll try it on the ARM board next time I dig it out.)

OK, that's good. Maybe next they will allow separators! (This is more important in binary, because the numbers are longer, and also the digits often represent specific patterns or groups.)

But as I said there are very many language enhancements that are easy to implement even for a static C-class language. At the moment also, I'm looking at generating C source code from my compiler, so I will get the best of both worlds: nice feature-rich non-C syntax, and the speed of gcc -O3!

(Example input: print 11_0001B

output: printf("%d",49);

which doesn't depend on any C extensions.)

Bartc


David Brown 12 years ago

It depends of course on the sort of register+offset modes the cpu in question supports - but it's not uncommon to support a 16-bit signed offset, and there are often a lot of peripherals within such an area.

In fact, gcc has an optimisation option "-fsection-anchors" to treat file-level data as though it were a struct, precisely so that it can use register+offset addressing rather than absolute addressing. And many targets (like the PPC) have a "small data section" to let all small global data be accessed in this way.

So using register+offset instead of absolute addressing can make a significant difference.
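The same effect can be had by hand, which is roughly what "-fsection-anchors" arranges automatically. The sketch below is only an illustration with invented names; whether it saves instructions depends on the target and the compiler flags.

```c
#include <assert.h>
#include <stdint.h>

/* Separate globals: without section anchors, each access may need
   its own absolute address loaded into a register first. */
static uint32_t tick_count;
static uint32_t error_count;
static uint32_t last_error;

static void note_error_separate(uint32_t code)
{
    tick_count++;
    error_count++;
    last_error = code;
}

/* Grouped globals: one base address, members at small fixed offsets,
   so register+offset addressing can cover all three accesses. */
static struct {
    uint32_t tick_count;
    uint32_t error_count;
    uint32_t last_error;
} state;

static void note_error_grouped(uint32_t code)
{
    assert(code != 0);   /* treat 0 as "no error" in this sketch */
    state.tick_count++;
    state.error_count++;
    state.last_error = code;
}
```

Comparing the assembly for the two functions (for example with gcc -S, with and without -fsection-anchors) is an easy way to see what a given target actually does.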

Yes, but optimising code is fun!

Seriously though, I agree with you here. However, I think it is a good thing to understand these issues - it is better to know that you are writing code that is easy to write and maintain but sub-optimal at run-time, rather than guessing about the final result.


David Brown 12 years ago

I have been using inverted commas around "real" precisely because I don't have a good definition - it is a fuzzy concept. Note that I think having fun or learning is important - these are both perfectly good reasons for writing or using a compiler. But I think "real work" is perhaps best thought of in terms of the reliability of the program, the responsibility of the developer, the quality of the coding (its correctness, its clarity, the clarity of its correctness, its maintainability, the value for time, money and effort in development, etc.).

Suppose you are responsible for the software for brake handling in a car (which would definitely count as "real"), and there is a crash due to a software fault. When you are asked in court about the tools you used, and you say "a homemade compiler", you will be facing manslaughter charges for wilful incompetence.

In more realistic scenarios, the limitations of your C subset make the language unsuitable for complex coding. No typedefs, no support for different integer sizes, no floating point support, no complex data structures, no "const", no macros, no conditional compilation, limited function pointers, limited initialisations, and no post-C89 features (inline functions, mixed code and declarations, booleans, // comments, compound literals, designated initialisers, anonymous structs and unions, static assertions, alignment specification). There are other differences between your subc and C89, and between the different C standards - these are just the ones that /I/ would miss in my embedded programming.

That's true. And in that sort of field, it is common to use higher level programming languages so that it is easy to express your algorithms, rather than worrying about the implementation details. People write such code in functional programming languages like Haskell, or mixed-paradigm interpreted languages like Python. And if they care about speed, they use C++ and the STL containers rather than wasting time and effort reinventing wheels. For more mathematical work, Fortran is still popular - and there are always people that use more esoteric languages such as APL. And of course there are times when the best choice is specialist languages like OpenCL or occam, or when you want a language that is part of a bigger tool such as Matlab.

In other words, they pick a language and tools that let them work best at what they want to concentrate on. The best tool for the job depends on many things - ease of expressing the task in hand, ease of development, run-time speed, popularity, experience, etc.

What they do not do is pick a relatively difficult and very low-level language like C, then pick an even lower level subset of it.

Out of curiosity, which languages are these? 20 years before C89 could be Pascal - Fortran was earlier, and Algol 68 is not much used these days. For the interpreted language my guess would be Python, but there are many to choose from.

I did not want to get into a definition of "real work" - because there is no fixed definition. But I want to make the point that while /writing/ and /studying/ a small compiler for a subset of C is an interesting project and has its good points, it is very rare that /using/ it will be the best tool for the job.


David Brown 12 years ago

Your mileage will vary here. Often -O2 or -Os is the fastest, and if you find that -O3 is faster than -O2 then you could probably get even faster by using -O2 with some specific optimisation flags, and also by being careful about the code design (things like making sure that everything that can be "static" /is/ "static", and using attributes like "always_inline" and "flatten"). Of course, whether this is worth the effort is up to you.
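A short sketch of the kind of hand-tuning meant here, assuming gcc (clang accepts the same attribute syntax). The functions are invented for illustration, and whether this beats plain -O3 is something to measure rather than assume.

```c
#include <assert.h>
#include <stdint.h>

/* File-local helper, forced inline even at low optimisation levels. */
static inline __attribute__((always_inline))
uint32_t scale(uint32_t x)
{
    return x * 3u + 1u;
}

/* Another file-local helper; 'static' lets the compiler drop the
   out-of-line copy once every call has been inlined. */
static uint32_t clamp(uint32_t x, uint32_t max)
{
    return x > max ? max : x;
}

/* 'flatten' asks gcc to inline, where possible, every call made
   inside this function. */
static __attribute__((flatten))
uint32_t process(uint32_t x)
{
    assert(x < 0x40000000u);   /* keep x * 3 + 1 from wrapping */
    return clamp(scale(x), 1000u);
}
```

With everything static and inlined, process() typically compiles to a handful of instructions at -O2 with no calls left in it.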

I agree 100%.

C++14 will (according to plan) allow a single quote mark (123'457'134) as a separator within numbers, which is nice. I expect to see that implemented in gcc for C and C++ as an extension long before C++14 is ready.

I'd recommend changing your binary syntax to match gcc and the future C++14 standard - it's consistent with existing 0x for hexadecimal, while a "B" suffix conflicts with the C use of suffixes indicating types. And while I would prefer a "_" separator, as used in Ada, the C++ folks have picked '.

When you are making your own language, it makes a lot of sense to generate C as the output. When you are targeting a specific compiler, there is usually little harm in using some extensions, and they can make a significant difference sometimes.

If I were making my own language, I would write the compiler in Python - it has lots of libraries for parsing, excellent string handling and regular expression support, and easy and fast structures such as dictionaries. I would generate C code as the output, using extensions where they were useful, targeting gcc and llvm.

If the language became successful and popular, the next step would be to change the output format into the internal formats for gcc and/or llvm. This would be harder than generating C, and give both the scope and the necessity of front-end optimisations. But it would still take advantage of the middle-end optimisations and back-end code generation from gcc and/or llvm - thus saving an enormous amount of the work.

The final step would be translating the Python front-end into C++ for conveniently building it along with the gcc and/or llvm binaries.

That process would let me concentrate on the interesting part - writing and using the new language - and skip the more tedious and time-consuming parts, and at the same time give a highly optimised end result.


BartC 12 years ago

Perhaps even the use of C would be a bad choice for such an application. Because it's easier to have coding bugs in such a language. Also, C is now quite complex, and compilers for it you say are also large and complex. That would be one advantage of a streamlined language and a simple compiler (whether it's created at 'home' or in a work environment is not relevant).

I'd miss 'goto'. This is more important from the point of view of using it as a target language (some of the other stuff can be emulated).

Bartc


Nils M Holm 12 years ago

Note, again, that I am not here to defend my compiler or my choice of languages. I just wanted to point out that choosing the latest greatest is not always a rational choice and that it is by no means self-evident that the most modern approach is always the best.

An acquaintance of mine owns a company that manufactures medical devices, heart monitors, defibrillation devices, stuff like that. They are in fact using a _homebrew_, _non-optimizing_ compiler for a _limited subset_ of a simple, _low-level_ language for developing the firmware of such devices, exactly *because* they need to be sure that the software does not endanger the lives of people. They can prove formally that the compiler generates correct output which, I guess, would even be a good defence in court, but it never came to that. Try this with a modern C compiler.

I am not here to defend or endorse my compiler. Not at all. I am talking about a much greater context.

As outlined above, in fact, sometimes "they" do (even if it was not C in that case).

I am writing most of my code in Scheme (1975, IIRC), using an interpreter because it offers faster turn-around.

I agree that there very probably is no need for a compiler like SubC in production. And for my part, this is not about my compiler. I just wanted to point out that there exist cases where you actually do want to use a very simple tool for the task, *especially* in some very critical and "serious" areas.

Nils M Holm < n m h @ t 3 x . o r g > www.t3x.org


BartC 12 years ago

For binary I allow 1101B or 2x1101 formats (the latter form allowing other bases too). But I've now added 0b1101/0B1101, if that's what C will use. (It took a few minutes; another advantage of a DIY language!)

While for separators, I allow _ ' and `. I like to give a choice...
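To give an idea of how little code such literal formats need, here is a sketch in C of a parser for the three binary forms and the separators mentioned above. It is not BartC's implementation, just an illustration; the function name is made up and overflow is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: parse a binary literal written as 1101B,
   2x1101 or 0b1101/0B1101, with _ ' or ` allowed between digits.
   Returns 1 and stores the value on success, 0 on malformed input.
   No overflow check: excess high bits are silently shifted out. */
static int parse_binary(const char *s, unsigned long *out)
{
    size_t len;
    unsigned long value = 0;
    int digits = 0;

    assert(s != NULL && out != NULL);
    len = strlen(s);

    if (len > 2 && s[0] == '0' && (s[1] == 'b' || s[1] == 'B')) {
        s += 2; len -= 2;                  /* 0b1101 prefix form */
    } else if (len > 2 && s[0] == '2' && s[1] == 'x') {
        s += 2; len -= 2;                  /* 2x1101 radix form  */
    } else if (len > 1 && s[len - 1] == 'B') {
        len -= 1;                          /* 1101B suffix form  */
    } else {
        return 0;
    }

    for (size_t i = 0; i < len; i++) {
        char c = s[i];
        if (c == '0' || c == '1') {
            value = (value << 1) | (unsigned long)(c - '0');
            digits++;
        } else if (c != '_' && c != '\'' && c != '`') {
            return 0;                      /* not a digit or separator */
        }
    }
    if (digits == 0)
        return 0;
    *out = value;
    return 1;
}
```

A translator that targets C could then emit the parsed value as a plain decimal constant, as in the earlier `print 11_0001B` to `printf("%d",49);` example, so the generated code needs no compiler extensions.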

(Actually, I've experienced a *lot* of problems in targeting C. Generating assembly or machine code is more straightforward, even if the resulting code is not the best. You don't have an idiosyncratic syntax, type system, and a bunch of highly pedantic and picky C compilers to get in the way. I've had to simplify my syntax to make it possible to express it in high-level-ish C, and a few things can't be expressed at all. However, the portability, and speed of the result makes it worth the effort!)

Bartc


David Brown 12 years ago

That is true - and there are many who prefer Ada for such high reliability applications.

It is /highly/ relevant whether it is created at work or at home - but it is /irrelevant/ whether the compiler is simple or complex. What is important is that you, as the compiler user, have taken appropriate steps to ensure that the tools you use are solid and reliable, and that they appropriately translate the source code into object code. Without that assurance, it doesn't matter how good or bad your source code is, or how complex or simple it is - if you cannot be sure that the compiler translates it correctly, then all your effort in writing good source code is wasted.

So how do you ensure that the compiler works correctly - at least to a high degree? There are industry-standard compiler test and certification suites, such as Plum Hall. There is quality control and good development practices for the toolchain, such as accurate source code control and bug tracking. There is internal testing, including automatic test suites, regression testing, and procedures for building and testing the toolchain. And there is a user base - if large numbers of people use the toolchain without noticing problems, then it is likely that the toolchain is working correctly.

This is totally impossible to achieve with a homemade compiler - you have no choice but to use one of the big, well-known toolchains.

Then there is the choice of language. You are correct that C is often a less than ideal language - but you can do well by using a slightly limited subset (such as avoiding malloc and friends) and strict coding standards, as well as using static error checking tools and perhaps also dynamic checking tools. It is also vital to use an industry-standard language so that any weaknesses or problem areas are well understood.

Again, this is totally impossible to achieve with a homemade language. Your choices are C, C++, and Ada.

Using a custom language and compiler that generated C /might/ be acceptable, if it were possible to do a good analysis of the generated C code (i.e., it should be understandable to C experts) and it were compiled with a suitable C compiler.

Agreed. I very rarely use "goto" in my own code, but it would be likely to be more useful in generated code.


Paul Rubin 12 years ago

Have they published anything about this? What kind of formalization do they use?

formatting link

That's a tempting line of reasoning but (at least unless heavy testing and extensive process is involved) it seems to fall down under real-world examination. See:

formatting link

(the simpler gcc versions from the "good old days" were much buggier than the current versions people complain about more).

formatting link

From p.10 of the pdf:

*Compiler simplicity*. For non-bottleneck applications, compiler optimization adds little end-user value. It would seem possible to take a simple compiler such as TCC [2], which does not optimize across statement boundaries, and validate it through code inspections, heavy use, and other techniques. At present, however, TCC is much buggier than more heavily-used compilers such as GCC and LLVM.


Paul Rubin 12 years ago

See

formatting link

for the disadvantages of using C as an intermediate language.


David Brown 12 years ago

Fair enough.

If you can provide a formal proof of correctness for the compiler tool, then you have a strong case for calling it "safe". But it is certainly no small feat to prove formally that a compiler is correct, even for a simple language.

OK.

(And I am not trying to beat on your compiler or the work you have done on it - I am just discussing why a tool like that would have limited usage.)

The case above was for a very different situation than you had mentioned (as far as I understood it). Different jobs require different tools.

I have never been a fan of Lisp - too many brackets to keep track of! I guess it's a matter of habit.

I can agree with that - and it is certainly common to use subsets of programming languages for safety-critical systems (such as Spark for Ada, or Misra C). But those subsets are chosen on the basis of minimising the risks of errors in programming - not on the basis of reducing the complexity of the compiler. You compile your Spark code with Gnat or GHS Ada, despite the complexity of these tools, because you rely on the experience, maturity and testing of those tools and their developers. You take a tool that is known to be good for large, complicated systems and then make it even more reliable by sticking to small, simple programs.

And my point against homemade tools and languages was because you normally don't get the kind of quality and testing that you get with bigger toolchains. But - as you pointed out with your example - there are exceptions.
