gcc .data and .bss address space

The compiler tells the linker to relocate the addresses of other modules, as well as the addresses of data variables. But it does not change the content of such variables, even if they are relocated to another address space. In this case, if the assembler is using the content of a variable that points to a data variable's address, it would remain in the ROM space.

Reply to
Ed Lee

If you define a variable in a section that is bound to a ROM portion of your address space, then the variable *is* in the ROM -- effectively "immutable". (this is useful in preference to #defines)

If it is *really* a "variable", then you have two types to deal with: initialized and uninitialized. (ignoring stack frames)

An uninitialized variable just takes up space in RAM; there is no need to store the "initial value" for that variable. You just need to know where -- in the .bss segment -- the variable is implemented. You can reduce the size of an executable by putting all "uninitialized" data into a single .bss (conventionally) section.

The startup code typically "zeroes" all of the bss segment. Note that it usually does this as efficiently as possible -- bzero() just jams zeroes into a *region* of memory with no concern over the "variable boundaries" within it.

On the other hand, *initialized* data (variables) need to take up space in ROM (for the initial value) as well as RAM (for the *actual* variable which can be ALTERED, at run time).

These (the "live" variables modifiable at run time) reside in the .data segment.

The constant values with which they should be initialized are copied into this segment by the startup code -- before "your" code runs. Again, the startup code doesn't have to respect the individual boundaries of variables; it just has to ensure that, once done, every variable referenced in that section has the correct initial value.

[E.g., I can jam 0x41424344 into a word and this might correspond to four characters of a string ("ABCD"), two shorts (0x4142 and 0x4344), etc. The initialization code will just copy a block of constants into the writable memory set aside for those "initialized data" as efficiently as possible]
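To make the two startup steps concrete, here is a minimal host-side sketch in C. The arrays rom_init_image, ram_data and ram_bss and the function startup_init are hypothetical stand-ins; on a real target the region boundaries would come from linker-script symbols and the routine from the toolchain's startup file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the block of initial values kept in ROM. */
static const uint8_t rom_init_image[8] = { 'A', 'B', 'C', 'D', 123, 0, 0, 0 };

/* Hypothetical stand-ins for the RAM regions behind .data and .bss. */
static uint8_t ram_data[8];
static uint8_t ram_bss[16];

/* Do what startup code typically does before "your" code runs:
   block-copy the initial values into .data, then zero .bss.
   Neither step knows where one variable ends and the next begins. */
static void startup_init(void)
{
    assert(sizeof ram_data == sizeof rom_init_image);
    memcpy(ram_data, rom_init_image, sizeof ram_data);
    memset(ram_bss, 0, sizeof ram_bss);
}
```

Whether the first four bytes are later read as a string, two shorts or one word is decided only by the code that uses them, not by the copy.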

Actual "const" values are stored in a .rodata segment which, ideally, can not be altered (but, that's up to the hardware).

You don't care where *the* constant value is stored that will be used to initialize "foo" in "int foo = 123;". That value will be copied *into* foo before your code runs -- ONCE!

But, you *do* care where "foo" actually resides because your code will reference it -- REPEATEDLY.
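As a rough illustration, the declarations below show where a typical GCC setup would place each kind of object. The names limit, counter, scratch and bump are made up for this sketch, and the exact placement depends on compiler options and the linker script, so treat the comments as the usual case rather than a guarantee.

```c
#include <assert.h>

const int limit = 10;    /* usually .rodata: lives in ROM, never copied        */
int foo = 123;           /* .data: lives in RAM, the 123 is copied in from ROM */
int counter;             /* ends up in .bss: RAM only, zeroed at startup       */
static int scratch[64];  /* .bss as well: costs no space in the ROM image      */

/* Every access to foo and counter here goes to their RAM addresses;
   the ROM copy of 123 is never looked at again after startup. */
int bump(void)
{
    assert(counter < limit);
    counter++;
    scratch[counter] = foo;
    return foo + counter;
}
```

Running objdump or nm on the resulting object file is one way to check which section each symbol actually landed in.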

Using these segments/sections, you can strategically rearrange where your resources are allocated. I recall a legacy compiler that placed a 64KB limit on the amount of data it supported but treated consts as a *separate* 64KB segment. So, I could effectively have 128KB of "data" addressable without exceeding the limitations of the compiler.

Reply to
Don Y

What you wrote is confused. After assembly, the main part of the .o file is preliminary content which will go to the executable. It is preliminary in the sense that there are holes to be filled by the linker. For the linker it does not matter much if a hole is part of an instruction or the content of a "variable" (I wrote variable in quotes because if it goes to ROM it cannot change, but for the linker it does not matter much). Another part of the object file (the relocation table) gives formulas which tell the linker how to compute the values needed to fill the holes. I wrote formulas because there is some calculation, but it is rather simple. Some values may be (absolute) constants defined in other files. Some are of the form "start address of module + offset" (the offset inside the module is known at assembly time, the start address is known only at link time). To make it more concrete, look at part of the disassembly from the example that I provided earlier:

from .o file:

00000200 :
     200: 4a02      ldr   r2, [pc, #8]   ; (20c )
     202: 6813      ldr   r3, [r2, #0]
     204: 3301      adds  r3, #1
     206: 6013      str   r3, [r2, #0]
     208: e7fb      b.n   202
     20a: bf00      nop
     20c: 00000000  andeq r0, r0, r0

You see that the addresses are just offsets from the start of the file; the instruction at offset 200 loads the word at offset 20c. The content of this word is not known at assembly time, so objdump shows it as 0.

Now the same from .elf file:

08000200 :
 8000200: 4a02      ldr   r2, [pc, #8]   ; (800020c )
 8000202: 6813      ldr   r3, [r2, #0]
 8000204: 3301      adds  r3, #1
 8000206: 6013      str   r3, [r2, #0]
 8000208: e7fb      b.n   8000202
 800020a: bf00      nop
 800020c: 20000000  andcs r0, r0, r0

The linker shifted the object file to the start of ROM, so now we have an instruction at absolute address 8000200 which loads a word from address 800020c. The linker knows that this word is the address of variable c, which goes to RAM at 20000000, so the linker changes (fixes) the content of the constant at 800020c to 20000000.
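The fix-up of the word at offset 20c can be mimicked on a host machine with a few lines of C. This is only a simplified sketch: struct reloc, put_le32, get_le32 and apply_reloc are hypothetical names, and real ELF relocation entries carry a type and a symbol index and come in many more kinds than "store symbol address + addend as a 32-bit word".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation entry: one 32-bit hole to fill. */
struct reloc {
    size_t   offset;  /* position of the hole, relative to section start */
    uint32_t addend;  /* offset inside the target module                 */
};

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read a 32-bit little-endian value back. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fill one hole: the value is "address of symbol + addend",
   which is only known once the linker has placed the symbol. */
static void apply_reloc(uint8_t *image, size_t len,
                        struct reloc r, uint32_t sym_addr)
{
    assert(r.offset + 4 <= len);
    put_le32(image + r.offset, sym_addr + r.addend);
}
```

Calling apply_reloc with offset 0x20c, addend 0 and symbol address 0x20000000 on a zero-filled image leaves 0x20000000 at offset 0x20c, which corresponds to the change between the two listings. Note that the function never checks whether the hole sits inside an instruction stream or inside a data word.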

Note that the 411 starts with uninitialized RAM. If you want to initialize variables in RAM, you need to put the initial values in ROM, and the initialization code of your program has to copy the initial values to RAM. If you use a normal embedded toolchain, your toolchain will provide a startup routine which is responsible for initializing variables and a few other things expected by C code. If you want pure assembler, you need to provide your own initialization (my example was done in a way which needs no extra initialization, but it is doing nothing interesting, it just sits in an infinite loop incrementing a variable).

For debugging using gdb you can load data (or the program) to RAM (and the linker supports this), but this depends on the debugging interface. In the context of a classical OS, the operating system loads the program to RAM. The linker does not care much if a section goes to ROM or RAM. The linker simply puts the specified sections in the ELF executable and fills holes according to rules (which may be more complicated than the examples I gave, but not very complicated).

--
                              Waldek Hebisch
Reply to
antispam
