Getting the size of a C function

john · 2010-01-22T22:53:18+00:00

Hi, I need to know the size of a function or module because I need to temporarily relocate the function or module from flash into SRAM to do firmware updates. How can I determine that at runtime? The sizeof(myfunction) generates an error: "size of function unknown". Thanks.

BGB / cr88192 16 years ago

this is a little closer to the second option, of having a secondary image file embedded as data...

this is, assuming the linker or image format actually supports the "separate section" idea...

dunno about ELF, but PE/COFF would not support this, since it would require breaking some of the internal assumptions of the file format (for example, that the image is continuous from ImageBase to ImageBase+ImageSize, ...).

ELF may have similar restrictions (actually, I think most ELF images are position independent anyways, so one could relocate and adjust the GOT for an image easily enough).

(note that embedding an additional PE/COFF or ELF image would not likely be "that difficult", and the formats are not particularly difficult to work with). a fixed-address PE/COFF image is likely an easy case, since one can copy the contents of the sections and then call into it.

for fixed-address, producing a raw binary image (supported by GNU ld, ...) is also probably a good option, since in this case the resulting image can be copied as a raw chunk of data (no need to relocate or worry about file-format), and jumped into.

can't say so much about other file formats though...

James Harris 16 years ago

...

So in Mark's example what will it be in Markend()?

i.e. Moveme()?

James

Mark Borgerson 16 years ago

I think you missed a few points:

Inside Markend, the return address on the stack will be the address after the call to Markend, which was purposely located at the end of MoveMe. The next few instructions after the call to Markend will be the return from MoveMe (an RTS or equivalent with stack cleanup).

Inside Markend, the return address on the stack will be an address near the end of MoveMe. It is that address that you need to save and make available for the computation of the function length.

In assembly, the code in Moveme might look like this:

    0900 MoveMe: sub.l  #8, SP     // make room for 8 bytes of locals
    0904         test.l R14        // check the findend parameter in R14
    0908         bne    lbl1       // if true, just find end of function
    ....         ....              // all the work of MoveMe goes here
    ....                           // and gets executed when findend is zero
    1000         bra    lbl2       // skip the Markend call
    1004 lbl1:   bsr    Markend
    1008 lbl2:   add.l  #8, SP     // clean up 8 bytes of local variables
    1012         rts               // return from MoveMe

When Markend is called at 1004, the address 1008 gets pushed on the stack.

Inside Markend, you could do:

    2040 Markend: Move SP, NearEnd   // NearEnd is a global variable
    2044          RTS

Someplace else, you could do:

    MMLength = NearEnd - (unsigned long)&MoveMe + 4;

When I was teaching introductory M68K assembly language, I used to give exam problems with nested subroutine calls like this---some with pushed local variables, and ask the students to show the contents of the stack at some point in the function. Those questions really separated the As from the Bs and Cs!

NOTE: You have to make sure that your compiler doesn't convert the Markend function to an inline sequence of instructions.
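As a rough sketch of the same idea in C rather than M68K assembly: GCC and Clang provide `__builtin_return_address(0)`, which lets Markend record its own return address directly. That builtin is an assumption about the toolchain and does not come from the posts above. The `noinline` attributes, the `volatile` local, and the helper `approx_length_of_MoveMe` are also illustrative, and the result only means something if the compiler really emits the call near the end of MoveMe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set by Markend(): an address inside MoveMe, just after the call. */
static volatile uintptr_t NearEnd;

/* noinline: if this call were inlined there would be no return address. */
__attribute__((noinline)) static void Markend(void)
{
    /* GCC/Clang builtin: the address this call will return to. */
    NearEnd = (uintptr_t)__builtin_return_address(0);
}

__attribute__((noinline)) int MoveMe(int findend)
{
    volatile int work = 0;      /* stand-in for the function's real work */

    if (!findend) {
        work += 42;
        return work;
    }
    Markend();                  /* hoped to be emitted near the end */
    return 0;                   /* a value set after the call, so the call
                                   is not turned into a tail jump */
}

/* Hypothetical helper: distance from the start of MoveMe to the saved
   return address. */
size_t approx_length_of_MoveMe(void)
{
    MoveMe(1);
    assert(NearEnd != 0);
    return (size_t)(NearEnd - (uintptr_t)&MoveMe);
}
```

Whatever number comes back only covers the code up to the instruction after the Markend call, so the epilogue and any out-of-line blocks still have to be accounted for by reading the generated listing.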

Mark Borgerson

Mark Borgerson 16 years ago

That's a real good point. If the OP's goal was just to move the function code--and not necessarily execute it after movement, he may not care whether the bytes in the .rodata, .data, or .bss segments get moved.

If the function has to be moved and executed, then it had better be able to access the data in the .rodata, .data and .bss segments, or else not use data in any of those segments that are in flash memory.

If you're moving the function to RAM because you can't execute from Flash while updating flash, the function being moved could be written to use only variables and data in RAM. This might be the case if the function being moved is the Flash write routine.

Now that I think about it, I may use this approach in writing a firmware update routine for the MSP430---which has the restrictions mentioned above.

Mark Borgerson

Ben Pfaff 16 years ago

You seem to be assuming that the compiler emits machine code that is in the same order as the corresponding C code, i.e. that the call to Markend() will occur at the end of MoveMe(). This is not a good assumption.

"A lesson for us all: Even in trivia there are traps." --Eric Sosman

Mark Borgerson 16 years ago

Yikes! I'll have to mark myself down 5 points!!!

That should be:

2040 Markend: Move @SP, NearEnd // NearEnd is a global variable

I need to save the data pointed to by the stack pointer, not the contents of the stack pointer itself.

I also realized that, on the MSP430, I don't even need the function call. At the end of the function whose length I want to determine, I simply add the assembly language:

mov PC, NearEnd

Both these methods do require some assembly language and are processor dependent. The compiler that I'm using on the MSP430 (Imagecraft), allows inline assembly, so the instruction above would be

    asm("mov PC, %NearEnd\n");   // the % is used to reference a C variable

I'm reasonably confident that I can use this technique to move a flash-write routine, but I will have to be very careful about using global variables, since the compiler produces PC relative references to global and static variables. Those references will be hosed when the code is moved.

Mark Borgerson

Mark Borgerson 16 years ago

I'll paraphrase the old Reagan maxim: "assume, but verify". I did a test run with an MSP-430 compiler and the call was at the end. For that particular processor, as I later discovered and noted in another post, you don't even need the function call. You can save the contents of the PC at the end of the function with a line of assembly.

This would certainly be a dangerous technique on a processor with multi-threading and possible out-of-order execution. I think it will work OK on the MSP430 that is the CPU where I am working on a flash-burning routine.

Mark Borgerson

bartc 16 years ago

If you're going to add a special parameter (and assume the return type is compatible with a return address), it might be possible to use gcc's feature of obtaining the address of a label.

Then findend can return the address of a label placed near the closing brace of the function (which possibly may be less likely to be rearranged than a function call).

    int MoveMe(...., bool findend){
        if(findend) return (int)&&endoffunction;

        // do all the stuff the function is supposed to do

    endoffunction:
        return 0;
    }
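A slightly fuller sketch of this, assuming GCC or Clang, since the unary `&&` label-address operator is a GNU extension and not standard C. The return type is changed to `uintptr_t` here because an `int` can truncate an address on 64-bit targets, and `approx_length_from_label` is a hypothetical helper. The compiler remains free to place the label somewhere other than the physical end of the function, so the value is a hint to be checked against the listing, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* noinline: keep one out-of-line copy so the label has one address. */
__attribute__((noinline))
uintptr_t MoveMe(int findend)
{
    volatile int work = 0;

    if (findend)
        return (uintptr_t)&&endoffunction;   /* GNU C: address of a label */

    work += 1;                               /* stand-in for the real work */

endoffunction:
    return 0;
}

/* Hypothetical helper: distance from the function's entry to the label. */
size_t approx_length_from_label(void)
{
    uintptr_t end = MoveMe(1);
    assert(end != 0);
    return (size_t)(end - (uintptr_t)&MoveMe);
}
```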

Bartc

Ben Pfaff 16 years ago

Threading and out-of-order execution have little if anything to do with it. The issue is the order of the code emitted by the compiler, not the order of the code's execution.

Ben Pfaff http://benpfaff.org

Mark Borgerson 16 years ago

But wouldn't an optimizing compiler generating code for a complex processor be more likely to optimize in a way that changed the order of operations? I think that might apply particularly to a call to a function that returns no result to be used in a specific place inside the outer function.

Mark Borgerson

Willem 16 years ago

Mark Borgerson wrote:
) In article , snipped-for-privacy@cs.stanford.edu
) says...
)> Mark Borgerson writes:
)>
)> > In article , snipped-for-privacy@cs.stanford.edu
)> > says...
)> >> Mark Borgerson writes:
)> >> You seem to be assuming that the compiler emits machine code that
)> >> is in the same order as the corresponding C code, i.e. that the
)> >> call to Markend() will occur at the end of MoveMe(). This is not
)> >> a good assumption.
)> >
)> > This would certainly be a dangerous technique on a processor
)> > with multi-threading and possible out-of-order execution.
)> > I think it will work OK on the MSP430 that is the CPU where
)> > I am working on a flash-burning routine.
)>
)> Threading and out-of-order execution has little if anything to do
)> with it. The issue is the order of the code emitted by compiler,
)> not the order of the code's execution.
)>
) But woudn't an optimizing compiler generating code for a
) complex processor be more likely to compile optimize in
) a way that changed the order of operations? I think
) that might apply particularly to a call to a function
) that returns no result to be used in a specific
) place inside the outer function.

More specifically, it could generate code like this: (example in pseudocode)

    (begin MoveMe)
        TEST var
        SKIP NEXT on zero
        JUMP Markend
        ...                ; the rest of the code
        RETURN
    (end MoveMe)

SaSW, Willem

Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT

Jon Kirwan 16 years ago

Ben quite correctly brought you up short on the right point. Your example was, just to refresh ourselves:

Let's divert from this for a moment and take the case of a for-loop in c. It looks like:

A compiler will often translate this into this form:

(The reason for the C label is to support the continue- statement and the reason for the D label is to support a break-statement, of course.)

The straight interpretation would have been more like this:

But note that the execution of the for-loop's main body, presumed by the compiler to have "many iterations" as a reasonable guess, includes execution for the "goto A" statement in each and every iteration. But so is, in effect, the conditional test, too. In other words, it takes longer to execute the body, even if that only means the execution of one jump instruction. It's more efficient to redesign the model used by the compiler to the first example I gave, merely because the c compiler takes the position that the added one-time execution of the first "goto A" will be the lower cost approach (which it almost always will be.)
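A hedged sketch of the two loop shapes being compared, written in C with explicit gotos. This is not Jon's original listing: labels A, C and D follow his description, while label B, the function names, and the loop body (sum the even `i`, stop once the sum passes 100) are assumptions chosen so that both `continue` and `break` appear.

```c
#include <assert.h>

/* Reference: the ordinary for-loop. */
int sum_for(int n)
{
    int s = 0;
    for (int i = 0; i < n; i++) {
        if (i % 2) continue;
        if (s > 100) break;
        s += i;
    }
    return s;
}

/* Rotated form: one extra "goto A" on entry, test at the bottom, so each
   iteration costs a single conditional branch. */
int sum_rotated(int n)
{
    int s = 0, i = 0;
    goto A;
B:  if (i % 2) goto C;      /* continue */
    if (s > 100) goto D;    /* break */
    s += i;
C:  i++;
A:  if (i < n) goto B;
D:  return s;
}

/* Straight form: test at the top, "goto A" executed on every iteration. */
int sum_straight(int n)
{
    int s = 0, i = 0;
A:  if (!(i < n)) goto D;
    if (i % 2) goto C;      /* continue */
    if (s > 100) goto D;    /* break */
    s += i;
C:  i++;
    goto A;
D:  return s;
}

/* All three shapes must compute the same result. */
void check_loop_shapes(int n)
{
    assert(sum_for(n) == sum_rotated(n));
    assert(sum_for(n) == sum_straight(n));
}
```

The point of the comparison is only that the emitted order of the blocks differs from the source order while the behavior stays the same.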

Now let's assume that the compiler takes the position that the first case of an if-statement section is the more frequently travelled one. In other words, when the conditional case is executed, it will more often be "true" than "false." The model used might very well then be to convert:

into:

This provides s1-block execution with one less jump and therefore lets it execute slightly faster with the idea that it is the preferred path.
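A hedged sketch of that layout choice, again using gotos in C to stand in for the branches a compiler might emit. The function names and the trivial `s1`/`s2` stand-in bodies are assumptions for illustration, not Jon's original listing.

```c
#include <assert.h>

/* Source form: if (c) s1; else s2; */
int pick_source(int c)
{
    int r;
    if (c) r = 1;   /* s1 */
    else   r = 2;   /* s2 */
    return r;
}

/* Layout favouring s1: the s1 path falls straight through with no taken
   jump, while the s2 path pays for a jump out and a jump back. */
int pick_s1_preferred(int c)
{
    int r;
    if (!c) goto S2;
    r = 1;          /* s1: fall-through path */
JOIN:
    return r;
S2: r = 2;          /* s2: placed out of line */
    goto JOIN;
}

/* Both layouts must agree for either value of the condition. */
void check_layouts(void)
{
    assert(pick_source(0) == pick_s1_preferred(0));
    assert(pick_source(1) == pick_s1_preferred(1));
}
```

Which block ends up out of line is the compiler's decision, so a call placed in the less-favoured block need not sit where the source suggests.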

So let's revisit your example again in this light:

This _may_ be taken by a c compiler to be:

Leaving your function call to Markend not exactly where you'd have liked to see it occur.

An old book you can pick up talking about a method used to _explicitly_ inform the compiler about statistics of branch likelihoods is the Ph.D. thesis by John Ellis, "Bulldog: A Compiler for VLIW Architectures".

Worth a read, some snowy day.

Jon

Mark Borgerson 16 years ago

I've actually seen constructs like that intentionally coded in assembly language, since it saves the address push and pop you would need in a branch to a subroutine. I haven't seen it recently in compiler output, but that may be because I limit optimization to make debugging easier. Since I do limited numbers of systems in a niche market, I save money by spending a few extra dollars on more memory and CPU cycles if it saves me a few hours of debugging time.

In any of these instances, I would certainly review the assembly code to make sure the compiler was doing what I intended in the order I wanted. Maybe programmers in comp.lang.c don't do that as often as programmers in comp.arch.embedded. ;-)

Mark Borgerson

Mark Borgerson 16 years ago

I've also run across main processing loops such as

    void MainLoop(void){
        while(1){
            get user input
            execute commands
        }
        MarkEnd();
    }

where MarkEnd doesn't appear in the generated machine code, because the compiler, even at lowest optimization setting, recognizes that the code after the loop will never get executed.

That could certainly occur. I would be interested in the logic that could come to the conclusion that one or the other of the branches would be more likely to occur. I guess the compiler could check all the calls to MoveMe and compare the number of times the findend parameter was true and false. However that might be pretty difficult if a variable was used.

Still, a good reason, as I've said in other posts, to look at the resulting assembly language. I did it for one MSP430 compiler, and it worked the way I wanted. YMMV.

I wonder how many compilers would make that kind of optimization and under which optimization settings.

Those are pretty rare in Corvallis. Only one or two so far this winter. Now, rainy days----those I get in plentitude!

Mark Borgerson

Jon Kirwan 16 years ago

Yes, of course. That is another possibility. The intended function may be essentially the "main loop" of the code and as such never returns. However, whether or not MarkEnd() were optimized out, it wouldn't ever get executed anyway. So you'd never get the address stuffed into something useful... and so it doesn't even matter were it that the compiler kept the function call. So it makes a good case against your approach for an entirely different reason than optimization itself.

I wasn't suggesting that the optimizer includes a feature where it "tries" to adduce the likelihood. I was suggesting the idea that the compiler writer makes the 'a priori' decision that it is.

Think of it this way. Ignorant of application specific information, the compiler writer has two options to take when considering the if..else case's approach. Regardless of which way the compiler author chooses, one of the two blocks will get a run-time preference. So, does the compiler author _choose_ to prefer the if-case or the else-case? Without knowledge, which way would _you_ decide to weigh in on? Either way you go, you are making a choice. No escaping that fact.

Now, the Bulldog compiler provides a way for the author of the code to supply known information _or_ to use run-time profiling to provide that information, automatically. But I'm not talking about this case. That's for another discussion. I only pointed that out for leisure reading. Not as a point in this specific discussion.

Run-time profiling could provide that information. But that wasn't anything I wanted you worrying over in this talk. It distracts from the central point -- which is that a compiler writer, sans application knowledge and sans anything in the compiler or compiler syntax provided to the application coder to better inform him/her about which way to go, must choose. Either prefer the if-case or prefer the else-case. There is no other option. So which way would you go?

Indeed. I think the point here is that one is left entirely to the vagaries of the compiler author. And on that point, they may decide to go either way. There is NOTHING in the c language itself to help them decide which is better.

I think this question is moot. The point I was making remains even _without_ optimizations that may help inform the compiler about frequency of execution. So there is no need to argue this point.

Hehe. I live near Mt. Hood at an elevation of about 1000' ASL. So I get three feet of snow and ice, from time to time. I've had to use my JD 4320 tractor on more than one occasion! :)

Jon

Mark Borgerson 16 years ago

Well, I would not dream of using this approach on a function that never returns. OTOH a flash-update routine had better return, or it won't be particularly useful (unless your goal is to test the write-endurance of the flash) ;-)

Mark Borgerson

Grant Edwards 16 years ago

Yup, it's pretty much exactly that.

Every C compiler/toolchain I've used for embedded systems development for the past 25 years supported things like that. If his tools don't support multiple sections, then the first order of business is to find a decent toolchain.

ELF supports multiple sections, and I've done exactly such things with ELF-based toolchains (Gnu binutils and GCC) when working on stuff like bootloaders where the memory map changes completely part-way through the program as the memory controller gets configured.
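A minimal sketch of the separate-section approach, assuming GCC with GNU ld (or a compatible linker) on an ELF target. For a section whose name is a valid C identifier, GNU ld defines `__start_<name>` and `__stop_<name>` symbols, which gives the section's size without any `sizeof` on a function. The section name `ramfuncs`, the stub routine, and the helper names are hypothetical; an embedded project would more likely define its own start and end symbols in the linker command file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Put the routine in its own section ("ramfuncs" is a made-up name). */
__attribute__((section("ramfuncs"), noinline, used))
int flash_write_stub(int x)
{
    return x + 1;               /* stand-in for the real flash routine */
}

/* Provided by GNU ld for sections named like C identifiers. */
extern const unsigned char __start_ramfuncs[];
extern const unsigned char __stop_ramfuncs[];

/* Size in bytes of everything placed in the ramfuncs section. */
size_t ramfuncs_size(void)
{
    return (size_t)((uintptr_t)__stop_ramfuncs - (uintptr_t)__start_ramfuncs);
}

/* Copy the section's bytes to dst; returns the byte count, or 0 if the
   destination is too small. */
size_t copy_ramfuncs(unsigned char *dst, size_t cap)
{
    size_t n = ramfuncs_size();
    assert(dst != NULL);
    if (n > cap)
        return 0;
    memcpy(dst, __start_ramfuncs, n);
    return n;
}
```

This sketch only measures and copies the bytes. Actually running the copy needs executable RAM and code that tolerates being moved, which is where the linker command file's separate load and run addresses come in.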

That depends on the compiler options and linker command file. In my experience, "executable" ELF files on embedded systems (images that are ready to load into RAM and run) are generally not relocatable.

The COFF-based toolchains I've used all seem to support multiple sections, but that may have been due to vendor-specific extensions.

Grant

BGB / cr88192 16 years ago

ok.

well, I haven't personally had much experience with embedded systems, so I am not sure here.

yes, I know it has multiple sections, but AFAIK it is generally assumed that the final image is in a continuous region of memory (with the sections generally packed end-to-end), at least in the cases I have seen. granted, in cases I have seen, ELF has usually been x86 and PIC as well (the default build for Linux).

interesting...

well, I have really only seen ELF on Linux on x86, and there it is almost invariably position-independent.

granted, I don't know what other systems do...

COFF has multiple sections, but PE/COFF (in particular) also has ImageBase and ImageSize fields (I forget their exact official names right off), which are located in the optional header, which is mandatory in PE/COFF, and also contains things like the subsystem (Console, GUI, ...), and references to the import and export tables (related to DLL's), ...

AFAIK, PE/COFF also tends to assume that the image is continuous between these addresses, and also that all loadable sections be between them (doing otherwise could break the DLL / EXE loader). however, they may support additional "non-loadable" sections, which AFAIK need not obey this (but are usually ignored by the loader).

granted, to really know, I would have to dig around more closely in the PE/COFF spec (and Microsoft's sometimes confusing writing style, which caused great fun in a few cases when trying to debug my custom EXE/DLL loader...).

however, I can't say much about how much of this is common with other variants of COFF (IOW: the ones which don't necessarily begin with an MS-DOS stub, ...).

nevermind the added layer of hackery needed for .NET ...

I guess it all depends then on whether the particular linker for the particular target supports non-continuous images then, or if alternative means would be needed instead...

or such...

Flash Gordon 16 years ago

Then believe people who have...

It simply isn't true for embedded systems.

Then believe people who do...

Or you could believe people with experience on embedded systems. It is a common requirement to have non-contiguous sections, sections which are loaded into one location but run from another, and all sorts of funky things. Sometimes you can execute code faster from RAM than ROM, so you move the code at run-time. Boot loaders which get code from one place (sometimes on a different processor) and put it in another are common, and I've even had gaps in the ROM where there was RAM! All of which means having separate sections which are not adjacent.

It's all specific to the given tool-chain as to the best way to achieve it though.

Flash Gordon

Flash Gordon 16 years ago

Sometimes you *do* write such functions so they never return. Whilst reprogramming the flash it keeps kicking the watchdog, but it stops when it's finished and the watchdog resets the system, thus booting it into the new code. Or it might branch to the reset (or power-up) vector rather than return. In fact, returning could easily be impossible because the code from which the function was called is no longer there!

Flash Gordon
