IAR MSP430 compiler problem

- B
- brOS
  

Mon, Nov 23, 2009 8:19 PM

Dear all,

Does anybody know how to force the compiler to use a call instruction instead of br (branch) when generating code for a function call?

It is extremely important to me that a specific function is called using call instead of brunch, which is what the compiler always uses.

Please help.....

--------------------------------------- This message was sent using the comp.arch.embedded web interface on

formatting link

- R
- Rob Gaddi
  

Mon, Nov 23, 2009 8:46 PM

A) Why is it so important to you to use CALL rather than BR? You may be falling into the trap of attacking the wrong problem.

B) I think the world would be a generally happier place if more processors had a dedicated brunch instruction. I figure that properly implemented it ought to take a good hour and a half to return, and then come back with the stack smelling of coffee and bacon.

--
Rob Gaddi, Highland Technology
Email address is currently out of order

- B
- brOS
  

Mon, Nov 23, 2009 9:03 PM

A) This is why... The function I'm calling has a structure like this:

void Spin(void){

}


- B
- brOS
  

Mon, Nov 23, 2009 9:08 PM

This is why I need it... The function I'm calling looks something like this: void Spin(void){ for(;;){} } If it is invoked with a call instruction, the PC is saved on the stack before the function is entered, and it points to the instruction after the call to Spin. I want to use that PC when saving the context, so that when my scheduler schedules that task again it does not continue spinning in that forever loop but jumps to the next instruction after the Spin call.

Branch doesn't push the PC onto the stack, so that's my problem ;)


- N
- Niklas Holsti
  

Mon, Nov 23, 2009 9:38 PM

...

The compiler has deduced that a branch instruction is as good as a call instruction for this/these calls of Spin. There can be two reasons for that:

1. If the compiler has seen the code of Spin (if it is in the same source-code file as the calling function) it may have deduced that Spin never returns, so it does not need the return address that a call instruction would push on the stack. Of course the compiler cannot know that your scheduler breaks C semantics (I assume by interrupting the eternal loop in Spin) and needs the return address.

2. If the call to Spin is the last statement in the calling function (a "tail call"), the compiler understands that the call does not have to push a return address, because Spin will return (assuming it would return) to the end of the calling function, which immediately returns to *its* caller. The branch instruction leaves the calling function's return address on the stack, so when Spin returns (assuming it could return) it will take a short-cut and return to the caller of the calling function. This optimization saves time and stack space.

In case 1, try to put the Spin function in its own source-code file and compile it separately. When the compiler then compiles a call to Spin, it should assume that Spin may return, and therefore needs a return address and a call instruction, not a branch.

In case 2, you could add some statement in the calling function after the call to Spin, that is, make sure that the call to Spin is never a tail call. On the other hand, since a tail call still leaves a valid return address on the stack, your scheduler could use this return address (the return address for the function that calls Spin). Then you don't have to do anything, it should work even with a branch instruction.
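As a rough illustration of case 2, the sketch below contrasts a call in tail position with a call that is followed by more work. The names yield_point, task_tail and task_non_tail and the two counters are made up for this example, and yield_point is a stand-in that returns so that the code can run on a host. Whether a particular compiler really emits a branch for the first function and a call for the second is up to that compiler.

```c
#include <stdint.h>

/* Hypothetical bookkeeping, only there so the example has visible effects. */
static volatile uint16_t yield_count;
static volatile uint16_t after_yield;

/* Stand-in for Spin(); unlike the OP's Spin it returns. */
void yield_point(void)
{
    yield_count++;
}

/* The call is the last statement (a tail call). A compiler may replace
 * the CALL with a branch, leaving only task_tail's own return address
 * on the stack. */
void task_tail(void)
{
    yield_point();
}

/* The call is followed by more work, so the callee must come back here
 * and the compiler has to provide a return address for this call. */
void task_non_tail(void)
{
    yield_point();
    after_yield++;
}
```

Looking at the generated assembly listing for both functions is the only way to see what a given compiler version actually did.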

Another possibility is to avoid the "High" optimization level of the compiler. I did not find a specific explanation of the tail-call-to-branch optimization in my copy of the compiler manual, but the "High" level seems to have most of the inter-procedural optimizations, of which this may be one. Try the "Medium" level for the compilation of the calling functions, and hope that the compiler does not do tail-call optimization at this level.

HTH,

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

- D
- D Yuniskis
  

Tue, Nov 24, 2009 5:33 AM

[return address]

volatile void Spin(void) {} ?

- 4
- 42Bastian Schick
  

Tue, Nov 24, 2009 6:58 AM

Why don't you call a context_switch() function ?

--
42Bastian
Do not email to bastian42@yahoo.com, it's a spam-only account :-)
Use @monlynx.de instead !

- D
- David Brown
  

Tue, Nov 24, 2009 9:02 AM

Your diagnosis of the problem is fair enough, but your workarounds are, IMHO, totally wrong. Anything that involves trying to trick or cripple the compiler (separate compiled files, disabling optimisations, fake extra inline assembly, gratuitous function pointer usage, etc.) is at best an ugly hack, and at worst a maintenance nightmare. Remember, the compiler is free to work around all these workarounds - lying to your tools is a bad idea.

The function is called by branch, not call, because it never returns. That's what you (OP) wrote in the source code, so that's what the compiler does. If you want the function to return, you have to write code that allows the function to return. In particular, you need to have some way of exiting the spin, otherwise it is useless. Thus you should write your spin function so that it exits when that condition is satisfied. For example,

void Spin(volatile uint8_t *pBlockedFlag) { while (!(*pBlockedFlag)) ; }

If you can't see why you need something along these lines, you'll have to think a bit harder about how you want your code to work. But telling the compiler you want a tight infinite loop, and then trying to find some way to break out of it, is definitely not the answer.
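A minimal sketch of how such a flag-based spin might be wired to an interrupt is given below. The names tick_flag, tick_isr and spin_until_set are hypothetical, the handler is written as an ordinary function so the code compiles on a host (on the target it would need the compiler's interrupt syntax), and clearing the flag on exit is an assumption of this example rather than something stated in the thread.

```c
#include <stdint.h>

/* Hypothetical flag shared between the handler and the waiting code.
 * volatile forces the loop to re-read it on every iteration. */
static volatile uint8_t tick_flag;

/* Stand-in for the timer interrupt handler. */
void tick_isr(void)
{
    tick_flag = 1;
}

/* Wait until the flag is set, then consume it so that the next call
 * waits for the next tick (an assumption of this sketch). */
void spin_until_set(volatile uint8_t *pFlag)
{
    while (!(*pFlag)) {
        /* busy-wait; the loop has an exit condition */
    }
    *pFlag = 0;
}
```

Because the loop has an exit condition, the function returns in the ordinary way and the compiler has no reason to treat it as non-returning.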

- N
- Niklas Holsti
  

Tue, Nov 24, 2009 12:32 PM

Could you graciously use a few more precious key-strokes to explain what you mean by that cryptic comment?

The IAR MSP430 C compiler reference guide explains "volatile" for objects only, it does not give "volatile" any meaning for functions. Interestingly, it accepts the above "volatile" function declaration without complaint. What do you suppose "volatile" should do, here?

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

- N
- Niklas Holsti
  

Tue, Nov 24, 2009 8:46 PM

I experimented a bit with the IAR MSP430 compiler (current "kickstart" version), and it uses call instructions to call a non-returning function containing only an eternal for-loop, even if the function is presented in the same source-code file as the call. If the function is marked with the __noreturn keyword the compiler will use a branch or jump instruction, though. (I assume that the OP has not marked Spin with __noreturn.)

So it seems my suggested reason 1 is not the true explanation.

In my small experiments, the IAR compiler does code a tail call to Spin using a branch or jump instruction, instead of a call. So reason 2 is a possible explanation for the OP's observation. Interestingly, this happens even if the optimization level is set to "None", so my earlier advice to avoid the "High" optimization level does not work.

David Brown wrote:

In general I agree with you, David, but the OP is trying to run C code under a custom scheduler, apparently in some kind of simple multi-threading or coroutine style. This is out of scope for the C language, so the operation of the scheduler will involve some things that the compiler does not know about -- and should not (have to) know about. The scheduler/kernel routines should follow the C compiler's calling protocols, but will themselves do things that exceed C's semantics.

Of course, the person writing the scheduler should know all about the C compiler's calling protocols and run-time system so that the scheduler can save and restore thread contexts properly.

The Spin function seems intended to be part of the application/scheduler interface; an application task calls it when it has finished its job and yields to the scheduler. Writing this "yield" routine as an eternal loop is unusual, but can be OK for a custom kernel. In a more conventional kernel, the application would call a kernel "yield" or "suspend_me" function, the kernel would check if some other thread is ready to run, and if not the kernel would stick in a loop, or schedule a looping "null thread" that is always ready to run.

That could be a reason, but I now doubt it for the IAR compiler -- see my note on experiments above. The tail-call explanation is the more likely one.

As I understand it, the OP's scheduler (most likely running in an interrupt handler) will break out of the "eternal" loop by popping the return address from the stack into the PC, forcing a return from Spin. This is legal MSP430 code, but out of C semantics.

Making Spin test a flag that the scheduler sets is a solution, but a different solution.

It could be safer to write Spin in assembly language, to prevent the C compiler gaining any false knowledge about its behaviour, such as "does not return" knowledge. But if the OP knows that the C compiler does not transport such knowledge across compilation units, writing Spin in C (for separate compilation) is safe. Of course this has to be rechecked for each new version of the compiler, so it is indeed a maintenance burden, over and above the burden of checking for changes in the calling protocols and run-time system structure, which a scheduler author has to do for every compiler version anyway.

Summary: Tail call optimization is the likely cause of the compiler using a branch instead of a call instruction. So:

- If the scheduler needs the return address (on the stack) only for resuming execution at the code following the call to Spin, there is no problem; the branch instruction leaves the return address of the calling function on the stack, and the scheduler can resume execution at this address.

- If the scheduler needs the return address to mark the location of the call to Spin (but why?), there is a problem if the call happens through a branch instruction, since the stacked return address then marks the location of the call to the function that calls Spin (or even the call to some even higher-level function, if there is more than one tail call at the end of the call path). In this case, and as there seems to be no way to disable the tail-call optimization in the IAR compiler, the only option is to make sure that no call to Spin is a tail-call. Or use some other kind of Spin, for example following David's suggestions.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

- D
- David Brown
  

Wed, Nov 25, 2009 8:50 AM

It's always fun to test and compare compilers. The stable version of gcc for the msp430 is an older version - 3.2.3 (with 4.x under development). It always "calls" the function even when it knows it is non-returning, and there is a "ret" after the call (and after the infinite loop). Newer gcc versions give tighter code (testing with avr-gcc 4.3.2) - a function calling Spin() inlines the infinite loop into the caller. There are no jumps, calls, or returns.

The point here is that such details vary from compiler to compiler, and from version to version. The compiler will do exactly what you tell it, but you can't rely on it using a particular method to implement a particular construct.

Optimisation levels are never more than a hint to the compiler. You are just making a suggestion as to how it should balance compile time, ease of debugging, and size and speed of the generated code. Optimisation flags are never demands, and the compiler is free to apply all its optimisations at any level (though obviously it is more user-friendly to have some correlation). Code that is dependent on the optimisation level for correctness is broken code. (Obviously it can be dependent on the optimisation level for size and speed requirements.)

No, the scheduler/kernel should /not/ rely on the compiler's calling protocols. The compiler can change these as it wants, and mix them for different functions. If the scheduler depends on the compiler using particular instructions to call a function, the scheduler is broken - a pre-emptive scheduler can assume /nothing/ about the code it is pre-empting.

If you have a scheduler that for some reason needs a way to get a function's return address, then it needs to use a compiler-specific feature such as gcc's "__builtin_return_address()" function. If the compiler doesn't have such a feature, then you are out of luck. Get a different compiler, or write a scheduler that doesn't depend on knowing the return address.
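For gcc specifically (clang accepts it too), a minimal sketch of that builtin might look like this. The wrapper name caller_return_address is made up, and the noinline attribute is an assumption added so that the function keeps a return address of its own; nothing here applies to compilers that lack the builtin.

```c
#include <stddef.h>

/* Returns the address this function will return to, i.e. the
 * instruction following the call site. Level 0 means "the current
 * function". If the function were inlined, level 0 would refer to the
 * enclosing function instead, hence the noinline attribute. */
__attribute__((noinline)) void *caller_return_address(void)
{
    return __builtin_return_address(0);
}
```

The value is only meaningful for the call that produced it, so it is a debugging or bookkeeping aid more than a general way for a scheduler to find resumption points.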

Under no circumstances is it correct to tell the compiler you have an infinite loop, and then complain because you can't see how to break out of it.

It is not "unusual", it is "wrong".

There is no point in trying to help the OP find some workaround to get this system to compile - he must fix the code.

Exactly. When a task has finished, control must be returned to the scheduler, either by calling a "yield" function, or by returning to its caller (the kernel). You could, I suppose, end a task in an infinite loop and rely on the pre-empter to make sure other tasks get processor time. But you certainly wouldn't expect that thread to ever leave the infinite loop - that's why it's called an "infinite loop".

If the OP wants to write such brain-dead code in some sort of non-C, that's up to him - but he should not expect to use a C compiler to achieve it.

That's /almost/ correct. Making Spin test a flag /is/ a solution. But it's not a "different solution", because he doesn't have a solution at the moment - his scheduler concept /cannot/ be made to work the way he thinks.

An infinite loop is a dead end to the thread that hits it - no exits, no escapes, no returns. It's dead. The end.

Rather than trying to play Dr. Frankenstein, the OP should re-think the way his scheduler should work, and what Spin() should actually do. In particular, if he wants the function to be able to return, he must give it a way to return.

Rubbish. Fake assembly to lie to the compiler is not the answer.

Dangerous rubbish. Code that relies on separate compilation is as broken as code that relies on hobbling the optimiser. You don't have that choice - the compiler can transport anything it wants across compilation units, and you can't choose to stop that.

Real summary:

The original idea is /wrong/. An infinite loop has no exit and no return. If the function Spin() needs to exit, it should have an exit. Write code that says what you want it to do, don't write something totally different and rely on layers of workarounds, compiler-specific hacks, assembly tricks and other nonsense.

- N
- Niklas Holsti
  

Wed, Nov 25, 2009 10:22 AM

[ Quotations edited severely but hopefully without misattribution.]

David Brown wrote:

I didn't say "rely"-- I said "follow". If the application calls a kernel routine, it will use the compiler's calling protocols that, for example, say which registers must be preserved, and which can be overwritten. The kernel routine should follow these rules, but is certainly allowed to change the values of the overwritable registers, for example. (Note, I am not talking about *pre-emption* here, nor was the OP, I believe.)

The question here is not really about particular instructions, but about the state in which the Spin routine is entered, specifically whether there is a usable return address on the stack. The presence of a return address on the stack must be defined in the compiler's calling protocol if the compiler is meant to be able to interface to assembly-language routines or generally "foreign" routines.

In principle true -- for a preemptive scheduler. (The OP is most likely not making a pre-emptive scheduler, however.) But in practice a pre-emptive scheduler must sometimes know about the run-time architecture of the pre-empted software. For example, some small systems use statically allocated memory for thread-specific data, such as additional working "registers" for floating-point libraries. A pre-emptive kernel has to know about such things in order to save and restore context. The alternative is to disable preemption while a thread uses such software-defined shared resources; the choice is a trade-off between latency and context-switching overhead.

But that is veering off-topic, I think.

Not very helpful to the OP. But "tough love", perhaps :-)

Who was complaining? The OP seems to know perfectly well how to break out of this loop by changing the PC in the scheduler (when the looping code is interrupted).

Eh? The system compiles. And can work, if the compiler's use of a branch instruction instead of a call instruction is only due to tail-call optimization, and there is always a return address on the stack.

It is not uncommon for kernels to (internally) use an eternal loop ("lab: jump lab") to wait for the next interrupt that creates some work to do, as the OP does in Spin. Yes, the loop is syntactically eternal/infinite, but in the presence of interrupts it can be terminated.

The OP is combining C semantics -- the loop is eternal -- with interrupt semantics -- the loop can be broken. This approach is normal for writing kernels and schedulers, but of course has its pitfalls.

I agree that it would be cleaner to write the non-C-semantics code, such as Spin, in assembly language.

Sure it can -- that is, an interrupt handler can break the Spin loop and resume execution at the point after the Spin call, as long as there is a return address.

There is nothing "fake" about this. A kernel/scheduler (especially if pre-emptive) has to go beyond C semantics. Using assembly language is the normal way to do this. And using the return address is the normal way for a kernel to save the PC of a thread, when a kernel routine suspends the thread.

The loop in Spin can be terminated by a scheduler using PC manipulations, as in a typical scheduler. Nothing wrong about that, although it is risky to write it in C, for the reason that we agree on: the C compiler will only see the C semantics, and may use them in ways that cause problems for this idea.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

- D
- David Brown
  

Wed, Nov 25, 2009 1:44 PM

The OP has given us very little information to go on - a lot of what we both are writing about is speculation (and I am just as likely to guess incorrectly as you). However, since an infinite loop can clearly never be broken without pre-emption, I am assuming he /does/ want pre-emption.

Certainly the kernel should follow the compiler's conventions for function calling - it should, as far as practically possible, be written in C, and thus calling conventions follow automatically. I misinterpreted your post - I thought you meant the kernel could assume that the code it is scheduling always follows the compiler's conventions.

That is only true at the points at which it actually /is/ interfaced to "foreign" code. When a C function calls another C function, the compiler can use or abuse whatever calling convention it likes at the time. Good compilers can and will do all sorts of re-arrangements to get better code, including inlining code bodies, changing register usage, or using a "branch" instead of a "call" when the called function cannot return. Nothing you can do with compiler flags, separate compilation, or other tricks can change that in a reliable way.

The code is just as broken for a co-operative scheduler. As you have said yourself, when a task wants to release the processor it should call the kernel scheduler.

I don't know how much you have worked on schedulers, but I get the impression you know what you are doing and could write one perfectly well. You would solve the same sorts of problems in a similar way to the way I or most other scheduler writers would. So I don't really want to sound like I am trying to teach you something you already know about.

But I just cannot comprehend why you are defending the OP's bad design, and trying to find ways to jam that square peg into a round hole.

You know as well as I do that writing a tight infinite loop, and then trying to find some way to go around the compiler to break out of the loop, is bad design from step 1. Everything else in this thread is of minor relevance (though interesting).

That is true enough. In such a situation, the OS must know whether these additional "registers" (or for some devices, they are real registers) must be preserved and restored. In "normal" embedded code the same situation turns up with interrupts. For example, when using the embedded multiplier on the msp430 you must disable interrupts or be sure that the interrupt routines don't use the multiplier.

Only a little :-)

That is, IMHO, what the OP needs here. Any advice he gets that helps him continue down his original path is false help.

He is complaining because although he knows he has to change the PC, he doesn't know what new value to use.

A system can work (assuming for a moment that it can be made to work), and yet still be so badly designed and fragile that it is "broken".

It is certainly possible to have such an infinite loop in the kernel - but only as an idle function for when the processor is doing nothing. The thread is never expected to continue beyond the loop, or return from it in any way.

It is perfectly common and reasonable to have almost-infinite loops. An obvious example is a real spin lock, as implemented in real working schedulers - you have a tight loop that checks for an external event (such as a flag set within an interrupt routine or another task), and exits the loop when the flag is set. But the critical point here is that the loop has an exit clause. If you want to write a loop that will be exited, you write a loop with an exit clause.

I am /not/ saying this sort of code should be written in assembly - I am saying it should not be written at all! It can never be "clean" code. But if it is written in assembly, then at least you are giving the tools no useful information, instead of directly lying to them.

How is this in any way "better" than having Spin loop until a flag is set, and have the interrupt handler set that flag? Doing it the right way is entirely standard C, is far easier, far safer, far more portable, far more maintainable, and is smaller and faster than any sort of hack you might conceivably get working.

Using assembly language where assembly language is needed is absolutely fine - and a pre-emptive scheduler is always going to need some assembly language. But using assembly language to try to force the compiler not to optimise some code is almost always bad design.

And the scheduler gets the PC of a thread by looking at the return address for the interrupt routine, not by trying to dig down the stack and guess the return address for the current function in the interrupted thread.

One thing we haven't really discussed here is how the interrupt routine / scheduler knows that the thread is in the Spin function. Is it going to take the real thread PC (from the interrupt routine's return stack) and compare it to the address of the Spin function to determine if the thread is currently at the "lab: jump lab" instruction? If it is there, then it will look deeper in the stack for the previous return address, and return to that point. If not, then the thread is somewhere else and the interrupt routine (or the scheduler) must return there.

While such a scheme may theoretically be made to work, it is needlessly complicated, very fragile and highly dependent on getting the code compiled in exactly the right way, and hopelessly restrictive and inflexible.

Maybe there is something here that I'm missing - perhaps the OP will come back to us with some more information.

- N
- Niklas Holsti
  

Wed, Nov 25, 2009 9:45 PM

Yes.

I wrote my answer assuming that the OP knows what he or she is doing but was concerned that the branch instruction might not leave a good return address on the stack.

I would not call it pre-emption, but interruption. To me, pre-emption means suspending a task at some arbitrary point in its execution and switching control to another task. In the OP's code, the Spin function seems to be the expected place for suspending and resuming the task, so the task is prepared for it, at that point. This looks like co-operative multi-tasking.

My guess about the OP's design was that the Spin function would be used for consuming the rest of a thread's time-slice when the thread has finished its current job, and that the OP would not try to schedule another ready thread to use this (slack) time, perhaps in order to have deterministic time-triggered behaviour, or perhaps to avoid pre-emptions.

I agree completely that a pre-emptive kernel cannot assume that. (Well, there may be *some* conventions that always hold, for example relating to the stack pointer. But all conventions known to hold at a "foreign" call are generally not true at arbitrary points.)

Agreed. For most embedded compilers, though, anything in a separate compilation is considered "foreign". But as you say, there is no guarantee in general.

In my guess as to what the OP is doing, the call to Spin *is* this call, which would make the OP's kernel a rather special one. On the other hand, perhaps I mis-guessed, and the call to Spin happens *within* the OP's kernel, after the kernel has done the more normal things such as looking for other ready tasks.

Thanks. I've written a couple of simple, co-operative ones, a while ago, for obsolete processors, and studied a few other, current ones from the point of view of static WCET analysis.

I'm not so ready to call this "bad design" without knowing more about the OP's requirements and design. The code generated for Spin is exactly the kind of tight eternal loop that you often find in a kernel where the kernel has no ready tasks and waits for an interrupt. I haven't tried it, but it seems to me that writing this loop as a conditional, flag-checking one could increase (by a little) the latency for resuming the right task when an interrupt happens, compared to resuming the task directly from the interrupt handler and simply abandoning the tight loop.

It may be bad practice to rely

>>> Under no circumstances is it correct to tell the compiler you have an

Niklas Holsti replied:

David Brown replied:

Because the OP thought that a branch instruction would not leave a return address on the stack. But if the branch instruction implements a tail call, it does leave a return address (although for an outer call).

See my comment on latency, above. But of course this is again a guess as to why the OP is doing it this way.

Writing a function in assembly language (and not, of course, as "in-line assembly code" in a C file) is a pretty sure way of making the C compiler treat it as a "foreign" function and so ensure that calls use the standard conventions, including pushing a return address.

Or the return address of the call from the thread to the kernel function, which is the case for Spin (I guess).

That is (also) my guess of what the OP is trying to do.

Maybe not. In my guess of the OP's design, if the thread is not in Spin when the interrupt happens, the thread has exceeded its time-slice. I don't of course know what the OP intends the kernel/scheduler to do, in that case; perhaps log a fatal error and reboot. Another choice is to set an error flag and let the thread continue until the next tick, when it is checked again.

That would be good.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

- L
- larwe
  

Wed, Nov 25, 2009 10:54 PM

*PROPERLY* implemented it should divert to the nearest pub and not return until the keg is dry.

- D
- David Brown
  

Wed, Nov 25, 2009 11:26 PM

I don't think "the OP knows what he or she is doing" is a fair assumption, based on the posted code for Spin() !

It's possible (or maybe even likely) that the OP is /trying/ to implement a co-operative scheduler. But it doesn't actually co-operate - an eternal loop is not co-operative, even if you cheat and break out using interrupts. Interrupts are inherently asynchronous - if the thread can be suspended by an interrupt function, that is pre-emptive multitasking.

That could well be the intention. But spinning like that is a silly idea, and even if he wants to do what you suggest here, the implementation is totally wrong. The interrupt should set a flag, and the spin lock should block waiting for the flag.

These days, full program optimisation is not uncommon. Even gcc (despite its critics' opinions) can do reasonable full program optimisation by compiling all the C modules in one shot.

I think most of our apparent disagreements have the basis in different guesses as to what we think the OP is trying to do.

Hopefully the OP is still reading the thread, and will take some inspiration from our discussion!

Nah, the loop overhead to continually read a flag would be a few cycles at most. The interrupt function overhead to figure out return addresses from the stack will be much, much worse.

When I see someone write one thing, and mean another, I see a mistake. When the author knows what he has written and wants to find some way to work around this difference rather than correcting the code, I see a bad design. Maybe I'm just less tolerant than you.

That is true, but my point is that you should not use assembly like this just to "get around" the compiler - not without very good reasons. I've often seen people use assembly code to try to force the compiler to act in some way, when they could have done much better while staying within C.

Your guesses as to the OP's ideas make a certain sense - perhaps he is trying to implement a sort of fixed time-slice scheduler. The implementation of Spin() is still wrong (you'll never convince me otherwise!), but that might bring us a little closer to helping him get a working implementation.

- N
- Niklas Holsti
  

Thu, Nov 26, 2009 11:59 AM

Well, what constitutes "co-operation" may be a matter of precise definition (in real life, sometimes of litigation :-). In my guess of the OP's kernel/scheduler design, the suspension is designed to happen only when the thread is looping in the Spin function. By calling Spin the thread shows that it is ready to be suspended, so it is co-operating in my view. (As discussed earlier, we don't know what happens if the scheduler interrupt finds the thread is *not* in Spin.)

Sure, but in that case it would not be "separate compilation".

Interesting question, though: Is there a standard way in a C environment to ensure that the standard calling sequence is used for an extern function, with no C-calling-C optimizations?

Let's consider what the kernel has to do, in my guess of the OP's design, considering the two cases of (A) an unconditional "eternal" loop and (B) a flag-checking loop.

The kernel knows which thread is running.

When the thread finishes its job in this time-slice, it calls Spin, expecting to be resumed at the next instruction after the Spin call, say instruction R.

The Spin function loops, eating up the rest of the time-slice.

The tick interrupt comes in.

The tick interrupt handler saves the context of the interrupted thread. By comparing its PC to the address of the Spin loop, it can check that the thread has not overrun its time-slice. At this point:

- For (A) the handler gets the return address of Spin by a POP and stores this return address as the resumption point for the thread to be suspended.

- For (B) the handler stores the interrupted PC (in the flag-checking loop) as the resumption point.

The interrupt handler (scheduler) finds the thread to run in the next time-slice. In case (B) it then sets the (thread-specific) flag on which Spin is waiting. In case (A) it does not need to set any flag.

The handler restores the context of the new thread. As the last step in this, it pushes the resumption address and the restored status register and does return-from-interrupt (RETI).

In case (A), the thread is resumed immediately at the desired instruction, the instruction R that follows the Spin call.

In case (B), the thread is resumed in the middle of the flag-checking loop. It still has to read the flag, branch out of the loop, and execute a return instruction (effectively a POP from the stack), before instruction R is reached.

In summary, case (A) and case (B) both have to POP the stack to get to instruction R, but case (B) also has to set a flag and check a flag. It is a close call, but you might save some cycles in case (A). Moreover, in case (B) the flag has to be thread-specific, so it has to be passed to Spin as a parameter, consuming more cycles.
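As a rough illustration of the two cases, here is a host-side model of how the tick handler would pick the resumption point. It is only a sketch of my guess at the design: the `thread_ctx` layout, the `SPIN_LOOP_*` addresses and the function names are all made up for the example, and nothing here is MSP430 or IAR code.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model only: layout and addresses are hypothetical. */
#define STACK_WORDS 8

typedef struct {
    uint16_t pc;                  /* PC at the moment of the tick interrupt */
    uint16_t stack[STACK_WORDS];  /* thread stack; stack[sp] is the top word */
    unsigned sp;                  /* index of the top word (stack grows down) */
} thread_ctx;

/* Assumed address range of the loop inside Spin. */
enum { SPIN_LOOP_START = 0x4400, SPIN_LOOP_END = 0x4404 };

/* Did the tick arrive while the thread was looping in Spin? */
static int interrupted_in_spin(const thread_ctx *t)
{
    return t->pc >= SPIN_LOOP_START && t->pc < SPIN_LOOP_END;
}

/* Case (A): pop the return address that CALL pushed and resume there. */
static uint16_t resume_point_case_a(thread_ctx *t)
{
    assert(t->sp < STACK_WORDS);  /* there must be a word to pop */
    return t->stack[t->sp++];
}

/* Case (B): resume inside the flag-checking loop itself. */
static uint16_t resume_point_case_b(const thread_ctx *t)
{
    return t->pc;
}
```

Case (A) only works if Spin was really entered with CALL. If the compiler used BR, the word on top of the stack is not the address of instruction R, which is exactly the OP's problem.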

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

- N
- Niklas Holsti
  
posted
14 years ago

Thu, Nov 26, 2009 12:07 PM

Plus the flag has to be cleared at some point (being careful to avoid race conditions).

... except if the flag is in a register that is cleared in Spin before the loop, but set by the scheduler in the context that is restored (except for this flag) when the Spin loop is resumed.
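A minimal sketch of the "clear before the loop" idea, with some loud assumptions: standard C cannot pin a variable to a register, so the flag here is an ordinary `volatile` variable; `simulated_tick()` and `ticks_until_resume` are stand-ins I invented so the loop terminates on a host; and Spin returns a count only so the behaviour can be observed.

```c
#include <assert.h>

/* Set by the scheduler when this thread may leave Spin (case B). */
static volatile int resume_flag;

/* Hypothetical stand-in for the tick interrupt; not part of any real design. */
static int ticks_until_resume = 3;

static void simulated_tick(void)
{
    if (--ticks_until_resume == 0)
        resume_flag = 1;       /* the scheduler resumes this thread */
}

/* Returns the number of loop iterations spent waiting (for illustration). */
static int Spin(void)
{
    int spins = 0;

    resume_flag = 0;           /* clear before the loop, never after it */
    while (!resume_flag) {
        simulated_tick();      /* on the target this call would not exist */
        spins++;
    }
    return spins;
}
```

Because the flag is cleared only on entry to Spin and set only by the scheduler, nothing has to clear it after the loop, which is where a race would otherwise creep in.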

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

- N
- Niklas Holsti
  
posted
14 years ago

Thu, Nov 26, 2009 2:34 PM

In fact the flag can be global, not thread-specific, since only one thread is resumed at a time. But since this thread *is* resumed, it is certain to find the flag set, which goes to show that the flag is redundant in this design, and the flagless unconditional loop in case (A) makes more sense.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .

- D
- David Brown
  
posted
14 years ago

Thu, Nov 26, 2009 3:18 PM

Fair enough.

What about gcc 4.5 with -flto? Then you can compile C modules separately into object files, but the object files hold a copy of the internal trees as well as generated object code. When you link these object files, the trees are used for link-time optimisation, including inlining across modules. You lose all clarity in the definitions of "compile", "link", and "separate compilation". But that is a digression, especially since the msp430 gcc port is not (yet) updated to gcc 4.5, which is itself not yet released.

I think the only way is by being sure that the compiler can't access the code for a function declared as "extern". It should not be hard to do, but you may have to do it explicitly. For example, if you use a compiler's IDE and project manager, you might have to go out of your way to force true separate compilation.

Note that a sensible Spin function would tell the kernel that it is finished and entering the spin loop, rather than leaving the interrupt handler to figure it out in this fragile way.
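Something along these lines, as a sketch only: the names `thread_state`, `announce_done` and `tick_detects_overrun` are invented for the example, and the spin loop itself is left out so the code returns when run on a host.

```c
#include <assert.h>

typedef enum { THREAD_RUNNING, THREAD_DONE } thread_state;

/* State of the thread that owns the current time-slice (hypothetical). */
static volatile thread_state current_state = THREAD_RUNNING;

/* What Spin would do just before entering its loop. */
static void announce_done(void)
{
    current_state = THREAD_DONE;
}

/* What the tick handler would check instead of comparing the PC with the
 * address of the Spin loop. Returns 1 if the thread overran its slice. */
static int tick_detects_overrun(void)
{
    int overrun = (current_state != THREAD_DONE);

    current_state = THREAD_RUNNING;   /* arm the check for the next slice */
    return overrun;
}
```

The handler then needs no knowledge of where Spin lives in memory or how the compiler laid it out.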

Assuming, of course, that you've figured out a way to do that safely and reliably....

This bit will typically require some assembly, compiler-specific features, or some knowledge of the way the compiler generates interrupt routines. But that's unavoidable when you have an interrupt-based scheduler.

Fair enough, although setting a flag is not exactly a hard job, and can be done within standard C.

OK.

Again assuming that it is possible to figure out the address of R in a reliable way...

Yes, you can expect it to take about 3 or 4 instructions before getting to R. That would still be a lot less time than you spend messing around getting the address of R in case (A), so case (B) wins here in time.

Remember that all ideas about how case (A) could feasibly be implemented are based on hobbling the compiler. Write the code correctly (case B), and you can let the optimiser do its job - that will overwhelm any conceivable time advantage case A might have had. Among other things, Spin() could be inlined in its calling function and remove most of the overhead.

Even making the great leap of faith that there is a reliable way to get the desired return address, and then making a second leap of faith that case A is faster, the concept is /still/ wrong. There is no way that shaving a few cycles off the latency could justify using this horrible hack. If those cycles matter, you need a new design.