IAR MSP430 compiler problem

[ snip ]

Niklas Holsti wrote:

That is interesting info on gcc, and new to me (I don't follow gcc development that closely). Thanks, David, this is definitely something one needs to know about (future) gcc.

Yes indeed, at least for the purpose of ensuring a standard calling sequence is used for a given function.

And the OP specifically asked about the IAR MSP430 compiler, anyway.

I was afraid that would be the answer, as far as it goes. Yuck.

There are several things under discussion here:

- Whether it makes sense to use a routine Spin, containing a loop (whether conditional or unconditional) as the last thing a thread should call in its time-slice, such that threads are always suspended and resumed only at this point, that is, within the call of Spin.

David, I think you have more or less agreed that this is a workable design for a non-preemptive (in my definition) time-sliced system that does not schedule other threads to use the slack left over in one thread's time-slice. I won't say more to defend it at this point.

- How difficult or time-consuming is it for an interrupt handler that interrupts the loop in Spin to find the return address of that call of Spin?

Assuming that

o the thread calls Spin using the normal calling sequence, in which the return address is left on top of the stack,

o the code in Spin does not push more data on the stack, and

o the handler is written in MSP430 assembly language,

then this is just one POP.W instruction, executed after the interrupt handler has popped the saved status register and saved interrupt-point PC from the stack. (I'm not very familiar with the MSP430 instruction set and its interrupt handling, so this may be a bit optimistic. But the MSP430 instruction set is claimed to be strong on stack accesses, so it should not be much harder.)

Thus, getting the return address of Spin (under the above assumptions) is quick and well-defined.

In fact, the tick interrupt handler could do it smartly as follows:

1) Pop the saved status register.

2) Pop the saved interrupt-point PC, check that it points to the loop in Spin, and then discard it.

3) Push back the saved status register.

This makes the two top words on the stack be the resumption PC (the return address for the Spin call) and the saved status register, exactly the state needed for a future RETI to resume this thread. It is not even necessary to get and manipulate the return address for the Spin call. (I'm assuming that each thread has its own stack area.)

- Whether the Spin routine can or should be written in C.

If the C compiler generates code for Spin and for the calls to Spin that satisfies the above assumptions, it can be written in C. But David is right to say that it is hard to be sure that the assumptions do hold, and will continue to hold, if Spin is written in C. So let's assume that we write Spin in assembly language, which lets us be sure that the assumptions hold.

- Whether the thread-resumption latency can be shorter if the Spin loop is unconditional, and the return address of Spin is saved and used as the resumption point (case A), compared to the latency when the Spin loop polls a flag, the address in the interrupted loop is saved and used as the resumption point, and the interrupt handler sets the flag to make the loop terminate (case B).

I comment on that below.
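For concreteness, here are minimal C sketches of the two variants, purely as illustration (the names are my own inventions, and as argued below the real Spin would more safely be written in assembly language):

/* Case A: an unconditional loop.  The thread is never resumed inside
   this loop; the scheduler resumes it at the return address of the
   Spin call. */
void Spin_case_A(void)
{
    for (;;) {
        /* nothing to do; wait for the tick interrupt */
    }
}

/* Case B: poll a flag that the tick interrupt handler sets, then
   return normally to the caller.  "tick_seen" is an invented name. */
static volatile int tick_seen;

void Spin_case_B(void)
{
    tick_seen = 0;
    while (!tick_seen) {
        /* wait for the tick interrupt handler to set tick_seen */
    }
}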

Why would the kernel need that information, if it is not going to schedule another thread for the rest of this time-slice?

I don't see much fragility in it. It is beautifully simple: if the thread finished what it had to do, it is in Spin; otherwise not. (I hope I am lauding the OP here, not my own guess about the design.)

And perhaps the OP's kernel actually has an "I_am_done" kernel call, which just ends up in Spin.
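If so, a minimal sketch of such a call could be as simple as this (all names invented here; just a guess at the shape, not the OP's actual code):

void Spin(void);                     /* the eternal loop discussed above */
void mark_current_task_done(void);   /* invented kernel helper */

void I_am_done(void)
{
    mark_current_task_done();
    Spin();   /* does not return within this time-slice */
}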

See above: a POP instruction. It is safe and reliable, if Spin is written in assembly language.

Getting the address of R takes one POP in case (A). Probably faster than these 3-4 instructions, at least not much slower.

No, it is based only on putting the right code in Spin, and ensuring that Spin is called with the standard calling sequence that leaves a return address on top of stack. This is readily and normally done by writing Spin in assembly language. The compiler is not hobbled in the C code parts.

It can hardly overwhelm it if Spin contains just the loop -- not much to optimise there. Or do you mean to write the tick interrupt handler in C? I know that some C compilers claim that you can use them to write interrupt handlers, but to me this seems more fragile than writing Spin in C. Especially for an interrupt handler that is meant to switch threads, not just manage some peripheral device and return to the interrupted thread.

However, the OP said that Spin in the OP's kernel contains some other things, too, so it's hard to say what the optimiser could do.

What "overhead" are you talking about, David? If Spin's profile is as simple as the OP showed (void Spin (void)), inlining would directly save only one call or branch instruction per time-slice. Perhaps the optimiser could let the thread keep more local data in registers over the (in-lined) Spin call, avoiding some store/load instructions. The IAR MSP430 compiler defines R12-R15 as scratch registers (caller-save) and R4-R11 as preserved registers (callee-save), so an inlined function could increase the available registers from 8 to 12; hard to say if that would be significant.

If Spin is inlined we lose the ability to check for time-slice overruns by checking that the interrupted thread is in the unique and only Spin, so I think that Spin should not be inlined. Perhaps you consider this to be one of the "fragile" aspects of this overrun-checking method, but it is not difficult to make sure that Spin is not inlined.

Wow, this makes me feel like a preacher. But "raise your eyes to the text above", and believe! :-)

There can hardly be a more reliable aspect of a standard calling convention than the presence and location of the return address.

That sounds rather dogmatic. If it works and is reliable, why is it "wrong"? Too heretical?

Although I still think that case A is a bit faster (and feel I have given good reasons above and in my preceding posting) it isn't the main point in favour of case A, the unconditional loop. Given this design of a Spin function in which the threads are suspended and resumed, a flag-polling loop is logically unnecessary: after the tick interrupt, the thread that gets to poll the flag is *the* scheduled thread, so polling the flag is superfluous.

I think the design is a neat solution to specific, limited requirements. It is a bit tricky, but interrupt-handling and thread-switching are often tricky. I mean "tricky" in the sense of "a trick", not in the sense of "difficult".

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .
Reply to
Niklas Holsti

Niklas Holsti wrote:

Having seen this thread up to this point, I think the key point is summarized in the quote.

You assume that the OP has a very thorough knowledge of what he's doing and why. To me, his questions show that this might not be the case!

Solving his question to the letter will probably not be of much value.

To mention a few points that support this impression of mine:

- What's the behaviour of a function that does not return, when it suddenly returns? To the best of my knowledge, the behaviour will be undefined. You can't complain when undefined behaviour turns out not to be what you expected.

- How should a scheduler detect that the interrupted task is in fact in the spin loop? Should it use the linker map to find the code address (and what about optimizations)? Should it match instruction patterns (and how do you force those to be generated by the compiler)? In my opinion there is no safe way to detect this, unless the function "helps" the scheduler (for example using a Yield() call, global vars, etc).

Since we're talking about a C program, we are constrained by the rules of C. We programmers exploit those rules to our advantage, and the compiler writers exploit "their side" of the rules to make better compilers. Therefore we can't break the rules, and C is not the correct tool for what the OP wants (IF it really is what he wants).

The clean way is to use any language or tool to create the desired functionality and encapsulate it into a linkable object. Only then is it compatible with the C portion of the design, and it will stay so until the compiler's calling conventions change.

I say this as a programmer who has written schedulers on various architectures, and coincidentally also a virtual CPU of the architecture used by the OP (involving detailed inspection of the instruction set and compiler conventions to create efficient hooks from virtual to physical).

Best regards Marc

Reply to
Marc Jet

See and (and also more generally).

It was news to me too that the LTO (link time optimisation) branch of gcc had been merged with the main development line. I've known of its existence for a long time, but for many years it has been (or appeared to be) a bit of a blue-sky project with a lot of ideas and limited working code.

It will still take a while before LTO rolls down to the gcc compilers popular in c.a.e. gcc 4.5 is in stage 3 (no new features, only bug fixes and testing) - it will be early next year before we can expect a first release. It will take a while from then for CodeSourcery to qualify and verify it thoroughly on their targets, and the 32-bit embedded gcc suppliers will pick it up from there. Smaller ports, such as avr-gcc, will take longer - they have fewer developers and resources. For out-of-tree ports such as the msp430, it depends entirely on what the developers want to prioritise.

At the moment (gcc 4.3), using -combine and -fwhole-program can get you quite a lot of these effects for C programming. Basically, it treats all the files in the program as a single big C file with everything declared "static".

LTO will give several advantages on top of that. Files can be individually compiled - useful for large projects, when files are in different directories, or when you want different compiler options. Libraries can also have LTO information. You can use languages other than C (for example, C++), and mix them together. And gcc 4.5 has a number of new optimisations that are only relevant for whole-program compilation.

It is like a lot of things in C development - finding a general, portable, standards-compliant solution is hard, even though making it work in a real life project is typically very easy. The trick is to find a balance for a solution that is general enough without being overly complicated.

OK.

You can't assume that (otherwise the OP would never have asked the question in the first place...), although we can probably assume this can be forced in some way.

It may make sense for the end-of-time-slice function to do more than just spin. Then it may have to make a separate call to Spin.

That's a big requirement, and totally unnecessary except as a way to implement this bad idea.

That is mostly true (you can't just pop the top of the stack, because the interrupt function must first preserve a register or two - but as you say the msp430 has good stack access instructions), given your assumptions. But as I noted above, the assumptions are not reasonable, IMHO.

Once you have taken into account saving a working register or two (easy enough), then that's a fairly elegant implementation of a very ugly hack.

If this system is for scheduling important or time-critical tasks, and there is no prioritised pre-emption, it is very likely that you need to track when the task's work is done for testing and verification, or for tracking errors.

But you are correct that the kernel might not /need/ that information.

Working around the compiler in this way /is/ fragile. It is a hack, and it is dependent on details of the compiler, the processor, the stack structure, it requires assembly for what should be simple C code, and it hinders the compiler's optimiser.

However, I have to agree that you have come up with a simple implementation of this design (and I am lauding /your/ implementation here, not the OP's bad design, or our guesses about it).

Don't forget the comparisons to check that you are in the spin loop.

A function as simple as Spin in case B would often be inlined into the calling function (either explicitly, or via whole-program optimisation). Code like that is smaller as well as faster when inlined.

The code calling Spin can be better optimised if Spin is a proper C function, and the compiler knows its definition.

The majority of interrupt handlers in embedded systems are written in C these days. For general interrupts, if you are not happy to trust your compiler to generate good and safe interrupt code, get a better compiler! But thread switching code will almost certainly need some inline assembly at least.

I am referring to the check of the exit flag that you think makes my Spin function too slow compared to your version.

It is, as you say, hard to be sure - especially if Spin contains other code.

Yes, but you don't /need/ to check call or return addresses if you write proper C code...

I think we'll just have to agree to disagree on this one.

Yes, it's against my religion :-) You don't write hacked code based on lying to the compiler, assembly code, and stack manipulation tricks when there are perfectly safe, efficient and reliable ways to do the same job with C. It's about writing legible, maintainable, portable code that is clear in its purpose and easy to verify. Even if this is nothing more than a simple test program, you should maintain a certain level of development quality.

I am not saying that /all/ such hacks are a bad thing - just that you have to have very good reason for using them. Shaving off a few processor cycles (if you are correct and you /do/ save time) is very seldom a good enough reason.

Just because code is accepted by the compiler, and works in practice, does not mean it cannot be *wrong*. And yes, I know I am pontificating - doesn't that beat preaching?

Reply to
David Brown

You may of course be right, Marc. My impression is based on two things: the OP seems to have a workable design for the scheduler, and the OP seems able to read and understand the assembly-language code the compiler has generated. The OP's worry was only that a branch instruction would not leave a return address that the scheduler could use to resume the thread. I think that the OP did not know about tail calls implemented as branches, or did not remember this possibility.

The behaviour is defined by the code that the C compiler generates, and which the OP seems to have inspected. Of course, it is risky to rely on future compilations giving the same code. Writing Spin in assembly language would remove that risk.

I do understand that your question is about behaviour as defined in the C standard, but this is not a pure C program.

As I understood it, the Spin function is part of the OP's kernel/scheduler. If the loop in Spin is of the form "lab: jump lab", the scheduler interrupt handler can compare the PC at the interrupt point to the address of the label "lab". If Spin is written in assembly language, "lab" can be defined as a global symbol so its address is accessible to the scheduler as a constant. Or the Spin module can define a globally visible data word that holds the address of "lab".

But I don't know how, or even if, the OP intends to check that the task has reached Spin. Perhaps the OP's Spin sets a flag before entering the loop. Anyway, the check can be done easily and quickly, whether using a flag or using the PC.
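Purely as an illustration (not the OP's code; the names are my inventions), the flag variant could look like this in C, with the scheduler clearing the flag when it resumes or switches threads:

volatile unsigned char in_spin;   /* nonzero while the current thread sits in Spin */

void Spin(void)
{
    in_spin = 1;
    for (;;) {
        /* loop until the tick interrupt suspends this thread */
    }
}

The time-slice overrun check in the tick handler or scheduler then reduces to testing in_spin:

    if (!in_spin) {
        /* the interrupted thread had not finished its work in time */
    }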

We are talking about a program consisting of some C code, divided into several threads/tasks (which is outside C semantics, I believe), plus a thread scheduler (also outside C semantics), probably implemented by a tick interrupt handler. Typical multi-threaded embedded program.

That would certainly be my approach -- for my own purposes, I am not at all an "anything that works is good" programmer; I usually write in Ada, and enjoy it. I would write Spin in assembly language, as well as any coding-sensitive parts of the scheduler.

I agree fully, but I did not want to lecture the OP, only answer the OP's question. I did advise the OP to use separate compilation for Spin. In retrospect, I should have advised the OP to use assembly language for Spin.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .
Reply to
Niklas Holsti

[ snip lots ]

The OP was specifically asking how to force the use of a normal calling sequence. The OP thought that a branch instruction did not represent a normal calling sequence, and it doesn't, except when it is implementing a tail call, as I think was the case in the OP's problem.

Agreed. Or pop whatever it has pushed on the stack, before spinning. Or even better, save the context of the thread before spinning, which would really decrease the thread-resumption latency.

Pooh. You have admitted that parts of a thread switch must be written in assembly language. So this code is in that part.

Yep. But if you want a check for time-slice overrun, that has to be done somehow, perhaps with a flag. Both checks are about equally fast, I think. If you don't want to check for time-slice overrun, you need a pre-emptive scheduler.

But if you still want to check for time-slice overrun you have to use flags, and watch out for race conditions. The amount of inlined code is growing...

OK.

Except for the assembly code that you need to switch threads.

The only "lying" that has been done here is writing the eternal Spin loop in C, from which the compiler could deduce that no Spin call returns. I think Spin should be written in assembly language and considered part of the thread-switching code.

In fact (but don't take this too grievously :-), the only non-standard C code for "stack manipulation tricks" was shown by you, when you referred to the special gcc function for getting the return address.

Good one, David! Luckily I'm an atheist... well, perhaps an agnostic for the purposes of this thread.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .
Reply to
Niklas Holsti

Hi all.... :)

I'd like to thank you for your posts, because they helped me a lot. I'm also sorry that I couldn't reply earlier...

I would also like to explain what I wanted with my Spin function. First, it is an infinite-loop function which increments an idle counter, only for testing the code in the simulator. For real work I am using low power mode instead of it.

The Spin function is part of a kernel API call, Task_Suspend_no_sched(). This function should suspend the task, change its state, put it in a suspended list, save the context, and then enter low power mode until the next tick, when the scheduler should be called. So I needed that PC to save the task's context in such a way that, when the task is scheduled again, it could jump outside Spin, or to the instruction after the _bis(LPM1), or something like that.

So that was the idea, and finally I realized it as shown below.

The task routine should look like this:

void TaskRoutine(void)
{
    while (1) {
        // do some work
        Task_Suspend_no_sched();
    }
}

Task_Suspend_no_sched should look like this:

void Task_Suspend_no_sched(void)
{
    change_state();
    put_in_suspend_list();
    save_context();   // it should use the PC placed on the stack by
                      // the Task_Suspend_no_sched call
    // _bis(LPM1);    // or a loop like the one below
    for (;;) {
        Idlecnt++;
    }
}

So the ISR which provides the system tick should interrupt the loop or the low power mode, and then the scheduler should be called. When the scheduler schedules this task again, it should jump to the beginning of the task.

Thank you for your comments and suggestions, they were very useful. :)


Reply to
brOS

Then all the comments about your approach (being inappropriate) were dead-on.

Your function Task_Suspend_no_sched() should really be named something like WaitForTick(). And it should do just that: wait for the next system tick.

The implementation does not require breaking out of endless loops or other fancy stuff. It is basic task switching theory, described in OS literature and all over the internet.

Using such a function, all the rest falls nicely into place.

Good luck with your project! Marc

Reply to
Marc Jet

Bogdan's extended description of his design closely matches my guess of his design, as far as I can tell. Therefore I don't agree with Marc's conclusion. I think Bogdan's approach is reasonable, except that it is risky to use C for code such as save_context() and the unconditional spin loop. But perhaps Bogdan is only showing C-like pseudocode?

One feature catches my eye: the TaskRoutine that Bogdan shows ("the task routine should look like this") contains just one call of Task_Suspend_no_sched, at the end of the eternal while(1) loop. If this is true of all tasks in Bogdan's application, the design could be simplified by inverting the control, so that instead of TaskRoutine calling the kernel/Scheduler through Task_Suspend_no_sched, and the Scheduler then resuming the TaskRoutine after this call, the Scheduler could always call TaskRoutine at its entry point, and the TaskRoutine could return to the kernel after doing its work. The while(1) in TaskRoutine would be removed, as would the call of Task_Suspend_no_sched, and their functions would be taken over by things that the kernel/Scheduler does in between calls of TaskRoutine. The TaskRoutine would be just

void TaskRoutine(void)
{
    // do some work
}
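One possible shape for the kernel side of this inverted design, as a rough sketch only (task_table, wait_for_tick and the simple round-robin policy are my own inventions, not Bogdan's code):

void TaskRoutine(void);     /* task entry points, one per task */
void wait_for_tick(void);   /* idle or low-power wait until the next tick */

typedef void (*task_fn)(void);

static task_fn const task_table[] = { TaskRoutine /* , other tasks */ };
#define NUM_TASKS (sizeof task_table / sizeof task_table[0])

void kernel_main(void)
{
    unsigned i;
    for (;;) {
        for (i = 0; i < NUM_TASKS; i++) {
            task_table[i]();   /* the task does its work and returns */
            wait_for_tick();   /* use up the slack left in the time-slice */
        }
    }
}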

A TaskRoutine with a single call of Task_Suspend_no_sched corresponds to a very strict design rule for real-time systems called the single suspension point rule. By this rule, each task shall have a single point where it can be suspended and resumed. The rule is good for schedulability analysis, but can be difficult to use if the task must perform complex timing or interaction sequences, because the task must then use data variables to remember what it is doing -- how far the sequence has advanced -- perhaps by means of a finite-state automaton.
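To make that last point concrete, here is a sketch (invented names, not from this thread) of a task that performs a three-step sequence under such a design, remembering its progress in a state variable between activations:

void send_request(void);    /* invented helpers for the example */
int  reply_received(void);
void process_reply(void);

typedef enum { STEP_START, STEP_WAIT_REPLY, STEP_FINISH } seq_state_t;
static seq_state_t seq_state = STEP_START;

void SequenceTask(void)     /* called once per activation by the scheduler */
{
    switch (seq_state) {
    case STEP_START:
        send_request();
        seq_state = STEP_WAIT_REPLY;
        break;
    case STEP_WAIT_REPLY:
        if (reply_received())
            seq_state = STEP_FINISH;
        break;
    case STEP_FINISH:
        process_reply();
        seq_state = STEP_START;
        break;
    }
}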

Marc, did you not note that the Scheduler may schedule *another* task, not simply continue the one that is "waiting for the next system tick" in your terms? So the waiting task may be waiting for a longer time, several ticks, and it is necessary to save its context, including the PC. One way to get the PC is to retrieve the return address for Task_Suspend_no_sched, as Bogdan does. Another way is to put the task in a loop until it is interrupted, then retrieve the PC of the interrupt point (in other words, the return address of the interrupt handler).

Any kernel call that can suspend the calling task must retrieve the PC, which seems to be the kind of "fancy stuff" that Marc means.

As for the endless loop, most schedulers can encounter a situation where no real task is ready to do any real work until the next interrupt happens. There are several ways to deal with this:

- Use a special instruction or configuration that halts, idles, or powers-down the processor until an interrupt comes in. This is what Bogdan plans to do in the real system. (As an aside, this can have a nasty drawback: in one project where I was involved and it was tried, the resulting square-wave variation in the processor's power consumption disturbed the sensitive analog electronics on the board, so we had to use an eternal loop instead. That was not an MSP430, however, but a space-qualified 80C32, so it used rather more power.)

- Schedule a lowest-priority null task that contains just an eternal loop that does nothing, or perhaps maintains a processor-load indicator such as Bogdan's Idlecnt. But this is not so easy in a non-preemptive scheduler.

- Use a spin loop in the kernel itself, as Bogdan does. Earlier discussion in this thread shows that making this loop conditional is logically redundant. The only reason for not using an eternal (unconditional) loop would be to avoid confusing the C compiler, which is one reason why this loop should be written in assembly language. I have seen such eternal null loops in more than one kernel, including commercial kernels. I think they are an appropriate solution to this requirement.

Good luck from me, too, Bogdan.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .
Reply to
Niklas Holsti
