Delta Queue Help - Paging Mr. Kirwan

- A
- Arlet Ottens
  

Wed, Sep 21, 2011 5:51 AM

If you wanted to use the context switch routine for Cortex I showed earlier, it's also good to know how to set up the initial stacks.

When a task is started for the first time, it happens in these two instructions:

    pop {r4-r11, lr}
    bx lr

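According to the ARM documentation, a 'pop' instruction (like any load/store multiple) always transfers the registers in order, with the lowest-numbered register at the lowest memory address, regardless of the order in which they are written in the register list.

Because the stack grows down, 'lr' (the highest-numbered register in the list) ends up at the highest address of the saved frame. For a new task, that slot must hold the task entry point, so it is the first thing to put on the fresh stack.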

The code to set up the stack could look something like this:

    long *stack, *stacktop;

    stack = alloc_stack( stacksize_in_words );
    stacktop = stack + stacksize_in_words;

    // now push 'lr' on the stack, containing entry point of task:
    *--stacktop = (long) task_main;

    // push r4-r11. We don't care about the values,
    // so we just make some room.
    stacktop -= 8;

    // Now, put 'stacktop' in the task struct as the initial stack pointer.
    task->sp = stacktop;

Now you can call the context switch to the new task.
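
For reference, the steps above can be collected into one helper. This is only a sketch under a few assumptions: `struct task`, `task_init_stack` and the use of `uintptr_t` in place of `long` are names chosen for illustration, not part of the routine posted earlier, and the caller supplies the stack memory. It assumes the switch routine restores with `pop {r4-r11, lr}`, so nine words are reserved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAVED_REGS 9u   /* r4-r11 (8 words) plus lr */

/* Hypothetical task record; only the saved stack pointer matters here. */
struct task {
    uintptr_t *sp;
};

/*
 * Build the frame that "pop {r4-r11, lr}" expects.
 * Lower-numbered registers sit at lower addresses, so after setup:
 *   sp[0]..sp[7] -> r4..r11 (don't care, zeroed here)
 *   sp[8]        -> lr      (task entry point)
 */
void task_init_stack(struct task *t, uintptr_t *stack, size_t words,
                     void (*entry)(void))
{
    uintptr_t *top = stack + words;     /* one past the highest word */

    assert(words >= SAVED_REGS);

    *--top = (uintptr_t)entry;          /* lr slot, highest address */
    for (int i = 0; i < 8; i++)
        *--top = 0;                     /* r11 down to r4 */

    t->sp = top;                        /* initial stack pointer */
}
```

With a 16-word stack, `t->sp` ends up at `stack + 7` and the entry point sits in `stack[15]`.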

- J
- Jon Kirwan
  

Wed, Sep 21, 2011 5:59 AM

Thanks for the reply. They do waste a general-purpose register, but so does a link register. And a frame pointer has some value, though perhaps more for assembly coders than for C compilers: the compiler can keep track of push/pop counts as it goes and always knows what the local variable offset is at any given point, whereas a hand-coder has to keep a piece of paper at hand to track that while writing compatible code without a fixed-position frame pointer.

But also, if each stack frame (activation record) doesn't include the prior frame pointer and an exception handler pointer, then a general-purpose unwind for exception handling would seem difficult to me. So it makes me wonder how that is handled on this processor. It could be done by maintaining a separate exception stack. I've done exactly that when adding exception handling to C, so I guess that would be another option here. I guess it must be done that way, or else there would be a standard called out here.

Again, thanks for the informed comments about ARM!

Jon

- J
- Jon Kirwan
  

Wed, Sep 21, 2011 6:04 AM

Oh, yes. I gathered that already from your prior post. It was easy to read, even though I don't know any ARM assembler yet. I think I have gleaned the fact that the return address is stored in a link register, lr, rather than on the stack. Or I think I have figured that out from your code, anyway. Popping the lr restores it from the previous push instruction (not shown above).

I didn't need to care about the order. I just assumed that if you use a multiple push that a matching multiple pop will handle things as you'd imagine it should.

All that makes good sense. And much appreciated!!

Thanks, Jon

- T
- Tauno Voipio
  

Wed, Sep 21, 2011 6:21 AM

There's a good reason to drop the frame pointer. Maintaining the frame pointer creates run-time overhead, whereas tracking the stack variable offsets only creates compile-time overhead, so programs built without a frame pointer are smaller and faster.

The link register comes from the quite pure RISC architecture of the original 26 (and 32) bit ARM. The architecture did not have a hardware idea of a return stack at all. This decision forced ARM also to have banked registers for exception handling. You can regard the link register as a cached top of the return stack.

The Cortex architecture has returned to the idea of a return stack: on exception entry the hardware pushes the caller-saved registers onto the stack, which replaces the banked register sets in exception handling. For a context switch, the way to go is the PendSV exception handler, run at the lowest exception priority.

--

Tauno Voipio

- J
- Jon Kirwan
  

Wed, Sep 21, 2011 6:39 AM

I'm completely aware of that trade-off. And also aware that the compile-time counting is often done, these days.

Interesting, and thanks!

Hmm. More reading to do, I see.

Thanks, Jon

- A
- Arlet Ottens
  

Wed, Sep 21, 2011 7:19 AM

Normally there wouldn't be a reason to care much about the order, but if you're going to set up the proper initial stack (from C code) these details become important.

- A
- Arlet Ottens
  

Wed, Sep 21, 2011 7:41 AM

Putting the context switch in the PendSV handler is a good idea if you're implementing a preemptive scheduler.

I prefer the simplicity that comes with a non-preemptive scheduler whenever the project allows it, and I just call the scheduler in my main loop, or from tasks that need to wait for something. In that case, having a plain function works just as well.

- J
- Jon Kirwan
  

Wed, Sep 21, 2011 8:55 AM

Yes, thanks. In other words, if the functions that start a thread accept parameters via registers. Yes?

Jon

- J
- Jon Kirwan
  

Wed, Sep 21, 2011 8:59 AM

Yes. That's what this thread is all about, so far. No preemption, as yet. No reason for heavy-weight context switches.

Jon

- T
- Tauno Voipio
  

Wed, Sep 21, 2011 9:31 AM

The PendSV handler can also be used in a non-pre-emptive way. It is so simple that there is no reason to use anything else.

A context switch in a Cortex needs to be done in an exception handler to be able to handle all baseline registers.

---- clip clip ---

    pendsv: ldr r0,=ctdb        @ -> current thread pointer save
            ldr r1,[r0]         @ -> current thread record

            ldr r2,=ntdb        @ -> next thread pointer save
            ldr r2,[r2]         @ -> next thread record

            stmdb sp!,{r4-r11}  @ push remaining registers
            str sp,[r1,#tsp]    @ save outgoing sp

            ldr sp,[r2,#tsp]    @ get incoming sp
            ldmia sp!,{r4-r11}  @ pop remaining registers

            str r2,[r0]         @ update current thread pointer
            bx lr               @ go to thread

--- clip clip ---

The hardware takes care of stacking r0-r3, r12, lr (r14), pc (r15) and the status register (xPSR).

To create a non-preemptive thread switch, store a pointer to the next thread record in ntdb and trigger a PendSV exception.
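
In C, that request might look roughly like the sketch below. The ICSR address (0xE000ED04) and the PENDSVSET bit (bit 28) are the standard Cortex-M system control block values; `struct thread`, `request_switch` and declaring `ntdb` on the C side are assumptions made for illustration. The register pointer is passed in as a parameter so the function can also be exercised off-target.

```c
#include <stdint.h>

#define ICSR_ADDR      0xE000ED04u   /* Interrupt Control and State Register */
#define ICSR_PENDSVSET (1u << 28)    /* write 1 to pend PendSV */

/* Hypothetical thread record; 'tsp' is the saved stack pointer field. */
struct thread {
    void *tsp;
};

/* Assumed to be the same word the handler reads via "ldr r2,=ntdb". */
struct thread *ntdb;

/* Record the next thread and pend PendSV. The handler runs once no
 * higher-priority exception is active. */
void request_switch(struct thread *next, volatile uint32_t *icsr)
{
    ntdb = next;
    *icsr = ICSR_PENDSVSET;   /* other ICSR bits ignore a written 0 */
}
```

On the target the call would be something like `request_switch(next, (volatile uint32_t *)ICSR_ADDR);`.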

--

Tauno Voipio

- A
- Arlet Ottens
  

Wed, Sep 21, 2011 9:56 AM

Why do you worry about r0-r3, r12 and the status register? I don't save those in my code. The function calling the context switch doesn't expect them to be preserved, so why bother?

- A
- Arlet Ottens
  

Wed, Sep 21, 2011 10:02 AM

I just use it to make sure the value for the link register is set to the address of the function that is used as a task body. When all the registers are popped in the context switch, it then continues code execution at the 'lr' value, which came from the stack.

As far as parameter passing, things are different. The ARM passes parameters in r0-r3, and these aren't saved in the context switch function, so they can't be passed that way.

Of course, it is possible to pass values in r4-r11, and use some assembly code in the function (or a separate task entry veneer) to obtain those values.
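
A rough sketch of the veneer idea, assuming the frame layout restored by `pop {r4-r11, lr}`: the argument is seeded into the r4 slot, the real task function into the r5 slot, and 'lr' points at a small assembly veneer. The names `seed_frame`, `task_veneer` and the slot constants are hypothetical, and the veneer itself is shown only as a comment because it has to be written in assembly.

```c
#include <stdint.h>

/* Word offsets within the frame restored by "pop {r4-r11, lr}". */
enum { SLOT_R4 = 0, SLOT_R5 = 1, SLOT_LR = 8, FRAME_WORDS = 9 };

/*
 * Hypothetical veneer, written in assembly on the target:
 *
 *   task_veneer:
 *       mov r0, r4      @ first C argument <- value seeded in r4
 *       bx  r5          @ jump to the real task function
 */

/* Seed a fresh frame so the veneer finds its inputs in r4 and r5.
 * 'top' is one past the highest word of the task's stack. */
uintptr_t *seed_frame(uintptr_t *top, uintptr_t veneer,
                      uintptr_t task_fn, uintptr_t arg)
{
    uintptr_t *sp = top - FRAME_WORDS;

    for (int i = 0; i < FRAME_WORDS; i++)
        sp[i] = 0;
    sp[SLOT_R4] = arg;       /* popped into r4 */
    sp[SLOT_R5] = task_fn;   /* popped into r5 */
    sp[SLOT_LR] = veneer;    /* popped into lr, then "bx lr" */
    return sp;
}
```

The returned pointer is what goes into the task struct as the initial stack pointer. With this arrangement the task function should never return, since 'lr' does not hold a meaningful return address when it starts.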

- T
- Tauno Voipio
  

Wed, Sep 21, 2011 11:07 AM

This is a pre-emptive context switcher - you'll see sooner or later that you need it.

--

-Tauno

- E
- eeboy
  

Wed, Sep 21, 2011 11:19 AM

I like that approach better. Let's start simple, get the individual blocks working and tested and then add to it.

--------------------------------------- Posted through

formatting link

- A
- Arlet Ottens
  

Wed, Sep 21, 2011 11:21 AM

I agree that a preemptive context switcher needs to save these registers, and that using PendSV makes a lot of sense in that case.

For non-preemptive schedulers, the PendSV method carries a bit more overhead, and doesn't seem to offer an advantage.

So far, none of my projects needed a preemptive scheduler. In fact, most didn't even benefit from tasks. Things may change, of course, but preemptive scheduling adds a bunch of complications that I'll try to avoid as long as I can. You can get a long way by doing the highly timing critical stuff in (deferred) interrupts instead of in tasks.

- J
- Jon Kirwan
  

Wed, Sep 21, 2011 1:51 PM

My mistake. I forgot that r0-r3 aren't popped at all.

Yes, but your code didn't really care what the values were.

Yes, understood. It's possible, in taking account of Tauno's example where r0-r3 are handled, to pass along parameters at the start. But not in the case where there isn't some mechanism of setting those up to start a thread.

Thanks.

Right.

Thanks for keeping me straight.

Jon

- J
- Jon Kirwan
  

Wed, Sep 21, 2011 2:17 PM

So, just a simple context switch? A great thing is that we have some excellent examples now from Tauno and Arlet!! So we have literally been given the keys to the door!

Since you want to take the simple approach here, would you like to start by taking the code already posted here as the ctxsw() function, plus what I've said about the simplicity of just setting up a few entries in an array? The only thing I haven't discussed myself is setting up the arrays in the first place, but Arlet did that in another post. So really you have everything already here. If you want to take a shot at it before I do, I'd let you try. (You've done good work already.)

Or would you prefer I carefully walk through an example that I can hope is also vetted with the two helpful posters, Arlet and Tauno? I think you have enough, now.

Let me lay out the details:

(1) You have a context switch from Arlet:

    push {r4-r11, lr}
    str sp, [r1, #TASK_SP]
    ldr sp, [r2, #TASK_SP]
    str r2, [r3, #SCHED_CURRENT]
    pop {r4-r11, lr}
    bx lr

And another from Tauno:

    ldr r0,=ctdb        @ -> current thread pointer save
    ldr r1,[r0]         @ -> current thread record
    ldr r2,=ntdb        @ -> next thread pointer save
    ldr r2,[r2]         @ -> next thread record
    stmdb sp!,{r4-r11}  @ push remaining registers
    str sp,[r1,#tsp]    @ save outgoing sp
    ldr sp,[r2,#tsp]    @ get incoming sp
    ldmia sp!,{r4-r11}  @ pop remaining registers
    str r2,[r0]         @ update current thread pointer
    bx lr               @ go to thread

As already noted, one only deals with r4-r11 and lr. And because an exception is used, Tauno's deals with r0-r3, as well (the details of which are not in the above example.) I think both store r4-r11 on the stack explicitly. Also, one is possibly for the Cortex-M3 more specifically (noting, ignorantly, that slightly different forms are used.) Again, I'm not sure. But I think if you make a few choices Arlet or Tauno can make sure the ctxsw() function is written correctly.

The basic idea here is that _at least_ r4-r11, and if I gather things correctly, also lr (r14) should be saved somewhere. The current stack is a possibility but you could also choose to store them into the proc structure. It depends on your preference, though the above examples appear to do it on the stack to me.

Then you need to save the resulting sp register into the structure entry for the current thread (you can't save that one on the stack because you'd never know where it is, later on) and then load up the sp for the next thread to run. Read the above code to make sure you follow their techniques (I still need to learn the assembler, though I can almost read them above without that knowledge.)

(2) You need to wrap the ctxsw() function, whichever you choose, with some C code that passes in the right values for r0-r3 as parameter values. (The above routines assume some proper call setup.)

(3) You need to define an appropriate array and links, initialized as appropriate.

(4) You need to set up some pseudo stacks. This can be done with static arrays, if you like. But as with Arlet's post elsewhere, you do need to init them. Or you can malloc them and init them. Either way, you will need to set up the pointers to those memory bits into the thread nodes used by ctxsw() so that it will work, when called.

(5) That's about it. You don't need to deal with killing threads in this example. Just keep going round and round in a circle.
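
Steps (3) to (5) could be sketched in C roughly as follows. This is an illustration under assumptions, not a vetted implementation: `struct task`, `tasks_init`, `schedule_next`, the task count and the stack size are all made-up names and values, and the initial frame matches a switch routine that restores with `pop {r4-r11, lr}`. The actual ctxsw() call is left out because its argument list depends on which routine is chosen.

```c
#include <stddef.h>
#include <stdint.h>

#define NTASKS      3
#define STACK_WORDS 64
#define FRAME_WORDS 9        /* r4-r11 plus lr */

typedef void (*task_fn)(void);

struct task {
    uintptr_t   *sp;         /* saved stack pointer, used by ctxsw() */
    struct task *next;       /* circular link to the next task */
};

static uintptr_t    stacks[NTASKS][STACK_WORDS];  /* step (4): static stacks */
static struct task  tasks[NTASKS];                /* step (3): the array */
static struct task *current;

/* Link the tasks into a ring and give each one an initial frame. */
void tasks_init(const task_fn entry[NTASKS])
{
    for (int i = 0; i < NTASKS; i++) {
        uintptr_t *top = &stacks[i][STACK_WORDS];  /* one past the end */

        *--top = (uintptr_t)entry[i];   /* lr slot: task entry point */
        top -= FRAME_WORDS - 1;         /* room for r4-r11 */

        tasks[i].sp   = top;
        tasks[i].next = &tasks[(i + 1) % NTASKS];
    }
    current = &tasks[0];
}

/* Step (5): pick whoever is next in the circle. On the target this
 * would be followed by the call into ctxsw() with the old and new task. */
struct task *schedule_next(void)
{
    current = current->next;
    return current;
}
```

Because the stacks are static arrays, no malloc is needed; each one only has to be large enough for the deepest call chain of its task plus the nine-word saved frame.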

Does all that make sense? Think you can do it from here? Or would you like an explicit example to try?

Jon

- J
- Jon Kirwan
  

Wed, Sep 21, 2011 2:20 PM

Left over from earlier writing. Should not have been posted. Ignore it.

Jon

- E
- eeboy
  

Wed, Sep 21, 2011 9:43 PM

I think it makes sense... but that's the way it always goes until you actually start doing. I am going to have a go at it. I'll post up what I produce and let you comment on/revise it. I'll probably have several questions along the way though.


- E
- eeboy
  

Sun, Sep 25, 2011 4:35 AM

So... your context switch prototype might look something like this?

void ContextSwitch(struct Task *,struct Task *, struct Scheduler);

And I am confused on the assembly. The concept makes sense but I read the assembly to do something different. The idea (in my mind) is to pull the current stack frame contents for the current task and shelve it in a task structure so that it can resume later and then place the next task onto the stack so that it can resume from where it left off right?

If my understanding (above) is not flawed, why do we first push onto the stack (burying the current frame) before we do anything?

A further indication that I have taken a step backwards...

If the parameters are passed to a function in r0-r3, why are we not saving those? What if the function needs to refer to those parameters after resuming?

What about the autos used by the task? I assume when I set up a process I must provide a pointer to an array (I'd rather not use malloc) large enough to hold everything (even the autos).

It seemed clear just a few days ago. Now I fear I am lost.
