Delta Queue Help - Paging Mr. Kirwan

Almost. My context switch code I posted comes from my own scheduler, also written in asm, so the r1-r3 registers are set up, and r0 is used for something else. If you're calling it from C, the parameters are passed in r0-r3, so you'd have to rename the registers. For instance:

push {r4-r11, lr} str sp, [r1, #TASK_SP] ldr sp, [r2, #TASK_SP] str r2, [r0, #SCHED_CURRENT] pop {r4-r11, lr} bx lr

and then the prototype would like this:

void ContextSwitch( struct Sched *, struct Task *old, struct Task *new );

In a sense, yes, but in practice it's not necessary to pull anything from the stack, and copy it somewhere else. It is fine to leave stuff on the stack. Nobody's going to touch it, so it will still be there when execution is returned to that task. The only thing we need to remember is where the top of that stack is, so that's why the stack pointer is saved in the task structure.

Since everything on the stack will be automatically preserved, and we need to preserve those registers, it is easy to push them on the stack as well. Since pushing/popping is such a common operation, most CPUs have efficient instructions for that. The alternative is to copy them into the task struct.

According to the ARM procedure call standard, the registers r0-r3 (and ip and even lr), and not guaranteed to be preserved after a function call. This is even true if you don't pass parameters in those registers. They are referred to as 'caller-saved', so if the caller wants to preserve them, it is responsible for saving them. Other registers, such as r4-r11 are referred to as 'callee-saved', so that's why I saved them on the stack.

The auto variables will end up on the task stack, so yes, you must allocate enough stack space for each of the tasks (either through malloc or a static declaration). Also, you must be careful not to use functions that use large auto variables, such as arrays, when you've created a smallish stack.

That's okay. Most people feel that way the first time they try to understand context switching.

Reply to
Arlet Ottens
Loading thread data ...

Please remember that each task needs an own stack, so it is OK to leave things on stack for the suspended state.

--

Tauno Voipio
Reply to
Tauno Voipio

Well, it is Arlet's code. So I'd prefer to let him tell you. But my read of it is that r0 might have been some kind of 'this' pointer that wasn't used. Or else the parameters are sent in "backwards" with r3 being the first parameter. This is why you need to read your compiler docs. They will tell you which end is up. But his code uses r1 through r3 as parameter inputs, it looks to me. r0 is either garbage or else a hidden parameter (often 'this' in c++.) r1 points to the task block of the current task that is switching away; r2 points to the task block of the next task to go to; and r3 points to some kind of informational structure about whatever is the current task (it needs a copy of r2, so the next time around r1 will get set to that, properly.)

I think you need to do your OWN design here and use these examples from Arlet and Tauno as being illustrative, but not determinative. You do what is right for _your_ structures, not right for theirs. The context switch is something you need to write, as appropriate. However, you already have the basic idea, I think. It should switch from one stack to another and let the rest of the code know what it did do before it returns.

Hmm. Let's go over this again.

Let's say you have three threads. Call them T0, T1, and T2. You need to make sure there are three stacks for them. Since the tool-smiths of the compiler and linker and will already give you one stack to start with, you will need to make at least two more. That can be done with arrays (static data space) or malloc (heap space.) That's your choice. Once you have the memory in hand, you will need to initialize it so that your context switch routine won't crash the system the first time it is called. (More on that later, but for now just realize you need to do something -- you can't just malloc some space and do nothing more.)

You will need three thread blocks. Let's say, for now, that all they are is three words to hold no more than the stack pointer and nothing else at all. In other words, the entire thread block structure amounts to one word the same size as the stack pointer. So you allocate an array of three, let's say, and call it T[]. So T[0] represents T0, T[1] represents T1, and T[2] represents T2. But you are running in T0 right now (still setting things up, really.) So T[0] will get destroyed the first time you switch. Don't worry about it. However, you had darned well better make sure that T[1] gets a pointer to the right place for the memory you allocated for T1's stack and similarly for T[2] re: T2. So set T[1] to the right address for T1's stack and T[2] to the right address for T2's stack. Keep in mind this will likely NOT be the exact address you got from malloc or the starting address of some array you set aside. That's because of the initialization you need to do. The pointer will be somewhere further on. That's the value you store in T[1] and T[2]. Your context switch routine will determine just how much further on. So the exact assembly code will dictate that.

Once you have your stacks neatly prepared for T1 and T2, and you've inited your T[] array, you also need to create some kind of variable that tells you that you are currently running in T0. In other words, something that points at T[0] at the very least. Call that variable CURRENT. Now, when you execute a thread switch, you load up CURRENT, increment it to the next T[] array position and deal with wrap around, and this gives you now two pointers -- the original value that still points at T[0] and a new pointer that points at T[1]. You may (but not necessarily so) want to add a third pointer you pass into your context switch that tells it where you got your T[0] pointer -- namely, give it a pointer to CURRENT. But that isn't strictly necessary -- your context switch routine might just hard-coded-wise know that it is supposed to use CURRENT and not need a parameter for that. When you call it, it saves the current SP where the first parameter points (T[0]) and loads the new SP from where the second parameter points (T[1]) and then saves this second parameter into CURRENT. So the process can start all over again, but with T[1] and T[2] next time. And so on.

Aside from that, your context switch routine first saves all important registers and thread state on the "current" stack before it switches stacks as I just described. Then it restores all important registers and thread state from the stack it just switched to, before returning. Note that stuff goes onto the current stack, the stacks are changed, then stuff gets popped off the new-current stack before returning.

If you think about that closely, you will see the magic in it!

But.. you could instead of using the stacks to do this, just make T[] elements a nice, fat structure instead so that each T[] entry holds all the register values and thread state there. Don't use the stack for it, then. There are advantages either way. Pushing and popping on the stack is often fast and easy to code. But putting all that crap into a static array element lets other code in your system have easy access to the registers and thread state -- which can be nice, at times.

Your call.

Hmm.

You need to understand.

When T0 calls pswitch(), pswitch() must save anything important to the current thread before flipping over to a new thread. This is because the new thread has a different stack, different 'auto' variables, and will definitely _screw_ with the registers a lot. c compilers tend to assume that some registers don't get messed with when they make a function call to pswitch(). So pswitch just crams them onto the stack just in case. When it changes stacks, it will just accidentally be the case that this new stack will have _also_ have had those important registers pushed last time it was active. So popping them restores that context cleanly.

Best to work this out on a piece of paper and use a very simple case of just two or three threads.

It won't need to, as I understand it. r0-r4 are assumed to be destroyed by the c compiler whenever it makes a function call. Since pswitch() _is_ a function call, the c compiler will assume those registers are destroyed. If it cares about them, it will save them _before_ making the call. So you don't need to.

That's why you have separate stacks!! Each stack holds the 'auto' variables. And if some of them are in registers, well pswitch() saves those too! So all of the local variable data is kept isolated in each stack area. When you change stacks, you change that context and will then be operating on whatever the situation there was.

Now, it is possible that you never were taught about the basic program model. If not, none of the above makes much sense. I've kind of assumed you understood about code, const, init data, uninit data, heap and stack fits into a clear, well-designed program model that supports local variables in a very specific way. If you aren't familiar with that model, say so. I will present a short post on that. It's not hard to understand and it is VERY IMPORTANT right now that you understand it.

So maybe that's the big problem here. You need that piece.

Let me know.

Jon

Reply to
Jon Kirwan

Actually I was doing just that... not necessarily to make it clear to me but to convey my understanding so that you can point out exactly where I was screwing up. Take a look here...

formatting link

.. keep in mind I had done this before reading your post and the latest from Arlet so any thoughts that resonated there are not present in my 'sketch'.

Now, on my first read (it takes multiple reads to sink in sometimes :) ) of your post I believe I am really hosed. I am going to go back through and comment on the individual points but I thought I'd share the image above. Perhaps you'll get a good laugh?

This may be. I don't write code often but I like to! I am an EE. Had some basic C and asm as part of the curriculum but nothing too extensive.

--------------------------------------- Posted through

formatting link

Reply to
eeboy

The way we've described it, you have 3 different stacks. One for main/A, one for B, and one for C, each allocated with the full size.

When task B is active, all subroutines and auto variables are using the B stack. When it switches to task C, everything B was doing is left on the B stack, and the stack pointer is loaded with the stack for C. On return from the context switch routine, the program counter is loaded from the C stack, and execution resumes from the point where C called the context switch the last time.

So, you don't actually copy the contents of the stacks anywhere.

Reply to
Arlet Ottens

Ok, I had always assumed that I was just going to create one large stack and these 'other stacks' you had talked about were just temporary holding places. However, what you actually want to do is create these static arrays and then manipulate the stack pointer to point to these 'other stacks'? Correct?

I guess it probably seemed pretty silly that I was thinking we'd copy off the stack frames from the 'main stack'.

This does bring up a question though. If I am making a stack for some function. It seams it would be easy to allocate space for it as it would be some base value plus whatever space for autos... but if that function calls another sub which might call another sub which might... see where I am going with that question? How do I allocate? I know that I use some large char arrays (~128b) for sprintf. I guess those should be static huh?

The initialization what does it consist of? Is this so that we know 'where to go' if the task should ever return?

So something like? ...

struct T{ unsigned long * task_sp; unsigned long reg_r4; unsigned long reg_r5; unsigned long reg_r6; unsigned long reg_r7; unsigned long reg_r8; unsigned long reg_r9; unsigned long reg_r10; unsigned long reg_r11; unsigned long reg_lr; };

Ok... pswitch looks roughly like this:

pswitch(){

//Assembly pseudo code move r4-r11 and lr into our CURRENT task array register storage load CURRENT with next task array sp (handle any wrap from T_MAX to 0) move CURRENT task array register storage into r4-r11 and lr return

}

This assumes that pswitch is called from task (will eventually be call to psleep right?). No parameters are passed to pswitch in the simple example (T array known size and CURRENT is global). No need to do anything with the stack contents because this stack is separate/isolated.

--------------------------------------- Posted through

formatting link

Reply to
eeboy

Not necessarily. On some memory-constrained systems, or on systems with hardware stacks, copying the stack may be a useful method. It's a lot slower, though, so when possible it's better to create separate stacks for all the tasks.

If you make those arrays static, you have to keep in mind that printf() will no longer be reentrant. This could be a problem if you call that printf() function in a task, do a context switch in the middle of it, and have the other task use the same static buffer. An auto variable doesn't have that problem, but will require a suitably large stack to accommodate it. A static buffer protected by a semaphore would also be an option.

Reply to
Arlet Ottens

To find out how much stack space you need for a task, first check if your compiler/linker combination can tell you. The compiler knows how much each function needs; the linker can add them up. For some compiler-linker combinations.

The second method is to use a static stack analyser, for example StackAnalyzer from AbsInt

formatting link
or Bound-T from my company
formatting link

The third method is to allocate a large stack area, then run some comprehensive tests while keeping track of the "high water mark" -- the largest amount of stack ever used -- then add some margin and allocate that amount. And hope that your tests were comprehensive enough.

The fourth method is to do a static stack analysis, but manually.

And of course, in all cases you are better off if you avoid recursive calls.

HTH,

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .
Reply to
Niklas Holsti

It's probably starting to get confusing. I'm not sure, even, what is being asked by eeboy. But your response, Arlet, needs some clarification I think. I'll use your comments as a foil, but I'm really responding to eeboy and not you.

...

Often, what I may do is set up each thread in a separate c file. And I then don't publish (make 'extern') any static data I define that I don't want other threads using or messing with.

This works just fine. It's just fine to use static arrays in each thread and it's not even risky, so long as you make sure that other threads don't use the same static regions.

You mentioned sprintf(). Let's say you don't have enough sram laying about on your micro. So you simply *must* use the same buffer for sprintf(), in all threads, because you have no other choice in the matter. Do you have a problem? Maybe. Not necessarily.

We've been discussing a non-preemptive design all along. Keep that in mind. Yes, if the system were preemptive and could literally _tear_ control away from you, it would be a problem and _then_ you probably would need a semaphore to help arbitrate access to the buffer for sprintf(). But let's say this is NOT preemptive. In that case, there is no problem. When you call sprintf(), unless you re-write it, it will do the work you asked and then return. But since this is a cooperative system and changes to other threads can only occur when you call the thread switch code, there is no way for interference from other threads. You have control over the cpu, so there is no problem. So long as you get a chance to _use_ the buffer before you switch threads, or copy the part of it you need somewhere else, first.

It's not nearly as scary as some are making this sound. Especially not if you are staying cooperative.

Now, if you go preemptive, then yes. All hell breaks loose. It's possible that there is static memory being used in some floating point code you don't even know you are using. So you have thread T1 in the middle of some expression calc and then the operating system 'preempts' you and forces control over to thread T2. But the floating point code was not written for a preemptive operating system and the author, stilly idiot as he or she may be, used some static memory that is entirely internal to their code for temporary storage during computation. But T2 now starts up a calculation, or perhaps comes back to one it thinks is in progress, and it's just BAD. Nothing works and you've no idea why your numbers aren't right. The only fix is to make the context switch code aware of all these library static areas and save them away. Or else block switching when a library routine is running. Stuff like that. It gets really bad, fast.

Which is why I don't recommend pre-emption as your first attempt. :)

Anyway, I'll answer other stuff elsewhere, shortly.

Jon

Reply to
Jon Kirwan

A common scenenario is the following:

sprintf( buffer, "hello world\n" ); uart_sendstring( buffer );

Now, if your uart_sendstring() puts stuff on an transmit FIFO for the UART, and the FIFO happens to be full, you'd usually want the task to sleep until there's some room in the FIFO.

While the task is sleeping, another task is scheduled, that also performs a sprintf() overwriting the buffer that was still used.

Reply to
Arlet Ottens

You don't know it and won't understand my answer right now, but you could have been almost a genius thinking that way.

There actually is another concept which helps implement 'iterators' (it's a way of letting a function produce items for a 'for loop' and allowing that function to determine when the loop terminates, all the while maintaining local variable space for the duration.) It uses something called a 'thunk.'

But no, I was talking about something else. And yes, I think you now realize more about what I was saying.

Yes. Static arrays are fine. So is malloc()'d heap space. The main thing is that you need some persistant ram laying about in which to hold all those local variables you were worrying about.

Actually, there are more ways than either static or heap. You could do some fancy footwork and instead divy up the stack that the compiler vendor gives you (or you request with some switch option to the compiler or linker or in the link file itself) into as many parts as you feel you need. I've done that, too. It's more subtle that way and uses stack space that _some_ stack check routines will think is valid because of the way I 'stole' it.

But it doesn't really matter how you get it. Technically, you might be able to allocate it as a local variable in some function... just so long as you didn't exit that function's instance before you destroyed the threads built upon that local memory. (For example, you could set up the stacks as local variables inside of main() and that works, too.)

You just need the stack space to stick around at least as long as the thread does. Bad news, otherwise.

Once you really get this down in your mind, your creativity will bloom in the various ways you might do this. You have lots of ways to slice this.

No, that's NOT silly. In a Unix fork() call, the entire stack context may get copied over into a new thread/process. So it's done. But that's heavy-duty stuff. I'm just thinking simple and lightweight work here. Not big-time, major operating system style stuff.

Below, I'll talk a little more about all this. It may help.

I've addressed this in another post, earlier. So I will leave that where it is for now.

Okay. So here's the deal. Keep your mind focused on what the context switch code actually does and does not do. By this, I mean the thing I called ctxsw() before and the thing that Arlet and Tauno offered in source form for ARM.

At some point in time, the code in main() or else the code in some function that main() calls will eventually give up cpu time for the very first time to the next thread. By this time, of course, there must actually _be_ a next thread. Or else bad things happen. So main() or its delegates must have done some initial stuff first so that ctxsw() won't branch into some random part of memory.

Let's stick with Arlet's style. His routine does NOT save r0-r3. It does save r4-r11, plus r14, if memory serves. So let's go with that. Let's assume we are talking about 32 bit registers here and that the stack is aligned on a 32-bit boundary. (Frankly, I don't know or care. It's just a detail to me. But let's assume it for purposes of discussion.) So here is what is going to happen in ctxsw():

1 push r4-r11 and r14 onto the current stack (old thread) 2 save stack pointer into static memory of current thread 3 update thread pointer somehow 4 load stack pointer from static memory of next thread 5 pop r4-r11 and r14 from the current stack (new thread) 6 return

So what happens if you call that from main() and you did not do any initialization? Well, it will:

(1) Dutifully push 9 32-bit words onto the current stack. That's fine. It's main()'s stack and there is plenty of space, we can assume. So no problem so far.

(2) Well, what exactly does 'current thread' mean here? How do you know? Well, let's assume for a moment that 'current thread' is determined by an array index (not a memory pointer but a value that starts at 0 and advances up to some known limit and then goes back to 0, again.) So let's call that array index 'cur' for now. So the first problem here is that it would help a lot if 'cur' starts out as 0. Luckily, if you set up 'cur' as a static variable, c compilers tend to set it to zero before main() starts, so you are good there even if you don't initialize it. So this step will put the stack pointer in the right spot. So no problem so far.

(3) Updating the thread pointer might be nothing more complex than adding 1 to it and making sure it doesn't exceed your array index limit. So if N is the number of array indices you have, then you just add 1 and make sure that if it becomes N you set it back to zero. So no problem so far.

(4) Now this one is a problem. The code will load up the stack pointer from your array index [1]. But if you didn't initialize what is stored there, it will be zero (or if you got it from heap space instead, it will be 'unknown.') So the stack pointer will now be zero. Not so good -- see (5).

(5) Pop 9 32-bit values. But the stack is zero. So likely you will pop really bad stuff. Might even get an interrupt or exception from that. But no matter what, it won't be good. So registers are now unknown crap and the stack pointer, if you survived this, is probably pointing into unknown territory.

(6) Return? Great. Stack pointer points to who knows what and you do a return? This is where you crash and burn for sure.

So you really need to initialize things before the first switch takes place.

It would be good, first off, to make sure that 'cur' gets set to zero explicitly (because you might want to reinit everything at a later time and even if the c compiler does set it to zero to start, it will be some other value if you reinit things later on.) Also, you will want something reasonable as the stack pointer loaded in step (4). This means main() or some delegate should make sure that the value is meaningful. And that means that there must also be room for at least those 9 32-bit registers (doesn't matter if they are garbage or not if your thread-main functions all start without parameters) and a reasonable 'return' address, as well. That 'return' address is actually the starting address of your thread-main. And if you want to be safe, you will add one more thing -- another return address that goes to the kill code (pkill()) so that if the thread-main function itself does a function return then it will go kill itself, properly.

So the stack (static array, heap space from malloc(), etc) would look like:

s[M-1] { address of pkill() } s[M-2] { address of thread-main } s[M-3] { r14 } s[M-4] { r11 } { . } ~~~~~ ~~~~~ { . } s[M-11] { r4 } ~~~~~ ~~~~~ s[0] .....

If I compute it right, that is 11 32-bit words. And note that you need to set up things so that when pswitch() goes and calls ctxsw(), the stack pointer that it will find points at r4. In other words, you need the address of s[M-11]. Right? That way, when ctxsw() runs, it will load up that address in step (4), then it will pop those nice register values (which may be garbage or set to zero or whatever the compiler requires of you) and then return to thread-main. Which means starting thread-main, in effect.

Note that there will only be one thing on the stack at that time -- the pkill() address. So if thread-main decides to go and return, then it will 'accidentally' go call pkill(). What pkill() does is then load up the thread pointer and kill the thread. That can be easy to do, or not. (More work for you to worry about.) Or you can simply say it isn't allowed and make pkill() instead a die() type of thing -- bail out and die if it happens. Your choice.

Anyway, that's the kind of thing you need to do. It's not conceptually hard, but the details _are_ important and you must get them right. If you do, everything works slick. If you don't, it's disaster. Sink or swim; do or die; etc. Never is it a something in between. Eveything locks up, or else everything just flies like a bird.

Yeah. You can add more stuff, too. Folks just love adding stuff, in fact. You get thread ID numbers, maybe ASCII names for the threads, asynch message words, and in fact just about anything your imagination can come up with. I will often add the start of an exception stack here, as well. This allows thread-based exception handling in c and that is a nice thing to have.

Hmm. Looks real similar to what I wrote above. So yeah.

I often use a no-parameter pswitch() function. You can permit a parameter to specify a thread, if you want to have the ability to force a particular 'next thread' event. But it is rarely needed, to be honest.

I'm a little confused about your 'eventually be call to psleep right?' question. The idea is that the ready queue threads just 'round robin' with each other. If there is nothing in the sleep queue, then what is in the ready queue just goes round and round. But if there are some things in the sleep queue, then if and when they time out they will get added to the ready queue and become part of the cpu-party and get some time to run. Otherwise, they just hang out in the sleep queue until their time comes.

A thread can go to sleep. It calls psleep() with some time value to do that. What happens is something like this (keep in mind that I'm not including ASSERTs or other important tests that may be required but just focusing on the thrust of it):

stat= disable( ); qunlink( pcur ); qinsertd( sleepq, pcur, time ); ctxsw( ); restore( stat );

(It is a 'critical section' so you need to wrap it with disable and restore.)

As you can see, it's pretty easy. It unlinks itself from the ready queue (the node has .next and .prev pointers, so specifying the node is enough to figure out how to unlink it from the queue it is in ... which had better be the ready queue), then inserts itself into the sleep queue, then begs ctxsw() to switch. Now, as you might realize by now, since it has removed itself from the ready queue... ctxsw() won't come back to it. But ctxsw() will save the registers just fine on the stack before going away.

Now, at some later point, the timer moves this thread back into the ready queue and eventually some ctxsw() call will cause a return. Where will it return? Here. And then it will restore the interrupt system per whatever it was when this thread went to sleep [okay, there is a long subject and debate here about how that works and why... but I don't want to get mired in that now] and then returns from the original psleep() call that started the whole thing.

The effect is exactly what you want.

Tricky, eh?

Jon

Reply to
Jon Kirwan

Oh, yes!!

As I said, so long as you do what needs to be done _before_ allowing a context switch, it is fine. Your case is common and would violate my cautionary point. So good to bring it up.

Jon

Reply to
Jon Kirwan

Jason (or eeboy), here is a reduced sample structure from one example of code I've used before where I wanted to use a bog standard linked list node type on a processor with enough memory that I didn't need to worry:

struct T { struct T *next; struct T *prev; void *stack; void *alloc; exceptframe_t *exceptstack; int message; qnodekey_t key; qpriokey_t priority; };

Let me explain each item.

'next' and 'prev' are the queue next and previous pointers. You already understand these, so I won't bother explaining them much further. The node, so long as it exists and is valid, should be part of _some_ queue. So those pointers should be always valid, once constructed.

'stack' is that stack pointer value that ctxsw() uses. So it should always point to the bottom of the stack for that thread. We've discussed this already, so I'll leave it there for now.

'alloc' is something else. When I allocated stack space on this system (this particular case), I used malloc() to do it. When a thread was created, the caller specified how large to make the stack for it. So I just called malloc() to do the work there. In this system, it was okay to go that route. In other cases, I didn't do that. Anyway, the value of 'alloc' was always set to whatever malloc() gave me. That way, if the thread returned and caused pkill() to execute, the code in pkill() could return the space by calling free() with this value. I needed some way to remember the malloc() pointer. This was it.

'exceptstack' was either NULL (if there was no active exception handler at all) or else pointed to the lowest level exception node, if there was an active handler. Keeping it in the thread node like this meant that each thread could support its own, separate exception handler list. If you don't support exceptions in c, drop this. You won't need it.

'message' was a simple, asynchronous message word. Any thread could send a message to any other thread using this value. If it was zero, there was no message. If non-zero, then there was a message and the value was the message. Getting the message set it back to zero. It's a really simple way for threads to chat, without doing something complex like pipes, buffers, semaphores, and so on. And so long as you are aware that only the last one of multiple messages is received by the thread when it runs and looks, then you are fine. It's cheap. That's its really big feature.

'key' was either the thread priority (if inserted on the ready queue) or else the delta time value (if inserted on the sleep queue.) [In a semaphore queue, it is meaningless.]

'priority' is the default thread priority when moving it from the sleep queue to the ready queue. A value fixed when the thread was created.

That's enough to do some real work.

Jon

Reply to
Jon Kirwan

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.