Really tiny microcontrollers

How does that work if the entity for which you need a pointer is above address 0xFFFF? That would include all the peripheral registers, RAM and Flash on many ARM variants.

Mark Borgerson

Reply to
Mark Borgerson

That may be compiler dependent. I don't think there is any particular reason that you can't have bytes and 16-bit words on the stack. The compiler may have to do some alignment if you also have 32-bit word variables on the stack.

If your code is simple enough and your compiler smart enough, small local loop counters will be in registers.

Mark Borgerson

Reply to
Mark Borgerson

The IAR compiler for the ARM is perfectly happy to use 16-bit stack entries for local loop variables.

I compiled the following code:

void TestStackVars(void)
{
    volatile uint16_t i, j, k, l, m, result;
    for (i = 0; i < 10; i++)
        for (j = 0; j < 10; j++)
            for (k = 0; k < 10; k++)
                for (l = 0; l < 10; l++)
                    for (m = 0; m < 10; m++)
                        result = i + j + k + l + m;
}

Reply to
Mark Borgerson

I meant that whenever you call a sub-function, the caller-saved registers will be saved on the stack as 32 bit values, even though you may only have been using them as 8 or 16 bit variables. Similarly when you get an interrupt.

Reply to
Arlet Ottens

I agree. It's one of those tradeoffs you don't appreciate until you actually look at the assembly code.

If you let the compiler use registers for the 8 or 16-bit variables, it has to push more 32-bit registers onto the stack, but the generated code is shorter and faster.

If you specify variables on the stack, the compiler will only use as much stack as is needed for the variable and you won't need to push and pop as many registers. However, the code will take more instructions for each operation and will be longer and slower.

As a programmer, you get to balance RAM/stack usage, flash usage, and execution speed. That kind of balancing act is one of the things that separates embedded programmers from Linux and Windows programmers.

Mark Borgerson

Reply to
Mark Borgerson

To Arlet:

Agreed, it's a "natural" solution because that is the way it is usually taught in school and in many books. It's NOT the only solution for creating linked lists, though.

In fact, I use a different method in an operating system I wrote, to avoid the need for memory pointers while retaining linked lists (which I do need).

To Boudewijn:

I routinely use linked lists with microcontrollers (some of those I deal with have less than 1K of RAM) in an operating system I wrote, wherever it makes sense for the application. The O/S needs linked lists to support the run queue, the sleep queue, and the semaphore queues, for example.

The method does NOT use memory pointers as data elements; instead it uses one byte of RAM per entry as an index into the "array." It's a very simple method, the kind one typically uses in languages that do NOT support memory pointer semantics (like early BASIC). Works fine, and very light on RAM usage. It allows either singly or doubly linked lists, so you can shave off a byte for each thread and for the head/tail of each queue. A minimal system with 3 threads, support for run and sleep queues, 4 semaphore queues, and support for priorities requires about 20 bytes of RAM for the O/S.
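The index-based technique Jon describes can be sketched in C. This is a minimal, illustrative version, not his actual O/S code: the names, the 8-slot table, and the 0xFF sentinel are all assumptions.

```c
#include <stdint.h>

#define NTHREADS 8
#define NIL      0xFF   /* sentinel index: "no next element" */

/* One byte of RAM per thread: the index of the next thread in its queue. */
static uint8_t next_idx[NTHREADS];
static uint8_t run_head = NIL;   /* head of the run queue, also one byte */

/* Push thread t onto the front of the run queue. */
static void run_push(uint8_t t)
{
    next_idx[t] = run_head;
    run_head = t;
}

/* Pop the front thread, or NIL if the queue is empty. */
static uint8_t run_pop(void)
{
    uint8_t t = run_head;
    if (t != NIL)
        run_head = next_idx[t];
    return t;
}
```

The same `next_idx` byte serves whichever queue the thread is currently on, so the per-thread overhead stays at one byte for a singly linked list (two for doubly linked).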

I'd consider 4k ram to actually be very "roomy" for what I'd designed this O/S to operate within (not 32-bit alu systems.) Linked lists are always a natural solution for operating systems. It's just a matter of how you implement them.

Jon

Reply to
Jon Kirwan

It does have a limitation that the number of items in the list must be statically allocated. Of course, on a memory restricted embedded CPU, you usually need to do that anyway, so in practice it's not a big deal.

By the way, I wrote a simple multi tasking scheduler for ARM7/Cortex that keeps the run/sleep/semaphore queues in 32 bit integers. Each task is represented by a bit. Waking up a bunch of tasks is just a bitwise OR operation. Finding the highest priority task is just a matter of finding the highest set bit, and some ARM cores can do that with a single CLZ (count leading zeroes) instruction.

Reply to
Arlet Ottens

My approach on DPS (which is a full blown OS for power but recently I used its scheduler on a "small" 16k RAM CF, MCF52211) was (is) different. It is up to each task to state it is OK as far as it is concerned to "nap" by exiting (just for cooperative rescheduling) via this or that call (i.e. it can exit and allow nap or exit and disallow it). Obviously if a task does not exit cooperatively but because its time slot has expired the nap is disallowed. Then the scheduler decides whether to put the machine into nap for some time or not; it will nap it only if all tasks have exited allowing that.

Dimiter

------------------------------------------------------
Dimiter Popoff
Transgalactic Instruments
------------------------------------------------------


Reply to
dp

Agreed.

When possible on the micro, I will often "wait" on a halt instruction of the processor when all processes are sleeping or waiting on a semaphore that may be changed on an interrupt event. The timer tick wakes up the processor and decrements a single counter for the top sleep-queue entry (delta times are used, so each thread keeps a delta relative to the thread ahead of it and only the top thread needs to be decremented). If that counter reaches zero, then all threads with zero deltas (the current one and any following ones with zero, also) are moved to the run queue (by moving exactly ONE index value). Since a thread is now ready to run, that is executed. But otherwise, with nothing in the ready queue, it can just sit on a halt if that is appropriate.
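The tick handler Jon describes can be sketched in C; the table sizes, names, and index-linked representation here are illustrative assumptions, not his code.

```c
#include <stdint.h>

#define NIL 0xFF

/* Sleep queue as an index-linked list: each entry holds a delta
 * relative to the entry ahead of it, plus the next sleeper's index. */
static uint8_t delta[8], next[8];
static uint8_t sleep_head = NIL, run_head = NIL;

/* Timer tick: decrement ONLY the head's delta.  When it hits zero,
 * move it and every following zero-delta thread to the run queue. */
static void tick(void)
{
    if (sleep_head == NIL || --delta[sleep_head] != 0)
        return;
    while (sleep_head != NIL && delta[sleep_head] == 0) {
        uint8_t t = sleep_head;
        sleep_head = next[t];
        next[t] = run_head;    /* push expired thread onto run queue */
        run_head = t;
    }
}
```

The common case (nothing expires) is one decrement and one compare, which is what keeps the interrupt latency small and constant.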

Jon

Reply to
Jon Kirwan

I just use a timer counter per thread. The timer interrupt decrements all of them in a small loop. Since I never have more than a handful of threads, the loop is really short and simple:

for( i = 0; i < NR_TASKS; i++, task++ ) {
    if( task->timer && !--task->timer )
        mask |= (1 << i);
}

A task that isn't waiting simply has a zero timer.

Reply to
Arlet Ottens

It's too easy for me to NOT use that approach. A delta queue is trivial to implement and requires no looping (unless there is more than one thread waiting on the same time event.)

Threads are inserted into the sleep queue by time, but only the delta is retained. So if you have a thread waiting for 8 clocks at the top and one more with a delay of 2 from that (for 10), then inserting a new thread wanting to wait 9 will first subtract 8 from 9, leaving 1, then will insert itself after the top entry but before the other entry. The resulting sleep queue, with 3 entries, would be 8, 1, 1 for their delta time values (both the inserted and one thread immediately following it would have updated values.) In this way, I only need ONE timer and very few instructions needed to advance the entire queue since only the top one needs to be decremented.
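Jon's insertion example can be sketched in C. The representation (an index-linked list with one delta byte per thread) and all names and sizes are illustrative assumptions.

```c
#include <stdint.h>

#define NIL 0xFF

static uint8_t delta[8], next[8];
static uint8_t sleep_head = NIL;

/* Insert thread t so it wakes after 'ticks' ticks, keeping only deltas:
 * walk the queue subtracting each delta we pass; the remainder becomes
 * our own delta, and the displaced successor's delta is reduced by it. */
static void sleep_insert(uint8_t t, uint8_t ticks)
{
    uint8_t *link = &sleep_head;
    while (*link != NIL && delta[*link] <= ticks) {
        ticks -= delta[*link];
        link = &next[*link];
    }
    if (*link != NIL)
        delta[*link] -= ticks;  /* successor now relative to us */
    delta[t] = ticks;
    next[t] = *link;
    *link = t;
}
```

Running Jon's numbers through this (head waiting 8, next entry +2, insert a thread wanting 9) yields the deltas 8, 1, 1 he describes.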

For very simple systems (I don't always do this), my thread structure is:

Sleep --> [] -> [] -> [] --.
                           |
Ready --> [] -> [] -> [] --+   (each queue's last entry links to the
                           |    next queue's first; Avail's last entry
Avail --> [] -> [] --------'    wraps back to Sleep's first)

It's circular. Moving a thread from sleep to ready is simply moving the sleep queue pointer by 1.
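The one-index move can be sketched like this. The segment layout chosen here (ready, then sleep, then avail around the ring) is an illustrative assumption that makes "advance the sleep index" transfer the head of the sleep segment onto the tail of the ready segment.

```c
#include <stdint.h>

#define NSLOTS 6

/* One fixed ring of NSLOTS thread slots, partitioned by three indices:
 * ready = [ready_ix, sleep_ix), sleep = [sleep_ix, avail_ix),
 * avail = [avail_ix, ready_ix), all modulo NSLOTS. */
static uint8_t ready_ix = 0, sleep_ix = 2, avail_ix = 4;

static uint8_t seg_len(uint8_t from, uint8_t to)
{
    return (uint8_t)((to - from + NSLOTS) % NSLOTS);
}

/* The head of the sleep segment becomes the tail of the ready
 * segment: one index moves, no thread data is copied anywhere. */
static void sleep_to_ready(void)
{
    sleep_ix = (uint8_t)((sleep_ix + 1) % NSLOTS);
}
```

Advancing the other two indices does the analogous ready-to-avail and avail-to-sleep moves, which is why all the pointers rotate in one direction.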

Sleep   Ready   Avail   Pointer conditions    Description
---------------------------------------------------------
Empty   Empty   Empty   A==R, R==S, S==A      Meaningless
        Empty   Empty   A==R, R==S, S==A      All slots sleeping
Empty           Empty   A==R, R==S, S==A      All slots ready
Empty   Empty           A==R, R==S, S==A      All slots available
Empty                   A!=R, R!=S, S==A      No slots sleeping
        Empty           A!=R, R==S, S!=A      No slots ready
                Empty   A==R, R!=S, S!=A      No slots available
                        A!=R, R!=S, S!=A      No empty queues

It's very easy to check indices for the above conditions (or pointers, if used instead.)

The above is for very simple cases, though, that are primarily timer-driven and threads move frequently from ready to avail, as well, and all of the pointers move in clockwise (or counter clockwise depending on view) arrangements. I use the more usual methods when this isn't appropriate.

But the timer queue design almost always works out well, regardless of the other details. I really like having only ONE entry to check in the frequent and regular case of a timer interrupt event. It keeps variability of latency, absolute latency, and excess cpu usage to minimums.

Jon

Reply to
Jon Kirwan

Making these "volatile" completely changes the semantics here. Of course the compiler makes them 16-bit and in-memory, on the stack - there is almost no other choice it could make (I say /almost/, as there is no requirement in C for there to be a stack at all).

This is a totally different situation from when the compiler stores registers on the stack as part of the calling convention. I expect that IAR's compiler - and most likely all compilers - will store the full 32-bit registers on the stack if it is unable to eliminate the store altogether.

Again, this is no surprise whatsoever.

Reply to
David Brown

That is the kind of thing that separates programmers obsessed with micro-optimisations from programmers who know what is important.

Leave your local variables as local variables, and let the compiler optimise the code as well as it can. You won't make better code by mucking around trying to make your local variables "volatile" to push them onto the stack. All you will do is make your code much larger, much slower, and much harder to understand. The chances of /really/ saving more than a tiny amount of stack space are small - the compiler will use the registers for other purposes, and save them anyway on function calls (and for interrupts, it must save /all/ the registers that are used). The odds are actually higher that you will use /more/ stack space - by trying to force the compiler like this, you will hinder its optimiser and limit opportunities for inter-procedural optimisations which can significantly reduce stack space (as well as making smaller and faster code).

The compiler is better at this than you are. Learn to use your compiler properly - understand its switches and options, and study the generated code. Then let it do its job. Work /with/ your compiler, not against it, and not by ignoring it. /That/ is one of the things that separates embedded programmers from "big system" programmers.

Reply to
David Brown

I understand what you're saying and I pretty much agree. I've hardly ever had to force an ARM compiler to do something or had to resort to assembly language. I've used assembly language in the past for the MSP430 and M68K when the compiler was either inefficient or refused to use optimum instruction (The DBNE loop instruction on the M68K comes to mind). The compiler certainly is better than I am at producing good ARM code.

However, my example is there to contest the following statement:

That's not always true and I wrote the example to illustrate that fact. I suppose I could have avoided the "volatile" keyword by simply adding a few more levels of nested loops to the point where the compiler register allocation algorithm ran out of free registers.

Unless your processor is VERY limited in RAM space, such considerations are probably best left to the compiler. I have run into some tight RAM problems on MSP430 systems where I needed a couple of 512-byte SD buffers, an input data queue, and general variables on a system with just 2KB RAM. Things were tight, but a long test under real-world conditions with stack markers showed I had almost 5% of the RAM unused for stack space. I later shifted to a processor with more RAM when I found that long delays in SD writes were getting close to input queue overflow.

Mark Borgerson

Reply to
Mark Borgerson

You can see under a BGA with that???

Reply to
Ben Bradley

On Tue, 26 Feb 2013 17:06:23 +0100, Mark Borgerson wrote:

I should know that and in hindsight I do, thanks for reminding me that stack is not only used as call stack but also for local variable frames.

Yes, an 8-byte boundary even, in the latest EABI.

--
Made with Opera's revolutionary e-mail program:
http://www.opera.com/mail/
Reply to
Boudewijn Dijkstra

On Tue, 26 Feb 2013 18:41:02 +0100, Jon Kirwan wrote:

I am familiar with this technique, but I didn't know that people would call this a "linked list", as there is no conceptual linking going on. I would call this something like indirect indexing.

I may be misunderstanding something here, but I would think that you always have a doubly linked list because you can traverse the array of indices in two directions.

Reply to
Boudewijn Dijkstra

On Tue, 26 Feb 2013 16:31:09 +0100, Mark Borgerson wrote:

I was talking in the context of "That would include all the peripheral registers, RAM and Flash on many ARM variants."

When I talked about "explicitly storing a pointer", I meant explicitly declaring a pointer variable or constant. This is not needed for peripherals as they can be accessed via a literal base pointer to a struct, nor for RAM (unless you use dynamic allocation), nor for Flash constants, the only exception in the OP's case is function pointers.
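The literal-base-pointer idiom Boudewijn refers to looks like this. The register block layout, the `0x40004000` address, and the TX-ready bit are all hypothetical; they stand in for whatever the vendor header defines.

```c
#include <stdint.h>

/* Hypothetical UART register block; layout and address are illustrative. */
typedef struct {
    volatile uint32_t DR;    /* data register   */
    volatile uint32_t SR;    /* status register */
} uart_regs_t;

/* Literal base pointer: no pointer variable lives in RAM; the compiler
 * folds the constant address into each register access. */
#define UART0 ((uart_regs_t *)0x40004000u)

static void uart_putc(char c)
{
    while ((UART0->SR & 1u) == 0)   /* spin on an assumed TX-ready bit */
        ;
    UART0->DR = (uint32_t)c;
}
```

Each field access compiles to a load or store at base-plus-offset, so no pointer needs to be declared or stored anywhere.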

--
Gemaakt met Opera's revolutionaire e-mailprogramma:   
http://www.opera.com/mail/
Reply to
Boudewijn Dijkstra

I found M68K compilers could often generate DBNE loops if you made sure the index was 16-bit, and you were careful about exactly what was in the loop, and how the condition was checked. But YMMV.

I haven't had to use much assembly on the msp430 since the early days of mspgcc - the compiler does an excellent job. In general, it is not often that I have to write assembly to improve on a compiler (obviously you need assembly for features that don't exist in C, if the compiler or library does not support it).

I suspect if you did that, they would be stored as 32-bit registers on the stack as that is the fastest size for the processor - the variable would initially be in a register, and would only move onto the stack as an overflow.

I know what you are trying to say here, but the fact is that most compilers will normally store such variables as 32-bit if they have to put them on the stack (which they will not do unless it is absolutely necessary). You are right that they will not /always/ be 32-bit on the stack - but they will be 32-bit most of the time.

I worked with a COP8 program in assembly which practically filled the 32K OTP ROM and 512-byte RAM. By the end of the last version of the program, bug fixes or feature changes could take minutes to find a solution, but days to find a spare bit (not byte) of RAM, or a couple of spare bytes of ROM space. It also had an external flash for logging data, with bus cycle times measured in milliseconds.

mvh.,

David

Reply to
David Brown

Hmmm. I wasn't aware that using LDRH and STRH to store 16-bit half-words took any longer than the 32-bit LDR and STR instructions.

So why did that change when I used the 'volatile' keyword? Using that keyword should not have changed any of the supposed advantages of using 32-bit stack entries.

That brings back terrible memories of the first programs I wrote in C for a 2KB Flash 8051 variant. I still shudder when I think of all the different pointer types!

Mark Borgerson

Reply to
Mark Borgerson
