FreeRTOS context switch time

RaceMouse · 2006-07-12T07:18:40+00:00

Greetings,

I was going to measure the time a context switch takes on an AT91SAM7S256 by toggling a port pin. My intention was to set a port pin high on entering vPreemptiveTick and set it low again just before exiting. The result was that the port pin didn't toggle.

I found another way around the problem, so the result is in hand. I am just wondering: is it not possible to have direct port pin access in a naked interrupt routine?

/RaceMouse


Michael N. Moran 20 years ago

Only if you need that information. Either way, especially on systems with cache, access to a state variable is (always?) faster than to peripheral I/O registers.

It's really not an issue of optimization. These primitives only come into play at the lowest levels of I/O, but I usually generalize them into inline functions that are defined in a cpu specific header file. Then, I use them when writing the device driver as if they are required on any processor that may execute the device driver. On platforms where their function is not required, the inline functions generate no code.

PCI door-bell registers are a good example. A master device on the PCI bus may be assigned one bit in the register which it may use to cause an interrupt to the processor with the door-bell register.

When one of the master devices wishes to cause an interrupt to the device with the door-bell register, it simply writes the register with a bit mask with its assigned bit set to one.

Any other of (up to 32) master devices on the PCI bus may do the same without contention.

For this function, there is no need to know the state of the interrupt bit.

For PCI, it also eliminates the possibility of the interrupt arriving at the processor ahead of previously written data that may still be traversing the PCI bus ;-)
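The door-bell behaviour described above can be sketched as a small software model. This is only an illustration under stated assumptions: `doorbell_t`, `doorbell_write()` and `doorbell_ring()` are hypothetical names, and the OR inside `doorbell_write()` stands in for what the register hardware would do on a bus write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software model of a write-one-to-set door-bell register.
   All names are hypothetical, not taken from any PCI device. */
typedef struct {
    uint32_t pending;  /* one bit per master; nonzero means interrupt asserted */
} doorbell_t;

/* Models the hardware side of a bus write: bits written as 1 are set,
   bits written as 0 are left unchanged. */
static void doorbell_write(doorbell_t *db, uint32_t mask)
{
    assert(db != NULL);
    db->pending |= mask;
}

/* A master rings by writing only its own bit. It never reads the
   register, so it needs no knowledge of the other masters' bits. */
static void doorbell_ring(doorbell_t *db, unsigned master_id)
{
    assert(master_id < 32u);
    doorbell_write(db, UINT32_C(1) << master_id);
}
```

Because each master issues a single write and no read, two masters ringing at nearly the same time cannot lose each other's bit, which is the contention-free property the post describes.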

I have no idea. I suspect that there are military and other high performance applications that require those kinds of embedded systems. Besides, as the availability of such systems increases and their price falls, the applications will come. Today's multi-core desktop and laptop processors will pave the way.

If you build it ... they will come :-)

I don't know if abstraction is the right term, but it certainly increases the complexity of the software systems when multi-threading and multi-processors are used. Especially at the system level.

As an example, look at the game console industry now as it grapples with the 3 core XBOX 360 PowerPC processor (SMP) and the asymmetric (but still multi-processor) CELL processor.

Why not? But ... only if I need it of course ;-)

What doubts about efficiency? I suppose you're referring to the OS here, but SMP and obtaining the performance of multi-processor systems do not *require* a big OS. Nor do they require an opaque distinction between user and kernel space (a la Linux).

Code bloat is a function of programmers and feature-creep, not of the processor or OS.

Michael N. Moran (h) 770 516 7918 5009 Old Field Ct. (c) 678 521 5460 Kennesaw, GA, USA 30144 http://mnmoran.org "So often times it happens, that we live our lives in chains and we never even know we have the key." The Eagles, "Already Gone" The Beatles were wrong: 1 & 1 & 1 is 1


David Brown 20 years ago

ColdFire devices (at least, the one I'm using) have a data out register for each port (reading returns the value of the register, not the value on the pins - that's a different register), as well as clear bit and set bit registers for the port. That way, the programmer can choose what's most convenient for them at the time (the ColdFire cpu already supports atomic rmw instructions directly on the memory, but a single direct write is faster).

Who says IBM engineers have no sense of humour?

It depends on the particular PPC, and what you are trying to do. For the smaller PPCs, memory writes are never re-ordered, so it's enough to just use "volatile", and make sure the port addresses are in non-cached areas. Reads can sometimes be re-arranged, however, and can skip the queue ahead of pending writes. So if you tried to set an output bit then read an input port, the core might queue the write until the read was completed (for normal memory access, this would speed up the core throughput). That's where the "eieio" instruction comes in: use it between these two accesses to enforce the ordering. So yes, you do need to take this sort of thing into account in the programming (the compiler can't do it for you), but it can usually be limited to the most low-level stuff. Any processor with out-of-order execution, or superscalar execution, or even just caches has these issues to some extent.
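The write-then-read case above might look roughly like this. It is a sketch only, assuming a GCC-compatible compiler (for the inline assembly syntax) and port registers mapped as cache-inhibited, guarded storage; `io_order()` and `strobe_then_sample()` are hypothetical names, not from any vendor header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* io_order() is a hypothetical helper name. */
static inline void io_order(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    /* Enforce In-order Execution of I/O: earlier accesses to
       cache-inhibited, guarded storage are performed before later ones. */
    __asm__ volatile ("eieio" ::: "memory");
#else
    /* On other targets: compiler-only barrier, emits no instruction. */
    __asm__ volatile ("" ::: "memory");
#endif
}

/* Drive an output bit, then sample an input port, in that order. */
static inline uint32_t strobe_then_sample(volatile uint32_t *out_port,
                                          volatile const uint32_t *in_port,
                                          uint32_t strobe_mask)
{
    assert(out_port != NULL && in_port != NULL);
    *out_port = strobe_mask;  /* the core may queue this write */
    io_order();               /* keep the read from overtaking the write */
    return *in_port;
}
```

On a host build the barrier compiles to nothing, so the same driver source can be exercised against ordinary variables before it is run on the target.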


Darin Johnson 20 years ago

Also the PowerPC architecture allows memory regions to be marked as "guarded", which prohibits out of order execution for data or instructions in that area.

-- Darin Johnson


Darin Johnson 20 years ago

Depends upon what you mean by multiprocessing. Most embedded systems I've worked on had some variety of multiprocessing, but I've worked on larger systems (medical devices, routers). For example, a system with a PowerPC, a PIC, and several DSPs that can all communicate with each other. This is still below a "workstation sized OS".

If you're using a system with a bus-mastering device, you end up treating pieces of memory in a manner similar to multiprocessing systems.

This is not necessarily the same as "symmetric multiprocessing" though, which is another headache and which I think is rare in embedded systems.

-- Darin Johnson


Chris Quayle 20 years ago

I think the context was multiprocessing as > 1 cpu core, but agree that there is usually some degree of multiprocessing, even if it's just a division of labour between interrupt and mainline code. Big system design has come to embedded though: cheap PC hardware and peripherals, embedded Linux etc. bring the cost and time to market down, but at what cost to system visibility and integrity?

Having said that, there's a Linux based firewall / router here, running from floppy on a P75, that hasn't been rebooted in over 16 months, so I guess it can't be that bad :-)...

Chris

Greenfield Designs Ltd ----------------------------------------------------------- Embedded Systems & Electronics: Research Design Development Oxford. England. (44) 1865 750 681


Chris Quayle 20 years ago

Rut perhaps, but it often makes sense to use an existing knowledge base. It takes time and effort to become fluent with any complex cpu architecture and not everyone has the time to do this (for interest only) if it's not directly revenue earning. It probably explains why some of the older architectures with serious flaws, like the PIC or to a lesser extent the 8051, remain so popular. Also the Coldfire, which looks like a 68K externally, but is a very different animal under the skin. If you program low level, intimate knowledge of the architecture, warts and all, is essential. The tool availability and knowledge base for some of the older architectures is vast.

One way to get this fluency is to write a common interface, BSP-like library for the device peripherals. You gain the knowledge along the way and the result is a well tested foundation for any number of projects. Workstation OSes and RTOSes have done this for years, but it's just as applicable to MSP or 8051 class systems. It saves time and effort over several projects and the common interface makes it much easier to substitute a different device. Eventually, the cpu and peripherals just get abstracted away altogether :-)...
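One possible shape for such a common interface, sketched for a byte-stream style peripheral. Everything here is an assumption for illustration: `byte_stream_t`, the loopback backend and `stream_write()` are hypothetical names, and the loopback FIFO merely stands in for a real UART or SPI driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical common interface for any byte-stream peripheral. */
typedef struct byte_stream {
    int (*put)(struct byte_stream *self, uint8_t byte);   /* 0 on success, -1 if full */
    int (*get)(struct byte_stream *self, uint8_t *byte);  /* 0 on success, -1 if empty */
    void *ctx;                                            /* driver-private state */
} byte_stream_t;

/* One backend: a loopback FIFO standing in for a real device driver. */
typedef struct {
    uint8_t buf[16];
    size_t head, tail;
} loopback_t;

static int loopback_put(byte_stream_t *self, uint8_t byte)
{
    loopback_t *lb = self->ctx;
    size_t next = (lb->head + 1u) % sizeof lb->buf;
    if (next == lb->tail) return -1;      /* full */
    lb->buf[lb->head] = byte;
    lb->head = next;
    return 0;
}

static int loopback_get(byte_stream_t *self, uint8_t *byte)
{
    loopback_t *lb = self->ctx;
    if (lb->head == lb->tail) return -1;  /* empty */
    *byte = lb->buf[lb->tail];
    lb->tail = (lb->tail + 1u) % sizeof lb->buf;
    return 0;
}

/* Application code sees only byte_stream_t, never the device. */
static size_t stream_write(byte_stream_t *s, const uint8_t *data, size_t len)
{
    size_t n = 0;
    assert(s != NULL && data != NULL);
    while (n < len && s->put(s, data[n]) == 0) n++;
    return n;  /* number of bytes actually accepted */
}
```

Swapping in a different device then means providing another pair of `put`/`get` functions, while callers of `stream_write()` stay unchanged.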

Chris



Michael N. Moran 20 years ago

Fortunately, by learning additional architectures you can identify your assumptions and adjust your mental model concerning what is possible.

Sure. Although if you're in the technology business, it's in your own interest to make time, as the more advanced technologies will eventually filter down into your domain.

As for the "directly revenue earning" part ... well ... the benefits of knowledge are not easily/directly/simply traceable to economics, but few would argue that there is no benefit. ;-)

[snip]

Sure. But if you plan to reuse your code across cpu types, then knowledge of more general system issues (out-of-order execution, cache coherency, synchronization, etc.) enables your code to be re-used without change/debugging.

Of course, when programming resource scarce micro-controllers with applications that are either a) too simple to warrant the effort to write re-usable code or b) too large to fit comfortably within the limits of the resources, the rules are different.

The tool availability for *many* cpu architectures is present in the form of GCC.

I suspect there will always be a larger group of unskilled versus skilled labor, and I can understand the attraction for business. However, I have also witnessed the effect that this has on the business when it assumes that a programmer is a programmer, ... and it isn't pretty ;-)

I assume that you mean fluency with other cpu/os architectures.

You gain the knowledge of other cpu architectures only if you are able to test that the device driver actually runs on other cpu architectures.

The same peripheral code for the same OS may not work, for example, on an out-of-order cpu, unless you already understood the abstract needs of a more general cpu model.

What is gained instead, is knowledge of using a particular peripheral on a particular cpu/os architecture.

There is even more to be gained by writing and testing the same driver for a variety of cpu architectures and operating systems.

Doing so exposes and separates the assumptions from the abstractions about the cpu architecture (and the programming language) in the same way that traveling to another country/culture can challenge your belief systems.

Assuming, of course, that the devices actually share an abstraction (e.g. byte stream.) It might be difficult to substitute a real-time clock for an ethernet controller ;-)



Chris Quayle 20 years ago

Agreed, though in the real world, you tend to concentrate on core areas that bring in the work. This doesn't mean being blinkered or making no attempt to keep up with developments, but embedded is a broad church, so you become selective: intimate knowledge in areas that are relevant or sometimes just interesting, global overview on the rest. Of course, if you are an academic and/or have lots of free time, I guess that doesn't apply. At the high end, systems programming on Unix, VMS or Linux can be absorbing, but it's pretty remote from the average embedded design, even if much of the underlying theory is in common.

Much embedded design is in this class. Legacy hardware with applications too complex for the available resources = all too common. Penny pinching on the hardware often means penalty in time to market, difficult to maintain code base and marginal performance against spec. As for code reuse, it's (IMHO) worthwhile for any size system and the gains start after the first project.

Client management do sometimes assume that software is easy. Electronics hardware companies build a first micro-based (PIC?) project with a few pages of asm, then assume that all micro projects are just as easy.

But that's true for any new endeavor - curiosity and the learning curve while you find all the wrinkles. That's part of the challenge and fun of it :-)...

Chris



atte.kojo 19 years ago

I would strongly advise never doing this with I/O registers unless you are really sure what you're doing and have read the processor manual at least three times (and memorized the section containing register descriptions). Most microcontrollers use so-called write-only bits in I/O control registers to save die area. These bits usually read back as '1'. So, when you do something like this

foo->reg |= 0x40; or foo->reg &= ~0x40;

you also set all the write-only bits to '1' in the process. This happens regardless of whether the operation is done in software or using a bit set/clear instruction of the processor.

Those kinds of bugs are a real hoot to track down, I can tell you ;-)
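The failure mode can be reproduced with a small host-side model. This is a sketch under assumptions: the register layout (`WRITE_ONLY_MASK`) and all function names are hypothetical, and the shadow-copy workaround shown is a commonly used technique, not something proposed in the post itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WRITE_ONLY_MASK 0x0Fu  /* hypothetical: low 4 bits are write-only */

/* Model of a control register whose write-only bits read back as 1. */
typedef struct {
    uint8_t latched;  /* what the hardware actually holds */
} ctrl_reg_t;

static uint8_t reg_read(const ctrl_reg_t *r)
{
    return (uint8_t)(r->latched | WRITE_ONLY_MASK);
}

static void reg_write(ctrl_reg_t *r, uint8_t v)
{
    r->latched = v;
}

/* Naive read-modify-write: also writes 1 to every write-only bit. */
static void set_bits_naive(ctrl_reg_t *r, uint8_t mask)
{
    reg_write(r, (uint8_t)(reg_read(r) | mask));
}

/* Shadow copy: keep the intended value in RAM and never read the register. */
static void set_bits_shadowed(ctrl_reg_t *r, uint8_t *shadow, uint8_t mask)
{
    assert(r != NULL && shadow != NULL);
    *shadow = (uint8_t)(*shadow | mask);
    reg_write(r, *shadow);
}
```

Starting from a register holding 0x00, the naive version leaves 0x4F latched after setting bit 0x40, while the shadowed version leaves 0x40.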


42Bastian Schick 19 years ago

This does not work on ARM. ARM is a load/store architecture, so it needs some registers (2 in this case) to write to an I/O register.

So if you set the pin first, portSAVE_CONTEXT() saves registers that have already been destroyed.

Just set the pin after storing the registers and reset it before restoring them.

Then take the time for the save and restore into account.
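The ordering suggested above can be sketched with host-side stand-ins. All names below are hypothetical: on the target, `save_context()` and `restore_context()` would be portSAVE_CONTEXT() and portRESTORE_CONTEXT(), and `pin_high()` / `pin_low()` would be single writes to the port's set and clear output registers (PIO_SODR and PIO_CODR on the AT91SAM7S PIO controller).

```c
#include <assert.h>
#include <stddef.h>

/* Host-side stand-ins that only record the order of events. */
static char trace[8];
static size_t trace_len;

static void record(char event)
{
    assert(trace_len < sizeof trace - 1u);
    trace[trace_len++] = event;
    trace[trace_len] = '\0';
}

static void save_context(void)    { record('S'); }
static void restore_context(void) { record('R'); }
static void pin_high(void)        { record('H'); }
static void pin_low(void)         { record('L'); }
static void switch_task(void)     { record('W'); }  /* tick and scheduler work */

/* Shape of the instrumented tick handler. */
static void preemptive_tick_isr(void)
{
    save_context();     /* registers are safe to use from here on */
    pin_high();         /* scope marker: start of measured window */
    switch_task();
    pin_low();          /* scope marker: end of measured window */
    restore_context();  /* nothing may clobber registers after this */
}
```

The pulse seen on the scope then covers only the work between the two markers, so the save and restore times have to be measured or estimated separately and added to it.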

42Bastian Do not email to bastian42@yahoo.com, it's a spam-only account :-) Use @monlynx.de instead !
