factors affecting context switch time

Just to add some extra information: ARM processors starting with the v6 architecture (ARM1136 and later cores) have tagged TLBs and VIPT (Virtually Indexed, Physically Tagged) caches, so no flushing is required at a context switch. The drawback of physically tagged caches is that they require a TLB look-up to get the physical address before the cache can be accessed, though this may make little difference on modern pipelined processors.

There is another thing to consider for OSes like Linux (not that Linux can be used for hard real-time): application code and read-only data pages are loaded from the filesystem on demand. That is, the application initially starts with only a few pages loaded/mapped, and when execution branches to a location in a different page, the kernel traps the prefetch abort and loads the new page into memory, mapping it into the task's address space (on some architectures this requires flushing the whole TLB). This can cause significant delays. Another case is malloc'ed memory, which Linux doesn't really allocate until it is accessed (you can use calloc instead, which forces the write). Even with swap disabled, Linux on MMU systems can evict read-only pages from RAM if it runs short of available memory.

For some ARM cores (pre v6 architecture), the MMU or MPU (Memory Protection Unit) needs to be enabled to be able to use the caches.

--
Catalin
Reply to
Catalin Marinas

"vivekian" wrote in message news: snipped-for-privacy@g47g2000cwa.googlegroups.com...

All of the above!! Kernel version is most important, then CPU.

Basically you need to measure.

Amongst the tools are the "Linux Trace Toolkit", which will give a detailed trace of the kernel; "Oprofile", which will give a more summary/statistics view; and a user-side app called "Hourglass", which might be close to what you need (if you use 64 bit, you will have to fix it).

I did some measurements just today and I am not sure I believe them yet: according to those it takes 50 usec on a 4-way Opteron box. Hmmm, nah, no way!!! Or maybe that is why the papers on latency dried up around 2003-2004 as the 2.6 kernels began to work: the problem went away.


maybe you want schedutils to play with cpu affinity

formatting link
(it is a package for redhat/debian/suse)

irq affinity can be set by catting a CPU mask into "/proc/irq/<n>/smp_affinity"

hope this helps.

Reply to
Frithiof Andreas Jensen

Though this does explain a lot about how pages are loaded into RAM, I can't correlate it to the effect it has on context switch time -- does it make it slower?

Also, another question -- maybe I should post it as a different thread. There are some lines of C++ code for which the number of clock cycles has to be measured. At present I am using the RDTSC instruction. Is there some way to make sure that the code runs uninterrupted so that a true picture is available? Or is there some tool available to measure how much time the code takes to run?

thanks, vivekian

Reply to
vivekian

Just ignore ludicrously high values.

DS

Reply to
David Schwartz

[ I limited this post to the "embedded" and "realtime" groups. ]

You can find some references to some such tools at

formatting link
Don't know if they are exactly the kind of thing you need, though; mostly they are aimed at smaller systems with small real-time kernels.

--
Niklas Holsti
Tidorum Ltd
Reply to
Niklas Holsti

The effect would be similar to what cache/TLB flushing during context switch does: The observed end-to-end context switch time (i.e. time between last instruction in one task and first instruction in another) would not be affected, but execution of user-level code would slow down, or, more precisely, execution of user-level code would pause for some time at unpredictable points in time. Compared to the effects of TLB/cache reloading, delays caused by demand paging tend to have a more coarse granularity, so are more visible, but the effect is more or less the same (user code slowing down).

Obviously, this can only happen in an MMU based system which, as you suggested earlier, does not seem to be your primary focus (but then, why are you posting to so many linux related newsgroups?). Also, in Linux, there are the mlock() and mlockall() syscalls which can be used to avoid delays due to demand paging.

What is the goal here? I suspect that the motivation for such a benchmark is to be able to predict the timing behavior of some piece of code, but for that, you have to consider the real-world conditions for which this prediction should be made. Also, what exactly do you want to determine: worst-case execution time or some average execution time?

To determine average execution time for a piece of code, just run it many times so the presumably few cases of the code being interrupted are evened out. Or, as David suggested, ignore the "freak" values. (But here we go again: is being interrupted occasionally part of the real-world situation the code will eventually run in? If so, ignoring those "freak" values would be wrong.) The average time that you can determine this way may be used as an input for some heuristic reasoning, but keep in mind that the worst-case execution time that same piece of code may exhibit can easily exceed the average by several orders of magnitude!

Rob

-- Robert Kaiser email: rkaiser AT sysgo DOT com SYSGO AG

formatting link
Klein-Winternheim / Germany
formatting link

Reply to
Robert Kaiser


Context switching times should still be a minor overhead, even in realtime systems. The difference in this case, I would argue, is the OS, not the hardware. A good RTOS that performs well on one hardware platform will likely perform well on another platform. But a poor RTOS on one platform will not improve much even when moved to a "better, faster" platform.

If you are just talking about an embedded realtime system, then I think the cross-posts to Linux OS groups are not appropriate. I'll go over to the realtime group to follow this further.

Ed

Reply to
Ed Prochak

All very interesting, but not as enlightening as browsing through the pages you could get from tossing "rdtsc instruction" at a search engine.

--
JosephKK
Against stupidity the gods themselves contend in vain.
Reply to
Joseph2k
