RTOS, Virtualization and Paging

Hi folks,

I have a question about memory protection in kernel mode on x86 processors.

According to Intel's documentation: "In an Intel VT-enabled system, running a guest with paging disabled reduces the isolation of that guest - the guest can read and write all of the physical memory, including that of devices or other guests."

Here are a couple of questions:

  1. Why?
  2. Are kernel mode applications protected by the MMU? I am quite confused because I often hear that "a kernel mode process can write to any memory and any device port". Does that mean the MMU works only for user-land applications? Is there no memory protection in kernel mode?
  3. If I use segmentation (with paging off), then will it be sufficient to protect my processes (tasks)?

Thanks.

Alex

Reply to
Alex

Because of the very reason that you quoted.

On x86, when paging is enabled the MMU translates every memory access through page tables. 32-bit linear addresses (segment base + offset) are translated into 32-bit (36-bit with PAE) physical addresses through multi-level page tables. Along with the corresponding physical address for each page, the page tables keep attributes, of which the following are relevant to protection: user/supervisor level, read-only/read-write, and present/non-present.

A user-mode application (CPL 3) can read and write only pages marked as "user level". It is also restricted by the "read-write" attribute. Kernel-mode software is usually not affected by these settings, but it cannot access pages marked "non-present", neither for read nor for write.

Usually, when a multi-tasking system uses paging, it sets up a separate set of page tables for every task, creating address spaces. Thus memory that belongs to one task will not be visible in another task's address space - its physical addresses will not be mapped with the "present" attribute in the other task's page tables. The exception is shared memory, which is deliberately mapped into all tasks that need it.

The assertion "kernel mode process can write to any memory and any device port" is not true as stated. I/O ports indeed cannot be restricted for kernel-mode code, but memory certainly can (and device-mapped memory, too). However, some OSes that run applications in user mode and only the OS kernel and drivers in kernel mode map the entire physical memory (or as much of it as fits) starting from some fixed offset. With such a mapping the claim does become true, but only due to that particular system's memory management.

With segmentation (paging off) you can achieve much the same protection against erroneous access as with paging. Against deliberate access it is weaker: kernel-mode code can always modify the segment tables and create a flat data segment to access everything. Not so with paging: even though the code can find out the physical addresses of the page tables, it will not be able to guess which linear addresses have to be used to access them - the linear-to-physical translation is generally not a reversible function.

Reply to
Stargazer

Stargazer,

Thanks a lot.

I would appreciate if you could clarify a little bit the following:

For system service speed and interrupt response time, the kernel is mapped into every process's virtual address space, so when an interrupt is asserted the processor does not have to switch MMU page tables to service it. As I understand it, this mapped kernel code is shared among different processes/tasks. That way, a poorly written driver can always destroy the whole system, not just its own application. Is there any way to restrict such a driver to its application's process/task only? How can memory (and device-mapped memory) be restricted for kernel-mode code?

As I have just mentioned, if you don't map the OS kernel into every address space, you will have a very sluggish system, because it will need to walk the page tables for many time-critical kernel services. What could be an alternative to this approach? Could you provide examples of such OSes?

Thanks.

Alex

Reply to
Alex

That's a common way to do things in many OSes. Note that there is usually a distinction between processes and tasks: tasks are the scheduler's entities, which get CPU time according to some rules, while processes are the resource manager's entities, which get address spaces, file descriptors and other resources. In many OSes a process may have one or more running tasks (threads).

But this is not the only possible design.

There is the microkernel (nano/pico-kernel) approach to this: drivers run as user-mode processes. Such a process is granted access only to the resources (ports, memory) that belong to its device. This way the system is not affected by a malfunctioning driver or even device. Of course, this is paid for with increased interrupt-handling latencies - an IRQ handler or bottom half will consume two context switches - but there's free cheese only in a cage :-)

On recent x86 CPUs ("recent" actually being 486 and later) a context switch's cost may be eased by mapping the core kernel in global pages, which are not flushed from the TLB during a task switch.

You're mixing things up a bit: in your previous post you wrote about applications in kernel mode (which in many embedded OSes do run in kernel mode and may at best be somewhat isolated with address spaces, as I described), and now you are talking about the kernel. With any design, some kernel core that includes at least the scheduler, the memory manager and the CPU/MMU management code will have to be globally mapped. However, drivers of any kind may be made user-mode or address-space-separated kernel processes. QNX and Integrity are examples of such OSes.

I am not a big fan of microkernels, especially in embedded systems, where in many cases the failure of one device is important enough to render the whole system unusable. I think the safety achieved with user-mode device drivers is not that valuable, while the latency costs are inherent. Advocates of the approach claim that the latency costs are not that big, and that existing monolithic/kernel-only OSes are written so poorly that microkernels can beat them on latency anyway.

I found the following third-party benchmark:

formatting link
Well, if a monolithic, kernel-only VxWorks indeed wastes 13 us on interrupt reporting and 19 us on a task switch on a 300 MHz CPU (meaning about 4000 CPU cycles?!), such claims are not impossible.

Ironically, among popular desktop OSes - where, contrary to embedded, latencies are much more tolerable, safety is much more in demand, and the user in most situations would appreciate an OS surviving a single device failure rather than bluescreening - there are no microkernel OSes.

Reply to
Stargazer

Stargazer,

Thanks a lot again. Good explanations. Now (almost) everything is clear.

Alex

Reply to
Alex
