Paging/Segmentation: How really are they implemented?


Maria 20 years ago

Hi,

I have read about paging, segmentation and paged segmentation, and I believe I have (nearly) understood how these techniques are implemented in hardware. However, I am still confused about some details, and I would highly appreciate your assistance with them.

1- When using pure paging with a page size of 4KB=2^12, each page must start at a 4KB boundary in main memory. There is no similar restriction with segmentation, since segments don't have a pre-set size. What about paged segmentation? Should segments be located at 2^12 boundaries, since each segment is now a set of pages [assume the page size is equal to 2^12]?

2- How can we as users choose any of the above techniques? Is there any register to be set by the compiler (or the linker or loader)?

3- From what I read, segmentation requires the use of indirect addressing in assembly, where each address contains two fields [segment register: offset]. If we are in real mode, the segment register content [after probably right shift] determines the starting address of the segment in main memory, and the offset field represents the offset within the segment in physical memory. However, if we are using protected mode, the content of the segment register points into a segment table which includes the starting address of the segment. Am I right?

4- Now regarding protected mode/real mode. Are they part of the CPU modes which define the execution mode? How are they related to user mode and kernel mode?

5- Is the segment table part of the CPU architecture?

6- Who sets the values of the segment registers and the segment table? I presume the kernel. Does the kernel decide on behalf of the user which INDEX VALUE the code/stack/data/etc. segment is given, and set the content of the segment table entries [including in particular the starting address of the segment in main memory and its size] accordingly? I presume the segment table content should be saved/updated each time a segment is relocated?

7- While using my computer I sometimes encounter system messages [like protection errors] showing an address similar to 0FFF:XXXX, which would indicate a very high number of segments in an application. That is a highly unlikely situation, as the number of segments in an application tends to be moderately small. So what does 0FFF stand for, and why is it so high?

8- Actually, how does the CPU differentiate between the following 3 instructions [in protected mode]?

Load 3, CS:XXX
Load 3, DS:XXX
Load 3, SS:XXX

Will each of the above instructions be translated to a different numerical code depending on the type of segment?

If this is the case, why not skip this "unnecessary" step by permanently binding each segment register to a fixed entry in the segment table? Maybe this has not been done because an application might have a larger number of segments than the available CPU segment registers. As such, a particular register will be used for more than one segment, and its content should be used to index the segment table. Am I right?

9- The last question about pure segmentation. You can see from the figure available at this link

formatting link

that the address is considered as one unit value, instead of two fields. I have seen that in many other references, and even in exam papers, students are asked to find a physical address (in pure segmentation) for a particular virtual address. The virtual address given is simply a one-field hexadecimal value, for example 0x43, instead of the two fields with which pure segmentation is described

formatting link

I understand that on paper we have to find the segment index and offset by splitting the address (0x43) into two fields. However, why is the address considered as only one field instead of two? Will the CPU append the CS content to XXXX when it encounters an instruction similar to load 3, CS:XXXX?

Many thanks for your help, and sorry for the long message.

Regards


Keith Thompson 20 years ago

[snip]

None of this seems to be related to the C programming language. Please drop comp.lang.c from any followups. Thanks.

Keith Thompson (The_Other_Keith) kst-u@mib.org San Diego Supercomputer Center We must do something. This is something. Therefore, we must do this.


Vadim Barshaw 20 years ago

If you are talking about an application program, it's probably nothing you should (or could) fiddle with. The operating system will set up all the necessary memory environment for you.

You didn't specify what processor architecture or OS you have in mind, so there might be other possibilities.

Not necessarily. Again, you said nothing about the architecture, but from what you write below I assume it is IA32.

In real mode the notation is XXXX:YYYY, where XXXX is the value in the segment register (CS, DS, ES, SS) and YYYY is the offset within the segment. The effective address is calculated as (XXXX << 4) + YYYY.
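The real-mode calculation can be sketched in a few lines. This is only an illustration in Python; the helper name is made up. It assumes the usual 8086 rule that the 16-bit segment value is shifted left by four bits before the 16-bit offset is added, and it does not model the 20-bit wrap-around of the original 8086.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address for a real-mode segment:offset pair (hypothetical helper)."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # The segment value is multiplied by 16 (shifted left 4 bits),
    # then the offset is added.
    return (segment << 4) + offset


# Different segment:offset pairs can name the same physical byte.
print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
print(hex(real_mode_address(0x1235, 0x0000)))  # 0x12350
```

Because segments start every 16 bytes but span 64 KB, they overlap heavily, which is why one physical byte has many XXXX:YYYY names.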

The only common thing is the word "mode"; they are not related at all. The former concept comes from a particular architecture, IA32 (and there is even an "unreal mode"). The latter terms are probably from the Linux operating system -- the processor runs in protected mode, with user programs being executed at privilege level 3 (Ring3) and kernel modules running at level 0 (Ring0). Again, these terms are from IA32; other architectures may have different names for privileged and non-privileged processor modes, as well as a different number of them. Usually, CPU instructions that access the MMU can only be executed in the most privileged mode.

No, it is an entity that is located in memory and initialised by some piece of code, like the BIOS or part of an OS. But the MMU _is_ part of the CPU architecture; its registers are initialised by the same piece of software.

Generally yes, if you are speaking of Linux. In other OSes it can have some other name - in QNX it is microkernel. But the idea is the same.

Depends on the OS. Say, in QNX you can specify how much stack space should be allocated to the process, and the OS will add the required entry to the page table. But in general, as a user of an OS, you cannot specify _where_ your application should be loaded and what particular memory addresses it should allocate. In general. Again, it depends on the OS, if there is an OS at all -- it can be just a single module that has all the system/user code and data, with statically allocated and initialised descriptor tables. The boot loader would only need to initialise the MMU registers properly.

Yes, since the table entry generally contains the starting address of the memory page.

If it is IA32, 0x0FFF is the selector. AFAIR, the lower two bits are the requested privilege level (in this case, 3), the next bit selects the GDT or the LDT, and the remaining upper bits are the index into the descriptor table. Your system does not necessarily have 0x1FF (0xFF8/8) segments allocated; most descriptors might be blank.
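Going by the selector layout in Intel's IA-32 documentation (bits 0-1 requested privilege level, bit 2 table indicator, bits 3-15 index), the value from the question can be taken apart as below. This is just an illustrative sketch; the function name and the dictionary keys are invented here.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit IA-32 segment selector into its fields (hypothetical helper)."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must be a 16-bit value")
    return {
        "index": selector >> 3,                      # bits 3-15: descriptor table index
        "table": "LDT" if selector & 0x4 else "GDT", # bit 2: table indicator
        "rpl": selector & 0x3,                       # bits 0-1: requested privilege level
    }


print(decode_selector(0x0FFF))  # {'index': 511, 'table': 'LDT', 'rpl': 3}
```

So 0x0FFF does not mean "segment number 4095"; the index part is 0x1FF, and an index only says which slot of the table is used, not how many slots are filled.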

IA32 does not have such instructions. Depending on what you mean by "Load 3", the processor might or might not use a prefix that specifies which segment register to use. Instructions are always fetched via CS, the stack (push, pop) is accessed via SS, and data is by default loaded via DS.

If you mean something like "mov eax, [addr]", it is by default "mov eax, [ds:addr]". If you write "mov eax, [cs:addr]" or "mov eax, [ss:addr]" you will load four bytes from the code segment or stack segment, respectively. In the non-default case, there will be a one-byte prefix in the instruction.

Yes and no (see above). Just use the assembler of your choice to see what code is being generated. There was even a technique that confused some early disassemblers -- specifying several consecutive prefixes in the binary code. Or was it a processor bug, where 15 CS: prefixes caused a processor hangup? Anyone?
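To show where the one-byte prefix ends up, here is a small sketch that builds the machine code by hand. It assumes 32-bit mode with default operand and address sizes, uses opcode A1 (MOV EAX, moffs32) and the standard segment-override prefix bytes; the function and table names are made up for this example, and a real assembler may pick a different but equivalent encoding.

```python
import struct

# Segment-override prefix bytes as listed in the IA-32 instruction format.
SEGMENT_PREFIX = {"es": 0x26, "cs": 0x2E, "ss": 0x36, "ds": 0x3E, "fs": 0x64, "gs": 0x65}


def encode_mov_eax_mem(addr: int, segment: str = "ds") -> bytes:
    """Encode `mov eax, [segment:addr]` for 32-bit mode (hypothetical helper)."""
    # A1 = MOV EAX, moffs32, followed by the 32-bit address, little-endian.
    body = b"\xA1" + struct.pack("<I", addr)
    if segment == "ds":
        # DS is the default for this access, so no prefix is emitted.
        return body
    return bytes([SEGMENT_PREFIX[segment]]) + body


print(encode_mov_eax_mem(0x1000).hex())        # a100100000
print(encode_mov_eax_mem(0x1000, "cs").hex())  # 2ea100100000
```

The opcode itself is the same in both cases; only the extra leading byte tells the processor to use CS instead of DS.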

Not sure what you mean, but the OS loader (the thing that allocates memory pages, loads the code into memory, adds entries to the descriptor table) does exactly this - it "permanently" fixes the page table for the duration of the process existence in memory (with all its code, data and stack).

No. The selector register is an index into the descriptor table. The entry in the table might specify a memory page or point to a second-level descriptor. So, CS and DS refer to the code and data segments respectively, while the layout of these segments is defined by the descriptors in the table(s). Further, not all memory might be allocated at once -- allocated but unused pages can be swapped out to storage media (a hard disk, for example), while new pages might be allocated on demand.
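A toy model may make the "selector is an index" idea more concrete. It is a deliberately simplified sketch: a descriptor is reduced to a base and a limit, whereas real IA-32 descriptors also carry access rights and other flags, and paging may translate the resulting linear address further. All names and table contents below are invented for illustration.

```python
from typing import List, NamedTuple, Optional


class Descriptor(NamedTuple):
    base: int   # linear address where the segment starts
    limit: int  # highest valid offset within the segment


class ProtectionFault(Exception):
    """Raised when a selector or offset is not valid (stand-in for a CPU fault)."""


def translate(table: List[Optional[Descriptor]], selector: int, offset: int) -> int:
    """Map selector:offset to a linear address using a toy descriptor table."""
    index = selector >> 3  # drop the table-indicator and privilege bits
    if index >= len(table) or table[index] is None:
        raise ProtectionFault(f"no descriptor at index {index}")
    desc = table[index]
    if offset > desc.limit:
        raise ProtectionFault(f"offset {offset:#x} beyond limit {desc.limit:#x}")
    return desc.base + offset


# Entry 0 is left empty, as the null descriptor is in a real GDT.
toy_gdt = [None, Descriptor(0x00100000, 0xFFFF), Descriptor(0x00200000, 0x0FFF)]

print(hex(translate(toy_gdt, 0x08, 0x43)))  # 0x100043
```

A segment register can be reloaded with a different selector at any time, so a handful of registers is enough to reach any number of table entries.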

I'll skip it. The only remark is that the slides in question reference the works of "Silberschatz & Galvin" here: . Amazon suggests the following reading: . Oddly enough, all their OS Concepts books have dinosaurs on the cover. Is it because they reference MULTICS in the lectures?

You are welcome.

HTH,

Vadim

PS: I prefer Guinness (extra cold please :)
