Page sizes

Are there any processors/PMMUs for which the following would be true (nonzero)?

(pagesize - 1) & pagesize

Re: Page sizes
On 27.06.2020 06:45, Don Y wrote:

That would imply that the page size is not an integral power of 2.
Very unlikely.

Regards
--
Bernd


Re: Page sizes
On 6/26/2020 11:33 PM, Bernd Linsel wrote:

Yes, that was the point of the question.


As, for my needs, this is one of those "fundamental assumptions", I'm
looking for more assurance than "very unlikely" -- just as one wouldn't
pick one SPECIFIC page size and assume it to be ubiquitous (or even
expect a single page size to be supported at any given instant)  :>

But, I don't see any VALUE in other sizes as it needlessly complicates
any *hardware* implementation (though poses little barrier to a software
implementation!)

Re: Page sizes
On 27.06.2020 09:32, Don Y wrote:

Commonly (in all current mainstream processor architectures like IA-32,
AMD64, ARM, MIPS, etc.) an MMU divides a logical address at a bit
boundary into a page address and an offset.
As a result, the page size on these platforms is always a power of two.

Yes, there _may_ exist some exotic MMUs that let you choose protection
areas (to avoid the term 'pages') with arbitrary base addresses and
sizes.  That flexibility requires substantially more hardware effort and
cost, and it complicates an OS's memory management, so it's unlikely to
be used at all.
One example was the i286/i386's Protected Mode segments, but even there
the granularity was 4K/1M, so the assertion 'segment base address is a
power of two' was still true; you just couldn't be sure each segment had
the same size.  Setting up and maintaining the segment descriptor tables
was so complicated that mainstream OSs on the i386 (NT, Linux) only set
up the most necessary segments and went on using a flat 4GB address
space and the page tables of the additional MMU.
Furthermore, using segments slowed down hardware memory accesses
considerably, which is why the '486 and its successors added segment
descriptor caches etc.

Conclusion: No, you cannot fundamentally assume that page sizes on any
existing MMU are powers of two.  Hardware designers can implement
whatever weird and complicated addressing patterns they like.


Re: Page sizes
On 6/27/2020 3:21 AM, Bernd Linsel wrote:

Yes, that's what my survey has produced.  There are "preferred" page sizes
across architectures, and the range of sizes is (no doubt) constrained by
practical implementation issues.


But this wasn't always the case.  Much of the "adventurism" that was
prevalent in CPU design in the 80's seems to have been winnowed down
("electronic Darwinism?") to the fixed page size implementations that
are commonplace, today (esp wrt devices supporting DPVMM).

And, as an implicit acknowledgement that this isn't "quite sufficient",
we see the introduction of superpages, subblocks, page size choices,
etc. to further complicate the mess.

All targeted to increase TLB reach as working sets get larger.


Yes, but -- as above -- the trend seems to be towards reducing page-size
choice (flexibility) in the hope that performance hits can be mitigated with
larger TLBs (or smarter resource scheduling).

On the surface, this may (?) be the right approach -- barring a fundamental
change in how developers approach system/application development.  It's
certainly one that silicon developers can more easily wrap their heads around!

Re: Page sizes


Minor quibble:

You can't assume the minimum protection zone is power-of-2, but some
systems separate the notion of the protection zone from the allocation
unit.

Every MMU I am aware of has allocation / management units that are
power-of-2.

George

Re: Page sizes
On 27/06/2020 09:32, Don Y wrote:

If that was the point, why didn't you write that?  Writing a C
expression like that is appropriate if you are writing C code, but it
would make more sense to write what you mean - and /why/ you are asking.



You might conceivably get an answer "this cpu has non-power-of-two page
sizes".  But you'll never get anything better than "very unlikely" the
other way.  You can't expect anyone (or even a combination of people) in
this newsgroup to have used /every/ cpu design ever made, or that ever
will be made.

If you say why you think page sizes are relevant, however, then it's
conceivable that people could give you more useful answers.




Re: Page sizes
On 28/6/20 1:21 am, David Brown wrote:

David, it is Don Y you're addressing.

Re: Page sizes
On Sat, 27 Jun 2020 00:32:00 -0700, Don Y


Power-of-2 pages are ubiquitous in available hardware.

4KB is directly supported by almost all CPUs.  Some allow smaller
pages, many allow larger pages, but (excepting Alpha and UltraSparc)
it's hard to find something that can't work with 4KB (perhaps by
combining smaller pages).

https://en.wikipedia.org/wiki/Page_(computer_memory)#Multiple_page_sizes
https://en.wikipedia.org/wiki/Memory_management_unit#Examples

George

Re: Page sizes
On Fri, 26 Jun 2020 21:45:54 -0700, Don Y


On a decimal (BCD) computer, the page size might have been 10 or 100
words.

However, I very much doubt that those ancient computers had MMU hardware.


Re: Page sizes
On 6/26/2020 9:45 PM, Don Y wrote:
 > Are there any processors/PMMUs for which the following would be true
 > (nonzero)?
 >
 > (pagesize - 1) & pagesize

I'm curious why it's important to you. A nice binary round number is  
easy to work with on the programming side and simplifies the silicon  
design side.

JJS

Re: Page sizes

Hi Don,


On Fri, 26 Jun 2020 21:45:54 -0700, Don Y


Not anything you can buy.

George

Re: Page sizes
Hi George,

Hope you are keeping well...  bad pandemic response, here; really
high temperatures; increasing humidity; and lots of smoke in the
air (but really cool "displays" at night!)  :<   Time to make some
ice cream and enjoy the ride!  :>

On 6/27/2020 2:37 PM, George Neuner wrote:

I'm wondering if some of the "classic" designs might scale to newer
device geometries better than some of the newer architectures?

E.g., supporting ~100 (variable sized) segments concurrently and
binding each to a particular "object" (for want of a better word).
If the segment management hardware automatically reloads (in a manner
similar to the TLB's functionality), then this should yield better
(or comparable) performance to the fixed page-size approach (if
you assume the fixed pages poorly "fit" the types of "objects"
that you are mapping).

[I think we discussed this -- or something similar -- a while ago]

You still have a "packing problem" but with a virtual address space
per process, you'd only have to address the "objects" with which a
particular process interacted in any particular address space.
And, that binding (for PIC) could be done at compile time *or*
load time (the latter being more flexible) -- or even RUN-time!

Re: Page sizes
On Sat, 27 Jun 2020 15:10:22 -0700, Don Y


About ~10 years ago  8-)

But you asked about "pages" here, which invariably are fixed-size
entities.  Arbitrarily sized "segments" are a different subject.

If you want a *useful* segmenting MMU, you probably need to design it
yourself.  Historically there were some units that did it (what I
would call) right, but none are scalable to modern memory sizes.

Whatever you do, you want the program to work with flat addresses and
have segmentation applied transparently (like paging) during memory
access.  You certainly DO NOT want to follow the x86 example of
exposing segments in the addressing.



George

Re: Page sizes
On 6/27/2020 3:35 PM, George Neuner wrote:

Yes -- but you note that some "modern" CPUs now allow multiple (fixed)
page sizes to coexist in the same address space.  So, it's a matter
of degrees...


Agreed.  Segments were meant to address a different problem.

OTOH, exposing them to the instruction set removes any potential
ambiguity if two (or more) "general purpose" segments could
overlap at a particular spot in the address space; the opcode
acts as a disambiguator.

The PMMU approach sidesteps this issue by rigidly defining where
(in the physical and virtual address spaces) a new page CAN begin.
It's bank-switching-on-steroids...

[IIRC, I had previously concluded that variable sizes were impractical
for reasons like this]

Re: Page sizes
On Sat, 27 Jun 2020 16:36:59 -0700, Don Y


Not really.  Allowing this process to have 4KB pages and that process
to have 16KB pages and yet a third process to have 1MB pages (or
whatever) is light years from allowing this process to have 109 bytes
here and 3002 bytes there and that process to have 1061 bytes of which
53 overlap the other process's memory but with different protection.

That isn't "paging".  Segmenting MMUs could/can do sh-t like that, but
most don't provide enough segments - per process or in total - to make
it worthwhile to subdivide memory at such fine granularity.  Only Mill
claims this capability at sufficient scale for a large memory ... but
you can't buy a Mill.



???  Not following.




The problem is that you're thinking only about the protection aspect
... it's the subdivision management of the address space that is made
slow and difficult if you allow mapping arbitrarily sized regions.

You have to separate the concerns to do either one efficiently.

That's why pure segment-only MMUs quickly were superseded by
combination page+segment units, with segmenting relegated to protection
while paging handled the address space.  And now many CPUs don't even
bother with segments any more.

George

Re: Page sizes
On 6/27/2020 10:01 PM, George Neuner wrote:

In a large, flat address space, it is conceivable that "general purpose"
segments could overlap.  So, in such an environment, an address presented
to the memory subsystem would have to resolve to SOME particular physical
address, "behind" the segment hardware.  The hardware would have to resolve
any possible ambiguities.  (how do you design the HARDWARE to prevent
ambiguities from arising without increasing its complexity even more??).

If, instead, the segments are exposed to the programmer, then the
choice of opcode determines which segment (hardware) is consulted
to resolve the reference(s).  Any "overlap" becomes unimportant.


Perhaps you missed:

    'You still have a "packing problem" but with a virtual address space
    per process, you'd only have to address the "objects" with which a
    particular process interacted in any particular address space.
    And, that binding (for PIC) could be done at compile time *or*
    load time (the latter being more flexible) -- or even RUN-time!'

You have N "modules" in a typical application.  The linkage editor mashes
them together into a single binary to be loaded, ensuring that they don't
overlap each other (d'uh!).  Easy-peasy.

You have a comparable problem with each segment representing a
discrete "object" being made to coexist disjointly in a single
address space.

If the "objects" never change, over time, then this is no harder to
address than the linkage editor problem (assuming any segment can
being at any location and have any size).  Especially for PIC.

But, if segments can be added/removed/resized dynamically, then
you're essentially dealing with the same sort of fragmentation
problem that arises in heap management AND the same sort of
algorithm choices for selecting WHERE to create the next requested
segment (unless you pass that off to the application to handle
as IT knows what its current and future needs will be).


The advantage that fixed-size pages offer (even if there is a selection
of sizes to choose from) is that each page has a particular location
into which it fits.  You don't have to worry that some *other* page
partially overlaps it or that it will overlap another.

But, with support for different (fixed) page sizes -- and attendant
performance consequences thereof -- the application needs to hint
the OS on how it plans/needs to use memory in order to make optimum
use of memory system bandwidth.  Silly for the OS to naively choose
a page size for a process based on some crude metric like "size of
object".  That can result in excessive resources being bound that
aren't *typically* USED by that object -- fault in those portions AS
they are needed (why do I need a -- potentially large -- portion of
the object residing in mapped memory if it is only accessed very
infrequently?)

OTOH, a finer-grained choice (allowing smaller pieces of the object
to be mapped at a time) reduces TLB reach as well as consuming OTHER
resources (e.g., TLB misses) for an object with poor locality of
reference (here-a-hit, there-a-hit, everywhere-a-hit-hit...)

So, there needs to be a conversation between the OS and the application
regarding how, best, to map the application onto the hardware -- with
"suitable" defaults in place for applications that aren't aware of
the significance of these issues.  This is particularly true if the
application binary can be hosted on different hardware -- or
MIGRATED to different hardware while executing!

Obviously makes sense to design that API in a way that is only as
general as it needs to be; WHY SUPPORT POSSIBILITIES THAT DON'T EXIST?
(Or, that aren't *likely* to exist in COTS hardware?)  IOW, you
can KNOW that:
     ASSERT( !( (pagesize - 1) & pagesize ) )
for all supported "pagesize", and code accordingly!

Paraphrasing:  "Make something as simple as it can be -- and no simpler"

[Time to check the daily briefing on the fire and then go out and take
a look at it...]

Re: Page sizes
On Sat, 27 Jun 2020 23:50:50 -0700, Don Y


The ancient method for increasing the process address space is to swap
out the whole process into the swap file, increase the size descriptor
and then let the program loader find a new memory area in the physical
memory. Of course this is a heavy operation and should not be done for
every malloc() call :-).

By monitoring physical memory usage, it was easy to determine when a
process extended its virtual memory allocation: you'd observe the
momentary swap-outs.


Re: Page sizes
On 6/28/2020 3:27 AM, snipped-for-privacy@downunder.com wrote:

In a desktop environment, you typically only have to worry about how
patient the user will be.  And, how tolerant he will be if the system
crashes because it got into a "sorry, I can't fix this" condition.

In an embedded environment (c.a.E), the application has to continue to
operate without that possibility of user intervention.  And, often has
to satisfy timeliness guarantees (though this isn't c.R).

All said, you want algorithms and approaches that have a more predictable
degree of success regardless of the "current situation", at the time.

Re: Page sizes

On Sat, 27 Jun 2020 23:50:50 -0700, Don Y


Unless you refer to x86, I still don't understand what "ambiguity" you
are speaking of.

x86 addresses using segments were ambiguous because x86 segmentation
was implemented poorly, with the segment being part of the address.

A segment should ONLY be a protection zone, never an address
remapping. Segments should be defined only on the flat address space,
and the address should be completely resolved before checking it
against segment boundaries.  




Overlap is always important, and the x86 method took that away.

Multiple segments can refer to the same address, and segments can be
safely stacked with the top level permission being the intersection of
all the underlying segments.  At any given time, the program needs
control only over what is the "topmost" segment.




No, I saw your reference to the "packing problem".  My point is that it
isn't a problem unless you try to use segments as the basis for
allocation, or swapping.  So long as segments are only used as
protection zones within the address space, they can overlap in any
way.



You do if the space is shared.



Exactly.  The latency of TLB misses is the very reason for the
existence of "large" pages in modern operating systems.





George
