Control over memory alignment

Hi,

My "unified memory manager" allows the client to specify the alignment for the fragment it is expected to return (this differs from malloc's more general assurances).

A client can (in appropriate circumstances) rely on behavior like:

    region1 = allocate(..., size1, ...);
    region2 = allocate(..., size2, ...);
    ...
    deallocate(..., region1, size1+size2, ...);

(think about it)

Because an alignment request can bias the location of the returned fragment, it seems I also need an "end_alignment" parameter to force appropriate padding into an allocated segment (otherwise the client would have to know the alignment requirements, and their consequences, for each service request).

I.e., if size1 was 25 and 4-byte alignment was requested for both regions, a 3-byte hole would exist between region1 and region2 (there are lots of obvious bad things associated with this :> )

I can't see any other way around this (?)

As a separate issue, should the client interface be confined to force all requests to be page bounded (where possible)? Or, should this be an option available to the client (since the client *could* have implicit knowledge of page wiring)?

Thx,

--don

Reply to
D Yuniskis

That makes the assumption that the allocated regions are sequential, does it not?

Should not an allocation return the amount of memory requested plus any additional needed for alignment? Otherwise the client incurs additional complexity having to deal with undersized allocations. IOW, any padding necessary occurs on allocation. If that means wasting some memory at the beginning, so be it.

Robert

Reply to
Robert Adsett

Correct. Some of the elided parameters govern which heap is used, the criteria used to select the fragment from the free list, whether the fragment is trimmed to size, etc.

An application can implicitly know what the state of the heap ("a" heap) is and, therefore, know how each particular request will be handled.

Correct. An IN/OUT lets the service report the actual size of the allocated fragment. The client tracks this -- so there is no clutter *in* the allocated fragment (as in the classic K&R implementation) nor "behind the scenes".

There are two issues here.

First, if you return a particular fragment and *then* a memalign()-ed pointer, the application has to deal with both of these (i.e., it has to hold the pointer to the fragment start -- for its ultimate release -- plus the aligned pointer). As such, there is no saving over simply taking the fragment pointer and memalign-ing it yourself.

If you return a fragment aligned accordingly, then you risk leaving some small piece of memory -- trimmed from the start of the free fragment -- sitting in the free list after the allocation.

But, the issue I was addressing occurs "after" all this. I.e., if you request a chunk that isn't a multiple of the alignment granularity, the memory *beyond* this fragment will begin at an unaligned address.

I.e., consider the case with a virgin heap and two 4-byte alignment requests for 25 bytes -- the first fragment will be [N,N+24] and the second will be [N+28,N+52], leaving a hole at [N+25,N+27]. If the heap can't represent that tiny "hole" fragment, then the memory manager can't release the [N+28,N+52] fragment but must, instead, release something like [N+32,N+56] (so that the resulting hole is big enough to hold any overhead bytes)

If the caller can specify an "end_alignment" (as opposed to a *start_alignment*), then its knowledge of what it is likely to ask for *next* can alleviate this problem (and force the "hole surplus" to be appended to the first allocation, etc.)
Reply to
D Yuniskis

One way or another you have to track this, either in the memory manager as is conventionally done or in the application as you have apparently decided to do. It wouldn't generally require two pointers unless you want to deal with arbitrarily large alignments (which would seem a bizarre requirement). Indeed, practically speaking, you may be able to encode the block size and misalignment in the same variable if space is an issue. I suspect 4 bits is sufficient for encoding the misalignment offset in a lot of practical cases; I can't imagine needing more than 16 bits.

I don't see how the third allocation is any different from the second.

The biggest issue I see, and you are already well down this road from the looks of things, is that you are essentially moving all of the logic from the allocator to the application, rendering the allocator a rather pointless entity.

Robert

Reply to
Robert Adsett

Correct. No free lunches.

If you track in the memory manager, then you are essentially duplicating effort, since the application is given this information and *does* hold onto it -- even if it doesn't realize it is doing so *explicitly* (e.g., if the application allocates a chunk of memory to use as a pushdown stack for a task, it obviously records the location of that stack *somewhere* so the task can use it; if the application allocates a chunk of memory for a buffer, it has to remember where that buffer *is* in order to be able to use it...)

Difference is the application is now *responsible* for remembering these things (and must do so, accurately). Note that there are times when the application can successfully *forget* these things... for any memory that it wants to permanently retain!

Sure. But, you aren't doing anything *for* the application (client). I.e., allocator could just return *a* pointer and have the application do:

    pointer_to_be_saved = allocate(...);
    usable_pointer = memalign(pointer_to_be_saved, alignment);

Yes, but then you have to give the application a tool to unpack the "misalignment" from the block size (or, hope it never needs to access either of those and just hold onto them for the eventual "deallocate()")

Sorry, I was trying to point out the size of the hole that the allocator would have to create in order to satisfy the request.

I take advantage of the space *in* "free list fragments" to store the overhead for managing those fragments -- unallocated. So, I need to ensure that any fragments that remain in the free list are large enough to support the overhead required to track them *in* that free list. Think of a fragment (while IN the free list) as being:

    typedef struct fragment {
        struct fragment *next;
        struct fragment *previous;
        size_t           size;
        byte             unused[];
    } fragment_t;

So, the allocator must be sure that it doesn't "trim" a fragment from the free list in such a way that the portion retained is smaller than sizeof(fragment_t).

[unless the allocator is invoked with the "NO_TRIM" argument]

Allocator implements mechanism; application implements policy. Allocator encapsulates all of the mechanisms to "manage" memory based on criteria provided by the application.

Allocator can't know how the application will use memory. Application shouldn't be bothered with details of sorting and searching free lists, etc. (any more than it should be bothered with knowing how to restore the state of the next task to execute).

The whole point of this approach is to give the application better control over memory management without forcing it to handle all of the details of sorting free lists, selecting the best free fragment, trimming that fragment, verifying that a deallocated fragment doesn't overlap a fragment that is already in the free list (i.e., a bug), etc. E.g., I don't need a separate API to manage partitions/pools. I don't need to support a realloc(), etc.

Careful use -- PLANNED USE -- of the allocation and deallocation services gives the client all of the facilities that it needs to *predict* (deterministic behavior) how the memory manager will handle any given request.

Reply to
D Yuniskis

On Sat, 12 Mar 2011 03:53:32 +0100, D Yuniskis wrote:

If a client can allocate two regions in two steps (calls), then why can't it deallocate them in two steps?

Similarly, if a client wants to deallocate two regions in one step, then why doesn't it allocate them in one step?

Why? Does your memory manager not keep track of the actual allocated size of each block?

How about keeping it simple?

How about just giving the client exactly the memory it requested, or NULL?

--
Made with Opera's revolutionary e-mail program:  
http://www.opera.com/mail/
Reply to
Boudewijn Dijkstra

Sure! The point I was illustrating was that it doesn't *have* to, *AND* that it can count on the allocations being contiguous (if the application is designed that way). This is something that isn't true of, e.g., malloc().

What if the first allocation needed to be grown (enlarged)?

Memory manager only tracks sizes of fragments in the free list. Application is responsible for tracking sizes of fragments once allocated. Note that this is only important if the fragments (or portions thereof!) need to be returned to the heap. Why track something if you don't have to?

The above question deals with the case where the memory manager can satisfy the request by giving the client a fragment THAT SPANS A PAGE BOUNDARY. Some clients might not care about this (e.g., willing to risk a page fault) while others might prefer, instead, to constrain the manager to satisfy the request within a single page (or "the fewest number of pages").

Reply to
D Yuniskis

What is gained by that freedom?

Indeed. But I still don't understand why you need two allocations for one contiguous block.

Are you saying that you want your memory manager to re-allocate both regions, since they should be contiguous? That is madness.

Could be handy for debugging, e.g. when a process goes haywire. Some RTOSes even use a magic end-mark at the end of the allocation, to see whether it gets overwritten, and at the beginning a header with size and ownership information. For safety systems, things like this are mandatory.

In (the real-time part of) an embedded system, a page fault should not occur. Therefore, I imagine you need at least two memory managers: one that swaps and one that doesn't.

Reply to
Boudewijn Dijkstra
[attributions elided]

My example was intended to show that the behavior of the allocator is predictable -- unlike other memory allocation schemes.

For the specific example, an advantage of this is that the client need not track *two* separate allocations if it knows (and relies upon!) that the two allocations will be contiguous. I.e., the client need only track the pointer to the start of the first allocated fragment and the total combined size of the allocations. It *knows* that the two allocations can be represented as a single -- larger -- allocation. [indeed, this is a guarantee that other allocators don't explicitly make]

And, that it can then, *later*, decide to treat the allocation as any number of allocations (i.e., if the manager allows two allocations to be treated, externally, as *one*, then it can just as easily allow *one* allocation to be treated as *many*).

So, to wrap words around code, you could do:

    buffer = allocate(..., size1, ...);
    space = size1;

    // use all/most of buffer for some purpose; decide more is needed

    foo = allocate(..., size2, ...);
    space += size2;

    // do stuff and decide even more space is needed

    foo = allocate(..., size3, ...);    // note foo is clobbered
    space += size3;

    // free up unused portion of allocation (*half*, in this case)

    deallocate(..., &buffer[space/2], space/2, ...);
    space /= 2;

    // finish up

    deallocate(..., buffer, space, ...);

[note there are bugs in this if you want to be pedantic; I am merely trying to show how you can incrementally allocate *and* release memory and that the allocations need not line up with the releases *and* that the client is only bothered with tracking *one* chunk of resizable memory]

If your needs change (increase). This is ideal for a single object (or set of objects) with variable dimensions. But it is also great for allowing two clients to share a single heap without risk of fragmentation starving one (or both) clients -- by forcing allocations to be contiguous.

No. This eliminates the need for realloc(). By exporting control over the allocation algorithms to the client, the client can *know* that the second allocation will be contiguous with the first. With other managers, there are no *expressed* guarantees -- all you know is whether the allocation succeeded or failed and a pointer to it (in the case of success). Even if you are the sole client, there is no guarantee that you will receive contiguous allocations!

Of course! But, in production, it's superfluous. Just like "foo" in the above example (i.e., the client could *verify* that foo == &buffer[size1] just to assure itself that the allocator was behaving properly)

But you needn't make that distinction! The two managers run essentially the same code.

A client can request an allocation and that can be satisfied in a deterministic manner (regardless of whether or not the pages are both present). The client, knowing the use to which it will be putting the memory, can suggest that the allocation fit in the smallest number of pages. The client can then fault the pages in and wire them down.

*Or*, the client can NOT care about the number of pages and/or whether they are wired down and just say "give me this much memory".

The memory manager runs essentially the same code but with different selection criteria. No need for an entirely different piece of code.

Just like managing buffer pools need not be any different from managing variable sized blocks of memory -- same code, different criteria.

Reply to
D Yuniskis

Is this a real problem?

Why not just create two memory pools, one with byte boundaries and the other for IEEE double precision doubles (8 bytes)?

For instance one growing downwards with 1 byte alignment and the other growing upward with double float alignment ?

Reply to
upsidedown

Because you might want successive allocations to be from the *same* pool (heap). Think: "locality of reference". It's also more burdensome on an application to *explicitly* examine the alignment requirements of a particular request (based on what it's going to stuff in the fragment that is eventually allocated) and decide whether to access "heap A" or "heap B".

How do you handle cache-line alignments? Or, page alignments?

Reply to
D Yuniskis
