The Embedded Heap

Assume all your objects are equal-sized, the allocator looks for the smallest fit with a mild preference for lower addresses over higher ones, the request stream is balanced (one free() for every malloc(), etc.), and, before reaching steady state, some smaller, differently-sized objects have been allocated. Then the gaps between objects will be at most the neighboring allocation's size minus one, plus some (constant) overhead. Consecutive runs of free() will create additional holes of n_consecutive_free*obj_size.

So what you get as a worst-case upper bound is roughly:

n_objects*(obj_size + slack_per_obj)
    + n_objects*(obj_size - 1 + slack_per_obj)
    + n_consecutive_free*obj_size

The first product accounts for the allocated objects, the second for the gaps in between, and the last for a streak of consecutive free() calls. That comes to roughly twice the total size of all objects (neglecting the free() term, assuming such streaks are few and short).
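A minimal sketch of that arithmetic in C (the parameter names follow the formula above; the sample values in main() are made up):

    /* Sketch: evaluates the worst-case bound given above.
     * Parameter names follow the formula; values are hypothetical. */
    #include <stdio.h>
    #include <stddef.h>

    static size_t worst_case_heap(size_t n_objects, size_t obj_size,
                                  size_t slack_per_obj,
                                  size_t n_consecutive_free)
    {
        return n_objects * (obj_size + slack_per_obj)       /* live objects   */
             + n_objects * (obj_size - 1 + slack_per_obj)   /* gaps between   */
             + n_consecutive_free * obj_size;               /* free() streaks */
    }

    int main(void)
    {
        /* e.g. 100 objects of 64 bytes, 8 bytes allocator overhead,
           a worst streak of 10 consecutive frees */
        printf("%zu bytes\n", worst_case_heap(100, 64, 8, 10));
        return 0;
    }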

--Johann Klammer

If/where possible, try to preallocate storage for those "persistent" objects. This gets them out of the way early so the holes they punch in the heap are avoided.
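A sketch of that idea (the `config_t` type and all sizes are hypothetical): give long-lived objects static storage, or allocate them once at startup, before any transient traffic can fragment the heap around them.

    /* Sketch under the assumptions above; `config_t` and the sizes
     * are hypothetical. Persistent objects get static storage, or one
     * early malloc(), so they never punch holes mid-heap later on. */
    #include <stdlib.h>

    typedef struct { int settings[32]; } config_t;   /* hypothetical */

    static config_t persistent_cfg;      /* static: off the heap entirely */

    void startup(void)
    {
        /* if it *must* be heap-allocated, grab it before anything else */
        config_t *persistent_tbl = malloc(sizeof *persistent_tbl);
        if (persistent_tbl == NULL)
            abort();                     /* can't run without it */

        (void)persistent_cfg;            /* transient churn starts after this */
        (void)persistent_tbl;
    }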

A lot depends on the implementation of the memory manager. (There are a LOT of different strategies used for different needs.) E.g., if your application is the type that will NOT exhaust available memory and tends to run as a "one shot", then a MM system can simply satisfy all requests (since we've decided it has more than enough memory) and defer the cleanup/coalescing/compaction until the first request is not *immediately* satisfied (from the "virgin" portion of the heap).
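A sketch of that defer-everything strategy (all names made up): satisfy allocations by bumping a pointer through the virgin region, and make free() a no-op until the bump fails, at which point a real reclamation pass would kick in.

    /* Hypothetical sketch of a defer-cleanup MM: a bump allocator over
     * a fixed pool; free() only records the release, and no coalescing
     * happens until the virgin region runs out. */
    #include <stddef.h>

    static unsigned char pool[64 * 1024];    /* the "virgin" heap */
    static size_t bump;

    void *mm_alloc(size_t n)
    {
        n = (n + 7u) & ~(size_t)7u;          /* 8-byte alignment */
        if (bump + n > sizeof pool)
            return NULL;                     /* time for the real cleanup pass */
        void *p = &pool[bump];
        bump += n;
        return p;
    }

    void mm_free(void *p)
    {
        (void)p;    /* deferred: reclamation waits for the first failed alloc */
    }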

Other MMs might coalesce/compact after each release (free). Others might satisfy different requests from different portions of the heap (i.e., small requests from one area while larger ones come from another area), etc.
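The segregated-area idea might look roughly like this (the cutoff and region sizes are hypothetical): small and large requests come from separate regions, so neither can poke holes in the other.

    /* Hypothetical sketch: small and large requests are satisfied from
     * separate static regions, so neither can fragment the other. */
    #include <stddef.h>

    #define SMALL_LIMIT 64                   /* hypothetical cutoff, bytes */

    static unsigned char small_area[8 * 1024];
    static unsigned char large_area[56 * 1024];
    static size_t small_top, large_top;

    static void *take(unsigned char *area, size_t cap, size_t *top, size_t n)
    {
        n = (n + 7u) & ~(size_t)7u;          /* keep 8-byte alignment */
        if (*top + n > cap)
            return NULL;
        void *p = area + *top;
        *top += n;
        return p;
    }

    void *mm_alloc2(size_t n)
    {
        return (n <= SMALL_LIMIT)
             ? take(small_area, sizeof small_area, &small_top, n)
             : take(large_area, sizeof large_area, &large_top, n);
    }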

E.g., in the past, I've had my MM do a fair bit of sanity checking on requests and releases to help guard against bugs/failures. For example, "Why are you asking me to free() an address that I already think *is* free? (i.e., contained within the current free-list)" or "What possible use could you have for a *one* byte allocation???"
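Such checks could be approximated with a wrapper like this (a sketch, not the actual MM described above; the tracking table stands in for whatever free-list query the real MM keeps):

    /* Sketch of sanity checks on the request/release paths: a wrapper
     * over malloc()/free() that remembers outstanding pointers so a
     * double-free (or a free of an unknown address) is caught. */
    #include <assert.h>
    #include <stdlib.h>

    #define MAX_LIVE 256
    static void *live[MAX_LIVE];        /* outstanding allocations */

    static int find(void *p)
    {
        for (int i = 0; i < MAX_LIVE; i++)
            if (live[i] == p)
                return i;
        return -1;
    }

    void *checked_alloc(size_t n)
    {
        assert(n > 1);                  /* "why a *one* byte allocation?" */
        void *p = malloc(n);
        if (p != NULL) {
            int slot = find(NULL);      /* first empty table entry */
            assert(slot >= 0);          /* table assumed big enough here */
            live[slot] = p;
        }
        return p;
    }

    void checked_free(void *p)
    {
        int i = find(p);
        assert(p != NULL && i >= 0);    /* "I already think this *is* free!" */
        live[i] = NULL;
        free(p);
    }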

There is a *lot* of published data on memory management algorithms. Unfortunately, much of it is geared towards one particular use -- or another (most often, desktop-ish environments). A better approach might be to get a good handle on exactly how memory will be consumed by/in your application. Then, look at how a MM *might* handle those requests and releases "in an ideal world". Chances are, things aren't as hostile as you suspect!

You've not disclosed the actual *algorithm(s)* that will be involved, i.e., whether the allocation/release *pattern* is constant or varies from one invocation to the next.

Since it is *an* algorithm (not a *system* of competing, asynchronous processes), it is more likely that the structures you allocate will tend to be allocated in a fixed order and with fixed sizes each time (barring conditionals in the code that omit or enhance particular aspects of the computation as an optimization). If so, two separate runs of the algorithm would leave memory (in-use as well as heap) in essentially the same general state. Any sort of cleanup would then prepare the heap for the next, *similar* use.

[I.e., this could be ripe for deferred cleanup/GC]
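Under that assumption, the deferred cleanup can be nearly free (a sketch with made-up names): treat a run's allocations as a bump arena and release the whole thing between runs.

    /* Sketch of deferred cleanup for a fixed allocation pattern: since
     * each run leaves the heap in the same state, "GC" can be a
     * wholesale release of the arena before the next, similar run. */
    #include <stddef.h>

    static unsigned char arena[32 * 1024];
    static size_t top;

    void *run_alloc(size_t n)             /* allocations within one run */
    {
        n = (n + 7u) & ~(size_t)7u;
        if (top + n > sizeof arena)
            return NULL;
        void *p = &arena[top];
        top += n;
        return p;
    }

    void run_reset(void)                  /* the entire "cleanup" pass */
    {
        top = 0;                          /* next run starts from a clean heap */
    }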

A modest compromise might be to set up multiple arenas and patch the "COTS" code so that certain types of objects are allocated from certain arenas. E.g., if you have a few different types/sizes of objects, you can create an arena for each type thereby ensuring that allocations of one type don't poke holes in the region from which another, larger type is allocated.
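A per-type arena might be sketched like so (the `node_t` type and pool size are hypothetical): each type owns a fixed pool with an intrusive free list, so release and reallocation are O(1) and no coalescing is ever needed.

    /* Hypothetical per-type arena: a fixed pool of `node_t` with an
     * intrusive free list. A second object type would get its own pool. */
    #include <stddef.h>

    #define NODE_COUNT 128

    typedef struct node {
        struct node *next;              /* links free slots together */
        int payload[15];                /* hypothetical object body  */
    } node_t;

    static node_t node_pool[NODE_COUNT];
    static node_t *node_free;           /* this arena's free list    */
    static size_t node_used;            /* high-water mark into pool */

    node_t *node_alloc(void)
    {
        if (node_free != NULL) {        /* reuse a released slot first */
            node_t *n = node_free;
            node_free = n->next;
            return n;
        }
        if (node_used < NODE_COUNT)
            return &node_pool[node_used++];
        return NULL;                    /* this arena is exhausted */
    }

    void node_release(node_t *n)
    {
        n->next = node_free;            /* O(1); no coalescing needed, */
        node_free = n;                  /* every slot is the same size */
    }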

[I currently use a MM that lets the caller define the arena, the fragment selection strategy, how/if the selected fragment is "trimmed" to fit the request, where released allocations are reinserted into the free list, whether they are coalesced with adjacent free-list members, etc. It allows memory to be treated in a unified manner -- I can create "buffer pools" using the same mechanism with which I manage variable-length allocations, and tune performance accordingly (e.g., constant-time services where needed!)]
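The post doesn't give that MM's actual interface, but a caller-tunable design in that spirit might expose its knobs as a per-arena policy structure (all names here are hypothetical):

    /* Purely hypothetical interface sketch, in the spirit described:
     * the caller picks the arena and every policy knob per request. */
    #include <stddef.h>

    typedef enum { FIT_FIRST, FIT_BEST, FIT_EXACT } fit_t;
    typedef enum { INSERT_HEAD, INSERT_ADDR_ORDER } insert_t;

    typedef struct {
        void    *base;          /* the arena this policy governs         */
        size_t   size;
        fit_t    selection;     /* how a free fragment is chosen         */
        int      trim;          /* split oversized fragments to fit?     */
        insert_t reinsertion;   /* where released blocks rejoin the list */
        int      coalesce;      /* merge with adjacent free neighbors?   */
    } mm_policy_t;

    void *mm_alloc_with(mm_policy_t *pol, size_t n);
    void  mm_free_with(mm_policy_t *pol, void *p);

Note how exact-fit with trimming and coalescing disabled degenerates into a buffer pool, which is one way constant-time alloc/free can fall out of the same unified mechanism.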

Your specific requirements will, of course, vary based on your data, resources available, etc.

Why not try a couple of dry runs and watch what happens to memory? Perhaps instrument malloc()/free() to log their actions so you can more readily do a post-mortem and compare the results of multiple runs? See if there is anything you can identify in the *data* that you are providing to the algorithm that makes things better or worse.
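Instrumentation along those lines can be as simple as a logging shim (the one-letter record format here is made up):

    /* A logging shim around malloc()/free(); each request and release
     * goes to stderr ("A size ptr" / "F ptr") for later post-mortem. */
    #include <stdio.h>
    #include <stdlib.h>

    void *log_malloc(size_t n)
    {
        void *p = malloc(n);
        fprintf(stderr, "A %zu %p\n", n, p);   /* alloc: size, address */
        return p;
    }

    void log_free(void *p)
    {
        fprintf(stderr, "F %p\n", p);          /* free: address */
        free(p);
    }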

HTH,

--don

