Determinism in malloc (et al.) [long]

- D
- D Yuniskis
  
  Contact options for registered users
posted
14 years ago

Thu, Mar 4, 2010 11:09 PM

Hi,

(sigh) It appears that there is a mistaken belief that malloc() et al. is non-deterministic and, as such, shouldn't be used -- or CAN'T be used (e.g., in RT applications). This is true only if you have no knowledge of how the allocation algorithm within "your" malloc (or associate) works *or* have no knowledge of how you will be making (and releasing) allocations.

Malloc (et al.) is a piece of code. As such, it is very predictable (unless it includes some *truly* random function :> ). I.e., if you watch *all* invocations of it (and it's "free()" counterpart(s)), you can *predict* the results of the next request at any time. You don't doubt the results of sqrt(9.0), do you? Else your machine is broken!

If a managed resource is shared (e.g. a heap), then this is complicated only to the extent that you now have to watch multiple consumers knocking on malloc's door. But, from that *doorway*, your observations and deductions remain as viable as if a single consumer was using it!

The same holds true if multiple developers are writing the code to invoke it. I.e., presumably they can "compare notes" as to their individual memory allocation requirements. Indeed, if they want to avoid the possibility of having to deal with a dry heap, they *must* do so (otherwise, how will anyone know how big to make the heap??)

Note that each of these above caveats can trivially be dismissed. I.e., none of them are "requirements" for the use of malloc. They are just different ways or environments that malloc *might* be used. None of them deny the assertion that you *can* predict how a particular allocation request will be handled.

If, on the other hand, you use malloc in a willy-nilly, undisciplined fashion, it gets harder to keep track (at design time) of how -- or even *if* -- these requests will be handled.

As an example of how malloc *does* provide deterministic behavior SUITABLE FOR USE IN A RT ENVIRONMENT, consider the following example. (throughout, imagine the naive implementation of malloc set forth in K&R(1) and how

*it* would behave in these examples -- it is not only *deterministic* but yields results in >>>CONSTANT

- D
- David Brown
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Fri, Mar 5, 2010 1:29 PM

You need to learn to use smaller examples!

It is extremely difficult to figure out what you are trying to do in your described software, or why you are doing it in that particular way. So it is only possible for me to give very general opinions.

When building something that had layers of menus and dialog boxes, I would aim to have them entirely statically allocated using constant structures build into a tree (I have done this for text-based menus).

If I needed dynamic memory here (perhaps the text data is stored in a serial flash rather than the main code, for example), I would be using a stack rather than a heap. My "malloc" would be a pointer increment, and my "free" would be a pointer decrement.

Whether or not that is sufficient for your application is impossible to say here. But I would use similar strategies to remove as many general dynamic allocations as possible to break down the memory usage into manageable, controllable and verifiable parts.

What you seem to miss here is that malloc (in combination with free - unfreed initialisation-only malloc is a special simple case) is much more complicated than things like maths functions. A division function may take different amounts of run-time depending on the values passed to it, but it is a stand-alone function that does a specific job within a known (or at least knowable) timeframe. You pass it the same values, it will give you the same results after the same time. It is independent of how many times you have called the function in the same program, and it is independent of the function's use elsewhere in the program.

This makes it an easily manageable problem to determine the correctness of the code - either your processor is fast enough to do the required number of divisions within the required time, or it is not fast enough.

malloc is a totally different kettle of fish. The running time, and the success or failure, of malloc (and free) is totally dependent on previous calls and other uses of the functions within the code. This makes it hugely difficult to determine the correctness of the code (whether it will work or not) and the run-time characteristics (necessary for real-time requirements).

This is not an impossible task, but it is very difficult to do properly, and requires very careful design and restricted use of malloc to be feasible. The majority of developers are either unable to do it properly, or unwilling to spend the time and effort required to do it properly.

Compare this to another related issue you mentioned before - how do you know your stack is big enough? There are two ways to do it - one is to work through your call stacks (manually or using automated software) to establish the largest possible depth of stack, add on your interrupt stack depths, and reserve enough space for that size of stack. The second way is to fill up your memory with marker values, run the code, measure the high water mark, and add a margin for safety.

In reality, this second method is what most people do (if they even do that much). And in a great many cases, it is good enough, if your test runs cover all cases (as they should) and your margins are appropriate.

The problem with malloc is that you cannot do the same thing there - yet people try to. At best, you have a probabilistic game showing that since the code worked in the lab, it is likely to work in the field. But since the patterns of allocations and fragmentations vary so much depending on the run-time situations, you can never have an appropriate level of confidence in test-based verification.

It is a different matter once you are dealing with bigger systems with support for paging and memory management (to counter fragmentation issues), combined with more sophisticated algorithms and fewer (if any) hard real time requirements.

I agree that we should not naïvely :-) perpetuate myths - I just don't agree that the disadvantages of malloc is a myth.

- R
- robertwessel2
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Fri, Mar 5, 2010 10:22 PM

Well, trivially, if all of your allocations are LIFO, then predicting the behavior of a semi-sane malloc() is easy. Of course in that case why not do even better and use your implementation's version of alloca()? Or actually implement a stack-based allocator that couldn't help but have better performance than malloc()?

But the whole point of a memory manager is to allow more complex allocation patterns than LIFO. And as soon as your allocation/free patterns are no longer stack like, heap fragmentation becomes a potential issue, and the time (and even possibility) of a successful malloc becomes intimately dependent on the actual prior sequence of allocation events - and if those are in any way non-predictable (let's say some happen in response to external events), it quickly becomes very complex.

Which is not to say that malloc, even in the face of unpredictable allocation patterns is per-se unusable. You can for example, if you know the maximum number (N) of objects and the maximum size (M) of any object (including the typical internal memory allocation control blocks), trivially demonstrate that a heap of (N+1)xM bytes cannot be fragmented into a condition where an allocation cannot be satisfied. And one can do rather better than that with a better allocator (for example, if you can determine a maximum number of objects segregated by size, you can write an allocator that splits the objects into pools, which can result in a much lower total space requirement for avoiding deadlock).

And while I'm no curses expert, how can you be so sure that there are not some internal structures the curses uses that are going to be maintained in a non-LIFO manner? Perhaps an internal window management structure might include a list of pointers to subwindows, and that list might grow as the maximum number of subwindows attached to a window at any given time increases - which could quite trivially lead to some heap fragmentation. Mind you I don't know that, but without a very detailed analysis of the source code of your implementation of curses, how can you be sure (I'm assuming there isn't some bit of curses documentation that actually makes this claim)? And certainly some of the stuff commonly associated with curses, like the form library, are chock full of things that look like they'd cause fragmentation if not used with the most exquisite care (and again, not being familiar with the internals, it's hard to say if the guarantee exists in *any* circumstances).

- R
- robertwessel2
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Fri, Mar 5, 2010 10:26 PM

Whoops - I meant (N+1)x(2M-1) bytes.

- D
- D Yuniskis
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Mon, Mar 8, 2010 7:44 AM

As I said elsewhere, disclosing an entire application lets others see the *design* issues involved.

How would you share the memory used for the various layers of menus in the example I proposed? The base screen needs:

typedef struct { unsigned char character; unsigned char attributes; } cell;

cell base_screen[rows][columns];

(note rows and columns are determined at run time and can *vary* between invocations of the menu system -- e.g., if it is invoked with a termcap of an 80x25 display vs. a 132x24 display, etc.)

This memory isn't used unless the task is activated. So, it must be encapsulated within a union of some (?) other "variables" that *won't* be used when it is active.

Then, you have to think about the first layer dialogs that can appear atop this:

union { cell Communications[8][20]; cell System[15][30]; }

(i.e., this union is 450 cells)

(I'm making these numbers up -- I didn't think it worth the trouble to dig them out of the code)

Note this is in addition to the "base_screen".

Then, you have to look at the second layer of dialogs FOR EACH OF THESE FIRST LAYER DIALOGS. So, If the Communication dialog had been invoked, you would have second layer dialogs:

union { cell Serial_Ports[10][20]; cell Network[15][60]; cell Printer[6][30]; }

(i.e., this union is 900 cells)

whereas if the System dialog was active, the second layer would involve:

union { cell Date[10][40]; cell Time[6][26]; cell Log[20][60]; cell Power_Management[8][15]; cell Security[10][20]; }

(i.e., this union is 1200 cells)

The same process continues for the *third* layer of dialogs, etc.

Using these (bogus) dimensions, the total number of cells needed for any particular layer in the menu hierarchy that I illustrated would be (only the first two layers shown):

Communications 160 (8*20) Serial Ports 360 (8*20)+(10*20) Network 1060 (8*20)+(15*60) Printer 340 (8*20)+(6*30) System 450 (15*30) Date 850 (15*30)+(10*40) Time 606 (15*30)+(6*26) Log 1650 (15*30)+(20*60) Power Manage 570 (15*30)+(8*15) Security 650 (15*30)+(10*20)

(note that these ignore the size of the underlying base_screen since we don't *know* how big that will be a priori)

So, when the Communications dialog appears, you have used 160 cells out of your static allocation (recall that union was 450 cells). When the System dialog appears, you have used 450 cells (the biggest element in that "first layer" union.

The extra 290 cells in that first layer union can't help you satisfy the needs of second layer dialogs in the case of the Communications menu hierarchy. I.e., that's just a "hole".

To make this hole available for use in other parts of the menu hierarchy, you have to nest the respective unions. E.g., something like:

union { struct { cell Communications[8][20]; 1st layer "Communications" dialog union { 2nd layer choices: cell Serial_Ports[10][20]; "Serial Ports" union { stuff invoked from the "Serial Ports" dialog } cell Network[15][60]; "Network" union { stuff invoked from the "Network" dialog } cell Printer[6][30]; "Printer" union { stuff invoked from the "Printer" dialog } } } struct { cell System[15][30]; 1st layer "System" dialog union { 2nd layer choices: cell Date[10][40]; "Date" union { stuff invoked from the "Date" dialog } cell Time[6][26]; "Time" union { stuff invoked from the "Time" dialog } cell Log[20][60]; "Log" union { stuff invoked from the "Log" dialog } cell Power_Management[8][15]; "Power Management" union { stuff invoked from the "Power Management" dialog } cell Security[10][20]; "Security" union { stuff invoked from the "Security" dialog } } } }

This is just plain *stupid*! It brings all this stuff together needlessly when it *should* be kept in separate modules (make sure to replace all the constants here with MANIFEST constants and include *all* of the various headers -- Communications.h, System.h, SerialPorts.h, Network.h, Printer.h, Date.h, Time.h, Log.h, PowerManagement.h, Security.h, etc. -- so you can

*reference* all of those constants and other bits and pieces that might be referenced in the "stuff invoked from the..." portions that I haven't bothered to fill in!

Contrast that with: =========== BaseScreen.c =============== Base_Screen() ( rows = tgetnum("li"); cols = tgetnum("co"); thePad = create_pad(rows,cols);

...

key = get_input(); switch { case Communications(); break; case System(); break; }

delete_pad(thePad); } =========== Communications.c =============== Communications() { thePad = create_pad(8,20);

...

key = get_input(); switch { case SerialPorts(); break; case Network(); break; case Printer(); break; }

delete_pad(thePad); } ===== handle create_pad(rows, cols) { ... return malloc(sizeof(cell)*rows*cols); }

delete_pad(handle) { ... free(handle); }

etc.

You can't possibly believe the mess of unions and structs will give you a more robust and maintainable implementation? And, it doesn't give you any more efficient use of memory

*or* time!

Great! Show me how to modify the above structs to fit my needs. And tell me why it is so much better than my *cleaner* implementation!

Something that is *easy* to do with my *tiny*, isolated modules vs. this giant kludge that tries to implement a static allocation scheme.

When you call qsort, you get a different result depending on all of the items that were inserted into the prior to its invocation. Yet you don';t shy away from using it!

If I pass malloc the "same values", it will give me the "same results". It behaves. The "problem" that scares you off is there is more to keep track of. It's like deciding if you can afford to use a recursive algorithm when you are already operating "deep" on the stack.

Yeah, just like inserting items into a list and then sorting. But, you know what got inserted (i.e., what requests were issued and released) so you know what to expect from it as a whole. If you have inserted 100 items, the sort will take (statistically) longer than if you only have inserted 50.

But, you can sit down with pen and paper and figure how long it *will* take.

And, in the examples I have shown, it is *predictable*. (the LIFO menu system usage, the buffer pool emulator)

And, because of that, we say "don't use it"? Instead of showing *how* it can be SUCCESSFULLY used -- so folks can draw parallels to their own applications?

Why would you need a margin if you have "cover[ed] all cases"? You are implying that you *haven't* (hence the fudge factor).

You should *know* how your code behaves. This is especially true in an embedded application where you can't "tweek" things later.

I love recursive solutions. But, I use them where I can precisely control the recursion depth by the choice of arguments that I apply to the algorithm. My comments in the code will explain the cost of these algorithms in a manner that can be pugged into a formula.

No. You are assuming unconstrained, undisciplined, UNENGINEERED use. You can stop that menu system at any point in its execution and look at the heap and see that it *exactly* matches its predicted utilization factor. Invariably. BY DESIGN.

- D
- D Yuniskis
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Mon, Mar 8, 2010 8:51 AM

You can't use auto's because you don't know how big they will necessarily be a priori (e.g., you don't know the dimensions of the screen you are mapping onto).

I contend that you can predict the behavior of *any* malloc -- whose source you can inspect.

--^^^^^^^^^ not *guaranteed* to be an issue! The prohibitions against using malloc/free act as if malloc was guaranteed to cause you grief. I claim that it is just yet another mangeable design issue. Just like interrupts, NMI, recursion, floating point, etc.

But, *you* write the code. *You* decide how malloc is called. Don't let "random" things access the heap in unpredictable ways (unless they always malloc & free before terminating).

Exactly. My "malloc" allows the caller to specify the free list management algorithm, fragment selection strategy, processing applied to a selected fragment (e.g., trimmed, and how, or not), release strategy, etc. Tweek a few trivial calling options and you can make *a* heap (I support as many as you want to define) act like a buffer pool (where buffers can be identically sized *or* have various specified sizes), K&R malloc, a FIFO pool, etc. It's all the same code with very little overhead to support all those options.

I wrote it. (there were no open source implementations

25 years ago) The published interfaces are consistent with a LIFO style -- create's paired with delete's. Of course, the developer can ignore the suggested structure of the interfaces and do things ad hoc. But, I opted to impose the discipline of pairing/nesting these calls so that a "closed menu" leaves no evidence of having ever been "opened".

Again, my DESIGN philosphy called for this to be the case. A menu is closed/terminated before any underlying menu can be re-exposed. Menu creation allocates resources. Menu deletion naturally suggests the right point for those resources to be released. Everything nicely pairs up in a strict hierarchy so you *know* every "push" has a corresponding "pop".

If you were to look at curses and see *how* it could be used (looking just at the API) to build a menu system; and, how that "hypothetical" menu system would need to be used to ensure that it, also, layered nicely; then you would see that malloc/free fit perfectly within this framework. So much so that to use any other approach would be *more* work.

My point in this example is to show that malloc/free *can* be used in cases where RT performance is an issue, can provide deterministic behavior and make a problem *simpler* than it would be with more "manual" memory management techniques.

The problem lies when you think malloc can be used willy nilly by "anything" -- hey, you're writing the code, *you* decide how it is going to be used! :> Then look to see if its implementation fits with your intended use.

I consider the heap important enough that my "task_create()" takes "heap_size" as an argument (along with "stack_size"). It's just *so* nice to say, "gee, I need a chunk of memory that is X+Y*Z bytes long to do something right now...". As long as I remember what *not* to do, it's perfectly safe to use.

- B
- Boudewijn Dijkstra
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Mon, Mar 8, 2010 1:12 PM

When an interrupt comes in, the OS might push the registers on the stack of the process being swapped out, hence the need for a margin. But this should be a known value, not a 'fudge factor'.

--
Gemaakt met Opera's revolutionaire e-mailprogramma:  
http://www.opera.com/mail/
(remove the obvious prefix to reply by mail)

- D
- D Yuniskis
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Mon, Mar 8, 2010 5:09 PM

Exactly ------^^^^^^^^^^^. Assuming the system doesn't employ a "system stack" vs. "user stack".