Avoiding deadlock in malloc

Hi,

[no idea why I bother including c.realtime as it's deader'n a doornail, there! :< ]

I'm thinking of implementing a "wait until satisfied" option in malloc() [call it by some other name so as not to muddy the typical expectations of malloc(3)]. In essence, if malloc() *would* return NULL, it, instead, *blocks* waiting for the Heap to be able to satisfy the request.

The first (naive) strategy I considered was to block all future malloc()'s -- i.e., don't let anything else be withdrawn from the heap -- and only allow free()'s to proceed. *Naive* in that it assumes the resources the first caller awaits are going to be released "soon".

*Dangerous* in that it is rife with potential for deadlock! :<

So, the only realistic implementation is to allow malloc() and free() to proceed unimpeded. However, I can hook all free()'s so that any blocked request(s) are reissued before the free() returns.

I've not decided if the free() should then preempt the current task or simply perform the deferred allocation and return to its caller. I also haven't decided how to handle multiple requests (queue based on task priority? size?)

[Note that this can be implemented efficiently and need not bear the cost of an embedded "malloc()"-like invocation.]
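To make the idea concrete, here is a deliberately naive sketch -- POSIX primitives purely for illustration (the real thing sits atop the RTOS's own hooks), and blocking_alloc()/free_hook() are names I just made up. This is the lazy retry-on-wake form; per the note above, the real implementation hands the fragment over inside free() instead of re-running the allocation:

#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  heap_grew = PTHREAD_COND_INITIALIZER;

/* Blocks until the request can be satisfied (the "wait until satisfied"
   option); plain malloc() semantics would just return the NULL.          */
void *blocking_alloc(size_t size)
{
    void *p;

    pthread_mutex_lock(&heap_lock);
    while ((p = malloc(size)) == NULL)
        pthread_cond_wait(&heap_grew, &heap_lock);   /* sleep until free() */
    pthread_mutex_unlock(&heap_lock);
    return p;
}

void free_hook(void *p)
{
    pthread_mutex_lock(&heap_lock);
    free(p);
    pthread_cond_broadcast(&heap_grew);   /* let blocked requests retry    */
    pthread_mutex_unlock(&heap_lock);
}

The broadcast-and-retry is, of course, exactly where the interesting decisions (which waiter? in what order?) hide.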

Notice that this block-until-satisfied behavior differs from moving the retry up to the caller (i.e., call malloc, get NULL, lather/rinse/repeat).

Are there any other issues that I am missing? Or, dragons to be wary of?

Thanks!

--don

Reply to
D Yuniskis

Trying to reinvent Java?

VLV

D Yuniskis wrote:

Reply to
Vladimir Vassilevsky

No. Just trying to promote to the interface layer anything that the application can't "emulate" in some other way. :-/

Reply to
D Yuniskis

So if my code _could_ gracefully deal with not being able to get data I'm still screwed? Gee, thanks.

So if your code wants 10000 bytes and mine needs 10 I'm screwed? Gee, thanks.

I hope that you're planning on a "blockable malloc" and a "plain ol' malloc". Better, put the "engineer" in "software engineer" and design your system to not use the heap.

If you're going to make a blockable malloc, you should probably also prioritize the malloc, so that malloc requests from higher priority tasks get first choice of what's available.

But I think you're better off just not using malloc at all unless you don't care about the system being real time.

--
Tim Wescott
Control system and signal processing consulting
www.wescottdesign.com
Reply to
Tim Wescott

Perhaps you neglected to read: "... OPTION in malloc..." Presumably, the designer knows when *to* invoke said option and when *not* to.

You'll note I abandoned this idea.

And, in the idea proposed (below), your allocation could still be satisfied -- unless it competes overtly with "mine".

In *any* allotment of resources, there are winners and losers. If you *need* your "10" (always), then you can either (you being the developer):

- declare it static

- allocate an arena (heap) for the exclusive use of your task/thread

- coordinate (implicitly or explicitly) your use of a shared resource to ensure (by design) that you can fulfill this need

- add more re$ource$ to the device

Again, "option".

Not all systems can be designed a priori with static allocation. Or, if they are, they end up using more resources (or have fewer options) than they would otherwise.

It is a fallacy to assume that malloc precludes real-time system designs. That sort of argument extends to imply that you can't design a real-time system with a network stack. Or a serial port. Or any other "thing" that can't be deterministically controlled in the development effort.

As with any RT design, you carefully decide which actions *require* deterministic responses; which of those are HRT vs. SRT; and which things can be done "as available". You can also design the system so that it *handles* missed deadlines gracefully -- even rearranging the workload to prevent such overruns from happening in the future.

E.g., crt0 brings up the RTOS. Do *all* of the objects used by the RTOS have to be statically defined? Is there something that precludes building task contexts from the system heap (instead of declaring them as static)? Is there something that precludes creating per-thread stacks from that heap? Is there something that precludes creating per-thread *heaps* from that heap?

Depriving yourself of a tool (e.g., the Heap) is something you should only do if you *know* it won't work -- or, if you can't figure out how to craft your application to use it effectively within the constraints imposed by it. It's like deciding *universally* not to use long doubles "because they are too slow"...

Reply to
D Yuniskis

I suspect that the majority of the embedded systems I have written over the years have had a serial port - /none/ have used malloc. (I'm talking about small/medium systems here, not systems with an MMU, "big" OS, etc.)

You are mixing up "dynamic memory" and "malloc" here. malloc/free is only one way to implement dynamic memory - it aims to be very general and convenient, but is almost always non-deterministic in run time, and has a lot of overhead in time, ram space, and code space.

Additionally, it is often very hard to predict if malloc allocations will succeed or not, and failure to allocate memory can easily mean failure of the application (and I don't just mean if the programmer forgets to check the return value - there may be nothing sensible the program can do if 0 is returned).

Thus when a small and/or real-time embedded system needs dynamic memory (such as for handling network buffers), you typically use specialised code using statically-allocated space.

Apart from the special case of only using malloc at startup, and never freeing the memory, you should treat malloc with extreme caution in an embedded program - it is just too difficult to be sure your program is really correct.

That's true enough - while malloc is totally inappropriate for sections that need RT response, it is certainly possible to use non-deterministic functions in non-RT parts. However, malloc really has to be considered an unreliable function, not just non-deterministic, since it may fail at unexpected times. And unreliable functions have no place in an embedded system.

Why would you even want to do this - except as a way to work around an inappropriate dynamic memory system? It is far better to figure out what your system needs, and if you need dynamic memory then write code that you know will work in your system. If you know that you will have 8 tasks that need large buffers, and you only have ram space for 4 at a time, then declare 4 static buffers and write code to control allocation of them - it will be deterministic, reliable and predictable.
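For example (names and sizes here are purely illustrative - a sketch, not a library), such controlled allocation needs no more than this:

#include <stdint.h>

#define NUM_BUFS  4
#define BUF_SIZE  1024     /* sized for the worst-case message           */

static uint8_t buffers[NUM_BUFS][BUF_SIZE];
static uint8_t in_use[NUM_BUFS];          /* 0 = free, 1 = allocated     */

void *buf_acquire(void)
{
    for (int i = 0; i < NUM_BUFS; i++) {
        if (!in_use[i]) {        /* guard with a lock/IRQ mask if shared */
            in_use[i] = 1;
            return buffers[i];
        }
    }
    return NULL;                 /* all four busy: the caller must cope  */
}

void buf_release(void *p)
{
    for (int i = 0; i < NUM_BUFS; i++) {
        if (p == buffers[i]) {
            in_use[i] = 0;
            return;
        }
    }
}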

Allocating memory during startup and initialisation, and never freeing it, is a special case of dynamic memory. Even then I would not use malloc - I'd work hard to figure out a way to allocate the memory statically, and failing that would write my own allocator. But it is certainly possible to use malloc in such cases - the run-time is manageable, it is easy to test for correctness, and runs should be repeatable. Of course, you still waste time, ram and flash space in malloc overhead.

Yes, and depriving yourself of a pneumatic drill in your electronics lab is something you should only do if you /know/ it won't work....

Dynamic memory /can/ be a useful tool, but it should be considered dangerous because it is often unpredictable. It is always better if you can find a clearer solution which you can easily see is correct. It is certainly true that dynamic memory is sometimes the cleanest solution - but even then, standard "malloc" should generally be avoided. Consider your program's dynamic memory needs, and write appropriate code for it - you can almost always do better than malloc.

It is a good thing that you are thinking about the problems in malloc, and want to write something better. Just don't call it "malloc", because if it avoids the failings of malloc (in the context of small embedded systems), then it is no longer malloc.

Reply to
David Brown

On Tue, 02 Mar 2010 19:06:14 +0100, D Yuniskis wrote:

What is this talk of "heap" I keep hearing whenever malloc is mentioned? Neither the C standard nor any POSIX/Unix specification says that a heap must be used.

Shouldn't the scheduler take care of that?

Wake all waiters in malloc, let the scheduler activate them in order, check if allocation can be satisfied (and check for timed-out requests), go back to sleep otherwise.

Look how modern commercial RTOSes do this.

formatting link

--
Made with Opera's revolutionary e-mail program:  
http://www.opera.com/mail/
(remove the obvious prefix to reply by mail)
Reply to
Boudewijn Dijkstra

Lists linking and malloc? This is a bad excuse. The Pascal book was fine for learning, but (void *) allows so much more. An array of (void *) statically defined. It's the odd size requirements that make malloc slow, and the overhead of, say, 4K per list item with a page allocator is just stupid. I thought the list's dual (void *) structure was so that recycling of nodes would be done by making a free chain. This can then be pseudo quick sorted by a-b ranging to combine local list items into allocatable cache lines.

;)

Cheers Jacko

Reply to
jacko

Now, I am pulling this out of LONG unused memory locations, so it might be less than the gospel, but......

You can thank Intel/segmented architecture for that little bit of terminology. The memory models used on the PC had some of them using a 'near' or 'far' heap for memory allocation. The near heap was all in one segment and could be accessed quickly using one segment register, BUT it was limited to 64K. The far heap, on the other hand, was only limited by available memory but had to use segment math to get to, so it was slower. So, I think the term heap was coined to denote a pile of memory available for dynamic use. Pull it out of the heap and toss it back on when done.

I'm sure I have some part of that wrong, and will be corrected by the authorities on such matters in record time. :)

Reply to
WangoTango

malloc(3) requires a size_t per allocated buffer. Hardly "a lot of overhead". Note that a pool/partition needs a size_t per pool, a count per pool, and a flag per buffer. Plus the consumer still needs a void* as a handle to the buffer. I.e., any sort of dynamic allocation scheme has overhead. Claiming that malloc's is "a lot" more than any other seems prejudiced.

You are assuming a preconceived notion of how malloc "must" be implemented. And, how it "must" be used.

There is no reason that malloc(3) be implemented in the naive K&R(1) method. (Recall I began this thread with "malloc() [call it by some other name so as not to muddy the typical expectations of malloc(3)]".) A *lot* of research has been done on memory allocation strategies and their performance characteristics. The only potential problem arises if you can't *see* how "your" malloc works (in which case, write a new one!)

No. You do that only if the "task" ALWAYS needs that space and NEVER needs to share it.

A common silly practice is to use a partition/pool to implement "fixed size buffers" to "guarantee" deterministic performance in RT deployments. Doing so means you need a new set of hooks into the API (create_pool, allocate_buffer, free_buffer) *and* the code to back those up.

Instead, use a malloc that supports multiple arenas (i.e., as simple as passing a pointer to a "Heap" struct to a "newmalloc()"). If you *always* make requests for the same size (which the concept of a pool/partition just *formalizes*), then those requests will always be granted (unless you request more of them than there is space for... the equivalent of using up all the buffers in your pool/partition), there will never be any deleterious fragmentation *within* that "heap", AND the request will be satisfied in CONSTANT TIME (using a trivial malloc like K&R(1)).
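By way of a sketch -- create_heap()/allocate() are my own invented names, with alignment and the free() side omitted for brevity -- such a "heap you pass in" is roughly:

#include <stddef.h>

typedef struct frag {
    size_t       size;    /* bytes in this fragment, header included      */
    struct frag *next;    /* next free fragment (free fragments only)     */
} frag;

typedef struct {
    frag *freelist;       /* all bookkeeping lives inside the arena       */
} Heap;

void create_heap(Heap *h, void *mem, size_t size)
{
    frag *f = (frag *)mem;
    f->size = size;
    f->next = NULL;
    h->freelist = f;
}

void *allocate(Heap *h, size_t n)
{
    frag **pp, *f;

    n += sizeof(frag);                        /* room for the header      */
    for (pp = &h->freelist; (f = *pp) != NULL; pp = &f->next) {
        if (f->size >= n + sizeof(frag)) {    /* big enough to split      */
            frag *rest = (frag *)((char *)f + n);
            rest->size = f->size - n;
            rest->next = f->next;
            *pp        = rest;
            f->size    = n;
        } else if (f->size >= n) {            /* close fit: take it whole */
            *pp = f->next;
        } else {
            continue;                         /* too small; keep walking  */
        }
        return (char *)f + sizeof(frag);      /* payload follows header   */
    }
    return NULL;                              /* this arena is exhausted  */
}

Feed it nothing but same-sized requests and it behaves exactly like the pool -- without the extra API.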

I disagree. How do you know you have enough stack allocated for your program/tasks? Do you just keep increasing the allocation until things "don't crash anymore"? No, you *engineer* the solution! You look at your code to see what the maximum call tree looks like and the needs of each level in that tree.

The same sort of discipline can be applied to how you use malloc (read: "some dynamic memory allocation scheme"). Look at the example that I present in the "Malloc and determinism" post. Think about how you would do that with:

- static allocations

- pool/partition

- *less* total memory

I think you will be hard pressed to implement as rich an interface in a given set of resources! And, as easily *maintained* as this (i.e., creating a new dialog or even dialog *tree* requires only knowledge of the maximum total screen resources required)

See please my separate post re: determinism and malloc. The claim that malloc is "totally inappropriate for sections that need RT response" is flat out WRONG!

Then do you consider the following as "unreliable", also:

- division (denominator might be zero!)

- sqrt(argument might be negative)

- addition (sum might overflow)

- subtraction (see "addition")

- recursion (might recurse too deep)

- pushdown stack (might overflow)

Of course not! Instead, you *understand* the criteria that govern how each is used and live within those rules. I *rarely* test for zero before dividing. Instead, I try to craft the math so that the denominator *can't* be zero (so, why test?).

Do you test each add/sub to see if the operation overflowed? Of course not! You *engineer* your solution so that these things can't happen (or, you *test* the results if they can!)

Why can't I use malloc in a way that I *know* it will behave as I *need* it to behave (after all, *I* can WRITE the damn thing since it sits *atop* the OS in most applications :> )?

Agreed. The fault of your argument lies in the assumption that malloc *is* unreliable. If that were the case, then it would have to be a truly random function!

Instead, it follows a set of rules (which depend on how your particular malloc is coded). As long as your expectations of its behavior are consistent with those rules, there is nothing to fear. Do you avoid using the addition operator out of fear that someday 2+3 may equal 7 (instead of 11)? :>

You are using a simplistic notion of what a task's needs will be ("4 static buffers").

If you have lots of resources, then you can waste them. (MS has a job for you! :> ) The tough part of engineering comes from making limited resources *act* like more than they are.

This is true of all of the hacks we use (e.g., fixed point arithmetic to get around the cost of floating point; sorted lists to optimize decision making; etc.) -- it is even the basis of your "shared static buffers". But, as the resource requirements become more complex, the tricks to managing them also become more complex!

Here's a current real application by way of example:

I'm designing a "network audio client" to deploy throughout my home. I need about twenty of them so I want them to be cheap. :> I need them to be low power because they are PoE powered (there are limits on the power available to the PD). I need them to be *tiny* (a few sq in) so I can tuck them *inside* junction boxes (i.e., inside the walls) and not have lots of stupid little boxes lying around the house.

They are RT devices. *H*RT as missed deadlines become audible events. Imagine how undesirable dropouts would be when listening to your "stereo"! And, imagine how potentially damaging those dropouts could be (to your "speakers") if they happened at a high rate (e.g., square waves :> )

There are several aspects to such a design. You need to be able to pull packets off the network (using *some* protocol) and buffer them locally (since they might not arrive "in order"). Their integrity needs to be verified (you don't want crap getting pushed out to your speaker(s)!). You need to enforce some security protocol (so "unwanted content" doesn't make its way onto your devices). Their "tonal quality" may require modification (else you have to do this "in analog" or force the server to take on that duty on your behalf).

You also need to be able to configure/control each of these characteristics. I.e., some sort of control channel must coexist alongside the "data" (audio) channel.

Finally, you need some way of providing temporal synchronization among the various "nodes" so the sound coming from each "emitter" agrees with others.

I have a little board that supports a pair of speakers (actually, it will handle 4 but there just isn't enough PoE available to drive 4 speakers with any significant volume -- and, 4 at very low volumes doesn't seem very practical; remember, each would require a *connection* to the board as well!). In some deployments, it *will* drive two speakers (e.g., imagine a "boom box" sitting on the counter in your kitchen/bathroom). In most others, it will drive *one* (a speaker in the ceiling in each corner of the room).

I can write the code so that it can be *told* (at time of manufacture or even at "run time") how many speakers (channels) are attached to it. Or, I can have it look at its actual loads and determine this for itself.

This is important as it determines the amount of memory that will be available for each channel's buffers. (at 2 sq in, it should be obvious that this is an SoC implementation so "adding memory" is not an option :> ) The amount of memory available for each channel's buffers is important (critical!) as it determines the amount of "packet jitter" that can be tolerated in the transmission protocol -- if a buffer runs dry, the music stops! :<

But, these little devices can't control the other end of the wire! Nor the other traffic *on* the wire!

I keep a watchful eye over how much "reserve" I have in the buffers for each channel. If things start getting low, I know I have to do something. Letting the buffer run dry *abruptly* is worse than letting it run dry *gracefully*. E.g., if it *appears* that the buffer *will* run dry, I begin attenuating the signal going out the speaker. I'd much rather it seem to "fade out" than "click off". (actually, I'm lying here. There are some tricks I use to delay the decision point to *beyond* where it should actually be)
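In rough outline (the threshold and the Q15 arithmetic here are illustrative, not the actual code), the fade amounts to:

#include <stdint.h>

#define LOW_WATER   4096          /* samples: start fading below this     */

static int16_t shape_sample(int16_t s, uint32_t samples_left)
{
    if (samples_left >= LOW_WATER)
        return s;                                  /* normal playback     */

    /* gain ramps linearly from 1.0 down to 0.0 as the buffer empties,
       so an underrun sounds like a fade instead of a click              */
    uint32_t gain_q15 = (samples_left * 32768u) / LOW_WATER;
    return (int16_t)(((int32_t)s * (int32_t)gain_q15) >> 15);
}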

Once this has happened, I have lots of time to figure out how to *fix* things :> (by "I", I mean "the code running in the little client") before I complain to the user (also "me" :> )

The cause of the problem needs to be identified, first. Has the server crashed? Try to talk to it. Ask other nodes if *they* can talk to it. In which case, there is nothing I can do locally to get back on line. Power down the amplifier and wait for things to come back up (after logging an error at the syslog host). Let the server know of my plight and perhaps it will inform the PoE injector to power *me* down.

Have I lost connectivity? (switch failure?) Try talking to peers to see if they are still available. Ask *them* what their status is (if they are still "operational", then this must be a server bug -- perhaps the thread servicing my requests has died?).

Has the network become congested "temporarily"? Check the performance statistics of other protocols to see if they have been exhibiting signs of extraordinary delays. Check to see if data *is* starting to arrive again.

Note that the (considerable) resources needed to perform these diagnostics need not exist at (normal) run-time! Since they are only invoked when the buffers have run dry, I have all of the resources represented *in* those buffers available to use (dynamically reallocated!) in my diagnostics! :> Trying to do so with static allocations just makes the code more brittle and difficult to maintain! Setting aside these resources ALL THE TIME is just plain wasteful.

This last case is the most commonplace -- the network/server gets overloaded *temporarily* and you get a period of very long service delays. It will typically sort itself out -- usually rather quickly. I can note the duration of the "outage". It may be just a bit longer than I was willing to wait before I decide to apply the pad! :<

So, I can cautiously start accumulating data, again. And, after determining that the transient has passed, I can resume normal operation (I don't want to do this "too soon" as I don't want to "run dry" a second or two later -- the human will get annoyed as the audio clicks on and off and on and off)

Note that I also have to re-synch with the other clients before turning the audio back on. So, I will end up discarding much of the "new data" (audio) that I accumulated...

I can notice this pattern and decide extraordinary measures might be warranted. I can seize resources that had previously been set aside for other purposes (e.g., the "command interface") and disable them temporarily. If I am servicing *two* speakers, I can merge the buffers (note that the fixed memory available in the device means that a "stereo" device can handle *less* jitter than a mono device as the memory used must handle two signal paths instead of one) and drive both speakers with the same (mono) signal -- yeah, it's not how things are *supposed* to work but *some* sound is better than *no* sound!

With larger buffers, I can hopefully bridge more jitter (longer packet interarrival times). After a (short) while, I will probably see buffer utilization at 100.0% again. And, see packets arriving promptly -- indicating the network/server is behaving better.

Now, I can start *shrinking* the amount of space set aside for buffers with the (original) expectation that I will get the data that I need in time for *when* I need it! I can unwind the extraordinary measures taken to keep the music on (e.g., restore "stereo" effect, command interface, etc.) and return to "nominal" operation.

I suspect, given a *lot* of time, one could come up with a set of bizarre data structures/unions that will seamlessly let this sort of dynamic resource RE-sharing happen. Or, I can use an interface and programming style that already supports this sort of thing. free(3) the resources used by those "other" aspects of the device that are "just not as important" as the audio signal path when things get hairy; then restore them later once sanity has returned.

Likewise, use those "ineffectual" resources when the system has "crashed" (misnomer) to help re-tune its future performance.

And what would you call *your* allocator? :> *This* is mine! :>

(again, I remind you of "malloc() [call it by some other name so as not to muddy the typical expectations of malloc(3)]" ;-)

If you are going to have *a* malloc in your runtime (for other uses), then the TEXT space is already "wasted" (your term, not mine -- since I consider malloc to be a useful tool; just "difficult" in its traditional implementation). Unless you know a priori (compile time) the actual memory needs of each of those things (e.g., "how many speakers are attached to my device?"), then you too will "waste time" computing those requirements in some ad hoc manner.

The "RAM" requirements I have already mentioned are minimal. You are only concerned with the per fragment costs. Memory that remains in the heap has no visible overhead. If you want to operate as if the allocated memory is NEVER freed, then you could conceivably violate your contract with malloc/free and use the space set aside for bookkeeping (cool, eh? three sets of doubled letters!) as if it were part of the allocated fragment itself (this presumes a K&R style implementation).

And, documenting your particular "memory allocator" since you clearly have indicated you don't want to call it malloc. :>

I would argue that a pneumatic drill has limited use in an electronics lab (unless you are fabbing boards). OTOH, malloc has a potentially *big* role in a software project.

I'm waiting to hear of the "cleaner" solution to the menu example. I've got several other "deployed" examples you can work on after that! :>

I believe the problem is with expectations of memory management shortcomings. Too many "old wives tales" about what you can and can't do. Most of which fail to mention the caveats that *enable* you to avoid those problems.

Note that my "[call it by some other name..." comment was in the FIRST SENTENCE of my original post. If I had referenced it there as createFragment(), few folks would have grasped my intent -- and the pitfalls I was avoiding -- as quickly as by using "malloc".

And I'm *still* waiting for an answer to my OP (deadlock). :>

--don

Reply to
D Yuniskis

Hi Boudewijn,

(wow, every time I type that, I have to triple check my spelling!)

I think it's just one of those terms that has migrated into the vernacular over the years. E.g., I see no mention of it in System 6 (which is about as far back as I can conveniently go). Note, also, that memory was a different beast in the early days of UNIX (e.g., sbrk())

Yes. I'm talking about an efficiency hack. I lump memory allocation *in* the OS instead of on *top* of it in a standard library. As such, the memory management routines can avail themselves of the hooks that are available to the OS.

So, free can know about malloc's implementation. And, it can know that some request(s) may be pending "within" malloc for "fragment(s) larger than presently available". Since free "knows" that it is adding to the amount of available memory -- and since free()'s actions may result in the addition of a "large enough" fragment, or the *creation* of one (as the freed region is coalesced with existing fragments within the heap) -- it would be easy for free to peek at the "pending request" list to see if its actions would enable any of those requests to be satisfied. Then, instead of returning directly to the caller, it can return "through" malloc, deliberately handing the newly created "large enough" fragment to whichever request awaits it.

It's just "faster".
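In outline -- and every name here (pending_req, carve_fragment(), unblock()) is just a placeholder for the hooks the OS gives me, with the list housekeeping omitted -- the free() side looks something like:

#include <stddef.h>

typedef struct pending_req {
    struct pending_req *next;
    size_t              size;     /* what the blocked task asked for       */
    void               *result;   /* filled in here on success             */
    /* ...plus a task handle/semaphore used by unblock()                   */
} pending_req;

extern pending_req *pending_list;            /* requests blocked in malloc */
extern void  return_and_coalesce(void *p);   /* the "normal" free() work   */
extern void *carve_fragment(size_t size);    /* non-blocking cut, or NULL  */
extern void  unblock(pending_req *r);        /* wake the waiting task      */

void heap_free(void *p)
{
    return_and_coalesce(p);

    /* Before returning to the caller, see whether the memory we just
       added (or the fragment it coalesced into) satisfies any waiter.
       (Removing satisfied entries from the list is omitted here.)         */
    for (pending_req *r = pending_list; r != NULL; r = r->next) {
        void *frag = carve_fragment(r->size);
        if (frag != NULL) {
            r->result = frag;     /* hand the fragment over directly       */
            unblock(r);           /* requester resumes "inside" malloc     */
        }
    }
}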

Timers are handled independently -- since you want those tasks blocking with timeouts to be restarted when their timers expire regardless of whether or not you have any malloc/free activity.

I am still uncertain about what strategy to use when stuff is added to the heap (by free). I.e., try to satisfy *any* pending request? Try to satisfy the request of the task with the highest runtime priority? Try to satisfy the first/oldest request? etc.

Thanks, I'll have a look to see if they have anything pertinent.

Reply to
D Yuniskis

Not sure I understand all of what you are saying... :-(

This is not for deployment in a PMMU environment (though it could be). The cost of maintaining the linked list of fragments within the Heap is a void* per fragment (i.e., a pointer to the "next free fragment"). This is stored in the fragment(s) themselves. And, since they are "free" (unused), this is "no cost".

The overhead associated with allocated fragments is a size_t. Instead of accumulating the overhead in a kernel structure (as would be commonplace for large systems), the costs are distributed *with* the allocated fragments. (This is a vulnerability, as a careless application can scribble on this "(should be) protected" data and screw things up. We have a name for this sort of behavior: it's called a BUG!)

Unlike paged environments, the fragment (whether allocated or free) is always "mapped". So, there is no risk of thrashing when trying to walk the free list, etc.

Single link. You never have to go backwards (unless your free list maintenance strategy sorts the list using other criteria).

Reply to
D Yuniskis

On Thu, 04 Mar 2010 23:54:22 +0100, D Yuniskis wrote:

When I travel abroad, (especially France and some regions of the US) I sometimes call myself "Bo". It saves time and prevents people from "embarrassing" themselves trying and failing time after time, and getting into a bad mood.

IPA: /ˈbʌu̯dəʋɛin/

- /ʌu̯/ sounds like the 'o' in 'to bow', but shorter;

- /ʋ/ doesn't occur in English; the 'r' in 'red' comes close phonetically (it is still a 'w', but not as long and vowely as the English 'w').

Fair enough.

Surely the highest priority task is the one you want to hold up waiting for the shortest time possible. If a lower priority was there first, then it can wait a little longer, because it is just a lower priority.

--
Made with Opera's revolutionary e-mail program:  
http://www.opera.com/mail/
(remove the obvious prefix to reply by mail)
Reply to
Boudewijn Dijkstra

malloc() itself does not /require/ any overhead, in that there are no C standard requirements here. But while almost all implementations of malloc do need minimum of a size_t per allocation, it is quite possible to think of faster or more deterministic implementations of malloc that require more overhead, or implementations supporting additional features.

Additionally, many malloc's use pools of memory, which leads to more overhead, and they will often align or round-up the allocation blocks leading to more waste.

Whether all this adds up to "a lot more" overhead than other implementations of dynamic memory depends on the circumstances, of course.

I admit to being biased towards thinking of how malloc is often implemented, and how it is often used. There are certainly many ways to implement dynamic memory - even for a general allocator. But there are always tradeoffs - faster or deterministic algorithms will require more code and/or more ram.

I am not saying that malloc is "bad" as such - it certainly has its place. And better implementations are always welcome. But for small embedded systems, I believe it is normally a better idea to re-structure your code to minimise the need for dynamic memory, and to use application-specific code for the areas that still need some sort of dynamic allocation.

You've got that backwards, unless I've misunderstood you.

RAM space costs money in embedded systems - but being unsure about whether your software is correct or not costs a great deal more. So you only share memory if you are absolutely sure that there is no conceivable way that the two tasks will ever conflict in their memory needs. Even then it is seldom worth the effort. It is rare enough that you can handle the situation with a statically allocated "union", or some casts and a couple of wrapper functions.
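For example (names and sizes invented), the statically allocated "union" amounts to nothing more than:

#include <stdint.h>

/* Two phases that provably never overlap in time share one block.       */
static union {
    uint8_t net_rx_buffer[2048];   /* used only while the link is up      */
    uint8_t boot_scratch[1536];    /* used only during initialisation     */
} shared_area;

/* Each module touches only its own member (shared_area.net_rx_buffer or
   shared_area.boot_scratch); correctness rests entirely on the design
   guarantee that the two uses never coincide.                            */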

Correctness first - efficiency is secondary.

That's also an argument against sophisticated efficient malloc's - KISS, so that you know your software works.

It's also an argument against dynamic memory in general - dynamic memory is a common source of errors in programming. It's easy to make mistakes with it, and debugging those issues is much harder than debugging with statically allocated memory. Therefore it is easier to get your code correct (through good design, debugging, testing, proof of correctness, code reviews, etc.) when you can avoid dynamic memory as much as possible.

This sounds like you are taking a clear and simple fixed-size buffer allocation scheme and trying to squeeze it into the syntax for malloc().

That might be a useful approach if you are converting malloc-using code, but I would much prefer that a different scheme had a different name and interface.

If you avoid recursion, stack usage is reasonably consistent and deterministic (things like interrupts cause some variance that you must account for). But the reason you need dynamic memory is to handle run-time variations in memory needs - this makes it much harder (but not impossible) to analyse.

I've posted more on this in that thread.

Note, however, that I have been talking about small systems where your code can (and should) be written to have minimal use of dynamic memory, and that those few uses should have specific allocation functions. If you are talking here about a larger and more general system, then malloc may be appropriate - a single reasonably good general allocator will be better than many specific allocators. But you are also talking about parts of a system with minimal real-time requirements, making malloc less of a problem.

The big difference here is how you can tell when these sorts of functions are safe. As you say, you craft your maths so that you know your calculations will be correct - that's typically something that is done locally in the code you are working on. But the safe use of malloc depends on its use throughout the program, and the order of allocations and frees throughout the run-time of the program. You have to be /very/ careful about how you use malloc and free if you can ever hope to be sure that your mallocs will succeed when you need them.

My expectation of malloc (and free) behaviour is that it will take an indeterminate time, and may return nothing useful. It's not often that I have use for a function like that - but I certainly agree that it's fine to use it within those expectations.

My notion of what that task's needs will be is from the specific example I gave.

No, the tough part of engineering is making sure that what we make works correctly. If we can do it using less resources, so much the better - but correctness comes first.

Once you have a design (not necessarily coded) that is correct, you can start to think if it is possible to save resources without risk - if that is an appropriate balance of engineering costs versus production costs.

I don't mean that embedded developers should /waste/ resources. But you must always balance the pros and cons before trying to optimise resources.

I have said several times that there are times when malloc is appropriate, and I am talking about smaller systems and real-time systems, rather than "complex" systems (though these are all very vague terms).

I can't see dynamic memory being involved in any of this so far. You need a few buffers if you want to support packet fragmentation (though it would be much more sensible to simply avoid fragmentation at the server end), but these don't come from a heap.

I see buffers here, but I don't see anything whose size would vary at run-time. Your fifos would have a maximum depth that is specified at compile time, and be statically allocated.

Or are you suggesting that you might sometimes want a larger output fifo for handling synchronization, and sometimes more network buffers for handling fragmented packets?

A couple of options to balance your buffers is a case for a specialised memory allocation scheme (which would be extremely simple if you can decide everything at startup). I still don't see any use of malloc here.

You are trying too hard. That's okay if this is just a system for fun, but you are making a very complex solution to almost-imaginary problems (fragmentation won't occur in practice, for example, and can be better avoided by appropriate ICMP messages, MTU settings, and the don't fragment bit). KISS - that's how you write code that you know works.

I would say that it is better to write a system that (for example) supports 1 second of jitter and 6 channels than one that supports 1 second with 6 channels, 3 seconds with 2 channels, etc. If 1 second is not good enough, then you need more ram - end of story.

Of course, I'm simplifying here - any particular case has its own requirements.

Your word ordering is wrong - malloc has a role in *big* software projects.

There are good reasons why safety and reliability related embedded software standards such as MISRA ban or advise against dynamic memory in general, and malloc/free in particular.

There /are/ times when malloc is appropriate. But it should be avoided on small/medium systems when there are safer, faster, more predictable and easily verified alternatives.

It is true that malloc can be useful, and can be used safely. But these "old wives tales" are largely based on real-life practice - most programmers do not use malloc in a manner that would be considered safe and reliable in an embedded system. It is better to consider it a "dangerous" function and look for more appropriate solutions. It may turn out that malloc is, in fact, the best solution - but that's an unusual case.

You avoid deadlock by using a dynamic memory allocation system that provides the memory your program needs, when it needs it. If it does not do this, then the allocator is broken, or your program is broken, or you have too little ram, or the original design specifications are broken.

And if you can't be /sure/ that the allocator will provide the memory you need, when you need it, then your allocator is broken, or your design is broken.

Your allocator may also provide memory when the program /wants/ it, rather than /needs/ it, but that's optional. Your code should run fine within the required specifications (but not necessarily with optional or non-essential features) with a malloc implementation of

void* malloc(size_t size) { return 0; }

Reply to
David Brown

When I order pizza, I tell them my name is "Fred -- with a 'P'". Saves the hassle of them misspelling my name on the ticket -- and then the delivery driver mispronouncing that misspelling! :-/

I'm confused. Is it a fricative? I should just push this through my speech synthesizer but that will add its own artifacts :<

I let callers optionally pass a Timer and/or Mutex to each routine. So, if you (the caller) *know* there are no competitors, you can skip the Mutex. And, for "guaranteed to run to completion" routines (without competitors) -- like "release()" -- there is no need for a Timer.

The whole point is to let the developer get the subsystem to fit *his* needs instead of the other way around.
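Roughly like this (the types and helpers are placeholders, not my actual API):

#include <stddef.h>

typedef struct Mutex Mutex;     /* placeholder types, not a real API      */
typedef struct Timer Timer;

extern int   mutex_take(Mutex *m, Timer *deadline);   /* 0 on success     */
extern void  mutex_give(Mutex *m);
extern void *heap_cut(size_t size);                    /* raw allocator   */

void *heap_allocate(size_t size, Mutex *mtx, Timer *deadline)
{
    void *p;

    if (mtx != NULL && mutex_take(mtx, deadline) != 0)
        return NULL;            /* timed out waiting for the heap's lock  */

    p = heap_cut(size);         /* a caller with no competitors passes
                                   NULL for mtx and skips the lock        */

    if (mtx != NULL)
        mutex_give(mtx);
    return p;
}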

That *seems* right but I haven't thought it through to its conclusion. E.g., if the lower priority task *had* arrived earlier *and* memory had been available, it wouldn't have blocked.

Presumably, its blocking had some effect on the higher priority task eventually queuing up in the allocator as well.

If the lower priority task *had* managed to gain the memory requested *when* originally requested *and* the higher priority task eventually came along and blocked, the lower priority task would have had to inherit its priority in order to "get out of its way" (else deadlock as the higher priority task would effectively be blocked by the lower priority task -- forever).

So, perhaps the *right* solution involves *whichever* task is granted the resource inherits the priority of the highest waiting (blocked) task. This *seems* right. Which leaves the only decision to be "which task to give the resource to"

Dunno. I have to think through the various scenarios to see which is least prone to deadlock (hence the reason for the post). Its these sort of "little" decisions that cause things to break in very *grand* ways. :<

Reply to
D Yuniskis

You *need* a size_t per allocation -- even if all your malloc does is release the *entire* heap in the first request made of it (somewhere, you have to track the size_t of that initial heap!)

And exposing those pools as *visible* pools (with their own special API) somehow makes *them* use less memory than when they are buried within a malloc wrapper?

Or, having to maintain several different sized buffer pools to try to more closely match the size of the request to the size of the "whole buffer" allocated?

Or, do you want to compare a *sophisticated* malloc to a *naive* buffer pool implementation? (Let's at least *try* to compare apples to apples.)

No! That's where you are caught in your assumptions about malloc. Generalizing that "malloc is bad because..." pays a huge disservice to the wide variety of applications and developers out there. You're lumping everything into one bag and then claiming it "smells bad".

The menu system example uses a *naive* allocator -- a page of code including the "free()" routine -- and is deterministic, uses *less* code (than a static allocation -- think about it!) and less RAM, and executes in CONSTANT time. Because it was *designed* that way.

Your words:

"That's true enough - while malloc is totally inappropriate for sections that need RT response, ..."

Note that the menu example *has* RT requirements and meets them 100% of the time!

I don't see it that way. My first couple of products (i4004 days) were forced to deal with purely static allocations. It was a huge PITA trying to figure out when you could reuse each *byte*. I find it infinitely more efficient to deal with blocks of memory that I can freely reuse -- as the need arises -- instead of having to hunt around for "a byte here and a byte there".

Trying to share static allocations by wrapping them in unions means you have to expose inner working of one module to other modules (since they both/all must declare their usage of these shared unions in one place -- else you can't be sure one set of needs "fits" within another's). It also forces some really bastardized structures.

See the menu example and how you could wrap it up with static allocations.

As for "small" embedded systems... what do you call small? It's not a PIC 12 but it was originally released on a 64K Z80 so... (that's code + data)

Use static allocation if the memory usage is PERSISTENT. If it is going to "go away", then it is dynamic in nature (has no permanent place of residence)

Exactly. That's where the engineering comes in. I *often* kill off tasks that aren't needed while others are running (otherwise, they are just twiddling their thumbs!)

I'm still waiting to see the union mess you come up with to handle the menu example I gave you. Note that I didn't even include any of the "other" needs that the code on each of these "layers" might add to the requirements.

Then, show me a code snippet of how I could write the label "MTTU:" in reverse video beginning in the second column of line 4 of the Network dialog.

Correctness is easy -- if you are competent. The tricks are always trying to make it *fit* within the resources you have (or less).

Why do you assume malloc has to be more sophisticated to be efficient?

Use of pointers is a common source of errors in programming. Use of floating point is a common source of errors in programming. Use of arrays is a common source of errors in programming.

You're saying "don't use knives because you could cut yourself".

*I'm* saying, "LEARN HOW to use a knife so that you *don't* cut yourself!" [buffer pools]

A "clear and simple fixed-size buffer scheme" adds its own API to the mess. I'm saying malloc does this already for you!

Tell me how the buffer pool code differs from the *naive* malloc implementation:

name = create_pool(num_bufs, size_buf);
buffer = allocate_buffer(name);

vs.

heap = create_heap(num_bufs * size_buf);
buffer = allocate(heap, size_buf);

Note that you already *have* the "create_heap" function *somewhere* in your system -- even if it is inlined code. All I have done is allowed "allocate()" to accept a pointer to a "heap". Now, this heap behaves identically to your buffer pool -- it will run out of resources at exactly the same point in time as your pool does. It allocates in constant time.

You've had to invent a new API (create_pool, allocate_buffer, release_pool) just to do something that already can be done.

So, you must dislike qsort()?

You have to analyze *every* path through your code to see what it imposes on stack utilization (esp annoying in C++). How is that different than looking to see how and when malloc/free are called?

Define small.

Historically, most of the products that I have designed have been small (intentionally -- cost). My rewrite of "malloc" now is still for use in a reasonably small deployment (256K CODE, 128K data). The changes I have been proposing (in other threads) are attempts to give me several *different* management strategies in the same wrapper to suit different needs of *the* application. I'm *sure* no one will bother to rewrite a working system to use "static allocations" in lieu of my dynamic allocations. :>

Inertia? Or, perhaps just "it works" is probably a big enough disincentive for folks to screw with it after I release it.

What determines "minimal real-time requirements"? The user surely can't wait "indefinitely" for an allocation to succeed. And, it *must* succeed or he will consider the device broken!

No! Malloc is used where and when *I* want to use it. Just like division (and the dreaded fear of divide-by-zero) is used where I want!

But that's your expectation of *your* malloc. *Mine* behaves the way I want it to. Because I *made* it thusly. I can count on it just as I can count on sqrt(unsigned) to always work.

You must work in a comfortable environment where there are no cost or performance constraints placed on the products that you design. :> I typically have to meet a cost projection established when the product is conceived. I can't, later, say, "Oh, I need to add another 100KB of RAM (or ROM) to the product design. Can you scrap the PCB's and the case design so we can fit this in? And, by the way, we need to push the schedule back a few months for all of these changes..."

With experience, you *know* what tricks you can use to save resources (CPU + memory). I am constantly reusing old design approaches to leverage these "tricks". malloc/free are right up at the top of my toolkit because they let me do things that aren't practical otherwise. (e.g., like getting a full screen user interface to run *fast* over a 300 baud modem)

I don't see any distinction. Just like I don't see any reason to avoid floats in "real-time" applications. You just have to engineer the solution instead of slopping something together and hoping you can buy what you need later to make it work.

The example was to refute your comment:

"Why would you even want to do this - except as a way to work around an inappropriate dynamic memory system? It is far better to figure out what your system needs, ..."

I.e., this is "why I would want to do this"

Fragmentation of individual packets isn't the problem. That's handled in the network stack.

"Audio programming" (music) is a continuous stream of data. An infinite number of packets.

They need not arrive at the client in the order intended. So, you have to reassemble the "packet stream" in memory (i.e., you can't just push the most recently received packet out the speaker(s) as it may be a "future packet" and the packet that you need *next* arrives *after* it)

If you do that, then you will create two "channels" of buffers (one for each speaker) even though at runtime you discover that you only have *one* speaker attached to you. I.e., you waste half of your buffer space.

No. You need one or two *sets* of "music buffers" to contain the "audio program" that is delivered thru the speakers. You have other needs to handle things like the "control channel" that allows the user (via the server) to reconfigure parameters in the device. You have still other needs that pertain to synchronizing the audio stream(s) from this "node" with other nodes nearby.

And, as below, all of this has to be shared (when "disabled") with the error recovery software that runs when these things are NOT running properly.

Recall this was my clarification of my comment:

which you discounted.

I try to explain things so *others* (we aren't the only ones reading this) can understand the application.

On the contrary, if this was "just for fun", I would just put all of these on a separate network and dedicate a processor to keeping them satisfied.

However, it is intended as a learning platform for a control system that *can't* be "down". How do you tell the customer where his problem lies? "Gee, I missed a bunch of packets sometime earlier today..." How does that help him solve his problem? How does he continue to *use* the devices/system *while* things "aren't quite right"?

Sure, you can tell him that he has to upgrade his server; or needs to rearrange his network topology to reduce long paths; or needs to install new software, etc.

But, he wants to know why things are broke *today* and why it's going to take you 3 hours to get *to* him (or, maybe 3 *days*!) and "some amount of time thereafter" to get things running again. Trust me, he *won't* be anxious to spend another megabuck -- or, even finish paying you for the *first* system!

So, you design products to be very robust and work even when they "can't". It's design work up front that saves time and effort later.

Do the math. A second of audio is ~100KB. Two channels would be 200KB. And that's just for the "raw audio". No other overhead (signal processing, control channel, network stack, RTOS, system stack(s), etc.).
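(Assuming 16-bit samples at the 44.1 kHz CD rate: 44,100 samples/sec x 2 bytes/sample is about 88 KB/sec per channel -- call it ~100 KB/sec once you allow for headers and bookkeeping.)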

I.e., you design for much less than "1 second". And, make provisions to stretch that when it looks like it is going to break -- design effort doesn't add to recurring costs (and you can use that effort for other products as well)

No. It fits everywhere.

MISRA tells you not to run with scissors. So, if an infant is "hanging" (strangled) in the sash cords for your window blinds, you should WALK CAREFULLY from the office to the bedroom with the scissors in your hands and, once there, carefully cut the cord wrapped around your child's BLUE neck.

Most guidelines are intended to deal with shortcomings in design disciplines. I.e., they try to prevent people from doing things that they don't understand properly -- but don't *know* they don't understand.

E.g., I routinely pass "pointers-to-functions". It's a powerful technique. Like regular "pointers to data". Yet, both are prone to misuse and error. MISRA's attitude is to discourage their use. My attitude is to educate for their *proper* use.

Now we've moved from "small" to "small/medium". Yet we have no definition of what those are? :-(

I disagree. It's better to teach people how to use things *right*. Most people talk on their cell phones while driving. Should we make it illegal? Or, should we teach people to pull over when they want to make a call??

What happens with your buffer pool scheme when you run out of buffers? How do you prevent deadlock *there*? Or, do you just provide all the buffers anyone can *ever* need (despite their *typical* needs)? If so, why use a pool? Why not just give each consumer the DEDICATED memory he needs??

The same deadlock potentials exist with a buffer pool as with malloc. They are both shared resources. What do you do if two tasks arrive at the pool each looking for a buffer and NONE remain??

That's not the issue. You can design with flexible timing constraints. Even in HRT applications -- so long as you meet your deadlines. The fact that someone has to *wait* doesn't make anything broken -- so long as the resource is available IN TIME for the task to meet its deadlines.

No. Then your buffer pool implementation should allow *your* application to work when the pool is created with *no* buffers in it! Right??

Reply to
D Yuniskis

I don't know of any malloc implementations that don't have a size_t overhead per allocation, but I haven't looked closely at very many. And while it is very typical to store the size of a chunk at the address just below the returned pointer, it's not a requirement. It's also possible to use the same basic storage, but with variations. For example, on an 8-bit system, you might store a single byte for the size if the allocation is less than 128 bytes, and only use two bytes if it is bigger than that - thus halving your overhead in most cases.
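As a sketch (an illustration of the idea only, not any particular library's header format -- sizes are limited to 32767 here):

#include <stddef.h>
#include <stdint.h>

/* Write the size header at 'hdr'; return the number of header bytes used.
   Sizes below 128 fit in one byte (top bit clear); larger sizes use two
   bytes with the top bit of the first byte set.                          */
static size_t put_size(uint8_t *hdr, size_t size)
{
    if (size < 128) {
        hdr[0] = (uint8_t)size;            /* 0xxxxxxx                    */
        return 1;
    }
    hdr[0] = 0x80 | (uint8_t)(size >> 8);  /* 1xxxxxxx : high 7 bits      */
    hdr[1] = (uint8_t)(size & 0xFF);       /* low 8 bits                  */
    return 2;
}

static size_t get_size(const uint8_t *hdr)
{
    if ((hdr[0] & 0x80) == 0)
        return hdr[0];
    return ((size_t)(hdr[0] & 0x7F) << 8) | hdr[1];
}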

We are comparing virtual apples to imaginary apples here - this discussion is getting way too abstract. I get the feeling we are both just floating ideas without being able to express them clearly and succinctly, and there is probably a lot more overlap than there appears. I'll read the rest of your post, and maybe make a few comments, but I don't think we will get very far at the moment. You've certainly given me a few things to think about in this thread - I hope I've given you some too. But I'm afraid I don't have the time to comment here as completely as I should - especially as I think we are going round in circles. Maybe I'll get the chance later to post more.

I think all this boils down to what I mean by "malloc" is not the same as what you mean by "malloc". I view "malloc" as a general purpose allocator that will work with any sequence of allocations and releases of any sizes, returning memory from some sort of heap system. malloc is part of the standard library for your development tools, and knows nothing about your application or the actual pattern of allocations used. Anything else that is written for a particular usage or application is a special-purpose allocation system.

As I now understand it (I think), you have written a special-purpose allocation system and tucked it under a standard malloc call interface, and thus call it malloc.

If that's the case, then we agree on everything except the name.

No, I'm saying "don't use a chainsaw to cut a loaf of bread". Learn to use the chainsaw safely for when you need to chop down a tree, but use your tools appropriately.

What do you mean "you already have the create_heap function somewhere in the system"? You talk as though "create_heap" and "allocate" are some sort of standard functions that are already available. If that's the case in your system, then obviously I agree that there is no difference between the two API's you described above. But /I/ don't have these functions, so I could use either API.

And also note that I have not been suggesting a pool-based heap allocator like this (though such allocators have their place too). In the few cases where I have seen a need for any sort of dynamic memory allocation in small systems, the allocator has been much more specialised - with fixed sizes for everything, it is even simpler.

That's a bit of a non sequitur, I think.

I like qsort - in its place. There are times when a quicksort of some kind is the right sort of sort to use - there are times when a bubblesort is the correct choice.

Reply to
David Brown

That depends on how much overhead you spend on debugging aids, such as the owner process ID.

--
Made with Opera's revolutionary e-mail program:  
http://www.opera.com/mail/
(remove the obvious prefix to reply by mail)
Reply to
Boudewijn Dijkstra

If it were, I would have used /v/ instead. As IPA defines, /ʋ/ is a labiodental approximant, voiced in this case.

I'm not quite sure I understand. One would use the mutex to "hold off the competition"?

Obviously.

The most probable cause is that neither allocation request can be satisfied at this time.

How so? The allocator doesn't know that the higher priority task is about to make a request.

Why forever? Are you talking about cyclic alloc/free?

Reply to
Boudewijn Dijkstra

Hmmm... OK. I'll have to research it, then. Not used in the European languages?

I figured even *if* it "tried correctly" it would still end up coming out stilted.

If the Heap (there can be many) is not shared, no need to protect it with a synchronization primitive as "you" are the only consumer.

OTOH, if it *is* shared, pass a MUTEX (handle to a mutex) to the allocator (as well as to free, when the time comes) and the allocator will attempt to take the mutex before accessing those aspects of the Heap data structure (e.g., free list) that need to be manipulated atomically. So, if something else is already doing so, you will end up blocking (until the mutex is released).

Yes. So I am trying to think that line of reasoning through to its conclusion. And see how it affects the outcome of competing requests (i.e., it seems that the first request might want to be allowed to run when it can -- but inherit the priority of the highest priority task that is currently waiting for the resource it now holds)

No, I'm assuming the higher priority task is now blocking waiting on that resource -- which is being held by a lower priority task! So, the lower priority task should have its priority elevated so that deadlock can't occur.

E.g., don't think of it as memory. Think of a simple resource. If low priority task holds it and high priority task *needs* it, then deadlock can occur. If you elevate priority of low priority task holding resource, then low priority task won't get stuck later by some medium priority task, etc.

Low priority task takes resource (forget memory for the time being and think of just *a* shared resource). Some epsilon later, high priority task wants that resource. Can't have it because it is in use. So, high priority task blocks. Now, medium priority task comes along. Doesn't need the resource -- but, having higher priority than the "low" priority task, it forces the low priority task to wait. For as long as the medium priority task wants to run! High priority task can't preempt medium priority task because it is blocked. Low priority task can't finish using the resource that it holds because medium priority task outranks it. Deadlock.

If, OTOH, low priority task had its priority elevated to that of the highest priority task waiting for the resource it holds, then that "was low" priority task could continue to run and, presumably, release that resource (at which time its priority would return to normal, high priority task would become unblocked because resource was now available and resume running).
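In RTOS-neutral pseudo-C -- every call here (current_task(), set_priority(), block_on()) is an assumed primitive, not any particular kernel's API -- the inheritance idea is roughly:

typedef struct Task Task;

extern Task *current_task(void);
extern int   get_priority(Task *t);
extern void  set_priority(Task *t, int prio);
extern void  block_on(void *wait_queue);     /* sleep until woken         */

typedef struct {
    Task *holder;        /* task currently holding the resource           */
    int   saved_prio;    /* holder's pre-boost priority                   */
    void *waiters;       /* queue of tasks blocked on this resource       */
} Resource;

void resource_acquire(Resource *r)
{
    Task *me = current_task();

    while (r->holder != NULL) {
        /* Boost the holder so a medium priority task can't starve it.    */
        if (get_priority(r->holder) < get_priority(me))
            set_priority(r->holder, get_priority(me));
        block_on(r->waiters);
    }
    r->holder     = me;
    r->saved_prio = get_priority(me);         /* remember pre-boost prio  */
}

void resource_release(Resource *r)
{
    set_priority(r->holder, r->saved_prio);   /* drop any inherited boost */
    r->holder = NULL;
    /* wake the waiters here (highest priority first, or FIFO -- the open
       question in this thread)                                           */
}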

This was the essence of my post. I *think* requests can be satisfied in the order they arrive. As a thought experiment, imagine what would have happened if the heap had been bigger and the first request hadn't blocked...

Reply to
D Yuniskis
