Dynamic allocation and newlib-nano

P

pozz 7 years ago

I normally don't use dynamic allocation in embedded projects, however there are situations where this isn't possible.

I'm working on a project with LPC1768 Cortex-M3 MCU that features

32kB+32kB RAM. I'm using lwip+mbedTLS to make a TLS connection to AWS IoT Core through Ethernet.

After a precise allocation of data blocks (incoming Ethernet packets, outgoing packets and so on), it seems working. However I'm using dynamic allocation, mainly for outgoing packets: when some application (MQTT, DNS client, ...) wants to transmit, lwip requests some space from the heap.

lwip can be configured to use a set of memory pools of different sizes to avoid dynamic allocation, but it's very difficult to configure. During execution I saw allocation requests of very different sizes: 5-10 bytes, 50-100 bytes, 800 bytes, 1500 bytes.

However the dynamic allocation algorithm appears very strange to me. I expected that malloc() returns always the start of the heap if the heap is completely free. It is so at the beginning, however when some allocations and deallocations happen, even if the heap is completely deallocated (all the allocations were freed), a new allocation doesn't return the start of the heap. How to explain this behaviour?

I noticed my allocation pattern is very simple and, after 1-5 allocations, there are other 1-5 deallocations and I eventually have a completely free heap, when all the outgoing packets are serially shifted out and/or acked from the remote side.

For example:

- malloc(3)=addr1

- free(addr1) -> free heap

- malloc(5)=addr2

- malloc(60)=add3

- free(addr2)

- free(addr3) -> free heap

- malloc(800)=addr4

- malloc(5)=addr5

- malloc(32)=addr6

- free(addr5)

- malloc(80)=addr7

- free(addr6)

- free(addr7)

- free(addr4) -> free heap

With this pattern, I think I can simplify the allocation algorithm so that it returns always the start of the heap when it is completely free and reduce fragmentation of complex algorithm.

Vote

U

upsidedown 7 years ago

malloc() needs to store the actual allocated size (user requested size+possible align to word/longword boundary), so that free() can deallocate the total amount allocated. In addition dynamic allocation is often allocated as linked lists so some descriptors may be used. One convenient place to store this auxiliary data is at the top of the heap and only then add the user requested buffer, thus the user receiver never starts at the start of heap, but after this descriptor.

Vote

P

pozz 7 years ago

Yes, I know. When I wrote that malloc() returns a different address than the start of the heap, I meant it is *very* different.

In other words, if the heap is at 0x1000, after many allocations/deallocations, when the heap is again completely free, malloc() returns 0x1100, 256 bytes from the start of the heap.

Vote

N

Niklas Holsti 7 years ago

If it is always 256 bytes, whatever the pattern of allocations and releases, then I would guess that your heap manager allocates a 256-byte block for its own use, on first call, and keeps that block allocated thereafter.

If the offset varies depending on the size and order of allocations and releases, your are probably seeing an effect of the allocation algorithm which tries to find the "best" (or a "good enough") free block to allocate. Even if all heap blocks have been released, this "good" block may not be at the start of the heap, probably because the allocation algorithm does not coalesce all adjacent free blocks.

It is quite complex (and can be processor-heavy) to structure and manage the heap area in such a way that releasing all allocated blocks returns the heap to its original state as a single, large, empty area. Instead, the heap area tends to be left as a collection of "free" blocks, of various sizes, and the next malloc() call picks one of these blocks to use for the new allocation. But the details depend on the particular allocation algorithm, and there are _many_ different algorithms.

The only heap-allocation algorithm I know that always returns the heap to its initial state is the "buddy algorithm"

formatting link

but there may be other, newer ones.

Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .

Vote

P

pozz 7 years ago

No, it isn't my case.

Oh bad!

Complex? Really?

However is strange to me. If I have some allocated and freed blocks scattered on the heap space, I could understand. But when there aren't allocated blocks, but only free blocks, I think it's very simple to fallback to an initial state without *any* free blocks.

I'm using newlib and I think the allocator is this[1].

I tried to use mbedTLS buffer memory allocator[2] and it seems it works better than newlib allocator.

I had a problem during a TLS connection that was caused by an out of memory. After some allocations/deallocations, the heap was completely free. mbedTLS requests two allocations: 300 and 1300 bytes. I had around

2000 bytes of heap space, but the malloc failed. The 300-bytes block was allocated at an offset of 500 bytes from the beginning of the heap. Of course, 1300 bytes couldn't be allocated.

Increasing heap space solved the problem. But using mbedTLS allocator, solved the issue too.

[1]

formatting link

[2]

formatting link

Vote

J

Joe Chisolm 7 years ago

It could be trying to do a best fit algorithm. Also depends if it's trying to merge adjacent blocks. Might try malloc(3) malloc(5) free both malloc(8)

But I suspect since there is still plenty of free space in the heap malloc will take the quick way and alloc a new block. It wont do the expensive best fit or merge until it's out of space.

Chisolm Republic of Texas

Vote

N

Niklas Holsti 7 years ago

Ok, not complex in the special case when _all_ blocks are released, but complex in general to detect and coalesce adjacent free blocks while _some_ blocks remain allocated.

Yes, this special case could be detected by keeping count of the number of blocks allocated but not yet released, and reinitializing the heap when releasing a block makes this count zero. However, in programs which use the heap for more than one kind of data I expect this case to happen rarely.

That seems to be using the "first-fit" algorithm and a single free-block list holding blocks of any size. The free-block list is sorted into address order which lets the algorithm coalesce adjacent free blocks when such appear. However, it means that free() must traverse the (potentially long) free-block list when inserting the released block into the list at the proper place, which can take a while; Order(N) where N = length of free-block list.

After a glance at the code, it seems to me that this coalescing algorithm should result in a single, large free block after all allocated blocks are released, so I don't understand why that does not happen in your case.

Niklas Holsti Tidorum Ltd niklas holsti tidorum fi . @ .

Vote

G

George Neuner 7 years ago

Or older ones. Mark-Release allocators also return to the initial state. For those that don't recognize this ancient term, Mark-Release is stack allocation from the heap.

George

Vote

P

Paul Rubin 7 years ago

I wouldn't necessarily expect that, but your best bet is to inspect the malloc library source code if you want to know what it is doing.

For your purpose I wonder if you could use an arena-style allocator instead of a lot of small mallocs and frees. That means allocate a single chunk of memory, then sequentially allocate from within the chunk, then free the whole chunk.

It's great that you have TCP and TLS working in so little ram, but I wonder if you could make your life easier by using a bigger MCU with more memory, so you don't have to manage it quite so carefully.

Vote

P

pozz 7 years ago

Yes, it has been a very difficult project.

Yes, I know. However we have an already designed and produced board that we wanted to reuse with TLS.

Next hardware will have much more RAM :-)

Vote

P

pozz 7 years ago

I see.

I don't know too. I'm not sure I'm using that allocator, because it is included in newlib-nano, that are the libraries I selected in NXP MCUXpresso IDE.

Vote

Dynamic allocation and newlib-nano

Join the Discussion

Didn't find your answer?