ElectricFence Exiting: mprotect() failed: Cannot allocate memory

Bill 17 years ago

I am using electric fence 2.1.13 to try to find a memory allocation problem that occurs after my application runs for about 3 hours. When I link to the electric fence library, I get "ElectricFence Exiting: mprotect() failed: Cannot allocate memory" during initialization. Could this be the source of the error that takes 3 hours to occur? I wonder because all I see at this point is a 12 byte malloc.

According to a comment in efence.c, "On some systems it will be necessary to increase the amount of swap space in order to debug large programs that perform lots of allocation, because of the per-buffer overhead." How does one increase the amount of swap space? I am running Linux 2.6.26 on an MPC8248.

David Schwartz 17 years ago

I doubt that's the source of the error that takes 3 hours to occur.

I would recommend doing invasive debugging on a test system with significant additional memory. It's hard to help you without knowing more about your hardware. Do you have a hard drive? Do you have USB ports? How much memory do you have?

DS

Paul Pluzhnikov 17 years ago

What kind of problem? Efence is good at finding memory corruption problems, not memory allocation problems.

Unlikely.

A *single* 12-byte malloc was performed by that point? If so, your copy of efence is misconfigured, miscompiled, or busted in some other way.

Efence adds 1 page guard to every malloc. It is very rarely helpful in debugging non-toy applications. You may have better luck with Valgrind.
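To make the per-buffer overhead concrete, here is a minimal sketch of the guard-page technique, not Electric Fence's actual code. The helper name `guarded_alloc` is hypothetical, and the sketch assumes a Linux-style `mmap` with `MAP_ANONYMOUS`. Even a 12-byte request costs one data page plus one inaccessible page, and every buffer becomes its own mapping.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: return n writable bytes that end exactly where an
 * inaccessible guard page begins, so the first byte past the buffer faults. */
void *guarded_alloc(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    /* Round the data area up to whole pages (at least one page). */
    size_t data = n ? (n + page - 1) / page * page : page;
    unsigned char *base = mmap(NULL, data + page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    /* Revoke all access to the trailing page: this is the guard. */
    if (mprotect(base + data, page, PROT_NONE) != 0) {
        munmap(base, data + page);
        return NULL;
    }
    return base + data - n;
}
```

Writing to `p[n]` of a buffer returned by this sketch raises SIGSEGV at the faulting instruction, which is the whole point of the approach; the price is at least two pages per allocation.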

Try "man swapon".

Cheers,

In order to understand recursion you must first understand recursion. Remove /-nsp/ for email.

Bill 17 years ago

After about 3 hours, the program segfaults when trying to malloc 65K bytes. At the time, according to top, there is plenty of memory available.

I tried using valgrind, but it slowed down my application so much that it was useless.

Bill 17 years ago

I have a total of 128 MB of flash on my target board. No USB ports. Monitoring top, it does not appear that memory is being leaked, but it is behaving as if running out of memory. Is there a better way than top to monitor memory?

Paul Keinanen 17 years ago

Apparently you do not have even 64 KiB of _contiguous_ virtual memory available, but only a huge number of smaller fragments all over the memory. I guess that the system would run a few hours longer, if the largest allocation was 8 KiB :-).

Sounds like a typical dynamic memory fragmentation problem.

The other alternative, if the stack and dynamic memory occupy the same memory area (one growing upwards and the other downwards), is that the stack size is constantly increasing due to a programming error, finally inhibiting the growth of the heap.

Paul

Rainer Weikusat 17 years ago

Still assuming your description is correct, it is behaving as if the malloc code made an invalid memory access because of a corrupted pointer inside the heap. But you can easily verify whether the allocation should have succeeded, i.e. whether there was a contiguous area of at least 64K of 'unused VM' available:

Modify the segfault handler in the kernel to send a SIGSTOP instead of a SIGSEGV.
Use pmap to inspect the address space layout of the affected process after it has been stopped by the signal.
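The same check can be approximated from inside a program by scanning a maps listing for the largest hole between consecutive mappings. This is only a sketch: the function name `largest_gap` is made up for illustration, and it assumes the usual Linux `/proc/<pid>/maps` line format, where each line begins with `start-end` in hexadecimal.

```c
#include <stdio.h>

/* Hypothetical helper: return the size in bytes of the largest hole between
 * consecutive mappings in a stream formatted like /proc/<pid>/maps, or 0 if
 * fewer than two mappings were parsed. */
unsigned long largest_gap(FILE *maps)
{
    char line[4096];
    unsigned long start, end, prev_end = 0, best = 0;
    int have_prev = 0;

    while (fgets(line, sizeof line, maps)) {
        /* Each mapping line starts with "start-end"; skip anything else. */
        if (sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;
        if (have_prev && start > prev_end && start - prev_end > best)
            best = start - prev_end;
        prev_end = end;
        have_prev = 1;
    }
    return best;
}
```

A caller would pass the result of `fopen("/proc/self/maps", "r")`. A hole of 64K or more only shows that address space is available; it does not prove the kernel would grant the allocation.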

John Reiser 17 years ago

Under glibc, setting the shell environment variable "export MALLOC_CHECK_=2" [note the trailing underscore] performs additional internal consistency checks that are relatively inexpensive. Run "info libc" then search for MALLOC_CHECK_.

man swapon # how to increase swap space.
/proc/<pid>/maps reveals summary information for one process.
/proc/<pid>/smaps reveals more details for one process.
/proc/meminfo reports a system-wide summary.

Rainer Weikusat 17 years ago

MALLOC_CHECK_.

The OP is using a PPC-based SoC. I doubt that he has any swap space on board.

Bill 17 years ago

Below is what pmap -x gives for the process (snmpd) upon failing at a call to malloc for 65536 bytes. Does anything here indicate a possible problem trying to malloc 65536 bytes? It should be noted that a call to pmap -x before the failure, while snmpd was still running, gave identical results. Therefore, I wonder whether the cause of the problem can be seen here at all.

Address   Kbytes  RSS  Anon  Locked  Mode   Mapping
0f8b8000      64    -     -       -  r-x--  libresolv-2.6.so
0f8c8000     252    -     -       -  -----  libresolv-2.6.so
0f907000       4    -     -       -  r----  libresolv-2.6.so
0f908000       4    -     -       -  rwx--  libresolv-2.6.so
0f909000       8    -     -       -  rwx--  [ anon ]
0f91b000      16    -     -       -  r-x--  libnss_dns-2.6.so
0f91f000     252    -     -       -  -----  libnss_dns-2.6.so
0f95e000       4    -     -       -  r----  libnss_dns-2.6.so
0f95f000       4    -     -       -  rwx--  libnss_dns-2.6.so
0f970000      40    -     -       -  r-x--  libnss_files-2.6.so
0f97a000     252    -     -       -  -----  libnss_files-2.6.so
0f9b9000       4    -     -       -  r----  libnss_files-2.6.so
0f9ba000       4    -     -       -  rwx--  libnss_files-2.6.so
0f9cb000      28    -     -       -  r-x--  librt-2.6.so
0f9d2000     252    -     -       -  -----  librt-2.6.so
0fa11000       4    -     -       -  r----  librt-2.6.so
0fa12000       4    -     -       -  rwx--  librt-2.6.so
0fa23000    1264    -     -       -  r-x--  libc-2.6.so
0fb5f000     252    -     -       -  -----  libc-2.6.so
0fb9e000       8    -     -       -  r----  libc-2.6.so
0fba0000      12    -     -       -  rwx--  libc-2.6.so
0fba3000      12    -     -       -  rwx--  [ anon ]
0fbb6000      12    -     -       -  r-x--  libEclipseHms.so
0fbb9000     252    -     -       -  -----  libEclipseHms.so
0fbf8000       4    -     -       -  rwx--  libEclipseHms.so
0fc09000      16    -     -       -  r-x--  libEclipseVer.so
0fc0d000     252    -     -       -  -----  libEclipseVer.so
0fc4c000       4    -     -       -  rwx--  libEclipseVer.so
0fc5d000      12    -     -       -  r-x--  libEclipsePai.so
0fc60000     252    -     -       -  -----  libEclipsePai.so
0fc9f000       4    -     -       -  rwx--  libEclipsePai.so
0fcb0000       8    -     -       -  r-x--  libEclipseCil.so
0fcb2000     256    -     -       -  -----  libEclipseCil.so
0fcf2000       4    -     -       -  rwx--  libEclipseCil.so
0fd03000      12    -     -       -  r-x--  libEclipseConf.so
0fd06000     256    -     -       -  -----  libEclipseConf.so
0fd46000       4    -     -       -  rwx--  libEclipseConf.so
0fd57000      16    -     -       -  r-x--  libEclipseSem.so
0fd5b000     252    -     -       -  -----  libEclipseSem.so
0fd9a000       4    -     -       -  rwx--  libEclipseSem.so
0fdab000      12    -     -       -  r-x--  libEclipseLog.so
0fdae000     256    -     -       -  -----  libEclipseLog.so
0fdee000       4    -     -       -  rwx--  libEclipseLog.so
0fdff000       8    -     -       -  r-x--  libEclipseLst.so
0fe01000     252    -     -       -  -----  libEclipseLst.so
0fe40000       4    -     -       -  rwx--  libEclipseLst.so
0fe51000      80    -     -       -  r-x--  libpthread-2.6.so
0fe65000     256    -     -       -  -----  libpthread-2.6.so
0fea5000       4    -     -       -  r----  libpthread-2.6.so
0fea6000       4    -     -       -  rwx--  libpthread-2.6.so
0fea7000       8    -     -       -  rwx--  [ anon ]
0feb9000     640    -     -       -  r-x--  libm-2.6.so
0ff59000     252    -     -       -  -----  libm-2.6.so
0ff98000       4    -     -       -  r----  libm-2.6.so
0ff99000      12    -     -       -  rwx--  libm-2.6.so
0ffac000      12    -     -       -  r-x--  libdl-2.6.so
0ffaf000     252    -     -       -  -----  libdl-2.6.so
0ffee000       4    -     -       -  r----  libdl-2.6.so
0ffef000       4    -     -       -  rwx--  libdl-2.6.so
10000000    1192    -     -       -  r-x--  snmpd
10169000      32    -     -       -  rwx--  snmpd
10171000     552    -     -       -  rwx--  [ anon ]
30000000     116    -     -       -  r-x--  ld-2.6.so
3001d000      24    -     -       -  rw---  [ anon ]
30023000       4    -     -       -  r--s-  [ shmid=0x0 ]
30024000       4    -     -       -  rw---  [ anon ]
30025000       4    -     -       -  r--s-  [ shmid=0x0 ]
3005c000       4    -     -       -  r----  ld-2.6.so
3005d000       4    -     -       -  rwx--  ld-2.6.so
3005e000       4    -     -       -  -----  [ anon ]
3005f000    8188    -     -       -  rw---  [ anon ]
3085e000       4    -     -       -  -----  [ anon ]
3085f000    8188    -     -       -  rw---  [ anon ]
7ff61000     332    -     -       -  rw---  [ stack ]
--------  ------  ---  ----  ------
total kB   25084    -     -       -

CBFalconer 17 years ago

Please do not top-post, but do snip properly. Your answer belongs after (or intermixed with) the quoted material to which you reply, after snipping all irrelevant material. This gives prospective repliers a fighting chance at understanding the thread. See the following links:

(taming google) (newusers)

[mail]: Chuck F (cbfalconer at maineline dot net) [page]: Try the download section.

Paul Pluzhnikov 17 years ago

Are you *sure* the above description is accurate?

Is it that the application gets SIGSEGV *while* trying to do a malloc (IOW, it crashes *inside* malloc), or is it that the application gets NULL from malloc and gets SIGSEGV when it attempts to use returned memory?

The former implies heap corruption, the latter heap exhaustion.

MALLOC_CHECK_, Valgrind, efence all help with the former, but are useless for the latter.

Cheers,

In order to understand recursion you must first understand recursion. Remove /-nsp/ for email.

Bill 17 years ago

A backtrace in the SIGSEGV signal handler I put into the application points to the line where the malloc occurs. There is an if statement to check for a NULL pointer and print a message if malloc returned a NULL pointer. No message is printed.

Valgrind slows down the application too much to be effective.
efence exits during initialization with the "Exiting: mprotect() failed: Cannot allocate memory" error.
I am running a test right now with MALLOC_CHECK_=2 and will examine the results in the morning.

John Reiser 17 years ago

Beware of the possibility of buffering. Consider the program below. When run interactively with stdout connected to a terminal, you see:

f(10) is NULL.
Segmentation fault

where the first line is line-buffered stdout from the program, and the second line is unbuffered stderr from the shell. When run with stdout redirected into a regular file, you see only:

Segmentation fault

on stderr, and the file is *empty* ["No message is printed."] because the buffer was not flushed. So remember fflush().

-----
#include <stdio.h>

/* Stands in for an allocator that fails: always returns NULL. */
char *f(int a) { (void)a; return 0; }

int main(void)
{
    char *p = f(10);
    if (NULL == p) {
        printf("f(10) is NULL.\n");
        /* fflush(stdout);  THE FIX */
    }
    return *p;  /* deliberate NULL dereference: SIGSEGV */
}
-----

Rainer Weikusat 17 years ago

[...]

The last line quoted (the 552K [ anon ] mapping at 10171000) should describe the 'regular heap' of the application (the area used by brk/sbrk). Its present size is 552K and it could grow by about another 510M until it would 'hit' ld-2.6.so (brk/sbrk would fail with ENOMEM then).

The two 8188K segments, each preceded by a single page w/ 'no access', are most likely (userspace) NPTL stacks for two threads (default NPTL thread stack size is 8M, the lowest 4K are used as guard page so that an access beyond the bounds of one stack causes a [MMU] exception instead of overwriting data on the other stack). These stacks are allocated by calling mmap with MAP_ANON. There is still plenty of space for other anonymous mappings between the end of the highest of these mappings (0x3105e000) and the lowest presently used address of the conventional 'stack segment'.

Unless I am very much mistaken, this process should certainly be capable of allocating more virtual memory using either brk/sbrk or mmap.
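The arithmetic behind these figures can be checked with a couple of helpers. This is a small sketch: the function names are made up for illustration, and the inputs are the Address and Kbytes columns from the pmap -x listing.

```c
/* End address (exclusive) of a mapping that starts at `start` and spans
 * `kbytes` KiB, as listed in the Address and Kbytes columns of pmap -x. */
unsigned long mapping_end(unsigned long start, unsigned long kbytes)
{
    return start + kbytes * 1024UL;
}

/* Size in KiB of the hole between the end of one mapping and the start
 * of the next higher one. */
unsigned long gap_kbytes(unsigned long lower_end, unsigned long upper_start)
{
    return (upper_start - lower_end) / 1024UL;
}
```

With the listed values, the heap mapping at 0x10171000 (552K) ends at 0x101fb000, which leaves 522260K (about 510M) before ld-2.6.so at 0x30000000. Each 8188K segment plus its 4K guard page adds up to 8192K, i.e. 8M, and the upper one ends at 0x3105e000.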

BTW, while getting non-spam e-mails at least occasionally is nice :-), I usually read postings in the groups I frequent, except insofar as 'certain posters', whom I deem to be more of an annoyance than an information source, will be filtered by my newsreader.

Bill 17 years ago

When it crashes, I get a SIGSEGV signal with an si_code of SEGV_MAPERR and si_addr of 0x2d. What does address 0x2d represent? Is the problem that address 0x2d is not in the ranges shown in pmap?

Nate Eldredge 17 years ago

SEGV_MAPERR: page fault when accessing an unmapped page.

si_addr: the address that the program tried to access.

Well, sort of. 0x2d isn't in that range because that page isn't mapped. But it's not supposed to be mapped. The first page of virtual memory is always unmapped, so that NULL pointer dereferences generate faults. So it's an address that can't possibly be valid. If the crash is inside malloc, as you said earlier, then most likely some pointer in malloc's data structures got overwritten with 0x0000002d.

If you have a core dump, you might be able to trace backwards a little ways to figure out where this pointer itself is located. If you recognize the data around it, it might suggest to you what part of your program could be guilty of overwriting it. (As a start, 0x2d is ASCII '-'. Any part of your program use hyphens?)
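For reference, a handler that reports these two fields might look like the sketch below. It is not the OP's handler: the names `fmt_hex`, `on_segv` and `install_segv_reporter` are made up, and it sticks to `write` and `_exit` because stdio is not safe to call from a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Format v as "0x..." into buf (needs 2 + 2*sizeof v + 1 bytes) and return
 * the length. Uses no stdio, so it can be called from a signal handler. */
size_t fmt_hex(uintptr_t v, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof v];
    size_t n = 0, len = 0;

    do {                        /* collect digits, least significant first */
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);
    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)               /* emit them most significant first */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

/* Write n bytes to stderr, ignoring errors (nothing useful can be done). */
static void put(const char *s, size_t n)
{
    ssize_t r = write(STDERR_FILENO, s, n);
    (void)r;
}

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    static const char maperr[] = "SEGV_MAPERR at ";
    static const char other[] = "SIGSEGV at ";
    char buf[2 + 2 * sizeof(uintptr_t) + 2];
    size_t len = fmt_hex((uintptr_t)info->si_addr, buf);

    (void)sig;
    (void)ctx;
    buf[len++] = '\n';
    if (info->si_code == SEGV_MAPERR)
        put(maperr, sizeof maperr - 1);
    else
        put(other, sizeof other - 1);
    put(buf, len);
    _exit(1);
}

/* Install the reporter; returns 0 on success, -1 on failure. */
int install_segv_reporter(void)
{
    struct sigaction sa;

    sa.sa_sigaction = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling `install_segv_reporter()` early in `main` is enough. The reported address is the one the faulting instruction touched, which is not necessarily the value of the pointer variable in the source.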

Paul Pluzhnikov 17 years ago

The OP stated that he doesn't actually know that, only deduces this from lack of printed message (which, as John Reiser aptly suggests, may be due to naive use of stdout buffering; where stderr was likely called for).

Much more likely than pointer being overwritten is that malloc() in fact returned NULL, and OP then did (an equivalent of):

struct Foo *p = malloc(sizeof *p);
p->some_field_at_offset_0x2d = 1;
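To see why a NULL result would fault at exactly 0x2d, consider a hypothetical layout in which the field sits 0x2d bytes into the structure. The struct and the helper `field_address` below are assumptions for illustration only, not the OP's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 0x2d bytes of other members precede the field. */
struct Foo {
    char header[0x2d];
    char some_field_at_offset_0x2d;
};

/* Address that p->some_field_at_offset_0x2d would touch, computed without
 * dereferencing p. */
uintptr_t field_address(const struct Foo *p)
{
    return (uintptr_t)p + offsetof(struct Foo, some_field_at_offset_0x2d);
}
```

On Linux, where a null pointer converts to the integer 0, `field_address(NULL)` is 0x2d, so a store through a NULL `struct Foo *` would show up as SEGV_MAPERR with si_addr 0x2d.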

Cheers,

In order to understand recursion you must first understand recursion. Remove /-nsp/ for email.

David Schwartz 17 years ago

This is still not very good. What if 'printf' needs to allocate memory to do its job? What if 'fflush' does? In an error handler like this, you are better off calling 'write' directly.
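A sketch of what that could look like, assuming a wrapper around malloc; the name `checked_malloc` and the message text are made up for illustration:

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical wrapper: report an allocation failure with write(), which
 * is unbuffered and needs no memory of its own, then hand the result
 * (possibly NULL) back to the caller. */
void *checked_malloc(size_t n)
{
    void *p = malloc(n);

    if (p == NULL) {
        static const char msg[] = "malloc returned NULL\n";
        ssize_t r = write(STDERR_FILENO, msg, sizeof msg - 1);
        (void)r;                /* nothing sensible to do if this fails */
    }
    return p;
}
```

Because the message goes straight to file descriptor 2, it appears even if the process dies on the very next instruction and even if stdout is redirected to a file.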

DS

Rainer Weikusat 17 years ago

This means roughly 'it is more likely that the system ran out of memory than that the application contained a programming error'. But this is again a question which can be answered very simply: Check the PC/IP value at the time of the segfault. That's either within malloc (as the OP has repeatedly claimed) or within application code.

So, what is it?
