Crashing dlopen() on Sharp Zaurus C860

Dear all, I have the following issue:

I am using gcc 2.95.3 with Objective-C extensions to compile for the Sharp C860 (Linux 2.4.18 kernel and glibc 2.2.2).

The code dynamically loads a lot of shared libraries through dlopen() but spuriously runs into an "Illegal Instruction".

Several tests have reduced the code to a simple main() function that dlopens several system libraries and then three of my own shared libraries. The strange things are:

  • it is not repeatable. Sometimes, the Illegal Instruction comes on the first library. Starting again (even in a while true loop) makes the first and second load and it fails on the third one. In approx. 1 out of 100 cases it loads all three.
  • there is some dependency on the code included in the library - but it even fails with a nearly empty shared library.
  • it makes no difference whether RTLD_LAZY or RTLD_NOW is specified
  • -rdynamic also makes no difference
  • running under gdb is also strange. Sometimes it loads without problems, sometimes it fails. The stack backtrace reports "can't access memory location". info share sometimes fails as well. When it does not fail, it shows that the failing library has been loaded.
  • writing a core dump also changes the gdb output.
  • the Illegal Instruction occurs within dlopen()
  • it *might* come from the _init code but I have not yet been able to verify this. Debug code that I added to the objc runtime (which I assume is called from _init) was not reached before the Illegal Instruction appeared
  • the strange thing is that the same code runs on a SL5500 without problems
  • The Qtopia system used on the Zaurus also relies heavily on shared libraries and seems to work
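A reduced test case of the kind described above might look roughly like the following sketch. The function name load_all and the idea of passing the library paths in as an array are assumptions made for illustration, not the original code. Each path is printed before dlopen() is called, so a crash inside dlopen() still shows which library was being loaded.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: try to dlopen() each path in order and return the
 * number of libraries that failed to load. The handles are deliberately
 * not closed, so all libraries stay mapped as in the real application. */
int load_all(int count, const char *const paths[])
{
    int failures = 0;
    int i;

    for (i = 0; i < count; i++) {
        void *handle;

        /* Announce the library first: if dlopen() dies with an Illegal
         * Instruction, the last line printed names the culprit. */
        fprintf(stderr, "loading %s ... ", paths[i]);
        fflush(stderr);

        handle = dlopen(paths[i], RTLD_NOW);
        if (handle == NULL) {
            fprintf(stderr, "failed: %s\n", dlerror());
            failures++;
        } else {
            fprintf(stderr, "ok (%p)\n", handle);
        }
    }
    return failures;
}
```

Calling load_all(argc - 1, (const char *const *)(argv + 1)) from main() lets the load order be varied from the command line; on older glibc versions the program has to be linked with -ldl.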

So I have several hypotheses:

  • there is a bug in the glue code for _init
  • there is a bug in the dlopen() function (where can I find the source code?) which depends on the size or alignment of the loaded library
  • there is a kernel bug behind dlopen() which fails to load the correct page
  • the CPU cache is not properly managed when code is swapped in

What I would really appreciate are any hints, help, and workarounds for this issue.

Many thanks,

hns


Reply to
Dr. Nikolaus Schaller

Tough problem. Some casual Google research suggests that you should try to upgrade your glibc distribution, which includes the dynamic loading code in libdl, if that is a possibility for you. You can get the source in a lot of places, including here:

formatting link

Reply to
Last2Know

Are there multiple threads at the time of the crash?

I have a test case which dlopen()s DSOs from 2 different threads and "reliably" crashes on Linux/x86/glibc-2.2

Cheers,

--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
Reply to
Paul Pluzhnikov

You should try the same code on OpenZaurus which uses a much more modern compiler (gcc 3.4.3) and glibc (2.3.3).

Mickey.

Reply to
Michael 'Mickey' Lauer

For some unknown reason I can't directly reply through Google...

@Last2Know:

> loading code in libdl

Unfortunately, this is not an option, as I want my applications to be installable without touching any of the preinstalled system libraries.

But using your hints I have found a thread

formatting link
where others report similar issues depending on the glibc version (2.2.6 works, 2.2.8 fails). They added a dummy dlopen(NULL, RTLD_LAZY); and it worked on all versions. I will try this as soon as possible.
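The workaround from that thread could be sketched as follows. The wrapper name load_with_dummy_open is hypothetical, and whether the dummy call has any effect on glibc 2.2.2 is untested here; the sketch only shows the order of the calls.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical wrapper: first obtain a handle to the main program (the
 * "dummy" dlopen(NULL, RTLD_LAZY) reported as a workaround), then load the
 * requested library. Returns the library handle, or NULL on failure. */
void *load_with_dummy_open(const char *path)
{
    void *self;
    void *handle;

    /* Dummy call: a NULL filename yields a handle for the main program. */
    self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL)
        fprintf(stderr, "dummy dlopen failed: %s\n", dlerror());

    handle = dlopen(path, RTLD_LAZY);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());

    return handle;
}
```

The dummy handle is intentionally left open; it would probably be enough to make the dummy call once at program start rather than before every load.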

@Basile Starynkevitch:

> bad powersupply, etc.... Have you any means to check your hardware? Did you try on another one? Can you change the RAM?

I am quite sure that it is not a hardware fault, as I have never seen a similar thing in the applications that come with the Sharp Zaurus. If the hardware were faulty, there should be at least occasional crashes of Qtopia applications.

And it clearly depends on my own shared libraries - so it *might* also be an issue of the linker creating a library that sometimes makes dlopen() run away.

@Paul Pluzhnikov:

> "reliably" crashes on Linux/x86/glibc-2.2

No, it is a single thread. But I have linked with -lpthread

@Michael 'Mickey' Lauer:

> compiler (gcc 3.4.3) and glibc (2.3.3).

It seems to be too much effort to get an OZ system running just to compare and then switch back. Even if it does not fail with OZ, I would still have no clue why it fails with the preinstalled glibc-2.2.2 or how to work around it.

Many thanks for all these hints so far! Additional ones still welcome!

-- hns

Reply to
Dr. Nikolaus Schaller
