Embedded Linux: With or without MMU

Hi all,

As a follow-up to my thread "Embedded Linux Vs. Real time Linux", I have another question regarding embedded systems based on Linux.

Is it possible to run an ordinary Linux on a 32-bit architecture that does not include MMU hardware? If so, what is the idea of uClinux if you can use an ordinary distro (if ported, of course)? Also, if it is possible, what is required in terms of kernel setup? In my head there must be a lot of kernel code that is irrelevant because of the missing MMU.

I would like a CPU to include an MMU in order to avoid tricky memory-violation bugs and problems with heap fragmentation. It also gives me a clean separation between the different threads of execution in my application(s), and between my high-level application code and the low-level kernel code (drivers and such). Any other reason to want an MMU included?

Is the performance loss from using an MMU dependent only on the hardware architecture of the MMU, or is it also software (Linux kernel) dependent? Do you have any idea of the performance loss when using the MMU hardware?

Again sorry for the noob questions.....Brave new world ;)

Best Regards

MMJ

Reply to
MMJ

No, that is why it's called "full" Linux (if that is what you mean by "ordinary").

But the "official" kernel distribution does support MMU-less CPUs by selecting the appropriate architecture. Some architectures always come without an MMU, and for some (such as ARM) both an MMU version and an MMU-less version are supplied (selectable in the kernel configuration). The MMU-less Linux version is called µClinux.

Debugging might be better, as a user process going wild can't destroy the Kernel.

If you have a safety-critical application an MMU provides a better fall-back behavior of the device in case part of the user software fails.

The MMU itself slows down the CPU (fast, cheap CPUs like the Blackfin don't have MMUs; with virtual CPUs in FPGAs, MMUs are often avoided for performance reasons). Moreover, the kernel needs to deal with programming the MMU, and with ill-designed hardware (such as ARM) the MMU is "behind" the cache (as viewed from the CPU), so the cache needs to be flushed whenever the MMU content changes (i.e. on any task switch). (The x86 CPUs are fine in that respect.)

That depends greatly on the application. If you mainly run a single task (and you don't have the choice of a faster CPU chip) you will not notice any slowdown, but if you do many task switches the MMU might become a problem.

-Michael

Reply to
Michael Schnell

Does the lack of an MMU place many limitations on which system calls can be used in user software? As far as I know, uClinux does not include the full Linux API?

So another point here might be boot time. If my application should crash, the kernel will keep living and should be able to restart the application without rebooting the board?

Is this design "mistake" general? How about MIPS and PPC?

Great info. Why are task switches a problem? Is it because the cached table entries must be refreshed when changing memory space?

Reply to
MMJ

The biggest problem is that the lack of an MMU means that it is impossible to support the fork() system call. Not having this system call means that many Linux applications cannot be supported.
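
For illustration, here is a minimal sketch (names and output are just placeholders) of the classic fork() idiom, where parent and child both keep running the same program in separate copies of the address space - exactly the copy-on-write duplication that an MMU-less kernel cannot provide:

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();          /* not available on uClinux */
    if (pid == 0) {
        printf("child keeps running the same program\n");
        _exit(0);
    } else if (pid > 0) {
        waitpid(pid, NULL, 0);   /* parent continues independently */
    }
    return 0;
}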

No MMU means that your kernel may be totally trashed or, more likely, corrupted in some perverse fashion that causes weird stuff to happen. Your best guardian against this is a system watchdog that forces a microprocessor reset if the watchdog goes off.
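
A rough sketch of that approach, assuming the kernel's standard watchdog driver is enabled and exposed as /dev/watchdog (device name and timeout are board-dependent): the application keeps writing to the device, and if it crashes or hangs the hardware resets the processor.

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/watchdog", O_WRONLY);
    if (fd < 0)
        return 1;
    for (;;) {
        write(fd, "\0", 1);   /* "pet" the watchdog */
        sleep(1);             /* must be shorter than the watchdog timeout */
    }
}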

Every OS that uses an MMU runs more slowly than one that does not. The degree of slowdown, however, is quite small for general purpose CPUs (e.g. x86, 68K, MIPS, etc). For DSP chips such slowdowns would be a killer since running small amounts of code very quickly is what they are designed to do. DSPs are not designed to run a general purpose OS.

In general your response is correct. A task switch will require the MMU hardware to be reloaded for the new task (i.e. process). However, this operation is normally highly optimised for each processor architecture that an OS supports and runs as fast as possible. The scope for speeding up existing code in this area is very small. You would need to be a genius from another dimension to get another factor of 2 out of the existing code for reloading an MMU (or TLB).

Of course, under Linux, some task switches are really thread context switches. The great thing about threads is that all threads share the same address space. Thus a task switch that is a thread context switch does not require an MMU reload, and is therefore faster.

Reply to
gordy

If you have an application crash on a system without an MMU, you probably don't want to just restart the app. The app may very well have caused memory corruption to any other process or even the kernel and those errors might not show for hours or even days.

Reply to
AZ Nomad

AFAIK the API calls are quite different with µClinux, but you never do direct Linux API calls. You need to link your application against one of the µC-aware libraries (instead of glibc), and thus you will notice no difference in "normal" applications. OK, there is no "fork" and you need to use vfork instead, which works a bit differently, but you supposedly will not use fork a lot anyway in your own code in an embedded device. If you want to do multitasking you supposedly will use the pthread library.
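
For instance, a minimal pthread sketch like the one below should build and behave the same with or without an MMU, since all the threads live in one address space anyway (the worker function and arguments are just placeholders):

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    printf("worker %ld running\n", (long)arg);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}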

You will need a hardware watchdog, of course, to detect crashing. Supposedly µCLinux will boot faster than full Linux.

I don't know anything about MIPS. I do suppose that the PPC MMU/cache system works like that of the x86, as it was originally meant for desktops as well.

If there are no task switches, there is in fact no difference for the (single) running application whether it runs on full Linux or µClinux (besides the potential difference in hardware execution speed).

With any task switch the OS has to do some work to reprogram the MMU (and on ARM the cache gets invalidated).

-Michael

Reply to
Michael Schnell

In terms of MMU performance loss I guess that the pthread lib will do best, since (AFAIK) multiple POSIX threads within the same application run in the same address space? A fork() call will (AFAIK again) spawn a replica of the process in another address space - or is this assumption wrong?

Won't I run into problems when porting general Linux applications that I need into my system, if I run uClinux?

Again, because of the lack of a virtual memory system?

-- MMJ

Reply to
Morten M. Jørgensen

You are right that by default threads (e.g. created by pthread) use the same address space, while different processes (if not on µClinux) (e.g. created by fork) by default use different address spaces. But whether that means switching threads has much less overhead, I can't say. Between the two threads the kernel must run, and it uses another address space than the threads, so the MMU might be involved anyway.

As said, "normal" applications (that can be linked against µCLinux aware libraries) should not notice the difference.

I do suppose so.

-Michael

Reply to
Michael Schnell

A good indication of whether or not a given Linux application will run under ucLinux is if it has a mingw windows port. Windows does not implement a "fork", so programs compiled with mingw (which is a fairly minimal wrapper) can't use fork - they must use "vfork" for new processes. If the application has a cygwin windows port but no mingw port, then it *may* use "fork", since cygwin implements it (slowly).

Reply to
David Brown

IMHO, "normal" applications don't spawn other independent applications. So there is no need for fork anyway.

Moreover, the "standard" purpose of fork() is not to let the running application run twice, but to spawn a different executable file. And this is done with vfork() (nearly) exactly as with fork(). As far as I remember, I read that in full Linux you can use vfork() as well (though it's not recommended), and in µClinux you can only use vfork(). It looks like in most cases the only porting effort is adding the "v" :).

-Michael

Reply to
Michael Schnell

I found an appropriate reference:

formatting link

The basic difference between the two is that when a new process is created with vfork(), the parent process is temporarily suspended, and the child process might borrow the parent's address space. This strange state of affairs continues until the child process either exits, or calls execve(), at which point the parent process continues.

-Michael

Reply to
Michael Schnell

IMHO this means if you do the "normal" stuff (i.e. just starting a program from a file):

if (!vfork()){ exec?(...); }

it does not matter if you use fork() or vfork().
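
A slightly fuller sketch of the same idiom (the program path is just a placeholder), which works with either fork() or vfork() because the child calls exec immediately and _exit()s if the exec fails:

#include <unistd.h>
#include <sys/wait.h>

/* Spawn an external program; the path is only an example. */
static int spawn(const char *path)
{
    pid_t pid = vfork();
    if (pid == 0) {
        execl(path, path, (char *)NULL);
        _exit(127);   /* exec failed; never return from a vfork child */
    }
    if (pid < 0)
        return -1;
    return waitpid(pid, NULL, 0);
}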

-Michael

Reply to
Michael Schnell

Another reference that also talks about the differences between fork and vfork that you will see if the child does not immediately call exec?().

-Michael

formatting link

If your app does an exec immediately after the fork, then it's usually really easy. Just replace fork with vfork, and that's about it.

If not, then life is more difficult - you need to very carefully audit, and sometimes re-factor, the application flow around the fork.

Things to remember:

  1. The parent blocks until the child calls exec() or _exit().

  2. The child shares all data and stack with the parent, so it must not return from the function that called vfork (that would unwind the stack).

  3. Any variable modifications (local or global) by the child must be carefully checked for side effects in the parent. Watch for side effects in library calls, like the "errno" variable.
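
One alternative worth mentioning, where the C library provides it, is posix_spawn(): it hides the fork-versus-vfork details inside the library, so none of the pitfalls above leak into application code. A rough sketch (the path and argument list are placeholders):

#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

static int run(const char *path)
{
    pid_t pid;
    char *argv[] = { (char *)path, NULL };

    /* The library does the vfork()/exec() dance internally. */
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return -1;
    return waitpid(pid, NULL, 0);
}
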
Reply to
Michael Schnell

Here is a rather detailed article regarding the topic

formatting link

-Michael

Reply to
Michael Schnell

There are three common uses that I can think of for (v)fork from "normal" applications. One is in forking servers, another is for executing external subtasks, and the third is for splitting a task into parts executed in parallel.

In the first case, things like webservers will often fork new processes to handle incoming connections. In this situation, it must be a real fork, since the new process keeps the same code and inherits things like file handles from the parent. This is a traditional unix server arrangement, and does not work well under uClinux or Windows ("fork" in *nix is extremely efficient using COW, but very slow if you don't have an MMU and must copy everything, or if the OS simply doesn't support the concept). Such servers need to be heavily modified to work without fork - they need to either use select() and other such asynchronous techniques, or they must use threads instead of processes. (Modern Apache, for example, uses a mixture of forks and threads.)

For applications that fork off external subtasks, you normally see a fork/exec pair, often connected by a pipe to the parent. This sort of structure is normally fairly easily modified to a vfork.

For applications that use fork to parallelise their execution (keeping the same binary, but with different processes executing different parts of the code), it is probably better to re-write using threads. Traditionally, *nix was bad at thread handling (there was no standardisation, and it was very unclear how threads related to processes for scheduling). Since fork was so cheap on *nix, there was no real need for threads - unlike on Windows, where fork is expensive so threads were needed. But modern Linux and uClinux handle threads well, making them a good choice in many situations.
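
As a rough sketch of the select() approach mentioned above (error handling and the real protocol are omitted; the echo behaviour is just a placeholder), a single process can multiplex the listening socket and all client sockets without ever forking:

#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

void serve(int listen_fd)
{
    fd_set master, ready;
    int maxfd = listen_fd;

    FD_ZERO(&master);
    FD_SET(listen_fd, &master);

    for (;;) {
        ready = master;
        if (select(maxfd + 1, &ready, NULL, NULL, NULL) < 0)
            break;
        for (int fd = 0; fd <= maxfd; fd++) {
            if (!FD_ISSET(fd, &ready))
                continue;
            if (fd == listen_fd) {
                int c = accept(listen_fd, NULL, NULL);  /* new connection */
                if (c >= 0) {
                    FD_SET(c, &master);
                    if (c > maxfd)
                        maxfd = c;
                }
            } else {
                char buf[256];
                ssize_t n = read(fd, buf, sizeof buf);
                if (n <= 0) {            /* client closed or error */
                    close(fd);
                    FD_CLR(fd, &master);
                } else {
                    write(fd, buf, n);   /* trivial echo for illustration */
                }
            }
        }
    }
}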

Reply to
David Brown

Hi Michael,

Thx for all your great answers! I'll look into all the links you have thrown!

BR

-- MMJ

Reply to
MMJ

Are you sure about this? If it were true, the performance would completely suck. Remember, ARM was originally designed for desktop use as well, AFAIK... wasn't it used in those Acorn Archimedes machines?

Reply to
Xenu The Enturbulator

The first ARM was for the Acorn Archimedes machines, but it did not have an MMU, and I'm not sure that it even had a cache (it ran at 8 MHz IIRC, and in those days memory was not much slower than cpus).

There are two ways to handle cache and MMU - you can cache by physical address (which causes slower access to the cached data, as addresses need to be translated before accessing the cache), or you can cache by virtual address (which is faster for the cpu to access as the logical addresses are used directly, but it requires a cache flush when changing the MMU maps).

I don't know which method the ARM uses. I've a vague feeling that on larger processors, L1 caches use virtual addresses while L2 (and L3) use physical addresses, but that could be wrong.

Reply to
David Brown

An excellent explanation, IMHO!

Thanks,

-Michael

Reply to
Michael Schnell

I'm not sure how it really works (you would need to take a look at the Linux source code). But as the application can't access the kernel's memory and the kernel can, _something_ must be done _somehow_ when entering and leaving kernel land.

-Michael

Reply to
Michael Schnell
