Filesystem performance overheads?

Question

Hi all,

Is anyone aware of studies that have been done to measure the performance overheads that result from using a filesystem? We want to know the performance loss that we would suffer using a filesystem like FAT16 on a ramdisk versus using the memory as it is (raw reads and writes to memory, without considering it as a ramdisk with a filesystem).

As far as I understand, the performance loss would be minuscule, especially compared to what the filesystem buys us in convenience. Still, I am looking for some hard facts to back up this claim.

Thanks & regards,
Sachin

Maxim S. Shatskih · Accepted Answer

Measure it yourself, by using, say, UNIX shell scripts.

-- Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation

Alexander Skwar · Answer

Maxim S. Shatskih:

Correct. No "generic" performance test will give you the *exact* stats that *you* need.

Alexander Skwar
-- One is not born a woman, one becomes one. -- Simone de Beauvoir

toby · Answer

To which I would add, you should also look closely at what you specifically need from your data store. You might find that a simpler data organisation than an off-the-shelf filesystem suffices; if not, there are a large number of filesystems to choose from, with different characteristics, features and tradeoffs. Don't just test one.

Bill Todd · Answer

You don't need a 'study', you just need to have some familiarity with what happens in a file system disk access.

We want to

Reads and writes to memory take fractions of a microsecond these days. Reads and writes to a RAM disk can easily take over 100 microseconds, after they've made it from the application into the operating system, through both the file system and disk-driver layers, and back again.

No, in fact the performance difference between direct memory access and RAM disk access is more than a couple of orders of magnitude. Of course, the performance difference between a RAM disk and a real disk can also be something close to a couple of orders of magnitude, so if you were instead asking how much *additional* performance improvement you could get over using a conventional disk by using direct memory references rather than a RAM disk the answer would be, "Not too much."

As usual, the way you frame the question has a lot of influence on what the answer to it is.

- bill

Paul Keinanen · Answer

This is definitely true for seek performance, but if several contiguous blocks are to be transferred, the RAM disk might even be slower if copied under program control, while a real disk might perform the transfer using DMA with minimal program intervention.

Paul

Anthony Roberts · Answer

Well, it depends on how you define "minuscule". Compared to using memory directly, no.

A read or write through any filesystem requires a system call. The computer has to trap to the OS (involves a context switch), process the system call (Linux is VERY good at this, but it's still hundreds of instructions), and then do the operation which may involve editing several regions of the ramdisk filesystem. You might ask the question "Can't I mmap(2) the file in?", to which I would reply "Yes, but then what would be the point of using a file on a ramdisk?"

The best case for a read or write to a ramdisk is the minimal system call overhead, which is still hundreds of clock cycles assuming all the relevant code is in the CPU cache. The best case for a read or write to memory is L1 cache, which is like 1-2 clock cycles, which can be less than a nanosecond.
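A rough way to see this gap on your own machine is a small timing sketch like the one below. It is only an illustration, not a rigorous benchmark: the function names, block size and iteration count are made up for the example, it relies on `os.pread` (available on Unix-like systems), and the Python interpreter adds overhead to both paths, so the in-memory case looks slower than it would in C. Passing a directory on a RAM-backed filesystem (for example `/dev/shm`, which is an assumption about your system) makes the file case a ramdisk read.

```python
import os
import tempfile
import time


def time_syscall_reads(path, block_size, count):
    """Average seconds per os.pread() call; every call enters the kernel."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.pread(fd, block_size, 0)
        return (time.perf_counter() - start) / count
    finally:
        os.close(fd)


def time_memory_reads(buf, block_size, count):
    """Average seconds per copy of a slice from a buffer inside this process."""
    view = memoryview(buf)
    start = time.perf_counter()
    for _ in range(count):
        bytes(view[:block_size])
    return (time.perf_counter() - start) / count


def compare(directory=None, block_size=64, count=100_000):
    """Return (per-call file read time, per-call memory read time) in seconds."""
    payload = os.urandom(4096)
    # directory=None uses the default temp dir; pass a tmpfs path to test a ramdisk.
    with tempfile.NamedTemporaryFile(dir=directory, delete=False) as f:
        f.write(payload)
        path = f.name
    try:
        return (time_syscall_reads(path, block_size, count),
                time_memory_reads(payload, block_size, count))
    finally:
        os.unlink(path)


if __name__ == "__main__":
    file_t, mem_t = compare()
    print(f"file read:   {file_t * 1e9:.0f} ns per call")
    print(f"memory read: {mem_t * 1e9:.0f} ns per call")
```

The absolute numbers depend heavily on the machine and the interpreter, so treat the ratio between the two figures as the interesting part.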

If the filesystem buys you convenience, just use the filesystem. Linux does very good caching with unused system memory, which is similar to the overhead of using a ramdisk but it's less of a PITA. Without knowing why a file is convenient for you it's hard to say, but if it's something like "it's easy to append to" then you can use a linked list or something to do the same thing in memory, maybe have a thread to spew the records to disk asynchronously.
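The "keep the records in memory and have a thread spew them to disk" idea could look something like this minimal sketch. It swaps the linked list for Python's `queue.Queue`, and the names `start_writer`, `stop_writer` and `writer_loop` are invented for the example.

```python
import queue
import threading


def writer_loop(records, path):
    """Drain records from the queue and append them to path until None arrives."""
    with open(path, "ab") as out:
        while True:
            record = records.get()
            if record is None:  # sentinel: the producer has finished
                break
            out.write(record)


def start_writer(path):
    """Start the background writer thread; returns (queue, thread)."""
    records = queue.Queue()
    thread = threading.Thread(target=writer_loop, args=(records, path))
    thread.start()
    return records, thread


def stop_writer(records, thread):
    """Signal end of input and wait until all queued records are written."""
    records.put(None)
    thread.join()
```

The application only ever calls `records.put(b"...")`, which is an in-memory append; the file I/O happens on the other thread. A real version would also need to decide what happens to queued records if the process dies before they are written.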

In summary: ramdisk takes a big hit relative to directly using memory.

Jean-David Beyer · Answer

You will probably have to do this with your own machine under the conditions you are really running under with the programs that really concern you.

Imagine that the file system is 1000x slower in some sense than the RAM, but that you do little IO between the compute-limited processing of whatever it is you are reading or writing. Then the IO performance overhead would really be of no interest, and if speed is a problem, the processing algorithms would need improvement, not the IO.

At the opposite extreme, imagine your process were completely IO limited. Then the file system overhead would matter a lot. If you run a 5400 rpm hard drive with a tiny buffer using an IDE interface transferring a byte at a time, this could be serious, whereas if you run a 15,000 rpm Ultra/320 SCSI hard drive with an 8 Megabyte buffer and your OS chains the SCSI commands, the file system overhead can be considerably less. Also, if you run Linux, you can have an enormous amount of stuff cached in RAM even with the file system in use, and avoid doing a lot of IO if you have enough memory for the cache.
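The two extremes above amount to simple arithmetic, sketched here. The function name and the example fractions are made up for illustration; the 1000x figure is the hypothetical one from the first case.

```python
def overall_slowdown(io_fraction, io_slowdown):
    """Factor by which total run time grows when only the I/O part gets slower.

    io_fraction: share of the original run time spent on I/O (0.0 to 1.0).
    io_slowdown: how many times slower the I/O path becomes.
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    # The non-I/O part is unchanged; the I/O part is scaled by io_slowdown.
    return (1.0 - io_fraction) + io_fraction * io_slowdown
```

With I/O at 0.01% of the run time, `overall_slowdown(0.0001, 1000)` comes out at about 1.1, so a 1000x slower data path costs roughly 10% overall. With I/O at 90% of the run time, `overall_slowdown(0.9, 1000)` is about 900.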

Only after you have considered all these issues does the question of the file system used begin to be of interest.

But with all these (and other) variables, you can see why no one would bother doing such a study. It would take too much time to do, and too much space to print results that would be of very limited interest.

Jonathan Bartlett · Answer

It may be "small" but not "minuscule". For every access you would have to do a context switch and the kernel would have to run just to read data, as opposed to just being able to read it with one instruction and no context switch.

Jon

Gene S. Berkowitz · Answer

FAT16 (and variants) require walking the FAT chain whenever it is necessary to append, truncate, or seek in a file. Depending on cluster size and file size, that can be quite significant, and result in quite noticeable delays. Additionally, depending on the implementation, it's necessary to only allow one process access to the FAT at a time, in order to prevent corruption.
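To make the chain walking concrete, here is a toy sketch of a seek. The FAT is modelled as a plain Python mapping from each cluster to the next one, and `END_OF_CHAIN` and `cluster_for_offset` are names invented for the example; a real driver reads 16-bit entries from the on-disk table and usually caches part of it.

```python
END_OF_CHAIN = -1  # stand-in for the FAT16 end-of-chain marker (0xFFF8..0xFFFF on disk)


def cluster_for_offset(fat, first_cluster, offset, cluster_size):
    """Return (cluster, links_followed) for the cluster holding byte `offset`.

    fat[c] is the cluster that follows c in the file, or END_OF_CHAIN.
    """
    cluster = first_cluster
    links = 0
    # One table lookup per cluster that lies before the target offset.
    for _ in range(offset // cluster_size):
        cluster = fat[cluster]
        links += 1
        if cluster == END_OF_CHAIN:
            raise ValueError("offset is past the end of the file")
    return cluster, links
```

For a file stored in clusters 2, 5, 3, 9 with 2048-byte clusters, `cluster_for_offset({2: 5, 5: 3, 3: 9, 9: END_OF_CHAIN}, 2, 5000, 2048)` returns `(3, 2)`: two links followed to reach byte 5000. The number of lookups grows linearly with the offset divided by the cluster size, which is why large files with small clusters seek slowly.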

--Gene

David A. Lethe · Answer

There is certainly going to be a high degree of variation on the answer. A filesystem is going to have some level of blocking, a directory structure, and a mechanism for expansion/deletion, along with fragmentation.

Raw read/writes don't have any of this overhead. Therefore, your answer is going to vary depending on your filesystem settings and the type of data you are using.

Information · Answer

For another take on this topic, you might have a look at this article that compares a database in a ramdisk versus an in-memory database, both on Linux systems.

In-Memory Database Systems

Linux Journal, September 1, 2002

