Filesystem performance overheads?

Hi all,

Is anyone aware of studies that have been done to measure the performance overhead of using a filesystem? We want to know the performance loss we would suffer by using a filesystem like FAT16 on a ramdisk versus using the memory directly (raw reads and writes to memory, without treating it as a ramdisk carrying a filesystem).

As far as I understand, the performance loss would be minuscule, especially compared to the convenience the filesystem buys us. Still, I am looking for some hard facts to back up this claim.

Thanks & regards, Sachin

Reply to
sg

Measure it yourself, by using, say, UNIX shell scripts.

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
Reply to
Maxim S. Shatskih

Maxim S. Shatskih wrote:

Correct. No "generic" performance test will give you the *exact* stats that *you* need.

Alexander Skwar

--
One is not born a woman, one becomes one.
                -- Simone de Beauvoir
Reply to
Alexander Skwar

To which I would add, you should also look closely at what you specifically need from your data store. You might find that a simpler data organisation than an off-the-shelf filesystem suffices; if not, there are a large number of filesystems to choose from, with different characteristics, features and tradeoffs. Don't just test one.

Reply to
toby

You don't need a 'study'; you just need some familiarity with what happens during a file-system disk access.

> We want to know the performance loss we would suffer by using a filesystem like FAT16 on a ramdisk versus using the memory directly.

Reads and writes to memory take fractions of a microsecond these days. Reads and writes to a RAM disk can easily take over 100 microseconds, after they've made it from the application into the operating system, through both the file system and disk-driver layers, and back again.

No, in fact the performance difference between direct memory access and RAM disk access is more than a couple of orders of magnitude. Of course, the performance difference between a RAM disk and a real disk can also be something close to a couple of orders of magnitude, so if you were instead asking how much *additional* performance improvement you could get over using a conventional disk by using direct memory references rather than a RAM disk the answer would be, "Not too much."
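
If you want to put numbers on that gap for your own setup, here is a rough C sketch (not a rigorous benchmark) that times plain in-memory copies against reads of the same block from a file on a RAM-backed mount. The /dev/shm path, block size and iteration count are assumptions; adjust them for your machine.

/*
 * Rough sketch, not a rigorous benchmark: time plain memory-to-memory
 * copies against pread()s of the same block from a file on a RAM-backed
 * mount.  The /dev/shm path, block size and iteration count are
 * assumptions; adjust them for your own setup.  Build: cc -O2 bench.c
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BLOCK 4096
#define ITERS 100000

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    static char src[BLOCK], dst[BLOCK];
    const char *path = "/dev/shm/bench.dat";    /* assumed tmpfs mount */
    double t0, t1;
    int i, fd;

    /* Case 1: plain memory copies, no operating system involvement. */
    t0 = now_sec();
    for (i = 0; i < ITERS; i++) {
        src[0] = (char)i;         /* vary the data so the compiler cannot
                                     collapse the repeated copies */
        memcpy(dst, src, BLOCK);
    }
    t1 = now_sec();
    printf("memcpy: %.3f us per %d-byte block (last byte %d)\n",
           (t1 - t0) * 1e6 / ITERS, BLOCK, dst[0]);

    /* Case 2: the same block pulled through the filesystem each time. */
    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) { perror("open"); return 1; }
    if (write(fd, src, BLOCK) != BLOCK) { perror("write"); return 1; }

    t0 = now_sec();
    for (i = 0; i < ITERS; i++) {
        if (pread(fd, dst, BLOCK, 0) != BLOCK) { perror("pread"); return 1; }
    }
    t1 = now_sec();
    printf("pread:  %.3f us per %d-byte block\n",
           (t1 - t0) * 1e6 / ITERS, BLOCK);

    close(fd);
    unlink(path);
    return 0;
}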

As usual, the way you frame the question has a lot of influence on what the answer to it is.

- bill

Reply to
Bill Todd

This is definitely true for seek performance, but if several contiguous blocks are to be transferred, the RAM disk might even be slower than a real disk if the copy is done under program control, since a real disk can perform the transfer using DMA with minimal program intervention.

Paul

Reply to
Paul Keinanen

Well, it depends on how you define "minuscule". Compared to using memory directly, no.

A read or write through any filesystem requires a system call. The computer has to trap to the OS (which involves a context switch), process the system call (Linux is VERY good at this, but it still costs hundreds of instructions), and then do the operation, which may involve editing several regions of the ramdisk filesystem. You might ask "Can't I just mmap(2) the file in?", to which I would reply "Yes, but then what would be the point of using a file on a ramdisk?"
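
For what it's worth, here is a minimal sketch of that mmap(2) route, assuming a file on a tmpfs mount (the path and length are made up): after the initial page faults, the file is touched with ordinary loads and stores and there is no per-access system call.

/*
 * Minimal sketch of the mmap(2) route: map a file that lives on a
 * RAM-backed mount, then touch it with ordinary loads and stores.
 * The path and length below are assumptions for illustration only.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/dev/shm/mapped.dat";   /* assumed tmpfs mount */
    const size_t len = 1 << 20;                 /* 1 MiB */

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, len) < 0) { perror("ftruncate"); return 1; }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* From here on there are no read()/write() system calls, only
     * memory accesses; the first touch of each page still takes a
     * fault, after which it behaves like ordinary memory. */
    memset(p, 0xAB, len);
    printf("first byte: 0x%02x\n", (unsigned char)p[0]);

    munmap(p, len);
    close(fd);
    return 0;
}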

The best case for a read or write to a ramdisk is the minimal system call overhead, which is still hundreds of clock cycles even assuming all the relevant code is in the CPU cache. The best case for a read or write to memory is an L1 cache hit, which is 1-2 clock cycles, i.e. less than a nanosecond.

If the filesystem buys you convenience, just use the filesystem. Linux does very good caching with unused system memory, so the performance ends up similar to using a ramdisk and it's less of a PITA. Without knowing why a file is convenient for you it's hard to say, but if it's something like "it's easy to append to" then you can use a linked list or something to do the same thing in memory, maybe with a thread that spews the records to disk asynchronously (see the sketch below).
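
Roughly like this, as a hedged sketch: the record format, output file name, and polling interval are invented for illustration, and a real version would want error handling and a condition variable instead of polling.

/*
 * Sketch of the "append in memory, flush in the background" idea.
 * Records go onto an in-memory list with a mutex-protected append, and
 * a background thread drains them to a file.  Build: cc sketch.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct record {
    struct record *next;
    char text[64];
};

static struct record *head;              /* pending records, newest first */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int done;

static void append_record(const char *text)
{
    struct record *r = calloc(1, sizeof *r);
    snprintf(r->text, sizeof r->text, "%s", text);
    pthread_mutex_lock(&lock);
    r->next = head;                       /* O(1), memory only, no syscall */
    head = r;
    pthread_mutex_unlock(&lock);
}

static void *flusher(void *arg)
{
    FILE *out = arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        struct record *batch = head;      /* grab everything pending */
        head = NULL;
        int finished = done;
        pthread_mutex_unlock(&lock);

        /* The slow IO happens here, off the producer's fast path.
         * (Records within a batch come out newest-first in this sketch.) */
        for (struct record *r = batch; r; ) {
            struct record *next = r->next;
            fprintf(out, "%s\n", r->text);
            free(r);
            r = next;
        }
        if (finished && batch == NULL)
            return NULL;
        usleep(10000);                    /* crude pacing for the sketch */
    }
}

int main(void)
{
    FILE *out = fopen("records.log", "w");
    pthread_t tid;
    pthread_create(&tid, NULL, flusher, out);

    for (int i = 0; i < 1000; i++) {
        char buf[64];
        snprintf(buf, sizeof buf, "record %d", i);
        append_record(buf);               /* producer touches memory only */
    }

    pthread_mutex_lock(&lock);
    done = 1;
    pthread_mutex_unlock(&lock);
    pthread_join(tid, NULL);
    fclose(out);
    return 0;
}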

In summary: ramdisk takes a big hit relative to directly using memory.

Reply to
Anthony Roberts

You will probably have to measure this yourself, on your own machine, under the conditions you actually run, with the programs that actually concern you.

Imagine that the file system is 1000x slower in some sense than the RAM, but that you do little IO between the compute-limited processing of whatever it is you are reading or writing. Then the IO performance overhead would really be of no interest, and if speed is a problem, the processing algorithms would need improvement, not the IO.
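
To put made-up numbers on that: if IO accounts for 1% of the run time and the filesystem makes that 1% ten times slower, the total run time grows by only about 9% (0.99 + 0.01 x 10 = 1.09). If IO accounts for half the run time, the same 10x penalty roughly quintuples it (0.5 + 0.5 x 10 = 5.5). The fraction of time spent doing IO matters far more than the per-operation overhead.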

At the opposite extreme, imagine your process were completely IO-limited. Then the file system overhead would matter a lot. If you run a 5,400 rpm hard drive with a tiny buffer on an IDE interface, transferring a byte at a time, it could be serious; if you run a 15,000 rpm Ultra320 SCSI drive with an 8 MB buffer and your OS chains the SCSI commands, the file system overhead can be considerably less. Also, if you run Linux, you can have an enormous amount of data cached in RAM even with the file system in use, and avoid doing a lot of IO at all if you have enough memory for the cache.

Only after you have considered all these issues does the question of the file system used begin to be of interest.

But with all these (and other) variables, you can see why no one would bother doing such a study. It would take too much time to do, and too much space to print, for results of very limited interest.

--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
Reply to
Jean-David Beyer

It may be "small" but not "minuscule". For every access you would have to do a context switch and run kernel code just to read the data, as opposed to being able to read it with one instruction and no context switch.

Jon

--
Learn to program using Linux assembly language
http://www.cafeshops.com/bartlettpublish.8640017
Reply to
Jonathan Bartlett

FAT16 (and variants) require walking the FAT chain whenever it is necessary to append, truncate, or seek within a file. Depending on cluster size and file size, that can be quite significant and result in quite noticeable delays. Additionally, depending on the implementation, it may be necessary to allow only one process to access the FAT at a time, in order to prevent corruption.
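
To make the chain-walking cost concrete, here is a simplified sketch (an in-memory FAT array and an assumed cluster size, not a real driver) of what finding the cluster for a given file offset involves:

/*
 * Sketch of why seeking in a FAT16 file costs O(clusters): to find the
 * cluster holding a given file offset you follow the chain one entry at
 * a time through the FAT.  The in-memory FAT array and cluster size are
 * simplified assumptions, not a full driver.
 */
#include <stdint.h>
#include <stdio.h>

#define CLUSTER_SIZE 2048u      /* bytes per cluster (assumed) */
#define FAT16_EOC    0xFFF8u    /* end-of-chain marker range starts here */

/* Walk the chain from the file's first cluster until we reach the
 * cluster containing byte `offset`.  Returns 0 if the chain ends early. */
static uint16_t cluster_for_offset(const uint16_t *fat,
                                   uint16_t first_cluster,
                                   uint32_t offset)
{
    uint16_t cluster = first_cluster;
    uint32_t hops = offset / CLUSTER_SIZE;

    while (hops--) {
        uint16_t next = fat[cluster];     /* one FAT lookup per cluster */
        if (next >= FAT16_EOC)
            return 0;                     /* offset is past end of file */
        cluster = next;
    }
    return cluster;
}

int main(void)
{
    /* Tiny fake FAT: the file starts at cluster 2 and runs 2 -> 3 -> 5 -> EOC. */
    uint16_t fat[8] = { 0, 0, 3, 5, 0, 0xFFFF, 0, 0 };

    printf("offset 0      -> cluster %u\n", cluster_for_offset(fat, 2, 0));
    printf("offset 4096   -> cluster %u\n", cluster_for_offset(fat, 2, 4096));
    printf("offset 100000 -> cluster %u\n", cluster_for_offset(fat, 2, 100000));
    return 0;
}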

--Gene

Reply to
Gene S. Berkowitz

There is certainly going to be a high degree of variation in the answer. A filesystem is going to have some level of blocking, a directory structure, and a mechanism for expansion and deletion, along with fragmentation.

Raw reads and writes don't have any of this overhead, so your answer is going to vary depending on your filesystem settings and the type of data you are using.

Reply to
David A.Lethe

For another take on this topic, you might have a look at this article, which compares a database on a ramdisk with an in-memory database, both on Linux systems.

In-Memory Database Systems

Linux Journal, September 1, 2002

formatting link

Or the proprietary version:

formatting link

Reply to
Information
