Multiple files with same filename on FAT

Hi,

I'm planning to put the hard disk of my PVR (a DVD video recorder with a
hard disk) in an external tray to make it easier to move video files to my PC.
Ideally, I would launch a small program to display the HD contents,
select the sequences I'm interested in, and have it copy them to my
PC's hard drive for advanced editing and burning. Then I would put the
drive back into the PVR.

For testing, I took the drive out, cloned it, and I'm currently
analyzing its contents.

It looks really promising: the drive is FAT-based (Ghost says FAT16,
XP says FAT32), the video files are standard MPG without encryption (they
play fine in VLC), and there's a "reclist.dat" file listing all the recording
metadata (title, start time, duration, compression mode), which I
reverse-engineered without too much hassle.

However, the filesystem is obviously not 100% FAT-compliant (for
example, the last-modification timestamp is empty in XP and shows as
"01/01/1601 01:00" in the DOS prompt's dir), and the real problem is the
following:

When a video file reaches 4 GB, a second one is created with the
*same* name (!). For example, "dir" shows:

01/01/1601  01:00     4.290.772.992 CLIP2.MPG
01/01/1601  01:00       201.195.520 CLIP2.MPG

or:

01/01/1601  01:00     4.290.772.992 CLIP10.MPG
01/01/1601  01:00     1.055.064.064 CLIP10.MPG

Because of this inconsistency, I cannot play or copy the file: all I
get is an error message.
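A quick check on the sizes in the listing, as a worked example: a FAT directory entry stores the file size in 32 bits, so no single file can exceed 2^32 - 1 bytes. The first part is exactly 4 MiB short of 2^32, which is presumably why the PVR starts a new file at that point (an inference from the numbers, not something the PVR documents).

```python
# Sizes copied from the "dir" listing above.
FIRST_PART = 4_290_772_992
FAT32_MAX_FILE = 2**32 - 1           # largest size a 32-bit size field can hold
MIB = 1024 * 1024

slack = 2**32 - FIRST_PART           # how far the first part stops short of 2^32
print(slack, slack // MIB)           # 4194304 4
print(FIRST_PART <= FAT32_MAX_FILE)  # True
```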

Does anyone know a simple way (utility, library, OS) of accessing
the files directly by their FAT entry? I'm pretty sure the entries are
contiguous (and even if they're not, finding the correct sequence
manually would not be a problem).
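One way to access files "by their FAT entry" is to bypass the OS's name lookup and parse the 32-byte directory entries yourself. The sketch below assumes you already have the raw bytes of the directory (for example, extracted from a dd image); all function names are hypothetical. It reports every short name that occurs more than once, with the entry's offset in the directory, its start cluster and its size, which is what you need to read each part separately.

```python
import struct
from collections import defaultdict

ENTRY_SIZE = 32
ATTR_VOLUME_ID = 0x08
ATTR_LFN = 0x0F  # read-only | hidden | system | volume-id marks a long-name fragment


def short_name(raw11: bytes) -> str:
    """Turn the space-padded 8.3 field (b"CLIP2   MPG") into "CLIP2.MPG"."""
    base = raw11[:8].decode("ascii", "replace").rstrip()
    ext = raw11[8:11].decode("ascii", "replace").rstrip()
    return f"{base}.{ext}" if ext else base


def parse_dir(data: bytes):
    """Yield (offset, name, first_cluster, size) for each live short-name entry."""
    for off in range(0, len(data) - ENTRY_SIZE + 1, ENTRY_SIZE):
        e = data[off:off + ENTRY_SIZE]
        if e[0] == 0x00:                  # end-of-directory marker
            break
        if e[0] == 0xE5:                  # deleted entry
            continue
        if e[11] == ATTR_LFN or e[11] & ATTR_VOLUME_ID:
            continue                      # long-name fragment or volume label
        (hi,) = struct.unpack_from("<H", e, 20)       # high cluster word (FAT32)
        lo, size = struct.unpack_from("<HI", e, 26)   # low cluster word, file size
        yield off, short_name(e[:11]), (hi << 16) | lo, size


def find_duplicates(data: bytes):
    """Group entries by name and keep only the names that occur more than once."""
    groups = defaultdict(list)
    for off, name, cluster, size in parse_dir(data):
        groups[name].append((off, cluster, size))
    return {name: hits for name, hits in groups.items() if len(hits) > 1}


def make_entry(name11: bytes, cluster: int, size: int) -> bytes:
    """Build a synthetic 32-byte entry, only for trying the functions out."""
    e = bytearray(ENTRY_SIZE)
    e[:11] = name11
    e[11] = 0x20                                      # archive attribute
    struct.pack_into("<H", e, 20, cluster >> 16)
    struct.pack_into("<HI", e, 26, cluster & 0xFFFF, size)
    return bytes(e)
```

On FAT16 the high cluster word is normally zero, so the same parser works for both variants the tools reported.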

As a fallback, any reliable way to rename the duplicate files
would be welcome too, although it would probably prevent me from
putting the drive back in the PVR afterwards unless I undo the
renaming.

The program could be in any language and even any OS as I intend to
place it on a bootable CD that would only be used to make the
transfer.

Any hint or help is welcome.

    Vicne


PS: All recordings (single or multiple files) are accompanied by
fixed-length .MAP files such as:
01/01/1601  01:00         4.194.304 CLIP2.MAP
01/01/1601  01:00         4.194.304 CLIP10.MAP
They seem to contain mainly offsets into the main files. I guess
their only purpose is to allow faster skipping or fast forward/rewind
inside the video...


Re: Multiple files with same filename on FAT


1. use dd to make a complete image of the drive
2. mount it on the loopback device
3. copy CLIP2.MPG to another location
4. delete or rename CLIP2.MPG - the FAT code will rename the first
copy
5. now concatenate CLIP2.MPG (which will now refer to the second file
above) with your copied version of the first file
6. lather, rinse, repeat
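Step 5 is plain byte concatenation. A minimal sketch in Python, streaming in chunks so multi-gigabyte parts never have to fit in memory; the function and file names are hypothetical. Whether the joined MPEG plays seamlessly depends on where the PVR split the stream, which the thread does not say.

```python
import os
import shutil
import tempfile


def concatenate(parts, dest, chunk_size=1 << 20):
    """Append each file in `parts` to `dest`, in order, 1 MiB at a time."""
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out, chunk_size)


def demo():
    """Join two small temporary 'parts' and return the joined bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, payload in enumerate((b"first-part|", b"second-part")):
            path = os.path.join(tmp, f"part{i}.mpg")
            with open(path, "wb") as f:
                f.write(payload)
            paths.append(path)
        joined = os.path.join(tmp, "joined.mpg")
        concatenate(paths, joined)
        with open(joined, "rb") as f:
            return f.read()
```

For the recording above, this would be called as `concatenate(["clip2_part1.mpg", "CLIP2.MPG"], "clip2_full.mpg")`, where the first name is whatever you called the copy made in step 3.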


Re: Multiple files with same filename on FAT

In fact, concatenating may even be unnecessary, as I can join the parts
in the video editing program. I'd be happy if I could just *read* them
separately.
So that's definitely an interesting path. Indeed, working on an
*image* instead of a *clone* would be much simpler, and I could
analyse the drive without the OS trying to "repair" it. (I tried
mounting it on a Red Hat box but it hung, so I tried XP instead, which
worked.)
The only drawback is that imaging the 160 GB drive takes around 1h30,
but well...

Thanks a lot for the suggestion!

Vicne


Re: Multiple files with same filename on FAT

It would be safer to first rename all instances of CLIP2.MPG. With all
this caching etc. in today's operating systems, I wouldn't blindly assume
that the "copy" command accesses the same instance as the following
"delete" (at least, my ISO9660 implementation wouldn't guarantee that in
the face of an error like this, and I would use the same caching trick again
when implementing FAT).


  Stefan


Re: Multiple files with same filename on FAT

Yes, I also prefer renaming to deleting.
I'll try that to see whether renaming is allowed or whether it also gives an
error.

Thanks for the help.


Vicne


Re: Multiple files with same filename on FAT


You REALLY need to think that through, Stefan. If cache coherency is
so screwed up, then there is no way the operating system could
possibly work. Reads from a drive where write-behind cache is active
will read from the cache, not the media.

It doesn't really matter anyway - the important point is, all the FAT
code I've ever read, written or tested will only search directories
until it hits the first instance of the filespec. So "mv CLIP2.MPG
CLIP2.MP0" will only change the first one.
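A toy model of the behaviour described above, assuming a plain top-to-bottom scan with no lookup caching. The directory here is a hypothetical list of (name, start cluster) pairs, not the on-disk format. After one rename, the second instance becomes the first match, which is what makes the copy/rename sequence work.

```python
def lookup_first(entries, name):
    """Plain top-to-bottom scan: index of the first entry with this name."""
    for i, (entry_name, _start_cluster) in enumerate(entries):
        if entry_name == name:
            return i
    raise FileNotFoundError(name)


def rename_first(entries, old, new):
    """Rename whichever entry lookup_first resolves `old` to; return its index."""
    i = lookup_first(entries, old)
    entries[i] = (new, entries[i][1])
    return i


# Hypothetical directory: (name, start cluster) pairs with a duplicated name.
directory = [("CLIP2.MPG", 3), ("CLIP2.MPG", 70000)]
rename_first(directory, "CLIP2.MPG", "CLIP2.MP0")
print(directory)                              # [('CLIP2.MP0', 3), ('CLIP2.MPG', 70000)]
print(lookup_first(directory, "CLIP2.MPG"))   # 1
```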


Re: Multiple files with same filename on FAT

That makes sense indeed, and I think in any case renaming one of the
files is the safest way to proceed.

I was also wondering: I think a solution would be to mimic what a
file recovery utility does.
Their job is to decode a source filesystem, detect deleted entries and
copy the files they pointed to onto another drive. What I want to do
is very similar, except that I'm interested in duplicate entries instead of
deleted ones.

I found PhotoRec - http://www.cgsecurity.org/wiki/PhotoRec - but
unfortunately, PhotoRec differs precisely in that it does not care
about the filesystem structure and only scans sectors for known file
"signatures".

Do you know of a "file recovery" (or maybe "undelete") utility for DOS
with available source code?

Thanks very much.

Vicne


Re: Multiple files with same filename on FAT


Deleted entries in a FAT filesystem are marked as deleted by changing
the first character of the filename to a special magic byte (0xE5). All DOS
undelete utilities look for this signature. Other utilities (which
look for deleted files in unused space after the directories are
destroyed) work by looking for specific data signatures such as JPEG
headers.
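To make the distinction concrete, here is a sketch of how an undelete-style scanner tells entry kinds apart. The byte values 0x00, 0xE5 and 0x0F are the standard FAT ones; the function names are made up for illustration.

```python
ENTRY_SIZE = 32
END_OF_DIR = 0x00   # this slot and every slot after it has never been used
DELETED = 0xE5      # slot belonged to a file that has since been deleted
ATTR_LFN = 0x0F     # attribute byte of a long-file-name fragment


def entry_state(entry: bytes) -> str:
    """Classify one 32-byte FAT directory entry by its first byte and attribute."""
    if entry[0] == END_OF_DIR:
        return "end"
    if entry[0] == DELETED:
        return "deleted"
    if entry[11] == ATTR_LFN:
        return "lfn"
    return "live"


def deleted_entries(data: bytes):
    """Return (offset, partial 8.3 name) for each deleted short-name entry."""
    found = []
    for off in range(0, len(data) - ENTRY_SIZE + 1, ENTRY_SIZE):
        entry = data[off:off + ENTRY_SIZE]
        state = entry_state(entry)
        if state == "end":
            break
        # Deleted LFN fragments also start with 0xE5; skip them here.
        if state == "deleted" and entry[11] != ATTR_LFN:
            # The real first character is lost, so undelete tools ask the user for it.
            found.append((off, "?" + entry[1:11].decode("ascii", "replace")))
    return found
```

A tool for the duplicate-name case would use the same loop but collect live entries instead of deleted ones.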

If you want source code for a FAT16/32 reader that can do what you want
(with some small glue you will have to provide), I wrote one:
http://www.zws.com/products/dosfs/


Re: Multiple files with same filename on FAT

Unfortunately, I admit I have never accessed a disk at the raw level (by
sector rather than by file), so figuring out the "glue" for DOS still puzzles me,
but I think with such source code, I'm getting very close.
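When the source is an image file rather than a physical disk, the "glue" can be as small as a function that seeks to sector number × sector size and reads. The sketch below pairs that with a FAT32 cluster-chain walker so each duplicate entry can be read from its own start cluster. It is written in Python against an image file, not against the dosfs API. It assumes the volume is FAT32 (the thread is unsure whether it is FAT16 or FAT32) and that you know the byte offset of the volume's boot sector in the image; all class and method names are hypothetical.

```python
import io
import struct


class Fat32Image:
    """Read-only access to a FAT32 volume stored in an image file.

    Assumption: `base` is the byte offset of the volume's boot sector
    (0 for a partition image, the partition start for a whole-disk image).
    """

    def __init__(self, f, base=0):
        self.f, self.base = f, base
        boot = self.read_sector(0, sector_size=512)
        (self.bytes_per_sector,) = struct.unpack_from("<H", boot, 11)
        self.sectors_per_cluster = boot[13]
        (self.reserved_sectors,) = struct.unpack_from("<H", boot, 14)
        self.num_fats = boot[16]
        (self.fat_size,) = struct.unpack_from("<I", boot, 36)   # sectors per FAT
        self.first_data_sector = self.reserved_sectors + self.num_fats * self.fat_size

    def read_sector(self, lba, count=1, sector_size=None):
        """The "glue": read `count` sectors starting at logical block `lba`."""
        size = sector_size or self.bytes_per_sector
        self.f.seek(self.base + lba * size)
        return self.f.read(size * count)

    def next_cluster(self, cluster):
        """Look up the successor of `cluster` in the first FAT."""
        offset = cluster * 4
        sector = self.read_sector(self.reserved_sectors + offset // self.bytes_per_sector)
        (value,) = struct.unpack_from("<I", sector, offset % self.bytes_per_sector)
        return value & 0x0FFFFFFF                    # top 4 bits are reserved

    def iter_file(self, start_cluster, size):
        """Yield a file's contents cluster by cluster, following its chain."""
        remaining, cluster = size, start_cluster
        while remaining > 0 and 2 <= cluster < 0x0FFFFFF7:   # stop at bad/end marks
            lba = self.first_data_sector + (cluster - 2) * self.sectors_per_cluster
            chunk = self.read_sector(lba, self.sectors_per_cluster)[:remaining]
            if not chunk:
                break                                # image is truncated
            remaining -= len(chunk)
            yield chunk
            cluster = self.next_cluster(cluster)


def build_demo_image():
    """Tiny synthetic volume: 512-byte sectors, 1 sector/cluster, one 1-sector FAT.

    A file starts at cluster 2 and continues in cluster 3.
    """
    img = bytearray(512 * 4)
    struct.pack_into("<HBHB", img, 11, 512, 1, 1, 1)   # bytes/sector, sec/cluster, reserved, FATs
    struct.pack_into("<I", img, 36, 1)                 # FAT size in sectors
    struct.pack_into("<II", img, 512 + 2 * 4, 3, 0x0FFFFFFF)   # FAT: 2 -> 3 -> end
    img[1024:1536] = b"A" * 512                        # cluster 2 (LBA 2)
    img[1536:2048] = b"B" * 512                        # cluster 3 (LBA 3)
    return io.BytesIO(bytes(img))
```

With a real image, each part would be saved with something like `for chunk in img.iter_file(start, size): out.write(chunk)`, using the start cluster and size from that part's directory entry. Reading one FAT sector per cluster is slow but keeps the sketch short.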

Thanks a million.


Vicne




Re: Multiple files with same filename on FAT

Maybe I should've been a little more explicit.

My filesystem has, like any filesystem, a block cache of bounded size.
Every block read from the disc is stored in there. When someone
wants to open a file, I first check whether one of the blocks already in
the cache contains the relevant directory entry. Only if I don't find
it there do I read the directory normally, from top to bottom. This works
nicely for ISO9660, because directory entries never straddle sector
boundaries. It will also work for FAT. I implemented it to avoid
thrashing on *huge* directories (e.g. WinXP's i386 folder).

If you now have two files with identical names, it depends on the buffer
cache content (and thus, processes running in parallel, for example)
which instance you'll get.

Even if your buffer cache doesn't work this way, I gather that some
other operating systems maintain name->disk position mapping caches,
which have essentially the same problem: if the first entry has already
expired but the second one has not, you'll access the second one.
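A deliberately simplified, hypothetical name cache (not modelled on any particular OS's dentry cache) shows why duplicates break the assumption that names are unique: an uncached linear scan and a cached lookup of the same name can resolve to different entries.

```python
def linear_lookup(entries, name):
    """Uncached lookup: first matching entry, scanning from the top (or None)."""
    return next((i for i, (n, _cluster) in enumerate(entries) if n == name), None)


def build_name_cache(entries):
    """Hypothetical name -> entry-index cache, filled while listing a directory.

    A dict holds one value per key, so when two entries share a name the
    later one silently replaces the earlier one.
    """
    cache = {}
    for i, (name, _cluster) in enumerate(entries):
        cache[name] = i
    return cache


directory = [("CLIP2.MPG", 3), ("CLIP2.MPG", 70000)]
cache = build_name_cache(directory)
print(linear_lookup(directory, "CLIP2.MPG"))   # 0: the first instance
print(cache["CLIP2.MPG"])                      # 1: the second instance
```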


  Stefan


Re: Multiple files with same filename on FAT



Absolutely not, because the rename operation either invalidates or
updates the cache. Same answer as I gave earlier: if this
functionality were broken, then the OS as a whole would be broken.


Re: Multiple files with same filename on FAT

Your algorithm was:
  3. copy CLIP2.MPG to another location
  4. delete or rename CLIP2.MPG - the FAT code will rename the first
My point is simply: you cannot be sure that the "copy" operation
accesses the same instance as the "delete or rename" operation, because
there is no guarantee that you access the first instance both times.
Some cache *might* have remembered the second directory entry, so you
access that, not the first one.

You can hardly call this a broken OS, because you're giving it an
invalid filesystem image. For those, *my* requirements are not "behave
consistently and correctly", but "don't crash and don't make matters
worse" -- simply because defining what is "correct" can be rather hard
if input is a random byte soup.


  Stefan


Re: Multiple files with same filename on FAT

Hmm, sounds like you two mean different kinds of cached data :-).
Larwe means a disk cache as I know it (and have implemented it for DPS): it
holds logical disk blocks. These blocks may happen to hold a directory,
and a directory read is much more likely to get cached, etc.,
but every higher level sees the same data, so there is no way to
present different images to the copy/delete/rename.
What Stefan seems to mean is caching directory entries at a
level above disk I/O; this could perhaps cause the mess
he refers to.
 At least this is how I understand it so far :-).
 But the entire thread is centered on how to fix the problem
without seeing a hex dump of the offending (duplicate) directory
entries in question... is it worth the effort? It should not be
so hard to do. It may turn out that the duplicate directory entries
point to the same FAT chain in the first place... :-).

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------




Re: Multiple files with same filename on FAT



YES YOU CAN. Sheesh. Read my original post. I said you mount the image
via the loopback device. This runs the standard filesystem (FAT in
this case) code, and over/underlying layers, using an underlying
filesystem layer to do the sector-level I/O within the image. It is
indistinguishable from a normal FAT volume.

If the OS is caching pointers, disk positions, filenames, etc, it is
doing it the same way on this virtual drive that it would on a "real"
drive. There is no backdoor being used here that might cause some
cache coherency issue; both the original copy and the del/rename are
going through the same front door route through the overlying FAT code
(and volume/filename management code if separate).

I.e. the only way my scenario could fail would require that the OS
itself be intrinsically unable to handle the idea that a file might be
renamed.


Re: Multiple files with same filename on FAT

I did. No reason to shout.


Exactly. Now imagine this scenario:

You are listing the directory, by typing "ls" or something similar. The
operating system reads in the disk blocks making up the directory, and
places them in the cache. Let's say, block #17 contains the directory
entry for the first instance of CLIP2.MPG, block #18 contains the entry
for the second instance.

Now you perform your step 3. Most likely your OS will find the directory
entry from block #17 and copy that file.

In the meantime, another process (or maybe just the "copy" process) also
uses the cache, and happens to eject #17 from the cache.

Now you want to perform step 4. Your OS looks at its cache, and sees
that it still has #18 in it, so it checks that entry first before
starting expensive disk accesses. Luckily, that #18 contains a directory
entry for the file you want. Thus it deletes/renames a different file
than what you copied in the first place.
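This scenario can be reproduced with a toy model of a cache-first lookup. It illustrates the idea as described in the post and is not Stefan's actual code; block numbers follow the example, and the "buffer cache" is just a set of block numbers.

```python
def lookup(directory_blocks, cache, name):
    """Return (block, slot) for `name`, checking cached blocks before a full scan.

    directory_blocks: ordered list of (block_no, [names]) making up the directory.
    cache: set of block numbers currently held in the buffer cache.
    """
    # 1. Cheap pass: only look at blocks that are already in memory.
    for block_no, names in directory_blocks:
        if block_no in cache and name in names:
            return block_no, names.index(name)
    # 2. Expensive pass: read the directory from top to bottom.
    for block_no, names in directory_blocks:
        cache.add(block_no)          # reading a block brings it into the cache
        if name in names:
            return block_no, names.index(name)
    raise FileNotFoundError(name)


directory = [(17, ["CLIP1.MPG", "CLIP2.MPG"]), (18, ["CLIP2.MPG", "CLIP2.MAP"])]
cache = {17, 18}                                   # both blocks cached by the "ls"
copied = lookup(directory, cache, "CLIP2.MPG")     # step 3 resolves to block 17
cache.discard(17)                                  # another process evicts block #17
renamed = lookup(directory, cache, "CLIP2.MPG")    # step 4 resolves to block 18
print(copied, renamed)                             # (17, 1) (18, 0)
```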

This optimisation is valid because it works on valid file systems. And
don't say "nobody does that", because my implementation does it (and I
think it's a quite obvious optimisation).

The lesson should be: if there is an atomic operation to perform the
thing you want (separating two identically named files), use that, and
don't try to do it with multiple operations.
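One way to take the name lookup out of the picture entirely is to rename a duplicate by its position in the raw directory instead of by its name. This is a hypothetical sketch that edits an in-memory copy of the directory bytes; the offset would come from scanning the raw directory, and the patched bytes would be written back to the unmounted image. Whether this counts as "atomic" in the sense above is debatable, but it is a single write to one specific entry. If the entry has long-name fragments in front of it, their checksum depends on the short name, so a real tool would also have to update or remove them. This sketch ignores LFNs.

```python
def rename_entry_at(image: bytearray, dir_offset: int, new_name11: bytes) -> None:
    """Overwrite the 11-byte 8.3 name of the directory entry at `dir_offset`.

    The entry is chosen by position, not by name lookup, so there is no
    ambiguity about which of the two duplicates gets renamed.
    """
    if len(new_name11) != 11:
        raise ValueError("8.3 names are stored as exactly 11 space-padded bytes")
    if image[dir_offset] in (0x00, 0xE5):
        raise ValueError("no live directory entry at this offset")
    image[dir_offset:dir_offset + 11] = new_name11


# Two entries named CLIP2.MPG back to back, as in the PVR's directory.
image = bytearray(b"CLIP2   MPG" + bytes(21) + b"CLIP2   MPG" + bytes(21))
rename_entry_at(image, 32, b"CLIP2   MP1")   # rename the second instance only
print(bytes(image[0:11]), bytes(image[32:43]))
```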


  Stefan


Re: Multiple files with same filename on FAT

Well, this scenario obviously works. I was about to question how
practical it is to do so, but on second thought I can see it can be
practical. I'm not sure it is worth the extra interaction between the
device-dependent and device-independent OS layers, but I guess this "depends".
 What I have done in DPS is as follows: the device-independent layer
writes to one or at most two (consecutive) device blocks when updating
a directory entry. Further, it allows user-level code to resume
searches at the point just after where the last search ended on a match.
 What the caching layer does is check whether a directory
is being accessed, and if so, and if the directory is contiguously
allocated and no longer than the max. cacheable size (64k currently,
but settable), it just reads and caches the entire directory (this is
the typical scenario and will hit in 99+% of cases). When writing to
the directory, since it is entirely cached as a single cache entry, it
cannot be split into cached and uncached blocks. The cache pushes out
entries on an LRU (Least Recently Used) basis, so during directory-intensive
r/w (copying multiple files etc.) the directory will most likely stay
cached; when pushed out, the entire cache entry (up to 64k) is written
to disk, if dirty.
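A toy model of the whole-directory cache just described, to make the mechanism concrete: directories are cached as single units in LRU order, and a dirty unit is written back in full when it is evicted. The 64 KiB limit comes from the post; `read_dir` and `write_dir` are hypothetical callbacks standing in for device I/O, and this is not DPS code.

```python
from collections import OrderedDict


class DirectoryCache:
    """LRU cache whose entries are whole contiguous directories."""

    def __init__(self, capacity, read_dir, write_dir, max_dir_bytes=64 * 1024):
        self.capacity = capacity
        self.read_dir = read_dir        # hypothetical: dir_id -> bytes
        self.write_dir = write_dir      # hypothetical: (dir_id, bytes) -> None
        self.max_dir_bytes = max_dir_bytes
        self.entries = OrderedDict()    # dir_id -> [bytearray, dirty], LRU first

    def get(self, dir_id):
        if dir_id in self.entries:
            self.entries.move_to_end(dir_id)          # now most recently used
            return self.entries[dir_id][0]
        data = bytearray(self.read_dir(dir_id))
        if len(data) > self.max_dir_bytes:
            return data                               # too big: bypass the cache
        self.entries[dir_id] = [data, False]
        if len(self.entries) > self.capacity:
            old_id, (old_data, dirty) = self.entries.popitem(last=False)
            if dirty:
                self.write_dir(old_id, bytes(old_data))   # write back the whole unit
        return data

    def write(self, dir_id, offset, payload):
        data = self.get(dir_id)
        data[offset:offset + len(payload)] = payload
        if dir_id in self.entries:
            self.entries[dir_id][1] = True            # flushed later, on eviction
        else:
            self.write_dir(dir_id, bytes(data))       # uncached: write through now


def demo():
    disk = {1: b"a" * 32, 2: b"b" * 32, 3: b"c" * 32}
    flushed = []
    cache = DirectoryCache(2, disk.__getitem__, lambda i, d: flushed.append((i, d)))
    cache.write(1, 0, b"X")     # directory 1 cached and dirty
    cache.get(2)                # directory 2 cached, clean
    cache.get(3)                # evicts directory 1 (least recently used)
    return flushed
```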
 Given that you have your method implemented already, I guess I may
choose to follow suit, though :-). What I think I will add to
my mechanism described above is a bitmap of dirty disk blocks per
cache entry, and then see how to write them back to disk when it comes
to that. Sounds like a good idea anyway; I guess I'll add it next
time I dig around in those things.
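The proposed per-entry dirty bitmap could look roughly like this. This is a hypothetical sketch, not how DPS implements it: each write sets the bit of every block it touches, and a flush writes only those blocks instead of the whole cache entry. The 512-byte block size and the `write_sector` callback are assumptions.

```python
BLOCK = 512


class CachedDir:
    """One cached directory plus a bitmap of which blocks are dirty."""

    def __init__(self, first_lba, data: bytes):
        self.first_lba = first_lba
        self.data = bytearray(data)
        self.dirty = 0                       # bit i set => block i must be written

    def write(self, offset, payload):
        self.data[offset:offset + len(payload)] = payload
        first = offset // BLOCK
        last = (offset + len(payload) - 1) // BLOCK
        for b in range(first, last + 1):     # a write may straddle two blocks
            self.dirty |= 1 << b

    def flush(self, write_sector):
        """Call write_sector(lba, bytes) once per dirty block, then clear the bitmap."""
        bits, b = self.dirty, 0
        while bits:
            if bits & 1:
                start = b * BLOCK
                write_sector(self.first_lba + b, bytes(self.data[start:start + BLOCK]))
            bits >>= 1
            b += 1
        self.dirty = 0


def demo():
    d = CachedDir(first_lba=100, data=bytes(4 * BLOCK))
    d.write(510, b"abcd")        # straddles blocks 0 and 1
    d.write(3 * BLOCK, b"z")     # block 3
    written = []
    d.flush(lambda lba, sector: written.append(lba))
    return written
```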

Dimiter





Re: Multiple files with same filename on FAT

I have major doubts that any mainstream OS does this "optimization", which
optimizes nothing. All FAT implementations I know do the linear directory scan.

Why does it optimize nothing? Because if this page (the cache of modern OSes is
coupled with memory-mapped files and so works in terms of CPU pages, not blocks) is
already present in memory AND the entry is found there, then there is no disk
access, with or without the optimization.

And, if the page is not in memory, then the expensive disk access is needed
anyway.

Also note that the sequence of FAT LFN entries can cross the page boundary.
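For reference, a sketch of how LFN fragments tie back to their short entry: each fragment carries a checksum of the 11-byte 8.3 name, so a reader that has gathered the fragments' bytes (from whichever page or sector they came from) can verify and reassemble them. The byte offsets and checksum formula are those of the standard FAT LFN layout; the function names are hypothetical.

```python
def lfn_checksum(short_name11: bytes) -> int:
    """Checksum stored in every LFN fragment, computed from the 11-byte 8.3 name."""
    s = 0
    for c in short_name11:
        s = (((s & 1) << 7) + (s >> 1) + c) & 0xFF   # rotate right, then add
    return s


def lfn_fragment_chars(entry: bytes) -> bytes:
    """The 13 UTF-16LE characters held by one LFN entry (bytes 1-10, 14-25, 28-31)."""
    return entry[1:11] + entry[14:26] + entry[28:32]


def assemble_lfn(fragments, short_name11: bytes) -> str:
    """Rebuild a long name from its LFN entries, given in on-disk order."""
    want = lfn_checksum(short_name11)
    raw = b""
    for entry in reversed(fragments):            # on disk the last part comes first
        if entry[11] != 0x0F or entry[13] != want:
            raise ValueError("not an LFN fragment belonging to this short name")
        raw += lfn_fragment_chars(entry)
    return raw.decode("utf-16-le").split("\x00", 1)[0]   # drop terminator + padding


def make_lfn_entries(name: str, short_name11: bytes):
    """Build LFN entries for `name` in on-disk order (only for testing the reader)."""
    chars = name.encode("utf-16-le") + b"\x00\x00"
    chars += b"\xff" * (-len(chars) % 26)        # pad to whole 13-character fragments
    parts = [chars[i:i + 26] for i in range(0, len(chars), 26)]
    out = []
    for n, part in enumerate(parts, start=1):
        e = bytearray(32)
        e[0] = n | (0x40 if n == len(parts) else 0)   # 0x40 flags the last fragment
        e[1:11], e[14:26], e[28:32] = part[:10], part[10:22], part[22:26]
        e[11], e[13] = 0x0F, lfn_checksum(short_name11)
        out.append(bytes(e))
    return out[::-1]
```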

--
Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
Re: Multiple files with same filename on FAT

As far as I know, Linux caches name->inode mappings (dentry cache) which
is essentially "vulnerable" to the same problem.


There will be *many* disk accesses if the directory you're looking at is
larger than your buffer cache. The linear directory scan will start at
the first sector (which is no longer in the cache, hence needs a disk
access), and ultimately proceed till the sector containing the directory
entry (which was in the cache when the operation started, but was pushed
out to make room for the other sectors).

Without that, my implementation took several minutes to read WinXP's
"i386" folder using a readdir/stat loop. Now, it's down to a few seconds
without even needing to seek.

An alternative would have been to implement a dentry cache. Doing it
this way was much simpler: all it needed was an enumerate-cached-blocks
function for the buffer cache.


If you want to do LFNs, this can be trivially taken care of.


  Stefan


Re: Multiple files with same filename on FAT


I am still unconvinced. I maintain that any FS layer that cannot deal
with renaming a file is broken. However even if this was an issue, the
OP could force a cache commit, e.g. by unmounting and remounting the
volume - or will you now assert that the OS will keep a cache valid
even across media ejects?


Re: Multiple files with same filename on FAT

The problem is that multiple lookups of a file may yield different
results if the FS assumes file names are unique but on disk they are
not. I wouldn't call that a big surprise. Renames are unproblematic.


  Stefan

