Filesystem syntax constraints under Windows

- Don Y

Mon, Oct 13, 2014 2:52 AM

When you start attributing "meaning" to the symbols used to create names -- and, thus, electing to impose "harmless transformations" on them (case, etc.), you risk altering the intended "meaning" (which only the originator of the identifier can define!).

Should U+24B6 and U+24D0 be treated as equivalent? What about U+249C? And *all* of U+00C0 thru U+00C5? And U+00E0 thru U+00E5? (and even more "obvious" mappings elsewhere in the codeset)

For even wonkier ideas, should U+2801 and U+2809 be considered "equivalent"? (why not?) In UBC one could argue that they are. Yet, if the creator had intended this to be UEB, they would be VASTLY different! (a vs c)

You can make translation maps for any set of symbols. But, it seems to be a lot more work and a lot less "robust" in interpretation...
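To make the equivalence question concrete, here is a minimal sketch using Python's standard `unicodedata` module. The helper `strip_marks` is a hypothetical name introduced only for this illustration; it shows one possible "translation map", not a recommendation.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose canonically (NFD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Circled and parenthesized letters fold to plain letters only under
# *compatibility* normalization (NFKC), which is a lossy transformation.
circled_upper = unicodedata.normalize("NFKC", "\u24b6")  # CIRCLED LATIN CAPITAL LETTER A
circled_lower = unicodedata.normalize("NFKC", "\u24d0")  # CIRCLED LATIN SMALL LETTER A
parenthesized = unicodedata.normalize("NFKC", "\u249c")  # PARENTHESIZED LATIN SMALL LETTER A

# U+00C0..U+00C5 all collapse to the same base letter once accents are discarded.
accented_caps = {strip_marks(chr(cp)) for cp in range(0x00C0, 0x00C6)}

# Braille patterns have no decomposition: Unicode cannot know which
# braille code the creator had in mind, so they pass through unchanged.
braille = unicodedata.normalize("NFKC", "\u2801\u2809")
```

Every one of these mappings discards a distinction that the originator of the name may have intended, which is the risk described above.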

- Stefan Reuther

Mon, Oct 13, 2014 5:14 PM

Because you know what an exe file is and why you'd want to download one. Now think of the grandma who got an email that says she should pay that invoice she'll find in invoice10298234.doc.zip.exe, and if she doesn't, someone will come, kill her dog, poison her geraniums, slash her tires and spray-paint her door :-)

Stefan

- Stefan Reuther

Mon, Oct 13, 2014 5:26 PM

Last time I looked, Unix software worked fine.

As we have learned, defining semantics for "just ignore case" is hard (if you're in Russia, what reason do you have to prefer the Ii case pair over the Iı case pair?). In a mission-critical piece of software like a kernel or a file system, I don't want code with vaguely-defined or complex semantics. Thus, "here's a byte string, give me the file which has the same byte string as its name" sounds like a pretty good plan.

If you want to be case insensitive, do that in the application. Or in a foundation library for applications to use. The application knows when it wants to be case insensitive and when not. And, the application knows what locale it is in, and whether it should access "index.html" or "ındex.html" when the user wants "INDEX.HTML".
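A small sketch of why the locale matters, assuming the alternative name begins with the Turkish dotless ı (U+0131). Python's `str.upper` and `str.lower` apply the locale-independent Unicode default mappings, so this only shows what a locale-unaware layer would see.

```python
# Two distinct lowercase names...
dotted = "index.html"
dotless = "\u0131ndex.html"  # starts with LATIN SMALL LETTER DOTLESS I

# ...that both uppercase to the very same string.
same_upper = dotted.upper() == dotless.upper() == "INDEX.HTML"

# Going the other way, the default mapping always picks the dotted form,
# which is the wrong answer for a user working in a Turkish locale.
default_lower = "INDEX.HTML".lower()

# LATIN CAPITAL LETTER I WITH DOT ABOVE lowercases to two code points:
# "i" followed by COMBINING DOT ABOVE.
dotted_capital_lower = "\u0130".lower()
```

Uppercasing is therefore not reversible without knowing the language, which is information a byte-oriented file system does not have.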

As a bonus advantage, building a case-insensitive file system on top of a case-sensitive one is easy. The other way round is hard.
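As a rough illustration of the "easy" direction, here is a sketch of a case-insensitive lookup layered over an exact-match listing. The function name `find_ignoring_case` and the sample listing are hypothetical, and `str.casefold` is just one possible folding policy.

```python
from typing import Iterable, List


def find_ignoring_case(wanted: str, names: Iterable[str]) -> List[str]:
    """Return every stored name that equals `wanted` after case folding."""
    key = wanted.casefold()
    return [name for name in names if name.casefold() == key]


# The underlying store is case-sensitive, so all of these can coexist.
listing = ["index.html", "Index.html", "README", "notes.txt"]

matches = find_ignoring_case("INDEX.HTML", listing)
missing = find_ignoring_case("readme.md", listing)
```

The layer returns every candidate and leaves it to the application to decide what to do when more than one name matches.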

I think we are in comp.arch.embedded, not alt.dick.size.wars, but if you replace one-and-a-half language with a few unicode algorithms and some more filesystems, you end up at my stats.

Last project (incomplete): customer wants "hey, let's display an alphabet next to the word list ... what, there are different alphabets? Different sort orders depending on the language? Oh."

Stefan

- David Brown

Mon, Oct 13, 2014 6:18 PM

That's why my mother has a Mac, and my mother-in-law has Linux mint :-)

- Dimiter_Popoff

Mon, Oct 13, 2014 7:00 PM

Not really, the relevant part has never worked and still does not work. See my example in a previous post of how many versions of, say, "index.htm", "Index.htm", "INDEX.HTM" you need.

How you treat cases when you record a string (be it a filename or not) is up to the application writing the name - or just the user typing it in. Once it is recorded along with the case information you no longer need to know in which language it is or which alphabet this is, for that. Just the character set - which may cover a lot of alphabets and their variations (check upthread my way of doing it in dps, I explained it).

So your above example is completely irrelevant to what we talk about.

Filenames which are for human processing consist of characters, not of bytes. In unix names are stored as bytes and the human is given to process bytes, not characters.

You should be more specific if you want me to bother to understand that. I gave you part of my stats for a good reason you gave me in your post, you did not get it, fine, I'll live. What I see is that you do not want to accept obvious facts, like what is a byte and what is a character, what is a character string and what is a sentence with meaning. (I assume you do know these?)

I can imagine it can be hard to even think you may have spent years and years building on a broken basis. Don't get me wrong, not all is bad about unix of course, but the file naming in it clearly is just someone's quick hack which has survived for decades. Not that I expect the devotees to be able to swallow such a fact, not after seeing the reactions to me just stating the obvious.

Dimiter

- David Brown

Tue, Oct 14, 2014 8:25 AM

The file in question is usually called "index.html", except on systems based on outdated and limited MSDOS filesystems.

The file's name is "index.html". Not "Index.html", nor "INDEX.HTML", nor any other mixups. The name uses small letters. There is no confusion, and people don't have any trouble with it.

Most names in the real world are case-sensitive. My name is written "David" - not "david" or "DAVID".

Filenames are mainly for programs to process, not humans - and software is perfectly capable of getting the case correct and consistent. So are most humans that I know, except perhaps when names are particularly inconvenient (such as having double spaces, or unicode glyphs that look like other glyphs).

- Dimiter_Popoff

Tue, Oct 14, 2014 10:28 AM

Sorry David, I don't do religion. When you are able to grasp what nonsense you have written (see the above) we may have something to talk about again.

Dimiter

- David Brown

Tue, Oct 14, 2014 12:37 PM

I realise you have a wildly different opinion about this case, which you hold very strongly. I just don't understand it at all.

You have been shown clear linguistic reasons why case independence is not universal (even when languages share an alphabet) and therefore should not be part of an OS or filesystem. You have been shown clear practical reasons why it is better for an OS or filesystem to work directly with the bytes of the filename rather than imposing an interpretation on them. You have been shown how real names (both in the software world and in the "real" world) can often be case-sensitive. And you have been shown many examples of systems that have case-sensitive filesystems which work perfectly well.

I don't think there is anything more that can be done here. You have a solid fixed opinion that is different from most other people's, and it does not look like you are going to change that soon - nor does it look like you can explain it in a way others can understand. I'm sure you have good reasons for your thoughts here, even if I can't appreciate them - so we must just agree to disagree here.

- Dimiter_Popoff

Tue, Oct 14, 2014 1:00 PM

Yes, thanks for teaching me the alphabet. If you can suggest a more moronic sort of effort to explain the obvious than that you are welcome to share it.

Now that has not only not been shown but I showed in clear, irrefutable terms that it is not the case at all, quite the opposite.

The fact that you have religious views on it speaks only about your ability to understand issues at this level. You may want to stop trying, posts like your last two only make you look not as bright as you would want to be.

Dimiter

- Don Y

Tue, Oct 14, 2014 1:19 PM

I think the difference lies in your expectations of the user's role in dealing with "names".

E.g., I find it particularly annoying that Windows doesn't use a strict LR alpha sort. So, I am always looking for "90" to follow "9" -- not "89".

And, "folder" to follow "ezzz" -- instead of appearing up at the top of the list among the other folders.
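The two orderings being contrasted can be sketched as follows. The `natural_key` helper is a hypothetical approximation of a digit-aware sort, written for illustration only; it is not a description of how Windows actually implements its listing order.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Split into text and digit runs so that digit runs compare as numbers."""
    # With a capturing group, re.split alternates: text, digits, text, ...
    # so odd positions are always digit runs and types line up when compared.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


names = ["9", "89", "90"]

# Strict left-to-right comparison of characters: "90" directly follows "9".
strict_order = sorted(names)

# Digit-aware ordering: "89" lands between "9" and "90".
natural_order = sorted(names, key=natural_key)
```

Neither order is wrong; they simply encode different expectations about what the characters in a name mean.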

This happens in other things, too. E.g., calculator keypad vs telephone keypad.

In *my* case (as the OP), *names* are primarily (almost exclusively) used by pieces of code. Getting the identifier EXACTLY correct is a small price to INSIST UPON in robust software.

And, now I'm late...

- Stefan Reuther

Tue, Oct 14, 2014 4:59 PM

I see a file "index.htm", I open a file "index.htm". Works fine for me.

I have shown you examples why this cannot work without information about the locale (the "iI" case, which caused real trouble in PHP).

Of course you can record a language along with the file name. Or you can store the file name in a normalized format. Or implement some fuzziness, like "if they ask for INDEX.HTML, I'll give them index.html if there is no ındex.html". But why? Think of all the software breakage you could cause by creating an ındex.html! (Much like the software breakage you can cause by making a c:\program.exe.)

IBTD. File names are for processing by programs. It is true that many file names are processed by programs on behalf of humans, but I wouldn't even claim the majority of files are created directly on behalf of a user (when every web page access generates a dozen temporary files).

But it's the responsibility of the program to deal with case conversions. And, actually, programs do that just fine. When I type a file name, typical GUI file requesters immediately offer possible completions, or move the cursor in the file list; of course they handle case differences here.

Of course I know what the difference between a byte and a character is. The simplest summary: byte = kernel thing, character = user thing (because the user can configure the mapping between bytes and characters. LC_CTYPE in unix, 'chcp' in Windows). A file system is a kernel thing.
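A minimal sketch of that split: the stored name is a fixed byte string, and the characters a user sees depend on which mapping is configured. The sample name and the choice of encodings are assumptions made for illustration.

```python
# What the file system stores: an opaque sequence of bytes.
raw = b"caf\xe9.txt"

# What the user sees depends on the configured byte-to-character mapping.
as_latin1 = raw.decode("latin-1")
as_cp437 = raw.decode("cp437")

# Under some mappings the bytes are not valid text at all.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# A byte-wise lookup needs none of this: it only compares byte strings.
same_file = raw == b"caf\xe9.txt"
```

The byte comparison on the last line gives the same answer whatever mapping the user has selected, which is why it suits the kernel side of the interface.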

Stefan

- Dimiter_Popoff

Tue, Oct 14, 2014 6:39 PM

Yeah. What an unmatched wisdom, why would I expect anything less than that from a devotee.

Are you sure you do work related to comp.arch.embedded? So far you are only posting religious babble like the above. Beg to differ all you want, be welcome to call it "my strong opinion" when I say that 1+1=2, feel free to disagree and write another ton of nonsense.

So in reality you don't know. Neither did you grasp the obvious fact that the unix problem is because of the filesystem user INTERFACE, a problem which has survived for decades and which devotees like yourself want to fix by persuading the world to begin to read/write bitstreams rather than text, like here:

I really have no more time for such nonsense, if I had I would not be reading comp.arch.embedded but some facebook group or some forum for techy housewives.

Dimiter

- David Brown

Tue, Oct 14, 2014 9:29 PM

Dimiter, I really don't know what your problem is here, but I see no reason for this stream of insults directed at people giving clear, rational and technical counterpoints to your opinion here. Usually such ad hominem attacks are used by a person who realises that they have lost the rational argument, and are trying to "win" by the written equivalent of sticking their fingers in their ears and shouting "la, la, la".

Perhaps your aim was to irritate people and make it clear that you are so smart and so experienced (more so than any other OS designers, such as those behind Unix), that you don't need to listen to anybody or consider any other viewpoint. If that is the case, then congratulations - and I'm sure you will be rewarded with a lot fewer helpful or interesting replies in the future.

But perhaps it's just that I have a religious dislike of insults and misogynist language.

- Dimiter_Popoff

Tue, Oct 14, 2014 9:46 PM

Yes, this is what you and Stefan have been doing for a few posts now. Repeating nonsense and hoping that nobody will understand you are talking nonsense. In comp.hobby.wannabe, maybe. But not here. Everyone can read the thread, remember. You may want to stop digging, meanwhile I am sure you have realized you are in yet another hole.

You were hasty to give an opinion on something you are ignorant about and now you madly try to prove you were right, talk of your "arguments" that 1+1 is not 2 - do you seriously hope anyone will buy your "arguments"?

Try some other group for that.

Dimiter

- David Brown

Tue, Oct 14, 2014 11:38 PM

You might want to remember that yourself. Everyone can read how the discussion played out. And even if everyone agreed with you that OS'es and filesystems should be case insensitive, I think that your posting style will make a far more lasting impression than any technical issues. But that is for others to decide - not you or me.

- Hans-Bernhard Bröker

Wed, Oct 15, 2014 5:57 PM

On 14.10.2014 at 12:28, Dimiter_Popoff wrote:

I have to call BS on that. The way you've reacted to people voicing any different opinion on this issue is a textbook-grade example case of exactly what happens when people have their religious dogmas challenged.

So: not only do you do religion quite heavily. This issue evidently _is_ your religion.

- Mel Wilson

Wed, Oct 15, 2014 6:43 PM

Yes. ISTR it was MacOS where those extra file "forks" (were they called?) were first made popular. Where Windows maps from the file extension, and Posix applies magic to a few bytes from the beginning of the file, MacOS identified the relevant processing application within a non-data area in the file.
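The "magic bytes" approach can be sketched like this. The `SIGNATURES` table and the `sniff` function are hypothetical and cover only a handful of well-known signatures; real tools such as `file` consult a far larger database.

```python
from typing import Optional

# A few well-known leading byte sequences and what they usually indicate.
SIGNATURES = {
    b"\x7fELF": "ELF executable",
    b"MZ": "DOS/Windows executable",
    b"PK\x03\x04": "ZIP archive",
    b"%PDF-": "PDF document",
    b"\x89PNG\r\n\x1a\n": "PNG image",
}


def sniff(head: bytes) -> Optional[str]:
    """Guess a file type from its first bytes, ignoring the file name."""
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return None


# The name plays no part, so a misleading extension changes nothing.
guess = sniff(b"MZ\x90\x00\x03\x00")
unknown = sniff(b"plain text with no signature")
```

Identifying by content means that renaming a file neither grants nor removes a type, unlike the extension mapping.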

Mel.

- David Brown

Wed, Oct 15, 2014 7:58 PM

I don't know what they are called on the Mac, nor who "invented" them, but "extended attributes" of one sort or another are supported by many OS's and filesystems. I haven't used them directly or knowingly, but I suppose they are used behind the scenes in different systems and desktops for things like image thumbnails, "emblems", or filetype information.

- Dimiter_Popoff

Wed, Oct 15, 2014 9:06 PM

Whatever you say. Call again when you can point us to an example where I was wrong or doing dogma.

I understand you badly want it to be this way, unfortunately for you it is not - and you know it.

- George Neuner

Thu, Oct 16, 2014 3:55 AM

Yes. Macintosh executables had forks: every file contained a "data" fork, executables contained an additional "resource" fork where their code lived. Window/widget templates, images, icons, sound files, etc. all went into the data fork of the executable.

It was backward to the traditional notion of "resources" as things for code to use, but at a developer conference I attended it was explained that "executables have the resources to use data".

It also made some sense to think of code as a resource in light of the Macintosh's small memory (relatively, for a desktop) and its memory/resource manager that allowed demand loading/unloading of code.

George