Filesystem syntax constraints under Windows

Don entered filenames, albeit unusual ones. These filenames were perfectly valid for the filesystem (NTFS), they were perfectly valid for the OS itself (Windows), and they were perfectly valid for the programs used to enter those names. But they got mangled by other parts of the key Windows software when he viewed them.

Do you mean that because "a*b" has a non-letter, it is not a valid name? Or do you mean that Don doesn't count as a "user" because he is using filenames that you don't like?

We are talking about the Win32 API "CreateFile" call here! This is how you create files in Windows programming. If that is a "hack", then /all/ Windows programming is a "hack".

No, it is your unique idea that an OS has to treat filenames using human language rules because they are always for human processing and consumption - and it is your unique idea to distinguish between a "file name" and a "file identifier", where a "file name" is case insensitive and human friendly, while a "file identifier" can store the case and use other characters.

Please try to cut down on the insults or implied insults - this is a friendly technical discussion, not a name-calling contest.

I agree that everyone says stupid things once in a while, especially on Usenet - I have done so often enough, and when I realise it, I have posted apologies or thanks as follow-ups.

I - and others - have completely disagreed with you concerning filenames and case sensitivity. From where I stand, you came up with some unusual claims that are at odds with the rest of the world of filesystem and OS design, and have repeatedly labelled these as "obvious facts" while denying all the counter-evidence presented. Your argument boiled down to accusing others of religious delusions. Look back in this thread if you don't remember the details.

(For your own OS, I assume that /your/ way of treating file names is the most suitable for the OS and its uses. That's fine. What I object to is your belief that it should apply to every OS, that other OS'es are fundamentally flawed because of their treatment of filenames, and that everyone else is "stupid" or "religious" for disagreeing with you.)

Some things I know, some things I learn. That's because I don't extrapolate the way /I/ do something to assuming it is the only right way to do it.

David Brown

No, he wrote a program to squeeze these names through the system. A user cannot enter names the system does not allow to be entered.

It might be; "*" is typically a reserved symbol for communication between the user interface and the directory search code. If the user interface won't let you do it, chances are it is illegal. If you don't know the answer to that, try to copy an existing file named, say, abc.txt to a*a and see what the error message will be. (Sorry to the rest of the group, obviously everyone knows the answers to that but it is not my fault we go there - and at the moment I don't feel like letting it go just because someone is too pushy with his nonsense).
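To illustrate why "*" is usually reserved: in shell-style matching it stands for any run of characters, so a name containing a literal "*" cannot be told apart from a pattern. The following is only a minimal sketch in Python using the standard fnmatch and glob modules; the file names are made up for illustration and nothing here reproduces what the Windows shell does internally.

```python
import fnmatch
import glob

# Hypothetical directory listing, including a file literally named "a*a".
names = ["abc.txt", "a*a", "aXYZa", "b*a"]

# Read as a pattern, "a*a" selects every name that starts and ends with "a".
as_pattern = fnmatch.filter(names, "a*a")

# Escaping the wildcard is needed to address only the literal name.
as_literal = fnmatch.filter(names, glob.escape("a*a"))

print(as_pattern)
print(as_literal)
```

A user interface that passes its input to a wildcard search has no way to ask for the literal name unless it adds an escape convention of this kind.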

Or do you mean that you would be wrongly labelled mental just because you want to have the name written on your identity card to consist only of hieroglyphs and special characters?

Don is a "user" as long as he uses the user interface. A programmer and a user are not the same thing in that context and please do not go into more bollocks on obvious definitions like that.

While I don't know how this is done under Windows, I think you know that even less than I do. Your statement above amounts to a claim that every application written for Windows has to prepare the name such that it will be a valid one. It is obvious that normal Windows applications do not write their own name validations; if some call allows writing an invalid name, then reading the OS manual will tell you that you must use another call prior to it which validates the name. You should demonstrate at least such basic programming knowledge if you want people to take your posts as something more than standard "always know better" babble by someone who does not really know what he is talking about.

Ah, now you are trying to cheat your way out of the hole. No, I never said that. I said that file names are ALSO for human consumption.

Try to spell over the phone a file name like "ThIs Is An eXamPle Of A nAMe foR iDioTs". Then come and repeat your claim that file names - or whatever names which are represented in text - are to be compared case sensitive.

Or simply stop posting nonsense.

Dimiter

Dimiter_Popoff

"?" is the replacement character when it cannot map a Unicode character to the console code page.

Maybe try this: mark the file name in Explorer, copy it into a text file with Notepad, and hex-dump that. Maybe it's the SFU subsystem which translates. This mailing list post (snipped-for-privacy@vger.kernel.org/msg09969.html) indicates that they indeed map the reserved characters to Unicode characters above 0xF000.
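The "?" substitution mentioned above can be reproduced in a few lines. This is a sketch only: Python's codec machinery stands in for the console's conversion, and code page 437 is an assumption (the actual console code page depends on the system's locale).

```python
# U+F03A is a private-use character with no equivalent in code page 437.
name = "a\uf03ab"

# errors="replace" substitutes "?" for anything the target encoding lacks.
shown = name.encode("cp437", errors="replace").decode("cp437")

print(shown)  # a?b
```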

Stefan

Stefan Reuther

No, he did not "squeeze these names through the system". He used a program written by Microsoft, for Windows, in the way it is meant to be used, and it obviously let him enter the names.

NTFS can (and always could) be configured to be case-sensitive, which the SFU/Interix tools use through the POSIX subsystem. Of course, the Win32 subsystem, which uses a different configuration, isn't particularly happy about that, much in the same way the virtual DOS subsystem isn't particularly happy about names that don't fit into the 8.3 convention.

Stefan

Stefan Reuther

Ah, OK. So, the next (theoretical) question would be what fopen(3c) would expect for such a file.

Ah, that's a good idea! Or, a Unicode editor...

That is, in fact, what the [box] characters are (in Windows Explorer): U+F03A, iirc, for the ':' and U+F03F for the '?'. Note that all the other characters[1] are displayed properly in explorer ('`)

[1] The '*' appears to just "disappear" in explorer -- "A*a" displays as "Aa".
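The two code points reported above (U+F03A for ':' and U+F03F for '?') are consistent with adding 0xF000 to the ASCII value of each reserved character. The sketch below assumes that this simple offset is the whole rule and that the reserved set is the one listed in RESERVED; both are inferences for illustration, not something confirmed by the SFU documentation, and the function names are hypothetical.

```python
# Characters assumed to be reserved in Win32 names (path separators left out).
RESERVED = '"*:<>?|'
OFFSET = 0xF000  # assumption: reserved character c is stored as U+F000 + ord(c)


def to_private_use(name):
    """Replace each reserved character with its private-use stand-in."""
    return "".join(chr(OFFSET + ord(c)) if c in RESERVED else c for c in name)


def from_private_use(name):
    """Undo to_private_use, leaving every other character untouched."""
    out = []
    for c in name:
        plain = chr(ord(c) - OFFSET) if ord(c) >= OFFSET else ""
        out.append(plain if plain and plain in RESERVED else c)
    return "".join(out)


stored = to_private_use("A:a?")
print([hex(ord(c)) for c in stored])
```

A font without glyphs for the private-use area would show such characters as boxes, which would match the Explorer display described above.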

Amusing little exercise that just increases the uncertainty that a user will know the "real" name for a file! :-/

Don Y

On 22.10.2014 at 01:27, glen herrmannsfeldt wrote:

It's not the rules of the file systems that differ from each other. It's the rules of how other subsystems use the file system(s).

The clue to the difference is that current NT-based Windows is actually a three-tier system: the Windows subsystem(s) on top of the NT kernel, on top of a hardware abstraction layer.

File systems are part of the Kernel, and they're apparently required to be fully case-respecting. The silly "case-preserving, but not case-sensitive" behaviour must be implemented on top of that, by the Windows subsystem.
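As a rough picture of what "case-preserving, but not case-sensitive" means when layered on top of a store that keeps names exactly as given, here is a toy directory in Python. It is a sketch under stated assumptions: the class and method names are hypothetical, and str.casefold() stands in for whatever folding table the Windows subsystem really uses.

```python
class CaseInsensitiveDir:
    """Toy directory: stores names as typed, looks them up ignoring case."""

    def __init__(self):
        self._entries = {}  # folded name -> (name as typed, contents)

    def create(self, name, data=b""):
        key = name.casefold()
        if key in self._entries:
            # A name differing only in case counts as the same entry.
            raise FileExistsError(self._entries[key][0])
        self._entries[key] = (name, data)

    def lookup(self, name):
        """Return the stored spelling of the entry matching `name`."""
        return self._entries[name.casefold()][0]

    def listing(self):
        return [stored for stored, _ in self._entries.values()]


d = CaseInsensitiveDir()
d.create("ReadMe.txt")
print(d.lookup("README.TXT"))  # ReadMe.txt
```

A subsystem that skips the folding step and uses the typed name as the key directly would see the same underlying store as case-sensitive.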

The "Interix" tools Don Y talks about live in an alternative subsystem, distinct from Windows itself, that works directly with the kernel. So Interix can indeed use the same file systems, but do it differently than the Windows subsystem.

The difference isn't really that big. For reasons of compatibility, even NTFS directories have to maintain 8.3-format alias names for all entries whose names don't match that format.

It can't just supply one, it has to pick an 8.3 alias name at file creation time, and _store_ it along with the "real" one, because that alias has to remain unchanged for as long as the entry exists.
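Whether a name needs such an alias comes down to whether it already fits the 8.3 pattern. The check below is a deliberately simplified sketch: it ignores case by upper-casing first and accepts only letters, digits, "_", "~" and "-", whereas the real DOS rules allow some further punctuation. The function name is hypothetical.

```python
import re

# 1-8 base characters, then optionally a dot and 1-3 extension characters.
_SHORT_NAME = re.compile(r"[A-Z0-9_~-]{1,8}(\.[A-Z0-9_~-]{1,3})?")


def fits_8_3(name):
    """True if `name` already has the shape of a DOS 8.3 name."""
    return _SHORT_NAME.fullmatch(name.upper()) is not None


for candidate in ("README.TXT", "PROGRA~1", "Program Files", "index.html"):
    print(candidate, fits_8_3(candidate))
```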

Some truly braindead installers even presume that the path to their installation directory must "of course" be below "c:\progra~1", (just to avoid having to evaluate %ProgramFiles%).
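Evaluating the variable takes very little code. A minimal sketch in Python; the function name is hypothetical, and the fallback path is only an assumption for machines where the variable is not set.

```python
import os


def program_files_dir(env=os.environ):
    # Ask the environment instead of hard-coding a short-name path.
    # The default below is an assumed fallback, not a guaranteed location.
    return env.get("ProgramFiles", r"C:\Program Files")


print(program_files_dir())
```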

Hans-Bernhard Bröker

Notepad is a Unicode editor.

In fact it was nearly the only program in Win NT 3.51 that supported Unicode :-).

Install "Arial Unicode MS" font (several tens of megabytes) and select it in Notepad and Notepad will show the file names correctly.

upsidedown

I still do not understand what this thread is all about.

If the intention is to mount foreign file systems directly (local drive) or over the network, you really have to use features that all systems support. There is not much point in trying to map the most awkward features of each system to every other file system.

Some Linux based systems support Unicode file names in UTF-8, while Windows NTFS is UTF-16 based (but the supported characters might be different), while some Unix systems are Latin-1 based or just supporting 7 bit ASCII (upper and lower case). Some older 6 bit systems only supported upper case letters or just 40 symbols.
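To make the difference concrete, the same name produces different byte sequences under each of these encodings, and has no 7-bit ASCII form at all. A short Python sketch; the file name is made up for illustration.

```python
name = "Grüße.txt"  # hypothetical file name with two non-ASCII letters

# Encode the same name the way each kind of system would store it.
encoded = {enc: name.encode(enc) for enc in ("utf-8", "utf-16-le", "latin-1")}

for enc, raw in encoded.items():
    print(f"{enc:10} {len(raw):2} bytes  {raw.hex(' ')}")

# A 7-bit ASCII system cannot represent the name at all.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("ascii representable:", ascii_ok)
```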

IMHO, it is pointless to try to make very special mappings. In order to co-operate, you really have to forget any "purity" claims and try to find what is common in different systems.

upsidedown

He typed "touch C*c". "touch" is an extremely common command on all posix systems, including the various versions of MS's "unix services for windows" throughout the history of Windows NT.

So no, Don did not "write a program" or "hack the system" - he used a standard command line utility available to users.

The point is that /some/ interfaces on Windows let you use this symbol (and other symbols, and filenames differing only in case), while other interfaces disallow entering such filenames, and mangle the view of such files.

This is all using programs and utilities that come out of the box with Windows (at least the server and "ultimate" versions).

There are plenty of languages which are written in "hieroglyphs" or have special characters as part of their written form. I'm lucky - my name uses only characters from the 7-bit ASCII character set. But other people could certainly want their names written in non-ASCII characters.

See above for a note on the "touch" user program.

Most programs would simply pass on the filename to the Win32 API (CreateFile, which by MS logic is also used to open existing files), and if the function returns an error about an invalid filename, that will be passed on to the user. Programs would normally only do their own validation if they had reason to be extra fussy.

Note that the Win32 API supports two different semantics for file name validity, which you can choose - you can use "posix semantics", allowing case sensitive operation (and some extra characters), or "default windows semantics" with case insensitive operation. Either way, the function will check the validity of the operation requested.
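The "pass the name on and report the error" pattern looks roughly like this. It is a sketch in Python rather than a Win32 program: the built-in open() stands in for the CreateFile call, and try_create is a hypothetical helper name. Whether a name such as "a*b" is accepted depends on the platform and subsystem it runs on, so no result is claimed for it here.

```python
import os
import tempfile


def try_create(path):
    """Hand the name to the OS unchanged and report what it says."""
    try:
        with open(path, "x"):  # "x": create, fail if the file already exists
            pass
        return True, None
    except (OSError, ValueError) as exc:
        return False, f"{type(exc).__name__}: {exc}"


with tempfile.TemporaryDirectory() as scratch:
    for candidate in ("plain.txt", "a*b"):
        print(candidate, try_create(os.path.join(scratch, candidate)))
```

The program contains no list of forbidden characters; it learns that a name is invalid only from the error it gets back.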

Let me demonstrate my basic programming knowledge by my ability to look up CreateFile with google to find the MSDN page:

Looking up the documentation for the "touch" program, to see that it is a common user program and not a "hack" or a special program written by Don, is left as an exercise to the reader.

And it has been pointed out that humans use wildly different rules for how they write according to their language, alphabet (or non-alphabetic writing system), and even their personal habits - thus the OS should not try to second-guess them.

Filenames that are meant to be typed by humans (or read over the phone) should be chosen to make sense - but most humans will do that automatically. And there is /no/ requirement to be case insensitive for that - /I/ certainly have no problem saying "readme dot txt, all small letters" or "readme dot txt, with a capital r".

No case-insensitive system is going to protect someone from a file name like "This is an exampel of a name for idiots".

Case-insensitivity made sense when there were computers with 6-bit characters. And it is at least consistent when you are limited to 8.3 capital letters purely from the 7-bit ASCII set. (And it can still make sense on small, niche OS'es where you need a simple and limited system.) But on a modern multi-lingual general-purpose OS and filesystem? At best you get an inordinately complex, inconsistent system that works for some languages and not others - that's Windows's solution.
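Two small cases show where a single case-folding rule stops being consistent across languages. This is a Python sketch using the language's locale-independent Unicode case mappings; it says nothing about the specific tables Windows uses.

```python
# German: "ß" upper-cases to "SS", so upper() followed by lower() does not
# give back the original spelling.
german = "straße"
round_trip = german.upper().lower()
print(german.upper(), round_trip)

# Turkish: dotless "ı" and dotted "i" are different letters, yet the default
# upper-casing maps both of them to "I".
dotless, dotted = "ı", "i"
print(dotless.upper(), dotted.upper())

# Full case folding handles the German case, at the price of treating
# "ß" and "ss" as the same text.
print(german.casefold() == "STRASSE".casefold())
```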

David Brown

Apparently so. MS seem to be somewhat more than "particularly unhappy" about case-sensitive name searches, though: in the link David looked up to demonstrate his programmer's skills, I stumbled across text saying "do not assume case sensitivity", "can be used to but" etc. That is basically what I say about DPS, which is also perfectly capable of case-dependent compares and does these, as well as the case-independent ones, likely faster than the rest (it compares 32-bit words). But I expect one would run into problems when going into that not-so-charted territory, just as Don did with MS.

Dimiter

Dimiter_Popoff

I suppose these skills will get you a secretary/typist job, at least you'll make it to the interview.

Clicking once or twice on links in your discovery, one sees how strongly MS discourage use of any case-dependent search capabilities. Like I explained to you earlier, text data can be compared either way given it is all preserved; how can you have a problem understanding *that*?

So cheating did not work but why not persist by repeating yet again something completely irrelevant.

So when creating a file name the system should know whether it will be spelt over the phone. Good, good, way to go, Einstein.

Dimiter

Dimiter_Popoff

(snip, someone wrote)

Do you mean actual DOS, or a DOS program run under Windows? (At least for versions that will run 16 bit programs.)

It is always interesting when you mix things that were designed separately, such that the differences show up in unobvious places.

-- glen

glen herrmannsfeldt

Both actually. A DOS program running under Windows is still working through (an emulation of) the int21h file api. It can't do anything more than an actual DOS system.

George

George Neuner

Dimiter, I have no idea why you are so obsessed with insulting me and posting pointless derogatory remarks. I have a pretty thick skin after many years of using Usenet, but I can't say I find it pleasant - especially from someone whose knowledge and experience I respect.

We disagree on whether filenames should be case sensitive or case insensitive at the filesystem level. That should be fair enough - it's a technical disagreement, and there has been an exchange of ideas and thoughts in this area (by other people as well).

But you have been dragging the disagreement down to a kindergarten squabble. I have tried to avoid retaliating, but I have failed - I have certainly been sarcastic and patronising, and thus encouraged you.

In order to avoid getting completely out of hand, I will therefore have to stop posting in this thread, and will not reply to your points. I hope that next time we "meet" in this newsgroup, we will be back to the friendly and professional tone that is the standard in comp.arch.embedded.

David

David Brown

(snip, someone wrote)

But it can read/write to NTFS disks.

-- glen

glen herrmannsfeldt

David, I am really a less patient person than I used to be say 5 years ago, the less time we have left the more we tend to care about it I suppose.

I have had no problem when we disagree on something which can be argued either way; unfortunately the fact that file names are also for human consumption at the current evolution stage is no more arguable than the result of adding 1 to 1. You are clearly unable to accept you have been wrong on something that basic and ever since you first declared that file names were not for human but for machine processing you have been in track repeat flailing mode. While I know I am not patient at all (have never been), I don't think I was too impatient. I am glad you decided to put an end to it.

Dimiter

Dimiter_Popoff

I suspect that somewhere near the beginning, we have got something mixed up and misunderstood. I can't say for sure what it is, and I don't want to try and dig it up - but I think the result is that we have been arguing slightly at cross-purposes. Thus perhaps you have argued that 1 + 1 = 2, while I have argued that 2 + 2 = 4, and that is why neither of us will back down! (Let us not try to find what went wrong, and certainly not attempt to find out whose fault it might be, as we would argue about that too.)

I just hope that Don learned whatever he needed to know from this thread :-)

David.

David Brown

And also network mounts and NTFS junctions via the emulation ... but that's no different from DOS running over, e.g., a NetBIOS driver.

DOS programs under Windows still are constrained to DOS file naming rules and DOS supported file access modes. They are completely unaware of NTFS long file names, streams, extended attributes, AC lists, etc., and they can't follow Windows shortcuts.

There are probably some other restrictions that I can't recall just now. I haven't had to deal with DOS under Windows in a very long time.

George

George Neuner

On 23.10.2014 at 20:50, George Neuner wrote:

Not true at least for the long names. MS-DOS has supported long filenames for about 20 years now (since Win95). NT was a little late on the bus, but they eventually made long names available to DOS in NT4 SP5, if memory serves.

Hans-Bernhard Bröker
