Locking semantics

Don Y 11 years ago

Hi,

[This one is probably too subtle to sort out ahead of time. But, maybe some insights can be gleaned?]

I'm trying to sort out the best semantics for (file) locking operations.

I don't have a traditional filesystem. Or, in fact, *any* sort of "filesystem". Rather, I have "objects" (bad choice of words) and they are referenced via "namespaces". A namespace can *resemble* what most folks would consider as a filesystem hierarchy (e.g., /devices/UART0; /log/startup; /net/connection/27; etc.). So, for example, the "UART0" object may reside in the "devices" container, etc.

Each "process" has its own namespace.

Unlike traditional filesystems, my namespaces are not portions of -- not windows into -- a single, unified filesystem (as is typically the case in most conventional systems).

With these things in mind, a single "physical" object (for example, that UART alluded to, above) may be known as:

    /devices/UART0                      to process1
    /my/favorite/bitbanger              to process2
    /output                             to process3
    /log *and* /connections/upstream    to process4
    etc.

Like having multiple hard/softlinks to the same "file", each in a separate *jail*!

Note that /output -- and its binding to that "object" -- only makes sense in the context of (e.g.) process3. Process2 may not have a "/output" in its namespace. Or, may have a /output that is bound to a display, etc.

However, a process can *share* a namespace (or portion thereof) with another process. In which case, "/output" *is* the same "/output" in that "other" process (call it process99). Said another way, if /output was renamed by process3, that other process (process99) would *see* this new name -- /output would disappear (or, be available for reuse by either or both of those processes -- keeping in mind that process2 could still be using it for something entirely different).

With that as background...

Imagine process3 wants to take an exclusive lock on /output (or, any other object that is accessible via its namespace).

I think we would agree that this should prevent process1 from accessing that "physical" object -- the UART, in this example -- for as long as the lock is held. Whether process1's operations block waiting on the resource, return an immediate error or register a callback is immaterial. The point is, the lock applies to the *object*, not the *name(s)*!
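A minimal sketch of that point, assuming a toy in-memory model (every type and function name below is hypothetical, not the actual system's API): each namespace maps its own names onto a shared object record, and the lock lives in the object, so it is seen through every name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BINDINGS 8

/* Hypothetical "physical" object; the lock state lives here, not in a name. */
typedef struct {
    const char *kind;  /* e.g. "uart" */
    int lock_owner;    /* process id holding the exclusive lock, 0 if none */
} object_t;

typedef struct {
    const char *name;
    object_t *obj;
} binding_t;

/* One namespace per process: a private list of (name, object) bindings. */
typedef struct {
    binding_t bindings[MAX_BINDINGS];
    size_t count;
} namespace_t;

/* Bind a name to an object in one namespace; returns 0 on success. */
int ns_bind(namespace_t *ns, const char *name, object_t *obj)
{
    assert(ns != NULL && name != NULL && obj != NULL);
    if (ns->count >= MAX_BINDINGS)
        return -1;
    ns->bindings[ns->count].name = name;
    ns->bindings[ns->count].obj = obj;
    ns->count++;
    return 0;
}

/* Resolve a name in one namespace; NULL if the name is not bound there. */
object_t *ns_lookup(const namespace_t *ns, const char *name)
{
    for (size_t i = 0; i < ns->count; i++)
        if (strcmp(ns->bindings[i].name, name) == 0)
            return ns->bindings[i].obj;
    return NULL;
}

/* Take the exclusive lock on the object itself; -1 if someone else holds it. */
int obj_lock(object_t *obj, int pid)
{
    if (obj->lock_owner != 0 && obj->lock_owner != pid)
        return -1;
    obj->lock_owner = pid;
    return 0;
}

/* Access is allowed only if the object is unlocked or locked by the caller. */
int obj_may_access(const object_t *obj, int pid)
{
    return obj->lock_owner == 0 || obj->lock_owner == pid;
}
```

If process3 binds the UART as "/output" and process1 binds it as "/devices/UART0", a lock taken through "/output" makes obj_may_access() fail for process1, even though process1 has no "/output" in its namespace at all.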

But, what (should) happen when process99 tries to rename the object (in the namespace that it shares with process3)? The *name* binding isn't locked, just the object itself!

[Think about this... if you locked the name bindings, then, conceivably, process1 shouldn't be allowed to rename /devices/UART0 as /devices/MyUART while the lock is held!]

However, process3 may be in the process of building a namespace for a new process that it is about to spawn -- hence the reason he took the lock on the object... to prevent the object itself from changing from its current state due to actions by "others". I.e., process3 wants to build a new namespace and reference that UART ("/output") *in* that namespace -- perhaps by the exact same *name*.

How does process3 "lock" the name binding to that object -- at least for the duration of this atomic operation it is attempting?

The workaround is to create another *private* name for the object (using the "handle" that it currently holds) and pass *that* binding on to the new namespace.
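A rough sketch of that workaround, assuming a toy namespace that is just an array of (name, handle) pairs (all names below are hypothetical). The child's namespace is built from the handle the parent already holds, so a concurrent rename in the shared namespace cannot break it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 32
#define MAX_BINDINGS 8

typedef struct { int id; } object_t;   /* stand-in for a real object */

typedef struct {
    char name[NAME_LEN];
    object_t *handle;
} binding_t;

typedef struct {
    binding_t b[MAX_BINDINGS];
    size_t n;
} namespace_t;

int ns_bind(namespace_t *ns, const char *name, object_t *h)
{
    assert(ns != NULL && name != NULL && h != NULL);
    if (ns->n == MAX_BINDINGS || strlen(name) >= NAME_LEN)
        return -1;
    strcpy(ns->b[ns->n].name, name);
    ns->b[ns->n].handle = h;
    ns->n++;
    return 0;
}

object_t *ns_lookup(const namespace_t *ns, const char *name)
{
    for (size_t i = 0; i < ns->n; i++)
        if (strcmp(ns->b[i].name, name) == 0)
            return ns->b[i].handle;
    return NULL;
}

/* Rename a binding; this is what process99 might do in the shared namespace. */
int ns_rename(namespace_t *ns, const char *from, const char *to)
{
    if (strlen(to) >= NAME_LEN)
        return -1;
    for (size_t i = 0; i < ns->n; i++)
        if (strcmp(ns->b[i].name, from) == 0) {
            strcpy(ns->b[i].name, to);
            return 0;
        }
    return -1;
}

/* Build the child's namespace from a handle the parent already holds,
 * instead of re-resolving a name that someone else may have renamed. */
int spawn_namespace(namespace_t *child, object_t *held, const char *name)
{
    child->n = 0;
    return ns_bind(child, name, held);
}
```

The design choice here is that the name is resolved exactly once, while the parent is in control; everything after that travels by handle.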

The other, heavy-handed approach is to take a lock on the "container" (in this case, "/") so that no changes to the container can be made for this duration.

Namespace operations aren't particularly expensive. OTOH, nor are they *cheap*! I'm mainly concerned with reducing the possibility of latent bugs where subtle races could potentially exist.

Imagine a *traditional* filesystem (single, unified namespace). You have a process that has been accessing a particular "file". It wants to spawn another process to do some further work on that file. So, it wants to pass "filename" to that new process.

But, at the same time, some *other* process (unbeknownst to you) wants to rename "filename" to "myfile" -- because it is ignorant of *your* reference to that "name". Depending on the order in which these competing actions interact, the new process that you spawn may find "filename" is "not found".

As I said, I think this is probably too subtle for a simple resolution. I imagine I'll have to implement it one way or the other... and see how clumsy the approach ends up -- then rework it. :<

Boudewijn Dijkstra 11 years ago

On Tue, 19 May 2015 09:10:38 +0200, Don Y wrote:

...potentially partially shared...

A jail with holes in it?

That seems unnecessarily complicated and prone to security problems.

Exclusive locking is selfish. Why not send a message to the object's owner?

It would be polite of either process99 or the namespace manager to send a name change notification.

Why is that needed? Once you have a pointer to the object or a "file descriptor", its name shouldn't matter.

No. Normally you fork(), causing open file descriptors to be copied.

Well that is what happens when you allow renaming in a shared namespace.

That's a common adage.

(Remove the obvious prefix to reply privately.) Made with Opera's e-mail program: http://www.opera.com/mail/

Aleksandar Kuktin 11 years ago

Have you had a look at Plan 9 from Bell Labs? Their filesystem-based everything seems to be *exactly* what you have here. In fact, I'm not even sure you're not using Plan 9. :)

It might be a good idea to just shamelessly rip off Plan 9's file locking.

This might be a good place to start:

formatting link

Dimiter_Popoff 11 years ago

I suppose I have been through all of this over the last 20+ years writing and "living" inside DPS and I think things are pretty straightforward.

If it is a file, applications open it in a "registered" manner, i.e. each open file has only one IOCB (input/output control block). Tasks can "check in" to access it (there is a bitmap attached to the IOCB for that purpose). The IOCB can be "locked" by a task in which case the rest of the tasks just cannot access the file. Renaming a file will take locking the IOCB - having write access to this file - and doing it. Lock for renaming only is nothing I have ever considered, it would take another bit in the IOCB. But since the IOCB is a single one system-wide there could be no serious issues with renaming it, every registered task would have access to the latest name.
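A minimal sketch of the IOCB idea as described above, assuming up to 32 tasks identified by small integers. The struct layout and function names are hypothetical illustrations, not DPS's actual interface.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IOCB_MAX_TASKS 32
#define IOCB_UNLOCKED  (-1)
#define IOCB_NAME_LEN  32

typedef struct {
    uint32_t checked_in;       /* bit t set: task t has checked in */
    int lock_owner;            /* task holding the lock, or IOCB_UNLOCKED */
    char name[IOCB_NAME_LEN];  /* the single, system-wide copy of the name */
} iocb_t;

static int task_valid(int task)
{
    return task >= 0 && task < IOCB_MAX_TASKS;
}

void iocb_init(iocb_t *cb, const char *name)
{
    assert(cb != NULL && name != NULL && strlen(name) < IOCB_NAME_LEN);
    cb->checked_in = 0;
    cb->lock_owner = IOCB_UNLOCKED;
    strcpy(cb->name, name);
}

/* Register a task in the bitmap attached to the IOCB. */
int iocb_check_in(iocb_t *cb, int task)
{
    if (!task_valid(task))
        return -1;
    cb->checked_in |= UINT32_C(1) << task;
    return 0;
}

/* A task may access the file if it checked in and nobody else holds the lock. */
int iocb_can_access(const iocb_t *cb, int task)
{
    if (!task_valid(task) || !(cb->checked_in & (UINT32_C(1) << task)))
        return 0;
    return cb->lock_owner == IOCB_UNLOCKED || cb->lock_owner == task;
}

int iocb_lock(iocb_t *cb, int task)
{
    if (!iocb_can_access(cb, task))
        return -1;
    cb->lock_owner = task;
    return 0;
}

void iocb_unlock(iocb_t *cb, int task)
{
    if (cb->lock_owner == task)
        cb->lock_owner = IOCB_UNLOCKED;
}

/* Renaming takes the lock, changes the one shared name, then drops the
 * lock again unless the caller already held it. */
int iocb_rename(iocb_t *cb, int task, const char *new_name)
{
    if (strlen(new_name) >= IOCB_NAME_LEN)
        return -1;
    int had_lock = (cb->lock_owner == task);
    if (iocb_lock(cb, task) != 0)
        return -1;
    strcpy(cb->name, new_name);
    if (!had_lock)
        iocb_unlock(cb, task);
    return 0;
}
```

Because there is only one IOCB per open file, every checked-in task reads the same name field, which is why a rename is immediately visible to all of them.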

The more generic naming - or whatever parameter of whatever object - I have been through (and still am when programming) is with the DPS runtime objects. You could do all you want to do with one of them, say "lock" for modification - save former lock state (in case your task is locking an object it has previously locked already) - modify - restore lock state. The number of "lock" bits you can implement is obviously object specific, you can grow them with new types as needed etc.
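The lock / save former state / modify / restore sequence might look roughly like this. It is a single-task sketch with hypothetical names; a real implementation would need an atomic test-and-set rather than a plain store.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int locked;  /* 1 while some code path holds the modification lock */
    int value;   /* stand-in for whatever parameter is being modified */
} rt_object_t;

/* Lock the object and return the previous lock state for a later restore. */
int obj_lock_save(rt_object_t *o)
{
    assert(o != NULL);
    int previous = o->locked;
    o->locked = 1;
    return previous;
}

/* Put the lock bit back to what it was before obj_lock_save(). */
void obj_lock_restore(rt_object_t *o, int previous)
{
    o->locked = previous;
}

/* Safe to call whether or not the caller already locked the object:
 * an outer lock survives because the former state is restored. */
void obj_set_value(rt_object_t *o, int value)
{
    int previous = obj_lock_save(o);
    o->value = value;
    obj_lock_restore(o, previous);
}
```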

I suppose what you are describing is still a bit too general. While it is crucial to get things right at the foundations I suppose your questions will get their answers automatically as you go :-). So I posted some of my experience, some phrase might click and be useful when you are doing this, which I suppose is what you hoped for when initiating the discussion.

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments

------------------------------------------------------

Aleksandar Kuktin 11 years ago

What I'm thinking as well.

You obtain a lock on the underlying object. Presumably, user processes talking to the kernel/namespace server can reference an object by its True Name, regardless of the filesystem location of the object.

But if you don't obtain locks on the object but on the filesystem/namespace locations of objects, then obviously every filesystem/namespace that exposes the object needs to be made aware of the lock. Otherwise, how will they enforce the lock?

Or are you trying to make the third, ultra complicated option, where every namespace exposes the same object, and locking one exposure of an object by design locks only some of the other exposures while leaving the rest unlocked?

Don Y 11 years ago

How so, "holes"?

It actually makes security *easier*!

A "process" can only access objects from which it can resolve a "name". E.g., process77 can't reference "/output" and expect it to yield access to the same object that process3 calls "/output". Process77's "/output" may have been bound to some other object (by whatever created process77's namespace -- or, whatever *other* object process77 had a handle to, including objects that process77 may have created or that may have been passed *to* process77 during the course of its execution).

So, if process66 (the creator of process77) builds a namespace for process77 (which can only reference objects that process66 can *name*), then it is known that process77 can only reference that subset of system objects. E.g., if none of them are bound to the physical "UART" device mentioned above, then there is no fear that process77 will EVER be able to access that UART (unless someone passes a reference to the UART *into* process77 at runtime -- and, that can only be from someone who *has* such a reference AND has a comm link to process77.)

There are still cases where two or more processes may wish to share a single copy of (a portion of) their individual namespaces.

This is, in effect, what is done. But, "owner" is the server that is responsible for that sort of object, not some process that may be *using* it. I.e., all operations on the object are actually messages to the object's server ("implementer"). So, a lock is simply a message from someone holding a live reference to a particular instance of the type of object that is served stating "give me exclusive access to this object, regardless of other 'clients' that may currently have live references to it". As each object is "served" from one particular server (of that type), this ensures that the lock is applied (on what I called the "physical object") universally and consistently.
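To make the "a lock is just another message to the object's server" idea concrete, here is a small sketch. The request layout, status codes and server_handle() are all hypothetical; the real system's message format is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJECTS 4
#define NO_HOLDER   0

typedef enum { OP_LOCK, OP_UNLOCK, OP_USE } op_t;
typedef enum { ST_OK, ST_BUSY, ST_NOT_HOLDER, ST_BAD_OBJECT } status_t;

/* One message from a client to the server that implements the object. */
typedef struct {
    int client;  /* nonzero id of the requester */
    int object;  /* which served instance the handle refers to */
    op_t op;
} request_t;

/* The server is the only place where lock state is recorded. */
typedef struct {
    int holder[MAX_OBJECTS];  /* client holding each lock, or NO_HOLDER */
} server_t;

status_t server_handle(server_t *s, const request_t *r)
{
    assert(s != NULL && r != NULL && r->client != NO_HOLDER);
    if (r->object < 0 || r->object >= MAX_OBJECTS)
        return ST_BAD_OBJECT;

    int *holder = &s->holder[r->object];
    switch (r->op) {
    case OP_LOCK:
        if (*holder != NO_HOLDER && *holder != r->client)
            return ST_BUSY;
        *holder = r->client;
        return ST_OK;
    case OP_UNLOCK:
        if (*holder != r->client)
            return ST_NOT_HOLDER;
        *holder = NO_HOLDER;
        return ST_OK;
    case OP_USE:
        /* Ordinary operations are refused while someone else holds the lock. */
        if (*holder != NO_HOLDER && *holder != r->client)
            return ST_BUSY;
        return ST_OK;
    }
    return ST_BAD_OBJECT;
}
```

Whether a refused request blocks, errors out or registers a callback is a policy decision inside the server; the sketch simply returns ST_BUSY.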

Being "polite" isn't the same as *requiring* that. Or, "enforcing" it.

I.e., the "best practices" approach would be to create another private portion of the namespace and bind a name to the object, there. So, only *you* can twiddle with that instance of the binding. Then, pass *this* on to the new process that you are creating (see immediately following).

You're thinking too "monolithic kernel". What do you do when the "new process" is on some other/remote node? Or, when you *want* to pass a "symbolic reference" in the form of an object name?

I can spawn another "process" on any physical node in the system (for which I happen to have "rights"). Likewise, a process that I spawn can be instantiated wherever the system decides is appropriate (based on current and future workloads).

Passing an open "file descriptor" (remember, these aren't really "files") is the same as passing an anonymous object to another process. If I truly want to *share* that file descriptor (and not just the current *copy* of it), then I have to arrange for any changes made to it by ProcessA to be seamlessly reflected back to ProcessB (this is expensive as A and B need not be collocated).

Yes. In a traditional file hierarchy, it is a potential problem any time *any* process changes an existing name. So, you adopt conventions to minimize the chances of this happening -- and/or use ACLs to protect it from happening (sure, B can access it but can't access the binding of its name in the directory, etc.)

When *everything* is accessed via (your) namespace, this gets potentially more difficult. More opportunities for conflict (because the writer of X hadn't imagined someone writing Y and failing to address this competition.)

But an expensive one!

Time for my nap...

Don Y 11 years ago

What if the name is changed?

See below.

The issue is "how do you obtain a lock on the *name* (binding)".

A namespace is, itself, an object. When you want to "lookup" a name in a namespace, add a name to a namespace, remove, rename, etc. you pass a "handle" to that namespace to the namespace server (which is typically replicated on each node as most processes don't deal with more than their *own* namespace -- yes, you can manipulate some *other* process's namespace if you have the right to do so!) along with the operation you want performed, any required credentials and parameters, etc.

The namespace server then performs the operation for you. (i.e., *you* don't keep track of the contents of your namespace -- as it is possible that they can be changed while you are accessing it... just like another process could add a file to the global namespace that is typically called a filesystem)

So, I can just as easily place locks on *names* as I can on the objects that they *reference* -- using the exact same mechanisms, just imposed in different servers. I.e., a "file/object lock" is implemented in the server that manages that particular object. Anyone holding a handle for that "physical" (bad choice of word) object has to (similarly) pass that handle ("along with the operation you want performed, any required credentials and parameters, etc.") to the server for that object. *It* knows that the object is locked, who currently holds the lock, what criteria apply to that lock, etc. REGARDLESS OF WHERE IN THE PHYSICAL, DISTRIBUTED SYSTEM those clients and objects reside. The server decides whether and how it will respond to those "other" operation requests on the referenced object. (Likewise, it can know to *grant* those operations to the lock holder -- because it knows which requests are coming *from* the lock holder... including the eventual request to release the lock!)

Repeat this description but substitute "namespace" for "object". I.e., a lock on a name in a namespace can be similarly encoded in *one* place (regardless of how many processes have access to that namespace) and enforced as such.

Doing this in a monolithic filesystem (single namespace) is difficult, at best. Sure, you can lock a file (contents). But, how do you prevent its name from being changed? You need very fine-grained ACLs for the file and/or "container" (directory object) so you could prevent changes to individual directory entries by *some* processes while not interfering with directory operations for *other* entries.

Said another way: write a piece of code that only allows processX to modify the name of a particular file in a *large* directory while not preventing processY from adding, renaming, unlinking, etc. OTHER names in that same directory. And, once processX is done with its actions, allow processZ to impose the same sorts of restrictions (while it "does its thing")
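One possible shape for that piece of code, when a single namespace server owns the directory: keep a lock holder per *entry* rather than per directory. This is a sketch under that assumption; dir_entry_t and the dir_* functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIR_ENTRIES  16
#define DIR_NAME_LEN 32
#define NOBODY       0

typedef struct {
    char name[DIR_NAME_LEN];  /* empty string marks a free slot */
    int name_lock;            /* process allowed to change this entry, or NOBODY */
} dir_entry_t;

typedef struct {
    dir_entry_t e[DIR_ENTRIES];
} directory_t;

static dir_entry_t *dir_find(directory_t *d, const char *name)
{
    for (size_t i = 0; i < DIR_ENTRIES; i++)
        if (d->e[i].name[0] != '\0' && strcmp(d->e[i].name, name) == 0)
            return &d->e[i];
    return NULL;
}

/* An entry may be changed if it is unlocked or locked by the caller. */
static int may_modify(const dir_entry_t *e, int pid)
{
    return e->name_lock == NOBODY || e->name_lock == pid;
}

int dir_add(directory_t *d, const char *name)
{
    assert(d != NULL && name != NULL);
    if (name[0] == '\0' || strlen(name) >= DIR_NAME_LEN || dir_find(d, name))
        return -1;
    for (size_t i = 0; i < DIR_ENTRIES; i++)
        if (d->e[i].name[0] == '\0') {
            strcpy(d->e[i].name, name);
            d->e[i].name_lock = NOBODY;
            return 0;
        }
    return -1;  /* directory full */
}

int dir_lock_name(directory_t *d, int pid, const char *name)
{
    dir_entry_t *e = dir_find(d, name);
    if (e == NULL || !may_modify(e, pid))
        return -1;
    e->name_lock = pid;
    return 0;
}

int dir_unlock_name(directory_t *d, int pid, const char *name)
{
    dir_entry_t *e = dir_find(d, name);
    if (e == NULL || e->name_lock != pid)
        return -1;
    e->name_lock = NOBODY;
    return 0;
}

/* The lock stays with the entry, so it follows the binding across a rename. */
int dir_rename(directory_t *d, int pid, const char *from, const char *to)
{
    dir_entry_t *e = dir_find(d, from);
    if (e == NULL || !may_modify(e, pid))
        return -1;
    if (to[0] == '\0' || strlen(to) >= DIR_NAME_LEN || dir_find(d, to))
        return -1;
    strcpy(e->name, to);
    return 0;
}

int dir_unlink(directory_t *d, int pid, const char *name)
{
    dir_entry_t *e = dir_find(d, name);
    if (e == NULL || !may_modify(e, pid))
        return -1;
    e->name[0] = '\0';
    e->name_lock = NOBODY;
    return 0;
}
```

With this layout processX can hold "foo" while processY freely adds, renames and unlinks other entries, and once processX releases the lock processZ can take it in turn.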

Don Y 11 years ago

I don't think Plan 9 supports file locking as an OS primitive (at least, Inferno doesn't, and much of Inferno is derived from Plan 9's concepts/implementation).

[I'm not keen on the "in channel text" by which "commands" are invoked on "objects" -- too much parsing required: "Did I just receive a message to SET the baudrate? Or, GET the baudrate?" Likewise, too much error checking on syntax that must be done in the object implementation: "SET BAUD " vs. "GET BAUD". In my approach, everything is just an RPC: set_baudrate(device, rate) or get_baudrate(device) -- so, the compiler can do a lot of the checking, type enforcement, etc. for you]

My namespaces come from Inferno but the mechanisms for distributing and manipulating them (along with other objects) are more closely attuned with Mach. In particular, my introduction of "credentials" and fine-grained access control relies heavily on Mach's abstractions.

For (silly) example, I can let task1 set the baudrate on a device -- but not alter the "word format" (number of stop bits, etc.). And, at the same time, prohibit it (task1) from sending or receiving characters through that device -- while task2 can freely push/pull data yet is prevented from altering the baudrate.

(things that aren't possible in a more conventional namespace/ACL).

E.g., a "safety" task can be given control over the brakes on a mechanism yet prevented from otherwise altering the *operation* of that mechanism. There's no way to provide that level of access control using traditional semantics.
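A sketch of how per-operation rights might ride along with a handle, assuming a simple bitmask (the RIGHT_* constants, uart_handle_t and the function signatures are hypothetical; the actual credential scheme is not described in this thread).

```c
#include <assert.h>
#include <stddef.h>

/* One bit per operation that a handle may authorize. */
enum {
    RIGHT_SET_BAUD   = 1u << 0,
    RIGHT_SET_FORMAT = 1u << 1,
    RIGHT_SEND       = 1u << 2,
    RIGHT_RECEIVE    = 1u << 3
};

typedef struct {
    unsigned baud;
    unsigned stop_bits;
    char last_sent;   /* stand-in for the transmit path */
} uart_t;

/* A handle pairs the object with the rights granted to its holder. */
typedef struct {
    uart_t *dev;
    unsigned rights;
} uart_handle_t;

typedef enum { RESULT_OK, RESULT_DENIED } result_t;

static int allowed(uart_handle_t h, unsigned right)
{
    assert(h.dev != NULL);
    return (h.rights & right) != 0;
}

result_t set_baudrate(uart_handle_t h, unsigned rate)
{
    if (!allowed(h, RIGHT_SET_BAUD))
        return RESULT_DENIED;
    h.dev->baud = rate;
    return RESULT_OK;
}

result_t set_stop_bits(uart_handle_t h, unsigned bits)
{
    if (!allowed(h, RIGHT_SET_FORMAT))
        return RESULT_DENIED;
    h.dev->stop_bits = bits;
    return RESULT_OK;
}

result_t send_char(uart_handle_t h, char c)
{
    if (!allowed(h, RIGHT_SEND))
        return RESULT_DENIED;
    h.dev->last_sent = c;
    return RESULT_OK;
}
```

Giving task1 a handle with only RIGHT_SET_BAUD and task2 one with RIGHT_SEND | RIGHT_RECEIVE reproduces the split described above: each call is checked against the rights carried by the handle that made it.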

Don Y 11 years ago

So, how do you support two (or more) names pointing to the same file?

E.g., I might have, on one of my UN*X boxes, 20 or 30 file *names* (entries in a directory) that all reference the same "set of bytes" (i.e., "program") on the disk. This is a technique used in creating a "crunch'ed" file -- a single executable that actually represents many different programs mashed together.

The first thing this sort of "program" does is figure out how it was invoked -- i.e., by which *name* it was invoked. Then, based on that name, one of N "main()'s" contained in the file is given control. The overall effect is that you can link the traditional library routines ("overhead") that most programs require into this crunched "program" exactly once and have them accessible from any of the N "main()'s" that are embodied in it.

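    #include <stdio.h>
    #include <string.h>

    int mainCOPY(int argc, char **argv);
    int mainDELETE(int argc, char **argv);
    int mainFLY(int argc, char **argv);

    int main(int argc, char **argv)
    {
        ...
        /* strcmp() returns 0 on a match, so test for == 0 */
        if (strcmp(argv[0], "copy") == 0) {
            return mainCOPY(argc, argv);
        } else if (strcmp(argv[0], "delete") == 0) {
            return mainDELETE(argc, argv);
        } else if (strcmp(argv[0], "fly") == 0) {
            return mainFLY(argc, argv);
        } ...

        } else {
            fprintf(stderr, "Not found\n");
            return -1;
        }
    }

    int mainCOPY(int argc, char **argv) { ... }

    int mainDELETE(int argc, char **argv) { ... }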

In this case, there would be directory entries for "copy", "delete", "fly", etc. -- all pointing to the *one* instance of "program" on the medium.

What I am concerned about is developerA deciding on a use for some set of objects/"names" at some point in time. He implicitly expects his use of them (the names *and* the objects) to be "exclusive" ("Why would anyone ever want to use this mumbledypeg object, 'foo'? It shouldn't be of value to anyone OTHER THAN me!").

If the mechanism for locking a *name* is tedious/expensive/error prone, then he will PROBABLY err on the side of laziness (after all, who could possibly want to use that name/object?!).

Sometime later, developerB sees that developerA has done a fair bit of his work for him in creating and manipulating "foo". So, being lazy, developerB tries to leverage that existing effort by referencing "foo" -- potentially ALONGSIDE developerA's reference!

Everything *seems* to work fine... until, one day, it doesn't! A (or B) loses a race and the corresponding code fails. "Can't happen" actually *does* happen!

And, identifying the actual cause of the problem may be deeply involved/convoluted depending on how those resources are used.

Even if you can point to developerA as the initial cause of the problem, he/she may no longer be available; their sources may be gone (or "closed"); the toolchain to support the binary may no longer be conveniently available, etc.

So, it would be nicer/safer/more robust to have mechanisms that a developer might need *available* instead of forcing the developer to create an ad hoc approach -- possibly faulty -- to get the results his code *requires*.

Dimiter_Popoff 11 years ago

I could easily do but I do not on purpose. Nothing is stopping me from having multiple directory entries pointing to the same piece of data or to some other directory entry which points to the data etc. However, this has implications - new directory entry type (so the data gets deallocated only for one of the directory entries) or just making a "link" type directory entry which may easily turn out to point to nothing (user deletes the file it points to). Solvable issues but adding complexity and no serious benefit.

Under DPS this is solved in a much more general - and clean - way. Each piece of executable code is a "program module" which is accounted for by the OS (keeping track of where it came from into RAM) and, if reentrant/position independent (all I do are that) it is used by any task which wants to use it. Then each module can be of a "program library module" type containing multiple, 64 bit identifiable, entry points. DPS keeps track by whom a module is used so it can be discarded/deallocated when no longer needed etc. This mechanism is widely used by various layers (e.g. some "programs" call such module functions directly, whereas a DPS object can have some of its actions located in a .plm file, .plm being obviously "program library module" :D ).

Ah I see now. I suppose in my vocabulary this would be "name reservation". Can you not simply prevent the developer from using names which have already been used? Like in any language, a "duplicate symbol definition" sort of error if he tries? Maybe I am still not getting what you are after.

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI

------------------------------------------------------

Don Y 11 years ago

Yes. But, now when you take a lock on the file's *contents*, which file NAME is associated with that lock? I.e., traditionally, a name is treated as a separate entity from contents.

Most existing systems don't care *how* you got to the "contents" because the lock applies to the *content*, not the "name" that you resolved to access it.

When locking content, I work similarly: an object O "physically" residing on processor P can have the name N1 in task T1's address space (on processor P1) and name N2 in task T2 on processor P2, etc. When T1 takes the lock on "N1", it is dealing with the same "physical" object that T2 may try to access as "N2". And, none of the tasks are aware that they are executing on different physical processors (i.e., true parallelism) or that the object they are interested in actually resides on yet another processor.

The problem I'm addressing is conflict resolution in the *namespaces*. Note that T1 and T2 -- residing on different processors! -- may share all, part or *none* of their respective namespaces. Their namespaces are just like any other object -- they can "physically" reside anywhere in the system (though "locally" -- i.e., on P1 for T1 and P2 for T2 -- obviously gives better performance if they are doing lots of namespace operations!).

For me, a namespace is just a set of ("name", handle) pairings. E.g., "subdirectory" is a pairing of the text "subdirectory" with a handle to the "directory object" that will contain more such ("name", handle) pairings. The "leaves" are pairings in which the handles point to non-directory objects.

The namespace code doesn't care about the types of these handles; it just walks the chain stripping off each successive "subdirectory" as it descends into the hierarchy and passes the remaining portion of the "filename/pathname" to the object referenced by the portion of the "name" that remains. In this way, things *inside* an object can be accessed.

For example: when resolving ">don_private&devices+UART:baudrate" (note the "separators" aren't "reserved characters" -- they can be anything that makes sense to the object in which they occur!), the top level of the namespace is presented with this "string". As it is typically a directory object, it looks through the list of (name,handle) pairs that it currently contains and eventually finds ">don" (but not ">do" or ">don_"). It takes the handle associated with ">don" and passes "_private&devices+UART:baudrate" to the resolver for that handle.

As this is probably also a directory object, it scans *its* list of bindings and finds "_private". Or, perhaps "_private&de" (!!!). After stripping off the matched prefix, it again passes the remaining string to the object referenced by *that* handle and the process repeats.

In this example, let's assume we eventually get to a point where we have resolved "...e" (remember, the name of the uart device can be any contiguous substring up to and excluding the final character in the "pathname", for this example: "UART", "UART:", "+UART", "ces+UART:", "evices+UART:baudrat", etc. Of course, the *logical* name of the UART would be something like "UART" -- with everything preceding it being part of the names of the objects in which it is nested!) to reference a handle that happens to represent a *uart* object!

When we pass the remaining portion of the string (just like before) to that (uart) object and ask for *it* to resolve "baudrate" (or whatever portion of the pathname remains unresolved -- at least the final "e"), then the uart object decides we are referencing the component that (for example) governs baudrate.
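The prefix-stripping walk described above could be sketched as follows. Two simplifications are assumptions of this sketch, not statements about the real resolver: every object uses the same table-driven lookup (in the description each object type resolves its own remainder), and when several bindings match, the longest one wins. All identifiers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct object object_t;

/* A binding pairs a name fragment with a handle to the next object. */
typedef struct {
    const char *name;
    const object_t *handle;
} binding_t;

struct object {
    const char *label;          /* for debugging only */
    const binding_t *bindings;  /* NULL for a leaf object */
    size_t nbindings;
};

/* Walk the chain: at each object, find a binding whose name is a prefix of
 * the remaining string, strip it, and hand the rest to the bound object.
 * Returns the object reached when the string is exhausted, else NULL. */
const object_t *resolve(const object_t *obj, const char *path)
{
    assert(obj != NULL && path != NULL);
    while (*path != '\0') {
        const binding_t *match = NULL;
        size_t match_len = 0;

        for (size_t i = 0; i < obj->nbindings; i++) {
            size_t len = strlen(obj->bindings[i].name);
            /* Longest matching prefix wins (an assumption of this sketch). */
            if (len > match_len &&
                strncmp(path, obj->bindings[i].name, len) == 0) {
                match = &obj->bindings[i];
                match_len = len;
            }
        }
        if (match == NULL)
            return NULL;        /* nothing here matches the remainder */
        obj = match->handle;    /* descend into the bound object */
        path += match_len;      /* strip the matched prefix */
    }
    return obj;
}
```

With bindings ">don", "_private", "&devices", "+UART" and ":baudrate" chained one inside the next, resolving ">don_private&devices+UART:baudrate" ends at the baudrate object, while ">do" fails at the top level because no binding is a prefix of it.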

The handle for *that* "baudrate object" then allows "baudrate operations" to be performed on it. E.g., things like:

    result_t set_baudrate(baudrate_handle h, baud_rate br);
    result_t get_baudrate(baudrate_handle h, baud_rate &br);

etc.

It wouldn't make sense for you to try to "fseek()" on a baudrate object, for example!

A library is a different entity. It is designed to have multiple "access points" but no real "entry point" ("shared libraries" are a special case). An executable file has *one* entry point. The loader copies it from disk into memory and then transfers control to that entry point.

So, if you want a "program" to behave differently based on the name by which it is invoked, that single entry point (that eventually brings you to "main()") must examine the command line that caused the loader to place it into memory: "What was this called BY THE USER to get it here?"

[These aren't really material to what follows; just trying to put UN*X in its perspective]

You write a program that will, eventually, create a *named* object and pass the *name* of that object on to some other program (so that other program can reference that object through the namespace instead of via a "bare handle"; for example, if the other program is invoked some time LONG *after* the original program has terminated!).

What if some other program (sharing that same namespace) creates a named object having that same name *before* your program has a chance to create its object having that particular name? (remember, these are just objects; the first example program may think it is creating a timer called "deadline" -- while the *third* thinks it is creating a "handler"/executable object that will be invoked when some deadline is exceeded).

Or, the first program creates the "deadline" TIMER as planned. The other (third) program comes along and tries to create an object having that same name. But, the name is already "in use". Does it unlink the name ("delete") so that it can safely create its "deadline handler" and bind the handler to that name? (what happens when that *second* program is invoked and it goes looking for the "deadline TIMER" -- which no longer exists even though "deadline" still resolves to a live object!) Or, does it abend claiming "name exists"?

Etc.

All of these "problems" go away with excellent documentation and *responsible* developers. But, that's true of *all* bugs! Yet, we still encounter them! :-/

What I'm trying to do is put in place mechanisms whereby the developer can set up "contracts" through the RTOS that allow his/her "programs" to operate reliably in light of past AND future components in the system -- without having to research every possible contingency (which makes the system increasingly difficult to "scale").

A name is a resource just like space on a disk, bytes in memory or cycles on a processor. Ideally, I want a developer to "formally" lay out the requirements/assumptions that his code RELIES UPON and then have the RTOS ensure those guarantees are met.

So, if some *other* "program" starts to run, *when* it similarly lays out its requirements, it is told of conflicts that exist IN THE CURRENT ENVIRONMENT and *it* can address them. Much like getting NULL returned by malloc() -- you can't assume that because you *want* something you will *get* it. Figure out how to recover (e.g., decide to use a different name if the name you *want* is already "reserved" -- and, include a mechanism by which you can convey to that "other/second" program your choice of names!)
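In the spirit of the malloc() comparison, a name reservation call could report failure and leave recovery to the caller. The registry and both functions below are hypothetical, a sketch of the idea rather than the RTOS's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RESERVATIONS 16

typedef struct {
    const char *name;
    int owner;        /* id of the program holding the reservation */
} reservation_t;

typedef struct {
    reservation_t r[MAX_RESERVATIONS];
    size_t n;
} registry_t;

typedef enum { RESERVE_OK, RESERVE_CONFLICT, RESERVE_FULL } reserve_status_t;

/* Reserve a name for one owner; a second owner is told about the conflict. */
reserve_status_t reserve_name(registry_t *reg, const char *name, int owner)
{
    assert(reg != NULL && name != NULL);
    for (size_t i = 0; i < reg->n; i++)
        if (strcmp(reg->r[i].name, name) == 0)
            return reg->r[i].owner == owner ? RESERVE_OK : RESERVE_CONFLICT;
    if (reg->n == MAX_RESERVATIONS)
        return RESERVE_FULL;
    reg->r[reg->n].name = name;
    reg->r[reg->n].owner = owner;
    reg->n++;
    return RESERVE_OK;
}

/* Like checking malloc() for NULL: try the candidate names in order and
 * return the first one granted, or NULL if none could be reserved. */
const char *reserve_first_available(registry_t *reg, int owner,
                                    const char *const *candidates,
                                    size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (reserve_name(reg, candidates[i], owner) == RESERVE_OK)
            return candidates[i];
    return NULL;
}
```

The returned name is what the program would then have to convey to its partner, since the partner can no longer assume the preferred name was granted.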

Have to get started on another cheesecake...

George Neuner 11 years ago

Hi Don,

You can't resolve multiple name conflicts without a global protocol. However, you still haven't provided any convincing reason to be worried about it.

If what you want to do is make the object and its name appear to be one and the same, you have to transitively lock both before allowing access to either.

That's why Unix indirects through the inode - the complexity is necessary for correct operation. [Yes, there are odd cases where Unix has problems, but it got right virtually all of the common uses. And most of the real problems are caching/timing issues that have no good solution.]

Why? Consider it as seeking to a new frequency.

Almost any function can be shoehorned into Unix's read/write/ioctl interface ... believing otherwise indicates a lack of imagination 8-)

Both object based and function based APIs have their place. Too many people have drunk the KoolAid and think objects always are the right way to go.

You can mitigate that slightly by supporting multiple versions if there is something to distinguish them (object type?) but there is no way to entirely prevent it.

The paired programs require a protocol for dealing with disruption to their name scheme. That's what UUIDs are supposed to handle. UUIDs don't entirely prevent collisions, but they are quite rare. No number of name bits will entirely prevent collisions, but you can make collisions so rare as to not bother worrying about them.
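To put a rough number on "so rare", the standard birthday approximation can be used, assuming names are drawn uniformly at random from $2^b$ possibilities (an idealization; $n$ and $b$ are symbols introduced here for the number of names in use and the number of random bits per name):

```latex
P(\text{collision}) \;\approx\; 1 - \exp\!\left(-\frac{n(n-1)}{2^{\,b+1}}\right)
\;\approx\; \frac{n^{2}}{2^{\,b+1}} \qquad \text{for } n \ll 2^{b/2}
```

A version 4 UUID carries $b = 122$ random bits, so even $n = 10^{9}$ names give roughly $10^{18} / 2^{123} \approx 9.4 \times 10^{-20}$.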

A program may contract with a particular service, but it must rely transitively on further (sub)contracts which that service has with others and over which the program has no control.

There is no satisfactory answer to this.

That depends on the complexity of the dependency web: the probability of satisfying simultaneous dependencies multiplies both horizontally in the number of dependencies and vertically in their depth.

If you're willing to wait, you can turn to transaction and queue theory to estimate wait times. However, the estimates may be too broad to be worthwhile.

George

Don Y 11 years ago

Namespaces are self-contained objects -- administered by whichever namespace server happens to be holding the object at the time. If we limit the discussion (complexity) to just "directory objects" (in which create/delete/rename options make obvious sense), then N different tasks may have a particular directory object in their shared namespace (or, some part thereof). There may be cached copies of that *single* directory object in various places throughout the system. *But*, operations that modify it (instead of just lookups which are, by far, the most common operation) are forced to happen in one copy at a time -- those changes then automagically propagated to any other instances (DSM).

So, whichever server holds the write-lock on that object knows that its copy of the cache is "gospel" (for the time being).

That server may (conceptually) be handling multiple requests on multiple directory objects -- including multiple requests on this particular object (multithreaded so actions for one client don't tie up the service while those are resolved... like waiting for an object lock, DSM update, etc). But, the server wishing to update that "directory object" (to add/remove/rename an entry) has to take the (local) lock *for* that object -- which means also having the (DSM) write lock -- in order to make that change.

So, any other threads in that server instance inherently block awaiting access to that particular directory object *if* they similarly want to modify that object's contents. When they return, the first thread will already have made its changes; any actions that any other threads want to make will have to consider those changes in their service: "Hmmm... the name that I've been asked to delete doesn't exist (anymore). I'll have to return an error to *my* client!"
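A minimal sketch of that "take the lock, then re-check" pattern in C, assuming a POSIX mutex stands in for the per-object lock (the DSM write lock is not modeled). The `struct directory` type, the `dir_*` functions and the status codes are hypothetical names made up for illustration:

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define DIR_SLOTS    16   /* illustrative capacity */
#define DIR_NAME_MAX 32   /* illustrative name limit, including the NUL */

enum dir_status { DIR_OK = 0, DIR_ENOENT = -1, DIR_EEXIST = -2, DIR_ENOSPC = -3 };

/* Hypothetical directory object: one lock guards the whole name table. */
struct directory {
    pthread_mutex_t lock;
    char names[DIR_SLOTS][DIR_NAME_MAX];
    int  used[DIR_SLOTS];
};

void dir_init(struct directory *d)
{
    pthread_mutex_init(&d->lock, NULL);
    memset(d->used, 0, sizeof d->used);
}

/* Caller must hold d->lock. Returns the slot holding `name`, or -1. */
static int dir_find(const struct directory *d, const char *name)
{
    for (int i = 0; i < DIR_SLOTS; i++)
        if (d->used[i] && strcmp(d->names[i], name) == 0)
            return i;
    return -1;
}

enum dir_status dir_create(struct directory *d, const char *name)
{
    assert(strlen(name) < DIR_NAME_MAX);
    enum dir_status rc = DIR_ENOSPC;

    pthread_mutex_lock(&d->lock);
    if (dir_find(d, name) >= 0) {
        rc = DIR_EEXIST;
    } else {
        for (int i = 0; i < DIR_SLOTS; i++) {
            if (!d->used[i]) {
                strcpy(d->names[i], name);
                d->used[i] = 1;
                rc = DIR_OK;
                break;
            }
        }
    }
    pthread_mutex_unlock(&d->lock);
    return rc;
}

enum dir_status dir_delete(struct directory *d, const char *name)
{
    enum dir_status rc = DIR_ENOENT;

    pthread_mutex_lock(&d->lock);   /* may block behind another thread's update */
    int slot = dir_find(d, name);   /* look only once the lock is held */
    if (slot >= 0) {
        d->used[slot] = 0;
        rc = DIR_OK;
    }
    pthread_mutex_unlock(&d->lock);
    return rc;                      /* DIR_ENOENT: someone else got there first */
}
```

The second of two racing `dir_delete()` calls gets `DIR_ENOENT`, which is the error the service thread would hand back to its own client.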

Names are a limited resource. Just like memory, CPU cycles, etc. Names *typically* have significance -- we don't allow "The System" to make up names for us: "What's the 'console' called, today??" That would just add another layer of indirection to the naming system!

Imagine two processes use a named pipe to exchange information. But, before they get to the point where they *need* to access that pipe (e.g., perhaps before either has been *invoked*), someone else creates an object (maybe a plain file, maybe a "device", etc. [UNIX-speak]) having that same name. Now, when (either of) the aforementioned processes starts up and tries to create that pipe, the name is already in use!

If those processes were written "early on" (when the namespace was sparse and/or relatively unshared), the developers probably didn't consider that possibility (shame on them!). So, when someone *diligently* (and correctly) uses that name at some future date, the original use ends up manifesting as a run-time bug.

No. All I want to do is be able to place "reservations" on names in much the same way that I can place reservations on memory, CPU time, etc. So, at the time you place the reservation, you *know* that it will (eventually) be satisfied -- or not. And, because "not" is a possibility, you explicitly address how you will handle that case *in* your coding.

Forcing the file-paradigm on all "objects" similarly shows lack of imagination. This is why systems that do so have to force "commands" (functions) to be "in-band": fwrite(uart, "set parity odd"...)

"Object" only is of significance to me as everything is RPC-based. So, you need some way of indicating the "thing" (object) on which you are operating. Almost all of my syntax is function based simply because the "object reference" (handle?) isn't something that you can resolve with a traditional object syntax!

If you can place "reservations" (reserves) when a module is *installed* (i.e., before it even executes), then installing a *new* module would make the conflict apparent *then* -- instead of at some later "run-time".

I think reserves alone give you the same capability. If a specific name *is* required (because it is used to interact with ... including the user!), that can be specified in the "IDL" for the module. When the module is installed into the system -- with its companion(s) -- that shared requirement can be enforced at install time (i.e., if the reservation can't be met, the "installer" can rescript all components that rely on that reserve: "OK, guys... the 'console' will be called 'disc'!").

Having been successfully installed, this (e.g., "disc") reservation now further constrains *future* installs ("Hmmm... 'disc' is in use; let's call the disk object 'bob'!"). This works because installs are serial (and seldom).
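The install-time negotiation described here could be sketched roughly as follows in C. This assumes each module lists its acceptable names in order of preference; `struct reserves`, `reserve_name()` and `install_module()` are hypothetical, and no locking is shown because installs are serial:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RESERVES     32   /* illustrative table size */
#define RESERVE_NAME_LEN 32   /* illustrative name limit, including the NUL */

/* Hypothetical system-wide table of names already promised to modules. */
struct reserves {
    char names[MAX_RESERVES][RESERVE_NAME_LEN];
    int  count;
};

static int is_reserved(const struct reserves *r, const char *name)
{
    for (int i = 0; i < r->count; i++)
        if (strcmp(r->names[i], name) == 0)
            return 1;
    return 0;
}

/* Returns 0 if the reservation was granted, -1 on conflict or a full table. */
int reserve_name(struct reserves *r, const char *name)
{
    assert(strlen(name) < RESERVE_NAME_LEN);
    if (r->count == MAX_RESERVES || is_reserved(r, name))
        return -1;
    strcpy(r->names[r->count++], name);
    return 0;
}

/* Try the module's preferred name first, then each fallback in order.
 * Returns the name actually granted, or NULL if the install must be refused. */
const char *install_module(struct reserves *r,
                           const char *const candidates[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (reserve_name(r, candidates[i]) == 0)
            return candidates[i];
    return NULL;
}
```

With the names used above: a first module takes "console", a second that wanted "console" settles for "disc", and a later disk module that wanted "disc" ends up as "bob". A module with no fallback fails at install time rather than at run time.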

See above. Note how I deliberately chose name conflicts that are bizarre. The issue is "coercing" the developer to address this problem instead of just BREAKING and claiming the problem lies with software from some other "vendor" -- leaving the user in the lurch.

I can currently handle this by creating private portions of namespaces and advertising those to "interested parties" at run time. But, that just reduces the *number* of potential conflicts (because you can bundle all the "names of interest" into that single ISOLATED namespace) but you still need a "well known name" that you can agree upon a priori for the "advertisement".

E.g., like "\Program Files\" relying on the assumption that "Hughe's Paradise" won't opt for the name "HP".

If you can resolve those "conflicts" at install time, there is no wait *in* the system: it effectively looks like you bought/installed the modules a millisecond -- or *day*! -- later than you actually did! (this is a dodge but an effective one as it removes the RT component from the action)

I am expecting such conflicts to be incredibly rare. It's too easy to create a new namespace and very little incentive to *share* namespaces, broadly (except for "system objects" where a "well known name" is inherent).

Because these conflicts are expected to be rare, I expect developers to be lazy and *not* check for them as they should in their "best practices". Just like developers expecting malloc() to always succeed! For an even more prominent example of how much developers take for granted, when was the last time you saw the return value of printf() examined?? ;)

Hope you're making progress! Last of the "cheesecake triad" done, this morning. Hopefully a break in the baking, for a while -- short as it may be! :<

--don

Dimiter_Popoff 11 years ago

Yes, like I said solvable issues - but the added expense just does not justify the benefit this gives you. The inode must link back to all directory entries pointing to it for this to work - they probably do that - which has further implications, e.g. do they store the path name as text - this must then be resolved for every access - or do they store medium specific (LBN etc.) data - this will take special processing when copying (my guess is they have opted for the first).

In DPS the inode is called RIB (.... this is what it was called on the first OS I had contact with - MDOS, Motorola's OS on their EXORciser, Retrieve Information Block - so I just reused a name I remembered well). It does not link back to the directory entry - and for files split in up to 2 pieces it can be unused, thus saving one disk access per file open - but I could easily expand that if I wanted to. For an application it would be trivial to tell which name was used to invoke it, as iocb-s contain the directory entry name. But I can't see how this buys me anything of real value.

I agree, my contacts with unix have been only as a layman user trying to do something but I never encountered any issue with the filesystem, apart from the big one - they store names as bytes and expect the user to convert them to text (i.e. name search is case dependent only).

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI

George Neuner 11 years ago

If you have a proper coherence protocol, then you can't have a conflict unless the network is partitioned. And if there is a partition, you are screwed because "eventual" consistency doesn't work unless you either can version or afford to lose updates.

You have to indirect through a global shared namespace: i.e. "console" is a local alias for "231cacd6-0074-11e5-a322-1697f925ec7b", which never changes.

Want to create a new file called "foo"? Fine! Call it whatever you want but the system calls it "231cb834-0074-11e5-a322-1697f925ec7b". It's up to you to keep the name mapping and, if need be, to transmit the mapping to someone else.
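As a rough illustration of that indirection, here is a C sketch in which each process keeps its own alias table from local names to system-assigned IDs. The `struct alias_table` type and the `alias_*` functions are hypothetical; the ID strings are simply treated as opaque and immutable:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ALIASES 8   /* illustrative per-process limit */

/* One local name bound to one global, never-changing ID.
 * Both strings are borrowed and must outlive the table. */
struct alias {
    const char *local_name;
    const char *global_id;
};

struct alias_table {
    struct alias entries[MAX_ALIASES];
    size_t count;
};

/* Returns the global ID bound to `local_name`, or NULL if unbound. */
const char *alias_resolve(const struct alias_table *t, const char *local_name)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->entries[i].local_name, local_name) == 0)
            return t->entries[i].global_id;
    return NULL;
}

/* Returns 0 on success, -1 if the local name is taken or the table is full. */
int alias_bind(struct alias_table *t, const char *local_name, const char *global_id)
{
    assert(t != NULL && local_name != NULL && global_id != NULL);
    if (t->count == MAX_ALIASES || alias_resolve(t, local_name) != NULL)
        return -1;
    t->entries[t->count].local_name = local_name;
    t->entries[t->count].global_id = global_id;
    t->count++;
    return 0;
}

/* Renaming touches only the local alias; the global ID is left alone. */
int alias_rename(struct alias_table *t, const char *old_name, const char *new_name)
{
    if (alias_resolve(t, new_name) != NULL)
        return -1;
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].local_name, old_name) == 0) {
            t->entries[i].local_name = new_name;
            return 0;
        }
    }
    return -1;
}
```

Two processes can call the same object "console" and "tty"; both resolve to the same ID, and a rename in one table never disturbs the other.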

Same answer as above.

For mail drop semantics, both sides have to agree on a suitably unique name. For rendezvous semantics, it suffices to agree on a unique service key to create the object on demand when both processes are ready.

A unique name generally must be longer (more bits) than a unique key because it usually needs long duration persistence. A key OTOH needs to be unique only when it's actually used - which may be both predictable and short duration.

That may be inconveniently late: the module may have to be recompiled if the conflict isn't resolvable by configuration. What about cross development without access to the target system? What if a user purchases a binary module that can't be installed due to a conflict? What if there are several such modules because short sighted developers picked the same names?

The time to reserve the name is when the module is compiled.

Reservation works in situ if the mappings are persistent, but it doesn't solve the cross development problem.

Maybe it works. Maybe it doesn't.

You can force them to be incredibly rare regardless of what the programmer does (or does not).

When was the last time you knew a priori what value to expect?

I used to do a lot of multiple sprintf into a fixed length buffer, so I did check and use it to position the "cursor" for the next write. But I haven't done much in C recently.
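For anyone who hasn't seen the idiom, a sketch of the "use the return value as the cursor" technique. It swaps in `vsnprintf()` for plain `sprintf()` so that an overlong write is detected instead of overrunning the buffer; `buf_appendf()` is a made-up helper name:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Append formatted text at offset *cursor inside buf[size].
 * Returns 0 and advances *cursor on success.
 * Returns -1 on an encoding error or if the text would not fit; in that
 * case the buffer is cut back to what it held before the call. */
int buf_appendf(char *buf, size_t size, size_t *cursor, const char *fmt, ...)
{
    assert(buf != NULL && cursor != NULL && fmt != NULL);
    if (*cursor >= size)
        return -1;

    size_t room = size - *cursor;
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *cursor, room, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= room) {
        buf[*cursor] = '\0';   /* discard the failed or truncated write */
        return -1;
    }
    *cursor += (size_t)n;      /* next write starts where this one ended */
    return 0;
}
```

The negative return covers the *FAIL* case, and the `n >= room` test covers truncation, which `vsnprintf()` reports by returning the length it would have written.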

Somewhat. Working on several things at once.

Anything simple loses the flavor.

I'm currently thinking the best approach is to compute Wilson mean for each variable - substituting range median for nulls - and then treat the ordered values as vector components of a polyline.

From there, a number of interesting comparisons are possible, but I think the most meaningful is the ratio of the volume of 2 N-balls centered on the point where all components are minimized: ball with "radius" distance to endpoint of the polyline vs ball with "radius" the distance to the point where all components are maximized.

Dunno. Noodles are still wet.

George

George Neuner 11 years ago

No. Links are one way: from directory entries to the inode. The inode is reference counted and persists as long as there is at least one directory entry which references it.

The inode maintains the structure and security information for the file. Directory entries are just a name and an inode reference.
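A toy model of that one-way, reference-counted relationship, in C. This is only a sketch: it tracks directory links and nothing else (no open-file references, no on-disk state), and `struct inode`, `struct dir_entry` and the `entry_*` functions are illustrative names, not any real filesystem's API:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The inode owns the metadata and counts how many names point at it.
 * It has no idea which directory entries those are. */
struct inode {
    unsigned nlink;
    unsigned mode;     /* stand-in for the security/structure information */
};

/* A directory entry is just a name plus a reference to an inode. */
struct dir_entry {
    char name[32];
    struct inode *ino;
};

struct inode *inode_new(unsigned mode)
{
    struct inode *ino = malloc(sizeof *ino);
    if (ino != NULL) {
        ino->nlink = 0;
        ino->mode = mode;
    }
    return ino;
}

/* Bind `name` to `ino`; whoever adds the reference bumps the count. */
void entry_link(struct dir_entry *e, const char *name, struct inode *ino)
{
    assert(strlen(name) < sizeof e->name);
    strcpy(e->name, name);
    e->ino = ino;
    ino->nlink++;
}

/* Remove the name. Returns 1 if this was the last link and the inode
 * was released, 0 if other names still reference it. */
int entry_unlink(struct dir_entry *e)
{
    struct inode *ino = e->ino;
    e->ino = NULL;
    e->name[0] = '\0';
    if (--ino->nlink > 0)
        return 0;
    free(ino);
    return 1;
}
```

Note that nothing in `struct inode` points back at a `struct dir_entry`; deleting the inode needs no directory search, only a count that reaches zero.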

Inodes for open files are cached in memory and updates are lazily written back to disk [unless you deliberately (f)sync]. There is a lock on the in-memory cached inode, but it is used only for updates to the inode itself to coordinate sync flushes. File content locks are handled separately through a filesystem service - content locks are neither in nor on the inode.

In Unix the inode directly links to some small number of data blocks before indirecting through index blocks. The size of a file that can be addressed directly depends on the logical block size of the media, but it typically is 40-64KB for a desktop filesystem and may be megabytes in a server filesystem.

George

Don Y 11 years ago

Yes. The shared memory protocol is reliable (though expensive if you're doing lots of writes; the premise is that *shared* namespaces will be rare(r) -- especially those that are cached in more than one place). I.e., caching expedites lookups at the expense of slowing down *updates*.

The *actual* handle can vary from node to node, task to task and run to run. The "system wide" objects are created at system start-up as the initial "system" namespace is built. Only "init" sees this namespace; things that it spawns are given portions of the namespace as appropriate to their respective needs. And, of course, individual nodes can create their own "dynamic" objects, naming them as they wish and *exporting* references to them by various means (injecting specific names into other tasks' namespaces, passing "live handles" to other tasks, etc.).

The goal is to restrict access to objects to only those entities that (should) need that access.

If the object doesn't exist as part of the "system namespace", then there is no need for a handle, there. E.g., unlike the single filesystem hierarchy in most machines, it is possible for a completely *isolated* namespace to exist in my system -- where only those entities having access to that namespace are even aware that the objects named therein exist.

But, in each case, there must be some guarantee that the agreed upon "unique name" or "unique service key" are available for use by those interested parties! E.g., we can agree to use the name (key) "fred". But, if someone else INDEPENDENTLY decides to use "fred" (for some purpose), things break.

you could probably get away with this -- because no one *else* would

But, "in general", you can neither predict when nor how long a name or key will be used. It requires detailed knowledge of the application and potential competitors.

That's exactly the problem! If a developer unilaterally picks a name and *expects* it to be available for his/her use (without taking measures to adapt when/if it proves not to be), then it is "broken".

But, this is apparent when the application (name consumer) tries to install the module -- not at some "random" later time when the perfect storm of conflicts *happens* to occur (because the module(s) with which the name conflicts *happen* to be "live" when the later introduced module is also live).

Would you rather the module be installable -- leading to the appearance that it *will* work (in all future instantiations of the EXISTING modules) -- only to belatedly discover that something (which may not be the module in question) STOPS working?

[i.e., if it was OK for module #25 to unilaterally pick a name that conflicts with one used by module #3, then it's just as acceptable for #3 to expect its use of the name to be OK. So, if #25 runs before #3 -- even though #3 was purchased and installed LONG before #25 -- then #3 looks to be "broken" when, in fact, the introduction of #25 is the real source of the problem]

That requires everyone to cooperate on name choices. Or, everything to have independent namespaces (\Program Files\HP) to make this possible. Entities then can't effectively share things.

By putting the reservations in the IDL, I am hoping to force the conflict issue to be more visible. "Hello, Support Desk? I just purchased your ABC module and it won't install: 'name conflict'..."

Like someone deciding to name their program "/kernel" and *expecting* that name in their code!

If it doesn't, it is because the developer hasn't played by the rules. E.g., you can unilaterally pick "/kernel" as your program name on a NetBSD system. (You can *possibly* also pick "/netbsd" -- the actual kernel name -- as I don't think the kernel name is hardcoded into any of the system utilities; even if so, there would usually be a command line argument to allow those utilities to fetch names from a different kernel image)

It's akin to a program *expecting* to be able to malloc() N bytes of memory, regardless of what else may be running in the system at any given time and, therefore, not *checking* to see if the allocation was successful. Then, crashing.

If, instead, that application was *guaranteed* M bytes of free memory regardless of other modules installed in the system, then as long as N <= M, the allocation can be relied upon to succeed.
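The memory analogy can be made concrete with a small C sketch: claim the whole budget once, up front, and serve later requests out of it. The only point where "no memory" can bite is the initial reservation. The `mem_reserve_*` names and the 16-byte alignment are assumptions for illustration, and alignment padding counts against the budget:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RESERVE_ALIGN 16u   /* assumed sufficient for common types */

/* Hypothetical memory reserve: a block obtained once and carved up later. */
struct mem_reserve {
    unsigned char *base;
    size_t size;
    size_t used;
};

/* Claim `bytes` up front. This is the only call that depends on what
 * else is running; returns 0 on success, -1 if the reserve was refused. */
int mem_reserve_init(struct mem_reserve *r, size_t bytes)
{
    assert(r != NULL && bytes > 0);
    r->base = malloc(bytes);
    r->size = (r->base != NULL) ? bytes : 0;
    r->used = 0;
    return (r->base != NULL) ? 0 : -1;
}

/* Carve `n` bytes out of the reserve. Returns NULL only when the caller
 * has exceeded the budget it asked for, never because of other modules. */
void *mem_reserve_alloc(struct mem_reserve *r, size_t n)
{
    size_t start = (r->used + RESERVE_ALIGN - 1) / RESERVE_ALIGN * RESERVE_ALIGN;
    if (start > r->size || n > r->size - start)
        return NULL;
    r->used = start + n;
    return r->base + start;
}

/* Hand the whole reserve back at once. */
void mem_reserve_release(struct mem_reserve *r)
{
    free(r->base);
    r->base = NULL;
    r->size = r->used = 0;
}
```

A name reservation would play the same role as `mem_reserve_init()`: the one place where refusal is possible and therefore has to be handled.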

I always expect the return value to be non-negative! Yet, how often do you see: if (0 > printf(...)) { /* Oh, crap! */ }

I use it a lot in character based displays -- similar reason.

But, you still rarely (ever?) see testing for the *FAIL* case (above).

Join the club!

I think your biggest exposure will be if the "numbers" don't get/stay large. I.e., the "fad factor" (but, I can't see how you can work around this -- you need some meat in order to make sense of anything). At some point, even "incentives" lose their appeal.

I'd have to think about whether volume or radius would be the better comparative metric. (too early in the day to do stuff like that!)

Throw against wall. If they *stick*, consider them done! :> (but don't eat those samples, regardless!)

I think I'm supposed to be going to a party tonight. I should probably find some pants... (or, start feigning an exotic illness!)

Don Y 11 years ago

No. In UN*X (and in my scheme), names aren't intimately tied to "files" (if you use the term 'file' to indicate the actual contents). A "directory object" maps "names" to "files". It's a pointer-style reference: you can have multiple pointers referencing the same "thing" (just like you can store the address of a particular thing in several different places!).

The (traditional) "lock" is applied to the "thing" (file contents), not to the pointer through which it was accessed. Otherwise, the lock wouldn't work: you could get around it and access the object through a different pointer!

A *file* doesn't know how many references "point" to it. Nor which reference was used to access it in this instance!

[My scheme doesn't suffer from this limitation. An object (by that, I mean, "the server that implements an object") knows how many references/handles exist for it at any given time because these are more than just simple "pointers". Also, an object knows how (and who) it is being accessed because it can see which handle is being used in that access.]

In my case, names apply to *all* objects: timers, devices, tasks, nodes, processes, etc. (though there are anonymous objects). This allows objects to be referenced by other objects, manipulated, destroyed, etc.

E.g., I can build a "memory object" and, optionally, give it a name. This would allow me to share/reference that object (read it, write it, grow it, delete it, move it, etc.) within the system.

Further, I can fill it with sets of (name, handle) pairs. I can then call this a "directory object" and, optionally, name it. It can now be referenced by any task on any node in the system -- *if* a handle to it is made available to that other task, etc.

Handles contain access permissions. So, I can create *one* handle for it that gives the "holder" permission to invoke the grow() and write() methods, but denies access (for the holder of that handle) to the read/delete/move/etc. methods. And, create *another* handle that grants permissions for the read() and write() methods, but no others.

When the "directory server" is passed one of these handles in a method invocation, it implicitly *knows* whether that method is allowed on that handle. So, trying to invoke grow() using that second example handle results in a permission error.

With this sort of mechanism, I can provide handles to the *directory* that allow lookup()'s, but no create()'s. Or, create()'s but no delete()'s.
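One common way to model per-handle permissions is a bitmask carried in the handle and checked by the server on every invocation. The following C sketch assumes that representation; the `M_*` method bits, `struct handle` and the `handle_*` functions are invented for the example and say nothing about how the real handles are encoded:

```c
#include <assert.h>
#include <stdint.h>

/* One bit per method the server implements (illustrative set). */
#define M_READ   (1u << 0)
#define M_WRITE  (1u << 1)
#define M_GROW   (1u << 2)
#define M_DELETE (1u << 3)
#define M_MOVE   (1u << 4)

enum { INVOKE_OK = 0, INVOKE_EPERM = -1 };

/* A handle names an object and carries the methods its holder may invoke. */
struct handle {
    int      object_id;
    uint32_t allowed;
};

struct handle handle_make(int object_id, uint32_t allowed)
{
    struct handle h = { object_id, allowed };
    return h;
}

/* Derive a weaker handle before passing it on: bits can be dropped,
 * never added. */
struct handle handle_restrict(struct handle h, uint32_t keep)
{
    h.allowed &= keep;
    return h;
}

/* Server-side check performed before dispatching a single method. */
int handle_invoke(const struct handle *h, uint32_t method)
{
    assert(method != 0 && (method & (method - 1)) == 0);  /* exactly one bit */
    return (h->allowed & method) ? INVOKE_OK : INVOKE_EPERM;
}
```

A handle built with `M_GROW | M_WRITE` and another built with `M_READ | M_WRITE` refer to the same object, yet `handle_invoke()` with `M_GROW` succeeds on the first and returns `INVOKE_EPERM` on the second.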

Because the directory object exists in one (conceptual) place, a suitably empowered client can install a "name" that the directory server (for that object) will treat as "reserved" -- and only allow a client having the "able to create a binding with this particular name" permission to invoke the create() method with that name as an argument! Thus, implementing a "name reservation".

[The same sort of mechanism prevents certain clients from deleting that particular name, etc.]
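In code, the server-side gate on create() might look something like this C sketch. It assumes the per-name permission is represented by a token that the client's handle presents; `struct name_reservation`, `check_create()` and the token values are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CREATE_OK = 0, CREATE_EPERM = -1 };

/* A reserved name and the token a client must present to bind it. */
struct name_reservation {
    const char *name;
    unsigned    token;
};

/* Decide whether a create(name) request may proceed.
 * Reserved names demand the matching token; all other names are open to
 * any client that is allowed to call create() at all. */
int check_create(const struct name_reservation *res, size_t nres,
                 const char *name, unsigned client_token)
{
    assert(name != NULL);
    for (size_t i = 0; i < nres; i++)
        if (strcmp(res[i].name, name) == 0)
            return (res[i].token == client_token) ? CREATE_OK : CREATE_EPERM;
    return CREATE_OK;
}
```

The same table, consulted from delete(), would give the protection against removing a reserved name.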

Don Y 11 years ago

Subtle point: "reference counted". I.e., the inode has no idea *where* the references happen to be! Anything referring to that inode (adding or removing a reference) is responsible for correctly updating the reference count and ensuring its integrity.

Another subtle point: the metadata is tied to the *file*, not the *name*. I.e., names are functionally equivalent. [In my world, names map to handles and handles to objects. So, I can create different capabilities (bad choice of words) for different names (and different handles -- as I can subset capabilities for any particular handle before passing it on, anonymously)]

Dimiter_Popoff 11 years ago

Thanks for the explanation. This way it would be impractical to have directory entries from different directories pointing to one inode, or are they doing it? Would take a lot of directory digging before an inode is deleted so my guess is obviously "no", but it is only a guess.

In DPS there is no separate lock for the RIB - since I do not allow multiple directory references to the same data. Even if I would opt to implement it, I would make a different directory entry type (the entry type is 5 or 6 bits, I think; it is a byte and I don't remember how many bits of it I used for flags). This directory entry can easily point to the "original" entry which points to the "data" (RIB or no RIB), or could use some level of indirection etc. Not that I plan to do it, I see no use for that sort of thing myself (I imagine they implemented it in the 70s to save space for the multiple commands, which would have a negligible effect today if one just uses a file per command - or, if space allocation conscious, would put all the commands in a file with a "disk" image having a small cluster size (say 32 bytes or even 1 byte)).

Does that mean the inode is part of the directory file? That would be very similar to the way I do it - only I have put "segment descriptors" (two of them) in the directory entry, if there are more than 2 segments the first one of the two points to the RIB (which is not inside the directory file).

Dimiter
