"write()" semantics

Hi,

[I have nothing at all like a POSIX API! :> ]

My "write()" semantics essentially say, "Once invoked, you forfeit the contents of 'buffer[]' -- even in asynchronous mode."

This makes it a lot easier for my OS to move data around as *it* sees fit. And, imposes minimal constraints on the caller.

In "write()"'s case, it just means you *use* the buffer[] contents, if necessary, *before* "write()"-ing them!

But, this means I can't take advantage of true parallelism and/or I/O device scheduling delays. E.g., I can't start an async write, get things enqueued for processing, and *then* do a few last minute things with buffer[].
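For illustration, the pattern this rules out (ASYNC and checksum() are hypothetical names, just to make it concrete):

    write(buffer, device, ASYNC);   /* transfer enqueued; returns at once   */
    crc = checksum(buffer, LEN);    /* last minute use -- UNSAFE under my   */
                                    /* contract: contents already forfeited */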

There is some cost to adding this support -- though I think I already have most of the mechanisms necessary to do so in place.

So, the question is, how do *you* use write() and the data it is called upon to process? E.g., if the de rigueur form is:

    while (ALWAYS) {
        ...
        fill(buffer);
        ...
        write(buffer, device, mode);
    }

then you *don't* need this proposed functionality.

Thx!

--don

Reply to
Don Y

Well, if I read your semantics correctly and unwind that loop, I get

    fill(buffer);                  // with stuff to write
    ...
    write(buffer, device, mode);   // write it
    ...
    fill(buffer);                  // Aw crap!!

--
Tim Wescott
Wescott Design Services
Reply to
Tim Wescott

I don't understand the "Aw crap!!". :<

You fill the buffer with "whatever". You write it out (to "wherever"). You then REfill the buffer with whatever *else*, etc.

I.e., once you have called write() on a particular buffer's contents, you never *look* at those contents again -- they are OVERWRITTEN by the next "fill()".

If, OTOH, you did something like:

    while (ALWAYS) {
        ...
        fill(buffer);
        ...
        write(buffer, device, mode);
        ...
        examine(buffer);   // !!!!
    }

then you are *expecting* buffer's contents to be available to you after the write()

[which I currently don't guarantee -- regardless of whether it's a synchronous or asynchronous service request]
Reply to
Don Y

The problem is preventing the application from stomping on data that has yet to be written.

There are four main strategies if you don't want it to be broken in interesting ways . . .

* EITHER write() COPIES the whole buffer before returning, so the data in the original buffer is no longer needed;

* OR write() must block till final completion or failure, to protect the buffer;

* OR the buffer must be owned by the filesystem manager, so that your application requests a buffer from the filesystem manager, fills it, and it is then destroyed by write(); a new buffer must be requested (which may actually be the same one -- but you don't know that) before filling it again;

* OR it is owned by the application, and you must have an is_written() function that the application polls to check if the buffer is available for reuse or to be freed, or a callback in the application that the filesystem handler notifies (sketched below).
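A minimal sketch of that last strategy -- all of the names here (write_async(), is_written(), yield()) are hypothetical, for illustration only:

    char buffer[4096];

    fill(buffer);
    write_async(buffer, device);   /* non-blocking; OS still owns the data */
    do_other_work();               /* buffer is hands-off in here          */
    while (!is_written(buffer))    /* poll until the OS is done with it    */
        yield();
    fill(buffer);                  /* now safe to reuse                    */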

I am sure that there are other more complex semantics that could provide data integrity if the write succeeds (nothing is guaranteed if it fails), but I have just grabbed the low-hanging fruit in an attempt to clarify the situation! ;-)

No doubt you will quickly reject two or three of the alternatives as undesirable for your system, but it is the various performance tradeoffs that make which one to choose interesting . . . .

--
Ian Malcolm.   London, ENGLAND.  (NEWSGROUP REPLY PREFERRED)
ianm[at]the[dash]malcolms[dot]freeserve[dot]co[dot]uk
Reply to
IanM

I think Don is assuming (in the example) normal synchronous write ... so further manipulation of the buffer is non-interfering.

George

Reply to
George Neuner

Not a problem. As I said (above): "Once [write() is] invoked, you forfeit the contents of 'buffer[]' -- even in asynchronous mode."

I.e., in synchronous mode, it's a non-problem... the write() is "completed" (whatever *that* means) before it returns.

In the *asynchronous* case, the write (whatever that means) is *scheduled* before write() returns. And (my) system's contract allows you to stomp on the data -- because it isn't *there* anymore (figuratively speaking).

The question I am posing is: does the fact that the data is no longer *there* pose a problem to *particular* coding practices THAT CAN'T EASILY AND CONVENIENTLY BE WORKED AROUND? I.e., how necessary is it for me to provide a means by which the application can be assured that the data *remains* there AFTER the "write()" completes?

Consider the coding style I outlined previously:

    while (ALWAYS) {
        ...
        fill(buffer);
        ...
        write(buffer, device, mode);
    }

Implicit in this is the coder's acknowledgment that the data ("buffer") is worthless after the write(). It is AS IF the write() is effectively empty()-ing the buffer (hence my choice of name for *fill()*-ing the buffer)

E.g., imagine "write()" intentionally corrupts the contents of buffer[] prior to return-ing. Would you (your code) care??
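(For concreteness, a sketch of what such a hostile-on-purpose write() might look like in a debug build -- the poisoning step is my illustration, not something any real system does, and enqueue_for_io()/BUFFER_LEN are hypothetical:

    int write(void *buffer, int device, int mode) {
        int result = enqueue_for_io(buffer, device, mode); /* snapshot/queue the data first  */
        memset(buffer, 0xA5, BUFFER_LEN);                  /* then scribble on caller's copy */
        return result;                                     /* buggy callers now fail fast    */
    }
)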

Zero-copy semantics.

User can elect an asynchronous call, results (guarantees) are the same.

This is basically what I do except the buffer isn't really "owned" by anyone. Rather, when you pass it to a service, that service *takes* ownership of it. The OS ensures that you can still *reference* "memory" where buffer[] was (previously) located. It just doesn't guarantee that the *contents* of that memory are "intact".

That's just a natural consequence of any asynchronous service.

Yes. But all these focus on mechanisms for *implementing* such a write(). I am more concerned with the (programming) *consequences* of this sort of behavior.

E.g., I *can* arrange for the contents to "remain available" to the application after the write() returns. But, that means an implicit *copy* operation (actually, there are tricks to minimize that, as well) -- which gives me performance equal to traditional OS's... not *better*.

I've been looking through my code and casually skimming bits of OSS and it *seems* like the norm is for the programmer to write() data when he is, effectively, *done* with it. Hence the reason for my original optimization...

Reply to
Don Y

If by "asynchronous mode" you mean that a call to write(buffer), launches write, which may return immediately but uses the contents of buffer until the write is done, then if you go and immediately modify buffer again then you're hosed.

If some write process is using the contents of buffer after write returns, how do you protect those buffer contents? How, in other words, do you guarantee that the write process is done with buffer when you do the second fill?

--
Tim Wescott
Wescott Design Services
Reply to
Tim Wescott

Correct.

Ah, but that's the point of the contract! You have "given" them to write(). It's write()'s responsibility to ensure that they get written AS INTENDED regardless of whether you requested a blocking or NONblocking service.

Where my contract differs from a traditional approach is: write() will guarantee the contents of buffer[] that you passed *to* it WILL be those that are written (otherwise, how could the application ever know/specify its desires?) *but* the contents of buffer[] *after* write()'s invocation are no longer guaranteed to reflect what they were originally! *If* your coding style essentially write()'s data out when you are *done* with it, then you shouldn't care that buffer[] is (possibly!) different than it was prior to the write().

OTOH, if your coding style references the contents of buffer[] *after* the write() AND EXPECTS THEM TO REMAIN AS IS (i.e., as if write took a "const" argument), then you would need either to take deliberate steps to preserve those contents (this sucks because it forces extra effort onto you *and* adds a run-time cost WHICH MIGHT NOT BE NECESSARY... remember, you can't be sure that write() *will* alter buffer[]!) or to request this capability (at some POTENTIAL run-time cost) from the OS.

By taking ownership of the buffer's contents *from* the application. All I have to do is guarantee that there is *something* there for the application to REfill.
Reply to
Don Y

The easy case to explain is the blocking write(). Think about that one first.

*If* you have a synchronous service, *you* (the application) know that the OS safely was able to "consume" the data in a way that is consistent with your desire (write). You need not worry about overwriting something that the OS might still be using. That's one issue off the table.

Now, further imagine the OS is "a big (smart!) boy" and is aware that you *might* want to twiddle with buffer[] *while* the write() is "pending" (asynchronous). I.e., it does whatever is necessary to ensure that the data that it *needs* (to fulfill its contractual obligation at the time of the write() invocation) is safely captured before write() returns.

So, all of the same statements, above, apply. You know that your data got written (or *will* [1] get written). You know that you can freely scribble on buffer[].

But, imagine the last thing write() does before return-ing is to OVERWRITE the contents of buffer[] with gobbledegook. No, not "zero()". Nothing "predictable". In fact, it may elect to overwrite buffer[] with exactly the original contents of buffer[]! (!)

So, you, the application, KNOW that your data got (or will get) written. You *know* that you can now freely scribble on buffer[].

*But*, I am telling you that you can't rely on:

    holdme = buffer[0];
    write(buffer);
    ASSERT(buffer[0] == holdme);

[that's the essence of the question I pose]

Note that the above code fragment would be *valid* in a more traditional API...

[1] If the write *eventually* succeeds, that is... [[(sigh) gotta go water my new plantings...]]
Reply to
Don Y

That is the most common interpretation of write semantics. It's just that very few systems bother to enforce it. The cost of copying the data to a kernel buffer or remapping the user buffer so the process can't touch it is, in general, too high.

[This is a near perfect example of where VMM segmentation /could/ be much more flexible than paging. You can't remap a buffer page if it contains other user variables - such as might be on the stack or in a heap structure. But there are a number of ways that a segmentation unit could be designed to exclude the user from the buffer without affecting anything around it. Such mechanisms were relatively common on mainframes, but micro VMM units mostly eschewed segments and Intel's half-baked implementation soured nearly everyone on ever trying segments again.]

That sounds *extremely* unsafe - even if this is a custom OS where you *believe* you know what's going on at all times.

In general, you can't know the scheduling ... once you issue the asynch call, you have no way to know when the I/O will begin, and for a large buffer (or under VMM a physically disjoint one) there may be multiple distinct I/O operations needed. You can't know when or if it's safe to touch the buffer while the call is in progress.

George

Reply to
George Neuner

I don't see anything formally indicating the case, one way or the other.

Had I seen buffer declared as const, then I would "know" write doesn't dick with it. OTOH, that omission doesn't imply that write *does*...

Unless you arrange for nothing *but* the buffer to reside within the page! :> This is particularly attractive if you are doing large write()'s -- i.e. *page* sized write()'s!

[you can cheat on "shared pages" if the buffer size is sufficiently large (or sufficiently *small*) so that copying the non-buffer portion (or *buffer* portion) cost "makes sense" for the service]

Without constraining the sizes of buffers (or their locations), you'd need some pretty fine-grained controls, there.

And, you would need to be able to have multiple such segments. AS IF you had a paging unit in which each page's size was configurable...

Think about it...

Assume write() "does what's necessary" to snapshot the contents of buffer at the time it was invoked (because your contract on write() requires it to do so). Whether this is a synchronous or asynchronous call, this snapshot data will be written (if the write succeeds). (i.e., this is AS IF buffer was passed as const).

Now, *if* write() preserves the contents of buffer on its return, then I can go peeking around inside buffer -- maybe to compute a checksum or just *use* the data *while* it has been enqueued for writing.

And, if write() has "snapshot-ed" (is there such a word?) the buffer's contents, then I can freely *modify* it as well! It makes the API cleaner because the queuing is buried in the system call.

So, the application doesn't have to build a mechanism to allow buffers to be enqueued (re-inventing the proverbial wheel!). Nor does it have to ALWAYS use that mechanism regardless of whether or not there is a *need* for it -- in any particular write() instance!
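(The wheel, sketched -- assuming a traditional non-snapshotting async write, plus hypothetical is_written()/yield() helpers; note everything the app is now stuck managing:

    #define NBUF 8
    char bufs[NBUF][4096];      /* ring of buffers the app must maintain */
    int head = 0;

    for (;;) {
        char *b = bufs[head];
        head = (head + 1) % NBUF;
        while (!is_written(b))  /* trivially true for a fresh buffer */
            yield();            /* can't refill until the OS lets go */
        fill(b);
        write(b, device, ASYNC);
    }
)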

E.g., imagine you are write()-ing to /dev/null. The application code doesn't necessarily know this (because it might be part of some general purpose subroutine/library, etc.). Yet, it has to build and maintain this "list of enqueue-able (sp?) buffers" THAT IT WON'T NEED (because the write(/dev/null) will happen in zero time).

I.e., the entity best suited for determining how a particular buffer[] will be handled is the write() service, itself!

In particular, if *it* knows that it needs to "keep a copy" of the buffer[] for its own needs, then *it* can take whatever action is required to make that happen -- without the application having to worry about it. The application can rest assured that the contents of the buffer[] *will* be available after the write()

[this assumes that I modify the current behavior to support these "preserve buffer" semantics!].

In other words, you can make either (preserve buffer or *don't* preserve buffer) behavior "safe" in these situations.

I'm trying to decide how people *use* the code to decide what mechanisms make sense to include/support (since there is no free lunch -- except on Thursdays! :> )

Reply to
Don Y

That's not really a question. The semantics in this scenario are not different from one in which the user is prevented from touching the buffer. The fact that the user *might*, for an indeterminate time during the write call, be able to use/modify the buffer as a scratchpad is not relevant ... the user has no way to know when that safe period is done and any modifications will be lost, so the overall semantics are the same as if the user was prevented.

George

Reply to
George Neuner

Argh! We're still thinking on different wavelengths :-/

Think of the above code snippet in a *synchronous* write(). I.e., the user would *know* that it is "safe to touch the buffer" as soon as the write() returned (indeed, the user *can't* touch the buffer once he has invoked write() because, in this case, the write would BLOCK!).

In my *current* implementation, the ASSERT is not guaranteed! I.e., write() could do anything it wanted to do to the contents of the buffer. For the caller to expect the buffer's contents to be unchanged (as the ASSERT implies) would be folly.

I maintain that this is different from what most write() implementations are like. E.g., you could fill a buffer with some data, call write() (blocking or otherwise) and *expect* the contents of the buffer to be unchanged after the call. I.e., it's as if the contents were *copied* to the "device" and the original remained intact, in place.

My (current) implementation doesn't provide the caller with this guarantee -- regardless of blocking/nonblocking call. This lets me optimize the write().

I contend that it doesn't *cost* the (typical) program anything because, IME, you write() something and then discard it in preparation for writing the *next* thing.

I am proposing adding that "implicit" behavior back into my write(). I.e., so that the buffer contents remain intact *after* the write().
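Concretely, it would amount to an opt-in flag (the PRESERVE name is hypothetical):

    write(buffer, device, ASYNC);              /* current contract: contents forfeited */
    write(buffer, device, ASYNC | PRESERVE);   /* proposed: contents intact on return  */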

[It's not a question of a "race" between the "I/O system" and the application]
Reply to
Don Y

And you won't. I said it was the "common" interpretation, not the "canon" one.

The non-const pointer is for backward compatibility. write() did/does not modify the user data buffer on any system I am aware of (been at this for 25 years). But some systems I have used (Prime, Burroughs, etc.) absolutely did prevent modification of the buffer until the write was completed.

Yes, of course, but page alignment requires the buffer itself be a separate heap allocation - or the first field of a heap allocated structure. That prevents it from being on the stack and constrains the layout of structures that include buffers ... all of which increases the burden on the programmer. [I've said before that I think the majority of software developers should find other employment. I deal with people of all skill levels and, in general, I am against any methodology that makes a programmer's job more complicated.]

To a point. Snapshot copying only makes sense for a relatively small buffer. For a large buffer, VMM manipulation (where available) makes more sense.

Not really. All that's needed is a set cache with range lookup ... which is just one more comparison per lookup than a standard address translating cache. Not significantly different from the segment tables we used to have to deal with.

Moreover, you can limit an address to being in just 2 segments: 1 user, 1 kernel, and limit each cache set to 2 entries, with the kernel entry taking precedence.

And there's no reason the average programmer should even be aware of segmentation. It can all be handled by a system API that a runtime library can manipulate, like mmap, VirtualProtect, etc.
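For example, a runtime could fence off an in-flight buffer with plain POSIX mprotect() -- a sketch, assuming the buffer is page-aligned, its length a multiple of the page size, and the two hooks supplied by some hypothetical async-I/O layer:

    #include <sys/mman.h>

    /* On submission of the async write: revoke user access to the pages. */
    void buffer_seal(void *buf, size_t len) {
        mprotect(buf, len, PROT_NONE);
    }

    /* On I/O completion: hand the pages back to the user. */
    void buffer_release(void *buf, size_t len) {
        mprotect(buf, len, PROT_READ | PROT_WRITE);
    }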

The problem when discussing segmentation is that everyone immediately thinks of Intel's half-baked implementation and the stupid compilers that exposed it all to the programmer. Things were much different in the mainframe world where many machines had segmented architectures and virtually all of it was hidden from the application programmer.

Those already exist. But typically you can only choose 2 sizes at configuration: a small page and a large page, and there is some inefficiency on a page miss because the translation tables typically are split and both may have to be checked (e.g., an address may be shared between processes using different page sizes).

It all depends on what you consider as "a few last minute things with buffer[]". If you are simply reading back the data, e.g., to do CRC, digital signing, etc., then there is no problem even if write doesn't snapshot. [For this purpose I'm ignoring your other post that suggested write might purposely mess up the data in the buffer before returning. It doesn't happen on any system I have ever seen.]

I have no problem whatsoever with write performing a snapshot or even restricting access to the buffer during the operation.

But consider this ... if you can futz with a write buffer during a write call, why can't you do the same with a read buffer during a read call? A novice isn't going to draw the same distinction between these as will an expert. It makes sense for the default semantics of both to be "hands off". There can be a separate expert API that allows more access where it makes sense.

I sincerely doubt that many people work with write the way you have thought about it. As I said, the common interpretation is "hands off" the buffer until the call completes and I believe most developers respect that even if it isn't enforced.

Now, if you can provide a low-overhead call having semantics that allow for more flexible programming, I have no doubt that developers will use it. But I wouldn't put a whole lot of effort into it because the currently understood semantics are good enough in most cases.

YMMV, George

Reply to
George Neuner

What happens if you call your write() function with a pointer to read-only memory ?

Reply to
Arlet Ottens

The contents of the read-only memory referenced get "written" to the target of the write()

The point isn't that write *alters* the contents but, rather, that the caller can't expect the contents to remain "there". (i.e., the page "disappears")

Reply to
Don Y

What if the caller still cared about other stuff that was in the same page as the data buffer ?

Reply to
Arlet Ottens

Now you understand the nature of the original question! :>

To clarify, though, the *only* thing in the page(s) was THE BUFFER. (though you might still "care about other stuff that was in the" buffer! -- or, *would* you? I.e., if the last thing one tends to do with "data" is to *write* it, then you *wouldn't* care!)

Reply to
Don Y

Maybe I'd want to write something like this:

write( "200 OK\r\n", 8 );

now imagine this 8 byte constant string is stored in flash, in the middle of my executable code. I wouldn't want to have to copy this string to an empty page of RAM first.

Or maybe I have a 4KB FIFO in a page, and I would like to write the first 512 bytes, while still adding more data at the end.

Reply to
Arlet Ottens

Agreed. *But*, how often do folks *rely* on this (i.e., that the data they "wrote" is still there after the write() completes)?

How? Were there constraints placed on the buffer's size, alignment, location, number of such "active" buffers, etc.?

Actually, it could be located anywhere *in* a page. It's just that you can't have anything *else* in that page so putting it anywhere but at the start of the page buys you nothing...

Understood. This is an internal "kernel" interface intended to make writing services more efficient. If a "user" wants a more traditional interface (with its reduced efficiency!), he'd have to resort to a higher layer service. So, he trades flexibility for performance (which isn't possible if you provide only a "classic" API)

Exactly. This mechanism is for moving *lots* of data quickly and continuously. And, doing so while remaining within the protection domains established for competing "tasks" (processes?)

So, are you expecting each buffer to be "colocated" with similar buffers? I.e., the "system" provides the buffer instead of the "user" arbitrarily picking a spot in memory and calling it a buffer...

I'm still not sure what constraints you're putting on me, the "user" of this interface :-/

Exactly. E.g., you couldn't have a 512B, 4KB, 16KB, etc. all coexisting at the same time (bad example as support for the 512 would include all of the multiples thereof... the same could apply to, e.g., 512B, 544B, etc. -- use the GCD as the page size (e.g., gcd(512, 544) = 32) and you can cover all bases... at some added cost)

Exactly. Once you call write, regardless of mode (sync/async), you've decided (and DECLARED!) what the data being written should be. If you are going to be allowed to alter it thereafter, then you have to be playing with a *copy*.

*So*... if you want to be able to play with it, *someone* has to make this copy available!

Since my goal has been to fast-path the write()'s, I am deliberately avoiding that capability by expecting the user not to be interested in the contents AT ALL after the write(). I.e., as if *he* would soon be overwriting the buffer's contents with the next batch of data...

There's nothing *there* during the read() -- until the read() completes (async or sync).

As an analogy, consider (snail) mailing a letter. Once you put it in the mailbox (mail chute), it is effectively *gone*. If you want to reread what you wrote *after* you have mailed it, then you had better keep a *copy* of it BEFORE you drop it in the mailbox!

Similarly, incoming mail DOESN'T EXIST (i.e., there isn't even a blank sheet of paper sitting in the mailbox waiting for a message to be written on it!) until it has been "read()-ed".

These semantics make it possible, for example, to move data between processes across protection domains quickly. No need to copyin/copyout. So, I could set up a named pipe with a producer on the upstream end and a downstream consumer. The producer's write() *puts* the data into the pipe in an almost *physical* sense... like putting a letter in a mailbox. The consumer's read() retrieves the "physical" data. A *hole* moves from the consumer to the producer.
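In kernel-internal pseudo-C, the flow might look like this (every name here is hypothetical; it's the shape that matters):

    /* Producer side: the page itself rides the pipe. */
    void pipe_write(pipe_t *p, page_t *page) {
        unmap_from(producer, page);      /* producer's view of buffer[] vanishes  */
        map_blank_page(producer);        /* ...replaced by a fresh hole to REfill */
        enqueue(p, page);
    }

    /* Consumer side: data appears without copyin/copyout. */
    page_t *pipe_read(pipe_t *p) {
        page_t *page = dequeue(p);       /* block until a page arrives */
        map_into(consumer, page);
        return page;
    }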

That's what this is! It is intended to be used in the creation of services for export to other, "more naive" users. E.g., I can provide a traditional file system write() atop this -- but, it loses the efficiency of this implementation because it would be required to *copy* data out of these pages (because a buffer might coexist with other data; because the user might want to access the data after it is written, etc.) If that "traditional" implementation was the bottom-most layer, then its inefficiencies would be inherited by everything built above!
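A sketch of that layering -- the classic, buffer-preserving write() built atop the zero-copy primitive (sys_buffer_alloc() and sys_write() are hypothetical names):

    ssize_t classic_write(int dev, const void *buf, size_t len) {
        void *page = sys_buffer_alloc(len);   /* get a transferable page        */
        memcpy(page, buf, len);               /* the one unavoidable copy       */
        return sys_write(dev, page, len);     /* page ownership moves to the OS */
    }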

Again, it's not whether the user *can* touch the buffer (I safeguard against that) but, rather, whether he expects the data he "wrote" to remain in place.

Reply to
Don Y
