Making Fatal Hidden Assumptions

unspecified result, implementation-defined, compares equal, unspecified result, unspecified result.

There, that was easy.

It constrains code, in a way. Existing code is more important than existing implementations, right?

You can't "catch it and do nothing"? What are you expected to _do_ about an invalid or protected address being loaded [not dereferenced], anyway? What _can_ you do, having caught the machine check? What responses are typical?

And why is anything but a dereference an "access" to the protected address?

What can the application do in the handler? Why couldn't a C implementation cause all C programs to have a handler that does something reasonable?

I didn't say override. I said ignore. Since it's not a dereference, no harm actually done. Why does loading a protected address into a register violate security?

Reply to
Jordan Abel

In article Jordan Abel writes:

 > On 2006-03-23, Michael Wojcik wrote:
...
 > > No, there is not. The "trap" (a machine check, actually) can be
 > > caught, and it can be responded to, by application code; but ignoring
 > > it is not one of the options.
 >
 > You can't "catch it and do nothing"? What are you expected to _do_ about
 > an invalid or protected address being loaded [not dereferenced], anyway?
 > What _can_ you do, having caught the machine check? What responses are
 > typical?

Consider:

  int a[10], *p;

  p = a - 1;
  p = p + 1;

The first line of code traps, you want to ignore that trap, so what is p after that line of code? Nothing useful, because nothing was assigned to it. Well, the second line also traps, but what is the sense in doing nothing here? If you do nothing p is still just as undefined as before that line.

--
dik t. winter, cwi, kruislaan 413, 1098 sj  amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn  amsterdam, nederland; http://www.cwi.nl/~dik/
Reply to
Dik T. Winter

How about:

  int a[10];
  foo(a + 1);

where

  foo(int *p)
  {
      p -= 1;
      /* do something with p[0]..p[9] */
  }

Does p -= 1 still trap, in the first line of foo, given the way that it's called in the main routine?

If not, how could foo() be compiled in a separate unit, in the AS/400 scenario that you described earlier?

If it does trap, why? It's not forming an "illegal" pointer, even for the AS/400 world.

If it doesn't trap, why should p -= 1 succeed, but p -= 2 fail?

What if my algorithm's natural expression is to refer to p[0]..p[-9], and expects to be handed a pointer to the last element of a[]?

The significant difference between C and other languages (besides the assembly language of most architectures) is that you can form, store, and pass as arguments pointers into the middle of "objects". Given that difference, the memory model is obvious, and the constraint imposed by the "undefined" elements of the standard (laboured in this thread) is unreasonably onerous. IMO. YMMV.

Cheers,

--
Andrew
Reply to
Andrew Reilly

In article Andrew Reilly writes:

 > On Thu, 23 Mar 2006 12:33:51 +0000, Dik T. Winter wrote:
...
 > How about:
 >
 >   int a[10];
 >   foo(a + 1);
 >
 > where
 >
 >   foo(int *p)
 >   {
 >       p -= 1;
 >       /* do something with p[0]..p[9] */
 >   }
...
 > Does p -= 1 still trap, in the first line of foo, given the way that it's
 > called in the main routine?

Why should it?

It was not me who described it, but I see no reason why that should be impossible. Consider a pointer as a combination of the indication of a region and an index into that region.

It does not trap.

Because the latter creates an invalid pointer.

No problem.

That was also possible in Algol 68. But I see no problem with it on a machine like the AS/400.

(As an example from Algol 68: 'int' a[1:10, 1:10, 1:10]; 'ref' 'int' aa = a[2:6, 3:7,4]; the latter points to a two-dimensional slice...)

--
dik t. winter, cwi, kruislaan 413, 1098 sj  amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn  amsterdam, nederland; http://www.cwi.nl/~dik/
Reply to
Dik T. Winter

Why not?

I guess my point is, why are you not allowed to hold an address in a register regardless of whether you would be allowed to access the memory at that address? That seems like a stupid architecture in the first place. It's not "security", it's bad design masquerading as security. If the goal is to protect an area of memory from being accessed, block programs from _accessing_ it, not from talking about it.

Only under the current standard. Using the fact that it's undefined to justify it being undefined is begging the question.

Reply to
Jordan Abel
[...]

Because the trap occurred before the value was assigned to p.

Because it catches errors as early as possible.

Why do you want to create an address that you're not allowed to dereference?

Presumably it does both.

The C standard doesn't require a trap when you create an invalid address. It merely declines to define the semantics. If you think the AS/400 architecture is stupid, that's your privilege. If you want to write code that creates addresses outside the bounds of an array, nobody is going to stop you; you'll just have to look somewhere other than the C standard for guarantees that your code will behave the way you want it to.

Your argument, I suppose, is that the C standard *should* guarantee the behavior. That may or may not be a good idea -- but I think we can guarantee that nothing is going to change until and unless someone comes up with a concrete proposal. I'm not going to do it myself, because I'm satisfied with the standard as it is (at least in this area). If you do so, you can expect a great deal of argument about what behavior should be guaranteed and what should be left implementation-defined (or even undefined).

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  
San Diego Supercomputer Center               
We must do something.  This is something.  Therefore, we must do this.
Reply to
Keith Thompson

No, it doesn't. It breaks idioms where no error or illegal access would occur.

Because the ability to do so is implied by the syntax of pointer arithmetic.

This restriction (undefined semantics IS a restriction) makes pointer-walking versions of algorithms second-class citizens to otherwise equivalent indexed versions of algorithms.

  void foo(int *p, int n)
  {
      for (; --n >= 0;)
          p[n] = n;
  }

is "legal" and "defined" on all architectures, but the equivalent with a pointer cursor isn't:

  void foo(int *p, int n)
  {
      p += n - 1;
      for (; --n >= 0;)
          *p-- = n;
  }

I suppose that this is the root of my surprise and annoyance on discovering what the standard says. These versions *should* be equivalent, and equivalently well-defined.

Fine. Where can I find that? Can we make a sub-standard that includes the common semantics for all "normal-looking" architectures, so that our code can rely on them, please?

Yeah, but I expect most of that argument to go away, as all of the AS/400 wanna-be C coders drift off to use Java or .NET instead, leaving C a niche language, doing the low-level systems programming that it was designed for.

--
Andrew
Reply to
Andrew Reilly

It shouldn't. But if the comment (and the code) went off to do something with p[1]..p[10], and the main line passed a, rather than a+1, you're saying that it would trap. The coder, therefore, can't use a perfectly reasonable idiom (base shifting), even though the syntax, and the semantics implied by that syntax, allow it.

I *was* considering a pointer as a combination of the indication of a region and an index into that region. In C, that index is a *signed* integer. If the hardware has a problem with that, and causes a trap if the index component is set to a negative value, then the implementation should go to some lengths to preserve the impression that it works anyway.

But the author of the foo() function (as modified above) can't know that.

But the author of foo() can't know that. This argument started because it was pointed out that p -= 1 produced undefined behaviour. It is clear that the behaviour *could* be very well defined. That it isn't is the root of the discussion.

It is a problem if those elements are accessed with a walking pointer, rather than with an array index; something that the syntax of C and most of its idioms and historical code imply are equivalent.

--
Andrew
Reply to
Andrew Reilly

No, it's not implied by the syntax, any more than the ability to compute INT_MAX + 1 is implied by the syntax of addition.

[snip]

Sure, go ahead.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  
San Diego Supercomputer Center               
We must do something.  This is something.  Therefore, we must do this.
Reply to
Keith Thompson

The problem is that there's really no such thing as a "normal-looking" architecture. Every implementation differs in at least a few fundamental things you'd find it useful to nail down, so to provide enough detail to be meaningful your sub-standard would basically be defining the behavior of a particular implementation.

Just about the only thing that all modern machines agree on is CHAR_BIT == 8 (and I bet someone will dispute that). Ints, pointers, address space semantics, etc. are all up for grabs, and that's a good thing -- it allows systems to evolve in useful ways instead of locking us into something that, while appearing optimal today, is not likely to be tomorrow.

If you disagree, please list all of the various currently-undefined behaviors you want to define and what implementations conform to your spec. Who knows, ISO might adopt it...

S
--
Stephen Sprunk        "Stupid people surround themselves with smart
CCIE #3723           people.  Smart people surround themselves with
K5SSS         smart people who disagree with them."  --Aaron Sorkin 

Reply to
Stephen Sprunk

How about defining passing a positive signed int to printf's %x, %u, or %o, or an unsigned int no greater than INT_MAX to %d? That doesn't seem too unreasonable. It works fine on every existing platform, as far as I know, and it is currently required to work for user-created variadic functions.

Reply to
Jordan Abel

"Stephen Sprunk" writes: [...]

I don't know of any modern hosted implementations with CHAR_BIT > 8 (there have certainly been such systems in the past, though I don't know whether any of them had C compilers), but CHAR_BIT values of 16 and 32 are common on DSPs (Digital Signal Processors).

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  
San Diego Supercomputer Center               
We must do something.  This is something.  Therefore, we must do this.
Reply to
Keith Thompson

In article Andrew Reilly writes:
...
 > This restriction (undefined semantics IS a restriction) makes
 > pointer-walking versions of algorithms second-class citizens to otherwise
 > equivalent indexed versions of algorithms.
 >
 >   void foo(int *p, int n)
 >   {
 >       for (; --n >= 0;)
 >           p[n] = n;
 >   }
 >
 > is "legal" and "defined" on all architectures, but the equivalent with a
 > pointer cursor isn't:

I have no idea on what you base your assertion. When the first is valid, the second is also valid, and vice versa. In your first example the first assignment is to p[n-1] (using the initial value of n), and the same holds for the second version. But it is worse:

  void foo(int *p, int n)
  {
      p += n;
      for (; --n >= 0;)
          *--p = n;
  }

is just as valid.

They are.

--
dik t. winter, cwi, kruislaan 413, 1098 sj  amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn  amsterdam, nederland; http://www.cwi.nl/~dik/
Reply to
Dik T. Winter

But, the second version *finishes* with p pointing to the -1st element of the array, which (we now know) is undefined, and guaranteed to break on an AS/400. The first version only finishes with the integer n == -1, and the pointer p is still "valid". This is the discrepancy that irks me.

Yes, that one is certainly going to fly, even on the AS/400, as p doesn't ever point to p(initial) - 1. But it is (IMO) less idiomatic than the other construction. Certainly, different people's experience will differ, there, and certainly, different processor architectures often have better or worse support for one form or the other. In my experience, post-modification is more common (or, rather, more often fast, where both are available), but quite a few processors have no specific support for address register increment or decrement addressing modes.

Come again? This is the whole point that people (well, me, anyway) have been arguing about! If they were truly equivalent (and the non-unit-stride cousins), I'd go home happy.

--
Andrew
Reply to
Andrew Reilly

Sure there is. All the world's a VAX (but with IEEE float), with plausible exceptions for pointers different length than int. I'd also wear alignment restrictions pretty happily, as long as they're reasonable. Either-endian word significance is fine, too. Show me a "normal-looking" modern architecture that doesn't fit that description, in a material sense. Even most of the DSPs developed in the last ten years fit that mould. Mostly, so that they can run existing C code well. [The few that have been developed in that time frame, which *don't* fit that mould, are not intended to be programmed in C, and there's no reason to expect that they will be.]

I'd like the pointer memory model to be "flat" in the sense that for p, a pointer to some object, (p += i, p -= i) == p for any int i. (In fixed word-length, non-saturating arithmetic, the "flat" geometry is really circular, or modulo. That's as it should be.)

[I'm not interested in arguing multi-processor or multi-thread consistency semantics here. That's traditionally been outside the realm of C, and that's probably an OK thing too, IMO.]

Cheers,

--
Andrew
Reply to
Andrew Reilly

Ironic, considering VAXen don't have IEEE float. Why not just say all the world's a 386? Oh, wait, 386 has that segmented-addressing silliness.

Reply to
Jordan Abel

On 2006-03-23, Andrew Reilly wrote:

 >> [segmented architectures and C]
 >
 > How about:
 >
 >   int a[10];
 >   foo(a + 1);
 >
 > where
 >
 >   foo(int *p)
 >   {
 >       p -= 1;
 >       /* do something with p[0]..p[9] */
 >   }

I think you're overlooking something about how segmented architectures actually work. I'm not sure exactly what, so I'll go through this in excruciating detail and hopefully it'll help. (I don't know the AS/400, so this is likely wrong in detail, but the general principle is the same.)

Going back to the code:

This compiles to "get me a new segment, of size 10*sizeof(int), to hold a."

At runtime, this results in a new segment identifier, which I'll suppose has the value 1234. This value isn't an address; it doesn't "mean" anything except that it's an index into some OS-level or machine-level table somewhere. That table holds the size of the segment; supposing sizeof(int) is 4, that size is 40.

This compiles to "create a pointer to a, add 1 to it, and call foo."

At runtime, the pointer to a has two parts: a segment identifier, namely 1234, and an offset, namely 0, which I'll write as 1234:0. Adding 1 (4 at the machine level) gives 1234:4. This is within bounds, so no traps occur.

Meanwhile,

This compiles to "subtract 4 from the pointer p".

At runtime, when called as above, this subtraction converts 1234:4 to 1234:0. This is within bounds, and no traps occur.

and this compiles to "add values between 0 and 36 to p and dereference".

At runtime, these additions yield pointers between 1234:0 and 1234:36; these are all within bounds, so no traps occur.

You'll note that the compilation of foo does not require knowing what or how many values p points to.

It doesn't.

Because p -= 2, when performed on the pointer 1234:4, tries to deduct 8 from the offset field. This underflows and traps.

That's fine too, because 1234:36 - 36 yields 1234:0, which is still within bounds.

You may have noticed that the compilation I did above is not actually standards-conforming; because of the one-past-the-end rule, the size of the segment for the array "a" has to be one larger than the array. Otherwise, forming the pointer to the one-past-the-end element would trap.

That doesn't follow. What I've described above allows pointing into the middle of objects, but it doesn't yield the memory model you're envisioning.

--
   - David A. Holland
     (the above address works if unscrambled but isn't checked often)
Reply to
David Holland

I think you're overlooking my assertions that I don't care how (some) segmented architectures actually work. They are by *far* the least significant of the processor architectures in use at the moment. Most of the segmented architectures (by installed number: x86) are not used as such, and even those that are do *not* have the particular behaviour with respect to the range of the "offset" component of the non-flat pointer.

Sure, it's possible to write C code that operates within the restrictions of such architectures. It's even easy. It is, however, not historically representative of quite large bodies of C code. That's OK. The architecture wasn't designed to run C code, and is not primarily coded in C.

And this is the behaviour that is at odds with idiomatic C. The standard has effectively forbidden such a construction (which is well defined and works perfectly on every other architecture, and is convenient in many obvious algorithms) just because this one architecture traps before any attempt is made to access memory. The onus should instead have been on the C implementation on this particular machine to paper over this machine defect.

Only because the machine architecture is antipathetic to C. The syntax and behaviour of C operators offers no suggestion that symmetrical behaviour, or non-unit steps past the end of the "object" would fail, when that particular idiom is catered-to (by hacking the underlying object model: over-allocating the memory segment).

Just because you can use a restricted subset of the behaviour of C "naturally" on such trap-on-point segmented machines is a pretty poor argument for restricting the semantically defined "correct" behaviour on all other architectures to that particular subset.

Look: lots of machine architectures have restrictions such that the full semantics of C require more effort on the part of the compiler. For example, lots of modern processors have quite restricted range for "fast" immediate address offsets. Does this mean that the standard should restrict the size of stack frames and structures so that all elements can be accessed with "fast" instructions? No, of course not. The compiler must issue more instructions to load and use large offsets in those situations so that larger structures can be accessed.

--
Andrew
Reply to
Andrew Reilly

Clean up, release resources, and get out.

--
Al Balmer
Sun City, AZ
Reply to
Al Balmer

Heh. The presence of an open balcony on the 15th floor implies the ability to jump off.

--
Al Balmer
Sun City, AZ
Reply to
Al Balmer
