Making Fatal Hidden Assumptions

On 2006-03-07, James Dow Allen wrote:

> [...] but I'm sincerely curious whether anyone knows of an *actual*
> environment where p == s will ever be false after (p = s-1; p++).

The problem is that evaluating s-1 might cause an underflow and a trap, and then you won't even reach the comparison. You don't necessarily have to dereference an invalid pointer to get a trap.
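A sketch of the pattern under discussion (the function name here is hypothetical, not from the original post): the "p = s - 1; p++" idiom evaluates s - 1, which is undefined behaviour the moment it is computed when s points to the first element of an object, no dereference required. A version that never forms a pointer before the start of the array avoids the issue:

```c
#include <stddef.h>

/* Portable strlen-style loop: start at s, not s - 1, so no pointer
 * before the beginning of the object is ever formed. */
size_t str_len(const char *s)
{
    const char *p = s;            /* not s - 1 */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);       /* both pointers stay within the object */
}
```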

You might hit this behavior on any segmented architecture (e.g., the 80286, or the 80386+ with segments enabled), and you are virtually guaranteed to hit it on any architecture with fine-grained segmentation. comp.std.c periodically reminisces about the old Burroughs architecture, and it's always possible something like it might come back sometime.

You will also see this behavior in any worthwhile bounds-checking implementation.

Yes, well, that's what comp.lang.c is about...

--
   - David A. Holland
     (the above address works if unscrambled but isn't checked often)

[snip]

There's (at least) one more property I forgot to mention. Given:

#define LEN 100
#define INC 5000   /* modify both of these as you like */

int arr[LEN];
int *ptr1 = arr;
int *ptr2 = ptr1 + INC;   /* D */

would you also require that, at point D, ptr2 > ptr1? (If pointer arithmetic wraps around, this might not be the case even if adding and subtracting as above always gets you back to the original address.)

And you think that having the standard guarantee this behavior is worth the cost of making it much more difficult to implement C on systems where the underlying machine addresses don't meet this property, yes?

If so, that's a consistent point of view, but I disagree with it.

I'll also mention that none of this stuff has changed significantly between C90 (the 1990 ISO C standard, equivalent to the original ANSI standard of 1989) and C99 (the 1999 ISO standard).

In fact, I just checked my copy of K&R1 (published in 1978). I can't copy-and-paste from dead trees, so there may be some typos in the following. This is from Appendix A, the C Reference Manual, section 7.4, Additive operators:

A pointer to an object in an array and a value of any integral type may be added. [...] The result is a pointer of the same type as the original pointer, and which points to another object in the same array, appropriately offset from the original object.

[...] [... likewise for subtracting an integer from a pointer ...]

If two pointers to objects of the same type are subtracted, the result is converted [...] to an int representing the number of objects separating the pointed-to objects. This conversion will in general give unexpected results unless the pointers point to objects in the same array, since pointers, even to objects of the same type, do not necessarily differ by a multiple of the object-length.

The last quoted paragraph isn't quite as strong as what the current standard says, since it bases the undefinedness of pointer subtraction beyond the bounds of an object on alignment, but it covers the same idea.
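The well-defined case K&R describes can be shown in a small sketch (the helper name is mine, not from the thread): subtracting two pointers into the same array yields the number of elements between them.

```c
#include <stddef.h>

/* Pointer subtraction is defined only when both pointers point into
 * (or one past the end of) the same array; the result counts elements. */
ptrdiff_t elems_between(const int *lo, const int *hi)
{
    return hi - lo;
}
```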

The C Reference Manual from May 1975 has the same wording about pointer subtraction, but not about pointer-plus-integer addition.

So if you think that the requirements you advocate are "the very essence of the nature of C", I'm afraid you're at least 28 years too late to do anything about it.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  
San Diego Supercomputer Center               

a+9 > a+8 because a + 9 - (a + 8) == 1, which is > 0. Doesn't matter if the signed or unsigned pointer value wrapped around in an intermediate term. On many machines that's how the comparison is done anyway. You're suggesting that having the compiler ensure that a+8 doesn't wrap around wrt a is OK, but a+9 is too hard. I don't buy it.

Only if you put them there. (The real problem is objects larger than half the address space, where a valid pointer difference computation produces a ptrdiff value that is out of range for a signed integer.)

Unsigned ints have the nice property that (a + 1) - 1 == a for all a, even if a + 1 == 0. Overflow is generally no big deal in any case. (Other than the object larger than half the address space issue.)

The compiler can't necessarily avoid overflow, but it *can* arrange for pointer comparisons to work properly.

Seems like it will work at least as well as the usual unit-stride algorithm and idiom.

--
Andrew Reilly

So

and if the arithmetic happens to wrap round after s + N, you really are dead too.

It doesn't have to be about weird architectures and traps. No implementation can provide an unlimited range for pointer arithmetic without some kind of overflow behaviour, such as a wrap round. Granted a wrap-round needn't affect addition and subtraction, but it will affect comparisons.

Every allocated object comes with a limited range for pointer comparisons to satisfy p-1 < p.

--
Robin Haigh

There are lots of embedded systems with 8- and 16-bit pointers. With the right value of buffer_pos, it wouldn't take a very large value of amount_needed for that addition to wrap and give you an incorrect comparison.

--
John W. Temples, III

How would you guarantee that a+(i+1) > a+i for all arbitrary values of i? It's easy enough to do this when the addition doesn't go beyond the end of the array (plus the case where it points just past the end of the array), but if you want to support arbitrary arithmetic beyond the array bounds, it's going to take some extra work, all for the sake of guaranteeing properties that have *never* been guaranteed by the C language. (Don't confuse what happens to have always worked for you with what's actually guaranteed by the language itself.)

[...]

But unsigned ints *don't* have the property that a + 1 > a for all a.
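The failing case is exactly one value: at UINT_MAX, a + 1 wraps to 0. A tiny illustration (helper name is mine, not from the post):

```c
#include <limits.h>
#include <stdbool.h>

/* True for every unsigned a except UINT_MAX, where a + 1 wraps to 0
 * and the comparison fails. */
bool plus_one_is_greater(unsigned a)
{
    return a + 1 > a;
}
```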

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  
San Diego Supercomputer Center               

Er, didn't I point that fix out in the original article? That was the only error in the original sample code; all other problems can be tied to assumptions, which may or may not be valid on any given piece of machinery. The point is to avoid making such assumptions, which requires recognizing their existence in the first place.

--
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article.  Click on 
 Click to see the full signature
Reply to
CBFalconer

I get queasy reading the rants against 1's complement architectures; I wish Seymour Cray were still around to address this.

Michael Grigoni Cybertheque Museum


I have encountered situations where free(p); .... if (p == q) leads to the platform's equivalent of the much beloved "segmentation fault". Your theory means that this should have worked. Assigning NULL or a valid address to p after freeing avoids the error.
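The fix described above can be sketched as follows (the function is hypothetical, for illustration only): after free(p), the value stored in p is indeterminate, and even reading it for a comparison is undefined; overwriting p with NULL makes later comparisons well-defined.

```c
#include <stdlib.h>
#include <stdbool.h>

/* Demonstrates the safe pattern: reset the pointer after free() so
 * that comparing it later is well-defined. */
bool freed_equals(int *q)
{
    int *p = malloc(sizeof *p);
    free(p);
    p = NULL;          /* p now holds a determinate, comparable value */
    return p == q;     /* well-defined comparison */
}
```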

Incidentally, in gnu.gcc.help there is a discussion about much the same situation in C++, where someone gets into trouble with delete a; .... if (a == b). It happens only with multiple inheritance and only with gcc. The thread starts at

[snip!]

Cheers Michael

--
Michael Mair
E-Mail: Mine is an   /at/ gmx /dot/ de   address.

Most coding was in flavors of RPG.

There is a large installed base of CISC (non Power-PC) AS-400s; they use a flat memory model reminiscent of the Multics design.

Michael Grigoni Cybertheque Museum


My last comment on the thread, hopefully:

No, they don't, but when you're doing operations on pointer derivations that are all in some sense "within the same object", even if hanging outside it (i.e., by dint of being created by adding integers to a single initial pointer), then the loop termination condition is, in a very real sense, a ptrdiff_t, and *should* be computed that way. The difference can be either positive or negative.

The unsigned comparison a + n > a fails for some values of a, but the ptrdiff_t (signed) comparison a + n - a > 0 is indeed true for all a and n > 0, so that's what should be used. And it *is* what is used on most processors that do comparison by subtraction (even if that's wrapped in a non-destructive cmp).
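The distinction can be shown with unsigned ints standing in for addresses (the helper names are mine, illustrating the post's claim, not code from it): the direct comparison fails when a + n wraps past zero, while the difference-based comparison does not, because (a + n) - a is always exactly n in unsigned arithmetic.

```c
#include <limits.h>
#include <stdbool.h>

/* Direct comparison: fails if a + n wraps past 0. */
bool direct_greater(unsigned a, unsigned n)
{
    return a + n > a;
}

/* Difference-based comparison: (a + n) - a == n exactly, so this is
 * true for any 0 < n <= INT_MAX regardless of wraparound. */
bool diff_greater(unsigned a, unsigned n)
{
    return (int)(a + n - a) > 0;
}
```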

I actually agree completely with the piece of K&R that you posted a few posts ago, where it was pointed out that pointer arithmetic only makes sense for pointers within the same object (array). Since C doesn't tell you that the pointer that your function has been given isn't somewhere in the middle of a real array, my aesthetic sense is that conceptually, arrays (as pointers, within functions) extend infinitely (or at least to the range of int) in *both* directions, as far as pointer arithmetic within a function is concerned. Actually accessing values outside of the bounds of the real array that has been allocated somewhere obviously contravenes the "same object" doctrine, and it's up to the logic of the caller and the callee to avoid that.

Now it has been amply explained that my conception of how pointer arithmetic ought to work is not the way the standard describes, even though it *is* the way I have experienced it in all of the C implementations that it has obviously been my good fortune to encounter. I consider that to be a pity, and obviously some of my code wouldn't survive a translation to a Burroughs or AS/400 machine (or perhaps even to some operating environments on 80286s). Oh, well. I can live with that. It's not really the sort of code that I'd expect to find there, and I don't expect to encounter such constraints in the future, but I *will* be more careful, and will keep my eyes more open.

Thanks to all,

--
Andrew Reilly

Well I, for one, commented on the hidden assumption that must be made for what you call "the one real error" to actually be an error -- but it was not recognised! ;-)

[The top of your original post did not, in fact, claim this was an error, but you call it a "real error" later on.]

I feel that your points would have been better made using other examples. The context of the code made me read the C as little more than pseudo-code with the added advantage that a C compiler might, with a following wind, produce something like the assembler version (which in turn has its own assumptions but you were not talking about that).

I found Eric Sosman's "if (buffer + space_required > buffer_end) ..." example more convincing, because I have seen that in programs that are intended to be portable -- I am pretty sure I have written such things myself in my younger days. Have you other more general examples of dangerous assumptions that can sneak into code? A list of the "top 10 things you might be assuming" would be very interesting.
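Eric Sosman's example has a well-known portable rewrite: subtract first instead of adding, so no out-of-range pointer (and no wrapped addition) is ever formed. A sketch, assuming buffer_pos and buffer_end delimit one array with buffer_pos <= buffer_end (the function name is mine; the parameter names come from the surrounding discussion):

```c
#include <stddef.h>
#include <stdbool.h>

/* Safe version of "buffer_pos + amount_needed > buffer_end": the
 * subtraction stays within the object, so it cannot wrap. */
bool would_overflow(const char *buffer_pos, const char *buffer_end,
                    size_t amount_needed)
{
    return amount_needed > (size_t)(buffer_end - buffer_pos);
}
```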

--
Ben Bacarisse

CBFalconer wrote (in article ):

Yes, which is precisely why I'm surprised at the ensuing debate over the original version, as my comments should reflect.

--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those 

Sure, that's what it _means_, but...

the question is one of etymology.

--
Jordan Abel

Um, I always thought that "within" and "outside" were two different things.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  
San Diego Supercomputer Center               

This is pure theology. The simple fact is that you can't GUARANTEE that p++, or p--, or for that matter p itself, points to anything in particular, unless you know something about p. And if you know something about p, you are OK. What's your problem?

Paul Burke


Surely the camel's nose is already through the gate, on that one, with the explicit allowance of "one element after"? How does that fit with all of the conniptions expressed here about things that fall over dead if a pointer even looks at an address that isn't part of the object? One out, all out.

--
Andrew Reilly

It's an implementation of strlen(). One must expect it to be called with any pointer to a valid string - and those are usually pointers to the first byte of a memory block.

Richard Bos

As previously stated, that only requires using one extra byte or, in the worst case of a HW word pointer, one extra word.

--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/

Are you quite sure that you know what the word "theology" means?

What Arthur wrote above is entirely correct. (Remember that undefined behavior includes the possibility, but not the guarantee, of the code doing exactly what you expect it to do, whatever that might be.)

What's your problem?

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  
San Diego Supercomputer Center               
