Making Fatal Hidden Assumptions

- Andrew Reilly

Fri, Mar 24, 2006 11:36 PM

Sure. And it's OK to talk about it, too. No harm, no foul.

Forming a pointer to non-object space is "talking about it". Outlawing talking about it goes against the grain of C, IMO.

--
Andrew

- Stephen Sprunk

Sat, Mar 25, 2006 4:55 AM

Interestingly, I wasn't even aware the AS/400 did this until I started reading comp.arch, yet in nearly 10 years of C coding I've never (even accidentally) written code that would trigger such a trap. I thought the rule against forming pointers outside an object made sense, and I never saw any valid reason to do so since the alternative was always cleaner and more obvious to me. If your pointers are never invalid, you never have to worry if it's safe to dereference them. Then again, I set pointers to NULL after every free() too, so you probably consider me paranoid.
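The habit of setting pointers to NULL after every free() can be sketched as below. This is a minimal illustration, not code from the thread: the helper name `free_and_null` and the demo function are hypothetical, and the helper is written for `int *` only to keep the example short.

```c
#include <stdlib.h>

/* Hypothetical helper (an assumption for illustration): release the block
   that *pp points to, then reset the caller's pointer to NULL so that no
   stale value is left behind. */
void free_and_null(int **pp)
{
    free(*pp);      /* free(NULL) is defined to do nothing */
    *pp = NULL;
}

/* Returns 1 if the pointer is NULL after being released twice. */
int demo_free_and_null(void)
{
    int *buf = malloc(4 * sizeof *buf);
    if (buf == NULL)
        return 1;           /* allocation failed; nothing to release */
    free_and_null(&buf);
    free_and_null(&buf);    /* harmless: buf is already NULL */
    return buf == NULL;
}
```

Because the pointer is NULL after the first call, an accidental second release becomes a no-op instead of a double free.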

I also follow several open-source projects that have AS/400 ports, and the patches those port maintainers submit are very rarely in this area: they are usually in the OS-specific API calls (which the Windows folks have to submit as well), Makefile adjustments, etc.

This "defect", as you so impolitely call it, is considered by the folks that use such machines to be a feature, not a bug. Specifically, it is a feature that _catches_ bugs.

Personally, I'd love it if x86 had some automatic means to catch invalid pointers on formation instead of catching them on access. Even the latter isn't very effective, since it only catches pointers that fall outside the pages of _any_ valid object; it happily misses accesses that land outside the original object, or even outside every valid object, as long as they hit a valid page.

You might consider it "correct" to form invalid pointers, which I'll grant seems to make a tiny bit of sense if you're used to algorithms that do that, but if being unable to do that is the price one must pay to catch invalid accesses, that's a tradeoff I'd make.

Over-allocating the segment defeats the main purpose of the model: catching bugs. At best, when your hack does catch a bug, you'll usually be looking in the wrong place.

S

--
Stephen Sprunk        "Stupid people surround themselves with smart
CCIE #3723           people.  Smart people surround themselves with
K5SSS         smart people who disagree with them."  --Aaron Sorkin 

*** Free account sponsored by SecureIX.com ***
*** Encrypt your Internet usage with a free VPN account from
http://www.SecureIX.com ***

- Paul Keinanen

Sat, Mar 25, 2006 6:43 AM

After following this thread for quite a while, I still do not understand what the problem is even with separate data and address registers and memory access rules.

While very trivial addressing expressions (usually register indirect or base+offset) can be handled directly by the memory access unit in most architectures, any more complex addressing expressions (e.g. multidimensional arrays) need a lot of integer arithmetic processing until the final result is _moved_ into the address register for the actual memory access.

With separate data and address registers p=s-1 and p++ could as well be calculated in integer registers and the final result (==s) would be transferred to the address registers for memory access.
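At the source level, the idea of doing the intermediate arithmetic on integers can be approximated with `uintptr_t`. This is only a sketch under assumptions: the function name is hypothetical, `uintptr_t` is an optional type, and the standard guarantees only that a `void *` converted to `uintptr_t` and back compares equal to the original. What the intermediate integer value means is implementation-defined, and on a segmented or capability machine it need not behave like an address.

```c
#include <stdint.h>

/* Hypothetical illustration: mimic "p = s - 1; p++" without ever forming
   the pointer s - 1. The decrement and increment are applied to an integer
   copy, and only the final value (equal to the original) is converted back. */
char *roundtrip_via_integer(char *s)
{
    uintptr_t bits = (uintptr_t)(void *)s;  /* pointer -> integer */
    bits -= 1;                              /* the "s - 1" step, as an integer */
    bits += 1;                              /* the "p++" step, as an integer */
    return (char *)(void *)bits;            /* integer -> pointer, same value */
}
```

No invalid pointer value exists at any point here, because the out-of-range value only ever lives in an integer.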

I do not find this problematic even in segmented access, unless saturating integer arithmetic is used.

Paul

- websnarf

Sat, Mar 25, 2006 9:27 AM

The earliest citation I found was from 1987 by Alan Mycroft here:

formatting link

As to when some company is going to patent this technique some time in the future, we'll just have to wait and see.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

- Andrew Reilly

Sat, Mar 25, 2006 1:47 PM

I do signal processing code. Working backwards through data, or in non-unit strides happens all the time. I expect that *most* of my code, for the last twenty years would break according to this rule. None of it is "incorrect" (on the platforms that it runs on). None of it accesses data outside allocated blocks of memory. It just happens that the pointer-walking access code leaves the pointer dangling past the allocated block, after the last access.

This rule essentially means that *p-- is an invalid access mechanism, unless peculiar care is taken to exit loops early, while *p++ is valid, *only* because they made a particular exception for that particular case, because they figured that C compilers on AS/400 systems could afford to over-allocate all arrays by one byte, so that that last p++ would not leave the pointer pointing to an "invalid" location. That's a hack, plain and simple.
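The asymmetry can be made concrete with a backward walk. A loop of the form `for (p = a + n - 1; p >= a; p--)` finishes with `p == a - 1`, which is the pointer the rule forbids forming. The sketch below (function name hypothetical) shows one way to walk backwards while staying inside the range the standard allows, by starting one past the end and decrementing before each access. It handles unit stride only and is not meant as the general answer for strided signal-processing loops.

```c
#include <stddef.h>

/* Sum n ints from last to first without ever forming a pointer that
   precedes a[0]. */
long sum_backwards(const int *a, size_t n)
{
    const int *p = a + n;   /* one past the end: explicitly permitted */
    long total = 0;
    while (p != a)
        total += *--p;      /* decrement first, so p stops exactly at a */
    return total;
}
```

The loop test compares against the start of the array before the decrement, so the final value of `p` is `a` rather than `a - 1`.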

Explicit index arithmetic, rather than pointer arithmetic, I guess?
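A rough sketch of what the index-based alternative might look like for a backward, non-unit-stride walk follows. The function name and the choice of summing are assumptions made for illustration. The index is an unsigned `size_t` that is clamped to zero instead of being allowed to wrap, so no out-of-range pointer is formed.

```c
#include <stddef.h>

/* Sum a[n-1], a[n-1-stride], a[n-1-2*stride], ... using indices only. */
long sum_strided_backwards(const int *a, size_t n, size_t stride)
{
    long total = 0;
    size_t i = n;               /* i is always one more than the next index */

    if (stride == 0)
        return 0;               /* a zero stride would never terminate */

    while (i > 0) {
        total += a[i - 1];
        i = (i > stride) ? i - stride : 0;   /* clamp instead of wrapping */
    }
    return total;
}
```

Whether a compiler turns this into code as tight as the pointer-walking version depends on the target and the optimizer.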

See: it's not symmetrical or idempotic after all.

Hey, I do that too. I just don't consider leaving a pointer dangling one element short of an allocated array to be any less "invalid" than dangling one element past the end. Or two elements, or some other stride.

Sure. They write code for banks. Good for them. That machine feature probably works beautifully in that situation, with the designed-for COBOL or PL/1 codebase.

I write code that has to waste as few cycles as possible, and take up as little code space as possible, and be portable to the widest variety of chips as possible. I don't need to be unrolling the last loops of my algorithms, just to make sure that the pointers aren't decremented that one last time.

You know, you can have that, if you want it. There are plenty of people who build C compilers and run-time environments that put all sorts of run-time checks into your code. Or: you could use a language (like Java) that gave you good solid guarantees that you'll get an exception if you even try to read something out of range. And they don't have pointers as such to worry about. Peachy keen.

It *doesn't* catch invalid accesses. If anything, some other mechanism catches invalid accesses. Trapping on out-of-range pointer formation just gets in the way of clean code.

You seem to have missed the part of the discussion where over-allocation was the concession that the AS/400 C implementers gave to the standards body so that the wildly more common loop-until-just-past-the-end idiom worked OK. It's *their* hack. Bet their COBOL and PL/1 and Java compilers don't have to over-allocate like that. They declined to over-allocate by one element before an array because there's no telling how large each element might be. That would be too costly. So you end up with this pallid, asymmetric shadow of the C that might have been (and once was).

--
Andrew

- Stephen Sprunk

Sun, Mar 26, 2006 12:46 AM

Perhaps. Or perhaps the same operations are allowed on address registers. I don't know the AS/400 well enough to say, but I'm certain that there are instructions to increment/decrement address registers. I'd also expect complex addressing modes to be available on instructions using those registers, whereas you'd have to use several arithmetic instructions on integer registers.

Considering that s is probably already in an address register, doing the manipulation your way would require transferring it to an integer register, doing the decrement, then doing the increment, then transferring it back to an address register when it's needed for dereferencing. Why do that when you can adjust the address register directly?

Also consider that pointers on such a system are likely to be longer than the largest integer register. That means you'd have to store the pointer to RAM, load part of it into an integer register, manipulate it, store that part, load the other part, manipulate it, store it, and load the new pointer back into an address register. That's a lot of work to do.

It simply doesn't make sense to do things that way since the only purpose is to allow violations of the processor's memory protection model. Work with the model, not against it.

S

--
Stephen Sprunk        "Stupid people surround themselves with smart
CCIE #3723           people.  Smart people surround themselves with
K5SSS         smart people who disagree with them."  --Aaron Sorkin 

- Jordan Abel

Sun, Mar 26, 2006 1:35 AM

Because it's a stupid memory protection model.

Why can't the trap be caught and ignored?

- Keith Thompson

Sun, Mar 26, 2006 1:53 AM

[...]

I asked in comp.std.c whether the AS/400 actually influenced the C standard. Here's a reply from P.J. Plauger:

] AS/400 might have been mentioned. Several of us had direct experience
] with the Intel 286/386 however, and its penchant for checking anything
] you loaded into a pointer register. IIRC, that was the major example
] put forth for disallowing the generation, or even the copying, of an
] invalid pointer.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  
San Diego Supercomputer Center               
We must do something.  This is something.  Therefore, we must do this.

- Andrew Reilly

Sun, Mar 26, 2006 3:06 AM

Why force the AS/400's restrictive memory model on coders for all other architectures? The "use the subset that's known to work anywhere" argument is no kind of absolute: it's a moveable trade-off. Some systems are just so restricted or so different, and represent such a small portion of the market, that it doesn't make sense to accommodate them: they need to accommodate the wider community, if they want to join in that particular game. Now, obviously, the AS/400 memory model managed to sneak its nose into the C standards tent one day when the standards body was feeling particularly inclusive, and now we're all saddled with it (to mix a few metaphors).

--
Andrew

- Andrew Reilly

Sun, Mar 26, 2006 3:22 AM

I don't understand this argument. The 286/386 doesn't even *have* pointer registers, as such. It has segment descriptors, which can be used to make things complicated, if you want to, but when you use a 286 as the 16-bit machine that it is, then there is no issue here at all. Similarly, the 386 can be used as a perfectly reasonable "C machine", and generally is, these days. It only gets curly when you try to synthesize an extended address range out of it. Unfortunately, the dominant compiler and platform made a hash of that, rather than putting in the effort to make it work in a (more) reasonable way.

Since that particular platform is (thankfully) falling into obsolescence, can't we start to consider tidying up the standard, to allow more traditional, idiomatic, symmetrical coding styles? Restore pointer-oriented algorithm expressions to their place of idempotic symmetry with index-oriented expressions? Please?

--
Andrew

- Chris Torek

Sun, Mar 26, 2006 3:56 AM

It has a 20-bit architecture, and people did (and still do) use it that way.

And now the x86-64 is coming, and everything old will be new again.

--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: forget about it   http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

- Keith Thompson

Sun, Mar 26, 2006 4:23 AM

I don't know enough about the 286/386 architecture(s) to offer any meaningful commentary on this. Possibly some committee members thought that future architectures might take some ideas from the 286/386 and extend them.

The only way that's going to happen is if somebody (1) comes up with a specification and (2) pushes it through the committee. Advocating it in comp.lang.c won't get it done.

Step 1 means, for each pointer operation, either specifying its semantics, or stating that the behavior is either implementation-defined, unspecified, or undefined. Once you get into the details, you can expect a lot of arguments, such as people pointing out that the suggested required semantics won't necessarily work on some real-world system(s).

Step 2 is left as an exercise.

Or you can create your own language, or you can limit your development to implementations that you *know* meet your requirements (which go beyond the requirements of the current standard).

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  
San Diego Supercomputer Center               
We must do something.  This is something.  Therefore, we must do this.

- Keith Thompson

Sun, Mar 26, 2006 5:07 AM

As far as I can tell, the x86-64 uses (or at least is capable of using) a flat 64-bit address space.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  
San Diego Supercomputer Center               
We must do something.  This is something.  Therefore, we must do this.

- Andrew Reilly

Sun, Mar 26, 2006 8:12 AM

It's vaguely plausible to call the VM86 (real-mode) x86 arch 20-bit, but it's a stretch, as no processor-visible registers, and no ALU ops are 20-bits long. It's 16-bit in the same sense that the later PDP-11s with various memory extension schemes were 16-bit. It still gets used, to some extent, because it's the boot environment of PCs.

The 286 could plausibly be called a 24-bit segmented machine, and shares much of the memory model from its IBM FS, OS/36 (which grew up to be AS/400) and Intel 432 antecedents. A nice protected architecture for Pascal, PL/1, COBOL, and other HLLs of the age. You certainly couldn't call it a "C machine" other than when used within its 16-bit, flat memory model (small) modes. Everything else required language extensions ("near" and "far" pointers), and any pointer misbehaviour sanctioned by the standard and by the implementations could reasonably be said to be limited to those extensions, anyway. The fact that as much mileage was had out of C in that environment is a testament to the industry's determination and enthusiasm. When compilation was done so that non-standard pointer extensions weren't required in the source, then it should have been the system run-time that gave ground, rather than the standard. I doubt very much that any new development work is being done in 286 protected mode, anywhere.

The x86-64 is a lovely architecture for a C machine. Specifically, it has jettisoned many of the segmentation issues. All 64 bits' worth of address space can be loaded into any "pointer" register, and manipulated with the full complement of integer and logical operations (because the pointer registers are also the integer ALU registers), and the only time you can ever get a peep out of a trap handler is if you try to actually access memory at an address not mapped into the process' address space.

--
Andrew

- CBFalconer

Sun, Mar 26, 2006 8:26 AM

... snip ...

We already have an ugly example of this process, in C# and the entire .NET hoax, from people with more influence (and money) than Mr Reilly.

--
 Some informative links:
   news:news.announce.newusers
   http://www.geocities.com/nnqweb/
   http://www.catb.org/~esr/faqs/smart-questions.html
   http://www.caliburn.nl/topposting.html
   http://www.netmeister.org/news/learn2quote.html

- Paul Keinanen

Sun, Mar 26, 2006 12:24 PM

We have just been discussing in dozens of messages :-) that this would trap on AS/400 and that trap could not be ignored.

By doing the calculations in integer registers this problem can be avoided. Going this route would only be necessary when such problematic expressions exist in the source code, not always.

Paul

- Richard G. Riley

Sun, Mar 26, 2006 1:18 PM

Your caveat covers you. It can have a flat address space, but also has its legacy "hw mode" allowing 16 & 32 bit stuff to see the relevant addressing space.

- Stephen Sprunk

Sun, Mar 26, 2006 7:54 PM

It can't be ignored because (apparently) the AS/400 and similar machines only do permission checks on pointer formation. Once the pointer is formed, accesses do not need permission checks. If you were able to ignore the trap on formation, that would mean all pointer accesses would be exempt from the security model.

Personally, I'd rather have my processor trap when an invalid pointer is formed, since in my code such an occurrence is _always_ a bug. Waiting until the pointer is dereferenced makes it significantly harder to debug.

S

--
Stephen Sprunk        "Stupid people surround themselves with smart
CCIE #3723           people.  Smart people surround themselves with
K5SSS         smart people who disagree with them."  --Aaron Sorkin 

- Keith Thompson

Sun, Mar 26, 2006 8:04 PM

The piece I missed is that an x86-64 system can run 32-bit code. If I compile and run a program on an x86-64 system, it uses 64-bit pointers. If I compile a program on an x86-32 system and copy the executable to an x86-64 system, it runs properly and uses 32-bit pointers. (At least on the systems I have access to.)
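A quick way to see which pointer size a given build uses is to ask the compiler, as in the sketch below. The function name is hypothetical. With GCC on x86-64, for instance, building the same source with `-m64` or `-m32` would be expected to report 64 or 32 respectively, assuming the 32-bit libraries are installed.

```c
#include <limits.h>
#include <stddef.h>

/* Width of an object pointer, in bits, for the current build target. */
size_t pointer_width_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}
```

The result reflects the ABI the program was compiled for, not the capabilities of the processor it happens to run on.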

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  
San Diego Supercomputer Center               
We must do something.  This is something.  Therefore, we must do this.

- Chris Torek

Sun, Mar 26, 2006 9:12 PM

Yes. I am not saying that x86-64 has re-created the old 80x86 segmentation model. No, this is merely the thin end of the wedge. Segmentation will come back, sooner or later. :-)

--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: forget about it   http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.