Making Fatal Hidden Assumptions

  • Michael Wojcik:

Al chose to just insult me, in his first message -- I don't know why, and generally I won't speculate what his reasons could be.

There is no technical content in your posting, but there is a pack of lies and personal attacks, the usual hare-brained ad hominem attack.

Like I won't speculate about Al's reasons for going 100% personal, I won't speculate about your reasons.

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Alf P. Steinbach

or

PDP was (almost) the entire product line of DEC for much of its life, containing some similar architectures and some quite different ones with (thus) different assembly languages.

You are probably thinking of PDP-11 autoincrement. This uses the location _addressed_ by a register (only) which it increments (by stride) at the same time:

    register char * a = ?, b = *a++; // compiles to MOVB (R1)+, R2
    register int * a = ?, b = *a++; // compiles to MOV (R1)+, R2
    // note PDP-11 pointers are byte addresses & this adds _2_
    // I write register explicitly for clarity, although a compiler could
    // place variables not declared register in a register, and could
    // choose not to use a register even though you specify it.
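For readers who want to see the idiom in compilable form, here is a minimal C sketch. The names copy_bytes, sum_ints and autoincrement_demo are invented for illustration, and whether any particular compiler emits MOVB (R1)+ or MOV (R1)+ for these loops depends on the target and the optimizer. The only point is that *p++ reads through the pointer and then advances it by the size of the pointed-to type.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n bytes; each *src++ reads one char and advances src by 1. */
static void copy_bytes(char *dst, const char *src, size_t n)
{
    while (n-- > 0)
        *dst++ = *src++;
}

/* Sum n ints; each *p++ reads one int and advances p by sizeof(int) bytes. */
static int sum_ints(const int *p, size_t n)
{
    int total = 0;
    while (n-- > 0)
        total += *p++;
    return total;
}

/* Hypothetical self-check: the pointer moves by the stride of its type. */
static void autoincrement_demo(void)
{
    int v[3] = { 1, 2, 3 };
    const int *p = v;
    int first = *p++;                 /* read v[0], then step to v[1] */

    assert(first == 1);
    assert((const char *)p - (const char *)v == (ptrdiff_t)sizeof(int));
    assert(sum_ints(v, 3) == 6);
}
```

Calling autoincrement_demo() from a main function is enough to exercise it.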

- David.Thompson1 at worldnet.att.net

Dave Thompson

You mean -2048 to +2047 for two's complement, and -2047 to +2047 for the extremely rare case of non-2sC or 2sC-with-trap-representation.
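The two ranges can be checked with a little arithmetic. This is a sketch only; the function names are made up, n counts the sign bit, and the helpers assume 2 <= n and n smaller than the width of long.

```c
#include <assert.h>

/* Largest value of an n-bit signed integer (same for every representation). */
static long signed_max(int n)
{
    return (1L << (n - 1)) - 1;
}

/* Smallest value of an n-bit two's-complement integer. */
static long twos_complement_min(int n)
{
    return -(1L << (n - 1));
}

/* Smallest value when the range is symmetric (ones' complement,
 * sign-magnitude, or two's complement with a trap representation). */
static long symmetric_min(int n)
{
    return -signed_max(n);
}

/* Hypothetical self-check for the 12-bit case discussed above. */
static void range_demo(void)
{
    assert(twos_complement_min(12) == -2048);
    assert(symmetric_min(12) == -2047);
    assert(signed_max(12) == 2047);
}
```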

PL/1 is like

    DECLARE X FIXED BINARY (11); /* not counting sign */

Pascal uses the actual range like

    X: -2047 .. +2047

and Ada similarly except, as arguably usual, more verbosely in most cases. In both cases you are only guaranteed _at least_ 12 bits; in Ada you can additionally specify a 'representation clause' that requires exactly 12 bits (or a diagnostic if that can't be implemented).

The syntax INTEGER*n is a common extension in Fortran (though not standard) for an integer of n _bytes_ not bits.

I don't think this is true. In both languages it is fairly easy to do portable but perhaps overconservative code. In C it is easy to get fairly well 'down to the metal' if you want; in Ada it is fairly easy to get even further down if the compiler supports it. And at the extreme, Ada standardly requires an interface to assembler; C does not do this standardly, but practically all implementations have some way.

Neither motivation nor correct effect; see other posts.

s/much/noticeably/ and I agree. (Except see below.)

Also awk. (Although you could consider it subsumed by perl.) And all Unix shells, more or less, not just bash. Also the LISP tribe (Scheme, etc.) and Prolog here or perhaps a smidgen higher.

I'd insert Ada here.

I'd insert FORTH here. And maybe pull some of the more powerful macro assemblers above 'basic' assembler.

I wouldn't distinguish hex from binary; that's trivial.

I'd insert microcode and then registers/gates here.

And below that turtles.

- David.Thompson1 at worldnet.att.net

Dave Thompson

There are C compilers that will add bounds-checking code for a[y]=x, and it's quite ugly if disassembled. I think what you mean is that C doesn't require implementations to do it (and most don't), but other languages do.
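To make the extra work concrete, here is roughly what a checked a[y]=x amounts to when written out by hand. This is an illustrative sketch, not the output of any particular compiler: checked_store is a hypothetical helper, and a real bounds-checking implementation would more likely trap or abort than return an error code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checked store: returns 0 on success, -1 if y is out of range. */
static int checked_store(int *a, size_t len, size_t y, int x)
{
    if (y >= len)       /* the compare and branch that checking adds */
        return -1;
    a[y] = x;           /* the store that plain C emits on its own */
    return 0;
}

/* Hypothetical self-check. */
static void checked_store_demo(void)
{
    int a[4] = { 0, 0, 0, 0 };

    assert(checked_store(a, 4, 2, 42) == 0);
    assert(a[2] == 42);
    assert(checked_store(a, 4, 4, 99) == -1);   /* rejected, array untouched */
}
```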

I shudder to think of what the asm would look like for a[y]=x if a were a C++ object with virtual methods for the [] and = operators, with the latter having to call a copy constructor on x. For fun, compile as PIC too.

However, I agree with the general statement that when you write C, you have a good shot at imagining what the corresponding asm would be. I'm not sure that makes it a glorified assembler, however.

C's greatest feature, and its worst, is that you can do all sorts of ugly unportable things that virtually no other HLL allows but also have portable constructs available: it's your choice which to use. This means you can write non-portable implementation code in the same language as portable user code, and IMHO is the reason for C's enduring popularity.

S
--
Stephen Sprunk        "Stupid people surround themselves with smart
CCIE #3723           people.  Smart people surround themselves with
K5SSS         smart people who disagree with them."  --Aaron Sorkin 

*** Free account sponsored by SecureIX.com ***
*** Encrypt your Internet usage with a free VPN account from
http://www.SecureIX.com ***
Stephen Sprunk

If I can summarize this as:

-- the source code changes when the underlying processor architecture changes -- then I agree; this is a key reason why I consider C a glorified assembler.

Interesting features. I'm not sure how different multiple value returns would be from values returned via reference parameters (pointers). It sounds like a good idea.

Thanks for the contribution to the discussion. Ed

Ed Prochak

If a system traps on a prefetch, it's fundamentally broken. However, a system that traps when an invalid pointer is loaded is not broken, and the AS/400 is the usual example. Annoying, but not broken.

Why IBM did it that way, I'm not sure, but my guess is they found it was cheaper to do validity/permission checks when the address was loaded than when it was used since the latter has a latency impact.

S
--
Stephen Sprunk        "Stupid people surround themselves with smart
CCIE #3723           people.  Smart people surround themselves with
K5SSS         smart people who disagree with them."  --Aaron Sorkin 

Stephen Sprunk

That's pretty close. I think that the link to Dan Bernstein's page on the topic said it better than me: you can use different code and different approaches where it matters to both the program and to the target processor, but you can also use a simpler, generic approach that will just work anywhere, when absolute maximum performance isn't necessary.

The significant difference is that reference parameters (pointers) can't be in registers. (Not to mention the inefficiency of repeatedly pushing the reference onto the call stack...)

Say you have a few to half a dozen pieces of state in some algorithm, and the algorithm operates through a pattern of "mutations" of that state, such that some or all of the state changes as a result of each operation. The only way to code that in C is either to write out the code that comprises the element operations of each pattern long-hand, or use preprocessor macros.

The most obvious concrete example of this sort of thing is the pattern where you have one or more "cursors" into a data structure, and code that walks through it, producing results at the same time. You want your "codelets" to return both their result *and* change the cursor to point to the next element in the list to be processed. In C, you can't have both the result and the pointer in registers, but that's how you would code it in assembly.
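For concreteness, here are two ways the cursor pattern is commonly spelled in C. Everything here (struct node, struct step, both function names) is invented for illustration. The sketch takes no position on code generation: whether the by-reference cursor stays in memory, or whether the small struct comes back in registers, depends on the ABI and on inlining.

```c
#include <assert.h>
#include <stddef.h>

struct node { int value; struct node *next; };

/* Style 1: the cursor is passed by reference and updated through a pointer. */
static int next_by_reference(struct node **cursor)
{
    int v = (*cursor)->value;
    *cursor = (*cursor)->next;
    return v;
}

/* Style 2: the result and the new cursor come back together in a struct. */
struct step { int value; struct node *next; };

static struct step next_by_value(struct node *cursor)
{
    struct step s = { cursor->value, cursor->next };
    return s;
}

/* Hypothetical self-check: walk a two-element list both ways. */
static void cursor_demo(void)
{
    struct node b = { 20, NULL };
    struct node a = { 10, &b };
    struct node *cur = &a;
    struct step s;

    assert(next_by_reference(&cur) == 10 && cur == &b);
    s = next_by_value(&a);
    assert(s.value == 10 && s.next == &b);
}
```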

Cheers,

--
Andrew
Andrew Reilly

In my experience (which is more limited to recent years than many others' here), it is typically do-gooder whites that are offended by words like "black" or "oriental" or "Indian" (referring to the US domestic variety).

I recall an interview of Nelson Mandela by (I think) Dan Rather shortly after the former's first election, and he was asked "How does it feel to be the first African-American president of South Africa?" Mandela was understandably confused, but the interviewer simply couldn't bring himself to say the word "black". Mandela finally figured it out and answered, but he had to come away from that thinking all Americans are complete dolts.

S
--
Stephen Sprunk        "Stupid people surround themselves with smart
CCIE #3723           people.  Smart people surround themselves with
K5SSS         smart people who disagree with them."  --Aaron Sorkin 

Stephen Sprunk

And I still say that constraining C for everyone so that it could fit the AS/400, rather than making C-on-AS/400 jump through a few more hoops to match traditional C behaviour, was the wrong trade-off. I accept that this may well be a minority view.

--
Andrew
Andrew Reilly

It is. The C standard wouldn't just have to forbid an implementation from trapping when it loads an invalid address; it would have to define the behavior of any program that uses such an address. A number of examples have been posted here where that could cause serious problems for some implementations other than the AS/400.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  
San Diego Supercomputer Center               
We must do something.  This is something.  Therefore, we must do this.
Keith Thompson

Why? It's not that difficult to define the behavior of a program that "uses" such an address other than by dereferencing, and no problem to leave the behavior undefined for dereferencing.

Jordan Abel

In article Jordan Abel writes:
> On 2006-03-22, Keith Thompson wrote:
> > Andrew Reilly writes:
...
> >> And I still say that constraining C for everyone so that it could fit the
> >> AS/400, rather than making C-on-AS/400 jump through a few more hoops to
> >> match traditional C behaviour, was the wrong trade-off. I accept that
> >> this may well be a minority view.
> >

But that would have locked out machines that strictly separate pointers and non-pointers, in the sense that you can not load a pointer in a non-pointer register and the other way around. Note also that on the AS/400 a pointer is longer than any integer, so doing arithmetic on them in integer registers would require quite a lot.

--
dik t. winter, cwi, kruislaan 413, 1098 sj  amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn  amsterdam, nederland; http://www.cwi.nl/~dik/
Dik T. Winter

Surely there's some way to catch and ignore the trap from loading an invalid pointer, though. I mean, it stops _somewhere_ even as it is now, unless the register melts the silicon and drips through the floor, then accelerates to the speed of light.

Jordan Abel

The problem is pointer arithmetic. For example, given:

    #define BUFFER_SIZE /* some big number */
    int buf[BUFFER_SIZE];
    int *ptr = buf + BUFFER_SIZE;
    int offset = /* some number */;

Requiring (ptr + offset - offset == ptr) probably wouldn't be too much of a burden for most systems, but requiring (ptr + offset > ptr) could cause problems. Given the current requirements, buf can be placed anywhere in memory; there just (on some systems) needs to be a single extra byte just past the end of the array. Requiring out-of-bounds pointer arithmetic to work "correctly" in all cases could be much more burdensome.

And avoiding creating invalid pointers really isn't all that difficult.
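As a sketch of what that looks like in practice (the function names are invented for illustration): the backward loop below starts at the one-past-the-end pointer, which the standard allows you to form and compare, and decrements before each read so that buf - 1 is never computed. The strided loop uses an index, so no pointer beyond buf + n is ever formed.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array back to front without ever forming buf - 1. */
static long sum_backwards(const int *buf, size_t n)
{
    const int *p = buf + n;     /* one past the end: valid to form and compare */
    long total = 0;

    while (p != buf)
        total += *--p;          /* step back first, then read */
    return total;
}

/* Sum every stride-th element; stride must be nonzero.
 * Indexing keeps all arithmetic in size_t, so no out-of-range pointer exists. */
static long sum_strided(const int *buf, size_t n, size_t stride)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i += stride)
        total += buf[i];
    return total;
}

/* Hypothetical self-check. */
static void valid_pointer_demo(void)
{
    int b[5] = { 1, 2, 3, 4, 5 };

    assert(sum_backwards(b, 5) == 15);
    assert(sum_strided(b, 5, 2) == 9);      /* 1 + 3 + 5 */
}
```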

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  
San Diego Supercomputer Center               
We must do something.  This is something.  Therefore, we must do this.
Keith Thompson

You don't have to do that at all. As you said, AS/400 uses long, decorative pointers that are longer than integers. So no one's going to notice if what your C compiler calls a pointer is actually a (base, index) tuple, underneath. Being object/capability machines, these tuples point to whole arrays, not just individual bytes or words. The compilers could quite easily have managed all of C's pointer arithmetic as actual arithmetic, using integers and indices, and only used or formed real AS/400 pointers when the code did memory references (as base[index]). There's no need for pointer arithmetic outside this model, against different "base" pointers, so that's a straw-man argument.
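A rough model of that proposal, written in portable C purely to illustrate the idea. It says nothing about how any AS/400 compiler actually works; struct fatptr and the fat_* functions are hypothetical names. Arithmetic and comparison touch only the integer index, and a real pointer is formed only at the moment of the memory reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "C pointer" represented as a (base, index) pair. */
struct fatptr { int *base; ptrdiff_t index; };

/* Pointer arithmetic is plain integer arithmetic on the index. */
static struct fatptr fat_add(struct fatptr p, ptrdiff_t n)
{
    p.index += n;
    return p;
}

/* Ordering is only meaningful for two pointers with the same base. */
static int fat_less(struct fatptr a, struct fatptr b)
{
    return a.index < b.index;
}

/* The real machine pointer is formed only here, as base[index]. */
static int fat_load(struct fatptr p)
{
    return p.base[p.index];
}

/* Hypothetical self-check: wander far out of bounds, come back, then load. */
static void fatptr_demo(void)
{
    int a[4] = { 10, 20, 30, 40 };
    struct fatptr p = { a, 0 };
    struct fatptr far = fat_add(p, 100);    /* index 100 is never dereferenced */
    struct fatptr q = fat_add(far, -98);

    assert(fat_less(p, far));
    assert(q.index == 2 && fat_load(q) == 30);
}
```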

Cheers,

--
Andrew
Andrew Reilly

This is a really lame argument, IMO.

Given that we're working with fixed-word-length machines, rather than Scheme's bignums, p + offset > p doesn't even necessarily hold for integers, so why should it hold more rigorously for pointers? Wrap-around or overflow is a fact of life for fixed-range machines. You just deal with it. Don't go too close to the edge, or make damn sure you're checking for it when you do.
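The integer case is easy to demonstrate. The sketch below uses unsigned arithmetic, because unsigned wrap-around is defined by the C standard while signed overflow is undefined behaviour; the function names are invented for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if p + offset still compares greater than p.
 * Unsigned arithmetic wraps modulo UINT_MAX + 1, so this can be 0. */
static int sum_exceeds(unsigned p, unsigned offset)
{
    return p + offset > p;
}

/* The explicit "checking for it" test: would p + offset wrap around? */
static int add_would_wrap(unsigned p, unsigned offset)
{
    return offset > UINT_MAX - p;
}

/* Hypothetical self-check near the edge of the range. */
static void wrap_demo(void)
{
    assert(sum_exceeds(10u, 5u) == 1);
    assert(sum_exceeds(UINT_MAX - 1u, 5u) == 0);    /* wrapped past zero */
    assert(add_would_wrap(UINT_MAX - 1u, 5u) == 1);
    assert(add_would_wrap(10u, 5u) == 0);
}
```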

Correctness depends on what you're trying to do. The one-byte-extra argument doesn't help the fact that the flat memory model of C will still "work OK" even if buf[BUFFER_SIZE-1] occupies the very last word in the address space of the machine: there's no room for even that single byte extra. Sure, in that instance, ptr = buf + BUFFER_SIZE will equal 0, and your pointer comparison may break if you do it carelessly, but ptr[-1] will still point to the last word in the array, and there are no dumb restrictions against iterating backwards through the array, or forwards with non-unit stride.

No, of course it isn't. Just use pointers as object handles, and do your pointer arithmetic with integers. Whoopee: exactly the same syntax and semantics as Pascal and Java. I wonder why we bothered with pointers in the first place?

--
Andrew
Andrew Reilly

Just in case it hasn't been mentioned [a rather long thread to check!], and might be useful, Google has an interesting summary on finding a nul in a word by one Scott Douglass - posted to c.l.c back in 1993.

"Here's the summary of the responses I got to my query asking for the trick of finding a nul in a long word and other bit tricks."
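For anyone who hasn't met the trick, one widely circulated formulation for a 32-bit word is sketched below. It is not necessarily the form given in that 1993 summary, and the function name is made up for illustration. Subtracting 1 from each byte borrows into the top bit only where the byte was zero or already had its top bit set; masking with ~v rules out the second case.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if any of the four bytes of v is zero. */
static int has_zero_byte(uint32_t v)
{
    return ((v - UINT32_C(0x01010101)) & ~v & UINT32_C(0x80808080)) != 0;
}

/* Hypothetical self-check. */
static void has_zero_byte_demo(void)
{
    assert(!has_zero_byte(UINT32_C(0x41424344)));   /* "ABCD", no nul */
    assert(has_zero_byte(UINT32_C(0x41004344)));    /* one zero byte */
    assert(!has_zero_byte(UINT32_C(0x80808080)));   /* high bits set, no nul */
}
```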


--
==============
Not a pedant
==============
pemo

I must have missed the bit in the C Rationale where the committee wrote, "We did this for the AS/400". They probably thought it was obvious, since no other architecture could ever have the same requirements and support C.

OK, define the behavior of all non-dereferencing accesses on invalid pointers. Be sure to account for systems with non-linear address spaces, since nothing else in the C standard excludes them.

Yup. The AS/400 has a set of opcodes for manipulating integers, and a different set for manipulating pointers. Nothing in C currently requires it to treat the latter like the former, and I don't see any reason why it should. (Indeed, I admit to being mystified by Andrew Reilly's position; what would be gained by requiring that C implementations have defined behavior for invalid pointers? How is leaving invalid pointer access undefined by the standard "constraining" C?)

No, there is not. The "trap" (a machine check, actually) can be caught, and it can be responded to, by application code; but ignoring it is not one of the options. On the AS/400, only LIC (Licensed Internal Code) can bypass memory protection, and the C implementation is not LIC.

The AS/400 uses a Single-Level Store. It has *one* large virtual address space for all user-mode objects in the system: all jobs (the equivalent of processes), all files, all resources of whatever sort. It enforces access restrictions not by giving each process its own virtual address space, but by dynamically granting jobs access to "materialized" subspaces. (This doesn't apply to processes running under PACE, AIUI, but that's a special case.)

Yes, it stops: if the machine check isn't handled by the application, the job is paused and a message is sent to the appropriate message queue, where a user or operator can respond to it.

That happens under LIC control. The C implementation can't override it; if it could, it'd be violating the system's security model.

Of course, the C implementation could emulate some other machine with less-vigilant pointer handling by generating some intermediate representation and interpreting it at runtime. That would have made the early AS/400s unusably slow, rather than just annoyingly slow, for C programs.

But in any case a favorite maxim of comp.lang.c applies here: what the AS/400, or any other extant implementation, does *does not matter* to the C standard. If we decommissioned all AS/400s today, there might be a new architecture tomorrow with some other good reason for disallowing operations on invalid pointers in C.

--
Michael Wojcik                  michael.wojcik@microfocus.com

The lecturer was detailing a proof on the blackboard.  He started to say,
"From the above it is obvious that ...".  Then he stepped back and thought
deeply for a while.  Then he left the room.  We waited.  Five minutes
later he returned smiling and said, "Yes, it is obvious", and continued
to outline the proof.  -- John O'Gorman
Michael Wojcik

That would break inter-language calls, which were an absolute necessity in early AS/400 C implementations (notably EPM C), as they were unable to use some system facilities (such as communications) directly.

Prior to the ILE environment, there was no "linker" as such for most (all?) AS/400 application programming languages. Source files were compiled into separate program objects (*PGM objects) in the filesystem. Calls with external linkage were resolved dynamically. (This is closer to the external-call model COBOL uses, actually, so it made sense for the 400's primary audience.)

It would have been a real mess if the C implementation had to figure out, on every external call passing a pointer, whether the target was C (and so could use special fake C pointers) or not (and so needed real AS/400 pointers). Putting this burden on the C programmer would not have improved the situation.

And, of course, pointers in aggregate data types would pose a real problem. If a C program wanted to define a struct that corresponded to a COBOL group item, that would've been a right pain. Obviously, it's an implementation-specific task anyway, but on most implementations it's pretty straightforward provided the COBOL item doesn't use any of COBOL's oddball data types.

That doesn't mean it couldn't have been done, of course, but it would have made C - already not a member of the popular crowd on the '400 playground - too cumbersome for all but the most determined fans.

As the Rationale notes, one of the guiding principles behind C is to do things the way the machine wants to do them. That introduces many incompatibilities between implementations, but has rewards of its own. Since C is rather unusual among HLLs in this respect, why not let it stick to its guns rather than asking it to ape all those other languages by hiding the machine behind its own set-dressing?

--
Michael Wojcik                  michael.wojcik@microfocus.com

Aw, shucks. And I was just trying to be rude.  -- P.J. Plauger
Michael Wojcik
