Relocating application architecture and compiler support

- CBFalconer

Fri, Jan 21, 2005 10:58 AM

In the 1974..78 period I developed my own p-code system. It was designed to run on at least the 8080, and was largely patterned after the HP3000 architecture. Code was loaded in segments, and those segments were never altered. Data was stack (and stack marker) relative. Unfortunately data could not be moved, due to the lack of hardware, which was a limitation _on the 8080_.

Code segments limited branching to functions, and the code for branches, conditional or otherwise, was always self-relative. Function calls were made via a table appended to the segment, and addressed by table index. The table entries could be negative, when they specified a self-relative point within the segment, or positive, when they contained two fields in 0..127 (leaving a spare bit). These described a segment number and an index within that segment's transfer table, and provided for inter-segment calls. This meant the stack marker on a call had to hold segment and segment displacement values.

The result was that a segment could be freely moved during any inter-segment call, and that no more than the single destination segment needed to be in memory at any one time. A single system table indexed by segment number sufficed to keep track of everything. That table required 8 bytes per segment, and was fixed in memory.
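A minimal sketch of how such a transfer-table entry might be decoded. The 16-bit entry width, the bit positions of the two 7-bit fields, and the rule that a negative entry is an offset relative to the entry's own position are all assumptions made for illustration; the names `decode_entry`, `segment_base` and `move_segment` are hypothetical and not taken from the original system.

```python
# Assumed layout (not from the original system): 16-bit signed entries.
#   negative     -> self-relative target inside the current segment
#   non-negative -> segment number in bits 7..13, table index in bits 0..6

def decode_entry(entry, entry_offset):
    """Return ('local', offset_in_segment) or ('far', segment, index)."""
    if entry < 0:
        # Assumed convention: target = position of the entry + (negative) value,
        # so nothing in the segment depends on where the segment is loaded.
        return ("local", entry_offset + entry)
    segment = (entry >> 7) & 0x7F
    index = entry & 0x7F
    return ("far", segment, index)

# System table indexed by segment number: current load address, or None
# when the segment is not in memory.
segment_base = {0: 0x4000, 5: None}

def move_segment(segment, new_base):
    """Moving (or swapping in) a segment touches only the system table."""
    segment_base[segment] = new_base

local_call = decode_entry(-0x20, 0x100)
far_call = decode_entry((5 << 7) | 3, 0x102)
move_segment(5, 0x9000)
```

Because calls name a segment number and a table index rather than an address, the segment contents never need patching when the segment moves.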

The system could execute an upper limit of approximately 3 Mbytes of code using the swapping algorithms (which were independent of the organization). For comparison the complete ISO-compliant Pascal compiler occupied about 40 Kbytes. The same compiler, compiled to machine code for either the HP3000 or 8086 (MsDos), occupied about 150 Kbytes, which indicates the compactness of the actual p-code.

I intended the system to be directly portable to many architectures. When the 8088 effectively took over the PC world circa 1982 I started to port it, but never finished. The reason was that there was no need for it in my particular small niche, which was providing embedded machinery for medical testing, and revolved around 8080 based hardware I had designed and built. It all worked so well that I had no excuse for rebuilding it. It had the unfortunate limitation to 64k of data space, barring which it would be viable today.

--
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article.  Click on 
 "show options" at the top of the article, then click on the 
 "Reply" at the bottom of the article headers." - Keith Thompson

- Nick Maclaren

Fri, Jan 21, 2005 11:27 AM

Thanks. It would almost certainly have been an oddball one if so. We used a large range of compilers, several of which had been developed at Cambridge, but including other non-IBM ones. There were also a large number of student compiler projects that got to the point of usability.

Writing position independent code was pretty easy, but you had to separate out the system interfaces and/or write your own macros. IBM's standard ones often generated ACONs and VCONs.

Regards, Nick Maclaren.

- glen herrmannsfeldt

Fri, Jan 21, 2005 6:25 PM

(snip)

(I wrote)

It might have been possible in a closed system, where users couldn't write any assembly code. I do remember CALL/OS, a timesharing system with BASIC, FORTRAN, and PL/I, but I don't know at all what it was like internally.

With a convention where some registers only held addresses, and the rest never did, you would know which registers to change on relocation.

I suppose there should be two kinds of position independent code. One kind could be executed at any address, even more than one (virtual) address at the same time. The second could actually be moved to a different address while executing. It is the second kind that is more complicated because registers could hold addresses.

-- glen

- glen herrmannsfeldt

Fri, Jan 21, 2005 6:43 PM

But once you do that it can't be relocated without knowing which registers contain addresses.

(snip)

This reminds me of the way Macintosh OS used to work. (Does it still do that?) Fortunately I never wrote any low level Mac code, but you had to be careful when things could be relocated, and not store addresses across system calls that could move things around.

In the code above, you can't move things around between the AR and the BALR. The original Mac didn't do preemptive multitasking, so the user program always knew when things could move and when they couldn't.

On the other hand, for a system like 680x0 with separate address and data registers, maybe it isn't so hard.

-- glen

- Anne & Lynn Wheeler

Fri, Jan 21, 2005 8:25 PM

it turns out that this is slightly different operation ... i (re)wrote the dispatcher, scheduler, page replacement, etc for cp/67 ... a lot of it as undergraduate ... which was shipped in the standard product. a lot of this was dropped in the morphing of cp/67 into vm/370 ... but I got to put it (and a lot of other stuff) back in when i did the resource manager for vm/370


because cp/67 (and vm/370) had virtual memory support ... each user had their own virtual address space ... where pages could be brought in and out of real storage at arbitrary real addresses.

the problem i had in the early 70s with location independent code ... was that i wanted to be able to page-map applications to files on disks ... and that multiple different virtual address spaces could share the same exact (r/o) page-mapped information.

the problem with the standard os/360 model was that when executable files out on disk were mapped into memory (virtual or real), the program loader had to run thru arbitrary locations in the executable image ... swizzling arbitrary address constants into different values (randomly spread thruout the executable image).

as a result, it wasn't just a simple matter of page mapping an executable image file (out on disk) into a virtual address space ... there was still all this address constant swizzling that had to be done before the program could actually start execution. Furthermore, the default was to touch every possible page that potentially contained an address constant (that conceivably might be used ... whether it actually was used or not) and do the appropriate swizzling operation. And even further complicating the process was that the swizzling operation went on within the specific virtual address space that had just mapped the executable image.

So the swizzling operation that would go on in each virtual address space ... pre-touched and changed an arbitrary number of virtual pages (whether the actual application execution would touch those specific pages or not) ... as well as marking the pages changed ... defeating the ability to have the program image mapped into r/o shared segments that were possibly common across a large number of different address spaces. The issue that i was trying to address wasn't what might be in the registers of each individual process context in each virtual address space .... it was trying to make the physical executable storage image of the application r/o shared concurrently across a large number of different address spaces. Any modification required on the contents of that executable image defeated the ability to have it r/o shared concurrently across a large number of different address spaces (as well as possibly prefetching and changing pages that might never actually be used).
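To make the cost concrete, here is a small sketch of load-time swizzling. It assumes 4-byte big-endian address constants and 4k pages, and the adcon offsets are invented for illustration; `load_with_swizzle` is a hypothetical name, not an os/360 routine.

```python
PAGE_SIZE = 4096

def load_with_swizzle(image, adcon_offsets, load_address):
    """Patch each 4-byte address constant; return (patched image, dirtied pages)."""
    patched = bytearray(image)
    dirty_pages = set()
    for off in adcon_offsets:
        value = int.from_bytes(patched[off:off + 4], "big")
        relocated = (value + load_address) & 0xFFFFFFFF
        patched[off:off + 4] = relocated.to_bytes(4, "big")
        # The page is touched and changed whether or not the program
        # ever executes anything on it.
        dirty_pages.add(off // PAGE_SIZE)
    return patched, dirty_pages

image = bytes(4 * PAGE_SIZE)            # stand-in for an executable image
adcons = [0x0010, 0x1800, 0x3FF0]       # hypothetical adcon locations
copy_a, dirty_a = load_with_swizzle(image, adcons, 0x20000)
copy_b, dirty_b = load_with_swizzle(image, adcons, 0x50000)
```

The two loaded copies differ byte for byte, so they cannot be the same r/o shared pages, and three of the four pages were modified before execution started.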

So that was the first problem.

A subset of the relocating shared segment implementation (that I had done in the early 70s) was picked up by the product group and released under the feature name DCSS (DisContiguous Shared Segments). I had done a page mapped file system along with the extended shared segment support ... so that it was relatively straight-forward to page map objects in the file system into virtual address spaces. The page mapped file system wasn't part of the subset of stuff that was picked up for DCSS.

They addressed the address constant swizzling problem by defining globally unique system addresses for every application that would reside in predefined virtual memory segments; loading the application at that predefined virtual memory location and saving the virtual address space image to a reserved portion of the system-wide paging system (in part because they had failed to pick up the page mapped filesystem enhancements).

So DCSS had a new kind of system command that would map a portion of a virtual address space to a presaved application image (and specify things like shared segments, etc).

So there were several limitations ... one, it required system wide coordination of the presaved applications as well as system privileges to set up stuff (i.e. individual department couldn't enable their own private applications).

A large installation would have more applications defined in this infrastructure than could fit in a single 16mbyte virtual address space ... and as a result, there had to be careful management of applications that were assigned conflicting predefined, preswizzled virtual addresses. While no single user was likely to try and map all possible applications into a single address space at the same moment ... it was possible that a single user might need to map an arbitrary combination of applications (in total less than 16mbytes), some of which may have conflicting, pre-assigned and swizzled virtual address images. As systems got bigger with wider variety of users and applications, the problem of pre-swizzled virtual application images with conflicting virtual address locations increased.

So the next solution was to have multiple pre-swizzled application images defined at multiple different virtual address locations. Users would decide on the combination of page image applications and libraries that they needed to have concurrently loaded ... and try and find some possible combination of the multiple different images of each application that could comfortably co-exist in the same address space.
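The search a user faced can be sketched as follows. The application names and address ranges are made up for illustration (half-open `(start, end)` ranges inside a 16mbyte space), and `find_layout` is a hypothetical helper, not a DCSS facility.

```python
from itertools import product

# Hypothetical catalog: each application has one or more pre-swizzled
# images, each saved at a fixed virtual address range.
variants = {
    "editor":   [(0x200000, 0x280000), (0x600000, 0x680000)],
    "compiler": [(0x240000, 0x340000), (0x800000, 0x900000)],
    "library":  [(0x600000, 0x640000)],
}

def overlaps(a, b):
    """True if two half-open address ranges intersect."""
    return a[0] < b[1] and b[0] < a[1]

def find_layout(wanted):
    """Try every combination of images; return the first conflict-free one."""
    names = list(wanted)
    for choice in product(*(variants[n] for n in names)):
        conflict = any(
            overlaps(x, y)
            for i, x in enumerate(choice)
            for y in choice[i + 1:]
        )
        if not conflict:
            return dict(zip(names, choice))
    return None  # no combination of the saved images can co-exist

layout = find_layout(["editor", "compiler", "library"])
```

With images that need no swizzling and can be mapped at any address, this search disappears entirely.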

The original implementation that i had done from the early 70s ... had allowed page mapping arbitrary files as virtual address space images at arbitrary virtual address locations .... w/o needing to swizzle address constants embedded in those page mapped images ... in part because I had done a lot of work on allowing address location independent executable images as well as page mapped filesystem enhancements ... and furthermore that an executable image could occupy a read-only shared segment ... where the exact same pages were mapped concurrently into multiple different virtual address spaces ... at possibly different virtual addresses.

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/

- Nick Maclaren

Fri, Jan 21, 2005 8:44 PM

Eh? Not at all. Every compiler that had an assembler interface also had a set of documented conventions, which people writing assembler had to follow. Not a problem.

We aren't talking about moving a task once it had been started, which I agree is and was very hard, but about position independent code. I.e. code that could be read into memory, anywhere, without relocation, and executed. It wasn't hard - just a bit tedious.

I can still describe how it was done, if you are interested.

Regards, Nick Maclaren.

- Stephen Fuld

Fri, Jan 21, 2005 9:19 PM

snip

Certainly on S/360, but not on other then-contemporary architectures. All that is needed is a protected "system base register" which is part of the program's context, to which the hardware adds all program addresses. Then all programs can start at zero and go to whatever, and all that is needed if the OS needs to move a program in mid execution is to change the base register.
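A toy model of that scheme, as a sketch only. The `limit` bound check is an added assumption (the description above mentions only the base register), and the class and method names are hypothetical.

```python
class Machine:
    """Toy memory where the hardware adds a protected base to every address."""

    def __init__(self, size):
        self.memory = [0] * size
        self.base = 0     # protected: changed only by the os_* methods
        self.limit = 0    # assumed companion bound, not from the description

    def load(self, addr):
        if not 0 <= addr < self.limit:
            raise IndexError("address outside program")
        return self.memory[self.base + addr]

    def store(self, addr, value):
        if not 0 <= addr < self.limit:
            raise IndexError("address outside program")
        self.memory[self.base + addr] = value

    def os_place(self, base, words):
        self.memory[base:base + len(words)] = words
        self.base, self.limit = base, len(words)

    def os_move(self, new_base):
        """Relocate mid-execution: copy the words, then change the base."""
        n = self.limit
        self.memory[new_base:new_base + n] = self.memory[self.base:self.base + n]
        self.base = new_base

m = Machine(1024)
m.os_place(100, [0, 20, 30])
m.store(0, 2)                 # the program saves a program-relative pointer
before = m.load(m.load(0))
m.os_move(500)                # the OS moves the program
after = m.load(m.load(0))     # the saved pointer still works
```

Every address the program holds, in registers or in memory, is program-relative, so nothing needs to be found and patched when the program moves.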

BTDT

--
 - Stephen Fuld
   e-mail address disguised to prevent spam

- pc

Fri, Jan 21, 2005 9:41 PM

i guess all those 'adcons' and the expectation that appl'n programs must anticipate interrupts were the real complication. the base-displacement scheme seemed clever to me, but BAL was my first language so maybe i just didn't know any better. didn't have much to do with Cobol, but i remember seeing these giant programs that were written for 360 DOS and depended on a bit of link-editor support. feature referred to as some kind of 'overlay'. i vaguely recall that they could thrash away for hours, doing the equivalent of an OS 'xctl' several times a second.

but i think the original idea must have been far-sighted for its time and probably under-exploited, just like Gene Amdahl's TRT. had a chance to meet him when he was in his seventies, still working and answering the phones when the receptionist wasn't around. passed it up partly because i thought he deserved more respect than listening to me ask why cpu's didn't have double-ended stacks. in spite of all the fooferah that got added by the operating systems, what he and others did still seems very clear and clean to me. not like all that little-endian Intel stuff which i gather has complicated everybody's life in unseen ways just so some 8080 programs could keep working.

a boss once made us read a business article about how complicated jet engines are when they don't really need to be if we forget about creature comforts for passengers and municipal bylaws. i guess 360's were a bit like this - you had to air condition them, but they also had heaters to warm up the core!

pc

- Anne & Lynn Wheeler

Fri, Jan 21, 2005 9:49 PM

so tss/360 had page mapped filesystem as standard and an objective that a portion of virtual address space could be mapped to the executable image out in the filesystem ... and that the mapping process didn't have to track thru arbitrary virtual pages of that executable image ... swizzling address constants. The system could do the mapping and start program execution (almost) immediately ... possibly w/o having to prefetch any of the executable image.

furthermore, if other address spaces were already sharing that executable image on a r/o basis ... the mapping process would just setup the segments so the different address spaces utilized the same virtual pages.

so tss/360 had system wide convention ... assembler, applications, compilers, etc ... that positioned any address constants that needed swizzling separately from the executable image. This separate table of address constants needing swizzling would be prefetched (as part of mapping the executable file to the virtual address space) and the necessary swizzle of address constants could be done in a private area for each specific virtual address space. There then were system wide conventions on how executable code accessed the table of (swizzled) address constants .... the executable code image could be exactly the same for all virtual address spaces ... and in fact, the same exact executable image could be the same exact virtual pages (using shared segments) .... but the swizzled address constants would be located in a (tss/360 system-wide convention) area separate from the executable program image.
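The separation can be sketched like this. The two-instruction "code image", the slot numbering, and the `AddressSpace` class are all hypothetical; the point is only that the shared image refers to addresses through a private table and is itself never modified.

```python
# Shared, read-only code image: it names external targets only by slot
# number in a per-address-space table of address constants.
CODE_IMAGE = (("call", 0), ("load", 1))   # hypothetical instructions

class AddressSpace:
    def __init__(self, code_base, relative_targets):
        self.code = CODE_IMAGE            # same object in every address space
        self.code_base = code_base
        # Private area, swizzled once at map time; the only part that
        # depends on where the image was mapped.
        self.adcons = [code_base + rel for rel in relative_targets]

    def run(self):
        """Resolve each instruction's operand through the private table."""
        return [(op, self.adcons[slot]) for op, slot in self.code]

space_a = AddressSpace(0x10000, [0x40, 0x80])
space_b = AddressSpace(0x70000, [0x40, 0x80])
trace_a = space_a.run()
trace_b = space_b.run()
```

Both address spaces execute the identical image at different virtual addresses; only the small private tables differ.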

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/

- glen herrmannsfeldt

Fri, Jan 21, 2005 10:58 PM

Stephen Fuld wrote: (snip)

But there is one thing that OS/360 does easily that is hard in other systems, which is what the LINK macro does: take another program, load it into the same address space, execute it (with the original still addressable) and then return.

I remember TOPS-10, where each program was loaded with virtual origin of 0. There was a system for sending files to the print queue which involved writing the entire address space to disk, running the QUEUE program with an option to execute a program when it was done, which would then reload and continue on.

Any system with a fixed load point can't have two programs in the same address space at the same time.

In the OS/360 case, besides doing LINK, many routines, especially access methods, were directly addressable somewhere in system space. Now, there are systems which reserve parts of the address space for system and parts for user, but that doesn't help LINK.

-- glen

- Anne & Lynn Wheeler

Fri, Jan 21, 2005 11:29 PM

basically you could do link-like function in both os/360 as well as tss/360. the primary difference was that os/360 had to read the executable file image into memory and then run the address constant swizzle against the address constants that were randomly sprinkled thruout the execution image.

in the tss/360 scenario ... you just had to memory map some portion of the virtual address space to the executable file image on disk (and let the paging operation fetch the pages as needed) ... and all the address constants needing swizzling were kept in a different structure.

os/360 had a single real address space orientation so that it was possible for multiple different processes to share the same executable image because they all shared the same (single, real) address space.

tss/360 had multiple virtual address spaces .... and for multiple different processes to share the same exact executable copy ... it relied on the shared segment infrastructure.

in the os/360 scenario, all program loading was to a unique real address ... since all processes shared the same, single real address space (there was no chance for program address conflict ... since each program could be assigned a unique real address as it was read into memory).

in the tss/360 scenario, different processes might load & populate their own virtual address space in different sequences ... potentially creating address assignment conflict if there was a requirement that each application have identically assigned virtual address across all virtual address spaces.

in the os/360 scenario ... if you had two different processes, the first LINKed app1, then app2, then app3, and finally app4 and the second LINKed app2, then app4, then app3, and finally app1 ... it all fell out in the wash since there was a single global real address space.

a difference between the os/360 and tss/360 operation, is that os/360 allowed all the address constants (needing position location swizzling) to be randomly sprinkled thruout the executable image. tss/360 collected address constants (needing swizzling) into a different structure.

both could do dynamic process location specific binding at program load time. however, in the tss/360 scenario ... different processes could share the same exact executable image at different address locations ... because the executable image was a separate structure from the structure of address constants (that required location specific swizzling).

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/

- Stephen Fuld

Fri, Jan 21, 2005 11:52 PM

In the interests of simplicity, I left out a few things. In the first implementation that I know about, there were two "base registers", typically one for instructions and one for data. There was an instruction to "link" to another chunk of code. The typical mechanism for things like access methods was to keep the same data area and link to the code, replacing the user program code temporarily. Thus the access method code could access the user program data just fine. When it was done with whatever it had to do, it executed the instruction to return to the user code and replace the base address with the original one. Later models added a second set of two base registers that allowed further flexibility for more complex schemes. Even later ones had more features that moved the architecture more toward a segment scheme (with up to 16 active segments out of something like 32K total available) that is totally independent of the (later introduced) paging mechanism.
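A sketch of the two-base-register "link", under stated assumptions: the saved-base stack and the names `Context`, `link` and `unlink` are invented for illustration and do not describe any particular machine's instruction set.

```python
class Context:
    """Program context with separate instruction and data base registers."""

    def __init__(self, ibase, dbase):
        self.ibase = ibase     # added to every instruction address
        self.dbase = dbase     # added to every data address
        self._saved = []       # assumed: where the old instruction base is kept

    def code_addr(self, addr):
        return self.ibase + addr

    def data_addr(self, addr):
        return self.dbase + addr

    def link(self, new_ibase):
        """Switch to another chunk of code, keeping the same data area."""
        self._saved.append(self.ibase)
        self.ibase = new_ibase

    def unlink(self):
        """Return to the caller's code by restoring its instruction base."""
        self.ibase = self._saved.pop()

ctx = Context(ibase=0x1000, dbase=0x8000)
user_buffer = ctx.data_addr(0x10)
ctx.link(0x3000)                       # enter the access method code
inside_code = ctx.code_addr(0)
inside_buffer = ctx.data_addr(0x10)    # same user data is still addressable
ctx.unlink()
back_code = ctx.code_addr(0)
```

Both chunks of code start at program address zero, yet the linked code sees the caller's data because only the instruction base changed.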

Yuck! That sounds awful. It seems far simpler for the program to write the file to disk and simply pass a file name to the printer writer. But I am sure there were other constraints that I just don't know about.

Actually it can, but it takes some doing, like the hardware I talked about above, or a convention about using different parts of the address space. With the extra hardware, it isn't much of a problem.

One extra advantage of the scheme I am talking about is that there is no need for the "address swizzling stuff" on load that was required by OS/360. That sped up program loads.

Right. See above.

--
 - Stephen Fuld
   e-mail address disguised to prevent spam

- Peter Flass

Sat, Jan 22, 2005 1:38 AM

I was going to say something like this, but then I reread Stephen's comments about the system-base-register being part of the program's context. Possibly he was thinking of the equivalent of LINK macro resetting the register for each call/return. Then, of course, it blows away any access to the other program's data.

- Peter Flass

Sat, Jan 22, 2005 1:39 AM

So why was TSS considered so slow? (not that OS/360 was any ball of speed).

- Anne & Lynn Wheeler

Sat, Jan 22, 2005 2:39 AM

lots of reasons ... long path lengths ... over complex algorithms ... bloated code size.

let's say you had os/360 (or cms) environment that did program load ... it could queue up a single i/o that read application into memory in multiple 64k chunks ... all batched together.

tss/360 could have a large compiler laid out as 1mbyte memory mapped file ... do a memory mapping for 256 4k pages ... and then possibly individually page fault each 4k page one at a time. To some extent they got over-enamored with the concept of one-level store and paging and lost sight of the fact that the purity of the concept could significantly increase the latency if all transfers had to be serialized 4k bytes at a time (as opposed to doing program loading batching in larger transfer units).
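The transfer counts behind that comparison, as a quick check. Only the 1mbyte image, 4k page and 64k chunk sizes come from the post; the helper name is hypothetical and no timing figures are assumed.

```python
def transfers(image_bytes, unit_bytes):
    """Number of transfer units needed to bring in the whole image."""
    return -(-image_bytes // unit_bytes)   # ceiling division

IMAGE_BYTES = 1024 * 1024

# tss/360 style: each 4k page may be faulted in separately, one at a time.
serialized_page_faults = transfers(IMAGE_BYTES, 4 * 1024)

# os/360 or cms style: 64k chunks, all queued together as a single i/o.
batched_chunks = transfers(IMAGE_BYTES, 64 * 1024)

ratio = serialized_page_faults // batched_chunks
```

In the worst case the paged load waits on 256 separate transfers, each incurring its own latency, where the batched load moves 16 chunks in one queued operation.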

kernel was really large ... tss/360 supposedly was going to have target of 512k 360/67 ... but quite quickly grew to minimum 768k storage because of bloated fixed kernel. at one point there was a statement about a set of benchmarks done on a 1mbyte 360/67 uniprocessor and on a 2mbyte 360/67 two processor machine ... with the 2mbyte/2processor having 3.8 times the thruput of the single processor benchmark. the official comment was (while the uniprocessor system was really slow), the two processor benchmark having 3.8 times the thruput of the single processor benchmark demonstrated how advanced the tss/360 multiprocessor algorithms were (getting almost four times the thruput with only twice the hardware). the actual explanation was that the kernel was so bloated that it could hardly fit in 1mbyte configuration .... and that in the 2mbyte configuration there was almost enuf memory (not in use by the kernel) for running applications.

benchmark at the university on 768k 360/67 running tss/360 .. i believe prerelease 0.68 with four emulated users doing mix-mode fortran edit, compile and execute ... had multi-second response for trivial program edit line input. at the same time, on the same hardware with cp/67 running same mix-mode fortran edit, compile and execute had subsecond response for trivial edit line input .... but running 30 concurrent cms users ... compared to 4 tss users (although i had already done a lot of cp/67 performance enhancements).

there was folklore that when tss/360 was decommitted and the development group reduced from 1200 people to possibly 20 ... that a single person now had responsibility for the tss/360 scheduler and a large number of other modules. supposedly the person discovered that on a pass thru the kernel ... every involved kernel module was repeatedly calling the scheduler ... when it turned out that it was only necessary to call the scheduler once per kernel call (rather than every module calling the scheduler resulting in multiple scheduler calls per kernel call). fixing that is claimed to have eliminated a million(?) instructions per kernel call.

at some point in the tss/370 life ... it had been relatively stable for a long time ... with a lot of work over the years on performance tweaking ... they claimed that they got the total pathlength to handle page fault (page fault, page replacement select, schedule page read, task switch, page read complete, task switch, etc) down to maybe five times better than MVS ... but still five times longer than my pathlength for equivalent sequence in vm/370.

it was in this era (late 70s) that they did the unix project for at&t ... where a kernel semantics interface was built on low-level tss/370 kernel function to support high-level unix environment. i believe this tss/370 environment for unix was called ssup.

one of the early tss/360 features that i gave them trouble about was the whole thing about interactive vis-a-vis batch. early 360/67 had 2301 "drum" for limited amount of high-speed paging space and the rest was 2311 (very slow) disks. the scenario went something like ... if someone finished a line in edit and hit return (on 2741) ... tss kernel would recognize it as interactive and pull the pages that the task had been using the previous time from 2311 into memory and then write them all out to 2301 ... once that was done it would start the task ... which would allow it to page fault the pages back into memory from the 2301. when the trivial interactive task went into wait (for more terminal input), the kernel would pull all of the task's pages off the 2301 back into memory and move them to 2311. This dragging of pages off 2311 into memory and out to 2301 and then dragging the pages off 2301 into memory and out to 2311 would occur regardless of whether there was any contention for either 2301 space or real memory space. It was a neat, fancy algorithm and they would do it every time ... whether it was necessary or not ... just because it was so neat(?).

... there is a lot more ... but that should be enuf for the moment.

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/

- jmfbahciv

Sat, Jan 22, 2005 1:15 PM

It can't be moved while executing because the monitor is the one whose code is getting executed to do that shuffling task.

/BAH

Subtract a hundred and four for e-mail.

- jmfbahciv

Sat, Jan 22, 2005 1:23 PM


Nope. Glen is misremembering. There was a way to "dump" your core to disk and then our spooling feature would send it to the print queue. This often happened, under our batch system. But I don't know a single person who ever did this command on purpose :-).

Glen was talking about a feature of the -10 in that all files with a LPT extension, as in FOO.LPT, would automatically be queued when the job logged out.

/BAH

Subtract a hundred and four for e-mail.

- Nick Maclaren

Sat, Jan 22, 2005 1:45 PM

Eh? You are confusing two things. If that refers to physical memory, then that is just virtual memory support. Not a problem.

If it refers to virtual memory, it won't work for any program that takes the address of code and needs to compare it with others. In C, for example, you can take the address of functions and manipulate them ad lib. You can't relocate them within virtual memory thereafter.

Yes. It is useful and can be (and has been) implemented even in variants of Unix. It needs no extra hardware support.

All numbers of base registers from 1 to 3 were common, and more were not unknown. If I recall, I once implemented a language system with over 3, but can't now remember exactly why. Something like the data for the program and run-time system being separate.

Regards, Nick Maclaren.

- Stephen Fuld

Sun, Jan 23, 2005 7:37 AM

Well, it wasn't virtual memory in the conventional sense. There was no paging, and you couldn't define a single program larger than the physical memory (without doing the link type stuff). In a sense, it was "inverted" virtual memory, or perhaps "virtual addresses", in that whereas in virtual memory you could have addresses larger than physical memory that were mapped to the same physical address at different times, in this scheme you had multiple pieces of code with the same "virtual" address mapped to the same physical address at different times.

Remember, the base register was invisible to the program. If a program loaded an address into a register, or passed it as an argument, it was always a program relative address. If the address was used to address physical memory (either via a load/store or a jump/call) the hardware added the base register contents to it at that time.

--
 - Stephen Fuld
   e-mail address disguised to prevent spam

- glen herrmannsfeldt

Sun, Jan 23, 2005 8:28 PM

(snip of S/360 and base displacement addressing)

Did you ever write C programs for the original Macintosh? (I didn't, either, but I read some of the manuals.)

Many routines would, instead of pointers, return pointers to pointers such that the OS could move things around when necessary. (The 68000 didn't have an MMU.) There were complicated rules on the use of pointers because you had to know when the OS could move things and make sure you didn't keep a pointer to something that could be moved.
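The pointer-to-pointer idea can be modelled in a few lines. This is a sketch of the general technique only: the `Heap` class and its methods are hypothetical and are not the Macintosh Memory Manager API.

```python
class Heap:
    """Toy relocating heap: callers hold handles, not raw pointers."""

    def __init__(self):
        self.blocks = []    # storage in address order
        self.master = []    # master pointers: handle -> current position

    def new_handle(self, payload):
        self.blocks.append(payload)
        self.master.append(len(self.blocks) - 1)
        return len(self.master) - 1

    def deref(self, handle):
        """Raw pointer; only valid until the next compaction."""
        return self.master[handle]

    def read(self, pointer):
        return self.blocks[pointer]

    def free(self, handle):
        self.blocks[self.master[handle]] = None
        self.master[handle] = None

    def compact(self):
        """Slide live blocks down and fix up the master pointers."""
        new_blocks = []
        for handle, pointer in enumerate(self.master):
            if pointer is not None:
                new_blocks.append(self.blocks[pointer])
                self.master[handle] = len(new_blocks) - 1
        self.blocks = new_blocks

heap = Heap()
h1 = heap.new_handle("first")
h2 = heap.new_handle("second")
stale = heap.deref(h2)      # a raw pointer kept across a call that can move things
heap.free(h1)
heap.compact()              # the "OS" moves the surviving block
fresh = heap.deref(h2)
value = heap.read(fresh)
```

The handle survives the move because the system knows where every master pointer is; the raw pointer saved earlier does not, which is the rule the manuals were warning about.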

With OS/360 once you load an address into a register that object can't move because the system doesn't know to change the register.

-- glen