What are the most conservative, *practical* assumptions I can make living in x86 real mode: TEXT of 64K and a (disjoint) DATA of
64K? BSS in its own segment, or shared with DATA?
And, how seamlessly will the compiler let me *implicitly* move things around within and between those segments? (e.g., practical limitations on code/data sizes). The "PC" handled 640K, so should I expect that to be the size of my playground?
[Presumably, any "object" is constrained to fit within a single segment]
I imagine this will all be accomplished in the linkage editor (not visible to the source code).
So, the "most conservative" would be to assume a tiny-ish model -- 64K TOTAL address space? (with everything residing therein)
[I don't really care what it is, just need to know the constraints before I settle on a design. E.g., I surely wouldn't use int's if values would fit in char's and every byte of data "cost" me a byte of code!!]
It should, I think, be possible to have DS, CS and SS each pointing to a different 64 KB segment. But .bss and .data would both be part of the data segment. Access to other segments (including pointers to stack data, if SS is not the same as DS) is via "far pointers".
I believe in real mode you can get 1 MB of address space total. I am not sure if things like video cards get mapped into that space at all, or are only accessible with I/O instructions.
It doesn't have to be - but it is going to be a little complex to access objects that span 64K segments.
Yes, x86 was screwed up the day it arrived on the market.
But the real question, which you knew someone was bound to ask, is /why/ are you using real mode on the x86? The chips boot in real mode, but you would normally jump to protected mode after a few instructions to set it up (unless you really are using an 8086 processor).
I believe I read somewhere that in modern x86 chips, the cache can be configured to work as ram (basically, you have a write-back cache but disable writing back to external memory), so that if you are working with x86 code this small, you don't need any external memory.
Typically compilers would offer several memory models you could use. One code (text) and one data segment was "small" model. You also had large (code and data could both be more than one segment, but no object could be bigger than a segment), and medium (one data, multiple code) was very common. Rarely did you see compact (one code, multiple data). The pointer sizes obviously varied between the models, and the larger pointers had performance impacts.
There would be some limitations even with the large data models (beyond the single-object size limit). Invariably your stack could still not exceed 64KB, and some compilers insisted on putting some data (and usually the stack as well) in one "primary" data segment (usually called DGROUP), and it was occasionally possible to run out of space there if you had masses of initialized storage (you could usually tell the compiler to put initialized items elsewhere).
Usually the compiler would include libraries compiled for each model, so all would work pretty much as expected.
Typically you'd select the memory model with a command line switch.
Some folks also supported "huge" model, which was like large, except objects bigger than 64K were supported (the generated code was often painful, but it worked). Tiny existed as well, which was the old single segment (".com") model, although that was more a mangling of small model in most cases.
In most cases you could explicitly deal with larger pointers in code, for example by declaring a pointer as "far". Although that would usually have library issues that you'd have to manually deal with (for example, a program compiled medium model would have nominal small data pointers, and while you could allocate additional "far" data, and access it via far pointers, you couldn't pass those addresses to a "normal" library function like strcpy(), which, in medium mode, would be expecting only small pointers).
But most serious compilers, and pretty much all C compilers, support(ed) large model just fine, and you basically see no oddities accessing the entire 640KB (or more, in 16-bit protected mode), although there is a performance hit. Nor were there usually any oddities with the other memory models (other than sometimes having different size code and data pointers). You did see some stuff if you built a mixed model application, where, as I mentioned, you'd have explicit "far"s in your code. If you *don't* do mixed model, it's just another platform.
One question is whether your target processor is 8086/88/186/188 or something bigger. The former wrap addresses at 1 MiB, so the old PC/AT trick to address 65520 bytes above 1048576 works only on the bigger processors.
Some older 8086 code (including the base PC BIOS) used small-model segmenting, with ROM just below 1 MiB and RAM at 0. This is why there is a thing called the A20 gate in the PC/AT and derivatives.
It depends on the compiler and on the memory model you selected when invoking the compiler.
Real-mode compilers usually had at least 4 memory models, sometimes more. It was a _long_ time ago, but I think the last time I used the Microsoft DOS C compiler there were 5:

  tiny, small, medium, large, huge

I think tiny meant everything was in the same 64K segment (all segment registers were equal and never changed). That may have only been used for .com files rather than .exe files. All pointers were 16 bits.
IIRC, small meant there was a single 64K text segment, and a single 64K data segment. All pointers were 16 bits.
In medium, I think each function (or file?) had its own text segment, but there was still a single 64K data segment (pointer to function was 32 bits, pointer to data was 16 bits). Or was that large.
In huge, every function (or file?) had its own text segment and its own data segment. All pointers were 32 bits.
Or something like that.
And then there was paging/overlay support you could add into the mix.
-- Grant Edwards
Depends on how you want to be "most conservative".
TINY model means you can effectively ignore segment registers and work with 16-bit flat addresses. Of course this is an assumption that doesn't hold everywhere. Being "conservative" from this standpoint would mean you have to assume that every object is in a different segment, i.e. LARGE / HUGE model.
The smallest would be putting everything into a single 64 KiB segment. That would be equivalent to a 1960s/70s minicomputer.
Putting 64 KiB of TEXT in one segment and DATA+BSS in another 64 KiB would be equivalent to the split code/data address spaces of some minicomputers. Some had issues with self-modifying code, but unless you did something exotic, there should not have been many issues even on the x86.
Those small models could easily be programmed in assembler, but in the larger data models the segment overrides and long pointers made assembly programming really ugly, so it is understandable that high-level language support was developed.
Actually, Intel did not intend the 8086 real mode to be used as such, only as a bootstrap mode for the 80286. The success of the 8088 and the original PC came as a surprise, and it influenced the design of the 80386 with its virtual 8086 mode.
There were no PCs at the design time of the 8086 and 80286, and the aim was well-protected real-time multitasking applications. So it was natural that the 80286 does not voluntarily return to real mode from protected mode. This led to a massive kludge in the PC/AT, where the keyboard controller can reset the main processor and the RTC CMOS RAM contains a code explaining the reason for that particular reset.
If you use the tiny (64K code+data) or small (64K code, 64K data) model, .bss and stack share space with your data. If you choose a model with multiple data segments, .bss and stack can be separate (but depending on the tool you may have to ask for it explicitly).
The linker determines where your segments get placed (assuming the OS permits the requested placement).
In the "huge" model arrays and structs can straddle segments. However, you should take care that no individual element of an array or struct lies across a segment boundary ... I have found that many compilers don't renormalize intermediate pointers when indexing from a huge pointer, so you can get into trouble when the offset is close to or exceeds 64K. I've been bitten by that more times than I care to admit. [The solution is to deliberately construct a new huge pointer. When you store the new pointer it will be normalized and thus safe to use.]
Arithmetic on "far" pointers affects only the 16-bit offset and wraps at the segment boundary. This means that two far pointers having different segments can't reasonably be compared. If you are using multiple data segments and you need to compare pointers, you have to use normalized "huge" pointers. Do be aware that huge pointer arithmetic can be quite a bit slower due to (re)normalization of the results.
You really only need to choose the huge model if you expect a single array or struct to exceed 64K. If you need huge pointers purely for comparison, they can be mixed with far pointers in any of the multiple data segment models. Because far pointer arithmetic is faster, they should be preferred wherever you can live with their limitations.
Depends on the tool - some compilers use pragmas to control placement of objects into particular segments. Placement of the segments themselves is controlled by the linker (and/or OS).
It isn't bad until you grow beyond 64K data and are forced into explicit use of far or huge pointers. There's a model that allows