What are the most conservative, *practical* assumptions I can make living in x86 real mode: TEXT of 64K and a (disjoint) DATA of
64K? BSS in its own segment, or shared with DATA?
And, how seamlessly will the compiler let me *implicitly* move things around within and between those segments? (e.g., practical limitations on code/data sizes). The "PC" handled 640K, so should I expect that to be the size of my playground?
[Presumably, any "object" is constrained to fit within a single segment]
I imagine this will all be accomplished in the linkage editor (not visible to the source code).
So, the "most conservative" would be to assume a tiny-ish model -- 64K TOTAL address space? (with everything residing therein)
[I don't really care what it is, just need to know the constraints before I settle on a design. E.g., I surely wouldn't use int's if values would fit in char's and every byte of data "cost" me a byte of code!!]
It should, I think, be possible to have DS, CS and SS each pointing to a different 64 KB segment. But .bss and .data would both be part of the data segment. Access to other segments (including pointers to stack data, if SS is not the same as DS) is via "far pointers".
I believe in real mode you can get 1 MB of address space total. I am not sure if things like video cards get mapped into that space at all, or are only accessible with I/O instructions.
It doesn't have to be - but it is going to be a little complex to access objects that span 64K segments.
Yes, x86 was screwed up the day it arrived on the market.
But the real question, which you knew someone was bound to ask, is /why/ are you using real mode on the x86? The chips boot in real mode, but you would normally jump to protected mode after a few instructions to set it up (unless you really are using an 8086 processor).
I believe I read somewhere that in modern x86 chips, the cache can be configured to work as ram (basically, you have a write-back cache but disable writing back to external memory), so that if you are working with x86 code this small, you don't need any external memory.
Typically compilers would offer several memory models you could use. One code (text) and one data segment was "small" model. You also had large (code and data could both be more than one segment, but no object could be bigger than a segment), and medium (one data, multiple code) was very common. Rarely did you see compact (one code, multiple data). The pointer sizes obviously varied between the models, and the larger pointers had performance impacts.
There would be some limitations even with the large data models (beyond the single-object size limit). Invariably your stack could still not exceed 64KB, and some compilers insisted on putting some data (and usually the stack as well) in one "primary" data segment (usually called DGROUP), and it was occasionally possible to run out of space there if you had masses of initialized storage (you could usually tell the compiler to put initialized items elsewhere).
Usually the compiler would include libraries compiled for each model, so all would work pretty much as expected.
Typically you'd select the memory model with a command line switch.
Some folks also supported "huge" model, which was like large, except objects bigger than 64K were supported (the generated code was often painful, but it worked). Tiny existed as well, which was the old single segment (".com") model, although that was more a mangling of small model in most cases.
In most cases you could explicitly deal with larger pointers in code, for example by declaring a pointer as "far". Although that would usually have library issues that you'd have to manually deal with (for example, a program compiled medium model would have nominal small data pointers, and while you could allocate additional "far" data, and access it via far pointers, you couldn't pass those addresses to a "normal" library function like strcpy(), which, in medium mode, would be expecting only small pointers).
But most serious compilers, and pretty much all C compilers, support(ed) large model just fine, and you basically see no oddities accessing the entire 640KB (or more, in 16-bit protected mode), although there is a performance hit. Nor were there usually any oddities with the other memory models (other than sometimes having different size code and data pointers). You did see some stuff if you built a mixed model application, where, as I mentioned, you'd have explicit "far"s in your code. If you *don't* do mixed model, it's just another platform.
One question is whether your target processor is 8086/88/186/188 or something bigger. The former wrap addresses at 1 MiB, so the old PC/AT trick to address 65520 bytes above 1048576 works only on the bigger processors.
Some older 8086 code (including the base PC BIOS) used small-model segmenting, with ROM just below 1 MiB and RAM at 0. This is why there is a thing called the A20 gate in the PC/AT and derivatives.
It depends on the compiler and on the memory model you selected when invoking the compiler.
Real-mode compilers usually had at least 4 memory models, sometimes more. It was a _long_ time ago, but I think the last time I used the Microsoft DOS C compiler there were 5:

  tiny, small, medium, large, huge

I think tiny meant everything was in the same 64K segment (all segment registers were equal and never changed). That may have only been used for .com files rather than .exe files. All pointers were 16 bits.
IIRC, small meant there was a single 64K text segment, and a single 64K data segment. All pointers were 16 bits.
In medium, I think each function (or file?) had its own text segment, but there was still a single 64K data segment (pointer to function was 32 bits, pointer to data was 16 bits). Or was that large.
In huge, every function (or file?) had its own text segment and its own data segment. All pointers were 32 bits.
Or something like that.
And then there was paging/overlay support you could add into the mix.
-- Grant Edwards
Depends on how you want to be "most conservative".
TINY model means you can effectively ignore segment registers and work with 16-bit flat addresses. Of course this is an assumption that doesn't hold everywhere. Being "conservative" from this standpoint would mean you have to assume that every object is in a different segment, i.e. LARGE / HUGE model.
The smallest would be putting everything into a single 64 KiB segment. That would be equivalent to a 1960s/70s minicomputer.
Putting 64 KiB of TEXT in one segment and DATA+BSS in another 64 KiB would be equivalent to the split code/data address spaces of some minicomputers. Some had issues with self-modifying code, but unless you did something exotic, there should not have been many issues even on the x86.
Those small models could easily be programmed in assembler, but in the larger data models the segment overrides and long pointers made assembly programming really ugly, so it is understandable that high-level language support was developed.
Actually, Intel did not intend the 8086 real mode to be used as such, only as a bootstrap mode for the 80286. The success of the 8088 and the original PC came as a surprise, and it influenced the design of the 80386 with its virtual 8086 mode.
There were no PCs at the design time of the 8086 and 80286, and the aim was well-protected real-time multitasking applications. So it was natural that the 80286 does not voluntarily return to real mode from protected mode. This led to a massive kludge in the PC/AT, where the keyboard controller can reset the main processor and the RTC CMOS RAM contains a code explaining the reason for that particular reset.
If you use the tiny (64K code+data) or small (64K code, 64K data) model, .bss and stack share space with your data. If you choose a model with multiple data segments, .bss and stack can be separate (but depending on the tool you may have to ask for it explicitly).
The linker determines where your segments get placed (assuming the OS permits the requested placement).
In the "huge" model arrays and structs can straddle segments. However, you should take care that no individual element of an array or struct lies across a segment boundary ... I have found that many compilers don't renormalize intermediate pointers when indexing from a huge pointer, so you can get into trouble when the offset is close to or exceeds 64K. I've been bitten by that more times than I care to admit. [The solution is to deliberately construct a new huge pointer. When you store the new pointer it will be normalized and thus safe to use.]
Arithmetic on "far" pointers affects only the 16-bit offset and wraps at the segment boundary. This means that two far pointers having different segments can't reasonably be compared. If you are using multiple data segments and you need to compare pointers, you have to use normalized "huge" pointers. Do be aware that huge pointer arithmetic can be quite a bit slower due to (re)normalization of the results.
You really only need to choose the huge model if you expect a single array or struct to exceed 64K. If you need huge pointers purely for comparison, they can be mixed with far pointers in any of the multiple data segment models. Because far pointer arithmetic is faster, they should be preferred wherever you can live with their limitations.
Depends on the tool - some compilers use pragmas to control placement of objects into particular segments. Placement of the segments themselves is controlled by the linker (and/or OS).
It isn't bad until you grow beyond 64K data and are forced into explicit use of far or huge pointers. There's a model that allows