Direction of Stack Growth

karthikbalaguru · 2007-10-21T11:44:30+00:00

Hi,

Why do some processors have the stack growing downwards while others have it growing upwards? Are there any advantages or disadvantages to either design? Which is the better model? I searched the internet, but I did not find a good link that explains this in detail.

Thanks in advance,
Karthik Balaguru


Steve Underwood 18 years ago

Of course it has a louder voice in the computer world, if it goes to 11. :-)

Steve


Paul Keinanen 18 years ago

Little endian addressing is a good idea, if the bus width is less than the address size. You can perform the effective address calculations on the LSB (and generate the carry from that calculation) before the MSB is loaded.

In big endian systems with a narrow bus, you must first load both the MSB and LSB, before you can start calculating the effective address, thus, the calculation is slower or you need much more carry-lookahead gates to perform the effective address calculation swiftly.
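A minimal sketch of the argument above, assuming an 8-bit bus, a 16-bit base address and a 16-bit offset fetched one byte per bus cycle. The function names and the list used as memory are illustrative only and do not model any particular CPU.

```python
def add_offset_little_endian(base, memory, addr):
    """Add the 16-bit offset stored little-endian at memory[addr] to base.

    Bytes are consumed in fetch order, one per bus cycle (8-bit bus assumed).
    """
    lo = memory[addr]                      # cycle 1: LSB arrives
    lo_sum = (base & 0xFF) + lo            # low half can be added immediately
    carry = lo_sum >> 8                    # carry is ready before the MSB arrives
    hi = memory[addr + 1]                  # cycle 2: MSB arrives
    hi_sum = (base >> 8) + hi + carry      # only an 8-bit add remains
    return ((hi_sum & 0xFF) << 8) | (lo_sum & 0xFF)


def add_offset_big_endian(base, memory, addr):
    """Same addition with the offset stored big-endian at memory[addr]."""
    hi = memory[addr]                      # cycle 1: MSB arrives, carry still unknown
    lo = memory[addr + 1]                  # cycle 2: LSB arrives
    lo_sum = (base & 0xFF) + lo            # both halves are added only now
    hi_sum = (base >> 8) + hi + (lo_sum >> 8)
    return ((hi_sum & 0xFF) << 8) | (lo_sum & 0xFF)


base = 0x12F0
print(hex(add_offset_little_endian(base, [0x20, 0x01], 0)))
print(hex(add_offset_big_endian(base, [0x01, 0x20], 0)))
```

Both orders give the same result; the difference is only in how much of the addition can overlap with the second fetch.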

The upper 4 kW (8 KiB) was reserved for the I/O page (memory mapped I/O). At least the RSX-11 linker (TKB) reserved 512 words (or was it 512 bytes) of stack space in the low addresses. This convention was also used by Fortran IV and Fortran IV+.

Some sissy language implementations such as Pascal or C might have used the software stack in addresses above the code (but below the 8 KiB I/O page).

Paul


John Ahlstrom 18 years ago

Or leave it to the ethologists. Why do Teco, Emacs, and VI have disproportionate influence on people?

The baby duck syndrome.

JKA

How do you tighten a screw? Turn it until the thread strips and then back it off a quarter of a turn?


Paul Keinanen 18 years ago

How on earth would you fit a pre-increment mode into the instruction map of the PDP-11 processor ???

While it might be possible to replace the post-autoincrement with pre-increment mode and pre-decrement with post-decrement modes, by doing some considerable wire wrap work on the CPU boards or doing a complete rewrite of the microcode, what is the point ??

Anyway, after doing this, none of the original PDP-11 programs would run on that processor.

Paul


Paul Keinanen 18 years ago

While I am a youngster, with only vacuum tube experience with audio and RF systems, but no logic vacuum tube systems, it would be interesting to design a usable computer using vacuum tubes (and possibly some semiconductor rectifiers).

After all, we currently know something about how to design computers (able to execute usable programs) that the people in the 1940's did not know about.

Paul


Jerry Avins 18 years ago

...

My early lab programs ran on a timeshare-for-hire PDP-6 that we reached by modem. Eventually I had a 9600-baud almost-half-duplex modem. I eventually rigged the 50-baud reverse channel to the keyboard of the 35ASR to get around the time delay for turning the line around. The tape reader ran at 110 baud, so I had to turn the line around for that.

Jerry

Engineering is the art of making what you want from things you can get. ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯


Jerry Avins 18 years ago

You don't want to work with a tube computer. Harvard's, which occupied an entire building, wasn't much better than an 8031. It was possible to stand outside it in the dead of winter and bask in the 60-degree cooling air that issued from gratings in the sidewalk. (In summer, it was best to stay away.) I don't think you can get Special Red tubes any more. With ordinary tubes that have maybe a 2000-hour MTBF, in a computer with 10,000 tubes -- fewer active devices than any pocket calculator -- the up time is tens of minutes.
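The up-time figure can be checked with a rough calculation. This sketch assumes the tubes fail independently at a constant rate and that any single failure stops the machine, so the failure rates simply add; the function name is illustrative.

```python
def system_mtbf_hours(tube_mtbf_hours, tube_count):
    """MTBF of a machine that stops when any one tube fails.

    Assumes independent tubes with a constant failure rate, so the
    system failure rate is tube_count times the per-tube rate.
    """
    return tube_mtbf_hours / tube_count


mtbf = system_mtbf_hours(2000, 10_000)
print(f"{mtbf * 60:.0f} minutes between failures on average")
```

Under those assumptions, 2000 hours divided by 10,000 tubes is 0.2 hours, or about 12 minutes, which is the same order of magnitude as the figure quoted above.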

Jerry

Engineering is the art of making what you want from things you can get. ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯


Everett M. Greene 18 years ago

(snip)

I'd forgotten about those machines when I said something about some processors only being able to use positive index offsets. The negative indexing was the origin of the BES symbol?


Terje Mathisen 18 years ago

Please!

I suggest you do a quick google of my name, or maybe browse some of my comp.arch posts over the last 10-15 years.

I've written code to work around the Pentium FDIV bug, this got included in pretty much every compiler on the market for a few years.

I've also written asm code for one of the AES contenders, as well as doing optimization work on things like the first ever software DVD player. You'll also find my name in the original Quake manual.

On your end, I assume you're the Steve Underwood who's worked on various telephony open source projects?

I once wrote, mostly in hex and without a computer, over an Easter break, a program to encode arbitrary binary data as an executable ascii file, using nothing but the 70+ chars blessed by the MIME standard as being transparent across all email gateways.

This should be in the same ballpark as Mel's card-playing program, right?

It uses the minimum possible amount of self-modification (a single two-byte backwards branch instruction), it survives almost all forms of paragraph reformatting, and it uses Base-64 encoding of the payload with the smallest decoder I have ever seen. (It is significantly better than the "best" virus code I've disassembled.)

Here are the two lines that contain the first two bootstrap stages; check it out in a 16-bit debugger:

ZRYPQIQDYLRQRQRRAQX,2,NPPa,R0Gc,.0Gd,PPu.F2,QX=0+r+E=0=tG0-Ju E=
EE(-(-GNEEEEEEEEEEEEEEEF 5BBEEYQEEEE=DU.COM=======(c)TMathisen95

Terje

PS. I've won two international code optimization competitions, and ended up in second or third place a couple of times.

- "almost all programming can be viewed as an exercise in caching"


glen herrmannsfeldt 18 years ago

Paul Keinanen wrote: (snip)

This might reasonably be true for processors like the 8080 or 6502 that were limited by transistor count. For addresses that are part of the instruction or for autoincrement addressing modes, you can do the add while incrementing the PC or index register. For addresses in memory, it isn't that much harder to load the higher byte first and continue down in memory. (Consider the subject of this thread.)

For any machine big enough to have a multiply instruction, even more for a divide instruction, one needs to address memory (or registers) repeatedly and most of the advantage is gone.

Note that S/360 was implemented on processors with an eight bit ALU and eight bit addressable memory (the 360/30), yet they went for big-endian. It is just a small change in the microcode when addressing memory. (Especially if you don't allow unaligned access.)

Those who think big tend to use big-endian, those who think small tend to use little-endian (for the reason you say).

-- glen


Dennis Ritchie 18 years ago

Yes, it was.

Dennis


Tom Knight 18 years ago

Yes, but even on the 7094 mod II one could execute the instruction "Enter Multiple Tag Mode" which would carefully replicate this bizarre behavior for compatibility.

I don't think this was a matter of coming to their senses -- it was simply far too expensive to build 7 registers at the time of the 704. The ORing behavior was also a direct result of the implementation of the registers, in which the gating of the data onto a common bus with an OR rewrote the flip flops of the register. It was not an intentional design "feature". This is similar to the PDP-4 and PDP-7 opcode which shifted both directions at once, resulting in the OR of the two shifted results, for similar accidental implementation reasons.


Paul Keinanen 18 years ago

While one or two exclusive-or gates on the LSBit line(s) would be sufficient to perform the out of order fetch if unaligned access is not allowed, things get nasty if unaligned access is allowed.

Assuming variable length instructions from 1 up to N bytes long (including immediate data or offsets/addresses), the constant can be on any byte boundary, and hence a full adder would have to be used to perform the out of sequence fetch. The alternative is to step the PC rapidly in increments of one and store all the intermediate results in temporary registers until the PC has reached the last byte of the constant; the address of the LSByte is then sent out, followed by the addresses stored in the temporary registers to access the MSByte(s).
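A small sketch of the difference, assuming 16-bit operands and a 16-bit address space. The two function names are made up for illustration: the first stands for the single XOR gate on address bit 0, the second for the full incrementer needed once operands may start at any byte.

```python
def partner_byte_aligned(addr):
    """Other byte of an aligned 16-bit operand: flip address bit 0 (one XOR gate)."""
    return addr ^ 1


def partner_byte_unaligned(addr):
    """Second byte of an operand at any byte address: needs a full increment."""
    return (addr + 1) & 0xFFFF


# Aligned operand at 0x1000: XOR and increment agree.
print(hex(partner_byte_aligned(0x1000)), hex(partner_byte_unaligned(0x1000)))

# Operand starting at odd address 0x10FF: XOR points back at 0x10FE,
# while the real second byte is at 0x1100 (the carry ripples into the high byte).
print(hex(partner_byte_aligned(0x10FF)), hex(partner_byte_unaligned(0x10FF)))
```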

Paul


gmpundlik 18 years ago


Well, hardware stacks can grow whatever direction they want to, and so do stacks on Harvard machines like the 8051. But for stacks placed in system memory, IMO it is better to have a downward stack, especially for smaller processors with smaller memory spaces, for these reasons:

1. You can init the stack pointer to the top of RAM (or of the available segment) and avail as much memory as the growing stack pointer allows, meaning that if your program does not use the stack much, you have more contiguous memory available.
2. Using an upward stack, you put an absolute limit on usable memory for your program, while the stack may not even use all the memory allocated for it.
3. While working with tiny memory model programs on DOS, I used to put all the initialisation procedures, which aren't needed once the program has started, at the program end. This would be a buffer zone: I would actually allow the stack to encroach on program memory that would otherwise have held valid and required code.
4. Stack overrun problems are reduced since memory is always available downwards, whereas with an upward stack, overruns might get into trouble more often.
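The first two points can be illustrated with a toy model. This is only a sketch: the class and method names are hypothetical, and it assumes one RAM region in which program data grows up from address 0 while the stack grows down from the top, so neither side has a fixed limit.

```python
class SharedRam:
    """RAM shared by data growing up from the bottom and a stack growing down from the top."""

    def __init__(self, size):
        self.size = size
        self.data_top = 0      # first free address above the data area
        self.sp = size         # empty descending stack: SP starts just past the top of RAM

    def free_bytes(self):
        return self.sp - self.data_top

    def alloc_data(self, n):
        if n > self.free_bytes():
            raise MemoryError("data area would run into the stack")
        start = self.data_top
        self.data_top += n
        return start

    def push(self, n=1):
        if n > self.free_bytes():
            raise MemoryError("stack would run into the data area")
        self.sp -= n
        return self.sp


ram = SharedRam(1024)
ram.alloc_data(700)    # a data-heavy program...
ram.push(300)          # ...can still use the remaining space for its stack
print(ram.free_bytes())
```

With a fixed split between an upward stack and the data area, one of the two requests above would fail even though the total still fits in RAM.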

Now with MBs of memory available, the location and direction of the stack do not really matter, but an embedded developer should not accustom himself/herself to that luxury. We are poor people ;) who have to work with a few Ks of memory on processors running at a few tens of MHz... we have to make every penny, every byte and every cycle count!


Paul Keinanen 18 years ago

In the interest of keeping the package pin count at a reasonable level, the MC68008 was big-endian with an 8 bit data bus and 16/24/32 bit internal registers, while the 8088 was little endian, with an 8 bit data bus and 16 bit or 16+16 (segment+offset) internal registers.

Unaligned access is not a problem with an 8 bit data bus, unless you complicate things by insisting on big-endian :-).

Paul


glen herrmannsfeldt 18 years ago

Paul Keinanen wrote: (previously snip regarding big vs. little endian memory access)

There was a very small window in time where this was true. As data buses got wider it made much less difference, and the other effects of unaligned access got worse. The 8086 has a 16 bit data bus, but eight bit aligned instructions. The only reasonable way to do that is to buffer the instruction stream and extract from the buffer. The PDP-11 has a 16 bit bus and instructions aligned on 16 bit words, with no 32 bit operands but is little endian, anyway.

The 6502 uses even stranger tricks than this. Note that the 6502 doesn't push the address of the next instruction onto the stack for CALL, but one less. That is, the PC hasn't been incremented to the next instruction yet! RET has to increment it. I saw once an indexed jump that pushed an address (minus 1) on the stack and then executed RET, then wondered why all the minus 1.
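A sketch of that behaviour, using the 6502's own mnemonics (JSR and RTS) for what is called CALL and RET above. The class is a toy with illustrative names, not an emulator; it assumes `pc` holds the address of the JSR opcode when `jsr` is called and models only the stack in page 1.

```python
class Mini6502:
    """Only the parts of a 6502 needed to show JSR/RTS return-address handling."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.pc = 0
        self.sp = 0xFF                      # stack lives in page 1 and grows downward

    def push(self, value):
        self.mem[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def pull(self):
        self.sp = (self.sp + 1) & 0xFF
        return self.mem[0x0100 + self.sp]

    def jsr(self, target):
        """JSR is 3 bytes long; it pushes the address of its own last byte."""
        ret_minus_1 = (self.pc + 2) & 0xFFFF
        self.push(ret_minus_1 >> 8)         # high byte first
        self.push(ret_minus_1)              # then low byte
        self.pc = target

    def rts(self):
        lo = self.pull()
        hi = self.pull()
        self.pc = (((hi << 8) | lo) + 1) & 0xFFFF   # RTS adds the missing 1

    def jump_via_rts(self, target):
        """The indexed-jump trick: push target - 1, then execute RTS."""
        t = (target - 1) & 0xFFFF
        self.push(t >> 8)
        self.push(t)
        self.rts()


cpu = Mini6502()
cpu.pc = 0x8000
cpu.jsr(0x9000)
cpu.rts()
print(hex(cpu.pc))          # address of the instruction after the 3-byte JSR
cpu.jump_via_rts(0xC123)
print(hex(cpu.pc))          # lands on the target because RTS increments
```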

Many older machines were word addressed, avoiding this problem. From the 8080 until the 8086 it was probably true. The 8086 is little endian for 8080 compatibility. The 68000, SPARC, and HP-PA, designed later than the 8086, are big endian. As I said before, those who think big tend toward big endian.

-- glen


glen herrmannsfeldt 18 years ago

(I wrote)

For programs in RAM you can modify an instruction in memory. They conveniently added the ability to address I/O with a register in the 8086.

Or load an address from memory and do PCHL. I used to have a disassembler that would follow through code, putting branch addresses onto a stack to be followed later. When it got to RET, or a JMP to an address it had seen before it would take one off the stack and start disassembling from there. One of the first programs I used it on had a PCHL jump table, so all those addresses were needed as initial stack entries.
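A minimal sketch of such a code-following disassembler, not the original program. It decodes only a handful of 8080 opcodes and treats anything else as the end of a path; the function name, the `OPCODES` table layout and the sample program are all assumptions made for illustration.

```python
# Tiny subset of 8080 opcodes: mnemonic, length in bytes, kind of control flow.
OPCODES = {
    0x00: ("NOP", 1, "fall"),
    0xC3: ("JMP", 3, "jump"),
    0xCA: ("JZ", 3, "branch"),
    0xCD: ("CALL", 3, "branch"),
    0xC9: ("RET", 1, "stop"),
    0xE9: ("PCHL", 1, "stop"),   # target is in HL, unknown to a static trace
}


def trace(code, entry_points):
    """Return {address: mnemonic} for every instruction reachable from entry_points."""
    pending = list(entry_points)   # stack of addresses still to be followed
    seen = {}
    while pending:
        pc = pending.pop()
        while pc not in seen and pc < len(code):
            entry = OPCODES.get(code[pc])
            if entry is None or pc + entry[1] > len(code):
                break              # outside this toy subset, or truncated operand
            name, length, kind = entry
            seen[pc] = name
            if length == 3:
                target = code[pc + 1] | (code[pc + 2] << 8)   # little-endian operand
            if kind == "jump":
                pc = target
            elif kind == "stop":
                break
            else:
                if kind == "branch":
                    pending.append(target)   # follow the taken path later
                pc += length
    return seen


code = bytes([
    0xCD, 0x07, 0x00,   # 0000: CALL 0007
    0xC3, 0x00, 0x00,   # 0003: JMP 0000
    0x00,               # 0006: NOP, never reached
    0x00,               # 0007: NOP
    0xE9,               # 0008: PCHL
    0xC9,               # 0009: RET, reachable only through PCHL
])
print(sorted(trace(code, [0])))       # address 9 is not found
print(sorted(trace(code, [0, 9])))    # seeding the stack with it finds it
```

The second call mirrors the jump-table case: targets reached only through PCHL have to be supplied as initial stack entries.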

-- glen


Jerry Avins 18 years ago

...

Endianness also influences the ease of reading a memory dump. We read numbers from left to right, but calculate with them from right to left. Orientation that simplifies one complicates the other.
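A short illustration of the dump-reading point, using Python's `int.to_bytes`. The helper name `dump` and the sample value are arbitrary.

```python
def dump(value, byteorder, width=4):
    """Bytes of value in increasing address order, as a hex dump would print them."""
    return " ".join(f"{b:02x}" for b in value.to_bytes(width, byteorder))


value = 0x12345678
print("big-endian dump:   ", dump(value, "big"))      # reads like the written number
print("little-endian dump:", dump(value, "little"))   # lowest-order byte comes first
```

The big-endian dump matches the way the number is written, while the little-endian dump puts first the byte that arithmetic starts with.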

Jerry

Engineering is the art of making what you want from things you can get. ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯


glen herrmannsfeldt 18 years ago

(snip)

The 8088 was designed after the 8086. The 8086 has the buffers needed to do unaligned 16 bit access on a 16 bit bus. For the 8088, it does two 8 bit cycles into the same buffer, even for aligned access. Even though it is little endian, it still goes through the buffer.

The 68008 was designed after the 68000. I don't remember how many pins it has, though. I expect that like the 68000 all the buffers were there and might just as well be used.

and you don't already have a 16 bit (or larger) ALU and the appropriate buffers.

-- glen


Archi 18 years ago

I believe Z80 added this.

Archi
