Direction of Stack Growth

karthikbalaguru · 2007-10-21T11:44:30+00:00

Hi,

Why do some processors have stacks growing downwards and others have stacks growing upwards? Are there any advantages/disadvantages w.r.t. both these designs? Which is the best model? I searched the internet, but I did not find a good link that explains this in detail.

Thanks in advance,
Karthik Balaguru

Jerry Avins 18 years ago

Yes. I used it there.

Jerry

Engineering is the art of making what you want from things you can get.

glen herrmannsfeldt 18 years ago

Yes. Once you decide that they are reasonably equal for other reasons (in processors more complicated than the 6502), ease of use for the programmer comes into play: convenience in reading dumps, and programs not accidentally working when passed arguments (by reference) of the wrong size.
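
If the "wrong size" point is read as a byte-order argument (the 6502 and the VAX are both little-endian), a small C sketch shows it. The function names are made up for illustration, and the code simulates both byte orders explicitly instead of relying on the host's.

```c
#include <stdint.h>

/* Store a 32-bit value into 4 bytes of memory in the given byte order. */
static void store32(uint8_t mem[4], uint32_t v, int little_endian) {
    for (int i = 0; i < 4; i++) {
        int shift = little_endian ? 8 * i : 8 * (3 - i);
        mem[i] = (uint8_t)(v >> shift);
    }
}

/* Read 16 bits from the same address, as a callee that expects a
   16-bit argument by reference would. */
static uint16_t load16(const uint8_t mem[2], int little_endian) {
    return little_endian ? (uint16_t)(mem[0] | (mem[1] << 8))
                         : (uint16_t)((mem[0] << 8) | mem[1]);
}

/* Pass a 32-bit variable by reference to a routine that reads only 16 bits. */
uint16_t wrong_size_read(uint32_t v, int little_endian) {
    uint8_t mem[4];
    store32(mem, v, little_endian);
    return load16(mem, little_endian);
}
```

With little-endian storage, wrong_size_read(42, 1) returns 42, so the bug goes unnoticed for small values; with big-endian storage it returns 0 and the program fails right away.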

VAX/VMS had a solution for dump reading, though. They printed the ASCII values left to right and the HEX values right to left with the row address down the middle.
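
A minimal sketch of the dump layout described above, assuming 16 bytes per row, hex grouped into 4-byte longwords, and a six-digit address; the real VMS column widths may differ, and format_row is a hypothetical name. Printing the hex right to left makes each little-endian longword read most-significant digit first.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Format one 16-byte row: hex bytes from the highest offset down to the
   lowest, then the row address, then the ASCII text in memory order.
   out must hold at least 60 characters. */
void format_row(const unsigned char row[16], unsigned long addr,
                char *out, size_t outsz) {
    assert(outsz >= 60 && addr <= 0xFFFFFFUL);
    char *p = out;
    for (int i = 15; i >= 0; i--) {
        p += sprintf(p, "%02X", (unsigned)row[i]);
        if (i % 4 == 0 && i != 0)
            *p++ = ' ';                    /* gap between longwords */
    }
    p += sprintf(p, " %06lX ", addr);      /* address down the middle */
    for (int i = 0; i < 16; i++)
        *p++ = isprint(row[i]) ? (char)row[i] : '.';
    *p = '\0';
}
```

For a row starting with the bytes 78 56 34 12, the rightmost hex group reads 12345678, which is the 32-bit value those bytes hold on a VAX.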

-- glen

Stephen Fuld 18 years ago

If ease of dump reading were important, you should use a decimal computer! No need to grow those six extra fingers. :-) Besides, ten groups of ten digits each, with appropriate spaces and addresses, fit nicely on a 132-column printout. :-)

- Stephen Fuld (e-mail address disguised to prevent spam)

glen herrmannsfeldt 18 years ago

Archi wrote: (someone wrote)

(Regarding pushing instructions on the stack and executing them.)

Yes, it seems that it did.

Adam Osborne indicates that both the Z80 and 8085 can be considered equal successors to the 8080, with the Z80 designed by people who left Intel after working on the 8080. It seems, though, that Intel saved more of their new features for the 8086 with very little being added for the 8085.

-- glen

glen herrmannsfeldt 18 years ago

(snip)

You mean like od -d generates dumps?

-- glen

Terje Mathisen 18 years ago

But they still didn't allow you to generate a sw interrupt with a programmable IRQ number.

The standard workaround used to be to code an INT 0FFh, then modify the 0FFh part at runtime.

The two other alternatives were to generate a 512-byte code area, with two bytes for each possible IRQ number:

INT 0 INT 1 INT 2 ...

and jump to the correct entry, or you could generate a fake stack frame, then IRET into the target IRQ handler address.

There are a couple of tricky points with the latter: you needed to make sure every register was saved/restored, and the IRET had to pop a flag value with interrupts disabled, since that would be expected by the IRQ handler.
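
The fake-frame approach can be modeled in C with a toy 8086 real-mode stack, which also shows the downward stack growth the thread started with. This is a simulation under stated assumptions, not real-mode code: handler_cs:handler_ip stands in for the vector that real code would fetch from the table at 0000:4n, the names are hypothetical, and the register saving mentioned above is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 8086 real-mode stack: 16-bit little-endian words, SP decremented
   by 2 before each store (the stack grows downward). */
typedef struct {
    uint8_t mem[65536];
    uint16_t sp;
} Stack;

static void push16(Stack *s, uint16_t w) {
    s->sp -= 2;
    s->mem[s->sp] = (uint8_t)(w & 0xFF);
    s->mem[(uint16_t)(s->sp + 1)] = (uint8_t)(w >> 8);
}

static uint16_t pop16(Stack *s) {
    uint16_t w = (uint16_t)(s->mem[s->sp] | (s->mem[(uint16_t)(s->sp + 1)] << 8));
    s->sp += 2;
    return w;
}

#define FLAG_IF 0x0200u   /* interrupt-enable flag, bit 9 of FLAGS */

/* The frame INT n pushes: FLAGS, then CS, then IP. */
static void push_frame(Stack *s, uint16_t flags, uint16_t cs, uint16_t ip) {
    push16(s, flags);
    push16(s, cs);
    push16(s, ip);
}

/* IRET pops in the opposite order: IP, CS, FLAGS. */
static void iret(Stack *s, uint16_t *flags, uint16_t *cs, uint16_t *ip) {
    *ip = pop16(s);
    *cs = pop16(s);
    *flags = pop16(s);
}

/* Fake "INT n" to a handler chosen at run time:
   1. push the frame the handler's own IRET will return through;
   2. push a frame pointing at the handler, with IF cleared, since the
      handler expects to be entered with interrupts disabled;
   3. IRET, which "returns" into the handler.
   (A real INT also clears TF; that is left out here.) */
void fake_int(Stack *s, uint16_t flags, uint16_t ret_cs, uint16_t ret_ip,
              uint16_t handler_cs, uint16_t handler_ip,
              uint16_t *cur_flags, uint16_t *cur_cs, uint16_t *cur_ip) {
    assert(s->sp >= 12);   /* room for two 3-word frames */
    push_frame(s, flags, ret_cs, ret_ip);
    push_frame(s, (uint16_t)(flags & ~FLAG_IF), handler_cs, handler_ip);
    iret(s, cur_flags, cur_cs, cur_ip);
}
```

After fake_int, execution is "in" the handler with IF clear, and the caller's frame is still on the stack, so the handler's ordinary IRET brings control back with the original flags.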

Terje

- "almost all programming can be viewed as an exercise in caching"

glen herrmannsfeldt 18 years ago

(snip regarding IN and OUT to a variable port on the 8080 and 8086)

The EXecute instruction on S/360 was carefully designed for this. It allows one to modify the second byte of a copy of the instruction (by ORing in the low byte of a register) before executing it. Conveniently, that is where the length field of variable-length instructions goes, as does the SVC (interrupt) number for the SVC instruction. There is still no EX in x86?
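
A small C model of the EX behavior described above. ex_copy and the register array are hypothetical; the opcodes in the usage note (0x0A for SVC, 0xD2 for MVC) follow the S/360 encoding, where the MVC length field holds the byte count minus one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* What EX R1,target does to its target: the instruction is copied and,
   unless R1 is register 0, the low-order byte of R1 is ORed into the
   copy's second byte (bits 8-15). The copy is executed; the instruction
   in storage is never changed. */
void ex_copy(const uint8_t *target, size_t len, int r1,
             const uint32_t regs[16], uint8_t *copy) {
    assert(len == 2 || len == 4 || len == 6);   /* S/360 instruction lengths */
    assert(r1 >= 0 && r1 < 16);
    memcpy(copy, target, len);
    if (r1 != 0)
        copy[1] |= (uint8_t)(regs[r1] & 0xFF);
}
```

For example, EX on SVC 0 (bytes 0A 00) with 0x23 in R3 executes SVC 35, and EX on an MVC with a zero length field and 9 in R5 moves 10 bytes. Because it is an OR, the target is normally coded with zero in that byte.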

-- glen

John L 18 years ago

Nope. I gather that execute is bad news in pipelined architectures since it adds dependencies between the instruction fetch pipeline and operand data. The 360 style makes it worse by adding dependencies on a register. As far back as the 8086 the fetch pipeline was visible to the program and you were told that storing into the next instruction wouldn't work.

The main use of EX in the 360 is for variable length string moves, but the 390s have new instructions that explicitly take the length from registers. It's still somewhat useful for some of the more exotic string instructions like translate and test, but on modern CPUs I wouldn't be surprised if a loop of simpler instructions were just as fast. It does let you do a variable SVC number, but that's a pretty exotic application.

glen herrmannsfeldt 18 years ago

(snip)

The question, then, is whether the operand goes to the I cache or the D cache. I don't remember which way the ESA/390 machines do it.

About as exotic as variable INT.

-- glen

John L 18 years ago

I looked at some of the JR&D articles which don't say either, but they did remind me that the 360 architecture has no visible fetch pipeline. You are allowed to store into the very next instruction and it has to work. The z series chips have quite a lot of logic to make this work, invalidating caches and pipelines and all. Once they can do that, the extra pain to do execute doesn't seem so bad, but life for the chip designer would be a lot easier if you didn't have to handle either one.

Other than debugging, I don't ever recall seeing an SVC number changed at runtime on a S/360. The various operating systems define the numbers, and they're built into libraries or assembler macros.

Nick Maclaren 18 years ago

|> >The question, then, is does the operand go to the I cache or D
|> >cache. I don't remember which way the ESA/390 machines do it.
|>
|> I looked at some of the JR&D articles which don't say either, but they
|> did remind me that the 360 architecture has no visible fetch pipeline.
|> You are allowed to store into the very next instruction and it has to
|> work. The z series chips have quite a lot of logic to make this work,
|> invalidating caches and pipelines and all. Once they can do that, the
|> extra pain to do execute doesn't seem so bad, but life for the chip
|> designer would be a lot easier if you didn't have to handle either
|> one.

I speeded up a random number generator by 20% when a colleague told me (as he had been reading the microcode) that separating store targets by at least 256 bytes (on the 370/165) from instruction fetch would avoid the slow microcode. That was one of the reasons that the Fortran linkage was so slow - I could get it to run at twice the speed by merely reorganising the layout, but still obeying the same instructions in the same order.

Regards, Nick Maclaren.

robertwessel2 18 years ago

Amusingly, that issue made a reappearance with the z990, where a split L1 cache (I and D) was introduced. Basically, a store into the same 256-byte cache line that had instructions in it (and was in the L1I) introduced a significant slowdown while a whole lotta flushing happened.

The same solution (separate the code and data a bit more) applied as well.
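
The "separate the code and data a bit more" fix comes down to two address checks. A sketch, assuming the rule is simply "don't store into the same 256-byte line as instructions", which is how the z990 case is described; the 370/165 condition mentioned earlier may have been a distance rather than a line boundary. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE 256u   /* line size named in the posts above */

/* True if a store to store_addr falls in the same 256-byte line as
   instructions at insn_addr: the case that triggered the slow path. */
bool same_line(uintptr_t store_addr, uintptr_t insn_addr) {
    return store_addr / LINE == insn_addr / LINE;
}

/* Round addr up to the next line boundary, e.g. to start a data or
   generated-code area in a line of its own, clear of the code. */
uintptr_t align_up(uintptr_t addr) {
    assert(addr <= UINTPTR_MAX - (LINE - 1));   /* no wraparound */
    return (addr + LINE - 1) & ~(uintptr_t)(LINE - 1);
}
```

Placing data at align_up(end_of_code) guarantees that same_line is false for every store into it.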

This hit several products that did dynamic instruction generation pretty hard.

Anne & Lynn Wheeler 18 years ago

this is analogous to, but different from, the significant rewrite effort for both MVS and VM in the 3084 time-frame (four-way multiprocessor) for kernel storage ... trying to force storage to cache lines and multiples of cache lines ... so that different storage allocations didn't overlap in a common cache line ... otherwise different processors could be simultaneously operating on different storage areas sharing a common cache line (resulting in significant cache thrashing).
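
The kernel-storage change amounts to line-granular allocation. A minimal bump-allocator sketch over plain addresses; the 256-byte line size is an assumption borrowed from the z990 post (the 3084's line size isn't given here), and Arena/arena_alloc are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 256u   /* assumption: line size borrowed from the z990 post */

/* Bump allocator over an address range that starts every allocation on
   a line boundary and rounds each size up to whole lines, so two
   allocations never share a cache line. */
typedef struct {
    uintptr_t next;    /* next free address, kept line-aligned */
    uintptr_t limit;   /* one past the end of the arena */
} Arena;

/* Returns the start address, or 0 if the request doesn't fit. */
uintptr_t arena_alloc(Arena *a, size_t size) {
    assert(a->next % CACHE_LINE == 0 && a->next <= a->limit);
    if (size == 0 || size > SIZE_MAX - (CACHE_LINE - 1))
        return 0;
    size_t rounded = (size + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    if (a->limit - a->next < rounded)
        return 0;
    uintptr_t p = a->next;
    a->next += rounded;
    return p;
}
```

The cost is internal fragmentation (a 10-byte request takes a full line), which is the trade made to keep processors from thrashing a shared line.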

glen herrmannsfeldt 18 years ago

(I wrote)

That logic was also in the 360/91. The 91 could do instruction prefetch, including along the path of potential branches.

The Fortran library did use self-modifying code, though not the immediately following instruction as far as I know. For example, SIN and COS routines would modify an instruction depending on which entry point was used. LOG routines change an instruction to a multiply or a no-op depending on whether the ALOG or ALOG10 entry point was used.

(snip)

It comes up, just as in the x86 INT case, if you want a routine callable from a high-level language that will do an SVC or INT with a number chosen at run time.

Otherwise, there isn't much reason to do it.

-- glen

John L 18 years ago

Oh, yes. That part they had to get right, unlike the interrupts when an operation generated a fault, e.g., divide by zero. S0C0.

They certainly did a lot of stuff that seems silly now to save space.

Agreed. Still seems pretty exotic.

R's, John

Richard Owlett 18 years ago

Reminds me of an 8080 embedded application written in assembler when 1k PROMs were expensive. The system could crash in a certain routine if external inputs were in a transient "should never occur" state. There wasn't room for kindly backing out of nested calls.

Solution, stuff "correct" value in stack pointer and jump to suitable location.

I came along when we had more PROM and did things neatly. Result: under that condition it would attempt to execute some stored text messages.
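
In C terms, "stuff a correct value into the stack pointer and jump" is roughly what setjmp/longjmp do: setjmp saves the calling environment (in practice including the stack pointer and resume address), and longjmp restores it, abandoning every nested frame in between. A sketch; level1..level3 and the negative-input "impossible state" are hypothetical.

```c
#include <setjmp.h>

static jmp_buf recovery;   /* the "correct" SP plus restart location */
static int recoveries = 0;

/* Deeply nested routine that detects the "should never occur" state. */
static void level3(int input) {
    if (input < 0)              /* hypothetical impossible input */
        longjmp(recovery, 1);   /* drop all nested frames at once */
}
static void level2(int input) { level3(input); }
static void level1(int input) { level2(input); }

/* Returns 1 if processing completed, 0 if it bailed out to recovery. */
int process(int input) {
    if (setjmp(recovery) != 0) {
        recoveries++;
        return 0;
    }
    level1(input);
    return 1;
}
```

Unlike a hard-coded SP value and jump address, the saved environment stays correct when the code is rearranged, which is exactly what went wrong in the story above.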

glen herrmannsfeldt 18 years ago

S0C0 is for multiple imprecise interrupts.

For those who don't know it, the 360/91 could execute instructions out of order. If an interrupt came in, it was necessary to finish all the instructions that had already been started, possibly resulting in more interrupts. In addition, the address stored is likely not the address of the actual instruction that caused the interrupt.

So, in addition to being able to execute instructions out of order and prefetch along multiple paths, it has to get the right answer with self-modifying code (assuming no program interrupts).

-- glen

David Thompson 18 years ago

Nit: 1s/stack$/&s/ On the PDP-10, for (explicit) call/return and/or data, you could have one stack for each GPR you were able to dedicate; none was used by the hardware for interrupts etc., so none could be considered a hardware preference/definition, the way R6=SP (or specifically KSP) is on the PDP-11.
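
A toy model of the PDP-10 arrangement described above, where PUSH/POP name the accumulator to use as the stack pointer, so several independent stacks can coexist. Assumptions: the real PUSH adds 1 to both halves of the AC (a count in the left half, the address in the right) and flags overflow; this sketch keeps only the address, uses 64-bit cells for 36-bit words, and the names are hypothetical. These stacks grow upward, the opposite of the 8086.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 1024

/* Toy PDP-10-style machine: 16 accumulators, each usable as a stack
   pointer; 64-bit cells stand in for 36-bit words. */
typedef struct {
    uint32_t ac[16];
    uint64_t mem[MEM_WORDS];
} Machine;

/* PUSH: increment the chosen AC first, then store (stack grows upward). */
void push(Machine *m, int a, uint64_t value) {
    assert(a >= 0 && a < 16);
    m->ac[a] += 1;
    assert(m->ac[a] < MEM_WORDS);
    m->mem[m->ac[a]] = value;
}

/* POP: load from where the AC points, then decrement it. */
uint64_t pop(Machine *m, int a) {
    assert(a >= 0 && a < 16 && m->ac[a] < MEM_WORDS);
    uint64_t v = m->mem[m->ac[a]];
    m->ac[a] -= 1;
    return v;
}
```

Nothing in the model prefers one accumulator over another, which is the point: unlike R6 on the PDP-11, no register is "the" stack pointer.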

- formerly david.thompson1 || achar(64) || worldnet.att.net
