Branch delay slot on MIPS32 processors

MIPS32 processors have "delayed" loads and branches.  The MIPS32
manual says that the instruction immediately following a branch
is always executed, regardless of whether the branch is taken or
not.  Optimizing compilers try to fill a branch delay slot with
an appropriate instruction.
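To make the "always executed" rule concrete, here is a minimal Python sketch of a toy interpreter with a single branch delay slot. The tuple instruction format, the register names and the run() helper are hypothetical illustrations loosely modelled on MIPS mnemonics. They are not a real MIPS encoding or simulator, and the sketch ignores load delays and the branch-likely variants.

```python
from typing import Dict, List, Optional

# Hypothetical toy instruction format, loosely modelled on MIPS mnemonics:
#   ("addiu", rd, rs, imm)    rd = rs + imm
#   ("bne", rs, rt, target)   branch to index `target` if rs != rt
#   ("nop",)
Program = List[tuple]


def run(program: Program, regs: Dict[str, int], max_steps: int = 1000) -> List[int]:
    """Execute `program` with a one-instruction branch delay slot.

    Returns the indices of the executed instructions, in execution order.
    """
    trace: List[int] = []
    pc = 0
    pending: Optional[int] = None  # target of a taken branch, applied after its slot
    prev_was_branch = False
    for _ in range(max_steps):
        if pc >= len(program):
            break
        trace.append(pc)
        op, *args = program[pc]
        in_delay_slot = prev_was_branch
        redirect, pending = pending, None
        if op == "addiu":
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "bne":
            if in_delay_slot:
                raise ValueError("branch in a branch delay slot")
            rs, rt, target = args
            if regs[rs] != regs[rt]:
                pending = target  # taken, but the next instruction runs first
        elif op != "nop":
            raise ValueError(f"unknown opcode {op!r}")
        prev_was_branch = op == "bne"
        # After a delay-slot instruction, jump if the branch was taken.
        pc = redirect if redirect is not None else pc + 1
    return trace


program = [
    ("addiu", "t0", "zero", 3),  # 0: t0 = 3
    ("addiu", "t0", "t0", -1),   # 1: loop: t0 -= 1
    ("bne", "t0", "zero", 1),    # 2: if t0 != 0 goto loop
    ("addiu", "t1", "t1", 1),    # 3: delay slot, runs on every pass
    ("nop",),                    # 4: reached after the loop
]
regs = {"zero": 0, "t0": 0, "t1": 0}
trace = run(program, regs)
print(trace)       # [0, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4]
print(regs["t1"])  # 3
```

The delay-slot instruction at index 3 runs on all three passes, including the last one, where the branch falls through.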

Are there any restrictions on the kind of instruction that can be placed
in the branch delay slot?  Is it possible to place an STW (store word;
the MIPS mnemonic is SW) in the delay slot?  Is it possible to fill the
delay slot with a multiply-and-add (MADD) instruction, even if MADD needs
more than one cycle to complete?

Thanks for your answers.

Julia.

Re: Branch delay slot on MIPS32 processors
I am not a compiler expert, just a user.
I think putting SW in the delay slot won't be a problem; I'm not sure about MADD.
Also, if the pipeline (R4000 and later) has hazard detection, it will stall
when it detects a hazard.


Julia Groszark wrote:



Re: Branch delay slot on MIPS32 processors

I'm sorry, I've never worked with MIPS32, so I can't answer your
question.

But the concept of delayed branches sounds quite interesting to
me.  It not only solves the problem of prefetch stall, but also
gives quite a number of exciting new possibilities.  If you, for
example, were to place another branch in the slot, you control
the execution flow at the branch source, instead of the destination.
That's not possible on traditional designs.

I think this feature could be quite useful on tiny (Harvard-architecture)
8-bit micros, where you have to squeeze the best out of a very limited
code space.

Marc

Re: Branch delay slot on MIPS32 processors


OUCH!

Branch delay slots have problems:

a) They expose, and thereby tie you to, a particular microarchitectural
implementation.

b) How long should they be? On a 4-issue superscalar version of the
architecture, you'd want 4, 8 or even 12 instructions if you want to
avoid pipeline bubbles.
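Terje's numbers follow from simple arithmetic if you assume (a simplification, not a model of any particular chip) that the core issues a fixed number of instructions per cycle and learns the branch direction a fixed number of cycles after the branch issues; every issue slot in between would need useful work. The function name below is hypothetical.

```python
def delay_slots_needed(issue_width: int, resolve_cycles: int) -> int:
    """Issue slots that go by before the branch outcome is known.

    Simplified model (an assumption): the core issues `issue_width`
    instructions per cycle and learns the branch direction
    `resolve_cycles` cycles after the branch issues.
    """
    return issue_width * resolve_cycles


print(delay_slots_needed(1, 1))  # 1: scalar pipeline, matching the one slot MIPS exposes
for cycles in (1, 2, 3):
    print(delay_slots_needed(4, cycles))  # prints 4, then 8, then 12
```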

Terje

--
"almost all programming can be viewed as an exercise in caching"


Re: Branch delay slot on MIPS32 processors


I was reading a thread called "Simplified forwarding", which is a little
farther down in my newsreader.

That reminded me of a discussion some years ago on branch delay slots,
and the problem of how many cycles (or instructions) are needed.  One idea
I came up with was to add a field (yes, there are never enough bits) to
the branch instructions specifying how many delay-slot instructions there
should be.  Maybe two bits, encoding 0, 1, 2, or 4, for example.

The compiler could determine how many instructions it could possibly execute
in the delay slot and move them there.  If the hardware only needed one,
nothing would be lost.

It would be interesting to know, for compiler generated code, how many
instructions could possibly be moved into delay slots.   Code containing
small loops would not have so many instructions available to move.
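A sketch of how glen's two-bit field might be used, assuming the 0, 1, 2, 4 encoding from the post: the compiler picks the largest count it can fill, and a given core stalls only for the slots it needs beyond that. SLOT_COUNTS, choose_field() and bubbles() are hypothetical names, and the model assumes a single-issue core.

```python
# Hypothetical 2-bit branch field from the post: field value -> delay-slot count.
SLOT_COUNTS = (0, 1, 2, 4)


def choose_field(movable: int) -> int:
    """Field value for the largest encodable slot count the compiler can fill."""
    best = 0
    for field, count in enumerate(SLOT_COUNTS):
        if count <= movable:
            best = field
    return best


def bubbles(field: int, hw_need: int) -> int:
    """Empty cycles on a single-issue core that needs `hw_need` slots per branch."""
    return max(0, hw_need - SLOT_COUNTS[field])


field = choose_field(3)             # three instructions could be moved
print(field, SLOT_COUNTS[field])    # 2 2 (4 is not reachable, so 2 are used)
print(bubbles(field, 1))            # 0: a core needing one slot loses nothing
print(bubbles(choose_field(1), 4))  # 3: only one slot filled, core needs four
```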

-- glen



Re: Branch delay slot on MIPS32 processors


Another idea: always make it eight instructions but include a version of
NOP with an immediate field that says how many NOPs it really stands for.

(apologies to TMS320C6000)
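Peter's variant can be sketched the same way: the architecture fixes eight delay slots, and a hypothetical ("nop", n) instruction stands for n NOPs, so padding costs one instruction word rather than n. The tuple format and expand_delay_slots() are illustrations only, not TMS320C6000 assembly syntax.

```python
from typing import List, Tuple

DELAY_SLOTS = 8  # fixed architectural branch delay, as proposed in the post


def expand_delay_slots(slot_code: List[Tuple]) -> List[Tuple]:
    """Expand ("nop", n) into n single-cycle NOPs and check the slot is exactly full."""
    expanded: List[Tuple] = []
    for instr in slot_code:
        if instr[0] == "nop":
            count = instr[1] if len(instr) > 1 else 1
            expanded.extend([("nop",)] * count)
        else:
            expanded.append(instr)
    if len(expanded) != DELAY_SLOTS:
        raise ValueError(f"delay slot covers {len(expanded)} cycles, expected {DELAY_SLOTS}")
    return expanded


# Two useful instructions, then one multi-cycle NOP standing in for six.
slot_code = [("addiu", "sp", "sp", -16), ("sw", "ra", "12(sp)"), ("nop", 6)]
expanded = expand_delay_slots(slot_code)
print(len(slot_code), len(expanded))  # 3 8
```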

-Peter

Re: Branch delay slot on MIPS32 processors


Yet another idea: add a bit in the instruction encoding to signify the
end of the delay slot. Compilers would generate delay slots as long as
possible, and the processor could take advantage of them as needed.
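Robin's end-of-slot bit could work like this: the delay slot runs from the instruction after the branch through the first instruction with the bit set, and a core that needs fewer slots than the compiler provided loses nothing. The Instr class, its end_of_slot flag and both helper functions are hypothetical, and stall_cycles() assumes a single-issue core.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    text: str                  # assembly text, for display only
    end_of_slot: bool = False  # hypothetical bit: last instruction of the delay slot


def delay_slot_length(program: List[Instr], branch_at: int) -> int:
    """Instructions after the branch, up to and including the marked one.

    A branch whose own bit is set is taken to have no delay slot.
    """
    if program[branch_at].end_of_slot:
        return 0
    for offset, instr in enumerate(program[branch_at + 1:], start=1):
        if instr.end_of_slot:
            return offset
    raise ValueError("delay slot is never terminated")


def stall_cycles(slot_length: int, hw_need: int) -> int:
    """Cycles a single-issue core needing `hw_need` slots still waits."""
    return max(0, hw_need - slot_length)


program = [
    Instr("beq t0, zero, done"),
    Instr("addiu t1, t1, 1"),
    Instr("addiu t2, t2, 4"),
    Instr("sw t1, 0(t2)", end_of_slot=True),
    Instr("addiu t3, t3, 1"),  # first instruction after the delay slot
]
n = delay_slot_length(program, 0)
print(n)                   # 3
print(stall_cycles(n, 1))  # 0: one slot needed, three provided
print(stall_cycles(n, 4))  # 1: a deeper pipeline still waits one cycle
```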

--
Wishing you good fortune,
--Robin Kay-- (komadori)


Re: Branch delay slot on MIPS32 processors

At that point, you might consider that bit as explicit termination of a
group of instructions that can be executed in parallel, so the processor
doesn't have to check dependencies...

Even better, we could separate the branch decision from the jump point
entirely, removing the need for speculative execution or delay slots...

Oh wait, that's already been done :)

S

--
Stephen Sprunk         "God does not play dice."  --Albert Einstein
CCIE #3723         "God is an inveterate gambler, and He throws the
Re: Branch delay slot on MIPS32 processors


Point taken ^_^', but perhaps arbitrarily long delay slots could be
beneficial on their own, without some of the other 'features' you
mention. Am I mistaken? I would not have thought scheduling long delay
slots significantly more complex than scheduling slots only a single
instruction long.

--
Wishing you good fortune,
--Robin Kay-- (komadori)


Re: Branch delay slot on MIPS32 processors

I've always wanted to do a "Prepare branch/jump" and "Do branch" type of
system. It feels like a logical extension of modern prefetch instructions.

`Allan

Re: Branch delay slot on MIPS32 processors


The delay-slot instruction must be harmless on both paths. An STW
instruction often takes several clock cycles; it is not harmless
if the memory contents are reused soon, and it's not clear how the
pipeline behaves in that case (e.g. whether or not you have a
non-blocking load/store implementation).
So don't use memory accesses, coprocessor instructions, etc., and
take care with the registers you modify in the delay slot.

Typically the delay slot is used to update the stack pointer,
or you can use it to optimize algorithms in assembler.

Best regards, Mike...


Re: Branch delay slot on MIPS32 processors
Julia Groszark writes:


I believe that Dominic Sweetman's "See MIPS Run" and/or Gerry Kane's
"MIPS RISC Architecture" should answer your questions.


Putting the ADDIU in the delay slot should work, but I cannot say
for sure; it has been too many years (7 and counting) since I last
worked with a MIPS processor.


--Kai

Re: Branch delay slot on MIPS32 processors

There are some volumes of the MIPS32 spec at
http://www.ece.lsu.edu/ee3755/ref.html but the spec itself is quite
vague. If you want chapter-and-verse guarantees, I'm not sure you'll get
them.

You can, of course, get specs for specific implementations such as the
4Kc from <http://www.mips.com/>.

But given my understanding of the MIPS pipeline, it makes sense that an
SW in a branch delay slot should be fine. Read the docs above about the
pipeline and, if you like, work out for yourself what happens at each
stage.

FWIW, when I played with some code compiled with "mipsisa32-elf-gcc
-mips32", GCC generated code with the store word in the branch delay
slot.

Jifl
--
--[ "You can complain because roses have thorns, or you ]--
--[  can rejoice because thorns have roses." -Lincoln   ]-- Opinions==mine

Re: Branch delay slot on MIPS32 processors
Julia Groszark writes:


You can certainly do the ADDIU in the delay slot.  What you need to do is bias
the end condition, since you are doing the subtract after the test.
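The need to bias the end condition can be checked with a small Python model of a count-down loop. With the ADDIU moved into the delay slot, the branch compares the counter before it is decremented, so testing against zero runs the body once too often; comparing against 1 (one possible bias; starting the counter one lower is another) restores the original count. The function names are hypothetical, and the model assumes n >= 1 because the body always runs at least once.

```python
def iterations_decrement_before_test(n: int) -> int:
    """Model of:  loop: <body>; addiu t0,t0,-1; bne t0,zero,loop; nop"""
    t0, count = n, 0
    while True:
        count += 1        # loop body
        t0 -= 1           # ADDIU before the branch
        if t0 == 0:       # BNE falls through once t0 reaches zero
            return count


def iterations_decrement_in_slot(n: int, end_value: int) -> int:
    """Model of:  loop: <body>; bne t0,t1,loop; addiu t0,t0,-1 (slot), t1 = end_value"""
    t0, count = n, 0
    while True:
        count += 1                 # loop body
        taken = t0 != end_value    # the branch sees the counter BEFORE the decrement
        t0 -= 1                    # delay slot: executes whether or not taken
        if not taken:
            return count


for n in (1, 5):
    print(n,
          iterations_decrement_before_test(n),   # original loop
          iterations_decrement_in_slot(n, 0),    # unbiased: one extra pass
          iterations_decrement_in_slot(n, 1))    # biased end condition
# prints "1 1 2 1" and then "5 5 6 5"
```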

Looking at the GCC sources, it allows stores in the delay slot, and I know I've
done it with earlier generations of MIPS processors.  On older MIPS (i.e., MIPS I
and MIPS II), loads, transfers, and HI/LO moves weren't allowed, since they had
user-visible delays before their results could be used.

If you can, I would suggest unrolling the loop at least once, so that you
aren't as dependent on things being in the level 1 cache.

--
Michael Meissner
email: mrmnews@the-meissners.org
Re: Branch delay slot on MIPS32 processors
The only instructions that absolutely must not be put into
a MIPS32 branch delay slot are those which themselves
have a delay slot, i.e. you cannot have a branch followed
directly by a branch.  Stores, adds, whatever, are fine.
Note that the branch *likely* variants cause the delay
slot instruction to be "squashed" (not executed) if the
branch isn't taken.  Otherwise the instruction in a branch
delay slot can be thought of as being logically sequenced
prior to the branch instruction.  Indeed, MIPS assemblers
typically have a default mode where one writes as if there
were no delay slots, and the assembler re-orders things
auto-magically and moves a branch-invariant instruction
(on which the branch does not depend) down into the delay slot.
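The assembler reordering described above can be sketched as a pass over a basic block that ends in a branch: look backwards for the latest instruction the branch does not depend on and that can legally move past everything after it, sink it into the delay slot, and otherwise emit a NOP. This is a minimal Python illustration, not how any particular MIPS assembler is implemented; the Instr record and function names are hypothetical, only register dependences are tracked, and memory operations are conservatively kept in order.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]          # register written, or None
    srcs: Tuple[str, ...] = ()   # registers read
    mem: bool = False            # True for loads/stores
    is_branch: bool = False


NOP = Instr("nop", None)


def can_sink(cand: Instr, later: List[Instr]) -> bool:
    """True if `cand` can move below every instruction in `later` (incl. the branch)."""
    if cand.is_branch:
        return False  # a branch may never sit in a delay slot
    for other in later:
        if cand.dest is not None and (cand.dest in other.srcs or cand.dest == other.dest):
            return False  # other reads (RAW) or rewrites (WAW) cand's result
        if other.dest is not None and other.dest in cand.srcs:
            return False  # other overwrites something cand reads (WAR)
        if cand.mem and other.mem:
            return False  # conservatively keep memory operations in order
    return True


def fill_delay_slot(block: List[Instr]) -> List[Instr]:
    """Reorder a basic block ending in a branch so the delay slot does useful work."""
    *body, branch = block
    if not branch.is_branch:
        raise ValueError("block must end in a branch")
    for i in range(len(body) - 1, -1, -1):
        if can_sink(body[i], body[i + 1:] + [branch]):
            return body[:i] + body[i + 1:] + [branch, body[i]]
    return body + [branch, NOP]


block = [
    Instr("addiu", "sp", ("sp",)),                       # stack adjustment, branch-invariant
    Instr("lw", "t0", ("a0",), mem=True),                # the branch tests t0
    Instr("beq", None, ("t0", "zero"), is_branch=True),
]
print([i.op for i in fill_delay_slot(block)])  # ['lw', 'beq', 'addiu']

loop_tail = [
    Instr("addiu", "t0", ("t0",)),                       # the branch depends on this
    Instr("bne", None, ("t0", "zero"), is_branch=True),
]
print([i.op for i in fill_delay_slot(loop_tail)])  # ['addiu', 'bne', 'nop']
```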


