Inline assembler on PowerPC

Consider the following (compiler=GCC3.4.3, host=I686,
target=powerpc-eabi):

typedef void(*pVoid)(void);

static inline bool1 kSetVector(uint1 level, pVoid func, int type) {  
     int r;
     const int code = 0;                                      
     __asm__ __volatile__ (                      
     " li 0, %1 \n"  /* code */      
     " mr 3, %2 \n"  /* level */
     " mr 4, %3 \n"  /* func  */                   
     " mr 5, %4 \n"  /* type  */                   
     " sc       \n"  /* System Call: may corrupt regs: result in r3 */
     " mr %0, 3 \n"  /* Return result */          
     : "=r" (r)
     : "rI" (code), "0" (level), "r" (func), "r" (type)      
     : "r0", "cc", "memory"         
     );                                          
     return r;
}
...
(void)kSetVector(31, SerialIoInterrupt, 3);

 This compiles, & runs fine (producing the code below). However I
would like to improve the efficiency, by eliminating the "mr"
instructions to move arguments to & from registers. The "sc" needs the
data in precisely the registers shown, so GCC needs to be coaxed into
using those registers itself.

Generated code (comments added):

  54:h/services.h  **** static inline bool1 kSetVector(uint1 level,
 pVoid func, int type) {
 203                 .loc 2 54 0
 204 019c 3940001F   li 10,31             /* level */
 205 01a0 3D200000   lis 9,SerialIoInterrupt@ha     /* func  */
 206 01a4 39290000   la 9,SerialIoInterrupt@l(9)
 207 01a8 39600003   li 11,3             /* type  */
 208                .LBB3:
  55:h/services.h  ****      int r;
  56:h/services.h  ****      const int code = 0;
  57:h/services.h  ****      __asm__ __volatile__ (
 209                 .loc 2 57 0
 210 01ac 38000000   li 0, 0
 211 01b0 7D435378   mr 3, 10      /* The "mr's" I want to remove */
 212 01b4 7D244B78   mr 4, 9
 213 01b8 7D655B78   mr 5, 11
 214 01bc 44000002   sc
 215 01c0 7C6A1B78   mr 10, 3      /* result */

 In the x86 builds of GCC, there are "register loading codes", such as
"c", "a" & "D" in the following example (from: "Using Inline Assembly With
gcc" by Clark L. Coleman).

asm ("cld\n\t" "rep\n\t" "stosl"
      : /* no output registers */
      : "c" (count), "a" (fill_value), "D" (dest)
      : "%ecx", "%edi" );

 Is there a similar device for the PowerPC, whereby I can tell GCC to
create the values in specific registers, so eliminating the need for
those "mr" instructions?
 TIA,


Re: Inline assembler on PowerPC


Imho, the easiest way is to do it ... in C:
static inline bool1 kSetVector (uint1 level, pVoid func, int type)
{
    register uint1 _level __asm__ ("r3");
    register pVoid _func __asm__ ("r4");
    register int _type __asm__ ("r5");

    _level = level;
    _func = func;
    _type = type;
    __asm__ __volatile__ (
    "li 0, %1 \n"
    "sc       \n"
    : "=r" (_level)
    : "rI" (code)
    : "r0", "cc", "memory");

    return _level;
}

gcc can then optimise the register allocation and emit mr or lwz only
where necessary.
The second advantage is that this code is easier to read than code that
does everything in inline assembly.
The only drawback is that you have to use the same local variable for the
first argument and the returned value.

[...]


Re: Inline assembler on PowerPC

Of course, you will still get pretty much the same "mr" instructions in
the stand-alone version of the function (if it is generated) - it is
only in in-lined versions that they could be eliminated.

And I presume you are only doing this optimisation for interest and
understanding, not because you are setting vectors so often that 3
cycles delay here will be a serious issue?

David

Re: Inline assembler on PowerPC


You won't have any mr in the stand-alone version:
the arguments are passed in registers r3 onwards, so level is already in
r3, func in r4 and type in r5.
As the return value also goes in r3, there won't be any mr at all.
When I compile this function as a standalone one, I get:
00000000 <kSetVector>:
   0:   38 00 00 00     li      r0,0
   4:   44 00 00 02     sc
   8:   4e 80 00 20     blr

Which is optimal.


We always want optimal code, don't we ? ;-)


Re: Inline assembler on PowerPC
l_indien_no_more_ snipped-for-privacy@magic.fr says...

Actually no.  Readable (human readable) and correct first.  Optimal is,
at best, a distant third.

Robert

Re: Inline assembler on PowerPC
On Tue, 14 Jun 2005 22:49:54 -0400, R Adsett


Optimal implies correct code. One cannot describe anything as an
optimal solution if it does not do what it is supposed to do.
Things that are obscure at first become very "Human Readable" if they
are the optimum solution to a problem.
Readable code for even a complete newbie programmer is total black
magic to the average lay person.

Regards
  Anton Erasmus



Re: Inline assembler on PowerPC
snipped-for-privacy@spam.prevent.net says...

Well, yes.  The converse is not, I think, true.  Unless of course you
define correct as a synonym for optimal.  In this case though the context
suggests that optimal meant fast.

To quote Knuth "Premature optimization is the root of all evil".  I think
that was Knuth anyway.  Clear, fast enough and small enough are good for
me.  No need to go to the trouble of as small as possible or as fast as
possible in most cases.

I've seen attempts to optimize that ended up only optimizing the obvious
and missed doing the correct thing for the whole set of inputs when the
clear version worked correctly for all cases.  This in a case where the
clear version was fast enough and small enough.


On this I will disagree.  We've all done clever things at one time or
another that when we went back to them later were far from clear.  If you
have ever used APL I can guarantee it ;)


Yes, but so what?  If it is necessary to optimize a sequence to fit it
within tight constraints then sufficient supporting comments must be
added to make it clear what is being done and why even to someone who is
encountering it for the first time.  Basic knowledge of the
implementation language and external HW can probably be assumed but when
you start relying on multiple side effects or delay testing a flag for
several instructions you had better warn the unwary reader of the traps
that lie in the code.  I don't expect I get this right all the time
either but I do try.

Robert



Re: Inline assembler on PowerPC
On Thu, 16 Jun 2005 10:57:03 -0400, R Adsett


No, correct code is not necessarily optimal, but I believe that for
code to be the optimal code for a specific problem, it should be
correct under all conditions within the specific problem's domain.
Optimal code is the smallest and/or fastest set of instructions to
do the specific thing one wants to do.


If the code is broken when trying to optimize, then the resultant code
is not optimal, but wrong. And yes I do agree that trying to optimize
a total application to the point where it is impossible to get it
faster/smaller is in 99.999% of the cases just a waste of time.


What I mean is that when confronted with a section of code for the
first time, it might be quite obscure. If this sequence of code is the
optimal solution to a specific problem, and many programmers end up
using this sequence, then it becomes "Human Readable" by the mere fact
that it is used often, by many people in a well defined context.
For someone used only to high level code, simple basic assembly can be
quite obscure and not readable at all. What is obscure to a beginner
might actually be quite clear to a more experienced person. As in
most things the difference is not Black/White and exactly where the
line lies is open to debate.

Regards
  Anton Erasmus


Re: Inline assembler on PowerPC
... snip ...

I recall an exposition of Knuth's some years ago, in which he
reworked some fairly normal code into a peculiar monster.  It was
developed step by step to improve efficiency in a perfectly logical
manner.  IIRC it ended up with a goto into the middle of a
structured statement, which is considered a no-no.

One of the points he made with it was that such derivations should
include the original, and the various steps taken to attain the end
result.  Otherwise it has virtually no chance of making sense to
the later reader.

--
Chuck F ( snipped-for-privacy@yahoo.com) ( snipped-for-privacy@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
Re: Inline assembler on PowerPC
snipped-for-privacy@spam.prevent.net says...

I think we are agreeing here.  My only point in the above was that the
obsession with optimization appears to often result in somewhat faster
and/or smaller but broken code.


Again we will have to disagree on this.  Once an implementation goes
beyond the straightforward it becomes obscure.  It may be completely
transparent while you are working on it, but once you leave it for 6
months or a year it will no longer be so.  I still find myself at least
occasionally beefing up the comments on code when I revisit it later.

As far as a frequently used sequence being clear, it seems to me that
more than once I've read code with a comment that goes something like
"For some reason everyone does this, I don't know why it works but it
does".  Frequently used obscure code appears to also have a tendency to
devolve into magic incantations ;)


True enough, but there is a difference between an implementation being
obscure because you are not familiar with the language, processor or
hardware and it being obscure because you are using non-straightforward
techniques.  It's this latter that demands special care and attention,
particularly in making sure that whoever follows can figure out what is
going on without needing to spend a lot of time re-inventing the
solution.  Sometimes that involves a short note, sometimes a long
description and sometimes a reference to a discussion in a lab book or
paper.

An example might be an FFT.  Any implementation of that should either
contain a full description or better a reference to a full description.  
In this case the reference would be better since the explanation is almost
certainly more complete than one any of us are likely to have the
patience to complete to accompany the code.

Robert

Re: Inline assembler on PowerPC

<snip>


Yes, it is easy to write code that is fast but incorrect!


Knuth also gave two rules of optimization:

1) Don't do it.
2) (For experts only) Don't do it yet.


Readability goes hand-in-hand with correctness in priority - code that
is unreadable is unlikely to be correct, and even less likely to be
checked to be correct (either by testing or proof).  Getting optimal
code, or at least close to optimal, involves two things - thinking about
your code as you write it, and using a good compiler.  The biggest
difference to the speed and size of code is made when thinking about
what your code should do (i.e., at the algorithmic stage), then by
thinking about your implementation (e.g., using integers instead of
floats, and understanding how your code will fit with the target's
capabilities).  Small things, such as when to use arrays and when to use
pointers, are best left to the compiler if it has a good optimizer.
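
As a rough illustration of the "integers instead of floats" point, here is a minimal sketch of one conversion written both ways. The 10-bit ADC, the 3.3 V reference and both function names are assumptions made up for the example; nothing in this thread specifies them. On a target without an FPU the float version would typically pull in software floating-point routines, while the integer version needs only a multiply and a divide.

```c
#include <stdint.h>

/* Assumed hardware parameters, chosen only for illustration. */
#define ADC_FULL_SCALE  1023u  /* hypothetical 10-bit converter */
#define VREF_MILLIVOLTS 3300u  /* hypothetical 3.3 V reference  */

/* Float version: simple to write, but relies on floating-point support. */
static uint32_t adc_to_mv_float(uint16_t raw)
{
    return (uint32_t)((float)raw * 3300.0f / 1023.0f + 0.5f);
}

/* Integer version: widen to 32 bits before multiplying so the product
 * cannot overflow, and add half the divisor to round to nearest. */
static uint32_t adc_to_mv_int(uint16_t raw)
{
    return ((uint32_t)raw * VREF_MILLIVOLTS + ADC_FULL_SCALE / 2u)
           / ADC_FULL_SCALE;
}
```

Whether the difference matters depends entirely on the target and on how often the conversion runs, which is the "think about the implementation" step described above.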

Re: Inline assembler on PowerPC
On Fri, 17 Jun 2005 09:20:04 +0200, David Brown


The problem is that many programmers seem to understand this as: "Write
code as sloppily as possible, and do not even think about whether the
current approach is easy or difficult for the processor".

The worst I have seen was an application where a set of different
routines was called based on configuration data. There was an array of
function pointers, and obviously for readability they decided that
calling function[3](args) is not clear. Calling something like
function[FOO](args) is a lot clearer where FOO gives some indication
of what the function does. Instead of just having a #defined or
enumerated list, the programmers put the names in a character array.
They then had a function that did a string compare on the character
array every time it was called to decide which function pointer in the
array to take. Needless to say this was VERY slow. A very simple
optimization sped the code up by more than 100 times.
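
A minimal sketch of the two approaches described here, for comparison. The handler names, the enum and all function names are invented for illustration and are not taken from the application in question. The slow path searches a name table with strcmp on every call; the fast path indexes the same function-pointer table directly with an enumerator, which is just as readable at the call site.

```c
#include <string.h>

/* Hypothetical handler identifiers; the last entry gives the table size. */
enum handler_id { HANDLER_FOO, HANDLER_BAR, HANDLER_BAZ, HANDLER_COUNT };

typedef int (*handler_fn)(int);

static int foo(int x) { return x + 1; }
static int bar(int x) { return x * 2; }
static int baz(int x) { return -x; }

/* Function-pointer table indexed directly by the enum. */
static const handler_fn handlers[HANDLER_COUNT] = {
    [HANDLER_FOO] = foo,
    [HANDLER_BAR] = bar,
    [HANDLER_BAZ] = baz,
};

/* Name table, needed only by the slow string-based lookup. */
static const char *const handler_names[HANDLER_COUNT] = {
    [HANDLER_FOO] = "FOO",
    [HANDLER_BAR] = "BAR",
    [HANDLER_BAZ] = "BAZ",
};

/* Slow: linear search with a string compare per entry, on every call.
 * Returns HANDLER_COUNT when the name is not in the table. */
static int lookup_by_name(const char *name)
{
    int i;
    for (i = 0; i < HANDLER_COUNT; i++) {
        if (strcmp(handler_names[i], name) == 0) {
            return i;
        }
    }
    return HANDLER_COUNT;
}

/* Slow path: resolve the name, then call. Returns 0 for unknown names. */
static int call_by_name(const char *name, int arg)
{
    int id = lookup_by_name(name);
    return (id < HANDLER_COUNT) ? handlers[id](arg) : 0;
}

/* Fast path: the enumerator is the index, so no search is needed. */
static int call_by_id(enum handler_id id, int arg)
{
    return handlers[id](arg);
}
```

With this layout, call_by_id(HANDLER_BAR, 21) and call_by_name("BAR", 21) reach the same function; only the second pays for the string search.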

Of course it is debatable whether this was optimization or fixing
incorrect code.


Having now read the rest of your post :)  I totally agree with your
additions to the Knuth optimization rules. A better algorithm is worth
more than a poor algorithm optimized to the core.

Regards
   Anton Erasmus

Re: Inline assembler on PowerPC
snipped-for-privacy@spam.prevent.net says...

Well there is a difference between badly designed code and un-optimized
code :).  OTOH this might have made sense if this was part of a command
parser or interpreter.  In a command parser the lookup might have been
small compared to the typing time ;).   In a straightforward case of
substituting different run-time routines it seems unnecessarily complex
though.


Re: Inline assembler on PowerPC
snipped-for-privacy@westcontrol.removethisbit.com says...

Well put.

Robert

Re: Inline assembler on PowerPC
Many thanks. That works with one addition: you still have to mention
all the arguments to the "sc" (_level, _func, _type) on the inputs
line, else GCC will optimise them away.
I got it down to:

static inline bool1 kSetVector (uint1 level, pVoid func, int type)
{
    register uint1 _code  __asm__ ("r0") = 0;
    register uint1 _level __asm__ ("r3") = level;
    register pVoid _func  __asm__ ("r4") = func;
    register int   _type  __asm__ ("r5") = type;

    __asm__ __volatile__ (
    "sc       \n"
    : "=r" (_level)
        : "rI" (_code), "0" (_level), "r" (_func), "r" (_type)      
    : "cc", "memory" );

    return _level;
}



:On Mon, 13 Jun 2005 21:50:28 +0800, David R Brooks wrote:
:[...]


Re: Inline assembler on PowerPC


You're absolutely right. I have to admit I wrote it down without testing...


I just have two questions/remarks:
- why don't you directly initialise _code = code? This would make the code
even easier to read and won't produce more output code.
- I would use the "+r" constraint for _level, to follow the gcc asm
constraint specifications. But I'm not a specialist on this point, I must
admit...




Re: Inline assembler on PowerPC
Answering your questions:
1. _code is explicitly a constant: being the function code. There are
several similar definitions in the header file, having different names
& corresponding function codes. The number of arguments varies too.
2. "+r", although legal in pure asm, is not accepted by GCC.


:On Tue, 14 Jun 2005 18:12:31 +0800, David R Brooks wrote:
[snip]


Re: Inline assembler on PowerPC


OK, sorry, I misread your code...


I did the test, gcc does accept it.
"+r" is documented in gcc documentation (I'm using gcc 2.95.3 as a PowerPC
cross compiler).
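
For reference, a minimal sketch of how the "+r" suggestion might look when applied to the register-variable version posted earlier in the thread. Several things here are assumptions rather than anything confirmed above: the bool1 and uint1 typedefs are stand-ins for the project's own types, KSETVECTOR_USE_SC is a hypothetical opt-in macro, example_handler is a placeholder, and the non-PowerPC branch is only a host stub that echoes level back so the file still compiles elsewhere. The sketch also drops the "I" alternative on _code, on the assumption that the kernel needs the value physically in r0 rather than as an immediate. The clobber list is copied from the thread; which registers the "sc" handler really corrupts depends on the kernel and is not known here.

```c
/* Assumed stand-ins for the project's own types (not shown in the thread). */
typedef unsigned int uint1;
typedef int bool1;
typedef void (*pVoid)(void);

#if (defined(__powerpc__) || defined(__PPC__)) && defined(KSETVECTOR_USE_SC)
/* PowerPC build, enabled by the hypothetical KSETVECTOR_USE_SC macro. */
static inline bool1 kSetVector(uint1 level, pVoid func, int type)
{
    register uint1 _code  __asm__ ("r0") = 0;     /* function code    */
    register uint1 _level __asm__ ("r3") = level; /* arg 1 and result */
    register pVoid _func  __asm__ ("r4") = func;  /* arg 2            */
    register int   _type  __asm__ ("r5") = type;  /* arg 3            */

    __asm__ __volatile__ (
        "sc"
        : "+r" (_level)                  /* read and written: in/out in r3 */
        : "r" (_code), "r" (_func), "r" (_type)
        : "cc", "memory");

    return (bool1)_level;
}
#else
/* Host stub, an assumption for illustration only: no system call is made
 * and level is simply returned unchanged. */
static inline bool1 kSetVector(uint1 level, pVoid func, int type)
{
    (void)func;
    (void)type;
    return (bool1)level;
}
#endif

/* Placeholder interrupt handler so the call below has something to pass. */
static void example_handler(void)
{
}
```

A call then looks the same as in the original post, e.g. (void)kSetVector(31, example_handler, 3);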

[...]

