Inline assembler on PowerPC

Consider the following (compiler=GCC3.4.3, host=I686, target=powerpc-eabi):

typedef void(*pVoid)(void);

static inline bool1 kSetVector(uint1 level, pVoid func, int type) {
    int r;
    const int code = 0;
    __asm__ __volatile__ (
        " li 0, %1 \n"   /* code */
        " mr 3, %2 \n"   /* level */
        " mr 4, %3 \n"   /* func */
        " mr 5, %4 \n"   /* type */
        " sc \n"         /* System Call: may corrupt regs: result in r3 */
        " mr %0, 3 \n"   /* Return result */
        : "=r" (r)
        : "rI" (code), "0" (level), "r" (func), "r" (type)
        : "r0", "cc", "memory" );
    return r;
}
...
(void)kSetVector(31, SerialIoInterrupt, 3);

This compiles & runs fine (producing the code below). However, I would like to improve the efficiency by eliminating the "mr" instructions that move arguments to & from registers. The "sc" needs the data in precisely the registers shown, so GCC needs to be coaxed into using those registers itself.

Generated code (comments added):

  54:h/services.h **** static inline bool1 kSetVector(uint1 level, pVoid func, int type) {
 203                    .loc 2 54 0
 204 019c 3940001F      li 10,31                           /* level */
 205 01a0 3D200000      lis 9,SerialIoInterrupt@ha         /* func */
 206 01a4 39290000      la 9,SerialIoInterrupt@l(9)
 207 01a8 39600003      li 11,3                            /* type */
 208                    .LBB3:
  55:h/services.h ****     int r;
  56:h/services.h ****     const int code = 0;
  57:h/services.h ****     __asm__ __volatile__ (
 209                    .loc 2 57 0
 210 01ac 38000000      li 0, 0
 211 01b0 7D435378      mr 3, 10                           /* The "mr's" I want to remove */
 212 01b4 7D244B78      mr 4, 9
 213 01b8 7D655B78      mr 5, 11
 214 01bc 44000002      sc
 215 01c0 7C6A1B78      mr 10, 3                           /* result */

In the x86 builds of GCC there are "register loading codes", such as "c", "a" & "D" in the following example (from "Using Inline Assembly With gcc" by Clark L. Coleman):

asm ("cld\n\t" "rep\n\t" "stosl" : /* no output registers */ : "c" (count), "a" (fill_value), "D" (dest) : "%ecx", "%edi" );

Is there a similar device for the PowerPC, whereby I can tell GCC to create the values in specific registers, so eliminating the need for those "mr" instructions? TIA,

Reply to
David R Brooks

Imho, the easiest way is to do it ... in C:

static inline bool1 kSetVector (uint1 level, pVoid func, int type)
{
    register uint1 _level __asm__ ("r3");
    register pVoid _func  __asm__ ("r4");
    register int   _type  __asm__ ("r5");

    _level = level;
    _func = func;
    _type = type;
    __asm__ __volatile__ (
        "li 0, %1 \n"
        "sc \n"
        : "=r" (_level)
        : "rI" (code)
        : "r0", "cc", "memory");

    return _level;
}

Then gcc will be able to optimise variable allocation and only produce mr or lwz instructions if necessary. The second thing to consider is that this code is more easily readable than doing everything in inline assembly. The only drawback is that you have to use the same local variable for the first argument and the returned value.

[...]
Reply to
l'indien

Of course, you will still get pretty much the same "mr" instructions in the stand-alone version of the function (if it is generated) - it is only in in-lined versions that they could be eliminated.

And I presume you are only doing this optimisation for interest and understanding, not because you are setting vectors so often that a 3-cycle delay here will be a serious issue?

David

Reply to
David Brown

You won't have any mr in the stand-alone version: as the arguments are passed in registers r3 ..., level is already in r3, func in r4 and type in r5. As the return value also goes into r3, there won't be any mr at all. Then, when I compile this function as a standalone one, I get:

00000000 <kSetVector>:
   0:   38 00 00 00     li      r0,0
   4:   44 00 00 02     sc
   8:   4e 80 00 20     blr

Which is optimal.

We always want optimal code, don't we ? ;-)

Reply to
l'indien

Many thanks. That works with one addition: you still have to mention all the arguments to the "sc" (_level, _func, _type) on the inputs line, else GCC will optimise them away. I got it down to:

static inline bool1 kSetVector (uint1 level, pVoid func, int type)
{
    register uint1 _code  __asm__ ("r0") = 0;
    register uint1 _level __asm__ ("r3") = level;
    register pVoid _func  __asm__ ("r4") = func;
    register int   _type  __asm__ ("r5") = type;

    __asm__ __volatile__ (
        " sc \n"
        : "=r" (_level)
        : "rI" (_code), "0" (_level), "r" (_func), "r" (_type)
        : "cc", "memory" );

    return _level;
}


Reply to
David R Brooks

You're absolutely right. I have to admit I wrote it down without testing...

I just have two questions/remarks:

- Why don't you directly initialise _code = code? This would make the code even easier to read and won't produce more output code.

- I would use "+r" constraint for _level, to follow gcc asm constraints specifications. But, I'm not a specialist on this point, I must admit...

Reply to
l'indien

Answering your questions:

  1. _code is explicitly a constant: it is the function code. There are several similar definitions in the header file, with different names & corresponding function codes; the number of arguments varies too (see the sketch after this list).
  2. "+r", although legal :On Tue, 14 Jun 2005 18:12:31 +0800, David R Brooks wrote: : :> Many thanks. That works with one addition: you still have to mention :> all the arguments to the "sc" (_level, _func, _type) on the inputs :> line, else GCC will optimise them away. : :You're absolutely right. I have to admit I wrote it down without testing... : :> I got it down to: :> :> static inline bool1 kSetVector (uint1 level, pVoid func, int type) :> { :> register uint1 _code __asm__ ("r0") = 0; :> register uint1 _level __asm__ ("r3") = level; :> register pVoid _func __asm__ ("r4") = func; :> register int _type __asm__ ("r5") = type; :> :> __asm__ __volatile__ ( :> "sc \n" :> : "=r" (_level) :> : "rI" (_code), "0" (_level), "r" (_func), "r" (_type) :> : "cc", "memory" ); :> :> return _level; :> } : :I just have two questions/remarks: :- why don't you directly initialise _code = code ? This would make code :even more easy to read and won't product more output code. :- I would use "+r" constraint for _level, to follow gcc asm constraints :specifications. But, I'm not a specialist on this point, I must admit... : [snip]
Reply to
David R Brooks

Actually no. Readable (human readable) and correct first. Optimal is, at best, a distant third.

Robert

Reply to
R Adsett

OK, sorry, I misread your code...

I did the test, and gcc does accept it. "+r" is documented in the gcc documentation (I'm using gcc 2.95.3 as a PowerPC cross compiler).

[...]
Reply to
l'indien

Optimal implies correct code. One cannot describe anything as an optimal solution if it does not do what it is supposed to do. Things that are obscure at first become very "Human Readable" if they are the optimum solution to a problem. Readable code for even a complete newbie programmer is total black magic to the average lay person.

Regards Anton Erasmus

Reply to
Anton Erasmus

Well, yes. The converse is not, I think, true. Unless of course you define correct as a synonym for optimal. In this case though the context suggests that optimal meant fast.

To quote Knuth "Premature optimization is the root of all evil". I think that was Knuth anyway. Clear, fast enough and small enough are good for me. No need to go to the trouble of as small as possible or as fast as possible in most cases.

I've seen attempts to optimize that ended up only optimizing the obvious and missed doing the correct thing for the whole set of inputs when the clear version worked correctly for all cases. This in a case where the clear version was fast enough and small enough.

On this I will disagree. We've all done clever things at one time or another that when we went back to them later were far from clear. If you have ever used APL I can guarantee it ;)

Yes, but so what? If it is necessary to optimize a sequence to fit it within tight constraints, then sufficient supporting comments must be added to make it clear what is being done and why, even to someone who is encountering it for the first time. Basic knowledge of the implementation language and external HW can probably be assumed, but when you start relying on multiple side effects or delay testing a flag for several instructions, you had better warn the unwary reader of the traps that lie in the code. I don't expect I get this right all the time either, but I do try.

Robert

Reply to
R Adsett

No, correct code is not necessarily optimal, but I believe that for code to be the optimal code for a specific problem, it should be correct under all conditions within the specific problem's domain. Optimal code is the smallest and/or fastest set of instructions that does the specific thing one wants to do.

If the code is broken when trying to optimize, then the resultant code is not optimal, but wrong. And yes I do agree that trying to optimize a total application to the point where it is impossible to get it faster/smaller is in 99.999% of the cases just a waste of time.

What I mean is that when confronted with a section of code for the first time, it might be quite obscure. If this sequence of code is the optimal solution to a specific problem, and many programmers end up using this sequence, then it becomes "Human Readable" by the mere fact that it is used often, by many people, in a well defined context. For someone used only to high level code, simple basic assembly can be quite obscure and not readable at all. What is obscure to a beginner might actually be quite clear to a more experienced person. As in most things, the difference is not Black/White and exactly where the line lies is open to debate.

Regards Anton Erasmus

Reply to
Anton Erasmus

... snip ...

I recall an exposition of Knuth's some years ago, in which he reworked some fairly normal code into a peculiar monster. It was developed step by step to improve efficiency in a perfectly logical manner. IIRC it ended up with a goto into the middle of a structured statement, which is considered a no-no.

One of the points he made with it was that such derivations should include the original, and the various steps taken to attain the end result. Otherwise it has virtually no chance of making sense to the later reader.

--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
Reply to
CBFalconer

Yes, it is easy to write code that is fast but incorrect!

Knuth also gave two rules of optimization:

1) Don't do it.
2) (For experts only) Don't do it yet.

Readability goes hand-in-hand with correctness in priority - code that is unreadable is unlikely to be correct, and even less likely to be checked to be correct (either by testing or proof). Getting optimal code, or at least close to optimal, involves two things - thinking about your code as you write it, and using a good compiler. The biggest difference to the speed and size of code is made when thinking about what your code should do (i.e., at the algorithmic stage), then by thinking about your implementation (e.g., using integers instead of floats, and understanding how your code will fit with the target's capabilities). Small things, such as when to use arrays and when to use pointers, are best left to the compiler if it has a good optimizer.
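As a hypothetical illustration of the "integers instead of floats" point (not from the thread): converting a raw 12-bit ADC reading to millivolts with integer arithmetic avoids dragging in floating-point support on a small target.

#include <stdint.h>

/* Floating-point version: may pull in soft-float library code on a small core */
static float adc_to_volts(uint16_t raw)
{
    return raw * (5.0f / 4096.0f);
}

/* Integer version: fixed-point result in millivolts, integer multiply and divide only */
static uint32_t adc_to_millivolts(uint16_t raw)
{
    return ((uint32_t)raw * 5000u) / 4096u;
}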

Reply to
David Brown

The problem is that many programmers seem to understand this as: "Write code as sloppily as possible, and do not even think about whether the current approach is easy or difficult for the processor."

The worst I have seen was an application where a set of different routines was called based on configuration data. There was an array of function pointers, and obviously for readability they decided that calling function[3](args) is not clear. Calling something like function[FOO](args) is a lot clearer, where FOO gives some indication of what the function does. Instead of just having a #defined or enumerated list, the programmers put the names in a character array. They then had a function that did a string compare on the character array every time it was called, to decide which function pointer in the array to take. Needless to say, this was VERY slow. A very simple optimization speeded the code up by more than 100 times.
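A minimal sketch of the two approaches, with invented names (do_init etc.) purely for illustration:

#include <string.h>

typedef void (*handler_t)(int);

void do_init(int arg);   /* hypothetical handlers */
void do_run(int arg);
void do_stop(int arg);

static handler_t handlers[] = { do_init, do_run, do_stop };

/* Slow version: a string compare on every call to find the right handler */
static const char *names[] = { "INIT", "RUN", "STOP" };

void dispatch_by_name(const char *name, int arg)
{
    for (unsigned i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(name, names[i]) == 0) {
            handlers[i](arg);
            return;
        }
}

/* Fast version: the configuration stores an enumerated index directly */
enum { H_INIT, H_RUN, H_STOP };

void dispatch_by_index(int which, int arg)
{
    handlers[which](arg);        /* one indexed call, no string compare */
}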

Of course it is debatable whether this was optimization or fixing incorrect code.

Having now read the rest of your post :) I totally agree with your additions to the Knuth optimization rules. A better algorithm is worth more than a poor algorithm optimized to the core.

Regards Anton Erasmus

Reply to
Anton Erasmus

I think we are agreeing here. My only point in the above was that the obsession with optimization appears to often result in somewhat faster and/or smaller but broken code.

Again we will have to disagree on this. Once an implementation goes beyond the straightforward it becomes obscure. It may be completely transparent while you are working on it, but once you leave it for 6 months or a year it will no longer be so. I still find myself at least occasionally beefing up the comments on code when I revisit it later.

As far as a frequently used sequence being clear, it seems to me that more than once I've read code with a comment that goes something like "For some reason everyone does this, I don't know why it works but it does". Frequently used obscure code appears to also have a tendency to devolve into magic incantations ;)

True enough, but there is a difference between an implementation being obscure because you are not familiar with the language, processor or hardware, and it being obscure because you are using non-straightforward techniques. It's the latter that demands special care and attention, particularly in making sure that whoever follows can figure out what is going on without needing to spend a lot of time re-inventing the solution. Sometimes that involves a short note, sometimes a long description and sometimes a reference to a discussion in a lab book or paper.

An example might be an FFT. Any implementation of that should either contain a full description or, better, a reference to a full description. In this case the reference would be better, since the published explanation is almost certainly more complete than any of us would have the patience to write to accompany the code.

Robert

Reply to
R Adsett

Well put.

Robert

Reply to
R Adsett

Well, there is a difference between badly designed code and un-optimized code :). OTOH this might have made sense if it was part of a command parser or interpreter. In a command parser the lookup might have been small compared to the typing time ;). In a straightforward case of substituting different run-time routines it seems unnecessarily complex though.

Reply to
R Adsett
