mixing C and assembly

Possibly, but the C programmer will beat that assembler programmer on all other MCUs...

He will also be a close second on MCU A... More to the point, he will be able to turn out reliable, more easily maintainable applications faster.

--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills  Staffs  England     /\/\/\/\/
Reply to
Chris H

It took a bit of convincing to make me understand how C could work better than assembly for small processors. Part of that was from my own experience, and part of it was that, until fairly recently, C compilers were not good enough to compete with an experienced assembly programmer. But let me give you a brief example of the sort of thing a C compiler can easily do, that an assembly programmer cannot do while still writing maintainable and legible code.

On the COP8 processor, the two most important addressing modes for arithmetic instructions are direct access (in which the memory address is specified in the instruction), and indirect via the B register. Direct access instructions take 3 bytes and 4 cycles, indirect take 1 byte and 1 cycle. (This is from memory, so I might make a few errors here.) Suppose you have a function that adds two global variables, and stores the result in a third. The natural assembly code is something like this:

.sect .data
var1:   .dsb 1
var2:   .dsb 1
sum:    .dsb 1
.endsect

.sect .code
AddNumbers:
        ld   a, var1    ; 3 bytes, 4 cycles
        add  a, var2    ; 3 bytes, 4 cycles
        x    a, sum     ; 3 bytes, 4 cycles
        ret             ; 1 byte, x cycles
.endsect

Total: 10 bytes, 12 cycles + ret (I can't remember how many cycles ret takes).

C code:

uint8_t var1, var2, sum;

void AddNumbers(void)
{
    sum = var1 + var2;
}

Possible compiler-generated assembly code:

.sect .data
var1:   .dsb 1
var2:   .dsb 1
sum:    .dsb 1
.endsect

.sect .code
AddNumbers:
        ld   b, #var1   ; 2 bytes, 2 cycles IIRC
        ld   a, [b+]    ; 1 byte, 1 cycle
        add  a, [b+]    ; 1 byte, 1 cycle
        x    a, [b]     ; 1 byte, 2 cycles IIRC
        ret             ; 1 byte, x cycles
.endsect

Total: 6 bytes, 6 cycles + ret

Obviously, an assembler programmer could write this code directly as well. But it only works as long as var1, var2 and sum are ordered in this manner. If they were split up, the assembly code would break - maintenance and legibility suffer greatly. Perhaps you have other routines that could be optimised using [b] mode if the data were in a different order. Writing the assembly by hand, you've got to figure out which ordering works best - and re-write your functions to take advantage of the ordering. A small change to one part of the code means a re-write for other parts of the code - that's not a good plan for software development. Thus in realistic programs, the programmer will go for the pessimistic code that works regardless of the ordering. A compiler, on the other hand, can pick a reasonable ordering (not *optimal* - that is not achievable in polynomial time - but pretty good nonetheless) based on variable usage, and it will make use of that ordering when generating function code.

Reply to
David Brown

That's just tail call elimination (changing a "call X; ret" into a "jmp X"), which is a standard optimisation technique (some assemblers will do that for you).
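As a C-level illustration (my own sketch, not taken from the posts): any function whose final action is a call is a candidate, since the compiler - or a smart assembler - can replace the call-plus-return with a plain jump:

extern int g(int x);

/* The call to g is in tail position, so "call g; ret" can become "jmp g". */
int f(int x)
{
    return g(x + 1);
}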

A better example would be:

WriteSpace:
        ld   a, #' '
WriteChar:
        st   a, outputCharacter
        ret

with C code:

extern volatile char outputCharacter;

void WriteChar(char c)
{
    outputCharacter = c;
}

void WriteSpace(void)
{
    WriteChar(' ');
}
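To show what the two entry points buy you, here is a hypothetical caller (WritePrompt is my own invented name, not from the thread); in the hand-written, fall-through version above, the call to WriteSpace costs only the extra "ld a, #' '" before it shares WriteChar's store and return:

void WritePrompt(void)
{
    WriteSpace();       /* shares WriteChar's store and return */
    WriteChar('>');
}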

Reply to
David Brown

A quick test on avr-gcc 4.2.2, using 16-bit and 8-bit ints rather than 32-bit and 16-bit (since it's an 8-bit CPU), reveals that avr-gcc is smart enough to do an 8-bit x 8-bit -> 16-bit multiply as desired. It's a little harder to see exactly what is happening for bigger numbers and for division, since these use library calls - certainly the compiler will generalise some of these functions. But for the very common case of a multiply like this, you get optimal code.
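For reference, the kind of source that produces the single widening multiply looks something like this (my own sketch; the function name is invented, and the exact code avr-gcc emits depends on the device and optimisation level):

#include <stdint.h>

/* 8 x 8 -> 16 multiply; on an AVR with a hardware multiplier,
   the compiler can reduce this to one MUL plus register moves
   instead of a full 16 x 16 library multiply.                  */
uint16_t mul8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)a * b;
}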
Reply to
David Brown

Robert,

A lot of the approach depends on the processor. We use the "as if" rule a lot in code generation. In general, an 8*8->16 bit multiply will use the processor's 8*8 multiply if we can. Similarly, we grab the MS 8 bits when we multiply two 8-bit fracts, rather than casting and using a 32-bit multiply.
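As a sketch of that fract idea in portable C (my own illustration, not Byte Craft's implementation): treating a and b as unsigned 0.8 fixed-point values, the 8x8->16 product is a 0.16 value, and keeping its most significant byte gives the 0.8 result without ever touching 32-bit arithmetic:

#include <stdint.h>

/* Multiply two unsigned 0.8 fracts (value = x / 256).
   The 16-bit product is a 0.16 fract; its high byte is the 0.8 result. */
uint8_t fract_mul(uint8_t a, uint8_t b)
{
    return (uint8_t)(((uint16_t)a * b) >> 8);
}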

Regards

-- Walter Banks Byte Craft Limited Tel. (519) 888-6911


Reply to
Walter Banks

I should have used fixed-pitch type to make the listing fragment clearer. This is the source used in the example.

void bar (void);

void foo (void) { NOP(); bar(); }

void bar (void) { NOP(); }

void main (void) { foo(); bar(); }

Regards

-- Walter Banks Byte Craft Limited Tel. (519) 888-6911


Reply to
Walter Banks

Huh? Is something wrong with my writing or with your reading? Where in the above did you see me talking about maintainability or difficulty? The issue at hand is _speed_ and _size_. No more, no less.

That's why the prudent assembly programmer would secure such tricks with assembly-time assertions. I.e. make the assumptions explicit, and make sure that the code fails to translate if any of them is no longer true.

Agreed. But you're still missing the point under discussion.

Reply to
Hans-Bernhard Bröker

On Apr 27, 10:23 am, David Brown wrote:

Walter's paper isn't even a test case. You're right that I won't get evidence here; that would require some effort rather than just waffle.

Quite so, I use C most of the time.

That's a different claim from Walter's. Not being familiar with AVRs, I couldn't say one way or the other, but it's no good on small PICs.

Splitting your code into separate functions ...

A C function is not just a subroutine; it's much more than that. It's a complete stand-alone program, with no dependencies on either the calling program or the programs it calls. The idea is that each function can be developed, tested and debugged independently.

(maybe you're thinking of Forth?).

I've seen that one before - gforth! It won't run on Windows.

They can't sell this one any more - some European directive; they don't say which one.

So I'm still waiting.

Reply to
cbarn24050

In which case you lose... I can read the C. I can't read the ASM, so I won't be able to see that what you have done is the same as the C, or even correct... :-)

The whole point is that the C can be as fast and as small as the ASM but MUCH easier to read, debug and maintain. Certainly far faster to write.

(BTW I do enjoy writing in asm but that is not the point)

Also, the compilers can do some optimisations that humans find difficult to do. Some optimisations involve the linker, not just the compiler, or so I am told by a compiler writer (no, it was not Walter).

So in SOME cases an experienced asm writer MIGHT be able to produce smaller, faster code than the compiler, but certainly NOT in the same time frame. Also, that particular experienced ASM programmer can probably only do that for one or two MCUs, and not for all types of program.

--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills  Staffs  England     /\/\/\/\/
Reply to
Chris H

... snip ...

Defining 'optimal' is a varying target. Among others, see Knuth. In particular, in the past I have compromised on an 8 * 16 -> 24 bit heart, two of which, with an addition, produced a 16 * 16 -> 32 multiplication. This had, on the machine of interest (an 8080), significant advantages, i.e. about a 50% decrease in multiplication times. Other games are available at the compile stage where one operand is constant, especially those where the multiplier consists of some solid string of 1 bits.
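A C rendering of that decomposition might look like the sketch below (function names are my own invention; on the 8080 the "<< 8" is of course just an offset into the addition, not real work), together with the constant string-of-1-bits trick:

#include <stdint.h>

/* Assumed 8 x 16 -> 24 bit "heart". */
static uint32_t mul8x16(uint8_t a, uint16_t b)
{
    return (uint32_t)a * b;                  /* fits in 24 bits */
}

/* 16 x 16 -> 32 built from two partial products plus one addition:
   a * b = ((a >> 8) * b << 8) + ((a & 0xFF) * b)                   */
uint32_t mul16x16(uint16_t a, uint16_t b)
{
    return (mul8x16((uint8_t)(a >> 8), b) << 8)
         +  mul8x16((uint8_t)a, b);
}

/* Constant multiplier that is a solid run of 1 bits:
   x * 255 = x * 256 - x, i.e. a shift and a subtraction. */
uint16_t mul_by_255(uint8_t x)
{
    return ((uint16_t)x << 8) - x;
}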

--
 [mail]: Chuck F (cbfalconer at maineline dot net) 
Reply to
CBFalconer

... snip ...

Well, that executes foo (and thus bar), followed by bar. I see no savings there from fall-thru. See my message of Sat. 11:13 am EDT -0400.

--
 [mail]: Chuck F (cbfalconer at maineline dot net) 
Reply to
CBFalconer

... snip ...

But that doesn't do anything, because normal C executes a return on the closing brace. Am I missing something?

--
 [mail]: Chuck F (cbfalconer at maineline dot net) 
Reply to
CBFalconer

It is this type of check that is already embedded in C compilers. Programming in asm is an exercise in both application programming and implementation. In C, the focus is on application algorithms with an implementation outline.

I don't think so. Most of what I have been saying is: use the correct tool for the job. This is not an asm vs C issue. The importance of the work we did that created the white paper is that it is proof that C does not have to be at a performance disadvantage to asm. That said, let's look at the other issues and see where C has an advantage.

We are increasingly seeing ISAs that were designed specifically for machine-generated code. Our focus has always been on making the code generation process easier.

Regards

-- Walter Banks Byte Craft Limited Tel. (519) 888-6911


Reply to
Walter Banks

So at least some compilers do so. Thanks.

Robert

Reply to
Robert Adsett

CBFalc> Walter Banks wrote:

There is a savings.

Look at the listing I posted before. It follows in fixed-pitch type. Don't start a rant about HTML, please.

w..

void bar (void);

void foo (void) {

0100 9D          NOP                  NOP();
                                      bar(); }

void bar (void) {

0101 9D          NOP                  NOP();
0102 81          RTS                  }

void main (void) {

0103 AD FB       BSR   $0100          foo();
0105 20 FA       BRA   $0101          bar(); }

__MAIN: FFFE 01 03

Reply to
Walter Banks

You must be missing something :-) Your example code was not very helpful, because your first version implied that foo is a callable function in its own right - making a combined fall-through foobar would require duplicating the code for foo. Thus Walter did a direct translation to C and generated code that was slightly better than your first assembly code. In the code I've given, I wrote an assembly function with two distinct entry points, and the typical equivalent C code for it. The question is, will Walter's C compiler generate a fall-through here?

Reply to
David Brown

Yes, "optimal" can mean different things - code size, speed, stack use and ram size being the most common points. "optimal" also depends on things like shared library code, and any other information that the compiler may have. That's why I restricted my test to a simple 8x8->16 multiply on the AVR - the generated code is simple enough to be optimal in every way.

Reply to
David Brown

Good to know, thanks Walter.

Robert

Reply to
Robert Adsett

Actually, no, that's not the point under discussion. The point is this, and only this statement taken from your post of 2008-04-26, 07:36h+0100h:

... code than assembler.

I rejected exactly that statement, as written, and you think I shouldn't. I haven't seen any argument to support that statement.

Instead you (and Walter) argue against things I didn't say.

Yes, compiler users produce good code faster than assembler users. But that doesn't mean compilers produce faster code than assembler.

There is no code a compiler can "produce" that an assembler couldn't, so it's impossible for them to have the rigid advantage claimed by the quoted statement.

Reply to
Hans-Bernhard Bröker
