What micros do you actually hate to work with?

It sounds as if you are comparing experienced, knowledgeable embedded assembler programmers with C programmers who don't have a clue about using C in embedded systems. A better comparison, surely, would be to compare to an experienced, knowledgeable embedded C programmer -- the kind who has a very good idea of the machine code that will result from every line of his C source.

Mike Silva

You are arguing... The generality of the example means that any of the meaningless instructions can be replaced with a sequence of useful instructions.

What human factor? We are comparing two languages, not two humans.

Is your GIF code an actual example where you got a factor of 10? If so, compared to which C code? If not, how did you derive the factor of 10?

It's probably easy to locate some badly written GIF code, a good version might be more difficult... In any case both versions would need to adhere to the same spec.

Wilco

Wilco Dijkstra

"Anton Erasmus" wrote in message news: snipped-for-privacy@4ax.com...

...

Output from IAR compiler, maximum optimization for space:

      4  unsigned short f(unsigned short x)
   \                     f:
      5  {
   \   00000000   01B8       MOVW    R23:R22, R17:R16
      6      unsigned short y;
      7      y = x + (x >> 2);
      8      y = x + (y >> 2);
      9      y = x + (y >> 4);
     10      y = x + (y >> 2);
     11      y = x - (y >> 2);
     12      y = x + (y >> 3);
     13      return y >> 1;
   \   00000002   ....       RCALL   ?Subroutine0
   \                     ??CrossCallReturnLabel_3:
   \   00000004   0F06       ADD     R16, R22
   \   00000006   1F17       ADC     R17, R23
   \   00000008   E044       LDI     R20, 4
   \   0000000A   ....       RCALL   ?Subroutine1
   \                     ??CrossCallReturnLabel_0:
   \   0000000C   ....       RCALL   ?Subroutine0
   \                     ??CrossCallReturnLabel_4:
   \   0000000E   019B       MOVW    R19:R18, R23:R22
   \   00000010   1B20       SUB     R18, R16
   \   00000012   0B31       SBC     R19, R17
   \   00000014   0189       MOVW    R17:R16, R19:R18
   \   00000016   E043       LDI     R20, 3
   \   00000018   ....       RCALL   ?Subroutine1
   \                     ??CrossCallReturnLabel_1:
   \   0000001A              REQUIRE ?Subroutine3
   \   0000001A              ; // Fall through to label ?Subroutine3
     14  }

   \                     In segment CODE, align 2, keep-with-next
   \                     ?Subroutine3:
   \   00000000   9516       LSR     R17
   \   00000002   9507       ROR     R16
   \   00000004   9508       RET

   \                     In segment CODE, align 2, keep-with-next
   \                     ?Subroutine1:
   \   00000000   ........   CALL    ?US_SHR_L02
   \   00000004   0F06       ADD     R16, R22
   \   00000006   1F17       ADC     R17, R23
   \   00000008   9508       RET

   \                     In segment CODE, align 2, keep-with-next
   \                     ?Subroutine0:
   \   00000000   ....       RCALL   ?Subroutine2
   \                     ??CrossCallReturnLabel_2:
   \   00000002   0F06       ADD     R16, R22
   \   00000004   1F17       ADC     R17, R23
   \   00000006              REQUIRE ?Subroutine2
   \   00000006              ; // Fall through to label ?Subroutine2

   \                     In segment CODE, align 2, keep-with-next
   \                     ?Subroutine2:
   \   00000000   9516       LSR     R17
   \   00000002   9507       ROR     R16
   \   00000004   ....       RJMP    ?Subroutine3

Total of 24 instructions. Point proven...

--
Best Regards,
Ulf Samuelsson
Ulf Samuelsson

But it indeed has literally translated the code as expected, it just needs 2 instructions for every operation.

I like this bit:

    swap r27
    swap r26
    andi r26,0x0f
    eor  r26,r27
    andi r27,0x0f
    eor  r26,r27

This is a good example where a typical assembler programmer would have gone for repeating lsr+ror 4 times rather than this neat swapping trick.
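For readers who want to check the trick, here is a small C model of the six-instruction sequence. It is only a sketch for verifying the logic, not compiler output. It assumes r27 holds the high byte and r26 the low byte of a 16-bit value; the function names are hypothetical.

```c
#include <stdint.h>

/* Model of the AVR SWAP instruction: exchange the two nibbles of a byte. */
static uint8_t swap_nibbles(uint8_t b)
{
    return (uint8_t)((b << 4) | (b >> 4));
}

/* Step-by-step model of the sequence, assuming r27:r26 = high:low byte.
 * Returns the 16-bit value left in r27:r26 afterwards. */
uint16_t shr4_swap_trick(uint16_t x)
{
    uint8_t r27 = (uint8_t)(x >> 8);
    uint8_t r26 = (uint8_t)(x & 0xFFu);

    r27 = swap_nibbles(r27);   /* swap r27                               */
    r26 = swap_nibbles(r26);   /* swap r26                               */
    r26 &= 0x0Fu;              /* andi r26,0x0f : keep old high nibble   */
    r26 ^= r27;                /* eor  r26,r27  : mix in swapped r27     */
    r27 &= 0x0Fu;              /* andi r27,0x0f : high byte is now >> 4  */
    r26 ^= r27;                /* eor  r26,r27  : cancel the stray bits  */

    return (uint16_t)(((uint16_t)r27 << 8) | r26);
}
```

Under those assumptions the result equals a logical shift right by four for every 16-bit input, in six single-cycle instructions instead of the eight needed for four lsr+ror pairs.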

On a 32-bit CPU use unsigned instead of unsigned short, this makes the ARM version use exactly 8 instructions!
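A sketch of what that change looks like, assuming a target where unsigned is at least 32 bits wide. The names f16 and f_native are hypothetical, and the instruction count quoted above is not something this sketch verifies. The 16-bit version has to truncate every intermediate result back to 16 bits, which is the extra work a 32-bit compiler must emit; the native-width version does not.

```c
#include <stdint.h>

/* The function from the listing, with explicit 16-bit storage for y.
 * Each assignment truncates the wider intermediate back to 16 bits. */
uint16_t f16(uint16_t x)
{
    uint16_t y;
    y = (uint16_t)(x + (x >> 2));
    y = (uint16_t)(x + (y >> 2));
    y = (uint16_t)(x + (y >> 4));
    y = (uint16_t)(x + (y >> 2));
    y = (uint16_t)(x - (y >> 2));
    y = (uint16_t)(x + (y >> 3));
    return (uint16_t)(y >> 1);
}

/* Same arithmetic at the native register width: no truncation needed. */
unsigned f_native(unsigned x)
{
    unsigned y;
    y = x + (x >> 2);
    y = x + (y >> 2);
    y = x + (y >> 4);
    y = x + (y >> 2);
    y = x - (y >> 2);
    y = x + (y >> 3);
    return y >> 1;
}
```

Note that the two are not interchangeable for all inputs: they agree as long as every intermediate fits in 16 bits, and diverge once x + (y >> n) overflows 16 bits, so the substitution is only safe when the input range is known.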

Wilco

Wilco Dijkstra

compete with asm. There is no mechanism in the language to specify that the elapsed time between two points in code should always be a specific amount of time. It is also not an

I don't like that method. As Walter points out, assembly makes sense when you need predictable timing of sections of code, as the semantic is simply missing from C.

For example (and I can supply a nicely prepared white paper on the subject to those interested), in some cases the speed at which an arbitration scheme can operate depends largely on the predictability of the open-collector/open-drain arbitration line operation by each processor. In such cases, and note here we are talking about open drain outputs where a '1' bit is often handled slightly differently than a '0' bit when driving, you want the time required to output a '1' and the time required to output a '0' in some series of bit values to be _exactly_ the same. Not even one cycle different, if possible. You cannot specify this in C. It doesn't have a way for one side of an if branch to be specified as taking the same cycle count as the other. It's possible to add this to C. But so far, no compilers offer it.

This example is one of several where the semantic scope of C fails, since it simply has no way to direct the compiler towards a goal.

However, your suggestion above also isn't good enough. I need to have the least number of possible cycles on this processor -- the least number where both branches have exactly the same timing, that is.

I can't afford to have two function calls, associated overhead, etc. And frankly, I don't believe that the implementors of such things would truly get this down to exact cycle count precisions. They'd screw up, almost for sure. Finally, I don't want to try 320, then when that works, try 310, then when that works, try 280, etc. This kind of playing around is painful and I wouldn't want any part of it.

What makes sense is blocking out some code between two #pragma's or specially interpreted comment lines, as is used by lint. Which to use, I don't know. But the Bulldog compiler illustrates some means by which it was actually done, I think.

Code between the #pragma's would be required to have a fixed execution time, regardless of any code edge transitions within it. Simple. It would be the compiler optimizations which would attempt to find the minimum possible timing. But it doesn't have to be perfect in that sense, just as no C compiler is now required to always produce the fastest possible implementation. Just fixed.

Jon

Jonathan Kirwan

That's no threat. It's pleasurable.

Jon

Jonathan Kirwan

I've only had exactly one case in my entire professional life where I was permitted to implement the exact same application, from top to bottom, in both assembly and C. It was on a PIC processor using Microchip's C compiler tools and their assembly under MPLAB. I was the only programmer on the project and the first implementation was in assembly and had to fit into a processor with 4k of code space. The second implementation was to be placed into the then newer PIC18 with 32k of code space with an eye to start adding features, once it was exactly ported into C at the start.

This kind of real-life application test doesn't happen that often.

We'd first decided upon the assembly route in the earlier incarnation because of some tests we did with the earlier C compiler and the very limited code space we had available to us in the available PICs that were appropriate to the design. At that time, the PIC18 line of parts was 'very new' and we still hadn't even been able to get samples. So there was no way to consider it, seriously. We kept our eye on them, though.

In this case, also, I was the only programmer. I've been using C since 1978 and I've actually written a toy C compiler on my own, so I hope you can accept the fact that I do know a little about how to use it. I knew the application well from having written it already in the first place. So going into the C incarnation, it was probably the best of possible circumstances for the C side -- I knew all the details of various functions needed. Similarly, the assembly had fit into just 2k. After writing a nicely designed equivalent in C, the footprint was just over 12k, by comparison. Data footprint wasn't that much different, to be honest.

The time I had available for writing both the assembly version and the C version was also similar, by the way -- four months or so. One of the time-wasters in writing the C had to do with the compiler's use of static compiler temporaries. This was bad news for interrupt events calling C, because the compiler's live variable analysis was not able to cover that circumstance. And it cost me time to track that down and design a work-around for the case. Others just had to do with learning all the #pragma's needed to deal with variable placement, etc.

Now, I'm a very experienced assembly code writer, too. I would put myself against any C compiler, on small or large programs, without any worry that I couldn't beat its output on any measure, by a margin that anyone looking at it from the outside would agree was worth having, and on a similar schedule.

One of the advantages in writing assembly is that your semantic options are wider. All tools are two-edged swords. C's advantages are also its disadvantages. Same for assembly. But assembly does unarguably have wider semantic options, whether we are talking about extremely minuscule details such as access to applying status bits in ways that C cannot directly support (and if you imagine I mean clearing or setting some status bit you have no clue what I'm talking about here), to mid-level semantic choices such as exact timing regardless of code edge taken, or large-scale semantic choices such as the fixed and varying assignment of registers and value passing modes, mixing various styles of function prologues/epilogues, or coroutine semantics (simply unavailable in C). Whether or not these mean as much to your application as some of the benefits of C is another matter.

And none of any of this means one must use either C or use assembly. Most of my applications use both, to be honest. So for me, it's an amalgam that works more often. But one of the really big reasons for using C, is that other programmers for C are easier to find. And that can be very important.

But there are times and places for assembly code. And some applications are so competitive that the nickels and dimes are important, or the power consumption is important, or the die size is important, or... and assembly can at times make the difference.

My hope is that everyone be proficient at both.

I don't know if anyone else here can say that they actually were put in a position of doing the exact same project twice -- once in assembly and once in C -- that they had excellent experience in using both and were competent at both and can then make a real comparison of a real world case. But I have had that experience once. And there was a remarkable difference in code size (not data size).

I've also challenged myself in writing code snippets in both. I suppose a lot of us may have done that. I'd be happy to provide one such example here and let anyone try their compilers on it and see what I did in assembly, by comparison. But others will legitimately argue this proves nothing and I'd agree with them. Still, it may open an eye or two.

Meanwhile, C is a very good choice for many if not most applications, where a good C tool is available.

Jon

Jonathan Kirwan

That's not what I was referring to. I was referring to constructs like __SP__ to refer to the stack pointer (a different meaning of intrinsic; perhaps you use a different term). I see that as non-C and certainly non-portable. Similarly, intrinsics that access status flags.

Not the same thing at all as forcing the result to be returned in the carry flag. C pretty much requires wider return values, although maybe you could come up with something using C99's new boolean type.

I sometimes drop to asm (your 1% is not a bad estimate) for performance but more usually for interrupt epilogue/prologue and task switching. I know those items are non-portable and I prefer to have precise control over what is happening in those cases.

Robert

Robert Adsett
rily true in the embedded world.

ybody agrees it is exactly the opposite, assembler is typically

e is no human factor involved. Your claim was that you can

You're missing the point. It's not a matter of compiler efficiency; even if you had a perfect compiler it wouldn't make any difference.

cbarn24050

Actually, most of them aren't. There are some "intrinsics" like or the offsetof() macro that are standardized, but stuff like __nop(), __interrupt(), or __rotate_with_carry() aren't. Not even close.

The majority of them aren't.

And most programmers would never know, because they would have written

xa = yb + zc;

, i.e. used the next-highest data type, and let the compiler take care of carry.
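The "next-highest data type" approach can be sketched as follows, alongside the way a carry is usually recovered in portable C when no wider type is used. This is an illustrative sketch only; the function names are hypothetical and nothing here says what any particular compiler will emit for either form.

```c
#include <stdint.h>

/* Add two 16-bit values in the next-wider type; the carry is simply
 * bit 16 of the wider sum, and the compiler handles the details. */
uint16_t add16_wide(uint16_t a, uint16_t b, unsigned *carry)
{
    uint32_t sum = (uint32_t)a + b;
    *carry = (unsigned)(sum >> 16);
    return (uint16_t)sum;
}

/* Same result without a wider type: after a wrapped unsigned add,
 * the sum is smaller than an operand exactly when a carry occurred. */
uint16_t add16_wrap(uint16_t a, uint16_t b, unsigned *carry)
{
    uint16_t sum = (uint16_t)(a + b);
    *carry = (sum < a);
    return sum;
}
```

Both forms are standard C; neither gives the programmer direct access to the hardware carry flag, which is the distinction being argued in this thread.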

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
Hans-Bernhard Broeker

ounds as if you are comparing experienced, knowledgeable embedded

Surely that would mean an expert assembly programmer? Isn't C supposed to remove that requirement?

cbarn24050

Good point. Walter, any examples ?

[Modula-2 has a BOOLEAN type, and does return in the CY flag, so I'd forgotten this older-C drawback ]

-jg

Jim Granville

Hello Robert,

They seem to be non-portable even between versions for different processors. Ulf mentioned __delay_cycles on the IAR compiler (for AVR); it doesn't show up in the IAR version for the MSP430. But that one does offer __set_SP_register.

--
Regards, Joerg

http://www.analogconsultants.com
Joerg

Unpowered ones

martin

martin griffith

sounds as if you are comparing experienced, knowledgeable embedded

Who ever said such a thing? C is no magic bullet that guarantees the practitioner will understand embedded hardware, software and systems, and the ways to squeeze the most performance out of the whole package. Forget assembler, you could just as well claim that a good embedded C programmer will produce code that is 10x smaller and/or faster than a C programmer who doesn't have a clue about embedded programming. You might even get a lot more takers on that claim, BTW.

Mike Silva

cannot compete with asm. There is no mechanism in the language to specify that the elapsed time between two points in code should always be a specific amount of time. It is also not an

I am inclined to agree. On the other hand I've been lucky enough not to need to control timing via instruction timing. If my timing needs are that dire I usually look to hardware.

Not my suggestion really. Just commenting that if you needed a construct to synchronize your timing that was a useful syntax. I can think of cases where someone might find it useful. Communications protocols come to mind where you want to sample exactly n cycles after a certain point. If you just want to be consistent on multiple paths that's a different problem. I understand some crypto functions might benefit from the latter.
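For the "consistent on multiple paths" case, the usual C workaround is to remove the branch rather than to balance it. A minimal sketch, with a hypothetical function name; note that it supports the point made elsewhere in this thread, since the C language gives no guarantee that a compiler will keep this branch-free or that both outcomes take the same number of cycles. The generated code still has to be inspected.

```c
#include <stdint.h>

/* Branchless select: returns if_true when cond is nonzero, else if_false.
 * The mask is all ones or all zeros, so both inputs are always touched
 * and no data-dependent branch appears in the source. */
uint8_t select_u8(uint8_t cond, uint8_t if_true, uint8_t if_false)
{
    uint8_t mask = (uint8_t)(0u - (unsigned)(cond != 0)); /* 0xFF or 0x00 */
    return (uint8_t)((if_true & mask) | (if_false & (uint8_t)~mask));
}
```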

Why would there be function calls? For instruction-level timing I would expect the compiler to implement it internally.

That way lies madness, no question. I don't even trust interrupt keywords so it's not likely I'll use the construct myself anytime soon.

I am rather curious as to how well it might be done by a compiler though, so I'm happy to have set up a straw man.

Robert

Robert Adsett

my entire professional life where I

You're absolutely right, but your comments will fall on deaf ears, some because they are selling compilers and some because they don't know any better.

cbarn24050

cannot compete with asm. There is no mechanism in the language to specify that the elapsed time between two points in code should always be a specific amount of time. It is also not an

I think the Bulldog compiler is a rich source of ideas on this score. It had to deal with different timings on adjacent DRAM banks, which means it had to know about this when compiling. It had to deal with pushing up code blocks across code edges and conditionally toss the results. It needed programmer information about which branches were more likely and could aid in generating that information. Etc. It did quite a bit. It was designed for VLIW, but some of the ideas may be appropriate for today's embedded use -- or at least provide one possible model. A lot of new(ish) compiler idea details were exposed there.

My preference is something akin to this pidgin code:

    failure= 0;
    for ( i= 0; i < 7; ++i ) {
    #pragma begin_fixed_execution_time
        if ( v & mask ) {
            port.bit= 1;
            tris.bit= 1;
            nop;    /* simulated single-cycle delay */
            nop;
            nop;
            if ( port.bit == 0 ) {
                failure= 1;
                break;
            }
        } else {
            port.bit= 0;
            tris.bit= 0;
            /* no sampling required */
        }
    #pragma end_fixed_execution_time
        v >>= 1;
    }

Where I don't have to care about anything except that the per-bit timing is fixed.

Jon

Jonathan Kirwan

"Wilco Dijkstra" wrote in message news:S4bXg.2759$ snipped-for-privacy@newsfe3-gui.ntli.net...

What would it look like... a circular buffer in which the squared readings are stored, update the sum of all values stored in the buffer and divide by buffer length, and calculate a square root of that average as final output result. Repeat for each new value stored in the circular buffer.

Of course you would use a buffer with 2^n length, to keep it all as easy as possible. What is left, is a series of adds, shifts and subs.
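The scheme described above can be sketched in C roughly like this. It is a minimal sketch under assumptions that are not in the post: signed 16-bit samples, a buffer of 8 entries, a 64-bit running sum to rule out overflow, and a bit-by-bit integer square root. All names are hypothetical.

```c
#include <stdint.h>

#define RMS_LOG2_LEN 3u                    /* assumed buffer length 2^3 = 8 */
#define RMS_LEN      (1u << RMS_LOG2_LEN)

typedef struct {
    uint32_t squares[RMS_LEN];  /* squared readings, circular            */
    uint64_t sum;               /* running sum of everything in squares  */
    unsigned head;              /* index of the oldest entry             */
} rms_filter;

/* Integer square root (floor) using only shifts, adds and subtracts. */
static uint16_t isqrt32(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = (uint32_t)1 << 30;

    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint16_t)root;
}

/* Store one new reading and return the RMS over the last RMS_LEN readings. */
uint16_t rms_update(rms_filter *f, int16_t sample)
{
    int32_t s = sample;
    uint32_t sq = (uint32_t)(s * s);

    f->sum -= f->squares[f->head];              /* drop the oldest square  */
    f->sum += sq;                               /* add the newest one      */
    f->squares[f->head] = sq;
    f->head = (f->head + 1u) & (RMS_LEN - 1u);  /* wrap with a mask        */

    /* Divide by the 2^n buffer length with a shift, then take the root. */
    return isqrt32((uint32_t)(f->sum >> RMS_LOG2_LEN));
}
```

A zero-initialised rms_filter starts with an empty history, so the output ramps up over the first RMS_LEN calls. On a small 8-bit part the 64-bit sum would likely be replaced by a narrower type with a restricted input range.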

It's not even an interesting example. It's just one of those tasks that scream for optimisation because such tasks often run at a high repetition rate and when done sloppily it would use too much processing power. So there is real work to be done, no matter if you do it in C or ASM.

The big advantage (imo) of using C is with the rest of the application. Writing code for that is relatively easy, and since it is easy you can allow yourself to make it flexible, add nice features, add clever features, make it behave a bit intelligently, resulting in a much better product.

I'm not interested in his 60 byte mean and lean code. Everybody can do that. I want to see the remaining part of his application.

--
Thanks, Frank.
(remove 'q' and '.invalid' when replying by email)
Frank Bemelman

"Jonathan Kirwan" wrote in message news: snipped-for-privacy@4ax.com...

Okay, so you are a very talented programmer, both in C and ASM. But you wrote such a long reply, that you forgot my words:

You may well be king on the mountain of all ASM programmers, but the mountain is big and the summit is small.

--
Thanks, Frank.
(remove 'q' and '.invalid' when replying by email)
Frank Bemelman
