code optimization in embedded systems

OK, I have to admit that I don't have experience with such a situation. Since you seem to have more knowledge about these things, would it be possible for you to review my project:

formatting link

Just to make sure I'm not on the wrong track... with 20 kV...

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

Dijkstra, IMHO sizeof(x) can be implemented at the IR level; it just has to be specified by the language. Perhaps your point was about C being quite imprecise?

This evil switch-case is what I did in 1998 in a project. It might be useful for you to google "vtable".

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

Likely the first... As I have said, I'm not a compiler writer; these kinds of jobs aren't even available in Finland.

I am an ordinary M.Sc. in CS who is interested in compilers. You don't have to insult me for not knowing something that you do.

So this might be one reason why gcc has such a huge amount of source..? (As I said, I am by no means a compiler expert like you, but I am a very strong software guy.)

Let's say that we have a piece of code in SSA form; quite a lot of optimizations can be done on SSA, more easily than on other forms. Isn't that the point of SSA after all? One "nice" algorithm is global value numbering, which "by accident" eliminates quite a lot of redundancy.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

concatenations to calculate the key and

Not to say this wasn't true, but out of interest, whose STL implementation was this?

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

Knuth has his wisdom (which volume is being written now?). Still, probably any reasonable human being could figure this out...

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Reply to
Jyrki Saarinen

I don't know why you're addressing Dijkstra here. sizeof(x) cannot be implemented at the IR level. You need sizeof() values even at the front-end layer. struct foo { char x; long y; }; will have size 3 on a 56k, 5 on an 8-bit chip like the PIC18 or 8088, 6 on a 16-bit chip like the 8086, 8 on a 386, and 16 on an Alpha. Now, your program may contain this: int a[sizeof(struct foo) - 6]; That program isn't even valid if sizeof() yields 6 or less. Don't claim "nobody writes such code". That's true, but a compiler front-end has to handle it anyway. Hence, the front-end can't just produce some kind of IR and hand it to the middle/back-end; it must also be able to ask the back-end to "build the layout of this structure and tell me its size". It gets even worse if you have to instantiate C++ templates with sizeof parameters.

Thanks, I did so. 15 years ago. The tool I used was called "book", not "google", though (who was accusing someone of insulting in a parallel post?). I also implemented a few toy compilers.

Implementing states as classes with virtual functions for each event (=State pattern) will build vtables of size Num_States x Num_Events x sizeof(vtable_entry). 100 states x 1000 events x 8 bytes = 800k just for vtables. The switch() solution can be a lot better (remember: "can be", not "always is").

Stefan

Reply to
Stefan Reuther

C99 with ISO/IEC 18037

Dd = Ds + Dd + CC.X;

Regards,

-- Walter Banks Byte Craft Limited Tel. (519) 888-6911 Fax (519) 746 6751

formatting link
snipped-for-privacy@bytecraft.com

Reply to
Walter Banks

The C++ standard used to be strange about std::list's complexity requirements. It required O(1) size() and O(n) splice(), which IMHO is rubbish, but most implementations did O(n) size() and O(1) splice(), which is more useful. I generally use lists if I need a fast splice(), otherwise I'd use vectors.

I've seen too many bogus comparisons of std::string vs. something else. For example, the equivalent of 'strcat(s,a); strcat(s,b); strcat(s,c)' is 's.reserve(enough); s+=a; s+=b; s+=c' (no temporaries, at most one allocation), not 's = s+a+b+c' (three temporaries, about four allocations). The advantage of the STL is simply that it makes it hard to write wrong code. Writing efficient code still needs some thought. In my opinion, that's still quite a useful tradeoff.

Stefan

Reply to
Stefan Reuther

You can represent the carry in C: if ((x + y) < x) tests whether the carry of an unsigned addition is set. I taught a compiler to use this idiom to generate an add-with-carry instruction, so that the following:

xl = yl + zl;
xh = yh + zh;
if (xl < yl) xh++;

would produce:

ADDS xl,yl,zl
ADC  xh,yh,zh

Does this allow you to select which instruction sets the carry?

Wilco

Reply to
Wilco Dijkstra

You're right, but unfortunately s = s + a + b is how it is often written. I believe the C++ standard was quite foolish to explicitly disallow the obvious (and correct) optimization into s += a; s += b;

But even with the correct way of writing this, I am seeing a factor of 5-10 penalty for using STL strings vs. strcat/strcpy. I believe that kind of overhead is too much even for desktops.

Wilco

Reply to
Wilco Dijkstra

concatenations to calculate the key and

VC++. You still get the 50 times slowdown with VS2005 when doing it the obvious way (see the other post).

Wilco

Reply to
Wilco Dijkstra


I tend not to write programs twice just to see how some tool or other performs. I did, however, have to redesign an existing product due to processor unavailability. The original was in C, around 60 kbytes from memory; the redesign was in assembler for a PIC (what else?), and came in around 1.5 kbytes. It wasn't a speed-critical app, so performance wasn't measured, but the PIC version worked noticeably quicker. If you would like to write an existing small piece of code in C to see how you compare, the rest can join in as well; I'm quite happy to see how you do. It won't take you long, but I'd advise you to pass on it.

I still don't see your point. Are you trying to use your compiler as an assembler? It will let you do that. That's the mistake others make when trying to justify their compiler claims: they write in assembler on a C compiler, so when it comes out the same, as it must, they think they have proved something, which they haven't.

I don't think so. RISC is very tedious for assembly, the PPC is an overly complex chip for assembly, and I know nothing about ciphers. This kind of project is very obscure for an embedded newsgroup; we're still dominated by 8-bitters here.

Reply to
cbarn24050

Neat example: elimination of conditionals and of generated code at the same time.

I assume you mean in the ARM RISC sense. On the eTPU we used intrinsics to control condition code generation.

In most embedded-systems compilers we can access all the processor registers, including status registers (ISO/IEC 18037). Referencing condition code registers in math operations that map back to the instruction set is a minor code generation issue.

Things like multiple-variable shifts are both useful and generally portable. Other operations:

registercc CC;

CC.C = 1;

if (CC.C == 0) { ... }

As with all things close to the processor, some care needs to be exercised. Carry after subtracts, or after a compare to 0, varies from processor to processor.

In the code example, addx on the MC680x0: the X bit in the condition code register is rare, and relying on it is probably safe only on the 68K.

Regards,

-- Walter Banks Byte Craft Limited Tel. (519) 888-6911 Fax (519) 746 6751

formatting link
snipped-for-privacy@bytecraft.com

Reply to
Walter Banks

Speed improvements of an order of magnitude or more are, in my experience, only achieved by changing the algorithm. The best improvement I have ever achieved by rewriting code in assembly was 6x, and that was because in that particular case I could use tricks that could not be expressed in C (or any other high-level language, for that matter). In most cases, improvements of none to a couple of tens of percent can be expected from manually rewriting code in assembly. Usually you gain more by approaching the problem from a higher-level perspective. Besides that, many processors have become harder to optimize for by hand (the days of cycle counting are over), compilers have become a lot better, and programmer time is usually better spent on other things. That being said, it is still fun to outdo the compiler once in a while.

Reply to
Patrick de Zeester

So you are comparing apples with oranges: the orange felt quicker, but you didn't measure it, and that proves your point that assembly code you write is usually at least 10x as fast as the equivalent program generated by a C compiler. Impressive evidence indeed.

It would be wise to select a target processor and compiler to bend things in your favor; otherwise it might become difficult to prove your 10x speed improvement claim.

Are we? 16-bit and 32-bit processors appear to be quite common in embedded applications (ColdFire, MSP430, MIPS, ARM, ARM, ARM), especially for the kind of application where one would prefer to use a high-level programming language. Your 8-bit statement and your PIC experience do give a clue as to where your claims are coming from.

Reply to
Patrick de Zeester

Which has its limits, especially in the context of a compiler. Sometimes, as the programmer, you have more knowledge than you can express in C (for example, the real range of a variable). An assembly programmer can take advantage of this knowledge; the C compiler just has to assume the worst.

Reply to
Patrick de Zeester

In the case of the good old 56000 DSP I can imagine that to be true. I remember a 56000 developer I used to work with who was particularly good at translating algorithms into handcrafted 56000 code. In one case the inner loop took about 10 statements in C; the 56000 code was 2 or 3 instructions. It took him more than a day and several iterations to get there, and the code had no resemblance whatsoever to the original C code.

GCC isn't generally known for its optimizer; often there are better alternatives.

It also depends on the architecture itself, some architectures are just no good targets for C compilers.

Reply to
Patrick de Zeester


I don't think so. I'm comparing a program written in C with an equivalent program written in assembler. Isn't that what we're discussing?

the orange felt quicker but

If only I'd known this thread would come up 5 years later, I would have.

and that proves your point that assembly code you

I don't think you'll find I said that.

Impressive evidence indeed.

As I said, I do have a time-critical program written many years ago in assembler; you can try it in C if you want.

Yes we are, and have been for a very long time. ColdFire, MIPS and the MSP430 are minnows; ARM is making big inroads but has a long way to go to catch up.

Indeed, from the real time world where things need to happen when they need to happen.

Reply to
cbarn24050

... snip ...

This whole argument is fairly ridiculous, because none of the contenders separate out the influence of the library. Such a library may include many facilities that are simply not used in the application, and it can easily represent the majority of the code.

--
 Chuck F (cbfalconer at maineline dot net)
Reply to
CBFalconer

No. You are comparing a C program on one architecture with one redesigned from scratch in assembler for a different architecture. You can't get any more different than that...

We are talking about writing two equivalent programs for the same architecture. The assembler version can do any tricks it likes, but the same tricks are obviously allowed in the C version (I think that is the bit you're missing). The only difference is the language used.

I have done this many times - I usually develop assembler code by first writing a working C version.

You said earlier "An order of magnitude is pretty much the norm.". I claim that that is pretty much impossible.

RISC assembler is actually easier to learn and write, especially compared to the older 8-bitters. It's something you have to try in order to understand.

formatting link

Maybe true 5 years ago, but not today. 16/32-bit parts have 70% of the microcontroller market, while 8-bit is shrinking fast. Over 3 billion ARMs will be sold this year; what is the yearly volume of, say, the 8051 or PIC?

Wilco

Reply to
Wilco Dijkstra
