code optimization in embedded systems

Are you related to "the" Wilco? ;)

--
Grant Edwards                   grante             Yow! I want to kill
                                  at               everyone here with a cute
Grant Edwards

Ah. I forgot about the rounding towards zero requirement -- that's fairly new, isn't it?

IIRC, Python and Pascal do it the other way (with a "mod" operation to match).

--
Grant Edwards                   grante             Yow! YOU PICKED KARL
                                  at               MALDEN'S NOSE!!
Grant Edwards

Then you will have trouble implementing the 'sizeof' and 'offsetof' operators. Or even things as simple as constant folding: your front or middle end should know whether 1000*1000 is 1000000 or 16960.
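To make the 16960 concrete, here is a minimal sketch of the folding a cross compiler would have to do for a target whose 'int' is 16 bits wide. The helper name fold_mul_int16 is made up for illustration, and the two's-complement wraparound is an assumption about the target (signed overflow is formally undefined in C).

```c
#include <stdint.h>

/* Hypothetical helper: fold a*b the way a target with a 16-bit int would
 * see the result, assuming two's-complement wraparound on that target. */
static int16_t fold_mul_int16(int16_t a, int16_t b)
{
    /* Compute the exact product in a wider type first. */
    uint32_t wide = (uint32_t)((int32_t)a * (int32_t)b);
    /* Keep only the low 16 bits, as the 16-bit target would. */
    uint16_t low = (uint16_t)(wide & 0xFFFFu);
    return (int16_t)low;
}
```

With these assumptions fold_mul_int16(1000, 1000) gives 16960, which is 1000000 modulo 65536. A middle end that folds with the host's 32-bit arithmetic would produce 1000000 instead.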

It gets short when you write everything into the class declaration, as inline methods:

    class MyState : public AbstractState {
    public:
        void onKey1Press() { puts("key 1"); }
        void onTimer() { puts("timer"); }
    };

It gets a little longer if you spell it out (i.e. class declaration and later an implementation of all methods). The state handling function looks like this

    void MyStateFunc(event e) {
        switch (e) {
        case KEY1: puts("key 1"); break;
        case TIMER: puts("timer"); break;
        default: return OtherStateFunc(e); break;
        }
    }

and can hardly be made longer.

Stefan

Stefan Reuther

"...in an embedded environment, someone familiar with both disciplines can routinely achieve an order of magnitude improvements in speed of selected functions. The trick is to be able to determine which functions to rewrite in assembler."

I guess I should have capitalized the self-limiting words in the above sentence for you guys.

Ken Asbury Out

Yow! If it WASN'T Oscar Wilde who said, "Don't argue with fools. People can't always tell the difference," it SHOULD have been.

Ken Asbury

I try to avoid saying sentences containing "always" when talking about a compiler. A table jump may very well be faster than an 'if' (and a smart compiler might even have the idea to merge an 'if (x == 3) ... else switch(x)' into one switch, making your transformation pointless).

Depends on the CPU. x86s do better with signed arithmetic. The three-operand multiplication instruction only comes signed, and the integer<->FPU conversions produce signed values only as well.

In an embedded context, it also makes sense to find a way to leave global variables completely uninitialized. This is compiler specific, but it can save a lot of startup time. The huge buffer that's going to be used for receiving data doesn't need to be zeroed first.

Stefan

Stefan Reuther

Such a requirement is in the C99 standard, but was not in C90. The more important problem is that 'z=4; i/z' should yield the same value as 'i/4', and if the processor's divide instruction rounds towards zero, the ASR replacement code must emulate that.
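A small sketch of what "the ASR replacement code must emulate that" means for a divide by 4. The function names are invented here, and the code assumes '>>' on a negative int is an arithmetic shift, which is implementation-defined in C but is what the usual ARM and x86 compilers do.

```c
/* Plain arithmetic shift: rounds towards minus infinity for negative i.
 * Assumes >> on a negative int is an arithmetic shift. */
static int shift_only_div4(int i)
{
    return i >> 2;
}

/* Shift with a bias so that the result rounds towards zero,
 * matching what i / 4 yields under the C99 rule. */
static int div4_round_to_zero(int i)
{
    int bias = (i < 0) ? 3 : 0;   /* add divisor-1 for negative inputs only */
    return (i + bias) >> 2;
}
```

For i = -7 the bare shift gives -2, while -7 / 4 and the biased version both give -1. The extra compare-and-add is the price of replacing the divide instruction.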

Not that I know of. At least, Turbo Pascal uses the same 'div' instruction as C compilers of the same era.

Stefan

Stefan Reuther

Look at gcc56k and write a simple FIR filter. Then do the same in 56k assembler and compare. Our 56k developer once showed me the listings, and there easily is a 10x difference.
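For anyone who wants to repeat the experiment, the C side of such a comparison can be as small as the sketch below. The names fir_sample, coeff and history are my own, not taken from the listings mentioned above, and the 64-bit accumulator is just a stand-in for a DSP's wide accumulator.

```c
#include <stddef.h>
#include <stdint.h>

/* One output sample of an N-tap FIR filter on 16-bit data.
 * history[0] is the newest input sample, history[k] the one k steps back. */
static int64_t fir_sample(const int16_t *coeff, const int16_t *history,
                          size_t ntaps)
{
    int64_t acc = 0;   /* wide accumulator so the sum cannot overflow here */
    for (size_t k = 0; k < ntaps; k++) {
        /* multiply-accumulate: the operation a DSP is built around */
        acc += (int32_t)coeff[k] * (int32_t)history[k];
    }
    return acc;
}
```

The interesting part is then the listing: whether the compiler turns the loop body into a single multiply-accumulate with the operand fetches overlapped, or into separate loads, a multiply and an add.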

Granted, gcc56k is really old, but when I evaluated Blackfin gcc a while ago, I also saw an order of magnitude difference to VisualDSP++ for similar computationally intensive code. gcc simply isn't yet smart enough for that target (maybe it is today, haven't looked for a while).

Billions of ARM processors are shipped each year, hence good tools for them are widely available and understood. For lower volume architectures, it might look much worse.

Stefan

Stefan Reuther

You haven't looked at the output produced by the compiler Microchip supplies for the PIC24/PIC30.

On the other hand, I've found that 4X is a good estimate of the improvement that can be made.

Everett M. Greene

No, it doesn't much matter how good the compiler is.

No, assembly fares less well on difficult targets.

Err, I have no idea what you're saying here.

We're all gifted.

I'm quite sure I wouldn't even try.

cbarn24050

There is also the Itanium floating point register file which really rotates.

I mean conditional execution as a more general form of conditional move (some architectures have tried to call it predication). On ARM you can execute if statements without branches:

    if (x == 100 && y == 10) z++; else z--;

    CMP   x, #100
    CMPEQ y, #10
    ADDEQ z, z, #1
    SUBNE z, z, #1

The interesting thing is that compilers can spot optimizations like this even in cases that are non-obvious to humans (e.g. comparisons are far apart or even using different conditions).

Wilco

Wilco Dijkstra

For a table jump you'd typically need to do a range check (unless all case values of the type switched on are used), address the jump table and then either load from the table and do an indirect branch or indirect branch into the table and then do a direct branch to the code that implements the case.

So even if indirect branches are as fast as branches, it is a lot more work than a simple if statement (compare+branch). A linear search would be the closest to an if statement.
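Spelled out in C, the steps look roughly like this. It is only a sketch of what a table-jump switch boils down to, with invented function names and a function-pointer table standing in for the compiler's jump table.

```c
/* Hypothetical case bodies, one per switch label. */
static int on_case0(int x) { return x + 10; }
static int on_case1(int x) { return x + 20; }
static int on_case2(int x) { return x + 30; }
static int on_default(int x) { (void)x; return -1; }

typedef int (*case_fn)(int);

/* Table jump: range check, load from the table, indirect branch. */
static int dispatch_table(int sel, int x)
{
    static const case_fn table[] = { on_case0, on_case1, on_case2 };
    const unsigned count = sizeof table / sizeof table[0];

    if ((unsigned)sel >= count)   /* range check, also rejects negatives */
        return on_default(x);
    return table[sel](x);         /* table load plus indirect call */
}

/* The single-case alternative: one compare and one branch. */
static int dispatch_if(int sel, int x)
{
    if (sel == 1)
        return on_case1(x);
    return on_default(x);
}
```

dispatch_table does three things before any case code runs, dispatch_if does one. Which of the two wins for a real switch with many cases still depends on the CPU and the case distribution.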

I'm sure unsigned division is still faster on the latest x86s. Other operations don't have an obvious preference indeed. The missing unsigned FP conversions are odd; I suppose this may be because floating point is by definition signed.

That's a good point. I've heard about cases where it took many seconds to clear megabytes of zero initialised memory due to running it before setting up the clock and caches etc. "noinit" is a typical extension to avoid clearing memory unnecessarily.
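How "noinit" is spelled depends on the toolchain. The sketch below assumes GCC or Clang targeting ELF and a linker script that keeps a section called ".noinit" out of the startup zero-fill loop; the section name, the NOINIT macro and the rx_ names are assumptions for illustration, not something the C standard defines.

```c
#include <stddef.h>

/* Assumption: GCC/Clang on an ELF target, and a linker script that places
 * ".noinit" outside the region cleared by the startup code. */
#if defined(__GNUC__) && defined(__ELF__)
#define NOINIT __attribute__((section(".noinit")))
#else
#define NOINIT /* fallback: buffer is zeroed like any other static object */
#endif

#define RX_BUFFER_SIZE (64u * 1024u)

/* Large receive buffer; its contents are indeterminate until written. */
static unsigned char rx_buffer[RX_BUFFER_SIZE] NOINIT;

/* Store one received byte; out-of-range indices are ignored. */
static void rx_store(size_t index, unsigned char value)
{
    if (index < RX_BUFFER_SIZE)
        rx_buffer[index] = value;
}

/* Read back a byte that was previously stored. */
static unsigned char rx_load(size_t index)
{
    return (index < RX_BUFFER_SIZE) ? rx_buffer[index] : 0;
}
```

Code using such a buffer must never read a location before writing it, since nothing guarantees zeros there any more.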

Wilco

Wilco Dijkstra

The user has control over which instructions will set the condition bits (e.g. the subtract instruction can be specified so that it determines the condition bits or so that it leaves them alone).

In ARM assembly, just trying to keep track of where the condition codes came from is a headache and source of bugs -- when humans do it. Compilers are quite good at that sort of thing.

--
Grant Edwards                   grante             Yow! LBJ, LBJ, how many
                                  at               JOKES did you tell today??!
Grant Edwards

Then, with all due respect, I'd have to conclude you're either blissfully ignorant or stark raving mad.

There's no way the middle layer of a traditional three-layer compiler design can be kept free of target-specific details without severely impairing its abilities.

You might as well replace a car's windows by sheet metal to keep the driver free from road-specific details --- such as traffic lights, signs, or the current positions of other cars nearby.

Hans-Bernhard Bröker

That there can be no "the" correct advice on issues as complex as these. The answer to this kind of question is: "It depends." Every single time.

Where of course a generalization like "always faster" would be impossibly hard to prove. A guaranteed exhaustive set of C compilers, anyone?

[...]

Absolutely. In the end, the same old truism about micro-optimizations still applies:

1) Don't do it.
2) _Still_ don't do it.
3) OK, if you still think you must, measure before you modify.

Hans-Bernhard Bröker

You should tell this to the people who thought they had mastered some assembly language. The guy who talked about being able to outperform a compiler did not reply. The challenge still stands. The Master Of Assembly Shall Fill The Pipes of a PPC, and schedule those insns. I have demonstrated that I'm not just an academic piece of shit. So any so-called "Master Of Assembly" should get in touch.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Jyrki Saarinen

You have been challenged. Give an example where you are able to outperform even a s**te C compiler _by an order of magnitude_.

Of course we are. addx on the MC680x0 is one example of a mechanism which cannot be expressed in C. This is a trivial example.
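For readers who don't know the instruction: addx adds two operands plus the X (extend) flag, so a multi-word addition is an add followed by one addx per extra word. C has no carry flag, so the carry has to be rebuilt with a comparison, roughly like this (the type and function names are mine):

```c
#include <stdint.h>

/* A 64-bit value kept as two 32-bit words, as on a 32-bit CPU. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Two-word addition without access to the carry flag. */
static u64_pair add64(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;                /* low words, wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);    /* wrapped result means carry out */
    r.hi = a.hi + b.hi + carry;        /* what addx does in one instruction */
    return r;
}
```

A good compiler may recognise the comparison and emit add/addx anyway, but the source can only hint at it, it cannot say it directly.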

Someone was talking about C compilers not being able to detect bit rolling; it is quite simple, by using algebraic identities. Even the lowly SAS/C on the Amiga did this.
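The identity in question is the usual shift-and-or form of a rotate. A sketch, with the function name rotl32 chosen here for illustration; whether a particular compiler turns it into a single rotate instruction is something to check in its listing.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits using only shifts and an OR.
 * Both shift counts are masked so that n == 0 (or n >= 32) never produces
 * a shift by 32, which would be undefined behaviour in C. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}
```

For example, rotl32(0x80000001, 1) moves the top bit round to the bottom and yields 0x00000003.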

So, what are you pointing at then? You made a statement that one can beat a compiler by an order of magnitude. You are proved wrong.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Jyrki Saarinen

Your first statement is, how would I say it in a nice way, a piece of s**te.

You have been challenged. I'm waiting.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Jyrki Saarinen

Hey, DSP stuff is something really different..

.. although even gcc is able to do vectorization these days.. on the targets that allow it. Note: I'm not a DSP person, so I might be writing about something I haven't implemented myself. FFT, DFT, these sound familiar.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Jyrki Saarinen

Maybe this CPU is _the_ case of an architecture for which it is difficult to emit good code. I did look at the arch., and it seems to be quite nasty. It sits at one extreme, with RISC machines at the other.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Jyrki Saarinen

Well - how would you explain something being NP-complete in a simple way?

I do mind.. :)

The "difficulty" is that it has "pointers"..

Now, really, the reason I started this free time project is that a dear friend of mine, Aki Laukkanen (a colleague who supported our studies), and his sister were killed by a guy who was being chased by the cops while on amphetamines and ethanol. With Aki, we tried to modify the gcc back-end to generate good code for the MC68060. I somehow feel responsible for doing this.

--
Jyrki Saarinen
http://koti.welho.com/jsaari88/
Jyrki Saarinen
