If you don't understand what NP means, read my statement again. Also, I provided URLs which may explain this issue.
I'm not a compiler writer. In Finland it is very rare that one gets such a job. At Nokia it might be possible, but whatever. You simply don't seem to understand. That is not my problem.
In general, I'm just interested in implementing a modern, aggressively optimizing (and re-targetable) compiler, and I chose Pascal from the
70's. It has pretty simple syntax and semantics. Also, because of my background in the demo scene, I find it interesting to create a compiler that would finally schedule instructions for the MC68060 (gcc doesn't do that, and neither does vbcc).
Well, that is the common way to do that. But if I want to keep the "middle-end" free of target specific details?
Ok. I have to confess that I have never been on that freq. On the Amiga, OTOH, everything was synched to the 50Hz screen. But that is another issue again, which isn't really related to this group.
So does the compiler when it decides to turn your switch-case
statement into a table? I don't quite understand what you mean by increasing the source size. Normally, when one applies the State pattern, the source gets quite small. This pattern can be implemented in C, even though it is a PITA.
Now, you are being a troll. As it has been stated _several_ times in this thread (which was created by a troll), you need to do this only with very bad compilers.
A hint: Static Single Assignment (SSA)
The first sentence is quite nice. I have seen (or been forced) to refactor C++ code which did not use the STL. Instead, some wise guy did his own implementation of string and hashtable. Which is nice.
Indeed: The usual approach is to turn a sparse tree into a hard coded binary search for the upper part of the tree, and then a linear search for the lower part. The point where you switch depends on the cost of mispredicted branches. In the binary tree part, about half the branches will mispredict (not counting the branch-eq leg of the common compare / branch-gt / branch-eq construction), in the sequential part, only one (the one that's actually taken) will mispredict. Each extra layer of the binary search adds a compare, and half each of a predicted and mispredicted branch, but halves the number of compares and correctly predicted branches in the sequential part.
And of course switches with both sparse and dense sections often generate code that does computed branches in the dense parts of the tree.
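The hybrid scheme described above can be written out by hand in source form. This is a sketch only; the case values {3, 17, 40, 99, 250, 1000} and return values are made up for illustration:

```cpp
#include <cassert>

// Hand-written equivalent of the binary-then-linear lowering described
// above: one binary-search layer splits the case set, and each half is
// then searched linearly with compare/branch pairs.
int dispatch(int key) {
    if (key >= 99) {
        // linear search over the upper half
        if (key == 99)   return 4;
        if (key == 250)  return 5;
        if (key == 1000) return 6;
    } else {
        // linear search over the lower half
        if (key == 3)  return 1;
        if (key == 17) return 2;
        if (key == 40) return 3;
    }
    return 0;  // default case
}
```

Each extra binary layer would add one compare up front while halving the length of the linear runs, which is exactly the trade-off against misprediction cost mentioned above.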
No, but I'm planning to be more famous so people stop asking that question.
The word contiguous doesn't imply linearly incrementing, it means no gaps. Compilers always sort case values so the order is irrelevant. If your case labels are not dense (ie. have large gaps) then you're unlikely to get a branch table.
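For a dense case set, the branch table amounts to a single range check followed by an indexed load. A sketch in source form, with made-up case values 10..14 (note they are contiguous, with no gaps):

```cpp
#include <cassert>

// What a branch table reduces to: bias the key to zero, do one unsigned
// compare that covers both bounds at once, then index into a table
// instead of walking a chain of compares.
int classify(int key) {
    static const int table[] = {100, 101, 102, 103, 104}; // results for 10..14
    unsigned idx = static_cast<unsigned>(key - 10);       // bias to zero
    if (idx < 5)               // out-of-range keys wrap to large values
        return table[idx];
    return -1;                 // default case
}
```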
Now you're trolling. SSA has absolutely nothing to do with switches... FYI, few compilers use SSA today and the ones that do only use it in a limited set of optimizations.
C++ and STL is a different matter altogether. I would not advocate STL in programs that need to be efficient, use as little memory as possible and be small (ie. most embedded systems and many big desktop applications). It's great for fast development or if you simply don't care about memory use or performance.
STL often uses too much resources even for desktop systems, so many performance critical applications avoid it. I've seen cases where STL string concatenation slowed down command-line parsing so much that command-line parsing it became the most performance critical part of a compiler...
Many performance critical programs use their own memory allocation strategies as new/malloc are not good enough. So in specific situations it is a good idea to create your own implementation - but you'd have to have a really good reason and do significantly better than existing solutions.
The code generated for i/4 goes like this (for 32-bit int):
res = (i + ((unsigned)(i >> 31) >> 30)) >> 2;
ASR cannot be used on its own for division by powers of 2, as it rounds towards minus infinity rather than towards zero, so an explicit rounding step is done to adjust the result when it is negative.
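The quoted sequence can be checked directly. A sketch, assuming arithmetic right shift of negative values (implementation-defined before C++20, but arithmetic on all mainstream compilers):

```cpp
#include <cassert>
#include <cstdint>

// i >> 31 is all-ones for negative i; reinterpreted as unsigned and
// shifted right by 30 it yields 3 (the divisor minus one), which is
// added before the shift so negative results round toward zero,
// matching C's division semantics.
int32_t div4(int32_t i) {
    return (i + static_cast<int32_t>(static_cast<uint32_t>(i >> 31) >> 30)) >> 2;
}
```

For example, a plain shift gives -7 >> 2 == -2 (rounding toward minus infinity), while div4(-7) == -1, agreeing with -7/4.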
I came across the following about writing efficient C for ARM processors:
1) Unsigned types are faster for division.
2) Do not use implicit/explicit narrowing casts, as they cost extra cycles.
3) Do not use char and short types for function arguments or return values. Use the int type even if the range of the parameter is smaller, so the compiler does not have to perform unnecessary casts.
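Point 3 can be illustrated in source form. A sketch with invented function names; the idea is that a narrow parameter or return type can force the compiler to emit extra extend instructions (UXTB/SXTB/UXTH/SXTH on ARM) to re-normalize the value:

```cpp
#include <cassert>

// Narrow types: the result must be re-narrowed after every operation,
// and the callee may need to sign/zero-extend the incoming arguments.
short sum_narrow(short a, short b) {
    return static_cast<short>(a + b);
}

// int-width types: values stay in full registers, no extend needed.
int sum_wide(int a, int b) {
    return a + b;
}
```

Both return the same value for in-range inputs; the difference shows up only in the generated code.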
comp.compilers is not easy to post to so many of us don't bother.
As I said, you query the backend. The intermediate code is usually very close to the final code, so you can ask questions like: "how many cycles/bytes is this compare going to take?". A good compiler asks the backend similar questions all the time, so that most target-specific decisions are made well before the IR enters the backend.
However, in this case it is good enough to ask at how many case labels one should switch from binary to linear search for the remaining cases.
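A sketch of the kind of query interface described above. All names and cycle counts here are invented for illustration; the point is only that the middle-end asks the backend for costs and bases its lowering decision on the answer:

```cpp
#include <cassert>

// Hypothetical backend cost interface the middle-end can query.
struct Backend {
    virtual ~Backend() = default;
    virtual int compareCycles() const = 0;        // cost of one compare+branch
    virtual int tableDispatchCycles() const = 0;  // cost of an indirect jump
};

// Target-independent decision: a linear chain mispredicts roughly half
// its compares on average, so prefer a jump table once the expected
// compare cost exceeds the fixed dispatch cost.
bool useJumpTable(const Backend& be, int numCases) {
    return numCases * be.compareCycles() / 2 > be.tableDispatchCycles();
}

// A made-up target with made-up timings, not real MC68060 numbers.
struct Mc68060Backend : Backend {
    int compareCycles() const override { return 2; }
    int tableDispatchCycles() const override { return 9; }
};
```

This keeps the middle-end free of target-specific details, as asked earlier in the thread: only the Backend implementation knows the actual timings.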
I wouldn't consider intermediate code to be "close to the target". Quite often it is a "three-address code" in SSA form. Your other statements apply, of course.
Actually he is right. NP-complete problems can be solved in reasonable time as long as the problem size stays small. They only become intractable as the size grows.
The fact is that compilers and humans can produce optimal code for small functions (have you heard of superoptimizers?). It only becomes impractical for large programs.
If you are interested in such a job and don't mind going abroad then I know several UK companies who are keen to hire.
Most targets use 3-address instructions or a subset thereof, so it is a good match. In a compiler I worked on the intermediate code could represent almost every feature of the target, including some of the more complex features like conditional execution and addressing modes. So even for CISC targets you'd be close to final code.
Actually the quality and performance of STL varies greatly between implementations. Some functions have algorithmic complexity guarantees, but even that doesn't necessarily result in good performance. If it guarantees an operation in O(log(N)) time but you need it in O(1) time you're stuck.
Just look at the size() operation - it takes O(N) time. Are you saying that it is not possible to do this in O(1) in a custom implementation?
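The point about size() can be sketched with a thin wrapper. This assumes the container in question is a linked list whose size() traverses the list (as std::list was allowed to do before C++11); the wrapper is illustrative only, not a full container:

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// A custom container can keep a running element count and answer
// size() in O(1), regardless of what the underlying list guarantees.
template <typename T>
class CountedList {
    std::list<T> items_;
    std::size_t count_ = 0;
public:
    void push_back(const T& v) { items_.push_back(v); ++count_; }
    void pop_front()           { items_.pop_front(); --count_; }
    std::size_t size() const   { return count_; }  // O(1), no traversal
};
```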
No, just the usual ~5 arguments. The compiler slowed down by about a factor of 2, and the startup time was noticeably slower. The command-line options were stored in a string hash (key->value) and needed various string concatenations to calculate the key and return the result. The STL version was a literal translation from C into STL C++. The slowdown was caused by doing lots of lookups during compilation based on the assumption that they were fast operations - which was true for the C implementation but not for STL.
At the time nobody would believe STL could be that slow, but I compared STL strings with C strings, and you do indeed get a 30-50 times slowdown when doing strcat and similar operations. The total slowdown of the lookup was initially around 400 times and after a week of hard work the author got it down to only 200 times slower... That was the point I decided to ban the use of STL even in non-performance critical code.
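The key-construction pattern described above can be sketched. The function names and key format here are invented; the point is that std::string concatenation allocates on every lookup, while a fixed stack buffer does not:

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Building a lookup key by concatenation: typically several heap
// allocations per call, which adds up when lookups are frequent.
std::string key_with_string(const char* group, const char* option) {
    return std::string(group) + "." + option;
}

// The C-style equivalent: format into a caller-supplied stack buffer,
// no heap traffic at all.
void key_with_buffer(char* out, std::size_t n,
                     const char* group, const char* option) {
    std::snprintf(out, n, "%s.%s", group, option);
}
```

No timing claims are made here; whether the gap matches the numbers above depends on the STL implementation and allocator in use.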