Faster for() loops?

Hi Folks,

formatting link
Page states that reversing the loop order, i.e. writing for (i = 10; i--; ) instead of the usual for (i = 0; i < 10; i++), makes the loop faster. Is this true in general?

Reply to
Neo

Hi,

Neo wrote: [...] In tight loops, this makes a difference.

There is nothing like an experiment to test a theory. I just tried with AVR-GCC:

void countDown(void) { int i; for (i = 10; i != 0; i--) doSomething(); }
void countUp(void)   { int i; for (i = 0; i < 10; i++) doSomething(); }

Reply to
Al Borowski

Many micros have a decrement-and-jump-if-zero (or non-zero) machine instruction, so a decent optimising compiler should know this and use it in count-down-to-zero loops. Counting up often needs a compare followed by a jump-if-zero (or non-zero), which will be a tad slower.

Ian

Reply to
Ian Bell

The answer is "implementation-dependent".

A major advantage of writing in C is that you can, if you choose, write understandable, maintainable code. This kind of hand-optimisation has the opposite effect. If you really need to care about exactly how many instruction cycles a loop takes, code it in assembly language. Otherwise, for the sake of those who come after you, please write your C readably and leave the optimisation to the compiler. These days, most compilers can optimise almost as well as you can, for most "normal" operations.

Regards,

-- Peter Bushell

formatting link

Reply to
Peter Bushell

Regardless of the performance issue, I'd like to point out that after for( i=10; i--; ) finishes, i will have the value -1, since the decrement is performed even if i is zero. This is counterintuitive, so it's worth noting. It also means the following is not equivalent:

for (i = 10; i != 0; --i)

Since here one less decrement is performed. Incidentally, my compiler/platform generates better code with this version -- it compares i to -1 in the other, which is no better than comparing it to 10! If you want to count down, I suggest writing what you mean and separating the test and decrement parts -- it has the added bonus of making things more readable. The rest is best left to the compiler.

S.

Reply to
Skarmander

Neo wrote on 09/25/05 23:41:

Unroll it completely.

Reply to
Scott Moore

It may or may not save a couple of assembly language instructions (depending, of course, on the compiler and processor used), but I doubt this "noptimization" will make any noticeable change in the performance of a program, unless your code consists mainly of empty for() loops.

What impact can a minuscule reduction in the time required to decide whether the loop has ended have, if the body of the loop, for example, calls functions that format a CAN message, deliver it, wait for a response, retry if there were errors or timeouts, decode the response, store the values in a serial EEPROM, and based on them start a few motors and open pneumatic valves, optionally sending an email message to Katmandu?

That is not an optimization, but a total waste of time. Read the first example in "The Elements of Programming Style" and learn...

Roberto Waltman

[ Please reply to the group, ] [ return address is invalid. ]
Reply to
Roberto Waltman

What if the difference is between fitting into memory and not?

Reply to
Joe Butler

(that reversing loop order is faster)

The page is talking rot. It *may* be faster. It *may* be slower. The only way to know is to benchmark your particular implementation in the specific case you're examining.

Benchmark.

--
Mark McIntyre
CLC FAQ 
Reply to
Mark McIntyre

The subject line was "faster for() loops", not "smaller and/or more memory efficient for() loops".

If you must shoehorn code into a microcontroller memory that is one size too small, you do what you need to do. But I would look somewhere else first:

  • Are there libraries including code that will never be used?
  • Is there common code that could be factored into functions?
  • Are there text messages that could be shortened?
  • Are there any compiler options that could produce smaller code?
  • Is there a better compiler available?
  • If the problem is RAM, can smaller data types be used (pack structures, use char instead of int for small values, etc.)?
  • etc.
[Dropping x-post to comp.lang.c]

Roberto Waltman

[ Please reply to the group, ] [ return address is invalid. ]
Reply to
Roberto Waltman

Depends what you're doing. If you're accessing a large chunk of memory on a system with cache, you want to go through incrementing addresses to maximize the use of cache. Decrementing through memory is generally pessimal.

--
Reply to
Kevin D. Quitt

Question: How can I optimise code better than the compiler? Answer: If you ask, then you can't.

Reply to
Christian Bau

I think I disagree.

If you can fit something into a cheaper processor model because you save a couple of bytes by changing one or two loops, then you are not in trouble any more.

Reply to
Joe Butler

Don't top post. Replies belong after the text you are replying to.

Don't include peoples signatures unless you are commenting on them.

I'll be more explicit then. EVERY SINGLE TIME I have come across a system where people have tried to squeeze the code in believing it will just about fit (either size or speed) one of the following has happened:

1) The customer required a subsequent change which proved impossible unless the card was redesigned, because there was no space for the new code.
2) A bug fix required some additional code and, oops, there was no more space.
3) By the time all the required stuff was added (which the person who thought it would only just fit had forgotten about), it did NOT fit by a mile, so it did not even come close to meeting the customer's requirements.
4) It turned out there were massive savings to be had elsewhere, because of higher-level problems, allowing me to save far more space/time than you could possibly save by such micro-optimisations.

Only with the fourth of those possibilities was it possible to meet the requirements using the existing hardware, and meeting them involved fixing the algorithms or making large-scale changes where the coding was just plain atrocious.

So my experience is that it is never worth bothering with such micro-optimisations.

--
Flash Gordon
Living in interesting times.
Reply to
Flash Gordon


You could just test it.

I think it's a mistake to obfuscate your loops just for the sake of (what is probably) executing one more instruction which in all likelihood isn't on the critical path of your application _anyway_. If, as you say, you don't use the loop index, you could indeed do without the one extra compare instruction, but you'd probably benefit from loop unrolling too.

Premature optimization is a hindrance to software development.

Mark

Reply to
Mark VandeWettering

Actually, the page is talking rubbish about a great deal more than just this case. It's full of generalisations that depend highly on the compiler and target in question (the post is cross-posted to comp.arch.embedded, so we are looking at a wide range of targets). "Use switch instead of if...else..." (varies widely according to target/compiler and the size of the switch), "Avoid ++, -- in while () expressions" (good compilers work well with such expressions), "Use word-size variables instead of chars" (great for PPC, indifferent for msp430, terrible for AVR), "Addition is faster than multiplication - use 'val + val + val' instead of 'val * 3' " (wrong for most compiler/target combinations).

It's a nice idea to try to list such tips, but the page is badly out of date, and makes all sorts of unwarranted assumptions.

So, as Mark says, benchmark your implementation. Also examine the generated assembly code (you do understand the generated assembly? If not, forget about such minor "optimisations".) And remember Knuth's rules regarding such code-level optimisations:

  1. Don't do it.
  2. (For experts only) Don't do it yet.
Reply to
David Brown

I tend to avoid this kind of loop because it's a bit less intuitive with unsigned loop counters. After the loop is done, an unsigned i would have wrapped around to its maximum value, which depends on the implementation.

There is not much to be gained on loops that only count to 10... that extra instruction 10 times through the loop would only add an extra 10 nanoseconds or so. This is likely to pale in significance next to any useful work done in the body of the loop.

Loops that range over memory should never count backwards, at least not when speed is important. For better or worse, operating systems and memory caches only prefetch when reading ascending addresses.

Reply to
Anonymous 7843

FWIW, my bit-bang SPI output function looks something like

bit_ctr = 8;
do {
    Set_IO(SPI_DATA, (data & 0x80) != 0);
    Set_IO(SPI_CLOCK, 1);
    data <<= 1;
    Set_IO(SPI_CLOCK, 0);
} while (--bit_ctr != 0);

Reply to
Dave Hansen

[snip]

You need to get the other point, the one about not top-posting.

Brian

Reply to
Default User
