reducing flash size in embedded processors? - Page 2

Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)

Then I guess I've never seen "an experienced and good assembly
programmer": somebody who can take advantage of delayed
branches, pipelines, register windows -- somebody who can keep
track of a dozen different variables and intermediate values in
registers and on the stack, and all the other things that
compilers are good at. I suspect those people are so rare these
days that the chances of finding one are negligible.


--
Grant Edwards                   grante             Yow!  Does someone from
                                  at               PEORIA have a SHORTER
Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)

They are probably the ones who wrote the compiler code generator
and optimizer.

--
Chuck F ( snipped-for-privacy@yahoo.com) ( snipped-for-privacy@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)


... on *tiny* examples only, and even then by a small margin (say
< 20%). This is typically because a particular trick is used that is
not available to the compiler. I've used the stack pointer as a general
purpose register by saving it in a global variable. Not something
compilers will ever do (especially not if you still need to take
interrupts!), but great if you're trying to squeeze out the last few
percent of a bit blitter. However such tricks only apply in very
specialized circumstances.

When you take any non-trivial amount of code (say 10+K lines) then
it is obvious even the best assembly programmer is going to lose big
time against a compiler. Even if he is able to apply some of those
tricks, he cannot routinely apply all the global transformations
compilers do, even if given an infinite amount of time. Humans
simply cannot do it - just like we can't compute the 100th digit
of sin(x) in our head.


How would that matter? The C++ programmers were most likely not
experts anyway. They moved to a new language and architecture
and made a significant improvement doing so. A lot of the improvement
was due to the compiler being able to aggressively inline functions and
remove redundant code exposed by inlining. Not something assembly
programmers can do.
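The inlining point can be illustrated with a small C sketch (the function names are hypothetical, not from any real codebase). A helper that takes a mode flag has to test that flag at run time when it is called out of line; once it is inlined at a call site that passes a constant, an optimizing compiler can usually fold the test and drop the unused branches. Whether a given compiler actually does so depends on the compiler and its optimization settings.

```c
#include <stdint.h>

/* Hypothetical helper with a run-time mode flag. Called out of line,
 * it has to compare `mode` on every call. */
static inline uint32_t scale(uint32_t x, int mode)
{
    if (mode == 0)
        return x;          /* pass-through */
    if (mode == 1)
        return x << 1;     /* double */
    return x * 10u;        /* general case */
}

/* Every call here passes the constant mode 1. After inlining, an
 * optimizing compiler can fold the mode tests, leaving just a shift
 * (or add) in the loop body with no branches on `mode`. */
uint32_t sum_doubled(const uint32_t *v, int n)
{
    uint32_t total = 0;
    for (int i = 0; i < n; i++)
        total += scale(v[i], 1);
    return total;
}
```

With GCC or Clang, comparing the `-O2 -S` output for `sum_doubled` with and without `__attribute__((noinline))` on `scale` is a quick way to see what a particular compiler does with the redundant code.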

Wilco



Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)

I think this discussion is going down the wrong path. There is no sense in
"Assembly vs Compiled" thinking: taken as a religion, both approaches
are plain wrong! Where possible, IMHO a better approach is to code in a
higher-level language like 'C', then, if the specs require it, do an
in-depth performance analysis of the resulting code. Take a very close
look at the code that is time-critical and called frequently. So far I
have always been able to optimize these cases. Sometimes the
optimisation was done by choosing a better algorithm or a different
approach to the "sub-problem", reversing the order in which things are
done, you name it. Then there are also those rare cases where parts of
the code were hand-crafted in assembly. There ARE examples where
assembly can be better. Just imagine a situation where you have to send
data from, say, CompactFlash out to the LAN as fast as possible. In
assembly you can write code that, while reading the CF data register,
calculates the TCP checksum in an additional CPU register before the
data is stored into the network controller. Such an optimisation avoids
fetching the data twice. I don't think it would be easy, or possible at
all, to do the same thing in 'C', so IMHO it's sometimes better to use
your brain instead of following a dogma blindly :-)
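The fused copy-and-checksum loop described above can at least be sketched in C. This is a minimal sketch under several assumptions: the data is handled as 16-bit words, the checksum is the RFC 1071 ones'-complement sum over the payload only (the TCP pseudo-header and the final inversion are left out), and the CF data register and the network controller's data port are modeled as ordinary buffers. On real hardware they would be fixed `volatile` registers whose addresses are not given here. `copy_and_checksum` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `nwords` 16-bit words from `src` to `dst` and return the folded
 * 16-bit ones'-complement sum of the data (before final inversion).
 * Each word is fetched from `src` exactly once: the same value feeds
 * both the store and the running sum.
 *
 * On real hardware `src` and `dst` would be single volatile data
 * registers (not incremented); plain buffers keep the sketch portable.
 * For packet-sized buffers (far fewer than 65536 words) the 32-bit
 * accumulator cannot overflow. */
uint16_t copy_and_checksum(uint16_t *dst, const uint16_t *src, size_t nwords)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < nwords; i++) {
        uint16_t w = src[i];   /* single fetch */
        dst[i] = w;            /* forward to the network side */
        sum += w;              /* accumulate alongside the copy */
    }
    /* Fold the carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}
```

Whether a particular compiler keeps `sum` in a spare register for the whole loop, as the hand-written version would, is exactly the question the follow-ups below argue about.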

Just my 2 cents, of course.

Markus, running to get a flame proof suit :-)

Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)
... snip ...

No argument.  I was simply pointing out a theoretical limit.  It is
almost always cheaper to spend money on more hardware than on the
last ounce of efficiency and compression.

--
Chuck F ( snipped-for-privacy@yahoo.com) ( snipped-for-privacy@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)



It clearly depends on the unit cost increase multiplied by the number to be
manufactured versus the cost of developing the more efficient version.  If
the quantities are large it is almost always cheaper to spend the money on
additional development.
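This trade-off can be written as a simple break-even condition. The symbols are introduced here for illustration only: Δc is the extra hardware cost per unit if the code is left unoptimized, N is the number of units built, and C_dev is the one-off cost of developing the more efficient version. The extra development pays off when the hardware it saves across the whole run exceeds its cost:

```latex
N\,\Delta c > C_{\mathrm{dev}}
\quad\Longleftrightarrow\quad
N > N^{*} = \frac{C_{\mathrm{dev}}}{\Delta c}
```

With purely hypothetical figures of C_dev = 50,000 and Δc = 0.50 (same currency), N* = 100,000 units; above that volume, optimizing is the cheaper option.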

Ian
--
Ian Bell

Re: Assembly vs Compiled

Again, agreed.  I just haven't spent my life in the large volume
situation, but I have spent it in the reliability situation.  Now
consider Microsoft ....

--
Chuck F ( snipped-for-privacy@yahoo.com) ( snipped-for-privacy@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
Re: Assembly vs Compiled


Do I have to? ;-)

Ian
--
Ian Bell

Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)

But when managers decide to pay the extra HW costs to get Linux
running in their product (because it is sooo stylish), the extra costs
for a slightly faster CPU are irrelevant.
--
42Bastian
Do not email to snipped-for-privacy@yahoo.com, it's a spam-only account :-)
Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)


Sorry, I didn't think we were discussing poor management.

Ian
--
Ian Bell

Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)



Yes, I agree with all that. The problem however is that many people
skip the intermediate steps. They write some bad code, compile it
(sometimes even without turning on full optimization), look at the
compiler output, then decide their only option is to write it all in
assembler... As you say in most cases you can optimise the design,
algorithm and implementation to get most, if not all, of the possible
improvement. Only if all else fails, and you know you can beat the
compiler, should you consider doing it in assembler.


Hmm, now you're falling into your own trap... What do you mean, it
wouldn't be possible in C? Because C is a low-level language there
is almost a one-to-one mapping from the source to assembly language,
so just about anything you can write in assembly code can be written
in C (the only exceptions are features C doesn't directly support, such
as saturated arithmetic - however most compilers provide intrinsics
for these sorts of things). In fact when doing really low-level source
optimizations, I typically use simple assignments which compile
into single instructions, so there is virtually no difference between
doing this and writing assembler. For example (using ARM syntax):

x = *p++;  // LDR x, [p], #4
a = b + c;  // ADD a, b, c

A simple 32-bit CRC checksum that compiles to 4 instructions
could look like this:

sum = ((sum << 8) | (sum >> 24)) ^ crc_table[data];

The equivalent low-level form:

ptr = &crc_table[0];             // LDR ptr, =crc_table
tmp = ptr[data];                 // LDR tmp, [ptr, data, LSL #2]
sum = (sum << 8) | (sum >> 24);  // MOV sum, sum, ROR #24
sum ^= tmp;                      // EOR sum, sum, tmp

If it was a function you would need one extra instruction to return
of course.
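To check that the two forms really are the same computation, both can be wrapped as functions. This is a sketch: the function names are hypothetical, the table is passed in as a parameter (the post uses a global `crc_table`), `data` is assumed to be a byte so it always indexes within a 256-entry table of 32-bit words, and the ARM comments show the instructions one would expect rather than verified compiler output.

```c
#include <stdint.h>

/* High-level form from the post: rotate `sum` left by 8, then XOR in
 * the table entry selected by `data`. */
uint32_t checksum_step(uint32_t sum, const uint32_t crc_table[256], uint8_t data)
{
    return ((sum << 8) | (sum >> 24)) ^ crc_table[data];
}

/* Low-level form: one simple assignment per expected instruction. */
uint32_t checksum_step_lowlevel(uint32_t sum, const uint32_t crc_table[256], uint8_t data)
{
    const uint32_t *ptr = &crc_table[0]; /* table address, already in a register here */
    uint32_t tmp = ptr[data];            /* LDR tmp, [ptr, data, LSL #2] */
    sum = (sum << 8) | (sum >> 24);      /* MOV sum, sum, ROR #24 */
    sum ^= tmp;                          /* EOR sum, sum, tmp */
    return sum;                          /* BX lr: the extra return instruction */
}
```

Both functions should give identical results for every input; the only difference is how closely the source mirrors the instruction sequence.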


Don't worry, you're making sense!

Wilco



Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)



Can you explain what saturated arithmetic is, please?

Richard [in PE12]



Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)
"Endymion Ponsonby-Withermoor III"

It's when, on overflow (which normally produces a result with the wrong
sign), you replace the overflowed value with the maximum or minimum
representable value, so the sign stays correct.

Example (16-bit signed integer arithmetic):

32767+1 yields -32768, which is an overflow, so -32768 is replaced with
32767 (the maximum, which is much closer to 32767+1=32768 than -32768 is)

-32768-1 yields 32767, which is an overflow, so 32767 is replaced
with -32768 (the minimum, which is much closer to -32768-1=-32769 than 32767 is)

This is useful in integration/summation and in control systems, where you
can't afford the huge errors that an overflow would cause.

In short, saturation never lets the result flip sign; it clamps it between
the minimum and maximum.
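This description translates directly into portable C. A minimal sketch, assuming 16-bit signed values as in the example above; the helper names are hypothetical. The intermediate result is computed in 32 bits, where it cannot overflow, and then clamped. Processors with saturating instructions (and the compiler intrinsics mentioned earlier) do the same in a single operation.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate into the int16_t range. */
static int16_t clamp16(int32_t s)
{
    if (s > INT16_MAX)
        return INT16_MAX;
    if (s < INT16_MIN)
        return INT16_MIN;
    return (int16_t)s;
}

/* Saturating add: 32767 + 1 gives 32767 instead of wrapping to -32768. */
int16_t sat_add16(int16_t a, int16_t b)
{
    return clamp16((int32_t)a + (int32_t)b);
}

/* Saturating subtract: -32768 - 1 gives -32768 instead of wrapping to 32767. */
int16_t sat_sub16(int16_t a, int16_t b)
{
    return clamp16((int32_t)a - (int32_t)b);
}
```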

Alex



Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)
Worse, the change in sign is likely to cause the control to work in the
wrong direction (Imagine pressing the brake and having the car speed up).

Robert

Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)

That's implied, obviously. No one would like to drive such a car. A test
drive could be the last drive. :)

Alex



Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)
On Thu, 7 Oct 2004 00:40:33 +0400, the renowned "Alexei A. Frounze"


Reversal of phase in a control system is very bad news indeed.


Best regards,
Spehro Pefhany
--
"it's the network..."                          "The Journey is the reward"
snipped-for-privacy@interlog.com             Info for manufacturers: http://www.trexon.com
Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)

I almost broke a foot that way.

Our torque tool would run up against the stop and the error integral
would get larger and larger until the hydraulic valve was all the way
open and the tool was putting out around 2200 foot-pounds of torque.

Then *bam*, the error term rolled over, the hydraulic valve slammed to
the other stop, and the 75-pound tool would do a nice back flip as it
immediately tried to apply 2200 foot-pounds of torque in the other
direction.

Since the test setup was near my desk, debugging that problem meant
standing way the hell out of the way.  At first I wouldn't believe it
was a software problem ("Gee Jeff, where do you buy such crummy
proportional valves?").  Now I have a better understanding of saturating
math and defensive coding and keep an open mind when troubleshooting. :)
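The fix amounts to saturating the error integral instead of letting it wrap. A minimal sketch, assuming a 16-bit integral and illustrative limits `I_MIN`/`I_MAX` (hypothetical values, not taken from the tool's firmware); clamping the integrator like this is one common form of anti-windup.

```c
#include <stdint.h>

/* Hypothetical integrator limits, chosen for illustration: tighter
 * than the int16_t range, so the integral stops at a value the
 * actuator can use, well before any wraparound is possible. */
#define I_MAX  20000
#define I_MIN (-20000)

/* Add one error sample to the integral and clamp the result.
 * The sum is formed in int32_t, so it cannot overflow for int16_t
 * inputs; the clamp keeps the stored integral inside [I_MIN, I_MAX]. */
int16_t integrate_error(int16_t integral, int16_t error)
{
    int32_t next = (int32_t)integral + error;
    if (next > I_MAX)
        next = I_MAX;
    if (next < I_MIN)
        next = I_MIN;
    return (int16_t)next;
}
```

Called once per control period, the integral then parks at `I_MAX` while the tool sits against its stop, so the valve command stays at its limit rather than slamming to the opposite one.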

Probably still some pictures of the tool online at www.underc.com.
"Turning Point 2000" was the product, although the control system was
almost identical to the big tool.

Kelly

Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)

[arm example code snipped]

For the ARM you are right, and as such my example was badly chosen.
The CPU I had in mind has a severe limitation when it comes to
addressing external memory (memory beyond the 16-bit address range),
which makes the compiler generate code that repeatedly initializes and
loads the only address register pair, spending the vast majority of
cycles doing just that. In this particular case, using internal memory
was not an option. Hence, writing the routine in assembly really cut
cycle time by more than 70% compared with the generated code. In
general this particular compiler does well, but there are special cases
where it does not. We are back to my statement: it pays off to first
use an HLL, then analyze and change algorithms etc. if possible. Once
this is done, assembly may do the trick, if the gain in speed justifies
the amount of work needed to get there.

Markus

Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)

Well, this has been the mantra ever since we had languages more
advanced than assembly language, and it used to be unequivocally true,
no doubt. However code for modern 32-bit(+) micros is very difficult
to hand-optimize, especially for speed; I'm no longer firmly convinced
that this statement is globally true. And I'm /utterly/ convinced that
sometimes a HLL is overall the most efficient way of doing things,
even sometimes on "small" (8-bit) systems.

There are so many subtle things to remember on these complex modern
cores. I doubt there are many, if any, people who can sit down and
hand-write "the optimized loop" for a given function on x86, with the
same degree of one-pass optimization that, say, a really experienced
6502 programmer could show.

*Maybe* given enough time and analysis, a really dedicated and skilful
programmer could beat the world's best compiler. The human will
probably be able to perceive the global (system-wide) context better,
hence can perform some cross-module optimization. But this argues that
the algorithm should be restructured so the compiler can do the hard
work.

Re: Assembly vs Compiled (Was: reducing flash size in embedded processors?)
snipped-for-privacy@larwe.com (Lewin A.R.W. Edwards) writes:

My experience with many compilers for many different languages
over the years has been that a decent (not a super) ASM programmer
can reduce a program size and/or increase its speed by about a
factor of 4.  The only exceptions I've ever encountered were the
compilers DEC produced for the VAX -- it was very difficult to
beat those.

[I fed one of the "standard" benchmark programs to the VAX
 Fortran compiler one time for fun.  The compiler analyzed
 the source, realized that the program did nothing useful,
 and optimized it to zero code and zero running time.]
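That anecdote is dead-code elimination at work: if a computation has no observable effect, the compiler is free to remove it. A small C illustration of the same idea (function names hypothetical); whether a particular compiler actually deletes the first loop depends on the compiler and optimization level.

```c
#include <stdint.h>

/* The result is thrown away, so nothing observable depends on the
 * loop; an optimizing compiler may reduce this to a bare return. */
void bench_discarded(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++)
        acc += i * i;  /* unsigned arithmetic: wraparound is well defined */
    (void)acc;
}

/* Returning the result makes it observable, so it must be computed
 * (although the compiler may still find a cheaper way to do it).
 * Storing the result to a volatile variable is another common way to
 * keep a benchmark from being optimized away. */
uint32_t bench_kept(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++)
        acc += i * i;
    return acc;
}
```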

