reducing flash size in embedded processors?

Then I guess I've never seen "an experienced and good assembly programmer": somebody who can take advantage of delayed branches, pipelines, register windows -- somebody who can keep track of a dozen different variables and intermediate values in registers and on the stack, and all the other things that compilers are good at. I suspect those people are so rare these days that the chances of finding one are negligible.

--
Grant Edwards                   grante             Yow!  Does someone from
                                  at               PEORIA have a SHORTER
                               visi.com            ATTENTION span than me?
Reply to
Grant Edwards

... on *tiny* examples only, and even then by a small margin (say < 20%). This is typically because a particular trick is used that is not available to the compiler. I've used the stack pointer as a general-purpose register by saving it in a global variable. Not something compilers will ever do (especially not if you still need to take interrupts!), but great if you're trying to squeeze out the last few percent of a bit blitter. However, such tricks only apply in very specialized circumstances.

When you take any non-trivial amount of code (say 10+K lines) then it is obvious even the best assembly programmer is going to lose big time against a compiler. Even if he is able to apply some of those tricks, he cannot routinely apply all the global transformations compilers do, even given an infinite amount of time. Humans simply cannot do it - just like we can't compute the 100th digit of sin(x) in our head.

How would that matter? The C++ programmers were most likely not experts anyway. They moved to a new language and architecture and made a significant improvement doing so. A lot of the improvement was due to the compiler being able to aggressively inline functions and remove redundant code exposed by inlining. Not something assembly programmers can do.

Wilco

Reply to
Wilco Dijkstra

I think this discussion is going down the wrong path. There is no sense in "assembly vs. compiled" thinking: both approaches, taken as a religion, are plain wrong! If possible, IMHO a better approach is to code in a higher-level language like C or the like, then, if the specs require it, make an in-depth performance analysis of the resulting code. Take a very close look at the code that is time-critical and called frequently. So far I have always been able to optimise these cases. Sometimes the optimisation was done by choosing a better algorithm or by using a different approach to the "sub-problem", reversing the order things are done in - you name it.

Then there are also those rare cases where parts of the code were hand-crafted in assembly. There ARE examples where assembly can be better. Just imagine a situation where you have to send data from, say, CompactFlash out to the LAN as fast as possible. If you use assembly, you can create code which, at the same time it reads the CF data register, calculates the TCP checksum in an additional CPU register before the data is stored into the network controller. Such an optimisation avoids fetching the data twice. I don't think it would be easy, or possible at all, to do the same thing in C, so IMHO it's sometimes better to use the brain instead of following a dogma blindly :-)
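For what it's worth, the fused read-and-checksum idea can at least be approximated in C. This is only an illustrative sketch with made-up names (buffer-to-buffer; real code would read the CF data register and write the NIC instead), accumulating the Internet one's-complement checksum in the same pass that moves the data:

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative only: src stands in for the CF data stream, dst for
 * the network controller. Each 16-bit word is fetched once, stored
 * once, and folded into the checksum on the way through. */
static uint16_t copy_and_checksum(uint16_t *dst, const uint16_t *src,
                                  size_t nwords)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < nwords; i++) {
        uint16_t w = src[i];   /* one read...          */
        dst[i] = w;            /* ...one write...      */
        sum += w;              /* ...checksum for free */
    }
    while (sum >> 16)          /* fold carries (one's-complement add) */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Whether a compiler fuses the copy and the checksum this tightly is compiler- and target-dependent; the point of the post stands - in assembly you can guarantee it.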

Just my 2¢ of course.

Markus, running to get a flame proof suit :-)

Reply to
Markus Zingg

They are probably the ones who wrote the compiler code generator and optimizer.

--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
     USE worldnet address!
Reply to
CBFalconer

Well, this has been the mantra ever since we had languages more advanced than assembly language, and it used to be unequivocally true, no doubt. However code for modern 32-bit(+) micros is very difficult to hand-optimize, especially for speed; I'm no longer firmly convinced that this statement is globally true. And I'm /utterly/ convinced that sometimes a HLL is overall the most efficient way of doing things, even sometimes on "small" (8-bit) systems.

There are so many subtle things to remember on these complex modern cores. I doubt there are many people, if any, who can sit down and hand-write "the optimized loop" for a given function on x86 with the same degree of one-pass optimization that, say, a really experienced 6502 programmer could show. *Maybe*, given enough time and analysis, a really dedicated and skilful programmer could beat the world's best compiler. The human will probably be able to perceive the global (system-wide) context better, hence can perform some cross-module optimization. But this argues that the algorithm should be restructured so the compiler can do the hard work.
Reply to
Lewin A.R.W. Edwards

... snip ...

No argument. I was simply pointing out a theoretical limit. It is almost always cheaper to spend money on more hardware than on the last ounce of efficiency and compression.

--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
     USE worldnet address!
Reply to
CBFalconer

Obviously not, because a well-tooled assembly programmer can always run the compiler and inspect its output to find ways to improve it (or keep it as is and call it "assembler source code"). The compiler can never do anything like that, so the human programmer has an unfair advantage.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
Reply to
Hans-Bernhard Broeker

A much more interesting exercise is recoding an assembly routine in C and looking at the difference in generated code from a good compiler. The first time I did this as a serious exercise was with a math library for our compiler that supports the Microchip PIC. The C version of the library was as tight as assembler and would run on all variations and addressing modes of the PIC.

w..

CBarn24050 wrote:

Reply to
Walter Banks


Had you tried the exercise with a math library different from the C standard you would have found a relatively large difference in favor of the assembly program. It stands to reason that the C compiler is (well, should be) very good at doing C things. The further you deviate from those things, the better assembly looks.

Best regards, Spehro Pefhany

--
"it's the network..."                          "The Journey is the reward"
speff@interlog.com             Info for manufacturers: http://www.trexon.com
Embedded software/hardware/analog  Info for designers:  http://www.speff.com
Reply to
Spehro Pefhany

Finally. The right point. Thanks!

Jon

Reply to
Jonathan Kirwan

Whilst it may be true that compilers are improving, it is an unfortunate fact that programmers are not. Particularly on smaller embedded systems where resources like RAM and MIPS are in short supply, it is all too easy for programmers brought up on HLLs alone to create algorithms which make poor use of the available resources. A compiler may well be able to produce code as good as any assembly programmer given the algorithm, but that is not where the problem often lies. IME, assembly language programmers are much better at creating resource-efficient algorithms, whether coded in HLL or assembler, than programmers with no assembler experience.

Ian

--
Ian Bell
Reply to
Ian Bell

My experience with many compilers for many different languages over the years has been that a decent (not a super) ASM programmer can reduce a program's size and/or increase its speed by about a factor of 4. The only exceptions I've ever encountered were the compilers DEC produced for the VAX -- it was very difficult to beat those.

[I fed one of the "standard" benchmark programs to the VAX Fortran compiler one time for fun. The compiler analyzed the source, realized that the program did nothing useful, and optimized it to zero code and zero running time.]
Reply to
Everett M. Greene

It clearly depends on the unit cost increase multiplied by the number to be manufactured versus the cost of developing the more efficient version. If the quantities are large, it is almost always cheaper to spend the money on additional development.

Ian

--
Ian Bell
Reply to
Ian Bell

While I fully agree that even an experienced assembly programmer could not consistently keep up with a good C-compiler in a large system, a good assembly programmer can improve the performance of some device-specific operations, e.g. copying data between normal memory and video RAM compared to the standard memcpy library routine.

First of all, the assembly programmer can force the critical data to be aligned according to memory width or cache line width. Of course, the C-compiler could do this alignment, but if applied to all data, quite a lot of memory would be lost. Often the assembler programmer knows how much data is actually going to be transferred, which helps to unroll the loop for best performance.
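The known-length point can be sketched in C as well (the names here are illustrative, not from any real library): if the programmer knows the transfer is, say, a multiple of four aligned 32-bit words, the copy loop can be unrolled by hand:

```c
#include <stdint.h>
#include <stddef.h>

/* Caller guarantees: dst/src are word-aligned and nwords is a
 * multiple of 4. Moving four words per iteration cuts the loop
 * overhead (counter update, compare, branch) to a quarter. */
static void copy_words_unrolled4(uint32_t *dst, const uint32_t *src,
                                 size_t nwords)
{
    for (size_t i = 0; i < nwords; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
}
```

Forcing the alignment itself is compiler-specific; under GCC, for instance, something like `__attribute__((aligned(32)))` on the buffer declaration would do it.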

Prefetching - i.e. touching (but not actually using) one byte in each cache line to load the line into the cache, and only then starting the actual copy - avoids some memory bandwidth bottlenecks.
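The preload trick might look like this in C (a sketch: CACHE_LINE is an assumed, target-dependent value, and the volatile read keeps the compiler from deleting the "useless" touch):

```c
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 32u   /* assumption: real line size is target dependent */

/* Touch one byte per cache line first, so the source lines are
 * (hopefully) resident before the copy loop starts hitting them. */
static void copy_with_preload(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i += CACHE_LINE)
        (void)*(volatile const uint8_t *)(src + i);  /* pull in the line */

    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```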

While a compiler writer could include such features, using such features in the generated code might not be sensible, if it would apply only to one processor and it might in fact be counterproductive on some other processor version.

It seems to be a sport to write very efficient memcpy routines, but in general, simple memory to memory transfers do not make much sense in real life, since most of such transfers could have been eliminated with better data structure design. However, with wildly different memory read, write or read/write performance, it could make a lot of sense to transfer large amounts of data as a block transfer, especially when no DMA transfer is available.

Paul

Reply to
Paul Keinanen



Maybe asm programmers are better at this than pure (or asm-unaware) HLL programmers, but here are a few key points that both must be familiar with:

- asm optimization won't do much for an initially poor algorithm (I mean, if it's implemented with O(N**2) MIPS cost when O(N*log(N)) is possible, it doesn't matter what language is chosen, unless your N is a small number and there are no outer loops)

- redundant operations will lower the performance no matter what language they are written in, so any calculation or access that can be avoided should be (with asm, though, you have better control here)

- if there are parts in the program that need data memory at mutually exclusive periods of time, it makes sense to share memory between them. Even C/C++ has "union" keyword helping to do it (in asm you can always do things like that).

- if your memory isn't strictly divided into two distinct categories, program and data (the Harvard architecture), and can be used for both, then you can share the stack/heap/uninitialized-yet-bss with some of the startup code that runs just once at the beginning of the program. And this mostly has to do with the linker, so the language is again up to you, because this sort of size optimization is language independent

- finally, without resorting to asm optimization, one could choose HLL constructs that fit the target CPU instructions better. I mean, if branching is expensive, unroll the loops and replace conditional branching by logic. If logical operations are performed worse than arithmetical ones (which may happen with e.g. DSPs that have quick addition and multiplication and such), consider replacing (A && B) by (A*B), etc. etc.

In practice, when embedded systems are considered and the cost should be real low, all kinds of optimizations and tricks are handy. If the production volumes are big, higher development cost will be covered by the savings.

And I'd say that algorithm replacement/refinement is the ultimate optimization. Then you can further squeeze and speed things up in asm to the absolute minimum. A bad algorithm, especially one optimized in asm, can cost as much as several algorithms because (a) it's bad and (b) if you later replace it, the earlier asm optimization is simply thrown away.

Alex

Reply to
Alexei A. Frounze

snip of much sensible stuff

Which is just what I was trying to say. The key question is 'how do we best implement this function on this platform?' To answer this you need to understand 'this platform' in sufficient depth and know how to create an efficient algorithm.

Ian

--
Ian Bell
Reply to
Ian Bell

Again, agreed. I just haven't spent my life in the large volume situation, but I have spent it in the reliability situation. Now consider Microsoft ....

--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
     USE worldnet address!
Reply to
CBFalconer

Sometimes but not always...

That is very target and particular part of application specific, whereas most compilers are general purpose so will have flaws for some circumstances.

Or for that matter in other applications on the same processor.

Depends how you have saved configuration data for that specific unit and how big that data set is. Having no hard drive or media card is a good enough reason to save setup data in Flash or even EEPROM, data that may well be read by memcpy. A typical printer (even networked) does not need a large dataset of customisation variables.

It also depends on a lot of things. For example, I quite often use memcpy to copy some small sets of system defaults between external Flash and internal RAM, because the overhead of setting up a DMA transfer would take longer. However, if I am copying large chunks of data from Flash to dual-ported hardware RAM or similar, then I do use DMA transfer, even for a memory to memory-mapped transfer. Often followed by memcmp to validate the data set, then a smaller C routine I have control over to isolate the actual failing location and values, to report and act upon.

--
Paul Carpenter | snipped-for-privacy@pcserviceselectronics.co.uk
PC Services
GNU H8 & mailing list info
For those web sites you hate

Reply to
Paul Carpenter


In what way could a math library be C specific? The math functions are the same independently of the language - at the end of the day you pass in some floating point value(s), some computation happens, and you get a result - that's it. So it's no surprise that the same libraries are used for various languages such as C, C++, Java and Fortran, as they all define a common set of math functions.

Wilco

Reply to
Wilco Dijkstra
