Benchmarking C compilers for embedded systems

- Philipp Klaus Krause
posted
12 years ago

Mon, Sep 19, 2011 5:43 PM

I'd like to compare C compilers targeting embedded systems - the kind that only have a few KB of RAM, etc.

However, there seems to be a lack of software I could use. What I'm looking for should be free, written (mostly) in standard C, and not totally different from typical embedded applications.

So far I see:

- dhrystone

- The Contiki OS

- the files I've been using to track sdcc over revisions at

formatting link

Are there any other suitable benchmarks I could look at? They don't have to be designed as benchmarks; other software, like the Contiki OS mentioned above, is OK too. They don't even have to "run", so libraries would be fine as well.

Philipp

- David Brown
posted
12 years ago

Mon, Sep 19, 2011 8:10 PM

There are no good general benchmarks for small embedded systems - partly because it would be a great deal of work (standard C is not standard for many compilers - you need to use compiler-specific features to get the best out of them), and partly because it would be of very little use.

What you want to know is which tools are the best for /your/ code, not some general code written for completely different uses. Most tool vendors provide free evaluation software - test them out on your own code.

- Abhishek
posted
12 years ago

Tue, Sep 20, 2011 7:50 AM

Hi Philipp,

Have you tried CoreMark from EEMBC?

formatting link

It is available for free. The other benchmarking suites from EEMBC require you to pay a small membership fee, I guess.

Regards, Abhishek

- Philipp Klaus Krause
posted
12 years ago

Wed, Sep 21, 2011 7:40 AM

On 20.09.2011 at 09:50, Abhishek wrote:

I have now, but the license is a problem.

I can understand benchmarking organizations not distributing older versions of their benchmarks and no longer accepting results generated with an old version, as SPEC does. However, the CoreMark license explicitly forbids you from running or even keeping a copy of older versions: you lose all rights to them each time a new version is released.

I'll probably use it anyway and just hope they don't release a new version anytime soon. Still, I'm looking for further benchmarks. The Contiki OS seems perfect, but I'd like to find more of that kind.

Philipp

- Ian Collins
posted
12 years ago

Wed, Sep 21, 2011 9:06 AM

What about your own code?

--
Ian Collins

- Philipp Klaus Krause
posted
12 years ago

Thu, Sep 22, 2011 7:12 AM

I'd like to get results that are useful to other people as well. I intend to use some of my code as one of the benchmarks, but in general I'd prefer to use more widely accepted code.

Philipp

- Philipp Klaus Krause
posted
12 years ago

Thu, Sep 22, 2011 7:16 AM

I don't just want to know which tool is best for my code. I want to know which tool is best for typical code. I am one of the developers of the free sdcc compiler and am currently working on register and stack allocation. I see that what I implemented results in better code when compiling my own programs, Contiki, dhrystone and CoreMark. But I'd like to know whether there is any code for which the new approach performs worse than the one previously used in sdcc. And I'd like to know it now, instead of having to wait for bug reports about code size regressions coming in after the next release.
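The kind of check described here can be automated once each benchmark has been built with both compiler revisions. The following is only a minimal sketch: the `size_result` record and the `report_regressions` helper are hypothetical names, not part of sdcc, and the byte counts are assumed to come from whatever the toolchain reports (for example a linker map file).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record: code size in bytes of one benchmark, built with the
   old and with the new compiler revision. */
struct size_result {
    const char *name;
    unsigned long old_bytes;
    unsigned long new_bytes;
};

/* Print every benchmark whose code grew and return how many did. */
size_t report_regressions(const struct size_result *r, size_t n)
{
    size_t regressions = 0;

    for (size_t i = 0; i < n; i++) {
        if (r[i].new_bytes > r[i].old_bytes) {
            printf("%s: %lu -> %lu bytes (+%lu)\n", r[i].name,
                   r[i].old_bytes, r[i].new_bytes,
                   r[i].new_bytes - r[i].old_bytes);
            regressions++;
        }
    }
    return regressions;
}
```

A non-zero return value could fail a nightly build, so that a regression shows up before a release rather than in a bug report.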

Philipp

- Ian Collins
posted
12 years ago

Thu, Sep 22, 2011 7:41 AM

Every target and code base combination will behave differently, even on different members of the same processor family. This makes general benchmarks useless for embedded systems.

--
Ian Collins

- David Brown
posted
12 years ago

Thu, Sep 22, 2011 8:26 AM

I understand what you are trying to achieve - I just don't think it is possible. There is no such thing as "typical code" for embedded systems, and often "benchmark code" is so artificial that it doesn't represent the real world (though it can be useful for identifying particular cases).

What you are doing now, testing with a sample of your own real-world programs and other real-world embedded code (Contiki), is the best you can do. If you want to do more, then collect more samples in a similar vein. If sdcc has a collection of example code, then that would be a good start - you may also find that some users will give you programs you can use. The toughest test is real-world code that is written for portability - good examples there would be FreeRTOS and LWIP. These are code bases that are written to work on a wide range of targets, and can't take advantage of any target or compiler-specific tricks and optimisations, and they are projects that are used by a great many people.

- Arlet Ottens
posted
12 years ago

Thu, Sep 22, 2011 8:34 AM

Elm's FAT filesystem implementation:

formatting link

- Bruno Richard
posted
12 years ago

Fri, Sep 23, 2011 7:35 AM

Hi Abhishek,

Just a small note here: CoreMark is based entirely on 32-bit arithmetic, which is *meaningless* if you are considering an 8/16-bit MCU, as seems to be the case here. Anyway, if you plan on having 32-bit arithmetic in your application, a 32-bit MCU will be much easier to work with than an 8- or 16-bit one.

BR

- Philipp Klaus Krause
posted
12 years ago

Fri, Sep 23, 2011 10:21 AM

On 23.09.2011 at 09:35, Bruno Richard wrote:

I wouldn't say "meaningless", at least not if other benchmarks are used too: e.g. the Contiki OS has quite a bit of 32-bit arithmetic as well, and is commonly used on 8-bit systems.

Philipp

- Bruno Richard
posted
12 years ago

Sat, Sep 24, 2011 8:13 AM

Philipp,

I do not agree. CoreMark is based on heavy, intensive 32-bit arithmetic. If your application *really* requires matrix inversions, FFTs and the like, then an 8-bitter looks undersized to me, hence there is no point in benchmarking it. CoreMark should NOT be used to evaluate 8- and 16-bit micros!

Bruno.

- Walter Banks
posted
12 years ago

Fri, Sep 30, 2011 12:11 PM

Philipp,

There are tons of embedded systems benchmarks on the internet. In our compiler work we have found the most informative benchmarks are application type benchmarks close to the actual intended application.

The application-fragment type benchmarks (FFT, filters, math packs) are generally useful in seeing how focused a compiler is on optimizing a specific application area.

Generic benchmarks would likely need to also have some definition of the intended application area.

There isn't much meaning in comparing the performance of processors used in mice and processors used in home appliance controllers. Both are non hosted embedded controllers with a few K of code but very different requirements. They may even be members of the same family.

Walter..

- Philipp Klaus Krause
posted
12 years ago

Fri, Sep 30, 2011 6:45 PM

On 24.09.2011 at 10:13, Bruno Richard wrote:

Well, it can happen that 8-bit applications need some 32-bit arithmetic somewhere, even relatively complex arithmetic, as long as it's not the main task. And even seldom-executed code has to be placed somewhere. Since I primarily want to compare compilers (and individual approaches to problems within a compiler) in terms of code size and stack space usage, it seems a good idea to me to include a benchmark focused on 32-bit arithmetic.
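As an illustration of 32-bit arithmetic that is "not the main task", consider scaling an ADC reading to millivolts. This is only a sketch under assumptions that are not from this thread: a 10-bit ADC (full scale 1023) and a 3300 mV reference; the function name is made up. The 16-bit product would overflow, so the intermediate has to be 32 bits wide, and on an 8-bit target the compiler typically has to emit or call multi-byte multiply and divide code for it.

```c
#include <stdint.h>

#define ADC_FULL_SCALE 1023u   /* assumption: 10-bit ADC */
#define VREF_MILLIVOLT 3300u   /* assumption: 3.3 V reference */

/* Convert a raw ADC reading to millivolts (truncating division).
   The cast forces a 32-bit multiply, since raw * 3300 can exceed 65535. */
uint16_t adc_to_millivolts(uint16_t raw)
{
    uint32_t scaled = (uint32_t)raw * VREF_MILLIVOLT;

    return (uint16_t)(scaled / ADC_FULL_SCALE);
}
```

Compiling such a function with two compilers, or two revisions of one compiler, and comparing the resulting code size and stack usage is the sort of measurement meant here.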

Philipp

- Walter Banks
posted
12 years ago

Fri, Sep 30, 2011 9:06 PM

You can do this and the answers have some meaning; however, the results don't represent a significant processor measurement. It is the same as comparing an M3 against some well-implemented 8-bit processors running code dominated by 8-bit data: data size and probably code size will not reflect well on an otherwise good 32-bit processor.

You might be wise to create a mix of benchmarks that represents a wide range of typical embedded-systems code blocks, so that the strengths and weaknesses of various processors can be seen. A single benchmark result would not show a lot of information.

You could probably achieve a reasonably broad mix with a small number of test programs that cover 8, 16 and 32 bits, math, filters and control structures.

Possibly also some common application code, for example PWM and event-driven functions.
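One way to cover several data widths with little source code is to generate the same kernel for each width, so that the code emitted for 8, 16 and 32 bits can be compared directly. The sketch below is an assumption about how such a test program might look, not an existing benchmark; the saturating-sum kernel and all names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Generate the same saturating-sum kernel for one unsigned integer type.
   The sum stops at MAX instead of wrapping around. */
#define DEFINE_SAT_SUM(NAME, T, MAX)              \
    T NAME(const T *v, size_t n)                  \
    {                                             \
        T acc = 0;                                \
        for (size_t i = 0; i < n; i++) {          \
            if (v[i] > (T)((MAX) - acc))          \
                return (MAX);                     \
            acc = (T)(acc + v[i]);                \
        }                                         \
        return acc;                               \
    }

DEFINE_SAT_SUM(sat_sum_u8,  uint8_t,  UINT8_MAX)
DEFINE_SAT_SUM(sat_sum_u16, uint16_t, UINT16_MAX)
DEFINE_SAT_SUM(sat_sum_u32, uint32_t, UINT32_MAX)
```

Each generated function combines a loop, a comparison and an addition, so the three variants differ only in operand width.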

Regards,

-- Walter Banks Byte Craft Limited

formatting link

- Hans-Bernhard Bröker
posted
12 years ago

Sun, Oct 2, 2011 6:21 PM

That logic doesn't work. 32 bit arithmetic has no visible relation to what you're trying to do, so the "since" in that sentence is rather unjustified.

The key problem with your plan is still that the quantity you want to measure and compare has been repeatedly and convincingly demonstrated, here and elsewhere, not to exist. Neither code size nor stack space usage is, by any reasonable stretch of the imagination, a property of any given compiler. It's a property of the particular chosen combination of test code, target platform, compiler, and selected optimization options.

In real-world applications, practical reasons drive a rather strong correlation between two main pillars of that construction: code to be run, and target hardware. It simply doesn't matter how good a compiler for a 4-bit platform might be at translating code "focused on 32-bit arithmetic", because these days it makes absolutely no sense to choose that kind of hardware for that kind of code any more. You might as well benchmark watchmakers' tool sets on their performance at servicing a V8 engine.

- Arlet Ottens
posted
12 years ago

Sun, Oct 2, 2011 6:59 PM

It all depends. I've done plenty of 32 bit integer and floating point arithmetic on 8 bit targets.

There are still reasons to use 8 bit CPUs for such jobs, such as price, power consumption, package options and just plain legacy applications. For instance, I used a TSSOP8, and there isn't much choice in 32 bit CPUs for that kind of package, and there was even less choice just 5 years ago when I picked the CPU.

And when an 8 bit CPU is the best option for the job, and you have to deal with 32 bit arithmetic, it would be nice if the compiler could make the code fit.

- Mel
posted
12 years ago

Sun, Oct 2, 2011 8:39 PM

As of just today, I have a tiny 8-bit uC driving a single pin with bits from a 32-bit LFSR. I could write that without 32-bit arithmetic, but why?
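For readers unfamiliar with the technique, a 32-bit LFSR step in plain C might look like the sketch below. The post does not say which form or polynomial was used; the Galois form and the tap mask `0x80200003` (taps 32, 22, 2, 1, a commonly tabulated set) are assumptions, as is the function name.

```c
#include <stdint.h>

/* Assumed feedback mask for taps 32, 22, 2 and 1 (Galois form). */
#define LFSR32_TAPS 0x80200003UL

/* Advance the register by one step and return the bit shifted out.
   The state must be seeded with a non-zero value. */
uint8_t lfsr32_step(uint32_t *state)
{
    uint8_t out = (uint8_t)(*state & 1u);

    *state >>= 1;
    if (out)
        *state ^= LFSR32_TAPS;
    return out;
}
```

Each call yields one bit that could be written to an output pin. On an 8-bit target the 32-bit shift and XOR expand into several byte-wide instructions, which is exactly where compilers can differ in code size.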

Mel.