Verify execution speed - "cycle counting" etc?

Hi all,

Triggered by the recent "cycle counting" threads, I would like to know
how you verify "is my code fast enough"?

Although you can't predict the exact timing in modern controllers, and
you use HW timers where exact speeds matter, there are many situations
where you simply need to be sure that a piece of code is executed
before a timer expires, an ADC has to be read, etc.

Do you insert delays in the code until it breaks, analyse assembler
output, not use C at all, logic analyzer, combinations...?

Oliver
--  
Oliver Betz, Munich
despammed.com is broken, use Reply-To:

Re: Verify execution speed - "cycle counting" etc?
On 20/02/14 12:03, Oliver Betz wrote:
[...]

Don't forget to consider the effects of worst-case I/D cache misses.
In particular consider the ratio of longest:mean and longest:mode times.

I'm assuming you're already considering the effects that any RTOS
scheduling and synchronisation will introduce, particularly if there is
priority inversion.

Re: Verify execution speed - "cycle counting" etc?
Tom Gardner wrote:

[...]

That's what I meant by "you can't predict the exact timing in modern
controllers".

[...]

I'm even considering the delay introduced by external hardware.

My question targeted small-scale timing, usually not tolerating
interrupts at all.

And it was really a question about the methods used by c.a.e. members.

Oliver
--  
Oliver Betz, Munich http://oliverbetz.de/

Re: Verify execution speed - "cycle counting" etc?
On Thu, 20 Feb 2014 20:15:47 +0100, Oliver Betz wrote:

[...]

Oh, well, if _that's_ all you want:

I measure the timing on an O-scope, and hope that it stays correct in the  
field.

But -- I don't do life critical stuff.  If I did, I'd probably wish for a  
"run as slow as you ever will" switch for the processor, which I would  
turn on, then do the above measurements for the portions of the code that  
were, indeed, life critical.

--  

Tim Wescott
Wescott Design Services
Re: Verify execution speed - "cycle counting" etc?
Tim Wescott wrote:

[...]

[...]

Yes, a widespread method.

[...]

I also don't. But since my (industrial) products are usually a small
cog in a big wheel, the consequential cost *) of my errors can be very
high compared to the product cost.

Oliver

*) what is the original term used by Philip Crosby in "Quality Is
Free" for the cost of a "deviation"? The German translation used
"Preis der Abweichung" (literally, "price of the deviation").

Oliver
--  
Oliver Betz, Munich http://oliverbetz.de/

Re: Verify execution speed - "cycle counting" etc?
On 20/02/14 23:28, Tim Wescott wrote:
[...]

That's always a useful method.  Typically you have code like this:

{
    turnOnTestPin();
    doSomethingSlow();
    turnOffTestPin();
}

Then you can see the timing.

To be a little more sophisticated, I often have something like this:

uint32_t maxTime = 0;

{
    uint32_t startTime = getHardwareMicrosecondsCount();
    doSomethingSlow();
    uint32_t slowTime = getHardwareMicrosecondsCount() - startTime;
    if (slowTime > maxTime) maxTime = slowTime;
}

(Quiz for C language purists - what happens when the hardware counter
rolls over?)

That way it's easy to track the longest time needed by the slow code.

One thing to be aware of is that the compiler may re-order your code to
put some of the "doSomethingSlow" before or after your timers or test
pin settings...

Re: Verify execution speed - "cycle counting" etc?
David Brown wrote:
[...]

It depends, and you need to check for it. There are two possible
rollovers, and they'll have different effects. At least the quantities
are unsigned.

I'd put "fake" rollovers in to test all three or four cases.

[...]

It's a horrible way to do it, but a usleep(0) before (and maybe after)
the doSomethingSlow() will put a stop to that.

--  
Les Cargill


Re: Verify execution speed - "cycle counting" etc?
On Fri, 21 Feb 2014 13:07:44 -0600, Les Cargill

[...]


As long as the "doSomethingSlow" is compiled separately from the
timing loop, the compiler can't reorganize those statements.  

[...]

Any externally compiled dummy routine should prevent any optimization
across calls.

Are you sure that usleep(0) does nothing?

At least on some cooperative or round-robin systems, an XYZsleep(0)
type function will at least run the scheduler, possibly activating
the thread at the next clock tick.



Re: Verify execution speed - "cycle counting" etc?
On 14-02-21 22:29 , snipped-for-privacy@downunder.com wrote:
[...]


Compilers with link-time optimization (LTO) can reorganize code even
then. AIUI, recent gcc versions have LTO. We should hope that LTO can be
disabled with suitable compiler options.

--  
Niklas Holsti
Tidorum Ltd
Re: Verify execution speed - "cycle counting" etc?
snipped-for-privacy@downunder.com wrote:
[...]

Absolutely not. Your dummy routine is a really good variation on what
I'd said.

It will not be costless. It may cause a context switch - I'd
count on it.

[...]

Right. And even with preemptive systems, it may realign you with the  
clock tick that's usually at 10 msec.

[...]

--  
Les Cargill

Re: Verify execution speed - "cycle counting" etc?
On Fri, 21 Feb 2014 10:50:40 +0100, David Brown wrote:

[...]
doSomethingSlow();
[...]

AFAIK, if you declare startTime and slowTime as volatile that shouldn't  
happen -- but that's "shouldn't" in a moral sort of way, not in a "you  
can always expect this" sort of way.  Apparently sometimes it does anyway.

There's a paper floating around in the eWorld that gets very technical  
about what volatile's supposed to do, what compilers often fail to do  
under aggressive optimization, and ways that you can force correct  
behavior (sometimes at the cost of some execution speed).

--  

Tim Wescott
Wescott Design Services
Re: Verify execution speed - "cycle counting" etc?
On 21.02.2014 21:37, Tim Wescott wrote:

[...]

Actually, not even the moral sort of way really works.  The key issue is
that making objects volatile constrains _only_ the volatile objects
themselves.  It creates no extra restrictions whatsoever on non-volatile
objects.  The compiler would be fully allowed, in any sort of way
including the moral, to move that doSomethingSlow() before the
assignment to startTime, after the one to slowTime, or even split it into
three separate parts to go before, between and after them.  The only
thing those volatiles really do prohibit is moving the assignment to
slowTime before the one to startTime.

[...]

Making stuff volatile, if it's actually needed, will _always_ cost some
execution speed.  That's part of the reason it exists.

Re: Verify execution speed - "cycle counting" etc?


[...]

Perhaps I should have said "more execution speed than is necessary to
just do what it should".

Clearly the C language needs to allow you to mark a block as volatile, to
say "execute everything inside this block after any volatile stuff
before it and before any volatile stuff after it, but do whatever you
want inside":

volatile {
  do_some_stuff;
}

And it needs to be simpler, too -- it's just getting too complex.

--  

Tim Wescott
Wescott Design Services
Re: Verify execution speed - "cycle counting" etc?

[...]

Yes, that is exactly the point I was getting at (but letting people  
think a little before giving them the answer).  "volatile" accesses are  
strictly ordered (assuming there are sequence points between them -  
there are no order guarantees if you do something like "foo(vol1,  
vol2)").  But volatile accesses have no ordering requirements with  
respect to non-volatile accesses or any calculations.

There are various ways to enforce - or at least, partially enforce - the  
desired behaviour.  Some have been mentioned in this thread.

If the compiler does not know anything about "doSomethingSlow()",  
because it was separately compiled (and you are not using LTO or other  
whole-program optimisation), then the compiler has to assume it will  
contain volatile accesses, and it cannot do any re-ordering.  You can  
achieve a similar "zero knowledge" effect by using function pointers,  
although gcc can sometimes optimise function pointers too.  If the  
function pointer itself were volatile, you would be okay.

As long as the function doSomethingSlow() is compiled as a stand-alone  
function, rather than being inlined, then it is very unlikely that the  
compiler will re-order around it - but it /could/, if it wanted to.

Most compilers have some sort of "memory barrier" - there is no standard  
C solution.  Sometimes this is done using intrinsics or compiler  
extensions.  In gcc, it is done using inline assembly with a "memory"  
clobber: "asm volatile ("" ::: "memory"); ".  This tells the compiler  
that memory may be read or written in an unexpected (volatile) manner.  
It does not force a complete ordering on code, but it forces an ordering  
on data in memory - no memory accesses can be re-ordered across a memory  
barrier, or "cached" across the barrier.  So a memory barrier before and  
after the call to doSomethingSlow() will disallow the compiler from most  
re-ordering of code, and will usually have low impact on the speed.

It would be nice if there were some C standard way to deal with this  
sort of thing, but there is none - and there is not likely to be any.  C  
simply does not have any concept of the timing of events, and only  
considers the order of /observable/ events to be relevant (i.e.,  
volatile accesses and calls to external code).

[...]


Re: Verify execution speed - "cycle counting" etc?
On 14-02-22 19:39 , David Brown wrote:


[...]


A volatile function pointer really is a good way to force a function
call where one wants it. New one to me, thanks for it!

This trick should defeat even LTO, assuming that the C compiler
implements volatiles as it should. And assuming also that the LTO knows
about and respects volatiles. Does anyone here happen to know if gcc LTO
respects volatiles?

--  
Niklas Holsti
Tidorum Ltd
Re: Verify execution speed - "cycle counting" etc?
On 22/02/14 18:53, Niklas Holsti wrote:
[...]

[...]

gcc should respect volatile accesses regardless of LTO.

Of course, the compiler is free to take something like this:

extern void doSomethingSlow(void);

typedef void (*FVoid)(void);
volatile FVoid indirectDoSomethingSlow = &doSomethingSlow;

void foolTheCompiler(void) {
    startTiming();
    indirectDoSomethingSlow();
    stopTiming();
}


and turn it into this, with everything inlined or as extra small
functions - whatever the compiler prefers:

void foolTheCompiler(void) {
    volatile bits of startTiming()...
    FVoid fp = indirectDoSomethingSlow;
    if (fp == &doSomethingSlow) {
        some non-volatile bits of stopTiming()...
        inline some of doSomethingSlow mixed in here...
        rest of startTiming()...
        volatile bits of stopTiming()...
        the rest of doSomethingSlow()...
        any left-over bits...
    } else {
        // External function call is a full barrier
        fp();
        // External function call is a full barrier
        stopTiming();
    }
}

It is unlikely that the compiler /would/ generate something like that -
but it would be legal.  Trying to "defeat" the compiler is seldom a good
idea - though it can occasionally be useful to know tricks like this.
It is usually better to use things like memory barriers, which are
working /with/ the compiler instead of /against/ it.

And don't forget that once you have "beaten" the compiler, the processor
could also re-arrange things if it is an out-of-order or superscalar
processor...

Compiler technology, was: Re: Verify execution speed - "cycle counting" etc?
[...]

Let me ask a "meta" type question.

_Why_ should you or I need to know tricks like this - even though, with
modern compilers, we do need to know it?

Or to put it another way, have compiler optimisers gone too far? Is the
quest for fractional improvements in code generation coming at the
expense of reliability in the generated code?

There are some safe optimisations, such as looking at a function as a
whole instead of line by line (so the code generator can pull commonly
accessed variables into registers at the start of the function), which
no-one would disagree with.

You also need to tell the compiler if a memory location can be changed
from outside of the code (ie: hardware register mapping), which no-one
would disagree with having to do either.

However, there is an increasing set of rules and aggressive
optimisations just so someone can be a few percent faster (if even that),
and it feels like this is coming at the expense of robustly generated
code without any surprises in it.

I wonder if it's time to take a step back and ask if compiler optimisation
has become too aggressive?

Should we have some explicit "--enable-dangerous-optimisations" option
which doesn't get turned on with any of the normal optimisation levels
until requested, and whose side effects are clearly documented in the
manual?

Simon.

--  
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world

Re: Compiler technology, was: Re: Verify execution speed - "cycle counting" etc?
Simon Clubley wrote on 24-Feb-14 3:13 PM:
[...]

The problem is that users have grown to depend on the compiler NOT
performing optimizations that the compiler has always been allowed to
do. In other words: users could get away with code that was strictly
speaking invalid or undefined or erroneous (or whatever the language
definition calls it).

Wouter van Ooijen


Re: Compiler technology, was: Re: Verify execution speed - "cycle counting" etc?


Simon Clubley wrote:

[...]

You raised a bunch of points.

1) Are compiler optimizations too aggressive?
    Yes and no. Most of the embedded systems targets that we
    compile for require tight, fast, low-power code. On some other systems
    the optimization may not be as critical. What is a problem in many
    compilers is too many optimization options, and it is rare that they
    are tested in combination.

2) The best way a programmer can help the compiler is to write
    clear clean source code.

3) The best way the compiler vendor can help the application
    developer is to focus the optimization on the requirements of the
    intended applications.

   For example, small scale process control and data logging intended
   applications would benefit more from the compiler developer's time
   in optimizing the handling of polynomials than many other potential
   optimizations.

4) Volatile variables, as has been noted in this thread, are both badly
    handled and extremely difficult to handle. As an example, some
    volatile variables have hardware side effects; out-of-order
    referencing of the volatile in some rare (but significant) cases
    results in the side effects happening not where they were intended.
    In one case, a reference cleared a buffer flag in hardware, and a
    second reference then dropped a byte of data.
    Solvable, sure; nasty to find, you bet.

Walter Banks
Byte Craft Limited

Re: Compiler technology, was: Re: Verify execution speed - "cycle counting" etc?
On 24/02/14 15:13, Simon Clubley wrote:
[...]

Let me respond first with another "meta" question - /do/ we need to know
this with modern compilers?  The answer here, I think, is that at least
some of us need to be aware of this sort of thing - even though much of
what I wrote was theoretical rather than real-life compiler behaviour.
I certainly have seen a development in compilers over the years that has
"broken" code that "worked" with previous compilers, because the
programmer did not understand what the compiler can do when generating
optimised code.

As an aside here, it is important to note that people who think this
sort of re-ordering of code only applies when they choose "optimisation"
flags are /wrong/.  A compiler can legally (in the sense of "C
standards" legally) generate as good or bad optimised code as it wants,
with a total disregard for optimisation settings.  Such settings are
just a hint to the compiler - they are never a command.

For the next point, let me misquote and paraphrase Linus Torvalds a bit
when he says that "volatile is evil".  There are good reasons why many
developers should /not/ know about volatile and its complications.  When
you are using an OS of some sort (an RTOS, embedded Linux, or whatever),
then generally you should /not/ be using "volatile" yourself.  You might
be using volatile registers declared in device header files, and you
might use it indirectly (as the source for the OS almost certainly will
use it somewhere).  But in most cases where you might think of using
"volatile" for your own data, you would be far safer using constructs
from your OS (such as locks, shared data, atomic data, etc.).  When you
are using an OS, "volatile" is rarely sufficient or necessary.

Of course, when you are working "close to the metal", or implementing
such OS features, then you need to know all about volatile - in all its
gory details.  But volatile alone is not enough - you also need to know
about the cpu details, such as cache coherency, synchronisation
instructions, etc.


Also note that my discussion earlier concerns what the C standards say
you can rely on regarding ordering, volatiles, etc.  Any given compiler
is free to make additional promises to make the developers' lives easier.
As an example, C++ has looser specifications for "volatile" than C has:

extern volatile int vol;
void foo(void) {
    vol;
}

is guaranteed to read "vol" in C, but not in C++ (it /may/ read it, but
it is not required).  gcc states in its manual that this /will/ always
cause a read in C++ - it guarantees that volatiles in C++ will work as
they do in C.  (In this case, you would be better writing "(void) vol;",
which is guaranteed to read vol by the C and C++ standards, and will
also avoid spurious warnings about unused statements.)

What this means is that programmers should learn to understand how much
they need to know, and use methods that are appropriate for their usage.
 If they are using an OS, they need to learn the right methods for that
OS.  If they are using bare-metal, they need to understand how their
compiler works (and if it gives additional help), what sort of code is
generated, and also how their cpu works and how /that/ affects the
working code.  Understanding the C standards behaviour is useful - but
it might not be necessary, and it is certainly not sufficient.


[...]

No, I don't think so - smaller and faster object code is always a nice
thing.  It is important to be able to limit the optimisations somewhat
as an aid to debugging and development, but compilers should be able to
generate as optimal code as they are able.

However, I think the C (and C++) standards could benefit from more
specifics regarding some types of behaviour, and perhaps with the
addition of features targeted at embedded developers who need tighter
control.  (The addition of atomics in C11 and C++11 is one step towards
this.)  And I think compilers could benefit from extending beyond what
the standards say, if the standards don't do a good enough job here -
for example, memory barriers of various types could be compiler
intrinsics instead of needing inline assembly (or at least, they could
be given standard names and put in standard headers).

[...]

I have seen "working" code that breaks when this sort of optimisation is
used.  /I/ agree that this sort of thing is "safe" optimisation, but I
know that others may disagree.

[...]

I have also seen people write code that /assumes/ all reads and writes
of memory-mapped registers and other data is automatically "volatile".
I have seen compilers with command-line switches to make /all/
non-register data work in this way.

In other words, I agree with you here - but I disagree about who you
think might disagree.  (We are having a meta-discussion here....)

Some people think the compiler should blindly translate code line for
line, and not try to be "smart".  There may be a use for such a
compiler, and it is certainly possible to make a standards-compliant C
compiler that acted that way - but C does not require a compiler to
behave like that.

[...]

I can certainly agree that it can be a problem when compilers surprise
the developers with their optimisations.  But I don't think compilers
have yet reached the point where they are so good that there is little
left to gain in optimisation - and we users always want smaller and
faster code.

One thing that makes a big difference is good use of compiler warnings.
 This can make a significant difference by helping spot when the
optimisations may change the meaning of the code - and there is always
scope for compilers to have better warnings.  For example, optimisations
from type-based alias analysis can surprise developers - gcc is able to
warn about many cases where there could be problems.

[...]

There are some cases in which the C standards are /too/ restrictive, and
compiler flags let the compiler break the standards in order to generate
better code.  For example, with floating point code you might expect the
compiler to be free to change "x / 2.5" into "x * 0.4" - but that kind
of thing would break the standards rules.  gcc has a "-ffast-math" flag
that lets the compiler do optimisations and make assumptions that are
fine for most programs, but are against the strict rules.

And often compilers will have specific flags to enable or disable
particular types of optimisations, to let you have control over what you
consider "dangerous" - as this will vary from user to user.  This is, of
course, entirely compiler-specific and is not part of the language
standards.  Reading your compiler manual is definitely recommended :-)


[...]

