Verify execution speed - "cycle counting" etc?

Hi all,

triggered by the recent "cycle counting" threads, I would like to know how you verify "is my code fast enough"?

Although you can't predict the exact timing in modern controllers, and you use HW timers where exact timing matters, there are many situations where you simply need to be sure that a piece of code has finished before a timer expires, an ADC has to be read, etc.

Do you insert delays in the code until it breaks, analyse the assembler output, avoid C entirely, use a logic analyzer, some combination...?

Oliver

--
Oliver Betz, Munich 
despammed.com is broken, use Reply-To:
Reply to
Oliver Betz

Don't forget to consider the effects of worst-case I/D cache misses. In particular, consider the ratio of longest:mean and longest:mode times.

I'm assuming you're already considering the effects that any RTOS scheduling and synchronisation will introduce, particularly if there is priority inversion.

Reply to
Tom Gardner

Oliver Betz schreef op 20-Feb-14 1:03 PM:

When timing really, really matters, you don't want things that sometimes speed things up, like caches. Microcontrollers can have a reasonably predictable speed.

Wouter van Ooijen

Reply to
Wouter van Ooijen

Well, this is somewhat like curing a runny nose by decapitation :D .

There are plenty of sufficiently complex tasks needing large systems where caches *always* speed things up quite significantly, so the worst-case timing guarantee must be addressed somehow. I am pretty sure the OP asks about these cases; he has more than enough experience with the small MCUs of the past (I remember encountering him on the HC11 mailing list some 15 years ago).

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI

formatting link


Reply to
dp

You could use HW timers on entry/exit of the piece of code under investigation. But what I usually do is toggle an output on entry/exit and connect a scope. This makes the timing very visible, and if you have a scope that can trigger on pulse length > time, you can verify the code never takes more than 'time' by checking that it does not trigger.
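A minimal sketch of that pin-toggle instrumentation (the "port register" here is a plain variable so the sketch compiles anywhere; on a real target it would be the GPIO set/clear register of your part, and the loop stands in for the code under test):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a memory-mapped GPIO output register; on a real target
   this would be the port's set/clear register. */
volatile uint32_t debug_port;
volatile uint32_t work;          /* placeholder for the code under test */

#define DEBUG_PIN        (1u << 5)
#define DEBUG_PIN_SET()  (debug_port |= DEBUG_PIN)   /* rising edge: pulse start */
#define DEBUG_PIN_CLR()  (debug_port &= ~DEBUG_PIN)  /* falling edge: pulse end  */

void time_critical_section(void)
{
    DEBUG_PIN_SET();             /* scope triggers on this edge         */
    for (int i = 0; i < 100; i++)
        work += i;               /* ... the code being timed ...        */
    DEBUG_PIN_CLR();             /* pulse width == execution time       */
}
```

On the scope, set a pulse-width trigger on width > budget; if it never fires during a long soak run, the section stayed within its time budget for every execution that occurred.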

--
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail) 

Patience is a minor form of despair, disguised as virtue. 
Reply to
Stef

Worst-Case Execution-Time (WCET) analyzers are available for many controllers, both tools using fully static analysis (including cache analysis) and tools using a combination of measurements (for the execution times of basic blocks) and static analysis (for adding up the WCET over the possible execution paths).

See

formatting link
(static analysis) and
formatting link
(measurement-based, hybrid analysis).

--
Niklas Holsti 
Tidorum Ltd 
Reply to
Niklas Holsti

That's what I meant by "you can't predict the exact timing in modern controllers".

I'm even considering the delay introduced by external hardware.

My question targeted small-scale timing, most of the time not tolerating interrupts at all.

And it was really a question about the methods used by c.a.e. members.

Oliver

--
Oliver Betz, Munich http://oliverbetz.de/
Reply to
Oliver Betz
[...]

exactly.

In the HC11 and the HC12/9S12 times, things were easy. Even the debugger's background access didn't introduce a delay on the *12, and since there were no Flash wait states, the code position was irrelevant. From looking at the assembler code, you could count the cycles or delay the code until it breaks to determine the margin.

With newer controllers, there are _many_ factors to consider. Even simple derivatives might read flash in blocks, so you get something like 2-1-1-1 wait states and the execution speed becomes position-dependent. DMA tasks and the debugger can introduce "random" delays, and so on.

Oliver

--
Oliver Betz, Munich http://oliverbetz.de/
Reply to
Oliver Betz
[...]

I have been doing so for a long time. It's OK for a coarse estimation, but you have to consider the effects of the instrumentation code.

that's what I meant by "logic analyzer".

But as above, the instrumentation slows down the code.

Oliver

--
Oliver Betz, Munich http://oliverbetz.de/
Reply to
Oliver Betz

[...]

Thanks, Niklas! This is new to me, and I will have a look at it, although I'm afraid it's oversized for my needs.

Do you have experience with these tools?

Oliver

--
Oliver Betz, Munich http://oliverbetz.de/
Reply to
Oliver Betz

These tools are a bit expensive, yes. But perhaps negotiable...

I have used the AbsInt tool; it worked, but the application had very complex control flow (autogenerated code from Rhapsody, consisting of interacting state machines with event queues -- yuck) and it was difficult to avoid overestimation of the WCET for this reason. The tool should work much better for manually written code with a cleaner, application-oriented control flow.

I have worked with the RapiTime tool, but I have not used it for any real application. Its analysis method makes it more easily portable to any target architecture, while the AbsInt tool requires a serious (and usually expensive) porting effort for new architectures.

RapiTime uses measurements, which means that you must have a test suite that can be measured, while the AbsInt tool is pure static analysis and needs neither a test suite nor a measurement system. For complex processors with variable execution times RapiTime is theoretically unsafe -- its WCET estimate is not guaranteed to be an upper bound on the real WCET -- but with a good test suite the risk of an actual execution ever exceeding the WCET estimate can be made very small, they claim.

I've been working as a tool developer myself in this area for a goodly time, but my tool

formatting link
is aimed at predictable, cache-less controllers, so it wouldn't suit you. It's cheap, however :-)

If you have ACM Digital Library access, there is a survey paper on WCET analysis at

formatting link
which explains the methods and lists several other tools. Few of them are commercially available, though.

--
Niklas Holsti 
Tidorum Ltd 
Reply to
Niklas Holsti

Of course it does. But if even what little instrumentation it takes to pull up a single port pin and put it down again makes a difference big enough to cast doubt on the validity of the result, that fact on its own almost certainly fixes the answer: that CPU isn't up to the task.

Ultimately, the necessary procedure boils down to a typical system of technological escalation:

0) Pick a measurement / instrumentation method.
1) Implement that method and measure long and hard.
2) Check the results for timing requirement violations.
2.1) If none are found, you're done; if the margin is big enough, you might just leave the instrumentation in. END.
2.2) So there are infractions. Check how big they are:
2.2.1) If they're big, or happen on too regular a basis, you have to give up: this CPU can't do this algorithm quickly enough. Time for a re-design, possibly from the ground up. END.
2.2.2) You have an undecided result. This method wasn't good enough.
2.2.2.1) If you know a better method, pick that and go back to 1).
2.2.2.2) Otherwise, find someone who does and go back to 1).
2.2.2.3) So this CPU may be able to do this algorithm fast enough, or it may not be, but nobody knows how to prove it either way. You can't justify sticking to this pairing of hardware and algorithm. Time to re-design. END.
Reply to
Hans-Bernhard Bröker

Oh, well, if _that's_ all you want:

I measure the timing on an O-scope, and hope that it stays correct in the field.

But -- I don't do life critical stuff. If I did, I'd probably wish for a "run as slow as you ever will" switch for the processor, which I would turn on, then do the above measurements for the portions of the code that were, indeed, life critical.

--
Tim Wescott 
Wescott Design Services 
Reply to
Tim Wescott
[...]

Yes, a widespread method.

I also don't. But since my (industrial) products are usually a small cog in a big wheel, the consequential cost *) of my errors can be very high compared to the product cost.

Oliver

*) What is the original term used by Philip Crosby in "Quality Is Free" for the cost of a "deviation"? The German translation used "Preis der Abweichung".


--
Oliver Betz, Munich http://oliverbetz.de/
Reply to
Oliver Betz
[...]

Money is only a fraction of the cost, and, as you write, using these tools is "not simple". I think they are by far too big for my problems.

[...]

I use "cache-less" controllers. What you also get with small controllers these days is some kind of burst (Flash) memory access, so the execution speed can differ slightly depending on the exact placement of the code. And even with interrupts disabled, there might be a DMA transfer running in the background.

IMO more difficult to handle is the variability of compiler output.

As mentioned, the Coldfire has a severe penalty on writing to peripherals consecutively (4 cycles for a single write, 12 cycles if the previous one isn't finished). And GCC delays all volatile accesses until the latest possible time, so you will get consecutive accesses. Without these delayed writes, normal code and slow accesses would be interleaved (using barriers is a mitigation, but a somewhat ugly one).
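A sketch of that barrier workaround (GCC-style inline assembly; the port variables are stand-ins for real peripheral registers, and whether the interleaving actually improves depends on the compiler version):

```c
#include <assert.h>
#include <stdint.h>

/* GCC compiler barrier: the "memory" clobber forbids moving memory
   accesses across it, so a pending volatile write cannot be postponed
   past this point to sit back-to-back with the next one. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Stand-ins for slow memory-mapped peripheral registers. */
volatile uint32_t port_a, port_b;

void write_two_ports(uint32_t a, uint32_t b)
{
    port_a = a;
    COMPILER_BARRIER();  /* issue the first write now, so ordinary
                            computation can overlap the slow bus cycle
                            instead of the two writes landing adjacent */
    port_b = b;
}
```

The barrier costs no instructions itself; it only constrains where the compiler may place the accesses around it.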

Well, I don't use the Coldfire anymore. But the problem exists anyway: If you use compiled code for time critical things, you need a thorough verification strategy, since a small change in the code or toolchain can change the results fundamentally.

I will have a look at your tools and maybe the article you mentioned, thanks for the hint!

Oliver

--
Oliver Betz, Munich http://oliverbetz.de/
Reply to
Oliver Betz

I disagree with this generalisation. Most of the time it is true, but think about code where only a few instructions lie between the time-critical accesses.

Consider also that there are controllers where writing to port pins is extremely slow: some time ago, I worked with Freescale Coldfire controllers, which took 12 (!) cycles for (consecutive) write accesses to port registers.

I often build things with restricted resources (space, power, money), controlling hardware in the microsecond range, so using a controller fast enough to run with "logic analyser instrumentation" would be a waste of resources or simply impossible.

Anyhow, this is not the main point of my original posting - "how you verify", "insert delays ... analyse assembler ... logic analyzer".

Oliver

--
Oliver Betz, Munich http://oliverbetz.de/
Reply to
Oliver Betz

That's always a useful method. Typically you have code like this:

{
    turnOnTestPin();
    doSomethingSlow();
    turnOffTestPin();
}

Then you can see the timing.

To be a little more sophisticated, I often have something like this:

uint32_t maxTime = 0;

{
    uint32_t startTime = getHardwareMicrosecondsCount();
    doSomethingSlow();
    uint32_t slowTime = getHardwareMicrosecondsCount() - startTime;
    if (slowTime > maxTime)
        maxTime = slowTime;
}

(Quiz for C language purists - what happens when the hardware counter rolls over?)

That way it's easy to track the longest time needed by the slow code.
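One way to convince yourself of the rollover behaviour (a sketch, not part of the original code): unsigned subtraction is reduced modulo 2^32, so a single wrap of the counter between the two reads cancels out.

```c
#include <assert.h>
#include <stdint.h>

/* With uint32_t, (now - start) is computed modulo 2^32, so if the
   free-running counter wraps once between the two reads, the elapsed
   time is still correct. (A second wrap in the same interval would be
   lost -- the counter must be wide enough for the longest section.) */
uint32_t elapsed_us(uint32_t start, uint32_t now)
{
    return now - start;   /* well-defined wraparound, unlike signed */
}
```

So the pattern above keeps working across a rollover, as long as doSomethingSlow() never runs longer than one full counter period.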

One thing to be aware of is that the compiler may re-order your code to put some of the "doSomethingSlow" before or after your timers or test pin settings...

Reply to
David Brown

Scopes and logic analyzers are fine, but they may not have enough buffer depth.

Hopefully, you can detect and report "took too long" errors from within the driver or application. If not, you might need to have datasets that help generate detectable errors and run those in regression test.

For longer time scales, where it is applicable, I use the stuff in sys/time.h

And in some cases, I've simply added counters that could be pulled over a serial port or a Telnet connection. The counters must be arranged such that you can tell a "miss" happened without the serial or Telnet link being realtime: if there's an A/D interrupt, increment one counter; when the ISR services it, another; and in the middleware (which blocks on the semaphore set by the ISR), another.

I have also used a hardware timer set with a budget that fires an interrupt that increments an error counter if the critical section doesn't cancel it in time.
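A host-side sketch of that budget-timer pattern (all names invented for illustration; on a real part the timer is a hardware one-shot armed with the time budget, and this function models its expiry ISR):

```c
#include <assert.h>
#include <stdint.h>

volatile uint32_t deadline_misses;  /* pulled later over serial/Telnet */
volatile int      timer_armed;

/* Fires only if the budget expires before the critical section
   cancels the timer: each firing records one missed deadline. */
void budget_timer_expired_isr(void)
{
    if (timer_armed) {
        deadline_misses++;
        timer_armed = 0;
    }
}

void critical_section_enter(void) { timer_armed = 1; }  /* arm one-shot   */
void critical_section_leave(void) { timer_armed = 0; }  /* cancel in time */
```

The counter is monotonic, so a non-realtime link can read it whenever convenient and still see that a miss happened.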

Frankly, the really time-critical stuff tends to get done in an FPGA these days. We'll have a few soft-realtime -- say 10 msec -- polled things, but that's about it.

Profilers can help, or not. I never did trust them that much. There are lies, damned lies and models :)

Somebody mentioned "priority inversion" -- if you can break your code by changing priorities, it's already broken, IMO. If "misses" are just unfortunate and not critical, then it's not broken per se.

--
Les Cargill
Reply to
Les Cargill

It depends, and you need to check for it. There are two possible rollovers, and they'll have different effects. At least the quantities are unsigned, so the subtraction wraps predictably.

I'd put "fake" rollovers in to test all three or four cases.

It's a horrible way to do it, but a usleep(0) before (and maybe after) the doSomethingSlow() will put a stop to that.

--
Les Cargill
Reply to
Les Cargill

When I'm doing that I just trigger on the rising edge, and set the scope to infinite persistence. The DSO is a godsend; the fact that good ones have become cheap doubly so.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com 
Email address domain is currently out of order.  See above to fix.
Reply to
Rob Gaddi
