Verify execution speed - "cycle counting" etc?

But this isn't very persistent, and if you get a 1-in-100,000 failure, you'll probably miss it.

--
Les Cargill

If the thing you're measuring takes about as much time as a port pin toggle, there's no point using any measurement technique. In those cases, you _can_ either prove your timing correctness by cycle counting with pen and paper, or you're toast.

And anyway, a job that's worth timing to that kind of precision must involve some kind of peripheral access. Because if it didn't, it wouldn't actually matter how long it takes, as it's all behind the scenes. The shortest time really worth measuring on an embedded system is that between two _external_ events. Framing these external events by an additional pair of external events (port pin toggles) won't make enough of a difference to invalidate the result unless you're cutting it _extremely_ close.

Or, to turn this issue around: if you need that kind of timing precision, why on earth did you do it on a serial processor instead of, say, an FPGA? Seemingly hard problems are sometimes just a result of having picked the wrong tool.

And you know exactly how many there are, so you can subtract the time taken by the port pin toggles. Since that doesn't generate any doubt about the actual result, there's no problem.
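A minimal sketch of that correction. The per-toggle cost here is an assumption; in practice you'd calibrate it by measuring an empty region framed by the same two toggles, or take it from the part's datasheet:

```c
#include <stdint.h>

/* Assumed cost of one port-pin toggle, in CPU cycles; calibrate this by
 * measuring an empty region framed by the same pair of toggles. */
#define TOGGLE_CYCLES 2u

/* Subtract the known framing overhead from a raw cycle measurement. */
uint32_t corrected_cycles(uint32_t measured, uint32_t n_toggles)
{
    return measured - n_toggles * TOGGLE_CYCLES;
}
```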

--
Hans-Bernhard Bröker

Huh? Do you think that priority-based preemptive scheduling plus some schedulability analysis (say, response-time analysis with deadline-monotonic priority assignment) is "broken" in some way?

If a system is designed along those lines, changing thread priorities can surely cause deadline misses, but IMO this does not mean that the code or the design/analysis method are "broken".
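For readers who haven't met it, the response-time analysis mentioned above can be sketched in a few lines. This is the classic fixed-point recurrence; tasks are assumed sorted by descending priority, and `C[]`/`T[]` (worst-case execution times and periods) plus the example values are illustrative assumptions:

```c
#include <stdint.h>
#include <stdbool.h>

/* Response-time analysis sketch: iterate
 *   R = C[i] + sum over higher-priority j of ceil(R/T[j]) * C[j]
 * until it converges or exceeds the deadline D. Tasks 0..i-1 have
 * higher priority than task i. */
bool rta_meets_deadline(int i, const uint32_t C[], const uint32_t T[],
                        uint32_t D)
{
    uint32_t R = C[i], prev = 0;
    while (R != prev && R <= D) {
        prev = R;
        R = C[i];
        for (int j = 0; j < i; j++)                   /* higher-priority tasks */
            R += ((prev + T[j] - 1) / T[j]) * C[j];   /* ceil(prev/T[j])*C[j] */
    }
    return R <= D;
}

/* Illustrative task set: WCETs 1,2,3 with periods 4,6,10. */
static const uint32_t EX_C[] = {1, 2, 3};
static const uint32_t EX_T[] = {4, 6, 10};
```

For this task set the lowest-priority task converges to a response time of 10, so it meets a deadline of 10 but misses 9.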

--
Niklas Holsti 
Tidorum Ltd 
niklas holsti tidorum fi 
      .      @       .

As long as the "doSomethingSlow" is compiled separately from the timing loop, the compiler can't reorganize those statements.

Any externally compiled dummy routine should prevent any optimization across calls.

Are you sure that usleep(0) does nothing?

At least on some cooperative or round-robin systems, an XYZsleep(0)-type function will run the scheduler, possibly activating the thread at the next clock tick.

--
upsidedown

doSomethingSlow();

AFAIK, if you declare startTime and slowTime as volatile that shouldn't happen -- but that's "shouldn't" in a moral sort of way, not in a "you can always expect this" sort of way. Apparently sometimes it does anyway.
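The pattern under discussion looks roughly like this. It's a self-contained sketch: in real code doSomethingSlow() would live in a separate translation unit and timer_reg would be a memory-mapped hardware timer; here both are stubbed so the example stands alone, and all names are assumptions:

```c
#include <stdint.h>

volatile uint32_t timer_reg;        /* stand-in for a hardware timer register */

void doSomethingSlow(void)          /* stub: pretend it burns 100 ticks */
{
    timer_reg += 100;
}

uint32_t timeIt(void)
{
    volatile uint32_t startTime, slowTime;
    startTime = timer_reg;
    doSomethingSlow();  /* the volatiles order the two reads, not this call */
    slowTime = timer_reg;
    return slowTime - startTime;
}
```

Note that the two volatile accesses are ordered with respect to each other, which is exactly the limited guarantee discussed in the rest of the thread.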

There's a paper floating around in the eWorld that gets very technical about what volatile's supposed to do, what compilers often fail to do under aggressive optimization, and ways that you can force correct behavior (sometimes at the cost of some execution speed).

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com

Compilers with link-time optimization (LTO) can reorganize code even then. AIUI, recent gcc versions have LTO. We should hope that LTO can be disabled with suitable compiler options.

--
Niklas Holsti 
Tidorum Ltd 
niklas holsti tidorum fi 
      .      @       .

Actually, not even the moral sort of way really works. The key issue is that making objects volatile constrains _only_ accesses to volatile objects. It creates no extra restrictions whatsoever on non-volatile objects. The compiler would be fully allowed, in any sort of way including the moral, to move that doSomethingSlow() before the assignment to startTime, after the one to slowTime, or even to split it into three separate parts that go before, between and after them. The only thing those volatiles really prohibit is moving the access to slowTime before the one to startTime.

Making stuff volatile, if it's actually needed, will _always_ cost some execution speed. That's part of the reason it exists.

--
Hans-Bernhard Bröker

Perhaps I should have said "more execution speed than is necessary to just do what it should".

Clearly the C language needs to allow you to mark a block as volatile, to say "execute everything inside this block after any volatile stuff before it and before any volatile stuff after it, but do whatever you want inside".

volatile { do_some_stuff; }

And it needs to be simpler, too -- it's just getting too complex.

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com

Absolutely not. Your dummy routine is a really good variation on what I'd said.

It will not be costless. It may cause a context switch - I'd count on it.

Right. And even with preemptive systems, it may realign you with the clock tick that's usually at 10 msec.

--
Les Cargill

Outside of an ... FPGA register that's updated asynchronously, I haven't trusted the "volatile" keyword for over a decade. There were failures, and rather than spending my employer's dollar on chasing them, I just changed direction.

Using objdump was enough. W. T. F?

--
Les Cargill

I mean that these things are inherently stochastic, and if you are betting the ranch based on what a 5"x4" screen tells you...

--
Les Cargill

Absolutely. I mean absolutely no offense, but having done that sort of thing for a long time, it finally dawned on me that it was mostly a waste of time. Add silicon; it's the only way.

We don't have to scale processing to purchase price any more.

The number of cases where this design choice is defensible gets smaller in my mind every year. I swear I cannot even remember why one would do that in the first place, and it's a common element in so many failure stories.

Granted, if a deadline miss doesn't *matter*, then by all means. But if it does, I'd replan and put specialist processors or FPGA on the job.

At least think about it. It may well be that you live in a domain where that sort of thing is inevitable. I have managed to stop doing that.

--
Les Cargill

Holy cow! So who watches the watchers?

--
Les Cargill

I suppose you mean to ask, in your quaint way, why one should trust these tools. Of course they should not be the only means of verifying real-time performance -- validation of the real behaviour in the real environment remains necessary for critical things -- but they add evidence that the code is fast enough even in very rare situations which are unlikely to happen in validation tests.

The static analysers can find performance problems early on, before the system hardware is running, and both kinds of tools are good at finding the most time-consuming parts of the SW, as an alternative to profiling.

--
Niklas Holsti 
Tidorum Ltd 
niklas holsti tidorum fi 
      .      @       .

Assuming that the silicon is used for parallel processing (FPGA or otherwise) that is an alternative, I agree. But not for everyone.

SW in space systems. Constrained size, mass, power, clock frequency. Radiation-tolerant, big-feature chips with internal triple modular redundancy.

Even if most or all of the high-rate, high-volume processing is delegated to FPGAs, the CPU or CPUs still have to do both slower background processing and rapid responses. Preemptive scheduling and priorities are still needed, IMO, at least in my field.

What sort of systems do you work on?

Are there many others in this group who share Les' view on priorities?

--
Niklas Holsti 
Tidorum Ltd 
niklas holsti tidorum fi 
      .      @       .

I have been working with priority based pre-emptive systems for just a few decades and I still think this is the right way to handle most things. Some things to check:

1.) Can you run the system with RT priorities? The nice thing about priority-based systems is that you can _lower_ the priority of non-critical tasks, i.e. you can "sacrifice" less critical tasks.

2.) Run (at least part of the functionality) in interrupt context and do the actual processing of the queue in normal context. Suitable for bursts with low average load.

3.) Run everything in FPGA with low average load.
--
upsidedown

You can instead set the scope to trigger when a pulse width is exceeded. That will catch any such event; there is no dead time. Of course that in itself does not prove anything, but it can catch your 1-in-100,000 failure.

--

John Devereux
[...]

Depends on the hardware counter, it might rollover at other values than 2^32.
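For a counter that wraps at an arbitrary modulus, the elapsed-tick computation has to handle the wrap explicitly rather than relying on unsigned 2^32 arithmetic. A sketch, assuming at most one rollover between the two readings:

```c
#include <stdint.h>

/* Elapsed ticks between two readings of a free-running counter that
 * rolls over at 'modulus' (which need not be a power of two).
 * Assumes the counter wrapped at most once between the readings. */
uint32_t elapsed_ticks(uint32_t start, uint32_t now, uint32_t modulus)
{
    if (now >= start)
        return now - start;
    return modulus - start + now;   /* counter wrapped once */
}
```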

One has to look at the code and use barriers if needed.

Oliver

--
Oliver Betz, Munich http://oliverbetz.de/
[...]

Again a generalisation I disagree with.

I do so.

Size, cost, power, peripherals. I don't know of an FPGA with an integrated oscillator precise enough for a UART, ADCs, and low power consumption, in a tiny package, for less than 1 USD.

Oliver

--
Oliver Betz, Munich http://oliverbetz.de/

Yes, that is exactly the point I was getting at (but letting people think a little before giving them the answer). "volatile" accesses are strictly ordered (assuming there are sequence points between them - there are no order guarantees if you do something like "foo(vol1, vol2)"). But volatile accesses have no ordering requirements with respect to non-volatile accesses or any calculations.

There are various ways to enforce - or at least, partially enforce - the desired behaviour. Some have been mentioned in this thread.

If the compiler does not know anything about "doSomethingSlow()", because it was separately compiled (and you are not using LTO or other whole-program optimisation), then the compiler has to assume it will contain volatile accesses, and it cannot do any re-ordering. You can achieve a similar "zero knowledge" effect by using function pointers, although gcc can sometimes optimise function pointers too. If the function pointer itself were volatile, you would be okay.

As long as the function doSomethingSlow() is compiled as a stand-alone function, rather than being inlined, then it is very unlikely that the compiler will re-order around it - but it /could/, if it wanted to.

Most compilers have some sort of "memory barrier" - there is no standard C solution. Sometimes this is done using intrinsics or compiler extensions. In gcc, it is done using inline assembly with a "memory" clobber: "asm volatile ("" ::: "memory"); ". This tells the compiler that memory may be read or written in an unexpected (volatile) manner. It does not force a complete ordering on code, but it forces an ordering on data in memory - no memory accesses can be re-ordered across a memory barrier, or "cached" across the barrier. So a memory barrier before and after the call to doSomethingSlow() will disallow the compiler from most re-ordering of code, and will usually have low impact on the speed.
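A self-contained sketch of that barrier pattern. The stub work() stands in for doSomethingSlow(), the macro name is an assumption, and this is gcc/clang-specific as described above:

```c
#include <stdint.h>

/* gcc-style memory barrier: tells the compiler memory may be read or
 * written behind its back, so no memory access may be re-ordered or
 * cached across this point. */
#define BARRIER() __asm__ __volatile__ ("" ::: "memory")

volatile uint32_t tick;            /* stand-in timer; bumped by the stub */
static uint32_t buf[4];

void work(void)                    /* stub standing in for doSomethingSlow() */
{
    for (int i = 0; i < 4; i++)
        buf[i] = (uint32_t)i;
    tick += 4;
}

uint32_t timed_work(void)
{
    uint32_t t0 = tick;
    BARRIER();                     /* no memory traffic moves above this line */
    work();
    BARRIER();                     /* ...or below this one */
    return tick - t0;
}
```

The barriers constrain only memory accesses, not pure register-to-register computation, which is usually exactly the trade-off you want for timing measurements.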

It would be nice if there were some C standard way to deal with this sort of thing, but there is none - and there is not likely to be any. C simply does not have any concept of the timing of events, and only considers the order of /observable/ events to be relevant (i.e., volatile accesses and calls to external code).

--
David Brown
