Verify execution speed - "cycle counting" etc?

A volatile function pointer really is a good way to force a function call where one wants it. New one to me, thanks for it!

This trick should defeat even LTO, assuming that the C compiler implements volatiles as it should. And assuming also that the LTO knows about and respects volatiles. Does anyone here happen to know if gcc LTO respects volatiles?

--
Niklas Holsti 
Tidorum Ltd 
Reply to
Niklas Holsti

True!

It's always a challenge.

--
Les Cargill
Reply to
Les Cargill

Ah. Yeah, I can see that then. Corner SWAP requirements like that will up the game.

I'd at least put priority-dependency in the "risk" column. If you can avoid it, it avoids the risk...

These days? Comms systems that act more as message switches than signal processors. This is more in an industrial setting. It's not all very hard-real-time, although there's usually one thread that runs on a tick to do sampling-based things like read A/D and all.

--
Les Cargill
Reply to
Les Cargill

It's not a *bad* thing; it is just hard(er) to verify. There are latent Black Swan/"fat tail" errors.

If you can, that's great. Generally, one goal has always been to make ISR code as short as possible...

--
Les Cargill
Reply to
Les Cargill

If you are concerned only about pieces of code containing some 100 instructions, you may be right. If that code has some conditional branches and a couple of loops, and a cache is enabled, these tools (at least the static-analysis tools) could be technically apt, but perhaps not cost-effective at such small scales.

Indeed, the Flash interface often has cache-like, wide buffers with prefetch and perhaps special buffering for repeated jumps (innermost loops). The tools I referenced handle those in the same way as they handle caches.

If DMA transfers are not synchronized with respect to the time-critical pieces of code, you must assume that all memory accesses in the critical code are delayed by the maximum DMA load (hopefully the bus arbiter guarantees some fixed worst-case access latency for CPU accesses even in that case). The static WCET analyzers can be configured to make such assumptions.
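As a back-of-the-envelope illustration of that pessimistic assumption (my own sketch, with invented parameter names - not something taken from the tools themselves): the isolated WCET bound is simply inflated by the worst-case arbitration delay for every bus access on the critical path.

    /* Minimal sketch, assuming the arbiter guarantees a fixed worst-case
       extra latency per CPU bus access under full DMA load. */
    unsigned long wcetWithDma(unsigned long wcetIsolatedCycles,
                              unsigned long numBusAccesses,
                              unsigned long maxExtraCyclesPerAccess)
    {
        return wcetIsolatedCycles + numBusAccesses * maxExtraCyclesPerAccess;
    }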

The static WCET analyzers use the binary code, not the source code, so variations in compiler output do not harm the analysis, but do make the new WCET bound for changed source code less predictable from the changes.

...

The static analyzer from AbsInt understands about delays like those in the Coldfire peripheral accesses, I believe. But if the peripheral registers are accessed through pointers, the points-to analysis may not be able to separate peripheral accesses from ordinary memory accesses, so the tool may have to make pessimistic assumptions.

--
Niklas Holsti 
Tidorum Ltd 
Reply to
Niklas Holsti

IF your WCET bounds and schedulability analysis are correct, they will reveal such errors or prove their absence, at least when static WCET analysis is used. (Hybrid measurement-based WCET analysis does have a "fat tail" problem, IMO, although the tool vendors claim that the risk can be made smaller than any HW failure risk. I'm not quite convinced about that.)

Such foreground-background systems are in effect preemptive systems with two priority levels. If there are nested, prioritized interrupts, they are equivalent to multi-priority preemptive tasking systems, although implemented with interrupt HW rather than SW scheduling.

Even in the absence of nested interrupts, if there are several interrupt sources, each with their own response deadline, IMO it is necessary to perform some sort of WCET analysis or measurement of the interrupt handler plus a response-time analysis of the worst-case scenario of simultaneous or closely spaced interrupt requests.
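For concreteness (my own sketch, not from the post): the classic fixed-priority response-time recurrence, R = C_i + sum over higher-priority sources j of ceil(R / T_j) * C_j, can be iterated to a fixed point. This assumes independent, sporadic interrupt sources with known minimum inter-arrival times and negligible blocking.

    #include <math.h>

    /* Returns a response-time bound for a handler with WCET Ci, given the
       WCETs Chp[] and minimum inter-arrival times Thp[] of the nHp
       higher-priority sources, or -1.0 if the deadline cannot be met. */
    double responseTimeBound(double Ci, double deadline,
                             const double *Chp, const double *Thp, int nHp)
    {
        double R = Ci;
        for (;;) {
            double next = Ci;
            for (int j = 0; j < nHp; j++)
                next += ceil(R / Thp[j]) * Chp[j];
            if (next > deadline)
                return -1.0;     /* interference pushes us past the deadline */
            if (next == R)
                return R;        /* fixed point reached */
            R = next;
        }
    }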

--
Niklas Holsti 
Tidorum Ltd 
Reply to
Niklas Holsti

Yes, but we can - and do - have more than just that. In a system where ISRs are as short as possible and preemptive scheduling kicks in only if cooperative scheduling fails to relinquish control, things can get pretty close to the limit the hardware allows. IRQ latency determines the overall system latency to external events which require the shortest response time (a few µs); the rest of the stuff (including file I/O, networking and other system things) can be queued and processed based on priority etc. That is how it is done under DPS :-).

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI

formatting link

------------------------------------------------------

formatting link

Reply to
dp

You're probably referring to "Volatiles Are Miscompiled, and What to Do about It," E. Eide and J. Regehr, Proceedings of the Eighth ACM and IEEE International Conference on Embedded Software (EMSOFT). It's been out for a few years so, one hopes, at least some of the issues addressed have been corrected.

Reply to
Rich Webb

... that's the Big If here... if you can go with those confidently, then you're good.

Yep - although they're not necessarily preemptive in the sense of "preemptive multitasking" - having the timer tick do a context switch.

HW scheduling doesn't help for hopelessly oversubscribed CPUs....

Sadly, what to actually do about these issues tends to be highly domain-specific.

--
Les Cargill
Reply to
Les Cargill

... and that's another class of latent defect, IMO. You need carefully calibrated data sets to stimulate the known worst-case execution paths, or to define carefully what can be dropped and account for it rigorously.

What is DPS?

--
Les Cargill
Reply to
Les Cargill

One of the authors has a blog post from 2011 about it. The paper itself is linked from the blog post, if anyone wants to see it.

Blog post:

formatting link

Quote:

"...in retrospect, I believe volatile has proved more trouble than it's worth, and that C/C++ would be better off without it. The alternative is to use an explicit function call to access variables that live in special kinds of memory; these calls need not have high overhead since they can be inlined."

Reply to
Paul Rubin

gcc should respect volatile accesses regardless of LTO.

Of course, the compiler is free to take something like this:

extern void doSomethingSlow(void);

typedef void (*FVoid)(void);
volatile FVoid indirectDoSomethingSlow = &doSomethingSlow;

void foolTheCompiler(void)
{
    startTiming();
    indirectDoSomethingSlow();
    stopTiming();
}

and turn it into this, with everything inlined or as extra small functions - whatever the compiler prefers:

void foolTheCompiler(void)
{
    // volatile bits of startTiming()...
    FVoid fp = indirectDoSomethingSlow;
    if (fp == &doSomethingSlow) {
        // some non-volatile bits of stopTiming()...
        // inline some of doSomethingSlow mixed in here...
        // rest of startTiming()...
        // volatile bits of stopTiming()...
        // the rest of doSomethingSlow()...
        // any left-over bits...
    } else {
        // External function call is a full barrier
        fp();
        // External function call is a full barrier
        stopTiming();
    }
}

It is unlikely that the compiler /would/ generate something like that - but it would be legal. Trying to "defeat" the compiler is seldom a good idea - though it can occasionally be useful to know tricks like this. It is usually better to use things like memory barriers, which are working /with/ the compiler instead of /against/ it.
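For instance (gcc/clang-specific, and only a sketch reusing the function names from the example above): an empty asm with a "memory" clobber acts as a compiler-level barrier, so the compiler may not migrate memory accesses across the timed region (it constrains the compiler only, not the CPU).

    extern void startTiming(void);
    extern void stopTiming(void);
    extern void doSomethingSlow(void);

    /* Compiler barrier: emits no instructions, but the compiler must not
       move memory accesses across it. */
    #define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

    void timeIt(void)
    {
        startTiming();
        COMPILER_BARRIER();
        doSomethingSlow();
        COMPILER_BARRIER();
        stopTiming();
    }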

And don't forget that once you have "beaten" the compiler, the processor could also re-arrange things if it is an out-of-order or superscalar processor...

Reply to
David Brown

This is clearly written by someone who does not understand one of the basic premises of C - the compiler must generate code that gives the same observable results as if it were run on the "C virtual machine". These "observable results" are calls to /external/ functions, data passed back and forth to these functions, and volatile accesses. So wrapping your data in accessor functions written in C without "volatile", and then inlining them, gives absolutely /nothing/.

The paper and its authors are correct that "volatile" is not as well defined as it could be (and C++ makes it slightly worse than C), and that compilers are poor at complicated cases. But the blog post is not correct that "volatile" is "more trouble than it is worth" - and without an alternative standardised solution, it is the best we have at the moment.

Torvalds' "rants" about volatile are certainly worth reading - they make clear some of the misconceptions about "volatile" (that it can be used to mean "atomic", or in place of locking mechanisms in multi-threaded code). And Linux provides an alternative mechanism using accessor functions - but critically, these accessor functions either use "volatile" or inline assembly in their implementation.
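For illustration only (a minimal sketch in the spirit of Linux's ACCESS_ONCE/READ_ONCE, not the actual kernel source): such an accessor can be inlined precisely because its body still performs a volatile access.

    #include <stdint.h>

    /* The cast to a volatile-qualified pointer is what forces the load to
       happen here, even after the function has been inlined away. */
    static inline uint32_t read_once_u32(const uint32_t *p)
    {
        return *(const volatile uint32_t *)p;
    }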

There are three key issues to using "volatile" safely:

  1. Use at most one read and one write volatile access in any one statement (or between two sequence points). "x = vol1;" or "vol1 = vol2;" is good, but "foo(vol1, vol2);" or "x = vol1 + vol2;" is bad. Stick to this rule, and you can be confident that the compiler will not generate incorrect code - and just as importantly, you can be sure that /you/ will not get confused or make incorrect assumptions about ordering.

  2. Remember that "volatile" applies to /accesses/, not to data. It is not the variable that is "volatile", it is the /reading/ of the variable or the /writing/ of the variable that is volatile. Personally, I often use this macro to emphasise this, and to help optimise code (a short usage sketch follows this list):

#define volatileAccess(v) *((volatile typeof((v)) *) &(v))

  3. Remember the limitations of volatile. All it says is that the compiler must do the read or write operation at this point, and that the compiler must preserve the ordering of the volatile operations. It does not say anything about the ordering with respect to other code. It does not say anything about atomic access to the data, and does not provide any locking or synchronisation with other threads (or interrupts). It does not say anything about timing, about cache access, bus locking, write buffering, cpu instruction re-ordering, or anything else related to the hardware of the system. It does not give any guarantees about read/write sizes (compilers can give that guarantee, but the C standards do not) - this can be particularly surprising with bitfields. Use volatile bitfields with caution.
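As a usage illustration of the macro from point 2 (my own sketch; the variable and function names are invented): forcing a real re-read of an ordinary variable that an ISR updates.

    #include <stdint.h>

    #define volatileAccess(v) *((volatile typeof((v)) *) &(v))

    static uint8_t rxReady;            /* set from an ISR, polled below */

    void waitForByte(void)
    {
        /* Each iteration performs a genuine volatile read of rxReady, so
           the compiler cannot hoist the load out of the loop. */
        while (volatileAccess(rxReady) == 0) {
            /* spin, or sleep/yield here */
        }
    }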
Reply to
David Brown

On Sunday, February 23, 2014 9:56:57 PM UTC+2, Les Cargill wrote:

Well, it depends on what you do. IRQ latency is easy to measure/prove, given that the underlying OS guarantees interrupts stay masked for only a small maximum number of cycles (well below 1 µs at a 400 MHz power core in my example). So the IRQ latency is as long as your longest IRQ handler - it is up to the application programmer to ensure that. The remaining latencies can be in the order of ms or tens of ms; they are dealt with at user level by tasks with different priorities (writing them wisely, so that task switches occur cooperatively rather than waiting to be forced out preemptively, minimizes lost CPU resources).

The example I can give is our netmca-3 device. It manages to process an input stream of a few MSPS of 14-bit ADC data with no lost samples, detect "events" in the data stream, and filter the events (doing plenty of 64-bit FP MACs); all this can take anything between 10% and 80% of the CPU time, depending on the density of the events. The remaining CPU time (20% in the worst case) is sufficient to maintain networking (TCP/IP over 100 Mbps Ethernet), 1-2 VNC links (including "screen" change detection and compression), do disk I/O when needed, and probably a few things I can't think of now (e.g. displaying, maintaining windows, refreshing "display" memory from off-screen buffers etc.). And (though not used on this particular device but on a very similar one) the CPU deals with the 2 PS2 connectors, 1 interrupt per PS2 clock, without ever missing a byte (it has plenty of margin really). (All this on an MPC5200B processor.)

DPS is the OS under which all of the above runs. Come to think of it, not many an OS can claim all of that in one place :-).

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI

formatting link

------------------------------------------------------

formatting link

Reply to
dp

Let me ask a "meta" type question.

_Why_ should you or I need to know tricks like this - even though, with modern compilers, we clearly do?

Or to put it another way, have compiler optimisers gone too far, and is the quest for fractional improvements in code generation coming at the expense of reliability in the generated code?

There are some safe optimisations, such as looking at a function as a whole instead of line by line (so the code generator can pull commonly accessed variables into registers at the start of the function), which no-one would disagree with.

You also need to tell the compiler if a memory location can be changed from outside of the code (i.e. a hardware register mapping), which no-one would disagree with having to do either.
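i.e. the familiar volatile-qualified register declaration - a minimal sketch, with the address invented for illustration:

    #include <stdint.h>

    /* Hypothetical memory-mapped status register (address made up).  The
       volatile qualifier tells the compiler that every access must really
       be performed, because the hardware can change the value at any time. */
    #define STATUS_REG (*(volatile uint32_t *)0x40021000u)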

However, it feels like there is an increasing set of rules and aggressive optimisations just so someone can be a few percent faster (if even that), which comes at the expense of robustly generated code without any surprises in it.

I wonder if it's time to take a step back and ask if compiler optimisation has become too aggressive?

Should we have some explicit "--enable-dangerous-optimisations" option which doesn't get turned on with any of the normal optimisation levels until requested, and whose side effects are clearly documented in the manual?

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP 
Microsoft: Bringing you 1980s technology to a 21st century world
Reply to
Simon Clubley

Simon Clubley wrote on 24-Feb-14 3:13 PM:

The problem is that users have grown to depend on the compiler NOT performing optimizations that the compiler has always been allowed to do. In other words: users could get away with code that was, strictly speaking, invalid or undefined or erroneous (or whatever the language definition calls it).

Wouter van Ooijen

Reply to
Wouter van Ooijen

You raised a bunch of points.

1) Is compiler optimization being too aggressive? Yes and no. Most of the embedded-systems targets that we compile for require tight, fast, low-power code. On some other systems the optimization may not be as critical. What is a problem in many compilers is having too many optimization options, and it is rare that they are tested in combination.

2) The best way a programmer can help the compiler is to write clear clean source code.

3) The best way the compiler vendor can help the application developer is to focus the optimization on the requirements of the intended applications.

For example, applications intended for small-scale process control and data logging would benefit more from the compiler developer's time being spent on optimizing the handling of polynomials than on many other potential optimizations.

4) Volatile variables, as has been noted in this thread, are both badly handled and extremely difficult to handle. As an example, some volatile variables have hardware side effects, and referencing the volatile out of order in some rare (but significant) cases results in the side effects happening somewhere other than where they were intended. One case, for example, was where a reference cleared a buffer flag in hardware, and a second reference then dropped a byte of data. Solvable, sure; nasty to find, you bet.
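A hypothetical illustration of that class of bug (register names and addresses invented, not Walter's actual case): a status register with a clear-on-read side effect, where an extra or reordered volatile read silently loses data.

    #include <stdint.h>

    /* Hypothetical UART-style peripheral: reading STATUS clears the RX_READY
       flag as a hardware side effect.  Addresses are made up. */
    #define UART_STATUS (*(volatile uint8_t *)0x40001000u)
    #define UART_DATA   (*(volatile uint8_t *)0x40001004u)
    #define RX_READY    0x01u

    int tryReadByte(uint8_t *out)
    {
        if (UART_STATUS & RX_READY) {  /* this read clears RX_READY */
            *out = UART_DATA;
            return 1;
        }
        return 0;   /* a second read of UART_STATUS here - or one inserted by
                       reordered code - would discard notification of a byte
                       that arrived between the two reads */
    }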

Walter Banks
Byte Craft Limited

Reply to
Walter Banks

Let me respond first with another "meta" question - /do/ we need to know this with modern compilers? The answer here, I think, is that at least some of us need to be aware of this sort of thing - even though much of what I wrote was theoretical rather than real-life compiler behaviour. I certainly have seen a development in compilers over the years that has "broken" code that "worked" with previous compilers, because the programmer did not understand what the compiler can do when generating optimised code.

As an aside here, it is important to note that people who think this sort of re-ordering of code only applies when they choose "optimisation" flags are /wrong/. A compiler can legally (in the sense of "C standards" legally) generate as good or bad optimised code as it wants, with a total disregard for optimisation settings. Such settings are just a hint to the compiler - they are never a command.

For the next point, let me misquote and paraphrase Linus Torvalds a bit when he says that "volatile is evil". There are good reasons why many developers should /not/ know about volatile and its complications. When you are using an OS of some sort (an RTOS, embedded Linux, or whatever), then generally you should /not/ be using "volatile" yourself. You might be using volatile registers declared in device header files, and you might use it indirectly (as the source for the OS almost certainly will use it somewhere). But in most cases where you might think of using "volatile" for your own data, you would be far safer using constructs from your OS (such as locks, shared data, atomic data, etc.). When you are using an OS, "volatile" is rarely sufficient or necessary.

Of course, when you are working "close to the metal", or implementing such OS features, then you need to know all about volatile - in all its gory details. But volatile alone is not enough - you also need to know about the cpu details, such as cache coherency, synchronisation instructions, etc.

Also note that my discussion earlier concerns what the C standards say you can rely on regarding ordering, volatiles, etc. Any given compiler is free to make additional promises to make the developers' lives easier. As an example, C++ has looser specifications for "volatile" than C has:

extern volatile int vol;

void foo(void)
{
    vol;
}

is guaranteed to read "vol" in C, but not in C++ (it /may/ read it, but it is not required). gcc states in its manual that this /will/ always cause a read in C++ - it guarantees that volatiles in C++ will work as they do in C. (In this case, you would be better writing "(void) vol;", which is guaranteed to read vol by both the C and C++ standards, and will also avoid spurious warnings about statements with no effect.)

What this means is that programmers should learn to understand how much they need to know, and use methods that are appropriate for their usage. If they are using an OS, they need to learn the right methods for that OS. If they are using bare-metal, they need to understand how their compiler works (and if it gives additional help), what sort of code is generated, and also how their cpu works and how /that/ affects the working code. Understanding the C standards behaviour is useful - but it might not be necessary, and it is certainly not sufficient.

No, I don't think so - smaller and faster object code is always a nice thing. It is important to be able to limit the optimisations somewhat as an aid to debugging and development, but compilers should be able to generate as optimal code as they are able.

However, I think the C (and C++) standards could benefit from more specifics regarding some types of behaviour, and perhaps with the addition of features targeted at embedded developers who need tighter control. (The addition of atomics in C11 and C++11 is one step towards this.) And I think compilers could benefit from extending beyond what the standards say, if the standards don't do a good enough job here - for example, memory barriers of various types could be compiler intrinsics instead of needing inline assembly (or at least, they could be given standard names and put in standard headers).
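C11 did in fact standardise some of this in <stdatomic.h>; a hedged sketch (reusing the earlier function names) of what the inline-assembly barrier can be replaced with:

    #include <stdatomic.h>

    extern void startTiming(void);
    extern void stopTiming(void);
    extern void doSomethingSlow(void);

    void timedRegion(void)
    {
        startTiming();
        /* compiler-only fence: in practice a compiler barrier, no instructions */
        atomic_signal_fence(memory_order_seq_cst);
        doSomethingSlow();
        /* full fence: also emits a hardware barrier where the CPU needs one */
        atomic_thread_fence(memory_order_seq_cst);
        stopTiming();
    }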

I have seen "working" code that breaks when this sort of optimisation is used. /I/ agree that this sort of thing is "safe" optimisation, but I know that others may disagree.

I have also seen people write code that /assumes/ all reads and writes of memory-mapped registers and other data are automatically "volatile". I have seen compilers with command-line switches to make /all/ non-register data work in this way.

In other words, I agree with you here - but I disagree about who you think might disagree. (We are having a meta-discussion here....)

Some people think the compiler should blindly translate code line for line, and not try to be "smart". There may be a use for such a compiler, and it is certainly possible to make a standards-compliant C compiler that acted that way - but C does not require a compiler to behave like that.

I can certainly agree that it can be a problem when compilers surprise the developers with their optimisations. But I don't think compilers have yet reached the point where they are so good that there is little left to gain in optimisation - and we users always want smaller and faster code.

One thing that makes a big difference is good use of compiler warnings. This can make a significant difference by helping spot when the optimisations may change the meaning of the code - and there is always scope for compilers to have better warnings. For example, optimisations from type-based alias analysis can surprise developers - gcc is able to warn about many cases where there could be problems.

There are some cases in which the C standards are /too/ restrictive, and compiler flags let the compiler break the standards in order to generate better code. For example, with floating point code you might expect the compiler to be free to change "x / 2.5" into "x * 0.4" - but that kind of thing would break the standards rules. gcc has a "-ffast-math" flag that lets the compiler do optimisations and make assumptions that are fine for most programs, but are against the strict rules.
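A tiny illustration (mine, not from the post) of why the standard is strict here: floating-point addition is not associative, so reassociating it can change the result.

    #include <stdio.h>

    int main(void)
    {
        double a = 1e16, b = -1e16, c = 1.0;
        /* Under strict IEEE semantics these differ; -ffast-math style
           reassociation would treat them as the same expression. */
        printf("%g %g\n", (a + b) + c, a + (b + c));   /* prints "1 0" */
        return 0;
    }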

And often compilers will have specific flags to enable or disable particular types of optimisations, to let you have control over what you consider "dangerous" - as this will vary from user to user. This is, of course, entirely compiler-specific and is not part of the language standards. Reading your compiler manual is definitely recommended :-)

Reply to
David Brown

If you read some more of that guy's blog, it should become clear that he understands the premises and fine points of C better than basically anyone else.

I read it as more like a link-time inlining (though done by the compiler). I.e. the observable semantics of an external function call would be preserved. See

formatting link
for more info.

Nice.

formatting link
has more discussion.

Reply to
Paul Rubin

That's a very good point and one to keep in mind.

It's things like this, when you need to read the compiler manual to be sure, which I really dislike. :-) If caught by this, I would determine what was required to make sure the line was always executed, make sure I always used it in my code from now on, and hence make the choice of compiler a non-issue.

Ouch. Provided such code declared things like volatile variables correctly, I would consider that broken code.

My wording above is confusing, sorry.

What I meant to say was that I cannot see why anyone would disagree with having to tell the compiler when a memory location can be changed from outside of the code.

If you are not willing to tell the compiler that, then it has to assume _all_ memory references are volatile and hence the generated code would be hopelessly inefficient.

If anyone reading this disagrees with that, then try turning on the gcc optimisation option which optimises at line level only, not function level. I did that recently for an unrelated reason on an ARM MCU and the result was _very_ illuminating. :-)

(And yes, I _know_ line level optimisation is not the same thing as treating all variables as volatile. :-) It's just an easy way of showing how quickly the code size increases without function-wide variable optimisation.)

I think you have misunderstood what I was trying to say due to my poor wording. Given how safe optimisation is when optimising variables at function level (provided you tell the compiler which variables cannot be optimised in that way), I am inclined to think any code needing the above would be broken.

I think "dangerous" for me is whether you can reasonably determine the behaviour of the code by examining just the source code, or whether you also need to read the compiler manual in order to see how it, in general, or at the specific optimisation settings in use, can modify the code semantics.

Of course, I am also the type of person who actively tries to write simpler code; I have _no_ macho desire to use every trick I know in my code and in the end I consider a macho approach in programming to be self-defeating. That approach does also mean I tend to trip up less frequently over optimisation issues than other people seem to do.

Thanks for the comments David, they were rather interesting.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP 
Microsoft: Bringing you 1980s technology to a 21st century world
Reply to
Simon Clubley
