Verify execution speed - "cycle counting" etc?

[snip]

Whew, for a moment there I thought you were inserting delays to fix the problem.

Honestly I really haven't done such timings on modern multilevel cache machines. Where we find such timing to be critical we are finding inexpensive solutions such as Arduino boards to localize the hard real-time issues.

Reply to
Ed Prochak

One thing I did recently on a project is write the initialisation code so the CPU frequency could be set separately from peripheral clocks. So you can run the system at 1/2 CPU speed, say, but have everything work normally otherwise.

--

John Devereux
Reply to
John Devereux

I understood Regehr's suggestion differently, see below.

...

I understood Regehr's suggestion for "explicit function calls" to mean that the C standard would provide such library functions, which are defined in the standard to work as volatile accessors, just as your volatileAccess macro. (It seems that Regehr believes that this approach would make it easier for compilers to implement volatility correctly. Maybe, but I'm doubtful that such an essentially syntactic change could help much.)

The "explicit function call" or macro brings with it the risk of forgetting to use the call or macro for some accesses to the variable, and then one gets non-volatile behaviour. IMO the "volatile" property logically applies to the variable, helping to ensure that all accesses to that variable are volatile accesses.

Yep.

For a pedant, that's a bit too strongly worded, I believe: the value that is written into a volatile variable must be computed before the volatile write, and any code using (and depending on) a value read from a volatile variable must be executed after the read. For some meaning of "execute", at least -- speculative execution with a guessed "read" value is allowed, of course, as long as no wrong external effect follows.

--
Niklas Holsti 
Tidorum Ltd 
niklas.holsti@tidorum.fi
Reply to
Niklas Holsti

Good to know, thanks.

Aaah... good catch. My imagination failed me again.

(Shouldn't "rest of startTiming()" go here, too?)

Yes.

I agree on both -- and wish that the language standards would consider these special needs of embedded / real-time systems.

But would a memory barrier work here, for timing a piece of code? It seems to me that if doSomethingSlow() starts with some purely register-using code, it could be in-lined before any memory barrier or volatile access in startTiming(), and ditto at the end some register-using code could move after stopTiming().

Indeed. The whole concept of the "execution time" of some arbitrarily delimited sequence of instructions becomes fuzzy, and only physically serialized external effects can be used as timing points.

--
Niklas Holsti 
Tidorum Ltd 
niklas.holsti@tidorum.fi
Reply to
Niklas Holsti

I agree that it is usually best to find a method that works for /all/ compilers. But failing that, finding something that works for the compilers you use is better than nothing!

In this particular case, the issue is that C++ does not specify that evaluating a discarded lvalue expression triggers a volatile read. In C, a simple "vol;" statement applies lvalue-to-rvalue conversion - thus it is a volatile read. In C++, the expression can remain an lvalue and the read may be omitted. But if you have "(void) vol;", the cast forces the conversion to an rvalue and therefore it must be read.

In general, the rule with volatiles is to keep things simple - then there is much less need for checking the details.

(Of course, there is already lots that is compiler or target specific, rather than being completely specified in the standards. Some are obvious, such as the size of "int" and the endianness. Others are much more subtle. And then there are lots of embedded compilers that claim to follow the standards, but have slight differences - and many that don't even bother to make claims of standards compliance.)

Well, code that relies on the order of execution between volatile accesses would fall into this category - and such code is not hard to find (I can find it in my own old projects :-)

No, I think I understood you correctly - but /my/ wording may have been confusing.

Not everyone understands that you have to tell the compiler this - and some people believe the compiler should not be "smart", but should translate code so dumbly that you don't have to tell it anything about volatiles.

There is a third option - you can assume that the compiler doesn't move memory accesses around unnecessarily. Then you can assume it will all work fine - and for more limited compilers, or compilers with optimisations disabled, it /probably/ will work out fine - and the code will not be quite as bad as if the compiler always treated all memory as volatile. Of course this is dangerous, but people /do/ write code that makes such assumptions.

Yes, the difference between -O0 and -O1 is usually very large.

I believe I understood you, but you misunderstood me - I agree entirely with you about what code is broken here. I just think there are more people that write such broken code, and I know that there are some people who believe that such broken code is not broken, but it is the compiler that is being overly clever (and is therefore broken).

I agree with your attitude here.

It's a bit side-tracked from the original thread, but it is always good to think through these things. And I hope it has provoked thought in others (whether they agree with you and I or not).

David

Reply to
David Brown

Of course, that assumes gcc is bug-free... gcc doesn't have many bugs, but I would make sure I used a recent version (at least 4.7) before using LTO - let other people find those bugs!

Missed that one - it should simply be the whole of startTiming() here.

That's absolutely true - memory barriers are /data/ barriers, not /code/ barriers. But in most cases, there will not be much in the way of code before the first memory read, and not much after the last write (otherwise much of that code could probably have been computed at compile time, or eliminated as unneeded code).

Yes.

Reply to
David Brown

I have been reading his idea to mean wrapping volatile accesses in C functions - replacing "volatile int vol;" with something like this:

static inline int readVol(void) { return vol; }
static inline void writeVol(int x) { vol = x; }

(perhaps with generic functions of some sort rather than individual ones for each volatile variable).

Even though these are function calls, there is nothing that forces the volatile-style access of the data. The same applies whether these are compile-time inlined, or link-time inlined.

But I know that some people /think/ that using a function call like this forces such access.

However, if the guy was meaning a special kind of accessor function (such as suggested in another of his blog posts you reference) rather than a regular C function, then I agree with him - such special functions are definitely a good alternative to "volatile". But he didn't say anything like that in the particular post first discussed.

Reply to
David Brown

I did not see anything like that in that particular post, but he has other posts which suggest accessor functions like "__volatile_read_int" with special functionality.

Marking the variable "volatile" just sets the default access method - it does not enforce it (it can be cast away in a similar manner to volatileAccess() above). But you are right that it helps ensure that it is not forgotten. I mark variables as "volatile" if they will always be used that way (hardware registers are usually always volatile), but use volatileAccess() when I sometimes want different uses (perhaps using volatileAccess() from main(), and non-volatile access within an interrupt function).

The big risks of accessor functions are forgetting to use them, or producing uglier and harder-to-read code. I could imagine a C++ class could be used to get the best of all worlds, but that doesn't help for C.

Fair enough - the compiler is bound by the laws of causality!

Reply to
David Brown

I think we basically agree that volatiles are a problem no matter how they are handled. One approach that I took at one point was to treat a volatile as meaning a single read within the processing of a C source statement. (The remaining references in that statement could be register or variable references to algebraic copies, and generally would be.) It was implemented per C source statement, but handling it between sequence points might have been a better approach.

What this did was give the developer some control. If they really did want more than one access close together, they could break a longer statement into smaller statements. It also got around the compiler issue I have seen most often, where out-of-order execution (within a statement) could create side effects that were hard to debug in the application.

This approach fundamentally works even though it is not strictly correct. Some of our compilers were implemented that way.

w..

Reply to
Walter Banks

Yes, and as a compiler writer, rather than just a compiler user, I am sure you see a lot more challenges than the rest of us. /We/ can make life easier by using simple constructs when using volatile, but you have to deal with the users who are trying to be "clever".

I believe (but I am not entirely sure) that the standards say that if there are any volatile variables accessed between two sequence points, then the generated code should make the same read and write accesses. But the order of these accesses is up to the compiler, as is the number of accesses (if the code says more than one access). So if you write "vol1 = vol2 + vol3++ + vol2;", then any of the following are allowed:

read vol2, read vol3, read vol2, write vol1, write vol3
read vol3, read vol2, read vol2, write vol1, write vol3
read vol2, read vol2, read vol3, write vol1, write vol3
read vol2, read vol3, write vol1, write vol3
read vol3, read vol2, write vol1, write vol3

read vol2, read vol3, read vol2, write vol3, write vol1
read vol3, read vol2, read vol2, write vol3, write vol1
read vol2, read vol2, read vol3, write vol3, write vol1
read vol2, read vol3, write vol3, write vol1
read vol3, read vol2, write vol3, write vol1

read vol2, read vol3, write vol3, read vol2, write vol1
read vol3, read vol2, write vol3, read vol2, write vol1
read vol3, write vol3, read vol2, write vol1

read vol3, write vol3, read vol2, read vol2, write vol1

(Ordering of read vol3, write vol3 and of write vol1 is due to causality rather than standards-required ordering.)

In other words, the generated code has to match the C abstract machine (or dumb, literally translated code) for volatile accesses at the sequence points, but can do what it likes in between.

I think the compiler is free to handle this the way it wants, but it needs to be consistent as to how many reads and writes are generated when there is more than one read or write in the source code (such as "vol1++ + vol1++") - "What constitutes an access to an object that has volatile-qualified type is implementation-defined".

But this should be handled according to sequence points, rather than by C statements (in any sane code, this will amount to the same thing).

Absolutely.

If it were handled by sequence point rather than statement, then I think it /is/ strictly correct. But I am only an amateur C language lawyer, so I could well be wrong (as you say, the only thing we can be absolutely sure of is that volatiles are a problem).

Reply to
David Brown

More to the point, it has to perform _all_ of them. More further down.

I'm reasonably sure the latter two are not allowed. The rule is that every access to a volatile object must actually take place as seen in the source code. IOW "volatile" cancels the "as-if" rule for the object it's applied to.

Sure, there's no rule on the _sequence_ of things between sequence points (thus their name), so the code can order things any which way it wants, even if they're volatile.

But skipping the second read of vol2 here would, by my reading, violate C99 5.1.2.3p2: in the source there are two accesses to the volatile object 'vol2' between that semicolon and the preceding sequence point. If either one of those hasn't been "completed" before crossing that semicolon, the compiler has violated a "shall" clause of the C Standard. Neither does the "what constitutes an access ... is implementation-defined" absolve it, IMHO: there's no way one of those references to "vol2" is an access, but the other is not.

Reply to
Hans-Bernhard Bröker

Well, the very fact that we interpret this differently shows the big problem here. I cannot say that I am right and you are wrong here - but I know that both interpretations have been used by many people (compiler writers as well as compiler users). /My/ reading of the standards, and noting that the compiler is given great freedom to what it does between sequence points, is that multiple volatile reads of the same address can be combined when they are within the same pair of sequence points. In particular, I think that the compiler is free to view all parts within a sequence as a whole and make decisions about "what constitutes an access" based on that whole - it does not have to separate the parts into individual accesses. So it can take that sequence and see that it contains at least one volatile read of vol2, and issue a read of vol2.

I also understand /your/ reading of it. To my knowledge, the C standards folk have never come up with a clear "official" answer here, despite it being a common complaint against the standard.

(If anyone knows differently, and can give a clear answer, I would love to know.)

And of course there are worse aspects than this regarding volatiles, at least partly due to the phrase "what constitutes an access ... is implementation defined". On some processors (such as pre-Cortex ARMs), the most efficient way to read a byte is by reading 32 bits and masking. What should the compiler do when asked to read a volatile uint8_t ? How should the compiler interpret a write to a bitfield? On some processors, that means separate reads and writes - on others it could mean a single RMW cycle. What about a bitfield declared as uint8_t, but which straddles the boundary between two bytes - should the compiler issue a single 16-bit read or two 8-bit reads?

Reply to
David Brown

This is probably the best simple example that illustrates the whole volatile problem from all points of view.

Part of the reason that there never has been a clear answer is there is so much distance between the simple always read/write a volatile and even the fundamental ways a statement is parsed and code generated.

I have looked at this problem in our own products for a very long time and never had an answer that was clear and unambiguous.

w..

Reply to
Walter Banks
