Modern debuggers cause bad code quality

I imagine I would have felt the same had my main assembly experience been x86 or similar. 68k assembly - the language itself - has been an excellent foundation to build on; (un?)fortunately I suppose I am the only person busy doing that :-).

You have misunderstood me. I am not advocating any "close to natural language" thing - why would we want that? We want a language which makes our brains more efficient at programming. So my analogy with natural language is of the sort "when you use a low level language you deal with words, and when you use a HLL you deal with predefined sentences". Thus high level languages deprive you of a most basic feature of languages - the ability to design your own sentences. Hence the eternal "C is an assembler" vs. "it is not one" debate: language users simply need the low level in order to design the higher one themselves, according to what they want to say. Predefined sentences are for the general public, who do not author much in writing, so that they too can communicate :-).

I agree that the set of programmers is larger, of course, and if by "more expressive" you mean "packs more info into less text" I'd have to work hard to check whether I agree or not. But this does not make it better in many cases. For me, for example, it would mean I would be more like the rest of the world, but it would degrade my efficiency to a fraction of what it is now - which simply would not work; I'd not survive the way I do now, being unable to offer what I have on offer.

Well like I said at that level of "same" I agree with you, of course :-).

My average output is around 150 kilobytes of source text/month. I have thrown away very little of what I have written over the past 20 years, and of course the entire thing is subdivided into separate "projects". E.g. when I added a tcp/ip stack to DPS it took me about 6 months to get to basic tcp connect functionality and another 2-3 months to do the basic higher level, DNS, ftp, smtp etc. I needed. I give these figures just to make the picture of what we are talking about clearer; it sounded too general otherwise.

Dimiter

------------------------------------------------------ Dimiter Popoff, TGI


Reply to
Dimiter_Popoff

Type-punning through unions is defined in the standards, and works as expected to my knowledge (though earlier C standards were not entirely clear about this). memcpy() will always assume that the source and destination pointers may alias other areas (but they may not overlap each other).
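As a minimal sketch of the defined union route (the function name is mine; the exact bit pattern assumes IEEE-754 single-precision floats, which C does not mandate but virtually every current target uses):

```c
#include <stdint.h>

/* Store through one union member, read through the other: C99 TC3 and
   C11 define this as reinterpreting the object representation. */
uint32_t float_bits_union(float f) {
    union { float f; uint32_t u; } pun;
    pun.f = f;      /* at most one member holds a value at a time ... */
    return pun.u;   /* ... but reading the other is sanctioned type punning */
}
```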

But the compiler does not have to generate a call to a memcpy function - it can generate the "copy" inline, and it is free to make as many or as few copies as it wants, as long as the behaviour is /as if/ it called memcpy().
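For example, a reinterpretation written with memcpy() typically compiles, with optimisation on, to a single register move rather than a library call - the copy only has to happen /as if/ (function name is mine; the expected bit pattern again assumes IEEE-754 floats):

```c
#include <stdint.h>
#include <string.h>

/* The "copy" below is usually elided entirely: mainstream compilers
   recognise a small fixed-size memcpy() and emit one move instruction. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* defined for any object representation */
    return u;
}
```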

If you rely on memcpy() code to do something else - such as assuming it is a memory barrier or has a visible effect in a multi-threaded environment - then the fault is with these assumptions, not the way the compiler handles the memcpy().

The behaviour of the compiler was correct - the bug was in the kernel code. It annoyed the kernel developers, but the mistake was in the source code.

Dereferencing null pointers is undefined behaviour in C - this is well known. The compiler can therefore remove checks for null pointers that are run /after/ accesses through the pointer.

The compiler's behaviour here is correct, and it can occasionally lead to improvements in the generated code. But it is not particularly helpful for testing or debugging. It is therefore a good idea to either disable this "-fdelete-null-pointer-checks" optimisation, or to change the environment (such as by avoiding mapping a real page to address 0). The ideal answer, of course, is to correct the error in the source code.
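A sketch of the pattern in question (both function names are mine): the dereference tells the optimiser that the pointer cannot be null, so the later check is dead code and may legally be removed.

```c
#include <stddef.h>

int buggy_read(int *p) {
    int v = *p;        /* undefined if p is NULL - compiler may now assume p != NULL */
    if (p == NULL)     /* ... so this check can be deleted as unreachable */
        return -1;
    return v;
}

/* The fix is simply to test before dereferencing. */
int safe_read(int *p) {
    if (p == NULL)
        return -1;
    return *p;
}
```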

Again, the C standards are quite clear and well-known - signed integer overflow is undefined behaviour, and you cannot rely on signed values wrapping around as two's complement. And there are situations where the compiler can take good, correct code and generate smaller and faster object code by "knowing" that signed integer arithmetic does not wrap.

gcc's warnings are usually quite good at telling you about these things - /if/ you use them properly.

It is certainly arguable that the skills needed to make sure that the C code works as expected have changed over the years - I know I have written code over the years that would not work when compiled with a modern compiler and heavy optimisation enabled.

It is also certainly the case that the C standards committees and the compiler maintainers don't always seem to live in the same world as the people actually /using/ the tools.

Such people are not looking for the C programming language - they are looking for a language they think C should be. To my knowledge, no such language actually exists - so they use C as the nearest they can get, and complain when it is not /their/ ideal language. They could get on quite well if they learned how to use "volatile" appropriately.
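A sketch of the kind of appropriate use meant here (the names are mine): a flag shared with an interrupt handler must be volatile so the compiler re-reads it on every access instead of caching it in a register.

```c
#include <stdint.h>

/* Without volatile, a polling loop reading tick_count could legally be
   optimised into an infinite loop, since nothing in the C abstract
   machine appears to change the value between reads. */
static volatile uint32_t tick_count;

void isr_tick(void) { tick_count++; }     /* imagined interrupt handler */

uint32_t read_ticks(void) { return tick_count; }
```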

I am not claiming C is an ideal language here - there are many things in it that I would change if I could. But we use it as the nearest practical choice, and write code to suit the language rather than expecting the language to suit our code.

The differences between undefined behaviour, unspecified behaviour, and implementation defined behaviour are subtle but important.

"Undefined behaviour" means that there is no meaningful interpretation for the code, and the compiler can optimise based on the assumption that such behaviour will never happen, and also that /if/ such behaviour happens, the programmer doesn't care about the result. Running off the end of an array is undefined behaviour, so the compiler can assume it won't happen.

"Unspecified behaviour" means that the standards don't say what will happen, nor does the compiler have to define the behaviour. The order of evaluation of function arguments is unspecified - the compiler can evaluate them in different orders at different times.

"Implementation defined behaviour" is supposed to be documented and consistent for a given compiler. The size of "int", and the storage format, is implementation defined behaviour.
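For instance, the standard only pins down minimum ranges; the exact width of int is left to the implementation. A sketch of what is and is not portable:

```c
#include <limits.h>

/* Guaranteed by the standard on every conforming implementation ... */
_Static_assert(CHAR_BIT >= 8, "char is at least 8 bits");
_Static_assert(sizeof(int) * CHAR_BIT >= 16, "int is at least 16 bits");
/* ... but whether int is 16, 32 or 64 bits wide is implementation
   defined, and each compiler must document its choice. */
```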

(See Annex J of the C11 standards - or document N1570, which is the last freely available draft and is easily found on the web.)

Would you think that the comparison (x + 1) > y is the same as x >= y, where x and y are ints? Mathematically, they are clearly the same thing - and converting to the second comparison will mean smaller and faster code than the first expression. But the conversion is only valid if the compiler can assume that integer overflow will never occur. If it is possible for "x" to get so big that "x + 1" overflows, then the programmer will have made a mistake here. So the compiler can assume that the programmer is competent, and generate smaller and faster code assuming that the undefined behaviour never occurs (or that the programmer doesn't care about the results if it /does/ occur).
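The two comparisons side by side (function names are mine); they agree for every x where x + 1 does not overflow, which is precisely the assumption the optimiser is allowed to make:

```c
#include <limits.h>

int cmp_plus_one(int x, int y) { return (x + 1) > y; }  /* UB when x == INT_MAX */
int cmp_ge(int x, int y)       { return x >= y; }       /* what the compiler may emit instead */
```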

The same is true for a lot of different undefined behaviour.

Compilers don't go out of their way to spot possible undefined behaviour, and then maliciously generate garbage to spite you. They assume the programmer knows what he is doing, and has written correct code (with plenty of warnings available if enabled), and that the programmer wants the result to be as small and fast as possible with the specified behaviour.

C does not support that sort of thing. If they want behaviour like that, they need to write in assembly (or inline assembly). You can't just make up your own rules about how you think C ought to behave.

Reply to
David Brown

I mean 68k, too. Nothing has driven more towards that for a considerable amount of time.

To be fair, that's a pretty broad target. If we knew what that meant, we'd be more likely to have it.

I feel like I know less about your goal than I did before :)

"More expressive" to my mind means it's inherently easier to read once you get the hang of it.

I am biased by a (probably misunderstood) blurb from "Gödel, Escher, Bach" where he claims without exposing the proof that "there is no higher level language than FLOOP" - FLOOP being a cute metaphor for Algol, of which 'C' is a descendant.

Understood.

There's just a lot we don't know here.

--
Les Cargill
Reply to
Les Cargill

In the comp.arch thread, there was considerable discussion about what the C standards really say, and whether or not they are internally consistent and clear. The impression I got from that discussion was that union punning does not always work in the standard.

I'm not going to argue these points -- my understanding of the C standard is too poor for that. (In fact I found it frustrating that the comp.arch contributors who seem, to me, quite competent and even expert, could not agree in that discussion.)

In the comp.arch discussion, the point was that the traditionalists felt that having to use memcpy() instead of their traditional pointer-casting method would be too inefficient, because they thought that memcpy() would copy data, even if inlined. The reply from the modernists was that the C compiler can treat a memcpy() call quite abstractly, as saying only that the name of the destination variable afterwards refers to the same string of byte values as the name of the source variable. Ergo, if subsequent data dependencies do not force a copy, the compiler need not implement a copy by any means, and can just internally (knowing it is safe) use the source data, in situ, where the source code specifies using the destination (copied) data. I guess this falls under the general /as if/ rule, but it is a rather wider interpretation of that rule than the traditionalists expected.

This, and your later replies, show that you agree with what I have called the "modernist" camp. I'm not saying that this camp is wrong; indeed I think this camp is correct in its interpretation of the C standard; but there are also the "traditionalists" who don't like the current C standard and its effect on current C compilers -- in particular, making C much less of a "portable assembler".

You are of course entirely right, from the formal point of view, but again, there are the C traditionalists who take the argument "the programmer knows what she wants" further, and say that if the programmer wrote x+1, then x+1 should be computed, and the programmer wants to be responsible for what happens -- overflow or no overflow.

In this example, it seems simple for the compiler to warn (perhaps only under some option asking for such warnings) that it has generated code for "(x+1)>y" under the assumption that x+1 does not overflow. Such warnings would please the traditionalists, especially if the compiler has options to suppress optimisations, like this one, that assume no overflow. Understandably, the modernists are not eager to bloat the compiler's optimiser and code-generator with such warnings and options, which they feel are not in the modern C spirit.

--
Niklas Holsti 
Tidorum Ltd 
Reply to
Niklas Holsti

I'll have to insist on an explanation quoting chapter and verse before I accept that claim. All editions of the C standard that I've seen try rather strongly to state the exact opposite of what you say there. E.g. C99 6.7.2.1p15: "The value of at most one of the members can be stored in a union object at any time." I don't see any definition of what happens if you try to retrieve a value that's not currently stored in the object, which would make that, rather obviously, undefined behaviour.

Reply to
Hans-Bernhard Bröker

The key point is clarified in a footnote in the C11 standards (draft N1570 is easily and freely available on the web, and is therefore more common than the official final version, which must be bought). Section 6.5.2.3 on page 83 (page 101 of the pdf) has a footnote:

""" If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation. """

And 6.5p7 of the same standard gives the aliasing rules:

""" An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 88)
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.

88) The intent of this list is to specify those circumstances in which an object may or may not be aliased. """

That's the best I can do, I think. If you want more here, then comp.lang.c would be better than comp.arch.embedded - where I suspect most people are already asleep! For comp.arch.embedded, the main issue is not what the standards say, but what compilers /do/ - and I haven't found a compiler that does not allow type-punning through unions. In particular, gcc makes it explicit in its manual.

Reply to
David Brown


I don't see it as either/or. The balance should be achievable using compiler options.

Yes, that may mean turning off most optimizations when I want the compiler to emit machine code the way I wrote it in C code, so I have control. (This means I may have to do some careful programming to optimize at the source code level.)

But then, when I am writing an application that I want to be portable, I treat C as high level. This means letting the compiler writers learn all the tricks of the machine code, and I take advantage of their knowledge by switching the optimizations on full power.

The design of C allows both worlds to meet. It does create some friction, but to me this flexibility is an advantage. You just have to know what you are doing. C will not hold your hand.

Reply to
Ed Prochak

On Saturday, December 6, 2014 2:58:11 PM UTC-5, Les Cargill wrote:

(maybe we really need to start another thread on languages)

Sorry, but after trying to learn some Chinese recently I have to disagree that the differences in human languages can be described as small.

Yes, human languages can be arranged hierarchically, just as programming languages can. But there are some large divergences in both trees.

English for the most part is based on vowels and consonants and is basically atonal. (Words mean the same thing whether you speak in a monotone or in a song.)

Chinese and a number of other Asian languages are tonal. The same consonant and vowel combination can mean vastly different things depending on the tone.

In programming languages the divide is between ones like C and languages like LISP.

I'll say one last thing today, then I have to get back to work. (This is more directed to the entire group, not you Les.)

There is no one language that can be used for all problems.

Someone else posted about high level languages not being flexible enough. Maybe you chose the wrong language. A specialized language has great advantages over a general purpose language within its problem domain.

You can write C programs to read and write databases, but it is much easier and clearer to express the program in SQL. You can write a compiler in COBOL, but it may be easier and clearer to use LEX and YACC. You can write a GUI in PERL and X, but maybe C# and XAML will be easier.

In terms of programming, think like a mechanical engineer, and pick the right tool for the job.

Reply to
Ed Prochak

From the basic machine languages (those understood directly by the electronic processor) the aim of every programmer should be to build a language that is specific to the application domain. You know you are getting that right when the client can begin to see how to do the stuff they know in the language you create for them. The machines need to be told how to cope with the constructs of the Application Specific Language. Once that is done, the rest becomes much easier.

As a Systems Engineer who uses Forth, mainly for High Integrity Systems, I find it gratifying when my clients really get comfortable with what grows from such a basis.

--
******************************************************************** 
Paul E. Bennett IEng MIET..... 
Reply to
Paul E Bennett

Getting the data out is not the hard part. The hard part is getting to the point in the program at the right time to see the data and that is the real trick in any situation. If you are just stepping through your code you are grabbing at the low hanging fruit where a debugger is not needed at all.

I'm sure there are plenty of times when a debugger made it easier to "see" some signal. A complex trigger definition may have made it easier to get to the right point in the code at the right time. But if you reread the thread you said print statements were "inefficient". I find they are often expedient since I don't need to deal with the ginormous complexity of a debugger and the *thinking* involved in each case is the same.

What??? Which eval boards are "free"? 1000 EUR is far from free. I'm not sure what you are saying.

The "cost" is the added hardware on the target as well as the equipment cost.

I have *never* used a JTAG port in production programming other than FPGAs where there is no choice. Manufacturing tells me they don't want the complexity. For MCUs there is nearly always a serial interface to load the production code.

Tests that are often built into the MCU as a selftest. It's not just the factory where such tests are performed.

I guess we have had different experiences.

--

Rick
Reply to
rickman
[...]

I'm writing about data ("variables") stored in memory, and by SWD/BDM... I can access them permanently, regardless what the program does at this time.

[...]

many.

There are free (or cheap) eval boards with SWD debugger to start. And you can get "better" interfaces for less than 1000EUR.

There is no additional hardware on the target.

not my problem.

with the same complexity.

[...]

I'm writing about code testing / verification.

likely even different languages.

Oliver

--
Oliver Betz, Munich http://oliverbetz.de/
Reply to
Oliver Betz

The value of a signal is not nearly as important as the timing. I don't care that the value went to zero - I already *know* that; I want to know *where* in the code and *why* it went to zero. That means I have to watch the value change in the context of the program. That is the hard part of using a debugger.

Please send me some of these "free" boards...

Then why do you need "automated" tests? That says production to me.

--

Rick
Reply to
rickman

The board linked above is $10 - that is one beer at the pub.

-Lasse

Reply to
langwadt

Yep, that is a lot less than 1000 EUR, but hardly free. More importantly, it is one brand of MCU. Does this help me with other MCUs?

This board is two separate sections, one is the MCU and the other is the debugger support hardware... notice the debugger support is a third as large as the MCU section. Hardly inconsequential to include on your target board.

--

Rick
Reply to
rickman

Oh, certainly - but there still appears to be "hardware support" for learning, say, Mandarin just as there is for English.

This is basically the Chomsky "large structure" argument - SFAIK, human languages are the products of innate features of the human mind (though that may be Searle, not Chomsky).


Well said, Ed.

--
Les Cargill
Reply to
Les Cargill

you don't include it on a target board, all you need on the target is a connector

they just made it easy and put both the debugger and the target on the same board

-Lasse

Reply to
langwadt

No, really: the compiler can almost never tell whether x+1 might overflow, so it would have to emit a warning every time the program says x+1. Note that if x is a signed int, then the >y is irrelevant: signed integer overflow is an undefined condition, i.e. nasal demons.

Gcc has the -fwrapv option which ensures wraparound arithmetic (what the "portable assembler" crowd wants), and -ftrapv which signals an exception on overflow. Both of these potentially bloat the output code, of course.

Reply to
Paul Rubin

Turning off optimizations doesn't make undefined behavior defined. You have to either use compiler flags like -fwrapv to specify the behavior that you want, or else write the code to avoid undefined cases.
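One way to write such code (a sketch; the helper name is mine - C23 later standardised the same idea as ckd_add() in <stdckdint.h>): test the range before adding, so the signed overflow can never happen.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns false instead of overflowing, so the addition itself is
   always defined regardless of optimisation level or -fwrapv. */
bool checked_add(int a, int b, int *result) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;          /* would overflow: report failure instead */
    *result = a + b;
    return true;
}
```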

Reply to
Paul Rubin

They are as close to free as makes no difference, you only need one per engineer! My rep gave me a handful of the things out of the boot of his car!

Yes, it does - it should work on anything with an SWD port, i.e. all ARM Cortex MCUs, which is likely the majority of new microcontrollers by now.

You just reserve a few pins on the target, there is no other per-target debug hardware needed. Maybe a pullup or some such. You will generally need a way to program the device anyway.

--

John Devereux
Reply to
John Devereux
[...]

Google for "automated code testing" or "automated software test".

Oliver

--
Oliver Betz, Munich http://oliverbetz.de/
Reply to
Oliver Betz
