Code metrics

That works well until an ill-minded customer discovers they can effectively run a denial-of-service attack on you by just claiming the existence of all manner of bugs, regardless of whether they correspond to reality.

The flip side of that strategy is that they then re-phrase all their change requests as bugs, to be fixed for free.

Reply to
Hans-Bernhard Bröker
8<

But that just confirms the misuse of (those) metrics! They are being used to *force* change (fire ineffective teachers; justify smaller class sizes; "get back to basics"; etc.) instead of as a *tool* to help understand the "system" that is being measured.

Do people *really* think it's A Good Thing for kids NOT to be tested to "standards" in (primary) school? Sure, you won't risk hurting Johnny's feelings. Or, having the teacher concentrate too much on "teaching to the test" vs. a more general approach.

Until, of course, Johnny gets to college and the admissions officer tosses his application in the trash because of poor grammar, spelling, etc. And, the FinAid office does likewise because "his numbers don't add up".

"OhMiGosh! What are we going to do! Obviously, LOWER the standards cuz it's too late to *fix* Johnny's primary school problem!"

But, *do* we? We *probably* can agree on egregious concoctions. But, do you really *know* how "good" *your* code is? How do you make that evaluation? "It runs"? "It was 'finished' on time/under budget"? "It hasn't killed anyone"? "There's not much maintenance being required"?

And, more to the point, do you KNOW how to make it *better*? Or, do you just *think* you do? I.e., all these rules/guidelines developed and codified over the last 50 years *try* to bias your efforts to a "better" result. Yet, you could faithfully implement ALL of them and still have crappy code!

When it comes down to specifics, can you put a number on *how* important any particular coding practice is wrt code correctness? Or, its cost to The Project?

Do you even *know* how much your code "costs" (i.e., *you*)?

When I first started on my own, I was stunned at how much time was spent on non-engineering tasks! Equipment maintenance, purchases, ordering supplies, accounting, etc. How easily an hour could pass talking with a client, sales rep, colleague, etc. on the phone with "nothing" to show for it! E.g., sorting out some technical detail in a particular device prior to selecting it for the design; or a detail in the project specification; or, a detail in some other colleague's subsystem upon which you rely; etc.

The *good* thing is that you can have "low expectations" from the results -- there are no "target numbers" involved. Just trends, guidance, etc.

Reply to
Don Y

That's true but mostly irrelevant because the numbers include unknown amounts of slop that can't be correlated.

The problem with quick'n dirty is that much effort inevitably is wasted. Even if you know the general direction, there always are false starts and blind alleys before you reach the destination.

Eventually the application falls over under the weight of the grafts and has to be refactored - which is all waste. Unavoidable sometimes, but waste nonetheless. Metrics rarely take into account how much work needs to be redone; however, "you can always do it over" is a basic premise of agile.

Not really. Compared to other methods, agile development is extremely sensitive to team makeup. Replace any member of the team and the numbers you've gathered become meaningless.

You can compare different agile teams which are doing essentially the same work, but you can't necessarily generalize from that to their performance on a different problem.

You can compare the total cost of agile to the total cost of some other method, but the numbers are misleading because agile deliberately trades work redone later for short time to market now. That is fundamentally different from the feature trading that other methods consider and makes comparing agile with other methods extremely difficult (unless you consider correct function to be a "feature" that can be delayed until version X).

I spent quite a few years doing "continuous" development - which is similar to "agile", but more structured. I think agile has been the worst thing to come along in my lifetime. On the surface it looks appealing, but the appearance is a mirage concealing a tar pit beneath.

YMMV, George

Reply to
George Neuner

But that's not the fault of the metrics! Replace them with any other measure and the folks *applying* them would still cause you the same grief!

If, instead, they were used as an advisory tool: "Hmmm... your LOC/day figure is dropping, Stefan. This *suggests* whatever you are working on, now, is more tedious than what you were working on previously. And, possibly suggests more testing will be required of that module than the previous one (cuz you are having to 'think harder' about it while writing it)."

Or: "Wow! At this point in the last project, we were seeing XXX. But, we're now seeing YYY. How does this forebode our future efforts and costs wrt that previous project and the estimates we prepared for this one?"

Sure. We call that "experience" and "quality developers". You can *know* something is bad -- and still do it, intentionally, in spite of the acknowledged risks! Knowing doesn't ensure you will be *wise* in the use of that information.

Again, that's how the metrics are *applied*, not a characteristic of the metrics themselves.

Returning to my "no warnings" comment, below: I can just turn OFF all warnings to achieve the same result! With the expected downside impact on code quality.

You can't "legislate" good behavior/practices. But, you can put tools in place that let people *see* the costs of their actions and make INFORMED decisions in light of that information.

The alternative is to *hope* the developer (the original developer as well as any that *follow*!) understands all the warnings and has IMPLICITLY decided that they can be ignored. Are you sure the warnings *you* are seeing are the same warnings that *he* saw, previously?? :>

I port a fair bit of software. So, switching compilers, environments, etc. is a commonplace occurrence for me. The first thing I do is turn on all warnings and see how "messy" the output becomes. Then, track down each "violation" to see why the compiler flagged it as such. Was the previous developer (who may have been me!) just lazy, here? Or, are the tools different, making certain previous assumptions no longer valid (e.g., sizes of data types)? Or, is this a compiler-specific behavior that wasn't caught by previous Best Practices?

"Warning" means exactly that: "Hey, are you sure you know what you are doing, here?" I can either dismiss all with a naive, "Yup". Or, if I care for the quality of my code, I can spend the time to investigate why each was signaled. Then, take measures to "mark" them so they don't require additional time from me (or my successor) in the future.

I can't guarantee particular results. But, I can choose to put in place procedures/mechanisms that "improve my odds" of getting things right. Metrics are just an advisory tool along that same continuum.

E.g., I designed my IDL so the spec defines *all* the results from a particular method invocation. This allows me to handle each RPC/IPC in a boilerplate manner to ensure every potential outcome is at least *recognized*/acknowledged by the developer. It's the equivalent of ensuring each malloc() is followed by "if (result == NULL)...". I.e., it doesn't guarantee that the developer handles the out-of-memory case CORRECTLY. But, it prods him/her to at least remember that this is a real possibility at each invocation!
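A C sketch of what that boilerplate buys you (the result type, its values and do_remote_call() are made-up stand-ins for whatever the IDL compiler would generate):

/* Sketch only: the result type, error codes and do_remote_call() are
 * hypothetical stand-ins for generated stub code.
 */
typedef enum {
    RPC_OK,
    RPC_TIMEOUT,
    RPC_NO_MEMORY,
    RPC_BAD_ARGUMENT,
    RPC_SERVER_RESTARTED
} my_rpc_result_t;

my_rpc_result_t do_remote_call(int arg);

int invoke(int arg)
{
    /* Because the spec enumerates *every* possible outcome, each call
     * site can go through a switch with no default: add a new outcome
     * to the spec and the compiler (-Wswitch) flags every caller that
     * hasn't at least acknowledged it.
     */
    switch (do_remote_call(arg)) {
    case RPC_OK:
        return 0;
    case RPC_TIMEOUT:
    case RPC_SERVER_RESTARTED:
        return -1;              /* caller may retry */
    case RPC_NO_MEMORY:
    case RPC_BAD_ARGUMENT:
        return -2;              /* not recoverable here */
    }
    return -2;                  /* unreachable if the enum is complete */
}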
Reply to
Don Y

I think the problem is finding two *apples* to compare. From what I've seen of agile, spiral, etc. development styles, they're "never done". So, how do you know when you've got project Y in the same state that project X was at the time of the metrics you're using in the comparison?

That's my point, above. If you look at the project at a point AFTER the refactoring, then the costs of that are reflected in the new metrics.

The same applies to any other effort. Comparing the metrics of developer A to those of developer B (who has an entirely different process) is meaningless, as is comparing language A to language B, etc.

Comparing the cost of a Ferrari to a Chevy would be equally meaningless.

OTOH, comparing the costs of a Model XYZ w/an inline 6 to a Model XYZ w/a V8 bears some merit!

You would use the trends observed *during* "problem A" to glean insights into what to expect for "problem B". Note you don't try to *know* what will happen in "problem B" but, rather, be alert as to what *has* happened with problem A in the past. E.g., "everything went fine UNTIL we got to..."

Ferrari vs. Chevy.

Someone has to place a value on "time to market" and decide how it offsets the other "costs"/consequences of that approach. No free lunch.

Likewise, someone has to put a cost on the disdain you risk from your guinea pigs^H^H^H^H^H customers by producing an incomplete product and *charging* them for a working unit! How many FUTURE customers do you lose? How many "lines of debugged code" would that have purchased?

Isn't the same sort of thing evident in the blatant: "We don't have time to do it right; but, we'll have time to do it over" or: "Great idea! We'll put that in version 2" mentalities? These have been around since *I* got started in industry!

Agreed. But I don't see any of these things as arguments against *having* metrics. Merely complications as to how they can be used and the "reliability" of the observations gleaned from them.

Sunday Lunch. Finestkind!

Reply to
Don Y

(snip on coding metrics)

Then add "No Child Left Behind", a plan from a C student president to make sure that all students are C students.

In Washington state, all schools are now failing, according to NCLB, as they are refusing to use student test results for teacher evaluation.

The result is that pretty much all schools have to send a letter to parents indicating that the school is failing.

And, similarly, test results aren't all that good at measuring teachers.

-- glen

Reply to
glen herrmannsfeldt

*Any* and *every* plan to "measure" the education system (its components, its results, etc.) is a farce. Too many "special interests" -- teachers, administrators, parents, etc. And, the system is uncharacterized. Like trying to control a loop for which you have no idea of the extents of lag present, etc.

So, instead, we wait until Johnny is responsible for administering that IV drug upon which your health/life depends. When he screws *that* up, we fire Johnny (let him move on to some other profession... like teaching!), give the patient (or the patient's family) our condolences, pay off some lawyers and lament how the system failed "Johnny" (with no mention of the *patient*!)

Or, enforcing *laws* on the streets.

Charter schools! Really?? We all know how well business has addressed the needs of its "customers", historically. "Ah, but these are *schools*! SHIRLEY, they'll do better -- due to the moral imperative!" (just like pharmaceutical companies set their pricing and policies based on THAT morality)

To be fair, how would *you* measure teachers? It would be like measuring *your* coding performance based on a starting point of some other developer(s)' code upon which you've built. WITHOUT even giving you the choice of whose codebase you will be supporting!

(No, you can't go back and rewrite it all; there are only 180 days in the school year and you have to have made *progress* in that time!)

Building on Frank's comment: we all (think!) we know a good/bad teacher when we see him/her. But, the verdict is never "in" until long after the student has moved beyond. And, it can never truly be isolated and identified as the *cause* of the student's success/failure.

I.e., don't measure students to reward/punish the students *or* the teachers (or the System). Instead, use them as tools to evaluate areas that may need special attention. Or, to gauge the benefits of certain "investments".

(Personally, I don't understand the big hullabaloo over testing. I can remember taking "big tests" throughout my primary school education. And, I'm sure there was *some* effort to guide my studies in such a way that I would fare well enough on those. Without drawing attention to the fact that this is what was actually being done!)

Otherwise, you wait until it is too late and some "vested interest" casts judgement on whether or not Johnny is eligible for a particular vocation, continuing education, or whatever. Do you then create laws to prevent medical schools, employers, etc. from discriminating based on intelligence or other measures of aptitude? So Johnny can be a doctor or rocket scientist even if he's not qualified??

Reply to
Don Y

My own development process fits both waterfall and spiral methods of development. In the manner of good "Project Management" there is a significant portion of Up-Front work in getting the specs right (these documents are "Components of the System" as well and are kept under tight version control and change management throughout).

The core element of my development process leaves an audit trail automatically (whether you operate it on paper or with software tool aid). Within this audit trail you will find your metrics: how many times a component has been round its action, review, change loop; how many issues were raised in the reviews against it; how many problem reports involved the component; how long it took for a component to pass review and be approved. When even the act of getting a good specification is under such a development regime, you can ensure it meets the 6 C's criteria for a good requirements specification.

--
******************************************************************** 
Paul E. Bennett IEng MIET..... 
Forth based HIDECS Consultancy............. 
Mob: +44 (0)7811-639972 
Tel: +44 (0)1392-426688 
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk.. 
********************************************************************
Reply to
Paul E Bennett

I can tell you how and when an object is touched, what was done to it, etc.

What I *can't* tell is how much *effort* goes into "creating/changing" it. (effort is measured in man-hours)

E.g., how long (hours of labor) did it take to create a spec? How long (hours of labor) to create the hardware/software to reify that spec?

While you may have "checked out" an object at a particular time and checked in the next version at some *later* time, the difference (in - out) isn't truly representative of the time required to "do whatever you did" to create that new version from its predecessor. It just acts as an upper limit on the "time required".

Did you check it out, work on it (actively) for 10 minutes and then go on vacation for 2 weeks before checking it back in on your return?

Did you check it out along with several other objects and work on other things along with it before checking it back in?

Did you try several different, unsatisfactory variations of the revision and only check in the "final" attempt?

I.e., I want to be able to put a number on the *effort* required (beyond counting effective keystrokes). I can check out a version and spend a lot of effort refactoring it into an *equivalent* version with very similar metrics. How is that *effort* measured and accounted?

Reply to
Don Y

(snip on coding metrics, and then on school metrics)

I don't think we will get completely away from testing, or for that matter, grades, but yes they are never perfect.

There was a case not so many years ago, where a nurse gave the wrong dose of some medicine to a patient. It required a complicated calculation to determine the right dose, and it seems that she got it wrong. She was immediately fired, and not so many days later, committed suicide.

Now, certainly we expect nurses to always get it right, but on the other hand, what did the hospital expect her to do? She had gone to school for many years, and then had many years of experience as a nurse. Most likely, no other hospital would hire her.

It seems reasonable to me that if something is that critical, two nurses should do the computation and verify that they agree. (That doesn't eliminate the problem, but maybe reduces it enough.)

To get back to coding, I hope that there are strict standards for those writing control programs for nuclear (or any) power plants. Most likely, as noted above, with more than one person involved.

I believe that there is a system to measure teachers based on the change in test scores. That is, from the end of the previous year (and teacher) to the end of the current year. That should work, but has a lot of statistical uncertainty.

As I understand it, some in Washington now have the principal make decisions based on all data, including test scores, but not with a fixed proportion. It seems that isn't good enough for NCLB.

(snip)

Well, yes, but when a teacher has had bad reports for 10 years or so, and nothing changes, then parents get mad. But then the teacher has tenure and can't be fired.

Well, I remember tests maybe once every four years. It seems that now they have two or three tests a year.

-- glen

Reply to
glen herrmannsfeldt

Again, there is nothing inherently wrong with the grade/score/metric. It is how it is *used* that begs attention.

There are *lots* of "screwups" that happen EVERY DAY in hospitals. SWMBO at one time sat in on the "take no notes" meetings where this sort of stuff was discussed. Some of the horror stories would have you perform your own surgery rather than risk going into a hospital!

We forget that "it's just a job" -- to *all* of these people: doctors, cops, nurses, etc. Expecting them to never make mistakes is wishful thinking.

SWMBO was in for some out-patient surgery. I accompanied her to the recovery room (still sedated). Nurse came over and gave her some meds. I, of course, asked "what's that?" and "what's it for?". As the answers seemed sensible -- and didn't conflict with any of her known allergies (amazing how often medical professionals fail to read the big letters on your chart warning against these items!) -- I acquiesced to her being dosed.

Several minutes later, nurse (same one) came over to give her some meds. "What's that?" I then told her, "You already gave it to her, 10 minutes ago!" Nurse got belligerent. "No, I didn't!" "Isn't it used for..." and then I recited what she had told me previously when I had asked the first time.

Now she's in a box: what are the chances that Joe Offthestreet happens to KNOW the indications for a particular *odd* pharmaceutical? And, he *claims* it had already been dosed. So, he'd be likely to offer "reliable" testimony on that fact...

"Well, it's not written down on her chart!"

[Hmmm... three people here: one is unconscious. Another is a lay person/visitor. Third is a paid healthcare professional CHARGED with running this (4 bed) recovery room. Which of us *should* be responsible for making that notation on the chart??]

Sure. Or, have it predispensed by the hospital's pharmacy. Or, an "app" for it.

No guarantee that a second individual will be willing to contradict/correct the first when the calculation is in error. (It is amusing to see how easily people can be coerced into going along with the majority.)

If one of the professionals is a *doctor*, then all bets are off. Nurses routinely claim that doctors don't take kindly to criticism and tend to be bullies -- as well as making mistakes that the nurses have to catch or correct.

And, of course, *two* professionals only increases the cost of that care!

Most of these things rely on good practices in place and "lots of eyes". But, the "eyes" have to be motivated to be critical. If they just go through the motions, they're just "excess overhead".

How do you calibrate that system? E.g., when students get to a "rebellious" age, I imagine there are more "other issues" that interfere with performance. I.e., change from K->1 and 8->9 can be very different sorts of differences.

No idea. School (system, facilities, curriculum, funding, etc.) has changed a lot since I was a kid. I'm glad that *I* don't have to solve that problem. From talking with educators, they see schools as giant "playgrounds" for politicians, parent groups, etc. to *experiment* in.

(Cripes, how many different "initiatives" have there been in this area??)

Or, concerned parents speak out ON BEHALF OF *their* CHILD and "fix" the problem from *their* perspective (leaving the rest of the kids in that class to deal with the substandard teacher). How do you "prove" there is a problem with the teacher? :-/

Thankfully, my school district was well funded and had lots of very capable teachers. Most opened doors for me (opportunity) and then stepped out of the way so I wouldn't be hindered by the "regular curriculum". Or, fought for funding for extracurricular "geek" activities that weren't available in the district at that time.

I can't imagine what it would be like with teachers who considered it "just a job"...

Yes, they were infrequent. But, I recall entire days set aside for The Test, etc. And, of course, in JHS & HS you have midterms and finals in each class... (plus weekly quizzes, typically).

Reply to
Don Y

Agreed.

I have considered keeping a log of the expected warnings (with documentation of why the warnings are expected) and then making it an error to have a different set of warnings than exactly the documented set.

Unfortunately I haven't gotten around to trying this idea out in practice, so I'm also (still) in the *no warnings* camp.

The Ada compiler I use allows me to identify some warnings (unreferenced parameters and objects) as expected, but the *why* has to be a comment.

But yes, it would be nice to be able to document all expected warnings with a required explanation of why they are expected.

Keeping a log of expected warnings and checking compilation and tool results against that list is of course a solution, but it feels too much like a hack.

Greetings,

Jacob

--
Infinite loop: n., see loop, infinite. 
Loop, infinite: n., see infinite loop.
Reply to
Jacob Sparre Andersen

In light of that, you want to keep as many warnings ("advisories") enabled as possible! But, in practice, this can generate a lot of "advisory output" that, once you've checked everything "the first time", you really want to be able to IGNORE (without turning them "off").

You could capture the output to a file and then diff that against future output. But, as line numbers can change, this just turns one problem (verifying the exact same warnings persist) into another problem (verifying the warning on line X is really the same warning that is now reported on line Y).

In a GUI IDE, one could conceivably tag (click) each warning's corresponding source and have the IDE remember "this warning is OK, here". But, unless you can encode that in the source itself, it's not portable to other tools.
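(Some compilers *do* let you encode that decision in the source -- just not portably. A C sketch, assuming GCC or Clang; the pragma spelling and the warning name are specific to those compilers:)

#include <stdio.h>

/* GCC/Clang-specific: suppress one named warning for one region of
 * code.  The guard keeps other compilers from choking on the pragma,
 * and the comment records *why* the suppression was judged safe.
 */
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wformat-nonliteral"
#endif
/* 'fmt' is deliberately caller-supplied; all callers pass vetted,
 * compile-time-constant strings from a table, so the usual risk behind
 * -Wformat-nonliteral doesn't apply here.
 */
void log_value(const char *fmt, long value)
{
    printf(fmt, value);
}
#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif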

I'd like something akin to bison's %expect/%expect-rr capability. I.e., if the expected warning WOULD BE generated, it is suppressed. And, if the expected warning (or, a *different* warning) would be generated, an ERROR is signaled.

Of course, in a yacc (bison) grammar, you can put this sort of directive anywhere in the "source" file to achieve the desired result; there's no need to tie it to a specific *line* number -- or statement!

In most languages, that would be sort of useless: "expect 'missing cast'" (yeah, sure. Like *which* one and how *many*??)

And, when the warning fails to materialize, have the compiler *complain*! ("Hey, I know you expected this warning, here. But, it didn't happen. Are you sure? At the very least, your documentation claiming it *would* be here is faulty!")

Agreed. And, too "manual"/disciplined.

I believe people are inherently lazy. If you *require* them to perform some action, there is a good chance that they will (eventually) fail to do so. E.g., remove the comment alerting the developer to the warning (hence, make that a hard error).

OTOH, taking the approach of "treat all warnings as errors" (compiler flag) ends up getting folks to just insert casts to placate the compiler. Without considering the nature of the warning and *why* (if?) their action should compensate, logically.

E.g., when my IDL compiler writes a client-side stub, it litters the source with ("superfluous"?) casts as it marshals the arguments (which may be complex types/structs) and prepares to push them down the wire as "octets". Had someone manually written that stub code, he/she would *probably* have gotten lazy and omitted all those casts -- and perhaps not bothered to think about whether a simple cast *would* solve the problem warned about, or whether there is a bigger issue that is being glossed over (e.g., endian-ness, network/host byte order, data type encoding variations in a heterogeneous environment, etc.)

"Feh. Damn compiler is always warning about that sort of thing. Just ignore it."

Reply to
Don Y

Grrr... s/would be/would NOT be/

Reply to
Don Y

Some static checking tools (Klocwork, Coverity?) have a database and you can set their warnings to "ignore" there.

However, such a tool is too clunky for an edit/compile/test cycle, for my taste. And not seeing the forest for the trees in compile output doesn't help too much either, even if the compiler warnings are declared nonexistent by a later step.

Stefan

Reply to
Stefan Reuther

I have a simple rule: there may be no warnings.

Simple things like a warning about a missing cast will be fixed. Others in my experience are most often the result of bad programming style and _have_ to be fixed. Possibly by not removing the warning but the programmer.

--
Reinhardt
Reply to
Reinhardt Behm

That information is recorded as well. My process documentation runs with four forms: a Review Record Form (where the issues with a component are recorded, along with the people involved in the review), a Change Proposal Form (to determine what should be changed, which is also reviewed before progressing), a Work Instruction Form (which details the specific change permitted to be made and the designated insertion point), and a Problem Report Form (which captures any remaining problems that escape notice before delivery). There is a Project Register which records the events of these activities, so there is some semblance of effort time as an upper bound. However, for the actual time expended, there is a reliance on the individual engineer's journal, if they happen to remember to record such information.

As I am at a conference (on Provably Correct Software) at present, I am away from the metrics record. However, I provide an aggregate effort measure for the Inspection, Functional and Limits Testing per software component, based on the cyclomatic complexity of the component under examination. That works out at:-

For components of cyclomatic complexity 3, 7 and 10, it is a best guess as to how many days/weeks each may take.

I only remember this because I was recently reviewing the recorded metrics for effort expended in this regard. Inspection and Test are a form of review and are covered by a Review Record Form.
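(For concreteness, since not everyone uses the measure day-to-day: the cyclomatic complexity of a single function is, in the usual McCabe formulation, the number of decision points plus one. A small, made-up C example:)

/* Cyclomatic complexity, per function: decision points + 1.
 * Decisions below: the 'for', the 'if', and (if your tool counts
 * short-circuit operators) the '&&'  ->  3 + 1 = 4, or 2 + 1 = 3 for
 * tools that ignore '&&'/'||'.
 */
int clamp_sum(const int *v, int n, int limit)
{
    int sum = 0;
    int i;

    for (i = 0; i < n; i++) {              /* decision 1 */
        if (v[i] > 0 && v[i] <= limit)     /* decisions 2 (and 3) */
            sum += v[i];
    }
    return sum;
}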

Getting to a final requirements specification (one that meets the 6 C's criteria) can utilise up to 60% of the project life-time. However, this upfront effort has a benefit in giving the designers and developers a solid basis from which to work, and a big reduction in the number of errors that arise in the initial specification stage.

You have to look at your metrics regime and design the data collection steps that are important to you. If it is important to you, then you have to work out how you collect it. Time spent on developing an individual component is not that much of a concern for me. I get a reasonably good average figure for my own development rate, but that is aggregate data from my own journals and the development process metrics. It takes a bit of effort to extract that data and I do so from time to time. However, the most important figure, and the one that takes my focus, is the number of errors released to the client (which is satisfyingly low).

I guess only you will be able to answer why you need that data and how important that figure will be for you.

--
******************************************************************** 
Paul E. Bennett IEng MIET..... 
Forth based HIDECS Consultancy............. 
Mob: +44 (0)7811-639972 
Tel: +44 (0)1392-426688 
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk.. 
********************************************************************
Reply to
Paul E Bennett

(snip)

Compiler writers keep adding more warnings, no matter how rare the condition warned about. At some point, the warnings take more time to check than the condition being warned about.

-- glen

Reply to
glen herrmannsfeldt

Exactly.

My thoughts too.

Yes. But it is possible to encode it in the source. The problem is to do it robustly - and in a way that doesn't annoy the programmer.

This sounds like how "my" Ada compiler works:

procedure Warnings is
   Object : constant Boolean := True;
   pragma Unreferenced (Object);
begin
   if Object then -- "warnings.adb:4"
      null;
   end if;
end Warnings;

Compiling:

warnings.adb:4:07: warning: pragma Unreferenced given for "Object"

Definitely!

What I've considered is to keep the "log" of expected warnings as specially formatted comments in the source, and then write a tool which correlates the compiler and tool output with the expected warning markers in the source files.

It is still an extra step to do, but it would be easy to integrate it in my existing build and test framework, once the tool was written.
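For example, the marker might be no more than a specially formatted comment next to the offending line (C syntax for illustration; the "@expect-warning" tag and its format are invented -- it only needs to be something the correlating tool can find and pair with the compiler's diagnostic for that file/line):

/* The "@expect-warning" tag is invented for illustration.  A small tool
 * would scan the sources for these markers, then check the compiler
 * output: complain if an unmarked warning appears, or if a marked one
 * fails to materialize.
 */
int to_percent(double ratio)
{
    /* @expect-warning: -Wfloat-conversion -- truncation is intended;
       ratio is always in [0,1] here */
    return ratio * 100;
}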

Exactly. But what is the solution then? To accept warnings as *warnings* until the reason can be peer-reviewed?

Greetings,

Jacob

--
"Can we feel bad for the universe later?"
Reply to
Jacob Sparre Andersen

By line? Or by kind?

It should definitely be a tool which fits into the edit/compile cycle.

Maybe it should be inserted as a filter on the compiler output?
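A minimal sketch of such a filter, in C for the sake of argument (the expected-warnings file format and the matching rule are invented; it keys on the warning *text* only, so shifting line numbers don't break the comparison -- though it then can't tell two identical warnings apart, which is Don's "which one and how many" objection):

/* warnfilter.c -- a sketch, not a real tool.
 * Reads a list of expected warning texts (each line starting at
 * "warning:") from the file named in argv[1], then filters compiler
 * output on stdin: warnings matching an expected entry are dropped,
 * everything else passes through, and any unexpected warning makes the
 * exit status non-zero.  Assumes GCC/Clang-style
 * "file:line:col: warning: text" diagnostics.
 */
#include <stdio.h>
#include <string.h>

#define MAX_EXPECTED 256
#define LINE_LEN     1024

static char expected[MAX_EXPECTED][LINE_LEN];
static int  n_expected;

static void chomp(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && (s[n - 1] == '\n' || s[n - 1] == '\r'))
        s[--n] = '\0';
}

int main(int argc, char *argv[])
{
    char line[LINE_LEN];
    int unexpected = 0;
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: %s expected.txt < compiler-output\n", argv[0]);
        return 2;
    }
    if ((f = fopen(argv[1], "r")) == NULL) {
        perror(argv[1]);
        return 2;
    }
    while (n_expected < MAX_EXPECTED && fgets(expected[n_expected], LINE_LEN, f)) {
        chomp(expected[n_expected]);
        if (expected[n_expected][0] != '\0')
            n_expected++;
    }
    fclose(f);

    while (fgets(line, LINE_LEN, stdin)) {
        char *w;
        int known = 0, i;

        chomp(line);
        w = strstr(line, "warning:");
        if (w != NULL) {
            /* Compare only the text from "warning:" onward, so changing
             * line numbers in the file:line:col prefix don't matter. */
            for (i = 0; i < n_expected; i++) {
                if (strcmp(w, expected[i]) == 0) {
                    known = 1;
                    break;
                }
            }
            if (!known) {
                printf("UNEXPECTED: %s\n", line);
                unexpected = 1;
            }
        } else {
            puts(line);   /* errors and other output pass through */
        }
    }
    return unexpected;
}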

Greetings,

Jacob

--
"I am an old man now, and when I die and go to Heaven there are two matters 
 on which I hope enlightenment. One is quantum electro-dynamics and the 
 other is turbulence of fluids. About the former, I am rather optimistic." 
 Sir Horace Lamb.
Reply to
Jacob Sparre Andersen
