Requesting critique of a C unit test environment

YMMV of course, but if I could get Donald Knuth to prove my programs correct "by hand", I'd feel no need for additional confidence.

--
Ben Pfaff 
http://benpfaff.org
Reply to
Ben Pfaff

What did y'all do if the "formal" test failed?

What I look for is this: Replicate the failure as a short unit test. Not a proof - just a stupid test that fails because the code change needed to fix that formal test isn't there.

The point is to make the fast tests higher value as you go...
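To make that concrete, here is a minimal sketch (not anyone's real code; parse_level() and the failure it pins down are invented): the unit test captures exactly the behaviour the formal test complained about, and it keeps failing until the real fix goes in.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical function under test; this stub stands in for the
     * current, still-broken implementation that failed the formal test. */
    static int parse_level(const char *s)
    {
        return (s[0] >= '1' && s[0] <= '9') ? s[0] - '0' : -1; /* rejects "0" */
    }

    /* The "stupid test": it fails until the fix is actually in the code. */
    static void test_parse_level_accepts_zero(void)
    {
        if (parse_level("0") != 0) {
            fprintf(stderr, "FAIL: parse_level(\"0\") should return 0\n");
            exit(EXIT_FAILURE);   /* non-zero exit fails the test run */
        }
    }

    int main(void)
    {
        test_parse_level_accepts_zero();
        puts("OK");
        return 0;
    }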

--
 Phlip
 http://www.oreilly.com/catalog/9780596510657/
Reply to
Phlip

No, MMIS[1]. I did not intend to disparage Prof. Knuth's "hand proofs" (what a thought!) but rather to say that the problem he is referring to is as likely to be that one proves something other than the program one has written (or later writes) as it is to be that one's proof is (internally) flawed.

I suspect that he is not entirely happy with the way that quip is used so often to suggest the pointlessness of proofs[2] (after all, what did he choose to do with his "Notes on van Emde Boas construction of priority deques" -- a proof rather than a test implementation!).

[1] "My mileage is similar". [2] This not one of those times -- RH was just countering the much stronger assertion that proof => no need to test.
--
Ben.
Reply to
Ben Bacarisse

On Aug 27, 4:17 pm, Ben Bacarisse wrote: [snip]

Aside: Working vEB tree implementation here:

formatting link

Reply to
user923005

If performed, internal formal testing is still a step away from developer testing.

How so? A unit test suite doesn't just vanish when the code is released; it is an essential part of the code base.

That depends on your definition of Acceptance tests. In our case, they are the automated suite of tests that have to pass before the product is released to customers.

Again, that depends on your process.

Why? Our acceptance tests are very comprehensive, written by professional testers working with a product manager (the customer).

It sounds like you don't have fully automated acceptance tests. Wherever possible, all tests should be fully automated.

--
Ian Collins.
Reply to
Ian Collins

Right. One problem is that they don't always prove what you asked them to prove. What you actually want to know is "does this program properly do what I need it to do?", but what a prover actually tells you is whether program X conforms to a particular expression of specification Y. It makes no comment whatsoever on whether specification Y corresponds to wishlist Z. And, very often, such correspondence is far from perfect.
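A tiny illustration of that gap (my own example, not Richard's; the names are invented): suppose the formal specification Y only says "on return the array is in non-decreasing order". The function below provably meets Y, yet it is useless for the wishlist-Z job of sorting the caller's data, because nobody wrote down that the output must be a permutation of the input.

    #include <stddef.h>

    /* Provably satisfies "a[0] <= a[1] <= ... <= a[n-1]" - every element
     * is zero - but the spec forgot to demand a permutation of the input,
     * so a prover would happily accept it. */
    void sort(int *a, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            a[i] = 0;
    }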

--
Richard Heathfield 
Email: -www. +rjh@
Google users: 
"Usenet is a strange place" - dmr 29 July 1999
Reply to
Richard Heathfield

Perhaps because this is an example of a medical product where that is felt to be required. There is a list of various kinds of "structural coverage", with the kind selected commensurate with the level of risk posed by the software. (Saying something has "coverage," I think, always implies 100% coverage, too. Not partial. So you either have coverage or you don't.)

Borrowing from one of the US CDRH PDFs I have lying about:

* Statement Coverage - This criteria requires sufficient test cases for each program statement to be executed at least once; however, its achievement is insufficient to provide confidence in a software product's behavior.

* Decision (Branch) Coverage - This criteria requires sufficient test cases for each program decision or branch to be executed so that each possible outcome occurs at least once. It is considered to be a minimum level of coverage for most software products, but decision coverage alone is insufficient for high-integrity applications.

* Condition Coverage - This criteria requires sufficient test cases for each condition in a program decision to take on all possible outcomes at least once. It differs from branch coverage only when multiple conditions must be evaluated to reach a decision.

* Multi-Condition Coverage - This criteria requires sufficient test cases to exercise all possible combinations of conditions in a program decision.

* Loop Coverage - This criteria requires sufficient test cases for all program loops to be executed for zero, one, two, and many iterations covering initialization, typical running and termination (boundary) conditions.

* Path Coverage - This criteria requires sufficient test cases for each feasible path, basis path, etc., from start to exit of a defined program segment, to be executed at least once. Because of the very large number of possible paths through a software program, path coverage is generally not achievable. The amount of path coverage is normally established based on the risk or criticality of the software under test.

* Data Flow Coverage - This criteria requires sufficient test cases for each feasible data flow to be executed at least once. A number of data flow testing strategies are available.
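As a rough sketch of how the first few of those criteria differ (my own illustration, not from the CDRH text; grant_access() is made up), consider one decision built from two conditions:

    /* One decision ("is_admin || has_token") made of two conditions. */
    int grant_access(int is_admin, int has_token)
    {
        if (is_admin || has_token)
            return 1;
        return 0;
    }

    /*
     * Statement coverage:   (1,0) and (0,0) suffice - both return
     *                       statements execute.
     * Decision coverage:    the same two cases make the decision both
     *                       true and false.
     * Condition coverage:   each condition must be seen both true and
     *                       false; since || short-circuits, has_token is
     *                       only evaluated when is_admin is 0, so e.g.
     *                       (1,0), (0,1) and (0,0) are needed.
     * Multi-condition:      all four combinations (0,0) (0,1) (1,0) (1,1).
     */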

For potentially high risk software, you may not just use a different compiler or a different operating system environment, or even change the optimization options. As the OP mentioned, it's probably going to be enough of a job just justifying an instruction simulator.

I can easily see a desire for an automated way of demonstrating that structural testing has achieved one or more of these cases. If I read the OP right about this, anyway.

Jon

Reply to
Jonathan Kirwan

Sed quis custodiet ipsos custodes? (But who will guard the guards themselves?)

Richard

Reply to
Richard Bos

The why was prompted by the posting subject "critique of a C unit test environment". To my way of thinking (TDD), unit tests are a developer tool, not formal product tests.

--
Ian Collins.
Reply to
Ian Collins

|"[..]
|
|YMMV of course, but if I could get Donald Knuth to prove my
|programs correct "by hand", I'd feel no need for additional
|confidence."

Such as the way Donald E. Knuth told Leslie Lamport that TeX would hardly change at all? From

formatting link
:"[..] [..] When Don was writing TEX80, he announced that it would be a reimplementation of TEX78, but he was not going to add new features. I took him seriously and asked for almost no changes to TEX itself. [..] However, there were many other im- provements that I could have suggested but didn't. In the end, Don wound up making very big changes to TEX78. But they were all incremental, and there was never a point where he admitted that he was willing to make major changes. Had I known at the begin- ning how many changes he would be making, I would have tried to participate in the redesign. [..] [..]"

Regards, Colin Paul Gloster

Reply to
Colin Paul Gloster

|"Ben Bacarisse said:
|
|> Richard Heathfield writes:
|>
|>> Erik Wikström said:
|>>
|>>> Testing is used to find errors, while formal methods are used to
|>>> prove that there are no errors, at least that's the goal. So if you
|>>> can prove that there are no errors why test for them?
|>>
|>> "Beware of bugs in the above code; I have only proved it correct, not
|>> tried it." - Donald E Knuth.
|>
|> But this was a "by hand" proof in 1977. A machine assisted proof of
|> the actual code could be expected to inspire a little more confidence.
|
|Why? Presumably the machine that is doing the assisting is itself a
|computer program. What makes you think the assistance program is
|correct?"

Full points to Mister Heathfield.

Reply to
Colin Paul Gloster

formatting link

"Principle 4

" * Level out the workload (heijunka). (Work like the tortoise, not the hare).

"This helps achieve the goal of minimizing waste (muda), not overburdening people or the equipment (muri), and not creating uneven production levels (mura)."

formatting link

--
 Phlip
Reply to
Phlip

I will second that, well put.

The Achilles heel for either Testing or formal methods is contaminating the evaluation process with information from the implementation.

I have seen unit tests contaminated by nothing more than knowledge of the application area the code was going to be used in.

w..


Reply to
Walter Banks


Yes. However, above you said that it should not matter for unit testing whether you use the same compiler or not. Since unit testing can be, and often *is*, formal, such a statement is at least misleading. Had you said that it did not matter for informal testing, and had the OP been asking about informal testing, you might have a point, but it was never stated that the unit testing was informal.

Simple. If it is not formal then you (the next developer) have no guarantee that it is in a usable state. So you, the next developer, have to fully validate any tests you will rely on during your development.

Yes, this could be a matter of definition. To me an acceptance test is the customer coming in and witnessing some pre-agreed tests where if they pass the customer will accept the SW and/or HW (and pay for it). It has nothing to do with whether the company is prepared to give the SW to the customer.

I've not worked for a company where they would be prepared to try and get a customer to accept SW before having a decent level of confidence that it is correct *and* acceptable to the customer.

It is not possible, at reasonable cost, to fully automate all testing. On a number of projects I have worked on, the formal testing included deliberately connecting up the system incorrectly (and changing the physical wiring whilst the SW is running), inducing faults in the HW that the SW was intended to test, responding both correctly and incorrectly to operator prompts, putting a plate in front of a camera so that it could not see the correct image whilst the SW is looking at it, swapping a card in the system for a card from a system with a different specification, etc. It would literally require a robot to automate some of this testing, and some of the rest of it would require considerable investment to automate. Compared to the cost of the odd few man-weeks to manually run through the formal testing with a competent witness, the cost of automation would be stupid.

BTW, on the SW I am mainly thinking of, there were so few bug reports that on one occasion when the customer representative came to us for acceptance testing, a few years after the previous version, both the customer representative and I could remember all of the fault reports and discuss why I knew none of them were present in the new version. The customer representative was *not* a user (he worked for a "Procurement Executive" and not for the organisation that used the kit), so he would not have seen it for several years.

If you doubt the quality of the manual testing, then look at how many 50,000-line pieces of SW have as few as 10 fault reports from customers over a 15 year period. Most of those fault reports were in the early years, and *none* were after the last few deliveries I was involved in.

BTW, if they are still using the SW at the start of 2028 we have a problem, but that is documented and could easily be worked around.

--
Flash Gordon
Reply to
Flash Gordon

We all work from our own point of reference; in mine, unit tests are a developer tool, so that's why I answered as I did.

Again, as one who uses TDD, the tests are always up to date as they document the workings of the code. All down to process.

Ah, that explains a lot!

Neither have I.

True, but with care you can automate the majority of them. The beauty of automated tests is they cost next to nothing to run, so they can be continuously run against your code repository.

There you have what I'd call integration testing, something we also do with any software that interacts with other equipment.

I don't doubt it, I just prefer to spend my resources elsewhere. We go through the full manual integration tests for major software releases (adding acceptance and unit tests to reproduce any bugs found). This process of feeding back tests into the automated suites makes them progressively more thorough, to the extent that minor updates can be released without manual testing and the testing of major releases finds few, if any, bugs. Most of the bugs found by the manual testing are differing interpretations of the specification.

--
Ian Collins.
Reply to
Ian Collins

You should try to avoid assuming everyone works the same way. In the defence industry at least it is very common for there to be a lot of formal unit tests.

If the process is enforced then the testing is formal and, I would expect, the results are recorded somewhere the 10th developer after you will be able to find them.

Acceptance tests are used to accept, simple :-)

So you do your acceptance tests before the customer sees the kit?

I fully understand the use of them. However, it is not always either practical or cost effective. In this case there was no automated test system available, so if we wanted one we would have had to design, implement and test it, then write all the test harnesses...

Almost forgot, we would have had to generate and validate a *lot* of test data instead of just using real kit either with or without faults.

At the end of the day we would also have had to do thorough integration testing as well. So I still believe doing automated testing would have been more expensive overall, and certainly would have been a significant up-front cost.

Note that this SW does a *lot* of HW interaction, since it is actually the main SW of a piece of 2nd line test equipment.

Yes and no. Each set of tests was focused on exercising a specific unit, it was just using the rest of the SW as a test harness.

Obviously. We just killed multiple birds with the same high-tech missile^W^W^Wstone.

I still don't believe it cost more time overall.

We also added tests to trap the few bugs that were found.

We started off by making the tests thorough which is why the testing takes so long. Due to this and the low bug count almost all releases whilst I worked at the company were major releases (adding support for major variants of the kit it tested, testing major new features in new versions of the kit it tested etc) with only a small number of bug-fix releases.

Not on this SW. Reviews of requirements caught most of them and reviews of design most of the remainder. I can only think of one interpretation issue on the SW that was not caught before coding started on this SW.

--
Flash Gordon
Reply to
Flash Gordon

The results are recorded every time the tests run - either "OK" or failure messages :)

They are run as soon as the feature they test is complete.

If the project is long running, or a family of products is to be maintained, it can be worth the effort. I preferred to have my test engineers developing innovative ways to build automatic tests rather than have them running manual tests. Provided they can produce the tests at least as fast as the developers code the features, everyone is happy.

I like to capture all of the data generated during manual tests and feed it back through as part of the automated tests.

The examples I'm referring to were power system controllers.

This project has been running (the product has to continuously evolve to meet the changing market) for 5 years, so the up-front cost has paid for itself many times over.

--
Ian Collins.
Reply to
Ian Collins

I think Ian refers to "developer tests". Giving them different definitions helps. They have overlapping effects but distinct motivations.

The failure of a unit test implicates only one unit in the system, so the search for a bug should be very easy. The failure of a developer test implicates the last edit - not that it inserted a bug, but only that it failed the test suite! Finding and reverting that edit is easier than debugging.

They are more cost effective than endless debugging!!!

--
  Phlip
  http://www.oreilly.com/catalog/9780596510657/
  "Test Driven Ajax (on Rails)"
  assert_xpath, assert_javascript, & assert_ajax
Reply to
Phlip

They are only recorded if they are put somewhere that someone can see them after you have left the company. Otherwise they are only reported.

And all re-run after the final line of code is cut, I trust.

I started on it in the late 80's, and the last I heard was a contract signed giving an option of support until 2020; is that long enough for you?

Only half a dozen or so variants, all using over 90% common code.

Ah, but we did not spend vast amounts of time running the tests, not compared to the time/effort involved in generating the required test data, automating the tests, and then writing the integration tests needed to prove it works as an entire system.

We did not have the luxury of dedicated test developers. Those developing the tests were those analysing the requirements, designing the SW and implementing it.

That would require writing a lot of SW to capture the data. All of which would have to be tested.

I'm talking about 2nd line test equipment for *very* high end camera and image processing systems. 2nd line is the kit the customer puts it on when it has come back from operation broken.

Ah well, the SW I'm referring to changes only every few years due to new customers or existing customers wanting enhancements to the kit it is to test. The last set of updates I'm aware of will have started probably in 2001 (maybe 2000), but I had left the company by then. I know we had won the contract. So definitely over twice as long a period. Requirements changes also had minimal code impact because we had designed the system to allow for changes.
--
Flash Gordon
Reply to
Flash Gordon

The tests are part of the project, in the same source control. Without the tests, the project cannot build. Building and running the tests is an integral part of the build process.

The last sentence is important, so I'll repeat it - the unit tests are built and run each time the module is compiled.

Rerun every build, dozens of times a day for each developer or pair.
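For what it's worth, one way to wire that in (a sketch with made-up names; widget_add() and widget_test.c are not from any real project): each module has a tiny test program that is compiled and executed as part of that module's build step, and any non-zero exit status fails the build.

    /* widget_test.c - built and run whenever the widget module is rebuilt;
     * the build rule treats a non-zero exit status as a build failure. */
    #include <assert.h>
    #include <stdio.h>

    /* Stub standing in for the real widget_add() so the sketch is
     * self-contained. */
    static int widget_add(int a, int b) { return a + b; }

    int main(void)
    {
        assert(widget_add(2, 2) == 4);    /* the module's unit tests */
        assert(widget_add(-1, 1) == 0);
        puts("widget: OK");
        return 0;                         /* 0 = success, build continues */
    }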

--
Ian Collins.
Reply to
Ian Collins
