Safety-Critical Software Design

Hi Everyone,

Are there any formal requirements, guidelines, or recommendations for software which will run in a safety-critical environment in the United States or world-wide?

By a "safety-critical" environment I mean an environment in which a failure can lead to loss of, or serious injury to, human life. For example, automobile navigation systems, medical devices, lasers, etc.

I know there is the MISRA association and MISRA C. I am wondering if there are others.

My gut and experience tell me there should NEVER be software DIRECTLY controlling signals of devices that might lead to human injury. Rather, such devices should be controlled by discrete hardware, perhaps as complex as an FPGA. There is always going to be a chance that a real processor that, e.g., controls the enable signal to a laser is going to crash with the signal enabled.

I realize that hardware-only control is subject to failures as well, but they wouldn't seem to be nearly as likely as a coding failure.

Let me get even more specific: would it be acceptable to use a processor running Linux in such an application? My gut reaction is "Not only no, but HELL no," but I'm not sure if I'm being overly cautious.

Any guidance, suggestions, comments, etc., would be appreciated.

--
Randy Yates, DSP/Embedded Firmware Developer 
Digital Signal Labs 
http://www.digitalsignallabs.com

Several, of course.

One starting point is IEC 61508.

I'm not an expert, but I believe that the standards and requirements do not prohibit or mandate certain designs, but mandate certain analyses and the resulting assurances that the design has the required safety properties.

It is up to the designer to balance the complexity of the design against the complexity of the safety analysis or "safety case".

MISRA is more design-oriented.

Often the design provides a separate control system and a separate safety system that monitors the control system and prevents unsafe behaviour.
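For illustration, a minimal sketch in C of that monitor pattern, with hypothetical names (read_control_output(), force_safe_state(), the limits) standing in for whatever the real hardware provides; the point is only that the monitor checks the safe envelope and never takes part in normal control:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical I/O hooks -- the real ones depend entirely on the hardware. */
extern int32_t read_control_output(void);   /* what the control system is commanding */
extern int32_t read_process_value(void);    /* independent sensor reading */
extern void    force_safe_state(void);      /* de-energize, close valve, etc. */

#define OUTPUT_LIMIT   1000   /* maximum command the monitor will tolerate */
#define PROCESS_LIMIT   800   /* maximum plant value the monitor will tolerate */

/* Independent safety monitor: runs on its own (simpler) processor and only
 * checks that the control system stays inside its safe envelope. */
void safety_monitor_step(void)
{
    bool unsafe = (read_control_output() > OUTPUT_LIMIT) ||
                  (read_process_value()  > PROCESS_LIMIT);

    if (unsafe) {
        force_safe_state();   /* latch the safe state; recovery needs a manual reset */
    }
}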

I don't think that Linux would be prohibited out of hand, but showing that a Linux-based system has the required safety properties is probably harder than for a simpler, more to-the-point implementation.

HTH.

--
Niklas Holsti 
Tidorum Ltd 
niklas holsti tidorum fi 
       .      @       .

There are several regulations:

IEC 61508 for industrial functional safety
ISO 26262 for automotive
DO-178 for avionics
??? for medical

I am not at home with the first two, but I develop avionics systems to DO-178. At its highest safety level (DAL-A) you have to trace every line of code back to system/high-level/low-level requirements and develop test procedures to verify that the software (and that is the whole system, from boot loader to OS, to libraries, to the application) fulfills these requirements, and also that every decision ("if (a && b) ...") and every combination of inputs ("a" and "b" in this example) is correctly processed. Also, every low-level requirement must be derived from high-level requirements, every HLR must be derived from Sys-Reqs, and it must be shown that the step from Sys-Reqs to HLRs covers all Sys-Reqs. The same goes for HLRs to LLRs. You even have to certify or verify all tools that can contribute to errors, or that are used for verification and could lead to not detecting an error during verification.
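To make the verification part concrete, here is a small illustration of my own (not a quotation from DO-178) of testing the "if (a && b)" decision against all input combinations, with each case traceable to a requirement:

#include <assert.h>
#include <stdbool.h>

/* The decision from the example above, isolated so it can be tested directly.
 * Names are illustrative only. */
static bool interlock_ok(bool a, bool b)
{
    if (a && b) {        /* one decision, two conditions */
        return true;     /* TRUE branch  */
    }
    return false;        /* FALSE branch */
}

/* Test procedure exercising both branches and every input combination;
 * each case would be traced back to the low-level requirement it verifies. */
static void test_interlock_ok(void)
{
    assert(interlock_ok(false, false) == false);
    assert(interlock_ok(false, true ) == false);
    assert(interlock_ok(true,  false) == false);
    assert(interlock_ok(true,  true ) == true);
}

int main(void)
{
    test_interlock_ok();
    return 0;
}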

If you think you can do it in hardware instead, such as in an FPGA, something similar applies: there is DO-254 for "complex hardware".

For any software that needs certification to DO-178, e.g. Level A, the hardware has to be certified to the corresponding DO-254 level. Often you will also have not just one CPU but at least two doing the same job, with software written by different teams, and some kind of interlock between them which brings the system into a fail-safe state in case of discrepancies. In such a case you also have to assess what "fail safe" has to be. Shutting down the engines in mid-flight probably is not very fail-safe.
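A rough sketch of that two-channel interlock in C. In a real system each channel runs on its own processor with independently developed software; the names (channel_a_command(), enter_fail_safe(), the tolerance) are placeholders of mine:

#include <stdint.h>

extern int32_t channel_a_command(void);  /* computed by team A's software */
extern int32_t channel_b_command(void);  /* computed by team B's software */
extern void    enter_fail_safe(void);    /* whatever "fail safe" means for this system */
extern void    apply_command(int32_t cmd);

#define MAX_DISCREPANCY 2   /* tolerated difference between the two channels */

void interlock_step(void)
{
    int32_t a = channel_a_command();
    int32_t b = channel_b_command();
    int32_t diff = (a > b) ? (a - b) : (b - a);

    if (diff > MAX_DISCREPANCY) {
        enter_fail_safe();       /* channels disagree: stop trusting either one */
    } else {
        apply_command(a);        /* channels agree within tolerance */
    }
}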

Something like MISRA C does not by itself guarantee any safety. It is just a set of guidelines, mostly about what not to do because a less competent programmer might misuse a feature. Something along the lines of "Somebody has cut himself with a knife, so we forbid the use of knives."
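To give a flavour of what such guidelines forbid (an illustration of mine, with no specific MISRA rule numbers claimed), compare an implicitly truncating function with a guideline-friendly rewrite:

#include <stdint.h>

/* Style typically flagged by MISRA-like guidelines: implicit narrowing
 * conversion (silent truncation from the int result to 8 bits) and an
 * if without braces. For raw/4 above 255 the clamp check sees garbage. */
uint8_t scale_bad(uint16_t raw)
{
    uint8_t out = raw / 4;
    if (out > 200)
        out = 200;
    return out;
}

/* Guideline-friendly version: explicit conversions, braces everywhere,
 * single point of return. */
uint8_t scale_ok(uint16_t raw)
{
    uint16_t scaled = (uint16_t)(raw / 4u);
    uint8_t out;

    if (scaled > 200u) {
        out = 200u;
    } else {
        out = (uint8_t)scaled;
    }
    return out;
}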

The most critical part is the development of the Sys-Reqs, since everything derives from these. This is something that is in principle outside of the realm of the software designers. From what I know about some catastrophic failures, the problem was often rooted in incorrect Sys-Reqs.

One example was the fatal crash, during a landing in heavy crosswind, of an Airbus in Warsaw several years ago. It was caused by an interlock that prevented thrust reversal from being activated unless both wheels signalled "weight on wheels". This interlock had been introduced because a Lauda Air aircraft dropped from the sky over Thailand after thrust reversal was activated during normal flight. The people who changed the Sys-Reqs for the flight software had not thought everything through to its end. The software people were not to blame; they had correctly implemented this "feature" according to the Sys-Reqs.

My guideline is:

- first do safety assessments, identify every potential threat

- develop Sys-Reqs; take into account every dangerous situation and how to handle it (like: have an upper time limit for activation of the laser). Decide what to do in HW and what in SW; the upper time limit might better be a HW monoflop (see the sketch after this list).

- review the Sys-Reqs and fix/freeze them, signed off by the customer. Can every Sys-Req be verified? How do different Sys-Reqs interact to create an additional threat (see the fatal plane crash above)?

- develop HLRs. This will also define the overall design. Do nothing fancy, keep everything simple.

- review and freeze. Make sure each HLR can be verified during integration and verification.

- and so on...
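Referring to the laser time-limit bullet above, a minimal sketch of the software side, with hypothetical names (laser_enable(), millis()) and a made-up 50 ms limit. The HW monoflop is the better backstop because it still enforces the limit when this code has crashed:

#include <stdbool.h>
#include <stdint.h>

extern void     laser_enable(bool on);   /* hypothetical driver call */
extern uint32_t millis(void);            /* hypothetical monotonic ms tick */

#define LASER_MAX_ON_MS 50u   /* upper limit from the (hypothetical) Sys-Req */

static bool     laser_on;
static uint32_t laser_on_since;

void laser_request(bool on)
{
    if (on && !laser_on) {
        laser_on_since = millis();       /* record when the laser was switched on */
    }
    laser_on = on;
    laser_enable(on);
}

/* Called from the periodic tick: enforce the upper time limit in software.
 * A hardware one-shot (monoflop) on the enable line should enforce the same
 * limit independently, so a crashed CPU cannot leave the laser on. */
void laser_watchdog_step(void)
{
    if (laser_on && (millis() - laser_on_since) > LASER_MAX_ON_MS) {
        laser_on = false;
        laser_enable(false);
    }
}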

In parallel develop the requirements for the test cases and develop the tests. These will help you during coding.

Never ever change HLRs or even Sys-Reqs when coding. When such changes need to be done, re-start the process including the reviews and the impact analysis for the changes.

--
Reinhardt

At the very least, keep the safety-critical and non-safety-critical systems well apart, in separate (or even different types of) hardware.

Analyzing the _small_ safety-critical system then becomes possible. Also, some organizations might dictate what data may be transferred between the two systems; some might allow data only out of the safety-critical system into the non-safety-critical one.

Use a voting system. The simplest I have seen was a bar with three solenoids attached, each controlled by a separate system. When at least two systems agree, the bar moves in that direction, and the bar controls something big.
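In software, the same 2-out-of-3 idea for a single on/off command reduces to a majority function (a sketch of mine; for analogue commands one would take the median of the three channel values instead):

#include <stdbool.h>

/* 2-out-of-3 majority vote on a boolean command, the software analogue of
 * the three-solenoid bar: the output follows whatever at least two of the
 * three independent channels agree on. */
bool vote_2oo3(bool a, bool b, bool c)
{
    return (a && b) || (a && c) || (b && c);
}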

Redundancy helps, especially if the systems are made with different hardware and by different programming teams. As a last resort, use springs or gravity to handle power loss and similar problems.

Gravity might be unreliable during an earthquake :-).

It is important that different safety systems are separate from each other and preferably implemented with different technology. Remember Fukushima: they had a lot of redundant emergency cooling diesel generators, but they all got wet from a single tsunami wave, and the rest is history.

The question is, why would you need such a complex general-purpose operating system for running small safety-critical systems?

However, I wouldn't be too surprised finding some heavily stripped down Linux based system.

--
upsidedown

Try searching on the terms?

The standards that I know about are DO-178, MIL-STD-498 and MIL-STD-2167. Searching on some of the standards given should give you some threads to start looking into.

Regardless of your gut and experience, there's plenty of places where software DOES directly control signals or devices that might lead to injury or death -- but the software design methods are much more stringent.

DO-178 lists five levels of criticality, ranging from "E" (the in-flight movie doesn't work) through "A" (smoking hole in the ground surrounded by TV crews). "E" is pretty much "anything goes" -- i.e., go ahead and use commercial software. The rule of thumb is that each time you bump up a level, the amount of work on the software alone goes up by, roughly, a factor of 7, and (as mentioned), the hardware and all the tools used in the software must march with the software design.

I can't say that the details are the same for the FDA-approved stuff, but I've got friends who worked on pacemaker software, and the general vibe was the same. For instance, Protocol Systems, before they were bought by Welch Allyn, would build a whole prototype, do animal tests on it, then design the whole pacemaker again from scratch, using lessons learned. I think Welch Allyn did the same thing.

--
Tim Wescott 
Control systems, embedded software and circuit design 
I'm looking for work!  See my website if you're interested 
http://www.wescottdesign.com

Gosh, lots, here is a start:

For industrial look at IEC 61508, and all its industry-specific derivatives.

For aerospace look at DO-178C.

For rail look at EN50128/EN50129

For automotive look at ISO 26262

For medical look at IEC 62304 and/or FDA 510(K)

etc.....try some simple Google searches for whichever industry you are working in.

--
Regards, 
Richard. 

+ http://www.FreeRTOS.org 
The de facto standard, downloaded every 4.2 minutes during 2015. 

+ http://www.FreeRTOS.org/plus 
IoT, Trace, Certification, TCP/IP, FAT FS, Training, and more...

There are those and they are all different. For medical there is IEC 62304. It's a holistic process standard.

One such approach is outlined in books by Bruce Powel Douglass. It's a bit "executable UML" in its aroma, but the principles apply.

"Safety critical" really expands to fill the process sphere of the project - it's well beyond just tools and paradigm selection.

While I'm sympathetic to this sentiment, I think it's rather an odd one. I've fixed too many hardware and FPGA problems in software to be too sympathetic.

I'd rather use Linux as a comms concentrator in front of a set of actual microcontrollers with much smaller code bases - hopefully code bases verging on being provably correct (for some weak version of "proof"). Hopefully, you know what I mean by that.

This being said, Tiny Linux seems pretty stable.

I've seen ( and coded for ) platforms using Linux for industrial control which had the aroma of safety criticality but the "safety" bits were often as you say - discrete signals for emergency stop and the like.

All the observed failures were systems/hardware failures, not software (after an appropriate field test and significant pre-field testing). Not that there were no bugs, just that the bugs were not critical.

One warning: the operators were trained and employed by the same firm that developed the software.

--
Les Cargill
