computer reliability

Found this recently:

++++++++++

Subject: Tech worker: 'Blue screen of death' on oil rig's computer

Gregg Keizer, *Computerworld*, 26 Jul 2010

A computer that monitored drilling operations on the Deepwater Horizon had been freezing with a [BSOD] prior to the explosion that sank the oil rig last April, the chief electronics technician aboard testified Friday at a federal hearing.

In his testimony Friday, Michael Williams, the chief electronics technician aboard the Transocean-owned Deepwater Horizon, said that the rig's safety alarm had been habitually switched to a bypass mode to avoid waking up the crew with middle-of-the-night warnings.

Williams said that a computer control system in the drill shack would still record high gas levels or a fire, but it would not trigger warning sirens. He also said that five weeks before the April 20 explosion, he had been called to check a computer system that monitored and controlled drilling. The machine had been locking up for months. "You'd have no data coming through." With the computer frozen, the driller would not have access to crucial data about what was going on in the well.

The April disaster left 11 dead and resulted in the largest oil spill in U.S. history.

==========

What can I say? MS Windows should not be used for safety-critical systems in any way.

JosephKK

Old news:

The Yorktown lost control of its propulsion system because its computers were unable to divide by the number zero, the memo said. The Yorktown's Standard Monitoring Control System administrator entered zero into the data field for the Remote Data Base Manager program. That caused the database to overflow and crash all LAN consoles and miniature remote terminal units, the memo said.

formatting link
dead-in-the-water.aspx

Richard Henry

I didn't know BSOD was $MS?

Jamie

I think the following forum should be of interest to anyone using computers:

formatting link

--
Muzaffer Kal

DSPIA INC.
ASIC/FPGA Design Services

http://www.dspia.com
Muzaffer Kal

formatting link

-----------------------------------------
Waaayyyy too much reading to do in a reasonable amount of time. If you can point to any documentation that would be applicable to the subject of this thread, please do so. I'm not a Windows proponent, but since it's the OS that runs all of the apps that I need and like, it's the one that I use and prefer until something much better comes along.

Also, the BSOD can be attributed to Windows malfunction or misconfiguration, a hardware failure, or application software failure or misconfiguration. I haven't heard whether the actual cause of the BSOD was ever determined. Until that can be known, you can't put the blame on the OS. At any rate, the brunt of the blame should rest on the computer tech, since, apparently, the problem was never resolved.

As to the Yorktown issue, that problem was most likely an application software deficiency, not the OS. Any software developer worth 10% of his pay will trap and handle bad data entry occurrences, which is what that was. If the application software calculates and attempts to use a zero value in a calculation, it should detect that and handle it so as not to crash either the OS or the application.

--
David
dgminala at mediacombb dot net
Dave M
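Dave M's point about trapping bad entries before they reach a calculation can be sketched in Python. This is a minimal illustration under stated assumptions, not the Yorktown software (the thread says nothing about its internals): `parse_divisor` and `compute_rate` are hypothetical names, and returning `None` for a rejected entry is just one possible policy.

```python
import math
from typing import Optional


def parse_divisor(raw: str) -> Optional[float]:
    """Hypothetical entry check: return a usable divisor, or None to reject the entry."""
    try:
        value = float(raw)
    except ValueError:
        return None  # not a number at all
    if not math.isfinite(value) or value == 0.0:
        return None  # zero, NaN and infinity are all refused up front
    return value


def compute_rate(total: float, raw_divisor: str) -> Optional[float]:
    """Hypothetical calculation that refuses to divide instead of crashing."""
    divisor = parse_divisor(raw_divisor)
    if divisor is None:
        return None  # caller can flag the bad entry and keep running
    return total / divisor


if __name__ == "__main__":
    for entry in ["4", "0", "abc", "nan"]:
        print(entry, "->", compute_rate(100.0, entry))
```

Here `compute_rate(100.0, "4")` returns `25.0`, while `"0"`, `"abc"` and `"nan"` all come back as `None`, so the caller can report the entry and carry on rather than take everything down.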

Related story in latest comp.risks says they turned off the alarm system at night so workers could sleep and not have to wake up for the frequent false alarms at 3:30 :(

Grant.

Grant

formatting link
Whoever wrote the data entry program should be strung up by the balls for NOT checking the validity of EVERY parameter entered during entry! There is absolutely NO excuse!

Robert Baer

Robert Baer wrote:

The Rules of Operating System Design:
#1 Applications must never crash the OS.
#2 APPLICATIONS MUST NEVER CRASH THE OS.

JeffM


It's really hard to armchair-analyze the BSOD. In an industrial environment, you have sensors going to I/O boards, noise spikes, etc. This can easily be a hardware problem.

I've had USB sound cards lock up Linux in the past. Current ALSA seems a bit more robust.

miso

MS Windows should not be used for _any_ systems. ;-)

formatting link

What kind of idiot "programmer" fails to check for a divide-by-zero condition? Maybe I'm O/C, but when I write a program that uses data, I mercilessly limit-check the data - of course, what action to take with bad data would depend on the application.

And I certainly wouldn't do it on a Doze platform!

Cheers! Rich

Rich Grise
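Rich's "mercilessly limit-check the data" approach, with the response left up to the application, might look like the following Python sketch. `limit_check` and the `OnBadData` policies are hypothetical; which policy is appropriate (rejecting, clamping, or something else entirely) depends on the system, as he says.

```python
import math
from enum import Enum


class OnBadData(Enum):
    """Hypothetical policies for out-of-range input."""
    REJECT = "reject"  # refuse the value; the caller decides what happens next
    CLAMP = "clamp"    # pin the value to the nearest limit


def limit_check(value: float, low: float, high: float, policy: OnBadData) -> float:
    """Return an in-range value, or raise ValueError if the policy is REJECT."""
    if low > high:
        raise ValueError("low limit exceeds high limit")
    if math.isnan(value):
        raise ValueError("value is not a number")  # never clamp NaN
    if low <= value <= high:
        return value
    if policy is OnBadData.CLAMP:
        return min(max(value, low), high)
    raise ValueError(f"{value} is outside [{low}, {high}]")


if __name__ == "__main__":
    print(limit_check(12.0, 0.0, 10.0, OnBadData.CLAMP))  # 10.0
    try:
        limit_check(12.0, 0.0, 10.0, OnBadData.REJECT)
    except ValueError as err:
        print("rejected:", err)
```

NaN is refused under both policies, since clamping it would silently pass garbage downstream.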

formatting link

On a Linux system, when an app crashes it doesn't take down the whole furshlugginer system.

Cheers! Rich

Rich Grise

No. The OS must not be *able* to be crashed by an application. *WHATEVER* mischief the application tries to get into.

krw
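The principle krw states can be illustrated, by analogy, at the process level. In this Python sketch a supervising process runs a hypothetical "application" in a separate OS process; the child dies on an unhandled division by zero, yet the supervisor only sees a nonzero exit status and keeps going, because the OS keeps each process's memory separate. It demonstrates process isolation as seen from user space, not kernel hardening itself.

```python
import subprocess
import sys

# A hypothetical "application" that crashes on an unhandled division by zero.
CHILD_CODE = "print(1 / 0)"


def run_isolated(code: str) -> int:
    """Run Python code in a separate OS process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # keep the child's traceback out of our output
        text=True,
        timeout=30,
    )
    return result.returncode


if __name__ == "__main__":
    status = run_isolated(CHILD_CODE)
    # The child died (an uncaught exception gives exit status 1), but this
    # supervising process is unaffected and can log, restart or raise an alarm.
    print(f"child exited with status {status}; supervisor still running")
```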

BSODs are usually caused by a bug in the OS itself -- some user mode application makes a system call, and some driver or other part of the OS doesn't check parameters or whatever and -- poof! -- a bug causes a critical bit of memory to be overwritten or some important process table trashed.

What people are really saying is that, "those writing device drivers and the OS itself need to be held to a higher standard than those just writing user mode apps," and I'd agree with that. Writing device drivers is also not the kind of thing you usually see beginning programmers do (there is no "Windows Device Drivers for Dummies" or "Windows Device Drivers in 24hrs" book out there -- yet). Nevertheless, over time there have been plenty of buggy drivers written by well-known companies that certainly had the resources to do better. E.g., some Creative Labs Sound Blaster drivers would crash and burn on multi-processor PCs, because they didn't bother to appropriately lock and synchronize access to their various queues and other data structures. They had this problem for years, and chose to ignore it because, up until the point that Intel started putting multiple cores on a single IC (and true multi-processing became inexpensive), it only affected high-end users and "enthusiasts" with dual- or quad-CPU motherboards, and Creative felt that was a tiny enough market that they could ignore it. :-(

---Joel

Joel Koltner
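The synchronization bug Joel describes comes from updating shared queues from more than one CPU without a lock. The Python sketch below shows the principle in user space: every access to the hypothetical `LockedQueue`'s items and its running count goes through one lock, so the two stay consistent however many threads call `put` at once. A real driver would use the kernel's own primitives (spin locks and the like) rather than Python threads.

```python
import threading
from collections import deque
from typing import Any, List


class LockedQueue:
    """Hypothetical shared queue: items and count are only touched under one lock."""

    def __init__(self) -> None:
        self._items: deque = deque()
        self._count = 0  # must always equal len(self._items)
        self._lock = threading.Lock()

    def put(self, item: Any) -> None:
        with self._lock:  # append and count update happen as one unit
            self._items.append(item)
            self._count += 1

    def drain(self) -> List[Any]:
        with self._lock:
            items = list(self._items)
            self._items.clear()
            self._count = 0
            return items

    def __len__(self) -> int:
        with self._lock:
            return self._count


def producer(q: LockedQueue, n: int) -> None:
    for i in range(n):
        q.put(i)


if __name__ == "__main__":
    q = LockedQueue()
    threads = [threading.Thread(target=producer, args=(q, 10_000)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(q))  # 40000
```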

Ok, but that doesn't change the point; *nothing* in user-mode should *ever* crash the OS. This failure was one caused by exactly this (invalid entry).

Certainly.

Like M$. How many times have they done kernel mode things in user mode?

Creative has always ignored reliability. Their products have *always* sucked as badly as M$, or worse. I'm surprised they've survived.

krw

miso@ sushi.com wrote:

You're not keeping up with the thread. ...and the fact that the term even exists and is widely recognized is evidence that that platform is the wrong choice.

1) In 1997, the guided-missile cruiser USS Yorktown was dead in the water for over an hour because **an app** tried to divide by zero, showing that the OS (NT) is unsuitable for mission-critical operations. The point is, on a properly designed OS, the *application* layer shouldn't be permitted to take down the OS, thereby taking down the entire system it controls.
2) In 2010, the Deepwater Horizon was running NT (again, shown unsuitable for mission-critical operations) and was so unreliable that the operator disabled parts of the system.
formatting link
"Deepwater.Horizon"+DrillWorks+%2BDrillWorks+-inurl:groups+-JeffM&hl=all
formatting link
"DrillWorks software operates [only] on Windows 95/98/NT4.0"

The logical thing for that multi-billion-dollar corp to have done was:

1) Stop using the unreliable OS.
2) Select a reliable OS (to which they have the source code).
3) Hire someone to write an app that does the task, with the corp RETAINING FULL RIGHTS TO THE SOURCE CODE.

This isn't rocket surgery.

...and while Keith criticized my syntax, *he* did get my point. Joel is leaning in the right direction too.

Well, as an example, there's OpenBSD, which is widely known as an extremely stable platform.

formatting link
...and if apps are open and written to POSIX standards, they can be run on *many* platforms.

...but, when presented with a life-and-death situation, people who go immediately to *Windoze* are clearly clueless.

JeffM

EDICT-0001

The term buggy whip exists, but you hardly hear it mentioned these days. BSODs used to be common, but with protected memory, they are rare.

BSD is no picnic either. You just don't hear much about it because it is usually only used in servers, and servers don't tend to get a lot of weird ass stuff attached to them. Servers run a very small subset of the available software.

miso


Creative survives because everyone else is just as bad. On desktop Linux boxes, the only sound cards I run are C-Media boards. At least the drivers work.

Firewire devices seem to be very reliable. What did they get right that USB didn't?

miso


USB is simply a souped-up keyboard-and-mouse serial data interface: 5V, a D+/D- data pair, and 0V, over only a limited length of shielded cable. I think USB3 introduces some LVDS-style differential-lane tricks for the higher-speed link options, like the SATA serial data connection.

Grant.

Grant


And slew-rate controls to keep the EMI in check... what I have the patents for (Intel only knows fast, zippo on analog :-) ...Jim Thompson

--
| James E.Thompson, CTO                            |    mens     |
| Analog Innovations, Inc.                         |     et      |
| Analog/Mixed-Signal ASIC's and Discrete Systems  |    manus    |
| Phoenix, Arizona  85048    Skype: Contacts Only  |             |
| Voice:(480)460-2350  Fax: Available upon request |  Brass Rat  |
| E-mail Icon at http://www.analog-innovations.com |    1962     |

                   Spice is like a sports car... 
     Performance only as good as the person behind the wheel.
Jim Thompson

Well, concerning noise spikes, I connected a 2.5 kV regulator tester to my PC via an A-D/DIO board, and it was a simple matter to filter and isolate the "bazz-fazz" from the tester so that there would be no problems with the PC. Hardware problem? Yess.. Fixable? Yess..

Robert Baer
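Robert's fix was in hardware, and the thread doesn't say how his filter was built. As a software-side complement only (a different technique, assumed here for illustration), a three-sample sliding median is a common way to keep isolated spikes out of sampled A/D data while still passing genuine step changes. `median3` is a hypothetical name.

```python
from statistics import median
from typing import Iterable, List


def median3(samples: Iterable[float]) -> List[float]:
    """Sliding median over 3 samples; a single-sample spike never reaches the output."""
    data = list(samples)
    if len(data) < 3:
        return data  # too short to filter; pass through unchanged
    out = [data[0]]  # first sample has no left neighbour, keep it as-is
    for i in range(1, len(data) - 1):
        out.append(median(data[i - 1 : i + 2]))
    out.append(data[-1])  # last sample has no right neighbour, keep it as-is
    return out


if __name__ == "__main__":
    print(median3([1.0, 1.0, 50.0, 1.0, 1.0]))  # spike removed
    print(median3([0.0, 0.0, 5.0, 5.0, 5.0]))  # real step preserved
```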

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.