Self-restarting property of RTOS - how does it work?

I have noted previously here in c.a that most of the _really_ good low-level/systems programmers I know seem to have an engineering instead of computer science background.

A coincidence?

Terje

--
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

Like I said, only if you define "crashing" as a subset of "software problem." If you *by definition* say that only software problems are crashes, then you can of course conclude that only software problems are crashes - a tautology. If you define crashing as a certain class of undesirable system behaviors (as I do), then it doesn't matter whether the described behavior is caused by an infinite loop with interrupts turned off or a bad RAM chip.

Reply to
Guy Macon

Yeah, let the software guys figure out how to work around it :-)

It does point to one thing: hardware errors are really hard to fix, because you need to ship new, physical product. Software can be fixed after it has shipped at little extra expense to the manufacturer (but potentially a lot for the customer).

Casper

--
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Reply to
Casper H.S. Dik

Therefore, it would be foolish, from a business point of view, to build software to the same standards of correctness as hardware. And sure enough, it doesn't happen.

Whether software companies are getting the cost/quality balance right is debatable, but the idea that they should aim for the same balance point as hardware companies is surely wrong.

Reply to
Ken Hagan

Let me add another example of a HW problem leading to a crash: putting a quickly recurring interrupt on the NMI input. If an interrupt burst lasts long enough it will bring the micro to a halt. Even a short burst can produce a large stack depth, as context is saved repeatedly, overwriting areas of memory used by other parts of the program and causing problems later in execution. I'd call that a crash, and it's not solvable in software. There are a number of hardware solutions of greater and lesser acceptability.
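
A rough back-of-the-envelope sketch of the arithmetic (the frame size and stack budget below are invented for illustration, not taken from any particular part):

#include <stdio.h>

#define CONTEXT_FRAME_BYTES 32u   /* registers + return address per NMI entry (assumed) */
#define ISR_STACK_BYTES     1024u /* stack budget available for interrupt nesting (assumed) */

int main(void)
{
    unsigned used = 0, nesting = 0;

    /* Model a burst of NMIs arriving faster than the handler can retire:
     * each new NMI pushes another context frame before the previous one
     * has been popped. */
    while (used + CONTEXT_FRAME_BYTES <= ISR_STACK_BYTES) {
        used += CONTEXT_FRAME_BYTES;
        nesting++;
    }
    printf("stack exhausted after %u nested NMIs (%u bytes)\n", nesting, used);

    /* The next NMI past this point overwrites memory belonging to other
     * parts of the program, which is why the cure has to be in hardware
     * (gate or rate-limit the NMI source), not in the handler itself. */
    return 0;
}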

Robert

Reply to
R Adsett

I disagree on both counts. Software *on a PC* can be fixed after it ships, but PCs are only a small but visible part of the total worldwide production of computer hardware/software. I have produced toys that sold more units per year than all the PC manufacturers combined, and the software was in masked ROM. It was also built to the same standards of correctness as the hardware. Over on the other end of the price/quality spectrum, I have worked on aircraft parts, and here too the software was built to the same standards of correctness as the hardware. This is true of nearly all embedded systems.

--
Guy Macon
Reply to
Guy Macon

There's no proof that embedded systems are not engineered to a higher standard; I'm certain they are (well, not small household electronics, and not even cars; I've encountered bugs in all of my embedded electronic devices, including my car, TVs, DVD player, etc.).

Casper

Reply to
Casper H.S. Dik

That's why the historical succession from masked ROMs to PROMs, EPROMs, EEPROMs, flash, etc. is important. Each step eased the fixing process. Another similar area is PLAs replacing random wired logic.

--
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article.  Click on 
 "show options" at the top of the article, then click on the 
 "Reply" at the bottom of the article headers." - Keith Thompson
Reply to
CBFalconer

First of all, I dare say this kind of hardware is mostly, if not totally, software, as these processors are synthesized from (Verilog? VHDL?) building blocks. Therefore some or most of software engineering applies, I guess.

I think Ganssle meant the mindset much more than the knowledge/expertise area itself. Hardware design and development also carries its own set of "bugs" and bad practices, though (I wonder how many engineers design based only on components' typical figures).

Regards.

Elder.

Reply to
Elder Costa

Hi, thanks for your reply. My inference from your reply is that it is better to avoid using taskrestart when working on critical areas and to use it only on noncritical tasks. But one doubt still remains for me: I agree that it's not possible to resume a task when it crashes. But when I restart it, how do we make sure that we are in sync with the current status of the application (the same dosage problem as stated in your posting)? What if the task which is restarted requires a semaphore or a message which is held by some other task at the time of restarting? How do we make sure that the application proceeds safely without causing any trouble to the current situation?
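
To make the hazard concrete, here is a minimal sketch of the kind of guard a designer might wrap around a restart; every name in it (resources_held_by, force_release, mark_state_unknown, task_restart) is a made-up placeholder, not any particular RTOS's API:

typedef int task_id;

/* Hypothetical bookkeeping and RTOS hooks -- placeholders only. */
extern int  resources_held_by(task_id t);  /* does the task still own semaphores/messages? */
extern void force_release(task_id t);      /* hand those resources back to the system */
extern void mark_state_unknown(void);      /* warn the rest of the application */
extern void task_restart(task_id t);       /* the restart primitive itself */

void restart_worker(task_id t)
{
    /* 1. Tell the rest of the application that this task's view of the
     *    world can no longer be trusted (the "dosage" problem above). */
    mark_state_unknown();

    /* 2. Return anything the dead task still owns, so tasks blocked on
     *    those semaphores or message queues can make progress. */
    if (resources_held_by(t))
        force_release(t);

    /* 3. Only then restart it; its init code must re-read the *current*
     *    application state rather than assume a clean start. */
    task_restart(t);
}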

Regards, s.subbarayan

Reply to
s.subbarayan

So does that mean I can order saag ghosht at McDonald's? :-)

Reply to
israel t

Or for an example of hardware errors in a mature industry, consider the hundreds or thousands of car recalls that occur globally each year.

Reply to
israel t

Slightly different issue. That happens because, with more units deployed, simple statistics means you hit failure areas that were previously inaccessible. Many of the software failures I've seen, on the other hand, are of the trivial kind a cursory user would find within minutes to days. (In)famous was one of the major VMS releases - V5, if I remember correctly - where it took three people from our institute literally minutes to find a dozen bugs or so, ranging from the annoying (changing the password gave the success message twice) to the grave (doing something slightly unexpected on the command line led to a reproducible system crash). We wondered what that year of internal and external field testing had been wasted on.

Jan

Reply to
Jan Vorbrüggen

I have also noticed that the programmers from a computer science background tend to be much better at working out a system architecture and planning first.

My hypothesis: the more detail-oriented people tend to gravitate toward the engineering side, and tend to excel at detail-oriented tasks, while the computer science people tend to be better at big picture and abstract concepts.

Just MHO.

Ed

Reply to
Ed Beroset

get lost.

The Therac problem was that no one considered what would happen if the operator did an ABA-type mode change without waiting for either step to complete. The result was that the real world was out of step with the software's internal state. This negated the effect of the safety checks.

Although no input SHOULD cause a fault, and all Florida swampland should be wonderful, in real systems there are some things where you need to act ASAP to prevent interesting times: railway switching, BOS converter injection, and plating power supplies, to name a quick few.

It is not the system that figures out if a task is restarted or not, that is for the designer. The system just has to implement it.

--
Paul Repacholi                               1 Crescent Rd.,
+61 (08) 9257-1001                           Kalamunda.
                                             West Australia 6076
comp.os.vms,- The Older, Grumpier Slashdot
Raw, Cooked or Well-done, it's all half baked.
EPIC, The Architecture of the future, always has been, always will be.
Reply to
prep

In article , Ed Beroset writes:
|> Terje Mathisen wrote:
|> >
|> > I have noted previously here in c.a that most of the _really_ good
|> > low-level/systems programmers I know seem to have an engineering instead
|> > of computer science background.
|> >
|> > A coincidence?
|>
|> I have also noticed that the programmers from a computer science
|> background tend to be much better at working out a system architecture
|> and planning first.

!!!! My experience is that they are generally CATASTROPHIC at that; MUCH worse than even engineers :-(

Oh, yes, they work out an 'architecture' and a 'plan', but it is usually based on a completely unrealistic view of the world, where nothing ever goes wrong and nobody ever makes a mistake. The worst fault is usually that they regard it as reasonable to omit all error recovery, diagnosis and robustness, and claim that going bananas is a perfectly reasonable response to a natural human error.

Also, they regard it as perfectly reasonable to produce interfaces that positively encourage such errors, and fail to see that it is the responsibility of a designer to ensure that the product is (as far as is possible) easy to use and fail-safe in operation.

There are a FEW meritorious exceptions, and some computer science academics who would love to change this but are constrained by the pressure to produce graduates with the widest possible (theoretical) knowledge in the shortest possible time. You CAN'T teach an engineering attitude in a short lecture course - it needs practical training, and lots of it.

|> My hypothesis: the more detail-oriented people tend to gravitate toward
|> the engineering side, and tend to excel at detail-oriented tasks, while
|> the computer science people tend to be better at big picture and
|> abstract concepts.

I.e. producing ridiculously unrealistic designs and leaving all the real work to someone else.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Those comp-sci geniuses are the ones who gave us a software paradigm that is susceptible to attacks as simple as buffer overruns, and who store data in randomly scattered chunks linked by pointers. And who put multiple unrelated locks in the same cache line? Are those the ones you are talking about?
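
For what it's worth, the cache-line complaint is easy to illustrate. Assuming a 64-byte line (the real figure is platform-dependent), something like the following keeps two unrelated locks from ping-ponging the same line between cores:

#include <stdalign.h>
#include <pthread.h>

#define CACHE_LINE 64  /* assumed line size; query the platform in real code */

/* Unrelated locks packed together can end up in the same cache line:
 * acquisitions by a thread using lock a invalidate the line for threads
 * using lock b, even though the two protect unrelated data. */
struct bad_locks {
    pthread_mutex_t a;
    pthread_mutex_t b;
};

/* Aligning each lock to its own line means contention on one causes no
 * coherence traffic for users of the other. */
struct good_locks {
    alignas(CACHE_LINE) pthread_mutex_t a;
    alignas(CACHE_LINE) pthread_mutex_t b;
};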

Del cecchi

Reply to
Del Cecchi

Not really. A lot of the processor may be synthesized from HDL, but much is in custom circuits with perhaps an HDL model of the custom circuit for simulation. Either way, you can't reboot them from a new HDL file either. Silicon's got to change.

Sure. Statistical models are used in the design process. Sometimes one designs to a number that's even better than the process mean (I suppose that's your definition of "typical"). If one designed for worst case nothing would work because it would never be built.
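
A toy illustration of the difference, with invented numbers: a 20-gate path at a typical 1.0 ns per gate, worst case 1.5 ns per gate. Summing every worst case assumes all 20 gates are simultaneously slow; a statistical combination of the per-gate variation lands much closer to typical:

#include <math.h>
#include <stdio.h>

/* Invented figures, purely for illustration. */
#define GATES  20
#define TYP_NS 1.0
#define WC_NS  1.5

int main(void)
{
    double typical   = GATES * TYP_NS;   /* 20.0 ns */
    double worst_sum = GATES * WC_NS;    /* 30.0 ns: every gate slow at once */

    /* Treat each gate's 0.5 ns excess as an independent variation and
     * combine it root-sum-square instead of linearly. */
    double statistical = GATES * TYP_NS + sqrt((double)GATES) * (WC_NS - TYP_NS);

    printf("typical %.1f ns, worst-case stack %.1f ns, statistical %.1f ns\n",
           typical, worst_sum, statistical);   /* ~22.2 ns statistical */
    return 0;
}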

--
  Keith
Reply to
Keith R. Williams

Reminds me of the last GM car I will ever own, which happened about 25 years ago. Within the first 500 miles it had seized the front brakes (that took less than 10) and destroyed a fan belt pulley (with no replacements in the parts stream, it was jury-rigged with a weld; two months later they had a replacement pulley). Within 10,000 miles a front door had literally fallen off. By 60,000 miles the engine block was cracked due to a non-functional freeze plug (this was also due to a careless mechanic who changed the coolant to pure water in the summer while repairing the failed heater).
Reply to
CBFalconer

Actually, they did consider it and concluded that it was impossible - and they were (sort of) right. The Therac had separate well-tested code that ran the machine, and separate not-so-well-tested code that ran the operator interface. As originally shipped, it was impossible to complete the data input that fast. Then they started getting complaints about having to re-enter the data in a bunch of fields every time, so they put in a feature where a tab would give you the same input as was used in the last run. Because it was in the operator interface code, it didn't get tested as well. What testing they did do failed to show the bug, because developers tend to watch the screen looking for odd behavior, while an actual operator hits the tab key as fast as he/she can in order to do the next run.

I still think that the biggest error was taking out the microswitch, along with the hardware that wouldn't let the machine operate if the mechanical moving parts had not arrived where they should be. Just sending the move command and waiting N seconds was an unacceptable system design decision whether or not the code was buggy. The cryptic error messages and the ability to keep trying to dose the same patient over and over in response to an error message didn't help things.
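
To make that design point concrete, here is a rough sketch (not the actual Therac code, which I have never seen; every function name below is a hypothetical placeholder) of the difference between "command and wait N seconds" and "command and verify an independent position sensor, with a timeout":

#include <stdbool.h>

/* Hypothetical machine-control hooks, for illustration only. */
extern void command_turntable_move(int position);
extern bool turntable_at(int position);     /* e.g. reads a position microswitch */
extern void sleep_ms(unsigned ms);
extern void refuse_to_fire(const char *reason);

/* Open loop: assume the mechanics arrived because enough time has passed.
 * This is the pattern criticised above. */
void move_open_loop(int position)
{
    command_turntable_move(position);
    sleep_ms(8000);                         /* "wait N seconds" and hope */
}

/* Closed loop: the beam is enabled only when an independent sensor agrees
 * that the moving parts are where the software thinks they are. */
bool move_verified(int position)
{
    command_turntable_move(position);
    for (unsigned waited = 0; waited < 15000; waited += 50) {
        if (turntable_at(position))
            return true;                    /* confirmed in position */
        sleep_ms(50);
    }
    refuse_to_fire("turntable position not confirmed");
    return false;
}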

I have worked on systems for aircraft where the software engineer was invited to write malicious code that would damage the hardware, with a reward of an extra week of vacation time for writing that code. Then the hardware engineer was invited to induce a single fault that would cause the real software to lock up, go crazy, etc, with the same reward offer.

--
Guy Macon
Reply to
Guy Macon
