Policy on rebooting?

Is it possible to have a policy on when to force a reboot instead of
just logging the error and continuing?

I think you could say, for example, that any unrecoverable error
should cause a reboot (like if malloc fails and you really cannot
continue). Otherwise you log the error somewhere and carry on (maybe
in other cases you can handle low memory conditions gracefully). Seems
like the basis of a policy to me.

I think it is, but my boss says it isn't. He says it wouldn't work and
each developer should use his/her judgement in each case.

What do you guys think?

Re: Policy on rebooting?

[quoted text elided]

That in an embedded system one should not use dynamic memory allocation.

Jeroen



Re: Policy on rebooting?
[quoted text elided]

That's an amazingly open-ended question, and the answer can be yes, no, or
maybe depending upon the application, the type of code, and the discipline of
the developers in question.

In general, dynamic memory allocation isn't a good idea for embedded systems
that don't have any virtual memory. And if there is dynamic memory
allocation, the allocating code and its usage should get very, very close
scrutiny before it's committed to the codebase.

I have never worked on an embedded product that required reboots. Too often
I've found that people use watchdogs and restarts to cover up problems in
their code. Projects I've worked on have literally had to be up for years at
a time, and none of them required reboots. But then again, we had
disciplined, paranoid developers, tight code reviews, and very good testing,
too. Not all projects have this or can afford that.

-->Neil



Re: Policy on rebooting?
On Tue, 5 Oct 2004 16:42:56 -0700, "Neil Bradley" wrote:

[quoted text elided]

We use watchdogs. During development they can tell you (with some
diagnostics) if something went wrong.
In the finished product (or in intermediate versions for tests at the
site) they can be helpful for keeping the system running if something
happens that you didn't consider. A reboot is often better than a
completely blocked system (especially if a customer is watching :-).
Of course, the underlying cause of the error must still be tracked down!

RK

Re: Policy on rebooting?
[quoted text elided]

[a bit OT]
Why is dynamic memory allocation a bad idea?
What does dynamic memory allocation have to do with virtual memory?

-- mk

Re: Policy on rebooting?

[quoted text elided]

Fragmentation.

After a few years of running, the dynamic memory pool might not have 1
KiB of _contiguous_ memory available, even if there is more than 1 MiB
of free dynamic memory scattered all around in 1-1000 byte fragments;
any malloc/new request larger than 1 KiB will then fail.
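Paul's scenario can be sketched with a toy model (mine, not from the thread; the `FreeRegion`/`can_allocate` names are illustrative): free regions as (offset, size) pairs, where total free space exceeds a request that no single contiguous region can satisfy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a fragmented heap: the free space as (offset, size) regions.
struct FreeRegion { std::size_t offset, size; };

// First-fit check: is any *single* region large enough for the request?
bool can_allocate(const std::vector<FreeRegion>& free_list, std::size_t n) {
    for (const auto& r : free_list)
        if (r.size >= n) return true;
    return false;
}

std::size_t total_free(const std::vector<FreeRegion>& free_list) {
    std::size_t sum = 0;
    for (const auto& r : free_list) sum += r.size;
    return sum;
}
```

With 2000 bytes free in total but no fragment of 1 KiB, a 512-byte request succeeds while a 1 KiB request fails, which is exactly the failure mode described above.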

[quoted text elided]

With a few GiB of virtual memory, it just takes longer for the
virtual memory to fragment. The device might become obsolete and be
removed from service (or at least rebooted) before the whole virtual
memory has fragmented.

With fragmented virtual memory, performance drops, since usually
only part of each virtual memory page is in use; thus more pages must
be loaded from the page file on disk when, for example, walking a
linked list.

Only if the free fragments are larger than the virtual memory page
size do the unused pages avoid being loaded from the page file.

Paul


Re: Policy on rebooting?

[quoted text elided]

This also depends on the operating system being used.  Most
real-time OSes, which are often used for embedded work, will only
allocate fixed-size memory blocks and will reclaim the
entire block.  This prevents the problem Paul described.
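A minimal sketch of such a fixed-size block pool (the names and sizes are my own illustration, not any particular RTOS API): every allocation hands out a whole block and freeing reclaims it whole, so blocks are never split and need no coalescing.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-size block pool: a singly linked free list threaded through the
// blocks themselves. alloc/free are O(1) and blocks can never be split.
template <std::size_t BlockSize, std::size_t NumBlocks>
class BlockPool {
    static_assert(BlockSize >= sizeof(void*), "block must hold a free-list link");
    alignas(std::max_align_t) unsigned char storage_[BlockSize * NumBlocks];
    void* free_list_ = nullptr;
public:
    BlockPool() {
        // Thread every block onto the free list.
        for (std::size_t i = 0; i < NumBlocks; ++i) {
            void* blk = storage_ + i * BlockSize;
            *static_cast<void**>(blk) = free_list_;
            free_list_ = blk;
        }
    }
    void* alloc() {
        if (!free_list_) return nullptr;          // pool exhausted: caller decides policy
        void* blk = free_list_;
        free_list_ = *static_cast<void**>(blk);
        return blk;
    }
    void free(void* blk) {                        // whole block reclaimed, no coalescing
        *static_cast<void**>(blk) = free_list_;
        free_list_ = blk;
    }
};
```

Exhaustion shows up as a clean `nullptr` instead of a fragmented heap, which is why (as Michael notes below) the trade-off moves to sizing the pools correctly.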

Dennis,

--
If sending a reply you will need to remove "7UP".


Re: Policy on rebooting?
[quoted text elided]

This does not *prevent* the problem, it only limits/slows the problem
if the application has a particular pattern of dynamic memory usage.

Imagine a limited supply of large fixed-size blocks. If these blocks
are allocated and released, and then subsequently allocated and
sub-divided to fulfill the needs of another part of the system
requiring a smaller block size (of which there are no more), then
eventually such an allocation system will fail as well.

Of course, we're talking about general malloc/free new/delete here,
not private memory pools, which are pre-allocated for a particular
part of the system with fixed sized blocks.

What's more, block memory allocation schemes reduce the
"memory efficiency" advantages of dynamic memory allocation by
wasting the unused portion of each allocated block.

BTW, allocation is not the issue. If memory is never freed,
then fragmentation will not happen. Of course, then it's not
*really* dynamic.

--
Michael N. Moran           (h) 770 516 7918
5009 Old Field Ct.         (c) 678 521 5460
Re: Policy on rebooting?
[quoted text elided]
The fixed-size-block allocation scheme is foolproof unless some fool tries
to circumvent its "shortcomings" by doing the kind of thing you describe
above. The solution is to have the right number of right-sized pools
containing the right number of right-sized blocks (as you say below) - but
with *no poaching allowed*. If you can't get the proper-sized block, then
there aren't enough, so back to the drawing board. Stealing a bigger block
from another pool (transparently or not) is not acceptable; it just
re-introduces fragmentation by the back door.

[quoted text elided]

I agree with eschewing malloc() and free(). However, if you want to use C++
effectively in an embedded system, you really have to customise new() and
delete() to use a safe, fixed-size-block scheme tailored to your system.
This is painful, but essential. It is very hard to use C++ effectively
without using new() and delete().
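As a sketch of what that customisation might look like (the pool size of 8 and slot size of 64 are illustrative assumptions, not a recommendation), a class can overload its own operator new/delete to draw from a small static pool:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// A class whose operator new/delete use a private static fixed-size pool
// instead of the global heap (requires C++17 for inline statics).
class Message {
public:
    static void* operator new(std::size_t) {
        for (std::size_t i = 0; i < kPoolSize; ++i)
            if (!used_[i]) { used_[i] = true; return pool_[i]; }
        throw std::bad_alloc();             // pool exhausted: pools were sized wrong
    }
    static void operator delete(void* p) noexcept {
        std::size_t i =
            (static_cast<unsigned char*>(p) - pool_[0]) / kSlotSize;
        used_[i] = false;                   // whole slot reclaimed, no fragmentation
    }
private:
    char payload_[32] = {};
    static constexpr std::size_t kPoolSize = 8;
    static constexpr std::size_t kSlotSize = 64;
    alignas(std::max_align_t) static inline unsigned char pool_[kPoolSize][kSlotSize];
    static inline bool used_[kPoolSize] = {};
};
static_assert(sizeof(Message) <= 64, "Message must fit in one slot");
```

Exhausting the pool fails loudly at the allocation site rather than fragmenting a shared heap, which fits the "no poaching" policy above.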

[quoted text elided]
True, but better that than the indeterminacy of fragmentation.

[quoted text elided]
Also true, and not as daft as it sounds. Using such a "heap" is quite a
convenient way of initially allocating pools, stacks, etc.


--
Peter Bushell
http://www.software-integrity.com /



Re: Policy on rebooting?
[quoted text elided]

Like malloc/free, new and delete get their resources from a
pool that is *at least* global to a given thread, if not to the
entire system.

However, this is *not* true about "placement new", which I
use to construct objects using statically/pre allocated memory
as required.
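For reference, a minimal placement-new sketch (the type and buffer names are hypothetical): the object is constructed in statically allocated storage and destroyed explicitly, with no heap involved at all.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Sample {
    int value;
    explicit Sample(int v) : value(v) {}
};

// Statically allocated, properly aligned raw storage for one Sample.
alignas(Sample) static unsigned char slot[sizeof(Sample)];
```

Construction is `new (slot) Sample(42)` and teardown is an explicit `s->~Sample()`; regular `delete` must never be used on such an object, since the storage was never obtained from the heap.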

My point is that a general purpose new/delete
(beyond the "placement" variety) is the problem
due to heap fragmentation issues.


[quoted text elided]

Note that I am not arguing against the use of fixed block
pools, but against general purpose dynamic memory allocation,
which allocates *dynamically sized* objects from the same
source. IOW, I *am* arguing that such schemes introduce
fragmentation and are therefore evil in any system that
is to execute indefinitely.


[quoted text elided]

Yep ... I pre-allocate at startup frequently.


--
Michael N. Moran
Re: Policy on rebooting?

[quoted text elided]

A production system should not have what we called "memory block
promotion" enabled.  This is where a larger block is given out when no more
appropriate-size blocks are available.  It is one of the hardest things to
tune, but really necessary.  I found that determining the appropriate
sizes of the pools statically was impossible, hence we did it with lots
and lots of testing/tuning.



Re: Policy on rebooting?
[quoted text elided]

It seems to me that such a system with "memory block promotion"
disabled is prone to behavior changes due to run-time variations in the
order in which blocks of various sizes are allocated. This could wildly
affect memory usage efficiency if several large blocks are
allocated and released before small blocks, limiting the
number of small blocks available. Thus, determinism is affected.

Further, the situation can still exist where there is plenty
of memory available for a purpose, but it cannot be used
because it exists in the wrong "size partition." That seems
like a fragmentation issue, though perhaps of a different kind.


--
Michael N. Moran
Re: Policy on rebooting?
[quoted text elided]

The problem is not the use of dynamic memory, the problem is
fragmentation.  Avoiding this doesn't require garbage collection,
etc, in fact that is harmful.  What is required is a means of
avoiding fragmentation while preserving efficiency.

What you can do (and I have in the past) is build an allocator that
works through indirect pointers, so that an allocation can be
moved.  Then a routine can be called periodically, basically in the
idle time of the process, that advances one step towards complete
defragmentation.  That means: find the lowest (or highest) live block,
move it, and update what the indirect pointer points to.  Sooner or
later, barring continuous allocate/free, the memory will be
compacted.  You do usually have to be able to test whether there is
sufficient memory when malloc fails, and postpone the operation
until it can succeed.  My application timed out its use of the
storage (it was a store-and-forward message system) and dumped
each timed-out message to the console with an "undeliverable" tag
attached, so we were guaranteed eventual freeing of the storage.

If you can guarantee that only one pointer to a particular block is
in play at any time, you can, in the allocation supervisory block,
replace the indirect table with a pointer back to the actual
pointer.  I hope that makes sense, but I am too lazy to make an
ascii diagram.
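A rough sketch of the scheme Chuck describes, under my own assumptions about names and sizes (not his original implementation): clients hold a handle, i.e. an index into a pointer table, so the compactor can move a block and patch the single table entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kHeapSize  = 256;   // illustrative sizes
constexpr std::size_t kMaxBlocks = 8;

struct Block { std::size_t offset, size; bool live; };

static unsigned char heap_mem[kHeapSize];
static Block table_[kMaxBlocks];          // the indirection: one entry per handle
static std::size_t next_free = 0;         // bump pointer at the top of the heap

// Allocate n bytes; returns a handle (index into table_), or -1.
int alloc(std::size_t n) {
    if (next_free + n > kHeapSize) return -1;   // caller may retry after compaction
    for (int h = 0; h < (int)kMaxBlocks; ++h)
        if (!table_[h].live) {
            table_[h] = Block{next_free, n, true};
            next_free += n;
            return h;
        }
    return -1;
}

void release(int h) { table_[h].live = false; }   // leaves a hole behind

// Clients must re-fetch the address through the handle on every use,
// because an idle-time compaction step may have moved the block.
unsigned char* deref(int h) { return heap_mem + table_[h].offset; }

// One idle-time step: slide the lowest out-of-place live block down over
// the hole beneath it and patch its single table entry. Returns true if a
// block was moved; false once the heap is fully compacted.
bool compact_step() {
    std::size_t cursor = 0;
    for (;;) {
        int best = -1;
        for (int h = 0; h < (int)kMaxBlocks; ++h)
            if (table_[h].live && table_[h].offset >= cursor &&
                (best < 0 || table_[h].offset < table_[best].offset))
                best = h;
        if (best < 0) { next_free = cursor; return false; }  // done: reclaim the top
        if (table_[best].offset != cursor) {
            std::memmove(heap_mem + cursor, deref(best), table_[best].size);
            table_[best].offset = cursor;   // the only pointer that needs updating
            return true;                    // one move per idle slice
        }
        cursor += table_[best].size;
    }
}
```

The back-pointer variant Chuck mentions replaces `table_` with a pointer from each block's supervisory header back to the client's one pointer, which the compactor updates directly.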

I repeat, dynamic memory is not the problem.  Fragmentation is.

--
Chuck F ( snipped-for-privacy@yahoo.com) ( snipped-for-privacy@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
Re: Policy on rebooting?

[quoted text elided]

While this would work nicely in a cooperative multitasking system, how
do you handle the situation in a preemptive system ?

Running the compacter in the idle task simplifies things, since each
dynamic memory element access does not have to be protected in normal
code, but you still have to disable interrupts when the segments are
actually moved around in the compacter, to prevent any high priority
task from activating during this move operation. If large amounts of
data must be moved, this may disable the interrupts for an
unacceptably long time.  

Paul


Re: Policy on rebooting?
[quoted text elided]

The compactor moved only one allocation at a time, thus limiting
any possible interrupt inhibition.  This is relatively inefficient
for compaction, since various things have to be recalculated on
each call, but who cares - this is idle time anyhow.  As far as
large items are concerned, this application knew each live item was
at most 100 or so bytes, and the (possibly) large free areas didn't
have to be copied.

This was another application that ran undisturbed for years, with
no reboots.

--
Chuck F
Re: Policy on rebooting?

[quoted text elided]

One way around this problem would be to disable interrupts, make the
decision to move one block, and temporarily reserve a block at the
final destination. The interrupts can then be enabled and the actual
copying started. After the copying, the interrupts are disabled, and if
the copying was done without an interrupt occurring (e.g. each interrupt
routine sets a flag when executed), the move is committed and the
original block is freed. However, if there were interrupts, the
temporary copy is discarded (rollback) and the move is retried at some
later time. Either way, the interrupts are enabled again.

Of course, the worst case copy time must be much less than the minimum
time between any interrupts, so that the compactor will eventually be
able to do the move.
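The commit/rollback protocol above might be sketched like this (my sketch, not from the thread: the interrupt enable/disable calls are stubbed out as comments, and a callback stands in for any ISR activity during the copy):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

volatile bool interrupt_seen = false;   // every ISR would set this flag

inline void disable_interrupts() { /* e.g. CPSID on Cortex-M; no-op in this sketch */ }
inline void enable_interrupts()  { /* e.g. CPSIE on Cortex-M; no-op in this sketch */ }

// Optimistically copy len bytes with interrupts enabled, then commit only
// if no interrupt ran meanwhile; otherwise the caller discards the copy
// (rollback) and retries later. during_copy simulates ISR activity.
bool try_move(unsigned char* dst, const unsigned char* src, std::size_t len,
              const std::function<void()>& during_copy) {
    disable_interrupts();
    interrupt_seen = false;          // decision made, destination reserved
    enable_interrupts();

    std::memcpy(dst, src, len);      // interrupts stay enabled during the copy
    during_copy();

    disable_interrupts();
    bool undisturbed = !interrupt_seen;
    enable_interrupts();             // interrupts end up enabled either way
    return undisturbed;              // true: commit the move; false: roll back
}
```

As Paul notes, this only converges if the worst-case copy time is much shorter than the minimum interval between interrupts; otherwise the move is retried forever.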

Paul
  

Re: Policy on rebooting?
[quoted text elided]

Actually, those guys who have worked with TI DSP chips and read about the
XDA(I)S (eXpress DSP Algorithm Standard) already know this. Those who
haven't may come to it through experience (or may not). A few key points
of the XDA(I)S (the I is in parentheses because TI changed the name by
sticking in the additional letter) are:

- the algorithm instance (think of a C++ class realized as an object, with
common code and as many "*this" things created as required) must never use
any global data other than constants, touch any peripheral registers, or call
OS functions; i.e. it must be the kind of software that can't break the system
or any other instance of the same or another algorithm

- memory allocation/freeing is unified in such a way that when you create an algo
instance (think of constructing a C++ object) the algorithm says what memory
it wants, how much, and how aligned, but it does not allocate/free anything on
its own. This information is then passed to whatever memory manager you
have (standard or custom, it doesn't matter as long as you comply with the API)

- the algorithm functions (think of C++ class member functions) are always
called with 1st parameter being the pointer to the algorithm's data/object
(think of the "this" pointer of C++)

- memory fragmentation can be reduced by calling the standard XDAS moved()
function (think of an abstract member whose interface is defined but whose
inner implementation is up to the programmer). This moved() member is used
to relocate the instance (its data) to a different location if needed; in
this function you should update/relocate any pointers internal to the
instance/data.

- other nice things...

So, following a few simple rules and mechanisms you get a safe piece of code
that can be used in multichannel and multithreaded systems and you can
defragment memory whenever needed by calling the moved() methods of the
instances.

IMO, this is a rather useful standard to follow when designing code.

Alex
P.S. I've referred to C++ because it's a very similar concept, but it's not
C++. It's all pure C. Probably closer to Objective-C (correct me if I'm
wrong).




Re: Policy on rebooting?
[quoted text elided]
Actually, it does prevent the problem.  The reason is that
the RTOS won't take (for example) a 256 byte space and
re-allocate it in smaller chunks, then turn around and try
to make a 256 byte allocation out of a 128 byte and two 64
byte blocks.  Basically there is a memory pool.

The downside is, as you alluded to, that it is not as memory-efficient.
For example, a structure that needs 200 bytes will get a 256-byte
allocation.  Life is not perfect.
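The rounding Dennis describes can be made concrete (the 32-byte minimum pool size and power-of-two ladder are my assumptions): the request is rounded up to the next pool size, and the difference is internal fragmentation, i.e. wasted space inside the block.

```cpp
#include <cassert>
#include <cstddef>

// Pick the smallest pool block that fits the request, from a ladder of
// power-of-two pool sizes: 32, 64, 128, 256, ...
std::size_t pool_block_size(std::size_t request) {
    std::size_t block = 32;            // smallest pool (assumed)
    while (block < request) block *= 2;
    return block;
}
```

A 200-byte structure lands in a 256-byte block, wasting 56 bytes; this is the memory-efficiency cost traded for freedom from external fragmentation.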

Dennis,





Re: Policy on rebooting?
[quoted text elided]

Such allocation algorithms certainly exist, however, while
maintaining "ease of use" for the programmer, such schemes
can lead to stability/determinism/efficiency issues due to
fragmentation at a different level of granularity.

[quoted text elided]

Life is not perfect, however, if a system is *required* to
execute predictably for an indefinite period of time
without reboot (most autonomous embedded systems), then
these issues must be addressed directly *by design*
and ad-hoc use of malloc/free new/delete is inappropriate.


--
Michael N. Moran
Re: Policy on rebooting?

[quoted text elided]

Why is this a problem in embedded systems, which are
generally doing the same "thing" day-in, day-out?

I recently had to do an embedded system (a weighing machine
controller) which was heavy on user interface. With all the
various weight formats and languages, there was a lot of
string processing to be done. The only sane way to do this
was to use malloc() or an equivalent. On each iteration of
the main loop, all the mallocs had been freed, and the
heap was again one contiguous free block of 64K.
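Richard's pattern is essentially a bump arena that is reset once per main-loop iteration; a minimal sketch (the names and the 64K figure from above are the only things taken from the post, the rest is my illustration):

```cpp
#include <cassert>
#include <cstddef>

// One contiguous scratch arena for per-iteration string work.
constexpr std::size_t kArenaSize = 64 * 1024;   // the 64K mentioned above
static unsigned char arena[kArenaSize];
static std::size_t arena_used = 0;

// Bump allocation: just advance a cursor. No free list, no fragmentation.
void* arena_alloc(std::size_t n) {
    if (arena_used + n > kArenaSize) return nullptr;   // out of scratch space
    void* p = arena + arena_used;
    arena_used += n;
    return p;
}

// Called at the top of the main loop: every allocation from the previous
// iteration is discarded at once, and the heap is one free block again.
void arena_reset() { arena_used = 0; }
```

This works precisely because every allocation's lifetime ends by the next iteration; it would not suit the long-lived parse-tree case described below.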

Doing dynamic allocation for building, say, a parse-tree where
you are adding and removing things in arbitrary order, and where
individual tree nodes may remain for a long time is not a good
idea on an embedded system, but this doesn't mean that dynamic
allocation per se should be damned.

Richard [in PE12]
