Policy on rebooting?

Is it possible to have a policy on when to force a reboot instead of
just logging the error and continuing?

I think you could say, for example, that any unrecoverable error
should cause a reboot (like if malloc fails and you really cannot
continue). Otherwise you log the error somewhere and carry on (maybe
in other cases you can handle low memory conditions gracefully). Seems
like the basis of a policy to me.

I think it is, but my boss says it isn't. He says it wouldn't work and
each developer should use his/her judgement in each case.

What do you guys think?

Re: Policy on rebooting?

[quoted text elided]

That in an embedded system one should not use dynamic memory allocation.

Jeroen



Re: Policy on rebooting?
[quoted text elided]

That's an amazingly open-ended question, and the answer can be yes, no, or
maybe depending upon the application, the type of code, and the discipline of
the developers in question.

In general, dynamic memory allocation isn't a good idea for embedded systems
that don't have any virtual memory. And if there is dynamic memory
allocation, the allocating code and its usage should get very, very close
scrutiny before it's committed to the codebase.

I have never worked on an embedded product that required reboots. Too often
I've found that people use watchdogs and restarts to cover up problems in
their code. Projects I've worked on have literally had to be up for years at
a time, and none of them required reboots. But then again, we had
disciplined, paranoid developers, tight code reviews, and very good testing,
too. Not all projects have this or can afford that.

-->Neil



Re: Policy on rebooting?
On Tue, 5 Oct 2004 16:42:56 -0700, "Neil Bradley" wrote:

[quoted text elided]

We use watchdogs. During development they can tell you (with some
diagnostics) if something went wrong.
In the finished product (or in intermediate versions for tests at the
site) they can be helpful for keeping the system running if something
happens that you didn't consider. A reboot is often better than a
completely blocked system (especially if a customer is watching :-).
Of course, the underlying cause of the error must still be tracked down!

RK

Re: Policy on rebooting?
[quoted text elided]

[a bit OT]
Why is dynamic memory allocation a bad idea?
What does dynamic memory allocation have to do with virtual memory?

-- mk

Re: Policy on rebooting?

[quoted text elided]

Fragmentation.

After a few years of running, the dynamic memory pool might not have 1
KiB of _contiguous_ memory available, even if there is more than 1 MiB
of free dynamic memory scattered all around in 1-1000 byte fragments;
any malloc/new request larger than 1 KiB will then fail.
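Paul's scenario can be sketched with a toy model (mine, not from the thread; the `FreeRegion`/`can_allocate` names are illustrative): free regions as (offset, size) pairs, where total free space exceeds a request that no single contiguous region can satisfy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a fragmented heap: the free space as (offset, size) regions.
struct FreeRegion { std::size_t offset, size; };

// First-fit check: is any *single* region large enough for the request?
bool can_allocate(const std::vector<FreeRegion>& free_list, std::size_t n) {
    for (const auto& r : free_list)
        if (r.size >= n) return true;
    return false;
}

std::size_t total_free(const std::vector<FreeRegion>& free_list) {
    std::size_t sum = 0;
    for (const auto& r : free_list) sum += r.size;
    return sum;
}
```

With 2000 bytes free in total but no fragment of 1 KiB, a 512-byte request succeeds while a 1 KiB request fails, which is exactly the failure mode described above.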

[quoted text elided]

With a few GiB of virtual memory, it just takes longer for the
virtual memory to fragment. The device might become obsolete and be
removed from service (or at least rebooted) before the whole virtual
memory has fragmented.

With fragmented virtual memory, performance drops, since usually
only part of each virtual memory page is in use; thus more pages must
be loaded from the page file on disk when, for example, walking a
linked list.

Only if the free fragments are larger than the virtual memory page
size do the unused pages avoid being loaded from the page file.

Paul


Re: Policy on rebooting?

[quoted text elided]

This also depends on the operating system being used.  Most
real-time OSes, which are often used for embedded work, will only
allocate fixed-size memory blocks and will reclaim the
entire block.  This prevents the problem Paul described.
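A minimal sketch of such a fixed-size block pool (the names and sizes are my own illustration, not any particular RTOS API): every allocation hands out a whole block and freeing reclaims it whole, so blocks are never split and need no coalescing.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-size block pool: a singly linked free list threaded through the
// blocks themselves. alloc/free are O(1) and blocks can never be split.
template <std::size_t BlockSize, std::size_t NumBlocks>
class BlockPool {
    static_assert(BlockSize >= sizeof(void*), "block must hold a free-list link");
    alignas(std::max_align_t) unsigned char storage_[BlockSize * NumBlocks];
    void* free_list_ = nullptr;
public:
    BlockPool() {
        // Thread every block onto the free list.
        for (std::size_t i = 0; i < NumBlocks; ++i) {
            void* blk = storage_ + i * BlockSize;
            *static_cast<void**>(blk) = free_list_;
            free_list_ = blk;
        }
    }
    void* alloc() {
        if (!free_list_) return nullptr;          // pool exhausted: caller decides policy
        void* blk = free_list_;
        free_list_ = *static_cast<void**>(blk);
        return blk;
    }
    void free(void* blk) {                        // whole block reclaimed, no coalescing
        *static_cast<void**>(blk) = free_list_;
        free_list_ = blk;
    }
};
```

Exhaustion shows up as a clean `nullptr` instead of a fragmented heap, which is why (as Michael notes below) the trade-off moves to sizing the pools correctly.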

Dennis,

--
If sending a reply you will need to remove "7UP".


Re: Policy on rebooting?
[quoted text elided]

This does not *prevent* the problem, it only limits/slows the problem
if the application has a particular pattern of dynamic memory usage.

Imagine a limited supply of large fixed-size blocks. If these blocks
are allocated and released, and then subsequently allocated and
sub-divided to fulfill the needs of another part of the system
requiring a smaller block size (of which there are no more), then
eventually such an allocation system will fail as well.

Of course, we're talking about general malloc/free new/delete here,
not private memory pools, which are pre-allocated for a particular
part of the system with fixed sized blocks.

What's more, block memory allocation schemes reduce the
"memory efficiency" advantages of dynamic memory allocation by
wasting the unused portion of each allocated block.

BTW, allocation is not the issue. If memory is never freed,
then fragmentation will not happen. Of course, then it's not
*really* dynamic.

--
Michael N. Moran           (h) 770 516 7918
5009 Old Field Ct.         (c) 678 521 5460
Re: Policy on rebooting?
[quoted text elided]
The fixed-size-block allocation scheme is foolproof unless some fool tries
to circumvent its "shortcomings" by doing the kind of thing you describe
above. The solution is to have the right number of right-sized pools
containing the right number of right-sized blocks (as you say below) - but
with *no poaching allowed*. If you can't get the proper-sized block, then
there aren't enough, so back to the drawing board. Stealing a bigger block
from another pool (transparently or not) is not acceptable; it just
re-introduces fragmentation by the back door.

[quoted text elided]

I agree with eschewing malloc() and free(). However, if you want to use C++
effectively in an embedded system, you really have to customise new() and
delete() to use a safe, fixed-size-block scheme tailored to your system.
This is painful, but essential. It is very hard to use C++ effectively
without using new() and delete().
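As a sketch of what that customisation might look like (the pool size of 8 and slot size of 64 are illustrative assumptions, not a recommendation), a class can overload its own operator new/delete to draw from a small static pool:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// A class whose operator new/delete use a private static fixed-size pool
// instead of the global heap (requires C++17 for inline statics).
class Message {
public:
    static void* operator new(std::size_t) {
        for (std::size_t i = 0; i < kPoolSize; ++i)
            if (!used_[i]) { used_[i] = true; return pool_[i]; }
        throw std::bad_alloc();             // pool exhausted: pools were sized wrong
    }
    static void operator delete(void* p) noexcept {
        std::size_t i =
            (static_cast<unsigned char*>(p) - pool_[0]) / kSlotSize;
        used_[i] = false;                   // whole slot reclaimed, no fragmentation
    }
private:
    char payload_[32] = {};
    static constexpr std::size_t kPoolSize = 8;
    static constexpr std::size_t kSlotSize = 64;
    alignas(std::max_align_t) static inline unsigned char pool_[kPoolSize][kSlotSize];
    static inline bool used_[kPoolSize] = {};
};
static_assert(sizeof(Message) <= 64, "Message must fit in one slot");
```

Exhausting the pool fails loudly at the allocation site rather than fragmenting a shared heap, which fits the "no poaching" policy above.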

[quoted text elided]
True, but better that than the indeterminacy of fragmentation.

[quoted text elided]
Also true, and not as daft as it sounds. Using such a "heap" is quite a
convenient way of initially allocating pools, stacks, etc.


--
Peter Bushell
http://www.software-integrity.com /



Re: Policy on rebooting?
[quoted text elided]

Like malloc/free, new and delete get their resources from a
pool that is *at least* global to a given thread, if not to the
entire system.

However, this is *not* true about "placement new", which I
use to construct objects using statically/pre allocated memory
as required.
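For reference, a minimal placement-new sketch (the type and buffer names are hypothetical): the object is constructed in statically allocated storage and destroyed explicitly, with no heap involved at all.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Sample {
    int value;
    explicit Sample(int v) : value(v) {}
};

// Statically allocated, properly aligned raw storage for one Sample.
alignas(Sample) static unsigned char slot[sizeof(Sample)];
```

Construction is `new (slot) Sample(42)` and teardown is an explicit `s->~Sample()`; regular `delete` must never be used on such an object, since the storage was never obtained from the heap.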

My point is that a general purpose new/delete
(beyond the "placement" variety) is the problem
due to heap fragmentation issues.


[quoted text elided]

Note that I am not arguing against the use of fixed block
pools, but against general purpose dynamic memory allocation,
which allocates *dynamically sized* objects from the same
source. IOW, I *am* arguing that such schemes introduce
fragmentation and are therefore evil in any system that
is to execute indefinitely.


[quoted text elided]

Yep ... I pre-allocate at startup frequently.


--
Michael N. Moran
Re: Policy on rebooting?

[quoted text elided]

A production system should not have what we called "memory block
promotion" enabled.  This is where a larger block is given out when no more
appropriate-size blocks are available.  It is one of the hardest things to
tune, but really necessary.  I found that determining the appropriate
sizes of the pools statically was impossible, hence we did it with lots
and lots of testing/tuning.



Re: Policy on rebooting?
[quoted text elided]

It seems to me that such a system with "memory block promotion"
disabled is prone to behavior changes due to run-time variations in the
order in which blocks of various sizes are allocated. This could wildly
affect memory usage efficiency if several large blocks are
allocated and released before small blocks, limiting the
number of small blocks available. Thus, determinism is affected.

Further, the situation can still exist where there is plenty
of memory available for a purpose, but it cannot be used
because it exists in the wrong "size partition." That seems
like a fragmentation issue, though perhaps of a different kind.


--
Michael N. Moran
Re: Policy on rebooting?
[quoted text elided]

The problem is not the use of dynamic memory, the problem is
fragmentation.  Avoiding this doesn't require garbage collection,
etc, in fact that is harmful.  What is required is a means of
avoiding fragmentation while preserving efficiency.

What you can do (and I have in the past) is build an allocator that
works through indirect pointers, so that an allocation can be
moved.  Then a routine can be called periodically, basically in the
idle time of the process, that advances one step towards complete
defragmentation.  That means: find the lowest (or highest) live block,
move it, and update what the indirect pointer points to.  Sooner or
later, barring continuous allocate/free, the memory will be
compacted.  You do usually have to be able to test whether there is
sufficient memory when malloc fails, and postpone the operation
until it can succeed.  My application timed out its use of the
storage (it was a store-and-forward message system) and dumped
each timed-out message to the console with an "undeliverable" tag
attached, so we were guaranteed eventual freeing of the storage.

If you can guarantee that only one pointer to a particular block is
in play at any time, you can, in the allocation supervisory block,
replace the indirect table with a pointer back to the actual
pointer.  I hope that makes sense, but I am too lazy to make an
ascii diagram.
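A rough sketch of the scheme Chuck describes, under my own assumptions about names and sizes (not his original implementation): clients hold a handle, i.e. an index into a pointer table, so the compactor can move a block and patch the single table entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kHeapSize  = 256;   // illustrative sizes
constexpr std::size_t kMaxBlocks = 8;

struct Block { std::size_t offset, size; bool live; };

static unsigned char heap_mem[kHeapSize];
static Block table_[kMaxBlocks];          // the indirection: one entry per handle
static std::size_t next_free = 0;         // bump pointer at the top of the heap

// Allocate n bytes; returns a handle (index into table_), or -1.
int alloc(std::size_t n) {
    if (next_free + n > kHeapSize) return -1;   // caller may retry after compaction
    for (int h = 0; h < (int)kMaxBlocks; ++h)
        if (!table_[h].live) {
            table_[h] = Block{next_free, n, true};
            next_free += n;
            return h;
        }
    return -1;
}

void release(int h) { table_[h].live = false; }   // leaves a hole behind

// Clients must re-fetch the address through the handle on every use,
// because an idle-time compaction step may have moved the block.
unsigned char* deref(int h) { return heap_mem + table_[h].offset; }

// One idle-time step: slide the lowest out-of-place live block down over
// the hole beneath it and patch its single table entry. Returns true if a
// block was moved; false once the heap is fully compacted.
bool compact_step() {
    std::size_t cursor = 0;
    for (;;) {
        int best = -1;
        for (int h = 0; h < (int)kMaxBlocks; ++h)
            if (table_[h].live && table_[h].offset >= cursor &&
                (best < 0 || table_[h].offset < table_[best].offset))
                best = h;
        if (best < 0) { next_free = cursor; return false; }  // done: reclaim the top
        if (table_[best].offset != cursor) {
            std::memmove(heap_mem + cursor, deref(best), table_[best].size);
            table_[best].offset = cursor;   // the only pointer that needs updating
            return true;                    // one move per idle slice
        }
        cursor += table_[best].size;
    }
}
```

The back-pointer variant Chuck mentions replaces `table_` with a pointer from each block's supervisory header back to the client's one pointer, which the compactor updates directly.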

I repeat, dynamic memory is not the problem.  Fragmentation is.

--
Chuck F ( snipped-for-privacy@yahoo.com) ( snipped-for-privacy@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
Re: Policy on rebooting?

[quoted text elided]

While this would work nicely in a cooperative multitasking system, how
do you handle the situation in a preemptive system ?

Running the compacter in the idle task simplifies things, since each
dynamic memory element access does not have to be protected in normal
code, but you still have to disable interrupts when the segments are
actually moved around in the compacter, to prevent any high priority
task from activating during this move operation. If large amounts of
data must be moved, this may disable the interrupts for an
unacceptably long time.  

Paul


Re: Policy on rebooting?
[quoted text elided]

The compactor moved only one allocation at a time, thus limiting
any possible interrupt inhibition.  This is relatively inefficient
for compaction, since various things have to be recalculated on
each call, but who cares - this is idle time anyhow.  As far as
large items are concerned, this application knew each live item was
at most 100 or so bytes, and the (possibly) large free areas didn't
have to be copied.

This was another application that ran undisturbed for years, with
no reboots.

--
Chuck F
Re: Policy on rebooting?

[quoted text elided]

One way around this problem would be to disable interrupts, make the
decision to move one block, and temporarily reserve a block at the
final destination. The interrupts can then be enabled and the actual
copying started. After the copying, the interrupts are disabled, and if
the copying was done without an interrupt occurring (e.g. each interrupt
routine sets a flag when executed), the move is committed and the
original block is freed. However, if there were interrupts, the
temporary copy is discarded (rollback) and the move is retried at some
later time. Either way, the interrupts are enabled again.

Of course, the worst case copy time must be much less than the minimum
time between any interrupts, so that the compactor will eventually be
able to do the move.
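The commit/rollback protocol above might be sketched like this (my sketch, not from the thread: the interrupt enable/disable calls are stubbed out as comments, and a callback stands in for any ISR activity during the copy):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

volatile bool interrupt_seen = false;   // every ISR would set this flag

inline void disable_interrupts() { /* e.g. CPSID on Cortex-M; no-op in this sketch */ }
inline void enable_interrupts()  { /* e.g. CPSIE on Cortex-M; no-op in this sketch */ }

// Optimistically copy len bytes with interrupts enabled, then commit only
// if no interrupt ran meanwhile; otherwise the caller discards the copy
// (rollback) and retries later. during_copy simulates ISR activity.
bool try_move(unsigned char* dst, const unsigned char* src, std::size_t len,
              const std::function<void()>& during_copy) {
    disable_interrupts();
    interrupt_seen = false;          // decision made, destination reserved
    enable_interrupts();

    std::memcpy(dst, src, len);      // interrupts stay enabled during the copy
    during_copy();

    disable_interrupts();
    bool undisturbed = !interrupt_seen;
    enable_interrupts();             // interrupts end up enabled either way
    return undisturbed;              // true: commit the move; false: roll back
}
```

As Paul notes, this only converges if the worst-case copy time is much shorter than the minimum interval between interrupts; otherwise the move is retried forever.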

Paul
  

Re: Policy on rebooting?
[quoted text elided]

Actually, those guys who have worked with TI DSP chips and read about the
XDA(I)S (eXpress DSP Algorithm Standard) already know this. Those who
haven't may come to it through experience (or may not). A few key points
of the XDA(I)S (the I is in parentheses because TI changed the name by
sticking in the additional letter) are:

- the algorithm instance (think of a C++ class realized as an object, with
common code and as many "*this" things created as required) must never use
any global data other than constants, touch any peripheral registers, or call
OS functions; i.e. it must be the kind of software that can't break the system
or any other instance of the same or another algorithm

- memory allocation/freeing is unified in such a way that when you create an algo
instance (think of constructing a C++ object) the algorithm says what memory
it wants, how much, and how aligned, but it does not allocate/free anything on
its own. This information is then passed to whatever memory manager you
have (standard or custom, it doesn't matter as long as you comply with the API)

- the algorithm functions (think of C++ class member functions) are always
called with 1st parameter being the pointer to the algorithm's data/object
(think of the "this" pointer of C++)

- memory fragmentation can be reduced by calling the standard XDAS moved()
function (think of an abstract member whose interface is defined but whose
inner implementation is up to the programmer). This moved() member is used
to relocate the instance (its data) to a different location if needed; in
this function you should update/relocate any pointers internal to the
instance/data.

- other nice things...

So, following a few simple rules and mechanisms you get a safe piece of code
that can be used in multichannel and multithreaded systems and you can
defragment memory whenever needed by calling the moved() methods of the
instances.

IMO, this is a rather useful standard to follow when designing code.

Alex
P.S. I've referred to C++ because it's a very similar concept, but it's not
C++. It's all pure C. Probably closer to Objective-C (correct me if I'm
wrong).




Re: Policy on rebooting?
[quoted text elided]
Actually, it does prevent the problem.  The reason is that
the RTOS won't take (for example) a 256 byte space and
re-allocate it in smaller chunks, then turn around and try
to make a 256 byte allocation out of a 128 byte and two 64
byte blocks.  Basically there is a memory pool.

The downside is, as you alluded to, that it is not as memory-efficient.
For example, a structure that needs 200 bytes will get a 256-byte
allocation.  Life is not perfect.
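The rounding Dennis describes can be made concrete (the 32-byte minimum pool size and power-of-two ladder are my assumptions): the request is rounded up to the next pool size, and the difference is internal fragmentation, i.e. wasted space inside the block.

```cpp
#include <cassert>
#include <cstddef>

// Pick the smallest pool block that fits the request, from a ladder of
// power-of-two pool sizes: 32, 64, 128, 256, ...
std::size_t pool_block_size(std::size_t request) {
    std::size_t block = 32;            // smallest pool (assumed)
    while (block < request) block *= 2;
    return block;
}
```

A 200-byte structure lands in a 256-byte block, wasting 56 bytes; this is the memory-efficiency cost traded for freedom from external fragmentation.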

Dennis,





Re: Policy on rebooting?
[quoted text elided]

Such allocation algorithms certainly exist, however, while
maintaining "ease of use" for the programmer, such schemes
can lead to stability/determinism/efficiency issues due to
fragmentation at a different level of granularity.

[quoted text elided]

Life is not perfect, however, if a system is *required* to
execute predictably for an indefinite period of time
without reboot (most autonomous embedded systems), then
these issues must be addressed directly *by design*
and ad-hoc use of malloc/free new/delete is inappropriate.


--
Michael N. Moran
Re: Policy on rebooting?

[quoted text elided]

Why is this a problem in embedded systems, which are
generally doing the same "thing" day-in, day-out?

I recently had to do an embedded system (a weighing machine
controller) which was heavy on user interface. With all the
various weight formats and languages, there was a lot of
string processing to be done. The only sane way to do this
was to use malloc() or an equivalent. On each iteration of
the main loop, all the mallocs had been freed, and the
heap was again one contiguous free block of 64K.
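Richard's pattern is essentially a bump arena that is reset once per main-loop iteration; a minimal sketch (the names and the 64K figure from above are the only things taken from the post, the rest is my illustration):

```cpp
#include <cassert>
#include <cstddef>

// One contiguous scratch arena for per-iteration string work.
constexpr std::size_t kArenaSize = 64 * 1024;   // the 64K mentioned above
static unsigned char arena[kArenaSize];
static std::size_t arena_used = 0;

// Bump allocation: just advance a cursor. No free list, no fragmentation.
void* arena_alloc(std::size_t n) {
    if (arena_used + n > kArenaSize) return nullptr;   // out of scratch space
    void* p = arena + arena_used;
    arena_used += n;
    return p;
}

// Called at the top of the main loop: every allocation from the previous
// iteration is discarded at once, and the heap is one free block again.
void arena_reset() { arena_used = 0; }
```

This works precisely because every allocation's lifetime ends by the next iteration; it would not suit the long-lived parse-tree case described below.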

Doing dynamic allocation for building, say, a parse-tree where
you are adding and removing things in arbitrary order, and where
individual tree nodes may remain for a long time is not a good
idea on an embedded system, but this doesn't mean that dynamic
allocation per se should be damned.

Richard [in PE12]
