Embedded systems using hard disks - reliability quandary

We make a product containing, among other things, an LCD and a hard
drive. We have until this moment been characterizing the product
lifespan based on the LCD backlight's rated lifespan, since if you
look at the specified MTBF column in our BOM, the CCFL has the smallest
number.

Recently, however, I was asked to add spindown-HDD-after-idle
functionality to reduce acoustic noise. Then someone said "it should
also increase drive lifespan greatly" and I started to research this
topic using the full 200-page drive datasheets instead of the one-page
spec sheets we used previously (which is where we got the single
"MTBF" number). Now, I'm thoroughly confused. I'm trying to work out
some reasonable defaults to reduce drive spinning time (= reduce noisy
time) without overstressing some part of the mechanism and inducing
premature failure.

Looking at one particular drive - IBM DJSA-220 - the drive is rated
for 5 years or 20,000 power-on hours, whichever comes first. At 24/7
power-on, that's about 2.28 years. However, the assumptions in that
lifespan are: less than 333 power-on hours per month (not valid for
our product, which is normally powered up 24/7), and seek/read/write
operations occupy less than 20% of power-on
hours (might or might not be valid for our product, depending on
exactly what the user is doing). The datasheet essentially says that
all bets are off if those limits are exceeded. Furthermore, the drive
is rated for 300,000 normal head unloads, and 20,000 emergency
unloads. Our product's power switch causes an emergency unload; spinning
the drive down from software is a normal unload.

These questions become even more interesting for some other drives we
use regularly, e.g. Fujitsu MHK2120AT. They are rated for the same 5
years/20k hours (250 hours/month maximum, and 1 power-cycle per day
REQUIRED, but no more than 50 spinup/down operations per day!), but
they are only rated for 50,000 spindle start/stop operations total (it
appears this limit is related to the associated head load/unload
operations, not specifically the spindle motor). If the user has the
drive auto-spinning down every 15 minutes (not unreasonable), this
means less than 1.5 years before something dies. Weirdly enough, the
250 hours/month and max 50 spinups per day limits are removed if you
can guarantee to keep the disk envelope at 48 Celsius or below (which
we can't).
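
To put numbers on that, here is the back-of-envelope arithmetic I have been
doing. It's only a rough sketch: the budgets are the datasheet figures quoted
above, it assumes the drive spins down once every 15 minutes around the
clock, and real wear obviously isn't this linear.

/* Rough sketch: years until each rated budget is exhausted, assuming the
 * unit is powered 24/7 and the drive spins down once every 15 minutes. */
#include <stdio.h>

#define HOURS_PER_YEAR (24.0 * 365.0)

int main(void)
{
    double poh_budget       = 20000.0;  /* rated power-on hours            */
    double unload_budget    = 300000.0; /* IBM DJSA-220 normal unloads     */
    double startstop_budget = 50000.0;  /* Fujitsu MHK2120AT start/stops   */
    double spindown_minutes = 15.0;

    double cycles_per_year = HOURS_PER_YEAR * 60.0 / spindown_minutes;

    printf("20k power-on hours  : %.2f years\n", poh_budget / HOURS_PER_YEAR);
    printf("300k normal unloads : %.2f years\n", unload_budget / cycles_per_year);
    printf("50k start/stops     : %.2f years\n", startstop_budget / cycles_per_year);
    return 0;
}

On those assumptions the Fujitsu's 50,000 start/stop budget is the first
thing to run out, at about 1.4 years, which is where the "less than 1.5
years" figure comes from.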


Is there anyone else who uses 2.5" IDE hard disks in an embedded
system, and has developed a sane method of choosing default power
management settings? I would like to be able to say "yes, guarantee it
for three years", although two would do. As a secondary point, I'd
like to know what to put in the product's instruction manual, since
the HDD sleep time is user-configurable. Do I say "anything other than
our carefully tuned default setting will reduce the lifespan of your
hard disk"? Or should we just put a waiver in the warranty saying the
HDD is only warranted for 12 months, and after that time only the
labor is free?

Re: Embedded systems using hard disks - reliability quandary
snipped-for-privacy@larwe.com (Lewin A.R.W. Edwards) wrote in

[snip]


FWIW,

I think you will find that these drives are not supposed to run 24/7 and
will start returning UNC (uncorrectable data) errors sooner than you'd like
under heavy load. I encourage you to enable the maximum AAM setting
(hopefully supported) to both quiet the seeks and slow them down, which is
good for lifetime. Then enable the drive's APM and set the auto-spin-down
feature so that your software does not need to. The drive will spin up when
you access it (be sure to allow the required 31 seconds before declaring an
error, though). I
use 3.5" DeskStars.
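
For reference, both of those knobs are plain SET FEATURES subcommands at the
ATA level. A minimal sketch follows; ata_set_features() is a hypothetical
stand-in for whatever taskfile call your IDE driver provides (it would issue
command 0xEF with the given Features/Sector Count pair), and you should check
IDENTIFY DEVICE first since not every drive supports AAM or APM.

#define SETFEAT_ENABLE_AAM  0x42  /* Sector Count = acoustic level          */
#define SETFEAT_ENABLE_APM  0x05  /* Sector Count = power management level  */
#define AAM_LEVEL_QUIET     0x80  /* 0x80 = quietest, 0xFE = fastest        */
#define APM_LEVEL_MIN_POWER 0x01  /* lowest power, standby permitted        */

/* Hypothetical stub: replace with a real SET FEATURES (0xEF) taskfile write. */
static int ata_set_features(unsigned char feature, unsigned char count)
{
    (void)feature; (void)count;
    return 0;
}

static int drive_set_quiet_mode(void)
{
    if (ata_set_features(SETFEAT_ENABLE_AAM, AAM_LEVEL_QUIET) != 0)
        return -1;                         /* drive may not support AAM */
    if (ata_set_features(SETFEAT_ENABLE_APM, APM_LEVEL_MIN_POWER) != 0)
        return -1;                         /* drive may not support APM */
    return 0;
}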

--
- Mark ->
--

Re: Embedded systems using hard disks - reliability quandary
Hi Mark,

Hmm. Interesting response. I guess I should go into more detail:

[snip]

Yes, this makes sense given that they're intended to be laptop drives. I
will download some of the 3.5" specs and compare the reliability data. But
our housing would require radical redesign to accommodate a 3.5" drive.
And heat would be a problem with those high-speed desktop drives.

However, the typical failure modes we see are:

* Spindle bearing noise suddenly increases. This usually causes an end-user
complaint before the unit has time to actually fail. This problem happens
mostly with pre-Hitachi IBM drives, but it also happens with 20GB+ Fujitsus.
But the spindle noise varies widely even inside a single batch of drives;
I've opened a box of 20 to test this phenomenon and found two or three loud
ones alongside 17-18 imperceptibly quiet ones, all with the same manufacture
date.

* Drive won't spin up. Interestingly, I have opened a couple of these drives
and found the bearing is jammed really, really hard. It's not head stiction,
because I can see the drive seeking the heads across the stationary disk. If
I break the jam by turning it a couple of revs by hand (with power off!),
then replace the cover, the drive operates flawlessly (apart from SMART
reporting "imminent failure! Danger, Will Robinson!!"). This problem only
occurs on Fujitsu drives 12GB and smaller (12, 10 and 6 are the sizes we
have used; these are non-coincidentally all out of older units, too).

[snip]

Do you mean the acoustic noise management feature? It's supported on all the
drives we have shipped to date. I set it to maximum in this latest beta
firmware version (the first version to support spindown). However I have to
give the user the option of disabling that feature, because it adversely
affects the product's ability to play high-bitrate video.

[snip]

Oh, I don't spin the drive down manually, I use the normal set sleep time
command and let the firmware handle spindown. Sorry, should have clarified
that :) I allow the user to configure it to "disabled", or in 1-minute
intervals from 1-20 minutes.
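
For what it's worth, the mapping from that user setting to the drive is
trivial: the standby timer takes its period in the Sector Count register in
5-second units for counts 1-240, so 1-20 minutes maps onto counts 12-240
exactly. A sketch, with ata_idle_with_timer() as a hypothetical stand-in for
our real driver call (it would issue IDLE, 0xE3, with the given count):

/* Map the user-facing "minutes until spindown" setting (0 = disabled,
 * 1..20 minutes) onto an ATA standby timer count. */
static int ata_idle_with_timer(unsigned char count)
{
    (void)count;                           /* stub for the real taskfile write */
    return 0;
}

static int hdd_set_spindown_minutes(unsigned int minutes)
{
    if (minutes == 0)
        return ata_idle_with_timer(0);     /* count 0 disables the timer */
    if (minutes > 20)
        minutes = 20;                      /* clamp to our UI range      */
    return ata_idle_with_timer((unsigned char)(minutes * 60u / 5u));
}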

[snip]

How long have you been fielding products based around hard drives? I'd be
very interested to hear what kind of real-world reliability results you're
seeing 12-36 months down. Our normal production process includes a 48-hour
burn-in designed to catch as many moribund infants as possible. Recently we
have had a spate of units between two weeks and six months old suddenly
dying. In descending order of frequency, the top problems are:

* Intermittently bad RAM. The RAM is from different vendors/batches, and
appears to test good at first glance, but swapping out the SODIMM is
guaranteed to fix the problematic unit, and it's not dirty connectors
because I've tried cleaning them.
* Hard disk failures of the type mentioned above, particularly sudden
increases in bearing noise.
* Mainboard failures. In particular, we are observing that the CS5530 chip is
just not putting out a video signal. Poking around on the board, all the
necessary clocks and supplies seem OK, and by inspecting over a serial port
I can write/read all the chip registers, but there's just no video output on
either the analog or TFT-LCD ports. Weird.

The actual numbers of failed units aren't huge, but they represent a big
overall spike; for instance, out of a sample of 250 units between 12-24
months old, we have only four real failures*, all of which were failed power
supplies reasonably attributable to mains glitches. But there is no common
factor that would be an obvious cause for the current set of problems.


* - Not counting units which we have determined were damaged in shipping.

--
-- Lewin A.R.W. Edwards (http://www.zws.com /)
Learn how to develop high-end embedded systems on a tight budget!
Re: Embedded systems using hard disks - reliability quandary
 > [ snip ]

You could consider having battery-powered RAM, plus perhaps some Flash, as
short-term storage and using the disk strictly as long-term storage, such
that the disk is powered up only once a day for a quick transfer.
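
A very rough sketch of the idea (the buffer size, the flush interval and the
disk_* helpers are all invented for illustration; the point is only that
writes land in nonvolatile staging memory and the disk is woken just for a
bulk flush):

#include <string.h>

#define STAGE_SIZE (512u * 1024u)          /* battery-backed RAM or flash   */

static unsigned char stage[STAGE_SIZE];
static unsigned int  stage_used;

extern void disk_spin_up(void);            /* hypothetical disk helpers     */
extern void disk_append(const void *buf, unsigned int len);
extern void disk_spin_down(void);

static void stage_flush(void)              /* call once a day, or when full */
{
    if (stage_used == 0)
        return;
    disk_spin_up();
    disk_append(stage, stage_used);
    disk_spin_down();
    stage_used = 0;
}

static void stage_write(const void *buf, unsigned int len)
{
    if (stage_used + len > STAGE_SIZE)     /* staging area full: force flush */
        stage_flush();
    memcpy(stage + stage_used, buf, len);
    stage_used += len;
}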

Rene
--
Ing.Buero R.Tschaggelar - http://www.ibrtses.com
& commercial newsgroups - http://www.talkto.net


Re: Embedded systems using hard disks - reliability quandary

[snip]

Oh, no. This is for storage of multimedia data (pictures, MPEG movies, MP3
audio). Gigabytes of data, not a tiny logfile :)

--
-- Lewin A.R.W. Edwards (http://www.zws.com /)
Learn how to develop high-end embedded systems on a tight budget!
Re: Embedded systems using hard disks - reliability quandary
On Thu, 23 Oct 2003 13:49:06 -0400, "Lewin A.R.W. Edwards"

[snip]

There are mature GB-size flash disks on the market.
No noise, no seek time, they don't care about spin-ups/spin-downs;
operating temperature from -40 to +85 Celsius, plus vibration tolerance ...
If you want reliability - you gotta pay...

Duke S.

Re: Embedded systems using hard disks - reliability quandary
: There mature GB-size flash disks on the market.
: No noise, no seek time, do not care for spin-ups/spin-downs;
: operating temperature from -40 to +85 plus vibrations ...
: If you want reliability - you gotta pay...
:

...and with a limited number of write/erase cycles, depending on how much
you write to them...

Flash is not the answer for storing large multimedia files.  It dies after
a measly million or so writes (depending on the media, of course).

I'm not sure how this compares with the life of an IDE drive though.  I
think the comparison would suggest buying IDE drives that can be easily
replaced......

--buddy

--
Remove '.spaminator' and '.invalid' from email address
when replying.


Re: Embedded systems using hard disks - reliability quandary

[snip]

That is a million writes per sector, and they are wear-leveled. Completely
rewriting a 1GB flash device at, say, 4MB/s will take 256 seconds, and doing
that 1 million times will take about 71,000 hours, which compares rather well
with the 20,000 power-on hours for a 2.5" hard drive.
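
Spelling the arithmetic out (perfect wear-leveling and a sustained 4MB/s are
both generous assumptions):

#include <stdio.h>

int main(void)
{
    double size_mb      = 1024.0;                 /* 1GB device             */
    double write_mb_s   = 4.0;
    double erase_cycles = 1e6;                    /* per-sector endurance   */

    double secs_per_pass = size_mb / write_mb_s;  /* 256 s per full rewrite */
    double total_hours   = secs_per_pass * erase_cycles / 3600.0;

    printf("%.0f s per pass, about %.0f hours of continuous rewriting\n",
           secs_per_pass, total_hours);           /* roughly 71,000 hours   */
    return 0;
}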




Re: Embedded systems using hard disks - reliability quandary
[snip]

At thousands of dollars each, and not designed for constant-rewrite
applications.

*mild exasperation* I think my question was very specific. I am interested
in heuristics for calculating power management settings that will extend the
life of an IDE hard disk. It is ridiculous to think about using flash media
in our application, until 30GB of flash storage can be obtained for under
$80. I know there are people using hard disks in applications such as laptop
computers, MP3 players, and TiVos, so somebody probably knows the answer to
my question.
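
To make "heuristics" concrete, the kind of arithmetic I'm after looks like the
sketch below: turn the drive's rated budgets into per-day ceilings for a
chosen target service life, then pick defaults (spindown timeout, AAM/APM
levels) that keep expected usage under both ceilings. The budgets are the
datasheet figures from my original post; treating the wear as linear is of
course a gross simplification.

#include <stdio.h>

int main(void)
{
    double target_years = 3.0;
    double cycle_budget = 50000.0;   /* start/stop (load/unload) cycles    */
    double poh_budget   = 20000.0;   /* rated power-on (spinning) hours    */

    double days = target_years * 365.0;

    printf("to last %.1f years the drive can average at most:\n", target_years);
    printf("  %.1f spin-down cycles per day\n", cycle_budget / days);
    printf("  %.1f spinning hours per day\n",   poh_budget   / days);
    return 0;
}

With a three-year target that works out to roughly 46 spin-downs and 18
spinning hours per day; the first number is, interestingly, right around the
Fujitsu's own 50-per-day ceiling.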

[snip]

Much slower write access than hard disks, also. Depending on the media type,
could be slower read access as well.



Re: Embedded systems using hard disks - reliability quandary
On 23-Oct-03, Lewin A.R.W. Edwards said:

[snip]

I can't claim to have the experience you hope to find, but some of my
observations may yet be of use to you.

We manufacture video servers, and have been shipping such beasts since
1997. One of the first observations we made was that the drives in use in
1997 (9GB Seagate Barracudas), if run in the open on the bench, would soon
rise to the full rated temperature given in the spec sheet (150F). We
developed our own mounting subchassis, and sandwiched the drives between
aluminum plates, with significant airflow. We then observed that the
temperature of each drive held at about 105F. In those units (many still
in service), we had remarkably few drive failures.

Heat is always a significant factor in failure, and drives generate it on
their own, in prodigious amounts. It would be interesting to develop some
power cycling controls, and then to graph the performance trade-offs vs.
average operating temperature. I'm sure that managing the up-time of the
drive will boost MTBF, but by how much, of course, is difficult to
ascertain.

My observations over the last 6 years have been that drives either fail
very quickly from some sort of component failure (typically less than 30
days), or they fail from wear. Since the obvious wear point is the bearing,
and it's inaccessible, in my view the only things you can do to extend its
life are to a) reduce running time, and b) reduce heat generation.

It would be great to discuss such things with a drive design engineer or
two, but the odds of gaining access to any such seem very poor.

--
Bill
Posted with XanaNews Version 1.15.7.4

Re: Embedded systems using hard disks - reliability quandary

[snip]

return key broken?

Re: Embedded systems using hard disks - reliability quandary
Hi William,

[snip]

Our drives are mounted, PCBA-down, onto a subchassis made of ~2mm steel
(it's some crazy non-metric gauge, but it's roughly 2mm thick). We have bent
tabs that run up the sides of the drive, and we use the side-entry screw
holes, not the bottom screw holes. The "parts side" of the subchassis faces
the outside world, with about 2cm of airspace then a perforated thin steel
outer housing.

The thing is, we don't temperature-test/characterize every single point over
the surface of the subchassis. We have a couple of temperature probe
points - on top of hot ICs, and in the power supply - and we test in various
environments to make sure we don't exceed rated temperatures of 60 Celsius,
with a target temperature of 50 degrees.
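
One thing I'm now considering is logging the drive's own temperature sensor
during burn-in and in the field, so we can correlate the spindown settings
with drive temperature. A sketch, assuming a hypothetical smart_read_data()
wrapper around SMART READ DATA (command 0xB0, feature 0xD0); attribute IDs and
raw-value layouts are vendor-defined, so ID 194 holding degrees C in its first
raw byte is only the common convention:

#define SMART_ATTR_TEMPERATURE 194

/* Hypothetical wrapper: issue SMART READ DATA and copy the 512-byte result. */
extern int smart_read_data(unsigned char sector[512]);

static int hdd_read_temperature_c(void)
{
    unsigned char sector[512];
    int i;

    if (smart_read_data(sector) != 0)
        return -1;

    /* 30 attribute entries of 12 bytes each, starting at offset 2:
     * byte 0 = attribute ID, bytes 5..10 = vendor-specific raw value. */
    for (i = 0; i < 30; i++) {
        const unsigned char *entry = &sector[2 + i * 12];
        if (entry[0] == SMART_ATTR_TEMPERATURE)
            return entry[5];               /* degrees C, by convention      */
    }
    return -1;                             /* no usable temperature reading */
}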

> Heat is always a significant factor in failure, and drives generate it on
> their own, in prodigious amounts.

Unfortunately, the main thing I managed to extract from the datasheets is
that you're damned if you do and damned if you don't. Powering down the
drive eats into a "number of head unload cycles" lifespan. Leaving the drive
running eats into a "number of hours of bearing life" lifespan.
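
The two budgets do at least give a break-even point. An unload/reload cycle
costs 1/300,000 of the IBM's cycle budget, and an hour of spinning costs
1/20,000 of its power-on-hours budget (assuming those hours effectively mean
spinning hours), so spinning down only pays off, budget-wise, when the
expected idle time exceeds 20,000/300,000 hours, i.e. about 4 minutes:

#include <stdio.h>

int main(void)
{
    double poh_budget    = 20000.0;      /* hours of spinning              */
    double unload_budget = 300000.0;     /* normal load/unload cycles      */

    double breakeven_min = poh_budget / unload_budget * 60.0;
    printf("break-even idle time: %.1f minutes\n", breakeven_min);  /* 4.0 */
    return 0;
}

Which at least tells me that timeouts much shorter than a few minutes are
never worth offering as defaults.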

> It would be great to discuss such things with a drive design engineer or
> two, but the odds of gaining access to any such seem very poor.

Yeah.. I would settle for talking to an engineer at one of the big laptop
manufacturers, though. When Toshiba decides on the warranty period for their
laptops, they must have some sensible method of determining it...

--
-- Lewin A.R.W. Edwards (http://www.zws.com /)
Learn how to develop high-end embedded systems on a tight budget!
Re: Embedded systems using hard disks - reliability quandary
On 27-Oct-03, Lewin A.R.W. Edwards said:

[snip]

Probably even more difficult than getting a shot at a drive design
engineer....

I haven't looked at the head load/unload cycle figures. I'd have
thought that the numbers for that activity would be more than
sufficient.

We always use the side mounting holes, ensure good contact with metal,
and on the 3.5" drives, we use screws in all three positions on each
side, also for better thermal coupling. Not sure what I would expect
from steel; we have always mounted with aluminum.

It's an interesting problem, and I'd be interested in what you find if
you develop some test setups.

--
Bill
Posted with XanaNews Version 1.15.7.4
