OT: Power faults and SSDs

A while back there was a discussion of power failures and their effects on hard disks. Well, not really a discussion, but a 'Yes, it does happen' vs. 'No, it doesn't happen' exchange.

Anyway, I found an article pointing out that SSDs are more susceptible to power faults.

And here is the link to the paper.

You need a few extra fingers and toes to count the faults on the SSDs vs. the few faults on a rotational disk.

Cheers

Reply to
Martin Riddle

Thanks for that. Another good reason to have a UPS.

Reply to
DecadentLinuxUserNumeroUno

You obviously didn't read the paper. They created a fake power failure by gating power through a kludge protoboard circuit. They did not turn off the AC to the power supply. Who knows how much ringing there was on the power supply pin due to wiring inductance and di/dt. A computer power supply has some hold-up time. Their test is quite bogus.

To elaborate, these wankers left the data bus connected to the SSD while they gated the supply voltage. This is a prescription for inducing latch-up.

Direct gating of the power to a device via a FET is just bad. Yes, it is done, and yes, most manufacturers will at least lab-test their parts to ensure reasonable behavior under such circumstances, but such shenanigans are not in the test flow. Chips provide power-down pins/modes. Use them.

While I'm at it, most home UPSes are cheesy square-wave (OK, "modified sine") inverters that hopefully switch over in time. Data centers use double-conversion UPSes.

Reply to
miso

Good SSD models have built-in power loss protection:

formatting link

VLV

Reply to
Vladimir Vassilevsky

I have two 320-series SSDs. So far, so good. Even the next-generation Intel SSD has a 5-year warranty.

Reply to
miso

Yep. Most hard disk drives also convert the spindle motor into a generator when they detect a momentary power loss, so that the drive can flush its write buffer to the platter. It's only a few msec, but it makes a big difference. As in the Intel article above, SSDs have a similar feature in the form of big fat capacitors on the power line: when the drive senses a power loss, the big caps give it time to flush the buffer. Whether the caps on the SSD can keep the drive up long enough to do this is the unanswered question.
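Just for scale, here's a back-of-the-envelope hold-up estimate (all numbers are assumptions, not from the paper or any datasheet): four 100 uF caps, the rail sagging from 5 V to 3 V, and the controller plus NAND drawing about 2 W during a flush.

  # rough SSD hold-up estimate; capacitance, voltages and power draw are guesses
  awk 'BEGIN { C = 4*100e-6; V1 = 5.0; V2 = 3.0; P = 2.0
               E = 0.5*C*(V1*V1 - V2*V2)      # usable energy in joules
               printf "%.2f ms of hold-up\n", 1000*E/P }'

That works out to roughly 1.6 ms, so the caps only buy a few milliseconds unless the vendor fits much bigger ones.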

The tests were also done on unformatted drives, which eliminates the benefits offered by the BIOS and operating system in detecting errors and assigning alternate blocks.

I couldn't tell from the long description of the methodology at what point in the write cycle they disconnected the power. If they pulled the plug in the middle of a block of data, they should not expect it to be written to the drive. There was also no clear indication of whether write caching was enabled in the Linux OS. If command queuing is enabled, writes are committed in whatever order minimizes "seek" times, not in sequential order. If power is interrupted during a re-ordered write, of course there are going to be data serialization errors.
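For what it's worth, under Linux you can at least see (and change) those knobs before running a test like this; a quick sketch, device name assumed:

  hdparm -W /dev/sda                      # show the drive's own write-cache state
  hdparm -W0 /dev/sda                     # turn it off for a test run
  cat /sys/block/sda/device/queue_depth   # >1 means NCQ can re-order writes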

The failure scenario is somewhat contrived. It creates a situation where the drive is writing all the time and the power drops during a write. In reality, the drive spends most of its time idle; it does mostly reads and comparatively few writes. You can see for yourself by using a performance monitoring tool and comparing the bytes read to the bytes written. On my XP box, the ratio is usually about 5:1 (using HDDled), but right now it's 1:1 because I was doing quite a bit of downloading today (about 14 hrs). Systems with a large read cache will have a lower ratio.
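The Linux equivalent of watching HDDled is /proc/diskstats; a quick sketch, device name assumed:

  # sectors read ($6) and written ($10) since boot, plus the read:write ratio
  awk '$3 == "sda" { printf "read %d MB, wrote %d MB, ratio %.1f:1\n",
                     $6*512/1e6, $10*512/1e6, $6/$10 }' /proc/diskstats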

The authors didn't bother supplying the names of the drive vendors or the models tested. I suppose this could be part of some kind of double-blind test method, but the final results should have included the maker and model.

--
Jeff Liebermann     jeffl@cruzio.com 
150 Felker St #D    http://www.LearnByDestroying.com 
Santa Cruz CA 95060 http://802.11junk.com 
Skype: JeffLiebermann     AE6KS    831-336-2558
Reply to
Jeff Liebermann

Are you suggesting that maybe software could have corrected some of the hardware failures? One problem: it could also compound them, and it makes comparisons harder.

True. Are you suggesting they did?

That depends on what it's being used for: OLTP, DVR, and video editing all write at a high frequency.

Particularly if they have plenty of RAM.

These are real numbers from an OLTP database server: 34,734,222,256 sectors written vs. 8,381,621,185 read. Roughly 4:1 in favour of writes.

They probably wanted to avoid the fervid attention of lawyers.

--
100% natural 

--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
Reply to
Jasen Betts

In my experience, it's uncommon these days for the BIOS and operating system filesystem layer to have much to do with bad-block management and reassignment. A lot of filesystems do have at least some of this capability, in principle, but it is very rarely used. Maintaining "good, readable storage" is a job now left almost entirely to the hard-drive (or SSD) onboard controller.

There are a number of reasons for this... and they boil down to "it's easier, and more effective, to do it this way."

It's rather difficult for an operating system to determine whether any given sector on the drive is "going bad" (starting to develop low-level bit errors). In the old days (old-style RLL disks) the OS could do a "raw" read of the data patterns on the drive and look for low-level errors, but that's a capability IDE/SATA drives generally don't provide today. The high-performance low-level Reed-Solomon (or other) error-correction coding on the disk, the sophistication of the "read channel" logic on the disk, and vendor-specific variations in the implementations would make it incredibly difficult for a host to determine whether "raw" data patterns are "good" or "bad".

These days, a "sector" isn't even always a "sector". On most high-capacity drives being made today, the drive controller is actually reading and writing 4k-byte sectors on the platter. Although the controller still does I/O to the host in terms of 512-byte "sectors" (logical blocks) this is an illusion on these drives.
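This is easy to see for yourself on Linux; a small sketch, device name assumed:

  blockdev --getss --getpbsz /dev/sda   # logical sector size, then physical sector size
  # an "Advanced Format" drive typically reports 512 and 4096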

It's hard for an OS to determine "soft" errors by looking at read timing, partly because of the read cache in the drive (which you can disable), and partly because occasional, non-repeating read errors are actually fairly common due to vibration and electronic noise.
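On Linux, the drive's read-lookahead can indeed be switched off for that kind of experiment; a hedged sketch, device assumed:

  hdparm -A /dev/sda    # show the read-lookahead state
  hdparm -A0 /dev/sda   # disable it while measuring read timings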

Even if you suspect that a particular sector may be going bad, you may not be able to do anything to the sector directly. SCSI disks usually allowed you a fairly fine degree of control over bad-block management, during the formatting process and otherwise... it's possible to tell the drive controller "Please map out sector NNNN and reassign it to a spare". The IDE/SATA command set has no such capability... all you can do is overwrite the questionable sector and hope that this either "heals" the problem, or the drive controller decides to reassign it to a spare. This tweak can be used as a nearly-last-ditch way of "mending" sectors on a failing drive (for a while at least) but it usually involves destroying the contents of the file involved.
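The overwrite-and-hope trick looks something like this on Linux (the sector number and device are made up for illustration, and this clobbers whatever file owns that sector):

  # write zeros over one suspect 512-byte LBA so the drive can "heal" or remap it
  dd if=/dev/zero of=/dev/sda bs=512 seek=123456789 count=1 conv=notrunc,fsync

Afterwards, watch the SMART Reallocated_Sector_Ct attribute to see whether the drive actually remapped it.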

Yes, when you do a "high-level" (filesystem-level) formatting of a hard drive, you can first do a read scan and see if any sectors are completely unreadable, and then put these in a "bad sector" table in the filesystem so they aren't assigned to any files you write. It's been years since I've seen anybody bother to do this. Standard filesystem formatting does *not* involve a complete read test in any Linux distro I know of, nor in Windows, because it takes many hours on a large drive and (for all practical purposes) never turns up any unreadable sectors.
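If you do want the old-style surface scan at format time, it's still available; a sketch assuming ext4 and e2fsprogs:

  badblocks -sv /dev/sdb1       # standalone read-only scan; -sv shows progress, verbose
  mke2fs -t ext4 -c /dev/sdb1   # or let mke2fs run the badblocks pass itself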

And, in my experience, by the time you have sectors which are truly "bad" (unreadable after multiple retries and use of the Reed-Solomon error correction codes) the drive is probably rotting... its reliability is going to be poor and its performance degraded. Nothing that the BIOS/OS can do is going to have much benefit.

A similar issue exists with regard to wear leveling on solid-state drives. It *can* be done at the filesystem layer, and it used to be done that way... but the industry has moved in the direction of doing it in the drive controller for IDE and SATA drives, so that these appear to be "plug-compatible" with rotating-media drives. You *can* run a filesystem with its own wear leveling and bad-block management on a "raw" flash device, or use a standardized "flash translation layer" library or driver, but (in my experience) the use of these sorts of techniques is limited to low-cost embedded systems that can't bear the expense of a drive with its own real controller.
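In the embedded world that usually means MTD plus UBI/UBIFS, with the wear leveling done in software; a rough sketch (the MTD partition number and volume name are assumptions):

  ubiattach /dev/ubi_ctrl -m 0      # attach UBI to raw MTD partition 0
  mount -t ubifs ubi0:rootfs /mnt   # UBIFS does its own wear leveling and bad-block handling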

So, really, these days, everybody seems to depend on the drive controller to maintain the health of the storage. The best thing you can do is query the health data (S.M.A.R.T.) on a regular basis, run an occasional S.M.A.R.T. "extended" test (which involves a full scan of the entire platter) and replace any drive which starts to throw bad sectors at you (because it's gonna die soon anyhow!).
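The periodic check is a one-liner with smartmontools (device name assumed):

  smartctl -H -A /dev/sda         # overall health plus the attribute table
  smartctl -t long /dev/sda       # start the "extended" self-test (full surface scan)
  smartctl -l selftest /dev/sda   # read the result once the test finishes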

As to data corruption during power outage... yes, it's a real issue, and it's not limited to solid-state disks. Hard drives have the same issue, which can possibly be even worse. If the drive controller re-orders the writes you give it (in order to minimize seek delay) changes to a file's "metadata" may reach the platter before the data sectors they point to are actually written, and this can lead to a corrupt file being present after a power loss. Good operating systems attempt to minimize this by being careful about the order in which they present data to the drive (and in many cases by writing transaction journals) but drive re-ordering of writes can still mess this up. What you need is the ability to tell the drive "This is a write-barrier moment. All data which has already been written, in the following context, needs to hit the platter *before* any data written in this context from this moment onwards". A proper sequence of write barriers can (if the drive obeys) eliminate re-ordering-related corruption.
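In the meantime, careful applications approximate the barrier themselves: write the data, force it out, and only then update the metadata that points at it. A minimal shell sketch (the file names are made up):

  dd if=new-data of=report.txt.tmp conv=fsync 2>/dev/null   # data hits the platter first
  mv report.txt.tmp report.txt                              # then the name/metadata flips over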

Unfortunately, IDE/SATA drives don't (yet) have good command-set support for write barriers, as far as I know. All you can do in most drives is say "FLUSH BUFFER", which forces *all* writes pending in the drive to be committed. It's effective but slow, as it's a complete barrier and (in drives that don't support native command queueing) it blocks *ALL* further I/O to the drive (including reads) until the buffer is flushed.
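From userland on Linux, that full flush looks roughly like this (device assumed):

  hdparm -F /dev/sda              # ask the drive to flush its on-board write cache
  blockdev --flushbufs /dev/sda   # and flush the kernel's own buffers for the device too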

--
Dave Platt                                    AE6EO 
Friends of Jade Warrior home page:  http://www.radagast.org/jade-warrior 
  I do _not_ wish to receive unsolicited commercial email, and I will 
     boycott any company which has the gall to send me such ads!
Reply to
Dave Platt

I'm not sure how to answer that because the methodology isn't clear. I do know that they used Linux and raw filesystems, which suggests that a program like dd was used to perform the writes. The OS does quite a bit of buffering and some error checking while scribbling to the drive. This could easily be a test of how well Linux tolerates having its target drive disappear in the middle of a write.
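If they really were using dd against the raw device, whether the OS page cache was in the loop makes a big difference; a sketch of the two cases (device and sizes are made up, since the paper doesn't say):

  dd if=/dev/urandom of=/dev/sdb bs=4k count=100000                    # goes through the page cache
  dd if=/dev/urandom of=/dev/sdb bs=4k count=100000 oflag=direct,sync  # bypasses it and syncs each block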

Again, it's difficult to tell what they did and how they did it without details. No code fragments, no photos of the entire setup, no OS buffer settings, no drive jumper settings, very little on how the power disconnect was timed, and very little on how the results were interpreted. The paper concentrates on the background and logic, and leaves out the implementation, operation, and results interpretation.

True. My numbers for the ratio of reads to writes also depend on what I'm doing. Incidentally, for your amusement, booting XP read about 900 KBytes and wrote 320 KBytes. I just pulled the numbers from a server running about 10 thin clients: 15 TB read, 1 TB write, 28-day uptime.

Oh-oh, I may have goofed. The utilities I'm using pull their numbers from the OS, which might be including read/writes to/from the OS HD cache in the numbers. I'll check, but it will need to be next week. I'm doing jury selection this week.

Not unexpected if you're locally caching the forms and mostly doing order entry, which is predominantly writes. Database backups will skew the numbers in favor of reads. Also, you might find it amusing to use the uptime to calculate the average read+write speed. Then, divide that by the speed reported by your favorite HD benchmark. The result is approximately the utilization of the drive. I typically see about 5% utilization on desktops (not including backups) over a typical 24 hr period.
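Worked through with the thin-client server numbers above (the 100 MB/s benchmark figure is an assumption; plug in your own):

  awk 'BEGIN { bytes = 16e12; secs = 28*86400; bench = 100e6
               avg = bytes/secs
               printf "avg %.1f MB/s -> about %.0f%% utilization\n", avg/1e6, 100*avg/bench }'

That comes out to roughly 6.6 MB/s, or about 7% of the benchmark rate, which is in the same ballpark as the 5% figure.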

Possibly. However, I would have difficulty believing that a drive manufacturer would sue in order to suppress unfavorable test results.

Here's the Google version from their drive failure analysis: "However, in this paper, we do not show a breakdown of drives per manufacturer, model, or vintage due to the proprietary nature of these data." What happened was that the various drive manufacturers helped Google perform the tests, but demanded in return that the vendor-specific information not be published in the event the data proved unfavorable to a manufacturer. I saw no indication of such an arrangement with this SSD and drive test.

--
Jeff Liebermann     jeffl@cruzio.com 
150 Felker St #D    http://www.LearnByDestroying.com 
Santa Cruz CA 95060 http://802.11junk.com 
Skype: JeffLiebermann     AE6KS    831-336-2558
Reply to
Jeff Liebermann

Yeah, I saw a demo on YouTube (possibly "Shouting in the Datacenter") and was able to replicate those results on a smaller scale in the office.

On Linux:

  ddrescue --force /dev/sda /dev/null

then shout at the drive and watch the data rate dip.

--
100% natural
Reply to
Jasen Betts
