OT: bad sector remapping

Hi,

I'm putting together a laptop for a friend. Clean install of XP. Add required drivers, etc.

At this point, I would normally take a snapshot of the disk (restoring from snapshot is *much* quicker than reinstalling, etc.).

Doing so, Clonezilla claims a couple of bad sectors (I'd have to doublecheck the log to see which tool complained).

I believe this, as the "disk check" built into the BIOS gave a super informative "Error #2" or somesuch. Machine operates normally, otherwise.

So, is XP smart enough to avoid the bad sector? Or, has it just not stumbled across it, yet?

I had thought modern PATA drives had finally adopted the "grown defect table" support SCSI drives have had. And, that the drive itself would take care of the necessary housekeeping (i.e., to map a "good" physical sector in to replace the bad one).

I've checked the Fujitsu site (seems they have sold off their drive business to Toshiba?) but nothing like a "low level format" utility there -- or any other diagnostic/maintenance tool.

It's a 100G drive so I'd hate to discard it. But, I think I have some 160's I could call on to replace it if push comes to shove...

I hate this time of year! :-/

Thx,

--don

Reply to
D Yuniskis

No, it's in the disk firmware. If it can recover data using its array of tricks (error correction, re-seeks and recalibration), it'll remap the bad sectors, and return correct data, possibly after a significant delay. If it can't recover the data, it reports disk reading errors, and XP can't do anything about it.

Since the late 1990s practically all disks have had firmware diagnostics called SMART. They can be read over ATA, SATA, SAS and even USB interfaces. In Linux, SMART data can be read via the skdump and smartctl commands, and there is a very nice disk utility called 'palimpsest' that can read SMART data, run SMART tests, and even characterize how disk speed depends on sector location. Some BIOSes display SMART status, but w/o much detail. There are manufacturer utilities to show SMART; they should be vendor-independent, actually, because SMART is almost standard.

While the firmware can swap out recoverable bad sectors, there was a Google paper showing that any bad sectors are a significant predictor of a future complete disk failure. Watch for Reallocated Sector Count (recovered errors) and especially Current Pending Sector (unrecoverable errors).
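As a rough illustration of watching those two counters, here is a minimal Python sketch that pulls them out of the attribute table printed by `smartctl -A`. The attribute names are the ones smartmontools prints; the sample text and the "anything above zero is worrying" rule are assumptions made for the example, not output from a real drive or a vendor recommendation.

```python
# Attribute names as printed by smartmontools' `smartctl -A`.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def watched_raw_values(smartctl_output):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            found[fields[1]] = int(fields[9])
    return found


def looks_worrying(raw_values):
    """Assumed rule of thumb: any non-zero raw count deserves attention."""
    return any(value > 0 for value in raw_values.values())


# Illustrative sample only; not captured from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4521
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
"""
```

In practice the text would come from running `smartctl -A` against the drive (typically as root) and passing its output to `watched_raw_values`.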

Bah, humbug, as well :)

Reply to
Przemek Klosowski

Yes, that's what I expected. So, the fact that XP didn't complain during the format, install and a subsequent "defragment" suggests it simply didn't *touch* that sector (?).

By contrast, Clonezilla *did* access it and complain (suggesting that Windows had put something *in* it).

I stumbled across "MHDD" which *seems* like it should be able to do what I need. Of course, at an hour per pass, it will be a while before I can go through the entire install with a Clonezilla chaser.

I'll post back when I have more conclusive results.

I thought the paper mentioned how *bad* SMART was at predicting failures (?)

End of year replacing equipment (and finding new homes for the old stuff) always ends up eating up far more time than I expect! :-/

Reply to
D Yuniskis

The disk will only report bad sectors when /reading/. If it notices a failure during a write (this is very rare), it will silently re-locate that sector to one of its spare sectors (unless, of course, the drive is so bad that it runs out of spares). When reading a sector, the disk can correct a number of errors. If the errors reach a certain threshold, then again the disk will silently re-locate the sector and return the corrected data. But if there were uncorrectable errors, the disk returns an error message.
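The behaviour described above can be mimicked with a toy model. This is only a sketch: the class name, the thresholds and the "simulated error count" are all made up for illustration, and real drive firmware is far more involved.

```python
class ToyDrive:
    """Toy model of firmware sector remapping (not real firmware logic)."""

    def __init__(self, sectors, spares, remap_threshold=4, max_correctable=8):
        self.data = {}        # physical sector -> payload
        self.remap = {}       # logical sector -> spare physical sector
        self.free_spares = list(range(sectors, sectors + spares))
        self.bit_errors = {}  # physical sector -> simulated error count
        self.remap_threshold = remap_threshold  # assumed value
        self.max_correctable = max_correctable  # assumed value

    def _physical(self, lba):
        return self.remap.get(lba, lba)

    def _reallocate(self, lba):
        if not self.free_spares:
            raise OSError("out of spare sectors")
        self.remap[lba] = self.free_spares.pop(0)
        return self.remap[lba]

    def write(self, lba, payload):
        phys = self._physical(lba)
        if self.bit_errors.get(phys, 0) > 0:
            # A failing sector is silently relocated on write.
            phys = self._reallocate(lba)
        self.data[phys] = payload

    def read(self, lba):
        phys = self._physical(lba)
        errors = self.bit_errors.get(phys, 0)
        if errors > self.max_correctable:
            # Too damaged to correct: the host sees a read error.
            raise OSError(f"unrecoverable read error at sector {lba}")
        payload = self.data.get(phys)
        if errors >= self.remap_threshold:
            # Correctable but marginal: relocate and return corrected data.
            self.data[self._reallocate(lba)] = payload
        return payload
```

In this model, a sector that can no longer be read keeps returning errors until something writes to it, at which point it is quietly moved to a spare.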

Think about what happens during an NTFS format. Assuming it is a "quick" format, very little is actually written to the disk - just the superblocks and a skeleton MFT. And /nothing/ is read - thus bad sectors will not be found. During installation of XP, very little is read - most access is writing to the disk, so again the chances of spotting a failure are small. In use, XP will obviously read from the disk - but only the sectors it needs, which is a small proportion of the total written data. But when Clonezilla is copying a disk, it will read /everything/ - if there are errors, these will show up.
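The "read everything and see what fails" idea can be sketched in a few lines of Python. This is not how Clonezilla or partclone are implemented; the function names and the 64 KiB chunk size are assumptions chosen for the example.

```python
def scan_for_read_errors(read_at, total_bytes, chunk_bytes=64 * 1024):
    """Read every chunk once; return offsets of chunks that raised OSError."""
    bad_offsets = []
    offset = 0
    while offset < total_bytes:
        length = min(chunk_bytes, total_bytes - offset)
        try:
            read_at(offset, length)
        except OSError:
            # Note the failure and carry on with the next chunk.
            bad_offsets.append(offset)
        offset += length
    return bad_offsets


def file_reader(path):
    """Build a read_at callable for a file or block device at `path`."""
    handle = open(path, "rb", buffering=0)

    def read_at(offset, length):
        handle.seek(offset)
        return handle.read(length)

    return read_at
```

Pointing `file_reader` at a raw disk device (which normally needs administrator rights) and passing the result to `scan_for_read_errors` along with the device size would touch every sector, including the ones no file currently uses, which is why a full clone finds errors that ordinary use does not.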

It is also possible that Clonezilla does a more low-level read and gets information about poor or failing sectors rather than just unrecoverably failed sectors. I don't know if that's the case or not - you would have to check in the Clonezilla documentation.

Reply to
David Brown

Yes, but not "on the fly", as the drive does internally. If you run a chkdsk /F/R, Windows will test all sectors and try to relocate the data in the bad sectors to a good sector. It is unlikely the recovery will succeed, but at least it will let you know which file is lost.

Quite possible. There are lots of files that are almost never used, and even quite a few that are never used in normal operation.

They do. The problem is that this defect table is of limited size, and when it is full, the drive has to start marking additional bad sectors the old fashioned way.

This means that once you see the first bad sector, there are actually hundreds, maybe thousands of bad sectors already. This is caused by a fault in the drive, and this fault will cause more bad sectors to appear with time.

Modern drives use a voice coil to position the heads. Such drives cannot be low level formatted, because they need the servo information on the platters to position the heads. If you erase the platters with a strong magnetic field, the drive will start to hopelessly clank the heads against the end stop, desperately trying to find the servo information.

That's poor customer service. But if the BIOS says the drive is bad, and a different drive tests OK, then it *is* bad.

The drive is broken. If you don't want to discard it, use it as a paperweight.

If you continue to use the drive, you will experience never-ending, random problems, and you may hate Windows for being so unstable. Note that a bad sector cropping up in the page file will cause a bluescreen crash with a message suggesting a memory error. Once sector zero dies, you will no longer be able to use the drive on a Wintel BIOS based machine.

Do yourself a favor: Replace the drive.

--
RoRo
Reply to
Robert Roland

Not *exactly* correct. If chkdsk encounters an unused bad sector it will add it to the bad sector list. A status line from chkdsk reports the number of bad sectors if it is non-zero. That sector will be avoided (not remapped) until the next re-format.

I agree 1000%. The cost of the drive (1T < $100) is dwarfed by the cost of the data.

RK

Reply to
d_s_klein

That matches what I've seen, over the past ten years of doing hard-drive support at TiVo for DVR applications.

It's common for a drive to have a small number of "grown" defects over its lifetime (i.e. bad sectors which were not detected during factory formatting, but have "gone bad" during use). However, once you see more than a very few of these, or get several of them occurring in a fairly short period of time, it's a pretty good sign that the drive is on a rapidly-descending path to total failure, and it's time to back it up (if you haven't already, shame on you) and replace it.

I discussed the issue with engineers from a couple of hard-drive manufacturers, and believe that there's a good reason why this is the case. In modern hard drives, the read/write head is flying over the media surface at a *very* low height -- the width of a human hair is huge by comparison, and even a particle of cigarette smoke looks like a boulder. If there's any contaminating particle at all on the disk surface, it'll get between the head and the surface and cause the head to "bounce" or scrape. Even if it doesn't force the head away from the platter, it causes enough friction to heat up the tiny read/write head and change its electrical characteristics, resulting in a bad read (the engineers used the term "thermal asperity" to refer to this sort of event).

A major reason for the development of bad sectors is physical damage: "head slap", in which the head actually hits the platter and knocks some particles loose. This can happen if the drive or computer is banged or moved sharply while the drive is spinning, or even when it's powered off. If these particles remain on the disk surface (or fly away and then land again, elsewhere) they're likely to be hit by the head during a subsequent revolution... which can cause more scraping or even slapping of the head, more damage to the surface, and the creation of more loose particles. Although the drives have a fiber filter in the air-flow path which is intended to catch these particles, it's never 100% effective... and so the amount of contamination builds up as time goes by, and the problem snowballs and the drive goes downhill towards failure.

After playing around with a number of error-management ideas at TiVo, I came to the conclusion that it wasn't worth bothering with. A drive which shows a significant number of reallocated (or pending-reallocation) sectors probably doesn't have very long to live... its performance will degrade (due to the need to retry reads on sectors that are going bad) and it will lose more and more data (unrecoverable errors). The best filesystem- and application-level error management techniques I could dream up wouldn't have extended the life of a failing drive enough to be worth the effort and complexity.

--
Dave Platt                                    AE6EO
Friends of Jade Warrior home page:  http://www.radagast.org/jade-warrior
Reply to
Dave Platt

Excellent advice. The cost of a simple replacement is relatively small (around $50), the cost to you of lost productivity and/or data could be far higher if your current drive fails completely, and your current drive probably *will* fail completely before long no matter what you do.

--
Dave Platt                                    AE6EO
Friends of Jade Warrior home page:  http://www.radagast.org/jade-warrior
Reply to
Dave Platt

My point is that windows' (possessive) handling of the error must be done at a higher level (abstraction) than the remapping that the drive is *supposed* to be doing. I.e., it modifies some structure in the filesystem that *it* (windows) maintains; and does NOT force the drive to remap the sector.

[this makes sense if the "application" sits in userland and isn't tightly integrated into the OS -- otherwise, the OS would have to export the "remap bad HARDWARE sector" facility in its API]

E.g., I re-did the install. Chkdsk. Defrag. etc. and partclone (part of Clonezilla) *still* complained about a bad sector.

So, what I did last night (while waiting for the moon to creep into shadow) was to run this utility, repeatedly (it takes a bit over an hour to scan the drive). Each time, the same sector was flagged as "bad". Sector number corresponded with the one reported by partclone.

Then, had that utility "remap" the sector. And, another scan.

Sector no longer reported as bad. Though in its place, the "new" sector shows an increased access time, no doubt a result of the drive having to "fetch" the remapped cylinder. (The utility graphically depicts access times of each sector -- so, you can "see" each time the head moves to a new cylinder, as the first access in that cylinder will be higher than all subsequent ones.)

Ran the BIOS "test hard disk" utility. Passed without complaint.

Reinstalled windows. chkdsk. defragment. Still no complaints.

Ran clonezilla. Successfully completed in ~4 minutes (i.e., I can now do a "restore" in about the same amount of time -- not the hours that a new windows install would require!)

But, in theory, chkdsk would have examined EVERY sector on the drive, even those not currently in use by "files" (or "superblock")

Yes, we bulk erase (and/or drill a 3/8" dia hole through the drive) drives headed out for "scrap". Amusing to see folks try to resurrect a drive that has been bulk erased (but not yet drilled).

BIOS only complained when explicitly asked to "test hard drive". Obviously did a surface scan and complained when it encountered this sector (this is speculation on my part; error message was totally useless -- the equivalent of "check engine").

As I said, I have several other drives. But, my experience has been that getting *at* the drive often results in damage to the case (the "snaps" on these cheap plastic clamshells invariably break). So, if it boils down to opening the case to replace the drive, I'll turn the disk off in the BIOS and use it as a diskless workstation (which pretty much eliminates its usefulness as a laptop -- just turns it into a "portable, *wired* computer")

Or, "recycle" it.

For now, everything *looks* like it is working. So, I'll install the rest of the applications and periodically "scan" the disk by doing periodic partclones. (I can ALWAYS discard it at a later date! :> )

Thanks!

Reply to
D Yuniskis

But, is this list at filesystem level or *within* the drive (e.g., "grown defect list")?

This suggests the list is in an OS-maintained structure and NOT in the drive itself. It also suggests that formatting purges the list (?). I.e., format by itself will just re-introduce the flakey sector back into the pool of available sectors (until the next chkdsk)

Machine will just be used for email and to view camera photos off SD cards while traveling. As I said elsewhere, opening the case to replace the drive stands a good chance of damaging the case (I repair rescued laptops for a local non-profit so I am well aware of how easily they can be trashed -- having trashed far too many, myself! :> ). Some laptops are a bit friendlier for the "common" repairs (swap drives, swap PCI modules, swap memory -- some are even cooperative in replacing the CD/DVD!).

This one isn't. :-/

The consolation is that if it starts acting up "while traveling", it can just be dumped into the nearest trash can :>

Reply to
D Yuniskis

There is a bad sector list that is maintained by the OS on the hard drive as part of the file system. Every file system I have ever worked with has such a feature, and I date back to the RT-11 days.

Typical method is to mark the sector (cluster) as 'unavailable' *before* it is attached to a file, and it gets skipped when the OS searches for the next available sector (cluster).

When you build a new file system, this list typically starts at "empty". One exception is the Windows floppy format - it scans the disk and builds a new list every time unless you specifically tell it not to. There are probably options to the other format routines to force it to build a new list. (I use chkdsk with surface scan immediately after formatting.) In the 'pre-IDE' days, one got a chart with the drive, and format was told about the bad sectors "by hand".
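A filesystem-level bad list of this kind can be sketched as below. This is a toy allocator, not the on-disk layout of any real file system; the class and method names are invented for the example.

```python
class ToyVolume:
    """Toy cluster allocator with an OS-maintained bad-cluster list."""

    def __init__(self, cluster_count):
        self.cluster_count = cluster_count
        self.bad = set()     # clusters marked 'unavailable'
        self.in_use = set()  # clusters already attached to files

    def mark_bad(self, cluster):
        # Only clusters not yet attached to a file are marked here.
        if cluster in self.in_use:
            raise ValueError("cluster already belongs to a file")
        self.bad.add(cluster)

    def allocate(self):
        # Search for the next available cluster, skipping bad ones.
        for cluster in range(self.cluster_count):
            if cluster not in self.bad and cluster not in self.in_use:
                self.in_use.add(cluster)
                return cluster
        raise OSError("volume full")

    def format(self):
        # A fresh file system starts with an empty bad list, so a flaky
        # cluster returns to the pool until the next surface scan.
        self.bad.clear()
        self.in_use.clear()
```

Note that nothing here talks to the drive: the bad cluster is simply never handed out, which is different from the drive remapping the sector internally.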

RK

Reply to
d_s_klein

Configure it to boot off of a flash drive. Or boot off of a CD, and use a flash drive for storage. That way there is no data to recover after it has breathed its last.

Or realize that you can get a bucket-class laptop for less than you've 'spent' so far on this one. :)

RK

Reply to
d_s_klein

Thanks for the insight - the physical explanation of why this happens was something I could only guess at. I am mostly using 2.5" drives, and DPS (our OS here) has all the defect detection etc. stuff, done in the early '90s (mostly on a SCSI 200 MB 2.5" drive, which was the largest then... its R/W head must still have been of the inductive type; a little later drives got much better and larger). However, all that stuff has been practically unused since :-).

Can you say something about the life expectancy of drives which are on most if not all of the time? The SMART data do have a spin-up counter, and perhaps this is the main killer, but I wonder how long the bearing will last with little spin up/down involved. The number of disks I have had in my hands is dwarfed by the number your DVR experience suggests.

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments


Reply to
Didi

formatting link

has some interesting insights

Reply to
D Yuniskis

This makes sense. I'd never considered the issue before.

Yes, I can remember old MFM drives with such lists. I only ever had one DEC disk drive -- small (capacity, not size), 128 fixed heads. I seem to recall it was essentially byte (word) addressable (?)

Reply to
D Yuniskis

There *never* will be anything to "recover" on it. I don't use laptops for anything other than "small GUI's". A friend will be using this one to access email, browse the web and view pictures from digital camera. I.e., all things that exist *outside* the PC's disk.

I volunteer at a recycling facility. This attitude is far too common. "Recycled" machines (at most such places) are *scrapped* for precious metals, etc. A huge waste of resources for something that is typically still "serviceable" (it is not uncommon for an 18-wheeler to pull up FULL of PC's to be recycled -- just because some business decided that it's been 18 months since they last purchased machines... :-/ ). Unfortunately, it takes a lot of labor to reintroduce a machine into normal use once it has gone this route. But, schools, charities, etc. sure appreciate NOT having to pay for them!

I would prefer not to be a part of that problem.

Reply to
D Yuniskis

Modern drives can't be low level formatted by the user, but not because of the servo, which is written between the sectors or more deeply than the other information. One early drive with an embedded servo was the Toshiba MK-72x, an ST506/412 interface drive with 10 heads and 5 platters, and it could definitely be low level formatted.

Reply to
do_not_spam_me

First HD I owned was a Kalok 330 RLL. I think it was around 30 megabytes. The DEBUG command was used to low level format these and MFM drives. Usually on an AT/XT PC it was > debug.com then G=C800:5.

Do you go back that far?

--
Live Fast, Die Young and Leave a Pretty Corpse
Reply to
Meat Plow

I've rewritten timing tracks on DEC DF32 and RF08 head-per-track drives when you were in grade school.

Reply to
Jim Stewart

So is this "MY COCK IS BIGGER THAN YOURS" contest for you or do you really think I give a f*ck what you did or when you did it.

BTW I was in high school back when you were fiddling with your DIC.

--
Live Fast, Die Young and Leave a Pretty Corpse
Reply to
Meat Plow

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.