Problem with an external disc drive

Ad> So what is a suitable replacement HDD or SSD ?

I would recommend Seagate; I use nothing else and am very satisfied :)

Mount a 500GB or 1TB drive in a USB 3.0 case (more reliable than 2.0) or, as I did, mount a 250GB SSD in a USB 3.0 case. WD is also known to be reliable.

If you want long-term stability consider a 3.5" drive in a USB 3.0 case using the external power supply. I use these for Laptop backups, too :)

.- Keep the faith, --------------------------------------------------.
| Ben aka cMech  Web: http|ftp|binkp|telnet://cmech.dynip.com        |
| Email: fido4cmech(at)lusfiber.net                                  |
`----------- WildCat! Board 24/7 +1-337-984-4794 any BAUD 8,N,1 ---'

... Just my opinion, but of course I'm correct.

Ben Ritchey

Pi B+ with an external Seagate 2TB disc drive connected via USB, formatted as EXT4. This has been running happily for several years.

Today is Monday.

Late on Saturday night, accessing the drive became impossible; basic commands (ls, rm, du) would hang indefinitely.

Having been out for most of the day yesterday, it was late afternoon before I was able to look at it. It was clear that some files had been added to the disc, but not everything that I would expect, and soon after I started to look at it, it started to hang again, so I tried to reboot it. /var/log/messages shows that the reboot command was issued at 17:20. At 17:44, there is an entry to say that "nfsd: last server has exited, flushing export cache". At 18:17, the boot process appears to start, this takes about 10 seconds.

This morning I logged in briefly, and all appeared to be well, files were written to it overnight, and during the day. This afternoon, I logged in again to see if I could work out what had happened, and at around that time, error messages started to appear :

May 1 14:39:35 ffoilpi1 kernel: [67009.943117] sd 0:0:0:0: [sda] UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
May 1 14:39:35 ffoilpi1 kernel: [67009.943167] sd 0:0:0:0: [sda] Sense Key : 0x3 [current]
May 1 14:39:35 ffoilpi1 kernel: [67009.943193] sd 0:0:0:0: [sda] ASC=0x11 ASCQ=0x0
May 1 14:39:35 ffoilpi1 kernel: [67009.943222] sd 0:0:0:0: [sda] CDB: opcode=0x28 28 00 1f ec a8 00 00 00 f0 00

Looking at dmesg, things are getting worse :

[67016.577720] sd 0:0:0:0: [sda] UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[67016.577768] sd 0:0:0:0: [sda] Sense Key : 0x3 [current]
[67016.577793] sd 0:0:0:0: [sda] ASC=0x11 ASCQ=0x0
[67016.577824] sd 0:0:0:0: [sda] CDB: opcode=0x28 28 00 1f ec a8 80 00 00 08 00
[67016.577848] blk_update_request: critical medium error, dev sda, sector 535603328
[67201.185193] INFO: task jbd2/sda1-8:2602 blocked for more than 120 seconds.
[67201.185232] Not tainted 4.1.19+ #858
[67201.185246] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[67201.185262] jbd2/sda1-8 D c0534afc 0 2602 2 0x00000000
[67201.185334] [] (__schedule) from [] (schedule+0x4c/0xa4)
[67201.185373] [] (schedule) from [] (schedule_timeout+0x180/0x1f8)
[67201.185407] [] (schedule_timeout) from [] (io_schedule_timeout+0xc4/0x148)
[67201.185436] [] (io_schedule_timeout) from [] (bit_wait_io+0x48/0x70)
[67201.185464] [] (bit_wait_io) from [] (__wait_on_bit+0x90/0xcc)
[67201.185492] [] (__wait_on_bit) from [] (out_of_line_wait_on_bit+0x78/0x84)
[67201.185532] [] (out_of_line_wait_on_bit) from [] (__wait_on_buffer+0x3c/0x44)
[67201.185575] [] (__wait_on_buffer) from [] (jbd2_journal_commit_transaction+0x1008/0x1700)
[67201.185618] [] (jbd2_journal_commit_transaction) from [] (kjournald2+0xb8/0x23c)
[67201.185665] [] (kjournald2) from [] (kthread+0xdc/0xf8)
[67201.185706] [] (kthread) from [] (ret_from_fork+0x14/0x3c)
[67321.186203] INFO: task jbd2/sda1-8:2602 blocked for more than 120 seconds.
[67321.186241] Not tainted 4.1.19+ #858
[67321.186256] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[67321.186271] jbd2/sda1-8 D c0534afc 0 2602 2 0x00000000
[67321.186343] [] (__schedule) from [] (schedule+0x4c/0xa4)
[67321.186381] [] (schedule) from [] (schedule_timeout+0x180/0x1f8)
[67321.186411] [] (schedule_timeout) from [] (io_schedule_timeout+0xc4/0x148)
[67321.186440] [] (io_schedule_timeout) from [] (bit_wait_io+0x48/0x70)
[67321.186471] [] (bit_wait_io) from [] (__wait_on_bit+0x90/0xcc)
[67321.186499] [] (__wait_on_bit) from [] (out_of_line_wait_on_bit+0x78/0x84)
[67321.186537] [] (out_of_line_wait_on_bit) from [] (__wait_on_buffer+0x3c/0x44)
[67321.186583] [] (__wait_on_buffer) from [] (jbd2_journal_commit_transaction+0x1008/0x1700)
[67321.186624] [] (jbd2_journal_commit_transaction) from [] (kjournald2+0xb8/0x23c)
[67321.186669] [] (kjournald2) from [] (kthread+0xdc/0xf8)
[67321.186711] [] (kthread) from [] (ret_from_fork+0x14/0x3c)
[67373.317504] sd 0:0:0:0: timing out command, waited 360s
[67373.317584] blk_update_request: I/O error, dev sda, sector 1953346160
[67373.317735] Aborting journal on device sda1-8.
[67395.841132] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
[67395.849920] EXT4-fs (sda1): Remounting filesystem read-only

It looks as though I've got some sort of disc error, but it is far from clear (to me) what it is, and what needs to be done.
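For what it's worth, the CDB lines in the log can be decoded by hand. Here is a minimal Python sketch, assuming the ten bytes printed after opcode=0x28 are a standard SCSI READ(10) command block (big-endian LBA in bytes 2-5, transfer length in bytes 7-8). The function name decode_read10 is made up for illustration.

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a 10-byte SCSI READ(10) command block given as space-separated hex."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) command block")
    return {
        # Bytes 2-5: logical block address, most significant byte first.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7-8: number of blocks requested.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# The two CDBs quoted in /var/log/messages and dmesg.
first = decode_read10("28 00 1f ec a8 00 00 00 f0 00")
second = decode_read10("28 00 1f ec a8 80 00 00 08 00")
print(first)
print(second)
```

The second command's LBA comes out as 535603328, the same sector that blk_update_request reports as a critical medium error, which ties the error to a specific read request.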

Adrian

--
To Reply : 
replace "bulleid" with "adrian" - all mail to bulleid is rejected 
Adrian

Can you read the SMART data (with smartctl)?

IMHO the disk is dying, or there is some problem with the USB interface (on the Pi or the disk side). Have you tried changing the cable? Changing the USB port? Connecting the drive to another machine, or to the Pi through a powered USB hub?

Bye Jack

jack4747

Based on the SCSI sense code tables, the "Sense Key : 0x3" and "ASC=0x11 ASCQ=0x0" lines translate to "Medium Error - unrecovered read error".

I would replace the disk and restore from backup.
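The translation can be sketched as a small lookup. This assumes the standard SCSI sense key numbering and includes only the one ASC/ASCQ pair seen in this thread. The table and function names are illustrative, not from any library.

```python
import re

# Standard SCSI sense keys (subset; assumed from the SCSI specification).
SENSE_KEYS = {
    0x0: "No Sense",
    0x1: "Recovered Error",
    0x2: "Not Ready",
    0x3: "Medium Error",
    0x4: "Hardware Error",
    0x5: "Illegal Request",
    0x6: "Unit Attention",
    0x7: "Data Protect",
    0xB: "Aborted Command",
}

# Only the additional sense code pair that appears in the posted log.
ASC_ASCQ = {(0x11, 0x00): "Unrecovered read error"}


def decode_sense(log_text: str) -> tuple:
    """Pull the sense key and ASC/ASCQ out of kernel log text and name them."""
    key = re.search(r"Sense Key : (0x[0-9a-fA-F]+)", log_text)
    asc = re.search(r"ASC=(0x[0-9a-fA-F]+) ASCQ=(0x[0-9a-fA-F]+)", log_text)
    if key is None or asc is None:
        raise ValueError("no sense data found in log text")
    pair = (int(asc.group(1), 16), int(asc.group(2), 16))
    return (
        SENSE_KEYS.get(int(key.group(1), 16), "unknown sense key"),
        ASC_ASCQ.get(pair, "unknown ASC/ASCQ"),
    )


sample = (
    "sd 0:0:0:0: [sda] Sense Key : 0x3 [current]\n"
    "sd 0:0:0:0: [sda] ASC=0x11 ASCQ=0x0\n"
)
result = decode_sense(sample)
print(result)
```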

--
http://www.greenend.org.uk/rjk/
Richard Kettlewell

Probably, replace the disc.

You could check the USB interface with another disc, but disc failure is the way I'd bet.

Roger Bell_West

Thanks for the various replies.

In no particular order :

This is (currently) the only external disc drive I've got, so swapping it for another one isn't possible (but it looks as though some shopping is in order :-( ).

No I haven't tried the disc on another machine, I probably won't have time before Thursday PM at the earliest.

The disc runs off its own power supply, rather than using the USB lead, so I'm not sure what I would gain by running it via a powered hub.

Since putting up the original posting, I've managed to get smartctl working :

smartctl 5.41 2011-06-09 r3365 [armv6l-linux-4.1.19+] (local build)
Copyright (C) 2002-11 by Bruce Allen

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda Green (Adv. Format)
Device Model:     ST2000DL003-9VT166
Serial Number:    5YD9T0VK
LU WWN Device Id: 5 000c50 052f2f84d
Firmware Version: CC98
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Mon May 1 16:14:37 2017 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 623) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1033) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   099   099   006    Pre-fail  Always       -       26876781
  3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       77
  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       1072
  7 Seek_Error_Rate         0x000f   067   060   030    Pre-fail  Always       -       5963469
  9 Power_On_Hours          0x0032   060   060   000    Old_age   Always       -       35414
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       43
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   087   087   000    Old_age   Always       -       13
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   056   041   045    Old_age   Always   In_the_past 44 (2 156 45 41)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       16
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       77
194 Temperature_Celsius     0x0022   044   059   000    Old_age   Always       -       44 (0 6 0 0)
195 Hardware_ECC_Recovered  0x001a   014   004   000    Old_age   Always       -       26876781
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       48
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       48
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       8916352141912
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       498582715
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       271744506

SMART Error Log Version: 1
ATA Error Count: 12 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 12 occurred at disk power-on lifetime: 35414 hours (1475 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 da 08 ff ff ff 4f 00      20:04:50.024  READ DMA EXT
  25 da 08 ff ff ff 4f 00      20:04:46.778  READ DMA EXT
  25 da 08 ff ff ff 4f 00      20:04:21.724  READ DMA EXT
  25 da 60 ff ff ff 4f 00      20:04:21.721  READ DMA EXT
  25 da 08 ff ff ff 4f 00      20:04:21.609  READ DMA EXT

I'm not sure I'm convinced by the line : "SMART overall-health self-assessment test result: PASSED"
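On why the overall result can still say PASSED: as far as I understand it, the verdict only turns to FAILED once an attribute's normalised VALUE drops to or below its THRESH, whereas the raw counts can be climbing long before that. Here is a rough Python sketch, assuming the usual smartctl attribute column layout and a hand-picked set of attribute IDs (5, 187, 197, 198) to watch. Both the set and the function names are assumptions for illustration.

```python
# Four rows copied from the smartctl attribute list posted in this thread.
ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       1072
187 Reported_Uncorrect      0x0032   087   087   000    Old_age   Always       -       13
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       48
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       48
"""

# Attribute IDs whose raw count is worth watching (an assumption, not a standard).
WATCHED_IDS = {5, 187, 197, 198}


def parse_row(line: str) -> dict:
    """Split one attribute row into the fields used below."""
    f = line.split()
    return {
        "id": int(f[0]),
        "name": f[1],
        "value": int(f[3]),   # normalised VALUE column
        "thresh": int(f[5]),  # THRESH column
        "raw": int(f[9]),     # first number of RAW_VALUE
    }


def assess(rows: str) -> tuple:
    """Return (attributes at or below threshold, watched attributes with raw > 0)."""
    attrs = [parse_row(line) for line in rows.splitlines() if line.strip()]
    below = [a["name"] for a in attrs if a["value"] <= a["thresh"]]
    raised = [a["name"] for a in attrs if a["id"] in WATCHED_IDS and a["raw"] > 0]
    return below, raised


below, raised = assess(ROWS)
print("at or below threshold:", below)
print("non-zero raw counts:", raised)
```

With these rows nothing has reached its threshold, which is consistent with PASSED, yet all four watched raw counts are non-zero.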

Adrian

--
To Reply : 
replace "bulleid" with "adrian" - all mail to bulleid is rejected 
Adrian

With disks there is nothing to be done but get a new one.

And restore data from backup. You do back up nightly, don't you?

--
No Apple devices were knowingly used in the preparation of this post.
The Natural Philosopher

That alone tells me it's time for a new disc.

--
No Apple devices were knowingly used in the preparation of this post.
The Natural Philosopher

Nor me, given the number of reallocated sectors, and of pending sectors it has yet to figure out what to do with.

Disk is dying, back it up and replace. Warranty replacement claim?

--
--------------------------------------+------------------------------------ 
Mike Brown: mjb[-at-]signal11.org.uk  |    http://www.signal11.org.uk
Mike

Ageist :)

9 Power_On_Hours 0x0032 072 072 000 Old_age Always - 20897
9 Power_On_Hours 0x0032 070 070 000 Old_age Always - 22486
9 Power_On_Hours 0x0032 073 073 000 Old_age Always - 24008
9 Power_On_Hours 0x0032 062 062 000 Old_age Always - 28175
9 Power_On_Hours 0x0032 061 061 000 Old_age Always - 34782
9 Power_On_Hours 0x0032 061 061 000 Old_age Always - 34782
9 Power_On_Hours 0x0032 051 051 000 Old_age Always - 43124
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 51210
9 Power_On_Hours 0x0032 019 019 000 Old_age Always - 59381
9 Power_On_Hours 0x0012 092 092 000 Old_age Always - 59486
9 Power_On_Hours 0x0032 008 008 000 Old_age Always - 81246

(not all on one pc!)

--
--------------------------------------+------------------------------------ 
Mike Brown: mjb[-at-]signal11.org.uk  |    http://www.signal11.org.uk
Mike

Yep. I just had two fail around that mark (37000 hours for 2.5" and 49700 hours for 3.5" - coincidentally both Hitachi drives).

It may well be worth spending the extra few quid for an enterprise grade drive if it's going to be on continuously, e.g. get a WD Red rather than a WD Blue, or the equivalent for your preferred brand.

--
martin@   | Martin Gregorie 
gregorie. | Essex, UK 
Martin Gregorie

My experience with consumer class disks is that the MTBF is 5 years of power on, and about 1 year of flat out seek/read/write activity.

Even if the disk wasn't showing errors, I'd still throw it out after 5 years.

It's like changing the oil on your car. Maybe it will run another 10,000 miles on that oil... or not.
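To put the power-on hours in perspective, here is a small Python sketch converting the 35414 hours reported by smartctl into days and years. It assumes 365.25 days per year, and the helper name is made up.

```python
HOURS_PER_YEAR = 24 * 365.25  # assumption: average year length


def power_on_age(hours: int) -> tuple:
    """Return (whole days, leftover hours, fractional years) for a power-on count."""
    days, leftover = divmod(hours, 24)
    return days, leftover, hours / HOURS_PER_YEAR


days, leftover, years = power_on_age(35414)
print(f"{days} days + {leftover} hours, about {years:.2f} years")
# Prints: 1475 days + 14 hours, about 4.04 years
```

That matches the "1475 days + 14 hours" in the SMART error log, and puts the drive at roughly four years of continuous running against a five-year rule of thumb.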

--
"It is an established fact to 97% confidence limits that left wing  
conspirators see right wing conspiracies everywhere"
The Natural Philosopher

I'd forgotten how long it had been in place.

Yes it is on all the time, and it is being accessed very regularly, so it looks as though replacement is due.

Adrian

--
To Reply : 
replace "bulleid" with "adrian" - all mail to bulleid is rejected 
Adrian

The current disc is a 2TB, largely because that was what was available at the time. Most of the time it is less than 2% used, so it is rather bigger than it needs to be, but it is in use on a continuous basis.

So what is a suitable replacement HDD or SSD ?

Adrian

--
To Reply : 
replace "bulleid" with "adrian" - all mail to bulleid is rejected 
Adrian

Pass. Wot I wrote is wot I know.

I recently replaced the 37000 hour 120GB Hitachi HDD in my Lenovo R61i laptop with a 128GB Sandisk SSD, but as smartd says it has only had 2 hours' use so far, I obviously can't say anything useful about its durability.

The R61i was replaced by a Lenovo T440 with a Hitachi 500GB HDD with 2000 hours on it (really cute machine with 8GB RAM and a 1600x900 screen). All is good so far.

The dead 3.5" 250GB Hitachi Deskstar (49700 hours) was replaced with a 500GB WD Blue about 250 hours ago, so again I can't yet say anything useful about it.

Speculation follows.

If I'd been *thinking* I should have replaced the dead 3.5" Hitachi with a 500GB or 1 TB WD Red because the MTBF and warranty length for the Red series is much longer than for the Blue and the annoyance factor of replacing a disk increases at least linearly with disk size.

Going beyond that, at some point the annoyance factor gets big enough to justify replacing a single HDD with a RAID farm so you can hot-swap failing drives. The offline backup is still, IMO, mandatory if you go that way because online drives are still subject to destruction from mains strikes and house fires.

Hearsay and internet sources say that a consumer-grade SSD can fail completely with little or no warning, while an enterprise SSD tends to fall back to read-only at EOL so you can at least pull the data off it when it fails. Whether this (and cost) influences your choices depends a lot on how much you value the data and how frequently you make offline backups.

HTH

--
martin@   | Martin Gregorie 
gregorie. | Essex, UK 
Martin Gregorie

And burglary.

--
Cheers, 
John
John Aldridge

This took a bit longer than I hoped for, but I'm now back up and running again. Many thanks to those who offered help and advice.

If anyone is interested, in the end, I went for a WD 2TB NAS drive, which is in a SATA - USB housing.

Adrian

--
To Reply : 
replace "bulleid" with "adrian" - all mail to bulleid is rejected 
Adrian
