Problem with an external disc drive

Pi B+ with an external Seagate 2TB disc drive connected up via USB,
drive formatted as EXT4.  This has happily been running for several
years.

Today is Monday.

Late on Saturday night, accessing the drive became impossible; basic
commands (ls, rm, du) would hang indefinitely.

As I had been out for most of the day yesterday, it was late afternoon
before I was able to look at it.  It was clear that some files had been
added to the disc, but not everything I would have expected.  Soon
after I started to look at it, it started to hang again, so I tried to
reboot it.  /var/log/messages shows that the reboot command was issued
at 17:20.  At 17:44, there is an entry to say that "nfsd: last server
has exited, flushing export cache".  At 18:17, the boot process appears
to start; this takes about 10 seconds.

This morning I logged in briefly and all appeared to be well; files
were written to it overnight and during the day.  This afternoon, I
logged in again to see if I could work out what had happened, and at
around that time, error messages started to appear :


May  1 14:39:35 ffoilpi1 kernel: [67009.943117] sd 0:0:0:0: [sda] UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
May  1 14:39:35 ffoilpi1 kernel: [67009.943167] sd 0:0:0:0: [sda] Sense Key : 0x3 [current]
May  1 14:39:35 ffoilpi1 kernel: [67009.943193] sd 0:0:0:0: [sda] ASC=0x11 ASCQ=0x0
May  1 14:39:35 ffoilpi1 kernel: [67009.943222] sd 0:0:0:0: [sda] CDB: opcode=0x28 28 00 1f ec a8 00 00 00 f0 00

Looking at dmesg, things are getting worse :

[67016.577720] sd 0:0:0:0: [sda] UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[67016.577768] sd 0:0:0:0: [sda] Sense Key : 0x3 [current]
[67016.577793] sd 0:0:0:0: [sda] ASC=0x11 ASCQ=0x0
[67016.577824] sd 0:0:0:0: [sda] CDB: opcode=0x28 28 00 1f ec a8 80 00 00 08 00
[67016.577848] blk_update_request: critical medium error, dev sda, sector 535603328
[67201.185193] INFO: task jbd2/sda1-8:2602 blocked for more than 120 seconds.
[67201.185232]       Not tainted 4.1.19+ #858
[67201.185246] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[67201.185262] jbd2/sda1-8     D c0534afc     0  2602      2 0x00000000
[67201.185334] [<c0534afc>] (__schedule) from [<c0534e44>] (schedule+0x4c/0xa4)
[67201.185373] [<c0534e44>] (schedule) from [<c0537224>] (schedule_timeout+0x180/0x1f8)
[67201.185407] [<c0537224>] (schedule_timeout) from [<c05347d4>] (io_schedule_timeout+0xc4/0x148)
[67201.185436] [<c05347d4>] (io_schedule_timeout) from [<c0535628>] (bit_wait_io+0x48/0x70)
[67201.185464] [<c0535628>] (bit_wait_io) from [<c0535228>] (__wait_on_bit+0x90/0xcc)
[67201.185492] [<c0535228>] (__wait_on_bit) from [<c05352dc>] (out_of_line_wait_on_bit+0x78/0x84)
[67201.185532] [<c05352dc>] (out_of_line_wait_on_bit) from [<c015299c>] (__wait_on_buffer+0x3c/0x44)
[67201.185575] [<c015299c>] (__wait_on_buffer) from [<c01ed188>] (jbd2_journal_commit_transaction+0x1008/0x1700)
[67201.185618] [<c01ed188>] (jbd2_journal_commit_transaction) from [<c01f2710>] (kjournald2+0xb8/0x23c)
[67201.185665] [<c01f2710>] (kjournald2) from [<c003e5dc>] (kthread+0xdc/0xf8)
[67201.185706] [<c003e5dc>] (kthread) from [<c000f858>] (ret_from_fork+0x14/0x3c)
[67321.186203] INFO: task jbd2/sda1-8:2602 blocked for more than 120 seconds.
[67321.186241]       Not tainted 4.1.19+ #858
[67321.186256] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[67321.186271] jbd2/sda1-8     D c0534afc     0  2602      2 0x00000000
[67321.186343] [<c0534afc>] (__schedule) from [<c0534e44>] (schedule+0x4c/0xa4)
[67321.186381] [<c0534e44>] (schedule) from [<c0537224>] (schedule_timeout+0x180/0x1f8)
[67321.186411] [<c0537224>] (schedule_timeout) from [<c05347d4>] (io_schedule_timeout+0xc4/0x148)
[67321.186440] [<c05347d4>] (io_schedule_timeout) from [<c0535628>] (bit_wait_io+0x48/0x70)
[67321.186471] [<c0535628>] (bit_wait_io) from [<c0535228>] (__wait_on_bit+0x90/0xcc)
[67321.186499] [<c0535228>] (__wait_on_bit) from [<c05352dc>] (out_of_line_wait_on_bit+0x78/0x84)
[67321.186537] [<c05352dc>] (out_of_line_wait_on_bit) from [<c015299c>] (__wait_on_buffer+0x3c/0x44)
[67321.186583] [<c015299c>] (__wait_on_buffer) from [<c01ed188>] (jbd2_journal_commit_transaction+0x1008/0x1700)
[67321.186624] [<c01ed188>] (jbd2_journal_commit_transaction) from [<c01f2710>] (kjournald2+0xb8/0x23c)
[67321.186669] [<c01f2710>] (kjournald2) from [<c003e5dc>] (kthread+0xdc/0xf8)
[67321.186711] [<c003e5dc>] (kthread) from [<c000f858>] (ret_from_fork+0x14/0x3c)
[67373.317504] sd 0:0:0:0: timing out command, waited 360s
[67373.317584] blk_update_request: I/O error, dev sda, sector 1953346160
[67373.317735] Aborting journal on device sda1-8.
[67395.841132] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
[67395.849920] EXT4-fs (sda1): Remounting filesystem read-only


It looks as though I've got some sort of disc error, but it is far from  
clear (to me) what it is, and what needs to be done.
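One thing that can be pinned down from the logs is where the failing reads were aimed. In a SCSI READ(10) command (opcode 0x28, as printed after "CDB:"), bytes 2 to 5 hold the starting logical block address and bytes 7 and 8 the number of blocks to read, both big-endian. The Python below is a minimal sketch that decodes only that one command; `decode_read10` is a hypothetical helper name, not part of any tool.

```python
# A minimal sketch: decode the SCSI READ(10) CDB bytes that the Linux sd
# driver prints after "CDB: opcode=0x28". Only READ(10) is handled.


def decode_read10(cdb_hex: str) -> dict:
    """Return the opcode, starting LBA and block count of a READ(10) CDB."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    return {
        "opcode": cdb[0],
        # Bytes 2-5: logical block address, big-endian.
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7-8: transfer length in logical blocks, big-endian.
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# The two CDBs from the logs above (syslog first, then dmesg).
for cdb_text in ("28 00 1f ec a8 00 00 00 f0 00", "28 00 1f ec a8 80 00 00 08 00"):
    print(decode_read10(cdb_text))
```

The second CDB decodes to LBA 535603328, the same number the kernel reports in the "critical medium error, dev sda, sector 535603328" line, and it lies inside the 240-block range of the first read (535603200 onwards). With 512-byte logical sectors, the LBA and the kernel's sector number coincide.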


Adrian
--  
To Reply :
replace "bulleid" with "adrian" - all mail to bulleid is rejected
Re: Problem with an external disc drive



Can you read the SMART data (with smartctl)?

IMHO the disk is dying, or there is some problem with the USB interface
(RPi or disk side). Have you tried changing the cable? Changing the USB
port? Connecting the drive to another machine, or to the RPi through a
powered USB hub?

Bye Jack

Re: Problem with an external disc drive

Based on https://en.wikipedia.org/wiki/Key_Code_Qualifier these two
lines translate to "Medium Error - unrecovered read error".
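The lookup against that table can be sketched in a few lines of Python. The sense key (0x3) gives the broad class of error and the ASC/ASCQ pair (0x11/0x00) narrows it down. The dictionaries below are a small, hand-copied subset of the SCSI (SPC) code lists, and `describe_sense` is a hypothetical name used only for illustration.

```python
# A minimal sketch: translate the sense data the Linux sd driver logs
# ("Sense Key : 0x3", "ASC=0x11 ASCQ=0x0") into readable text. The tables
# hold only a small subset of the codes defined in the SCSI (SPC) standard.

SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}

# (ASC, ASCQ) pairs -> description; deliberately incomplete.
ASC_ASCQ = {
    (0x04, 0x00): "LOGICAL UNIT NOT READY, CAUSE NOT REPORTABLE",
    (0x0C, 0x00): "WRITE ERROR",
    (0x11, 0x00): "UNRECOVERED READ ERROR",
}


def describe_sense(key: int, asc: int, ascq: int) -> str:
    """Return 'SENSE KEY - ASC/ASCQ text', falling back to hex for unknown codes."""
    key_text = SENSE_KEYS.get(key, f"sense key 0x{key:x}")
    asc_text = ASC_ASCQ.get((asc, ascq), f"ASC=0x{asc:02x} ASCQ=0x{ascq:02x}")
    return f"{key_text} - {asc_text}"


# Values from the kernel log lines in the original post.
print(describe_sense(0x3, 0x11, 0x0))  # MEDIUM ERROR - UNRECOVERED READ ERROR
```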

I would replace the disk and restore from backup.

--  
http://www.greenend.org.uk/rjk/

Re: Problem with an external disc drive
On 2017-05-01, Adrian wrote:

Probably, replace the disc.

You could check the USB interface with another disc, but disc failure
is the way I'd bet.

Re: Problem with an external disc drive


Thanks for the various replies.

In no particular order :

This is (currently) the only external disc drive I've got, so swapping
it for another one isn't possible (but it looks as though some shopping
is in order :-( ).

No, I haven't tried the disc on another machine; I probably won't have
time before Thursday PM at the earliest.

The disc runs off its own power supply, rather than using the USB lead,
so I'm not sure what I would gain by running it via a powered hub.

Since putting up the original posting, I've managed to get smartctl
working :

smartctl 5.41 2011-06-09 r3365 [armv6l-linux-4.1.19+] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda Green (Adv. Format)
Device Model:     ST2000DL003-9VT166
Serial Number:    5YD9T0VK
LU WWN Device Id: 5 000c50 052f2f84d
Firmware Version: CC98
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Mon May  1 16:14:37 2017 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  623) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1033) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   099   099   006    Pre-fail  Always       -       26876781
  3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       77
  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       1072
  7 Seek_Error_Rate         0x000f   067   060   030    Pre-fail  Always       -       5963469
  9 Power_On_Hours          0x0032   060   060   000    Old_age   Always       -       35414
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       43
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   087   087   000    Old_age   Always       -       13
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   056   041   045    Old_age   Always   In_the_past 44 (2 156 45 41)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       16
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       77
194 Temperature_Celsius     0x0022   044   059   000    Old_age   Always       -       44 (0 6 0 0)
195 Hardware_ECC_Recovered  0x001a   014   004   000    Old_age   Always       -       26876781
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       48
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       48
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       8916352141912
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       498582715
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       271744506

SMART Error Log Version: 1
ATA Error Count: 12 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 12 occurred at disk power-on lifetime: 35414 hours (1475 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 da 08 ff ff ff 4f 00      20:04:50.024  READ DMA EXT
  25 da 08 ff ff ff 4f 00      20:04:46.778  READ DMA EXT
  25 da 08 ff ff ff 4f 00      20:04:21.724  READ DMA EXT
  25 da 60 ff ff ff 4f 00      20:04:21.721  READ DMA EXT
  25 da 08 ff ff ff 4f 00      20:04:21.609  READ DMA EXT
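The "UNC at LBA = 0x0fffffff = 268435455" entry is most likely not the real location of the bad sector. 0x0fffffff is the largest value a 28-bit LBA can hold, and the summary SMART error log printed by `smartctl -a` only has room for the 28-bit register set, while READ DMA EXT is a 48-bit command. The smartctl man page points to `-l xerror` (the extended comprehensive error log) for full 48-bit LBAs, assuming the drive and the USB bridge pass it through. The arithmetic below is a sketch using only figures from the output above.

```python
# A minimal sketch: compare the largest 28-bit LBA with the size of this
# drive, using figures copied from the smartctl and kernel output above.

SECTOR_BYTES = 512                   # "512 bytes logical"
USER_CAPACITY = 2_000_398_934_016    # "User Capacity" in bytes

max_lba_28 = (1 << 28) - 1           # largest value a 28-bit LBA field can hold
drive_sectors = USER_CAPACITY // SECTOR_BYTES

# Failing sectors reported by the kernel's blk_update_request lines.
kernel_sectors = [535_603_328, 1_953_346_160]

print(f"largest 28-bit LBA : {max_lba_28:#010x} = {max_lba_28}")
print(f"28-bit reach       : {(max_lba_28 + 1) * SECTOR_BYTES // 2**30} GiB")
print(f"sectors on drive   : {drive_sectors}")
print(f"kernel sectors beyond 28-bit range: {all(s > max_lba_28 for s in kernel_sectors)}")
```

Both sectors the kernel complained about lie well beyond the 128 GiB that a 28-bit LBA can address, which fits the idea that the logged 0x0fffffff is a saturated placeholder.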


I'm not sure I'm convinced by the line :
"SMART overall-health self-assessment test result: PASSED"

Adrian
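The doubt about PASSED is well founded. Roughly speaking, the overall verdict typically only turns to FAILED once a normalized Pre-fail VALUE falls to its THRESH, and here Reallocated_Sector_Ct is still at 099 against a threshold of 036 even though its raw count is 1072, with 48 more sectors pending. Checking raw counts directly gives a more useful picture. This is a minimal Python sketch; the watch-list and the function names are hypothetical choices for illustration, not smartctl features.

```python
# A minimal sketch: parse the attribute table printed by `smartctl -A` (or -a)
# and report non-zero raw counts for a hypothetical watch-list of
# sector-health attributes. It does not replace the drive's own verdict.

# Hypothetical watch-list: attributes where any non-zero raw count is worth a look.
WATCH = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def parse_attributes(text: str) -> dict:
    """Map attribute name -> (VALUE, THRESH, raw field) from smartctl's table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows: numeric ID, name, hex FLAG, then at least 7 more columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            value, thresh = int(fields[3]), int(fields[5])
            attrs[fields[1]] = (value, thresh, " ".join(fields[9:]))
    return attrs


def find_warnings(attrs: dict) -> list:
    """List watched attributes whose raw count is above zero."""
    out = []
    for name, (value, thresh, raw) in attrs.items():
        if name not in WATCH:
            continue
        raw_count = int(raw.split()[0])  # first token of the RAW_VALUE column
        if raw_count > 0:
            out.append(f"{name}: raw {raw_count} (VALUE {value}, THRESH {thresh})")
    return out


# Rows copied from the smartctl output above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       1072
187 Reported_Uncorrect      0x0032   087   087   000    Old_age   Always       -       13
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       48
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       48
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

for warning in find_warnings(parse_attributes(SAMPLE)):
    print(warning)
```

The FLAG check keeps the parser from misreading the error-log command lines (such as "25 da 08 ff ...") if the full `-a` output is fed in.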
Re: Problem with an external disc drive
On 01/05/17 16:41, Adrian wrote:


That alone tells me it's time for a new disc.


--  
No Apple devices were knowingly used in the preparation of this post.

Re: Problem with an external disc drive

Ageist :)

  9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       20897
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22486
  9 Power_On_Hours          0x0032   073   073   000    Old_age   Always       -       24008
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       28175
  9 Power_On_Hours          0x0032   061   061   000    Old_age   Always       -       34782
  9 Power_On_Hours          0x0032   061   061   000    Old_age   Always       -       34782
  9 Power_On_Hours          0x0032   051   051   000    Old_age   Always       -       43124
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       51210
  9 Power_On_Hours          0x0032   019   019   000    Old_age   Always       -       59381
  9 Power_On_Hours          0x0012   092   092   000    Old_age   Always       -       59486
  9 Power_On_Hours          0x0032   008   008   000    Old_age   Always       -       81246

(not all on one pc!)
--  
--------------------------------------+------------------------------------
Mike Brown: mjb[-at-]signal11.org.uk  |    http://www.signal11.org.uk

Re: Problem with an external disc drive
On 01/05/17 19:32, Mike wrote:

My experience with consumer-class disks is that the MTBF is 5 years of
power-on time, and about 1 year of flat-out seek/read/write activity.

Even if the disk wasn't showing errors, I'd still throw it out after 5 years.

It's like changing the oil in your car. Maybe it will run another 10,000
miles on that oil...or not.
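The arithmetic behind these rules of thumb is easy to check. smartctl already converts Adrian's 35414 power-on hours to "1475 days + 14 hours"; dividing by the hours in a year shows the drive had been spinning for roughly 4 years. This is a minimal sketch; the 365.25-day year and the 5-year limit (the rule of thumb in the post above, not a specification) are assumptions.

```python
# A minimal sketch: turn SMART Power_On_Hours into days and years of running.
# The 365.25-day year and the 5-year limit (the rule of thumb in the post
# above, not a specification) are assumptions.

HOURS_PER_YEAR = 24 * 365.25   # 8766 hours
RULE_OF_THUMB_YEARS = 5


def power_on_years(hours: int) -> float:
    """Convert power-on hours to years of continuous running."""
    return hours / HOURS_PER_YEAR


# Adrian's drive, then the two highest values in Mike's list.
for hours in (35414, 59486, 81246):
    days, rem_hours = divmod(hours, 24)
    years = power_on_years(hours)
    print(f"{hours:>6} h = {days} days + {rem_hours} h = {years:.1f} years; "
          f"past {RULE_OF_THUMB_YEARS}-year rule: {years >= RULE_OF_THUMB_YEARS}")
```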


--  
"It is an established fact to 97% confidence limits that left wing  
conspirators see right wing conspiracies everywhere"

Re: Problem with an external disc drive

I'd forgotten how long it had been in place.

Yes, it is on all the time, and it is being accessed very regularly, so
it looks as though replacement is due.

Adrian
Re: Problem with an external disc drive
On Mon, 01 May 2017 18:00:34 +0100, The Natural Philosopher wrote:


Yep. I just had two fail around that mark (37000 hours for 2.5" and 49700  
hours for 3.5" - coincidentally both Hitachi drives).

It may well be worth spending the extra few quid for an enterprise-grade
drive if it's going to be on continuously, e.g. get a WD Red rather than a
WD Blue, or the equivalent for your preferred brand.
  

--  
martin@   | Martin Gregorie
gregorie. | Essex, UK
Re: Problem with an external disc drive

The current disc is a 2TB, largely because it was what was available at
the time.  Most of the time it is less than 2% used, so it is rather bigger
than it needs to be, but it is being used on a continuous basis.

So what is a suitable replacement HDD or SSD ?


Adrian
Re: Problem with an external disc drive

 Ad> So what is a suitable replacement HDD or SSD ?

I would recommend Seagate; I use nothing else and am very satisfied :)

Mount a 500GB or 1TB drive in a USB 3.0 case (more reliable than 2.0) or, as I  
did, mount a 250GB SSD in a USB 3.0 case. WD is also known to be reliable.

If you want long-term stability consider a 3.5" drive in a USB 3.0 case using  
the external power supply. I use these for laptop backups, too :)


.- Keep the faith, --------------------------------------------------.
|                                                                    |
|    Ben  aka cMech  Web: http|ftp|binkp|telnet://cmech.dynip.com    |
|                  Email: fido4cmech(at)lusfiber.net                 |
|              Home page: http://cmech.dynip.com/homepage/ |
`----------- WildCat! Board 24/7  +1-337-984-4794  any BAUD 8,N,1 ---'

... Just my opinion, but of course I'm correct.

Re: Problem with an external disc drive
On Mon, 01 May 2017 22:08:13 +0100, Adrian wrote:

Pass. Wot I wrote is wot I know.

I recently replaced the 37000-hour 120GB Hitachi HDD in my Lenovo R61i
laptop with a 128GB Sandisk SSD, but as smartd says it has only had 2
hours' use so far, I obviously can't say anything useful about its
durability.

The R61i was replaced by a Lenovo T440 with a Hitachi 500GB HDD with 2000  
hours on it (really cute machine with 8GB RAM and a 1600x900 screen). All  
is good so far.

The dead 3.5" 250GB Hitachi Deskstar (49700 hours) was replaced with a  
500GB WD Blue about 250 hours ago, so again I can't yet say anything
useful about it.

Speculation follows.  

If I'd been *thinking* I should have replaced the dead 3.5" Hitachi with  
a 500GB  or 1 TB WD Red because the MTBF and warranty length for the Red  
series is much longer than for the Blue and the annoyance factor of  
replacing a disk increases at least linearly with disk size.  

Going beyond that, at some point  the annoyance factor gets big enough to  
justify replacing a single HDD with a RAID farm so you can hot-swap  
failing drives. The offline backup is still, IMO, mandatory if you go  
that way because online drives are still subject to destruction from  
mains strikes and house fires.

Hearsay and internet sources say that a consumer-grade SSD can fail  
completely with little or no warning while an enterprise SSD tends to  
fall back to read-only at EOL so you can at least pull the data off it  
when it fails.  Whether this (and cost) influences your choices depends a  
lot on how much you value the data and how frequently you make offline
backups.

HTH
    

Re: Problem with an external disc drive
says...

And burglary.

--  
Cheers,
John

Re: Problem with an external disc drive



Nor me, given the number of reallocated sectors, and the sectors it has
yet to figure out what to do with.

Disk is dying, back it up and replace. Warranty replacement claim?

Re: Problem with an external disc drive
On 01/05/17 15:19, Adrian wrote:

With disks there is nothing to be done but get a new one.

And restore the data from backup. You do back up nightly, don't you?



Re: Problem with an external disc drive

This took a bit longer than I hoped for, but I'm now back up and running  
again.  Many thanks to those who offered help and advice.

If anyone is interested, in the end, I went for a WD 2TB NAS drive,  
which is in a SATA - USB housing.

Adrian
