Centos Dell PowerEdge 830 PCIe Training Error & SATA port 0 not found

Anyone have experience with these 2 Centos Dell PowerEdge 830 errors?
(1) PCIe Training Error: Embedded Bus#00/Dev#1C/Func#4
(2) SATA port 0 not found

I'm in a small training class where the teacher's old computer died.
I told her I'd look at it; these are the two errors on the screen.
(1)
https://cdn1.imggmi.com/uploads/2019/3/22/682cc0489fb1b14d13b8b9d62c857074-full.jpg
(2)
https://cdn1.imggmi.com/uploads/2019/3/22/e4298661477cb43bcbf788d135f23977-full.jpg

Opening the case, I see only this card in a long slot on the motherboard.
https://cdn1.imggmi.com/uploads/2019/3/22/cda5568220f2c5fe51a1e452de059e9e-full.jpg

I don't know what the card does but it has an SATA cable to each of 4 HDDs.
https://cdn1.imggmi.com/uploads/2019/3/22/cb54c28ae2f721e3de71a916f27059c4-full.jpg

It seems disk 0 of the four disks is "unknown device" for some reason.
https://cdn1.imggmi.com/uploads/2019/3/22/05b4cd669244d8b161aa02595e73b67f-full.jpg

Only 3 of the 4 "Arrays" are found (What is an array? Is that a disk?)
https://cdn1.imggmi.com/uploads/2019/3/22/5d842ac2bbd08fb429d611e4afcc8eae-full.jpg

Do you have debugging advice that I can give to this teacher for her Dell?

Re: Centos Dell PowerEdge 830 PCIe Training Error & SATA port 0 not found
On Friday, March 22, 2019 at 12:37:43 AM UTC-4, Oliver Wilson wrote:

My first inclination is to think that the card is a RAID controller and
that the first disk has failed.  Part of the "conversation" is to identify
each device to the controller.  These are smart devices these days.  I
would check disk 0.  The disk itself is probably OK, but its controller may
have failed.

Re: Centos Dell PowerEdge 830 PCIe Training Error & SATA port 0 not found
snipped-for-privacy@yahoo.com wrote:

The RAID card seems to have four SATA devices attached, apparently all  
working, which are formed into three arrays

since there are two 80GB disks ST380013, those are both likely members  
of the RAID 1 74GB array#2

The single 500GB ST3500320, is likely a single disk volume, 465GB array#1

and the single 1TB WD1003FZEX is likely the single disk volume 931GB array#0
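
The size matching above works because drives are marketed in decimal gigabytes, while the array sizes on the controller screen look like binary gigabytes. That the RAID BIOS reports binary units is an assumption, though the numbers fit it. A minimal sketch of the arithmetic:

```python
# Convert marketed (decimal) drive capacities to binary gigabytes (GiB).
# Assumption: the RAID BIOS reports array sizes in binary units.

def decimal_gb_to_gib(gb: float) -> float:
    """Return the capacity in GiB of a drive marketed as `gb` decimal GB."""
    return gb * 10**9 / 2**30

# Marketed capacities of the drive models named in the thread.
drives = {
    "ST380013 (80 GB)": 80,
    "ST3500320 (500 GB)": 500,
    "WD1003FZEX (1 TB)": 1000,
}

for model, gb in drives.items():
    print(f"{model}: {decimal_gb_to_gib(gb):.1f} GiB")
```

Truncated to whole numbers these come out as 74, 465 and 931, which line up with the sizes of array#2, array#1 and array#0.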

so I'd say the physical and logical disks are fine, and that at some
point the RAID controller is talking to them; the issue seems to be that
the server sometimes has trouble negotiating PCIe links to the RAID card

if you're lucky, removing and re-seating the PCIe card will fix it, in
case it's a loose contact, but other people seem to have had either failed
capacitors or a mismatch of PCIe generations between the card and the
motherboard

<https://serverfault.com/questions/310041/dell-poweredge-pcie-training-error-what-to-do>


Re: Centos Dell PowerEdge 830 PCIe Training Error & SATA port 0 not found
Re: Fri, 22 Mar 2019 13:58:50 +0000,  

I am confused.

The PCIe slots are the black unused slots I think where there is only one
card, which is that RAID card you identified.

I did move the RAID card from the leftmost long white slot to the rightmost
long white slot and that "helped".

Both errors remained but at least the machine booted to CentOS after I did
that switch (and also reseated all cables, blew all dust out, rebooted, so
it could be any number of things that allowed the machine to boot to
CentOS).

My main confusion about the SATA 0 unknown is whether it's the 1TB disk
that's bad, or the RAID card that is bad.

I seem to see you saying that you think the 1TB drive is actually good?
Did I understand that correctly?

If the 1TB drive is likely good, then are you saying the RAID card is
likely bad?

Re: Centos Dell PowerEdge 830 PCIe Training Error & SATA port 0 not found
Oliver Wilson wrote:

I think the message about SATA port0 is referring to a SATA port on the  
motherboard, not a SATA port on the RAID card.

One of your photos shows the RAID card saying that all four drives and  
all three arrays are good.

Hopefully it is good too, after all one of your photos does show the  
RAID card having detected the drives and saying the arrays are optimal.

I would move the card back to the slot it was in, different PCIe slots  
can have different numbers of "lanes" and the PCIe training error you  
show is referring to the motherboard and PCIe card being unable to agree  
the correct number of lanes to use.
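
Once the machine boots to CentOS, the device named in the BIOS message can be looked up with `lspci`, which addresses devices as `bus:device.function` in hexadecimal. A small sketch of the conversion follows; the function name `to_lspci_address` is made up for illustration, and it assumes the numbers in the BIOS message are hexadecimal (which `1C` suggests):

```python
import re

def to_lspci_address(message: str) -> str:
    """Turn 'Bus#00/Dev#1C/Func#4' into the 'bb:dd.f' form that lspci uses."""
    match = re.search(
        r"Bus#([0-9A-Fa-f]+)/Dev#([0-9A-Fa-f]+)/Func#([0-9A-Fa-f]+)", message
    )
    if match is None:
        raise ValueError(f"no Bus/Dev/Func triple found in: {message!r}")
    # Assumption: all three fields are hexadecimal.
    bus, dev, func = (int(part, 16) for part in match.groups())
    return f"{bus:02x}:{dev:02x}.{func:x}"

print(to_lspci_address("PCIe Training Error: Embedded Bus#00/Dev#1C/Func#4"))
```

The result can be passed to `lspci -vv -s <address>`. In that output the `LnkCap` and `LnkSta` lines show the link width the device supports and the width actually negotiated. Since the message says "Embedded", the address probably refers to a port on the motherboard rather than the card itself.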




Re: Centos Dell PowerEdge 830 PCIe Training Error & SATA port 0 not found
That's the way it looked to me, as well.  The SATA ports and drives on
the motherboard are normally handled by the motherboard chipset and
the BIOS.

The motherboard BIOS doesn't deal directly with the ports on the
add-on card.  These are the responsibility of the card's own on-board
BIOS - the resulting drives/volumes are registered as drives, but not
as "ports" per se.

Simply unplugging, and then re-seating a controller card can often be
effective at resolving problems like this.  Not always, but it
sometimes works.

Make sure that the card is properly seated in the slot, both when you
first plug it in, and after you screw the card bracket to the case!

I've seen plenty of situations in which a bent bracket, or a case
having slots of a funny size, or a bit of obstruction at the bottom of
the card slot where the bracket "finger" fits in, is enough to cause
the act of "screwing down" the card to actually flex the card upwards
a bit out of the PCI or PCIe slot.  Even if it works OK at first, the
card sometimes works its way upwards a bit further and the slot
connection becomes intermittent.






Re: Centos Dell PowerEdge 830 PCIe Training Error & SATA port 0 not found
On Fri, 22 Mar 2019 12:19:02 -0400, Oliver Wilson wrote:

Do you have, or can you get, a Linux system that you can use to check the  
disks? It might have a spare SATA connector you can connect the disk  
being tested to or, easier, you could use USB-connected disk dock that  
you can slot the disks being tested into.

If so, try the following checks. The first two should be run with the
disk powered up but not mounted.

- 1 (quick) run gparted to look at the disk partitioning.
  Are any errors reported?
  Does the partitioning scheme look sensible and is it the same on
  mirror disks?

- 2 (slower) run "fsck -p" against each partition on each disk.
  If any errors are reported, try using fsck to repair the failing
  partition(s).

- 3 install smartmontools (smartd and smartctl) if it isn't already
  installed and use it to see how many hours each disk has run and what
  prefailure and/or failure indications each of them shows

  I've had good quality (Fujitsu and Western Digital) disks fail at around
  40-50k hours and cheap consumer crap fail at 3000 hours.
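
The hours-run figure in step 3 comes from the SMART attribute table, which `smartctl -A <device>` prints. Below is a rough sketch of pulling `Power_On_Hours` out of that table. The `SAMPLE` text is invented for illustration and is not output from the disks in this thread, and the helper name `power_on_hours` is hypothetical.

```python
import re
from typing import Optional

def power_on_hours(smartctl_output: str) -> Optional[int]:
    """Return the raw Power_On_Hours value from `smartctl -A` text, if present."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the raw value is the 10th.
        if len(fields) >= 10 and fields[1] == "Power_On_Hours":
            # Some drives report e.g. "1234h+56m"; keep the leading digits only.
            digits = re.match(r"\d+", fields[9])
            return int(digits.group()) if digits else None
    return None

# Invented sample in the usual attribute-table layout (illustration only).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   055   055   000    Old_age   Always       -       39871
"""

print(power_on_hours(SAMPLE))
```

Disks that sit behind a hardware RAID controller may not answer plain SMART queries. smartctl has a `-d` option for selecting a controller-specific device type, but whether it supports this particular card is not known here.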

If those tests show the disks are OK, THEN you should suspect the RAID  
controller.
  

--  
Martin    | martin at
Gregorie  | gregorie dot org

Re: Centos Dell PowerEdge 830 PCIe Training Error & SATA port 0 not found
Martin Gregorie wrote:

If (some of) the disks have RAID metadata on them, be very careful  
attaching them to non-RAID SATA ports ...

Re: Centos Dell PowerEdge 830 PCIe Training Error & SATA port 0 not found
On Fri, 22 Mar 2019 20:42:13 +0000, Andy Burns wrote:

I don't 'do' RAID (never needed to outside RAID 1 on Tandem NonStop and
Stratus fault tolerant systems), but apart from suggesting gparted or fsck
repairs (WHICH THE OP CAN EASILY IGNORE), everything else I suggested is,
or should be, read-only. How could read-only checks mess up RAID metadata?

Colour me genuinely puzzled: an explanation would be appreciated.

What I described (looking at what gparted, fsck and smartd have to say)
is no more and no less than what I do routinely, hoping to spot failing
disks before they break.



--  
Martin    | martin at
Gregorie  | gregorie dot org

Re: Centos Dell PowerEdge 830 PCIe Training Error & SATA port 0 not found
Martin Gregorie wrote:

read-only wouldn't, but an inexperienced user could accidentally write  
something, and if the disks are from a RAID system, the partitions  
probably don't start where partition tools are going to be looking for  
them, so you'll get a false sense that there aren't valid partitions on  
the disks.
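
One way to look at a disk with no chance of writing to it is to open the device strictly read-only and inspect the raw bytes. The sketch below checks only for the classic 0x55AA boot signature at the end of the first 512-byte sector. The device path is whatever the test system assigns (hypothetical here), and both function names are made up for illustration.

```python
def read_first_sector(path: str, size: int = 512) -> bytes:
    """Read the first sector of a device or image file, opened read-only."""
    with open(path, "rb") as dev:  # "rb" cannot write to the device
        return dev.read(size)

def has_mbr_signature(sector: bytes) -> bool:
    """True if bytes 510-511 hold the 0x55 0xAA boot-sector signature."""
    return len(sector) >= 512 and sector[510:512] == b"\x55\xaa"

# Usage on a real system would look like this (needs root; path is hypothetical):
#   print(has_mbr_signature(read_first_sector("/dev/sdX")))
```

As the post above warns, a missing signature on a disk taken from a RAID set does not mean the disk is empty; the data may simply not sit where a plain tool expects it. For ext file systems, `fsck -n` is the non-modifying counterpart to `fsck -p`.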


Re: Centos Dell PowerEdge 830 PCIe Training Error & SATA port 0 not found
Andy Burns wrote:

s/partitions/file-systems

Generally to inspect RAID disks that aren't attached to their RAID  
controller, you need special software, e.g.

<https://www.osforensics.com/rebuild-raid.html>

Re: Centos Dell PowerEdge 830 PCIe Training Error & SATA port 0 not found
On Fri, 22 Mar 2019 22:31:07 +0000, Andy Burns wrote:

OK, noted. Thanks.





--  
Martin    | martin at
Gregorie  | gregorie dot org
