[OT] PC Hardware Problem

It's been ages since I kept current with PC technology, so I wanted to run this by some of you, to see if it lights any bulbs.

One of my boxen runs for a while, then (in Linux at least) kernel panics and resets (in Windows it resets, but I haven't stood over it to know if Windows notices the problem). My kid and I were working on it today to reinstall Ubuntu on the theory that the software was just royally screwed, which is when I noticed the kernel panicking.

It acts like a thermal problem -- leave it off for a long time and it takes a long time to have a problem, use it a lot and it happens much more often. All the fans work, and at one point I was able to monitor the various system temperatures which showed OK, so it's not something simple like the processor overheating.

At this point I'm about ready to start swapping parts, but part-swapping costs $$, so I thought I'd ask the group if these symptoms sound familiar, and if you found out anything specific to go with them.

--
www.wescottdesign.com
Tim Wescott

I had the same problem. A few weeks ago my computer started freezing up, mainly after I had left it on for a while (say, overnight). It turned out that the heatsink compound had dried up. I fixed that and it's been running fine ever since (about 2 weeks).

Of course it could potentially have been something else, but that seems to have been the issue. The thermal compound was fairly dry and, I guess, wasn't making good enough contact, which would eventually cause the thermal sensor to trip (most modern CPUs have a shutdown mode to prevent damage).

I was monitoring the temperature too, but since it always happened when I was away from the machine (except the last few times), I never knew what was going on. I imagined it couldn't be overheating when I wasn't using it (since it was basically idle), but after replacing the compound there have been no issues at all.
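For anyone in the same spot, where the crash happens while nobody is watching the temperature display, one option on Linux is to log readings to disk so the last values before a reset survive. This is a minimal sketch, assuming the kernel exposes sensors under /sys/class/thermal (not every board does). The function names and the log file name are made up for illustration.

```python
import glob
import os
import time


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: degrees_C} for every readable thermal zone."""
    readings = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "temp")) as f:
                # The kernel reports millidegrees Celsius as an integer.
                readings[os.path.basename(zone)] = int(f.read().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip zones that cannot be read or parsed
    return readings


def log_temperatures(path, samples, interval_s=5.0, base="/sys/class/thermal"):
    """Append `samples` timestamped readings to `path`, syncing after each."""
    with open(path, "a") as log:
        for n in range(samples):
            temps = read_thermal_zones(base)
            fields = " ".join(f"{name}={t:.1f}C" for name, t in temps.items())
            log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {fields}\n")
            log.flush()
            os.fsync(log.fileno())  # push to disk so a sudden reset loses little
            if n + 1 < samples:
                time.sleep(interval_s)
```

Something like `log_temperatures("temps.log", samples=10000)` left running overnight would show whether the readings were climbing just before the machine went down. If the last logged values look normal, heat at the sensed locations is a less likely culprit.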

Anyways, it's worth a look...

It could be the memory or the PS... usually one of those is the issue (which is why I figured it was my memory, since I have a monster PS).

Jon Slaughter

I'd ask what brand but that's not really *that* important.

Check all your electrolytics -- especially the bulk decoupling around the processor. Obviously, bulges are a sure sign of problems. But, even "normal" looking caps can be an issue (check the names of the manufacturers on the caps!).

What folks often fail to take into account is the caps in the power supply. I've found a couple of machines with the symptoms you described that were actually bad caps in the PS. Depending on the type of PS, it might be easier just to swap it (with a similar one from another machine -- so you don't have to buy new to run the test).

I have a pair of Dell servers that have bad caps in the "power sharing" module (has triple redundant power supplies) currently. Yet, the power supplies and "motherboard" are fine. (I have seen a *lot* of Dell desktop machines with bad caps on the motherboards)

Also *try* to notice if there is any pattern to the failure. E.g., I had a machine that would crap out *only* when the CD was accessed. Of course, that was a common activity during an OS install :> Yet, if I built a disk image on another machine and swapped the drive into the problem machine, the machine would run "forever" without a problem -- until the CD was accessed.

D Yuniskis

Download and burn a boot disk from memtest86.com (on a working computer) and run it on the problematic computer.

Michael

Michael

Memory SIMMs can cause that... or something else. Take all the cards out, clean the contacts, and reseat them and all the cables. If there is more than one SIMM, try running on one or the other and move it to different slots. Disconnect all devices and drives to see if it will act up with just the MB, processor and a SIMM. Examine all the capacitors on the MB with a magnifying glass to look for bulges and splits. Do you smell anything? Pull the cover and clean out the power supply. I've never seen heatsink compound fail, but it's a good idea to redo it. Sometimes if you just give it a ride in the trunk of a car for a few days it will work fine.

Buerste

Have you tried running memtest86? It's usually an option when you're in grub. Also clean the heatsink fins if you haven't. Mine were packed, but you couldn't tell until you removed the fan.

kfvorwerk

Check the BIOS settings, and if possible alter them (memory timing, CPU speed, etc.) to use more conservative settings. If the BIOS settings were not improper (overclocking) and the problem goes away, don't expect it to be a permanent fix as the faulty part is likely to continue to deteriorate until even the more conservative settings will not keep it stable.

The next thing I would do is buy or borrow one new memory module, then run on just that one for a while. If the problem does not happen at all, one of your original memory modules is probably bad. Reintroducing them one by one should reveal the troublemaker.

You can use a hair dryer (watch your operating temps, don't overdo the heat) and freeze spray on the CPU and main chips to see if one is thermally sensitive.

You can try pulling cards to see if one is causing problems on the power supply or bus, but I have not often seen cards cause problems like this in practice (a bad card usually shows up as problems related to the function of the card).

The next thing I would do is swap out the power supply.

What remains are problems with the onboard chips or electrolytic capacitors, which probably means a new motherboard is in order. If you want a recommendation here, I've had good experiences with boards from Tyan.

I have found that compiling a Linux kernel can present a more intensive test than many memory test routines, although I would still suggest you try them. IMHO, a machine should not be put into service until it can show that it can compile a kernel a few times after being fully warmed up (run for a couple hours before compiling).
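A rough sketch of automating that repeated-compile test, assuming an already configured kernel source tree that builds with `make clean` followed by `make -j4`. The helper name, the source path, and the job count are placeholders, not anything specific to a particular setup.

```python
import subprocess


def repeat_until_failure(steps, runs=3, cwd=None):
    """Run every command in `steps` in order, `runs` times over.

    Returns the 1-based number of the first run in which any step exits
    non-zero, or None if every run completes cleanly.
    """
    for run in range(1, runs + 1):
        for argv in steps:
            if subprocess.run(argv, cwd=cwd).returncode != 0:
                return run
    return None
```

For example, `repeat_until_failure([["make", "clean"], ["make", "-j4"]], runs=3, cwd="/path/to/linux")` rebuilds from scratch three times. If a tree that builds cleanly on a known-good machine fails here, especially at a different point on each attempt, that points toward hardware rather than the source.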

BTW, I'm CCing this to sci.electronics.repair as that is a better place to ask this.

Cheers,

Mike Shell

Michael Shell

Good morning, Tim.

I have a couple comments / suggestions.

First, in my experience, problems like this generally start in the power supply. Check the obvious things first.

Second, if you've been away from PC builds for a while, you may be pleasantly surprised to learn how cheap things have gotten lately. Shop around online for some PC deals. Recently, one of my engineering computers was starting to have intermittent hiccups (best way I can describe it). I decided it was time for an upgrade. $350 later, the machine is 100%, and it's blazingly fast. (And I didn't even go whole-hog on the upgrade!) It's a route worth considering... The $350 did include a new, legit copy of Windows XP SP3, by the way; otherwise it would have been even less money spent.

Third, as others have pointed out, bad memory (RAM) can also be the source of the problem.

Fourth -- Whatever you do, don't entice Skybuck into this thread! :)

- mpm

mpm

Definitely run memtest86 for a few hours. We had a similar problem at work, turned out to be memory. It did not show up until we ran memtest86 for a few hours.

i
Ignoramus15530

Others have made good suggestions regarding the PSU and the memory. Very often it's just the contacts rather than a defective piece of hardware. I suggest you remove the RAM sticks and clean the contacts, taking the usual precautions against ESD. Brush and blow the slots too. You didn't say how old the computer is. If it's using a PATA HDD, check the Molex power connector by pushing and pulling at the wires _individually_ to see if one or more are a bit loose. I've come across numerous cases in which symptoms like the one you're having now are caused by either of these two possibilities.

pimpom

Could you tell if the problem was a poor connection? I've found that wiggling connectors solves the majority of electronic problems. When I was testing and repairing field returns over half the boards had no problem although they came from competent techs at major semiconductor manufacturers. We assumed that swapping the board cleaned the contacts, and they sent the old one back as a precaution. Those boards almost never came back a second time.

I suspected that ammonia from floor cleaning compounds was attacking the copper underneath the gold fingers. I couldn't convince management to pay to have that tested, but they did confirm that silicone from candy bar wrappers contaminated boards and caused poor solder joints. It's applied to make them slide out of vending machines better.

jsw

Jim Wilkins

My first thought on seeing the kernel panic was Memtest. Under Memtest it quietly locks up and resets.

--
www.wescottdesign.com
Tim Wescott

That sounds like bad electrolytics in the power supply for the CPU. If you keep using it, it will reach a point where it won't even boot. There are about a half dozen low ESR electrolytics in parallel. The AC current through them causes them to heat up. As a capacitor dries out, its ESR rises, causing more heat until it fails. If you are good with a soldering iron, I would replace the electrolytics. Make sure to use brand name low ESR 105 °C parts.
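To put rough numbers on that heating mechanism: the power dissipated in a capacitor by ripple current is the RMS ripple current squared times the ESR. A small sketch with made-up values (2 A of ripple, ESR rising from 20 to 100 milliohms) follows. These are illustrative figures only, not measurements from any particular board.

```python
def ripple_heating_watts(i_rms_amps, esr_ohms):
    """Power dissipated in a capacitor's ESR by ripple current: P = I_rms^2 * ESR."""
    return i_rms_amps ** 2 * esr_ohms


healthy = ripple_heating_watts(2.0, 0.020)  # hypothetical fresh part
aged = ripple_heating_watts(2.0, 0.100)     # hypothetical dried-out part
print(f"healthy: {healthy:.2f} W, aged: {aged:.2f} W")
```

With these assumed numbers the aged part dissipates five times the heat at the same ripple current, which dries it out faster still. That feedback is why such a failure tends to accelerate rather than level off.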

Some motherboards have a couple extra sets of holes where they either went with a higher capacitance, or cut corners.

--
Greed is the root of all eBay.
Michael A. Terrell

Have you taken a good look at the caps on the motherboard? Mine had started misbehaving about a month back and got to the point where it sometimes would not boot. Five 33000 mfd caps had "popped".

Take a look at the tops of the cans - they have 3 lines scored in them. If they are at all convex, they are shot. I replaced all 5 caps last night and it is working fine.

clare

Tim Wescott Inscribed thus:

That's a clue that the capacitors in the memory power supplies could be the cause. To confirm that, temporarily substitute the RAM module(s). If the fault remains, replace the caps, usually three electrolytics.

--
Best Regards:
                     Baron.
Baron

Thanks to all. We've been reminded of stuff we knew, used your responses to order our attack, and had the obvious (dust in the heat sink) pointed out.

And there _was_ enough dust in the CPU heat sink to insulate a house. I don't know if that was the problem, but it sure could have been.

--
www.wescottdesign.com
Tim Wescott

While not necessarily applicable here, I thought I'd mention it. A friend's daughter, 14, was complaining of her laptop BSODing regularly. We investigated and found no problems, but suspected a heat-related cause. In the end it seems she likes to use the laptop on the carpet or on her lap, and both situations blocked the CPU cooling, causing the issue. When used on a suitable flat surface that didn't block the cooling, all was fine.

David Billington

Tim Wescott Inscribed thus:

It must be bad for Memtest to cause the CPU to shut down due to heat!

--
Best Regards:
                     Baron.
Baron

Even a small hair between the CPU heatsink and the CPU can cause a thermally induced system crash. It could also be a defective hard drive or, in the case of some motherboards, leaking and swelling electrolytic capacitors.

--
Joe Leikhim K4SAT
"The RFI-EMI-GUY"©

"Use only Genuine Interocitor Parts" Tom Servo  ;-P
RFI-EMI-GUY

Also suspect the power supply. Some PSUs are such POSs that they will try to kill the mombo when (not if) the e-caps dry out. I won't buy a PSU without OVP.

Best regards, Spehro Pefhany

--
"it's the network..."                          "The Journey is the reward"
speff@interlog.com             Info for manufacturers: http://www.trexon.com
Embedded software/hardware/analog  Info for designers:  http://www.speff.com
Spehro Pefhany
