A PC question.

This is something that seems to have defeated the best minds in the UK so I thought I'd try here...;-)

I have a home assembled PC - about 18 months old - using an Asus A8N5X MB and an Athlon 64 3000+ CPU. It's mainly used for semi-pro AV work.

After a year or so of faultless service, it started shutting down at random. Would usually boot up again ok and carry on. After a few occasions I took to having PCProbe loaded and noticed the CPU temp would shoot up just before it shut down. So naturally removed the heatsink/fan, cleaned and replaced with new thermal transfer compound.

All was well for a month or so, then the fault started happening earlier and earlier - sometimes before XP had loaded. The bios power management page again showed the CPU overheating - going from ambient to overheat in around a minute. But the heatsink was cool to the touch. ;-)

I was intending to use the shotgun approach and simply replace the MB - and possibly CPU - but it seems this design of MB is now obsolete so I'd have to change lots of other things too.

I've not been able to find any description of how the CPU temp sensing works let alone any clues on fixing what must be an intermittent fault - as I've stripped and re-assembled the entire computer, cleaned all connectors etc, and it's fine again once more. But for how long?... Any informed guesses?

--
 

    Dave Plowman        dave@davenoise.co.uk           London SW
                  To e-mail, change noise into sound.
Dave Plowman (News)

I wonder if you have one of those LSI do-it-all ball grid array surface mount chips on your mobo? They have circuitry for controlling fans etc. I have had a good number of them in laptops that give weird faults such as yours. It seems that sometimes the soldering of these is not quite as good as it should be, and I have repaired a few by the judicious use of a heat gun together with a temperature probe. (Not recommended unless you are faced with a choice of dumping it or having a go at fixing it.) You may be able to check if that is the problem by using freezer spray. I have seen reports of some gaming machine or other that suffers from poor soldering of the BGA chips - there is even a video, on YouTube or somewhere, of someone reflowing the solder using half a cat-food tin containing ignited alcohol stood on the offending chip. (Definitely NOT recommended, although it seemed to work!!) Anyway, that's my contribution for what it's worth.

R.Smith

Speculating here, but if the heatsink is cool and the CPU says that it's overheating, then you may not have gotten a good thermal bond between the two.

Many possible causes. Might have too much thermal compound. The heatsink could be canted at an angle where there's only a linear contact area. Might be some foreign material between the two (cat hairs?).

Try the clean and replace process one more time. The Arctic Silver website has some how-to's that apply to installing CPU heatsinks generally.

--
Rich Webb     Norfolk, VA
Rich Webb

"Dave Plowman (News)" wrote in news: snipped-for-privacy@davenoise.co.uk:

I had a similar shutdown problem on an older 900 MHz Athlon, and replacing the CPU fan solved it. The fan still turned, but drew a lot of current, causing the shutdown.

--
Jim Yanik
jyanik
at
kua.net
Jim Yanik

That will be the infamous iBook G3 logic board fault. The ATI video chips are BGA with a poor soldering history, and sometimes the heat gun thing works, sometimes the board is toast; likewise with professional reflowing, but with a somewhat better success rate. Whether this is a permanent fix has not clearly been established - any disturbance of a dry joint may get it working for a while.

Gareth.

Gareth Magennis

OK,

  1. If it's a work PC meant to earn money and depreciated to nil after a moment of time - assume it's got there now and replace it.

Or ...

There will be an ASUS three year warranty on your motherboard. Contact the retailer?

--
Adrian C
Adrian C

Can you disable the thermal shutdown in the BIOS?

James Sweet

Definitely not. I'm fairly experienced at this sort of thing.

All rather obvious, I'm afraid. But the design of the heatsink makes it near impossible to fit incorrectly. It has over-centre latches which wouldn't work if anything was wrong. And when I have removed it, it has needed a deal of force due to the 'stiction' caused by the air seal.

--
*Can fat people go skinny-dipping?

    Dave Plowman        dave@davenoise.co.uk           London SW
                  To e-mail, change noise into sound.
Dave Plowman (News)

Fine. You pay for it then. ;-)

It's a point. I know the CPU has a three year warranty but wasn't sure about the MB. However, the retailer says it's obsolete, so presumably can't be replaced by an identical one.

--
*It was all so different before everything changed.

    Dave Plowman        dave@davenoise.co.uk           London SW
                  To e-mail, change noise into sound.
Dave Plowman (News)

OK, I'm planning on raiding a bank tomorrow at 10am. If you can be by the getaway car at 10.15am, I'll be the one wearing the red Balaclava, striped shirt and carrying a large bag labelled 'SWAG'. Our boss won't mind if you pinch some readies from that. In fact you could also take some readies along to my boss, and then I can step on a plane somewhere where him and his goons can't find me...

Yes, the motherboard will be covered if it's a boxed retail model. Stick to your guns with the retailer, insist they have to send it back and it will be replaced by ASUS with like or fixed.

I've had Gigabyte motherboards repaired no issues by my fav retailer (RL Supplies in Watford) but some others (of the box shifters mode) will unfortunately give folks the brush-off as it's unwanted hassle for them....

--
Adrian C
Adrian C

Grrr... I meant sent for repair...

no issues by my fav retailer (RL

--
Adrian C
Adrian C

Usually it's best to leave the retailer out of it and RMA the item back to the manufacturer directly.

James Sweet

Hi Dave,

Have you tried any other temperature measurement software? I use both "Everest Home Edition" and "Core Temp" on my Asus A8N SLI Deluxe and A8N SLI SE based machines, and they both give identical results for temps. Your board is just a version of mine.

I had an early shutdown problem recently: the machine would get to the end of populating the right-hand icon bar and restart as soon as the F-Secure Anti-Virus software finished installing and the Windows Update icon appeared. This was after an earlier install of an app, so it may have been a virus problem, although the same app, after winding the system back a few days using System Restore and then re-installing it, has worked fine ever since. Might just be worth a try?

Just an idea, from one RISC OS user to another.

Regards,

Sarah

Sarah E. Bailey

Have you looked for swollen caps around the processor and checked their ESR?

nipperchipper

Good observation on the CPU temp, but it might just be a red herring. Running lots of operations in the CPU will also spike its temp, so this may just be a normal signature of the crashing code.

I'm battling one of these random problems myself. If you go to the properties of My Computer, the Advanced tab, and the Settings button in the Startup and Recovery section, you can uncheck "Automatically restart" and also make sure it is creating a crash dump file (note the folder location). You will now begin seeing the BSOD (blue screen of death) when you crash, and you can learn if the same operation is crashing it each time or if it truly is random.

Find these crash dump files and open them in Microsoft's debugger, WinDbg (part of the Debugging Tools for Windows). If you want to try it, just download it from here:

formatting link

The analysis of the crash file will tell you which driver is crashing. Running the !analyze -v command in the debugger will dump the last few lines of code it was running and the addresses it was using. If you are lucky it will point to a driver for a peripheral. I keep getting NTOSKRNL.EXE, which is the Windows kernel itself and is too generic to be useful. Look up the results on the last line of the report on Google for more clues.
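Before digging into individual dumps, it can help to see when the crashes happened and how far apart they were. A minimal Python sketch, assuming the dumps are ordinary *.dmp files in whatever folder the Startup and Recovery dialog shows (the function names are made up for illustration, and the path in the comment is only a common default, not something confirmed for this machine):

```python
from datetime import datetime
from pathlib import Path


def list_crash_dumps(folder):
    """Return (timestamp, filename) pairs for *.dmp files, oldest first."""
    dumps = []
    for path in Path(folder).glob("*.dmp"):
        # The file's modification time approximates when the crash occurred.
        stamp = datetime.fromtimestamp(path.stat().st_mtime)
        dumps.append((stamp, path.name))
    return sorted(dumps)


def gaps_between_crashes(dumps):
    """Return the time elapsed between each pair of consecutive dumps."""
    times = [stamp for stamp, _ in dumps]
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Example use (assumed folder; substitute the one noted in the dialog):
#   dumps = list_crash_dumps(r"C:\WINDOWS\Minidump")
#   for stamp, name in dumps:
#       print(stamp, name)
#   print(gaps_between_crashes(dumps))
```

If no dump files turn up at all despite repeated shutdowns, that in itself hints that Windows never got the chance to write one, which points away from a driver crash.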

These problems can be maddening to fix. It could be a driver compatibility issue or a hardware failure. Months of trial and error are usually required and even then, you may not know if or how it was repaired.

Try running in safe mode for a few days if you can tolerate it. If it stops crashing, this suggests a driver. You can also temporarily disable many drivers and TSR programs from MSCONFIG (use the Run command in the Start menu to invoke it).

pipedown

FWIW, another method to separate software from hardware would be to download and burn a Linux bootable CDROM. You can run Linux for a few days and see if it still crashes. All the drivers will be completely different for a different OS thus eliminating a bad windows setup in one swoop.
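While running from the Linux CD you could also log the temperature readings, so the last values before a shutdown survive. A rough Python sketch, assuming the live system has a driver for the board's hardware monitor chip and exposes it through the kernel's hwmon sysfs files (values in millidegrees Celsius); the function names and the log location are hypothetical:

```python
import os
import time
from pathlib import Path


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return {"hwmonN/tempM": degrees_C} for every readable sensor."""
    base = Path(base)
    readings = {}
    # Older kernels keep the files under hwmonN/device/, so try both layouts.
    for pattern in ("hwmon*/temp*_input", "hwmon*/device/temp*_input"):
        for sensor in sorted(base.glob(pattern)):
            try:
                millideg = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor not readable right now, skip it
            key = sensor.relative_to(base).as_posix()[: -len("_input")]
            readings[key] = millideg / 1000.0
    return readings


def log_once(logfile, base="/sys/class/hwmon"):
    """Append one timestamped line of readings and force it to disk."""
    readings = read_hwmon_temps(base)
    fields = " ".join(f"{k}={v:.1f}" for k, v in sorted(readings.items()))
    with open(logfile, "a") as fh:
        fh.write(time.strftime("%Y-%m-%d %H:%M:%S") + " " + fields + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # so the line survives a sudden power-off
    return readings
```

Calling log_once() every few seconds, with the log file on a USB stick or other writable disk, would show whether the reported temperature climbs smoothly or jumps just before the machine dies.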

pipedown

On Tue, 13 May 2008 14:06:10 +0100, "Dave Plowman (News)" put finger to keyboard and composed:

There is a thermal diode on the CPU die. The diode appears on two pins, THERMDA and THERMDC (anode and cathode?). An external hardware monitor chip senses the diode voltage and adds calibration factors (e.g. temperature offset) as per the CPU's Thermtrip Status Register. Furthermore, AMD's datasheet states that "if the temperature sensor has an ideality factor different from 1.008, a small correction to this offset is required".

So, AFAICT, your CPU's temperature is being sensed right at the die, and involves only one additional piece of hardware, namely the hardware monitor chip on your motherboard.
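To put rough numbers on this, here is a small Python sketch of how a remote-diode monitor typically turns a diode voltage into a temperature. It assumes the common two-current method, where the chip measures the change in diode voltage between a low and a high bias current; the 10 uA and 100 uA currents and all function names are illustrative assumptions, since the thread does not identify the monitor chip on this board. Only the 1.008 ideality factor comes from the datasheet quote above.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C
KELVIN_OFFSET = 273.15


def delta_vbe(temp_c, ideality=1.008, current_ratio=10.0):
    """Diode voltage difference between the two bias currents, in volts."""
    t_k = temp_c + KELVIN_OFFSET
    return ideality * K_BOLTZMANN * t_k / Q_ELECTRON * math.log(current_ratio)


def reported_temp_c(dv, assumed_ideality=1.008, current_ratio=10.0, offset_c=0.0):
    """Temperature the monitor would report for a measured voltage difference."""
    t_k = dv * Q_ELECTRON / (assumed_ideality * K_BOLTZMANN * math.log(current_ratio))
    return t_k - KELVIN_OFFSET + offset_c


def series_resistance_error_c(r_ohms, i_low=10e-6, i_high=100e-6, ideality=1.008):
    """Extra degrees reported because of resistance in series with the diode."""
    extra_v = (i_high - i_low) * r_ohms
    volts_per_kelvin = ideality * K_BOLTZMANN / Q_ELECTRON * math.log(i_high / i_low)
    return extra_v / volts_per_kelvin


# A 42 C die read with the matching ideality factor comes back as 42 C:
#   reported_temp_c(delta_vbe(42.0))
# A diode with ideality 1.02 read as if it were 1.008 reads a few degrees high:
#   reported_temp_c(delta_vbe(42.0, ideality=1.02))
```

Under these assumed currents the signal is only about 0.2 mV per degree, so each ohm of extra resistance in the diode path reads as roughly half a degree. A few tens of ohms from a poor contact at a socket pin or a marginal joint would then be enough to make the reading climb towards the shutdown threshold while the heatsink stays cool.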

Is the CPU's Vcore voltage within spec?

- Franc Zabkar

--
Please remove one 'i' from my address when replying by email.
Franc Zabkar

Thanks very much - I couldn't find that information.

Ok.

Yes.

I decided to have one more go and cleaned the MB round and underneath the socket, removed the socket cover and cleaned the connectors, and the CPU. All of which I've tried before. Assembled just the basics - HD and video card. And it worked. Re-installed the rest of the hardware checking at each step. Still fine. Now got it all as was - and still ok. Been on now all night - and still fine. CPU temp is 42C while AVC is doing its routine tests. I'm confused. Most things like a dry joint or cracked track manifest as it warms up.

--
*It's a thankless job, but I've got a lot of Karma to burn off

    Dave Plowman        dave@davenoise.co.uk           London SW
                  To e-mail, change noise into sound.
Dave Plowman (News)
[snip]

[snip]

Yes - but if you read the bit of my post I've left in this was happening sometimes before XP had loaded. Also I'd not get the 'BSOD' as the PS was shutting the machine down completely. Sometimes pressing the power button would start it again right away - sometimes not.

[snip]
--
*Why isn't there mouse-flavoured cat food?

    Dave Plowman        dave@davenoise.co.uk           London SW
                  To e-mail, change noise into sound.
Dave Plowman (News)

Looked, yes. But I'd really not expect this on a board so new and from a respected maker.

--
*Why is the word abbreviation so long? *

    Dave Plowman        dave@davenoise.co.uk           London SW
                  To e-mail, change noise into sound.
Dave Plowman (News)
