Microsoft Research wrote:
We present the first large-scale analysis of hardware failure rates on a million consumer PCs. We find that many failures are neither transient nor independent. Instead, a large portion of hardware induced failures are recurrent: a machine that crashes from a fault in hardware is up to two orders of magnitude more likely to crash a second time. For example, machines with at least 30 days of accumulated CPU time over an 8 month period had a 1 in 190 chance of crashing due to a CPU subsystem fault. Further, machines that crashed once had a probability of 1 in 3.3 of crashing a second time. Our study examines failures due to faults within the CPU, DRAM and disk subsystems. Our analysis spans desktops and laptops, CPU vendor, overclocking, underclocking, generic vs. brand name, and characteristics such as machine speed and calendar age. Among our many results, we find that CPU fault rates are correlated with the number of cycles executed, underclocked machines are significantly more reliable than machines running at their rated speed, and laptops are more reliable than desktops.

The prevalence of CPU faults was surprising to me, as was the improved reliability of laptops (in hindsight that should've been obvious).
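To put the two CPU figures from the abstract side by side, here is a rough back-of-the-envelope sketch. The variable names are mine, and multiplying the two figures together is my own simplifying assumption, not something the paper states.

```python
# Figures quoted in the abstract, for machines with at least
# 30 days of accumulated CPU time over an 8 month period.
p_first = 1 / 190    # chance of a first crash from a CPU subsystem fault
p_second = 1 / 3.3   # chance of a second crash, given one already happened

# How much more likely a crash becomes once a machine has crashed once.
ratio = p_second / p_first

# Assumption (not from the paper): chain the two figures as simple
# conditional probabilities to estimate the chance of two crashes.
p_two_crashes = p_first * p_second

print(f"relative risk after one crash: {ratio:.1f}x")
print(f"chance of at least two crashes: 1 in {1 / p_two_crashes:.0f}")
```

For the CPU subsystem this works out to roughly 58 times the baseline risk, which fits within the abstract's "up to two orders of magnitude".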
Analysis of hardware faults a million PCs
Moderator: Thanas
- Brother-Captain Gaius
- Emperor's Hand
- Posts: 6859
- Joined: 2002-10-22 12:00am
- Location: \m/
Re: Analysis of hardware faults a million PCs
Huh. I guess I've just been unlucky then; my desktops have generally been Ol' Reliable while every laptop I've ever had significant contact with has been a shoo-in for the Sledgehammer Solution.
Agitated asshole | (Ex)40K Nut | Metalhead
The vision never dies; life's a never-ending wheel
1337 posts as of 16:34 GMT-7 June 2nd, 2003
"'He or she' is an agenderphobic microaggression, Sharon. You are a bigot." ― Randy Marsh
Re: Analysis of hardware faults a million PCs
Brother-Captain Gaius wrote:
Huh. I guess I've just been unlucky then; my desktops have generally been Ol' Reliable while every laptop I've ever had significant contact with has been a shoo-in for the Sledgehammer Solution.

Microsoft was only able to analyze CPU, RAM and disk errors; there could be many other problems with a laptop that went undetected by their metrics (which they freely admit).
Re: Analysis of hardware faults a million PCs
Well, if they didn't collect data about GPU and bridge failures, then that explains it, doesn't it?
JULY 20TH 1969 - The day the entire world was looking up
It suddenly struck me that that tiny pea, pretty and blue, was the Earth. I put up my thumb and shut one eye, and my thumb blotted out the planet Earth. I didn't feel like a giant. I felt very, very small.
- NEIL ARMSTRONG, MISSION COMMANDER, APOLLO 11
Signature dedicated to the greatest achievement of mankind.
MILDLY DERANGED PHYSICIST does not mind BREAKING the SOUND BARRIER, because it is INSURED. - Simon_Jester considering the problems of hypersonic flight for Team L.A.M.E.
Re: Analysis of hardware faults a million PCs
I haven't read the study in full yet, but it seems to me that Microsoft did not include the more likely failures: power supply, motherboard, or GPU. I can't find their data set size in the paper either.
I bet most of the hardware failures in the study are due to overheating. I'd be more interested in seeing the difference between computers that are adequately cooled and inadequately cooled.
If it waddles like a duck and it quacks like a duck, it's a KV-5.
Vote Electron Standard, vote Tron Paul 2012
Re: Analysis of hardware faults a million PCs
TronPaul wrote:
I haven't read the study in full yet, but it seems to me that Microsoft did not include the more likely failures: power supply, motherboard, or GPU. I can't find their data set size in the paper either.

Their analysis tools cannot report those sorts of faults. §4 says that there were c. 950,000 machines.

TronPaul wrote:
I bet most of the hardware failures in the study are due to overheating. I'd be more interested in seeing the difference between computers that are adequately cooled and inadequately cooled.

Alas, Windows doesn't really grab that data :/ Still, the levels of unreliability are much higher than I expected.
- Sarevok
- The Fearless One
- Posts: 10681
- Joined: 2002-12-24 07:29am
- Location: The Covenants last and final line of defense
Re: Analysis of hardware faults a million PCs
Can someone elaborate on CPU faults? Is it the hardware developing a permanent fault, or an error in CPU design causing it to execute an operation a different way, leading to software crashes?
I have to tell you something: everything I wrote above is a lie.
Re: Analysis of hardware faults a million PCs
Sarevok wrote:
Can someone elaborate on CPU faults? Is it the hardware developing a permanent fault, or an error in CPU design causing it to execute an operation a different way, leading to software crashes?

Both. The study counts a CPU fault whenever the processor issues a machine-check exception.