Just gone back to check when I started this game. I built the system back in August 2010. I started with an ATI Radeon graphics card and, as I remember (this was before I started writing things down), had hibernate/resume problems, so I swapped the card for an Nvidia GTX460. I had had success with Nvidia on a previous machine running XP, although that one also gave me hibernate/resume problems until I eventually tamed it.
I also have an Antec CP850 power supply, one of the bigger-than-ATX-standard supplies that only fits a few Antec cases. It seems to have had good reviews for stability, ripple and so on. No UPS, though. In the last 18 months or so, I think that I might have had just one unexplained BSOD while the machine was running, so it has been pretty stable by Windows standards. It is only at resume-time that I see problems. What happens at resume that doesn't happen at boot? Is the Vdram thing relevant? Dunno - I haven't actually hung a voltmeter on the motherboard test point to check.
Once past the "verifying DMI pool" stage, where I did have problems with (I think) the SATA3 controller, the hang or BSOD only seems to happen once Windows has been reloaded into memory and started running. Is there a driver somewhere that does not do the right things on resume when it comes to reinitialising the hardware? Is the memory image itself corrupt? The symptoms are consistent with this. Does the BIOS do the right things on resume to reset the hardware properly? An X58 chipset release note that I found gives a BIOS instruction sequence necessary to do this properly - Gigabyte would have done this, wouldn't they?
I'm currently running the system with ErP disabled in the BIOS. This is a power management-related parameter that just might be relevant, as it affects power to the motherboard in the "S5" state - although that's not a very well-defined power state. So far, it's done a couple of days and maybe a half-dozen resumes without a problem, but sometimes it can go a week without showing any issues, so that proves little yet. It's also unclear whether there is any sensitivity to how long it has hibernated before resuming, so 7 hibernate/resume cycles in a day is not the same test as one cycle per day for a week. I hate intermittent faults with a passion!
My next steps are going to be:
- replace the graphics card with a different model (this is already the second card that I have used, but it's an easy one to try)
- reconfigure the disks by moving data around so I can use the machine normally with only one hard drive in use and the RAID controller and its disks disabled.
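A rough way to judge how much a run of clean resumes says about a change like the ErP setting is sketched below in Python. It is only a back-of-envelope model: it assumes each resume fails independently with the same probability, which ignores the possible sensitivity to hibernation length mentioned above. The failure rate of 1 in 7 is an assumed figure picked to match "can go a week without issues" at one resume per day, not a measured one, and the function names are made up for illustration.

```python
import math


def chance_of_clean_streak(p_fail: float, n_resumes: int) -> float:
    """Probability of n_resumes clean resumes in a row if nothing was fixed.

    Assumes every resume fails independently with probability p_fail.
    """
    return (1.0 - p_fail) ** n_resumes


def clean_resumes_needed(p_fail: float, confidence: float = 0.95) -> int:
    """Smallest streak length that would be seen by pure luck
    with probability at most (1 - confidence)."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be between 0 and 1 (exclusive)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    p = 1 / 7  # ASSUMPTION: roughly one bad resume per seven, not measured
    print(f"Chance of 6 clean resumes by luck: {chance_of_clean_streak(p, 6):.0%}")
    print(f"Clean resumes needed for 95% confidence: {clean_resumes_needed(p)}")
```

Under those assumptions, six clean resumes would happen by luck about 40% of the time even with the fault still present, and it would take around 20 in a row before the change could be credited with much confidence. If the fault depends on how long the machine has been hibernated, the real number could be quite different.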
18 months this has been going on. I've just built a new system for my daughter-in-law with an i7 and Z68 chipset, so the mark 2 version of the CPU and newer chipset. Wonder if she'll notice if I sneak it out for testing here...?