Overclocking Without Permission
Sorry for the long delay between posts this month. I really want to get to a pace of a couple of posts a week, but Real Life keeps popping up and using up my time. In any event, I am going to try to get back to that frequency starting now. I have a lot of ideas for posts, and I am going to start actually typing them up. I'll start off with something I promised a while ago.
As I mentioned in the post about my custom PC build, I had an issue with the system crashing randomly. It only crashed twice - once when I was running the Windows Experience Index, and once overnight. This worried me because it was a brand new system, and everything should have been in tip-top shape - unless I screwed something up (or received a faulty piece of equipment). I was really hoping I wouldn't have to redo all or part of the build. Luckily, I enjoy troubleshooting almost as much as I like building PCs, so I put on my thinking cap (do they still make those?) and started working.
When I checked the event logs for the second crash, I saw that the system crashed at 3:02 AM. Since I wasn't actively using the system then, it really puzzled me. The system does run some automatic tasks overnight, but what could it be doing that would make it crash? The event logs also indicated that it was a hardware issue, but didn't specify which hardware caused the crash. Since I had to work, I put the minidump files onto a flash drive and left for the office.
While I was at work, I went to the ASUS site and downloaded all the updates available for my motherboard, as well as a few utilities. I also reviewed the minidump files, but they didn't really give me much information - it looked like a different module crashed each time. This made it look even more like a hardware problem to me, but I still wasn't sure.
When I got home, I installed the updates as well as some of the additional ASUS utilities. I also installed Who Crashed, a utility that reads minidump files and translates their contents into plain English. Analyzing the logs in Who Crashed didn't give me any additional information about the root cause, but that was due to the nature of the crashes. Since the crashes seemed to be hardware related, I decided to do some tests.
The ASUS utilities included ASUS PC Diagnostics, a program that includes a "stress test" for the CPU, system memory, and video system. I decided to start with a CPU test. The CPU test forces the processor to calculate Pi and uses 100% of all available cores. It runs for as long as you like, but defaults to 1 minute. I ran the one-minute test, and the system only made it through 45 seconds before crashing. While it looked like the CPU was the issue, I ran the memory and video tests to confirm. Both tests ran without any issues. It seemed that I had an issue with the CPU.
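If you don't have a vendor diagnostic handy, the same idea is easy to approximate. The sketch below is NOT the ASUS PC Diagnostics code (I have no idea how that is written). It is a hypothetical stand-in that keeps every logical core busy summing the Leibniz series for Pi for a set number of seconds, defaulting to one minute. The function names are my own, and the Leibniz series was picked for simplicity rather than speed. Run it while watching temperatures in a monitoring tool, and stop it if the cores get anywhere near their limit.

```python
import multiprocessing as mp
import os
import time


def leibniz_pi_for(seconds: float, batch: int = 10_000) -> tuple[float, int]:
    """Keep one core busy summing the Leibniz series for Pi until `seconds` elapse.

    Returns (pi_estimate, terms_summed). The estimate is a by-product; the
    real goal is to hold the core at 100% load. (Hypothetical helper.)
    """
    deadline = time.monotonic() + seconds
    total = 0.0
    sign = 1.0
    k = 0
    while True:
        # Sum a whole batch of terms before checking the clock, so the
        # timer call doesn't take up a noticeable share of the work.
        for _ in range(batch):
            total += sign / (2 * k + 1)
            sign = -sign
            k += 1
        if time.monotonic() >= deadline:
            break
    return 4.0 * total, k


def stress_all_cores(seconds: float = 60.0) -> list[float]:
    """Start one worker process per logical CPU, each running leibniz_pi_for."""
    workers = os.cpu_count() or 1
    with mp.Pool(processes=workers) as pool:
        results = pool.map(leibniz_pi_for, [seconds] * workers)
    return [estimate for estimate, _ in results]


if __name__ == "__main__":
    # The guard is required on Windows, where worker processes re-import this file.
    estimates = stress_all_cores(60.0)
    print(f"{len(estimates)} workers finished; pi ~ {estimates[0]:.6f}")
```

Using separate processes rather than threads matters in Python: threads share one interpreter lock, so they would not load every core fully.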
While I wasn't sure what the exact issue was, I started to suspect it was a heat issue, mainly because the computer didn't crash under basic use or while idling. Both minidump files showed a crash while the system was performing maintenance that could involve the CPU. I tried running the CPU test while running the ASUS AI II suite's monitoring module, but the diagnostic test turned off the module before running the test. The computer made it through about 30 seconds before crashing, and this time the BIOS gave a CPU temperature warning when the computer restarted. This confirmed the heat issue, but I still wanted to see what was going on, and that led me to discover the root problem.
I really wanted to watch the CPU temperature while I ran a stress test, so I downloaded Core Temp, a utility I have been using for a while. I installed the program and watched it while I ran the stress test. The system crashed after about 40 seconds, and Core Temp showed that it definitely was a temperature issue. The core temperatures very quickly rose to over 105 Celsius (the TJ Max, or the point where the processor will start to throttle or shut down to avoid damage). Core Temp also inadvertently showed me the root cause of the problem: my processor was overclocked.
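For the curious, here is roughly where numbers like these come from on an Intel CPU, as I understand Intel's Software Developer's Manual. Each core's digital thermal sensor does not report an absolute temperature. Instead, it reports how many degrees the core is below TjMax, so the temperature is TjMax minus that reading. The sketch below only decodes raw register values that have already been read. Actually reading MSRs requires kernel-mode access, and I'm not claiming this is how Core Temp itself is written. The register bit layouts follow the SDM. The function name is hypothetical.

```python
def decode_core_temp(therm_status: int, temperature_target: int) -> int:
    """Return a core temperature in Celsius from raw Intel MSR values.

    therm_status:       raw IA32_THERM_STATUS value (MSR 0x19C)
    temperature_target: raw MSR_TEMPERATURE_TARGET value (MSR 0x1A2)
    (Hypothetical helper; the inputs must already have been read by a driver.)
    """
    # Bit 31 of IA32_THERM_STATUS is the "reading valid" flag.
    if not (therm_status >> 31) & 1:
        raise ValueError("digital thermal sensor reading is not valid")
    # Bits 22:16 hold the digital readout: degrees BELOW TjMax.
    degrees_below_tj_max = (therm_status >> 16) & 0x7F
    # Bits 23:16 of MSR_TEMPERATURE_TARGET hold TjMax itself.
    tj_max = (temperature_target >> 16) & 0xFF
    return tj_max - degrees_below_tj_max
```

One consequence of this design is that a readout of 0 means "at TjMax". The sensor has no way to report how far past that point the core has gone.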
Core Temp Window (image from Core Temp Web site)
In addition to showing the core temperature, Core Temp also shows other processor information. One item is the frequency, which in my case showed that the processor was running at around 4.2 GHz. Since my processor's stock speed was 3.5 GHz and its turbo speed was 3.9 GHz, this was weird. I went into the ASUS AI II app and found a module that showed the processor settings. This confirmed that my processor was overclocked. I used the utility to change the setting back to stock, and then ran the stress test again. It made it through 1 minute without an issue, and the temps didn't get above 65 C. I ran a 10-minute test with the same result. Now the question was "how did this happen?"
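The reasoning in that paragraph is just a comparison against the processor's rated speeds. The observed 4.2 GHz is higher than even the 3.9 GHz turbo ceiling, which is 20% over the 3.5 GHz stock speed. Here is a tiny sketch of that check. The function names and the 0.05 GHz tolerance for reading jitter are my own assumptions.

```python
def classify_clock(observed_ghz: float, base_ghz: float, max_turbo_ghz: float,
                   tolerance_ghz: float = 0.05) -> str:
    """Label an observed core clock against the CPU's rated base and turbo speeds.

    (Hypothetical helper; tolerance_ghz absorbs small jitter in monitor readings.)
    """
    if observed_ghz > max_turbo_ghz + tolerance_ghz:
        return "overclocked"      # faster than the chip is rated to run by itself
    if observed_ghz > base_ghz + tolerance_ghz:
        return "turbo"            # within the stock turbo range
    return "base or below"        # stock speed, or power-saving when idle


def percent_over_stock(observed_ghz: float, base_ghz: float) -> float:
    """How far the observed clock is above the stock (base) speed, in percent."""
    return (observed_ghz - base_ghz) / base_ghz * 100.0


if __name__ == "__main__":
    print(classify_clock(4.2, 3.5, 3.9))            # overclocked
    print(f"{percent_over_stock(4.2, 3.5):.0f}%")   # 20%
```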
While I am not exactly sure what happened, I have narrowed it down to the fact that I like to poke around with settings, and at some point I had clicked a button in the ASUS AI II software that caused the system to automatically overclock*. Since I had just been poking around, I hadn't noticed, and this is what caused the system to crash under high processor loads. I made sure the stock settings stuck, then purchased an aftermarket cooling system, and I haven't had random crashes since. I have since overclocked my system on purpose, and it is still rock stable.
The main lesson I learned from all of this is that when I am poking around trying settings, I need to find out what they do before I try them (especially if they change core system settings).
I'm curious which aftermarket cooling system you chose. Did it replace the CPU heat sink and fan that appeared in the photo of your original build?
Did you leave your CPU overclocked setting in place once you installed the aftermarket cooling components?
Finally, is it possible for applications to request temporary overclocking through a system call to Windows 8?
I ended up going with a total replacement cooling system - I bought an Antec Kuhler H2O 620 liquid cooling system. I did leave the overclocking settings in place. I let the ASUS auto-tune software find the best settings for me for now; I may play with it later. I left all the Intel Turbo and throttling features on, so the processor idles at about 3.6 GHz and peaks at 4.4 GHz under full load. I've run it for two hours at 100% (using Prime95), and the core temps haven't gotten much past 70 C (which is supposedly OK for the new Ivy Bridge processors). Regular high-usage tasks like video encoding run at around 50-60 C.
The ASUS motherboard also seems to override the turbo's different per-core speeds, so I get the full speed from all four cores. I'm planning a post on the overclocking and cooling to cover this a bit more.
I am not sure about the last question... from what I understand about the Intel turbo technology, it is a function of the BIOS and the processor settings, and the OS doesn't factor into it. The ASUS AI II suite modifies the BIOS settings from within Windows, but that's definitely not the same as a system call to Windows. I will have to look into this further.
Looks like an economical and functional solution to overheating. I did a quick check on the Antec H2O 620, and folks were reporting temperatures around 13 F lower in their overclocked machines compared with stock non-liquid coolers. Install takes under 15 minutes, and the price is around $75 - all to the good.
PCs are really pushing some envelopes these days!