Thermal shutdown

I started playing some games on my Debian laptop. After some time of intense fan blowing, the laptop shut down. I figured this was likely a thermal shutdown - the device turning itself off to protect itself.

I started monitoring the temperature - and it was reaching 100ºC at times while I was playing my game. This is terrible; I like my hardware cool. The game did what it was supposed to - use up all available resources to give me the most fluid gameplay. But the laptop did not - it should have throttled itself to prevent overheating, rather than simply shutting down in a thermal emergency. Instead, the default Intel drivers were pushing it into constant Turbo Boost.
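For reference, here is a minimal sketch of one way to poll temperatures on Linux. It assumes the kernel exposes its sensors under /sys/class/thermal, where each thermal_zone*/temp file holds a value in millidegrees Celsius. The function name is made up for illustration, and this is not necessarily the tool used for the readings above.

```python
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: temperature in °C} for every readable thermal zone."""
    base = Path(base)
    temps = {}
    if not base.is_dir():
        return temps  # no thermal sysfs on this machine
    for zone in sorted(base.glob("thermal_zone*")):
        try:
            # The kernel reports millidegrees Celsius, e.g. "100000" for 100 °C.
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed zone; skip it
        temps[zone.name] = millideg / 1000.0
    return temps


if __name__ == "__main__":
    for name, temp in read_zone_temps().items():
        print(f"{name}: {temp:.1f} °C")
```

Running it in a loop (or under `watch`) while the game is open should show how close the CPU gets to its limit. Which zone corresponds to the CPU varies from machine to machine.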

Heat and electronics

Why do I like my hardware to stay cool?

Because it prolongs its life. Many electronic components suffer accelerated wear when hot, so the cooler they run, the better.

A report by Texas Instruments gives the Arrhenius relationship for their components designed to run at 105ºC for 10 years.

This is used to calculate an "acceleration factor" (AF), which predicts how long a component survives at different temperatures. If you run the component at 95ºC instead of 105ºC (just 10ºC cooler), its expected life increases by a whopping 79% (so the component will work for 17.9 years instead of 10).

Graph showing the relationship between temperature and acceleration factor

(Note the logarithmic vertical AF scale)
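The 79% figure can be reproduced with a few lines of Python. This is only a sketch of the standard Arrhenius acceleration factor. The activation energy of 0.7 eV is an assumption on my part (a commonly used value that happens to match the number quoted above), and the function name is made up for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def acceleration_factor(t_use_c, t_ref_c, activation_energy_ev=0.7):
    """Ratio of expected life at t_use_c to expected life at t_ref_c (Arrhenius).

    activation_energy_ev=0.7 is an assumed value, not taken from the report.
    """
    t_use_k = t_use_c + 273.15
    t_ref_k = t_ref_c + 273.15
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1 / t_use_k - 1 / t_ref_k)
    return math.exp(exponent)


af = acceleration_factor(95, 105)
print(f"AF = {af:.2f}, lifetime = {10 * af:.1f} years")
# prints: AF = 1.79, lifetime = 17.9 years
```

A different activation energy gives a different factor, so treat the result as a ballpark value rather than a prediction for any particular chip.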

Another chart is offered by Overclockers.net, and it's useful for getting ballpark values for a CPU's life expectancy with respect to temperature. It should really be taken with a grain of salt - your mileage may vary.

Software solutions

I have researched software solutions for this problem, and the best Linux tool I have found for the job is cpufreqd. It allows you to specify custom rules for changing CPU states.

Here are my settings for the laptop and desktop I use, as well as useful commands to deal with this tool.

I have had no more thermal problems since using cpufreqd. The caveat, however, is that my game plays with a noticeably lower framerate.
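To illustrate the idea of temperature-based rules, and why the framerate drops, here is a toy sketch in Python. It is not cpufreqd's code or configuration format, and the thresholds and percentages are hypothetical values chosen for the example, not my actual settings.

```python
# Hypothetical rules: (temperature threshold in °C, frequency cap in % of top speed).
RULES = [(70, 80), (80, 60), (90, 40)]


def max_freq_percent(temp_c, rules=RULES, default=100):
    """Return the frequency cap of the hottest rule whose threshold is reached."""
    # Check the highest threshold first so the strictest matching rule wins.
    for threshold, percent in sorted(rules, reverse=True):
        if temp_c >= threshold:
            return percent
    return default  # cool enough: no cap


for temp in (65, 75, 85, 100):
    print(f"{temp} °C -> cap at {max_freq_percent(temp)}%")
```

The hotter the CPU gets, the lower the cap, which trades framerate for temperature.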

Cool beans! :)

So far, I use only the CPU for playing games. I need to figure out how to use cpufreqd with my Nvidia video card, which has closed-source drivers and is not very usable under open-source ones, before I can game on it (I got thermal crashes before). There is no open-source driver support for the GTX 850M yet. Hopefully I can still use cpufreqd on it. If you have any tips, please mail me, and you can get featured on this blog if you'd like!

Cleaning, and thermal compounds

I thought cleaning my laptop was the right course of action. I took it to a laptop cleaning place and asked them to replace the "thermal paste", but the temperatures did not seem to improve much.

I have since learned a bit about thermal pastes, and I can recommend ones with a high thermal conductivity. Here is the video that made me interested in this topic. A Linus other than Mr. Torvalds shows that you can lower your laptop CPU temperature by 20ºC by replacing the stock thermal compound with a liquid metal paste from "Thermal Grizzly", which is possibly dangerous because it dissolves aluminum and conducts electricity.

This is because the stock thermal paste is, paradoxically, among the least thermally conductive points in the heat dissipation pipeline. Replacing it with a better performing one is, perhaps, more important than having a better heatsink or air flow.

I have not yet replaced my thermal paste, since I am still reading about how to mitigate the risks of a metal paste, as well as researching non-conductive pastes. Also, very few seem to be available in Romania. (Here is one I found so far, but I have not bought it yet).

On liquid metal ones:

  • They usually have the best thermal conductivity, up to 13 W/(m·K).
  • You have to protect the electrical components near the CPU or GPU whose paste you're replacing, for example with a skirt of Kapton tape around them.
  • You also have to be extra careful about material compatibility (liquid metal dissolves aluminum) and avoid spilling it anywhere else on the motherboard.

On others:

Note, however, that many vendors overstate their thermal conductivity. For instance, the study measured Arctic Silver 5 at 0.94 W/(m·K), but the vendor claims 8.7 W/(m·K) - that is more than nine times the measured value.
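To get a feel for why these numbers matter, here is a rough sketch of the temperature drop across a paste layer, using the one-dimensional conduction formula dT = P * t / (k * A). The 45 W load, the 50 µm layer thickness and the 1 cm² die area are assumptions picked for illustration, not measurements from my laptop. The conductivities are the ones quoted above.

```python
def paste_temperature_drop(power_w, conductivity_w_mk, thickness_m=50e-6, area_m2=1e-4):
    """Temperature drop in kelvin across a uniform paste layer: dT = P * t / (k * A).

    Defaults (50 µm layer, 1 cm² die) are illustrative assumptions.
    """
    return power_w * thickness_m / (conductivity_w_mk * area_m2)


# Measured Arctic Silver 5, its claimed value, and a liquid metal paste.
for k in (0.94, 8.7, 13.0):
    drop = paste_temperature_drop(45, k)
    print(f"k = {k} W/(m·K): about {drop:.1f} K across the paste")
```

Under these assumptions, the measured 0.94 W/(m·K) costs roughly 24 K across the paste alone, while the claimed 8.7 W/(m·K) would cost under 3 K. Real layers are not uniform, so this is only an order-of-magnitude estimate.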


Hope you find this post useful. If so, drop me a comment or an e-mail.

Have a cool day!
