[RESPONDED] Excessive CPU thermal(?) throttling on 12th-gen

kelnos · August 19, 2022, 8:46pm

Ah, somehow a search didn’t find that thread. Thanks for pointing it out!

George_Zeng · August 20, 2022, 3:03pm

Just saw a video on this with the same issue.

It may be that it reaches its critical temperature, thermal throttles, and there’s a bug where it doesn’t know how to throttle back up from 399 MHz.

Maybe lowering the thermal throttling limit by 3-5C in ThrottleStop or wherever would help… I dunno what it is right now, but if it’s set to throttle at 100C, drop it to 95-97C, or if it’s at 95C, drop it to 92-93C. This makes it throttle sooner and maybe more frequently, though, so if the bug can happen any time you thermal throttle, you may hit it sooner.

Another fix may be to lower the max turbo frequency on every core by 100 MHz, which helps decrease the max power spike, and/or set PL1 and PL2 to infinite time and change the value to a TDP where the CPU is happy to stay, say somewhere between 92-93C, where it shouldn’t thermal throttle. You might be able to do this by setting TDP to 28W, running either the ThrottleStop benchmark or Prime95 for 5 minutes to see where the max temp stabilizes, and then adjusting the TDP to a point where it will never throttle.
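The trial-and-error described above boils down to: try a power limit, note where the temperature settles, and keep the highest limit that stays under the target. A minimal Python sketch of that selection step follows. The function name and the shape of the input (a mapping you fill in by hand from your own runs) are assumptions for illustration, not part of any tool mentioned here.

```python
def highest_safe_limit(stabilized_temps_c, target_c):
    """Pick the highest power limit whose settled temperature is within target.

    stabilized_temps_c: dict mapping a power limit in watts to the temperature
    (in C) the CPU settled at after a sustained run at that limit. These are
    values you measure yourself; nothing here talks to the hardware.
    Returns None when no tested limit stayed at or below target_c.
    """
    safe = [watts for watts, temp in stabilized_temps_c.items() if temp <= target_c]
    return max(safe) if safe else None
```

If it returns None, every limit you tested ran hotter than the target, so the next run would need a lower limit.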

Shiroudan · August 20, 2022, 5:18pm

Intel has a bug where, if it thermal throttles for too long, the CPU gets stuck at 399 MHz.
Could your thermal paste or mounting be bad?

kelnos · August 22, 2022, 4:06am

Not really sure how to evaluate that… nothing looked odd on the board when I was installing the RAM and storage, but I imagine all that is hidden under other parts?

Shiroudan · August 22, 2022, 12:14pm

Alright! I was just worried you’d taken off the heatsink and fan and re-mounted it badly! It may help temps to repaste the CPU die though.

lhl · August 24, 2022, 6:04am

A few things to try out, since we know the issue (basically your CPU is hitting 105C and triggering PROCHOT):

Since you’re only hitting this after a few hours of gaming/being fully thermally saturated, have you considered using a laptop stand or an active cooling pad? Honestly, anything to make sure the fans and vents aren’t being blocked might be enough to make sure it doesn’t happen.
Make sure you are running thermald and adjust the thermal-conf.xml and lower the TripPoint Temperature. You should be able to force your CPU to throttle before it gets to the danger zone.
For more fine grained control, you can also manually set your power limits to enforce a lower power limit.
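Before changing power limits, it can help to see what they are currently set to. The sketch below reads them from the Linux powercap (RAPL) sysfs interface. It assumes the package zone is exposed at `/sys/class/powercap/intel-rapl:0`, which is common on Intel laptops but not guaranteed, and the function name is made up for this example. It only reads values and does not change anything.

```python
from pathlib import Path


def read_power_limits(zone_dir="/sys/class/powercap/intel-rapl:0"):
    """Return {constraint_name: watts} for one powercap zone.

    On Intel systems the constraints are usually named "long_term" and
    "short_term", which roughly correspond to PL1 and PL2.
    """
    zone = Path(zone_dir)
    limits = {}
    for name_file in sorted(zone.glob("constraint_*_name")):
        prefix = name_file.name[: -len("_name")]  # e.g. "constraint_0"
        limit_file = zone / f"{prefix}_power_limit_uw"
        if not limit_file.exists():
            continue
        microwatts = int(limit_file.read_text().strip())
        limits[name_file.read_text().strip()] = microwatts / 1_000_000
    return limits
```

Calling `read_power_limits()` with no arguments reads the default zone and returns an empty dict if that directory is not present.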

You can also run an app like MangoHud which will show you your CPU and GPU temps (or more detailed tools like s-tui, turbostat, or sensors on a remote terminal) if you need to be more aggressive w/ your thermal mitigation measures.
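If you would rather log temperatures from a script than watch a TUI, the kernel exposes them under `/sys/class/thermal`. This is a rough sketch, assuming the standard `thermal_zone*/type` and `thermal_zone*/temp` files (temperature in millidegrees C); which zones exist and what they are called varies by machine, and the function name is just illustrative.

```python
from pathlib import Path


def read_thermal_zones(root="/sys/class/thermal"):
    """Return [(zone_type, degrees_c)] for every readable thermal zone."""
    readings = []
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            # Some zones refuse reads or report nothing useful; skip them.
            continue
        readings.append((zone_type, millideg / 1000))
    return readings
```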

kelnos · August 24, 2022, 6:11am

I suppose I could, but I like to game in various places, often with the laptop on my lap (in bed, sometimes, even), and ensuring I have the thermal pad nearby and ready seems like a pain. (Yes, I know, I’m a little weird.)

Ah, will give it a try! I just installed throttled and was going to see if that helped. But seems like thermald might be easier to configure.

I’ve been using s-tui for the past few days and it’s been really useful to help me see what’s going on.
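For anyone who wants to catch the stuck state without keeping s-tui open, here is a hedged Python sketch that reads per-core frequencies from the cpufreq sysfs files (`scaling_cur_freq`, in kHz) and checks whether every core is pinned at roughly 400 MHz. The function names and the tolerance value are assumptions. Note that idle cores also sit near 400 MHz on this CPU, so the check only means something while the machine is under load.

```python
from pathlib import Path

STUCK_KHZ = 400_000  # the ~400 MHz floor reported in this thread


def read_core_freqs_khz(cpu_root="/sys/devices/system/cpu"):
    """Return {"cpu0": khz, ...} from each core's scaling_cur_freq file."""
    freqs = {}
    for f in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        freqs[f.parent.parent.name] = int(f.read_text().strip())
    return freqs


def all_cores_stuck(freqs_khz, threshold_khz=STUCK_KHZ, tolerance_khz=10_000):
    """True when every core is at or below the floor (plus a small tolerance)."""
    return bool(freqs_khz) and all(
        khz <= threshold_khz + tolerance_khz for khz in freqs_khz.values()
    )
```

Polling `all_cores_stuck(read_core_freqs_khz())` every few seconds during a game or compile, and logging a timestamp when it first turns true, would give support a more precise record than eyeballing a graph.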

Simon_F · August 24, 2022, 9:23am

Another useful tool for gaming would be an fps limiter.
Get GOverlay and limit your frame rate to e.g. 30 fps.
Your system will now only render the 30 fps you have set, even if it would be capable of rendering more. This will keep your system cool(er) and even get you more consistent frame times.
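To illustrate why a frame cap lowers heat: the loop renders one frame, then sleeps away whatever is left of that frame's time budget instead of immediately rendering the next one. The Python sketch below shows the general idea only. It is not how GOverlay or any particular limiter is implemented, and `run_capped` and its parameters are invented for this example.

```python
import time


def run_capped(render_frame, fps_cap, frames, clock=time.monotonic, sleep=time.sleep):
    """Call render_frame() `frames` times, never faster than fps_cap per second."""
    budget = 1.0 / fps_cap  # seconds allotted to each frame
    deadline = clock()
    for _ in range(frames):
        render_frame()
        deadline += budget
        remaining = deadline - clock()
        if remaining > 0:
            # The hardware idles here instead of rendering extra frames.
            sleep(remaining)
        else:
            # The frame took longer than its budget; do not try to catch up.
            deadline = clock()
```

The `clock` and `sleep` parameters exist only so the loop can be exercised with fake timing; in normal use the defaults are fine.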

kelnos · August 24, 2022, 11:17pm

Hmm, I think I spoke too soon. My attempt to use thermald resulted in triggering the 400MHz issue while the laptop was sitting in a cool room on a cool glass table, with all temperatures in the 40-45C range. I waited over an hour afterward, and eventually had to shut off the laptop and leave it powered down for 10 minutes (after 5 minutes, it was still throttled when I powered up) to get it to go back to normal. Back to giving throttled a try. I think I just don’t know how to configure thermald correctly; seems more complicated than it needs to be (thanks, Intel…).

kelnos · June 1, 2023, 9:36pm

Edit: never mind. After another half hour or so, one of the core temperatures spiked to 100C, and now all cores are stuck throttled down to 400MHz again.

Ok, finally I think this is solved. After going through a torturous amount of troubleshooting steps with Framework Support, they finally decided to send me a mainboard replacement. I’ve been testing 100% CPU loads for the past hour and a half or so, and I haven’t encountered any throttling.

I’m feeling pretty confident about this because of how drastically different the thermals look with the new board vs. the old. With two simultaneous kernel compiles (using make -j20; make clean over and over in a loop), with the laptop on a glass table (bottom vents unobstructed), temperatures for all cores are stable, and max out at around 71C. If I block the bottom vents (laptop on my blanket-covered lap), temps slowly creep up to about 85C and stay there. Touching the bottom of the laptop at 85C is a little uncomfortable, but doesn’t actually cause pain like with the old board.

With the old mainboard, this kind of load (even with the vents unobstructed) would cause unpredictable temperatures, constantly jumping around to various values between 80C and 100C, without settling to any particular stable temperature.

I am a bit curious as to what’s wrong with the original mainboard. Support had me ensure the fan was clean and working; re-apply thermal paste; remove expansion cards and even RAM, storage, and the WiFi card; and reset the mainboard. The BIOS version on the new and old boards is the same (3.05), and BIOS and OS settings around power management etc. haven’t changed. A visual inspection of the old board doesn’t show any physical damage that I can see.

So, if you’re running into this problem, and you’ve gone through all the troubleshooting steps, I’d suggest you ask your case to be escalated to someone who can authorize a mainboard replacement. Hopefully that helps others as well.

Matt_Hartley · August 17, 2023, 6:14pm

Hi kelnos,

This thread is getting pretty long and goes back a bit, and there are a few things that deviate outside of a vanilla install. If you do not have an active ticket, please create one and link to your posts.

My 12th gen does not exhibit this behavior, but I also keep to a pretty vanilla (guides specific) installation.

Thanks

kelnos · August 17, 2023, 6:31pm

Support had me test on the “blessed” Ubuntu LiveCD linked from Framework’s website, and I was able to easily reproduce the issue, both on my original board, and the replacement board that was shipped to me.

(Regardless, if the Framework laptop cannot run a vanilla Debian stable install – I’m now running stable since testing was promoted to stable in June – that’s… pretty atrociously messed up.)

Not sure if my support ticket is still “active”; they had me perform a bunch of troubleshooting steps, and I got shuffled back and forth between several support people who kept asking me to do the same things I’d already done and had provided results for. At this point I’m not sure what else to do on that support ticket, as they’ve given me no further instructions. It’s been incredibly frustrating, to be honest.

Matt_Hartley · August 17, 2023, 6:54pm

Okay, if this is happening on a replacement board and we cannot replicate it, that does put us between a rock and a hard place, as something else we cannot duplicate is happening.

We do not test against distros outside of what we have listed. We do, however, have multiple pinned community Debian guides.

I would reply to your last email from the ticket and see what the suggested next steps would be. I imagine if you were sent another board and are still seeing the issue, it would need to be elevated to engineering as we cannot replicate the environment affecting two separate boards (yours and the replacement).

kelnos · August 18, 2023, 5:00pm

I’m out of town for the next couple weeks, but will do so when I get back home.

Would it not make sense for me to return one of the two mainboards I have that exhibits the problem, and then your support or engineering team can attempt to reproduce it on that board?

Matt_Hartley · August 18, 2023, 10:07pm

Replying to the ticket when you return sounds fine.

Julian_Partanen · August 21, 2023, 5:06pm

Hi everyone,
I encounter the same problem regularly as well. Since I use an eGPU I tend to play somewhat heavier games that do not take this throttling lightly: they stutter, freeze and/or crash and sometimes even take down the whole system. I reported this initially in another thread because I didn’t monitor the CPU frequencies back then (htop doesn’t show them). However, using s-tui I can see that before every crash/freeze the frequency of all cores drops to 400 MHz and stays there even after the crash. It can take some time (even through reboots) until the CPU recovers; sometimes it takes half a minute, sometimes more than 10 minutes.
Reproducing this issue is a bit cumbersome, since for me it only occurs after a certain amount of time in the game (0.5-2 hours depending on the game and whether I use the iGPU or not). What I find particularly strange is that at the moment of throttling the CPU isn’t even extremely hot: with my eGPU it sits around 90° to 95° tops, even right before the crash. If I do other CPU-heavy work (which doesn’t last as long) like compiling code, the CPU can get up to 100° without any core throttling down to 400MHz. The only difference I can think of is that after a 2-hour gaming session the heat has spread over the whole laptop (to other components and the outside of the housing, which is quite hot at this point). Maybe this somehow triggers the throttling instead of the temperature inside the CPU? I will definitely do some more testing on this, maybe even on Windows (if this is indeed a firmware issue, then Windows should be affected as well).

System:
CPU: i7-1260p
RAM: Crucial 3200MHz 32GB
SSD: WD-Black SN850 2TB
OS: Arch Linux (kernel 6.4.11-zen2-1-zen)

Matt_Hartley · August 21, 2023, 5:35pm

Since we do not test against eGPUs, we cannot speak to this. However, if you see throttling, please do test on Windows as well for the eGPU. Linux eGPU handling is a mixed bag, so it can be helpful to have a comparison.

kelnos · February 7, 2024, 10:34pm

Hi @Matt_Hartley, I again put this off out of frustration, but finally emailed again on the support ticket 3 weeks ago, and again 2 weeks ago, after receiving no response. Any idea why Framework Support seems to be ghosting me?

Matt_Hartley · February 13, 2024, 6:32pm

Hi @kelnos,

Your ticket is with engineering escalations, and I have added a note indicating that you are on the 3.08 BIOS. What games are you trying to play and with which distro/kernel?

kelnos · February 13, 2024, 7:20pm

Thank you for the update, @Matt_Hartley.

My usual game reproducers are Left 4 Dead 2 and Stellaris. Sometimes I can go an hour or so with L4D2 without seeing it occur, but with Stellaris it usually repros within 20 minutes or so. I can repro faster if the vents on the bottom of the laptop are partially blocked by my lap, though it still happens soon enough if it’s on a hard surface like a table.

These are not particularly “strenuous” games; they play just fine on my older 2018 Dell XPS 13.

I’m currently running Debian testing (trixie), kernel 6.6.13. This also occurs on Debian stable (bookworm), any kernel in the 6.2, 6.3, 6.4, 6.5, and 6.6 series, and likely older.

During previous support interactions, I’ve also reproduced this using the Ubuntu 22.04 LTS image linked from Framework’s Linux page, by creating a tmpfs volume and compiling the Linux kernel (with make -j20) over and over.
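For others who want to try the same reproducer, the build-and-clean loop can be scripted so it stops on its own after a set time. This is a sketch under assumptions: `repeat_build` is an invented helper, and you would point `cwd` at your own kernel source tree (ideally on a tmpfs, as described above).

```python
import subprocess
import time


def repeat_build(build_cmd, clean_cmd, cwd, max_minutes,
                 run=subprocess.run, clock=time.monotonic):
    """Alternate build and clean in `cwd` until max_minutes have elapsed.

    Returns the number of completed build/clean cycles. The time limit is
    only checked between cycles, so the last build may run past it.
    """
    deadline = clock() + max_minutes * 60
    cycles = 0
    while clock() < deadline:
        run(build_cmd, cwd=cwd, check=True)  # e.g. ["make", "-j20"]
        run(clean_cmd, cwd=cwd, check=True)  # e.g. ["make", "clean"]
        cycles += 1
    return cycles
```

A call such as `repeat_build(["make", "-j20"], ["make", "clean"], cwd=<your kernel tree>, max_minutes=90)` would mirror the load described here; watching core frequencies in a second terminal shows when the throttle kicks in.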

The OS doesn’t seem to matter once it’s in this state: I can reboot the laptop and enter BIOS settings or the GRUB prompt, and if it’s still throttled I can see sluggish cursor movement and screen redraw.

I’ve disabled turbo boost (both in the BIOS and in sysfs on Linux, and in tlp’s config). I’ve also experimented with throttled, limiting TDP before doing some “softer” throttling, but that hasn’t helped. I’ve also run through support’s suggestions to try removing and shuffling the RAM around, as well as removing the NVMe drive (running the OS off a USB stick) and Wifi card… plus a lot of other stuff they’d asked me to do.
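For reference, one common sysfs knob for turbo boost on Linux is `/sys/devices/system/cpu/intel_pstate/no_turbo`, where `1` means turbo is disabled. The post does not say which file was used, so treat the path as an assumption that holds only when the intel_pstate driver is active. A minimal Python sketch with illustrative function names:

```python
from pathlib import Path

# Assumes the intel_pstate driver; other cpufreq drivers expose different files.
NO_TURBO = "/sys/devices/system/cpu/intel_pstate/no_turbo"


def turbo_disabled(path=NO_TURBO):
    """True if the file contains 1, i.e. turbo boost is currently disabled."""
    return Path(path).read_text().strip() == "1"


def set_turbo_disabled(disabled, path=NO_TURBO):
    """Write 1 to disable turbo or 0 to re-enable it (needs root on real sysfs)."""
    Path(path).write_text("1" if disabled else "0")
```

The setting does not survive a reboot, which is why tools like tlp reapply it from their own config.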