Hi @kelnos,
Your ticket is in with the engineering escalations, I have added a note indicating that you are on the 3.08 BIOS. What games are you trying to play and with which distro/kernel?
Thank you for the update, @Matt_Hartley.
My usual game reproducers are Left 4 Dead 2 and Stellaris. Sometimes I can go an hour or so with L4D2 without seeing it occur, but with Stellaris it usually repros within 20 minutes or so. I can repro faster if the vents on the bottom of the laptop are partially blocked by my lap, though it still happens soon enough if it’s on a hard surface like a table.
These are not particularly “strenuous” games; they play just fine on my older 2018 Dell XPS 13.
I’m currently running Debian testing (trixie), kernel 6.6.13. This also occurs on Debian stable (bookworm), any kernel in the 6.2, 6.3, 6.4, 6.5, and 6.6 series, and likely older.
During previous support interactions, I’ve also reproduced this using the Ubuntu 22.04 LTS image linked from Framework’s Linux page, by creating a tmpfs volume and compiling the Linux kernel (with make -j20) over and over.
The OS doesn’t seem to matter once it’s in this state: I can reboot the laptop and enter BIOS settings or the GRUB prompt, and if it’s still throttled I can see sluggish cursor movement and screen redraw.
I’ve disabled turbo boost (both in the BIOS and in sysfs on Linux, and in tlp’s config). I’ve also experimented with throttled, limiting TDP before doing some “softer” throttling, but that hasn’t helped. I’ve also run through support’s suggestions to try removing and shuffling the RAM around, as well as removing the NVMe drive (running the OS off a USB stick) and Wifi card… plus a lot of other stuff they’d asked me to do.
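For anyone who wants to repeat the sysfs part of that, here is a minimal sketch. It assumes the intel_pstate driver, which exposes a no_turbo knob (1 = turbo off); the function name disable_turbo is just a hypothetical helper, and on a real system it needs root. With acpi-cpufreq instead of intel_pstate, this file won't exist.

```shell
# Hypothetical helper: turn off turbo through intel_pstate's no_turbo knob
# and confirm the kernel kept the value. Needs root on a real system; the
# path argument exists only so the function can be pointed elsewhere.
disable_turbo() {
    local knob="${1:-/sys/devices/system/cpu/intel_pstate/no_turbo}"
    # 1 disables turbo, 0 re-enables it.
    echo 1 > "$knob" || return 1
    # Read it back; the write can be refused or reverted by the kernel.
    [ "$(cat "$knob")" = "1" ]
}
```

Note that sysfs writes like this don't survive a reboot, which is presumably why tlp's config was also set.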
Appreciate the update. We’ll need to see what the escalation team is able to sort from this. In instances where reproduction is tricky, this becomes harder.
On the 16th, we emailed you with thoughts from our engineering team. I’ve added in your latest feedback about the games and tmpfs volume into the ticket as well.
Yep, got the email, unfortunately while I was out of the country. Just got back yesterday and will try the new troubleshooting idea.
You were asking in another thread what thermal paste I used to solve this issue for me. It’s this Noctua thermal paste from Amazon. I took my time and was very deliberate about removing the old thermal paste. I even used a little isopropyl alcohol on a Q-tip to completely remove it. Then I was pretty generous with applying the new paste.
Other than that, all I did was to use compressed air to blow out the fans. Hope this helps.
I haven’t had the opportunity to try new thermal paste, but I’ve been playing with a possible workaround, at least for gaming: restricting the GPU’s max clock frequency. This seems to be more or less working, though at the expense of worse performance in games, and the need to reduce quality settings in the games themselves.
On Linux I set the values in both /sys/class/drm/card0/gt_max_freq_mhz and /sys/class/drm/card0/gt_boost_freq_mhz to something lower. Looks like they default to 1450. I started by dropping them down to 650, and have been inching them back up 100MHz at a time, with good results so far up to 950MHz.
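If you want to script the same cap, here is a minimal sketch. The function name set_gpu_freq_cap is hypothetical, and card0 is only the path from this post: on other machines the iGPU may be a different cardN, and newer kernels or the Xe driver may expose the frequency files elsewhere. It needs root when pointed at the real sysfs directory.

```shell
# Hypothetical helper: write the same MHz value to the iGPU's max and boost
# frequency files, as described above. Needs root for real sysfs paths.
set_gpu_freq_cap() {
    local mhz="$1"
    local dir="${2:-/sys/class/drm/card0}"
    # Accept only a plain positive integer in MHz.
    case "$mhz" in
        ''|*[!0-9]*) echo "usage: set_gpu_freq_cap MHZ [DIR]" >&2; return 1 ;;
    esac
    local f
    for f in gt_max_freq_mhz gt_boost_freq_mhz; do
        # The kernel may reject values outside the GPU's supported range;
        # stop at the first failed write.
        echo "$mhz" > "$dir/$f" || return 1
    done
}
```

Usage from a root shell would be something like `set_gpu_freq_cap 950`. As with other sysfs knobs, the value resets to the default on reboot.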
My admittedly qualitative test is to stretch out on my couch, put a blanket over myself, the laptop on top of my lap with the blanket under it, and start playing Civilization VI. At the default of 1450MHz, the throttling kicks in after 10 minutes or so. At 950MHz and below I was able to play for several hours with no issues.
I’m still testing (next stop: 1050MHz), so hopefully I still have room to run it faster here. This is a decent workaround: I’m happier to be able to play games at all, vs. having to save and quit every so often when the laptop decided to misbehave (especially a problem with online multiplayer games). But it’s still pretty lame that I have to live with worse perf and lower graphics quality for this to be usable.
And this isn’t a complete workaround. As I’ve mentioned, I can trigger this just by running Linux kernel compiles at full-tilt, even when the GPU is more or less idle.
Update: 1050MHz was too high. I did manage to play for about 3 hours before it throttled and got stuck. Will try bumping down to 1000MHz and see if that’s stable. Otherwise it might be 950.
Update2: 1000MHz was no good either; this time it throttled after an hour and a half or so. Back to 950.
Running on an i7-1260p, 64GB RAM, 1TB NVMe, 3.08 BIOS, under Fedora 40.
I changed the thermal paste to a Noctua NT-H2 and cleaned up the fan, the improvement is marginal.
I have disabled Hyper-Threading, yet under load (compiling for a couple of minutes) it reaches 95-100°C, starts throttling to 400 MHz, and takes almost half an hour to return to its normal state (when idle).
The only solution I have found is to limit the CPU frequency to 1.8GHz using cpupower:
sudo cpupower --cpu all frequency-set --max 1.8GHz
This way the temperature stays around 80°C under load, but at least it does not throttle.
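To confirm the cap actually applied to every core, you can read back scaling_max_freq, which the kernel reports in kHz (so 1.8GHz shows up as 1800000). A minimal sketch; the function name check_cpu_freq_cap is hypothetical:

```shell
# Hypothetical check: verify every CPU's scaling_max_freq (kHz) is at or
# below the expected cap. Returns 1 if any CPU is above it, 2 if no cpufreq
# files were found at all.
check_cpu_freq_cap() {
    local cap_khz="$1"
    local base="${2:-/sys/devices/system/cpu}"
    local f val found=0 rc=0
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_max_freq; do
        # An unmatched glob stays literal and is skipped here.
        [ -r "$f" ] || continue
        val=$(cat "$f")
        found=$((found + 1))
        if [ "$val" -gt "$cap_khz" ]; then
            echo "above cap: $f = $val kHz" >&2
            rc=1
        fi
    done
    [ "$found" -gt 0 ] || return 2
    return "$rc"
}
```

After the cpupower command above, `check_cpu_freq_cap 1800000` should return 0 without printing anything.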
This sucks big time, since I selected an i7 to have a powerful processor, yet I have to limit it to behave like an i3 (or less).
I don’t know why the cooling systems in 13-inch laptops (in general) are so awfully bad. I know about the space constraints, but the way the fan is positioned is just poor design from every single vendor: the airflow is pretty bad and the heatsink is too small for a powerful chip.
Note for my future self: Buy the lower-end CPU available, it will work the same as a high-end CPU.
Hyperthreading doesn’t reduce the max thermal output much.
It almost sounds like it runs into one of the harder thermal limits (but not the hardest one, otherwise you’d just have a crash) and panics down to 400MHz. Have you tried lowering the soft thermal limit to see if that helps?
The cooling in the Framework 13 is actually pretty nice for the size; I think something screwy is going on with yours.
You are using intel_pstate and not some legacy frequency scaling thing right?
My experience tells me otherwise, you might be surprised but it helps.
The cooling is pretty nice? Not sure if you are using the i5 version, have a special limited edition, or live in Norway, but every single laptop of this size has this cooling design flaw. I’m not blaming Framework; every vendor has this issue, and to me the Framework’s cooling is pretty standard. And yes, I’m using intel_pstate.
It can reduce it in certain workloads but you can definitely easily still exceed the power limit without it.
I’m using the R7 version. Stock, it can cool a bit under 30W; with PTM or liquid metal, around 45W. That’s quite a lot for that size bracket.
The throttling behavior you are seeing does not look right, though. Proper throttling should ride the power/temp limit and not crash to 0 when it reaches the max temp. The newest Intel notebook I have access to is 11th gen and it does pretty much exactly that, and so do the older ones. Can you reproduce this behaviour on Windows or an Ubuntu live system?
Edit:
In active mode?
Yes, under certain workloads it helps, and yet as mentioned, I still hit the thermal throttling even without HT.
I don’t have an “r7” version, I have the i7-1260P, and I don’t use liquid metal either, so maybe it’s my fault for not using liquid metal.
The whole thread is about the 12th gen, so it might be an issue specific to the 12th gen. In any case, that doesn’t change the fact that practically every 13-inch laptop (from any vendor) has a poor cooling system.
$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
$ cat /sys/devices/system/cpu/intel_pstate/status
active
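The same check can be collapsed into a two-line summary instead of one line per CPU. A minimal sketch (the function name cpufreq_driver_summary is hypothetical; the sysfs paths are the ones used above):

```shell
# Hypothetical helper: count CPUs per cpufreq scaling driver and show the
# intel_pstate mode, if that driver is present.
cpufreq_driver_summary() {
    local base="${1:-/sys/devices/system/cpu}"
    cat "$base"/cpu[0-9]*/cpufreq/scaling_driver | sort | uniq -c
    if [ -r "$base/intel_pstate/status" ]; then
        echo "intel_pstate status: $(cat "$base/intel_pstate/status")"
    fi
}
```

On the system above this would print a count of 12 next to intel_pstate, followed by `intel_pstate status: active`.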
Don’t worry, I can live with this issue, so this is my last reply.
Nah, even 30W should be more than enough for quite a bit of performance. The liquid metal just gets me more of it.
This is most likely not a cooling issue but a frequency scaling one. Even desktop chips are power limited these days. Running at the temp/power limit is normal, to some extent even on desktop; getting stuck at min clocks is not.
The stuck-at-400MHz thing is not supposed to happen; that’s some kind of bug.
That works too I suppose but it’s kind of a waste.
Ouch, I just watched this: https://youtu.be/H8LrwI-I_fY?si=RXVFGluAKXAiNb-s
The 12th gen does not crash, but if this is true, my next CPU will be an AMD (or, with some luck, a Snapdragon X).
That is entirely unrelated to your issue, but definitely quite the issue (hell, it doesn’t even concern laptops, just the roided-out-of-their-minds high-end desktop chips), especially Intel’s complete non-handling of it.
What did the escalation team find?
Why am I wasting hours of my life trawling forum posts for yet another FW issue that should never even have a hint of occurring?
This computer is a joke.
Who knows? My last reply from Matt on my support case was back in February, after they asked me to try something else (which didn’t help). I emailed again in April to ask for an update, but no one replied.
It’s incredibly frustrating.
Two nights ago I was in bed with the laptop resting on the blankets on my lap, just doing some regular activities (writing code, browsing the web), and it started throttling. The CPU was nearly idle; one of the cores was doing some more or less constant work, but that was it. Temperatures were around 55°C. Like… what? How is this acceptable? I decided to “ride it out” this time just to see what would happen, continuing to use it, as slow as it was, and it stopped throttling about an hour and a half later, even though internal temps never went higher than 60°C or so.
Restricting the GPU to 950MHz has made gaming more reliable, but I still get throttling here and there while playing. And of course it’s not just the GPU; heavy (or sometimes not at all heavy) CPU workloads can and do often cause throttling.
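To catch these episodes with timestamps (and show that temperature isn't high when it happens), a simple logger could sample the hottest thermal zone and the average CPU frequency. A minimal sketch, assuming the standard sysfs locations (scaling_cur_freq in kHz, thermal_zone*/temp in millidegrees Celsius); the function name sample_throttle_state and the 500MHz "stuck" threshold are hypothetical choices, picked to sit just above the 400MHz floor described in this thread:

```shell
# Hypothetical sampler: print one line with the hottest thermal zone (°C)
# and the average current CPU frequency (MHz), flagging samples below a
# threshold that suggests the CPU is stuck near its minimum clock.
sample_throttle_state() {
    local cpu_base="${1:-/sys/devices/system/cpu}"
    local thermal_base="${2:-/sys/class/thermal}"
    local threshold_mhz="${3:-500}"
    local f v sum=0 n=0 max_temp=0
    # scaling_cur_freq is reported in kHz.
    for f in "$cpu_base"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        v=$(cat "$f" 2>/dev/null) || continue
        case "$v" in ''|*[!0-9]*) continue ;; esac
        sum=$((sum + v))
        n=$((n + 1))
    done
    [ "$n" -gt 0 ] || return 1
    # thermal_zone*/temp is in millidegrees Celsius; keep the hottest zone.
    # Unreadable or negative readings are skipped.
    for f in "$thermal_base"/thermal_zone*/temp; do
        v=$(cat "$f" 2>/dev/null) || continue
        case "$v" in ''|*[!0-9]*) continue ;; esac
        if [ "$v" -gt "$max_temp" ]; then max_temp=$v; fi
    done
    local avg_mhz=$((sum / n / 1000))
    local flag=""
    if [ "$avg_mhz" -lt "$threshold_mhz" ]; then flag=" THROTTLED?"; fi
    printf '%s temp=%dC avg_freq=%dMHz%s\n' \
        "$(date +%H:%M:%S)" "$((max_temp / 1000))" "$avg_mhz" "$flag"
}
```

Something like `while sleep 5; do sample_throttle_state; done | tee -a throttle.log` would leave a log to attach to the support ticket.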
So, @Matt_Hartley ? What was found?
I don’t think responding to issues that render the laptop virtually unusable is optional. What has happened in the last 217 days?
Or are you just going to upgrade me from the absolute trainwreck that is 12th gen intel to absolve yourselves of the responsibility to fix the myriad of issues?
Class action lawsuit anyone?
I apologize this was not replied to earlier. I have brought this to the attention of the escalation team so they can review the ticket. It’s in a queue I do not see updates for, so I am watching it now.
I will be responding to this now and also adding it to my watch list so I can make sure you are not forgotten.