[RESPONDED] Changing the fan temperature points with ectool

May I ask if you are using the 11th gen or 12th gen model? Looking to install on my Framework Laptop running PopOS too.

I use a 11th Gen Mainboard.

You can check the thread below. With that setup, my CPU temperature always stays below 60 degrees.

I wonder if there are any long-time experiences. @Rene_Treffer Are you still using this? Have you experienced any issues?

@real_or_random been using it all the time. No complications, laptop runs hotter but more silent. Only annoyance is that I sometimes need to manually reapply this.

I am running the fw-fanctrl service on a 12th gen FW in deaf mode, but sometimes it tells me I don’t have any battery left and shuts down not long afterwards.

I keep on using it though

Curious, are you able to take a screenshot when this happens?

I’ll try and post it here although there isn’t much to see.

I get a battery critically low notification and something like 10 seconds later the laptop shuts down.

I am running Fedora’s Cinnamon spin, so it may be a Cinnamon bug? It wouldn’t be the first bug I’ve hit.

Edit: @Loell_Framework So yeah it happens too fast for me to screenshot it. Another piece of information is that I am maxing out my battery at 60% in the UEFI since it’s stationary.

Edit 2: Actually there’s a Cinnamon setting to “Do nothing” when the battery is extremely low (which I never let happen anyway). So I guess I just “fixed” my issue.

Any updates about fw-fanctrl? AFAIK the fan control monitors cpu_f75303@4d, which is NOT the CPU temperature. The actual CPU temperature is cpu@4c, but for that sensor the fan only starts when the CPU is already at thermal-shutdown temperatures (fan_off at 376 K = 103 C and fan_max at 378 K = 105 C), as shown:

$ sudo ectool temps all
--sensor name -------- temperature -------- ratio (fan_off and fan_max) --
local_f75303@4d       319 K (= 46 C)          20% (313 K and 343 K)
cpu_f75303@4d         321 K (= 48 C)          25% (319 K and 327 K)
ddr_f75303@4d         315 K (= 42 C)        N/A (fan_off=401 K, fan_max=401 K)
cpu@4c                365 K (= 92 C)           0% (376 K and 378 K)

Is it possible for fw-fanctrl to use the temp reading of cpu@4c and edit the fan curve on that one accordingly?
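For anyone trying to read the ratio column in that output: the percentages are consistent with a linear interpolation between fan_off and fan_max, clamped to 0..100. Below is a minimal Python sketch that reproduces the numbers above. The function names are made up for illustration, and the formula is inferred from the printed values, not taken from the EC source.

```python
def kelvin_to_celsius(kelvin: int) -> int:
    # The ectool output above pairs 319 K with 46 C, i.e. an integer offset of 273.
    return kelvin - 273


def fan_ratio_percent(temp_k: int, fan_off_k: int, fan_max_k: int) -> int:
    """Linear position of temp_k between fan_off_k and fan_max_k, clamped to 0..100.

    Assumption: this matches the ratio column only because it reproduces the
    values printed in the thread; the firmware may compute it differently.
    """
    if fan_max_k <= fan_off_k:
        raise ValueError("fan_max must be greater than fan_off")
    ratio = 100 * (temp_k - fan_off_k) / (fan_max_k - fan_off_k)
    return int(max(0, min(100, ratio)))


# (temperature, fan_off, fan_max) in Kelvin, copied from the output above.
readings = {
    "local_f75303@4d": (319, 313, 343),
    "cpu_f75303@4d": (321, 319, 327),
    "cpu@4c": (365, 376, 378),
}

for name, (temp_k, fan_off_k, fan_max_k) in readings.items():
    celsius = kelvin_to_celsius(temp_k)
    percent = fan_ratio_percent(temp_k, fan_off_k, fan_max_k)
    print(f"{name}: {celsius} C, {percent}%")
```

With cpu@4c at 365 K and fan_off at 376 K, that sensor contributes 0% even at 92 C, which is why it never drives the fan in the listing above.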

I’ve been using this systemd service for a while, and I’m happy with it:

Use this at your own risk, of course.

I’ve been having issues with my framework 16 overheating and shutting down, so I tried to follow the advice in this thread to configure my fans to be a bit more aggressive.

I managed to install ectool (I’m running NixOS, so I just installed the default version of fw-ectool available on nixpkgs), but it’s giving me a rather odd output, with a bunch of zeroes.

> sudo ectool thermalget
sensor  warn  high  halt   fan_off fan_max   name
  0      363   363    378      0       0     ambient_f75303@4d
  1      363   363    378      0       0     charger_f75303@4d
  2      363   363    378    320     335     apu_f75303@4d
  3      381   381    400    320     335     cpu@4c
  4        0     0      0      0       0     gpu_amb_f75303@4d
  5      344     0      0    323     347     gpu_vr_f75303@4d
  6        0     0      0      0       0     gpu_vram_f75303@4d
  7        0     0      0    323     353     gpu_amdr23m@40

Can anyone help me figure out what is going on here, and how I can make my laptop not overheat?
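In case it helps, here is a rough Python sketch of how a more aggressive fan range for one sensor could be turned into an ectool call. It only builds the argument list and does not run anything. Assumptions: that your ectool build accepts `thermalset` with arguments in the same order as the thermalget columns (sensor, warn, high, halt, fan_off, fan_max), that values are in Kelvin, and that the 45 C / 60 C targets are just example numbers. `build_thermalset_args` is a hypothetical helper name. Check the usage text of your own ectool build before running the resulting command.

```python
def celsius_to_kelvin(celsius: int) -> int:
    # Same integer offset ectool uses when printing temperatures (319 K = 46 C).
    return celsius + 273


def build_thermalset_args(sensor_id, warn_k, high_k, halt_k, fan_off_c, fan_max_c):
    """Build (but do not run) an ectool thermalset command line.

    Assumption: argument order mirrors the thermalget columns. The warn, high
    and halt values are passed through unchanged in Kelvin; only the fan range
    is given in Celsius and converted.
    """
    if fan_max_c <= fan_off_c:
        raise ValueError("fan_max must be higher than fan_off")
    values = [
        sensor_id,
        warn_k,
        high_k,
        halt_k,
        celsius_to_kelvin(fan_off_c),
        celsius_to_kelvin(fan_max_c),
    ]
    return ["ectool", "thermalset"] + [str(v) for v in values]


# Sensor 3 (cpu@4c) keeps its current warn/high/halt values from the table
# above; the fan range moves from 320..335 K down to 45..60 C (example values).
args = build_thermalset_args(3, 381, 381, 400, fan_off_c=45, fan_max_c=60)
print(" ".join(args))
```

Others in this thread mention having to reapply their settings manually from time to time, so do not expect a change like this to be permanent.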

Overheating and shutting down seems like a defect unless your ambient temperature is 40 C or higher.

Are your fans running at all? Or what is the situation this happens in?

As for the zeros, I suspect they mean that no temperature thresholds or fan points are set for those sensors.
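To see at a glance which sensors have no fan points, the thermalget table can be parsed with a few lines of Python. This is only a sketch: it assumes the seven-column layout shown above, and the function names are invented for illustration.

```python
THERMALGET_OUTPUT = """\
sensor  warn  high  halt   fan_off fan_max   name
  0      363   363    378      0       0     ambient_f75303@4d
  1      363   363    378      0       0     charger_f75303@4d
  2      363   363    378    320     335     apu_f75303@4d
  3      381   381    400    320     335     cpu@4c
  4        0     0      0      0       0     gpu_amb_f75303@4d
  5      344     0      0    323     347     gpu_vr_f75303@4d
  6        0     0      0      0       0     gpu_vram_f75303@4d
  7        0     0      0    323     353     gpu_amdr23m@40
"""


def parse_thermalget(text):
    """Turn the thermalget table into a list of dicts, skipping the header."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly seven columns and start with a numeric id.
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        sensor, warn, high, halt, fan_off, fan_max = (int(f) for f in fields[:6])
        rows.append({
            "sensor": sensor, "warn": warn, "high": high, "halt": halt,
            "fan_off": fan_off, "fan_max": fan_max, "name": fields[6],
        })
    return rows


def sensors_without_fan_points(rows):
    """Names of sensors whose fan_off and fan_max are both zero."""
    return [r["name"] for r in rows if r["fan_off"] == 0 and r["fan_max"] == 0]


for name in sensors_without_fan_points(parse_thermalget(THERMALGET_OUTPUT)):
    print(name)
```

On the table above this lists the ambient, charger, gpu_amb and gpu_vram sensors, so only the apu, cpu, gpu_vr and gpu_amdr23m sensors have fan points set.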

It seems to happen specifically when my laptop is both plugged in to wall power and under load (specifically, light gaming, I haven’t had any trouble with CPU-only loads like compiling).

I suspect it’s the battery or charging circuit that is overheating, since even right after it forcefully reboots, and is still very warm to the touch, btop reports a CPU temp around 50 to 60 C, which feels very low for a laptop that literally just overheated.

The fans do run, but I don’t notice a difference in fan speeds when it’s plugged in (and running much hotter) versus when it’s not

Sounds like a problem with the board that should be investigated. It’s a hard shutdown with nothing weird in the logs?

I tried to search through the logs with journalctl -g 'temperature' -S 2024-09-05, since I found some resources claiming that a shutdown because of overheat would be logged as “critical temperature reached”, but there were no entries that matched.

I guess that means it’s not the OS that’s deciding to reboot, but rather the board? I was a bit worried I’d caused this myself by using Nix (which I think is not officially supported), but if it’s the board I should be safe.

@a_framework_owner I’m curious if you managed to find out more about the root cause? I’m running into the same behaviour: once my laptop gets to about 10% battery it shuts down. I’m running Gnome on Nix, so I guess that rules out software. I did also set the battery limit (first at 60%, later tweaked it to 80%), I’m going to try turning that off and seeing if it fixes it

I would look at the log for the period just before one of these events to see whether it says anything at all, rather than searching for one specific message.

Ah, there is in fact something weird. I just had a forced reboot (around 22:40 local time), and there are repeated hardware errors in the system log in the 20 minutes preceding.

Sep 08 22:25:20 nixos kernel: [Hardware Error]: Corrected error, no action required.
Sep 08 22:25:20 nixos kernel: [Hardware Error]: CPU:15 (19:74:1) MC1_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0xdc20000006030151
Sep 08 22:25:20 nixos kernel: [Hardware Error]: Error Addr: 0x00007f1e4a70ff40
Sep 08 22:25:20 nixos kernel: [Hardware Error]: IPID: 0x000100b0200eab00, Syndrome: 0x000000001a00417a
Sep 08 22:25:20 nixos kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 3, IC Data Array Parity Error.
Sep 08 22:25:20 nixos kernel: [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
Sep 08 22:25:20 nixos kernel: mce: [Hardware Error]: Machine check events logged
Sep 08 22:25:20 nixos kernel: [Hardware Error]: Corrected error, no action required.
Sep 08 22:25:20 nixos kernel: [Hardware Error]: CPU:14 (19:74:1) MC1_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0xdc20000006030151
Sep 08 22:25:20 nixos kernel: [Hardware Error]: Error Addr: 0x00007f4d8c1a0e00
Sep 08 22:25:20 nixos kernel: [Hardware Error]: IPID: 0x000100b0200eaa00, Syndrome: 0x000000001a004170
Sep 08 22:25:20 nixos kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 3, IC Data Array Parity Error.
Sep 08 22:25:20 nixos kernel: [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
Sep 08 22:25:40 nixos .xdg-desktop-po[2871]: Failed to stop screen cast session: GDBus.Error:org.freedesktop.DBus.Error.Failed: Session not s>
Sep 08 22:28:27 nixos kernel: perf: interrupt took too long (2540 > 2500), lowering kernel.perf_event_max_sample_rate to 78000
Sep 08 22:30:47 nixos kernel: mce: [Hardware Error]: Machine check events logged
Sep 08 22:30:47 nixos kernel: [Hardware Error]: Corrected error, no action required.
Sep 08 22:30:47 nixos kernel: [Hardware Error]: CPU:15 (19:74:1) MC1_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0xdc20000006030151
Sep 08 22:30:47 nixos kernel: [Hardware Error]: Error Addr: 0x01ffffff85e01d00
Sep 08 22:30:47 nixos kernel: [Hardware Error]: IPID: 0x000100b0200eab00, Syndrome: 0x000000001a004168
Sep 08 22:30:47 nixos kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 3, IC Data Array Parity Error.
Sep 08 22:30:47 nixos kernel: [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
Sep 08 22:30:47 nixos kernel: mce: [Hardware Error]: Machine check events logged
Sep 08 22:30:47 nixos kernel: [Hardware Error]: Corrected error, no action required.
Sep 08 22:30:47 nixos kernel: [Hardware Error]: CPU:14 (19:74:1) MC1_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0xdc20000006030151
Sep 08 22:30:47 nixos kernel: [Hardware Error]: Error Addr: 0x01ffffff85397f40
Sep 08 22:30:47 nixos kernel: [Hardware Error]: IPID: 0x000100b0200eaa00, Syndrome: 0x000000001a00417a
Sep 08 22:30:47 nixos kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 3, IC Data Array Parity Error.
Sep 08 22:30:47 nixos kernel: [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
Sep 08 22:35:50 nixos kernel: perf: interrupt took too long (3187 > 3175), lowering kernel.perf_event_max_sample_rate to 62000
Sep 08 22:36:15 nixos kernel: mce: [Hardware Error]: Machine check events logged
Sep 08 22:36:15 nixos kernel: [Hardware Error]: Corrected error, no action required.
Sep 08 22:36:15 nixos kernel: [Hardware Error]: CPU:14 (19:74:1) MC1_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0xdc20000006030151
Sep 08 22:36:15 nixos kernel: [Hardware Error]: Error Addr: 0x01ffffff85e01d00
Sep 08 22:36:15 nixos kernel: [Hardware Error]: IPID: 0x000100b0200eaa00, Syndrome: 0x000000001a004168
Sep 08 22:36:15 nixos kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 3, IC Data Array Parity Error.
Sep 08 22:36:15 nixos kernel: [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD
Sep 08 22:36:15 nixos kernel: mce: [Hardware Error]: Machine check events logged
Sep 08 22:36:15 nixos kernel: [Hardware Error]: Corrected error, no action required.
Sep 08 22:36:15 nixos kernel: [Hardware Error]: CPU:15 (19:74:1) MC1_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0xdc20000006030151
Sep 08 22:36:15 nixos kernel: [Hardware Error]: Error Addr: 0x01ffffff85de0b80
Sep 08 22:36:15 nixos kernel: [Hardware Error]: IPID: 0x000100b0200eab00, Syndrome: 0x000000001a00415c
Sep 08 22:36:15 nixos kernel: [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 3, IC Data Array Parity Error.
Sep 08 22:36:15 nixos kernel: [Hardware Error]: cache level: L1, tx: INSN, mem-tx: IRD

Doesn’t look good. Looks like a bad CPU from a quick googling. Things I would try:

Stress test CPU and check temperature across all cores and check log for messages like these.
Make sure your BIOS is up to date
Make sure your kernel is up to date
Confirm you’ve had those events around the time of other reboots.

But I am guessing you need a new mainboard. I might reach out to support right away with the logs.

Yeah, I noticed that the errors were always about CPU 14 and 15 (and always both at the same time). I disabled those CPUs and the problem magically disappeared. It seems most likely I’ve got a bad core, so I will indeed be reaching out to support.
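If you want to check your own log for the same pattern, a small Python sketch like the following tallies machine-check lines per CPU. It assumes the kernel log format shown in the excerpt above (`[Hardware Error]: CPU:<n> ...`), and `count_mce_by_cpu` is just an illustrative name.

```python
import re
from collections import Counter

# Matches the per-CPU status line of a machine-check report, as in the log above.
MCE_CPU = re.compile(r"\[Hardware Error\]: CPU:(\d+) ")


def count_mce_by_cpu(log_text: str) -> Counter:
    """Count machine-check status lines per logical CPU number."""
    return Counter(int(m.group(1)) for m in MCE_CPU.finditer(log_text))


# Three status lines copied from the log excerpt above.
sample = """\
Sep 08 22:25:20 nixos kernel: [Hardware Error]: CPU:15 (19:74:1) MC1_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0xdc20000006030151
Sep 08 22:25:20 nixos kernel: [Hardware Error]: CPU:14 (19:74:1) MC1_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0xdc20000006030151
Sep 08 22:30:47 nixos kernel: [Hardware Error]: CPU:15 (19:74:1) MC1_STATUS[Over|CE|MiscV|AddrV|-|-|SyndV|-|-|-]: 0xdc20000006030151
"""

for cpu, count in sorted(count_mce_by_cpu(sample).items()):
    print(f"CPU {cpu}: {count} machine-check line(s)")
```

One way to get input for this is to save the kernel messages from journalctl to a file and pass the file contents to the function. If the errors cluster on one or two logical CPUs, as they did here, that points at a specific core.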

@a_framework_owner disabling the suspected faulty CPU cores also made my battery-related shutdowns disappear. Since you described similar problems, you might want to check your syslog as well, to see if your CPU is also faulty.
