Framework 16: High Fan RPM after Suspend / Hibernate

Hey there,
just trying, maybe someone has an idea for me:

I’ve got a Ryzen™ 9 7940HS with the AMD Radeon™ RX 7700S expansion board. Factory build. Currently I have EndeavourOS installed.

Now for the fun part: while everything else is running just fine, whenever the notebook comes back from Suspend or Hibernate, the right fan (it seems) spins up to full RPM and stays there until a full power cycle (shut off). A simple reboot does not help.

The left fan seems mostly unaffected: putting a lot of load on the CPU makes it spin up too, but it slows down again once the load drops.

I think I tried all of my tricks, maybe someone could give me a hint in the right direction for something to check …

Greetings.

some things first.

  1. do you have the latest bios installed?

  2. have you verified that the cpu, or other system resources, isn’t running hot?

  3. do you have the latest kernel, drivers, and firmware installed?

/Zoe

  1. Yes, 3.05
  2. Yes, cold as stone, actually with the fan running for a bit after Suspend / Hibernate, the notebook gets very cold :grinning_cat_with_smiling_eyes:
  3. Yes, fully upgraded EndeavourOS, Kernel 6.13.5-arch1-1; Framework System Firmware 0.0.3.5

Some more testing also done:

The problem only appears if the AMD Radeon™ RX 7700S expansion board is mounted. Without it, no problem.

did the fault start after you installed a new kernel? 6.13.5 is quite recent.

/Zoe

Since the Notebook is like 3 days old, there never was an old kernel to begin with :grinning_cat_with_smiling_eyes:

As far as I can establish, this might point in the direction of the ACPI thermal zone bug that should have been fixed somewhere around BIOS / Firmware 3.03 and/or around Kernel 6.9 (see: this post), which makes me wonder:

If I have the Expansion Bay Shell in, I’ve got 4 ACPI thermal zones (0-3), all of them showing proper temperatures (sensors). Suspend and Hibernate both work fine.

If I add the AMD Radeon™ RX 7700S expansion board, there are 4 zones added (4-7) for a total of 8. But: I get the error

ACPI: thermal: [Firmware Bug]: Invalid critical threshold (-274000)
ACPI: thermal: [Firmware Bug]: No valid trip points!

for each of them (journalctl -b | grep thermal). And the four additional sensors do not show any values (sensors).
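
For anyone who wants to check the same thing without lm_sensors, here is a minimal sketch that reads the thermal zones straight from sysfs. It assumes the usual /sys/class/thermal/thermal_zone*/temp layout with values in millidegrees Celsius. The function names and the 1 °C cut-off for "implausible" are my own choices for illustration, not anything official.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: temperature in deg C, or None if unreadable}."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            raw = (zone / "temp").read_text().strip()
            # sysfs reports the temperature in millidegrees Celsius
            readings[zone.name] = int(raw) / 1000.0
        except (OSError, ValueError):
            # Missing file, read error, or non-numeric content
            readings[zone.name] = None
    return readings


def suspicious(readings, floor_c=1.0):
    """Names of zones that are unreadable or report less than floor_c.

    floor_c is an arbitrary assumption: a zone inside a running laptop
    sitting at or below freezing is most likely not a real reading.
    """
    return sorted(name for name, temp in readings.items()
                  if temp is None or temp < floor_c)


if __name__ == "__main__":
    zones = read_thermal_zones()
    for name, temp in zones.items():
        print(name, "n/a" if temp is None else f"{temp:.1f} C")
    print("suspicious:", suspicious(zones))
```

With the GPU module installed, I would expect the four extra zones to show up in the "suspicious" list, matching what sensors reports.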

have you installed the linux-firmware package? it sounds like the fault is the firmware on the gpu, alternatively a hardware fault.

/Zoe

Just had to put the card back in to check:

The GPU has Firmware 113-BRT125778.001 as per gnome-firmware, with no newer firmware available according to fwupdmgr:

$ fwupdmgr update
Devices with the latest available firmware version:
 • Fingerprint Sensor
 • System Firmware
 • UEFI dbx
 • Unifying Receiver
Devices with no available firmware updates: 
 • AMD Radeon RX 7700S
 • Hub
 • WD BLACK SN770 2TB

The moment the card comes back in, I get a report from sensors like this:

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +40.8°C  
temp2:        +41.8°C  
temp3:        +39.8°C  
temp4:        +37.8°C  
temp5:         -0.2°C  
temp6:         -0.2°C  
temp7:         -0.2°C  
temp8:         -0.2°C  

and the errors

Mär 07 00:59:38  kernel: thermal LNXTHERM:00: registered as thermal_zone0
Mär 07 00:59:38  kernel: ACPI: thermal: Thermal Zone [TZ00] (33 C)
Mär 07 00:59:38  kernel: thermal LNXTHERM:01: registered as thermal_zone1
Mär 07 00:59:38  kernel: ACPI: thermal: Thermal Zone [TZ01] (35 C)
Mär 07 00:59:38  kernel: thermal LNXTHERM:02: registered as thermal_zone2
Mär 07 00:59:38  kernel: ACPI: thermal: Thermal Zone [TZ02] (37 C)
Mär 07 00:59:38  kernel: thermal LNXTHERM:03: registered as thermal_zone3
Mär 07 00:59:38  kernel: ACPI: thermal: Thermal Zone [TZ03] (50 C)
Mär 07 00:59:38  kernel: ACPI: thermal: [Firmware Bug]: Invalid critical threshold (-274000)
Mär 07 00:59:38  kernel: ACPI: thermal: [Firmware Bug]: No valid trip points!
Mär 07 00:59:38  kernel: thermal LNXTHERM:04: registered as thermal_zone4
Mär 07 00:59:38  kernel: ACPI: thermal: Thermal Zone [TZ04] (28 C)
Mär 07 00:59:38  kernel: ACPI: thermal: [Firmware Bug]: Invalid critical threshold (-274000)
Mär 07 00:59:38  kernel: ACPI: thermal: [Firmware Bug]: No valid trip points!
Mär 07 00:59:38  kernel: thermal LNXTHERM:05: registered as thermal_zone5
Mär 07 00:59:38  kernel: ACPI: thermal: Thermal Zone [TZ05] (27 C)
Mär 07 00:59:38  kernel: ACPI: thermal: [Firmware Bug]: Invalid critical threshold (-274000)
Mär 07 00:59:38  kernel: ACPI: thermal: [Firmware Bug]: No valid trip points!
Mär 07 00:59:38  kernel: thermal LNXTHERM:06: registered as thermal_zone6
Mär 07 00:59:38  kernel: ACPI: thermal: Thermal Zone [TZ06] (28 C)
Mär 07 00:59:38  kernel: ACPI: thermal: [Firmware Bug]: Invalid critical threshold (-274000)
Mär 07 00:59:38  kernel: ACPI: thermal: [Firmware Bug]: No valid trip points!
Mär 07 00:59:38  kernel: thermal LNXTHERM:07: registered as thermal_zone7
Mär 07 00:59:38  kernel: ACPI: thermal: Thermal Zone [TZ07] (0 C)
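
To make it easier to compare logs between boots (with and without the GPU module), here is a rough sketch that lists which zones the firmware-bug messages belong to. It relies on one assumption taken from the ordering in the log above: the "[Firmware Bug]" lines are printed just before the "registered as thermal_zoneN" line of the zone they refer to. The function name is made up for illustration.

```python
import re

# Matches e.g. "thermal LNXTHERM:04: registered as thermal_zone4"
REGISTERED = re.compile(r"registered as (thermal_zone\d+)")


def zones_with_firmware_bugs(log_text):
    """Return the zones whose registration line follows a firmware-bug line.

    Assumption: the kernel prints the bug messages immediately before it
    registers the affected zone, as the log in this thread suggests.
    """
    flagged = []
    pending = False  # True once a bug line was seen for the upcoming zone
    for line in log_text.splitlines():
        if "ACPI: thermal:" in line and "[Firmware Bug]" in line:
            pending = True
            continue
        match = REGISTERED.search(line)
        if match:
            if pending:
                flagged.append(match.group(1))
            pending = False
    return flagged
```

Fed with the output of journalctl -b from the boot above, this should flag thermal_zone4 through thermal_zone7 and leave zones 0-3 alone.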