[RESPONDED] Firmware bug: ACPI table error causes missing sensors on Linux 6.7+

After upgrading to upstream’s 6.8.1, I discovered that the acpitz sensors (motherboard temperature sensors) are missing.

The kernel now logs on boot:

ACPI: thermal: [Firmware Bug]: Invalid critical threshold (-274000)
ACPI: thermal: [Firmware Bug]: No valid trip points!
ACPI: thermal: [Firmware Bug]: Invalid critical threshold (-274000)
ACPI: thermal: [Firmware Bug]: No valid trip points!
ACPI: thermal: [Firmware Bug]: Invalid critical threshold (-274000)
ACPI: thermal: [Firmware Bug]: No valid trip points!
ACPI: thermal: [Firmware Bug]: Invalid critical threshold (-274000)
ACPI: thermal: [Firmware Bug]: No valid trip points!

It looks like this was caused by:

which now validates the value in the _CRT element for the thermal zone, and ignores any thermal zones where it is invalid.

If I’m reading the DSDT correctly, the _HOT threshold is reported as 0x1218 (463.2 K, about 190 °C) and the _CRT threshold is 0x12E0 (483.2 K, about 210 °C); ACPI encodes these values in tenths of a kelvin. Linux ignores any threshold above 448 K (about 175 °C) as implausibly high.
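To make the arithmetic easy to check, here is a minimal Python sketch of the conversion. It assumes the raw DSDT values are in tenths of a kelvin (as ACPI defines for _HOT and _CRT) and takes the 448 K upper bound from my reading above rather than from the kernel source; the function and constant names are made up for illustration.

```python
# Sketch only: converts raw ACPI trip values (tenths of a kelvin) and
# compares them with an assumed plausibility limit.

ZERO_CELSIUS_IN_KELVIN = 273.15

# Assumption: upper bound as quoted in this post, not read from kernel code.
ASSUMED_MAX_VALID_KELVIN = 448.0


def decikelvin_to_kelvin(raw: int) -> float:
    """Convert a raw ACPI temperature (tenths of a kelvin) to kelvin."""
    return raw / 10


def kelvin_to_celsius(kelvin: float) -> float:
    """Convert kelvin to degrees Celsius."""
    return kelvin - ZERO_CELSIUS_IN_KELVIN


def is_plausible(raw: int, limit_kelvin: float = ASSUMED_MAX_VALID_KELVIN) -> bool:
    """Return True if the raw value does not exceed the assumed limit."""
    return decikelvin_to_kelvin(raw) <= limit_kelvin


if __name__ == "__main__":
    for name, raw in (("_HOT", 0x1218), ("_CRT", 0x12E0)):
        kelvin = decikelvin_to_kelvin(raw)
        celsius = kelvin_to_celsius(kelvin)
        verdict = "accepted" if is_plausible(raw) else "rejected"
        print(f"{name}: {raw:#06x} = {kelvin:.1f} K = {celsius:.2f} °C ({verdict})")
```

Under these assumptions both thresholds land above the limit, which would match the "Invalid critical threshold" messages.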

I’m not sure how to report firmware bugs like this, but can the ACPI tables be fixed in a future firmware update so that Linux continues to report thermal information?


Since that commit says there is no intended functional impact, I think you should also report this as a kernel bug on the kernel Bugzilla.

I filed 218652 – acpitz sensors regression on Linux 6.7+ on Framework 16 upstream for this. Still, it seems like both sides are buggy: even if the kernel developers restore the missing sensors, without valid thresholds Linux can’t suspend itself automatically when the system is overheating.

I agree with you based on what you’ve said above. You should file a report with Framework support so they can get the BIOS side fixed in a future update.

I just saw this now and wanted to mention I’ve also got a bug reported: 218586 – No ACPI Thermal Zones after Kernel 6.8

I’ve got 2 patches that ‘fix’ this:

  1. Keeps the warning, but continues registering the thermal zones
  2. Raises the valid limit to 488 K (215 °C) so the reported thresholds are valid again

This is still an issue with the Framework 3.03 beta firmware and kernel 6.8.2-zen2-1-zen.

I think you should report this to Framework support, if you haven’t already, so they can track it and get it fixed with the engineering team.

I tried, but I got stuck in a cycle of irrelevant questions about things like which SSD brand I’m using x_x

It looks like your ticket is waiting on you to provide the requested logs. Once you do, the ticket is escalated and, if need be, sent to engineering.

The kernel logs are already here and in the linked kernel.org bugs, but I just replied to the ticket with a copy of them there, too.

It looks like Kernel 6.9 will be getting a fix that will allow the thermal zones to register despite the bogus trip values. However, that does not eliminate the need for Framework to put valid thermal zone data in the ACPI tables for the various operations that may utilize the trip values.


Is this bug report now being tracked by Framework support?


I’m currently on kernel 6.9.5 and the issue is still present.

Also on Debian Trixie with kernel Linux phoenix 6.9.12-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.9.12-1 (2024-07-27) x86_64 GNU/Linux
the issue still exists:

Aug 04 11:34:59 phoenix kernel: ACPI: thermal: [Firmware Bug]: Invalid critical threshold (-274000)
Aug 04 11:34:59 phoenix kernel: ACPI: thermal: [Firmware Bug]: No valid trip points!
Aug 04 11:34:59 phoenix kernel: thermal LNXTHERM:00: registered as thermal_zone0
Aug 04 11:34:59 phoenix kernel: ACPI: thermal: Thermal Zone [TZ00] (41 C)
Aug 04 11:34:59 phoenix kernel: ACPI: thermal: [Firmware Bug]: Invalid critical threshold (-274000)
Aug 04 11:34:59 phoenix kernel: ACPI: thermal: [Firmware Bug]: No valid trip points!
Aug 04 11:34:59 phoenix kernel: thermal LNXTHERM:01: registered as thermal_zone1
Aug 04 11:34:59 phoenix kernel: ACPI: thermal: Thermal Zone [TZ01] (41 C)
Aug 04 11:34:59 phoenix kernel: ACPI: thermal: [Firmware Bug]: Invalid critical threshold (-274000)
Aug 04 11:34:59 phoenix kernel: ACPI: thermal: [Firmware Bug]: No valid trip points!
Aug 04 11:34:59 phoenix kernel: thermal LNXTHERM:02: registered as thermal_zone2
Aug 04 11:34:59 phoenix kernel: ACPI: thermal: Thermal Zone [TZ02] (40 C)
Aug 04 11:34:59 phoenix kernel: ACPI: thermal: [Firmware Bug]: Invalid critical threshold (-274000)
Aug 04 11:34:59 phoenix kernel: ACPI: thermal: [Firmware Bug]: No valid trip points!
Aug 04 11:34:59 phoenix kernel: thermal LNXTHERM:03: registered as thermal_zone3
Aug 04 11:34:59 phoenix kernel: ACPI: thermal: Thermal Zone [TZ03] (78 C)

I don’t know about the messages mentioned above, but I would very much like to see each temp sensor named/described in the ACPI tables,
i.e. a name given to temp1, temp2, temp3, temp4, etc.
All I have currently is:
acpitz-acpi-0
Adapter: ACPI interface
temp1: +46.8°C
temp2: +48.8°C
temp3: +46.8°C
temp4: +45.8°C


good news, i can KINDA help with that

if you use dhowett’s ectool fork, you can correlate the temperatures ectool shows with the acpitz-acpi-0 sensors. i did that for my fw16, and also was guided to a framework hosted repository that has config files for lm-sensors to show names for these values

of course, ectool and that set of config files disagree, and if you look you can see i commented on the linked pull request with my findings and the results of some testing. (plus the pull request has ‘double-check sensor names’ as a todo, so it seems plausible these aren’t final/validated values)

my understanding is the temp1 through temp4 values SHOULD be the same for an amd fw13 (or an fw16 without dgpu), it’s temps 5 through 8 that are specific to having a dgpu.

in a more distant future, newer linux kernels (6.11 ish, as i understand it?) are expected to have working embedded controller drivers for our machines, which will let the kernel get at things more directly
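for the correlation step, here is a minimal Python sketch that prints every thermal zone the kernel exposes under /sys/class/thermal, so the values can be compared side by side with what ectool reports. it assumes the standard sysfs layout (a `type` file and a `temp` file in millidegrees Celsius per zone); the function name is just illustrative, and how thermal_zoneN lines up with temp1 through temp4 in the sensors output is something to verify on your own machine, not something this script knows.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return (zone_name, zone_type, degrees_celsius) for each readable zone."""
    zones = []
    # Note: sorted by name, so thermal_zone10 would sort before thermal_zone2.
    for zone_dir in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone_dir / "type").read_text().strip()
            # sysfs reports the temperature in millidegrees Celsius.
            millidegrees = int((zone_dir / "temp").read_text().strip())
        except (OSError, ValueError):
            # Skip zones that are missing files or return unreadable values.
            continue
        zones.append((zone_dir.name, zone_type, millidegrees / 1000))
    return zones


if __name__ == "__main__":
    for name, zone_type, celsius in read_thermal_zones():
        print(f"{name} ({zone_type}): {celsius:.1f} °C")
```

run it a few times while loading the cpu or gpu and watch which zone moves together with which ectool reading.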


While I feel it should be easy enough for a Framework employee to just ask an engineer, in the absence of direct info from Framework it might be quicker and easier to use a hair dryer or heat gun to blow some warm air on the board and find the sensor locations that way, if you have not already considered it. :)

hmmm. hadn’t considered that, no. well, now i guess a hilarious new conversation starter with the spouse is on the table.

“My laptop needs your hair dryer …”
“Your laptop needs what ???”
“It needs your hair dryer”
“what does it need that for ???”
“to dry its hair of course …”
“but it doesn’t have any hair!!!”
“You haven’t seen the hairy things I’m going to do to it …”


How about this problem - any updates on this?

This certainly needs to get fixed, and the fix should be simple.