[RESPONDED] FW 13 7840U ACPI thermal readout problem

Guest68 · July 5, 2024, 9:13pm

On my relatively new DIY FW 13 7840U I notice erratic and problematic behavior with the ACPI thermal_zone3 [*] temp readout. Several times a minute it alternates between incorrectly reporting 180800 (180.8°C) and a reasonable value. Each rotation lasts several seconds. The other zones (0,1,2) seem to have no such issue. 180800 is the only abnormally elevated reading I observe; in all other cases the value is within a reasonable range of the others. I checked dmesg for any relevant repeating logs, but there was nothing. I’m not sure where else to look.
[*] /sys/class/thermal/thermal_zone3/temp
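
For anyone wanting to catch the bad reading while it is happening, here is a minimal sketch of a check. It assumes the suspect value is exactly 180800, as observed above. The helper name `find_suspect_zones` and its optional directory argument are made up for illustration and are not part of any existing tool.

```shell
#!/bin/sh
# Reading observed in this thread; adjust if your system shows a different one.
SUSPECT_MILLIC=180800

# find_suspect_zones [DIR] (hypothetical helper)
# Prints "zone<TAB>value" for every thermal zone currently reporting the
# suspect value. DIR defaults to /sys/class/thermal.
find_suspect_zones() {
    dir=${1:-/sys/class/thermal}
    for zone in "$dir"/thermal_zone*; do
        # Skip when the glob matched nothing or the file is unreadable.
        [ -r "$zone/temp" ] || continue
        value=$(cat "$zone/temp")
        if [ "$value" = "$SUSPECT_MILLIC" ]; then
            printf '%s\t%s\n' "$(basename "$zone")" "$value"
        fi
    done
}
```

Calling it in a loop such as `while sleep 1; do find_suspect_zones; done` prints a line for each second in which the bad value is present, which makes it easier to see how long each episode lasts.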

Is this a behavior others notice? Is there a known cause and/or fix? It’s not a critical issue, but I would like to have accurate temperature information.
Additionally, what exactly does each of the thermal_zones measure on a Framework 13 AMD? I see all four are acpitz type according to cat /sys/class/thermal/thermal_zone*/type, but that’s not particularly informative.

Thanks for any assistance.

System Information:
OS: Artix Linux
Kernel: 6.9.6-6.9.7
BIOS: 3.05

$ for f in /sys/class/thermal/thermal_zone*; do printf '%s\t' "$(basename "$f")"; cat "$f/temp"; done
thermal_zone0	31800
thermal_zone1	32800
thermal_zone2	33800
thermal_zone3	180800

$ acpi -t
Thermal 0: ok, 32.8 degrees C
Thermal 1: ok, 180.8 degrees C
Thermal 2: ok, 31.8 degrees C
Thermal 3: ok, 33.8 degrees C

NB: It appears thermal_zone3 corresponds to Thermal 1 (and temp4 from sensors). Every time I checked the values this was the case. I have no idea why the numbering is different between the three.

Loell_Framework · July 9, 2024, 10:22pm

hi @Guest68,

Welcome to the community. Can you try to replicate these readings on an Ubuntu 24.04 or Fedora 40 live session, just to be certain this isn’t specific to Artix Linux?

cheers!

Guest68 · July 10, 2024, 10:06pm

Thanks for the response; that’s a good thought. I tried with a live boot of both Fedora 40 and Ubuntu 24.04 (both used 6.8 series kernels), but unfortunately neither appeared to detect any acpitz sensors. Consequently the only devices in /sys/class/thermal/ were cooling_device{0..15}, and acpi -t produced no output.
Is there something I’m missing here?

Loell_Framework · July 11, 2024, 10:41pm

What does the sensors command report on both distros?

Guest68 · July 12, 2024, 4:14pm

It seems “.txt” is not an authorized file extension, so here’s the copy-paste of the three.

Artix

ucsi_source_psy_USBC000:003-isa-0000
Adapter: ISA adapter
in0:           5.00 V  (min =  +5.00 V, max =  +5.00 V)
curr1:         0.00 A  (max =  +1.50 A)

ucsi_source_psy_USBC000:001-isa-0000
Adapter: ISA adapter
in0:           0.00 V  (min =  +0.00 V, max =  +0.00 V)
curr1:       680.00 mA (max =  +0.00 A)

ucsi_source_psy_USBC000:004-isa-0000
Adapter: ISA adapter
in0:           0.00 V  (min =  +0.00 V, max =  +0.00 V)
curr1:         0.00 A  (max =  +0.00 A)

amdgpu-pci-c100
Adapter: PCI adapter
vddgfx:      857.00 mV 
vddnb:       652.00 mV 
edge:         +38.0°C  
PPT:           6.22 W  (avg =   4.12 W)

BAT1-acpi-0
Adapter: ACPI interface
in0:          15.49 V  
curr1:       386.00 mA 

mt7921_phy0-pci-0100
Adapter: PCI adapter
temp1:        +36.0°C  

ucsi_source_psy_USBC000:002-isa-0000
Adapter: ISA adapter
in0:           5.00 V  (min =  +5.00 V, max =  +5.00 V)
curr1:         0.00 A  (max =  +1.50 A)

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +39.9°C  

nvme-pci-0200
Adapter: PCI adapter
Composite:    +33.9°C  (low  = -273.1°C, high = +89.8°C)
                       (crit = +94.8°C)
Sensor 1:     +33.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +33.9°C  (low  = -273.1°C, high = +65261.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +38.8°C  
temp2:        +38.8°C  
temp3:        +38.8°C  
temp4:       +180.8°C

Fedora

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +47.9°C  

ucsi_source_psy_USBC000:004-isa-0000
Adapter: ISA adapter
in0:           0.00 V  (min =  +0.00 V, max =  +0.00 V)
curr1:         0.00 A  (max =  +0.00 A)

ucsi_source_psy_USBC000:002-isa-0000
Adapter: ISA adapter
in0:           5.00 V  (min =  +5.00 V, max =  +5.00 V)
curr1:         0.00 A  (max =  +1.50 A)

nvme-pci-0200
Adapter: PCI adapter
Composite:    +33.9°C  (low  = -273.1°C, high = +89.8°C)
                       (crit = +94.8°C)
Sensor 1:     +33.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +33.9°C  (low  = -273.1°C, high = +65261.8°C)

mt7921_phy0-pci-0100
Adapter: PCI adapter
temp1:        +38.0°C  

amdgpu-pci-c100
Adapter: PCI adapter
vddgfx:      731.00 mV 
vddnb:       653.00 mV 
edge:         +41.0°C  
PPT:           5.17 W  (avg =   6.09 W)

ucsi_source_psy_USBC000:003-isa-0000
Adapter: ISA adapter
in0:           5.00 V  (min =  +5.00 V, max =  +5.00 V)
curr1:         0.00 A  (max =  +1.50 A)

ucsi_source_psy_USBC000:001-isa-0000
Adapter: ISA adapter
in0:           0.00 V  (min =  +0.00 V, max =  +0.00 V)
curr1:       680.00 mA (max =  +0.00 A)

BAT1-acpi-0
Adapter: ACPI interface
in0:          15.50 V  
curr1:       602.00 mA

Ubuntu

mt7921_phy0-pci-0100
Adapter: PCI adapter
temp1:        +30.0°C  

ucsi_source_psy_USBC000:004-isa-0000
Adapter: ISA adapter
in0:           0.00 V  (min =  +0.00 V, max =  +0.00 V)
curr1:       680.00 mA (max =  +0.00 A)

ucsi_source_psy_USBC000:002-isa-0000
Adapter: ISA adapter
in0:           5.00 V  (min =  +5.00 V, max =  +5.00 V)
curr1:         0.00 A  (max =  +1.50 A)

nvme-pci-0200
Adapter: PCI adapter
Composite:    +33.9°C  (low  = -273.1°C, high = +89.8°C)
                       (crit = +94.8°C)
Sensor 1:     +33.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +27.9°C  (low  = -273.1°C, high = +65261.8°C)

amdgpu-pci-c100
Adapter: PCI adapter
vddgfx:      679.00 mV 
vddnb:       651.00 mV 
edge:         +37.0°C  
PPT:           4.21 W  (avg =   4.23 W)

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +39.6°C  

ucsi_source_psy_USBC000:003-isa-0000
Adapter: ISA adapter
in0:           5.00 V  (min =  +5.00 V, max =  +5.00 V)
curr1:         0.00 A  (max =  +1.50 A)

ucsi_source_psy_USBC000:001-isa-0000
Adapter: ISA adapter
in0:           0.00 V  (min =  +0.00 V, max =  +0.00 V)
curr1:         0.00 A  (max =  +0.00 A)

BAT1-acpi-0
Adapter: ACPI interface
in0:          14.81 V  
curr1:       535.00 mA

Guest68 · July 12, 2024, 4:29pm

The erratic reporting seems to happen intermittently as well. For several hours everything will be fine, but then at some point the sensor value starts cycling between (presumably) good and bad readings. After a while it will return to working normally again. I don’t see any particular pattern to when it’s reliable or erratic, based on temp or anything else (at least it appears that elevating the temperature with a large project compile or stress does not induce the behavior).
Interestingly all of the temperature readings I have seen from the acpitz sensors end in .8. Is this just a quirk of the hardware?

Guest68 · July 19, 2024, 5:23am

Hey, @Loell_Framework, just wanted to check on this since it’s been a bit and I haven’t heard back from you. Is there anything else I should try? Did the sensors output have any useful info?

Guest68 · August 8, 2024, 12:52am

After checking the Discord group, it appears several other users have noticed the same issue on other (Intel) models (at least 12th gen) as well. The oldest post I found was from 2022/10/16. A search for 179.8 or 180.8 will yield relevant results. I also found one other forum post referencing the same elevated reading, but there was no additional info there.
In response to one user’s recent (2024/08/04) question regarding the normalcy of a 179.8° ACPITZ reading, Dustin Howett provided this insight:

probably slightly normal at least until kernel 6.10 or 6.11
(spurious readings from the memory-mapped I/O region of the embedded controller)
assuming it returns to normal shortly after
if not: uh no
- source (requires an account to view)

Later he double checked and confirmed that the fix is scheduled for a 6.11 release with no backport to 6.10.

So it seems this has been a Linux kernel issue affecting several (all?) FW 13 models for quite a while, but will soon be resolved.

DHowett · August 8, 2024, 12:58am

Ah, sorry! I didn’t realize that folks were experiencing this with the AMD platforms.

The MMIO fix was not required on the AMD Framework Laptops because they were not susceptible to that specific issue.

Unfortunately, that means that 6.11 will bring no relief for folks suffering this issue on AMD and that the root cause is still unknown.

Guest68 · August 8, 2024, 1:07am

Well that was a quick response. And unfortunate. Can you provide any insight about what’s going on here, with either the Intel or AMD versions? Why might they exhibit the same issue, but have different causes?
Also, for you or whoever else may be looking into this in the future, let me know if there is anything I can provide to assist with identifying or resolving the issue on the AMD side. Unless you think it’s hardware related?

Thomas_Weissschuh · August 11, 2024, 9:47am

The value 180800 looks suspicious.

The original value comes from the ChromeOS EC and gets read by the application processor (Linux) via a shared memory segment.
It is a single byte which gets transformed by the following formula into the millicelsius value you see in sysfs:

(x + 200) * 1000 - KELVIN_TO_CELSIUS_OFFSET

KELVIN_TO_CELSIUS_OFFSET is 273150 in the Linux kernel, but here the calculation is done by the ACPI firmware which seems to use 273200.
Then for x = 0xfe we get the observed value 180800.
0xfe in turn is a special value meaning EC_TEMP_SENSOR_ERROR.
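
The arithmetic above can be checked with a small sketch. The +200 bias and both offsets are the values quoted in this post; the helper name `ec_byte_to_millicelsius` is made up for illustration, and the firmware offset is the inferred one, not something confirmed from firmware source.

```shell
#!/bin/sh
KERNEL_OFFSET=273150     # KELVIN_TO_CELSIUS_OFFSET as quoted for the Linux kernel
FIRMWARE_OFFSET=273200   # offset the ACPI firmware seems to use (inferred above)

# ec_byte_to_millicelsius RAW [OFFSET] (hypothetical helper)
# RAW is the single byte from the EC, decimal or 0x-prefixed hex.
# OFFSET defaults to the inferred firmware value.
ec_byte_to_millicelsius() {
    raw=$1
    offset=${2:-$FIRMWARE_OFFSET}
    echo $(( ( $raw + 200 ) * 1000 - $offset ))
}
```

`ec_byte_to_millicelsius 0xfe` prints 180800, matching the sysfs reading, while `ec_byte_to_millicelsius 0xfe "$KERNEL_OFFSET"` prints 180850. A raw value of 112 gives 38800, the 38.8°C shown by sensors earlier in the thread.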

So there are two issues:

The EC fails to read the sensor. (Maybe the EC logs help investigating)
The ACPI firmware incorrectly reports an error value as a real result value. This should be fixed in the firmware.

Guest68 · August 12, 2024, 6:48am

I see. Thanks for the informative reply. Sounds like the place to start my investigation is the EC then.

To that end:

How do I troubleshoot the EC failing to read the sensor? Does that depend entirely on what the logs report? Is this failure likely to be hardware or firmware? Is there anything I can do about the failure?
When you say the ACPI firmware incorrectly handles the error value, is that the fault of the kernel interface or the Framework firmware? What does the process to fix that look like? Is that something that Framework needs to be involved with, or is that just the realm of you and/or the other kernel contributors (or both)?

I also have a few questions of varying relevance to the current issue if you have a moment:

Where does this information about the temp sensor and related formula come from? Just kernel source? The ACPI spec? Somewhere else? I’d like to look more into this and know for the future how to trace such a problem. Would I simply need to be familiar with the software to have known 180800 was the error value? EC_TEMP_SENSOR_ERROR seems to be defined by the Framework/ChromeOS EC though, so at least I know where to find that one.
You reference a kernel formula and offset value, but say that here the calculation is done by the firmware (presumably exposing the result to the kernel under an identifier for pre-calculated values?). Why are there two places where ACPITZ values are interpreted? Why choose to implement one over the other, particularly as it pertains to Framework?
Why does the kernel KELVIN_TO_CELSIUS_OFFSET differ from the one used by the Framework ACPI firmware?

Additionally (intentionally or otherwise) I now have an answer to my previous question about why the ACPI temp sensors all report values in 1 degree increments that always end in .8. I suppose only having single degree precision is the consequence of using a single byte, but this leads me to more questions…

Why report values with a decimal component if a single degree is the most precise the exposed reading will be?
Why .8? Is it just a quirk of the necessary offset?

I realize this is a lot of questions, so I understand if you can’t answer all of them. Thanks for your time regardless.

Guest68 · August 12, 2024, 7:12am

Here are a few recordings of the EC log from ectool console (Dustin’s fork) while bad values were being reported:

console log 1

[618483.934400 SB-SMI: Mailbox transfer timeout]
[618483.935600 SB-RMI Error: 4]
[618486.962800 SB-SMI: Mailbox transfer timeout]
[618486.964200 SB-RMI Error: 4]
[618493.918500 Battery 65% (Display 65.4 %) / 8h:33 to empty]
PORT80: F022
PORT80: F90D
PORT80: F90E
[618517.618900 HC 0x0115 err 1]
[618541.867300 Battery 65% (Display 65.3 %) / 8h:11 to empty]
PORT80: F022
PORT80: F90E
PORT80: F90E
[618578.763400 Battery 65% (Display 65.2 %) / 8h:22 to empty]
[618583.327400 HC 0x0115 err 1]
[618624.463600 Battery 65% (Display 65.1 %) / 8h:10 to empty]
PORT80: F022
PORT80: F90E
PORT80: F90E
[618648.383300 HC 0x0115 err 1]
[618671.599000 Battery 65% (Display 65.0 %) / 9h:4 to empty]
PORT80: F022
PORT80: F90E
PORT80: F90E
[618706.758400 Battery 65% (Display 64.9 %) / 8h:5 to empty]
[618712.375000 HC 0x0115 err 1]
[618756.670700 Battery 65% (Display 64.8 %) / 8h:20 to empty]
PORT80: F022
PORT80: F90E
[618776.392800 HC 0x0115 err 1]
[618801.555500 Battery 65% (Display 64.7 %) / 7h:8 to empty]
PORT80: F022
PORT80: F90D
PORT80: F90E
[618839.508100 HC 0x0115 err 1]
[618840.722900 Battery 65% (Display 64.6 %) / 7h:5 to empty]
[618859.032700 HC 0x0000]
[618876.061500 Battery 65% (Display 64.5 %) / 8h:35 to empty]
PORT80: F022
PORT80: F028
PORT80: F90E
[618902.470300 HC 0x0115 err 1]
[618903.688900 Battery 64% (Display 64.5 %) / 9h:17 to empty]
[618928.253900 Battery 64% (Display 64.4 %) / 8h:24 to empty]
PORT80: F022
PORT80: F90D
PORT80: F90E
[618966.407700 HC 0x0115 err 1]
[618968.644000 Battery 64% (Display 64.3 %) / 7h:30 to empty]
[619017.131300 Battery 64% (Display 64.2 %) / 8h:0 to empty]
PORT80: F022
PORT80: F90E
PORT80: F90E
[619032.173500 HC 0x0115 err 1]
[619053.699500 Battery 64% (Display 64.1 %) / 8h:27 to empty]
PORT80: F022
PORT80: F90D
PORT80: F90E
[619093.333400 HC 0x0115 err 1]
[619102.869300 Battery 64% (Display 64.0 %) / 8h:41 to empty]
PORT80: F022
PORT80: F90D
PORT80: F90E
[619146.017100 Battery 64% (Display 63.9 %) / 8h:4 to empty]
[619154.451000 HC 0x0115 err 1]
[619197.194500 Battery 64% (Display 63.8 %) / 8h:56 to empty]
PORT80: F022
PORT80: F90E
PORT80: F90E
[619215.619800 HC 0x0115 err 1]
[619230.064600 Battery 64% (Display 63.7 %) / 7h:25 to empty]
[619265.434600 Battery 64% (Display 63.6 %) / 6h:7 to empty]
PORT80: F90D
PORT80: F90E
[619277.571100 HC 0x0115 err 1]
[619307.581200 Battery 64% (Display 63.5 %) / 7h:25 to empty]
PORT80: F022
PORT80: F028
PORT80: F90E
[619338.732200 HC 0x0115 err 1]
[619347.223400 Battery 63% (Display 63.4 %) / 6h:26 to empty]
[619378.331700 Battery 63% (Display 63.3 %) / 7h:29 to empty]
PORT80: F90E
[619400.156800 HC 0x0115 err 1]
[619423.489500 Battery 63% (Display 63.2 %) / 8h:30 to empty]
PORT80: F022
PORT80: F90D
PORT80: F90E
[619461.266600 HC 0x0115 err 1]
[619471.665000 Battery 63% (Display 63.1 %) / 8h:19 to empty]
PORT80: F022
PORT80: F90E
PORT80: F90E
[619515.315900 Battery 63% (Display 63.0 %) / 7h:55 to empty]
[619522.982600 HC 0x0115 err 1]
[619542.409400 Battery 63% (Display 62.9 %) / 6h:27 to empty]
PORT80: F022
PORT80: F90D
PORT80: F90E
[619584.337100 HC 0x0115 err 1]
[619584.812600 Battery 63% (Display 62.8 %) / 7h:46 to empty]
[619632.479500 Battery 63% (Display 62.7 %) / 7h:29 to empty]
PORT80: F022
PORT80: F90D
PORT80: F90E
[619645.840600 HC 0x0115 err 1]
[619668.141500 Battery 63% (Display 62.6 %) / 7h:41 to empty]
PORT80: F90E
[619709.766200 HC 0x0115 err 1]
[619712.762900 Battery 63% (Display 62.5 %) / 8h:10 to empty]
PORT80: F022
PORT80: F90E
PORT80: F90E
[619762.690800 Battery 63% (Display 62.4 %) / 8h:49 to empty]
[619773.193700 HC 0x0115 err 1]
[619794.806800 Battery 62% (Display 62.4 %) / 6h:55 to empty]
[619805.844600 Battery 62% (Display 62.3 %) / 6h:38 to empty]
PORT80: F022
PORT80: F90E
PORT80: F90E
[619834.198400 Battery 62% (Display 62.2 %) / 6h:37 to empty]
[619835.135700 HC 0x0115 err 1]
[619877.352800 Battery 62% (Display 62.1 %) / 7h:8 to empty]
PORT80: F022
PORT80: F022
PORT80: F90E
[619898.486900 HC 0x0115 err 1]
[619924.529400 Battery 62% (Display 62.0 %) / 7h:58 to empty]
PORT80: 3C01
[619943.738900 HC 0x0002]
[619943.741200 HC 0x000b]

console log 2

[619773.193700 HC 0x0115 err 1]
[619794.806800 Battery 62% (Display 62.4 %) / 6h:55 to empty]
[619805.844600 Battery 62% (Display 62.3 %) / 6h:38 to empty]
PORT80: F022
PORT80: F90E
PORT80: F90E
[619834.198400 Battery 62% (Display 62.2 %) / 6h:37 to empty]
[619835.135700 HC 0x0115 err 1]
[619877.352800 Battery 62% (Display 62.1 %) / 7h:8 to empty]
PORT80: F022
PORT80: F022
PORT80: F90E
[619898.486900 HC 0x0115 err 1]
[619924.529400 Battery 62% (Display 62.0 %) / 7h:58 to empty]
PORT80: 3C01
[619943.738900 HC 0x0002]
[619943.741200 HC 0x000b]
PORT80: 3C08
PORT80: F022
PORT80: F90E
PORT80: F90E
[619961.314300 HC 0x0115 err 1]
[619962.649300 Battery 62% (Display 61.9 %) / 6h:52 to empty]
[619971.750100 HC 0x0002]
[619971.754500 HC 0x000b]
[619973.328000 HC 0x0002]
[619973.332100 HC 0x000b]
[619995.016400 Battery 62% (Display 61.8 %) / 7h:38 to empty]
[620012.622300 HC 0x0002]
[620012.626600 HC 0x000b]
PORT80: F022
PORT80: F90E
PORT80: F90E
[620017.051400 HC 0x0002]
[620017.055400 HC 0x000b]
[620017.888000 HC 0x0002]
[620017.893400 HC 0x000b]
[620023.313200 HC 0x0115 err 1]
[620034.908500 Battery 62% (Display 61.7 %) / 6h:37 to empty]
[620072.329200 Battery 62% (Display 61.6 %) / 6h:20 to empty]
PORT80: F022
PORT80: F022
PORT80: F90E
[620086.851400 HC 0x0115 err 1]
[620116.233200 Battery 62% (Display 61.5 %) / 7h:16 to empty]
PORT80: F022
PORT80: F022
PORT80: F90E
[620144.082300 Battery 62% (Display 61.4 %) / 6h:10 to empty]
[620151.641100 HC 0x0115 err 1]
[620187.217200 Battery 62% (Display 61.3 %) / 6h:48 to empty]
PORT80: F022
PORT80: F90E
PORT80: F90E
[620205.767900 Battery 61% (Display 61.3 %) / 6h:39 to empty]
[620215.115400 HC 0x0115 err 1]
[620227.097300 Battery 61% (Display 61.2 %) / 7h:6 to empty]
PORT80: F90D
[620269.744100 Battery 61% (Display 61.1 %) / 7h:26 to empty]
[620277.285900 HC 0x0115 err 1]
[620300.102100 Battery 61% (Display 61.0 %) / 6h:35 to empty]
PORT80: 3C01
PORT80: F022
PORT80: F90E
PORT80: F90E
[620329.834700 HC 0x0002]
[620329.837100 HC 0x000b]
[620339.108500 HC 0x0115 err 1]
[620341.247100 Battery 61% (Display 60.9 %) / 7h:1 to empty]
[620376.872500 Battery 61% (Display 60.8 %) / 6h:0 to empty]
PORT80: F022
PORT80: F90E
PORT80: F90E
[620400.985500 HC 0x0115 err 1]
[620417.280400 Battery 61% (Display 60.7 %) / 6h:51 to empty]
[620448.373200 Battery 61% (Display 60.6 %) / 6h:40 to empty]
PORT80: F90E
PORT80: F022
PORT80: F90E
[620464.019000 HC 0x0115 err 1]
[620479.803200 HC Suppressed: 0x97=342 0x98=144 0x113=0 0x103=0 0x115=58 0x2b=0 0x67=0 0x121=0]
[620487.262000 Battery 61% (Display 60.5 %) / 6h:18 to empty]
PORT80: F022
PORT80: F90E
PORT80: F90E
[620527.828000 HC 0x0115 err 1]
[620532.176500 Battery 61% (Display 60.4 %) / 6h:50 to empty]
[620559.548300 Battery 61% (Display 60.3 %) / 6h:31 to empty]
[620570.541900 HC 0x0002]
[620570.546000 HC 0x000b]
PORT80: F022
PORT80: F90E
[620588.836000 HC 0x0002]
[620588.841500 HC 0x000b]
[620589.891300 HC 0x0115 err 1]
[620590.052000 HC 0x0002]
[620590.056400 HC 0x000b]
[620600.414800 Battery 61% (Display 60.2 %) / 7h:7 to empty]
[620611.202000 Battery 60% (Display 60.2 %) / 6h:49 to empty]
[620638.591300 Battery 60% (Display 60.1 %) / 5h:50 to empty]
PORT80: F022
PORT80: F90D
PORT80: F90E
[620653.640200 HC 0x0115 err 1]
[620672.382000 HC 0x0002]
[620672.386200 HC 0x000b]
[620673.677000 Battery 60% (Display 60.0 %) / 5h:38 to empty]
[620702.777200 Battery 60% (Display 59.9 %) / 6h:23 to empty]
PORT80: F90D
PORT80: F90E
PORT80: F90E
[620715.183000 HC 0x0115 err 1]
[620738.148500 Battery 60% (Display 59.8 %) / 6h:11 to empty]
PORT80: F022
PORT80: F022
PORT80: F90E
[620776.578400 HC 0x0115 err 1]
[620780.829100 Battery 60% (Display 59.7 %) / 7h:5 to empty]
[620822.206000 Battery 60% (Display 59.6 %) / 6h:45 to empty]
PORT80: F022
PORT80: F90E
PORT80: F90E
[620840.335300 HC 0x0115 err 1]
[620853.339900 Battery 60% (Display 59.5 %) / 6h:24 to empty]
[620892.261400 Battery 60% (Display 59.4 %) / 6h:38 to empty]
PORT80: F90D
PORT80: F022
PORT80: F90E
[620906.297100 HC 0x0115 err 1]
[620924.292100 HC 0x0002]
[620924.296300 HC 0x000b]

Running the command several times shows a repeating sequence of these lines appended to the log:

[xxxxxx.xxxxxx HC 0x0002]
[xxxxxx.xxxxxx HC 0x000b]

Occasionally interspersed with some other lines like these:

PORT80: F022
PORT80: F90D
PORT80: F90E
[620653.640200 HC 0x0115 err 1]
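
To get an overview of which host command lines dominate a capture, a tally like the following sketch may help. It only counts lines of the form `[timestamp HC 0x....]` as they appear in the logs above and makes no claim about what each code means. The function name `count_host_commands` is made up for illustration.

```shell
#!/bin/sh
# count_host_commands (hypothetical helper)
# Reads an EC console log on stdin and prints "code count" per HC code.
# "HC Suppressed:" summary lines and PORT80 lines are ignored.
count_host_commands() {
    awk '$2 == "HC" && $3 ~ /^0x/ {
             code = $3
             sub(/\]$/, "", code)   # drop the bracket when the code ends the line
             count[code]++
         }
         END { for (c in count) print c, count[c] }' | sort
}
```

It can be fed directly from the tool, for example `ectool console | count_host_commands`, or from a saved copy of the log.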

What do these logs mean? I’ll take a look through the docs (or the source) to do a little discovery myself when I have some time in the next few days, but I’d like some input from someone more knowledgeable.

P.S. Can we please get .txt as an authorized extension? Working with large pasted blocks of text is cumbersome and frustrating.

Thomas_Weissschuh · August 12, 2024, 4:44pm

Most likely by looking at the logs and EC source code.
I have no idea about the details and solution.

It’s purely an ACPI firmware issue. The fix needs to come from the ACPI supplier, going through Framework.
The kernel can’t do anything about it.

This information comes from the EC API headers.
I recognized this value because I wrote the hwmon driver for the CrOS EC, which is completely unrelated to this specific issue, though.

The ACPI firmware exposes these readings under standard ACPI interfaces so it works everywhere.
But for example it misses the labels for the sensors, which my driver also exposes.
And there is data for which no standard ACPI interfaces may exist, so a dedicated driver makes sense.

Probably because the exact value really does not matter, and maybe some interface somewhere in the chain only supports one decimal digit.

In the kernel driver there is a preexisting constant and conversion function which is used for many drivers.
I expect the same to be true for the conversion in ACPI.
It could be rounded but it doesn’t really matter.

Yes, it’s an artifact of the kelvin offset constant.
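
A quick way to see this is to run every raw byte through the formula and look at the remainder. This is only a sketch: it assumes the (x + 200) * 1000 - offset conversion with the two offsets mentioned earlier in the thread, and `distinct_fractions` is a made-up helper name.

```shell
#!/bin/sh
# distinct_fractions OFFSET (hypothetical helper)
# Prints the distinct millidegree remainders over all raw bytes that map to
# a positive temperature (raw 74 is the smallest such byte for both offsets).
distinct_fractions() {
    offset=$1
    raw=74
    while [ "$raw" -le 255 ]; do
        echo $(( ( ( $raw + 200 ) * 1000 - $offset ) % 1000 ))
        raw=$(( $raw + 1 ))
    done | sort -u
}
```

`distinct_fractions 273200` prints only 800, so every reading ends in .8, whereas `distinct_fractions 273150` prints only 850. The byte moves the value in whole degrees and the constant supplies the fixed fraction.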

I hope to have answered all of them in a useful way.
If you have more questions or ideas let me know.

Thomas_Weissschuh · August 12, 2024, 4:47pm

“Port 80” is an (emulated?) debug IO port.
See io - What does the 0x80 port address connect to? - Stack Overflow

Otherwise I don’t really know and also would need to look at the EC source.

Guest68 · August 17, 2024, 8:32am

I hope to have answered all of them in a useful way.
If you have more questions or ideas let me know.

Yes, that was informative and helpful. Thanks.

Most likely by looking at the logs and EC source code.

I’ll start digging through the source for the EC and ectool when I have time then.

The fix needs to come from the ACPI supplier, going through Framework.

Sounds like Framework would have to chase the fix for this then. I guess I’ll create a support ticket or something when I have more info.

I wrote the hwmon driver for the CrOS EC

Nice. I was not aware of this driver; it looks useful. Unfortunately it seems my kernel was not shipped with it though, so I guess I’ll be compiling it soon.

The ACPI firmware exposes these readings under standard ACPI interfaces so it works everywhere.
But for example it misses the labels for the sensors, which my driver also exposes.
And there is data for which no standard ACPI interfaces may exist, so a dedicated driver makes sense.

Ah, so there are two interfaces.

I found some interesting information in the message thread about your v2 patches while looking into the driver you wrote. You’d have already read it, but I’ll reproduce it here for completeness’ sake.

Stephen Horvath:
Oh I see, I haven’t played around with the temp sensors until now, but I
can confirm the last temp sensor (cpu@4c / temp4) will randomly (every
~2-15 seconds) return EC_TEMP_SENSOR_ERROR (0xfe).
Unplugging the charger doesn’t seem to have any impact for me.
The related ACPI sensor also says 180.8°C.
I’ll probably create an issue or something shortly.
- [v2,1/2] hwmon: add ChromeOS EC driver - Patchwork
(corroborated by Guenter Roeck in the following message as well)

This matches my experience, so I guess this is a known issue. Checking with ectool temps all while the problem occurs reports Sensor 3 error, so at least the EC handles the error even if ACPI doesn’t.

Would you happen to know what the status of his plan to create an issue is? It sounds to me like he’s referring to reporting this to Framework, so perhaps they’re already aware. If so, following or joining whatever existing effort there may be to track down the issue sounds beneficial.

Thomas_Weissschuh · August 17, 2024, 8:49am

It will only be part of v6.11, so no kernel shipped with it yet.
Backporting it will be a bit annoying because it also requires new utility functions and the MFD bits.

Both the ACPI firmware and the Linux driver read the data from the EC through the same interface. Same for ectool.
(I’m decently sure)

Good find, I forgot about that one.
No idea what became of the plan.

Guest68 · August 17, 2024, 9:58am

It will only be part of v6.11, so no kernel shipped with it yet.

Ah, of course. I probably don’t plan to compile and run an rc kernel, so I guess I’ll get it at release.

Same for ectool.
(I’m decently sure)

Hm. I think I don’t understand how the ACPI firmware is related to the EC. I imagined two interfaces for the hardware sensors: an EC interface and an ACPI interface, and thought the former correctly detected the error where the latter did not because ectool (EC interface) reported the error where acpi (ACPI interface) did not.
It seems like you are saying that the EC is the root source of the measurement that then exposes the temp sensors (including the error value) to the ACPI firmware that incorrectly reports the error value as a valid temp measurement.
Something like this:

hardware probe ─> EC ─> ACPI firmware
                  |     └─> interface (read by Linux kernel ACPI driver)
                  └─> interface (read by ectool and the ec_* drivers)

What is the correct relationship?

No idea what became of the plan.

Do you know if he has an account here that I could message him to ask? If not, I suppose I’ll email and ask? Unless it’s better if you do it.

Thomas_Weissschuh · August 17, 2024, 11:17am

Exactly.

ACPI is only a bytecode definition that can be used to map standard data structures and interfaces to the concrete hardware implementations on the platform.
As the sensor is hooked up to the EC, the ACPI functions also read the EC memory map.

Sounds good. You could even respond to the original mail.

Guest68 · August 19, 2024, 4:22am

Exactly.

Great.

Sounds good.

Will do.
