Missing per-core CPU temperatures from k10temp?

In sensors (and the corresponding files in /sys/class/hwmon), I see only the Tctl temperature from the k10temp driver. I looked at the kernel driver code and it is not immediately clear why, since support for this model seems to have been added a long time ago (kernel 6.0) and this information appears to be exposed to Windows users. Do others see Tdie/Tccd temperatures? If so, which kernel/distro are you using?

Thanks!

(Edit: Note utilities like KDE System Monitor fall back to Tctl for “CPU core” temperatures when Tccd is missing, so don’t rely on that :-))

Debian testing/sid
Kernel 6.9.10
BIOS 3.03
7840HS (family 25, model 116)
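
For anyone who wants to check without installing lm-sensors, here is a minimal sketch that lists whatever labels the k10temp driver exposes under /sys/class/hwmon. It assumes the standard hwmon sysfs layout (`name`, `tempN_label`, `tempN_input` in millidegrees Celsius); the function name `k10temp_readings` is made up for this example.

```python
from pathlib import Path


def k10temp_readings(hwmon_root="/sys/class/hwmon"):
    """Return {label: degrees Celsius} for every k10temp hwmon device found."""
    readings = {}
    for dev in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = dev / "name"
        # Skip devices that belong to other drivers (nvme, amdgpu, ...).
        if not name_file.is_file() or name_file.read_text().strip() != "k10temp":
            continue
        for label_file in sorted(dev.glob("temp*_label")):
            input_file = label_file.with_name(
                label_file.name.replace("_label", "_input")
            )
            if not input_file.is_file():
                continue
            label = label_file.read_text().strip()
            try:
                # hwmon reports temperatures in millidegrees Celsius.
                readings[label] = int(input_file.read_text().strip()) / 1000.0
            except (OSError, ValueError):
                continue
    return readings


if __name__ == "__main__":
    for label, temp in k10temp_readings().items():
        print(f"{label}: {temp:+.1f}°C")
```

If only `Tctl` is printed, the driver is not exposing any Tccd channel on that machine. Note that labels from several k10temp devices would overwrite each other in this sketch.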

Hmm, did this get buried somehow by the forum software? I don’t want to be That Guy who impatiently bumps their own topics, but I didn’t expect it would take days to get one other person to run sensors and say yes/no. Any distribution is fine, my kingdom for a fellow Linux FW16 user’s feedback here! :slight_smile:

I am using Kubuntu 24.04 and I only have Tctl. Before Kubuntu I used Mint Cinnamon, and sensors also showed only Tctl from k10temp.

I’ve been using Arch Linux on the FW16 since kernel 6.6 (now on 6.10.3) and have never seen a per-core temperature on Linux.

I think it’s something not yet supported by the Linux k10temp kernel module.

Great, thank you both for checking! It seems odd, since on the surface it looks like the driver is supposed to be reading this data. @Mario_Limonciello is it intended that no Tccd is available? I ask you specifically only because your name appears as author or reviewer on most of the recent commits. Thanks!

CCD temperature readings are only valid when ZEN_CCD_TEMP_VALID is set.

Otherwise values are considered garbage.
See linux/drivers/hwmon/k10temp.c at master · torvalds/linux · GitHub for more details.
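
As a rough illustration of that check, here is a sketch of how a raw CCD temperature register value could be decoded. The bit position of the valid flag, the 11-bit mask, and the 0.125 °C step with a -49 °C offset are assumptions based on a reading of k10temp.c, not on AMD documentation, so verify them against the linked source. `decode_ccd_temp` is a hypothetical helper name.

```python
# Assumed from k10temp.c: bit 11 flags a valid reading, bits 10..0 hold the value.
ZEN_CCD_TEMP_VALID = 1 << 11
ZEN_CCD_TEMP_MASK = (1 << 11) - 1


def decode_ccd_temp(regval):
    """Return the CCD temperature in millidegrees Celsius, or None if invalid."""
    if not regval & ZEN_CCD_TEMP_VALID:
        # Valid bit clear: the remaining bits are not a trustworthy reading.
        return None
    # Assumed scaling: 0.125 degrees C per step, offset by -49 degrees C.
    return (regval & ZEN_CCD_TEMP_MASK) * 125 - 49000


print(decode_ccd_temp(0x800 | 734))  # valid bit set
print(decode_ccd_temp(734))          # same raw value, valid bit clear
```

The second call shows the point being made here: the low bits can look like a plausible temperature even when the valid bit says they should not be used.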


So is this a CPU bug where the valid bit is not being set, and some Windows driver ignores that and uses the value anyway, and the value happens to be valid? Otherwise, how are Windows users seeing values? I wish I were in a better position to probe directly by rebuilding the driver with some instrumentation to check for myself, but I am not right now, for unrelated reasons. Thanks for your quick thoughts!

You would need to ask the developer of that Windows tool whether they ignore the CCD valid bit in the register.

If they do, they are presenting invalid data in their tool.

And no, I am not aware of any errata related to an incorrect CCD valid bit. If the valid bit is not set, you shouldn’t use the data in that register.

HWiNFO is not open source, so I can’t check directly, but I guess they are doing something similar to another OSS utility, LibreHardwareMonitor, which seems to take its own approach: it reads the low 12 bits for a few specific models (not 74h) and then uses the value if it looks like it is in range. Gutsy. Since this was mostly a “hey, is this expected?” query, I don’t feel especially motivated to ask the HWiNFO developers, because I think an AMD engineer with direct knowledge would know best. :slight_smile: Interesting that those utilities are seeing plausible-looking temperature values anyway. I hope the folks in that other thread are not wasting time chasing phantoms!

I did at least spend a very exciting dinner time looking for a programmer’s guide or open-source register reference for this part of the system and was not successful. I only found OSRR for 17h which only describes base register 0x00059800, which is too bad. (I guess this is to say: I’m currently unsure how external developers would know there is a valid bit since I didn’t succeed at finding the docs about it.)
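
For reference, here is a small sketch of how the per-CCD register addresses appear to be derived from that base register in the driver. The formula (base + per-model offset + 4 bytes per CCD) and the example offsets 0x154 and 0x308 are recollections of k10temp.c, not values from public AMD documentation, so treat them as assumptions and check the driver source before relying on them. `ccd_temp_register` is a hypothetical helper name.

```python
# Base register named in the public 17h OSRR.
ZEN_REPORTED_TEMP_CTRL_BASE = 0x00059800


def ccd_temp_register(ccd_offset, ccd_index):
    """Compute the assumed address of the temperature register for one CCD.

    ccd_offset is model-specific (taken from the driver, an assumption here);
    ccd_index is zero-based, so Tccd1 corresponds to index 0.
    """
    return ZEN_REPORTED_TEMP_CTRL_BASE + ccd_offset + ccd_index * 4


# Assumed offsets: 0x154 for older Zen parts, 0x308 for family 19h models 70h-7Fh.
print(hex(ccd_temp_register(0x154, 1)))
print(hex(ccd_temp_register(0x308, 0)))
```

These addresses are not ordinary memory; the driver reads them through the kernel, so this only shows the arithmetic, not a way to read the registers from user space.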

Thanks for the feedback and the work you do to make everything work so nicely on Linux :slight_smile:


Sorry, this is just nagging at me for literally no good reason tonight. Feel free to ignore me about this topic forever if it becomes annoying.

Picking at the history of the k10temp driver, it seems the CCD valid bit was determined by experimentation on family 17h in hwmon: (k10temp) Display up to eight sets of CCD temperatures · torvalds/linux@fd8bdb2 · GitHub, and then by inferring that, because AMD’s GPU header thm_10_0_sh_mask.h defines a valid bit mask, the same holds for the CPU as well. However, the latest thm_14_0_2_sh_mask.h does not have these definitions. So is the valid bit actually gone on newer revisions? (Was it ever truly correct to infer it from this header in the first place?)

I assume you have access to the internal datasheet to verify whether or not this is the case. I just wanted to make sure this is not an assumption based on empirical observation and inference that no longer holds because of changes in undocumented registers, in which case HWiNFO would actually be right, since it is returning plausible values.

I’m currently OOO from work, so I can’t get to the internal documentation to double check it all.

Could you please file a kernel bug to look into this and assign it to me? I’ll look when I get back into the office at the end of the month.


I filed it as bug 219148 – k10temp: Tccd missing on Zen 4, possible incorrect valid bit check?. I was not able to change the assignee, but I added you as CC, and hopefully the default assignee will reassign it to you when they see it. Thanks!!


Don’t get too excited about CCD…
It is not a per-core temperature but a per-“complex” one.

On an AMD Ryzen 9 5950X 16-Core Processor with 2 CCDs / 16 cores (and an IO die), the report is “correct”:
k10temp-pci-00c3
Adapter: PCI adapter
Tctl: +48.1°C
Tccd1: +42.8°C
Tccd2: +29.5°C

On the 7840HS, there is only 1 CCD (with GPU + IO), so it can make sense that no CCD temperature is reported.

Per-core temps are another story… it would be great to have them, but I’m not sure we really can.
I remember (but with age my memory is bad, so I may be wrong) that there were per-core temps in some “old” kernel, but they were removed for unreliability. What I can’t remember is whether it was hardware related (on some/all CPUs), or whether, without a clear spec, it was too hard to report correct values with the correct bias/factor…

Edit: I may be wrong about the per-core temp removal: https://www.phoronix.com/news/Linux-5.11-Drops-k10temp-V-C it was the Zen voltage/current reporting that was removed because of missing documentation…


Oops! That is certainly my mistake for not understanding well enough what these were measuring (of course, once you point it out, it is obvious :person_facepalming:). I guess the question of whether the valid bit check is still correct remains (a 5700X with one CCX still shows Tccd1), but the real issue is why there is no core temperature implementation. Without public documentation that would let me eventually write it myself, I’m not sure what else to do other than politely ask AMD to put out those docs or send a patch, since I don’t have the resources to do more :slight_smile:


In that case there is 1 CCD + 1 IO chiplet, so it can be valid and make sense (this is only a hypothesis).

Yes this question remains :wink:
