Do you think adding average CPU frequency for all core loads and maximum frequency for single core load would be of any use?
I was tempted to make a script that takes care of everything and spits out a log. But honestly the behavior seems OK on my end, so I haven’t done anything else yet.
I must have forgotten to put up the results. Now updated.
The fan behaviour from a “fan curve” perspective is very similar (across the board I should say). The difference is with PPD set to ‘power-saver’, for all-core load the temps seem to be lower, settling around 70-72C with lower peak fan RPM. This is likely due to profile policy on CPU frequency. With PPD “disabled” the average CPU frequency would settle at around 3.3GHz in the above experiments, whereas with “power-saver” it would settle around 2.2GHz which would explain the difference in Tctl.
This does seem to be the case. I did not look too much into it and only added the link to the post for completeness, rather than as any criticism. There are a lot of useful links in other threads that discuss the EC configuration and thermals, though most seem to be Intel related.
I did get the following readout from ectool thermalget:
sensor  warn  high  halt  fan_off  fan_max  name
  0     343   353   393    313      343     local_f75303@4d
  1     343   353   393    319      327     cpu_f75303@4d
  2     343   353   393    401      401     ddr_f75303@4d
  3     381   388   400    376      378     cpu@4c
(all temps in degrees Kelvin)
[for cpu@4c: 376K ~ 103C; 378K ~ 105C]
Which is in line with the EC config and what you are saying.
I did capture the frequency data as well, but I’ve not included it as there are already a lot of quite comprehensive benchmarks of the actual performance of the FW13 AMD available, and I wouldn’t do them justice. The overall performance is exactly what I would expect from this form factor and I’m generally extremely happy with it.
Instead, I wanted to keep the discussion specific to the fan behaviour as a reference point for people, especially for anyone who suspects there might be an issue with the way their unit behaves in that regard.
But to answer your question, with the “performance” PPD profile for all-core loads (which I tend to do often) the average frequency is ~3.6GHz sustained, and for single-core loads it’s ~4.8GHz sustained. This is in line with expectations and other, more thorough, benchmarks.
Ok, I agree that overall clocks should be similar.
But I think clocks on the graph might show people the downsides of the throttling, i.e. clocks might be double-digit percent lower while the fan is ramping up. Depending on cooling performance, I guess.
Since I switched to PTM7950 I don’t hit 100C anymore, so clocks seem rock stable even before fan ramp-up. (Also like 50 MHz higher average under full load.)
$ sudo ectool temps all
--sensor name -------- temperature -------- ratio (fan_off and fan_max) --
local_f75303@4d       320 K (= 47 C)          23% (313 K and 343 K)
cpu_f75303@4d         319 K (= 46 C)           0% (319 K and 327 K)
ddr_f75303@4d         311 K (= 38 C)          N/A (fan_off=401 K, fan_max=401 K)
cpu@4c                373 K (= 100 C)          0% (376 K and 378 K)
And for multi-core (43 W from USBC) I got this:
$ sudo ectool temps all
--sensor name -------- temperature -------- ratio (fan_off and fan_max) --
local_f75303@4d       321 K (= 48 C)          26% (313 K and 343 K)
cpu_f75303@4d         325 K (= 52 C)          75% (319 K and 327 K)
ddr_f75303@4d         314 K (= 41 C)          N/A (fan_off=401 K, fan_max=401 K)
cpu@4c                364 K (= 91 C)           0% (376 K and 378 K)
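For anyone wondering how the ratio column relates to the fan_off and fan_max points, here is a small Python sketch that reproduces the percentages printed above. The formula is inferred from the printed values only, not taken from the EC source, so treat it as an assumption. The function name fan_ratio_percent is made up for illustration.

```python
def fan_ratio_percent(temp_k, fan_off_k, fan_max_k):
    """Return the ramp position in percent, or None where ectool prints N/A."""
    if fan_max_k <= fan_off_k:
        # No usable ramp, e.g. the ddr sensor with fan_off = fan_max = 401 K.
        return None
    if temp_k <= fan_off_k:
        return 0
    if temp_k >= fan_max_k:
        return 100
    # Integer division is an assumption: 321 K on a 313-343 K ramp is
    # printed as 26%, not 27%, which suggests truncation.
    return 100 * (temp_k - fan_off_k) // (fan_max_k - fan_off_k)


# Readings copied from the multi-core output above.
readings = [
    ("local_f75303@4d", 321, 313, 343),
    ("cpu_f75303@4d", 325, 319, 327),
    ("ddr_f75303@4d", 314, 401, 401),
    ("cpu@4c", 364, 376, 378),
]

for name, temp, off, mx in readings:
    ratio = fan_ratio_percent(temp, off, mx)
    print(name, "N/A" if ratio is None else f"{ratio}%")
```

With the default points, cpu@4c stays at 0% until the die reaches 376 K (103C), so on this reading the fan speed is effectively decided by the two f75303 sensors.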
On single core the CPU is hotter while the other sensors are cooler. It looks as if the thermal conductivity magically gets poorer under single-core load, which does not make any sense.
Maybe it could be of some use indeed. I might take a break from ‘graphing’ for a bit, but I’ll see if I can add it in with a secondary Y-axis on one side. It will make the graphs slightly less readable, but a different kind of plot would be a bit of an overkill for temporal data. All-core stats would be easy, but single core, which is probably more interesting, might require some pre-processing as the core on which the load runs tends to change since I didn’t make use of any core pinning.
In any case, my rationale for not including it is that while there will inevitably be an effect from throttling while the fan catches up, this should only affect all-core loads that run for less than the median time for the fan to reach peak RPM. For long, sustained all-core load this shouldn’t make a difference.
For single core loads it might be more subtle as there’s a lot more room for the fan to reach higher RPMs which could in turn, possibly, allow the CPU to boost higher. This is probably the more interesting case.
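As a rough idea of the pre-processing mentioned above, here is a Python sketch. It assumes the log holds one list of per-core frequencies (in MHz) per sample; the numbers below are invented purely for illustration and are not measurements. Taking the per-sample maximum sidesteps the problem of the load hopping between cores without pinning.

```python
from statistics import mean


def summarize_frequencies(samples):
    """samples: one list of per-core frequencies (MHz) per time step.

    Returns two lists of equal length:
      - the average across all cores (useful for all-core load)
      - the frequency of the fastest core (useful for single-core load,
        regardless of which core the scheduler picked for that sample)
    """
    all_core_avg = [mean(sample) for sample in samples]
    fastest_core = [max(sample) for sample in samples]
    return all_core_avg, fastest_core


# Hypothetical single-core run where the load hops from core to core.
samples = [
    [4800, 1400, 1400, 1400],
    [1400, 4750, 1400, 1400],
    [1400, 1400, 4790, 1400],
]

avg, fastest = summarize_frequencies(samples)
print("all-core average:", avg)
print("fastest core:    ", fastest)
```

Either series could then go on the secondary Y-axis. The per-sample maximum does assume that the loaded core is always the fastest one, which matches what s-tui shows but is not guaranteed.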
PTM7950 is really cool stuff, I might look into getting a pad to chop up and try out some time. It’s great to hear your experience with it as it seems to be making quite a change. I presume your CPU is also more comfortable with single core loads? This would be very impressive if so.
No, makes sense to me. With a single CPU generating the heat you get a point source of heat, with a longer path to the outside world, so more thermal resistance between the point source and the external sensor.
With multiple CPUs generating heat, you have multiple point sources, each at the same temperature as the single CPU, but the distance to the external sensors from some of them may be shorter, so the external sensors will get hotter.
I don’t know how the CPUs are arrayed in the package, but is the single CPU (I presume this is the primary CPU that starts everything up initially and then masters which processors are being used) in the middle of the package cavity? This could potentially increase the thermal path to the external sensors.
It is possible that the location of the different cores plays a part in thermal management. When stressing a single core with s-tui, the load seems to hop from one core to another from time to time, and there’s no “master” core, as the unloaded cores always clock lower than the loaded core. My guess is that “Tctl” reports the highest core temperature, but there’s no indication of which core is currently being measured.
It’s probably even more complicated than that. There are tons of temperature sensors on a modern CPU/GPU, but they usually only expose a few abstracted ones (like hottest overall, hottest of this core, [insert proprietary buzzword]-math over a few of them, or just a specific one).
I noticed that being on USBC power or not also makes a difference: when PPD is balanced, the pstate EPP is balance_power on battery and balance_performance on USBC power. Could you please specify the USBC power status in your test? Thank you.
I never tried ectool thermalset … I just assumed that there was an intentional long averaging period on the cpu temp, but that assumption was totally wrong, it was basically ignoring the CPU temp (until 103°C) and waiting for something further away on the motherboard to heat up. You can get instant fan response, if you want it, by setting reasonable cpu fan-off / fan-max temperatures. Here’s what I’m using for now:
ectool thermalset 3 370 380 390 333 363
Translated from Kelvin to Celsius, that’s: fan_off=60 fan_max=90 (and warn=97 high=107 halt=117, but I’m not sure how much warn/high matter). Now my fan comes on 1 or 2 seconds after starting stress.
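To double-check a thermalset line before running it, a few lines of Python can do the Kelvin to Celsius translation. This is only a sketch: it assumes the argument order matches the thermalget header shown earlier (sensor, warn, high, halt, fan_off, fan_max) and uses the whole-number 273 offset that the conversions in this thread use. The helper name thermalset_to_celsius is made up here.

```python
FIELDS = ("warn", "high", "halt", "fan_off", "fan_max")


def thermalset_to_celsius(command):
    """Translate the Kelvin arguments of a thermalset line to Celsius.

    Expects: ectool thermalset <sensor> <warn> <high> <halt> <fan_off> <fan_max>
    (order assumed from the thermalget column header).
    """
    parts = command.split()
    sensor_id = int(parts[2])
    kelvin = [int(p) for p in parts[3:8]]
    # Whole-number offset of 273, as used for the conversions in this thread.
    celsius = {name: k - 273 for name, k in zip(FIELDS, kelvin)}
    return sensor_id, celsius


sensor, limits = thermalset_to_celsius("ectool thermalset 3 370 380 390 333 363")
print("sensor", sensor)
for name, value in limits.items():
    print(f"  {name} = {value} C")
```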
Good Analysis!
A few points. The fan behavior on the Framework 13 Ryzen is driven by an external thermal sensor which takes some time to respond versus the CPU. This is partially by design, as having the fan respond instantly to the CPU is not an ergonomic tradeoff that many people will want to make. The CPU temperature can go from 40C to 90+C and back to 40C in a second, and having the fan respond that quickly would be unpleasant to listen to.
There are several factors we take into the thermal design of the system including the fan responsiveness, the thermal mass of the heatpipe and spreader, and the acoustic behavior.
We further improved the fan behavior on the Framework 16 by introducing a virtual thermal sensor algorithm, which is essentially a low-pass FIR filter fed by the die temperature. This allows us to tune the fan responsiveness by adjusting the filter parameters, and to emulate placing a physical thermal sensor at different distances from the APU without having to lay out a new PCB. This solved a similar issue on the Framework 16 APU and GPU.
We are investigating back-porting this new technology to the Ryzen 13 platform.
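To illustrate the idea of a virtual sensor as a low-pass FIR filter over the die temperature, here is a minimal Python sketch. It is not Framework's implementation: the tap count, the equal weights (a plain moving average) and the sample values are all assumptions chosen to show the behaviour, and the class name VirtualSensor is hypothetical.

```python
from collections import deque


class VirtualSensor:
    """Low-pass FIR filter over die temperature samples (illustrative only)."""

    def __init__(self, taps):
        total = sum(taps)
        # Normalise the weights so a constant input gives the same output.
        self.taps = [t / total for t in taps]
        self.history = deque(maxlen=len(taps))

    def update(self, die_temp_c):
        if not self.history:
            # Pre-fill on the first sample so the filter starts settled.
            self.history.extend([die_temp_c] * len(self.taps))
        # Newest sample first; the oldest one drops off the other end.
        self.history.appendleft(die_temp_c)
        return sum(w * x for w, x in zip(self.taps, self.history))


# Eight equal taps; more taps behave like a sensor placed further away.
sensor = VirtualSensor([1] * 8)

# Hypothetical die temperatures: idle, a one-sample spike, then sustained load.
for temp in [40, 95, 40, 40] + [95] * 8:
    print(f"die {temp:3d} C -> virtual {sensor.update(temp):6.2f} C")
```

A one-sample spike to 95C only moves the virtual reading by a few degrees, while a sustained load brings it up to the die temperature once the window has filled. Longer or differently weighted taps shift that trade-off between responsiveness and noise.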
Interesting observation, thanks! I hadn’t thought about it. The experiments were all done while connected to the charger, so that would be on USBC power.
I didn’t capture the EPP, but having looked at it now it appears the EPP hint on my system is always performance, irrespective of PPD profile and AC power. That is assuming /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference is to be trusted. I’m inclined to believe this was also the case for when I ran the experiments as nothing much has changed since (other than a kernel update from 6.6.21 to 6.6.30).
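For comparing notes across distros and kernels, a short Python sketch like the following can dump the EPP hint for every core. It only reads the sysfs path quoted above; the function name read_epp and the sysfs_root parameter are made up here, the latter just to make the function easy to point at a different directory.

```python
import glob
import os


def read_epp(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: EPP hint} for every core that exposes one."""
    pattern = os.path.join(
        sysfs_root, "cpu[0-9]*", "cpufreq", "energy_performance_preference"
    )
    values = {}
    for path in sorted(glob.glob(pattern)):
        cpu = path.split(os.sep)[-3]  # e.g. "cpu0"
        with open(path) as handle:
            values[cpu] = handle.read().strip()
    return values


# Prints nothing on systems where the files do not exist.
for cpu, epp in read_epp().items():
    print(cpu, epp)
```

Running it once on battery and once on the charger, for each PPD profile, would show whether the hint actually changes on a given setup.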
Can you please clarify where you get the EPP hint from, if different from above? There may be differences in behaviour between distros and/or kernel versions. AMD keep making updates to the amd_pstate driver (which is good) and I only run the “longterm” line of kernel releases. This is the default on Gentoo and I CBA to compile a new kernel every time one pops up, so I’m not sure if any of the newer changes have been back-ported to the longterm 6.6.x line.
There’s a pretty good thread on tuning fan behaviour with ectool here. I did not link to it earlier as the original post is for the Intel series of the FW13, but there’s a good degree of overlap and it has a lot of useful information on the matter.
Thanks for the explanation, Kieran! What you’re saying makes perfect sense and is in line with the post I had linked, so good to know it wasn’t just hearsay. I did suspect that, if by design, there must have been a “comfort” element to it.
Another thing to note is that the AMD Ryzen 7000 Series (and 7040 Series) have relatively low thermal conductivity compared to other processors with the same performance, which means that the heat transfer from the CPU to the heat sink is slower. The result is a higher CPU temperature and a lower radiator temperature at a given load. Therefore, ramping up the fan immediately when the CPU reaches 90C does little, since the radiator is still at 40C. To mitigate the problem, AMD optimized the thermal “wall” of the CPUs so that instead of underclocking all the way to 0.4GHz as a form of safety protection, they underclock just enough to maintain 100C while the radiator catches up on dissipating the heat. In this case, undervolting is very beneficial as it not only saves energy but also improves performance by quite a bit.
Fedora 39, which has PPD preinstalled. I used this UI (on the top right of the screen), and if you cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference you get performance if (max RPM icon) Performance is selected and power if (min RPM icon) Power is selected. When (mid RPM icon) Balanced is selected, you get balance_performance on USBC and balance_power on battery.
The fan speed control temperature points are way too high. When the CPU is heating up, frequency throttling kicks in first, so the temperature rise slows down, and as a result the fan never speeds up or speeds up very, very late.
--sensor name -------- temperature -------- ratio (fan_off and fan_max) --
local_f75303@4d       329 K (= 56 C)          80% (313 K and 333 K)
cpu_f75303@4d         326 K (= 53 C)          65% (313 K and 333 K)
ddr_f75303@4d         321 K (= 48 C)          40% (313 K and 333 K)
cpu@4c                354 K (= 81 C)         100% (313 K and 353 K)
$ ectool thermalget
sensor  warn  high  halt  fan_off  fan_max  name
  0     343   353   393    313      343     local_f75303@4d
  1     343   353   393    319      327     cpu_f75303@4d
  2     343   353   393    401      401     ddr_f75303@4d
  3     381   388   400    376      378     cpu@4c
(all temps in degrees Kelvin)
This is the default setting.
When the temperature reaches Warn, the power is cut significantly. My interpretation is that the CPU can be used at 100C but other components cannot, so when sensors 0, 1 and 2 reach 343K something needs to be done to prevent overheating damage. I found that if you run
ectool thermalset 3 370 380 390 333 363
the power, frequency and fan speed go all over the place, as the CPU powers down when reaching 97C and then powers up again when below 97C, and the fan speed fluctuates as a result. If you stress all cores nothing will happen, but if you stress one core or half of the cores, the power will be cut when the hottest core reaches 97C even if the idling cores are only 60-ish Celsius (AMD only displays the hottest core, and the thermal conductivity between cores is worse than on Intel). After adjusting to
ectool thermalset 3 381 388 400 333 363
the CPU power goes back to default while the fan still works as intended (faster reaction).
Here’s the comparison using $ stress -c 4 -t 30