AMD FW13 fan behaviour and ramp-up times

Awesome write up.

Do you think adding average CPU frequency for all core loads and maximum frequency for single core load would be of any use?

I was tempted to make a script that takes care of everything and spits out a log. But honestly the behavior seems ok on my end, so I haven’t done anything else yet.

I must have forgotten to put up the results. Now updated.

The fan behaviour from a “fan curve” perspective is very similar (across the board I should say). The difference is with PPD set to ‘power-saver’, for all-core load the temps seem to be lower, settling around 70-72C with lower peak fan RPM. This is likely due to profile policy on CPU frequency. With PPD “disabled” the average CPU frequency would settle at around 3.3GHz in the above experiments, whereas with “power-saver” it would settle around 2.2GHz which would explain the difference in Tctl.

This does seem to be the case. I did not look too much into it and only added the link to the post for completion, rather than any criticism. There’s a lot of useful links in other threads that discuss the EC configuration and thermals, though most seem to be Intel related.

I did get the following readout from ectool thermalget:

sensor  warn  high  halt   fan_off fan_max   name
  0      343   353    393    313     343     local_f75303@4d
  1      343   353    393    319     327     cpu_f75303@4d
  2      343   353    393    401     401     ddr_f75303@4d
  3      381   388    400    376     378     cpu@4c
(all temps in degrees Kelvin)

[for cpu@4c: 376K ~ 103C; 378K ~ 105C]

Which is in line with the EC config and what you are saying.
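For anyone who would rather not convert these by hand, here is a minimal Python sketch that turns the default thresholds from the readout above into Celsius. The function and dictionary names are made up for illustration, and the 273 offset is chosen to match the whole-degree conversion ectool itself prints (e.g. 320 K = 47 C).

```python
# Convert ectool's whole-kelvin thresholds to Celsius.
# The 273 offset matches what ectool prints (320 K = 47 C).
def kelvin_to_celsius(kelvin: int) -> int:
    return kelvin - 273


# Default thresholds copied from the thermalget readout in this thread.
DEFAULTS = {
    "local_f75303@4d": {"warn": 343, "high": 353, "halt": 393, "fan_off": 313, "fan_max": 343},
    "cpu_f75303@4d": {"warn": 343, "high": 353, "halt": 393, "fan_off": 319, "fan_max": 327},
    "ddr_f75303@4d": {"warn": 343, "high": 353, "halt": 393, "fan_off": 401, "fan_max": 401},
    "cpu@4c": {"warn": 381, "high": 388, "halt": 400, "fan_off": 376, "fan_max": 378},
}


def thresholds_in_celsius(table):
    """Return the same table with every threshold converted to Celsius."""
    return {
        name: {key: kelvin_to_celsius(value) for key, value in row.items()}
        for name, row in table.items()
    }


for name, row in thresholds_in_celsius(DEFAULTS).items():
    print(name, row)
```

For cpu@4c this gives fan_off=103 and fan_max=105, matching the figures quoted above.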


I did capture the frequency data as well but I’ve not included it as there’s already a lot of quite comprehensive benchmarks on the actual performance of the FW13 AMD that are available and I wouldn’t do them justice. The overall performance is exactly what I would expect from this form factor and I’m generally extremely happy with it.

Instead, I wanted to keep the discussion specific to the fan behaviour as a reference point for people, especially for anyone who might think there might be an issue with the way their unit behaves in that regard.

But to answer your question, with the “performance” PPD profile for all-core loads (which I tend to do often) the average frequency is about ~3.6GHz sustained, and for single-core loads it’s ~4.8GHz sustained. This is in line with expectations and other, more thorough, benchmarks.

Ok, I agree that overall clocks should be similar.

But I think clocks on the graph might show people the downsides of the throttling, i.e. clocks might be double-digit percent lower while the fan is ramping up? Depending on cooling performance, I guess.

Since I switched to PTM7950 I don’t hit 100C anymore, so clocks seem rock stable even before fan ramp up. (Also about 50 MHz higher on average under full load.)

When doing single core (25W from USBC) I got this:

$ sudo ectool temps all
--sensor name -------- temperature -------- ratio (fan_off and fan_max) --
local_f75303@4d       320 K (= 47 C)          23% (313 K and 343 K)
cpu_f75303@4d         319 K (= 46 C)           0% (319 K and 327 K)
ddr_f75303@4d         311 K (= 38 C)        N/A (fan_off=401 K, fan_max=401 K)
cpu@4c                373 K (= 100 C)           0% (376 K and 378 K)

and multi core(43W from USBC) I got this.


$ sudo ectool temps all
--sensor name -------- temperature -------- ratio (fan_off and fan_max) --
local_f75303@4d       321 K (= 48 C)          26% (313 K and 343 K)
cpu_f75303@4d         325 K (= 52 C)          75% (319 K and 327 K)
ddr_f75303@4d         314 K (= 41 C)        N/A (fan_off=401 K, fan_max=401 K)
cpu@4c                364 K (= 91 C)           0% (376 K and 378 K)

On single core the CPU is hotter while the sensors are cooler. It looks like the thermal conductivity magically gets poorer on single core, which does not make any sense.
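As a side note on reading these readouts, the ratio column appears to be a simple linear interpolation between fan_off and fan_max. The Python sketch below reproduces the numbers shown above. It is inferred from the displayed values only (the integer truncation is a guess based on 321 K showing as 26%), not taken from the EC source, and the function name is made up.

```python
from typing import Optional


def fan_ratio_percent(temp_k: int, fan_off_k: int, fan_max_k: int) -> Optional[int]:
    """Position of temp_k between fan_off_k and fan_max_k, as a percentage.

    Returns None where the readout shows N/A (fan_off equal to fan_max).
    """
    if fan_off_k == fan_max_k:
        return None
    if temp_k <= fan_off_k:
        return 0
    if temp_k >= fan_max_k:
        return 100
    # Integer division is an assumption that matches the readouts in this thread.
    return 100 * (temp_k - fan_off_k) // (fan_max_k - fan_off_k)


# Values from the multi-core readout.
print(fan_ratio_percent(321, 313, 343))  # local_f75303@4d
print(fan_ratio_percent(325, 319, 327))  # cpu_f75303@4d
print(fan_ratio_percent(364, 376, 378))  # cpu@4c
```

With the default points, cpu@4c contributes 0% until it reaches 376 K (103 C), which is why only the board sensors drive the fan in both readouts.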

Maybe it could be of some use indeed. I might take a break from ‘graphing’ for a bit, but I’ll see if I can add it in with a secondary Y-axis on one side. It will make the graphs slightly less readable, but a different kind of plot would be a bit of an overkill for temporal data. All-core stats would be easy, but single core, which is probably more interesting, might require some pre-processing as the core on which the load runs tends to change since I didn’t make use of any core pinning.

In any case, my rationale for not including it is that while there will inevitably be an effect from throttling while the fan catches up, this should only affect all-core loads that run for less than the median time for the fan to reach peak RPM. For long, sustained all-core load this shouldn’t make a difference.

For single core loads it might be more subtle as there’s a lot more room for the fan to reach higher RPMs which could in turn, possibly, allow the CPU to boost higher. This is probably the more interesting case.

PTM7950 is really cool stuff, I might look into getting a pad to chop up and try out some time. It’s great to hear your experience with it as it seems to be making quite a change. I presume your CPU is also more comfortable with single core loads? This would be very impressive if so.

No, makes sense to me. With a single CPU generating the heat you get a point source of heat, with a longer path to the outside world, so more thermal resistance between the point source and the external sensor.

With multiple CPUs generating heat then you have multiple point sources each at the same temperature as the single CPU, but the distance to the external sensors from some of them may be shorter, so the external sensors will get hotter.

I don’t know how the CPUs are arrayed in the package, but is the single CPU (I presume this is the primary CPU that starts everything up initially and then masters which processors are being used) in the middle of the package cavity? This could potentially increase the thermal path to the external sensors.


It could be that the location of the different cores plays a part in thermal management. When stressing a single core with s-tui, the program seems to hop from one core to another from time to time, and there’s no “master” core, as the unloaded cores always clock lower than the loaded core. My guess is that “Tctl” reports the highest core temperature, but there’s no indication of which core is currently being measured.

It’s probably even more complicated than that. There are tons of temperature sensors on a modern CPU/GPU, but they usually only expose a few abstracted ones (like hottest overall, hottest of this core, [insert proprietary buzzword]-math over a few of them, or just a specific one).

I noticed that with or without USBC power also makes a difference, when PPD is balanced the pstate EPP is balance_power on battery and balance_performance on USBC power. Could you please specify the USBC power status in your test? Thank you


I never tried ectool thermalset … I just assumed that there was an intentional long averaging period on the cpu temp, but that assumption was totally wrong, it was basically ignoring the CPU temp (until 103°C) and waiting for something further away on the motherboard to heat up. You can get instant fan response, if you want it, by setting reasonable cpu fan-off / fan-max temperatures. Here’s what I’m using for now:

ectool thermalset 3 370 380 390 333 363

Translated from Kelvin to Celsius, that’s: fan_off=60 fan_max=90 (and warn=97 high=107 halt=117, but I’m not sure how much warn/high matter). Now my fan comes on within 1 or 2 seconds of starting stress.
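Going the other way (Celsius in, command out) can be sketched in a few lines of Python. This only builds the command string in the argument order used above (sensor id, warn, high, halt, fan_off, fan_max) and does not run anything. The helper names are hypothetical.

```python
def celsius_to_kelvin(celsius: int) -> int:
    # Whole-degree offset, matching the conversion ectool prints.
    return celsius + 273


def thermalset_command(sensor_id: int, warn_c: int, high_c: int, halt_c: int,
                       fan_off_c: int, fan_max_c: int) -> str:
    """Build an 'ectool thermalset' command line from Celsius thresholds."""
    kelvin_values = [
        celsius_to_kelvin(c) for c in (warn_c, high_c, halt_c, fan_off_c, fan_max_c)
    ]
    return "ectool thermalset " + " ".join(str(v) for v in [sensor_id, *kelvin_values])


# Reproduces the command above for sensor 3 (cpu@4c).
print(thermalset_command(3, warn_c=97, high_c=107, halt_c=117, fan_off_c=60, fan_max_c=90))
```

Printing the string first makes it easy to double-check the Kelvin values before handing anything to the EC.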


Good Analysis!
A few points. The fan behavior on the Framework 13 Ryzen is driven by an external thermal sensor which takes some time to respond versus the CPU. This is partially by design, as having the fan respond instantly to the CPU is not an ergonomic tradeoff that many people will want to make. The CPU temperature can go from 40->90+C->40C in a second, having the fan respond so quickly would be unpleasant to listen to.

There are several factors we take into the thermal design of the system including the fan responsiveness, the thermal mass of the heatpipe and spreader, and the acoustic behavior.

We further improved the fan behavior on the Framework 16 by introducing a virtual thermal sensor algorithm, which is essentially a low pass FIR filter fed by the die temperature. This allows us to tune the fan responsiveness by adjusting the filter parameters, emulating a physical thermal sensor placed at different distances from the APU without having to lay out a new PCB. This solved a similar issue on the Framework 16 APU and GPU.
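To illustrate the general idea only, here is a toy Python sketch of a low pass FIR filter in its simplest form, a moving average with equal coefficients. The class name, tap count and sample values are invented for the example and are not the actual Framework 16 filter parameters.

```python
from collections import deque


class VirtualSensor:
    """Moving-average FIR filter over the most recent die temperature samples."""

    def __init__(self, taps: int, initial_temp_c: float):
        if taps < 1:
            raise ValueError("taps must be at least 1")
        # Pre-fill the history so the output starts at the initial temperature.
        self.history = deque([initial_temp_c] * taps, maxlen=taps)

    def update(self, die_temp_c: float) -> float:
        """Feed one die temperature sample and return the filtered value."""
        self.history.append(die_temp_c)
        return sum(self.history) / len(self.history)


sensor = VirtualSensor(taps=8, initial_temp_c=40.0)
# A one-sample spike followed by a sustained load.
for sample in [90.0, 40.0, 40.0] + [90.0] * 8:
    print(f"die={sample:5.1f}  virtual={sensor.update(sample):5.2f}")
```

With eight taps, a single 90C sample only moves the filtered value from 40C to 46.25C, while eight consecutive 90C samples bring it all the way to 90C. More taps behave like a sensor placed further from the die, fewer taps like one placed closer.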

We are investigating back-porting this new technology to the Ryzen 13 platform.


After doing this the CPU only gets to 90C after a minute of 100% load. Effective

Interesting observation, thanks! I hadn’t thought about it. The experiments were all done while connected to the charger, so that would be on USBC power.

I didn’t capture the EPP, but having looked at it now it appears the EPP hint on my system is always performance, irrespective of PPD profile and AC power. That is assuming /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference is to be trusted. I’m inclined to believe this was also the case for when I ran the experiments as nothing much has changed since (other than a kernel update from 6.6.21 to 6.6.30).
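For anyone wanting to check their own system, here is a small Python sketch that reads the hint for every core from the sysfs path mentioned above and counts how many cores report each value. The function name is made up, and it simply returns an empty result if the files are missing (e.g. with a different cpufreq driver).

```python
import glob
from collections import Counter

# Path quoted in this thread; whether it exists depends on the cpufreq driver in use.
EPP_GLOB = "/sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference"


def read_epp(pattern: str = EPP_GLOB) -> dict:
    """Return a mapping of EPP value to the number of cores reporting it."""
    counts = Counter()
    for path in glob.glob(pattern):
        try:
            with open(path) as handle:
                counts[handle.read().strip()] += 1
        except OSError:
            # Skip entries that cannot be read.
            continue
    return dict(counts)


if __name__ == "__main__":
    print(read_epp())
```

Running it once on AC power and once on battery, for each PPD profile, would show whether the hint changes with the power source.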

Can you please clarify where you get the EPP hint from, if different from above? There may be differences in behaviour between distros and/or kernel versions. AMD keep making updates to the amd_pstate driver (which is good) and I only run the “longterm” line of kernel releases. This is the default on Gentoo and I CBA to compile a new kernel every time one pops up. So I’m not sure if any of the newer changes have been back-ported to the longterm 6.6.x line.

There’s a pretty good thread on tuning fan behaviour with ectool here. I did not link to it earlier as the original post is for the Intel series of the FW13, but there’s a good degree of overlap and has a lot of useful information on the matter.

Thanks for the explanation, Kieran! What you’re saying makes perfect sense and is in line with the post I had linked, so good to know it wasn’t just hearsay. I did suspect that, if by design, there must have been a “comfort” element to it.


Another thing to note is that the AMD Ryzen 7000 Series (and 7040 Series) have relatively low thermal conductivity compared to other processors with the same performance, which means that heat transfer from the CPU to the heat sink is slower. The result is a higher CPU temperature and a lower radiator temperature at a given load. Ramping up the fan immediately when the CPU reaches 90C therefore does little, since the radiator is still at 40C. To mitigate the problem, AMD optimized the thermal “wall” of the CPUs: instead of underclocking all the way to 0.4GHz as a form of safety protection, they underclock just enough to maintain 100C while the radiator catches up on dissipating the heat. In this case, undervolting is very beneficial as it not only saves energy but also improves performance by quite a bit.

image

Fedora 39, which has PPD preinstalled. I used this UI (at the top right of the screen). If you cat /sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference you get performance when (max RPM icon) Performance is selected and power when (min RPM icon) Power is selected. When (mid RPM icon) Balanced is selected, you get balance_performance on USBC and balance_power on battery.

The fan speed control temperature points are way too high, thus when CPU is heating up, frequency throttling kicks in first, therefore temperature rise slows down, therefore the fan never speeds up or speeds up very very late.

I set to these points

##                  ch warn high halt f_off f_max
## should be quite low
## unknown "local"       70   75   90    40    60
${ectool} thermalset 0  343  348  363   313   333
## CPU@4d ?              60   70   85    40    60              
${ectool} thermalset 1  333  343  358   313   333
## DDR@4D verylow        60   65   80    40    60
${ectool} thermalset 2  333  338  353   313   333
## CPU@4C often 80-90    90   95  110    40    80
${ectool} thermalset 3  363  368  383   313   353

Notes:

  • the fan should be at max by 80C at the latest; reaching that temperature means the CPU load is really, really high
  • DDR is not very heat resistant; temperatures above 60-70C will result in data errors
  • Channels 1 and 2 are not well understood, but their default temperature points seem way too high

after my temp points config.
7-zip (downloaded from web site, not from Debian repo, and much faster)

7840U and 96GB DDR5-5600

a run from a cool machine, approx 40 degC sensor readings:

% 7z b 8

7-Zip 24.08 (x64) : Copyright (c) 1999-2024 Igor Pavlov : 2024-08-11
 64-bit locale=en_US.UTF-8 Threads:16 OPEN_MAX:1024

Compiler:  ver:14.2.0 GCC 14.2.0 : SSE2
Linux : 6.10.11-amd64 : #1 SMP PREEMPT_DYNAMIC Debian 6.10.11-1 (2024-09-22) : x86_64
PageSize:4KB THP:always hwcap:2 hwcap2:2
AMD Ryzen 7 7840U w/ Radeon  780M Graphics
(A70F41) 

1T CPU Freq (MHz):  4539  4812  4695  4838  5030  5099  5107
8T CPU Freq (MHz): 792% 4828   792% 4854  
16T CPU Freq (MHz): 1335% 3972   1586% 4783  

RAM size:   92337 MB,  # CPU hardware threads:  16
RAM usage:   3559 MB,  # Benchmark threads:     16

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:      88208  1544   5556  85809  |     866831  1585   4665  73914
23:      75772  1522   5071  77203  |     770179  1576   4228  66626
24:      73088  1510   5203  78585  |     728693  1534   4169  63937
25:      71975  1514   5427  82178  |     706632  1513   4155  62869
22:      79798  1513   5131  77628  |     763827  1545   4215  65131
23:      74359  1499   5056  75764  |     756561  1560   4197  65448
24:      72263  1501   5175  77697  |     730948  1540   4165  64135
25:      71568  1516   5389  81715  |     711045  1529   4138  63262
22:      80043  1523   5112  77866  |     758626  1538   4206  64688
23:      75216  1509   5079  76637  |     757748  1564   4192  65551
24:      72303  1509   5152  77740  |     718334  1520   4146  63028
25:      71439  1515   5383  81567  |     696083  1501   4125  61931
22:      79336  1510   5112  77179  |     765934  1559   4188  65311
23:      75814  1517   5090  77245  |     768439  1578   4212  66475
24:      74313  1534   5209  79902  |     757895  1590   4182  66499
25:      70780  1502   5381  80814  |     729197  1570   4133  64877
22:      77399  1495   5038  75294  |     745900  1515   4197  63603
23:      75496  1518   5067  76922  |     758595  1573   4172  65624
24:      68814  1475   5015  73990  |     737877  1570   4125  64743
25:      72092  1532   5371  82313  |     708815  1549   4071  63064
22:      78572  1520   5027  76436  |     752961  1571   4087  64205
23:      73832  1508   4989  75227  |     740390  1577   4062  64049
24:      72157  1520   5104  77584  |     729466  1587   4034  64005
25:      70641  1520   5308  80656  |     715841  1590   4005  63689
22:      77780  1517   4988  75664  |     714286  1518   4012  60907
23:      74118  1525   4952  75518  |     726006  1571   3998  62805
24:      69919  1494   5033  75177  |     706880  1576   3934  62023
25:      69304  1524   5192  79129  |     697355  1588   3906  62044
22:      75681  1499   4913  73623  |     701399  1524   3924  59808
23:      72754  1523   4866  74128  |     709505  1556   3944  61377
24:      70632  1523   4987  75945  |     708841  1584   3928  62195
25:      69415  1523   5206  79256  |     693458  1579   3907  61697
----------------------------------  | ------------------------------
Avr:     74215  1514   5143  77887  |     735455  1557   4113  64048
Tot:            1536   4628  70967

another run from a hot machine, 80 degC CPU and 50 degC other sensor readings:

7z b 8
.
.
.
Avr:     71695  1506   5000  75279  |     711054  1568   3949  61927
Tot:            1537   4474  68603
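To put the two runs side by side, the drop from the cool run to the hot run can be worked out from the Avr and Tot ratings above. A quick Python sketch (names are made up, the figures are copied from the two runs):

```python
def percent_drop(cool: float, hot: float) -> float:
    """Relative drop from the cool-run rating to the hot-run rating, in percent."""
    return 100.0 * (cool - hot) / cool


# Ratings (MIPS) from the Avr and Tot lines of the two runs.
COOL = {"compress": 77887, "decompress": 64048, "total": 70967}
HOT = {"compress": 75279, "decompress": 61927, "total": 68603}

for key in COOL:
    print(f"{key}: {percent_drop(COOL[key], HOT[key]):.1f}% lower on the hot machine")
```

All three ratings come out roughly 3.3% lower on the hot machine.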

readings

--sensor name -------- temperature -------- ratio (fan_off and fan_max) --
local_f75303@4d       329 K (= 56 C)          80% (313 K and 333 K)
cpu_f75303@4d         326 K (= 53 C)          65% (313 K and 333 K)
ddr_f75303@4d         321 K (= 48 C)          40% (313 K and 333 K)
cpu@4c                354 K (= 81 C)         100% (313 K and 353 K)
$ ectool thermalget

sensor  warn  high  halt   fan_off fan_max   name
  0      343   353    393    313     343     local_f75303@4d
  1      343   353    393    319     327     cpu_f75303@4d
  2      343   353    393    401     401     ddr_f75303@4d
  3      381   388    400    376     378     cpu@4c
(all temps in degrees Kelvin)

This is the default setting.

When the temperature reaches Warn, the power is cut significantly. My interpretation is that the CPU can be used at 100C but other components cannot, so when sensors 0, 1 and 2 reach 343K something needs to be done to prevent overheat damage. I found that if you set

ectool thermalset 3 370 380 390 333 363

The power, frequency and fan speed will go all over the place, as the CPU powers down when reaching 97C, then powers up again when below 97C, and the fan speed fluctuates as a result. If you stress all cores nothing will happen, but if you stress one core or half of the cores, the power will be cut when the hottest core reaches 97C even if the idling cores are only 60-ish Celsius (AMD only displays the hottest core, and the thermal conductivity between cores is worse than on Intel). After adjusting to

ectool thermalset 3 381 388 400 333 363

The CPU power goes back to default while the fan still works as intended (faster reaction).
Here’s the comparison using $ stress -c 4 -t 30
