AMD FW13 fan behaviour and ramp-up times

There have been a few threads discussing cooling and fan behaviour, with a few posts mentioning sluggish fan ramp-up times, but none seemed appropriate for this discussion as they were all originally started as separate, more specific questions.

Over in the AMD BIOS 3.05 release thread a few people, myself included, appear to have noticed a seemingly sluggish fan behaviour on the FW13 AMD and there has been a short discussion, so I thought I’d share my findings.

Disclaimer: While I previously thought the fan was a bit sluggish to ramp up under sudden sustained CPU load, I never really ran any experiments until now. I admit I only started investigating this and paying close attention to the fan behaviour after updating to BIOS v3.05 and seeing a post that caught my attention and kicked off the discussion there. Maybe I was actively looking for regressions where there may have been none, and the observed behaviour likely has nothing to do with BIOS 3.05 specifically.

Anyway, I took inspiration from another member’s post on the matter and I got ectool compiled (this version supports AMD boards without the custom patch set for the kernel) to get the fan RPM readings and ran some experiments.

Experimental set up and methodology

  • no EC tweaking was ever done on this unit with ectool so EC behaviour is what would be considered “default”
  • Linux kernel 6.6.21 (Gentoo Linux)
  • amd_pstate: active (tends to be default for newer kernels and distros)
  • CPUFreq governor: powersave
    Default. Note that with amd_pstate=active the only other option is performance. powersave, in fact, behaves more like “on-demand” and is similar to powersave for intel_pstate. This does not limit the frequency boost. The schedutil governor, often the default elsewhere, is not available with amd_pstate=active.
  • all-core load generated with stress -c 17
  • single-core load generated with stress -c 1
  • fan RPM values read from: ectool pwmgetfanrpm
  • CPU temperature values read from k10temp kernel module, as reported by HWMon under /sys/class/hwmon
  • ambient temperature was about 19-20C
  • CPU would be allowed to cool down to around 33-34C between experiments, which seemed to be the most stable average when idle
  • variables considered:
    • PPD profile
    • Single/all core load
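For anyone wanting to reproduce the logging, here is a minimal Python sketch of a sampling loop. It is not the script used for the graphs in this post. It assumes that `ectool pwmgetfanrpm` prints lines of the form `Fan 0 RPM: 2500` and that k10temp exposes Tctl as `temp1_input` in millidegrees Celsius; all function names are made up for illustration.

```python
import glob
import os
import re
import subprocess
import time


def parse_fan_rpm(output):
    """Return the first RPM value in `ectool pwmgetfanrpm` output.

    Assumes lines like "Fan 0 RPM: 2500"; returns None if nothing matches.
    """
    match = re.search(r"RPM:\s*(\d+)", output)
    return int(match.group(1)) if match else None


def find_k10temp(hwmon_root="/sys/class/hwmon"):
    """Return the hwmon directory whose `name` file says k10temp, or None."""
    pattern = os.path.join(hwmon_root, "hwmon*", "name")
    for name_file in sorted(glob.glob(pattern)):
        with open(name_file) as f:
            if f.read().strip() == "k10temp":
                return os.path.dirname(name_file)
    return None


def read_tctl_c(hwmon_dir):
    """Read temp1_input (millidegrees C, assumed to be Tctl) as degrees C."""
    with open(os.path.join(hwmon_dir, "temp1_input")) as f:
        return int(f.read().strip()) / 1000.0


def log_samples(duration_s=360, interval_s=0.5):
    """Print elapsed time, Tctl and fan RPM as CSV. ectool needs root."""
    hwmon_dir = find_k10temp()
    if hwmon_dir is None:
        raise RuntimeError("k10temp hwmon device not found")
    start = time.monotonic()
    print("elapsed_s,tctl_c,fan_rpm")
    while True:
        elapsed = time.monotonic() - start
        if elapsed >= duration_s:
            break
        out = subprocess.run(["ectool", "pwmgetfanrpm"],
                             capture_output=True, text=True).stdout
        print(f"{elapsed:.1f},{read_tctl_c(hwmon_dir):.1f},{parse_fan_rpm(out)}")
        time.sleep(interval_s)
```

Calling `log_samples()` as root and redirecting the output to a file gives a CSV that can be plotted; the real sampling interval will be a little longer than 0.5 s because of the time ectool itself takes.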

Results

Notes on the graphs:

  • readings are taken approx every 0.5 sec (or 500 msec)
  • reported experiments are approx. 6 min long where:
    • CPU load starts at approx. 20 sec mark;
    • CPU load runs for approx 3:40 min in all charts;
    • ‘cooldown’ period after CPU load is approx. 2 min;

All-core Load Results




Single-core Load Results




Findings

For all-core load fan behaviour appears consistent with what was reported in the earlier post.

In the above graphs the fan took between 30 and 60 seconds, with temperatures already at their peak, to even start ramping up, with the “balanced” PPD profile having the ‘fastest’ response time at the lower end and PPD disabled having the ‘slowest’. The ‘balanced’ PPD profile did not reach RPM values as high as ‘performance’ did, but the average sustained all-core CPU frequency was also slightly lower, by about 100 MHz, and Tctl was also lower, likely due to PPD policy, so this is to be expected.

Single-core load, on the other hand, is more interesting. In all cases, the fan took about 60 sec to respond to the high CPU temperatures. Not only that, but the fan was sitting below 3000 RPM, well below its max RPM, while Tctl was 95-100C.

Conclusions

  • I’m less inclined to believe that there was any fan behaviour change with BIOS 3.05 but I have yet to roll back and run the experiments against 3.03.
  • The sluggish fan ramp up for all-core loads is certainly less than what I would consider ideal, but with the CPU throttling itself while the fan decides to wake up it doesn’t seem to be something I would be particularly concerned about.
  • Single-core loads (and perhaps loads on a few cores), however, make me wonder if there’s more room for cooling. Even if boost frequency remains at 4.8GHz (which was the case in my experiments), higher fan RPM might be able to bring down the temperature. With PPD set to performance, Tctl was pegged at 100C which is still “fine” as far as the hardware is concerned, but still rather toasty with plenty of RPMs to spare.

As an aside, it has been suggested that the fan is driven by a sensor that is physically outside of the APU, which may be contributing to the slower fan response time, i.e. that the above behaviour is expected ‘by design’. Not sure if it would be possible, but it does seem like there could be room for improvement here.

Anyway, I hope that all this helps answer some questions about the fan behaviour on the AMD FW13.

If I run any more experiments or have other findings I will share.


Edits:

  • Add results for “power-saver” PPD profile
  • Update incorrect title in all-core graph for power-saver PPD profile (stated “load cores: single” instead of “load cores: all”)

The fan controller does read the CPU temperature, in addition to two other sensors away from the CPU, and uses the highest % of all three for the actual PWM duty cycle. It is just that the CPU temperature threshold is set to 103C, so it is never taken into account, since the CPU switches to constant-temperature operation at 100C. To solve this, one could set the Tctl threshold to 90-95C (no instructions yet) to make use of the plentiful spare RPM.

What’s the difference between disabled and powersave?

Awesome write up.

Do you think adding average CPU frequency for all core loads and maximum frequency for single core load would be of any use?

I was tempted to make a script that takes care of everything and spit out a log. But honestly the behavior seems ok for me on my end so I haven’t done anything else yet.

I must have forgotten to put up the results. Now updated.

The fan behaviour from a “fan curve” perspective is very similar (across the board I should say). The difference is with PPD set to ‘power-saver’, for all-core load the temps seem to be lower, settling around 70-72C with lower peak fan RPM. This is likely due to profile policy on CPU frequency. With PPD “disabled” the average CPU frequency would settle at around 3.3GHz in the above experiments, whereas with “power-saver” it would settle around 2.2GHz which would explain the difference in Tctl.

This does seem to be the case. I did not look too much into it and only added the link to the post for completeness, rather than as any criticism. There are a lot of useful links in other threads that discuss the EC configuration and thermals, though most seem to be Intel related.

I did get the following readout from ectool thermalget:

sensor  warn  high  halt   fan_off fan_max   name
  0      343   353    393    313     343     local_f75303@4d
  1      343   353    393    319     327     cpu_f75303@4d
  2      343   353    393    401     401     ddr_f75303@4d
  3      381   388    400    376     378     cpu@4c
(all temps in degrees Kelvin)

[for cpu@4c: 376K ~ 103C; 378K ~ 105C]

Which is in line with the EC config and what you are saying.
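To read that table in Celsius, a small conversion sketch follows. It assumes the column layout shown in the readout and uses the integer offset of 273 that the bracketed note implies (376 K ~ 103 C); `parse_thermalget` is a hypothetical helper, not part of ectool.

```python
THERMALGET = """\
sensor  warn  high  halt   fan_off fan_max   name
  0      343   353    393    313     343     local_f75303@4d
  1      343   353    393    319     327     cpu_f75303@4d
  2      343   353    393    401     401     ddr_f75303@4d
  3      381   388    400    376     378     cpu@4c
"""


def k_to_c(kelvin):
    """Whole-degree conversion, matching 376 K ~ 103 C in the readout."""
    return kelvin - 273


def parse_thermalget(text):
    """Map sensor name -> {column: threshold in degrees C}."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = {}
    for line in lines[1:]:
        fields = line.split()
        # Skip anything that is not a data row (e.g. the trailing units note).
        if len(fields) != len(header) or not fields[0].isdigit():
            continue
        columns = zip(header[1:-1], fields[1:-1])
        rows[fields[-1]] = {col: k_to_c(int(val)) for col, val in columns}
    return rows


for name, limits in parse_thermalget(THERMALGET).items():
    print(f"{name:18s} fan_off={limits['fan_off']}C fan_max={limits['fan_max']}C")
```

The point of interest is the last row: cpu@4c only starts asking for fan at 103C, which Tctl never reaches because the CPU holds itself at 100C.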

I did capture the frequency data as well but I’ve not included it as there’s already a lot of quite comprehensive benchmarks on the actual performance of the FW13 AMD that are available and I wouldn’t do them justice. The overall performance is exactly what I would expect from this form factor and I’m generally extremely happy with it.

Instead, I wanted to keep the discussion specific to the fan behaviour as a reference point for people, especially for anyone who might think there might be an issue with the way their unit behaves in that regard.

But to answer your question, with the “performance” PPD profile for all-core loads (which I tend to do often) the average frequency is about ~3.6GHz sustained, and for single-core loads it’s ~4.8GHz sustained. This is in line with expectations and other, more thorough, benchmarks.

Ok, I agree that overall clocks should be similar.

But I think clocks on the graph might show people the downsides of the throttling, i.e. clocks might be double-digit percent lower while the fan is ramping up? Depending on cooling performance, I guess.

Since I switched to PTM7950 I don’t hit 100C anymore, so clocks seem rock stable even before fan ramp up. (Also about 50 MHz higher average under full load.)

When running a single-core load (25W from USBC) I got this:

$ sudo ectool temps all
--sensor name -------- temperature -------- ratio (fan_off and fan_max) --
local_f75303@4d       320 K (= 47 C)          23% (313 K and 343 K)
cpu_f75303@4d         319 K (= 46 C)           0% (319 K and 327 K)
ddr_f75303@4d         311 K (= 38 C)        N/A (fan_off=401 K, fan_max=401 K)
cpu@4c                373 K (= 100 C)           0% (376 K and 378 K)

and with a multi-core load (43W from USBC) I got this:


$ sudo ectool temps all
--sensor name -------- temperature -------- ratio (fan_off and fan_max) --
local_f75303@4d       321 K (= 48 C)          26% (313 K and 343 K)
cpu_f75303@4d         325 K (= 52 C)          75% (319 K and 327 K)
ddr_f75303@4d         314 K (= 41 C)        N/A (fan_off=401 K, fan_max=401 K)
cpu@4c                364 K (= 91 C)           0% (376 K and 378 K)

On single core the CPU is hotter while the other sensors are cooler. It looks as if the thermal conductivity magically gets poorer on single core, which does not make any sense.
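The ratio column in these two readouts, together with the “highest % of all sensors” rule described earlier in the thread, can be reproduced in a few lines of Python. This is only a sketch: the linear interpolation between fan_off and fan_max and the truncation to whole percent are assumptions that happen to match the numbers above, not a copy of the EC firmware.

```python
def fan_ratio(temp_k, fan_off_k, fan_max_k):
    """Percent of fan range requested by one sensor, or None for N/A."""
    if fan_max_k <= fan_off_k:
        return None  # e.g. ddr_f75303@4d, where fan_off == fan_max
    if temp_k <= fan_off_k:
        return 0
    if temp_k >= fan_max_k:
        return 100
    return 100 * (temp_k - fan_off_k) // (fan_max_k - fan_off_k)


def requested_duty(readings):
    """Highest ratio across all sensors that take part in fan control."""
    ratios = [fan_ratio(*values) for values in readings.values()]
    return max((r for r in ratios if r is not None), default=0)


# (temperature, fan_off, fan_max) in Kelvin, copied from the readouts above.
SINGLE_CORE = {
    "local_f75303@4d": (320, 313, 343),
    "cpu_f75303@4d": (319, 319, 327),
    "ddr_f75303@4d": (311, 401, 401),
    "cpu@4c": (373, 376, 378),
}
MULTI_CORE = {
    "local_f75303@4d": (321, 313, 343),
    "cpu_f75303@4d": (325, 319, 327),
    "ddr_f75303@4d": (314, 401, 401),
    "cpu@4c": (364, 376, 378),
}

print("single core:", requested_duty(SINGLE_CORE), "%")
print("multi core: ", requested_duty(MULTI_CORE), "%")
```

Under these assumptions the single-core case asks for only 23% even though the die is at 100C, because cpu@4c sits below its 376 K fan_off threshold and contributes nothing.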

Maybe it could be of some use indeed. I might take a break from ‘graphing’ for a bit, but I’ll see if I can add it in with a secondary Y-axis on one side. It will make the graphs slightly less readable, but a different kind of plot would be a bit of an overkill for temporal data. All-core stats would be easy, but single core, which is probably more interesting, might require some pre-processing as the core on which the load runs tends to change since I didn’t make use of any core pinning.

In any case, my rationale for not including it is that while there will inevitably be an effect from throttling while the fan catches up, this should only affect all-core loads that run for less than the median time for the fan to reach peak RPM. For long, sustained all-core load this shouldn’t make a difference.

For single core loads it might be more subtle as there’s a lot more room for the fan to reach higher RPMs which could in turn, possibly, allow the CPU to boost higher. This is probably the more interesting case.

PTM7950 is really cool stuff, I might look into getting a pad to chop up and try out some time. It’s great to hear your experience with it as it seems to be making quite a change. I presume your CPU is also more comfortable with single core loads? This would be very impressive if so.

No, makes sense to me. With a single CPU generating the heat you get a point source of heat, with a longer path to the outside world, so more thermal resistance between the point source and the external sensor.

With multiple CPUs generating heat then you have multiple point sources each at the same temperature as the single CPU, but the distance to the external sensors from some of them may be shorter, so the external sensors will get hotter.

I don’t know how the CPUs are arrayed in the package, but is the single CPU (I presume this is the primary CPU that starts everything up initially and then masters which processors are being used) in the middle of the package cavity? This could potentially increase the thermal path to the external sensors.

It could be that the location of the different cores plays a part in thermal management. When stressing a single core with s-tui, the load seems to hop from one core to another from time to time, and there’s no “master” core, as the unloaded cores always clock lower than the loaded core. My guess is that “Tctl” reports the highest core temperature, but there’s no indication of which core is currently being measured.

It’s probably even more complicated than that. There are tons of temperature sensors on a modern CPU/GPU, but they usually only expose a few abstracted ones (like hottest overall, hottest of this core, [insert proprietary buzzword]-math over a few of them, or just a specific one).

I noticed that being on USBC power or not also makes a difference: when PPD is balanced, the pstate EPP is balance_power on battery and balance_performance on USBC power. Could you please specify the USBC power status in your test? Thank you

I never tried ectool thermalset … I just assumed that there was an intentional long averaging period on the cpu temp, but that assumption was totally wrong, it was basically ignoring the CPU temp (until 103°C) and waiting for something further away on the motherboard to heat up. You can get instant fan response, if you want it, by setting reasonable cpu fan-off / fan-max temperatures. Here’s what I’m using for now:

ectool thermalset 3 370 380 390 333 363

Translated from Kelvin to Celsius, that’s: fan_off=60 fan_max=90 (and warn=97 high=107 halt=117 but I’m not sure how much warn/high matter). Now my fan comes on in 1 or 2 seconds after starting stress.
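If you would rather think in Celsius, a tiny helper along these lines builds the argument list. It is a sketch only: it assumes the argument order used in the command above (sensor id, warn, high, halt, fan_off, fan_max) and the whole-degree 273 offset. `thermalset_args` is a made-up name, and nothing here talks to the EC.

```python
def c_to_k(celsius):
    """Whole-degree conversion, matching the Kelvin values ectool shows."""
    return celsius + 273


def thermalset_args(sensor_id, warn_c, high_c, halt_c, fan_off_c, fan_max_c):
    """Build an `ectool thermalset` command line from Celsius thresholds."""
    temps_c = [warn_c, high_c, halt_c, fan_off_c, fan_max_c]
    return ["ectool", "thermalset", str(sensor_id)] + [str(c_to_k(t)) for t in temps_c]


# Prints: ectool thermalset 3 370 380 390 333 363
print(" ".join(thermalset_args(3, 97, 107, 117, 60, 90)))
```

Printing the command rather than running it leaves room to double check the halt value before anything is written to the EC.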

Good Analysis!
A few points. The fan behavior on the Framework 13 Ryzen is driven by an external thermal sensor which takes some time to respond versus the CPU. This is partially by design, as having the fan respond instantly to the CPU is not an ergonomic tradeoff that many people will want to make. The CPU temperature can go from 40C to 90+C and back to 40C in a second; having the fan respond that quickly would be unpleasant to listen to.

There are several factors we take into the thermal design of the system including the fan responsiveness, the thermal mass of the heatpipe and spreader, and the acoustic behavior.

We further improved the fan behavior on the Framework 16 by introducing a virtual thermal sensor algorithm, which is essentially a low-pass FIR filter fed by the die temperature. This lets us tune the fan responsiveness by adjusting the filter parameters, emulating a physical thermal sensor placed at different distances from the APU without having to lay out a new PCB. This solved a similar issue on the Framework 16 APU and GPU.
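To illustrate the idea of a virtual sensor built from a low-pass FIR filter, here is a minimal Python sketch. The filter taps and sample rate actually used on the Framework 16 are not given in this thread; the 8-tap moving average below is purely an assumption, and `VirtualSensor` is a made-up name.

```python
from collections import deque


class VirtualSensor:
    """FIR filter over die temperature: output = sum(tap[i] * sample[n - i])."""

    def __init__(self, taps):
        self.taps = list(taps)
        # history[0] is the newest sample; old samples fall off the far end.
        self.history = deque(maxlen=len(self.taps))

    def update(self, die_temp_c):
        if not self.history:
            # Pre-fill so the output starts at the first reading, not at zero.
            self.history.extend([die_temp_c] * len(self.taps))
        else:
            self.history.appendleft(die_temp_c)
        return sum(w * t for w, t in zip(self.taps, self.history))


sensor = VirtualSensor([1 / 8] * 8)  # assumed taps: plain moving average
trace = [40, 100, 40, 40, 100, 100, 100, 100, 100, 100, 100, 100]
for die_temp in trace:
    print(f"die={die_temp:3d}C  virtual={sensor.update(die_temp):.1f}C")
```

With these assumed taps, a one-sample spike to 100C only lifts the virtual reading from 40C to 47.5C, while a sustained load pulls it all the way up within eight samples. More taps (or a slower sample rate) would behave like a sensor placed further from the APU.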

We are investigating back-porting this new technology to the Ryzen 13 platform.

After doing this the CPU only gets to 90C after a minute of 100% load. Effective.

Interesting observation, thanks! I hadn’t thought about it. The experiments were all done while connected to the charger, so that would be on USBC power.

I didn’t capture the EPP, but having looked at it now it appears the EPP hint on my system is always performance, irrespective of PPD profile and AC power. That is assuming /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference is to be trusted. I’m inclined to believe this was also the case for when I ran the experiments as nothing much has changed since (other than a kernel update from 6.6.21 to 6.6.30).
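For comparing notes across systems, a short Python sketch that summarises the EPP hint over all cores is below. It only reads the sysfs path mentioned above; `read_epp` is a made-up helper name, and whether the file exists at all depends on the cpufreq driver in use.

```python
import glob
import os
from collections import Counter


def read_epp(cpu_root="/sys/devices/system/cpu"):
    """Return {epp_value: number_of_cores} from energy_performance_preference."""
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq",
                           "energy_performance_preference")
    counts = Counter()
    for path in glob.glob(pattern):
        with open(path) as f:
            counts[f.read().strip()] += 1
    return dict(counts)


# An empty dict means the driver does not expose an EPP hint.
print(read_epp())
```

Running it once on battery and once on AC for each PPD profile should show whether the hint actually changes on a given kernel and distro.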

Can you please clarify where you get the EPP hint from, if different from above? There may be differences in behaviour between distros and/or kernel versions. AMD keep making updates to the amd_pstate driver (which is good) and I only run the “longterm” line of kernel releases. This is the default on Gentoo and I CBA to compile a new kernel every time one pops up, so I’m not sure if any of the newer changes have been back-ported to the longterm 6.6.x line.

There’s a pretty good thread on tuning fan behaviour with ectool here. I did not link to it earlier as the original post is for the Intel series of the FW13, but there’s a good degree of overlap and has a lot of useful information on the matter.

Thanks for the explanation, Kieran! What you’re saying makes perfect sense and is in line with the post I had linked, so good to know it wasn’t just hearsay. I did suspect that, if by design, there must have been a “comfort” element to it.

Another thing to note is that the AMD Ryzen 7000 Series (and 7040 Series) has relatively low thermal conductivity compared to other processors with the same performance, which means that heat transfer from the CPU to the heat sink is slower. The result is a higher CPU temperature and a lower radiator temperature at a given load. Therefore, ramping up the fan immediately when the CPU reaches 90C does little, since the radiator is still at 40C. To mitigate the problem, AMD optimized the thermal “wall” of the CPUs so that, instead of underclocking all the way to 0.4GHz as a form of safety protection, they underclock just enough to maintain 100C while the radiator catches up on dissipating the heat. In this case, undervolting is very beneficial as it not only saves energy but also improves performance by quite a bit.