Thanks for sharing this! I spent a little time today playing with some system settings. I’m sad to say that I don’t have a complete answer to my original question of how well the powercap framework works, since I was a little perplexed by what turned out to be the tlp defaults. Those of you familiar with intel_pstate, CPUFreq, and powercap will already know everything that I report here, but novice Linux enthusiasts with a mere 10 years of usage like me may learn something.
TLDR: I activated the defaults in tlp without realizing that these defaults create strict frequency limits for every logical CPU. The good news is that I seem to have gotten powercap working on my machine. Because of some technicalities, I’m unable to provide a meaningful report on how much it improves upon standard tlp.
My stress test and some reading
The Phenomenon
I started out by installing stress-ng and trying to get baseline power statistics with powertop under both a single-core and a multicore stress test. To my surprise, I found that the power draw did not go above ~12 W under the 6-threaded test, and not above ~10 W on the single-core test. I found this baffling since I had not set a percentage limit on max performance, and I’m using the 1135G7, which has a TDP of 28 W.
I dug into my settings and confirmed that I had not set a max performance limit. Yet when I ran tlp-stat, it reported that a 50% performance limit was in effect.
# reported from tlp-stat
/sys/devices/system/cpu/intel_pstate/min_perf_pct = 9 [%]
/sys/devices/system/cpu/intel_pstate/max_perf_pct = 50 [%]
/sys/devices/system/cpu/intel_pstate/no_turbo = 0
/sys/devices/system/cpu/intel_pstate/turbo_pct = 47 [%]
/sys/devices/system/cpu/intel_pstate/num_pstates = 39
# i.e. the second line is equivalently this config:
CPU_MAX_PERF_ON_BAT=50
This seemed to explain why the frequency chart on powertop was not reporting above 1.8GHz for any core throughout either a single or multicore stress test.
Intel P-states
At this point, I wanted to know a bit more about what this intel_pstate thing is doing and how it differs from powercap. Here’s what I found:
…the CPUFreq core uses frequencies for identifying operating performance points of CPUs and frequencies are involved in the user space interface exposed by it, so intel_pstate maps its internal representation of P-states to frequencies too…
So pstate is a fancy version of capping CPU frequency—got it! Further:
Since the hardware P-state selection interface used by intel_pstate is available at the logical CPU level, the driver always works with individual CPUs. Consequently, if intel_pstate is in use, every CPUFreq policy object corresponds to one logical CPU and CPUFreq policies are effectively equivalent to CPUs.
Integrating this quote with what we saw previously, here is what I can surmise.
By default, tlp implements pstate-based CPU frequency limiting. Since this limit is applied to logical CPUs, every thread running on the Framework runs at a capped frequency. Consequently, turbo boost and even short-term bursts of performance are effectively unavailable on battery.
Of course, this does provide power advantages, but at the same time it would be nice if I could let one or two threads go above their pstate limit if it still meant having a relatively low power consumption. This means pesky user applications which use way too many CPU cycles (cough Zoom cough) could continue their decadent reign without impacting the user experience.
Researching Powercap
I found this page on the Linux kernel powercap implementation:
[In an example given on the page, t]here is one control type called intel-rapl which contains two power zones, intel-rapl:0 and intel-rapl:1, representing CPU packages. Each of these power zones contains two subzones, intel-rapl:j:0 and intel-rapl:j:1 (j = 0, 1), representing the “core” and the “uncore” parts of the given CPU package, respectively. All of the zones and subzones contain energy monitoring attributes (energy_uj, max_energy_range_uj) and constraint attributes (constraint_*) allowing controls to be applied (the constraints in the ‘package’ power zones apply to the whole CPU packages and the subzone constraints only apply to the respective parts of the given package individually).
From this quote, we may infer that constraints in the powercap framework are applied to the entirety of the CPU. Further, these constraints are a little more dynamic, since they apply to running averages over time rather than instantaneous limits like pstates:
Depending on different power zones, the Intel RAPL technology allows one or multiple constraints like short term, long term and peak power, with different time windows to be applied to each power zone.
So it seems like powercap would allow me to set a similar limit on power usage, e.g. the 12 W that I noticed on multicore workloads, without limiting performance on single-core workloads. For example, there should be no stuttering when launching a big app, container, or VM, since these might scale up and down quickly without triggering a powercap constraint. In terms of my benchmarks, I would expect this to allow the single-core stress test to go beyond its 1.8GHz limit under tlp while still limiting the multicore stress test.
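Since the constraints apply to running averages, it helps to be able to compute the same kind of average from the energy_uj counter mentioned in the quote above. Here is a minimal sketch, assuming the package zone lives at /sys/class/powercap/intel-rapl:0 (check your own machine) and that the counter does not wrap during the sample. The function names are my own, and reading energy_uj usually needs root on recent kernels.

```shell
#!/bin/sh
# Average power in milliwatts from two energy_uj readings taken dt_ms apart.
# Microjoules divided by milliseconds gives milliwatts.
# Assumes the counter did not wrap between readings (see max_energy_range_uj).
avg_power_mw() {
    e_start_uj=$1
    e_end_uj=$2
    dt_ms=$3
    echo $(( (e_end_uj - e_start_uj) / dt_ms ))
}

# Sample a power zone over 10 seconds and print its average draw in mW.
# The default path is an assumption: the package zone is often intel-rapl:0.
sample_package_power() {
    zone=${1:-/sys/class/powercap/intel-rapl:0}
    e1=$(cat "$zone/energy_uj") || return 1
    sleep 10
    e2=$(cat "$zone/energy_uj") || return 1
    avg_power_mw "$e1" "$e2" 10000
}
```

Running sample_package_power as root while the stress test is going should give a number that is directly comparable to a 10-second powercap constraint.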
What I tried
Without messing with the power-prioritizing governor, I tried simply removing the limit on pstate by adding:
CPU_MAX_PERF_ON_BAT=100
Then, I set a TDP target with the powercap-set command:
sudo systemctl stop thermald
sudo powercap-set -p intel-rapl --zone=0 --constraint=0 -l 14000000 -s 10000000
# I had to add this to enable the "core" components of the CPU package
sudo powercap-set -p intel-rapl --zone=0:0 -e 1
I’m essentially using what was reported in the other thread on powercap, linked in my original post. I liked the idea of pinning to a 14 W target as the 10-second running average (the -l and -s values above are in microwatts and microseconds), since this was close to what I got under the default tlp setting of limiting the pstate, and @jbch settled on it after their own testing. It seems like powercap can go all the way down to 12 W based on the Intel specs for the i5 1135G7, although I haven’t tried going lower than 14 W. I verified that the system accepted these settings with sudo powercap-info -p intel-rapl.
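The same check can be done straight from sysfs, without the powercap-utils tools. This is a rough sketch that assumes the package zone is /sys/class/powercap/intel-rapl:0 and relies on the constraint_*_name, constraint_*_power_limit_uw and constraint_*_time_window_us attributes described in the kernel powercap docs. The function name is made up for illustration.

```shell
#!/bin/sh
# Print the name, power limit and time window of each constraint in a zone.
# Limits are stored in microwatts and windows in microseconds; they are
# printed here as milliwatts and milliseconds (integer division).
show_rapl_constraints() {
    zone=${1:-/sys/class/powercap/intel-rapl:0}
    for name_file in "$zone"/constraint_*_name; do
        [ -r "$name_file" ] || continue
        prefix=${name_file%_name}
        name=$(cat "$name_file")
        limit_uw=$(cat "${prefix}_power_limit_uw")
        window_us=$(cat "${prefix}_time_window_us")
        echo "$name: $((limit_uw / 1000)) mW over $((window_us / 1000)) ms"
    done
}
```

On a machine with the settings above, I would expect the long-term constraint to show roughly 14000 mW, with a window close to (but possibly not exactly) 10000 ms, since the hardware may round the window.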
I also played a bit with the energy policy while using the powercap setting:
CPU_ENERGY_PERF_POLICY_ON_BAT=balance_performance
Interestingly, it did seem to have an impact on the metrics for the benchmark, even though the power draw was still limited by powercap. I.e., it seems like powercap might be “smarter” than this energy policy. More tests might be needed to confirm this, though.
What I haven’t done
I haven’t created the systemd scripts to make this setting stick, nor have I verified my conjectures about single-threaded workloads. I’d have to figure out how to get the stress test to “pin” to a specified core, and that’s just a bridge too far for one evening. For the moment, I’m going to give the laptop back to my wife without messing up her battery life, which is already completely amazing to her as it is!
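You might have noticed turbo_pct = 47 in the tlp-stat output above. As far as I can tell from the intel_pstate documentation, this is a read-only attribute describing how much of the supported P-state range is turbo, not a limit that tlp sets, so there is nothing to uncap there. With max_perf_pct back at 100 and no_turbo = 0, the turbo range should already be available on battery.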
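For the core pinning mentioned above, one option I have not tried yet is taskset from util-linux, which restricts a command to a given logical CPU. A minimal sketch follows; run_pinned is a hypothetical helper name, and the stress-ng arguments in the comment are just one plausible single-core test.

```shell
#!/bin/sh
# Run a command restricted to one logical CPU via taskset (util-linux).
# Usage: run_pinned <cpu-number> <command> [args...]
run_pinned() {
    cpu=$1
    shift
    taskset -c "$cpu" "$@"
}

# Example (not run here): one CPU stressor pinned to logical CPU 0 for 60 s.
# run_pinned 0 stress-ng --cpu 1 --timeout 60s
```

Pinning keeps the scheduler from migrating the stressor between cores, which should make the per-core frequency readout in powertop easier to interpret.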
Conclusion
I hope this helps someone out there who is tinkering with their system, or who is trying to figure out how to use Intel’s “TDP-down” configuration in Linux. Let me know if my understanding is wrong, since I may have missed something in my sleuthing! I welcome any thoughts about how to measure CPU frequency under stress testing.
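On measuring CPU frequency, one simple approach is to poll the scaling_cur_freq attribute that CPUFreq exposes for each logical CPU. This is a sketch under the assumption that your kernel exposes /sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq (values in kHz); print_cpu_freqs is a name I made up.

```shell
#!/bin/sh
# Print the current frequency of each logical CPU in MHz.
# scaling_cur_freq is reported in kHz, so divide by 1000.
print_cpu_freqs() {
    base=${1:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        [ -r "$f" ] || continue
        cpu=${f#"$base"/}   # strip the base directory
        cpu=${cpu%%/*}      # keep only the cpuN component
        khz=$(cat "$f")
        echo "$cpu $((khz / 1000)) MHz"
    done
}
```

Something like `while sleep 1; do print_cpu_freqs; done` in a second terminal during the stress test would show whether any core gets past the 1.8GHz ceiling.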