[RESPONDED] Thermal management on Linux with RAPL limits

I’ve been experimenting with power capping via Intel RAPL limits on my Framework (i5-1135G7, Ubuntu 22.04). I wasn’t happy with the CPU reaching 90C and the fan getting very loud whenever I start a multi-core workload, such as a multithreaded compile.

https://www.kernel.org/doc/html/latest/power/powercap/powercap.html

I stopped thermald so it wouldn’t change the RAPL settings. Then I ran multi-core and single-core benchmarks (pts/c-ray and pts/encode-flac in phoronix-test-suite) and tweaked the limits until I got a balance I liked.

I ended up with the settings:
Long term power limit: 14 watts / 10 seconds
Short term power limit: leave default (unlimited)
Peak power limit: 36 watts

On the multi-threaded benchmark, this lets the CPU boost all cores to 3.2GHz for a short time, after which it settles around 2.7-2.8GHz and never gets much above 60C.

On the single-threaded benchmark, it still lets the CPU hold max boost on one core at a time (around 4.0GHz on my CPU), and that core stays around 70C.

Your mileage may vary, so I’d recommend running some tests if you want to try this. You could set lower limits if you want more battery life.

$ sudo systemctl stop thermald
$ sudo powercap-set intel-rapl --zone=0 --constraint=0 -l 14000000 -s 10000000
$ sudo powercap-set intel-rapl --zone=0 --constraint=2 -l 36000000

(The units are microwatts and microseconds, so multiply watts or seconds by 1,000,000. On this CPU, zone 0 is the CPU package, constraint 0 is the long-term limit, constraint 1 is the short-term limit and constraint 2 is the peak limit. The short-term limit is set to a high value by default. I didn’t see much difference from setting either the peak or the short-term limit; the short-term limit has a very small time window by default.)
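Because powercap-set takes microwatts and microseconds, a small helper can do the multiplication. This is a minimal sketch: `to_micro` and `rapl_cmds` are hypothetical helper names (not part of powercap-utils), and the script only prints the commands, using the zone and constraint numbers described above for this CPU, so you can check them before running them with sudo.

```shell
#!/bin/sh
# Hypothetical helpers (not part of powercap-utils) for converting
# watts/seconds into the microwatt/microsecond values powercap-set expects.

# to_micro VALUE: multiply VALUE (decimals allowed) by 1,000,000 and round.
to_micro() {
    awk -v v="$1" 'BEGIN { printf "%.0f\n", v * 1000000 }'
}

# rapl_cmds LONG_W WINDOW_S PEAK_W: print (not run) the powercap-set commands
# for zone 0 (package), constraint 0 (long term) and constraint 2 (peak).
rapl_cmds() {
    echo "powercap-set intel-rapl --zone=0 --constraint=0 -l $(to_micro "$1") -s $(to_micro "$2")"
    echo "powercap-set intel-rapl --zone=0 --constraint=2 -l $(to_micro "$3")"
}

# The balanced settings: 14 W long term over 10 s, 36 W peak.
rapl_cmds 14 10 36
```

Running it prints the same two powercap-set commands as above, so a value such as 12.5 W can be entered without counting zeros.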

I wrote a systemd unit file to apply the settings at startup. I also wrote one for a battery saver mode with 10 W long-term / 20 W peak limits, but haven’t tested it much yet.

/etc/systemd/system/intel-rapl-balanced.service

[Unit]
Description=Set Intel RAPL power limits (balanced)
Conflicts=intel-rapl-powersave.service thermald.service

[Service]
Type=oneshot
ExecStart=/bin/sh -c "powercap-set intel-rapl -z 0 -c 0 -l 14000000 -s 10000000 && powercap-set intel-rapl -z 0 -c 2 -l 36000000"
ExecStop=/bin/sh -c "powercap-set intel-rapl -z 0 -c 0 -l 200000000 && powercap-set intel-rapl -z 0 -c 2 -l 121000000"
RemainAfterExit=yes

[Install]
WantedBy=sysinit.target

/etc/systemd/system/intel-rapl-powersave.service

[Unit]
Description=Set Intel RAPL power limits (powersave)
Conflicts=intel-rapl-balanced.service thermald.service

[Service]
Type=oneshot
ExecStart=/bin/sh -c "powercap-set intel-rapl -z 0 -c 0 -l 10000000 -s 10000000 && powercap-set intel-rapl -z 0 -c 2 -l 20000000"
ExecStop=/bin/sh -c "powercap-set intel-rapl -z 0 -c 0 -l 200000000 && powercap-set intel-rapl -z 0 -c 2 -l 121000000"
RemainAfterExit=yes

[Install]
WantedBy=sysinit.target

$ sudo systemctl daemon-reload
$ sudo systemctl enable intel-rapl-balanced
$ sudo systemctl start intel-rapl-balanced
$ sudo systemctl disable thermald

(Because both units have Conflicts= directives, starting one automatically stops the other one and thermald.)
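To confirm that the balanced profile actually took effect (for example after a reboot or a profile switch), the limits can be read back from the powercap sysfs files. A minimal sketch, assuming the package zone is `/sys/class/powercap/intel-rapl:0` (overridable through a hypothetical `RAPL_ROOT` variable); `check_limit` and `check_balanced` are hypothetical names. The time window is only printed, not compared, because RAPL can store only certain window values, so what reads back may differ slightly from the 10000000 that was written.

```shell
#!/bin/sh
# Assumed location of RAPL zone 0 (package); override RAPL_ROOT if needed.
RAPL_ROOT="${RAPL_ROOT:-/sys/class/powercap/intel-rapl:0}"

# check_limit FILE EXPECTED: compare one sysfs value with the expected number.
check_limit() {
    actual=$(cat "$RAPL_ROOT/$1" 2>/dev/null) || { echo "missing:  $RAPL_ROOT/$1"; return 1; }
    if [ "$actual" = "$2" ]; then
        echo "ok:       $1 = $actual"
    else
        echo "MISMATCH: $1 = $actual (expected $2)"
        return 1
    fi
}

# check_balanced: verify the limits set by intel-rapl-balanced.service.
check_balanced() {
    status=0
    check_limit constraint_0_power_limit_uw 14000000 || status=1
    check_limit constraint_2_power_limit_uw 36000000 || status=1
    # Time windows are rounded by the hardware, so just show the value.
    echo "info:     constraint_0_time_window_us = $(cat "$RAPL_ROOT/constraint_0_time_window_us" 2>/dev/null)"
    return $status
}
```

If saved as `check-rapl.sh` (any name works), `. ./check-rapl.sh && check_balanced` prints one line per limit and returns non-zero on a mismatch. If your kernel doesn’t let a normal user read these files, run it via `sudo sh -c '. ./check-rapl.sh && check_balanced'`.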

Good work. I’ll test it out when I can, though I’m pretty busy these days.

This might be the last piece of the puzzle I was looking for in my automagic battery vs. plugged-in power management strategy…

I have it set up to switch profiles on AC adapter connect/disconnect using acpid:

/etc/acpi/events/ac-power-on

event=ac_adapter ACPI0003:00 00000080 00000001
action=systemctl start intel-rapl-balanced

/etc/acpi/events/ac-power-off

event=ac_adapter ACPI0003:00 00000080 00000000
action=systemctl start intel-rapl-powersave

$ sudo systemctl restart acpid
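acpid only runs these actions when the adapter is plugged in or unplugged, so after a reboot the active profile is whichever unit is enabled, regardless of the current power source. One way to cover that is a small script that picks the unit from the current state. A sketch under assumptions: it scans `/sys/class/power_supply` for a supply of type `Mains` instead of hard-coding the adapter’s name, and `POWER_SUPPLY_ROOT`, `on_ac` and `profile_for_power_source` are hypothetical names.

```shell
#!/bin/sh
# Assumed sysfs location of power supplies; override for testing.
POWER_SUPPLY_ROOT="${POWER_SUPPLY_ROOT:-/sys/class/power_supply}"

# on_ac: succeed if any supply of type "Mains" reports online=1.
# The adapter's directory name varies between machines, so match on type.
on_ac() {
    for supply in "$POWER_SUPPLY_ROOT"/*; do
        [ -r "$supply/type" ] || continue
        if [ "$(cat "$supply/type")" = "Mains" ] &&
           [ "$(cat "$supply/online" 2>/dev/null)" = "1" ]; then
            return 0
        fi
    done
    return 1
}

# profile_for_power_source: print the RAPL unit matching the power source.
profile_for_power_source() {
    if on_ac; then
        echo intel-rapl-balanced.service
    else
        echo intel-rapl-powersave.service
    fi
}
```

It could be called as `systemctl start "$(profile_for_power_source)"` from a oneshot unit at boot. `acpi_listen`, which ships with acpid, prints raw events as they arrive and is a quick way to check that the `event=` strings match your machine.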

Thank you for this! I’ve been looking for an example of someone setting the configurable TDP on Linux.

Now that you have been using this for a while, is there a noticeable change in everyday system performance, or do you only see a change with the heavy multicore workloads?

I’m thinking that leveraging powercap is a much more sophisticated way to improve battery life on the Framework. Are you pairing this with powertop/tlp/power-profiles-daemon?

Thanks for the post! I tried the instructions from the first post and the third comment on Fedora 36 on my Framework Laptop.

  • Install powercap.
  • Set up the systemd services.
  • Set up acpid events.

Here is the command log: "Improving thermal management with Intel Running Average Power Limit (RAPL)" on the junaruga/framework-laptop-config wiki on GitHub. So far I haven’t heard the fan get loud again. That’s great.

Besides lower fan noise, I’m curious whether you’ve measured battery life improvements in your regular use (or are you doing any benchmarking?).

Unfortunately, I haven’t measured battery life. Right now, with one Firefox window and two terminals open (one running vim, one running powertop), powertop shows:

The battery reports a discharge rate of 6.92 W
The energy consumed was 151 J
The estimated remaining time is 7 hours, 31 minutes
The battery reports a discharge rate of 7.93 W
The energy consumed was 155 J
The estimated remaining time is 6 hours, 30 minutes

Before setting this up, the CPU temperature was often above 80 degrees; now it stays between about 40 and 60 degrees. That is great.

I’ve just checked the default RAPL values on my system and found one strange value:

korvin@sigma:~/work/research$ powercap-info intel-rapl -z 0 -c 0
Zone 0
  Constraint 0
    name: long_term
    power_limit_uw: 200000000
    time_window_us: 31981568
    max_power_uw: 28000000
korvin@sigma:~/work/research$ powercap-info intel-rapl -z 0 -c 1
Zone 0
  Constraint 1
    name: short_term
    power_limit_uw: 64000000
    time_window_us: 2440
    max_power_uw: 0
korvin@sigma:~/work/research$ powercap-info intel-rapl -z 0 -c 2
Zone 0
  Constraint 2
    name: peak_power
    power_limit_uw: 90000000
    time_window_us: 0
    max_power_uw: 0

Apparently the long-term power limit is set to 200000000 µW, which is 200 W. Obviously this is insanely huge, especially given that the other values seem reasonable (the same constraint reports a max_power_uw of only 28 W).

My guess is that the BIOS devs accidentally added an extra zero in the default config.

This matches my experience: the laptop first boosts to peak frequencies, then starts to settle down, but then shoots through the roof (apparently when the long-term limit kicks in).
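Converting the raw values into watts and seconds makes an outlier like this easier to spot. A minimal sketch that reads the same data powercap-info shows directly from sysfs, assuming zone 0 lives at `/sys/class/powercap/intel-rapl:0` (overridable through a hypothetical `RAPL_ROOT` variable); `read_or` and `show_constraints` are hypothetical names, and missing or unreadable files are shown as 0.

```shell
#!/bin/sh
# Assumed location of RAPL zone 0 (package); override RAPL_ROOT if needed.
RAPL_ROOT="${RAPL_ROOT:-/sys/class/powercap/intel-rapl:0}"

# read_or FILE FALLBACK: print the file's contents, or FALLBACK if unreadable.
read_or() {
    cat "$1" 2>/dev/null || echo "$2"
}

# show_constraints: one line per constraint, converted from uW/us to W/s.
show_constraints() {
    for name_file in "$RAPL_ROOT"/constraint_*_name; do
        [ -r "$name_file" ] || continue
        prefix=${name_file%_name}
        limit=$(read_or "${prefix}_power_limit_uw" 0)
        window=$(read_or "${prefix}_time_window_us" 0)
        max=$(read_or "${prefix}_max_power_uw" 0)
        awk -v n="$(cat "$name_file")" -v l="$limit" -v w="$window" -v m="$max" \
            'BEGIN { printf "%-10s limit %6.1f W  window %9.4f s  max %6.1f W\n", n, l / 1e6, w / 1e6, m / 1e6 }'
    done
}
```

With the values posted above, the long_term line would show a 200.0 W limit next to a 28.0 W max_power.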

If you believe you’ve captured a bug in the BIOS, please do open a ticket and let them know.