[Guide] FW13 Ryzen Power Management

A.Foss · June 5, 2024, 8:41pm

oh okay, gotcha. Yeah in those situations I’d likely get it plugged in. Okay thanks for your explanation!

Shijikori · June 7, 2024, 8:11pm

Latest discussions suggest it’s not the decoder specifically but the process of decoding and displaying video. The wrong hardware path might be used for scaling the video, even though this GPU includes efficient hardware for that purpose that should be used instead. That may take a while to sort out, but I’ve personally seen improvements since I got my Framework. Now, on the power save profile, I get 7-8W power consumption when watching video. This is an improvement over the 12W I had when I got the device.

Adrian_Joachim · June 7, 2024, 8:35pm

That has been discussed but no one has been able to actually demonstrate it.

Meanwhile, all the actual tests I have done/seen so far point towards the decoder using the bulk of the power. I did testing without any scaling or display as well as regular playback testing, and in both cases the hw decoder is using more power than even sw decoding under 1080p 30. In the case of playback testing both sw and hw decoding have to use sw scaling, so it should hurt both more or less equally, and the main power difference is still the hw decoder.

There have been some tests indicating hw scaling does reduce playback power a bit but nowhere near enough.

I would love if somebody would back up those “it’s all the scaling and presentation and stuff” claims with a demo or some testing or anything. It doesn’t need to be anything fancy, hell, it doesn’t even need to be useful. It could just be a custom binary spitting out a video directly to the display, or even just decoding a 1080p 30 video into null for a non-ridiculous amount of power.

It’s still way too much.

Mario_Limonciello · June 7, 2024, 9:02pm

That has been discussed but no one has been able to actually demonstrate it.

We discussed this at the Display Next Hackfest this year, and Leo spent cycles evangelizing the strategy of underlay planes (like a hole punch) for compositors to set up. Two of the compositors (Weston and Cosmic) have already done the work to get MPO working.

There are (at least) two patches that were part of this week’s display promotion you need that change how cursors work:
Patch 40
Patch 41

The kernel patches will be part of kernel 6.11. From my discussion with Leo, those patches show video offload savings when paired with Weston. They show savings with Cosmic as well, but some other bugs occur (pink flickering in the compositor and some driver bugs being triggered on amdgpu).

I would love if somebody would back up those “it’s all the scaling and presentation and stuff” claims with a demo or some testing or anything.

I believe if you want to replicate the savings I’m describing, you should be able to try it with the tip of the Weston tree, mpv and a kernel with those patches.

Shijikori · June 7, 2024, 10:10pm

Not to be pedantic. I don’t disagree that it’s a lot more than necessary, but it is an improvement nonetheless.

Adrian_Joachim · June 7, 2024, 10:25pm

Saw that demo, pretty cool and I do hope we get that soon, however that was done on a rockchip platform that can hw decode into the void for well under 1W already.

That stuff is a little above my skill level, but I am not sure how valuable that is before decode-only for non-ridiculous amounts of power can be demonstrated.

Definitely and I do hope those improvements keep coming.

Shijikori · June 7, 2024, 10:53pm

Mario, do you think it can be tested with ffmpeg alone? Using VAAPI and simply decoding to null, then decoding to null with a scaling filter in the process, to show the difference? Or does ffmpeg make use of those hardware features and therefore wouldn’t show the issue?

Mario_Limonciello · June 7, 2024, 11:57pm

I believe this specific experiment you suggested has been tried unsuccessfully in the past.

I don’t really know how null output works to know if this is a valid experiment that won’t light up other IP. The important parts are the APU is in GFXOFF and CPU cores are idle.

I would guess with a compositor running and the cursor not on its own plane that you can’t really have GFXOFF even during such an experiment.

Mario_Limonciello · June 8, 2024, 5:23am

I just did an experiment on my (unpatched) system using Big Buck Bunny (file name bbb_sunflower_2160p_60fps_normal.mp4) on KWin running CachyOS w/ kernel 6.9.

First off null output:

❯ sudo perf stat -e power/energy-pkg/ mpv bbb_sunflower_2160p_60fps_normal.mp4 -vo null --start=10 --end=20 --fullscreen
182.23 Joules power/energy-pkg/

Now null with hwdec

❯ sudo perf stat -e power/energy-pkg/ mpv bbb_sunflower_2160p_60fps_normal.mp4 -vo null -hwdec=vaapi --start=10 --end=20 --fullscreen
182.11 Joules power/energy-pkg/

And then lastly with letting the compositor use it:

❯ sudo perf stat -e power/energy-pkg/ mpv bbb_sunflower_2160p_60fps_normal.mp4 -vo gpu -hwdec=vaapi --start=10 --end=20 --fullscreen
164.80 Joules power/energy-pkg/

From those 3 comparisons, doesn’t that show that VAAPI is using “less” power than not using it?
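
To put those three readings side by side, here is a small Python sketch that computes how much package energy each run saved relative to the software-decode null run. The Joule values are the ones posted above; the labels and the helper name `relative_saving` are made up for illustration. It compares energy only, since the elapsed time perf reported was not included in the output.

```python
# Package energy (Joules) reported by `perf stat -e power/energy-pkg/` above.
# The dictionary keys are illustrative labels, not mpv option names.
runs = {
    "sw decode, vo=null": 182.23,
    "vaapi, vo=null": 182.11,
    "vaapi, vo=gpu": 164.80,
}


def relative_saving(baseline_j: float, measured_j: float) -> float:
    """Return the percentage of energy saved compared with the baseline."""
    return (baseline_j - measured_j) / baseline_j * 100.0


baseline = runs["sw decode, vo=null"]
savings = {name: relative_saving(baseline, joules) for name, joules in runs.items()}

for name, pct in savings.items():
    print(f"{name:20s} {runs[name]:7.2f} J  {pct:5.2f}% saved")
```

The two null-output runs differ by well under 0.1%, which is plausibly within run-to-run noise, while the -vo gpu run comes out roughly 9.6% lower than the baseline.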

Adrian_Joachim · June 10, 2024, 6:14am

Yeah above around 1080p30 that was usually the case, though still using a lot more power than you’d expect. In my testing at 4k 60 hw acceleration does significantly better than sw but that test-setup is also using ffmpeg or kodi and measuring on battery discharge. But around 4k 60 the power consumption of sw decoding starts really going off the rails whereas hw decoding is somewhat linear.

Mario_Limonciello · June 10, 2024, 12:07pm

Hmm, so doing a RAPL measurement means you get all the SoC usage not just the decoder.
One thing worth ruling out is whether it’s actually because of the active EPP policy and scheduler use.

Just today some patches were posted that expose the PMU counters on a core-by-core basis.

https://lore.kernel.org/lkml/20240610100751.4855-1-Dhananjay.Ugwekar@amd.com/
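
As a rough illustration of what a package-level RAPL measurement looks like outside of perf, here is a Python sketch that samples the powercap energy counter twice and converts the difference to average watts. The sysfs path is an assumption (the node name can differ between kernels and platforms, and reading it usually needs root), and `average_watts` / `sample_package_power` are hypothetical helper names. Like the perf numbers, this covers the whole package, not just the decoder.

```python
import time
from pathlib import Path

# Assumption: the powercap RAPL package domain is exposed at this path.
# The exact node can differ between kernels and platforms.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def average_watts(start_uj: int, end_uj: int, seconds: float, max_range_uj: int) -> float:
    """Convert two energy counter readings (microjoules) into average watts.

    The counter wraps around at max_range_uj, so a smaller end value
    is treated as a single wraparound.
    """
    delta_uj = end_uj - start_uj
    if delta_uj < 0:
        delta_uj += max_range_uj
    return delta_uj / 1_000_000 / seconds


def sample_package_power(seconds: float = 5.0, domain: Path = RAPL_DOMAIN) -> float:
    """Sample the package energy counter over an interval (usually needs root)."""
    max_range = int((domain / "max_energy_range_uj").read_text())
    start = int((domain / "energy_uj").read_text())
    time.sleep(seconds)
    end = int((domain / "energy_uj").read_text())
    return average_watts(start, end, seconds, max_range)
```

Calling `sample_package_power(10)` while a video plays, and again at idle, gives two package-level averages to compare. Everything else running on the SoC is included in both.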

pierce · June 16, 2024, 5:25pm

This seems to support a vague theory I have, which is that testing with -vo null (or similar with ffmpeg) still copies the decoded video back to the user-space process, before that process just discards it (ignores it, overwrites it, whatever). But the ideal for power saving is that the decoded video does not go anywhere after the GPU, the GPU composites it into the image going out to the screen completely on its own.

That’s what the hardware-plane management work in the driver and compositors is trying to achieve. The fact that -vo null seems to use more power than -vo gpu suggests that -vo null is doing the thing we explicitly want to avoid: the copy-back to userspace. Meanwhile, -vo gpu could potentially do significantly better if the GPU hardware was managed just right by the driver and compositor. (The Intel driver manages to do better because of differently structured hardware features, right? But I’m way out of my area of expertise here.)
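
To give a sense of scale for that copy-back, here is a back-of-the-envelope Python sketch of how much decoded data a 2160p60 stream produces per second. It assumes 8-bit 4:2:0 frames (1.5 bytes per pixel, NV12-style), which the thread does not state, and it says nothing about whether mpv actually performs the copy; that part remains the hypothesis above.

```python
def decoded_stream_bytes_per_second(width: int, height: int, fps: int,
                                    bytes_per_pixel: float = 1.5) -> float:
    """Uncompressed size of a decoded video stream per second of playback.

    bytes_per_pixel=1.5 assumes 8-bit 4:2:0 (NV12-style) frames; this is an
    assumption, the thread does not state the surface format.
    """
    return width * height * bytes_per_pixel * fps


# The 2160p60 Big Buck Bunny clip used in the experiment above.
rate = decoded_stream_bytes_per_second(3840, 2160, 60)
print(f"{rate / 1e6:.1f} MB/s of decoded frames")
```

Under that assumption the decoded stream is about 746.5 MB/s, so any path that moves every frame through system memory to a userspace process has a non-trivial amount of data to shuffle.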

Mario_Limonciello · June 16, 2024, 5:58pm

Yup I agree with your hypothesis.

Wade · June 19, 2024, 12:36am

That’s nuts. I’m just sitting here doing nothing, with a browser open and I’m at 8W. Starting Plex playback jumps it to 11.2W.

Shijikori · June 19, 2024, 1:27am

That’s on BIOS 3.05, Arch Linux, using Firefox usually. Also using PPD on KDE, with the power profile set to power save as mentioned. This seems to work well for me. Make sure it’s actually controlling the EPP profile with powerprofilesctl (iirc).

Wade · June 19, 2024, 9:51am

Fascinating, I’ve got almost the exact same setup, except I’m using sway instead of KDE (which, naively, I would think would be better or the same). I did notice looking at powertop that pipewire uses 1W constantly (even when nothing is playing), so maybe that’s a good first thing to run to ground.

ppd looks right to me:

* power-saver:
    CpuDriver:  amd_pstate
    PlatformDriver:     platform_profile

Edit: I guess this is actually not right (wrt PPD), it should be amd_pstate_epp? Do you use active/passive/guided?

Edit 2: it appears to only show up in /sys/devices/ and not in ppd, but with active I do see a significant improvement, idling around 6W, 7W with firefox open, and hitting 9-10W on playback. @Mario_Limonciello perhaps you should include setting amd_pstate=active in the guide as well?

I do still see a few sus things in powertop, like pipewire always using 1W even with no audio playback.

Mario_Limonciello · June 19, 2024, 4:35pm

Active (EPP) is the default upstream policy. You can change it in the kernel if you want to, but PPD won’t load the AMD pstate “driver” unless you have the kernel set to active.
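
For anyone wanting to confirm which mode they are in without relying on PPD’s output, here is a rough Python sketch that reads the relevant sysfs nodes directly. The helper name `read_pstate_settings` is made up, and the sketch assumes a kernel recent enough to expose `amd_pstate/status`; missing nodes are simply reported as None.

```python
from pathlib import Path


def read_pstate_settings(sysfs_root: Path = Path("/sys/devices/system/cpu")) -> dict:
    """Read the amd_pstate mode and per-policy EPP values from sysfs.

    Missing files are reported as None rather than raising, since these
    nodes only exist when the amd_pstate driver is loaded.
    """
    def read(path: Path):
        try:
            return path.read_text().strip()
        except OSError:
            return None

    # One EPP value per cpufreq policy, keyed by the policy directory name.
    epp = {}
    for node in sorted(sysfs_root.glob("cpufreq/policy*/energy_performance_preference")):
        epp[node.parent.name] = read(node)

    return {
        "amd_pstate_status": read(sysfs_root / "amd_pstate" / "status"),
        "scaling_driver": read(sysfs_root / "cpufreq" / "policy0" / "scaling_driver"),
        "epp": epp,
    }
```

Running `print(read_pstate_settings())` before and after switching the PPD profile should show whether the EPP values actually change with it.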

What terminal are you using? Some terminals (like WARP) use the graphics engine and will prevent low idle.

You should also try something besides sway in case sway is causing your issue.

Wade · June 19, 2024, 7:13pm

Understood, thanks.

I’m using wezterm, which generally seems good when doing nothing.

So, I believe the issue with pipewire was two-fold after digging in.

1. I was using waybar-cava, which apparently burns power for no reason (as of now I no longer use it). Normal waybar is fine.
2. The Discord tab in Firefox was somehow using pipewire resources. I have no idea how this used so much power without audio on…

Thanks for your help. I guess the takeaway is it’s always worth checking powertop every so often to make sure you’re not footgunning your battery life.

Mario_Limonciello · July 10, 2024, 11:28pm

All;

Try this series for video playback.

https://lore.kernel.org/amd-gfx/20240710202952.188573-1-boyuan.zhang@amd.com/T/#m349310747500886c5d42cccafc5765cb063346a6

TheStachelfisch · July 10, 2024, 11:50pm

Do you have a direct patch file link?
I can’t find a way to get one directly from the lore entry.