[Guide] FW13 Ryzen Power Management

oh okay, gotcha. Yeah in those situations I’d likely get it plugged in. Okay thanks for your explanation!

Latest discussions suggest it’s not the decoder specifically but the whole process of decoding and displaying video. The wrong hardware functions might be used for scaling the video, even though this GPU includes efficient hardware for that purpose which needs to be used. That may take a while to sort out, but I’ve personally seen improvements since I got my Framework. Now, on the power save profile, I get 7-8W power consumption when watching video. This is an improvement over the 12W I had when I got the device.

That has been discussed but no one has been able to actually demonstrate it.

Meanwhile all the actual tests I have done/seen so far point towards the decoder using the bulk of the power. I did testing without any scaling or display, as well as regular playback testing, and in both cases the hw decoder is using more power than even sw decoding under 1080p 30. In the case of playback testing both sw and hw decoding have to use sw scaling, so it should hurt both more or less equally, and the main power difference is still the hw decoder.

There have been some tests indicating hw scaling does reduce playback power a bit but nowhere near enough.

I would love if somebody would back up those “it’s all the scaling and presentation and stuff” claims with a demo or some testing or anything. It doesn’t need to be anything fancy, hell it doesn’t even need to be useful. Could just be a custom binary spitting out a video directly to the display or hell even just decoding a 1080p 30 video into null for a non ridiculous amount of power.

It’s still way too much.

That has been discussed but no one has been able to actually demonstrate it.

We discussed this at the Display Next Hackfest this year, and Leo spent cycles evangelizing the strategy of underlay planes (like a hole punch) for compositors to set up. Two of the compositors (Weston and Cosmic) have already done the work to get MPO working.

There are (at least) two patches that were part of this week’s display promotion you need that change how cursors work:
Patch 40
Patch 41

The kernel patches will be part of kernel 6.11. From my discussion with Leo, those patches show video offload savings when paired with Weston. They show savings with Cosmic as well, but some other bugs occur (pink flickering in the compositor and triggering some driver bugs on amdgpu).

I would love if somebody would back up those “it’s all the scaling and presentation and stuff” claims with a demo or some testing or anything.

I believe if you want to replicate the savings I’m describing you should be able to try it with the tip of the weston tree, mpv and a kernel with those patches.

Not to be pedantic. I don’t disagree that it’s a lot more than necessary, but it is an improvement nonetheless.

Saw that demo, pretty cool and I do hope we get that soon. However, that was done on a Rockchip platform that can already hw decode into the void for well under 1W.

That stuff is a little above my skill level, but I am not sure how valuable it is before decode-only for non-ridiculous amounts of power can be demonstrated.

Definitely and I do hope those improvements keep coming.

Mario, do you think it can be tested with ffmpeg alone? Using VAAPI and simply decoding to null, then decoding to null with a scaling filter in the process, to show the difference? Or does ffmpeg make use of those hardware features and therefore wouldn’t show the issue?
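For anyone who wants to try the experiment described above, here is a rough sketch of the three ffmpeg variants (software decode, VAAPI decode, VAAPI decode plus hardware scaling), all writing to the null muxer. It assumes an ffmpeg build with VAAPI support. The `decode_cmd` helper and the 1280x720 scale target are made up for illustration, and `-re` is added so decoding is paced at the native frame rate instead of running flat out. The function only prints the command lines, so nothing touches the GPU until you run one yourself.

```shell
#!/bin/sh
# Sketch only: print ffmpeg command lines for a decode-only power comparison.
# decode_cmd is a hypothetical helper; 1280x720 is an arbitrary scale target.
# Usage: decode_cmd sw|hw|hw-scale INPUT
decode_cmd() {
    dc_variant=$1
    dc_input=$2
    case "$dc_variant" in
        sw)
            # Software decode, frames discarded by the null muxer.
            printf "ffmpeg -re -i '%s' -f null -\n" "$dc_input"
            ;;
        hw)
            # VAAPI decode, frames kept as VAAPI surfaces.
            printf "ffmpeg -re -hwaccel vaapi -hwaccel_output_format vaapi -i '%s' -f null -\n" "$dc_input"
            ;;
        hw-scale)
            # VAAPI decode followed by the scale_vaapi hardware scaling filter.
            printf "ffmpeg -re -hwaccel vaapi -hwaccel_output_format vaapi -i '%s' -vf scale_vaapi=w=1280:h=720 -f null -\n" "$dc_input"
            ;;
        *)
            echo "unknown variant: $dc_variant" >&2
            return 1
            ;;
    esac
}

# Example: wrap one variant in the same perf counter used elsewhere in this thread:
#   sudo perf stat -e power/energy-pkg/ -- sh -c "$(decode_cmd hw input.mp4)"
```

Whether the null muxer keeps other IP blocks asleep is exactly the open question here, so treat any numbers from this as a rough comparison rather than proof either way.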

I believe this specific experiment you suggested has been tried unsuccessfully in the past.

I don’t know enough about how null output works to say whether this is a valid experiment that won’t light up other IP. The important parts are that the APU is in GFXOFF and the CPU cores are idle.

I would guess with a compositor running and the cursor not on its own plane that you can’t really have GFXOFF even during such an experiment.

I just did an experiment on my (unpatched) system using Big Buck Bunny (file name bbb_sunflower_2160p_60fps_normal.mp4) on KWin, running CachyOS w/ kernel 6.9.

First off null output:

❯ sudo perf stat -e power/energy-pkg/ mpv bbb_sunflower_2160p_60fps_normal.mp4 -vo null --start=10 --end=20 --fullscreen
182.23 Joules power/energy-pkg/

Now null with hwdec

❯ sudo perf stat -e power/energy-pkg/ mpv bbb_sunflower_2160p_60fps_normal.mp4 -vo null -hwdec=vaapi --start=10 --end=20 --fullscreen
182.11 Joules power/energy-pkg/

And then lastly with letting the compositor use it:

❯ sudo perf stat -e power/energy-pkg/ mpv bbb_sunflower_2160p_60fps_normal.mp4 -vo gpu -hwdec=vaapi --start=10 --end=20 --fullscreen
164.80 Joules power/energy-pkg/

Don’t those 3 comparisons show that VAAPI is using “less” power than not?
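The Joule totals above are easier to compare with the battery-discharge figures in this thread once they are turned into average watts. A small sketch of that arithmetic follows. `joules_to_watts` and `percent_saved` are names invented here, and the elapsed time has to come from the "seconds time elapsed" line that perf stat prints, which was not posted, so any duration you plug in for the runs above is an assumption.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of perf or mpv.

# joules_to_watts JOULES SECONDS -> average package power in watts
joules_to_watts() {
    awk -v j="$1" -v s="$2" 'BEGIN { if (s <= 0) exit 1; printf "%.2f\n", j / s }'
}

# percent_saved BASELINE_J CANDIDATE_J -> energy saved relative to the baseline, in percent
percent_saved() {
    awk -v a="$1" -v b="$2" 'BEGIN { if (a <= 0) exit 1; printf "%.1f\n", (a - b) / a * 100 }'
}
```

If each run really lasted about the 10 seconds implied by --start=10 --end=20, `joules_to_watts 182.23 10` gives 18.22 and `joules_to_watts 164.80 10` gives 16.48, while `percent_saved 182.11 164.80` gives 9.5. Keep in mind this is whole-package energy, not the decoder alone.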

Yeah above around 1080p30 that was usually the case, though still using a lot more power than you’d expect. In my testing at 4k 60 hw acceleration does significantly better than sw but that test-setup is also using ffmpeg or kodi and measuring on battery discharge. But around 4k 60 the power consumption of sw decoding starts really going off the rails whereas hw decoding is somewhat linear.

Hmm, so doing a RAPL measurement means you get all the SoC usage not just the decoder.
One thing worth ruling out is whether it’s actually because of the active EPP policy and scheduler use.

Just today some patches were posted that expose the PMU counters on a core-by-core basis.

https://lore.kernel.org/lkml/20240610100751.4855-1-Dhananjay.Ugwekar@amd.com/

This seems to support a vague theory I have, which is that testing with -vo null (or similar with ffmpeg) still copies the decoded video back to the user-space process, before that process just discards it (ignores it, overwrites it, whatever). But the ideal for power saving is that the decoded video does not go anywhere after the GPU, the GPU composites it into the image going out to the screen completely on its own.

That’s what the hardware-plane management work in the driver and compositors is trying to achieve. The fact that -vo null seems to use more power than -vo gpu suggests that -vo null is doing the thing we explicitly want to avoid, the copy-back to userspace. Meanwhile -vo gpu could potentially do significantly better if the GPU hardware was managed just right by the driver and compositor. (The Intel driver manages to do better because of differently structured hardware features, right? But I’m way out of my area of expertise here.)

Yup I agree with your hypothesis.

That’s nuts. I’m just sitting here doing nothing, with a browser open, and I’m at 8W. Starting Plex playback jumps it to 11.2W.

That’s on BIOS 3.05, Arch Linux, usually using Firefox. Also using PPD on KDE, with the power profile set to power save as mentioned. This seems to work well for me. Make sure it’s actually controlling the EPP profile with powerprofilesctl (iirc).
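One way to check whether the EPP value actually changed after switching profiles is to read it back from sysfs. This is a minimal sketch: `epp_summary` is a made-up helper name, and it assumes the cpufreq driver exposes energy_performance_preference under each CPU directory, which as far as I know only happens when the driver is running in an EPP-capable mode. If the files are missing it simply prints nothing.

```shell
#!/bin/sh
# Sketch only: epp_summary is a hypothetical helper.
# epp_summary [CPU_ROOT] -> one line per distinct EPP value with a CPU count
epp_summary() {
    epp_root=${1:-/sys/devices/system/cpu}
    # cpu[0-9]* skips sibling directories such as cpufreq and cpuidle.
    cat "$epp_root"/cpu[0-9]*/cpufreq/energy_performance_preference 2>/dev/null \
        | sort | uniq -c | awk '{ print $2 ": " $1 " CPUs" }'
}
```

Run `powerprofilesctl get` to see the active profile, then `epp_summary` before and after switching. If the output does not change between profiles, PPD is probably not controlling EPP on that system.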

Fascinating, I’ve got almost the exact same setup, except I’m using sway instead of KDE (which, naively, I would think would be better or the same). I did notice in powertop that pipewire uses 1W constantly (even when nothing is playing), so maybe that’s a good first thing to run to ground.

ppd looks right to me:

* power-saver:
    CpuDriver:  amd_pstate
    PlatformDriver:     platform_profile

Edit: I guess this is actually not right (wrt PPD), it should be amd_pstate_epp? Do you use active/passive/guided?

Edit 2: it appears to only show up in /sys/devices/ and not in ppd, but with active I do see a significant improvement, idling around 6W, 7W with Firefox open, and hitting 9-10W on playback. @Mario_Limonciello perhaps you should include setting amd_pstate=active in the guide as well?
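For reference, the driver mode mentioned above can be read from the amd_pstate status file in sysfs. A minimal sketch, with `pstate_mode` as a made-up helper name. It prints "unknown" when the file is absent, for example when a different cpufreq driver is loaded.

```shell
#!/bin/sh
# Sketch only: pstate_mode is a hypothetical helper.
# pstate_mode [CPU_ROOT] -> prints the amd_pstate mode (e.g. active, passive, guided)
pstate_mode() {
    pstate_file=${1:-/sys/devices/system/cpu}/amd_pstate/status
    if [ -r "$pstate_file" ]; then
        cat "$pstate_file"
    else
        echo unknown
    fi
}

# To switch at runtime instead of booting with amd_pstate=active (needs root):
#   echo active | sudo tee /sys/devices/system/cpu/amd_pstate/status
```

The kernel command-line parameter amd_pstate=active makes the setting persistent across reboots, while the sysfs write only lasts until the next boot.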

I do still see a few sus things in powertop, like pipewire always using 1W even with no audio playback: