[RESPONDED] PPD (power-profiles-daemon) (AMD ONLY)

Then hw decoding outperforms software decoding at least; it still uses a stupid amount of power though.

As I mentioned somewhere else (another thread, I think?), the lowest-level issue is a kernel issue with how the cursor is blended by the compositor. I don’t feel anything is being dismissed. This is the absolute first step to solving this issue.

https://lore.kernel.org/amd-gfx/Zg63qvnHgutUARrh@xpredator/T/#m46eb0af785d226309891edca514a756d0e2bde21

After that’s been resolved, the next step is to get the compositors working with libliftoff, which Valve already uses for Gamescope. This will start with Weston, and then $FAVORITE_COMPOSITOR needs to add support for it.

Then all the compositors need to use scan out to display the video. There is work ongoing for this right now too.

just the decoding bit (no scaling or displaying or anything, just decode and discard)

Again, if the cursor is being blended by the compositor incorrectly then the GPU won’t be in GFXOFF. Take “video playback” out of the picture and compare a simple workload of moving the cursor in Windows and Linux, and you should see differences showing that the GPU isn’t in the best sleep state. When this is fixed by the kernel and compositors it should bring battery improvements for non-video workloads as well.

I have to emphasize none of what I said above is trivial and quick. I’ve done what I could on the “CPU side” and “Display side” of the equation with amd-pstate, amdgpu ABM and PPD, but there are “way more cooks in the kitchen” when it comes to GPU power optimization.
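
For anyone who wants to check this on their own machine, a rough sketch (assuming the kernel exposes the amdgpu GFXOFF state through debugfs; the exact file name and whether it exists at all varies by kernel version):

# the card index and the gfxoff file name are assumptions; some kernels only expose amdgpu_gfxoff, and this needs root
while true; do
    sudo cat /sys/kernel/debug/dri/0/amdgpu_gfxoff_status
    sleep 1
done
# note: polling too aggressively (or running tools like nvtop) can itself kick GFX out of GFXOFF

The idea is to compare the output while the machine is idle against the output while you just move the cursor around.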


And how exactly does the cursor cause ffmpeg to use more power when just decoding something into /dev/null? I agree there are optimizations to be made in the rest of the pipeline, but the decoder itself still accounts for a significant portion of the excess power use.

Or how does the same cursor make playing back exactly the same video through exactly the same flawed pipeline use more power with hardware decoding than without?

That is exactly the dismissing I am talking about. It’s not “no, you are using ffmpeg incorrectly and you can use the hw decoder efficiently using parameter X”, it’s just “yeah, there is other stuff using too much power, so just pin it all on that”, even though it is nowhere near as much of a problem on Intel.

Edit:

Ok, damn, that is pretty bad too: 2-3W for just waving the cursor in front of an empty terminal window, jesus.

If the GFX IP isn’t off you’re going to have consumption from SDMA and GFX, and it’s going to dwarf the VCN consumption.

I haven’t done this experiment, but thinking about it you might be able to prove it by doing your /dev/null experiment specifically over SSH while the display is in DPMS or in PSR (maybe both experiments are worth it). I would expect you have GFXOFF at that time for sure in both cases, and can see just the consumption bump from running VCN and the CPUs spinning up.
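
For reference, the decode-and-discard runs could look something like this (just a sketch; sample.mp4 is a placeholder and the VAAPI render node path may differ on your system):

# hardware decode via VAAPI, output discarded (no scaling, no display)
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -i sample.mp4 -an -f null -
# software decode of the same file for comparison
ffmpeg -i sample.mp4 -an -f null -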

Would that stuff not need to be on to show a picture anyway? And in the case of kodi playback with sw decoding it would definitely be on to actually display the video.

The current experiments were already done via SSH, but I did do some cross testing to make sure that doesn’t influence the results; it’s just more convenient. Redoing the tests without sway running may be worth a shot though.

Edit: just retested the 1080p and 480p samples without sway running; pretty much exactly the same usage as the original tests.

DCN is the only hardware that needs to be on to show a picture. GFX and SDMA are “expensive” when it comes to power. Ideally you only want them on when you need them.

The current experiments were already done via SSH, but I did do some cross testing to make sure that doesn’t influence the results; it’s just more convenient. Redoing the tests without sway running may be worth a shot though.

IIRC if you’re at a TTY, you won’t get GFXOFF while the display is turned on. I think it’s most interesting to use KWin or Mutter. As for Hyprland, I’ve seen reports that it keeps GFX awake all the time.

Also, dumb question, but how are you measuring power consumption? Something like nvtop can cause GFXOFF to be exited…


And you need to spin all that up to use the hw decoders?

I don’t know how I’d turn the display off. Does closing the lid work?

powerstat -d 30 1 480

Pretty sure that just queries the battery and CPU performance stats from the OS.
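
For what it’s worth, the same number can also be read straight from sysfs (a sketch; the battery name is machine-specific and some machines only expose current_now/voltage_now instead of power_now):

# average battery draw over 30 seconds; power_now is in microwatts on most systems
for i in $(seq 30); do
    cat /sys/class/power_supply/BAT1/power_now
    sleep 1
done | awk '{ sum += $1 } END { printf "%.2f W\n", sum / NR / 1e6 }'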

No, you don’t. But because of how the compositors use the hardware today, as I’ve mentioned above, they could be active. And running a TTY can cause them to be active.

My point here is that unless the graphics IP is in GFXOFF, you are going to basically burn DCN + GFX + VCN + SDMA. Really you should just be burning DCN + VCN with a proper stack.
This is what Windows does and why it behaves so much better for video offload.
The VCN microcode is the same for any OS (besides being a snapshot of code)

Not at a TTY.

Yeah that should be fine for this purpose.


But this would still mean that in the ffmpeg case the only difference between hw and sw decode would be VCN being on; whether the rest is on or not doesn’t matter that much, since it would be the same in both cases, right? Only DCN and VCN would obviously be better as an end result.

Still though, if the only difference is VCN being active or not, using 2.8W to decode 720p 24Hz on dedicated hardware when the CPU can do it for 0.5W seems a bit excessive, and it would still be a problem if only VCN and DCN were active. Hell, the 2.8W for just decoding using VCN is more than the 2.4W it takes for the full sw-decoded Kodi playback of the same thing, with sound and everything.

Yeah, apparently not. I can only get it to suspend or do nothing at all, which doesn’t even turn the backlight off when closing the lid. Most laptops I own just cut power to the backlight when the lid closes, so that’s a first for me.

But this would still mean that in the ffmpeg case the only difference between hw and sw decode would be VCN being on; whether the rest is on or not doesn’t matter that much, since it would be the same in both cases, right?
Still though, if the only difference is VCN being active or not, using 2.8W to decode 720p 24Hz on dedicated hardware when the CPU can do it for 0.5W seems a bit excessive, and it would still be a problem if only VCN and DCN were active. Hell, the 2.8W for just decoding using VCN is more than the 2.4W it takes for the full sw-decoded Kodi playback of the same thing, with sound and everything.

I don’t have specific numbers to make this point (so these are just examples), but you need to remember that the biggest consumer of power is GFX.
Under the SW rendering example, if SDMA + GFX + DCN + CPU takes 5W and SDMA + GFX + DCN + CPU + VCN takes 7.8W, you assume that VCN takes 2.8W.
But if SDMA + GFX + CPU weren’t part of the picture you may be a lot lower, maybe at 3.5W.

IOW VCN consumes more than just CPU, but less than CPU + GFX + SDMA.
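
To make the arithmetic explicit (these are the hypothetical figures from the example above, not measurements):

sw_total=5.0   # SDMA + GFX + DCN + CPU, software decode
hw_total=7.8   # SDMA + GFX + DCN + CPU + VCN, hardware decode
echo "naive VCN estimate: $(echo "$hw_total - $sw_total" | bc) W"   # prints 2.8
# but if GFX and SDMA were gated off, the whole hardware-decode case might sit
# around 3.5W, so most of the 7.8W is GFX/SDMA/CPU baseline rather than VCN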

Most laptops I own just cut power to the backlight when the lid closes, so that’s a first for me.

On a TTY? I’d like to understand how this actually could work. The kernel talks to upowerd and logind, and then those tell something to tell the graphics driver to turn off the backlight.

(assuming you put VCN into your second lineup) Would that not be a reasonable assumption to make, since the other bits are doing (or not doing) pretty much the same thing in both cases?

Could that be demonstrated somehow? I would love to see VCN decode (only decode) 1080p30 for less than 1W under Linux. Given that decode and display is possible on Windows at that level, it should be possible somehow, and if it isn’t, would that not be a bug somewhere on the Linux driver side?

Like how the hell is this much newer, fancy 4nm chip with the same sub-optimal software stack worse at this than my like 6-year-old Intel ThinkPad? That one also doesn’t have direct scan-out and doesn’t even benefit from stupidly efficient CPU cores to mask some of the extra effort. I just don’t want to believe AMD is really that far behind.

Nah, I mean on a hw level: if the lid switch triggers, the EC turns off the backlight enable pin (at least on my T480s; this is one of the bits that broke when I put in the OLED panel with the wonky pinout, so it doesn’t do that anymore since I bypassed it now XD).

Yeah sorry I meant to put VCN in the second one, typo. I’ll correct it.
Yeah it’s reasonable to expect VCN to consume an incremental amount; but if GFX and SDMA aren’t active then it’s going to be less than those. GFX should consume more than VCN.

Could that be demonstrated somehow? I would love to see VCN decode (only decode) 1080p30 for less than 1W under Linux. Given that decode and display is possible on Windows at that level, it should be possible somehow, and if it isn’t, would that not be a bug somewhere on the Linux driver side?

Like I said, the first step is to get the hardware cursor on the right plane and the compositor to work out how it’s blended. We’ll get there eventually.
And yes, I don’t want to discount the possibility that some registers are set up differently on Windows and Linux. If, once GFX and SDMA aren’t active for video playback, everything is still too high, we’ll need to compare registers. This is better done on AMD reference hardware though.

Like how the hell is this much newer, fancy 4nm chip with the same sub-optimal software stack worse at this than my like 6-year-old Intel ThinkPad? That one also doesn’t have direct scan-out and doesn’t even benefit from stupidly efficient CPU cores to mask some of the extra effort.

I know very little about the Intel hardware design, but I can at least throw a few guesses out.

  1. Intel’s chipset is not part of the SoC package. When measuring RAPL numbers on an Intel system you’re not getting a “complete” measurement. It might be better to look at total battery or AC draw when it comes to drawing a full comparison (see the sketch after this list).
  2. Intel might have their graphics set up so that different IP blocks are in different power domains than on AMD. For current AMD hardware all of graphics is its own power domain, and so ALL graphics operations are very (power) expensive. Maybe Intel has scaling blocks that can be lit up separately, or similar. Future AMD silicon is going to introduce VPE, which has the ability to do some copy operations with GFX disabled. TBD how that will be used in Mesa.
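
To illustrate the first point, a rough sketch of comparing the two numbers (intel-rapl:0 and BAT0 are placeholders, it ignores counter wraparound, and reading the RAPL counter may need root):

# RAPL package energy is cumulative, in microjoules
e1=$(cat /sys/class/powercap/intel-rapl:0/energy_uj)
sleep 10
e2=$(cat /sys/class/powercap/intel-rapl:0/energy_uj)
echo "RAPL package: $(echo "($e2 - $e1) / 10 / 1000000" | bc -l) W"
# whole-system draw from the battery for comparison (microwatts)
echo "battery:      $(echo "$(cat /sys/class/power_supply/BAT0/power_now) / 1000000" | bc -l) W"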

I just don’t want to believe AMD is really that far behind.

I think if you compare Intel and AMD on Windows for video playback, they’re going to come across very similarly, but there is very obviously a gap in power consumption between AMD and Intel on Linux.

Ah… this is a better question for Framework; I am unsure if their EC does anything like this when the lid is closed. It could be missing functionality in their EC. I don’t know if such a pin is even wired up either.

And yes, that makes sense given you swapped to OLED; OLED “backlight brightness” is controlled by an AUX interface in the driver.

Even if GFX is doing the exact same thing in both cases?

But since there is quite a bit to be done before GFX and SDMA won’t be active for video playback, would it not make sense to look into that first? Like some kind of demo thing, potentially completely without a DE? Hell, even a demo of just running a decode at non-ridiculous power levels without any form of display would demonstrate whether it’s borked or not.

I always use battery draw numbers; I am well aware the reported power can be quite unreliable.

I was talking about the Linux side; on Windows it is apparently doing quite well.

A backlight enable pin is pretty standard, but the T480s straight up toggles power to the backlight XD

Not quite; it actually uses the same interface for brightness as LCD panels, it’s just somewhere else (I only got the datasheet after I blew up parts of my mainboard; Samsung uses a pinout wildly incompatible with the de-facto standard for LCDs). To be fair, if the pinout had worked I’d have blown up the display anyway, since it has much weirder power requirements than an LCD.

The total damage was all the display fuses blown and the EC refusing to turn on the MOSFET for the backlight. I “replaced” the fuses and ended up bypassing the MOSFET for the backlight power (the MOSFET is still fine, but might as well just bypass it instead of forcing it on all the time), and it is pretty much fine now.

Someone else did exactly what you’re looking for:

Somehow didn’t get that notification.

While it’s not exactly what I was looking for, it is certainly an interesting datapoint. But if that has the presentation bit fixed, we are still looking at 230% extra power use vs Windows, which would be more of an argument on the “it’s borked” side.

The massively lower 4K consumption is encouraging though.


Rather than “borked”, I’d say it’s more like “optimized”.

Due to Windows’ history of having the superior OS market share, both Intel and AMD unsurprisingly put their most effort into optimizing the Windows drivers.
And due to Intel’s history of having the superior CPU/GPU market share, the developers participating in optimizing the Linux graphics drivers probably put more effort into the Intel drivers.

Afaik AMD started to treat Linux more favorably after they released their first Zen processor generation. Its efforts are still inferior compared to Windows, but way better than the efforts coming from Intel and nVidia.
Under those circumstances it is far from surprising that the Linux drivers are still lacking. In that regard, I applaud your efforts. I wish I could help you out in a more active way, but sadly I’m just an IT supporter, specialised in dealing with the issues originating in front of the screen.
