Irregularly high iGPU utilisation

My system & settings:

Latest BIOS, latest firmware, Kubuntu 22.04, Ryzen 7 7840HS, Radeon RX 7700S, kernels 6.6.42, 6.9.9 and 6.10.1, fw-fanctrl (relevant config at the bottom), balanced power profile via power-profiles-daemon (ppd) 0.21, 2560x1600 at 165 Hz

I've had my FW16 for about 3 months, but only now have I started noticing that even the slightest system interaction drives iGPU usage quite high.

In the following screenshot, I closed all background programs (green) and then slowly moved my mouse in circles (orange):

Screenshot 1

While that probably isn't a problem in itself, starting a program like Steam, Telegram or Firefox pushes utilization and temperature really high.
Sometimes it drops back to “idle” (0-10%) after a minute or two, but most of the time it keeps using much more iGPU processing power even when I'm not interacting with it:

Screenshot 2

Even just creating this topic drives the utilization to 25-50% while typing.

While this is no problem when just chilling on one website (it occasionally ramps up the fans), it becomes really annoying when switching between several applications that use the iGPU.

Example: switching from Firefox to Steam and opening the friends list

For example, switching from Aseprite to Godot and then to Telegram or Firefox makes the fans really loud and the laptop quite unresponsive :frowning:

Is this to be expected?

/etc/fw-fanctrl/config.json (relevant part):
{
    "defaultStrategy": "lazy",
    "strategyOnDischarging": "laziest",
    "strategies": {
        "laziest": {
            "fanSpeedUpdateFrequency": 5,
            "movingAverageInterval": 40,
            "speedCurve": [
                { "temp": 0, "speed": 0 },
                { "temp": 45, "speed": 0 },
                { "temp": 65, "speed": 25 },
                { "temp": 70, "speed": 35 },
                { "temp": 75, "speed": 50 },
                { "temp": 85, "speed": 75 }
            ]
        },
        "lazy": {
            "fanSpeedUpdateFrequency": 5,
            "movingAverageInterval": 30,
            "speedCurve": [
                { "temp": 0, "speed": 15 },
                { "temp": 50, "speed": 15 },
                { "temp": 65, "speed": 25 },
                { "temp": 70, "speed": 35 },
                { "temp": 75, "speed": 50 },
                { "temp": 85, "speed": 75 }
            ]
        }
    }
}

If you have sensor monitoring software running, it's going to keep the dGPU awake. Simply being awake keeps it consuming more power, and the resulting heat makes the shared fans spin up more often.

Yes, that's true, but my dGPU stays at 0% the whole time; it's the iGPU that is behaving oddly :confused:

What bothers me is that the iGPU heats up the whole package so much that just opening Discord makes the fans really loud for 5 seconds before they quiet down again.

0% is not the same as off. If the dGPU is powered on, it's generating a lot of heat even at 0% utilization.

Good point, but I have to ask: in the screenshots the dGPU temperature stays constant, but it presumably still generates heat just by being active?

I will remove the dGPU sensors from the system monitor page, which should rule out the dGPU as a contributor to the problem.

I'll also disable fw-fanctrl in case it reads the dGPU temperature and thereby wakes it up.

Yeah. Simply turning it on generates heat, but because the dGPU's temperature sensor is on the die, you can't read its value unless the dGPU is turned on. Catch-22.

Hopefully I'm right here and not monitoring it fixes your problem.

I will keep an eye on it for a few days; right now it seems to work better :+1:

I don’t have a Framework 16 handy for a few weeks, but I’m wondering about doing something like this.

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index c11952a4389b..b60655e996e9 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -2529,6 +2529,9 @@ static ssize_t amdgpu_hwmon_show_temp(struct device *dev,
        if (channel >= PP_TEMP_MAX)
                return -EINVAL;
 
+       if (adev->in_runpm)
+               return -EBUSY;
+
        switch (channel) {
        case PP_TEMP_JUNCTION:
                /* get current junction temperature */

Can you check how your sensor-reading software handles this? The patch should ensure that a suspended dGPU won't wake up just because a sensor is read.

@James3 You have a dGPU in your FW16, right? Could you check whether that patch keeps the dGPU in D3cold while running sensors?

Here’s an updated patch that I think should work to keep sensors from waking the system. Can you guys please try it?

https://git.kernel.org/pub/scm/linux/kernel/git/superm1/linux.git/commit/?h=superm1/runtime-pm-screen-on-off&id=01a38e3ff67b83e2f7f4991fd76c4fe1802eb82a

@Mario_Limonciello I can test for you, but I'm not really sure how to apply the patch. Could you provide the steps?

Also, not sure if it matters, but I'm on 6.10.3-200.fc40.x86_64.

There is a “download patch” button at the link I sent. You should use this for building in Fedora:

Thanks @Mario_Limonciello, I'll try to get this done. As I haven't done it before, I'm trying it on a Fedora VM first. It's been a busy day at work, so I'll likely give it a go tonight or this weekend.

It won’t work in a VM. It needs to be on real hardware.

To clarify, I'm just looking to build the kernel in the VM :slight_smile:, not to test in the VM. If I can build it without breaking things, I'll then do it on the laptop. Or are you saying that wouldn't work either?

That will work just fine!

@Mario_Limonciello It took a while to build, but I've booted into the kernel. Unfortunately, it looks like it still activates the GPU.

Booted kernel:

/vmlinuz-6.10.0-rc2-knipp+

Output of cat /sys/class/drm/card[0,1]/device/power_state without any monitoring software running:

D3cold

I opened “Mission Center”, which is basically the same kind of system monitor the OP used, and it went to:

D0

Kept it open for ~5 mins, and it stayed in D0 the entire time.

Closing Mission Center immediately put the GPU back into D3cold.

Edit: I tested with btop and got the same result, which is interesting, since the version I'm on doesn't seem to monitor the GPU (at least not AMD ones).

I can stay on this kernel for a while if you want me to verify that I applied the patch correctly, collect logs, etc. Just let me know how I can help.

Can you try specifically with the “sensors” command? Does that also wake it?

Hi Mario,

Even running #sensors results in the GPU going into D0.

Thanks

Huh, that’s really surprising. You’re sure you’re running your test kernel with that patch properly applied?