[RESPONDED] dGPU lockup from rear USB port

I’m not really a powertop user, but the difference I see between the two images is the MediaTek Wireless device (which I assume is Wifi) at 9.85W versus 110mW.

amdgpu lists at 0mW and 200uW (0.2mW) between the two images.

So to my eye, the difference for the battery drain delta was the Wifi.

I have no doubt that the dGPU has a draw when it’s not sleeping, but if idle D0/D3hot is down in the sub 1W range… that’s likely a more acceptable delta compared to 0mW D3sleep.

What’s also confusing based on what I’ve read (see example quote below) is that simply probing the sysfs files for the dGPU also wakes it up… so if powertop is probing all the devices for details while it’s running, is the dGPU ever in D3sleep during the power monitoring?

I’m not here to try to prove the facts to you regarding the dGPU using more power when awake. On my laptop and in my use case, I’m just saying keeping it awake all the time is going to significantly reduce my time on battery and it is not an acceptable workaround for me.

Trying to argue that the wifi card is somehow using more power when the dGPU is awake doesn’t change the base problem.

powertop does not wake up the dGPU; that’s why I am using it to check. I just did another check while watching it: with the dGPU in D3cold, powertop showed ~20 W. Then I launched nvtop (which DOES wake it up) and saw powertop shoot up to ~27 W with the dGPU in D0.

All these simple checks are enough proof for me. Does the dGPU somehow cause the wifi card to draw more power? Maybe, but then having the gpu sleep is still preferred so my wifi power usage drops back down as well.

Thank you for all the helpful suggestions though. They assisted in narrowing down the quirks of that rear usbc port. Hopefully Framework sees this and takes note to help make it more useful for us.

As a new FW16 laptop owner, I am no expert, but I am willing to explore and learn… especially in areas where I have a related issue… like the dGPU port not working as expected.

More times than not, when I chime in on forum threads I end up learning something too in the exchange, so I like to confirm what I see prior to proposing my ideas/theories. Any proposed theories of course end up being either debunked or confirmed through dialogue, leading to learning either way.

In the end, if our dialogue assisted in narrowing down the quirks of that rear usbc port, then I’ll consider that an advancement of learning. Not mine, but hey, this wasn’t my post.

For right now, I’m going to focus on the support ticket opened for my partner’s Batch 6 FW 16 Laptop (arrived a week or so after my Batch 5)… as hers arrived with a damaged/misshapen input module header. The input module plug will not stay connected, leading to a “connect your input module plug into its header” message on power up… making it a very expensive paperweight for her right now until (I’m guessing) she is deemed innocent of creating/causing the header issue.

Through more testing I can simplify the issue as these three undesirable functions which depend on power state:

D3cold - connect - D0: results in dGPU lock.
– plugging a monitor into the rear dGPU usb-c port while the power state is “D3cold” will not detect the external monitor(s). Then, if the dGPU tries to go active (switch from D3cold to D0), it will lock up, and system fans will quickly ramp up to full blast, shooting hot air out the exhaust vents of the laptop until it is powered down. The only fix for this is to power off and let it sit for at least 20-30 seconds.

Force D0 - connect: results in external working, disconnect will allow D3cold again.
– Holding the dGPU into D0 (on) state and then plugging into the rear port will work. However, if you make a mistake and it goes back into D3cold, and you reconnect, fans will ramp up to max, blasting heat out and it will be in same state as above.

connect - boot D0: results in external working, but D3cold is permanently blocked.
– booting up with the rear port connected to a monitor will also work, but the dGPU will never go to sleep again, even after the monitor is disconnected. It remains in D0 state until the next power cycle.
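For the “Force D0” case, the posts do not say how the dGPU was held awake. One possible way, sketched here as an assumption rather than the method actually used, is the kernel’s runtime PM control file `power/control` under the device’s sysfs directory: writing `on` keeps the device from runtime-suspending, and `auto` lets it sleep again. The helper name `set_runtime_pm` and the `DGPU_DEV` variable are made up for this sketch, and `card1` may be `card0` on your system.

```shell
#!/bin/sh
# Assumed device directory; adjust cardN to whichever card is the dGPU.
DGPU_DEV="${DGPU_DEV:-/sys/class/drm/card1/device}"

# Write "on" (keep awake) or "auto" (allow runtime suspend) to power/control.
# $1 = device directory, $2 = on|auto. Needs root on a real system.
set_runtime_pm() {
    case "$2" in
        on|auto)
            printf '%s\n' "$2" > "$1/power/control"
            ;;
        *)
            echo "usage: set_runtime_pm <device-dir> on|auto" >&2
            return 2
            ;;
    esac
}

# Example (as root):
#   set_runtime_pm "$DGPU_DEV" on     # hold the dGPU awake before plugging in
#   set_runtime_pm "$DGPU_DEV" auto   # allow D3cold again afterwards
```

Unlike a kernel parameter, this only lasts until the next reboot, so it should not permanently block D3cold.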

I obtain the power state by watching this sysfs file:

/sys/class/drm/card1/device/power_state
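A rough sketch of how that file could be watched from a terminal, logging a line each time the state changes. The function names are hypothetical, and the default path is just the one quoted above (it may be `card0` on other systems).

```shell
#!/bin/sh
# Default path is the one quoted in the post; override DGPU_STATE_FILE if needed.
DGPU_STATE_FILE="${DGPU_STATE_FILE:-/sys/class/drm/card1/device/power_state}"

# Print the current power state (e.g. D0, D3hot, D3cold), or "unknown"
# if the file is missing or unreadable.
read_power_state() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "unknown"
    fi
}

# Poll the state file and print a timestamped line whenever the state changes.
# $1 = state file, $2 = seconds between polls, $3 = number of polls.
watch_power_state() {
    wps_last=""
    wps_i=0
    while [ "$wps_i" -lt "$3" ]; do
        wps_now=$(read_power_state "$1")
        if [ "$wps_now" != "$wps_last" ]; then
            printf '%s %s\n' "$(date +%T)" "$wps_now"
            wps_last=$wps_now
        fi
        wps_i=$((wps_i + 1))
        sleep "$2"
    done
}

# Example: poll once a second for a minute while plugging in the monitor.
#   watch_power_state "$DGPU_STATE_FILE" 1 60
```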

None of these bugs are desirable in the slightest, and they make using the rear port problematic at best. How could all these issues have made it past testing? The fact that the GPU gets hot and causes the fans to ramp to max is worrying. Is Framework not going to comment on this? Is my machine the only one that is affected by this issue?

Thinking back to my initial dGPU port experience, I was using the laptop post OS install without any external monitor connected.

When I connected a DP module with a cable to my monitor, it stayed blank, so I did not experience any fan ramp-up like your scenario #1… however, when I rebooted with the monitor connected to the dGPU, it did display on the desktop (but not at SDDM/login; I found a way to force xrandr early to fix that, but that is a red herring to your issue).

I might have triggered scenario #2 through my conky, and perhaps when I rebooted with the monitor connected, it might have woken up the dGPU… can’t be sure.

Scenario #3 is how I’ve left my system… However, since I cannot get a game to run properly with an external monitor on the dGPU port (X11 and Wayland had different issues/symptoms), I’ve had to abandon the port for now and plug my external monitor into a side port (iGPU).

In short, and if my dGPU “issues” can be considered “baseline”, your fan ramp-up/lockups are definitely concerning; not what I’ve experienced.

Not sure if that points to the dGPU/port or the cable/module used. @Matt_Hartley is this something you could provide some feedback on?

This makes sense. And to clarify, the fan will not ramp up until I try to utilize the dGPU. At that point, it seems to switch out of D3cold and into D0, then freeze.

Yes, and since you are forcing the dgpu on with that kernel parameter, it avoids situation #1 and #2. This is probably the best route to take to work around the problem.

Not sure about this. I can say, in my testing, the dGPU port works beautifully as long as I avoid the D3cold situation while connecting the monitors. I have a total of 3x1440p@144hz monitors that all worked when connected to a 3-port dongle connected to that rear port. Played quite a bit of Helldivers 2 on the middle of the three this way without any problems. All functionality also works when connected to one of the side display-supporting ports as well.

Thank you for troubleshooting this as well, Daniel. Hopefully together we can get this situation sorted out. I just think not very many people have tested this port yet. Hoping it’s not only a problem with mine.

I’m also experiencing the same issue where the USB port on the dGPU module won’t provide display output when a monitor is connected after boot, but it will when plugged in at boot. I’ve also noticed two other things:

  1. Peripherals that are plugged into my monitor (a keyboard and mouse, in my case) are passed through to my laptop even though the monitor doesn’t detect an input signal
  2. If the rear USB display output is working (ex. after booting with it connected) and I unplug then replug it after a few seconds, it also doesn’t provide video output. Curiously, when attempting to check the power state of the dGPU (as seen in Use of USB port on GPU Expansion Module - #22 by Daniel_I) with cat /sys/class/drm/card1/device/power_state, it remains in D0

I also ran a test where I:

  1. Booted with the monitor connected to the rear USB port (video output working)
  2. Disconnected the port after a little while (power state still D0)
  3. Reconnected the monitor (no video output, but power state still D0)
  4. Attempted to run nvtop

In my case, the system did not hang, but the external monitor suddenly came to life. I haven’t yet tried this without the monitor connected at boot, where the power state may be D3cold instead of D0. I’ll give that a try later.
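For anyone repeating this kind of test, a small wrapper can record the power state immediately before and after running a tool such as nvtop, so the transition is captured in one place. This is only a sketch: `state_before_after` is a made-up helper name, and the path in the example is the one used earlier in the thread.

```shell
#!/bin/sh
# Print the power state before and after running a command.
# $1 = state file, remaining arguments = command to run.
state_before_after() {
    sba_file=$1
    shift
    printf 'before: %s\n' "$(cat "$sba_file" 2>/dev/null || echo unknown)"
    "$@"
    printf 'after:  %s\n' "$(cat "$sba_file" 2>/dev/null || echo unknown)"
}

# Example: run nvtop for two seconds and compare the states.
#   state_before_after /sys/class/drm/card1/device/power_state timeout 2 nvtop
```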

Regardless, though, I agree that this behavior is pretty undesirable. The expectation would be that when a monitor is connected to the port, it will behave similarly to any other display-output-capable port.

EDIT: Forgot to add some system details

Ubuntu 22.04 LTS with Wayland

Well, this test has been somewhat interesting. I attempted to replicate @jared_kidd’s scenario #1, but I was unable to.

Attempt 1:

  1. Boot laptop with no peripherals connected
  2. A minute after logging in, connect monitor (not launching anything else)
  3. Check device power state (reported as D0)

At first, the external monitor didn’t do anything except pass through the peripherals connected to it, but after ~15 seconds, it suddenly turned on and started displaying as expected. I thought I might have triggered the D0 power state by checking, so I tried again.

Attempt 2:

  1. Boot laptop with no peripherals connected
  2. A minute after logging in, connect monitor

It happened again where simply waiting ~15-20 seconds caused everything to work as expected.

I’m not entirely sure why my dGPU didn’t display D3cold at all, but perhaps it could be caused by:

  • Autolaunch of Steam or Discord (or Guake)
  • Enabling fractional scaling (and running at 125% with no external display connected and 150% with an external display connected)
  • 3.03 BIOS update
  • Launching into “Balanced” power setting
  • Leaving the magnetic USB-C adapter in the rear USB port (though this really shouldn’t do anything since it’s just a pass-through adapter)

Regardless, aside from taking a bit too long before displaying on the external monitor, I’m not seeing any major issues, like system hanging or complete failure to output to the external display. I just wasn’t waiting as long during my prior attempts.

I sent a ticket to Framework last week and they have confirmed the issue I am experiencing on their end. So hopefully a fix is in the pipeline.

You aren’t mentioning it, so not sure if your dGPU is in D3cold in any of your tests. You will not be able to replicate the issue I am describing unless your dGPU is able to sleep (D3cold). The reason for testing after a fresh boot is so it is actually in D3cold. Otherwise, the bug keeps it in D0 once you have stuff connected to that rear port.

@jared_kidd, I mentioned in that post (in the paragraph with all the bullet points) that the dGPU never seemed to enter D3cold.

I did try with a cold boot as well, but I think I’ve identified why my dGPU doesn’t enter D3cold. I noticed that my wireless earbud case doesn’t turn on its lights when I flip it open, and I have the same magnetic USB-C adapter on it as I do on the rear USB-C port on my Framework 16. When I removed the magnetic adapter, my earbud case lit up as expected, so I suspect USB-C ports may have plug detection that doesn’t rely on an upstream/downstream device.

I’ll try testing again without the magnetic adapter installed at boot. If the dGPU shows D3cold, then this should confirm that the reason mine wasn’t entering that state in the previous tests is due to the adapter being connected.

I tested again and I am fully unable to replicate your results, but I did find something else on my end.

With nothing connected to the laptop (all expansion cards removed, no adapters in any ports), at boot, I get the following terminal output

user@host:~$ cat /sys/class/drm/card0/device/power_state 
D3cold
user@host:~$ cat /sys/class/drm/card1/device/power_state 
D0

I guess my system is defaulting to the dGPU? I’m not sure why card0 is in D3cold but card1 is in D0 after booting.

Having said that, in this configuration, the following steps produce a few unexpected results:

  1. Connect external display to USB-C on dGPU – Nothing happens, even after ~30 seconds
  2. Run nvtop – Screen scaling changes, recognizing the second monitor, but does not display output to it, though peripherals are passed through
  3. Disconnect external display – Internal keyboard no longer functions, but trackpad works

I can’t get the system to hang, probably because I can’t even get card1 to be in D3cold at all, but the non-functional keyboard input is confusing. Even if I reconnect the external display with its attached keyboard, neither keyboard works.

No, in this case your dGPU is card0, not card1. You are probably running Ubuntu (debian).

EDIT: looked up and see “Ubuntu 22.04 LTS with Wayland”. So yeah, yours are reversed.
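Since the card numbers can differ between systems, it may be safer to list every card rather than assume `card1`. The sketch below prints the PCI vendor ID, the `boot_vga` flag and the power state for each card. `list_cards` is a hypothetical helper. As far as I know, `boot_vga` is 1 for the GPU the firmware used for the boot display, which on a hybrid laptop is usually the iGPU, but treat that as an assumption to verify on your own machine.

```shell
#!/bin/sh
# List every DRM card with its PCI vendor ID, boot_vga flag and power state.
# $1 = DRM class directory (defaults to /sys/class/drm).
list_cards() {
    lc_root="${1:-/sys/class/drm}"
    for lc_path in "$lc_root"/card[0-9]*; do
        lc_card=${lc_path##*/}
        # Skip connector entries such as card1-DP-1.
        case "$lc_card" in
            *-*) continue ;;
        esac
        [ -d "$lc_path/device" ] || continue
        lc_vendor=$(cat "$lc_path/device/vendor" 2>/dev/null || echo "?")
        lc_boot=$(cat "$lc_path/device/boot_vga" 2>/dev/null || echo "?")
        lc_state=$(cat "$lc_path/device/power_state" 2>/dev/null || echo "?")
        printf '%s vendor=%s boot_vga=%s power_state=%s\n' \
            "$lc_card" "$lc_vendor" "$lc_boot" "$lc_state"
    done
}

# Example: list_cards
```

On the Framework 16 both GPUs are AMD, so the vendor ID alone will not tell them apart; the `boot_vga` and `power_state` columns are the more useful ones here.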

So Arch and Ubuntu list the cards oppositely? I would not have expected that…

I think the root of the freezing (on your end) may be Arch-specific. I am seeing some issues, namely the dropout of keyboard input from any source, but not the system freeze so I’m thinking there’s an underlying issue but it manifests in an OS-specific way.

My understanding is that only Debian-based distros have them backwards. The issue is not Arch-specific at all. Read up: I posted that Framework (and I) have already verified the problem on Fedora.

Connecting dGPU to the external display using dGPU port on the back? If this is what you are looking to do, there is a bug where the external display remains black ONLY when connected to the dGPU USB-C slow (HDMI or DP expansion card).

I do have a workaround, which uses a quick instance of running nvtop then closing it, all hidden and behind the scenes.

Basically it does a little udev magic to scream “Look, new USB is attached! Better run nvtop for a few seconds then close nvtop” - I do this as timeout 2 nvtop.

echo -e '#!/bin/bash\n\necho "USB device connected. Running nvtop for 2 seconds."\ntimeout 2 nvtop\necho "nvtop run completed."\n' | sudo tee /usr/local/bin/external_video.sh > /dev/null && sudo chmod +x /usr/local/bin/external_video.sh && echo 'ACTION=="add", SUBSYSTEM=="usb", RUN+="/usr/local/bin/external_video.sh"' | sudo tee /etc/udev/rules.d/99-external_video.rules > /dev/null && sudo udevadm control --reload-rules && sudo udevadm trigger

Paste this in, reboot, attach HDMI/DP to USB-C adapter, display will come online post login and after 2 seconds of being logged in. Kludgy, yes. Why not call up IDs for the devices vs any USB device? Compatibility.
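For readers who would rather see what the one-liner writes before running it with sudo, here is a rough, readable equivalent that stages the same two files in a local directory. `STAGE_DIR` is a hypothetical staging directory introduced for this sketch; the file contents and install paths are taken from the command above.

```shell
#!/bin/sh
# Stage the two files the one-liner creates, without touching system paths.
# STAGE_DIR is a hypothetical staging directory used only for this sketch.
STAGE_DIR="${STAGE_DIR:-./external_video_stage}"
mkdir -p "$STAGE_DIR"

# Helper script: briefly runs nvtop so the dGPU wakes up.
cat > "$STAGE_DIR/external_video.sh" <<'EOF'
#!/bin/bash

echo "USB device connected. Running nvtop for 2 seconds."
timeout 2 nvtop
echo "nvtop run completed."
EOF
chmod +x "$STAGE_DIR/external_video.sh"

# udev rule: run the helper whenever any USB device is added.
cat > "$STAGE_DIR/99-external_video.rules" <<'EOF'
ACTION=="add", SUBSYSTEM=="usb", RUN+="/usr/local/bin/external_video.sh"
EOF

# To install, matching the paths used by the one-liner:
#   sudo install -m 755 "$STAGE_DIR/external_video.sh" /usr/local/bin/external_video.sh
#   sudo install -m 644 "$STAGE_DIR/99-external_video.rules" /etc/udev/rules.d/99-external_video.rules
#   sudo udevadm control --reload-rules && sudo udevadm trigger
```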

This has not been tested with gaming as I only tested this to activate the display.

Now the original issue appears to vary some, as there are issues with lockups. But this may be due to the device going into a power save state. This may address that issue as well.

Thanks for the response @Matt_Hartley , just getting back from vacation, and I’d like to clarify my understanding of the information you’ve provided.

My understanding is that tools like nvtop, tickling the dGPU sysfs files, etc. all serve to bring the dGPU out of sleep/D3cold. If that’s related to the bug you’ve mentioned, then I think I’ve worked around it by adding amdgpu.runpm=0 to GRUB_CMDLINE_LINUX_DEFAULT to prevent the dGPU from sleeping… that said, I can see how your solution/workaround could be attractive to @jared_kidd, who didn’t want to disable dGPU sleep entirely.
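A quick way to confirm the parameter actually took effect after a reboot is to look for it on the running kernel’s command line. The sketch below does that; `runpm_disabled` is a made-up helper name, and the GRUB line in the comments is only an illustration (the `quiet` option and the regenerate command depend on your distro).

```shell
#!/bin/sh
# Illustrative /etc/default/grub entry (other options on the line will vary):
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet amdgpu.runpm=0"
# Then regenerate the GRUB config, e.g. `sudo update-grub` on Debian/Ubuntu or
# `sudo grub-mkconfig -o /boot/grub/grub.cfg` on many other distros, and reboot.

# Succeeds if the given kernel command line contains amdgpu.runpm=0.
# $1 = file holding the command line (defaults to /proc/cmdline).
runpm_disabled() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qxF 'amdgpu.runpm=0'
}

# Example:
#   if runpm_disabled; then echo "dGPU runtime PM is disabled"; fi
```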

Is the bug currently directly related to the dGPU sleep, or only when using the FW HDMI/DP modules to plug into the dGPU port? Just trying to understand what you are referring to as “slow”.

I have picked up a USB-C to DP cable (and have ordered a USB-C to HDMI cable as my monitor’s DP port recently died) which hopefully works around the bug you mentioned (cable not being “slow” like the module); however, I’m currently avoiding the dGPU USB-C port as games do not render properly (or at all) when connected to it.

That said I have yet to retest the state of things with my recent plasma6 update and/or latest 6.8/6.9 kernels… the joys of bleeding edge hardware with linux.

EDIT: My new USB-C HDMI cable arrived today so I thought I’d try scenario #1 above… same issue under plasma 6 as plasma 5 with the cable plugged directly into the dGPU; no issues plugged into the side USB-C module (iGPU).

Didn’t test under wayland as it has problems with gamescope, so not an option for me at this time.

Appreciate the update. I need to clarify that this is what we are doing, correct? Where is this being plugged in? My script was ONLY for the very back where the dGPU is attached. If it is for this area only, then yes, it wakes the dGPU up for use.

For any other ports, you would follow the outline here. And if using those supported ports as shown, it would be with expansion cards.

It’s also very important to note that you do not need to connect directly to the dGPU in the back to activate the dGPU. DRI_PRIME=1 will do that for you in any of the supported expansion bays.
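As a minimal illustration of the DRI_PRIME approach, the wrapper below sets the variable only for the command it launches. `run_on_dgpu` is a hypothetical helper name, and the example assumes `glxinfo` is installed (it usually ships in a package like mesa-utils).

```shell
#!/bin/sh
# Run a command with DRI_PRIME=1 set in that command's environment only.
run_on_dgpu() {
    DRI_PRIME=1 "$@"
}

# Example: the renderer reported here should be the dGPU if offloading works.
#   run_on_dgpu glxinfo -B
```

Keeping the variable scoped to one command means the rest of the session still uses the iGPU, so the dGPU can go back to sleep once the program exits.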

Honestly, the absolute dead simplest approach here is to use Bazzite. GPU switching is just handled. It has a KDE option, too. We work closely with the developers there and this may make for a far, far easier experience for you.

Your script has been working fine as a workaround for me. I still have to reboot after I disconnect if I want the dGPU to ever go to sleep again. But it’s at least usable.

Could you provide any update on when we will receive a fix for the actual bug?

I have almost the same issue, but instead of an external display I have… the Framework power adapter! In my case the adapter is plugged into the rear left or rear right ports (1 and 4 on the FW16 port sheet). If I plug the adapter in while the dGPU is in D3cold, the dGPU fails, producing a lot of logs about the crash. From my understanding, the dGPU times out.

If I boot with the power adapter plugged in, then the GPU stays in D0 until I unplug the adapter and fails again when the adapter is back in.

If, before plugging the adapter in (in any of the two three? cases), I force the dGPU into D0, then everything works like it should, i.e. the power adapter does not cause my GPU to fail.

Edit: The dGPU stays in D0 while the adapter is plugged in no matter what.

Edit 2: Plugging in my multipurpose adapter into these ports does not crash the GPU

[Image: This works]

Edit 3: Plugging the power adapter into my multipurpose adapter does crash the GPU

[Image: This doesn’t]

Edit 4: Fedora 40 doesn’t have this problem somehow…

Posting in here again so it doesn’t get auto-locked.

This is still an issue and causes my laptop to lock up on occasion when I disconnect/connect external monitors. Does Framework have any plans to work on this bug?