dGPU lockup from rear USB port

I am having an issue with the dGPU locking up if I connect a display to the rear port after booting up. I run Arch Linux as my main OS but also have Fedora 39 installed for further testing. The same results occur in either OS.
If I boot up with nothing connected, and then connect a display to the rear port, dmesg does show a connection like so, but the additional monitors do not work:

[  142.939856] usb 1-2.4: new full-speed USB device number 11 using xhci_hcd
[  143.078727] usb 1-2.4: New USB device found, idVendor=32ac, idProduct=0002, bcdDevice= 0.00
[  143.078743] usb 1-2.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  143.078749] usb 1-2.4: Product: HDMI Expansion Card
[  143.078753] usb 1-2.4: Manufacturer: Framework
[  143.078757] usb 1-2.4: SerialNumber: 11AD1D00A49C4014081E0B00
[  143.138075] hid-generic 0003:32AC:0002.000C: hiddev101,hidraw11: USB HID v1.11 Device [Framework HDMI Expansion Card] on usb-0000:c4:00.3-2.4/input1

However, if I then try to launch any graphics applications, games, or even nvtop, they just hang. This also makes the system pretty unstable, and it will hang on poweroff/reboot. If I then try to power back on immediately, the laptop will not boot (which scared the crap out of me the first time). But if I wait at least 30 seconds or so while powered off, it will power back on properly.

If I power on and boot up with the rear display(s) connected, it functions just fine and I can even remove them and reconnect without issue.

Can anyone else confirm this is happening to them? Or do I possibly have a busted dGPU and need to contact support?

I’ll try testing your scenario in a bit… Just for confirmation, it looks like you are using an HDMI module, so that will be one difference between our tests as I only have DP modules.

In the meantime, I’ll share some other settings that I set while I was troubleshooting my own issue(s)… the thought being that if you try them, our baselines would be closer together:

  1. In GRUB, I added amdgpu.runpm=0 to prevent the dGPU from sleeping:
     GRUB_CMDLINE_LINUX_DEFAULT="udev.log_priority=3 amdgpu.runpm=0 sysrq_always_enabled=1"
  2. I also added amdgpu to the MODULES section of /etc/mkinitcpio.conf (Arch-based) to pre/early load the kernel driver:
     MODULES=(amdgpu)

Between these two settings, I think #1 could be illuminating… does your system respond the same when the dGPU is not sleeping?
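
For anyone trying the same settings, here is a minimal sketch for confirming after a reboot that the parameter actually took effect. It assumes the running kernel's command line is at /proc/cmdline and that the loaded amdgpu module exposes its runpm parameter at /sys/module/amdgpu/parameters/runpm. The function name check_runpm is made up for illustration.

```sh
# Report whether amdgpu runtime power management has been disabled.
# $1: kernel command line file (assumed default: /proc/cmdline)
# $2: live module parameter file (assumed default shown below)
check_runpm() {
    cmdline_file="${1:-/proc/cmdline}"
    param_file="${2:-/sys/module/amdgpu/parameters/runpm}"

    # -F: match the string literally, -w: whole word only
    if grep -qwF 'amdgpu.runpm=0' "$cmdline_file" 2>/dev/null; then
        echo "cmdline: amdgpu.runpm=0 is set"
    else
        echo "cmdline: amdgpu.runpm=0 is NOT set"
    fi

    # The module parameter reads 0 when runtime PM is disabled
    if [ -r "$param_file" ]; then
        echo "module: runpm=$(cat "$param_file")"
    else
        echo "module: amdgpu parameter not readable (driver not loaded?)"
    fi
}
```

Calling check_runpm with no arguments checks the live system. On an Arch-based install the two settings normally only take effect after regenerating the GRUB config (grub-mkconfig -o /boot/grub/grub.cfg) and the initramfs (mkinitcpio -P), then rebooting.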

I did some further testing and this prevents it from locking up if I plug into the rear port after booting up. Unfortunately, it also adds 6-10W of additional load on the battery while unplugged because the dGPU never goes to sleep. Unacceptable in my mind.

I wish Framework would comment on this issue as it is a pretty major bug.

hmm, well, I’m normally connected to AC (living room PC), but according to my conky that’s monitoring cat /sys/class/drm/card1/device/hwmon/hwmon*/power1_average | awk -v OFMT="%3.2f" '{print $1/1000000 " W"}' the watts bounce between 0-1 W and only increase when I launch a steam game using DRI_PRIME=1 %command%… so strange that you’d be seeing 6-8W… perhaps we are measuring differently?
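
To make the two measurements easier to compare, the one-liner above can be wrapped into a small helper. This is only a sketch: it assumes the dGPU is card1 as in the command above and that power1_average reports microwatts. The names uw_to_w and dgpu_power are made up for illustration.

```sh
# Convert a reading in microwatts to watts with two decimals.
uw_to_w() {
    LC_ALL=C awk -v uw="$1" 'BEGIN { printf "%.2f W\n", uw / 1000000 }'
}

# Print the dGPU's average power draw from hwmon.
# $1: hwmon directory (assumed default: the card1 path used above)
dgpu_power() {
    for f in "${1:-/sys/class/drm/card1/device/hwmon}"/hwmon*/power1_average; do
        if [ -r "$f" ]; then
            uw_to_w "$(cat "$f")"
        fi
    done
}
```

Running dgpu_power prints one line per hwmon node, for example 0.25 W for a reading of 250000. Reading the hwmon file may itself wake a sleeping dGPU, so idle numbers taken this way are best treated with some caution.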

Also, for the time being, I have stopped using the dGPU port entirely. It feels obvious from my thread (and yours) that using the dGPU port brings problems; and I’ll test suggestions/fixes as FW responds to my thread. So I’m curious if my dGPU watts are measuring lower for me because the dGPU port is empty?

There are some days when I feel I’ve paid to beta test a new product… I need more days where I can just enjoy using a fully functional FW16… If I can’t get to that point soon, I may have to consider the 30 day return policy.

Perhaps. I’m by no means doing any exhaustive testing on this. I didn’t save my results yesterday, so I just did it again to provide you some screenshots.
My simple method is boot up into KDE/Plasma on battery, close steam and bitwarden (two things I have in autostart) to get to a basic desktop environment with minimal extras running.

With no sleep (dGPU in “D0” mode):

Then with the “amdgpu.runpm=0” removed (dGPU in “D3cold” mode):

In this simple test, it is an 8 W difference. Nothing connected to the laptop beyond my mouse dongle and the ethernet adapter, screen brightness at 100%, and nothing else going on or open on the desktop. Kept it as close to the same as I could.

In any case, I use my laptop more as a laptop, so the deep sleep is more important to me.

I’m not really a powertop user, but the difference I see between the two images is the MediaTek Wireless device (which I assume is Wifi) at 9.85W versus 110mW.

amdgpu lists at 0mW and 200uW (0.2mW) between the two images.

So to my eye, the difference for the battery drain delta was the Wifi.

I have no doubt that the dGPU has a draw when it’s not sleeping, but if idle D0/D3hot is down in the sub-1 W range… that’s likely a more acceptable delta compared to 0 mW in D3cold.

What’s also confusing based on what I’ve read (see example quote below) is that simply probing the sysfs files for the dGPU also wakes it up… so if powertop is probing all the devices for details while it’s running, is the dGPU ever in D3cold during the power monitoring?

I’m not here to try to prove the facts to you regarding the dGPU using more power when awake. On my laptop and in my use case, I’m just saying keeping it awake all the time is going to significantly reduce my time on battery and it is not an acceptable workaround for me.

Trying to argue that the wifi card is somehow using more power when the dGPU is awake doesn’t change the base problem.

powertop does not wake up the dGPU; that’s why I am using it to check. I just did another check while watching it with the dGPU in D3cold, powertop showing ~20 W. Then I launched nvtop (which DOES wake it up) and saw powertop shoot up to ~27 W with the dGPU in D0.

All these simple checks are enough proof for me. Does the dGPU somehow cause the wifi card to draw more power? Maybe, but then having the gpu sleep is still preferred so my wifi power usage drops back down as well.

Thank you for all the helpful suggestions though. They assisted in narrowing down the quirks of that rear usbc port. Hopefully Framework sees this and takes note to help make it more useful for us.

As a new FW16 laptop owner, I am no expert, but I am willing to explore and learn… especially in areas where I have a related issue… like the dGPU port not working as expected.

More times than not, when I chime in on forum threads I end up learning something too in the exchange, so I like to confirm what I see prior to proposing my ideas/theories. Any proposed theories of course end up being either debunked or confirmed through dialogue, leading to learning either way.

In the end, if our dialogue assisted in narrowing down the quirks of that rear USB-C port, then I’ll consider that an advancement of learning. Not mine, but hey, this wasn’t my post.

For right now, I’m going to focus on the support ticket opened for my partner’s Batch 6 FW16 laptop (arrived a week or so after my Batch 5)… as hers arrived with a damaged/misshapen input module header. The input module plug will not stay connected, leading to a “connect your input module plug into its header” message on power up… making it a very expensive paperweight for her right now until (I’m guessing) she is deemed innocent of creating/causing the header issue.

Through more testing, I can simplify the issue down to these three undesirable behaviors, which depend on power state:

D3cold - connect - D0: results in dGPU lock.
– Plugging a monitor into the rear dGPU USB-C port while the power state is “D3cold” will not detect the external monitor(s). Then, if the dGPU tries to go active (switch from D3cold to D0), it will lock up, and the system fans will quickly ramp up to full blast, shooting hot air out the exhaust vents of the laptop until it is powered down. The only fix for this is to power off and let it sit for at least 20-30 seconds.

Force D0 - connect: results in external working, disconnect will allow D3cold again.
– Holding the dGPU in the D0 (on) state and then plugging into the rear port will work. However, if you make a mistake and it goes back into D3cold, and you reconnect, the fans will ramp up to max, blasting heat out, and it will be in the same state as above.

connect - boot D0: results in external working, but D3cold is permanently blocked.
– Booting up with the rear port connected to a monitor will also work, but the dGPU will never go to sleep again, even after the monitor is disconnected. It remains in the D0 state until the next power cycle.

I get the power state by watching this sysfs file:

/sys/class/drm/card1/device/power_state
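
For anyone who wants to reproduce these three cases, a small polling loop makes the transitions easier to catch than re-running cat by hand. This is a minimal sketch that assumes the dGPU is card1, as in the path above. The function names dgpu_state and watch_dgpu_state are made up for illustration.

```sh
# Read the dGPU's power state (for example D0 or D3cold).
# $1: sysfs file to read (assumed default: the path used in this thread)
dgpu_state() {
    cat "${1:-/sys/class/drm/card1/device/power_state}" 2>/dev/null || echo unknown
}

# Print a timestamped line each time the state changes.
# $1: sysfs file, $2: poll interval in seconds, $3: number of polls (0 = forever)
watch_dgpu_state() {
    file="${1:-/sys/class/drm/card1/device/power_state}"
    interval="${2:-1}"
    count="${3:-0}"
    last=""
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        now=$(dgpu_state "$file")
        if [ "$now" != "$last" ]; then
            echo "$(date +%T) $now"
            last="$now"
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Leaving watch_dgpu_state running in a terminal while plugging in the monitor or launching an application should show at which point the state flips from D3cold to D0, or fails to.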

None of these bugs are desirable in the slightest, and they make using the rear port problematic at best. How could all these issues have made it past testing? The fact that the GPU gets hot and causes the fans to ramp to max is worrying. Is Framework not going to comment on this? Is my machine the only one that is affected by this issue?

Thinking back to my initial dGPU port experience, I was using the laptop post OS install without any external monitor connected.

When I connected a DP module with a cable to my monitor, it stayed blank, so I did not experience any fan ramp-up like your scenario #1… however, when I rebooted with the monitor connected to the dGPU, it did display on the desktop (but not at SDDM/login; I found a way to force xrandr early to fix that, but that is a red herring for your issue).

I might have triggered scenario #2 through my conky, and perhaps when I rebooted with the monitor connected, it might have woken up the dGPU… can’t be sure.

Scenario #3 is how I’ve left my system… However, since I cannot get a game to run properly with an external monitor on the dGPU port (X11 and Wayland had different issues/symptoms), I’ve had to abandon the port for now and plug my external monitor into a side port (iGPU).

In short, and if my dGPU “issues” can be considered “baseline”, your fan ramp-up/lockups are definitely concerning; not what I’ve experienced.

Not sure if that points to the dGPU/port or the cable/module used. @Matt_Hartley is this something you could provide some feedback on?

This makes sense. And to clarify, the fan will not ramp up until I try to utilize the dGPU. At that point, it seems to switch out of D3cold and into D0, then freeze.

Yes, and since you are forcing the dGPU on with that kernel parameter, it avoids situations #1 and #2. This is probably the best route to take to work around the problem.

Not sure about this. I can say, in my testing, the dGPU port works beautifully as long as I avoid the D3cold situation while connecting the monitors. I have a total of 3x1440p@144hz monitors that all worked when connected to a 3-port dongle connected to that rear port. Played quite a bit of Helldivers 2 on the middle of the three this way without any problems. All functionality also works when connected to one of the side display-supporting ports as well.

Thank you for troubleshooting this as well, Daniel. Hopefully together we can get this situation sorted out. I just think not very many people have tested this port yet. Hoping it’s not only a problem with mine.