Thread summary
Since this post is evolving quickly, here is a short tl;dr (with links) of my findings so far:
- The LG monitor’s Thunderbolt connection is unstable, and this is not GPU related - it also works well when daisy-chained via a TB dock
- After the flickering/instability was fixed, I started to get GPU page faults
a) I’m using Arch - the distro failed to pull the firmware fix for Strix Point
b) @Mario_Limonciello made me realize that for some people it may not be obvious that the initramfs has to be regenerated after updating firmware; keep this in mind while trying to remediate the problem
c) It’s always good to verify the running firmware version by inspecting sysfs (alternative helpful commands below)
- Still getting hangs on the older MES firmware (0x80), I’m currently experimenting with the amdgpu.cwsr_enable=0 kernel command-line parameter
a) didn’t help, problem still occurs
b) apparently this may not be the best course of action for Strix Point; if the problem reoccurs I’ll attempt to debug it more properly
- Attempted to decrease the iGPU assigned VRAM to the default 512MB and rely on GTT + decreased the refresh rate, just to change something; didn’t help
- Plugged monitor via DP instead of Thunderbolt, didn’t help either
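When experimenting with a parameter like amdgpu.cwsr_enable=0, it is easy to edit the bootloader config and still boot without it. A minimal sketch for confirming that the parameter reached the running kernel follows; the function name has_cmdline_param and its optional file argument are my own, added only so it can be tried against a sample file.

```shell
# Succeed if the given parameter appears verbatim on the kernel command line.
# $1: parameter to look for, e.g. amdgpu.cwsr_enable=0
# $2: file to read (hypothetical override for testing; defaults to /proc/cmdline)
has_cmdline_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Put each space-separated word on its own line, then look for an exact,
    # whole-line, fixed-string match so that dots are not treated as wildcards.
    tr ' ' '\n' < "$cmdline_file" | grep -Fxq -- "$param"
}
```

Usage: has_cmdline_param amdgpu.cwsr_enable=0 && echo "parameter is active". This only proves the parameter was passed at boot, not that the driver honoured it.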
Checking running firmware version
A big gun for all the AMD cards in the system:
# grep . /sys/module/amdgpu/drivers/pci\:amdgpu/*/fw_version/*
Alternatively, with nicer formatting:
# grep . /sys/kernel/debug/dri/0000:*/amdgpu_firmware_info
The wildcards can be replaced with the PCIe address of a specific device.
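For anyone who wants one line per firmware with the device address up front (handy when comparing before and after a firmware update), here is a rough sketch built on the same sysfs path as the first command. The function name amdgpu_fw_versions and the optional base-directory argument are hypothetical; the argument exists only so the function can be pointed at a fake tree.

```shell
# Print "<pci-address> <firmware-file> <version>" for every amdgpu device.
# $1: base directory (hypothetical override for testing; defaults to the
#     driver directory used in the grep command above)
amdgpu_fw_versions() {
    base="${1:-/sys/module/amdgpu/drivers/pci:amdgpu}"
    for f in "$base"/*/fw_version/*; do
        [ -r "$f" ] || continue   # skip an unmatched glob or unreadable file
        # The directory two levels above the file is named after the PCI address.
        dev=$(basename "$(dirname "$(dirname "$f")")")
        printf '%s %s %s\n' "$dev" "$(basename "$f")" "$(cat "$f")"
    done
}
```

Usage: amdgpu_fw_versions | grep -i mes narrows the output down to the MES entries, assuming the file names contain "mes".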
Original post
I’m starting this exploratory thread as I still have no full understanding of the problem and its scope. Curious if anyone else has similar observations.
TL;DR
After swapping the motherboard for a new one with the Ryzen AI 370, using my LG monitor over Thunderbolt (or in USB3 mode, as it also supports it) is almost impossible due to recurring connection drops / reconnections, even though it worked perfectly fine on the previous generation of the motherboard.
What doesn’t work
After the motherboard upgrade I immediately noticed different behavior when connecting the external screen (LG 38WN95C-W). My first impression was that connecting happens in two phases: the device is enumerated, then it disconnects for 1s and reconnects. It was stable for a few seconds and then fell into disconnect/reconnect flapping for good, which I could break only by disconnecting the screen completely. Even though nothing else changed in my setup, I tried different cables and got the most bizarre results:
- the cable I had used so far resulted in reproducible flapping
- a short, passive TB4 (certified!) cable seemed to work, but PD negotiated only 60W instead of the expected 94W
- an active TB4 cable finally worked as expected and also seemed stable
I continued to dig and attempted multiple things to rule out software problems: bumped the kernel to the newest version, made sure that I run the newest linux firmware, and made sure that the laptop firmware is up to date as well. I shut the laptop down and restarted it. At this point all the cables I tried started to work.
What works, so far
This morning I attempted to use my setup for a longer time and all of the problems came back. I couldn’t get a stable session for longer than 1 minute. Up to this point I had been connecting only to the ports on the left side of the device (both Thunderbolt and USB3) with the same result, so I tried connecting to the right side with the long, active cable. I heard the disconnect/reconnect sounds 4-5 times, but then it clicked and has been stable for ~10-15 minutes so far. It’s not my preferred side of the device though, and frankly, I’m only waiting for the problems to reappear.
My thoughts so far / observations
There are a few things worth mentioning:
- the charging status LED also blinks during the “flapping”
- when booting the laptop with the monitor connected, problems start only after starting the windowed session, so after loading the amdgpu driver
- unloading the ucsi_acpi module didn’t help (I suspected some kernel-PD interaction for a while)
- I’m using the newest available kernel (6.18.2, also the zen variant available in the Arch repository), which was reported to be problematic; haven’t tried an older one
- lastly, it’s worth mentioning that I had problems with this monitor in the past, on a certain ThinkPad model, so it also falls into the suspects bucket
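To double-check module state around these observations (for example, that ucsi_acpi is really gone after unloading it, or whether amdgpu is already loaded at a given point), a small sketch is below. The helper name module_loaded and its optional file argument are hypothetical, added only so it can be tried against a sample file.

```shell
# Succeed if the named module is listed as loaded.
# $1: module name as shown by lsmod, e.g. ucsi_acpi
# $2: file to read (hypothetical override for testing; defaults to /proc/modules)
module_loaded() {
    # The first column of /proc/modules is the module name; exit 0 only
    # when one line matches the requested name exactly.
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

Usage: module_loaded ucsi_acpi || echo "ucsi_acpi is not loaded". Note that this covers loadable modules only; a driver built into the kernel would not show up here.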
None of the above explains why the connection seems more stable on the port on the right side of the laptop, nor why it worked just fine on the older generation of the motherboard.
I’ll continue my observations and experiment a bit with the kernels. I’ll also update this thread once I have more data.