[SOLVED] Thunderbolt disconnects on left side but connects on right side (AMD eGPU won't init) on Arch Linux

Hi,

I’m having trouble getting my AMD eGPU setup to run (I’ve attached some logfile output below). The GPU is detected at the GRUB menu stage. Things go wrong after the kernel loads: I believe I’m getting init errors from the kernel. I’ve added the modules to grub and mkinitcpio.conf. There is also something odd going on with the left-hand Thunderbolt ports: they show the connected eGPU enclosure as disconnected, while the right-hand side shows it as connected.

As a result, I don’t have a working eGPU setup and get no external display output via the eGPU.

Specs:

- Arch Linux
- Framework Laptop (12th gen Intel)
- Radeon RX 6600 XT
- Mantiz Saturn Pro V2

The below commands result in:

lspci -k | grep -A 3 -E "(VGA|3D)"

00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P Integrated Graphics Controller (rev 0c)
Subsystem: Device f111:0002
Kernel driver in use: i915
Kernel modules: i915

06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c1)
Subsystem: ASRock Incorporation Device 5215
Kernel modules: amdgpu
06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller
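
Worth noting in the output above: the Intel GPU has a "Kernel driver in use" line, while the AMD card only lists "Kernel modules: amdgpu", so the module is available but not bound to the device. A rough sketch for checking that from a script follows. `driver_in_use` is a made-up helper name, and the slot `06:00.0` is simply the one from this output.

```shell
#!/bin/sh
# driver_in_use: hypothetical helper, not part of pciutils.
# Reads `lspci -k` output for one device on stdin and prints the bound
# driver name. Prints nothing if no driver is bound.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Example use on a real system (commented out so the snippet has no side effects):
# lspci -k -s 06:00.0 | driver_in_use
```

An empty result for the eGPU slot would match what the dmesg output shows: the amdgpu probe failed, so nothing got bound.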

dmesg | grep amd

output of dmesg
[ 2.449796] [drm] amdgpu kernel modesetting enabled.
[ 2.449954] amdgpu: CRAT table not found
[ 2.449956] amdgpu: Virtual CRAT table created for CPU
[ 2.449964] amdgpu: Topology: Add CPU node
[ 2.450155] amdgpu 0000:06:00.0: enabling device (0000 -> 0003)
[ 2.453889] amdgpu 0000:06:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[ 2.482171] amdgpu 0000:06:00.0: amdgpu: Fetched VBIOS from ROM
[ 2.482172] amdgpu: ATOM BIOS: 113-EXT800288-L04
[ 2.482184] amdgpu 0000:06:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 2.482194] amdgpu 0000:06:00.0: amdgpu: PCIE atomic ops is not supported
[ 2.482253] amdgpu 0000:06:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
[ 2.482254] amdgpu 0000:06:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 2.482255] amdgpu 0000:06:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[ 2.482302] [drm] amdgpu: 8176M of VRAM memory ready
[ 2.482303] [drm] amdgpu: 32009M of GTT memory ready.
[ 2.482328] amdgpu 0000:06:00.0: amdgpu: (-14) failed to allocate kernel bo
[ 2.482336] amdgpu 0000:06:00.0: amdgpu: Failed to DMA MAP the dummy page
[ 2.482337] [drm:amdgpu_device_init.cold [amdgpu]] ERROR sw_init of IP block <gmc_v10_0> failed -12
[ 2.482599] amdgpu 0000:06:00.0: amdgpu: amdgpu_device_ip_init failed
[ 2.482600] amdgpu 0000:06:00.0: amdgpu: Fatal error during GPU init
[ 2.482601] amdgpu 0000:06:00.0: amdgpu: amdgpu: finishing device.
[ 2.483050] amdgpu: probe of 0000:06:00.0 failed with error -12
[ 11.892172] Modules linked in: snd_pcm thunderbolt(+) snd_timer i915(+) intel_vsec processor_thermal_device_pci snd processor_thermal_device soundcore processor_thermal_rfim processor_thermal_mbox processor_thermal_rapl ucsi_acpi intel_rapl_common intel_gtt typec_ucsi igen6_edac typec roles i2c_hid_acpi i2c_hid int3403_thermal int340x_thermal_zone int3400_thermal acpi_thermal_rel acpi_pad vfat fat mac_hid pkcs8_key_parser crypto_user fuse bpf_preload ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq usbhid dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod serio_raw atkbd crct10dif_pclmul libps2 crc32_pclmul polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 vivaldi_fmap aesni_intel crypto_simd nvme cryptd nvme_core spi_intel_pci xhci_pci i8042 spi_intel xhci_pci_renesas nvme_common serio crc32c_intel amdgpu drm_ttm_helper ttm video wmi gpu_sched drm_buddy drm_display_helper cec

boltctl list

TUL TBX-750FA
├─ type: peripheral
├─ name: TBX-750FA
├─ vendor: TUL
├─ uuid: c7010000-0070-6708-2399-b1845a21c801
├─ generation: Thunderbolt 3
├─ status: authorized
│ ├─ domain: c2f18780-706d-49a3-ffff-ffffffffffff
│ ├─ rx speed: 40 Gb/s = 2 lanes * 20 Gb/s
│ ├─ tx speed: 40 Gb/s = 2 lanes * 20 Gb/s
│ └─ authflags: boot
├─ authorized: Wed 04 Jan 2023 22:18:24 UTC
├─ connected: Wed 04 Jan 2023 22:18:24 UTC
└─ stored: Sun 01 Jan 2023 14:02:50 UTC
   ├─ policy: iommu
   └─ key: no

TUL TBX-750FA #2
├─ type: peripheral
├─ name: TBX-750FA
├─ vendor: TUL
├─ uuid: ca030000-0092-9098-20d1-0a84cec3c001
├─ generation: Thunderbolt 3
├─ status: authorized
│ ├─ domain: c2f18780-706d-49a3-ffff-ffffffffffff
│ ├─ rx speed: 40 Gb/s = 2 lanes * 20 Gb/s
│ ├─ tx speed: 40 Gb/s = 2 lanes * 20 Gb/s
│ └─ authflags: boot
├─ authorized: Wed 04 Jan 2023 22:18:24 UTC
├─ connected: Wed 04 Jan 2023 22:18:24 UTC
└─ stored: Sun 01 Jan 2023 14:02:50 UTC
   ├─ policy: iommu
   └─ key: no

If you made it this far, please help.

Update:

I think I may have fixed the issues with Thunderbolt. It’s connected and working on the right-hand side; I haven’t tested the left side yet. @Matt_Hartley, maybe the below is useful for debugging the TB firmware issues with docks and such.

I tried two Linux kernels: linux and linux-zen. I haven’t got the zen kernel working yet, but with the changes below my eGPU setup is working and running quite smoothly.

Changes:

In /etc/default/grub I added intel_iommu=on:

GRUB_CMDLINE_LINUX_DEFAULT="cryptdevice=UUID=* root=/dev/mapper/luks-* loglevel=3 nowatchdog nvme_load=YES intel_iommu=on"
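
A change to /etc/default/grub only takes effect once the GRUB config has been regenerated (on a default Arch install that is usually `grub-mkconfig -o /boot/grub/grub.cfg`, but the output path is an assumption about your setup) and the machine rebooted. Kernel parameters also have to be separated by spaces. Here is a small sketch for confirming the parameter really reached the kernel. `has_kernel_param` is a made-up helper name, not part of any tool.

```shell
#!/bin/sh
# has_kernel_param: hypothetical helper.
# Succeeds if PARAM ($2) appears as a whole space-separated word in the
# kernel command line string CMDLINE ($1).
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use after a reboot (commented out so the snippet has no side effects):
# has_kernel_param "$(cat /proc/cmdline)" intel_iommu=on && echo "intel_iommu=on is active"
```

Because the match is on whole words, a line like `nvme_load=YES,intel_iommu=on` would not count: the kernel would treat that as a single `nvme_load` parameter.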

That by itself didn’t work. But after running the command below (with the UUID of the Thunderbolt device, as shown by boltctl list)

boltctl authorize <uuid-of-TB-device> --chain

reboot

MAGIC it works.
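
For anyone who wants to script this: the UUIDs come from the "uuid:" lines of `boltctl list`. Below is a minimal sketch for pulling them out. `bolt_uuids` is a made-up helper name, and it assumes the output format shown earlier in this post.

```shell
#!/bin/sh
# bolt_uuids: hypothetical helper, not part of bolt.
# Reads `boltctl list` output on stdin and prints one device UUID per line.
# Assumes each device has a line of the form "... uuid: <36-char uuid>".
bolt_uuids() {
    sed -n 's/.*uuid:[[:space:]]*\([0-9a-fA-F-]\{36\}\).*/\1/p'
}

# Example use on a real system (commented out so the snippet has no side effects):
# boltctl list | bolt_uuids
```

With the enclosure from this thread that should print two UUIDs, one per TBX-750FA entry, which can then be passed to the authorize command above.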


Made my day. Was working with eGPU stuff recently. Not using a zen kernel at this point, but this is outstanding progress! Nvidia in my case, but this is great for LUKS users with AMD.

Marking this solved and bookmarking as well. :)
