[RESPONDED] Radeon 7900XTX eGPU can't initialize in Linux (AMD 7840U Framework)

I have a Radeon 7900XTX GPU in a Razer Core X eGPU enclosure. It works fine in Windows on another laptop – in fact, it works even better than I had hoped. However, I mainly intended to use it with my Framework 13 (7840U) running Linux (Fedora Silverblue 39). Unfortunately, it does not work on my Framework. I know just about enough to check dmesg, which shows this when I plug in the eGPU:

[60680.170344] thunderbolt 1-2: new device found, vendor=0x127 device=0x1
[60680.170349] thunderbolt 1-2: Razer Core X
[60680.194282] pcieport 0000:00:04.1: pciehp: Slot(0-1): Card present
[60680.194290] pcieport 0000:00:04.1: pciehp: Slot(0-1): Link Up
[60680.320045] pci 0000:62:00.0: [8086:15da] type 01 class 0x060400
[60680.320128] pci 0000:62:00.0: enabling Extended Tags
[60680.320262] pci 0000:62:00.0: supports D1 D2
[60680.320266] pci 0000:62:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[60680.320305] pci 0000:62:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:04.1 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[60680.320677] pci 0000:62:00.0: Adding to iommu group 3
[60680.322887] pci 0000:62:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[60680.323025] pci 0000:63:01.0: [8086:15da] type 01 class 0x060400
[60680.323090] pci 0000:63:01.0: enabling Extended Tags
[60680.323176] pci 0000:63:01.0: supports D1 D2
[60680.323178] pci 0000:63:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[60680.323242] pci 0000:63:01.0: Adding to iommu group 3
[60680.323352] pci 0000:62:00.0: PCI bridge to [bus 63-c0]
[60680.323361] pci 0000:62:00.0:   bridge window [io  0x0000-0x0fff]
[60680.323365] pci 0000:62:00.0:   bridge window [mem 0x00000000-0x000fffff]
[60680.323373] pci 0000:62:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[60680.323376] pci 0000:63:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[60680.323530] pci 0000:64:00.0: [1002:1478] type 01 class 0x060400
[60680.323556] pci 0000:64:00.0: reg 0x10: [mem 0x00000000-0x00003fff]
[60680.323719] pci 0000:64:00.0: PME# supported from D0 D3hot D3cold
[60680.323752] pci 0000:64:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:04.1 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[60680.323814] pci 0000:64:00.0: Adding to iommu group 3
[60680.326869] pci 0000:63:01.0: PCI bridge to [bus 64-c0]
[60680.326878] pci 0000:63:01.0:   bridge window [io  0x0000-0x0fff]
[60680.326883] pci 0000:63:01.0:   bridge window [mem 0x00000000-0x000fffff]
[60680.326890] pci 0000:63:01.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[60680.326893] pci 0000:64:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[60680.327039] pci 0000:65:00.0: [1002:1479] type 01 class 0x060400
[60680.327213] pci 0000:65:00.0: PME# supported from D0 D3hot D3cold
[60680.327326] pci 0000:65:00.0: Adding to iommu group 3
[60680.327452] pci 0000:64:00.0: PCI bridge to [bus 65-c0]
[60680.327461] pci 0000:64:00.0:   bridge window [io  0x0000-0x0fff]
[60680.327466] pci 0000:64:00.0:   bridge window [mem 0x00000000-0x000fffff]
[60680.327475] pci 0000:64:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[60680.327479] pci 0000:65:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[60680.327626] pci 0000:66:00.0: [1002:744c] type 00 class 0x030000
[60680.327658] pci 0000:66:00.0: reg 0x10: [mem 0x00000000-0x0fffffff 64bit pref]
[60680.327677] pci 0000:66:00.0: reg 0x18: [mem 0x00000000-0x001fffff 64bit pref]
[60680.327689] pci 0000:66:00.0: reg 0x20: [io  0x0000-0x00ff]
[60680.327700] pci 0000:66:00.0: reg 0x24: [mem 0x00000000-0x000fffff]
[60680.327711] pci 0000:66:00.0: reg 0x30: [mem 0x00000000-0x0001ffff pref]
[60680.327816] pci 0000:66:00.0: PME# supported from D1 D2 D3hot D3cold
[60680.327855] pci 0000:66:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:04.1 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[60680.327946] pci 0000:66:00.0: Adding to iommu group 3
[60680.328001] pci 0000:66:00.0: vgaarb: bridge control possible
[60680.328003] pci 0000:66:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[60680.328074] pci 0000:66:00.1: [1002:ab30] type 00 class 0x040300
[60680.328099] pci 0000:66:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[60680.328276] pci 0000:66:00.1: PME# supported from D1 D2 D3hot D3cold
[60680.328409] pci 0000:66:00.1: Adding to iommu group 3
[60680.328568] pci 0000:65:00.0: PCI bridge to [bus 66-c0]
[60680.328578] pci 0000:65:00.0:   bridge window [io  0x0000-0x0fff]
[60680.328584] pci 0000:65:00.0:   bridge window [mem 0x00000000-0x000fffff]
[60680.328594] pci 0000:65:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[60680.328596] pci_bus 0000:66: busn_res: [bus 66-c0] end is updated to 66
[60680.328603] pci_bus 0000:65: busn_res: [bus 65-c0] end is updated to 66
[60680.328609] pci_bus 0000:64: busn_res: [bus 64-c0] end is updated to 66
[60680.328614] pci_bus 0000:63: busn_res: [bus 63-c0] end is updated to 66
[60680.328626] pci 0000:62:00.0: BAR 15: assigned [mem 0x7000000000-0x7fffffffff 64bit pref]
[60680.328628] pci 0000:62:00.0: BAR 14: assigned [mem 0x60000000-0x77ffffff]
[60680.328630] pci 0000:62:00.0: BAR 13: assigned [io  0x2000-0x5fff]
[60680.328633] pci 0000:63:01.0: BAR 15: assigned [mem 0x7000000000-0x7fffffffff 64bit pref]
[60680.328635] pci 0000:63:01.0: BAR 14: assigned [mem 0x60000000-0x77ffffff]
[60680.328637] pci 0000:63:01.0: BAR 13: assigned [io  0x2000-0x5fff]
[60680.328639] pci 0000:64:00.0: BAR 15: assigned [mem 0x7000000000-0x7fffffffff 64bit pref]
[60680.328641] pci 0000:64:00.0: BAR 14: assigned [mem 0x60000000-0x77efffff]
[60680.328643] pci 0000:64:00.0: BAR 0: assigned [mem 0x77f00000-0x77f03fff]
[60680.328649] pci 0000:64:00.0: BAR 13: assigned [io  0x2000-0x5fff]
[60680.328651] pci 0000:65:00.0: BAR 15: assigned [mem 0x7000000000-0x7fffffffff 64bit pref]
[60680.328653] pci 0000:65:00.0: BAR 14: assigned [mem 0x60000000-0x77efffff]
[60680.328655] pci 0000:65:00.0: BAR 13: assigned [io  0x2000-0x5fff]
[60680.328658] pci 0000:66:00.0: BAR 0: assigned [mem 0x7000000000-0x700fffffff 64bit pref]
[60680.328674] pci 0000:66:00.0: BAR 2: assigned [mem 0x7010000000-0x70101fffff 64bit pref]
[60680.328691] pci 0000:66:00.0: BAR 5: assigned [mem 0x60000000-0x600fffff]
[60680.328697] pci 0000:66:00.0: BAR 6: assigned [mem 0x60100000-0x6011ffff pref]
[60680.328699] pci 0000:66:00.1: BAR 0: assigned [mem 0x60120000-0x60123fff]
[60680.328705] pci 0000:66:00.0: BAR 4: assigned [io  0x2000-0x20ff]
[60680.328711] pci 0000:65:00.0: PCI bridge to [bus 66]
[60680.328715] pci 0000:65:00.0:   bridge window [io  0x2000-0x5fff]
[60680.328723] pci 0000:65:00.0:   bridge window [mem 0x60000000-0x77efffff]
[60680.328729] pci 0000:65:00.0:   bridge window [mem 0x7000000000-0x7fffffffff 64bit pref]
[60680.328739] pci 0000:64:00.0: PCI bridge to [bus 65-66]
[60680.328742] pci 0000:64:00.0:   bridge window [io  0x2000-0x5fff]
[60680.328750] pci 0000:64:00.0:   bridge window [mem 0x60000000-0x77efffff]
[60680.328756] pci 0000:64:00.0:   bridge window [mem 0x7000000000-0x7fffffffff 64bit pref]
[60680.328766] pci 0000:63:01.0: PCI bridge to [bus 64-66]
[60680.328769] pci 0000:63:01.0:   bridge window [io  0x2000-0x5fff]
[60680.328775] pci 0000:63:01.0:   bridge window [mem 0x60000000-0x77ffffff]
[60680.328779] pci 0000:63:01.0:   bridge window [mem 0x7000000000-0x7fffffffff 64bit pref]
[60680.328787] pci 0000:62:00.0: PCI bridge to [bus 63-66]
[60680.328790] pci 0000:62:00.0:   bridge window [io  0x2000-0x5fff]
[60680.328795] pci 0000:62:00.0:   bridge window [mem 0x60000000-0x77ffffff]
[60680.328799] pci 0000:62:00.0:   bridge window [mem 0x7000000000-0x7fffffffff 64bit pref]
[60680.328807] pcieport 0000:00:04.1: PCI bridge to [bus 62-c0]
[60680.328809] pcieport 0000:00:04.1:   bridge window [io  0x2000-0x5fff]
[60680.328812] pcieport 0000:00:04.1:   bridge window [mem 0x60000000-0x77ffffff]
[60680.328814] pcieport 0000:00:04.1:   bridge window [mem 0x7000000000-0x7fffffffff 64bit pref]
[60680.329112] pcieport 0000:62:00.0: enabling device (0000 -> 0003)
[60680.329251] pcieport 0000:63:01.0: enabling device (0000 -> 0003)
[60680.329580] pcieport 0000:64:00.0: enabling device (0000 -> 0003)
[60680.329701] pcieport 0000:65:00.0: enabling device (0000 -> 0003)
[60680.330017] amdgpu 0000:66:00.0: enabling device (0000 -> 0003)
[60680.330054] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x744C 0x1458:0x240E 0xC8).
[60680.330158] [drm] register mmio base: 0x60000000
[60680.330159] [drm] register mmio size: 1048576
[60680.330241] [drm:amdgpu_discovery_set_ip_blocks [amdgpu]] *ERROR* amdgpu_discovery_init failed
[60680.330458] amdgpu 0000:66:00.0: amdgpu: Fatal error during GPU init
[60680.330461] amdgpu 0000:66:00.0: amdgpu: amdgpu: finishing device.
[60680.330502] amdgpu: probe of 0000:66:00.0 failed with error -22
[60680.330666] pci 0000:66:00.1: D0 power state depends on 0000:66:00.0
[60680.330700] snd_hda_intel 0000:66:00.1: enabling device (0000 -> 0002)
[60680.330752] snd_hda_intel 0000:66:00.1: Handle vga_switcheroo audio client
[60680.330754] snd_hda_intel 0000:66:00.1: Force to non-snoop mode
[60680.336667] input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:04.1/0000:62:00.0/0000:63:01.0/0000:64:00.0/0000:65:00.0/0000:66:00.1/sound/card2/input31
[60680.336743] input: HDA ATI HDMI HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:04.1/0000:62:00.0/0000:63:01.0/0000:64:00.0/0000:65:00.0/0000:66:00.1/sound/card2/input32
[60680.336801] input: HDA ATI HDMI HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:04.1/0000:62:00.0/0000:63:01.0/0000:64:00.0/0000:65:00.0/0000:66:00.1/sound/card2/input33
[60680.336866] input: HDA ATI HDMI HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:04.1/0000:62:00.0/0000:63:01.0/0000:64:00.0/0000:65:00.0/0000:66:00.1/sound/card2/input34
[60683.920580] ucsi_acpi USBC000:00: ucsi_handle_connector_change: ACK failed (-110)

This seems to be the key error, but I’m not finding much about it online:

[60680.330241] [drm:amdgpu_discovery_set_ip_blocks [amdgpu]] *ERROR* amdgpu_discovery_init failed

I did find this issue, which seemed really promising because I was also running without a monitor. But then I realized I could plug the eGPU into my TV, and that did not change anything. The TV works with my Windows laptop + eGPU, so it doesn’t seem to be a problem with the eGPU or the display.
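For anyone wanting to pull the same failure lines out of a long dmesg dump, here is a minimal Python sketch. It assumes the default `[seconds.micros]` timestamp prefix and treats a small, hand-picked list of substrings as failure markers. The function name `find_failures` and the marker list are made up for illustration and are not any kernel or tool convention.

```python
import re

# Matches the "[seconds.micros]" timestamp prefix that dmesg prints by default.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# Substrings treated as failure markers here; this list is an assumption.
FAILURE_MARKERS = ("*ERROR*", "Fatal error", "failed")


def find_failures(log_text, driver="amdgpu"):
    """Return (timestamp, message) pairs for lines that mention the
    driver name and contain at least one failure marker."""
    hits = []
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a dmesg-style timestamp
        stamp, message = float(match.group(1)), match.group(2)
        if driver in message and any(m in message for m in FAILURE_MARKERS):
            hits.append((stamp, message))
    return hits


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
[60680.330017] amdgpu 0000:66:00.0: enabling device (0000 -> 0003)
[60680.330241] [drm:amdgpu_discovery_set_ip_blocks [amdgpu]] *ERROR* amdgpu_discovery_init failed
[60680.330458] amdgpu 0000:66:00.0: amdgpu: Fatal error during GPU init
[60680.330502] amdgpu: probe of 0000:66:00.0 failed with error -22
[60683.920580] ucsi_acpi USBC000:00: ucsi_handle_connector_change: ACK failed (-110)
"""

if __name__ == "__main__":
    for stamp, message in find_failures(SAMPLE):
        print(f"{stamp:.6f}  {message}")
```

On the sample, this keeps the three amdgpu failure lines and drops the unrelated ucsi_acpi timeout. The `-22` in the probe line is the kernel's EINVAL, which on its own does not say why discovery failed.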

Any ideas? Beyond dmesg and boltctl (which showed it as disconnected) I have no good clues about where to start.

Well that part is definitely a bit weird.

@Adrian_Joachim Thanks for that input. I did not have any expectations for what would be weird so I didn’t try that more than once, but I checked again and now it does say connected in boltctl:

 ● Razer Core X
   ├─ type:          peripheral
   ├─ name:          Core X
   ├─ vendor:        Razer
   ├─ uuid:          e2030000-00b0-a518-a389-05840632b301
   ├─ generation:    Thunderbolt 3
   ├─ status:        authorized
   │  ├─ domain:     8b8e3804-b1a5-d24e-ffff-ffffffffffff
   │  ├─ rx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  ├─ tx speed:   40 Gb/s = 2 lanes * 20 Gb/s
   │  └─ authflags:  none
   ├─ authorized:    Fri 01 Dec 2023 08:08:51 PM UTC
   ├─ connected:     Fri 01 Dec 2023 08:08:51 PM UTC
   └─ stored:        Thu 30 Nov 2023 11:06:09 PM UTC
      ├─ policy:     iommu
      └─ key:        no

Nothing different in dmesg though. :frowning:
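As a side note on the "available PCIe bandwidth" lines in the dmesg output: the figures can be reproduced from the line rate, the lane count, and the encoding overhead (8b/10b at 2.5 GT/s, 128b/130b at 8 and 16 GT/s). The sketch below is only a worked version of that arithmetic. The integer division is an assumption chosen because it matches the printed numbers, and the function names are made up for illustration. Whether the reported 2.5 GT/s x1 link reflects real throughput over a Thunderbolt tunnel is a separate question that this does not answer.

```python
# Encoding efficiency per PCIe line rate in MT/s:
# 8b/10b for 2.5 and 5 GT/s, 128b/130b for 8 and 16 GT/s.
ENCODING = {
    2500: (8, 10),
    5000: (8, 10),
    8000: (128, 130),
    16000: (128, 130),
}


def pcie_bandwidth_mbps(rate_mts, lanes):
    """Usable bandwidth in Mb/s for a link at the given rate and width."""
    num, den = ENCODING[rate_mts]
    # Integer division per lane; assumed here because it reproduces
    # the figures printed in the log.
    per_lane = rate_mts * num // den
    return per_lane * lanes


def as_gbps(mbps):
    """Format Mb/s the way the log prints it, e.g. 2000 -> '2.000 Gb/s'."""
    return f"{mbps // 1000}.{mbps % 1000:03d} Gb/s"


if __name__ == "__main__":
    for rate, lanes in [(2500, 1), (8000, 4), (16000, 16)]:
        print(f"{rate / 1000} GT/s x{lanes}: {as_gbps(pcie_bandwidth_mbps(rate, lanes))}")
```

The three cases correspond to the limiting link at 0000:00:04.1, the Thunderbolt bridge's capability, and the GPU's capability as they appear in the log.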

That does definitely look healthier.

Gimme a bit, I’ll have a look at what the dmesg looks like with my 5700 XT (not gonna pull the 7900 XTX out of my PC, it’s watercooled, but it did work in the eGPU enclosure with the T480s).

Edit: here you go, not really sure what’s going on in your case either.

[  118.690144] thunderbolt 1-2: new device found, vendor=0x8086 device=0x2
[  118.690151] thunderbolt 1-2: Intel Tamales Module 2
[  118.698087] pcieport 0000:00:04.1: pciehp: Slot(0-1): Card present
[  118.698090] pcieport 0000:00:04.1: pciehp: Slot(0-1): Link Up
[  118.831599] pci 0000:62:00.0: [8086:15ef] type 01 class 0x060400
[  118.831696] pci 0000:62:00.0: enabling Extended Tags
[  118.831855] pci 0000:62:00.0: supports D1 D2
[  118.831859] pci 0000:62:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[  118.831904] pci 0000:62:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:04.1 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[  118.832254] pci 0000:62:00.0: Adding to iommu group 3
[  118.841562] pci 0000:62:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[  118.841814] pci 0000:63:01.0: [8086:15ef] type 01 class 0x060400
[  118.841908] pci 0000:63:01.0: enabling Extended Tags
[  118.842650] pci 0000:63:01.0: supports D1 D2
[  118.842652] pci 0000:63:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[  118.842751] pci 0000:63:01.0: Adding to iommu group 3
[  118.842883] pci 0000:63:02.0: [8086:15ef] type 01 class 0x060400
[  118.842962] pci 0000:63:02.0: enabling Extended Tags
[  118.843059] pci 0000:63:02.0: supports D1 D2
[  118.843060] pci 0000:63:02.0: PME# supported from D0 D1 D2 D3hot D3cold
[  118.843134] pci 0000:63:02.0: Adding to iommu group 3
[  118.843259] pci 0000:63:04.0: [8086:15ef] type 01 class 0x060400
[  118.843337] pci 0000:63:04.0: enabling Extended Tags
[  118.843740] pci 0000:63:04.0: supports D1 D2
[  118.843741] pci 0000:63:04.0: PME# supported from D0 D1 D2 D3hot D3cold
[  118.843858] pci 0000:63:04.0: Adding to iommu group 3
[  118.843980] pci 0000:62:00.0: PCI bridge to [bus 63-c0]
[  118.843991] pci 0000:62:00.0:   bridge window [io  0x0000-0x0fff]
[  118.843996] pci 0000:62:00.0:   bridge window [mem 0x00000000-0x000fffff]
[  118.844006] pci 0000:62:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[  118.844009] pci 0000:63:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[  118.844020] pci 0000:63:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[  118.844031] pci 0000:63:04.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[  118.844214] pci 0000:64:00.0: [1002:1478] type 01 class 0x060400
[  118.844256] pci 0000:64:00.0: reg 0x10: [mem 0x00000000-0x00003fff]
[  118.844505] pci 0000:64:00.0: PME# supported from D0 D3hot D3cold
[  118.844551] pci 0000:64:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:04.1 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[  118.844777] pci 0000:64:00.0: Adding to iommu group 3
[  118.844890] pci 0000:63:01.0: PCI bridge to [bus 64-c0]
[  118.844899] pci 0000:63:01.0:   bridge window [io  0x0000-0x0fff]
[  118.844905] pci 0000:63:01.0:   bridge window [mem 0x00000000-0x000fffff]
[  118.844914] pci 0000:63:01.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[  118.844919] pci 0000:64:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[  118.845151] pci 0000:65:00.0: [1002:1479] type 01 class 0x060400
[  118.845450] pci 0000:65:00.0: PME# supported from D0 D3hot D3cold
[  118.845604] pci 0000:65:00.0: Adding to iommu group 3
[  118.845752] pci 0000:64:00.0: PCI bridge to [bus 65-c0]
[  118.845766] pci 0000:64:00.0:   bridge window [io  0x0000-0x0fff]
[  118.845775] pci 0000:64:00.0:   bridge window [mem 0x00000000-0x000fffff]
[  118.845789] pci 0000:64:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[  118.845794] pci 0000:65:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[  118.846028] pci 0000:66:00.0: [1002:731f] type 00 class 0x030000
[  118.846082] pci 0000:66:00.0: reg 0x10: [mem 0x00000000-0x0fffffff 64bit pref]
[  118.846119] pci 0000:66:00.0: reg 0x18: [mem 0x00000000-0x001fffff 64bit pref]
[  118.846137] pci 0000:66:00.0: reg 0x20: [io  0x0000-0x00ff]
[  118.846156] pci 0000:66:00.0: reg 0x24: [mem 0x00000000-0x0007ffff]
[  118.846174] pci 0000:66:00.0: reg 0x30: [mem 0x00000000-0x0001ffff pref]
[  118.846354] pci 0000:66:00.0: PME# supported from D1 D2 D3hot D3cold
[  118.846413] pci 0000:66:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:04.1 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
[  118.846519] pci 0000:66:00.0: Adding to iommu group 3
[  118.846578] pci 0000:66:00.0: vgaarb: bridge control possible
[  118.846579] pci 0000:66:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[  118.846686] pci 0000:66:00.1: [1002:ab38] type 00 class 0x040300
[  118.846725] pci 0000:66:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[  118.846993] pci 0000:66:00.1: PME# supported from D1 D2 D3hot D3cold
[  118.847109] pci 0000:66:00.1: Adding to iommu group 3
[  118.847297] pci 0000:65:00.0: PCI bridge to [bus 66-c0]
[  118.847313] pci 0000:65:00.0:   bridge window [io  0x0000-0x0fff]
[  118.847321] pci 0000:65:00.0:   bridge window [mem 0x00000000-0x000fffff]
[  118.847336] pci 0000:65:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[  118.847338] pci_bus 0000:66: busn_res: [bus 66-c0] end is updated to 66
[  118.847347] pci_bus 0000:65: busn_res: [bus 65-c0] end is updated to 66
[  118.847355] pci_bus 0000:64: busn_res: [bus 64-c0] end is updated to 91
[  118.847504] pci 0000:92:00.0: [8086:15f0] type 00 class 0x0c0330
[  118.847532] pci 0000:92:00.0: reg 0x10: [mem 0x00000000-0x0000ffff]
[  118.847615] pci 0000:92:00.0: enabling Extended Tags
[  118.847722] pci 0000:92:00.0: supports D1 D2
[  118.847724] pci 0000:92:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[  118.847762] pci 0000:92:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:04.1 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[  118.848248] pci 0000:92:00.0: Adding to iommu group 3
[  118.848364] pci 0000:63:02.0: PCI bridge to [bus 92-c0]
[  118.848374] pci 0000:63:02.0:   bridge window [io  0x0000-0x0fff]
[  118.848379] pci 0000:63:02.0:   bridge window [mem 0x00000000-0x000fffff]
[  118.848388] pci 0000:63:02.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[  118.848390] pci_bus 0000:92: busn_res: [bus 92-c0] end is updated to 92
[  118.848651] pci 0000:63:04.0: PCI bridge to [bus 93-c0]
[  118.848661] pci 0000:63:04.0:   bridge window [io  0x0000-0x0fff]
[  118.848666] pci 0000:63:04.0:   bridge window [mem 0x00000000-0x000fffff]
[  118.848675] pci 0000:63:04.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[  118.848676] pci_bus 0000:93: busn_res: [bus 93-c0] end is updated to c0
[  118.848687] pci_bus 0000:63: busn_res: [bus 63-c0] end is updated to c0
[  118.848696] pci 0000:63:04.0: bridge window [mem 0x00100000-0x001fffff 64bit pref] to [bus 93-c0] add_size 100000 add_align 100000
[  118.848699] pci 0000:63:04.0: bridge window [mem 0x00100000-0x001fffff] to [bus 93-c0] add_size 100000 add_align 100000
[  118.848702] pci 0000:62:00.0: bridge window [mem 0x00100000-0x004fffff] to [bus 63-c0] add_size 100000 add_align 100000
[  118.848709] pci 0000:62:00.0: BAR 15: assigned [mem 0x6800000000-0x77ffffffff 64bit pref]
[  118.848711] pci 0000:62:00.0: BAR 14: assigned [mem 0x60000000-0x77ffffff]
[  118.848713] pci 0000:62:00.0: BAR 13: assigned [io  0x2000-0x5fff]
[  118.848716] pci 0000:63:01.0: BAR 15: assigned [mem 0x6800000000-0x70001fffff 64bit pref]
[  118.848718] pci 0000:63:01.0: BAR 14: assigned [mem 0x60000000-0x6bffffff]
[  118.848720] pci 0000:63:02.0: BAR 14: assigned [mem 0x6c000000-0x6c0fffff]
[  118.848722] pci 0000:63:02.0: BAR 15: assigned [mem 0x7000200000-0x70002fffff 64bit pref]
[  118.848723] pci 0000:63:04.0: BAR 14: assigned [mem 0x6c100000-0x77ffffff]
[  118.848725] pci 0000:63:04.0: BAR 15: assigned [mem 0x7000300000-0x77f42fffff 64bit pref]
[  118.848727] pci 0000:63:01.0: BAR 13: assigned [io  0x2000-0x2fff]
[  118.848729] pci 0000:63:02.0: BAR 13: assigned [io  0x3000-0x3fff]
[  118.848730] pci 0000:63:04.0: BAR 13: assigned [io  0x4000-0x4fff]
[  118.848733] pci 0000:64:00.0: BAR 15: assigned [mem 0x6800000000-0x6fffffffff 64bit pref]
[  118.848734] pci 0000:64:00.0: BAR 14: assigned [mem 0x60000000-0x6befffff]
[  118.848736] pci 0000:64:00.0: BAR 0: assigned [mem 0x6bf00000-0x6bf03fff]
[  118.848745] pci 0000:64:00.0: BAR 13: assigned [io  0x2000-0x2fff]
[  118.848747] pci 0000:65:00.0: BAR 15: assigned [mem 0x6800000000-0x6fffffffff 64bit pref]
[  118.848749] pci 0000:65:00.0: BAR 14: assigned [mem 0x60000000-0x6befffff]
[  118.848751] pci 0000:65:00.0: BAR 13: assigned [io  0x2000-0x2fff]
[  118.848754] pci 0000:66:00.0: BAR 0: assigned [mem 0x6800000000-0x680fffffff 64bit pref]
[  118.848779] pci 0000:66:00.0: BAR 2: assigned [mem 0x6810000000-0x68101fffff 64bit pref]
[  118.848806] pci 0000:66:00.0: BAR 5: assigned [mem 0x60000000-0x6007ffff]
[  118.848816] pci 0000:66:00.0: BAR 6: assigned [mem 0x60080000-0x6009ffff pref]
[  118.848818] pci 0000:66:00.1: BAR 0: assigned [mem 0x600a0000-0x600a3fff]
[  118.848828] pci 0000:66:00.0: BAR 4: assigned [io  0x2000-0x20ff]
[  118.848837] pci 0000:65:00.0: PCI bridge to [bus 66]
[  118.848842] pci 0000:65:00.0:   bridge window [io  0x2000-0x2fff]
[  118.848853] pci 0000:65:00.0:   bridge window [mem 0x60000000-0x6befffff]
[  118.848861] pci 0000:65:00.0:   bridge window [mem 0x6800000000-0x6fffffffff 64bit pref]
[  118.848876] pci 0000:64:00.0: PCI bridge to [bus 65-66]
[  118.848880] pci 0000:64:00.0:   bridge window [io  0x2000-0x2fff]
[  118.848892] pci 0000:64:00.0:   bridge window [mem 0x60000000-0x6befffff]
[  118.848900] pci 0000:64:00.0:   bridge window [mem 0x6800000000-0x6fffffffff 64bit pref]
[  118.848914] pci 0000:63:01.0: PCI bridge to [bus 64-91]
[  118.848917] pci 0000:63:01.0:   bridge window [io  0x2000-0x2fff]
[  118.848924] pci 0000:63:01.0:   bridge window [mem 0x60000000-0x6bffffff]
[  118.848930] pci 0000:63:01.0:   bridge window [mem 0x6800000000-0x70001fffff 64bit pref]
[  118.848940] pci 0000:92:00.0: BAR 0: assigned [mem 0x6c000000-0x6c00ffff]
[  118.848948] pci 0000:63:02.0: PCI bridge to [bus 92]
[  118.848952] pci 0000:63:02.0:   bridge window [io  0x3000-0x3fff]
[  118.848964] pci 0000:63:02.0:   bridge window [mem 0x6c000000-0x6c0fffff]
[  118.848969] pci 0000:63:02.0:   bridge window [mem 0x7000200000-0x70002fffff 64bit pref]
[  118.848978] pci 0000:63:04.0: PCI bridge to [bus 93-c0]
[  118.848981] pci 0000:63:04.0:   bridge window [io  0x4000-0x4fff]
[  118.848988] pci 0000:63:04.0:   bridge window [mem 0x6c100000-0x77ffffff]
[  118.848993] pci 0000:63:04.0:   bridge window [mem 0x7000300000-0x77f42fffff 64bit pref]
[  118.849002] pci 0000:62:00.0: PCI bridge to [bus 63-c0]
[  118.849005] pci 0000:62:00.0:   bridge window [io  0x2000-0x5fff]
[  118.849012] pci 0000:62:00.0:   bridge window [mem 0x60000000-0x77ffffff]
[  118.849017] pci 0000:62:00.0:   bridge window [mem 0x6800000000-0x77ffffffff 64bit pref]
[  118.849026] pcieport 0000:00:04.1: PCI bridge to [bus 62-c0]
[  118.849028] pcieport 0000:00:04.1:   bridge window [io  0x2000-0x5fff]
[  118.849031] pcieport 0000:00:04.1:   bridge window [mem 0x60000000-0x77ffffff]
[  118.849034] pcieport 0000:00:04.1:   bridge window [mem 0x6800000000-0x77ffffffff 64bit pref]
[  118.849399] pcieport 0000:62:00.0: enabling device (0000 -> 0003)
[  118.849571] pcieport 0000:63:01.0: enabling device (0000 -> 0003)
[  118.849868] pcieport 0000:63:01.0: pciehp: Slot #1 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
[  118.850533] pcieport 0000:63:02.0: enabling device (0000 -> 0003)
[  118.850828] pcieport 0000:63:04.0: enabling device (0000 -> 0003)
[  118.851015] pcieport 0000:63:04.0: pciehp: Slot #4 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
[  118.851995] pcieport 0000:64:00.0: enabling device (0000 -> 0003)
[  118.852150] pcieport 0000:65:00.0: enabling device (0000 -> 0003)
[  118.852427] pci 0000:66:00.0: disabling ATS
[  118.852489] amdgpu 0000:66:00.0: enabling device (0000 -> 0003)
[  118.852537] [drm] initializing kernel modesetting (NAVI10 0x1002:0x731F 0x1002:0x0B36 0xC1).
[  118.852639] [drm] register mmio base: 0x60000000
[  118.852640] [drm] register mmio size: 524288
[  121.731218] [drm] add ip block number 0 <nv_common>
[  121.731222] [drm] add ip block number 1 <gmc_v10_0>
[  121.731224] [drm] add ip block number 2 <navi10_ih>
[  121.731225] [drm] add ip block number 3 <psp>
[  121.731226] [drm] add ip block number 4 <smu>
[  121.731228] [drm] add ip block number 5 <dm>
[  121.731229] [drm] add ip block number 6 <gfx_v10_0>
[  121.731231] [drm] add ip block number 7 <sdma_v5_0>
[  121.731232] [drm] add ip block number 8 <vcn_v2_0>
[  121.731233] [drm] add ip block number 9 <jpeg_v2_0>
[  121.731252] amdgpu 0000:66:00.0: amdgpu: ACPI VFCT table present but broken (too short #2),skipping
[  121.872439] amdgpu 0000:66:00.0: amdgpu: Fetched VBIOS from ROM BAR
[  121.872442] amdgpu: ATOM BIOS: 113-D1820501-101
[  121.887797] [drm] VCN decode is enabled in VM mode
[  121.887801] [drm] VCN encode is enabled in VM mode
[  121.889811] [drm] JPEG decode is enabled in VM mode
[  121.889817] amdgpu 0000:66:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[  121.889836] amdgpu 0000:66:00.0: amdgpu: PCIE atomic ops is not supported
[  121.889842] [drm] GPU posting now...
[  121.889940] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[  121.889956] amdgpu 0000:66:00.0: BAR 2: releasing [mem 0x6810000000-0x68101fffff 64bit pref]
[  121.889959] amdgpu 0000:66:00.0: BAR 0: releasing [mem 0x6800000000-0x680fffffff 64bit pref]
[  121.889966] amdgpu 0000:66:00.0: BAR 0: assigned [mem 0x6800000000-0x680fffffff 64bit pref]
[  121.889984] amdgpu 0000:66:00.0: BAR 2: assigned [mem 0x6810000000-0x68101fffff 64bit pref]
[  121.890008] amdgpu 0000:66:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
[  121.890011] amdgpu 0000:66:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[  121.890012] amdgpu 0000:66:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[  121.890036] [drm] Detected VRAM RAM=8176M, BAR=256M
[  121.890037] [drm] RAM width 256bits GDDR6
[  121.890653] [drm] amdgpu: 8176M of VRAM memory ready
[  121.890655] [drm] amdgpu: 29979M of GTT memory ready.
[  121.890677] [drm] GART: num cpu pages 131072, num gpu pages 131072
[  121.890903] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[  121.892262] [drm] Found VCN firmware Version ENC: 1.21 DEC: 6 VEP: 0 Revision: 0
[  121.892271] amdgpu 0000:66:00.0: amdgpu: Will use PSP to load VCN firmware
[  121.947465] [drm] reserve 0x900000 from 0x81fd000000 for PSP TMR
[  121.991060] amdgpu 0000:66:00.0: amdgpu: RAS: optional ras ta ucode is not available
[  121.996920] amdgpu 0000:66:00.0: amdgpu: RAP: optional rap ta ucode is not available
[  121.996922] amdgpu 0000:66:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[  121.997020] amdgpu 0000:66:00.0: amdgpu: use vbios provided pptable
[  121.997022] amdgpu 0000:66:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.5
[  122.033484] amdgpu 0000:66:00.0: amdgpu: SMU is initialized successfully!
[  122.033727] [drm] Display Core v3.2.247 initialized on DCN 2.0
[  122.033729] [drm] DP-HDMI FRL PCON supported
[  122.046458] [drm] kiq ring mec 2 pipe 1 q 0
[  122.050002] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[  122.050203] [drm] JPEG decode initialized successfully.
[  122.093084] amdgpu: HMM registered 8176MB device memory
[  122.094230] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[  122.094244] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[  122.094587] amdgpu: Virtual CRAT table created for GPU
[  122.094921] amdgpu: Topology: Add dGPU node [0x731f:0x1002]
[  122.094923] kfd kfd: amdgpu: added device 1002:731f
[  122.094958] amdgpu 0000:66:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 10, active_cu_number 40
[  122.095250] amdgpu 0000:66:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[  122.095251] amdgpu 0000:66:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[  122.095252] amdgpu 0000:66:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[  122.095253] amdgpu 0000:66:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[  122.095254] amdgpu 0000:66:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[  122.095255] amdgpu 0000:66:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[  122.095256] amdgpu 0000:66:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[  122.095257] amdgpu 0000:66:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[  122.095258] amdgpu 0000:66:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[  122.095259] amdgpu 0000:66:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[  122.095260] amdgpu 0000:66:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[  122.095261] amdgpu 0000:66:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[  122.095262] amdgpu 0000:66:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 8
[  122.095263] amdgpu 0000:66:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 8
[  122.095263] amdgpu 0000:66:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 8
[  122.095264] amdgpu 0000:66:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[  122.096962] amdgpu 0000:66:00.0: amdgpu: Using BOCO for runtime pm
[  122.098505] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:66:00.0 on minor 0
[  122.100757] amdgpu 0000:66:00.0: [drm] Cannot find any crtc or sizes
[  122.101016] pci 0000:66:00.1: D0 power state depends on 0000:66:00.0
[  122.101068] snd_hda_intel 0000:66:00.1: enabling device (0000 -> 0002)
[  122.101132] snd_hda_intel 0000:66:00.1: Handle vga_switcheroo audio client
[  122.101133] snd_hda_intel 0000:66:00.1: Force to non-snoop mode
[  122.101177] pci 0000:92:00.0: enabling device (0000 -> 0002)
[  122.101351] xhci_hcd 0000:92:00.0: xHCI Host Controller
[  122.101360] xhci_hcd 0000:92:00.0: new USB bus registered, assigned bus number 9
[  122.102568] xhci_hcd 0000:92:00.0: hcc params 0x200077c1 hci version 0x110 quirks 0x0000000200009810
[  122.103109] xhci_hcd 0000:92:00.0: xHCI Host Controller
[  122.103112] xhci_hcd 0000:92:00.0: new USB bus registered, assigned bus number 10
[  122.103114] xhci_hcd 0000:92:00.0: Host supports USB 3.1 Enhanced SuperSpeed
[  122.103395] usb usb9: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.06
[  122.103397] usb usb9: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[  122.103399] usb usb9: Product: xHCI Host Controller
[  122.103401] usb usb9: Manufacturer: Linux 6.6.3-arch1-1 xhci-hcd
[  122.103402] usb usb9: SerialNumber: 0000:92:00.0
[  122.103539] hub 9-0:1.0: USB hub found
[  122.103549] hub 9-0:1.0: 2 ports detected
[  122.103763] usb usb10: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 6.06
[  122.103764] usb usb10: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[  122.103766] usb usb10: Product: xHCI Host Controller
[  122.103766] usb usb10: Manufacturer: Linux 6.6.3-arch1-1 xhci-hcd
[  122.103767] usb usb10: SerialNumber: 0000:92:00.0
[  122.103863] hub 10-0:1.0: USB hub found
[  122.103877] hub 10-0:1.0: 2 ports detected
[  122.108163] snd_hda_intel 0000:66:00.1: bound 0000:66:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
[  122.109238] input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:04.1/0000:62:00.0/0000:63:01.0/0000:64:00.0/0000:65:00.0/0000:66:00.1/sound/card2/input15
[  122.109293] input: HDA ATI HDMI HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:04.1/0000:62:00.0/0000:63:01.0/0000:64:00.0/0000:65:00.0/0000:66:00.1/sound/card2/input16
[  122.109334] input: HDA ATI HDMI HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:04.1/0000:62:00.0/0000:63:01.0/0000:64:00.0/0000:65:00.0/0000:66:00.1/sound/card2/input17
[  122.109373] input: HDA ATI HDMI HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:04.1/0000:62:00.0/0000:63:01.0/0000:64:00.0/0000:65:00.0/0000:66:00.1/sound/card2/input18
[  122.109416] input: HDA ATI HDMI HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:04.1/0000:62:00.0/0000:63:01.0/0000:64:00.0/0000:65:00.0/0000:66:00.1/sound/card2/input19
[  122.109462] input: HDA ATI HDMI HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:04.1/0000:62:00.0/0000:63:01.0/0000:64:00.0/0000:65:00.0/0000:66:00.1/sound/card2/input20
[  336.766911] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[  336.766956] [drm] PSP is resuming...
[  336.799188] [drm] reserve 0x900000 from 0x81fd000000 for PSP TMR
[  336.877141] amdgpu 0000:66:00.0: amdgpu: RAS: optional ras ta ucode is not available
[  336.889503] amdgpu 0000:66:00.0: amdgpu: RAP: optional rap ta ucode is not available
[  336.889506] amdgpu 0000:66:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[  336.889510] amdgpu 0000:66:00.0: amdgpu: SMU is resuming...
[  336.926541] amdgpu 0000:66:00.0: amdgpu: SMU is resumed successfully!
[  336.931765] [drm] kiq ring mec 2 pipe 1 q 0
[  336.935890] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[  336.936144] [drm] JPEG decode initialized successfully.
[  336.936178] amdgpu 0000:66:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[  336.936180] amdgpu 0000:66:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[  336.936181] amdgpu 0000:66:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[  336.936182] amdgpu 0000:66:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[  336.936183] amdgpu 0000:66:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[  336.936184] amdgpu 0000:66:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[  336.936184] amdgpu 0000:66:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[  336.936185] amdgpu 0000:66:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[  336.936186] amdgpu 0000:66:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[  336.936187] amdgpu 0000:66:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[  336.936188] amdgpu 0000:66:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[  336.936189] amdgpu 0000:66:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[  336.936190] amdgpu 0000:66:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 8
[  336.936191] amdgpu 0000:66:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 8
[  336.936192] amdgpu 0000:66:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 8
[  336.936193] amdgpu 0000:66:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[  336.939535] amdgpu 0000:66:00.0: [drm] Cannot find any crtc or sizes

Thanks! Definitely doesn’t look like mine…though no idea why. What distro and kernel version are you running if I might ask? I kind of doubt that is the issue because I have a fairly recent kernel (6.6.2), but who knows.

Arch on 6.6.3 but I did have the 7900xtx running in the egpu enclosure on a much older kernel months back.

Would you happen to have some other gpu to test by any chance? Just so we can see if it’s a problem with the gpu or with the enclosure.

Wish I did! Maybe the next thing to try is boot some other distro from a USB flash drive and see if that works.

There are some eGPU-related race conditions that just got fixed in 6.7. Can you please try the latest 6.7 RC kernel?

Thanks! This seems very promising. I was able to try out RC3 by rebasing Silverblue to rawhide. Specifically, that has kernel-6.7.0-0.rc3.20231201git994d5c58e50e.32.fc40 as of this morning when I tried it. I don’t get a dmesg error anymore, and the GPU shows up in lspci | grep VGA and radeontop -b XX where XX is the number I got from lspci. However, rawhide caused all sorts of other issues so I couldn’t actually try compiling ROCm code, but I am pretty optimistic.

The only oddity I noticed is that every 6 seconds, like clockwork, there is a string of messages in dmesg like this:

[  313.036078] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[  313.036140] [drm] PSP is resuming...
[  313.072281] [drm] reserve 0x1300000 from 0x85fc000000 for PSP TMR
[  313.267382] amdgpu 0000:07:00.0: amdgpu: RAP: optional rap ta ucode is not available
[  313.267387] amdgpu 0000:07:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[  313.267391] amdgpu 0000:07:00.0: amdgpu: SMU is resuming...
[  313.267400] amdgpu 0000:07:00.0: amdgpu: smu driver if version = 0x0000003d, smu fw if version = 0x0000003f, smu fw program = 0, smu fw version = 0x004e6601 (78.102.1)
[  313.267405] amdgpu 0000:07:00.0: amdgpu: SMU driver if version not matched
[  313.438406] amdgpu 0000:07:00.0: amdgpu: SMU is resumed successfully!
[  313.441733] [drm] DMUB hardware initialized: version=0x07002300
[  313.451340] [drm] kiq ring mec 3 pipe 1 q 0
[  313.464132] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[  313.465088] amdgpu 0000:07:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[  313.465448] amdgpu 0000:07:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[  313.465450] amdgpu 0000:07:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[  313.465451] amdgpu 0000:07:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[  313.465452] amdgpu 0000:07:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[  313.465452] amdgpu 0000:07:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[  313.465453] amdgpu 0000:07:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[  313.465454] amdgpu 0000:07:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[  313.465455] amdgpu 0000:07:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[  313.465456] amdgpu 0000:07:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[  313.465457] amdgpu 0000:07:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[  313.465458] amdgpu 0000:07:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[  313.465459] amdgpu 0000:07:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[  313.465460] amdgpu 0000:07:00.0: amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 8
[  313.465461] amdgpu 0000:07:00.0: amdgpu: ring jpeg_dec uses VM inv eng 4 on hub 8
[  313.465461] amdgpu 0000:07:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
[  313.470119] amdgpu 0000:07:00.0: [drm] Cannot find any crtc or sizes
[  313.470916] [drm] ring gfx_32772.1.1 was added
[  313.471510] [drm] ring compute_32772.2.2 was added
[  313.472167] [drm] ring sdma_32772.3.3 was added
[  313.472232] [drm] ring gfx_32772.1.1 ib test pass
[  313.472265] [drm] ring compute_32772.2.2 ib test pass
[  313.472448] [drm] ring sdma_32772.3.3 ib test pass

Maybe it is fine, but it is accompanied by a barely audible click from the GPU each time, so I am slightly concerned it is resetting or something.
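
To check the "every 6 seconds" observation against the log itself, the timestamps of the repeating resume lines can be compared. The following is a minimal sketch, assuming the dmesg output is available as a string and that the line containing "PSP is resuming" marks the start of each cycle; the function names `resume_times` and `intervals` are made up for illustration.

```python
import re

# Matches the leading "[ seconds.micros]" timestamp of a dmesg line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def resume_times(dmesg_text, marker="PSP is resuming"):
    """Return the timestamps (in seconds) of lines that contain the marker."""
    times = []
    for line in dmesg_text.splitlines():
        match = TIMESTAMP.match(line)
        if match and marker in match.group(2):
            times.append(float(match.group(1)))
    return times


def intervals(times):
    """Return the gaps between consecutive timestamps, rounded to milliseconds."""
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]
```

Feeding the full dmesg text to `resume_times` and then `intervals` should show a steady gap if the GPU is cycling on a fixed schedule.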

I’ll see if I can get the rawhide kernel overlaid on my regular Silverblue deployment so I can actually try using it.

That’s from it going in and out of runtime PM.
The idea is if nothing is running (compute or display) that GPU should go to sleep to conserve power. If you always want it on you can use amdgpu.runpm=0.

Do you have some software polling GPU sensors?
That would kick it out.
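
To see whether the card is actually entering runtime suspend, the kernel's per-device power attributes in sysfs can be read. This is a rough sketch, assuming the eGPU sits at PCI address 0000:07:00.0 as in the log above (adjust it to whatever lspci reports); the `sysfs_root` parameter is a hypothetical convenience for pointing the function at another directory.

```python
from pathlib import Path


def runtime_pm_state(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Read the runtime PM attributes of one PCI device from sysfs.

    Returns a dict with "control" (e.g. "auto" or "on") and
    "runtime_status" (e.g. "active" or "suspended"); a value is None
    when the attribute file does not exist.
    """
    power_dir = Path(sysfs_root) / pci_addr / "power"
    state = {}
    for name in ("control", "runtime_status"):
        attr = power_dir / name
        state[name] = attr.read_text().strip() if attr.exists() else None
    return state


# Address taken from the dmesg excerpt in this thread; yours may differ.
print(runtime_pm_state("0000:07:00.0"))
```

A status that alternates between "suspended" and "active" would match the cycling described above, while booting with amdgpu.runpm=0 should keep the card from suspending.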

That makes sense!

Medium good news: After getting kernel 6.7rc3 overlaid on Silverblue stable/updates, the GPU works as a graphics processing unit! That is, I can plug it into the TV and it works great.

AI workloads via ROCm do not work, though. PyTorch gives me “HIP error: the operation cannot be performed in the present state”, which seems like it might be related to having multiple GPUs: Limitations — Use ROCm on Radeon GPUs
I did not consider that the iGPU counts as a GPU, but it was necessary to use HIP_VISIBLE_DEVICES="1" to get PyTorch to see the eGPU, so I guess it makes sense.
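
For anyone who prefers to set the variable from inside a script rather than in the shell, here is a minimal sketch. It assumes the eGPU is device index 1, as it was on this machine, and the helper name `select_hip_device` is invented for illustration. The variable has to be in place before the HIP runtime initializes, so setting it before importing PyTorch is the simple, safe choice.

```python
import os


def select_hip_device(index):
    """Restrict HIP to a single device by index and return the value set."""
    os.environ["HIP_VISIBLE_DEVICES"] = str(index)
    return os.environ["HIP_VISIBLE_DEVICES"]


# Index 1 was the eGPU here because the iGPU took index 0; check your own setup.
select_hip_device(1)

# Import the framework only after the variable is set, for example:
# import torch
# print(torch.cuda.get_device_name(0))  # the one remaining visible device is index 0
```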

I tried a hipBLAS (with rocBLAS) C++ workload via llama.cpp, and that was able to load a neural net onto the eGPU, but seemed to hang after that. It did show the appropriate VRAM usage in radeontop at least.

Edit: Seems like it might also be related to PCIe atomics: Changelog — ROCm 5.7.1 Documentation Home
dmesg | grep atomic indeed shows they are unavailable, so I may try compiling from scratch with the workaround that page suggests.

Can you share your whole kernel log (dmesg)? Also can you please share output of sudo lspci -nnvv?

I’ll see if something stands out to me. I suspect the issue is that something in the chain doesn’t support PCIe atomics, and it will be clearer looking at that which ones in the chain don’t.

Here those are. This is right after restarting so there shouldn’t be too much other stuff in there. Thanks for taking a look!

sudo lspci -nnvv on Pastebin.com (including redirected STDERR because there was one “pcilib” error there)
dmesg on Pastebin.com

dGPU, DS port and US port look good:

05:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 10) (prog-if 00 [Normal decode])
			 AtomicOpsCap: Routing+ 32bit- 64bit- 128bitCAS-
06:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 10) (prog-if 00 [Normal decode])
			 AtomicOpsCap: Routing+
07:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX] [1002:744c] (rev c8) (prog-if 00 [VGA controller])
			 AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-

Both possible PCIe bridges (inside your enclosure) look bad:

03:00.0 PCI bridge [0604]: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] [8086:15da] (rev 02) (prog-if 00 [Normal decode])
			 AtomicOpsCap: Routing-
04:01.0 PCI bridge [0604]: Intel Corporation JHL6340 Thunderbolt 3 Bridge (C step) [Alpine Ridge 2C 2016] [8086:15da] (rev 02) (prog-if 00 [Normal decode])
			 AtomicOpsCap: Routing-

Both possible PCIe tunneling ports look good:

00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel [1022:14ef] (prog-if 00 [Normal decode])
			 AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS-
00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel [1022:14ef] (prog-if 00 [Normal decode])
			 AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS-

So I believe the problem is likely your enclosure.
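
The manual check above can be repeated on any machine by scanning `sudo lspci -nnvv` output for the AtomicOpsCap line of each device. The following is a small sketch, assuming the output has been captured into a string and follows the layout shown in the excerpts (a device header at column 0, capability lines indented beneath it); the function names are made up for illustration.

```python
import re

# A device header starts at column 0 with a PCI address such as "03:00.0",
# optionally prefixed by a domain such as "0000:".
HEADER = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+")


def atomic_caps(lspci_text):
    """Map each PCI address to its AtomicOpsCap flags, e.g. {"Routing": False}."""
    caps = {}
    current = None
    for line in lspci_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        stripped = line.strip()
        if current and stripped.startswith("AtomicOpsCap:"):
            flags = stripped.split(":", 1)[1].split()
            # Each flag ends in "+" (supported) or "-" (not supported).
            caps[current] = {flag[:-1]: flag.endswith("+") for flag in flags}
    return caps


def bridges_without_routing(caps):
    """Return addresses whose AtomicOpsCap explicitly reports Routing-."""
    return sorted(addr for addr, flags in caps.items() if flags.get("Routing") is False)
```

Any address returned by `bridges_without_routing` that sits between the host's tunneling port and the GPU is a candidate for blocking PCIe atomics, which is what the two JHL6340 bridges show here.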

FYI, I can reproduce the same issue on an eGPU enclosure I have on hand (Sonnet eGPU breakaway box 750). So for anyone who comes across this and wants to use an eGPU for ROCm: this isn’t viable.

Nothing to add as Mario tackled this. Marking responded.

Thanks very much, Mario. Glad to know it is not just my enclosure (which I got refurbished so I was mildly concerned) nor even my specific model of enclosure. That makes me more hopeful that it is possible for ROCm to support this in the future given that it works on a Windows Intel laptop. I will try to dive into compiling for ROCm with the PCIe atomics workaround if I can figure that out, but in any case thanks for all the education.

This exact same enclosure and dGPU work on an Intel laptop with Windows? That’s odd to me. Can you by chance load Linux on that laptop (same kernel as you tested on the FW 13 if you can)?

Yes indeed, exact same setup works great with ROCm on an Intel+Windows laptop. Granted, the development experience is annoying compared to Linux so I haven’t used it much, but I ran some precompiled ROCm code that worked incredibly well.

Unfortunately, it is my work laptop and I can’t get too adventurous. I know from experience that dual booting is out of the question, but there might be a chance I could load a live USB. I’m a bit overly cautious about getting locked out and will probably have to wait a few weeks until there’s a stretch of days when I don’t need it, though, in case it were to have an allergic reaction to live USB (maybe that’s silly, I don’t know, I’ve just been burned by Bitlocker on work laptops in the past for lesser sins).

OK, well, that will be an interesting data point when you can provide it. I think you might want to report your results to the ROCm repository once you have data points and logs to show.
