AMD Framework and NVMe SSD Enclosure Compatibility Investigation

Thank you Jason for showing me this thread; I had my own thread going about this USB storage issue before noticing this one. I'm seeing very similar issues to what all of you are having on my FW 16. I have an SSK NVMe enclosure, and when I run Windows To Go from it I get I/O lockups at random or under high load, regardless of connection port or cable type: 100% disk utilization, zero throughput, and the disk queue climbing into the hundreds while Windows waits for the drive to reconnect. The drive won't work on port 6 or port 3, although my SanDisk 10 Gbps external SSD works fine on those ports. The enclosure is on the latest NVMe firmware.

The enclosure works fine on my desktop Ryzen 5 3600 system at 10 Gbps.

It also worked fine on my 11th-gen Intel XPS 9710, so sadly it looks like a Framework AMD USB implementation issue is at work here.

A workaround I have used (and which is hopefully helpful to some): I put a USB 3.1 (5 Gbps) hub in front of my SSK NVMe enclosure as a "speed limiter", and that resolves my issue, though of course it cuts my speed in half and requires an extra device. For me it seems the FW16 can't handle full load at 10 Gbps while booted from the drive; I verified I have no issues using this drive when it is not the boot device. I was able to get 950 MB/s copies from the SSK NVMe to my SanDisk 10 Gbps SSD, with the SSK NVMe in port 5 (USB-A) and the SanDisk in port 6 (USB-A).
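
For anyone trying the same hub trick on Linux, a quick way to confirm what speed each link actually negotiated (device paths vary per system, so treat this as a sketch):

# Tree view of the USB topology with negotiated speeds
# (5000M = 5 Gbps, 10000M = 10 Gbps):
lsusb -t
# Or read each device's negotiated speed (in Mbps) straight from sysfs:
cat /sys/bus/usb/devices/*/speed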

Yeah. At this point it does seem to be a compatibility issue that can be observed on both the AMD13 and the FW16. I just don't have the expertise or equipment to determine whether the issue is happening at the data level or the power level.

Hi,
My guess is that the problem is USB power related.
A way to test this would be to put a separately powered 10 Gbps hub between the laptop and the external SSD and see if it is any more reliable.

Sadly, after upgrading to the BIOS 3.06 beta, this problem still exists.

Yeah, and the 'Known Issues' section only addresses a connectivity issue with USB-C-equipped Apple devices, not the problem we have here. I'm gradually becoming less and less convinced that Framework is still investigating this issue, given that they have no more details to offer nearly a year after it was first reported to them.

I know the FW16, which uses a similar architecture to the AMD13 and is also affected by this issue per community reports, got a new board revision: Rev 1A dated 2024-05-07 vs. the original Rev 1.0 dated 2023-10-19. I'm unsure whether this revision exists for other, unrelated reasons or includes mid-cycle fixes for issues like the one in this thread. Hopefully someone who recently purchased or RMAed an AMD13 board can check the revision on the new boards they receive.

My AMD13 just shipped; where would I find the revision?

You would find the REV and date on the back of the mainboard.

You do need to take the mainboard out to check, but doing so shouldn't damage anything or affect your warranty as long as it's performed carefully.

So my board says Rev. 1A with date 2023-08-04. I have not yet installed an OS, but it boots after teardown and reassembly :sweat_smile:

Sorry for the late response.

Yeah, AFAIK Rev. 1A should be the launch-day AMD13 PCB. At least that's the same revision I last saw.

This issue is still occurring after reading a few GB from my external SSD enclosure:

How quickly the issue occurs depends on the connection speed (a simple way to reproduce it is sketched after the list):

  • USB4 slot: happens after reading 12 GiB
  • USB 3.2 slot: happens after reading 40 GiB
  • Using a USB 2.0 cable, it easily reads past 90 GiB
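
For anyone who wants to reproduce this in a controlled way, a simple sequential read of the whole device should hit the same thresholds (assuming the enclosure enumerates as /dev/sda; check lsblk first, as the name is system-specific):

# Read the entire device sequentially (read-only) and watch the progress
# counter to see how far it gets before I/O stalls:
sudo dd if=/dev/sda of=/dev/null bs=4M status=progress
# In a second terminal, follow the kernel log to catch the disconnect:
sudo dmesg -w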

Kernel: kernel-6.13.5-200.fc41.x86_64
BIOS: 3.05
Enclosure: UGREEN 90408
SSD: WD_BLACK SN850X 1TB

Have you tried the 3.07 Beta?

Not yet. Do 3.06/3.07 contain any fixes related to this problem that make them worth trying?

Possible causes of the problem:

  1. Hardware: I have seen some enclosures with loose or faulty connectors.
    Fixed by returning the faulty unit.
  2. Insufficient power supply from the USB port.
    Fixed by putting a powered hub between the FW and the enclosure.
    Note: not all powered hubs work. I have a powered hub with 4 USB ports, and only one of them works with my enclosure.
  3. Overheating.
    Some SSDs have temperature sensors that smartmontools can read.
    Monitor the temps over time and see if they are getting too hot (see the sketch after this list). Once you lose the USB connection, you lose the temperature readings as well.
  4. Faulty cables. Some USB cables are just better than others. I have some that work OK at low speed and are intermittently faulty at higher speeds.
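
For point 3, a minimal temperature-watch loop, assuming smartmontools is installed and the enclosure shows up as /dev/sda (some USB-to-NVMe bridges also need an explicit device type such as -d sntasmedia or -d sntjmicron):

# Poll the drive temperature every 30 seconds during a transfer; the
# readings stop once the USB connection drops.
while true; do
    sudo smartctl -A /dev/sda | grep -i temperature
    sleep 30
done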

If you can try the above suggestions and narrow the problem down to one of these causes, you might make some progress towards a solution.

This is a huge one. I have bought over 8 external enclosures, and more often than not the USB cable that comes with the enclosure is faulty: either it can't provide enough power to the enclosure, or it causes it to connect at slower speeds. Always try a new cable when you have problems; it's the easiest thing to try.

Just updated to 3.07 and tried a few different USB cables. Speed is OK, but the issue still happens after reading a few GB.

The kernel messages for me look very similar to the screenshot above, running on a FW16 and a FW13 (AI 9 370HX) with Debian/trixie 6.12.27-amd64 and a Zike Z666 with a Samsung 970 Pro inside, hotplugged. I just have an additional message: nvme 0000:62:00.0: platform quirk: setting simple suspend. This combination did work for a small window of time (kernel updates?) and, annoyingly, it works in Windows 11.

When I enable more kernel debug messages, I see a lot of PCI power management related messages. Interestingly, when you disable ASPM (kernel cmdline pcie_aspm=off), the kernel and the ASMedia chip seem to downgrade the Thunderbolt connection to classic USB 3.2 and expose a pretty fast SCSI block device. So, if you are also getting desperate with your Zike Z666 paperweight on Linux, you can temporarily disable ASPM and use your drive as a generic SCSI device.

Fallback to USB 3.2 and SCSI
[  761.848348] thunderbolt 1-2: new device found, vendor=0x1ca device=0xd666
[  761.848358] thunderbolt 1-2: Gopod Group Limited. USB4 NVMe SSD Pro Enclosure
[  762.570647] ucsi_acpi USBC000:00: unknown error 0
[  762.584781] thunderbolt 1-0:2.1: new retimer found, vendor=0x1da0 device=0x8833
[  763.422491] thunderbolt 1-2:1.1: new retimer found, vendor=0x1da0 device=0x8833
[  768.528430] thunderbolt 1-0:2.1: retimer disconnected
[  768.530073] thunderbolt 1-2:1.1: retimer disconnected
[  768.530210] thunderbolt 1-2: device disconnected
[  770.447558] usb 8-1: new SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[  770.464414] usb 8-1: New USB device found, idVendor=2d01, idProduct=3666, bcdDevice= 1.00
[  770.464425] usb 8-1: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[  770.464427] usb 8-1: Product: USB 3.2 SSD Drive Enclosure
[  770.464429] usb 8-1: Manufacturer: Gopod Group Limited.
[  770.464430] usb 8-1: SerialNumber: AAAABBBB0019
[  770.466154] scsi host0: uas
[  771.320010] scsi 0:0:0:0: Direct-Access     Gopod    Enclosure        0    PQ: 0 ANSI: 6
[  772.630738] sd 0:0:0:0: Attached scsi generic sg0 type 0
[  772.631042] sd 0:0:0:0: [sda] 2000409264 512-byte logical blocks: (1.02 TB/954 GiB)
[  772.631150] sd 0:0:0:0: [sda] Write Protect is off
[  772.631155] sd 0:0:0:0: [sda] Mode Sense: 43 00 00 00
[  772.631332] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[  772.687152] sd 0:0:0:0: [sda] Preferred minimum I/O size 512 bytes
[  772.687160] sd 0:0:0:0: [sda] Optimal transfer size 2097152 bytes
[  772.734245]  sda: sda1
[  772.734559] sd 0:0:0:0: [sda] Attached SCSI disk

The Disks utility reports about 1.0 GB/s read and write performance.

Normal Thunderbolt with NVMe enumeration failure
[11021.856441] thunderbolt 1-2: new device found, vendor=0x1ca device=0xd666
[11021.856452] thunderbolt 1-2: Gopod Group Limited. USB4 NVMe SSD Pro Enclosure
[11022.533476] ucsi_acpi USBC000:00: unknown error 0
[11022.597312] thunderbolt 1-0:2.1: new retimer found, vendor=0x1da0 device=0x8833
[11023.431642] thunderbolt 1-2:1.1: new retimer found, vendor=0x1da0 device=0x8833
[11023.548868] pcieport 0000:00:01.2: pciehp: Slot(0-1): Card present
[11023.548882] pcieport 0000:00:01.2: pciehp: Slot(0-1): Link Up
[11023.673905] pci 0000:60:00.0: [1b21:2463] type 01 class 0x060400 PCIe Switch Upstream Port
[11023.673958] pci 0000:60:00.0: PCI bridge to [bus 00]
[11023.673975] pci 0000:60:00.0:   bridge window [io  0x0000-0x0fff]
[11023.673982] pci 0000:60:00.0:   bridge window [mem 0x00000000-0x000fffff]
[11023.673999] pci 0000:60:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[11023.674021] pci 0000:60:00.0: enabling Extended Tags
[11023.674181] pci 0000:60:00.0: PME# supported from D0 D3hot D3cold
[11023.674301] pci 0000:60:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:01.2 (capable of 15.753 Gb/s with 16.0 GT/s PCIe x1 link)
[11023.674694] pci 0000:60:00.0: Adding to iommu group 29
[11023.674931] pcieport 0000:00:01.2: ASPM: current common clock configuration is inconsistent, reconfiguring
[11023.685319] pci 0000:60:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[11023.685501] pci 0000:61:00.0: [1b21:2463] type 01 class 0x060400 PCIe Switch Downstream Port
[11023.685541] pci 0000:61:00.0: PCI bridge to [bus 00]
[11023.685554] pci 0000:61:00.0:   bridge window [io  0x0000-0x0fff]
[11023.685561] pci 0000:61:00.0:   bridge window [mem 0x00000000-0x000fffff]
[11023.685577] pci 0000:61:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[11023.685600] pci 0000:61:00.0: enabling Extended Tags
[11023.685756] pci 0000:61:00.0: PME# supported from D0 D3hot D3cold
[11023.686037] pci 0000:61:00.0: Adding to iommu group 30
[11023.686286] pci 0000:60:00.0: PCI bridge to [bus 61-be]
[11023.686315] pci 0000:61:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[11023.686427] pci 0000:62:00.0: [144d:a808] type 00 class 0x010802 PCIe Endpoint
[11023.686472] pci 0000:62:00.0: BAR 0 [mem 0x00000000-0x00003fff 64bit]
[11023.686833] pci 0000:62:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x1 link at 0000:00:01.2 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
[11023.686967] pci 0000:62:00.0: Adding to iommu group 30
[11023.693328] pci 0000:61:00.0: PCI bridge to [bus 62-be]
[11023.693349] pci_bus 0000:62: busn_res: [bus 62-be] end is updated to 62
[11023.693358] pci_bus 0000:61: busn_res: [bus 61-be] end is updated to 62
[11023.693379] pci 0000:60:00.0: bridge window [mem 0x80000000-0x97ffffff]: assigned
[11023.693383] pci 0000:60:00.0: bridge window [mem 0x2000000000-0x3fffffffff 64bit pref]: assigned
[11023.693385] pci 0000:60:00.0: bridge window [io  0x2000-0x5fff]: assigned
[11023.693389] pci 0000:61:00.0: bridge window [mem 0x80000000-0x97ffffff]: assigned
[11023.693391] pci 0000:61:00.0: bridge window [mem 0x2000000000-0x3fffffffff 64bit pref]: assigned
[11023.693393] pci 0000:61:00.0: bridge window [io  0x2000-0x5fff]: assigned
[11023.693396] pci 0000:62:00.0: BAR 0 [mem 0x80000000-0x80003fff 64bit]: assigned
[11023.693412] pci 0000:61:00.0: PCI bridge to [bus 62]
[11023.693416] pci 0000:61:00.0:   bridge window [io  0x2000-0x5fff]
[11023.693422] pci 0000:61:00.0:   bridge window [mem 0x80000000-0x97ffffff]
[11023.693426] pci 0000:61:00.0:   bridge window [mem 0x2000000000-0x3fffffffff 64bit pref]
[11023.693433] pci 0000:60:00.0: PCI bridge to [bus 61-62]
[11023.693444] pci 0000:60:00.0:   bridge window [io  0x2000-0x5fff]
[11023.693450] pci 0000:60:00.0:   bridge window [mem 0x80000000-0x97ffffff]
[11023.693454] pci 0000:60:00.0:   bridge window [mem 0x2000000000-0x3fffffffff 64bit pref]
[11023.693461] pcieport 0000:00:01.2: PCI bridge to [bus 60-be]
[11023.693463] pcieport 0000:00:01.2:   bridge window [io  0x2000-0x5fff]
[11023.693466] pcieport 0000:00:01.2:   bridge window [mem 0x80000000-0x97ffffff]
[11023.693469] pcieport 0000:00:01.2:   bridge window [mem 0x2000000000-0x3fffffffff 64bit pref]
[11023.693815] pcieport 0000:60:00.0: enabling device (0000 -> 0003)
[11023.694004] pcieport 0000:61:00.0: enabling device (0000 -> 0003)
[11023.694423] nvme 0000:62:00.0: platform quirk: setting simple suspend
[11023.694576] nvme nvme1: pci function 0000:62:00.0
[11023.694588] nvme 0000:62:00.0: enabling device (0000 -> 0002)
Tutorials

Enable Extra Logging

Disable SecureBoot and then do the following:

echo 'module nvme +p' | sudo tee /sys/kernel/debug/dynamic_debug/control
echo 'module thunderbolt +p' | sudo tee /sys/kernel/debug/dynamic_debug/control
echo 'module pcieport +p' | sudo tee /sys/kernel/debug/dynamic_debug/control
echo 'module pci +p' | sudo tee /sys/kernel/debug/dynamic_debug/control
# Start listening
sudo dmesg -w
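
To turn the extra logging back off afterwards, the same interface accepts -p:

echo 'module nvme -p' | sudo tee /sys/kernel/debug/dynamic_debug/control
echo 'module thunderbolt -p' | sudo tee /sys/kernel/debug/dynamic_debug/control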

Disable ASPM

This should not be used regularly.

  1. Disable SecureBoot
  2. At the GRUB boot menu, highlight the normal boot entry and press e to temporarily edit its configuration for this boot.
  3. Add the token pcie_aspm=off to the end of the linux line.
  4. Press F10 or Ctrl-X (I believe) to start booting. Once booted, you can verify the change as shown below.
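
A quick sanity check that the parameter took effect (standard commands, nothing specific to this machine):

# The token should appear on the running kernel's command line:
grep -o 'pcie_aspm=off' /proc/cmdline
# Each device's link control should now report ASPM Disabled:
sudo lspci -vv | grep -i 'aspm.*abled'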

I have no clue why this downgrades the external NVMe enclosure to USB 3.2 + SCSI, since another USB4 hub still works.

@Alex_H, what do the kernel messages look like when you first plug your external NVMe drive into a USB4-capable slot? Or, if you are booting from this external NVMe drive, do you have earlier messages?
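
If it helps, a simple way to capture those messages live while hotplugging the drive:

sudo dmesg -w | grep -Ei 'thunderbolt|nvme|pcieport|usb'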

What I see in your screenshot is that you are interfacing with a generic SCSI device, not NVMe. This could indicate that your external PCI/NVMe controller downgraded the communication to basic USB and is now emulating SCSI. These errors could also be just a symptom, but I think the primary issue started when the kernel (or an earlier stage) first negotiated with this device.

I encountered the NVMe Thunderbolt enclosure issue on my new AI 300 FW13.

I have a Samsung 970 Evo Plus SSD installed in an ACASIS TBU401E USB4 enclosure. When I connect it to the USB 3.2 port, everything is fine: the SSD is detected by the SCSI driver and shows up in the system as /dev/sdb. But when I plug it into the USB4 port, the device doesn't show up.

However, I can connect and use this disk with the kernel args nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off and by running echo 1 | sudo tee /sys/bus/pci/rescan after connecting the SSD. It then shows up as /dev/nvme0n1 (I don't have an internal NVMe drive yet and run the system from a 1 TB storage expansion card).
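
Spelled out, that workaround looks like this (the kernel args go on the kernel command line via your bootloader):

# Kernel command line additions (e.g. on the GRUB linux line):
#   nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off
# After plugging in the enclosure, force a PCI bus rescan:
echo 1 | sudo tee /sys/bus/pci/rescan
# The disk should then appear as an NVMe device:
lsblk /dev/nvme0n1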

Bug report on the kernel Bugzilla with additional info: 220175 – NVMe SSD doesn't work via Thunderbolt enclosure on Framework 13 (Ryzen AI 300)

Man, it's really weird that this only happens with the 970 series: the 980 is fine, the OEM versions of the 970 Evo (PM981) and 980 Pro (PM9A1) are fine, and basically any other SSD is fine, but the 970 Evo / Evo Plus / Pro / Pro Plus all seem to be able to trigger weird stuff.
