FW16 NixOS: freezing/crashing from SSD turning off?

Which Linux distro are you using?
NixOS

Which release version?
Unstable, last update November 27th, 2024
See my flake.lock for the exact version.

Which kernel are you using?
Linux 6.11.10

Which BIOS version are you using?
3.03

Which Framework Laptop 16 model are you using?
CPU: AMD Ryzen 7 7840HS
GPU: AMD Radeon 780M [Integrated]

SSDs:
2280 slot: Samsung SSD 970 EVO Plus 500GB
2230 slot: Sabrent SB-2130-512
The OS is installed to the Samsung.

NixOS configuration:
https://github.com/slippyice/nixconfigs

Issue:
The FW will randomly freeze/crash. Sometimes it happens right after waking up from sleep, and other times it just happens while I’m watching YouTube. When it crashes, programs start becoming unresponsive, including the desktop (KDE Plasma 6), until it reaches a black screen. Sometimes I can switch to a TTY, and sometimes I can’t. Even in a TTY, if I try to log in, it freezes and goes to a black screen with the cursor blinking in the top-left corner.

It was really hard to get any useful debugging info, as most of the time this is what was displayed in the TTY:

I was finally lucky enough to get some meaningful error messages:



nvme0 refers to the Samsung SSD.

Most of the time nothing of interest pops up in dmesg while everything is working, but I was lucky enough to get this set of messages one time while gaming:

[ 163.796449] usb 1-4.1: reset full-speed USB device number 9 using xhci_hcd
[ 344.088433] cros-ec-dev cros-ec-dev.1.auto: Some logs may have been dropped...
[ 1493.562450] i2c_designware AMDI0010:00: i2c_dw_handle_tx_abort: lost arbitration
[ 1539.918593] usb 1-4.1: reset full-speed USB device number 9 using xhci_hcd
[ 1664.558723] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
[ 1664.558735] nvme nvme0: Does your device have a faulty power saving mode enabled?
[ 1664.558739] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
[ 1664.592253] nvme 0000:03:00.0: enabling device (0000 -> 0002)
[ 1664.594492] nvme nvme0: D3 entry latency set to 8 seconds
[ 1664.611686] nvme nvme0: 16/0/0 default/read/poll queues
[ 2377.816783] usb 1-4.1: reset full-speed USB device number 9 using xhci_hcd
[ 2449.323744] systemd-ssh-generator[32166]: Disabling SSH generator logic, since sshd is not installed.
[ 2637.285267] systemd-ssh-generator[33470]: Disabling SSH generator logic, since sshd is not installed.
[ 3042.016815] usb 1-4.1: reset full-speed USB device number 9 using xhci_hcd
[ 3087.181154] systemd-ssh-generator[51843]: Disabling SSH generator logic, since sshd is not installed.
[ 3402.659062] usb 1-4.1: reset full-speed USB device number 9 using xhci_hcd
[ 3490.681897] usb 1-4.1: reset full-speed USB device number 9 using xhci_hcd
[ 3494.439296] EXT4-fs (nvme1n1p2): mounted filesystem 132d7d7d-1812-4ebf-b837-ffe86441cf90 r/w with ordered data mode. Quota mode: none.
[ 6501.776987] usb 1-4.1: reset full-speed USB device number 9 using xhci_hcd
[ 6584.592966] usb 1-4.1: reset full-speed USB device number 9 using xhci_hcd

So I’m assuming something is going wrong between power management, the SSDs, and the FW16.
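On NixOS, the workaround the nvme driver itself suggests in the log above can be tried by adding those parameters to the kernel command line through `boot.kernelParams`. This is a minimal sketch, assuming it is imported as a module into the system configuration. The parameters disable NVMe autonomous power state transitions (APST) and PCIe ASPM/port power management, so they will likely cost some battery life and are best treated as a diagnostic step rather than a permanent fix.

```nix
{ ... }:
{
  # Parameters taken from the nvme driver's own suggestion in dmesg.
  # Diagnostic only: they turn off NVMe and PCIe power saving.
  boot.kernelParams = [
    "nvme_core.default_ps_max_latency_us=0" # disable NVMe APST
    "pcie_aspm=off"                         # disable PCIe ASPM
    "pcie_port_pm=off"                      # disable PCIe port power management
  ];
}
```

After `nixos-rebuild switch` and a reboot, `cat /proc/cmdline` should show the new parameters. If the `controller is down` messages stop appearing, that points toward a power-saving state the drive does not recover from.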

Ever since an update around September, I have been having issues with power management. With powertop enabled, the keyboard frequently repeated keys, my USB-A mouse kept disconnecting, and KDE Plasma was no longer able to communicate with powertop (improper power mode). So I disabled powertop and power-profiles-daemon with:

powerManagement.powertop.enable = false;
services.power-profiles-daemon.enable = false;

By doing that I believe it defaults to TLP.

If I revert to a flake.lock from August, the issues seem to go away, but that is not ideal because it prevents me from updating. The issues also seem to be kernel-independent.

powertop’s auto-tuning is known to cause the issues you described. Search the forum for powertop to find the rest of us who learned this the hard way 🙂

On AMD Framework laptops you should be using PPD (power-profiles-daemon), not TLP; TLP may stomp on its settings. So I’d first make sure you are not using TLP.
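A minimal sketch of that setup in a NixOS configuration, assuming it replaces the `services.power-profiles-daemon.enable = false;` line from the original post (defining the same option as both `true` and `false` in different modules would be a conflict). PPD is turned back on, TLP is explicitly disabled, and powertop’s auto-tuning stays off.

```nix
{ ... }:
{
  # Let power-profiles-daemon (PPD) handle power profiles.
  services.power-profiles-daemon.enable = true;

  # Make sure TLP is not also managing power settings.
  # If another imported module enables it, lib.mkForce false may be needed.
  services.tlp.enable = false;

  # Keep powertop's auto-tuning service off.
  powerManagement.powertop.enable = false;
}
```

After rebuilding, `powerprofilesctl` should list the available profiles, which is a quick way to confirm that PPD is the one in charge.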

Seeing USB resets at other times does suggest that some software is meddling with power management and breaking your system. In my experience, it is not normal to see resets that often.

Good luck!


My 13th-gen i5 has also just started failing to resume from suspend with recent kernels, though I’m on Arch Linux, not NixOS. It seems to be a similar problem with the NVMe drive not waking up. I don’t use sleep or hibernate much anyway, so it’s not a big loss for me. But if you want to put work into troubleshooting, there’s probably an ACPI kernel command-line argument that would get sleep/resume working again.

I think I’ll give this a try. I wish I had a way to trigger it consistently for testing, but the most I can do is watch dmesg.

I actually don’t think it is dependent on the kernel itself in my situation. I tried 6.6 and was still having issues.