Input/output error on disk after S3 suspend

Hi, I am running Arch Linux on my Framework. After resuming from a (successful) deep suspend, the touchpad, keyboard, screen, etc. are all working, but the disk is completely inaccessible: no reading or writing, just I/O errors. Btrfs logs thousands of read and write errors.

SSD: Inland Premium 2TB M.2 2280 PCIe NVMe 3.0 x4 (used in 2 other laptops over the past 2 years; both suspended fine)
RAM: 1x 8GB 3200MHz
OS: Arch Linux, fully up to date
BIOS: 3.03 beta (the same happened on 3.02)

s2idle and hibernation both work perfectly fine.

I’ll see if I can get more error messages; they aren’t logged to disk because, well, the disk is read-only.

I have a similar problem on my unit with Arch Linux, except with ext4 on a Crucial P5 SSD. My BIOS has not been updated yet, so it is still on 3.02.

I get this problem even with s2idle set in /sys/power/mem_sleep if the unit has been asleep for a while, although after a short sleep s2idle does seem to resume OK.
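For anyone comparing modes: the kernel lists the supported modes in /sys/power/mem_sleep and marks the active one with square brackets (e.g. `s2idle [deep]`). Writing a mode name to that file as root, e.g. `echo s2idle | sudo tee /sys/power/mem_sleep`, switches it until the next boot. Below is a minimal Python sketch that reports which mode is selected; `parse_mem_sleep` is just a hypothetical helper name, not part of any existing tool.

```python
from __future__ import annotations

from pathlib import Path

MEM_SLEEP = Path("/sys/power/mem_sleep")  # standard sysfs file on Linux


def parse_mem_sleep(text: str) -> tuple[list[str], str | None]:
    """Return (available modes, selected mode) from the mem_sleep contents.

    The kernel prints every supported mode and wraps the active one in
    square brackets, e.g. "s2idle [deep]".
    """
    modes: list[str] = []
    selected: str | None = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets around the active mode
            selected = token
        modes.append(token)
    return modes, selected


if __name__ == "__main__":
    if MEM_SLEEP.exists():
        modes, selected = parse_mem_sleep(MEM_SLEEP.read_text())
        print(f"available: {modes}, selected: {selected}")
    else:
        print("no /sys/power/mem_sleep on this system")
```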

Other things I tried: setting acpi_osi=Windows and acpiphp.disable=1, disabling the hybrid sleep options in systemd’s sleep.conf, and setting nvme_core.default_ps_max_latency_us to 0.
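To confirm that parameters like these actually reached the kernel, you can read /proc/cmdline; for nvme_core the effective value should also be visible in /sys/module/nvme_core/parameters/default_ps_max_latency_us. Here is a rough Python sketch, assuming the usual space-separated key=value format. The `WATCHED` tuple and the `parse_cmdline` name are illustrative only, and if a parameter is given more than once this sketch simply keeps the last value.

```python
from __future__ import annotations

import shlex
from pathlib import Path

# Parameters mentioned in the post above; this list is only an example.
WATCHED = ("acpi_osi", "acpiphp.disable", "nvme_core.default_ps_max_latency_us")


def parse_cmdline(text: str) -> dict[str, str | None]:
    """Map each kernel parameter to its value (None for bare flags like 'quiet').

    shlex.split honours double quotes, so acpi_osi="Windows 2020" becomes
    a single token. Repeated keys: the last occurrence wins.
    """
    params: dict[str, str | None] = {}
    for token in shlex.split(text):
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():
        params = parse_cmdline(cmdline.read_text())
        for name in WATCHED:
            print(f"{name}: {params.get(name, '<not set>')}")
```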

Nothing I’ve tried has made any difference: waking from a long s2idle/suspend fails in every configuration, and the console is full of disk I/O errors.

Sounds similar to this: Ubuntu 21.04 on the Framework Laptop - #30 by ezhik

Maybe try ZFS? That fixed it there.

OK, I found something interesting. While running this script from Intel, trying to get C10 in idle (which I have been unable to do), it reported:

check PCIe bridge Link PM states:

Available bridge device: 0000:00:06.0 0000:00:07.0 0000:00:07.1 0000:00:07.2 0000:00:07.3 0000:00:1d.0

The PCIe bridge link power management state is:
0000:00:06.0 Link is in L0

The link power management state of PCIe bridge: 0000:00:06.0 is not expected. 
which is expected to be L1.1 or L1.2, or user would run this script again.

The L1SubCap of the failed 0000:00:06.0 is:
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+

The L1SubCtl1 of the failed 0000:00:06.0 is:
		L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+

Checking PCI Devices tree diagram:
-[0000:00]-+-00.0  Intel Corporation 11th Gen Core Processor Host Bridge/DRAM Registers
           +-02.0  Intel Corporation TigerLake-LP GT2 [Iris Xe Graphics]
           +-04.0  Intel Corporation TigerLake-LP Dynamic Tuning Processor Participant
           +-06.0-[01]----00.0  Phison Electronics Corporation E12 NVMe Controller
           +-08.0  Intel Corporation GNA Scoring Accelerator module
           +-0a.0  Intel Corporation Tigerlake Telemetry Aggregator Driver
           +-0d.0  Intel Corporation Tiger Lake-LP Thunderbolt 4 USB Controller
           +-0d.2  Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0
           +-0d.3  Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #1
           +-12.0  Intel Corporation Tiger Lake-LP Integrated Sensor Hub
           +-14.0  Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1 xHCI Host Controller
           +-14.2  Intel Corporation Tiger Lake-LP Shared SRAM
           +-15.0  Intel Corporation Tiger Lake-LP Serial IO I2C Controller #0
           +-15.1  Intel Corporation Tiger Lake-LP Serial IO I2C Controller #1
           +-15.3  Intel Corporation Tiger Lake-LP Serial IO I2C Controller #3
           +-16.0  Intel Corporation Tiger Lake-LP Management Engine Interface
           +-1d.0-[aa]----00.0  Intel Corporation Wi-Fi 6 AX210/AX211/AX411 160MHz
           +-1f.0  Intel Corporation Tiger Lake-LP LPC Controller
           +-1f.3  Intel Corporation Tiger Lake-LP Smart Sound Technology Audio Controller
           +-1f.4  Intel Corporation Tiger Lake-LP SMBus Controller
           \-1f.5  Intel Corporation Tiger Lake-LP SPI Controller

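The PCIe bridge connected to the SSD (0000:00:06.0) never seems to enter a power-saving link state, even though L1.1/L1.2 are both supported and enabled on it. Maybe this is related to not reaching C10 at idle (I do get C10 in s2idle, though) and to the S3 problem?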
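The L1SubCap / L1SubCtl1 lines above come from `lspci -vvv` (run as root), where a trailing `+` means the feature is supported (Cap) or enabled (Ctl) and `-` means it is not. Below is a small Python sketch that parses those lines and lists substates the bridge supports but has disabled; the helper names are made up for illustration. With the output posted above it returns an empty list, which suggests the problem is not simply that the substates are switched off in config space.

```python
from __future__ import annotations


def parse_pm_flags(line: str) -> dict[str, bool]:
    """Turn an lspci line like 'L1SubCap: PCI-PM_L1.2+ ASPM_L1.1-' into flags.

    Tokens ending in '+' map to True, '-' to False; anything else
    (e.g. 'T_CommonMode=0us') is ignored.
    """
    _, _, body = line.partition(":")
    flags: dict[str, bool] = {}
    for token in body.split():
        if token[-1] in "+-":
            flags[token[:-1]] = token[-1] == "+"
    return flags


def missing_substates(cap_line: str, ctl_line: str) -> list[str]:
    """Substates the device supports (Cap '+') but has disabled (Ctl '-')."""
    cap = parse_pm_flags(cap_line)
    ctl = parse_pm_flags(ctl_line)
    return sorted(name for name, ok in cap.items() if ok and name in ctl and not ctl[name])


# Lines copied from the script output for bridge 0000:00:06.0.
CAP = "L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+"
CTL = "L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+"

if __name__ == "__main__":
    print(missing_substates(CAP, CTL))  # [] : everything supported is enabled
```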

As an experiment, I checked whether the latest Ubuntu (21) would behave any differently in sleep. It looks like I would have the same issue there, as the drive also drops communication.

I’m experiencing the suspend issue with a Crucial P5 Plus SSD (Crucial P5 Plus 1TB PCIe M.2 2280SS Gaming SSD, CT1000P5PSSD8): any suspend in deep mode, for any length of time, results in a read-only drive and I/O errors until a reboot. s2idle mode and hibernation do work.

Running Arch Linux.

Quite an anticlimactic solution, but I purchased a 2TB SN750 to replace my 2TB Inland, and after cloning over my Linux install, S3 suspend works totally fine. I still don’t know why it didn’t work with the Inland drive; that specific drive had no issues suspending on 2 other laptops, one Intel 8th gen and the other Ryzen 4000.

Note: I only went this route because I needed a new SSD for one of my servers, so I am putting the Inland in there.

For the record, as I also wrote in NVMe SSD options - #27 by Adrien_Rey-Jarthon, I had this problem with a Crucial P5 (non-Plus) 2TB drive, and in my case it happens not only in deep sleep but also in s2idle mode. From what I’ve read and tested, I am fairly confident this is a compatibility issue between the SSD (or its firmware) and the motherboard that available software can’t fix (at the moment). So I ordered a Samsung 980 Pro instead and will try it; if it works, I’ll return the Crucial P5. It’s sad we have this kind of incompatibility, but I suppose NVMe is not as mature as SATA and there’s a lot of new stuff to take into account ^^

I’m having the exact same issue with the exact same Crucial P5 2TB drive.

@Kieran_Levin, is there any chance this might be fixed in a future firmware update?