Replaced NVMe, now I get "read only filesystem" sometimes on wake

Peter_Conrad · December 7, 2022, 4:49am

I’ve had my Framework about a year, and my drive started to get full. I ordered a new Western Digital NVMe, and it’s fast and awesome. Clonezilla cloned it, gParted resized it, and now my Framework runs exactly the same but with more space.

Except… from time to time when I wake it from sleep, it’s unresponsive. Ctrl-Alt-F1 gets me to a terminal that has a large number of error messages about trying to write to the journal but it can’t because it’s a read-only filesystem.
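
The /etc/fstab posted later in this thread mounts / with errors=remount-ro, so ext4 flips the root filesystem to read-only when it hits an I/O error. That would be consistent with the journal errors described above. Below is a minimal sketch for checking that state from the Ctrl-Alt-F1 terminal. The helper name is_read_only is made up for illustration; findmnt is the util-linux tool used elsewhere in the thread.

```shell
#!/bin/sh
# Succeed if a comma-separated mount-option string contains the standalone
# option "ro". Matching whole tokens avoids a false positive on
# "errors=remount-ro". (is_read_only is a hypothetical helper name.)
is_read_only() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example use on a live system: inspect the current mount options of /.
if command -v findmnt >/dev/null 2>&1; then
    opts=$(findmnt -no OPTIONS / 2>/dev/null || true)
    if is_read_only "$opts"; then
        echo "/ is currently mounted read-only"
    else
        echo "/ is mounted read-write (or could not be inspected)"
    fi
fi
```

If / does turn out to be read-only, the kernel messages from the same session (dmesg) are probably the best place to look for the NVMe or ext4 error that triggered the remount.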

Some page online talked about how I should update the SSD’s firmware and how it has something to do with trim() but I don’t know if this is relevant or what to do. Anyone experiencing this? Anyone know how to solve it?

I’m on Ubuntu 22.10 on an 11th gen Framework with 64 GB RAM and now a 2 TB drive, and I can provide the model # of the drive if that’s relevant.

640kb · December 7, 2022, 3:57pm

This sounds similar to what I’m trying to troubleshoot on Debian with a different drive. Haven’t narrowed it down much yet but I’ll let you know if/when I do. Interesting that this only happened when you switched SSDs and with no change to the OS.

Peter_Conrad · December 7, 2022, 4:38pm

Yeah, strange indeed. And other than that, everything seems fine. I guess I should just, you know, keep good backups!

I did set it to run fsck on startup, just in case.

Matt_Hartley · December 8, 2022, 12:35am

Assuming the drive isn’t bad, I’d try changing the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub to:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pcie_aspm=off"

Then sudo update-grub

I’ve seen pcie_aspm=off help in these situations.
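
A change to /etc/default/grub only takes effect after update-grub and a reboot, so it can be worth confirming that the running kernel actually received the parameter. Here is a rough sketch that reads /proc/cmdline. The function name kernel_param is an assumption for illustration, not an existing tool.

```shell
#!/bin/sh
# Print the value of a kernel command-line parameter, or nothing if absent.
#   $1 = parameter name
#   $2 = command-line text (optional; defaults to /proc/cmdline)
# The body runs in a subshell so "set -f" (no globbing) and the variables
# do not leak into the caller. (kernel_param is a hypothetical helper.)
kernel_param() (
    set -f
    name=$1
    cmdline=${2:-$(cat /proc/cmdline 2>/dev/null)}
    for word in $cmdline; do
        case "$word" in
            "$name"=*) printf '%s\n' "${word#*=}" ;;
        esac
    done
)

# Example use on a live system:
if [ -r /proc/cmdline ]; then
    echo "pcie_aspm=$(kernel_param pcie_aspm)"
fi
```

An empty value after the reboot would suggest the edit did not make it into the generated GRUB config.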

Peter_Conrad · December 8, 2022, 6:09pm

Thanks! I’ve made the change, and I’ll see how it goes. I appreciate the help. Previous version was:

# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mem_sleep_default=deep fsck.mode=force fsck.repair=yes"

I wonder whether to take the mem_sleep_default setting from there or not, but I’ll try without for now.

640kb · December 10, 2022, 10:13pm

From my experience, if you take that away, the kernel will default to s2idle rather than deep sleep. See /sys/power/mem_sleep; it shows the current selection in brackets:

$ cat /sys/power/mem_sleep 
s2idle [deep]

For me the problem is with deep sleep, and s2idle works fine (except for higher power usage that is, which is why I’m still trying to get the deep part working).
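
For anyone scripting this check, the bracketed entry can be pulled out of that string with sed. This is only a sketch, and active_sleep_mode is a name invented here.

```shell
#!/bin/sh
# Print the bracketed (currently selected) entry from a mem_sleep-style
# string, e.g. "s2idle [deep]" -> "deep".
# (active_sleep_mode is a hypothetical helper name.)
active_sleep_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Example use on a live system:
if [ -r /sys/power/mem_sleep ]; then
    echo "active: $(active_sleep_mode "$(cat /sys/power/mem_sleep)")"
fi

# The selection can also be changed until the next reboot, without touching
# GRUB, by writing one of the listed modes back to the same file:
#   echo deep | sudo tee /sys/power/mem_sleep
```

Switching at runtime like this makes it easier to compare s2idle and deep over a few suspend cycles before committing to a kernel parameter.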

Peter_Conrad · December 12, 2022, 3:06pm

It happened again last night. It seems to happen when I’m trying to suspend (but not every time). The symptom last night was:

1. Select “Suspend” from the power menu, unplug my USB mouse/keyboard etc. and the power cord.
2. Realize some minutes later that the Framework’s fan is still on, open laptop.
3. Black screen. Ctrl-Alt-F1 got me to a terminal, which was showing the error messages in the attached image.

When I do cat /sys/power/mem_sleep I see [s2idle] deep.

So, I dunno, I’d rather not clone to another NVMe, I think this (brand new) device is probably good, but I suppose I can’t rule it out. But I wonder if it’s related to SSD TRIM maybe.

nadb · December 12, 2022, 4:48pm

I will be frank here, and this is not really directed at you as a user. Linux suspend, hibernate, hybrid mode: all of it has been buggy as hell for years. At this point the CPU handles lower power states so well that Intel even used it as a justification for getting rid of S3. All of them are also so dependent on the firmware/UEFI/BIOS that the problem could even be there. So what do I always recommend… don’t use any of them. The laptop boots faster than it will come out of hibernate, and the hours of troubleshooting to get anywhere near a stable suspend are just not worth it. Your laptop will drop to a lower level of power use even without it when it idles long enough. Spare yourself the headaches associated with it. Disable all of it in systemd, and move on with your day.

Peter_Conrad · December 12, 2022, 5:06pm

Well that’s an interesting way to go. I’ll think about that. Thing is, though, for a laptop, it is nice to be able to close it, walk to another room in the house, and open it again.

And I’ve had Linux laptops in the past that suspended and hibernated very well (and some that didn’t).

Call me a naive optimist, but I think Framework is an amazing idea—one that others are doing as well—and Linux is an amazing idea—and part of the way we should be thinking is “how can we make this better?” not “this part doesn’t work, so let’s ignore it.”

But I’m a user, not a kernel developer, so I suppose I also should “put up or shut up.”

At the very least, though, your comments do lend some credence to the idea that this is a suspend problem and not a busted SSD, so that’s encouraging.

nadb · December 12, 2022, 5:21pm

At the least, disable it for now to see if the behavior improves. That way you can narrow down the real issue. Regardless, you should be able to close your lid anyway, unless the Framework laptop pulls air in through the keyboard. The CPU will handle the power usage, particularly if you use thermald with dptfxtract setting up an automatic conf file.

Matt_Hartley · December 12, 2022, 8:18pm

I have the same exact config, except mine wasn’t cloned. I wonder if there is something happening there causing problems.

Peter_Conrad · December 12, 2022, 8:52pm

Yours isn’t cloned, and yours isn’t having the problem? Yeah, I wonder, then. I will say that I manually ran fstrim and it trimmed 1.1 TiB, which is probably close to the amount by which I resized the partition after cloning. So I wonder if when I cloned and resized, it didn’t do something it needed to do to all that open space.

My files are all there, as far as I can tell—and I haven’t erased the drive I cloned from, so that’s some form of fairly recent backup.

nadb · December 12, 2022, 9:46pm

Silly question: do you have a resume=uuidofswap entry in the kernel parameters, and does it match whatever the entry is in your /etc/fstab?

My suspicion is that one is missing or the uuid may be wrong in one location.

Peter_Conrad · December 12, 2022, 10:27pm

Ah! I think you are right that one is missing. My /etc/fstab looks like this (leading commented lines removed):

UUID=ead8646c-dc70-477e-b316-7e13aa93b32b /               ext4    errors=remount-ro 0       1
# /boot/efi was on /dev/nvme0n1p1 during installation
UUID=EB57-96DD  /boot/efi       vfat    umask=0077      0       1
/swapfile                                 none            swap    sw              0       0

The command cat /etc/default/grub | grep resume returns no results.

Here is the relevant device:

Disk /dev/nvme0n1: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: WD_BLACK SN850X 2000GB                  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 4D2CA474-00B6-4473-8613-55567F7B6BEE

Device           Start        End    Sectors  Size Type
/dev/nvme0n1p1    2048    1050623    1048576  512M EFI System
/dev/nvme0n1p2 1050624 3906672419 3905621796  1.8T Linux filesystem

The blkid command says:

$ sudo blkid
/dev/nvme0n1p2: UUID="ead8646c-dc70-477e-b316-7e13aa93b32b" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="ff756fbb-6e45-49ad-a2cd-a0c1ae3979b6"

... # omitted some /dev/loop* devices

/dev/nvme0n1p1: UUID="EB57-96DD" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="0f3d226f-26ed-4dc9-bfee-5772df376fed"

So, what I don’t know is how to get the UUID of /swap so I can add it to my Grub file. The only uncommented lines of /etc/default/grub are:

GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mem_sleep_default=deep fsck.mode=force fsck.repair=yes"
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pcie_aspm=off fsck.mode=force fsck.repair=yes"
GRUB_CMDLINE_LINUX=""

Do you know how I can fix it?

lbkNhubert · December 12, 2022, 11:01pm

Jumping in here, if I may. It looks like you are using a swap file rather than a swap partition, so keep that in mind when looking through the following pages. They should help you to get things working.

https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Hibernation
and
https://www.kernel.org/doc/html/latest/power/basic-pm-debugging.html

Good luck!

Peter_Conrad · December 12, 2022, 11:22pm

Thanks @nadb @lbkNhubert! From what I can figure out from Dr. Google:

$ findmnt -no UUID -T /swapfile
ead8646c-dc70-477e-b316-7e13aa93b32b
$ sudo filefrag -v /swapfile 
[sudo] password: 
Filesystem type is: ef53
File size of /swapfile is 2147483648 (524288 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..   63487:      34816..     98303:  63488:            
   1:    63488..  126975:     100352..    163839:  63488:      98304:
   2:   126976..  190463:     165888..    229375:  63488:     163840:
   3:   190464..  253951:     231424..    294911:  63488:     229376:
   4:   253952..  481279:     296960..    524287: 227328:     294912:
   5:   481280..  524287:     557056..    600063:  43008:     524288: last,eof
/swapfile: 6 extents found

So I should add the following to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub:

resume=UUID=ead8646c-dc70-477e-b316-7e13aa93b32b resume_offset=34816

Is that right? If I seem over-cautious, it’s because setting a physical offset seems very risky if I don’t know exactly what I’m doing, and because the doc for doing this is peppered with warnings (though I am using ext4 not btrfs at least).

Also the doc is for Arch not Ubuntu, but it seems to match what’s going on here.

But I just want to make sure this new thing that started happening once I cloned my disk doesn’t have a simpler solution before I start putting numbers into my Grub file.
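
As a cross-check on reading the number off by eye, the first physical offset can be extracted from the filefrag -v output with awk. The resume= and resume_offset= parameters are the ones the kernel uses to find a hibernation image when resuming from disk. This is a sketch only, and first_physical_offset is a made-up helper name.

```shell
#!/bin/sh
# Read `filefrag -v` output on stdin and print the first physical offset,
# i.e. the start of extent 0, which is the value used for resume_offset.
# (first_physical_offset is a hypothetical helper name.)
first_physical_offset() {
    awk '$1 == "0:" { sub(/\.\.$/, "", $4); print $4; exit }'
}

# Example use on a live system (needs root to read extent data):
#   sudo filefrag -v /swapfile | first_physical_offset
```

Fed the output pasted above, this picks 34816, the same value as the hand-read one. The offset is only valid for the swapfile as it exists now; recreating or moving the file would change it.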

nadb · December 13, 2022, 12:01am

Make sure you have a live-usb available always. Going to assume that you do.
If you make any modifications to /etc/fstab, run mount -a after each edit. If it spits back that something can’t mount, revert the edit.
You are using a swapfile, so things will be different. You won’t be adding any kernel parameters.
Instead, vim /etc/suspend.conf and put resume device = on one line and resume offset = on another, each followed by your install-specific data.

Can you post the results of lsblk -f? This will give us a better idea of how your system is partitioned. I want to be sure we are not dealing with any encrypted partitions.

Peter_Conrad · December 13, 2022, 12:27am

It’s long! But here it is. BTW there are a couple USB flash drives plugged in, which you’ll see as sda and sdb.

$ sudo lsblk -f
[sudo] password: 
NAME FSTYPE FSVER LABEL UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
loop0
     squash 4.0                                                    0   100% /snap/acrordrdc/62
loop1
     squash 4.0                                                    0   100% /snap/audacity/1032
loop2
     squash 4.0                                                    0   100% /snap/audacity/1051
loop3
     squash 4.0                                                    0   100% /snap/bare/5
loop4
     squash 4.0                                                    0   100% /snap/codium/283
loop5
     squash 4.0                                                    0   100% /snap/codium/285
loop6
     squash 4.0                                                    0   100% /snap/core/14399
loop7
     squash 4.0                                                    0   100% /snap/core18/2620
loop8
     squash 4.0                                                    0   100% /snap/core18/2632
loop9
     squash 4.0                                                    0   100% /snap/core20/1695
loop10
     squash 4.0                                                    0   100% /snap/core20/1738
loop11
     squash 4.0                                                    0   100% /snap/core22/310
loop12
     squash 4.0                                                    0   100% /snap/core22/444
loop13
     squash 4.0                                                    0   100% /snap/firefox/2088
loop14
     squash 4.0                                                    0   100% /snap/firefox/2154
loop15
     squash 4.0                                                    0   100% /snap/flameshot/182
loop16
     squash 4.0                                                    0   100% /snap/flameshot/183
loop17
     squash 4.0                                                    0   100% /snap/gnome-3-28-1804/161
loop18
     squash 4.0                                                    0   100% /snap/gnome-3-34-1804/77
loop19
     squash 4.0                                                    0   100% /snap/gnome-3-38-2004/115
loop20
     squash 4.0                                                    0   100% /snap/gnome-3-38-2004/119
loop21
     squash 4.0                                                    0   100% /snap/gnome-42-2204/29
loop22
     squash 4.0                                                    0   100% /snap/gnome-42-2204/44
loop23
     squash 4.0                                                    0   100% /snap/gtk2-common-themes/13
loop24
     squash 4.0                                                    0   100% /snap/gtk-common-themes/1534
loop25
     squash 4.0                                                    0   100% /snap/gtk-common-themes/1535
loop26
     squash 4.0                                                    0   100% /snap/heroku/4087
loop27
     squash 4.0                                                    0   100% /snap/heroku/4092
loop28
     squash 4.0                                                    0   100% /snap/inkscape/10509
loop29
     squash 4.0                                                    0   100% /snap/inkscape/10512
loop30
     squash 4.0                                                    0   100% /snap/kde-frameworks-5-91-qt-5-15-3-core20/1
loop31
     squash 4.0                                                    0   100% /snap/kde-frameworks-5-96-qt-5-15-5-core20/7
loop32
     squash 4.0                                                    0   100% /snap/kde-frameworks-5-98-qt-5-15-6-core20/9
loop33
     squash 4.0                                                    0   100% /snap/kde-frameworks-5-99-qt-5-15-7-core20/3
loop34
     squash 4.0                                                    0   100% /snap/kde-frameworks-5-99-qt-5-15-7-core20/7
loop35
     squash 4.0                                                    0   100% /snap/kde-frameworks-5-core18/32
loop36
     squash 4.0                                                    0   100% /snap/kde-frameworks-5-qt-5-15-3-core20/8
loop37
     squash 4.0                                                    0   100% /snap/kdenlive/71
loop38
     squash 4.0                                                    0   100% /snap/kdenlive/75
loop39
     squash 4.0                                                    0   100% /snap/ksnip/407
loop40
     squash 4.0                                                    0   100% /snap/ksnip/443
loop41
     squash 4.0                                                    0   100% /snap/p3x-onenote/142
loop42
     squash 4.0                                                    0   100% /snap/p3x-onenote/146
loop43
     squash 4.0                                                    0   100% /snap/scrcpy/386
loop44
     squash 4.0                                                    0   100% /snap/scrcpy/394
loop45
     squash 4.0                                                    0   100% /snap/shortwave/66
loop46
     squash 4.0                                                    0   100% /snap/skype/238
loop47
     squash 4.0                                                    0   100% /snap/shortwave/72
loop48
     squash 4.0                                                    0   100% /snap/skype/240
loop49
     squash 4.0                                                    0   100% /snap/slack/67
loop50
     squash 4.0                                                    0   100% /snap/slack/68
loop51
     squash 4.0                                                    0   100% /snap/snap-store/599
loop52
     squash 4.0                                                    0   100% /snap/snap-store/638
loop53
     squash 4.0                                                    0   100% /snap/snapd-desktop-integration/14
loop54
     squash 4.0                                                    0   100% /snap/snapd-desktop-integration/43
loop55
     squash 4.0                                                    0   100% /snap/sosumi/15
loop56
     squash 4.0                                                    0   100% /snap/typora/74
loop57
     squash 4.0                                                    0   100% /snap/typora/76
loop58
     squash 4.0                                                    0   100% /snap/ubports-installer/418
loop59
     squash 4.0                                                    0   100% /snap/ubports-installer/435
loop60
     squash 4.0                                                    0   100% /snap/wine-platform-6-stable/19
loop61
     squash 4.0                                                    0   100% /snap/wine-platform-runtime/321
loop62
     squash 4.0                                                    0   100% /snap/wine-platform-runtime/322
sda                                                                         
└─sda1
     exfat  1.0   Samsung USB
                        64A5-F009                              80.3G    66% /media/pconrad/Samsung USB1
sdb                                                                         
└─sdb1
     vfat   FAT32       B0DF-383F                                 2G    47% /media/pconrad/B0DF-383F
nvme0n1
                                                                            
├─nvme0n1p1
│    vfat   FAT32       EB57-96DD                             505.7M     1% /boot/efi
└─nvme0n1p2
     ext4   1.0         ead8646c-dc70-477e-b316-7e13aa93b32b    1.1T    37% /var/snap/firefox/common/host-hunspell

But also:

$ cat /etc/suspend.conf
cat: /etc/suspend.conf: No such file or directory

Looks like this page talks about kernel parameters too.

The weird thing about this whole thing is that it’s intermittent. I just suspended 3 times in a row and came back with no problems. I do wonder if it was related to the TRIM.

nadb · December 13, 2022, 12:39am

Damn snaps… lol. One of the reasons I left Ubuntu a decade ago.
Just to make sure we are not dealing with any LVM volumes, run lvdisplay. If we don’t have any lurking then we can proceed.

Of course, if it is working correctly now you can hold off as well. No point in trying to add anything if it is working now. Yes, too much garbage waiting on a trim could also affect it.

Peter_Conrad · December 13, 2022, 12:49am

@nadb I haven’t installed LVM, so I believe we have no LVMs. I don’t have lvdisplay installed, in fact!

Yeah, I feel like I should let it settle and let’s see if it does it again. Without personally having a high level of kernel knowledge, what I think happened is:

1. I cloned the drive (success!)
2. I grew the cloned partition (success!)
3. I swapped the drive and booted (success!)
4. The machine said “Oh, your drive UUID used to be X and is now Y. Is that okay?” And I said it was (I think I did it right)
5. When I grew the partition, I didn’t tell it to do anything to that new filesystem space, because I think in HDD not SSD and I figured it would know what to do (problem?)
6. It had a terabyte or so of undefined (garbage?) stuff (maybe?) because something was aware the partition grew but some other thing wasn’t aware (←my naive theory)
7. I ran sudo fstrim -v / and I’m now hoping, if I was right in Step 6 above, that maybe all is well.
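
To go with the manual fstrim run in the last step, here is a small sketch for checking whether the kernel thinks the drive accepts discard (TRIM) requests at all. It reads discard_granularity from sysfs, where 0 means discard is not supported. The helper name supports_discard is an assumption for illustration, and nvme0n1 is the device name from the fdisk output earlier in the thread.

```shell
#!/bin/sh
# Succeed if a discard_granularity value indicates that the device accepts
# discard (TRIM) requests. The kernel reports 0 when discard is unsupported.
# (supports_discard is a hypothetical helper name.)
supports_discard() {
    [ "${1:-0}" -gt 0 ] 2>/dev/null
}

# Example use on a live system; adjust the device name as needed.
dev=nvme0n1
f=/sys/block/$dev/queue/discard_granularity
if [ -r "$f" ]; then
    if supports_discard "$(cat "$f")"; then
        echo "$dev reports discard support"
    else
        echo "$dev reports no discard support"
    fi
fi

# Whether periodic trimming is scheduled can be checked with, assuming the
# unit exists on the system:
#   systemctl is-enabled fstrim.timer
```

If periodic trimming is already scheduled, a one-off manual fstrim after growing the partition should not need repeating by hand.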

But… we shall see?

Some other webpage somewhere said something about how maybe the SSD is trying to run TRIM, and it becomes RO because of that, and then the OS freaks out because it wasn’t expecting it. That seems like a solvable problem too. But I don’t know how to know what’s going on at this point (which is why I have fsck in my Grub file now).

Thanks for your help thus far. I’ve been running Linux in one form or another for a long time now, but I’m more a graphic novelist and sometime light programmer than I am a Linux admin. I’ve watched Linux go from “If you have a Broadcom WiFi chip, you have to build and install this kernel mod” (which I’ve done) to being damn near as easy as any other consumer OS. So I am happy with where things are, mostly, except when I run into something like this. And while I could say “with Windows or Mac this wouldn’t happen,” I know that (1) that’s not true; and (2) unlike Linux, you don’t really have as much power to fix it when it does happen!

So there you go. Thanks everyone on this thread, and I’ll come back when it does it again.
