I’ve had my Framework about a year, and my drive started to get full. I ordered a new Western Digital NVMe, and it’s fast and awesome. Clonezilla cloned the old drive, GParted resized the partition, and now my Framework runs exactly the same but with more space.
Except… from time to time when I wake it from sleep, it’s unresponsive. Ctrl-Alt-F1 gets me to a terminal that has a large number of error messages about trying to write to the journal but it can’t because it’s a read-only filesystem.
Some page online talked about how I should update the SSD’s firmware and how it has something to do with trim() but I don’t know if this is relevant or what to do. Anyone experiencing this? Anyone know how to solve it?
I’m on Ubuntu 22.10 on an 11th gen Framework with 64 GB RAM and now a 2 TB drive, and I can provide the model # of the drive if that’s relevant.
This sounds similar to what I’m trying to troubleshoot on Debian with a different drive. Haven’t narrowed it down much yet but I’ll let you know if/when I do. Interesting that this only happened when you switched SSDs and with no change to the OS.
From my experience, if you take that away, the kernel will default to s2idle rather than deep sleep. See /sys/power/mem_sleep; it shows the current selection in brackets:
$ cat /sys/power/mem_sleep
s2idle [deep]
For me the problem is with deep sleep, and s2idle works fine (except for higher power usage that is, which is why I’m still trying to get the deep part working).
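In case it helps anyone compare notes, here is a small sketch for pulling out just the active mode (the bracketed entry) from that file. The function name active_sleep_mode is made up for illustration, and the sketch only assumes the file uses the bracket format shown above.

```shell
#!/bin/sh
# Print the active suspend mode: the entry wrapped in [brackets].
# Takes the contents of /sys/power/mem_sleep as its only argument.
active_sleep_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Only read the real file when it exists (it may be absent in a container or VM).
if [ -r /sys/power/mem_sleep ]; then
    active_sleep_mode "$(cat /sys/power/mem_sleep)"
fi
```

If deep appears in the list, echo deep | sudo tee /sys/power/mem_sleep should switch to it until the next reboot, which makes it easy to test one mode at a time without touching the boot configuration.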
It happened again last night. It seems to happen when I’m trying to suspend (but not every time). The symptom last night was:
Select “Suspend” from the power menu, unplug my USB mouse/keyboard etc. and the power cord.
Realize some minutes later that the Framework’s fan is still on, open laptop.
Black screen. Ctrl-Alt-F1 got me to a terminal, which was showing the error messages in the attached image.
When I do cat /sys/power/mem_sleep I see [s2idle] deep.
So, I dunno, I’d rather not clone to another NVMe, I think this (brand new) device is probably good, but I suppose I can’t rule it out. But I wonder if it’s related to SSD TRIM maybe.
I will be frank here, and this is not really directed at you as a user. Linux suspend, hibernate, hybrid mode, all of it has been buggy as hell for years. At this point the CPU handles lower power states so well that Intel even used it as a justification for getting rid of S3. All of them also depend so heavily on the firmware/UEFI/BIOS that the problem could even be there. So what do I always recommend… don’t use any of them. The laptop boots faster than it will come out of hibernate, and the hours of troubleshooting to get anywhere near a stable suspend are just not worth it. Your laptop will drop to a lower level of power use even without it when it idles long enough. Spare yourself the headaches associated with it. Disable all of it in systemd, and move on with your day.
Well that’s an interesting way to go. I’ll think about that. Thing is, though, for a laptop, it is nice to be able to close it, walk to another room in the house, and open it again.
And I’ve had Linux laptops in the past that suspended and hibernated very well (and some that didn’t).
Call me a naive optimist, but I think Framework is an amazing idea—one that others are doing as well—and Linux is an amazing idea—and part of the way we should be thinking is “how can we make this better?” not “this part doesn’t work, so let’s ignore it.”
But I’m a user, not a kernel developer, so I suppose I also should “put up or shut up.”
At the very least, though, your comments do lend some credence to the idea that this is a suspend problem and not a busted SSD, so that’s encouraging.
At the very least, disable it for now to see if the behavior improves. That way you can narrow down the real issue. Regardless, you should be able to close your lid anyway, unless the Framework laptop pulls air in through the keyboard. The CPU will handle the power usage, particularly if you use thermald with dptfxtract generating the config file automatically.
Yours isn’t cloned, and yours isn’t having the problem? Yeah, I wonder, then. I will say that I manually ran fstrim and it trimmed 1.1 TiB, which is probably close to the amount by which I resized the partition after cloning. So I wonder if when I cloned and resized, it didn’t do something it needed to do to all that open space.
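For anyone wanting to check the TRIM side of this, here is a rough sketch that reports whether each disk advertises discard (TRIM) support at all. The helper name report_discard is invented for this example, and it assumes that lsblk -b prints DISC-MAX as a plain byte count, with 0 meaning the device does not accept discards.

```shell
#!/bin/sh
# Reads "NAME DISC-MAX" pairs on stdin, one device per line.
# Assumption: a DISC-MAX of 0 bytes means the device has no discard support.
report_discard() {
    while read -r name disc_max; do
        if [ "${disc_max:-0}" -gt 0 ] 2>/dev/null; then
            echo "$name: discard supported"
        else
            echo "$name: no discard support"
        fi
    done
}

# -d: whole disks only, -n: no header row, -b: sizes in bytes.
if command -v lsblk >/dev/null 2>&1; then
    lsblk -dnb -o NAME,DISC-MAX | report_discard
fi
```

On Ubuntu, systemctl status fstrim.timer should also show whether the periodic trim job is enabled and when it last ran, which would tell you whether the manual fstrim was doing work the timer had not yet gotten to.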
My files are all there, as far as I can tell—and I haven’t erased the drive I cloned from, so that’s some form of fairly recent backup.
Jumping in here, if I may. It looks like you are using a swap file rather than a swap partition, so keep that in mind when looking through the following pages. They should help you to get things working.
Is that right? If I seem over-cautious, it’s because setting a physical offset seems very risky if I don’t know exactly what I’m doing, and because the doc for doing this is peppered with warnings (though I am using ext4 not btrfs at least).
Also the doc is for Arch, not Ubuntu, but it seems to match what’s going on here.
But I just want to make sure this new thing that started happening once I cloned my disk doesn’t have a simpler solution before I start putting numbers into my Grub file
Make sure you have a live-usb available always. Going to assume that you do.
If you make any modifications to /etc/fstab, run mount -a after every edit. If it spits back that something can’t mount, revert the edit.
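A minimal sketch of that edit-and-check routine, assuming you are working as root. The backup_file helper is hypothetical, written only for this example; findmnt --verify and mount -a are the real tools doing the checking.

```shell
#!/bin/sh
# Copy a config file aside before editing it, and print the backup's path.
# The timestamp suffix keeps earlier backups from being overwritten.
backup_file() {
    backup="$1.bak.$(date +%Y%m%d%H%M%S)"
    cp -p -- "$1" "$backup" && printf '%s\n' "$backup"
}

# Typical use, as root:
#   backup_file /etc/fstab
#   (edit /etc/fstab)
#   findmnt --verify    # parses fstab and reports problems without mounting anything
#   mount -a            # then try mounting everything listed
```

If either check complains, copying the backup back over /etc/fstab restores the previous state before the next reboot has a chance to trip over it.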
You are using a swapfile so things will be different. You won’t be adding any kernel parameters.
Instead, open /etc/suspend.conf in vim and add one line starting with resume device = and another starting with resume offset =, each filled in with the values specific to your install.
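For what it’s worth, here is a rough sketch of where those two values come from for a swap file on ext4. It assumes the swap file lives at /swapfile (a guess; swapon --show lists the real path) and that filefrag prints its usual space-separated columns. The function name is made up, and the sample line in the comment is invented for illustration, not taken from this machine.

```shell
#!/bin/sh
# Reads `filefrag -v <swapfile>` output on stdin and prints the physical
# offset of the first extent (the row labelled "0:").
# Assumed row shape:   0:   0..  32767:   34816..  67583:  32768:
swap_offset_from_filefrag() {
    awk '$1 == "0:" { print substr($4, 1, length($4) - 2); exit }'
}

# Typical use, as root (the /swapfile path is an assumption):
#   filefrag -v /swapfile | swap_offset_from_filefrag   # the resume offset
#   findmnt -no UUID -T /swapfile                       # filesystem holding the file
```

The offset is only valid for as long as the swap file is not recreated or moved, so it is worth re-checking after any change to the file.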
Can you post the results of lsblk -f? This will give us a better idea of how your system is partitioned. I want to be sure we are not dealing with any encrypted partitions.
$ cat /etc/suspend.conf
cat: /etc/suspend.conf: No such file or directory
Looks like this page talks about kernel parameters too.
The weird thing about this whole thing is that it’s intermittent. I just suspended 3 times in a row and came back with no problems. I do wonder if it was related to the TRIM.
Damn snaps… lol. One of the reasons I left Ubuntu a decade ago.
Just to make sure we are not dealing with any LVM volumes, run lvdisplay. If we don’t have any lurking then we can proceed.
Of course if it is working correctly now you can hold off as well. No point in trying to add anything if it is working now. Yes too much garbage waiting on a trim could also affect it.
@nadb I haven’t installed LVM, so I believe we have no LVMs. I don’t have lvdisplay installed, in fact!
Yeah, I feel like I should let it settle and let’s see if it does it again. Without personally having a high level of kernel knowledge, what I think happened is:
1. I cloned the drive (success!)
2. I grew the cloned partition (success!)
3. I swapped the drive and booted (success!)
4. The machine said “Oh, your drive UUID used to be X and is now Y. Is that okay?” And I said it was (I think I did it right)
5. When I grew the partition, I didn’t tell it to do anything to that new filesystem space, because I think in HDD not SSD and I figured it would know what to do (problem?)
6. It had a terabyte or so of undefined (garbage?) stuff (maybe?) because something was aware the partition grew but some other thing wasn’t aware (←my naive theory)
I ran sudo fstrim -v / and I’m now hoping, if I was right in Step 6 above, that maybe all is well.
But… we shall see?
Some other webpage somewhere said something about how maybe the SSD is trying to run TRIM, and it becomes RO because of that, and then the OS freaks out because it wasn’t expecting it. That seems like a solvable problem too. But I don’t know how to know what’s going on at this point (which is why I have fsck in my Grub file now).
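One way to get a clearer picture the next time it happens: since the Ctrl-Alt-F1 terminal still works, the kernel log in memory can be filtered for storage messages even when the disk has gone read-only. A small sketch follows; the storage_errors name and the exact patterns are my own guesses at what is worth matching, not an official list.

```shell
#!/bin/sh
# Keep only kernel log lines that mention NVMe, ext4 errors,
# a read-only remount, or generic I/O errors (case-insensitive).
storage_errors() {
    grep -Ei 'nvme|ext4-fs error|remounting filesystem read-only|i/o error'
}

# Typical use at the terminal while the problem is happening:
#   sudo dmesg | storage_errors
# After a reboot, the previous boot's kernel messages (if any were saved):
#   journalctl -k -b -1 | storage_errors
```

If the filesystem was already read-only, the journal for that boot may be incomplete, so the dmesg form is probably the more reliable of the two.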
Thanks for your help thus far. I’ve been running Linux in one form or another for a long time now, but I’m more a graphic novelist and sometime light programmer than I am a Linux admin. I’ve watched Linux go from “If you have a Broadcom WiFi chip, you have to build and install this kernel mod” (which I’ve done) to being damn near as easy as any other consumer OS. So I am happy with where things are, mostly, except when I run into something like this. And while I could say “with Windows or Mac this wouldn’t happen,” I know that (1) that’s not true; and (2) unlike with Linux, you don’t really have as much power to fix it when it does happen!
So there you go. Thanks everyone on this thread, and I’ll come back when it does it again