[RESPONDED] Laptop crashes and reboots sometimes after resuming from suspend

sydney · August 8, 2024, 6:38pm

@Max_Pearce_Basman Did you set a BIOS Boot Password by any chance?

Max_Pearce_Basman · August 8, 2024, 9:50pm

Not unless there already was one, no.

sydney · August 8, 2024, 10:22pm

@Max_Pearce_Basman Alright… Never mind, then. I had one set (i.e. storing the disk encryption passphrase in the TPM), removed the BIOS Boot PW and so far, so good (it has only been a few days). I read in the release notes for the 0.0.3.5 FW update [1] that there has been an issue where the system hangs and the EC resets the system if it can’t access the NVMe drive after resuming from suspend.

But anyhow, that also wouldn’t explain the artifacts you describe seeing shortly before the system resets.

[1] LVFS: Laptop 13 AMD Ryzen 7040

_luke · August 25, 2024, 12:12pm

I’m getting this issue as well. It happens on maybe 1 in 10 lid opens: you open it up and after a few seconds see the Framework logo and a cold boot. I’m using the AMD 13 7840U model with the latest firmware (03.05). There’s lots of swap space (which shouldn’t matter given that it’s doing a suspend via s2idle). I’m using Debian and have tried 3 different kernels now with the same issue: 6.10.6, 6.9.7, 6.1.0. There don’t seem to be many power options in the UEFI config except allowing the PCI devices to drop to gen3 for power savings, which seems to have zero effect.

I was going to switch the system to the deep/suspend-to-RAM method but that doesn’t appear to be available when I check /sys/power/mem_sleep.
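For anyone checking the same thing: the kernel lists the suspend variants it supports in /sys/power/mem_sleep and marks the active one with square brackets (for example "s2idle [deep]"). Below is a minimal Python sketch for reading that file. The function name parse_mem_sleep is made up for illustration, and a machine that only prints "[s2idle]" simply does not offer deep.

```python
from pathlib import Path


def parse_mem_sleep(text):
    """Split /sys/power/mem_sleep contents into (available_modes, active_mode).

    The active mode is the token wrapped in square brackets.
    """
    modes = []
    active = None
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        modes.append(token)
    return modes, active


if __name__ == "__main__":
    path = Path("/sys/power/mem_sleep")
    if path.exists():
        modes, active = parse_mem_sleep(path.read_text())
        print("available:", modes, "active:", active)
        if "deep" not in modes:
            print("deep (suspend-to-RAM) is not offered on this system")
```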

I’ve disabled quick boot, quiet boot and secure boot to make troubleshooting easier but probably unsurprisingly nothing has helped as this seems like something in the firmware is deciding the system can’t resume. Very frustrating.

Edit: Also, as a side note, despite using ext4, which should be fairly crash resistant, one of these accidental reboots destroyed my Signal messages database. I was under the impression that the system syncs filesystems on suspend (and there’s a “1” in /sys/power/sync_on_suspend), so that’s even more frustrating.

Garro · August 25, 2024, 4:47pm

I am affected by this issue too.

I am on Ubuntu 22.04 with the oem 22.04d kernel. Not sure if it has the patch mentioned above included. What I see in the logs, though, is a bunch of messages like this:

kernel: xhci_hcd 0000:c1:00.3: Refused to change power state from D0 to D3hot
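To see how often such a message shows up and for which device, the log lines can be tallied. This is a rough Python sketch that assumes the lines look like the one quoted above; the regular expression and the name count_refusals are illustrative only, not part of any tool.

```python
import re
from collections import Counter

# Matches lines such as:
# kernel: xhci_hcd 0000:c1:00.3: Refused to change power state from D0 to D3hot
PATTERN = re.compile(
    r"kernel: (?P<driver>\S+) "
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"Refused to change power state from (?P<src>\S+) to (?P<dst>\S+)"
)


def count_refusals(lines):
    """Count refused power-state changes per (driver, device, from, to)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            key = (match["driver"], match["device"], match["src"], match["dst"])
            counts[key] += 1
    return counts
```

Feeding it the output of journalctl (one line per list element) gives a quick picture of whether a single USB controller is responsible or several devices are affected.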

Garro · August 25, 2024, 5:10pm

Swapfiles are supported on btrfs since kernel 5.0 (Swapfile — BTRFS documentation)

sydney · August 25, 2024, 8:37pm

Heads up: Frustrated, disillusioned user rant.

I honestly thought that having removed the BIOS password would have fixed at least these hard reset issues, but unfortunately, it did not. The EC still hard resets the machine after suspend every 5-10 wakeups, and i’m still losing all my unsaved work.

Having said that, i feel disappointed that nearly a year after launch there are still these kinds of issues with the embedded controller and/or the firmware of the device.

These half-working USB ports, these constant hard resets, the USB-C PD issues, the flaky firmware and the amount of issues and quirks other people are having to work around with their machines…

In my personal opinion, i now regret spending the 1700€ i paid for the issue-ridden experience i have had so far, and i cannot recommend (at least this AMD version) this laptop in good faith to anyone any longer, seeing all the kinds of issues it has and the amount of compromises one has to accept, even if you are willing to make sacrifices for the “greater good”.

In this state, the machine is indeed barely usable for any productive means.

With all the love and respect i have for Framework’s Vision and my willingness to accept some drawbacks, as a whole, truly unfortunately but nonetheless, i feel like this vision has failed me. The truth (= my personal experience) simply is that i have never had as many issues/drawbacks with any laptop i owned before as i have had with this FW 13" AMD model.

It is as simple as it is sad.

Garro · August 26, 2024, 4:41pm

Did anyone try to contact official support for this issue?

_luke · August 26, 2024, 4:52pm

Yeah I have a support request open that I’m going to work with them on.

Garro · August 26, 2024, 5:08pm

Great. Please, keep us all updated on this thread if you can. That would be appreciated very much.

sydney · August 26, 2024, 6:06pm

I highly doubt that this is a kernel issue, considering the circumstances. After having calmed down (a bit), i think this might very well be an AMD Phoenix Platform issue, perhaps even out of the purview/reach of framework as a company at all. After having a look over the wall, it appears there are even M* powered Macs that are unable to cycle suspend/resume on a reliable basis.

Nonetheless, as @Garro pointed out, it might be very interesting to have a follow-up from framework themselves, if it doesn’t just boil down to “have you tried changing your FS” or “does it occur also with OS/2 warp”…

Windows Modern Standby truly is the IE6 of Platform Engineering these days…

_luke · August 30, 2024, 12:34am

So I opened a line to support a few days ago and their first recommendation was to reset the BIOS to optimal settings. I did that, and the only settings I’ve changed afterwards (it was already on 3.05) were disabling quick boot, disabling quiet boot and disabling secure boot. So far it’s been about 4 days with no crashes, whereas before I was getting a couple a day, so fingers crossed that I’m in the clear!

sydney · August 30, 2024, 2:10pm

@_luke Thank you for the follow-up… Are you intending to turn back on “secure boot” or are you going to leave it that way?

_luke · August 30, 2024, 2:25pm

I’m not sure. I have everything working great on the latest kernel version which is unsigned as it’s not in Debian and honestly I’m not sure figuring out the machine key infrastructure is worth having it on.

JB-Mtl · October 13, 2024, 6:11pm

Hi, I seem to have a similar / related issue.

Observed behaviour : once in a while (one in every 10 ?) the FW 13 AMD Ryzen 7 7840U will not resume, and needs a hard reset.

system/kernel : Ubuntu 24.04.1 LTS / kernel: 6.8.0-45-generic

from journalctl -b | grep -i suspend
Oct 10 19:18:56 jb-fw kernel: nvme 0000:02:00.0: platform quirk: setting simple suspend

nvme ssd hardware
description: NVMe device
product: WD_BLACK SN850X 1000GB
vendor: Sandisk Corp

I have seen a number of related posts, but cannot find a solution. It looks like the new kernel version (6.11) may still have that “platform quirk: setting simple suspend” issue

Any help most welcome !

sydney · October 13, 2024, 6:58pm

I have exactly the same HW (but the 2TB nvme model) and am facing the same issue as you described. FW customer support advised me to do a “mainboard reset” as described here [1].

After resetting the mainboard i had a run for about a whole month with the machine not resetting itself after resume from suspend a single time (usually it takes about 3-6 suspend cycles to trigger the issue here).

A few days ago, my lucky streak came to a truly abrupt end (fitting, considering the issue) and the resets started happening again.

I have had this issue since i received the laptop to this day, have tried a lot of different things, and so far, nothing appears to have solved the issue, unfortunately.

You mentioned that the kernel may still have that “platform quirk: setting simple suspend” issue; i haven’t come across this one.

Have you tried running the system with the “amd-pmf”-module blacklisted as mentioned here [2]?

[1] Reset forgotten BIOS password - #17 by sgilderd
[2] 6.8-rc: system freezes after resuming from suspend
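When testing with the module blacklisted, it helps to confirm after a reboot that it really is not loaded. The Python sketch below assumes the module appears as amd_pmf in /proc/modules (the kernel reports hyphens in module names as underscores); module_loaded is a hypothetical helper name.

```python
from pathlib import Path


def module_loaded(name, proc_modules_text):
    """Return True if the named kernel module appears in /proc/modules text.

    The first whitespace-separated field of each line is the module name.
    """
    wanted = name.replace("-", "_")
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == wanted:
            return True
    return False


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        loaded = module_loaded("amd-pmf", path.read_text())
        print("amd-pmf loaded:", loaded)
```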

JB-Mtl · October 14, 2024, 4:27pm

@sydney: Thanks for the links. I haven’t tried a reset of the motherboard; I will try that after testing the removal of the amd-pmf module, which seems more promising.

I was investigating kernel updates because the [1] thread seems to end with a promising optimistic conclusion :
“Got it. I installed -hwe 6.8.0-20 and suspend/resume now works every time!”.

What does your ‘journalctl -b | grep -i suspend’ or dmesg tell you after a failure to resume? Does it also point to an nvme issue? I’m thinking of trying another SSD to see if this is WD-specific.

Thanks for the reply, it’s really helpful.

[1] resolved-kernel-6-8

sydney · October 14, 2024, 5:13pm

Yes, you are correct. That kind of illustrates the desperation i’m in at this point.
Willing to try everything, even if i don’t think myself it will solve this issue, like blacklisting this module, but you know… how few people know this low-level platform management stuff anyway…

On a successful cycle it shows this:

Oct 14 07:00:19 fw systemd[1]: logrotate.service: Deactivated successfully.
Oct 14 07:00:19 fw systemd[1]: Finished Logrotate Service.
Oct 14 07:04:34 fw rtkit-daemon[1852]: Warning: Reached maximum concurrent process limit for user '1000', denying request.
Oct 14 07:05:27 fw rtkit-daemon[1852]: Warning: Reached maximum concurrent process limit for user '1000', denying request.
Oct 14 07:23:37 fw bluetoothd[1158]: src/profile.c:ext_io_disconnected() Unable to get io data for Hands-Free Voice gateway: getpeername: Transport endpoint is not connected (107)
Oct 14 07:23:38 fw dbus-daemon[1193]: [system] Rejected send message, 0 matched rules; type="method_return", sender=":1.34" (uid=1000 pid=1859 comm="/nix/store/y8rr19f18wq3pccz8rr65r0ksc>
Oct 14 07:23:57 fw systemd-logind[1217]: The system will suspend now!
Oct 14 07:23:57 fw systemd[1]: Starting Pre-Sleep Actions...
Oct 14 07:23:57 fw systemd[1]: pre-sleep.service: Deactivated successfully.
Oct 14 07:23:57 fw systemd[1]: Finished Pre-Sleep Actions.
Oct 14 07:23:57 fw systemd[1]: Reached target Sleep.
Oct 14 07:23:57 fw systemd[1]: Starting System Suspend...
Oct 14 07:23:57 fw systemd-sleep[1410582]: Successfully froze unit 'user.slice'.
Oct 14 07:23:57 fw systemd-sleep[1410582]: Performing sleep operation 'suspend'...
Oct 14 07:23:57 fw kernel: PM: suspend entry (s2idle)
Oct 14 07:23:57 fw kernel: Filesystems sync: 0.001 seconds
Oct 14 15:40:19 fw kernel: Freezing user space processes
Oct 14 15:40:19 fw kernel: Freezing user space processes completed (elapsed 0.002 seconds)
Oct 14 15:40:19 fw kernel: OOM killer disabled.
Oct 14 15:40:19 fw kernel: Freezing remaining freezable tasks
Oct 14 15:40:19 fw kernel: Freezing remaining freezable tasks completed (elapsed 0.599 seconds)
Oct 14 15:40:19 fw kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Oct 14 15:40:19 fw kernel: wlan0: deauthenticating from 04:f0:21:36:61:e3 by local choice (Reason: 3=DEAUTH_LEAVING)
Oct 14 15:40:19 fw kernel: ACPI: EC: interrupt blocked
Oct 14 15:40:19 fw kernel: ACPI: EC: interrupt unblocked
Oct 14 15:40:19 fw kernel: [drm] PCIE GART of 512M enabled (table at 0x00000080FFD00000).
Oct 14 15:40:19 fw kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resuming...
Oct 14 15:40:19 fw kernel: nvme nvme0: 16/0/0 default/read/poll queues

After it crashed, the last lines from the log always are:

Oct 11 06:28:11 fw systemd[1]: Finished Refresh fwupd metadata and update motd.
Oct 11 06:35:26 fw bluetoothd[1046]: src/profile.c:ext_io_disconnected() Unable to get io data for Hands-Free Voice gateway: getpeername: Transport endpoint is not connected (107)
Oct 11 06:35:26 fw dbus-daemon[1081]: [system] Rejected send message, 0 matched rules; type="method_return", sender=":1.31" (uid=1000 pid=1615 comm="/nix/store/y8rr19f18wq3pccz8rr65r0ksc>
Oct 11 06:35:53 fw systemd-logind[1108]: The system will suspend now!
Oct 11 06:35:53 fw systemd[1]: Starting Pre-Sleep Actions...
Oct 11 06:35:53 fw systemd[1]: pre-sleep.service: Deactivated successfully.
Oct 11 06:35:53 fw systemd[1]: Finished Pre-Sleep Actions.
Oct 11 06:35:53 fw systemd[1]: Reached target Sleep.
Oct 11 06:35:53 fw systemd[1]: Starting System Suspend...
Oct 11 06:35:53 fw systemd-sleep[189140]: Successfully froze unit 'user.slice'.
Oct 11 06:35:53 fw systemd-sleep[189140]: Performing sleep operation 'suspend'...
Oct 11 06:35:53 fw kernel: PM: suspend entry (s2idle)
lines 933-1000/1000 (END)

Note: when the system has crashed, the logs always lack this line before suspend, compared with a successful cycle:

Filesystems sync: 0.001 seconds
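This comparison can be automated over a longer journal export. The Python sketch below flags every “PM: suspend entry” line that is not followed by a “Filesystems sync:” line before the next suspend entry or the end of the log. The function name is invented for illustration, and a missing line may only mean the journal was not written to disk before the reset, so treat a hit as a hint rather than proof.

```python
def suspend_entries_without_sync(lines):
    """Return suspend-entry lines with no 'Filesystems sync:' line after them.

    A suspend entry counts as synced if a 'Filesystems sync:' line appears
    before the next 'PM: suspend entry' line or the end of the input.
    """
    missing = []
    pending = None  # most recent suspend entry still waiting for a sync line
    for line in lines:
        if "PM: suspend entry" in line:
            if pending is not None:
                missing.append(pending)
            pending = line
        elif pending is not None and "Filesystems sync:" in line:
            pending = None
    if pending is not None:
        missing.append(pending)
    return missing
```

Run over the two excerpts above, it would report nothing for the Oct 14 cycle and the Oct 11 suspend entry for the crashed one.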

I didn’t notice the “kernel: nvme 0000:02:00.0: platform quirk: setting simple suspend” line until you pointed it out…

JB-Mtl · October 17, 2024, 1:01pm

Hi @sydney : it looks like there may be different causes…

I had a week without issues; my last failure to resume gives me this from ‘journalctl’ when searching for ‘fail’:

sudo journalctl -b | grep fail
Oct 17 07:38:00 jb-fw kernel: ACPI: _OSC evaluation for CPUs failed, trying _PDC
Oct 17 07:38:00 jb-fw (udev-worker)[534]: nvme0n1: Process ‘/usr/bin/unshare -m /usr/bin/snap auto-import --mount=/dev/nvme0n1’ failed with exit code 1.
Oct 17 07:38:00 jb-fw (udev-worker)[541]: nvme0n1p2: Process ‘/usr/bin/unshare -m /usr/bin/snap auto-import --mount=/dev/nvme0n1p2’ failed with exit code 1.
Oct 17 07:38:00 jb-fw (udev-worker)[534]: nvme0n1p1: Process ‘/usr/bin/unshare -m /usr/bin/snap auto-import --mount=/dev/nvme0n1p1’ failed with exit code 1.
Oct 17 07:38:00 jb-fw (udev-worker)[522]: nvme0n1p3: Process ‘/usr/bin/unshare -m /usr/bin/snap auto-import --mount=/dev/nvme0n1p3’ failed with exit code 1.
Oct 17 07:38:17 jb-fw systemd[1]: Starting grub-initrd-fallback.service - GRUB failed boot detection…
Oct 17 07:38:17 jb-fw bluetoothd[1310]: profiles/sap/server.c:sap_server_register() Sap driver initialization failed.
Oct 17 07:38:17 jb-fw systemd[1]: Finished grub-initrd-fallback.service - GRUB failed boot detection.
Oct 17 07:38:17 jb-fw gnome-remote-de[1316]: Init TPM credentials failed because Failed to initialize transmission interface context: tcti:IO failure, using GKeyFile as fallback
Oct 17 07:38:17 jb-fw boltd[1593]: [d2733804-901e-domain0 ] udev: failed to determine if uid is stable: unknown NHI PCI id ‘0x1668’
Oct 17 07:38:17 jb-fw boltd[1593]: [d2733804-911e-domain1 ] udev: failed to determine if uid is stable: unknown NHI PCI id ‘0x1669’
Oct 17 07:38:19 jb-fw NetworkManager[1462]: [1729165099.0515] failed to open /run/network/ifstate
Oct 17 07:38:20 jb-fw /usr/libexec/gdm-x-session[1767]: xf86EnableIO: failed to enable I/O ports 0000-03ff (Operation not permitted)
Oct 17 07:38:21 jb-fw /usr/libexec/gdm-x-session[1859]: dbus-daemon[1859]: [session uid=128 pid=1859] Activated service ‘org.freedesktop.systemd1’ failed: Process org.freedesktop.systemd1 exited with status 1
Oct 17 07:38:22 jb-fw /usr/libexec/gdm-x-session[1859]: dbus-daemon[1859]: [session uid=128 pid=1859] Activated service ‘org.freedesktop.systemd1’ failed: Process org.freedesktop.systemd1 exited with status 1
Oct 17 07:38:23 jb-fw /usr/libexec/gdm-x-session[1859]: dbus-daemon[1859]: [session uid=128 pid=1859] Activated service ‘org.freedesktop.systemd1’ failed: Process org.freedesktop.systemd1 exited with status 1
Oct 17 07:38:23 jb-fw systemd[1]: Started update-notifier-download.timer - Download data for packages that failed at package install time.
Oct 17 07:38:39 jb-fw systemd-xdg-autostart-generator[2425]: /home/jb/.config/autostart/slack.desktop: stat() failed, ignoring: No such file or directory
Oct 17 07:38:40 jb-fw /usr/libexec/gdm-x-session[2523]: _XSERVTransSocketUNIXCreateListener: …SocketCreateListener() failed
Oct 17 07:38:40 jb-fw /usr/libexec/gdm-x-session[1767]: (EE) AMDGPU(0): failed to set mode: Permission denied
Oct 17 07:38:40 jb-fw gsd-power[2092]: Release of light sensors failed: GDBus.Error:org.freedesktop.DBus.Error.AccessDenied: Not Authorized: Sensor claim not allowed
Oct 17 07:38:40 jb-fw /usr/libexec/gdm-x-session[2523]: xf86EnableIO: failed to enable I/O ports 0000-03ff (Operation not permitted)
Oct 17 07:43:08 jb-fw systemd[1]: Starting update-notifier-download.service - Download data for packages that failed at package install time…
Oct 17 07:43:08 jb-fw systemd[1]: Finished update-notifier-download.service - Download data for packages that failed at package install time.
Oct 17 07:55:18 jb-fw google-chrome.desktop[4772]: [4765:4794:1017/075518.004892:ERROR:connection_factory_impl.cc(483)] ConnectionHandler failed with net error: -2
Oct 17 07:55:18 jb-fw google-chrome.desktop[4772]: [4765:4794:1017/075518.005519:ERROR:connection_factory_impl.cc(483)] ConnectionHandler failed with net error: -2
Oct 17 07:55:18 jb-fw google-chrome.desktop[4772]: [4812:4812:1017/075518.040037:ERROR:gl_surface_presentation_helper.cc(260)] GetVSyncParametersIfAvailable() failed for 1 times!
Oct 17 08:08:34 jb-fw systemd[1]: Starting grub-initrd-fallback.service - GRUB failed boot detection…
Oct 17 08:08:34 jb-fw systemd[1]: Finished grub-initrd-fallback.service - GRUB failed boot detection.

So, not the NVMe quirk. I have updated to 6.8.0-47-generic (from 6.8.0-45) following Ubuntu’s suggested updates; I’ll let you know if that seems to change anything! I am of course out of my depth here
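One thing worth keeping in mind when reading output like the above: journalctl -b covers the current boot only, so after a hard reset it shows the boot that followed rather than the failed resume. Assuming the journal is stored persistently, the previous boot can be selected with -b -1. Below is a small Python sketch that builds and runs such a command; the helper name and the choice of the last 50 kernel lines are arbitrary.

```python
import shutil
import subprocess


def previous_boot_kernel_log_cmd(boot_offset=-1, max_lines=50):
    """Build a journalctl command for the last kernel messages of an earlier boot.

    -k limits output to kernel messages, -b selects the boot by offset
    (-1 is the boot before the current one), -n keeps only the last lines.
    """
    return [
        "journalctl", "-k",
        "-b", str(boot_offset),
        "-n", str(max_lines),
        "--no-pager",
    ]


if __name__ == "__main__":
    if shutil.which("journalctl"):
        # check=False: journalctl exits non-zero if no earlier boot is stored
        subprocess.run(previous_boot_kernel_log_cmd(), check=False)
```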

Max_Pearce_Basman · November 2, 2024, 5:08am

After a few lucky months (of admittedly not using the device much), I have experienced this once more, twice in the same day.
Currently on the 6.11.5-200 kernel.

I’ll probably upgrade to Fedora 41 in a week. Will let y’all know if it stops happening as a result.