Sudden reboot on wake with no logs

Ever since the start of February (though I have no corroborating package installs around that time), my Framework will, upon waking from sleep and waiting a few seconds (just enough time for me to unlock), instantly reboot. I am using desktop programs normally when the screen instantly turns black and dumps me into GRUB.

I did notice that this always seems to happen right after the wifi wakes back up (I will almost always see the NetworkManager notification that I have connected). I had seen that a similar issue occurs for Ryzen+MT7922 Framework owners (though I can’t seem to find the post anymore, it was recent-ish), so I bought a used AX210, but it just happened again so it’s probably not the issue.

This happens 99% of the time when waking the laptop, and has only happened twice on boot (i.e., wake → reboot (from issue) → reboot again, after login but before DE).

This usually happens after sleeping for an extended period of time (an hour?), though I haven’t measured it.

With the MT7922, the issue kept happening even when the kernel module was unloaded (sudo modprobe -r mt7921e). The AX210 arrived yesterday, so I haven’t done much testing with it.

I seem to be able to trigger the issue (rarely) by plugging in my USB charger, after which it seems to instantly shut off, but I think this is more coincidence since the laptop is not always plugged in when this happens.

Sometimes, the screen will instead freeze for a little bit and show graphical artifacts. This is the only capture I have, but it’s happened a handful of times.

journalctl:

Mar 02 11:55:21 NIXYEVA NetworkManager[889]: <info>  [1740934521.1158] device (wlp1s0): state change: disconnected -> unmanaged (reason 'sleeping', sys-iface-state: 'managed')
Mar 02 11:55:21 NIXYEVA NetworkManager[889]: <info>  [1740934521.2593] device (wlp1s0): set-hw-addr: reset MAC address to xx:xx:xx:xx:xx:xx (unmanage)
Mar 02 11:55:21 NIXYEVA systemd[1]: Reached target sleep.target - Sleep.
Mar 02 11:55:21 NIXYEVA wpa_supplicant[894]: p2p-dev-wlp1s0: CTRL-EVENT-DSCP-POLICY clear_all
Mar 02 11:55:21 NIXYEVA wpa_supplicant[894]: p2p-dev-wlp1s0: CTRL-EVENT-DSCP-POLICY clear_all
Mar 02 11:55:21 NIXYEVA wpa_supplicant[894]: nl80211: deinit ifname=p2p-dev-wlp1s0 disabled_11b_rates=0
Mar 02 11:55:21 NIXYEVA systemd[1]: Starting systemd-suspend.service - System Suspend...
Mar 02 11:55:21 NIXYEVA systemd-sleep[11179]: Entering sleep state 'suspend'...
-- Boot b6647471d0af41678744b66f51f65f8e -- 
Mar 02 12:13:35 NIXYEVA kernel: Linux version 6.14.0-rc4+ (helpimnotdrowning@NIXYEVA) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #2 SMP PREEMPT_DYNAMIC Thu Feb 27 14:32:56
Mar 02 12:13:35 NIXYEVA kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.14.0-rc4+ root=UUID=3388a2f9-b6e3-4664-8fa6-4dc8c1583b88 ro amdgpu.dcdebugmask=0x10 amdgpu.gpu_recovery=1 systemd.journald.forward_to_console=1 console=tty1 pause_on_oops=10 plymouth.use-simpledrm
Mar 02 12:13:35 NIXYEVA kernel: BIOS-provided physical RAM map:
Mar 02 12:13:35 NIXYEVA kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable

(the boot flags systemd.journald.forward_to_console=1 console=tty1 pause_on_oops=10 plymouth.use-simpledrm were my failed attempt at seeing whether a kernel panic message was being produced, but I gave up on it)
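Since the sleep duration has not been measured, the journal excerpt above gives a rough estimate for this one occurrence. The sketch below subtracts the last timestamp before suspend from the first timestamp of the next boot. The year is an assumption (the short journal format omits it; 2025 is inferred from the NetworkManager epoch timestamp in the log), and the function name is made up for illustration. The result is an upper bound, because it also includes the few seconds awake before the reset and the time spent in firmware and GRUB.

```python
from datetime import datetime

# Timestamps copied from the journal excerpt above.
LAST_LINE_BEFORE_SUSPEND = "Mar 02 11:55:21"
FIRST_LINE_OF_NEXT_BOOT = "Mar 02 12:13:35"


def journal_gap_seconds(before: str, after: str, year: int = 2025) -> int:
    """Return the seconds between two short-format journal timestamps.

    The year is not part of the short journal format, so it is passed in
    separately (assumption: both lines fall in the same year).
    """
    fmt = "%Y %b %d %H:%M:%S"
    start = datetime.strptime(f"{year} {before}", fmt)
    end = datetime.strptime(f"{year} {after}", fmt)
    return int((end - start).total_seconds())


gap = journal_gap_seconds(LAST_LINE_BEFORE_SUSPEND, FIRST_LINE_OF_NEXT_BOOT)
print(f"{gap} s (~{gap // 60} min)")
```

For this log the gap is about 18 minutes, so at least this occurrence did not need a full hour of sleep.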

Using:

  • Framework Laptop 13 (batch 2)
  • BIOS: 3.05
  • CPU: Ryzen 5 7640U
  • RAM: 16gb Crucial DDR5-5600
  • Network:
    • AMD RZ616 (Mediatek MT7922)
    • Intel AX210
  • OS: Debian 12

Can you confirm this issue occurs on stable kernels? Your log entry is from a kernel in development.

I believe the latest stable kernel is 6.12, but check the latest available for Debian. That is the latest distro kernel for Debian 12 that is stable. I have no interest in debugging a development kernel on your behalf.

/Zoe

You might wish to read this:

Try the S5_RESET_STATUS patch.

I was just able to recreate it on bookworm linux-image-6.1.0-31-amd64 (6.1.128-1) stable.

Every kernel I listed (apart from the Liquorix and mainline kernels) is from stable, security or backports (and maybe a trixie kernel, can’t remember) and experiences this issue. (I specifically recall, early in February, finding a kernel in my apt cache from before the issue presented, installing it, and still experiencing the issue, but I can’t remember which one.)

Are you able to run a memory test on the GPU?

The message says kernel: S5_RESET_STATUS = 0x00800800, which doesn’t seem to match any of the other mentioned codes in the thread you linked. I assume these status codes have to be decoded by AMD to be of any use?

I had to modify the patch a very tiny amount since the file drivers/i2c/busses/i2c-piix4.c seems to have changed in kernel 6.14 (one line was added where the define was supposed to go, so the initial patch apply failed)

For posterity, the original patch was:

diff --git a/drivers/i2c/busses/i2c-piix4.c b/drivers/i2c/busses/i2c-piix4.c
index 809fbd014cd6..043b29f1e33c 100644
--- a/drivers/i2c/busses/i2c-piix4.c
+++ b/drivers/i2c/busses/i2c-piix4.c
@@ -100,6 +100,7 @@
 
 #define SB800_PIIX4_FCH_PM_ADDR			0xFED80300
 #define SB800_PIIX4_FCH_PM_SIZE			8
+#define SB800_PIIX4_FCH_PM_S5_RESET_STATUS	0xC0
 
 /* insmod parameters */
 
@@ -200,6 +201,9 @@ static int piix4_sb800_region_request(struct device *dev,
 
 		mmio_cfg->addr = addr;
 
+		addr += SB800_PIIX4_FCH_PM_S5_RESET_STATUS;
+		pr_info_once("S5_RESET_STATUS = 0x%08x", ioread32(addr));
+
 		return 0;
 	}

and my modified patch, acting upon torvalds/linux@1e15510, is:

diff --git a/drivers/i2c/busses/i2c-piix4.c b/drivers/i2c/busses/i2c-piix4.c
index dd75916157f..521a257588d 100644
--- a/drivers/i2c/busses/i2c-piix4.c
+++ b/drivers/i2c/busses/i2c-piix4.c
@@ -87,6 +87,7 @@
 
 #define SB800_PIIX4_FCH_PM_ADDR			0xFED80300
 #define SB800_PIIX4_FCH_PM_SIZE			8
+#define SB800_PIIX4_FCH_PM_S5_RESET_STATUS	0xC0
 #define SB800_ASF_ACPI_PATH			"\\_SB.ASFC"
 
 /* insmod parameters */
@@ -182,6 +183,9 @@ int piix4_sb800_region_request(struct device *dev, struct sb800_mmio_cfg *mmio_c
 
 		mmio_cfg->addr = addr;
 
+		addr += SB800_PIIX4_FCH_PM_S5_RESET_STATUS;
+		pr_info_once("S5_RESET_STATUS = 0x%08x", ioread32(addr));
+
 		return 0;
 	}

(the SB800_PIIX4_FCH_PM_S5_RESET_STATUS define was moved slightly and the function definition line for piix4_sb800_region_request was re-spaced, so it’s cut off differently; both changes seem to be just aesthetic)

I see a lot of different tools (stress tests, several random python scripts…) online. Is there a specific test that’ll work best?

memtest is probably a good start.

I ran memtest86+ and memtest_vulkan (both just once) and passed without issue.

Unfortunately, it might be necessary to run memtest for an extended period of time; I have heard stories of people running memtest for up to 24h before catching errors.

If you are using Wayland, it might be worth trying to reset your desktop configuration; apparently some people have experienced similar issues caused by plugins/extensions.

It might also be worth turning off suspend to see whether the issue is related to suspend/resume.

You can also run dmesg under the conditions that trigger the issue to see if there is anything useful…

/Zoe

So, S5_RESET_STATUS = 0x00800800 means:
Bit 11 set: Reserved.
Bit 23 set: shutdown_msg. Read-write, Read, Write-1-to-clear. Reset: 0. System reset was caused by a SHUTDOWN command from CPU (when PMx08[20]=1 and PMx74[17]=1). Write 1 to clear. Bit[31] and Bit[28:16] except bit[20] will be cleared by Last reset event except the associated bit will be set.

Shutdown (message from CPU):
Triple faults in CPU will cause an internal SHUTDOWN message
broadcasted. FCH will generate a reset to S0 logic; configurable to warm or cold reset.

So, this implies the CPU caused the shutdown.
My guess, therefore, is that it’s due to a software bug causing a triple fault, rather than anything more serious.
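The bit breakdown above can be reproduced with a few lines of Python. This is only a sketch: it pulls the value out of the log line printed by the patch and lists which bits are set. The only bit meanings it knows are the two quoted in this thread (11 and 23); all other bits are left unlabeled rather than guessed, and the helper names are made up for illustration.

```python
import re

# Only the two bits discussed in this thread are named here; every other
# bit is deliberately left undescribed rather than guessed.
KNOWN_BITS = {
    11: "reserved",
    23: "shutdown_msg (SHUTDOWN command from CPU)",
}


def parse_status(log_line: str) -> int:
    """Extract the 32-bit hex value from an 'S5_RESET_STATUS = 0x...' line."""
    match = re.search(r"S5_RESET_STATUS = 0x([0-9a-fA-F]{8})", log_line)
    if match is None:
        raise ValueError("no S5_RESET_STATUS value in line")
    return int(match.group(1), 16)


def set_bits(value: int) -> list[int]:
    """Return the positions of all set bits in a 32-bit value, lowest first."""
    return [bit for bit in range(32) if value & (1 << bit)]


line = "kernel: S5_RESET_STATUS = 0x00800800"
status = parse_status(line)
for bit in set_bits(status):
    print(f"bit {bit}: {KNOWN_BITS.get(bit, 'not described in this thread')}")
```

For 0x00800800 this reports bits 11 and 23, matching the reading above.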

This might be a faulty device driver.
Try removing all slot cards and blacklisting the wifi card.
If that does not help, it might be a problem with the nvme SSD.
Try removing the SSD and boot from a USB stick and see if the problem goes away.
Once you have identified which device/device driver is the problem, one can narrow it down a bit.
What make/model of wifi card and NVMe SSD do you have?
There have been reports of some NVME SSDs causing sleep problems.

This happened on the previous Mediatek MT7922 card and my current Intel AX210. The SSD is a WD Black SN770. I’ll experiment with your suggestions.

If the shutdown is due to a “triple fault”, this is likely to affect a wide variety of AMD 7840 main-boards, not just FW13/FW16 ones.
Has anyone seen similar reports with other manufacturers’ laptops?

It’s bizarre to me that this has occurred for people on and off across so many kernels and so many distros. Are there any other packages we should be looking at as possible culprits? (I’ve heard mesa is one possibility?)