System freeze after resuming from suspend

I’ve finally reached a point with my FW16 where I feel safe suspending and resuming it instead of fully powering it off and back on every time, and I ran into problems pretty quickly. After a couple of suspend/resume cycles it resumed successfully, but then within an hour or so today it suddenly froze (the screen went blank, the caps lock light didn’t respond to pressing caps lock, and after a minute or so the keyboard lights flickered and it rebooted on its own).

What’s in the logs is a little weird. First suspend/resume:

Nov 27 21:05:35 kitchen kernel: SCSI subsystem initialized
Nov 27 21:05:35 kitchen kernel: nvme nvme0: using unchecked data buffer
Nov 27 21:05:35 kitchen kernel: block nvme0n1: No UUID available providing old NGUID
Nov 27 21:05:36 kitchen kernel: hid-generic 0003:32AC:0002.000C: hiddev96,hidraw0: USB HID v1.11 D>
Nov 27 21:17:31 kitchen kernel: wlp1s0: deauthenticating from a8:fb:40:54:19:98 by local choice (R>
Nov 27 21:17:32 kitchen kernel: PM: suspend entry (s2idle)
Nov 27 21:17:32 kitchen kernel: Filesystems sync: 0.002 seconds
Nov 27 23:20:31 kitchen kernel: Freezing user space processes
Nov 27 23:20:31 kitchen kernel: Freezing user space processes completed (elapsed 0.001 seconds)
Nov 27 23:20:31 kitchen kernel: OOM killer disabled.
Nov 27 23:20:31 kitchen kernel: Freezing remaining freezable tasks
Nov 27 23:20:31 kitchen kernel: Freezing remaining freezable tasks completed (elapsed 0.001 second>
Nov 27 23:20:31 kitchen kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Nov 27 23:20:31 kitchen kernel: pcieport 0000:00:08.3: quirk: disabling D3cold for suspend
Nov 27 23:20:31 kitchen kernel: ACPI: EC: interrupt blocked
Nov 27 23:20:31 kitchen kernel: ACPI: EC: interrupt unblocked
Nov 27 23:20:31 kitchen kernel: [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resuming…
Nov 27 23:20:31 kitchen kernel: nvme nvme0: 16/0/0 default/read/poll queues
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on h>
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on >
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on >
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on >
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on >
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on >
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on >
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on>
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on>
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 >
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hu>
Nov 27 23:20:31 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13>
Nov 27 23:20:31 kitchen kernel: OOM killer enabled.
Nov 27 23:20:31 kitchen kernel: Restarting tasks: Starting
Nov 27 23:20:31 kitchen kernel: Restarting tasks: Done
Nov 27 23:20:31 kitchen kernel: random: crng reseeded on system resumption
Nov 27 23:20:31 kitchen kernel: PM: suspend exit

So it looks to me like some of the suspend activity only took place after the machine had been asleep for a while: while it was waking up (at 23:20), it was only just finishing freezing tasks, and then it immediately unfroze them and woke up a fraction of a second later. Maybe I’m reading it wrong, but that’s how it looks to me.
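One way to check this reading is to look for the largest jump between consecutive timestamps in an excerpt. Below is a minimal sketch in Python, assuming the syslog-style `Mon DD HH:MM:SS` prefix shown above and a hypothetical year (the format omits it), with all lines in the same year; `parse_timestamp`, `largest_gap` and `ASSUMED_YEAR` are illustrative names, not part of any tool. In the first excerpt the gap sits between “Filesystems sync” and “Freezing user space processes”. That placement would also be expected if journald, itself a frozen user-space process, only recorded the later kernel messages after resume and stamped them with the resume time; treat that as an assumption rather than something these logs prove.

```python
from datetime import datetime, timedelta

# Hypothetical: syslog-style prefixes ("Nov 27 21:17:32") carry no year.
ASSUMED_YEAR = 2025

# A few lines copied from the first suspend/resume excerpt.
EXCERPT = [
    "Nov 27 21:17:32 kitchen kernel: PM: suspend entry (s2idle)",
    "Nov 27 21:17:32 kitchen kernel: Filesystems sync: 0.002 seconds",
    "Nov 27 23:20:31 kitchen kernel: Freezing user space processes",
    "Nov 27 23:20:31 kitchen kernel: OOM killer disabled.",
    "Nov 27 23:20:31 kitchen kernel: PM: suspend exit",
]


def parse_timestamp(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' of a journal line."""
    stamp = " ".join(line.split()[:3])  # e.g. "Nov 27 21:17:32"
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def largest_gap(lines: list[str]) -> tuple[float, str, str]:
    """Return (seconds, line_before, line_after) for the biggest jump
    between consecutive timestamped lines."""
    best = (0.0, "", "")
    for prev, cur in zip(lines, lines[1:]):
        delta = (parse_timestamp(cur) - parse_timestamp(prev)).total_seconds()
        if delta > best[0]:
            best = (delta, prev, cur)
    return best


if __name__ == "__main__":
    seconds, before, after = largest_gap(EXCERPT)
    print(timedelta(seconds=int(seconds)))  # prints 2:02:59
    print("  before:", before)
    print("  after: ", after)
```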

And then, during that second awake period, I got a bunch of these:

Nov 27 23:20:44 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:20:57 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:21:22 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:22:00 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:22:12 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:22:25 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:22:37 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:22:50 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:23:03 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
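To see how regularly these arrive, here is a small sketch under the same assumptions (syslog-style prefix, hypothetical year supplied; `spurious_intervals` is an illustrative name) that pulls the “PME: Spurious native interrupt!” lines out of an excerpt and returns the seconds between consecutive ones. For the nine lines above it gives gaps of 13, 25, 38, 12, 13, 12, 13 and 13 seconds, so the messages mostly arrive about 12 to 13 seconds apart. It only measures spacing and says nothing about the cause.

```python
from datetime import datetime

ASSUMED_YEAR = 2025  # hypothetical: the syslog prefix carries no year
MARKER = "PME: Spurious native interrupt!"

# The nine lines from the Nov 27 excerpt.
SAMPLE = """\
Nov 27 23:20:44 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:20:57 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:21:22 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:22:00 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:22:12 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:22:25 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:22:37 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:22:50 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:23:03 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
"""


def parse_timestamp(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' of a journal line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def spurious_intervals(lines: list[str]) -> list[int]:
    """Seconds between consecutive spurious-PME messages."""
    times = [parse_timestamp(line) for line in lines if MARKER in line]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    print(spurious_intervals(SAMPLE.splitlines()))  # prints [13, 25, 38, 12, 13, 12, 13, 13]
```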

Then I did another suspend and resume a little while later, with the same pattern: it looks to me like part of the suspend only takes place during wakeup (on the 28th rather than the 27th), because it didn’t get a chance to finish during the suspend:

Nov 27 23:26:00 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:26:12 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:26:25 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 27 23:26:35 kitchen kernel: wlp1s0: deauthenticating from a8:fb:40:54:19:98 by local choice (R>
Nov 27 23:26:35 kitchen kernel: PM: suspend entry (s2idle)
Nov 27 23:26:35 kitchen kernel: Filesystems sync: 0.010 seconds
Nov 28 11:23:10 kitchen kernel: Freezing user space processes
Nov 28 11:23:10 kitchen kernel: Freezing user space processes completed (elapsed 0.001 seconds)
Nov 28 11:23:10 kitchen kernel: OOM killer disabled.
Nov 28 11:23:10 kitchen kernel: Freezing remaining freezable tasks
Nov 28 11:23:10 kitchen kernel: Freezing remaining freezable tasks completed (elapsed 0.001 second>
Nov 28 11:23:10 kitchen kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Nov 28 11:23:10 kitchen kernel: ACPI: EC: interrupt blocked
Nov 28 11:23:10 kitchen kernel: ACPI: EC: interrupt unblocked
Nov 28 11:23:10 kitchen kernel: [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resuming…
Nov 28 11:23:10 kitchen kernel: nvme nvme0: 16/0/0 default/read/poll queues
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: SMU is resumed successfully!
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on h>
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on >
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on >
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on >
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on >
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on >
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on >
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on>
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on>
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 >
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hu>
Nov 28 11:23:10 kitchen kernel: amdgpu 0000:c1:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13>
Nov 28 11:23:10 kitchen kernel: OOM killer enabled.
Nov 28 11:23:10 kitchen kernel: Restarting tasks: Starting
Nov 28 11:23:10 kitchen kernel: Restarting tasks: Done
Nov 28 11:23:10 kitchen kernel: random: crng reseeded on system resumption
Nov 28 11:23:10 kitchen kernel: PM: suspend exit
Nov 28 11:23:10 kitchen kernel: iwlwifi 0000:01:00.0: WFPM_UMAC_PD_NOTIFICATION: 0x20
Nov 28 11:23:10 kitchen kernel: iwlwifi 0000:01:00.0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
Nov 28 11:23:10 kitchen kernel: iwlwifi 0000:01:00.0: WFPM_AUTH_KEY_0: 0x90
Nov 28 11:23:10 kitchen kernel: iwlwifi 0000:01:00.0: CNVI_SCU_SEQ_DATA_DW9: 0x0
Nov 28 11:23:14 kitchen kernel: wlp1s0: authenticate with a8:fb:40:54:19:98 (local address=5c:b2:6>
Nov 28 11:23:14 kitchen kernel: wlp1s0: send auth to a8:fb:40:54:19:98 (try 1/3)
Nov 28 11:23:14 kitchen kernel: wlp1s0: authenticated

Then came a bunch more spurious interrupts, followed by the freeze (nothing is logged when the freeze happens; this is the end of the log):

Nov 28 12:14:53 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 28 12:15:06 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 28 12:15:18 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 28 12:15:31 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 28 12:15:44 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 28 12:15:57 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Nov 28 12:16:09 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!

This is on battery power, the charger not plugged in.

Anyone any ideas?


Which Linux distro are you using? NixOS

Which release version? 25.05.20251124.1c8ba8d (Warbler)

Which kernel are you using? 6.17.8

Which BIOS version are you using? 03.07

Which Framework Laptop 16 model are you using? AMD Ryzen™ 7840HS

Which RAM chips do you have? Make, Model.

When the system boots after the freeze/reset, do you see anything in the syslog mentioning “Previous system reset reason”? If so, please paste a few lines on either side of it here.

It’s two 8GB DDR5-5600 chips. I have no idea who the manufacturer is; they’re just what Framework sent me.

In syslog after the freeze, it says:

Nov 28 12:47:33 kitchen kernel: x86/amd: Previous system reset reason [0x08000800]: an uncorrected error caused a data fabric sync flood event

Oh, and the surrounding lines:

Nov 28 12:47:33 kitchen kernel: simple-framebuffer simple-framebuffer.0: [drm] fb0: simpledrmdrmfb frame buffer device
Nov 28 12:47:33 kitchen kernel: drop_monitor: Initializing network drop monitor service
Nov 28 12:47:33 kitchen kernel: NET: Registered PF_INET6 protocol family
Nov 28 12:47:33 kitchen kernel: Segment Routing with IPv6
Nov 28 12:47:33 kitchen kernel: In-situ OAM (IOAM) with IPv6
Nov 28 12:47:33 kitchen kernel: x86/amd: Previous system reset reason [0x08000800]: an uncorrected error caused a data fabric sync flood event
Nov 28 12:47:33 kitchen kernel: microcode: Current revision: 0x0a70410a
Nov 28 12:47:33 kitchen kernel: resctrl: L3 allocation detected
Nov 28 12:47:33 kitchen kernel: resctrl: MB allocation detected
Nov 28 12:47:33 kitchen kernel: resctrl: SMBA allocation detected
Nov 28 12:47:33 kitchen kernel: resctrl: L3 monitoring detected
Nov 28 12:47:33 kitchen kernel: IPI shorthand broadcast: enabled
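The bracketed value can be read as a bitmask (an assumption here; the kernel source that prints this message is the authority on how bits map to text). A quick sketch, with `set_bits` as an illustrative helper, lists which bits are set in `0x08000800`: bits 11 and 27. It deliberately does not map bits to descriptions, since that table lives in the kernel and is not reproduced here.

```python
def set_bits(value: int) -> list[int]:
    """Return the indices of the bits set in `value`, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


if __name__ == "__main__":
    reason = 0x08000800  # from "Previous system reset reason [0x08000800]"
    for bit in set_bits(reason):
        print(f"bit {bit}: {1 << bit:#010x}")
    # prints:
    # bit 11: 0x00000800
    # bit 27: 0x08000000
```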

It appears that you are seeing this problem:

Anything you can find that makes it reproducible in any way will help make progress on a solution.

Interesting. I might add my findings to that; it’s good to know there’s a bug tracker for these things, and it looks like it gets a decent amount of attention.

I also just got a full freeze (no restart) again today, when otherwise the thing has been pretty stable. Maybe a kernel regression? I just rolled back to 6.17.7 to see whether that one behaves itself, and I might switch back and forth between it and 6.17.8 to see whether the kernel version is the issue.

That will probably be this one. Freeze then Hang (FTH).
Again, anything you can do that makes it easily reproducible will help.

Hm, I’m less sure about that. The graphical corruption shown there clearly looks to me like the amdgpu hang, which I think is a distinct issue. Someone there says they haven’t seen the glitch since trying their own EC firmware, but as far as I can see that EC firmware has no commits ahead of the stock Framework firmware.

I will say, the other thing I’m doing differently now is that I’m traveling and running off battery, which I usually don’t do (this machine is usually plugged in). So maybe the theory that running on battery is related to the issue has some merit. But I feel I need to dig into it more before saying confidently what it is or isn’t (I learned my lesson with the amdgpu problem, where I tried different theories and had them “confirmed” purely by accident because the problem is so intermittent). Still, it’s easy to try the slightly older kernel to see whether that resolves it, and to run from battery only for a while to see whether that makes the freezes more frequent. This issue hasn’t been too rare for me so far this week (roughly once every 1-3 days), so presumably I’ll see it again before long if I’m running a configuration that causes it.

Okay, I’m still not completely sure what’s going on, but there definitely seems to be some kind of flakiness when resuming after suspend. Even after rolling back to 6.17.7, and even running on charger power, I’m seeing things like that “Spurious native interrupt!” message, which I hadn’t seen when I always shut down and started up… and today I got a ping spike along with this in the log:

Dec 02 18:30:10 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Dec 02 18:30:23 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Dec 02 18:30:35 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Dec 02 18:30:48 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Dec 02 18:30:51 kitchen kernel: usb 1-4.1: reset full-speed USB device number 8 using xhci_hcd

That “reset full-speed USB device” line corresponds to a ping spike of about 10 seconds, after which the network comes back.
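To line up a network drop like this with other kernel messages, here is a minimal sketch that finds each line containing a marker string (here the “reset full-speed USB device” text) and lists every other line logged within a window before it. The sample lines come from the Dec 02 excerpt; the year, the 10-second window and the helper names (`parse_timestamp`, `lines_before`) are assumptions for illustration. Timestamps only have one-second resolution, and closeness in time does not show that the PME messages caused the USB reset.

```python
from datetime import datetime, timedelta

ASSUMED_YEAR = 2025  # hypothetical: syslog-style prefixes omit the year

# Lines copied from the Dec 02 excerpt.
SAMPLE = """\
Dec 02 18:30:10 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Dec 02 18:30:23 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Dec 02 18:30:35 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Dec 02 18:30:48 kitchen kernel: pcieport 0000:00:08.1: PME: Spurious native interrupt!
Dec 02 18:30:51 kitchen kernel: usb 1-4.1: reset full-speed USB device number 8 using xhci_hcd
"""


def parse_timestamp(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' of a journal line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def lines_before(lines: list[str], marker: str,
                 window_s: int = 10) -> dict[str, list[str]]:
    """For every line containing `marker`, collect the other lines whose
    timestamp falls within `window_s` seconds before it (same-second lines
    are included, since timestamps only have 1 s resolution)."""
    stamped = [(parse_timestamp(line), line) for line in lines if line.strip()]
    hits: dict[str, list[str]] = {}
    for i, (t, line) in enumerate(stamped):
        if marker not in line:
            continue
        start = t - timedelta(seconds=window_s)
        hits[line] = [other for j, (ot, other) in enumerate(stamped)
                      if j != i and start <= ot <= t]
    return hits


if __name__ == "__main__":
    found = lines_before(SAMPLE.splitlines(), "reset full-speed USB device")
    for reset_line, context in found.items():
        print(reset_line)
        for line in context:
            print("   preceded by:", line)
```

Widening `window_s` shows how many of the periodic PME messages fall before each reset.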

IDK man. I think I’ve more or less reached the limit of how much time I want to contribute to helping debug these things. I already had about 4 months of going back and forth with support trying to get my GPU not to freeze and my network not to randomly drop out. I think I might just accept that this laptop can’t stay stable across a suspend/resume. I wasn’t the one who designed it that way, so it’s not my problem to track down; going forward I’ll fully shut it down and start it back up whenever I need it to be fully stable, and call it a day. Sorry if I sound embittered, but as I said, I’ve reached my limit on contributing more debugging time to help Framework work out the problems with their kit.

What devices do you have plugged in?
If you unplug all USB devices and then suspend, is that any more stable?
If it is, you can try plugging things back in one at a time until you find the device that is causing the problems.

I have no devices plugged in, apart from various expansion cards with nothing attached to them. I think all of this behavior (including another FTH just now) happened without even the charger plugged in.