[TRACKING] Cannot resume from suspend on AMD

Okay! Also, I added amdgpu.sg_display=0 because I was getting the screen glitching issue.

jp@fw:~$ cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.luks.uuid=luks-1bc906c1-fbe5-4985-b07f-9fa13b689164 rhgb quiet amdgpu.sg_display=0"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
jp@fw:~$ journalctl -b 0 -u systemd-logind
Nov 20 13:35:17 fw systemd[1]: Starting systemd-logind.service - User Login Management...
Nov 20 13:35:17 fw systemd-logind[1631]: New seat seat0.
Nov 20 13:35:17 fw systemd-logind[1631]: Watching system buttons on /dev/input/event0 (Lid Switch)
Nov 20 13:35:17 fw systemd-logind[1631]: Watching system buttons on /dev/input/event1 (Power Button)
Nov 20 13:35:17 fw systemd-logind[1631]: Watching system buttons on /dev/input/event4 (FRMW0004:00 32AC:0006 Consumer Control)
Nov 20 13:35:17 fw systemd-logind[1631]: Watching system buttons on /dev/input/event2 (AT Translated Set 2 keyboard)
Nov 20 13:35:18 fw systemd[1]: Started systemd-logind.service - User Login Management.
Nov 20 13:35:18 fw systemd-logind[1631]: New session c1 of user gdm.
Nov 20 13:35:24 fw systemd-logind[1631]: New session 2 of user jp.
Nov 20 13:35:27 fw systemd-logind[1631]: Session c1 logged out. Waiting for processes to exit.
Nov 20 13:35:27 fw systemd-logind[1631]: Removed session c1.
Nov 20 14:08:48 fw systemd-logind[1631]: Power key pressed short.
Nov 20 14:08:48 fw systemd-logind[1631]: The system will suspend now!
Nov 20 14:08:49 fw systemd-logind[1631]: Lid closed.
Nov 20 15:31:28 fw systemd-logind[1631]: Lid opened.
Nov 20 15:31:28 fw systemd-logind[1631]: Operation 'sleep' finished.
Nov 20 16:48:51 fw systemd-logind[1631]: Lid closed.
Nov 20 16:48:51 fw systemd-logind[1631]: Suspending...
Nov 20 19:21:49 fw systemd-logind[1631]: Operation 'sleep' finished.
Nov 20 19:21:56 fw systemd-logind[1631]: Lid opened.
Nov 20 20:10:47 fw systemd-logind[1631]: Watching system buttons on /dev/input/event13 (SRS-XE200 (AVRCP))
jp@fw:~$ journalctl -b -1 -u systemd-logind
Nov 20 13:27:02 fw systemd[1]: Starting systemd-logind.service - User Login Management...
Nov 20 13:27:02 fw systemd-logind[9904]: New seat seat0.
Nov 20 13:27:02 fw systemd-logind[9904]: Watching system buttons on /dev/input/event0 (Lid Switch)
Nov 20 13:27:02 fw systemd-logind[9904]: Watching system buttons on /dev/input/event1 (Power Button)
Nov 20 13:27:02 fw systemd-logind[9904]: Watching system buttons on /dev/input/event4 (FRMW0004:00 32AC:0006 Consumer Control)
Nov 20 13:27:02 fw systemd-logind[9904]: Watching system buttons on /dev/input/event2 (AT Translated Set 2 keyboard)
Nov 20 13:27:02 fw systemd[1]: Started systemd-logind.service - User Login Management.
Nov 20 13:27:03 fw systemd-logind[9904]: New session c1 of user gdm.
Nov 20 13:27:24 fw systemd-logind[9904]: New session 2 of user jp.
Nov 20 13:27:27 fw systemd-logind[9904]: Session c1 logged out. Waiting for processes to exit.
Nov 20 13:27:27 fw systemd-logind[9904]: Removed session c1.
Nov 20 13:32:43 fw systemd-logind[9904]: Power key pressed short.
Nov 20 13:32:44 fw systemd-logind[9904]: The system will suspend now!
jp@fw:~$ journalctl -b -2 -u systemd-logind
Nov 19 18:12:53 fw systemd[1]: Starting systemd-logind.service - User Login Management...
Nov 19 18:12:53 fw systemd-logind[1509]: New seat seat0.
Nov 19 18:12:53 fw systemd-logind[1509]: Watching system buttons on /dev/input/event0 (Lid Switch)
Nov 19 18:12:53 fw systemd-logind[1509]: Watching system buttons on /dev/input/event1 (Power Button)
Nov 19 18:12:53 fw systemd-logind[1509]: Watching system buttons on /dev/input/event4 (FRMW0004:00 32AC:0006 Consumer Control)
Nov 19 18:12:53 fw systemd-logind[1509]: Watching system buttons on /dev/input/event2 (AT Translated Set 2 keyboard)
Nov 19 18:12:53 fw systemd[1]: Started systemd-logind.service - User Login Management.
Nov 19 18:12:53 fw systemd-logind[1509]: New session c1 of user gdm.
Nov 19 18:12:58 fw systemd-logind[1509]: New session 2 of user jp.
Nov 19 18:13:01 fw systemd-logind[1509]: Session c1 logged out. Waiting for processes to exit.
Nov 19 18:13:01 fw systemd-logind[1509]: Removed session c1.
Nov 19 18:25:52 fw systemd-logind[1509]: Lid closed.
Nov 19 18:25:52 fw systemd-logind[1509]: Suspending...
Nov 19 18:25:57 fw systemd-logind[1509]: Delay lock is active (UID 1000/jp, PID 2853/gnome-shell) but inhibitor timeout is reached.
Nov 19 18:26:58 fw systemd-logind[1509]: Operation 'sleep' finished.
Nov 19 18:27:24 fw systemd-logind[1509]: Suspending...
Nov 19 18:27:29 fw systemd-logind[1509]: Delay lock is active (UID 1000/jp, PID 2853/gnome-shell) but inhibitor timeout is reached.
Nov 19 18:27:35 fw systemd-logind[1509]: Operation 'sleep' finished.
Nov 19 18:28:00 fw systemd-logind[1509]: Suspending...
Nov 19 18:28:05 fw systemd-logind[1509]: Delay lock is active (UID 1000/jp, PID 2853/gnome-shell) but inhibitor timeout is reached.
Nov 19 18:29:37 fw systemd-logind[1509]: Lid opened.
Nov 19 18:29:37 fw systemd-logind[1509]: Operation 'sleep' finished.
Nov 19 20:35:02 fw systemd-logind[1509]: Lid closed.
Nov 19 20:35:02 fw systemd-logind[1509]: Suspending...
Nov 19 20:41:07 fw systemd-logind[1509]: Operation 'sleep' finished.
Nov 19 20:41:36 fw systemd-logind[1509]: Suspending...
Nov 20 08:07:05 fw systemd-logind[1509]: Lid opened.
Nov 20 08:07:05 fw systemd-logind[1509]: Operation 'sleep' finished.
Nov 20 08:14:05 fw systemd-logind[1509]: Lid closed.
Nov 20 08:14:05 fw systemd-logind[1509]: Suspending...
Nov 20 08:37:56 fw systemd-logind[1509]: Operation 'sleep' finished.
Nov 20 08:37:57 fw systemd-logind[1509]: Lid opened.
Nov 20 08:57:36 fw systemd-logind[1509]: Lid closed.
Nov 20 08:57:36 fw systemd-logind[1509]: Suspending...
Nov 20 09:00:08 fw systemd-logind[1509]: Lid opened.
Nov 20 09:00:08 fw systemd-logind[1509]: Operation 'sleep' finished.
Nov 20 09:37:27 fw systemd-logind[1509]: The system will suspend now!
Nov 20 09:47:01 fw systemd-logind[1509]: Operation 'sleep' finished.
Nov 20 10:02:01 fw systemd-logind[1509]: The system will suspend now!
Nov 20 10:17:45 fw systemd-logind[1509]: Operation 'sleep' finished.
Nov 20 10:32:45 fw systemd-logind[1509]: The system will suspend now!
Nov 20 11:25:28 fw systemd-logind[1509]: Operation 'sleep' finished.
Nov 20 12:11:39 fw systemd-logind[1509]: The system will suspend now!
Nov 20 12:13:35 fw systemd-logind[1509]: Operation 'sleep' finished.
Nov 20 12:29:03 fw systemd-logind[1509]: The system will suspend now!
Nov 20 12:43:58 fw systemd-logind[1509]: Operation 'sleep' finished.
Nov 20 13:10:17 fw systemd-logind[1509]: Lid closed.
Nov 20 13:10:17 fw systemd-logind[1509]: Suspending...
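The pattern in these logs is easier to see if each suspend message is paired with the next "Operation 'sleep' finished." line. Boots -1 and -2 both end on a suspend that never finished. The following is a minimal Python sketch, assuming the default journalctl short output format shown above. The function and regex names are hypothetical, and the two suspend messages are simply the ones that appear in these logs.

```python
import re

# Matches systemd-logind lines in journalctl's default short format, e.g.
# "Nov 20 13:32:44 fw systemd-logind[9904]: The system will suspend now!"
LINE_RE = re.compile(r"^(\w{3} +\d+ [\d:]{8}) \S+ systemd-logind\[\d+\]: (.*)$")

# Messages that, in the logs above, mark the start and end of a sleep.
SUSPEND_MSGS = ("Suspending...", "The system will suspend now!")
RESUME_MSG = "Operation 'sleep' finished."


def last_unfinished_suspend(lines):
    """Return the timestamp of a suspend with no later resume, else None.

    Lines from other units (e.g. systemd[1]) are ignored.
    """
    pending = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        stamp, msg = m.groups()
        if msg in SUSPEND_MSGS:
            pending = stamp  # a sleep started; wait for its resume
        elif msg == RESUME_MSG:
            pending = None  # the sleep completed normally
    return pending
```

To feed it real data, you could split the output of journalctl -b -1 -u systemd-logind --no-pager into lines. A non-None result means that boot ended while the machine was asleep. This matches the "entered sleep and never woke up" symptom described later in the thread.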

@Matt_Hartley Here is an anecdotal data point that may help track down the problem. Yesterday I received my Framework Laptop 13 DIY Edition (AMD Ryzen 5 7640U), batch 5, BIOS 3.03. In the BIOS settings I limited the maximum battery charge to 80%, then promptly installed Fedora 38 (not 39). During installation I chose ext4 instead of btrfs as the filesystem. After installation I updated the kernel to 6.5.11-200 and applied no other package updates.

Suspending and then attempting to wake resulted in a hard lockup. The keyboard lit up, but the screen was completely blank and there was no other indication that the machine was working. The only way to bring it back was a hard reboot by holding the power button for about 10 seconds. The failure to wake happened about 10 times. Nothing was connected to the expansion cards.

While in the zombie state (with the keyboard lit up), toggling the caps lock key stopped working after about a minute (the white light on the key wouldn’t turn on).

Analysing logs via journalctl didn’t seem to yield anything that stood out: the system entered sleep and never woke up from it; the immediate messages after sleep indicated a fresh boot.

This morning I installed all the latest Fedora 38 updates. To be doubly sure, I reinstalled the kernel (including kernel-modules-extra), systemd and grub2 from the update channel, then manually rebooted several times via the “power off” menu in GNOME.

Suspending the laptop seems to work properly now, and the laptop comes back alive. I have done the suspend/wake cycle about 10 times within a 5-minute window. However, given the intermittent failures reported in this ticket and my own negative experience, I don’t know whether suspend is actually reliable over a longer period. I will report back here with observations.

PS. There was one “weird” thing during the first boot immediately after assembling the laptop. Step 12 in the DIY Edition Quick Start Guide (AMD Ryzen series) states that the first boot will “take a while” due to “memory training” and suggests that the wait will be in the “order of a minute or two”. On first boot I let it do its thing and left it running for at least 15 minutes. During this time, the screen remained blank and the fans were seemingly at full speed. Eventually I got frustrated/worried and did a hard reset via holding the power button.
Is the Quick Start Guide wrong here? Is the first boot supposed to take 15 minutes or even longer?

Laptop details as follows, in case this is helpful.
AMD Ryzen 5 7640U.
1 x 16GB RAM; DDR5-5600; product model: FRANRMFW02.
1TB (Gen 4) WD_BLACK SN770 NVMe SSD.
Expansion cards: 1 x USB-C, 1 x HDMI, 2 x USB-A
Locations:
(1) top-left: USB-C (FRACCKBZ01)
(2) bottom-left: HDMI (FRACCHBZ01-3)
(3) top-right: USB-A (FRACCABZ01)
(4) bottom-right: USB-A (FRACCABZ01)
Let me know if you need the serial numbers of each expansion card.

sudo dmidecode -s bios-version gives: 03.03
uname -r gives: 6.5.11-200.fc38.x86_64
fwupdmgr update states that there are no available firmware updates (WD BLACK SSD and the fingerprint sensor); system firmware and UEFI dbx have the latest available firmware version.
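You can also collect the two version checks above without root. The kernel exposes the DMI BIOS string under /sys/class/dmi/id/, and on Linux, Python's platform.release() returns the same value as uname -r. The following is a small sketch for pasting into a report. The function name is hypothetical.

```python
import platform
from pathlib import Path

# sysfs exposes the same DMI string that "dmidecode -s bios-version" prints,
# and this attribute is normally world-readable.
DMI_BIOS_VERSION = Path("/sys/class/dmi/id/bios_version")


def system_summary(bios_path=DMI_BIOS_VERSION):
    """Collect the BIOS version and running kernel release for a bug report."""
    try:
        bios = bios_path.read_text().strip()
    except OSError:
        bios = "unknown"  # e.g. no DMI data exposed on this system
    return {"bios_version": bios, "kernel": platform.release()}
```

On the machine described above, this would be expected to report 03.03 and 6.5.11-200.fc38.x86_64, matching the dmidecode and uname output.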

I’ve noticed a pattern: I’ll suspend my machine and it will reboot rather than wake up, and then the very next time I suspend, it won’t resume. This is what I saw when I ran amd_s2idle.py in my previous post.

I’m not seeing any failures to resume that aren’t preceded by suspend-then-reboot.

I tried running amd_s2idle.py multiple times (using the --count parameter) and saw this same pattern: reboot, then unable to wake from suspend, then variable number of successful resumes.

Not sure if this is at all helpful, but thought I’d add it to the mix.

@Matt_Hartley May have found the culprit. Suspend becomes unreliable when the laptop is disconnected from power and “PCIE Dynamic Link Power Management” is enabled within BIOS.

The following seems to reproduce the hard lockup (and the wake-to-reboot observed by jonp) on my machine when using Fedora 38 + latest updates.
(1) Ensure “PCIE Dynamic Link Power Management” is enabled within the BIOS Setup Utility.
(2) Turn off laptop.
(3) Plug in power.
(4) Turn on laptop.
(5) Log in to desktop.
(6) Suspend from the Gnome power off menu.
(7) Disconnect power.
(8) Attempt to wake by pressing the power button; at this stage the laptop either reboots or refuses to wake up.

Suspend seems to work if “PCIE Dynamic Link Power Management” is disabled.
The BIOS setup utility gives the following info:
“Reduce PCIE bus speed to gen 3 when running on battery. AMD PSPP.”

With this setting disabled, suspend appears to work, but the battery still seems to drain at a noticeable rate (around 3 percentage points per hour).

(For posterity, this is how to disable PCIE power management. Reboot laptop and repeatedly press F2 until the BIOS menu comes up. Select “Setup Utility”. Go to Advanced settings. Disable “PCIE Dynamic Link Power Management”. Press F10 to save settings and exit.)
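The reproduction steps above depend on the laptop being on AC when it suspends and on battery when it wakes. Recording the AC state before suspending and after resuming makes reports from different people easier to compare. The following is a minimal sketch using the kernel's sysfs power_supply class, where AC adapters report the type "Mains". The function name is hypothetical, and adapter directory names vary between machines.

```python
from pathlib import Path

# Standard sysfs location for power supplies. Adapter directory names differ
# between machines, so we match on the "type" attribute instead of the name.
POWER_SUPPLY = Path("/sys/class/power_supply")


def on_ac_power(root=POWER_SUPPLY):
    """True if a mains adapter reports online, False if none do,
    None if no mains adapter is exposed at all."""
    found_mains = False
    for supply in sorted(root.glob("*")):
        try:
            if (supply / "type").read_text().strip() != "Mains":
                continue
            found_mains = True
            if (supply / "online").read_text().strip() == "1":
                return True
        except OSError:
            # Unreadable or half-populated entries are skipped.
            continue
    return False if found_mains else None
```

Call on_ac_power() just before suspending and again after resuming, and note both results in the report. This shows whether a failure followed the plugged-in-then-unplugged sequence in steps (3) to (8).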

@jonp @mailtodevnull Can you try reproducing the above workaround?

I notice a LUKS volume on your kernel command line, so I assume you’re using encrypted volumes and have Secure Boot turned on.

Do you get the same behaviour when Secure Boot is disabled and/or you don’t have an encrypted volume?
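Answering this question requires two facts about each machine: whether Secure Boot is on, and whether the root filesystem is on LUKS. mokutil --sb-state reports the first. The following minimal sketch decodes the raw EFI variable instead. In efivarfs, the SecureBoot variable is 4 attribute bytes followed by one value byte. The sketch also checks the kernel command line for the rd.luks.uuid= option seen in the GRUB config earlier in this thread. The helper names are hypothetical. The LUKS check only detects that particular form, so an encrypted volume unlocked another way would not show up.

```python
from pathlib import Path

# SecureBoot lives under the EFI global-variable GUID in efivarfs.
SECUREBOOT_VAR = Path(
    "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def secure_boot_enabled(raw):
    """Decode efivarfs contents: 4 attribute bytes, then 1 = on, 0 = off."""
    if len(raw) < 5:
        return None  # truncated or unexpected data
    return raw[4] == 1


def uses_luks_root(cmdline):
    """True if the command line asks the initramfs to unlock a LUKS volume."""
    return any(arg.startswith("rd.luks.uuid=") for arg in cmdline.split())
```

On a live system these would be called as secure_boot_enabled(SECUREBOOT_VAR.read_bytes()) and uses_luks_root(Path("/proc/cmdline").read_text()).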

@goldenspinach I tried your recipe (again, I’m on Fedora 39) but it doesn’t reliably reproduce the issue for me.

@jwp I am running a LUKS volume, but I have Secure Boot turned off. This is my primary machine at the moment, so I’m reluctant to wipe it just to test whether a non-LUKS volume or Secure Boot is a factor here.

That said, the issue is causing me significant strife – I’ve lost some work at this point, and I can’t rely on this machine to wake from suspend which is annoying.

So… perhaps it is time to wipe it and try again…

I’m also experiencing suspend issues on AMD (with LUKS).

Sometimes Fedora will wake up after a minute and let me log in, but the btrfs-on-LUKS filesystem seems to be read-only. Other times I just get a stream of read/write errors on a TTY screen.

This occurs even after applying this workaround:

System details

Detailed information (hardware, OS, UEFI, etc.)

$ inxi -bL --za

System:
  Kernel: 6.5.12-300.fc39.x86_64 arch: x86_64 bits: 64 Desktop: GNOME v: 45.1
    Distro: Fedora release 39 (Thirty Nine)
Machine:
  Type: Laptop System: Framework product: Laptop 13 (AMD Ryzen 7040Series)
    v: A7 serial: <superuser required>
  Mobo: Framework model: FRANMDCP07 v: A7 serial: <superuser required>
    UEFI: INSYDE v: 03.03 date: 10/17/2023
Battery:
  ID-1: BAT1 charge: 30.8 Wh (60.0%) condition: 51.3/55.0 Wh (93.2%)
    volts: 15.8 min: 15.4
CPU:
  Info: 8-core AMD Ryzen 7 7840U w/ Radeon 780M Graphics [MT MCP] speed (MHz):
    avg: 677 min/max: 400/5289:6076:5605:5132:5760:5447:5918
Graphics:
  Device-1: AMD Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT]
    driver: amdgpu v: kernel
  Device-2: AMD Phoenix1 driver: amdgpu v: kernel
  Display: wayland server: X.Org v: 23.2.2 with: Xwayland v: 23.2.2
    compositor: gnome-shell driver: X: loaded: amdgpu,modesetting
    unloaded: fbdev,radeon,vesa dri: radeonsi gpu: amdgpu
    resolution: 2256x1504~60Hz
  API: OpenGL v: 4.6 vendor: amd mesa v: 23.2.1 renderer: AMD Radeon
    Graphics (gfx1103_r1 LLVM 16.0.6 DRM 3.54 6.5.12-300.fc39.x86_64)
Network:
  Device-1: MEDIATEK MT7922 802.11ax PCI Express Wireless Network Adapter
    driver: mt7921e
Logical:
  Message: No logical block device data found.
  Device-1: luks-38101d4c-c21e-49f4-9320-138a055fe380 type: LUKS
    size: 1.82 TiB Components: p-1: nvme0n1p3
Drives:
  Local Storage: total: 1.82 TiB used: 3.58 TiB (196.7%)
Info:
  Processes: 548 Uptime: 2h 13m Memory: total: 32 GiB note: est.
  available: 30.53 GiB used: 8.52 GiB (27.9%) Shell: Zsh inxi: 3.3.31
  • No available firmware updates: sudo fwupdtool update:
    Loading…                 [*****************                      ]
    11:08:13.542 FuEngine             failed to add device /sys/devices/pci0000:00/0000:00:08.1/0000:c1:00.0: ioctl error: Bad address [14]
    Loading…                 [************************************** ]
    Devices with no available firmware updates: 
     • Fingerprint Sensor
     • UEFI dbx
    

Suspend problems

This sometimes occurs when I attempt to wake from suspend:

Another time, the GNOME lock screen appeared after about a minute and I logged in. GUI applications showed database-related error pop-ups, and when I opened a shell and typed touch test, I received a ‘read-only filesystem’ error message.

Because the filesystem goes read-only, logs fail to write to disk, so I only have photos of the display:

I’ll try to reproduce this tomorrow with additional testing and will post an update once I have. Thanks everyone.

I think the problem for me is with the HDMI Expansion card.

I ran my machine all day without the HDMI and USB-A expansion cards (only two USB-C expansion cards were in place) and had no issues with resuming.

Tonight, I ran amd_s2idle.py with --count 100:

  • 100 cycles with USB-C only, with no issues
  • 100 cycles with USB-C + USB-A with no issues
  • 80 cycles with USB-C + USB-A + HDMI and then a reboot, followed by a fail to resume next
  • 120 cycles with USB-C + HDMI and then a reboot, followed by a fail to resume next
  • Another 200 cycles with USB-C + USB-A with no issues
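The elimination reasoning behind these runs can be written down explicitly. Take the cards present in every run that ended in a reboot, then remove any card that also appeared in a clean run. The following sketch uses only the five runs listed above, and the names are hypothetical. With so few runs on one machine, the result indicates a correlation, not a cause.

```python
# Runs as reported above: (expansion cards installed, cycles run, ended in failure)
RUNS = [
    ({"USB-C"}, 100, False),
    ({"USB-C", "USB-A"}, 100, False),
    ({"USB-C", "USB-A", "HDMI"}, 80, True),
    ({"USB-C", "HDMI"}, 120, True),
    ({"USB-C", "USB-A"}, 200, False),
]


def cards_common_to_failures(runs):
    """Cards present in every failing run but absent from every clean run."""
    failing = [cards for cards, _, failed in runs if failed]
    clean = [cards for cards, _, failed in runs if not failed]
    if not failing:
        return set()
    suspects = set.intersection(*failing)
    for cards in clean:
        suspects -= cards  # a card seen in a clean run is exonerated
    return suspects
```

For these runs the only remaining suspect is the HDMI card, which is the conclusion drawn in the thread.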

@Matt_Hartley I replicated (by chance) the original issue reported by @mailtodevnull in the first post.

Suspended the laptop via Gnome power off menu and closed the lid. Opened the lid 9 hours later. Laptop wouldn’t wake. Pressed the power button, and still wouldn’t wake. Tried several times. Pressed trackpad and random keys on the keyboard several times. Still wouldn’t wake.

The only way to get it alive again was by hard reboot.

System details: Fedora 38 with kernel 6.5.12-200.fc38.x86_64 + all updates as of yesterday. Expansion cards: (1) USB-C, (2) HDMI, (3) USB-C, (4) USB-A.

(PS. over the 9 hours, battery charge went from 80% to 73%).
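Expressing drain as percentage points per hour makes the figures in this thread comparable. For example, 80% to 73% over 9 hours works out to (80 - 73) / 9, or about 0.78 points per hour. The following is a trivial sketch with a hypothetical function name. It ignores battery capacity, so the same rate can correspond to different power draws on different batteries.

```python
def drain_rate(start_pct, end_pct, hours):
    """Average battery drain in percentage points per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (start_pct - end_pct) / hours


# The 9-hour suspend above: roughly 0.78 points per hour.
rate = drain_rate(80, 73, 9)
```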


@goldenspinach I really do advise running:

sudo dnf system-upgrade download --releasever=39

and then

sudo dnf system-upgrade reboot

It will take 20 minutes or so on a reasonable internet connection and save you a lot of strife. If for some reason you need something packaged for Fedora 38, you can use a toolbox container, which works just as well on regular Fedora as it does on Silverblue/uBlue.

In general, in my experience, the Fedora package ecosystem, unlike Ubuntu derivatives, has a much longer on-ramp of ecosystem-partner CI/CD builds based on the pre-release branched tags before a GA release. This is 2023, and the whole N-1 ‘wisdom’ from the ’80s/’90s really no longer applies. dnf system-upgrade will also handle RPM Fusion updates etc. automatically.

@jonp For general-purpose encrypted volumes I tend to favour userspace, non-system-integrated tools, except for appliances or highly secure systems (where I’ll build something validated from scratch). For a laptop there are just too many risk surfaces to really counter with something like LUKS or a TPM, IMNSHO. If I actually value something, it goes on a USB key wrapped in a GPG archive. I’m not saying it shouldn’t work, just that I’ve seen more problems and data losses from system-level integrated crypto solutions than they have ever solved.


Ditto what @jwp suggested.

Thanks jwp, crazy busy and I appreciate the assist.

@jwp Thanks. I’ll consider it. Because of the nature of my work I have to have some guarantees around encryption at rest but could possibly switch things up so I’m not running an encrypted system drive.

That said, since running without the HDMI Expansion card I have seen no issues with resuming, whereas I was seeing multiple occurrences per day prior. Not sure if I got a janky card or what.

@Matt_Hartley Based on the HDMI observation made by @jonp I removed the HDMI expansion card, and left only one USB-C card installed in slot (1). All other slots were empty.

Did two long-duration suspends (over 6 hours). In both cases the laptop woke up without problems.

Not a definite smoking gun, but it’s anecdotal evidence suggesting that something may be wrong with the HDMI card, and/or the Linux kernel really doesn’t like it.

PS. Interestingly, without the HDMI card installed, the battery drain during suspend fell to about 0.5-0.6 percentage points per hour. With the HDMI card it was about 0.7-0.8 points per hour.

Given this latest development, I’ve also removed the HDMI expansion card, and will report my anecdata.

@jwp As per your advice, upgraded to F39. This came along with the rather fresh update to kernel 6.6.2, which promptly caused problems.

Under kernel 6.6.2-201.fc39, waking the machine from suspend resulted in a completely white screen. Switching to a text console via the usual Ctrl+Alt+F1/F2/F3 etc. key combos didn’t help. A hard reboot was required.

Downgrading to kernel 6.5.12-200.fc38 (retained from the original F38 install) fixed the issue. Suspend with that kernel seems to be working okay.

In all cases the only installed expansion cards are USB-C in (1), and USB-A in (4). USB-C had a charger connected to it.

Following @jonp’s suspicion about the HDMI expansion card, I removed my HDMI expansion card (slot 4) and booted my system (Secure Boot, LUKS, Fedora 39). I logged in, opened a couple of apps and then closed the lid for two hours. The whole test ran on battery power, starting at ~60%.

Left-hand side expansion cards: (1) USB-C, (2) USB-A
Right-hand side expansion cards: (3) USB-C, (4) empty (used to be HDMI)

Upon opening the lid, the screen was blank (backlight on) for ~70 seconds before the login screen appeared. I logged in, and a couple of GUI applications showed errors. The battery was far from empty (somewhere above 40%). I didn’t already have a terminal open, so I tried twice to open one to do a write test with touch, but gnome-terminal failed to open both times.

I then pressed Ctrl+Alt+F3 to switch from my graphical GNOME session to tty3. About four error messages appeared (I can’t remember exactly what they were about; I believe something about btrfs). I was then left on a blank screen with a single blinking underscore in the top left for a few seconds, before these errors began being printed to the screen:

TLDR: A 2 hour suspend with no HDMI expansion card still resulted in a read-only system after waking (which took 70 seconds).
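The touch test is a blunt way to spot this state. If the filesystem has been remounted read-only after an error (as btrfs does when it aborts a transaction), the root entry in /proc/mounts lists ro among its options. The following is a minimal sketch with hypothetical helper names. It will not catch the case where the drive has vanished and writes fail with I/O errors while the mount still says rw.

```python
def mount_options(mounts_text, mountpoint="/"):
    """Return the option set for mountpoint from /proc/mounts-style text.

    Each line is: device mountpoint fstype options dump pass.
    Later entries win, since a newer mount on the same path hides older ones.
    """
    opts = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            opts = set(fields[3].split(","))
    return opts


def is_read_only(mounts_text, mountpoint="/"):
    """True if mountpoint is currently mounted with the ro option."""
    opts = mount_options(mounts_text, mountpoint)
    return opts is not None and "ro" in opts
```

On a live system, pass it the contents of /proc/mounts. Because /proc is a virtual filesystem, it stays readable even when the disk does not.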

Perhaps relevant, as my framework used to be Intel, I’m using 2022 expansion cards which came with my original Intel framework, now plugged into an AMD motherboard.

I’m going to try disabling secure boot next, whilst keeping the suspect HDMI expansion card disconnected.


EDIT: After removing the kernel arguments module_blacklist=hid_sensor_hub and nvme.noacpi=1, my system has been waking from suspend immediately with no read-only problems 🎉
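Parameters set in /etc/default/grub only take effect once the boot entries are regenerated and the machine is rebooted, so it is worth confirming what the running kernel actually received. The following minimal sketch parses a kernel command line, such as the contents of /proc/cmdline, and reports the three parameters discussed in this thread. The helper names are hypothetical. The sketch does not normalise '-' versus '_' in parameter names the way the kernel does.

```python
import shlex

# Parameters mentioned in this thread.
WATCHED = ("amdgpu.sg_display", "nvme.noacpi", "module_blacklist")


def kernel_params(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def watched_params(cmdline, watched=WATCHED):
    """Return only the watched parameters that are actually present."""
    params = kernel_params(cmdline)
    return {name: params[name] for name in watched if name in params}
```

With the GRUB_CMDLINE_LINUX value from the start of this thread, this would report only amdgpu.sg_display=0. Any leftover nvme.noacpi or module_blacklist entry would show up immediately.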

Will post again after longer term testing where I plan to slowly re-enable secure boot and reinstall the HDMI expansion card if all goes well.

~24 hours so far with HDMI expansion card disconnected, no suspend/resume failures.

Edit: Went to a co-working space today, and this happened 😮‍💨


@goldenspinach Just upgraded to 6.6.2-201.fc39 yesterday with no issues resuming here.

Also: as an update, since removing the HDMI card 3 days ago I’ve not had a single issue with resuming.

I’m not entirely sure about the graphical issue others are seeing, but I have the amdgpu.sg_display=0 kernel option set because I did experience that earlier and haven’t since.


Whoops. Much 🤭
Does the expansion card have a full size HDMI, or a mini one?