Linux Framework 16 intermittent failure to resume from suspend

Hello,

I got a Framework 16 laptop two weeks ago. In that time, I’ve had two situations where the laptop failed to resume from suspend. I researched this and found some similar topics:

I am uncertain if these issues match what I have seen, though. The symptoms are:

  1. Intermittent and fairly rare; maybe 2 out of 25 suspend/resume cycles.
  2. The power LED pulses slowly, as is normal for suspend.
  3. Pressing the power button has no effect. My only recourse is to hold the power button down for full power-off.
  4. On the second occasion, the laptop was already quite warm; I didn’t notice either way on the first occasion.

Does this match any known issues with suspend/resume for AMD on Linux, or is there a hardware issue?

I tried amd_s2idle.py as mentioned elsewhere. If I understand correctly, this is a test script to exercise the suspend-to-idle functionality (as suspend-to-RAM is not supported on this platform). The script was able to run without incident. I ran a total of 25 cycles on battery power and 20 cycles on AC power. I have not yet been able to deliberately induce failure in any way.
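As a rough sanity check on those numbers, the sketch below estimates how likely a run of clean test cycles would be if every cycle failed independently at the rate observed in normal use (2 out of 25). The independence assumption is a simplification and may well not hold here, and the function name `p_all_clean` is made up for illustration.

```shell
#!/bin/bash
# p_all_clean FAILS TOTAL CYCLES
# Probability that CYCLES consecutive suspend/resume cycles all succeed,
# assuming (hypothetically) that each cycle fails independently with
# probability FAILS/TOTAL.
p_all_clean() {
    awk -v f="$1" -v t="$2" -v n="$3" \
        'BEGIN { printf "%.3f\n", (1 - f / t) ^ n }'
}

# 2 failures in 25 everyday cycles, 45 clean scripted cycles (25 + 20).
p_all_clean 2 25 45
```

Under that assumption the result is about 0.023, so 45 clean cycles in a row would be unlikely at the everyday failure rate. That hints that the scripted cycles may not reproduce whatever condition triggers the failure, though the sample is small.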

  • Ryzen 7 7840HS
  • BIOS version 3.03
  • Debian Sid (Linux 6.10.6 kernel, stock Debian build)
  • I/O modules: 2x USB-C, 1x USB-A, 1x HDMI, 1x Ethernet, 1x audio
  • no graphics module

Please let me know if there are other details I should supply.

Thanks.

I would see if there is anything notable in the logs regarding suspend or resume for the two failure cases. Also, what SSD are you using?

Thank you for your response.

The logs for the first incident appear to have been rotated out.

The last messages in the systemd journal for the second incident are:

2024-10-01T09:23:41-07:00 lizard sudo[9526]:  bugfood : PWD=/home/bugfood ; USER=root ; COMMAND=/home/bugfood/bin/s2ram
2024-10-01T09:23:41-07:00 lizard sudo[9526]: pam_unix(sudo:session): session opened for user root(uid=0) by bugfood(uid=1000)
2024-10-01T09:23:41-07:00 lizard systemd[1]: bacula-fd.service: Scheduled restart job, restart counter is at 96.
2024-10-01T09:23:41-07:00 lizard systemd[1]: Starting bacula-fd.service - Bacula File Daemon service...
2024-10-01T09:23:41-07:00 lizard systemd[1]: Started bacula-fd.service - Bacula File Daemon service.
2024-10-01T09:23:41-07:00 lizard bacula-fd[9530]: lizard-fd: Warning: Cannot bind port 9102: ERR=Cannot assign requested address: Retrying ...

The sudo messages are from me running my s2ram wrapper script, which initiates the suspend.

#!/bin/bash

set -e

xscreensaver-command -lock
sleep 1
exec "/usr/sbin/$(basename "$0")" "$@"

The bacula errors are an unrelated issue, but they do show that some userspace activity was still happening at that time.

Normally, I would expect to see that followed by a message from the kernel like:

2024-09-27T22:07:01-07:00 lizard kernel: PM: suspend entry (s2idle)

…but I don’t think I can conclude anything from not seeing the message here; the kernel may have printed it, but systemd never had a chance to write it to the journal.
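For checking older boots, a small filter along these lines could list suspend entries that have no matching exit. It is only a sketch: the function name `unmatched_suspends` is made up, it assumes the kernel logs "PM: suspend entry" and "PM: suspend exit" around each cycle, and it cannot detect a failure like the one above where the entry message itself never reached the journal.

```shell
#!/bin/bash
# Read log lines on stdin and print every "PM: suspend entry" line that is
# not followed by a "PM: suspend exit" line before the next entry or the
# end of the input.
unmatched_suspends() {
    awk '
        /PM: suspend entry/ {
            if (pending != "") print pending   # previous entry never exited
            pending = $0
            next
        }
        /PM: suspend exit/ { pending = "" }
        END { if (pending != "") print pending }
    '
}

# Example (requires a persistent journal; -b -1 selects the previous boot):
#   journalctl -k -b -1 -o short-iso | unmatched_suspends
```

An entry printed by this filter would mean the kernel started suspending and the log ends (or another suspend begins) without a recorded resume, which narrows down where the hang happened.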

My SSDs are:

$ lsscsi 
[N:0:0:1]    disk    WD_BLACK SN770 1TB__1                      /dev/nvme0n1
[N:1:0:1]    disk    WD_BLACK SN770M 1TB__1                     /dev/nvme1n1

I also see a kernel warning from the resume preceding the failure:

2024-10-01T09:00:28-07:00 lizard kernel: ------------[ cut here ]------------
2024-10-01T09:00:28-07:00 lizard kernel: WARNING: CPU: 10 PID: 5258 at drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_capability.c:1532 dp_retrieve_lttpr_cap+0x121/0x1e0 [amdgpu]
2024-10-01T09:00:28-07:00 lizard kernel: Modules linked in: ccm sunrpc binfmt_misc nls_ascii nls_cp437 vfat fat typec_displayport amdgpu snd_sof_amd_rembrandt snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof amdxcp btusb uvcvideo drm_exec btrtl snd_hda_codec_realtek gpu_sched btintel videobuf2_vmalloc drm_buddy snd_sof_utils btbcm uvc drm_suballoc_helper btmtk snd_hda_codec_generic videobuf2_memops snd_hda_scodec_component drm_display_helper snd_soc_core snd_hda_codec_hdmi videobuf2_v4l2 cec bluetooth videodev snd_compress snd_hda_intel rc_core snd_pcm_dmaengine snd_intel_dspcfg drm_ttm_helper snd_pci_ps snd_intel_sdw_acpi ttm snd_rpl_pci_acp6x snd_hda_codec videobuf2_common snd_pci_acp6x snd_pci_acp5x drm_kms_helper snd_rn_pci_acp3x snd_hda_core mc snd_acp_config crc16 snd_soc_acpi i2c_algo_bit snd_pci_acp3x snd_hwdep amd_atl intel_rapl_msr intel_rapl_common mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 edac_mce_amd mac80211 kvm_amd amd_pmf amdtee libarc4 kvm ccp cfg80211 hid_sensor_als ucsi_acpi hid_sensor_trigger amd_sfh
2024-10-01T09:00:28-07:00 lizard kernel:  typec_ucsi hid_sensor_iio_common industrialio_triggered_buffer tee kfifo_buf cros_usbpd_charger platform_profile typec industrialio rapl wmi_bmof cros_usbpd_notify cros_ec_chardev cros_ec_debugfs cros_ec_sysfs cros_usbpd_logger sp5100_tco roles amd_pmc button ac k10temp watchdog rfkill cpufreq_ondemand snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore evdev i2c_dev sidewinder gameport joydev parport_pc ppdev lp parport efi_pstore configfs nfnetlink ip_tables x_tables autofs4 xfs dm_crypt dm_mod efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 cdc_ncm cdc_ether usbnet r8152 mii libphy usbhid raid1 md_mod hid_multitouch hid_sensor_hub hid_generic nvme i2c_hid_acpi xhci_pci crc32_pclmul nvme_core i2c_hid crc32c_intel cros_ec_dev t10_pi xhci_hcd drm ghash_clmulni_intel crc64_rocksoft_generic sha512_ssse3 thunderbolt crc64_rocksoft crc_t10dif sha256_ssse3 usbcore cros_ec_lpcs
2024-10-01T09:00:28-07:00 lizard kernel:  crct10dif_generic cros_ec sha1_ssse3 crct10dif_pclmul crc64 i2c_piix4 usb_common video crct10dif_common hid battery wmi aesni_intel crypto_simd cryptd
2024-10-01T09:00:28-07:00 lizard kernel: CPU: 10 PID: 5258 Comm: kworker/u64:3 Not tainted 6.10.6-amd64 #1  Debian 6.10.6-1
2024-10-01T09:00:28-07:00 lizard kernel: Hardware name: Framework Laptop 16 (AMD Ryzen 7040 Series)/FRANMZCP07, BIOS 03.03 03/27/2024
2024-10-01T09:00:28-07:00 lizard kernel: Workqueue: async async_run_entry_fn
2024-10-01T09:00:28-07:00 lizard kernel: RIP: 0010:dp_retrieve_lttpr_cap+0x121/0x1e0 [amdgpu]
2024-10-01T09:00:28-07:00 lizard kernel: Code: 48 21 c8 48 c1 e2 38 48 09 d0 48 89 85 98 02 00 00 f6 85 c4 02 00 00 02 74 42 e8 7a ed ff ff 84 c0 75 39 48 8b 85 d8 01 00 00 <0f> 0b c6 85 9c 02 00 00 80 48 8b 40 10 48 8b 30 48 85 f6 74 04 48
2024-10-01T09:00:28-07:00 lizard kernel: RSP: 0018:ffffa50b8b363bf8 EFLAGS: 00010246
2024-10-01T09:00:28-07:00 lizard kernel: RAX: ffff8bc999af6c00 RBX: 00000000ffffffff RCX: 00ffffffffffffff
2024-10-01T09:00:28-07:00 lizard kernel: RDX: 0000000000000007 RSI: ffffa50b8b363bf8 RDI: 0000000000000000
2024-10-01T09:00:28-07:00 lizard kernel: RBP: ffff8bca7a3ee000 R08: ffff8bc98a47b940 R09: 0000000000000000
2024-10-01T09:00:28-07:00 lizard kernel: R10: 0000000000000002 R11: 0000000000000000 R12: ffff8bca0b9c2600
2024-10-01T09:00:28-07:00 lizard kernel: R13: ffff8bca7a3e9800 R14: ffffffffb06b2464 R15: 000000000000000a
2024-10-01T09:00:28-07:00 lizard kernel: FS:  0000000000000000(0000) GS:ffff8bd85ff00000(0000) knlGS:0000000000000000
2024-10-01T09:00:28-07:00 lizard kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2024-10-01T09:00:28-07:00 lizard kernel: CR2: 000055ca822302f8 CR3: 0000000dff620000 CR4: 0000000000750ef0
2024-10-01T09:00:28-07:00 lizard kernel: PKRU: 55555554
2024-10-01T09:00:28-07:00 lizard kernel: Call Trace:
2024-10-01T09:00:28-07:00 lizard kernel:  <TASK>
2024-10-01T09:00:28-07:00 lizard kernel:  ? __warn+0x80/0x120
2024-10-01T09:00:28-07:00 lizard kernel:  ? dp_retrieve_lttpr_cap+0x121/0x1e0 [amdgpu]
2024-10-01T09:00:28-07:00 lizard kernel:  ? report_bug+0x164/0x190
2024-10-01T09:00:28-07:00 lizard kernel:  ? handle_bug+0x3c/0x80
2024-10-01T09:00:28-07:00 lizard kernel:  ? exc_invalid_op+0x17/0x70
2024-10-01T09:00:28-07:00 lizard kernel:  ? asm_exc_invalid_op+0x1a/0x20
2024-10-01T09:00:28-07:00 lizard kernel:  ? dp_retrieve_lttpr_cap+0x121/0x1e0 [amdgpu]
2024-10-01T09:00:28-07:00 lizard kernel:  ? dp_retrieve_lttpr_cap+0x116/0x1e0 [amdgpu]
2024-10-01T09:00:28-07:00 lizard kernel:  link_blank_all_dp_displays+0x56/0xd0 [amdgpu]
2024-10-01T09:00:28-07:00 lizard kernel:  dcn31_init_hw+0x1da/0x860 [amdgpu]
2024-10-01T09:00:28-07:00 lizard kernel:  dc_set_power_state+0x85/0xc0 [amdgpu]
2024-10-01T09:00:28-07:00 lizard kernel:  dm_resume+0x120/0x910 [amdgpu]
2024-10-01T09:00:28-07:00 lizard kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
2024-10-01T09:00:28-07:00 lizard kernel:  ? _dev_info+0x79/0xa0
2024-10-01T09:00:28-07:00 lizard kernel:  amdgpu_device_ip_resume_phase2+0x4f/0xc0 [amdgpu]
2024-10-01T09:00:28-07:00 lizard kernel:  amdgpu_device_resume+0xa0/0x2d0 [amdgpu]
2024-10-01T09:00:28-07:00 lizard kernel:  ? __pfx_pci_pm_resume+0x10/0x10
2024-10-01T09:00:28-07:00 lizard kernel:  amdgpu_pmops_resume+0x4a/0x80 [amdgpu]
2024-10-01T09:00:28-07:00 lizard kernel:  ? __pfx_pci_pm_resume+0x10/0x10
2024-10-01T09:00:28-07:00 lizard kernel:  dpm_run_callback+0x88/0x1e0
2024-10-01T09:00:28-07:00 lizard kernel:  device_resume+0x9c/0x220
2024-10-01T09:00:28-07:00 lizard kernel:  async_resume+0x1d/0x30
2024-10-01T09:00:28-07:00 lizard kernel:  async_run_entry_fn+0x31/0x130
2024-10-01T09:00:28-07:00 lizard kernel:  process_one_work+0x179/0x390
2024-10-01T09:00:28-07:00 lizard kernel:  worker_thread+0x265/0x380
2024-10-01T09:00:28-07:00 lizard kernel:  ? __pfx_worker_thread+0x10/0x10
2024-10-01T09:00:28-07:00 lizard kernel:  kthread+0xcf/0x100
2024-10-01T09:00:28-07:00 lizard kernel:  ? __pfx_kthread+0x10/0x10
2024-10-01T09:00:28-07:00 lizard kernel:  ret_from_fork+0x31/0x50
2024-10-01T09:00:28-07:00 lizard kernel:  ? __pfx_kthread+0x10/0x10
2024-10-01T09:00:28-07:00 lizard kernel:  ret_from_fork_asm+0x1a/0x30
2024-10-01T09:00:28-07:00 lizard kernel:  </TASK>
2024-10-01T09:00:28-07:00 lizard kernel: ---[ end trace 0000000000000000 ]---

I think that is a red herring, though. The warning is also present in the logs for the current boot, which has not failed yet.

Sometimes I plug in a TV via HDMI; I may suspend with the HDMI cable connected and resume with the cable disconnected. I haven’t been able to induce a failure by doing that, though.

I don’t think suspend to ram is really supported.
Something called “suspend to idle” is supported.

Yes, I agree. This can be checked via:

$ cat /sys/power/mem_sleep 
[s2idle]

The kernel documentation describes these.

https://www.kernel.org/doc/Documentation/power/states.txt

Userspace triggers a “mem_sleep” via writing mem to /sys/power/state and the kernel chooses the actual mechanism depending on /sys/power/mem_sleep. Since s2idle is the only available value on this laptop, suspend-to-idle is what is used.
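To make that concrete, here is a minimal sketch that extracts the active mode (the bracketed entry) from the contents of /sys/power/mem_sleep. The helper name `active_mem_sleep` is made up for illustration; the commented-out last line shows the write that a suspend tool effectively performs.

```shell
#!/bin/bash
# Print the bracketed (currently selected) entry from mem_sleep-style input,
# e.g. "[s2idle]" -> s2idle, "s2idle [deep]" -> deep.
active_mem_sleep() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Report the mode the kernel will use for a "mem" sleep, if the file exists.
if [ -r /sys/power/mem_sleep ]; then
    echo "active mem_sleep mode: $(active_mem_sleep < /sys/power/mem_sleep)"
fi

# Triggering the suspend itself is then a single write (as root):
#   echo mem > /sys/power/state
```

On this laptop the output would be s2idle, since that is the only entry listed.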

The s2ram utility does this; I checked via strace and I see it in the source:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-utils.git/tree/s2ram-main.c

In other words, s2ram triggers whatever kind of “mem” sleep the kernel is configured to use, same as other utilities, e.g. amd_s2idle.py:

I see now that the package for s2ram is no longer maintained in Debian, so I’m switching to systemctl suspend anyway, but I don’t think that should matter for this problem, since the actual suspend is handled by the kernel.

Thanks,
Corey

I had another resume failure today.

I had been using the laptop on battery for a short while, maybe 20 minutes. Then I suspended it and plugged it in. A bit later, I unplugged it, then plugged it in again. When I came back about 15 minutes later, I found the laptop was warm. I checked the power draw at the wall: 16.3 +/- 0.3 W. The laptop was unable to resume.

Normal suspend-to-idle power draw, when the battery is charged, seems to be 1.3 +/- 0.3 W.

Suspecting the failure had something to do with unplugging and replugging power, I did about 5 unplug/replug cycles while the laptop was suspended. This did induce another failure. Unfortunately, I was not able to reproduce it in two more attempts, so either the failure was a coincidence or I have not yet found the right conditions.

Both failures today were after suspending via systemctl suspend (nothing to do with the s2ram utility).

-Corey