[TRACKING] Fedora 35 kernel 5.16.5 s2idle and maybe wifi issues

Just installed Fedora’s 5.16.5 kernel from updates-testing, and noticed several regressions:

  • s2idle/modern standby:

The new kernel no longer puts the machine to sleep properly. The display backlight switches off, but the keyboard backlight stays on, and I don’t think the machine actually enters sleep.

Pressing the power button does wake the machine up, but it’s flaky for about a minute (the display appears to freeze). After that it seems to run OK, but again, modern standby doesn’t seem to work.

  • iwlwifi stack traces:
[   55.035075] iwlwifi 0000:aa:00.0: Error sending ECHO_CMD: time out after 2000ms.
[   55.035085] iwlwifi 0000:aa:00.0: Current CMD queue read_ptr 151 write_ptr 152
[   55.036510] iwlwifi 0000:aa:00.0: HCMD_ACTIVE already clear for command ECHO_CMD
[   55.036565] iwlwifi 0000:aa:00.0: Start IWL Error Log Dump:
[   55.036569] iwlwifi 0000:aa:00.0: Transport status: 0x0000004A, valid: 6
[   55.036575] iwlwifi 0000:aa:00.0: Loaded firmware version: 67.8f59b80b.0 ty-a0-gf-a0-67.ucode
...

followed by

[   55.745194] ------------[ cut here ]------------
[   55.745195] WARNING: CPU: 4 PID: 1498 at drivers/net/wireless/intel/iwlwifi/mvm/../iwl-trans.h:1310 iwl_mvm_wait_sta_queues_empty+0x85/0xb0 [iwlmvm]
[   55.745214] Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer tun bnep nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast btusb nft_fib_inet nft_fib_ipv4 btrtl nft_fib_ipv6 btbcm nft_fib btintel bluetooth nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct ecdh_generic nft_chain_nat nf_nat nf_conntrack snd_hda_codec_hdmi nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr sunrpc snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof iwlmvm snd_soc_hdac_hda iTCO_wdt mei_hdcp snd_hda_ext_core mei_pxp ee1004 intel_pmc_bxt snd_soc_acpi_intel_match iTCO_vendor_support snd_soc_acpi soundwire_bus pmt_telemetry pmt_class intel_rapl_msr snd_soc_core snd_hda_codec_realtek mac80211 snd_hda_codec_generic ledtrig_audio snd_compress intel_tcc_cooling ac97_bus x86_pkg_temp_thermal snd_pcm_dmaengine intel_powerclamp coretemp snd_hda_intel snd_intel_dspcfg
[   55.745253]  libarc4 kvm_intel snd_intel_sdw_acpi snd_hda_codec kvm snd_hda_core snd_hwdep iwlwifi snd_seq irqbypass intel_cstate snd_seq_device intel_uncore snd_pcm cfg80211 snd_timer pcspkr wmi_bmof snd mei_me i2c_i801 i2c_smbus rfkill soundcore mei vfat fat joydev idma64 hid_sensor_als hid_sensor_trigger hid_sensor_iio_common industrialio_triggered_buffer processor_thermal_device_pci_legacy kfifo_buf processor_thermal_device industrialio thunderbolt processor_thermal_rfim intel_pmt processor_thermal_mbox processor_thermal_rapl intel_rapl_common intel_soc_dts_iosf igen6_edac int3403_thermal int340x_thermal_zone int3400_thermal acpi_pad acpi_thermal_rel zram ip_tables dm_crypt uas usb_storage hid_sensor_hub intel_ishtp_hid i915 hid_multitouch i2c_algo_bit ttm drm_kms_helper cec drm crct10dif_pclmul crc32_pclmul crc32c_intel nvme ghash_clmulni_intel ucsi_acpi typec_ucsi intel_ish_ipc nvme_core intel_ishtp typec serio_raw wmi i2c_hid_acpi i2c_hid video pinctrl_tigerlake ipmi_devintf
[   55.745297]  ipmi_msghandler fuse
[   55.745300] CPU: 4 PID: 1498 Comm: wpa_supplicant Not tainted 5.16.5-200.fc35.x86_64 #1
[   55.745303] Hardware name: Framework Laptop/FRANBMCP03, BIOS 03.07 12/14/2021
[   55.745304] RIP: 0010:iwl_mvm_wait_sta_queues_empty+0x85/0xb0 [iwlmvm]
[   55.745315] Code: 1f 00 85 c0 75 0b 48 83 c3 28 4c 39 eb 75 b5 31 c0 5b 5d 41 5c 41 5d 41 5e c3 0f 0b b8 f4 fd ff ff 5b 5d 41 5c 41 5d 41 5e c3 <0f> 0b 48 8b 7f 38 48 c7 c1 20 28 18 c1 48 c7 c2 d5 93 18 c1 31 f6
[   55.745317] RSP: 0018:ffffa5c0012578c0 EFLAGS: 00010293
[   55.745318] RAX: ffffffffc0cfdd20 RBX: ffff963da92f8be8 RCX: 0000000000000000
[   55.745320] RDX: 0000000000000001 RSI: 0000000000000005 RDI: ffff963dd20e8028
[   55.745321] RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000000
[   55.745322] R10: 0000000000000000 R11: 0000000000000000 R12: ffff963da92f8bd4
[   55.745322] R13: ffff963da92f8d50 R14: ffff963d8491a048 R15: 0000000000000000
[   55.745323] FS:  00007f457af7d7c0(0000) GS:ffff96411fb00000(0000) knlGS:0000000000000000
[   55.745325] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   55.745326] CR2: 00007fa59c2ab000 CR3: 00000001059ba002 CR4: 0000000000770ee0
[   55.745328] PKRU: 55555554
[   55.745328] Call Trace:
[   55.745331]  <TASK>
[   55.745334]  iwl_mvm_mac_flush+0xf0/0x2e0 [iwlmvm]
[   55.745344]  __ieee80211_flush_queues+0xab/0x240 [mac80211]
[   55.745383]  ieee80211_set_disassoc+0x46b/0x580 [mac80211]
[   55.745417]  ieee80211_mgd_deauth.cold+0x49/0x1f3 [mac80211]
[   55.745456]  cfg80211_mlme_deauth+0x9d/0x1b0 [cfg80211]
[   55.745493]  nl80211_deauthenticate+0xd8/0x120 [cfg80211]
[   55.745517]  genl_family_rcv_msg_doit+0xca/0x110
[   55.745522]  genl_rcv_msg+0xce/0x1c0
[   55.745524]  ? nl80211_disassociate+0x120/0x120 [cfg80211]
[   55.745547]  ? genl_get_cmd+0xd0/0xd0
[   55.745549]  netlink_rcv_skb+0x4e/0xf0
[   55.745551]  genl_rcv+0x24/0x40
[   55.745554]  netlink_unicast+0x20e/0x330
[   55.745556]  netlink_sendmsg+0x23f/0x480
[   55.745558]  sock_sendmsg+0x5b/0x60
[   55.745562]  ____sys_sendmsg+0x22c/0x270
[   55.745563]  ? import_iovec+0x17/0x20
[   55.745567]  ? sendmsg_copy_msghdr+0x59/0x90
[   55.745569]  ? __check_object_size+0x46/0x150
[   55.745574]  ___sys_sendmsg+0x81/0xc0
[   55.745576]  ? ___sys_recvmsg+0x86/0xe0
[   55.745578]  ? avc_has_perm+0x77/0x170
[   55.745581]  ? restore_sigcontext+0x14e/0x190
[   55.745585]  ? sock_has_perm+0x84/0xa0
[   55.745587]  __sys_sendmsg+0x49/0x80
[   55.745590]  do_syscall_64+0x38/0x90
[   55.745593]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   55.745596] RIP: 0033:0x7f457b409327
[   55.745598] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[   55.745599] RSP: 002b:00007ffd1aa30f68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[   55.745601] RAX: ffffffffffffffda RBX: 000055c5308d15c0 RCX: 00007f457b409327
[   55.745602] RDX: 0000000000000000 RSI: 00007ffd1aa30fa0 RDI: 0000000000000009
[   55.745603] RBP: 000055c5308d34e0 R08: 0000000000000004 R09: 0000000000000000
[   55.745604] R10: 00007ffd1aa31080 R11: 0000000000000246 R12: 000055c5309138c0
[   55.745605] R13: 00007ffd1aa30fa0 R14: 000055c52f0b82c0 R15: 0000000000000000
[   55.745608]  </TASK>
[   55.745608] ---[ end trace 533aeec4381c8a64 ]---

Wifi still seems to keep working, though.

  • some dmesg logspam I have not noticed with previous kernels:
[   88.482938] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
[   92.938966] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
[   97.355112] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
[  101.771160] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?

What’s usb2-port2 here?

$ lsusb
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 004: ID 27c6:609c Shenzhen Goodix Technology Co.,Ltd. Goodix USB2.0 MISC
Bus 003 Device 003: ID 090c:3350 Silicon Motion, Inc. - Taiwan (formerly Feiya Technology Corp.) USB DISK
Bus 003 Device 005: ID 8087:0032 Intel Corp. AX210 Bluetooth
Bus 003 Device 002: ID 32ac:0002 Framework HDMI Expansion Card
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
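
The “usbN-portM” name in those messages refers to port M of the root hub of USB bus N, so usb2-port2 is port 2 on Bus 002, which in the lsusb output above lists only its 3.0 root hub. Here is a small sketch that turns such a message into the sysfs directory for that port. The path layout (hub interface 2-0:1.0 containing usb2-port2) and the connect_type attribute are my assumption of the usual sysfs layout, so check with ls on your own machine first; the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: map a kernel message such as
#   "usb usb2-port2: Cannot enable. Maybe the USB cable is bad?"
# to the sysfs directory of that root-hub port.
# Assumed layout: root hub "usbN", hub interface "N-0:1.0",
# port device "usbN-portM" inside that interface directory.
usb_port_sysfs_path() {
    # Pull out the "usbN-portM" token; fail if the message has none.
    port=$(printf '%s\n' "$1" | sed -n 's/.*\(usb[0-9][0-9]*-port[0-9][0-9]*\).*/\1/p')
    [ -n "$port" ] || return 1
    bus=${port%%-port*}   # "usb2-port2" -> "usb2"
    bus=${bus#usb}        # "usb2" -> "2"
    printf '/sys/bus/usb/devices/%s-0:1.0/%s\n' "$bus" "$port"
}

# Usage on the affected machine (attribute name assumed; it should
# report something like "hotplug", "hardwired" or "not used"):
#   cat "$(usb_port_sysfs_path 'usb usb2-port2: Cannot enable.')/connect_type"
```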

@Kieran_Levin and everyone: any idea whether this is known or tracked in upstream bug trackers, or maybe as a known Framework firmware issue?

Edit: I’ve opened a Fedora bug.


Note that S3 sleep (sudo grubby --update-kernel=ALL --args=mem_sleep_default=deep, then reboot to take effect) appears to work so far.
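
To confirm which mode is actually active after the reboot, /sys/power/mem_sleep lists the supported modes with the current one in square brackets (e.g. “s2idle [deep]”). A minimal sketch that extracts the bracketed entry; the helper name is made up for illustration.

```shell
#!/bin/sh
# Print the active suspend mode from a /sys/power/mem_sleep-style string.
# The kernel marks the active entry with [brackets], e.g. "s2idle [deep]".
current_mem_sleep() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Only read the real file if it is there (it may not be in a container).
if [ -r /sys/power/mem_sleep ]; then
    echo "active mode: $(current_mem_sleep "$(cat /sys/power/mem_sleep)")"
fi
```

Writing a mode name to the same file (for example echo deep | sudo tee /sys/power/mem_sleep) switches modes only until the next reboot, which can be handy for comparing s2idle and deep without editing the kernel command line.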

I just updated to 5.16.5 on Arch; I’ll see whether there’s a similar regression here or whether it’s Fedora-specific. Stand by for an edit here…

Unfortunately, kernel 5.16.5 will be pushed to the Fedora 35 stable repository. Fedora 35 users may want to hold off on upgrading the kernel to 5.16.5.
https://bodhi.fedoraproject.org/updates/FEDORA-2022-57fd391bf8#comment-2394732

https://bodhi.fedoraproject.org/updates/FEDORA-2022-57fd391bf8#comment-2394715

I am aware of the Framework issue, but this fixes a lot more users than it breaks, and contains security updates so I have decided to push. Framework has a small enough deployed hardware footprint that I haven’t even seen it mentioned upstream or elsewhere just yet. Hopefully we can get the framework issues resolved soon, but it is going to require someone with the actual hardware doing some testing/debugging.

The comment above from the Fedora side teaches me that when a device such as the Framework Laptop has only a small presence in the kernel or other upstream communities, this kind of regression is more likely, because nobody there is testing on it. Promoting the Framework Laptop more within the kernel and upstream communities is the way to prevent this kind of bug, because regressions would then be caught before a patch is merged into the source repository.

And for reference: if you want stability and would rather avoid this kind of regression, one option is to run an older stable version (one or two releases behind the latest) instead of the very latest stable version.

@D.H any luck with the Arch kernel?

Edit: I did a bit of testing with both older 5.16-series kernels and a 5.17 release candidate, and this looks like something that was introduced during the 5.16 development cycle.

It also reproduces on Manjaro. I’m trying to bisect the kernel now and will update with any results.

Update: The regression appears to be due to something within: [GIT PULL] Power management updates for v5.16-rc1

I’m still working on figuring out exactly which commit(s) are responsible.


Thanks! You mean you’re running git bisect on the kernel source to find the offending commit?

Yeah and holy smokes does this take a long time!
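
For anyone wondering why it takes so long: git bisect halves the range of candidate commits on every step, so it needs roughly log2(N) build, install, reboot and test cycles for N candidates. A quick sketch of that arithmetic; the 10000 below is an illustrative number, not the actual size of the v5.16-rc1 power management pull.

```shell
#!/bin/sh
# Rough number of git bisect steps for N candidate commits:
# each step halves the range, so this is ceil(log2(N)).
bisect_steps() {
    n=$1
    steps=0
    range=1
    # Double the covered range until it reaches all N candidates.
    while [ "$range" -lt "$n" ]; do
        range=$((range * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Illustrative only: 10000 candidate commits -> 14 kernel builds.
bisect_steps 10000
```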

I’m experiencing the same problem, and at the same time I’m a little concerned about Fedora releasing a kernel for which this kind of bug has been reported. It’s not just our laptop.

@Anil_Kulkarni Thanks for your time trying to bisect it!

Nice. Could you create a small reproducer script that fails on kernel 5.16.5 and passes on 5.15? Isn’t there a test case for this in the kernel’s unit tests? I’m also curious which git repository on https://git.kernel.org/ is related to this issue.

Somewhat shockingly… the bisect points to: ACPICA: Add support for Windows 2020 _OSI string

And indeed, reverting this patch on top of 5.16 fixes the backlight, at least. @dimitris, if you want to try, compile a kernel with git revert 3bf70bd2538f0515ce17b1c067889ff0e4fec842 applied and see whether it fixes your issues.

Since it’s the end of the week, I’ll wait until Monday to file a Bugzilla report about this.


@Anil_Kulkarni nice work! I’ll try to build a Fedora kernel from source with that revert somehow included, but it’s something I’ve never done before so it may take me a while to get it done.

FWIW I tried to quickly play with the acpi_osi= kernel command line parameter (Linux, Windows, Windows 2019) but it didn’t make a difference.

I also reproduced the difference in behavior with lid vs power button: The lid does seem to put the laptop into s2idle sleep successfully. I still would prefer to use the power button primarily because there isn’t (yet) a way to visually confirm the laptop has in fact entered sleep. I believe this has been discussed in the EC thread(s) re: modifying LED behavior.

@Kieran_Levin, does this combination of “behavior deltas” (the kernel adding a “reported capability” to ACPI, and the difference between the power button and the lid sensor) point to some underlying EC/BIOS issue?

Update: Apparently it’s possible to tell the kernel to mask _OSI entries; no git revert needed.

After adding "acpi_osi=!Windows 2020" to the kernel cmdline args:

sudo grubby --update-kernel=vmlinuz-5.16.8-200.fc35.x86_64 --args='"acpi_osi=!Windows 2020"'

(edit: or sudo grubby --update-kernel=ALL --args='"acpi_osi=!Windows 2020"' to make it the default going forward)

and rebooting, I can now get into s2idle sleep with the power button. So it seems that telling the Insyde BIOS (3.07 anyway) that we “support/are” Windows 2020 makes it do Bad Things.
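
To double-check that the override actually made it onto the running kernel’s command line after the reboot, you can look for it in /proc/cmdline (and sudo grubby --info=ALL shows what is configured for each installed kernel). A minimal sketch; the function name is illustrative, and it only does a plain substring match, so it accepts the argument with or without the surrounding double quotes.

```shell
#!/bin/sh
# Succeed if a kernel command line string contains the
# "acpi_osi=!Windows 2020" override (quoted or not).
has_osi_override() {
    printf '%s\n' "$1" | grep -qF 'acpi_osi=!Windows 2020'
}

# Check the running kernel, if /proc/cmdline is available.
if [ -r /proc/cmdline ]; then
    if has_osi_override "$(cat /proc/cmdline)"; then
        echo "_OSI(\"Windows 2020\") is masked on this boot"
    else
        echo "override not present on this boot"
    fi
fi
```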

Background:
kernel docs: ACPI _OSI and _REV methods — The Linux Kernel documentation
and this for the “negate entry” hint: Linux: ACPI: Fix problems with Suspend, Resume, and Missing devices using acpi_osi=


And one quick follow-up on the wifi issues: those do persist, so they don’t seem to be a side effect of S3 sleep but rather an actual regression in the kernel/Intel wireless driver. I believe I’ve seen something in a thread here about that being worked on for 5.17 or 5.18, but I’m not sure… (edit: 5.17)

Anyway, the wireless network still comes back after a momentary, mildly annoying freeze on resume.

Ah, thanks for digging in; this makes more sense. Just from reading the patch, I was very confused about how Windows could affect standby. But now it makes sense, because the kernel is telling the BIOS “Yes, we’re Windows 2020”.

Here’s the key snippet:

Linux had no choice but to also return TRUE to _OSI(“Windows 2001”) and its successors. To do otherwise would virtually guarantee breaking a BIOS that has been tested only with that _OSI returning TRUE.
This strategy is problematic, as Linux is never completely compatible with the latest version of Windows, and sometimes it takes more than a year to iron out incompatibilities.

So I take it we’ll eventually need yet another BIOS update to fix this. In the meantime, we should probably get those kernel args added to the Linux guides on this site.
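
For distributions without grubby (Arch and derivatives booting with GRUB, for example), here is a sketch of the equivalent /etc/default/grub change, assuming GRUB is the bootloader; “quiet” stands in for whatever your existing default arguments are. The kernel accepts double quotes around a parameter value that contains spaces, so the quotes have to be escaped inside the shell-style GRUB variable. Check /proc/cmdline after rebooting to confirm the argument arrived intact.

```shell
# /etc/default/grub (excerpt, sketch): keep your existing arguments
# and append the _OSI override. The escaped quotes protect the space
# in "Windows 2020" when the value is passed on to the kernel.
GRUB_CMDLINE_LINUX_DEFAULT="quiet acpi_osi=\"!Windows 2020\""

# Then regenerate the GRUB config and reboot (path as on Arch):
#   sudo grub-mkconfig -o /boot/grub/grub.cfg
```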

Hi @dimitris

I had a bunch of other things to focus on, but I can finally devote a few minutes to helping troubleshoot this. On EndeavourOS (Arch-based) with 5.16.8:

  1. I’m seeing the same keyboard-backlight-stays-on behavior with s2idle. It does take a power button press to wake it up, not a keypress, so I know it made it into some kind of “sleep”. I much prefer deep, though, so I’m going back to that.

  2. No issues resuming wifi from s2idle or from deep

  3. None of your iwlwifi errors show up in my dmesg.

Here’s dmesg at boot:

$ sudo dmesg | grep -i wifi
[    3.469737] Intel(R) Wireless WiFi driver for Linux
[    3.470023] iwlwifi 0000:aa:00.0: enabling device (0000 -> 0002)
[    3.512379] iwlwifi 0000:aa:00.0: api flags index 2 larger than supported by driver
[    3.512396] iwlwifi 0000:aa:00.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 0.0.2.34
[    3.512635] iwlwifi 0000:aa:00.0: loaded firmware version 67.8f59b80b.0 ty-a0-gf-a0-67.ucode op_mode iwlmvm
[    3.654393] iwlwifi 0000:aa:00.0: Detected Intel(R) Wi-Fi 6 AX210 160MHz, REV=0x420
[    3.660897] iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 1, ret=-1
[    3.660900] iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 2, ret=-1
[    3.660901] iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 3, ret=-1
[    3.828038] iwlwifi 0000:aa:00.0: loaded PNVM version dda57f4f
[    3.843264] iwlwifi 0000:aa:00.0: Detected RF GF, rfid=0x10d000
[    3.912446] iwlwifi 0000:aa:00.0: base HW address: f4:46:37:ca:8b:bd
[    4.004239] iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 1, ret=-1
[    4.004242] iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 2, ret=-1
[    4.004243] iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 3, ret=-1
[    4.297471] iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 1, ret=-1
[    4.297478] iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 2, ret=-1
[    4.297480] iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 3, ret=-1

and I get this after every resume from either s2idle or from deep:

$ sudo dmesg | grep -i wifi
[ 3493.683355] iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 1, ret=-1
[ 3493.683362] iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 2, ret=-1
[ 3493.683364] iwlwifi 0000:aa:00.0: WRT: Failed to set DRAM buffer for alloc id 3, ret=-1

  4. I cannot reproduce your USB issues/messages. Is there a particular device whose insertion or removal changes the behavior? Have you checked internally for EMI sticker issues, melted covers, etc.?

@D.H I’ve checked inside the laptop: no melting/overheating issues, and the EMI stickers seem OK (it’s a batch 3 build, and the KB article implies this was an issue only in batches 1 and 2).

About the iwlwifi stack traces: I’ve opened a Fedora issue describing essentially the same problem, separate from the now-worked-around s2idle one.

The Intel wireless firmware version in your dmesg is the same I have. This could be a difference in the patches that EndeavourOS carries in the kernel vs Fedora, or even the particular wireless network. Since it doesn’t reproduce 100% for me, it’s worth keeping an eye on it. In my case it’s roughly one in three attempts.
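
Since it only happens in roughly one of three attempts, a quick way to compare notes after each resume is to filter the iwlwifi lines and drop the “WRT: Failed to set DRAM buffer” messages that show up on every boot and resume, leaving things like the ECHO_CMD timeout that precedes the stack trace. A minimal sketch with made-up function names; whether the WRT messages are harmless isn’t established here, they’re only filtered out to reduce noise.

```shell
#!/bin/sh
# Read dmesg-style text on stdin and print iwlwifi lines, skipping the
# "WRT: Failed to set DRAM buffer" messages that appear on every resume.
iwlwifi_unusual() {
    grep -i 'iwlwifi' | grep -v 'WRT: Failed to set DRAM buffer'
}

# Count ECHO_CMD timeouts (the message that precedes the stack trace).
echo_cmd_timeouts() {
    grep -c 'Error sending ECHO_CMD'
}

# Usage on a live system (dmesg usually needs root):
#   sudo dmesg | iwlwifi_unusual
#   sudo dmesg | echo_cmd_timeouts
```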

In an ideal world, the company itself would submit patches/updates upstream for its own hardware. My guess, though I may be wrong, is that in this case the regression is due to Intel rather than Framework, which could be checked by replacing the wi-fi card with something else (possibly non-Intel) and seeing how it goes.
As for the USB error logs, it might just as well be a hardware issue (possibly in the expansion card), but I hope I’m wrong.

The plot sickens. Lately, it seems Plasma Shell is unable to put the laptop to sleep in either s2idle or deep mode. Suspending via a systemctl call works, but clicking “Sleep” in the GUI, closing the lid, etc. does not.

Hibernate works from GUI and from command line as expected.

Anybody else seeing similar?

This is 5.16.8 on an Arch derivative, not Fedora, but still…