[RESPONDED] WiFi driver failing randomly [iwlwifi] (12th Gen, Fedora 37)

Whilst swapping out the SSD (for another project) inside the Framework I did ensure the antennas and WiFi card were snug and secure, but I have not yet fully disconnected and reconnected them. The cables on the right hinge seem to be under quite a lot of strain and I’m wondering if the antenna goes through the hinge, and if so, if it could have become damaged.

I applied your powersave suggestion a few messages ago. If I have missed something else, please let me know.

However I’m not sure how a damaged antenna could generate the errors you got in dmesg… I would rather think it is the wifi card itself…

My WiFi has been mostly working over the last few months, however there have been instances where the wifi adapter has failed and not been displayed in the UI. I didn’t properly log these occurrences so cannot remember if they were all boot time, waking from suspend, mid-usage, or a mix of those.

I reseated the WiFi chip and re-routed its antenna cables a few weeks ago, and it has worked almost flawlessly since then.

I’m currently running kernel 6.4.4-200.fc38.x86_64 with the only recent change to kernel args being the addition of tpm_tis.interrupts=0 and the removal of rhgb quiet.

WiFi worked on my first one or two boots of this kernel. On the third and subsequent boots, I either saw errors during bootup, or saw no errors and the adapter was found, but a few seconds after login the system froze for a few seconds and the adapter then disappeared.

I used to be able to boot into kernel 6.4.4 following the errors, but now it gets stuck there. I can boot with kernel 6.3.12, but the wifi adapter is not found.

The wifi adapter is also absent when booting a Fedora live USB on kernel version 6.0.7.

Here is some dmesg output from the live USB:

[   12.959588] Bluetooth: hci0: Firmware Version: 129-28.22
[   13.021044] Intel(R) Wireless WiFi driver for Linux
[   13.021108] iwlwifi 0000:a8:00.0: enabling device (0000 -> 0002)
[   13.021270] iwlwifi 0000:a8:00.0: HW_REV=0xFFFFFFFF, PCI issues?
[   13.021279] iwlwifi: probe of 0000:a8:00.0 failed with error -5
[   13.815111] [drm] Initialized i915 1.6.0 20201103 for 0000:00:02.0 on minor 1
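
A note on the `HW_REV=0xFFFFFFFF` line: a PCI read that gets no answer from the device comes back as all ones, so this value suggests the card was not responding on the bus at that moment. Below is a minimal sketch for pulling such lines out of a kernel log, assuming the log text is piped in on stdin. `all_ones_reads` is a hypothetical helper name, not part of any existing tool, and the two register names are simply the ones that appear in this thread's logs.

```shell
# all_ones_reads: hypothetical helper. Reads kernel log text on stdin and
# prints lines where an iwlwifi register read came back as all ones.
all_ones_reads() {
    # -i: the driver prints the value in both upper and lower case.
    grep -iE '(HW_REV=|CSR_GP_CNTRL )0xffffffff'
}

# Example (needs permission to read the kernel log):
#   sudo dmesg | all_ones_reads
```

If this prints anything, the failure is more likely at the card or slot level than in the driver configuration, though that remains an inference and not a diagnosis.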

After rebooting out of the live USB (into my internal drive) on kernel 6.4.4, the wifi adapter is found and working. EDIT 15 minutes later: there was a system freeze. I managed to switch to another tty session and back after about 30 seconds, but wifi was no longer working. Another freeze occurred when I attempted to toggle wifi off and on again. I managed to switch to another tty, which printed loads of stack traces, and I took a picture of what was on the screen.

After rebooting, the wifi adapter UI looked good, but freezes were occurring and wifi was not working. Here’s dmesg:

[  195.230687] WARNING: CPU: 2 PID: 155 at drivers/net/wireless/intel/iwlwifi/mvm/../iwl-trans.h:1383 iwl_mvm_wait_sta_queues_empty+0x92/0xc0 [iwlmvm]
...
[  195.231317]  ret_from_fork+0x29/0x50
[  195.231319]  </TASK>
[  195.231320] ---[ end trace 0000000000000000 ]---
[  195.231321] iwlwifi 0000:a8:00.0: iwl_trans_wait_txq_empty bad state = 0
[  195.231331] wlp168s0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-5)
[  195.231360] wlp168s0: failed to remove key (4, ff:ff:ff:ff:ff:ff) from hardware (-5)
[  195.243497] ------------[ cut here ]------------
[  195.243509] WARNING: CPU: 2 PID: 155 at net/mac80211/driver-ops.c:39 drv_stop+0xf5/0x100 [mac80211]
[  195.243665] Modules linked in: tun dummy uinput rfcomm...
...
[  195.243896] CPU: 2 PID: 155 Comm: kworker/2:2 Tainted: G        W    L     6.4.4-200.fc38.x86_64 #1
[  195.243901] Hardware name: Framework Laptop (12th Gen Intel Core)/FRANMACP06, BIOS 03.05 08/23/2022
[  195.243907] Workqueue: events_freezable ieee80211_restart_work [mac80211]
[  195.243928] RIP: 0010:drv_stop+0xf5/0x100 [mac80211]
...
[  195.565058] usb 3-9: reset full-speed USB device number 3 using xhci_hcd
[  195.813175] usb 3-9: reset full-speed USB device number 3 using xhci_hcd

After rebooting again (same latest kernel), the wifi option has simply disappeared from the UI again: no freezes, but also no wifi.

I then booted a known-good Windows drive which has had working wifi with this machine in the past, but this time wifi did not work there either.

I then reseated the wifi card and booted linux, and wifi worked.

It would seem the issue is unrelated to the kernel version/args, and could be an intermittent hardware failure that is resolved by either leaving the system alone long enough to fully power down, or by disconnecting and reconnecting the wifi chip.

I see “HW failure” in some of the logs. Does my Framework simply have a faulty WiFi chip? I have been experiencing these issues intermittently for months.

Please advise.

This could well be a faulty card. If it continues, do open a ticket and link to this thread for context on the troubleshooting you’ve already done.

Framework Support shipped me a replacement WiFi card which I installed and have been using for about a day. :tada:

Due to the intermittent nature of my issue, only time will tell if it has been resolved, although I’m optimistic.

However, a few minutes ago my laptop unexpectedly went into suspend whilst I was typing away (60% battery, charging). I noticed iwlwifi in the system journal, so I’ll share the logs below for documentation purposes, although I currently don’t think much of it.

16:20:26 h kernel: Filesystems sync: 0.032 seconds
16:20:26 h wireplumber[2664]: 0x55ebd40556a8: error 24
16:20:35 h kernel: Freezing user space processes
16:20:35 h kernel: Freezing user space processes completed (elapsed 0.003 seconds)
16:20:35 h kernel: OOM killer disabled.
16:20:35 h kernel: Freezing remaining freezable tasks
16:20:35 h kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
16:20:35 h kernel: printk: Suspending console(s) (use no_console_suspend to debug)
16:20:35 h kernel: PM: suspend devices took 0.299 seconds
16:20:35 h kernel: ACPI: EC: interrupt blocked
16:20:35 h kernel: ACPI: EC: interrupt unblocked
16:20:35 h kernel: iwlwifi 0000:a6:00.0: WRT: Invalid buffer destination
16:20:35 h kernel: i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin version 70.5.1
16:20:35 h kernel: i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin version 7.9.3
16:20:35 h kernel: i915 0000:00:02.0: [drm] GT0: HuC: authenticated!
16:20:35 h kernel: i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
16:20:35 h kernel: i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
16:20:35 h kernel: i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
16:20:35 h kernel: iwlwifi 0000:a6:00.0: WFPM_UMAC_PD_NOTIFICATION: 0x1f
16:20:35 h kernel: iwlwifi 0000:a6:00.0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
16:20:35 h kernel: iwlwifi 0000:a6:00.0: WFPM_AUTH_KEY_0: 0x80
16:20:35 h kernel: iwlwifi 0000:a6:00.0: CNVI_SCU_SEQ_DATA_DW9: 0x0
16:20:35 h kernel: PM: resume devices took 0.301 seconds
16:20:35 h kernel: OOM killer enabled.
16:20:35 h kernel: Restarting tasks ... 
16:20:35 h kernel: mei_hdcp 0000:00:16.0-x-x-x-uuid-redacted: bound 0000:00:02.0 (ops i915_hdcp_ops [i915])
16:20:35 h systemd-resolved[1296]: Clock change detected. Flushing caches

Intel cards can be quite verbose, which is fine. So these events may have indeed happened, but unless you’re experiencing something specific, I would not be concerned.

The unexpected suspend looks like OS behavior to me, and something is off there. I’d grep through your logs to see if you can spot the trigger.
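
A rough sketch of that log search, assuming a systemd journal and assuming the kernel marks each suspend with a line containing `PM: suspend entry` (the excerpt above starts just after that point, so treat the pattern as something to adjust). `before_suspend` is a hypothetical helper name. It prints the lines logged just before each suspend, which is where a trigger such as a lid-switch or idle-timeout message would be expected to show up.

```shell
# before_suspend: hypothetical helper. Reads journal text on stdin and prints
# each suspend marker together with the N lines logged before it (default 15).
before_suspend() {
    # The pattern is an assumption; change it to match your own journal.
    grep -B "${1:-15}" 'PM: suspend entry'
}

# Example: inspect the current boot, 20 lines of context per suspend.
#   journalctl -b --no-pager | before_suspend 20
```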

…or magnets. :thinking:

Dino

Solid point right there. ^^

My desktop PC (not a Framework) started acting strange last year. The SSD drive or the DVD drive would randomly disappear while the system was running. Other peculiar things would happen. Windows would crash. After trying a lot of things I discovered the CMOS battery was failing. I replaced the battery and all the strange occurrences stopped.

Hugh

Unfortunately, whilst booting my Framework today, iwlwifi errors were displayed before the desktop environment loaded, and my system no longer detected the WiFi card.

Full dmesg log: MicroBin

dmesg sample:

[   18.462905] iwlwifi 0000:a6:00.0: Detected Intel(R) Wi-Fi 6 AX210 160MHz, REV=0x420
[   18.464510] thermal thermal_zone13: failed to read out thermal zone (-61)
[   18.473028] iwlwifi 0000:a6:00.0: WRT: Invalid buffer destination
[   18.806448] RPC: Registered named UNIX socket transport module.
[   18.807826] RPC: Registered udp transport module.
[   18.809110] RPC: Registered tcp transport module.
[   18.810321] RPC: Registered tcp-with-tls transport module.
[   18.811527] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   19.523467] ------------[ cut here ]------------
[   19.524377] Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
[   19.525311] WARNING: CPU: 6 PID: 1231 at drivers/net/wireless/intel/iwlwifi/pcie/trans.c:2190 __iwl_trans_pcie_grab_nic_access+0x14a/0x150 [iwlwifi]
...
[   19.942792] WARNING: CPU: 6 PID: 1231 at drivers/net/wireless/intel/iwlwifi/iwl-trans.h:1493 iwl_fwrt_dump_lmac_error_log+0x50c/0x600 [iwlwifi]
...
[   19.948884] CPU: 6 PID: 1231 Comm: modprobe Tainted: G        W          6.5.5-200.fc38.x86_64 #1
[   19.949642] Hardware name: Framework Laptop (12th Gen Intel Core)/FRANMACP06, BIOS 03.06 11/10/2022
...
[   19.950383] RIP: 0010:iwl_fwrt_dump_lmac_error_log+0x50c/0x600 [iwlwifi]
...
[   19.993215] iwlwifi 0000:a6:00.0: HW error, resetting before reading

However, rebooting the laptop resolved the issue (I’m posting this from the Framework using WiFi). Re-seating the WiFi card was not necessary.

Very interesting, I’ll explore replacing my CMOS battery. How did you discover your CMOS battery was failing?

Hi @groundwork

This is with the new card, right? Does disabling and re-enabling the interface make a difference?

When I was having my Windows problems I did a lot of searches on the symptoms, Windows crash codes, etc. I found a message in an online forum, I forget which one, from a user with similar problems who said replacing the CMOS battery fixed his problems. I had tried everything else so I bought a new battery and inserted it, cleared the CMOS, loaded the defaults and like magic, all my problems went away.

Hugh

Yes, the replacement card which Framework support sent me has been installed for over a week and used daily without issue until this point.

Do you mean removing/adding the wifi modules from the kernel with modprobe?

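For example:

# Remove modules (iwlmvm first, because it depends on iwlwifi)
sudo modprobe -r iwlmvm
sudo modprobe -r iwlwifi

# Add modules (iwlwifi first, then the iwlmvm module that depends on it)
sudo modprobe iwlwifi
sudo modprobe iwlmvm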

If so, I can try this the next time iwlwifi fails. However, with my old WiFi card this didn’t fix the issue and a system reboot or WiFi card re-seat was necessary to get WiFi back.
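
To have this ready for the next failure, the reload could be wrapped in a small function. This is only a sketch, assuming `iwlmvm` is the only module holding `iwlwifi`, which is why it is removed first. `reload_wifi` and its dry-run argument are hypothetical, not an existing tool.

```shell
# reload_wifi: hypothetical helper. Runs each step through "$1" (default: sudo),
# so `reload_wifi echo` prints the commands instead of executing them.
reload_wifi() {
    run="${1:-sudo}"
    # Unload the dependent module first; stop at the first failing step.
    "$run" modprobe -r iwlmvm &&
    "$run" modprobe -r iwlwifi &&
    "$run" modprobe iwlwifi &&
    "$run" modprobe iwlmvm
}

# Dry run:   reload_wifi echo
# Real run:  reload_wifi && sudo dmesg | tail -n 20
```

Checking the tail of dmesg afterwards shows whether the driver probed the card again or hit the same hardware error.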

Thank you for the instructions
