[RESPONDED] Iwlwifi failure on latest linux-oem-22.04c (6.5.0-1019-oem)

I’ve been happily running Linux Mint Cinnamon 21 with linux-oem-22.04c kernels for more than a year. Until this morning, 6.1.0-1036-oem was the latest in this series.

This morning a new kernel in the series popped up in the Update Manager: 6.5.0-1019-oem. I installed it. After rebooting I had no wifi. I rummaged around in the kern.log:

Apr  9 05:24:30 canephora kernel: [    0.000000] Linux version 6.5.0-1019-oem (buildd@bos03-amd64-049) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #20-Ubuntu SMP PREEMPT_DYNAMIC Mon Mar 18 17:38:55 UTC 2024 (Ubuntu 6.5.0-1019.20-oem 6.5.13)
Apr  9 05:24:30 canephora kernel: [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.5.0-1019-oem root=/dev/mapper/sysvg-root ro quiet splash mem_sleep_default=deep tpm_tis.interrupts=0 usb-storage.quirks=059f:105e:u vt.handoff=7
...
Apr  9 05:24:30 canephora kernel: [    8.636175] Intel(R) Wireless WiFi driver for Linux
Apr  9 05:24:30 canephora kernel: [    8.636403] iwlwifi 0000:a6:00.0: enabling device (0000 -> 0002)
Apr  9 05:24:30 canephora kernel: [    8.642530] iwlwifi 0000:a6:00.0: Detected crf-id 0x400410, cnv-id 0x400410 wfpm id 0x80000000
Apr  9 05:24:30 canephora kernel: [    8.642586] iwlwifi 0000:a6:00.0: PCI dev 2725/0024, rev=0x420, rfid=0x10d000
Apr  9 05:24:30 canephora kernel: [    8.648499] iwlwifi 0000:a6:00.0: api flags index 2 larger than supported by driver
Apr  9 05:24:30 canephora kernel: [    8.648515] iwlwifi 0000:a6:00.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 0.0.2.41
Apr  9 05:24:30 canephora kernel: [    8.649217] iwlwifi 0000:a6:00.0: loaded firmware version 83.e8f84e98.0 ty-a0-gf-a0-83.ucode op_mode iwlmvm
Apr  9 05:24:30 canephora kernel: [    8.799424] iwlwifi 0000:a6:00.0: Detected Intel(R) Wi-Fi 6 AX210 160MHz, REV=0x420
Apr  9 05:24:30 canephora kernel: [    8.808505] iwlwifi 0000:a6:00.0: WRT: Invalid buffer destination
...
Apr  9 05:24:30 canephora kernel: [    9.874243] ------------[ cut here ]------------
Apr  9 05:24:30 canephora kernel: [    9.874248] Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
Apr  9 05:24:30 canephora kernel: [    9.874286] WARNING: CPU: 13 PID: 810 at drivers/net/wireless/intel/iwlwifi/pcie/trans.c:2195 __iwl_trans_pcie_grab_nic_access+0x192/0x1a0 [iwlwifi]
Apr  9 05:24:30 canephora kernel: [    9.874320] Modules linked in: btusb btrtl uvcvideo btbcm videobuf2_vmalloc btintel uvc btmtk videobuf2_memops videobuf2_v4l2 bluetooth videodev videobuf2_common ecdh_generic mc ecc snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils intel_uncore_frequency snd_soc_hdac_hda snd_hda_codec_hdmi intel_uncore_frequency_common snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation snd_hda_codec_idt soundwire_bus snd_hda_codec_generic ledtrig_audio snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg x86_pkg_temp_thermal snd_intel_sdw_acpi snd_hda_codec intel_powerclamp iwlmvm(+) snd_hda_core coretemp snd_hwdep snd_pcm snd_seq_midi kvm_intel mac80211 snd_seq_midi_event libarc4 snd_rawmidi kvm snd_seq processor_thermal_device_pci pmt_telemetry snd_seq_device cmdlinepart iwlwifi irqbypass mei_pxp mei_hdcp processor_thermal_device pmt_class
Apr  9 05:24:30 canephora kernel: [    9.875020] iwlwifi 0000:a6:00.0: iwlwifi transaction failed, dumping registers

I rebooted the previous (6.1.0-1036-oem) kernel. Everything was peachy. The kern.log told a significantly different story:

Apr  9 05:30:24 canephora kernel: [    0.000000] Linux version 6.1.0-1036-oem (buildd@lcy02-amd64-078) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #36-Ubuntu SMP PREEMPT_DYNAMIC Mon Mar 11 17:32:20 UTC 2024
Apr  9 05:30:24 canephora kernel: [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.1.0-1036-oem root=/dev/mapper/sysvg-root ro quiet splash mem_sleep_default=deep tpm_tis.interrupts=0 usb-storage.quirks=059f:105e:u
...
Apr  9 05:30:24 canephora kernel: [    9.032479] Intel(R) Wireless WiFi driver for Linux
Apr  9 05:30:24 canephora kernel: [    9.032703] iwlwifi 0000:a6:00.0: enabling device (0000 -> 0002)
Apr  9 05:30:24 canephora kernel: [    9.041424] iwlwifi 0000:a6:00.0: api flags index 2 larger than supported by driver
Apr  9 05:30:24 canephora kernel: [    9.041446] iwlwifi 0000:a6:00.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 0.0.2.36
Apr  9 05:30:24 canephora kernel: [    9.042090] iwlwifi 0000:a6:00.0: loaded firmware version 72.a764baac.0 ty-a0-gf-a0-72.ucode op_mode iwlmvm
Apr  9 05:30:24 canephora kernel: [    9.137441] iwlwifi 0000:a6:00.0: Detected Intel(R) Wi-Fi 6 AX210 160MHz, REV=0x420
Apr  9 05:30:24 canephora kernel: [    9.294698] iwlwifi 0000:a6:00.0: loaded PNVM version e28bb9d7
Apr  9 05:30:24 canephora kernel: [    9.306366] iwlwifi 0000:a6:00.0: Detected RF GF, rfid=0x10d000
Apr  9 05:30:24 canephora kernel: [    9.377707] iwlwifi 0000:a6:00.0: base HW address: bc:09:1b:f3:47:ba
Apr  9 05:30:24 canephora kernel: [    9.420141] iwlwifi 0000:a6:00.0 wlp166s0: renamed from wlan0
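
The difference between the two boots is easier to see when the relevant fields are pulled out side by side. The following is a minimal Python sketch, not something that was run as part of this report. The function name `summarize_boot` and the regular expressions are assumptions derived only from the log lines quoted above, and the sample kernel lines are trimmed after the release string.

```python
import re

# Patterns are assumptions based on the kern.log lines quoted in this post.
KERNEL_RE = re.compile(r"Linux version (\S+)")
FIRMWARE_RE = re.compile(r"iwlwifi \S+: loaded firmware version (\S+) (\S+)")
PNVM_RE = re.compile(r"iwlwifi \S+: loaded PNVM version (\S+)")


def summarize_boot(lines):
    """Collect kernel release, iwlwifi firmware and PNVM version from one boot's log lines."""
    summary = {"kernel": None, "firmware": None, "ucode": None, "pnvm": None}
    for line in lines:
        if (m := KERNEL_RE.search(line)):
            summary["kernel"] = m.group(1)
        elif (m := FIRMWARE_RE.search(line)):
            summary["firmware"], summary["ucode"] = m.group(1), m.group(2)
        elif (m := PNVM_RE.search(line)):
            summary["pnvm"] = m.group(1)
    return summary


# Sample lines copied from the two boots above (kernel lines trimmed).
failing_boot = [
    "Apr  9 05:24:30 canephora kernel: [    0.000000] Linux version 6.5.0-1019-oem",
    "Apr  9 05:24:30 canephora kernel: [    8.649217] iwlwifi 0000:a6:00.0: loaded firmware version 83.e8f84e98.0 ty-a0-gf-a0-83.ucode op_mode iwlmvm",
]
working_boot = [
    "Apr  9 05:30:24 canephora kernel: [    0.000000] Linux version 6.1.0-1036-oem",
    "Apr  9 05:30:24 canephora kernel: [    9.042090] iwlwifi 0000:a6:00.0: loaded firmware version 72.a764baac.0 ty-a0-gf-a0-72.ucode op_mode iwlmvm",
    "Apr  9 05:30:24 canephora kernel: [    9.294698] iwlwifi 0000:a6:00.0: loaded PNVM version e28bb9d7",
]

for boot in (failing_boot, working_boot):
    print(summarize_boot(boot))
```

A `pnvm` value of `None` means no "loaded PNVM version" line was seen for that boot, which matches the failing 6.5 boot quoted above. When fed a whole kern.log that spans several boots, the result reflects only the last values seen, so the input should be limited to one boot.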

Not sure where to go from here. I prefer a stable, fully functioning system to having a kernel with the highest version number. I have no insight as to why 22.04c changed from 6.1 to 6.5. It appears that 22.04c reached EOL on 1 April. It has been automagically made synonymous with 22.04d.

I don’t have much time or patience for faffing around. My FW13 is a tool, not a toy. Any insights or suggestions consistent with this will be appreciated.

$ sudo inxi
CPU: 12-core (4-mt/8-st) 12th Gen Intel Core i7-1260P (-MST AMCP-)
speed/min/max: 400/400/4700:3400 MHz Kernel: 6.1.0-1036-oem x86_64 Up: 49m
Mem: 3123.5/31800.5 MiB (9.8%) Storage: 1.83 TiB (28.0% used) Procs: 403
Shell: Sudo inxi: 3.3.13
$ sudo inxi -N
Network:
  Device-1: Intel Wi-Fi 6 AX210/AX211/AX411 160MHz driver: iwlwifi

Thanks in advance.

Dino

edit: updated the reason for the change from 6.1 to 6.5.

Yes, OEM C has been updated to match OEM D.

We will be testing this against Ubuntu 22.04. We do not test against Mint at all, as it is not among the distros we officially test against and support. But this may be a general kernel regression, so we will test against 22.04 just in case.

@Loell_Framework can you try to repro with Ubuntu 22.04.3, to see whether this is a larger regression or not?

Loell, if you can repro, let me know so we can file a bug report.

No wifi drops on my end; the connection is still stable with kernel 6.5.0-1019-oem.

nmcli con show --active
NAME                 UUID                                  TYPE    DEVICE
PLDTHOMEFIBR5GSw8Z9  6e48ded7-8871-4037-ba3c-c863f0e9f120  wifi    wlp166s0
br-98a0c18c4a4f      e4d170b9-4802-4496-9415-7c174d3d2029  bridge  br-98a0c18c4a4f
docker0              871c7e61-10b4-4b3e-85ae-4a311bf84b5d  bridge  docker0
loell@loell-Laptop-12th-Gen-Intel-Core:~$ uname -r
6.5.0-1019-oem

Thanks @Loell_Framework (and @Matt_Hartley). I think whatever caused the failure on my FW13 is not as clear-cut as Loell’s post suggests.

My own research (see links below) has returned several diffuse and inconclusive reports of similar failures associated with 6.5 kernels. One of the theories advanced was that the cause lay within specific versions of the Intel iwlwifi driver. A contributor speculated that the newer driver loaded when the kernel transitioned from 6.1 to 6.5 left the adapter in an invalid state, which was resolved by a subsequent boot into the same 6.5 kernel.

This seemed like a simple experiment to conduct and I can confirm that after two successive boots of 6.5.0-1019-oem the wifi NIC started working and has continued to do so.

Examination of kernel logs from the last couple of weeks shows that while the NIC appears to work (at the macro level), 6.5 logs many errors and throws exceptions that were not present while running 6.1. This suggests that, while the wifi subsystem is resilient enough to recover from these errors, performance may be compromised at the micro level.

Here’s an example of a couple of the recurrent messages in the kernel log that were not present while running 6.1. I had to obfuscate the name of the log file because the slashes in the path kept breaking the Discourse composer. The actual files are in /var/log and are named kern.log*.

$ grep -E 'CPU: [0-9]+ PID: [0-9]+ Comm: irq/[0-9]+-iwlwifi Tainted|iwlwifi \S+: WRT: Invalid buffer destination' kernel-log{,.1}
kernel-log:Apr  9 05:24:30 canephora kernel: [    8.808505] iwlwifi 0000:a6:00.0: WRT: Invalid buffer destination
...
kernel-log:Apr  9 16:35:36 canephora kernel: [   15.886053] iwlwifi 0000:a6:00.0: WRT: Invalid buffer destination
kernel-log:Apr  9 17:25:37 canephora kernel: [ 3017.700038] CPU: 9 PID: 770 Comm: irq/185-iwlwifi Tainted: P           O       6.5.0-1019-oem #20-Ubuntu
...
kernel-log:Apr  9 17:29:33 canephora kernel: [ 3253.801451] CPU: 6 PID: 769 Comm: irq/184-iwlwifi Tainted: P           O       6.5.0-1019-oem #20-Ubuntu
...
kernel-log:Apr  9 17:30:25 canephora kernel: [ 3305.667686] CPU: 5 PID: 777 Comm: irq/192-iwlwifi Tainted: P           O       6.5.0-1019-oem #20-Ubuntu
...
kernel-log:Apr 10 05:36:19 canephora kernel: [10984.748045] iwlwifi 0000:a6:00.0: WRT: Invalid buffer destination
kernel-log:Apr 10 06:19:34 canephora kernel: [11922.757816] iwlwifi 0000:a6:00.0: WRT: Invalid buffer destination
kernel-log:Apr 10 10:29:36 canephora kernel: [21950.149090] iwlwifi 0000:a6:00.0: WRT: Invalid buffer destination
kernel-log:Apr 11 06:39:01 canephora kernel: [54512.966938] iwlwifi 0000:a6:00.0: WRT: Invalid buffer destination
$ who -b
         system boot  2024-04-09 16:35

Those messages are “signatures” that are accompanied by a burst of related messages that imply departure from expected behaviour and evasive action being taken. NOTE: I have not inspected the source. The messages might be unimportant.
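
To see how often each signature recurs, the grep above can be turned into a per-day tally. This is a minimal Python sketch under stated assumptions: the two patterns are the ones from the grep command, while the function name `tally_signatures`, the labels `irq_trace` and `wrt_invalid`, and the timestamp regex are my own and only match the syslog-style lines shown in this thread.

```python
import re
from collections import Counter

# The two signatures from the grep command above; the labels are hypothetical.
SIGNATURES = {
    "irq_trace": re.compile(r"CPU: \d+ PID: \d+ Comm: irq/\d+-iwlwifi Tainted"),
    "wrt_invalid": re.compile(r"iwlwifi \S+: WRT: Invalid buffer destination"),
}
# Syslog-style stamp such as "Apr  9 05:24:30" (an optional grep file prefix is ignored).
STAMP_RE = re.compile(r"([A-Z][a-z]{2})\s+(\d{1,2})\s+\d{2}:\d{2}:\d{2}")


def tally_signatures(lines):
    """Count signature messages, keyed by (day, signature label)."""
    counts = Counter()
    for line in lines:
        stamp = STAMP_RE.search(line)
        if not stamp:
            continue  # skip lines without a recognisable timestamp
        day = f"{stamp.group(1)} {int(stamp.group(2))}"
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[(day, label)] += 1
    return counts


# Sample lines copied from the grep output above.
sample = [
    "kernel-log:Apr  9 05:24:30 canephora kernel: [    8.808505] iwlwifi 0000:a6:00.0: WRT: Invalid buffer destination",
    "kernel-log:Apr  9 17:25:37 canephora kernel: [ 3017.700038] CPU: 9 PID: 770 Comm: irq/185-iwlwifi Tainted: P           O       6.5.0-1019-oem #20-Ubuntu",
    "kernel-log:Apr 10 05:36:19 canephora kernel: [10984.748045] iwlwifi 0000:a6:00.0: WRT: Invalid buffer destination",
    "kernel-log:Apr 10 06:19:34 canephora kernel: [11922.757816] iwlwifi 0000:a6:00.0: WRT: Invalid buffer destination",
]

for (day, label), count in sorted(tally_signatures(sample).items()):
    print(day, label, count)
```

On a real system the same function accepts an open file object for one of the kern.log files in /var/log. The counts only show frequency; as noted above, they say nothing about whether the messages matter.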

I’ve documented this for completeness in case other community members encounter similar problems. I don’t expect Framework to expend resources debugging it further (although if you have any influence over the Intel devs perhaps you could encourage them to look at it :wink:).

I’ll come back and update this thread if the situation changes.

Dino

References
launchpad: https://bugs.launchpad.net/ubuntu/+source/linux-meta-hwe-6.5/+bug/2049195
kernel dot org: https://lore.kernel.org/lkml/c64ce498-7c06-3726-47d5-0a74471f027b@gmail.com/
intel: https://community.intel.com/t5/Wireless/Ubuntu-23-10-AX210-Microcode-SW-error-detected-Restarting-0x0/td-p/1552243
arch linux (including a Framework machine): https://bbs.archlinux.org/viewtopic.php?id=288765

The search terms I used to surface these and many others: "iwlwifi" "WRT: Invalid buffer destination" "AX210"

While fixing a suspend / hibernation issue, I also saw an iwlwifi "WRT: Invalid buffer destination" message in dmesg on every boot and on every wake-up from suspend or hibernation. My wifi works fine. The problem with suspend and hibernate turned out to be the USB adapters. I removed them, shut down, and waited three hours (only because a friend happened to drop by), then cleaned and reinserted the adapters. The over-current condition disappeared, and suspend and hibernate work again. iwlwifi keeps logging the error.

It’s very confusing.

Perhaps it’s just noise? The problem is elsewhere?