[TRACKING] 12 gen Fedora 38 crashed on usb disconnect

Delighted to hear you have this sorted with 6.3.13 - this is not something I’ve experienced on the other kernels myself, but the important thing is 6.3.13 has not been an issue yet.

If it crops up again, please do share what device specifically seems to set it off. In the meantime, I’ll mark this as resolved.

Just wanted to comment that this is not just a Framework issue, nor just a Fedora 38 issue. I had the same thing happen (very similar call stack) with Fedora 37 on a ThinkPad T14s.
It would take approximately 24 hours before the bug occurred, but once it did, a hard poweroff was required to recover.

For sure this occurred on 6.4.9-100.fc37, 6.4.10-100.fc37, 6.4.11-100.fc37, and 6.4.12-100.fc37.

I’m going back to 6.3.12-100.fc37 for now.

Appreciate this. I am putting together a list for the Fedora team; in your case, this occurred on Fedora 37 with the kernels listed. Noted and added to my running list.

@Kenny_Keslar As I track this, were you able to see any change using 6.4.13-100.fc37.x86_64?

@Matt_Hartley I’m not sure - I just rebooted to that version to give it a try. When it happened in the past it took about a day to occur. I’ll try to get an update tomorrow - if I don’t reply by Wednesday night, feel free to give me a poke. (I’m going on vacation early Wednesday morning, so tomorrow night might be a bit hectic.)

I appreciate you testing this. I have not been able to repro here, but I am on the later kernel mentioned.

Have an amazing vacation!

Hi,
I’m not a Framework user, but I wanted to report that I have the exact same crash when my USB hub switches away from this machine.
Not sure when it started, but fairly recently; I’m on 6.4.12 on Arch Linux now (6.4.9 previously, and I think I had a crash there as well).
Unfortunately I cannot upgrade to the latest 6.4, since the linux package has already moved on to 6.5 (which I cannot use yet since it is not compatible with ZFS).

Sep 15 13:19:59 johan-amd kernel: BUG: unable to handle page fault for address: ffffafe963344a80
Sep 15 13:19:59 johan-amd kernel: #PF: supervisor write access in kernel mode
Sep 15 13:19:59 johan-amd kernel: #PF: error_code(0x0002) - not-present page
Sep 15 13:19:59 johan-amd kernel: PGD 100000067 P4D 100000067 PUD 1001e7067 PMD 11b23e067 PTE 0
Sep 15 13:19:59 johan-amd kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Sep 15 13:19:59 johan-amd kernel: CPU: 0 PID: 2681961 Comm: kworker/0:2 Tainted: P           OE      6.4.12-arch1-1 #1 3e6fa2753a2d75925c34ecb78e22e85a65d083df
Sep 15 13:19:59 johan-amd kernel: Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 4802 06/15/2023
Sep 15 13:19:59 johan-amd kernel: Workqueue: usb_hub_wq hub_event
Sep 15 13:19:59 johan-amd kernel: RIP: 0010:power_supply_uevent+0xee/0x1d0
Sep 15 13:19:59 johan-amd kernel: Code: 75 4e 48 8b 13 48 83 7a 28 00 74 75 45 31 ff 31 c0 eb 10 48 8b 13 41 83 c7 01 49 63 c7 48 3b 42 28 73 5e 48 8b 52 20 8>
Sep 15 13:19:59 johan-amd kernel: RSP: 0018:ffffafe9452337b8 EFLAGS: 00010297
Sep 15 13:19:59 johan-amd kernel: RAX: 0000000000000003 RBX: ffff989f96299800 RCX: ffff98a148934000
Sep 15 13:19:59 johan-amd kernel: RDX: 00000000f0889610 RSI: 00000000c6808203 RDI: ffff989f96299800
Sep 15 13:19:59 johan-amd kernel: RBP: ffff989f96299838 R08: 0000000000000007 R09: ffff989f1aeaa308
Sep 15 13:19:59 johan-amd kernel: R10: ffffffffffffffff R11: 0000000000000000 R12: ffff989e1aeaa000
Sep 15 13:19:59 johan-amd kernel: R13: 0000000000000000 R14: ffff98a148934000 R15: 0000000000000003
Sep 15 13:19:59 johan-amd kernel: FS:  0000000000000000(0000) GS:ffff98acaea00000(0000) knlGS:0000000000000000
Sep 15 13:19:59 johan-amd kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 15 13:19:59 johan-amd kernel: CR2: ffffafe963344a80 CR3: 000000016789c000 CR4: 0000000000750ef0
Sep 15 13:19:59 johan-amd kernel: PKRU: 55555554
Sep 15 13:19:59 johan-amd kernel: Call Trace:
Sep 15 13:19:59 johan-amd kernel:  <TASK>
Sep 15 13:19:59 johan-amd kernel:  ? __die+0x23/0x70
Sep 15 13:19:59 johan-amd kernel:  ? page_fault_oops+0x171/0x4e0
Sep 15 13:19:59 johan-amd kernel:  ? srso_alias_return_thunk+0x5/0x7f
Sep 15 13:19:59 johan-amd kernel:  ? exc_page_fault+0x175/0x180
Sep 15 13:19:59 johan-amd kernel:  ? asm_exc_page_fault+0x26/0x30
Sep 15 13:19:59 johan-amd kernel:  ? power_supply_uevent+0xee/0x1d0
Sep 15 13:19:59 johan-amd kernel:  ? power_supply_uevent+0x10d/0x1d0
Sep 15 13:19:59 johan-amd kernel:  ? srso_alias_return_thunk+0x5/0x7f
Sep 15 13:19:59 johan-amd kernel:  dev_uevent+0x112/0x2d0
Sep 15 13:19:59 johan-amd kernel:  kobject_uevent_env+0x294/0x680
Sep 15 13:19:59 johan-amd kernel:  power_supply_unregister+0x8e/0xa0
Sep 15 13:19:59 johan-amd kernel:  release_nodes+0x40/0xb0
Sep 15 13:19:59 johan-amd kernel:  devres_release_all+0x8c/0xc0
Sep 15 13:19:59 johan-amd kernel:  device_unbind_cleanup+0xe/0x70
Sep 15 13:19:59 johan-amd kernel:  device_release_driver_internal+0x1cc/0x200
Sep 15 13:19:59 johan-amd kernel:  bus_remove_device+0xc6/0x130
Sep 15 13:19:59 johan-amd kernel:  device_del+0x15c/0x3e0
Sep 15 13:19:59 johan-amd kernel:  ? __queue_work+0x1df/0x440
Sep 15 13:19:59 johan-amd kernel:  hid_destroy_device+0x4b/0x60
Sep 15 13:19:59 johan-amd kernel:  logi_dj_remove+0x9a/0x100 [hid_logitech_dj d43d018d7924207bf124eb09ddb07ddd68a2e21b]
Sep 15 13:19:59 johan-amd kernel:  hid_device_remove+0x47/0x90
Sep 15 13:19:59 johan-amd kernel:  device_release_driver_internal+0x19f/0x200
Sep 15 13:19:59 johan-amd kernel:  bus_remove_device+0xc6/0x130
Sep 15 13:19:59 johan-amd kernel:  device_del+0x15c/0x3e0
Sep 15 13:19:59 johan-amd kernel:  ? __queue_work+0x1df/0x440
Sep 15 13:19:59 johan-amd kernel:  hid_destroy_device+0x4b/0x60
Sep 15 13:19:59 johan-amd kernel:  usbhid_disconnect+0x47/0x60 [usbhid 7dfa265d9e4e418c5b40b99cb0a80cd663a1de29]
Sep 15 13:19:59 johan-amd kernel:  usb_unbind_interface+0x93/0x270
Sep 15 13:19:59 johan-amd kernel:  device_release_driver_internal+0x19f/0x200
Sep 15 13:19:59 johan-amd kernel:  bus_remove_device+0xc6/0x130
Sep 15 13:19:59 johan-amd kernel:  device_del+0x15c/0x3e0
Sep 15 13:19:59 johan-amd kernel:  ? srso_alias_return_thunk+0x5/0x7f
Sep 15 13:19:59 johan-amd kernel:  ? kobject_put+0xa0/0x1d0
Sep 15 13:19:59 johan-amd kernel:  usb_disable_device+0xcd/0x1e0
Sep 15 13:19:59 johan-amd kernel:  usb_disconnect+0xde/0x2c0
Sep 15 13:19:59 johan-amd kernel:  usb_disconnect+0xc3/0x2c0
Sep 15 13:19:59 johan-amd kernel:  hub_event+0xea5/0x1c80
Sep 15 13:19:59 johan-amd kernel:  ? srso_alias_return_thunk+0x5/0x7f
Sep 15 13:19:59 johan-amd kernel:  ? __mod_timer+0x11f/0x370
Sep 15 13:19:59 johan-amd kernel:  process_one_work+0x1c7/0x3d0
Sep 15 13:19:59 johan-amd kernel:  worker_thread+0x51/0x390
Sep 15 13:19:59 johan-amd kernel:  ? __pfx_worker_thread+0x10/0x10
Sep 15 13:19:59 johan-amd kernel:  kthread+0xe8/0x120
Sep 15 13:19:59 johan-amd kernel:  ? __pfx_kthread+0x10/0x10
Sep 15 13:19:59 johan-amd kernel:  ret_from_fork+0x2c/0x50
Sep 15 13:19:59 johan-amd kernel:  </TASK>
Sep 15 13:19:59 johan-amd kernel: Modules linked in: exfat vhost_net vhost vhost_iotlb tap tun xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_ne>
Sep 15 13:19:59 johan-amd kernel:  drm_display_helper platform_profile crypto_simd rfkill asus_ec_sensors mxm_wmi wmi_bmof mousedev mc snd_timer igb cryptd ce>
Sep 15 13:19:59 johan-amd kernel: CR2: ffffafe963344a80
Sep 15 13:19:59 johan-amd kernel: ---[ end trace 0000000000000000 ]---
Sep 15 13:19:59 johan-amd kernel: RIP: 0010:power_supply_uevent+0xee/0x1d0
Sep 15 13:19:59 johan-amd kernel: Code: 75 4e 48 8b 13 48 83 7a 28 00 74 75 45 31 ff 31 c0 eb 10 48 8b 13 41 83 c7 01 49 63 c7 48 3b 42 28 73 5e 48 8b 52 20 8>
Sep 15 13:19:59 johan-amd kernel: RSP: 0018:ffffafe9452337b8 EFLAGS: 00010297
Sep 15 13:19:59 johan-amd kernel: RAX: 0000000000000003 RBX: ffff989f96299800 RCX: ffff98a148934000
Sep 15 13:19:59 johan-amd kernel: RDX: 00000000f0889610 RSI: 00000000c6808203 RDI: ffff989f96299800
Sep 15 13:19:59 johan-amd kernel: RBP: ffff989f96299838 R08: 0000000000000007 R09: ffff989f1aeaa308
Sep 15 13:19:59 johan-amd kernel: R10: ffffffffffffffff R11: 0000000000000000 R12: ffff989e1aeaa000
Sep 15 13:19:59 johan-amd kernel: R13: 0000000000000000 R14: ffff98a148934000 R15: 0000000000000003
Sep 15 13:19:59 johan-amd kernel: FS:  0000000000000000(0000) GS:ffff98acaea00000(0000) knlGS:0000000000000000
Sep 15 13:19:59 johan-amd kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 15 13:19:59 johan-amd kernel: CR2: ffffafe963344a80 CR3: 000000016789c000 CR4: 0000000000750ef0
Sep 15 13:19:59 johan-amd kernel: PKRU: 55555554

Appreciate you sharing your experiences with the kernel, even if not on a Framework 13 laptop.

On my Fedora 37 install (I use both 37 and 38), I am running kernel 6.4.13-100.fc37.x86_64 and see no such issues. I am seeing this happening on Arch specifically here in the community, but my bigger concern is seeing this on Fedora - if it’s not happening on Fedora, I am feeling reasonably good overall as Fedora 37/38 are officially supported by us.

Hi Matt - I am still experiencing this issue with a lot of the newer kernels, including 6.4.14-200.fc38 on Fedora 38. My system is now version-locked to 6.3.13-200.fc38.x86_64 and is stable. Something in the later 6.4 kernels does not like my setup. I am happy to provide any info you like. Let me know, and thank you @Matt_Hartley!

Thanks for this. We’re entering 6.4.15 territory now, so it will be worthwhile to track this.


Everyone affected, can you walk me through your setup?

  • Fedora 3x?
  • Kernel in use when this happened?
  • Which expansion cards are in use? USB-A, C, HDMI, etc?
  • Do you use a USB dock and is it connected when this is happening?
  • Software open when this happens?
  • Bat or AC power?

Will report details even if not on Framework:

  • Arch Linux
  • kernel 6.4.12
  • Onboard USB ports on Asus X570 Prime motherboard with AMD Ryzen 5900x. lspci reports as USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
  • Aten US224 USB switch, with a Logitech wireless keyboard & a yubikey
  • A lot of things running, keyboard and yubikey not in active use while switching

The problem does not occur on every switch, only sometimes, and only when switching away from this machine.
The result is a semi-hung machine: existing TCP connections seem to work but new ones cannot be made, and some new processes cannot be started (like sudo) while others seem to work (a new foot terminal). I can close all apps but a graceful reboot never succeeds, so I have to do a hard poweroff.

Is there any known upstream ticket for this?

This is happening due to a bug in the hid_logitech_dj kernel module, which is triggered when a Logitech Unifying receiver (as well as Bolt and other dongles, I believe) is disconnected. It does not happen on every disconnect, but often enough that I hit it regularly.

Upstream bug is here: 217412 – Since kernel 6.3.1 logitech unify receiver not working properly

Just to confirm, if you check the stack traces when this occurs, somewhere higher up in the trace will be a call to logi_dj_remove, which is part of the hid_logitech_dj kernel module. Every stack trace I’ve seen describing this issue, including my own, has had this in the call stack.
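For anyone who wants to script that check, here is a minimal sketch. The function name trace_has_logi_dj_remove is made up for illustration; it only greps whatever kernel log text it is given on stdin for the logi_dj_remove symbol.

```shell
# Hypothetical helper (not part of any package): reads kernel log text on
# stdin and reports whether logi_dj_remove appears anywhere in it.
trace_has_logi_dj_remove() {
    if grep -q 'logi_dj_remove'; then
        echo "logi_dj_remove found in trace"
    else
        echo "logi_dj_remove not found"
    fi
}
```

Assuming your journal is persistent, something like `journalctl -k -b -1 | trace_has_logi_dj_remove` should check the kernel log of the previous boot. A trace that was cut off before the call stack (or never reached disk because of the hang) will of course report "not found".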

I solved this on my systems by blacklisting the hid_logitech_dj kernel module about 10 days ago and have not had the issue since. I was previously running into this issue multiple times a day, as I use a KVM to switch between machines, and it was happening nearly every 2-3 switches.

To work around this on your system, create a file at /etc/modprobe.d/logitech-blacklist.conf with the following contents:

blacklist hid_logitech_dj

Then update your initramfs and reboot; for Arch it’s mkinitcpio -P, for Debian it’s update-initramfs -u -k all, and I believe for Fedora it’s dracut --regenerate-all.
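If it helps, the steps above could be wrapped up roughly like this. It is only a sketch: write_blacklist and initramfs_cmd_for are hypothetical helper names, the distro IDs are assumed to match the ID= field in /etc/os-release, and the --force on the dracut line is my own addition (dracut normally refuses to overwrite an existing image without it).

```shell
# Hypothetical helper: writes the blacklist line to the given path,
# defaulting to the path suggested above. Needs root for the default path.
write_blacklist() {
    target="${1:-/etc/modprobe.d/logitech-blacklist.conf}"
    printf 'blacklist hid_logitech_dj\n' > "$target"
}

# Hypothetical helper: prints the initramfs rebuild command for a distro ID.
# Only the three distros mentioned above are covered.
initramfs_cmd_for() {
    case "$1" in
        arch)   echo "mkinitcpio -P" ;;
        debian) echo "update-initramfs -u -k all" ;;
        # --force is an assumption added here, not part of the original advice
        fedora) echo "dracut --regenerate-all --force" ;;
        *)      echo "unknown distro: $1" >&2; return 1 ;;
    esac
}
```

Run write_blacklist as root, then run whatever initramfs_cmd_for prints for your distro (also as root) and reboot. Printing the command instead of executing it is deliberate, so you can look at it before rebuilding anything.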

hi @Matt_Hartley,

here is the requested info:

Fedora 3x? 38
Kernel in use when this happened? 6.4.14-200.fc38.x86_64
Which expansion cards are in use? USB-A, C, HDMI, etc? 2 usbc, 1 usbA, ethernet
Do you use a USB dock and is it connected when this is happening? yes
Software open when this happens? clean boot then opened 2 terminal windows in gnome
Bat or AC power? ac

lsusb just before the crash on disconnect:

Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 003: ID 27c6:609c Shenzhen Goodix Technology Co.,Ltd. Goodix USB2.0 MISC
Bus 003 Device 010: ID 046d:c52b Logitech, Inc. Unifying Receiver
Bus 003 Device 009: ID 1b1c:1b2d Corsair K95 RGB Platinum Keyboard [RGP0056]
Bus 003 Device 008: ID 1a40:0101 Terminus Technology Inc. Hub
Bus 003 Device 002: ID 04e8:a020 Samsung Electronics Co., Ltd 4-Port USB 2.0 Hub
Bus 003 Device 004: ID 8087:0032 Intel Corp. AX210 Bluetooth
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

crash in dmesg

[Sep18 10:32] usb 3-4.1: USB disconnect, device number 8
[  +0.000013] usb 3-4.1.2: USB disconnect, device number 9
[  +0.168392] usb 3-4.1.4: USB disconnect, device number 10
[  +0.107212] BUG: unable to handle page fault for address: ffffb13d0e98aaf8
[  +0.000010] #PF: supervisor write access in kernel mode
[  +0.000004] #PF: error_code(0x0002) - not-present page
[  +0.000003] PGD 100000067 P4D 100000067 PUD 10020b067 PMD 0 
[  +0.000008] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  +0.000006] CPU: 12 PID: 1504 Comm: kworker/12:2 Not tainted 6.4.14-200.fc38.x86_64 #1
[  +0.000005] Hardware name: Framework Laptop (12th Gen Intel Core)/FRANMACP08, BIOS 03.04 07/15/2022
[  +0.000003] Workqueue: usb_hub_wq hub_event
[  +0.000012] RIP: 0010:power_supply_uevent+0xee/0x1d0
[  +0.000007] Code: 75 4e 48 8b 13 48 83 7a 28 00 74 75 45 31 ff 31 c0 eb 10 48 8b 13 41 83 c7 01 49 63 c7 48 3b 42 28 73 5e 48 8b 52 20 8b 14 82 <f0> 48 0f ab 54 24 08 48 8b 13 4c 89 f1 4c 89 e6 48 89 ef 48 8b 52
[  +0.000005] RSP: 0000:ffffb13d01b877b8 EFLAGS: 00010293
[  +0.000004] RAX: 0000000000000002 RBX: ffff9b39f10ff000 RCX: ffff9b3a092c0000
[  +0.000003] RDX: 00000000670199e5 RSI: 0000000000000000 RDI: ffff9b39f10ff000
[  +0.000003] RBP: ffff9b39f10ff038 R08: 0000000000000007 R09: ffff9b3b6b08f305
[  +0.000003] R10: ffffffffffffffff R11: 0000000000000000 R12: ffff9b3a6b08f000
[  +0.000002] R13: 0000000000000000 R14: ffff9b3a092c0000 R15: 0000000000000002
[  +0.000003] FS:  0000000000000000(0000) GS:ffff9b492f900000(0000) knlGS:0000000000000000
[  +0.000003] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000003] CR2: ffffb13d0e98aaf8 CR3: 00000002520a4000 CR4: 0000000000f50ee0
[  +0.000004] PKRU: 55555554
[  +0.000002] Call Trace:

After seeing @ryanpetris’ comment I tried removing the Logitech dongle from the mix, and interestingly enough the problem only happens if the Logitech Unifying receiver is connected. This is with 6.4.14-200.fc38.x86_64.
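A quick way to see whether a given machine has the same combination is sketched below. Both function names are hypothetical; the first filters lsusb-style output for Logitech’s USB vendor ID (046d, as in the listing above), and the second looks for hid_logitech_dj in /proc/modules-style input.

```shell
# Hypothetical helper: reads lsusb output on stdin and prints only the
# lines for devices with Logitech's vendor ID (046d). Prints nothing if none.
logitech_devices() {
    grep -i 'ID 046d:' || true
}

# Hypothetical helper: reads /proc/modules-style text on stdin and
# succeeds (exit status 0) if hid_logitech_dj is listed as loaded.
dj_module_loaded() {
    grep -q '^hid_logitech_dj '
}
```

For example, `lsusb | logitech_devices` lists any attached Logitech devices, and `dj_module_loaded < /proc/modules && echo loaded` tells you whether the module is currently in use. A Logitech device showing up does not by itself mean it is a Unifying receiver, so check the product name in the output.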

Also found this, which seems to be related: 2227221 – USB disconnect causes kernel crash

@Mitesh_Patel If you get to the bottom of that report, you’ll see it also applies to Logitech receivers.

I should note that I’ve had this happen not just on Framework laptops, but also 12th and 13th gen Intel NUCs, Dell laptops, and even AMD systems.

@Mitesh_Patel, did you try @ryanpetris’ suggestion for comparative testing?

@Matt_Hartley I blacklisted the hid_logitech_dj module just this morning and so far no crashes. I have not been able to reproduce the issue after blacklisting; as mentioned earlier, it was pretty easy to reproduce before. I’ll report back if it happens again, but it’s looking pretty good at the moment.

Thank you @ryanpetris and @Matt_Hartley !!

Okay, perfect. I will leave this as tracking as I am actively tracking this still. Appreciate the update.

I’m running 6.5.5-100.fc37 now and haven’t seen the issue after 31 hours of uptime.

This sounds promising!