[RESPONDED] Linux recent update is causing some sort of stalling/freezing in Pop!_OS

Has anyone else experienced this? It usually happens when an embedded YouTube clip is playing in Firefox. The cursor freezes and the keyboard stops working, but Caps Lock still toggles its LED. It recovers if I close and reopen the laptop lid: after a few seconds it sleeps, and then I can unlock and use it again. It’s really frustrating and intermittent, and I’m struggling to debug it because the machine is unusable until it sleeps and wakes. Any tips on what might be causing it?

  • 11th Gen Framework i5
  • Firefox 124.0.1
  • Pop!_OS 22.04 LTS
  • Linux pop-os 6.8.0-76060800daily20240311-generic #202403110203~1711393930~22.04~331756a SMP PREEMPT_DYNAMIC Mon M x86_64 x86_64 x86_64 GNU/Linux

I had dmesg -T --level=err,crit,alert,emerg --follow running the last time it happened. Here are the errors from the logs:

[Mon Apr  8 01:13:38 2024] i915 0000:00:02.0: [drm] *ERROR* [CRTC:98:pipe A] flip_done timed out
[Mon Apr  8 01:13:52 2024] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[Mon Apr  8 01:13:52 2024] i915 0000:00:02.0: [drm] *ERROR* [CRTC:98:pipe A] commit wait timed out
[Mon Apr  8 01:14:02 2024] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[Mon Apr  8 01:14:02 2024] i915 0000:00:02.0: [drm] *ERROR* [PLANE:31:plane 1A] commit wait timed out
[Mon Apr  8 01:14:07 2024] Freezing user space processes failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
[Mon Apr  8 01:14:13 2024] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[Mon Apr  8 01:14:13 2024] i915 0000:00:02.0: [drm] *ERROR* [PLANE:94:cursor A] commit wait timed out
[Mon Apr  8 01:14:18 2024] iwlwifi 0000:aa:00.0: WRT: Invalid buffer destination

And here are the full logs from that time range:

[Mon Apr  8 01:13:39 2024] i915 0000:00:02.0: [drm] *ERROR* [CRTC:98:pipe A] flip_done timed out
[Mon Apr  8 01:13:43 2024] wlp170s0: deauthenticating from 7a:45:58:1b:f7:26 by local choice (Reason: 3=DEAUTH_LEAVING)
[Mon Apr  8 01:13:48 2024] PM: suspend entry (s2idle)
[Mon Apr  8 01:13:48 2024] Filesystems sync: 0.002 seconds
[Mon Apr  8 01:13:48 2024] Freezing user space processes
[Mon Apr  8 01:13:53 2024] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[Mon Apr  8 01:13:53 2024] i915 0000:00:02.0: [drm] *ERROR* [CRTC:98:pipe A] commit wait timed out
[Mon Apr  8 01:14:03 2024] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[Mon Apr  8 01:14:03 2024] i915 0000:00:02.0: [drm] *ERROR* [PLANE:31:plane 1A] commit wait timed out
[Mon Apr  8 01:14:08 2024] Freezing user space processes failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
[Mon Apr  8 01:14:08 2024] task:gnome-shell     state:D stack:0     pid:3188  tgid:3188  ppid:2042   flags:0x00004006
[Mon Apr  8 01:14:08 2024] Call Trace:
[Mon Apr  8 01:14:08 2024]  <TASK>
[Mon Apr  8 01:14:08 2024]  __schedule+0x2cb/0x740
[Mon Apr  8 01:14:08 2024]  schedule+0x33/0x110
[Mon Apr  8 01:14:08 2024]  schedule_timeout+0x95/0x170
[Mon Apr  8 01:14:08 2024]  ? __pfx_process_timeout+0x10/0x10
[Mon Apr  8 01:14:08 2024]  wait_for_completion_timeout+0x81/0x150
[Mon Apr  8 01:14:08 2024]  drm_crtc_commit_wait+0x32/0x90
[Mon Apr  8 01:14:08 2024]  drm_atomic_helper_wait_for_dependencies+0x117/0x170
[Mon Apr  8 01:14:08 2024]  intel_atomic_commit_tail+0xdd/0xa00 [i915]
[Mon Apr  8 01:14:08 2024]  ? __flush_workqueue+0x1ac/0x3e0
[Mon Apr  8 01:14:08 2024]  intel_atomic_commit+0x352/0x3a0 [i915]
[Mon Apr  8 01:14:08 2024]  drm_atomic_commit+0x96/0xd0
[Mon Apr  8 01:14:08 2024]  ? __pfx___drm_printfn_info+0x10/0x10
[Mon Apr  8 01:14:08 2024]  drm_mode_atomic_ioctl+0x563/0x850
[Mon Apr  8 01:14:08 2024]  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[Mon Apr  8 01:14:08 2024]  drm_ioctl_kernel+0xb9/0x120
[Mon Apr  8 01:14:08 2024]  drm_ioctl+0x2d0/0x550
[Mon Apr  8 01:14:08 2024]  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[Mon Apr  8 01:14:08 2024]  __x64_sys_ioctl+0xa0/0xf0
[Mon Apr  8 01:14:08 2024]  do_syscall_64+0x76/0x140
[Mon Apr  8 01:14:08 2024]  ? irqentry_exit+0x43/0x50
[Mon Apr  8 01:14:08 2024]  ? exc_page_fault+0x94/0x1b0
[Mon Apr  8 01:14:08 2024]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[Mon Apr  8 01:14:08 2024] RIP: 0033:0x7d2e1371a94f
[Mon Apr  8 01:14:08 2024] RSP: 002b:00007ffdde3917f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Mon Apr  8 01:14:08 2024] RAX: ffffffffffffffda RBX: 00007ffdde391890 RCX: 00007d2e1371a94f
[Mon Apr  8 01:14:08 2024] RDX: 00007ffdde391890 RSI: 00000000c03864bc RDI: 0000000000000009
[Mon Apr  8 01:14:08 2024] RBP: 00000000c03864bc R08: 0000000000000028 R09: 0000000000000028
[Mon Apr  8 01:14:08 2024] R10: 0000000000000003 R11: 0000000000000246 R12: 00005d26fe2baa20
[Mon Apr  8 01:14:08 2024] R13: 0000000000000009 R14: 00005d26fe93bcc0 R15: 00005d26fdbf09a0
[Mon Apr  8 01:14:08 2024]  </TASK>
[Mon Apr  8 01:14:08 2024] OOM killer enabled.
[Mon Apr  8 01:14:08 2024] Restarting tasks ... done.
[Mon Apr  8 01:14:08 2024] random: crng reseeded on system resumption
[Mon Apr  8 01:14:08 2024] thermal thermal_zone5: failed to read out thermal zone (-61)
[Mon Apr  8 01:14:08 2024] PM: suspend exit
[Mon Apr  8 01:14:08 2024] PM: suspend entry (s2idle)
[Mon Apr  8 01:14:08 2024] Filesystems sync: 0.023 seconds
[Mon Apr  8 01:14:08 2024] Freezing user space processes
[Mon Apr  8 01:14:14 2024] i915 0000:00:02.0: [drm] *ERROR* flip_done timed out
[Mon Apr  8 01:14:14 2024] i915 0000:00:02.0: [drm] *ERROR* [PLANE:94:cursor A] commit wait timed out
[Mon Apr  8 01:14:14 2024] Freezing user space processes completed (elapsed 6.160 seconds)
[Mon Apr  8 01:14:14 2024] OOM killer disabled.
[Mon Apr  8 01:14:14 2024] Freezing remaining freezable tasks
[Mon Apr  8 01:14:14 2024] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[Mon Apr  8 01:14:14 2024] printk: Suspending console(s) (use no_console_suspend to debug)
[Mon Apr  8 01:14:15 2024] ACPI: EC: interrupt blocked
[Mon Apr  8 01:14:18 2024] ACPI: EC: interrupt unblocked
[Mon Apr  8 01:14:19 2024] nvme nvme0: 8/0/0 default/read/poll queues
[Mon Apr  8 01:14:19 2024] OOM killer enabled.
[Mon Apr  8 01:14:19 2024] Restarting tasks ... 
[Mon Apr  8 01:14:19 2024] mei_hdcp 0000:00:16.0-b638ab7e-94e2-4ea2-a552-d1c54b627f04: bound 0000:00:02.0 (ops i915_hdcp_ops [i915])
[Mon Apr  8 01:14:19 2024] mei_pxp 0000:00:16.0-fbf6fcf1-96cf-4e2e-a6a6-1bab8cbe36b1: bound 0000:00:02.0 (ops i915_pxp_tee_component_ops [i915])
[Mon Apr  8 01:14:19 2024] done.
[Mon Apr  8 01:14:19 2024] random: crng reseeded on system resumption
[Mon Apr  8 01:14:19 2024] thermal thermal_zone5: failed to read out thermal zone (-61)
[Mon Apr  8 01:14:19 2024] PM: suspend exit
[Mon Apr  8 01:14:19 2024] iwlwifi 0000:aa:00.0: WRT: Invalid buffer destination
[Mon Apr  8 01:14:20 2024] iwlwifi 0000:aa:00.0: WFPM_UMAC_PD_NOTIFICATION: 0x20
[Mon Apr  8 01:14:20 2024] iwlwifi 0000:aa:00.0: WFPM_LMAC2_PD_NOTIFICATION: 0x1f
[Mon Apr  8 01:14:20 2024] iwlwifi 0000:aa:00.0: WFPM_AUTH_KEY_0: 0x90
[Mon Apr  8 01:14:20 2024] iwlwifi 0000:aa:00.0: CNVI_SCU_SEQ_DATA_DW9: 0x0
[Mon Apr  8 01:14:20 2024] usb 3-9: reset full-speed USB device number 2 using xhci_hcd
[Mon Apr  8 01:14:20 2024] usb 3-9: reset full-speed USB device number 2 using xhci_hcd
[Mon Apr  8 01:14:23 2024] wlp170s0: authenticate with 7a:45:58:1b:f7:26 (local address=c4:bd:e5:1b:42:c5)
[Mon Apr  8 01:14:23 2024] wlp170s0: send auth to 7a:45:58:1b:f7:26 (try 1/3)
[Mon Apr  8 01:14:23 2024] wlp170s0: authenticated
[Mon Apr  8 01:14:23 2024] wlp170s0: associate with 7a:45:58:1b:f7:26 (try 1/3)
[Mon Apr  8 01:14:23 2024] wlp170s0: RX AssocResp from 7a:45:58:1b:f7:26 (capab=0x1011 status=0 aid=14)
[Mon Apr  8 01:14:23 2024] wlp170s0: associated
[Mon Apr  8 01:14:23 2024] wlp170s0: Limiting TX power to 36 (36 - 0) dBm as advertised by 7a:45:58:1b:f7:26
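
In case it helps anyone comparing logs, here is a minimal sketch for summarizing the i915 timeout errors from saved dmesg output. The function name summarize_i915_timeouts is made up for illustration, and the pattern only assumes the message format shown in the logs above.

```shell
# Sketch: summarize i915 display timeouts from dmesg text.
# summarize_i915_timeouts is a hypothetical helper name, not an existing tool.
summarize_i915_timeouts() {
    # Reads dmesg text on stdin and prints "<count> <message>" for each
    # distinct i915 "timed out" error, most frequent first.
    grep -E 'i915 .*\*ERROR\*.*timed out' \
        | sed -E 's/^\[[^]]*\] //' \
        | sort | uniq -c | sort -rn \
        | awk '{ $1 = $1; print }'   # collapse the padding added by uniq -c
}
```

Possible usage, assuming the output was saved first: sudo dmesg -T > freeze.log, then summarize_i915_timeouts < freeze.log.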

Not a distro we test against, but, if doable, it is worth trying a previous kernel to see if this is a kernel bug. On Pop!_OS, an easy way to do this is to boot to the recovery partition and test things there; if it is fine, run uname -r to see which kernel it uses.

Yes! I experienced it.
…but I don’t have much advice for kernel troubleshooting. I suspect an interaction between the kernel and gdm or maybe a cosmic extension.
If you want to dig deeper, research enabling verbose logging or debugging in gdm and related components and see where that leads.

My 11th gen i7-1165 had the display lock up on two pop-os 6.8 kernels:
6.8.0.76060800daily20240311.202403110203~1710198088~22.04~1a3dbc7 and
6.8.0.76060800daily20240311.202403110203~1711393930~22.04~331756a

so I added a loader entry for 6.6.10 and set the loader to load 6.6.10-76060610.202401051437~1709764300~22.04~379e7a9 by default.

I did not do a lot of troubleshooting as the logs did not show much consistency.

My experience was that the display locked up, including the clock in the top bar not updating, but the logs showed various activity continuing.
Sometimes it would lock up before I even logged on. Sometimes, after minutes or hours of use, it would freeze for 4 to 5 minutes and then go to the gnome login screen. And sometimes it stayed frozen for more than 10 minutes, at which point I would hold the power button to force a shutdown. Fortunately btrfs tools showed no file system inconsistencies.

I will test future pop-os kernel releases.

I’ll add that this may present similarly to an issue others have posted about for perhaps a year or more, where firefox causes short freezes on pop-os. I have experienced that as well, though not much recently that I recall.

Good luck!

I booted into the previous kernel (6.6.10-76060610-generic) and the weird freezing hasn’t happened once. Thanks @Matt_Hartley & @Spence !

@Travis ,
I tried the new 6.8.0 kernel that System76 released this afternoon and it locked up within 2 minutes with a youtube video playing on firefox 124. I’ll test with firefox 125 when it shows up, probably late morning Tuesday.

Info on the new kernel:

sudo file /boot/efi/EFI/Pop_OS-dbe3a739-4a96-4023-940a-8844b7b35aec/vmlinuz.efi
/boot/efi/EFI/Pop_OS-dbe3a739-4a96-4023-940a-8844b7b35aec/vmlinuz.efi: Linux kernel x86 boot executable bzImage, version 6.8.0-76060800daily20240311-generic (jenkins@warp.pop-os.org) #202403110203~1713206908~22.04~3a62479 SMP PREEMPT_DYNAMIC Mon A, RO-rootFS, swap_dev 0XE, Normal VGA
date -d @1713206908
Mon Apr 15 02:48:28 PM EDT 2024
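
For anyone wanting to repeat the date check on other builds, here is a small sketch. It assumes the second "~"-separated field of the build string is a Unix epoch, which is what the date call above suggests but is not something I have seen documented. build_epoch and build_date_utc are made-up names.

```shell
# Sketch: pull the build timestamp out of a Pop!_OS kernel build string.
# Assumption: field 2 (split on "~") is a Unix epoch. Hypothetical helpers.
build_epoch() {
    # $1 looks like: #202403110203~1713206908~22.04~3a62479
    printf '%s\n' "$1" | cut -d '~' -f 2
}

build_date_utc() {
    # Uses the GNU date form "-d @EPOCH" and prints the time in UTC.
    date -u -d "@$(build_epoch "$1")" '+%Y-%m-%d %H:%M:%S UTC'
}
```

Possible usage: build_date_utc '#202403110203~1713206908~22.04~3a62479' prints the build time in UTC rather than local time.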

Thanks for checking! I’m glad to know that it’s not just me!

I’m on 6.8 here and have not experienced the issue myself, so it is possibly related to something Firefox-specific?

Hi Trevor,
I tried the recently released kernel and Gnome continues to lock up.

202403110203~1714077665~22.04~4c8e9a0 ( ? built Thu Apr 25 08:41:05 PM UTC 2024 ? )

I tried it today and I had sshd running so I tried to ssh into my fw 13 11th gen running pop-os.
It worked fine. top showed very low cpu usage and gnome-shell never made the list. There were no journalctl entries nor dmesg records in the 10 - 15 minutes between the gnome lock-up and my checking the logs.

I killed the gnome shell instance from the remote ssh log-in and the gnome login screen appeared. I did an orderly restart from the gnome menu and went back to kernel 6.6.10.
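
One thing that might be worth checking from the ssh session next time: the trace in the first post showed gnome-shell in state D (uninterruptible sleep), and a task blocked like that uses almost no CPU, so it would not stand out in top. A minimal sketch for listing such tasks follows. list_d_state is a made-up helper name, and it assumes ps rows in the "PID STAT COMMAND" layout.

```shell
# Sketch: list tasks in uninterruptible sleep (state D) from ps output.
# list_d_state is a hypothetical helper; it expects a header row followed
# by "PID STAT COMMAND" rows on stdin.
list_d_state() {
    # Skip the header, keep rows whose STAT starts with D, and print the
    # PID and the first word of the command name.
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}
```

Possible usage from the remote login: ps -eo pid,stat,comm | list_d_state. An empty result would suggest the hang looks different from the one in the original trace.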

Just popping in to confirm that this still happens for me in 6.9.3-76060903-generic. A recent update seems to have corrupted the old kernel entry in my boot menu so now I need to figure out how to repair it again.

OK, in case this happens to me again, I fixed it by running:

sudo update-initramfs -u -k 6.6.10-76060610-generic

Nice!

Hi @Travis ,

I have a large efi partition so I added a third kernel/initramfs combo and related config for 6.6.10-76060610-generic that has been preserved through a few kernel updates. I don’t know if it is truly stable or remotely close to best practice but it is working. Do something similar for yourself if it looks useful to you.

It looks like my active efi directory is /boot/efi/EFI/Pop_OS-dbe3a739-4a96-4023-940a-8844b7b35aec/ so I copied
/boot/initrd.img-6.6.10-76060610-generic to
/boot/efi/EFI/Pop_OS-dbe3a739-4a96-4023-940a-8844b7b35aec/initrd.img-good
and copied
/boot/vmlinuz-6.6.10-76060610-generic to
/boot/efi/EFI/Pop_OS-dbe3a739-4a96-4023-940a-8844b7b35aec/vmlinuz-good.efi
and added it to the efi boot menu.

I created file /boot/efi/loader/entries/Pop_OS-good.conf
with contents:

title Pop!_OS-custom
linux /EFI/Pop_OS-dbe3a739-4a96-4023-940a-8844b7b35aec/vmlinuz-good.efi
initrd /EFI/Pop_OS-dbe3a739-4a96-4023-940a-8844b7b35aec/initrd.img-good
options root=UUID=dbe3a739-4a96-4023-940a-8844b7b35aec ro quiet loglevel=0 systemd.show_status=false splash rootflags=subvol=@

You probably have different options for your boot but can see what is appropriate in the “standard” entry conf files.
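
If it is useful, the entry above could be generated rather than typed by hand. This is only a sketch: make_pinned_entry and its three arguments are made-up names, and the -good file names simply mirror the ones I chose above.

```shell
# Sketch: print a loader entry like the one above for a pinned kernel.
# make_pinned_entry and its arguments are hypothetical names.
make_pinned_entry() {
    efi_dir=$1     # directory name under /EFI/, e.g. Pop_OS-<root UUID>
    root_uuid=$2   # UUID of the root filesystem
    extra_opts=$3  # remaining kernel options, copied from a stock entry
    printf 'title Pop!_OS-custom\n'
    printf 'linux /EFI/%s/vmlinuz-good.efi\n' "$efi_dir"
    printf 'initrd /EFI/%s/initrd.img-good\n' "$efi_dir"
    printf 'options root=UUID=%s %s\n' "$root_uuid" "$extra_opts"
}
```

Possible usage: run make_pinned_entry with your own directory name, UUID and options, review the output, and only then write it to a file under /boot/efi/loader/entries/ with sudo.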

Lastly I set my custom efi boot entry for kernel 6.6.10 to be default and set a 3 second timeout so I can easily choose one manually.
Contents of /boot/efi/loader/loader.conf

default Pop_OS-good
#default Pop_OS-current
timeout 3

I found that I can check the kernel versions of the current and previous files that Pop!_OS maintains by using the file command, and then match them up to the files in /boot/:

sudo file /boot/efi/EFI/Pop_OS-dbe3a739-4a96-4023-940a-8844b7b35aec/vmlinuz.efi
/boot/efi/EFI/Pop_OS-dbe3a739-4a96-4023-940a-8844b7b35aec/vmlinuz.efi: Linux kernel x86 boot executable bzImage, version 6.9.3-76060903-generic (jenkins@warp.pop-os.org) #202405300957~1718348209~22.04~7817b67 SMP PREEMPT_DYNAMIC Mon J, RO-rootFS, swap_dev 0XE, Normal VGA
sudo file /boot/efi/EFI/Pop_OS-dbe3a739-4a96-4023-940a-8844b7b35aec/vmlinuz-previous.efi
/boot/efi/EFI/Pop_OS-dbe3a739-4a96-4023-940a-8844b7b35aec/vmlinuz-previous.efi: Linux kernel x86 boot executable bzImage, version 6.8.0-76060800daily20240311-generic (jenkins@warp.pop-os.org) #202403110203~1715181801~22.04~aba43ee SMP PREEMPT_DYNAMIC Wed M, RO-rootFS, swap_dev 0XE, Normal VGA
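
To make the matching less error-prone, the version can be pulled out of that output with a short filter. This is a sketch that assumes the "bzImage, version X" wording shown above. kernel_version_from_file_output is a made-up name.

```shell
# Sketch: extract the kernel version from `file` output on a kernel image.
# kernel_version_from_file_output is a hypothetical helper name.
kernel_version_from_file_output() {
    # Reads `file` output on stdin and prints the word that follows
    # "bzImage, version " on each matching line.
    sed -n 's/.*bzImage, version \([^ ]*\) .*/\1/p'
}
```

Possible usage: sudo file /boot/efi/EFI/Pop_OS-*/vmlinuz*.efi | kernel_version_from_file_output, then compare the results against the vmlinuz-* names in /boot/.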

In hindsight, I kinda wish I left 6.6.10-generic in the loader menu name. Easy enough to change though.

Let me know if you find something missing or a confusing typo etc.

EDIT: …or let me know if you have an idea to improve my setup. :slight_smile:

Are you using fractional scaling? I am pretty sure I was having this same issue on a Thinkpad X12 with a 1160g7/Iris XE running PopOS. I didn’t roll back my kernel, but I am pretty sure I was able to avoid the issue. I have since sold the device, though, so I might be misremembering.

Edit: Are you running Wayland?

Hmm, I am using both fractional scaling and wayland.

I am as well. I might test without fractional scaling and also with X instead of wayland over the weekend.

Any update on this? I just upgraded Firefox & popOS, and when I open Firefox (it loads a lot of previously open tabs, some of which are Youtube videos) it locks up and becomes unusable. These are the same symptoms you describe, @Travis, i.e. caps lock working but no mouse/keyboard, top bar frozen, etc.

I’m running fractional scaling and Wayland, currently on kernel 6.9.3-76060903-generic.

As a novice in the kernel world, I booted into the previous kernel (oldkern) by spamming space on boot-up (I also pressed F11, but I’m not sure that did much) and selecting the old kernel:

6.5.6-76060506-generic

This no longer freezes up and seems to work properly, so I’m going to set it as my default for now. Any feedback is welcome, as I’m concerned that future kernel updates will replace the old backup.