[TRACKING] Hard freezing on Fedora 36 with the new 12th gen system

This sounds promising. And we have your user template here for my reference later on.

Fedora 36 or 37: 37

12th gen or 11th gen: 12th gen

Kernel version: 6.1.6-200.fc37.x86_64

Gnome version: 43.2

Using i915 fixes, if so, which ones: Yes, options i915 enable_psr=0

Has this been tried yet and if so, any difference: Yes, seemed to have reduced freezes but did not completely resolve them.

Applications running in foreground or background when freeze occurs: Gnome Settings, Discord, Firefox

What if anything is attached to your Framework; docks, (BT, IR, wired) mouse, keyboard: Caldigit TS3 dock, which has ethernet, USB switch for mouse and keyboard, and DisplayPort monitor connected to it.
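For reference, the "options i915 enable_psr=0" setting in the template above is modprobe.d syntax, usually kept in a file under /etc/modprobe.d/ (the filename is your choice). A minimal sketch for checking what such a file actually sets, assuming only plain "options <module> key=value" lines matter; it ignores other directives and backslash line continuations:

```python
from __future__ import annotations


def parse_modprobe_options(text: str) -> dict[str, dict[str, str]]:
    """Collect 'options <module> key=value ...' lines from modprobe.d-style text."""
    result: dict[str, dict[str, str]] = {}
    for raw in text.splitlines():
        # Text after '#' is a comment; blank lines fall through the check below.
        line = raw.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) < 3 or parts[0] != "options":
            continue  # not an options line (blacklist, install, etc.)
        # modprobe treats '-' and '_' in module names as equivalent.
        module = parts[1].replace("-", "_")
        params = result.setdefault(module, {})
        for token in parts[2:]:
            key, sep, value = token.partition("=")
            params[key] = value if sep else ""
    return result


if __name__ == "__main__":
    print(parse_modprobe_options("options i915 enable_psr=0\n"))
```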

Just reporting in to say I have not experienced any freezes or crashes in the last couple of weeks, and Firefox now also seems to be hardware-decoding AV1 videos without crashing (only 2-3 hours of testing so far). Looks good; we’ll see if it stays like that.

A bit before my last post with the templated info request, I was seeing Firefox being extremely problematic: 5-10 minutes of YouTube and it would crash. Not sure if it got patched or a reboot fixed it, but I now have the same experience as @mcz; Firefox has been rock solid for me all this week in its default config (AV1/VP9 hardware decode enabled). Until this week I’d been using Chromium for anything YouTube. I’ve had a number of hours of Firefox YouTube playback without a hitch, and for the last 24-48 hours the laptop appears to be rock solid.

I might have another session in FreeCAD over the next few days, but with both a kernel and a FreeCAD update, things might have improved.

There does appear to be a trend of improvement for me, so the fixes the kernel/i915 devs are releasing are resolving real issues.

Update for me:

6.1.7-200.fc37.x86_64 / 12th Gen Intel(R) Core™ i7-1260P

No i915 tunings, but booting with intel_idle.max_cstate=2.

With this, since 6.1.7 I have started getting plain freezes/hangs, but without the garbled screens and garbage on my displays. The garbled-screen hangs were, seemingly, eliminated by intel_idle.max_cstate=2.

So maybe in my case I did have some problems related to the sleep states, but now I am perhaps also bumping into the Intel GPU issues many have noted here:

booting now with: intel_idle.max_cstate=2 i915.enable_psr=0

I seem to have some stability back.
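To confirm the kernel actually booted with these parameters, /proc/cmdline holds the live command line. A rough sketch that groups the module.param=value tokens from it, assuming no quoted values containing spaces; dracut-style options such as rd.* would also be picked up:

```python
from __future__ import annotations


def module_params_from_cmdline(cmdline: str) -> dict[str, dict[str, str]]:
    """Group 'module.param=value' tokens from a kernel command line by module."""
    params: dict[str, dict[str, str]] = {}
    for token in cmdline.split():  # assumes no quoted values containing spaces
        key, sep, value = token.partition("=")
        module, dot, name = key.partition(".")
        if not dot or not module or not name:
            continue  # plain options such as 'quiet' or 'root=UUID=...'
        params.setdefault(module.replace("-", "_"), {})[name] = value if sep else ""
    return params


def read_live_cmdline(path: str = "/proc/cmdline") -> str:
    """Return the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()
```

Usage: module_params_from_cmdline(read_live_cmdline()). The values the loaded modules are using can also be read from /sys/module/intel_idle/parameters/max_cstate and /sys/module/i915/parameters/enable_psr (the latter may need root).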

NOTE: There may be some correlation between this (new to me) hang and Zoom with the webcam active. Too early to tell at this point.

HTH

FreeCAD was perfect: not a single hitch with it, including this morning.
Next on the to-do list was some VS Code work, very light stuff (just some Ansible configs). It was unusable: 2 soft freezes, with the 3rd being a hard freeze, all within 5-10 minutes, and it’s the only time I’ve ever had issues with VS Code. I probably use it weekly and have had it installed since I first got the laptop in November.

I can see kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:0:00000000 in the logs around the time of the soft freezes, but nothing leading up to the hard freeze.

VS Code is running in the background while I type this, and so far, on the clean boot after the hard-freeze power-off, I’ve not had any issues…

The ONLY thing I can point to is a potential update issue. I did some system updates which included Mesa; there was no prompt to reboot, so I didn’t bother.

Been getting some weird behaviour today.

With brightness control in a broken state ([SOLVED] 12th gen not sending XF86MonBrightnessUp / Down - #52 by vhx), I’ve also noticed that I did not have any battery listed either. VS Code continued to crash, but since it was freezing the entire laptop, I’m not sure that’s accurate. journalctl shows the following; note the reference to kwin_wayland:

...
Jan 28 11:07:02 tim-laptop kernel: Asynchronous wait on fence 0000:00:02.0:kwin_wayland[2002]:6270 timed out (hint:intel_atomic_commit_ready [i915])
Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:0:00000000
Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.bin version 70.5.1
Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] HuC authenticated
Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
...
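To line hangs up against freeze times, something like this can scan kernel log text for the GPU HANG lines shown above. It is a sketch that assumes journalctl's default short output format exactly as in the excerpt; feed it, for example, the output of journalctl -k -b -1 (kernel messages from the previous boot, i.e. the one that froze):

```python
from __future__ import annotations

import re

# Tuned to journalctl's default "short" format, e.g.
# Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:0:00000000
GPU_HANG_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"i915 [0-9a-f:.]+: \[drm\] GPU HANG: ecode (?P<ecode>\S+)"
)


def find_gpu_hangs(journal_text: str) -> list[tuple[str, str]]:
    """Return (timestamp, ecode) for every i915 GPU HANG line in the text."""
    hangs: list[tuple[str, str]] = []
    for line in journal_text.splitlines():
        match = GPU_HANG_RE.match(line)
        if match:
            hangs.append((match.group("stamp"), match.group("ecode")))
    return hangs
```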

I’ve since rebooted: the keyboard brightness controls are working again (brightness always works via the OS’s display brightness setting) and the battery is being reported again. No more crashing for hours now with an identical workload, software and websites open.

Since battery and brightness both involve ACPI, I wonder if there’s an issue there causing multiple downstream issues. If it’s random, that could explain why I have good days and bad days. I’d typically leave my laptop in sleep/running states and very rarely reboot. I’ll see if I can reproduce that later today…

Have you guys noticed anything similar?

(F37 KDE, kernel 6.1.7; updated this morning)

@vhx have you tried this?

Not recently. My issue is that my problems are extremely infrequent. If I had an issue that was fixable with PSR, that issue would be more prevalent (as it has been in the past).

I’m not willing to run with reduced battery life for a problem that’s infrequent and that a reboot seems to correct. I can run for weeks without a problem, which to me proves there has been a lot of stability improvement since this thread was created. This includes video playback.

What I was trying to get at: on my 12th gen, the ecode-related freezing currently appears to be fixed by a reboot, and/or other issues are present (the aforementioned keyboard brightness control and missing battery detection). My most recent boot (no OS updates for at least 3-4 reboots now) has been stable and reliable.

Also, unless there was residual data somewhere, the laptop was disconnected from all peripherals across one of the reboots, so that 100% excludes anything that isn’t the physical laptop (no external USB-C dock, displays, etc. causing issues).

I’m using Pop OS (Ubuntu 22.04) and having the same hard freeze problem 3x in the past 3 days. I’ve only had the laptop (DIY version) that long, so have only installed OS and software, but haven’t really done anything yet to troubleshoot.

12th gen i7-1280P
Kernel version: 6.0.12-76060006-generic
No i915 fixes
Gnome version 42.5

Have mostly been using Firefox when it has happened, with Settings and Terminal also open. I had an external hard drive attached to a USB-C port. Unplugged that last night, and it hasn’t frozen yet today. I’m not sure yet whether that’s coincidental or not.

Has anyone noticed if this happens more with low power states?

From about 10:30 to 12:00 yesterday I was on charge (via USB-C PD dock) with VS Code, and had the 2 soft freezes and 1 hard freeze I mentioned. I rebooted to apply some updates and fix those possibly ACPI-related brightness and battery detection issues.
12:00-15:00 was after the reboot, on charge: not a single issue.
From about 17:00 onwards I moved to battery only, but only really used the laptop after about 18:30. I encountered 2 more freezes, around 19:00 and 21:00.

What I have done over the last week is configure TLP. Notice anything…? ACPI issues and problems in the morning, no problems in the afternoon, problems again when on battery.
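To check the on-charge vs on-battery pattern without relying on memory, the kernel exposes supply state under /sys/class/power_supply. A minimal sketch, assuming the charger shows up as a supply with type "Mains" (names such as ACAD vary by machine, so it matches on type rather than name); logging its result next to freeze times would make the correlation easier to confirm:

```python
from __future__ import annotations

from pathlib import Path


def on_ac_power(base: str = "/sys/class/power_supply") -> bool | None:
    """True if any 'Mains' supply reports online=1, False if Mains supplies
    exist but are all offline, None if no Mains supply is found."""
    root = Path(base)
    if not root.is_dir():
        return None
    saw_mains = False
    for supply in sorted(root.iterdir()):
        type_file = supply / "type"
        online_file = supply / "online"
        if not (type_file.is_file() and online_file.is_file()):
            continue  # batteries without 'online', or unrelated entries
        if type_file.read_text().strip() != "Mains":
            continue
        saw_mains = True
        if online_file.read_text().strip() == "1":
            return True
    return False if saw_mains else None
```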

It might be pure coincidence and I’m barking up the wrong tree, but I’ve started to see various issues since using TLP. It could be TLP itself, the PCIE_ASPM_ON_BAT=powersupersave battery setting, or just a coincidence.
The TLP service is being disabled to test further today…
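As far as I know, TLP applies PCIE_ASPM_ON_BAT through the kernel's PCIe ASPM policy, which can be read back from /sys/module/pcie_aspm/parameters/policy; the active policy is the bracketed entry (e.g. "default performance powersave [powersupersave]"). A small sketch for reading it on and off battery to see what is actually applied:

```python
from __future__ import annotations

ASPM_POLICY_PATH = "/sys/module/pcie_aspm/parameters/policy"


def active_aspm_policy(policy_text: str) -> str | None:
    """Return the bracketed (active) entry, e.g. 'powersupersave'."""
    for word in policy_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_aspm_policy(path: str = ASPM_POLICY_PATH) -> str | None:
    """Read the live policy file and return the active policy name."""
    with open(path) as f:
        return active_aspm_policy(f.read())
```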

Edit: no, I’m wrong. I disabled TLP, rebooted, and got the ecode soft-freeze issue. Now running with TLP disabled and psr=0; no issues for about 90 minutes on battery…

Fedora 36 or 37: 36, now 37, has been happening since 34 and 35.

12th gen or 11th gen: 11th gen

Kernel version: 6.1.7-200.fc37.x86_64

Gnome version: 43.2, and whatever came with Fedora 36

Using i915 fixes, if so, which ones: None

Has this been tried yet and if so, any difference: Have not tried this.

Applications running in foreground or background when freeze occurs: Discord, Firefox, Files. Only happens on unlock, if that helps.

What if anything is attached to your Framework; docks, (BT, IR, wired) mouse, keyboard: Sabrent KVM with a 4K 60 Hz display, mouse, and keyboard. Happens both when connected to the dock and in standalone laptop mode.

Happens regardless of power saving mode, battery level, modules plugged in, etc.

Can you guys with the hard freezes check whether discard is enabled on the NVMe?
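One quick way to check is the live mount options in /proc/mounts rather than fstab. A sketch that lists mounts carrying a "discard" or "discard=<mode>" option (Btrfs reports async discard as discard=async); "nodiscard" is deliberately not matched:

```python
from __future__ import annotations


def discard_mounts(mounts_text: str) -> list[tuple[str, str, str]]:
    """Return (device, mountpoint, option) for every mount using a discard option.

    Expects /proc/mounts format: device mountpoint fstype options dump pass.
    """
    found: list[tuple[str, str, str]] = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        device, mountpoint, _fstype, options = fields[:4]
        for opt in options.split(","):
            # 'discard' (synchronous) or 'discard=async' and similar modes.
            if opt == "discard" or opt.startswith("discard="):
                found.append((device, mountpoint, opt))
    return found
```

Usage: discard_mounts(open("/proc/mounts").read()); an empty list means no filesystem is mounted with continuous discard.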

Please, this. Thanks

No discard here (not intentional; I always forget to add discard to fstab!!)

F37, Btrfs OS drive with / /boot /boot/efi & /home partitions across the entire disk. WD SN770 500GB NVMe with 731100WD firmware (updated in early Nov ’22 when first bought).

fstab options for / and /home are subvol=root,compress=zstd:1 and subvol=home,compress=zstd:1 respectively.

I’ve had no more hard freezes since disabling TLP and enabling PSR, on or off battery. Bit too early to be sure, but I had no issues across many hours with that combo on Sunday. It definitely seemed more frequent when on battery (no USB-connected peripherals, MX Master & MX Keys connected via Bluetooth instead of Unifying) vs mains.

I’m asking about discard because it’s not recommended to issue discards on NVMe drives, and it has caused me some headaches.

https://wiki.archlinux.org/title/Solid_state_drive/NVMe

Discards
Warning: Although continuous TRIM is an option (albeit not recommended) for SSDs, NVMe devices should not be issued discards.

Discards are disabled by default on typical setups that use ext4 and LVM, but other file systems might need discards to be disabled explicitly.

Intel, as one device manufacturer, recommends not to enable discards at the file system level, but suggests the periodic TRIM method, or apply fstrim manually.[3]

I think the advice does not hold anymore for recent devices.
See the rationale in the page Solid state drive - ArchWiki

Warning: Before SATA 3.1 all TRIM commands were non-queued, so continuous trimming would produce frequent system freezes.

And discard is continuous trimming.

I don’t think the NVMes used with Framework laptops are SATA 3.1, except if older NVMes are brought over by the users.

In particular, SATA M.2 was introduced with SATA 3.2, and SSDs in the Framework use such M.2 connectors.

(EDITED: NVMe seems to be an alternative to SATA, in which case all these info about SATA would not apply)

Did your problems with discard happen on a Framework laptop?

I might make myself look a fool now… but SATA, M.2 and NVMe are all separate AFAIK. If we’re dealing with SATA-specific discard, then that needs to be confirmed (Kingston confirm the differences).

M.2 = physical connector; comes in various keys like A & E
SATA = Serial ATA
NVMe = PCIe-attached storage; does not use the SATA protocol.

Not wanting to take this too far off topic: Btrfs does not enable discards by itself. It has to be explicitly enabled with discard, or the more commonly recommended discard=async. Either way, no discards for me.

Okay, sorry, you might well be right.
I admit that the “SATA M.2” naming may lead one to think there is a link…

I have no idea how to check (on Linux) whether an NVMe, e.g. my SN850, uses SATA or not… Your Kingston link seems to say that NVMe is an alternative to SATA, in which case I have my answer.
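One way to check on Linux: lsblk -d -o NAME,TRAN prints the transport (nvme, sata, usb) per disk. The same thing can be inferred from where the disk sits in sysfs; a rough sketch, assuming the usual path layouts (a "nvme" or "nvme-subsystem" component for NVMe, an "ataN" component for SATA, a "usbN" component for USB):

```python
from __future__ import annotations

import re
from pathlib import Path


def transport_from_sysfs_path(resolved: str) -> str:
    """Guess a disk's transport from its resolved /sys/block/<name> path."""
    parts = resolved.split("/")
    # .../nvme/nvme0/nvme0n1, or .../nvme-subsystem/nvme-subsys0/nvme0n1 with native multipath
    if any(p == "nvme" or p.startswith("nvme-subsys") for p in parts):
        return "nvme"
    if any(re.fullmatch(r"usb\d+", p) for p in parts):
        return "usb"  # USB enclosures, including USB-to-SATA bridges
    if any(re.fullmatch(r"ata\d+", p) for p in parts):
        return "sata"  # .../ata1/host0/target0:0:0/0:0:0:0/block/sda
    return "unknown"


def block_transport(name: str) -> str:
    """Resolve /sys/block/<name> (a symlink on real systems) and classify it."""
    return transport_from_sysfs_path(str(Path("/sys/block", name).resolve()))
```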

Do we have more details on why discard is not advised on NVMes?
For now I will just keep the discard option, as I never had problems with it.

Usually distros with a reasonable default configuration use periodic trims via fstrim.timer (which runs fstrim.service), and it should be enabled by default on Fedora 37. It runs once a week and issues TRIM commands for blocks that have been freed up, so there is no need for any continuous discard option.
I’m not an expert on SSDs, but doing this periodically seems like a good approach: performance should be good, and it doesn’t require checking different filesystems for compatibility, mount options or anything.
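Whether the kernel will send TRIM to a disk at all (which both fstrim and the discard mount option depend on) shows up in /sys/block/<disk>/queue/discard_max_bytes, where 0 means no discard support; lsblk -D shows the same limits, and systemctl status fstrim.timer shows when the weekly trim is scheduled. A minimal sketch of the sysfs check:

```python
from pathlib import Path


def supports_discard(disk: str, base: str = "/sys/block") -> bool:
    """True if the kernel reports a non-zero discard_max_bytes for the disk."""
    limit = Path(base, disk, "queue", "discard_max_bytes").read_text().strip()
    return int(limit) > 0
```

Usage: supports_discard("nvme0n1") on the running system.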

@mcz But the reason why periodic trim is preferred over continuous trim is the problem with SATA < 3.1 that I quoted before. If this issue is out of the picture it seems obvious that continuous trim becomes again the best choice.

I also read an argument that NVMes have spare space specifically to address wear problems, and thus continuous trimming is not as needed. To me this seems, on the contrary, a hint at going back to continuous trimming… in order to reduce reliance on spare blocks.

That said, I’m going to spend some time understanding what trimming is exactly…