6.1.7-200.fc37.x86_64 / 12th Gen Intel(R) Core™ i7-1260P
No i915 tunings, but booting with intel_idle.max_cstate=2.
Since 6.1.17 I have started getting hangs, but plain freezes/hangs, without the garbled screens. The garbled-screen hangs were, seemingly, eliminated by intel_idle.max_cstate=2.
So maybe in my situation I did have some problems related to the sleep states, but now I am perhaps also bumping into the Intel GPU issues many have noted here:
booting now with: intel_idle.max_cstate=2 i915.enable_psr=0
I seem to have some stability back.
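For anyone wanting to confirm that both parameters really made it onto the running kernel's command line, here is a minimal sketch. It assumes /proc/cmdline is readable; the helper names `parse_cmdline` and `read_running_cmdline` are made up for this post and are not part of any tool.

```python
import shlex


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}.

    Bare flags such as 'ro' or 'quiet' map to None.
    """
    params = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_running_cmdline(path: str = "/proc/cmdline") -> dict:
    """Parse the command line of the currently running kernel."""
    with open(path) as fh:
        return parse_cmdline(fh.read())
```

Calling `read_running_cmdline().get("i915.enable_psr")` should give `"0"` if the option was applied, and `None` if it never reached the kernel (for example because the bootloader entry was not regenerated).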
NOTE: There is maybe some correlation between this (new to me) hang and Zoom with web cam active. Too early to tell at this point.
Freecad was perfect. not a single hitch with it including this morning.
next on the todo list is some VScode work. very light stuff - just some ansible configs. It was unusable: 2 soft freezes, with the 3rd being a hard freeze, all within 5-10 minutes, and it’s the only time i’ve ever had issues with vscode. Probably used weekly and installed since i first got the laptop in November.
i can see kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:0:00000000 in the logs around the time of the soft freezes, but nothing i can see leading up to the hard freeze.
vscode running in background while i type this and so far the clean boot after the hardfreeze power-off i’ve not had any issues yet…
the ONLY thing i can say is a potential update issue. I did some system updates which included mesa. no prompt for reboot so i didn’t bother.
with brightness control in a broken state ([SOLVED] 12th gen not sending XF86MonBrightnessUp / Down - #52 by vhx), i’ve also noticed that i did not have any battery listed either. VS Code continued to crash but since it was freezing the entire laptop, i’m not sure if that’s accurate. journalctl shows the following, note the reference to kwin_wayland:
...
Jan 28 11:07:02 tim-laptop kernel: Asynchronous wait on fence 0000:00:02.0:kwin_wayland[2002]:6270 timed out (hint:intel_atomic_commit_ready [i915])
Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:0:00000000
Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.bin version 70.5.1
Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc.bin version 7.9.3
Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] HuC authenticated
Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] GuC submission enabled
Jan 28 11:07:06 tim-laptop kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
...
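To pull just the hang events out of a long journal, a rough sketch like the one below might help. It assumes the default short journalctl output format shown above (month, day, time, hostname, `kernel:`); the regex and the name `find_gpu_hangs` are my own and only cover the `GPU HANG: ecode ...` line shape seen in this excerpt.

```python
import re

# Matches lines shaped like:
#   Jan 28 11:07:06 host kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:0:00000000
HANG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"i915\s+(?P<pci>\S+):\s+\[drm\]\s+GPU HANG:\s+ecode\s+(?P<ecode>[0-9a-fA-F:]+)"
)


def find_gpu_hangs(lines):
    """Return (timestamp, pci_address, ecode) for every GPU HANG line."""
    hangs = []
    for line in lines:
        match = HANG_RE.match(line.strip())
        if match:
            hangs.append(
                (match.group("stamp"), match.group("pci"), match.group("ecode"))
            )
    return hangs
```

One way to feed it would be to save the kernel messages of the previous boot with `journalctl -k -b -1 --no-pager > kernel.log` and pass `open("kernel.log")` to `find_gpu_hangs`.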
i’ve since rebooted, keyboard brightness controls working again (always works via the OS’s display brightness) and battery is being reported on again. no more crashing for hours now with identical workload, software and websites open.
since battery & brightness are both involved with acpi, wonder if theres an issue there causing multiple downstream issues. if it’s random then that could explain why i have good days and bad days. i’d typically leave my laptop in sleep/running states and very rarely reboot. I’ll see if i can reproduce that more later today…
not recently. my issue is that my problems are extremely infrequent. if i had an issue that’d be fixable with PSR then that issue would be more prevalent (which it has been in the past).
I’m not willing to run reduced battery for a problem thats infrequent and a reboot seems to correct it. i can run weeks without a problem which to me proves there is a lot of stability improvement since this thread was initially created. this includes video playback.
What i was trying to get at: the current state of my 12th gen is that the ecode-related freezing appears to be fixed by a reboot, and/or there are other issues present (the aforementioned keyboard brightness control and missing battery detection). my most recent reboot (no OS updates for at least 3-4 reboots now) has been stable and reliable.
Also, unless there was residual data somewhere, the laptop was disconnected from all peripherals between one of the reboots so it 100% excludes anything that isnt the physical laptop (no external USBC dock, displays, etc causing issues).
I’m using Pop OS (Ubuntu 22.04) and having the same hard freeze problem 3x in the past 3 days. I’ve only had the laptop (DIY version) that long, so have only installed OS and software, but haven’t really done anything yet to troubleshoot.
12th gen i7-1280P
Kernel version: 6.0.12-76060006-generic
No i915 fixes
Gnome version 42.5
Have mostly been using Firefox when it has happened, with Settings and Terminal also open. I had an external hard drive attached to a USB-C port. Unplugged that last night, and it hasn’t frozen yet today. I’m not sure yet whether that’s coincidental or not.
Has anyone noticed if this happens more with low power states?
From about 1030-1200 yesterday i was on charge (via USB-C PD dock) with vscode and my mentioned 2 soft freezes & 1 hard freeze from yesterday. Rebooted to apply some updates, and fix those possibly-acpi related brightness&battery detection issues.
1200-1500 was after reboot on charge. not a single issue.
From about 1700 onwards I moved to battery only but only really used the laptop after about 1830. I encountered 2 more freezes around 1900&2100.
what i have done over the last week is configure TLP. notice anything…? ACPI issues in the morning, with problems. no problems in the afternoon. problems again when on battery.
It might be pure coincidence and i’m barking up the wrong tree but i’ve started to see various issues since using TLP. Could be TLP itself, could be the PCIE_ASPM_ON_BAT=powersupersave battery state, or just a coincidence.
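To see which ASPM policy is actually active at a given moment (rather than what the TLP config asks for), the kernel exposes the list of policies in /sys/module/pcie_aspm/parameters/policy with the active one in square brackets. A small sketch, assuming that file exists on the machine; the function names are made up here.

```python
import re


def active_aspm_policy(policy_text: str):
    """Return the bracketed (active) policy name, or None if none is marked."""
    match = re.search(r"\[(\w+)\]", policy_text)
    return match.group(1) if match else None


def read_aspm_policy(path: str = "/sys/module/pcie_aspm/parameters/policy"):
    """Read and parse the live ASPM policy from sysfs."""
    with open(path) as fh:
        return active_aspm_policy(fh.read())
```

Running `read_aspm_policy()` once on mains and once on battery would show whether the powersupersave setting is really being switched in when the freezes happen.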
TLP service is being disabled to test further today…
edit: no, i’m wrong. disable TLP, reboot, got the ecode soft-freeze issue. now running with TLP disabled & psr=0; no issues for about 90mins on battery…
Applications running in foreground or background when freeze occurs: Discord, Firefox, Files. Only happens on unlock, if that helps.
What if anything is attached to your Framework; docks, (BT, IR, wired) mouse, keyboard: Sabrent KVM with a 4k60hz display, mouse, and keyboard. Happens when connected to dock, and when in standalone laptop mode.
Happens regardless of power saving mode, battery level, modules plugged in, etc.
no discard here (not intentional, I always forget to add discard to fstab!!)
F37, BTRFS OS drive with / /boot /boot/efi & /home partitions on the entire system. WD SN770 500GB nvme with 731100WD firmware (was updated early Nov’22 when first bought)
fstab options / & /home use subvol=root,compress=zstd:1 & subvol=home,compress=zstd:1 respectively.
I’ve had no more hard freezes disabling TLP, and enabling PSR, on or off battery. bit too early to be sure, but had no issues across many hours with that combo on Sunday. Definitely seemed more frequent when on battery (no USB connected peripherals, MX Master&keys connected via bluetooth instead of unify) vs mains.
Discards
Warning: Although continuous TRIM is an option (albeit not recommended) for SSDs, NVMe devices should not be issued discards.
Discards are disabled by default on typical setups that use ext4 and LVM, but other file systems might need discards to be disabled explicitly.
Intel, as one device manufacturer, recommends not to enable discards at the file system level, but suggests the periodic TRIM method, or apply fstrim manually.[3]
might make myself look a fool now… but sata, m2 and nvme are all separate afaik. if we’re dealing with sata specific discard, then that needs to be confirmed (Kingston confirm the differences)
m.2 = physical connector. comes in various different keys like A&E
sata = serial ATA
nvme = pcie attached storage. does not use sata protocol.
not wanting to take this too far off topic, btrfs does not enable discards. has to be explicitly defined with discard, or the more preferred discard=async recommendation. either way, no discards for me.
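If anyone wants to check what their mounted btrfs filesystems are really doing, the live mount options are in /proc/mounts. A rough sketch, assuming the usual whitespace-separated /proc/mounts layout (device, mountpoint, fstype, options, ...); `discard_mode` and `discard_by_mountpoint` are names invented for this post.

```python
def discard_mode(options: str):
    """Return 'async', 'sync' or None for a comma-separated mount option string."""
    for opt in options.split(","):
        if opt == "discard=async":
            return "async"
        if opt in ("discard", "discard=sync"):
            return "sync"
    return None


def discard_by_mountpoint(mounts_text: str, fstype: str = "btrfs") -> dict:
    """Map each mountpoint of the given filesystem type to its discard mode."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == fstype:
            result[fields[1]] = discard_mode(fields[3])
    return result
```

Passing `open("/proc/mounts").read()` to `discard_by_mountpoint` gives a dict where `None` means no discard option is active on that mount.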
Okay, sorry, you might well be right.
I admit that the “SATA M.2” denomination may lead one to think that there is a link…
I have no idea how to check (on Linux) if a NVMe, e.g. my SN850, uses SATA or not… Your link at Kingston seems to say that NVMe is an alternative to SATA, in which case I have my answer.
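One rough way to check on Linux is the kernel's block device name: drives handled by the NVMe driver show up as nvme* under /sys/block, while SATA drives go through the SCSI layer and show up as sd*. A minimal sketch under that assumption; the function names are made up here, and note that an sd* name alone cannot tell SATA apart from USB or SAS.

```python
import os


def guess_storage_driver(block_name: str) -> str:
    """Guess the storage stack from a kernel block device name."""
    if block_name.startswith("nvme"):
        return "nvme"
    if block_name.startswith("sd"):
        # The SCSI disk driver fronts SATA (via libata), USB and SAS alike.
        return "scsi (SATA, USB or SAS)"
    return "other"


def list_disks(sys_block: str = "/sys/block") -> dict:
    """Map every block device under /sys/block to the guessed driver."""
    return {name: guess_storage_driver(name) for name in sorted(os.listdir(sys_block))}
```

On a laptop with a single NVMe drive, `list_disks()` would be expected to show an `nvme0n1` entry tagged `nvme` and no `sd*` entries unless something is plugged in over USB.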
Do we have more details on why discard is not advised on NVMes?
For now I will just keep the discard option, as I never had problems with it.
Usually distros with a reasonable default configuration would use periodic trims via the fstrim.service (triggered by fstrim.timer), which should be enabled by default on Fedora 37. This should run once a week and issue TRIM commands for blocks that were freed up, so there would be no need for any continuous discard option.
I’m not an expert on SSDs but doing this periodically seems like a good approach, performance should be good and it doesn’t require checking different filesystems for compatibility or mount options or anything.
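To check whether the periodic trim is actually switched on, `systemctl is-enabled fstrim.timer` prints the state. A small sketch wrapping that call, assuming systemd is present; `timer_state` and its `run` parameter are my own names, and `run` is only there so the function can be exercised without systemd.

```python
import subprocess


def timer_state(unit: str = "fstrim.timer", run=subprocess.run) -> str:
    """Return what 'systemctl is-enabled UNIT' prints, or 'unknown' if empty."""
    proc = run(["systemctl", "is-enabled", unit], capture_output=True, text=True)
    return proc.stdout.strip() or "unknown"
```

If `timer_state()` comes back as `disabled`, `sudo systemctl enable --now fstrim.timer` should turn the weekly run on.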
@mcz But the reason why periodic trim is preferred over continuous trim is the problem with SATA < 3.1 that I quoted before. If this issue is out of the picture it seems obvious that continuous trim becomes again the best choice.
I also read an argument that NVMe drives have spare space specifically to address wear problems, and thus continuous trimming is not as necessary. To me this seems, on the contrary, a hint to go back to continuous trimming… in order to slow down the reliance on spare blocks.
That said, I’m going to spend some time understanding what trimming is exactly…
Well as far as I know: If you delete a file the filesystem used to just mark it as gone and never overwrite it on the harddrive unless the freed up block was actually used for something else. The harddrive didn’t care. But for SSDs it is important for the controller inside the SSD to know which blocks are actually used, otherwise the controller can’t overwrite those blocks and can’t use them for wear levelling and such things (the blocks the OS sees are not identical to where they actually are on the flash chips)… So filesystems now need to tell the drive when blocks are no longer used, which is what TRIM does.
Since normal users don’t really delete all that much stuff running a once-a-week TRIM seems reasonable to me.
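The idea can be shown with a toy model. This is purely illustrative and assumes nothing about real SSD firmware: `ToySSD` and `ToyFS` are invented classes where the drive only learns that blocks are free when the filesystem sends a trim.

```python
class ToySSD:
    """Controller view: which logical blocks it believes hold live data."""

    def __init__(self, n_blocks: int):
        self.n_blocks = n_blocks
        self.live = set()

    def write(self, lba: int):
        self.live.add(lba)

    def trim(self, lbas):
        self.live.difference_update(lbas)

    def free_for_wear_levelling(self) -> int:
        return self.n_blocks - len(self.live)


class ToyFS:
    """Filesystem view: deleting a file only updates its own bookkeeping."""

    def __init__(self, ssd: ToySSD):
        self.ssd = ssd
        self.files = {}
        self.pending = set()  # freed blocks the drive has not been told about

    def create(self, name: str, lbas):
        self.files[name] = set(lbas)
        self.pending -= set(lbas)  # reused blocks are live again
        for lba in lbas:
            self.ssd.write(lba)

    def delete(self, name: str):
        self.pending |= self.files.pop(name)

    def fstrim(self):
        """Periodic trim: report all freed blocks to the drive in one batch."""
        self.ssd.trim(self.pending)
        self.pending.clear()
```

After a `delete`, the toy drive still counts the blocks as live; only after `fstrim()` does `free_for_wear_levelling()` go up. A continuous discard would correspond to calling `fstrim()` inside every `delete`.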
@mcz Excellent explanation, thank you!
Ok so now I understand why it doesn’t matter that much to use a periodic trimming.
I thought that was more about limiting write operations, but I’m glad I was wrong.
Anyone posting who hasn’t done this already, please be sure to:
Everyone who is experiencing freezing, please reply with this outline below so I can get this into a spreadsheet and track this down for the folks at Fedora. Please use this format. Thanks!