[Tracking] Various issues of 12th gen with Void Linux

This is partly a tracking issue for my own reference and partly a support request.

Issues

  • NVMe still disconnects, though less often than before.
  • Micro stutters every X seconds even with i915.enable_psr=0
    Better than without, but still not gone.
  • Audio pops on first module usage (snd_hda_intel)
  • Kernel parameters do not apply automatically,
    but can be written “manually” after boot
/sys/module/snd_hda_intel/parameters/power_save:0
/sys/module/snd_hda_intel/parameters/power_save_controller:N
/sys/module/nvme_core/parameters/default_ps_max_latency_us:0
  • Kernel parameters are not writable
    Hint: seems to get set when booting with pcie_aspm=off.
/sys/module/nvme/parameters/noacpi:Y
  • HDMI Splitter does not reliably work
    ToDo: Try to set up EDID manually for it to work

  • Wifi disconnects randomly (possibly AP)

[19. Jan 13:42] wlp166s0: disassociated from <AP> (Reason: 8=DISASSOC_STA_HAS_LEFT)
[  +2,504500] wlp166s0: authenticate with <AP>
[  +0,018266] wlp166s0: send auth to <AP> (try 1/3)
[  +0,040204] wlp166s0: authenticated
[  +0,000825] wlp166s0: associate with <AP> (try 1/3)
[  +0,008567] wlp166s0: RX AssocResp from <AP> (capab=0x1511 status=0 aid=1)
[  +0,040636] wlp166s0: associated
[  +0,000067] wlp166s0: Limiting TX power to 20 (23 - 3) dBm as advertised by <AP>
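
To judge whether the AP or the client is at fault, it can help to count how often each disassociation reason shows up. A minimal sketch, assuming the kernel log lines have the same shape as the excerpt above; the function name count_disassoc_reasons is made up for illustration:

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt above, e.g.
#   wlp166s0: disassociated from <AP> (Reason: 8=DISASSOC_STA_HAS_LEFT)
DISASSOC_RE = re.compile(
    r"(?P<iface>\S+): disassociated from .* \(Reason: (?P<code>\d+)=(?P<name>\w+)\)"
)


def count_disassoc_reasons(log_text):
    """Count disassociation events per (interface, reason code, reason name)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DISASSOC_RE.search(line)
        if match:
            key = (match["iface"], int(match["code"]), match["name"])
            counts[key] += 1
    return counts
```

Feeding it the saved output of dmesg gives a per-reason tally; lines that do not match the pattern (authenticate, associate, and so on) are simply ignored.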

Setup

CPU:	12th Gen Intel(R) Core(TM) i5-1240P (16x)
GPU:	Alder Lake-P Integrated Graphics Controller (i915)
WIFI:	74% (wlp166s0)
BAT:	80% (Wear: 2%)
MACH:	03.06 (BIOS) 6.1.6 (KRNL) x86_64 (ARCH)
SWAP:	0GB (49GB/49GB free)
RAM:	0GB (29GB/31GB free)
NVME:	931,5G (nvme0n1, 731100WD)

BOOT:	/vmlinuz-6.1.6_1 cryptdevice=/dev/nvme0n1p3 i915.enable_psr=0 
BOOT:	iwlwifi.disable_11ax=Y lang=de locale=de_DE.UTF-8 loglevel=4 nvme.noacpi=1 
BOOT:	nvme_core.default_ps_max_latency_us=0 pcie_aspm=off rd.dm=0 rd.luks.crypttab=1 
BOOT:	rd.luks.uuid=cb2d4837-551d-4600-9149-484023cb9c9d rd.luks=1 rd.lvm=1 rd.md=0 
BOOT:	resume=UUID=5bbcc5b3-12a7-44a2-8a85-e3d4ba9be391 ro root=/dev/mapper/lvm-void 
BOOT:	snd_hda_intel.power_save=0 snd_hda_intel.power_save_controller=N 
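
Since some of these parameters apparently do not get applied, here is a minimal sketch for comparing the module.param=value tokens of a kernel command line against what /sys/module reports. It assumes that boolean parameters read back as Y or N and only handles the spellings 1/y and 0/n; the function names are made up for illustration. Tokens whose prefix is not a directory under /sys/module (for example rd.luks=1) are skipped.

```python
from pathlib import Path


def parse_module_params(cmdline):
    """Return {(module, param): value} for every module.param=value token."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        module, dot, param = key.partition(".")
        if sep and dot and module and param:
            # sysfs uses underscores in module names, even if the cmdline uses dashes
            params[(module.replace("-", "_"), param)] = value
    return params


def normalize(expected, actual):
    """Map boolean spellings to Y/N when sysfs reports the parameter as a bool."""
    if actual in ("Y", "N"):
        if expected.lower() in ("1", "y"):
            return "Y"
        if expected.lower() in ("0", "n"):
            return "N"
    return expected


def check_params(cmdline, sysfs_root="/sys/module"):
    """Return (module, param, expected, actual) for each mismatched or unreadable parameter."""
    results = []
    for (module, param), expected in parse_module_params(cmdline).items():
        module_dir = Path(sysfs_root) / module
        if not module_dir.is_dir():
            continue  # not a module parameter, or the module is not loaded
        try:
            actual = (module_dir / "parameters" / param).read_text().strip()
        except OSError:
            actual = None  # parameter does not exist or is not readable
        if actual is None or normalize(expected, actual) != actual:
            results.append((module, param, expected, actual))
    return results
```

Calling check_params(Path("/proc/cmdline").read_text()) after boot should return an empty list if everything was applied; any entry in the list names a parameter worth looking at.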

Energy
tlp	PCIE_ASPM_ON_BAT=powersupersave

System
swap	vm.dirty_background_ratio = 5
swap	vm.dirty_ratio = 10
swap	vm.swappiness = 10

Pkg                 Vers
intel-media-driver  22.5.3_1
intel-ucode         20221108_1
libva-intel-driver  2.4.1_1
mesa-demos          8.4.0_3
mesa-dri            22.2.4_2
mesa-vulkan-intel   22.2.4_2

Edit: Updated my setup: removed nvme.noacpi and replaced it with pcie_aspm=off, which seems to set it implicitly.


Note:
Maybe Linux 6.2 will bring some improvements to the CPU

Intel’s In-Field Scan feature, which will help system administrators detect faulty CPU cores, was introduced in Linux kernel 5.19. However, it was not working properly. Now, Intel engineers have fixed the issues and it will be available in kernel 6.2. In other news from Intel, the Intel On Demand platform, which is basically pay-to-unlock, subscription-model hardware, is receiving some improvements, including a rebranding from Software Defined Silicon and some low-level changes as well.

The Alder Lake and Raptor Lake processors are receiving new updates for HWP (hardware P-states) in order to better calibrate the resulting frequencies on hybrid CPUs.

and to the GPU.

The Intel drm-intel-next driver is receiving refactoring in the display code. The drm-intel-gt-next driver has also been updated for memory management improvements and some other small changes. With the treatment change in the Intel i915 driver, Mesa 23.0 for Vulkan will be able to deliver performance metrics for Intel Arc Graphics. Furthermore, Intel is making its preparations for bringing Meteor Lake integrated graphics support, which will be the series of CPUs that will be introduced in 2023. There are 5 GPU IDs added for Meteor Lake, but they are disabled as expected.

There are also some miscellaneous changes which may be relevant:

USB and Thunderbolt interfaces are being updated as well, with many small changes. The USB driver drops support for some older hardware in Linux kernel 6.2 and brings wake-on-connect and wake-on-disconnect features for the USB4 interface.

Wake-on-connect … interesting, but another potential error source.

Edit: Also see the following fixes

Features and functionality:

  • Meteorlake display enabling (Animesh, Luca, Stan, Jouni, Anusha)
  • DP MST DSC support (Stan)
  • Gamma/degamma readout support for the state checker (Ville)
  • Enable SDP split support for DP 2.0 (Vinod)
  • Add probe blocking support to i915.force_probe parameter (Rodrigo)
  • Enable Xe HP 4tile support (Jonathan)

Refactoring and cleanups:

  • Color refactoring, especially related to DSB usage (Ville)
  • DSB refactoring (Ville)
  • DVO refactoring (Ville)
  • Backlight register and logging cleanups (Jani)
  • Avoid display direct calls to uncore (Maarten, Jani)
  • Add new “soc” sub-directory (Jani)
  • Refactor DSC platform support checks (Swati)

Fixes:

  • Interlace modes are no longer supported starting at display version 12 (Ankit)
  • Use polling read for aux control (Arun)
  • DMC firmware no longer requires specific versions (Gustavo)
  • Fix PSR flickering and freeze issues (Jouni)
  • Fix ICL+ DSI GPIO handling (Jani)
  • Ratelimit errors in display engine irqs (Lucas)
  • Fix DP MST DSC bpp and timeslot calculations (Stan)
  • Fix CDCLK squash and crawl sequences (Ville, Anusha)
  • Fix bigjoiner checks for fused pipes (Ville)
  • Fix ADP+ degamma LUT size (Ville)
  • Fix DVO ch7xxx and sil164 suspend/resume (Ville)
  • Fix memory leak in VBT parsing (Xia Fukun)
  • Fix VBT packet port selection for dual link DSI (Mikko Kovanen)
  • Fix SDP infoframe product string for discrete graphics (Clint)
  • Fix VLV/CHV HDMI/DP audio enable (Ville)
  • Fix VRR delays and calculations (Ville)
  • No longer disable transcoder for PHY test pattern change (Khaled)
  • Fix dual PPS handling (Ville)
  • Fix timeout and wait for DDI BUF CTL active after enabling (Ankit)

Merges:

  • Backmerge drm-next to sync up with v6.2-rc1 (Jani)

Especially

  • Fix PSR flickering and freeze issues (Jouni)

Not a Void user, but following this thread for updates as it progresses.

About the random NVMe disconnects:

I’m starting to believe it is either caused by bad WD firmware or my drive is defective.

I’m very inclined to get a Samsung 980 PRO M.2 NVMe SSD and check whether that drive has similar issues after cloning to it with dd.

Edit:
Before going down this road, I’m currently trying different kernels, all with these boot arguments:

BOOT:   cryptdevice=/dev/nvme0n1p3
BOOT:   i915.enable_psr=0 iwlwifi.disable_11ax=Y lang=de locale=de_DE.UTF-8 loglevel=4
BOOT:   nvme_core.default_ps_max_latency_us=0 pcie_aspm=off rd.dm=0 rd.luks.crypttab=1
BOOT:   rd.luks.uuid=cb2d4837-551d-4600-9149-484023cb9c9d rd.luks=1 rd.lvm=1 rd.md=0
BOOT:   resume=UUID=5bbcc5b3-12a7-44a2-8a85-e3d4ba9be391 ro root=/dev/mapper/lvm-void
BOOT:   snd_hda_intel.power_save=0 snd_hda_intel.power_save_controller=N

Edit2:
I’m currently running 6.1.7 with the following tlp settings:

tlp     SOUND_POWER_SAVE_ON_AC=0
tlp     SOUND_POWER_SAVE_ON_BAT=0
tlp     SOUND_POWER_SAVE_CONTROLLER=N
tlp     TLP_DEFAULT_MODE=AC
tlp     TLP_PERSISTENT_DEFAULT=1
tlp     DISK_DEVICES=""

The sporadic wifi disconnects haven’t appeared again. I would blame my AP for this.

@Matt_Hartley
I’ve had success running the above tlp settings on 6.1.7 and have not had a single NVMe disconnect so far (2 days without one). Let’s hope it’s not a coincidence.

I’ll let it run a few more days without restarting, and if all is well, I will have found a configuration that I’ll keep for now.


Nice! Good to hear this!

Yep, that’s great!

I still have these two issues remaining:

  • Display freezes for a short period every 6-7 seconds even with i915.enable_psr=0
    There is no kernel entry about this, audio plays during the whole time and everything else is smooth. Maybe it’s just a gpu buffer getting cleaned aggressively.
    Just realized even the mouse moves smoothly.
  • Audio pops on first module usage (snd_hda_intel)

I hope the first one is fixed with the 6.3 kernel, possibly the 6.2 one as well.

The second one … I have no idea. I’ve disabled audio power saving pretty much everywhere I found related settings and it still pops. It’s annoying, but I can live with that for now.

All in all, I am happy to have bought the laptop now and not earlier, as I’m still having quite a few issues even with recent kernels (I’m on a rolling release; not everybody has the advantage of getting such frequent and up-to-date kernel updates).

Edit:
Update: my NVMe disconnected again this morning after nearly 3 days without a disconnect. What a bummer. I’ll keep investigating; I have already ruled out bad RAM. I guess it’s time to put Ubuntu on my USB stick and see if the issues are the same.

Edit2:
The stutter every 5 seconds is gone! I found a script that runs every 5 seconds and queries xrandr without passing --current, so it always polls for hardware changes. Yay!
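
For anyone with a similar periodic display check, here is a minimal sketch of querying xrandr with --current, which returns the current configuration without re-probing the hardware. This is not the script mentioned above; the function names and the assumption that connected outputs appear as "NAME connected ..." lines are illustrative:

```python
import subprocess


def xrandr_command(poll_hardware=False):
    """Build the xrandr query command; --current skips re-probing the outputs."""
    cmd = ["xrandr", "--query"]
    if not poll_hardware:
        cmd.append("--current")
    return cmd


def connected_outputs(xrandr_text):
    """Return the names of outputs reported as connected in xrandr's output."""
    names = []
    for line in xrandr_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "connected":
            names.append(fields[0])
    return names


def query_outputs():
    """Run xrandr without hardware polling and list the connected outputs."""
    result = subprocess.run(
        xrandr_command(), capture_output=True, text=True, check=True
    )
    return connected_outputs(result.stdout)
```

A status bar or hotplug helper can call query_outputs() on a timer; a full re-probe with poll_hardware=True is then only needed when a change is actually expected.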

Edit3:
I can rule out faulty RAM and bad kernel modules (I’ve tried a lot of stable and unstable kernel versions), as well as any combination of ASPM, ACPI, the NVMe max latency setting and TLP.
Since I’m on the latest firmware for both the laptop and the NVMe, I believe either the drive is faulty, the motherboard has an issue, or the combination is bugged. I will run another test with Ubuntu 22.10 and one with no expansion cards plugged in to see if it makes any difference. If this doesn’t help, I will probably order a Samsung 990 PRO M.2 NVMe SSD (or a Crucial P5 Plus 1TB M.2 PCIe Gen4 NVMe) and hope for reimbursement, as I really don’t trust WD NVMe drives at this point and don’t want a replacement from them.