[SOLVED] Various issues of 12th gen with Void Linux

Anachron · January 19, 2023, 7:25am

This is both a tracking issue for my own reference and a support request.

Issues

NVMe still disconnects, though less often than before.
Micro stutters every X seconds even with i915.enable_psr=0
Better than without, but still not gone.
Audio pops on first module usage (snd_hda_intel)
Kernel parameters do not apply automatically,
but can be written “manually” after boot

/sys/module/snd_hda_intel/parameters/power_save:0
/sys/module/snd_hda_intel/parameters/power_save_controller:N
/sys/module/nvme_core/parameters/default_ps_max_latency_us:0

Kernel parameters are not writable
Hint: seems to be set when booting with pcie_aspm=off.

/sys/module/nvme/parameters/noacpi:Y

HDMI Splitter does not reliably work
ToDo: Try to set up EDID manually for it to work
Wifi disconnects randomly (possibly AP)

[19. Jan 13:42] wlp166s0: disassociated from <AP> (Reason: 8=DISASSOC_STA_HAS_LEFT)
[  +2,504500] wlp166s0: authenticate with <AP>
[  +0,018266] wlp166s0: send auth to <AP> (try 1/3)
[  +0,040204] wlp166s0: authenticated
[  +0,000825] wlp166s0: associate with <AP> (try 1/3)
[  +0,008567] wlp166s0: RX AssocResp from <AP> (capab=0x1511 status=0 aid=1)
[  +0,040636] wlp166s0: associated
[  +0,000067] wlp166s0: Limiting TX power to 20 (23 - 3) dBm as advertised by <AP>

Setup

CPU:	12th Gen Intel(R) Core(TM) i5-1240P (16x)
GPU:	Alder Lake-P Integrated Graphics Controller (i915)
WIFI:	74% (wlp166s0)
BAT:	80% (Wear: 2%)
MACH:	03.06 (BIOS) 6.1.6 (KRNL) x86_64 (ARCH)
SWAP:	0GB (49GB/49GB free)
RAM:	0GB (29GB/31GB free)
NVME:	931,5G (nvme0n1, 731100WD)

BOOT:	/vmlinuz-6.1.6_1 cryptdevice=/dev/nvme0n1p3 i915.enable_psr=0 
BOOT:	iwlwifi.disable_11ax=Y lang=de locale=de_DE.UTF-8 loglevel=4 nvme.noacpi=1 
BOOT:	nvme_core.default_ps_max_latency_us=0 pcie_aspm=off rd.dm=0 rd.luks.crypttab=1 
BOOT:	rd.luks.uuid=cb2d4837-551d-4600-9149-484023cb9c9d rd.luks=1 rd.lvm=1 rd.md=0 
BOOT:	resume=UUID=5bbcc5b3-12a7-44a2-8a85-e3d4ba9be391 ro root=/dev/mapper/lvm-void 
BOOT:	snd_hda_intel.power_save=0 snd_hda_intel.power_save_controller=N 

Energy
tlp	PCIE_ASPM_ON_BAT=powersupersave

System
swap	vm.dirty_background_ratio = 5
swap	vm.dirty_ratio = 10
swap	vm.swappiness = 10

Pkg                 Vers
intel-media-driver  22.5.3_1
intel-ucode         20221108_1
libva-intel-driver  2.4.1_1
mesa-demos          8.4.0_3
mesa-dri            22.2.4_2
mesa-vulkan-intel   22.2.4_2

Edit: Updated my setup. I removed nvme.noacpi and replaced it with pcie_aspm=off, which seems to set it.
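To see which of the module parameters on the command line actually took effect, the requested values can be compared with what sysfs reports. The following is a minimal sketch, not a definitive tool: the function name check_params is made up for illustration, and it assumes every module.param=value entry maps to /sys/module/<module>/parameters/<param>. Entries whose prefix is not a module known to sysfs (for example rd.luks=1) are skipped. Boolean parameters may be reported as Y/N even when 1/0 was requested, so the values are printed side by side rather than compared.

```sh
#!/bin/sh
# Print each module.param=value entry from the kernel command line next to
# the value the module currently reports in sysfs.
# $1: file holding the command line (default: /proc/cmdline)
# $2: sysfs module directory       (default: /sys/module)
check_params() {
    cmdline_file=${1:-/proc/cmdline}
    sysfs=${2:-/sys/module}
    tr ' ' '\n' < "$cmdline_file" | while IFS= read -r arg || [ -n "$arg" ]; do
        case "$arg" in
            *.*=*) ;;            # looks like module.param=value
            *) continue ;;
        esac
        name=${arg%%=*}          # e.g. snd_hda_intel.power_save
        want=${arg#*=}           # e.g. 0
        # sysfs uses underscores in module names even if the command line uses hyphens
        module=$(printf '%s' "${name%%.*}" | tr '-' '_')
        param=${name#*.}
        # Skip prefixes that are not modules known to sysfs (rd.*, etc.)
        [ -d "$sysfs/$module" ] || continue
        file="$sysfs/$module/parameters/$param"
        if [ -r "$file" ]; then
            printf '%s: requested=%s current=%s\n' "$name" "$want" "$(cat "$file")"
        else
            printf '%s: requested=%s current=(not exposed in sysfs)\n' "$name" "$want"
        fi
    done
}
```

Running check_params without arguments inspects the live system. A parameter whose current value differs from the requested one is a candidate for being written manually after boot.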

Anachron · January 19, 2023, 8:38am

Note:
Maybe Linux 6.2 will bring some improvements for the CPU

Intel’s In-Field Scan feature which will help system administrators detect faulty CPU cores was introduced in Linux kernel 5.19. However, it was not working properly. Now, Intel engineers have fixed the issues and it will be available in kernel 6.2. Going on with the news from Intel, the Intel On Demand platform, which is basically a pay-to-unlock and subscription-model hardware is receiving some improvements, including rebranding from Software Defined Silicon and some low-level changes as well.

The Alder Lake and Raptor Lake processors are receiving new updates for HWP (hardware P-states) in order to better calibrate the resulting frequencies on hybrid CPUs.

and to the GPU.

Intel drm-intel-next driver is receiving refactoring in the display code. The drm-intel-gt-next driver has also been updated for memory management improvements and some other small changes. With the treatment change in the Intel i915 driver, Mesa 23.0 for Vulkan will be able to deliver performance metrics for Intel Arc Graphics. Furthermore, Intel is making its preparations for bringing Meteor Lake integrated graphics support, which will be the series of CPUs that will be introduced in 2023. There are 5 GPU IDs added for Meteor Lake, but they are disabled as expected.

There are also some misc changes which may be relevant:

USB and Thunderbolt interfaces are being updated as well, with many small changes. The USB driver drops support for some older hardware in Linux kernel 6.2 and brings wake-on-connect and wake-on-disconnect features for the USB4 interface.

Wake-on-connect … interesting, but another potential error source.

Edit: Also see the following fixes

Features and functionality:

Meteorlake display enabling (Animesh, Luca, Stan, Jouni, Anusha)

DP MST DSC support (Stan)

Gamma/degamma readout support for the state checker (Ville)

Enable SDP split support for DP 2.0 (Vinod)

Add probe blocking support to i915.force_probe parameter (Rodrigo)

Enable Xe HP 4tile support (Jonathan)

Refactoring and cleanups:

Color refactoring, especially related to DSB usage (Ville)

DSB refactoring (Ville)

DVO refactoring (Ville)

Backlight register and logging cleanups (Jani)

Avoid display direct calls to uncore (Maarten, Jani)

Add new “soc” sub-directory (Jani)

Refactor DSC platform support checks (Swati)

Fixes:

Interlace modes are no longer supported starting at display version 12 (Ankit)

Use polling read for aux control (Arun)

DMC firmware no longer requires specific versions (Gustavo)

Fix PSR flickering and freeze issues (Jouni)

Fix ICL+ DSI GPIO handling (Jani)

Ratelimit errors in display engine irqs (Lucas)

Fix DP MST DSC bpp and timeslot calculations (Stan)

Fix CDCLK squash and crawl sequences (Ville, Anusha)

Fix bigjoiner checks for fused pipes (Ville)

Fix ADP+ degamma LUT size (Ville)

Fix DVO ch7xxx and sil164 suspend/resume (Ville)

Fix memory leak in VBT parsing (Xia Fukun)

Fix VBT packet port selection for dual link DSI (Mikko Kovanen)

Fix SDP infoframe product string for discrete graphics (Clint)

Fix VLV/CHV HDMI/DP audio enable (Ville)

Fix VRR delays and calculations (Ville)

No longer disable transcoder for PHY test pattern change (Khaled)

Fix dual PPS handling (Ville)

Fix timeout and wait for DDI BUF CTL active after enabling (Ankit)

Merges:

Backmerge drm-next to sync up with v6.2-rc1 (Jani)

Especially

Fix PSR flickering and freeze issues (Jouni)

Matt_Hartley · January 20, 2023, 7:06pm

Not a Void user, but following this thread for updates as it progresses.

Anachron · January 22, 2023, 3:44pm

About the random NVMe disconnects:

I’m starting to believe it is caused either by bad WD firmware or by a defective drive.

I’m very inclined to get a Samsung 980 PRO M.2 NVMe SSD and check whether that drive has similar issues after a dd to it.

Edit:
Before going down this road I’m trying different kernels currently, all with these bootargs:

BOOT:   cryptdevice=/dev/nvme0n1p3
BOOT:   i915.enable_psr=0 iwlwifi.disable_11ax=Y lang=de locale=de_DE.UTF-8 loglevel=4
BOOT:   nvme_core.default_ps_max_latency_us=0 pcie_aspm=off rd.dm=0 rd.luks.crypttab=1
BOOT:   rd.luks.uuid=cb2d4837-551d-4600-9149-484023cb9c9d rd.luks=1 rd.lvm=1 rd.md=0
BOOT:   resume=UUID=5bbcc5b3-12a7-44a2-8a85-e3d4ba9be391 ro root=/dev/mapper/lvm-void
BOOT:   snd_hda_intel.power_save=0 snd_hda_intel.power_save_controller=N

Edit2:
I’m currently running 6.1.7 with the following tlp settings:

tlp     SOUND_POWER_SAVE_ON_AC=0
tlp     SOUND_POWER_SAVE_ON_BAT=0
tlp     SOUND_POWER_SAVE_CONTROLLER=N
tlp     TLP_DEFAULT_MODE=AC
tlp     TLP_PERSISTENT_DEFAULT=1
tlp     DISK_DEVICES=""

The sporadic wifi disconnects haven’t appeared again. I would blame my AP for this.

Anachron · January 25, 2023, 7:16pm

@Matt_Hartley
I’ve had success running the above tlp settings on 6.1.7 and have not had a single NVMe disconnect so far (2 days without one). Let’s hope it’s not a coincidence.

I’ll let it run a few more days without restarting, and if all is well I have found a configuration that I’ll keep for now.

Matt_Hartley · January 25, 2023, 7:18pm

Nice! Good to hear this!

Anachron · January 25, 2023, 7:32pm

Yep, that’s great!

I still have these two issues remaining:

Display freezes for a short period every 6-7 seconds even with i915.enable_psr=0
There is no kernel entry about this, audio plays during the whole time and everything else is smooth. Maybe it’s just a gpu buffer getting cleaned aggressively.
Just realized even the mouse moves smoothly.
Audio pops on first module usage (snd_hda_intel)

I hope the first one is fixed with the 6.3 kernel, possibly the 6.2 one as well.

The second one … I have no idea. I’ve disabled audio power saving pretty much everywhere I found related settings and it still pops. It’s annoying, but I can live with that for now.

All in all, I am happy to have bought the laptop now and not earlier, as I’m still having quite a few issues even with recent kernels (I’m on a rolling release; not everybody has the advantage of getting up-to-date kernels so frequently).

Edit:
Update, my nvme disconnected again this morning after nearly 3 days of not doing so. What a bummer. I’ll keep investigating, I already ruled out bad RAM. I guess it’s time to put Ubuntu on my usb stick and see if the issues are the same.

Edit2:
The stutter every 5 seconds is gone! I found a script that runs every 5 seconds and queries xrandr without passing --current, so every call polled for hardware changes. Yay!
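For anyone with a similar polling script, here is a minimal sketch of the fix. The function names connected_outputs and watch_outputs are made up for illustration, and the original script is not shown in this thread, so this is not a copy of it. The only point is the --current flag, which tells xrandr to return the configuration the X server already knows instead of re-probing the hardware.

```sh
#!/bin/sh
# Read xrandr output on stdin and print the names of connected outputs.
connected_outputs() {
    awk '$2 == "connected" { print $1 }'
}

# Report changes in the set of connected outputs every $1 seconds (default 5).
# --current avoids the hardware re-probe that caused the periodic stutter.
watch_outputs() {
    interval=${1:-5}
    previous=
    while :; do
        current=$(xrandr --current | connected_outputs)
        if [ "$current" != "$previous" ]; then
            printf 'connected outputs: %s\n' "$(printf '%s' "$current" | tr '\n' ' ')"
            previous=$current
        fi
        sleep "$interval"
    done
}
```

As far as I understand, hotplug events still update the server's view of the outputs, so a script like this should keep noticing newly attached displays. Treat that as an assumption to verify on your own setup.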

Edit3:
I can rule out faulty RAM and bad kernel modules (I’ve tried a lot of stable and unstable kernel versions), as well as any combination of ASPM/ACPI/NVMe max latency settings and TLP.
Since I’m on the latest firmware for both the laptop and the NVMe drive, I believe either the drive is faulty, the motherboard has an issue, or the combination is bugged. I will run another test with Ubuntu 22.10 and one with no expansion cards plugged in to see if it makes any difference. If this doesn’t help, I will probably order a Samsung 990 PRO M.2 NVMe SSD (or a Crucial P5 Plus 1TB M.2 PCIe Gen4 NVMe) and hope for reimbursement, as I really don’t trust WD NVMe drives at this point and don’t want a replacement from them.

Anachron · February 13, 2023, 5:09pm

So I’ve actually bought the Samsung 990 Pro and cloned all my data to it, and it has now been running for 6+ days without a single disconnect!

I now strongly believe that my WD NVMe drive is faulty. I’ve asked Framework Support about sending the drive back for a refund, as I’m very happy with my new Samsung NVMe drive.

There are still some issues left, which I plan to work through in the coming weeks, but nothing that really stops me from being productive.

Matt_Hartley · February 13, 2023, 7:19pm

Going to mark this as resolved, but we can always switch back to tracking later if need be.

Anachron · February 20, 2023, 4:35pm

@Matt_Hartley in fact, since I switched the NVMe drive to a Samsung one two weeks ago, I haven’t had a single disconnect. So that confirms it: my drive was faulty.

I’ll try to revert some of the changes, mainly the nvme and aspm kernel parameters, to see if I can finally remove all this cruft now that the NVMe drive works.

I’m very pleased with my setup now! It’s fast, silent and very convenient. This laptop rocks!

Matt_Hartley · February 21, 2023, 7:15pm

Delighted to hear you’ve gotten this resolved with a different drive. For the sake of tracking, what drive did you have previously once again? The model of WD specifically. Thanks

Anachron · February 21, 2023, 8:05pm

I got the 1TB - WD_BLACK™ SN770 NVMe™ one.

My current todos are:

Receive and setup eGPU
Install a Windows 11 VM on QEMU with SSD and eGPU passthrough

My current issues are:

dhcpcd logs requesting DHCPv6 information every 10 seconds
ISSUE: dhcpcd 9.4.1 requests DHCPv6 info every 10 seconds · Issue #80 · NetworkConfiguration/dhcpcd · GitHub
HDMI Splitter does not reliably work
ToDo: Try to set up EDID manually for it to work
Sometimes expansion ports stop working and need replugging
IDEA: Related to kernel updates?
Sometimes does not wake up from suspend
IDEA: Remove ACPI/NVME fixes
Audio pops on first module usage (snd_hda_intel)
BETTER: Setup modprobe.blacklist=hid_sensor_hub,pcspkr,snd_pcsp

My past issues:

Short graphical stutters every X seconds even with i915.enable_psr=0
Better than without, but still not gone.
DONE: xrandr needs --current to not poll for changes
error root: ACPI action undefined: PNP0C0A:00
Fix inside /etc/acpi/handler.sh
DONE: Added case condition
NVME disconnects sporadically
DONE: Replaced drive with a Samsung 990 Pro
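For the "ACPI action undefined: PNP0C0A:00" entry above, here is a rough sketch of what such a case condition can look like. It is a hypothetical excerpt in the style of an acpid handler script, not the actual contents of /etc/acpi/handler.sh. The function name handle_event, the BAT0/BAT1 labels and the fallback messages are assumptions, and echo stands in for whatever logging the real script uses so that the sketch runs anywhere.

```sh
#!/bin/sh
# acpid passes the event split into arguments,
# e.g. "battery PNP0C0A:00 00000080 00000001".
handle_event() {
    case "$1" in
        battery)
            case "$2" in
                BAT0|BAT1|PNP0C0A:00)
                    # Known battery device: nothing to do, and nothing to log.
                    ;;
                *)
                    echo "ACPI action undefined: $2"
                    ;;
            esac
            ;;
        *)
            echo "ACPI group/action undefined: $1 / $2"
            ;;
    esac
}
```

The idea is simply that the device name the firmware reports needs its own branch, so it no longer falls through to the catch-all that writes the error.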

Matt_Hartley · February 21, 2023, 9:58pm

Thanks for the details above. On the point of the HDMI splitter, something I had to do for my display card was to set up a kernel parameter as sometimes the DP card wouldn’t kick on when I also had HDMI connected.

For me, I used video=DP-1:1920x1080M@60, which gave me the desired resolution and refresh rate (change these to your desired settings). It takes effect at the login screen. Basically it forces the port to wake up regardless of other settings. So for you, it would be video=HDMI-1 or -2, then set the resolution.

Likely a TLP thing, see mine attached:
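To find the connector name to put into video=, the kernel's own list under /sys/class/drm can be read. The sketch below is only an illustration and list_connectors is a made-up name. It is worth checking because the kernel's connector names may differ from what xrandr shows (for example HDMI-A-1 instead of HDMI-1), and that difference is an assumption you should verify on your machine.

```sh
#!/bin/sh
# Print "<connector> <status>" for every connector the kernel exposes.
# $1: DRM class directory (default: /sys/class/drm)
list_connectors() {
    drm=${1:-/sys/class/drm}
    for status in "$drm"/card*-*/status; do
        [ -r "$status" ] || continue    # glob matched nothing
        dir=${status%/status}           # .../card0-HDMI-A-1
        entry=${dir##*/}                # card0-HDMI-A-1
        printf '%s %s\n' "${entry#card*-}" "$(cat "$status")"
    done
}
```

Calling list_connectors with no argument reads the live system. The name in the first column is the one to try in the video= parameter.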

I’d be interested to hear if anything here could be TLP related.

Anachron · March 7, 2023, 8:02pm

I got a Razer Core X eGPU case with an ASRock RX 6600 GPU and guess what?

$ DRI_PRIME=1 glmark2
=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      AMD
    GL_RENDERER:    AMD Radeon RX 6600 (navi23, LLVM 12.0.1, DRM 3.49, 6.1.15_1)
    GL_VERSION:     4.6 (Compatibility Profile) Mesa 22.3.5
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 1041 FrameTime: 0.961 ms
[build] use-vbo=true: FPS: 1358 FrameTime: 0.737 ms
[texture] texture-filter=nearest: FPS: 1301 FrameTime: 0.769 ms
[texture] texture-filter=linear: FPS: 1303 FrameTime: 0.767 ms
[texture] texture-filter=mipmap: FPS: 1300 FrameTime: 0.770 ms
[shading] shading=gouraud: FPS: 1060 FrameTime: 0.944 ms
[shading] shading=blinn-phong-inf: FPS: 1082 FrameTime: 0.925 ms
[shading] shading=phong: FPS: 1296 FrameTime: 0.772 ms
[shading] shading=cel: FPS: 1294 FrameTime: 0.773 ms
[bump] bump-render=high-poly: FPS: 1300 FrameTime: 0.769 ms
[bump] bump-render=normals: FPS: 1300 FrameTime: 0.769 ms
[bump] bump-render=height: FPS: 1301 FrameTime: 0.769 ms

It works without any issues. In fact, the fans do not even start spinning because it’s not enough load … I guess I will try to run Unigine now to really check what this GPU is capable of.

Oh and also congratz to the Framework team! I connected my eGPU, rebooted and … that was it! I can use my external eGPU.

Anachron · March 7, 2023, 9:26pm

Check my benchmark results here: [GUIDE] eGPU performance tests using AMD RX 580 on Linux - #5 by Anachron

My next ToDo will be to replace those noisy eGPU fans with some Noctua ones.

Sarah_Gould · March 26, 2023, 5:04am

I’m interested in trying out Void Linux myself (currently using PopOS), but I’m not sure I understand how to apply the changes you made - but maybe with the continued open issues I should hold off for now.

Anachron · March 26, 2023, 10:04am

Hey Sarah,

I am planning to update this.

The issues I have are not distro specific, as you can find out by searching the forums.

The fixes, however, sometimes are, because some tools/packages don’t exist on other distros or are not the default there.

I’d suggest at least giving Void a try.

variegated.vanilla · September 28, 2024, 4:54pm

Hi all,

I’m experiencing what seems like the same issue on the AMD Framework 13" DIY running Fedora 40. This has been difficult to diagnose* because the whole system freezes when the harddisk becomes read only.

Wondering if anyone could help me confirm this is the WD Black drive… If this is not the same issue described in this thread please let me know and I will create a separate issue.

* (For a long time I was having issues related to sleep/wake on AMD, confounding this issue. Then because of the non-responsiveness I thought I was running out of memory or something).

Hardware Information:

Hardware Model: Framework Laptop 13 AMD Ryzen 7040Series
Memory: 16.0 GiB
Processor: AMD Ryzen™ 5 7640U w/ Radeon™ 760M Graphics × 12
Graphics: AMD Radeon™ 760M
Disk Information: WD_BLACK SN770 250GB (731100WD) BTRFS/LUKS

Software Information:

Firmware Version: 03.05
OS Name: Fedora Linux 40 (Workstation Edition)
OS Build: (null)
OS Type: 64-bit
GNOME Version: 46
Windowing System: Wayland
Kernel Version: Linux 6.6.6-200.fc39.x86_64

Issue Description:

I’ll be using my computer as usual when suddenly it freezes up. Most typically I am in Firefox or VSCode. Usually an issue with the current application presents first** then within a few seconds the whole system freezes. This happens sometimes more than once per day, sometimes only every few days.

** (e.g. “Firefox is not responding” or VSCode cannot save the file because the disk is read only).

Usually it’s a total lockup. I cannot switch to a virtual terminal. The computer is completely unresponsive and I must do a hard power off.

Once I was able to leave dmesg --follow-new in an open terminal to catch the issue

Sometimes I can ask the system to reboot
When in this state where the display is not locked up, I cannot run any applications

The power controls even disappear from the GNOME power menu

When I do a soft power off I see daemons and unmounts failing left and right… The machine fails to power itself off.

Another example…

Chris_J · September 28, 2024, 5:45pm

Firstly, avoid bumping old posts. This post was about 12th gen with Void Linux not AMD with Fedora…

Anyway, I had that exact issue many times in the past but I actually don’t remember how I got it fixed… But, some troubleshooting steps can be done:

Update your kernel to the latest version
Run memtest86 to see if your RAM is defective
Remove any unnecessary extensions from GNOME
Run sudo journalctl -b -p 3..4 to see errors and warnings from the current boot
Run sudo dmesg > kernel.log to save the kernel logs if you can or sudo dmesg to look at the logs while your disk is in read-only mode
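To confirm that the freeze really coincides with the disk going read-only, the mount table can be checked. This is a minimal sketch with a made-up function name, readonly_mounts. It only looks at filesystems backed by /dev/ devices and at the ro flag in the mount options.

```sh
#!/bin/sh
# Print mount points of block-device filesystems that are mounted read-only.
# $1: mount table to read (default: /proc/mounts)
readonly_mounts() {
    awk '$1 ~ /^\/dev\// && $4 ~ /(^|,)ro(,|$)/ { print $2 }' "${1:-/proc/mounts}"
}
```

If readonly_mounts prints / while the system is misbehaving, the root filesystem has been remounted read-only. In that state the kernel log cannot be saved to the affected disk, so redirect sudo dmesg to a file on another device, such as a USB stick.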

If you were experiencing this when you first got your laptop, then some of your hardware is defective so:

If RAM was defective, then replace it
Run sudo smartctl -a /dev/nvme0 to see the health of the SSD
(BACK UP YOUR DATA FIRST!) If you see warnings with the above command, then your SSD is defective and it needs to be replaced
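The smartctl output is long, so here is a small sketch that pulls out two fields worth looking at first. The function name nvme_health_summary is made up, and the sketch assumes the usual "Critical Warning" and "Media and Data Integrity Errors" lines that smartctl prints for NVMe drives. It reads the report on stdin.

```sh
#!/bin/sh
# Summarise an NVMe "smartctl -a" report read from stdin.
# Exit status is 1 if a critical warning or media errors are reported.
nvme_health_summary() {
    awk -F': *' '
        /^Critical Warning/                { if ($2 != "0x00") bad = bad " critical-warning=" $2 }
        /^Media and Data Integrity Errors/ { if ($2 != "0")    bad = bad " media-errors=" $2 }
        END {
            if (bad) { print "check drive:" bad; exit 1 }
            print "no warnings reported"
        }
    '
}
```

Usage would be sudo smartctl -a /dev/nvme0 | nvme_health_summary. A clean summary does not rule out firmware or link problems, so it is only one data point.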

If any of your hardware is defective, then you can probably get new ones for free since SSDs and RAM usually have 3+ year warranties.

Also, remember to contact Framework support if you received the RAM and SSD from them.