[TRACKING] Linux freezing on multiple distros

What distros have been affected by the issue besides Fedora and Pop OS?

@bearislive For me, the WD SSD caused the issue. I have had no freezes since replacing it with a Samsung one. Note that it also did not report any SMART errors, and I'm currently reusing it in my Windows machine, where it works perfectly fine. My theory is that some WD SSDs have compatibility issues with Linux.

Do you have a different SSD you could swap in for testing? Alternatively you could try to remove it temporarily and boot from a USB disk for testing.

I think someone else had issues with the wireless card and resolved them by requesting a replacement from Framework. You could try temporarily removing it.

@SinistralFern23 I don't think the freezing issues are necessarily distro dependent. They all cook with water after all.

Interesting, Framework support has now also recommended removing the SSD and running Linux from a USB stick to see if the freezes still happen. Let's see how it goes :slight_smile:

If you have a DVD burner you could try booting from a live Linux disc.

I'm still having freezes and random highlighting; my journal follows. I don't know what any of this means. (BTW, I found that eliminating snapd lowered my laptop temp by about 40c.)

qrp@pop-os:~$ sudo journalctl -k -p4 -b-1 --no-pager
[sudo] password for qrp:
Dec 14 18:00:10 pop-os kernel: ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
Dec 14 18:00:10 pop-os kernel: pci 0000:00:07.0: DPC: RP PIO log size 0 is invalid
Dec 14 18:00:10 pop-os kernel: pci 0000:00:07.1: DPC: RP PIO log size 0 is invalid
Dec 14 18:00:10 pop-os kernel: pci 0000:00:07.2: DPC: RP PIO log size 0 is invalid
Dec 14 18:00:10 pop-os kernel: pci 0000:00:07.3: DPC: RP PIO log size 0 is invalid
Dec 14 18:00:10 pop-os kernel: pnp 00:02: disabling [mem 0xc0000000-0xcfffffff] because it overlaps 0000:00:02.0 BAR 9 [mem 0x00000000-0xdfffffff 64bit pref]
Dec 14 18:00:10 pop-os kernel: resource sanity check: requesting [mem 0xfedc0000-0xfedcdfff], which spans more than pnp 00:02 [mem 0xfedc0000-0xfedc7fff]
Dec 14 18:00:10 pop-os kernel: caller __uncore_imc_init_box+0xc0/0x110 mapping multiple BARs
Dec 14 18:00:10 pop-os kernel: i8042: Warning: Keylock active
Dec 14 18:00:10 pop-os kernel: device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
Dec 14 18:00:10 pop-os kernel: platform eisa.0: EISA: Cannot allocate resource for mainboard
Dec 14 18:00:10 pop-os kernel: platform eisa.0: Cannot allocate resource for EISA slot 1
Dec 14 18:00:10 pop-os kernel: platform eisa.0: Cannot allocate resource for EISA slot 2
Dec 14 18:00:10 pop-os kernel: platform eisa.0: Cannot allocate resource for EISA slot 3
Dec 14 18:00:10 pop-os kernel: platform eisa.0: Cannot allocate resource for EISA slot 4
Dec 14 18:00:10 pop-os kernel: platform eisa.0: Cannot allocate resource for EISA slot 5
Dec 14 18:00:10 pop-os kernel: platform eisa.0: Cannot allocate resource for EISA slot 6
Dec 14 18:00:10 pop-os kernel: platform eisa.0: Cannot allocate resource for EISA slot 7
Dec 14 18:00:10 pop-os kernel: platform eisa.0: Cannot allocate resource for EISA slot 8
Dec 14 18:00:10 pop-os kernel: acpi PNP0C14:01: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:00)
Dec 14 18:00:10 pop-os kernel: usb: port power management may be unreliable
Dec 14 18:00:10 pop-os kernel: cros_ec_lpcs cros_ec_lpcs.0: bad packet checksum 01
Dec 14 18:00:10 pop-os kernel: cros-ec-dev cros-ec-dev.1.auto: cannot get EC features: -74
Dec 14 18:00:10 pop-os kernel: i2c_designware i2c_designware.2: i2c_dw_handle_tx_abort: lost arbitration
Dec 14 18:00:10 pop-os kernel: usb 3-4: device descriptor read/64, error -71
Dec 14 18:00:10 pop-os kernel: usb 3-4: device descriptor read/64, error -71
Dec 14 18:00:10 pop-os kernel: usb 3-4: device descriptor read/64, error -71
Dec 14 18:00:10 pop-os kernel: usb 3-4: device descriptor read/64, error -71
Dec 14 18:00:10 pop-os kernel: system76_acpi: loading out-of-tree module taints kernel.
Dec 14 18:00:10 pop-os kernel: usb 3-4: Device not responding to setup address.
Dec 14 18:00:10 pop-os kernel: usb 3-4: Device not responding to setup address.
Dec 14 18:00:10 pop-os kernel: usb 3-4: device not accepting address 4, error -71
Dec 14 18:00:10 pop-os kernel: usb 3-4: Device not responding to setup address.
Dec 14 18:00:10 pop-os kernel: usb 3-4: Device not responding to setup address.
Dec 14 18:00:10 pop-os kernel: usb 3-4: device not accepting address 5, error -71
Dec 14 18:00:10 pop-os kernel: usb usb3-port4: unable to enumerate USB device
Dec 14 18:00:10 pop-os kernel: FAT-fs (nvme0n1p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
Dec 14 18:00:10 pop-os kernel: FAT-fs (nvme0n1p2): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
Dec 14 18:00:10 pop-os kernel: iwlwifi 0000:aa:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-72.ucode failed with error -2
Dec 14 18:00:10 pop-os kernel: iwlwifi 0000:aa:00.0: api flags index 2 larger than supported by driver
Dec 14 18:00:10 pop-os kernel: spi-nor: probe of spi0.0 failed with error -524
Dec 14 18:00:10 pop-os kernel: thermal thermal_zone5: failed to read out thermal zone (-61)
Dec 14 18:00:11 pop-os kernel: ACPI BIOS Error (bug): Could not resolve symbol [_TZ.ETMD], AE_NOT_FOUND (20220331/psargs-330)
Dec 14 18:00:11 pop-os kernel:
Dec 14 18:00:11 pop-os kernel: No Local Variables are initialized for Method [_OSC]
Dec 14 18:00:11 pop-os kernel:
Dec 14 18:00:11 pop-os kernel: Initialized Arguments for Method [_OSC]: (4 arguments defined for method invocation)
Dec 14 18:00:11 pop-os kernel: Arg0: 00000000b8bd820a Buffer(16) 5D A8 3B B2 B7 C8 42 35
Dec 14 18:00:11 pop-os kernel: Arg1: 00000000a5adede7 Integer 0000000000000001
Dec 14 18:00:11 pop-os kernel: Arg2: 0000000036ea10ab Integer 0000000000000002
Dec 14 18:00:11 pop-os kernel: Arg3: 00000000c51c9ea2 Buffer(8) 00 00 00 00 05 00 00 00
Dec 14 18:00:11 pop-os kernel:
Dec 14 18:00:11 pop-os kernel: ACPI Error: Aborting method _SB.IETM._OSC due to previous error (AE_NOT_FOUND) (20220331/psparse-529)
Dec 14 18:00:12 pop-os kernel: Bluetooth: hci0: Malformed MSFT vendor event: 0x02
Dec 14 18:00:12 pop-os kernel: Bluetooth: hci0: Bad flag given (0x1) vs supported (0x0)
Dec 14 18:00:23 pop-os kernel: usb 3-4: device descriptor read/64, error -71
Dec 14 18:00:23 pop-os kernel: usb 3-4: device descriptor read/64, error -71
Dec 14 18:00:24 pop-os kernel: usb 3-4: device descriptor read/64, error -71
Dec 14 18:00:41 pop-os kernel: usb usb2-port3: unable to enumerate USB device
Dec 14 18:25:56 pop-os kernel: cros_ec_lpcs cros_ec_lpcs.0: bad packet checksum f4
Dec 14 18:54:24 pop-os kernel: ACPI BIOS Error (bug): Could not resolve symbol [_TZ.ETMD], AE_NOT_FOUND (20220331/psargs-330)
Dec 14 18:54:24 pop-os kernel:
Dec 14 18:54:24 pop-os kernel: No Local Variables are initialized for Method [_OSC]
Dec 14 18:54:24 pop-os kernel:
Dec 14 18:54:24 pop-os kernel: Initialized Arguments for Method [_OSC]: (4 arguments defined for method invocation)
Dec 14 18:54:24 pop-os kernel: Arg0: 00000000d40b36cf Buffer(16) 5D A8 3B B2 B7 C8 42 35
Dec 14 18:54:24 pop-os kernel: Arg1: 0000000010fd8b0b Integer 0000000000000001
Dec 14 18:54:24 pop-os kernel: Arg2: 000000007dda4787 Integer 0000000000000002
Dec 14 18:54:24 pop-os kernel: Arg3: 00000000223308ca Buffer(8) 00 00 00 00 00 00 00 00
Dec 14 18:54:24 pop-os kernel:
Dec 14 18:54:24 pop-os kernel: ACPI Error: Aborting method _SB.IETM._OSC due to previous error (AE_NOT_FOUND) (20220331/psparse-529)
qrp@pop-os:~$

Laptop Make: Model > Framework: Laptop AB
OS: Pop!_OS 22.04 LTS x86_64
DE: GNOME 42.3.1
Kernel: 6.0.6-76060006-generic
Shell: bash 5.1.16
WM: Mutter
CPU: 11th Gen Intel i7-1165G7 (8) @ 4.700GHz [123.8°F]
CPU Usage: 7%
Disk (/): 14G / 220G (7%)
GPU: Intel TigerLake-LP GT2 [Iris Xe Graphics]
GPU Driver: i915
Memory: 3346MiB / 15785MiB (21%)
Resolution: 2256x1504
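
A journal like the one above is easier to read once the repeats are collapsed. Here is a minimal sketch that strips the timestamp and host prefix and counts identical kernel messages. The function name summarize_kernel_log is made up for illustration, and the prefix pattern assumes the default journalctl short format shown above.

```shell
# Reads journalctl output on stdin and prints each distinct kernel
# message once, prefixed by how often it occurred (most frequent first).
# Assumes lines of the form "Dec 14 18:00:10 host kernel: message".
summarize_kernel_log() {
    sed 's/^[A-Z][a-z][a-z]  *[0-9][0-9]* [0-9:]* [^ ]* kernel: *//' |
        grep -v '^$' |
        sort | uniq -c | sort -rn
}

# Example (previous boot, warnings and worse, as in the post above):
#   sudo journalctl -k -p4 -b-1 --no-pager | summarize_kernel_log
```

Messages that repeat many times, such as the usb 3-4 descriptor errors or the ACPI _OSC block in this journal, float to the top, which makes it easier to compare logs between a boot that froze and one that did not.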

I am glad others have found solutions. My problem is that there are too many solutions, so we are still shooting in the dark. For instance, several people have identified the SN850 SSD as the culprit, but I have the 250GB - WD_BLACK™ SN750 NVMe™. So maybe it is Western Digital? I cannot afford to try one solution after another, so I think it is for FW to step up and devote some time and money here. Otherwise, the one certain thing is that I will not be a returning customer, and should anyone ask me about FW, I will tell them it will be a lonely experience and to look elsewhere.

I had the exact SSD model you have and also suffered from freezes. Framework support quickly identified it as the problem and replaced it. The freezes were immediately gone :slight_smile:

I can recommend contacting the official Framework support regarding your issue. They were really friendly, quick and helpful.

I have opened a ticket with FW support; when I get a solution I will post it here.

Did you get a solution yet?

I'm undecided whether I should try to find a combination of kernel, laptop firmware, and NVMe firmware that works without those random disconnects, or ask for a replacement, as I have already started to set up my OS on it…

Everyone, if you're experiencing freezing on:

  • Multiple distros
  • And… please don't skip this part, on Live USBs of multiple distros

Then it's time to open a support ticket as this may be a hardware issue.

I'm possibly out of the woods: my recent changes probably improved the situation, as I have had no more disconnects so far.

Will do further tests, especially without charging as I feel this could have been a factor for me as well.

I upgraded the Framework and WD Black firmware to the latest and set "nvme_core.default_ps_max_latency_us=6000" in grub.

This is going to be the recommended course of action, well done. Pop OS users:

sudo kernelstub -a "nvme_core.default_ps_max_latency_us=6000"

Users of grub on other distros:
sudo sed -i '/^GRUB_CMDLINE_LINUX_DEFAULT/ s/"$/ nvme_core.default_ps_max_latency_us=6000"/' /etc/default/grub

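That will append to your existing parameters without messing them up. On most distros you then need to regenerate the GRUB configuration (for example, sudo update-grub on Debian- and Ubuntu-based distros) and reboot for the change to take effect.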
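
For anyone who wants a more defensive variant of the sed one-liner, here is a minimal sketch. It only appends the parameter if its key is not already on the GRUB_CMDLINE_LINUX_DEFAULT line, and it keeps a backup copy first. The function name append_grub_param and the .bak suffix are made up for illustration, and like the sed command above it assumes the line ends in a double quote.

```shell
# append_grub_param FILE PARAM
# Appends PARAM (e.g. key=value) inside the closing quote of the
# GRUB_CMDLINE_LINUX_DEFAULT line in FILE, unless the key is already there.
# PARAM must not contain "/" or "&", since it is pasted into a sed expression.
append_grub_param() {
    file=$1
    param=$2
    key=${param%%=*}

    # Do nothing if the key is already set, so re-running is harmless.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$key="; then
        echo "$key already present in $file, leaving it alone"
        return 0
    fi

    # Keep a backup, then rewrite the file from it.
    cp "$file" "$file.bak" &&
        sed "/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/\"\$/ ${param}\"/" "$file.bak" > "$file"
}

# Example (run as root):
#   append_grub_param /etc/default/grub "nvme_core.default_ps_max_latency_us=6000"
```

If the key is already present with a different value, this sketch deliberately leaves the line alone rather than guessing which value you want.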

Just commenting to say that the value 6000 is not right for everybody.

I ran smartctl -x /dev/nvme0n1p3 to figure out which power states my device uses. I then started with the second-deepest sleep state, which my drive enters at 6000.

I've also set my sleep state to deep (mem_sleep_default=deep).

I found some useful information here:
https://ubuntu-bugs.narkive.com/WoKhDzSP/bug-1678184-new-apst-quirk-needed-for-samsung-512gb-nvme-drive

It looks like some drives have multiple issues, so that may not be successful for everybody.

I hope this will improve the situation for some people at least.
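
After a reboot it is worth confirming that both settings were actually picked up. A minimal sketch, with made-up helper names cmdline_has and active_mem_sleep: the first looks for an exact parameter on the running kernel's command line in /proc/cmdline, the second prints the suspend mode shown in brackets in /sys/power/mem_sleep.

```shell
# cmdline_has PARAM [FILE]
# Succeeds if PARAM appears as a whole word on the kernel command line.
# FILE defaults to /proc/cmdline (the command line of the running kernel).
cmdline_has() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# active_mem_sleep [FILE]
# Prints the active suspend mode, i.e. the entry in [brackets].
# For a file containing "s2idle [deep]" this prints "deep".
active_mem_sleep() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "${1:-/sys/power/mem_sleep}"
}

# Example:
#   cmdline_has "nvme_core.default_ps_max_latency_us=6000" && echo "nvme limit set"
#   active_mem_sleep
```

On kernels where nvme_core exposes the parameter in sysfs (an assumption worth checking on your machine), reading /sys/module/nvme_core/parameters/default_ps_max_latency_us should show the value in effect as well.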

@Anachron what were the power states on that drive?

Edited my own post, great point.

I'm sorry, I have to correct myself again: I was typing from mobile and remembered the command wrong.

It actually is smartctl -x /dev/nvme0.

This is my result:

...

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.00W    5.00W       -    0  0  0  0        0       0
 1 +     3.30W    3.00W       -    0  0  0  0        0       0
 2 +     2.20W    2.00W       -    0  0  0  0        0       0
 3 -   0.0150W       -        -    3  3  3  3     1500    2500
 4 -   0.0050W       -        -    4  4  4  4    10000    6000
 5 -   0.0033W       -        -    5  5  5  5   176000   25000

...

As you can see, the Ex_Lat of power state 4, which I use, is 6000.

It would be great to hear if this solved somebody else's problem as well.
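
To see which states a given limit would still allow on your own drive, something like the following can be piped the smartctl output. It is only a sketch under an assumption: that the limit is compared against each non-operational state's Ex_Lat, which is how the table is read in the post above. The function name states_within_latency is made up for illustration.

```shell
# states_within_latency LIMIT_US
# Reads a smartctl "Supported Power States" table on stdin and prints the
# St number of every non-operational state (Op column "-") whose Ex_Lat
# is less than or equal to LIMIT_US.
states_within_latency() {
    awk -v max="$1" '
        # Data rows have 11 columns and start with the state number.
        NF == 11 && $1 ~ /^[0-9]+$/ && $2 == "-" && ($11 + 0) <= (max + 0) {
            print $1
        }
    '
}

# Example:
#   sudo smartctl -x /dev/nvme0 | states_within_latency 6000
```

With the table above and a limit of 6000 this prints states 3 and 4 and leaves out state 5, whose Ex_Lat is 25000.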

@Anachron yes mine look like this:

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.50W       -        -    0  0  0  0        5     305
 1 +   3.9000W       -        -    1  1  1  1       30     330
 2 +   1.5000W       -        -    2  2  2  2      100     400
 3 -   0.0500W       -        -    3  3  3  3      500    1500
 4 -   0.0050W       -        -    4  4  4  4     1000    9000

I have not had any consistent problems outside of docked scenarios, and I don't think I am hitting any issues with my drive, but I was just interested to see what it looked like on the WD drives. Mine is an SK Hynix P41 Platinum 2TB.

I'm just replying here to confirm that I have not had a single NVMe drive issue, freeze, or anything alike since I updated the Framework firmware and the NVMe firmware and added the kernel parameters mentioned above.

Please, for those who had issues like this, let us know if they're solved or if we can assist you further to get you (and keep you) running!

FW replaced one of my memory sticks, but I think I can trace the problem to Proton VPN. Many months ago FW told me that the Proton Kill Switch had been causing similar problems and advised me to turn off (kill?) the Kill Switch when the computer says it's connected to wifi but there are browser problems such as non-response and freezing. That did the trick: the connection was re-enabled and the browser stopped freezing. This time I think part of the problem was that my then-current OS, Pop!, is not validated by Proton, so I have gone back to Ubuntu, which is. I am no longer having problems with the laptop.

As a reminder, anytime you have issues with a Linux install, start disabling or removing stuff. Proton would be one such example. This is non-default software that can do some creative things. :slight_smile: