[SOLVED] Linux Mint 20.3 Cinnamon Recurring Problems

I also suspect faulty RAM: Firefox (like browsers in general) uses memory very heavily, so a broken RAM stick could very well cause sudden crashes.

I’d run memtest again, swap the RAM sticks between slots, and/or test with only a single stick installed.

I echo this suggestion; it’s worth ruling this out.

I only have one stick; the new one I ordered shipped a couple days ago.
I’m afraid to try memtest86 again on my own after the first time (and I lost that thumb drive somehow), so I will be waiting to try the new memory stick because that should be conclusive.

I just found that in a session where Firefox crashes, LibreOffice is prone to crashing too.

I have a very similar issue on a different distribution.

The symptoms

  • Random segfaults in some programs:
    • Firefox
    • Firefox tabs
    • Saleae Logic 2
    • all programs using any version of Java
  • Recurrent file system corruption

My machine

  • Framework DIY i5 12th gen
  • 2*8 GiB of RAM
  • Linux Kernel 6.1.12-1
  • Clean install of Manjaro Linux GNOME
  • SN770 NVMe with EXT4 + LUKS full disk encryption

My findings

The segfaults don’t seem to be caused by file corruption: I got some on a clean install, and fsck reported the FS as clean afterwards.

The FS corruption is correlated with unsafe NVMe shutdowns. You can check the number of unsafe shutdowns with:
sudo smartctl -a /dev/nvme0 | grep 'Unsafe Shutdowns'

An unsafe NVMe shutdown will often (but not always) cause FS corruption.

I can reproducibly trigger an unsafe NVMe shutdown with the following process:

  • Install Manjaro Linux with full disk encryption (not sure if the distro or the encryption plays a role here)
  • Install smartctl
  • Check the number of unsafe NVMe shutdowns
  • Reboot
  • When GRUB asks for the decryption password, press the power button to shut down
  • Power the PC on again and log in (if you can, because FS corruption could have occurred)
  • Check the number of unsafe NVMe shutdowns (it has increased by 1)

I can ALSO reproducibly trigger an unsafe NVMe shutdown with the following process:

  • Install Manjaro Linux with full disk encryption (not sure if the distro or the encryption plays a role here)
  • Install smartctl
  • Add nvme.noacpi=1 to the GRUB options
  • Reboot and log in
  • Check the number of unsafe NVMe shutdowns
  • Shut down the PC by holding the power button
  • Power the PC on again and log in (if you can, because FS corruption could have occurred)
  • Check the number of unsafe NVMe shutdowns (it has increased by 1)
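
For anyone who wants to repeat the "check, reboot, check again" steps above without comparing numbers by hand, here is a minimal shell sketch. The function names and the state file path are made up for illustration, and /dev/nvme0 is simply the device used earlier in this post. It also assumes smartctl prints the counter on a line starting with "Unsafe Shutdowns:", possibly with thousands separators.

```shell
#!/bin/sh
# Sketch: track the NVMe "Unsafe Shutdowns" counter across reboots.
# Assumptions: smartctl is installed, the drive is /dev/nvme0, and
# STATE_FILE is a hypothetical location chosen for this example.

STATE_FILE="${STATE_FILE:-$HOME/.unsafe_shutdowns.last}"

# Read `smartctl -a` output on stdin and print the counter as a bare integer
# (any thousands separators or padding are stripped).
parse_unsafe_shutdowns() {
    awk -F: '/^Unsafe Shutdowns/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Print how much the counter grew: count_delta PREVIOUS CURRENT
count_delta() {
    echo $(( $2 - $1 ))
}

# Report the change since the last saved value, then save the current value.
record_and_compare() {
    current=$(sudo smartctl -a /dev/nvme0 | parse_unsafe_shutdowns)
    if [ -z "$current" ]; then
        echo "could not read the Unsafe Shutdowns counter" >&2
        return 1
    fi
    if [ -f "$STATE_FILE" ]; then
        previous=$(cat "$STATE_FILE")
        echo "Unsafe shutdowns: $previous -> $current (+$(count_delta "$previous" "$current"))"
    fi
    echo "$current" > "$STATE_FILE"
}
```

Source the file and run record_and_compare once before the reboot and once after logging back in; the second run prints how much the counter grew.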

A partial workaround

  • don’t use the power button to power off the PC
  • don’t use nvme.noacpi=1

This fixes the FS corruption (for now), but it doesn’t fix the random segfaults, even on a clean install.
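
To double-check that nvme.noacpi=1 is really gone from the running kernel, something like the following sketch could help. The function name and its optional file argument are my own invention; it only assumes that the running kernel exposes its boot parameters in /proc/cmdline.

```shell
# Sketch: succeed (exit status 0) if nvme.noacpi=1 is on the kernel command line.
# The optional argument (a file to read instead of /proc/cmdline) exists only
# to make the function easy to try out; it is not part of any standard tool.
has_nvme_noacpi() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put each parameter on its own line, then look for an exact match.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'nvme\.noacpi=1'
}
```

Usage: `has_nvme_noacpi && echo "nvme.noacpi=1 is still set"`. If it is still set, the option is probably still present in the GRUB configuration (typically /etc/default/grub, though that may differ per distro).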

What I will test next

  • a clean install WITHOUT encryption
  • Ubuntu

@Cl00e9ment Make sure the firmware for the NVMe drive is up to date.

@uraneaUmbra Let us know once your new stick shows up. Good luck.

sudo smartctl -a /dev/nvme0 | grep 'Firmware'

Firmware Version:                   731030WD
Firmware Updates (0x14):            2 Slots, no Reset required

I don’t know if it is up to date.

Try using fwupdmgr. It may just show that the drive is updatable along with its current firmware version, in which case you will have to look on the manufacturer’s website for that drive and determine whether a newer version is available.
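
A rough sketch of what that could look like, assuming fwupd is installed and the drive is supported by it (not every drive is). The wrapper function name is made up; the three fwupdmgr subcommands are the standard ones.

```shell
# Sketch: list devices known to fwupd and any pending firmware updates.
# check_firmware_updates is a hypothetical helper name.
check_firmware_updates() {
    if ! command -v fwupdmgr >/dev/null 2>&1; then
        echo "fwupdmgr not found; install fwupd first" >&2
        return 1
    fi
    fwupdmgr refresh       # fetch the latest firmware metadata
    fwupdmgr get-devices   # list detected devices and their current versions
    fwupdmgr get-updates   # list updates available for those devices
}
```

If the drive shows up in the device list but no update is offered, that only means fwupd has nothing newer for it, so comparing the reported version against the manufacturer’s site is still worthwhile.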

The NVMe firmware is up to date.

The new stick of memory arrived, and with it installed I was finally able to install Linux Mint 21.1 MATE!
Still on edge, since I don’t know what originally made it break, but there are no issues ten minutes in and with updates installed.

Thank you all again for all of your help. Here’s to hoping I don’t return to this thread with anything new.
And good luck, Cl00e9ment.

Since this was solved for OP, marking solved.

All, the thread is open. If others experience issues, feel free to add to this thread. But this is marked as solved.

I ran memtest86 and, sure enough, I have a faulty RAM stick too. I’m relieved to have found the root cause of those issues!

After removing the faulty RAM stick, everything seems to work great. I’ll contact Framework support to have it exchanged.

I don’t know how many people are affected by this defect. Evidently, it’s not an isolated issue that affects only one person. Framework should have a word with Crucial, because their QA is not incredibly good.

Delighted to hear that the root cause has been identified. I appreciate the feedback on the Crucial RAM as well. Stuff like this is very much handled reactively rather than proactively. As a percentage of overall volume, failing RAM is incredibly rare. That said, I totally get how frustrating this is. I do.

Please do contact Framework support to address the bad RAM. And again, I appreciate the feedback on the bad RAM experience.
