Oh no! That crash has eaten my Linux installation!

Has anyone else hit something like this?

I have a Framework 13" AMD with a Samsung 990 Pro NVMe stick, still on 3.03 firmware, running Debian 12/Trixie, with SMB shares mounted via CIFS tools and connected through a wired USB-C dongle.

Three times, I’ve had a hard crash when accessing files from the network – so hard that the root partition has suffered corruption and needed a full fsck and eventual reinstall.

I think power management is putting the USB network dongle to sleep but not waking it in time, so the file access fails and takes the whole system down with it. I hope this doesn’t happen to you – has anyone else seen this?
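
For anyone who wants to check the autosuspend theory: on Linux, each USB device's runtime power-management setting is exposed in sysfs, where `power/control` reads `auto` if the kernel may autosuspend the device and `on` if it is kept awake. Below is a minimal Python sketch that lists those settings. The function names and the `root` parameter are made up for illustration. It only reports the current settings and cannot prove that autosuspend is what causes the crash.

```python
from pathlib import Path


def read_attr(path, default=""):
    """Read a small sysfs attribute file, returning default if it is unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return default


def usb_autosuspend_report(root="/sys/bus/usb/devices"):
    """Return (device, product, control) tuples for each USB device under root.

    control is "auto" when the kernel may autosuspend the device and "on"
    when runtime power management is disabled for it.
    """
    report = []
    for dev in sorted(Path(root).glob("*")):
        if ":" in dev.name:
            continue  # skip interface entries such as 1-1:1.0
        control_file = dev / "power" / "control"
        if not control_file.is_file():
            continue
        product = read_attr(dev / "product")  # not every device has one
        report.append((dev.name, product, read_attr(control_file)))
    return report


if __name__ == "__main__":
    for name, product, control in usb_autosuspend_report():
        note = "may autosuspend" if control == "auto" else "kept awake"
        print(f"{name:10} {product or '(no product string)':30} {control:5} {note}")
```

If the dongle shows up as `auto`, writing `on` to its `power/control` file as root keeps it awake until it is replugged or the machine reboots, which would be one way to see whether the crashes stop.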

K3n.

I’ve had similar things happen on my systems, but not any time recently. I think I was using an early version of the btrfs file-system on my boot drive the last couple times I saw them.

(I love btrfs, but I do NOT use or recommend it [or any checksummed file system] on your boot drive, unless you have it in RAID1 or a similar RAID mode. btrfs, at least, won’t let you even access any file that doesn’t match its checksum, and if that file is at all important to the boot process, the system can’t boot, period. At least non-checksummed file systems can usually still boot when a file is only slightly corrupted.)

I had two of these hard crashes about a week ago, and yes, one of them left the machine unbootable (but salvageable by booting from a flash drive and running fsck.ext4). The other ate my Chromium profile and one git repository, leaving behind one directory that can only be renamed, not deleted. (But I can’t be bothered to reinstall…)

I had a 9-day streak without those hard crashes after updating to BIOS 3.05, so I would recommend doing that first. But you will also need to update your AMD GPU firmware, because none of stable, Trixie, or Sid has a version recent enough to fix all known issues.

I thought I had updated my firmware weeks ago, but after one more “soft” crash yesterday (GPU crash only, i.e., the system is still running and accessible over SSH, so it does not result in FS damage), I found out I needed to update the initrd to actually make the replaced firmware load during boot, as described here:

Today I updated the “InstallingDebianOn” guide for the AMD 7040 series Framework; you can take a look at that as well:

You can disable that now and imo it’s a lot better to know you have a borked system than have it partially work.

Ah, I wasn’t aware of that. Thank you for bringing it to my attention.

I can understand your point, but I can’t agree with it. When I sit down at my desktop system to get work done, I need to get that work done. If something is wrong with a boot-drive file that prevents me from booting up the system, then I have to spend time investigating, reinstalling, and setting up the system again before I can do the work.

On my laptop, the situation is worse because I’m usually away from home. If I can’t boot it up, I can’t do anything until I get home and use a working system to get it running again. I’ve partly offset that by always carrying around a thumb drive with the “live” Ubuntu installer for the version I’m using, but that’s a poor substitute for a working system.

If either system worked, even in a degraded form, there’s a good chance that I could get my work done, and then reinstall later when I have some free time.

The best of both worlds would be if btrfs would let the system (attempt to) boot, but the system would then immediately notify the user of the problem. I’m not sure if that’s offered yet; I need to look into the newer capability that you’ve brought to my attention.

There is still something wrong with your boot drive that can cause who knows what. If a file prevents booting when reading it throws an I/O error, it’s probably pretty important.

Great, the system boots, but it’s also just randomly crashing in the middle of the work you are trying to get done, or doing who knows what kind of data corruption in the background.

With btrfs you can still boot your garbled system, but at least you’ll know what files are damaged, so a full reinstall is a lot less likely to be required.

You can set it up like that if you want to.
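
On the point about knowing which files are damaged: if, as described earlier in the thread, btrfs refuses to return data that fails its checksum and the read fails with an I/O error, then a crude way to list affected files is to read everything under a directory and note which reads fail with EIO. The Python sketch below does that. The function names and the `reader` parameter are hypothetical, and `btrfs scrub` is the proper tool for the job, so treat this as an illustration of the idea rather than a replacement.

```python
import errno
import os
import sys


def read_fully(path, chunk_size=1 << 20):
    """Read a file to the end in chunks, discarding the data."""
    with open(path, "rb") as f:
        while f.read(chunk_size):
            pass


def find_io_error_files(root, reader=read_fully):
    """Return the regular files under root whose contents fail with EIO when read."""
    damaged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, FIFOs and device nodes
            try:
                reader(path)
            except OSError as exc:
                if exc.errno == errno.EIO:
                    damaged.append(path)
                # other errors (e.g. permission denied) are ignored here
    return sorted(damaged)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for damaged_path in find_io_error_files(sys.argv[1]):
            print(damaged_path)
    else:
        print("usage: pass one directory to scan")
```

Point it at a specific directory such as a home directory rather than `/`, since `os.walk` will happily descend into other mounted file systems like `/proc`.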