Fedora btrfs corrupted - checksum verify failed

Hi!

Just wondering if anyone else has experienced this. My setup is Fedora Linux on a LUKS-encrypted Btrfs partition.

On Saturday my whole system froze (unresponsive) and I decided to hard power it off by holding down the power button. I have had my Framework 13 for almost a year, and do not recall it freezing like this before.

Immediately after that, when I powered it on again, it wouldn’t boot and showed an error, something about my LUKS partition not being an absolute file system path.

I managed to get GRUB to boot in verbose mode so I could see more errors, and apparently I get a BTRFS warning: checksum verify failed on logical.

[ 1653.641632] BTRFS: device label fedora devid 1 transid 105531 /dev/mapper/my_drive (252:0) scanned by mount (14143)
[ 1653.641987] BTRFS info (device dm-0): first mount of filesystem 591cccae-daa2-4bef-b993-33773a1f36b9
[ 1653.641991] BTRFS info (device dm-0): using crc32c checksum algorithm
[ 1653.656118] BTRFS warning (device dm-0): checksum verify failed on logical 163821240320 mirror 1 wanted 0xc285fdcf found 0x48dd5876 level 0
[ 1653.656298] BTRFS warning (device dm-0): checksum verify failed on logical 163821240320 mirror 2 wanted 0xc285fdcf found 0x57cc0fb0 level 0
[ 1653.656308] BTRFS error (device dm-0): failed to read block groups: -5
[ 1653.656773] BTRFS error (device dm-0): open_ctree failed: -5

I have been trying to figure out how to correct the error on the disk, but none of the commands I have tried gets rid of the checksum errors.

The best I have managed to do is find the link below, where someone had the same issue with his BTRFS file system and gave up, simply mounting the drive read-only and trying to save what he could. I have managed to do the same from a live boot image, and recovered most of my files (some are unreadable).

sudo mount -oro,rescue=all /dev/[your partition name] ./mountdir
cp: cannot stat 'DSC07326.ARW': Input/output error
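
For anyone in the same spot, one way to salvage as much as possible from a read-only rescue mount like the one above is to copy file by file and record the failures, rather than stopping at the first I/O error. This is only a minimal sketch: the function name salvage_copy and the log file are made up for illustration, and it assumes file names without embedded newlines.

```shell
#!/bin/sh
# salvage_copy SRC DST LOG  (hypothetical helper, not part of btrfs-progs)
# Copy every regular file under SRC into DST, keeping relative paths.
# Files that cp cannot read are listed in LOG instead of aborting the run.
salvage_copy() {
    src=$1; dst=$2; log=$3
    : > "$log"                      # start with an empty failure log
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
        mkdir -p "$dst/$(dirname "$rel")"
        if ! cp -p -- "$src/$rel" "$dst/$rel" 2>/dev/null; then
            printf '%s\n' "$rel" >> "$log"
            rm -f -- "$dst/$rel"    # drop any partial copy
        fi
    done
}
```

Something like salvage_copy ./mountdir /path/to/backup failed.log would then leave a list of unreadable files (such as the DSC07326.ARW above) in failed.log; the backup path here is a placeholder.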

Has anyone else come across this, and managed to correct the error?

Which Linux distro are you using?

Fedora 44

Which kernel are you using?

6.19.14-200

Which BIOS version are you using?

03.05

Which Framework Laptop 13 model are you using?

AMD Ryzen™ AI 300 Series

Btrfs doesn’t just corrupt itself. Most likely your SSD is failing or you have some other hardware fault. You should restore from backups or try cloning that drive onto another known good drive to investigate further.

Not sure what failing hardware it could be; everything is new.

On BTRFS not corrupting itself, indeed, that seems to be the general sentiment.

What I have learnt is that, if your BTRFS file system does get corrupted, it is not easy to repair.

Do you have a solution to offer up?

Like I said, I would clone it to another drive and go from there. Try running ‘btrfs check --repair’ on the clone, either with the drive in another machine or from a booted Linux ISO.
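
A rough sketch of what the cloning step could look like with GNU ddrescue, which keeps going past unreadable sectors and records them in a map file. The wrapper name clone_partition and the RUN variable are invented for this example, and it only prints the command unless RUN=1 is set. The assumption here is that the still-encrypted partition is cloned, so the copy can be opened with cryptsetup afterwards.

```shell
#!/bin/sh
# clone_partition SRC DST MAPFILE  (hypothetical wrapper around GNU ddrescue)
# Prints the ddrescue command by default; set RUN=1 to actually execute it.
clone_partition() {
    src=$1; dst=$2; map=$3
    if [ "$src" = "$dst" ]; then
        echo "refusing to clone $src onto itself" >&2
        return 1
    fi
    # -f: allow overwriting an existing output device or file
    # -n: skip the slow scraping phase on this first pass
    set -- ddrescue -f -n "$src" "$dst" "$map"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}
```

For example, clone_partition /dev/nvme0n1p3 /mnt/backup/luks.img /mnt/backup/luks.map prints the command for review first; the /mnt/backup paths are placeholders for wherever the known good drive is mounted.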

Hey Jared. Like I said, I have tried everything, and I am already running from a live Linux ISO.

I have backed up what I could, now I am just testing to see if I can do something to correct the checksum error and get the drive booting.

Here is the manual page for btrfs. Manual pages — BTRFS documentation

I have essentially tried every command in btrfs-check and btrfs-rescue.

None of these commands makes any difference; they just report the checksum error.

btrfs check --repair /dev/mapper/my_drive
btrfs rescue fix-data-checksum /dev/mapper/my_drive
btrfs check --init-csum-tree /dev/mapper/my_drive

And so on.

The only command that seemed promising was the zero-log one, but although it didn’t report any error it did not clear the checksum error.

btrfs rescue zero-log /dev/mapper/my_drive

Here is some output from the different commands. First we decrypt the drive:

~$ sudo cryptsetup luksOpen /dev/nvme0n1p3 my_drive
TPM policy does not match current system state. Either system has been tempered with or policy out-of-date: Operation not permitted
Enter passphrase for /dev/nvme0n1p3:
~$ 

Then we can proceed:

~$ sudo btrfs check /dev/mapper/my_drive 
Opening filesystem to check...
checksum verify failed on 163821240320 wanted 0xc285fdcf found 0x48dd5876
checksum verify failed on 163821240320 wanted 0xc285fdcf found 0x57cc0fb0
checksum verify failed on 163821240320 wanted 0xc285fdcf found 0x48dd5876
Csum didn't match
ERROR: failed to read block groups: Input/output error
ERROR: cannot open file system
~$ 
~$ sudo btrfs check --repair /dev/mapper/my_drive 
enabling repair mode
WARNING:

	Do not use --repair unless you are advised to do so by a developer
	or an experienced user, and then only after having accepted that no
	fsck can successfully repair all types of filesystem corruption. E.g.
	some software or hardware bugs can fatally damage a volume.
	The operation will start in 10 seconds.
	Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
checksum verify failed on 163821240320 wanted 0xc285fdcf found 0x48dd5876
checksum verify failed on 163821240320 wanted 0xc285fdcf found 0x57cc0fb0
checksum verify failed on 163821240320 wanted 0xc285fdcf found 0x48dd5876
Csum didn't match
ERROR: failed to read block groups: Input/output error
ERROR: cannot open file system
~$ 
~$ sudo btrfs check --init-csum-tree /dev/mapper/my_drive 
Creating a new CRC tree
WARNING:

	Do not use --repair unless you are advised to do so by a developer
	or an experienced user, and then only after having accepted that no
	fsck can successfully repair all types of filesystem corruption. E.g.
	some software or hardware bugs can fatally damage a volume.
	The operation will start in 10 seconds.
	Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
checksum verify failed on 163821240320 wanted 0xc285fdcf found 0x48dd5876
checksum verify failed on 163821240320 wanted 0xc285fdcf found 0x57cc0fb0
checksum verify failed on 163821240320 wanted 0xc285fdcf found 0x48dd5876
Csum didn't match
ERROR: failed to read block groups: Input/output error
ERROR: cannot open file system

This is the smartctl output on the drive.

~$ sudo smartctl -a /dev/nvme0n1
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-7.0.0-14-generic] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       WD_BLACK SN850X 1000GB
Serial Number:                      24463F801471
Firmware Version:                   620361WD
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      8224
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 8b40f01f40
Local Time is:                      Wed May 13 05:59:20 2026 UTC
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     90 Celsius
Critical Comp. Temp. Threshold:     94 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W    9.00W       -    0  0  0  0        0       0
 1 +     6.00W    6.00W       -    0  0  0  0        0       0
 2 +     4.50W    4.50W       -    0  0  0  0        0       0
 3 -   0.0250W       -        -    3  3  3  3     5000   10000
 4 -   0.0050W       -        -    4  4  4  4     3900   45700

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        34 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    16,579,498 [8.48 TB]
Data Units Written:                 24,870,114 [12.7 TB]
Host Read Commands:                 952,327,449
Host Write Commands:                1,179,385,292
Controller Busy Time:               458
Power Cycles:                       2,225
Power On Hours:                     109
Unsafe Shutdowns:                   117
Media and Data Integrity Errors:    0
Error Information Log Entries:      270
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
No Self-tests Logged
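
To make the long report above easier to compare over time, the handful of counters that usually get looked at first can be pulled out with a small filter. This is a sketch only: the function name nvme_health_summary is made up, and the choice of which fields matter is my assumption, not something smartctl itself defines.

```shell
#!/bin/sh
# nvme_health_summary  (hypothetical helper)
# Reads `smartctl -a` text on stdin and prints selected NVMe health
# counters as name=value lines, with thousands separators removed.
nvme_health_summary() {
    awk -F':[ \t]*' '
        /^(Critical Warning|Percentage Used|Unsafe Shutdowns|Media and Data Integrity Errors|Error Information Log Entries):/ {
            gsub(/,/, "", $2)           # 2,225 -> 2225
            sub(/[ \t]+$/, "", $2)      # trim trailing whitespace
            printf "%s=%s\n", $1, $2
        }'
}
```

Run as sudo smartctl -a /dev/nvme0n1 | nvme_health_summary. On the report above it would show zero media and data integrity errors, alongside 117 unsafe shutdowns and 270 error log entries.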

To remove btrfs checksum errors do the following.

  1. Find out which file is corrupted.
  2. Remove any snapshots that include the file.
  3. Delete the file that contains the checksum failure.
  4. Run btrfs check again; it should then be clean.

If the filesystem were mirrored, it would be able to fix itself, as only one of the mirror copies would be bad.

Hi James! Thanks for the step by step! How do I find out which file is corrupted? Can I identify it/them with the checksum error I have?
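
One way to approach that question is to look at which kind of checksum message the kernel printed. The sketch below sorts kernel log lines into two groups. As far as I know, the "checksum verify failed on logical ... level N" form is logged for metadata tree blocks, while failures in file data are logged as "csum failed root ... ino ... off ..." and carry an inode number. The function name classify_btrfs_csum is made up, and the exact data message format is written from memory, so treat it as an assumption.

```shell
#!/bin/sh
# classify_btrfs_csum  (hypothetical helper)
# Reads kernel log lines on stdin and prints one summary line per
# btrfs checksum failure, tagged as metadata or data.
classify_btrfs_csum() {
    sed -n \
        -e 's/.*checksum verify failed on logical \([0-9][0-9]*\) mirror \([0-9][0-9]*\).*/metadata logical=\1 mirror=\2/p' \
        -e 's/.*csum failed root \([-0-9][0-9]*\) ino \([0-9][0-9]*\) off \([0-9][0-9]*\).*/data root=\1 ino=\2 off=\3/p'
}
```

Used as sudo dmesg | classify_btrfs_csum. For a data line, btrfs inspect-internal inode-resolve <ino> <path> can turn the inode number into a file name, but only on a mounted filesystem and within the subvolume the path points at. Both lines in the log at the top of this thread are of the metadata kind, and both mirror 1 and mirror 2 fail for the same logical address, so there may be no single file behind this particular error.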

@minority

Your disk is most likely failing. I would back up any data and replace the disk/SSD/NVMe.

Does this help?

The log should have told you which file was the problem one.