Full-Size SD Card Data Corruption Issue

ryanpetris · September 4, 2024, 12:53pm

I installed Kali Linux on a SanDisk Extreme PRO UHS-II 256GB SD card using BTRFS and noticed that I was getting some BTRFS hash mismatch errors in the logs, and was wondering if anyone noticed the same.

I also have an Arch Linux install on another SD card (same brand/model, purchased at the same time) and it does not have any data corruption issues.

I tried copying the installation of Kali onto another SD card (again, same brand/model, purchased at the same time) and I still experience the data corruption issues on the new SD card. However, if I mount the partitions from either Kali SD card on an Arch installation and manipulate the filesystem that way, no corruption happens.

Thinking that maybe the kernel version had something to do with it (6.8 on Kali vs 6.10 on Arch), I installed the 6.10 kernel from the kali-experimental repository; however, the issue remained.

So my guess is that there’s something different about the Kali kernel that’s causing a data corruption issue that the Arch kernel is not.

Additionally, I tried another SD card reader, some Anker USB-C hub, that has a UHS-I SD card reader built-in, and while it only operates at half the speed it does not result in any data corruption while running Kali.

Also note that I have three of the full-size SD card adapters and all of them exhibit the same issue, so it’s not a matter of just having a bad SD card reader.

Has anyone else run into this problem? Or am I the only one crazy enough to install a Linux distro on an SD card? (even though this happens all the time for Raspberry Pis and such…)

Edit: Just so it’s explicit:

Which OS (Operating System)? Kali Linux
Which release of your OS (Operating System / Windows 10, 11)? Up-to-date kali-rolling
Which Framework laptop (11th, 12th or 13th generation Framework laptop, Chromebook or Framework Laptop 16) are you asking for support with? 13 7840U

tom_chiverton · September 4, 2024, 1:27pm

BTRFS trashed my /home about 45 minutes into owning a new Framework under Ubuntu 22.04, so I’d point the finger there before anything else.

ryanpetris · September 4, 2024, 2:49pm

I’ve run BTRFS for several years now on several machines with different configurations without issue; it’s not that. As I mentioned, it doesn’t happen on Arch but does on Kali. I’m trying to find answers as to why this is happening, as I can’t trust the SD card readers if they’re corrupting data. Please don’t turn this into a filesystem bashing thread. Thank you.

tom_chiverton · September 10, 2024, 1:58pm

Trying a different filesystem type will rule that out as being the issue.

Adrian_Joachim · September 10, 2024, 2:03pm

Ext4 (or any other non error checking fs) will just not tell you about the corruption until it gets really bad XD

tom_chiverton · September 10, 2024, 2:33pm

Ext4 will fail its fsck if it’s being corrupted. Or fill the disk with identical 1 GB files of /dev/urandom data, each named for its sha1sum, and see what happens. Many, many ways to check the integrity of storage hardware.
And that’s before we get into smartctl, hdparm and the like.

You are using an unsupported O/S for the hardware. On a variety of strange devices.

You’ll need to reduce the variables in the search space in order to pinpoint the issue.
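The file-based check suggested above (random data written to files named after their own SHA-1, then re-hashed later) could be sketched roughly as follows. This is only an illustration, not a tool anyone in the thread used: the function names, the temporary `.part` suffix, and the choice to write distinct random files rather than copies of one file are all assumptions.

```python
import hashlib
import os
from pathlib import Path


def write_hashed_files(target_dir, count, size_bytes, chunk=1 << 20):
    """Write `count` files of random data, each named after its own SHA-1.

    Hypothetical helper: names and layout are illustrative only.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(count):
        tmp = target / f"tmp-{i}.part"
        digest = hashlib.sha1()
        remaining = size_bytes
        with open(tmp, "wb") as fh:
            while remaining > 0:
                block = os.urandom(min(chunk, remaining))
                digest.update(block)
                fh.write(block)
                remaining -= len(block)
            fh.flush()
            os.fsync(fh.fileno())  # push the data out to the card
        final = target / digest.hexdigest()
        tmp.rename(final)
        written.append(final)
    return written


def verify_hashed_files(target_dir, chunk=1 << 20):
    """Return the files whose content no longer matches the SHA-1 in their name."""
    bad = []
    for path in sorted(Path(target_dir).iterdir()):
        if not path.is_file() or path.name.endswith(".part"):
            continue
        digest = hashlib.sha1()
        with open(path, "rb") as fh:
            for block in iter(lambda: fh.read(chunk), b""):
                digest.update(block)
        if digest.hexdigest() != path.name:
            bad.append(path)
    return bad
```

Something like `write_hashed_files("/mnt/sdcard/test", 200, 1 << 30)` followed later by `verify_hashed_files("/mnt/sdcard/test")` would be the intended use (the mount point is a placeholder). Reading the files back immediately may be served from the page cache rather than the card, so unmounting and remounting (or rebooting) between the write and verify passes makes the check more meaningful.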

Adrian_Joachim · September 10, 2024, 4:30pm

You do have to run those deliberately, though, so corruption is a lot less obvious if you aren’t looking for it. btrfs by default will throw some pretty unpleasant io errors instead of returning corrupted data, which can be kind of inconvenient (but you can configure it to just warn and give you what it has instead now)

But on a more serious note, the probability of the file system itself corrupting data is really low these days; much more likely a memory or other hardware/driver/firmware issue. If you have solid reason to believe it really was btrfs itself corrupting your stuff, please file a bug report so that can be looked into; it would be really bad if true. Though afaik the only data corruption bugs mainline btrfs had (ages ago) were raid5/6 related.

My btrfs works just fine XD (my install is even a sketchy ext4 to btrfs conversion cause I was too lazy to reinstall, it’s not recommended to do that though and I did have backups)

Filling the drive with known data and reading it back using dd or something is probably the easiest solution to entirely sidestep fs issues. Or maybe one of those fake flash testing utilities.
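A rough sketch of the "known data, read it back" idea: instead of dd, it derives each block from a seed so the expected content can be regenerated at verify time and the offsets of bad blocks reported. Everything here (function names, the `sdtest` seed, using SHAKE-128 as the pattern generator) is an assumption chosen for illustration. It is written against an ordinary file path; pointing it at a raw device node would overwrite that device and is not something this sketch has been checked against.

```python
import hashlib
import os


def pattern_block(seed, index, size):
    """Derive block `index` of the test pattern deterministically from `seed`."""
    return hashlib.shake_128(f"{seed}:{index}".encode()).digest(size)


def write_pattern(path, total_bytes, seed="sdtest", block_size=1 << 20):
    """Fill `path` with `total_bytes` of reproducible pseudo-random data."""
    with open(path, "wb") as fh:
        offset = 0
        index = 0
        while offset < total_bytes:
            size = min(block_size, total_bytes - offset)
            fh.write(pattern_block(seed, index, size))
            offset += size
            index += 1
        fh.flush()
        os.fsync(fh.fileno())  # make sure the data actually leaves the cache


def verify_pattern(path, total_bytes, seed="sdtest", block_size=1 << 20):
    """Return the byte offsets of blocks that differ from the expected pattern."""
    bad_offsets = []
    with open(path, "rb") as fh:
        offset = 0
        index = 0
        while offset < total_bytes:
            size = min(block_size, total_bytes - offset)
            if fh.read(size) != pattern_block(seed, index, size):
                bad_offsets.append(offset)
            offset += size
            index += 1
    return bad_offsets
```

The same seed and block size have to be used for both passes. As with any read-back test, the verify pass should run after the cache has been dropped or the card re-inserted, otherwise the comparison may never touch the reader at all.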

ryanpetris · September 10, 2024, 5:22pm

Ext4 will fail fsck if the filesystem itself, not the data, is corrupted. Checksumming was added a while ago but that was only for metadata. The actual data is not checksummed or otherwise validated in any way.

There are only a few filesystems that actually checksum and validate the data: BTRFS, ZFS, and I believe Bcachefs. Thus, using any other filesystem is really just masking the issue.

ryanpetris · September 10, 2024, 5:27pm

BTRFS has the same write hole problem that MDADM and many other parity RAID implementations have; the only reason why BTRFS considers it a bug and MDADM does not is the guarantee of always being consistent, which the write hole breaks. That can be mitigated, however, by using RAID5/6 for DATA ONLY while using non-parity RAID such as RAID1 (two copies, for RAID5) or RAID1C3 (for RAID6) for the metadata. Using non-parity RAID for the metadata guarantees that the metadata will always be consistent/recoverable, and if the metadata is consistent/recoverable, then it can figure out which bits of data are correct for the data segments.

Thus, just make sure not to use parity RAID for metadata and you’re otherwise good to use parity RAID for your actual data. I’ve been running RAID6 data and RAID1C3 metadata over 10 4TB NVMe drives in an Asustor Flashstor 12 pretty much since it was released without issue, and many years before that on a spinning-disk NAS with a similar configuration.

Adrian_Joachim · September 10, 2024, 5:34pm

I was literally running btrfs raid6 (on meta and data) while this issue was an active problem (only found out about it after I switched to zfs) for literal years on a 10x3tb array without ecc and apparently got lucky. Still miss some of the flexibility btrfs had over zfs (just expanding the array by one disk on a whim was pretty nice early on) but I can live with it.

These days I think the performance hit is probably still worth it on desktop for the certainty that you only get the exact data you have written (especially with the amount of ssd firmware bugs lately) so I am going btrfs/zfs everywhere.

ryanpetris · September 10, 2024, 5:42pm

I use BTRFS on my machines not just for the data checksumming but also to get time machine like “backups” every hour. There have been a few times now where I was able to fix a problem by rsyncing some path/file from a backup made hours or days earlier.

That and compression, especially for virtual machines. I’ve had 8+ GB filesystems shrink to less than 2GB by enabling compression.

…and then there’s also the advantage of being able to have virtual “partitions” via subvolumes; I’ve used this to install/use multiple Linux distributions on the same disk without having to create separate actual partitions for each install.

ryanpetris · September 10, 2024, 6:24pm

Funnily enough, it’s the “unsupported” OS that I’m not having a problem with. While Kali is also technically not supported, given it’s Debian-based just like Ubuntu, I’m pretty certain that Ubuntu will probably exhibit the same behavior. I’ll have to try when I have some extra time.

SD cards aren’t unusual to run Linux distributions off of given the proliferation of Raspberry Pi and similar devices.