Help an on again/off again Linux user take the plunge

i used to do that. personally i think it’s useful if one has only one computer. if one has access to a NAS / backups and an easy way to restore a system, having more partitions is more of a hassle than anything else. the way i do it is symlinking the dotfiles i really need to a folder that i keep in sync across multiple devices (besides of course documents etc.).
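A minimal sketch of that symlink approach, assuming the synced folder lives somewhere like `$HOME/Sync/dotfiles` (a placeholder path) and that `link_dotfiles` is a hypothetical helper name, not an existing tool:

```shell
#!/bin/sh
# link_dotfiles SYNC_DIR TARGET_DIR FILE...
# Symlink each named dotfile from the synced folder into the target directory.
link_dotfiles() {
    sync_dir=$1
    target_dir=$2
    shift 2
    for name in "$@"; do
        src="$sync_dir/$name"
        dest="$target_dir/$name"
        if [ ! -e "$src" ]; then
            echo "skip: $src does not exist" >&2
            continue
        fi
        # Keep any real file that is already there instead of overwriting it.
        if [ -e "$dest" ] && [ ! -L "$dest" ]; then
            mv "$dest" "$dest.bak"
        fi
        # -s symbolic, -f replace an old link, -n do not follow an existing link to a directory
        ln -sfn "$src" "$dest"
    done
}

# Example (paths are placeholders):
# link_dotfiles "$HOME/Sync/dotfiles" "$HOME" .bashrc .gitconfig .vimrc
```

Only the files you name are linked, so machine-specific dotfiles stay local and the rest of the home directory can be restored from a normal backup.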

besides, if both the root and home partitions are encrypted (which, i mean, yes of course) it easily adds complexity. it is still doable of course, but it gets more annoying.

I’ve done a variation of this where I keep a copy of the live ISO on a dedicated partition and then add an entry to the bootloader for booting the aforementioned ISO. As a side-bonus, it keeps disk space down as well.

It also makes it easy to bug-test things on real hardware in a clean, vanilla configuration, and you’ve also then got the ISO readily on-hand for use in VMs.

Of course, if you’re actually compiling your own installation then this method isn’t exactly practical.

As far as I know, ext4 doesn’t have a native snapshot feature in the same sense that ZFS does. ext4 can use LVM snapshots, but that operates at a lower (block) level, not filesystem. ZFS snapshots are native to the filesystem itself, so you can do things like snapshotting a live filesystem (without freezing the filesystem).
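To make the block-level vs filesystem-level distinction concrete, here is a rough sketch of how each kind of snapshot is created. The volume group `vg0`, the logical volume `home`, the dataset `zpool/home`, and the `run` helper are all assumptions for illustration. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Prints the commands by default; set DRY_RUN=0 to execute them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# LVM: block-level snapshot of a logical volume. It needs reserved space
# (--size) to hold blocks that change after the snapshot is taken.
lvm_snapshot() {   # lvm_snapshot VG LV SNAP_NAME COW_SIZE
    run lvcreate --snapshot --name "$3" --size "$4" "$1/$2"
}

# ZFS: filesystem-level snapshot, named dataset@snapshot; no size is reserved.
zfs_snapshot() {   # zfs_snapshot DATASET SNAP_NAME
    run zfs snapshot "$1@$2"
}

lvm_snapshot vg0 home home_snap 5G
zfs_snapshot zpool/home before-upgrade
```

The LVM snapshot shows up as another block device that has to be mounted before you can read it, while the ZFS one is a property of the dataset itself.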

btrfs however does have this support. I’ve always shied away from it due to the rapid development and lack of good recovery tooling when I checked it out. The general saying around it was along the lines of “it worked perfectly until the day it didn’t and it ate my data”. Of course, everyone should have backups, etc etc. But it’s still a hassle. :slight_smile:

I will say, none of these filesystems provide self-healing abilities with a single disk in their default configuration, since repair needs a second good copy of the data (ZFS’s copies=2 property or btrfs’s dup profile can keep one on the same disk, at the cost of space). That’s arguably somewhat pointless anyway: if a drive is exhibiting read errors, it’s likely to only get worse, and redundant data can only do so much. At that point I’d consider stability, speed, and tooling, which have traditionally led me to use ext4 or XFS (depending on the device, see here for some more info) for most of my systems, and ZFS for my NAS and some of my workstations.

ZFS does like to eat RAM when configured to, but that memory mostly goes to its read cache (the ARC), a performance optimization that reduces the number of disk reads needed to return data. When there’s memory pressure or not much memory in the first place, it’ll simply hit the disk instead of the cache. Simple as that (as you’ve noted). I think the bigger strength of ZFS snapshots is that they can be sent and synced between devices. So, for example, setting up backups to a NAS can be as simple as creating a snapshot on your machine, then sending it via ssh like this:

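# Newest snapshot already on the backup server (-H drops the header line)
LATEST=$(ssh backup-server zfs list -H -t snapshot -o name -s creation -r backups/workstation | tail -1)
NOW=$(date +"%Y%m%d_%H%M%S")
zfs snapshot -r "zpool/home@backup-$NOW"
# "@${LATEST##*@}" keeps only the snapshot name, so it resolves against the local dataset
zfs send -R -i "@${LATEST##*@}" "zpool/home@backup-$NOW" | ssh backup-server zfs recv -du backups/workstation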

Critical to note here is that this is an incremental backup: only the blocks that changed since the last snapshot on the server are sent (the very first backup has to be a full send, without -i). I don’t know that doing something like this is possible with ext4/LVM, at least not without specialized tooling to send block-level incremental diffs.

If this is something that appeals to you, btrfs and ZFS are going to be your best bet. ZFS also supports filesystem-level encryption, which btrfs doesn’t; you may or may not care about that. The biggest wins with ZFS there are interoperability with BSD systems and the fact that you can selectively encrypt individual datasets (its equivalent of subvolumes).

All this to say: there are good reasons to choose ZFS beyond the traditional parity stuff. :slight_smile:

In what way is this different or perhaps better than LUKS? Is it more performant? Or is it because ZFS is aware of encryption and other filesystems are unaware?

How is this better or worse than filesystem level? Again is it related to performance or does it enable greater selectivity in what gets backed up?

It’s different than LUKS in that it’s a native property of the filesystem itself, which allows for doing things like encrypting subvolumes. The performance characteristics vary depending on your workload, really, since ZFS and ext4 also behave differently as filesystems. Worth noting that ext4 apparently does have support for encryption natively through fscrypt. You can do neat stuff like encrypt per-user using a user password, which I think is particularly nice.
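As a rough sketch of what "native" encryption looks like on each side: the dataset name `zpool/home/private`, the directory `/home/alice/private`, the user `alice`, and the `run` helper are all assumptions for illustration. The fscrypt line also assumes the ext4 `encrypt` feature is already enabled and `fscrypt setup` has been run, and that the target directory is empty. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Prints the commands by default; set DRY_RUN=0 to execute them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# ZFS: encryption is a per-dataset property chosen when the dataset is created.
zfs_encrypted_dataset() {   # zfs_encrypted_dataset DATASET
    run zfs create -o encryption=on -o keyformat=passphrase "$1"
}

# ext4 + fscrypt: encrypt one directory, protected by the user's login passphrase.
fscrypt_user_dir() {   # fscrypt_user_dir DIR USER
    run fscrypt encrypt "$1" --source=pam_passphrase --user="$2"
}

zfs_encrypted_dataset zpool/home/private
fscrypt_user_dir /home/alice/private alice
```

Either way the rest of the filesystem stays unencrypted, which is the part LUKS can’t do since it works on the whole block device.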

As for LVM vs filesystem snapshots, the main difference is that LVM snapshots are basically disk images, which means backing them up incrementally is a bit of a pain and requires special tooling. The other (more noticeable, imo) thing is that you can’t browse the snapshots in the same way ZFS and btrfs allow you to. They both let you represent snapshots as a directory on the filesystem, so you can browse them in a Time Machine style directory. This can be useful for grabbing files off an old snapshot in a jiffy… Conversely, those filesystem-level snapshots are limited to the filesystem, so you can’t do something like snapshot multiple filesystems simultaneously (if you have multiple volumes in the LVM volume group).
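Grabbing a file off an old snapshot then comes down to an ordinary copy. A small sketch, assuming a ZFS dataset mounted at `/home` (ZFS exposes its snapshots under the hidden `.zfs/snapshot` directory at the dataset’s mountpoint); `restore_from_snapshot` and the example paths are hypothetical:

```shell
#!/bin/sh
# restore_from_snapshot SNAP_ROOT SNAP_NAME REL_PATH DEST_ROOT
# Copy one file or directory out of a browsable snapshot back into the live tree.
restore_from_snapshot() {
    src="$1/$2/$3"
    dest="$4/$3"
    if [ ! -e "$src" ]; then
        echo "not in snapshot: $src" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dest")"
    # -a keeps ownership, mode and timestamps of the snapshotted copy
    cp -a "$src" "$dest"
}

# Example (placeholder names):
# ls /home/.zfs/snapshot/
# restore_from_snapshot /home/.zfs/snapshot backup-20240101_120000 alice/notes.txt /home
```

With an LVM snapshot you would first have to mount the snapshot volume somewhere before the same copy could work.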

There are good reasons to use either. I’d suggest you try some benchmarking yourself with various configurations to figure out what’s best for you. For what it’s worth, I opted for LUKS + LVM + ext4, as I wanted to set up TPM-based unlocking and support encrypted swap, so LUKS + LVM made the most sense. I want to use a vanilla kernel, so that restricts me to ext4, XFS, and btrfs.

Pretty much all of the features I’ve mentioned can be set up using any of these methods (even the integrity checking can be done at the LVM level using dm-integrity). It’s really about what works best for your use case and how easy it is to set up for your distro.

@reanimus Ok, follow-up question time. It seems to me that it should be possible to use LUKS for encryption and follow this guide on setting up FDE, TPM, and Secure Boot while using BTRFS for snapshots/backup. If BTRFS can detect data errors, even if it can’t correct them, I should be able to navigate to the backup and restore the file to an uncorrupted state. Does this sound right to you?

If the backup is not on the same disk where the failure occurred, yes.

Given the way snapshots are implemented, if a given file hasn’t changed, the snapshot doesn’t store a separate copy of it. That is to say, if the file that got corrupted is a file that hasn’t changed in a while, the snapshot’s copy will also be corrupt, because both point at the same blocks on disk. If it’s a file that changes often, the snapshot will point to older blocks that the live file no longer uses, so it may be recoverable.

That being said, errors are usually due to one of two things: bit rot or drive failure. Bit rot is a one-off but incredibly rare in drives that are regularly used, whereas drive failure generally means there are going to be more errors on the way. If btrfs is indicating an error, it almost always means the drive is going to fail soon and needs to be replaced.

this is a crucial point. i just want to stress again that relying on the filesystem for “data integrity” on a laptop is a terrible idea :slight_smile: the priority should be 1. backups 2. (many other things e.g. don’t get it stolen, don’t leave it in places, don’t sit on it or put it in an oven, etc.) 3. use a filesystem with snapshots.

@marco As I stated in the first few posts, I’ll have an eGPU dock that also has 3.5" drive bay so I’ll be backing up regularly to an external drive. I’m not really that ignorant.

Honestly, the bigger utility of local filesystem snapshots is to protect you from user/software error, rather than hard drive failure. With snapshots of the root filesystem in place, you can apply software updates or try config changes without worrying about how to undo it; simply try your update/changes, and if it breaks, roll back to the snapshot.

Similarly with home snapshots, you can retrieve files you’ve accidentally deleted/corrupted.
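The try-it-and-roll-back workflow might look roughly like this on ZFS. The dataset name `zpool/ROOT/default`, the snapshot label, and the `run` helper are assumptions for illustration; btrfs can do the same thing but its rollback takes more steps. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Prints the commands by default; set DRY_RUN=0 to execute them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Take a named snapshot of the root dataset before touching anything.
snapshot_before_change() {   # snapshot_before_change DATASET LABEL
    run zfs snapshot "$1@$2"
}

# Discard everything written since that snapshot.
# Plain "zfs rollback" only goes back to the most recent snapshot of the dataset.
rollback_change() {   # rollback_change DATASET LABEL
    run zfs rollback "$1@$2"
}

snapshot_before_change zpool/ROOT/default pre-update
# ... apply the update or config change, test it ...
rollback_change zpool/ROOT/default pre-update
```

If the change works out, the snapshot can simply be left in place or destroyed later.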

@GhostLegion it’s not about ignorance. like @reanimus is saying, snapshots serve a very different purpose from backups

To be fair, I think it’s also common to conflate the two if only because Time Machine is one of the most well-known and widely deployed backup solutions and it offers both snapshotting and backup functionality. But yes, the snapshotting and the backup serve different purposes. Snapshots protect you from user error, software bugs, and general hardware failure (e.g. power loss, CPU reset). Backups protect you from loss of storage (e.g. hard drive dies, virus eats your drive, or you just plain lose it).

Since you and @reanimus had to explain the difference, I suppose it is. Consider me more educated now. I’ll try to avoid conflating the terms in the future. I want protection from ransomware as well as the benefits previously mentioned.

Snapshots should (in most cases) provide ransomware protection, since recovering would be similar to rolling back a bad config change or corruption, assuming the ransomware doesn’t go to the effort of wiping the snapshots too.

Backups provide protection in either scenario, as you could simply wipe the drive and restore from a backup, though some ransomware may be programmed to wipe network mounts and the like as well.

i mostly meant FS-based snapshots as opposed to the generic concept of snapshots. my point is about the false sense of security that btrfs might give over backups. yes, ransomware and the like are probably a good reason to use btrfs, but they don’t affect a lot of people, while a lot of people experience hardware failures or thefts.

Has there been a successful multi-system ransomware attack in the Linux ecosystem? Last time I looked there had not been, but maybe someone knows differently…

Do you mean something like LockBit Linux-ESXi?

Or do you mean publicly known attack incidents?

EDIT: @GhostLegion don’t listen to me–I’m a nincompoop.

On top of what others have pointed out, I just wanted to mention that AFAIK the integrity-checking powers of BTRFS/ZFS rely on these filesystems having direct access to the bare metal of the drive. In other words, adding a LUKS layer of encryption between the filesystem and the drive effectively neuters the bitrot detection.

Nonetheless, I’m using LVM+LUKS+Btrfs with encrypted root/boot/swap and hibernation. Btrfs snapshotting is a lot more flexible and elegant than LVM snapshots, IMO.

No? This isn’t true at all. The integrity checking works by reading the data from the underlying device and computing a checksum based off of it, then comparing it to the stored checksum in the filesystem metadata. This works whether the filesystem is being written directly to disk or being encrypted in the interim.
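A toy model of that checksum-then-compare idea, using `sha256sum` on an ordinary file as a stand-in for a filesystem block (real filesystems use their own checksum algorithms and on-disk metadata; the `verify` helper is just for illustration). The point is that the check only looks at the bytes handed back by the layer below, so it doesn’t care whether those bytes passed through LUKS on the way.

```shell
#!/bin/sh
# Toy model of filesystem checksumming: store a checksum at write time,
# recompute it at read time, and compare.
work=$(mktemp -d)
printf 'important data\n' > "$work/block"

# "Write": record the checksum alongside the data, like filesystem metadata.
stored=$(sha256sum < "$work/block" | cut -d' ' -f1)

verify() {   # verify FILE EXPECTED_SUM
    actual=$(sha256sum < "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

verify "$work/block" "$stored" && echo "read 1: checksum ok"

# Simulate corruption beneath the filesystem: overwrite the first byte in place.
printf 'X' | dd of="$work/block" bs=1 count=1 conv=notrunc 2>/dev/null

verify "$work/block" "$stored" || echo "read 2: checksum mismatch, corruption detected"

rm -rf "$work"
```

The first read passes and the second reports a mismatch, which is all detection needs. Repairing the block is a separate matter that requires a second good copy.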

By default, LUKS will hide certain capabilities the block device has from upper layers (the usual example being SSD TRIM support, for security reasons), but this behavior can be turned off so that btrfs/ZFS can use those capabilities as well (for TRIM, by allowing discards when the device is opened). This has nothing to do with the filesystem’s ability to check data integrity, though.
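For the TRIM case specifically, a sketch of what opting in could look like with cryptsetup. The device `/dev/nvme0n1p2`, the mapping name `cryptroot`, and the `run` helper are placeholders. The script only prints the command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Prints the commands by default; set DRY_RUN=0 to execute them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Open the LUKS container and let discard (TRIM) requests pass through to the SSD.
open_with_trim() {   # open_with_trim DEVICE MAPPING_NAME
    run cryptsetup open --allow-discards "$1" "$2"
}

open_with_trim /dev/nvme0n1p2 cryptroot

# For volumes unlocked at boot, the equivalent is the "discard" option
# in the fourth field of the matching /etc/crypttab entry.
```

The trade-off is that discarded regions become visible on the raw device, which is why it is off by default.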

The only real integrity-related difference I’d point to between LUKS and non-LUKS is that a single flipped bit on the encrypted disk results in a larger run of incorrect bytes after decryption (i.e. 1 encrypted bit flip = multiple decrypted bytes being incorrect). This StackExchange answer suggests it’s usually around 16 bytes getting corrupted, depending on the LUKS crypto parameters.

Yup, thanks for the correction. I guess I was playing the game Telephone with some piece of FUD I’d heard about btrfs some time ago.
