Encrypted, self-healing backup with a ZFS mirror

By Rahul Pandit

Posted on Tuesday, 05 April 2022

Last updated on Wednesday, 13 April 2022

Introduction

After reading this Ars Technica article on next-gen filesystems and this Wikipedia article on data degradation, I got really worried about my own data. Bit rot, or silent data corruption, can be caused by cosmic rays, a faulty RAM module, loose cable connections, faulty SATA or USB controllers, or even by leaving a storage device unconnected for years at a time.

All my data was stored on ext4, the default filesystem on Ubuntu. And even though I follow a proper 3-2-1 backup strategy, if silent data corruption were to occur in any of my backups, there was no easy way to detect and correct it.

I mean, you can keep md5 hashes of all your files and periodically compare them with freshly computed hashes to see if any file has changed. And, of course, you can create parchive files of all the data to detect and correct the corruption. But you'll have to do all of this manually. You'll have to update the hashes and parchive files whenever your data is modified, periodically check that the hashes still match and, if they don't, use parchive to fix the files.

Imagine how much easier your life would be if the filesystem itself could do all this. This is where ZFS comes in.
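To make the manual approach concrete, here is a minimal sketch of the hash-and-verify half of it, run in a throwaway directory. The file names are made up for the demo, and it assumes GNU coreutils' md5sum (the default on Ubuntu). It only detects a change; repairing the file would still need something like parchive.

```shell
# Work in a throwaway directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two demo files (hypothetical names).
printf 'hello\n' > a.txt
printf 'world\n' > b.txt

# Step 1: record one hash per file.
md5sum a.txt b.txt > hashes.md5

# Step 2: some time later, verify the files against the recorded hashes.
if md5sum --check --quiet hashes.md5 >/dev/null 2>&1; then
    first_check=ok
else
    first_check=changed
fi

# Simulate silent corruption: one character of a.txt changes.
printf 'hellp\n' > a.txt

# Step 3: the same check now reports a mismatch.
if md5sum --check --quiet hashes.md5 >/dev/null 2>&1; then
    second_check=ok
else
    second_check=changed
fi

echo "before corruption: $first_check, after corruption: $second_check"
```

Every time a file is legitimately edited, hashes.md5 has to be regenerated by hand, which is exactly the chore described above.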

What is ZFS

  • ZFS is a filesystem developed originally at Sun Microsystems (later acquired by Oracle).
  • ZFS was designed primarily to ensure data integrity. It checksums the data it stores, and with the redundancy of mirroring or ZFS-specific RAID-Z configurations it can also repair the corruption it detects.
  • ZFS is a Copy-on-Write (CoW) filesystem. Say you've modified a text file stored on a ZFS drive. When you save it, ZFS writes the changed blocks to a new location on disk rather than overwriting the original blocks in place.
  • CoW enables ZFS to offer snapshots. If you take a snapshot at a particular time and then modify or delete some files later on, the original versions of those files will still be stored in the snapshot, and you'll be able to restore them.
  • ZFS can also send and receive snapshots over the network (typically piped through SSH to secure the transfer), which is helpful if you're looking for offsite backups.
  • ZFS also has a feature called datasets. Datasets behave somewhat like partitions on a zpool, and each can have its own configuration, separate from that of the parent zpool.
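As a rough illustration of the send and receive feature, a snapshot can be streamed to another machine through SSH. This is only a sketch: the SSH login, the remote pool and the remote dataset name are hypothetical, and it assumes the remote user may run zfs receive through sudo without a password prompt.

```shell
# Hypothetical names: adjust them to your own setup.
REMOTE="user@backup-host"            # SSH login on the offsite machine (assumption)
REMOTE_DATASET="backuppool/mydata"   # target dataset on the remote pool (assumption)

# send_snapshot SNAPSHOT
# Streams one snapshot to the remote machine. SSH protects the stream in
# transit. --raw sends the blocks of an encrypted dataset as they are stored,
# still encrypted, so the remote side never needs the passphrase.
send_snapshot() {
    sudo zfs send --raw "$1" | ssh "$REMOTE" sudo zfs receive "$REMOTE_DATASET"
}

# Usage (not run here):
#   send_snapshot mydata@snapshot-5-april-2022
```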

What is a ZFS mirror

A ZFS mirror is like a traditional RAID-1 setup, but unlike RAID-1, a ZFS mirror is self-healing: it can detect and fix data corruption.

A ZFS mirroring setup works like this. You have a two-way mirror with 2 disks (say, disk D1 and disk D2). When you copy a file to your new ZFS mirror, ZFS stores the file's data, along with checksums of that data, on both disks.

Later on, when you read the file, ZFS reads the data from one of the disks and verifies it against the stored checksum. If it matches, all is good. If it doesn't match (say, the copy on disk D2 fails its checksum), ZFS knows data corruption has occurred. It then reads the copy from disk D1, verifies that one, returns the good data to you, and overwrites the bad copy on disk D2 with it. This is how ZFS ensures data integrity in a mirroring setup.
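The read path of a mirror can be imitated with two ordinary files and sha256sum. This is a toy model to show the idea, not how ZFS is implemented: the file names, the checksum_ok and self_healing_read helpers, and the choice of SHA-256 are all made up for the demo.

```shell
# Two files stand in for the two disks of the mirror (a toy model, not ZFS).
workdir=$(mktemp -d)
disk1="$workdir/d1_block"
disk2="$workdir/d2_block"

# "Write": store the same block on both disks and remember its checksum.
printf 'important data\n' > "$disk1"
cp "$disk1" "$disk2"
stored_sum=$(sha256sum < "$disk1" | cut -d' ' -f1)

# Silent corruption hits disk D2.
printf 'importent data\n' > "$disk2"

# checksum_ok FILE: succeeds if FILE still matches the stored checksum.
checksum_ok() {
    [ "$(sha256sum < "$1" | cut -d' ' -f1)" = "$stored_sum" ]
}

# self_healing_read FIRST_COPY SECOND_COPY
# Reads FIRST_COPY; if it fails verification, falls back to SECOND_COPY
# and repairs FIRST_COPY from it.
self_healing_read() {
    if checksum_ok "$1"; then
        cat "$1"
    elif checksum_ok "$2"; then
        cp "$2" "$1"    # repair the bad copy from the good one
        cat "$1"
    else
        echo "both copies are damaged" >&2
        return 1
    fi
}

# Read via the damaged disk first: the caller still gets good data.
result=$(self_healing_read "$disk2" "$disk1")
echo "$result"
```

After the read, both files are identical again, which mirrors the repair step described above.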

By the way, if you want more redundancy, you can add a third disk and make it a three-way mirror too.

Some ZFS jargon

I'm not gonna go into too much detail here. You can read this article on ZFS basics if you want more details.

  • zpool : zpool is the highest level ZFS structure. A zpool is made of one or more vdevs.
  • vdev : A vdev is made of one or more disks. Our vdev will consist of 2 disks in a mirroring setup.

Install ZFS on Ubuntu

I'm assuming that you're using Ubuntu or some Ubuntu derivative distro. We can install the open source implementation of ZFS called OpenZFS as follows :

sudo apt install zfsutils-linux
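A quick check that the tools landed on your PATH can save some confusion later. The snippet below is a small sketch; the zfs_tools variable is just a name chosen for this example. Note that the compression=zstd option used later in this post needs OpenZFS 2.0 or newer, so the version printed here is worth a glance.

```shell
# Check that the ZFS command-line tools are on the PATH.
if command -v zpool >/dev/null 2>&1 && command -v zfs >/dev/null 2>&1; then
    zfs_tools=present
    # Print the installed OpenZFS version; ignore failures, e.g. if the
    # kernel module is not loaded yet.
    zfs version || true
else
    zfs_tools=missing
fi
echo "ZFS tools: $zfs_tools"
```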

Create a Zpool Mirror

You will need 2 storage devices of the same size. These can be USB flash drives, external HDDs or SSDs.

Attach both of them to your computer and format them. Then check their unique identifiers with the following command :

ls /dev/disk/by-id/ -1 | grep usb

The output will contain names that uniquely identify your devices, something like :

usb-Kingston_Flash_Drive_123456
usb-Sandisk_Flash_Drive_98765

Now, the following command will create a compression-enabled, encrypted zpool mirror named mydata, mounted at the /mydatadir directory. ZFS will encrypt your data with AES in GCM mode using a 256-bit key. The command will ask you for a passphrase. A key derived from that passphrase protects the encryption key, so you will need the passphrase every time you unlock the pool.

sudo zpool create -O encryption=aes-256-gcm -O keylocation=prompt -O keyformat=passphrase -O compression=zstd -m /mydatadir mydata mirror usb-Kingston_Flash_Drive_123456 usb-Sandisk_Flash_Drive_98765

You can see the status of your zpool :

zpool status

You can see the list of all the zpools :

zpool list
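If you'd rather rehearse before touching real drives, ZFS also accepts plain files as mirror members. The sketch below prepares two sparse files and defines two helper functions; the pool name practice, the function names and the file names are all made up for this example, and the pool is left unencrypted to keep the rehearsal short. With no -m option the pool mounts at /practice.

```shell
# Two sparse 128 MiB files stand in for the two drives (hypothetical names).
practice_dir=$(mktemp -d)
truncate -s 128M "$practice_dir/disk1.img" "$practice_dir/disk2.img"

# create_practice_pool: builds a mirror named "practice" on the two files.
# File-backed members must be given as absolute paths.
create_practice_pool() {
    sudo zpool create practice mirror \
        "$practice_dir/disk1.img" "$practice_dir/disk2.img"
}

# destroy_practice_pool: removes the pool, then the backing files.
destroy_practice_pool() {
    sudo zpool destroy practice && rm -r "$practice_dir"
}

# Usage (not run here):
#   create_practice_pool
#   zpool status practice
#   destroy_practice_pool
```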

Before you start using your newly minted zpool mirror, make your user the owner of the mount directory :

sudo chown yourusername: /mydatadir

Open your file manager and copy your files into /mydatadir.

When you're done, you should unmount the zpool, unload the encryption keys and export the zpool. DON'T FORGET TO DO THIS!

sudo zfs unmount mydata
sudo zfs unload-key mydata
sudo zpool export mydata

Now you can disconnect the drives from your computer.

The next time you want to access your zpool, attach both the drives to your computer and then import the zpool :

sudo zpool import mydata
sudo zfs load-key mydata
sudo zfs mount mydata

Afterwards, don't forget to unmount the zpool, unload the key and export the zpool :

sudo zfs unmount mydata
sudo zfs unload-key mydata
sudo zpool export mydata
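Since forgetting a step is easy, the two three-command sequences above can be wrapped in a pair of shell functions, for example in your ~/.bashrc. This is just one way to do it; the names open_pool, close_pool and POOL are made up here. Chaining with && makes each function stop at the first command that fails.

```shell
POOL="mydata"   # the pool name used in this post

# open_pool: import the pool, ask for the passphrase, mount the filesystem.
open_pool() {
    sudo zpool import "$POOL" &&
    sudo zfs load-key "$POOL" &&
    sudo zfs mount "$POOL"
}

# close_pool: unmount, forget the key, export. Run this before unplugging.
close_pool() {
    sudo zfs unmount "$POOL" &&
    sudo zfs unload-key "$POOL" &&
    sudo zpool export "$POOL"
}
```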

Regularly Scrub your Zpool

As we saw above, ZFS automatically corrects the errors it encounters when reading files. But it only checks the data you actually read, so it's also recommended to scrub your zpool every once in a while. A scrub reads all the data in the pool, verifies it against the stored checksums and repairs any damage it finds. Run this command about once a month :

sudo zpool scrub mydata
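A scrub runs in the background, so with external drives it helps to wait for it to finish before exporting the pool. The following sketch assumes OpenZFS 2.0 or newer, where zpool scrub accepts the -w (wait) flag; the function name scrub_and_report is made up for this example.

```shell
POOL="mydata"   # the pool name used in this post

# scrub_and_report: start a scrub, block until it completes, then show the
# result, including any files with errors that could not be repaired (-v).
scrub_and_report() {
    sudo zpool scrub -w "$POOL" &&
    zpool status -v "$POOL"
}

# Usage (not run here):
#   scrub_and_report
```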

ZFS Snapshots

Create a new snapshot named snapshot-5-april-2022 :

sudo zfs snapshot mydata@snapshot-5-april-2022

Now, you can modify a file or even delete a file without worrying too much because as long as you have snapshot-5-april-2022, you will be able to restore those files.

Mount the snapshot snapshot-5-april-2022 :

sudo mkdir ~/snapshot-5-april-2022-dir
sudo mount -t zfs mydata@snapshot-5-april-2022 ~/snapshot-5-april-2022-dir

After you're done, unmount the snapshot :

sudo umount ~/snapshot-5-april-2022-dir

You can roll back to the saved snapshot too. Any changes made after the snapshot was taken will be lost, and the -r flag also destroys any snapshots newer than the one you roll back to.

sudo zfs rollback -r mydata@snapshot-5-april-2022

Delete the snapshot :

sudo zfs destroy mydata@snapshot-5-april-2022
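Two small additions to the snapshot workflow above, sketched as shell functions. The function and variable names are made up for this example, and the snapshot name uses an ISO-style date (which sorts chronologically) instead of the 5-april-2022 style. restore_file relies on the hidden .zfs/snapshot directory that ZFS exposes at the root of a mounted dataset, so a single file can be copied back without rolling back everything.

```shell
POOL="mydata"             # the pool name used in this post
MOUNTPOINT="/mydatadir"   # where the pool is mounted

# Build a snapshot name from today's date, in the form snapshot-YYYY-MM-DD.
snapshot_name="snapshot-$(date +%F)"

# take_snapshot: snapshot the pool's root dataset under today's name.
take_snapshot() {
    sudo zfs snapshot "${POOL}@${snapshot_name}"
}

# list_snapshots: show every snapshot in the pool.
list_snapshots() {
    zfs list -t snapshot -r "$POOL"
}

# restore_file SNAPSHOT RELATIVE_PATH
# Copies one file from the read-only snapshot back into the live filesystem.
restore_file() {
    cp -a "$MOUNTPOINT/.zfs/snapshot/$1/$2" "$MOUNTPOINT/$2"
}

# Usage (not run here):
#   take_snapshot
#   restore_file snapshot-5-april-2022 notes/todo.txt
```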

Cover Picture Credit : Photo by benjamin lehman on Unsplash