ZFS on Linux, Part 1
ZFS is a filesystem that has a lot of very nice features. Unlike Btrfs, it is not native to Linux, but it has similar features and is a bit more mature.
I still have my VIA ARTiGO A2000 that I use as a home server, mostly for storing important files. A RAID1 setup is nice in that, if a disk fails, the data is not lost since the 2 drives are copies of each other. However, taking snapshots still requires an external drive (I usually use rdiff-backup for this). Also, RAID1 doesn’t protect against things like bit rot; basically, I use sha1sum to check files if I’m paranoid about losing data. (I only use sha1sum instead of something simpler like cksfv (crc32) or md5sum because the VIA C7 chip has built-in SHA-1 hardware support. With the sha1sum implementation on this page, it is very quick.)
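The manual check described above can be sketched in Python. This is only an illustration of the idea, not the sha1sum tool itself; the function names (sha1_of_file, build_manifest, changed_files) are made up for this example. It records a SHA-1 digest per file and later reports files whose contents no longer match.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-1 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha1_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root, manifest):
    """List recorded files that are now missing or have a different digest."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

A mismatch only says that a file changed, not why, so a check like this cannot tell bit rot from a legitimate edit, and it cannot repair anything on its own.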
ZFS has two nice features to ensure data integrity: checksums of the data written to disk, and copy-on-write (COW). This means that if data on disk somehow gets corrupted (e.g., bit rot), ZFS can detect it and report it to the user. In fact, if a mirror configuration is used, the data is duplicated, so ZFS can heal itself by repairing the bad copy from the good one. COW is also useful in that, if a power outage occurs, there’s less chance of inconsistent data on disk since, unlike in most regular filesystems, data is never overwritten in place.
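To make the self-healing idea concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not how ZFS is implemented: the ToyMirror class is hypothetical, the "disks" are dictionaries, and SHA-256 is used purely for illustration. On every read, each copy is checked against the stored checksum, and a bad copy is rewritten from a good one.

```python
import hashlib


class ToyMirror:
    """Two copies of every block, plus a checksum kept apart from the data."""

    def __init__(self):
        self.disks = [{}, {}]   # one dict per "disk": block id -> bytes
        self.checksums = {}     # block id -> SHA-256 hex digest

    def write(self, block_id, data):
        self.checksums[block_id] = hashlib.sha256(data).hexdigest()
        for disk in self.disks:
            disk[block_id] = data

    def read(self, block_id):
        expected = self.checksums[block_id]
        good = None
        bad_disks = []
        for disk in self.disks:
            data = disk[block_id]
            if hashlib.sha256(data).hexdigest() == expected:
                if good is None:
                    good = data
            else:
                bad_disks.append(disk)
        if good is None:
            # Corruption is detected, but no intact copy is left to repair from.
            raise IOError(f"block {block_id}: all copies fail the checksum")
        for disk in bad_disks:
            disk[block_id] = good   # heal the corrupted copy
        return good
```

Without the second copy, the same check could still detect the corruption but could only report it, which matches the single-disk case described above.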
ZFS works on the concept of pools, which contain multiple devices, and data sets are created in the pool. The zpool command allows the user to control the pools, while the zfs command allows users to create data sets in the pool. What is nice with this method is that one can add devices (e.g., hard drives) quite easily to an existing pool, if needed; with most traditional filesystems, one can’t dynamically change or add devices to an existing filesystem. Data sets can be thought of as mount points or file systems in a pool. Although all data sets of a pool share the same space, the administrator can set different quotas or reserve space for each file system. Snapshots (also a type of data set) can be made of a file system for backup purposes, and these snapshots are very quick: instead of needing to create incremental backups separately, this feature is built into ZFS.
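Why snapshots are quick under copy-on-write can be shown with a small Python model. This is a deliberately simplified sketch; the ToyDataset class and its methods are hypothetical and real ZFS is far more elaborate. The point is only that a write places data in a fresh block and moves a pointer, so a snapshot needs to keep the old pointers rather than copy any data.

```python
class ToyDataset:
    """Copy-on-write store: blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = {}      # block address -> bytes
        self.live = {}        # file name -> block address (current state)
        self.snapshots = {}   # snapshot name -> frozen name -> address map
        self._next_addr = 0

    def write(self, name, data):
        addr = self._next_addr
        self._next_addr += 1
        self.blocks[addr] = data   # new data always goes to a fresh block
        self.live[name] = addr     # only the pointer is updated

    def snapshot(self, snap_name):
        # Copies the pointers only; no file data is duplicated.
        self.snapshots[snap_name] = dict(self.live)

    def read(self, name, snap_name=None):
        table = self.live if snap_name is None else self.snapshots[snap_name]
        return self.blocks[table[name]]
```

After a snapshot, overwriting a file leaves the old block untouched, so reading through the snapshot still returns the earlier contents while the live view returns the new ones.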
There are some additional features of ZFS, such as data deduplication, compression, and encryption, that may be attractive for others. However, their availability depends on the zpool version used. Moreover, for a home server, data deduplication isn’t important for me (I don’t have multiple copies of the same file). Media files (e.g., JPEG or MP3) do not compress well since they usually are already compressed with some lossy compression algorithm. Lastly, for encryption, one can always use LUKS/dm-crypt to encrypt the device before adding it to the ZFS pool.
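The point about media files can be checked with a quick experiment. The Python sketch below uses zlib from the standard library, not ZFS's own compression, and uses random bytes as a stand-in for already-compressed media (an assumption made for illustration, since well-compressed data looks statistically close to random).

```python
import os
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Repetitive text compresses very well.
text_like = b"ZFS stores data in pools. " * 4000

# Random bytes stand in for JPEG/MP3-style data that is already compressed.
media_like = os.urandom(len(text_like))

print(f"text-like data:  {compression_ratio(text_like):.3f}")
print(f"media-like data: {compression_ratio(media_like):.3f}")
```

The text-like input shrinks to a small fraction of its size, while the media-like input stays at roughly its original size, so enabling compression gains little for a collection that is mostly media.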
ZFS is not in the Linux kernel because it is distributed under the CDDL, which is incompatible with the GPL. However, there are two implementations of ZFS on Linux: zfs-fuse and Native ZFS for Linux. I tried the LLNL Native ZFS for Linux, but there were issues with copying files larger than 2 GB on a 32-bit system (the VIA ARTiGO A2000 I use for my home server is a 32-bit system). zfs-fuse worked fine with large files, and I haven’t had any problems with it for a long time, so it seems quite reliable.
ZFS, compared with other filesystems, is not very fast (Phoronix benchmark here and here). However, the performance is fine for home use. Consequently, if data integrity is important, ZFS may be something worth considering.