I still have my VIA ARTiGO A2000 that I use as a home server, mostly for storing important files. A RAID1 setup is nice in that, if a disk fails, the data is not lost since the two drives are copies of each other. However, taking snapshots still requires an external drive (I usually use rdiff-backup for this). Also, RAID1 doesn’t protect against things like bit rot, so I use sha1sum to checksum files if I’m paranoid about losing data. (I use sha1sum instead of something simpler like cksfv (crc32) or md5sum only because the VIA C7 chip has built-in SHA-1 hardware support. With the sha1sum implementation on this page, it is very quick.)
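The manual checksum approach above can be sketched in Python. This is only an illustration of the idea (record a SHA-1 per file, then re-check later), not the sha1sum tool itself; the function names `sha1_of_file`, `build_manifest`, and `find_changed` are made up for this example.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-1 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha1_of_file(full)
    return manifest


def find_changed(root, manifest):
    """Return sorted relative paths that are missing or no longer match."""
    changed = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha1_of_file(full) != expected:
            changed.append(rel)
    return sorted(changed)
```

A check like this can only report that a file has changed; it cannot tell bit rot from a deliberate edit, and it cannot repair anything.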
ZFS has two nice features to ensure data integrity: checksumming of data written to disk and copy-on-write (COW). Checksumming means that if data on disk somehow gets corrupted (e.g., bit rot), ZFS can detect this and report it to the user. In fact, if a mirror configuration is used, the data is duplicated, so ZFS can repair the corrupted copy from the good one. COW is also useful in that, if a power outage occurs, there’s less chance of inconsistent data on disk, since data is never overwritten in place, unlike in most regular filesystems.
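To make the checksum-plus-mirror idea concrete, here is a toy in-memory model. It is a sketch under loud assumptions, not how ZFS is implemented: the `ToyMirror` class is hypothetical, the "disks" are plain dictionaries, and SHA-256 is used only because it is handy in Python.

```python
import hashlib


class ToyMirror:
    """Two in-memory 'disks' holding copies of each block, plus checksums."""

    def __init__(self):
        self.disks = [{}, {}]   # one dict per disk: block id -> bytes
        self.checksums = {}     # block id -> SHA-256 hex digest

    def write(self, block_id, data):
        """Record the checksum and store a copy of the data on both disks."""
        self.checksums[block_id] = hashlib.sha256(data).hexdigest()
        for disk in self.disks:
            disk[block_id] = data

    def read(self, block_id):
        """Return verified data, repairing any copy that fails its checksum."""
        expected = self.checksums[block_id]
        good = None
        bad_disks = []
        for disk in self.disks:
            data = disk[block_id]
            if hashlib.sha256(data).hexdigest() == expected:
                if good is None:
                    good = data
            else:
                bad_disks.append(disk)
        if good is None:
            # Every copy is corrupt: the error can be reported but not fixed.
            raise IOError("all copies of block %r are corrupt" % (block_id,))
        for disk in bad_disks:
            disk[block_id] = good   # heal the bad copy from the good one
        return good
```

With a single disk, the same check could still detect corruption, but there would be no second copy to heal from.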
ZFS works on the concept of pools, which contain multiple devices, and data sets are created in the pool. The zpool command allows the user to control the pools, while the zfs command allows users to create data sets in the pool. What is nice with this method is that one can add devices (e.g., hard drives) quite easily to an existing pool, if needed; most traditional filesystems can’t have devices added to them dynamically once they exist. Data sets can be thought of as mount points or file systems in a pool. Although all data sets of a pool share the pool’s space, the administrator can set different quotas or reserve space for each file system. Snapshots (also a type of data set) can be made of a file system for backup purposes, and these snapshots are very quick: instead of needing to create incremental backups separately, this feature is built into ZFS.
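The way data sets share a pool's space while still honouring quotas and reservations can be illustrated with a small toy model. This is a deliberately simplified sketch: `ToyPool` and its methods are hypothetical, sizes are in arbitrary units, and real ZFS space accounting (snapshots, metadata, redundancy) is far more involved.

```python
class ToyPool:
    """A pool of shared space with per-data-set quotas and reservations."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.datasets = {}  # name -> {"used", "quota", "reservation"}

    def add_device(self, size):
        """Growing the pool makes the new space available to every data set."""
        self.capacity += size

    def create(self, name, quota=None, reservation=0):
        """Create a data set; quota caps it, reservation guarantees it space."""
        self.datasets[name] = {
            "used": 0, "quota": quota, "reservation": reservation,
        }

    def available(self, name):
        """Space this data set could still write."""
        ds = self.datasets[name]
        # Other data sets claim whichever is larger: usage or reservation.
        claimed = sum(
            max(other["used"], other["reservation"])
            for other_name, other in self.datasets.items()
            if other_name != name
        )
        free = self.capacity - claimed - ds["used"]
        if ds["quota"] is not None:
            free = min(free, ds["quota"] - ds["used"])
        return max(free, 0)

    def write(self, name, size):
        """Consume space in a data set, failing if it would not fit."""
        if size > self.available(name):
            raise OSError("no space left in data set %r" % (name,))
        self.datasets[name]["used"] += size
```

For example, in a pool of 100 units with a "photos" data set capped by a quota of 30 and a "docs" data set with a reservation of 50, "photos" can never grow past 30, and a third data set only sees what the other two leave unclaimed.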
There are some additional features of ZFS, such as data deduplication, compression, and encryption, that may be attractive to others. However, their availability depends on the zpool version used. Moreover, for a home server, data deduplication isn’t important for me (I don’t have multiple copies of the same file). Media files (e.g., JPEG or MP3) do not compress well since they are usually already compressed with some lossy compression algorithm. Lastly, for encryption, one can always use LUKS/dm-crypt to encrypt the device before adding it to the ZFS pool.
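The point about media files is easy to check with a quick experiment. The sketch below is an assumption-laden stand-in: it uses Python's zlib rather than anything ZFS-specific, and random bytes play the role of an already-compressed JPEG or MP3, since well-compressed data looks close to random to a general-purpose compressor.

```python
import os
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (lower means better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly repetitive text compresses very well.
text_like = b"The quick brown fox jumps over the lazy dog.\n" * 2000

# Random bytes stand in for an already-compressed media file.
media_like = os.urandom(100_000)

print("text-like ratio:  %.3f" % compression_ratio(text_like))
print("media-like ratio: %.3f" % compression_ratio(media_like))
```

The second ratio comes out at roughly 1.0, meaning the CPU time spent compressing buys essentially no space.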
ZFS is not in the Linux kernel because it is distributed under the CDDL, which is incompatible with the GPL. However, there are two implementations of ZFS on Linux: zfs-fuse and Native ZFS for Linux. I tried the LLNL Native ZFS for Linux, but there were issues with copying files larger than 2 GB on a 32-bit system (the VIA ARTiGO A2000 I use for my home server is a 32-bit system). zfs-fuse worked fine with large files, and I haven’t had any problems with it for a long time, so it seems quite reliable.
ZFS, compared with other filesystems, is not very fast (Phoronix benchmark here and here). However, the performance is fine for home use. So, if data integrity is important, ZFS may be something worth considering.