ZFS – One File System to Rule Them All

ZFS [1] is one of the few enterprise-grade file systems that offers advanced storage features such as in-line deduplication, in-line compression, copy-on-write, and snapshots. These features are handy in a variety of scenarios, from backups to virtual machine image storage. A native port of ZFS is also available for Linux [2]. Here we take a look at the compression and deduplication features of ZFS through a few examples.

Setting ZFS up

ZFS handles disks much as operating systems handle memory: physical devices are aggregated behind a logical layer, so the file system never deals with individual disks directly. This logical separation between the file system and the physical disks is called a "pool" in ZFS terms.

Here we simply create a large file, attach it to a loopback device to mimic a disk, and create a pool on top of it:

# fallocate -l10G test1.img
# losetup /dev/loop0 test1.img
# zpool create testpool /dev/loop0
# zpool list
NAME      SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
testpool  9.94G  124K   9.94G  0%   1.00x  ONLINE  -
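As a side note, `fallocate` reserves the full 10 GB on disk immediately. If you only want to experiment, a sparse backing file works just as well and occupies almost no real space until ZFS writes to it. A minimal sketch (the file name `test2.img` is our own):

```shell
# Create a sparse 10 GB image: apparent size is 10 GB, but no blocks
# are actually allocated until something is written to the file.
truncate -s 10G test2.img

# du reports the real allocation, which is (close to) zero for now:
du -m test2.img
```

You would attach it with `losetup` exactly as above; just keep in mind that the host file system must be able to absorb the writes as the pool fills up.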

Let's create a file; note that the pool is automatically mounted at /$POOLNAME (here, /testpool):

# cd /testpool
# dd if=/dev/urandom of=randfile bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 15.3166 s, 6.8 MB/s
# zpool list
NAME      SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
testpool  9.94G  100M   9.84G  0%   1.00x  ONLINE  -

Deduplication/compression

ZFS supports in-line deduplication and compression. When these features are enabled, the file system automatically detects duplicate blocks as they are written and stores only one physical copy, and it compresses blocks that contain compressible data. First, let's see how deduplication can save disk space:

# zpool list
NAME      SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
testpool  9.94G  100M   9.84G  0%   1.00x  ONLINE  -
# cp randfile randfile2
# zpool list
NAME      SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
testpool  9.94G  200M   9.74G  1%   1.00x  ONLINE  -
# zfs create -o dedup=on testpool/deduplicated
# ls
deduplicated randfile randfile2
# mv randfile deduplicated/
# mv randfile2 deduplicated/
# zpool list
NAME      SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
testpool  9.94G  101M   9.84G  0%   2.00x  ONLINE  -
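To sanity-check the numbers above: the two identical 100 MB files reference 200 MB of logical data, but the pool stores only one physical copy of about 100 MB, which matches the 2.00x DEDUP ratio that zpool reports:

```shell
# Two identical 100 MB files: 200 MB referenced, ~100 MB actually stored.
logical=200
physical=100
awk -v l="$logical" -v p="$physical" 'BEGIN { printf "dedup ratio: %.2fx\n", l/p }'
```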

Next, let's see how compression can save disk space using the gzip algorithm. We have extracted the Linux 3.12.6 kernel source tree into the pool beforehand:

# zfs create -o compression=gzip testpool/compressed
# ls
compressed deduplicated linux-3.12.6
# zpool list
NAME      SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
testpool  9.94G  532M   9.42G  5%   1.00x  ONLINE  -
# mv linux-3.12.6 compressed/
# zpool list
NAME      SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
testpool  9.94G  155M   9.79G  1%   1.00x  ONLINE  -
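Taking the pool-level ALLOC figures at face value (532 MB before the move, 155 MB after), gzip shrank the stored data by roughly 3.4x, which is plausible for a tree of C source and text files:

```shell
# Pool ALLOC before vs. after moving the tree into the compressed dataset.
awk 'BEGIN { printf "compression ratio: %.1fx\n", 532/155 }'
```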

Discussion

As you can see, deduplication and compression can save serious amounts of disk space, and you can enable both together if your workload benefits from it. Deduplication shines when there is a lot of duplicate data within or across files (e.g., virtual machine images); compression helps whenever the data itself is compressible (e.g., text or source code). These benefits are not free, though. Deduplication maintains a table of block checksums in memory to detect duplicates; depending on the data, this can require a few GB of RAM per TB of stored data. Compression, on the other hand, spends CPU cycles on every write, and decompression on every read.
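A back-of-the-envelope estimate makes the memory cost concrete. Assuming the commonly cited rule of thumb of roughly 320 bytes of dedup-table entry per unique block (an approximation, not an exact figure) and the default 128 KB ZFS record size:

```shell
# Number of unique 128 KB blocks in 1 TB of data:
blocks=$(( 1024 * 1024 * 1024 * 1024 / (128 * 1024) ))

# ~320 bytes of dedup-table entry per block (rule of thumb):
echo "$(( blocks * 320 / 1024 / 1024 )) MB of RAM per TB"
```

That works out to about 2.5 GB per TB; smaller record sizes mean more blocks and proportionally more memory.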

[1] http://en.wikipedia.org/wiki/ZFS
[2] http://zfsonlinux.org
