ZFS – One File System to Rule Them All

ZFS [1] is one of the few enterprise-grade file systems with advanced storage features such as in-line deduplication, in-line compression, copy-on-write, and snapshots. These features are handy in scenarios ranging from backups to virtual machine image storage. A native port of ZFS is also available for Linux [2]. Here we look at ZFS's compression and deduplication features through some examples.

Setting ZFS up

ZFS handles disks much like an operating system handles memory: it creates a logical separation between the file system and the physical disks. This logical layer is called a “pool” in ZFS terms.

Here we simply create a large file to mimic a disk, attach it to a loopback device, and create a pool on top:

# fallocate -l10G test1.img
# losetup /dev/loop0 test1.img
# zpool create testpool /dev/loop0
# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
testpool 9.94G 124K 9.94G 0% 1.00x ONLINE -

Let’s create a file. Note that the pool gets mounted on /$POOLNAME:

# cd /testpool
# dd if=/dev/urandom of=randfile bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 15.3166 s, 6.8 MB/s
# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
testpool 9.94G 100M 9.84G 0% 1.00x ONLINE -

Deduplication/compression

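ZFS supports in-line deduplication and compression. When these features are enabled on a dataset, the file system detects duplicate blocks as they are written and stores them only once, and it compresses data that is compressible. Both properties apply only to data written after they are enabled, which is why the files below are deduplicated only once they are moved into a dataset created with dedup=on. Here we show how deduplication can help save disk space: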

# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
testpool 9.94G 100M 9.84G 0% 1.00x ONLINE -
# cp randfile randfile2
# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
testpool 9.94G 200M 9.74G 1% 1.00x ONLINE -
# zfs create -o dedup=on testpool/deduplicated
# ls
deduplicated randfile randfile2
# mv randfile deduplicated/
# mv randfile2 deduplicated/
# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
testpool 9.94G 101M 9.84G 0% 2.00x ONLINE -
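
To make the 2.00x figure more concrete, here is a minimal sketch that estimates block-level duplication in a single file by hashing fixed-size blocks and counting the unique digests. The function names block_hashes and dedup_estimate are hypothetical, and the 128 KiB block size is an assumption based on the default ZFS recordsize. It only illustrates the idea and is not how ZFS itself computes the ratio.

```shell
#!/bin/sh
# Rough, illustrative estimate of block-level duplication in one file.
# Assumption: fixed 128 KiB blocks (the default ZFS recordsize).
# block_hashes and dedup_estimate are hypothetical helper names.

# Print one SHA-256 digest per block of the file.
block_hashes() {
    file=$1
    bs=${2:-131072}
    size=$(wc -c < "$file")
    nblocks=$(( (size + bs - 1) / bs ))
    i=0
    while [ "$i" -lt "$nblocks" ]; do
        # Read exactly one block at offset i and hash it.
        dd if="$file" bs="$bs" skip="$i" count=1 2>/dev/null | sha256sum | cut -d' ' -f1
        i=$((i + 1))
    done
}

# Print "<total blocks> <unique blocks>" for the file.
dedup_estimate() {
    total=$(block_hashes "$@" | wc -l)
    unique=$(block_hashes "$@" | sort -u | wc -l)
    echo "$((total)) $((unique))"
}
```

For example, running dedup_estimate on a concatenation of randfile and randfile2 (cat randfile randfile2 > both) should report twice as many total blocks as unique blocks, which matches the 2.00x ratio above.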

Here we show how compression with the gzip algorithm can help save disk space, using a Linux kernel source tree (linux-3.12.6) as sample data:

# zfs create -o compression=gzip testpool/compressed
# ls
compressed deduplicated linux-3.12.6
# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
testpool 9.94G 532M 9.42G 5% 1.00x ONLINE -
# mv linux-3.12.6 compressed/
# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
testpool 9.94G 155M 9.79G 1% 1.00x ONLINE -
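
Before enabling compression on a dataset, it can help to get a rough feel for how compressible the data is. The sketch below uses the gzip command-line tool for that. The function name gzip_percent is hypothetical, and the result is only an approximation, since ZFS compresses each record separately rather than the whole file. On a live dataset, the compressratio property (zfs get compressratio testpool/compressed) reports the ratio actually achieved.

```shell
#!/bin/sh
# Print the gzip-compressed size of a file as a percentage of its original size.
# Assumption: the gzip CLI at its default level roughly approximates what
# compression=gzip achieves; ZFS compresses per record, so numbers will differ.
# gzip_percent is a hypothetical helper name.
gzip_percent() {
    orig=$(wc -c < "$1")
    if [ "$orig" -eq 0 ]; then
        echo "empty file" >&2
        return 1
    fi
    comp=$(gzip -c "$1" | wc -c)
    echo $(( comp * 100 / orig ))
}
```

A low percentage (text, source code) suggests compression is worthwhile. A value near 100 (random or already compressed data, such as randfile above) suggests it will mostly cost CPU time.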

Discussion

As you can see, deduplication and compression can save a significant amount of disk space. You can also enable both together, according to your needs. Deduplication is especially useful when many blocks are identical across or within files (e.g. virtual machine images). Compression is useful when the data within files is compressible (e.g. text, source code). Benefits aside, deduplication needs a hash table (the dedup table) to detect identical blocks. Depending on the data, you may need a couple of GBs of memory per TB of data. Compression and decompression, on the other hand, can burn a lot of CPU cycles.
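
The memory figure can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below assumes roughly 320 bytes of RAM per dedup-table entry, a commonly cited rule of thumb that does not come from this article, and one entry per unique block. The function name ddt_mem_bytes and its parameters are hypothetical.

```shell
#!/bin/sh
# Back-of-the-envelope estimate of dedup-table memory.
# Assumptions (not from the article): ~320 bytes of RAM per table entry,
# one entry per unique block, 128 KiB blocks unless overridden.
# ddt_mem_bytes is a hypothetical helper name.
ddt_mem_bytes() {
    data_bytes=$1            # amount of unique data, in bytes
    recordsize=${2:-131072}  # block size, in bytes
    entry_bytes=${3:-320}    # assumed RAM per dedup-table entry
    echo $(( data_bytes / recordsize * entry_bytes ))
}

# 1 TiB of unique data in 128 KiB blocks
ddt_mem_bytes 1099511627776   # prints 2684354560 (about 2.5 GiB)
```

Under these assumptions, 1 TiB of unique 128 KiB blocks needs about 2.5 GiB, in line with the "couple of GBs per TB" estimate. Smaller blocks raise the requirement proportionally: with 8 KiB blocks the same calculation gives about 40 GiB.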

[1] http://en.wikipedia.org/wiki/ZFS
[2] http://zfsonlinux.org


3 thoughts on “ZFS – One File System to Rule Them All”

  1. Could you explain why deduplication didn’t happen until you moved the files into the deduplicated directory?

    I would have thought that creating an additional file with the same contents would not result in additional usage of the filesystem.

  2. In ZFS, you can create multiple folders that have their own set of zfs properties just like the root filesystem… you are not limited to a single mountpoint but the same pool can have many folders created at many mountpoints.

    The following steps were left out of the author’s article

    zfs create testpool/deduplicated
    zfs set dedup=on testpool/deduplicated

    You can see here dedup was enabled but only inside this specific folder, hence there was no deduplication until they were moved.
    By default the new folder will inherit settings from the global pool until they are overridden, such as with the zfs set above.

  3. Thanks for your comments. As Trent has mentioned, in ZFS you can have different volumes within a pool that have different properties. I have now included the command used to create the deduplicated volume.
