ZFS is the free copy-on-write filesystem from Oracle (originally developed at Sun Microsystems). I’m currently using ZFS and Solaris 11 Express as the base of my home file server, and I thought I’d explain a bit about how it works. One of the most interesting things about ZFS is its ability to use fast SSDs to speed things up. I’m currently using two mirrored 60GB SSDs as ZIL (ZFS Intent Log) and one 60GB SSD as L2ARC (Layer 2 Adaptive Replacement Cache). Roughly speaking, the ZIL is the write cache and the L2ARC is the read cache. Why do I need those? Well, I don’t. But it’s interesting to learn about the file system and how it works, so I just HAD to try it out.
The old model
The old model of file servers uses the machine’s RAM as a read cache, and the rest of the data lives on normal, spinning disks.
The new model
Now, how can we be more efficient? The problem with disks is that they are slooooow, really slow. The problem with RAM is that you never have enough of it. The solution is to insert another layer in the storage hierarchy: an SSD layer. The fast SSDs act as a cache that is much faster than spinning disks and has a lot more capacity than RAM.
ZFS uses the new model, but with a “twist”. As I mentioned before, there are two kinds of SSD cache in ZFS: ZIL and L2ARC.
The ZIL, or ZFS Intent Log, is the ZFS write cache. Many applications, like databases, need to do synchronous writes to ensure that the data is safely down on stable storage. This tends to be a problem since sync writes are really slow. Normally ZFS collects writes into transaction groups, and these are flushed out to disk every few seconds. Does the database want to wait that long? Probably not, so a ZFS intent log record that says “I’m about to write baladibla to block bla bla” is written to disk instead. That is painfully slow on spinning disks, but at least the data won’t be gone in case of a power failure. This works pretty much like the transaction log in a normal database. With dedicated ZIL devices, these log records are stored on fast SSDs instead of the slow spinning disks, so the sync writes can be acknowledged much faster.
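To make the difference between a sync write and a normal write a bit more concrete, here is a small toy model in Python. It is only a sketch of the idea described above, not how ZFS is actually implemented: the class `ToyPool` and all of its method names are made up for illustration, and the real thing is a lot more involved.

```python
class ToyPool:
    """Toy model of sync writes, the intent log and transaction groups.

    Hypothetical illustration only; none of these names exist in ZFS.
    """

    def __init__(self):
        self.disk = {}        # data that has safely reached the main pool
        self.txg = {}         # pending writes held in RAM (lost on power failure)
        self.intent_log = []  # log records on stable storage (e.g. an SSD log device)

    def write(self, block, data, sync=False):
        if sync:
            # A sync write is only acknowledged once its log record is stable.
            self.intent_log.append((block, data))
        # Every write, sync or not, also joins the in-memory transaction group.
        self.txg[block] = data

    def commit_txg(self):
        # Runs every few seconds: flush the whole group to the pool.
        self.disk.update(self.txg)
        self.txg.clear()
        # The logged writes are now in the pool, so the records can be dropped.
        self.intent_log.clear()

    def crash_and_recover(self):
        self.txg.clear()  # whatever was only in RAM is gone
        for block, data in self.intent_log:
            self.disk[block] = data  # replay the logged sync writes
        self.intent_log.clear()


pool = ToyPool()
pool.write("a", "async data")            # not logged
pool.write("b", "sync data", sync=True)  # logged before being acknowledged
pool.crash_and_recover()                 # power failure before the next flush
print(pool.disk)
```

In this model only the sync write survives the crash, because it is the only one that made it into the log. Note also that the log is written on every sync write but only read back after a crash, which is why the speed of the log device matters so much for sync-heavy workloads.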
The L2ARC, on the other hand, is totally different: this is the ZFS read cache. In the old model the requested data would first be looked up in the cache in RAM, and if it was missing there it would have to be read from disk. Disk reads are slow, can we please avoid them? Yes, we can. We’re inserting another layer between RAM and the spinning disks, consisting of much faster SSDs. They work as an extension of the normal cache in RAM (called ARC, hence L2ARC). The cache is filled according to some basic rules like “most frequently used”, “most recently used” and so on. When you read data the system first checks the ARC, then the L2ARC and last the spinning disks. This means a lot faster reads, especially random reads, which tend to be extremely slow on spinning disks.
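The lookup order (ARC, then L2ARC, then disk) can be sketched in a few lines of Python. Again this is a toy model under some loud assumptions: the class `TieredReadCache` is hypothetical, both tiers use plain “least recently used” eviction, and the second tier is filled with whatever falls out of the first. The real ARC balances recency against frequency and feeds the L2ARC in its own way, so treat this as an illustration of the layering only.

```python
from collections import OrderedDict


class TieredReadCache:
    """Toy model of the ARC -> L2ARC -> disk read path (not real ZFS code)."""

    def __init__(self, disk, arc_size, l2arc_size):
        self.disk = disk            # dict: block id -> data (the slow pool)
        self.arc = OrderedDict()    # small, fast tier (stands in for RAM)
        self.l2arc = OrderedDict()  # bigger, slower tier (stands in for SSD)
        self.arc_size = arc_size
        self.l2arc_size = l2arc_size
        self.hits = {"arc": 0, "l2arc": 0, "disk": 0}

    def read(self, block):
        if block in self.arc:
            self.hits["arc"] += 1
            self.arc.move_to_end(block)  # mark as most recently used
            return self.arc[block]
        if block in self.l2arc:
            self.hits["l2arc"] += 1
            data = self.l2arc[block]
        else:
            self.hits["disk"] += 1
            data = self.disk[block]
        self._insert_arc(block, data)
        return data

    def _insert_arc(self, block, data):
        self.arc[block] = data
        self.arc.move_to_end(block)
        if len(self.arc) > self.arc_size:
            # Evict the least recently used block from the first tier...
            old_block, old_data = self.arc.popitem(last=False)
            # ...and keep it in the second tier (assumed fill policy).
            self.l2arc[old_block] = old_data
            self.l2arc.move_to_end(old_block)
            if len(self.l2arc) > self.l2arc_size:
                self.l2arc.popitem(last=False)


disk = {i: f"data-{i}" for i in range(10)}
cache = TieredReadCache(disk, arc_size=2, l2arc_size=4)
for block in [0, 1, 2, 0, 2, 3, 1]:
    cache.read(block)
print(cache.hits)
```

With a first tier this small, several of the repeated reads miss in “RAM” but are still caught by the second tier instead of going all the way to disk, which is exactly the job the SSD layer is supposed to do.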
In my own testing I get about three times the write performance and twice the read performance with this setup. Quite a nice addition to my home server indeed. I should also mention that I use VMware ESXi for virtualization, with the backend storage running over NFS against the file server. NFS uses sync writes, and this is probably where I saw the biggest difference when I added the ZIL devices. Earlier the VMs would pretty much become unresponsive whenever I copied a big file, because they couldn’t get their sync writes through to the disks. This is gone now, thankfully.