Hypercritical
ZFS data integrity explained
How many CPU cycles is your data worth?
As a follow-up to my earlier post on the relative stagnation of file system design, I was going to write a summary of ZFS’s approach to data integrity—how it works and what makes it different from past approaches—but it looks like someone at Sun has already done it for me. This recent entry in Jeff Bonwick’s blog is short and to the point. Here’s the introduction, to whet your appetite.
The job of any filesystem boils down to this: when asked to read a block, it should return the same data that was previously written to that block. If it can’t do that -- because the disk is offline or the data has been damaged or tampered with -- it should detect this and return an error.
Incredibly, most filesystems fail this test. They depend on the underlying hardware to detect and report errors. If a disk simply returns bad data, the average filesystem won’t even detect it.
Read the whole post to find out how ZFS solves¹ the problem. Okay, I’ll (partially) spoil it for you. As revealed in the RSS summary of the post you are reading right now (see why you should subscribe?), ZFS trades CPU cycles for peace of mind. Every block is checksummed after it arrives from the disk (or network or whatever).
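To make that concrete, here is a minimal, self-contained sketch in C of the basic mechanism: checksum a block as if at write time, flip a single bit to simulate silent on-disk corruption, then re-checksum at read time and compare. The Fletcher-4-style routine and the 4 KiB block size are my own illustrative choices, in the spirit of ZFS’s default checksum; this is not ZFS code.

    /* Illustrative sketch only, not ZFS code: checksum a block "at write time,"
       flip one bit to simulate silent corruption on disk, then verify "at read
       time." The Fletcher-4-style routine mirrors the spirit of ZFS's default
       checksum; the 4 KiB block size is arbitrary. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE 4096

    static void fletcher4(const uint32_t *buf, size_t words, uint64_t sum[4])
    {
        uint64_t a = 0, b = 0, c = 0, d = 0;
        for (size_t i = 0; i < words; i++) {
            a += buf[i];   /* four running sums of increasing order */
            b += a;
            c += b;
            d += c;
        }
        sum[0] = a; sum[1] = b; sum[2] = c; sum[3] = d;
    }

    int main(void)
    {
        uint32_t block[BLOCK_SIZE / sizeof(uint32_t)];
        uint64_t written[4], readback[4];

        memset(block, 0x5A, sizeof(block));
        fletcher4(block, BLOCK_SIZE / sizeof(uint32_t), written);   /* at write time */

        ((uint8_t *)block)[100] ^= 0x01;   /* one silently flipped bit "on disk" */

        fletcher4(block, BLOCK_SIZE / sizeof(uint32_t), readback);  /* at read time */
        if (memcmp(written, readback, sizeof(written)) != 0)
            printf("corruption detected: checksum mismatch\n");
        else
            printf("block verified\n");
        return 0;
    }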
One thing Jeff Bonwick’s post doesn’t go into, however, is the actual cost in CPU cycles of all this checksumming. An architecture like Sun’s Niagara, with its many small, simple CPU cores, is well suited to absorbing this kind of overhead. Finding a spare core to “dedicate” to checksumming during heavy I/O is easier when your CPU has eight of them. But is the same trade-off well suited to more traditional desktop CPUs with one or two cores? How much CPU overhead is acceptable for a desktop file system driver?
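The post doesn’t answer that, but you can get a rough feel for the raw cost on your own hardware by timing the same Fletcher-4-style routine over many 128 KiB blocks (a common ZFS record size). Again, an illustrative sketch with made-up block size and iteration count; it measures nothing about ZFS itself, only what this style of checksum costs your CPU per block.

    /* Illustrative sketch only: a back-of-the-envelope measurement of what this
       style of checksum costs per block on the machine running it. The 128 KiB
       block size (a common ZFS record size) and iteration count are my own
       choices; this measures nothing about ZFS itself. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define BLOCK_SIZE (128 * 1024)
    #define ITERATIONS 10000

    static void fletcher4(const uint32_t *buf, size_t words, uint64_t sum[4])
    {
        uint64_t a = 0, b = 0, c = 0, d = 0;
        for (size_t i = 0; i < words; i++) {
            a += buf[i];
            b += a;
            c += b;
            d += c;
        }
        sum[0] = a; sum[1] = b; sum[2] = c; sum[3] = d;
    }

    int main(void)
    {
        uint32_t *block = malloc(BLOCK_SIZE);
        if (block == NULL)
            return 1;
        memset(block, 0xA5, BLOCK_SIZE);

        uint64_t sum[4] = { 0 };
        clock_t start = clock();   /* CPU time, not wall-clock time */
        for (int i = 0; i < ITERATIONS; i++)
            fletcher4(block, BLOCK_SIZE / sizeof(uint32_t), sum);
        double cpu_secs = (double)(clock() - start) / CLOCKS_PER_SEC;

        double total_mib = (double)BLOCK_SIZE * ITERATIONS / (1024.0 * 1024.0);
        printf("checksummed %.0f MiB in %.3f s of CPU time\n", total_mib, cpu_secs);
        printf("%.1f microseconds per 128 KiB block, %.0f MiB/s (sum[0]=%llu)\n",
               cpu_secs * 1e6 / ITERATIONS, total_mib / cpu_secs,
               (unsigned long long)sum[0]);   /* use the sum so the loop isn't optimized away */

        free(block);
        return 0;
    }

Fletcher-style sums were chosen precisely because they are cheap; ZFS also offers stronger checksums, such as SHA-256, for those willing to spend more cycles per block.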
My position is that the PC industry should just bite the bullet now and get it over with. Apple proved the feasibility (if not necessarily the desirability) of this approach with Quartz. Make the hard decisions early (“Everything is composited! No exceptions!”), perhaps even before the hardware is really ready, and work out the details later. This is a reasonable approach when the pay-off is big enough. I think data integrity is such a situation—even more so than modernizing the display layer.
My only regret is that I have but one CPU core to give for my data…
1. Yes, if you think about it for too long, Jeff’s explanation of the ZFS solution starts to be consumed by its own paranoia. The disk drive, the firmware, and the network are all fallible, but the host CPU is not? But really, this is a bit unfair. As a file system designer, you can only solve problems that are in your domain. ZFS does better in this regard than all previous file systems. It has found a clever way to bypass several historical sources of data corruption, so I don’t fault it for being vulnerable to one final layer of fallible hardware.
This article originally appeared at Ars Technica. It is reproduced here with permission.