Hypercritical


Who’s minding the store?

The dismal state of personal data storage and the Sun light at the end of the tunnel.

In an earlier post, I discussed a possible hardware solution for raising the standards of data integrity in the personal computer market. In it, I briefly acknowledged that there is a much wider world of data management (backups, accounting for user error, performance, etc.), but I chose to focus on a single aspect. Today I’d like to take on the rest of the topic.

Backing up a bit first, I’ve had two big concerns about the world of data storage for the past decade or so. The first is the historic notion of files as minimally-adorned streams of bytes. My early experience with the Mac thoroughly convinced me of the power and utility of thinking of files, instead, as arbitrarily extensible collections of attributes, only one of which is the traditional stream of bytes. This topic has been well aired here at Ars. Recently, I’ve started to see some progress in this area on my favorite platform.

My second concern is the continued lack of hardware abstraction and fault tolerance (human faults included), especially when it comes to personal data storage. For my entire computing life, it’s been the same story: disks, volumes, files, and folders. I don’t expect those to go away, mind you. Lower levels of abstraction that are truly useful take a very long time to disappear entirely. But I do expect further abstractions to be built on top of past achievements.

The transistor was followed by the logic gate, which was followed by the integrated circuit, which was followed by the CPU, the operating system, the MMU, the application programming interface, the graphical user interface, and on and on. Today, we use abstractions truly built on (and still actually running on, remember) the formerly “high-level” abstractions of past eras. This is what’s known as progress, and in the world of personal data storage, it’s been sorely lacking.

On my very first computer that had a hard drive (circa 1986), I recall allocating storage by creating volumes (also known as partitions), one or more per disk. In 2005, I’m still doing the same thing. Yeah, I’m still using a keyboard and mouse too, but in the case of data storage, it’s not because no one can come up with anything better. In the world of “enterprise” data storage, better solutions have existed for decades.

I’m talking about RAID, logical volume management, storage-area networks, log-structured and write-anywhere file systems, and all that good stuff. These things tend to come at “enterprise” prices as well. Most require expensive hardware, occupy more physical space than a traditional single hard drive, and often have significant configuration hurdles. But when it comes to abstraction, each has one or more advantages that we poor schleps with our fragile, slow, nearly disposable hard drives must live without.

Even enterprise storage has stagnated to some degree. Ask anyone who has to use one of these enterprise storage technologies and they’ll tell you a dozen things that could be improved in each of them. (It’s not hard to come up with plausible reasons for the relative lack of progress in the data storage world, but I’m not going to dwell on that now. It’s solutions I’m after.)

So, personal storage abstraction has not fundamentally changed in decades, and enterprise storage, while faring better, is burdened by expensive hardware and complex setup requirements. Maybe there’s something software can do to help?

When it comes to data storage, the “software side” usually means file systems. I’ve been a file system junkie since HFS replaced MFS. File systems are great at addressing my metadata concerns. Although progress is still painfully slow, it does happen. Witness NTFS on Windows, the newly enhanced HFS+ on Mac OS X, and even more audacious projects like ReiserFS on Linux.

Unfortunately, it’s proven difficult for file systems to innovate beyond the bounds of a single volume. If they want to be drop-in replacements for their predecessor file systems, they are usually at the mercy of whatever volume management mechanisms already exist. On a personal computer, that means the same old disk/partition/volume scheme that we’ve been stuck with my entire life.

Enterprise solutions get around this by selling themselves as “solutions.” You give us $150 grand and we give you a refrigerator-sized box with lots of blinking lights, plus two nerds at $250/hour to keep it all running. And like I said, ask the nerds and they’ll say even their own “advanced technology” still kinda sucks.

Is software, too, powerless to save us? Take heart, all is not lost! When it comes to software, all it takes to start an avalanche is one tiny, shining, metaphorically conflicted pebble to act as a beacon of light for the rest to follow. Arguably, this has happened before—and recently—in the data storage world, when BFS demonstrated the power of high-performance, arbitrarily extensible metadata when applied to personal computers. Today, everyone’s doing the metadata thing (or trying to, anyway).

What we need is the equivalent of BFS for the world of volume management and software-based data integrity. Like BFS, it doesn’t even have to be a success in the market. The only requirement is that it be unencumbered by the assumptions of the past. Put simply, we need someone, anyone, to Think Big and then act on those thoughts.

Enter ZFS from Sun. Yes, Sun! These days, Sun is the red-headed stepchild to Linux’s darling son in the Unix market. But Sun has proven unique among large technology companies in its willingness to do what is hard. While their success rate may not be enviable, their motivations are. And when something does hit the mark (e.g., Java), the pay-off can be big. I’m not shocked that ZFS came from Sun.

I’ve been following ZFS since it was first announced over a year ago. Since then, there has been little fanfare among regular users—or anyone, really, except maybe some big Solaris geeks. (Yes, they exist. Yes, outside Sun.) Then, just a few days ago, Sun made good on its promise to make ZFS open source. The interest level this time? Barely two pages in the discussion thread attached to the Ars article.

This is all part of why progress is slow in the world of data storage. I know I said I wouldn’t dwell on the causes, but the contrast is readily apparent. Advances in data integrity, performance, and reliability can have a huge impact on even the lowly consumer. Yet there is almost no interest in this area, even among self-described geeks. It’s a shame, and it’s frustrating for a file system geek like me.

ZFS is definitely worth getting excited about. You can read all about it at the OpenSolaris web site. If you’re technically inclined, I recommend starting with this PDF presentation, boldly named “ZFS - The Last Word in File Systems.” (Har, so clever.) For now, the executive summary is that ZFS does away with the old restrictions on volume size and scope, while also addressing data integrity and performance issues, all from a purely software perspective. (Like one slide says, “ZFS loves cheap disks!”)

The end game is a world where storage—even personal storage—actually behaves like the magically intelligent, infinitely expandable cloud that we’d all like to think it is, and less like those temperamental little cylinders (to use some diagram-speak, if I may). It’s daring, free-thinking stuff.

If ZFS turns out to be as impressive in its implementation as it is in its philosophy and design, I will be ecstatic. If Apple co-opts it, KHTML-style, and incorporates it into Mac OS X 10.6 Lion, I think I’ll explode. If not, well, then maybe some forward-thinking Linux hackers will grab the source code and run with it. (I think they’d be crazy not to…ah, but then there’s that pesky licensing issue. Sigh.) Then again, maybe ZFS will be a flop.

In the end, I don’t think it matters. The folks at Sun have shown the way forward in a manner that’s difficult to ignore. They’re putting their product—and their source—where their collective mouth is. Or rather, their minds. Even plucky innovators like Hans Reiser have not been able to think outside the box that keeps file systems in their traditional place. With ZFS, Sun has broken the logjam. Free your minds, people, and your disks will follow.


This article originally appeared at Ars Technica. It is reproduced here with permission.