One of the leading NoSQL databases has reached the coveted 1.0 release. Apache announced Cassandra 1.0 today, just two years after entering the Apache Incubator. Originally developed at Facebook, Cassandra has come a long way in a short time.
What's new in this release? Cassandra has better performance, better disk space management and data compression on a per-ColumnFamily basis, among other things.
Cassandra also adds a Windows service in this release, so users can set it up as a managed service rather than running it from a batch (.bat) file.
I spoke briefly today with Jonathan Ellis, vice president of Apache Cassandra and CTO of DataStax, about the 1.0 release and what comes after it. Ellis says that they polled Cassandra users two years ago to see what they'd want in Cassandra. That produced a wishlist of features that the developers "kept in the back of their minds." Ellis says that, with 1.0, "we looked at that, and we got all of those done. Not all in 1.0 specifically, but in the two years since the survey, we've nailed all of those."
"We," in this case, is not just DataStax. Ellis says that DataStax employs "the majority" of people who work on Cassandra, but that it's a "long tail" project with lots of contributors. Ellis also says that Twitter and Netflix are major contributors, as well as Rackspace "historically." Beyond that, he says, there's a long list of users who contribute minor patches to fix issues that affect them directly but who aren't involved in core development.
On Oracle
Cassandra may be young, but it's got the industry's attention. According to Apache's release, it's in production use by Twitter, Netflix, Urban Airship, Constant Contact and Google. It's being used for barcode scanning and geospatial databases. The largest known production cluster exceeds 300 terabytes. It is, in short, a project that has commercial potential. Naturally, that's attracted the big guns in the database industry.
Say what you will about Oracle (I do), but the company certainly has the muscle to make an impact when it decides it wants to get involved with a technology. That's doubly true when it comes to the database market. So how's Ellis feeling about Oracle's decision to get into the NoSQL game?
First, Ellis says that Oracle's entry actually helps validate the NoSQL market. "Living in the echo chamber the way we do on the cutting edge of technology, it's easy to lose sight of how new this is for a majority of companies out there. This is hugely validating for us."
Validation of a market isn't much consolation if you're getting squeezed out of it, though. But Ellis says he's not worried about Oracle. "Time and again, big companies have proved it's really difficult for them to come out with a new product in a timely fashion."
It's also hard to judge right now exactly what Oracle is offering. Ellis says the company has put out very little technical detail. But from what he's seen, Ellis says that Oracle's offering is more like Cassandra than competing NoSQL databases like MongoDB. "So I think they're on the right track."
Next with Cassandra
Ellis says that he's just starting to think about the post-1.0 world for Cassandra. Two features do come to mind, though, that missed the boat for 1.0 and that were on a lot of wishlists. The first is triggers.
Database triggers let you define rules in the database, such as updating table X when table Y is updated. Ellis says that triggers will be necessary for Cassandra as it grows in popularity. "As more tools use it, that's something more users are going to be asking for."
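For readers who haven't used triggers, here is a minimal sketch of the general idea. It uses SQLite through Python's standard sqlite3 module, not Cassandra, which does not have triggers as of this release. The table names x and y simply mirror the "table X" and "table Y" in the description above, and the function name run_trigger_demo is made up for illustration.

```python
import sqlite3


def run_trigger_demo():
    """Show a generic SQL trigger: updating table y writes a row to table x.

    This is SQLite, not Cassandra; all names here are illustrative.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE y (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE x (y_id INTEGER, note TEXT);

        -- The rule lives in the database: it fires after every UPDATE on y.
        CREATE TRIGGER y_updated AFTER UPDATE ON y
        BEGIN
            INSERT INTO x (y_id, note)
            VALUES (NEW.id, 'y changed to ' || NEW.value);
        END;
        """
    )
    conn.execute("INSERT INTO y (id, value) VALUES (1, 'draft')")
    # The application only touches y; the trigger updates x on its own.
    conn.execute("UPDATE y SET value = 'final' WHERE id = 1")
    rows = conn.execute("SELECT y_id, note FROM x").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_trigger_demo())
```

The point of the sketch is that the application never writes to x directly. The database applies the rule itself, which is why tools built on top of a database tend to ask for the feature.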
Another feature that Ellis sees making its way into Cassandra is entity groups. Ellis says that this is for data that shares a primary key. For example, if you're storing email in Cassandra, you can ensure that the attachments and the body of an email are not saved independently. (Saving them independently would be bad, for instance, if you ended up with a draft email that had an updated body but an old version of an attachment.)
Currently, Cassandra is on a four-month release cycle. Ellis says that at some point it might make sense to move to a six-month cycle as the project matures. But for now, expect the next release of Cassandra in early 2012.