| Abstract: |
Faced with the costs of vertically scaling their relational database systems, some application developers have considered NOSQL databases, including Apache Cassandra, as alternatives. These databases solve the scaling problem by partitioning data, expanding horizontally and promising “eventual” consistency. Using these new databases effectively requires developers to take a different approach to modeling the data used in their applications. Specifically, developers must cope without transactions, ad-hoc queries or automatic indexes. Further, developers who come from a strong RDBMS background will find they need to overcome dogmatic thinking, as some best practices are no longer best. Developers should also be aware of situations in which Cassandra might not be the best tool for the job. In this presentation I will explore some of the motivations for using Cassandra, as well as examine the modeling patterns that are emerging as more application developers adopt and become familiar with NOSQL databases. I will also explain some of the architectural internals of Cassandra as they relate to how application data can be most effectively modeled. I will conclude with a few brief case studies outlining how some companies are using Cassandra. Apache Cassandra is a fully distributed non-relational database that offers the ability to scale horizontally with no single point of failure. It features a flexible, partially structured schema, customizable partitioning and multiple levels of indexing. Cassandra is in use at Digg, Twitter, Reddit, Facebook, Rackspace and other companies that have large, active data sets. The largest production cluster has over 100 TB of data spread over more than 150 machines. |