Cassandra’s Data Model

- October 23, 2019

My previous post on Cassandra focused on its fault tolerance, network architecture, and scalability. This one focuses on how the data stored in Cassandra is structured.

Cassandra is a wide-column data store. Logically, you can think of data stored in Cassandra like a compound index in a conventional SQL data store. If you know the row key and the column names you want, you can get exactly the data you need. If you are OK searching through ALL the columns of a row, then you just need the row key. And if you don’t have either but your data is stored somewhere in a Cassandra table, you’re in for an expensive full table scan. Unlike a SQL data store, however, you can have an essentially unlimited number of columns, and each row can have whatever columns it wants. That’s why it’s called a wide-column data store - you could have millions of columns if you wanted to!
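To make the three access patterns concrete, here is a minimal sketch that models a table as a plain Python dict of dicts. It is a toy, not how you would actually talk to Cassandra; the table contents and the function names (`get_columns`, `find_in_row`, `full_scan`) are made up for illustration.

```python
# Toy in-memory model of a wide-column table: row key -> {column name -> value}.
# Illustrative only; all names and data here are hypothetical.
table = {
    "user:1": {"email": "a@example.com", "name": "Ann", "city": "Oslo"},
    "user:2": {"name": "Bob", "last_login": "2019-10-01"},
}


def get_columns(table, row_key, column_names):
    """Cheapest case: the row key and the column names are both known."""
    row = table[row_key]
    return {name: row[name] for name in column_names if name in row}


def find_in_row(table, row_key, predicate):
    """Row key known, column unknown: look at every column of that one row."""
    return {
        name: value
        for name, value in table[row_key].items()
        if predicate(name, value)
    }


def full_scan(table, predicate):
    """Neither known: visit every column of every row (the expensive case)."""
    return [
        (row_key, name, value)
        for row_key, row in table.items()
        for name, value in row.items()
        if predicate(name, value)
    ]
```

Note that the two rows don’t share the same set of columns, and that the amount of data each function has to touch grows from a handful of columns, to one whole row, to the entire table.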

Within each row, the columns are stored in sorted order, so finding a specific column can be done very fast. What makes it difficult to visualize a Cassandra table is that you can’t think of it as a grid of rows and columns like you can for a SQL table. Instead, it’s more like a jagged 2D array. Each row can have a different number of columns, and the column names of two rows might overlap partly or not at all. In this sense, Cassandra tables have no enforced schema.
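Here is a rough sketch of why sorted columns make lookups fast, using binary search from Python’s standard `bisect` module. The `Row` class and the sample rows are hypothetical and only model the idea; they say nothing about Cassandra’s actual on-disk format.

```python
from bisect import bisect_left


class Row:
    """One row whose column names are kept in sorted order (toy model)."""

    def __init__(self):
        self.names = []   # sorted column names
        self.values = []  # values[i] belongs to names[i]

    def put(self, name, value):
        i = bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            self.values[i] = value          # overwrite an existing column
        else:
            self.names.insert(i, name)      # insert while keeping sort order
            self.values.insert(i, value)

    def get(self, name):
        """Binary search for a single column; None if the row lacks it."""
        i = bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.values[i]
        return None

    def slice(self, start, end):
        """All columns with start <= name < end, found by binary search."""
        lo = bisect_left(self.names, start)
        hi = bisect_left(self.names, end)
        return list(zip(self.names[lo:hi], self.values[lo:hi]))


# Two "jagged" rows: different column counts, no column names in common.
events = Row()
for day in ["2019-10-03", "2019-10-01", "2019-10-02"]:
    events.put(day, "login")

profile = Row()
profile.put("name", "Ann")
```

Because the names are sorted, a range of adjacent columns (here, a range of dates) can also be read as one contiguous slice instead of checking each column individually.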

Since the data is keyed first by row and then by column name, you need the row key to get the data your application wants. So you might keep multiple tables, each with a different row key, one for each kind of lookup you’ll need. And when you update the data, you will often need to update more than one table to keep them in sync. This is where Cassandra really differs from a SQL data store, where all of the data could be updated in a single transaction - in Cassandra it might take several separate writes.
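As a small illustration of keeping one table per lookup, the sketch below stores the same user record under two different row keys. The table names (`users_by_id`, `users_by_email`) and the `save_user` helper are assumptions made up for this example, again using plain dicts rather than a real Cassandra client.

```python
# Two toy "tables" holding the same user data under different row keys.
# Hypothetical names; plain dicts stand in for Cassandra tables.
users_by_id = {}
users_by_email = {}


def save_user(user_id, email, name):
    """Write the record to both tables. These are separate writes."""
    old = users_by_id.get(user_id)
    if old is not None and old["email"] != email:
        # The email changed, so the old entry in the second table is stale.
        users_by_email.pop(old["email"], None)
    users_by_id[user_id] = {"email": email, "name": name}
    users_by_email[email] = {"user_id": user_id, "name": name}


save_user(1, "ann@example.com", "Ann")
save_user(1, "ann@work.example.com", "Ann")  # email change touches both tables
```

Nothing ties the writes inside `save_user` together. If the program stopped between them, the two tables would disagree, which is exactly the bookkeeping a single SQL transaction would otherwise handle for you.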

But the trade-off is far greater scalability - without the need for ACID transactions, Cassandra can scale to many more reads and writes per second than a SQL data store with ACID properties.
