An Overview of Cassandra Architecture

September 22, 2021

Apache Cassandra– a name synonymous with availability, reliability and scalability is a NoSQL database with a unique architecture. Let’s delve a little deeper into what the architecture of Cassandra comprises of

Cassandra clusters

Cassandra has distributed system architecture. Each Cassandra instance (document confirming to a specific data type definition (DTD)) is called a node. You can add multiple nodes by horizontally scaling to form a cluster. A cluster is a shell in the entire Cassandra database

( Nodes to clusters by horizontal scaling)

Column– It is the innermost layer in the Apache Cassandra database. It is divided into different headings. The headings carry the primary data for the specific entry made

Rows– It is the next layer and are basically classifications under which the columns are divided.

Cassandra stores data in table and each table is grouped under columns and rows

Structure of the table in Cassandra

USE User_data;

CREATE TABLE User_table (

User_id int,

User_name text,

User_add text,

User_phone text,

PRIMARY KEY (User_id)

);

Insert into User_data (User_id, User_name, User_add, User_phone )

VALUES(100, ‘P’, ‘San Jose’, ‘9899112453’);

Insert into User_data (User_id, User_name, User_add, User_phone )

VALUES(101, ‘P Maz’, ‘Alameda’);

OUTPUT

Key space– The key space is the outermost layer in the storage. It contains the main data, distributed according to their properties.

A cluster is the main entry point of the data into the Cassandra database.

Each node in Cassandra is capable of performing the entire database operations and deal with client requests individually without requiring any master node. The cluster doesn’t hold a single point of failure owing to the distributed nature of architecture.

One or more nodes in the cluster serve as replicas for the set of data. With data replication among the nodes, Cassandra ensures no single point of failure

The cluster nodes communicate with each other through Gossip protocol. Gossip keeps one node apprised of the status of other nodes. Every second, one node gossips with three other nodes in the cluster.

A cluster in Cassandra is divided into racks (group of metal servers collaborating resources like network switches and power source) and data centers. Snitch configuration is used to group nodes into racks and data centers. Data replication depends on the configuration of racks and data centers.

Storage in Cassandra

Commit Log- It’s the crash recovery mechanism of Cassandra or to sync issues with the server on failure of the data node. All write operations are written to commit log.

Mem-table– After commit log the data is written to the Mem-table

SS table- When the contents of the Mem-table reach a threshold value, the data is flushed from it to a disk file called the SS- table