Data organization on disk in Apache Cassandra

on waitingforcode.com

Until now we've been working with Cassandra without looking at what happens under the hood. It's time to be a little more curious.

The directories used to store data can be configured, like other options, in the cassandra.yaml configuration file. Here we focus on 4 entries: hints_directory, data_file_directories, commitlog_directory and saved_caches_directory. By default they are located under the $CASSANDRA_HOME/data directory, and that is where they are stored in my installation:

.
├── commitlog
├── data
├── hints
└── saved_caches

4 directories, 0 files
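As a quick illustration of the layout above, here is a minimal Python sketch that maps each of the four cassandra.yaml entries to its default location under $CASSANDRA_HOME/data and reports whether the directory exists. The function name and the /opt/cassandra fallback path are hypothetical, and the sketch does not read cassandra.yaml, so any custom setting is ignored.

```python
import os
from pathlib import Path

# The four cassandra.yaml entries discussed above, mapped to the
# sub-directory each one uses under $CASSANDRA_HOME/data by default.
DEFAULT_SUBDIRS = {
    "hints_directory": "hints",
    "data_file_directories": "data",
    "commitlog_directory": "commitlog",
    "saved_caches_directory": "saved_caches",
}


def default_storage_dirs(cassandra_home):
    """Return the default path of each storage directory (hypothetical helper)."""
    base = Path(cassandra_home) / "data"
    return {option: base / subdir for option, subdir in DEFAULT_SUBDIRS.items()}


if __name__ == "__main__":
    # The fallback path is an assumption made for this example only.
    home = os.environ.get("CASSANDRA_HOME", "/opt/cassandra")
    for option, path in default_storage_dirs(home).items():
        status = "exists" if path.is_dir() else "missing"
        print(f"{option:24} {path} ({status})")
```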

Commitlog directory in Cassandra

Let's see what they contain, starting with the most obvious one, commitlog:

commitlog/
├── CommitLog-6-1460645722723.log
└── CommitLog-6-1460645722724.log

This directory contains the binary files holding commit logs. As mentioned in one of the previous articles, Cassandra uses these files to ensure durability. Every time a new write operation is issued, Cassandra appends it to the append-only binary commit log files. It's advised to move the commit log directory to a different disk or partition than the one storing the SSTables. This allows Cassandra to work on both commit logs and data at the same time.
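The idea of an append-only log used for durability can be sketched in a few lines of Python. This is a toy model and not Cassandra's implementation: the TinyCommitLog class, the JSON-lines format and the fsync on every write are assumptions made for the example (the real files are binary, and the real sync policy is configurable and not modelled here).

```python
import json
import os
import tempfile


class TinyCommitLog:
    """Toy append-only log: a write is persisted before it is applied in memory."""

    def __init__(self, path):
        self.path = path
        self.memtable = {}  # in-memory state, lost when the process dies

    def write(self, key, value):
        # 1. Append the mutation to the log and force it to disk.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply it in memory.
        self.memtable[key] = value

    def replay(self):
        """Rebuild the in-memory state from the log, e.g. after a restart."""
        self.memtable = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as log:
                for line in log:
                    record = json.loads(line)
                    self.memtable[record["key"]] = record["value"]
        return self.memtable


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "CommitLog-demo.log")
    node = TinyCommitLog(path)
    node.write("Player_1", "team A")
    node.write("Player_2", "team B")

    restarted = TinyCommitLog(path)  # simulates a crash: memtable starts empty
    print(restarted.replay())
```

Because records are only ever appended, replaying them in order is enough to recover the last value written for each key.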

Data directory in Cassandra

Another key part of data organization is... the data directory. It stores column family data in files known as SSTables. When listing the root data directory, we see all the keyspaces:

collectionstest  football    network_top_bad_dc   network_top_test  playerteamhistorytest  system_auth         system_schema  tabletest
deletetest       mappertest  network_top_good_dc  players           system                 system_distributed  system_traces

Each keyspace directory contains a structure similar to:

.
└── test-29be0f80fc9011e596a97d20979b7bc6
    ├── backups
    └── snapshots
        └── 2460013186390-test
            ├── ma-1-big-CompressionInfo.db
            ├── ma-1-big-Data.db
            ├── ma-1-big-Digest.crc32
            ├── ma-1-big-Filter.db
            ├── ma-1-big-Index.db
            ├── ma-1-big-Statistics.db
            ├── ma-1-big-Summary.db
            ├── ma-1-big-TOC.txt
            └── manifest.json

The x-TOC.txt file lists all the components of a given SSTable. The x-Data.db file, as the name indicates, stores the rows of this SSTable together with their data (columns, data size, keys). The x-Filter.db and x-Index.db files are lookup helpers: the first stores a bloom filter built over the row keys, and the second maps row keys to their offsets in the x-Data.db file. x-CompressionInfo.db describes how compressed blocks map to the corresponding data file. This file is optional: when compression is turned off, it is not created.

The last two files, x-Statistics.db and x-Summary.db, store, respectively, statistics about the SSTable and a summary of the index.
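To make the file naming more tangible, the following Python sketch splits names such as ma-1-big-Data.db into their parts and groups the components per SSTable. Reading the prefix as version, generation and format is my interpretation of the listing above and is not stated in the article, so treat the field names and the regular expression as assumptions; parse_component and group_by_sstable are hypothetical helpers.

```python
import re
from collections import defaultdict

# Assumed layout: <version>-<generation>-<format>-<component>, e.g. ma-1-big-Data.db
SSTABLE_FILE = re.compile(
    r"^(?P<version>[a-z]+)-(?P<generation>\d+)-(?P<format>[a-z]+)-(?P<component>.+)$"
)


def parse_component(filename):
    """Split an SSTable file name into its parts, or return None if it does not match."""
    match = SSTABLE_FILE.match(filename)
    if match is None:
        return None
    return {
        "version": match.group("version"),
        "generation": int(match.group("generation")),
        "format": match.group("format"),
        "component": match.group("component"),
    }


def group_by_sstable(filenames):
    """Group component names by (version, generation, format)."""
    tables = defaultdict(list)
    for name in filenames:
        parsed = parse_component(name)
        if parsed is not None:
            key = (parsed["version"], parsed["generation"], parsed["format"])
            tables[key].append(parsed["component"])
    return dict(tables)


if __name__ == "__main__":
    listing = [
        "ma-1-big-CompressionInfo.db", "ma-1-big-Data.db", "ma-1-big-Digest.crc32",
        "ma-1-big-Filter.db", "ma-1-big-Index.db", "ma-1-big-Statistics.db",
        "ma-1-big-Summary.db", "ma-1-big-TOC.txt", "manifest.json",
    ]
    for sstable, components in group_by_sstable(listing).items():
        print(sstable, components)
```

Files that do not follow the pattern, such as manifest.json, are simply skipped.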

With the tools shipped with Cassandra, such as sstablemetadata, we can inspect these files. Sample output for a data file includes, among other things, the partitioner, the bloom filter false-positive chance, the minimum and maximum clustering values found in the file, the static column types and the number of rows:

Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0,010000
Minimum timestamp: 1460553044290000
Maximum timestamp: 1460553044336000
SSTable min local deletion time: 2147483647
SSTable max local deletion time: 2147483647
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 0.9113924050632911
TTL min: 0
TTL max: 0
minClustringValues: [Player_1]
maxClustringValues: [Player_2]
Estimated droppable tombstones: 0.0
SSTable Level: 0
Repaired at: 0
ReplayPosition(segmentId=1460543064439, position=6474)
totalColumnsSet: 3
totalRows: 3
Estimated cardinality: 1
EncodingStats minTTL: 0
EncodingStats minLocalDeletionTime: 1442880000
EncodingStats minTimestamp: 1460553044290000
KeyType: org.apache.cassandra.db.marshal.UTF8Type
ClusteringTypes: [org.apache.cassandra.db.marshal.UTF8Type]
StaticColumns: {division:org.apache.cassandra.db.marshal.Int32Type, country:org.apache.cassandra.db.marshal.UTF8Type, foundationyear:org.apache.cassandra.db.marshal.Int32Type}
RegularColumns: {}
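If this output needs to be processed further, a small Python sketch like the one below can turn the "Key: value" lines into a dictionary. It only relies on the textual shape of the output shown above; parse_metadata is a hypothetical helper, lines without a ": " separator (such as the ReplayPosition one) are kept aside untouched, and all values stay strings.

```python
def parse_metadata(text):
    """Parse 'Key: value' lines; return (fields, lines that did not match)."""
    fields, unparsed = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first ': ' only, so values may contain colons themselves.
        key, separator, value = line.partition(": ")
        if separator:
            fields[key] = value
        else:
            unparsed.append(line)
    return fields, unparsed


if __name__ == "__main__":
    sample = """\
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0,010000
Compression ratio: 0.9113924050632911
ReplayPosition(segmentId=1460543064439, position=6474)
totalRows: 3
RegularColumns: {}
"""
    fields, unparsed = parse_metadata(sample)
    # The sample output uses a decimal comma, so normalize it before converting.
    fp_chance = float(fields["Bloom Filter FP chance"].replace(",", "."))
    print(fields["Partitioner"], fp_chance, int(fields["totalRows"]))
    print(unparsed)
```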

Saved cache directory in Cassandra

Another data-related directory is the saved caches one. As the name indicates, it stores the row and key caches, which helps Cassandra warm up faster at start-up. Sample content of the saved caches directory can look like this:

├── KeyCache-d.crc
└── KeyCache-d.db

Hints directory in Cassandra

The last directory to analyze is the one holding hints. To understand its importance, we first need to explain what hints are, since Cassandra's fault tolerance mechanism is based on them. When a coordinator node wants to write a new row to a node that is not available (maintenance, temporary problems), it tries to perform the write on a replica of that node. If the replica is not available either, the write is made locally.

Every time a write lands on a node other than the one initially expected, a hint is written locally. A hint is a piece of information saying something like "I wrote this data on node A instead of node B. When node B is available again, I'll replay this write on it". This technique is called hinted handoff.
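The replay idea behind hinted handoff can be illustrated with a toy Python model. It is a deliberately simplified sketch and not Cassandra's implementation: the Node and Coordinator classes are hypothetical, hints live in a plain in-memory list instead of files in the hints directory, and timestamps, hint expiry and consistency levels are ignored.

```python
class Node:
    """A replica that can be up or down."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.rows = {}


class Coordinator:
    """Sends each write to every replica and keeps hints for unreachable ones."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (target node name, key, value), kept locally

    def write(self, key, value):
        for node in self.replicas:
            if node.up:
                node.rows[key] = value
            else:
                # The target is down: remember the write instead of losing it.
                self.hints.append((node.name, key, value))

    def replay_hints(self):
        """Deliver stored hints to every target that is reachable again."""
        by_name = {node.name: node for node in self.replicas}
        remaining = []
        for target, key, value in self.hints:
            node = by_name[target]
            if node.up:
                node.rows[key] = value
            else:
                remaining.append((target, key, value))
        self.hints = remaining


if __name__ == "__main__":
    node_a, node_b = Node("A"), Node("B")
    coordinator = Coordinator([node_a, node_b])

    node_b.up = False
    coordinator.write("Player_1", "team A")
    print(node_b.rows, coordinator.hints)  # B missed the write, a hint is kept

    node_b.up = True
    coordinator.replay_hints()
    print(node_b.rows, coordinator.hints)  # B caught up, the hint is gone
```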

This article presents the 4 main items of data storage in Apache Cassandra. The first part describes the commit log files, which are append-only log files. The second part presents the structure of the data directory and shows that its files contain all the information Cassandra needs to find where a row requested by a client is stored. The last two parts are a little shorter. The third briefly describes the role of the saved caches directory. The last one explains the idea of hints, one of the methods used to prevent data inconsistency caused by replication issues.
