Compaction in Apache Cassandra on waitingforcode.com

Compaction helps to save disk space. Since Cassandra is designed to store a lot of data, it can't do without this process.


This article focuses on compaction in Apache Cassandra. The first part describes the configuration entries that can be used to tune compaction. The second part describes the compaction strategies available in Cassandra. The last part shows one of these strategies at work.

Configure compaction in Cassandra

To recall the basics, compaction is the process of cleaning up the data held by Cassandra. It includes several operations: SSTable consolidation, tombstone eviction, index creation and key merging. Its frequency and behavior can be configured through several entries in the cassandra.yaml file:

snapshot_before_compaction - indicates whether a snapshot should be taken before each compaction. It's helpful when something goes wrong and data must be restored. However, Cassandra doesn't clean up these snapshots, so they can quickly become an overhead. According to the comment in cassandra.yaml, this option is "mostly useful if you're paranoid when there is a data format change".
concurrent_compactors - defines the number of simultaneous compactions allowed. According to the advice in cassandra.yaml, if data is stored on SSD, concurrent_compactors should be increased to the number of cores. Otherwise, the default value is the smaller of the number of disks and the number of cores (with a minimum of 2 and a maximum of 8). Before tuning this value because compaction is too slow or too fast, you should first try to tune the next entry of this list, compaction_throughput_mb_per_sec.
compaction_throughput_mb_per_sec - throttles compaction to the given total throughput (in MB/s) across the whole node. The faster data is written, the faster it must be compacted to keep the number of SSTables low, so this value should grow with the write rate. The recommended value (for most cases) is 16 to 32 times the write throughput. The default value is 16, which should be sufficient for writes of up to about 1 MB/s. Setting it to 0 disables throttling.
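To make the sizing rules above concrete, here is a minimal Java sketch that computes the default concurrent_compactors value (the smaller of disks and cores, clamped between 2 and 8, as described in the cassandra.yaml comments) and the 16x to 32x compaction throughput range suggested for a given write rate. CompactionSettingsSketch and its methods are hypothetical helpers, not part of Cassandra.

```java
/**
 * Hypothetical helper reproducing the sizing rules quoted from cassandra.yaml comments.
 * Not part of Cassandra; names are illustrative only.
 */
final class CompactionSettingsSketch {

    // Default concurrent_compactors: min(disks, cores), clamped to the range [2, 8].
    static int defaultConcurrentCompactors(int disks, int cores) {
        int candidate = Math.min(disks, cores);
        return Math.max(2, Math.min(8, candidate));
    }

    // Suggested compaction_throughput_mb_per_sec range: 16 to 32 times the write rate.
    static int[] recommendedThroughputRange(int writeMbPerSec) {
        return new int[] {16 * writeMbPerSec, 32 * writeMbPerSec};
    }

    public static void main(String[] args) {
        System.out.println(defaultConcurrentCompactors(1, 8));   // 2 (clamped to the minimum)
        System.out.println(defaultConcurrentCompactors(4, 16));  // 4
        System.out.println(defaultConcurrentCompactors(12, 32)); // 8 (clamped to the maximum)
        int[] range = recommendedThroughputRange(1);
        System.out.println(range[0] + "-" + range[1] + " MB/s"); // 16-32 MB/s
    }
}
```

With a 1 MB/s write rate the suggested range starts at 16 MB/s, which is the default throttle.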

Compaction strategies in Cassandra

Cassandra supports 3 compaction strategies. They were briefly described in the article about the data part in Apache Cassandra; this time we'll present them in more detail. The default and most basic strategy is SizeTieredCompactionStrategy. It's easy to understand because it's based on a property called min_threshold: when the number of similarly sized SSTables reaches this value, compaction is triggered. We'll see this case in action in the last part of the article.
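The idea of "similar" SSTables can be sketched in Java. This is a simplified model, assuming the STCS bucket_low (0.5) and bucket_high (1.5) defaults: an SSTable joins a bucket when its size lies within those factors of the bucket's average size, and a bucket holding at least min_threshold SSTables becomes a compaction candidate. It ignores other STCS options such as min_sstable_size and max_threshold, and SizeTieredSketch is a hypothetical name, not Cassandra's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified size-tiered bucketing. Parameter names mirror the STCS options
 * bucket_low, bucket_high and min_threshold; the logic is a simplification.
 */
final class SizeTieredSketch {

    // Groups SSTable sizes into buckets of "similar" size.
    static List<List<Long>> buckets(List<Long> sizes, double bucketLow, double bucketHigh) {
        List<Long> sorted = new ArrayList<>(sizes);
        sorted.sort(null); // natural (ascending) order
        List<List<Long>> buckets = new ArrayList<>();
        List<Double> averages = new ArrayList<>();
        for (long size : sorted) {
            boolean placed = false;
            for (int i = 0; i < buckets.size() && !placed; i++) {
                double avg = averages.get(i);
                // "Similar" means within [avg * bucketLow, avg * bucketHigh]
                if (size >= avg * bucketLow && size <= avg * bucketHigh) {
                    List<Long> bucket = buckets.get(i);
                    bucket.add(size);
                    // Update the running average of the bucket
                    averages.set(i, (avg * (bucket.size() - 1) + size) / bucket.size());
                    placed = true;
                }
            }
            if (!placed) {
                List<Long> bucket = new ArrayList<>();
                bucket.add(size);
                buckets.add(bucket);
                averages.add((double) size);
            }
        }
        return buckets;
    }

    // Every bucket holding at least minThreshold SSTables is a compaction candidate.
    static List<List<Long>> compactionCandidates(List<Long> sizes, int minThreshold) {
        List<List<Long>> candidates = new ArrayList<>();
        for (List<Long> bucket : buckets(sizes, 0.5, 1.5)) {
            if (bucket.size() >= minThreshold) {
                candidates.add(bucket);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Sizes of the two flushed SSTables from the example at the end of this article
        List<Long> sizes = List.of(12_406_977L, 12_873_875L);
        System.out.println(compactionCandidates(sizes, 2)); // [[12406977, 12873875]]
        System.out.println(compactionCandidates(sizes, 4)); // []
    }
}
```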

Another strategy is DateTieredCompactionStrategy. It's very helpful for time series data because it groups data written within similar periods of time. For example, Cassandra can store data written within one hour (for example: 18.00-18.59) in the same SSTable. Data written within the next hour (19.00-19.59) will be stored in another SSTable, and so on. The size of the first time window is defined by the base_time_seconds option, while max_sstable_age_days defines the age after which SSTables are no longer compacted. Like SizeTieredCompactionStrategy, DateTieredCompactionStrategy has a parameter called min_threshold. It specifies how many time windows of the same size must exist before they're merged into one bigger window.
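The hourly grouping from the example above can be illustrated with a toy Java sketch. It assigns write timestamps to fixed windows (the window size standing in for base_time_seconds) and shows that merging min_threshold windows of the same size yields a window min_threshold times larger. The real DateTieredCompactionStrategy works on SSTable timestamps with growing windows; this is only a simplification, and TimeWindowSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy illustration of time-window grouping. Not Cassandra's DTCS implementation.
 */
final class TimeWindowSketch {

    // Start of the window containing the timestamp (seconds since epoch).
    static long windowStart(long timestampSeconds, long windowSeconds) {
        return timestampSeconds - Math.floorMod(timestampSeconds, windowSeconds);
    }

    // Groups write timestamps by the window they fall into, ordered by window start.
    static Map<Long, List<Long>> groupByWindow(List<Long> timestamps, long windowSeconds) {
        Map<Long, List<Long>> windows = new TreeMap<>();
        for (long ts : timestamps) {
            windows.computeIfAbsent(windowStart(ts, windowSeconds), k -> new ArrayList<>()).add(ts);
        }
        return windows;
    }

    // Size of the window obtained once minThreshold windows of the same size are merged.
    static long mergedWindowSeconds(long windowSeconds, int minThreshold) {
        return windowSeconds * minThreshold;
    }

    public static void main(String[] args) {
        long hour = 3_600;
        long t1800 = 18 * hour; // 18:00 on day 0 (UTC)
        // Writes at 18:01, 18:59 and 19:00:10
        List<Long> writes = List.of(t1800 + 60, t1800 + 3_540, t1800 + hour + 10);
        Map<Long, List<Long>> windows = groupByWindow(writes, hour);
        System.out.println(windows.size());               // 2 (18:00-18:59 and 19:00-19:59)
        System.out.println(mergedWindowSeconds(hour, 4)); // 14400
    }
}
```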

The last strategy is LeveledCompactionStrategy, which relies on an interesting concept of small, fixed-size SSTables. SSTables created by this strategy are relatively small (sstable_size_in_mb defaults to 160 MB since Cassandra 2.0; early versions used 5 MB). They're grouped into levels, and each level is 10 times larger than the previous one. Within a level (except L0), SSTables don't overlap. This strategy ignores the concurrent_compactors parameter. It also improves read performance because, most of the time, all the data of a row is stored in a single SSTable. It's especially useful when rows are updated very often (if data is written only once, there isn't a big difference compared to SizeTieredCompactionStrategy).
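The "10 times larger" rule can be turned into a small calculation. This sketch assumes that level L (for L >= 1) holds up to 10^L SSTables of sstable_size_in_mb each; L0 is left out because it receives freshly flushed SSTables. The main method uses 160 MB, the sstable_size_in_mb default in Cassandra 2.0 and later. LeveledCapacitySketch is a hypothetical name.

```java
/**
 * Sketch of LCS level capacities, assuming level L (L >= 1) holds about
 * 10^L SSTables of sstable_size_in_mb each. Names are hypothetical.
 */
final class LeveledCapacitySketch {

    static final int FANOUT = 10; // each level is 10 times larger than the previous one

    // Approximate maximum size of a level, in MB, for levels 1 and above.
    static long maxMbForLevel(int level, long sstableSizeMb) {
        if (level < 1) {
            throw new IllegalArgumentException("L0 holds flushed SSTables and is not sized by this rule");
        }
        long sstables = 1;
        for (int i = 0; i < level; i++) {
            sstables *= FANOUT;
        }
        return sstables * sstableSizeMb;
    }

    public static void main(String[] args) {
        for (int level = 1; level <= 3; level++) {
            System.out.println("L" + level + ": " + maxMbForLevel(level, 160) + " MB");
        }
        // L1: 1600 MB, L2: 16000 MB, L3: 160000 MB
    }
}
```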

If this is still hard to understand, an example can help. We can think of LeveledCompactionStrategy as a generational garbage collector. To keep the numbers small, assume 5MB SSTables. We start with the first generation (the first level, L0), which can hold one SSTable of 5MB. Now we insert 5 rows, each storing 1MB of data. L0 is then full, and new data (still 1MB rows) keeps coming. Because L0 can't accept new rows, Cassandra creates a new level (L1). It's 10 times bigger than L0, so it can store 50MB of data. After creating this level, Cassandra moves L0's SSTable of 5 rows to L1 and writes the new incoming rows to L0. This process repeats indefinitely: whenever a level fills up, its data is moved to the next level, which is created if it doesn't exist yet. While rows are moved, duplicated and updated rows are merged. As a result, this compaction makes it more likely that all columns of a given row are held by one SSTable.
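The generational analogy can be replayed with a toy Java model. It follows the simplified example above rather than Cassandra's real algorithm: L0 holds 5MB, each next level is 10 times larger, a full level is moved to the next one (created if needed) before new data arrives, and merging of duplicated rows is ignored. LeveledToyModel is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the generational analogy. Sizes are whole MB.
 * Not Cassandra's leveled compaction algorithm.
 */
final class LeveledToyModel {

    private final long l0CapacityMb;
    private final List<Long> usedMb = new ArrayList<>(); // usedMb.get(i) = data held by level i

    LeveledToyModel(long l0CapacityMb) {
        this.l0CapacityMb = l0CapacityMb;
        usedMb.add(0L); // L0 exists from the start
    }

    // Capacity of a level: L0 capacity multiplied by 10 for each level below L0.
    long capacity(int level) {
        long cap = l0CapacityMb;
        for (int i = 0; i < level; i++) {
            cap *= 10;
        }
        return cap;
    }

    // Makes room for incomingMb in the level, moving its content down if it would overflow.
    private void makeRoom(int level, long incomingMb) {
        if (usedMb.get(level) + incomingMb <= capacity(level)) {
            return;
        }
        if (level + 1 == usedMb.size()) {
            usedMb.add(0L); // a new, 10 times bigger level is created
        }
        long moved = usedMb.get(level);
        makeRoom(level + 1, moved); // the next level may itself need to move its data down
        usedMb.set(level + 1, usedMb.get(level + 1) + moved);
        usedMb.set(level, 0L);
    }

    void insertRow(long rowMb) {
        makeRoom(0, rowMb);
        usedMb.set(0, usedMb.get(0) + rowMb);
    }

    List<Long> levels() {
        return List.copyOf(usedMb);
    }

    public static void main(String[] args) {
        LeveledToyModel model = new LeveledToyModel(5);
        for (int i = 0; i < 6; i++) {
            model.insertRow(1); // 1MB rows, as in the example above
        }
        System.out.println(model.levels()); // [1, 5]: the first 5 rows were moved to L1
    }
}
```

After 56 inserted rows, the same model holds [1, 5, 50]: L1 filled up once and was moved down to a newly created L2.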

Example of compaction in Cassandra

Compaction activity can be easily investigated with the nodetool compactionstats command, which shows the compactions currently in progress. We'll use it to check what happens while the test case is running. Another useful command to track compaction is nodetool compactionhistory, which lists the compactions already executed by Cassandra. To see what compaction looks like, we'll create the table below and then insert, update and select 500 000 rows. The table uses SizeTieredCompactionStrategy with min_threshold equal to 2:

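-- The keyspace definition is an assumption suited to a single-node test cluster
CREATE KEYSPACE IF NOT EXISTS compactiontest
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE compactiontest.simple_team (
    teamName text,
    city text,
    PRIMARY KEY (teamName)
) WITH compaction = {'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 2};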

The code responsible for manipulating the rows looks like this:

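import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.querybuilder.QueryBuilder;

// Class name and contact point are illustrative; assumes DataStax Java driver 3.x and a node on 127.0.0.1
public class CompactionTest {
  public static void main(String[] args) {
    try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
         Session session = cluster.connect()) {
      // Insert 500 000 rows, all with city = "old"
      for (int i = 0; i < 500_000; i++) {
        Statement insert = QueryBuilder.insertInto("compactiontest", "simple_team")
          .value("city", "old")
          .value("teamName", "Team_" + i);
        session.execute(insert);
      }

      // Set city = "new" for every row, writing a newer version of each cell
      for (int i = 0; i < 500_000; i++) {
        Statement update = QueryBuilder.update("compactiontest", "simple_team")
          .where(QueryBuilder.eq("teamName", "Team_" + i))
          .with(QueryBuilder.set("city", "new"));
        session.execute(update);
      }

      // Read every row back by its partition key
      for (int i = 0; i < 500_000; i++) {
        Statement select = QueryBuilder.select()
          .from("compactiontest", "simple_team")
          .where(QueryBuilder.eq("teamName", "Team_" + i));
        session.execute(select);
      }
    }
  }
}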

Let's run this code, wait for it to finish, and check what happened to our data through the logs and the compaction activity. First, we look at the shorter output, from compactionhistory:

id                                   keyspace_name  columnfamily_name compacted_at            bytes_in bytes_out rows_merged
129f3d40-0de0-11e6-ade7-89f10cfa2089 compactiontest simple_team       2016-04-30T19:57:51.764 10934916 8609828   {1:245587, 2:254413}

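The rows of the simple_team table were compacted at 19:57. Compaction reduced the data from 10934916 to 8609828 bytes, saving 2325088 bytes (about 2.2 MiB). The rows_merged column shows that 245587 partitions were found in a single SSTable, while 254413 were present in both SSTables and had to be merged. The logs explain in more detail how this compaction was triggered: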
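The figures from the compactionhistory row can be recomputed with a few lines of Java. The values are copied from the output above; CompactionHistoryMath is a hypothetical helper used only for illustration.

```java
/**
 * Recomputes figures from the compactionhistory row. Illustration only.
 */
final class CompactionHistoryMath {

    // Bytes removed by the compaction (bytes_in - bytes_out).
    static long savedBytes(long bytesIn, long bytesOut) {
        return bytesIn - bytesOut;
    }

    // Output size as a percentage of the input size.
    static double percentOfInput(long bytesIn, long bytesOut) {
        return 100.0 * bytesOut / bytesIn;
    }

    // Sums the partition counts of a rows_merged value such as "{1:245587, 2:254413}".
    // Each key is the number of SSTables a partition was found in; each value is a partition count.
    static long totalPartitions(String rowsMerged) {
        String body = rowsMerged.replaceAll("[{}\\s]", "");
        long total = 0;
        for (String entry : body.split(",")) {
            if (!entry.isEmpty()) {
                total += Long.parseLong(entry.split(":")[1]);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long bytesIn = 10_934_916L;
        long bytesOut = 8_609_828L;
        System.out.println(savedBytes(bytesIn, bytesOut));                     // 2325088
        System.out.println(savedBytes(bytesIn, bytesOut) / (1024.0 * 1024.0)); // about 2.2 (MiB)
        System.out.println(percentOfInput(bytesIn, bytesOut));                 // about 78.7
        System.out.println(totalPartitions("{1:245587, 2:254413}"));           // 500000
    }
}
```

The partition counts add up to the 500 000 rows written by the test, which is expected since each teamName is its own partition.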

The first part of the data is flushed:

ColumnFamilyStore.java:1192 - Flushing largest CFS(Keyspace='compactiontest', ColumnFamily='simple_team') to free up room. 
  Used total: 0,50/0,00, live: 0,50/0,00, flushing: 0,00/0,00, this: 0,50/0,50
ColumnFamilyStore.java:846 - Enqueuing flush of simple_team: 127384518 (50%) on-heap, 0 (0%) off-heap
Memtable.java:405 - Writing Memtable-simple_team@368206467(17,267MiB serialized bytes, 377207 ops, 
  50%/0% of on/off-heap limit), flushed range = (min(-9223372036854775808), max(9223372036854775807)]
Memtable.java:433 - Completed flushing bin/../data/data/compactiontest/simple_team-794964e00ddf11e6ade789f10cfa2089/ma-1-big-Data.db (13,559MiB) 
  for commitlog position ReplayPosition(segmentId=1461914921908, position=24791466)
ColumnFamilyStore.java:1120 - Flushed to [BigTableReader(path='bin/../data/data/compactiontest/simple_team-794964e00ddf11e6ade789f10cfa2089/ma-1-big-Data.db')] 
  (1 sstables, 12406977 bytes), biggest 12406977 bytes, smallest 12406977 bytes

The second part of the data is flushed:

ColumnFamilyStore.java:1192 - Flushing largest CFS(Keyspace='compactiontest', ColumnFamily='simple_team') 
  to free up room. Used total: 0,50/0,00, live: 0,50/0,00, flushing: 0,00/0,00, this: 0,50/0,50
ColumnFamilyStore.java:846 - Enqueuing flush of simple_team: 127384180 (50%) on-heap, 0 (0%) off-heap
Memtable.java:405 - Writing Memtable-simple_team@427021901(17,267MiB serialized bytes, 377206 ops, 
  50%/0% of on/off-heap limit), flushed range = (min(-9223372036854775808), max(9223372036854775807)]
Memtable.java:433 - Completed flushing bin/../data/data/compactiontest/simple_team-794964e00ddf11e6ade789f10cfa2089/ma-2-big-Data.db (13,556MiB) 
  for commitlog position ReplayPosition(segmentId=1461914921909, position=16021790)
ColumnFamilyStore.java:1120 - Flushed to [BigTableReader(path='bin/../data/data/compactiontest/simple_team-794964e00ddf11e6ade789f10cfa2089/ma-2-big-Data.db')]
  (1 sstables, 12873875 bytes), biggest 12873875 bytes, smallest 12873875 bytes

The min_threshold value (2) is reached, which triggers compaction:

CompactionTask.java:150 - Compacting (0fe3edd0-0de0-11e6-ade7-89f10cfa2089) 
  [bin/../data/data/compactiontest/simple_team-794964e00ddf11e6ade789f10cfa2089/ma-1-big-Data.db:level=0, 
   bin/../data/data/compactiontest/simple_team-794964e00ddf11e6ade789f10cfa2089/ma-2-big-Data.db:level=0, ]
CompactionTask.java:221 - Compacted (0fe3edd0-0de0-11e6-ade7-89f10cfa2089) 
  2 sstables to [bin/../data/data/compactiontest/simple_team-794964e00ddf11e6ade789f10cfa2089/ma-3-big,] 
  to level=0.  10 934 916 bytes to 8 609 828 (~78% of original) in 4 582ms = 1,792006MB/s.  
  0 total partitions merged to 500 000.  Partition merge counts were {1:245587, 2:254413, }
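The ratio and rate printed in the "Compacted" log line can be reproduced from its own numbers. The sketch below assumes that the percentage is truncated to an integer and that the rate is the output size in MiB (1024 * 1024 bytes) divided by the elapsed seconds; this is inferred from the figures matching the log, not an authoritative description of Cassandra's code. CompactionLogMath is a hypothetical name.

```java
/**
 * Reproduces the "~78% of original" and "1,792006MB/s" figures of the log line.
 * Formulas are assumptions inferred from the logged numbers.
 */
final class CompactionLogMath {

    // Output size as a percentage of the input, truncated to an integer.
    static int percentOfOriginal(long bytesIn, long bytesOut) {
        return (int) (100.0 * bytesOut / bytesIn);
    }

    // Output size in MiB divided by the elapsed time in seconds.
    static double mibPerSecond(long bytesOut, long elapsedMillis) {
        return (bytesOut / (1024.0 * 1024.0)) / (elapsedMillis / 1000.0);
    }

    public static void main(String[] args) {
        long bytesIn = 10_934_916L;
        long bytesOut = 8_609_828L;
        long elapsedMs = 4_582L;
        System.out.println(percentOfOriginal(bytesIn, bytesOut)); // 78
        System.out.println(mibPerSecond(bytesOut, elapsedMs));    // about 1.792006
    }
}
```

Using the input size instead would give about 2.28 MiB/s, so the logged rate is consistent with the output size only.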

This article explained the concept of compaction in more detail. The first part showed which parameters can be used to configure compaction in Cassandra. The second part listed the 3 compaction strategies, each one suited to a different kind of data. The last part showed how to trigger a compaction simply by working with a large amount of data.
