Compaction in Apache Cassandra

Compaction helps to save disk space. Since Cassandra is designed to store large volumes of data, it relies heavily on this process.

This article focuses on compaction in Apache Cassandra. The first part describes the configuration entries that control compaction. The second part describes the compaction strategies available in Cassandra. The last part shows how these strategies work in practice.

Configure compaction in Cassandra

As a reminder, compaction is the process of cleaning up the data held by Cassandra. It covers several operations: SSTable consolidation, tombstone eviction, index creation and key merging. Its frequency and behavior can be configured through entries in the cassandra.yaml file.

Compaction strategies in Cassandra

Cassandra supports 3 compaction strategies. They were briefly described in the article about Data part in Apache Cassandra. This time we'll present them in more detail. The default and most basic strategy is SizeTieredCompactionStrategy. It's easy to understand because it's based on a property called min_threshold. When the number of SSTables of similar size reaches the value defined there, compaction is triggered. We'll see this case in action in the last part of the article.
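
To make the min_threshold rule more concrete, here is a minimal Java sketch of size-tiered bucketing. It is a simplified simulation, not Cassandra's implementation: the class name, the method names and the 0.5/1.5 "similar size" bounds are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified illustration of size-tiered bucketing; not Cassandra's actual code.
class SizeTieredSketch {

    // Groups SSTable sizes (in bytes) into buckets of "similar" size.
    // Assumption: a size is similar to a bucket when it lies within
    // [0.5 * avg, 1.5 * avg] of the bucket's current average size.
    static List<List<Long>> buckets(long[] sizes) {
        long[] sorted = sizes.clone();
        Arrays.sort(sorted);
        List<List<Long>> result = new ArrayList<>();
        for (long size : sorted) {
            boolean placed = false;
            for (List<Long> bucket : result) {
                double avg = bucket.stream().mapToLong(Long::longValue).average().orElse(0);
                if (size >= 0.5 * avg && size <= 1.5 * avg) {
                    bucket.add(size);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                // No similar bucket found: the SSTable starts a new one.
                List<Long> bucket = new ArrayList<>();
                bucket.add(size);
                result.add(bucket);
            }
        }
        return result;
    }

    // Keeps only the buckets holding at least minThreshold SSTables,
    // that is, the candidates for compaction.
    static List<List<Long>> compactionCandidates(long[] sizes, int minThreshold) {
        List<List<Long>> candidates = new ArrayList<>();
        for (List<Long> bucket : buckets(sizes)) {
            if (bucket.size() >= minThreshold) {
                candidates.add(bucket);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Two SSTables of similar size plus a much larger one.
        long[] sizes = {12_406_977L, 12_873_875L, 500_000_000L};
        System.out.println(compactionCandidates(sizes, 2));
    }
}
```

With min_threshold equal to 2, only the two similar SSTables form a compaction candidate; the large one stays alone in its bucket until another SSTable of comparable size appears.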

Another strategy is DateTieredCompactionStrategy. It is very helpful for time series data since it groups data written within the same period of time. For example, Cassandra can store data written within one hour (for example 18:00-18:59) in the same SSTable. Data written within the next hour (19:00-19:59) will be stored in another SSTable, and so on. The age of SSTables eligible for compaction is determined by the base_time_seconds option. Like SizeTieredCompactionStrategy, DateTieredCompactionStrategy has a parameter called min_threshold. It specifies how many time windows must exist before they are merged into one bigger time window.
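
The hourly example can be sketched in a few lines of Java. This is only an illustration of the windowing idea, assuming fixed windows aligned on zero; the class and method names are hypothetical and the code does not reproduce Cassandra's internal logic.

```java
// Simplified illustration of date-tiered windows; not Cassandra's actual code.
class DateTieredSketch {

    // Start (in seconds) of the window containing writeTimeSeconds,
    // assuming fixed windows of baseTimeSeconds aligned on zero.
    static long windowStart(long writeTimeSeconds, long baseTimeSeconds) {
        return Math.floorDiv(writeTimeSeconds, baseTimeSeconds) * baseTimeSeconds;
    }

    // Length of the bigger window obtained once minThreshold windows
    // of the same size are merged together.
    static long mergedWindowSeconds(long windowSeconds, int minThreshold) {
        return windowSeconds * minThreshold;
    }

    public static void main(String[] args) {
        long hour = 3600;
        long at1800 = 18 * hour;           // 18:00, as seconds since midnight
        long at1859 = 18 * hour + 59 * 60; // 18:59
        long at1900 = 19 * hour;           // 19:00
        // 18:00 and 18:59 share a window, 19:00 opens the next one.
        System.out.println(windowStart(at1800, hour) == windowStart(at1859, hour)); // true
        System.out.println(windowStart(at1859, hour) == windowStart(at1900, hour)); // false
        // Four one-hour windows merged into one four-hour window.
        System.out.println(mergedWindowSeconds(hour, 4)); // 14400
    }
}
```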

The last strategy is called LeveledCompactionStrategy and it relies on an interesting concept of small SSTables divided into levels. SSTables created with this strategy are relatively small (this article assumes 5 MB; the size is controlled by the sstable_size_in_mb option and recent Cassandra versions default to 160 MB). They're grouped into levels, and each level is 10 times larger than the previous one. SSTables inside a level don't overlap. This strategy ignores the concurrent_compactors parameter. It also improves read operations because, almost every time, all data of a row is stored inside one SSTable. It's especially useful when rows are updated very often (if data is written only once, there is no big difference with SizeTieredCompactionStrategy).

If it's still hard to understand, a simplified example may help. We can think of LeveledCompactionStrategy as a generational garbage collector. We start with the first generation (the first level, L0). It contains one SSTable of 5 MB. Now we insert 5 rows, each storing 1 MB of data. Level 0 is filled up and new data keeps coming (still 1 MB per row). Because L0 is full and can't accept new rows, Cassandra creates a new level (L1). It's 10 times bigger than L0, so it can store 50 MB of data. After creating this new level, Cassandra moves the 5 rows of L0's SSTable to L1 and writes the new incoming rows to L0. This cycle repeats as the data grows: when a level fills up, the next one is created and rows are moved between levels. While the rows are moved, duplicated and updated rows are merged. As a result, this compaction makes it more probable that all columns of a given row are held by one SSTable.
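
The "10 times larger" rule of this simplified model can be written down as a small Java sketch. The class and method names are hypothetical and the 5 MB value comes from the example above; real Cassandra manages levels in a more involved way.

```java
// Simplified illustration of level capacities in the model described above.
class LeveledSketch {

    // Capacity (in MB) of a level: each level is 10 times larger than the previous one.
    static long levelCapacityMb(int level, long sstableSizeMb) {
        long capacity = sstableSizeMb;
        for (int i = 0; i < level; i++) {
            capacity *= 10;
        }
        return capacity;
    }

    // Lowest level whose capacity is enough to hold dataMb.
    // Expects sstableSizeMb > 0, otherwise the loop would never end.
    static int lowestLevelHolding(long dataMb, long sstableSizeMb) {
        int level = 0;
        while (levelCapacityMb(level, sstableSizeMb) < dataMb) {
            level++;
        }
        return level;
    }

    public static void main(String[] args) {
        System.out.println(levelCapacityMb(0, 5)); // 5
        System.out.println(levelCapacityMb(1, 5)); // 50
        System.out.println(levelCapacityMb(2, 5)); // 500
        // A sixth 1 MB row no longer fits in L0, so L1 is needed.
        System.out.println(lowestLevelHolding(6, 5)); // 1
    }
}
```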

Example of compaction in Cassandra

Compaction activity can be easily investigated with the nodetool compactionstats command. We'll use it to check what happens while the test case is running. Another useful command to track compaction is nodetool compactionhistory. It shows the compactions executed in Cassandra. To see what compaction looks like, we'll create the table below and then insert, update and select 500 000 rows. The table uses SizeTieredCompactionStrategy with min_threshold equal to 2:

CREATE TABLE simple_team (
    teamName text,
    city text,
    PRIMARY KEY (teamName)
) WITH compaction = {'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 2}

The code manipulating the rows looks like this:

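// Assumption: DataStax Java driver 3.x on the classpath and SESSION being
// an already connected com.datastax.driver.core.Session.
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.querybuilder.QueryBuilder;

// 1. Insert 500 000 rows, all with city = "old"
for (int i = 0; i < 500_000; i++) {
  Statement insert = QueryBuilder.insertInto("compactiontest", "simple_team")
    .value("city", "old")
    .value("teamName", "Team_" + i);
  SESSION.execute(insert);
}

// 2. Update every row, so that each key gets a newer version of its data
for (int i = 0; i < 500_000; i++) {
  Statement update = QueryBuilder.update("compactiontest", "simple_team")
    .where(QueryBuilder.eq("teamName", "Team_" + i))
    .with(QueryBuilder.set("city", "new"));
  SESSION.execute(update);
}

// 3. Read every row back
for (int i = 0; i < 500_000; i++) {
  Statement select = QueryBuilder.select()
    .from("compactiontest", "simple_team")
    .where(QueryBuilder.eq("teamName", "Team_" + i));
  SESSION.execute(select);
}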

Let's launch this code, wait until it terminates and see what happened to our data by checking the logs and the compaction activity. First, we analyze the shorter output, compactionhistory:

id                                   keyspace_name  columnfamily_name compacted_at            bytes_in bytes_out rows_merged
129f3d40-0de0-11e6-ade7-89f10cfa2089 compactiontest simple_team       2016-04-30T19:57:51.764 10934916 8609828   {1:245587, 2:254413} 

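The output is self-explanatory. The rows of the simple_team table were compacted at 19:57. As we can see, compaction saved 2325088 bytes (about 2.3 MB). The rows_merged column tells that 245587 partitions were found in only one SSTable, while 254413 partitions were merged from 2 SSTables. The log analysis explains in more detail how the compaction was carried out: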

  1. The first part of data is flushed
    ColumnFamilyStore.java:1192 - Flushing largest CFS(Keyspace='compactiontest', ColumnFamily='simple_team') to free up room. 
      Used total: 0,50/0,00, live: 0,50/0,00, flushing: 0,00/0,00, this: 0,50/0,50
    ColumnFamilyStore.java:846 - Enqueuing flush of simple_team: 127384518 (50%) on-heap, 0 (0%) off-heap
    Memtable.java:405 - Writing Memtable-simple_team@368206467(17,267MiB serialized bytes, 377207 ops, 
      50%/0% of on/off-heap limit), flushed range = (min(-9223372036854775808), max(9223372036854775807)]
    Memtable.java:433 - Completed flushing bin/../data/data/compactiontest/simple_team-794964e00ddf11e6ade789f10cfa2089/ma-1-big-Data.db (13,559MiB) 
      for commitlog position ReplayPosition(segmentId=1461914921908, position=24791466)
    ColumnFamilyStore.java:1120 - Flushed to [BigTableReader(path='bin/../data/data/compactiontest/simple_team-794964e00ddf11e6ade789f10cfa2089/ma-1-big-Data.db')] 
      (1 sstables, 12406977 bytes), biggest 12406977 bytes, smallest 12406977 bytes
    
  2. The second part of data is flushed
    ColumnFamilyStore.java:1192 - Flushing largest CFS(Keyspace='compactiontest', ColumnFamily='simple_team') 
      to free up room. Used total: 0,50/0,00, live: 0,50/0,00, flushing: 0,00/0,00, this: 0,50/0,50
    ColumnFamilyStore.java:846 - Enqueuing flush of simple_team: 127384180 (50%) on-heap, 0 (0%) off-heap
    Memtable.java:405 - Writing Memtable-simple_team@427021901(17,267MiB serialized bytes, 377206 ops, 
      50%/0% of on/off-heap limit), flushed range = (min(-9223372036854775808), max(9223372036854775807)]
    Memtable.java:433 - Completed flushing bin/../data/data/compactiontest/simple_team-794964e00ddf11e6ade789f10cfa2089/ma-2-big-Data.db (13,556MiB) 
      for commitlog position ReplayPosition(segmentId=1461914921909, position=16021790)
    ColumnFamilyStore.java:1120 - Flushed to [BigTableReader(path='bin/../data/data/compactiontest/simple_team-794964e00ddf11e6ade789f10cfa2089/ma-2-big-Data.db')]
      (1 sstables, 12873875 bytes), biggest 12873875 bytes, smallest 12873875 bytes
    
  3. min_threshold parameter (2) is reached - it triggers compaction
    CompactionTask.java:150 - Compacting (0fe3edd0-0de0-11e6-ade7-89f10cfa2089) 
      [bin/../data/data/compactiontest/simple_team-794964e00ddf11e6ade789f10cfa2089/ma-1-big-Data.db:level=0, 
       bin/../data/data/compactiontest/simple_team-794964e00ddf11e6ade789f10cfa2089/ma-2-big-Data.db:level=0, ]
    CompactionTask.java:221 - Compacted (0fe3edd0-0de0-11e6-ade7-89f10cfa2089) 
      2 sstables to [bin/../data/data/compactiontest/simple_team-794964e00ddf11e6ade789f10cfa2089/ma-3-big,] 
      to level=0.  10 934 916 bytes to 8 609 828 (~78% of original) in 4 582ms = 1,792006MB/s.  
      0 total partitions merged to 500 000.  Partition merge counts were {1:245587, 2:254413, }
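
The figures reported by compactionhistory and by the compaction log can be cross-checked with a bit of arithmetic. The Java sketch below uses only numbers from the output above; the class and method names are hypothetical. The log does not say how the rate is computed, so the throughput formula (output bytes divided by elapsed time, in units of 1024 * 1024 bytes) is an assumption that happens to come out close to the logged 1,792006MB/s.

```java
// Cross-checks the compaction figures quoted in this article.
class CompactionFigures {

    // Bytes saved by the compaction.
    static long savedBytes(long bytesIn, long bytesOut) {
        return bytesIn - bytesOut;
    }

    // Size of the output as a percentage of the input.
    static double outputPercent(long bytesIn, long bytesOut) {
        return 100.0 * bytesOut / bytesIn;
    }

    // Sum of the partition merge counts, e.g. {1:245587, 2:254413}.
    static long totalPartitions(long... mergeCounts) {
        long total = 0;
        for (long count : mergeCounts) {
            total += count;
        }
        return total;
    }

    // Assumed formula: bytes per second expressed in units of 1024 * 1024 bytes.
    static double mebibytesPerSecond(long bytes, long millis) {
        double seconds = millis / 1000.0;
        return bytes / seconds / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long bytesIn = 10_934_916L;
        long bytesOut = 8_609_828L;
        System.out.println(savedBytes(bytesIn, bytesOut));          // 2325088
        System.out.println((long) outputPercent(bytesIn, bytesOut)); // 78
        System.out.println(totalPartitions(245_587L, 254_413L));    // 500000
        System.out.println(mebibytesPerSecond(bytesOut, 4_582L));
    }
}
```

The partition counts add up to the 500 000 keys written by the test, which confirms that every key ended up in the compacted SSTable exactly once.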
    

This article explained the concept of compaction in more detail. The first part showed which parameters can be used to configure compaction in Cassandra. The second part listed 3 compaction strategies, each one working well with a different kind of data. The last part showed how to trigger a compaction simply by working with a big amount of data.
