Snapshot in HDFS

Versions: Hadoop 2.7.2

Implementing snapshots in a distributed file system is not a simple job. It must take into account different aspects, such as file deletion or content changes, and keep the file system consistent across them.

The first part defines snapshots in HDFS and gives some basic information about the feature. The second part describes how snapshots behave when files are deleted, appended to or truncated. The last part shows how to use snapshots from the command line.

Definition

A snapshot is a read-only, persistent data structure representing the content of a directory at a given point in time. The content means here the directory structure with all file metadata (block list, size) and subdirectories.

This persistent data structure can be useful to recover a directory from human errors. The recovery from a snapshot is much quicker than the restore of the whole FSImage. It's also more scriptable because we can easily create a snapshot of an important directory before modifying it.

Only directories marked as "snapshottable" can contain snapshots. Each snapshottable directory has a subdirectory called .snapshot created inside. It's the place where all snapshots are stored. A snapshot can have a custom name or use the default one (based on a timestamp: "'s'yyyyMMdd-HHmmss.SSS"). Each snapshottable directory can keep at most 65,536 snapshots.
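
These operations are also available programmatically. Below is a minimal sketch, not taken from the post's environment, that marks a directory as snapshottable with HdfsAdmin and creates a snapshot with FileSystem.createSnapshot (the cluster URI is an assumption; the paths reuse the examples from the last section):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public class SnapshotCreationExample {

  public static void main(String[] args) throws Exception {
    Configuration configuration = new Configuration();
    // assumption: a cluster reachable at this URI
    URI clusterUri = URI.create("hdfs://localhost:8020");

    // marking a directory as snapshottable is an administrative
    // operation, hence the HdfsAdmin client
    HdfsAdmin admin = new HdfsAdmin(clusterUri, configuration);
    admin.allowSnapshot(new Path("/my_dir"));

    // createSnapshot returns the path of the created snapshot; with
    // the name omitted, the timestamp-based default name is generated
    FileSystem fileSystem = FileSystem.get(clusterUri, configuration);
    Path snapshotPath = fileSystem.createSnapshot(new Path("/my_dir"), "testSnap1");
    System.out.println("Created snapshot " + snapshotPath);
  }
}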

Snapshots under the hood

Snapshotted files are represented as a kind of symbolic link, referencing already existing blocks. It means that blocks aren't copied when a new snapshot is created. These pointers are very useful to know if a file's blocks can be deleted. If the user triggers a delete command, under the hood HDFS checks if the deleted file can be physically removed from the file system or if it can only be removed from the namespace. It does it in org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp's delete method, more particularly here:

// collect block and update quota
if (!targetNode.isInLatestSnapshot(latestSnapshot)) {
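  // not referenced by the latest snapshot: the blocks can be physically removed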
  targetNode.destroyAndCollectBlocks(fsd.getBlockStoragePolicySuite(), collectedBlocks, removedINodes);
} else {
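  // still referenced by the latest snapshot: remove the file only from
  // the current namespace state and keep its blocks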
  QuotaCounts counts = targetNode.cleanSubtree(
    fsd.getBlockStoragePolicySuite(), CURRENT_STATE_ID, latestSnapshot, collectedBlocks, removedINodes
  );
  removed = counts.getNameSpace();
  fsd.updateCountNoQuotaCheck(iip, iip.length() - 1, counts.negation());
}

As you can see, the blocks of files present in the most recent snapshot can't be physically deleted. It's the reason why, in this case, the NameNode logs lack the entry indicating block deletion, such as BlockStateChange: BLOCK* BlockManager: ask 127.0.0.1:50010 to delete [blk_1073741827_1003].

For append operations on already snapshotted files, the snapshot exposes only the initially saved data thanks to the recorded file length. The content of a snapshotted file is read only up to the length stored when the snapshot was created. The same logic applies to file truncation.
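
To illustrate, here is a minimal sketch, assuming the /appendable_dir layout built in the last section, that compares the length recorded for the live file and for its snapshotted version through the standard FileStatus API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotLengthCheck {

  public static void main(String[] args) throws Exception {
    // assumption: fs.defaultFS is defined in the classpath configuration
    FileSystem fileSystem = FileSystem.get(new Configuration());

    // the live file, appended to after the snapshot was taken
    FileStatus current =
      fileSystem.getFileStatus(new Path("/appendable_dir/appendable.txt"));
    // the same file seen through the snapshot: reads on this path stop
    // at the length recorded at snapshot creation time, even though
    // both entries reference the same blocks
    FileStatus snapshotted = fileSystem.getFileStatus(
      new Path("/appendable_dir/.snapshot/testSnap/appendable.txt"));

    System.out.println("current length: " + current.getLen());     // 14 after the append
    System.out.println("snapshot length: " + snapshotted.getLen()); // still 7
  }
}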

Operations

Snapshottable directories are managed with the hdfs dfsadmin command, which takes options to allow or disallow snapshots on a given directory. To create a snapshot, hdfs dfs -createSnapshot should be used. Below is an example of snapshot creation and the recovery after a file delete:

# Create directory structure
hdfs dfs -mkdir /my_dir
hdfs dfs -mkdir /my_dir/1
hdfs dfs -mkdir /my_dir/2
hdfs dfs -mkdir /my_dir/1/a
hdfs dfs -mkdir /my_dir/1/b

# Allow snapshot creation for /my_dir
hdfs dfsadmin -allowSnapshot /my_dir
Allowing snaphot on /my_dir succeeded

# Create the first snapshot
hdfs dfs -createSnapshot /my_dir testSnap1
Created snapshot /my_dir/.snapshot/testSnap1

# Snapshots are not listed by the usual ls command
hdfs dfs -ls /my_dir 
Found 2 items
drwxr-xr-x   - bartosz supergroup          0 2016-12-03 09:51 /my_dir/1
drwxr-xr-x   - bartosz supergroup          0 2016-12-03 09:51 /my_dir/2

# But they can be read directly
hdfs dfs -ls /my_dir/.snapshot 
Found 1 items
drwxr-xr-x   - bartosz supergroup          0 2016-12-03 09:52 /my_dir/.snapshot/testSnap1

hdfs dfs -ls /my_dir/.snapshot/testSnap1
Found 2 items
drwxr-xr-x   - bartosz supergroup          0 2016-12-03 09:51 /my_dir/.snapshot/testSnap1/1
drwxr-xr-x   - bartosz supergroup          0 2016-12-03 09:51 /my_dir/.snapshot/testSnap1/2

# Copy a non-empty local file
hadoop fs -copyFromLocal ~/tested_file.txt /my_dir/tested_file.txt

# Create a new snapshot - this time for a directory
# containing a file
hdfs dfs -createSnapshot /my_dir testSnap2
Created snapshot /my_dir/.snapshot/testSnap2

# Check if file is there
hdfs dfs -ls /my_dir/.snapshot/testSnap2
Found 3 items
drwxr-xr-x   - bartosz supergroup          0 2016-12-03 09:51 /my_dir/.snapshot/testSnap2/1
drwxr-xr-x   - bartosz supergroup          0 2016-12-03 09:51 /my_dir/.snapshot/testSnap2/2
-rw-r--r--   1 bartosz supergroup          7 2016-12-03 10:01 /my_dir/.snapshot/testSnap2/tested_file.txt

# The snapshotDiff command compares two snapshots
# (a programmatic equivalent follows this listing)
hdfs snapshotDiff /my_dir .snapshot/testSnap1 .snapshot/testSnap2
Difference between snapshot testSnap1 and snapshot testSnap2 under directory /my_dir:
M	.
+	./tested_file.txt

# Remove previously copied file
hdfs dfs -rm /my_dir/tested_file.txt
16/12/03 11:36:25 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /my_dir/tested_file.txt

# Restore snapshot directory to a temporary place
# to see if file is there
hdfs dfs -cp /my_dir/.snapshot/testSnap2 /my_dir_from_snapshot
hdfs dfs -ls /my_dir_from_snapshot
Found 3 items
drwxr-xr-x   - bartosz supergroup          0 2016-12-03 11:37 /my_dir_from_snapshot/1
drwxr-xr-x   - bartosz supergroup          0 2016-12-03 11:37 /my_dir_from_snapshot/2
-rw-r--r--   1 bartosz supergroup          7 2016-12-03 11:37 /my_dir_from_snapshot/tested_file.txt

# Check file's content
hdfs dfs -cat /my_dir_from_snapshot/tested_file.txt
Test 1
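
The snapshotDiff comparison used in the listing is also exposed through the client API. A minimal sketch, assuming the configured file system is HDFS (the cast fails otherwise):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.SnapshotDiffReport;

public class SnapshotDiffExample {

  public static void main(String[] args) throws Exception {
    // getSnapshotDiffReport is specific to HDFS, so the generic
    // FileSystem handle must be cast to DistributedFileSystem
    DistributedFileSystem fileSystem =
      (DistributedFileSystem) FileSystem.get(new Configuration());

    // produces the same report as the hdfs snapshotDiff command above
    SnapshotDiffReport report =
      fileSystem.getSnapshotDiffReport(new Path("/my_dir"), "testSnap1", "testSnap2");
    System.out.println(report);
  }
}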

Let's now see what happens with an append operation on a snapshotted file:

# Create new directory and copy file with "Test 1" content
hdfs dfs -mkdir /appendable_dir
hadoop fs -copyFromLocal ~/tested_file.txt /appendable_dir/appendable.txt

# Allow snapshots for created directory
hdfs dfsadmin -allowSnapshot /appendable_dir 
Allowing snaphot on /appendable_dir succeeded

# Create snapshot 
hdfs dfs -createSnapshot /appendable_dir testSnap
Created snapshot /appendable_dir/.snapshot/testSnap

# Check content 
hdfs dfs -cat /appendable_dir/.snapshot/testSnap/appendable.txt
Test 1

# Append content to file in directory
hadoop fs -appendToFile ~/tested_file.txt /appendable_dir/appendable.txt
# And check the new content
hdfs dfs -cat /appendable_dir/appendable.txt
Test 1
Test 1

# Compare the length of both files
hadoop fs -du /appendable_dir/appendable.txt
14  /appendable_dir/appendable.txt
hadoop fs -du /appendable_dir/.snapshot/testSnap/appendable.txt
7  /appendable_dir/.snapshot/testSnap/appendable.txt

This post presents the snapshot feature in HDFS. The first part describes the basics: the conditions to make snapshots work and their default behavior. The second part focuses on 2 specific cases of snapshotted files: delete and append/truncate. We can learn from there that snapshots work as symbolic links, sometimes preventing blocks from being deleted. We can also discover that the support for append/truncate operations is based on file length. The last part shows how to create snapshots for the situations described in the previous sections.