Append and truncate in HDFS

Versions: Hadoop 2.7.2

Building an immutable distributed file system is easier than building a mutable one. HDFS, even though it was initially designed for write-once data, supports mutability through 2 operations: append and truncate.

This post presents these 2 operations in more detail. The first part covers append; the second part describes truncate.

Append explained

The append operation consists of adding new data at the end of a file. Thus, the file changes its length and possibly its number of blocks. The append algorithm in HDFS can be summarized in the following steps (a client-side sketch follows the list):

  1. The client sends an append request to the NameNode.
  2. The NameNode checks if the file is closed - otherwise append is not allowed. If the file is closed, it moves to the Under Construction state.
  3. The NameNode checks the last block of the file: if it's full, the NameNode initializes a new block that will hold the appended fragment. If the block is not full, it's reused to hold the new data.
  4. The pipeline is resolved: for a full block a new pipeline is created, and for a partially filled block the pipeline associated with this block is reused.
  5. Data is written as in the case of file creation: through the resolved pipeline.
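
From the client's point of view, these steps hide behind a single append() call of the Java FileSystem API. Below is a minimal sketch, assuming an already existing /copied_file.txt and the single-node address hdfs://localhost:9000 used throughout this post:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed single-node setup; adapt the address to your cluster
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        try (FileSystem fs = FileSystem.get(conf)) {
            // append() triggers steps 1-4: the NameNode reopens the file,
            // resolves the pipeline and returns a stream writing to it
            try (FSDataOutputStream out = fs.append(new Path("/copied_file.txt"))) {
                out.write("appended fragment".getBytes("UTF-8"));
            }
        }
    }
}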

A single append is transparent for snapshots because only the length of the modified file changes.
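
One way to observe this transparency, as a sketch: assuming a snapshottable directory /snapshotted containing a file file.txt (both hypothetical names), the snapshot keeps reporting the pre-append length while the live file grows - no block copy is needed:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class AppendVsSnapshot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed address
        try (DistributedFileSystem fs =
                 (DistributedFileSystem) FileSystem.get(conf)) {
            Path dir = new Path("/snapshotted");   // hypothetical directory
            Path file = new Path(dir, "file.txt"); // hypothetical existing file
            fs.allowSnapshot(dir);
            fs.createSnapshot(dir, "before-append");
            try (FSDataOutputStream out = fs.append(file)) {
                out.write("appended fragment".getBytes("UTF-8"));
            }
            // The snapshot recorded the length at its creation time, so it
            // still reports the pre-append size
            long inSnapshot = fs.getFileStatus(
                new Path(dir, ".snapshot/before-append/file.txt")).getLen();
            long current = fs.getFileStatus(file).getLen();
            System.out.println(inSnapshot + " vs " + current);
        }
    }
}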

The example below shows a simple use case of append:

# Create the initial file
hadoop fs -copyFromLocal ~/tested_file.txt /copied_file.txt

# Append the same content at its end
hadoop fs -appendToFile ~/tested_file.txt /copied_file.txt

Logs associated with this operation contain:

# NameNode part
12:44:07,139 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    ugi=bartosz (auth:SIMPLE)     ip=/127.0.0.1   cmd=append      src=/copied_file.txt    dst=null        perm=null       proto=rpc

# DataNode execution
12:44:07,327 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Appending to FinalizedReplica, blk_1073741825_1001, FINALIZED
  getNumBytes()     = 7
  getBytesOnDisk()  = 7
  getVisibleLength()= 7
  getVolume()       = /home/bartosz/hdfs_dir/data_blocks/current
  getBlockFile()    = /home/bartosz/hdfs_dir/data_blocks/current/BP-1817513253-127.0.1.1-1481542921087/current/finalized/subdir0/subdir0/blk_1073741825
  unlinked          =false 

# NameNode updates the pipeline
12:44:07,359 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(blk_1073741825_1001, newGS=1002, newLength=7, newNodes=[127.0.0.1:50010], client=DFSClient_NONMAPREDUCE_772042077_1)
12:44:07,359 INFO BlockStateChange: BLOCK* Removing stale replica from location: [DISK]DS-b2084b71-fe36-4a2e-9dd8-dc1b1094de7c:NORMAL:127.0.0.1:50010
12:44:07,372 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(blk_1073741825_1001 => blk_1073741825_1002) success

# DataNode from the pipeline handles append request
12:44:07,380 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /127.0.0.1:34210, dest: /127.0.0.1:50010, bytes: 14, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_772042077_1, offset: 0, srvID: 2e5f9bdf-f444-4a62-88c3-15882e84e1c9, blockid: BP-1817513253-127.0.1.1-1481542921087:blk_1073741825_1002, duration: 45130390
12:44:07,380 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1817513253-127.0.1.1-1481542921087:blk_1073741825_1002, type=LAST_IN_PIPELINE, downstreams=0:[] terminating

# NameNode confirms append
12:44:07,381 DEBUG BlockStateChange: *BLOCK* NameNode.blockReceivedAndDeleted: from DatanodeRegistration(127.0.0.1:50010, datanodeUuid=2e5f9bdf-f444-4a62-88c3-15882e84e1c9, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-6133e6a0-7312-4f20-85ac-3545cecc5bfd;nsid=870060382;c=0) 1 blocks.
12:44:07,381 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_1073741825_1002{UCState=UNDER_CONSTRUCTION, truncateBlock=null, primaryNodeIndex=-1, replicas=[ReplicaUC[[DISK]DS-b2084b71-fe36-4a2e-9dd8-dc1b1094de7c:NORMAL:127.0.0.1:50010|RBW]]} size 7
12:44:07,382 DEBUG BlockStateChange: BLOCK* block RECEIVED_BLOCK: blk_1073741825_1002 is received from 127.0.0.1:50010
12:44:07,382 DEBUG BlockStateChange: *BLOCK* NameNode.processIncrementalBlockReport: from 127.0.0.1:50010 receiving: 0, received: 1, deleted: 0
12:44:07,391 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /copied_file.txt is closed by DFSClient_NONMAPREDUCE_772042077_1

Truncate operation in HDFS

The opposite operation to append is truncate. Its goal is to remove data from the tail of the file. The algorithm also manipulates the last block(s) to achieve that goal (a client-side sketch follows the list):

  1. The client sends a truncate request containing the name of the file to truncate and the new length.
  2. The NameNode checks if the file is closed - otherwise the operation is not permitted.
  3. If the truncate doesn't end exactly on a block boundary (for example: it removes only 2.5 of the 3 last blocks), the NameNode marks the file as Under Construction and acquires the lease on it.
  4. The last, partially truncated block moves to the Under Recovery state and the NameNode starts the truncate recovery process.
  5. Truncate recovery consists of bringing all replicas of the partially truncated block to the same new length. The NameNode designates one DataNode holding a replica of the block and asks it to synchronize the new length on all DataNodes storing it.
  6. When all DataNodes confirm the change, the designated DataNode informs the NameNode about it.
  7. The NameNode persists the change in the edit log and removes the lease from the file.
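
On the client side the whole sequence hides behind a single truncate() call of the Java FileSystem API, available since Hadoop 2.7. A minimal sketch with the same assumed address as before:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TruncateExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed address
        try (FileSystem fs = FileSystem.get(conf)) {
            // truncate() returns true when the new length falls on a block
            // boundary; false means the last block needs recovery (steps 4-6)
            // and the file can't be reopened until the recovery completes
            boolean done = fs.truncate(new Path("/copied_file2.txt"), 3L);
            if (!done) {
                System.out.println("Block recovery in progress");
            }
        }
    }
}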

Handling truncate in the presence of snapshots needs a little more work on the part of HDFS. When the last block is not fully truncated and it's referenced by one of the snapshots, HDFS creates a new block holding the data remaining after the truncate. The old block still keeps the pre-truncate data until the last snapshot using it is removed.
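
A sketch of this copy-on-truncate behavior, reusing the hypothetical /snapshotted directory from the append section:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class TruncateWithSnapshot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed address
        try (DistributedFileSystem fs =
                 (DistributedFileSystem) FileSystem.get(conf)) {
            Path dir = new Path("/snapshotted");   // hypothetical directory
            Path file = new Path(dir, "file.txt"); // hypothetical existing file
            fs.allowSnapshot(dir);
            fs.createSnapshot(dir, "before-truncate");
            // The last block is referenced by the snapshot, so instead of
            // shrinking it in place HDFS copies the kept bytes to a new
            // block; the old one lives on for the snapshot's readers
            fs.truncate(file, 3L);
            // The pre-truncate content stays readable at
            // /snapshotted/.snapshot/before-truncate/file.txt
        }
    }
}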

As in the case of append, below is a simple use case of truncate from the command line:

# Create a file with length 7
hadoop fs -copyFromLocal ~/hadoop/tested_file.txt /copied_file2.txt

hadoop fs -truncate 3 /copied_file2.txt
Truncating /copied_file2.txt to length: 3. Wait for block recovery to complete before further updating this file.

Logs produced by this call are:

# NameNode receives client's request
12:58:51,539 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    ugi=bartosz (auth:SIMPLE)     ip=/127.0.0.1   cmd=getfileinfo src=/copied_file2.txt   dst=null        perm=null       proto=rpc
12:58:51,574 INFO BlockStateChange: BLOCK* blk_1073741826_1003{UCState=UNDER_RECOVERY, truncateBlock=blk_1073741826_1004, primaryNodeIndex=0, replicas=[ReplicaUC[[DISK]DS-b2084b71-fe36-4a2e-9dd8-dc1b1094de7c:NORMAL:127.0.0.1:50010|RBW]]} recovery started, primary=ReplicaUC[[DISK]DS-b2084b71-fe36-4a2e-9dd8-dc1b1094de7c:NORMAL:127.0.0.1:50010|RBW]
12:58:51,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true    ugi=bartosz (auth:SIMPLE)     ip=/127.0.0.1   cmd=truncate    src=/copied_file2.txt   dst=null        perm=bartosz:supergroup:rw-r--r--     proto=rpc

# DataNode's work - the last block was not fully truncated,
# the recovery is started
12:58:54,094 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: NameNode at localhost/127.0.0.1:9000 calls recoverBlock(BP-1817513253-127.0.1.1-1481542921087:blk_1073741826_1003, targets=[DatanodeInfoWithStorage[127.0.0.1:50010,null,null]], newGenerationStamp=1004)
12:58:54,095 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: initReplicaRecovery: blk_1073741826_1003, recoveryId=1004, replica=FinalizedReplica, blk_1073741826_1003, FINALIZED
  getNumBytes()     = 7
  getBytesOnDisk()  = 7
  getVisibleLength()= 7
  getVolume()       = /home/bartosz/hdfs_dir/data_blocks/current
  getBlockFile()    = /home/bartosz/hdfs_dir/data_blocks/current/BP-1817513253-127.0.1.1-1481542921087/current/finalized/subdir0/subdir0/blk_1073741826
  unlinked          =false
12:58:54,095 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: initReplicaRecovery: changing replica state for blk_1073741826_1003 from FINALIZED to RUR
12:58:54,097 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: updateReplica: BP-1817513253-127.0.1.1-1481542921087:blk_1073741826_1003, recoveryId=1004, length=3, replica=ReplicaUnderRecovery, blk_1073741826_1003, RUR
  getNumBytes()     = 7
  getBytesOnDisk()  = 7
  getVisibleLength()= 7
  getVolume()       = /home/bartosz/hdfs_dir/data_blocks/current
  getBlockFile()    = /home/bartosz/hdfs_dir/data_blocks/current/BP-1817513253-127.0.1.1-1481542921087/current/finalized/subdir0/subdir0/blk_1073741826
  recoveryId=1004
  original=FinalizedReplica, blk_1073741826_1003, FINALIZED
    getNumBytes()     = 7
    getBytesOnDisk()  = 7
    getVisibleLength()= 7
    getVolume()       = /home/bartosz/hdfs_dir/data_blocks/current
    getBlockFile()    = /home/bartosz/hdfs_dir/data_blocks/current/BP-1817513253-127.0.1.1-1481542921087/current/finalized/subdir0/subdir0/blk_1073741826
  unlinked          =false
# Summary of truncate operation:
12:58:54,131 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: truncateBlock: blockFile=/home/bartosz/hdfs_dir/data_blocks/current/BP-1817513253-127.0.1.1-1481542921087/current/finalized/subdir0/subdir0/blk_1073741826, metaFile=/home/bartosz/hdfs_dir/data_blocks/current/BP-1817513253-127.0.1.1-1481542921087/current/finalized/subdir0/subdir0/blk_1073741826_1004.meta, oldlen=7, newlen=3

# NameNode
12:58:54,132 DEBUG BlockStateChange: *BLOCK* NameNode.blockReceivedAndDeleted: from DatanodeRegistration(127.0.0.1:50010, datanodeUuid=2e5f9bdf-f444-4a62-88c3-15882e84e1c9, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=CID-6133e6a0-7312-4f20-85ac-3545cecc5bfd;nsid=870060382;c=0) 1 blocks.
12:58:54,132 DEBUG BlockStateChange: BLOCK* block RECEIVED_BLOCK: blk_1073741826_1004 is received from 127.0.0.1:50010
12:58:54,132 DEBUG BlockStateChange: *BLOCK* NameNode.processIncrementalBlockReport: from 127.0.0.1:50010 receiving: 0, received: 1, deleted: 0
12:58:54,141 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: commitBlockSynchronization(oldBlock=BP-1817513253-127.0.1.1-1481542921087:blk_1073741826_1003, newgenerationstamp=1004, newlength=3, newtargets=[127.0.0.1:50010]) successful

Append and truncate are opposite operations which make mutability possible in HDFS. Append adds new data at the end of a file, while truncate cuts some bytes from its tail. The two follow different logic: append is much simpler since it mostly deals with the file length. Truncate, on the other hand, must take into account aspects such as a partially truncated last block or a truncated block referenced in snapshots.