Articles about Compression algorithms on waitingforcode.com

September 30, 2018 • Apache Spark

Apache Spark and data compression

Compressed data takes less space and can therefore be sent faster across the network. However, these advantages turn into drawbacks in parallel distributed data processing, where the engine doesn't know how to split the compressed data for better parallelization. Fortunately, some compression formats are splittable.

November 26, 2017 • Apache Parquet

Compression in Parquet

Last time we discovered the different encoding methods available in Apache Parquet. But encoding is not the only technique that helps reduce file size. The other, very similar one is compression.
