Apache Spark articles

Data representation in Spark - RDD

The first post about Spark internals covers the Resilient Distributed Dataset (RDD), the abstraction Spark uses to represent the data it processes.
