Apache Spark GraphX articles on waitingforcode.com

Edge partitioning strategies

Previously we learned how vertices and edges are represented in Apache Spark GraphX. To avoid introducing too many new concepts at once, we deliberately left out edge partitioning. This post covers it.

GraphX and fault-tolerance

Bad things happen in distributed data processing, and it's better to be prepared for them. To guard against such failures, Apache Spark can recompute a failed partition and can also store a snapshot of the computation as a checkpoint. Both mechanisms are part of the GraphX module's fault tolerance.

Introduction to Apache Spark GraphX

Whenever we learn a new topic, it's important to start with the basics. We couldn't learn a new language without knowing the order of subject and verb in a sentence. The same rule applies to Apache Spark's GraphX module, which is covered in this category. So before going into details, we'll focus on its basics.