Spark cache articles

isEmpty() trap in Spark

In general Spark's actions reflects logic implemented in a lot of equivalent methods in programming languages. As an example we can consider isEmpty() that in Spark checks the existence of only 1 element and similarly in Java's List. But it can often lead to troubles, especially when more than 1 action is invoked.

Continue Reading →

Cache in Spark

Cache is an appreciable tool when we have a greedy computation generating a lot of data. Spark also uses this feature to better handle the case of RDD which generation is heavy (for example necessities database connection or data retrieval from external web services).

Continue Reading →