Spark cache articles

Articles tagged with Spark cache. There are 2 articles corresponding to this tag.

isEmpty() trap in Spark

In general, Spark's actions mirror the logic of equivalent methods in many programming languages. For example, isEmpty() in Spark only checks whether at least one element exists, much like isEmpty() on Java's List. But it can often lead to trouble, especially when more than one action is invoked on the same data, because each action recomputes the RDD from its lineage unless the RDD has been cached.
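The trap can be illustrated with a toy model in Python rather than real Spark code. The class `ToyLazyDataset` and the function `expensive_source` are hypothetical: the class stands in for an RDD by replaying its source on every action, and `is_empty()` pulls only the first element, roughly as Spark's `RDD.isEmpty()` relies on `take(1)`. It is a sketch under these assumptions, not Spark's implementation.

```python
from typing import Callable, Iterator, List


class ToyLazyDataset:
    """Hypothetical stand-in for a Spark RDD (not Spark's API): nothing runs
    until an action is called, and every action replays the source."""

    def __init__(self, source: Callable[[], Iterator[int]]) -> None:
        self._source = source  # the "lineage": a recipe to rebuild the data

    def is_empty(self) -> bool:
        # Like RDD.isEmpty(): pull at most one element, then stop.
        sentinel = object()
        return next(self._source(), sentinel) is sentinel

    def count(self) -> int:
        # A full action: consume every element from a fresh run of the source.
        return sum(1 for _ in self._source())


computed: List[int] = []  # log of every element the source produces


def expensive_source() -> Iterator[int]:
    # Hypothetical costly step (e.g. a database or web-service call per row).
    for i in range(5):
        computed.append(i)
        yield i * 10


ds = ToyLazyDataset(expensive_source)
total = 0
if not ds.is_empty():  # action 1: computes element 0 only
    total = ds.count()  # action 2: recomputes everything, element 0 included
print(total, computed)  # 5 [0, 0, 1, 2, 3, 4]
```

Element 0 is produced twice. In real Spark the duplicated work can be a whole partition or more, which is why caching or persisting the RDD before running several actions on it is a common remedy.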

Cache in Spark

Caching is a valuable tool when an expensive computation generates a lot of data. Spark uses this feature to better handle RDDs whose generation is costly (for example, when it requires a database connection or data retrieval from external web services).
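Caching can be sketched with another hypothetical toy in Python (not Spark's API). As in Spark, `cache()` only marks the dataset; the first action materializes it and later actions reuse the stored rows instead of calling the heavy source again. The names `ToyCachedDataset` and `fetch_from_service` are illustrative, and the sketch assumes the whole dataset fits in memory, ignoring Spark's per-partition storage and eviction.

```python
from typing import Callable, Iterator, List, Optional


class ToyCachedDataset:
    """Hypothetical toy model of Spark-style caching (not Spark's API)."""

    def __init__(self, source: Callable[[], Iterator[str]]) -> None:
        self._source = source
        self._cache_requested = False
        self._materialized: Optional[List[str]] = None

    def cache(self) -> "ToyCachedDataset":
        self._cache_requested = True  # lazy: nothing is computed here
        return self

    def _rows(self) -> Iterator[str]:
        if self._materialized is not None:
            return iter(self._materialized)  # served from the cache
        if self._cache_requested:
            self._materialized = list(self._source())  # compute once, keep it
            return iter(self._materialized)
        return self._source()  # no cache: rebuild from the source every time

    def count(self) -> int:
        return sum(1 for _ in self._rows())

    def collect(self) -> List[str]:
        return list(self._rows())


fetches = 0  # how many times the heavy source actually ran


def fetch_from_service() -> Iterator[str]:
    # Hypothetical heavy source, e.g. rows pulled from a database or web API.
    global fetches
    fetches += 1
    yield from ["a", "b", "c"]


ds = ToyCachedDataset(fetch_from_service).cache()
print(ds.count(), ds.collect(), fetches)  # 3 ['a', 'b', 'c'] 1

plain = ToyCachedDataset(fetch_from_service)  # same source, no cache()
plain.count()
plain.collect()
print(fetches)  # 3: each uncached action triggered another fetch
```

With `cache()` the source runs once for two actions; without it, every action pays the full generation cost again.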