Waiting for code

on waitingforcode.com

Dealing with nested data in Apache Spark SQL

Nested data structure is very useful in data denormalization for Big Data needs. It avoids joins that we could use for several related and fully normalized datasets. But processing such data structures is not always simple. Fortunately Apache Spark SQL provides different utility functions helping to work with them. Continue Reading →

Quasiquotes in Scala

Apache Spark inspired not only the last week's post about closures but also the one you're reading about quasiquotes - a mysterious Scala experimental feature those existence we can difficulty suspect in the first months of work with the language. Continue Reading →

Closures in Scala

If you're reading this blog, you've certainly noticed its big interest for Apache Spark. One of first problems we encounter with this data processing framework is a "Task not serializable" error that is caused by a not serializable closure. In this post, outside of Spark's context, we'll focus on these specific functions. Continue Reading →

Stream-to-stream state management

Last weeks we've discovered 2 stream-to-stream join types in Apache Spark Structured Streaming. As told in these posts, state management logic may be sometimes omitted (for inner joins) but generally it's advised to reduce the memory pressure. Apache Spark proposes 3 different state management strategies that will be detailed in the following sections. Continue Reading →

Lazy operator in Scala

Scala's lazy instances generation can be helpful in a lot of places. It simplifies writing since we can declare an instance at right and common place and delay its physical creation up to its first use. In Java we've this possibility too, though, it's much more verbose than in Scala. Continue Reading →

Scala and for loop

For loop was maybe the most used iterative structure in Java before the introduction of lambda expressions. With this loop we're able to write everything - starting with a simple mapping and terminating with a more complex "find-first element in the collection" feature. In Scala we can made these operations with monads and despite that, for loop is a feature offering a lot of possibilities. Continue Reading →