Waiting for code

on waitingforcode.com

Vertex-centric graph processing

Graph data processing, even though it seems less popular than stream or file processing, is an important member of the family of data-oriented systems. And like its "colleagues", it comes with several distinct processing models. The first one described on this blog is called vertex-centric. Continue Reading →
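
To give an idea of the model, below is a minimal sketch of vertex-centric processing using GraphX's Pregel operator; the tiny graph and the single-source shortest paths logic are made up purely for illustration.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

// a made-up 3-vertex weighted graph, just to show the vertex-centric (Pregel) style
val spark = SparkSession.builder().master("local[*]").appName("vertex-centric-sketch").getOrCreate()
val sc = spark.sparkContext

val edges = sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 4.0)))
val graph = Graph.fromEdges(edges, Double.PositiveInfinity)

// every vertex stores its current best distance from the source vertex 1
val sourceId = 1L
val initialGraph = graph.mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity)

val shortestPaths = initialGraph.pregel(Double.PositiveInfinity)(
  (_, currentDist, newDist) => math.min(currentDist, newDist),   // vertex program: keep the better distance
  triplet =>                                                      // send a message only if it improves the neighbor
    if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
      Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
    else Iterator.empty,
  (a, b) => math.min(a, b)                                        // merge concurrent messages
)

shortestPaths.vertices.collect().foreach(println)
```

The whole computation is expressed from the point of view of a single vertex receiving and sending messages, which is exactly what "vertex-centric" means.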

Neo4j scalability and Apache Spark

Even though Apache Spark provides the GraphX module, it's still possible to use the framework with other graph engines. One of them is Neo4j. But before using its Spark connector, it's good to know some of its internal execution details, especially the ones related to scalability. Continue Reading →

Handling exceptions in Scala

At first glance, the try-catch block seems to be the preferred way of dealing with exceptions for people coming to Scala from Java. In reality, though, it's not the only approach available in Scala. Continue Reading →
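
For instance, a more functional alternative is scala.util.Try; the sketch below uses a made-up parsePort helper just to show the idea.

```scala
import scala.util.{Failure, Success, Try}

// wrap the risky call instead of surrounding it with try-catch
def parsePort(raw: String): Try[Int] = Try(raw.toInt)

parsePort("8080") match {
  case Success(port) => println(s"listening on $port")
  case Failure(error) => println(s"invalid port: ${error.getMessage}")
}

// Try composes, so the recovery logic stays declarative
val portOrDefault = parsePort("not-a-number").getOrElse(8080)
println(portOrDefault) // 8080
```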

Graphs and data processing

On this blog I've already covered relational databases, key-value stores, search engines and log systems. There are still some storage systems deserving a learning effort, and one of them is the graph, considered here in the context of data processing. Continue Reading →

Package objects

Who hasn't faced the question of helper classes? For some, creating them isn't legitimate since everything can be attached to an object. For others, they're perfectly fine because they help to keep the code base understandable. Scala comes with an idea that can make both sides agree: package objects. Continue Reading →
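
As a quick illustration, here is a minimal sketch of a package object; the package and member names are invented for the example.

```scala
package com

// helpers shared by every class of the com.waitingforcode package
package object waitingforcode {
  type UserId = Long                              // package-wide type alias
  val DefaultTimeoutMs = 3000                     // shared constant
  def normalize(name: String): String = name.trim.toLowerCase
}

package waitingforcode {
  // members of the package object are visible here without any import
  class UserRepository {
    def find(id: UserId): String = normalize(s"User-$id")
  }
}
```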

Apache Spark and data compression

Compressed data takes less space and can thus be sent faster across the network. However, these advantages turn into drawbacks in parallel distributed data processing, where the engine may not know how to split the data for better parallelization. Fortunately, some compression formats are splittable. Continue Reading →
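
A quick way to see the difference is to compare the partitioning of a splittable and a non-splittable file; the sketch below assumes two hypothetical local copies of the same text file, one gzipped and one bzip2-compressed.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[4]").appName("compression-sketch").getOrCreate()

// hypothetical input files: the same dataset compressed with 2 different codecs
val gzipped = spark.sparkContext.textFile("/tmp/events.txt.gz")
val bzipped = spark.sparkContext.textFile("/tmp/events.txt.bz2")

// gzip can only be read sequentially, so the whole file ends up in a single partition;
// bzip2 is splittable, so a large enough file is divided into several input splits read in parallel
println(s"gzip partitions:  ${gzipped.getNumPartitions}")
println(s"bzip2 partitions: ${bzipped.getNumPartitions}")
```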

Constructors

Scala is known as a more concise language than Java. One of the points illustrating this characteristic is class constructors, which can be defined in a single line, alongside the class name. Continue Reading →
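
For illustration, a minimal sketch with an invented User class: the primary constructor sits in the class signature and an auxiliary one delegates to it.

```scala
class User(val login: String, val age: Int) {        // primary constructor, declared with the class name
  def this(login: String) = this(login, 18)           // auxiliary constructor delegating to the primary one
  override def toString = s"$login ($age)"
}

val explicitAge = new User("bartosz", 33)
val defaultAge = new User("reader")
println(explicitAge) // bartosz (33)
println(defaultAge)  // reader (18)
```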

Dealing with nested data in Apache Spark SQL

Nested data structures are very useful for data denormalization in Big Data. They avoid the joins we would otherwise need between several related, fully normalized datasets. But processing such structures is not always simple. Fortunately, Apache Spark SQL provides several utility functions that help to work with them. Continue Reading →
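
For example, dot notation and the explode function already cover many cases; the dataset and column names below are made up for the sketch.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object NestedDataSketch {
  case class Customer(name: String, country: String)
  case class Order(id: String, customer: Customer, amounts: Seq[Double])

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("nested-data-sketch").getOrCreate()
    import spark.implicits._

    val orders = Seq(
      Order("order-1", Customer("alice", "PL"), Seq(10.0, 25.5)),
      Order("order-2", Customer("bob", "FR"), Seq(99.9))
    ).toDF()

    orders
      .select(
        $"id",
        $"customer.country".as("country"),        // reach into a struct with dot notation
        explode($"amounts").as("amount")          // flatten the array into one row per element
      )
      .show()
  }
}
```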

Quasiquotes in Scala

Apache Spark inspired not only last week's post about closures but also the one you're reading, about quasiquotes: a mysterious experimental Scala feature whose existence is hard to suspect in the first months of working with the language. Continue Reading →
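
As a teaser, a minimal sketch (requiring the scala-reflect dependency); the identifiers are invented and the snippet only builds and inspects trees.

```scala
import scala.reflect.runtime.universe._

// the q interpolator turns almost-literal Scala code into an AST
val amount = q"count * 3"
val doubled = q"$amount + $amount"            // trees can be spliced into bigger trees

println(showCode(doubled))                    // prints the tree back as compilable code

// the same interpolator also deconstructs trees in pattern matching
doubled match {
  case q"$left + $right" => println(s"left branch: ${showCode(left)}")
}
```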