Waiting for code

on waitingforcode.com

Immutability and key-value storage

The immutability is a precious property of systems dealing with a lot of data. It's especially true when something goes wrong and we must recover quickly. Since the data is immutable, the cleaning step is not executed and with some additional computation power, the data can be regenerated efficiently. Continue Reading →

Akka and Distributed Data module

Recently we discovered a lot of topics related to the eventual consistency (ACID 2.0, CRDT, CALM...). Some of these posts had the code examples using the Akka distributed data module that will be covered below. Continue Reading →

ACID 2.0

ACID is a well-known acronym for almost all developers growing with the RDBMS as the main storage. However with the popularization of NoSQL and distributed computing, another ACID acronym appeared - ACID 2.0. Continue Reading →

ScalaTest and testing styles

At first glance the wide choice of testing families in Scala can scary. After all in JUnit and other xUnit frameworks, the choice of tests declaration is limited. Hopefully after some digging ScalaTest's testing styles become more obvious to understand and to use. Continue Reading →

Immutability in Big Data

The interest of immutability in Big Data is often difficult to understand at the first glance. After all it introduces some complexity - especially at the reading path. But when the first problems appear and some of data need to be recomputed in order, the immutability comes to the rescue. Continue Reading →

Scala context bound

This post begins the series of posts about Scala's features called "One Scala feature per week". Every week one particular Scala's functionality will be covered. This beginning post will explain context bound. Continue Reading →

Conflict-Free Replicated Data Type

Pessimistic replication requires a synchronous communication between the main node writing the data and the replicas. However in some cases the optimistic replication can be more efficient and still guarantee the same final result. One of solutions from this category are conflict-free replicated data types. Continue Reading →

Hierarchical queries

SQL hides for the most of its daily users a lot of interesting and powerful functions that though are not used very frequently. One of them are hierarchical queries. Continue Reading →

Correlated subqueries

Even though the RDBMS is more and more completed (replaced?) by NoSQL solutions, it still remains an important piece of the data processing. It's even more true with the distributed databases as BigQuery supporting SQL standards so the correlated subqueries. But they're also implemented in other Big Data engines as Spark SQL or more classical ones as PostgreSQL. Continue Reading →

Time series - general notes

Temporal data is a little bit particular. It can be generated very frequently, as for instance every 500 ms or less. It's then important to store it efficiently and to allow quick and flexible reads. It's also important to know the specificities of time-series as a popular case of temporal data. Continue Reading →