Data on AWS articles

on waitingforcode.com

Listening EMR events with AWS Lambda

I really appreciate AWS services and one of the main reasons for that is the facility to implement event-driven systems. One of the interesting use cases of these events is related to the EMR service, responsible for running Apache Spark pipelines. In this post I will try to associate an action invoked every time an EMR step completes successfully. Continue Reading →

AWS Lambda - does it fit in with data processing ?

Despite the recent critics (cf. "Serverless Computing: One Step Forward, Two Steps Back" link in the Read also section), serverless movement gains the popularity. Databricks proposes a serverless platform for running Apache Spark workflows, Google Cloud Platform comes with a similar service reserved to Dataflow pipelines and Amazon Web Services, ... In this post, I will summarize the good and bad sides of my recent experiences with AWS Lambda applied to the data processing. Continue Reading →

Doing data on AWS - overview

Open Source provides a lot of interesting tools to deal with Big Data: Apache Spark, Apache Kafka, Parquet - to quote only a few of them. However nowadays data platforms without cloud support are more and rarer. It's why this topic merits its own category and posts on this blog. To not go too quickly, the first article speaks about services you can use to work with the data on AWS. Continue Reading →