Reverse ETL

The first "reverse" term I've ever encountered in programming was reverse proxy. Since then, I've seen passing "reverse engineering", "reverse iterator", but none of them was a pure data term. Until recently, when I heard about reverse ETL.

4-day workshop · In-person or online

What would it take for you to trust your Databricks pipelines in production?

A 3-day bug hunt on a 3-person team costs up to €7,200 in lost engineering time. This workshop teaches you to prevent that — unit tests, data tests, and integration tests for PySpark and Databricks Lakeflow, including Spark Declarative Pipelines.

Unit, data & integration tests
Medallion architecture & Lakeflow SDP
Max 10 participants · production-ready templates
See the full curriculum → €7,000 flat fee · cohort of up to 10
Bartosz Konieczny
Bartosz
Konieczny

Reverse ETL solves the problem of making data available in specialized systems, mostly used in the operational analytics domain. In a more human-friendly tone, it takes the data you import to your data warehouse/gold tables layer and moves it to the systems dedicated to growth marketing, sales support, or customer management. In the picture below, you can see a process exporting the sales information to some 3rd party sales support system:

The reasons to move the data outside your ecosystem vary. Among the key ones you will find:

How to implement this reverse ETL? If we consider this process like any data processing step, the implementation is a simple data processing job reading the data warehouse/gold table data and importing it to the operational system. Very often, this job will use the APIs exposed by this system. The throughput can also be limited, so you may need to implement a resilient retry strategy. Besides the custom solutions, you will also find 3rd party reverse ETL tools. Unfortunately, I didn't find any Open Source library, so your comment is more than welcome if you know one! Thank you, Kurt, for sharing Grouparoo as an Open Source example.

On some cloud providers you'll be able to connect your data warehouse/golden table layer to 3rd party business tools managed by these providers. For example, GCP BigQuery is a natively supported data source for Google Analytics.

In a nutshell, reverse ETL is a way to operationalize the data warehouse/gold tables layer. Technically, it consists of loading the data to often 3rd party systems heavily used by the business users who do not necessarily master SQL (= so they can't directly use your last data layer).

Data Engineering Design Patterns

Looking for a book that defines and solves most common data engineering problems? I wrote one on that topic! You can read it online on the O'Reilly platform, or get a print copy on Amazon.

I also help solve your data engineering problems contact@waitingforcode.com 📩