Reverse ETL

The first "reverse" term I've ever encountered in programming was reverse proxy. Since then, I've seen passing "reverse engineering", "reverse iterator", but none of them was a pure data term. Until recently, when I heard about reverse ETL.

Data Engineering Design Patterns

Looking for a book that defines and solves most common data engineering problems? I wrote one on that topic! You can read it online on the O'Reilly platform, or get a print copy on Amazon.

I also help solve your data engineering problems 👉 contact@waitingforcode.com 📩

Reverse ETL solves the problem of making data available in specialized systems, mostly used in the operational analytics domain. In a more human-friendly tone, it takes the data you import to your data warehouse/gold tables layer and moves it to the systems dedicated to growth marketing, sales support, or customer management. In the picture below, you can see a process exporting the sales information to some 3rd party sales support system:

The reasons to move the data outside your ecosystem vary. Among the key ones you will find:

taking advantage of SaaS - if you need to send some invoices to the customers, there is no need to write an invoicing system. Simply export what you need to some 3rd party provider and let it do the job for you
operational analytics - not all business users know SQL. Operational analytics brings the data to the business-specialized systems that don't require any technical knowledge.

How to implement this reverse ETL? If we consider this process like any data processing step, the implementation is a simple data processing job reading the data warehouse/gold table data and importing it to the operational system. Very often, this job will use the APIs exposed by this system. The throughput can also be limited, so you may need to implement a resilient retry strategy. Besides the custom solutions, you will also find 3rd party reverse ETL tools. ~~Unfortunately, I didn't find any Open Source library, so your comment is more than welcome if you know one!~~ Thank you, Kurt, for sharing Grouparoo as an Open Source example.

On some cloud providers you'll be able to connect your data warehouse/golden table layer to 3rd party business tools managed by these providers. For example, GCP BigQuery is a natively supported data source for Google Analytics.

In a nutshell, reverse ETL is a way to operationalize the data warehouse/gold tables layer. Technically, it consists of loading the data to often 3rd party systems heavily used by the business users who do not necessarily master SQL (= so they can't directly use your last data layer).

Consulting

With nearly 16 years of experience, including 8 as data engineer, I offer expert consulting to design and optimize scalable data solutions. As an O’Reilly author, Data+AI Summit speaker, and blogger, I bring cutting-edge insights to modernize infrastructure, build robust pipelines, and drive data-driven decision-making. Let's transform your data challenges into opportunities—reach out to elevate your data engineering game today!

👉 contact@waitingforcode.com
🔗 past projects

Read also about Reverse ETL here:

‘Reverse ETL’ Can Help Companies Operationalize Data Warehouses How Datadog Operationalizes Their Data Warehouse to Supercharge Their Business Teams Reverse ETL on Airflow What is Reverse ETL: A Definition & Why It's Taking Off

If you liked it, you should read:

ETL, ELT, EL - OK, but reverse ETL?! I heard about this term a few months ago for the first time and recently did some research to know more. The result is there ? https://t.co/z89O7Nww0I
— Bartosz Konieczny (@waitingforcode) January 9, 2022