Reverse ETL

The first "reverse" term I've ever encountered in programming was reverse proxy. Since then, I've seen passing "reverse engineering", "reverse iterator", but none of them was a pure data term. Until recently, when I heard about reverse ETL.

Reverse ETL solves the problem of making data available in specialized systems, mostly used in the operational analytics domain. In a more human-friendly tone, it takes the data you import to your data warehouse/gold tables layer and moves it to the systems dedicated to growth marketing, sales support, or customer management. In the picture below, you can see a process exporting the sales information to some 3rd party sales support system:

The reasons to move the data outside your ecosystem vary. Among the key ones you will find:

How to implement this reverse ETL? If we consider this process like any data processing step, the implementation is a simple data processing job reading the data warehouse/gold table data and importing it to the operational system. Very often, this job will use the APIs exposed by this system. The throughput can also be limited, so you may need to implement a resilient retry strategy. Besides the custom solutions, you will also find 3rd party reverse ETL tools. Unfortunately, I didn't find any Open Source library, so your comment is more than welcome if you know one! Thank you, Kurt, for sharing Grouparoo as an Open Source example.

On some cloud providers you'll be able to connect your data warehouse/golden table layer to 3rd party business tools managed by these providers. For example, GCP BigQuery is a natively supported data source for Google Analytics.

In a nutshell, reverse ETL is a way to operationalize the data warehouse/gold tables layer. Technically, it consists of loading the data to often 3rd party systems heavily used by the business users who do not necessarily master SQL (= so they can't directly use your last data layer).