The first "reverse" term I've ever encountered in programming was reverse proxy. Since then, I've seen passing "reverse engineering", "reverse iterator", but none of them was a pure data term. Until recently, when I heard about reverse ETL.
New ebook 🔥
Learn 84 ways to solve common data engineering problems with cloud services.
Reverse ETL solves the problem of making data available in specialized systems, mostly used in the operational analytics domain. In a more human-friendly tone, it takes the data you import to your data warehouse/gold tables layer and moves it to the systems dedicated to growth marketing, sales support, or customer management. In the picture below, you can see a process exporting the sales information to some 3rd party sales support system:
The reasons to move the data outside your ecosystem vary. Among the key ones you will find:
- taking advantage of SaaS - if you need to send some invoices to the customers, there is no need to write an invoicing system. Simply export what you need to some 3rd party provider and let it do the job for you
- operational analytics - not all business users know SQL. Operational analytics brings the data to the business-specialized systems that don't require any technical knowledge.
How to implement this reverse ETL? If we consider this process like any data processing step, the implementation is a simple data processing job reading the data warehouse/gold table data and importing it to the operational system. Very often, this job will use the APIs exposed by this system. The throughput can also be limited, so you may need to implement a resilient retry strategy. Besides the custom solutions, you will also find 3rd party reverse ETL tools.
Unfortunately, I didn't find any Open Source library, so your comment is more than welcome if you know one! Thank you, Kurt, for sharing Grouparoo as an Open Source example.
On some cloud providers you'll be able to connect your data warehouse/golden table layer to 3rd party business tools managed by these providers. For example, GCP BigQuery is a natively supported data source for Google Analytics.
In a nutshell, reverse ETL is a way to operationalize the data warehouse/gold tables layer. Technically, it consists of loading the data to often 3rd party systems heavily used by the business users who do not necessarily master SQL (= so they can't directly use your last data layer).