Database management services on the cloud

Data migration is one of the scenarios you can face as a data engineer. It's not always an easy task, but managed cloud services can help you set up the pipeline and solve many common problems.


Database migration

A database migration is not a topic that we can summarize in a few sentences. It has multiple characteristics.

Data migration is the most obvious component of a database migration, but it is not the only one. Generally, we can distinguish the following major steps:

  1. Initial assessment. Understand the current data and detect potential schema incompatibilities or necessary type conversions.
  2. Schema definition. Define the target schema, which can differ from the source schema depending on the migration type.
  3. Data migration. After the initial setup, it's time to move the data.
  4. Migration assessment. You should verify whether the data migrated correctly. It's a good moment to check for any data consistency issues, such as null values, truncated fields, missing or duplicated rows.
  5. Switchover. It's an optional step where all the clients start to use the new database.
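
The migration assessment step can be sketched as a small script. This is only a minimal illustration, assuming both databases are reachable from Python and that the migrated table has a key column. The `assess_migration` helper, the `users` table, and the in-memory SQLite databases are all hypothetical stand-ins for a real source and target.

```python
import sqlite3


def assess_migration(source, target, table, key, columns):
    """Compare one table between two connections and report basic issues.

    Table and column names are interpolated into the SQL, so they must come
    from trusted configuration, never from user input.
    """
    def keys(conn):
        return [row[0] for row in conn.execute(f"SELECT {key} FROM {table}")]

    def null_counts(conn):
        return {
            c: conn.execute(
                f"SELECT COUNT(*) FROM {table} WHERE {c} IS NULL"
            ).fetchone()[0]
            for c in columns
        }

    source_keys, target_keys = keys(source), keys(target)
    seen, duplicated = set(), set()
    for k in target_keys:
        # A key seen twice in the target means the row was written more than once.
        if k in seen:
            duplicated.add(k)
        seen.add(k)
    return {
        "row_count_source": len(source_keys),
        "row_count_target": len(target_keys),
        "missing_keys": sorted(set(source_keys) - seen),
        "duplicated_keys": sorted(duplicated),
        "nulls_source": null_counts(source),
        "nulls_target": null_counts(target),
    }


def make_db(rows):
    # Hypothetical table used only for the demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return conn


source = make_db([(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")])
# The target lost row 3, wrote row 2 twice, and dropped its email.
target = make_db([(1, "a@example.com"), (2, None), (2, None)])

report = assess_migration(source, target, "users", "id", ["email"])
print(report)
```

Both tables contain three rows here, so a row count comparison alone would report a successful migration. That's why the sketch also looks at the keys and the null values.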

Database migration on the cloud

Let's now see which cloud services we can use in the data migration scenario:

| | AWS Database Migration Service | Azure Data Factory | Azure Data Share | GCP Database Migration Service | GCP Datastream |
|---|---|---|---|---|---|
| Data type | Heterogeneous and homogeneous scenarios are supported. For example, the service can migrate data from an RDBMS to DynamoDB. | Heterogeneous and homogeneous scenarios are supported with the CopyActivity action. For example, you can copy relational data from SQL Server to columnar Parquet files stored on a Storage Account, as well as from an on-premise MySQL to an Azure SQL Database. | Homogeneous and heterogeneous types are supported. | The homogeneous scenario is supported. It's limited to RDBMS data. | The heterogeneous scenario is supported. The service writes RDBMS data to GCS. |
| Database type | Both heterogeneous and homogeneous scenarios are covered. For example, you can migrate PostgreSQL data to MySQL. | Heterogeneous and homogeneous scenarios are supported with the CopyActivity action. | Homogeneous and heterogeneous types are supported. | Homogeneous scenario. The service supports MySQL and PostgreSQL migrations. | Heterogeneous scenario. The service synchronizes RDBMS to GCS. |
| Methods | Change Data Capture and full loads. | Full and incremental loads with Change Data Capture or a delta column. | In the snapshot mode, the service supports full and incremental modes. | Continuous and one-time migrations are supported. | Continuous migration based on Change Data Capture. |
| Schema migration | AWS comes with the AWS Schema Conversion Tool to migrate the schemas. | Supported in the mapping tab of the CopyActivity. | No. | No. | The service maps the schema from the RDBMS sources to unified types. Consequently, it represents all the sources with the same types in the synchronized files. |
| Data transformation | The replication task can have a transformation rule. It can operate at the table or column level and, for example, rename any of these objects. It also supports some expressions, such as column concatenation or coalescing of nullable values. | The service supports datetime formatting, data truncation, and language-based type conversion in the CopyActivity. Additionally, Data Factory can use other types of activities to perform more customized data transformations. | No. | No. | No. |
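
To make the "delta column" method from the Methods row more concrete, here is a minimal sketch of an incremental load driven by a last-modified column. It is not how any of the listed services implement it. The `orders` table, the `updated_at` column, and the `incremental_load` helper are hypothetical, and in-memory SQLite databases stand in for the real source and target.

```python
import sqlite3


def incremental_load(source, target, last_watermark):
    """Copy rows changed since last_watermark and return the new watermark."""
    changed = source.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # INSERT OR REPLACE relies on the primary key to overwrite existing rows.
    target.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", changed)
    target.commit()
    # Rows are ordered by updated_at, so the last one carries the highest value.
    return changed[-1][2] if changed else last_watermark


def make_db():
    # Hypothetical schema used only for the demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    return conn


source, target = make_db(), make_db()
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01T10:00:00"), (2, 20.0, "2024-01-01T11:00:00")],
)

# First run: the empty watermark matches every row, so it acts as a full load.
watermark = incremental_load(source, target, "")

# The source changes: order 1 is updated and order 3 is created.
source.execute(
    "UPDATE orders SET amount = 15.0, updated_at = '2024-01-02T09:00:00' WHERE id = 1"
)
source.execute("INSERT INTO orders VALUES (3, 30.0, '2024-01-02T09:30:00')")

# Second run: only the rows newer than the stored watermark are read.
watermark = incremental_load(source, target, watermark)
rows = target.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
print(rows, watermark)
```

A delta column only sees the rows that still exist in the source, so deleted rows go unnoticed. Log-based Change Data Capture is the usual answer when deletes have to be propagated too.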

Data migration is most often discussed in the context of moving to the cloud. However, we can migrate data in other scenarios as well. One such situation is data access specialization, where we move the data to optimize its querying. And it happens that you can perform this task on all of the major cloud providers!