Database migrations can pose huge challenges when you consider how to start, which tools to use, and how to complete a full database migration successfully. Earlier, we listed the top open source tools you can use for MySQL or MariaDB migrations. In this blog, we'll show you how to migrate data out of Microsoft Azure Database for MySQL or MariaDB.
Microsoft Azure is now an established contender alongside the two other cloud giants, AWS and Google Cloud. It specializes in Microsoft products, especially its home-grown proprietary MSSQL database. It also offers open source databases as fully managed services, among them MySQL and MariaDB.
Moving out of Azure Database for MySQL/MariaDB can be tedious, depending on the architecture and the dataset you host in Azure as your current cloud provider. With the right tools, a full migration is achievable.
We'll focus on the tools we can use for MySQL or MariaDB data migrations. For this blog, I'm using RHEL/CentOS to install the required packages. Let's go over the steps and procedures.
Migrating From Azure Database for MySQL or MariaDB
A typical approach to migrating your data from Azure Database to an on-prem server is to take a logical backup. This can be done with backup utilities that are compatible with Azure Database for MySQL or MariaDB, which is a fully managed service. Fully managed database services do not offer SSH logins, so a physical copy of the data files is not an option.
Before you migrate or dump your existing database from Azure, take note of the following considerations.
Common Use-cases For Dump and Restore On-Prem
Most common use-cases are:
- A logical backup (such as mysqldump, mysqlpump or mydumper/myloader) and restore is the only option. Azure Database for MySQL or MariaDB does not give access to the physical storage because it is a fully managed database service.
- Only the InnoDB and Memory storage engines are supported. Azure Database for MySQL or MariaDB does not support alternative storage engines, so tables configured with other engines have to be converted to InnoDB before they are migrated to Azure Database for MySQL.
- For example, if you have a WordPress site or web app using MyISAM tables, first convert those tables to InnoDB before restoring to Azure Database for MySQL. Use the ENGINE=InnoDB clause to set the engine when creating a new table, then transfer the data into the compatible table before the restore.
- If your source Azure Database runs a specific version, your target on-premise server should run the same version.
Given these limitations, you can expect the data you dump from Azure to use only the InnoDB storage engine, or Memory if your dataset has such tables.
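To confirm that expectation against an actual dump file, a small helper along these lines can be used. This is only a sketch: the function name list_non_innodb_engines is hypothetical, and it assumes a dump written by mysqldump, where each CREATE TABLE statement ends with a line starting with ") ENGINE=".

```shell
# list_non_innodb_engines DUMP_FILE
# Prints "<engine> <table count>" for every storage engine other than
# InnoDB or MEMORY found in the CREATE TABLE statements of a dump.
# Prints nothing when all tables use InnoDB or MEMORY.
list_non_innodb_engines() {
    local dump_file="$1"
    # mysqldump closes each table definition with ") ENGINE=<name> ..."
    grep -oE '^\) ENGINE=[A-Za-z_]+' "$dump_file" \
        | sed -E 's/^\) ENGINE=//' \
        | grep -viE '^(InnoDB|MEMORY)$' \
        | sort | uniq -c | awk '{print $2, $1}'
}
```

Running list_non_innodb_engines backups/dump.sql should print nothing for a dump taken from Azure Database. Any line it does print points at tables worth inspecting before the restore.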
Performance Considerations For Taking Logical Backup from Azure Database
The usual way to take a logical backup from Azure is with mysqldump or mysqlpump (mydumper is another option, covered later in this post). To optimize performance with these tools, take note of these considerations when dumping large databases:
- Exclude triggers from dump files to avoid the trigger commands firing during the data restore. In mysqldump this is the skip-triggers option; in mysqlpump, triggers are excluded by name with the exclude-triggers option.
- Use the single-transaction option to set the transaction isolation mode to REPEATABLE READ and send a START TRANSACTION SQL statement to the server before dumping data. Dumping many tables within a single transaction causes some extra storage to be consumed during restore. The single-transaction option and the lock-tables option are mutually exclusive because LOCK TABLES causes any pending transactions to be committed implicitly. To dump large tables, combine the single-transaction option with the quick option.
- Use the extended-insert option, which writes multiple-row INSERT syntax that includes several VALUES lists. This results in a smaller dump file and speeds up inserts when the file is reloaded.
- Use the order-by-primary option in mysqldump when dumping databases, so that the data is scripted in primary key order.
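- Use the disable-keys option in mysqldump when dumping data. It wraps each table's INSERT statements in ALTER TABLE ... DISABLE KEYS and ENABLE KEYS, which speeds up loading of nonunique indexes on MyISAM tables and has no effect on InnoDB tables. mysqldump also turns off foreign key checks at the top of the dump file, which provides performance gains during the load. Verify the data after the load to ensure referential integrity.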
- Use partitioned tables when appropriate.
- Load data in parallel. Avoid too much parallelism that would cause you to hit a resource limit, and monitor resources using the metrics available in the Azure portal.
- Use the defer-table-indexes option in mysqlpump when dumping databases, so that index creation happens after the table's data is loaded.
- Use the skip-definer option in mysqlpump to omit definer and SQL SECURITY clauses from the create statements for views and stored procedures. When you reload the dump file, it creates objects that use the default DEFINER and SQL SECURITY values.
- Copy the backup files to an Azure blob/store and perform the restore from there, which should be a lot faster than performing the restore across the Internet.
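As a rough illustration of how the options in this list map to each tool, the sketch below prints a starting set of flags for mysqldump or mysqlpump. The function name dump_options is hypothetical and the selection is an assumption to adapt to your dataset, not a recommendation from Azure or MySQL. Note that mysqldump spells trigger exclusion as --skip-triggers.

```shell
# dump_options TOOL
# Prints, one per line, a starting set of options for TOOL, which must be
# "mysqldump" or "mysqlpump". Returns 1 for any other tool name.
dump_options() {
    case "$1" in
        mysqldump)
            # Consistent snapshot, row streaming, multi-row INSERTs,
            # primary key ordering, and no triggers in the dump.
            printf '%s\n' --single-transaction --quick --extended-insert \
                --order-by-primary --skip-triggers
            ;;
        mysqlpump)
            # Consistent snapshot, indexes built after the data load,
            # and no DEFINER / SQL SECURITY clauses.
            printf '%s\n' --single-transaction --defer-table-indexes \
                --skip-definer
            ;;
        *)
            echo "unknown tool: $1" >&2
            return 1
            ;;
    esac
}
```

One way to use it is mysqldump $(dump_options mysqldump) followed by your connection settings and database names.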
Unsupported
The following are unsupported:
- DBA role: Restricted. Alternatively, you can use the administrator user (created during new server creation), which allows you to perform most DDL and DML statements.
- SUPER privilege: Similarly, SUPER privilege is restricted.
- DEFINER: Requires super privileges to create and is restricted. If importing data using a backup, remove the CREATE DEFINER commands manually or by using the --skip-definer option of mysqlpump when taking the dump.
- System databases: The mysql system database is read-only and used to support various PaaS functionality. You cannot make changes to the mysql system database.
- SELECT ... INTO OUTFILE: Not supported in the service.
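For the DEFINER restriction, a common way to remove the clauses from an existing dump is a sed filter. The sketch below is a minimal version under stated assumptions: the function name strip_definer is hypothetical, and it assumes the backtick-quoted DEFINER=`user`@`host` form that mysqldump writes.

```shell
# strip_definer
# Reads a SQL dump on stdin and writes it to stdout with every
# DEFINER=`user`@`host` clause removed, so that views, triggers and
# routines are created with the default definer on the target server.
strip_definer() {
    sed -E 's/DEFINER=`[^`]+`@`[^`]+`//g'
}
```

A typical call is strip_definer < backups/dump.sql > backups/dump.nodefiner.sql. The filter is purely textual, so a data row that happens to contain the same pattern would be altered as well; review the result before loading it.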
Using mysqldump
mysqldump has to be installed on your target database node located on-prem. That node will be prepared as a replica of the Azure Database node so that all subsequent transactions are replicated to it. To do this, follow the steps below.
Install mysqldump
Prepare the repository.
# For MySQL
$ yum install https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
# For MariaDB
$ curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash
Install mysql-client package
# For MySQL
$ yum install -y mysql-community-client.x86_64
# For MariaDB
$ yum install -y MariaDB-client
Create a data dump using mysqldump by executing it inside the target node.
$ MYSQL_PWD=<YOUR_MYSQL_PASS> mysqldump -h<YOUR_AZURE_DB_HOSTNAME> -u<YOUR_AZURE_USERNAME> --single-transaction --master-data=2 --extended-insert --order-by-primary --disable-keys --databases maximusdb db2 db3 > backups/dump.sql
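A long dump over the network can be cut short without an obvious error. By default, mysqldump ends its output with a "-- Dump completed" comment, so a quick check like the sketch below can catch a truncated file. The function name dump_is_complete is hypothetical, and the check assumes comments were not disabled with --skip-comments.

```shell
# dump_is_complete DUMP_FILE
# Succeeds (exit status 0) when one of the last lines of the dump is the
# "-- Dump completed" marker that mysqldump writes when it finishes.
dump_is_complete() {
    tail -n 5 "$1" | grep -q '^-- Dump completed'
}
```

For example, dump_is_complete backups/dump.sql || echo "dump looks truncated" can be run before the restore step.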
Install the MySQL/MariaDB Server in the target database node
# For MySQL
$ yum install mysql-community-server.x86_64 mysql-community-client mysql-community-common
# For MariaDB
$ yum install MariaDB-server.x86_64
Setup the MySQL/MariaDB Server instance (my.cnf, file permissions, directories), and start the server
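# Setting up the my.cnf (based on the my.cnf deployment used by ClusterControl)
# Note: log_warnings, query_cache_type and query_cache_size do not exist in MySQL 8.0; remove them if you installed MySQL 8.0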
[MYSQLD]
user=mysql
basedir=/usr/
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid_file=/var/lib/mysql/mysql.pid
port=3306
log_error=/var/log/mysql/mysqld.log
log_warnings=2
slow_query_log_file=/var/log/mysql/mysql-slow.log
long_query_time=2
slow_query_log=OFF
log_queries_not_using_indexes=OFF
innodb_buffer_pool_size=2G
innodb_flush_log_at_trx_commit=2
innodb_file_per_table=1
innodb_data_file_path=ibdata1:100M:autoextend
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb_doublewrite=1
innodb_log_file_size=256M
innodb_log_buffer_size=32M
innodb_buffer_pool_instances=1
innodb_log_files_in_group=2
innodb_thread_concurrency=0
innodb_flush_method=O_DIRECT
innodb_rollback_on_timeout=ON
innodb_autoinc_lock_mode=2
innodb_stats_on_metadata=0
default_storage_engine=innodb
server_id=1126
binlog_format=ROW
log_bin=binlog
log_slave_updates=1
relay_log=relay-bin
expire_logs_days=7
read_only=OFF
report_host=192.168.10.226
key_buffer_size=24M
tmp_table_size=64M
max_heap_table_size=64M
max_allowed_packet=512M
skip_name_resolve=true
memlock=0
sysdate_is_now=1
max_connections=500
thread_cache_size=512
query_cache_type=0
query_cache_size=0
table_open_cache=1024
lower_case_table_names=0
performance_schema=OFF
performance-schema-max-mutex-classes=0
performance-schema-max-mutex-instances=0
[MYSQL]
socket=/var/lib/mysql/mysql.sock
[client]
socket=/var/lib/mysql/mysql.sock
[mysqldump]
socket=/var/lib/mysql/mysql.sock
max_allowed_packet=512M
## Reset the data directory and re-install the database system files
$ rm -rf /var/lib/mysql/*
## Create the log directories
$ mkdir -p /var/log/mysql
$ chown -R mysql:mysql /var/log/mysql
## For MySQL
$ mysqld --initialize
## For MariaDB
$ mysql_install_db
Start the MySQL/MariaDB Server
## For MySQL
$ systemctl start mysqld
## For MariaDB
$ systemctl start mariadb
Load the data dump we have taken from Azure Database to the target database node on-prem
$ mysql --show-warnings < backups/dump.sql
Create the replication user on your Azure Database source node
CREATE USER 'repl_user'@'<your-target-node-ip>' IDENTIFIED BY 'repl_passw0rd';
GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'repl_user'@'<your-target-node-ip>';
Make sure you replace <your-target-node-ip> with the IP address of your target node, since that is the client the replication user connects from.
Set up the MySQL/MariaDB Server as a replica/slave of the Azure Database source node
## First, let's search or locate the CHANGE MASTER command
$ grep -rn -E -i 'change master to master' backups/dump.sql |head -1
22:-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=2938610;
## Run the CHANGE MASTER statement but add the replication user/password and the hostname as follows,
CHANGE MASTER TO MASTER_HOST='<YOUR_AZURE_DB_HOSTNAME>', MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=2938610, MASTER_USER='repl_user', MASTER_PASSWORD='repl_passw0rd';
## In some cases, you might have to ignore the mysql schema. Run the following statement:
SET GLOBAL replicate_wild_ignore_table='mysql.%';
## Then start the slave threads
START SLAVE;
## Check how the slave is doing
SHOW SLAVE STATUS \G
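The grep and the hand-edited CHANGE MASTER statement in the steps above can be combined into one helper. The sketch below is an illustration only: the function name change_master_from_dump is hypothetical, and it assumes the dump was taken with --master-data=2, which writes the coordinates as a commented "-- CHANGE MASTER TO" line.

```shell
# change_master_from_dump DUMP_FILE HOST USER PASSWORD
# Reads the binary log coordinates that --master-data=2 recorded in the
# dump and prints a complete CHANGE MASTER statement for the replica.
# Returns 1 when the dump contains no coordinates.
change_master_from_dump() {
    local dump_file="$1" host="$2" user="$3" password="$4"
    local line log_file log_pos
    line=$(grep -m1 -E '^-- CHANGE MASTER TO MASTER_LOG_FILE=' "$dump_file") || return 1
    # Pull the file name and position out of the commented statement
    log_file=$(printf '%s\n' "$line" | sed -E "s/.*MASTER_LOG_FILE='([^']+)'.*/\1/")
    log_pos=$(printf '%s\n' "$line" | sed -E 's/.*MASTER_LOG_POS=([0-9]+).*/\1/')
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s, MASTER_USER='%s', MASTER_PASSWORD='%s';\n" \
        "$host" "$log_file" "$log_pos" "$user" "$password"
}
```

The printed statement can be reviewed and then pasted into a mysql session on the on-prem node.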
We are now replicating from Azure Database for MySQL/MariaDB, which acts as the source for your replica located on-prem.
Using mydumper
The Azure Database for MySQL or MariaDB documentation in fact suggests mydumper as an alternative, especially for large backups such as 1TB. It offers parallelism and speed when taking a dump or backup copy of your dataset from a source Azure Database node.
Follow the steps below, from installing mydumper to loading the dump into your destination on-prem server.
Install the binary. The binaries can be located here https://github.com/maxbube/mydumper/releases.
$ yum install https://github.com/maxbube/mydumper/releases/download/v0.9.5/mydumper-0.9.5-2.el6.x86_64.rpm
Take the backup from the Azure Database source node. For example,
[root@testnode26 mydumper]# MYSQL_PWD=<YOUR_AZURE_DB_PASSWORD> /usr/bin/mydumper --outputdir=. --verbose=3 --host=<YOUR_AZURE_DB_HOSTNAME> -u <YOUR_AZURE_USER>@<YOUR_AZURE_DB_HOSTNAME> --port=3306 --kill-long-queries --chunk-filesize=5120 --build-empty-files --events --routines --triggers --compress --less-locking --success-on-1146 --regex='(maximusdb\.|db1\.|db2\.)'
** Message: Connected to a MySQL server
** Message: Using Percona Backup Locks
** (mydumper:28829): CRITICAL **: Couldn't acquire LOCK BINLOG FOR BACKUP, snapshots will not be consistent: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'BINLOG FOR BACKUP' at line 1
** Message: Started dump at: 2020-10-26 01:34:05
** Message: Written master status
** Message: Multisource slave detected.
** Message: Thread 5 connected using MySQL connection ID 64315
** Message: Thread 6 connected using MySQL connection ID 64345
** Message: Thread 7 connected using MySQL connection ID 64275
** Message: Thread 8 connected using MySQL connection ID 64283
** Message: Thread 1 connected using MySQL connection ID 64253
** Message: Thread 2 connected using MySQL connection ID 64211
** Message: Thread 3 connected using MySQL connection ID 64200
** Message: Thread 4 connected using MySQL connection ID 64211
** (mydumper:28829): CRITICAL **: Error: DB: mysql - Could not execute query: Access denied for user 'mysqldbadmin'@'%' to database 'mysql'
** Message: Thread 5 shutting down
** Message: Thread 6 shutting down
** Message: Thread 7 shutting down
** Message: Thread 8 shutting down
** Message: Thread 1 dumping data for `db1`.`TB1`
** Message: Thread 2 dumping data for `db1`.`tb2
….
As you can see, taking a backup from a managed database such as Azure has limitations. You might notice,
** (mydumper:28829): CRITICAL **: Couldn't acquire LOCK BINLOG FOR BACKUP, snapshots will not be consistent: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'BINLOG FOR BACKUP' at line 1
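This message appears because mydumper attempted Percona Server's LOCK BINLOG FOR BACKUP statement, which this server does not support, so the snapshot is not guaranteed to be consistent. The later "Access denied" error on the mysql database comes from the restricted privileges of the managed service, where the SUPER privilege is not available. Ideally, the best option is to take the backup from a replica of your Azure Database. We'll talk about this later.
Now, at this point, mydumper will write the backup files in the form of *.gz files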
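Before loading those files, it can be worth checking that none of them was truncated during the dump. The sketch below runs gzip's built-in integrity test over a directory; the function name verify_gz_files is hypothetical.

```shell
# verify_gz_files DIR
# Runs "gzip -t" on every *.gz file in DIR and prints "corrupt: <file>"
# for each file that fails. Returns 0 only when all files pass, and 1
# when a file is corrupt or DIR contains no .gz files.
verify_gz_files() {
    local dir="$1" f status=0
    for f in "$dir"/*.gz; do
        # An unmatched glob stays literal, which means there are no files
        [ -e "$f" ] || { echo "no .gz files in $dir" >&2; return 1; }
        gzip -t "$f" 2>/dev/null || { echo "corrupt: $f"; status=1; }
    done
    return "$status"
}
```

Running verify_gz_files . from inside the mydumper output directory lists any file that should be dumped again.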
Load it to your destination on-premise server
$ myloader --host localhost --directory=$(pwd) --queries-per-transaction=10000 --threads=8 --compress-protocol --verbose=3
** Message: 8 threads created
** Message: Creating database `maximusdb`
** Message: Creating table `maximusdb`.`usertbl`
** Message: Creating table `maximusdb`.`familytbl`
** Message: Creating table `db2`.`t1`
** Message: Creating table `db3`.`test1`
…
….
Set up the destination node as a slave/replica. mydumper writes a file called metadata, which contains the binary log coordinates including GTID positions, for example:
$ cat metadata
Started dump at: 2020-10-26 01:35:12
SHOW MASTER STATUS:
Log: mysql-bin.000007
Pos: 801
GTID:0-3649485694-1705
Finished dump at: 2020-10-26 01:37:12
## Then run a change master from the replica or your target destination MySQL/MariaDB database node
CHANGE MASTER TO MASTER_HOST='<YOUR_AZURE_DB_HOSTNAME>', MASTER_LOG_FILE='mysql-bin.000007', MASTER_LOG_POS=801, MASTER_USER='repl_user', MASTER_PASSWORD='repl_passw0rd';
## Start the slave
START SLAVE;
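The coordinates for the CHANGE MASTER statement can also be read from the metadata file with a script instead of by eye. This is a sketch assuming the metadata layout shown above, with a "SHOW MASTER STATUS:" section holding "Log:" and "Pos:" lines; the function name metadata_coordinates is hypothetical.

```shell
# metadata_coordinates METADATA_FILE
# Prints "<binary log file> <position>" taken from the SHOW MASTER STATUS
# section of a mydumper metadata file. Returns 1 when either value is
# missing.
metadata_coordinates() {
    awk '
        /^SHOW MASTER STATUS:/ { in_master = 1; next }
        /^SHOW SLAVE STATUS:/  { in_master = 0 }
        in_master && $1 == "Log:" { log_file = $2 }
        in_master && $1 == "Pos:" { pos = $2 }
        END {
            if (log_file == "" || pos == "") exit 1
            print log_file, pos
        }
    ' "$1"
}
```

The two values it prints are the ones to use for MASTER_LOG_FILE and MASTER_LOG_POS.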
At this point, you are replicating from an Azure Database instance running MySQL/MariaDB. Once your application is ready to move away from your Azure Database instance, point its endpoint to your on-prem server. All remaining transactions from your Azure instance will have been replicated on-prem, so no data is missed.
Handling Limitations With Managed Databases For MySQL or MariaDB in Azure
A backup dump has to be 100% consistent as of the point in time it was taken, and the limitations of a managed service make that harder to guarantee. Consistency is, of course, what an ideal migration to on-prem requires. To deal with this, the best setup is to have a replication topology in place in your Azure Database.
Once that replica is in place and you are ready to migrate, use the Azure Database replica as the source for mysqldump/mysqlpump or mydumper. On that replica, make sure the SQL_THREAD is stopped so that you can record the correct Relay_Master_Log_File and Exec_Master_Log_Pos from the output of SHOW SLAVE STATUS. These two values identify the position on the primary up to which the replica has applied changes.
Of course, once the backup is done, do not forget to start the replication threads on your Azure Database replica again.
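To record the coordinates while the SQL thread is stopped, the vertical output of SHOW SLAVE STATUS \G can be saved to a file and parsed. The sketch below assumes that output format; the function name replica_exec_coordinates is hypothetical. It reads Relay_Master_Log_File together with Exec_Master_Log_Pos, since that pair describes the position on the primary that the SQL thread has executed up to.

```shell
# replica_exec_coordinates STATUS_FILE
# STATUS_FILE holds the saved output of "SHOW SLAVE STATUS \G".
# Prints "<Relay_Master_Log_File> <Exec_Master_Log_Pos>", or returns 1
# when either field is missing.
replica_exec_coordinates() {
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }     # drop the left padding of the field name
        $1 == "Relay_Master_Log_File" { file = $2 }
        $1 == "Exec_Master_Log_Pos"   { pos = $2 }
        END {
            if (file == "" || pos == "") exit 1
            print file, pos
        }
    ' "$1"
}
```

These values then go into MASTER_LOG_FILE and MASTER_LOG_POS when the on-prem node is pointed at the Azure Database source.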
Check For Data Discrepancies
Once your data is loaded into the on-prem server acting as a replica of the Azure Database instance, double check it by running checksum calculations to determine whether your data has drifted from the source Azure Database. We suggest the pt-table-checksum tool from Percona Toolkit. You can also build your own check with checksum tools such as md5 or sha256, but this takes time. Additionally, pt-upgrade from Percona Toolkit can help once the data migration using this replication approach is done.
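If you go the do-it-yourself route, one simple approach is to collect a checksum for each table on both servers and compare the two lists. The sketch below assumes each input file has one "<db.table> <checksum>" pair per line, for example as produced by running a CHECKSUM TABLE statement through mysql -N -B; the function name diff_checksums is hypothetical. Writes that arrive while the checksums are being collected can cause false mismatches, which is the kind of problem pt-table-checksum is designed to handle.

```shell
# diff_checksums SOURCE_FILE TARGET_FILE
# Each file holds "<db.table> <checksum>" lines. Prints, sorted, every
# table whose checksum differs between the files or that appears in
# only one of them. Prints nothing when the files agree.
diff_checksums() {
    awk '
        NR == FNR { src[$1] = $2; next }   # first file: source checksums
        {
            seen[$1] = 1
            if (!($1 in src) || src[$1] != $2) print $1
        }
        END { for (t in src) if (!(t in seen)) print t }
    ' "$1" "$2" | sort
}
```

An empty result from diff_checksums azure.txt onprem.txt suggests the tables match at the time the checksums were taken.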
Conclusion
The restricted privileges and unsupported features of Azure Database can be challenging, but with the appropriate flow and architecture, it's not impossible to migrate from a fully managed database to on-prem. All you need to do is prepare the required steps, set up the required topology from your Azure Database source, then carry out the migration: take the backup, set up replication, and complete the move to your on-prem server.