Amazon Web Services is a technology giant, especially when it comes to pioneering itself in top-of-the-line cloud computing services. Its fully managed services products (Amazon RDS) is one of a kind. But then again, while it can be a perfect platform for some organizations, it can be a challenge to move out from it if it’s not. There are always concerns of being stuck in a vendor lock-in situation.
Some things to keep in mind when migrating from RDS to an on-premise platform are budget constraints, security, and data autonomy. This is because data is your most valuable asset and retaining control wherever it resides, it is always imperative for the organization and company to always remain competitive. No organization can afford to have cloud lock-in, and yet, many enterprises find themselves exactly in that situation and start hopping searching for any alternative existing solutions that can be operable via on-prem.
This blog will walk you through how to migrate from Amazon RDS going to an on-prem server. Our target database on the on-prem server is on a RHEL/CentOS Linux server, but the applicable procedure shall apply on other versions of Linux as well as long as packages are properly installed.
There are some third-party existing solutions that offer data migration but it's not applicable for an on-premise platform. Additionally, it is not free and migrating using free with open source solutions is always favorable and advantageous. Though doubts and concerns also exist since the warranty and support is not bound with open-source technologies but we will show you here how to achieve this in a straightforward procedure.
Since Amazon RDS supports compatibility with MySQL and MariaDB. We will focus on them for this blog.
Migrating from Amazon RDS for MySQL or MariaDB
A typical approach of migrating your data from Amazon RDS to an on-prem server is to take a backup using a logical copy. This can be done using backup utility solutions that are compatible to operate with Amazon RDS which is a fully-managed service. Fully-managed database services do not offer SSH logins so physical copy of backups is not an option.
Using mysqldump
Using mysqldump has to be installed in your target database node located on-prem. It has to be prepared as a replica of the AWS RDS node so all subsequent transactions shall be replicated to the node. To do this, follow the steps below.
AWS RDS source host: database-1.xxxxxxx.us-east-2.rds.amazonaws.com
On-Prem Server Host: 192.168.10.226 (testnode26)
Before starting the dump, make sure that the binlog retention hours is set. To set it, you can do like the example procedure call below in your Amazon RDS instance,
mysql> call mysql.rds_set_configuration('binlog retention hours', 24);
Query OK, 2 rows affected (0.23 sec)
mysql> CALL mysql.rds_show_configuration;
+------------------------+-------+------------------------------------------------------------------------------------------------------+
| name | value | description |
+------------------------+-------+------------------------------------------------------------------------------------------------------+
| binlog retention hours | 24 | binlog retention hours specifies the duration in hours before binary logs are automatically deleted. |
+------------------------+-------+------------------------------------------------------------------------------------------------------+
1 row in set (0.23 sec)
Query OK, 0 rows affected (0.23 sec)
Install mysqldump
Prepare the repository.
# For MySQL
$ yum install https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
# For MariaDB
$ curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash
Install mysql-client package
# For MySQL
$ yum install -y mysql-community-client.x86_64
# For MariaDB
$ yum install -y MariaDB-client
Create a data dump using mysqldump by executing it inside the target node. Take note, with --master-data=2 specified as an option, this works only for MariaDB but not in MySQL. So extra-work for MySQL has to be done. We'll talk on this later.
## Applicable for MariaDB approach
[root@testnode26 ~]# mysqldump -h database-1.xxxxxxx.us-east-2.rds.amazonaws.com -uadmin -p --single-transaction --master-data=2 --databases db1 db2 db3 > backups/dump.sql
Enter password:
[root@testnode26 ~]# ls -alth backups/dump.sql
-rw-r--r--. 1 root root 196M Oct 18 02:34 backups/dump.sql
Install the MySQL/MariaDB Server in the target database node
# For MySQL (always check what version repository is enabled in your yum repository. At this point, I'm using MySQL 5.7)
$ yum --disablerepo=* --enablerepo=mysql57-community install mysql-community-common mysql-community-client mysql-community-server
# For MariaDB
$ yum install MariaDB-server.x86_64
Setup the MySQL/MariaDB Server instance (my.cnf, file permissions, directories), and start the server
# Setting up the my.cnf (using the my.cnf deployment use by ClusterControl)
[MYSQLD]
user=mysql
basedir=/usr/
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid_file=/var/lib/mysql/mysql.pid
port=3306
log_error=/var/log/mysql/mysqld.log
log_warnings=2
slow_query_log_file=/var/log/mysql/mysql-slow.log
long_query_time=2
slow_query_log=OFF
log_queries_not_using_indexes=OFF
innodb_buffer_pool_size=2G
innodb_flush_log_at_trx_commit=2
innodb_file_per_table=1
innodb_data_file_path=ibdata1:100M:autoextend
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb_doublewrite=1
innodb_log_file_size=256M
innodb_log_buffer_size=32M
innodb_buffer_pool_instances=1
innodb_log_files_in_group=2
innodb_thread_concurrency=0
innodb_flush_method=O_DIRECT
innodb_rollback_on_timeout=ON
innodb_autoinc_lock_mode=2
innodb_stats_on_metadata=0
default_storage_engine=innodb
server_id=1126
binlog_format=ROW
log_bin=binlog
log_slave_updates=1
relay_log=relay-bin
expire_logs_days=7
read_only=OFF
report_host=192.168.10.226
key_buffer_size=24M
tmp_table_size=64M
max_heap_table_size=64M
max_allowed_packet=512M
skip_name_resolve=true
memlock=0
sysdate_is_now=1
max_connections=500
thread_cache_size=512
query_cache_type=0
query_cache_size=0
table_open_cache=1024
lower_case_table_names=0
performance_schema=OFF
performance-schema-max-mutex-classes=0
performance-schema-max-mutex-instances=0
[MYSQL]
socket=/var/lib/mysql/mysql.sock
[client]
socket=/var/lib/mysql/mysql.sock
[mysqldump]
socket=/var/lib/mysql/mysql.sock
max_allowed_packet=512M
## Reset the data directory and re-install the database system files
$ rm -rf /var/lib/mysql/*
## Create the log directories
$ mkdir /var/log/mysql
$ chown -R mysql.mysql /var/log/mysql
## For MySQL
$ mysqld --initialize
## For MariaDB
$ mysql_install_db
Start the MySQL/MariaDB Server
## For MySQL
$ systemctl start mysqld
## For MariaDB
$ systemctl start mariadb
Load the data dump we have taken from AWS RDS to the target database node on-prem
$ mysql --show-warnings < backups/dump.sql
Create the replication user from the AWS RDS source node
MariaDB [(none)]> CREATE USER 'repl_user'@'149.145.213.%' IDENTIFIED BY 'repl_passw0rd';
Query OK, 0 rows affected (0.242 sec)
MariaDB [(none)]> GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO repl_user'@'149.145.213.%' IDENTIFIED BY 'repl_passw0rd' ;
Query OK, 0 rows affected (0.229 sec)
Set up the MySQL/MariaDB Server as a replica/slave of the AWS RDS source node
## First, let's search or locate the CHANGE MASTER command
[root@testnode26 ~]# grep -rn -E -i 'change master to master' backups/dump.sql |head -1
22:-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin-changelog.000584', MASTER_LOG_POS=421;
## Run the CHANGE MASTER statement but add the replication user/password and the hostname as follows,
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='database-1.xxxxxxx.us-east-2.rds.amazonaws.com', MASTER_LOG_FILE='mysql-bin-changelog.000584', MASTER_LOG_POS=421, MASTER_USER='repl_user', MASTER_PASSWORD='repl_passw0rd';
Query OK, 0 rows affected (0.004 sec)
## Then start the slave threads
MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.001 sec)
## Check the slave status how it goes
MariaDB [(none)]> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: database-1.xxxxxxx.us-east-2.rds.amazonaws.com
Master_User: repl_user
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin-changelog.000584
Read_Master_Log_Pos: 421
Relay_Log_File: relay-bin.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: mysql-bin-changelog.000584
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 421
Relay_Log_Space: 256
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1675507089
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
Slave_DDL_Groups: 0
Slave_Non_Transactional_Groups: 0
Slave_Transactional_Groups: 0
1 row in set (0.000 sec)
Now that we have finally been able to replicate from RDS as the source or master of our replica located on-prem. It's not done yet. There are some cases you'll encounter replication errors such as,
Last_SQL_Errno: 1146
Last_SQL_Error: Error 'Table 'mysql.rds_heartbeat2' doesn't exist' on query. Default database: 'mysql'. Query: 'INSERT INTO mysql.rds_heartbeat2(id, value) values (1,1602988485784) ON DUPLICATE KEY UPDATE value = 1602988485784'
Since on-prem does not need to replicate data coming from mysql database for tables prefixed with 'rds%', then we just ignore these tables during replication. Additionally, you might not want AWS RDS to update and change your mysql.user table. To do this, you can optionally ignore the schema or just list of tables such as,
STOP SLAVE;
Then,
SET GLOBAL replicate_wild_ignore_table='mysql.rds%';
or
SET GLOBAL replicate_wild_ignore_table='mysql.%';
The MySQL Problem With --master-data=2
Taking the mysqldump with --master-data=2 requires sufficient privileges which requires SUPER and RELOAD privileges. The problem is, AWS RDS does not supply this for the admin user during database setup and creation. In order to workaround this issue, it must be that your AWS RDS has a master and a replica or slave setup. Once you have a slave setup, take that as the target source host when taking mysqldump. Then stop the slave threads from your AWS RDS replica as follows,
rds-replica-mysql> CALL mysql.rds_stop_replication;
Then take the mysqldump without the --master-data option just like below,
mysqldump -h database-1.xxxxxxx.us-east-2.rds.amazonaws.com -uadmin -p --single-transaction --databases db1 db2 db3 > backups/dump.sql
Then run the SHOW SLAVE STATUS\G from your AWS RDS replica and take note of the Master_Log_File and Exec_Master_Log_Pos for which you will use when connecting to the AWS RDS master replicating to your on-prem server. Use those coordinates when running CHANGE MASTER TO… MASTER_LOG_FILE=Master_Log_File, MASTER_LOG_POS=<Exec_Master_Log_Pos>. Of course, once the backup has been done, do not forget to start your RDS replica to start its replication threads again,
rds-replica-mysql> CALL mysql.rds_start_replication;
Using mydumper
mydumper can be your alternative option here especially when the dataset is very large as it offers parallelism and speed when taking a dump or backup copy of your dataset from a source RDS node. Follow the steps below from installing the mydumper to loading it to your destination on-prem server.
Install the binary. The binaries can be located here https://github.com/maxbube/mydumper/releases.
$ yum install https://github.com/maxbube/mydumper/releases/download/v0.9.5/mydumper-0.9.5-2.el6.x86_64.rpm
Take the backup from the RDS source node. For example,
[root@testnode26 mydumper-2]# /usr/bin/mydumper --outputdir=. --verbose=3 --host=database-1.xxxxxxx.us-east-2.rds.amazonaws.com --port=3306 --kill-long-queries --chunk-filesize=5120 --build-empty-files --events --routines --triggers --compress --less-locking --success-on-1146 --regex='(db1\.|db2\.|db3\.|mydb4\.|testdb5\.)' -u admin --password=admin123
** Message: Connected to a MySQL server
** (mydumper:18904): CRITICAL **: Couldn't acquire global lock, snapshots will not be consistent: Access denied for user 'admin'@'%' (using password: YES)
** Message: Started dump at: 2020-10-18 09:34:08
** Message: Written master status
** Message: Multisource slave detected.
** Message: Thread 5 connected using MySQL connection ID 1109
Now, at this point, mydumper will take a backup files in the form of *.gz files
Load it to your destination on-premise server
$ myloader --host localhost --directory=$(pwd) --queries-per-transaction=10000 --threads=8 --compress-protocol --verbose=3
** Message: 8 threads created
** Message: Creating database `db1`
** Message: Creating table `db1`.`folders_rel`
** Message: Creating table `db2`.`p`
** Message: Creating table `db2`.`t1`
** Message: Creating table `db3`.`AddressCodeTest`
Setup the destination node as a slave/replica. mydumper will include a file called metadata which consists of binary log coordinates including GTID positions, for example:
$ cat metadata
Started dump at: 2020-10-18 10:23:35
SHOW MASTER STATUS:
Log: mysql-bin-changelog.000680
Pos: 676
GTID:0-1675507089-3044
## Then run a change master from the replica or your target destination MySQL/MariaDB database node
MariaDB [jbmrcd_date]> CHANGE MASTER TO MASTER_HOST='database-1.cmu8qdlvkepg.us-east-2.rds.amazonaws.com', MASTER_USER='repl_user', MASTER_PASSWORD='repl_passw0rd', MASTER_LOG_FILE='mysql-bin-changelog.000680', MASTER_LOG_POS
=676;
Query OK, 0 rows affected (0.002 sec)
## Start the slave
MariaDB [jbmrcd_date]> start slave;
Query OK, 0 rows affected (0.001 sec)
At this point, you have now replicated from an Amazon RDS instance running MySQL/MariaDB. Once your application is ready to move away from your Amazon RDS instance, setup the endpoint going to your on-prem server and all remaining transactions from your RDS instance will be replicated to your on-prem leaving no data being missed going to your on-prem server.
Check For Data Discrepancies
Once you have your data loaded or dumped to your on-prem server acting as a replica from the AWS RDS instance, you should double check this by running checksum calculations to determine how far your data is against the source Amazon RDS. I suggest you use pt-table-checksum tool by Percona, but you can create your own though by using checksumming tools such as md5 or sha256 but this takes time to do. Additionally, using pt-upgrade can help as well after your data migration using this replication approach is done.
Conclusion
Using mysqldump or mydumper are free open-source tools which is a great advantage as well especially if your data is very confidential and you do not want a third-party to access it. Although it might be simple to take this approach, there can be tedious and large work that can be involved as testing and double checks always follow in order to prove that migration is fully achieved without any data inconsistencies.