How to perform online schema changes on MySQL using gh-ost

replication

gh-ost

online schema change

Having too large a (write) workload on a master is dangerous. If the master collapses and a failover happens to one of its slave nodes, the slave node could collapse under the write pressure as well. To mitigate this problem you can shard horizontally across more nodes.

Sharding increases the complexity of data storage though, and very often, it requires an overhaul of the application. In some cases, it may be impossible to make changes to an application. Luckily there is a simpler solution: functional sharding. With functional sharding you move a schema or table to another master, and thus alleviating the master from the workload of these schemas or tables.

In this Tips & Tricks post, we will explain how you can functionally shard your existing master, and offload some workload to another master using functional sharding. We will use ClusterControl, MySQL replication and ProxySQL to make this happen, and the total time taken should not be longer than 15 minutes in total. Mission impossible? :-)

The example database

In our example we have a serious issue with the workload on our simple order database, accessed by the so_user. The majority of the writes are happening on two tables: orders and order_status_log. Every change to an order will write to both the order table and the status log table.

CREATE TABLE `orders` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `customer_id` int(11) NOT NULL,
  `status` varchar(14) DEFAULT 'created',
  `total_vat` decimal(15,2) DEFAULT '0.00',
  `total` decimal(15,2) DEFAULT '0.00',
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `order_status_log` (
  `orderId` int(11) NOT NULL,
  `status` varchar(14) DEFAULT 'created',
  `changeTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `logline` text,
  PRIMARY KEY (`orderId`, `status`, `changeTime` )
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `customers` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `firstname` varchar(15) NOT NULL,
  `surname` varchar(80) NOT NULL,
  `address` varchar(255) NOT NULL,
  `postalcode` varchar(6) NOT NULL,
  `city` varchar(50) NOT NULL,
  `state` varchar(50) NOT NULL,
  `country` varchar(50) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

What we will do is to move the order_status_log table to another master.

As you might have noticed, there is no foreign key defined on the order_status_log table. This simply would not work across functional shards. Joining the order_status_log table with any other table would simply no longer work as it will be physically on a different server than the other tables. And if you write transactional data to multiple tables, the rollback will only work for one of these masters. If you wish to retain these things, you should consider to use homogenous sharding instead where you keep related data grouped together in the same shard.

Installing the Replication setups

First, we will install a replication setup in ClusterControl. The topology in our example is really basic: we deploy one master and one replica:

But you could import your own existing replication topology into ClusterControl as well.

After the setup has been deployed, deploy the second setup:

While waiting for the second setup to be deployed, we will add ProxySQL to the first replication setup:

Adding the second setup to ProxySQL

After ProxySQL has been deployed we can connect with it via command line, and see it’s current configured servers and settings:

MySQL [(none)]> select hostgroup_id, hostname, port, status, comment from mysql_servers;
+--------------+-------------+------+--------+-----------------------+
| hostgroup_id | hostname    | port | status | comment               |
+--------------+-------------+------+--------+-----------------------+
| 20           | 10.10.36.11 | 3306 | ONLINE | read server           |
| 20           | 10.10.36.12 | 3306 | ONLINE | read server           |
| 10           | 10.10.36.11 | 3306 | ONLINE | read and write server |
+--------------+-------------+------+--------+-----------------------+

MySQL [(none)]> select rule_id, active, username, schemaname, match_pattern, destination_hostgroup from mysql_query_rules;
+---------+--------+----------+------------+---------------------------------------------------------+-----------------------+
| rule_id | active | username | schemaname | match_pattern                                           | destination_hostgroup |
+---------+--------+----------+------------+---------------------------------------------------------+-----------------------+
| 100     | 1      | NULL     | NULL       | ^SELECT .* FOR UPDATE                                   | 10                    |
| 200     | 1      | NULL     | NULL       | ^SELECT .*                                              | 20                    |
| 300     | 1      | NULL     | NULL       | .*                                                      | 10                    |
+---------+--------+----------+------------+---------------------------------------------------------+-----------------------+

As you can see, ProxySQL has been configured with the ClusterControl default read/write splitter for our first cluster. Any basic select query will be routed to hostgroup 20 (read pool) while all other queries will be routed to hostgroup 10 (master). What is missing here is the information about the second cluster, so we will add the hosts of the second cluster first:

MySQL [(none)]> INSERT INTO mysql_servers VALUES (30, '10.10.36.13', 3306, 'ONLINE', 1, 0, 100, 10, 0, 0, 'Second repl setup read server'), (30, '10.10.36.14', 3306, 'ONLINE', 1, 0, 100, 10, 0, 0, 'Second repl setup read server');
Query OK, 2 rows affected (0.00 sec)

MySQL [(none)]> INSERT INTO mysql_servers VALUES (40, '10.10.36.13', 3306, 'ONLINE', 1, 0, 100, 10, 0, 0, 'Second repl setup read and write server');
Query OK, 1 row affected (0.00 sec)

After this we need to load the servers to ProxySQL runtime tables and store the configuration to disk:

MySQL [(none)]> LOAD MYSQL SERVERS TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

MySQL [(none)]> SAVE MYSQL SERVERS TO DISK;
Query OK, 0 rows affected (0.01 sec)

As ProxySQL is doing the authentication for the clients as well, we need to add the os_user user to ProxySQL to allow the application to connect through ProxySQL:

MySQL [(none)]> INSERT INTO mysql_users (username, password, active, default_hostgroup, default_schema) VALUES ('so_user', 'so_pass', 1, 10, 'simple_orders');
Query OK, 1 row affected (0.00 sec)

MySQL [(none)]> LOAD MYSQL USERS TO RUNTIME;
Query OK, 0 rows affected (0.00 sec)

MySQL [(none)]> SAVE MYSQL USERS TO DISK;
Query OK, 0 rows affected (0.00 sec)

Now we have added the second cluster and user to ProxySQL. Keep in mind that normally in ClusterControl the two clusters are considered two separate entities. ProxySQL will remain part of the first cluster. Even though it is now configured for the second cluster, it will only be displayed under the first cluster,.

Mirroring the data

Keep in mind that mirroring queries in ProxySQL is still a beta feature, and it doesn’t guarantee the mirrored queries will actually be executed. We have found it working fine within the boundaries of this use case. Also there are (better) alternatives to our example here, where you would make use of a restored backup on the new cluster and replicate from the master until you make the switch. We will describe this scenario in a follow up Tips & Tricks blog post.

Now that we have added the second cluster, we need to create the simple_orders database, the order_status_log table and the appropriate users on the master of the second cluster:

mysql> create database simple_orders;
Query OK, 1 row affected (0.01 sec)

mysql> use simple_orders;
Database changed

mysql> CREATE TABLE `order_status_log` (
  `orderId` int(11) NOT NULL,
  `status` varchar(14) DEFAULT 'created',
  `changeTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `logline` text,
  PRIMARY KEY (`orderId`, `status`, `changeTime` )
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> create user 'so_user'@'10.10.36.15' identified by 'so_pass';
Query OK, 0 rows affected (0.00 sec)

mysql> grant select, update, delete, insert on simple_orders.* to 'so_user'@'10.10.36.15';
Query OK, 0 rows affected (0.00 sec)

This enables us to start mirroring the queries executed against the first cluster onto the second cluster. This requires an additional query rule to be defined in ProxySQL:

MySQL [(none)]> INSERT INTO mysql_query_rules (rule_id, active, username, schemaname, match_pattern, destination_hostgroup, mirror_hostgroup, apply) VALUES (50, 1, 'so_user', 'simple_orders', '(^INSERT INTO|^REPLACE INTO|^UPDATE|INTO TABLE) order_status_log', 20, 40, 1);
Query OK, 1 row affected (0.00 sec)

MySQL [(none)]> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 1 row affected (0.00 sec)

In this rule ProxySQL will match everything that is writing to the orders_status_log table, and send it in addition to the hostgroup 40. (write server of the second cluster)

Now that we have started mirroring the queries, the backfill of the data from the first cluster can take place. You can use the timestamp from the first entry in the new orders_status_log table to determine the time we started to mirror.

Once the data has been backfilled we can reconfigure ProxySQL to perform all actions on the orders_status_log table on the second cluster. This will be a two step approach: add a new rule to move the read queries to the second cluster’s read servers and except the SELECT … FOR UPDATE queries. Then another one to modify our mirroring query to stop mirroring and only write to the second cluster.

MySQL [(none)]> INSERT INTO mysql_query_rules (rule_id, active, username, schemaname, match_pattern, destination_hostgroup, apply) VALUES (70, 1, 'so_user', 'simple_orders', '^SELECT .* FROM order_status_log', 30, 1), (60, 1, 'so_user', 'simple_orders', '^FROM order_status_log .* FOR UPDATE', 40, 1);
Query OK, 2 rows affected (0.00 sec)

MySQL [(none)]> UPDATE mysql_query_rules SET destination_hostgroup=40, mirror_hostgroup=NULL WHERE rule_id=50;
Query OK, 1 row affected (0.00 sec)

And don’t forget to activate and persist the new query rules:

MySQL [(none)]> LOAD MYSQL QUERY RULES TO RUNTIME;
Query OK, 1 row affected (0.00 sec)

MySQL [(none)]> SAVE MYSQL QUERY RULES TO DISK;
Query OK, 0 rows affected (0.05 sec)

After this final step we should see the workload drop on the first cluster, and increase on the second cluster. Mission possible and accomplished. Happy clustering!

Tags:

The MySQL database workload is determined by the number of queries that it processes. There are several situations in which MySQL slowness can originate. The first possibility is if there is any queries that are not using proper indexing. When a query cannot make use of an index, the MySQL server has to use more resources and time to process that query. By monitoring queries, you have the ability to pinpoint SQL code that is the root cause of a slowdown.

By default, MySQL provides several built-in tools to monitor queries, namely:

Slow Query Log - Captures query that exceeds a defined threshold, or query that does not use indexes.
General Query Log - Captures all queries happened in a MySQL server.
SHOW FULL PROCESSLIST statement (or through mysqladmin command) - Monitors live queries currently being processed by MySQL server.
PERFORMANCE_SCHEMA - Monitors MySQL Server execution at a low level.

There are also open-source tools out there that can achieve similar result like mtop and Percona’s pt-query-digest.

How ClusterControl monitors queries

ClusterControl does not only monitor your hosts and database instances, it also monitors your database queries. It gets the information in two different ways:

Queries are retrieved from PERFORMANCE_SCHEMA
If PERFORMANCE_SCHEMA is disabled or unavailable, ClusterControl will parse the content of the Slow Query Log

ClusterControl starts reading from the PERFORMANCE_SCHEMA tables immediately when the query monitor is enabled, and the following tables are used by ClusterControl to sample the queries:

performance_schema.events_statements_summary_by_digest
performance_schema.events_statements_current
performance_schema.threads

In older versions of MySQL (5.5), having PERFORMANCE_SCHEMA (P_S) enabled might not be an option since it can cause significant performance degradation. With MySQL 5.6 the overhead is reduced and even more so in 5.7. P_S offers great introspection of the server at an overhead of a few percents (1-3%). If the overhead is a concern then ClusterControl can parse the Slow Query log remotely to sample queries. Note that no agents are required on your database servers. It uses the following flow:

Start slow log (during MySQL runtime).
Run it for a short period of time (a second or couple of seconds).
Stop log.
Parse log.
Truncate log (ClusterControl creates new log file).
Go to 1.

As you can see, ClusterControl does the above trick when pulling and parsing the Slow Query log to overcome the problems with offsets. The drawback of this method is that the continuous sampling might miss some queries during steps 3 to 5. Hence, if continuous query sampling is vital for you and part of your monitoring policy, the best way is to use P_S. If enabled, ClusterControl will automatically use it.

The collected queries are hashed, calculated and digested (normalize, average, count, sort) and then stored in ClusterControl.

Enabling Query Monitoring

As mentioned earlier, ClusterControl monitors MySQL query via two ways:

Fetch the queries from PERFORMANCE_SCHEMA
Parse the content of MySQL Slow Query

Performance Schema (Recommended)

First of all, if you would like to use Performance Schema, turn it on all MySQL servers (MySQL/MariaDB v5.5.3 and later). Enabling this requires a MySQL restart. Add the following line to your MySQL configuration file:

performance_schema = ON

Then, restart the MySQL server. For ClusterControl users, you can use the configuration management feature at Manage -> Configurations -> Change Parameter and perform a rolling restart at Manage -> Upgrades -> Rolling Restart.

Once enabled, ensure at least events_statements_current is enabled:

mysql> SELECT * FROM performance_schema.setup_consumers WHERE NAME LIKE 'events_statements%';
+--------------------------------+---------+
| NAME                           | ENABLED |
+--------------------------------+---------+
| events_statements_current      | YES     |
| events_statements_history      | NO      |
| events_statements_history_long | NO      |
+--------------------------------+---------+

Otherwise, run the following statement to enable it:

UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' WHERE NAME = 'events_statements_current';

MySQL Slow Query

If Performance Schema is disabled, ClusterControl will then default to the Slow Query log. Hence, you don’t have to do anything since it can be turned on and off dynamically during runtime via SET statement.

The Query Monitoring function must be toggled to on under ClusterControl -> Query Monitor -> Top Queries. ClusterControl will monitor queries on all database nodes under this cluster:

Click on the “Settings” and configure “Long Query Time” and toggle “Log queries not using indexes” to On. If you have defined two parameters (long_query_time and log_queries_not_using_indexes) inside my.cnf and you would like to use those values instead, toggle “MySQL Local Query Override” to On. Otherwise, ClusterControl will obey the former.

Once enabled, you just need to wait a couple of minutes before you can see data under Top Queries and Query Histogram.

How ClusterControl visualizes the queries

Under the Query Monitor tab, you should see the following three items:

Top Queries
Running Queries
Query Histogram

We’ll have a quick look at these here, but remember that you can always find more details in the ClusterControl documentation.

Top Queries

Top Queries is an aggregated list of all your top queries running on all the nodes of your cluster. The list can be ordered by “Occurrence” or “Execution Time”, to show the most common or slowest queries respectively. You don’t have to login to each of the servers to see the top queries. The UI provides an option to filter based on MySQL server.

If you are using the Slow Query log, only queries that exceed the “Long Query Time” will be listed here. If the data is not populated correctly and you believe that there should be something in there, it could be:

ClusterControl did not collect enough queries to summarize and populate data. Try to lower the “Long Query Time”.
You have configured Slow Query Log configuration options in the my.cnf of MySQL server, and “Override Local Query” is turned off. If you really want to use the value you defined inside my.cnf, probably you have to lower the long_query_time value so ClusterControl can calculate a more accurate result.
You have another ClusterControl node pulling the Slow Query log as well (in case you have a standby ClusterControl server). Only allow one ClusterControl server to do this job.

The “Long Query Time” value can be specified to a resolution of microseconds, for example 0.000001 (1 x 10^-6). The following shows a screenshot of what’s under Top Queries:

Clicking on each query will show the query plan executed, similar to EXPLAIN command output:

Running Queries

Running Queries provides an aggregated view of current running queries across all nodes in the cluster, similar to SHOW FULL PROCESSLIST command in MySQL. You can stop a running query by selecting to kill the connection that started the query. The process list can be filtered out by host.

Use this feature to monitor live queries currently running on MySQL servers. By clicking on each row that contains “Info”, you can see the extended information containing the full query statement and the query plan:

Query Histogram

The Query Histogram is actually showing you queries that are outliers. An outlier is a query taking longer time than the normal query of that type. Use this feature to filter out the outliers for a certain time period. This feature is dependent on the Top Queries feature above. If Query Monitoring is enabled and Top Queries are captured and populated, the Query Histogram will summarize these and provide a filter based on timestamp.

That’s all folks! Monitoring queries is as important as monitoring your hosts or MySQL instances, to make sure your database is performing well.

Tags:

(this post was edited on 13/01/2017 after comments from Shlomi N.)

In previous posts, we gave an overview of gh-ost and showed you how to test your schema changes before executing them. One important feature of all schema change tools is their ability to throttle themselves. Online schema change requires copying data from old table to a new one and, no matter what you do in addition to that, it is an expensive process which may impact database performance.

Throttling in gh-ost

Throttling is crucial to ensure that normal operations continue to perform in a smooth way. As we discussed in a previous blog post, gh-ost allows to stop all of its activity, which makes things so much less intrusive. Let’s see how it works and to what extent it is configurable.

- Disclaimer - this section is related to gh-ost in versions older than 1.0.34 -

The main problem is that, gh-ost uses multiple methods of lag calculation, which make things not really clear. The documentation is also not clear enough to clarify how things work internally. Let’s take a look at how gh-ost operates right now.As we mentioned, there are multiple methods used to calculate lag. First of all, gh-ost generates an internal heartbeat in its _ghc table.

mysql> SELECT * FROM sbtest1._sbtest1_ghc LIMIT 1\G
*************************** 1. row ***************************
         id: 1
last_update: 2016-12-27 13:36:37
       hint: heartbeat
      value: 2016-12-27T13:36:37.139851335Z
1 row in set (0.00 sec)

It is used to calculate lag on the slave/replica, on which gh-ost operates and reads binary logs from. Then, replicas are mentioned in --throttle-control-replicas. Those, by default, have their lag tracked using SHOW SLAVE STATUS and Seconds_Behind_Master. This data has the granularity of one second.

The problem is that sometimes, one second of lag is too much for the application to handle, therefore one of the very important features of gh-ost is to be able to detect sub-second lag. On the replica, where gh-ost operates, gh-ost’s heartbeat supports sub-second granularity using heartbeat-interval-millis variable. The remaining replicas, though, are not supported this way - there is an option to take advantage of an external heartbeat solution like, for example, pt-heartbeat, and calculate slave lag using --replication-lag-query.

Unfortunately, when we put it all together, it didn’t work as expected - sub-second lag was not calculated correctly by gh-ost. We decided to contact Shlomi Noah, who’s leading the gh-ost project, to get some more insight in how gh-ost operates regarding to sub-second lag detection. What you will read below is a result of this conversation, showing how it is done starting from version 1.0.34, which incorporates changes in lag calculation and does it in the “right” way.

Gh-ost, at this moment, inserts heartbeat data in its _*_ghc table. This makes any external heartbeat generator redundant and, as a result, it makes --replication-lag-query deprecated and soon to be removed. Gh-ost’s internal heartbeat is be used across the whole replication topology.

If you want to check for lag with sub-second granularity, you need to configure correctly --heartbeat-interval-millis and --max-lag-millis ensuring that heartbeat-interval-millis is set to lower value than max-lag-millis - that’s all. You can, for example, tell gh-ost to insert a heartbeat every 100 milliseconds (heartbeat-interval-millis) and then test if lag is less than, let’s say 500 milliseconds (max-lag-millis). Of course, lag will be checked on all replicas defined in --throttle-control-replicas. You can see updated documentation related to the lag checking process here: https://github.com/github/gh-ost/blob/3bf64d8280b7cd639c95f748ccff02e90a7f4345/doc/subsecond-lag.md

Again, please keep in mind that this is how gh-ost operates when you use it in version v1.0.34 or later.

We need to mention, for a sake of completeness, one more setting - nice-ratio. It is used to define how aggressive gh-ost should be in copying the data. It basically tells ghost how much should it pause after each row copy operation. If you set it to 0 - no pause will be added. If you set it to 0.5, the whole process of copying rows will take 150% of original time. If you set it to 1, it will take twice as long (200%). It works but it is also pretty hard to adjust the ratio so the original workload is not affected. As long as you can use sub-second lag throttling, this is the way to go.

Runtime configuration changes in gh-ost

Another very useful feature of gh-ost is its ability to handle runtime configuration changes. When it starts, it listens on the unix socket, which you can choose through --serve-socket-file. By default it is created in /tmp dir and name is determined by gh-ost. It seems like it depends on the schema and table which gh-ost works upon. An example would be: /tmp/gh-ost.sbtest1.sbtest1.sock

Gh-ost can also work using TCP port but for that you need to pass --serve-tcp-port.

Knowing this, we can manipulate some of the settings. The best way to learn what we can change would be to ask gh-ost about it. When we send the ‘help’ string to the socket, we’ll get a list of available commands:

root@ip-172-30-4-235:~# echo help | nc -U /tmp/gh-ost.sbtest1.sbtest1.sock
available commands:
status                               # Print a detailed status message
sup                                  # Print a short status message
chunk-size=<newsize>                 # Set a new chunk-size
nice-ratio=<ratio>                   # Set a new nice-ratio, immediate sleep after each row-copy operation, float (examples: 0 is agrressive, 0.7 adds 70% runtime, 1.0 doubles runtime, 2.0 triples runtime, ...)
critical-load=<load>                 # Set a new set of max-load thresholds
max-lag-millis=<max-lag>             # Set a new replication lag threshold
replication-lag-query=<query>        # Set a new query that determines replication lag (no quotes)
max-load=<load>                      # Set a new set of max-load thresholds
throttle-query=<query>               # Set a new throttle-query (no quotes)
throttle-control-replicas=<replicas> # Set a new comma delimited list of throttle control replicas
throttle                             # Force throttling
no-throttle                          # End forced throttling (other throttling may still apply)
unpostpone                           # Bail out a cut-over postpone; proceed to cut-over
panic                                # panic and quit without cleanup
help                                 # This message

As you can see, there is a bunch of settings to change at runtime - we can change chunk size, we can change critical load settings (when defined thresholds will cross, causing gh-ost to start to throttle). You can also set settings related to throttling: nice-ratio, max-lag-millis, replication-lag-query, throttle-control-replicas. You can as well force throttling by sending the ‘throttle’ string to gh-ost or immediately stop the migration by sending ‘panic’.

Another setting which is worth mentioning is unpostpone. Gh-ost allows you to postpone the cutover process. As you know, gh-ost creates a temporary table using the new schema, and then fills it with data from the old table. Once all data has been copied, it performs a cut-over and replaces the old table with a new one. It may happen that you want to be there to monitor things, when gh-ost performs this step - in case something goes wrong. In that case, you can use --postpone-cut-over-flag-file to define a file which, if exists, will postpone the cut-over process. Then you can create that file and be sure that gh-ost won’t swap tables unless you let it by removing the file. Still, if you’d like to go ahead and force cut-over without a need to find and remove the postpone file, you can send ‘unpostpone’ string to gh-ost and it will immediately perform a cut-over.

We coming to the end of this post. Throttling is a critical part of any online schema change process (or any database-heavy process, for that matter) and it is important to understand how to do it right. Yet, even with throttling, some additional load is unavoidable That’s why, in our next blog post, we will try to assess the impact of running gh-ost on the system.

Tags:

Today we are pleased to announce the 1.4 release of ClusterControl - the all-inclusive database management system that lets you easily deploy, monitor, manage and scale highly available open source databases in any environment; on-premise or in the cloud.

This release contains key new features for MongoDB and MySQL Replication in particular, along with performance improvements and bug fixes.

Release Highlights

For MySQL

MySQL Replication

Enhanced multi-master deployment
Flexible topology management & error handling
Automated failover

MySQL Replication & Load Balancers

Deploy ProxySQL on MySQL Replication setups and monitor performance
HAProxy Read-Write split configuration support for MySQL Replication setups

Experimental support for Oracle MySQL Group Replication

Deploy Group Replication Clusters

And support for Percona XtraDB Cluster 5.7

Download ClusterControl

For MongoDB

MongoDB & sharded clusters

Convert a ReplicaSet to a sharded cluster
Add or remove shards
Add Mongos/Routers

More MongoDB features

Step down or freeze a node
New Severalnines database advisors for MongoDB

Download ClusterControl

View release details and resources

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

New MySQL Replication Features

ClusterControl 1.4 brings a number of new features to better support replication users. You are now able to deploy a multi-master replication setup in active - standby mode. One master will actively take writes, while the other one is ready to take over writes should the active master fail. From the UI, you can also easily add slaves under each master and reconfigure the topology by promoting new masters and failing over slaves.

Topology reconfigurations and master failovers are not usually possible in case of replication problems, for instance errant transactions. ClusterControl will check for issues before any failover or switchover happens. The admin can define whitelists and blacklists of which slaves to promote to master (and vice versa). This makes it easier for admins to manage their replication setups and make topology changes when needed.

Deploy ProxySQL on MySQL Replication clusters and monitor performance

Load balancers are an essential component in database high availability. With this new release, we have extended ClusterControl with the addition of ProxySQL, created for DBAs by René Cannaò, himself a DBA trying to solve issues when working with complex replication topologies. Users can now deploy ProxySQL on MySQL Replication clusters with ClusterControl and monitor its performance.

By default, ClusterControl deploys ProxySQL in read/write split mode - your read-only traffic will be sent to slaves while your writes will be sent to a writable master. ProxySQL will also work together with the new automatic failover mechanism. Once failover happens, ProxySQL will detect the new writable master and route writes to it. It all happens automatically, without any need for the user to take action.

MongoDB & sharded clusters

MongoDB is the rising star of the Open Source databases, and extending our support for this database has brought sharded clusters in addition to replica sets. This meant we had to retrieve more metrics to our monitoring, adding advisors and provide consistent backups for sharding. With this latest release, you can now convert a ReplicaSet cluster to a sharded cluster, add or remove shards from a sharded cluster as well as add Mongos/routers to a sharded cluster.

New Severalnines database advisors for MongoDB

Advisors are mini programs that provide advice on specific database issues and we’ve added three new advisors for MongoDB in this ClusterControl release. The first one calculates the replication window, the second watches over the replication window, and the third checks for un-sharded databases/collections. In addition to this we also added a generic disk advisor. The advisor verifies if any optimizations can be done, like noatime and noop I/O scheduling, on the data disk that is being used for storage.

There are a number of other features and improvements that we have not mentioned here. You can find all details in the ChangeLog.

We encourage you to test this latest release and provide us with your feedback. If you’d like a demo, feel free to request one.

Thank you for your ongoing support, and happy clustering!

PS.: For additional tips & tricks, follow our blog: http://www.severalnines.com/blog/

Tags:

With the recent release of ClusterControl 1.4.0, we added a bunch of new features to better support MySQL replication users. In this blog post, we’ll give you a quick overview of the new features.

Enhanced multi-master deployment

A simple master-slave replication setup is usually good enough in a lot of cases, but sometimes, you might need a more complex topology with multiple masters. With 1.4.0, ClusterControl can help provision such setups. You are now able to deploy a multi-master replication setup in active - standby mode. One of the masters will actively take writes, while the other one is ready to take over writes should the active master fail. You can also easily add slaves under each master, right from the UI.

Enhanced flexibility in replication topology management

With support for multi-master setups comes improved support for managing replication topology changes. Do you want to re-slave a slave off the standby master? Do you want to create a replication chain, with an intermediate master in-between? Sure! You can use a new job for that: “Change Replication Master”. Just go to one of the nodes and pick that job (not only on the slaves, you can also change replication master for your current master, to create a multi-master setup). You’ll be presented with a dialog box in which you can pick the master from which to slave your node off. As of now, only GTID-enabled replication is supported, both Oracle and MariaDB implementations.

Replication error handling

You may ask - what about issues like errant transactions which can be a serious problem for MySQL replication? Well, for starters, ClusterControl always set slaves in read_only mode so only a superuser can create an errant transaction. It still may happen, though. That’s why we added replication error handling in ClusterControl.

Errant transactions are common and they are handled separately - errant transactions are checked for before any failover or switchover happens. The user can then fix the problem before triggering a topology change once more. If, for some reason (like high availability, for example), a user wants to perform a failover anyway, no matter if it is safe or not, it can also be done by setting:

replication_stop_on_error=0

This is set in the cmon configuration file of the replication setup ( /etc/cmon.d/cmon_X.cnf, where X is the cluster ID of the replication setup). In such cases, failover will be performed even if there’s a possibility that replication will break.

To handle such cases, we added experimental support for slave rebuilding. If you enable replication_auto_rebuild_slave in the cmon configuration and if your slave is marked as down with the following error in MySQL:

Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

ClusterControl will attempt to rebuild the slave using data from the master. Such a setting may be dangerous as the rebuilding process will induce an increased load on the master, it may also be that your dataset is very large and a regular rebuild is not an option - that’s why this behavior is disabled by default. Feel free to try it out, though and let us know what you think about it.

Automated failover

Handling replication errors is not enough to maintain high availability with MySQL replication - you need also to handle crashes of MySQL instances. Until now, ClusterControl alerted the user and let her perform a manual failover. With ClusterControl version 1.4.0 comes support for automated failover handling. It is enough to have cluster recovery enabled for your replication cluster and ClusterControl will try to recover your replication cluster in the best way possible. You must explicitly enable "Cluster Auto Recovery" in the UI in order for automatic failover to be activated.

Once a master failure is detected, ClusterControl starts to look for the most up-to-date slave available. Once it’s been found, ClusterControl checks the remaining slaves and looks for additional, missing transactions. If such transactions are found on some of the slaves, the master candidate is configured to replicate from each of those slaves and apply any missing transactions.

If, for any reason, you’d rather not wait for a master candidate to get all missing transactions (maybe because you are 100% sure there won’t be any), you can disable this step by enabling the replication_skip_apply_missing_txs setting in cmon configuration.

For MariaDB setups, the behavior is different - ClusterControl picks the most advanced slave and promotes it to become master.

Getting missing transactions is one thing. Applying them is another. ClusterControl, by default, does not fail over to a slave if the slave has not applied all missing transactions. You could lose data. Instead, it will wait indefinitely to allow slaves to catch up. Of course, if the master candidate becomes up to date, ClusterControl will failover immediately after. This behavior can be configured using replication_failover_wait_to_apply_timeout setting in the cmon configuration file. Default value (-1) prevents any failover if master candidate is lagging behind. If you’d like to execute failover anyway, you can set it to 0. You can also set a timeout in seconds, this is the amount of time that ClusterControl will wait for a master candidate to catch up before performing a failover.

Once a master candidate is brought up to date, it is promoted to master and the remaining slaves are slaved off it. The exact process differs depending on which host failed (the active or standby master in a multi-master setup) but the final outcome is that all slaves are again replicating from the working master. Combined with proxies such as HAProxy, ProxySQL or MaxScale, this lets you build an environment where a master failure is handled in an automated and transparent way.

Additional control over failover behavior is granted through replicaton_failover_whitelist and replicaton_failover_blacklist lists in the cmon configuration file. These let you configure a list of slaves which should be treated as a candidate list to become master, and a list of slaves which should not be promoted to master by ClusterControl. There are numerous reasons you may want to use those variables. Maybe you have some backup or OLAP/reporting slaves which are not suitable to become a master? Maybe some of your slaves use weaker hardware or maybe they are located in a different datacenter? In this case, you can avoid them from being promoted by adding those slaves to the replicaton_failover_blacklist variable.

Likewise, maybe you want to limit the number of slaves that are promotable to a particular set of hosts which are the closest to the current master? Or maybe you use master - master, active - passive setup and you want only your standby master to be considered for promotion? Then specify the IP’s of master candidates in the replicaton_failover_whitelist variable. Please keep in mind that a restart of cmon process will be required to reload such configuration. By executing cmon --help-config on the controller, you will get more detailed information about these (and other) parameters.

Finally, you might want to manually restore replication.If you do not want ClusterControl to perform any automated failover in your replication topology, you can disable cluster recovery from the ClusterControl UI.

So, there are lots of good stuff to try out here for MySQL replication users. Do give it a try, and let us know how we’re doing.

Tags:

A proxy layer is an important building block for any highly available database environment. HAProxy has been available for a long time, as a generic TCP load balancer. We later added support for MaxScale, an SQL-aware load balancer. We’re today happy to announce support for ProxySQL, which also belongs to the family of SQL-aware load balancers. We blogged about ProxySQL sometime back, in case you would like to read more about it.

As from ClusterControl 1.4, we now support deployment and configuration of ProxySQL. It is done from a simple wizard:

You need to pick a host on which ProxySQL will be installed, you can choose to install it on one of your existing hosts in your cluster or you can type in a new host.

In the next step, you need to pick username and password for administration and monitoring user for ProxySQL.

You also need to define application users. ProxySQL works in the middle, between application and backend MySQL servers, so the database users need to be able to connect from the ProxySQL IP address. The proxy supports connection multiplexing, so not all connections opened by applications onto ProxySQL will result in connections opened to the backend databases. This can seriously increase performance of MySQL - it’s well known that MySQL scales up to some number of concurrent connections (exact value depends mostly on the MySQL version). Now, using ProxySQL, you can push thousands of connections from the application. ProxySQL opens a much lower number of connections to MySQL. It takes advantage of the fact that, for most of the time, connections wait for the application and do not process any query: they can be reused. This feature requires that the application authenticates against ProxySQL, so you need to add all users that you use in your application to ProxySQL.

You can either add existing database users, or you can create a new one from the UI - fill in username and password, pick what grants you want to assign to that user, and you are all set - ClusterControl will create this user for you in both MySQL and ProxySQL.

Finally, you need to pick which hosts you want to include in the ProxySQL configuration, and set some configuration variables like maximum allowed replication lag, maximum number of connections or weight.

At the end of the form you have to answer an important question: do you use implicit transactions or not? This question is mandatory to answer and based on your answer ClusterControl will configure ProxySQL. If you don’t use implicit transactions (by that we mean that you don’t use SET autocommit=1 to initiate a transaction) thenyour read-only traffic will be sent to slaves while your writes will be sent to a writable master. If you use implicit transactions, the only safe method of configuring ProxySQL is to send all traffic to the master. There will be no scale-out, only high availability. Of course, you can always redirect some particular queries to the slaves but you will have to do it on your own - ClusterControl cannot do it for you. This will change in the future when ProxySQL will add support for handling such scenario, but for now we have to stick to what we have.

ProxySQL will also work together with the new automatic failover mechanism (for master-slave replication setups) added in ClusterControl 1.4.0 - once failover happens, ProxySQL will detect the new writable master and route writes to it. It all happens automatically, without any need for the user to take action.

Once deployed, you can use ClusterControl to monitor your ProxySQL installation. You can check the status of hosts in all defined hostgroups. You can also check some metrics related to hostgroups - used connections, free connections, errors, number of queries executed, amount of data sent and received, latency. Below you can find graphs related to some of ProxySQL metrics - active transactions, data sent and received, memory utilization, number of connections and some others. This gives you nice insight in how ProxySQL operates and helps to catch any potential issues with your proxy layer.

So, give it a try and tell us what you think. We plan on adding more management features in the next release, and would love to hear what you’d like to see.

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Tags:

MySQL replication setups are inevitably related to failovers. Unlike multi-master clusters like Galera, there is one single writer in a whole setup - the master. If the master fails, one of the slaves will have to take its role through the process of failover. Such process is tricky and potentially, it may cause data loss. It may happen, for example, if a slave is not up to date while it is promoted. The master may also die before it is able to transfer all binlog events to at least one of its slaves.

Different people have different takes on how to perform failover. It depends on personal preferences but also on requirements of the business. There are two main options - automated failover or manual failover.

Automated failover comes in very handy if you want your environment to run 24x7, and to recover quickly from any failures. Unfortunately, this may come at a cost - in more complex failure scenarios, automated failover may not work correctly or, even worse, it may result in your data being messed up and partially missing (although one might argue that a human can also make disastrous mistakes leading to similar consequences). Those who prefer to keep close control over their database may choose to skip automated failover and use a manual process instead. Such process takes more time, but it allows an experienced DBA to assess the state of a system and take corrective actions based on what happened.

ClusterControl already supports automated failover for master-master clusters like Galera and NDB Cluster. Now with 1.4, it also does this for MySQL replication. In this blog post, we’ll take you through the failover process, discussing how ClusterControl does it, and what can be configured by the user.

Configuring Automatic Failover

Failover in ClusterControl can be configured to be automatic or not. If you prefer to take care of failover manually, you can disable automated cluster recovery. By default, cluster recovery is enabled and automated failover is used. Once you make changes in the UI, make sure you also make them in the cmon configuration and set enable_cluster_autorecovery to ‘0’. Otherwise your settings will be overwritten when the cmon process is restarted.

Failover is initiated by ClusterControl when it detects that there is no host with read_only flag disabled. It can happen because master (which has read_only set to 0) is not available or it can be triggered by a user or some external software that changed this flag on the master. If you do manual changes to the database nodes or have software that may fiddle with the read_only settings, then you should disable automatic failover.

Also, note that failover is attempted only once. Should a failover attempt fail, then no more attempts will be made until the controller is restarted.

At the beginning of the failover procedure, ClusterControl builds a list of slaves which can be promoted to master. Most of the time, it will contain all slaves in the topology but the user has some additional control over it. There are two variables you can set in the cmon configuration:

replicaton_failover_whitelist

and

replicaton_failover_blacklist

First of them, when used, contains a list of IP’s or hostnames of slaves which should be used as potential master candidates. If this variable is set, only those hosts will be considered.

Second variable may contain list of hosts which will never be considered a master candidate. You can use it to list slaves that are used for backups or analytical queries. If the hardware varies between slaves, you may want to put here the slaves which use slower hardware.

Replication_failover_whitelist takes precedence, meaning the replication_failover_blacklist is ignored if replication_failover_whitelist is set.

Once the list of slaves which may be promoted to master is ready, ClusterControl starts to compare their state, looking for the most up to date slave. Here, the handling of MariaDB and MySQL-based setups differs. For MariaDB setups, ClusterControl picks a slave which has the lowest replication lag of all slaves available. For MySQL setups, ClusterControl picks such a slave as well but then it checks for additional, missing transactions which could have been executed on some of the remaining slaves. If such a transaction is found, ClusterControl slaves the master candidate off that host in order to retrieve all missing transactions.

In case you’d like to skip this process and just use the most advanced slave, you can set the following setting in the cmon configuration:

replication_skip_apply_missing_txs=1

Such process may result in a serious problem though - if an errant transaction is found, replication may be broken. What is an errant transaction? In short, it is a transaction that has been executed on a slave but it’s not coming from the master. It could have been, for example, executed locally. The problem is caused by the fact that, while using GTID, if a host, which has such errant transaction, becomes a master, all slaves will ask for this missing transaction in order to be in sync with their new master. If the errant transaction happened way in the past, it may not longer be available in binary logs. In that case, replication will break because slaves won’t be able to retrieve the missing data. If you would like to learn more about errant transactions, we have a blog post covering this topic.

Of course, we don’t want to see replication breaking, therefore ClusterControl, by default, checks for any errant transactions before it promotes a master candidate to become a master. If such problem is detected, the master switch is aborted and ClusterControl lets the user fix the problem manually. The blog post we mentioned above explains how you can manually fix issues with errant transactions.

If you want to be 100% certain that ClusterControl will promote a new master even if some issues are detected, you can do that using the replication_stop_on_error=0 setting in cmon configuration. Of course, as we discussed, it may lead to problems with replication - slaves may start asking for a binary log event which is not available anymore. To handle such cases we added experimental support for slave rebuilding. If you set replication_auto_rebuild_slave=1 in the cmon configuration and if your slave is marked as down with the following error in MySQL:

ClusterControl will attempt to rebuild the slave using data from the master. Such a setting may not always be appropriate as the rebuilding process will induce an increased load on the master. It may also be that your dataset is very large and a regular rebuild is not an option - that’s why this behavior is disabled by default.

Once we ensure that no errant transactions exist and we are good to go, there is still one more issue we need to handle somehow - it may happen that all slaves are lagging behind the master.

As you probably know, replication in MySQL works in a rather simple way. The master stores writes in binary logs. The slave’s I/O thread connects to the master and pulls any binary log events it is missing. It then stores them in the form of relay logs. The SQL thread parses them and applies events. Slave lag is a condition in which SQL thread (or threads) cannot cope with the number of events, and is unable to apply them as soon as they are pulled from the master by the I/O thread. Such situation may happen no matter what type of replication you are using. Even if you use semi-sync replication, it can only guarantee that all events from the master are stored on one of slaves in the relay log. It doesn’t say anything about applying those events to a slave.

The problem here is that, if a slave is promoted to master, relay logs will be wiped out. If a slave is lagging and hasn’t applied all transactions, it will lose data - events that are not yet applied from relay logs will be lost forever.

There is no one-size-fits-all way of solving this situation. ClusterControl gives users control over how it should be done, maintaining safe defaults. It is done in cmon configuration using the following setting:

Replication_failover_wait_to_apply_timeout

By default it takes a value of ‘-1’, which means that failover won’t happen if a master candidate is lagging. ClusterControl will wait indefinitely for it to apply all missing transactions from its relay logs. This is safe, but, if for some reason, the most up-to-date slave is lagging badly, failover may takes hours to complete. On the other side of the spectrum is setting it to ‘0’ - it means that failover happens immediately, no matter if the master candidate is lagging or not. You can also go the middle way and set it to some value. This will set a time in seconds, during which ClusterControl will wait for a master candidate to apply missing transactions from its relay logs. Failover happens after the defined time or when the master candidate will catch up on replication - whichever happens first. This may be a good choice if your application has specific requirements regarding downtime and you have to elect a new master within a short time window.

When using MySQL replication along with proxies like ProxySQL, ClusterControl can help you build an environment in which the failover process is barely noticeable by the application. Below we’ll show how the failover process may look like in a typical replication setup - one master with two slaves. We will use ProxySQL to detect topology changes and route traffic to the correct hosts.

First, we’ll start our “application” - sysbench:

root@ip-172-30-4-48:~# while true ; do sysbench --test=/root/sysbench/sysbench/tests/db/oltp.lua --num-threads=2 --max-requests=0 --max-time=0 --mysql-host=172.30.4.48 --mysql-user=sbtest --mysql-password=sbtest --mysql-port=6033 --oltp-tables-count=32 --report-interval=1 --oltp-skip-trx=on --oltp-table-size=100000 run ; done

It will connect to ProxySQL (port 6033) and use it to distribute traffic between master and slaves. We simulate default behavior of autocommit=1 in MySQL by disabling transactions for Sysbench.

Once we induce some lag, we kill our master:

root@ip-172-30-4-112:~# killall -9 mysqld mysqld_safe

ClusterControl starts failover.

ID:79574 [13:18:34]: Failover to a new Master.

At first, it verifies state of replication on all nodes in the cluster. Among other things, ClusterControl looks for the most up to date slave in the topology and picks it as master candidate.

ID:79575 [13:18:34]: Checking 172.30.4.99:3306
ID:79576 [13:18:34]: ioerrno=2003 io running 0 on 172.30.4.99:3306
ID:79577 [13:18:34]: Checking 172.30.4.4:3306
ID:79578 [13:18:34]: ioerrno=2003 io running 0 on 172.30.4.4:3306
ID:79579 [13:18:34]: Checking 172.30.4.112:3306
ID:79580 [13:18:34]: 172.30.4.112:3306: is not connected. Checking if this is the failed master.
ID:79581 [13:18:34]: 172.30.4.99:3306: Checking if slave can be used as a candidate.
ID:79582 [13:18:34]: Adding 172.30.4.99:3306 to slave list
ID:79583 [13:18:34]: 172.30.4.4:3306: Checking if slave can be used as a candidate.
ID:79584 [13:18:34]: Adding 172.30.4.4:3306 to slave list
ID:79585 [13:18:34]: 172.30.4.4:3306: Slave lag is 4 seconds.
ID:79586 [13:18:34]: 172.30.4.99:3306: Slave lag is 20 seconds >= 4 seconds, not a possible candidate.
ID:79587 [13:18:34]: 172.30.4.4:3306 elected as the new Master.

As a next step, required grants are added.

ID:79588 [13:18:34]: 172.30.4.4:3306: Creating user 'rpl_user'@'172.30.4.99.
ID:79589 [13:18:34]: 172.30.4.4:3306: Granting REPLICATION SLAVE 'rpl_user'@'172.30.4.99'.
ID:79590 [13:18:34]: 172.30.4.99:3306: Creating user 'rpl_user'@'172.30.4.4.
ID:79591 [13:18:34]: 172.30.4.99:3306: Granting REPLICATION SLAVE 'rpl_user'@'172.30.4.4'.
ID:79592 [13:18:34]: 172.30.4.99:3306: Setting read_only=ON
ID:79593 [13:18:34]: 172.30.4.4:3306: Setting read_only=ON

Then, it’s time to ensure no errant transactions are found, which could prevent the whole failover process from happening.

ID:79594 [13:18:34]: Checking for errant transactions.
ID:79595 [13:18:34]: 172.30.4.99:3306: Skipping, same as slave 172.30.4.99:3306
ID:79596 [13:18:34]: 172.30.4.99:3306: Comparing to 172.30.4.4:3306, master_uuid = 'e4864640-baff-11e6-8eae-1212bbde1380'
ID:79597 [13:18:34]: 172.30.4.4:3306: Checking for errant transactions.
ID:79598 [13:18:34]: 172.30.4.112:3306: Skipping, same as master 172.30.4.112:3306
ID:79599 [13:18:35]: 172.30.4.4:3306: Comparing to 172.30.4.99:3306, master_uuid = 'e4864640-baff-11e6-8eae-1212bbde1380'
ID:79600 [13:18:35]: 172.30.4.99:3306: Checking for errant transactions.
ID:79601 [13:18:35]: 172.30.4.4:3306: Skipping, same as slave 172.30.4.4:3306
ID:79602 [13:18:35]: 172.30.4.112:3306: Skipping, same as master 172.30.4.112:3306
ID:79603 [13:18:35]: No errant transactions found.

During the last preparation step, missing transactions are being applied on the master candidate - we want it to fully catch up on the replication before we proceed with failover. In our case, to ensure that failover will happen even if slave is badly lagging, we enforced 600 second limit - slave will try to replay any missing transactions from its relay logs but if it will take more than 600 seconds, we will force a failover.

ID:79604 [13:18:35]: 172.30.4.4:3306: preparing candidate.
ID:79605 [13:18:35]: 172.30.4.4:3306: Checking if there the candidate has relay log to apply.
ID:79606 [13:18:35]: 172.30.4.4:3306: waiting up to 600 seconds before timing out.
ID:79608 [13:18:37]: 172.30.4.4:3306: Applied 391 transactions
ID:79609 [13:18:37]: 172.30.4.4:3306: Executing 'SELECT WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS('e4864640-baff-11e6-8eae-1212bbde1380:16340-23420', 5)' (waited 5 out of maximally 600 seconds).
ID:79610 [13:18:37]: 172.30.4.4:3306: Applied 0 transactions
ID:79611 [13:18:37]: 172.30.4.99:3306: No missing transactions found.
ID:79612 [13:18:37]: 172.30.4.4:3306: Up to date with temporary master 172.30.4.99:3306
ID:79613 [13:18:37]: 172.30.4.4:3306: Completed preparations of candidate.

Finally, failover happens. From the application’s standpoint, the impact was minimal - the process took less than 5 seconds, during which the application had to wait for queries to execute. Of course, it depends on multiple factors - the main one is replication lag as the failover process, by default, requires the slave to be up-to-date. Catching up can take quite some time if the slave is lagging behind heavily.

At the end, we have a new replication topology. A new master has been elected and the second slave has been reslaved. The old master, on the other hand, is stopped. This is intentional as we want the user to be able to investigate the state of the old master before performing any further changes (e.g., slaving it off a new master or rebuilding it).

We hope this mechanism will be useful in maintaining high availability of replication setups. If you have any feedback on it, let us know as we’d love to hear from you.

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Tags:

Join us next Tuesday, February 7th 2017, as Johan Andersson, CTO at Severalnines, unveils the new ClusterControl 1.4 in a live demo webinar.

ClusterControl reduces complexity of managing your database infrastructure while adding support for new technologies; enabling you to truly automate multiple environments for next-level applications. This latest release further builds out the functionality of ClusterControl to allow you to manage and secure your 24/7, mission critical infrastructures.

In this live webinar, Johan will demonstrate how ClusterControl increases your efficiency by giving you a single interface to deploy and operate your databases, instead of searching for and cobbling together a combination of open source tools, utilities and scripts that need constant updates and maintenance. Watch as ClusterControl demystifies the complexity associated with database high availability, load balancing, recovery and your other everyday struggles.

To put it simply: learn how to be a database hero with ClusterControl!

Date, Time & Registation

Europe/MEA/APAC

Tuesday, February 7th at 09:00 GMT (UK) / 10:00 CET (Germany, France, Sweden)

North America/LatAm

Tuesday, February 7th at 9:00 Pacific Time (US) / 12:00 Eastern Time (US)

Agenda

ClusterControl (1.4) Overview
‘Always on Databases’ with enhanced MySQL Replication functions
‘Safer NoSQL’ with MongoDB and larger sharded cluster deployments
‘Enabling the DBA’ with ProxySQL, HAProxy and MaxScale
Backing up your open source databases
Live Demo
Q&A

Speaker

Johan Andersson, CTO, Severalnines

Johan's technical background and interest are in high performance computing as demonstrated by the work he did on main-memory clustered databases at Ericsson as well as his research on parallel Java Virtual Machines at Trinity College Dublin in Ireland. Prior to co-founding Severalnines, Johan was Principal Consultant and lead of the MySQL Clustering & High Availability consulting group at MySQL / Sun Microsystems / Oracle, where he designed and implemented large-scale MySQL systems for key customers. Johan is a regular speaker at MySQL User Conferences as well as other high profile community gatherings with popular talks and tutorials around architecting and tuning MySQL Clusters.

We look forward to “seeing” you there and to insightful discussions!

If you have any questions or would like a personalised live demo, please do contact us.

Tags:

ClusterControl makes it easy to deploy a database setup - just fill in some values (database vendor, database data directory, password and hostnames) in the deployment wizard and you’re good to go. The rest of the configuration options will be automatically determined (and calculated) based on the host specifications (CPU cores, memory, IP address etc) and applied to the template file that comes with ClusterControl. In this blog post, we are going to look into how ClusterControl uses default template files and how users can customize them to their needs.

Base Template Files

All services configured by ClusterControl use a base configuration template available under /usr/share/cmon/templates on the ClusterControl node. The following are template files provided by ClusterControl v1.4.0:

Filename	Description
config.ini.mc	MySQL Cluster configuration file.
haproxy.cfg	HAProxy configuration template for Galera Cluster.
haproxy_rw_split.cfg	HAProxy configuration template for read-write splitting.
garbd.cnf	Galera arbitrator daemon (garbd) configuration file.
keepalived-1.2.7.conf	Legacy keepalived configuration file (pre 1.2.7). This is deprecated.
keepalived.conf	Keepalived configuration file.
keepalived.init	Keepalived init script.
MaxScale_template.cnf	MaxScale configuration template.
mongodb-2.6.conf.org	MongoDB 2.x configuration template.
mongodb.conf.org	MongoDB 3.x configuration template.
mongodb.conf.percona	MongoDB 3.x configuration template for Percona Server for MongoDB.
mongos.conf.org	Mongo router (mongos) configuration template.
my.cnf.galera	MySQL configuration template for Galera Cluster.
my57.cnf.galera	MySQL configuration template for Galera Cluster on MySQL 5.7.
my.cnf.grouprepl	MySQL configuration template for MySQL Group Replication.
my.cnf.gtid_replication	MySQL configuration template for MySQL Replication with GTID.
my.cnf.mysqlcluster	MySQL configuration template for MySQL Cluster.
my.cnf.pxc55	MySQL configuration template for Percona XtraDB Cluster v5.5.
my.cnf.repl57	MySQL configuration template for MySQL Replication v5.7.
my.cnf.replication	MySQL configuration template for MySQL/MariaDB without MySQL’s GTID.
mysqlchk.galera	MySQL health check script template for Galera Cluster.
mysqlchk.mysql	MySQL health check script template for MySQL Replication.
mysqlchk_xinetd	Xinetd configuration template for MySQL health check.
mysqld.service.override	Systemd unit file template for MySQL service.
proxysql_template.cnf	ProxySQL configuration template.

The above list depends upon the feature set provided by the installed ClusterControl release. In an older version, you might not find some of them. You can modify these template files directly, although we do not recommend it as explained in the next sections.

Configuration Manager

Depending on the cluster type, ClusterControl will then import the necessary base template file into CMON database and accessible via Manage -> Configurations -> Templates once deployment succeeds. For example, consider the following configuration template for a MariaDB Galera Cluster:

ClusterControl will load the base template content of Galera configuration template from /usr/share/cmon/templates/my.cnf.galera into CMON database (inside cluster_configuration_templates table) after deployment succeeds. You can then customize your own configuration file directly in the ClusterControl UI. Whenever you hit the Save button, the new version of configuration template will be stored inside CMON database, without overwriting the base template file.

Once the cluster is deployed and running, the template in the UI takes precedence. The base template file is only used during the initial cluster deployment via ClusterControl -> Deploy -> Deploy Database Cluster. During the deployment stage, ClusterControl will use a temporary directory located at /var/tmp/ to prepare the content, for example:

/var/tmp/cmon-003862-6a7775ca76c62486.tmp

Dynamic Variables

There are a number configuration variables which configurable dynamically by ClusterControl. These variables are represented with capital letters enclosed by at sign ‘@’, for example @DATADIR@. For full details on supported variables, please refer to this page. Dynamic variables are automatically configured based on the input specified during cluster deployment, or ClusterControl performs automatic detection based on hostname, IP address, available RAM, number of CPU cores and so on. This simplifies the deployment where you only need to specify minimal options during cluster deployment stage

If the dynamic variable is replaced with a value (or undefined), ClusterControl will skip it and use the configured value instead. This is handy for advanced users, where usually have their own set of configuration options that tailored for specific database workloads.

Pre-deployment Configuration Template Example

Instead of relying on ClusterControl’s dynamic variable on the number of max_connections for our database nodes, we can change the following line inside /usr/share/cmon/templates/my57.cnf.galera, from:

max_connections=@MAX_CONNECTIONS@

To:

max_connections=50

Save the text file and on the Deploy Database Cluster dialog, ensure ClusterControl uses the correct base template file:

Click on Deploy button to start the database cluster deployment.

Post-deployment Configuration Template Example

After the database cluster deployment completes, you might have done some fine tuning on the running servers before deciding to scale it up. When scaling up, ClusterControl will use the configuration template inside CMON database (the one populated under ClusterControl -> Configurations -> Templates) to deploy the new nodes. Hence do remember to apply the modification you made on the database server to the template file.

Before adding a new node, it’s a good practice to review the configuration template to ensure that the new node gets what we expected. Then, go to ClusterControl -> Add Node and ensure the correct MySQL template file is selected:

Then, click on “Add Node” button to start the deployment.

That’s it. Even though ClusterControl does various automation jobs when it comes to deployment, it still provides freedom for users to customize the deployment accordingly. Happy clustering!

Tags:

clustercontrol

configuration management

ClusterControl 1.4 introduces some major improvements in the area of backup management, with a revamped interface and simplified options to create backups. In this blog post, we’ll have a look at the new backup features available in this release.

Upgrading to 1.4

If you upgrade ClusterControl from version 1.3.x to version 1.4, the CMON process will internally migrate all backup related data/schedules to the new interface. The migration will happen during the first startup after you have upgraded (you are required to restart the CMON process after a package upgrade). To upgrade, please refer to the documentation.

Redesigned User Interface

In the user interface, we have now consolidated related functionality onto a single interface. This includes Backup Settings, which were previously found under ClusterControl -> Settings -> Backups. It is now accessible under the same backup management tab:

The interface is now responsive to any action taken and requires no manual refresh. When a backup is created, you will see it in the backup list with a spinning arrows icon:

It is also possible now to schedule a backup every minute (the lowest interval) or year (the highest interval):

The backup options when scheduling or creating a backup now appear on the right side:

This allows you to quickly configure the backup, rather than having to scroll down the page.

Backup Report

Here is how it used to look pre v1.4:

After upgrading to ClusterControl v1.4, the report will look like this:

All incremental backups are automatically grouped together under the last full backup and expandable with a drop down. This makes the backups more organized per backup set. Each created backup will have “Restore” and “Log” buttons. The “Time” column also now contains timezone information, useful if you are dealing with geographically distributed infrastructure.

Restore to an Incremental Backup Point

You are now able to restore up to a certain incremental backup. Previously, ClusterControl supported restoration per backup set. All incremental backups under a single backup set would be restored and there was no way, for instance, to skip some of the incremental backups.

Consider the below example:

Full backup happens every day around 5:15 AM while incremental backup was scheduled every 15 minutes after the hour. If something happened around 5:50 AM and you would like to restore up to the backup taken just before that, you can skip the 6 AM backup by just clicking on the “Restore” link of the 5:45 AM incremental backup. You should then see the following Restore wizard and a couple of post-restoration options:

ClusterControl will then prepare the backup up until the selected point and the rest will be skipped. It also highlights “Warning” and “Notes” so you are aware of what will happen with the cluster during the restoration process. Note that mysqldump restoration can be performed online, while Xtrabackup requires the cluster/database instance to be stopped.

Operational Report

You might have multiple database systems running, and perhaps in different datacenters. Would it not be nice to get a consolidated report of the systems, when they were last backed up, and if there were any failed backups? This is available in 1.4. Note that you have other types of ops reports available in ClusterControl.

The report contains two sections and gives you a short summary of when the last backup was created, if it completed successfully or failed. You can also check the list of backups executed on the cluster with their state, type and size. This is as close you can get to check that backups work correctly without running a full recovery test. However, we definitely recommend that such tests are regularly performed.

The operational report can be scheduled and emailed to a set of recipients under Settings -> Operational Reports section, as shown in the following screenshot:

Access via ClusterControl RPC interface

The new backup features are now exposed under ClusterControl RPC interface, which means you can interact via API call with a correct RPC key. For example, to list the created backup on cluster ID 2, the following call should be enough:

$ curl -XPOST -d '{"operation": "listbackups", "token": "RB81tydD0exsWsaM"}' http://localhost:9500/2/backup
{"cc_timestamp": 1477063671,"data": [
  {"backup": [
      {"db": "mysql","files": [
          {"class_name": "CmonBackupFile","created": "2016-10-21T15:26:40.000Z","hash": "md5:c7f4b2b80ea439ae5aaa28a0f3c213cb","path": "mysqldump_2016-10-21_172640_mysqldb.sql.gz","size": 161305,"type": "data,schema"
          } ],"start_time": "2016-10-21T15:26:41.000Z"
      } ],"backup_host": "192.168.33.125","cid": 101,"class_name": "CmonBackupRecord","config":
      {"backupDir": "/tmp","backupHost": "192.168.33.125","backupMethod": "mysqldump","backupToIndividualFiles": false,"backup_failover": false,"backup_failover_host": "","ccStorage": false,"checkHost": false,"compression": true,"includeDatabases": "","netcat_port": 9999,"origBackupDir": "/tmp","port": 3306,"set_gtid_purged_off": true,"throttle_rate_iops": 0,"throttle_rate_netbw": 0,"usePigz": false,"wsrep_desync": false,"xtrabackupParallellism": 1,"xtrabackup_locks": false
      },"created": "2016-10-21T15:26:40.000Z","created_by": "","description": "","finished": "2016-10-21T15:26:41.000Z","id": 5,"job_id": 2952,"log_file": "","lsn": 140128879096992,"method": "mysqldump","parent_id": 0,"root_dir": "/tmp/BACKUP-5","status": "Completed","storage_host": "192.168.33.125"
  },
  {"backup": [
      {"db": "","files": [
          {"class_name": "CmonBackupFile","created": "2016-10-21T15:21:50.000Z","hash": "md5:538196a9d645c34b63cec51d3e18cb47","path": "backup-full-2016-10-21_172148.xbstream.gz","size": 296000,"type": "full"
          } ],"start_time": "2016-10-21T15:21:50.000Z"
      } ],"backup_host": "192.168.33.125","cid": 101,"class_name": "CmonBackupRecord","config":
      {"backupDir": "/tmp","backupHost": "192.168.33.125","backupMethod": "xtrabackupfull","backupToIndividualFiles": false,"backup_failover": false,"backup_failover_host": "","ccStorage": false,"checkHost": false,"compression": true,"includeDatabases": "","netcat_port": 9999,"origBackupDir": "/tmp","port": 3306,"set_gtid_purged_off": true,"throttle_rate_iops": 0,"throttle_rate_netbw": 0,"usePigz": false,"wsrep_desync": false,"xtrabackupParallellism": 1,"xtrabackup_locks": true
      },"created": "2016-10-21T15:21:47.000Z","created_by": "","description": "","finished": "2016-10-21T15:21:50.000Z","id": 4,"job_id": 2951,"log_file": "","lsn": 1627039,"method": "xtrabackupfull","parent_id": 0,"root_dir": "/tmp/BACKUP-4","status": "Completed","storage_host": "192.168.33.125"
  } ],"requestStatus": "ok","total": 2
}

Other supported operations are:

deletebackup
listschedules
schedule
deleteschedule
updateschedule

By having those operations extensible via ClusterControl RPC, one could automate the backup management and list the backup schedule via scripting or application call. However, to create a backup, ClusterControl handles it differently via job call (operation: createJob) since some backups may take hours or days to complete. To create a backup on cluster ID 9, one would do:

$ curl -XPOST -d '{"token": "c8gY3Eq5iFE3DC4i", "username":"admin@domain.com","operation":"createJob","job":{"command":"backup", "job_data": {"backup_method":"xtrabackupfull", "hostname": "192.168.33.121", "port":3306, "backupdir": "/tmp/backups/" }}}' http://localhost:9500/9/job

Where:

The URL format is: http://[ClusterControl_host]/clusterid/job
Backup method: Xtrabackup (full)
RPC token: c8gY3Eq5iFE3DC4i (retrievable from cmon_X.cnf)
Backup host: 192.168.33.121, port 3306
Backup destination: /tmp/backups on the backup host

For example, it’s a good idea to create a backup when testing DDL queries like TRUNCATE or DROP because those are not transactions, meaning they are impossible to rollback. We are going to cover this in details in an upcoming blog post.

With a BASH script together with correct API call, it is now possible to have an automated script like the following:

$ test_disasterous_query.sh --host 192.168.33.121 --query 'TRUNCATE mydb.processes' --backup-first 1

There are many other reasons to upgrade to the latest ClusterControl version, the backup functionality is just one of many exciting new features introduced in ClusterControl v1.4. Do upgrade (or install ClusterControl if you haven’t used it yet), give it a try and let us know your thoughts. New installations come with a 30-days trial.

Tags:

MySQL replication setups can take different shapes. The main topology is probably a simple master-slave setup. But it is also possible to construct more elaborate setups with multiple masters and chained setups. ClusterControl 1.4 takes advantage of this flexibility and gives you possibility to deploy multimaster setups. In this blog post, we will look at a couple of different setups and how they would be used in real-life situations.

New Deployment Wizard

First of all, let’s take a look at the new deployment wizard in ClusterControl 1.4. It starts with SSH configuration: user, path to ssh key and whether you use sudo or not.

Next, we pick a vendor and version, data directory, port, configuration template, password for root user and, finally, from which repository ClusterControl should install the software.

Then, the third and final step to define the topology.

Let’s go through some of these topologies in more detail.

Master - slave topology

This is the most basic setup you can create with MySQL replication - one master and one or more slaves.

Such configuration gives you scale-out for reads as you can utilize your slaves to handle read-only queries and transactions. It also adds some degree of high availability in your setup - one of slaves can be promoted to master in case the current master becomes unavailable. We introduced an automatic failover mechanism in ClusterControl 1.4.

The master - slave topology is widely used to reduce load on the master by moving reads to slaves. Slaves can also be used to handle specific types of heavy traffic - for instance, backups or analytics/reporting servers. This topology can also be used to distribute data across different datacenters.

When it comes to multiple datacenters, this might be useful if users are spread across different regions. By moving data closer to the users, you will reduce network latency.

Master - master, active - standby

This is another common deployment pattern - two MySQL instances replicating to each other. One of them is taking writes, the second one is in standby mode. This setup can be used for scale-out, where we use the standby node for reads. But this is not where its strength lies. The most common use case of this setup is to deliver high availability. When the active master dies, the standby master takes over its role and starts accepting writes. When deploying this setup, you have to keep in mind that two nodes may not be enough to avoid split brain. Ideally you’d use a third node, for example a ClusterControl host, to detect the state of the cluster. A proxy, collocated with ClusterControl, should be used to direct traffic. Colocation ensures that both ClusterControl (which performs the failover) and proxy (which routes traffic) see the topology in the same way.

You may ask - what is the difference between this setup and master-slave? One way or the other, a failover has to be performed when the active master is down. There is one important difference - replication goes both ways. This can be used to self-heal the old master after failover. As soon as you determine that the old master is safe to take a “standby” role, you can just start it and, when using GTID, all missing transactions should be replicated to it without need for any action from user.

This feature is commonly used to simplify site switchover. Let’s say that you have two site locations - active and standby/disaster recovery (DR). The DR site is designed to take over the workload when something is not right with the main location. Imagine that some issue hit your main datacenter, something not necessarily related to the database - for instance, a problem with block storage on your web tier. As long as your backup site is not affected, you can easily (or not - it depends on how your app works) switch your traffic to the backup site. From the database perspective, this is fairly simple process. Especially if you use proxies like ProxySQL which can perform a failover that is transparent to the application. After such failover, your writes hit the old standby master which now acts as active. Everything is replicated back to the primary datacenter. So when the problem is solved, you can switch the traffic back without much issues. The data in both datacenters are up-to-date.

It is worth noting that ClusterControl also supports active - active type of master - master setups. It does not deploy such topology as we strongly discourage users from writing simultaneously on multiple masters. It does not help you to scale writes, and is potentially very tricky to maintain. Still, as long as you know what you are doing, ClusterControl will detect that both masters have read_only=off and will treat the topology accordingly.

Master - master with slaves

This is an extended version of the previous topology, it combines scale-out of master - slave(s) setup with easy of failover of master - master setup. Such complex setups are commonly used across datacenters, either forming a backup environment or being actively used for scale-out, keeping data close to the rest of the application.

Topology changes

Replication topologies are not static, they evolve with time. A slave can be promoted to master, different slaves can be slaving off different masters or intermediate masters. New slaves can be added. As you can see, deploying a replication topology is one thing. Maintaining it is something different. In ClusterControl 1.4, we added the ability to modify your topology.

On the above screenshot, you can see how ClusterControl sees a master - master topology with a few slaves. On the left panel, you can see list of nodes and their roles. We can see two multi-master nodes out of which one is writable (our active master). We can also see list of slaves (read copies). On the main panel, you can see a summary for the highlighted host: its IP, IP of its master and IPs of its slaves.

As we mentioned in our previous blog post, ClusterControl handles failover for you - it checks errant transactions, it lets slaves to catch up if needed. We still need a way to move our slaves around - you can find those options in the node’s drop-down list of actions:

What we are looking for are: “Promote Slave”, which does what it says - the chosen slave will become a master (as long as there is nothing which would prevent it from happening) and the remaining hosts will slave off it. More commonly used will be “Change Replication Master”, which gives you a way to slave the chosen node off another MySQL master. Once you pick this job and “Execute” it, you’ll be presented with following dialog box:

Here you need to pick a new master host for your node. Once that’s done, click “Proceed”. In our case, we picked the IP of one of the slaves which will end up as an intermediate master. Below you can see the status of our replication setup after reslaving finished. Please note that node 172.30.4.119 is marked as “Intermediate”. It’s worth noting that ClusterControl performs sanity checks when reslaving happens - it checks for errant transactions and ensures that the master switch won’t impact replication. You can read more about those safety measures in our blog post which covers failover and switchover process.

As you can see, deploying and managing replication setups is easy with ClusterControl 1.4. We encourage you to give it a try and see how efficiently you can handle your setups. If you have any feedback on it, let us know as we’d love to hear from you.

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Tags:

Moving your data into a public cloud service is a big decision. All the major cloud vendors offer cloud database services, with Amazon RDS for MySQL being probably the most popular.

In this blog, we’ll have a close look at what it is, how it works, and compare its pros and cons.

RDS (Relational Database Service) is an Amazon Web Services offering. In short, it is a Database as a Service, where Amazon deploys and operates your database. It takes care of tasks like backup and patching the database software, as well as high availability. A few databases are supported by RDS, we are here mainly interested in MySQL though - Amazon supports MySQL and MariaDB. There is also Aurora, which is Amazon’s clone of MySQL, improved, especially in area of replication and high availability.

Deploying MySQL via RDS

Let’s take a look at the deployment of MySQL via RDS. We picked MySQL and then we are presented with couple of deployment patterns to pick from.

Main choice is - do we want to have high availability or not? Aurora is also promoted.

Next dialog box gives us some options to customize. You can pick one of many MySQL versions - several 5.5, 5.6 and 5.7 versions are available. Database instance - you can choose from typical instance sizes available in a given region.

Next option is a pretty important choice - do you want to use multi-AZ deployment or not? This is all about high availability. If you don’t want to use multi-AZ deployment, a single instance will be installed. In case of failure, a new one will be spun up and its data volume will be remounted to it. This process takes some time, during which your database will not be available. Of course, you can minimize this impact by using slaves and promoting one of them, but it’s not an automated process. If you want to have automated high availability, you should use multi-AZ deployment. What will happen is that two database instances will be created. One is visible to you. A second instance, in a separate availability zone, is not visible to the user. It will act as a shadow copy, ready to take over the traffic once the active node fails. It is still not a perfect solution as traffic has to be switched from the failed instance to the shadow one. In our tests, it took ~45s to perform a failover but, obviously, it may depend on instance size, I/O performance etc. But it’s much better than non-automated failover where only slaves are involved.

Finally, we have storage settings - type, size, PIOPS (where applicable) and database settings - identifier, user and password.

In the next step, a few more options are waiting for user input.

We can choose where the instance should be created: VPC, subnet, should it be publicly available or not (as in - should a public IP be assigned to the RDS instance), availability zone and VPC Security Group. Then, we have database options: first schema to be created, port, parameter and option groups, whether metadata tags should be included in snapshots or not, encryption settings.

Next, backup options - how long do you want to keep your backups? When would you like them taken? Similar setup is related to maintenances - sometimes Amazon administrators have to perform maintenance on your RDS instance - it will happen within a predefined window which you can set here. Please note, there is no option not to pick at least 30 minutes for the maintenance window, that’s why having multi-AZ instance on production is really important. Maintenance may result in node restart or lack of availability for some time. Without multi-AZ, you need to accept that downtime. With multi-AZ deployment, failover happens.

Finally, we have settings related to additional monitoring - do we want to have it enabled or not?

Managing RDS

In this chapter we will take a closer look at how to manage MySQL RDS. We will not go through every option available out there, but we’d like to highlight some of the features Amazon made available.

Snapshots

MySQL RDS uses EBS volumes as storage, so it can use EBS snapshots for different purposes. Backups, slaves - all based on snapshots. You can create snapshots manually or they can be taken automatically, when such need arises. It is important to keep in mind that EBS snapshots, in general (not only on RDS instances), adds some overhead to I/O operations. If you want to take a snapshot, expect your I/O performance to drop. Unless you use multi-AZ deployment, that is. In such case, the “shadow” instance will be used as a source of snapshots and no impact will be visible on the production instance.

Backups

Backups are based on snapshots. As mentioned above, you can define your backup schedule and retention when you create a new instance. Of course, you can edit those settings afterwards, through the “modify instance” option.

At any time you can restore a snapshot - you need to go to the snapshot section, pick the snapshot you want to restore, and you will be presented with a dialog similar to the one you’ve seen when you created a new instance. This is not a surprise as you can only restore a snapshot into a new instance - there is no way to restore it on one of the existing RDS instances. It may come as a surprise, but even in cloud environment, it may make sense to reuse hardware (and instances you already have). In a shared environment, performance of a single virtual instance may differ - you may prefer to stick to the performance profile that you are already familiar with. Unfortunately, it’s not possible in RDS.

Another option in RDS is point-in-time recovery - very important feature, a requirement for anyone who need to take good care of her data. Here things are more complex and less bright. For starters, it’s important to keep in mind that MySQL RDS hides binary logs from the user. You can change a couple of settings and list created binlogs, but you don’t have direct access to them - to make any operation, including using them for recovery, you can only use the UI or CLI. This limits your options to what Amazon allows you to do, and it allows you to restore your backup up to the latest “restorable time” which happens to be calculated in 5 minutes interval. So, if your data has been removed at 9:33a, you can restore it only up to the state at 9:30a. Point-in-time recovery works the same way as restoring snapshots - a new instance is created.

Scale-out, replication

MySQL RDS allows scale-out through adding new slaves. When a slave is created, a snapshot of the master is taken and it is used to create a new host. This part works pretty well. Unfortunately, you cannot create any more complex replication topology like one involving intermediate masters. You are not able to create a master - master setup, which leaves any HA in the hands of Amazon (and multi-AZ deployments). From what we can tell, there is no way to enable GTID (not that you could benefit from it as you don’t have any control over the replication, no CHANGE MASTER in RDS), only regular, old-fashioned binlog positions.

Lack of GTID makes it not feasible to use multithreaded replication - while it is possible to set a number of workers using RDS parameter groups, without GTID this is unusable. Main issue is that there is no way to locate a single binary log position in case of a crash - some workers could have been behind, some could be more advanced. If you use the latest applied event, you’ll lose data that is not yet applied by those “lagging” workers. If you will use the oldest event, you’ll most likely end up with “duplicate key” errors caused by events applied by those workers which are more advanced. Of course, there is a way to solve this problem but it is not trivial and it is time-consuming - definitely not something you could easily automate.

Users created on MySQL RDS don’t have SUPER privilege so operations, which are simple in stand-alone MySQL, are not trivial in RDS. Amazon decided to use stored procedures to empower the user to do some of those operations. From what we can tell, a number of potential issues are covered although it hasn’t always been the case - we remember when you couldn’t rotate to the next binary log on the master. A master crash + binlog corruption could render all slaves broken - now there is a procedure for that: rds_next_master_log.

A slave can be manually promoted to a master. This would allow you to create some sort of HA on top of multi-AZ mechanism (or bypassing it) but it has been made pointless by the fact that you cannot reslave any of existing slaves to the new master. Remember, you don’t have any control over the replication. This makes the whole exercise futile - unless your master can accommodate all of your traffic. After promoting a new master, you are not able to failover to it because it does not have any slaves to handle your load. Spinning up new slaves will take time as EBS snapshots have to be created first and this may take hours. Then, you need to warm up the infrastructure before you can put load on it.

Lack of SUPER privilege

As we stated earlier, RDS does not grant users SUPER privilege and this becomes annoying for someone who is used to having it on MySQL. Take it for granted that, in the first weeks, you will learn how often it is required to do things that you do rather frequently - such as killing queries or operating the performance schema. In RDS, you will have to stick to predefined list of stored procedures and use them instead of doing things directly. You can list all of them using the following query:

SELECT specific_name FROM information_schema.routines;

As with replication, a number of tasks are covered but if you ended up in a situation which is not yet covered, then you’re out of luck.

Interoperability and Hybrid Cloud Setups

This is another area where RDS is lacking flexibility. Let’s say you want to build a mixed cloud/on-premises setup - you have a RDS infrastructure and you’d like to create a couple of slaves on premises. The main problem you’ll be facing is that there is no way to move data out of RDS except to take a logical dump. You can take snapshots of RDS data but you don’t have access to them and you cannot move them away from AWS. You also don’t have physical access to the instance to use xtrabackup, rsync or even cp. The only option for you is to use mysqldump, mydumper or similar tools. This adds complexity (character set and collation settings have a potential to cause problems) and is time-consuming (it takes long time to dump and load data using logical backup tools).

It is possible to setup replication between RDS and an external instance (in both ways, so migrating data into RDS is also possible), but it can be a very time-consuming process.

On the other hand, if you want to stay within an RDS environment and span your infrastructure across the atlantic or from east to west coast US, RDS allows you to do that - you can easily pick a region when you create a new slave.

Unfortunately, if you’d like to move your master from one region to the other, this is virtually not possible without downtime - unless your single node can handle all of your traffic.

Security

While MySQL RDS is a managed service, not every aspect related to security is taken care of by Amazon’s engineers. Amazon calls it “Shared Responsibility Model”. In short, Amazon takes care of the security of the network and storage layer (so that data is transferred in a secure way), operating system (patches, security fixes). On the other hand, user has to take care of the rest of the security model. Make sure traffic to and from RDS instance is limited within VPC, ensure that database level authentication is done right (no password-less MySQL user accounts), verify that API security is ensured (AMI’s are set correctly and with minimal required privileges). User should also take care of firewall settings (security groups) to minimize exposure of RDS and the VPC it’s in to external networks. It’s also the user’s responsibility to implement data at rest encryption - either on the application level or on the database level, by creating an encrypted RDS instance in the first place.

Database level encryption can be enabled only on the instance creation, you cannot encrypt an existing, already running database.

RDS limitations

If you plan to use RDS or if you are already using it, you need to be aware of limitations that come with MySQL RDS.

Lack of SUPER privilege can be, as we mentioned, very annoying. While stored procedures take care of a number of operations, it is a learning curve as you need to learn to do things in a different way. Lack of SUPER privilege can also create problems in using external monitoring and trending tools - there are still some tools which may require this priviledge for some part of its functionality.

Lack of direct access to MySQL data directory and logs makes it harder to perform actions which involves them. It happens every now and then that a DBA needs to parse binary logs or tail error, slow query or general log. While it is possible to access those logs on RDS, it is more cumbersome than doing whatever you need by logging into shell on the MySQL host. Downloading them locally also takes some time and adds additional latency to whatever you do.

Lack of control over replication topology, high availability only in multi-AZ deployments. Given that you don’t have a control over the replication, you cannot implement any kind of high availability mechanism into your database layer. It doesn’t matter that you have several slaves, you cannot use some of them as master candidates because even if you promote a slave to a master, there is no way to reslave the remaining slaves off this new master. This forces users to use multi-AZ deployments and increase costs (the “shadow” instance doesn’t come free, user has to pay for it).

Reduced availability through planned downtime. When deploying an RDS instance, you are forced to pick a weekly time window of 30 minutes during which maintenance operations may be executed on your RDS instance. On the one hand, this is understandable as RDS is a Database as a Service so hardware and software upgrades of your RDS instances are managed by AWS engineers. On the other hand, this reduce your availability because you cannot prevent your master database from going down for the duration of the maintenance period. Again, in this case using multi-AZ setup increases availability as changes happen first on the shadow instance and then failover is executed. Failover itself, though, is not transparent so, one way or the other, you lose the uptime. This forces you to design your app with unexpected MySQL master failures in mind. Not that it’s a bad design pattern - databases can crash at any time and your application should be built in a way it can withstand even the most dire scenario. It’s just that with RDS, you have limited options for high availability.

Reduced options for high availability implementation. Given the lack of flexibility in the replication topology management, the only feasible high availability method is multi-AZ deployment. This method is good but there are tools for MySQL replication which would minimize the downtime even further. For example, MHA or ClusterControl when used in connection with ProxySQL can deliver (under some conditions like lack of long running transactions) transparent failover process for the application. While on RDS, you won’t be able to use this method.

Reduced insight into performance of your database. While you can get metrics from MySQL itself, sometimes it’s just not enough to get a full 10k feet view of the situation. At some point, the majority of users will have to deal with really weird issues caused by faulty hardware or faulty infrastructure - lost network packets, abruptly terminated connections or unexpectedly high CPU utilization. When you have an access to your MySQL host, you can leverage lots of tools that help you to diagnose state of a Linux server. When using RDS, you are limited to what metrics are available in Cloudwatch, Amazon’s monitoring and trending tool. Any more detailed diagnosis require contacting support and asking them to check and fix the problem. This may be quick but it also can be a very long process with a lot of back and forth email communication.

Vendor lock-in caused by complex and time-consuming process of getting data out of the MySQL RDS. RDS doesn’t grant access to MySQL data directory so there is no way to utilize industry standard tools like xtrabackup to move data in a binary way. On the other hand, the RDS under the hood is a MySQL maintained by Amazon, it is hard to tell if it is 100% compatible with upstream or not. RDS is only available on AWS, so you would not be able to do a hybrid setup.

Summary

MySQL RDS has both strengths and weaknesses. This is a very good tool for those who’d like to focus on the application without having to worry about operating the database. You deploy a database and start issuing queries. No need for building backup scripts or setting up monitoring solution because it’s already done by AWS engineers - all you need to do is to use it.

There is also a dark side of the MySQL RDS. Lack of options to build more complex setups and scaling outside of just adding more slaves. Lack of support for better high availability than what’s proposed under multi-AZ deployments. Cumbersome access to MySQL logs. Lack of direct access to MySQL data directory and lack of support for physical backups, which makes it hard to move the data out of the RDS instance.

To sum it up, RDS may work fine for you if you value ease of use over detailed control of the database. You need to keep in mind that, at some point in the future, you may outgrow MySQL RDS. We are not necessarily talking here about performance only. It’s more about your organization’s needs for more complex replication topology or a need to have better insight into database operations to deal quickly with different issues that arise from time to time. In that case, if your dataset already has grown in size, you may find it tricky to move out of the RDS. Before making any decision to move your data into RDS, information managers must consider their organization's requirements and constraints in specific areas.

In next couple of blog posts we will show you how to take your data out of the RDS into a separate location. We will discuss both migration to EC2 and to on-premises infrastructure.

Tags:

Proxies are building blocks of high availability setups for MySQL and MariaDB. They can detect failed nodes and route queries to hosts which are still available. If your master failed and you had to promote one of your slaves, proxies will detect such topology changes and route your traffic accordingly. More advanced proxies can do much more: route traffic based on precise query rules, cache queries or mirror them. They can even be used to implement different types of sharding.

Introducing ProxySQL!

Join us for this live joint webinar with ProxySQL’s creator, René Cannaò, who will tell us more about this new proxy and its features. We will also show you how you can deploy ProxySQL using ClusterControl. And we will give you an early walk-through of some of the exciting ClusterControl features for ProxySQL that we have planned.

Date, Time & Registration

Europe/MEA/APAC

Tuesday, February 28th at 09:00 GMT (UK) / 10:00 CET (Germany, France, Sweden)

North America/LatAm

Tuesday, February 28th at 9:00 Pacific Time (US) / 12:00 Eastern Time (US)

Agenda

Introduction
ProxySQL concepts (René Cannaò)
- Hostgroups
- Query rules
- Connection multiplexing
- Configuration management
Demo of ProxySQL setup in ClusterControl (Krzysztof Książek)
Upcoming ClusterControl features for ProxySQL

Speakers

René Cannaò, Creator & Founder, ProxySQL. René has 10 years of working experience as a System, Network and Database Administrator mainly on Linux/Unix platform. In the last 4-5 years his experience was focused mainly on MySQL, working as Senior MySQL Support Engineer at Sun/Oracle and then as Senior Operational DBA at Blackbird, (formerly PalominoDB). In this period he built an analytic and problem solving mindset and he is always eager to take on new challenges, especially if they are related to high performance. And then he created ProxySQL …

Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.

We look forward to “seeing” you there and to insightful discussions!

If you have any questions or would like a personalised live demo, please do contact us.

Tags:

In our previous blog, we saw how easy it is to get started with RDS for MySQL. It is a convenient way to deploy and use MySQL, without worrying about operational overhead. The tradeoff though is reduced control, as users are entirely reliant on Amazon staff in case of poor performance or operational anomalies. No access to the data directory or physical backups makes it hard to move data out of RDS. This can be a major problem if your database outgrows RDS, and you decide to migrate to another platform. This two-part blog shows you how to do an online migration from RDS to your own MySQL server.

We’ll be using EC2 to run our own MySQL Server. It can be a first step for more complex migrations to your own private datacenters. EC2 gives you access to your data so xtrabackup can be used. EC2 also allows you to setup SSH tunnels and it removes requirement of setting up hardware VPN connections between your on-premises infrastructure and VPC.

Assumptions

Before we start, we need to make couple of assumptions - especially around security. First and foremost, we assume that RDS instance is not accessible from outside of AWS. We also assume that you have an application in EC2. This implies that either the RDS instance and the rest of your infrastructure shares a VPC or there is access configured between them, one way or the other. In short, we assume that you can create a new EC2 instance and it will have access (or it can be configured to have the access) to your MySQL RDS instance.

We have configured ClusterControl on the application host. We’ll use it to manage our EC2 MySQL instance.

Initial setup

In our case, the RDS instance shares the same VPC with our “application” (EC2 instance with IP 172.30.4.228) and host which will be a target for the migration process (EC2 instance with IP 172.30.4.238). As the application we are going to use tpcc-MySQL benchmark executed in the following way:

./tpcc_start -h rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com -d tpcc1000 -u tpcc -p tpccpass -w 20 -r 60 -l 600 -i 10 -c 4

Initial plan

We are going to perform a migration using the following steps:

Setup our target environment using ClusterControl - install MySQL on 172.30.4.238
Then, install ProxySQL, which we will use to manage our traffic at the time of failover
Dump the data from the RDS instance
Load the data into our target host
Set up replication between RDS instance and target host
Switchover traffic from RDS to target host

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Prepare environment using ClusterControl

Assuming we have ClusterControl installed (if you don’t you can grab it from: https://severalnines.com/download-clustercontrol-database-management-system), we need to setup our target host. We will use the deployment wizard from ClusterControl for that:

Deploying a Database Cluster in ClusterControl

Once this is done, you will see a new cluster (in this case, just your single server) in the cluster list:

Database Cluster in ClusterControl

Next step will be to install ProxySQL - starting from ClusterControl 1.4 you can do it easily from the UI. We covered this process in details in this blog post. When installing it, we picked our application host (172.30.4.228) as the host to install ProxySQL to. When installing, you also have to pick a host to route your traffic to. As we have only our “destination” host in the cluster, you can include it but then couple of changes are needed to redirect traffic to the RDS instance.

If you have chosen to include destination host (in our case it was 172.30.4.238) in the ProxySQL setup, you’ll see following entries in the mysql_servers table:

mysql> select * from mysql_servers\G
*************************** 1. row ***************************
       hostgroup_id: 20
           hostname: 172.30.4.238
               port: 3306
             status: ONLINE
             weight: 1
        compression: 0
    max_connections: 100
max_replication_lag: 10
            use_ssl: 0
     max_latency_ms: 0
            comment: read server
*************************** 2. row ***************************
       hostgroup_id: 10
           hostname: 172.30.4.238
               port: 3306
             status: ONLINE
             weight: 1
        compression: 0
    max_connections: 100
max_replication_lag: 10
            use_ssl: 0
     max_latency_ms: 0
            comment: read and write server
2 rows in set (0.00 sec)

ClusterControl configured ProxySQL to use hostgroups 10 and 20 to route writes and reads to the backend servers. We will have to remove the currently configured host from those hostgroups and add the RDS instance there. First, though, we have to ensure that ProxySQL’s monitor user can access the RDS instance.

mysql> SHOW VARIABLES LIKE 'mysql-monitor_username';
+------------------------+------------------+
| Variable_name          | Value            |
+------------------------+------------------+
| mysql-monitor_username | proxysql-monitor |
+------------------------+------------------+
1 row in set (0.00 sec)

mysql> SHOW VARIABLES LIKE 'mysql-monitor_password';
+------------------------+---------+
| Variable_name          | Value   |
+------------------------+---------+
| mysql-monitor_password | monpass |
+------------------------+---------+
1 row in set (0.00 sec)

We need to grant this user access to RDS. If we need it to track replication lag, the user would have to have then‘REPLICATION CLIENT’ privilege. In our case it is not needed as we don’t have slave RDS instance - ‘USAGE’ will be enough.

root@ip-172-30-4-228:~# mysql -ppassword -h rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 210
Server version: 5.7.16-log MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE USER 'proxysql-monitor'@172.30.4.228 IDENTIFIED BY 'monpass';
Query OK, 0 rows affected (0.06 sec)

Now it’s time to reconfigure ProxySQL. We are going to add the RDS instance to both writer (10) and reader (20) hostgroups. We will also remove 172.30.4.238 from those hostgroups - we’ll just edit them and add 100 to each hostgroup.

mysql> INSERT INTO mysql_servers (hostgroup_id, hostname, max_connections, max_replication_lag) VALUES (10, 'rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com', 100, 10);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO mysql_servers (hostgroup_id, hostname, max_connections, max_replication_lag) VALUES (20, 'rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com', 100, 10);
Query OK, 1 row affected (0.00 sec)

mysql> UPDATE mysql_servers SET hostgroup_id=110 WHERE hostname='172.30.4.238' AND hostgroup_id=10;
Query OK, 1 row affected (0.00 sec)

mysql> UPDATE mysql_servers SET hostgroup_id=120 WHERE hostname='172.30.4.238' AND hostgroup_id=20;
Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL SERVERS TO RUNTIME;
Query OK, 0 rows affected (0.01 sec)

mysql> SAVE MYSQL SERVERS TO DISK;
Query OK, 0 rows affected (0.07 sec)

Last step required before we can use ProxySQL to redirect our traffic is to add our application user to ProxySQL.

mysql> INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES ('tpcc', 'tpccpass', 1, 10);
Query OK, 1 row affected (0.00 sec)

mysql> LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK; SAVE MYSQL USERS TO MEMORY;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.05 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> SELECT username, password FROM mysql_users WHERE username='tpcc';
+----------+-------------------------------------------+
| username | password                                  |
+----------+-------------------------------------------+
| tpcc     | *8C446904FFE784865DF49B29DABEF3B2A6D232FC |
+----------+-------------------------------------------+
1 row in set (0.00 sec)

Quick note - we executed “SAVE MYSQL USERS TO MEMORY;” only to have password hashed not only in RUNTIME but also in working memory buffer. You can find more details about ProxySQL’s password hashing mechanism in their documentation.

We can now redirect our traffic to ProxySQL. How to do it depends on your setup, we just restarted tpcc and pointed it to ProxySQL.

Redirecting Traffic with ProxySQL

At this point, we have built a target environment to which we will migrate. We also prepared ProxySQL and configured it for our application to use. We now have a good foundation for the next step, which is the actual data migration. In the next post, we will show you how to copy the data out of RDS into our own MySQL instance (running on EC2). We will also show you how to switch traffic to your own instance while applications continue to serve users, without downtime.

Tags:

As we saw earlier, it might be challenging for companies to move their data out of RDS for MySQL. In the first part of this blog, we showed you how to set up your target environment on EC2 and insert a proxy layer (ProxySQL) between your applications and RDS. In this second part, we will show you how to do the actual migration of data to your own server, and then redirect your applications to the new database instance without downtime.

Copying data out of RDS

Once we have our database traffic running through ProxySQL, we can start preparations to copy our data out of RDS. We need to do this in order to set up replication between RDS and our MySQL instance running on EC2. Once this is done, we will configure ProxySQL to redirect traffic from RDS to our MySQL/EC2.

As we discussed in the first blog post in this series, the only way you can get data out of the RDS is via logical dump. Without access to the instance, we cannot use any hot, physical backup tools like xtrabackup. We cannot use snapshots either as there is no way to build anything else other than a new RDS instance from the snapshot.

We are limited to logical dump tools, therefore the logical option would be to use mydumper/myloader to process the data. Luckily, mydumper can create consistent backups so we can rely on it to provide binlog coordinates for our new slave to connect to. The main issue while building RDS replicas is binlog rotation policy - logical dump and load may take even days on larger (hundreds of gigabytes) datasets and you need to keep binlogs on the RDS instance for the duration of this whole process. Sure, you can increase binlog rotation retention on RDS (call mysql.rds_set_configuration('binlog retention hours', 24); - you can keep them up to 7 days) but it’s much safer to do it differently.

Before we proceed with taking a dump, we will add a replica to our RDS instance.

Amazon RDS Dashboard

Create Replica DB in RDS

Once we click on the “Create Read Replica” button, a snapshot will be started on the “master” RDS replica. It will be used to provision the new slave. The process may take hours, it all depends on the volume size, when was the last time a snapshot was taken and performance of the volume (io1/gp2? Magnetic? How many pIOPS a volume has?).

Master RDS Replica

When slave is ready (its status has changed to “available”), we can log into it using its RDS endpoint.

RDS Slave

Once logged in, we will stop replication on our slave - this will ensure the RDS master won’t purge binary logs and they will be still available for our EC2 slave once we complete our dump/reload process.

mysql> CALL mysql.rds_stop_replication;
+---------------------------+
| Message                   |
+---------------------------+
| Slave is down or disabled |
+---------------------------+
1 row in set (1.02 sec)

Query OK, 0 rows affected (1.02 sec)

Now, it’s finally time to copy data to EC2. First, we need to install mydumper. You can get it from github: https://github.com/maxbube/mydumper. The installation process is fairly simple and nicely described in the readme file, so we won’t cover it here. Most likely you will have to install a couple of packages (listed in the readme) and the harder part is to identify which package contains mysql_config - it depends on the MySQL flavor (and sometimes also MySQL version).

Once you have mydumper compiled and ready to go, you can execute it:

root@ip-172-30-4-228:~/mydumper# mkdir /tmp/rdsdump
root@ip-172-30-4-228:~/mydumper# ./mydumper -h rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com -p tpccpass -u tpcc  -o /tmp/rdsdump  --lock-all-tables --chunk-filesize 100 --events --routines --triggers
.

Please note --lock-all-tables which ensures that the snapshot of the data will be consistent and it will be possible to use it to create a slave. Now, we have to wait until mydumper complete its task.

One more step is required - we don’t want to restore the mysql schema but we need to copy users and their grants. We can use pt-show-grants for that:

root@ip-172-30-4-228:~# wget http://percona.com/get/pt-show-grants
root@ip-172-30-4-228:~# chmod u+x ./pt-show-grants
root@ip-172-30-4-228:~# ./pt-show-grants -h rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com -u tpcc -p tpccpass > grants.sql

Sample of pt-show-grants may look like this:

-- Grants for 'sbtest'@'%'
CREATE USER IF NOT EXISTS 'sbtest'@'%';
ALTER USER 'sbtest'@'%' IDENTIFIED WITH 'mysql_native_password' AS '*2AFD99E79E4AA23DE141540F4179F64FFB3AC521' REQUIRE NONE PASSWORD EXPIRE DEFAULT ACCOUNT UNLOCK;
GRANT ALTER, ALTER ROUTINE, CREATE, CREATE ROUTINE, CREATE TEMPORARY TABLES, CREATE USER, CREATE VIEW, DELETE, DROP, EVENT, EXECUTE, INDEX, INSERT, LOCK TABLES, PROCESS, REFERENCES, RELOAD, REPLICATION CLIENT, REPLICATION SLAVE, SELECT, SHOW DATABASES, SHOW VIEW, TRIGGER, UPDATE ON *.* TO 'sbtest'@'%';

It is up to you to pick what users are required to be copied onto your MySQL/EC2 instance. It doesn’t make sense to do it for all of them. For example, root users don’t have ‘SUPER’ privilege on RDS so it’s better to recreate them from scratch. What you need to copy are grants for your application user. We also need to copy users used by ProxySQL (proxysql-monitor in our case).

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Inserting data into your MySQL/EC2 instance

As stated above, we don’t want to restore system schemas. Therefore we will move files related to those schemas out of our mydumper directory:

root@ip-172-30-4-228:~# mkdir /tmp/rdsdump_sys/
root@ip-172-30-4-228:~# mv /tmp/rdsdump/mysql* /tmp/rdsdump_sys/
root@ip-172-30-4-228:~# mv /tmp/rdsdump/sys* /tmp/rdsdump_sys/

When we are done with it, it’s time to start to load data into the MySQL/EC2 instance:

root@ip-172-30-4-228:~/mydumper# ./myloader -d /tmp/rdsdump/ -u tpcc -p tpccpass -t 4 --overwrite-tables -h 172.30.4.238

Please note that we used four threads (-t 4) - make sure you set this to whatever makes sense in your environment. It’s all about saturating the target MySQL instance - either CPU or I/O, depending on the bottleneck. We want to squeeze as much out of it as possible to ensure we used all available resources for loading the data.

After the main data is loaded, there are two more steps to take, both are related to RDS internals and both may break our replication. First, RDS contains a couple of rds_* tables in the mysql schema. We want to load them in case some of them are used by RDS - replication will break if our slave won’t have them. We can do it in the following way:

root@ip-172-30-4-228:~/mydumper# for i in $(ls -alh /tmp/rdsdump_sys/ | grep rds | awk '{print $9}') ; do echo $i ;  mysql -ppass -uroot  mysql < /tmp/rdsdump_sys/$i ; done
mysql.rds_configuration-schema.sql
mysql.rds_configuration.sql
mysql.rds_global_status_history_old-schema.sql
mysql.rds_global_status_history-schema.sql
mysql.rds_heartbeat2-schema.sql
mysql.rds_heartbeat2.sql
mysql.rds_history-schema.sql
mysql.rds_history.sql
mysql.rds_replication_status-schema.sql
mysql.rds_replication_status.sql
mysql.rds_sysinfo-schema.sql

Similar problem is with timezone tables, we need to load them using data from the RDS instance:

root@ip-172-30-4-228:~/mydumper# for i in $(ls -alh /tmp/rdsdump_sys/ | grep time_zone | grep -v schema | awk '{print $9}') ; do echo $i ;  mysql -ppass -uroot  mysql < /tmp/rdsdump_sys/$i ; done
mysql.time_zone_name.sql
mysql.time_zone.sql
mysql.time_zone_transition.sql
mysql.time_zone_transition_type.sql

When all this is ready, we can setup replication between RDS (master) and our MySQL/EC2 instance (slave).

Setting up replication

Mydumper, when performing a consistent dump, writes down a binary log position. We can find this data in a file called metadata in the dump directory. Let’s take a look at it, we will then use the position to setup replication.

root@ip-172-30-4-228:~/mydumper# cat /tmp/rdsdump/metadata
Started dump at: 2017-02-03 16:17:29
SHOW SLAVE STATUS:
    Host: 10.1.4.180
    Log: mysql-bin-changelog.007079
    Pos: 10537102
    GTID:

Finished dump at: 2017-02-03 16:44:46

One last thing we lack is a user that we could use to setup our slave. Let’s create one on the RDS instance:

root@ip-172-30-4-228:~# mysql -ppassword -h rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com

mysql> CREATE USER IF NOT EXISTS 'rds_rpl'@'%' IDENTIFIED BY 'rds_rpl_pass';
Query OK, 0 rows affected (0.04 sec)

mysql> GRANT REPLICATION SLAVE ON *.* TO 'rds_rpl'@'%';
Query OK, 0 rows affected (0.01 sec)

Now it’s time to slave our MySQL/EC2 server off the RDS instance:

mysql> CHANGE MASTER TO MASTER_HOST='rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com', MASTER_USER='rds_rpl', MASTER_PASSWORD='rds_rpl_pass', MASTER_LOG_FILE='mysql-bin-changelog.007079', MASTER_LOG_POS=10537102;
Query OK, 0 rows affected, 2 warnings (0.03 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.02 sec)

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Queueing master event to the relay log
                  Master_Host: rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com
                  Master_User: rds_rpl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin-changelog.007080
          Read_Master_Log_Pos: 13842678
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 20448
        Relay_Master_Log_File: mysql-bin-changelog.007079
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 10557220
              Relay_Log_Space: 29071382
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 258726
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1237547456
                  Master_UUID: b5337d20-d815-11e6-abf1-120217bb3ac2
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: System lock
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.01 sec)

Last step will be to switch our traffic from the RDS instance to MySQL/EC2, but we need to let it catch up first.

When the slave has caught up, we need to perform a cutover. To automate it, we decided to prepare a short bash script which will connect to ProxySQL and do what needs to be done.

# At first, we define old and new masters
OldMaster=rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com
NewMaster=172.30.4.238

(
# We remove entries from mysql_replication_hostgroup so ProxySQL logic won’t interfere
# with our script

echo "DELETE FROM mysql_replication_hostgroups;"

# Then we set current master to OFFLINE_SOFT - this will allow current transactions to
# complete while not accepting any more transactions - they will wait (by default for
# 10 seconds) for a master to become available again.

echo "UPDATE mysql_servers SET STATUS='OFFLINE_SOFT' WHERE hostname=\"$OldMaster\";"
echo "LOAD MYSQL SERVERS TO RUNTIME;"
) | mysql -u admin -padmin -h 127.0.0.1 -P6032


# Here we are going to check for connections in the pool which are still used by
# transactions which haven’t closed so far. If we see that neither hostgroup 10 nor
# hostgroup 20 has open transactions, we can perform a switchover.

CONNUSED=`mysql -h 127.0.0.1 -P6032 -uadmin -padmin -e 'SELECT IFNULL(SUM(ConnUsed),0) FROM stats_mysql_connection_pool WHERE status="OFFLINE_SOFT" AND (hostgroup=10 OR hostgroup=20)' -B -N 2> /dev/null`
TRIES=0
while [ $CONNUSED -ne 0 -a $TRIES -ne 20 ]
do
  CONNUSED=`mysql -h 127.0.0.1 -P6032 -uadmin -padmin -e 'SELECT IFNULL(SUM(ConnUsed),0) FROM stats_mysql_connection_pool WHERE status="OFFLINE_SOFT" AND (hostgroup=10 OR hostgroup=20)' -B -N 2> /dev/null`
  TRIES=$(($TRIES+1))
  if [ $CONNUSED -ne "0" ]; then
    sleep 0.05
  fi
done

# Here is our switchover logic - we basically exchange hostgroups for RDS and EC2
# instance. We also configure back mysql_replication_hostgroups table.

(
echo "UPDATE mysql_servers SET STATUS='ONLINE', hostgroup_id=110 WHERE hostname=\"$OldMaster\" AND hostgroup_id=10;"
echo "UPDATE mysql_servers SET STATUS='ONLINE', hostgroup_id=120 WHERE hostname=\"$OldMaster\" AND hostgroup_id=20;"
echo "UPDATE mysql_servers SET hostgroup_id=10 WHERE hostname=\"$NewMaster\" AND hostgroup_id=110;"
echo "UPDATE mysql_servers SET hostgroup_id=20 WHERE hostname=\"$NewMaster\" AND hostgroup_id=120;"
echo "INSERT INTO mysql_replication_hostgroups VALUES (10, 20, 'hostgroups');"
echo "LOAD MYSQL SERVERS TO RUNTIME;"
) | mysql -u admin -padmin -h 127.0.0.1 -P6032

When all is done, you should see the following contents in the mysql_servers table:

mysql> select * from mysql_servers;
+--------------+-----------------------------------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+-------------+
| hostgroup_id | hostname                                      | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment     |
+--------------+-----------------------------------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+-------------+
| 20           | 172.30.4.238                                  | 3306 | ONLINE | 1      | 0           | 100             | 10                  | 0       | 0              | read server |
| 10           | 172.30.4.238                                  | 3306 | ONLINE | 1      | 0           | 100             | 10                  | 0       | 0              | read server |
| 120          | rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com | 3306 | ONLINE | 1      | 0           | 100             | 10                  | 0       | 0              |             |
| 110          | rds2.cvsw8xpajw2b.us-east-1.rds.amazonaws.com | 3306 | ONLINE | 1      | 0           | 100             | 10                  | 0       | 0              |             |
+--------------+-----------------------------------------------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+-------------+

On the application side, you should not see much of an impact, thanks to the ability of ProxySQL to queue queries for some time.

With this we completed the process of moving your database from RDS to EC2. Last step to do is to remove our RDS slave - it did its job and it can be deleted.

In our next blog post, we will build upon that. We will walk through a scenario in which we will move our database out of AWS/EC2 into a separate hosting provider.

Tags:

After attacks on MongoDB databases, we have recently also seen that MySQL servers are being targeted by ransomware. This should not come as a surprise, given the increasing adoption of public and private clouds. Running a poorly configured database in the cloud can become a major liability.

In this blog post, we’ll share with you a number of tips on how to protect and secure your MySQL or MariaDB servers.

Understanding the Attack Vector

Quoting SCMagazine:
“The attack starts with brute-forcing the root password for the MySQL database. Once logged in, the MySQL databases and tables are fetched. The attacker then creates a new table called ‘WARNING' that includes a contact email address, a bitcoin address and a payment demand.”

Based on the article, the attack vector starts by guessing the MySQL root password via brute-force method. Brute-force attack consists of an attacker trying many passwords or passphrases with the hope of eventually guessing correctly. This means short passwords can usually be discovered quite quickly, but longer passwords may take days or months.

Brute-force is a common attack that would happen to any service. Unfortunately for MySQL (and many other DBMS), there is no out-of-the-box feature that detects and blocks brute-force attacks from specific addresses during user authentication. MySQL does capture authentication failures in the error log though.

Review your Password Policy

Reviewing the MySQL password policy is always the first step to protect your server. MySQL root password should be strong enough with combination of alphabets, numbers and symbols (which makes it harder to remember) and stored in a safe place. Change the password regularly, at least every calendar quarter. Based on the attack vector, this is the weakest point that hackers target. If you value your data, don’t overlook this part.

MySQL deployments performed by ClusterControl will always follow the vendor’s security best practices, for example there will be no wildcard host defined during GRANT and sensitive login credentials stored in the configuration file is permissible only to OS’s root user. We strongly recommend our users to specify a strong password during the deployment stage.

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Isolate the MySQL Server

In a standard production environment, database servers are usually located in a lower level tier. This layer should be protected and only accessible from the upper tier, such as application or load balancer. If the database is co-located with the application, you can even lockdown against non-local addresses and use MySQL socket file instead (less overhead and more secure).

Configuring the "bind-address" parameter is vital here. Take note that MySQL binding is limited to either none, one or all IP addresses (0.0.0.0) on the server. If you have no choice and need MySQL to listen to all network interfaces, restrict the access to the MySQL service from known good sources. Use a firewall application or security group to whitelist access only from hosts that need to access the database directly.

Sometimes, the MySQL server has to be exposed to a public network for integration purposes (e.g, monitoring, auditing, backup etc). That’s fine as long as you draw a border around it. Don’t let unwanted sources to “see” the MySQL server. You can bet how many people in the world know 3306 is the default port for MySQL service, and by simply performing a port scan against a network address, an attacker can create a list of exposed MySQL servers in the subnet in less than a minute. Advisedly, use a custom MySQL port by configuring the "port" parameter in the MySQL configuration file to minimize the exposure risk.

Review the User Policy

Limit certain users to hold the critical administration rights, especially GRANT, SUPER and PROCESS. You can also enable super_read_only if the server is a slave, only available on MySQL 5.7.8 and Percona Server 5.6.21 and later (sadly not with MariaDB). When enabled, the server will not allow any updates, beside updating the replication repositories if slave status logs are tables, even for the users that have SUPER privilege. Remove the default test database and any users with empty passwords to narrow the scope of penetration. This is one of the security checks performed by ClusterControl, implemented as a database advisor.

It’s also a good idea to restrict the number of connections permitted to a single account. You can do so by setting the max_user_connections variable in mysqld (default is 0, equal to unlimited) or use the resource control options in GRANT/CREATE USER/ALTER USER statements. The GRANT statement supports limiting the number of simultaneous connections to the server by an account, for example:

mysql> GRANT ALL PRIVILEGES ON db.* TO 'db_user'@'localhost' WITH MAX_USER_CONNECTIONS 2;

Create MySQL account with MAX_USER_CONNECTIONS resource control option using ClusterControl

The default administrator username on the MySQL server is “root”. Hackers often attempt to gain access to its permissions. To make this task much harder, rename “root” to something else. MySQL user names can be up to 32 characters long (16 characters before MySQL 5.7.8). It is possible to use a longer username for the super admin user by using the RENAME statement as shown below:

mysql> RENAME USER root TO new_super_administrator_username;

A side note for ClusterControl users, ClusterControl needs to know the MySQL root user and password to automate and manage the database server for you. By default, it will look for ‘root’. If you rename the root user to something else, specify “monitored_mysql_root_user={new_user}” inside cmon_X.cnf (where X is the cluster ID) and restart CMON service to apply the change.

Backup Policy

Even though the hackers stated that you would get your data back once the ransom is paid, this was usually not the case. Increasing the backup frequency would increase the possibility to restore your deleted data. For example, instead of a full backup once a week with daily incremental backup, you can schedule a full backup once a day with hourly incremental backup. You can do this easily with ClusterControl’s backup management feature, and restore your data if something goes wrong.

If you have binary logs (binlogs) enabled, that’s even better. You can create a full backup every day and backup the binary logs. Binlogs are important for point-in-time recovery and should be backed up regularly as part of your backup procedure. DBAs tend to miss this simple method, which is worth every cent. In case if you got hacked, you can always recover to the last point before it happened, provided the hackers did not purge the binary logs. Take note that binary logs purging is only possible when the attacker has SUPER privilege.

One more important thing is that the backup files must be restorable. Verify the backups every now and then, and avoid bad surprises when you need to restore.

Safeguard your Web/Application Server

Well, if you have isolated your MySQL servers, there are still chances for the attackers to access them via web or application server. By injecting a malicious script (e.g, Cross-Site Scripting, SQL injection) against the target website, one can get into the application directory, and have the ability to read the application files. These might contain sensitive information, for instance, the database login credentials. By looking at this, an attacker can simply log into the database, delete all tables and leave a “ransom” table inside. It doesn’t necessarily have to be a MySQL root user to ransom a victim.

There are thousands of ways to compromise a web server and you can’t really close the inbound port 80 or 443 for this purpose. Another layer of protection is required to safeguard your web server from HTTP-based injections. You can use Web Application Firewall (WAF) like Apache ModSecurity, NAXSI (WAF for nginx), WebKnight (WAF for IIS) or simply running your web servers in a secure Content Delivery Network (CDN) like CloudFlare, Akamai or Amazon CloudFront.

Always Keep Up-to-date

You have probably heard about the critical zero-day MySQL exploit, where a non-privileged user can escalate itself to super user? It sounds scary. Luckily, all known vendors has updated their repository to include a bug fix for this issue.

For production use, it’s highly recommended for you to install the MySQL/MariaDB packages from the vendor’s repository. Don’t rely on the default operating system repository, where the packages are usually outdated. If you are running in a cluster environment like Galera Cluster, or even MySQL Replication, you always have the choice to patch the system with minimal downtime. Make this into a routine and try to automate the upgrade procedure as much as possible.

ClusterControl supports minor version rolling upgrade (one node at a time) for MySQL/MariaDB with a single click. Major versions upgrade (e.g, from MySQL 5.6 to MySQL 5.7) commonly requires uninstallation of the existing packages and it is a risky task to automate. Careful planning and testing is necessary for such kind of upgrade.

Conclusion

Ransomware is an easy-money gold pot. We will probably see more security breaches in the future, and it is better to take action before something happens. Hackers are targeting many vulnerable servers out there, and very likely this attack will spread to other database technologies as well. Protecting your data is a constant challenge for database administrators. The real enemy is not the offender, but our attitude towards protecting our critical assets.

Tags:

For years, MySQL replication used to be based on binary log events - all a slave knew was the exact event and the exact position it just read from the master. Any single transaction from a master may have ended in different binary logs, and in different positions in these logs. It was a simple solution that came with limitations - more complex topology changes could require an admin to stop replication on the hosts involved. Or these changes could cause some other issues, e.g., a slave couldn’t be moved down the replication chain without time-consuming rebuild process (we couldn’t easily change replication from A -> B -> C to A -> C -> B without stopping replication on both B and C). We’ve all had to work around these limitations while dreaming about a global transaction identifier.

GTID was introduced along with MySQL 5.6, and brought along some major changes in the way MySQL operates. First of all, every transaction has an unique identifier which identifies it in a same way on every server. It’s not important anymore in which binary log position a transaction was recorded, all you need to know is the GTID: ‘966073f3-b6a4-11e4-af2c-080027880ca6:4’. GTID is built from two parts - the unique identifier of a server where a transaction was first executed, and a sequence number. In the above example, we can see that the transaction was executed by the server with server_uuid of ‘966073f3-b6a4-11e4-af2c-080027880ca6’ and it’s 4th transaction executed there. This information is enough to perform complex topology changes - MySQL knows which transactions have been executed and therefore it knows which transactions need to be executed next. Forget about binary logs, it’s all in the GTID.

So, where can you find GTID’s? You’ll find them in two places. On a slave, in ‘show slave status;’ you’ll find two columns: Retrieved_Gtid_Set and Executed_Gtid_Set. First one covers GTID’s which were retrieved from the master via replication, the second informs about all transactions which were executed on given host - both via replication or executed locally.

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Setting up a Replication Cluster the easy way

Deployment of MySQL replication cluster is very easy in ClusterControl (you can try it for free). The only prerequisite is that all hosts, which you will use to deploy MySQL nodes to, can be accessed from the ClusterControl instance using passwordless SSH connection.

When connectivity is in place, you can deploy a cluster by using the “Deploy” option. When the wizard window is open, you need to make couple of decisions - what do you want to do? Deploy a new cluster? Deploy a Postgresql node or import existing cluster.

We want to deploy a new cluster. We will then be presented with following screen in which we need to decide what type of cluster we want to deploy. Let’s pick replication and then pass the required details about ssh connectivity.

When ready, click on Continue. This time we need to decide which MySQL vendor we’d like to use, what version and couple of configuration settings including, among others, password for the root account in MySQL.

Finally, we need to decide on the replication topology - you can either use a typical master - slave setup or create more complex, active - standby master - master pair (+ slaves should you want to add them). Once ready, just click on “Deploy” and in couple of minutes you should have your cluster deployed.

Once this is done, you will see your cluster in the cluster list of ClusterControl’s UI.

Having the replication up and running we can take a closer look at how GTID works.

Errant transactions - what is the issue?

As we mentioned at the beginning of this post, GTID’s brought a significant change in the way people should think about MySQL replication. It’s all about habits. Let’s say, for some reason, that an application performed a write on one of the slaves. It shouldn’t have happened but surprisingly, it happens all the time. As a result, replication stops with duplicate key error. There are couple of ways to deal with such problem. One of them would be to delete the offending row and restart replication. Other one would be to skip the binary log event and then restart replication.

STOP SLAVE SQL_THREAD; SET GLOBAL sql_slave_skip_counter = 1; START SLAVE SQL_THREAD;

Both ways should bring replication back to work, but they may introduce data drift so it is necessary to remember that slave consistency should be checked after such event (pt-table-checksum and pt-table-sync works well here).

If a similar problem happens while using GTID, you’ll notice some differences. Deleting the offending row may seem to fix the issue, replication should be able to commence. The other method, using sql_slave_skip_counter won’t work at all - it’ll return an error. Remember, it’s now not about binlog events, it’s all about GTID being executed or not.

Why deleting the row only ‘seems’ to fix the issue? One of the most important things to keep in mind regarding GTID is that a slave, when connecting to the master, checks if it is missing any transactions which were executed on the master. These are called errant transactions. If a slave finds such transactions, it will execute them. Let’s assume we ran following SQL to clear an offending row:

DELETE FROM mytable WHERE id=100;

Let’s check show slave status:

                  Master_UUID: 966073f3-b6a4-11e4-af2c-080027880ca6
           Retrieved_Gtid_Set: 966073f3-b6a4-11e4-af2c-080027880ca6:1-29
            Executed_Gtid_Set: 84d15910-b6a4-11e4-af2c-080027880ca6:1,
966073f3-b6a4-11e4-af2c-080027880ca6:1-29,

And see where the 84d15910-b6a4-11e4-af2c-080027880ca6:1 comes from:

mysql> SHOW VARIABLES LIKE 'server_uuid'\G
*************************** 1. row ***************************
Variable_name: server_uuid
        Value: 84d15910-b6a4-11e4-af2c-080027880ca6
1 row in set (0.00 sec)

As you can see, we have 29 transactions that came from the master, UUID of 966073f3-b6a4-11e4-af2c-080027880ca6 and one that was executed locally. Let’s say that at some point we failover and the master (966073f3-b6a4-11e4-af2c-080027880ca6) becomes a slave. It will check its list of executed GTID’s and will not find this one: 84d15910-b6a4-11e4-af2c-080027880ca6:1. As a result, the related SQL will be executed:

DELETE FROM mytable WHERE id=100;

This is not something we expected… If, in the meantime, the binlog containing this transaction would be purged on the old slave, then the new slave will complain after failover:

                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

How to detect errant transactions?

MySQL provides two functions which come in very handy when you want to compare GTID sets on different hosts.

GTID_SUBSET() takes two GTID sets and checks if the first set is a subset of the second one.

Let’s say we have following state.

Master:

mysql> show master status\G
*************************** 1. row ***************************
             File: binlog.000002
         Position: 160205927
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 8a6962d2-b907-11e4-bebc-080027880ca6:1-153,
9b09b44a-b907-11e4-bebd-080027880ca6:1,
ab8f5793-b907-11e4-bebd-080027880ca6:1-2
1 row in set (0.00 sec)

Slave:

mysql> show slave status\G
[...]
           Retrieved_Gtid_Set: 8a6962d2-b907-11e4-bebc-080027880ca6:1-153,
9b09b44a-b907-11e4-bebd-080027880ca6:1
            Executed_Gtid_Set: 8a6962d2-b907-11e4-bebc-080027880ca6:1-153,
9b09b44a-b907-11e4-bebd-080027880ca6:1,
ab8f5793-b907-11e4-bebd-080027880ca6:1-4

We can check if the slave has any errant transactions by executing the following SQL:

mysql> SELECT GTID_SUBSET('8a6962d2-b907-11e4-bebc-080027880ca6:1-153,ab8f5793-b907-11e4-bebd-080027880ca6:1-4', '8a6962d2-b907-11e4-bebc-080027880ca6:1-153, 9b09b44a-b907-11e4-bebd-080027880ca6:1, ab8f5793-b907-11e4-bebd-080027880ca6:1-2') as is_subset\G
*************************** 1. row ***************************
is_subset: 0
1 row in set (0.00 sec)

Looks like there are errant transactions. How do we identify them? We can use another function, GTID_SUBTRACT()

mysql> SELECT GTID_SUBTRACT('8a6962d2-b907-11e4-bebc-080027880ca6:1-153,ab8f5793-b907-11e4-bebd-080027880ca6:1-4', '8a6962d2-b907-11e4-bebc-080027880ca6:1-153, 9b09b44a-b907-11e4-bebd-080027880ca6:1, ab8f5793-b907-11e4-bebd-080027880ca6:1-2') as mising\G
*************************** 1. row ***************************
mising: ab8f5793-b907-11e4-bebd-080027880ca6:3-4
1 row in set (0.01 sec)

Our missing GTID’s are ab8f5793-b907-11e4-bebd-080027880ca6:3-4 - those transactions were executed on the slave but not on the master.

How to solve issues caused by errant transactions?

There are two ways - inject empty transactions or exclude transactions from GTID history.

To inject empty transactions we can use the following SQL:

mysql> SET gtid_next='ab8f5793-b907-11e4-bebd-080027880ca6:3';
Query OK, 0 rows affected (0.01 sec)

mysql> begin ; commit;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

mysql> SET gtid_next='ab8f5793-b907-11e4-bebd-080027880ca6:4';
Query OK, 0 rows affected (0.00 sec)

mysql> begin ; commit;
Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.01 sec)

mysql> SET gtid_next=automatic;
Query OK, 0 rows affected (0.00 sec)

This has to be executed on every host in the replication topology that does not have those GTID’s executed. If the master is available, you can inject those transactions there and let them replicate down the chain. If the master is not available (for example, it crashed), those empty transactions have to be executed on every slave. Oracle developed a tool called mysqlslavetrx which is designed to automate this process.

Another approach is to remove the GTID’s from history:

Stop slave:

mysql> STOP SLAVE;

Print Executed_Gtid_Set on the slave:

mysql> SHOW MASTER STATUS\G

Reset GTID info:

RESET MASTER;

Set GTID_PURGED to a correct GTID set. based on data from SHOW MASTER STATUS. You should exclude errant transactions from the set.

SET GLOBAL GTID_PURGED='8a6962d2-b907-11e4-bebc-080027880ca6:1-153, 9b09b44a-b907-11e4-bebd-080027880ca6:1, ab8f5793-b907-11e4-bebd-080027880ca6:1-2';

Start slave.

mysql> START SLAVE\G

In every case, you should verify consistency of your slaves using pt-table-checksum and pt-table-sync (if needed) - errant transaction may result in a data drift.

Failover in ClusterControl

Starting from version 1.4, ClusterControl enhanced its failover handling processes for MySQL Replication. You can still perform a manual master switch by promoting one of the slaves to master. The rest of the slaves will then fail-over to the new master. From version 1.4, ClusterControl also have the ability to perform a fully-automated failover should the master fail. We covered it in-depth in a blog post describing ClusterControl and automated failover. We’d still like to mention one feature, directly related to the topic of this post.

By default, ClusterControl performs failover in a “safe way” - at the time of failover (or switchover, if it’s the user who executed a master switch), ClusterControl picks a master candidate and then verifies that this node does not have any errant transactions which would impact replication once it is promoted to master. If an errant transaction is detected, ClusterControl will stop the failover process and the master candidate will not be promoted to become a new master.

If you want to be 100% certain that ClusterControl will promote a new master even if some issues (like errant transactions) are detected, you can do that using the replication_stop_on_error=0 setting in cmon configuration. Of course, as we discussed, it may lead to problems with replication - slaves may start asking for a binary log event which is not available anymore.

To handle such cases, we added experimental support for slave rebuilding. If you set replication_auto_rebuild_slave=1 in the cmon configuration and your slave is marked as down with the following error in MySQL, ClusterControl will attempt to rebuild the slave using data from the master:

Such a setting may not always be appropriate as the rebuilding process will induce an increased load on the master. It may also be that your dataset is very large and a regular rebuild is not an option - that’s why this behavior is disabled by default.

Tags:

Replication is one of the most common ways to achieve high availability for MySQL and MariaDB. It has become much more robust with the addition of GTIDs, and is thoroughly tested by thousands and thousands of users. MySQL Replication is not a ‘set and forget’ property though, it needs to be monitored for potential issues and maintained so it stays in good shape. In this blog post, we’d like to share some tips and tricks on how to maintain, troubleshoot and fix issues with MySQL replication.

How to determine if MySQL replication is in a good shape?

This is hands down the most important skill that anyone taking care of a MySQL replication setup has to possess. Let’s take a look at where to look for information about the state of replication. There is a slight difference between MySQL and MariaDB and we will discuss this as well.

SHOW SLAVE STATUS

This is hands down the most common method of checking the state of replication on a slave host - it’s with us since always and it’s usually the first place where we go if we expect that there is some issue with replication.

mysql> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.0.101
                  Master_User: rpl_user
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: binlog.000002
          Read_Master_Log_Pos: 767658564
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 405
        Relay_Master_Log_File: binlog.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 767658564
              Relay_Log_Space: 606
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 5d1e2227-07c6-11e7-8123-080027495a77
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set: 5d1e2227-07c6-11e7-8123-080027495a77:1-394233
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

Some details may differ between MySQL and MariaDB but the majority of the content will look the same. Changes will be visible in the GTID section as MySQL and MariaDB do it in a different way. From SHOW SLAVE STATUS, you can derive some pieces of information - which master is used, which user and which port is used to connect to the master. We have some data about the current binary log position (not that important anymore as we can use GTID and forget about binlogs) and the state of SQL and I/O replication threads. Then you can see if and how filtering is configured. You can also find some information about errors, replication lag, SSL settings and GTID. The example above comes from MySQL 5.7 slave which is in a healthy state. Let’s take a look at some example where replication is broken.

MariaDB [test]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.0.104
                  Master_User: rpl_user
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: binlog.000003
          Read_Master_Log_Pos: 636
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 765
        Relay_Master_Log_File: binlog.000003
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1032
                   Last_Error: Could not execute Update_rows_v1 event on table test.tab; Can't find record in 'tab', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000003, end_log_pos 609
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 480
              Relay_Log_Space: 1213
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1032
               Last_SQL_Error: Could not execute Update_rows_v1 event on table test.tab; Can't find record in 'tab', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000003, end_log_pos 609
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 0-1-73243
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative
1 row in set (0.00 sec)

This sample is taken from MariaDB 10.1, you can see changes at the bottom of the output to make it work with MariaDB GTID’s. What’s important for us is the error - you can see that something is not right in the SQL thread:

Last_SQL_Error: Could not execute Update_rows_v1 event on table test.tab; Can't find record in 'tab', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000003, end_log_pos 609

We will discuss this particular problem later, for now it’s enough that you will see how you can check if there are any errors in the replication using SHOW SLAVE STATUS.

Another important information that comes from SHOW SLAVE STATUS is - how badly our slave lags. You can check it in “Seconds_Behind_Master” column. This metric is especially important to track if you know your application to be sensitive when it comes to stale reads.

In ClusterControl you can track this data in the “Overview” section:

We made visible all of the most important pieces of information from SHOW SLAVE STATUS command. You can check the status of the replication, who is master, if there is a replication lag or not, binary log positions. You can also find retrieved and executed GTID’s.

Performance Schema

Another place you can look for the information about replication is the performance_schema. This applies only to Oracle’s MySQL 5.7 - earlier versions and MariaDB don’t collect this data.

mysql> SHOW TABLES FROM performance_schema LIKE 'replication%';
+---------------------------------------------+
| Tables_in_performance_schema (replication%) |
+---------------------------------------------+
| replication_applier_configuration           |
| replication_applier_status                  |
| replication_applier_status_by_coordinator   |
| replication_applier_status_by_worker        |
| replication_connection_configuration        |
| replication_connection_status               |
| replication_group_member_stats              |
| replication_group_members                   |
+---------------------------------------------+
8 rows in set (0.00 sec)

Below you can find some examples of data available in some of those tables.

mysql> select * from replication_connection_status\G
*************************** 1. row ***************************
             CHANNEL_NAME:
               GROUP_NAME:
              SOURCE_UUID: 5d1e2227-07c6-11e7-8123-080027495a77
                THREAD_ID: 32
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 1
 LAST_HEARTBEAT_TIMESTAMP: 2017-03-17 19:41:34
 RECEIVED_TRANSACTION_SET: 5d1e2227-07c6-11e7-8123-080027495a77:715599-724966
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
1 row in set (0.00 sec)

mysql> select * from replication_applier_status_by_worker\G
*************************** 1. row ***************************
         CHANNEL_NAME:
            WORKER_ID: 0
            THREAD_ID: 31
        SERVICE_STATE: ON
LAST_SEEN_TRANSACTION: 5d1e2227-07c6-11e7-8123-080027495a77:726086
    LAST_ERROR_NUMBER: 0
   LAST_ERROR_MESSAGE:
 LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
1 row in set (0.00 sec)

As you can see, we can verify the state of the replication, last error, received transaction set and some more data. What’s important - if you enabled multi-threaded replication, in replication_applier_status_by_worker table, you will see the state of every single worker - this helps you understand the state of replication for each of the worker threads.

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

Replication Lag

Lag is definitely one of the most common problems you’ll be facing when working with MySQL replication. Replication lag shows up when one of the slaves is unable to keep up with the amount of write operations performed by the master. Reasons could be different - different hardware configuration, heavier load on the slave, high degree of write parallelization on master which has to be serialized (when you use single thread for the replication) or the writes cannot be parallelized to the same extent as it has been on the master (when you use multi-threaded replication).

How to detect it?

There are couple of methods to detect the replication lag. First of all, you may check “Seconds_Behind_Master” in the SHOW SLAVE STATUS output - it will tell you if the slave is lagging or not. It works well in most of the cases but in more complex topologies, when you use intermediate masters, on hosts somewhere low in the replication chain, it may be not precise. Another, better, solution is to rely on external tools like pt-heartbeat. Idea is simple - a table is created with, amongst others, a timestamp column. This column is updated on the master at regular intervals. On a slave, you can then compare the timestamp from that column with current time - it will tell you how far behind the slave is.

Regardless of the way you calculate the lag, make sure your hosts are in sync time-wise. Use ntpd or other means of time syncing - if there is a time drift, you will see “false” lag on your slaves.

How to reduce lag?

This is not an easy question to answer. In short, it depends on what is causing the lag, and what became a bottleneck. There are two typical patterns - slave is I/O bound, which means that its I/O subsystem can’t cope with amount of write and read operations. Second - slave is CPU-bound, which means that replication thread uses all CPU it can (one thread can use only one CPU core) and it’s still not enough to handle all write operations.

When CPU is a bottleneck, the solution can be as simple as to use multi-threaded replication. Increase the number of working threads to allow higher parallelization. It is not always possible though - in such case you may want to play a bit with group commit variables (for both MySQL and MariaDB) to delay commits for a slight period of time (we are talking about milliseconds here) and, in this way, increase parallelization of commits.

If the problem is in the I/O, the problem is a bit harder to solve. Of course, you should review your InnoDB I/O settings - maybe there is room for improvements. If my.cnf tuning won’t help, you don’t have too many options - improve your queries (wherever it’s possible) or upgrade your I/O subsystem to something more capable.

Most of the proxies (for example, all proxies which can be deployed from ClusterControl: ProxySQL, HAProxy and MaxScale) give you a possibility to remove a slave out of rotation if replication lag crosses some predefined threshold. This is by no means a method to reduce lag but it can be helpful to avoid stale reads and, as a side effect, to reduce the load on a slave which should help it to catch up.

Of course, query tuning can be a solution in both cases - it’s always good to improve queries which are CPU or I/O heavy.

Errant Transactions

Errant transactions are transactions which have been executed on a slave only, not on the master. In short, they make a slave inconsistent with the master. When using GTID-based replication, this can cause serious troubles if the slave is promoted to a master. We have an in-depth post on this topic and we encourage you to look into it and get familiar with how to detect and fix issues with errant transactions. We also included there information how ClusterControl detects and handles errant transactions.

No Binlog File on the Master

How to identify the problem?

Under some circumstances, it may happen that a slave connects to a master and asks for non-existing binary log file. One reason for this could be the errant transaction - at some point in time, a transaction has been executed on a slave and later this slave becomes a master. Other hosts, which are configured to slave off that master, will ask for that missing transaction. If it was executed a long time ago, there is a chance that binary log files have already been purged.

Another, more typical, example - you want to provision a slave using xtrabackup. You copy the backup on a host, apply the log, change the owner of MySQL data directory - typical operations you do to restore a backup. You execute

SET GLOBAL gtid_purged=

based on the data from xtrabackup_binlog_info and you run CHANGE MASTER TO … MASTER_AUTO_POSITION=1 (this is in MySQL, MariaDB has a slightly different process), start the slave and then you end up with an error like:

                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

in MySQL or:

                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find GTID state requested by slave in any binlog files. Probably the slave state is too old and required binlog files have been purged.'

in MariaDB.

This basically means that the master doesn’t have all binary logs needed to execute all missing transactions. Most likely, the backup is too old and the master already purged some of binary logs created between the time when backup was created and when the slave was provisioned.

How to solve this problem?

Unfortunately, there’s not much you can do in this particular case. If you have some MySQL hosts which store binary logs for longer time than the master, you can try to use those logs to replay missing transactions on the slave. Let’s take a look how it can be done.

First of all, let’s take a look at the oldest GTID in the master’s binary logs:

mysql> SHOW BINARY LOGS\G
*************************** 1. row ***************************
 Log_name: binlog.000021
File_size: 463
1 row in set (0.00 sec)

So, ‘binlog.000021’ is the latest (and only) file. Let’s check what’s the first GTID entry in this file:

root@master:~# mysqlbinlog /var/lib/mysql/binlog.000021
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#170320 10:39:51 server id 1  end_log_pos 123 CRC32 0x5644fc9b     Start: binlog v 4, server v 5.7.17-11-log created 170320 10:39:51
# Warning: this binlog is either in use or was not closed properly.
BINLOG '
d7HPWA8BAAAAdwAAAHsAAAABAAQANS43LjE3LTExLWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA
AZv8RFY='/*!*/;
# at 123
#170320 10:39:51 server id 1  end_log_pos 194 CRC32 0x5c096d62     Previous-GTIDs
# 5d1e2227-07c6-11e7-8123-080027495a77:1-1106668
# at 194
#170320 11:21:26 server id 1  end_log_pos 259 CRC32 0xde21b300     GTID    last_committed=0    sequence_number=1
SET @@SESSION.GTID_NEXT= '5d1e2227-07c6-11e7-8123-080027495a77:1106669'/*!*/;
# at 259

As we can see, the oldest binary log entry that’s available is: 5d1e2227-07c6-11e7-8123-080027495a77:1106669

We need also to check what’s the last GTID covered in the backup:

root@slave1:~# cat /var/lib/mysql/xtrabackup_binlog_info
binlog.000017    194    5d1e2227-07c6-11e7-8123-080027495a77:1-1106666

It is: 5d1e2227-07c6-11e7-8123-080027495a77:1-1106666 so we lack two events:
5d1e2227-07c6-11e7-8123-080027495a77:1106667-1106668

Let’s see if we can find those transactions on other slave.

mysql> SHOW BINARY LOGS;
+---------------+------------+
| Log_name      | File_size  |
+---------------+------------+
| binlog.000001 | 1074130062 |
| binlog.000002 |  764366611 |
| binlog.000003 |  382576490 |
+---------------+------------+
3 rows in set (0.00 sec)

It seems that ‘binlog.000003’ is the latest binary log. We need to check if our missing GTID’s can be found in it:

slave2:~# mysqlbinlog /var/lib/mysql/binlog.000003 | grep "5d1e2227-07c6-11e7-8123-080027495a77:110666[78]"
SET @@SESSION.GTID_NEXT= '5d1e2227-07c6-11e7-8123-080027495a77:1106667'/*!*/;
SET @@SESSION.GTID_NEXT= '5d1e2227-07c6-11e7-8123-080027495a77:1106668'/*!*/;

Please keep in mind that you may want to copy binlog files outside of the production server as processing them can add some load. As we verified that those GTID’s exist, we can extract them:

slave2:~# mysqlbinlog --exclude-gtids='5d1e2227-07c6-11e7-8123-080027495a77:1-1106666,5d1e2227-07c6-11e7-8123-080027495a77:1106669' /var/lib/mysql/binlog.000003 > to_apply_on_slave1.sql

After a quick scp, we can apply those events on the slave

slave1:~# mysql -ppass < to_apply_on_slave1.sql

Once done, we can verify if those GTID’s have been applied by looking into the output of SHOW SLAVE STATUS:

                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 5d1e2227-07c6-11e7-8123-080027495a77
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp: 170320 10:45:04
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set: 5d1e2227-07c6-11e7-8123-080027495a77:1-1106668

Executed_GTID_set looks good therefore we can start slave threads:

mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)

Let’s check if it worked fine. We will, again, use SHOW SLAVE STATUS output:

           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 5d1e2227-07c6-11e7-8123-080027495a77:1106669
            Executed_Gtid_Set: 5d1e2227-07c6-11e7-8123-080027495a77:1-1106669

Looks good, it’s up and running!

Another method of solving this problem will be to take a backup one more time and provision the slave again, using fresh data. This will quite likely be faster and definitely more reliable. It is not often that you have different binlog purge policies on master and on slaves)

We will continue discussing other types of replication issues in the next blog post.

Tags:

replication

In the previous post, we discussed how to verify that MySQL Replication is in good shape. We also looked at some of the typical problems. In this post, we will have a look at some more issues that you might see when dealing with MySQL replication.

Missing or Duplicated Entries

This is something which should not happen, yet it happens very often - a situation in which an SQL statement executed on the master succeeds but the same statement executed on one of slaves fails. Main reason is slave drift - something (usually errant transactions but also other issues or bugs in the replication) causes the slave to differ from its master. For example, a row which existed on the master does not exist on a slave and it cannot be deleted or updated. How often this problem shows up depends mostly on your replication settings. In short, there are three ways in which MySQL stores binary log events. First, “statement”, means that SQL is written in plain text, just as it has been executed on a master. This setting has the highest tolerance on slave drift but it’s also the one which cannot guarantee slave consistency - it’s hard to recommend to use it in production. Second format, “row”, stores the query result instead of query statement. For example, an event may look like below:

### UPDATE `test`.`tab`
### WHERE
###   @1=2
###   @2=5
### SET
###   @1=2
###   @2=4

This means that we are updating a row in ‘tab’ table in ‘test’ schema where first column has a value of 2 and second column has a value of 5. We set first column to 2 (value doesn’t change) and second column to 4. As you can see, there’s not much room for interpretation - it’s precisely defined which row is used and how it’s changed. As a result, this format is great for slave consistency but, as you can imagine, it’s very vulnerable when it comes to data drift. Still it is the recommended way of running MySQL replication.

Finally, the third one, “mixed”, works in a way that those events which are safe to write in the form of statements use “statement” format. Those which could cause data drift will use “row” format.

How do you detect them?

As usual, SHOW SLAVE STATUS will help us identify the problem.

               Last_SQL_Errno: 1032
               Last_SQL_Error: Could not execute Update_rows event on table test.tab; Can't find record in 'tab', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log binlog.000021, end_log_pos 970

               Last_SQL_Errno: 1062
               Last_SQL_Error: Could not execute Write_rows event on table test.tab; Duplicate entry '3' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000021, end_log_pos 1229

As you can see, errors are clear and self-explanatory (and they are basically identical between MySQL and MariaDB.

How do you fix the issue?

This is, unfortunately the complex part. First of all, you need to identify a source of truth. Which host contains the correct data? Master or slave? Usually you’d assume it’s the master but don’t assume it by default - investigate! It could be that after failover, some part of the application still issued writes to the old master, which now acts as a slave. It could be that read_only hasn’t been set correctly on that host or maybe the application uses superuser to connect to database (yes, we’ve seen this in production environments). In such case, the slave could be the source of truth - at least to some extent.

Depending on which data should stay and which should go, the best course of action would be to identify what’s needed to get replication back in sync. First of all, replication is broken so you need to attend to this. Log into the master and check the binary log even that caused replication to break.

           Retrieved_Gtid_Set: 5d1e2227-07c6-11e7-8123-080027495a77:1106672
            Executed_Gtid_Set: 5d1e2227-07c6-11e7-8123-080027495a77:1-1106671

As you can see, we miss one event: 5d1e2227-07c6-11e7-8123-080027495a77:1106672. Let’s check it in the master’s binary logs:

mysqlbinlog -v --include-gtids='5d1e2227-07c6-11e7-8123-080027495a77:1106672' /var/lib/mysql/binlog.000021
#170320 20:53:37 server id 1  end_log_pos 1066 CRC32 0xc582a367     GTID    last_committed=3    sequence_number=4
SET @@SESSION.GTID_NEXT= '5d1e2227-07c6-11e7-8123-080027495a77:1106672'/*!*/;
# at 1066
#170320 20:53:37 server id 1  end_log_pos 1138 CRC32 0x6f33754d     Query    thread_id=5285    exec_time=0    error_code=0
SET TIMESTAMP=1490043217/*!*/;
SET @@session.pseudo_thread_id=5285/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 1138
#170320 20:53:37 server id 1  end_log_pos 1185 CRC32 0xa00b1f59     Table_map: `test`.`tab` mapped to number 571
# at 1185
#170320 20:53:37 server id 1  end_log_pos 1229 CRC32 0x5597e50a     Write_rows: table id 571 flags: STMT_END_F

BINLOG '
UUHQWBMBAAAALwAAAKEEAAAAADsCAAAAAAEABHRlc3QAA3RhYgACAwMAAlkfC6A=
UUHQWB4BAAAALAAAAM0EAAAAADsCAAAAAAEAAgAC//wDAAAABwAAAArll1U='/*!*/;
### INSERT INTO `test`.`tab`
### SET
###   @1=3
###   @2=7
# at 1229
#170320 20:53:37 server id 1  end_log_pos 1260 CRC32 0xbbc3367c     Xid = 5224257
COMMIT/*!*/;

We can see it was an insert which sets first column to 3 and second to 7. Let’s verify how our table looks like now:

mysql> SELECT * FROM test.tab;
+----+------+
| id | b    |
+----+------+
|  1 |    2 |
|  2 |    4 |
|  3 |   10 |
+----+------+
3 rows in set (0.01 sec)

Now we have two options, depending on which data should prevail. If correct data is on the master, we can simply delete row with id=3 on the slave. Just make sure you disable binary logging to avoid introducing errant transactions. On the other hand, if we decided that the correct data is on the slave, we need to run REPLACE command on the master to set row with id=3 to correct content of (3, 10) from current (3, 7). On the slave, though, we will have to skip current GTID (or, to be more precise, we will have to create an empty GTID event) to be able to restart replication.

Deleting a row on a slave is simple:

SET SESSION log_bin=0; DELETE FROM test.tab WHERE id=3; SET SESSION log_bin=1;

Inserting an empty GTID is almost as simple:

mysql> SET @@SESSION.GTID_NEXT= '5d1e2227-07c6-11e7-8123-080027495a77:1106672';
Query OK, 0 rows affected (0.00 sec)

mysql> BEGIN;
Query OK, 0 rows affected (0.00 sec)

mysql> COMMIT;
Query OK, 0 rows affected (0.00 sec)

mysql> SET @@SESSION.GTID_NEXT=automatic;
Query OK, 0 rows affected (0.00 sec)

Another method of solving this particular issue (as long as we accept the master as a source of truth) is to use tools like pt-table-checksum and pt-table-sync to identify where the slave is not consistent with its master and what SQL has to be executed on the master to bring the slave back in sync. Unfortunately, this method is rather on the heavy side - lots of load is added to master and a bunch of queries are written into the replication stream which may affect lag on slaves and general performance of the replication setup. This is especially true if there is a significant number of rows which need to be synced.

Finally, as always, you can rebuild your slave using data from the master - in this way you can be sure that the slave will be refreshed with the freshest, up-to-date data. This is, actually, not necessarily a bad idea - when we are talking about large number of rows to sync using pt-table-checksum/pt-table-sync, this comes with significant overhead in replication performance, overall CPU and I/O load and man-hours required.

ClusterControl allows you to rebuild a slave, using a fresh copy of the master data.

Consistency checks

As we mentioned in the previous chapter, consistency can become a serious issue and can cause lots of headaches for users running MySQL replication setups. Let’s see how you can verify that your MySQL slaves are in sync with the master and what you can do about it.

How to detect an inconsistent slave

Unfortunately, the typical way an user gets to know that a slave is inconsistent is by running into one of the issues we mentioned in the previous chapter. To avoid that proactive monitoring of slave consistency is required. Let’s check how it can be done.

We are going to use a tool from Percona Toolkit: pt-table-checksum. It is designed to scan replication cluster and identify any discrepancies.

We built a custom scenario using sysbench and we introduced a bit of inconsistency on one of the slaves. What’s important (if you’d like to test it like we did), you need to apply a patch below to force pt-table-checksum to recognize ‘sbtest’ schema as non-system schema:

--- pt-table-checksum    2016-12-15 14:31:07.000000000 +0000
+++ pt-table-checksum-fix    2017-03-21 20:32:53.282254794 +0000
@@ -7614,7 +7614,7 @@

    my $filter = $self->{filters};

-   if ( $db =~ m/information_schema|performance_schema|lost\+found|percona|percona_schema|test/ ) {
+   if ( $db =~ m/information_schema|performance_schema|lost\+found|percona|percona_schema|^test/ ) {
       PTDEBUG && _d('Database', $db, 'is a system database, ignoring');
       return 0;
    }

At first, we are going to execute pt-table-checksum in following way:

master:~# ./pt-table-checksum  --max-lag=5 --user=sbtest --password=sbtest --no-check-binlog-format --databases='sbtest'
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
03-21T20:33:30      0      0  1000000      15       0  27.103 sbtest.sbtest1
03-21T20:33:57      0      1  1000000      17       0  26.785 sbtest.sbtest2
03-21T20:34:26      0      0  1000000      15       0  28.503 sbtest.sbtest3
03-21T20:34:52      0      0  1000000      18       0  26.021 sbtest.sbtest4
03-21T20:35:34      0      0  1000000      17       0  42.730 sbtest.sbtest5
03-21T20:36:04      0      0  1000000      16       0  29.309 sbtest.sbtest6
03-21T20:36:42      0      0  1000000      15       0  38.071 sbtest.sbtest7
03-21T20:37:16      0      0  1000000      12       0  33.737 sbtest.sbtest8

Couple of important notes on how we invoked the tool. First of all, user that we set has to exists on all slaves. If you want, you can also use ‘--slave-user’ to define other, less privileged user to access slaves. Another thing worth explaining - we use row-based replication which is not fully compatible with pt-table-checksum. If you have row-based replication, what happens is pt-table-checksum will change binary log format on a session level to ‘statement’ as this is the only format supported. The problem is that such change will work only on a first level of slaves which are directly connected to a master. If you have intermediate masters (so, more than one level of slaves), using pt-table-checksum may break the replication. This is why, by default, if the tool detects row-based replication, it exits and prints error:

“Replica slave1 has binlog_format ROW which could cause pt-table-checksum to break replication. Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation. If you understand the risks, specify --no-check-binlog-format to disable this check.”

We used only one level of slaves so it was safe to specify “--no-check-binlog-format” and move forward.

Finally, we set maximum lag to 5 seconds. If this threshold will be reached, pt-table-checksum will pause for a time needed to bring the lag under the threshold.

As you could see from the output,

03-21T20:33:57      0      1  1000000      17       0  26.785 sbtest.sbtest2

an inconsistency has been detected on table sbtest.sbtest2.

By default, pt-table-checksum stores checksums in percona.checksums table. This data can be used for another tool from Percona Toolkit, pt-table-sync, to identify which parts of the table should be checked in detail to find exact difference in data.

Single Console for Your Entire Database Infrastructure

Find out what else is new in ClusterControl

How to fix inconsistent slave

As mentioned above, we will use pt-table-sync to do that. In our case we are going to use data collected by pt-table-checksum although it is also possible to point pt-table-sync to two hosts (the master and a slave) and it will compare all data on both hosts. It is definitely more time- and resource-consuming process therefore, as long as you have already data from pt-table-checksum, it’s much better to use it. This is how we executed it to test the output:

master:~# ./pt-table-sync --user=sbtest --password=sbtest --databases=sbtest --replicate percona.checksums h=master --print

REPLACE INTO `sbtest`.`sbtest2`(`id`, `k`, `c`, `pad`) VALUES ('1', '434041', '61753673565-14739672440-12887544709-74227036147-86382758284-62912436480-22536544941-50641666437-36404946534-73544093889', '23608763234-05826685838-82708573685-48410807053-00139962956') /*percona-toolkit src_db:sbtest src_tbl:sbtest2 src_dsn:h=10.0.0.101,p=...,u=sbtest dst_db:sbtest dst_tbl:sbtest2 dst_dsn:h=10.0.0.103,p=...,u=sbtest lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:25776 user:root host:vagrant-ubuntu-trusty-64*/;

As you can see, as a result some SQL has been generated. Important to note is --replicate variable. What happens here is we point pt-table-sync to table generated by pt-table-checksum. We also point it to master.

To verify if SQL makes sense we used --print option. Please note SQL generated is valid only at the time it’s generated - you cannot really store it somewhere, review it and then execute. All you can do is to verify if the SQL makes any sense and, immediately after, reexecute tool with --execute flag:

master:~# ./pt-table-sync --user=sbtest --password=sbtest --databases=sbtest --replicate percona.checksums h=10.0.0.101 --execute

This should make slave back in sync with the master. We can verify it with pt-table-checksum:

root@vagrant-ubuntu-trusty-64:~# ./pt-table-checksum  --max-lag=5 --user=sbtest --password=sbtest --no-check-binlog-format --databases='sbtest'
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
03-21T21:36:04      0      0  1000000      13       0  23.749 sbtest.sbtest1
03-21T21:36:26      0      0  1000000       7       0  22.333 sbtest.sbtest2
03-21T21:36:51      0      0  1000000      10       0  24.780 sbtest.sbtest3
03-21T21:37:11      0      0  1000000      14       0  19.782 sbtest.sbtest4
03-21T21:37:42      0      0  1000000      15       0  30.954 sbtest.sbtest5
03-21T21:38:07      0      0  1000000      15       0  25.593 sbtest.sbtest6
03-21T21:38:27      0      0  1000000      16       0  19.339 sbtest.sbtest7
03-21T21:38:44      0      0  1000000      15       0  17.371 sbtest.sbtest8

As you can see, there are no diffs anymore in sbtest.sbtest2 table.

We hope you found this blog post informative and useful. Click here to learn more about MySQL Replication. If you have any questions or suggestions, feel free to reach us through comments below.

Tags:

replication