Channel: Severalnines - MariaDB

Building a MySQL or MariaDB Database Cold Standby on Amazon AWS


High availability is a must these days, as most organizations can’t afford to lose their data. High availability, however, always comes with a price tag, and that price can vary a lot. A setup that requires near-immediate failover typically needs an expensive environment that precisely mirrors production. There are other, less expensive options. They may not allow an immediate switch to a disaster recovery cluster, but they still provide business continuity without draining the budget.

An example of this type of setup is a “cold-standby” DR environment. It allows you to reduce your expenses while still being able to spin up a new environment in an external location should disaster strike. In this blog post we will demonstrate how to create such a setup.

The Initial Setup

Let’s assume we have a fairly standard Master / Slave MySQL Replication setup in our own datacenter. It is a highly available setup, with ProxySQL and Keepalived handling the Virtual IP. The main risk is that the datacenter becomes unavailable: it is a small DC, perhaps served by a single ISP with no BGP in place. In this situation we will assume that taking hours to bring the database back is acceptable, as long as it can be brought back.

ClusterControl Cluster Topology

To deploy this cluster we used ClusterControl, which you can download for free. For our DR environment we will use EC2 (but it could also be any other cloud provider.)

The Challenge

The main issue we have to deal with is how to ensure we have fresh data with which to restore our database in the disaster recovery environment. Ideally, of course, we would have a replication slave up and running in EC2, but then we would have to pay for it. If we are tight on budget, we can try to get around that with backups. This is not a perfect solution because, in the worst case scenario, we will never be able to recover all the data.

By “the worst case scenario” we mean a situation in which we no longer have access to the original database servers. If we can still reach them, no data is lost.

The Solution

We are going to use ClusterControl to set up a backup schedule that reduces the chance of data loss. We will also use the ClusterControl feature that uploads backups to the cloud. If the datacenter becomes unavailable, we can hope that the cloud provider we have chosen is still reachable.

Setting up the Backup Schedule in ClusterControl

First, we will have to configure ClusterControl with our cloud credentials.

ClusterControl Cloud Credentials

We can do this by using “Integrations” from the left side menu.

ClusterControl Add Cloud Credentials

You can pick Amazon Web Services, Google Cloud or Microsoft Azure as the cloud you want ClusterControl to upload backups to. We will go ahead with AWS where ClusterControl will use S3 to store backups.

Add Cloud Credentials in ClusterControl

We then need to pass the key ID and key secret, pick the default region, and choose a name for this set of credentials.

AWS Cloud Integration Successful - ClusterControl

Once this is done, we can see the credentials we just added listed in ClusterControl.

Now we can proceed with setting up the backup schedule.

Backup Scheduling ClusterControl

ClusterControl allows you to either create a backup immediately or schedule it. We’ll go with the second option. What we want is to create the following schedule:

  1. Full backup created once per day
  2. Incremental backups created every 10 minutes.

The idea is as follows. In the worst case we lose only 10 minutes of traffic. If the datacenter becomes unreachable from outside but still works internally, we could avoid any data loss by waiting 10 minutes, copying the latest incremental backup to a laptop, and manually sending it to our DR database, even over phone tethering and a cellular connection to get around the ISP failure. If we cannot get the data out of the old datacenter for some time, this schedule minimizes the number of transactions we will have to merge manually into the DR database.
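To get a feel for what this schedule implies at restore time, here is a minimal sketch of the arithmetic. The helper names and the cron notation are our own illustration, not something ClusterControl exposes, and we assume the run that coincides with the daily full backup does not also produce an incremental.

```shell
#!/bin/bash
# Hypothetical helper: how many times a job with a fixed interval
# (in minutes) runs per day.
runs_per_day() {
    local interval_min=$1
    echo $(( 24 * 60 / interval_min ))
}

# Hypothetical helper: worst-case number of incrementals that have to be
# applied on top of the daily full backup. Assumes one of the daily runs
# is taken up by the full backup itself.
max_incrementals() {
    echo $(( $(runs_per_day "$1") - 1 ))
}

# The two schedules written in cron notation (for illustration only)
FULL_SCHEDULE='0 2 * * *'      # daily at 02:00
INCR_SCHEDULE='*/10 * * * *'   # every 10 minutes
```

With a 10-minute interval, `max_incrementals 10` prints 143, so a restore late in the day may have to download and apply a long chain of incrementals. That is worth keeping in mind when estimating recovery time.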

Create Backup Schedule in ClusterControl

We start with the full backup, which will run daily at 2:00 am. We will take the backup from the master and store it on the controller under the /root/backups/ directory. We will also enable the “Upload Backup to the cloud” option.

Backup Settings in ClusterControl

Next, we want to make some changes to the default configuration. We decided to go with an automatically selected failover host (if our master is unavailable, ClusterControl will use any other available node). We also enabled encryption, as we will be sending our backups over the network.

Cloud Settings for Backup Scheduling in ClusterControl

Then we have to pick the credentials and either select an existing S3 bucket or create a new one if needed.

Create Backup in ClusterControl

We basically repeat the process for the incremental backup; this time we use the “Advanced” dialog to run the backups every 10 minutes.

The rest of the settings are similar, and we can reuse the S3 bucket.

ClusterControl Cluster Details

The backup schedule looks as above. We don’t have to start the full backup manually: ClusterControl will run the incremental backup as scheduled and, if it detects that no full backup is available, it will run a full backup instead.

With this setup, it is safe to say that we can recover the data on any external system with 10-minute granularity.

Manual Backup Restore

If you need to restore the backup on the disaster recovery instance, there are a couple of steps you have to take. We strongly recommend testing this process from time to time to ensure that it works correctly and that you are proficient in executing it.

First, we have to install the AWS command line tool on our target server:

root@vagrant:~# apt install python3-pip

root@vagrant:~# pip3 install awscli --upgrade --user

Then we have to configure it with proper credentials:

root@vagrant:~# ~/.local/bin/aws configure

AWS Access Key ID [None]: yourkeyID

AWS Secret Access Key [None]: yourkeySecret

Default region name [None]: us-west-1

Default output format [None]: json

We can now test whether we have access to the data in our S3 bucket:

root@vagrant:~# ~/.local/bin/aws s3 ls s3://drbackup/

                           PRE BACKUP-1/

                           PRE BACKUP-2/

                           PRE BACKUP-3/

                           PRE BACKUP-4/

                           PRE BACKUP-5/

                           PRE BACKUP-6/

                           PRE BACKUP-7/

Now we have to download the data. We will create a directory for the backups. Remember, we have to download the whole backup set, starting from a full backup and ending with the last incremental we want to apply.

root@vagrant:~# mkdir backups

root@vagrant:~# cd backups/
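To work out which directories make up the current backup set, something like the following sketch can help. It reads object keys (for example, the output of `aws s3 ls --recursive`, where the key is the last field on each line) and prints the directories from the most recent full backup onward. The function name is ours, and the sketch assumes that the number in BACKUP-N grows with time and that file names follow the backup-full-/backup-incr- pattern seen in this post.

```shell
#!/bin/bash
# Hypothetical helper: read object keys on stdin (one per line, key in the
# last field) and print the backup directories that form the newest chain:
# the latest full backup followed by every later incremental.
#   1. keep only the key (last field)
#   2. keep only backup files, skipping cmon_backup.metadata
#   3. sort by the numeric ID that follows "BACKUP-"
#   4. restart the chain every time a full backup is seen
select_backup_set() {
    awk '{ print $NF }' |
    grep -E '/backup-(full|incr)-' |
    sort -t- -k2,2n |
    awk -F/ '
        $2 ~ /^backup-full-/ { n = 0; seen = 1 }
        seen { chain[n++] = $1 }
        END { for (i = 0; i < n; i++) print chain[i] }
    '
}
```

A possible way to use it is `~/.local/bin/aws s3 ls --recursive s3://drbackup/ | select_backup_set`, and then to download only the directories it prints.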

Now there are two options. We can either download backups one by one:

root@vagrant:~/backups# ~/.local/bin/aws s3 cp s3://drbackup/BACKUP-1/ BACKUP-1 --recursive

download: s3://drbackup/BACKUP-1/cmon_backup.metadata to BACKUP-1/cmon_backup.metadata

Completed 30.4 MiB/36.2 MiB (4.9 MiB/s) with 1 file(s) remaining

download: s3://drbackup/BACKUP-1/backup-full-2019-08-20_113009.xbstream.gz.aes256 to BACKUP-1/backup-full-2019-08-20_113009.xbstream.gz.aes256

root@vagrant:~/backups# ~/.local/bin/aws s3 cp s3://drbackup/BACKUP-2/ BACKUP-2 --recursive

download: s3://drbackup/BACKUP-2/cmon_backup.metadata to BACKUP-2/cmon_backup.metadata

download: s3://drbackup/BACKUP-2/backup-incr-2019-08-20_114009.xbstream.gz.aes256 to BACKUP-2/backup-incr-2019-08-20_114009.xbstream.gz.aes256

Alternatively, especially if you have a tight rotation schedule, we can sync the entire contents of the bucket to a local directory on the server:

root@vagrant:~/backups# ~/.local/bin/aws s3 sync s3://drbackup/ .

download: s3://drbackup/BACKUP-2/cmon_backup.metadata to BACKUP-2/cmon_backup.metadata

download: s3://drbackup/BACKUP-4/cmon_backup.metadata to BACKUP-4/cmon_backup.metadata

download: s3://drbackup/BACKUP-3/cmon_backup.metadata to BACKUP-3/cmon_backup.metadata

download: s3://drbackup/BACKUP-6/cmon_backup.metadata to BACKUP-6/cmon_backup.metadata

download: s3://drbackup/BACKUP-5/cmon_backup.metadata to BACKUP-5/cmon_backup.metadata

download: s3://drbackup/BACKUP-7/cmon_backup.metadata to BACKUP-7/cmon_backup.metadata

download: s3://drbackup/BACKUP-3/backup-incr-2019-08-20_115005.xbstream.gz.aes256 to BACKUP-3/backup-incr-2019-08-20_115005.xbstream.gz.aes256

download: s3://drbackup/BACKUP-1/cmon_backup.metadata to BACKUP-1/cmon_backup.metadata

download: s3://drbackup/BACKUP-2/backup-incr-2019-08-20_114009.xbstream.gz.aes256 to BACKUP-2/backup-incr-2019-08-20_114009.xbstream.gz.aes256

download: s3://drbackup/BACKUP-7/backup-incr-2019-08-20_123008.xbstream.gz.aes256 to BACKUP-7/backup-incr-2019-08-20_123008.xbstream.gz.aes256

download: s3://drbackup/BACKUP-6/backup-incr-2019-08-20_122008.xbstream.gz.aes256 to BACKUP-6/backup-incr-2019-08-20_122008.xbstream.gz.aes256

download: s3://drbackup/BACKUP-5/backup-incr-2019-08-20_121007.xbstream.gz.aes256 to BACKUP-5/backup-incr-2019-08-20_121007.xbstream.gz.aes256

download: s3://drbackup/BACKUP-4/backup-incr-2019-08-20_120007.xbstream.gz.aes256 to BACKUP-4/backup-incr-2019-08-20_120007.xbstream.gz.aes256

download: s3://drbackup/BACKUP-1/backup-full-2019-08-20_113009.xbstream.gz.aes256 to BACKUP-1/backup-full-2019-08-20_113009.xbstream.gz.aes256

As you remember, the backups are encrypted. To decrypt them we need the encryption key, which is stored in ClusterControl. Make sure you keep a copy of it somewhere safe, outside of the main datacenter. If you cannot reach it, you won’t be able to decrypt the backups. The key can be found in the ClusterControl configuration:

root@vagrant:~# grep backup_encryption_key /etc/cmon.d/cmon_1.cnf


It is base64-encoded, so we have to decode it and store it in a file before we can start decrypting the backup:

echo "aoxhIelVZr1dKv5zMbVPLxlLucuYpcVmSynaeIEeBnM=" | openssl enc -base64 -d > pass
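Since this file is the only thing protecting the backups, it is worth writing it with restrictive permissions and checking that it decoded properly. A minimal sketch, with function names of our own choosing and coreutils `base64 -d` standing in for the openssl call above (both decode the same single-line input):

```shell
#!/bin/bash
# Hypothetical helper: decode a base64 string into a key file that only
# the owner can read. The umask is changed inside a subshell so it does
# not leak into the rest of the session.
save_key() {
    local encoded=$1 keyfile=$2
    ( umask 077; printf '%s' "$encoded" | base64 -d > "$keyfile" )
}

# Hypothetical helper: print the size of the decoded key in bytes.
key_length() {
    wc -c < "$1" | tr -d ' '
}
```

For the sample key shown above, `save_key "aoxhIelVZr1dKv5zMbVPLxlLucuYpcVmSynaeIEeBnM=" pass` followed by `key_length pass` should report 32 bytes. An empty or much shorter file usually means the string was truncated when it was copied.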

Now we can reuse this file to decrypt backups. For now, let’s say we will work with one full and two incremental backups.

mkdir 1

mkdir 2

mkdir 3

cat BACKUP-1/backup-full-2019-08-20_113009.xbstream.gz.aes256 | openssl enc -d -aes-256-cbc -pass file:/root/backups/pass | zcat | xbstream -x -C /root/backups/1/

cat BACKUP-2/backup-incr-2019-08-20_114009.xbstream.gz.aes256 | openssl enc -d -aes-256-cbc -pass file:/root/backups/pass | zcat | xbstream -x -C /root/backups/2/

cat BACKUP-3/backup-incr-2019-08-20_115005.xbstream.gz.aes256 | openssl enc -d -aes-256-cbc -pass file:/root/backups/pass | zcat | xbstream -x -C /root/backups/3/

With the data decrypted, we can proceed with setting up our MySQL server. Ideally, this should be exactly the same version as on the production systems. We will use Percona Server for MySQL:

cd ~
wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb

sudo dpkg -i percona-release_latest.generic_all.deb

apt-get update

apt-get install percona-server-5.7

Nothing complex, just a regular installation. Once it’s up and ready we have to stop it and remove the contents of its data directory.

service mysql stop

rm -rf /var/lib/mysql/*

To restore the backup we will need Xtrabackup, the tool ClusterControl uses to create it (at least for Percona and Oracle MySQL; MariaDB uses MariaBackup). It is important that this tool is installed in the same version as on the production servers:

apt install percona-xtrabackup-24

That’s all we have to prepare. Now we can start restoring the backup. With incremental backups, keep in mind that you have to prepare them and apply them on top of the base backup, and the base backup also has to be prepared. It is crucial to run the prepare with the ‘--apply-log-only’ option to prevent xtrabackup from running the rollback phase. Otherwise you won’t be able to apply the next incremental backup.

xtrabackup --prepare --apply-log-only --target-dir=/root/backups/1/

xtrabackup --prepare --apply-log-only --target-dir=/root/backups/1/ --incremental-dir=/root/backups/2/

xtrabackup --prepare --target-dir=/root/backups/1/ --incremental-dir=/root/backups/3/
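The same pattern extends to any number of incrementals: every step except the final one gets ‘--apply-log-only’. As a sketch (the function name is ours, and it only prints the commands so you can review them before running anything), this could be generated as follows:

```shell
#!/bin/bash
# Hypothetical helper: print the xtrabackup prepare commands for a base
# backup directory followed by zero or more incremental directories,
# in the order in which they must be applied.
prepare_commands() {
    local base=$1
    shift
    if [ $# -eq 0 ]; then
        # No incrementals: a plain prepare, including rollback, is enough
        echo "xtrabackup --prepare --target-dir=$base"
        return
    fi
    # Base backup: replay the log but skip the rollback phase
    echo "xtrabackup --prepare --apply-log-only --target-dir=$base"
    # Every incremental except the last one: also skip the rollback phase
    while [ $# -gt 1 ]; do
        echo "xtrabackup --prepare --apply-log-only --target-dir=$base --incremental-dir=$1"
        shift
    done
    # Last incremental: no --apply-log-only, so the rollback phase runs
    echo "xtrabackup --prepare --target-dir=$base --incremental-dir=$1"
}
```

Calling `prepare_commands /root/backups/1/ /root/backups/2/ /root/backups/3/` prints the same three commands used above.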

In the last command we allowed xtrabackup to roll back uncompleted transactions, since we won’t be applying any more incremental backups afterwards. Now it is time to populate the data directory with the backup, start MySQL and see if everything works as expected:

root@vagrant:~/backups# mv /root/backups/1/* /var/lib/mysql/

root@vagrant:~/backups# chown -R mysql:mysql /var/lib/mysql

root@vagrant:~/backups# service mysql start

root@vagrant:~/backups# mysql -ppass

mysql: [Warning] Using a password on the command line interface can be insecure.

Welcome to the MySQL monitor.  Commands end with ; or \g.

Your MySQL connection id is 6

Server version: 5.7.26-29 Percona Server (GPL), Release '29', Revision '11ad961'

Copyright (c) 2009-2019 Percona LLC and/or its affiliates

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective


Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show schemas;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| proxydemo          |
| sbtest             |
| sys                |
+--------------------+
6 rows in set (0.00 sec)

mysql> select count(*) from sbtest.sbtest1;
+----------+
| count(*) |
+----------+
|    10506 |
+----------+
1 row in set (0.01 sec)

As you can see, all is good. MySQL started correctly and we were able to access it (and the data is there!). We successfully brought our database back up and running in a separate location. The total time required depends on the size of the data: we had to download the data from S3, decrypt and decompress it, and finally prepare the backup. Still, this is a very cheap option (you pay only for the data stored in S3) that gives you business continuity should disaster strike.


Comparing Galera Cluster Cloud Offerings: Part One Amazon AWS


Running a MySQL Galera Cluster (whether the Percona, MariaDB, or Codership build) is, unfortunately, not supported by Amazon RDS. Most of the databases supported by RDS use asynchronous replication, while Galera Cluster is a synchronous multi-master replication solution. Galera also requires InnoDB as its storage engine to function properly. While you can use other storage engines such as MyISAM, doing so is not advised because they lack transaction handling.

Because of the lack of support natively in RDS, this blog will focus on the offerings available when choosing and hosting your Galera-based cluster using an AWS environment.

There are certainly many reasons why you would choose or not choose the AWS cloud platform, but for this particular topic we’re going to go over the advantages and benefits of what you can leverage rather than why you would choose the AWS Platform.

The Virtual Servers (Elastic Compute Instances)

As mentioned earlier, MySQL Galera is not part of RDS. InnoDB is a transactional storage engine, so you need the right resources for your application requirements, with enough capacity to serve your client request traffic. At the time of this article, your sole choice for running Galera Cluster is EC2, Amazon's compute instance cloud offering.

Because you run your system on a number of EC2 instances, running a Galera Cluster on EC2 versus on-prem doesn’t differ much. You can access the server remotely via SSH, install your desired software packages, and choose the Galera Cluster build you would like to use.

Moreover, EC2 is more elastic and flexible, allowing you to deliver a simpler, more granular setup. You can take advantage of the web services to automate building a number of nodes if you need to scale out your environment or, for example, to automate the building of your staging or development environment. It also lets you quickly build your desired environment, choose and set up your desired OS, and pick the computing resources that fit your requirements (such as CPU, memory, and disk storage). EC2 eliminates the wait for hardware, since you can do this on the fly. You can also leverage the AWS CLI tool to automate your Galera cluster setup.

Pricing for Amazon EC2 Instances

EC2 offers a number of selections which are very flexible for consumers who would like to host their Galera Cluster environment on AWS compute nodes. The AWS Free Tier includes 750 hours of Linux and Windows t2.micro instances, each month, for one year. You can stay within the Free Tier by using only EC2 Micro instances, but this might not be the best thing for production use. 

There are multiple types of EC2 instances that you can deploy when provisioning your Galera nodes. The r4/r5/x1 families (memory optimized) and the c4/c5 families (compute optimized) are ideal choices. Prices differ depending on how large your server resource needs are and on the type of OS.

These are the types of paid instances you can choose...

On Demand 

You pay for compute capacity (per hour or per second), depending on the type of instances you run. For example, prices might differ between an Ubuntu instance and a RHEL instance, aside from the instance type. There are no long-term commitments or upfront payments, and you have the flexibility to increase or decrease your compute capacity. These instances are recommended for low-cost, flexible environments, such as applications with short-term, spiky, or unpredictable workloads that cannot be interrupted, or applications being developed or tested on Amazon EC2 for the first time. Check it out here for more info.

Dedicated Hosts

If you have compliance and regulatory requirements, such as the need for a dedicated server running on dedicated hardware, this type of offer suits your needs. Dedicated Hosts can help you address compliance requirements and reduce costs by allowing you to use your existing server-bound software licenses, including Windows Server, SQL Server, SUSE Linux Enterprise Server, Red Hat Enterprise Linux, or other software licenses that are bound to VMs, sockets, or physical cores, subject to your license terms. It can be purchased On-Demand (hourly) or as a Reservation for up to 70% off the On-Demand price. Check it out here for more info.

Spot Instances

These instances allow you to request spare Amazon EC2 computing capacity for up to 90% off the On-Demand price. This is recommended for applications that have flexible start and end times, applications that are only feasible at very low compute prices, or users with urgent computing needs for large amounts of additional capacity. Check it out here for more info.

Reserved Instances

This type of payment offer provides you the option to grab up to a 75% discount and, depending on which instance you would like to reserve, you can acquire a capacity reservation giving you additional confidence in your ability to launch instances when you need them. This is recommended if your applications have steady state or predictable usage, applications that may require reserved capacity, or customers that can commit to using EC2 over a 1 or 3 year term to reduce their total computing costs. Check it out here for more info.

Pricing Note

One last note on EC2: it also offers per-second billing, which means unused minutes and seconds in an hour are not charged. This is advantageous if you scale out for a short time just to handle traffic on an extra Galera node, or if you want to test something on a specific node for a limited time.
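To illustrate the difference, here is a small sketch that compares the seconds billed per-second with the seconds billed if usage were rounded up to whole hours. The function names are ours. The 60-second minimum reflects AWS's published per-second billing terms as we understand them, so treat it as an assumption and check the current pricing pages.

```shell
#!/bin/bash
# Hypothetical helper: seconds billed under per-second billing,
# assuming a 60-second minimum charge per instance run.
billed_seconds() {
    local used=$1
    if [ "$used" -lt 60 ]; then
        used=60
    fi
    echo "$used"
}

# Hypothetical helper: seconds billed if usage is rounded up to whole hours.
billed_seconds_hourly() {
    local used=$1
    echo $(( (used + 3599) / 3600 * 3600 ))
}
```

A node that runs for 15 minutes (900 seconds) is billed for 900 seconds in the first case and for 3600 seconds in the second, which is where the saving for short scale-out bursts comes from.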

Database Encryption on AWS

If you're concerned about the confidentiality of your data, or need to abide by the laws and regulations that govern your security compliance, AWS offers data-at-rest encryption. If you're using MariaDB Cluster version 10.2+, it has built-in plugin support for interfacing with the Amazon Web Services (AWS) Key Management Service (KMS) API. This allows you to take advantage of AWS KMS to facilitate separation of responsibilities and remote logging and auditing of key access requests. Rather than storing the encryption key in a local file, this plugin keeps the master key in AWS KMS.

When you first start MariaDB, the AWS KMS plugin will connect to the AWS Key Management Service and ask it to generate a new key. MariaDB will store that key on-disk in an encrypted form. The key stored on-disk cannot be used to decrypt the data; rather, on each startup, MariaDB connects to AWS KMS and has the service decrypt the locally-stored key(s). The decrypted key is stored in-memory as long as the MariaDB server process is running, and that in-memory decrypted key is used to encrypt the local data.

Alternatively, when deploying your EC2 instances, you can encrypt your data storage volume with EBS (Elastic Block Store) or encrypt the instance itself. Encryption is supported for all EBS volume types; it may add some latency, but the impact is minimal and usually not visible to end users. For EC2 instance-type encryption, most of the large instances are supported, so if you're using compute or memory optimized nodes, you can leverage it.

Below is the list of supported instance types:

  • General purpose: A1, M3, M4, M5, M5a, M5ad, M5d, T2, T3, and T3a
  • Compute optimized: C3, C4, C5, C5d, and C5n
  • Memory optimized: cr1.8xlarge, R3, R4, R5, R5a, R5ad, R5d, u-6tb1.metal, u-9tb1.metal, u-12tb1.metal, X1, X1e, and z1d
  • Storage optimized: D2, h1.2xlarge, h1.4xlarge, I2, and I3
  • Accelerated computing: F1, G2, G3, P2, and P3

You can set up your AWS account to always enable encryption when deploying EC2 instances. This means that AWS will encrypt new EBS volumes on launch and encrypt new copies of unencrypted snapshots.

Multi-AZ/Multi-Region/Multi-Cloud Deployments

Unfortunately, as of this writing, neither the AWS Console nor any of the AWS APIs directly supports Multi-AZ/-Region/-Cloud deployments for Galera clusters.

High Availability, Scalability, and Redundancy

To achieve a multi-AZ deployment, it's recommended that you provision your Galera nodes in different availability zones. This prevents the cluster from going down or malfunctioning due to lack of quorum.

You can also set up AWS Auto Scaling and create an auto scaling group to monitor nodes and run status checks, so your cluster remains redundant, scalable, and highly available. Auto Scaling should help in the case that a node goes down for some unknown reason.

For multi-region or multi-cloud deployments, Galera has its own parameter called gmcast.segment, which you can set at server start. This parameter is designed to optimize communication between the Galera nodes and minimize the amount of traffic sent between network segments, including writeset relaying and IST and SST donor selection.

This type of setup allows you to deploy multiple nodes in different regions for your Galera Cluster. Aside from that, you can also deploy your Galera nodes on a different vendor, for example, if it's hosted in Google Cloud and you want redundancy on Microsoft Azure. 
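As a rough illustration, gmcast.segment is passed through wsrep_provider_options in the MySQL configuration file. The sketch below prints that line for a given segment number. The helper name is ours and the 0 to 255 range is taken from the Galera documentation as we recall it, so verify it for your version. Note also that wsrep_provider_options holds a semicolon-separated list, so merge this value with any options you already set rather than adding a second line.

```shell
#!/bin/bash
# Hypothetical helper: print the my.cnf line that assigns a node to a
# Galera network segment. Nodes in the same datacenter or region should
# share a segment number; different locations get different numbers.
galera_segment_option() {
    local segment=$1
    # Reject anything that is not an integer between 0 and 255
    if ! [ "$segment" -ge 0 ] 2>/dev/null || [ "$segment" -gt 255 ]; then
        echo "segment must be an integer between 0 and 255" >&2
        return 1
    fi
    echo "wsrep_provider_options=\"gmcast.segment=$segment\""
}
```

For example, nodes in the first region could use the output of `galera_segment_option 0` and nodes in a second region or cloud the output of `galera_segment_option 1`, placed under the [mysqld] section.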

I recommend checking out our blogs Multiple Data Center Setups Using Galera Cluster for MySQL or MariaDB and Zero Downtime Network Migration With MySQL Galera Cluster Using Relay Node for more information on how to implement these types of deployments.

Database Performance on AWS

Depending on your application demand, if your queries are memory-intensive, the memory optimized instances are your ideal choice. If your application has a high transaction rate that requires high performance, as with web servers or batch processing, then choose compute optimized instances. If you want to learn more about optimizing your Galera Cluster, you can check out the blog How to Improve Performance of Galera Cluster for MySQL or MariaDB.

Database Backups on AWS

Creating backups can be difficult since there's no direct support within AWS that is specific to MySQL Galera technology. However, AWS provides a disaster recovery solution using EBS Snapshots. You can take snapshots of the EBS volumes attached to your instance, either on a schedule using CloudWatch or by using the Amazon Data Lifecycle Manager (Amazon DLM) to automate the snapshots.

Take note that the snapshots taken are incremental backups, which means that only the blocks on the device that have changed after your most recent snapshot are saved. You can store these snapshots in AWS S3 to save storage costs. Alternatively, you can use external tools like Percona Xtrabackup and Mydumper (for logical backups) and store the results in AWS EFS, then AWS S3, then AWS Glacier.

You can also set up Lifecycle Management in AWS if you need your backup data to be stored in a more cost-efficient manner. If you have large files and are going to use AWS EFS, you can leverage the AWS Backup solution, which is also simple and cost-effective.

On the other hand, you can also use external services (such as ClusterControl) which provide both monitoring and backup solutions. Check this out if you want to know more.

Database Monitoring on AWS

AWS offers health checks and some status checks to provide you visibility into your Galera nodes. This is done through CloudWatch and CloudTrail.

CloudTrail lets you enable and inspect the logs and perform audits based on what actions and traces have been made. 

CloudWatch lets you collect and track metrics, collect and monitor log files, and set custom alarms. You can set it up according to your custom needs and gain system-wide visibility into resource utilization, application performance, and operational health. CloudWatch comes with a free tier as long as you still fall within its limits (See the screenshot below.)

CloudWatch also comes with a price depending on the volume of metrics being distributed. Check out its current pricing here.

Take note: there's a downside to using CloudWatch. It is not designed to cater to the database health, especially for monitoring MySQL Galera cluster nodes. Alternatively, you can use external tools that offer high-resolution graphs or charts that are useful in reporting and are easier to analyze when diagnosing a problematic node. 

For this you can use PMM by Percona, DataDog, Idera, VividCortex, or our very own ClusterControl (as monitoring is FREE with ClusterControl Community.) I would recommend that you use a monitoring tool that suits your needs based on your individual application requirements. It's very important that your monitoring tool be able to notify you aggressively or provide you integration for instant messaging systems such as Slack, PagerDuty or even send you SMS when escalating severe health status.

Database Security on AWS

Securing your EC2 instances is one of the most vital parts of deploying your database into the public cloud. You can set up a private subnet with security groups that allow only the required ports or source IPs, depending on your setup. You can configure your database nodes without remote access and set up a jump host or an Internet Gateway if the nodes need internet access to install or update software packages. You can read our previous blog Deploying Secure Multicloud MySQL Replication on AWS and GCP with VPN to see how we set this up.

In addition to this, you can secure your data in transit by using TLS/SSL connections, or encrypt your data when it's at rest. If you're using ClusterControl, securing data in transit is simple and easy. You can check out our blog SSL Key Management and Encryption of MySQL Data in Transit if you want to try it out. For data at rest, data stored in S3 can be encrypted using AWS Server-Side Encryption or AWS KMS, which I discussed earlier. Check this external blog on how to set up and leverage a MariaDB Cluster using AWS KMS so you can store your data securely at rest.

Galera Cluster Troubleshooting on AWS

AWS CloudWatch can help, especially when investigating system metrics. You can check the network, CPU, memory, disk, and the instance's compute usage and balance. This might not, however, meet your requirements when digging into a specific case.

CloudTrail can trace the actions that have been performed under your specific AWS account. This helps you determine whether a problem comes not from MySQL Galera but from a bug or issue within the AWS environment (such as the hypervisor having issues on the host machine where your instance, as the guest, is being hosted).

If you're using ClusterControl, going to Logs -> System Logs lets you browse the error logs captured from the MySQL Galera node itself. Apart from this, ClusterControl provides real-time monitoring that strengthens your alarm and notification system in case of an emergency or if your MySQL Galera node(s) go kaput.


AWS does not natively support a MySQL Galera Cluster setup, unlike AWS RDS which has MySQL compatibility. Because of this, most of the recommendations or opinions on running a Galera Cluster for production use within the AWS environment are based on experience with well-tested environments that have been running for a very long time.

MariaDB Cluster is very productive here, as it consistently provides support for the AWS technology stack. The upcoming MariaDB 10.5 release will offer support for an S3 Storage Engine, which may be worth the wait.

External tools can help you manage and control your MySQL Galera Cluster running on the AWS Cloud, so it's not a huge concern if you have some dilemmas and FUD about whether to run on or shift to the AWS Cloud Platform.

AWS might not be a one-size-fits-all solution, but it provides a wide array of solutions that you can customize and tailor to fit your needs.

In the next part of this blog, we'll look at another public cloud platform, Google Cloud, and see what we can leverage if we choose to run our Galera Cluster on that platform.

Comparing Failover Times for Amazon Aurora, Amazon RDS, and ClusterControl


If your IT infrastructure is running on AWS, you have probably heard about Amazon Relational Database Service (RDS), an easy way to set up, operate, and scale a relational database in the cloud. It provides cost-effective and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching, and backups. There are a number of database engine offerings for RDS like MySQL, MariaDB, PostgreSQL, Microsoft SQL Server and Oracle Server.

ClusterControl 1.7.3 acts similarly to RDS as it supports database cluster deployment, management, monitoring, and scaling on the AWS platform. It also supports a number of other cloud platforms like Google Cloud Platform and Microsoft Azure. ClusterControl understands the database topology and is capable of performing automatic recovery, topology management, and many more advanced features to take control of your database.

In this blog post, we are going to compare automatic failover times for Amazon Aurora, Amazon RDS for MySQL, and a MySQL Replication setup deployed and managed by ClusterControl. The type of failover we are going to perform is slave promotion when the master goes down: the most up-to-date slave takes over the master role in the cluster to resume the database service.

Our Failover Test

To measure the failover time, we run a simple MySQL connect-and-update test in a loop, recording whether each SQL statement succeeds against a single database endpoint. The script looks like this:







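#!/bin/bash
# Connection settings for the endpoint under test (credentials match the sysbench dataset; port 3306 is an assumed default)
_host='{MYSQL HOST}'
_user='sbtest'
_pass='password'
_port=3306

j=1
while true
do
        echo -n "count $j : "
        num=$(od -A n -t d -N 1 /dev/urandom | tr -d ' ')
        timeout 1 bash -c "mysql -u${_user} -p${_pass} -h${_host} -P${_port} --connect-timeout=1 --disable-reconnect -A -Bse \
        \"UPDATE sbtest.sbtest1 SET k = $num WHERE id = 1\" > /dev/null 2> /dev/null"
        if [ $? -eq 0 ]; then
                echo "OK $(date)"
        else
                echo "Fail ---- $(date)"
        fi
        j=$(( j + 1 ))
        sleep 1
done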


The above Bash script simply connects to a MySQL host and updates a single row, with a timeout of 1 second on both the Bash and mysql client commands. The timeout-related parameters are required so that we can measure the downtime in seconds correctly, since the mysql client otherwise keeps trying to reconnect until it reaches the MySQL wait_timeout. We populated a test dataset with the following command beforehand:

$ sysbench \
/usr/share/sysbench/oltp_common.lua \
--db-driver=mysql \
--mysql-host={MYSQL HOST} \
--mysql-user=sbtest \
--mysql-db=sbtest \
--mysql-password=password \
--tables=50 \
--table-size=100000 \
prepare

The script reports whether the above query succeeded (OK) or failed (Fail). Sample outputs are shown further down.

Failover with Amazon RDS for MySQL

In our test, we use the lowest RDS offering with the following specs:

  • MySQL version: 5.7.22
  • vCPU: 4
  • RAM: 16 GB
  • Storage type: Provisioned IOPS (SSD)
  • IOPS: 1000
  • Storage: 100 GiB
  • Multi-AZ Replication: Yes

After Amazon RDS provisions your DB instance, you can use any standard MySQL client application or utility to connect to the instance. In the connection string, you specify the DNS address from the DB instance endpoint as the host parameter, and specify the port number from the DB instance endpoint as the port parameter.

According to the Amazon RDS documentation, in the event of a planned or unplanned outage of your DB instance, Amazon RDS automatically switches to a standby replica in another Availability Zone if you have enabled Multi-AZ. The time it takes for the failover to complete depends on the database activity and other conditions at the time the primary DB instance became unavailable. Failover times are typically 60-120 seconds.

To initiate a multi-AZ failover in RDS, we performed a reboot operation with "Reboot with Failover" checked, as shown in the following screenshot:

Reboot AWS DB Instance

The following is what our application observed:


count 30 : OK Wed Aug 28 03:41:06 UTC 2019

count 31 : OK Wed Aug 28 03:41:07 UTC 2019

count 32 : Fail ---- Wed Aug 28 03:41:09 UTC 2019

count 33 : Fail ---- Wed Aug 28 03:41:11 UTC 2019

count 34 : Fail ---- Wed Aug 28 03:41:13 UTC 2019

count 35 : Fail ---- Wed Aug 28 03:41:15 UTC 2019

count 36 : Fail ---- Wed Aug 28 03:41:17 UTC 2019

count 37 : Fail ---- Wed Aug 28 03:41:19 UTC 2019

count 38 : Fail ---- Wed Aug 28 03:41:21 UTC 2019

count 39 : Fail ---- Wed Aug 28 03:41:23 UTC 2019

count 40 : Fail ---- Wed Aug 28 03:41:25 UTC 2019

count 41 : Fail ---- Wed Aug 28 03:41:27 UTC 2019

count 42 : Fail ---- Wed Aug 28 03:41:29 UTC 2019

count 43 : Fail ---- Wed Aug 28 03:41:31 UTC 2019

count 44 : Fail ---- Wed Aug 28 03:41:33 UTC 2019

count 45 : Fail ---- Wed Aug 28 03:41:35 UTC 2019

count 46 : OK Wed Aug 28 03:41:36 UTC 2019

count 47 : OK Wed Aug 28 03:41:37 UTC 2019


The MySQL downtime as seen from the application side lasted from 03:41:09 until 03:41:36, around 27 seconds in total. From the RDS events, we can see that the Multi-AZ failover only started 15 seconds after the actual downtime began:

Wed, 28 Aug 2019 03:41:24 GMT Multi-AZ instance failover started.

Wed, 28 Aug 2019 03:41:33 GMT DB instance restarted

Wed, 28 Aug 2019 03:41:59 GMT Multi-AZ instance failover completed.

Once the new database instance restarted around 03:41:33, the MySQL service was then accessible around 3 seconds later.
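The downtime figure can also be computed from the test script's output instead of being read off by eye. The helper below is a minimal sketch and not part of the original test: downtime_seconds is a hypothetical name, and it assumes the output format shown above, a single outage per run, and an outage that does not span midnight.

```shell
# Print the seconds between the first "Fail" line and the next "OK" line.
# Reads the test loop's output on stdin.
downtime_seconds() {
    awk '
    # Seconds since midnight for the HH:MM:SS field on the current line.
    function tod(   i, t) {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) {
                split($i, t, ":")
                return t[1] * 3600 + t[2] * 60 + t[3]
            }
        return -1
    }
    / Fail / && start == ""              { start = tod() }  # outage begins
    / OK /   && start != "" && end == "" { end = tod() }    # service is back
    END { if (start != "" && end != "") print end - start; else print "n/a" }
    '
}
```

If the loop's output is saved to a file (run.log is an arbitrary name here), running downtime_seconds < run.log on the RDS run above should print 27.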

Failover with Amazon Aurora for MySQL

Amazon Aurora can be considered a superior version of RDS, with notable features such as faster replication with shared storage, no data loss during failover, and a storage limit of up to 64 TB. Amazon Aurora for MySQL is based on the open source MySQL edition but is not itself open source; it is a proprietary, closed-source database. It works similarly to MySQL replication (one and only one master, with multiple slaves), and failover is handled automatically by Amazon Aurora.

According to the Amazon Aurora FAQs, if you have an Amazon Aurora Replica, in the same or a different Availability Zone, when failing over, Aurora flips the canonical name record (CNAME) for your DB Instance to point at the healthy replica, which is in turn promoted to become the new primary. Start-to-finish, failover typically completes within 30 seconds.

If you do not have an Amazon Aurora Replica (i.e. single instance), Aurora will first attempt to create a new DB Instance in the same Availability Zone as the original instance. If unable to do so, Aurora will attempt to create a new DB Instance in a different Availability Zone. From start to finish, failover typically completes in under 15 minutes.

Your application should retry database connections in the event of connection loss.
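What such a retry might look like on the client side can be sketched in a few lines of shell. This is an illustration only: retry is a hypothetical helper, and the attempt count and delay are arbitrary values that a real application would tune.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS have been made.
retry() {
    _retry_max=$1
    _retry_delay=$2
    shift 2
    _retry_attempt=1
    until "$@"; do
        if [ "$_retry_attempt" -ge "$_retry_max" ]; then
            return 1    # give up: every attempt failed
        fi
        _retry_attempt=$((_retry_attempt + 1))
        sleep "$_retry_delay"
    done
    return 0
}
```

For example, retry 30 1 mysql -h "$_host" -u "$_user" -p"$_pass" --connect-timeout=1 -e 'SELECT 1' would keep probing the endpoint once per second for up to 30 attempts, which comfortably covers the failover window quoted above.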

After Amazon Aurora provisions your DB instance, you will get two endpoints, one for the writer and one for the reader. The reader endpoint provides load-balancing support for read-only connections to the DB cluster. The following endpoints are taken from our test setup:

  • writer - aurora-sysbench.cluster-cw9j4kdnvun9.ap-southeast-1.rds.amazonaws.com
  • reader - aurora-sysbench.cluster-ro-cw9j4kdnvun9.ap-southeast-1.rds.amazonaws.com

In our test, we used the following Aurora specs:

  • Instance type: db.r5.large
  • MySQL version: 5.7.12
  • vCPU: 2
  • RAM: 16 GB
  • Multi-AZ Replication: Yes

To trigger a failover, simply pick the writer instance -> Actions -> Failover, as shown in the following screenshot:

Amazon Aurora Failover with SysBench

The following output is reported by our application while connecting to the Aurora writer endpoint:


count 37 : OK Wed Aug 28 12:35:47 UTC 2019

count 38 : OK Wed Aug 28 12:35:48 UTC 2019

count 39 : Fail ---- Wed Aug 28 12:35:49 UTC 2019

count 40 : Fail ---- Wed Aug 28 12:35:50 UTC 2019

count 41 : Fail ---- Wed Aug 28 12:35:51 UTC 2019

count 42 : Fail ---- Wed Aug 28 12:35:52 UTC 2019

count 43 : Fail ---- Wed Aug 28 12:35:53 UTC 2019

count 44 : Fail ---- Wed Aug 28 12:35:54 UTC 2019

count 45 : Fail ---- Wed Aug 28 12:35:55 UTC 2019

count 46 : OK Wed Aug 28 12:35:56 UTC 2019

count 47 : OK Wed Aug 28 12:35:57 UTC 2019


The database downtime lasted from 12:35:49 until 12:35:56, a total of 7 seconds. That's pretty impressive.

Looking at the database events in the Aurora management console, only these two events happened:

Wed, 28 Aug 2019 12:35:50 GMT A new writer was promoted. Restarting database as a reader.

Wed, 28 Aug 2019 12:35:55 GMT DB instance restarted

It doesn't take much time for Aurora to promote a slave to become a master, and demote the master to become a slave. Note that all Aurora replicas share the same underlying volume with the primary instance, which means that replication can be performed in milliseconds, as updates made by the primary instance are instantly available to all Aurora replicas. Therefore, it has minimal replication lag (Amazon claims 100 milliseconds or less). This greatly reduces the health check time and improves the recovery time significantly.

Failover with ClusterControl

In this example, we imitate the Amazon RDS setup using m5.xlarge instances, with ProxySQL in between so that the application accesses the database through a single endpoint and failover is handled automatically, just like RDS. The following diagram illustrates our architecture:

ClusterControl with ProxySQL

Since we have direct access to the database instances, we triggered an automatic failover by simply killing the MySQL process on the active master:

$ kill -9 $(pidof mysqld)

The above command triggered an automatic recovery inside ClusterControl:

[11:08:49]: Job Completed.

[11:08:44]: Flushing logs to update 'SHOW SLAVE HOSTS'

[11:08:39]: Flushing logs to update 'SHOW SLAVE HOSTS'

[11:08:39]: Failover Complete. New master is

[11:08:39]: Attaching slaves to new master.

[11:08:39]: Command 'RESET SLAVE /*!50500 ALL */' succeeded.

[11:08:39]: Executing 'RESET SLAVE /*!50500 ALL */'.

[11:08:39]: Successfully stopped slave.

[11:08:39]: Stopping slave.

[11:08:39]: Successfully stopped slave.

[11:08:39]: Stopping slave.

[11:08:38]: Setting read_only=OFF and super_read_only=OFF.

[11:08:38]: Successfully stopped slave.

[11:08:38]: Stopping slave.

[11:08:38]: Stopping slaves.

[11:08:38]: Completed preparations of candidate.

[11:08:38]: Applied 0 transactions. Remaining: .

[11:08:38]: waiting up to 4294967295 seconds before timing out.

[11:08:38]: Checking if the candidate has relay log to apply.

[11:08:38]: preparing candidate.

[11:08:38]: No errant transactions found.

[11:08:38]: Skipping, same as slave

[11:08:38]: Checking for errant transactions.

[11:08:37]: Setting read_only=ON and super_read_only=ON.

[11:08:37]: Can't connect to MySQL server on '' (115)

[11:08:37]: Setting read_only=ON and super_read_only=ON.

[11:08:37]: Failed to CREATE USER rpl_user. Error: Query  failed: Can't connect to MySQL server on '' (115).

[11:08:36]: Creating user 'rpl_user'@'

[11:08:36]: Executing GRANT REPLICATION SLAVE 'rpl_user'@''.

[11:08:36]: Creating user 'rpl_user'@'

[11:08:36]: Elected as the new Master.

[11:08:36]: Slave lag is 0 seconds.

[11:08:36]: to slave list

[11:08:36]: Checking if slave can be used as a candidate.

[11:08:33]: Trying to shutdown the failed master if it is up.

[11:08:32]: Setting read_only=ON and super_read_only=ON.

[11:08:31]: Setting read_only=ON and super_read_only=ON.

[11:08:30]: Setting read_only=ON and super_read_only=ON.

[11:08:30]: ioerrno=2003 io running 0

[11:08:30]: Checking

[11:08:30]: REPL_UNDEFINED


[11:08:30]: Failover to a new Master.

Job spec: Failover to a new Master.

From our test application's point of view, connecting to the ProxySQL host on port 6033, the downtime looked like this:


count 1 : OK Wed Aug 28 11:08:24 UTC 2019

count 2 : OK Wed Aug 28 11:08:25 UTC 2019

count 3 : OK Wed Aug 28 11:08:26 UTC 2019

count 4 : Fail ---- Wed Aug 28 11:08:28 UTC 2019

count 5 : Fail ---- Wed Aug 28 11:08:30 UTC 2019

count 6 : Fail ---- Wed Aug 28 11:08:32 UTC 2019

count 7 : Fail ---- Wed Aug 28 11:08:34 UTC 2019

count 8 : Fail ---- Wed Aug 28 11:08:36 UTC 2019

count 9 : Fail ---- Wed Aug 28 11:08:38 UTC 2019

count 10 : OK Wed Aug 28 11:08:39 UTC 2019

count 11 : OK Wed Aug 28 11:08:40 UTC 2019


By looking at both the recovery job events and the output from our application, the MySQL database node was down for 4 seconds before the cluster recovery job started, and the application saw failures from 11:08:28 until 11:08:39, a total MySQL downtime of 11 seconds. One of the most impressive things about ClusterControl is that you can track the recovery progress and see which actions ClusterControl takes during the failover. It provides a level of transparency that you won't get with the database offerings of cloud providers.

For MySQL/MariaDB/PostgreSQL replication, ClusterControl gives you more fine-grained control over your databases with support for the following advanced configuration options and parameters:

  • Master-master replication topology management
  • Chain replication topology management
  • Topology viewer
  • Whitelist/Blacklist slaves to be promoted as master
  • Errant transaction checker
  • Pre/post, success/fail failover/switchover events hook with external script
  • Automatic rebuild slave on error
  • Scale out slave from existing backup

Failover Time Summary

In terms of failover time, Amazon RDS Aurora for MySQL is the clear winner with 7 seconds, followed by ClusterControl with 11 seconds and Amazon RDS for MySQL with 27 seconds.

Note that this is just a simple test, with one client and one transaction per second, to measure the fastest recovery time. Large transactions or a lengthy recovery process can increase failover time; for example, long-running transactions may take a long time to roll back when MySQL shuts down.


How to Use the Failover Mechanism of MaxScale


Ever since ClusterControl 1.2.11 was released in 2015, MariaDB MaxScale has been supported as a database load balancer. Over the years MaxScale has grown and matured, adding several rich features. Recently MariaDB MaxScale 2.2 was released and it introduces several new features including replication cluster failover management.

MariaDB MaxScale allows for master/slave deployments with high availability, automatic failover, manual switchover, and automatic rejoin. If the master fails, MariaDB MaxScale can automatically promote the most up-to-date slave to master. If the failed master is recovered, MariaDB MaxScale can automatically reconfigure it as a slave to the new master. In addition, administrators can perform a manual switchover to change the master on demand.

In our previous blogs we discussed how to Deploy MaxScale Using ClusterControl as well as Deploying MariaDB MaxScale on Docker. For those who are not yet familiar with MariaDB MaxScale, it is an advanced, plug-in database proxy for MariaDB database servers. MaxScale sits between client applications and the database servers, routing client queries and server responses. It also monitors the servers, quickly noticing any changes in server status or replication topology.

Though MaxScale shares some of the characteristics of other load balancing technologies like ProxySQL, this new failover feature (which is part of its monitoring and autodetection mechanism) stands out. In this blog we’re going to discuss this exciting new function of MaxScale.

Overview of the MariaDB MaxScale Failover Mechanism

Master Detection

The monitor is now less likely to suddenly change the master server, even if another server has more slaves than the current master. The DBA can force a master reselection by setting the current master read-only, or by removing all its slaves if the master is down.

Only one server can have the Master status flag at a time, even in a multimaster setup. Other servers in the multimaster group are given the Relay Master and Slave status flags.

Switchover New Master Autoselection

The switchover command can now be called with just the monitor instance name as parameter. In this case the monitor will automatically select a server for promotion.

Replication Lag Detection

The replication lag measurement now simply reads the Seconds_Behind_Master-field of the slave status output of slaves. The slave calculates this value by comparing the time stamp in the binlog event the slave is currently processing to the slave's own clock. If a slave has multiple slave connections, the smallest lag is used.
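The "smallest lag wins" rule can be illustrated outside MaxScale as well. The following is a rough sketch of the idea rather than MaxScale's actual implementation: min_replication_lag is a hypothetical helper that reads the vertical (\G) output of SHOW ALL SLAVES STATUS on stdin and prints the smallest numeric Seconds_Behind_Master value, skipping connections that report NULL.

```shell
# Print the smallest Seconds_Behind_Master found on stdin, or "unknown"
# when no connection reports a numeric value.
min_replication_lag() {
    awk -F': *' '
    /Seconds_Behind_Master:/ && $2 ~ /^[0-9]+$/ {
        if (min == "" || $2 + 0 < min) min = $2 + 0   # keep the smallest lag
    }
    END { print (min == "" ? "unknown" : min) }
    '
}
```

On a MariaDB slave with several replication connections, mysql -e 'SHOW ALL SLAVES STATUS\G' | min_replication_lag would then report a single lag figure for the server.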

Automatic Switchover After Low Disk Space Detection

With recent MariaDB Server versions, the monitor can now check the disk space on the backend and detect if the server is running low. When this happens, the monitor can be set to automatically switch over from a master low on disk space. Slaves can also be set to maintenance mode. The disk space is also a factor which is considered when selecting which new master to promote.

See switchover_on_low_disk_space and maintenance_on_low_disk_space for more information.

Replication Reset Feature

The reset-replication monitor command deletes all slave connections and binary logs, and then sets up replication. Useful when data is in sync but GTIDs are not.

Scheduled Events Handling in Failover/Switchover/Rejoin

Server events launched by the event scheduler thread are now handled during cluster modification operations. See handle_server_events for more information.

External Master Support

The monitor can detect if a server in the cluster is replicating from an external master (a server that is not being monitored by the MaxScale monitor). If the replicating server is the cluster master server, then the cluster itself is considered to have an external master.

If a failover/switchover happens, the new master server is set to replicate from the cluster external master server. The username and password for the replication are defined in replication_user and replication_password. The address and port used are the ones shown by SHOW ALL SLAVES STATUS on the old cluster master server. In the case of switchover, the old master also stops replicating from the external server to preserve the topology.

After failover the new master is replicating from the external master. If the failed old master comes back online, it is also replicating from the external server. To normalize the situation, either have auto_rejoin on or manually execute a rejoin. This will redirect the old master to the current cluster master.

How Is Failover Useful and Applicable?

Failover helps you minimize downtime, whether during routine maintenance or during unplanned outages that can occur at unfortunate times. With MaxScale’s ability to insulate client applications from backend database servers, it adds valuable functionality that helps minimize downtime.

The MaxScale monitoring plugin continuously monitors the state of backend database servers. MaxScale’s routing plugin then uses this status information to always route queries to backend database servers that are in service. It is then able to send queries to the backend database clusters, even if some of the servers of a cluster are going through maintenance or experiencing failure.

MaxScale’s high configurability enables changes in cluster configuration to remain transparent to client applications. For example, if a server needs to be administratively added to or removed from a master-slave cluster, you can simply update the server list of the monitor and router plugins in the MaxScale configuration via the maxadmin CLI console. The client application will be completely unaware of this change and will continue to send database queries to MaxScale’s listening port.

Setting a database server to maintenance mode is simple. Just run the following command using maxctrl, and MaxScale will stop sending any queries to this server. For example,

 maxctrl: set server DB_785 maintenance

Then check the servers' state as follows:

 maxctrl: list servers
│ Server │ Address       │ Port │ Connections │ State                │ GTID       │
│ DB_783 │ │ 3306 │ 0           │ Master, Running      │ 0-43001-70 │
│ DB_784 │ │ 3306 │ 0           │ Slave, Running       │ 0-43001-70 │
│ DB_785 │ │ 3306 │ 0           │ Maintenance, Running │ 0-43001-70 │

Once in maintenance mode, MaxScale will stop routing any new requests to the server. MaxScale will not kill current sessions; it allows running queries to complete without interruption while the server is in maintenance mode. Also, take note that maintenance mode is not persistent. If MaxScale restarts while a node is in maintenance mode, the new instance of MariaDB MaxScale will not honor this mode. If multiple MariaDB MaxScale instances are configured to use the node, then maintenance mode must be set within each MariaDB MaxScale instance. However, if multiple services within one MariaDB MaxScale instance are using the server, then you only need to set maintenance mode once on the server for all services to take note of the mode change.

Once you are done with your maintenance, clear the flag on the server with the following command. For example,

 maxctrl: clear server DB_785 maintenance

To check that it is back to normal, just run the list servers command again.

You can also apply certain administrative actions through the ClusterControl UI. See the example screenshot below:

MaxScale Failover In-Action

The Automatic Failover

MariaDB MaxScale's failover performs very efficiently and reconfigures the slaves as expected. In this test, we use the following configuration file, which was created and is managed by ClusterControl. See below:


Take note that auto_failover and auto_rejoin are the only variables that I have added, since ClusterControl won't add these by default when you set up a MaxScale load balancer (check out this blog on how to set up MaxScale using ClusterControl). Do not forget that you need to restart MariaDB MaxScale once you have applied the changes to your configuration file. Just run,

systemctl restart maxscale

and you're good to go.

Before proceeding with the failover test, let's first check the cluster's health:

 maxctrl: list servers
│ Server │ Address       │ Port │ Connections │ State           │ GTID       │
│ DB_783 │ │ 3306 │ 0           │ Master, Running │ 0-43001-75 │
│ DB_784 │ │ 3306 │ 0           │ Slave, Running  │ 0-43001-75 │
│ DB_785 │ │ 3306 │ 0           │ Slave, Running  │ 0-43001-75 │

Looks great!

I killed the master with the plain kill command, kill -9 $(pidof mysqld), on my master node and, to no surprise, the monitor was quick to notice this and trigger the failover. See the logs below:

2019-06-28 06:39:14.306   error  : (mon_log_connect_error): Monitor was unable to connect to server DB_783[] : 'Can't connect to MySQL server on '' (115)'
2019-06-28 06:39:14.329   notice : (mon_log_state_change): Server changed state: DB_783[]: master_down. [Master, Running] -> [Down]
2019-06-28 06:39:14.329   warning: (handle_auto_failover): Master has failed. If master status does not change in 2 monitor passes, failover begins.
2019-06-28 06:39:15.011   notice : (select_promotion_target): Selecting a server to promote and replace 'DB_783'. Candidates are: 'DB_784', 'DB_785'.
2019-06-28 06:39:15.011   warning: (warn_replication_settings): Slave 'DB_784' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode
2019-06-28 06:39:15.012   warning: (warn_replication_settings): Slave 'DB_785' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode
2019-06-28 06:39:15.012   notice : (select_promotion_target): Selected 'DB_784'.
2019-06-28 06:39:15.012   notice : (handle_auto_failover): Performing automatic failover to replace failed master 'DB_783'.
2019-06-28 06:39:15.017   notice : (redirect_slaves_ex): Redirecting 'DB_785' to replicate from 'DB_784' instead of 'DB_783'.
2019-06-28 06:39:15.024   notice : (redirect_slaves_ex): All redirects successful.
2019-06-28 06:39:15.527   notice : (wait_cluster_stabilization): All redirected slaves successfully started replication from 'DB_784'.
2019-06-28 06:39:15.527   notice : (handle_auto_failover): Failover 'DB_783' -> 'DB_784' performed.
2019-06-28 06:39:15.634   notice : (mon_log_state_change): Server changed state: DB_784[]: new_master. [Slave, Running] -> [Master, Running]
2019-06-28 06:39:20.165   notice : (mon_log_state_change): Server changed state: DB_783[]: slave_up. [Down] -> [Slave, Running]

Now let's have a look at the cluster's health:

 maxctrl: list servers
│ Server │ Address       │ Port │ Connections │ State           │ GTID       │
│ DB_783 │ │ 3306 │ 0           │ Down            │ 0-43001-75 │
│ DB_784 │ │ 3306 │ 0           │ Master, Running │ 0-43001-75 │
│ DB_785 │ │ 3306 │ 0           │ Slave, Running  │ 0-43001-75 │

The node which was previously the master is now down. I restarted it to see if auto-rejoin would trigger and, as you can see in the log at 2019-06-28 06:39:20.165, the monitor was quick to catch the state of the node and reconfigured it as a slave automatically, with no need for the DBA to intervene.

Finally, checking its state once more, everything works as expected. See below:
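To put a number on how quickly the monitor reacted, the interval between the master_down and new_master state changes can be read straight from the MaxScale log. A minimal sketch, assuming the log format shown above (timestamp with milliseconds in the second field) and a single failover that does not span midnight; maxscale_failover_ms is a hypothetical helper name.

```shell
# Print the milliseconds between the first master_down event and the
# following new_master event in a MaxScale log read from stdin.
maxscale_failover_ms() {
    awk '
    # Convert HH:MM:SS.mmm to milliseconds since midnight.
    function ms(ts,   p) {
        split(ts, p, /[:.]/)
        return ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4]
    }
    /master_down/ && down == ""             { down = ms($2) }
    /new_master/  && down != "" && up == "" { up = ms($2) }
    END { if (down != "" && up != "") print up - down; else print "n/a" }
    '
}
```

Fed with the log excerpt above, this should print 1305, i.e. roughly 1.3 seconds between the master being marked as down and the promoted slave being reported as the new master.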

 maxctrl: list servers
│ Server │ Address       │ Port │ Connections │ State           │ GTID       │
│ DB_783 │ │ 3306 │ 0           │ Slave, Running  │ 0-43001-75 │
│ DB_784 │ │ 3306 │ 0           │ Master, Running │ 0-43001-75 │
│ DB_785 │ │ 3306 │ 0           │ Slave, Running  │ 0-43001-75 │

My ex-Master Has Been Fixed and Recovered and I Want to Switch Over

Switching over to your previous master is no hassle as well. You can operate this with maxctrl (or maxadmin in previous versions of MaxScale) or through ClusterControl UI (as previously demonstrated).

Let's refer back to the state of the replication cluster shown earlier and suppose we want to switch the former master (currently a slave) back to its master role. Before we proceed, you first need to identify the monitor you are going to use. You can verify this with the following command:

 maxctrl: list monitors
│ Monitor             │ State   │ Servers                │
│ replication_monitor │ Running │ DB_783, DB_784, DB_785 │

Once you have it, you can run the following command to switch over:

maxctrl: call command mariadbmon switchover replication_monitor DB_783 DB_784

Then check the state of the cluster again:

 maxctrl: list servers
│ Server │ Address       │ Port │ Connections │ State           │ GTID       │
│ DB_783 │ │ 3306 │ 0           │ Master, Running │ 0-43001-75 │
│ DB_784 │ │ 3306 │ 0           │ Slave, Running  │ 0-43001-75 │
│ DB_785 │ │ 3306 │ 0           │ Slave, Running  │ 0-43001-75 │

Looks perfect!

The logs show in detail how it went and the series of actions taken during the switchover. See the details below:

2019-06-28 07:03:48.064   error  : (switchover_prepare): 'DB_784' is not a valid promotion target for switchover because it is already the master.
2019-06-28 07:03:48.064   error  : (manual_switchover): Switchover cancelled.
2019-06-28 07:04:30.700   notice : (create_start_slave): Slave connection from DB_784 to []:3306 created and started.
2019-06-28 07:04:30.700   notice : (redirect_slaves_ex): Redirecting 'DB_785' to replicate from 'DB_783' instead of 'DB_784'.
2019-06-28 07:04:30.708   notice : (redirect_slaves_ex): All redirects successful.
2019-06-28 07:04:31.209   notice : (wait_cluster_stabilization): All redirected slaves successfully started replication from 'DB_783'.
2019-06-28 07:04:31.209   notice : (manual_switchover): Switchover 'DB_784' -> 'DB_783' performed.
2019-06-28 07:04:31.318   notice : (mon_log_state_change): Server changed state: DB_783[]: new_master. [Slave, Running] -> [Master, Running]
2019-06-28 07:04:31.318   notice : (mon_log_state_change): Server changed state: DB_784[]: new_slave. [Master, Running] -> [Slave, Running]

In the case of an invalid switchover request, the operation will not proceed and an error is logged, as shown in the first two lines of the log above. So you'll be safe, with no scary surprises.
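When scripting a switchover, it helps to look up the current master first so that an invalid request like the one logged above is avoided. The helper below is only a sketch: current_master is a hypothetical name, and it assumes the table layout printed by list servers in this post, where the server name is the second whitespace-separated field of each row.

```shell
# Print the name of the server whose state is "Master, Running",
# reading the output of "maxctrl list servers" on stdin.
current_master() {
    awk '/ Master, Running/ && !/Relay Master/ { print $2 }'
}
```

With that in place, something like maxctrl call command mariadbmon switchover replication_monitor DB_783 "$(maxctrl list servers | current_master)" names the server to demote explicitly instead of relying on a value typed by hand.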

Making Your MaxScale Highly Available

While it's a bit off-topic in regards to failover, I wanted to add some valuable points here with regard to high availability and how it related to MariaDB MaxScale failover.

Making your MaxScale highly available is important in the event that your system crashes or experiences disk or virtual machine corruption. These situations are inevitable and can affect the state of your automated failover setup when they occur.

For a replication cluster environment, this is very beneficial and highly recommended. The key point is that only one MaxScale instance should be allowed to modify the cluster at any given time. If you have set this up with Keepalived, that is the instance with the status of MASTER. MaxScale itself does not know its state, but maxctrl (or maxadmin in previous versions) can set a MaxScale instance to passive mode. As of version 2.2.2, a passive MaxScale behaves similarly to an active one, with the distinction that it won't perform failover, switchover or rejoin. Even manual versions of these commands will end in error. The passive/active mode differences may be expanded in the future, so stay tuned for such changes in MaxScale. To do this, just run the following:

 maxctrl: alter maxscale passive true

You can verify this afterwards by running the command below:

[root@node5 vagrant]#  maxctrl -u admin -p mariadb -h show maxscale|grep 'passive'
│              │     "passive": true,                                         │

If you want to see how to set up a highly available MaxScale with Keepalived, please check this post from MariaDB.
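The link between the Keepalived state and MaxScale's passive mode can be sketched as a small shell helper for a Keepalived notify script. This is an illustration under assumptions rather than the setup from the MariaDB post: passive_flag_for_state is a hypothetical function, and it relies on Keepalived passing the new state (MASTER, BACKUP or FAULT) as the third argument to notify scripts.

```shell
# Map a Keepalived state to the value for MaxScale's "passive" setting.
passive_flag_for_state() {
    case "$1" in
        MASTER)       echo false ;;   # this node holds the VIP: let it manage the cluster
        BACKUP|FAULT) echo true ;;    # standby or unhealthy: must not perform failover
        *)            return 1 ;;     # unknown state: leave the decision to the caller
    esac
}

# Inside a notify script, the new state arrives as the third argument:
#   flag=$(passive_flag_for_state "$3") && maxctrl alter maxscale passive "$flag"
```

Keeping the mapping in one function makes it easy to check in isolation before wiring it into Keepalived, so that only the instance in MASTER state is ever active.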

VIP Handling

Additionally, since MaxScale does not have VIP handling built in, you can use Keepalived to handle that for you. You can just use the virtual_ipaddress assigned to the MASTER state node. This is similar to the virtual IP management that MHA provides through its master_ip_failover_script parameter. As mentioned earlier, check out this Keepalived with MaxScale setup blog post by MariaDB.


MariaDB MaxScale is feature-rich and highly capable. It is not limited to being a proxy and load balancer; it also offers the failover mechanism that large organizations are looking for. It's almost a one-size-fits-all software, but it of course comes with limitations, and certain applications may need features that other load balancers, such as ProxySQL, provide instead.

ClusterControl also offers an auto-failover and master auto-detection mechanism, plus cluster and node recovery with the ability to deploy MaxScale and other load balancing technologies.

Each of these tools has its own features and functionality, but MariaDB MaxScale is well supported within ClusterControl and can readily be deployed along with Keepalived and HAProxy to help speed up your daily routine tasks.

Comparing Galera Cluster Cloud Offerings: Part Three Microsoft Azure


Microsoft Azure is known to many as an alternative public cloud platform to Amazon AWS. It's not easy to directly compare these two giant companies. Microsoft's cloud business -- dubbed commercial cloud -- includes everything from Azure to Office 365 enterprise subscriptions to Dynamics 365 to LinkedIn services. After LinkedIn was acquired by Microsoft it began moving its infrastructure to Azure. While moving LinkedIn to Azure could take some time, it demonstrates Microsoft Azure’s capabilities and ability to handle millions of transactions. Microsoft's strong enterprise heritage, software stack, and data center tools offer both familiarity and a hybrid approach to cloud deployments.

Microsoft Azure is built as an Infrastructure as a Service (IaaS) as well as a Platform as a Service (PaaS). The Azure Virtual Machine offering has per-second billing and currently runs on multi-tenant compute. It has, however, recently previewed a new offering which allows virtual machines to run on single-tenant physical servers. The offering is called Azure Dedicated Hosts.

Azure also offers specialized large instances (such as for SAP HANA). There is multitenant block and file storage, along with many other IaaS and PaaS capabilities. These include object storage (Azure Blob Storage), a CDN, a Docker-based container service (Azure Container Service), a batch computing service (Azure Batch), and event-driven “serverless computing” (Azure Functions). The Azure Marketplace offers third-party software and services. Colocation needs are met via partner exchanges (Azure ExpressRoute) offered from partners like Equinix and CoreSite.

With all of these offerings Microsoft Azure has stepped up its game to play a vital role in the public cloud market. The PaaS infrastructure offered to its consumers has garnered a lot of trust and many are moving their own infrastructure or private cloud to Microsoft Azure's public cloud infrastructure. This is especially advantageous for consumers who need integration with other Windows Services, such as Visual Studio.

So what’s different between Azure and the other clouds we have looked at in this series? Microsoft has focused heavily on AI, analytics, and the Internet of Things. AzureStack is another “cloud-meets-data center” effort that has been a real differentiator in the market.

Microsoft Azure Migration Pros & Cons

There are several things you should consider when moving your legacy applications or infrastructure to Microsoft Azure.

Pros


  • Enterprises that are strategically committed to Microsoft technology generally choose Azure as their primary IaaS+PaaS provider. The integrated end-to-end experience for enterprises building .NET applications using Visual Studio (and related services) is unsurpassed. Microsoft is also leveraging its tremendous sales reach and ability to co-sell Azure with other Microsoft products and services in order to drive adoption.
  • Azure provides a well-integrated approach to edge computing and Internet of Things (IoT), with offerings that reach from its hyperscale data center out through edge solutions such as AzureStack and Data Box Edge.
  • Microsoft Azure’s capabilities have become increasingly innovative and open. 50% of the workloads are Linux-based alongside numerous open-source application stacks. Microsoft has a unique vision for the future that involves bringing in technology partners through native, first-party offerings such as those from VMware, NetApp, Red Hat, Cray and Databricks.


Cons

  • Microsoft Azure’s reliability issues continue to be a challenge for customers, largely as a result of Azure’s growing pains. Since September 2018, Azure has had multiple service-impacting incidents, including significant outages involving Azure Active Directory. These outages leave customers with no ability to mitigate the downtime.
  • Gartner clients often experience challenges with executing on-time implementations within budget. This comes from Microsoft often setting unreasonably high expectations for customers. Much of this stems from Microsoft’s field sales teams being “encouraged” to appropriately position and sell Azure within its customer base.
  • Enterprises frequently lament the quality of Microsoft technical support (along with the increasing cost of support) and field solution architects. This negatively impacts customer satisfaction, and slows Azure adoption and therefore customer spending.

Microsoft may not be your first choice as it has been seen as a “not-so-open-source-friendly” tech giant, but in fairness it has embraced a lot of activity and support within the Open Source world. Microsoft Azure offers fully-managed services for most of the top open source relational databases, such as PostgreSQL, MySQL, and MariaDB.

Galera Cluster (Percona, Codership, or MariaDB) variants, unfortunately, aren't supported by Azure. The only way you can deploy your Galera Cluster to Azure is by means of a Virtual Machine. You may also want to check their blog on using MariaDB Enterprise Cluster (which is based on Galera) on Azure.

Azure's Virtual Machine

Virtual Machine is the equivalent offering for compute instances in GCP and AWS. An Azure Virtual Machine is an on-demand, high-performance computing server in the cloud and can be deployed in Azure using various methods. These might include the user interface within the Azure portal, using pre-configured images in the Azure marketplace, scripting through Azure PowerShell, deploying from a template that is defined by using a JSON file, or by deploying directly through Visual Studio.

Azure uses a deployment model called the Azure Resource Manager (ARM), which defines all resources that form part of your overall application solution, allowing you to deploy, update, or delete your solution in a single operation.

Resources may include the storage account, network configurations, and IP addresses. You may have heard the term “ARM templates”, which essentially means the JSON template which defines the different aspects of your solution which you are trying to deploy.

Azure Virtual Machines come in different types and sizes, with names beginning with A-series to N-series. Each VM type is built with specific workloads or performance needs in mind, including general purpose, compute optimized, storage optimized or memory optimized. You can also deploy less common types like GPU or high performance compute VMs.

Similar to other public cloud offerings, you can do the following in your virtual machine instances...

  • Encrypt your disk on the virtual machine. This does not come as easily as it does on GCP and AWS, because encrypting your virtual machine requires a more manual approach: you first have to complete the Azure Disk Encryption prerequisites. Since Galera does not support Windows, we're only talking here about Linux-based images. Basically, it requires you to have the dm-crypt and vfat modules present in the system. Once you get that piece right, you can encrypt the VM using the Azure CLI. You can check out how to Enable Azure Disk Encryption for Linux IaaS VMs to see how to do it. Encrypting your disk is very important, especially if your company or organization requires that your Galera Cluster data follow the standards mandated by laws and regulations such as PCI DSS or GDPR.
  • Create a snapshot. You can create a snapshot either using the Azure CLI or through the portal. Check their manual on how to do it.
  • Use auto scaling or Virtual Machine Scale Sets if you require horizontal scaling. Check out the overview of autoscaling in Azure or the overview of virtual machine scale sets.
  • Deploy across multiple zones. Place your virtual machine instances in different availability zones to avoid a single point of failure.

You can also create (or get information from) your virtual machines in different ways: through the Azure portal, Azure PowerShell, REST APIs, Client SDKs, or the Azure CLI. Virtual machines in the Azure virtual network can also easily be connected to your organization’s network and treated as an extended datacenter.

Microsoft Azure Pricing

Just like other public cloud providers, Microsoft Azure also offers a free tier with some free services. It also offers pay-as-you-go options and reserved instances to choose from. Pay-as-you-go starts at $0.008/hour - $0.126/hour.


For reserved instances (RI), the longer you commit and contract with Azure, the more you save on the cost. Microsoft Azure claims to help subscribers save up to 72% of their billing costs compared to its pay-as-you-go model when subscribers sign up for a one to three year term for a Windows or Linux Virtual Machine. Microsoft also offers added flexibility in the sense that if your business needs change, you can cancel your Azure RI subscription at any time and return the remaining unused RI to Microsoft, subject to an early termination fee.
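To make the reserved-instance arithmetic concrete, here is a minimal Python sketch. The 72% figure is the maximum discount quoted above; the hourly rate, the three-node cluster size, and the 730-hour month are illustrative assumptions, not Azure quotes.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_cost(hourly_rate: float, nodes: int, discount: float = 0.0) -> float:
    """Return the monthly cost of `nodes` VMs at `hourly_rate` after `discount` (0.0 to 1.0)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0.0 and 1.0")
    return hourly_rate * HOURS_PER_MONTH * nodes * (1.0 - discount)


if __name__ == "__main__":
    rate = 0.10  # hypothetical pay-as-you-go rate in USD per hour
    payg = monthly_cost(rate, nodes=3)
    # Best case only: "up to 72%" is a ceiling, not a guaranteed discount.
    reserved = monthly_cost(rate, nodes=3, discount=0.72)
    print(f"pay-as-you-go: ${payg:.2f}/month, reserved (best case): ${reserved:.2f}/month")
```

The actual discount depends on the VM series, region, and term length, so treat the result as an upper bound on the savings.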

Let's check out how the pricing compares between GCP, AWS EC2, and Azure Virtual Machines. This is based on the us-east1 region, and we will compare the price ranges for the compute instances required to run your Galera Cluster.


Price ranges by instance type (Google Compute Engine, AWS EC2, Microsoft Azure)

Entry-level instances

  • Google Compute Engine: prices from $0.006 to $0.019 hourly
  • AWS EC2: t2.nano – t3a.2xlarge; prices from $0.0058 to $0.3328 hourly
  • Microsoft Azure: prices from $0.0052 to $0.832 hourly

Standard

  • Google Compute Engine: n1-standard-1 – n1-standard-96; prices from $0.034 to $3.193 hourly
  • AWS EC2: m4.large – m4.16xlarge, m5.large – m5d.metal; prices from $0.1 to $5.424 hourly
  • Microsoft Azure: Av2 Standard, D2-64 v3 latest generation, D2s-64s v3 latest generation, D1-5 v2, DS1-S5 v2, DC-series; prices from $0.043 to $3.072 hourly

High Memory / Memory Optimized

  • Google Compute Engine: n1-highmem-2 – n1-highmem-96, n1-ultramem-40 – n1-ultramem-160; prices from $0.083 to $17.651 hourly
  • AWS EC2: r4.large – r4.16xlarge, x1.16xlarge – x1.32xlarge, x1e.xlarge – x1e.32xlarge; prices from $0.133 to $26.688 hourly
  • Microsoft Azure: D2a – D64a v3, D2as – D64as v3, E2-64 v3 latest generation, E2a – E64a v3, E2as – E64as v3, E2s-64s v3 latest generation, D11-15 v2, DS11-S15 v2, M-series, Mv2-series, Extreme Memory Optimized instances; prices from $0.043 to $44.62 hourly

High CPU / Storage Optimized

  • Google Compute Engine: n1-highcpu-2 – n1-highcpu-32; prices from $0.05 to $2.383 hourly
  • AWS EC2: h1.2xlarge – h1.16xlarge, i3.large – i3.metal, i3en.large – i3en.metal, d2.xlarge – d2.8xlarge; prices from $0.156 to $10.848 hourly
  • Microsoft Azure: Fsv2-series, F-series, Fs-Series; prices from $0.0497 to $3.045 hourly


Data Encryption on Microsoft Azure

Microsoft Azure does not offer encryption support directly for Galera Cluster (or vice-versa). There are, however, ways you can encrypt data either at-rest or in-transit.

Encryption in-transit is a mechanism for protecting data when it's transmitted across networks. With Azure Storage, for example, you can secure data in transit by using transport-level encryption such as HTTPS.

Microsoft uses encryption to protect customer data when it’s in-transit between the customer's realm and Microsoft cloud services. More specifically, Transport Layer Security (TLS) is the protocol that Microsoft’s data centers use to negotiate with client systems that are connected to Microsoft cloud services.

Perfect Forward Secrecy (PFS) is also employed so that each connection between customers’ client systems and Microsoft’s cloud services uses unique keys. Connections to Microsoft cloud services also take advantage of RSA-based 2,048-bit encryption key lengths.

Encryption At-Rest

For many organizations, data encryption at-rest is a mandatory step towards achieving data privacy, compliance, and data sovereignty. Three Azure features provide encryption of data at-rest:

  • Storage Service Encryption is always enabled and automatically encrypts storage service data when writing it to Azure Storage. If your application logic requires your MySQL Galera Cluster database to store valuable data, then storing to Azure Storage can be an option.
  • Client-side encryption also provides the feature of encryption at-rest.
  • Azure Disk Encryption enables you to encrypt the OS disks and data disks that an IaaS virtual machine uses. Azure Disk Encryption also supports enabling encryption on Linux VMs that are configured with disk striping (RAID) by using mdadm, and on Linux VMs that use LVM for data disks.

Galera Cluster Multi-AZ/Multi-Region/Multi-Cloud Deployments with Azure

Similar to AWS and GCP, Microsoft Azure does not offer direct support for deploying a Galera Cluster across multiple availability zones, regions, or clouds. You can, however, deploy your nodes manually, or create scripts using PowerShell or the Azure CLI to do this for you. Alternatively, when you provision your Virtual Machine instances you can place your nodes in different availability zones. Aside from availability zones, Microsoft Azure also offers another type of redundancy called Virtual Machine Scale Sets. You can check the differences between virtual machines and scale sets.

Galera Cluster High Availability, Scalability, and Redundancy on Azure

One of the primary reasons for using a Galera Cluster is its high availability, redundancy, and ability to scale. If you are serving traffic globally, it's best to serve your traffic by region. You should ensure your architectural design includes geo-distribution of your database nodes. In order to achieve this, multi-AZ, multi-region, or multi-cloud/multi-datacenter deployments are recommended. This helps prevent the cluster from going down or malfunctioning due to a lack of quorum.
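The quorum concern can be checked on paper before you deploy. The following Python sketch assumes every node carries equal weight and that no arbitrator is involved; the zone names are hypothetical. It reports, for each zone, whether the nodes left after losing that zone still form a strict majority of the cluster.

```python
from typing import Dict


def survives_zone_loss(nodes_per_zone: Dict[str, int]) -> Dict[str, bool]:
    """For each zone, tell whether the remaining nodes keep a strict majority if it is lost."""
    total = sum(nodes_per_zone.values())
    return {
        # Strict majority: remaining nodes must be more than half of the cluster.
        zone: (total - count) * 2 > total
        for zone, count in nodes_per_zone.items()
    }


if __name__ == "__main__":
    # Hypothetical layouts: a lopsided two-zone cluster versus an even three-zone one.
    print(survives_zone_loss({"zone-1": 2, "zone-2": 1}))
    print(survives_zone_loss({"zone-1": 1, "zone-2": 1, "zone-3": 1}))
```

In the lopsided layout, losing the zone that holds two of the three nodes leaves a single node without a majority, which is why spreading an odd number of nodes evenly across at least three zones is the usual choice.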

As mentioned earlier, Microsoft Azure has an auto scaling solution which can be leveraged using scale sets. This allows you to automatically add a node when a certain threshold has been met, based on the health metrics you are monitoring, so the cluster scales horizontally. You can check out their tutorial on this topic here.

For multi-region or multi-cloud deployments, Galera has its own parameter called gmcast.segment, which can be set at server start. This parameter is designed to optimize the communication between the Galera nodes and minimize the amount of traffic sent between network segments. This includes writeset relaying and IST and SST donor selection. This type of setup allows you to deploy multiple nodes in different regions. Aside from that, you can also deploy your Galera nodes across different cloud vendors, ranging from GCP, AWS, and Microsoft Azure to an on-premise setup.
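As a rough illustration of how segments map to regions, the Python sketch below assigns one gmcast.segment value per region and prints the configuration line each node would carry. The node and region names are hypothetical, and the sketch assumes gmcast.segment is the only provider option you set; in a real configuration it would be combined with your other wsrep_provider_options values on one line.

```python
from typing import Dict

MAX_SEGMENTS = 256  # gmcast.segment accepts values from 0 to 255


def segment_options(node_regions: Dict[str, str]) -> Dict[str, str]:
    """Map each node to a wsrep_provider_options line, using one segment ID per region."""
    segments: Dict[str, int] = {}
    options: Dict[str, str] = {}
    for node, region in node_regions.items():
        if region not in segments:
            if len(segments) >= MAX_SEGMENTS:
                raise ValueError("too many regions for the 0-255 segment range")
            # Regions get segment IDs in order of first appearance: 0, 1, 2, ...
            segments[region] = len(segments)
        options[node] = f'wsrep_provider_options="gmcast.segment={segments[region]}"'
    return options


if __name__ == "__main__":
    # Hypothetical three-node cluster spread over two regions.
    layout = {"db1": "eastus", "db2": "eastus", "db3": "westeurope"}
    for node, line in segment_options(layout).items():
        print(f"{node}: {line}")
```

Nodes that share a region share a segment, so inter-region traffic is relayed once per segment rather than once per node.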

We recommend you to check out our blog Multiple Data Center Setups Using Galera Cluster for MySQL or MariaDB and Zero Downtime Network Migration With MySQL Galera Cluster Using Relay Node to gather more information on how to implement these types of deployments.

Galera Cluster Database Performance on Microsoft Azure

The underlying host machines used by virtual machines in Azure are, in fact, very powerful. The newest VMs in Azure already come equipped with network optimization modules. You can check this in your kernel info by running the following (e.g. in Ubuntu):

uname -r|grep azure

Note: Make certain that the output of the command contains the "azure" string.

For CentOS/RHEL, any Linux Integration Services (LIS) release since version 4.2 contains the network optimizations. To learn more about this, visit the page on optimizing network throughput.

If your application is very sensitive to network latency, you might be interested in looking at the proximity placement group. It's currently in preview (and not yet recommended for production use) but this helps optimize your network throughput. 

The type of virtual machine you choose depends on your application's traffic and resource demands. For queries that are high on memory consumption, you can start with the Dv3 series. If you need a memory-optimized type, start with the Ev3 series. For high CPU requirements, such as highly transactional databases or gaming applications, start with the Fsv2 series.

Choosing the right storage and required IOPS for your database volume is a must. Generally, an SSD-based persistent disk is your ideal choice. Begin with Standard SSD, which is cost-effective and offers consistent performance. This decision, however, might depend on whether you need more IOPS in the long run. If this is the case, then you should go for Premium SSD storage.

We also recommend you to check and read our blog How to Improve Performance of Galera Cluster for MySQL or MariaDB to learn more about optimizing your Galera Cluster.

Database Backup for Galera Nodes on Azure

There's no existing native backup support for your MySQL Galera data in Azure, but you can take a snapshot. Microsoft Azure offers Azure VM Backup, which takes snapshots that can be scheduled and encrypted.

Alternatively, if you want to back up the data files from your Galera Cluster, you can also use external services like ClusterControl, use Percona Xtrabackup for your binary backups, or use mysqldump or mydumper for your logical backups. These tools provide backup copies for your mission-critical data and you can read this if you want to learn more.

Galera Cluster Monitoring on Azure

Microsoft Azure has its own monitoring service named Azure Monitor. Azure Monitor maximizes the availability and performance of your applications by delivering a comprehensive solution for collecting, analyzing, and acting on telemetry from your cloud and on-premise environments. It helps you understand how your applications are performing and proactively identifies issues affecting them (and the resources they depend on). You can set up health alerts and get notified of advisories and alerts detected in the services you deployed.

If you want monitoring specific to your database, then you will need to utilize external monitoring tools which have advanced, highly-granular database metrics. There are several options to choose from, such as PMM by Percona, DataDog, Idera, VividCortex, or our very own ClusterControl (Monitoring is FREE with ClusterControl Community.)

Galera Cluster Database Security on Azure

As discussed in our previous blogs for AWS and GCP, you can take the same approach for securing your database in the public cloud. Once you create a virtual machine, you can specify which ports are allowed to be open, or create and set up a Network Security Group in Azure. You can open only the ports that need to be open (particularly ports 3306, 4444, 4567, 4568), or create a Virtual Network in Azure and place the nodes in private subnets if they are to remain private. In addition, if you set up your VMs in Azure without a public IP, they can still make outbound connections because Azure uses SNAT and PAT. If you're familiar with AWS and GCP, this explanation should be easy to follow.
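Once the security rules are in place, you can verify from another node that the ports named above are actually reachable. This Python sketch only tests whether a TCP connection can be opened; the host address in the example is hypothetical, and the check says nothing about UDP traffic or about whether the service behind the port is healthy.

```python
import socket
from typing import Dict, Iterable

GALERA_PORTS = (3306, 4444, 4567, 4568)  # ports listed in the text above


def check_ports(host: str, ports: Iterable[int] = GALERA_PORTS,
                timeout: float = 2.0) -> Dict[int, bool]:
    """Return {port: True/False} depending on whether a TCP connection succeeds."""
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or routing failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


if __name__ == "__main__":
    # Hypothetical private address of a peer Galera node.
    for port, is_open in check_ports("10.0.1.11").items():
        print(f"port {port}: {'open' if is_open else 'blocked or closed'}")
```

Run it from each node toward its peers, since a Network Security Group rule can allow traffic in one direction and block it in the other.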

Another feature available is Role-Based Access Control in Microsoft Azure. This gives you control over which people have access to the specific resources they need.

In addition to this, you can secure your data in-transit by using a TLS/SSL connection, and your data at-rest by encrypting it. If you're using ClusterControl, deploying secure data in-transit is simple and easy. You can check out our blog SSL Key Management and Encryption of MySQL Data in Transit if you want to try it out. For data at-rest, you can follow the discussion earlier in the Encryption section of this blog.

Galera Cluster Troubleshooting 

Microsoft Azure offers a wide array of log types to aid troubleshooting and auditing. Activity logs, Azure diagnostics logs, Azure AD reporting, Virtual machines and cloud services logs, Network Security Group (NSG) flow logs, and Application Insights are all very useful when troubleshooting. It might not always be necessary to go through all of these, but they can add more insights and clues when checking the logs.

If you're using ClusterControl, go to Logs -> System Logs and you'll be able to browse the captured error logs taken from the MySQL Galera node itself. Apart from this, ClusterControl provides real-time monitoring that strengthens your alarm and notification system in case of an emergency or if your MySQL Galera node(s) go down.


As we finish this three part blog series, we have shown you the offerings and the advantages of each of the tech giants serving the public cloud industry. There are advantages and disadvantages when selecting one over the other, but what matters most is your reason for moving to a public cloud, its benefits for your organization, and how it serves the requirements of your application.

The choice of provider for your Galera Cluster may involve financial considerations, such as which option is most cost-efficient and best suits your budgetary needs. It could also be due to privacy laws and regulatory compliance, or even because of the technology stack you want to use. What's important is how your application and database will perform once they are in the cloud handling large amounts of traffic. The setup has to be highly available and resilient, have the right levels of scalability and redundancy, and take backups to ensure data recovery.

Database Switchover and Failover for Drupal Websites Using MySQL or PostgreSQL


Drupal is a Content Management System (CMS) designed to create everything from tiny to large corporate websites. Over 1,000,000 websites run on Drupal and it is used to make many of the websites and applications you use every day (including this one). Drupal has a great set of standard features such as easy content authoring, reliable performance, and excellent security. What sets Drupal apart is its flexibility as modularity is one of its core principles. 

Drupal is also a great choice for creating integrated digital frameworks. You can extend it with the thousands of add-ons available. These modules expand Drupal's functionality. Themes let you customize your content's presentation, and distributions (Drupal bundles) are packages which you can use as starter-kits. You can mix and match all of these to enhance Drupal's core abilities or to integrate Drupal with external services. It is content management software that is powerful and scalable.

Drupal uses databases to store its web content. When your Drupal-based website or application is experiencing a large amount of traffic it can have an impact on your database server. When you are in this situation you'll require load balancing, high availability, and a redundant architecture to keep your database online. 

When I started researching this blog, I realized there are many answers to this issue online, but the solutions recommended were very dated. This could be a result of the increase in market share by WordPress resulting in a smaller open source community. What I did find were some examples of implementing high availability by using Master/Master (High Availability) or Master/Master/Slave (High Availability/High Performance).

Drupal offers support for a wide array of databases, but it was initially designed using MySQL variants. Though using MySQL is fully supported, there are better approaches you can implement. Implementing these other approaches, however, if not done properly, can cause your website to experience large amounts of downtime, cause your application to suffer performance issues, and may result in write issues to your slaves. Performing maintenance would also be difficult as you need failover to apply the server upgrades or patches (hardware or software) without downtime. This is especially true if you have a large amount of data, causing a potential major impact to your business. 

These are situations you don't want to happen, which is why in this blog we’ll discuss how you can implement database failover for your MySQL or PostgreSQL databases.

Why Does Your Drupal Website Need Database Failover?

From Wikipedia: “failover is switching to a redundant or standby computer server, system, hardware component or network upon the failure or abnormal termination of the previously active application, server, system, hardware component, or network. Failover and switchover are essentially the same operation, except that failover is automatic and usually operates without warning, while switchover requires human intervention.”

In database operations, switchover is also a term used for manual failover, meaning that it requires a person to operate the failover. Failover comes in handy for any admin as it isolates unwanted problems such as accidental deletes/dropping of tables, long hours of downtime causing business impact, database corruption, or system-level corruption. 

Database failover involves more than a single database node, either physical or virtual. Ideally, since failover requires switching over to a different node, you should switch to a different database server rather than to another instance on the same host, if that host is running multiple database instances. Switching between instances on one host can still be a switchover or failover, but failover is typically about redundancy and high availability in case a catastrophe occurs on the current host.

MySQL Failover for Drupal

Performing a failover for your Drupal-based application requires that the data handled by the database nodes stays consistent and does not diverge. There are several solutions available, and we have already discussed some of them in previous Severalnines blogs. You may want to read our Introduction to Failover for MySQL Replication - the 101 Blog.

The Master-Slave Switchover

The most common approach for MySQL failover is the master-slave switchover, or manual failover. There are two ways you can set this up:

  • You can implement your database with typical asynchronous master-slave replication.
  • Or you can implement asynchronous master-slave replication using GTID-based replication.

Switching to another master could be quicker and easier. This can be done with the following MySQL syntax:

mysql> SET GLOBAL read_only = 1; /* enable read-only */

mysql> CHANGE MASTER TO MASTER_HOST = '<hostname-or-ip>', MASTER_USER = '<user>', MASTER_PASSWORD = '<password>', MASTER_LOG_FILE = '<master-log-file>', MASTER_LOG_POS=<master_log_position>; /* master information to connect */

mysql> START SLAVE; /* start replication */

mysql> SHOW SLAVE STATUS\G /* check replication status */

or with GTID, you can simply do,


mysql> CHANGE MASTER TO MASTER_HOST = '<hostname-or-ip>', MASTER_USER = '<user>', MASTER_PASSWORD = '<password>', MASTER_AUTO_POSITION = 1; /* master information to connect */



Using the non-GTID approach requires you to first determine the master's log file and log position. You can determine these by checking the master's status (SHOW MASTER STATUS) on the master node before switching over.


You may also consider hardening your server by adding sync_binlog = 1 and innodb_flush_log_at_trx_commit = 1 so that, in the event your master crashes, there is a higher chance that the master's transactions will be in sync with your slave(s). In that case, the promoted master has a higher chance of being a consistent data source.

This, however, may not be the best approach for your Drupal database as it could impose long downtimes if not performed correctly, for example when the master is taken down abruptly. If your master database node hits a bug that causes the database to crash, you’ll need your application to point to another database waiting on standby as your new master, or to a slave that has been promoted to master. You will need to specify exactly which node should take over and then determine the lag and consistency of that node. Achieving this is not as easy as just running SET GLOBAL read_only=1; CHANGE MASTER TO… (etc). Certain situations require deeper analysis of which transactions must be present on the standby server or promoted master before the switch can be completed.
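One part of that analysis, choosing the slave that has applied the most of the old master's binary log, can be sketched in Python. The input below is hypothetical data shaped like rows from SHOW SLAVE STATUS (using its Relay_Master_Log_File and Exec_Master_Log_Pos fields), and the sketch assumes all slaves replicated from the same master and use position-based replication.

```python
from typing import Dict, List, Tuple


def applied_coordinates(status: Dict) -> Tuple[int, int]:
    """Return (binlog sequence number, position) that the slave has executed so far."""
    # Binary log names end in a numeric sequence, e.g. mysql-bin.000042.
    sequence = int(status["Relay_Master_Log_File"].rsplit(".", 1)[1])
    return (sequence, int(status["Exec_Master_Log_Pos"]))


def pick_promotion_candidate(replicas: List[Dict]) -> Dict:
    """Return the slave that has executed the furthest point in the master's binary log."""
    if not replicas:
        raise ValueError("no replicas to choose from")
    return max(replicas, key=applied_coordinates)


if __name__ == "__main__":
    # Hypothetical status rows collected from three slaves.
    replicas = [
        {"host": "db2", "Relay_Master_Log_File": "mysql-bin.000041", "Exec_Master_Log_Pos": 98765},
        {"host": "db3", "Relay_Master_Log_File": "mysql-bin.000042", "Exec_Master_Log_Pos": 120},
        {"host": "db4", "Relay_Master_Log_File": "mysql-bin.000042", "Exec_Master_Log_Pos": 4500},
    ]
    print(pick_promotion_candidate(replicas)["host"])
```

The most advanced slave is only a starting point: transactions that never left the crashed master are still missing from it, which is the gap that tools such as MHA and Orchestrator try to handle.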

Drupal Failover Using MHA

One of the most common tools for automatic and manual failover is MHA. It has been around for a long while now and is still used by many organizations. You can check out our previous blogs on the subject: Top Common Issues with MHA and How to Fix Them, or MySQL High Availability Tools - Comparing MHA, MRM and ClusterControl.

Drupal Failover Using Orchestrator

Orchestrator has been widely adopted and is being used by large organizations such as GitHub and Booking.com. It not only allows you to manage a failover, but also provides topology management, host discovery, refactoring, and recovery. There's a nice external blog here which I found very useful for learning about Orchestrator's failover mechanism. It's a two part blog series; part one and part two.

Drupal Failover Using MaxScale

MaxScale is not just a load balancer designed for MariaDB server. It also extends high availability, scalability, and security for MariaDB while, at the same time, simplifying application development by decoupling it from the underlying database infrastructure. If you are using MariaDB, then MaxScale could be a relevant technology for you. Check out our previous blogs on how you can use the MaxScale failover mechanism.

Drupal Failover Using ClusterControl

Severalnines' ClusterControl offers a wide array of database management and monitoring solutions. Part of the solutions we offer are automatic failover, manual failover, and cluster/node recovery. This is very helpful as it acts as your virtual database administrator, notifying you in real-time in case your cluster is in “panic mode,” all while the recovery is being managed by the system. You can check out this blog How to Automate Database Failover with ClusterControl to learn more about ClusterControl failover.

Other MySQL Solutions

Some of the old approaches are still applicable when you want to fail over. There's MMM and MRM, or you can check out Group Replication or Galera (note: Galera uses synchronous rather than asynchronous replication). Failover in a Galera Cluster does not work the same way as it does with asynchronous replication. With Galera you can just write to any node or, if you implement a master-slave approach, you can direct your application to another node that will be the active writer for the cluster.

Drupal PostgreSQL Failover

Since Drupal supports PostgreSQL, we will also check out the tools to implement a failover or switchover process for PostgreSQL. PostgreSQL uses built-in Streaming Replication, but you can also set it up to use Logical Replication (added as a core element of PostgreSQL in version 10).

Drupal Failover Using pg_ctlcluster

If your environment is Ubuntu, using pg_ctlcluster is a simple and easy way to achieve failover. For example, you can just run the following command:

$ pg_ctlcluster 9.6 pg_7653 promote

or with RHEL/CentOS, you can use the pg_ctl command like this:

$ sudo -iu postgres /usr/lib/postgresql/9.6/bin/pg_ctl promote -D  /data/pgsql/slave/data

server promoting

You can also trigger failover of a log-shipping standby server by creating a trigger file with the filename and path specified by the trigger_file in the recovery.conf. 

You have to be careful with standby (slave) promotion here, as you have to ensure that only one master is accepting read-write requests. This means that, while doing the switchover, you have to ensure the previous master has been demoted or stopped.
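A simple guard against ending up with two writable masters is to check every node's recovery state after the promotion. The Python sketch below works on hypothetical data: a mapping of node name to the result of SELECT pg_is_in_recovery() on that node, which returns true on a standby and false on a primary. How you collect those values is left out.

```python
from typing import Dict, List


def writable_nodes(in_recovery: Dict[str, bool]) -> List[str]:
    """Return the nodes that are not in recovery, i.e. the ones accepting writes."""
    return sorted(node for node, recovering in in_recovery.items() if not recovering)


def require_single_primary(in_recovery: Dict[str, bool]) -> str:
    """Return the only writable node, or raise if there are zero or several."""
    primaries = writable_nodes(in_recovery)
    if len(primaries) != 1:
        raise RuntimeError(f"expected exactly one primary, found {len(primaries)}: {primaries}")
    return primaries[0]


if __name__ == "__main__":
    # Hypothetical state after promoting pg2 while pg1 was stopped and rebuilt as a standby.
    print(require_single_primary({"pg1": True, "pg2": False, "pg3": True}))
```

A node that is stopped or unreachable cannot answer the query at all, so treat a missing answer as unknown rather than as a standby.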

Taking care of switchover or manual failover from primary to standby server can be fast, but it requires some time to re-prepare the failover cluster. Regularly switching from primary to standby is a useful practice as it allows for regular downtime on each system for maintenance. This also serves as a test of the failover mechanism, to ensure that it will really work when you need it. Written administration procedures are always advised. 

Drupal PostgreSQL Automatic Failover

Instead of a manual approach, you might require automatic failover. This is especially needed when a server goes down due to hardware failure or virtual machine corruption. You may also require an application to automatically perform the failover to lessen the downtime of your Drupal application. We'll now go over some of these tools which can be utilized for automatic failover.

Drupal Failover Using Patroni

Patroni is a template for you to create your own customized, high-availability solution using Python and, for maximum accessibility, a distributed configuration store like ZooKeeper, etcd, Consul or Kubernetes. Database engineers, DBAs, DevOps engineers, and SREs who are looking to quickly deploy HA PostgreSQL in the datacenter (or anywhere else) will hopefully find it useful.

Drupal Failover Using Pgpool

Pgpool-II is proxy software that sits between the PostgreSQL servers and a PostgreSQL database client. Aside from automatic failover, it has multiple features including connection pooling, load balancing, replication, and limiting excess connections. You can read more about this tool in our three part blog; part one, part two, part three.

Drupal Failover Using pglookout

pglookout is a PostgreSQL replication monitoring and failover daemon. pglookout monitors the database nodes and their replication status, and acts according to that status, for example by calling a predefined failover command to promote a new master in case the previous one goes missing.

pglookout supports two different node types: ones that are installed on the DB nodes themselves, and observer nodes that can be installed anywhere. The purpose of having pglookout on the PostgreSQL DB nodes is to monitor the replication status of the cluster and act accordingly. The observers have a more limited remit: they just observe the cluster status to give another viewpoint on the cluster state.

Drupal Failover Using repmgr

repmgr is an open-source tool suite for managing replication and failover in a cluster of PostgreSQL servers. It enhances PostgreSQL's built-in hot-standby capabilities with tools to set up standby servers, monitor replication, and perform administrative tasks such as failover or manual switchover operations.

repmgr has provided advanced support for PostgreSQL's built-in replication mechanisms since they were introduced in 9.0. The current repmgr series, repmgr 4, supports the latest developments in replication functionality introduced from PostgreSQL 9.3 such as cascading replication, timeline switching and base backups via the replication protocol.

Drupal Failover Using ClusterControl

ClusterControl supports automatic failover for PostgreSQL. If you have an incident, your slave can be promoted to master status automatically. With ClusterControl you can also deploy standalone, replicated, or clustered PostgreSQL databases. You can also easily add or remove a node with a single action.

Other PostgreSQL Drupal Failover Solutions

There are certainly automatic failover solutions that I might have missed in this blog. If so, please add your comments below and share your thoughts and experiences with your failover implementation and setup, especially for Drupal websites or applications.

Additional Solutions For Drupal Failover

While the tools mentioned earlier handle the failover itself, it can be worthwhile to add tools that make failover easier and safer, and that fully isolate your application from the database layer.

Drupal Failover Using ProxySQL

With ProxySQL, you can just point your Drupal websites or applications to the ProxySQL server host, and it will designate which node receives writes and which nodes receive reads. The magic happens transparently within the TCP layer, and no changes are needed to your application or website configuration. In addition, ProxySQL acts as a load balancer for the read and write requests in your database traffic. This is only applicable if you are using MySQL database variants.
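To make the read/write split more concrete, here is a minimal Python sketch of the kind of rule-based routing that a proxy in front of the database performs. It is an illustration under assumptions, not ProxySQL's implementation: the hostgroup numbers (10 for the writer, 20 for readers), the host names, and the `Router` class are hypothetical, and a real ProxySQL evaluates query rules configured through its own admin interface.

```python
import itertools
import re

# Hypothetical hostgroup ids (assumed for this sketch): 10 = writer, 20 = readers.
WRITER_HOSTGROUP = 10
READER_HOSTGROUP = 20

# Ordered rules, first match wins; anything unmatched goes to the writer.
RULES = [
    (re.compile(r"^\s*SELECT\b.*\bFOR\s+UPDATE\b", re.I | re.S), WRITER_HOSTGROUP),
    (re.compile(r"^\s*SELECT\b", re.I), READER_HOSTGROUP),
]


def pick_hostgroup(query):
    """Return the hostgroup a query should be sent to."""
    for pattern, hostgroup in RULES:
        if pattern.search(query):
            return hostgroup
    return WRITER_HOSTGROUP


class Router:
    """Send writes to the single writer and spread reads across replicas."""

    def __init__(self, writer, readers):
        self.writer = writer
        self._readers = itertools.cycle(readers)  # simple round-robin

    def route(self, query):
        if pick_hostgroup(query) == READER_HOSTGROUP:
            return next(self._readers)
        return self.writer


router = Router("db1", ["db2", "db3"])
print(router.route("SELECT title FROM node WHERE nid = 1"))
print(router.route("UPDATE node SET title = 'x' WHERE nid = 1"))
```

Because the application only ever talks to the proxy address, a failover changes which host sits behind the writer role without any change to the Drupal configuration.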

Drupal Failover Using HAProxy with Keepalived

Using HAProxy and Keepalived adds more high availability and redundancy to your Drupal database layer. A failover can be performed without your application knowing what is happening within the database layer. Just point your application to the VRRP virtual IP that you set up in Keepalived, and everything will be handled in total isolation from your application. An automatic failover is handled transparently, so no application changes are needed when, for example, a disaster has occurred and a recovery or failover has been applied. The good thing about this setup is that it is applicable to both MySQL and PostgreSQL databases. I suggest you check out our blog PostgreSQL Load Balancing Using HAProxy & Keepalived to learn more about how to do this.

All of the options above are supported by ClusterControl. You can deploy or import the database and then deploy ProxySQL, MaxScale, or HAProxy & Keepalived. Everything will be managed, monitored, and set up automatically without any further configuration needed on your end. It all happens in the background and automatically creates a production-ready setup.


Having an always-on Drupal website or application, especially if you are expecting a large amount of traffic, can be complicated to create. If you have the right tools, the right setup, and the right technology stack, however, it is possible to achieve high availability and redundancy.

And if you don’t? Well then ClusterControl will set it up and maintain it for you. Alternatively, you can create a setup using the technologies mentioned in this blog, most of which are open source, free tools that would cater to your needs.

Comparing DBaaS Failover Solutions to Manual Recovery Setups


We have recently written several blogs covering how different cloud providers handle database failover. We compared failover performance in Amazon Aurora, Amazon RDS and ClusterControl, tested the failover behavior in Amazon RDS, and also on Google Cloud Platform. While those services provide great options when it comes to failover, they may not be right for every application.

In this blog post we will spend a bit of time analysing the pros and cons of using the DBaaS solutions compared with designing an environment manually or by using a database management platform, like ClusterControl.

Implementing High Availability Databases with Managed Solutions

The primary reason to use existing solutions is ease of use. You can deploy a highly available solution with automated failover in just a couple of clicks. There’s no need for combining different tools together, managing the databases by hand, deploying tools, writing scripts, designing the monitoring, or any other database management operations. Everything is already in place. This can seriously flatten the learning curve and reduce the experience required to set up a highly available environment for the databases, allowing basically everyone to deploy such setups.

In most of the cases with these solutions, the failover process is executed within a reasonable time. It may be blazing fast as with Amazon Aurora or somewhat slower as with Google Cloud Platform SQL nodes. For the majority of the cases, these types of results are acceptable. 

The bottom line: if you can accept 30 - 60 seconds of downtime, you should be OK using any of the DBaaS platforms.

The Downside of Using a Managed Solution for HA

While DBaaS solutions are simple to use, they also come with some serious drawbacks. For starters, there is always a vendor lock-in component to consider. Once you deploy a cluster in Amazon Web Services it is quite tricky to migrate out of that provider. There are no easy methods to download the full dataset through a physical backup. With most providers, only manually executed logical backups are available. Sure, there are always options to achieve this, but it is typically a complex, time-consuming process, which still may require some downtime after all.

Using a provider like Amazon RDS also comes with limitations. Some actions that would be very simple to accomplish in environments deployed in a fully user-controlled manner (e.g. AWS EC2) cannot be easily performed. Some of these limitations have already been covered in other blogs, but to summarize: no DBaaS service gives you the same level of flexibility as regular MySQL GTID-based replication. With it, you can promote any slave, you can re-slave every node off any other... virtually every action is possible. With tools like RDS you face design-induced limitations you cannot bypass.

Another problem is the limited ability to understand performance details. When you design your own highly available setup, you become knowledgeable about potential performance issues that may show up. On the other hand, RDS and similar environments are pretty much “black boxes.” Yes, we have learned that Amazon RDS uses DRBD to create a shadow copy of the master, and we know that Aurora uses shared, replicated storage to implement very fast failovers. That is just general knowledge. We cannot tell what the performance implications of those solutions are, other than what we might casually notice. What are common issues associated with them? How stable are those solutions? Only the developers behind the solution know for sure.

What is the Alternative to DBaaS Solutions?

You may wonder, is there an alternative to DBaaS? After all, it is so convenient to run a managed service where you can access most of the typical actions via a UI. You can create and restore backups, and failover is handled automatically for you. The environment is easy to use, which can be compelling for companies that do not have dedicated and experienced staff for dealing with databases.

ClusterControl provides a great alternative to cloud-based DBaaS services. It provides you with a graphical user interface, which can be used to deploy, manage, and monitor open source databases. 

In a couple of clicks you can easily deploy a highly available database cluster with automated failover (faster than most of the DBaaS offerings), backup management, advanced monitoring, and other features like integration with external tools (e.g. Slack or PagerDuty) or upgrade management. All this while completely avoiding vendor lock-in.

ClusterControl doesn’t care where your databases are located as long as it can connect to them using SSH. You can have setups in the cloud, on-prem, or in a mixed environment of multiple cloud providers. As long as connectivity is there, ClusterControl will be able to manage the environment. Using the solutions you want (and not ones that you are unfamiliar with or unaware of) allows you to take full control over the environment at any point in time.

Whatever setup you deploy with ClusterControl, you can easily manage it in a more traditional, manual or scripted way. ClusterControl even provides you with a command line interface, which lets you incorporate tasks executed by ClusterControl into your shell scripts. You have all the control you want: nothing is a black box, and every piece of the environment is built using open source solutions combined together and deployed by ClusterControl.

Let’s take a look at how easily you can deploy a MySQL Replication cluster using ClusterControl. Let’s assume you have the environment prepared, with ClusterControl installed on one instance and all other nodes accessible via SSH from the ClusterControl host.

ClusterControl Deployment Wizard

We will start with picking the “Deploy” wizard.

ClusterControl Deployment Wizard

At the first step we have to define how ClusterControl should connect to the nodes on which databases are to be deployed. Both root access and sudo (with or without a password) are supported.

ClusterControl Deployment Wizard

Then, we want to pick a vendor, version and pass the password for the administrative user in our MySQL database.

ClusterControl Deployment Wizard

Finally, we want to define the topology for our new cluster. As you can see, this is already quite a complex setup, unlike something you can deploy using AWS RDS or a GCP SQL node.

ClusterControl Jobs

All we have to do now is wait for the process to complete. ClusterControl will do its best to understand the environment it is deploying to and install the required set of packages, including the database itself.

ClusterControl Cluster List

Once the cluster is up-and-running, you can proceed with deploying the proxy layer (which will provide your application with a single point of entry into the database layer). This is more or less what happens behind the scenes with DBaaS, where you also have endpoints to connect to the database cluster. It is quite common to use a single endpoint for writes and multiple endpoints for reaching particular replicas.

Database Cluster Topology

Here we will use ProxySQL, which will do the dirty work for us: it will understand the topology, send writes only to the master, and load balance read-only queries across all the replicas that we have.

To deploy ProxySQL we will go to Manage -> Load Balancers.

Add Database Load Balancer ClusterControl

We have to fill in all required fields: the hosts to deploy on and the credentials for the administrative and monitoring users. We may also import an existing user from MySQL into ProxySQL or create a new one. All the details about ProxySQL can be found in multiple posts in our blog section.

We want at least two ProxySQL nodes to be deployed to ensure high availability. Then, once they are deployed, we will deploy Keepalived on top of ProxySQL. This will ensure that a Virtual IP is configured and points to one of the ProxySQL instances, as long as there is at least one healthy node.

Add ProxySQL ClusterControl

The only potential problem arises in cloud environments where routing works in a way that prevents you from simply bringing up a network interface. In such a case you will have to modify the Keepalived configuration and introduce a ‘notify_master’ script that makes the necessary IP changes. In the case of EC2, it would have to detach the Elastic IP from one host and attach it to the other.
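As a rough illustration of such a script, the Python sketch below builds (and optionally runs) the AWS CLI call that re-associates an Elastic IP with the instance that has just become the Keepalived master. It assumes the AWS CLI is installed on the host and that the instance has permission to move the address; the allocation ID, the instance ID, and the function names are placeholders invented for this example.

```python
import subprocess


def build_associate_command(allocation_id, instance_id):
    """Build the AWS CLI call that moves an Elastic IP to the given instance."""
    return [
        "aws", "ec2", "associate-address",
        "--allocation-id", allocation_id,
        "--instance-id", instance_id,
        # Allow the address to be taken over from the previous holder.
        "--allow-reassociation",
    ]


def on_become_master(allocation_id, instance_id, dry_run=True):
    """Intended to be called from the script Keepalived runs via notify_master."""
    command = build_associate_command(allocation_id, instance_id)
    if not dry_run:
        # Raises CalledProcessError if the AWS CLI reports a failure.
        subprocess.run(command, check=True)
    return command


# Placeholder IDs; with dry_run=True the command is only printed, not executed.
print(" ".join(on_become_master("eipalloc-EXAMPLE", "i-EXAMPLE")))
```

Keeping a dry-run mode makes it possible to test the script on a standby node without actually stealing the address from the active one.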

There are plenty of instructions on how to do that using widely-tested open source software in setups deployed by ClusterControl. You can easily find additional information, tips, and how-to’s which are relevant to your particular environment.

Database Cluster Topology with Load Balancer


We hope you found this blog post insightful. If you would like to test ClusterControl, it comes with a 30 day enterprise trial with all the features available. You can download it for free and test if it fits in your environment.

How to Troubleshoot MySQL Database Issues


As soon as you start running a database server and your usage grows, you are exposed to many types of technical problems, performance degradation, and database malfunctions. Each of these could lead to much bigger problems, such as catastrophic failure or data loss. It’s like a chain reaction, where one thing can lead to another, causing more and more issues. Proactive countermeasures must be taken in order to keep your environment stable for as long as possible.

In this blog post, we are going to look at a bunch of cool features offered by ClusterControl that can greatly help us troubleshoot and fix our MySQL database issues when they happen.

Database Alarms and Notifications

ClusterControl logs all undesired events under Alarms, accessible from Activity in the top menu of the ClusterControl page. This is commonly the first place to start troubleshooting when something goes wrong. From this page, we can get an idea of what is actually going on with our database cluster:

ClusterControl Database Alarms

The above screenshot shows an example of a server unreachable event, with severity CRITICAL, detected by two components, Network and Node. If you have configured the email notifications setting, you should get a copy of these alarms in your mailbox. 

When clicking on the “Full Alarm Details,” you can get the important details of the alarm like hostname, timestamp, cluster name and so on. It also provides the next recommended step to take. You can also send out this alarm as an email to other recipients configured under the Email Notification Settings. 

You may also opt to silence an alarm by clicking the “Ignore Alarm” button, and it will not appear in the list again. Ignoring an alarm might be useful if you have a low severity alarm and know how to handle or work around it, for example, if ClusterControl detects a duplicate index in your database that in some cases is needed by your legacy applications.

By looking at this page, we can obtain an immediate understanding of what is going on with our database cluster and what the next step is to solve the problem. In this case, one of the database nodes went down and became unreachable via SSH from the ClusterControl host. Even a beginner SysAdmin would know what to do next if this alarm appears.

Centralized Database Log Files

This is where we can drill down into what went wrong with our database server. Under ClusterControl -> Logs -> System Logs, you can see all log files related to the database cluster. For a MySQL-based database cluster, ClusterControl pulls the ProxySQL log, the MySQL error log, and the backup logs:

ClusterControl System Logs

Click on "Refresh Log" to retrieve the latest log from all hosts that are accessible at that particular time. If a node is unreachable, ClusterControl will still show the outdated log, since this information is stored inside the CMON database. By default ClusterControl retrieves the system logs every 10 minutes, configurable under Settings -> Log Interval.

ClusterControl will trigger the job to pull the latest log from each server, as shown in the following "Collect Logs" job:

ClusterControl Database Job Details

A centralized view of the log files allows us to understand faster what went wrong. For a database cluster, which commonly involves multiple nodes and tiers, this feature greatly improves log reading: a SysAdmin can compare these logs side-by-side and pinpoint critical events, reducing the total troubleshooting time.

Web SSH Console

ClusterControl provides a web-based SSH console so you can access the DB server directly via the ClusterControl UI (as the SSH user is configured to connect to the database hosts). From here, we can gather much more information which allows us to fix the problem even faster. Everyone knows when a database issue hits the production system, every second of downtime counts.

To access the SSH console via web, simply pick the nodes under Nodes -> Node Actions -> SSH Console, or simply click on the gear icon for a shortcut:

ClusterControl Web SSH Console Access

Because of the security concerns this feature might raise, especially in multi-user or multi-tenant environments, you can disable it by editing /var/www/html/clustercontrol/bootstrap.php on the ClusterControl server and setting the following constant to false:

define('SSH_ENABLED', false);

Refresh the ClusterControl UI page to load the new changes.
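If you manage several ClusterControl hosts, the same change could be scripted. The Python sketch below is only one possible approach: it assumes the constant is declared in bootstrap.php with a `define('SSH_ENABLED', ...)` line like the one shown above, and the helper name `disable_web_ssh` is made up for this example. It works on the file contents as a string, so reading and writing the file (and taking a backup first) is left to the caller.

```python
import re

# Matches define('SSH_ENABLED', true); or define('SSH_ENABLED', false);
DEFINE_PATTERN = re.compile(
    r"define\(\s*'SSH_ENABLED'\s*,\s*(?:true|false)\s*\)\s*;",
    re.IGNORECASE,
)


def disable_web_ssh(php_source):
    """Return the PHP source with the SSH_ENABLED constant set to false."""
    return DEFINE_PATTERN.sub("define('SSH_ENABLED', false);", php_source)


sample = "<?php\ndefine('SSH_ENABLED', true);\n"
print(disable_web_ssh(sample))
```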

Database Performance Issues

Apart from monitoring and trending features, ClusterControl proactively sends you various alarms and advisors related to database performance, for example:

  • Excessive usage - Resources such as CPU, memory, swap usage and disk space that pass certain thresholds.
  • Cluster degradation - Cluster and network partitioning.
  • System time drift - Time difference among all nodes in the cluster (including ClusterControl node).
  • Various other MySQL related advisors:
    • Replication - replication lag, binlog expiration, location and growth
    • Galera - SST method, scan GRA logfile, cluster address checker
    • Schema check - Non-transactional table existence on Galera Cluster.
    • Connections - Threads connected ratio
    • InnoDB - Dirty pages ratio, InnoDB log file growth
    • Slow queries - By default ClusterControl will raise an alarm if it finds a query running for more than 30 seconds. This is of course configurable under Settings -> Runtime Configuration -> Long Query.
    • Deadlocks - InnoDB transactions deadlock and Galera deadlock.
    • Indexes - Duplicate keys, table without primary keys.
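To illustrate the idea behind the slow query alarm, the following Python sketch flags statements that have been running for longer than 30 seconds, given rows shaped like the output of SHOW PROCESSLIST (with its Id, Command, Time and Info columns). This is not how ClusterControl implements the check; the function name and the sample rows are invented for the example.

```python
LONG_QUERY_SECONDS = 30  # default alarm threshold mentioned in the list above


def find_long_queries(processlist, threshold=LONG_QUERY_SECONDS):
    """Return processlist rows for statements running longer than `threshold` seconds."""
    return [
        row for row in processlist
        # Idle connections report Command = 'Sleep', so only look at running queries.
        if row["Command"] == "Query" and row["Time"] > threshold
    ]


# Made-up sample rows, shaped like SHOW PROCESSLIST output.
sample = [
    {"Id": 11, "Command": "Sleep", "Time": 900, "Info": None},
    {"Id": 12, "Command": "Query", "Time": 4, "Info": "SELECT 1"},
    {"Id": 13, "Command": "Query", "Time": 75, "Info": "DELETE FROM sbtest1"},
]

for row in find_long_queries(sample):
    print(row["Id"], row["Time"], row["Info"])
```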

Check out the Advisors page under Performance -> Advisors to get the details of things that can be improved as suggested by ClusterControl. For every advisor, it provides justifications and advice as shown in the following example for "Checking Disk Space Usage" advisor:

ClusterControl Disk Space Usage Check

When a performance issue occurs you will get a "Warning" (yellow) or "Critical" (red) status on these advisors. Further tuning is commonly required to overcome the problem. Advisors raise alarms, which means that users will get a copy of these alarms in their mailbox if Email Notifications are configured accordingly. For every alarm raised by ClusterControl or its advisors, users will also get an email when the alarm has been cleared. These are pre-configured within ClusterControl and require no initial configuration. Further customization is always possible under Manage -> Developer Studio. You can check out this blog post on how to write your own advisor.

ClusterControl also provides a dedicated page for database performance under ClusterControl -> Performance. It provides all sorts of database insights following best practices, such as a centralized view of DB Status, Variables, InnoDB status, Schema Analyzer, and Transaction Logs. These are pretty self-explanatory and straightforward to understand.

For query performance, you can inspect Top Queries and Query Outliers, where ClusterControl highlights queries whose performance differs significantly from their average. We have covered this topic in detail in this blog post, MySQL Query Performance Tuning.

Database Error Reports

ClusterControl comes with an error report generator tool, to collect debugging information about your database cluster to help understand the current situation and status. To generate an error report, simply go to ClusterControl -> Logs -> Error Reports -> Create Error Report:

ClusterControl Database Error Reports

The generated error report can be downloaded from this page once ready. The report is a tarball (tar.gz) that you may attach to a support request. Since support tickets have a file size limit of 10MB, if the tarball is bigger than that, you can upload it to a cloud drive and share only the download link with us, with proper permissions. You may remove it once we have retrieved the file. You can also generate the error report via the command line, as explained in the Error Report documentation page.

In the event of an outage, we highly recommend that you generate multiple error reports during and right after the outage. Those reports will be very useful when trying to understand what went wrong and the consequences of the outage, and to verify that the cluster is in fact back to operational status after a disastrous event.


ClusterControl's proactive monitoring, together with its set of troubleshooting features, provides an efficient platform for users to troubleshoot any kind of MySQL database issue. Long gone is the legacy way of troubleshooting, where one has to open multiple SSH sessions to access multiple hosts and execute multiple commands repeatedly in order to pinpoint the root cause.

If the features mentioned above do not help you solve the problem or troubleshoot the database issue, you can always contact the Severalnines Support Team to back you up. Our 24/7/365 dedicated technical experts are available to attend to your request at any time. Our average first reply time is usually less than 30 minutes.

What’s New in MySQL Galera Cluster 4.0


MySQL Galera Cluster 4.0 is the new kid on the database block with very interesting new features. Currently it is available only as a part of MariaDB 10.4 but in the future it will work as well with MySQL 5.6, 5.7 and 8.0. In this blog post we would like to go over some of the new features that came along with Galera Cluster 4.0.

Galera Cluster Streaming Replication

The most important new feature in this release is streaming replication. So far the certification process for the Galera Cluster worked in a way that whole transactions had to be certified after they completed. 

This process was not ideal in several scenarios...

  1. Hotspots in tables, rows which are very frequently updated on multiple nodes - hundreds of fast transactions running on multiple nodes, modifying the same set of rows result in frequent deadlocks and rollback of transactions
  2. Long running transactions - if a transaction takes significant time to complete, this seriously increases chances that some other transaction, in the meantime, on another node, may modify some of the rows that were also updated by the long transaction. This resulted in a deadlock during certification and one of the transactions having to be rolled back.
  3. Large transactions - if a transaction modifies a significant number of rows, it is likely that another transaction, at the same time, on a different node, will modify one of the rows already modified by the large transaction. This results in a deadlock during certification and one of the transactions has to be rolled back. In addition to this, large transactions will take additional time to be processed, sent to all nodes in the cluster and certified. This is not an ideal situation as it adds delay to commits and slows down the whole cluster.
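The common thread in these three scenarios is that a conflict is only detected at certification time, after all the work has been done. The Python sketch below is a deliberately simplified toy model of that idea, not Galera's actual certification algorithm: the `ToyCertifier` class, the sequence numbers and the row keys are all invented for illustration.

```python
class ToyCertifier:
    """Toy model: remembers which sequence number last wrote each row."""

    def __init__(self):
        self.seqno = 0
        self.last_writer = {}  # row key -> seqno of the last certified write

    def certify(self, write_set, snapshot_seqno):
        """Certify a transaction that started at snapshot_seqno and wrote write_set."""
        for key in write_set:
            if self.last_writer.get(key, 0) > snapshot_seqno:
                return False  # the row changed after this transaction started
        self.seqno += 1
        for key in write_set:
            self.last_writer[key] = self.seqno
        return True


certifier = ToyCertifier()

# A long transaction starts at seqno 0 and keeps working...
long_trx_snapshot = certifier.seqno

# ...while a short transaction on another node updates one of the same rows.
print(certifier.certify({"row:42"}, snapshot_seqno=0))

# When the long transaction finally reaches certification, it fails.
print(certifier.certify({"row:41", "row:42", "row:43"}, long_trx_snapshot))
```

The longer or larger the transaction, the wider the window in which another write can slip in, which is why hotspots, long transactions and large transactions all end in rollbacks.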

Luckily, streaming replication can solve these problems. The main difference is that certification happens in chunks, so there is no need to wait for the whole transaction to complete. As a result, even if a transaction is large or long, the majority (or all, depending on the settings we will discuss in a moment) of its rows are locked on all of the nodes, preventing other queries from modifying them.

MySQL Galera Cluster Streaming Replication Options

There are two configuration options for streaming replication:

wsrep_trx_fragment_size

This tells how big a fragment should be (by default it is set to 0, which means that streaming replication is disabled).

wsrep_trx_fragment_unit

This tells what the fragment really is. By default it is ‘bytes’, but it can also be ‘statements’ or ‘rows’.

Those variables can (and should) be set at the session level, making it possible for the user to decide which particular query should be replicated using streaming replication. Setting the unit to ‘statements’ and the size to 1 makes it possible, for example, to use streaming replication just for a single query that updates a hotspot.
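As a small illustration of the session-level approach, the Python helper below generates the statements an application could send around a single hotspot query: enable streaming replication for the session, run the query, then set the fragment size back to 0 to disable it again. The helper name and the example table are hypothetical; only the two wsrep variables and their accepted units come from the text above.

```python
VALID_UNITS = ("bytes", "rows", "statements")


def streaming_session_statements(query, unit="statements", size=1):
    """Wrap one query with session-level streaming replication settings."""
    if unit not in VALID_UNITS:
        raise ValueError(f"unit must be one of {VALID_UNITS}")
    if size < 0:
        raise ValueError("size must not be negative")
    return [
        f"SET SESSION wsrep_trx_fragment_unit='{unit}';",
        f"SET SESSION wsrep_trx_fragment_size={size};",
        query.rstrip("; ") + ";",
        # A fragment size of 0 disables streaming replication again.
        "SET SESSION wsrep_trx_fragment_size=0;",
    ]


# 'counters' is a made-up hotspot table used only for this example.
for statement in streaming_session_statements(
    "UPDATE counters SET hits = hits + 1 WHERE id = 1"
):
    print(statement)
```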

You can configure Galera 4.0 to certify every row that you have modified and grab the locks on all of the nodes while doing so. This makes streaming replication great at solving problems with frequent deadlocks which, prior to Galera 4.0, were possible to solve only by redirecting all writes to a single node.

WSREP Tables

Galera 4.0 introduces several tables, which will help to monitor the state of the cluster:

  • wsrep_cluster
  • wsrep_cluster_members
  • wsrep_streaming_log

All of them are located in the ‘mysql’ schema. wsrep_cluster will provide insight into the state of the cluster. wsrep_cluster_members will give you information about the nodes that are part of the cluster. wsrep_streaming_log helps to track the state of the streaming replication.

Galera Cluster Upcoming Features

Codership, the company behind Galera, isn’t done yet. We were able to get a preview of the roadmap from CEO Seppo Jaakola, which was presented at Percona Live earlier this year. Apparently, we are going to see features like XA transaction support and gcache encryption. This is really good news.

Support for XA transactions will be possible thanks to streaming replication. In short, XA transactions are distributed transactions that can run across multiple nodes. They use two-phase commit, which requires first acquiring all the locks needed to run the transaction on all of the nodes and then, once that is done, committing the changes. In previous versions Galera had no means of locking resources on remote nodes; with streaming replication this has changed.
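For readers unfamiliar with two-phase commit, here is a minimal, generic Python sketch of the protocol: first every participant is asked to prepare (acquire its locks), and only if all of them succeed is the transaction committed everywhere. It is a textbook-style illustration with invented class and function names, not a description of how Galera will implement XA support.

```python
class Participant:
    """A node taking part in a distributed transaction (illustrative only)."""

    def __init__(self, name, can_lock=True):
        self.name = name
        self.can_lock = can_lock
        self.state = "idle"

    def prepare(self):
        # Phase 1: try to acquire every lock the transaction needs on this node.
        self.state = "prepared" if self.can_lock else "aborted"
        return self.can_lock

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit on all participants, or on none of them."""
    prepared = []
    for participant in participants:
        if participant.prepare():
            prepared.append(participant)
        else:
            # One node could not prepare: undo the ones that already did.
            for done in prepared:
                done.rollback()
            return False
    # Phase 2: everyone holds their locks, so committing is safe.
    for participant in participants:
        participant.commit()
    return True


nodes = [Participant("node1"), Participant("node2"), Participant("node3")]
print(two_phase_commit(nodes), [node.state for node in nodes])
```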

Gcache is a file which stores writesets. Its contents are sent to joiner nodes which ask for a data transfer. If all the required data is stored in the gcache, the joiner will receive just the missing transactions in a process called Incremental State Transfer (IST). If the gcache does not contain all the required data, a State Snapshot Transfer (SST) will be required and the whole dataset will have to be transferred to the joining node.
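The decision between the two transfer types can be pictured with the short Python sketch below. It is a simplification that looks only at sequence numbers (a real cluster checks more than that), and the function and parameter names are assumptions made for this example.

```python
def choose_state_transfer(joiner_last_seqno, gcache_first_seqno):
    """Pick IST when the donor's gcache still holds every writeset the joiner lacks.

    joiner_last_seqno:  last writeset the joining node has applied
    gcache_first_seqno: oldest writeset still available in the donor's gcache
    """
    first_missing = joiner_last_seqno + 1
    if gcache_first_seqno <= first_missing:
        return "IST"  # only the missing transactions are sent
    return "SST"      # gap in the gcache: the whole dataset must be copied


print(choose_state_transfer(joiner_last_seqno=1000, gcache_first_seqno=900))
print(choose_state_transfer(joiner_last_seqno=1000, gcache_first_seqno=5000))
```

This is also why a larger gcache lets a node stay offline longer before a full SST becomes unavoidable.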

Gcache contains information about recent changes, therefore it’s great to see its contents encrypted for better security. With better security standards being introduced through more and more regulations, it is crucial that the software will become better at achieving compliance.


We are definitely looking forward to seeing how Galera Cluster 4.0 will work out on databases other than MariaDB. Being able to deploy MySQL 5.7 or 8.0 with Galera Cluster will be really great. After all, Galera is one of the most widely tested synchronous replication solutions available on the market.

A Guide to MySQL Galera Cluster Streaming Replication: Part Two


In the first part of this blog we provided an overview of the new Streaming Replication feature in MySQL Galera Cluster. In this blog we will show you how to enable it and take a look at the results.

Enabling Streaming Replication

It is highly recommended that you enable Streaming Replication at a session-level for the specific transactions that interact with your application/client. 

As stated in the previous blog, Galera logs its write-sets to the wsrep_streaming_log table in MySQL database. This has the potential to create a performance bottleneck, especially when a rollback is needed. This doesn't mean that you can’t use Streaming Replication, it just means you need to design your application client efficiently when using Streaming Replication so you’ll get better performance. Still, it's best to have Streaming Replication for dealing with and cutting down large transactions.

Enabling Streaming Replication requires you to define the replication unit and number of units to use in forming the transaction fragments. Two parameters control these variables: wsrep_trx_fragment_unit and wsrep_trx_fragment_size.

Below is an example of how to set these two parameters:

SET SESSION wsrep_trx_fragment_unit='statements';

SET SESSION wsrep_trx_fragment_size=3;

In this example, the fragment is set to three statements. For every three statements from a transaction, the node will generate, replicate, and certify a fragment.
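The grouping described here can be pictured with a few lines of Python. This is purely an illustration of how statements map to fragments when the unit is 'statements'; the function name and the sample statements are made up, and the real fragmenting happens inside the database node.

```python
def fragment_statements(statements, fragment_size):
    """Group a transaction's statements into fragments of `fragment_size`."""
    if fragment_size <= 0:
        # Size 0 means streaming replication is disabled: one write-set at commit.
        return [list(statements)]
    return [
        list(statements[i:i + fragment_size])
        for i in range(0, len(statements), fragment_size)
    ]


# Seven made-up statements in one transaction, fragment size 3.
transaction = [f"UPDATE sbtest1 SET k = k + 1 WHERE id = {n}" for n in range(1, 8)]

for number, fragment in enumerate(fragment_statements(transaction, 3), start=1):
    print(f"fragment {number}: {len(fragment)} statement(s)")
```

With seven statements and a fragment size of three, the last fragment carries the single remaining statement.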

You can choose between a few replication units when forming fragments:

  • bytes - This defines the fragment size in bytes.
  • rows- This defines the fragment size as the number of rows the fragment updates.
  • statements- This defines the fragment size as the number of statements in a fragment.

Choose the replication unit and fragment size that best suits the specific operation you want to run.

Streaming Replication In Action

As discussed in our other blog on handling large transactions in MariaDB 10.4, we tested how Streaming Replication performed when enabled, based on these criteria...

  1. Baseline, set global wsrep_trx_fragment_size=0;
  2. set global wsrep_trx_fragment_unit='rows'; set global wsrep_trx_fragment_size=1;
  3. set global wsrep_trx_fragment_unit='statements'; set global wsrep_trx_fragment_size=1;
  4. set global wsrep_trx_fragment_unit='statements'; set global wsrep_trx_fragment_size=5;

And the results, listed in the same order, are:

  1. Transactions: 82.91 per sec., queries: 1658.27 per sec. (100%)
  2. Transactions: 54.72 per sec., queries: 1094.43 per sec. (66%)
  3. Transactions: 54.76 per sec., queries: 1095.18 per sec. (66%)
  4. Transactions: 70.93 per sec., queries: 1418.55 per sec. (86%)
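The percentages are simply each run's transaction rate relative to the baseline. The short Python snippet below reproduces them from the figures above, assuming the four result lines are listed in the same order as the four scenarios; the dictionary keys are labels chosen for this example.

```python
# Transactions per second, taken from the results above.
RESULTS_TPS = {
    "baseline (fragment size 0)": 82.91,
    "rows, size 1": 54.72,
    "statements, size 1": 54.76,
    "statements, size 5": 70.93,
}


def relative_throughput(results, baseline_key):
    """Express every result as a rounded percentage of the baseline run."""
    baseline = results[baseline_key]
    return {name: round(tps / baseline * 100) for name, tps in results.items()}


for name, percent in relative_throughput(RESULTS_TPS, "baseline (fragment size 0)").items():
    print(f"{name}: {percent}%")
```

In other words, fragmenting on every row or every statement cost roughly a third of the throughput in this test, while grouping five statements per fragment recovered most of it.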

For this example we're using Percona XtraDB Cluster 8.0.15 straight from their testing branch using the Percona-XtraDB-Cluster_8.0.15.5-27dev.4.2_Linux.x86_64.ssl102.tar.gz build. 

We then tried a 3-node Galera cluster with hosts info below:

testnode11 =

testnode12 =

testnode13 =

We pre-populated a table from my sysbench database and tried to delete a very large number of rows.

root@testnode11[sbtest]#> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 12608218 |
+----------+
1 row in set (25.55 sec)

At first, running without Streaming Replication,

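root@testnode12[sbtest]#> select @@wsrep_trx_fragment_unit, @@wsrep_trx_fragment_size,  @@innodb_lock_wait_timeout;
+---------------------------+---------------------------+----------------------------+
| @@wsrep_trx_fragment_unit | @@wsrep_trx_fragment_size | @@innodb_lock_wait_timeout |
+---------------------------+---------------------------+----------------------------+
| bytes                     |                         0 |                      50000 |
+---------------------------+---------------------------+----------------------------+
1 row in set (0.00 sec)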

Then run,

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

However, we ended up getting a rollback...

---TRANSACTION 648910, ACTIVE 573 sec rollback

mysql tables in use 1, locked 1

ROLLING BACK 164858 lock struct(s), heap size 18637008, 12199395 row lock(s), undo log entries 11961589

MySQL thread id 183, OS thread handle 140041167468288, query id 79286 localhost root wsrep: replicating and certifying write set(-1)

delete from sbtest1 where id >= 2000000

We used the ClusterControl Dashboards to look for any indication of flow control. Since the transaction runs solely on the master (active-writer) node until commit time, there is no sign of flow control activity:

ClusterControl Galera Cluster Overview

In case you’re wondering, the current version of ClusterControl does not yet have direct support for PXC 8.0 with Galera Cluster 4 (as it is still experimental). You can, however, try to import it... but it needs minor tweaks to make your Dashboards work correctly. 

Back to the query process. It failed as it rolled back!

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

ERROR 1180 (HY000): Got error 5 - 'Transaction size exceed set threshold' during COMMIT

This happened regardless of the wsrep_max_ws_rows or wsrep_max_ws_size settings,

root@testnode11[sbtest]#> select @@global.wsrep_max_ws_rows, @@global.wsrep_max_ws_size/(1024*1024*1024);


| @@global.wsrep_max_ws_rows | @@global.wsrep_max_ws_size/(1024*1024*1024) |


|                          0 |               2.0000 |


1 row in set (0.00 sec)

It did, eventually, reach the threshold.

During this time the system table mysql.wsrep_streaming_log is empty, which indicates that Streaming Replication is not taking place (it is not enabled),

root@testnode12[sbtest]#> select count(*) from mysql.wsrep_streaming_log;


| count(*) |


|        0 |


1 row in set (0.01 sec)

root@testnode13[sbtest]#> select count(*) from mysql.wsrep_streaming_log;


| count(*) |


|        0 |


1 row in set (0.00 sec)

and that is verified on the other 2 nodes (testnode12 and testnode13).

Now, let's try again with Streaming Replication enabled,

root@testnode11[sbtest]#> select @@wsrep_trx_fragment_unit, @@wsrep_trx_fragment_size, @@innodb_lock_wait_timeout;


| @@wsrep_trx_fragment_unit | @@wsrep_trx_fragment_size | @@innodb_lock_wait_timeout |


| bytes                     | 0 |                      50000 |


1 row in set (0.00 sec)

root@testnode11[sbtest]#> set wsrep_trx_fragment_unit='rows'; set wsrep_trx_fragment_size=100; 

Query OK, 0 rows affected (0.00 sec)

Query OK, 0 rows affected (0.00 sec)

root@testnode11[sbtest]#> select @@wsrep_trx_fragment_unit, @@wsrep_trx_fragment_size, @@innodb_lock_wait_timeout;


| @@wsrep_trx_fragment_unit | @@wsrep_trx_fragment_size | @@innodb_lock_wait_timeout |


| rows                      | 100 |                      50000 |


1 row in set (0.00 sec)

What to Expect When Galera Cluster Streaming Replication is Enabled? 

When the query is run on testnode11,

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

The transaction is split into fragments, piece by piece, according to the value of the wsrep_trx_fragment_size variable. Let's check this on the other nodes:

Host testnode12

root@testnode12[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'; show engine innodb status\G nopager; show global status like 'wsrep%flow%'; select count(*) from mysql.wsrep_streaming_log;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p''



Trx id counter 567148

Purge done for trx's n:o < 566636 undo n:o < 0 state: running but idle

History list length 44




---TRANSACTION 421740651985200, not started

0 lock struct(s), heap size 1136, 0 row lock(s)

---TRANSACTION 553661, ACTIVE 190 sec

18393 lock struct(s), heap size 2089168, 1342600 row lock(s), undo log entries 1342600

MySQL thread id 898, OS thread handle 140266050008832, query id 216824 wsrep: applied write set (-1)



1 row in set (0.08 sec)

PAGER set to stdout


| Variable_name                    | Value |


| wsrep_flow_control_paused_ns     | 211197844753 |

| wsrep_flow_control_paused        | 0.133786 |

| wsrep_flow_control_sent          | 633 |

| wsrep_flow_control_recv          | 878 |

| wsrep_flow_control_interval      | [ 173, 173 ] |

| wsrep_flow_control_interval_low  | 173 |

| wsrep_flow_control_interval_high | 173          |

| wsrep_flow_control_status        | OFF |


8 rows in set (0.00 sec)


| count(*) |


|    13429 |


1 row in set (0.04 sec)


Host testnode13

root@testnode13[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'; show engine innodb status\G nopager; show global status like 'wsrep%flow%'; select count(*) from mysql.wsrep_streaming_log;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p''



Trx id counter 568523

Purge done for trx's n:o < 567824 undo n:o < 0 state: running but idle

History list length 23




---TRANSACTION 552701, ACTIVE 216 sec

21587 lock struct(s), heap size 2449616, 1575700 row lock(s), undo log entries 1575700

MySQL thread id 936, OS thread handle 140188019226368, query id 600980 wsrep: applied write set (-1)



1 row in set (0.28 sec)

PAGER set to stdout


| Variable_name                    | Value |


| wsrep_flow_control_paused_ns     | 210755642443 |

| wsrep_flow_control_paused        | 0.0231273 |

| wsrep_flow_control_sent          | 1653 |

| wsrep_flow_control_recv          | 3857 |

| wsrep_flow_control_interval      | [ 173, 173 ] |

| wsrep_flow_control_interval_low  | 173 |

| wsrep_flow_control_interval_high | 173          |

| wsrep_flow_control_status        | OFF |


8 rows in set (0.01 sec)


| count(*) |


|    15758 |


1 row in set (0.03 sec)

Noticeably, the flow control just kicked in!

ClusterControl Galera Cluster Overview

And the WSREP send/receive queues have been active as well:

ClusterControl Galera Overview: Host testnode12

ClusterControl Galera Overview: Host testnode13

Now, let's look more closely at the results from the mysql.wsrep_streaming_log table,

root@testnode11[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8; show engine innodb status\G nopager;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8'

MySQL thread id 134822, OS thread handle 140041167468288, query id 0 System lock

---TRANSACTION 649008, ACTIVE 481 sec

mysql tables in use 1, locked 1

53104 lock struct(s), heap size 6004944, 3929602 row lock(s), undo log entries 3876500

MySQL thread id 183, OS thread handle 140041167468288, query id 105367 localhost root updating

delete from sbtest1 where id >= 2000000



1 row in set (0.01 sec)

then taking the result of,

root@testnode12[sbtest]#> select count(*) from mysql.wsrep_streaming_log;


| count(*) |


|    38899 |


1 row in set (0.40 sec)

It tells us how many fragments have been replicated using Streaming Replication. Now, let's do some basic math:

root@testnode12[sbtest]#> select 3876500/38899.0;


| 3876500/38899.0 |


|         99.6555 |


1 row in set (0.03 sec)

Here I'm taking the undo log entries from the SHOW ENGINE INNODB STATUS\G result and dividing them by the total count of records in mysql.wsrep_streaming_log. The result, roughly 100, matches the wsrep_trx_fragment_size=100 I set earlier: each record in the table corresponds to one replicated fragment of about 100 rows, so the record count tracks how much of the transaction Galera has replicated so far.
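The same division can be reproduced outside the MySQL client. The following is a minimal Python sketch that uses only the figures captured in this test; the function and variable names are illustrative and not part of any Galera tooling.

```python
def rows_per_fragment(undo_log_entries: int, streamed_fragments: int) -> float:
    """Average number of rows carried by each record in mysql.wsrep_streaming_log."""
    if streamed_fragments <= 0:
        raise ValueError("no fragments have been streamed yet")
    return undo_log_entries / streamed_fragments


# Figures copied from the outputs shown in this test run.
undo_log_entries = 3_876_500   # "undo log entries" reported on testnode11
streamed_fragments = 38_899    # COUNT(*) from mysql.wsrep_streaming_log on testnode12

print(f"{rows_per_fragment(undo_log_entries, streamed_fragments):.4f}")  # prints 99.6555
```

A value close to the configured wsrep_trx_fragment_size (100 rows here) suggests that fragments are being cut at the expected size. The two figures were sampled at slightly different moments on different nodes, so a small deviation is expected.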

It’s important to take note of what Streaming Replication is trying to achieve... "the node breaks the transaction into fragments, then certifies and replicates them on the slaves while the transaction is still in progress. Once certified, the fragment can no longer be aborted by conflicting transactions."

The fragments are treated as transactions: they are passed to the remaining nodes in the cluster, which certify each fragment and then apply the write-sets. This means that once your large transaction has been certified and given priority, all incoming connections that could conflict with it will need to wait until the transaction finishes.

Now, the verdict on deleting from a huge table?

root@testnode11[sbtest]#> delete from sbtest1 where id >= 2000000;

Query OK, 12034538 rows affected (30 min 36.96 sec)

It finishes successfully without any failure!

What does it look like on the other nodes? On testnode12,

root@testnode12[sbtest]#> pager sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8; show engine innodb status\G nopager; show global status like 'wsrep%flow%'; select count(*) from mysql.wsrep_streaming_log;

PAGER set to 'sed -n '/TRANSACTIONS/,/FILE I\/O/p'|tail -8'

0 lock struct(s), heap size 1136, 0 row lock(s)

---TRANSACTION 421740651985200, not started

0 lock struct(s), heap size 1136, 0 row lock(s)


165631 lock struct(s), heap size 18735312, 12154883 row lock(s), undo log entries 12154883

MySQL thread id 898, OS thread handle 140266050008832, query id 341835 wsrep: preparing to commit write set(215510)



1 row in set (0.46 sec)

PAGER set to stdout


| Variable_name                    | Value |


| wsrep_flow_control_paused_ns     | 290832524304 |

| wsrep_flow_control_paused        | 0 |

| wsrep_flow_control_sent          | 0 |

| wsrep_flow_control_recv          | 0 |

| wsrep_flow_control_interval      | [ 173, 173 ] |

| wsrep_flow_control_interval_low  | 173 |

| wsrep_flow_control_interval_high | 173          |

| wsrep_flow_control_status        | OFF |


8 rows in set (0.53 sec)


| count(*) |


|   120345 |


1 row in set (0.88 sec)

It stops at a total of 120345 fragments, and if we do the math again on the last captured undo log entries (the undo log entries are the same on the master as well),

root@testnode12[sbtest]#> select 12154883/120345.0;

| 12154883/120345.0 |


|          101.0003 |


1 row in set (0.00 sec)

So the transaction was split into a total of 120345 fragments to delete 12034538 rows.
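As a rough cross-check of these totals, the sketch below estimates how many full fragments a row-based setting would produce. It assumes wsrep_trx_fragment_unit='rows' and that one fragment is cut each time the configured number of rows has been processed; the helper name is hypothetical.

```python
def full_fragments(rows_affected: int, fragment_size_rows: int) -> int:
    """Number of complete fragments of fragment_size_rows rows in a transaction."""
    if fragment_size_rows <= 0:
        raise ValueError("a fragment size of 0 means Streaming Replication is disabled")
    return rows_affected // fragment_size_rows


rows_deleted = 12_034_538    # "Query OK, 12034538 rows affected"
fragment_size_rows = 100     # wsrep_trx_fragment_size used in this test

print(full_fragments(rows_deleted, fragment_size_rows))  # prints 120345
print(f"{12_154_883 / 120_345:.4f}")                     # prints 101.0003
```

The first figure matches the final count observed in mysql.wsrep_streaming_log, and the second reproduces the ratio computed from the undo log entries.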

Once you're done with Streaming Replication, do not forget to disable it; otherwise it will keep fragmenting and logging huge transactions, which adds a lot of performance overhead to your cluster. To disable it, just run

root@testnode11[sbtest]#> set wsrep_trx_fragment_size=0;

Query OK, 0 rows affected (0.04 sec)


With Streaming Replication enabled, it's important that you are able to identify how large your fragment size can be and what unit you have to choose (bytes, rows, statements). 

It is also very important to enable it at session level only, and to identify the cases where you actually need Streaming Replication.

While performing these tests, deleting a large number of rows from a huge table with Streaming Replication enabled caused noticeably high peaks in disk and CPU utilization. RAM usage was more stable, but this could be because the statement we ran is not memory-intensive.

It’s safe to say that Streaming Replication can cause performance bottlenecks when dealing with large records, so the decision to use it should be made with care.

Lastly, if you are using Streaming Replication, always remember to disable it in the current session once you are done, to avoid unwanted problems.
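One way to make the "enable, run, disable" habit harder to forget is to generate the whole sequence in one place. The sketch below only builds the list of SQL statements for a single session and does not connect to any database; the function name and its defaults are assumptions made for illustration, while the variables and units are the ones used in this post.

```python
ALLOWED_UNITS = ("bytes", "rows", "statements")


def streaming_replication_session(statement: str, unit: str = "rows", size: int = 100) -> list:
    """Wrap one large statement so Streaming Replication is on only while it runs."""
    if unit not in ALLOWED_UNITS:
        raise ValueError(f"unit must be one of {ALLOWED_UNITS}")
    if size <= 0:
        raise ValueError("size must be positive; 0 disables Streaming Replication")
    return [
        f"SET SESSION wsrep_trx_fragment_unit='{unit}'",
        f"SET SESSION wsrep_trx_fragment_size={size}",
        statement,
        # Always switch the feature back off for this session afterwards.
        "SET SESSION wsrep_trx_fragment_size=0",
    ]


for sql in streaming_replication_session("DELETE FROM sbtest1 WHERE id >= 2000000"):
    print(sql + ";")
```

The statements are meant to be executed in order on the same connection, since the settings are session-scoped.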


Database Load Balancing in the Cloud - MySQL Master Failover with ProxySQL 2.0: Part One (Deployment)


The cloud provides very flexible environments to work with. You can easily scale it up and down by adding or removing nodes. If there’s a need, you can easily create a clone of your environment. This can be used for processes like upgrades, load tests, disaster recovery. The main problem you have to deal with is that applications have to connect to the databases in some way, and flexible setups can be tricky for databases - especially with master-slave setups. Luckily, there are some options to make this process easier. 

One way is to utilize a database proxy. There are several proxies to pick from, but in this blog post we will use ProxySQL, a well known proxy available for MySQL and MariaDB. We are going to show how you can use it to efficiently move traffic between MySQL nodes without visible impact for the application. We are also going to explain some limitations and drawbacks of this approach.

Initial Cloud Setup

At first, let’s discuss the setup. We will use AWS EC2 instances for our environment. As we are only testing, we don’t really care about high availability other than what we want to prove to be possible: seamless master changes. Therefore we will use a single application node and a single ProxySQL node. As per good practices, we will collocate ProxySQL on the application node, and the application will be configured to connect to ProxySQL through a Unix socket. This reduces the overhead related to TCP connections and increases security: traffic from the application to the proxy will not leave the local instance, leaving only the ProxySQL -> MySQL connection to encrypt. Again, as this is a simple test, we will not set up SSL. In production environments you want to do that, even if you use a VPC.

The environment will look like the diagram below:

As the application, we will use Sysbench - a synthetic benchmark program for MySQL. It has an option to disable and enable the use of transactions, which we will use to demonstrate how ProxySQL handles them.

Installing a MySQL Replication Cluster Using ClusterControl

To make the deployment fast and efficient, we are going to use ClusterControl to deploy the MySQL replication setup for us. The installation of ClusterControl requires just a couple of steps. We won’t go into details here: open our website and register, and the installation of ClusterControl should be pretty straightforward. Please keep in mind that you need to set up passwordless SSH between the ClusterControl instance and all the nodes that it will manage.

Once ClusterControl has been installed, you can log in. You will be presented with a deployment wizard:

As we already have instances running in the cloud, we will just go with the “Deploy” option. We will be presented with the following screen:

We will pick MySQL Replication as the cluster type and we need to provide connectivity details. The connection can use the root user or a sudo user, with or without a password.

In the next step, we have to make some decisions. We will use Percona Server for MySQL in its latest version. We also have to define a password for the root user on the nodes we will deploy.

In the final step we have to define a topology - we will go with what we proposed at the beginning - a master and three slaves.

ClusterControl will start the deployment - we can track it in the Activity tab, as shown on the screenshot above.

Once the deployment has completed, we can see the cluster in the cluster list:

Installing ProxySQL 2.0 Using ClusterControl

The next step will be to deploy ProxySQL. ClusterControl can do this for us.

We can do this in Manage -> Load Balancer.

As we are just testing things, we are going to reuse the ClusterControl instance for ProxySQL and Sysbench. In real life you would probably want to use your “real” application server. If you can’t find it in the drop down, you can always write the server address (IP or hostname) by hand.

We also want to define credentials for monitoring and administrative user. We also double-checked that ProxySQL 2.0 will be deployed (you can always change it to 1.4.x if you need).

On the bottom part of the wizard we will define the user which will be created in both MySQL and ProxySQL. If you have an existing application, you probably want to use an existing user. If you use numerous users for your application you can always import the rest of them later, after ProxySQL has been deployed.

We want to ensure that all the MySQL instances will be configured in ProxySQL. We will use explicit transactions so we set the switch accordingly. This is all we needed to do - the rest is to click on the “Deploy ProxySQL” button and let ClusterControl do its thing.

When the installation is completed, ProxySQL will show up on the list of nodes in the cluster. As you can see on the screenshot above, it already detected the topology and distributed nodes across reader and writer hostgroups.

Installing Sysbench

The final step will be to create our “application” by installing Sysbench. The process is fairly simple. At first we have to install prerequisites, libraries and tools required to compile Sysbench:

root@ip-10-0-0-115:~# apt install git automake libtool make libssl-dev pkg-config libmysqlclient-dev

Then we want to clone the sysbench repository:

root@ip-10-0-0-115:~# git clone https://github.com/akopytov/sysbench.git

Finally we want to compile and install Sysbench:

root@ip-10-0-0-115:~# cd sysbench/

root@ip-10-0-0-115:~/sysbench# ./autogen.sh && ./configure && make && make install

This is it, Sysbench has been installed. We now need to generate some data. For that, at first, we need to create a schema. We will connect to local ProxySQL and through it we will create a ‘sbtest’ schema on the master. Please note we used Unix socket for connection with ProxySQL.

root@ip-10-0-0-115:~/sysbench# mysql -S /tmp/proxysql.sock -u sbtest -psbtest

mysql> CREATE DATABASE sbtest;

Query OK, 1 row affected (0.01 sec)

Now we can use sysbench to populate the database with data. Again, we do use Unix socket for connection with the proxy:

root@ip-10-0-0-115:~# sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --events=0 --time=3600 --mysql-socket=/tmp/proxysql.sock --mysql-user=sbtest --mysql-password=sbtest --tables=32 --report-interval=1 --skip-trx=on --table-size=100000 --db-ps-mode=disable prepare

Once the data is ready, we can proceed to our tests. 


In the second part of this blog, we will discuss ProxySQL’s handling of connections, failover and its settings that can help us to manage the master switch in a way that will be the least intrusive to the application.

Building a Hot Standby on Amazon AWS Using MariaDB Cluster


Galera Cluster 4.0 was first released as part of MariaDB 10.4, and this release brings a lot of significant improvements. The most impressive feature is Streaming Replication, which is designed to handle the following problems.

  • Problems with long transactions
  • Problems with large transactions
  • Problems with hot-spots in tables

In a previous blog, we deep-dove into the new Streaming Replication feature in a two-part series blog (Part 1 and Part 2). Part of this new feature in Galera 4.0 are new system tables which are very useful for querying and checking the Galera Cluster nodes and also the logs that have been processed in Streaming Replication. 

In previous blogs, we also showed you the Easy Way to Deploy a MySQL Galera Cluster on AWS and how to Deploy a MySQL Galera Cluster 4.0 onto Amazon AWS EC2.

Percona hasn't released a GA version of Percona XtraDB Cluster (PXC) 8.0 yet, as some features are still under development, such as the MySQL wsrep function WSREP_SYNC_WAIT_UPTO_GTID, which does not appear to be present yet (at least in PXC version 8.0.15-5-27dev.4.2). When PXC 8.0 is released, however, it will be packed with great features such as...

  • Improved resilient cluster
  • Cloud friendly cluster
  • Improved packaging
  • Encryption support
  • Atomic DDL

While we're waiting for the release of PXC 8.0 GA, we'll cover in this blog how you can create a Hot Standby Node on Amazon AWS for Galera Cluster 4.0 using MariaDB.

What is a Hot Standby?

A hot standby is a common term in computing, especially on highly distributed systems. It's a method for redundancy in which one system runs simultaneously with an identical primary system. When failure happens on the primary node, the hot standby immediately takes over replacing the primary system. Data is mirrored to both systems in real time.

For database systems, a hot standby server is usually the second node after the primary master that is running on powerful resources (same as the master). This secondary node has to be as stable as the primary master to function correctly. 

It also serves as a data recovery node if the master node or the entire cluster goes down. The hot standby node will replace the failing node or cluster while continuously serving the demand from the clients.

In Galera Cluster, every server that is part of the cluster can serve as a standby node. However, if the region or the entire cluster goes down, how will you cope with this? Creating a standby node outside the specific region or network of your cluster is one option here.

In the following section, we'll show you how to create a standby node on AWS EC2 using MariaDB.

Deploying a Hot Standby On Amazon AWS

Previously, we have shown you how to create a Galera Cluster on AWS. You might want to read Deploying MySQL Galera Cluster 4.0 onto Amazon AWS EC2 if you are new to Galera 4.0.

Your hot standby can be deployed as another Galera Cluster that uses synchronous replication (check this blog Zero Downtime Network Migration With MySQL Galera Cluster Using Relay Node) or as an asynchronous MySQL/MariaDB node. In this blog, we'll set up and deploy the hot standby node replicating asynchronously from one of the Galera nodes.

The Galera Cluster Setup

In this sample setup, we deployed a 3-node cluster using MariaDB 10.4.8. The cluster is deployed in the US East (Ohio) region and the topology is shown below:

We'll use server as the master for our asynchronous slave which will serve as the standby node.

Setting up your EC2 Instance for Hot Standby Node

In the AWS console, go to EC2 found under the Compute section and click Launch Instance to create an EC2 instance just like below.

We'll create this instance under the US West (Oregon) region. For your OS type, you can choose what server you like (I prefer Ubuntu 18.04) and choose the type of instance based on your preferred target type. For this example I will use t2.micro since it doesn't require any sophisticated setup and it's only for this sample deployment.

As we mentioned earlier, it is best for your hot standby node to be located in a different region rather than collocated in the same region as the cluster. That way, if the regional data center goes down or suffers a network outage, your hot standby can be your failover target when things go bad.

Before we continue, note that in AWS each region has its own Virtual Private Cloud (VPC) and its own network. In order to communicate with the Galera cluster nodes, we must first define a VPC Peering connection so the nodes can communicate within the Amazon infrastructure, without going outside the network, which would only add overhead and security concerns.

First, go to your VPC from where your hot standby node shall reside, then go to Peering Connections. Then you need to specify the VPC of your standby node and the Galera cluster VPC. In the example below, I have us-west-2 interconnecting to us-east-2.

Once created, you'll see an entry under your Peering Connections. However, you need to accept the request from the Galera cluster VPC, which is on us-east-2 in this example. See below,

Once accepted, do not forget to add the CIDR to the routing table. See this external blog VPC Peering about how to do it after VPC Peering.

Now, let's go back and continue creating the EC2 node. Make sure your Security Group has the correct rules and that the required ports are open. Check the firewall settings manual for more information about this. For this setup, I just set All Traffic to be accepted since this is just a test. See below,

Make sure when creating your instance, you have set the correct VPC and you have defined your proper subnet. You can check this blog in case you need some help about that. 

Setting up the MariaDB Async Slave

Step One

First we need to set up the repository, add the repo keys and update the package list in the repository cache,

$ vi /etc/apt/sources.list.d/mariadb.list

$ apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8

$ apt update

Step Two

Install the MariaDB packages and its required binaries

$ apt-get install mariadb-backup  mariadb-client mariadb-client-10.4 libmariadb3 libdbd-mysql-perl mariadb-client-core-10.4 mariadb-common mariadb-server-10.4 mariadb-server-core-10.4 mysql-common

Step Three

Now, let's take a backup and stream it over the network, in xbstream format, from one of the nodes in our Galera Cluster.

## Wipe out the datadir of the freshly installed MariaDB on your hot standby node.

$ systemctl stop mariadb

$ rm -rf /var/lib/mysql/*

## Then on the hot standby node, run this on the terminal,

$ socat -u tcp-listen:9999,reuseaddr stdout 2>/tmp/netcat.log | mbstream -x -C /var/lib/mysql

## Then on the target master, i.e. one of the nodes in your Galera Cluster (which is the node in this example), run this on the terminal,

$ mariabackup  --backup --target-dir=/tmp --stream=xbstream | socat - TCP4:

where is the IP of the hot standby node.

Step Four

Prepare your MySQL configuration file. Since this is in Ubuntu, I am editing the file in /etc/mysql/my.cnf and with the following sample my.cnf taken from our ClusterControl template,










# log_output = FILE

#Slow logging    









innodb_data_file_path = ibdata1:100M:autoextend

## You may want to tune the below depending on number of cores and disk sub









# innodb_file_format = barracuda

innodb_flush_method = O_DIRECT


# innodb_locks_unsafe_for_binlog = 1


## avoid statistics update when doing e.g show tables




# collation_server = utf8_unicode_ci

# init_connect = 'SET NAMES utf8'

# character_set_server = utf8













key_buffer_size = 24M

tmp_table_size = 64M

max_heap_table_size = 64M

max_allowed_packet = 512M

# sort_buffer_size = 256K

# read_buffer_size = 256K

# read_rnd_buffer_size = 512K

# myisam_sort_buffer_size = 8M






query_cache_type = 0

query_cache_size = 0



# 5.6 backwards compatibility (FIXME)

# explicit_defaults_for_timestamp = 1

performance_schema = OFF

performance-schema-max-mutex-classes = 0

performance-schema-max-mutex-instances = 0



# default_character_set = utf8



# default_character_set = utf8



max_allowed_packet = 512M

# default_character_set = utf8



# log_error = /var/log/mysqld.log


# datadir = /var/lib/mysql

Of course, you can change this according to your setup and requirements.

Step Five

Prepare the backup from step #3, i.e. the finished backup that is now on the hot standby node, by running the command below,

$ mariabackup --prepare --target-dir=/var/lib/mysql

Step Six

Set the ownership of the datadir in the hot standby node,

$ chown -R mysql:mysql /var/lib/mysql

Step Seven

Now, start the MariaDB instance

$  systemctl start mariadb

Step Eight

Lastly, we need to setup the asynchronous replication,

## Create the replication user on the master node, i.e. the node in the Galera cluster

MariaDB [(none)]> CREATE USER 'cmon_replication'@'' IDENTIFIED BY 'PahqTuS1uRIWYKIN';

Query OK, 0 rows affected (0.866 sec)

MariaDB [(none)]> GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'cmon_replication'@'';

Query OK, 0 rows affected (0.127 sec)

## Get the GTID slave position from xtrabackup_binlog_info as follows,

$  cat /var/lib/mysql/xtrabackup_binlog_info

binlog.000002   71131632 1000-1000-120454

##  Then setup the slave replication as follows,

MariaDB [(none)]> SET GLOBAL gtid_slave_pos='1000-1000-120454';

Query OK, 0 rows affected (0.053 sec)

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='', MASTER_USER='cmon_replication', master_password='PahqTuS1uRIWYKIN', MASTER_USE_GTID = slave_pos;
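Copying the GTID by hand from xtrabackup_binlog_info is easy to get wrong, so the extraction can be scripted. The sketch below is a minimal Python example that assumes the file holds a single whitespace-separated line of binlog file, position and GTID, as in the output above; the function names are hypothetical.

```python
def parse_binlog_info(line: str):
    """Split one xtrabackup_binlog_info line into (binlog file, position, GTID)."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    binlog_file, position, gtid = fields
    return binlog_file, int(position), gtid


def gtid_slave_pos_statement(gtid: str) -> str:
    """Build the statement that seeds the slave position on the standby node."""
    return f"SET GLOBAL gtid_slave_pos='{gtid}'"


# Line copied from the backup taken in this walkthrough.
binlog_file, position, gtid = parse_binlog_info("binlog.000002   71131632 1000-1000-120454")
print(gtid_slave_pos_statement(gtid))
```

The generated statement is the same SET GLOBAL command shown above, just derived from the file instead of typed manually.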

## Now, check the slave status,

MariaDB [(none)]> show slave status \G

*************************** 1. row ***************************

                Slave_IO_State: Waiting for master to send event


                   Master_User: cmon_replication

                   Master_Port: 3306

                 Connect_Retry: 60


           Read_Master_Log_Pos: 4

                Relay_Log_File: relay-bin.000001

                 Relay_Log_Pos: 4


              Slave_IO_Running: Yes

             Slave_SQL_Running: Yes







                    Last_Errno: 0


                  Skip_Counter: 0

           Exec_Master_Log_Pos: 4

               Relay_Log_Space: 256

               Until_Condition: None


                 Until_Log_Pos: 0

            Master_SSL_Allowed: No






         Seconds_Behind_Master: 0

 Master_SSL_Verify_Server_Cert: No

                 Last_IO_Errno: 0


                Last_SQL_Errno: 0



              Master_Server_Id: 1000



                    Using_Gtid: Slave_Pos

                   Gtid_IO_Pos: 1000-1000-120454



                 Parallel_Mode: conservative

                     SQL_Delay: 0

           SQL_Remaining_Delay: NULL

       Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it

              Slave_DDL_Groups: 0

Slave_Non_Transactional_Groups: 0

    Slave_Transactional_Groups: 0

1 row in set (0.000 sec)
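The fields that matter most in this output are Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master. The following Python sketch parses the vertical (\G) output as plain text and applies a simple health rule; the function names and the lag threshold are assumptions for illustration, and no database connection is involved.

```python
def parse_vertical_output(text: str) -> dict:
    """Turn 'Key: value' lines from SHOW SLAVE STATUS \\G into a dictionary."""
    status = {}
    for line in text.splitlines():
        # Skip blank lines and the "*** 1. row ***" separator.
        if ":" not in line or line.lstrip().startswith("*"):
            continue
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def replica_is_healthy(status: dict, max_lag_seconds: int = 0) -> bool:
    """Both replication threads must run and the lag must be within the limit."""
    lag = status.get("Seconds_Behind_Master")
    if lag is None or not lag.isdigit():   # NULL means the lag is unknown
        return False
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
        and int(lag) <= max_lag_seconds
    )


sample = """
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
         Seconds_Behind_Master: 0
"""
print(replica_is_healthy(parse_vertical_output(sample)))  # prints True
```

A check like this can be run periodically against the standby so that a stopped thread or growing lag is noticed before a failover is needed.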

Adding Your Hot Standby Node To ClusterControl

If you are using ClusterControl, it's easy to monitor your database server's health. To add this as a slave, select the Galera node cluster you have then go to the selection button as shown below to Add Replication Slave:

Click Add Replication Slave and choose adding an existing slave just like below,

Our topology looks promising.

As you might notice, the node serving as our hot standby has a different CIDR, with the prefix 172.32.% in us-west-2 (Oregon), while the other nodes use 172.31.% and are located in us-east-2 (Ohio). They are in completely different regions, so if a network failure occurs on your Galera nodes, you can fail over to your hot standby node.


Building a Hot Standby on Amazon AWS is easy and straightforward. All you need is to determine your capacity requirements and your networking topology, security, and protocols that need to be set up.

Using VPC Peering helps speed up communication between different regions without going over the public internet, so the connection stays within the Amazon network infrastructure.

Using asynchronous replication with one slave is, of course, not enough, but this blog serves as a foundation for how you can get started. You can now easily create another cluster around the asynchronous slave and build further Galera Clusters serving as your Disaster Recovery nodes, or you can use the gmcast.segment variable in Galera to replicate synchronously, just like what we have on this blog.

Handling Replication Issues from non-GTID to GTID MariaDB Database Clusters


We recently ran into an interesting customer support case involving a MariaDB replication setup. We spent a lot of time researching this problem and thought it would be worth sharing this with you in this blog post.

Customer’s Environment Description

The issue was as follows: an old (pre 10.x) MariaDB server was in use and an attempt was made to migrate data from it into a more recent MariaDB replication setup. This resulted in issues when using Mariabackup to rebuild slaves in the new replication cluster. For the purpose of the tests we recreated this behavior in the following environment:

The data has been migrated from 5.5 to 10.4 using mysqldump:

mysqldump --single-transaction --master-data=2 --events --routines sbtest > /root/dump.sql

This allowed us to collect the master binary log coordinates along with a consistent dump. As a result, we were able to provision a MariaDB 10.4 master node and set up replication between the old 5.5 master and the new 10.4 node. The traffic was still running on the 5.5 node. The 10.4 master was generating GTIDs as it had to replicate data to the 10.4 slave. Before we dig into the details, let's take a quick look at how GTIDs work in MariaDB.

MariaDB and GTID

For starters, MariaDB uses a different format of the GTID than Oracle MySQL. It consists of three numbers separated by dashes:

0 - 1 - 345

First is a replication domain, which allows for multi-source replication to be properly handled. This is not relevant to our case as all the nodes are in the same replication domain. Second number is the server ID of the node that generated the GTID. Third one is the sequence number - it monotonically increases with every event stored in the binary logs.
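To make the three components concrete, here is a minimal Python sketch that splits a MariaDB GTID into its parts. The class and function names are illustrative; only the domain/server ID/sequence layout comes from the description above.

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    domain_id: int   # replication domain
    server_id: int   # ID of the server that generated the event
    sequence: int    # monotonically increasing sequence number


def parse_gtid(text: str) -> Gtid:
    """Parse 'domain-server-sequence'; spaces around the dashes are tolerated."""
    parts = [part.strip() for part in text.split("-")]
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"not a MariaDB GTID: {text!r}")
    domain_id, server_id, sequence = (int(part) for part in parts)
    return Gtid(domain_id, server_id, sequence)


print(parse_gtid("0-1-345"))
```

Keeping the server ID as a separate field makes it easy to see which node generated a given event.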

MariaDB uses several variables to store the information about GTID’s executed on a given node. The most interesting for us are:

Gtid_binlog_pos - as per the documentation, this variable is the GTID of the last event group written to the binary log.

Gtid_slave_pos - as per the documentation, this system variable contains the GTID of the last transaction applied to the database by the server's slave threads.

Gtid_current_pos - as per the documentation, this system variable contains the GTID of the last transaction applied to the database. If the server_id of the corresponding GTID in gtid_binlog_pos is equal to the server's own server_id, and the sequence number is higher than the corresponding GTID in gtid_slave_pos, then the GTID from gtid_binlog_pos will be used. Otherwise the GTID from gtid_slave_pos will be used for that domain.

So, to make it clear, gtid_binlog_pos stores the GTID of the last locally executed event. gtid_slave_pos stores the GTID of the last event executed by the slave thread. gtid_current_pos shows the value from gtid_binlog_pos if that GTID carries the local server ID and has the higher sequence number; otherwise it shows the value from gtid_slave_pos. Please keep this in mind.

An Overview of the Issue

The initial state of the relevant variables on the 10.4 master is:

MariaDB [(none)]> show global variables like '%gtid%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| gtid_binlog_pos         | 0-1001-1 |
| gtid_binlog_state       | 0-1001-1 |
| gtid_cleanup_batch_size | 64       |
| gtid_current_pos        | 0-1001-1 |
| gtid_domain_id          | 0        |
| gtid_ignore_duplicates  | ON       |
| gtid_pos_auto_engines   |          |
| gtid_slave_pos          | 0-1001-1 |
| gtid_strict_mode        | ON       |
| wsrep_gtid_domain_id    | 0        |
| wsrep_gtid_mode         | OFF      |
+-------------------------+----------+
11 rows in set (0.001 sec)

Please note the gtid_slave_pos value which, in theory, does not make sense: it came from the same node, but via the slave thread. This can happen if you have performed a master switch before. We did just that: with two 10.4 nodes, we switched the master from the host with server ID 1001 to the host with server ID 1002 and then back to 1001.

Afterwards we configured the replication from 5.5 to 10.4 and this is how things looked:

MariaDB [(none)]> show global variables like '%gtid%';
+-------------------------+-------------------------+
| Variable_name           | Value                   |
+-------------------------+-------------------------+
| gtid_binlog_pos         | 0-55-117029             |
| gtid_binlog_state       | 0-1001-1537,0-55-117029 |
| gtid_cleanup_batch_size | 64                      |
| gtid_current_pos        | 0-1001-1                |
| gtid_domain_id          | 0                       |
| gtid_ignore_duplicates  | ON                      |
| gtid_pos_auto_engines   |                         |
| gtid_slave_pos          | 0-1001-1                |
| gtid_strict_mode        | ON                      |
| wsrep_gtid_domain_id    | 0                       |
| wsrep_gtid_mode         | OFF                     |
+-------------------------+-------------------------+
11 rows in set (0.000 sec)

As you can see, the events replicated from MariaDB 5.5 have all been accounted for in the gtid_binlog_pos variable: all events with the server ID of 55. This results in a serious issue. As you may remember, gtid_binlog_pos should contain events executed locally on the host. Here it contains events replicated from another server with a different server ID.

This makes things dicey when you want to rebuild the 10.4 slave, and here’s why. Mariabackup, just like Xtrabackup, works in a simple way. It copies the files from the MariaDB server while scanning the redo logs and storing any incoming transactions. When the files have been copied, Mariabackup freezes the database using either FLUSH TABLES WITH READ LOCK or backup locks, depending on the MariaDB version and the availability of backup locks. Then it reads the latest executed GTID and stores it alongside the backup. Then the lock is released and the backup is completed. The GTID stored in the backup should be used as the latest executed GTID on a node. When rebuilding slaves, it is set as gtid_slave_pos and then used to start GTID replication. This GTID is taken from gtid_current_pos, which makes perfect sense: after all, it is the “GTID of the last transaction applied to the database”. An astute reader can already see the problem. Let’s show the output of the variables when 10.4 replicates from the 5.5 master:

MariaDB [(none)]> show global variables like '%gtid%';
+-------------------------+-------------------------+
| Variable_name           | Value                   |
+-------------------------+-------------------------+
| gtid_binlog_pos         | 0-55-117029             |
| gtid_binlog_state       | 0-1001-1537,0-55-117029 |
| gtid_cleanup_batch_size | 64                      |
| gtid_current_pos        | 0-1001-1                |
| gtid_domain_id          | 0                       |
| gtid_ignore_duplicates  | ON                      |
| gtid_pos_auto_engines   |                         |
| gtid_slave_pos          | 0-1001-1                |
| gtid_strict_mode        | ON                      |
| wsrep_gtid_domain_id    | 0                       |
| wsrep_gtid_mode         | OFF                     |
+-------------------------+-------------------------+
11 rows in set (0.000 sec)

gtid_current_pos is set to 0-1001-1. This is definitely not the correct point in time: it is taken from gtid_slave_pos, while a bunch of transactions have come from 5.5 after that. The problem is that those transactions are stored in gtid_binlog_pos. On the other hand, gtid_current_pos is calculated in a way that requires GTIDs in gtid_binlog_pos to carry the local server ID before they can be used as gtid_current_pos. In our case they have the server ID of the 5.5 node, so they are not treated as events executed on the 10.4 master. After restoring the backup, if you set up the slave according to the GTID state stored in the backup, it would end up re-applying all the events that came from 5.5. This, obviously, would break the replication.

The Solution

A solution to this problem is to take several additional steps:

  1. Stop the replication from 5.5 to 10.4: run STOP SLAVE on the 10.4 master.
  2. Execute any transaction on 10.4, for example CREATE SCHEMA IF NOT EXISTS bugfix. This will change the GTID situation like this:
MariaDB [(none)]> show global variables like '%gtid%';
+-------------------------+---------------------------+
| Variable_name           | Value                     |
+-------------------------+---------------------------+
| gtid_binlog_pos         | 0-1001-117122             |
| gtid_binlog_state       | 0-55-117121,0-1001-117122 |
| gtid_cleanup_batch_size | 64                        |
| gtid_current_pos        | 0-1001-117122             |
| gtid_domain_id          | 0                         |
| gtid_ignore_duplicates  | ON                        |
| gtid_pos_auto_engines   |                           |
| gtid_slave_pos          | 0-1001-1                  |
| gtid_strict_mode        | ON                        |
| wsrep_gtid_domain_id    | 0                         |
| wsrep_gtid_mode         | OFF                       |
+-------------------------+---------------------------+
11 rows in set (0.001 sec)

The latest GTID was executed locally, so it was stored in gtid_binlog_pos. As it has the local server ID, it is picked as gtid_current_pos. Now you can take a backup and use it to rebuild slaves off the 10.4 master. Once this is done, start the slave thread again.
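The states before and after the workaround can be checked against the documented rule for gtid_current_pos with a small model. This is only a sketch of the rule as quoted from the documentation, limited to a single replication domain. `split_gtid` and `current_pos` are hypothetical helpers, not a MariaDB API.

```python
def split_gtid(text):
    """Return (domain_id, server_id, seq_no) from 'domain-server-seq'."""
    domain_id, server_id, seq_no = (int(p) for p in text.split("-"))
    return domain_id, server_id, seq_no


def current_pos(binlog_pos, slave_pos, own_server_id):
    """Model of the documented gtid_current_pos rule for one domain.

    The binlog position wins only if it carries this server's own
    server_id AND its sequence number is higher than the slave
    position's; otherwise the slave position is reported.
    """
    _, binlog_server, binlog_seq = split_gtid(binlog_pos)
    _, _, slave_seq = split_gtid(slave_pos)
    if binlog_server == own_server_id and binlog_seq > slave_seq:
        return binlog_pos
    return slave_pos


# While replicating from 5.5: the newest event carries server ID 55,
# so the stale slave position is reported.
print(current_pos("0-55-117029", "0-1001-1", own_server_id=1001))
# → 0-1001-1

# After one local transaction on the 10.4 master (server ID 1001).
print(current_pos("0-1001-117122", "0-1001-1", own_server_id=1001))
# → 0-1001-117122
```

With the values from the outputs above, the model reproduces both the stale position that ended up in the backup and the corrected position after the local transaction.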

MariaDB is aware that this kind of bug exists; one of the relevant bug reports we found is https://jira.mariadb.org/browse/MDEV-10279. Unfortunately, there is no fix so far. What we found is that this issue affects replication from MariaDB versions up to 5.5. Non-GTID events that come from MariaDB 10.0 are correctly accounted for on 10.4 as coming from the slave thread, and gtid_slave_pos is properly updated. MariaDB 5.5 is quite old (even though it is still supported), so you may still see setups running on it and attempts to migrate from 5.5 to more recent, GTID-enabled MariaDB versions. What’s worse, according to the bug report we found, this also affects replication coming from non-MariaDB servers into MariaDB (one of the comments mentions the issue showing up with Percona Server 5.6).

Anyway, we hope you found this blog post useful and hopefully you will not run into the problem we just described.


Maximizing Database Query Efficiency for MySQL - Part One


Slow queries, inefficient queries, or long running queries are problems that regularly plague DBAs. They are ubiquitous and an inevitable part of life for anyone responsible for managing a database.

Poor database design can affect the efficiency of a query and its performance. Lack of knowledge or improper use of function calls, stored procedures, or routines can also cause database performance degradation and can even harm the entire MySQL database cluster.

In a master-slave replication setup, a very common cause of these issues is tables that lack primary or secondary indexes. This causes slave lag which, in a worst-case scenario, can last for a very long time.

In this two-part blog series, we'll give you a refresher course on how to get the most out of your database queries in MySQL to drive better efficiency and performance.

Always Add a Unique Index To Your Table

Tables that do not have primary or unique keys typically create huge problems as the data grows. When this happens, a simple data modification can stall the database. If an UPDATE or DELETE statement is applied to a table that lacks proper indexes, MySQL will choose a full table scan as the query plan. That can cause high disk I/O for reads and writes and degrade the performance of your database. See an example below:

root[test]> show create table sbtest2\G
*************************** 1. row ***************************
       Table: sbtest2
Create Table: CREATE TABLE `sbtest2` (
  `id` int(10) unsigned NOT NULL,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT ''
1 row in set (0.00 sec)

root[test]> explain extended update sbtest2 set k=52, pad="xx234xh1jdkHdj234" where id=57;
+----+-------------+---------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+---------+----------+-------------+
|  1 | UPDATE      | sbtest2 | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1923216 |   100.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.06 sec)

Whereas a table with a primary key has a very good query plan:

root[test]> show create table sbtest3\G
*************************** 1. row ***************************
       Table: sbtest3
Create Table: CREATE TABLE `sbtest3` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k` (`k`)
1 row in set (0.00 sec)

root[test]> explain extended update sbtest3 set k=52, pad="xx234xh1jdkHdj234" where id=57;
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table   | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | UPDATE      | sbtest3 | NULL       | range | PRIMARY       | PRIMARY | 4       | const |    1 |   100.00 | Using where |
+----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

Primary or unique keys are a vital component of a table structure, especially when performing maintenance on a table. For example, tools from the Percona Toolkit (such as pt-online-schema-change or pt-table-sync) recommend that tables have unique keys. Keep in mind that a PRIMARY KEY is already a unique key; a primary key cannot hold NULL values, but a unique key can. Defining a primary key on a column that allows NULL values causes an error like,

ERROR 1171 (42000): All parts of a PRIMARY KEY must be NOT NULL; if you need NULL in a key, use UNIQUE instead

On slave nodes, it is also common for the primary/unique key to be missing from a table, which means the table structure differs from the master's. You can use mysqldiff to detect this, or you can run mysqldump --no-data … params on each node and run a diff to compare the table structures and check for any discrepancy.
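As a rough illustration of the diff approach, the sketch below compares two schema-only dumps with Python's standard difflib module. The two DDL strings are hypothetical stand-ins for what a schema-only dump from the master and from a slave might contain; `schema_diff` is a made-up helper name.

```python
import difflib

# Hypothetical schema-only dumps from two nodes.
master_ddl = """CREATE TABLE `sbtest3` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
)"""

slave_ddl = """CREATE TABLE `sbtest3` (
  `id` int(10) unsigned NOT NULL,
  `k` int(10) unsigned NOT NULL DEFAULT '0'
)"""


def schema_diff(left, right):
    """Return unified-diff lines describing how two dumps differ."""
    return list(difflib.unified_diff(
        left.splitlines(), right.splitlines(),
        fromfile="master", tofile="slave", lineterm=""))


for line in schema_diff(master_ddl, slave_ddl):
    print(line)
```

An empty result means the two structures match; any line starting with "-" exists only on the master side, such as the missing PRIMARY KEY here.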

Scan Tables for Duplicate Indexes, Then Drop Them

Duplicate indices can also cause performance degradation, especially when the table contains a huge number of records. MySQL has to consider more candidate query plans when optimizing a query. That includes scanning large index distributions or statistics, which adds performance overhead as it can cause memory contention or high I/O and memory utilization.

Duplicate indices on a table also degrade queries by saturating the buffer pool. This can also affect the performance of MySQL when checkpointing flushes the transaction logs to disk. This is due to the processing and storing of an unwanted index (which is in fact a waste of space in that table's tablespace). Take note that duplicate indices are stored in the tablespace and therefore also have to be cached in the buffer pool.

Take a look at the table below which contains multiple duplicate keys:

root[test]#> show create table sbtest3\G
*************************** 1. row ***************************
       Table: sbtest3
Create Table: CREATE TABLE `sbtest3` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k` (`k`,`pad`,`c`),
  KEY `kcp2` (`id`,`k`,`c`,`pad`),
  KEY `kcp` (`k`,`c`,`pad`),
  KEY `pck` (`pad`,`c`,`id`,`k`)
1 row in set (0.00 sec)

and has a size of 2.3GiB

root[test]#> \! du -hs /var/lib/mysql/test/sbtest3.ibd

2.3G    /var/lib/mysql/test/sbtest3.ibd

Let's drop the duplicate indices and rebuild the table with a no-op alter:

root[test]#> drop index kcp2 on sbtest3; drop index kcp on sbtest3; drop index pck on sbtest3;
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

root[test]#> alter table sbtest3 engine=innodb;
Query OK, 0 rows affected (28.23 sec)
Records: 0  Duplicates: 0  Warnings: 0

root[test]#> \! du -hs /var/lib/mysql/test/sbtest3.ibd
945M    /var/lib/mysql/test/sbtest3.ibd

This saved nearly 60% of the original size of the tablespace, which is really huge.
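The saving can be worked out from the two du figures. The short Python calculation below assumes that du's "G" and "M" suffixes mean GiB and MiB, and since du -h rounds its output, the result is only approximate.

```python
# Sizes as reported by `du -hs` (rounded by du, so the result is approximate).
before_mib = 2.3 * 1024   # 2.3G expressed in MiB
after_mib = 945           # 945M

saved_fraction = 1 - after_mib / before_mib
print(f"space saved: {saved_fraction:.1%}")  # → space saved: 59.9%
```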

To find duplicate indexes, you can use pt-duplicate-key-checker from the Percona Toolkit to handle the job for you.

Tune Up Your Buffer Pool
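To show the idea behind spotting such indexes, here is a deliberately loose Python heuristic that groups secondary indexes covering the same set of columns. It is not how the Percona tool works, and it ignores column order, which does matter to the optimizer, so treat its output only as a list of candidates to review. The function name `redundant_index_candidates` is hypothetical.

```python
def redundant_index_candidates(primary_key, secondary_indexes):
    """Group secondary indexes that cover the same set of columns.

    InnoDB appends the primary key columns to every secondary index,
    so they are added to each column set before comparing.
    Returns {kept_index: [candidate_duplicates, ...]}.
    """
    first_seen = {}   # frozenset of columns -> first index name seen
    candidates = {}
    for name, columns in secondary_indexes.items():
        covered = frozenset(columns) | frozenset(primary_key)
        if covered in first_seen:
            candidates.setdefault(first_seen[covered], []).append(name)
        else:
            first_seen[covered] = name
    return candidates


# Secondary index definitions from the sbtest3 table shown earlier.
indexes = {
    "k": ["k", "pad", "c"],
    "kcp2": ["id", "k", "c", "pad"],
    "kcp": ["k", "c", "pad"],
    "pck": ["pad", "c", "id", "k"],
}
print(redundant_index_candidates(["id"], indexes))
# → {'k': ['kcp2', 'kcp', 'pck']}
```

For the sbtest3 example this flags the same three indexes that were dropped above, but an index with the same columns in a different order can still serve different queries, so check your workload before dropping anything.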

For this section I’m referring only to the InnoDB storage engine. 

The buffer pool is an important component of InnoDB. This is where InnoDB caches table and index data as it is accessed. It speeds up processing because frequently used data is kept in memory. For instance, if you have multiple tables totalling 100GiB or more that are accessed heavily, then we suggest provisioning at least 128GiB of fast memory and starting with the buffer pool set to 80% of the physical memory. That 80% has to be monitored closely. You can use SHOW ENGINE INNODB STATUS \G or you can leverage monitoring software such as ClusterControl, which offers fine-grained monitoring that includes the buffer pool and its relevant health metrics. Also set the innodb_buffer_pool_instances variable accordingly. You might set this larger than 8 (the default if innodb_buffer_pool_size >= 1GiB), such as 16, 24, 32, or 64 (the maximum).

When monitoring the buffer pool, check the global status variable Innodb_buffer_pool_pages_free, which indicates whether the buffer pool needs to be adjusted, or whether there are unwanted or duplicate indexes consuming the buffer. SHOW ENGINE INNODB STATUS \G also offers a more detailed view of the buffer pool, including each individual buffer pool instance, based on the number of innodb_buffer_pool_instances you have set.
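As a worked example of the 80% rule, the sketch below computes a starting value for innodb_buffer_pool_size on a 128GiB host. It assumes the default innodb_buffer_pool_chunk_size of 128MiB and rounds down to a multiple of chunk size times instances, which is the granularity MySQL 5.7 and later use for the buffer pool. The helper name `suggested_buffer_pool_bytes` is made up for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def suggested_buffer_pool_bytes(physical_ram_bytes, instances,
                                chunk_bytes=128 * MIB, percent=80):
    """Return `percent` of RAM, rounded down to a whole number of
    (chunk size x instances) units."""
    target = physical_ram_bytes * percent // 100
    unit = chunk_bytes * instances
    return (target // unit) * unit


size = suggested_buffer_pool_bytes(128 * GIB, instances=16)
print(size // GIB, "GiB")  # → 102 GiB
```

This is only a starting point: the right value still depends on what else runs on the host and on what monitoring shows afterwards.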

Use FULLTEXT Indexes (But Only If Applicable)

Using queries like,

SELECT bookid, page, context FROM books WHERE context like '%for dummies%';

wherein context is a string-type (char, varchar, text) column, is an example of a really bad query! Retrieving a large number of records with a filter that starts with a wildcard ends up in a full table scan, and that is just crazy. Consider using a FULLTEXT index. FULLTEXT indexes have an inverted index design. Inverted indexes store a list of words, and for each word, a list of documents that the word appears in. To support proximity search, position information for each word is also stored, as a byte offset.

In order to use FULLTEXT for searching or filtering data, you need to use the MATCH() ... AGAINST syntax rather than a query like the one above. Of course, the column you search must be covered by a FULLTEXT index.
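The inverted index idea can be shown with a toy Python model. This is only an illustration of the data structure described above, not how InnoDB implements FULLTEXT internally; the `books` rows and the `build_inverted_index` helper are hypothetical.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to a list of (doc_id, offset) postings.

    The offset is the character position where the word starts, which
    equals the byte offset for plain ASCII text.
    """
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for match in re.finditer(r"\w+", text.lower()):
            index[match.group()].append((doc_id, match.start()))
    return index


# Hypothetical rows: bookid -> context
books = {
    1: "SQL for dummies",
    2: "Cooking for two",
    3: "Indexes explained",
}
index = build_inverted_index(books)
print(index["for"])      # → [(1, 4), (2, 8)]
print(index["dummies"])  # → [(1, 8)]
```

A word lookup touches only the postings for that word, whereas a LIKE filter with a leading wildcard has to inspect every row.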

To create a FULLTEXT index, just specify with FULLTEXT as your index. See the example below:

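root[minime]#> CREATE FULLTEXT INDEX aboutme_fts ON users_info(aboutme);
Query OK, 0 rows affected, 1 warning (0.49 sec)
Records: 0  Duplicates: 0  Warnings: 1

root[jbmrcd_date]#> show warnings;
+---------+------+--------------------------------------------------+
| Level   | Code | Message                                          |
+---------+------+--------------------------------------------------+
| Warning |  124 | InnoDB rebuilding table to add column FTS_DOC_ID |
+---------+------+--------------------------------------------------+
1 row in set (0.00 sec)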

Although using FULLTEXT indexes can offer benefits when searching words within a very large context inside a column, it also creates issues when used incorrectly. 

When doing a FULLTEXT search for a large table that is constantly accessed (where a number of client requests are searching for different,  unique keywords) it could be very CPU intensive. 

There are also certain occasions where FULLTEXT is not applicable. See this external blog post. Although I haven't tried this with 8.0, I don't see any changes relevant to this. We suggest that you do not use FULLTEXT for searching in a big data environment, especially on high-traffic tables. Instead, try to leverage other technologies such as Apache Lucene, Apache Solr, tsearch2, or Sphinx.

Avoid Using NULL in Columns

Columns that contain NULL values are totally fine in MySQL. But if you use columns with NULL values in an index, it can affect query performance, as the optimizer may not provide the right query plan due to poor index distribution. However, there are certain ways to optimize queries that involve NULL values, provided they suit your requirements. Please check the MySQL documentation about NULL optimization. You may also check this external post, which is helpful as well.

Design Your Schema Topology and Tables Structure Efficiently

To some extent, normalizing your database tables from 1NF (First Normal Form) to 3NF (Third Normal Form) benefits query efficiency because normalized tables tend to avoid redundant records. Proper planning and design of your tables is very important because it determines how you retrieve data, and every one of these actions has a cost. With normalized tables, the goal is to ensure that every non-key column in every table is directly dependent on the key: the whole key and nothing but the key. If this goal is reached, it pays off in the form of reduced redundancies, fewer anomalies and improved efficiencies.

While normalizing your tables has many benefits, it doesn't mean you need to normalize all your tables in this way. You can also design your database using a Star Schema. Designing your tables using a Star Schema has the benefit of simpler queries (avoiding complex cross joins), easier data retrieval for reporting, and performance gains because there's no need for unions or complex joins, as well as fast aggregations. A Star Schema is simple to implement, but you need to plan carefully because it can create big problems when your tables get bigger and require maintenance. A Star Schema (and its underlying tables) is prone to data integrity issues, so there is a high probability that a good part of your data is redundant. If you expect the table's structure and design to stay constant and it is built for query efficiency, then it's an ideal case for this approach.

Mixing your database designs (as long as you are able to determine what kind of data has to be pulled from your tables) is very important, since you benefit from more efficient queries and also help the DBA with backups, maintenance, and recovery.

Get Rid of Constant and Old Data

We recently wrote some Best Practices for Archiving Your Database in the Cloud. It covers how you can take advantage of data archiving before data goes to the cloud. So how does getting rid of old data, or archiving your constant and old data, help query efficiency? As stated in my previous blog, larger tables that are constantly modified and receive new data benefit from it, because their tablespace can grow quickly. MySQL and InnoDB perform efficiently when records are contiguous and relevant to their neighboring rows in the table. In other words, if you keep no old records that no longer need to be used, the optimizer does not need to include them in the statistics, which gives a much more efficient result. Makes sense, right? Also, query efficiency is not only an application-side concern; you also need to consider efficiency when performing a backup, during maintenance, or on failover. For example, a bad, long-running query that overlaps with your maintenance period or a failover can be a problem.

Enable Query Logging As Needed

Always configure MySQL's slow query log according to your needs. If you are using Percona Server, you can take advantage of its extended slow query logging. It allows you to define certain variables to your liking. You can filter types of queries in combination, such as full_scan, full_join, tmp_table, etc. You can also control the rate of slow query logging through the variable log_slow_rate_type, and many others.

Enabling query logging in MySQL (such as the slow query log) lets you inspect your queries so that you can optimize them or tune MySQL by adjusting certain variables to suit your requirements. To enable the slow query log, ensure that these variables are set up:

  • long_query_time - assign the right value for how long queries can take. If a query takes longer than this (10 seconds by default), it is written to the slow query log file you assigned.
  • slow_query_log - to enable it, set it to 1.
  • slow_query_log_file - this is the destination path for your slow query log file.

The slow query log is very helpful for query analysis and for diagnosing bad queries that cause stalls, slave delays, long running times, heavy memory or CPU use, or even server crashes. If you use pt-query-digest or pt-index-usage, use the slow query log file as the source for reporting on these queries.


We have discussed some ways you can maximize database query efficiency in this blog. In the next part we'll discuss even more factors which can help you maximize performance. Stay tuned!


Full MariaDB Encryption At-Rest and In-Transit for Maximum Data Protection - Part One


In this blog series, we are going to give you a complete walkthrough on how to configure a fully encrypted MariaDB server for at-rest and in-transit encryption, to ensure maximum protection of the data from being stolen physically or while transferring and communicating with other hosts. The basic idea is we are going to turn our "plain" deployment into a fully encrypted MariaDB replication, as simplified in the following diagram:

We are going to configure a number of encryption components:

  • In-transit encryption, which consists of:
    • Client-server encryption
    • Replication encryption
  • At-rest encryption, which consists of:
    • Data file encryption
    • Binary/relay log encryption.

Note that this blog post only covers in-transit encryption. We are going to cover at-rest encryption in the second part of this blog series.

This deployment walkthrough assumes that we already have a running MariaDB replication setup. If you don't have one, you can use ClusterControl to deploy a new MariaDB replication within minutes, with fewer than 5 clicks. All servers are running MariaDB 10.4.11 on CentOS 7.

In-Transit Encryption

Data can be exposed to risks both in transit and at rest and requires protection in both states. In-transit encryption protects your data if communications are intercepted while data moves between hosts over the network, whether between your site and the cloud provider, between services, or between clients and the server.

For MySQL/MariaDB, data is in motion when a client connects to a database server, or when a slave node replicates data from a master node. MariaDB supports encrypted connections between clients and the server using the TLS (Transport Layer Security) protocol. TLS is sometimes referred to as SSL (Secure Sockets Layer) but MariaDB does not actually use the SSL protocol for encrypted connections because its encryption is weak. More details on this at MariaDB documentation page.

Client-Server Encryption

In this setup we are going to use self-signed certificates, which means we do not use external parties like Google, Comodo or any popular Certificate Authority provider out there to verify our identity. In SSL/TLS, identity verification is the first step that must be passed before the server and client exchange their certificates and keys.

MySQL provides a very handy tool called mysql_ssl_rsa_setup which takes care of the key and certificate generation automatically. Unfortunately, there is no such tool for MariaDB server yet. Therefore, we have to manually prepare and generate the SSL-related files for our MariaDB TLS needs.

The following is a list of the files that we will generate using OpenSSL tool:

  • CA key - RSA private key in PEM format. Must be kept secret.
  • CA certificate - X.509 certificate in PEM format. Contains public key and certificate metadata.
  • Server CSR - Certificate signing request. The Common Name (CN) when filling the form is important, for example CN=mariadb-server
  • Server key - RSA private key. Must be kept secret.
  • Server cert - X.509 certificate signed by CA key. Contains public key and certificate metadata.
  • Client CSR - Certificate signing request. Must use a different Common Name (CN) than Server's CSR, for example CN=client1 
  • Client key - RSA private key. Must be kept secret.
  • Client cert - X.509 certificate signed by CA key. Contains public key and certificate metadata.

First and foremost, create a directory to store our certs and keys for in-transit encryption:

$ mkdir -p /etc/mysql/transit/
$ cd /etc/mysql/transit/

We name the directory this way because in the next part of this blog series we will create another directory for at-rest encryption at /etc/mysql/rest.

Certificate Authority

Generate a key file for our own Certificate Authority (CA):

$ openssl genrsa 2048 > ca-key.pem
Generating RSA private key, 2048 bit long modulus
e is 65537 (0x10001)

Generate a certificate for our own Certificate Authority (CA) based on the ca-key.pem generated before with expiration of 3650 days:

$ openssl req -new -x509 -nodes -days 3650 -key ca-key.pem -out ca.pem
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
Country Name (2 letter code) [XX]:SE
State or Province Name (full name) []:Stockholm
Locality Name (eg, city) [Default City]:Stockholm
Organization Name (eg, company) [Default Company Ltd]:Severalnines
Organizational Unit Name (eg, section) []:
Common Name (eg, your name or your server's hostname) []:CA
Email Address []:info@severalnines.com

Now we should have ca-key.pem and ca.pem under this working directory.

Key and Certificate for Server

Next, generate private key for the MariaDB server:

$ openssl genrsa 2048 > server-key.pem
Generating RSA private key, 2048 bit long modulus
e is 65537 (0x10001)

A trusted certificate must be signed by a Certificate Authority; here, we are going to use our own CA because we trust the hosts in the network. Before we can create a signed certificate, we need to generate a request called a Certificate Signing Request (CSR).

Create a CSR for the MariaDB server. We are going to name the CSR file server-req.pem. This is not the certificate that we are going to use for the MariaDB server. The final certificate is the one that will be signed by our own CA private key (as shown in the next step):

$ openssl req -new -key server-key.pem -out server-req.pem
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
Country Name (2 letter code) [XX]:SE
State or Province Name (full name) []:Stockholm
Locality Name (eg, city) [Default City]:Stockholm
Organization Name (eg, company) [Default Company Ltd]:Severalnines
Organizational Unit Name (eg, section) []:
Common Name (eg, your name or your server's hostname) []:MariaDBServer
Email Address []:info@severalnines.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:

Take note of the Common Name, where we specified "MariaDBServer". This can be any name, but the value must not be the same as the client certificate's Common Name. Commonly, if the applications connect to the MariaDB server via FQDN or hostname (skip-name-resolve=OFF), you probably want to specify the MariaDB server's FQDN as the Common Name. Doing so allows you to connect with 

We can then generate the final X.509 certificate (server-cert.pem) and sign the CSR (server-req.pem) with CA's certificate (ca.pem) and CA's private key (ca-key.pem):

$ openssl x509 -req -in server-req.pem -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out server-cert.pem -days 3650 -sha256
Signature ok
Getting CA Private Key

At this point, this is what we have now:

$ ls -1 /etc/mysql/transit

We only need the signed certificate (server-cert.pem) and the private key (server-key.pem) for the MariaDB server. The CSR (server-req.pem) is no longer required.

Key and Certificate for the Client

Next, we need to generate key and certificate files for the MariaDB client. The MariaDB server will only accept remote connection from the client who has these certificate files. 

Start by generating a 2048-bit key for the client:

$ openssl genrsa 2048 > client-key.pem
Generating RSA private key, 2048 bit long modulus
e is 65537 (0x10001)

Create CSR for the client called client-req.pem:

$ openssl req -new -key client-key.pem -out client-req.pem
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
Country Name (2 letter code) [XX]:SE
State or Province Name (full name) []:Stockholm
Locality Name (eg, city) [Default City]:Stockholm
Organization Name (eg, company) [Default Company Ltd]:Severalnines
Organizational Unit Name (eg, section) []:
Common Name (eg, your name or your server's hostname) []:Client1
Email Address []:info@severalnines.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:
An optional company name []:

Pay attention to the Common Name, where we specified "Client1". Specify any name that represents the client. This value must be different from the server's Common Name. For advanced usage, you can use this Common Name to allow only a certain user whose certificate matches this value, for example:

MariaDB> GRANT SELECT ON schema1.* TO 'client1'@'' IDENTIFIED BY 's' REQUIRE SUBJECT '/CN=Client2';

We can then generate the final X.509 certificate (client-cert.pem) and sign the CSR (client-req.pem) with CA's certificate (ca.pem) and CA's private key (ca-key.pem):

$ openssl x509 -req -in client-req.pem -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out client-cert.pem -days 3650 -sha256
Signature ok
Getting CA Private Key

All certificates that we need for in-transit encryption setup are generated. Verify both certificates are correctly signed by the CA:

$ openssl verify -CAfile ca.pem server-cert.pem client-cert.pem
server-cert.pem: OK
client-cert.pem: OK

Configuring SSL for MariaDB

Create a new directory on every slave:

(slave1)$ mkdir -p /etc/mysql/transit/
(slave2)$ mkdir -p /etc/mysql/transit/

Copy the encryption files to all slaves:

$ scp -r /etc/mysql/transit/* root@slave1:/etc/mysql/transit/
$ scp -r /etc/mysql/transit/* root@slave2:/etc/mysql/transit/

Make sure the files in the certs directory are owned by the "mysql" user, and change the permissions of all key files so they are not globally readable:

$ cd /etc/mysql/transit
$ chown -R mysql:mysql *
$ chmod 600 client-key.pem server-key.pem ca-key.pem

Here is what you should see when listing out files under "transit" directory:

$ ls -al /etc/mysql/transit
total 32
drwxr-xr-x. 2 root  root 172 Dec 14 04:42 .
drwxr-xr-x. 3 root  root 24 Dec 14 04:18 ..
-rw-------. 1 mysql mysql 1675 Dec 14 04:19 ca-key.pem
-rw-r--r--. 1 mysql mysql 1383 Dec 14 04:22 ca.pem
-rw-r--r--. 1 mysql mysql 1383 Dec 14 04:42 client-cert.pem
-rw-------. 1 mysql mysql 1675 Dec 14 04:42 client-key.pem
-rw-r--r--. 1 mysql mysql 1399 Dec 14 04:42 client-req.pem
-rw-r--r--. 1 mysql mysql 1391 Dec 14 04:34 server-cert.pem
-rw-------. 1 mysql mysql 1679 Dec 14 04:28 server-key.pem
-rw-r--r--. 1 mysql mysql 1415 Dec 14 04:31 server-req.pem

Next, we will enable the SSL connection for MariaDB. On every MariaDB host (master and slaves) edit the configuration file and add the following lines under [mysqld] section:
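
A minimal sketch of such a [mysqld] block, assuming the file names and the /etc/mysql/transit directory used above. The helper name write_server_ssl_conf and the drop-in path in CONF are hypothetical; adjust the path to wherever your distribution reads MariaDB configuration from.

```shell
# Hypothetical drop-in path; override CONF to match your distribution's layout.
CONF="${CONF:-/etc/my.cnf.d/transit-ssl.cnf}"

# Write the server-side TLS settings to the file given as $1,
# pointing at the CA certificate, server certificate and server key
# generated earlier in this post.
write_server_ssl_conf() {
    cat > "$1" <<'EOF'
[mysqld]
ssl-ca   = /etc/mysql/transit/ca.pem
ssl-cert = /etc/mysql/transit/server-cert.pem
ssl-key  = /etc/mysql/transit/server-key.pem
EOF
}

# Usage (as root): write_server_ssl_conf "$CONF"
```

The function only writes the file; the restart described next is still needed before MariaDB picks up the settings.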


Restart MariaDB server one node at a time, starting from slaves and finally on the master:

(slave1)$ systemctl restart mariadb
(slave2)$ systemctl restart mariadb
(master)$ systemctl restart mariadb

After the restart, MariaDB accepts both plain connections (when the client connects without any SSL-related parameters) and encrypted connections (when the client specifies SSL-related parameters in the connection string).

For ClusterControl users, you can enable client-server encryption in a matter of clicks. Just go to ClusterControl -> Security -> SSL Encryption -> Enable -> Create Certificate -> Certificate Expiration -> Enable SSL:

ClusterControl will generate the required keys, X.509 certificate and CA certificate and set up SSL encryption for client-server connections for all the nodes in the cluster. For MySQL/MariaDB replication, the SSL files will be located under /etc/ssl/replication/cluster_X, where X is the cluster ID on every database node. The same certificates will be used on all nodes and the existing ones might be overwritten. The nodes must be restarted individually after this job completes. We recommend that you first restart a replication slave and verify that the SSL settings work.

To restart every node, go to ClusterControl -> Nodes -> Node Actions -> Restart Node. Do restart one node at a time, starting with the slaves. The last node should be the master node with force stop flag enabled:

You can tell if a node is able to handle client-server encryption by looking at the green lock icon right next to the database node in the Overview grid:

At this point, our cluster is ready to accept SSL connections from MySQL users.

Connecting via Encrypted Connection

The MariaDB client requires all client-related SSL files that we have generated inside the server. Copy the generated client certificate, CA certificate and client key to the client host:

$ cd /etc/mysql/transit
$ scp client-cert.pem client-key.pem ca.pem root@client-host:~

Note: ClusterControl generates the client SSL files under /etc/ssl/replication/cluster_X/ on every database node, where X is the cluster ID.

Create a database user that requires SSL on the master:

MariaDB> CREATE SCHEMA sbtest;
MariaDB> CREATE USER sbtest@'%' IDENTIFIED BY '<password>' REQUIRE SSL;
MariaDB> GRANT ALL PRIVILEGES ON sbtest.* to sbtest@'%';

From the client host, connect to the MariaDB server with SSL-related parameters. We can verify the connection status by using "STATUS" statement:

(client)$ mysql -usbtest -p -h192.168.0.91 -P3306 --ssl-cert client-cert.pem --ssl-key client-key.pem --ssl-ca ca.pem -e 'status'
Current user: sbtest@
SSL: Cipher in use is DHE-RSA-AES256-GCM-SHA384

Pay attention to the SSL line, which shows the cipher used for the encryption. This means the client is successfully connected to the MariaDB server via an encrypted connection.
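
If you want to script this check rather than read the output by eye, a minimal sketch follows. It assumes the "SSL:" line format shown above; ssl_in_use is a hypothetical helper name, not part of the MariaDB client.

```shell
# Succeeds only if the client's "status" output reports a negotiated cipher.
# Reads the status text on stdin, so it can be fed from a pipe.
ssl_in_use() {
    grep -q '^SSL:.*Cipher in use is '
}

# Usage, assuming the same host, user and certificate files as above:
# mysql -usbtest -p -h192.168.0.91 -P3306 --ssl-cert client-cert.pem \
#       --ssl-key client-key.pem --ssl-ca ca.pem -e 'status' | ssl_in_use \
#     && echo "encrypted" || echo "NOT encrypted"
```

Checking the exit status this way makes it easy to fail a deployment script when a client silently falls back to a plain connection.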

At this point, we have encrypted the client-server connection to the MariaDB server, as represented by the green two-headed arrow in the following diagram:

In the next part, we are going to encrypt replication connections between nodes.

Replication Encryption

Setting up encrypted connections for replication is similar to doing so for client/server connections. We can use the same client certificate, key and CA certificate to let the replication user access the master over an encrypted channel. This in turn encrypts the traffic between nodes when the slave IO thread pulls replication events from the master.

Let's configure this on one slave at a time. For the first slave, add the following line under the [client] section inside the MariaDB configuration file:


Stop the replication thread on the slave:

(slave)MariaDB> STOP SLAVE;

On the master, alter the existing replication user to force it to connect using SSL:

(master)MariaDB> ALTER USER rpl_user@ REQUIRE SSL;

On the slave, test the connectivity to the master using the mysql command-line client with the --ssl flag:

(slave)$ mysql -urpl_user -p -h192.168.0.91 -P 3306 --ssl -e 'status'
Current user: rpl_user@
SSL: Cipher in use is DHE-RSA-AES256-GCM-SHA384

Make sure you can get connected to the master host without error. Then, on the slave, specify the CHANGE MASTER statement with SSL parameters as below:

(slave)MariaDB> CHANGE MASTER TO MASTER_SSL = 1, MASTER_SSL_CA = '/etc/mysql/transit/ca.pem', MASTER_SSL_CERT = '/etc/mysql/transit/client-cert.pem', MASTER_SSL_KEY = '/etc/mysql/transit/client-key.pem';

Start the replication slave:

(slave)MariaDB> START SLAVE;

Verify that replication is running with the related SSL parameters in place:

(slave)MariaDB> SHOW SLAVE STATUS\G

              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
            Master_SSL_Allowed: Yes
            Master_SSL_CA_File: /etc/mysql/transit/ca.pem
               Master_SSL_Cert: /etc/mysql/transit/client-cert.pem
                Master_SSL_Key: /etc/mysql/transit/client-key.pem
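
The same verification can be scripted. The following is a minimal sketch that reads the vertical (\G) status output and checks the three Yes/No fields shown above; replication_ssl_ok is a hypothetical helper name.

```shell
# Reads "SHOW SLAVE STATUS\G" output on stdin and succeeds only when both
# replication threads are running and SSL is enabled for the master connection.
replication_ssl_ok() {
    awk -F': ' '
        { gsub(/^[ \t]+/, "", $1) }                      # strip the left padding
        $1 == "Slave_IO_Running"   && $2 == "Yes" { io = 1 }
        $1 == "Slave_SQL_Running"  && $2 == "Yes" { sql = 1 }
        $1 == "Master_SSL_Allowed" && $2 == "Yes" { ssl = 1 }
        END { exit !(io && sql && ssl) }
    '
}

# Usage on a slave:
# mysql -e 'SHOW SLAVE STATUS\G' | replication_ssl_ok && echo "replication OK over SSL"
```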

The slave is now replicating from the master securely via TLS encryption.

Repeat all of the above steps on the remaining slave. The only difference is the ALTER USER statement executed on the master, where we have to change the host to that of the respective slave:

(master)MariaDB> ALTER USER rpl_user@ REQUIRE SSL;

At this point we have completed in-transit encryption as illustrated by the green lines from master to slaves in the following diagram:

You can verify the encryption connection by looking at the tcpdump output for interface eth1 on the slave. The following is an example of standard replication without encryption:

(plain-slave)$ tcpdump -i eth1 -s 0 -l -w - 'src port 3306 or dst port 3306' | strings
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
create table t1 (id INT AUTO_INCREMENT PRIMARY KEY, data VARCHAR(255))
test data3

^C11 packets captured
11 packets received by filter
0 packets dropped by kernel

We can clearly see the text as read by the slave from the master. While on an encrypted connection, you should see gibberish characters like below:

(encrypted-slave)$ tcpdump -i eth1 -s 0 -l -w - 'src port 3306 or dst port 3306' | strings
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes

^C6 packets captured
6 packets received by filter
0 packets dropped by kernel


In the next part of this blog series we are going to look into completing our fully encrypted setup with MariaDB at-rest encryption. Stay tuned!

Full MariaDB Encryption At-Rest and In-Transit for Maximum Data Protection - Part Two


In the first part of this series, we covered in-transit encryption for MariaDB replication servers, configuring both client-server and replication encryption. That left our setup partially encrypted (as indicated by the green arrows on the left in the diagram). In this blog post, we are going to complete it with at-rest encryption to create a fully encrypted MariaDB replication setup.

The following diagram illustrates our current setup and the final setup that we are going to achieve:

At-Rest Encryption

At-rest encryption means that data at rest, such as data files and logs, is encrypted on disk. This makes it almost impossible for someone who accesses or steals a hard disk to get at the original data (provided that the key is secured and not stored locally). Data-at-Rest Encryption, also known as Transparent Data Encryption (TDE), is supported in MariaDB 10.1 and later. Note that using encryption has an overhead of roughly 5-10%, depending on the workload and cluster type.

For MariaDB, the following components can be encrypted at rest:

  • InnoDB data files (shared tablespace or individual tablespaces, e.g., ibdata1 and *.ibd).
  • Aria data and index files.
  • Undo/redo logs (InnoDB log files, e.g., ib_logfile0 and ib_logfile1).
  • Binary/relay logs.
  • Temporary files and tables.

The following files cannot be encrypted at the moment:

  • Metadata file (for example .frm files).
  • File-based general log/slow query log. Table-based general log/slow query log can be encrypted.
  • Error log.

MariaDB's data-at-rest encryption requires the use of a key management and encryption plugin. In this blog post, we are going to use the File Key Management Encryption Plugin, which is provided by default since MariaDB 10.1.3. Note that there are a number of drawbacks to using this plugin, e.g., the key can still be read by the root and mysql users, as explained in the MariaDB Data-at-Rest Encryption page.

Generating Key File

Let's create a dedicated directory to store our at-rest encryption stuff:

$ mkdir -p /etc/mysql/rest
$ cd /etc/mysql/rest

Create a keyfile. This is the core of encryption:

$ openssl rand -hex 32 > /etc/mysql/rest/keyfile

Prepend the string "1;" as the key identifier to the keyfile:

$ sed -i '1s/^/1;/' /etc/mysql/rest/keyfile

Thus, when reading the keyfile, it should look something like this:

$ cat /etc/mysql/rest/keyfile

The above simply means for key identifier 1, the key is 4eb... The key file needs to contain two pieces of information for each encryption key. First, each encryption key needs to be identified with a 32-bit integer as the key identifier. Second, the encryption key itself needs to be provided in hex-encoded form. These two pieces of information need to be separated by a semicolon.
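
The format rule above can be checked mechanically before the file is encrypted. The following is a minimal sketch, assuming a key file without comment lines and keys of 128, 192 or 256 bits; valid_keyfile is a hypothetical helper name, not something shipped with MariaDB.

```shell
# Succeeds if the file is non-empty and every line has the form
# <numeric id>;<hex key>, where the key is 32, 48 or 64 hex digits.
valid_keyfile() {
    [ -s "$1" ] &&
        ! grep -Evq '^[0-9]+;([0-9a-fA-F]{32}|[0-9a-fA-F]{48}|[0-9a-fA-F]{64})$' "$1"
}

# Usage:
# valid_keyfile /etc/mysql/rest/keyfile && echo "keyfile format looks fine"
```

A key produced by "openssl rand -hex 32" is 64 hex digits long, so a file built with the commands above should pass this check.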

Create a password to encrypt the above key. Here we are going to store the password inside a file called "keyfile.passwd":

$ echo -n 'mySuperStrongPassword'> /etc/mysql/rest/keyfile.passwd

You could skip the above step if you would like to specify the password directly in the configuration file using file_key_management_filekey option. For example: file_key_management_filekey=mySuperStrongPassword

But in this example, we are going to read the password that is stored in a file, thus we have to define the following line in the configuration file later on:

file_key_management_filekey = FILE:/etc/mysql/rest/keyfile.passwd

We are going to encrypt the clear-text keyfile into another file called keyfile.enc, using the password stored in the password file:

$ openssl enc -aes-256-cbc -md sha1 -pass file:/etc/mysql/rest/keyfile.passwd -in /etc/mysql/rest/keyfile -out /etc/mysql/rest/keyfile.enc

When listing out the directory, we should see these 3 files:

$ ls -1 /etc/mysql/rest/
keyfile
keyfile.enc
keyfile.passwd

The content of keyfile.enc is simply an encrypted version of keyfile.

To test out, we can decrypt the encrypted file using OpenSSL by providing the password file (keyfile.passwd):

$ openssl aes-256-cbc -d -md sha1 -pass file:/etc/mysql/rest/keyfile.passwd -in /etc/mysql/rest/keyfile.enc
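
Before deleting the plain key file, it may be worth confirming that the encrypted copy really decrypts back to it. A minimal sketch, assuming the same cipher and digest options as used above; keyfile_roundtrip_ok is a hypothetical helper name.

```shell
# Arguments: <plain keyfile> <encrypted keyfile> <password file>
# Decrypts the encrypted file with the password file and compares the result
# byte-for-byte with the plain key file. Returns non-zero on any mismatch
# or if openssl cannot decrypt the file.
keyfile_roundtrip_ok() {
    openssl enc -aes-256-cbc -d -md sha1 -pass "file:$3" -in "$2" 2>/dev/null |
        cmp -s - "$1"
}

# Usage:
# cd /etc/mysql/rest
# keyfile_roundtrip_ok keyfile keyfile.enc keyfile.passwd && echo "safe to remove keyfile"
```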

We can then remove the plain key because we are going to use the encrypted one (.enc) together with the password file:

$ rm -f /etc/mysql/rest/keyfile

We can now proceed to configure MariaDB at-rest encryption.

Configuring At-Rest Encryption

We have to copy the encrypted key file and the password file to the slaves so that MariaDB there can encrypt and decrypt the data. Otherwise, the slaves would not be able to read an encrypted table restored from the master with a physical backup tool like MariaDB Backup (due to the different key/password combination). Logical backups like mysqldump should work with different keys and passwords.

On the slaves, create a directory to store at-rest encryption stuff:

(slave1)$ mkdir -p /etc/mysql/rest
(slave2)$ mkdir -p /etc/mysql/rest

On the master, copy the encrypted keyfile and password file to the slaves:

(master)$ cd /etc/mysql/rest
(master)$ scp keyfile.enc keyfile.passwd root@slave1:/etc/mysql/rest/
(master)$ scp keyfile.enc keyfile.passwd root@slave2:/etc/mysql/rest/

Protect the files from global access and make the "mysql" user their owner:

$ chown mysql:mysql /etc/mysql/rest/*
$ chmod 600 /etc/mysql/rest/*

Add the following into MariaDB configuration file under [mysqld] or [mariadb] section:

# at-rest encryption
plugin_load_add              = file_key_management
file_key_management_filename = /etc/mysql/rest/keyfile.enc
file_key_management_filekey  = FILE:/etc/mysql/rest/keyfile.passwd
file_key_management_encryption_algorithm = AES_CBC

innodb_encrypt_tables            = ON
innodb_encrypt_temporary_tables  = ON
innodb_encrypt_log               = ON
innodb_encryption_threads        = 4
innodb_encryption_rotate_key_age = 1
encrypt-tmp-disk-tables          = 1
encrypt-tmp-files                = 1
encrypt-binlog                   = 1
aria_encrypt_tables              = ON

Take note of the file_key_management_filekey variable: if the password is in a file, you have to prefix the path with "FILE:". Alternatively, you could also specify the password string directly (not recommended, since it leaves the password in clear text in the configuration file):

file_key_management_filekey = mySuperStrongPassword

Restart MariaDB server one node at a time, starting with the slaves:

(slave1)$ systemctl restart mariadb
(slave2)$ systemctl restart mariadb
(master)$ systemctl restart mariadb

Observe the error log and make sure MariaDB encryption is activated during start up:

$ tail -f /var/log/mysql/mysqld.log
2019-12-17  6:44:47 0 [Note] InnoDB: Encrypting redo log: 2*67108864 bytes; LSN=143311
2019-12-17  6:44:48 0 [Note] InnoDB: Starting to delete and rewrite log files.
2019-12-17  6:44:48 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 67108864 bytes
2019-12-17  6:44:48 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to 67108864 bytes
2019-12-17  6:44:48 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2019-12-17  6:44:48 0 [Note] InnoDB: New log files created, LSN=143311
2019-12-17  6:44:48 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2019-12-17  6:44:48 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2019-12-17  6:44:48 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2019-12-17  6:44:48 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2019-12-17  6:44:48 0 [Note] InnoDB: Waiting for purge to start
2019-12-17  6:44:48 0 [Note] InnoDB: 10.4.11 started; log sequence number 143311; transaction id 222
2019-12-17  6:44:48 0 [Note] InnoDB: Creating #1 encryption thread id 139790011840256 total threads 4.
2019-12-17  6:44:48 0 [Note] InnoDB: Creating #2 encryption thread id 139790003447552 total threads 4.
2019-12-17  6:44:48 0 [Note] InnoDB: Creating #3 encryption thread id 139789995054848 total threads 4.
2019-12-17  6:44:48 0 [Note] InnoDB: Creating #4 encryption thread id 139789709866752 total threads 4.
2019-12-17  6:44:48 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2019-12-17  6:44:48 0 [Note] Plugin 'FEEDBACK' is disabled.
2019-12-17  6:44:48 0 [Note] Using encryption key id 1 for temporary files

You should see lines indicating encryption initialization in the error log. At this point, the majority of the encryption configuration is now complete.
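
On a busy error log, these lines are easy to miss. A minimal sketch that filters for them, with patterns taken from the sample log above (other MariaDB versions may word the messages differently); encryption_log_lines is a hypothetical helper name.

```shell
# Prints the start-up messages from the given error log that indicate
# redo log encryption, encryption threads and temporary file encryption.
encryption_log_lines() {
    grep -E 'Encrypting redo log|encryption thread|Using encryption key id' "$1"
}

# Usage, assuming the log path used above:
# encryption_log_lines /var/log/mysql/mysqld.log
```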

Testing Your Encryption

Create a test database to test on the master:

(master)MariaDB> CREATE SCHEMA sbtest;
(master)MariaDB> USE sbtest;

Create a standard table without encryption and insert a row:

MariaDB> INSERT INTO tbl_plain SET data = 'test data';

We can see the stored data in clear text when browsing the InnoDB data file using a hexdump tool:

$ xxd /var/lib/mysql/sbtest/tbl_plain.ibd | less
000c060: 0200 1c69 6e66 696d 756d 0002 000b 0000  ...infimum......
000c070: 7375 7072 656d 756d 0900 0000 10ff f180  supremum........
000c080: 0000 0100 0000 0000 0080 0000 0000 0000  ................
000c090: 7465 7374 2064 6174 6100 0000 0000 0000  test data.......
000c0a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

Create an encrypted table and insert a row:

MariaDB> INSERT INTO tbl_enc SET data = 'test data';

We can't tell what is stored in InnoDB data file for encrypted tables:

$ xxd /var/lib/mysql/sbtest/tbl_enc.ibd | less
000c060: 0c2c 93e4 652e 9736 e68a 8b69 39cb 6157  .,..e..6...i9.aW
000c070: 3cd1 581c 7eb9 84ca d792 7338 521f 0639  <.X.~.....s8R..9
000c080: d279 9eb3 d3f5 f9b0 eccb ed05 de16 f3ac  .y..............
000c090: 6d58 5519 f776 8577 03a4 fa88 c507 1b31  mXU..v.w.......1
000c0a0: a06f 086f 28d9 ac17 8923 9412 d8a5 1215  .o.o(....#......

Note that the metadata file tbl_enc.frm is not encrypted at-rest. Only the InnoDB data file (.ibd) is encrypted.
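
The CREATE TABLE statements for the two test tables are not shown above. The following is a sketch of what they could look like, assuming the same two columns as the t1 table from the first part and MariaDB's ENCRYPTED and ENCRYPTION_KEY_ID table options. Since innodb_encrypt_tables is ON in our configuration, the plain table is assumed to need an explicit ENCRYPTED=NO. test_table_ddl is a hypothetical helper that only prints the SQL.

```shell
# Prints hypothetical DDL for the two test tables. Nothing is executed here;
# pipe the output into the client to apply it.
test_table_ddl() {
    cat <<'SQL'
CREATE TABLE tbl_plain (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    data VARCHAR(255)
) ENGINE=InnoDB ENCRYPTED=NO;

CREATE TABLE tbl_enc (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    data VARCHAR(255)
) ENGINE=InnoDB ENCRYPTED=YES ENCRYPTION_KEY_ID=1;
SQL
}

# Usage on the master:
# test_table_ddl | mysql sbtest
```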

When inspecting a "plain" binary or relay log with a hexdump tool, we can clearly see its content:

$ xxd binlog.000002 | less
0000560: 0800 0800 0800 0b04 726f 6f74 096c 6f63  ........root.loc
0000570: 616c 686f 7374 0047 5241 4e54 2052 454c  alhost.GRANT REL
0000580: 4f41 442c 4c4f 434b 2054 4142 4c45 532c  OAD,LOCK TABLES,
0000590: 5245 504c 4943 4154 494f 4e20 434c 4945  REPLICATION CLIE
00005a0: 4e54 2c45 5645 4e54 2c43 5245 4154 4520  NT,EVENT,CREATE
00005b0: 5441 424c 4553 5041 4345 2c50 524f 4345  TABLESPACE,PROCE
00005c0: 5353 2c43 5245 4154 452c 494e 5345 5254  SS,CREATE,INSERT
00005d0: 2c53 454c 4543 542c 5355 5045 522c 5348  ,SELECT,SUPER,SH
00005e0: 4f57 2056 4945 5720 4f4e 202a 2e2a 2054  OW VIEW ON *.* T

While for an encrypted binary log, the content looks gibberish:

$ xxd binlog.000004 | less
0000280: 4a1d 1ced 2f1b db50 016a e1e9 1351 84ba  J.../..P.j...Q..
0000290: 38b6 72e7 8743 7713 afc3 eecb c36c 1b19  8.r..Cw......l..
00002a0: 7b3f 6176 208f 0000 00dc 85bf 6768 e7c6  {?av .......gh..
00002b0: 6107 5bea 241c db12 d50c 3573 48e5 3c3d  a.[.$.....5sH.<=
00002c0: 3179 1653 2449 d408 1113 3e25 d165 c95b  1y.S$I....>%.e.[
00002d0: afb0 6778 4b26 f672 1bc7 567e da96 13f5  ..gxK&.r..V~....
00002e0: 2ac5 b026 3fb9 4b7a 3ef4 ab47 6c9f a686  *..&?.Kz>..Gl...

Encrypting Aria Tables

The Aria storage engine does not support the ENCRYPTED option in CREATE/ALTER statements; instead, it follows the aria_encrypt_tables global option. Therefore, when creating an Aria table, simply create the table with the ENGINE=Aria option:

MariaDB> INSERT INTO tbl_aria_enc(data) VALUES ('test data');
MariaDB> FLUSH TABLE tbl_aria_enc;

We can then verify the content of the table's data file (tbl_aria_enc.MAD) or index file (tbl_aria_enc.MAI) with hexdump tool. To encrypt an existing Aria table, the table needs to be re-built:


This statement causes Aria to rebuild the table using the ROW_FORMAT table option. In the process, with the new default setting, it encrypts the table when it writes to disk.
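
For reference, here is a sketch of both Aria cases. The column layout and the table name tbl_aria_old are assumptions for illustration, and ROW_FORMAT=PAGE is assumed as the row format because it is the one Aria encryption works with. aria_ddl is a hypothetical helper that only prints the SQL.

```shell
# Prints hypothetical DDL for a new Aria table and for rebuilding an
# existing one. Nothing is executed here.
aria_ddl() {
    cat <<'SQL'
-- New Aria table: written encrypted when aria_encrypt_tables=ON.
CREATE TABLE tbl_aria_enc (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    data VARCHAR(255)
) ENGINE=Aria;

-- Existing Aria table (hypothetical name): force a rebuild so it is
-- rewritten to disk under the current aria_encrypt_tables setting.
ALTER TABLE tbl_aria_old ENGINE=Aria ROW_FORMAT=PAGE;
SQL
}

# Usage:
# aria_ddl | mysql sbtest
```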

Encrypting General Log/Slow Query Log

To encrypt general and slow query logs, we can set MariaDB log_output option to 'TABLE' instead of the default 'FILE':

MariaDB> SET GLOBAL log_output = 'TABLE';

However, MariaDB will by default create the necessary tables using CSV storage engine, which is not encrypted by MariaDB. No engines other than CSV, MyISAM or Aria are legal for the log tables. The trick is to rebuild the default CSV table with Aria storage engine, provided that aria_encrypt_tables option is set to ON. However, the respective log option must be turned off for the table alteration to succeed.

Thus, the steps to encrypt the general log table are:

MariaDB> SET GLOBAL general_log = OFF;
MariaDB> ALTER TABLE mysql.general_log ENGINE=Aria;
MariaDB> SET GLOBAL general_log = ON;

Similarly, for slow query log:

MariaDB> SET GLOBAL slow_query_log = OFF;
MariaDB> ALTER TABLE mysql.slow_log ENGINE=Aria;
MariaDB> SET GLOBAL slow_query_log = ON;

Verify the output of general logs within the server:

MariaDB> SELECT * FROM mysql.general_log;
| event_time                 | user_host                 | thread_id | server_id | command_type | argument                     |
| 2019-12-17 07:45:53.109558 | root[root] @ localhost [] |        19 |     28001 |        Query | select * from sbtest.tbl_enc |
| 2019-12-17 07:45:55.504710 | root[root] @ localhost [] |        20 |     28001 |        Query | select * from general_log    |

As well as the encrypted content of the Aria data file inside data directory using hexdump tool:

$ xxd /var/lib/mysql/mysql/general_log.MAD | less
0002040: 1d45 820d 7c53 216c 3fc6 98a6 356e 1b9e  .E..|S!l?...5n..
0002050: 6bfc e193 7509 1fa7 31e2 e22a 8f06 3c6f  k...u...1..*..<o
0002060: ae71 bb63 e81b 0b08 7120 0c99 9f82 7c33  .q.c....q ....|3
0002070: 1117 bc02 30c1 d9a7 c732 c75f 32a6 e238  ....0....2._2..8
0002080: d1c8 5d6f 9a08 455a 8363 b4f4 5176 f8a1  ..]o..EZ.c..Qv..
0002090: 1bf8 113c 9762 3504 737e 917b f260 f88c  ...<.b5.s~.{.`..
00020a0: 368e 336f 9055 f645 b636 c5c1 debe fbe7  6.3o.U.E.6......
00020b0: d01e 028f 8b75 b368 0ef0 8889 bb63 e032  .....u.h.....c.2

MariaDB at-rest encryption is now complete. Combined with the in-transit encryption we set up in the first post, our final architecture now looks like this:


It's now possible to totally secure your MariaDB databases via encryption for protection against physical and virtual breach or theft. ClusterControl can help you maintain this type of security as well and you can download it for free here.


Database Performance Tuning for MariaDB


Ever since MySQL was originally forked to form MariaDB it has been widely supported and adopted quickly by a large audience in the open source database community. Originally a drop-in replacement, MariaDB has started to distinguish itself from MySQL, especially with the release of MariaDB 10.2.

Despite this, however, there's still no real telltale difference between MariaDB and MySQL, as both have engines that are compatible and can run natively with one another. So don't be surprised if tuning your MariaDB setup follows a similar approach to tuning MySQL.

This blog will discuss the tuning of MariaDB, specifically those systems running in a Linux environment.

MariaDB Hardware and System Optimization

MariaDB recommends that you improve your hardware in the following priority order...

Memory

Memory is the most important factor for databases as it allows you to adjust the Server System Variables. More memory means larger key and table caches, which are stored in memory so that disk access, which is an order of magnitude slower, is reduced.

Keep in mind though, simply adding more memory may not result in drastic improvements if the server variables are not set to make use of the extra available memory.

Using more RAM slots on the motherboard increases the bus frequency, and there will be more latency between the RAM and the CPU. This means that using the highest RAM size per slot is preferable.

Disks

Fast disk access is critical, as ultimately it's where the data resides. The key figure is the disk seek time (a measurement of how fast the physical disk can move to access the data) so choose disks with as low a seek time as possible. You can also add dedicated disks for temporary files and transaction logs.

Fast Ethernet

Given adequate network bandwidth for your workload, fast Ethernet means faster responses to client requests and faster replication, since the slaves read the binary logs over the network. Fast response times are especially important on Galera-based clusters.

CPU

Although hardware bottlenecks often fall elsewhere, faster processors allow calculations to be performed more quickly, and the results sent back to the client more quickly. Besides processor speed, the processor's bus speed and cache size are also important factors to consider.

Setting Your Disk I/O Scheduler

I/O schedulers exist as a way to optimize disk access requests. They merge I/O requests to similar locations on the disk, which means that the disk drive doesn't need to seek as often. This greatly improves overall response time and saves disk operations. The recommended values for I/O performance are noop and deadline.

noop is useful for checking whether complex I/O scheduling decisions of other schedulers are not causing I/O performance regressions. In some cases it can be helpful for devices that do I/O scheduling themselves, such as intelligent storage, or devices that do not depend on mechanical movement, like SSDs. Usually, the DEADLINE I/O scheduler is a better choice for these devices, but due to less overhead NOOP may produce better performance on certain workloads.

deadline is a latency-oriented I/O scheduler. Each I/O request is assigned a deadline. Usually, requests are stored in queues (read and write) sorted by sector numbers. The DEADLINE algorithm maintains two additional queues (read and write) where the requests are sorted by deadline. As long as no request has timed out, the "sector" queue is used. If timeouts occur, requests from the "deadline" queue are served until there are no more expired requests. Generally, the algorithm prefers reads over writes.

For PCIe devices (NVMe SSD drives), they have their own large internal queues along with fast service and do not require or benefit from setting an I/O scheduler. It is recommended to have no explicit scheduler-mode configuration parameter.

You can check your scheduler setting with:

cat /sys/block/${DEVICE}/queue/scheduler

For instance, it should look like this output:

cat /sys/block/sda/queue/scheduler

[noop] deadline cfq
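
The active scheduler is the entry in square brackets. A minimal sketch that extracts it, for example to check several devices at once; active_scheduler is a hypothetical helper name.

```shell
# Reads a scheduler line such as "[noop] deadline cfq" on stdin and prints
# the active scheduler, i.e. the bracketed entry.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the active scheduler of every block device that exposes one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    printf '%s: %s\n' "$f" "$(active_scheduler < "$f")"
done
```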

To make it permanent, edit the /etc/default/grub configuration file, look for the variable GRUB_CMDLINE_LINUX and add the elevator parameter, for example:

GRUB_CMDLINE_LINUX="elevator=noop"

Increase Open Files Limit

To ensure good server performance, the total number of client connections, database files, and log files must not exceed the maximum file descriptor limit on the operating system (ulimit -n). By default, Linux systems limit the number of file descriptors that any one process may open to 1,024. Active database servers (especially production ones) can easily reach this default limit.

To increase this, edit /etc/security/limits.conf and specify or add the following:

mysql soft nofile 65535

mysql hard nofile 65535

This requires a system restart. Afterwards, you can confirm by running the following:

$ ulimit -Sn


$ ulimit -Hn


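Optionally, you can set this via mysqld_safe if you are starting the mysqld process through mysqld_safe,

[mysqld_safe]
open_files_limit=65535

or if you are using systemd,

sudo mkdir -p /etc/systemd/system/mariadb.service.d
sudo tee /etc/systemd/system/mariadb.service.d/limitnofile.conf <<EOF
[Service]
LimitNOFILE=65535
EOF
sudo systemctl daemon-reload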
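
To confirm the limit that the running server actually received, one option is to read the kernel's view of the process. A minimal sketch, assuming a Linux /proc filesystem; max_open_files is a hypothetical helper name.

```shell
# Prints the soft "Max open files" limit from a /proc/<pid>/limits file.
# The line has the form: "Max open files  <soft>  <hard>  files".
max_open_files() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Usage, assuming a single running server process named mysqld or mariadbd:
# max_open_files "/proc/$(pidof mysqld || pidof mariadbd)/limits"
```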

Setting Swappiness on Linux for MariaDB

Linux swap plays a big role in database systems. It acts like the spare tire in your vehicle: when nasty memory leaks interfere with your work, the machine will slow down, but in most cases it will still be usable to finish its assigned task.

To apply changes to your swappiness, simply run,

sysctl -w vm.swappiness=1

This happens dynamically, with no need to reboot the server. To make it persistent, edit /etc/sysctl.conf and add the line,

vm.swappiness=1

It's pretty common to set swappiness=0, but since the release of new kernels (i.e. kernels > 2.6.32-303), changes have been made so you need to set vm.swappiness=1.
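
If you manage several hosts, the persistent setting can be applied with a small idempotent helper. This is a minimal sketch assuming GNU sed (for -i); persist_swappiness is a hypothetical helper name.

```shell
# Arguments: <sysctl file> <value>
# Sets vm.swappiness in the given file, replacing an existing entry or
# appending a new one, so repeated runs do not create duplicate lines.
persist_swappiness() {
    if grep -q '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$1" 2>/dev/null; then
        sed -i "s/^[[:space:]]*vm\.swappiness[[:space:]]*=.*/vm.swappiness=$2/" "$1"
    else
        printf 'vm.swappiness=%s\n' "$2" >> "$1"
    fi
}

# Usage (as root):
# persist_swappiness /etc/sysctl.conf 1 && sysctl -p
```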

Filesystem Optimizations for MariaDB

The most common file systems used in Linux environments running MariaDB are ext4 and XFS. There are also certain setups available for implementing an architecture using ZFS and Btrfs (as referenced in the MariaDB documentation).

In addition to this, most database setups do not need to record file access time. You might want to disable this when mounting the volume into the system. To do this, edit your /etc/fstab file. For example, for a volume named /dev/md2, this is how it looks:

/dev/md2 / ext4 defaults,noatime 0 0

Creating an Optimal MariaDB Instance

Store Data On A Separate Volume

It is always ideal to keep your database data on a separate volume, preferably on fast storage such as SSD, NVMe, or PCIe cards. This way, if your system volume fails, your database volume stays safe and is not affected.

Tuneup MariaDB To Utilize Memory Efficiently


innodb_buffer_pool_size is the primary value to adjust on a database server with entirely or primarily XtraDB/InnoDB tables. It can be set to up to 80% of the total memory in these environments. If it is set to 2 GB or more, you will probably want to adjust innodb_buffer_pool_instances as well. You can set it dynamically if you are using MariaDB 10.2.2 or later. Otherwise, it requires a server restart.
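
As a rough illustration of the 80% rule of thumb, the following sketch derives a starting value from the machine's total memory. It assumes a dedicated database host and a Linux /proc/meminfo; buffer_pool_mib is a hypothetical helper name.

```shell
# Argument: total memory in kB (as reported by MemTotal in /proc/meminfo).
# Prints 80% of that amount in whole MiB.
buffer_pool_mib() {
    echo $(( $1 * 80 / 100 / 1024 ))
}

# Print a suggested configuration line for this host, if /proc/meminfo exists.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    echo "innodb_buffer_pool_size = $(buffer_pool_mib "$total_kb")M"
fi
```

On a host with 16 GiB of RAM this yields 13107M. Treat it as an upper starting point and leave room for per-connection buffers and anything else running on the machine.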


For tmp_memory_table_size (tmp_table_size), if you're dealing with large temporary tables, setting this higher provides performance gains as the tables will be kept in memory. This is common for queries that make heavy use of GROUP BY, UNION, or sub-queries. However, if max_heap_table_size is smaller, the lower limit applies. If a table exceeds the limit, MariaDB converts it to a MyISAM or Aria table. To see whether an increase is necessary, compare the status variables Created_tmp_disk_tables and Created_tmp_tables, which show how many of the temporary tables created had to be converted to disk. Complex GROUP BY queries are often responsible for exceeding the limit.

max_heap_table_size, in turn, is the maximum size for user-created MEMORY tables. The value set for this variable only applies to newly created or re-created tables, not to existing ones. The smaller of max_heap_table_size and tmp_table_size also limits internal in-memory tables. When the maximum size is reached, any further attempts to insert data will receive a "table ... is full" error. Temporary tables created with CREATE TEMPORARY will not be converted to Aria, as occurs with internal temporary tables, but will also receive a table full error.
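
The comparison of the two status variables boils down to a simple ratio. A minimal sketch; tmp_disk_pct is a hypothetical helper name, and the threshold at which you decide to raise the limits is up to you.

```shell
# Arguments: <Created_tmp_disk_tables> <Created_tmp_tables>
# Prints the percentage of temporary tables that went to disk.
tmp_disk_pct() {
    disk=$1
    total=$2
    [ "$total" -gt 0 ] || { echo 0; return; }
    echo $(( disk * 100 / total ))
}

# The two counters can be read with:
# mysql -e "SHOW GLOBAL STATUS LIKE 'Created_tmp%tables'"
```

For example, 250 on-disk tables out of 1000 temporary tables created gives 25, meaning a quarter of them did not fit in memory.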


Servers with large memory, fast processors and fast disk I/O are no longer unusual and are reasonably priced. If you want more performance when handling your InnoDB transactions, setting the variable innodb_log_file_size to a larger value such as 5GiB or even 10GiB is reasonable. A larger log file means that larger transactions can run without needing to perform disk I/O before committing.


join_buffer_size matters when your queries lack proper indexes, or when you simply have to run a query that cannot use one. Unless such a query is going to be called heavily by clients, this variable is best set at the session level. Increase it to get faster full joins when adding indexes is not possible, but be aware of memory issues, since joins will always allocate the minimum size.

Set Your max_allowed_packet

MariaDB has the same nature as MySQL when handling packets. It splits data into packets and the client must be aware of the max_allowed_packet variable value. The server will have a buffer to store the body with a maximum size corresponding to this max_allowed_packet value. If the client sends more data than max_allowed_packet size, the socket will be closed. The max_allowed_packet directive defines the maximum size of packet that can be sent.

Setting this value too low can cause a query to stop and its client connection to be closed, commonly with errors like ER_NET_PACKET_TOO_LARGE or "Lost connection to MySQL server during query". For most of today's application demands, you can start by setting this to 512MiB. For a low-demand application, just use the default value (16MiB since MariaDB 10.2.4) and set this variable only via session when the data to be sent or received is larger than the default. Workloads that process large packets need this set higher according to their needs, especially with replication: if max_allowed_packet is too small on the slave, the slave stops the I/O thread.

Using Threadpool

In some cases, this tuning might not be necessary or recommended for you. Threadpools are most efficient in situations where queries are relatively short and the load is CPU bound (OLTP workloads). If the workload is not CPU bound, you might still want to limit the number of threads to save memory for the database memory buffers.

Using the thread pool is an ideal solution, especially if your system suffers from heavy context switching and you are looking for ways to reduce it by maintaining fewer threads than clients. However, this number should not be too low either, since we also want to make maximum use of the available CPUs. Ideally, therefore, there should be a single active thread for each CPU on the machine.

You can set thread_pool_max_threads and thread_pool_min_threads for the maximum and the minimum number of threads. Unlike in MySQL, these variables are present only in MariaDB.

Set the variable thread_handling which determines how the server handles threads for client connections. In addition to threads for client connections, this also applies to certain internal server threads, such as Galera slave threads.

Tune Your Table Cache + max_connections

If you occasionally see the Opening tables and Closing tables statuses in the processlist, it can signify that you need to increase your table cache. You can also monitor this from the mysql client prompt by running SHOW GLOBAL STATUS LIKE 'Open%table%'; and watching the status variables. 

For max_connections, if your application requires a lot of concurrent connections, you can start by setting this to 500. 

For table_open_cache, the value should be at least the total number of your tables, but it's best to add more depending on the type of queries you serve, since temporary tables are cached as well. For example, if you have 500 tables, it would be reasonable to start with 1500. 

For table_open_cache_instances, start by setting it to 8. This can improve scalability by reducing contention among sessions, because the open tables cache is partitioned into several smaller cache instances of size table_open_cache / table_open_cache_instances.
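The rules of thumb above can be written down as a small calculation. This is only a rough sketch, not an official formula: the multiplier of 3 is an assumption inferred from the 500-tables-to-1500 example, and both function names are hypothetical.

```python
def suggested_table_open_cache(table_count, multiplier=3):
    """Starting point for table_open_cache.

    multiplier=3 is an assumption taken from the example in the text
    (500 tables -> 1500); tune it for your own workload.
    """
    return table_count * multiplier


def per_instance_cache_size(table_open_cache, instances=8):
    """Approximate size of each partition of the open tables cache,
    i.e. table_open_cache / table_open_cache_instances (rounded down)."""
    return table_open_cache // instances


cache = suggested_table_open_cache(500)
print(cache)                           # 1500
print(per_instance_cache_size(cache))  # 187
```

Treat the result as a first guess and adjust it after watching the Open%table% status variables under real load.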

For InnoDB, table_definition_cache acts as a soft limit for the number of open table instances in the InnoDB data dictionary cache. The value to be defined will set the number of table definitions that can be stored in the definition cache. If you use a large number of tables, you can create a large table definition cache to speed up opening of tables. The table definition cache takes less space and does not use file descriptors, unlike the normal table cache. The minimum value is 400. The default value is based on the following formula, capped to a limit of 2000:

MIN(400 + table_open_cache / 2, 2000)
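As a quick illustration, the formula above can be evaluated for a few table_open_cache values. The function name is hypothetical and integer division is assumed; the sketch only restates the formula as given.

```python
def default_table_definition_cache(table_open_cache):
    """MIN(400 + table_open_cache / 2, 2000), using integer division."""
    return min(400 + table_open_cache // 2, 2000)


for value in (400, 2000, 4000):
    print(value, default_table_definition_cache(value))
# 400 600
# 2000 1400
# 4000 2000   (capped)
```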

If the number of open table instances exceeds the table_definition_cache setting, the LRU mechanism begins to mark table instances for eviction and eventually removes them from the data dictionary cache. The limit helps address situations in which significant amounts of memory would be used to cache rarely used table instances until the next server restart. The number of table instances with cached metadata could be higher than the limit defined by table_definition_cache, because parent and child table instances with foreign key relationships are not placed on the LRU list and are not subject to eviction from memory.

Unlike the table_open_cache, the table_definition_cache doesn't use file descriptors, and is much smaller.

Dealing with Query Cache

Preferably, we recommend disabling the query cache in all of your MariaDB setups. You need to ensure that query_cache_type=OFF and query_cache_size=0 to completely disable the query cache. Unlike MySQL, MariaDB still fully supports the query cache and has no plans to withdraw that support. Some people claim that the query cache still provides performance benefits for them. However, the Percona post "The MySQL query cache: Worst enemy or best friend" shows that the query cache, if enabled, adds overhead and can result in poor server performance.

If you intend to use the query cache, make sure that you monitor it by running SHOW GLOBAL STATUS LIKE 'Qcache%';. Qcache_inserts contains the number of queries added to the query cache, Qcache_hits contains the number of queries that have made use of the query cache, while Qcache_lowmem_prunes contains the number of queries that were dropped from the cache due to lack of memory. Over time, an enabled query cache may become fragmented. A high Qcache_free_blocks relative to Qcache_total_blocks may indicate fragmentation. To defragment it, run FLUSH QUERY CACHE. This will defragment the query cache without dropping any queries.
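The counters described above can be turned into a couple of ratios that are easier to watch. The following is a minimal sketch: the function name is hypothetical, the input is assumed to be a plain dict built from the SHOW GLOBAL STATUS output, and the sample numbers are made up for illustration.

```python
def query_cache_report(status):
    """Summarise Qcache_* counters from SHOW GLOBAL STATUS LIKE 'Qcache%'.

    `status` is a dict of counter name -> integer value.
    """
    hits = status["Qcache_hits"]
    inserts = status["Qcache_inserts"]
    total_blocks = status["Qcache_total_blocks"]
    free_blocks = status["Qcache_free_blocks"]
    return {
        # Cache hits per query that was inserted into the cache.
        "hits_per_insert": hits / inserts if inserts else 0.0,
        # A high share of free blocks may indicate fragmentation.
        "free_block_ratio": free_blocks / total_blocks if total_blocks else 0.0,
        # Queries dropped from the cache because memory ran short.
        "lowmem_prunes": status["Qcache_lowmem_prunes"],
    }


# Made-up sample values, for illustration only.
sample = {
    "Qcache_hits": 9000,
    "Qcache_inserts": 3000,
    "Qcache_total_blocks": 2000,
    "Qcache_free_blocks": 500,
    "Qcache_lowmem_prunes": 42,
}
print(query_cache_report(sample))
```

What counts as "high" for the free block ratio depends on the workload, so no threshold is hard-coded here.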

Always Monitor Your Servers

It is highly important that you properly monitor your MariaDB nodes. Common monitoring tools (like Nagios, Zabbix, or PMM) are available if you prefer free and open-source tools. For a corporate, fully-featured tool, we suggest you give ClusterControl a try, as it not only provides monitoring but also offers performance advisors, alerts, and alarms that help you improve your system performance and stay up to date with current trends as you engage with the Support team. Database monitoring with ClusterControl is free and part of the Community Edition.


Tuning your MariaDB setup follows almost the same approach as tuning MySQL, with some disparities, since MariaDB differs in some of its approaches and in the versions it supports. MariaDB is now a different entity in the database world and has quickly gained the trust of the community without any FUD. It has its own reasons for implementing things the way it does, so it's very important that we know how to tune and optimize our MariaDB server(s).

Announcing ClusterControl 1.7.5: Advanced Cluster Maintenance & Support for PostgreSQL 12 and MongoDB 4.2


We’re excited to announce the 1.7.5 release of ClusterControl - the only database management system you’ll ever need to take control of your open source database infrastructure. 

This new version features support for the latest MongoDB & PostgreSQL general releases as well as new operating system support allowing you to install ClusterControl on CentOS 8 and Debian 10.

ClusterControl 1.7.4 provided the ability to place a node into Maintenance Mode. 1.7.5 now allows you to place (or schedule) the entire database cluster in Maintenance Mode, giving you more control over your database operations.

In addition, we are excited to announce a brand new function in ClusterControl we call “Freeze Frame.” This new feature will take snapshots of your MySQL or MariaDB setups right before a detected failure, providing you with invaluable troubleshooting information about what caused the issue. 

Release Highlights

Database Cluster-Wide Maintenance

  • Perform tasks in Maintenance-Mode across the entire database cluster.
  • Enable/disable cluster-wide maintenance mode with a cron-based scheduler.
  • Enable/disable recurring jobs such as cluster or node recovery with automatic maintenance mode.

MySQL Freeze Frame (BETA)

  • Snapshot MySQL status before cluster failure.
  • Snapshot MySQL process list before cluster failure (coming soon).
  • Inspect cluster incidents in operational reports or from the s9s command line tool.

New Operating System & Database Support

  • CentOS 8 and Debian 10 support.
  • PostgreSQL 12 support.
  • MongoDB 4.2 and Percona MongoDB v4.0 support.

Additional Misc Improvements

  • Synchronize time range selection between the Overview and Node pages.
  • Improvements to the nodes status updates to be more accurate and with less delay.
  • Enable/Disable Cluster and Node recovery are now regular CMON jobs.
  • Topology view for Cluster-to-Cluster Replication.


Release Details

Cluster-Wide Maintenance 

The ability to place a database node into Maintenance Mode was implemented in the last version of ClusterControl (1.7.4). In this release we now offer the ability to place your entire database cluster into Maintenance Mode to allow you to perform updates, patches, and more.

MySQL & MariaDB Freeze Frame

This new ClusterControl feature allows you to get a snapshot of your MySQL statuses and related processes immediately before a failure is detected. This allows you to better understand what happened when troubleshooting, and provides you with actionable information on how you can prevent this type of failure from happening in the future. 

This new feature is not part of the auto-recovery features in ClusterControl. Should your database cluster go down those functions will still perform to attempt to get you back online; it’s just that now you’ll have a better idea of what caused it. 

Support for PostgreSQL 12

Released in October 2019, PostgreSQL 12 featured major improvements to indexing, partitioning, new SQL & JSON functions, and improved security features, mainly around authentication. ClusterControl now allows you to deploy a preconfigured Postgres 12 database cluster with the ability to fully monitor and manage it.

PostgreSQL GUI - ClusterControl

Support for MongoDB 4.2

MongoDB 4.2 offers unique improvements such as new ACID transaction guarantees, new query and analytics functions including new charts for rich data visualizations. ClusterControl now allows you to deploy a preconfigured MongoDB 4.2 or Percona Server for MongoDB 4.2 ReplicaSet with the ability to fully monitor and manage it.

MongoDB GUI - ClusterControl

My MySQL Database is Out of Disk Space


When the MySQL server runs out of disk space, you will see one of the following errors in your application (as well as in the MySQL error log):

ERROR 3 (HY000) at line 1: Error writing file '/tmp/AY0Wn7vA' (Errcode: 28 - No space left on device)

For binary log:

[ERROR] [MY-000035] [Server] Disk is full writing './binlog.000019' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.

For relay log:

[ERROR] [MY-000035] [Server] Disk is full writing './relay-bin.000007' (OS errno 28 - No space left on device). Waiting for someone to free space... Retry in 60 secs. Message reprinted in 600 secs.

For slow query log:

[ERROR] [MY-011263] [Server] Could not use /var/log/mysql/mysql-slow.log for logging (error 28 - No space left on device). Turning logging off for the server process. To turn it on again: fix the cause, then either restart the query logging by using "SET GLOBAL SLOW_QUERY_LOG=ON" or restart the MySQL server.

For InnoDB:

[ERROR] [MY-012144] [InnoDB] posix_fallocate(): Failed to preallocate data for file ./#innodb_temp/temp_8.ibt, desired size 16384 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/8.0/en/operating-system-error-codes.html
[Warning] [MY-012638] [InnoDB] Retry attempts for writing partial data failed.
[ERROR] [MY-012639] [InnoDB] Write to file ./#innodb_temp/temp_8.ibt failed at offset 81920, 16384 bytes should have been written, only 0 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
[ERROR] [MY-012640] [InnoDB] Error number 28 means 'No space left on device'
[Warning] [MY-012145] [InnoDB] Error while writing 16384 zeroes to ./#

They all report the same error code number, which is 28. We can use the error code to see the actual error with the perror command:

$ perror 28
OS error code  28: No space left on device

The above simply means the MySQL server is out of disk space, and most of the time MySQL is stopped or stalled at this point. In this blog post, we are going to look into ways to solve this issue for MySQL running in a Linux-based environment.
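The perror lookup can also be done from a script when scanning logs. The sketch below is an illustration only: the regular expressions are assumptions written to match the sample messages quoted earlier, not an exhaustive list of MySQL log formats, and the function name is hypothetical.

```python
import errno
import os
import re

# Patterns matching how the sample messages embed the OS error number.
_ERRNO_PATTERNS = [
    re.compile(r"Errcode: (\d+)"),
    re.compile(r"OS errno (\d+)"),
    re.compile(r"\(error (\d+)"),
    re.compile(r"[Ee]rror number (\d+)"),
]


def extract_os_errno(message):
    """Return the OS error number found in a log message, or None."""
    for pattern in _ERRNO_PATTERNS:
        match = pattern.search(message)
        if match:
            return int(match.group(1))
    return None


line = ("[ERROR] [MY-000035] [Server] Disk is full writing './binlog.000019' "
        "(OS errno 28 - No space left on device).")
code = extract_os_errno(line)
# os.strerror() plays the role of perror for OS-level error numbers.
print(code, errno.errorcode.get(code), os.strerror(code))
```

On Linux, error number 28 maps to ENOSPC, the same condition perror reports.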


First of all, we have to determine which disk partition is full. MySQL can be configured to store data on a different disk or partition. Look at the path stated in the error to start with. In this example, our data directory is in the default location, /var/lib/mysql, which is under the / partition. We can use the df command and specify the full path to the datadir to find the partition where the data is stored:

$ df -h /var/lib/mysql
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        40G   40G   20K 100% /

The above means we have to clear up some space in the root partition.
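The same check can be scripted, for example as part of a health check. This is a minimal sketch using Python's standard shutil.disk_usage; the function names are hypothetical, and the path "/" is used only so the example runs anywhere on Linux (point it at your datadir instead).

```python
import shutil


def usage_percent(total, used):
    """Used space as a percentage of the filesystem size."""
    return 100.0 * used / total if total else 0.0


def filesystem_usage(path):
    """Usage percentage of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage_percent(usage.total, usage.used)


print(f"{filesystem_usage('/'):.1f}% used")
```

The figure can differ slightly from the Use% column of df, which bases its percentage on the space available to unprivileged users rather than on the raw filesystem size.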

Temporary Workarounds

The temporary workaround is to clear up some disk space so MySQL can write to the disk and resume operation. Things we can do when facing this kind of problem are:

  • Remove unnecessary files
  • Purge binary logs
  • Drop old tables, or rebuild a very big table

Remove Unnecessary Files

This is commonly the first step to take if the MySQL server is down or unresponsive, or if you have no binary logs enabled. For example, /var/log/ is commonly the first place to look for unnecessary files:

$ cd /var/log
$ find . -type f -size +5M -exec du -sh {} +
8.1M ./audit/audit.log.6
8.1M ./audit/audit.log.5
8.1M ./audit/audit.log.4
8.1M ./audit/audit.log.3
8.1M ./audit/audit.log.2
8.1M ./audit/audit.log.1
11M ./audit/audit.log
8.5M ./secure-20190429
8.0M ./wtmp

The above example shows how to find files that are bigger than 5MB. We can safely remove the rotated log files, which are usually in {filename}.{number} format, for example audit.log.1 through audit.log.6. The same goes for any huge older backups that are stored on the server. If you have performed a restoration via Percona XtraBackup or MariaDB Backup, all files prefixed with xtrabackup_ can be removed from the MySQL datadir, as they are no longer necessary for the restoration. The xtrabackup_logfile is usually the biggest file, since it contains all transactions executed while the xtrabackup process was copying the datadir to the destination. The following example shows all the related files in the MySQL datadir:

$ ls -lah /var/lib/mysql | grep xtrabackup_
-rw-r-----.  1 mysql root   286 Feb 4 11:30 xtrabackup_binlog_info
-rw-r--r--.  1 mysql root    24 Feb 4 11:31 xtrabackup_binlog_pos_innodb
-rw-r-----.  1 mysql root    83 Feb 4 11:31 xtrabackup_checkpoints
-rw-r-----.  1 mysql root   808 Feb 4 11:30 xtrabackup_info
-rw-r-----.  1 mysql root  179M Feb 4 11:31 xtrabackup_logfile
-rw-r--r--.  1 mysql root     1 Feb 4 11:31 xtrabackup_master_key_id
-rw-r-----.  1 mysql root   248 Feb 4 11:31 xtrabackup_tablespaces

Therefore, the mentioned files are safe to be deleted. Start MySQL service once there is at least 10% more free space.
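The search for large files under /var/log can also be sketched in Python, which is handy if you want to feed the result into other tooling. The function name and the 5MiB default are illustrative assumptions; nothing is deleted, the script only lists candidates.

```python
import os


def find_large_files(root, min_bytes=5 * 1024 * 1024):
    """Yield (size, path) for regular files under `root` larger than `min_bytes`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue  # skip symlinks, only count real files
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable
            if size > min_bytes:
                yield size, path


if __name__ == "__main__":
    # Largest files first, sizes shown in MiB.
    for size, path in sorted(find_large_files("/var/log"), reverse=True):
        print(f"{size / 1024 / 1024:.1f}M {path}")
```

Review the list by hand before removing anything; active log files should be rotated rather than deleted.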

Purge the Binary Logs

If the MySQL server is still responsive and it has binary logging enabled, e.g. for replication or point-in-time recovery, we can purge the old binary log files by using the PURGE BINARY LOGS statement and providing the interval. In this example, we are deleting all binary logs older than 3 days:

mysql> PURGE BINARY LOGS BEFORE NOW() - INTERVAL 3 DAY;

For MySQL Replication, it's safe to delete all logs that have been replicated and applied on the slaves. Check the Relay_Master_Log_File value in the SHOW SLAVE STATUS output on each slave:

        Relay_Master_Log_File: binlog.000008

Then delete the older log files, for example binlog.000007 and older. It's good practice to restart the MySQL server afterwards to make sure that it has enough resources. We can also let binary log expiry happen automatically via the expire_logs_days variable (before MySQL 8.0). For example, to keep only 3 days of binary logs, run the following statement:

mysql> SET GLOBAL expire_logs_days = 3;

Then, add the following line to the MySQL configuration file under the [mysqld] section:

expire_logs_days=3

In MySQL 8.0, use binlog_expire_logs_seconds instead, where the default value is 2592000 seconds (30 days). In this example, we reduce it to only 3 days (60 seconds x 60 minutes x 24 hours x 3 days):

mysql> SET GLOBAL binlog_expire_logs_seconds = (60*60*24*3);
mysql> SET PERSIST binlog_expire_logs_seconds = (60*60*24*3);

SET PERSIST will make sure the configuration is loaded in the next restart. Configuration set by this command is stored inside /var/lib/mysql/mysqld-auto.cnf.
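The arithmetic behind the binlog_expire_logs_seconds value is simple enough to check by hand, but a tiny helper avoids typos when picking a different retention period. The function name is hypothetical.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 60 seconds x 60 minutes x 24 hours


def retention_seconds(days):
    """Convert a retention period in days to a binlog_expire_logs_seconds value."""
    return days * SECONDS_PER_DAY


print(retention_seconds(3))   # 259200
print(retention_seconds(30))  # 2592000, the MySQL 8.0 default mentioned above
```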

Drop Old Tables / Rebuild Tables

Note that DELETE operation won't free up the disk space unless OPTIMIZE TABLE is executed afterward. Thus, if you have deleted many rows, and you would like to return the free space back to the OS after a huge DELETE operation, run the OPTIMIZE TABLE, or rebuild it. For example:

mysql> DELETE FROM tbl_name WHERE id < 100000; -- remove 100K rows
mysql> OPTIMIZE TABLE tbl_name;

We can also force to rebuild a table by using ALTER statement:

mysql> ALTER TABLE tbl_name FORCE;
mysql> ALTER TABLE tbl_name ENGINE=InnoDB; -- a.k.a. "null" rebuild

Note that the above DDL operation is performed via online DDL, meaning MySQL permits concurrent DML operations while the rebuild is ongoing. Another way to perform a defragmentation operation is to use mysqldump to dump the table to a text file, drop the table, and reload it from the dump file. Finally, we can also use DROP TABLE to remove an unused table or TRUNCATE TABLE to clear all rows in the table, which consequently returns the space to the OS.

Permanent Solutions to Disk Space Issues

The permanent solution is, of course, adding more space to the corresponding disk or partition, or applying a shorter retention rule so that unnecessary files do not pile up on the server. If you are running on top of a scalable file storage system, you should be able to scale the resource up without too much hassle, or with minimal disruption and downtime to the MySQL service. To learn more on how to dimension your storage and understand MySQL and MariaDB capacity planning, check out this blog post.

You can worry less with ClusterControl proactive monitoring, where you get a warning notification when disk usage has reached 80%, and a critical notification if disk usage is 90% or higher.
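If you want similar thresholds in your own scripts, the logic is a simple comparison. This is a standalone sketch of the 80% and 90% levels mentioned above, not ClusterControl's implementation; the function name and return values are assumptions.

```python
def disk_alert_level(used_percent, warning=80.0, critical=90.0):
    """Map a disk usage percentage to 'ok', 'warning' or 'critical'."""
    if used_percent >= critical:
        return "critical"
    if used_percent >= warning:
        return "warning"
    return "ok"


for value in (42.0, 80.0, 95.5):
    print(value, disk_alert_level(value))
# 42.0 ok
# 80.0 warning
# 95.5 critical
```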

Steps to Take if You Have a MySQL Outage


A MySQL outage simply means your MySQL service is inaccessible or unresponsive from the perspective of others. Outages can originate from a number of possible causes:

  • Network issue - Connectivity issue, switch, routing, resolver, load-balancer tier.
  • Resource issue - Whether you have reached resources limit or bottleneck.
  • Misconfiguration - Wrong permission or ownership, unknown variable, wrong password, privilege changed.
  • Locking - Global or table lock prevent others from accessing the data.

In this blog post, we’ll look at some steps to take if you’re having a MySQL outage (Linux environment).

Step One: Get the Error Code

When you have an outage, your application will throw out some errors and exceptions. These errors commonly come with an error code, that will give you a rough idea on what you’re facing and what to do next to troubleshoot the issue and recover the outage. 

To get more details on the error, check the MySQL Error Code or MariaDB Error Code pages respectively to figure out what the error means.

Step Two: Is the MySQL Server Running?

Log into the server via terminal and see if MySQL daemon is running and listening to the correct port. In Linux, one would do the following:

Firstly, check the MySQL process:

$ ps -ef | grep -i mysql

You should get something in return. Otherwise, MySQL is not running. If MySQL is not running, try to start it up:

$ systemctl start mysql # systemd

$ service mysql start # sysvinit/upstart

$ mysqld_safe # manual

If you are seeing an error on the above step, you should go look at the MySQL error log, which varies depending on the operating system and MySQL variable configuration for log_error in MySQL configuration file. For RedHat-based server, the file is commonly located at:

$ cat /var/log/mysqld.log

Pay attention to the most recent lines with log level "[Error]". Some lines labelled with "[Warning]" could indicate some problems, but those are pretty uncommon. Most of the time, misconfiguration and resource issues can be detected from here.
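When the error log is long, filtering it for the relevant levels saves time. The following is a minimal sketch; the function name and the default limit of 20 lines are assumptions, and the sample lines are shortened for illustration.

```python
import re

# Matches the [ERROR] and [Warning] level tags, in any letter case.
_LEVEL = re.compile(r"\[(ERROR|Warning)\]", re.IGNORECASE)


def recent_problems(lines, limit=20):
    """Return the last `limit` lines tagged [ERROR] or [Warning]."""
    matches = [line.rstrip("\n") for line in lines if _LEVEL.search(line)]
    return matches[-limit:]


sample_log = [
    "[Note] Server started.\n",
    "[Warning] [MY-012638] [InnoDB] Retry attempts for writing partial data failed.\n",
    "[ERROR] [MY-012640] [InnoDB] Error number 28 means 'No space left on device'\n",
]
for entry in recent_problems(sample_log):
    print(entry)
```

To use it on a real server, pass an open file object, for example the file at the log_error path, in place of sample_log.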

If MySQL is running, check whether it's listening to the correct port:

$ netstat -tulpn | grep -i mysql

tcp6       0 0 :::3306                 :::* LISTEN   1089/mysqld

You should see the process name "mysqld" with PID 1089 in the "LISTEN" state, listening on port 3306 on all interfaces (:::3306). If the line shows 127.0.0.1:3306 instead, MySQL is only listening locally. In that case, you might need to change the bind_address value in the MySQL configuration file to listen on all IP addresses, or simply comment out the line. 

Step Three: Check for Connectivity Issues

If the MySQL server is running fine without error inside the MySQL error log, the chance that connectivity issues are happening is pretty high. Start by checking connectivity to the host via ping (if ICMP is enabled) and telnet to the MySQL server from the application server:

(application-server)$ ping db1.mydomain.com

(application-server)$ telnet db1.mydomain.com 3306
Trying db1.mydomain.com...
Connected to
Escape character is '^]'.

You should see some lines in the telnet output if you can get connected to the MySQL port. Now, try once more by using MySQL client from the application server:

(application-server)$ mysql -u db_user -p -h db1.mydomain.com -P3306

ERROR 1045 (28000): Access denied for user 'db_user'@'db1.mydomain.com' (using password: YES)

In the above example, the error gives us a bit of information on what to do next. It probably appears because someone has changed the password for "db_user" or the password for this user has expired. Expiry is normal behaviour in MySQL 5.7.4 through 5.7.10, where the automatic password expiration policy is enabled by default with a 360-day threshold, meaning that all passwords expire roughly once a year. From MySQL 5.7.11 onwards the default is 0, so passwords expire only if an expiration policy has been configured.
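The telnet test in this step only verifies that a TCP connection to the MySQL port can be opened. Where telnet is not installed, the same check can be sketched with Python's standard socket module. The function name and the 3-second timeout are assumptions; the hostname is the example one used above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    print(can_connect("db1.mydomain.com", 3306))
```

A True result only proves the port is reachable; authentication problems such as the access denied error above still need the MySQL client to diagnose.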

Step Four: Check the MySQL Processlist

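If MySQL is running fine without connectivity issues, check the MySQL process list to see what processes are currently running:

mysql> SHOW FULL PROCESSLIST;
+-----+------+-----------+------+---------+------+-------+-----------------------+-----------+---------------+
| Id  | User | Host      | db   | Command | Time | State | Info                  | Rows_sent | Rows_examined |
+-----+------+-----------+------+---------+------+-------+-----------------------+-----------+---------------+
| 117 | root | localhost | NULL | Query   |    0 | init  | SHOW FULL PROCESSLIST |         0 |             0 |
+-----+------+-----------+------+---------+------+-------+-----------------------+-----------+---------------+
1 row in set (0.01 sec)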

Pay attention to the Info and Time columns. Some MySQL operations can be disruptive enough to make the database stall and become unresponsive. The following SQL statements, if running, could block others from accessing the database or table (which could cause a brief outage of the MySQL service from the application's perspective):

  • LOCK TABLE ...

Some long-running transactions could also stall others, which would eventually cause timeouts in other transactions waiting to access the same resources. You may either kill the offending transaction to let others access the same rows, or retry the queued transactions after the long transaction finishes.
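Spotting the long runners in a busy process list is easier with a small filter. The sketch below assumes the rows have already been fetched into dicts keyed by the SHOW FULL PROCESSLIST column names; the function name, the 60-second threshold and the sample rows are all made up for illustration.

```python
def long_running(processlist, min_seconds=60):
    """Return non-sleeping rows whose Time is at least `min_seconds`, longest first."""
    busy = [
        row for row in processlist
        if row["Command"] != "Sleep" and row["Time"] >= min_seconds
    ]
    return sorted(busy, key=lambda row: row["Time"], reverse=True)


# Made-up rows, for illustration only.
rows = [
    {"Id": 117, "Command": "Query", "Time": 0, "Info": "SHOW FULL PROCESSLIST"},
    {"Id": 120, "Command": "Query", "Time": 950, "Info": "LOCK TABLE ..."},
    {"Id": 121, "Command": "Sleep", "Time": 4000, "Info": None},
    {"Id": 122, "Command": "Query", "Time": 75, "Info": "UPDATE ..."},
]
for row in long_running(rows):
    print(row["Id"], row["Time"], row["Info"])
```

The Id values it returns are the ones you would pass to KILL if you decide to terminate the offending statement.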


Proactive monitoring is really important to minimize the risk of MySQL outage. If your database is managed by ClusterControl, all the mentioned aspects are being monitored automatically without any additional configuration from the user. You shall receive alarms in your inbox for anomaly detections like long running queries, server misconfiguration, resource exceeding threshold and many more. Plus, ClusterControl will automatically attempt to recover your database service if something goes wrong with the host or network.

You can also learn more about MySQL & MariaDB Disaster Recovery by reading our whitepaper.
