Channel: Severalnines - MariaDB

How to Setup MariaDB 10.3 Replication Using Ansible and Vagrant


Manual deployments are common, but they can be slow and monotonous. If you have ever tried an Oracle RAC installation with a Data Guard setup on more than three nodes, you know what I mean. Depending on the number of nodes, the deployment steps can be time-consuming and error-prone. Of course, there are many good “how-to’s” on manual database cluster setup, but with the manual approach at scale there are many additional questions to address.

Are the other instances in my environment set up in the same way? Was that QA system set up in the same way as production? Is what we just deployed production-ready? To address these questions, deployments are increasingly being automated via configuration management tools.

Popular configuration management tools like Puppet, Chef, and Ansible are proven technologies in deploying various IT services. They help eliminate manual work, minimize the risk of human error, and make it possible to deploy rapidly. In today's blog, we will take a look at one of them.

Ansible is an open source system management tool for centralizing and automating configuration management. With Ansible you can easily automate various database deployments and perform simple administration tasks. We will showcase how to automatically install and configure software such as MySQL server in reproducible environments. In this blog, we are going to focus on MariaDB replication but if you are interested in other tasks please check our other blogs where we write more about Ansible.

Vagrant, Virtualbox, and Ansible

Ansible can help to deploy MySQL Cluster in the cloud or on-prem. For the purpose of this blog, we are going to use a popular setup for running various tests on desktop machines: Vagrant and VirtualBox.

Vagrant is a system that allows you to easily create and move development environments from one machine to another. You simply define what type of VM you want in a file called Vagrantfile and then fire it up with a single command. It integrates well with virtual machine providers like VirtualBox, VMware and AWS and, importantly for our task, it has great support for Ansible.

Our Vagrantfile deploys two instances on the VirtualBox platform, one for the master node and a second for the slave node. We will then use Ansible to install the necessary packages and configure the master/slave replication. Below is the list of tasks that we are going to perform.

  • Install Vagrant and Virtualbox
  • Configure vagrant file and ansible playbook
  • Launch the instances
  • Download the related Vagrant boxes and Vagrantfile (this is done automatically)
  • Run Ansible playbook (this will be done automatically)
  • Add the cluster to ClusterControl for monitoring and management tasks (like backups, security, user management, performance management and many others).

Vagrant, Virtualbox and Ansible Installation on Ubuntu

Install packages

$ sudo apt-get install ansible vagrant virtualbox

Create configuration files for Vagrant and Ansible

$ mkdir mariadbtest
$ vi Vagrantfile
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.provision "ansible" do |ansible|
    ansible.playbook = "maria.yml"
    # 'sudo' is deprecated in recent Vagrant releases; use 'ansible.become = true' there
    ansible.sudo = true
  end
  config.vm.define "master" do |master|
    master.vm.hostname = "master"
    master.vm.network "forwarded_port", guest: 3306, host: 3336
    master.vm.network "private_network", ip: "192.168.10.2"
  end
  config.vm.define "slave" do |slave|
    slave.vm.hostname = "slave"
    slave.vm.network "forwarded_port", guest: 3306, host: 3337
    slave.vm.network "private_network", ip: "192.168.10.3"
  end
  config.vm.provider "virtualbox" do |v|
    v.memory = 1024
    v.cpus = 2
  end
end

The above Vagrantfile will create two machines with the following configuration:

  • Master: 2 CPU, 1GB RAM, Private IP: 192.168.10.2, Port forward: 3336
  • Slave: 2 CPU, 1GB RAM, Private IP: 192.168.10.3, Port forward: 3337

Playbook Structure

In this step, we will define the Ansible playbook. Ansible uses YAML, an easy-to-read data format, to define instructions. We create the following “maria.yml” based on the Ansible file delivered by MariaDB.

$ vi maria.yml
- hosts: master:slave
  user: vagrant
  tasks:
  - name: Install MariaDB repository
    apt_repository: repo='deb http://ftp.igh.cnrs.fr/pub/mariadb/repo/10.3/ubuntu trusty main' state=present
  - name: Add repository key to the system
    apt_key: keyserver=keyserver.ubuntu.com id=0xcbcb082a1bb943db
  - name: Install MariaDB Server
    apt: name=mariadb-server state=latest update_cache=yes
  - name: Install python module
    apt: name=python-mysqldb state=installed
  - name: Create replication account
    mysql_user: name=repl host="%" password=s3cr3tPaSSwordR priv=*.*:"REPLICATION SLAVE" state=present
  - name: Create readwrite user
    mysql_user: name=rwuser host="%" password=s3cr3tPaSSwordR priv=*.*:SELECT,INSERT,UPDATE,DELETE,CREATE,DROP state=present
  - name: Modify configuration file to listen on all interfaces
    lineinfile: dest=/etc/mysql/my.cnf regexp="^bind-address" line="bind-address=0.0.0.0"
- hosts: master
  user: vagrant
  tasks:
  - name: Modify configuration file to setup server ID
    lineinfile: dest=/etc/mysql/my.cnf regexp="^#server-id" line="server-id=1"
  - name: Restart mysql service
    service: name=mysql state=restarted
  - name: Reset master binlog
    command: /usr/bin/mysql -u root -e "RESET MASTER"
- hosts: slave
  user: vagrant
  tasks:
  - name: Modify configuration file to setup server ID
    lineinfile: dest=/etc/mysql/my.cnf regexp="^#server-id" line="server-id=2"
  - name: Setup replication
    command: /usr/bin/mysql -uroot -e "CHANGE MASTER TO master_host='192.168.10.2', master_user='repl', master_password='s3cr3tPaSSwordR', master_use_gtid=current_pos"
  - name: Restart mysql service
    service: name=mysql state=restarted

Now it’s time to bring up the instances. Running vagrant up will trigger the playbook run.

$ vagrant up
DEPRECATION: The 'sudo' option for the Ansible provisioner is deprecated.
Please use the 'become' option instead.
The 'sudo' option will be removed in a future release of Vagrant.

==> vagrant: A new version of Vagrant is available: 2.2.4 (installed version: 2.2.3)!
==> vagrant: To upgrade visit: https://www.vagrantup.com/downloads.html

Bringing machine 'master' up with 'virtualbox' provider...
Bringing machine 'slave' up with 'virtualbox' provider...
==> master: Box 'ubuntu/trusty64' could not be found. Attempting to find and install...
    master: Box Provider: virtualbox
    master: Box Version: >= 0
==> master: Loading metadata for box 'ubuntu/trusty64'
    master: URL: https://vagrantcloud.com/ubuntu/trusty64
==> master: Adding box 'ubuntu/trusty64' (v20190429.0.1) for provider: virtualbox
    master: Downloading: https://vagrantcloud.com/ubuntu/boxes/trusty64/versions/20190429.0.1/providers/virtualbox.box
    master: Download redirected to host: cloud-images.ubuntu.com
    master: Progress: 7% (Rate: 551k/s, Estimated time remaining: 0:14:31)


If you don’t have the Ubuntu image for VirtualBox already downloaded, Vagrant will download it automatically, as in the above example.

PLAY [master:slave] ************************************************************

TASK [Gathering Facts] *********************************************************
ok: [slave]

TASK [Install MariaDB repository] **********************************************
changed: [slave]

TASK [Add repository key to the system] ****************************************
changed: [slave]

TASK [Install MariaDB Server] **************************************************

After a successful playbook run you will see the following output, and you should be able to log in to the database with the predefined credentials (see playbook).

PLAY RECAP ********************************************************************
master                     : ok=12   changed=10   unreachable=0    failed=0 
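With the playbook finished, you can verify that replication is actually running. A quick sanity check, using standard MariaDB status commands via the mysql client on each VM:

```sql
-- On the slave: both replication threads should report Yes
SHOW SLAVE STATUS\G
-- Look for: Slave_IO_Running: Yes, Slave_SQL_Running: Yes

-- On the master: confirm the binlog position and current GTID state
SHOW MASTER STATUS;
SELECT @@gtid_current_pos;
```

Any row inserted on the master should appear on the slave within a moment; if the slave threads report No, check Last_IO_Error and Last_SQL_Error in the same output.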

The next step is to import your master/slave configuration into ClusterControl. The easiest and most convenient way to install ClusterControl is to use the installation script provided by Severalnines. Simply download the script and execute it as the root user or a user with sudo permission.

$ wget http://www.severalnines.com/downloads/cmon/install-cc
$ chmod +x install-cc
$ ./install-cc # as root or sudo user

If you wish to add a ClusterControl installation to your playbook you can use the following instructions.
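As a rough sketch, the manual wget/chmod/execute sequence above could be expressed as Ansible tasks like the following. This is not an official playbook - the host group name and the `creates` guard path (/etc/cmon.cnf, the ClusterControl controller configuration file) are assumptions for illustration:

```yaml
- hosts: clustercontrol
  become: true
  tasks:
  - name: Download ClusterControl installer
    get_url:
      url: http://www.severalnines.com/downloads/cmon/install-cc
      dest: /root/install-cc
      mode: '0755'
  - name: Run ClusterControl installer (skipped if already installed)
    command: /root/install-cc
    args:
      creates: /etc/cmon.cnf
```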

The next step is to generate an SSH key which we will use to set up passwordless SSH later on. If you have a key pair which you would like to use, you can skip creating a new one.
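For example, assuming the Vagrant IPs used above and the default vagrant user (adjust the addresses and user to your environment):

```shell
# Generate a new RSA key pair with no passphrase (skip if you already have one)
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa

# Copy the public key to each database node for passwordless SSH
ssh-copy-id -i ~/.ssh/id_rsa.pub vagrant@192.168.10.2
ssh-copy-id -i ~/.ssh/id_rsa.pub vagrant@192.168.10.3
```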

ClusterControl: Import existing cluster

After successful installation, you can finally import your newly created test cluster into ClusterControl.

We hope this blog post gave you insight into the basics of MariaDB master/slave replication installation and setup with Ansible. Please check our other blogs where we present Chef, Puppet, and Docker for MariaDB and other database cluster deployments.


What's New in MariaDB Cluster 10.4


In one of the previous blogs, we covered new features which are coming out in MariaDB 10.4. We mentioned there that included in this version will be a new Galera Cluster release. In this blog post we will go over the features of Galera Cluster 26.4.0 (or Galera 4), take a quick look at them, and explore how they will affect your setup when working with MariaDB Galera Cluster.

Streaming Replication

Galera Cluster is by no means a drop-in replacement for standalone MySQL. The way in which writeset certification works introduces several limitations and edge cases which may seriously limit the ability to migrate into Galera Cluster. The three most common limitations are...

  1. Problems with long transactions
  2. Problems with large transactions
  3. Problems with hot-spots in tables

What’s great to see is that Galera 4 introduces Streaming Replication, which may help in reducing these limitations. Let’s review the current state in a little more detail.

Long Running Transactions

In this case we are talking about transactions that run long timewise, which are definitely problematic in Galera. The main thing to understand is that Galera replicates transactions as writesets. Those writesets are certified on the members of the cluster, ensuring that all nodes can apply a given writeset. The problem is that locks are created only on the local node; they are not replicated across the cluster. Therefore, if your transaction takes several minutes to complete and you are writing to more than one Galera node, it becomes more and more likely over time that on one of the remaining nodes some transaction will modify some of the rows updated in your long-running transaction. This will cause certification to fail and the long-running transaction will have to be rolled back. In short, given that you send writes to more than one node in the cluster, the longer the transaction, the more likely it is to fail certification due to a conflict.

Hotspots

By that we mean rows which are frequently updated. Typically it’s some sort of counter that’s being updated over and over again. The culprit is the same as with long transactions - rows are locked only locally. Again, if you send writes to more than one node, it is likely that the same counter will be modified at the same time on more than one node, causing conflicts and making certification fail.

For both of those problems there is one solution - you can send your writes to just one node instead of distributing them across the whole cluster. You can use proxies for that - ClusterControl deploys HAProxy and ProxySQL, and both can be configured so that writes are sent to only one node. If you cannot send writes to one node only, you have to accept that you will see certification conflicts and rollbacks from time to time. In general, the application has to be able to handle rollbacks from the database - there is no way around that, but it is even more important when the application works with Galera Cluster.

Still, sending the traffic to one node is not enough to handle the third problem.

Large Transactions

What is important to keep in mind is that the writeset is sent for certification only when the transaction completes. Then the writeset is sent to all nodes and the certification process takes place. This puts a limit on how big a single transaction can be, as Galera, when preparing the writeset, stores it in an in-memory buffer. Transactions that are too large will reduce cluster performance. Therefore two variables have been introduced: wsrep_max_ws_rows, which limits the number of rows per transaction (although it can be set to 0 - unlimited), and, more importantly, wsrep_max_ws_size, which can be set up to 2GB. So, the largest transaction you can run with Galera Cluster is up to 2GB in size. You also have to keep in mind that certification and applying of a large transaction take time, creating “lag” - a read after write that hits a node other than the one where you initially committed the transaction will most likely return incorrect data, as the transaction is still being applied.
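As an illustration (the table and batch size here are hypothetical - adjust to your schema), you can inspect the limits and keep each writeset small by batching a large cleanup:

```sql
-- Inspect the current writeset limits
SHOW GLOBAL VARIABLES LIKE 'wsrep_max_ws_%';

-- Instead of one huge DELETE, purge in batches so each transaction
-- produces a small writeset; repeat until the statement affects 0 rows
DELETE FROM audit_log WHERE created_at < '2018-01-01' LIMIT 10000;
```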


Galera 4 comes with Streaming Replication, which can be used to mitigate all those problems. The main difference is that the writeset can now be split into parts - it is no longer necessary to wait for the whole transaction to finish before data is replicated. This may make you wonder - what does certification look like in such a case? In short, certification happens on the fly - each fragment is certified and all involved rows are locked on all of the nodes in the cluster. This is a serious change in how Galera works - until now locks were created locally; with streaming replication locks are created on all of the nodes. This helps in the cases we discussed above - locking rows as transaction fragments come in reduces the probability that the transaction will have to be rolled back. Conflicting transactions executed locally will not be able to get the locks they need and will have to wait for the replicating transaction to complete and release the row locks.

In the case of hotspots, with streaming replication it is possible to get the locks on all of the nodes when updating the row. Other queries which want to update the same row will have to wait for the lock to be released before they will execute their changes.

Large transactions will benefit from streaming replication because it is no longer necessary to wait for the whole transaction to finish, nor are they limited by the transaction size - a large transaction is split into fragments. It also helps to utilize the network better - instead of sending 2GB of data at once, the same 2GB can be split into fragments and sent over a longer period of time.

There are two configuration options for streaming replication: wsrep_trx_fragment_size, which tells how big a fragment should be (by default it is set to 0, which means that streaming replication is disabled), and wsrep_trx_fragment_unit, which tells what the fragment really is. By default it is bytes, but it can also be ‘statements’ or ‘rows’. Those variables can (and should) be set on a session level, making it possible for the user to decide which particular query should be replicated using streaming replication. Setting the unit to ‘statements’ and the size to 1 allows you, for example, to use streaming replication just for a single query which updates a hotspot.
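For example, a session could enable streaming replication just around a single hotspot update and then switch it back off (the table and column names here are hypothetical):

```sql
-- Replicate each statement as its own fragment,
-- certified and locked cluster-wide as it executes
SET SESSION wsrep_trx_fragment_unit = 'statements';
SET SESSION wsrep_trx_fragment_size = 1;

UPDATE page_counters SET hits = hits + 1 WHERE page_id = 42;

-- Back to normal (non-streaming) replication for the rest of the session
SET SESSION wsrep_trx_fragment_size = 0;
```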

Of course, there are drawbacks to running streaming replication, mainly due to the fact that locks are now taken on all nodes in the cluster. If you have seen a large transaction rolling back for ages, now such a transaction will have to roll back on all of the nodes. Obviously, the best practice is to reduce the size of a transaction as much as possible to avoid rollbacks taking hours to complete. Another drawback is that, for crash recovery reasons, writesets created from each fragment are stored in the wsrep_schema.SR table on all nodes, which in effect implements a double-write buffer, increasing the load on the cluster. Therefore you should carefully decide which transactions should be replicated using streaming replication and, as long as it is feasible, you should still stick to the best practices of having small, short transactions or splitting a large transaction into smaller batches.

Backup Locks

Finally, MariaDB users will be able to benefit from backup locks for SST. The idea behind an SST executed using (for MariaDB) mariabackup is that the whole dataset has to be transferred on the fly, with redo logs being collected in the background. Then a global lock has to be acquired, ensuring that no write will happen, and the final position of the redo log has to be collected and stored. Historically, for MariaDB, the locking part was performed using FLUSH TABLES WITH READ LOCK, which did its job but was quite hard to acquire under heavy load. It is also pretty heavy - not only do transactions have to wait for the lock to be released, but the data also has to be flushed to disk. Now, with MariaDB 10.4, it will be possible to use the less intrusive BACKUP LOCK, which will not require data to be flushed; only commits will be blocked for the duration of the lock. This should mean less intrusive SST operations, which is definitely great to hear. Everyone who has had to run their Galera Cluster in emergency mode, on one node, keeping fingers crossed that SST will not impact cluster operations, should be more than happy to hear about this improvement.

Causal Reads From the Application

Galera 4 introduces three new functions which are intended to help add support for causal reads in applications: WSREP_LAST_WRITTEN_GTID(), which returns the GTID of the last write made by the client; WSREP_LAST_SEEN_GTID(), which returns the GTID of the last write transaction observed by the client; and WSREP_SYNC_WAIT_UPTO_GTID(), which blocks the client until the GTID passed to the function has been committed on the node. Sure, you can enforce causal reads in Galera even now, but by utilizing those functions it is possible to implement safe read-after-write in those parts of the application where it is needed, without having to make changes in the Galera configuration.
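A sketch of how an application could use these functions for a safe read-after-write (the table is hypothetical, and the GTID value has to be carried by the application from the writing connection to the reading one):

```sql
-- On the connection that performed the write:
INSERT INTO orders (customer_id, total) VALUES (7, 99.90);
SELECT WSREP_LAST_WRITTEN_GTID();   -- returns the GTID of this write

-- On the (possibly different) node used for reads, pass that GTID:
SELECT WSREP_SYNC_WAIT_UPTO_GTID(@gtid_from_writer);  -- blocks until applied
SELECT * FROM orders WHERE customer_id = 7;           -- now guaranteed to see the write
```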

Upgrading to MariaDB Galera 10.4

If you would like to try Galera 4, it is available in the latest release candidate for MariaDB 10.4. As per the MariaDB documentation, at this moment there is no way to do a live upgrade from 10.3 Galera to 10.4. You have to stop the whole 10.3 cluster, upgrade it to 10.4 and then start it back up. This is a serious blocker and we hope this limitation will be removed in one of the next versions. It is of utmost importance to have the option of a live upgrade, and for that both MariaDB 10.3 and MariaDB 10.4 would have to be able to coexist in the same Galera Cluster. Another option, which may also be suitable, is to set up asynchronous replication between the old and the new Galera Cluster.

We really hope you enjoyed this short review of the features of MariaDB 10.4 Galera Cluster; we are looking forward to seeing streaming replication in real-life production environments. We also hope these changes will help to increase Galera adoption even further. After all, streaming replication solves many issues which can prevent people from migrating to Galera.

Database Automation with Puppet: Deploying MySQL & MariaDB Galera Cluster


In the previous blog post, we showed you some basic steps to deploy and manage a standalone MySQL server as well as a MySQL Replication setup using the MySQL Puppet module. In this second installment, we are going to cover similar steps, but now with a Galera Cluster setup.

Galera Cluster with Puppet

As you might know, Galera Cluster has three main providers:

  • MySQL Galera Cluster (Codership)
  • Percona XtraDB Cluster (Percona)
  • MariaDB Cluster (embedded into MariaDB Server by MariaDB)

A common practice with Galera Cluster deployments is to have an additional layer sitting on top of the database cluster for load balancing purposes. However, that is a complex process which deserves its own post.

There are a number of Puppet modules available in the Puppet Forge that can be used to deploy a Galera Cluster. Here are some of them..

Since our objective is to provide a basic understanding of how to write a manifest and automate the deployment of a Galera Cluster, we will cover the deployment of MariaDB Galera Cluster using the puppetlabs/mysql module. For other modules, you can always take a look at their respective documentation for installation instructions and tips.

In Galera Cluster, the order in which nodes are started is critical. To properly start a fresh new cluster, one node has to be set up as the reference node. This node will be started with an empty-host connection string (gcomm://) to initialize the cluster. This process is called bootstrapping.

Once started, the node becomes the primary component and the remaining nodes can be started using the standard MySQL start command (systemctl start mysql or service mysql start) with a full-host connection string (gcomm://db1,db2,db3). Bootstrapping is only required if no primary component is held by any other node in the cluster (check with the wsrep_cluster_status status variable).
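Before bootstrapping, you can check on any reachable node whether a primary component already exists:

```sql
-- 'Primary' means the cluster already has a primary component - do not bootstrap;
-- 'non-Primary' or 'Disconnected' on all nodes means bootstrapping is required
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
```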

The cluster startup process must be performed explicitly by the user. The manifest itself must NOT start the cluster (bootstrap any node) at the first run to avoid any risk of data loss. Remember, the Puppet manifest must be written to be as idempotent as possible. The manifest must be safe in order to be executed multiple times without affecting the already running MySQL instances. This means we have to focus primarily on repository configuration, package installation, pre-running configuration, and SST user configuration.

The following configuration options are mandatory for Galera:

  • wsrep_on: A flag to turn on the writeset replication API for Galera Cluster (MariaDB only).
  • wsrep_cluster_name: The cluster name. Must be identical on all nodes that are part of the same cluster.
  • wsrep_cluster_address: The Galera communication connection string, prefixed with gcomm:// and followed by a node list, separated by commas. An empty node list means cluster initialization.
  • wsrep_provider: The path where the Galera library resides. The path might be different depending on the operating system.
  • bind_address: MySQL must be reachable externally, so the value '0.0.0.0' is compulsory.
  • wsrep_sst_method: For MariaDB, the preferred SST method is mariabackup.
  • wsrep_sst_auth: MySQL user and password (separated by a colon) to perform the snapshot transfer. Commonly, we specify a user that has the ability to create a full backup.
  • wsrep_node_address: IP address for Galera communication and replication. Use Puppet facter to pick the correct IP address.
  • wsrep_node_name: hostname or FQDN. Use Puppet facter to pick the correct hostname.

For Debian-based deployments, the post-installation script will attempt to start the MariaDB server automatically. If we configured wsrep_on=ON (the flag to enable Galera) with the full address in the wsrep_cluster_address variable, the server would fail during installation, because it has no primary component to connect to.

To properly start a cluster in Galera, the first node (called the bootstrap node) has to be configured with an empty connection string (wsrep_cluster_address = gcomm://) to initiate the node as the primary component. You can also run the provided bootstrap script, called galera_new_cluster, which basically does a similar thing in the background.
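In configuration terms, the difference between the bootstrap node and the rest boils down to a single line. A minimal sketch of the relevant my.cnf section (node names are illustrative):

```ini
# Bootstrap (reference) node only - an empty node list initializes a new cluster
[mysqld]
wsrep_cluster_address = gcomm://

# All other nodes (and the bootstrap node itself, once the cluster is up):
# wsrep_cluster_address = gcomm://db1,db2,db3
```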

Deployment of Galera Cluster (MariaDB)

Deployment of Galera Cluster requires additional configuration on the APT source to install the preferred MariaDB version repository.

Note that Galera replication is embedded inside MariaDB Server and requires no additional packages to be installed. That being said, an extra flag is required to enable Galera by using wsrep_on=ON. Without this flag, MariaDB will act as a standalone server.

In our Debian-based environment, the wsrep_on option can only be present in the manifest after the first deployment completes (as shown further down in the deployment steps). This ensures that the first, initial start acts as a standalone server, letting Puppet provision the node before it's completely ready to be a Galera node.

Let's start by preparing the manifest content as below (modify the global variables section if necessary):

# Puppet manifest for Galera Cluster MariaDB 10.3 on Ubuntu 18.04 (Puppet v6.4.2) 
# /etc/puppetlabs/code/environments/production/manifests/galera.pp

# global vars
$sst_user         = 'sstuser'
$sst_password     = 'S3cr333t$'
$backup_dir       = '/home/backup/mysql'
$mysql_cluster_address = 'gcomm://192.168.0.161,192.168.0.162,192.168.0.163'


# node definition
node "db1.local", "db2.local", "db3.local" {
  Apt::Source['mariadb'] ~>
  Class['apt::update'] ->
  Class['mysql::server'] ->
  Class['mysql::backup::xtrabackup']
}

# apt module must be installed first: 'puppet module install puppetlabs-apt'
include apt

# custom repository definition
apt::source { 'mariadb':
  location => 'http://sfo1.mirrors.digitalocean.com/mariadb/repo/10.3/ubuntu',
  release  => $::lsbdistcodename,
  repos    => 'main',
  key      => {
    id     => 'A6E773A1812E4B8FD94024AAC0F47944DE8F6914',
    server => 'hkp://keyserver.ubuntu.com:80',
  },
  include  => {
    src    => false,
    deb    => true,
  },
}

# Galera configuration
class {'mysql::server':
  package_name            => 'mariadb-server',
  root_password           => 'q1w2e3!@#',
  service_name            => 'mysql',
  create_root_my_cnf      => true,
  remove_default_accounts => true,
  manage_config_file      => true,
  override_options        => {
    'mysqld' => {
      'datadir'                 => '/var/lib/mysql',
      'bind_address'            => '0.0.0.0',
      'binlog-format'           => 'ROW',
      'default-storage-engine'  => 'InnoDB',
      'wsrep_provider'          => '/usr/lib/galera/libgalera_smm.so',
      'wsrep_provider_options'  => 'gcache.size=1G',
      'wsrep_cluster_name'      => 'galera_cluster',
      'wsrep_cluster_address'   => $mysql_cluster_address,
      'log-error'               => '/var/log/mysql/error.log',
      'wsrep_node_address'      => $facts['networking']['interfaces']['enp0s8']['ip'],
      'wsrep_node_name'         => $hostname,
      'innodb_buffer_pool_size' => '512M',
      'wsrep_sst_method'        => 'mariabackup',
      'wsrep_sst_auth'          => "${sst_user}:${sst_password}"
    },
    'mysqld_safe' => {
      'log-error'               => '/var/log/mysql/error.log'
    }
  }
}

# force creation of backup dir if not exist
exec { "mkdir -p ${backup_dir}" :
  path   => ['/bin','/usr/bin'],
  unless => "test -d ${backup_dir}"
}

# create SST and backup user
class { 'mysql::backup::xtrabackup' :
  xtrabackup_package_name => 'mariadb-backup',
  backupuser              => "${sst_user}",
  backuppassword          => "${sst_password}",
  backupmethod            => 'mariabackup',
  backupdir               => "${backup_dir}"
}

# /etc/hosts definition
host {
  'db1.local': ip => '192.168.0.161';
  'db2.local': ip => '192.168.0.162';
  'db3.local': ip => '192.168.0.163';
}

A bit of explanation is needed at this point. 'wsrep_node_address' must be pointed to the same IP address as what was declared in the wsrep_cluster_address. In this environment our hosts have two network interfaces and we want to use the second interface (called enp0s8) for Galera communication (where 192.168.0.0/24 network is connected to). That's why we use Puppet facter to get the information from the node and apply it to the configuration option. The rest is pretty self-explanatory.

On every MariaDB node, run the following command to apply the catalogue as root user:

$ puppet agent -t

The catalogue will be applied to each node for installation and preparation. Once done, we have to add the following line into our manifest under "override_options => mysqld" section:

'wsrep_on'                 => 'ON',

The above will satisfy the Galera requirement for MariaDB. Then, apply the catalogue on every MariaDB node once more:

$ puppet agent -t

Once done, we are ready to bootstrap our cluster. Since this is a new cluster, we can pick any of the nodes to be the reference node, a.k.a. the bootstrap node. Let's pick db1.local (192.168.0.161) and run the following command:

$ galera_new_cluster #db1

Once the first node is started, we can start the remaining nodes with the standard start command (one node at a time):

$ systemctl restart mariadb #db2 and db3

Once started, take a peek at the MySQL error log at /var/log/mysql/error.log and make sure the log ends up with the following line:

2019-06-10  4:11:10 2 [Note] WSREP: Synchronized with group, ready for connections

The above tells us that the nodes are synchronized with the group. We can then verify the status by using the following command:

$ mysql -uroot -e 'show status like "wsrep%"'

Make sure that on all nodes the wsrep_cluster_size, wsrep_cluster_status and wsrep_local_state_comment values are 3, "Primary" and "Synced" respectively.

MySQL Management

This module can be used to perform a number of MySQL management tasks...

  • configuration options (modify, apply, custom configuration)
  • database resources (database, user, grants)
  • backup (create, schedule, backup user, storage)
  • simple restore (mysqldump only)
  • plugins installation/activation

Service Control

The safest way to provision a Galera Cluster with Puppet is to handle all service control operations manually (don't let Puppet handle them). For a simple cluster rolling restart, the standard service command will do. Run the following command one node at a time.

$ systemctl restart mariadb # Systemd
$ service mariadb restart # SysVinit

However, if a network partition happens and no primary component is available (check with wsrep_cluster_status), the most up-to-date node has to be bootstrapped to bring the cluster back into operation without data loss. You can follow the steps shown in the deployment section above. To learn more about the bootstrapping process with example scenarios, we have covered this in detail in the blog post How to Bootstrap MySQL or MariaDB Galera Cluster.

Database Resource

Use the mysql::db class to ensure a database with its associated user and privileges is present, for example:

  # make sure the database and user exist with proper grant
  mysql::db { 'mynewdb':
    user          => 'mynewuser',
    password      => 'passw0rd',
    host          => '192.168.0.%',
    grant         => ['SELECT', 'UPDATE']
  } 

The above definition can be assigned to any node since every node in a Galera Cluster is a master.

Backup and Restore

Since we created an SST user using the xtrabackup class, Puppet will configure all the prerequisites for the backup job - creating the backup user, preparing the destination path, assigning ownership and permissions, setting the cron job and setting up the backup command options to use in the provided backup script. Every node will be configured with two backup jobs (one weekly full and one daily incremental) defaulting to 11:05 PM, as you can tell from the crontab output:

$ crontab -l
# Puppet Name: xtrabackup-weekly
5 23 * * 0 /usr/local/sbin/xtrabackup.sh --target-dir=/home/backup/mysql --backup
# Puppet Name: xtrabackup-daily
5 23 * * 1-6 /usr/local/sbin/xtrabackup.sh --incremental-basedir=/home/backup/mysql --target-dir=/home/backup/mysql/`date +%F_%H-%M-%S` --backup

If you would like to schedule mysqldump instead, use the mysql::server::backup class to prepare the backup resources. Suppose we have the following declaration in our manifest:

  # Prepare the backup script, /usr/local/sbin/mysqlbackup.sh
  class { 'mysql::server::backup':
    backupuser     => 'backup',
    backuppassword => 'passw0rd',
    backupdir      => '/home/backup',
    backupdirowner => 'mysql',
    backupdirgroup => 'mysql',
    backupdirmode  => '755',
    backuprotate   => 15,
    time           => ['23','30'],   # backup starts at 11:30 PM every day
    include_routines  => true,
    include_triggers  => true,
    ignore_events     => false,
    maxallowedpacket  => '64M'
  }

The above tells Puppet to configure the backup script at /usr/local/sbin/mysqlbackup.sh and schedule it at 11:30 PM every day. If you want to make an immediate backup, simply invoke:

$ mysqlbackup.sh

For restoration, the module only supports the mysqldump backup method, importing the SQL file directly into the database using the mysql::db class, for example:

mysql::db { 'mydb':
  user     => 'myuser',
  password => 'mypass',
  host     => 'localhost',
  grant    => ['ALL PRIVILEGES'],
  sql      => '/home/backup/mysql/mydb/backup.gz',
  import_cat_cmd => 'zcat',
  import_timeout => 900
}

The SQL file will be loaded only once and not on every run, unless enforce_sql => true is used.

Configuration Management

In this example, we used manage_config_file => true with override_options to structure our configuration lines, which are later pushed out by Puppet. Modifications to the manifest file will only be reflected in the content of the target MySQL configuration file. This module will neither load the configuration into runtime nor restart the MySQL service after pushing the changes into the configuration file. It's the sysadmin's responsibility to restart the service in order to activate the changes.

To add custom MySQL configuration, we can place additional files into the "includedir", which defaults to /etc/mysql/conf.d. This allows us to override settings or add additional ones, which is helpful if you don't use override_options in the mysql::server class. Making use of a Puppet template is highly recommended here. Place the custom configuration file under the module template directory (which defaults to /etc/puppetlabs/code/environments/production/modules/mysql/templates) and then add the following lines to the manifest:

# Loads /etc/puppetlabs/code/environments/production/modules/mysql/templates/my-custom-config.cnf.erb into /etc/mysql/conf.d/my-custom-config.cnf

file { '/etc/mysql/conf.d/my-custom-config.cnf':
  ensure  => file,
  content => template('mysql/my-custom-config.cnf.erb')
}

Puppet vs ClusterControl

Did you know that you can also automate the MySQL or MariaDB Galera deployment by using ClusterControl? You can use the ClusterControl Puppet module to install it, or simply download it from our website.

When compared to ClusterControl, you can expect the following differences:

  • A bit of a learning curve to understand Puppet syntax, formatting, and structure before you can write manifests.
  • Manifests must be tested regularly. It's very common to get a compilation error, especially when the catalog is applied for the first time.
  • Puppet presumes the code to be idempotent. The test/check/verify conditions are the author’s responsibility, to avoid disturbing a running system.
  • Puppet requires an agent on the managed node.
  • Backward incompatibility. Some old modules would not run correctly on the new version.
  • Database/host monitoring has to be set up separately.

ClusterControl’s deployment wizard guides the deployment process:

Alternatively, you may use the ClusterControl command line interface called "s9s" to achieve similar results. The following command creates a three-node Percona XtraDB Cluster (provided passwordless SSH to all nodes has been configured beforehand):

$ s9s cluster --create \
  --cluster-type=galera \
  --nodes='192.168.0.21;192.168.0.22;192.168.0.23' \
  --vendor=percona \
  --cluster-name='Percona XtraDB Cluster 5.7' \
  --provider-version=5.7 \
  --db-admin='root' \
  --db-admin-passwd='$ecR3t^word' \
  --log

Additionally, ClusterControl supports deployment of load balancers for Galera Cluster - HAproxy, ProxySQL and MariaDB MaxScale - together with a virtual IP address (provided by Keepalived) to eliminate any single point of failure for your database service.

Post deployment, nodes/clusters can be monitored and fully managed by ClusterControl, including automatic failure detection, automatic recovery, backup management, load balancer management, attaching asynchronous slave, configuration management and so on. All of these are bundled together in one product. On average, your database cluster will be up and running within 30 minutes. What it needs is only passwordless SSH to the target nodes.

You can also import an already running Galera Cluster, deployed by Puppet (or any other means), into ClusterControl to supercharge your cluster with all the cool features that come with it. The community edition (free forever!) offers deployment and monitoring.

In the next episode, we are going to walk you through MySQL load balancer deployment using Puppet. Stay tuned!

How to Deploy MariaDB Server to a Docker Container


Nowadays, terms like Docker, Images, or Containers are pretty common in all database environments, so it’s normal to see a MariaDB server running on Docker in both production and non-production setups. While you may have heard these terms, you might not know the differences between them. In this blog, we provide an overview of these terms and how we can apply them in practice to deploy a MariaDB server.

What is Docker?

Docker is the most common tool to create, deploy, and run applications by using containers. It allows you to package up an application with all of the parts it needs (such as libraries and other dependencies) and ship it all out as one package, allowing for the portable sharing of containers across different machines.

Container vs Virtual Machine

What is an Image?

An Image is like a virtual machine template. It has all the required information to run the container. This includes the operating system, software packages, drivers, configuration files, and helper scripts… all packed into one bundle.

A Docker image can be built by anyone who has the ability to write a script. That is why there are many similar images being built by the community, each with minor differences...but serving a common purpose.

What is a Docker Container?

A Docker Container is an instance of a Docker Image. It runs completely isolated from the host environment by default, only accessing host files and ports if configured to do so.

A container could be considered as a virtual machine, but instead of creating a whole virtual operating system, it allows applications to use the same Linux kernel as the system that they're running on. It only requires applications to be shipped with parts not already running on the host computer. This gives you a significant performance boost and reduces the size of the application.

Keep in mind that any changes made to the container are recorded on a separate layer, not in the Docker Image itself. This means if you delete the container, or if you create a new one based on the same Docker Image, the changes won’t be there. To preserve the changes, you must commit them to a new Docker Image or create a Dockerfile.

What is a DockerFile?

A Dockerfile is a script containing the steps used to generate a Docker Image, including any modifications that you want to apply.

Docker Components

Let’s see a Docker File example.

$ vi Dockerfile
# MariaDB 10.3 with SSH
# Pull the mariadb latest image
FROM mariadb:latest
# List all the packages that we want to install
ENV PACKAGES openssh-server openssh-client
# Install Packages
RUN apt-get update && apt-get install -y $PACKAGES
# Allow SSH Root Login
RUN sed -i 's|^#PermitRootLogin.*|PermitRootLogin yes|g' /etc/ssh/sshd_config
# Configure root password
RUN echo "root:root123" | chpasswd

Now, we can build a new Docker Image from this Docker File:

$ docker build --rm=true -t severalnines/mariadb-ssh .

Check the new image created:

$ docker images
REPOSITORY                                 TAG                 IMAGE ID            CREATED             SIZE
severalnines/mariadb-ssh                   latest              a8022951f195        17 seconds ago      485MB

And now, we can use the new image as a common Docker Image as we’ll see in the next section.


How to Deploy MariaDB on Docker Without Dockerfile

Now that we know more about the Docker world, let’s see how to use it to create a MariaDB server. For this, we’ll assume you already have Docker installed.

We can use the image created by using the Dockerfile, but we’ll pull the official MariaDB Docker Image.

$ docker search mariadb
NAME                                   DESCRIPTION                                     STARS               OFFICIAL            AUTOMATED
mariadb                                MariaDB is a community-developed fork of MyS…   2804                [OK]

Without specifying a TAG, by default, it’ll pull the latest image version, in this case, MariaDB Server 10.3 on Ubuntu 18.04.

$ docker pull mariadb

We can check the image downloaded.

$ docker images
REPOSITORY                                 TAG                 IMAGE ID            CREATED             SIZE
mariadb                                    latest              e07bb20373d8        2 weeks ago         349MB

Then, we’ll create two directories under our MariaDB Docker directory, one for the datadir and another one for the MariaDB configuration files. We’ll mount both into our MariaDB Docker Container.

$ cd ~/Docker
$ mkdir -p mariadb1/datadir
$ mkdir -p mariadb1/config

The startup configuration is specified in the file /etc/mysql/my.cnf, and it includes any files found in the /etc/mysql/conf.d directory that end with .cnf.

$ tail -1 /etc/mysql/my.cnf
!includedir /etc/mysql/conf.d/

The content of these files will override any repeated parameter configured in /etc/mysql/my.cnf, so you can create an alternative configuration here.
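For example, a custom setting can be dropped into the host config directory that will be mounted at /etc/mysql/conf.d inside the container, and it will win over any repeated setting in my.cnf. This is just a sketch; the max_connections value is illustrative:

```shell
# Create an override file in the host config directory (mounted at
# /etc/mysql/conf.d inside the container).
CONF_DIR=${CONF_DIR:-$HOME/Docker/mariadb1/config}
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/custom.cnf" <<'EOF'
[mysqld]
max_connections = 500
EOF
cat "$CONF_DIR/custom.cnf"
```

The container picks up the file on its next restart.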

Let’s run our first MariaDB Docker Container:

$ docker run -d --name mariadb1 \
-p 33061:3306 \
-v ~/Docker/mariadb1/config:/etc/mysql/conf.d \
-v ~/Docker/mariadb1/datadir:/var/lib/mysql \
-e MYSQL_ROOT_PASSWORD=root123 \
-e MYSQL_DATABASE=dbtest \
mariadb

After this, we can check our containers running:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS                     NAMES
12805cc2d7b5        mariadb             "docker-entrypoint.s…"   About a minute ago   Up About a minute   0.0.0.0:33061->3306/tcp   mariadb1

The container log:

$ docker logs mariadb1
MySQL init process done. Ready for start up.
2019-06-03 23:18:01 0 [Note] mysqld (mysqld 10.3.15-MariaDB-1:10.3.15+maria~bionic) starting as process 1 ...
2019-06-03 23:18:01 0 [Note] InnoDB: Using Linux native AIO
2019-06-03 23:18:01 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-06-03 23:18:01 0 [Note] InnoDB: Uses event mutexes
2019-06-03 23:18:01 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2019-06-03 23:18:01 0 [Note] InnoDB: Number of pools: 1
2019-06-03 23:18:01 0 [Note] InnoDB: Using SSE2 crc32 instructions
2019-06-03 23:18:01 0 [Note] InnoDB: Initializing buffer pool, total size = 256M, instances = 1, chunk size = 128M
2019-06-03 23:18:01 0 [Note] InnoDB: Completed initialization of buffer pool
2019-06-03 23:18:01 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2019-06-03 23:18:01 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2019-06-03 23:18:01 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2019-06-03 23:18:01 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2019-06-03 23:18:02 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2019-06-03 23:18:02 0 [Note] InnoDB: Waiting for purge to start
2019-06-03 23:18:02 0 [Note] InnoDB: 10.3.15 started; log sequence number 1630824; transaction id 21
2019-06-03 23:18:02 0 [Note] Plugin 'FEEDBACK' is disabled.
2019-06-03 23:18:02 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2019-06-03 23:18:02 0 [Note] Server socket created on IP: '::'.
2019-06-03 23:18:02 0 [Note] InnoDB: Buffer pool(s) load completed at 190603 23:18:02
2019-06-03 23:18:02 0 [Note] Reading of all Master_info entries succeded
2019-06-03 23:18:02 0 [Note] Added new Master_info '' to hash table
2019-06-03 23:18:02 0 [Note] mysqld: ready for connections.
Version: '10.3.15-MariaDB-1:10.3.15+maria~bionic'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution

And the content of our Docker datadir path (host):

$ ls -l ~/Docker/mariadb1/datadir/
total 249664
-rw-rw----   1 sinsausti  staff     16384 Jun  3 20:18 aria_log.00000001
-rw-rw----   1 sinsausti  staff        52 Jun  3 20:18 aria_log_control
drwx------   3 sinsausti  staff        96 Jun  3 20:18 dbtest
-rw-rw----   1 sinsausti  staff       976 Jun  3 20:18 ib_buffer_pool
-rw-rw----   1 sinsausti  staff  50331648 Jun  3 20:18 ib_logfile0
-rw-rw----   1 sinsausti  staff  50331648 Jun  3 20:17 ib_logfile1
-rw-rw----   1 sinsausti  staff  12582912 Jun  3 20:18 ibdata1
-rw-rw----   1 sinsausti  staff  12582912 Jun  3 20:18 ibtmp1
-rw-rw----   1 sinsausti  staff         0 Jun  3 20:17 multi-master.info
drwx------  92 sinsausti  staff      2944 Jun  3 20:18 mysql
drwx------   3 sinsausti  staff        96 Jun  3 20:17 performance_schema
-rw-rw----   1 sinsausti  staff     24576 Jun  3 20:18 tc.log

We can access the MariaDB container by running the following command and using the password specified in the MYSQL_ROOT_PASSWORD variable:

$ docker exec -it mariadb1 bash
root@12805cc2d7b5:/# mysql -p -e "SHOW DATABASES;"
Enter password:
+--------------------+
| Database           |
+--------------------+
| dbtest             |
| information_schema |
| mysql              |
| performance_schema |
+--------------------+

Here we can see our dbtest created.

Docker Commands

Finally, let’s see some useful commands for managing Docker.

  • Image search
    $ docker search Image_Name
  • Image download
    $ docker pull Image_Name
  • List the images installed
    $ docker images
  • List containers (adding the -a flag also shows stopped containers)
    $ docker ps -a
  • Delete a Docker Image
    $ docker rmi Image_Name
  • Delete a Docker Container (the container must be stopped)
    $ docker rm Container_Name
  • Run a container from a Docker Image (the -p flag maps a container port to a host port)
    $ docker run -d --name Container_Name -p Host_Port:Guest_Port Image_Name
  • Stop container
    $ docker stop Container_Name
  • Start container
    $ docker start Container_Name
  • Check container logs
    $ docker logs Container_Name
  • Check container information
    $ docker inspect Container_Name
  • Create a linked container
    $ docker run -d --name Container_Name --link Container_Name:Image_Name Image_Name
  • Connect to a container from localhost
    $ docker exec -it Container_Name bash
  • Create a container with volume added
    $ docker run -d --name Container_Name --volume=/home/docker/Container_Name/conf.d:/etc/mysql/conf.d Image_Name
  • Commit changes to a new image
    $ docker commit Container_ID Image_Name:TAG

Conclusion

Docker is a really useful tool for sharing a development environment easily using a Dockerfile or publishing a Docker Image. By using it you can make sure that everyone is using the same environment. At the same time it’s also useful to recreate or clone an existing environment. Docker can share volumes, use private networks, map ports, and even more.

In this blog, we saw how to deploy MariaDB Server on Docker as a standalone server. If you want to use a more complex environment like Replication or Galera Cluster, you can use bitnami/mariadb to achieve this configuration.

Exploring the Different Ways to Encrypt Your MariaDB Data


Encrypting your MariaDB database, whether in-transit or at-rest, is one of the most important things an organization should consider if you value your data.

Organizations that deal with financial transactions, medical records, confidential information, or even personal data require this type of data protection. Fundamentally, database encryption transforms your readable data into a format that is unreadable (or at least hard to decrypt) by any unauthorized user.

Encrypting your data prevents the misuse or malicious intent by hackers or unauthorized personnel that could damage your business. Unencrypted data is prone to attack by hackers who inject malicious data that could damage your infrastructure or steal information. Quartz recently released an article about the biggest breach that happened along these lines and it’s alarming that data has been stolen from billions of accounts over the past two decades.

In this blog, we will discuss various ways to encrypt your MariaDB data, whether at-rest or in-transit. We will provide you with a basic understanding of encryption and how to use it so you can utilize these approaches to keep your data secure.

Encrypting MariaDB Data: In-Transit

MariaDB does not, by default, use encryption during data transmission over the network from server to client. Using the default setup could invite a potential hacker to eavesdrop on an unsecured/unencrypted channel. If you are operating in an isolated or highly secure environment, this default state may be acceptable. It is not ideal, however, when your client and server are on different networks, as it sets your database up for a potential “man-in-the-middle” attack.

To avoid these attacks, MariaDB allows you to encrypt data in-transit between the server and clients using the Transport Layer Security (TLS) protocol (formerly known as Secure Socket Layer or SSL). To start, you need to ensure that your MariaDB server was compiled with TLS support. You can verify this by running SHOW GLOBAL VARIABLES statement as shown below:

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'version_ssl_library';
+---------------------+---------------------------------+
| Variable_name       | Value                           |
+---------------------+---------------------------------+
| version_ssl_library | OpenSSL 1.0.1e-fips 11 Feb 2013 |
+---------------------+---------------------------------+
1 row in set (0.001 sec)

You might be confused about how SSL and TLS differ. The documentation may use the term SSL, and configuration variables use the ssl_* prefix, but MariaDB only supports TLS, the secure successor of the older SSL versions. You may have to identify the MariaDB version that supports the TLS version you need. For example, PCI DSS v3.2 recommends a minimum protocol version of TLSv1.2, which even older versions of MariaDB support. TLS 1.3, which requires OpenSSL 1.1.1, is faster due to a more efficient handshake between the two communicating systems, and is supported since MariaDB 10.2.16 and MariaDB 10.3.8.

To utilize the ciphers available for a specific TLS version, you can define them using the --ssl-cipher option in the mysqld command or the ssl-cipher variable in the configuration file. Take note that TLSv1.3 ciphers cannot be excluded when using OpenSSL, even by using the ssl_cipher system variable.

Configuration Parameters To Encrypt Data In-Transit

To encrypt your data in-transit, run the sequence of commands listed below:

Generate A CA Certificate

openssl genrsa 2048 > ca-key.pem
openssl req -new -x509 -nodes -days 365000 -subj "/C=PH/ST=Davao Del Sur/L=Davao City/O=Maximus Aleksandre/CN=CA Server"  -key ca-key.pem -out ca-cert.pem

Generate A Server Certificate

openssl req -newkey rsa:2048 -days 365000 -nodes -keyout server-key.pem -out server-req.pem -subj "/C=PH/ST=Davao Del Sur/L=Davao City/O=Maximus Aleksandre/CN=DB Server" 
openssl  rsa -in server-key.pem -out server-key.pem
openssl x509 -req -in server-req.pem -days 365000 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out server-cert.pem

Generate A Client Certificate

openssl req -newkey rsa:2048 -days 365000 -nodes -keyout client-key.pem -out client-req.pem  -subj "/C=PH/ST=Davao Del Sur/L=Davao City/O=Maximus Aleksandre/CN=Client Server"
openssl rsa -in client-key.pem -out client-key.pem
openssl  x509 -req -in client-req.pem -days 365000 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out client-cert.pem

Take note that the Common Name (CN) in the -subj argument must be unique among the CA, server, and client certificates you are generating. Technically, the CA and server certificates can share the same CN, but it's best to use a unique identifier for all three. Otherwise, you'll receive an error such as:

ERROR 2026 (HY000): SSL connection error: tlsv1 alert unknown ca
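Before wiring the files into MariaDB, it's worth confirming that the server certificate actually chains back to the CA. A minimal self-contained sketch, using throwaway certificates in /tmp with shortened (but distinct) CN subjects:

```shell
# Generate a throwaway CA and server certificate, then verify the chain.
cd /tmp
openssl req -new -x509 -nodes -days 365 -newkey rsa:2048 \
  -subj "/CN=CA Server" -keyout ca-key.pem -out ca-cert.pem
openssl req -newkey rsa:2048 -nodes -subj "/CN=DB Server" \
  -keyout server-key.pem -out server-req.pem
openssl x509 -req -in server-req.pem -days 365 \
  -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out server-cert.pem
# A valid chain prints "server-cert.pem: OK"
openssl verify -CAfile ca-cert.pem server-cert.pem
```

Run the same verify command against your real ca-cert.pem and server-cert.pem (and client-cert.pem) before restarting MariaDB with the new files.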

Alright, certificates and keys are in place. Now you need to specify the paths using the ssl_* variables in your MySQL configuration file (e.g. /etc/my.cnf for RHEL-based OS or /etc/mysql/my.cnf for Debian/Ubuntu). See the example config below:

[mysqld]
...
ssl_ca=/etc/ssl/galera/self-gen/ca-cert.pem
ssl_cert=/etc/ssl/galera/self-gen/server-cert.pem
ssl_key=/etc/ssl/galera/self-gen/server-key.pem

Of course, you must specify the correct paths where you have placed your certificates and keys.
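A path typo here is the most common mistake, so a quick sanity check that every ssl_* entry points at an existing file can save a failed restart. This sketch writes a throwaway config so the loop can be demonstrated anywhere; on a real server, point CNF at /etc/my.cnf (or /etc/mysql/my.cnf):

```shell
# Verify that every ssl_* path in a config file exists on disk.
CNF=/tmp/my-ssl-check.cnf   # replace with your real config file
printf '[mysqld]\nssl_ca=/tmp/ca-cert.pem\nssl_cert=/tmp/server-cert.pem\nssl_key=/tmp/server-key.pem\n' > "$CNF"
awk -F= '/^ssl_/ {print $2}' "$CNF" | while read -r f; do
  [ -f "$f" ] && echo "found: $f" || echo "MISSING: $f"
done
```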

Then place these parameters under the [client-mariadb] section of your configuration file, as shown below:

[client-mariadb]
ssl_ca = /etc/ssl/galera/self-gen/ca-cert.pem
ssl_cert=/etc/ssl/galera/self-gen/client-cert.pem
ssl_key=/etc/ssl/galera/self-gen/client-key.pem

As mentioned earlier, you can specify which ciphers your SSL/TLS configuration can use. This can be done with the configuration below:

[mysqld]
…
ssl_ca=/etc/ssl/galera/self-gen/ca-cert.pem
ssl_cert=/etc/ssl/galera/self-gen/server-cert.pem
ssl_key=/etc/ssl/galera/self-gen/server-key.pem
ssl-cipher=AES256-GCM-SHA384

Or you can use the following configuration as such below:

ssl-cipher=TLSv1.2      ### This will use all Ciphers available in TLS v1.2
ssl-cipher=HIGH:!DSS:!RCA-SHA:!DES-CBC3-SHA:!aNULL@STRENGTH       ### Will list strong ciphers available and exclude ciphers in prefix.

The last line denotes the equivalent of this command:

openssl ciphers -v 'HIGH:!DSS:!RCA-SHA:!DES-CBC3-SHA:!aNULL@STRENGTH'

This excludes weak ciphers as well as ciphers matching the excluded prefixes, such as the DHE-DSS-AES256-GCM-SHA384 cipher.

Generating your Certificate using ClusterControl

Alternatively, you can use ClusterControl to generate the certificates and keys for you. To do this, you can do the following as seen below:

  1. Select your MariaDB Cluster, then go to the Security tab and select SSL Encryption. In my example below, this is a MariaDB Galera Cluster:
  2. Select the SSL Encryption and enable it. You'll be able to create a new certificate or choose an existing one. For this sample, I'll be choosing the "Create Certificate" option:
  3. The last step is to configure the days of expiration for your certificate. See below:
    If you click "Restart Nodes," ClusterControl will perform a rolling restart. Take note that if you are using MariaDB 10.4 (which is currently at its RC version), you can enable SSL without restarting your MariaDB server. Just use the FLUSH SSL command, a new feature added in MariaDB 10.4.

Another way to handle your TLS/SSL certificates and keys is to use Key Management in ClusterControl. Check out this blog to learn more about how to do this.

Create Your TLS/SSL MariaDB User

In case you thought you were done, you're not. You need to ensure that your users are required to use SSL when they connect to the server. These users will then always interact with the server through a private channel. This is very important because you need to make sure that all your clients interact with your server in a secure and private manner.

To do this, just do the following example:

MariaDB [(none)]> CREATE USER mysecure_user@'192.168.10.200' IDENTIFIED BY 'myP@55w0rd';
Query OK, 0 rows affected (0.005 sec)
MariaDB [(none)]> GRANT ALL ON *.* TO mysecure_user@'192.168.10.200' REQUIRE SSL;
Query OK, 0 rows affected (0.005 sec)

Make sure to copy the certificates you generated in the previous steps to your client/application host before connecting.

Verifying Your Connection

Testing whether your connection is encrypted is important. To do that, run the following command:

mysql -e "status"|grep -i 'cipher'
SSL:                    Cipher in use is DHE-RSA-AES256-GCM-SHA384

Alternatively, OpenSSL version 1.1.1 added support for -starttls mysql. There's a statically compiled openssl binary available at https://testssl.sh/openssl-1.0.2k-dev-chacha.pm.ipv6.Linux+FreeBSD.tar.gz (or check out this presentation in PDF format). Then you can run the following command:

echo | bin/openssl.Linux.x86_64.static s_client -starttls mysql -connect localhost:3306 -CAfile /etc/ssl/galera/self-gen/ca-cert.pem

The example result would be like below:

$ echo | bin/openssl.Linux.x86_64.static s_client -starttls mysql -connect localhost:3306 -CAfile /etc/ssl/galera/self-gen/ca-cert.pem 
CONNECTED(00000003)
depth=1 C = PH, ST = Davao Del Sur, L = Davao City, O = Maximus Aleksandre, CN = CA Server
verify return:1
depth=0 C = PH, ST = Davao Del Sur, L = Davao City, O = Maximus Aleksandre, CN = DB Server
verify return:1
---
Certificate chain
 0 s:/C=PH/ST=Davao Del Sur/L=Davao City/O=Maximus Aleksandre/CN=DB Server
   i:/C=PH/ST=Davao Del Sur/L=Davao City/O=Maximus Aleksandre/CN=CA Server
 1 s:/C=PH/ST=Davao Del Sur/L=Davao City/O=Maximus Aleksandre/CN=CA Server
   i:/C=PH/ST=Davao Del Sur/L=Davao City/O=Maximus Aleksandre/CN=CA Server
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIDTDCCAjQCAQEwDQYJKoZIhvcNAQELBQAwazELMAkGA1UEBhMCUEgxFjAUBgNV
BAgMDURhdmFvIERlbCBTdXIxEzARBgNVBAcMCkRhdmFvIENpdHkxGzAZBgNVBAoM
Ek1heGltdXMgQWxla3NhbmRyZTESMBAGA1UEAwwJQ0EgU2VydmVyMCAXDTE5MDYx
MDAyMTMwNFoYDzMwMTgxMDExMDIxMzA0WjBrMQswCQYDVQQGEwJQSDEWMBQGA1UE
CAwNRGF2YW8gRGVsIFN1cjETMBEGA1UEBwwKRGF2YW8gQ2l0eTEbMBkGA1UECgwS
TWF4aW11cyBBbGVrc2FuZHJlMRIwEAYDVQQDDAlEQiBTZXJ2ZXIwggEiMA0GCSqG
SIb3DQEBAQUAA4IBDwAwggEKAoIBAQDNwFuoqJg8YlrDinxDZN4+JjFUTGlDfhmy
9H/1C4fZToegvd3RzU9mz3/Fgyuoez4szHDgkn7o4rqmKAH6tMm9R44qtBNGlxka
fn12PPXudDvij4A9C3nVatBJJXTSvSD4/eySY33kAS1DpKsgsTgKAKOsyadcvXYU
IP5nfFc7pxX/8qZADVmyeik4M+oLxO6ryurt0wmUhOmlz5zQghh9kFZLA49l+p95
m5D53d/O+Qj4HSb2ssZD2ZaRc2k4dMCVpa87xUbdP/VVLeu0J4BE3OJiwC0N1Jfi
ZpP2DOKljsklaAYQF+tPnWi5pgReEd47/ql0fNEjeheF/MJiJM1NAgMBAAEwDQYJ
KoZIhvcNAQELBQADggEBAAz7yB+UdNYJ1O5zJI4Eb9lL+vNVKhRJ8IfNrwKVbpAT
eQk9Xpn9bidfcd2gseqDTyixZhWjsjO2LXix7jRhH1DrJvhGQ7+1w36ujtzscTgy
ydLH90CnE/oZHArbBhmyuqmu041w5rB3PpI9i9SveezDrbVcaL+qeGo8s4ATB2Yr
Y3T3OTqw6o/7cTJJ8S1aXBLTyUq5HAtOTM2GGZMSYwVqUsmBHA3d7M8i7yp20RVH
78j1H6+/hSSY4SDhwr04pSkzmm6HTIBCgOYrmEV2sQ/YeMHqVrSplLRY3SZHvqHo
gbSnasOQAE1oJnSNyxt9CRRAghM/EHEnsA2OlFa9iXQ=
-----END CERTIFICATE-----
subject=/C=PH/ST=Davao Del Sur/L=Davao City/O=Maximus Aleksandre/CN=DB Server
issuer=/C=PH/ST=Davao Del Sur/L=Davao City/O=Maximus Aleksandre/CN=CA Server
---
No client certificate CA names sent
Client Certificate Types: RSA fixed DH, DSS fixed DH, RSA sign, DSA sign, ECDSA sign
Requested Signature Algorithms: RSA+SHA512:DSA+SHA512:ECDSA+SHA512:RSA+SHA384:DSA+SHA384:ECDSA+SHA384:RSA+SHA256:DSA+SHA256:ECDSA+SHA256:RSA+SHA224:DSA+SHA224:ECDSA+SHA224:RSA+SHA1:DSA+SHA1:ECDSA+SHA1
Shared Requested Signature Algorithms: RSA+SHA512:DSA+SHA512:ECDSA+SHA512:RSA+SHA384:DSA+SHA384:ECDSA+SHA384:RSA+SHA256:DSA+SHA256:ECDSA+SHA256:RSA+SHA224:DSA+SHA224:ECDSA+SHA224:RSA+SHA1:DSA+SHA1:ECDSA+SHA1
Peer signing digest: SHA512
Server Temp Key: DH, 2048 bits
---
SSL handshake has read 3036 bytes and written 756 bytes
---
New, TLSv1/SSLv3, Cipher is DHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : DHE-RSA-AES256-GCM-SHA384
    Session-ID: 46E0F6FA42779DB210B4DF921A68E9E4CC39ADD87D28118DB0073726B98C0786
    Session-ID-ctx: 
    Master-Key: 2A2E6137929E733051BE060953049A0553F49C2F50A183EEC0C40F7EFB4E2749E611DF54A88417518A274EC904FB3CE6
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 300 (seconds)
    TLS session ticket:
    0000 - 4a dd f3 7f 1e b7 9e cb-77 58 b9 75 53 34 5c 61   J.......wX.uS4\a
    0010 - 3a 4d 0e aa e2 6b 27 8e-11 ff be 24 ad 66 88 49   :M...k'....$.f.I
    0020 - c1 ba 20 20 d8 9f d5 5c-23 9d 64 dc 97 f2 fa 77   ..  ...\#.d....w
    0030 - bf e6 26 1f 2c 98 ee 3b-71 66 0c 04 05 3e 54 c1   ..&.,..;qf...>T.
    0040 - 88 b6 f7 a9 fd b8 f9 84-cd b8 99 9f 6e 50 3b 13   ............nP;.
    0050 - 90 30 91 7d 48 ea 11 f7-3f b7 6b 65 2e ea 7e 61   .0.}H...?.ke..~a
    0060 - 70 cd 4e b8 43 54 3d a0-aa dc e5 44 a7 41 3a 5e   p.N.CT=....D.A:^
    0070 - 3e cb 45 57 33 2b a4 8f-75 d8 ce a5 9e 00 16 50   >.EW3+..u......P
    0080 - 24 aa 7a 54 f8 26 65 74-11 d7 f3 d6 66 3b 14 60   $.zT.&et....f;.`
    0090 - 33 98 4a ef e2 17 ba 33-4e 7f 2b ce 46 d7 e9 11   3.J....3N.+.F...

    Start Time: 1560133350
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
---
DONE

Encrypting MariaDB Data: At-Rest

Encrypting the data that is physically stored on your hardware (i.e. at rest) provides stronger protection against a data breach. If a malicious attacker gains access to your MariaDB host, they can read the data in plain text; with something as simple as the strings command in Linux, they could easily retrieve data from the database files. Moreover, the danger increases if the attacker has an advanced understanding of the tablespace file format.

Encryption at-rest is an additional protection, but it is not a replacement for a good firewall, strong passwords, correct user permissions, and in-transit encryption between the client and server.

MariaDB's support for encryption of tables and tablespaces was added in version 10.1.3. With your tables encrypted, your data is almost impossible for someone to steal. This type of encryption also allows your organization to be compliant with government regulations like GDPR.

Once you have enabled data-at-rest encryption in MariaDB, tables defined with ENCRYPTED=YES, or all tables when innodb_encrypt_tables=ON, will have their stored data encrypted. It's decrypted only when accessed via the MariaDB server; otherwise, the data is unreadable.

For example, reading an unencrypted tablespace shows this:

$ strings admin_logs.ibd|head -10
=r7N
infimum
supremum
infimum
supremum/
failThe verification code you entered is incorrect.KK
elmo1234failThe password or username you entered is invalidKK
elmo1234failThe password or username you entered is invalidKK
elmoasfdfailThe verification code you entered is incorrect.KK
safasffailThe verification code you entered is incorrect.KK

but with encrypted data, your tablespace won't be readable just like the example below:

# strings user_logs.ibd |head -10
E?*Pa
[+YQ
KNmbUtQ
T_lPAW
\GbQ.
] e2
#Rsd
ywY%o
kdUY
{]~GE

It is also noteworthy that MariaDB's data-at-rest encryption adds a data size overhead of roughly 3-5%. MariaDB encryption is fully supported when using the XtraDB and InnoDB storage engines. Encryption is also supported for the Aria storage engine, but only for tables created with ROW_FORMAT=PAGE (the default), and for the binary log (replication log). MariaDB even gives the user flexibility in what to encrypt. In XtraDB or InnoDB, one can choose to encrypt:

  • everything — all tablespaces (with all tables)
  • individual tables
  • everything, excluding individual tables

Additionally, one can choose to encrypt XtraDB/InnoDB log files (which is recommended).

MariaDB does have its limitations. The Galera Cluster gcache, for example, is not encrypted, though this is planned for MariaDB 10.4. You can find a full list of limitations here.

How to Setup and Configure MariaDB for Data-At-Rest Encryption

  1. Generate random encryption keys using the openssl rand command.
    $ mkdir -p /etc/mysql/encryption
    $ for i in {1..5}; do openssl rand -hex 32 >> /etc/mysql/encryption/keyfile;  done;
    Now, open the file /etc/mysql/encryption/keyfile and prefix each key with a key ID, which will be referenced as the encryption key id when creating encrypted tables. See ENCRYPTION_KEY_ID for more details. The format should be as follows:
    <encryption_key_id1>;<hex-encoded_encryption_key1>
    <encryption_key_id2>;<hex-encoded_encryption_key2>
    In my example keyfile, this looks like the following:
    $ cat keyfile
    1;687a90b4423c10417f2483726a5901007571c16331d2ee9534333fef4e323075
    2;e7bf20f1cbde9632587c2996871cff74871890d19b49e273d13def123d781e17
    3;9284c9c80da9a323b3ac2c82427942dfbf1718b57255cc0bc0e2c3d6f15ac3ac
    4;abf80c3a8b10643ef53a43c759227304bcffa263700a94a996810b0f0459a580
    5;bdbc5f67d34a4904c4adc9771420ac2ab2bd9c6af1ec532e960335e831f02933
  2. Generate a random password using a command similar to the one in step 1:
    $ openssl rand -hex 128 > /etc/mysql/encryption/keyfile.key
  3. Before proceeding to the next step, it's important to take note of the following details about encrypting the key file:
    • The only algorithm that MariaDB currently supports to encrypt the key file is Cipher Block Chaining (CBC) mode of Advanced Encryption Standard (AES).
    • The encryption key size can be 128-bits, 192-bits, or 256-bits.
    • The encryption key is created from the SHA-1 hash of the encryption password.
    • The encryption password has a max length of 256 characters.
    Now, to encrypt the key file using the openssl enc command, run the following:
    $ openssl enc -aes-256-cbc -md sha1 -pass file:/etc/mysql/encryption/keyfile.key -in /etc/mysql/encryption/keyfile    -out /etc/mysql/encryption/keyfile.enc
  4. Add the following variables in your MySQL configuration file (i.e. /etc/my.cnf on RHEL-based Linux OS or /etc/mysql/my.cnf in Debian/Ubuntu Linux based OS)
    [mysqld]
    …
    #################### DATABASE ENCRYPTION ##############################
    plugin_load_add = file_key_management
    file_key_management_filename = /etc/mysql/encryption/keyfile.enc
    file_key_management_filekey = FILE:/etc/mysql/encryption/keyfile.key
    file_key_management_encryption_algorithm = aes_cbc 
    encrypt_binlog = 1
    
    innodb_encrypt_tables = ON
    innodb_encrypt_log = ON
    innodb_encryption_threads = 4
    innodb_encryption_rotate_key_age = 0 # Do not rotate key
  5. Restart the MariaDB server:
    $ systemctl restart mariadb
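Before relying on the encrypted key file from step 3, it is worth sanity-checking that it decrypts back to the original. The sketch below replays the openssl commands above in a temporary directory (the temp paths are illustrative, not the /etc/mysql/encryption paths the server uses):

```shell
# Sketch: verify the keyfile encryption round-trips, using a temp
# directory instead of /etc/mysql/encryption (paths are illustrative).
dir=$(mktemp -d)

# Build a small keyfile (key ID and hex key per line) and a password file
for i in 1 2 3; do echo "$i;$(openssl rand -hex 32)" >> "$dir/keyfile"; done
openssl rand -hex 128 > "$dir/keyfile.key"

# Encrypt the keyfile as above (AES-256-CBC, SHA-1 key derivation)
openssl enc -aes-256-cbc -md sha1 -pass file:"$dir/keyfile.key" \
    -in "$dir/keyfile" -out "$dir/keyfile.enc"

# Decrypt it back and confirm it matches the original
openssl enc -d -aes-256-cbc -md sha1 -pass file:"$dir/keyfile.key" \
    -in "$dir/keyfile.enc" -out "$dir/keyfile.dec"
diff "$dir/keyfile" "$dir/keyfile.dec" && echo "keyfile round-trip OK"
```

If the diff is silent and the final message prints, the server will be able to decrypt the key file with the same password file at startup.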

Verify and Test The Encryption

To verify and test the encryption, create some sample tables. For example, create two tables with the following SQL statements:

MariaDB [test]> CREATE TABLE a (i int) ENGINE=InnoDB ENCRYPTED=YES;
Query OK, 0 rows affected (0.018 sec)
MariaDB [test]> CREATE TABLE b (i int) ENGINE=InnoDB;
Query OK, 0 rows affected (0.003 sec)

Then, let's add some data to the tables:

MariaDB [test]> insert into a values(1),(2);
Query OK, 2 rows affected (0.001 sec)
Records: 2  Duplicates: 0  Warnings: 0
MariaDB [test]> insert into b values(1),(2);
Query OK, 2 rows affected (0.001 sec)
Records: 2  Duplicates: 0  Warnings: 0

To see which tablespaces are encrypted, run the following SELECT query:

MariaDB [test]> SELECT * FROM information_schema.innodb_tablespaces_encryption\G
*************************** 1. row ***************************
                       SPACE: 4
                        NAME: mysql/gtid_slave_pos
           ENCRYPTION_SCHEME: 1
          KEYSERVER_REQUESTS: 1
             MIN_KEY_VERSION: 1
         CURRENT_KEY_VERSION: 1
    KEY_ROTATION_PAGE_NUMBER: NULL
KEY_ROTATION_MAX_PAGE_NUMBER: NULL
              CURRENT_KEY_ID: 1
        ROTATING_OR_FLUSHING: 0
*************************** 2. row ***************************
                       SPACE: 2
                        NAME: mysql/innodb_index_stats
           ENCRYPTION_SCHEME: 1
          KEYSERVER_REQUESTS: 1
             MIN_KEY_VERSION: 1
         CURRENT_KEY_VERSION: 1
    KEY_ROTATION_PAGE_NUMBER: NULL
KEY_ROTATION_MAX_PAGE_NUMBER: NULL
              CURRENT_KEY_ID: 1
        ROTATING_OR_FLUSHING: 0
*************************** 3. row ***************************
                       SPACE: 1
                        NAME: mysql/innodb_table_stats
           ENCRYPTION_SCHEME: 1
          KEYSERVER_REQUESTS: 1
             MIN_KEY_VERSION: 1
         CURRENT_KEY_VERSION: 1
    KEY_ROTATION_PAGE_NUMBER: NULL
KEY_ROTATION_MAX_PAGE_NUMBER: NULL
              CURRENT_KEY_ID: 1
        ROTATING_OR_FLUSHING: 0
*************************** 4. row ***************************
                       SPACE: 3
                        NAME: mysql/transaction_registry
           ENCRYPTION_SCHEME: 1
          KEYSERVER_REQUESTS: 0
             MIN_KEY_VERSION: 1
         CURRENT_KEY_VERSION: 1
    KEY_ROTATION_PAGE_NUMBER: NULL
KEY_ROTATION_MAX_PAGE_NUMBER: NULL
              CURRENT_KEY_ID: 1
        ROTATING_OR_FLUSHING: 0
*************************** 5. row ***************************
                       SPACE: 5
                        NAME: test/a
           ENCRYPTION_SCHEME: 1
          KEYSERVER_REQUESTS: 1
             MIN_KEY_VERSION: 1
         CURRENT_KEY_VERSION: 1
    KEY_ROTATION_PAGE_NUMBER: NULL
KEY_ROTATION_MAX_PAGE_NUMBER: NULL
              CURRENT_KEY_ID: 1
        ROTATING_OR_FLUSHING: 0
*************************** 6. row ***************************
                       SPACE: 6
                        NAME: test/b
           ENCRYPTION_SCHEME: 1
          KEYSERVER_REQUESTS: 1
             MIN_KEY_VERSION: 1
         CURRENT_KEY_VERSION: 1
    KEY_ROTATION_PAGE_NUMBER: NULL
KEY_ROTATION_MAX_PAGE_NUMBER: NULL
              CURRENT_KEY_ID: 1
        ROTATING_OR_FLUSHING: 0
6 rows in set (0.000 sec)

Note that creating an InnoDB table does not require the ENCRYPTED=YES keyword; it is encrypted automatically because we set innodb_encrypt_tables = ON in the configuration file.

If you want to specify which encryption key a table uses, you can do the following as well:

MariaDB [test]> CREATE TABLE c (i int) ENGINE=InnoDB ENCRYPTION_KEY_ID = 4;
Query OK, 0 rows affected (0.003 sec)

The ENCRYPTION_KEY_ID refers to one of the key IDs in the encryption keyfile we generated earlier.

Additionally, if you want more testing through shell, you can use the strings command I showed you earlier.
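You can reproduce the effect outside MariaDB by comparing strings output on a plain file versus an AES-encrypted copy of it. This is a standalone sketch using openssl on made-up sample data, not actual .ibd files:

```shell
# Sketch: strings output on plain vs encrypted data, mimicking the
# unencrypted vs encrypted .ibd comparison above (sample data is made up).
dir=$(mktemp -d)
printf 'elmo1234 The password or username you entered is invalid\n' > "$dir/plain.dat"

# Encrypt a copy with AES-256-CBC
openssl enc -aes-256-cbc -md sha1 -pass pass:testpass \
    -in "$dir/plain.dat" -out "$dir/encrypted.dat"

strings "$dir/plain.dat"        # the row data is readable
strings "$dir/encrypted.dat"    # only random-looking fragments, if any
```

The plain file leaks its contents to strings, while the encrypted copy does not, which is exactly what you should observe on encrypted tablespace files.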

Additional Information on MariaDB Encryption

If your MariaDB instance should not contain any unencrypted tables, set the variable in your my.cnf configuration file within the [mysqld] section as follows:

innodb_encrypt_tables = FORCE

For binlog encryption, just add the following:

encrypt_binlog = 1

InnoDB's redo log is not encrypted by default. To encrypt it, add the variable below under the [mysqld] section:

innodb-encrypt-log

If you need encryption for your on-disk temporary tables and temporary files, you can add the following:

encrypt-tmp-disk-tables=1
encrypt-tmp-files=1

Migrating from MySQL Enterprise to MariaDB 10.3


While it shares the same heritage with MySQL, MariaDB is a different database. Over the years, as new versions of MySQL and MariaDB were released, the two projects have diverged into two different RDBMS platforms.

MariaDB has become the main database distribution on many Linux platforms and is gaining popularity. At the same time, it has become a very attractive database system for many corporations, gaining features that address enterprise needs like encryption, hot backups, or compatibility with proprietary databases.

But how do the new features affect MariaDB's compatibility with MySQL? Is it still a drop-in replacement for MySQL? How do the latest changes affect the migration process? We will try to answer that in this article.

What You Need to Know Before Upgrade

MariaDB and MySQL have differed from each other significantly in the last two years, especially with the arrival of their most recent versions: MySQL 8.0, MariaDB 10.3, and MariaDB 10.4 RC (we discussed the new features of MariaDB 10.4 RC quite recently, so if you would like to read more about what's upcoming in 10.4, please check my colleague Krzysztof's two blogs: What's New in MariaDB 10.4 and What's New in MariaDB Cluster 10.4).

With the release of MariaDB 10.3, MariaDB surprised many, since it is no longer a drop-in replacement for MySQL. MariaDB is no longer merging new MySQL features into MariaDB nor solving MySQL bugs. Nevertheless, version 10.3 is becoming a real alternative to Oracle MySQL Enterprise as well as other enterprise proprietary databases such as Oracle 12c (MSSQL in version 10.4).

Preliminary Checks and Limitations

Migration is a complex process no matter which version you are upgrading to. There are a few things you need to keep in mind when planning this, such as essential changes between RDBMS versions as well as the detailed testing that needs to precede any upgrade process. This is especially critical if you would like to maintain availability for the duration of the upgrade.

Upgrading to a new major version involves risk, and it is important to plan the whole process thoughtfully. In this document, we’ll look at the important new changes in the 10.3 (and upcoming 10.4) version and show you how to plan the test process.

To minimize the risk, let's take a look at the platform differences and limitations.

Starting with the configuration there are some parameters that have different default values. MariaDB provides a matrix of parameter differences. It can be found here.

In MySQL 8.0, caching_sha2_password is the default authentication plugin. This enhancement should improve security by using the SHA-256 algorithm. MySQL has this plugin enabled by default, while MariaDB doesn't, although there is already a feature request open with MariaDB (MDEV-9804). MariaDB offers the ed25519 plugin instead, which seems to be a good alternative to the old authentication method.

MariaDB's support for encryption on tables and tablespaces was added in version 10.1.3. With your tables being encrypted, your data is almost impossible for someone to steal. This type of encryption also allows your organization to be compliant with government regulations like GDPR.

MariaDB supports connection thread pools, which are most effective in situations where queries are relatively short and the load is CPU bound. On MySQL’s community edition, the number of threads is static, which limits the flexibility in these situations. The enterprise plan of MySQL includes threadpool capabilities.

MySQL 8.0 includes the sys schema, a set of objects that helps database administrators and software engineers interpret data collected by the Performance Schema. Sys schema objects can be used for optimization and diagnosis use cases. MariaDB doesn’t have this enhancement included.

Another one is invisible columns. Invisible columns give the flexibility of adding columns to existing tables without the fear of breaking an application. This feature is not available in MySQL. It allows creating columns which aren’t listed in the results of a SELECT * statement, nor do they need to be assigned a value in an INSERT statement when their name isn’t mentioned in the statement.
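A minimal sketch of the invisible-column behavior (the table and column names here are illustrative):

```sql
-- 'created_at' is invisible: omitted from SELECT * and optional in INSERT
CREATE TABLE app_user (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP INVISIBLE
);

INSERT INTO app_user (id, name) VALUES (1, 'alice');

SELECT * FROM app_user;               -- returns only id and name
SELECT id, created_at FROM app_user;  -- invisible columns are still selectable by name
```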

MariaDB decided not to implement native JSON support (one of the major features of MySQL 5.7 and 8.0) as they claim it’s not part of the SQL standard. Instead, to support replication from MySQL, they only defined an alias for JSON, which is actually a LONGTEXT column. In order to ensure that a valid JSON document is inserted, the JSON_VALID function can be used as a CHECK constraint (default for MariaDB 10.4.3). MariaDB can't directly access MySQL JSON format.
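A short sketch of the JSON alias and the JSON_VALID check constraint in practice (the table name is illustrative):

```sql
-- In MariaDB, JSON is an alias for LONGTEXT; JSON_VALID enforces well-formed documents
CREATE TABLE event_log (
  id INT PRIMARY KEY,
  payload JSON CHECK (JSON_VALID(payload))
);

INSERT INTO event_log VALUES (1, '{"type": "login", "ok": true}');  -- accepted
INSERT INTO event_log VALUES (2, 'not valid json');                 -- rejected by the CHECK constraint
```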

Oracle automates a lot of tasks with MySQL Shell. In addition to SQL, MySQL Shell also offers scripting capabilities for JavaScript and Python.

Migration Process Using mysqldump

Once we know our limitations, the installation process is fairly simple. It's pretty much a standard installation and import using mysqldump. The MySQL Enterprise backup tool is not compatible with MariaDB, so the recommended way is to use mysqldump. Here is an example of the process done on CentOS 7 with MariaDB 10.3.

Create a dump on the MySQL Enterprise server

$ mysqldump --routines --events --triggers --single-transaction db1 > export_db1.sql

Clean yum cache index

sudo yum makecache fast

Install MariaDB 10.3

sudo yum -y install MariaDB-server MariaDB-client

Start MariaDB service.

sudo systemctl start mariadb
sudo systemctl enable mariadb

Secure MariaDB by running mysql_secure_installation.

# mysql_secure_installation 

NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB
      SERVERS IN PRODUCTION USE!  PLEASE READ EACH STEP CAREFULLY!

In order to log into MariaDB to secure it, we'll need the current
password for the root user.  If you've just installed MariaDB, and
you haven't set the root password yet, the password will be blank,
so you should just press enter here.

Enter current password for root (enter for none): 
OK, successfully used password, moving on...

Setting the root password ensures that nobody can log into the MariaDB
root user without the proper authorisation.

Set root password? [Y/n] y
New password: 
Re-enter new password: 
Password updated successfully!
Reloading privilege tables..
 ... Success!


By default, a MariaDB installation has an anonymous user, allowing anyone
to log into MariaDB without having to have a user account created for
them.  This is intended only for testing, and to make the installation
go a bit smoother.  You should remove them before moving into a
production environment.

Remove anonymous users? [Y/n] y
 ... Success!

Normally, root should only be allowed to connect from 'localhost'.  This
ensures that someone cannot guess at the root password from the network.

Disallow root login remotely? [Y/n] y
 ... Success!

By default, MariaDB comes with a database named 'test' that anyone can
access.  This is also intended only for testing, and should be removed
before moving into a production environment.

Remove test database and access to it? [Y/n] y
 - Dropping test database...
 ... Success!
 - Removing privileges on test database...
 ... Success!

Reloading the privilege tables will ensure that all changes made so far
will take effect immediately.

Reload privilege tables now? [Y/n] y
 ... Success!

Cleaning up...

All done!  If you've completed all of the above steps, your MariaDB
installation should now be secure.
Thanks for using MariaDB!

Import dump

$ mysql -uroot -p
> tee import.log
> source export_db1.sql

Review the import log.

$ vi import.log

To deploy an environment you can also use ClusterControl which has an option to deploy from scratch.

ClusterControl Deploy MariaDB

ClusterControl can also be used to set up replication or to import a backup from MySQL Enterprise Edition.

Migration Process Using Replication

The other approach for migrating between MySQL Enterprise and MariaDB is to use replication. MariaDB allows replicating to it from MySQL databases, which means you can easily migrate MySQL databases to MariaDB. MySQL Enterprise versions won't allow replication from MariaDB servers, so this is a one-way route.

Based on MariaDB documentation: https://mariadb.com/kb/en/library/mariadb-vs-mysql-compatibility/. X refers to MySQL documentation.

Here are some general rules pointed out by MariaDB:

  • Replicating from MySQL 5.5 to MariaDB 5.5+ should just work. You’ll want MariaDB to be the same or higher version than your MySQL server.
  • When using a MariaDB 10.2+ as a slave, it may be necessary to set binlog_checksum to NONE.
  • Replicating from MySQL 5.6 without GTID to MariaDB 10+ should work.
  • Replication from MySQL 5.6 with GTID, binlog_rows_query_log_events and ignorable events works starting from MariaDB 10.0.22 and MariaDB 10.1.8. In this case, MariaDB will remove the MySQL GTIDs and other unneeded events and instead adds its own GTIDs.

Even if you don’t plan to use replication in the migration/cutover process, a good confidence-builder is to replicate your production server to a testing sandbox and then practice on it.
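On the MariaDB side, the slave is configured with the usual replication statements. The host, credentials, and binlog coordinates below are placeholders for your own environment, and as noted above you may additionally need binlog_checksum=NONE on the master when the slave is MariaDB 10.2+:

```sql
-- On the MariaDB slave (all values are placeholders)
CHANGE MASTER TO
  MASTER_HOST='mysql-enterprise-host',
  MASTER_USER='repl_user',
  MASTER_PASSWORD='repl_password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;

START SLAVE;

-- Verify the IO and SQL threads are running
SHOW SLAVE STATUS\G
```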

We hope this introductory blog post helped you understand the assessment and implementation process of migrating from MySQL Enterprise to MariaDB.

Using the MyRocks Storage Engine with MariaDB Server


MariaDB Server is one of the most popular, open source database servers. It was created by the original developers of MySQL and it became popular for being fast, scalable, and robust. MariaDB has a rich ecosystem of storage engines, plugins, and other tools that make it very versatile for a wide variety of use cases.

The disk space and I/O efficiency requirements of our databases continue to grow as we manage ever-increasing volumes of information.

As for MariaDB storage engines, we have different types to choose from, such as XtraDB, InnoDB, Aria, or MyISAM. Since MariaDB 10.2.5, MyRocks has also been available. MyRocks is the type of storage engine that could really help us meet the requirements we mentioned earlier.

In this blog, we’ll learn more information about the new MyRocks engine and how we can use it in a MariaDB Server.

What is MyRocks?

MyRocks is an open source storage engine based on RocksDB which was originally developed by Facebook.

MyRocks can be a good storage solution when you have workloads that require greater compression and I/O efficiency. It uses a Log Structured Merge (LSM) architecture that has better compression than the B-tree algorithms used by the InnoDB engine (2x better compression compared to data compressed by InnoDB). It’s also a write-optimized storage engine (10x less write amplification when compared to InnoDB) and it has faster data loading and replication. MyRocks writes data directly to the bottom-most level, which avoids all compaction overheads when you enable faster data loading for a session.

An LSM works by storing modify operations in an in-memory buffer (the memtable), then sorting and writing the data to disk when this buffer is full.

By default, tables and databases are stored in a #rocksdb directory inside the MySQL datadir. This information is stored in .sst files without per-table separation.

MyRocks supports the READ COMMITTED and REPEATABLE READ isolation levels; it doesn't support SERIALIZABLE.

How to Implement MyRocks on a MariaDB Server

Installation

First, we need to install MariaDB server. In this example, we’ll use CentOS Linux release 7.6 as the operating system.

By default, this OS version will try to install MariaDB 5.5, so we’ll add the MariaDB repository to install the MariaDB version 10.3.

$ cat > /etc/yum.repos.d/MariaDB.repo <<- EOF
# MariaDB 10.3 CentOS repository
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/10.3/centos7-amd64
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1
EOF

And then, we’ll install the MariaDB Server package:

$ yum install MariaDB-server

This command will install different package dependencies, not only MariaDB Server.

==========================================================================================================================================================================================================
 Package                                                 Arch                                   Version                                                     Repository                               Size
==========================================================================================================================================================================================================
Installing:
 MariaDB-server                                          x86_64                                 10.3.15-1.el7.centos                                        mariadb                                  24 M
Installing for dependencies:
 MariaDB-client                                          x86_64                                 10.3.15-1.el7.centos                                        mariadb                                  11 M
 MariaDB-common                                          x86_64                                 10.3.15-1.el7.centos                                        mariadb                                  78 k
 MariaDB-compat                                          x86_64                                 10.3.15-1.el7.centos                                        mariadb                                 2.8 M
 boost-program-options                                   x86_64                                 1.53.0-27.el7                                               base                                    156 k
 galera                                                  x86_64                                 25.3.26-1.rhel7.el7.centos                                  mariadb                                 8.1 M
 libaio                                                  x86_64                                 0.3.109-13.el7                                              base                                     24 k
 lsof                                                    x86_64                                 4.87-6.el7                                                  base                                    331 k
 make                                                    x86_64                                 1:3.82-23.el7                                               base                                    420 k
 openssl                                                 x86_64                                 1:1.0.2k-16.el7_6.1                                         updates                                 493 k
 perl-Compress-Raw-Bzip2                                 x86_64                                 2.061-3.el7                                                 base                                     32 k
 perl-Compress-Raw-Zlib                                  x86_64                                 1:2.061-4.el7                                               base                                     57 k
 perl-DBI                                                x86_64                                 1.627-4.el7                                                 base                                    802 k
 perl-Data-Dumper                                        x86_64                                 2.145-3.el7                                                 base                                     47 k
 perl-IO-Compress                                        noarch                                 2.061-2.el7                                                 base                                    260 k
 perl-Net-Daemon                                         noarch                                 0.48-5.el7                                                  base                                     51 k
 perl-PlRPC                                              noarch                                 0.2020-14.el7                                               base                                     36 k

Transaction Summary
==========================================================================================================================================================================================================
Install  1 Package (+16 Dependent packages)

By default, the MariaDB Server is installed with the InnoDB storage engine, so we must install the RocksDB engine to be able to make use of it.

$ yum install MariaDB-rocksdb-engine
==========================================================================================================================================================================================================
 Package                                                  Arch                                     Version                                                Repository                                 Size
==========================================================================================================================================================================================================
Installing:
 MariaDB-rocksdb-engine                                   x86_64                                   10.3.15-1.el7.centos                                   mariadb                                   4.4 M
Installing for dependencies:
 libzstd                                                  x86_64                                   1.3.4-1.el7                                            mariadb                                   211 k
 snappy                                                   x86_64                                   1.1.0-3.el7                                            base                                       40 k

Transaction Summary
==========================================================================================================================================================================================================
Install  1 Package (+2 Dependent packages)

This command will install some required dependencies and it’ll enable the plugin on the MariaDB Server. It‘ll also create a configuration file in /etc/my.cnf.d/rocksdb.cnf:

[mariadb]
plugin-load-add=ha_rocksdb.so

We can verify this installation by running the SHOW PLUGINS command on the MariaDB server.

$ MariaDB> SHOW PLUGINS;
+-------------------------------+----------+--------------------+---------------+---------+
| Name                          | Status   | Type               | Library       | License |
+-------------------------------+----------+--------------------+---------------+---------+
...
| ROCKSDB                       | ACTIVE   | STORAGE ENGINE     | ha_rocksdb.so | GPL     |
| ROCKSDB_CFSTATS               | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_DBSTATS               | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_PERF_CONTEXT          | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_PERF_CONTEXT_GLOBAL   | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_CF_OPTIONS            | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_COMPACTION_STATS      | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_GLOBAL_INFO           | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_DDL                   | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_SST_PROPS             | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_INDEX_FILE_MAP        | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_LOCKS                 | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_TRX                   | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
| ROCKSDB_DEADLOCK              | ACTIVE   | INFORMATION SCHEMA | ha_rocksdb.so | GPL     |
+-------------------------------+----------+--------------------+---------------+---------+

If for some reason we don't have the plugin enabled, we can install it dynamically by executing the INSTALL SONAME or INSTALL PLUGIN command:

$ MariaDB> INSTALL SONAME 'ha_rocksdb';

Another option could be restarting the database services. This action should read the /etc/my.cnf.d/rocksdb.cnf file and enable the plugin.

$ service mariadb restart

We can find detailed information about our RocksDB engine by using the following command:

$ SHOW ENGINE ROCKSDB STATUS

Configuration

As for the configuration files, the main one is /etc/my.cnf, which includes the directory /etc/my.cnf.d, where we can find the rest of the configuration files. In this directory, we have the following configuration files by default:

  • enable_encryption.preset: It will enable data at rest encryption.
  • mysql-clients.cnf: Here there are configurations for different groups like [mysqladmin], [mysqlcheck], [mysqldump] and more.
  • rocksdb.cnf: In this file, we’ll add the specific configuration for MyRocks, like default-storage-engine or rocksdb_block_size.
  • server.cnf: Here we have configuration related to the database server like bind-address and binlog_format.

All MyRocks system variables and status variables are prefixed with "rocksdb". Let's take a look.

System variables:

$ MariaDB> SHOW VARIABLES LIKE 'rocksdb%';
+-------------------------------------------------+------------------------------------------+
| Variable_name                                   | Value                                    |
+-------------------------------------------------+------------------------------------------+
| rocksdb_access_hint_on_compaction_start         | 1                                        |
| rocksdb_advise_random_on_open                   | ON                                       |
| rocksdb_allow_concurrent_memtable_write         | OFF                                      |
| rocksdb_allow_mmap_reads                        | OFF                                      |
| rocksdb_allow_mmap_writes                       | OFF                                      |
| rocksdb_allow_to_start_after_corruption         | OFF                                      |
| rocksdb_blind_delete_primary_key                | OFF                                      |
| rocksdb_block_cache_size                        | 536870912                                |
| rocksdb_block_restart_interval                  | 16                                       |
| rocksdb_block_size                              | 4096                                     |
…
+-------------------------------------------------+------------------------------------------+

Status variables:

$ MariaDB> SHOW STATUS LIKE 'rocksdb%';
+----------------------------------------------------+-------+
| Variable_name                                      | Value |
+----------------------------------------------------+-------+
| Rocksdb_rows_deleted                               | 0     |
| Rocksdb_rows_inserted                              | 0     |
| Rocksdb_rows_read                                  | 0     |
| Rocksdb_rows_updated                               | 0     |
| Rocksdb_rows_deleted_blind                         | 0     |
| Rocksdb_rows_expired                               | 0     |
| Rocksdb_rows_filtered                              | 0     |
| Rocksdb_system_rows_deleted                        | 0     |
| Rocksdb_system_rows_inserted                       | 0     |
| Rocksdb_system_rows_read                           | 0     |
…
+----------------------------------------------------+-------+

You can find more information about the status and system variables on the MariaDB website.

Backups for MariaDB Using MyRocks

Backups are a must in all database environments. They’re essential for system recovery, migrations, auditing, testing, and more.

We can categorize backups into two types: logical and physical. A logical backup is stored in a human-readable format like SQL, while a physical backup is a binary copy of the actual data files.

For logical backups on MariaDB with MyRocks as the database engine, the most common backup tool is the classic mysqldump:

$ mysqldump -hHOST -uUSER -p DATABASE > FILE.SQL
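A dump taken this way can be restored by feeding the file back through the mysql client. A minimal sketch, using the same placeholders as above:

```shell
# Restore a logical backup into an existing database
$ mysql -hHOST -uUSER -p DATABASE < FILE.SQL
```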

And for physical backup, we can use Mariabackup which is compatible with MyRocks:

$ mariabackup --backup --target-dir=/backup/ --user=USER --password=PASSWORD --host=HOST
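Restoring a physical backup taken with Mariabackup is a two-step process: the backup must first be prepared, then copied back into an empty data directory while the server is stopped. A rough sketch (the data directory path is an example):

```shell
# Apply logs so the backup is consistent
$ mariabackup --prepare --target-dir=/backup/
# Copy the prepared files back into the (empty) datadir, then fix ownership
$ mariabackup --copy-back --target-dir=/backup/
$ chown -R mysql:mysql /var/lib/mysql/
```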

Another option is myrocks_hotbackup, created by Facebook. It can take a physical copy from a running MyRocks instance to a local or remote server, without stopping the source instance.

Limitations of Using MyRocks for MariaDB

Let’s look at some of the limitations of using the MyRocks engine...

  • MariaDB’s optimistic parallel replication may not be supported
  • MyRocks is not available for 32-bit platforms
  • MariaDB Cluster (Galera Cluster) doesn’t work with MyRocks (Only InnoDB or XtraDB storage engines)
  • The transaction must fit in memory
  • Requires special settings for loading data
  • SERIALIZABLE is not supported
  • Transportable Tablespace, Foreign Key, Spatial Index, and Fulltext Index are not supported

Conclusion

MyRocks is available in MariaDB from versions higher than 10.2.5. As we mentioned earlier, this storage engine may be useful to you when you have workloads that require high data compression and greater levels of I/O efficiency. To learn more about MyRocks, check the MyRocks section of the MariaDB knowledge base.

MariaDB MaxScale Load Balancing on Docker: Deployment - Part 1


MariaDB MaxScale is an advanced, plug-in database proxy for MariaDB database servers. It sits between client applications and the database servers, routing client queries and server responses. MaxScale also monitors the servers, so it will quickly notice any changes in server status or replication topology. This makes MaxScale a natural choice for controlling failover and similar features.

In this two-part blog series we are going to give a complete walkthrough on how to run MariaDB MaxScale on Docker. This part covers the deployment as a standalone Docker container and MaxScale clustering via Docker Swarm for high availability.

MariaDB MaxScale on Docker

There are a number of MariaDB Docker images available on Docker Hub. In this blog, we are going to use the official image maintained and published by MariaDB called "mariadb/maxscale" (tag: latest). The image is around 71MB in size. At the time of writing, the image comes pre-installed with MaxScale 2.3.4 and its required packages.

Generally, the following steps are required to run MaxScale with this image in a container environment:

  1. A running MariaDB (master-slave or master-master) replication/Galera Cluster or NDB Cluster
  2. Create and grant a database user dedicated for MaxScale monitoring
  3. Prepare the MaxScale configuration file
  4. Map the configuration file into container or load into Kubernetes ConfigMap or Docker Swarm Configs
  5. Start the container/pod/service/replicaset

Note that MaxScale is a product of MariaDB, which means it is tailored towards MariaDB server. Most of the features are still compatible with MySQL, except some parts like GTID handling, Galera Cluster configuration and internal data files. The version that we are going to use is 2.3.4, which is released under the Business Source License (BSL). All of the code is open, and usage with up to three servers is free. When usage goes over three backend servers, the company using it must pay for a commercial subscription. After a specific time period (2 years in the case of MaxScale) the release moves to GPL and all usage is free.

Just to be clear, since this is a test environment, the backend server limit is not a concern for us. As stated in the MariaDB BSL FAQ page:

Q: Can I use MariaDB products licensed under BSL in test and development environment?
A: Yes, In non-production test and development environment, you can use products licensed under BSL without needing a subscription from MariaDB

In this walkthrough, we already have a three-node MariaDB Replication deployed using ClusterControl. The following diagram illustrates the setup that we are going to deploy:

Our system architecture consists of:

  • mariadb1 - 192.168.0.91 (master)
  • mariadb2 - 192.168.0.92 (slave)
  • mariadb3 - 192.168.0.93 (slave)
  • docker1 - 192.168.0.200 (Docker host for containers - maxscale, app)

Preparing the MaxScale User

Firstly, create a MySQL database user for MaxScale and allow all hosts in the network 192.168.0.0/24:

MariaDB> CREATE USER 'maxscale'@'192.168.0.%' IDENTIFIED BY 'my_s3cret';

Then, grant the required privileges. If you just want to monitor the backend servers with load balancing, the following grants would suffice:

MariaDB> GRANT SHOW DATABASES ON *.* TO 'maxscale'@'192.168.0.%';
MariaDB> GRANT SELECT ON `mysql`.* TO 'maxscale'@'192.168.0.%';

However, MaxScale can do much more than routing queries. It has the ability to perform failover and switchover for example promoting a slave to a new master. This requires SUPER and REPLICATION CLIENT privileges. If you would like to use this feature, assign ALL PRIVILEGES to the user instead:

MariaDB> GRANT ALL PRIVILEGES ON *.* TO 'maxscale'@'192.168.0.%';

That's it for the user part.
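Before moving on, it can be worth double-checking the grants; for example, from any host in the allowed 192.168.0.0/24 network that has the mysql client installed:

```shell
# Verify the privileges as seen by the maxscale user itself
$ mysql -umaxscale -p -h192.168.0.91 -e "SHOW GRANTS FOR CURRENT_USER()"
```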

Preparing MaxScale Configuration File

The image requires a working configuration file to be mapped into the container before it is started. The minimal configuration file provided in the container is not going to help us build the reverse proxy that we want. Therefore, the configuration file has to be prepared beforehand.

The following list can help us in collecting required basic information to construct our configuration file:

  • Cluster type - MaxScale supports MariaDB replication (master-slave, master-master), Galera Cluster, Amazon Aurora, MariaDB ColumnStore and NDB Cluster (aka MySQL Cluster).
  • Backend IP address and/or hostname - Reachable IP address or hostname for all backend servers.
  • Routing algorithm - MaxScale supports two types of query routing - read-write splitting and load balancing in round-robin.
  • Ports for MaxScale to listen on - By default, MaxScale uses port 4006 for round-robin connections and 4008 for read-write split connections. You may use a UNIX socket instead if you want.

In the current directory, create a text file called maxscale.cnf so we can map it into the container when starting up. Paste the following lines in the file:

########################
## Server list
########################

[mariadb1]
type            = server
address         = 192.168.0.91
port            = 3306
protocol        = MariaDBBackend
serv_weight     = 1

[mariadb2]
type            = server
address         = 192.168.0.92
port            = 3306
protocol        = MariaDBBackend
serv_weight     = 1

[mariadb3]
type            = server
address         = 192.168.0.93
port            = 3306
protocol        = MariaDBBackend
serv_weight     = 1

#########################
## MaxScale configuration
#########################

[maxscale]
threads                 = auto
log_augmentation        = 1
ms_timestamp            = 1
syslog                  = 1

#########################
# Monitor for the servers
#########################

[monitor]
type                    = monitor
module                  = mariadbmon
servers                 = mariadb1,mariadb2,mariadb3
user                    = maxscale
password                = my_s3cret
auto_failover           = true
auto_rejoin             = true
enforce_read_only_slaves = 1

#########################
## Service definitions for read/write splitting and read-only services.
#########################

[rw-service]
type            = service
router          = readwritesplit
servers         = mariadb1,mariadb2,mariadb3
user            = maxscale
password        = my_s3cret
max_slave_connections           = 100%
max_sescmd_history              = 1500
causal_reads                    = true
causal_reads_timeout            = 10
transaction_replay              = true
transaction_replay_max_size     = 1Mi
delayed_retry                   = true
master_reconnection             = true
master_failure_mode             = fail_on_write
max_slave_replication_lag       = 3

[rr-service]
type            = service
router          = readconnroute
servers         = mariadb1,mariadb2,mariadb3
router_options  = slave
user            = maxscale
password        = my_s3cret

##########################
## Listener definitions for the service
## Listeners represent the ports the service will listen on.
##########################

[rw-listener]
type            = listener
service         = rw-service
protocol        = MariaDBClient
port            = 4008

[ro-listener]
type            = listener
service         = rr-service
protocol        = MariaDBClient
port            = 4006

A bit of explanations for every section:

  • Server List - The backend servers. Define every MariaDB server of this cluster in its own stanza. The stanza name will be used when we specify the service definition further down. The component type must be "server".
  • MaxScale Configuration - Define all MaxScale related configurations there.
  • Monitor module - How MaxScale should monitor the backend servers. The component type must be "monitor" followed by either one of the monitoring modules. For the list of supported monitors, refer to MaxScale 2.3 Monitors.
  • Service - Where to route the query. The component type must be "service". For the list of supported routers, refer to MaxScale 2.3 Routers.
  • Listener - How MaxScale should listen to incoming connections. It can be port or socket file. The component type must be "listener". Commonly, listeners are tied to services.

So basically, we would like MaxScale to listen on two ports, 4006 and 4008. Port 4006 is specifically for round-robin connections, suitable for read-only workloads on our MariaDB Replication, while port 4008 is specifically for critical read and write workloads. We also want MaxScale to act on our replication in case of a failover, switchover or slave rejoin, thus we use the monitor module called "mariadbmon".

Running the Container

We are now ready to run our standalone MaxScale container. Map the configuration file with -v and make sure to publish both listener ports 4006 and 4008. Optionally, you can enable MaxScale REST API interface at port 8989:

$ docker run -d \
--name maxscale \
--restart always \
-p 4006:4006 \
-p 4008:4008 \
-p 8989:8989 \
-v $PWD/maxscale.cnf:/etc/maxscale.cnf \
mariadb/maxscale

Verify with:

$ docker logs -f maxscale
...
2019-06-14 07:15:41.060   notice : (main): Started REST API on [127.0.0.1]:8989
2019-06-14 07:15:41.060   notice : (main): MaxScale started with 8 worker threads, each with a stack size of 8388608 bytes.

Ensure you see no error when looking at the above logs. Verify if the docker-proxy processes are listening on the published ports - 4006, 4008 and 8989:

$ netstat -tulpn | grep docker-proxy
tcp6       0      0 :::8989                 :::*                    LISTEN      4064/docker-proxy
tcp6       0      0 :::4006                 :::*                    LISTEN      4092/docker-proxy
tcp6       0      0 :::4008                 :::*                    LISTEN      4078/docker-proxy

At this point, our MaxScale is running and capable of processing queries.
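Since we also published port 8989, the MaxScale REST API can be used as an additional health check. Note that MaxScale binds the REST API to 127.0.0.1 by default, so reaching it from outside the container requires setting admin_host=0.0.0.0 in the [maxscale] section; a sketch that works from inside the container (assuming curl is available there, and the default REST API credentials admin/mariadb):

```shell
# Query the REST API from inside the container for the server list
$ docker exec maxscale curl -s -u admin:mariadb http://127.0.0.1:8989/v1/servers
```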

MaxCtrl

MaxCtrl is a command line administrative client for MaxScale which uses the MaxScale REST API for communication. It is intended to be the replacement software for the legacy MaxAdmin command line client.

To enter MaxCtrl console, execute the "maxctrl" command inside the container:

$ docker exec -it maxscale maxctrl
 maxctrl: list servers
┌──────────┬──────────────┬──────┬─────────────┬─────────────────┬─────────────┐
│ Server   │ Address      │ Port │ Connections │ State           │ GTID        │
├──────────┼──────────────┼──────┼─────────────┼─────────────────┼─────────────┤
│ mariadb1 │ 192.168.0.91 │ 3306 │ 0           │ Master, Running │ 0-5001-1012 │
├──────────┼──────────────┼──────┼─────────────┼─────────────────┼─────────────┤
│ mariadb2 │ 192.168.0.92 │ 3306 │ 0           │ Slave, Running  │ 0-5001-1012 │
├──────────┼──────────────┼──────┼─────────────┼─────────────────┼─────────────┤
│ mariadb3 │ 192.168.0.93 │ 3306 │ 0           │ Slave, Running  │ 0-5001-1012 │
└──────────┴──────────────┴──────┴─────────────┴─────────────────┴─────────────┘

To verify if everything is okay, simply run the following commands:

maxctrl: list servers
maxctrl: list services
maxctrl: list filters
maxctrl: list sessions
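The same commands can also be run non-interactively from the Docker host, which is convenient for scripting and health checks:

```shell
# One-shot maxctrl invocation without entering the console
$ docker exec maxscale maxctrl list servers
```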

To get further information on a component, use the "show" command instead, for example:

maxctrl: show servers
┌──────────────────┬──────────────────────────────────────────┐
│ Server           │ mariadb3                                 │
├──────────────────┼──────────────────────────────────────────┤
│ Address          │ 192.168.0.93                             │
├──────────────────┼──────────────────────────────────────────┤
│ Port             │ 3306                                     │
├──────────────────┼──────────────────────────────────────────┤
│ State            │ Slave, Running                           │
├──────────────────┼──────────────────────────────────────────┤
│ Last Event       │ new_slave                                │
├──────────────────┼──────────────────────────────────────────┤
│ Triggered At     │ Mon, 17 Jun 2019 08:57:59 GMT            │
├──────────────────┼──────────────────────────────────────────┤
│ Services         │ rw-service                               │
│                  │ rr-service                               │
├──────────────────┼──────────────────────────────────────────┤
│ Monitors         │ monitor                                  │
├──────────────────┼──────────────────────────────────────────┤
│ Master ID        │ 5001                                     │
├──────────────────┼──────────────────────────────────────────┤
│ Node ID          │ 5003                                     │
├──────────────────┼──────────────────────────────────────────┤
│ Slave Server IDs │                                          │
├──────────────────┼──────────────────────────────────────────┤
│ Statistics       │ {                                        │
│                  │     "connections": 0,                    │
│                  │     "total_connections": 0,              │
│                  │     "persistent_connections": 0,         │
│                  │     "active_operations": 0,              │
│                  │     "routed_packets": 0,                 │
│                  │     "adaptive_avg_select_time": "0ns"│
│                  │ }                                        │
├──────────────────┼──────────────────────────────────────────┤
│ Parameters       │ {                                        │
│                  │     "address": "192.168.0.93",           │
│                  │     "protocol": "MariaDBBackend",        │
│                  │     "port": 3306,                        │
│                  │     "extra_port": 0,                     │
│                  │     "authenticator": null,               │
│                  │     "monitoruser": null,                 │
│                  │     "monitorpw": null,                   │
│                  │     "persistpoolmax": 0,                 │
│                  │     "persistmaxtime": 0,                 │
│                  │     "proxy_protocol": false,             │
│                  │     "ssl": "false",                      │
│                  │     "ssl_cert": null,                    │
│                  │     "ssl_key": null,                     │
│                  │     "ssl_ca_cert": null,                 │
│                  │     "ssl_version": "MAX",                │
│                  │     "ssl_cert_verify_depth": 9,          │
│                  │     "ssl_verify_peer_certificate": true, │
│                  │     "disk_space_threshold": null,        │
│                  │     "type": "server",                    │
│                  │     "serv_weight": "1"│
│                  │ }                                        │
└──────────────────┴──────────────────────────────────────────┘

Connecting to the Database

The application's database user must be granted for the MaxScale host since, from the MariaDB server's point of view, it only sees the MaxScale host. Consider the following example without MaxScale in the picture:

  • Database name: myapp
  • User: myapp_user
  • Host: 192.168.0.133 (application server)

To allow the user to access the database inside MariaDB server, one has to run the following statement:

MariaDB> CREATE USER 'myapp_user'@'192.168.0.133' IDENTIFIED BY 'mypassword';
MariaDB> GRANT ALL PRIVILEGES ON myapp.* to 'myapp_user'@'192.168.0.133';

With MaxScale in the picture, one has to run the following statement instead (replace the application server IP address with the MaxScale IP address, 192.168.0.200):

MariaDB> CREATE USER 'myapp_user'@'192.168.0.200' IDENTIFIED BY 'mypassword';
MariaDB> GRANT ALL PRIVILEGES ON myapp.* to 'myapp_user'@'192.168.0.200';

From the application, there are two ports you can use to connect to the database:

  • 4006 - Round-robin listener, suitable for read-only workloads.
  • 4008 - Read-write split listener, suitable for write workloads.

If your application is allowed to specify only one MySQL port (e.g. WordPress, Joomla, etc.), pick the RW port 4008 instead. This is the safest endpoint to connect to regardless of the cluster type. However, if your application can handle connections to multiple MySQL ports, you may send the reads to the round-robin listener. This listener has less overhead and is much faster compared to the read-write split listener.

For our MariaDB replication setup, connect to either one of these endpoints as database host/port combination:

  • 192.168.0.200 port 4008 - MaxScale - read/write or write-only
  • 192.168.0.200 port 4006 - MaxScale - balanced read-only
  • 192.168.0.91 port 3306 - MariaDB Server (master) - read/write
  • 192.168.0.92 port 3306 - MariaDB Server (slave) - read-only
  • 192.168.0.93 port 3306 - MariaDB Server (slave) - read-only

Note that for multi-master cluster types like Galera Cluster and NDB Cluster, port 4006 can be used for multi-write balanced connections instead. With MaxScale you have many options to pick from when connecting to the database, each providing its own set of advantages.
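A quick way to observe the routing behavior of each listener is to ask every connection which backend it landed on (assuming a user such as myapp_user that has been granted access from the MaxScale host, as in the earlier grants):

```shell
# Round-robin listener: consecutive connections should rotate across the slaves
$ for i in 1 2 3; do mysql -umyapp_user -p'mypassword' -h192.168.0.200 -P4006 -N -e "SELECT @@hostname"; done

# Read-write split listener: reads may go to slaves, writes always go to the master
$ mysql -umyapp_user -p'mypassword' -h192.168.0.200 -P4008 -N -e "SELECT @@hostname"
```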

MaxScale Clustering with Docker Swarm

With Docker Swarm, we can create a group of MaxScale instances via Swarm service with more than one replica together with Swarm Configs. Firstly, import the configuration file into Swarm:

$ cat maxscale.cnf | docker config create maxscale_config -

Verify with:

$ docker config inspect --pretty maxscale_config

Then, grant the MaxScale database user to connect from any Swarm hosts in the network:

MariaDB> CREATE USER 'maxscale'@'192.168.0.%' IDENTIFIED BY 'my_s3cret';
MariaDB> GRANT ALL PRIVILEGES ON *.* TO maxscale@'192.168.0.%';

When starting up the Swarm service for MaxScale, we can create multiple containers (called replicas) mapping to the same configuration file as below:

$ docker service create \
--name maxscale-cluster  \
--replicas=3 \
--publish published=4008,target=4008 \
--publish published=4006,target=4006 \
--config source=maxscale_config,target=/etc/maxscale.cnf \
mariadb/maxscale

The above will create three MaxScale containers spread across Swarm nodes. Verify with:

$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE                     PORTS
yj6u2xcdj7lo        maxscale-cluster    replicated          3/3                 mariadb/maxscale:latest   *:4006->4006/tcp, *:4008->4008/tcp
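To see which Swarm node each replica was scheduled on:

```shell
# List the individual tasks (containers) of the service and their nodes
$ docker service ps maxscale-cluster
```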

If the applications are running within the Swarm network, you can simply use the service name "maxscale-cluster" as the database host for your applications. Externally, you can connect to any of the Docker hosts on the published ports, and the Swarm routing mesh will balance the connections to the containers in round-robin fashion. At this point our architecture can be illustrated as below:

In the second part, we are going to look at advanced use cases of MaxScale on Docker like service control, configuration management, query processing, security and cluster reconciliation.


MariaDB MaxScale Load Balancing on Docker: Management - Part 2


This blog post is a continuation of MariaDB MaxScale Load Balancing on Docker: Deployment - Part 1. In this part, we are going to focus more on management operations with advanced use cases like service control, configuration management, query processing, security and cluster reconciliation. The example steps and instructions shown in this post are based on the running environment that we set up in the first part of this blog series.

Service Control

For MaxScale, starting and stopping the container is the only way to control the service. Provided the container has been created, we can use the following command to manage the service:

$ docker start maxscale
$ docker stop maxscale
$ docker restart maxscale
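After any of these operations, a quick status and log check helps confirm MaxScale came back healthy:

```shell
# Show container state, then the most recent log lines
$ docker ps --filter name=maxscale --format '{{.Names}}: {{.Status}}'
$ docker logs --tail 20 maxscale
```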

Running without Root Privileges

Docker containers run with root privileges by default, and so does the application that runs inside the container. This is a major concern from the security perspective, because hackers can gain root access to the Docker host by compromising the application running inside the container.

To run Docker as a non-root user, you have to add your user to the docker group. Firstly, create a docker group if there isn’t one:

$ sudo groupadd docker

Then, add your user to the docker group. In this example our user is "vagrant":

$ sudo usermod -aG docker vagrant

Log out and log back in so that your group membership is re-evaluated (or reboot if it does not work). At this point, you can run the MaxScale container with the standard run command (no sudo required) as user "vagrant":

$ docker run -d \
--name maxscale-unprivileged \
-p 4006:4006 \
-p 4008:4008 \
-p 8989:8989 \
-v $PWD/maxscale.cnf:/etc/maxscale.cnf \
mariadb/maxscale

The MaxScale process runs as user "maxscale" and requires no special privileges up to the root level. Thus, running the container in non-privileged mode is always the best way if you are concerned about security.

Configuration Management

For standalone MaxScale container, configuration management requires modification to the mapped configuration file followed by restarting the MaxScale container. However, if you are running as a Docker Swarm service, the new configuration has to be loaded into the Swarm Configs as a new version, for example:

$ cat maxscale.cnf | docker config create maxscale_config_v2 -

Then, update the service by removing the old configs (maxscale_config) and add the new one (maxscale_config_v2) to the same target:

$ docker service update \
--config-rm maxscale_config \
--config-add source=maxscale_config_v2,target=/etc/maxscale.cnf \
maxscale-cluster

Docker Swarm will then schedule container removal and replace procedures one container at a time until the replicas requirement is satisfied.

Upgrade and Downgrade

One of the advantages of running your applications in Docker is the trivial upgrade and downgrade procedure. Every running container is based on an image, and this image can be switched easily with the image tag. To get the list of available images for MaxScale, check out the Tags section on Docker Hub. The following examples show the process of downgrading MaxScale 2.3 to one minor version earlier, 2.2:

$ docker run -d \
--name maxscale \
-p 4006:4006 \
-p 4008:4008 \
-v $PWD/maxscale.cnf:/etc/maxscale.cnf \
mariadb/maxscale:2.3
$ docker rm -f maxscale
$ docker run -d \
--name maxscale \
-p 4006:4006 \
-p 4008:4008 \
-v $PWD/maxscale.cnf:/etc/maxscale.cnf \
mariadb/maxscale:2.2

Make sure the configuration options are compatible with the version that you want to run. For example, the above downgrade would fail at the first run due to the following errors:

2019-06-19 05:29:04.301   error  : (check_config_objects): Unexpected parameter 'master_reconnection' for object 'rw-service' of type 'service', or 'true' is an invalid value for parameter 'master_reconnection'.
2019-06-19 05:29:04.301   error  : (check_config_objects): Unexpected parameter 'delayed_retry' for object 'rw-service' of type 'service', or 'true' is an invalid value for parameter 'delayed_retry'.
2019-06-19 05:29:04.301   error  : (check_config_objects): Unexpected parameter 'transaction_replay_max_size' for object 'rw-service' of type 'service', or '1Mi' is an invalid value for parameter 'transaction_replay_max_size'.
2019-06-19 05:29:04.302   error  : (check_config_objects): Unexpected parameter 'transaction_replay' for object 'rw-service' of type 'service', or 'true' is an invalid value for parameter 'transaction_replay'.
2019-06-19 05:29:04.302   error  : (check_config_objects): Unexpected parameter 'causal_reads_timeout' for object 'rw-service' of type 'service', or '10' is an invalid value for parameter 'causal_reads_timeout'.
2019-06-19 05:29:04.302   error  : (check_config_objects): Unexpected parameter 'causal_reads' for object 'rw-service' of type 'service', or 'true' is an invalid value for parameter 'causal_reads'.

What we need to do is remove the unsupported configuration options shown in the errors above from the configuration file before downgrading the container image:

  • master_reconnection
  • delayed_retry
  • transaction_replay
  • transaction_replay_max_size
  • causal_reads_timeout
  • causal_reads

Finally, start the container again and you should be good. Version upgrade for MaxScale works similarly. Just change the tag that you want to use and off you go.
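After switching images, it is worth confirming the version that is actually running:

```shell
# Print the MaxScale version installed in the running container
$ docker exec maxscale maxscale --version
```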

MaxScale Filters

MaxScale uses a component called filter to manipulate or process the requests as they pass through it. There are a bunch of filters you can use, as listed in this page, MaxScale 2.3 Filters. For example, a specific query can be logged into a file if it matches a criteria or you can rewrite the incoming query before it reaches the backend servers.

To activate a filter, you have to define a section and include the definition name into the corresponding service definition, as shown in the examples further down.

Query Logging All (QLA)

As its name suggests, the QLA filter logs all queries that match a set of rules, per client session. All queries are logged into files named after the filebase setting.

Firstly, define the component with type=filter and module=qlafilter:

## Query Log All (QLA) filter
## Filter module for MaxScale to log all query content on a per client session basis
[qla-sbtest-no-pk]
type		= filter
module		= qlafilter
filebase	= /tmp/sbtest
match		= select.*from.*
exclude		= where.*id.*
user		= sbtest

Then add the filter component into our services:

[rw-service]
...
filters        = qla-sbtest-no-pk
[rr-service]
...
filters        = qla-sbtest-no-pk

It's also a good idea to map /tmp of the container to a directory on the Docker host, so we don't have to access the container to retrieve the generated log files. Firstly, create a directory and give it globally writable permissions:

$ mkdir qla
$ chmod 777 qla

Since we need to bind the above directory into the container, we have to stop and remove the running container and re-run it with the following command:

$ docker stop maxscale
$ docker run -d \
--name maxscale \
--restart always \
-p 4006:4006 \
-p 4008:4008 \
-p 8989:8989 \
-v $PWD/maxscale.cnf:/etc/maxscale.cnf \
-v $PWD/qla:/tmp \
mariadb/maxscale

You can then retrieve the content of the logged queries inside the qla directory:

$ cat qla/*
Date,User@Host,Query
2019-06-18 08:25:13,sbtest@::ffff:192.168.0.19,select * from sbtest.sbtest1

Query Rewriting

Query rewriting is a feature that allows you to quickly isolate and correct problematic queries running against the database server, improving performance without touching the application.

Query rewriting can be done via regexfilter. This filter can match or exclude incoming statements using regular expressions and replace them with another statement. Every rule is defined in its own section; include the section name in the corresponding service to activate it.

The following filter will match a number of SHOW commands that we don't want to expose to the read-only clients:

## Rewrite query based on regex match and replace
[block-show-commands]
type            = filter
module          = regexfilter
options         = ignorecase
match           = ^show (variables|global variables|global status|status|processlist|full processlist).*
replace         = SELECT 'Not allowed'

Then we can append the filter to the service that we want to apply. For example, all read-only connections have to be filtered for the above:

[rr-service]
...
filters        = qla-sbtest-no-pk | block-show-commands

Keep in mind that multiple filters can be chained using a syntax akin to the Linux shell pipe "|". Restart the container to apply the configuration changes:

$ docker restart maxscale

We can then verify with the following query:

$ mysql -usbtest -p -h192.168.0.200 -P4006 -e 'SHOW VARIABLES LIKE "max_connections"'
+-------------+
| Not allowed |
+-------------+
| Not allowed |
+-------------+

You will get the result as expected.

Cluster Recovery

MaxScale 2.2.2 and later supports automatic or manual MariaDB replication or cluster recovery for the following events:

  • failover
  • switchover
  • rejoin
  • reset-replication

Failover for the master-slave cluster can and often should be set to activate automatically. Switchover must be activated manually through MaxAdmin, MaxCtrl or the REST interface. Rejoin can be set to automatic or activated manually. These features are implemented in the "mariadbmon" module.

The following automatic failover events happen if we purposely shut down the active master, 192.168.0.91:

$ docker logs -f maxscale
...
2019-06-19 03:53:02.348   error  : (mon_log_connect_error): Monitor was unable to connect to server mariadb1[192.168.0.91:3306] : 'Can't connect to MySQL server on '192.168.0.91' (115)'
2019-06-19 03:53:02.351   notice : (mon_log_state_change): Server changed state: mariadb1[192.168.0.91:3306]: master_down. [Master, Running] -> [Down]
2019-06-19 03:53:02.351   warning: (handle_auto_failover): Master has failed. If master status does not change in 4 monitor passes, failover begins.
2019-06-19 03:53:16.710   notice : (select_promotion_target): Selecting a server to promote and replace 'mariadb1'. Candidates are: 'mariadb2', 'mariadb3'.
2019-06-19 03:53:16.710   warning: (warn_replication_settings): Slave 'mariadb2' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode
2019-06-19 03:53:16.711   warning: (warn_replication_settings): Slave 'mariadb3' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode
2019-06-19 03:53:16.711   notice : (select_promotion_target): Selected 'mariadb2'.
2019-06-19 03:53:16.711   notice : (handle_auto_failover): Performing automatic failover to replace failed master 'mariadb1'.
2019-06-19 03:53:16.723   notice : (redirect_slaves_ex): Redirecting 'mariadb3' to replicate from 'mariadb2' instead of 'mariadb1'.
2019-06-19 03:53:16.742   notice : (redirect_slaves_ex): All redirects successful.
2019-06-19 03:53:17.249   notice : (wait_cluster_stabilization): All redirected slaves successfully started replication from 'mariadb2'.
2019-06-19 03:53:17.249   notice : (handle_auto_failover): Failover 'mariadb1' -> 'mariadb2' performed.
2019-06-19 03:53:20.363   notice : (mon_log_state_change): Server changed state: mariadb2[192.168.0.92:3306]: new_master. [Slave, Running] -> [Master, Running]

After failover completes, our topology now looks like this:

A switchover operation requires human intervention, and one way to perform it is through the MaxCtrl console. Let's say the old master is back in operation and is ready to be promoted to master again. We can perform the switchover operation by sending the following command:

$ docker exec -it maxscale maxctrl
maxctrl: call command mariadbmon switchover monitor mariadb1 mariadb2
OK

The command format is:

$ call command <monitoring module> <operation> <monitoring section name> <new master> <current master>

Then, verify the new topology by listing out the servers:

 maxctrl: list servers
┌──────────┬──────────────┬──────┬─────────────┬─────────────────┬──────────────┐
│ Server   │ Address      │ Port │ Connections │ State           │ GTID         │
├──────────┼──────────────┼──────┼─────────────┼─────────────────┼──────────────┤
│ mariadb1 │ 192.168.0.91 │ 3306 │ 0           │ Master, Running │ 0-5001-12144 │
├──────────┼──────────────┼──────┼─────────────┼─────────────────┼──────────────┤
│ mariadb2 │ 192.168.0.92 │ 3306 │ 0           │ Slave, Running  │ 0-5001-12144 │
├──────────┼──────────────┼──────┼─────────────┼─────────────────┼──────────────┤
│ mariadb3 │ 192.168.0.93 │ 3306 │ 0           │ Slave, Running  │ 0-5001-12144 │
└──────────┴──────────────┴──────┴─────────────┴─────────────────┴──────────────┘

We just promoted our old master back to its original spot. Fun fact: ClusterControl's automatic recovery feature does exactly the same thing when enabled.

Final Thoughts

Running MariaDB MaxScale on Docker brings additional benefits like MaxScale clustering, easy upgrades and downgrades, and advanced proxying functionality for MySQL and MariaDB clusters.

A Guide to the MariaDB Columnstore for MySQL Admins


The typical MySQL DBA might be familiar with working with and managing an OLTP (Online Transaction Processing) database as part of their daily routine. You may be familiar with how it works and how to manage complex operations. While the default storage engine that MySQL ships with is good enough for OLTP, it's pretty simplistic when it comes to OLAP (Online Analytical Processing), especially for those who would like to work with artificial intelligence, forecasting, data mining, or data analytics.

In this blog, we're going to discuss MariaDB ColumnStore. The content is tailored for MySQL DBAs who might be less familiar with ColumnStore and how it can be applied to OLAP (Online Analytical Processing) applications.

OLTP vs OLAP

OLTP

The typical MySQL DBA deals with this type of data using OLTP (Online Transaction Processing). OLTP is characterized by a large volume of database transactions doing inserts, updates, or deletes. OLTP-type databases are specialized for fast query processing and for maintaining data integrity while being accessed concurrently by many clients. Their effectiveness is measured by the number of transactions per second (tps). It is fairly common to see parent-child relationship tables (after applying normalization) to reduce redundant data.

Records in a table are commonly processed and stored sequentially in a row-oriented manner, and are heavily indexed with unique keys to optimize data retrieval and writes. This is also common for MySQL, especially when dealing with high-concurrency writes or bulk inserts. Most of the storage engines that MariaDB supports are suited for OLTP applications: InnoDB (the default storage engine since 10.2), XtraDB, TokuDB, MyRocks, and MyISAM/Aria.

Applications like CMS, FinTech, Web Apps often deal with heavy writes and reads and often require high throughput. To make these applications work often requires deep expertise in high-availability, redundancy, resilience, and recovery.

OLAP

OLAP deals with the same challenges as OLTP, but uses a different approach (especially when it comes to data retrieval). OLAP deals with larger datasets and is common for data warehousing, often used for business intelligence types of applications. It is commonly used for Business Performance Management, Planning, Budgeting, Forecasting, Financial Reporting, Analysis, Simulation Models, Knowledge Discovery, and Data Warehouse Reporting.

Data stored in OLAP is typically not as critical as that stored in OLTP. This is because most of the data can be regenerated from OLTP and then fed into your OLAP database. This data is typically bulk-loaded and is often needed for business analytics, which is eventually rendered into visual graphs. OLAP also performs multidimensional analysis of business data and delivers results which can be used for complex calculations, trend analysis, or sophisticated data modeling.

OLAP usually stores data persistently in a columnar format. In MariaDB ColumnStore, records are broken out by column and each column is stored in a separate file. This makes data retrieval very efficient, as it scans only the columns referenced in your SELECT statement.

Think of it like this: OLTP processing handles the daily, crucial data transactions that run your business application, while OLAP helps you manage, predict, analyze, and better market your product - the building blocks of a business application.

What is MariaDB ColumnStore?

MariaDB ColumnStore is a pluggable columnar storage engine that runs on MariaDB Server. It utilizes a parallel distributed data architecture while keeping the same ANSI SQL interface that is used across the MariaDB server portfolio. This storage engine has been around for a while, as it was originally ported from InfiniDB (a now-defunct project whose code is still available on GitHub). It is designed for big data scaling (to process petabytes of data), linear scalability, and real-time responses to analytical queries. It leverages the I/O benefits of columnar storage (compression, just-in-time projection, and horizontal & vertical partitioning) to deliver tremendous performance when analyzing large data sets.

Lastly, MariaDB ColumnStore is the backbone of their MariaDB AX product as the main storage engine used by this technology.

How is MariaDB ColumnStore Different From InnoDB?

InnoDB is suited to OLTP processing, where your application needs to respond as fast as possible. MariaDB ColumnStore, on the other hand, is a suitable choice for managing big data transactions or large data sets that involve complex joins, aggregation at different levels of a dimension hierarchy, projecting financial totals over a wide range of years, or using equality and range selections. With ColumnStore, these approaches do not require you to index the fields involved, since it performs well enough without indexes. InnoDB can't really deliver this type of performance. There's nothing stopping you from trying it with InnoDB, but it comes at a cost: you would have to add indexes, which add large amounts of data to your disk storage, and the query may take far longer to finish, if it finishes at all.

MariaDB ColumnStore Architecture

Let's look at the MariaDB ColumnStore architecture below:

In contrast to the InnoDB architecture, ColumnStore is built from separate modules, reflecting its intent to work efficiently in a distributed environment. InnoDB is designed to scale within a single server, while ColumnStore spans multiple interconnected nodes, depending on the cluster setup. Hence, ColumnStore has several layers of components that take care of the requests sent to the MariaDB Server. Let's dig into these components below:

  • User Module (UM): The UM is responsible for parsing SQL requests into an optimized set of primitive job steps executed by one or more PM servers. The UM is thus responsible for query optimization and orchestration of query execution by the PM servers. While multiple UM instances can be deployed in a multi-server deployment, a single UM is responsible for each individual query. A database load balancer, like MariaDB MaxScale, can be deployed to appropriately balance external requests against individual UM servers. The UM is composed of the MariaDB mysqld process and the ExeMgr process.
  • Performance Module (PM): The PM executes granular job steps received from a UM in a multi-threaded manner. ColumnStore allows distribution of work across many Performance Modules.
  • Extent Maps: ColumnStore maintains metadata about each column in a shared distributed object known as the Extent Map. The UM server references the Extent Map to help generate the correct primitive job steps. The PM server references the Extent Map to identify the correct disk blocks to read. Each column is made up of one or more files, and each file can contain multiple extents. As much as possible, the system attempts to allocate contiguous physical storage to improve read performance.
  • Storage: ColumnStore can use either local storage or shared storage (e.g. SAN or EBS) to store data. Using shared storage allows for data processing to fail over to another node automatically in case of a PM server failing.

Below is how MariaDB ColumnStore processes a query:

  1. Clients issue a query to the MariaDB Server running on the User Module. The server performs a table operation for all tables needed to fulfill the request and obtains the initial query execution plan.
  2. Using the MariaDB storage engine interface, ColumnStore converts the server table object into ColumnStore objects. These objects are then sent to the User Module processes.
  3. The User Module converts the MariaDB execution plan and optimizes the given objects into a ColumnStore execution plan. It then determines the steps needed to run the query and the order in which they need to be run.
  4. The User Module then consults the Extent Map to determine which Performance Modules to consult for the data it needs, it then performs Extent Elimination, eliminating any Performance Modules from the list that only contain data outside the range of what the query requires.
  5. The User Module then sends commands to one or more Performance Modules to perform block I/O operations.
  6. The Performance Module or Modules carry out predicate filtering, join processing, initial aggregation of data from local or external storage, then send the data back to the User Module.
  7. The User Module performs the final result-set aggregation and composes the result-set for the query.
  8. The User Module / ExeMgr implements any window function calculations, as well as any necessary sorting on the result-set. It then returns the result-set to the server.
  9. The MariaDB Server performs any select list functions, ORDER BY and LIMIT operations on the result-set.
  10. The MariaDB Server returns the result-set to the client.

Query Execution Paradigms

Let's dig a bit deeper into how ColumnStore executes queries and what impacts their performance.

ColumnStore differs from standard MySQL/MariaDB storage engines such as InnoDB in that it gains performance by only scanning the necessary columns, utilizing system-maintained partitioning, and utilizing multiple threads and servers to scale query response time. Performance benefits when you only include the columns that are necessary for your data retrieval. This means that the greedy asterisk (*) in your SELECT query has a significant cost compared to a SELECT <col1>, <col2>... type of query.
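As an illustration (the orders table and its columns here are hypothetical names of our own, not from any sample data set):

```sql
-- Scans every column file of the table:
SELECT * FROM orders WHERE order_date >= '2019-01-01';

-- Scans only the order_id, customer_id and total column files,
-- plus order_date for the filter:
SELECT order_id, customer_id, total
FROM orders
WHERE order_date >= '2019-01-01';
```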

As with InnoDB and other storage engines, the data types you use also have a significant performance impact. If, say, you have a column that can only hold values 0 through 100, declare it as a tinyint, as this will be represented with 1 byte rather than the 4 bytes of an int. This reduces the I/O cost by a factor of four. For string types, an important threshold is char(9) and varchar(8) or greater. Each column storage file uses a fixed number of bytes per value, which enables fast positional lookup of other columns to form the row. Currently the upper limit for inline columnar storage is 8 bytes, so for strings longer than this the system maintains an additional 'dictionary' extent where the values are stored, and the columnar extent file stores a pointer into the dictionary. It is therefore more expensive to read and process a varchar(8) column than a char(8) column, for example. Where possible, you will get better performance if you can use shorter strings, especially if you avoid the dictionary lookup. All TEXT/BLOB data types in 1.1 onward use a dictionary and perform multi-block 8KB lookups to retrieve the data if required; the longer the data, the more blocks are retrieved and the greater the potential performance impact.
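A minimal DDL sketch of these sizing rules, using made-up table and column names:

```sql
CREATE TABLE sensor_readings (
    pct_complete TINYINT,     -- values 0-100 fit in 1 byte instead of 4 for INT
    country_code CHAR(2),     -- short fixed-width string, stored inline
    short_code   CHAR(8),     -- 8 bytes: still within the inline columnar limit
    long_label   VARCHAR(64)  -- above the threshold: stored via a dictionary extent
) ENGINE=ColumnStore;
```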

In a row-based system, adding redundant columns adds to the overall query cost, but in a columnar system a cost is only incurred if the column is referenced. Therefore, additional columns should be created to support different access paths. For instance, store a leading portion of a field in one column to allow for faster lookups, and additionally store the long-form value as another column. Scans on the shorter code or leading-portion column will be faster.
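For example (again with hypothetical names), a short leading-portion column can serve as the fast access path while the full value is kept alongside it:

```sql
CREATE TABLE products (
    sku_prefix CHAR(4),    -- leading portion: cheap to scan and filter on
    sku        VARCHAR(32) -- full value: read only for qualifying rows
) ENGINE=ColumnStore;

-- The scan touches only the narrow sku_prefix column file:
SELECT sku FROM products WHERE sku_prefix = 'AB12';
```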

Joins are optimized for large-scale processing and avoid the need for indexes and the overhead of nested-loop processing. ColumnStore maintains table statistics so it can determine the optimal join order. It shares some approaches with InnoDB: if a join is too large for the UM memory, it uses a disk-based join to complete the query.

For aggregations, ColumnStore distributes aggregate evaluation as much as possible. This means the work is shared across the UM and PMs, especially for queries with a very large number of values in the aggregate column(s). SELECT COUNT(*) is internally optimized to pick the column with the smallest byte storage in the table; it would, for example, pick a CHAR(1) column (1 byte) over an INT column (4 bytes). The implementation still honors ANSI semantics in that SELECT COUNT(*) includes nulls in the total count, as opposed to an explicit SELECT COUNT(col), which excludes nulls.
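The NULL semantics look like this in practice (the ratings table and its nullable score column are illustrative names):

```sql
SELECT COUNT(*)     FROM ratings;  -- counts all rows, NULLs included
SELECT COUNT(score) FROM ratings;  -- counts only rows where score IS NOT NULL
```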

ORDER BY and LIMIT are currently implemented at the very end by the MariaDB server process on the temporary result-set table, as mentioned in step #9 of how ColumnStore processes a query. So technically, the results are passed to the MariaDB Server for sorting.

For complex queries that use subqueries, the approach is basically the same: subqueries are executed in sequence and managed by the UM. Window functions are likewise handled by the UM, but use a dedicated, faster sort process.

ColumnStore partitions your data through its Extent Maps, which maintain the min/max values of column data and provide logical range partitioning, removing the need for indexing. Extent Maps also take the place of manual table partitioning, materialized views, summary tables, and other structures and objects that row-based databases must implement for query performance. There are certain benefits when column values are in order or semi-ordered, as this allows for very effective data partitioning: using the min and max values, entire extents can be eliminated after filter evaluation. See the page in the manual about Extent Elimination. This generally works particularly well for time-series data, or similar values that increase over time.
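For instance, on a hypothetical time-series table whose event_date values increase over time, a range filter lets the Extent Map's min/max metadata skip entire extents without reading them:

```sql
SELECT COUNT(*)
FROM page_views
WHERE event_date BETWEEN '2019-06-01' AND '2019-06-30';
```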

Installing The MariaDB ColumnStore

Installing MariaDB ColumnStore is simple and straightforward. MariaDB has a series of notes here which you can refer to. For this blog, our installation target environment is CentOS 7. You can go to https://downloads.mariadb.com/ColumnStore/1.2.4/ and check out the packages based on your OS environment. See the detailed steps below to help speed you up:

### Note: The installation details is ideal for root user installation
cd /root/
wget https://downloads.mariadb.com/ColumnStore/1.2.4/centos/x86_64/7/mariadb-columnstore-1.2.4-1-centos7.x86_64.rpm.tar.gz
tar xzf mariadb-columnstore-1.2.4-1-centos7.x86_64.rpm.tar.gz
sudo yum -y install boost expect perl perl-DBI openssl zlib snappy libaio perl-DBD-MySQL net-tools wget jemalloc
sudo rpm -ivh mariadb-columnstore*.rpm

Once done, you need to run the postConfigure command to finish installing and setting up your MariaDB ColumnStore. In this sample installation, there are two nodes I have set up, running on Vagrant machines:
csnode1:192.168.2.10
csnode2:192.168.2.20

Both of these nodes are defined in their respective /etc/hosts files, and both are set to have their User and Performance Modules combined on each host. The installation can be a bit tricky at first, so we share how you can configure it to give you a basis. See the details of the sample installation process below:

[root@csnode1 ~]# /usr/local/mariadb/columnstore/bin/postConfigure -d

This is the MariaDB ColumnStore System Configuration and Installation tool.
It will Configure the MariaDB ColumnStore System and will perform a Package
Installation of all of the Servers within the System that is being configured.

IMPORTANT: This tool requires to run on the Performance Module #1

Prompting instructions:

        Press 'enter' to accept a value in (), if available or
        Enter one of the options within [], if available, or
        Enter a new value


===== Setup System Server Type Configuration =====

There are 2 options when configuring the System Server Type: single and multi

  'single'  - Single-Server install is used when there will only be 1 server configured
              on the system. It can also be used for production systems, if the plan is
              to stay single-server.

  'multi'   - Multi-Server install is used when you want to configure multiple servers now or
              in the future. With Multi-Server install, you can still configure just 1 server
              now and add on addition servers/modules in the future.

Select the type of System Server install [1=single, 2=multi] (2) > 


===== Setup System Module Type Configuration =====

There are 2 options when configuring the System Module Type: separate and combined

  'separate' - User and Performance functionality on separate servers.

  'combined' - User and Performance functionality on the same server

Select the type of System Module Install [1=separate, 2=combined] (1) > 2

Combined Server Installation will be performed.
The Server will be configured as a Performance Module.
All MariaDB ColumnStore Processes will run on the Performance Modules.

NOTE: The MariaDB ColumnStore Schema Sync feature will replicate all of the
      schemas and InnoDB tables across the User Module nodes. This feature can be enabled
      or disabled, for example, if you wish to configure your own replication post installation.

MariaDB ColumnStore Schema Sync feature, do you want to enable? [y,n] (y) > 


NOTE: MariaDB ColumnStore Replication Feature is enabled

Enter System Name (columnstore-1) > 


===== Setup Storage Configuration =====


----- Setup Performance Module DBRoot Data Storage Mount Configuration -----

There are 2 options when configuring the storage: internal or external

  'internal' -    This is specified when a local disk is used for the DBRoot storage.
                  High Availability Server Failover is not Supported in this mode

  'external' -    This is specified when the DBRoot directories are mounted.
                  High Availability Server Failover is Supported in this mode.

Select the type of Data Storage [1=internal, 2=external] (1) > 

===== Setup Memory Configuration =====


NOTE: Setting 'NumBlocksPct' to 50%
      Setting 'TotalUmMemory' to 25%


===== Setup the Module Configuration =====


----- Performance Module Configuration -----

Enter number of Performance Modules [1,1024] (1) > 2

*** Parent OAM Module Performance Module #1 Configuration ***

Enter Nic Interface #1 Host Name (csnode1) > 
Enter Nic Interface #1 IP Address or hostname of csnode1 (unassigned) > 192.168.2.10
Enter Nic Interface #2 Host Name (unassigned) > 
Enter the list (Nx,Ny,Nz) or range (Nx-Nz) of DBRoot IDs assigned to module 'pm1' (1) > 

*** Performance Module #2 Configuration ***

Enter Nic Interface #1 Host Name (unassigned) > csnode2
Enter Nic Interface #1 IP Address or hostname of csnode2 (192.168.2.20) > 
Enter Nic Interface #2 Host Name (unassigned) > 
Enter the list (Nx,Ny,Nz) or range (Nx-Nz) of DBRoot IDs assigned to module 'pm2' () > 
Enter the list (Nx,Ny,Nz) or range (Nx-Nz) of DBRoot IDs assigned to module 'pm2' () > 2

===== Running the MariaDB ColumnStore MariaDB Server setup scripts =====

post-mysqld-install Successfully Completed
post-mysql-install Successfully Completed

Next step is to enter the password to access the other Servers.
This is either user password or you can default to using a ssh key
If using a user password, the password needs to be the same on all Servers.

Enter password, hit 'enter' to default to using a ssh key, or 'exit'> 

===== System Installation =====

System Configuration is complete.
Performing System Installation.

Performing a MariaDB ColumnStore System install using RPM packages
located in the /root directory.


----- Performing Install on 'pm2 / csnode2' -----

Install log file is located here: /tmp/columnstore_tmp_files/pm2_rpm_install.log


MariaDB ColumnStore Package being installed, please wait ...  DONE

===== Checking MariaDB ColumnStore System Logging Functionality =====

The MariaDB ColumnStore system logging is setup and working on local server

===== MariaDB ColumnStore System Startup =====

System Configuration is complete.
Performing System Installation.

----- Starting MariaDB ColumnStore on local server -----

MariaDB ColumnStore successfully started

MariaDB ColumnStore Database Platform Starting, please wait .......... DONE

System Catalog Successfully Created

Run MariaDB ColumnStore Replication Setup..  DONE

MariaDB ColumnStore Install Successfully Completed, System is Active

Enter the following command to define MariaDB ColumnStore Alias Commands

. /etc/profile.d/columnstoreAlias.sh

Enter 'mcsmysql' to access the MariaDB ColumnStore SQL console
Enter 'mcsadmin' to access the MariaDB ColumnStore Admin console

NOTE: The MariaDB ColumnStore Alias Commands are in /etc/profile.d/columnstoreAlias.sh

[root@csnode1 ~]# . /etc/profile.d/columnstoreAlias.sh
[root@csnode1 ~]#

Once installation and setup are done, MariaDB creates a master/slave setup, so whatever we load on csnode1 will be replicated to csnode2.

Dumping your Big Data

After your installation, you might have no sample data to try. IMDb shares sample data which you can download from their site https://www.imdb.com/interfaces/. For this blog, I created a script which does everything for you. Check it out here https://github.com/paulnamuag/columnstore-imdb-data-load. Just make it executable, then run it. It will do everything for you: downloading the files, creating the schema, then loading the data into the database. It's as simple as that.

Running Your Sample Queries

Now, let's try running some sample queries.

MariaDB [imdb]> select count(1), 'title_akas' table_name from title_akas union all select count(1), 'name_basics' as table_name from name_basics union all select count(1), 'title_crew' as table_name from title_crew union all select count(1), 'title_episode' as table_name from title_episode union all select count(1), 'title_ratings' as table_name from title_ratings order by 1 asc;
+----------+---------------+
| count(1) | table_name    |
+----------+---------------+
|   945057 | title_ratings |
|  3797618 | title_akas    |
|  4136880 | title_episode |
|  5953930 | title_crew    |
|  9403540 | name_basics   |
+----------+---------------+
5 rows in set (0.162 sec)
MariaDB [imdb]> select count(*), 'title_akas' table_name from title_akas union all select count(*), 'name_basics' as table_name from name_basics union all select count(*), 'title_crew' as table_name from title_crew union all select count(*), 'title_episode' as table_name from title_episode union all select count(*), 'title_ratings' as table_name from title_ratings order by 2;
+----------+---------------+
| count(*) | table_name    |
+----------+---------------+
|  9405192 | name_basics   |
|  3797618 | title_akas    |
|  5953930 | title_crew    |
|  4136880 | title_episode |
|   945057 | title_ratings |
+----------+---------------+
5 rows in set (0.371 sec)

Basically, it's fast. There are queries that you cannot process the same way you would with other storage engines, such as InnoDB. For example, I tried playing around with some careless queries to see how it reacts, with this result:

MariaDB [imdb]> select a.titleId, a.title, a.region, b.id, b.primaryName, b.profession from title_akas a join name_basics b where b.knownForTitles in (select a.titleId from title_akas) limit 25;
ERROR 1815 (HY000): Internal error: IDB-1000: 'a' and 'title_akas' are not joined.

Hence, I found MCOL-1620 and MCOL-131, which point to setting the infinidb_vtable_mode variable. See below:

MariaDB [imdb]> select a.titleId, a.title, a.region, b.id, b.primaryName, b.profession from title_akas a join name_basics b where b.knownForTitles in (select c.titleId from title_akas c) limit 2;
ERROR 1815 (HY000): Internal error: IDB-1000: 'a' and 'b, sub-query' are not joined.

Setting infinidb_vtable_mode=0 treats the query in a generic, highly compatible row-by-row processing mode. Some WHERE clause components can still be processed by ColumnStore, but joins are processed entirely by mysqld using a nested-loop join mechanism. See below:

MariaDB [imdb]> set infinidb_vtable_mode=0;
Query OK, 0 rows affected (0.000 sec)
MariaDB [imdb]> select a.titleId, a.title, a.region, b.id, b.primaryName, b.profession from title_akas a join name_basics b where b.knownForTitles in (select c.titleId from title_akas c) limit 2;
+-----------+---------------+--------+-----------+-------------+---------------+
| titleId   | title         | region | id        | primaryName | profession    |
+-----------+---------------+--------+-----------+-------------+---------------+
| tt0082880 | Vaticano Show | ES     | nm0594213 | Velda Mitzi | miscellaneous |
| tt0082880 | Il pap'occhio | IT     | nm0594213 | Velda Mitzi | miscellaneous |
+-----------+---------------+--------+-----------+-------------+---------------+
2 rows in set (13.789 sec)

It took some time, though, since the join was processed entirely by mysqld. Optimizing and writing good queries is still the best approach, rather than delegating everything to ColumnStore.

Additionally, you can get help analyzing your queries by running commands such as SELECT calSetTrace(1); or SELECT calGetStats();. You can use this set of commands to, for example, optimize slow queries or view the query plan. Check here for more details on analyzing queries.
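A typical tracing session might look like the sketch below (the orders table is a hypothetical name; the cal* functions are ColumnStore's documented diagnostic functions):

```sql
SELECT calSetTrace(1);                 -- enable tracing for this session
SELECT order_id FROM orders LIMIT 10;  -- run the query you want to analyze
SELECT calGetTrace();                  -- show the trace / query plan
SELECT calGetStats();                  -- show resource usage statistics
SELECT calSetTrace(0);                 -- disable tracing again
```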

Administering ColumnStore

Once you have fully set up MariaDB ColumnStore, it ships with a tool named mcsadmin which you can use for administrative tasks. You can also use this tool to add another module, assign or move DBRoots from PM to PM, etc. Check out the manual for this tool.

For example, checking the system information:

mcsadmin> getSystemi
getsysteminfo   Mon Jun 24 12:55:25 2019

System columnstore-1

System and Module statuses

Component     Status                       Last Status Change
------------  --------------------------   ------------------------
System        ACTIVE                       Fri Jun 21 21:40:56 2019

Module pm1    ACTIVE                       Fri Jun 21 21:40:54 2019
Module pm2    ACTIVE                       Fri Jun 21 21:40:50 2019

Active Parent OAM Performance Module is 'pm1'
Primary Front-End MariaDB ColumnStore Module is 'pm1'
MariaDB ColumnStore Replication Feature is enabled
MariaDB ColumnStore set for Distributed Install


MariaDB ColumnStore Process statuses

Process             Module    Status            Last Status Change        Process ID
------------------  ------    ---------------   ------------------------  ----------
ProcessMonitor      pm1       ACTIVE            Thu Jun 20 17:36:27 2019        6026
ProcessManager      pm1       ACTIVE            Thu Jun 20 17:36:33 2019        6165
DBRMControllerNode  pm1       ACTIVE            Fri Jun 21 21:40:31 2019       19890
ServerMonitor       pm1       ACTIVE            Fri Jun 21 21:40:33 2019       19955
DBRMWorkerNode      pm1       ACTIVE            Fri Jun 21 21:40:33 2019       20003
PrimProc            pm1       ACTIVE            Fri Jun 21 21:40:37 2019       20137
ExeMgr              pm1       ACTIVE            Fri Jun 21 21:40:42 2019       20541
WriteEngineServer   pm1       ACTIVE            Fri Jun 21 21:40:47 2019       20660
DDLProc             pm1       ACTIVE            Fri Jun 21 21:40:51 2019       20810
DMLProc             pm1       ACTIVE            Fri Jun 21 21:40:55 2019       20956
mysqld              pm1       ACTIVE            Fri Jun 21 21:40:41 2019       19778

ProcessMonitor      pm2       ACTIVE            Thu Jun 20 17:37:16 2019        9728
ProcessManager      pm2       HOT_STANDBY       Fri Jun 21 21:40:26 2019       25211
DBRMControllerNode  pm2       COLD_STANDBY      Fri Jun 21 21:40:32 2019
ServerMonitor       pm2       ACTIVE            Fri Jun 21 21:40:35 2019       25560
DBRMWorkerNode      pm2       ACTIVE            Fri Jun 21 21:40:36 2019       25593
PrimProc            pm2       ACTIVE            Fri Jun 21 21:40:40 2019       25642
ExeMgr              pm2       ACTIVE            Fri Jun 21 21:40:44 2019       25715
WriteEngineServer   pm2       ACTIVE            Fri Jun 21 21:40:48 2019       25768
DDLProc             pm2       COLD_STANDBY      Fri Jun 21 21:40:50 2019
DMLProc             pm2       COLD_STANDBY      Fri Jun 21 21:40:50 2019
mysqld              pm2       ACTIVE            Fri Jun 21 21:40:32 2019       25467

Active Alarm Counts: Critical = 1, Major = 0, Minor = 0, Warning = 0, Info = 0

Conclusion

MariaDB ColumnStore is a very powerful storage engine for OLAP and big data processing. It is entirely open source, which is a big advantage over the proprietary and expensive OLAP databases available on the market. There are other alternatives to try, such as ClickHouse, Apache HBase, or Citus Data's cstore_fdw. However, none of these use MySQL/MariaDB, so they might not be viable options if you choose to stick with the MySQL/MariaDB variants.

Database User Management: Managing Roles for MariaDB


It’s always a headache... you need to add a new user role or change some privileges, and you need to assign it one... by... one. This is a regular duty, especially in large organizations, or in a company where you have a complex privilege structure, or even if you have to manage a high number of database users. 

For example, let’s say you need to add the UPDATE privilege to a specific database for all the QA team, if they’re a team of five there’s no problem, but if they’re 50... or 100... that can get hard. Of course, you can always write a script for it, but in this way there is always risk.

In this blog, we’ll see how we can solve this database user management issue by using roles and with specific tips on how to use them with MariaDB.

What is a Role?

In the database world, a role is a group of privileges that can be assigned to one or more users, and a user can have one or more roles assigned to them. To draw a comparison, it’s like a group in a Linux OS.

Going back to the previous example of the UPDATE privilege for the QA team: if we have the QA role created and all the QA members have this role assigned, it doesn’t matter how many members there are. You only need to change the privilege on the QA role and it will be propagated to all the QA users.

Roles on MariaDB

To manage roles on MariaDB you must create the role with the CREATE ROLE statement, assign privileges to that role with a GRANT statement, and then grant the role to the user so they can use it. You can also set a default role, which the user will take automatically when connecting.

As a database user, you must set the role when you access the database (if there is no default role), and you can change the role if needed with a SET ROLE statement.

From the application side, you must be able to set the role (or use the default) before querying for this to work, so in old applications it could be complex to implement.

Let’s look at some specifics of roles on MariaDB.

  • Only one role can be active at the same time for the current user.
  • Since MariaDB 10.1 we have a Default Role. This role is automatically enabled when the user connects.
  • Roles are stored in the mysql.user table (rows with is_role='Y').
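
For instance, the default role mentioned above can be managed like this - a hedged sketch, using the role and user names from the example later in this post:

```sql
-- Make qateam the default role for testuser, so it becomes active
-- automatically at connection time, with no SET ROLE needed:
SET DEFAULT ROLE qateam FOR 'testuser'@'%';

-- Remove the default role again:
SET DEFAULT ROLE NONE FOR 'testuser'@'%';
```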

How to Check Roles

On MariaDB there are multiple ways to check roles:

  • SHOW GRANTS [ FOR (user | role) ]: List the grants for the current user or for a specific one.
    MariaDB [testing]> SHOW GRANTS for testuser@'%';
    +----------------------------------------------------------------------------------------------------------+
    | Grants for testuser@%                                                                                   |
    +----------------------------------------------------------------------------------------------------------+
    | GRANT USAGE ON *.* TO 'testuser'@'%' IDENTIFIED BY PASSWORD '*FAAFFE644E901CFAFAEC7562415E5FAEC243B8B2' |
    +----------------------------------------------------------------------------------------------------------+
    1 row in set (0.000 sec)
  • SELECT user FROM mysql.user WHERE is_role='Y': List the roles created in the database.
    MariaDB [testing]> SELECT user FROM mysql.user WHERE is_role='Y';
    +--------+
    | user   |
    +--------+
    | qateam |
    +--------+
    1 row in set (0.000 sec)
  • SELECT * FROM information_schema.applicable_roles: It’s a list of available roles for the current user.
    MariaDB [testing]> SELECT * FROM information_schema.applicable_roles;
    +-------------+-----------+--------------+------------+
    | GRANTEE     | ROLE_NAME | IS_GRANTABLE | IS_DEFAULT |
    +-------------+-----------+--------------+------------+
    | testuser@%  | qateam    | NO           | NO         |
    +-------------+-----------+--------------+------------+
    1 row in set (0.000 sec)
  • SELECT * FROM information_schema.enabled_roles: List the current active roles.
    MariaDB [testing]> SELECT * FROM information_schema.enabled_roles;
    +-----------+
    | ROLE_NAME |
    +-----------+
    | qateam    |
    +-----------+
    1 row in set (0.000 sec)
  • SELECT * FROM mysql.roles_mapping: List the relations between roles and user grants.
    MariaDB [testing]> SELECT * FROM mysql.roles_mapping;
    +-----------+-----------+--------+--------------+
    | Host      | User      | Role   | Admin_option |
    +-----------+-----------+--------+--------------+
    | localhost | root      | qateam | Y            |
    | %         | testuser  | qateam | N            |
    +-----------+-----------+--------+--------------+
    2 rows in set (0.000 sec)

How to Manage Roles on MariaDB

Let’s see an example of how to manage roles on MariaDB. In this case, we’ll use MariaDB 10.3 running on CentOS 7.

First, let’s create a new database user:

MariaDB [testing]> CREATE USER testuser@'%' IDENTIFIED BY 'PASSWORD';

If we check the grants for this new user, we’ll see something like this:

MariaDB [testing]> SHOW GRANTS for testuser@'%';
+----------------------------------------------------------------------------------------------------------+
| Grants for testuser@%                                                                                   |
+----------------------------------------------------------------------------------------------------------+
| GRANT USAGE ON *.* TO 'testuser'@'%' IDENTIFIED BY PASSWORD '*FAAFFE644E901CFAFAEC7562415E5FAEC243B8B2' |
+----------------------------------------------------------------------------------------------------------+
1 row in set (0.000 sec)

Now, let’s try to login with this user and connect to the testing database:

$ mysql -utestuser -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 54
Server version: 10.3.16-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> use testing
ERROR 1044 (42000): Access denied for user 'testuser'@'%' to database 'testing'

As we can see, we can’t access the testing database with this user, so now we’ll create a “qateam” role with the required privileges and assign the role to the new user.

MariaDB [testing]> CREATE ROLE qateam;
Query OK, 0 rows affected (0.001 sec)
MariaDB [testing]> GRANT SELECT,INSERT,UPDATE,DELETE ON testing.* TO qateam;
Query OK, 0 rows affected (0.000 sec)

If we try to use this role without the GRANT, we’ll see the following error:

MariaDB [(none)]> SET ROLE qateam;
ERROR 1959 (OP000): Invalid role specification `qateam`

So, now we’ll run the GRANT to allow the user to use it:

MariaDB [(none)]> GRANT qateam TO testuser@'%';
Query OK, 0 rows affected (0.000 sec)

Set the role to the current user:

MariaDB [(none)]> SET ROLE qateam;
Query OK, 0 rows affected (0.000 sec)

And try to access the database:

MariaDB [(none)]> use testing;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [testing]>

We can check the grants for the current user:

MariaDB [(none)]> SHOW GRANTS for testuser@'%';
+----------------------------------------------------------------------------------------------------------+
| Grants for testuser@%                                                                                   |
+----------------------------------------------------------------------------------------------------------+
| GRANT qateam TO 'testuser'@'%'                                                                          |
| GRANT USAGE ON *.* TO 'testuser'@'%' IDENTIFIED BY PASSWORD '*FAAFFE644E901CFAFAEC7562415E5FAEC243B8B2' |
+----------------------------------------------------------------------------------------------------------+
2 rows in set (0.000 sec)

And the current role:

MariaDB [testing]> SELECT CURRENT_ROLE;
+--------------+
| CURRENT_ROLE |
+--------------+
| qateam       |
+--------------+
1 row in set (0.000 sec)

Here we can see the grant for the qateam role, and that’s it: we don’t have the privileges assigned directly to the user. We have the privileges on the role, and the user takes the privileges from there.
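
The cleanup is just as centralized as the assignment. A sketch of the reverse operations, using the same names as above:

```sql
-- Take the role away from a single user:
REVOKE qateam FROM 'testuser'@'%';

-- Or remove one privilege from the role for every user holding it:
REVOKE UPDATE ON testing.* FROM qateam;

-- Drop the role entirely when it is no longer needed:
DROP ROLE qateam;
```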

Conclusion

Managing roles can make our life easier in large companies or in databases with a high number of users accessing them. If we want to use roles from our application, we must take into account that the application must be able to manage them too.

How to Setup Asynchronous Replication Between MariaDB Galera Clusters


Galera Cluster, with its (virtually) synchronous replication, is commonly used in many different types of environments. Scaling it by adding new nodes is not hard (or is just a couple of clicks when you use ClusterControl).

The main problem with synchronous replication is, well, the synchronous part, which often results in the whole cluster being only as fast as its slowest node. Any write executed on the cluster has to be replicated to all of the nodes and certified on them. If, for whatever reason, this process slows down, it can seriously impact the cluster’s ability to accommodate writes. Flow control will then kick in to ensure that the slowest node can still keep up with the load. This makes things quite tricky for some of the common scenarios that happen in real-world environments.

First off, let’s discuss geographically distributed disaster recovery. Sure, you can run clusters across a Wide Area Network, but the increased latency will have a significant impact on the cluster’s performance. This seriously limits the usefulness of such a setup, especially over longer distances where latency is higher.

Another quite common use case is a test environment for a major version upgrade. It is not a good idea to mix different versions of MariaDB Galera Cluster nodes in the same cluster, even if it is possible. On the other hand, migration to a more recent version requires detailed tests. Ideally, both reads and writes would be tested. One way to achieve that is to create a separate Galera cluster and run the tests, but you would like to run the tests in an environment as close to production as possible. Once provisioned, a cluster can be used for tests with real-world queries, but it would be hard to generate a workload close to that of production. You cannot move some of the production traffic to such a test system, because the data is not current.

Finally, the migration itself. Again, as we said earlier, even if it is possible to mix old and new versions of Galera nodes in the same cluster, it is not the safest way to do it.

Luckily, the simplest solution for all three of those issues is to connect separate Galera clusters with asynchronous replication. What makes it such a good solution? Well, it’s asynchronous, so it does not affect the Galera replication. There is no flow control, thus the performance of the “master” cluster will not be affected by the performance of the “slave” cluster. As with every asynchronous replication, a lag may show up, but as long as it stays within acceptable limits, it can work perfectly fine. You also have to keep in mind that nowadays asynchronous replication can be parallelized (multiple threads can work together to increase bandwidth), reducing replication lag even further.

In this blog post we will discuss the steps to deploy asynchronous replication between MariaDB Galera clusters.

How to Configure Asynchronous Replication Between MariaDB Galera Clusters?

First off, we have to deploy a cluster. For our purposes we set up a three-node cluster. We will keep the setup to a minimum, thus we will not discuss the complexity of the application and proxy layer. A proxy layer may be very useful for the tasks for which you want to deploy asynchronous replication: redirecting a subset of the read-only traffic to the test cluster, or helping in a disaster recovery situation, when the “main” cluster is not available, by redirecting the traffic to the DR cluster. There are numerous proxies you can try, depending on your preference - HAProxy, MaxScale or ProxySQL - all can be used in such setups and, depending on the case, some of them may be able to help you manage your traffic.

Configuring the Source Cluster

Our cluster consists of three MariaDB 10.3 nodes; we also deployed ProxySQL to do the read-write split and distribute the traffic across all nodes in the cluster. This is not a production-grade deployment - for that we would have to deploy more ProxySQL nodes and Keepalived on top of them - but it is enough for our purposes. To set up asynchronous replication, we have to have the binary log enabled on our cluster. On at least one node, but it’s better to keep it enabled on all of them in case the only node with the binlog enabled goes down - then you want to have another node in the cluster up and running that you can slave off.

When enabling the binary log, make sure that you configure binary log rotation so the old logs are removed at some point. You should use the ROW binary log format. You should also ensure that you have GTID configured and in use - it will come in very handy when you have to reslave your “slave” cluster or if you need to enable multi-threaded replication. As this is a Galera cluster, you want to have ‘wsrep_gtid_domain_id’ configured and ‘wsrep_gtid_mode’ enabled. Those settings will ensure that GTIDs are generated for the traffic coming from the Galera cluster. More information can be found in the documentation. Once this is all done, you can proceed with setting up the second cluster.
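
Putting these requirements together, a my.cnf fragment for each source-cluster node might look like the following sketch. The domain id 9999 matches the GTIDs shown later in this post; the paths, server_id and retention period are illustrative assumptions:

```ini
[mysqld]
server_id            = 1001                          # must be unique per node
log_bin              = /var/lib/mysql-binlog/binlog
binlog_format        = ROW
expire_logs_days     = 7                             # rotate old binlogs away
log_slave_updates    = ON
wsrep_gtid_domain_id = 9999                          # stamps Galera writes with GTIDs
wsrep_gtid_mode      = ON
```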

Setting Up the Target Cluster

Given that currently there is no target cluster, we have to start with deploying it. We will not cover those steps in detail, you can find instructions in the documentation. Generally speaking the process consists of several steps:

  1. Configure MariaDB repositories
  2. Install MariaDB 10.3 packages
  3. Configure nodes to form a cluster

At the beginning we will start with just one node. You can set up all of them to form a cluster, but then you should stop them and use just one for the next step. That one node will become a slave of the original cluster. We will use mariabackup to provision it, and then we will configure the replication.

First, we have to create a directory where we will store the backup:

mkdir /mnt/mariabackup

Then we execute the backup and create it in the directory prepared in the step above. Please make sure you use the correct user and password to connect to the database:

mariabackup --backup --user=root --password=pass --target-dir=/mnt/mariabackup/

Next, we have to copy the backup files to the first node in the second cluster. We used scp for that, you can use whatever you like - rsync, netcat, anything which will work.

scp -r /mnt/mariabackup/* 10.0.0.104:/root/mariabackup/

After the backup has been copied, we have to prepare it by applying the log files:

mariabackup --prepare --target-dir=/root/mariabackup/
mariabackup based on MariaDB server 10.3.16-MariaDB debian-linux-gnu (x86_64)
[00] 2019-06-24 08:35:39 cd to /root/mariabackup/
[00] 2019-06-24 08:35:39 This target seems to be not prepared yet.
[00] 2019-06-24 08:35:39 mariabackup: using the following InnoDB configuration for recovery:
[00] 2019-06-24 08:35:39 innodb_data_home_dir = .
[00] 2019-06-24 08:35:39 innodb_data_file_path = ibdata1:100M:autoextend
[00] 2019-06-24 08:35:39 innodb_log_group_home_dir = .
[00] 2019-06-24 08:35:39 InnoDB: Using Linux native AIO
[00] 2019-06-24 08:35:39 Starting InnoDB instance for recovery.
[00] 2019-06-24 08:35:39 mariabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
2019-06-24  8:35:39 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-06-24  8:35:39 0 [Note] InnoDB: Uses event mutexes
2019-06-24  8:35:39 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
2019-06-24  8:35:39 0 [Note] InnoDB: Number of pools: 1
2019-06-24  8:35:39 0 [Note] InnoDB: Using SSE2 crc32 instructions
2019-06-24  8:35:39 0 [Note] InnoDB: Initializing buffer pool, total size = 100M, instances = 1, chunk size = 100M
2019-06-24  8:35:39 0 [Note] InnoDB: Completed initialization of buffer pool
2019-06-24  8:35:39 0 [Note] InnoDB: page_cleaner coordinator priority: -20
2019-06-24  8:35:39 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=3448619491
2019-06-24  8:35:40 0 [Note] InnoDB: Starting final batch to recover 759 pages from redo log.
2019-06-24  8:35:40 0 [Note] InnoDB: Last binlog file '/var/lib/mysql-binlog/binlog.000003', position 865364970
[00] 2019-06-24 08:35:40 Last binlog file /var/lib/mysql-binlog/binlog.000003, position 865364970
[00] 2019-06-24 08:35:40 mariabackup: Recovered WSREP position: e79a3494-964f-11e9-8a5c-53809a3c5017:25740

[00] 2019-06-24 08:35:41 completed OK!

In case of any error you may have to re-execute the backup. If everything went OK, we can remove the old data and replace it with the backup contents:

rm -rf /var/lib/mysql/*
mariabackup --copy-back --target-dir=/root/mariabackup/
…
[01] 2019-06-24 08:37:06 Copying ./sbtest/sbtest10.frm to /var/lib/mysql/sbtest/sbtest10.frm
[01] 2019-06-24 08:37:06         ...done
[00] 2019-06-24 08:37:06 completed OK!

We also want to set the correct owner of the files:

chown -R mysql.mysql /var/lib/mysql/

We will rely on GTID to keep the replication consistent, thus we need to see what the last applied GTID in this backup was. That information can be found in the xtrabackup_info file that’s part of the backup:

root@vagrant:~/mariabackup# cat /var/lib/mysql/xtrabackup_info | grep binlog_pos
binlog_pos = filename 'binlog.000003', position '865364970', GTID of the last change '9999-1002-23012'

We also have to ensure that the slave node has binary logs enabled along with ‘log_slave_updates’. Ideally, this will be enabled on all of the nodes in the second cluster - just in case the “slave” node fails and you have to set up the replication using another node in the slave cluster.
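
In other words, the nodes of the second cluster need a binlog configuration of their own. A minimal sketch (the paths are assumptions):

```ini
[mysqld]
log_bin           = /var/lib/mysql-binlog/binlog
binlog_format     = ROW
log_slave_updates = ON   # so that writes replicated from the master cluster
                         # also land in this node's own binlog
```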

The last bit we need to do before we can set up the replication is to create a user which we will use to run the replication:

MariaDB [(none)]> CREATE USER 'repuser'@'10.0.0.104' IDENTIFIED BY 'reppass';
Query OK, 0 rows affected (0.077 sec)
MariaDB [(none)]> GRANT REPLICATION SLAVE ON *.*  TO 'repuser'@'10.0.0.104';
Query OK, 0 rows affected (0.012 sec)

That’s all we need. Now, we can start the first node in the second cluster, our to-be-slave:

galera_new_cluster

Once it’s started, we can enter the MySQL CLI and configure it to become a slave, using the GTID position we found a couple of steps earlier:

mysql -ppass
MariaDB [(none)]> SET GLOBAL gtid_slave_pos = '9999-1002-23012';
Query OK, 0 rows affected (0.026 sec)

Once that’s done, we can finally set up the replication and start it:

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='10.0.0.101', MASTER_PORT=3306, MASTER_USER='repuser', MASTER_PASSWORD='reppass', MASTER_USE_GTID=slave_pos;
Query OK, 0 rows affected (0.016 sec)
MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.010 sec)

At this point we have a Galera Cluster consisting of one node. That node is also a slave of the original cluster (in particular, its master is node 10.0.0.101). To join the other nodes we will use SST, but to make it work we first have to ensure that the SST configuration is correct - please keep in mind that we just replaced all the users in our second cluster with the contents of the source cluster. What you have to do now is to ensure that the ‘wsrep_sst_auth’ configuration of the second cluster matches the one of the first cluster. Once that’s done, you can start the remaining nodes one by one; they should join the existing node (10.0.0.104), get the data over SST and form the Galera cluster. Eventually, you should end up with two clusters, three nodes each, with an asynchronous replication link across them (from 10.0.0.101 to 10.0.0.104 in our example). You can confirm that the replication is working by checking the value of:

MariaDB [(none)]> show global status like 'wsrep_last_committed';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_last_committed | 106   |
+----------------------+-------+
1 row in set (0.001 sec)
MariaDB [(none)]> show global status like 'wsrep_last_committed';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| wsrep_last_committed | 114   |
+----------------------+-------+
1 row in set (0.001 sec)

How to Configure Asynchronous Replication Between MariaDB Galera Clusters Using ClusterControl?

As of the time of this blog, ClusterControl does not have the functionality to configure asynchronous replication across multiple clusters; we are working on it as I type this. Nonetheless, ClusterControl can be of great help in this process - we will show you how you can speed up the laborious manual steps using the automation provided by ClusterControl.

From what we showed before, we can conclude that these are the general steps to take when setting up replication between two Galera clusters:

  1. Deploy a new Galera cluster
  2. Provision new cluster using data from the old one
  3. Configure new cluster (SST configuration, binary logs)
  4. Set up the replication between the old and the new cluster

The first three points are something you can easily do using ClusterControl even now. We are going to show you how to do that.

Deploy and Provision a New MariaDB Galera Cluster Using ClusterControl

The initial situation is similar - we have one cluster up and running, and we have to set up a second one. One of the more recent features of ClusterControl is the option to deploy a new cluster and provision it using data from a backup. This is very useful for creating test environments, and it is also the option we will use to provision our new cluster for the replication setup. Therefore the first step we will take is to create a backup using mariabackup:

We picked the node to take the backup from. This node (10.0.0.101) will become the master. It has to have binary logs enabled. In our case all of the nodes have the binlog enabled, but if they didn’t, it’s very easy to enable it from ClusterControl - we will show those steps later, when we do it for the second cluster.

Once the backup is completed, it will become visible on the list. We can then proceed and restore it:

Should we want to, we could even do a Point-In-Time Recovery, but in our case it does not really matter: once the replication is configured, all required transactions from the binlogs will be applied on the new cluster.

Then we pick the option to create a cluster from the backup. This opens another dialog:

It is a confirmation of which backup will be used, which host the backup was taken from, what method was used to create it, and some metadata to help verify whether the backup looks sound.

Then we basically go to the regular deployment wizard, in which we have to define SSH connectivity between the ClusterControl host and the nodes to deploy the cluster on (a requirement for ClusterControl) and, in the second step, the vendor, version, password and nodes to deploy on:

That’s all regarding deployment and provisioning. ClusterControl will set up the new cluster and it will provision it using the data from the old one.

We can monitor the progress in the Activity tab. Once completed, the second cluster will show up on the cluster list in ClusterControl.

Reconfiguration of the New Cluster Using ClusterControl

Now, we have to reconfigure the cluster - we will enable binary logs. In the manual process we had to make changes in the wsrep_sst_auth config and also in the configuration entries in the [mysqldump] and [xtrabackup] sections of the config. Those settings can be found in the secrets-backup.cnf file. This time it is not needed, as ClusterControl generated new passwords for the cluster and configured the files correctly. What is important to keep in mind, though, is that should you change the password of the ‘backupuser’@’127.0.0.1’ user in the original cluster, you will have to make the corresponding configuration changes in the second cluster too, as changes in the first cluster will replicate to the second cluster.

Binary logs can be enabled from the Nodes section. You have to pick node by node and run the “Enable Binary Logging” job. You will be presented with a dialog:

Here you can define how long you would like to keep the logs, where they should be stored and if ClusterControl should restart the node for you to apply changes - binary log configuration is not dynamic and MariaDB has to be restarted to apply those changes.

When the changes complete, you will see all nodes marked as “master”, which means that those nodes have the binary log enabled and can act as a master.

If we do not have a replication user created already, we have to do that. In the first cluster we go to Manage -> Schemas and Users:

On the right hand side we have an option to create a new user:

This concludes the configuration required to set up the replication.

Setting Up Replication Between Clusters Using ClusterControl

As we stated, we are working on automating this part; currently it has to be done manually. As you may remember, we need the GTID position of our backup and then to run a couple of commands using the MySQL CLI. The GTID data is available in the backup. ClusterControl creates the backup using xbstream/mbstream and compresses it afterwards. Our backup is stored on the ClusterControl host, where we don’t have access to the mbstream binary. You can try to install it, or you can copy the backup file to a location where such a binary is available:

scp /root/backups/BACKUP-2/backup-full-2019-06-24_144329.xbstream.gz 10.0.0.104:/root/mariabackup/

Once that’s done, on 10.0.0.104 we want to check the contents of xtrabackup_info file:

cd /root/mariabackup
zcat backup-full-2019-06-24_144329.xbstream.gz | mbstream -x
root@vagrant:~/mariabackup# cat /root/mariabackup/xtrabackup_info | grep binlog_pos
binlog_pos = filename 'binlog.000007', position '379', GTID of the last change '9999-1002-846116'

Finally, we configure the replication and start it:

MariaDB [(none)]> SET GLOBAL gtid_slave_pos ='9999-1002-846116';
Query OK, 0 rows affected (0.024 sec)
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='10.0.0.101', MASTER_PORT=3306, MASTER_USER='repuser', MASTER_PASSWORD='reppass', MASTER_USE_GTID=slave_pos;
Query OK, 0 rows affected (0.024 sec)
MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.010 sec)

That’s it - we just configured asynchronous replication between two MariaDB Galera clusters using ClusterControl. As you can see, ClusterControl was able to automate the majority of the steps we had to take in order to set up this environment.

How to Use Failover Mechanism of MaxScale


Ever since ClusterControl 1.2.11 was released in 2015, MariaDB MaxScale has been supported as a database load balancer. Over the years MaxScale has grown and matured, adding several rich features. Recently MariaDB MaxScale 2.2 was released and it introduces several new features including replication cluster failover management.

MariaDB MaxScale allows for master/slave deployments with high availability, automatic failover, manual switchover, and automatic rejoin. If the master fails, MariaDB MaxScale can automatically promote the most up-to-date slave to master. If the failed master is recovered, MariaDB MaxScale can automatically reconfigure it as a slave to the new master. In addition, administrators can perform a manual switchover to change the master on demand.

In our previous blogs we discussed how to Deploy MaxScale Using ClusterControl as well as Deploying MariaDB MaxScale on Docker. For those who are not yet familiar with MariaDB MaxScale, it is an advanced, plug-in database proxy for MariaDB database servers. MaxScale sits between client applications and the database servers, routing client queries and server responses. It also monitors the servers, quickly noticing any changes in server status or replication topology.

Though MaxScale shares some of the characteristics of other load balancing technologies like ProxySQL, this new failover feature (which is part of its monitoring and autodetection mechanism) stands out. In this blog we’re going to discuss this exciting new function of MaxScale.

Overview of the MariaDB MaxScale Failover Mechanism

Master Detection

The monitor is now less likely to suddenly change the master server, even if another server has more slaves than the current master. The DBA can force a master reselection by setting the current master read-only, or by removing all its slaves if the master is down.

Only one server can have the Master status flag at a time, even in a multimaster setup. Other servers in the multimaster group are given the Relay Master and Slave status flags.

Switchover New Master Autoselection

The switchover command can now be called with just the monitor instance name as a parameter. In this case the monitor will automatically select a server for promotion.

Replication Lag Detection

The replication lag measurement now simply reads the Seconds_Behind_Master field of the slave status output on the slaves. The slave calculates this value by comparing the timestamp in the binlog event it is currently processing to its own clock. If a slave has multiple slave connections, the smallest lag is used.

Automatic Switchover After Low Disk Space Detection

With recent MariaDB Server versions, the monitor can now check the disk space on the backend and detect if a server is running low. When this happens, the monitor can be set to automatically switch over from a master that is low on disk space. Slaves can also be set to maintenance mode. Disk space is also a factor considered when selecting which new master to promote.

See switchover_on_low_disk_space and maintenance_on_low_disk_space for more information.
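
As a sketch, the failover-related behavior described above is driven by the monitor section of the MaxScale configuration. The parameter names below come from the mariadbmon module documentation; the section name, server list and credentials are assumptions, and the availability of the disk-space option depends on your MaxScale version:

```ini
[MariaDB-Monitor]
type                         = monitor
module                       = mariadbmon
servers                      = DB_783,DB_784,DB_785
user                         = maxuser
password                     = maxpwd
monitor_interval             = 2000      # milliseconds between checks
auto_failover                = true      # promote a slave if the master fails
auto_rejoin                  = true      # redirect a recovered old master to the new one
switchover_on_low_disk_space = true
```

With such a monitor in place, a manual switchover with automatic master selection could then be triggered with `maxctrl call command mariadbmon switchover MariaDB-Monitor`.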

Replication Reset Feature

The reset-replication monitor command deletes all slave connections and binary logs, and then sets up replication. This is useful when the data is in sync but the GTIDs are not.

Scheduled Events Handling in Failover/Switchover/Rejoin

Server events launched by the event scheduler thread are now handled during cluster modification operations. See handle_server_events for more information.

External Master Support

The monitor can detect if a server in the cluster is replicating from an external master (a server that is not being monitored by the MaxScale monitor). If the replicating server is the cluster master server, then the cluster itself is considered to have an external master.

If a failover/switchover happens, the new master server is set to replicate from the cluster external master server. The username and password for the replication are defined in replication_user and replication_password. The address and port used are the ones shown by SHOW ALL SLAVES STATUS on the old cluster master server. In the case of switchover, the old master also stops replicating from the external server to preserve the topology.

After failover the new master is replicating from the external master. If the failed old master comes back online, it is also replicating from the external server. To normalize the situation, either have auto_rejoin on or manually execute a rejoin. This will redirect the old master to the current cluster master.

How Failover is Useful and Applicable?

Failover helps you minimize downtime, perform daily maintenance, and handle disastrous and unplanned outages that can sometimes occur at unfortunate times. With MaxScale’s ability to insulate client applications from the backend database servers, it adds valuable functionality that helps minimize downtime.

The MaxScale monitoring plugin continuously monitors the state of backend database servers. MaxScale’s routing plugin then uses this status information to always route queries to backend database servers that are in service. It is then able to send queries to the backend database clusters, even if some of the servers of a cluster are going through maintenance or experiencing failure.

MaxScale’s high configurability enables changes in cluster configuration to remain transparent to client applications. For example, if a new server needs to be administratively added to or removed from a master-slave cluster, you can simply update the server list of the monitor and router plugins in the MaxScale configuration via the maxadmin CLI console. The client application will be completely unaware of this change and will continue to send database queries to MaxScale’s listening port.

Setting a database server in maintenance is simple and easy. Simply run the following command using maxctrl and MaxScale will stop sending any queries to this server. For example,

 maxctrl: set server DB_785 maintenance
OK

Then checking the servers state as follows,

 maxctrl: list servers
┌────────┬───────────────┬──────┬─────────────┬──────────────────────┬────────────┐
│ Server │ Address       │ Port │ Connections │ State                │ GTID       │
├────────┼───────────────┼──────┼─────────────┼──────────────────────┼────────────┤
│ DB_783 │ 192.168.10.10 │ 3306 │ 0           │ Master, Running      │ 0-43001-70 │
├────────┼───────────────┼──────┼─────────────┼──────────────────────┼────────────┤
│ DB_784 │ 192.168.10.20 │ 3306 │ 0           │ Slave, Running       │ 0-43001-70 │
├────────┼───────────────┼──────┼─────────────┼──────────────────────┼────────────┤
│ DB_785 │ 192.168.10.30 │ 3306 │ 0           │ Maintenance, Running │ 0-43001-70 │
└────────┴───────────────┴──────┴─────────────┴──────────────────────┴────────────┘

Once in maintenance mode, MaxScale will stop routing any new requests to the server. For current requests, MaxScale will not kill these sessions, but rather will allow them to complete their execution, and it will not interrupt any running queries while in maintenance mode. Also, take note that the maintenance mode is not persistent. If MaxScale restarts while a node is in maintenance mode, a new instance of MariaDB MaxScale will not honor this mode. If multiple MariaDB MaxScale instances are configured to use the node, then maintenance mode must be set within each MariaDB MaxScale instance. However, if multiple services within one MariaDB MaxScale instance are using the server, then you only need to set the maintenance mode once on the server for all services to take note of the mode change.

Once done with your maintenance, just clear the server with the following command. For example,

 maxctrl: clear server DB_785 maintenance
OK

To check whether it's set back to normal, just run the command list servers.

You can also apply certain administrative actions through ClusterControl UI as well. See the example screenshot below:

MaxScale Failover In-Action

The Automatic Failover

MariaDB MaxScale's failover performs very efficiently and reconfigures the slaves as expected. In this test, we have the following configuration, which was created and managed by ClusterControl. See below:

[replication_monitor]
type=monitor
servers=DB_783,DB_784,DB_785
disk_space_check_interval=1000
disk_space_threshold=/:85
detect_replication_lag=true
enforce_read_only_slaves=true
failcount=3
auto_failover=1
auto_rejoin=true
monitor_interval=300
password=725DE70F196694B277117DC825994D44
user=maxscalecc
replication_password=5349E1268CC4AF42B919A42C8E52D185
replication_user=rpl_user
module=mariadbmon

Take note that auto_failover and auto_rejoin are the only variables that I have added, since ClusterControl won't add them by default once you set up a MaxScale load balancer (check out this blog on how to set up MaxScale using ClusterControl). Do not forget that you need to restart MariaDB MaxScale once you have applied the changes to your configuration file. Just run,

systemctl restart maxscale

and you're good to go.

Before proceeding with the failover test, let's first check the cluster's health:

 maxctrl: list servers
┌────────┬───────────────┬──────┬─────────────┬─────────────────┬────────────┐
│ Server │ Address       │ Port │ Connections │ State           │ GTID       │
├────────┼───────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ DB_783 │ 192.168.10.10 │ 3306 │ 0           │ Master, Running │ 0-43001-75 │
├────────┼───────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ DB_784 │ 192.168.10.20 │ 3306 │ 0           │ Slave, Running  │ 0-43001-75 │
├────────┼───────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ DB_785 │ 192.168.10.30 │ 3306 │ 0           │ Slave, Running  │ 0-43001-75 │
└────────┴───────────────┴──────┴─────────────┴─────────────────┴────────────┘

Looks great!

I killed the master with a plain kill -9 $(pidof mysqld) on my master node and, to no surprise, the monitor was quick to notice this and triggered the failover. See the logs as follows:

2019-06-28 06:39:14.306   error  : (mon_log_connect_error): Monitor was unable to connect to server DB_783[192.168.10.10:3306] : 'Can't connect to MySQL server on '192.168.10.10' (115)'
2019-06-28 06:39:14.329   notice : (mon_log_state_change): Server changed state: DB_783[192.168.10.10:3306]: master_down. [Master, Running] -> [Down]
2019-06-28 06:39:14.329   warning: (handle_auto_failover): Master has failed. If master status does not change in 2 monitor passes, failover begins.
2019-06-28 06:39:15.011   notice : (select_promotion_target): Selecting a server to promote and replace 'DB_783'. Candidates are: 'DB_784', 'DB_785'.
2019-06-28 06:39:15.011   warning: (warn_replication_settings): Slave 'DB_784' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode
2019-06-28 06:39:15.012   warning: (warn_replication_settings): Slave 'DB_785' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode
2019-06-28 06:39:15.012   notice : (select_promotion_target): Selected 'DB_784'.
2019-06-28 06:39:15.012   notice : (handle_auto_failover): Performing automatic failover to replace failed master 'DB_783'.
2019-06-28 06:39:15.017   notice : (redirect_slaves_ex): Redirecting 'DB_785' to replicate from 'DB_784' instead of 'DB_783'.
2019-06-28 06:39:15.024   notice : (redirect_slaves_ex): All redirects successful.
2019-06-28 06:39:15.527   notice : (wait_cluster_stabilization): All redirected slaves successfully started replication from 'DB_784'.
2019-06-28 06:39:15.527   notice : (handle_auto_failover): Failover 'DB_783' -> 'DB_784' performed.
2019-06-28 06:39:15.634   notice : (mon_log_state_change): Server changed state: DB_784[192.168.10.20:3306]: new_master. [Slave, Running] -> [Master, Running]
2019-06-28 06:39:20.165   notice : (mon_log_state_change): Server changed state: DB_783[192.168.10.10:3306]: slave_up. [Down] -> [Slave, Running]

Now let's have a look at the cluster's health:

 maxctrl: list servers
┌────────┬───────────────┬──────┬─────────────┬─────────────────┬────────────┐
│ Server │ Address       │ Port │ Connections │ State           │ GTID       │
├────────┼───────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ DB_783 │ 192.168.10.10 │ 3306 │ 0           │ Down            │ 0-43001-75 │
├────────┼───────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ DB_784 │ 192.168.10.20 │ 3306 │ 0           │ Master, Running │ 0-43001-75 │
├────────┼───────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ DB_785 │ 192.168.10.30 │ 3306 │ 0           │ Slave, Running  │ 0-43001-75 │
└────────┴───────────────┴──────┴─────────────┴─────────────────┴────────────┘

The node 192.168.10.10, which was previously the master, is down. I restarted it to see if auto-rejoin would trigger and, as you can see in the log entry at 2019-06-28 06:39:20.165, MaxScale was quick to catch the state of the node and then set up the replication configuration automatically, with no hassle for the DBA.

Finally, checking its state, everything works as expected. See below:

 maxctrl: list servers
┌────────┬───────────────┬──────┬─────────────┬─────────────────┬────────────┐
│ Server │ Address       │ Port │ Connections │ State           │ GTID       │
├────────┼───────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ DB_783 │ 192.168.10.10 │ 3306 │ 0           │ Slave, Running  │ 0-43001-75 │
├────────┼───────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ DB_784 │ 192.168.10.20 │ 3306 │ 0           │ Master, Running │ 0-43001-75 │
├────────┼───────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ DB_785 │ 192.168.10.30 │ 3306 │ 0           │ Slave, Running  │ 0-43001-75 │
└────────┴───────────────┴──────┴─────────────┴─────────────────┴────────────┘

My ex-Master Has Been Fixed and Recovered and I Want to Switch Over

Switching over to your previous master is no hassle as well. You can operate this with maxctrl (or maxadmin in previous versions of MaxScale) or through ClusterControl UI (as previously demonstrated).

Let's refer back to the previous state of the replication cluster's health: we want to switch 192.168.10.10 (currently a slave) back to its master role. Before we proceed, you first need to identify the monitor you are going to use. You can verify this with the following command:

 maxctrl: list monitors
┌─────────────────────┬─────────┬────────────────────────┐
│ Monitor             │ State   │ Servers                │
├─────────────────────┼─────────┼────────────────────────┤
│ replication_monitor │ Running │ DB_783, DB_784, DB_785 │
└─────────────────────┴─────────┴────────────────────────┘

Once you have it, you can run the following command to switch over:

maxctrl: call command mariadbmon switchover replication_monitor DB_783 DB_784
OK

Then check again the state of the cluster,

 maxctrl: list servers
┌────────┬───────────────┬──────┬─────────────┬─────────────────┬────────────┐
│ Server │ Address       │ Port │ Connections │ State           │ GTID       │
├────────┼───────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ DB_783 │ 192.168.10.10 │ 3306 │ 0           │ Master, Running │ 0-43001-75 │
├────────┼───────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ DB_784 │ 192.168.10.20 │ 3306 │ 0           │ Slave, Running  │ 0-43001-75 │
├────────┼───────────────┼──────┼─────────────┼─────────────────┼────────────┤
│ DB_785 │ 192.168.10.30 │ 3306 │ 0           │ Slave, Running  │ 0-43001-75 │
└────────┴───────────────┴──────┴─────────────┴─────────────────┴────────────┘

Looks perfect!

The logs will verbosely show you how it went and the series of actions taken during the switchover. See the details below:

2019-06-28 07:03:48.064   error  : (switchover_prepare): 'DB_784' is not a valid promotion target for switchover because it is already the master.
2019-06-28 07:03:48.064   error  : (manual_switchover): Switchover cancelled.
2019-06-28 07:04:30.700   notice : (create_start_slave): Slave connection from DB_784 to [192.168.10.10]:3306 created and started.
2019-06-28 07:04:30.700   notice : (redirect_slaves_ex): Redirecting 'DB_785' to replicate from 'DB_783' instead of 'DB_784'.
2019-06-28 07:04:30.708   notice : (redirect_slaves_ex): All redirects successful.
2019-06-28 07:04:31.209   notice : (wait_cluster_stabilization): All redirected slaves successfully started replication from 'DB_783'.
2019-06-28 07:04:31.209   notice : (manual_switchover): Switchover 'DB_784' -> 'DB_783' performed.
2019-06-28 07:04:31.318   notice : (mon_log_state_change): Server changed state: DB_783[192.168.10.10:3306]: new_master. [Slave, Running] -> [Master, Running]
2019-06-28 07:04:31.318   notice : (mon_log_state_change): Server changed state: DB_784[192.168.10.20:3306]: new_slave. [Master, Running] -> [Slave, Running]

In the case of an invalid switchover request, MaxScale will not proceed and will instead generate an error, as shown in the log above. So you'll be safe, with no scary surprises at all.

Making Your MaxScale Highly Available

While it's a bit of a side topic in regards to failover, I wanted to add some valuable points here about high availability and how it relates to MariaDB MaxScale failover.

Making your MaxScale highly available is important in the event that your system crashes or experiences disk or virtual machine corruption. These situations are inevitable and can affect the state of your automated failover setup when such unexpected maintenance cycles occur.

For a replication cluster type of environment, this is very beneficial and highly recommended for a specific MaxScale setup. The purpose of this is that only one MaxScale instance should be allowed to modify the cluster at any given time. If you have set this up with Keepalived, this is the instance with the MASTER status. MaxScale itself does not know its state, but with maxctrl (or maxadmin in previous versions) you can set a MaxScale instance to passive mode. As of version 2.2.2, a passive MaxScale behaves similarly to an active one, with the distinction that it won't perform failover, switchover or rejoin. Even manual versions of these commands will end in error. The passive/active mode differences may be expanded in the future, so stay tuned for such changes in MaxScale. To do this, just run the following:

 maxctrl: alter maxscale passive true
OK

You can verify this afterwards by running the command below:

[root@node5 vagrant]#  maxctrl -u admin -p mariadb -h 127.0.0.1:8989 show maxscale | grep 'passive'
│              │     "passive": true,                                         │

If you want to check out how to setup highly available with Keepalived, please check this post from MariaDB.

VIP Handling

Additionally, since MaxScale does not have VIP handling built in, you can use Keepalived to handle that for you. You can just use the virtual_ipaddress assigned to the node in the MASTER state. This provides virtual IP management much like MHA does with its master_failover_script variable. As mentioned earlier, check out this Keepalived with MaxScale setup blog post by MariaDB.
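As a rough sketch, such a VIP definition in Keepalived might look like the following (the interface name, VIP address, and priorities below are assumptions for illustration; see the MariaDB blog post linked above for a complete, tested setup):

```
# /etc/keepalived/keepalived.conf on the primary MaxScale node
vrrp_instance VI_MAXSCALE {
    state MASTER            # use BACKUP on the secondary node
    interface eth0          # assumed network interface
    virtual_router_id 51
    priority 150            # use a lower value on the secondary node
    advert_int 1
    virtual_ipaddress {
        192.168.10.100      # example VIP that client applications connect to
    }
}
```

Clients then point at the VIP instead of any individual MaxScale node, so a MaxScale failure only moves the VIP rather than requiring application changes.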

Conclusion

MariaDB MaxScale is feature-rich and has lots of capabilities, not limited to being a proxy and load balancer; it also offers the failover mechanism that large organizations are looking for. It's almost one-size-fits-all software, but of course it comes with limitations that a certain application might need to weigh against other load balancers such as ProxySQL.

ClusterControl also offers an auto-failover and master auto-detection mechanism, plus cluster and node recovery with the ability to deploy Maxscale and other load balancing technologies.

Each of these tools has its diverse features and functionality, but MariaDB MaxScale is well supported within ClusterControl and can be deployed feasibly alongside Keepalived and HAProxy to help speed up your daily routine tasks.

Handling Large Data Volumes with MySQL and MariaDB


Most databases grow in size over time. The growth is not always fast enough to impact the performance of the database, but there are definitely cases where that happens. When it does, we often wonder what could be done to reduce that impact and how can we ensure smooth database operations when dealing with data on a large scale.

First of all, let’s try to define what a “large data volume” means. For MySQL or MariaDB, we will think of it in terms of uncompressed InnoDB data. InnoDB works in a way that strongly benefits from available memory - mainly the InnoDB buffer pool. As long as the data fits there, disk access is minimized to handling writes only - reads are served out of memory. What happens when the data outgrows memory? More and more data has to be read from disk whenever there’s a need to access rows which are not currently cached. When the amount of data increases, the workload switches from CPU-bound towards I/O-bound. It means that the bottleneck is no longer the CPU (which was the case when the data fit in memory - data access in memory is fast, while data transformation and aggregation is slower) but rather the I/O subsystem (CPU operations on data are way faster than accessing data from disk). With the increased adoption of flash, I/O-bound workloads are not as terrible as they used to be in the times of spinning drives (random access is way faster with SSDs), but the performance hit is still there.

Another thing we have to keep in mind is that we typically only care about the active dataset. Sure, you may have terabytes of data in your schema, but if you only have to access the last 5GB, this is actually quite a good situation. Sure, it still poses operational challenges, but performance-wise it should still be ok.

Let’s just assume for the purpose of this blog, and this is not a scientific definition, that by a large data volume we mean a case where the active data size significantly outgrows the size of the memory. It can be 100GB when you have 2GB of memory; it can be 20TB when you have 200GB of memory. The tipping point is that your workload becomes strictly I/O bound. Bear with us while we discuss some of the options that are available for MySQL and MariaDB.

Partitioning

The historical (but perfectly valid) approach to handling large volumes of data is to implement partitioning. The idea behind it is to split a table into partitions - sort of sub-tables. The split happens according to rules defined by the user. Let’s take a look at some examples (the SQL examples are taken from the MySQL 8.0 documentation).

MySQL 8.0 comes with the following types of partitioning:

  • RANGE
  • LIST
  • COLUMNS
  • HASH
  • KEY

It can also create subpartitions. We are not going to rewrite the documentation here, but we would still like to give you some insight into how partitions work. To create partitions, you have to define the partitioning key. It can be a column or, in the case of RANGE or LIST, multiple columns that will be used to define how the data should be split into partitions.

HASH partitioning requires the user to define a column which will be hashed. Then, the data will be split into a user-defined number of partitions based on that hash value:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY HASH( YEAR(hired) )
PARTITIONS 4;

In this case the hash will be created based on the outcome generated by the YEAR() function on the ‘hired’ column.

KEY partitioning is similar, with the exception that the user defines which column should be hashed and the rest is up to MySQL to handle.

While HASH and KEY partitions randomly distribute data across the number of partitions, RANGE and LIST let the user decide what to do. RANGE is commonly used with time or date:

CREATE TABLE quarterly_report_status (
    report_id INT NOT NULL,
    report_status VARCHAR(20) NOT NULL,
    report_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)
PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated) ) (
    PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-01 00:00:00') ),
    PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-04-01 00:00:00') ),
    PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-07-01 00:00:00') ),
    PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-10-01 00:00:00') ),
    PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-01-01 00:00:00') ),
    PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-04-01 00:00:00') ),
    PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-07-01 00:00:00') ),
    PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-10-01 00:00:00') ),
    PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2010-01-01 00:00:00') ),
    PARTITION p9 VALUES LESS THAN (MAXVALUE)
);

It can also be used with other types of columns:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT NOT NULL,
    store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
    PARTITION p0 VALUES LESS THAN (6),
    PARTITION p1 VALUES LESS THAN (11),
    PARTITION p2 VALUES LESS THAN (16),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

LIST partitioning works based on a list of values that sorts the rows across multiple partitions:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY LIST(store_id) (
    PARTITION pNorth VALUES IN (3,5,6,9,17),
    PARTITION pEast VALUES IN (1,2,10,11,19,20),
    PARTITION pWest VALUES IN (4,12,13,14,18),
    PARTITION pCentral VALUES IN (7,8,15,16)
);

What is the point of using partitions, you may ask? The main point is that lookups are significantly faster than with a non-partitioned table. Let’s say that you want to search for rows which were created in a given month. If you have several years’ worth of data stored in the table, this will be a challenge - an index will have to be used and, as we know, indexes help to find rows, but accessing those rows will result in a bunch of random reads from the whole table. If you have partitions created on a year-month basis, MySQL can just read all the rows from that particular partition - no need to access the index, no need to do random reads: just read all the data from the partition, sequentially, and we are all set.
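This partition pruning can be observed with EXPLAIN against the quarterly_report_status table defined above (the exact pruning behavior depends on the optimizer; the comment below describes the expected outcome rather than a guaranteed one):

```sql
-- The "partitions" column of the EXPLAIN output shows which
-- partitions MySQL will actually scan for this query. With the
-- RANGE ( UNIX_TIMESTAMP(report_updated) ) scheme above, a query
-- bounded to Q2 2008 should be pruned to partition p2 rather
-- than touching the whole table.
EXPLAIN SELECT report_id, report_status
FROM quarterly_report_status
WHERE report_updated >= '2008-04-01 00:00:00'
  AND report_updated <  '2008-07-01 00:00:00';
```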

Partitions are also very useful in dealing with data rotation. If MySQL can easily identify rows to delete and map them to a single partition, instead of running DELETE FROM table WHERE …, which will use an index to locate rows, you can truncate or drop the partition. This is extremely useful with RANGE partitioning - sticking to the example above, if we want to keep data for 2 years only, we can easily create a cron job which will remove the old partition and create a new, empty one for the next period.
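The rotation idea can be sketched with two ALTER TABLE statements against the quarterly_report_status example (the new partition names and dates are illustrative; the retention policy is an assumption):

```sql
-- Drop the oldest partition: its rows disappear almost instantly,
-- with no row-by-row DELETE and no index lookups.
ALTER TABLE quarterly_report_status DROP PARTITION p0;

-- Make room for new data by splitting the MAXVALUE catch-all
-- partition into a new quarter plus a fresh catch-all.
ALTER TABLE quarterly_report_status
REORGANIZE PARTITION p9 INTO (
    PARTITION p10 VALUES LESS THAN ( UNIX_TIMESTAMP('2010-04-01 00:00:00') ),
    PARTITION p11 VALUES LESS THAN (MAXVALUE)
);
```

A cron job running statements like these once per quarter implements the rotation without any expensive DELETE traffic.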

InnoDB Compression

If we have a large volume of data (not necessarily thinking about databases), the first thing that comes to mind is to compress it. There are numerous tools that provide an option to compress your files, significantly reducing their size. InnoDB also has an option for that - both MySQL and MariaDB support InnoDB compression. The main advantage of using compression is the reduction of I/O activity. Data, when compressed, is smaller, thus it is faster to read and to write. A typical InnoDB page is 16KB in size; for an SSD this is 4 I/O operations to read or write (SSDs typically use 4KB pages). If we manage to compress 16KB into 4KB, we have just reduced I/O operations by a factor of four. It does not really help much regarding the dataset-to-memory ratio. Actually, it may even make it worse - MySQL, in order to operate on the data, has to decompress the page, yet it reads the compressed page from disk. This results in the InnoDB buffer pool storing 4KB of compressed data and 16KB of uncompressed data. Of course, there are algorithms in place to remove unneeded data (the uncompressed page will be removed when possible, keeping only the compressed one in memory), but you cannot expect too much of an improvement in this area.
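A minimal sketch of creating a compressed InnoDB table (the table name and columns are illustrative; innodb_file_per_table must be enabled, which is the default in recent MySQL and MariaDB versions):

```sql
-- Pages are stored compressed to 4KB on disk instead of the
-- default 16KB, matching the 4:1 example above.
CREATE TABLE metrics (
    id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    payload TEXT
) ENGINE=InnoDB
  ROW_FORMAT=COMPRESSED
  KEY_BLOCK_SIZE=4;
```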

It is also important to keep in mind how compression works with regard to the storage. Solid state drives are the norm for database servers these days and they have a couple of specific characteristics. They are fast, and they don’t care much whether traffic is sequential or random (even though they still prefer sequential access over random). They are expensive for large volumes. They suffer from wear-out, as they can handle a limited number of write cycles. Compression significantly helps here - by reducing the size of the data on disk, we reduce the cost of the storage layer for the database. By reducing the size of the data we write to disk, we increase the lifespan of the SSD.

Unfortunately, even if compression helps, for larger volumes of data it still may not be enough. Another step would be to look for something else than InnoDB.

MyRocks

MyRocks is a storage engine available for MySQL and MariaDB that is based on a different concept than InnoDB. My colleague, Sebastian Insausti, has a nice blog about using MyRocks with MariaDB. The gist is that, due to its design (it uses Log Structured Merge trees, LSM), MyRocks is significantly better in terms of compression than InnoDB (which is based on a B+Tree structure). MyRocks is designed for handling large amounts of data and for reducing the number of writes. It originated at Facebook, where data volumes are large and the requirements to access the data are high - thus the SSD storage; still, on such a large scale every gain in compression is huge. MyRocks can deliver even up to 2x better compression than InnoDB (which means you cut the number of servers by two). It is also designed to reduce write amplification (the number of writes required to handle a change of the row contents) - it requires 10x fewer writes than InnoDB. This, obviously, reduces I/O load but, even more importantly, it will increase the lifespan of an SSD ten times compared with handling the same load using InnoDB. From a performance standpoint, the smaller the data volume, the faster the access, so storage engines like this can also help to get the data out of the database faster (even though that was not the highest priority when designing MyRocks).
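On MariaDB, MyRocks ships as a plugin; a minimal sketch of enabling it and creating a MyRocks table (the package name varies between distributions, and the table below is purely illustrative):

```sql
-- Load the MyRocks storage engine; on MariaDB this is typically
-- available after installing the distribution's rocksdb plugin
-- package (e.g. mariadb-plugin-rocksdb on Debian/Ubuntu).
INSTALL SONAME 'ha_rocksdb';

-- Create a table using MyRocks instead of InnoDB; existing
-- tables can be converted with ALTER TABLE ... ENGINE=ROCKSDB.
CREATE TABLE events (
    id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    payload VARCHAR(255)
) ENGINE=ROCKSDB;
```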

Columnar Datastores

At some point, all we can do is to admit that we cannot handle such a volume of data using MySQL. Sure, you can shard it, you can do different things, but eventually it just doesn’t make sense anymore. It is time to look for additional solutions. One of them would be to use columnar datastores - databases which are designed with big data analytics in mind. Sure, they will not help with OLTP types of traffic, but analytics are pretty much standard nowadays as companies try to be data-driven and make decisions based on exact numbers, not random data. There are numerous columnar datastores, but we would like to mention two of them here: MariaDB AX and ClickHouse. We have a couple of blogs explaining what MariaDB AX is and how MariaDB AX can be used. What’s important, MariaDB AX can be scaled up in the form of a cluster, improving the performance. ClickHouse is another option for running analytics - ClickHouse can easily be configured to replicate data from MySQL, as we discussed in one of our blog posts. It is fast, it is free, and it can also be used to form a cluster and to shard data for even better performance.

Conclusion

We hope that this blog post gave you insights into how large volumes of data can be handled in MySQL or MariaDB. Luckily, there are a couple of options at our disposal and, eventually, if we cannot really make it work, there are good alternatives.

An introduction to Full Text Search in MariaDB


Databases are intended to efficiently store and query data. The problem is, there are many different types of data we can store: numbers, strings, JSON, geometrical data. Databases use different methods to store different types of data - table structures, indexes. The same way of storing and querying data is not always efficient for all of its types, which makes it quite hard to use a one-size-fits-all solution. As a result, databases try to use different approaches for different data types. For example, in MySQL or MariaDB we have a generic, well-performing solution like InnoDB, which works fine in the majority of cases, but we also have separate functions to work with JSON data, separate spatial indexes to speed up querying geometric data, and fulltext indexes that help with text data. In this blog, we will take a look at how MariaDB can be used to work with full text data.

Regular B+Tree indexes in InnoDB can also be used to speed up searches for text data. The main issue is that, due to their structure and nature, they can only help with searches on the leftmost prefix. It is also expensive to index large volumes of text (which, given the limitation of the leftmost prefix, doesn’t really make sense). Why? Let’s take a look at a simple example. We have the following sentence:

“The quick brown fox jumps over the lazy dog”

Using regular indexes in InnoDB we can index the full sentence:

“The quick brown fox jumps over the lazy dog”

The point is, when looking for this data, we have to look up the full leftmost prefix. So a query like:

SELECT text FROM mytable WHERE sentence LIKE “The quick brown fox jumps”;

Will benefit from this index but a query like:

SELECT text FROM mytable WHERE sentence LIKE “quick brown fox jumps”;

Will not. There’s no entry in the index that starts with ‘quick’. There’s an entry in the index that contains ‘quick’, but it starts with ‘The’, thus it cannot be used. As a result, it is virtually impossible to efficiently query text data using B+Tree indexes. Luckily, both MyISAM and InnoDB have implemented FULLTEXT indexes, which can be used to actually work with text data in MariaDB. The syntax is slightly different than with regular SELECTs, so let’s take a look at what we can do with them. As for the data, we used a random index file from the dump of the Wikipedia database. The data structure is as below:

617:11539268:Arthur Hamerschlag
617:11539269:Rooster Cogburn (character)
617:11539275:Membership function
617:11539282:Secondarily Generalized Tonic-Clonic Seizures
617:11539283:Corporate Challenge
617:11539285:Perimeter Mall
617:11539286:1994 St. Louis Cardinals season

As a result, we created a table with two BIGINT columns and one VARCHAR.

MariaDB [(none)]> CREATE TABLE ft_data.ft_table (c1 BIGINT, c2 BIGINT, c3 VARCHAR(255), PRIMARY KEY (c1, c2));

Afterwards we loaded the data:

MariaDB [ft_data]> LOAD DATA INFILE '/vagrant/enwiki-20190620-pages-articles-multistream-index17.txt-p11539268p13039268' IGNORE INTO  TABLE ft_table COLUMNS TERMINATED BY ':';
MariaDB [ft_data]> ALTER TABLE ft_table ADD FULLTEXT INDEX idx_ft (c3);
Query OK, 0 rows affected (5.497 sec)
Records: 0  Duplicates: 0  Warnings: 0

We also created the FULLTEXT index. As you can see, the syntax for that is similar to a regular index; we just had to pass the information about the index type, as it defaults to B+Tree. Then we were ready to run some queries.

MariaDB [ft_data]> SELECT * FROM ft_data.ft_table WHERE MATCH(c3) AGAINST ('Starship');
+-----------+----------+------------------------------------+
| c1        | c2       | c3                                 |
+-----------+----------+------------------------------------+
| 119794610 | 12007923 | Starship Troopers 3                |
| 250627749 | 12479782 | Miranda class starship (Star Trek) |
| 250971304 | 12481409 | Starship Hospital                  |
| 253430758 | 12489743 | Starship Children's Hospital       |
+-----------+----------+------------------------------------+
4 rows in set (0.009 sec)

As you can see, the syntax for the SELECT is slightly different than what we are used to. For fulltext search you should use the MATCH() … AGAINST () syntax, where in MATCH() you pass the column or columns you want to search and in AGAINST() you pass a comma-delimited list of keywords. You can see from the output that by default the search is case insensitive and it searches the whole string, not just the beginning as is the case with B+Tree indexes. Let’s compare how it will look if we add a normal index on the ‘c3’ column - FULLTEXT and B+Tree indexes can coexist on the same column without any problems. Which one is used is decided based on the SELECT syntax.
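FULLTEXT search also supports a boolean mode, where operators mark terms as required or excluded; a quick sketch against the same table (the particular keywords are chosen to match the sample output above):

```sql
-- '+' requires a word, '-' excludes it: rows must contain
-- 'Starship' but must not contain 'Hospital'.
SELECT * FROM ft_data.ft_table
WHERE MATCH(c3) AGAINST ('+Starship -Hospital' IN BOOLEAN MODE);
```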

MariaDB [ft_data]> ALTER TABLE ft_data.ft_table ADD INDEX idx_c3 (c3);
Query OK, 0 rows affected (1.884 sec)
Records: 0  Duplicates: 0  Warnings: 0

After the index has been created, let’s take a look at the search output:

MariaDB [ft_data]> SELECT * FROM ft_data.ft_table WHERE c3 LIKE 'Starship%';
+-----------+----------+------------------------------+
| c1        | c2       | c3                           |
+-----------+----------+------------------------------+
| 253430758 | 12489743 | Starship Children's Hospital |
| 250971304 | 12481409 | Starship Hospital            |
| 119794610 | 12007923 | Starship Troopers 3          |
+-----------+----------+------------------------------+
3 rows in set (0.001 sec)

As you can see, our query returned only three rows. This is expected, as we are looking for rows which start with the string ‘Starship’.

MariaDB [ft_data]> EXPLAIN SELECT * FROM ft_data.ft_table WHERE c3 LIKE 'Starship%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ft_table
         type: range
possible_keys: idx_c3,idx_ft
          key: idx_c3
      key_len: 103
          ref: NULL
         rows: 3
        Extra: Using where; Using index
1 row in set (0.000 sec)

When we check the EXPLAIN output, we can see that the index has been used to look up the data. But what if we want to find all the rows which contain the string ‘Starship’, no matter whether it is at the beginning or not? We have to write the following query:

MariaDB [ft_data]> SELECT * FROM ft_data.ft_table WHERE c3 LIKE '%Starship%';
+-----------+----------+------------------------------------+
| c1        | c2       | c3                                 |
+-----------+----------+------------------------------------+
| 250627749 | 12479782 | Miranda class starship (Star Trek) |
| 253430758 | 12489743 | Starship Children's Hospital       |
| 250971304 | 12481409 | Starship Hospital                  |
| 119794610 | 12007923 | Starship Troopers 3                |
+-----------+----------+------------------------------------+
4 rows in set (0.084 sec)

The output matches what we got from the fulltext search.

MariaDB [ft_data]> EXPLAIN SELECT * FROM ft_data.ft_table WHERE c3 LIKE '%Starship%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ft_table
         type: index
possible_keys: NULL
          key: idx_c3
      key_len: 103
          ref: NULL
         rows: 473367
        Extra: Using where; Using index
1 row in set (0.000 sec)

The EXPLAIN is different though - as you can see, it still uses the index, but this time it does a full index scan. That is possible because we indexed the full ‘c3’ column, so all the data is available in the index. An index scan results in more reads, but for such a small table MariaDB decided it is more efficient than reading the whole table. Please note the execution time: 0.084s for our regular SELECT. Compared to the fulltext query, this is poor:

MariaDB [ft_data]> SELECT * FROM ft_data.ft_table WHERE MATCH(c3) AGAINST ('Starship');
+-----------+----------+------------------------------------+
| c1        | c2       | c3                                 |
+-----------+----------+------------------------------------+
| 119794610 | 12007923 | Starship Troopers 3                |
| 250627749 | 12479782 | Miranda class starship (Star Trek) |
| 250971304 | 12481409 | Starship Hospital                  |
| 253430758 | 12489743 | Starship Children's Hospital       |
+-----------+----------+------------------------------------+
4 rows in set (0.001 sec)

As you can see, the query which uses the FULLTEXT index took 0.001s to execute. We are talking about orders-of-magnitude differences here.

MariaDB [ft_data]> EXPLAIN SELECT * FROM ft_data.ft_table WHERE MATCH(c3) AGAINST ('Starship')\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ft_table
         type: fulltext
possible_keys: idx_ft
          key: idx_ft
      key_len: 0
          ref:
         rows: 1
        Extra: Using where
1 row in set (0.000 sec)

Here’s what the EXPLAIN output looks like for the query using the FULLTEXT index - indicated by type: fulltext.
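
MATCH() … AGAINST() can also appear in the SELECT list, where it returns the relevance score computed for each row. A quick sketch against the same table (the exact scores depend on your data set):

```sql
-- In natural language mode, rows matched in the WHERE clause are
-- implicitly sorted by descending relevance.
SELECT c3, MATCH(c3) AGAINST ('Starship') AS relevance
FROM ft_data.ft_table
WHERE MATCH(c3) AGAINST ('Starship');
```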

Fulltext queries also have some other features. It is possible, for example, to return rows which might be relevant to the search term. MariaDB takes words from the most relevant rows matching your search and then runs a second search for those words as well.

MariaDB [(none)]> SELECT * FROM ft_data.ft_table WHERE MATCH(c3) AGAINST ('Starship');
+-----------+----------+------------------------------------+
| c1        | c2       | c3                                 |
+-----------+----------+------------------------------------+
| 119794610 | 12007923 | Starship Troopers 3                |
| 250627749 | 12479782 | Miranda class starship (Star Trek) |
| 250971304 | 12481409 | Starship Hospital                  |
| 253430758 | 12489743 | Starship Children's Hospital       |
+-----------+----------+------------------------------------+
4 rows in set (0.001 sec)

In our case, the word ‘Starship’ can be related to words like ‘Troopers’, ‘class’, ‘Star Trek’, ‘Hospital’ etc. To use this feature we should run the query with the “WITH QUERY EXPANSION” modifier:

MariaDB [(none)]> SELECT * FROM ft_data.ft_table WHERE MATCH(c3) AGAINST ('Starship' WITH QUERY EXPANSION) LIMIT 10;
+-----------+----------+-------------------------------------+
| c1        | c2       | c3                                  |
+-----------+----------+-------------------------------------+
| 250627749 | 12479782 | Miranda class starship (Star Trek)  |
| 119794610 | 12007923 | Starship Troopers 3                 |
| 253430758 | 12489743 | Starship Children's Hospital        |
| 250971304 | 12481409 | Starship Hospital                   |
| 277700214 | 12573467 | Star ship troopers                  |
|  86748633 | 11886457 | Troopers Drum and Bugle Corps       |
| 255120817 | 12495666 | Casper Troopers                     |
| 396408580 | 13014545 | Battle Android Troopers             |
|  12453401 | 11585248 | Star trek tos                       |
|  21380240 | 11622781 | Who Mourns for Adonais? (Star Trek) |
+-----------+----------+-------------------------------------+
10 rows in set (0.002 sec)

The output contained a large number of rows, but this sample is enough to see how it works. The query returned rows like:

“Troopers Drum and Bugle Corps”

“Battle Android Troopers”

Those are based on the search for the word ‘Troopers’. It also returned rows with strings like:

“Star trek tos”

“Who Mourns for Adonais? (Star Trek)”

These, obviously, are based on the lookup for the phrase ‘Star Trek’.

If you need more control over the terms you want to search for, you can use “IN BOOLEAN MODE”. It allows you to use additional operators. The full list is in the documentation; we’ll show just a couple of examples.
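
As a quick reference, the boolean-mode operators used in the examples below can be sketched like this (consult the MariaDB documentation for the complete list):

```sql
-- +word   word must be present
-- -word   word must be absent
-- word*   prefix match (any word starting with 'word')
-- "a b"   exact phrase match
SELECT c3 FROM ft_data.ft_table
WHERE MATCH(c3) AGAINST ('+Starship -Hospital' IN BOOLEAN MODE);
```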

Let’s say we want to search not just for the word ‘Star’ but also for other words which start with the string ‘Star’:

MariaDB [(none)]> SELECT * FROM ft_data.ft_table WHERE MATCH(c3) AGAINST ('Star*' IN BOOLEAN MODE) LIMIT 10;
+----------+----------+---------------------------------------------------+
| c1       | c2       | c3                                                |
+----------+----------+---------------------------------------------------+
| 20014704 | 11614055 | Ringo Starr and His third All-Starr Band-Volume 1 |
|   154810 | 11539775 | Rough blazing star                                |
|   154810 | 11539787 | Great blazing star                                |
|   234851 | 11540119 | Mary Star of the Sea High School                  |
|   325782 | 11540427 | HMS Starfish (19S)                                |
|   598616 | 11541589 | Dwarf (star)                                      |
|  1951655 | 11545092 | Yellow starthistle                                |
|  2963775 | 11548654 | Hydrogenated starch hydrolysates                  |
|  3248823 | 11549445 | Starbooty                                         |
|  3993625 | 11553042 | Harvest of Stars                                  |
+----------+----------+---------------------------------------------------+
10 rows in set (0.001 sec)

As you can see, in the output we have rows that contain strings like ‘Stars’, ‘Starfish’ or ‘starch’.

Another use case for the BOOLEAN mode: let’s say we want to search for rows which are relevant to the House of Representatives in Pennsylvania. If we run a regular query, we will get results related in some way to any of those words:

MariaDB [ft_data]> SELECT COUNT(*) FROM ft_data.ft_table WHERE MATCH(c3) AGAINST ('House, Representatives, Pennsylvania');
+----------+
| COUNT(*) |
+----------+
|     1529 |
+----------+
1 row in set (0.005 sec)
MariaDB [ft_data]> SELECT * FROM ft_data.ft_table WHERE MATCH(c3) AGAINST ('House, Representatives, Pennsylvania') LIMIT 20;
+-----------+----------+--------------------------------------------------------------------------+
| c1        | c2       | c3                                                                       |
+-----------+----------+--------------------------------------------------------------------------+
| 198783294 | 12289308 | Pennsylvania House of Representatives, District 175                      |
| 236302417 | 12427322 | Pennsylvania House of Representatives, District 156                      |
| 236373831 | 12427423 | Pennsylvania House of Representatives, District 158                      |
| 282031847 | 12588702 | Pennsylvania House of Representatives, District 47                       |
| 282031847 | 12588772 | Pennsylvania House of Representatives, District 196                      |
| 282031847 | 12588864 | Pennsylvania House of Representatives, District 92                       |
| 282031847 | 12588900 | Pennsylvania House of Representatives, District 93                       |
| 282031847 | 12588904 | Pennsylvania House of Representatives, District 94                       |
| 282031847 | 12588909 | Pennsylvania House of Representatives, District 193                      |
| 303827502 | 12671054 | Pennsylvania House of Representatives, District 55                       |
| 303827502 | 12671089 | Pennsylvania House of Representatives, District 64                       |
| 337545922 | 12797838 | Pennsylvania House of Representatives, District 95                       |
| 219202000 | 12366957 | United States House of Representatives House Resolution 121              |
| 277521229 | 12572732 | United States House of Representatives proposed House Resolution 121     |
|  20923615 | 11618759 | Special elections to the United States House of Representatives          |
|  20923615 | 11618772 | List of Special elections to the United States House of Representatives  |
|  37794558 | 11693157 | Nebraska House of Representatives                                        |
|  39430531 | 11699551 | Belgian House of Representatives                                         |
|  53779065 | 11756435 | List of United States House of Representatives elections in North Dakota |
|  54048114 | 11757334 | 2008 United States House of Representatives election in North Dakota     |
+-----------+----------+--------------------------------------------------------------------------+
20 rows in set (0.003 sec)

As you can see, we found some useful data, but we also found data which is completely irrelevant to our search. Luckily, we can refine the query:

MariaDB [ft_data]> SELECT * FROM ft_data.ft_table WHERE MATCH(c3) AGAINST ('+House, +Representatives, +Pennsylvania' IN BOOLEAN MODE);
+-----------+----------+-----------------------------------------------------+
| c1        | c2       | c3                                                  |
+-----------+----------+-----------------------------------------------------+
| 198783294 | 12289308 | Pennsylvania House of Representatives, District 175 |
| 236302417 | 12427322 | Pennsylvania House of Representatives, District 156 |
| 236373831 | 12427423 | Pennsylvania House of Representatives, District 158 |
| 282031847 | 12588702 | Pennsylvania House of Representatives, District 47  |
| 282031847 | 12588772 | Pennsylvania House of Representatives, District 196 |
| 282031847 | 12588864 | Pennsylvania House of Representatives, District 92  |
| 282031847 | 12588900 | Pennsylvania House of Representatives, District 93  |
| 282031847 | 12588904 | Pennsylvania House of Representatives, District 94  |
| 282031847 | 12588909 | Pennsylvania House of Representatives, District 193 |
| 303827502 | 12671054 | Pennsylvania House of Representatives, District 55  |
| 303827502 | 12671089 | Pennsylvania House of Representatives, District 64  |
| 337545922 | 12797838 | Pennsylvania House of Representatives, District 95  |
+-----------+----------+-----------------------------------------------------+
12 rows in set (0.001 sec)

As you can see, by adding the ‘+’ operator we made it clear that we are interested only in rows where the given word exists. As a result, the data we got in response is exactly what we were looking for.

We can also exclude words from the search. Let’s say that we are looking for flying things, but our search results are contaminated by various flying animals we are not interested in. We can easily get rid of foxes, squirrels and frogs:

MariaDB [ft_data]> SELECT * FROM ft_data.ft_table WHERE MATCH(c3) AGAINST ('+flying -fox* -squirrel* -frog*' IN BOOLEAN MODE) LIMIT 10;
+----------+----------+-----------------------------------------------------+
| c1       | c2       | c3                                                  |
+----------+----------+-----------------------------------------------------+
| 13340153 | 11587884 | List of surviving Boeing B-17 Flying Fortresses     |
| 16774061 | 11600031 | Flying Dutchman Funicular                           |
| 23137426 | 11631421 | 80th Flying Training Wing                           |
| 26477490 | 11646247 | Kites and Kite Flying                               |
| 28568750 | 11655638 | Fear of Flying                                      |
| 28752660 | 11656721 | Flying Machine (song)                               |
| 31375047 | 11666654 | Flying Dutchman (train)                             |
| 32726276 | 11672784 | Flying Wazuma                                       |
| 47115925 | 11728593 | The Flying Locked Room! Kudou Shinichi's First Case |
| 64330511 | 11796326 | The Church of the Flying Spaghetti Monster          |
+----------+----------+-----------------------------------------------------+
10 rows in set (0.001 sec)

The final feature we would like to show is the ability to search for an exact phrase:

MariaDB [ft_data]> SELECT * FROM ft_data.ft_table WHERE MATCH(c3) AGAINST ('"People\'s Republic of China"' IN BOOLEAN MODE) LIMIT 10;
+-----------+----------+------------------------------------------------------------------------------------------------------+
| c1        | c2       | c3                                                                                                   |
+-----------+----------+------------------------------------------------------------------------------------------------------+
|  12093896 | 11583713 | Religion in the People's Republic of China                                                           |
|  25280224 | 11640533 | Political rankings in the People's Republic of China                                                 |
|  43930887 | 11716084 | Cuisine of the People's Republic of China                                                            |
|  62272294 | 11789886 | Office of the Commissioner of the Ministry of Foreign Affairs of the People's Republic of China in t |
|  70970904 | 11824702 | Scouting in the People's Republic of China                                                           |
| 154301063 | 12145003 | Tibetan culture under the People's Republic of China                                                 |
| 167640800 | 12189851 | Product safety in the People's Republic of China                                                     |
| 172735782 | 12208560 | Agriculture in the people's republic of china                                                        |
| 176185516 | 12221117 | Special Economic Zone of the People's Republic of China                                              |
| 197034766 | 12282071 | People's Republic of China and the United Nations                                                    |
+-----------+----------+------------------------------------------------------------------------------------------------------+
10 rows in set (0.001 sec)

As you can see, fulltext search in MariaDB works quite well, and it is faster and more flexible than searching with B+Tree indexes. Please keep in mind, though, that this is by no means a way of handling large volumes of data - as the data grows, the feasibility of this solution diminishes. Still, for small data sets it is perfectly valid. It can definitely buy you more time before you eventually implement a dedicated full-text search solution like Sphinx or Lucene. Of course, all of the features we described are available in MariaDB clusters deployed from ClusterControl.

How to Manage MariaDB 10.3 with ClusterControl


MariaDB Server is no longer a straight clone of MySQL. It has grown into a mature fork which implements new functionality similar to what proprietary database systems offer upstream. MariaDB 10.3 greatly extends the list of enterprise features, and with the new SQL_MODE=Oracle it becomes an exciting choice for companies that would like to migrate their Oracle databases to an open source database. However, operational management is an area where there is still some catching up to do, and MariaDB requires that you build your own scripts.

Perhaps a good opportunity to look into an automation system?

Automated procedures are accurate and consistent. They can give you much-needed repeatability so you can minimize the risk of change in the production systems. However, as modern open source databases develop so fast, it's more challenging to keep your management systems on par with all new features.

The natural next step is to look for automation platforms. There are many platforms that you can use to deploy systems. Puppet, Chef, and Ansible are probably the best examples of that new trend. These platforms are suitable for the fast deployment of various software services. They are perfect for deployments, but still require you to maintain the code, cover feature changes, and usually, they cover just one aspect of your work. Things like backups, performance, and maintenance still need external tools or scripts.

On the other side, we have cloud platforms, with polished interfaces and a variety of additional services for a fully managed experience. However, that may not always be feasible; for instance, in hybrid environments where you use the cloud but still have a significant on-prem footprint.

So, how about a dedicated management layer for your MariaDB databases?

ClusterControl was designed to automate the deployment and management of MariaDB as well as other open-source databases. At the core of ClusterControl is functionality that lets you automate the database tasks you have to perform regularly, like deploying new database instances and clusters, managing backups, high availability and failover, topology changes, upgrades, scaling new nodes and more.

ClusterControl installation

To start with ClusterControl, you need a dedicated virtual machine or host. The VM and supported system requirements are described here. At a minimum, you can start with a tiny VM with 2 GB RAM, 2 CPU cores and 20 GB of storage space, either on-prem or in the cloud.

The primary installation method is to download an installation wizard that walks you through all the steps (OS configuration, package download and installation, metadata creation, and others).

For environments without internet access, you can use the offline installation process.

ClusterControl is agentless so you don't need to install additional software. It requires only SSH access to the database hosts. It also supports agent-based monitoring for higher resolution monitoring data.

To set up passwordless SSH to all target nodes (ClusterControl and all database hosts), run the following commands on the ClusterControl server:

$ ssh-keygen -t rsa # press enter on all prompts
$ ssh-copy-id -i ~/.ssh/id_rsa [ClusterControl IP address]
$ ssh-copy-id -i ~/.ssh/id_rsa [Database nodes IP address] # repeat this to all target database nodes
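
Before starting the deployment, it is worth confirming that key-based login actually works. A sketch, where 192.168.10.11 stands in for one of your database nodes:

```bash
# BatchMode=yes makes ssh fail instead of prompting for a password,
# so a successful run confirms passwordless access is in place.
ssh -o BatchMode=yes root@192.168.10.11 "echo SSH OK"
```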

One of the most convenient ways to try out ClusterControl may be to run it in a Docker container.

docker run -d --name clustercontrol \
--network db-cluster \
--ip 192.168.10.10 \
-h clustercontrol \
-p 5000:80 \
-p 5001:443 \
-v /storage/clustercontrol/cmon.d:/etc/cmon.d \
-v /storage/clustercontrol/datadir:/var/lib/mysql \
-v /storage/clustercontrol/sshkey:/root/.ssh \
-v /storage/clustercontrol/cmonlib:/var/lib/cmon \
-v /storage/clustercontrol/backups:/root/backups \
severalnines/clustercontrol

After successful deployment, you should be able to access the ClusterControl Web UI at {host's IP address}:{host's port}, for example:

HTTP: http://192.168.10.100:5000/clustercontrol
HTTPS: https://192.168.10.100:5001/clustercontrol

Installation of MariaDB Cluster

Once we enter the ClusterControl interface, the first thing to do is to deploy a new database or import an existing one. Version 1.7.2 introduced support for MariaDB 10.3 (along with 10.0, 10.1 and 10.2). In 1.7.3, which was released this week, deployment in the cloud was improved further.

ClusterControl: Deploy/Import

At the time of writing this blog, the current version is 10.3.16. The latest packages are picked up by default. Select the option "Deploy Database Cluster" and follow the instructions that appear.

Now it is time to provide the data needed for the connection between ClusterControl and the DB nodes. At this step, you would have clean VMs or the OS images that you use inside your organization. When choosing MariaDB, we must specify the user, key or password, and port for connecting by SSH to our servers.

ClusterControl: Deploy Database Cluster

After setting up the SSH access information, we must enter the credentials to access our database; for MariaDB that will be the superuser root. We can also specify which repository to use. You can choose between three types of repositories when deploying a database server/cluster using ClusterControl:

  • Use Vendor Repository. Provision software by setting up and using the database vendor's preferred software repository. ClusterControl will install the latest version of what is provided by the database vendor repository.
  • Do Not Setup Vendor Repositories. No repositories will be set up by ClusterControl. ClusterControl will rely on the system configuration (your default repository files).
  • Create and mirror the current database vendor's repository and then deploy using the local mirrored repository. This allows you to "freeze" the current versions of the software packages.

When all is set, hit the deploy button. The deployment process will also take care of installing additional tools provided by MariaDB, like mariabackup, as well as popular external tools for database administration.

Import a New Cluster

We also have the option to manage an existing setup by importing it into ClusterControl. Such an environment could have been created by ClusterControl or by other means (Puppet, Chef, Ansible, Docker …). The process is simple and doesn't require specialized knowledge.

First, we must enter the SSH access credentials to our existing database servers. Then we enter the access credentials to our database, the server data directory, and the version. We add the nodes by IP or hostname, in the same way as when we deploy, and press on Import. Once the task is finished, we are ready to manage our cluster from ClusterControl. At this point, we can also define the options for the node or cluster auto recovery.

ClusterControl: Import existing 10.3 database cluster

Scaling MariaDB, Adding More Nodes to DB Cluster

With ClusterControl, adding more servers to the cluster is an easy step. You can do that from the GUI or the CLI. For more advanced users, you can use ClusterControl Developer Studio and write a resource-based condition to expand your cluster automatically.
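
As an illustration of the CLI route, adding a node could look like the sketch below, using the s9s command-line client; the cluster ID and node IP are placeholders, and exact flags may differ between ClusterControl versions:

```bash
# Add a new replication slave to cluster 1 and stream the job log.
# Cluster ID and node IP are placeholders for your environment.
s9s cluster --add-node --cluster-id=1 --nodes="192.168.10.14" --log
```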

ClusterControl: Adding MariaDB Node

ClusterControl supports the option to stage new nodes from an existing backup, so there is no need to overwhelm the production master node with additional work.

Securing MariaDB

The default MariaDB installation comes with relaxed security. This has been improved in recent versions; however, production-grade systems still require tweaks to the default my.cnf configuration. ClusterControl deployments come with non-default my.cnf settings (different for different cluster types).

ClusterControl removes human error and provides access to a suite of security features, to automatically protect your databases from hacks and other threats.

ClusterControl: Security Panel

ClusterControl enables SSL support for MariaDB connections. Enabling SSL adds another level of security for communication between the applications (including ClusterControl) and database. MariaDB clients open encrypted connections to the database servers and verify the identity of those servers before transferring any sensitive information.

ClusterControl will execute all necessary steps, including creating certificates on all database nodes. Such certificates can be maintained later on in the Key Management tab.

With ClusterControl you can also enable auditing. It uses the audit plugin provided by MariaDB. Continuous auditing is an imperative task for monitoring your database environment. By auditing your database, you can achieve accountability for actions taken or content accessed. Moreover, the audit may include some critical system components, such as the ones associated with financial data to support a precise set of regulations like SOX, or the EU GDPR regulation. The guided process lets you choose what should be audited and how to maintain the audit log files.

Monitoring and Alerting

When working with database systems, you should be able to monitor them. That will enable you to identify trends, plan for upgrades or improvements or react effectively to any problems or errors that may arise.

ClusterControl: Overview

The new ClusterControl uses Prometheus as the data store, with the PromQL query language. The list of dashboards includes Server General, Server Caches, InnoDB Metrics, Replication Master, Replication Slave, System Overview, and Cluster Overview.

ClusterControl: DashBoard

ClusterControl installs the Prometheus agents, configures metrics, and maintains access to the exporter configuration via its GUI, so you can better manage parameters like collector flags for the Prometheus exporters.

As database operators, we need to be informed whenever something critical occurs in our database. The three main methods to get an alert in ClusterControl include:

  • email notifications
  • integrations
  • advisors
ClusterControl: Integration Services

You can set the email notifications on a user level. Go to Settings > Email Notifications, where you can choose the criticality and type of alerts to be sent.

The next method is to use the Integration services. This passes specific categories of events to other services like ServiceNow, Slack, PagerDuty, etc., so you can create advanced notification methods and integrations within your organization.

The last one is to involve sophisticated metrics analysis in the Advisor section, where you can build intelligent checks and triggers.

ClusterControl: Advisors

SQL Monitoring

The SQL Monitoring is divided into three sections.

  • Top Queries - presents information about queries that take a significant chunk of resources.
    Query Monitor: Top queries
  • Running Queries - a process list combined from all database cluster nodes into one view. You can use it to kill queries that affect your database operations.
    Query Monitor: Running Queries
  • Query Outliers - presents the list of queries with execution times longer than average.
    Query Monitor: Query Outliers

Backup and Recovery

Now that you have your MariaDB up and running, and have your monitoring in place, it is time for the next step: ensure you have a backup of your data.

ClusterControl: Backup repository

ClusterControl provides an interface for MariaDB backup management with support for scheduling and backup reports. It gives you two options for backup methods.

  • Logical backup (text): mysqldump
  • Binary backup: xtrabackup (for older versions), mariabackup
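
For comparison, a binary backup taken by hand with mariabackup boils down to two steps - a sketch, where the target directory and credentials are assumptions for your environment:

```bash
# Step 1: copy the data files while the server keeps running.
mariabackup --backup --target-dir=/data/backups/full \
            --user=backupuser --password=secret
# Step 2: apply the redo log so the copy is consistent and restorable.
mariabackup --prepare --target-dir=/data/backups/full
```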

A good backup strategy is a critical part of any database management system. ClusterControl offers many options for backups and recovery/restore.

ClusterControl backup retention is configurable; you can choose to retain your backup for any time period or to never delete backups. AES256 encryption is employed to secure your backups against rogue elements. For rapid recovery, backups can be restored directly into a new cluster - ClusterControl handles the full restore process from the launch of a new database setup to the recovery of data, removing error-prone manual steps from the process.

Backups can be automatically verified upon completion, and then uploaded to cloud storage services (AWS, Azure and Google). Different retention policies can be defined for local backups in the data center as well as backups that are uploaded in the cloud.

Node and cluster auto-recovery

ClusterControl provides advanced support for failure detection and handling. It also allows you to deploy different proxies and integrate them with your HA stack, so there is no need to adjust the application connection string or DNS entry to redirect the application to the new master node.

When the master server is down, ClusterControl will create a job to perform automatic failover. ClusterControl does all the background work to elect a new master, deploy failover slave servers, and configure load balancers.

ClusterControl automatic failover was designed with the following principles:

  • Make sure the master is really dead before you failover
  • Failover only once
  • Do not failover to an inconsistent slave
  • Only write to the master
  • Do not automatically recover the failed master

With the built-in algorithms, failover can often be performed pretty quickly, so you can assure the highest SLAs for your database environment.

ClusterControl: Auto Recovery

The process is highly configurable. It comes with multiple parameters that you can use to adapt recovery to the specifics of your environment. Among the options you can find replication_stop_on_error, replication_auto_rebuild_slave, replication_failover_blacklist, replication_failover_whitelist, replication_skip_apply_missing_txs, replication_onfail_failover_script and many others.

Failover is the process of moving to a healthy standby component during a failure or maintenance event in order to preserve uptime. The quicker it can be done, the faster you can be back online. If you're looking at minimizing downtime and meeting your SLAs through an automated approach for MariaDB, then this blog is for you.

MaxScale Load Balancer

In addition to MariaDB 10.3, ClusterControl adds the option of the MaxScale 2.3 load balancer. MaxScale is a SQL-aware proxy that can be used to build highly available environments. It comes with numerous features; however, the main goal is to enable load balancing and high availability.

ClusterControl: MaxScale

MaxScale can be used to track the health of the master MariaDB node and, should it fail, perform a fast, automatic failover. Automated failover is crucial in building up a highly available solution that can recover promptly from the failure.

Load Balance Database Sessions

Read-write splitting is a critical feature to allow read scaling. It is enough for the application to connect to MaxScale; it detects the topology, determines which MariaDB node acts as the master and which act as slaves, and routes the traffic accordingly.

Summary

We hope that this blog helps you to get familiar with ClusterControl and MariaDB 10.3 administration modules. The best option is to download ClusterControl and test each of them.

How to Run PHP 5 Applications with MySQL 8.0 on CentOS 7


Despite the fact that PHP 5 has reached end-of-life, there are still legacy applications built on top of it that need to run in production or test environments. If you are installing PHP packages via the operating system repository, there is still a chance you will end up with PHP 5 packages, e.g. on the CentOS 7 operating system. Having said that, there is always a way to make your legacy applications run with the newer database versions, and thus take advantage of new features.

In this blog post, we’ll walk you through how we can run PHP 5 applications with the latest version of MySQL 8.0 on CentOS 7 operating system. This blog is based on actual experience with an internal project that required PHP 5 application to be running alongside our new MySQL 8.0 in a new environment. Note that it would work best to run the latest version of PHP 7 alongside MySQL 8.0 to take advantage of all of the significant improvements introduced in the newer versions.

PHP and MySQL on CentOS 7

First of all, let's see what files are being provided by php-mysql package:

$ cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
$ repoquery -q -l --plugins php-mysql
/etc/php.d/mysql.ini
/etc/php.d/mysqli.ini
/etc/php.d/pdo_mysql.ini
/usr/lib64/php/modules/mysql.so
/usr/lib64/php/modules/mysqli.so
/usr/lib64/php/modules/pdo_mysql.so

By default, if we install the standard LAMP stack components that come with CentOS 7, for example:

$ yum install -y httpd php php-mysql php-gd php-curl mod_ssl

You would get the following related packages installed:

$ rpm -qa | egrep 'php-mysql|mysql|maria'
php-mysql-5.4.16-46.el7.x86_64
mariadb-5.5.60-1.el7_5.x86_64
mariadb-libs-5.5.60-1.el7_5.x86_64
mariadb-server-5.5.60-1.el7_5.x86_64

The following MySQL-related modules will then be loaded into PHP:

$ php -m | grep mysql
mysql
mysqli
pdo_mysql

When looking at the API versions reported by phpinfo() for MySQL-related clients, they all match the MariaDB version that we have installed:

$ php -i | egrep -i 'client.*version'
Client API version => 5.5.60-MariaDB
Client API library version => 5.5.60-MariaDB
Client API header version => 5.5.60-MariaDB
Client API version => 5.5.60-MariaDB

At this point, we can conclude that the installed php-mysql module is built against and compatible with MariaDB 5.5.60.

Installing MySQL 8.0

However, in this project we are required to run on MySQL 8.0, so we chose Percona Server 8.0 to replace the existing default MariaDB installation on that server. To do that, we have to install the Percona repository package and enable the Percona Server 8.0 repository:

$ yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
$ percona-release setup ps80
$ yum install percona-server-server

However, we got the following error after running the very last command:

--> Finished Dependency Resolution
Error: Package: 1:mariadb-5.5.60-1.el7_5.x86_64 (@base)
           Requires: mariadb-libs(x86-64) = 1:5.5.60-1.el7_5
           Removing: 1:mariadb-libs-5.5.60-1.el7_5.x86_64 (@anaconda)
               mariadb-libs(x86-64) = 1:5.5.60-1.el7_5
           Obsoleted By: percona-server-shared-compat-8.0.15-6.1.el7.x86_64 (ps-80-release-x86_64)
               Not found
Error: Package: 1:mariadb-server-5.5.60-1.el7_5.x86_64 (@base)
           Requires: mariadb-libs(x86-64) = 1:5.5.60-1.el7_5
           Removing: 1:mariadb-libs-5.5.60-1.el7_5.x86_64 (@anaconda)
               mariadb-libs(x86-64) = 1:5.5.60-1.el7_5
           Obsoleted By: percona-server-shared-compat-8.0.15-6.1.el7.x86_64 (ps-80-release-x86_64)
               Not found
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

The above simply means that the Percona Server shared compat package obsoletes mariadb-libs-5.5.60, which is required by the already installed mariadb-server packages. Since this is a plain new server, removing the existing MariaDB packages is not a big issue. Let's remove them first and then try to install Percona Server 8.0 once more:

$ yum remove mariadb mariadb-libs
...
Resolving Dependencies
--> Running transaction check
---> Package mariadb-libs.x86_64 1:5.5.60-1.el7_5 will be erased
--> Processing Dependency: libmysqlclient.so.18()(64bit) for package: perl-DBD-MySQL-4.023-6.el7.x86_64
--> Processing Dependency: libmysqlclient.so.18()(64bit) for package: 2:postfix-2.10.1-7.el7.x86_64
--> Processing Dependency: libmysqlclient.so.18()(64bit) for package: php-mysql-5.4.16-46.el7.x86_64
--> Processing Dependency: libmysqlclient.so.18(libmysqlclient_18)(64bit) for package: perl-DBD-MySQL-4.023-6.el7.x86_64
--> Processing Dependency: libmysqlclient.so.18(libmysqlclient_18)(64bit) for package: 2:postfix-2.10.1-7.el7.x86_64
--> Processing Dependency: libmysqlclient.so.18(libmysqlclient_18)(64bit) for package: php-mysql-5.4.16-46.el7.x86_64
--> Processing Dependency: mariadb-libs(x86-64) = 1:5.5.60-1.el7_5 for package: 1:mariadb-5.5.60-1.el7_5.x86_64
---> Package mariadb-server.x86_64 1:5.5.60-1.el7_5 will be erased
--> Running transaction check
---> Package mariadb.x86_64 1:5.5.60-1.el7_5 will be erased
---> Package perl-DBD-MySQL.x86_64 0:4.023-6.el7 will be erased
---> Package php-mysql.x86_64 0:5.4.16-46.el7 will be erased
---> Package postfix.x86_64 2:2.10.1-7.el7 will be erased

Removing mariadb-libs will also remove other packages that depend on it. Our primary concern is the php-mysql package, which will be removed because of its dependency on libmysqlclient.so.18 provided by mariadb-libs. We will fix that later.

After that, we should be able to install Percona Server 8.0 without error:

$ yum install percona-server-server

At this point, here are MySQL related packages that we have in the server:

$ rpm -qa | egrep 'php-mysql|mysql|maria|percona'
percona-server-client-8.0.15-6.1.el7.x86_64
percona-server-shared-8.0.15-6.1.el7.x86_64
percona-server-server-8.0.15-6.1.el7.x86_64
percona-release-1.0-11.noarch
percona-server-shared-compat-8.0.15-6.1.el7.x86_64

Notice that we don't have the php-mysql packages that provide the modules to connect our PHP application to the freshly installed Percona Server 8.0. We can confirm this by checking the loaded PHP modules. You should get empty output with the following command:

$ php -m | grep mysql

Let's install it again:

$ yum install php-mysql
$ systemctl restart httpd

Now we do have them loaded into PHP:

$ php -m | grep mysql
mysql
mysqli
pdo_mysql

And we can also confirm that by looking at the PHP info via command line:

$ php -i | egrep -i 'client.*version'
Client API version => 5.6.28-76.1
Client API library version => 5.6.28-76.1
Client API header version => 5.5.60-MariaDB
Client API version => 5.6.28-76.1

Notice the difference between the Client API library version and the API header version. We will see the effect of that later during the test.

Let's start our MySQL 8.0 server to test out our PHP 5 application. Since MariaDB used the datadir in /var/lib/mysql, we have to wipe it out first, re-initialize the datadir, assign proper ownership and start it up:

$ rm -Rf /var/lib/mysql
$ mysqld --initialize
$ chown -Rf mysql:mysql /var/lib/mysql
$ systemctl start mysql

Grab the temporary MySQL root password generated by Percona Server from the MySQL error log file:

$ grep root /var/log/mysqld.log
2019-07-22T06:54:39.250241Z 5 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: 1wAXsGrISh-D
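If you are scripting the initial setup, the same password token can be pulled out of the log programmatically. A sketch assuming the 8.0 log format shown above (the password is the last field of the matching line):

```shell
# Extract the generated root password from the mysqld error log.
extract_temp_password() {
    grep 'A temporary password is generated' "$1" | awk '{print $NF}'
}

# Demonstrate against a copy of the log line from above:
printf '%s\n' '2019-07-22T06:54:39.250241Z 5 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: 1wAXsGrISh-D' > /tmp/mysqld.log.sample
extract_temp_password /tmp/mysqld.log.sample
```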

Use it to log in for the first time as root@localhost. We have to change the temporary password to something else before we can perform any further action on the server:

$ mysql -uroot -p
mysql> ALTER USER root@localhost IDENTIFIED BY 'myP455w0rD##';

Then, proceed to create our database resources required by our application:

mysql> CREATE SCHEMA testdb;
mysql> CREATE USER testuser@localhost IDENTIFIED BY 'password';
mysql> GRANT ALL PRIVILEGES ON testdb.* TO testuser@localhost;

Once done, import the existing data from backup into the database, or create your database objects manually. Our database is now ready to be used by our application.

Errors and Warnings

In our application, we had a simple test file to make sure the application is able to connect via socket, in other words via localhost on port 3306, to eliminate all database connections over the network. Immediately, we got the version mismatch warning:

$ php -e test_mysql.php
PHP Warning:  mysqli::mysqli(): Headers and client library minor version mismatch. Headers:50560 Library:50628 in /root/test_mysql.php on line 9
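The numbers in the warning are the packed client version format (major*10000 + minor*100 + patch), so they can be decoded to confirm which libraries are mismatched. A quick sketch:

```shell
# Decode a packed MySQL client version number (major*10000 + minor*100 + patch).
decode_mysql_version() {
    packed=$1
    printf '%d.%d.%d\n' $((packed / 10000)) $((packed % 10000 / 100)) $((packed % 100))
}

decode_mysql_version 50560   # the headers: 5.5.60 (MariaDB devel headers)
decode_mysql_version 50628   # the library: 5.6.28 (percona-server-shared-compat)
```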

At the same time, you would also encounter the authentication error with php-mysql module:

$ php -e test_mysql.php
PHP Warning:  mysqli::mysqli(): (HY000/2059): Authentication plugin 'caching_sha2_password' cannot be loaded: /usr/lib64/mysql/plugin/caching_sha2_password.so: cannot open shared object file: No such file or directory in /root/test_mysql.php on line 9

Or, if you were running with MySQL native driver library (php-mysqlnd), you would get the following error:

$ php -e test_mysql.php
PHP Warning:  mysqli::mysqli(): The server requested authentication method unknown to the client [caching_sha2_password] in /root/test_mysql.php on line 9

Plus, there is another issue you would see regarding the charset:

PHP Warning:  mysqli::mysqli(): Server sent charset (255) unknown to the client. Please, report to the developers in /root/test_mysql.php on line 9

Solutions and Workarounds

Authentication plugin

Neither the php-mysqlnd nor the php-mysql library for PHP 5 supports the new authentication method of MySQL 8.0. Starting from MySQL 8.0.4, the default authentication method has been changed to 'caching_sha2_password', which offers more secure password hashing compared to 'mysql_native_password', the default in previous versions.

To allow backward compatibility on our MySQL 8.0 server, add the following line under the [mysqld] section of the MySQL configuration file:

default-authentication-plugin=mysql_native_password

Restart the MySQL server and you should be good. If the database user was created before the above change, e.g. via backup and restore, re-create the user with DROP USER and CREATE USER statements. MySQL will then follow the new default authentication plugin when creating the new user.
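If there are many accounts to fix, you can generate the statements instead of retyping them. The sketch below uses the ALTER USER ... IDENTIFIED WITH syntax as an alternative to dropping and re-creating each user; the user, host and password values are placeholders for illustration:

```shell
# Generate an ALTER USER statement that moves an existing account to
# mysql_native_password (placeholder user/host/password for illustration).
gen_native_password_sql() {
    printf "ALTER USER '%s'@'%s' IDENTIFIED WITH mysql_native_password BY '%s';\n" "$1" "$2" "$3"
}

gen_native_password_sql testuser localhost password
# The output can then be piped into the mysql client.
```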

Minor version mismatch

With php-mysql package, if we check the library version installed, we would notice the difference:

$ php -i | egrep -i 'client.*version'
Client API version => 5.6.28-76.1
Client API library version => 5.6.28-76.1
Client API header version => 5.5.60-MariaDB
Client API version => 5.6.28-76.1

The PHP library is compiled against the MariaDB 5.5.60 libmysqlclient, while the client API version is 5.6.28, provided by the percona-server-shared-compat package. Despite the warning, you will still get a correct response from the server.

To suppress this warning on library version mismatch, use php-mysqlnd package, which does not depend on MySQL Client Server library (libmysqlclient). This is the recommended way, as stated in MySQL documentation.

To replace php-mysql library with php-mysqlnd, simply run:

$ yum remove php-mysql
$ yum install php-mysqlnd
$ systemctl restart httpd

If replacing php-mysql is not an option, the last resort is to compile PHP with the MySQL 8.0 Client Server library (libmysqlclient) manually and copy the compiled library files into the /usr/lib64/php/modules/ directory, replacing the old mysqli.so, mysql.so and pdo_mysql.so. This is a bit of a hassle with a small chance of success, mostly due to deprecated dependencies of header files in the current MySQL version. Knowledge of programming is required to work around that.

Incompatible Charset

Starting from MySQL 8.0.1, MySQL has changed the default character set from latin1 to utf8mb4. The utf8mb4 character set is useful because nowadays the database has to store not only language characters but also symbols, newly introduced emojis, and so on. Charset utf8mb4 is the UTF-8 encoding of the Unicode character set using one to four bytes per character, compared to the standard utf8 (a.k.a. utf8mb3), which uses one to three bytes per character.

Many legacy applications were not built on top of the utf8mb4 character set, so it is good to change the character setting of the MySQL server to something our legacy PHP driver understands. Add the following two lines to the MySQL configuration under the [mysqld] section:

collation-server = utf8_unicode_ci
character-set-server = utf8

Optionally, you can also add the following lines into MySQL configuration file to streamline all client access to use utf8:

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

Don't forget to restart the MySQL server for the changes to take effect. At this point, our application should be getting along with MySQL 8.0.

That's it for now. Do share any feedback with us in the comments section if you have any other issues moving legacy applications to MySQL 8.0.

Handling Large Transactions with Streaming Replication and MariaDB 10.4


Dealing with large transactions was always a pain point in Galera Cluster. The way in which Galera writeset certification works causes trouble when transactions are long or when a single row is modified often on multiple nodes. As a result, transactions have to be rolled back and retried, causing performance drops. Luckily, this problem has been addressed in Galera 4, a new release of Galera from Codership. This library is used in MariaDB 10.4, so installing MariaDB 10.4 is the easiest way of testing the newly introduced features. In this blog post we will take a look at how streaming replication can be used to mitigate problems which used to be a standard issue in previous Galera versions.

We will use a three-node MariaDB Galera Cluster, version 10.4.6, which comes with Galera version 26.4.2.

MariaDB [(none)]> show global status like 'wsrep_provider%';
+-----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name               | Value                                                                                                                                          |
+-----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name         | Galera                                                                                                                                         |
| wsrep_provider_vendor       | Codership Oy <info@codership.com>                                                                                                              |
| wsrep_provider_version      | 26.4.2(r4498)                                                                                                                                  |
+-----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.001 sec)

There are three main pain points streaming replication is intended to deal with:

  • Long transactions
  • Large transactions
  • Hot spots in tables

Let’s consider them one by one and see how streaming replication may help us deal with them, but first let’s focus on writeset certification - the root cause of these issues.

Writeset certification in Galera cluster

Galera cluster consists of multiple writeable nodes. Each transaction executed on a Galera cluster forms a writeset. Every writeset has to be sent to all of the nodes in the cluster for certification - a process which ensures that all the nodes can apply a given transaction. Writesets have to be executed on all of the cluster nodes, so if there is any conflict, the transaction cannot be committed. What are the typical reasons why a transaction cannot be committed? Well, the three points we listed earlier:

  • Long transactions - the longer the transaction takes, the more likely it is that in the meantime another node will execute updates which will conflict with the writeset and prevent it from passing certification
  • Large transactions - first of all, large transactions are also longer than small ones, which triggers the first problem. The second problem, strictly related to large transactions, is the volume of changes: the more rows are updated, the more likely it is that some write on another node will result in a conflict and the whole transaction will have to be rolled back.
  • Hot spots in tables - the more often a given row is updated, the more probable it is that such an update will happen simultaneously on multiple nodes, resulting in some of the transactions being rolled back

The main issue here is that Galera does not introduce any locking on nodes other than the initial node on which the transaction was opened. The certification process is based on the hope that if one node could execute a transaction, the others should be able to do so too. This is true but, as we discussed, there are corner cases in which the probability of this happening is significantly reduced.

In Galera 4, with streaming replication, the behavior has changed and locks are taken on all of the nodes. Transactions are split into fragments and each fragment is certified on all nodes. After successful certification, the rows are locked on all nodes in the cluster. There are a couple of variables that govern how exactly this is done - wsrep_trx_fragment_size and wsrep_trx_fragment_unit define how large the fragment should be and in what unit it is measured. It is very fine-grained control: you can define the fragment unit as bytes, statements or rows, which makes it possible to run certification for every row modified in the transaction. Let’s take a look at how you can benefit from streaming replication in real life.
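As a rough way to reason about the overhead, each fragment triggers one certification round, so with unit='rows' a transaction pays ceil(changed_rows / wsrep_trx_fragment_size) rounds. A quick sketch of that arithmetic:

```shell
# Certification rounds for a transaction under streaming replication with
# wsrep_trx_fragment_unit='rows': one round per fragment.
fragment_count() {
    rows=$1; fragment_size=$2
    echo $(( (rows + fragment_size - 1) / fragment_size ))
}

fragment_count 2001 1     # certify every single row change
fragment_count 2001 100   # certify in chunks of 100 rows
```

The smaller the fragment, the earlier conflicting transactions are blocked on the other nodes, but the more certification traffic you generate.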

Working with the streaming replication

Let’s consider the following scenario. We have a transaction to run that takes at least 30 seconds:

BEGIN; UPDATE sbtest.sbtest1 SET k = k - 2 WHERE id < 2000 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 2000 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 2000 ; SELECT SLEEP(30); COMMIT;

Then, while it is running, we will execute SQL that touches similar rows. This will be executed on another node:

BEGIN; UPDATE sbtest.sbtest1 SET k = k - 1 WHERE id < 20 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 20 ; COMMIT;

What would be the result?

The first transaction is rolled back as soon as the second one is executed:

MariaDB [sbtest]> BEGIN; UPDATE sbtest.sbtest1 SET k = k - 2 WHERE id < 2000 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 2000 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 2000 ; SELECT SLEEP(30); COMMIT;
Query OK, 0 rows affected (0.001 sec)

Query OK, 667 rows affected (0.020 sec)
Rows matched: 667  Changed: 667  Warnings: 0

Query OK, 667 rows affected (0.010 sec)
Rows matched: 667  Changed: 667  Warnings: 0

Query OK, 667 rows affected (0.009 sec)
Rows matched: 667  Changed: 667  Warnings: 0

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
Query OK, 0 rows affected (0.001 sec)

The transaction on the second node succeeded:

MariaDB [(none)]> BEGIN; UPDATE sbtest.sbtest1 SET k = k - 1 WHERE id < 20 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 20 ; COMMIT;
Query OK, 0 rows affected (0.000 sec)

Query OK, 7 rows affected (0.002 sec)
Rows matched: 7  Changed: 7  Warnings: 0

Query OK, 7 rows affected (0.001 sec)
Rows matched: 7  Changed: 7  Warnings: 0

Query OK, 0 rows affected (0.004 sec)

What we can do to avoid it is to use streaming replication for the first transaction. We will ask Galera to certify every row change:

MariaDB [sbtest]> BEGIN; SET SESSION wsrep_trx_fragment_size=1 ; SET SESSION wsrep_trx_fragment_unit='rows' ; UPDATE sbtest.sbtest1 SET k = k - 2 WHERE id < 2000 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 2000 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 2000 ; SELECT SLEEP(30); COMMIT; SET SESSION wsrep_trx_fragment_size=0;
Query OK, 0 rows affected (0.001 sec)

Query OK, 0 rows affected (0.000 sec)

Query OK, 0 rows affected (0.000 sec)

Query OK, 667 rows affected (1.757 sec)
Rows matched: 667  Changed: 667  Warnings: 0

Query OK, 667 rows affected (1.708 sec)
Rows matched: 667  Changed: 667  Warnings: 0

Query OK, 667 rows affected (1.685 sec)
Rows matched: 667  Changed: 667  Warnings: 0

As you can see, this time it worked just fine. On the second node:

MariaDB [(none)]> BEGIN; UPDATE sbtest.sbtest1 SET k = k - 1 WHERE id < 20 ; UPDATE sbtest.sbtest1 SET k = k + 1 WHERE id < 20 ; COMMIT;
Query OK, 0 rows affected (0.000 sec)

Query OK, 7 rows affected (33.942 sec)
Rows matched: 7  Changed: 7  Warnings: 0

Query OK, 7 rows affected (0.001 sec)
Rows matched: 7  Changed: 7  Warnings: 0

Query OK, 0 rows affected (0.026 sec)

What is interesting is that the UPDATE took almost 34 seconds to execute - this was caused by the fact that the initial transaction, through streaming replication, locked all modified rows on all of the nodes, and our second transaction had to wait for the first one to complete even though both transactions were executed on different nodes.

This is basically it when it comes to streaming replication. Depending on your requirements and traffic, you may use it in a less strict manner - we certified every row, but you can change that to every n-th row or every statement. You can even decide on the volume of data to certify. This should be enough to match the requirements of your environment.

There are a couple more things we would like you to keep in mind. First of all, streaming replication is by no means a solution that should be used by default, which is why it is disabled by default. The recommended use case is to manually decide on transactions that would benefit from streaming replication and enable it at the session level. This is the reason why our examples end with:

SET SESSION wsrep_trx_fragment_size=0;

This statement (setting wsrep_trx_fragment_size to 0) disables streaming replication for the current session.

Another thing worth remembering - if you happen to use streaming replication, it will use the ‘wsrep_streaming_log’ table in the ‘mysql’ schema to persistently store the data that is being streamed. Using this table you can get some idea about the data being transferred across the cluster via streaming replication.

Finally, performance. This is also one of the reasons why you do not want to use streaming replication all the time. The main reason is locking - with streaming replication you have to acquire row locks on all of the nodes. It takes time to get the locks and, should you have to roll back the transaction, it also puts pressure on all nodes to perform the rollback. We ran a very quick test of the performance impact that streaming replication has. The environment is strictly a test one, so do not assume these results would be the same on production-grade hardware; it is more for you to see what the impact could be.

We tested four scenarios:

  1. Baseline, set global wsrep_trx_fragment_size=0;
  2. set global wsrep_trx_fragment_unit='rows'; set global wsrep_trx_fragment_size=1;
  3. set global wsrep_trx_fragment_unit='statements'; set global wsrep_trx_fragment_size=1;
  4. set global wsrep_trx_fragment_unit='statements'; set global wsrep_trx_fragment_size=5;

We used sysbench r/w test:

sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --events=0 --time=300 --mysql-host=10.0.0.141 --mysql-user=sbtest --mysql-password=sbtest --mysql-port=3306 --tables=32 --report-interval=1 --skip-trx=off --table-size=100000 --db-ps-mode=disable run

The results are:

  1. Transactions: 82.91 per sec., queries: 1658.27 per sec. (100%)
  2. Transactions: 54.72 per sec., queries: 1094.43 per sec. (66%)
  3. Transactions: 54.76 per sec., queries: 1095.18 per sec. (66%)
  4. Transactions: 70.93 per sec., queries: 1418.55 per sec. (86%)

As you can see, the impact is significant: performance drops by as much as a third.
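The percentages in parentheses above can be reproduced directly from the raw transaction rates; a quick sketch:

```shell
# Throughput of a scenario relative to the baseline, as a rounded percentage.
relative_tps() {
    awk -v tps="$1" -v base="$2" 'BEGIN { printf "%.0f\n", tps / base * 100 }'
}

relative_tps 54.72 82.91   # fragment of 1 row vs. baseline
relative_tps 70.93 82.91   # fragment of 5 statements vs. baseline
```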

We hope you found this blog post informative and it gave you some insights into the streaming replication that comes with Galera 4 and MariaDB 10.4. We tried to cover use cases and potential drawbacks related to this new technology.

MySQL on Docker: ProxySQL Native Clustering with Kubernetes


ProxySQL has supported native clustering since v1.4.2. This means multiple ProxySQL instances are cluster-aware; they are aware of each other's state and able to handle configuration changes automatically by syncing up to the most up-to-date configuration based on configuration version, timestamp and checksum value. Check out this blog post which demonstrates how to configure clustering support for ProxySQL and how you could expect it to behave.

ProxySQL is a decentralized proxy, recommended to be deployed closer to the application. This approach scales pretty well, even up to hundreds of nodes, as it was designed to be easily reconfigurable at runtime. To efficiently manage multiple ProxySQL nodes, one has to make sure that whatever changes are performed on one of the nodes are applied across all nodes in the farm. Without native clustering, one has to manually export the configurations and import them to the other nodes (albeit, you could automate this by yourself).

In the previous blog post, we covered ProxySQL clustering via Kubernetes ConfigMap. This approach is more or less efficient thanks to the centralized configuration in ConfigMap. Whatever is loaded into the ConfigMap is mounted into the pods. Updating the configuration can be done via versioning (modify the proxysql.cnf content and load it into a ConfigMap under another name) and then pushed to the pods depending on the Deployment scheduling and update strategy.

However, in a rapidly changing environment, this ConfigMap approach is probably not the best method, because loading a new configuration requires pod rescheduling to remount the ConfigMap volume, and this might jeopardize the ProxySQL service as a whole. For example, let's say that in our environment a strict password policy forces MySQL user password expiration every 7 days, so we would have to keep updating the ProxySQL ConfigMap with the new password on a weekly basis. As a side note, a MySQL user inside ProxySQL requires the user and password to match the one on the backend MySQL servers. That's where we should start making use of ProxySQL's native clustering support in Kubernetes, to automatically apply configuration changes without the hassle of ConfigMap versioning and pod rescheduling.

In this blog post, we’ll show you how to run ProxySQL native clustering with headless service on Kubernetes. Our high-level architecture can be illustrated as below:

We have 3 Galera nodes running on bare-metal infrastructure deployed and managed by ClusterControl:

  • 192.168.0.21
  • 192.168.0.22
  • 192.168.0.23

Our applications are all running as pods within Kubernetes. The idea is to introduce two ProxySQL instances between the application and our database cluster to serve as a reverse proxy. Applications will then connect to the ProxySQL pods via a Kubernetes service, which will be load balanced and failed over across a number of ProxySQL replicas.

The following is a summary of our Kubernetes setup:

root@kube1:~# kubectl get nodes -o wide
NAME    STATUS   ROLES    AGE     VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
kube1   Ready    master   5m      v1.15.1   192.168.100.201   <none>        Ubuntu 18.04.1 LTS   4.15.0-39-generic   docker://18.9.7
kube2   Ready    <none>   4m1s    v1.15.1   192.168.100.202   <none>        Ubuntu 18.04.1 LTS   4.15.0-39-generic   docker://18.9.7
kube3   Ready    <none>   3m42s   v1.15.1   192.168.100.203   <none>        Ubuntu 18.04.1 LTS   4.15.0-39-generic   docker://18.9.7

ProxySQL Configuration via ConfigMap

Let's first prepare our base configuration which will be loaded into ConfigMap. Create a file called proxysql.cnf and add the following lines:

datadir="/var/lib/proxysql"

admin_variables=
{
    admin_credentials="proxysql-admin:adminpassw0rd;cluster1:secret1pass"
    mysql_ifaces="0.0.0.0:6032"
    refresh_interval=2000
    cluster_username="cluster1"
    cluster_password="secret1pass"
    cluster_check_interval_ms=200
    cluster_check_status_frequency=100
    cluster_mysql_query_rules_save_to_disk=true
    cluster_mysql_servers_save_to_disk=true
    cluster_mysql_users_save_to_disk=true
    cluster_proxysql_servers_save_to_disk=true
    cluster_mysql_query_rules_diffs_before_sync=3
    cluster_mysql_servers_diffs_before_sync=3
    cluster_mysql_users_diffs_before_sync=3
    cluster_proxysql_servers_diffs_before_sync=3
}

mysql_variables=
{
    threads=4
    max_connections=2048
    default_query_delay=0
    default_query_timeout=36000000
    have_compress=true
    poll_timeout=2000
    interfaces="0.0.0.0:6033;/tmp/proxysql.sock"
    default_schema="information_schema"
    stacksize=1048576
    server_version="5.1.30"
    connect_timeout_server=10000
    monitor_history=60000
    monitor_connect_interval=200000
    monitor_ping_interval=200000
    ping_interval_server_msec=10000
    ping_timeout_server=200
    commands_stats=true
    sessions_sort=true
    monitor_username="proxysql"
    monitor_password="proxysqlpassw0rd"
    monitor_galera_healthcheck_interval=2000
    monitor_galera_healthcheck_timeout=800
}

mysql_galera_hostgroups =
(
    {
        writer_hostgroup=10
        backup_writer_hostgroup=20
        reader_hostgroup=30
        offline_hostgroup=9999
        max_writers=1
        writer_is_also_reader=1
        max_transactions_behind=30
        active=1
    }
)

mysql_servers =
(
    { address="192.168.0.21" , port=3306 , hostgroup=10, max_connections=100 },
    { address="192.168.0.22" , port=3306 , hostgroup=10, max_connections=100 },
    { address="192.168.0.23" , port=3306 , hostgroup=10, max_connections=100 }
)

mysql_query_rules =
(
    {
        rule_id=100
        active=1
        match_pattern="^SELECT .* FOR UPDATE"
        destination_hostgroup=10
        apply=1
    },
    {
        rule_id=200
        active=1
        match_pattern="^SELECT .*"
        destination_hostgroup=20
        apply=1
    },
    {
        rule_id=300
        active=1
        match_pattern=".*"
        destination_hostgroup=10
        apply=1
    }
)

mysql_users =
(
    { username = "wordpress", password = "passw0rd", default_hostgroup = 10, transaction_persistent = 0, active = 1 },
    { username = "sbtest", password = "passw0rd", default_hostgroup = 10, transaction_persistent = 0, active = 1 }
)

proxysql_servers =
(
    { hostname = "proxysql-0.proxysqlcluster", port = 6032, weight = 1 },
    { hostname = "proxysql-1.proxysqlcluster", port = 6032, weight = 1 }
)

Some of the above configuration lines are explained per section below:

admin_variables

Pay attention to the admin_credentials variable, where we used a non-default user, "proxysql-admin". ProxySQL reserves the default "admin" user for local connections via localhost only. Therefore, we have to use another user to access the ProxySQL instance remotely. Otherwise, you would get the following error:

ERROR 1040 (42000): User 'admin' can only connect locally

We also appended the cluster_username and cluster_password values in the admin_credentials line, separated by a semicolon, to allow automatic syncing to happen. All variables prefixed with cluster_* are related to ProxySQL native clustering and are self-explanatory.
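To make the semicolon-separated format concrete, here is a small sketch in Python that splits an admin_credentials string into user/password pairs (the values below are illustrative, not taken from a live config):

```python
def parse_admin_credentials(value):
    """Split ProxySQL's admin_credentials string ("user1:pass1;user2:pass2")
    into (username, password) pairs."""
    return [tuple(pair.split(":", 1)) for pair in value.split(";") if pair]

# Illustrative values only -- use the credentials from your own proxysql.cnf.
creds = parse_admin_credentials("proxysql-admin:adminpassw0rd;cluster1:secret1pass")
print(creds[0])  # ('proxysql-admin', 'adminpassw0rd')
```

Every pair in the list is a valid login for the admin interface, which is what lets the cluster_* user authenticate between peers.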

mysql_galera_hostgroups

This is a new directive introduced in ProxySQL 2.x (our ProxySQL image is running 2.0.5). If you would like to run ProxySQL 1.x, remove this part and use the scheduler table instead. We already explained the configuration details in the blog post How to Run and Configure ProxySQL 2.0 for MySQL Galera Cluster on Docker, under "ProxySQL 2.x Support for Galera Cluster".

mysql_servers

All lines are self-explanatory; they are based on the three database servers running in our MySQL Galera Cluster, as summarized in the following topology screenshot taken from ClusterControl:

proxysql_servers

Here we define a list of ProxySQL peers:

  • hostname - Peer's hostname/IP address
  • port - Peer's admin port
  • weight - Currently unused, but in the roadmap for future enhancements
  • comment - Free form comment field

In a Docker/Kubernetes environment, there are multiple ways to discover and link up container hostnames or IP addresses and insert them into this table: via ConfigMap, manual insert, entrypoint.sh scripting, environment variables, or some other means. In Kubernetes, depending on the ReplicationController or Deployment method used, guessing the pod's resolvable hostname in advance is somewhat tricky, unless you are running on a StatefulSet.

Check out this tutorial on the StatefulSet pod ordinal index, which provides a stable resolvable hostname for the created pods. Combined with a headless service (explained further down), the resolvable hostname format is:

{app_name}-{index_number}.{service}

Where {service} is a headless service, which explains where "proxysql-0.proxysqlcluster" and "proxysql-1.proxysqlcluster" come from. If you want to have more than 2 replicas, add more entries accordingly by appending an ascending index number relative to the StatefulSet application name.
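The hostname scheme above is easy to generate programmatically. The following Python sketch (a hypothetical helper, not part of ProxySQL or Kubernetes) builds proxysql_servers entries for any number of StatefulSet replicas:

```python
def proxysql_peers(app_name, service, replicas, port=6032, weight=1):
    """Build proxysql_servers entries for a StatefulSet of a given size,
    following the stable hostname format {app_name}-{index}.{service}."""
    return [
        {"hostname": "%s-%d.%s" % (app_name, i, service), "port": port, "weight": weight}
        for i in range(replicas)
    ]

for peer in proxysql_peers("proxysql", "proxysqlcluster", 3):
    print(peer["hostname"])
# proxysql-0.proxysqlcluster
# proxysql-1.proxysqlcluster
# proxysql-2.proxysqlcluster
```

Scaling the StatefulSet to more replicas only requires regenerating this list with a larger count and updating the ConfigMap accordingly.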

Now we are ready to push the configuration file into ConfigMap, which will be mounted into every ProxySQL pod during deployment:

$ kubectl create configmap proxysql-configmap --from-file=proxysql.cnf

Verify that our ConfigMap is loaded correctly:

$ kubectl get configmap
NAME                 DATA   AGE
proxysql-configmap   1      7h57m

Creating ProxySQL Monitoring User

The next step before we start the deployment is to create the ProxySQL monitoring user in our database cluster. Since we are running a Galera cluster, run the following statements on one of the Galera nodes:

mysql> CREATE USER 'proxysql'@'%' IDENTIFIED BY 'proxysqlpassw0rd';
mysql> GRANT USAGE ON *.* TO 'proxysql'@'%';

If you haven't created the MySQL users (as specified under mysql_users section above), we have to create them as well:

mysql> CREATE USER 'wordpress'@'%' IDENTIFIED BY 'passw0rd';
mysql> GRANT ALL PRIVILEGES ON wordpress.* TO 'wordpress'@'%';
mysql> CREATE USER 'sbtest'@'%' IDENTIFIED BY 'passw0rd';
mysql> GRANT ALL PRIVILEGES ON sbtest.* TO 'sbtest'@'%';

That's it. We are now ready to start the deployment.

Deploying a StatefulSet

We will start by creating two ProxySQL instances (replicas) for redundancy, using a StatefulSet.

Let's start by creating a text file called proxysql-ss-svc.yml and add the following lines:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: proxysql
  labels:
    app: proxysql
spec:
  replicas: 2
  serviceName: proxysqlcluster
  selector:
    matchLabels:
      app: proxysql
      tier: frontend
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: proxysql
        tier: frontend
    spec:
      restartPolicy: Always
      containers:
      - image: severalnines/proxysql:2.0.4
        name: proxysql
        volumeMounts:
        - name: proxysql-config
          mountPath: /etc/proxysql.cnf
          subPath: proxysql.cnf
        ports:
        - containerPort: 6033
          name: proxysql-mysql
        - containerPort: 6032
          name: proxysql-admin
      volumes:
      - name: proxysql-config
        configMap:
          name: proxysql-configmap
---
apiVersion: v1
kind: Service
metadata:
  annotations:
  labels:
    app: proxysql
    tier: frontend
  name: proxysql
spec:
  ports:
  - name: proxysql-mysql
    port: 6033
    protocol: TCP
    targetPort: 6033
  - name: proxysql-admin
    nodePort: 30032
    port: 6032
    protocol: TCP
    targetPort: 6032
  selector:
    app: proxysql
    tier: frontend
  type: NodePort

There are two sections in the above definition: StatefulSet and Service. The StatefulSet is the definition of our pods (replicas) and the mount point for our ConfigMap volume, loaded from proxysql-configmap. The next section is the Service definition, where we define how the pods should be exposed and routed for the internal or external network. Create the resources:

$ kubectl create -f proxysql-ss-svc.yml

Verify the pod and service states:

$ kubectl get pods,svc
NAME             READY   STATUS    RESTARTS   AGE
pod/proxysql-0   1/1     Running   0          4m46s
pod/proxysql-1   1/1     Running   0          2m59s

NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
service/kubernetes        ClusterIP   10.96.0.1        <none>        443/TCP                         10h
service/proxysql          NodePort    10.111.240.193   <none>        6033:30314/TCP,6032:30032/TCP   5m28s

If you look at the pod's log, you will notice it is flooded with this warning:

$ kubectl logs -f proxysql-0
...
2019-08-01 19:06:18 ProxySQL_Cluster.cpp:215:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer proxysql-1.proxysqlcluster:6032 . Error: Unknown MySQL server host 'proxysql-1.proxysqlcluster' (0)

The above simply means proxysql-0 was unable to resolve "proxysql-1.proxysqlcluster" and connect to it. This is expected, since we haven't yet created the headless service that provides the DNS records needed for inter-ProxySQL communication.

Kubernetes Headless Service

In order for ProxySQL pods to be able to resolve the anticipated FQDN and connect to it directly, the resolving process must be able to look up the assigned target pod IP address, not the virtual IP address. This is where the headless service comes into the picture. When creating a headless service by setting "clusterIP: None", no load balancing is configured and no cluster IP (virtual IP) is allocated for the service. Only DNS is configured automatically. When you run a DNS query for a headless service, you will get the list of the pods' IP addresses.

Here is what it looks like if we look up the headless service DNS records for "proxysqlcluster" (in this example we had 3 ProxySQL instances):

$ host proxysqlcluster
proxysqlcluster.default.svc.cluster.local has address 10.40.0.2
proxysqlcluster.default.svc.cluster.local has address 10.40.0.3
proxysqlcluster.default.svc.cluster.local has address 10.32.0.2

By contrast, the following output shows the DNS record for the standard service called "proxysql", which resolves to the clusterIP:

$ host proxysql
proxysql.default.svc.cluster.local has address 10.110.38.154
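To illustrate the difference, a small hypothetical parser over the `host` outputs above shows that the headless lookup returns one address per pod, while the standard service returns a single virtual IP:

```python
def parse_host_output(output):
    """Collect the addresses from `host` command output lines of the form
    '<name> has address <ip>'."""
    return [line.rsplit(" ", 1)[-1]
            for line in output.splitlines() if " has address " in line]

# Outputs reproduced from the examples above.
headless_lookup = """\
proxysqlcluster.default.svc.cluster.local has address 10.40.0.2
proxysqlcluster.default.svc.cluster.local has address 10.40.0.3
proxysqlcluster.default.svc.cluster.local has address 10.32.0.2"""

standard_lookup = "proxysql.default.svc.cluster.local has address 10.110.38.154"

print(len(parse_host_output(headless_lookup)))  # 3 -- one address per pod
print(len(parse_host_output(standard_lookup)))  # 1 -- the single clusterIP
```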

To create a headless service and attach it to the pods, define the serviceName inside the StatefulSet declaration, and the Service definition must have "clusterIP: None" as shown below. Create a text file called proxysql-headless-svc.yml and add the following lines:

apiVersion: v1
kind: Service
metadata:
  name: proxysqlcluster
  labels:
    app: proxysql
spec:
  clusterIP: None
  ports:
  - port: 6032
    name: proxysql-admin
  selector:
    app: proxysql

Create the headless service:

$ kubectl create -f proxysql-headless-svc.yml

Just for verification, at this point, we have the following services running:

$ kubectl get svc
NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                         AGE
kubernetes        ClusterIP   10.96.0.1       <none>        443/TCP                         8h
proxysql          NodePort    10.110.38.154   <none>        6033:30200/TCP,6032:30032/TCP   23m
proxysqlcluster   ClusterIP   None            <none>        6032/TCP                        4s

Now, check out one of our pod's log:

$ kubectl logs -f proxysql-0
...
2019-08-01 19:06:19 ProxySQL_Cluster.cpp:215:ProxySQL_Cluster_Monitor_thread(): [WARNING] Cluster: unable to connect to peer proxysql-1.proxysqlcluster:6032 . Error: Unknown MySQL server host 'proxysql-1.proxysqlcluster' (0)
2019-08-01 19:06:19 [INFO] Cluster: detected a new checksum for mysql_query_rules from peer proxysql-1.proxysqlcluster:6032, version 1, epoch 1564686376, checksum 0x3FEC69A5C9D96848 . Not syncing yet ...
2019-08-01 19:06:19 [INFO] Cluster: checksum for mysql_query_rules from peer proxysql-1.proxysqlcluster:6032 matches with local checksum 0x3FEC69A5C9D96848 , we won't sync.

You will notice that the Cluster component is able to resolve, connect to, and detect a new checksum from the other peer, proxysql-1.proxysqlcluster, on port 6032 via the headless service called "proxysqlcluster". Note that this service exposes port 6032 within the Kubernetes network only, hence it is unreachable externally.
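Conceptually, the Cluster module's decision boils down to comparing a checksum of each configuration table against the peer's. The sketch below is a toy illustration of that compare-then-sync logic, not ProxySQL's actual checksum algorithm:

```python
import hashlib

def config_checksum(rows):
    """Toy checksum over a config table's rows; ProxySQL computes its own
    internal checksums, this only illustrates the compare-then-sync decision."""
    blob = "\n".join(sorted(map(str, rows))).encode()
    return "0x" + hashlib.sha256(blob).hexdigest()[:16].upper()

local_rules  = [("rule_id", 100), ("rule_id", 200), ("rule_id", 300)]
remote_rules = [("rule_id", 100), ("rule_id", 200), ("rule_id", 300)]

if config_checksum(local_rules) == config_checksum(remote_rules):
    print("checksum matches with local checksum, we won't sync")
else:
    print("detected a new checksum, proceeding with remote sync")
```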

At this point, our deployment is now complete.

Connecting to ProxySQL

There are several ways to connect to the ProxySQL services. Load-balanced MySQL connections should be sent to port 6033 from within the Kubernetes network, or to the NodePort mapped to port 6033 (for example, 30314 in the output above) if the client is connecting from an external network.

To connect to the ProxySQL admin interface from an external network, we can connect to the port defined under the nodePort section, 30032 (192.168.100.203 is the primary IP address of host kube3.local):

$ mysql -uproxysql-admin -padminpassw0rd -h192.168.100.203 -P30032

Use the clusterIP 10.110.38.154 (defined under "proxysql" service) on port 6032 if you want to access it from other pods in Kubernetes network.

Then perform the ProxySQL configuration changes as you wish and load them to runtime:

mysql> INSERT INTO mysql_users (username,password,default_hostgroup) VALUES ('newuser','passw0rd',10);
mysql> LOAD MYSQL USERS TO RUNTIME;

You will notice the following lines in one of the pods, indicating that the configuration syncing has completed:

$ kubectl logs -f proxysql-0
...
2019-08-02 03:53:48 [INFO] Cluster: detected a peer proxysql-1.proxysqlcluster:6032 with mysql_users version 2, epoch 1564718027, diff_check 4. Own version: 1, epoch: 1564714803. Proceeding with remote sync
2019-08-02 03:53:48 [INFO] Cluster: detected peer proxysql-1.proxysqlcluster:6032 with mysql_users version 2, epoch 1564718027
2019-08-02 03:53:48 [INFO] Cluster: Fetching MySQL Users from peer proxysql-1.proxysqlcluster:6032 started
2019-08-02 03:53:48 [INFO] Cluster: Fetching MySQL Users from peer proxysql-1.proxysqlcluster:6032 completed

Keep in mind that automatic syncing only happens when there is a configuration change in the ProxySQL runtime. Therefore, it's vital to run the "LOAD ... TO RUNTIME" statement before you can see the action. Don't forget to save the ProxySQL changes to disk for persistence:

mysql> SAVE MYSQL USERS TO DISK;

Limitation

Note that there is a limitation to this setup: ProxySQL does not support saving/exporting the active configuration into a text configuration file that we could later load into the ConfigMap for persistency. There is a feature request for this. Meanwhile, you can push the modifications to the ConfigMap manually. Otherwise, if the pods were accidentally deleted, you would lose your current configuration, because the new pods would be bootstrapped with whatever is defined in the ConfigMap.
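One way to push the modifications back manually is to regenerate the relevant section of proxysql.cnf from the runtime state. A hypothetical helper for the mysql_users section might look like this (field names follow the configuration file shown earlier):

```python
def render_mysql_users(users):
    """Render a mysql_users section in proxysql.cnf syntax, ready to paste
    back into the ConfigMap so runtime changes survive a pod restart."""
    rows = [
        '    {{ username = "{username}", password = "{password}", '
        'default_hostgroup = {hostgroup}, transaction_persistent = 0, active = 1 }}'
        .format(**u)
        for u in users
    ]
    return "mysql_users =\n(\n" + ",\n".join(rows) + "\n)"

print(render_mysql_users([
    {"username": "wordpress", "password": "passw0rd", "hostgroup": 10},
    {"username": "newuser", "password": "passw0rd", "hostgroup": 10},
]))
```

After regenerating the section, update the ConfigMap (for example with `kubectl create configmap ... --dry-run -o yaml | kubectl apply -f -`) so freshly bootstrapped pods pick up the same users.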

Special thanks to Sampath Kamineni, who sparked the idea of this blog post and provided insights about the use cases and implementation.

A Guide to Automated Cloud Database Deployments


Complex, inflexible architectures, redundancy, and out-of-date technology are common problems for companies facing data migration to the cloud.

We look to the "clouds," hoping to find there a magic solution: improved operational speed and performance, better workload handling and scalability, less error-prone and less complicated architectures. We hope to make our database administrators' lives more comfortable. But is that really always the case?

As more enterprises move to the cloud, the hybrid model is actually becoming more popular, as it is seen as a safe approach by many businesses.

In fact, it's challenging to do a heart transplant and port everything over immediately. Many companies are doing a slow migration that usually takes a year, or maybe even forever, until everything is migrated. The move should be made at an acceptable pace.

Unfortunately, hybrid means another puzzle piece, and it does not necessarily reduce complexity. Perhaps, like many others walking this road before you, you will find out that some of the applications will actually not move.

Or you will find out that the other project team just decided to use yet another cloud provider. 

For instance, it is free, and relatively easy, to move any amount of data into an AWS EC2 instance, but you'll have to pay to transfer data out of AWS. The database services on Amazon are only available on Amazon. Vendor lock-in is real and should not be ignored.

Along the same lines, ClusterControl offers a suite of database automation and management functions to give you full control of your database infrastructure, with support for on-prem deployments, the cloud, and multiple vendors.

With ClusterControl, you can monitor, deploy, manage, and scale your databases, securely and with ease, through our point-and-click interface.

Utilizing the cloud enables your company and applications to profit from the cost savings and versatility that come with cloud computing.

Supported Cloud Platforms

ClusterControl allows you to run multiple databases on top of the most popular cloud providers without being locked in to any vendor. It has offered the ability to deploy (and back up) databases in the cloud since ClusterControl 1.6.

The supported cloud platforms are Amazon AWS, Microsoft Azure and Google Cloud. It is possible to launch new instances and deploy MySQL, MariaDB, MongoDB, and PostgreSQL directly from the ClusterControl user interface. 

The recent ClusterControl version (1.7.4) added support for MySQL Replication 8.0, PostgreSQL, and TimescaleDB on Amazon AWS, Google Cloud Platform, and Microsoft Azure.

ClusterControl: Supported Platforms

 

Cloud Providers Configuration

Before we jump into our first deployment, we need to connect ClusterControl with our cloud provider. This is done in the Integrations panel.

ClusterControl- Cloud Credential Management

The tool will walk you through the cloud integration with a straightforward wizard. As we can see in the screenshot below, we start with one of the three big players: Amazon Web Services (AWS), Google Cloud, and Microsoft Azure.

ClusterControl -Supported Cloud Platforms

In the next section, we need to provide the necessary credentials.

ClusterControl - Set Credentials

When all is set and ClusterControl can talk to your cloud provider, we can go to the deployment section.

ClusterControl: Deployment Options

Cloud Deployment Process

In this part, you select the supported cluster type: MySQL Galera Cluster, MongoDB Replica Set, PostgreSQL Streaming Replication, TimescaleDB, or MySQL Replication.

The next move is to pick the supported vendor for the selected cluster type. At the moment, the following vendors and versions are available:

  • MySQL Galera Cluster - Percona XtraDB Cluster 5.7, MariaDB 10.2, MariaDB 10.3

  • MySQL Replication Cluster - Percona Server 8.0, MariaDB Server 10.3, Oracle MySQL Server 8.0

  • MongoDB Replica Set - Percona Server for MongoDB 3.6, MongoDB 3.6, MongoDB 4.0

  • PostgreSQL Cluster - PostgreSQL 11.0

  • TimescaleDB 11.0

The deployment procedure is aware of the functionality and flexibility of cloud environments, such as dynamic IP and hostname allocation for VMs, NAT-ed public IP addresses, virtual private cloud networks, and storage.

In the following dialog:

ClusterControl - Deploy MySQL Replication Cluster
ClusterControl - Select Credential

Most of the settings in this step are dynamically populated from the cloud provider based on the chosen credentials. You can configure the operating system, instance size, VPC settings, storage type and size, and also specify the SSH key location on the ClusterControl host. You can also let ClusterControl generate a new key specifically for these instances.

ClusterControl - Cloud Deployment, Select Virtual Machine

When all is set, you will see your configuration. At this stage, you can also pick an additional subnet.

ClusterControl - Cloud Deployment Summary

Verify that everything is correct and hit the "Deploy Cluster" button to start the deployment.

You can then monitor the progress by clicking on the Activity -> Jobs -> Create Cluster -> Full Job Details:

ClusterControl - Full Job Details

Depending on the cluster size, it could take 10 to 20 minutes to complete. Once done, you will see a new database cluster listed under the ClusterControl dashboard.

ClusterControl - New Cluster Added
ClusterControl - Topology

Under the hood, the deployment process did the following:

  • Create SSH key
  • Create cloud VM instances
  • Configure security groups and networking (firewalls, subnets)
  • Verify the SSH connectivity from ClusterControl to all created instances
  • Prepare VM’s for a specific type of cluster (VM node configuration like package installation, kernel configuration, etc)
  • Deploy a database on every instance
  • Configure the clustering or replication links
  • Register the deployment into ClusterControl

After the deployment, you can review the process and see exactly what was executed. With the extended logging, you can see each command, who triggered the job, and what the outcome was.

If at any point you want to extend your cluster, you can use the scaling functionality, which is also integrated with your cloud provider.

The process is simple. In the first phase, you choose the desired VM type.

ClusterControl - Add Node

Finally, you can choose the master node and the remaining settings, which depend on your cluster type:

ClusterControl - Add Node

Conclusion

We showed you how to set up your MySQL Replication environment on Microsoft Azure; it only took a couple of clicks to build the virtual machines and network, and finally a reliable master/slave replication cluster. With the new cloud scaling functionality, you can also easily expand the cluster whenever needed.

This is just a first step. If you want to see what to do next, check out our other blogs, where we talk about auto-recovery, backups, security, and many other aspects of day-to-day administration with ClusterControl. Want to try it yourself? Give it a try.
