
Analytics with MariaDB AX - the Open Source Columnar Datastore


Long gone are the days of the one-database-fits-all approach.

With increasing requirements on velocity, performance and agility, numerous datastores emerged, which are intended to solve one particular problem. We have relational databases, document stores, time-series databases, columnar databases, full text search engines.

It is quite common to see multiple datastores work together in the same environment.

So how does MariaDB AX fit in the picture? How does it compare with MariaDB TX and what problem does it solve?

In this blog post, we will have a look at MariaDB AX, and see why you may want to use it.

What is MariaDB AX?

First things first, so what is MariaDB AX?

It is a column store - it stores its data by... column! It is implemented as a separate storage engine in the MariaDB 10.3 database.

As you may know, MySQL and MariaDB are designed to use pluggable storage engines. Every storage engine - be it InnoDB, Aria, MyRocks, Spider or any other - is a plugin.

In the same way, MariaDB AX uses the ColumnStore engine:

MariaDB [(none)]> SHOW ENGINES\G
*************************** 1. row ***************************
      Engine: Columnstore
     Support: YES
     Comment: Columnstore storage engine
Transactions: YES
          XA: NO
  Savepoints: NO

This results in an interesting combination. The SQL parsing is done by MariaDB, so you can expect to work with query syntax very similar to what you are used to in MariaDB. This also makes it easier to combine access to both MariaDB AX and MariaDB TX in the same application. There is no need for any specific connectors or libraries to connect to the two datastores - all of it can be done using the MySQL or MariaDB client library. You can also utilize MaxScale for both datastores, which can help to build high availability for MariaDB AX.

Why Should We Use a Columnar Datastore?

Let’s go through a short introduction to the idea behind columnar datastores.

What makes MariaDB AX different from MariaDB TX?

The main difference is how the data is structured. In a typical database, data is stored as rows:

Id, Product, Price, Code, Warehouse
1, Door, 10, 12334, EU1
2, Window, 9, 9523, EU1
3, Glass, 12, 97643, EU2

As you can see, we have three rows, each containing all the data about a product entry.

The problem is that this way of storing data is not very efficient when you want to get just a subset of it. Let's say you want only the "Product" and "Price" columns - to do that you have to read the whole rows and then discard the unneeded columns. It is also tricky to sort the data: if you would like to sort the dataset from the most expensive to the cheapest product, you have to read everything and then do the sort.

We all know that databases utilize indexes to speed up access. An index is structured in a way that it contains the content of the indexed column as well as a pointer to the full row (in InnoDB that’s the Primary Key). For example, an index on the “Product” column, assuming that “Id” is the Primary Key, may look like as follows:

Product, Id
Door, 1
Window, 2
Glass, 3

This speeds up access to the data, as there's no need to read the full row just to find a value in the "Product" column. Once the database finds it, it can read the rest of the row (if needed) by following the pointer.
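
To make the row-store layout concrete, here is a minimal sketch (hypothetical table and index names) of how such a table and its secondary index could be defined in MariaDB TX:

CREATE TABLE products (
    id INT NOT NULL PRIMARY KEY,   -- InnoDB clusters rows by this key
    product VARCHAR(50),
    price DECIMAL(10,2),
    code VARCHAR(20),
    warehouse VARCHAR(10)
) ENGINE=InnoDB;

-- Secondary index: stores the "product" value together with the primary key,
-- which acts as the pointer back to the full row
CREATE INDEX idx_product ON products (product);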

In a column store, things are different. Data is structured not as rows but as columns. To some extent this is similar to an index. Our table in a columnar datastore may look like this:

Id: 1, 2, 3
Product: Door, Window, Glass
Price: 10, 9, 12
Code: 12334, 9523, 97643
Warehouse: EU1, EU1, EU2

In MariaDB AX, columns are stored in separate files; each entry for a given "row" starts at the same offset.

The main advantage here is that if you want to run a query which will work with just a subset of data, you only need to read data from columns which are relevant to the query.

In our earlier example, instead of reading the whole dataset, we can just load data for the "Product" and "Price" columns. This reduces the amount of data that has to be read from disk and speeds up the process.
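
In SQL terms, a query like the one below (reusing the hypothetical products table sketched earlier) only has to touch the column files for "Product" and "Price" in a column store:

SELECT product, price
FROM products
ORDER BY price DESC;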

What's also important, storing data in columns groups similar, often repetitive values together, which makes them compress better. For example, in our "Warehouse" column we have just two distinct entries. Even in a real-world scenario it is very likely that we will end up with a small number of warehouses compared to the number of products. This makes the "Warehouse" column a very good target for compression.

As a result of all this, columnar datastores can handle large datasets better and query them more efficiently than "standard" OLTP-focused databases.


Why Should I Use MariaDB AX?

Disk access is a major bottleneck in databases. A columnar datastore improves performance by reducing the amount of data that needs to be read from disk. It reads only the data necessary to answer the query.

Of course, MariaDB AX is not the only columnar datastore out there. There are many others, for example ClickHouse or Apache HBase.

The truth is, none of the other options supports the full SQL syntax that MySQL does. They require different connectors and a different approach to querying the data, while MariaDB AX can be queried just like you would query a "normal" MariaDB.

What's also important, given that MariaDB AX utilizes the ColumnStore engine, it is perfectly fine to mix it with other engines. You can mix and join InnoDB and ColumnStore tables in the same query without any problem.
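
As a hedged sketch of what such a mixed setup could look like (table names and columns are made up for illustration), an OLTP table on InnoDB can be joined with an analytical table on ColumnStore in a single query:

-- OLTP table handled by InnoDB
CREATE TABLE orders (
    order_id INT NOT NULL PRIMARY KEY,
    customer_id INT,
    total DECIMAL(10,2)
) ENGINE=InnoDB;

-- Analytical history table handled by ColumnStore (no indexes needed or supported)
CREATE TABLE orders_history (
    order_id INT,
    customer_id INT,
    total DECIMAL(10,2),
    created_at DATETIME
) ENGINE=ColumnStore;

-- Join across both engines in one query
SELECT o.customer_id, SUM(h.total) AS lifetime_total
FROM orders o
JOIN orders_history h ON h.customer_id = o.customer_id
GROUP BY o.customer_id;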

On top of that, tools that come with MariaDB TX, like MaxScale, will work just fine with MariaDB AX, making it easier to build an integrated, easy-to-use environment. So when you are running ClusterControl with MariaDB 10.3 and MaxScale, you can easily add MariaDB AX into the mix and it will work with the other parts of the setup.

MariaDB AX comes with tools which are intended to help with transferring the data from other sources. If you happen to use Kafka or Spark, there are connectors to use when importing data from those sources into MariaDB AX.

Additionally, regular replication between MariaDB TX (InnoDB) and MariaDB AX (ColumnStore) does not perform well due to ColumnStore limitations - columnar datastores are always better at batch inserts than at the single-row inserts replication produces. It is, however, possible to build a pipeline consisting of MaxScale configured as a binlog server with the Avro CDC router, the MaxScale CDC Data Adapter and MariaDB AX, which will receive data from the adapter almost in real time.

We hope this blog post gives you some insight into what MariaDB AX is and how it can be utilized alongside a MariaDB TX environment deployed and managed by ClusterControl (download it for free!).


Hybrid OLTP/Analytics Database Workloads in Galera Cluster Using Asynchronous Slaves


Using Galera Cluster is a great way of building a highly available environment for MySQL or MariaDB. It is a shared-nothing cluster environment which can be scaled even beyond 12-15 nodes.

Galera has some limitations, though. It shines in low-latency environments and, even though it can be used across a WAN, the performance is limited by network latency. Galera performance can also be impacted if one of the nodes starts to behave incorrectly. For example, excessive load on one of the nodes may slow it down, resulting in slower handling of the writes, and that will impact all of the other nodes in the cluster.

On the other hand, it is quite impossible to run a business without analyzing your data. Such analysis typically requires running heavy queries, which is quite different from an OLTP workload. In this blog post, we will discuss an easy way of running analytical queries for data stored in Galera Cluster for MySQL or MariaDB, in a way that does not impact the performance of the core cluster.

How to run analytical queries on Galera Cluster?

As we stated, running long-running queries directly on a Galera cluster is doable, but perhaps not such a good idea. Depending on the hardware, this can be an acceptable solution (if you use strong hardware and do not run a multi-threaded analytical workload), but even if CPU utilization is not a problem, the fact that one of the nodes has a mixed workload (OLTP and OLAP) will alone pose some performance challenges. OLAP queries will evict data required for your OLTP workload from the buffer pool, and this will slow down your OLTP queries. Luckily, there is a simple yet efficient way of separating the analytical workload from regular queries - an asynchronous replication slave.

A replication slave is a very simple solution - all you need is just another host, provisioned and configured for asynchronous replication from the Galera Cluster. With asynchronous replication, the slave will not impact the rest of the cluster in any way. No matter if it is heavily loaded or uses different (less powerful) hardware, it will just continue replicating from the core cluster. The worst case scenario is that the replication slave starts lagging behind, but then it is up to you to implement multi-threaded replication or, eventually, to scale up the replication slave.

Once the replication slave is up and running, you should run the heavier queries on it and offload the Galera cluster. This can be done in multiple ways, depending on your setup and environment. If you use ProxySQL, you can easily direct queries to the analytical slave based on the source host, user, schema or even the query itself. Otherwise it will be up to your application to send analytical queries to the correct host.

Setting up a replication slave is not very complex, but it still can be tricky if you are not proficient with MySQL and tools like xtrabackup. The whole process would consist of setting up the repository on a new server and installing the MySQL database. Then you would have to provision that host using data from the Galera cluster. You can use xtrabackup for that, but other tools like mydumper/myloader or even mysqldump will work as well (as long as you execute them correctly). Once the data is there, you have to set up the replication between a master Galera node and the replication slave. Finally, you would have to reconfigure your proxy layer to include the new slave and route the traffic towards it, or make tweaks in how your application connects to the database in order to redirect some of the load to the replication slave.
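
A minimal sketch of that last replication step, assuming MariaDB GTID-based replication and hypothetical hostnames and credentials:

-- on the replication slave, pointing it at one of the Galera nodes
CHANGE MASTER TO
  MASTER_HOST = 'galera-node-1',
  MASTER_USER = 'repl_user',
  MASTER_PASSWORD = 'repl_password',
  MASTER_USE_GTID = slave_pos;
START SLAVE;

-- verify that both replication threads are running
SHOW SLAVE STATUS\G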

What is important to keep in mind is that this setup is not resilient. If the "master" Galera node goes down, the replication link will be broken and it will take manual action to slave the replica off another master node in the Galera cluster.

This is not a big deal, especially if you use replication with GTID (Global Transaction ID), but you still have to detect that replication is broken and then take manual action.

How to set up the asynchronous slave to Galera Cluster using ClusterControl?

Luckily, if you use ClusterControl, the whole process can be automated and it requires just a handful of clicks. The initial state has already been set up using ClusterControl - a 3 node Galera cluster with 2 ProxySQL nodes and 2 Keepalived nodes for high availability of both database and proxy layer.

Adding the replication slave is just a click away:

Replication, obviously, requires binary logs to be enabled. If you do not have binlogs enabled on your Galera nodes, you can do that from ClusterControl as well. Please keep in mind that enabling binary logs will require a node restart to apply the configuration changes.

Even if one node in the cluster has binary logs enabled (marked as "Master" on the screenshot above), it's still good to enable binary logs on at least one more node. ClusterControl can automatically failover the replication slave after it detects that the master Galera node crashed, but for that, another master node with binary logs enabled is required, or it won't have anything to fail over to.

As we stated, enabling binary logs requires restart. You can either perform it straight away, or just make the configuration changes and perform the restart at some other time.

After binlogs have been enabled on some of the Galera nodes, you can proceed with adding the replication slave. In the dialog you have to pick the master host and pass the hostname or IP address of the slave. If you have recent backups at hand (which you should), you can use one of them to provision the slave. Otherwise ClusterControl will provision it using xtrabackup - all the recent master data will be streamed to the slave and then replication will be configured.

After the job completes, the replication slave is added to the cluster. As stated earlier, should 10.0.0.101 die, another host in the Galera cluster will be picked as the master and ClusterControl will automatically slave 10.0.0.104 off another node.

As we use ProxySQL, we need to configure it. We’ll add a new server into ProxySQL.

We created another hostgroup (30) where we put our asynchronous slave. We also increased "Max Replication Lag" to 50 seconds from the default 10. It is up to your business requirements how far the analytics slave can lag before it becomes a problem.

After that we have to configure a query rule that will match our OLAP traffic and route it to the OLAP hostgroup (30). On the screenshot above we filled in several fields - this is not mandatory. Typically you will need to use one, two of them at most. The screenshot serves as an example, so you can easily see that you can match queries using schema (if you have a separate schema with analytical data), hostname/IP (if OLAP queries are executed from a particular host) or user (if the application uses a particular user for analytical queries). You can also match queries directly, by either passing a full query or by marking them with SQL comments and letting ProxySQL route all queries containing an "OLAP_QUERY" string to our analytical hostgroup.
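
For reference, a hedged example of what such a rule could look like when defined directly through the ProxySQL admin interface (the rule id, hostgroup number and OLAP_QUERY marker are assumptions matching the setup described above):

-- route any query containing the OLAP_QUERY marker to hostgroup 30
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (100, 1, 'OLAP_QUERY', 30, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;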

As you can see, thanks to ClusterControl we were able to deploy a replication slave to Galera Cluster in just a couple of clicks. Some may argue that MySQL is not the most suitable database for analytical workloads, and we tend to agree. You can easily extend this setup with ClickHouse, by setting up replication from the asynchronous slave to a ClickHouse columnar datastore for much better analytical query performance. We described this setup in one of our earlier blog posts.

New Webinar: How to Migrate from Oracle DB to MariaDB


Migrating from Oracle database to MariaDB can come with a number of benefits: lower cost of ownership, access to and use of an open source database engine, tight integration with the web, an active community of MariaDB database users and more.

Over the years MariaDB has gained Enterprise support and maturity to run critical and complex data transaction systems. With the recent version, MariaDB has added some great new features such as SQL_Mode=Oracle compatibility, making the transition process easier than ever before.

Whether you’re planning to migrate from Oracle database to MariaDB manually or with the help of a commercial tool to automate the entire migration process, you need to know all the possible bottlenecks and methods involved in the process and the results validation.

Join us on March 12th as we walk you through all you need to know to plan and execute a successful migration from Oracle database to MariaDB.

Date, Time & Registration

Europe/MEA/APAC

Tuesday, March 12th at 09:00 GMT / 10:00 CET (Germany, France, Sweden)

Register Now

North America/LatAm

Tuesday, March 12th at 10:00 Pacific Time (US) / 13:00 Eastern Time (US)

Register Now

Agenda

  • A brief introduction to the platform
    • Oracle vs MariaDB
    • Platform support
    • Installation process
    • Database access
    • Backup process
    • Controlling query execution
    • Security
    • Replication options
  • Migration
    • Planning and development strategy
    • Assessment or preliminary check
    • Data type mapping
    • Migration tools
    • Migration process
    • Testing
  • Post-migration
    • Monitoring and Alerting
    • Performance Management
    • Backup Management
    • High availability
    • Upgrades
    • Scaling
    • Staff training

Speaker

Bartlomiej Oles is a MySQL and Oracle DBA with over 15 years of experience in managing highly available production systems at IBM, Nordea Bank, Acxiom, Lufthansa, and other Fortune 500 companies. In the past five years, his focus has been on building and applying automation tools to manage multi-datacenter database environments.

We look forward to “seeing” you there!

Migrating from Oracle Database to MariaDB - What You Should Know


Gartner predicts that by 2022, 50% of existing commercial databases will have been converted to open source databases. Even more, 70% of new in-house applications will be developed on an open source database platform (State of the Open-Source DBMS Market, 2018).

These are high numbers considering the maturity, stability, and criticality of popular, proprietary database software. The same may be observed in the top database ranking, where most of the top ten databases are open source.

What is pushing companies to do such moves?

There could be many reasons to migrate database systems. For some, the main reason will be the cost of licenses and ownership; but is it really only about the cost? And is open source stable enough to move critical production systems to that new open source world?

Open source databases, especially new ones brought into an organization, often come from a developer on a project team. They are chosen because they are free (they don't impact the project's direct external spend) and meet the technical requirements of the moment.

But the free aspect doesn't actually come with no cost, as you have to consider many factors, including the migration itself and the cost of man-hours. The smoother the migration, the less time and money spent on the project.

Database migrations can be challenging, especially heterogeneous proprietary database migrations such as Oracle to PostgreSQL, or Oracle to Percona Server or MySQL. The complex schema structure, data types, and database code like PL/SQL can be quite different from those of the target databases, requiring a schema and code transformation step before the data migration starts.

In a recent article, my colleague Paul Namuag examined how to migrate from Oracle to Percona Server.

This time we will take a look at what you should know before migrating from Oracle to MariaDB.

MariaDB promises enterprise and migration features that can help move Oracle databases into the open source world.

In this blog post we will cover the following:

  • Why migrate?
  • Storage engine differences
  • Database connectivity considerations
  • Installation & administration simplicity
  • Security differences
  • Replication & HA
  • PL/SQL and database code
  • Clustering and scaling
  • Backup and recovery
  • Cloud compatibility
  • Miscellaneous considerations

Why Migrate from Oracle?

Most enterprises will run Oracle or SQL Server, or a combination of both, with small pockets of isolated open source databases operating independently. Small to medium-sized businesses would tend to deploy mainly open source databases, especially for new applications. But this is changing, and often, open source is the main choice even for big organizations.

A quick comparison of these two database systems looks as follows:

  • Only Oracle Express Edition is free of cost, but it has very limited features when compared to MariaDB. For extensive features, either Oracle Standard Edition or Oracle Enterprise Edition has to be purchased.
  • On the other hand, the MariaDB and MySQL communities have been working hard to minimize the potential feature gap. Security compliance, hot backups, and many other enterprise features are now available in MariaDB.

There are things that were always more flexible in MariaDB/MySQL than in massive Oracle setups. One of them is the ease of replication and horizontal cluster scalability.

Storage engine differences

First, let’s start with some basics. You can still hear a lot of legends and myths regarding MySQL or MariaDB limitations, which mostly refer to the dark times when the main storage engine was MyISAM.

MyISAM was the default storage engine from MySQL 3.23 until it was replaced by InnoDB in MariaDB 5.5. It's a light, non-transactional engine with great performance but does not offer row-level locking or the reliability of InnoDB.

With InnoDB (the default storage engine), MariaDB offers the two standard row-level locks: shared locks (S) and exclusive locks (X). A shared lock is obtained to read a row and allows other transactions to read the locked row; different transactions may also acquire their own shared locks. An exclusive lock is obtained to write to a row and stops other transactions from locking the same row.

InnoDB has definitely covered the biggest transactional feature gap between these two systems.

Due to the pluggable nature of MariaDB, it offers even more storage engines, so you can better adjust it to a specific workload. For example, when space matters you can go with TokuDB, which offers a great compression ratio; Spider, optimized for partitioning and data sharding; or ColumnStore for big data scaling.

Nevertheless, for those migrating from Oracle, my recommendation would be to go first with the InnoDB storage engine.

Connectivity Considerations

MariaDB shares with Oracle good support for database access, including ODBC and JDBC drivers, as well as access libraries for Perl, Python, and PHP. MySQL and Oracle both support binary large objects, character, numeric, and date data types, so you should have no issues finding the right connector for your application services.

MariaDB doesn't have a dedicated listener process to maintain database connections, nor a SCAN address for a clustered database like we know from Oracle. You will also not find flexible database services. Instead, you choose between the Unix socket (a local and the most secure way to connect when the database and application live on the same server), remote TCP connections (by default, MariaDB does not permit remote logins), and named pipes and shared memory, which are available on Windows systems only. For a cluster, the SCAN address needs to be replaced by a load balancer. MariaDB recommends its other product, MaxScale, but you can also find others like ProxySQL or HAProxy that will work with MariaDB, with some limitations. While using external load balancers for MariaDB can be difficult, you may find great features in them which, by way of comparison, are not available in the Oracle database.

A load balancer would also be a recommendation for those who are looking for Oracle Transparent Application Failover (TAF), Oracle Firewall DB or some of the advanced security features like Oracle Connection Manager. You can find more about choosing the right load balancer in the following white paper.

While these technologies are free and can be deployed manually using script-based installations, systems such as ClusterControl automate the process with their point-and-click interface. ClusterControl also lets you deploy caching technologies.

Installation & Administration Simplicity

The latest available Oracle DB version added a long-awaited installation feature: Oracle 18c can now be installed on Oracle Linux using an RPM. The dedicated Java-based installer was always a problem for those who wanted to write automation for their cookbooks or Puppet code snippets. You could go with a predefined silent installation, but the response file changed from time to time and you still had to deal with dependency hell. RPM-based installation was definitely a good move.

So how does it work in MariaDB?

For those who are moving from the Oracle world, it's always a nice surprise to see how fast you can deploy instances, create new databases or even set up complex replication flows. The installation and configuration process is probably the smoothest part of the migration, although choosing the right settings still takes time and knowledge.

MariaDB provides a set of binary distributions. These include generic binary distributions in the form of compressed tar files (files with a .tar.gz extension) for a number of platforms, as well as binaries in platform-specific packages. On the Windows platform, you can find a standard GUI installation wizard.

The Oracle Database Configuration Assistant (DBCA) is basically not needed, as you can create a database with a single-line command:

CREATE [OR REPLACE] {DATABASE | SCHEMA} [IF NOT EXISTS] db_name
    [create_specification] ...

create_specification:
    [DEFAULT] CHARACTER SET [=] charset_name
  | [DEFAULT] COLLATE [=] collation_name

You can also have a database with different database collations and character sets under the same MariaDB instance.
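
For illustration, a minimal example (hypothetical database name) with an explicit character set and collation:

CREATE DATABASE IF NOT EXISTS appdb
  CHARACTER SET = utf8mb4
  COLLATE = utf8mb4_unicode_ci;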

Setting up replication is just a matter of enabling binary logging on the master (similar to archive logs in Oracle) and running the following command on the slave to attach it to the master.

CHANGE MASTER TO
  MASTER_HOST = host,
  MASTER_PORT = port,
  MASTER_USER = replication_user,
  MASTER_PASSWORD = password,
  MASTER_USE_GTID = slave_pos;  -- MariaDB's GTID positioning (MASTER_AUTO_POSITION is MySQL-specific)

START SLAVE;

Security and Compliance

Oracle provides enhanced database security.

In Oracle, user authentication is performed by specifying global roles in addition to location, username, and password, and it can be handled by different authentication methods, including database authentication, external authentication, and proxy authentication.

For a long time roles were not available in MySQL. MariaDB added roles back in version 10.0, well before they appeared in MySQL 8.0.

Roles, an option that is heavily used in common Oracle DB setups, can be easily transformed in MariaDB, so you don’t have to waste time on single user permission adjustments.
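
A short, hedged sketch of how an Oracle-style role could be reproduced in MariaDB (the role, schema and user names are made up):

CREATE ROLE analyst;
GRANT SELECT ON reporting.* TO analyst;

CREATE USER 'alice'@'%' IDENTIFIED BY 'secret_password';
GRANT analyst TO 'alice'@'%';
SET DEFAULT ROLE analyst FOR 'alice'@'%';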

Creating and altering users and passwords all works similarly to Oracle DB.

To achieve enterprise security compliance standards, MariaDB offers built-in features like:

  • Audit plugin
  • Encryption of data-at-rest
  • Certificates, TLS connections
  • PAM Plugin

The audit plugin offers something like the fine-grained auditing (FGA) or AUDIT SQL available in Oracle. It does not offer the same set of features, but usually it's good enough to satisfy security compliance audits.

Encryption of data-at-rest can be a requirement for security regulations like HIPAA or PCI DSS. Such encryption can be implemented on multiple levels: you can encrypt the whole disk on which the files are stored; you can encrypt only the database through functionality available in the latest versions of MySQL or MariaDB; or encryption can be implemented in the application, so that it encrypts the data before storing it in the database. Every option has its pros and cons. For example, disk encryption helps only when disks are physically stolen - the files are not encrypted on a running database server.

The PAM plugin extends login functionality to tie user accounts to LDAP settings. In fact, I find it much easier to set up than LDAP integration with Oracle Database.

Replication & HA

MariaDB is well known for its replication simplicity and flexibility. By default, you can read from or even write to your standby/slave servers. Luckily, the MariaDB 10.x versions brought many significant replication enhancements, including Global Transaction IDs, event checksums, multi-threaded slaves and crash-safe slaves/masters, making replication even better. DBAs accustomed to MariaDB replication reads and writes would expect a similar or even simpler solution from its bigger brother, Oracle. Unfortunately, not by default.

The standard physical standby implementation in Oracle is closed for any read-write operations. Oracle does offer a logical standby variation, but it has many limitations and is not designed for HA. The solution to this problem is an additional paid feature called Active Data Guard, which you can use to read data from the standby while redo logs are being applied.

Active Data Guard is a paid add-on solution to Oracle’s free Data Guard disaster recovery software available only for Oracle Database Enterprise Edition (highest cost license). It delivers read-only access, while continuously applying changes sent from the primary database. As an active standby database, it helps offload read queries, reporting and incremental backups from the primary database. The product’s architecture is designed to allow standby databases to be isolated from failures that may occur at the primary database.

An exciting feature of Oracle Database 12c, and something that an Oracle DBA would miss, is data corruption validation. Oracle Data Guard corruption checks ensure that data is in exact alignment before it is copied to a standby database. This mechanism can also be used to restore data blocks on the primary directly from the standby database.

MariaDB offers various replication methods and replication features like:

  • synchronous,
  • asynchronous,
  • semi-synchronous

The feature set for MariaDB replication is rich. With synchronous replication, you can set up failover with no loss of committed write transactions. To reduce asynchronous replication delays, you may wish to go with in-order parallel replication on the slaves. Binlog events can also be compressed; the events that benefit most are those that can be of significant size: query events (for DDL and DML in statement-based replication) and row events (for DML in row-based replication). As with other compression options, MariaDB's compressed replication is transparent. As mentioned before, the whole process is very easy compared to Oracle Data Guard's physical and logical replication.
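
As a hedged illustration, the parallel and compressed replication options mentioned above map to a few server variables (the values below are arbitrary examples):

-- on the slave: in-order parallel replication
STOP SLAVE;
SET GLOBAL slave_parallel_threads = 4;
SET GLOBAL slave_parallel_mode = 'optimistic';
START SLAVE;

-- on the master: compress large binlog events before sending them to slaves
SET GLOBAL log_bin_compress = ON;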

PL/SQL and database code

Now we come to the tough part: PL/SQL.

While replication and HA are areas where MariaDB reigns supreme, Oracle is the king of PL/SQL, no doubt there.

PL/SQL is the main obstacle to migrating into the open source world in many organizations. But MariaDB does not give up here.

MariaDB 10.3 (also known as MariaDB TX 3.0) has added some amazing new features including SEQUENCE constructs, Oracle-style packages, and the ROW data type – making migrations much easier.

With the new parameter SQL_MODE=ORACLE, MariaDB is now able to parse, depending on the case, a good chunk of legacy Oracle PL/SQL without rewriting the code (a short sketch follows the list below).

As we can find on their customer story page, using the core Oracle PL/SQL compatibility in MariaDB TX 3.0, the Development Bank of Singapore (DBS) has been able to migrate more than half of their business-critical applications from Oracle Database to MariaDB in just 12 months.

The new compatibility mode helps with the following syntax:

  • Loop Syntax
  • Variable Declaration
  • Non-ANSI Stored Procedure Construct
  • Cursor Syntax
  • Stored Procedure Parameters
  • Data Type inheritance (%TYPE, %ROWTYPE)
  • PL/SQL style Exceptions
  • Synonyms for Basic SQL Types (VARCHAR2, NUMBER, …)
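
As a minimal sketch of what this mode accepts (a hypothetical procedure, shown only to illustrate the Oracle-style syntax listed above):

SET SQL_MODE = 'ORACLE';

DELIMITER /
CREATE OR REPLACE PROCEDURE hello_oracle AS
  v_msg VARCHAR2(100) := 'Hello from Oracle-style PL/SQL in MariaDB';  -- VARCHAR2 synonym, := assignment
BEGIN
  SELECT v_msg;  -- returns the message as a result set
END;
/
DELIMITER ;

CALL hello_oracle();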

But if we look at the older version 10.2, some Oracle compatibility had already appeared earlier, such as:

  • Common table expressions
  • Recursive SQL queries
  • Window functions: NTILE, RANK, DENSE_RANK

Native PL/SQL parsing, or in some cases direct execution of native Oracle procedures, can greatly reduce the cost of development.

Another very useful feature in MariaDB 10.3 is sequences. The implementation of sequences in MariaDB Server 10.3 follows the SQL:2003 standard and includes syntax compatibility with Oracle.

To create a sequence, a create statement is used:

CREATE SEQUENCE Sequence_1 
  START WITH 1  
  INCREMENT BY 1;

Once created, sequences can be used, for example, with inserts (the Oracle-style Sequence_1.NEXTVAL notation requires SQL_MODE=ORACLE; the native equivalent is NEXTVAL(Sequence_1)):

INSERT INTO `database` (database_id, database_name) VALUES (NEXTVAL(Sequence_1), 'MariaDB');

Clustering and scaling

MariaDB Cluster is a virtually synchronous, active-active, multi-master database cluster.

MariaDB Cluster differs from what is known as Oracle’s MySQL Cluster - NDB.

MariaDB Cluster is based on the multi-master replication plugin provided by Codership (Galera). The Galera technology (wsrep API) has been available with MariaDB since the 5.5 series as MariaDB Galera Cluster, and is bundled in MariaDB Server from 10.1 onwards. The Galera plugin architecture stands on three core layers: certification, replication, and the group communication framework.

The certification layer prepares the write-sets and does the certification checks on them, guaranteeing that they can be applied.

The Replication layer manages the replication protocol and provides the total ordering capability.

Group Communication Framework implements a plugin architecture which allows other systems to connect via gcomm backend schema.

The main difference from Oracle RAC is that each node has its own copy of the data. Oracle RAC is commonly mistaken for a complete HA solution, while its disks usually sit in the same disk array. MariaDB Cluster not only offers redundant storage, it also supports geo-located clustering without the need for dedicated fiber.

Backup and Recovery

Oracle offers many backup mechanisms, including hot and cold backups, import, export, and many others.

Contrary to MySQL, MariaDB offers an external tool for hot backups called Mariabackup. It is a fork of Percona XtraBackup, designed to work with encrypted and compressed tables, and is the recommended backup method for MariaDB databases.

MariaDB Server 10.1 introduced MariaDB Compression and Data-at-Rest Encryption, but the existing backup solutions did not support full backup capability for these features. So MariaDB decided to extend XtraBackup (version 2.3.8) and named this solution Mariabackup.

Percona XtraBackup and Mariabackup offer similar functionality, but if you are interested in the differences, you can find them here.

What MariaDB does not offer is the recovery catalog of your database backups. Fortunately, this can be extended with third-party systems like ClusterControl.

Cloud compatibility

Cloud infrastructures are getting increasingly popular these days. Although a cloud VM may not be as reliable as an enterprise-grade server, the main cloud providers offer a variety of tools to increase service availability. You can choose between EC2 architecture or DBaaS like Amazon RDS.

Amazon RDS supports MariaDB Server 10.3. It does not support SQL_MODE=ORACLE, but you can still find a set of features making it easier to migrate. The Amazon cloud supports common management tasks like monitoring, backups, multi-AZ deployments, etc.

Another popular cloud provider, Google Cloud, also offers the most recent MariaDB version. You can deploy it as a container or as a Bitnami-certified VM image.

Azure also offers its own implementation of MariaDB. It's similar to Amazon RDS, with backups, scaling and built-in high availability. The guaranteed SLA is 99.99%, which corresponds to 4 minutes 23 seconds of downtime per month.

Miscellaneous Considerations

As mentioned at the very beginning of this article, Oracle to MariaDB migration is a multi-stage process. A piece of general advice: do not try to migrate all of the databases at once. Dividing the migration into small batches is, in most scenarios, the best approach.

If you are not familiar with the technology, give it a try. You should feel confident with the platform and know its pros and cons. Testing will build confidence and inform your decisions with regards to migration.

There are interesting tools which can help you with the most difficult part, the PL/SQL migration. Some interesting ones are dbconvert and the AWS Schema Conversion Tool (see the AWS documentation).

Over the years MariaDB has gained Enterprise support and maturity to run critical and complex data transaction systems. With the recent version, MariaDB has added some great new features such as SQL_Mode=Oracle compatibility, making the transition process easier than ever before.

Finally, you can join me on March 12th for a webinar during which I’ll walk you through all you need to know when it comes to migrating from Oracle database to MariaDB.

Migration from Oracle Database to MariaDB - A Deep Dive


In previous blogs, we discussed the topic of How to Migrate from Oracle to MySQL / Percona Server and most recently Migrating from Oracle Database to MariaDB - What You Should Know.

Over the years, as new versions of MySQL and MariaDB were released, the two projects have diverged into very different RDBMS platforms.

MariaDB and MySQL now differ from each other significantly, especially with the arrival of their most recent versions: MySQL 8.0 and MariaDB 10.3 GA (with 10.4 currently a release candidate).

With the release of MariaDB TX 3.0, MariaDB surprised many, since it is no longer just a drop-in replacement for MySQL. It introduces a new level of compatibility with Oracle Database and is now becoming a real alternative to Oracle, as well as to other enterprise and proprietary databases such as IBM DB2 or EnterpriseDB.

Starting with MariaDB version 10.3, significant features have been introduced, such as system-versioned tables and, what's most appealing for Oracle DBAs, support for PL/SQL!

According to the MariaDB website, approximately 80% of the legacy Oracle PL/SQL can be migrated without rewriting the code. MariaDB also has ColumnStore, which is their new analytics engine and a columnar storage engine designed for distributed, massively parallel processing (MPP), such as for big data analytics.

The MariaDB team has worked hard on the added support for PL/SQL. It adds extra ease when migrating to MariaDB from Oracle. As a reference point for your planned migration, you can check the following reference from MariaDB. As in our previous blog, this post will not cover the overall migration process, as it is a long one, but it will hopefully provide enough background information to serve as a guide for your migration.

Planning and Development Strategy

For the DBA migrating from an Oracle database to MariaDB, such a migration involves a lot of similar factors that shouldn't be too difficult to shift and adapt to. MariaDB can be operated on Windows Server and has binaries available for download for the Windows platform. If you are using Oracle for OLAP (Online Analytical Processing) or business intelligence, MariaDB also has ColumnStore, which is the equivalent of Oracle's Database In-Memory column store.

If you're used to an Oracle architecture with MAA (Maximum Availability Architecture), Data Guard and Oracle RAC (Real Application Clusters), then in MariaDB, same as with MySQL/Percona Server, you can choose from synchronous, semi-synchronous, or asynchronous replication.

For a highly available solution, MaxScale is the main option MariaDB offers, and you can mix it with Keepalived and HAProxy. ClusterControl, for example, can manage this efficiently, even with the recent arrival of MariaDB's product MariaDB TX. See our previous blog to learn more about how ClusterControl can manage this efficiently.

With MariaDB being an open source technology, this question should be considered: "How do we get support?"

When choosing a support option, you need to make sure that it isn't limited to the database itself but also covers expertise in scalability, redundancy, resilience, backups, high availability, security, monitoring/observability, recovery and engaging on mission-critical systems. Overall, the support offering you choose needs to come with an understanding of your architectural setup without exposing the confidentiality of your data.

Additionally, MariaDB has a very large and collaborative community worldwide. If you experience problems and want to ask people involved in this community, you can try Freenode via an IRC client, go to their community page, or join their mailing list.

Assessment or Preliminary Check

Backing up your data, including configuration or setup files, kernel tunings and automation scripts, needs to be considered. It's an obvious task, but before you migrate, always secure everything first, especially when moving to a different platform.

You must also verify that your applications follow up-to-date software engineering conventions and are platform agnostic. These practices will be to your benefit, especially when moving to a different database platform.

Since MariaDB is an open source technology, make sure you know which connectors are available for it. This is pretty straightforward right now, as there are various client libraries available. Check here for a list of these client libraries. Aside from that, you can also check this list of available clients and utilities.

Lastly, make sure of your hardware requirements.

MariaDB doesn't have specific requirements: a typical commodity server can work, but that depends on how much performance you require. However, if you plan to use ColumnStore for analytical or data warehouse applications, check out their documentation. Taken from their page: for AWS, they have generally tested with m4.4xlarge instance types as a cost-effective middle ground; the r4.8xlarge has also been tested and performs about twice as fast for about twice the price.

What You Should Know

Just as in MySQL, in MariaDB you can create multiple databases, whereas Oracle does not come with that same functionality.

In MariaDB, a schema is synonymous with a database. You can substitute the keyword SCHEMA for DATABASE in MariaDB SQL syntax, for example using CREATE SCHEMA instead of CREATE DATABASE, whereas Oracle makes a distinction here: a schema represents only a part of a database, namely the tables and other objects owned by a single user. Normally, there is a one-to-one relationship between the instance and the database.

For example, in a replication setup equivalent in Oracle (e.g. Real Application Clusters or RAC), you have your multiple instances accessing a single database. This lets you start Oracle on multiple servers, all accessing the same data. However, in MariaDB, you can allow access to multiple databases from your multiple instances and can even filter out which databases/schema you can replicate to a MariaDB node.

Referencing from one of our previous blogs (this and this), the same principle applies when speaking of converting your database with available tools found on the internet.

There is no tool that can convert an Oracle database to MariaDB with 100% accuracy, though MariaDB has the Red Rover Migration Practice; this is a service that MariaDB offers and it's not free.

MariaDB talks about the migration at the Development Bank of Singapore (DBS), a result of its collaboration with MariaDB on Oracle compatibility: DBS has been able to migrate more than 50 percent of its mission-critical applications from Oracle Database to MariaDB in just 12 months.

But if you are looking for tooling, the sqlines tools - SQLines SQL Converter and SQLines Data Tool - offer a simple yet functional set of utilities.

The following sections below further outline the things that you must be aware of when it comes to migration and verifying the logical SQL result.

Data Type Mapping

MySQL and MariaDB share the same set of available data types, although there are variations in how they are implemented. You can check the list of data types in MariaDB here.

While MySQL uses a native JSON data type, MariaDB differs: its JSON type is just an alias for the LONGTEXT data type. MariaDB also has a function, JSON_VALID, which can be used within a CHECK constraint expression.

Hence, I'll make use of the tabular presentation below, based on the information here. Data types in MySQL and MariaDB don't deviate much from each other, but note that the ROW data type was introduced in MariaDB 10.3.0 as part of the PL/SQL compatibility feature.

Check out the table below:

Oracle | Description | MySQL/MariaDB
------ | ----------- | -------------
BFILE | Pointer to binary file, ≤ 4G | VARCHAR(255)
BINARY_FLOAT | 32-bit floating-point number | FLOAT
BINARY_DOUBLE | 64-bit floating-point number | DOUBLE
BLOB | Binary large object, ≤ 4G | LONGBLOB
CHAR(n), CHARACTER(n) | Fixed-length string, 1 ≤ n ≤ 255 | CHAR(n), CHARACTER(n)
CHAR(n), CHARACTER(n) | Fixed-length string, 256 ≤ n ≤ 2000 | VARCHAR(n)
CLOB | Character large object, ≤ 4G | LONGTEXT
DATE | Date and time | DATETIME
DECIMAL(p,s), DEC(p,s) | Fixed-point number | DECIMAL(p,s), DEC(p,s)
DOUBLE PRECISION | Floating-point number | DOUBLE PRECISION
FLOAT(p) | Floating-point number | DOUBLE
INTEGER, INT | 38-digit integer | INT, DECIMAL(38)
INTERVAL YEAR(p) TO MONTH | Date interval | VARCHAR(30)
INTERVAL DAY(p) TO SECOND(s) | Day and time interval | VARCHAR(30)
LONG | Character data, ≤ 2G | LONGTEXT
LONG RAW | Binary data, ≤ 2G | LONGBLOB
NCHAR(n) | Fixed-length UTF-8 string, 1 ≤ n ≤ 255 | NCHAR(n)
NCHAR(n) | Fixed-length UTF-8 string, 256 ≤ n ≤ 2000 | NVARCHAR(n)
NCHAR VARYING(n) | Varying-length UTF-8 string, 1 ≤ n ≤ 4000 | NCHAR VARYING(n)
NCLOB | Variable-length Unicode string, ≤ 4G | NVARCHAR(max)
NUMBER(p,0), NUMBER(p) | 8-bit integer, 1 ≤ p < 3 | TINYINT (0 to 255)
NUMBER(p,0), NUMBER(p) | 16-bit integer, 3 ≤ p < 5 | SMALLINT
NUMBER(p,0), NUMBER(p) | 32-bit integer, 5 ≤ p < 9 | INT
NUMBER(p,0), NUMBER(p) | 64-bit integer, 9 ≤ p < 19 | BIGINT
NUMBER(p,0), NUMBER(p) | Fixed-point number, 19 ≤ p ≤ 38 | DECIMAL(p)
NUMBER(p,s) | Fixed-point number, s > 0 | DECIMAL(p,s)
NUMBER, NUMBER(*) | Floating-point number | DOUBLE
NUMERIC(p,s) | Fixed-point number | NUMERIC(p,s)
NVARCHAR2(n) | Variable-length UTF-8 string, 1 ≤ n ≤ 4000 | NVARCHAR(n)
RAW(n) | Variable-length binary string, 1 ≤ n ≤ 255 | BINARY(n)
RAW(n) | Variable-length binary string, 256 ≤ n ≤ 2000 | VARBINARY(n)
REAL | Floating-point number | DOUBLE
ROWID | Physical row address | CHAR(10)
SMALLINT | 38-digit integer | DECIMAL(38)
TIMESTAMP(p) | Date and time with fraction | DATETIME(p)
TIMESTAMP(p) WITH TIME ZONE | Date and time with fraction and time zone | DATETIME(p)
UROWID(n) | Logical row addresses, 1 ≤ n ≤ 4000 | VARCHAR(n)
VARCHAR(n) | Variable-length string, 1 ≤ n ≤ 4000 | VARCHAR(n)
VARCHAR2(n) | Variable-length string, 1 ≤ n ≤ 4000 | VARCHAR(n)
XMLTYPE | XML data | LONGTEXT

For PL/SQL compatibility, MariaDB 10.3 also offers the ROW data type: ROW (<field name> <data type> [{, <field name> <data type>}... ]).

Data type attributes and options:

Oracle | MySQL/MariaDB
------ | -------------
BYTE and CHAR column size semantics | Size is always in characters

Transactions

MariaDB used XtraDB as the default storage engine up to version 10.1 and shifted to InnoDB from version 10.2 onwards, though various storage engines, such as MyRocks, can be an alternative choice for handling transactions.

By default, MariaDB has the autocommit variable set to ON, which means that you have to start transactions explicitly to take advantage of ROLLBACK for discarding changes or to make use of SAVEPOINT.

It's basically the same concept that Oracle uses in terms of commit, rollbacks and savepoints.

For explicit transactions, this means that you have to use the START TRANSACTION/BEGIN; <SQL STATEMENTS>; COMMIT; syntax.

Otherwise, if you disable autocommit, you have to explicitly COMMIT every time for the statements that change your data.
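
A brief sketch of an explicit transaction with a savepoint (the table and values are hypothetical):

START TRANSACTION;
INSERT INTO accounts (id, balance) VALUES (1, 100);
SAVEPOINT before_adjustment;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
ROLLBACK TO SAVEPOINT before_adjustment;  -- undo only the update
COMMIT;                                   -- persist the insert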

Dual Table

MariaDB provides a dummy table, namely DUAL, for compatibility with Oracle. It operates just the same as in MySQL, where the FROM clause is not mandatory, so the DUAL table is not actually necessary. However, the DUAL table does not work exactly the same way as it does in Oracle; for simple SELECTs in MariaDB, though, this is fine.

In Oracle, the FROM clause is mandatory for every SELECT statement, so the Oracle database uses the DUAL table for SELECT statements where a table name is not required. Since MariaDB accepts DUAL as well, any existing statements in your application that use DUAL might require no changes upon migration to MariaDB.

See the following example below:

In Oracle:

SQL> DESC DUAL;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 DUMMY                                              VARCHAR2(1)

SQL> SELECT CURRENT_TIMESTAMP FROM DUAL;
CURRENT_TIMESTAMP
---------------------------------------------------------------------------
16-FEB-19 04.16.18.910331 AM +08:00

But in MariaDB:

MariaDB [test]> DESC DUAL;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'DUAL' at line 1
MariaDB [test]> SELECT CURRENT_TIMESTAMP FROM DUAL;
+---------------------+
| CURRENT_TIMESTAMP   |
+---------------------+
| 2019-02-27 04:11:01 |
+---------------------+
1 row in set (0.000 sec)

Note: the DESC DUAL syntax does not work in MariaDB, and the results also differ, as CURRENT_TIMESTAMP (which uses the TIMESTAMP data type) in MariaDB/MySQL does not include the time zone.

SYSDATE

Oracle's SYSDATE function is almost the same in MariaDB.

MariaDB's SYSDATE() returns the date and time, and it's a function that requires parentheses, (), although no arguments are needed. To demonstrate this, below is Oracle and MariaDB using SYSDATE.

In Oracle, selecting plain SYSDATE displays just the date without the time by default. To get both the time and the date, use TO_CHAR to convert the datetime into the desired format; in MariaDB, you might not need that to get the date and the time, as it returns both.

See example below.

In Oracle:

SQL> SELECT TO_CHAR (SYSDATE, 'MM-DD-YYYY HH24:MI:SS') "NOW" FROM DUAL;
NOW
-------------------
02-16-2019 04:39:00

SQL> SELECT SYSDATE FROM DUAL;

SYSDATE
---------
16-FEB-19

But in MariaDB:

MariaDB [test]> SELECT SYSDATE() FROM DUAL;
+---------------------+
| SYSDATE()           |
+---------------------+
| 2019-02-27 04:11:57 |
+---------------------+
1 row in set (0.000 sec)

If you want to format the date, MariaDB has a DATE_FORMAT() function.

You can check the MariaDB's Date and Time documentation for more information.


TO_DATE

Oracle's TO_DATE equivalent in MariaDB is the STR_TO_DATE() function.

It's almost identical to the Oracle one, except that Oracle's TO_DATE returns the DATE data type, while MariaDB's STR_TO_DATE() returns the DATETIME data type.

Oracle:

SQL> SELECT TO_DATE ('20190218121212','yyyymmddhh24miss') as "NOW" FROM DUAL; 
NOW
-------------------------
18-FEB-19

MariaDB:

MariaDB [test]> SELECT STR_TO_DATE('2019-02-18 12:12:12','%Y-%m-%d %H:%i:%s') as "NOW" FROM DUAL;
+---------------------+
| NOW                 |
+---------------------+
| 2019-02-18 12:12:12 |
+---------------------+
1 row in set (0.000 sec)

SYNONYM

MariaDB does not have equivalent functionality for this yet. Currently, based on their Jira ticket MDEV-16482, the feature request to add SYNONYM is still open with no sign of progress as of this time. We're hoping that this will be incorporated in a future release. However, a possible alternative could be using a VIEW.
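
For example, a hedged sketch of emulating a simple (local) synonym with a view, assuming a table hr.employees exists:

CREATE VIEW emp_table AS SELECT * FROM hr.employees;
SELECT COUNT(*) FROM emp_table;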

A SYNONYM in Oracle can also be used to create an alias for a remote table,

e.g.

CREATE PUBLIC SYNONYM emp_table FOR hr.employees@remote.us.oracle.com

In MariaDB, for that use case you can take advantage of the CONNECT storage engine, which is more powerful than the FederatedX storage engine, as it allows you to connect to various database sources. You can check out this short video presentation.

There's a good example in MariaDB's manual page, which I will not reiterate here, as there are certain considerations you have to meet, especially when using ODBC. Please refer to the manual.

Behaviour of Empty String and NULL

Take note that in MariaDB, an empty string is not NULL, whereas Oracle treats an empty string as a NULL value.

In Oracle:

SQL> SELECT CASE WHEN '' IS NULL THEN 'Yes' ELSE 'No' END AS "Null Eval" FROM dual;
Nul
---
Yes

In MariaDB:

MariaDB [test]> SELECT CASE WHEN '' IS NULL THEN 'Yes' ELSE 'No' END AS "Null Eval" FROM dual;
+-----------+
| Null Eval |
+-----------+
| No        |
+-----------+
1 row in set (0.001 sec)

Sequences

Since MariaDB 10.3, Oracle-compatible sequences and a stored procedure language compliant with Oracle PL/SQL have been introduced. In MariaDB, creating a sequence is pretty similar to Oracle's SEQUENCE.

MariaDB's example:

CREATE SEQUENCE s START WITH 100 INCREMENT BY 10;
CREATE SEQUENCE s2 START WITH -100 INCREMENT BY -10;

and specifying workable minimum and maximum values looks as follows:

CREATE SEQUENCE s3 START WITH -100 INCREMENT BY 10 MINVALUE=-100 MAXVALUE=1000;
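
Once defined, the sequences can be consumed with either the MariaDB functions or the ANSI-style syntax (the values in the comments assume the definitions above and a fresh session):

SELECT NEXTVAL(s);         -- returns 100 on first use
SELECT NEXT VALUE FOR s;   -- ANSI-style syntax, returns 110
SELECT LASTVAL(s);         -- last value generated for s in this session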

Character String Functions

MariaDB, same as MySQL, has a long list of string functions - too long to discuss here one by one. Hence, you can check the documentation here and compare it against Oracle's string functions.

DML Statements

Insert/Update/Delete statements from Oracle work the same way in MariaDB.

Oracle's INSERT ALL/INSERT FIRST is not supported in MariaDB and no one yet opened this feature request in their Jira (that I know of).

Otherwise, you'd need to state your MariaDB queries one by one.

e.g.

In Oracle:

SQL> INSERT ALL
  INTO CUSTOMERS (customer_id, customer_name, city) VALUES (1000, 'Jase Alagaban', 'Davao City')
  INTO CUSTOMERS (customer_id, customer_name, city) VALUES (2000, 'Maximus Aleksandre Namuag', 'Davao City')
SELECT * FROM dual;
2 rows created.

But in MariaDB, you have to run the insert one at a time:

MariaDB [test]> INSERT INTO CUSTOMERS (customer_id, customer_name, city) VALUES (1000, 'Jase Alagaban', 'Davao City');
Query OK, 1 row affected (0.02 sec)
MariaDB [test]> INSERT INTO CUSTOMERS (customer_id, customer_name, city) VALUES (2000, 'Maximus Aleksandre Namuag', 'Davao City');
Query OK, 1 row affected (0.00 sec)

There is also no match for how INSERT ALL/INSERT FIRST is used in Oracle, where you can take advantage of conditions by adding a WHEN keyword to your syntax; there's no equivalent option in MariaDB as of this time.

Hence, your alternative solution on this is to use procedures.

Outer Joins "+" Symbol

Currently, this Oracle-specific outer join syntax is not present in MariaDB. There are plenty of Jira tickets about it, but this one is the most precise feature request. Your alternative for the time being is to use standard JOIN syntax. Please check the documentation for more information.
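
As a hedged illustration (with hypothetical tables), here is an Oracle-style (+) outer join and its standard JOIN rewrite for MariaDB:

-- Oracle-style outer join:
--   SELECT c.name, o.total FROM customers c, orders o WHERE c.id = o.customer_id (+);

-- MariaDB equivalent using standard JOIN syntax:
SELECT c.name, o.total
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id;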

START WITH..CONNECT BY

Oracle uses START WITH..CONNECT BY for hierarchical queries.

Starting with MariaDB 10.2, CTEs (Common Table Expressions) were introduced; they are designed to support the generation of hierarchical result sets, using models such as adjacency lists or nested sets.

Similar to PostgreSQL and MySQL, MariaDB supports both non-recursive and recursive CTEs.

For example, here is a simple non-recursive CTE used to compare individuals against their group:

WITH sales_product_year AS (
SELECT product,
YEAR(ship_date) AS year,
SUM(price) AS total_amt
FROM item_sales
GROUP BY product, year
)

SELECT * 
FROM sales_product_year S1
WHERE
total_amt > 
    (SELECT 0.1 * SUM(total_amt)
     FROM sales_product_year S2
     WHERE S2.year = S1.year)

while a recursive CTE (for example, returning the bus destinations reachable from New York as the origin) looks like this:

WITH RECURSIVE bus_dst as ( 
    SELECT origin as dst FROM bus_routes WHERE origin='New York' 
  UNION
    SELECT bus_routes.dst FROM bus_routes, bus_dst WHERE bus_dst.dst= bus_routes.origin 
) 
SELECT * FROM bus_dst;

PL/SQL in MariaDB?

Previously, in our blog "Migrating from Oracle Database to MariaDB - What You Should Know", we showcased how powerful MariaDB has become now that it has adopted PL/SQL compliance as part of its database kernel. Whenever you use PL/SQL compatibility in MariaDB, make sure you have set SQL_MODE to 'ORACLE', as follows:

SET SQL_MODE='ORACLE';

The new compatibility mode helps with the following syntax:

  • Loop Syntax
  • Variable Declaration
  • Non-ANSI Stored Procedure Construct
  • Cursor Syntax
  • Stored Procedure Parameters
  • Data Type Inheritance (%TYPE, %ROWTYPE)
  • PL/SQL Style Exceptions
  • Synonyms for Basic SQL Types (VARCHAR2, NUMBER, …)

For example, in Oracle you can create a package, which is a schema object that groups logically related PL/SQL types, variables, and subprograms. In MariaDB, you can do it just like below (the transcript assumes SQL_MODE='ORACLE' has been set, the statement delimiter has been changed with DELIMITER /, and a package specification declaring helloFromS9s as public has already been created):

MariaDB [test]> CREATE OR REPLACE PACKAGE BODY hello AS
    -> 
    ->   vString VARCHAR2(255) := NULL;
    -> 
    ->   -- was declared public in PACKAGE
    ->   PROCEDURE helloFromS9s(pString VARCHAR2) AS
    ->   BEGIN
    ->     SELECT 'Severalnines showing MariaDB Package Procedure in ' || pString || '!' INTO vString FROM dual;
    ->     SELECT vString;
    ->   END;
    -> 
    -> BEGIN
    ->   SELECT 'called only once per connection!';
    -> END hello;
    -> /
Query OK, 0 rows affected (0.021 sec)

MariaDB [test]> 
MariaDB [test]> DECLARE
    ->   vString VARCHAR2(255) := NULL;
    ->   -- CONSTANT seems to be not supported yet by MariaDB
    ->   -- cString CONSTANT VARCHAR2(255) := 'anonymous block';
    ->   cString VARCHAR2(255) := 'anonymous block';
    -> BEGIN
    ->   CALL hello.helloFromS9s(cString);
    -> END;
    -> /
+----------------------------------+
| called only once per connection! |
+----------------------------------+
| called only once per connection! |
+----------------------------------+
1 row in set (0.000 sec)

+--------------------------------------------------------------------+
| vString                                                            |
+--------------------------------------------------------------------+
| Severalnines showing MariaDB Package Procedure in anonymous block! |
+--------------------------------------------------------------------+
1 row in set (0.000 sec)

Query OK, 1 row affected (0.000 sec)

MariaDB [test]> 
MariaDB [test]> DELIMITER ;

However, Oracle's PL/SQL is compiled before execution, when it is loaded into the server. Although the MariaDB manual does not state this explicitly, I would assume the approach is the same as in MySQL, where a routine is compiled and stored in the cache when it is invoked.

Migration Tools

As my colleague Bart indicated in our previous blog, the SQLines tools (SQLines SQL Converter and SQLines Data Tool) can also provide aid as part of your migration.

MariaDB offers its Red Rover Migration Practice service, which you can also take advantage of.

Overall, migrating from Oracle to MariaDB is not an easy undertaking, but it tends to pose fewer challenges than migrating to MySQL/Percona, especially since no PL/SQL compatibility exists in MySQL.

Anyhow, if you find or know of any tools that you find helpful and beneficial for migrating from Oracle to MariaDB, please leave a comment on this blog!

Testing

As I have stated in a previous blog, allow me to reiterate some of the key points here.

As part of your migration plan, testing is a vital task that plays a very important role and affects your decision with regards to migration.

The tool dbdeployer (a replacement for MySQL Sandbox) is very helpful here. It makes it easy to try and test different approaches and saves you time compared to setting up the whole stack, if your purpose is simply to evaluate the RDBMS platform first.

For testing your SQL stored routines (functions or procedures), triggers and events, I suggest you use tools such as mytap or Google's Unit Testing Framework.

Percona tools can still be useful and can be incorporated into your DBA or engineering tasks even with MariaDB. Check out the Percona Toolkit and cherry-pick the tools according to your needs, especially for testing and production-usage tasks.

Overall, the things that you need to keep in mind as guidelines when testing your MariaDB Server are:

  • After your installation, you need to consider doing some tuning. Check out our webinar about tuning your MariaDB server.
  • Do some benchmarks and stress/load testing of your configuration on the target node. Check out mysqlslap and sysbench, which can help you with this, as well as our blog "How to Benchmark Performance of MySQL & MariaDB using SysBench".
  • Check that your DDLs are correctly defined: data types, constraints, clustered and secondary indexes, and partitions, if you have any.
  • Check your DML, especially that the syntax is correct and that data is saved as expected.
  • Check your stored routines, events and triggers to ensure they run and return the expected results.
  • Verify that your queries are performant (see the example below this list). I suggest you take advantage of open-source tools or try our ClusterControl product. It offers monitoring/observability, especially of your MariaDB cluster. Check this previous blog in which we showcase how ClusterControl can help you manage MariaDB TX 3.0. You can use ClusterControl to monitor your queries and their query plans to make sure they are performant.
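
As a minimal sketch, checking an execution plan is as simple as prefixing the query with EXPLAIN (or ANALYZE in MariaDB 10.1+ to get actual execution statistics); the table name here is hypothetical:

EXPLAIN SELECT customer_id, customer_name
FROM CUSTOMERS
WHERE city = 'Davao City';
-- Inspect the "type", "key" and "rows" columns to confirm the expected index is used.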

Dealing with Unreliable Networks When Crafting an HA Solution for MySQL or MariaDB


Long gone are the days when a database was deployed as a single node or instance - a powerful, standalone server which was tasked to handle all the requests to the database. Vertical scaling was the way to go - replace the server with another, even more powerful one. During these times, one didn’t really have to be bothered by network performance. As long as the requests were coming in, all was good.

But nowadays, databases are built as clusters with nodes interconnected over a network. It is not always a fast, local network. With businesses reaching global scale, database infrastructure has also to span across the globe, to stay close to customers and to reduce latency. It comes with additional challenges that we have to face when designing a highly available database environment. In this blog post, we will look into the network issues that you may face and provide some suggestions on how to deal with them.

Two Main Options for MySQL or MariaDB HA

We covered this particular topic quite extensively in one of the whitepapers, but let’s look at the two main ways of building high availability for MySQL and MariaDB.

Galera Cluster

Galera Cluster is shared-nothing, virtually synchronous cluster technology for MySQL. It allows to build multi-writer setups that can span across the globe. Galera thrives in low-latency environments but it can also be configured to work with long WAN connections. Galera has a built-in quorum mechanism which ensures that data will not be compromised in case of the network partitioning of some of the nodes.

MySQL Replication

MySQL Replication can be either asynchronous or semi-synchronous. Both are designed to build large scale replication clusters. Like in any other master-slave or primary-secondary replication setup, there can be only one writer, the master. Other nodes, the slaves, are used for failover purposes as they contain a copy of the data set from the master. Slaves can also be used for reading the data and offloading some of the workload from the master.

Both solutions have their own limits and features, both suffer from different problems. Both can be affected by unstable network connections. Let’s take a look at those limitations and how we can design the environment to minimize the impact of an unstable network infrastructure.

Galera Cluster - Network Problems

First, let’s take a look at Galera Cluster. As we discussed, it works best in a low-latency environment. One of the main latency-related problems in Galera is the way Galera handles writes. We will not go into all the details in this blog post, but you can find further reading in our Galera Cluster for MySQL tutorial. The bottom line is that, due to the certification process for writes, where all nodes in the cluster have to agree on whether the write can be applied or not, your write performance for a single row is strictly limited by the network roundtrip time between the writer node and the most distant node. As long as the latency is acceptable and as long as you do not have too many hot spots in your data, WAN setups may work just fine. The problem starts when the network latency spikes from time to time. Writes will then take 3 or 4 times longer than usual and, as a result, databases may start to be overloaded with long-running writes.

One of the great features of Galera Cluster is its ability to detect the cluster state and react to network partitioning. If a node of the cluster cannot be reached, it will be evicted from the cluster and it will not be able to perform any writes. This is crucial in maintaining the integrity of the data during the time when the cluster is split - only the majority of the cluster will accept writes, while the minority will complain. To handle this, Galera introduces a vast array of checks and configurable timeouts to avoid false alerts on very transient network issues. Unfortunately, if the network is unreliable, Galera Cluster will not be able to work correctly - nodes will keep leaving the cluster and joining it again later. This will be especially problematic when Galera Cluster spans a WAN - separated pieces of the cluster may disappear randomly if the interconnecting network does not work properly.

How to Design Galera Cluster for an Unstable Network?

First things first, if you have network problems within a single datacenter, there is not much you can do unless you can somehow solve those issues. An unreliable local network is a no go for Galera Cluster; you would have to reconsider and use some other solution (even though, to be honest, an unreliable network will always be problematic). On the other hand, if the problems are related to WAN connections only (and this is one of the most typical cases), it may be possible to replace WAN Galera links with regular asynchronous replication (if the Galera WAN tuning did not help - see the sketch below).
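
As a rough sketch, WAN tuning usually means relaxing the evs.* timeouts through wsrep_provider_options. The values below are illustrative only, and depending on your Galera version some of these options may have to be set in my.cnf rather than at runtime:

SET GLOBAL wsrep_provider_options = 'evs.keepalive_period=PT3S; evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M; evs.install_timeout=PT1M';

-- Verify what is currently applied:
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';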

There are several inherent limitations in this setup - the main issue is that writes which used to happen locally now have to be sent to the “master” datacenter (DC A in our case). This is not as bad as it sounds. Please keep in mind that in an all-Galera environment, writes would be slowed down by the latency between nodes located in different datacenters. Even local writes would be affected. It will be more or less the same slowdown as with an asynchronous setup in which you send the writes across the WAN to the “master” datacenter.

Using asynchronous replication comes with all of the problems typical for asynchronous replication. Replication lag may become a problem - not that Galera would be more performant, it’s just that Galera slows down the traffic via flow control, while replication does not have any mechanism to throttle the traffic on the master.

Another problem is failover: if the “master” Galera node (the one which acts as the master to the slaves in other datacenters) fails, some mechanism has to be in place to repoint slaves to another, working master node. It might be some sort of script; it is also possible to try something with a VIP, where the “slave” Galera cluster replicates from a Virtual IP which is always assigned to an alive Galera node in the “master” cluster. A minimal sketch of the repointing itself follows.
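
The sketch below assumes MariaDB GTID-based replication and hypothetical host names and credentials; run it on each slave that needs to be repointed:

STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = 'galera2.dc-a.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'replpassword',
  MASTER_USE_GTID = slave_pos;
START SLAVE;

-- Confirm the slave is replicating again:
SHOW SLAVE STATUS\G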

The main advantage of such a setup is that we remove the WAN Galera link, which means that our “master” cluster will not be slowed down by the fact that some of the nodes are separated geographically. As we mentioned, we lose the ability to write in all of the datacenters, but latency-wise, writing across the WAN is the same as writing locally to a Galera cluster which spans the WAN. As a result, the overall latency should improve. Asynchronous replication is also less vulnerable to unstable networks. In the worst case scenario, the replication link will break and it will be re-established when the networks converge.


How to Design MySQL Replication for an Unstable Network?

In the previous section, we covered Galera Cluster, and one proposed solution was to use asynchronous replication. What does a plain asynchronous replication setup look like? Let’s look at how an unstable network can cause disruptions in a replication setup.

First of all, latency - one of the main pain points for Galera Cluster. In the case of replication, it is almost a non-issue. Unless you use semi-synchronous replication, that is - in such a case, increased latency will slow down writes. In asynchronous replication, latency has no impact on write performance. It may, though, have some impact on replication lag. It is not anything as significant as it was for Galera, but you may expect more lag spikes and overall less stable replication performance if the network between nodes suffers from high latency. This is mostly because the master may serve several writes before the data transfer to the slave can even be initiated on a high-latency network.

Network instability may definitely impact replication links, but it is, again, not that critical. MySQL slaves will attempt to reconnect to their masters and replication will resume.

The main issue with MySQL replication is actually something that Galera Cluster solves internally - network partitioning. We are talking about the network partitioning as the condition in which segments of the network are separated from each other. MySQL replication utilizes one single writer node - master. No matter how you design your environment, you have to send your writes to the master. If the master is not available (for whatever reasons), application cannot do its job unless it runs in some sort of read-only mode. Therefore there is a need to pick the new master as soon as possible. This is where the issues show up.

First, how do you tell which host is a master and which one is not? One of the usual ways is to use the read_only variable to distinguish slaves from the master. If a node has read_only enabled (SET GLOBAL read_only=1), it is a slave (as slaves should not handle any direct writes). If the node has read_only disabled (SET GLOBAL read_only=0), it is a master. To make things safer, a common approach is to set read_only=1 in the MySQL configuration - in case of a restart, it is safer if the node shows up as a slave. Such a “language” can be understood by proxies like ProxySQL or MaxScale, as sketched below.
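
A minimal sketch of what this boils down to in practice (super_read_only exists in MySQL 5.7.8+ only):

-- On a node that should not take writes (a slave):
SET GLOBAL read_only = 1;

-- On the node that has been verified as the single writer (the master):
SET GLOBAL read_only = 0;

-- What a proxy or failover script effectively checks:
SELECT @@global.read_only;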

Let’s take a look at an example.

We have application hosts which connect to the proxy layer. Proxies perform the read/write split, sending SELECTs to slaves and writes to the master. If the master is down, failover is performed, a new master is promoted, and the proxy layer detects that and starts sending writes to the new master.

If node1 restarts, it will come up with read_only=1 and it will be detected as a slave. It is not ideal as it is not replicating but it is acceptable. Ideally, the old master should not show up at all until it is rebuilt and slaved off the new master.

Way more problematic situation is if we have to deal with network partitioning. Let’s consider the same setup: application tier, proxy tier and databases.

When the network partition makes the master unreachable, the application is not usable as no writes make it to their destination. A new master is promoted and writes are redirected to it. What will happen then if the network issues cease and the old master becomes reachable again? It has not been stopped, therefore it is still running with read_only=0:

You’ve now ended up with a split brain, where writes are directed to two nodes. This situation is pretty bad, as merging diverged datasets may take a while and is quite a complex process.

What can be done to avoid this problem? There is no silver bullet, but some actions can be taken to minimize the probability of a split brain happening.

First of all, you can be smarter in detecting the state of the master. How do the slaves see it? Can they replicate from it? Maybe some of the slaves can still connect to the master, meaning that the master is up and running or, at least, making it possible to stop it should that be necessary. What about the proxy layer? Do all of the proxy nodes see the master as unavailable? If some can still connect, then you can try to utilize those nodes to SSH into the master and stop it before the failover.

The failover management software can also be smarter in detecting the state of the network. Maybe it utilizes RAFT or some other clustering protocol to build a quorum-aware cluster. If the failover management software can detect a split brain, it can also take some actions based on it, for example setting all nodes in the partitioned segment to read_only, ensuring that the old master will not show up as writable when the networks converge.

You can also include tools like Consul or Etcd to store the state of the cluster. The proxy layer can be configured to use data from Consul, not the state of the read_only variable. It will be then up to the failover management software to make necessary changes in Consul so that all proxies will send the traffic to a correct, new master.

Some of those hints can even be combined together to make the failure detection even more reliable. All in all, it is possible to minimize the chances that the replication cluster will suffer from unreliable networks.

As you can see, no matter whether we are talking about Galera or MySQL Replication, unstable networks may become a serious problem. On the other hand, if you design the environment correctly, you can still make it work. We hope this blog post helps you create environments which work reliably even when the networks are not stable.

High Availability on a Shoestring Budget - Deploying a Minimal Two Node MySQL Galera Cluster


We regularly get questions about how to set up a Galera cluster with just 2 nodes.

The documentation clearly states you should have at least 3 Galera nodes to avoid network partitioning. But there are some valid reasons for considering a 2 node deployment, e.g., if you want to achieve database high availability but have a limited budget to spend on a third database node. Or perhaps you are running Galera in a development/sandbox environment and prefer a minimal setup.

Galera implements a quorum-based algorithm to select a primary component through which it enforces consistency. The primary component needs to have a majority of votes, so in a 2 node system, there would be no majority resulting in split brain. Fortunately, it is possible to add a garbd (Galera Arbitrator Daemon), which is a lightweight stateless daemon that can act as the odd node. Arbitrator failure does not affect the cluster operations and a new instance can be reattached to the cluster at any time. There can be several arbitrators in the cluster.

ClusterControl has support for deploying garbd on non-database hosts.

Normally a Galera cluster needs at least three hosts to be fully functional, however, at deploy time, two nodes would suffice to create a primary component. Here are the steps:

  1. Deploy a Galera cluster of two nodes,
  2. After the cluster has been deployed by ClusterControl, add garbd on the ClusterControl node.

You should end up with the below setup:

Deploy the Galera Cluster

Go to the ClusterControl Deploy section to deploy the cluster.

After selecting the technology that we want to deploy, we must specify User, Key or Password and port to connect by SSH to our hosts. We also need the name for our new cluster and if we want ClusterControl to install the corresponding software and configurations for us.

After setting up the SSH access information, we must select vendor/version and we must define the database admin password, datadir and port. We can also specify which repository to use.

Even though ClusterControl warns you that a Galera cluster needs an odd number of nodes, only add two nodes to the cluster.

Deploying a Galera cluster will trigger a ClusterControl job which can be monitored at the Jobs page.


Install Garbd

Once deployment is complete, install garbd on the ClusterControl host. We have the option to deploy garbd from ClusterControl, but this option won’t work if we want to deploy it on the ClusterControl server itself. This is to avoid issues related to database versions and package dependencies.

So, we must install it manually, and then import garbd to ClusterControl.

Let’s see the manual installation of Percona Garbd on CentOS 7.

Create the Percona repository file:

$ vi /etc/yum.repos.d/percona.repo
[percona-release-$basearch]
name = Percona-Release YUM repository - $basearch
baseurl = http://repo.percona.com/release/$releasever/RPMS/$basearch
enabled = 1
gpgcheck = 0
[percona-release-noarch]
name = Percona-Release YUM repository - noarch
baseurl = http://repo.percona.com/release/$releasever/RPMS/noarch
enabled = 1
gpgcheck = 0
[percona-release-source]
name = Percona-Release YUM repository - Source packages
baseurl = http://repo.percona.com/release/$releasever/SRPMS
enabled = 0
gpgcheck = 0

Then, install the Percona XtraDB Cluster garbd package:

$ yum install Percona-XtraDB-Cluster-garbd-57

Now, we need to configure garbd. For this, we need to edit the /etc/sysconfig/garb file:

$ vi /etc/sysconfig/garb
# Copyright (C) 2012 Codership Oy
# This config file is to be sourced by garb service script.
# A comma-separated list of node addresses (address[:port]) in the cluster
GALERA_NODES="192.168.100.192:4567,192.168.100.193:4567"
# Galera cluster name, should be the same as on the rest of the nodes.
GALERA_GROUP="Galera1"
# Optional Galera internal options string (e.g. SSL settings)
# see http://galeracluster.com/documentation-webpages/galeraparameters.html
# GALERA_OPTIONS=""
# Log file for garbd. Optional, by default logs to syslog
# Deprecated for CentOS7, use journalctl to query the log for garbd
# LOG_FILE=""

Change the GALERA_NODES and GALERA_GROUP parameter according to the Galera nodes configuration. We also need to remove the line # REMOVE THIS AFTER CONFIGURATION before starting the service.

And now, we can start the garb service:

$ service garb start
Redirecting to /bin/systemctl start garb.service
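
Once garbd connects, the cluster should report three members. You can verify this from either of the two database nodes:

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
-- Expect a value of 3 (two database nodes plus the arbitrator).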

Now, we can import the new garbd into ClusterControl.

Go to ClusterControl -> Select Cluster -> Add Load Balancer.

Then, select Garbd and Import Garbd section.

Here we only need to specify the hostname or IP Address and the port of the new Garbd.

Importing garbd will trigger a ClusterControl job which can be monitored at the Jobs page. Once completed, you can verify garbd is running with a green tick icon at the top bar:

That’s it!

Our minimal two-node Galera cluster is now ready!

HA for MySQL and MariaDB - Comparing Master-Master Replication to Galera Cluster


Galera replication is relatively new compared to MySQL replication, which has been natively supported since MySQL v3.23. Although MySQL replication is designed for master-slave unidirectional replication, it can be configured as an active master-master setup with bidirectional replication. While it is easy to set up, and some use cases might benefit from this “hack”, there are a number of caveats. On the other hand, Galera Cluster is a different type of technology to learn and manage. Is it worth it?

In this blog post, we are going to compare master-master replication to Galera cluster.

Replication Concepts

Before we jump into the comparison, let’s explain the basic concepts behind these two replication mechanisms.

Generally, any modification to the MySQL database generates an event in binary format. This event is transported to the other nodes depending on the replication method chosen - MySQL replication (native) or Galera replication (patched with wsrep API).

MySQL Replication

The following diagram illustrates the data flow of a successful transaction from one node to another when using MySQL replication:

The binary event is written into the master's binary log. The slave(s) via slave_IO_thread will pull the binary events from master's binary log and replicate them into its relay log. The slave_SQL_thread will then apply the event from the relay log asynchronously. Due to the asynchronous nature of replication, the slave server is not guaranteed to have the data when the master performs the change.

Ideally, MySQL replication will have the slave configured as a read-only server by setting read_only=ON or super_read_only=ON. This is a precaution to protect the slave from accidental writes which can lead to data inconsistency or failure during master failover (e.g., errant transactions). However, in a master-master active-active replication setup, read-only has to be disabled on the other master to allow writes to be processed simultaneously. The primary master must be configured to replicate from the secondary master by using the CHANGE MASTER statement to enable circular replication, as sketched below.
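
A minimal sketch of closing the circle between two masters, with hypothetical host names, credentials and binary log coordinates; run the equivalent statement on the other master, pointing back at this one:

-- On master1, replicate from master2:
CHANGE MASTER TO
  MASTER_HOST = 'master2.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'replpassword',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;
START SLAVE;

-- Both masters accept writes in an active-active setup:
SET GLOBAL read_only = 0;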

Galera Replication

The following diagram illustrates the data replication flow of a successful transaction from one node to another for Galera Cluster:

The event is encapsulated in a writeset and broadcasted from the originator node to the other nodes in the cluster by using Galera replication. The writeset undergoes certification on every Galera node and if it passes, the applier threads will apply the writeset asynchronously. This means that the slave server will eventually become consistent, after agreement of all participating nodes in global total ordering. It is logically synchronous, but the actual writing and committing to the tablespace happens independently, and thus asynchronously on each node with a guarantee for the change to propagate on all nodes.

Avoiding Primary Key Collision

In order to deploy MySQL replication in a master-master setup, one has to adjust the auto-increment values to avoid primary key collisions for INSERTs between two or more replicating masters. This allows the primary key values on the masters to interleave with each other and prevents the same auto-increment number from being used twice on either of the nodes. This behaviour must be configured manually, depending on the number of masters in the replication setup. The value of auto_increment_increment equals the number of replicating masters, and the auto_increment_offset must be unique between them. For example, the following lines should exist inside the corresponding my.cnf:

Master1:

log-slave-updates
auto_increment_increment=2
auto_increment_offset=1

Master2:

log-slave-updates
auto_increment_increment=2
auto_increment_offset=2

Likewise, Galera Cluster uses this same trick to avoid primary key collisions by controlling the auto-increment value and offset automatically with the wsrep_auto_increment_control variable. If set to 1 (the default), Galera will automatically adjust the auto_increment_increment and auto_increment_offset variables according to the size of the cluster, and whenever the cluster size changes. This avoids replication conflicts due to auto_increment. In a master-slave environment, this variable can be set to OFF.

The consequence of this configuration is the auto increment value will not be in sequential order, as shown in the following table of a three-node Galera Cluster:

Node      auto_increment_increment    auto_increment_offset    Auto increment values
Node 1    3                           1                        1, 4, 7, 10, 13, 16...
Node 2    3                           2                        2, 5, 8, 11, 14, 17...
Node 3    3                           3                        3, 6, 9, 12, 15, 18...

If an application performs insert operations on the following nodes in the following order:

  • Node1, Node3, Node2, Node3, Node3, Node1, Node3 ..

Then the primary key value that will be stored in the table will be:

  • 1, 6, 8, 9, 12, 13, 15 ..

Simply said, when using master-master replication (MySQL replication or Galera), your application must be able to tolerate non-sequential auto-increment values in its dataset.

For ClusterControl users, take note that it supports deployment of MySQL master-master replication with a limit of two masters per replication cluster, and only for an active-passive setup. Therefore, ClusterControl deliberately does not configure the masters with the auto_increment_increment and auto_increment_offset variables.

Data Consistency

Galera Cluster comes with its flow-control mechanism, where each node in the cluster must keep up when replicating, otherwise all other nodes will slow down to allow the slowest node to catch up. This basically minimizes the probability of slave lag; it might still happen, but not as significantly as in MySQL replication. By default, Galera allows a node to fall up to 16 transactions behind in applying, as controlled by the gcs.fc_limit variable, before flow control kicks in. If you want to do critical reads (a SELECT that must return the most up-to-date information), you probably want to use the session variable wsrep_sync_wait, as shown below.
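
A minimal sketch of both knobs; the table name is hypothetical and the values are illustrative only:

-- Allow a larger apply queue before flow control kicks in (default is 16):
SET GLOBAL wsrep_provider_options = 'gcs.fc_limit=256';

-- Critical read: wait until this node has applied all pending writesets before running the SELECT
SET SESSION wsrep_sync_wait = 1;
SELECT balance FROM accounts WHERE id = 100;
SET SESSION wsrep_sync_wait = 0;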

Galera Cluster, on the other hand, comes with a safeguard against data inconsistency, whereby a node will get evicted from the cluster if it fails to apply any writeset for whatever reason. For example, when a Galera node fails to apply a writeset due to an internal error in the underlying storage engine (MySQL/MariaDB), the node will pull itself out of the cluster with the following error:

150305 16:13:14 [ERROR] WSREP: Failed to apply trx 1 4 times
150305 16:13:14 [ERROR] WSREP: Node consistency compromized, aborting..

To fix the data consistency, the offending node has to be re-synced before it is allowed to join the cluster. This can be done manually or by wiping out the data directory to trigger snapshot state transfer (full syncing from a donor).

MySQL master-master replication does not enforce data consistency protection, and a slave is allowed to diverge, e.g., replicate only a subset of data or lag behind, which makes the slave inconsistent with the master. It is designed to replicate data in one direction - from the master down to the slaves. Data consistency checks have to be performed manually or via external tools like Percona Toolkit pt-table-checksum or mysql-replication-check.

Conflict Resolution

Generally, master-master (or multi-master, or bi-directional) replication allows more than one member in the cluster to process writes. With MySQL replication, in case of replication conflict, the slave's SQL thread simply stops applying the next query until the conflict is resolved, either by manually skipping the replication event, fixing the offending rows or resyncing the slave. Simply said, there is no automatic conflict resolution support for MySQL replication.

Galera Cluster provides a better alternative by retrying the offending transaction during replication. By using wsrep_retry_autocommit variable, one can instruct Galera to automatically retry a failed transaction due to cluster-wide conflicts, before returning an error to the client. If set to 0, no retries will be attempted, while a value of 1 (the default) or more specifies the number of retries attempted. This can be useful to assist applications using autocommit to avoid deadlocks.


Node Consensus and Failover

Galera uses the Group Communication System (GCS) to check node consensus and availability between cluster members. If a node is unhealthy, it will be automatically evicted from the cluster after the gmcast.peer_timeout value, which defaults to 3 seconds. A healthy Galera node in the "Synced" state is deemed a reliable node to serve reads and writes, while others are not. This design greatly simplifies health check procedures from the upper tiers (load balancer or application).

In MySQL replication, a master does not care about its slave(s), while a slave only has consensus with its sole master via the slave_IO_thread process when replicating the binary events from the master's binary log. If a master goes down, this will break the replication and an attempt to re-establish the link will be made every slave_net_timeout (defaults to 60 seconds). From the application or load balancer perspective, the health check procedures for a replication slave must at least involve checking the following state (see the example after this list):

  • Seconds_Behind_Master
  • Slave_IO_Running
  • Slave_SQL_Running
  • read_only variable
  • super_read_only variable (MySQL 5.7.8 and later)
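
A minimal sketch of what such a health check boils down to when run against a slave:

SHOW SLAVE STATUS\G
-- Inspect Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master in the output.

SHOW GLOBAL VARIABLES LIKE '%read_only';
-- Returns read_only and, on MySQL 5.7.8+, super_read_only (among others).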

In terms of failover, master-master replication and Galera nodes are generally equal. They hold the same data set (although you can replicate a subset of data in MySQL replication, that's uncommon for master-master) and share the same role as masters, capable of handling reads and writes simultaneously. Therefore, there is actually no failover from the database point-of-view due to this equilibrium. Only the application side would require failover, to skip the unoperational nodes. Keep in mind that because MySQL replication is asynchronous, it is possible that not all of the changes done on the master will have propagated to the other master.

Node Provisioning

The process of bringing a node into sync with the cluster before replication starts is known as provisioning. In MySQL replication, provisioning a new node is a manual process. One has to take a backup of the master and restore it on the new node before setting up the replication link. For an existing replication node, if the master's binary logs have been rotated (based on expire_logs_days; the default of 0 means no automatic removal), you may have to re-provision the node using this procedure. There are also external tools like Percona Toolkit pt-table-sync and ClusterControl to help you out with this. ClusterControl supports resyncing a slave with just two clicks. You have options to resync by taking a backup from the active master or an existing backup.

In Galera, there are two ways of doing this - incremental state transfer (IST) or state snapshot transfer (SST). The IST process is the preferred method, where only the missing transactions are transferred from a donor's cache. The SST process is similar to taking a full backup from the donor and is usually pretty resource intensive. Galera will automatically determine which syncing process to trigger based on the joiner's state. In most cases, if a node fails to join the cluster, simply wipe out the MySQL datadir of the problematic node and start the MySQL service. The Galera provisioning process is much simpler; it comes in very handy when scaling out your cluster or re-introducing a problematic node back into the cluster.

Loosely Coupled vs Tightly Coupled

MySQL replication works very well even across slower connections, and with connections that are not continuous. It can also be used across different hardware, environment and operating systems. Most storage engines support it, including MyISAM, Aria, MEMORY and ARCHIVE. This loosely coupled setup allows MySQL master-master replication to work well in a mixed environment with less restriction.

Galera nodes are tightly-coupled, where the replication performance is as fast as the slowest node. Galera uses a flow control mechanism to control replication flow among members and eliminate any slave lag. The replication can be all fast or all slow on every node and is adjusted automatically by Galera. Thus, it's recommended to use uniform hardware specs for all Galera nodes, especially with respect to CPU, RAM, disk subsystem, network interface card and network latency between nodes in the cluster.

Conclusions

In summary, Galera Cluster is superior if compared to MySQL master-master replication due to its synchronous replication support with strong consistency, plus more advanced features like automatic membership control, automatic node provisioning and multi-threaded slaves. Ultimately, this depends on how the application interacts with the database server. Some legacy applications built for a standalone database server may not work well on a clustered setup.

To simplify our points above, the following reasons justify when to use MySQL master-master replication:

  • Things that are not supported by Galera:
    • Replication for non-InnoDB/XtraDB tables like MyISAM, Aria, MEMORY or ARCHIVE.
    • XA transactions.
    • Statement-based replication between masters (e.g., when bandwidth is very expensive).
    • Relying on explicit locking like LOCK TABLES statement.
    • The general query log and the slow query log must be directed to a table, instead of a file.
  • Loosely coupled setup where the hardware specs, software version and connection speed are significantly different on every master.
  • When you already have a MySQL replication chain and you want to add another active/backup master for redundancy, to speed up failover and recovery time in case one of the masters becomes unavailable.
  • If your application can't be modified to work around Galera Cluster limitations and having a MySQL-aware load balancer like ProxySQL or MaxScale is not an option.

Reasons to pick Galera Cluster over MySQL master-master replication:

  • Ability to safely write to multiple masters.
  • Data consistency automatically managed (and guaranteed) across databases.
  • New database nodes easily introduced and synced.
  • Failures or inconsistencies automatically detected.
  • In general, more advanced and robust high availability features.

An Introduction to Database High Availability for MySQL & MariaDB


The following is an excerpt from our whitepaper “How to Design Highly Available Open Source Database Environments” which can be downloaded for free.


A Couple of Words on “High Availability”

These days, high availability is a must for any serious deployment. Long gone are the days when you could schedule downtime of your database for several hours to perform maintenance. If your services are not available, you are losing customers and money. Therefore, making a database environment highly available typically has one of the highest priorities.

This poses a significant challenge to database administrators. First of all, how do you tell if your environment is highly available or not? How would you measure it? What are the steps you need to take in order to improve availability? How to design your setup to make it highly available from the beginning?

There are many HA solutions available in the MySQL (and MariaDB) ecosystem, but how do we know which ones we can trust? Some solutions might work under certain specific conditions, but might cause more trouble when applied outside of these conditions. Even a basic functionality like MySQL replication, which can be configured in many ways, can cause significant harm - for instance, circular replication with multiple writeable masters. Although it is easy to set up a ‘multi-master setup’ using replication, it can very easily break and leave us with diverging datasets on different servers. For a database, which is often considered the single source of truth, compromised data integrity can have catastrophic consequences.

In the following chapters, we’ll discuss the requirements for high availability in database
setups, and how to design the system from the ground up.

Measuring High Availability

What is high availability? To be able to decide if a given environment is highly available or not, one has to have some metrics for that. There are numerous ways you can measure high availability, we’ll focus on some of the most basic stuff.

First, though, let’s think about what this whole high availability thing is all about. What is its purpose? It is about making sure your environment serves its purpose. Purpose can be defined in many ways but, typically, it will be about delivering some service. In the database world, it is typically related to data. It could be serving data to your internal application. It can be storing data and making it queryable by analytical processes. It can be storing some data for your users and providing it when requested on demand. Once we are clear about the purpose, we can establish the success factors involved. This will help us define what high availability means in our specific case.

SLA’s

A common way of formalizing availability is the Service Level Agreement (SLA). It is also quite common to define SLAs for internal services. What is an SLA? It is a definition of the service level you plan to provide to your customers. This is for them to better understand what level of stability you plan for the service they bought or are planning to buy. There are numerous methods you can leverage to prepare an SLA, but typical ones are:

  • Availability of the service (percent)
  • Responsiveness of the service - latency (average, max, 95 percentile, 99 percentile)
  • Packet loss over the network (percent)
  • Throughput (average, minimum, 95 percentile, 99 percentile)
     

It can get more complex than that, though. In a sharded, multi-user environment you can define, let’s say, your SLA as: “Service will be available 99.99% of the time; downtime is declared when more than 2% of the users are affected. No incident can take more than 15 minutes to be resolved”. Such an SLA can also be extended to incorporate query response time: “downtime is declared if the 99th percentile of query latency exceeds 200 milliseconds”.

Nines

Availability is typically measured in “nines”; let us look into what exactly a given number of “nines” guarantees. The table below is taken from Wikipedia:

Availability %                       Downtime per year   Downtime per month   Downtime per week   Downtime per day
90% ("one nine")                     36.5 days           72 hours             16.8 hours          2.4 hours
95% ("one and a half nines")         18.25 days          36 hours             8.4 hours           1.2 hours
97%                                  10.96 days          21.6 hours           5.04 hours          43.2 min
98%                                  7.30 days           14.4 hours           3.36 hours          28.8 min
99% ("two nines")                    3.65 days           7.20 hours           1.68 hours          14.4 min
99.5% ("two and a half nines")       1.83 days           3.60 hours           50.4 min            7.2 min
99.8%                                17.52 hours         86.23 min            20.16 min           2.88 min
99.9% ("three nines")                8.76 hours          43.8 min             10.1 min            1.44 min
99.95% ("three and a half nines")    4.38 hours          21.56 min            5.04 min            43.2 s
99.99% ("four nines")                52.56 min           4.38 min             1.01 min            8.64 s
99.995% ("four and a half nines")    26.28 min           2.16 min             30.24 s             4.32 s
99.999% ("five nines")               5.26 min            25.9 s               6.05 s              864.3 ms
99.9999% ("six nines")               31.5 s              2.59 s               604.8 ms            86.4 ms
99.99999% ("seven nines")            3.15 s              262.97 ms            60.48 ms            8.64 ms
99.999999% ("eight nines")           315.569 ms          26.297 ms            6.048 ms            0.864 ms
99.9999999% ("nine nines")           31.5569 ms          2.6297 ms            0.6048 ms           0.0864 ms

As we can see, it escalates quickly. Five nines (99.999% availability) is equivalent to 5.26 minutes of downtime over the course of a year. Availability can also be calculated over different, smaller ranges: per month, per week, per day. Keep these numbers in mind, as they will be useful when we start to discuss the costs associated with maintaining different levels of availability.

Measuring Availability

To tell whether there is downtime or not, one has to have insight into the environment. You need to track the metrics which define the availability of your systems. It is important to keep in mind that you should measure it from the customer’s point of view, taking the broader picture into consideration. It doesn’t matter if your databases are up if, let’s say, due to a network issue, no application can reach them. Every single building block of your setup has an impact on availability.

One of the good places to look for availability data is web server logs. All requests which ended up with errors mean something has happened. It could be HTTP error 500 returned by the application because the database connection failed. Those could be programmatic errors pointing to some database issue, which ended up in Apache’s error log. You can also use a simple metric such as the uptime of database servers, although, with more complex SLAs, it might be tricky to determine how the unavailability of one database impacted your user base. No matter what you do, you should use more than one metric - this is needed to capture issues which might have happened on different layers of your environment.

Magic Number: “Three”

Even though high availability is about redundancy, in the case of database clusters, three is a magic number. It is not enough to have two nodes for redundancy - such a setup does not provide any built-in high availability. Sure, it might be better than just a single node, but human intervention is required to recover services. Let’s see why this is so.

Let’s assume we have two nodes, A and B. There’s a network link between them. Let us assume that both A and B serve writes and the application randomly picks where to connect (which means that part of the application will connect to node A and the other part will connect to node B). Now, let’s imagine we have a network issue which results in lost network connectivity between A and B.

What now? Neither A nor B can know the state of the other node. There are two actions which can be taken by both nodes:

  1. They can continue accepting traffic
  2. They can cease to operate and refuse to serve any traffic

Let’s think about the first option. As long as the other node is indeed down, this is the preferred action to take - we want our database to continue serving traffic. This is the main idea behind high availability after all. What would happen, though, if both nodes continued to accept traffic while being disconnected from each other? New data will be added on both sides, and the datasets will get out of sync. When the network issue is resolved, it will be a daunting task to merge those two datasets. Therefore, it is not acceptable to keep both nodes up and running. The problem is - how can node A tell if node B is alive or not (and vice versa)? The answer is - it cannot. If all connectivity is down, there is no way to distinguish a failed node from a failed network. As a result, the only safe action is for both nodes to cease all operations and refuse to serve traffic.

Let’s think now how a third node can help us in such a situation.

So we now have three nodes: A, B and C. All are interconnected, all are handling reads and writes.

Again, as in the previous example, node B has been cut off from the rest of the cluster due to network issues. What can happen next? Well, the situation is fairly similar to what we discussed earlier. There are two options - node B can either be down (and the rest of the cluster should continue) or it can be up, in which case it shouldn’t be allowed to handle any traffic. Can we now tell what the state of the cluster is? Actually, yes. We can see that nodes A and C can talk to each other and, as a result, they can agree that node B is not available. They won’t be able to tell why it happened, but what they know is that out of three nodes in the cluster, two still have connectivity with each other. Given that those two nodes form a majority of the cluster, it is possible to continue handling traffic. At the same time, node B can also deduce that the problem is on its side. It can access neither node A nor node C, making node B separated from the rest of the cluster. As it is isolated and is not part of a majority (1 of 3), the only safe action it can take is to stop serving traffic and refuse to accept any queries, ensuring that data drift won’t happen.

Of course, it doesn’t mean you can have only three nodes in the cluster. If you want better failure tolerance, you may want to add more. Keep in mind, though, that it should be an odd number if you want to improve high availability. Also, we were talking about “nodes” in the examples above. Please keep in mind that this is also true for datacenters, availability zones etc. If you have two datacenters, each having the same number of nodes (let’s say three nodes each), and you lose connectivity between those two DCs, the same principles apply here - you cannot tell which half of the cluster should start handling traffic. To be able to tell that, you have to have an observer in a third datacenter. It can be yet another set of nodes, or just a single host, with the task of observing the state of the remaining datacenters and taking part in making decisions (an example here would be the Galera arbitrator).

Single Points of Failure

High availability is all about removing single points of failure (SPOF) and not introducing new ones in the process. What are the SPOFs? Any part of your infrastructure which, when failed, brings downtime as defined in the SLA, is called a SPOF. Infrastructure design requires a holistic approach; the different components cannot be designed independently of each other. Most likely, you are not responsible for the whole design - database administrators tend to focus on databases and not, for example, the network layer. Still, you have to keep the other parts in mind and work with the teams which are responsible for them, to make sure that not only the part you are responsible for is designed correctly, but also that the remaining bits of the infrastructure were designed using the same principles. On top of that, such knowledge of how the whole infrastructure is designed helps you to design the database stack too. Knowing what issues may happen helps to build mechanisms to prevent them from impacting the availability of the database.

How to Run and Configure ProxySQL 2.0 for MySQL Galera Cluster on Docker


ProxySQL is an intelligent and high-performance SQL proxy which supports MySQL, MariaDB and ClickHouse. Recently, ProxySQL 2.0 has become GA and it comes with new exciting features such as GTID consistent reads, frontend SSL, Galera and MySQL Group Replication native support.

It is relatively easy to run ProxySQL as a Docker container. We have previously written about how to run ProxySQL on Kubernetes as a helper container or as a Kubernetes service, which was based on ProxySQL 1.x. In this blog post, we are going to use the new version, ProxySQL 2.x, which uses a different approach for Galera Cluster configuration.

ProxySQL 2.x Docker Image

We have released a new ProxySQL 2.0 Docker image and it's available on Docker Hub. The README provides a number of configuration examples, particularly for Galera and MySQL Replication, pre and post v2.x. The configuration lines can be defined in a text file and mapped into the container's path at /etc/proxysql.cnf to be loaded into the ProxySQL service.

The image "latest" tag still points to 1.x until ProxySQL 2.0 officially becomes GA (we haven't seen any official release blog/article from ProxySQL team yet). Which means, whenever you install ProxySQL image using latest tag from Severalnines, you will still get version 1.x with it. Take note the new example configurations also enable ProxySQL web stats (introduced in 1.4.4 but still in beta) - a simple dashboard that summarizes the overall configuration and status of ProxySQL itself.

ProxySQL 2.x Support for Galera Cluster

Let's talk about Galera Cluster native support in greater detail. The new mysql_galera_hostgroups table consists of the following fields:

  • writer_hostgroup: ID of the hostgroup that will contain all the members that are writers (read_only=0).
  • backup_writer_hostgroup: If the cluster is running in multi-writer mode (i.e. there are multiple nodes with read_only=0) and max_writers is set to a smaller number than the total number of nodes, the additional nodes are moved to this backup writer hostgroup.
  • reader_hostgroup: ID of the hostgroup that will contain all the members that are readers (i.e. nodes that have read_only=1)
  • offline_hostgroup: When ProxySQL monitoring determines a host to be OFFLINE, the host will be moved to the offline_hostgroup.
  • active: a boolean value (0 or 1) to activate a hostgroup
  • max_writers: Controls the maximum number of allowable nodes in the writer hostgroup, as mentioned previously, additional nodes will be moved to the backup_writer_hostgroup.
  • writer_is_also_reader: When 1, a node in the writer_hostgroup will also be placed in the reader_hostgroup so that it will be used for reads. When set to 2, the nodes from backup_writer_hostgroup will be placed in the reader_hostgroup, instead of the node(s) in the writer_hostgroup.
  • max_transactions_behind: determines the maximum number of writesets a node in the cluster can have queued before the node is SHUNNED to prevent stale reads (this is determined by querying the wsrep_local_recv_queue Galera variable).
  • comment: Text field that can be used for any purposes defined by the user

Here is an example configuration for mysql_galera_hostgroups in table format:

Admin> select * from mysql_galera_hostgroups\G
*************************** 1. row ***************************
       writer_hostgroup: 10
backup_writer_hostgroup: 20
       reader_hostgroup: 30
      offline_hostgroup: 9999
                 active: 1
            max_writers: 1
  writer_is_also_reader: 2
max_transactions_behind: 20
                comment: 

ProxySQL performs Galera health checks by monitoring the following MySQL status and variables (you can also inspect them manually, as shown after this list):

  • read_only - If ON, then ProxySQL will group the defined host into reader_hostgroup unless writer_is_also_reader is 1.
  • wsrep_desync - If ON, ProxySQL will mark the node as unavailable, moving it to offline_hostgroup.
  • wsrep_reject_queries - If this variable is ON, ProxySQL will mark the node as unavailable, moving it to the offline_hostgroup (useful in certain maintenance situations).
  • wsrep_sst_donor_rejects_queries - If this variable is ON, ProxySQL will mark the node as unavailable while the Galera node is serving as an SST donor, moving it to the offline_hostgroup.
  • wsrep_local_state - If this status returns other than 4 (4 means Synced), ProxySQL will mark the node as unavailable and move it into offline_hostgroup.
  • wsrep_local_recv_queue - If this status is higher than max_transactions_behind, the node will be shunned.
  • wsrep_cluster_status - If this status returns other than Primary, ProxySQL will mark the node as unavailable and move it into offline_hostgroup.
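
You can query the same status and variables manually on any Galera node, which is handy when debugging why ProxySQL moved a host into another hostgroup:

SHOW GLOBAL STATUS WHERE Variable_name IN
  ('wsrep_local_state', 'wsrep_local_recv_queue', 'wsrep_cluster_status');
SHOW GLOBAL VARIABLES WHERE Variable_name IN
  ('read_only', 'wsrep_desync', 'wsrep_reject_queries', 'wsrep_sst_donor_rejects_queries');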

Having said that, by combining these new parameters in mysql_galera_hostgroups together with mysql_query_rules, ProxySQL 2.x has the flexibility to fit into many more Galera use cases. For example, one can have single-writer, multi-writer and multi-reader hostgroups defined as the destination hostgroups of query rules, with the ability to limit the number of writers and finer control over the stale-reads behaviour.

Contrast this to ProxySQL 1.x, where the user had to explicitly define a scheduler to call an external script to perform the backend health checks and update the database servers state. This requires some customization to the script (user has to update the ProxySQL admin user/password/port) plus it depended on an additional tool (MySQL client) to connect to ProxySQL admin interface.

Here is an example configuration of Galera health check script scheduler in table format for ProxySQL 1.x:

Admin> select * from scheduler\G
*************************** 1. row ***************************
         id: 1
     active: 1
interval_ms: 2000
   filename: /usr/share/proxysql/tools/proxysql_galera_checker.sh
       arg1: 10
       arg2: 20
       arg3: 1
       arg4: 1
       arg5: /var/lib/proxysql/proxysql_galera_checker.log
    comment:

Besides, since the ProxySQL scheduler thread executes any script independently, there are many versions of health check scripts available out there. All ProxySQL instances deployed by ClusterControl use the default script provided by the ProxySQL installer package.

In ProxySQL 2.x, max_writers and writer_is_also_reader variables can determine how ProxySQL dynamically groups the backend MySQL servers and will directly affect the connection distribution and query routing. For example, consider the following MySQL backend servers:

Admin> select hostgroup_id, hostname, status, weight from mysql_servers;
+--------------+--------------+--------+--------+
| hostgroup_id | hostname     | status | weight |
+--------------+--------------+--------+--------+
| 10           | DB1          | ONLINE | 1      |
| 10           | DB2          | ONLINE | 1      |
| 10           | DB3          | ONLINE | 1      |
+--------------+--------------+--------+--------+

Together with the following Galera hostgroups definition:

Admin> select * from mysql_galera_hostgroups\G
*************************** 1. row ***************************
       writer_hostgroup: 10
backup_writer_hostgroup: 20
       reader_hostgroup: 30
      offline_hostgroup: 9999
                 active: 1
            max_writers: 1
  writer_is_also_reader: 2
max_transactions_behind: 20
                comment: 

Considering all hosts are up and running, ProxySQL will most likely group the hosts as below:

Let's look at them one by one:

writer_is_also_reader=0
  • Groups the hosts into 2 hostgroups (writer and backup_writer).
  • The writer is part of the backup_writer.
  • Since the writer is not a reader, nothing ends up in hostgroup 30 (reader) because none of the hosts are set with read_only=1. It is not a common practice in Galera to enable the read-only flag.

writer_is_also_reader=1
  • Groups the hosts into 3 hostgroups (writer, backup_writer and reader).
  • The read_only=0 variable in Galera has no effect, thus the writer is also in hostgroup 30 (reader).
  • The writer is not part of backup_writer.

writer_is_also_reader=2
  • Similar to writer_is_also_reader=1, however the writer is part of backup_writer.

With this configuration, one can have various choices for hostgroup destination to cater for specific workloads. "Hotspot" writes can be configured to go to only one server to reduce multi-master conflicts, non-conflicting writes can be distributed equally on the other masters, most reads can be distributed evenly on all MySQL servers or non-writers, critical reads can be forwarded to the most up-to-date servers and analytical reads can be forwarded to a slave replica.

ProxySQL Deployment for Galera Cluster

In this example, suppose we already have a three-node Galera Cluster deployed by ClusterControl as shown in the following diagram:

Our Wordpress applications are running on Docker while the Wordpress database is hosted on our Galera Cluster running on bare-metal servers. We decided to run a ProxySQL container alongside our Wordpress containers to have better control over Wordpress database query routing and fully utilize our database cluster infrastructure. Since the read-write ratio is around 80%-20%, we want to configure ProxySQL to:

  • Forward all writes to one Galera node (less conflict, focus on write)
  • Balance all reads to the other two Galera nodes (better distribution for the majority of the workload)

Firstly, create a ProxySQL configuration file inside the Docker host so we can map it into our container:

$ mkdir /root/proxysql-docker
$ vim /root/proxysql-docker/proxysql.cnf

Then, copy the following lines (we will explain the configuration lines further down):

datadir="/var/lib/proxysql"

admin_variables=
{
    admin_credentials="admin:admin"
    mysql_ifaces="0.0.0.0:6032"
    refresh_interval=2000
    web_enabled=true
    web_port=6080
    stats_credentials="stats:admin"
}

mysql_variables=
{
    threads=4
    max_connections=2048
    default_query_delay=0
    default_query_timeout=36000000
    have_compress=true
    poll_timeout=2000
    interfaces="0.0.0.0:6033;/tmp/proxysql.sock"
    default_schema="information_schema"
    stacksize=1048576
    server_version="5.1.30"
    connect_timeout_server=10000
    monitor_history=60000
    monitor_connect_interval=200000
    monitor_ping_interval=200000
    ping_interval_server_msec=10000
    ping_timeout_server=200
    commands_stats=true
    sessions_sort=true
    monitor_username="proxysql"
    monitor_password="proxysqlpassword"
    monitor_galera_healthcheck_interval=2000
    monitor_galera_healthcheck_timeout=800
}

mysql_galera_hostgroups =
(
    {
        writer_hostgroup=10
        backup_writer_hostgroup=20
        reader_hostgroup=30
        offline_hostgroup=9999
        max_writers=1
        writer_is_also_reader=1
        max_transactions_behind=30
        active=1
    }
)

mysql_servers =
(
    { address="db1.cluster.local" , port=3306 , hostgroup=10, max_connections=100 },
    { address="db2.cluster.local" , port=3306 , hostgroup=10, max_connections=100 },
    { address="db3.cluster.local" , port=3306 , hostgroup=10, max_connections=100 }
)

mysql_query_rules =
(
    {
        rule_id=100
        active=1
        match_pattern="^SELECT .* FOR UPDATE"
        destination_hostgroup=10
        apply=1
    },
    {
        rule_id=200
        active=1
        match_pattern="^SELECT .*"
        destination_hostgroup=20
        apply=1
    },
    {
        rule_id=300
        active=1
        match_pattern=".*"
        destination_hostgroup=10
        apply=1
    }
)

mysql_users =
(
    { username = "wordpress", password = "passw0rd", default_hostgroup = 10, transaction_persistent = 0, active = 1 },
    { username = "sbtest", password = "passw0rd", default_hostgroup = 10, transaction_persistent = 0, active = 1 }
)

Now, let's go through some of the most important configuration sections. Firstly, we define the Galera hostgroups configuration as below:

mysql_galera_hostgroups =
(
    {
        writer_hostgroup=10
        backup_writer_hostgroup=20
        reader_hostgroup=30
        offline_hostgroup=9999
        max_writers=1
        writer_is_also_reader=1
        max_transactions_behind=30
        active=1
    }
)

Hostgroup 10 will be the writer_hostgroup, hostgroup 20 for backup_writer and hostgroup 30 for reader. We set max_writers to 1 so we can have a single-writer hostgroup for hostgroup 10, where all writes should be sent. Then, we set writer_is_also_reader to 1, which will make all Galera nodes act as readers as well, suitable for queries that can be equally distributed to all nodes. Hostgroup 9999 is reserved as the offline_hostgroup, where ProxySQL puts Galera nodes it detects as non-operational.
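
As a reference, the same hostgroup definition can also be created or adjusted from the ProxySQL admin console once the SQLite database has taken over from the configuration file (see the note on multi-layer configuration further down). A minimal sketch, mirroring the values above:

Admin> INSERT INTO mysql_galera_hostgroups (writer_hostgroup, backup_writer_hostgroup, reader_hostgroup, offline_hostgroup, active, max_writers, writer_is_also_reader, max_transactions_behind) VALUES (10, 20, 30, 9999, 1, 1, 1, 30);
Admin> LOAD MYSQL SERVERS TO RUNTIME;
Admin> SAVE MYSQL SERVERS TO DISK;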

Then, we configure our MySQL servers with default to hostgroup 10:

mysql_servers =
(
    { address="db1.cluster.local" , port=3306 , hostgroup=10, max_connections=100 },
    { address="db2.cluster.local" , port=3306 , hostgroup=10, max_connections=100 },
    { address="db3.cluster.local" , port=3306 , hostgroup=10, max_connections=100 }
)

With the above configurations, ProxySQL will "see" our hostgroups as below:

Then, we define the query routing through query rules. Based on our requirements, all reads should be sent to all Galera nodes except the writer (hostgroup 20), and everything else is forwarded to hostgroup 10 for the single writer:

mysql_query_rules =
(
    {
        rule_id=100
        active=1
        match_pattern="^SELECT .* FOR UPDATE"
        destination_hostgroup=10
        apply=1
    },
    {
        rule_id=200
        active=1
        match_pattern="^SELECT .*"
        destination_hostgroup=20
        apply=1
    },
    {
        rule_id=300
        active=1
        match_pattern=".*"
        destination_hostgroup=10
        apply=1
    }
)

Finally, we define the MySQL users that will be passed through ProxySQL:

mysql_users =
(
    { username = "wordpress", password = "passw0rd", default_hostgroup = 10, transaction_persistent = 0, active = 1 },
    { username = "sbtest", password = "passw0rd", default_hostgroup = 10, transaction_persistent = 0, active = 1 }
)

We set transaction_persistent to 0 so all connections coming from these users will respect the query rules for read and write routing. Otherwise, the connections would end up hitting one hostgroup, which defeats the purpose of load balancing. Do not forget to create those users on the MySQL servers first. If you are a ClusterControl user, you may use the Manage -> Schemas and Users feature to create them.
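
If you prefer to create them manually, a sketch of the statements run on any Galera node (DDL is replicated cluster-wide) could look like below. The database names (wordpress, sbtest) are assumptions based on this example setup:

mysql> CREATE DATABASE wordpress;
mysql> CREATE USER 'wordpress'@'%' IDENTIFIED BY 'passw0rd';
mysql> GRANT ALL PRIVILEGES ON wordpress.* TO 'wordpress'@'%';
mysql> CREATE USER 'sbtest'@'%' IDENTIFIED BY 'passw0rd';
mysql> GRANT ALL PRIVILEGES ON sbtest.* TO 'sbtest'@'%';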

We are now ready to start our container. We are going to map the ProxySQL configuration file as bind mount when starting up the ProxySQL container. Thus, the run command will be:

$ docker run -d \
--name proxysql2 \
--hostname proxysql2 \
--publish 6033:6033 \
--publish 6032:6032 \
--publish 6080:6080 \
--restart=unless-stopped \
-v /root/proxysql-docker/proxysql.cnf:/etc/proxysql.cnf \
severalnines/proxysql:2.0

Finally, change the Wordpress database pointing to ProxySQL container port 6033, for instance:

$ docker run -d \
--name wordpress \
--publish 80:80 \
--restart=unless-stopped \
-e WORDPRESS_DB_HOST=proxysql2:6033 \
-e WORDPRESS_DB_USER=wordpress \
-e WORDPRESS_DB_PASSWORD=passw0rd \
wordpress

At this point, our architecture is looking something like this:

If you want ProxySQL container to be persistent, map /var/lib/proxysql/ to a Docker volume or bind mount, for example:

$ docker run -d \
--name proxysql2 \
--hostname proxysql2 \
--publish 6033:6033 \
--publish 6032:6032 \
--publish 6080:6080 \
--restart=unless-stopped \
-v /root/proxysql-docker/proxysql.cnf:/etc/proxysql.cnf \
-v proxysql-volume:/var/lib/proxysql \
severalnines/proxysql:2.0

Keep in mind that running with persistent storage like the above will make our /root/proxysql-docker/proxysql.cnf obsolete on the second restart. This is due to ProxySQL's multi-layer configuration: if /var/lib/proxysql/proxysql.db exists, ProxySQL skips loading options from the configuration file and loads whatever is in the SQLite database instead (unless you start the proxysql service with the --initial flag). Having said that, any further ProxySQL configuration management has to be performed via the ProxySQL admin console on port 6032, instead of using the configuration file.
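
For example, a later configuration change could be made through the admin console and then persisted to the SQLite database. A sketch (the new max_connections value is arbitrary):

$ docker exec -it proxysql2 mysql -uadmin -padmin -h127.0.0.1 -P6032 --prompt='Admin> '
Admin> UPDATE mysql_servers SET max_connections = 150 WHERE hostname = 'db1.cluster.local';
Admin> LOAD MYSQL SERVERS TO RUNTIME;
Admin> SAVE MYSQL SERVERS TO DISK;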

Monitoring

The ProxySQL process logs to syslog by default, and you can view it using the standard Docker commands:

$ docker ps
$ docker logs proxysql2

To verify the current hostgroup, query the runtime_mysql_servers table:

$ docker exec -it proxysql2 mysql -uadmin -padmin -h127.0.0.1 -P6032 --prompt='Admin> '
Admin> select hostgroup_id,hostname,status from runtime_mysql_servers;
+--------------+--------------+--------+
| hostgroup_id | hostname     | status |
+--------------+--------------+--------+
| 10           | 192.168.0.21 | ONLINE |
| 30           | 192.168.0.21 | ONLINE |
| 30           | 192.168.0.22 | ONLINE |
| 30           | 192.168.0.23 | ONLINE |
| 20           | 192.168.0.22 | ONLINE |
| 20           | 192.168.0.23 | ONLINE |
+--------------+--------------+--------+

If the selected writer goes down, it will be transferred to the offline_hostgroup (HID 9999):

Admin> select hostgroup_id,hostname,status from runtime_mysql_servers;
+--------------+--------------+--------+
| hostgroup_id | hostname     | status |
+--------------+--------------+--------+
| 10           | 192.168.0.22 | ONLINE |
| 9999         | 192.168.0.21 | ONLINE |
| 30           | 192.168.0.22 | ONLINE |
| 30           | 192.168.0.23 | ONLINE |
| 20           | 192.168.0.23 | ONLINE |
+--------------+--------------+--------+

The above topology changes can be illustrated in the following diagram:

We have also enabled the web stats UI with admin-web_enabled=true. To access the web UI, simply go to the Docker host on port 6080, for example: http://192.168.0.200:6080, and you will be prompted with a username/password pop up. Enter the credentials as defined under admin-stats_credentials and you should see the following page:

By monitoring the MySQL connection pool table, we can get a connection distribution overview for all hostgroups:

Admin> select hostgroup, srv_host, status, ConnUsed, MaxConnUsed, Queries from stats.stats_mysql_connection_pool order by srv_host;
+-----------+--------------+--------+----------+-------------+---------+
| hostgroup | srv_host     | status | ConnUsed | MaxConnUsed | Queries |
+-----------+--------------+--------+----------+-------------+---------+
| 20        | 192.168.0.23 | ONLINE | 5        | 24          | 11458   |
| 30        | 192.168.0.23 | ONLINE | 0        | 0           | 0       |
| 20        | 192.168.0.22 | ONLINE | 2        | 24          | 11485   |
| 30        | 192.168.0.22 | ONLINE | 0        | 0           | 0       |
| 10        | 192.168.0.21 | ONLINE | 32       | 32          | 9746    |
| 30        | 192.168.0.21 | ONLINE | 0        | 0           | 0       |
+-----------+--------------+--------+----------+-------------+---------+

The output above shows that hostgroup 30 does not process anything because our query rules do not have this hostgroup configured as destination hostgroup.
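
If we later want to put hostgroup 30 to work, for example to spread a particular reporting query across all nodes, an additional query rule could be added at runtime. A sketch, where the table name report_summary is purely illustrative:

Admin> INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply) VALUES (150, 1, '^SELECT .* FROM report_summary', 30, 1);
Admin> LOAD MYSQL QUERY RULES TO RUNTIME;
Admin> SAVE MYSQL QUERY RULES TO DISK;

Using rule_id 150 places it between the existing rules 100 and 200, so it is evaluated before the generic SELECT rule.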

The statistics related to the Galera nodes can be viewed in the mysql_server_galera_log table:

Admin>  select * from mysql_server_galera_log order by time_start_us desc limit 3\G
*************************** 1. row ***************************
                       hostname: 192.168.0.23
                           port: 3306
                  time_start_us: 1552992553332489
                success_time_us: 2045
              primary_partition: YES
                      read_only: NO
         wsrep_local_recv_queue: 0
              wsrep_local_state: 4
                   wsrep_desync: NO
           wsrep_reject_queries: NO
wsrep_sst_donor_rejects_queries: NO
                          error: NULL
*************************** 2. row ***************************
                       hostname: 192.168.0.22
                           port: 3306
                  time_start_us: 1552992553329653
                success_time_us: 2799
              primary_partition: YES
                      read_only: NO
         wsrep_local_recv_queue: 0
              wsrep_local_state: 4
                   wsrep_desync: NO
           wsrep_reject_queries: NO
wsrep_sst_donor_rejects_queries: NO
                          error: NULL
*************************** 3. row ***************************
                       hostname: 192.168.0.21
                           port: 3306
                  time_start_us: 1552992553329013
                success_time_us: 2715
              primary_partition: YES
                      read_only: NO
         wsrep_local_recv_queue: 0
              wsrep_local_state: 4
                   wsrep_desync: NO
           wsrep_reject_queries: NO
wsrep_sst_donor_rejects_queries: NO
                          error: NULL

The resultset returns the related MySQL variable/status state for every Galera node for a particular timestamp. In this configuration, we configured the Galera health check to run every 2 seconds (monitor_galera_healthcheck_interval=2000). Hence, the maximum failover time would be around 2 seconds if a topology change happens to the cluster.
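
If a shorter detection time is needed, the health check interval can be lowered at runtime from the admin console, at the cost of slightly more monitoring overhead. A sketch:

Admin> UPDATE global_variables SET variable_value = '1000' WHERE variable_name = 'mysql-monitor_galera_healthcheck_interval';
Admin> LOAD MYSQL VARIABLES TO RUNTIME;
Admin> SAVE MYSQL VARIABLES TO DISK;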

Understanding the Effects of High Latency in High Availability MySQL and MariaDB Solutions


High availability means a system is working and responding according to the business needs for a high percentage of time. For production database systems it is typically the highest priority to keep it close to 100%. We build database clusters to eliminate all single points of failure. If an instance becomes unavailable, another node should be able to take the workload and carry on from there. In a perfect world, a database cluster would solve all of our system availability problems. Unfortunately, while all may look good on paper, the reality is often different. So where can it go wrong?

Transactional database systems come with sophisticated storage engines. Keeping data consistent across multiple nodes makes this task way harder. Clustering introduces a number of new variables that depend highly on the network and underlying infrastructure. It is not uncommon for a database instance that was running fine as a standalone node to suddenly perform poorly in a cluster environment.

Among the number of things that can affect cluster availability, latency issues play a crucial role. But what is latency, exactly? Is it only related to the network?

The term "latency" actually refers to several kinds of delays incurred in the processing of data. It’s how long it takes for a piece of information to move from stage to another.

In this blog post, we’ll look at the two main high availability solutions for MySQL and MariaDB, and how they can each be affected by latency issues.

At the end of the article, we take a look at modern load balancers and discuss how they can help you address some types of latency issues.

In a previous article, my colleague Krzysztof Książek wrote about "Dealing with Unreliable Networks When Crafting an HA Solution for MySQL or MariaDB". You will find tips which can help you to design your production ready HA architecture, and avoid some of the issues described here.

Master-Slave Replication for High Availability

MySQL master-slave replication is probably the most popular database cluster type on the planet. One of the main things you want to monitor while running your master-slave replication cluster is the slave lag. Depending on your application requirements and the way you utilize your database, the replication latency (slave lag) may determine whether the data can be read from the slave node or not. Data committed on the master but not yet available on an asynchronous slave means that the slave holds an older state. When it is not acceptable to read from a slave, you need to go to the master, and that can affect application performance. In the worst case scenario, your system will not be able to handle the entire workload on the master alone.

Slave lag and stale data

To check the status of master-slave replication, start with the following command:

SHOW SLAVE STATUS\G
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.3.100
                  Master_User: rpl_user
                  Master_Port: 3306
                Connect_Retry: 10
              Master_Log_File: binlog.000021
          Read_Master_Log_Pos: 5101
               Relay_Log_File: relay-bin.000002
                Relay_Log_Pos: 809
        Relay_Master_Log_File: binlog.000021
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 5101
              Relay_Log_Space: 1101
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 3
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 0-3-1179
      Replicate_Do_Domain_Ids: 
  Replicate_Ignore_Domain_Ids: 
                Parallel_Mode: conservative
1 row in set (0.01 sec)

Using the above information you can determine how good the overall replication latency is. The lower the value you see in "Seconds_Behind_Master", the better the data transfer speed for replication.

Another way to monitor slave lag is to use ClusterControl replication monitoring. In this screenshot, we can see the replication status of an asynchronous Master-Slave (2x) cluster with ProxySQL.

There are a number of things that can affect replication time. The most obvious is the network throughput and how much data you can transfer. MySQL comes with multiple configuration options to optimize the replication process. The essential replication-related parameters are:

  • Parallel apply
  • Logical clock algorithm
  • Compression
  • Selective master-slave replication
  • Replication mode

Parallel apply

It’s not uncommon to start replication tuning by enabling parallel apply. The reason is that, by default, MySQL applies the binary log sequentially, while a typical database server comes with several CPUs to use.

To get around sequential log apply, both MariaDB and MySQL offer parallel replication. The implementation may differ per vendor and version. E.g., MySQL 5.6 offers parallel replication as long as the queries are separated by schema, while MariaDB (starting from version 10.0) and MySQL 5.7 can both handle parallel replication across schemas. Different vendors and versions come with their own limitations and features, so always check the documentation.

Executing queries via parallel slave threads may speed up your replication stream if you are write heavy. However, if you aren’t, it would be best to stick to the traditional single-threaded replication. To enable parallel processing, change slave_parallel_workers to the number of CPU threads you want to involve in the process. It is recommended to keep the value lower than the number of available CPU threads.
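
A minimal sketch of enabling it at runtime on a MySQL 5.7 slave (MariaDB uses slave_parallel_threads and slave_parallel_mode instead; the value 4 is just an example):

mysql> STOP SLAVE SQL_THREAD;
mysql> SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK'; /* see the next section */
mysql> SET GLOBAL slave_parallel_workers = 4;            /* number of applier threads */
mysql> START SLAVE SQL_THREAD;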

Parallel replication works best with group commits. To check if you have group commits happening, run the following query:

show global status like 'binlog_%commits';

The bigger the ratio between these two values (commits per group commit), the better.
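
As an illustration, on MariaDB the ratio can be computed directly from these counters, which are exposed in information_schema (a sketch):

SELECT
    (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
     WHERE VARIABLE_NAME = 'BINLOG_COMMITS') /
    (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
     WHERE VARIABLE_NAME = 'BINLOG_GROUP_COMMITS') AS commits_per_group_commit;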

Logical clock

The slave_parallel_type=LOGICAL_CLOCK setting is an implementation of a Lamport clock algorithm. When using a multithreaded slave, this variable specifies the method used to decide which transactions are allowed to execute in parallel on the slave. The variable has no effect on slaves for which multithreading is not enabled, so make sure slave_parallel_workers is set higher than 0.

MariaDB users should also check optimistic mode introduced in version 10.1.3 as it also may give you better results.

GTID

MariaDB comes with its own implementation of GTID. MariaDB’s sequence consists of a domain, server and transaction ID. Domains allow multi-source replication with a distinct ID. Different domain IDs can be used to replicate portions of data out-of-order (in parallel). As long as that is acceptable for your application, this can reduce replication latency.

A similar technique applies to MySQL 5.7, which can also use multi-source replication and independent replication channels.

Compression

CPU power is getting less expensive over time, so using it for binlog compression could be a good option for many database environments. The slave_compressed_protocol parameter tells MySQL to use compression if both master and slave support it. By default, this parameter is disabled.

Starting from MariaDB 10.2.3, selected events in the binary log can be optionally compressed, to save network transfers.
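
A minimal sketch of enabling both at runtime (slave_compressed_protocol applies to MySQL and MariaDB; log_bin_compress is MariaDB 10.2.3+ only):

mysql> SET GLOBAL slave_compressed_protocol = 1; /* compress the master-slave network protocol if both sides support it */
mysql> SET GLOBAL log_bin_compress = 1;          /* MariaDB only: compress events written to the binary log */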

Replication formats

MySQL offers several binary log formats. Choosing the right replication format helps minimize the time needed to pass data between the cluster nodes.
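
For example, switching to row-based replication with a minimal row image can reduce the amount of data written to the binary log and shipped to the slaves. A sketch (both variables are dynamic; new sessions pick up the change):

mysql> SET GLOBAL binlog_format = 'ROW';        /* row-based events instead of statements */
mysql> SET GLOBAL binlog_row_image = 'MINIMAL'; /* log only the columns needed to identify and apply the change */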

Multimaster Replication For High Availability

Some applications cannot afford to operate on outdated data.

In such cases, you may want to enforce consistency across the nodes with synchronous replication. Keeping data synchronous requires an additional plugin, and for some, the best solution on the market for that is Galera Cluster.

Galera Cluster comes with the wsrep API, which is responsible for transmitting transactions to all nodes and executing them according to a cluster-wide ordering. This can block the execution of subsequent queries until the node has applied all write-sets from its applier queue. While it is a good solution for consistency, you may hit some architectural limitations. The common latency issues can be related to:

  • The slowest node in the cluster
  • Horizontal scaling and write operations
  • Geolocated clusters
  • High Ping
  • Transaction size

The slowest node in the cluster

By design, the write performance of the cluster cannot be higher than the performance of the slowest node in the cluster. Start your cluster review by checking the machine resources and verify the configuration files to make sure they all run on the same performance settings.

Parallelization

Parallel threads do not guarantee better performance, but they may speed up the synchronization of new nodes with the cluster. The status wsrep_cert_deps_distance tells us the possible degree of parallelization. It is the average distance between the highest and lowest seqno values that can possibly be applied in parallel. You can use the wsrep_cert_deps_distance status variable to determine the maximum number of slave threads possible.
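
A quick sketch of checking the metric and adjusting the number of applier threads accordingly (the value 8 is arbitrary and should be derived from the observed wsrep_cert_deps_distance and the number of CPU cores):

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cert_deps_distance'; /* average potential parallelism */
mysql> SET GLOBAL wsrep_slave_threads = 8;                 /* number of parallel applier threads */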

Horizontal scaling

By adding more nodes to the cluster, we have fewer single points of failure; however, the information needs to travel across multiple instances before it is committed, which multiplies the response times. If you need scalable writes, consider an architecture based on sharding. A good solution can be the Spider storage engine.

In some cases, to reduce the information shared across the cluster nodes, you can consider having one writer at a time. It is relatively easy to implement using a load balancer. When you do this manually, make sure you have a procedure to change the DNS value when your writer node goes down.

Geolocated clusters

Although Galera Cluster is synchronous, it is possible to deploy a Galera Cluster across data centers. Synchronous replication like MySQL Cluster (NDB) implements a two-phase commit, where messages are sent to all nodes in a cluster in a 'prepare' phase, and another set of messages are sent in a 'commit' phase. This approach is usually not suitable for geographically disparate nodes, because of the latencies in sending messages between nodes.

High Ping

Galera Cluster with the default settings does not handle high network latency well. If you have a network with a node that shows a high ping time, consider changing the evs.send_window and evs.user_send_window parameters. These variables define the maximum number of data packets in replication at a time. For WAN setups, the variables can be set to a considerably higher value than the default of 2. It’s common to set them to 512. These parameters are part of wsrep_provider_options.

--wsrep_provider_options="evs.send_window=512;evs.user_send_window=512"

Transaction size

One of the things you need to consider while running Galera Cluster is the size of the transaction. Finding the balance between the transaction size, performance and Galera certification process is something you have to estimate in your application. You can find more information about that in the article How to Improve Performance of Galera Cluster for MySQL or MariaDB by Ashraf Sharif.

Load Balancer Causal Consistency Reads

Even with the minimized risk of data latency issues, standard MySQL asynchronous replication cannot guarantee consistency. It is still possible that the data is not yet replicated to the slave while your application is reading it from there. Synchronous replication can solve this problem, but it has architectural limitations and may not fit your application requirements (e.g., intensive bulk writes). So how to overcome it?

The first step to avoid stale data reads is to make the application aware of replication delay. This is usually programmed in application code. Fortunately, there are modern database load balancers with support for adaptive query routing based on GTID tracking. The most popular are ProxySQL and MaxScale.

ProxySQL 2.0

ProxySQL Binlog Reader allows ProxySQL to know in real time which GTID has been executed on every MySQL server, slaves and the master itself. Thanks to this, when a client executes a read that requires causal consistency, ProxySQL immediately knows on which server the query can be executed. If for whatever reason the write has not been applied on any slave yet, ProxySQL knows that it was executed on the master and sends the read there.

Maxscale 2.3

MariaDB introduced causal reads in MaxScale 2.3.0. The way it works is similar to ProxySQL 2.0. Basically, when causal_reads is enabled, any subsequent reads performed on slave servers will be done in a manner that prevents replication lag from affecting the results. If the slave has not caught up to the master within the configured time, the query will be retried on the master.
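
A minimal sketch of a readwritesplit service definition in maxscale.cnf with causal reads enabled; the server names and credentials are placeholders:

[RW-Split-Service]
type=service
router=readwritesplit
servers=dbserv1,dbserv2,dbserv3
user=maxscale
password=maxscale_pw
causal_reads=true
causal_reads_timeout=10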

ClusterControl Tips & Tricks - Dealing with MySQL Long Running Queries


Long running queries/statements/transactions are sometimes inevitable in a MySQL environment. On some occasions, a long running query can be the catalyst for a disastrous event. If you care about your database, optimizing query performance and detecting long running queries must be performed regularly. Things do get harder, though, when multiple instances in a group or cluster are involved.

When dealing with multiple nodes, the repetitive task of checking every single node is something that we want to avoid. ClusterControl monitors multiple aspects of your database server, including queries. ClusterControl aggregates all the query-related information from all nodes in the group or cluster to provide a centralized view of the workload. This is a great way to understand your cluster as a whole with minimal effort.

In this blog post, we show you how to detect MySQL long running queries using ClusterControl.

Why a Query Takes Longer Time?

First of all, we have to know the nature of the query, whether it is expected to be long running or short running. Some analytic and batch operations are supposed to be long running queries, so we can skip those for now. Also, depending on the table size, modifying the table structure with the ALTER command can be a long running operation.

A short-span transaction should be executed as fast as possible, usually within a fraction of a second. The shorter the better. This comes with a set of query best-practice rules that users have to follow, like using proper indexes in WHERE or JOIN clauses, using the right storage engine, picking proper data types, scheduling batch operations during off-peak hours, offloading analytical/reporting traffic to dedicated replicas, and so on.

There are a number of things that may cause a query to take a longer time to execute:

  • Inefficient query - Using non-indexed columns for lookups or joins, so MySQL takes a longer time to match the condition.
  • Table lock - The table is locked, by global lock or explicit table lock when the query is trying to access it.
  • Deadlock - A query is waiting to access the same rows that are locked by another query.
  • Dataset does not fit into RAM - If your working set does not fit into memory (e.g., the InnoDB buffer pool), queries have to read data from disk, which is much slower.
  • Suboptimal hardware resources - This could be slow disks, RAID rebuilding, saturated network etc.
  • Maintenance operation - Running mysqldump can bring huge amounts of otherwise unused data into the buffer pool, and at the same time the (potentially useful) data that is already there will be evicted and flushed to disk.

The above list emphasizes that it is not only the query itself that causes all sorts of problems. There are plenty of reasons which require looking at different aspects of a MySQL server. In some worst-case scenarios, a long running query could cause a total service disruption like server downtime, server crashes and connections maxing out. If you see a query taking longer than usual to execute, do investigate it.

How to Check?

PROCESSLIST

MySQL provides a number of built-in tools to check for long running transactions. First of all, the SHOW PROCESSLIST or SHOW FULL PROCESSLIST commands can expose the running queries in real-time. Here is a screenshot of the ClusterControl Running Queries feature, similar to the SHOW FULL PROCESSLIST command (but ClusterControl aggregates all the processes into one view for all nodes in the cluster):

As you can see, we can spot the offending query right away in the output. But how often do we stare at those processes? This is only useful if you are aware of the long running transaction. Otherwise, you wouldn't know until something happens - like connections piling up, or the server getting slower than usual.
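
For an ad-hoc check on a single node, the same information can be pulled from information_schema with a filter on execution time. A sketch, where the 30-second threshold is arbitrary:

mysql> SELECT id, user, host, db, time, state, LEFT(info, 80) AS query
       FROM information_schema.processlist
       WHERE command NOT IN ('Sleep', 'Binlog Dump')
       AND time > 30
       ORDER BY time DESC;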

Slow Query Log

The slow query log captures slow queries (SQL statements that take more than long_query_time seconds to execute), or queries that do not use indexes for lookups (log_queries_not_using_indexes). This feature is not enabled by default; to enable it, simply set the following lines and restart the MySQL server:

[mysqld]
slow_query_log=1
long_query_time=0.1
log_queries_not_using_indexes=1

The slow query log can be used to find queries that take a long time to execute and are therefore candidates for optimization. However, examining a long slow query log can be a time-consuming task. There are tools to parse MySQL slow query log files and summarize their contents like mysqldumpslow, pt-query-digest or ClusterControl Top Queries.
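
For example, a quick summary of the 10 slowest query patterns could be generated with either tool (a sketch; the slow log path is an assumption, check the slow_query_log_file variable for the actual location):

$ mysqldumpslow -s t -t 10 /var/lib/mysql/mysql-slow.log
$ pt-query-digest --limit 10 /var/lib/mysql/mysql-slow.log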

ClusterControl Top Queries summarizes the slow query using two methods - MySQL slow query log or Performance Schema:

You can easily see a summary of the normalized statement digests, sorted based on a number of criteria:

  • Host
  • Occurrences
  • Total execution time
  • Maximum execution time
  • Average execution time
  • Standard deviation time

We have covered this feature in great detail in this blog post, How to use the ClusterControl Query Monitor for MySQL, MariaDB and Percona Server.

Performance Schema

Performance Schema is a great tool available for monitoring MySQL Server internals and execution details at a lower level. The following tables in Performance Schema can be used to find slow queries:

  • events_statements_current
  • events_statements_history
  • events_statements_history_long
  • events_statements_summary_by_digest
  • events_statements_summary_by_user_by_event_name
  • events_statements_summary_by_host_by_event_name
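
For example, the statement digests can be ranked by total execution time straight from the summary table (a sketch; timer columns are in picoseconds, hence the division):

mysql> SELECT schema_name, digest_text, count_star,
              ROUND(sum_timer_wait/1000000000000, 2) AS total_time_s,
              ROUND(avg_timer_wait/1000000000000, 4) AS avg_time_s
       FROM performance_schema.events_statements_summary_by_digest
       ORDER BY sum_timer_wait DESC
       LIMIT 10;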

MySQL 5.7.7 and higher includes the sys schema, a set of objects that helps DBAs and developers interpret data collected by the Performance Schema into more easily understandable form. Sys schema objects can be used for typical tuning and diagnosis use cases.

ClusterControl provides advisors, which are mini-programs that you can write using the ClusterControl DSL (similar to JavaScript) to extend ClusterControl's monitoring capabilities to your needs. There are a number of scripts included, based on Performance Schema, that you can use to monitor query performance, like I/O wait, lock wait time and so on. For example, under Manage -> Developer Studio, go to s9s -> mysql -> p_s -> top_tables_by_iowait.js and click the "Compile and Run" button. You should see the output under the Messages tab for the top 10 tables sorted by I/O wait per server:

There are a number of scripts that you can use to understand low-level information on where and why the slowness happens, like top_tables_by_lockwait.js, top_accessed_db_files.js and so on.

ClusterControl - Detecting and alerting upon long running queries

With ClusterControl, you will get additional powerful features that you won't find in the standard MySQL installation. ClusterControl can be configured to proactively monitor the running processes, raise an alarm and send a notification to the user if the long query threshold is exceeded. This can be configured using the Runtime Configuration under Settings:

For versions prior to 1.7.1, the default value for query_monitor_alert_long_running_query is false. We encourage users to enable this by setting it to 1 (true). To make it persistent, add the following lines into /etc/cmon.d/cmon_X.cnf:

query_monitor_alert_long_running_query=1
query_monitor_long_running_query_ms=30000

Any changes made in the Runtime Configuration are applied immediately; no restart is required. You will see something like this under the Alarms section if a query exceeds the 30000ms (30 seconds) threshold:

If you configure the mail recipient settings as "Deliver" for the DbComponent plus CRITICAL severity category (as shown in the following screenshot):

You should get a copy of this alarm in your email. Otherwise, it can be forwarded manually by clicking on the "Send Email" button.

Furthermore, you can filter out any kind of processlist resources that match certain criteria with regular expression (regex). For example, if you want ClusterControl to detect long running query for three MySQL users called 'sbtest', 'myshop' and 'db_user1', the following should do:

Any changes made in the Runtime Configuration are applied immediately; no restart is required.

Additionally, ClusterControl will list out all deadlock transactions together with the InnoDB status when it was happening under Performance -> Transaction Log:

This feature is not enabled by default, because deadlock detection will affect CPU usage on database nodes. To enable it, simply tick the "Enable Transaction Log" checkbox and specify the interval you want. To make it persistent, add the variable with a value in seconds inside /etc/cmon.d/cmon_X.cnf:

db_deadlock_check_interval=30

Similarly, if you want to check out the InnoDB status, simply go to Performance -> InnoDB Status, and choose the MySQL server from the dropdown. For example:

There we go - all the required information is easily retrievable in a couple of clicks.

Summary

Long running transactions could lead to performance degradation, server down, connections maxed out and deadlocks. With ClusterControl, you can detect long running queries directly from the UI, without the need to examine every single MySQL node in the cluster.

Database High Availability for Camunda BPM using MySQL or MariaDB Galera Cluster


Camunda BPM is an open-source workflow and decision automation platform. Camunda BPM ships with tools for creating workflow and decision models, operating deployed models in production, and allowing users to execute workflow tasks assigned to them.

By default, Camunda comes with an embedded database called H2, which works pretty decently within a Java environment with a relatively small memory footprint. However, when it comes to scaling and high availability, there are other database backends that might be more appropriate.

In this blog post, we are going to deploy Camunda BPM 7.10 Community Edition on Linux, with a focus on achieving database high availability. Camunda supports major databases through JDBC drivers, namely Oracle, DB2, MySQL, MariaDB and PostgreSQL. This blog only focuses on MySQL and MariaDB Galera Cluster, with a different implementation for each - one with ProxySQL as the database load balancer, and the other using the JDBC driver to connect to multiple database instances. Take note that this article does not cover high availability for the Camunda application itself.

Prerequisite

Camunda BPM runs on Java. On our CentOS 7 box, we have to install a JDK, and the best option is to use the one from Oracle and skip the OpenJDK packages provided in the repository. On the application server where Camunda should run, download the latest Java SE Development Kit (JDK) from Oracle by sending the acceptance cookie:

$ wget --header "Cookie: oraclelicense=accept-securebackup-cookie" https://download.oracle.com/otn-pub/java/jdk/12+33/312335d836a34c7c8bba9d963e26dc23/jdk-12_linux-x64_bin.rpm

Install it on the host:

$ yum localinstall jdk-12_linux-x64_bin.rpm

Verify with:

$ java --version
java 12 2019-03-19
Java(TM) SE Runtime Environment (build 12+33)
Java HotSpot(TM) 64-Bit Server VM (build 12+33, mixed mode, sharing)

Create a new directory and download Camunda Community for Apache Tomcat from the official download page:

$ mkdir ~/camunda
$ cd ~/camunda
$ wget --content-disposition 'https://camunda.org/release/camunda-bpm/tomcat/7.10/camunda-bpm-tomcat-7.10.0.tar.gz'

Extract it:

$ tar -xzf camunda-bpm-tomcat-7.10.0.tar.gz

There are a number of dependencies we have to configure before starting up the Camunda web application - datastore configuration, database connector and the CLASSPATH environment - and they depend on the chosen database platform. The next sections explain the required steps for MySQL Galera (using Percona XtraDB Cluster) and MariaDB Galera Cluster.

Note that the configurations shown in this blog are based on Apache Tomcat environment. If you are using JBOSS or Wildfly, the datastore configuration will be a bit different. Refer to Camunda documentation for details.

MySQL Galera Cluster (with ProxySQL and Keepalived)

We will use ClusterControl to deploy a MySQL-based Galera cluster with Percona XtraDB Cluster. There are some Galera-related limitations mentioned in the Camunda docs surrounding Galera multi-writer conflict handling and the InnoDB isolation level. In case you are affected by these, the safest way is to use the single-writer approach, which is achievable with ProxySQL hostgroup configuration. To avoid a single point of failure, we will deploy two ProxySQL instances and tie them together with a virtual IP address provided by Keepalived.

The following diagram illustrates our final architecture:

First, deploy a three-node Percona XtraDB Cluster 5.7. Install ClusterControl, generate an SSH key and set up passwordless SSH from the ClusterControl host to all nodes (including the ProxySQL hosts). On the ClusterControl node, do:

$ whoami
root
$ ssh-keygen -t rsa
$ for i in 192.168.0.21 192.168.0.22 192.168.0.23 192.168.0.11 192.168.0.12; do ssh-copy-id $i; done

Before we deploy our cluster, we have to modify the MySQL configuration template file that ClusterControl will use when installing MySQL servers. The template file name is my57.cnf.galera and located under /usr/share/cmon/templates/ on the ClusterControl host. Make sure the following lines exist under [mysqld] section:

[mysqld]
...
transaction-isolation=READ-COMMITTED
wsrep_sync_wait=7
...

Save the file and we are good to go. The above are the requirements as stated in the Camunda docs, especially regarding the supported transaction isolation for Galera. The variable wsrep_sync_wait is set to 7 to perform cluster-wide causality checks for READ (including SELECT, SHOW, and BEGIN or START TRANSACTION), UPDATE, DELETE, INSERT, and REPLACE statements, ensuring that the statement is executed on a fully synced node. Keep in mind that a value other than 0 can result in increased latency.

Go to ClusterControl -> Deploy -> MySQL Galera and specify the following details (if not mentioned, use the default value):

  • SSH User: root
  • SSH Key Path: /root/.ssh/id_rsa
  • Cluster Name: Percona XtraDB Cluster 5.7
  • Vendor: Percona
  • Version: 5.7
  • Admin/Root Password: {specify a password}
  • Add Node: 192.168.0.21 (press Enter), 192.168.0.22 (press Enter), 192.168.0.23 (press Enter)

Make sure you got all the green ticks, indicating ClusterControl is able to connect to the node passwordlessly. Click "Deploy" to start the deployment.

Create the database, MySQL user and password on one of the database nodes:

mysql> CREATE DATABASE camunda;
mysql> CREATE USER camunda@'%' IDENTIFIED BY 'passw0rd';
mysql> GRANT ALL PRIVILEGES ON camunda.* TO camunda@'%';

Or from the ClusterControl interface, you can use Manage -> Schema and Users instead:

Once the cluster is deployed, install ProxySQL by going to ClusterControl -> Manage -> Load Balancer -> ProxySQL -> Deploy ProxySQL and enter the following details:

  • Server Address: 192.168.0.11
  • Administration Password:
  • Monitor Password:
  • DB User: camunda
  • DB Password: passw0rd
  • Are you using implicit transactions?: Yes

Repeat the ProxySQL deployment step for the second ProxySQL instance, changing the Server Address value to 192.168.0.12. The virtual IP address provided by Keepalived requires at least two ProxySQL instances deployed and running. Finally, deploy the virtual IP address by going to ClusterControl -> Manage -> Load Balancer -> Keepalived, pick both ProxySQL nodes and specify the virtual IP address and the network interface for the VIP to listen on:

Our database backend is now complete. Next, import the SQL files into the Galera Cluster as the created MySQL user. On the application server, go to the "sql" directory and import them into one of the Galera nodes (we pick 192.168.0.21):

$ cd ~/camunda/sql/create
$ yum install mysql #install mysql client
$ mysql -ucamunda -p -h192.168.0.21 camunda < mysql_engine_7.10.0.sql
$ mysql -ucamunda -p -h192.168.0.21 camunda < mysql_identity_7.10.0.sql

Camunda does not provide a MySQL connector for Java since its default database is H2. On the application server, download MySQL Connector/J from the MySQL download page and copy the JAR file into the Apache Tomcat bin directory:

$ wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-8.0.15.tar.gz
$ tar -xzf mysql-connector-java-8.0.15.tar.gz
$ cd mysql-connector-java-8.0.15
$ cp mysql-connector-java-8.0.15.jar ~/camunda/server/apache-tomcat-9.0.12/bin/

Then, set the CLASSPATH environment variable to include the database connector. Open setenv.sh using text editor:

$ vim ~/camunda/server/apache-tomcat-9.0.12/bin/setenv.sh

And add the following line:

export CLASSPATH=$CLASSPATH:$CATALINA_HOME/bin/mysql-connector-java-8.0.15.jar

Open ~/camunda/server/apache-tomcat-9.0.12/conf/server.xml and change the lines related to datastore. Specify the virtual IP address as the MySQL host in the connection string, with ProxySQL port 6033:

<Resource name="jdbc/ProcessEngine"
              ...
              driverClassName="com.mysql.jdbc.Driver" 
              defaultTransactionIsolation="READ_COMMITTED"
              url="jdbc:mysql://192.168.0.10:6033/camunda"
              username="camunda"  
              password="passw0rd"
              ...
/>

Finally, we can start the Camunda service by executing start-camunda.sh script:

$ cd ~/camunda
$ ./start-camunda.sh
starting camunda BPM platform on Tomcat Application Server
Using CATALINA_BASE:   ./server/apache-tomcat-9.0.12
Using CATALINA_HOME:   ./server/apache-tomcat-9.0.12
Using CATALINA_TMPDIR: ./server/apache-tomcat-9.0.12/temp
Using JRE_HOME:        /
Using CLASSPATH:       :./server/apache-tomcat-9.0.12/bin/mysql-connector-java-8.0.15.jar:./server/apache-tomcat-9.0.12/bin/bootstrap.jar:./server/apache-tomcat-9.0.12/bin/tomcat-juli.jar
Tomcat started.

Make sure the CLASSPATH shown in the output includes the path to the MySQL Connector/J JAR file. After the initialization completes, you can then access Camunda webapps on port 8080 at http://192.168.0.8:8080/camunda/. The default username is demo with password 'demo':

You can then see the digested capture queries from Nodes -> ProxySQL -> Top Queries, indicating the application is interacting correctly with the Galera Cluster:

There is no read-write splitting configured for ProxySQL. Camunda uses "SET autocommit=0" on every SQL statement to initialize a transaction, and the best way for ProxySQL to handle this is by sending all queries to the same backend server of the target hostgroup. This is the safest method and also provides better availability. However, all connections might end up reaching a single server, so there is no load balancing.

MariaDB Galera

MariaDB Connector/J is able to handle a variety of connection modes - failover, sequential, replication and aurora - but Camunda only supports failover and sequential. Taken from MariaDB Connector/J documentation:

sequential (available since 1.3.0)
This mode supports connection failover in a multi-master environment, such as MariaDB Galera Cluster. This mode does not support load-balancing reads on slaves. The connector will try to connect to hosts in the order in which they were declared in the connection URL, so the first available host is used for all queries. For example, let's say that the connection URL is the following:

jdbc:mariadb:sequential:host1,host2,host3/testdb

When the connector tries to connect, it will always try host1 first. If that host is not available, then it will try host2, etc. When a host fails, the connector will try to reconnect to hosts in the same order.

failover (available since 1.2.0)
This mode supports connection failover in a multi-master environment, such as MariaDB Galera Cluster. This mode does not support load-balancing reads on slaves. The connector performs load-balancing for all queries by randomly picking a host from the connection URL for each connection, so queries will be load-balanced as a result of the connections getting randomly distributed across all hosts.

Using "failover" mode poses a higher potential risk of deadlock, since writes will be distributed to all backend servers almost equally. Single-writer approach is a safe way to run, which means using sequential mode should do the job pretty well. You also can skip the load-balancer tier in the architecture. Hence with MariaDB Java connector, we can deploy our architecture as simple as below:

Before we deploy our cluster, modify the MariaDB configuration template file that ClusterControl will use when installing MariaDB servers. The template file name is my.cnf.galera and located under /usr/share/cmon/templates/ on ClusterControl host. Make sure the following lines exist under [mysqld] section:

[mysqld]
...
transaction-isolation=READ-COMMITTED
wsrep_sync_wait=7
performance_schema = ON
...

Save the file and we are good to go. A bit of explanation: the above are the requirements as stated in the Camunda docs, especially regarding the supported transaction isolation for Galera. The variable wsrep_sync_wait is set to 7 to perform cluster-wide causality checks for READ (including SELECT, SHOW, and BEGIN or START TRANSACTION), UPDATE, DELETE, INSERT, and REPLACE statements, ensuring that the statement is executed on a fully synced node. Keep in mind that a value other than 0 can result in increased latency. Enabling Performance Schema is optional, for the ClusterControl query monitoring feature.

Now we can start the cluster deployment process. Install ClusterControl, generate an SSH key and set up passwordless SSH from the ClusterControl host to all Galera nodes. On the ClusterControl node, do:

$ whoami
root
$ ssh-keygen -t rsa
$ for i in 192.168.0.41 192.168.0.42 192.168.0.43; do ssh-copy-id $i; done

Go to ClusterControl -> Deploy -> MySQL Galera and specify the following details (if not mentioned, use the default value):

  • SSH User: root
  • SSH Key Path: /root/.ssh/id_rsa
  • Cluster Name: MariaDB Galera 10.3
  • Vendor: MariaDB
  • Version: 10.3
  • Admin/Root Password: {specify a password}
  • Add Node: 192.168.0.41 (press Enter), 192.168.0.42 (press Enter), 192.168.0.43 (press Enter)

Make sure you got all the green ticks when adding nodes, indicating ClusterControl is able to connect to the node passwordlessly. Click "Deploy" to start the deployment.

Create the database, MariaDB user and password on one of the Galera nodes:

mysql> CREATE DATABASE camunda;
mysql> CREATE USER camunda@'%' IDENTIFIED BY 'passw0rd';
mysql> GRANT ALL PRIVILEGES ON camunda.* TO camunda@'%';

For ClusterControl user, you can use ClusterControl -> Manage -> Schema and Users instead:

Our database cluster deployment is now complete. Next, import the SQL files into the MariaDB cluster. On the application server, go to the "sql" directory and import them into one of the MariaDB nodes (we chose 192.168.0.41):

$ cd ~/camunda/sql/create
$ yum install mysql #install mariadb client
$ mysql -ucamunda -p -h192.168.0.41 camunda < mariadb_engine_7.10.0.sql
$ mysql -ucamunda -p -h192.168.0.41 camunda < mariadb_identity_7.10.0.sql

Camunda does not provide a MariaDB connector for Java since its default database is H2. On the application server, download MariaDB Connector/J from the MariaDB download page and copy the JAR file into the Apache Tomcat bin directory:

$ wget https://downloads.mariadb.com/Connectors/java/connector-java-2.4.1/mariadb-java-client-2.4.1.jar
$ cp mariadb-java-client-2.4.1.jar ~/camunda/server/apache-tomcat-9.0.12/bin/

Then, set the CLASSPATH environment variable to include the database connector. Open setenv.sh via text editor:

$ vim ~/camunda/server/apache-tomcat-9.0.12/bin/setenv.sh

And add the following line:

export CLASSPATH=$CLASSPATH:$CATALINA_HOME/bin/mariadb-java-client-2.4.1.jar

Open ~/camunda/server/apache-tomcat-9.0.12/conf/server.xml and change the lines related to datastore. Use the sequential connection protocol and list out all the Galera nodes separated by comma in the connection string:

<Resource name="jdbc/ProcessEngine"
              ...
              driverClassName="org.mariadb.jdbc.Driver" 
              defaultTransactionIsolation="READ_COMMITTED"
              url="jdbc:mariadb:sequential://192.168.0.41:3306,192.168.0.42:3306,192.168.0.43:3306/camunda"
              username="camunda"  
              password="passw0rd"
              ...
/>

Finally, we can start the Camunda service by executing start-camunda.sh script:

$ cd ~/camunda
$ ./start-camunda.sh
starting camunda BPM platform on Tomcat Application Server
Using CATALINA_BASE:   ./server/apache-tomcat-9.0.12
Using CATALINA_HOME:   ./server/apache-tomcat-9.0.12
Using CATALINA_TMPDIR: ./server/apache-tomcat-9.0.12/temp
Using JRE_HOME:        /
Using CLASSPATH:       :./server/apache-tomcat-9.0.12/bin/mariadb-java-client-2.4.1.jar:./server/apache-tomcat-9.0.12/bin/bootstrap.jar:./server/apache-tomcat-9.0.12/bin/tomcat-juli.jar
Tomcat started.

Make sure the CLASSPATH shown in the output includes the path to the MariaDB Java client JAR file. After the initialization completes, you can then access Camunda webapps on port 8080 at http://192.168.0.8:8080/camunda/. The default username is demo with password 'demo':

You can see the digested capture queries from ClusterControl -> Query Monitor -> Top Queries, indicating the application is interacting correctly with the MariaDB Cluster:

With MariaDB Connector/J, we do not need a load balancer tier, which simplifies our overall architecture. The sequential connection mode should do the trick to avoid the multi-writer deadlocks that can happen in Galera. This setup provides high availability, with each Camunda instance configured with JDBC to access the cluster of MySQL or MariaDB nodes. Galera takes care of synchronizing the data between the database instances in real time.

How to Deploy Open Source Databases - New Whitepaper


We’re happy to announce that our new whitepaper How to Deploy Open Source Databases is now available to download for free!

Choosing which DB engine to use among all the options we have today is not an easy task. And that is just the beginning. After deciding which engine to use, you need to learn about it and actually deploy it to play with it. We plan to help you with that second step, and show you how to install, configure and secure some of the most popular open source DB engines.

In this whitepaper we are going to explore the top open source databases and how to deploy each technology using proven methodologies that are battle-tested.

Topics included in this whitepaper are …

  • An Overview of Popular Open Source Databases
    • Percona
    • MariaDB
    • Oracle MySQL
    • MongoDB
    • PostgreSQL
  • How to Deploy Open Source Databases
    • Percona Server for MySQL
    • Oracle MySQL Community Server
      • Group Replication
    • MariaDB
      • MariaDB Cluster Configuration
    • Percona XtraDB Cluster
    • NDB Cluster
    • MongoDB
    • Percona Server for MongoDB
    • PostgreSQL
  • How to Deploy Open Source Databases by Using ClusterControl
    • Deploy
    • Scaling
    • Load Balancing
    • Management   

Download the whitepaper today!

ClusterControl
Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl

About ClusterControl

ClusterControl is the all-inclusive open source database management system for users with mixed environments that removes the need for multiple management tools. ClusterControl provides advanced deployment, management, monitoring, and scaling functionality to get your MySQL, MongoDB, and PostgreSQL databases up-and-running using proven methodologies that you can depend on to work. At the core of ClusterControl is its automation functionality that lets you automate many of the database tasks you have to perform regularly, like deploying new databases, adding and scaling new nodes, running backups and upgrades, and more.

To learn more about ClusterControl click here.

About Severalnines

Severalnines provides automation and management software for database clusters. We help companies deploy their databases in any environment, and manage all operational aspects to achieve high-scale availability.

Severalnines' products are used by developers and administrators of all skill levels to provide the full 'deploy, manage, monitor, scale' database cycle, thus freeing them from the complexity and learning curves that are typically associated with highly available database clusters. Severalnines is often called the “anti-startup” as it is entirely self-funded by its founders. The company has enabled over 32,000 deployments to date via its popular product ClusterControl. Currently counting BT, Orange, Cisco, CNRS, Technicolor, AVG, Ping Identity and Paytrail as customers. Severalnines is a private company headquartered in Stockholm, Sweden with offices in Singapore, Japan and the United States. To see who is using Severalnines today visit, https://www.severalnines.com/company.

How to Perform a Failback Operation for MySQL Replication Setup


MySQL master-slave replication is pretty easy and straightforward to set up. This is the main reason why people choose this technology as the first step to achieve better database availability. However, it comes at the price of complexity in management and maintenance; it is up to the admin to maintain the data integrity, especially during failover, failback, maintenance, upgrade and so on.

There are many articles out there describing how to perform a failover operation for a replication setup. We have also covered this topic in this blog post, Introduction to Failover for MySQL Replication - the 101 Blog. In this blog post, we are going to cover the post-disaster tasks of restoring the original topology - performing a failback operation.

Why Do We Need Failback?

The replication leader (master) is the most critical node in a replication setup. It requires good hardware specs to ensure it can process writes, generate replication events, process critical reads and so on in a stable way. When failover is required during disaster recovery or maintenance, it is not uncommon to end up promoting a new leader with inferior hardware. This situation might be okay temporarily, but in the long run, the designated master must be brought back to lead the replication once it is deemed healthy.

Contrary to failover, a failback operation usually happens in a controlled environment through switchover; it rarely happens in panic mode. This gives the operation team some time to plan carefully and rehearse the exercise for a smooth transition. The main objective is simply to bring back the good old master to the latest state and restore the replication setup to its original topology. However, there are some cases where failback is critical, for example when the newly promoted master does not work as expected and affects the overall database service.

How to Perform Failback Safely?

After a failover has happened, the old master is out of the replication chain for maintenance or recovery. To perform the switchover, one must do the following:

  1. Provision the old master to the correct state, by making it the most up-to-date slave.
  2. Stop the application.
  3. Verify all slaves are caught up.
  4. Promote the old master as the new leader.
  5. Repoint all slaves to the new master.
  6. Start up the application by writing to the new master.

Consider the following replication setup:

"A" was a master until a disk-full event causing havoc to the replication chain. After a failover event, our replication topology was lead by B and replicates onto C till E. The failback exercise will bring back A as the leader and restore the original topology before the disaster. Take note that all nodes are running on MySQL 8.0.15 with GTID enabled. Different major version might use different commands and steps.

While this is what our architecture looks like now after failover (taken from ClusterControl's Topology view):

Node Provisioning

Before A can become a master, it must be brought up-to-date with the current database state. The best way to do this is to turn A into a slave of the active master, B. Since all nodes are configured with log_slave_updates=ON (meaning a slave also produces binary logs), we could actually pick other slaves like C or D as the source of truth for the initial syncing. However, the closer to the active master, the better. Keep in mind the additional load the backup might cause on the chosen node. This part takes up most of the failback time. Depending on the node state and dataset size, syncing up the old master could take a while (hours, or even days).

Once problem on "A" is resolved and ready to join the replication chain, the best first step is to attempt replicating from "B" (192.168.0.42) with CHANGE MASTER statement:

mysql> SET GLOBAL read_only = 1; /* enable read-only */
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.42', MASTER_USER = 'rpl_user', MASTER_PASSWORD = 'p4ss', MASTER_AUTO_POSITION = 1; /* master information to connect */
mysql> START SLAVE; /* start replication */
mysql> SHOW SLAVE STATUS\G /* check replication status */

If replication works, you should see the following in the replication status:

             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

If the replication fails, look at the Last_IO_Error or Last_SQL_Error from slave status output. For example, if you see the following error:

Last_IO_Error: error connecting to master 'rpl_user@192.168.0.42:3306' - retry-time: 60  retries: 2

Then, we have to create the replication user on the current active master, B:

mysql> CREATE USER rpl_user@192.168.0.41 IDENTIFIED BY 'p4ss';
mysql> GRANT REPLICATION SLAVE ON *.* TO rpl_user@192.168.0.41;

Then, restart the slave on A to start replicating again:

mysql> STOP SLAVE;
mysql> START SLAVE;

Another common error you might see is this line:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: ...

That probably means the slave is having a problem reading the binary log file from the current master. On some occasions, the slave might be so far behind that the binary log events required to start the replication are missing from the current master, or the binary logs on the master were purged during the failover and so on. In this case, the best way is to perform a full sync by taking a full backup on B and restoring it on A. On B, you can use either mysqldump or Percona Xtrabackup to take a full backup:

$ mysqldump -uroot -p --all-databases --single-transaction --triggers --routines > dump.sql # for mysqldump
$ xtrabackup --defaults-file=/etc/my.cnf --backup --parallel 1 --stream=xbstream --no-timestamp | gzip -6 - > backup-full-2019-04-16_071649.xbstream.gz # for xtrabackup

Transfer the backup file to A, reinitialize the existing MySQL installation for a proper cleanup and perform database restoration:

$ systemctl stop mysqld # if mysql is still running
$ rm -Rf /var/lib/mysql # wipe out old data
$ mysqld --initialize --user=mysql # initialize database
$ systemctl start mysqld # start mysql
$ grep -i 'temporary password' /var/log/mysql/mysqld.log # retrieve the temporary root password
$ mysql -uroot -p -e 'ALTER USER root@localhost IDENTIFIED BY "p455word"' # mandatory root password update
$ mysql -uroot -p < dump.sql # restore the backup using the new root password

Once restored, setup the replication link to the active master B (192.168.0.42) and enable read-only. On A, run the following statements:

mysql> SET GLOBAL read_only = 1; /* enable read-only */
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.42', MASTER_USER = 'rpl_user', MASTER_PASSWORD = 'p4ss', MASTER_AUTO_POSITION = 1; /* master information to connect */
mysql> START SLAVE; /* start replication */
mysql> SHOW SLAVE STATUS\G /* check replication status */

For Percona Xtrabackup, please refer to the documentation page on how to restore to A. It involves a prerequisite step to prepare the backup first before replacing the MySQL data directory.

Once A has started replicating correctly, monitor Seconds_Behind_Master in the slave status. This will give you an idea of how far behind the slave is and how long you need to wait before it catches up. At this point, our architecture looks like this:

Once Seconds_Behind_Master falls back to 0, that's the moment when A has caught up as an up-to-date slave.
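If you prefer to watch this from the shell rather than from ClusterControl, a simple polling loop does the trick. This is a minimal sketch; the credentials are placeholders and the polling interval is arbitrary:

while true; do
  lag=$(mysql -uroot -p'p455word' -e 'SHOW SLAVE STATUS\G' | awk '/Seconds_Behind_Master/ {print $2}')
  echo "$(date) Seconds_Behind_Master: ${lag}"
  [ "${lag}" = "0" ] && break   # stop polling once A has caught up
  sleep 10
done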

If you are using ClusterControl, you have the option to resync the node by restoring from an existing backup or create and stream the backup directly from the active master node:

Staging the slave from an existing backup is the recommended way to rebuild the slave, since it doesn't impact the active master server while preparing the node.

Promote the Old Master

Before promoting A as the new master, the safest way is to stop all write operations on B. If this is not possible, simply force B to operate in read-only mode:

mysql> SET GLOBAL read_only = 'ON';
mysql> SET GLOBAL super_read_only = 'ON';

Then, on A, run SHOW SLAVE STATUS and check the following replication status:

Read_Master_Log_Pos: 45889974
Exec_Master_Log_Pos: 45889974
Seconds_Behind_Master: 0
Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

The values of Read_Master_Log_Pos and Exec_Master_Log_Pos must be identical, while Seconds_Behind_Master is 0 and the state must be 'Slave has read all relay log'. Make sure that all slaves have processed any statements in their relay log, otherwise you risk that new queries will conflict with transactions from the relay log, triggering all sorts of problems (for example, an application may remove some rows which are accessed by transactions from the relay log).

On A, stop the replication and use the RESET SLAVE ALL statement to remove all replication-related configuration and disable read-only mode:

mysql> STOP SLAVE;
mysql> RESET SLAVE ALL;
mysql> SET GLOBAL read_only = 'OFF';
mysql> SET GLOBAL super_read_only = 'OFF';

At this point, A is ready to accept writes (read_only=OFF), however slaves are not connected to it, as illustrated below:

For ClusterControl users, promoting A can be done by using the "Promote Slave" feature under Node Actions. ClusterControl will automatically demote the active master B, promote slave A as master and repoint C and D to replicate from A. B will be put aside, and the user has to explicitly choose "Change Replication Master" to rejoin B, replicating from A, at a later stage.


Slave Repointing

It's now safe to change the master on related slaves to replicate from A (192.168.0.41). On all slaves except E, configure the following:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO MASTER_HOST = '192.168.0.41', MASTER_USER = 'rpl_user', MASTER_PASSWORD = 'p4ss', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

If you are a ClusterControl user, you may skip this step, as repointing is performed automatically when you promote A as described previously.

We can then start our application to write on A. At this point, our architecture is looking something like this:

From ClusterControl topology view, we have restored our replication cluster to its original architecture which looks like this:

Take note that the failback exercise is much less risky compared to failover. Still, it's important to schedule this exercise during off-peak hours to minimize the impact on your business.

Final Thoughts

Failover and failback operations must be performed carefully. The operation is fairly simple if you have a small number of nodes, but for multiple nodes with a complex replication chain, it can be a risky and error-prone exercise. We also showed how ClusterControl can be used to simplify complex operations by performing them through the UI, plus the topology view is visualized in real-time so you understand the replication topology you are building.


Benchmarking Manual Database Deployments vs Automated Deployments


There are multiple ways of deploying a database. You can install it by hand, or you can rely on widely available infrastructure orchestration tools like Ansible, Chef, Puppet or Salt. Those tools are very popular and it is quite easy to find scripts, recipes, playbooks, you name it, which will help you automate the installation of a database cluster. There are also more specialized database automation platforms, like ClusterControl, which can also be used to automate deployment. What would be the best way of deploying your cluster? How much time will you actually need to deploy it?

First, let us clarify what we want to do. Let's assume we will be deploying Percona XtraDB Cluster 5.7. It will consist of three nodes, and for that we will use three Vagrant virtual machines running Ubuntu 16.04 (bento/ubuntu-16.04 image). We will attempt to deploy a cluster manually, then using Ansible, and then using ClusterControl. Let's see what the results look like.

Manual Deployment

Repository Setup - 1 minute, 45 seconds.

First of all, we have to configure the Percona repositories on all Ubuntu nodes. A quick Google search, SSH-ing into the virtual machines and running the required commands took 1 minute and 45 seconds.

We found the following page with instructions:
https://www.percona.com/doc/percona-repo-config/percona-release.html

and we executed the steps described in the “DEB-BASED GNU/LINUX DISTRIBUTIONS” section. We also ran apt update to refresh apt's cache.

Installing PXC Nodes - 2 minutes 45 seconds

This step basically consists of executing:

root@vagrant:~# apt install percona-xtradb-cluster-5.7

The rest is mostly dependent on your internet connection speed, as packages are being downloaded. Your input will also be needed (you'll be asked for a password for the superuser), so it is not an unattended installation. When everything is done, you will end up with three running Percona XtraDB Cluster nodes:

root     15488  0.0  0.2   4504  1788 ?        S    10:12   0:00 /bin/sh /usr/bin/mysqld_safe
mysql    15847  0.3 28.3 1339576 215084 ?      Sl   10:12   0:00  \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep-provider=/usr/lib/galera3/libgalera_smm.so --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1

Configuring PXC nodes - 3 minutes, 25 seconds

Here starts the tricky part. It is really hard to quantify experience and how much time one needs to actually understand what has to be done. The good news is that a Google search for "how to install percona xtradb cluster" points to Percona's documentation, which describes what the process should look like. It may still take more or less time, depending on how familiar you are with PXC and Galera in general. In the worst case scenario, you will not be aware of any additional required actions, connect to your PXC and start working with it, not realizing that you in fact have three nodes, each forming a cluster of its own.

Let's assume we follow the recommendations from Percona and time just the steps to be executed. In short, we modified the configuration files as per the instructions on the Percona website and also attempted to bootstrap the first node:

root@vagrant:~# /etc/init.d/mysql bootstrap-pxc
mysqld: [ERROR] Found option without preceding group in config file /etc/mysql/my.cnf at line 10!
mysqld: [ERROR] Fatal error in defaults handling. Program aborted!
mysqld: [ERROR] Found option without preceding group in config file /etc/mysql/my.cnf at line 10!
mysqld: [ERROR] Fatal error in defaults handling. Program aborted!
mysqld: [ERROR] Found option without preceding group in config file /etc/mysql/my.cnf at line 10!
mysqld: [ERROR] Fatal error in defaults handling. Program aborted!
mysqld: [ERROR] Found option without preceding group in config file /etc/mysql/my.cnf at line 10!
mysqld: [ERROR] Fatal error in defaults handling. Program aborted!
 * Bootstrapping Percona XtraDB Cluster database server mysqld                                                                                                                                                                                                                     ^C

This did not look correct. Unfortunately, the instructions weren't crystal clear. Again, if you don't know what is going on, you will spend more time trying to understand what happened. Luckily, stackoverflow.com comes in very helpful (although not via the first result on the list) and you should realize that you are missing the [mysqld] section header in your /etc/mysql/my.cnf file. Adding this on all nodes and repeating the bootstrap process solved the issue. In total we spent 3 minutes and 25 seconds (not including googling for the error, as we noticed immediately what the problem was).
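For reference, after the fix the relevant part of /etc/mysql/my.cnf on each node looked roughly like the snippet below. This is a minimal sketch with placeholder IPs, cluster name and SST credentials based on typical PXC 5.7 settings, not our exact configuration:

[mysqld]
# Galera provider and cluster membership
wsrep_provider=/usr/lib/galera3/libgalera_smm.so
wsrep_cluster_name=pxc-cluster
wsrep_cluster_address=gcomm://192.168.10.11,192.168.10.12,192.168.10.13
wsrep_node_address=192.168.10.11
# State transfer method and credentials used for SST
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sstuser:s3cretPass
# Settings required by Galera
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2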

Configuring for SST, Bringing Other Nodes Into the Cluster - Starting From 8 Minutes to Infinity

The instructions on the Percona website are quite clear. Once you have one node up and running, just start the remaining nodes and you will be fine. We tried that but were unable to see more nodes joining the cluster. This is where it is virtually impossible to tell how long diagnosing the issue will take. It took us 6-7 minutes, but to do it quickly you have to:

  1. Be familiar with how PXC configuration is structured:
    root@vagrant:~# tree  /etc/mysql/
    /etc/mysql/
    ├── conf.d
    │   ├── mysql.cnf
    │   └── mysqldump.cnf
    ├── my.cnf -> /etc/alternatives/my.cnf
    ├── my.cnf.fallback
    ├── my.cnf.old
    ├── percona-xtradb-cluster.cnf
    └── percona-xtradb-cluster.conf.d
        ├── client.cnf
        ├── mysqld.cnf
        ├── mysqld_safe.cnf
        └── wsrep.cnf
  2. Know how the !include and !includedir directives work in MySQL configuration files
  3. Know how MySQL handles the same variables included in multiple files
  4. Know what to look for and be aware of configurations that would result in node bootstrapping itself to form a cluster on its own

The problem was related to the fact that the instructions did not mention any file except /etc/mysql/my.cnf when, in fact, we should have been modifying /etc/mysql/percona-xtradb-cluster.conf.d/wsrep.cnf. That file contained an empty variable:

wsrep_cluster_address=gcomm://

and such a configuration forces the node to bootstrap, as it has no information about other nodes to join. We set that variable in /etc/mysql/my.cnf, but the wsrep.cnf file was included later, overwriting our setting.
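In other words, the fix was to set the cluster address in the file that wins, /etc/mysql/percona-xtradb-cluster.conf.d/wsrep.cnf, roughly like this (the IP addresses are placeholders for illustration):

[mysqld]
# list every node of the cluster so a restarted node joins instead of bootstrapping
wsrep_cluster_address=gcomm://192.168.10.11,192.168.10.12,192.168.10.13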

This issue might be a serious blocker for people who are not really familiar with how MySQL and Galera work, resulting in hours if not more of debugging.

Total Installation Time - 16 minutes (If You Are a MySQL DBA Like I Am)

We managed to install Percona XtraDB Cluster in 16 minutes. You have to keep in mind a couple of things - we did not tune the configuration. That is something which will require more time and knowledge. A PXC node comes with some simple configuration, related mostly to binary logging and Galera writeset replication. There is no InnoDB tuning. If you are not familiar with MySQL internals, this means hours if not days of reading and familiarizing yourself with internal mechanisms. Another important thing is that this is a process you would have to re-apply for every cluster you deploy. Finally, we managed to identify the issue and solve it very fast due to our experience with Percona XtraDB Cluster and MySQL in general. A casual user will most likely spend significantly more time trying to understand what is going on and why.

Ansible Playbook

Now, on to automation with Ansible. Let's try to find and use an Ansible playbook which we could reuse for all further deployments. Let's see how long it will take.

Configuring SSH Connectivity - 1 minute

Ansible requires SSH connectivity across all the nodes to connect and configure them. We generated an SSH key and manually distributed it across the nodes.

Finding Ansible Playbook - 2 minutes 15 seconds

The main issue here is that there are so many playbooks available out there that it is impossible to decide which is best. As such, we decided to go with the top 3 Google results and pick one of them. We decided on https://github.com/cdelgehier/ansible-role-XtraDB-Cluster as it seems to be more configurable than the remaining ones.

Cloning Repository and Installing Ansible - 30 seconds

This was quick; all we needed to do was:

apt install ansible git
git clone https://github.com/cdelgehier/ansible-role-XtraDB-Cluster.git

Preparing Inventory File - 1 minute 10 seconds

This step was also very simple. We created an inventory file using the example from the documentation; we just substituted the IP addresses of the nodes with what we had configured in our environment.
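For illustration, a hypothetical inventory along those lines is shown below; the group name must match whatever the role's documentation expects, and the IP addresses are placeholders:

[pxc]
node1 ansible_host=192.168.10.11
node2 ansible_host=192.168.10.12
node3 ansible_host=192.168.10.13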

Preparing a Playbook - 1 minute 45 seconds

We decided to use the most extensive example from the documentation, which also includes a bit of configuration tuning. We prepared a correct directory structure for Ansible (such information was missing from the documentation):

/root/pxcansible/
├── inventory
├── pxcplay.yml
└── roles
    └── ansible-role-XtraDB-Cluster
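The pxcplay.yml itself was little more than a wrapper around the role. A hypothetical minimal version (without the tuning variables we copied from the role's README) might look like this:

---
# apply the PXC role to all hosts in the "pxc" inventory group
- hosts: pxc
  become: yes
  roles:
    - ansible-role-XtraDB-Cluster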

Then we ran it, but we immediately got an error:

root@vagrant:~/pxcansible# ansible-playbook pxcplay.yml
 [WARNING]: provided hosts list is empty, only localhost is available

ERROR! no action detected in task

The error appears to have been in '/root/pxcansible/roles/ansible-role-XtraDB-Cluster/tasks/main.yml': line 28, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


- name: "Include {{ ansible_distribution }} tasks"
  ^ here
We could be wrong, but this one looks like it might be an issue with
missing quotes.  Always quote template expression brackets when they
start a value. For instance:

    with_items:
      - {{ foo }}

Should be written as:

    with_items:
      - "{{ foo }}"

This took 1 minute and 45 seconds.

Fixing the Playbook Syntax Issue - 3 minutes 25 seconds

The error was misleading, but a general rule of thumb is to try a more recent Ansible version, which we did. We googled it and found good instructions on the Ansible website. The next attempt to run the playbook also failed:

TASK [ansible-role-XtraDB-Cluster : Delete anonymous connections] *****************************************************************************************************************************************************************************************************************
fatal: [node2]: FAILED! => {"changed": false, "msg": "The PyMySQL (Python 2.7 and Python 3.X) or MySQL-python (Python 2.X) module is required."}
fatal: [node3]: FAILED! => {"changed": false, "msg": "The PyMySQL (Python 2.7 and Python 3.X) or MySQL-python (Python 2.X) module is required."}
fatal: [node1]: FAILED! => {"changed": false, "msg": "The PyMySQL (Python 2.7 and Python 3.X) or MySQL-python (Python 2.X) module is required."}

Setting up new Ansible version and running the playbook up to this error took 3 minutes and 25 seconds.

Fixing the Missing Python Module - 3 minutes 20 seconds

Apparently, the role we used did not take care of its prerequisites, and a Python module required for connecting to and securing the Galera cluster was missing. We first tried to install MySQL-python via pip, but it became apparent that it would take more time as it required mysql_config:

root@vagrant:~# pip install MySQL-python
Collecting MySQL-python
  Downloading https://files.pythonhosted.org/packages/a5/e9/51b544da85a36a68debe7a7091f068d802fc515a3a202652828c73453cad/MySQL-python-1.2.5.zip (108kB)
    100% |████████████████████████████████| 112kB 278kB/s
    Complete output from command python setup.py egg_info:
    sh: 1: mysql_config: not found
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-zzwUtq/MySQL-python/setup.py", line 17, in <module>
        metadata, options = get_config()
      File "/tmp/pip-build-zzwUtq/MySQL-python/setup_posix.py", line 43, in get_config
        libs = mysql_config("libs_r")
      File "/tmp/pip-build-zzwUtq/MySQL-python/setup_posix.py", line 25, in mysql_config
        raise EnvironmentError("%s not found" % (mysql_config.path,))
    EnvironmentError: mysql_config not found

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-zzwUtq/MySQL-python/

That is provided by the MySQL development libraries, so we would have to install them manually, which was pretty much pointless. We decided to go with PyMySQL instead, which did not require any other packages to be installed.
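For reference, installing the module is a one-liner on each node (assuming pip is already present):

$ pip install PyMySQL

This brought us to another issue: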

TASK [ansible-role-XtraDB-Cluster : Delete anonymous connections] *****************************************************************************************************************************************************************************************************************
fatal: [node3]: FAILED! => {"changed": false, "msg": "unable to connect to database, check login_user and login_password are correct or /root/.my.cnf has the credentials. Exception message: (1698, u\"Access denied for user 'root'@'localhost'\")"}
fatal: [node2]: FAILED! => {"changed": false, "msg": "unable to connect to database, check login_user and login_password are correct or /root/.my.cnf has the credentials. Exception message: (1698, u\"Access denied for user 'root'@'localhost'\")"}
fatal: [node1]: FAILED! => {"changed": false, "msg": "unable to connect to database, check login_user and login_password are correct or /root/.my.cnf has the credentials. Exception message: (1698, u\"Access denied for user 'root'@'localhost'\")"}
    to retry, use: --limit @/root/pxcansible/pxcplay.retry

Up to this point we spent 3 minutes and 20 seconds.

Fixing “Access Denied” Error - 18 minutes 55 seconds

As per the error, we ensured that the MySQL config was prepared correctly and included the correct user and password to connect to the database. This, unfortunately, did not work as expected. We investigated further and found that the role did not create the root user properly, even though it marked the step as completed. We did a short investigation but decided to apply a manual fix instead of trying to debug the playbook, which would have taken way more time than the steps we performed. We simply created the users root@127.0.0.1 and root@localhost manually, with the correct passwords.
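The manual fix boiled down to a few statements along these lines on each node (the password is a placeholder; treat this as a sketch rather than our exact commands):

mysql> CREATE USER IF NOT EXISTS 'root'@'127.0.0.1' IDENTIFIED BY 'SuperS3cretPass';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'127.0.0.1' WITH GRANT OPTION;
mysql> ALTER USER 'root'@'localhost' IDENTIFIED BY 'SuperS3cretPass';
mysql> FLUSH PRIVILEGES;

This allowed us to pass this step and move on to another error: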

TASK [ansible-role-XtraDB-Cluster : Start the master node] ************************************************************************************************************************************************************************************************************************
skipping: [node1]
skipping: [node2]
skipping: [node3]

TASK [ansible-role-XtraDB-Cluster : Start the master node] ************************************************************************************************************************************************************************************************************************
skipping: [node1]
skipping: [node2]
skipping: [node3]

TASK [ansible-role-XtraDB-Cluster : Create SST user] ******************************************************************************************************************************************************************************************************************************
skipping: [node1]
skipping: [node2]
skipping: [node3]

TASK [ansible-role-XtraDB-Cluster : Start the slave nodes] ************************************************************************************************************************************************************************************************************************
fatal: [node3]: FAILED! => {"changed": false, "msg": "Unable to start service mysql: Job for mysql.service failed because the control process exited with error code. See \"systemctl status mysql.service\" and \"journalctl -xe\" for details.\n"}
fatal: [node2]: FAILED! => {"changed": false, "msg": "Unable to start service mysql: Job for mysql.service failed because the control process exited with error code. See \"systemctl status mysql.service\" and \"journalctl -xe\" for details.\n"}
fatal: [node1]: FAILED! => {"changed": false, "msg": "Unable to start service mysql: Job for mysql.service failed because the control process exited with error code. See \"systemctl status mysql.service\" and \"journalctl -xe\" for details.\n"}
    to retry, use: --limit @/root/pxcansible/pxcplay.retry

For this section we spent 18 minutes and 55 seconds.

Fixing “Start the Slave Nodes” Issue (part 1) - 7 minutes 40 seconds

We tried a couple of things to solve this problem. We tried to specify the node using its name, we tried to switch group names, but nothing solved the issue. We then decided to clean up the environment using the script provided in the documentation and start from scratch. It did not clean things up but just made them even worse. After 7 minutes and 40 seconds we decided to wipe out the virtual machines, recreate the environment and start from scratch, hoping that adding the Python dependencies upfront would solve our issue.

Fixing “Start the Slave Nodes” Issue (part 2) - 13 minutes 15 seconds

Unfortunately, setting up the Python prerequisites did not help at all. We decided to finish the process manually, bootstrapping the first node and then configuring the SST user and starting the remaining slaves. This completed the "automated" setup, and it took us 13 minutes and 15 seconds to debug and then finally accept that it would not work the way the playbook designer expected.

Further Debugging - 10 minutes 45 seconds

We did not stop there and decided to try one more thing. Instead of relying on Ansible variables, we just put the IP of one of the nodes as the master node. This solved that part of the problem and we ended up with:

TASK [ansible-role-XtraDB-Cluster : Create SST user] ******************************************************************************************************************************************************************************************************************************
skipping: [node2]
skipping: [node3]
fatal: [node1]: FAILED! => {"changed": false, "msg": "unable to connect to database, check login_user and login_password are correct or /root/.my.cnf has the credentials. Exception message: (1045, u\"Access denied for user 'root'@'::1' (using password: YES)\")"}

This was the end of our attempts - we tried to add this user, but it did not work correctly through the Ansible playbook, even though we could connect using the IPv6 localhost address with the MySQL client.

Total Installation Time - Unknown (Automated Installation Failed)

In total we spent 64 minutes and we still hadn't managed to get things going automatically. The remaining problems are root password creation, which doesn't seem to work, and getting the Galera Cluster started (the SST user issue). It is hard to tell how long it would take to debug further. It is surely possible - it is just hard to quantify because it really depends on your experience with Ansible and MySQL. It is definitely not something anyone can just download, configure and run. Well, maybe another playbook would have worked differently? It is possible, but it may as well result in different issues.

Ok, so there is a learning curve to climb and debugging to do, but then, once you are all set, you just run a script. Well, that's only sort of true - as long as changes introduced by the maintainer don't break something you depend on, a new Ansible version doesn't break the playbook, and the maintainer doesn't simply forget about the project and stop developing it (for the role that we used, there is a quite useful pull request which has been waiting for almost a year and which might solve the Python dependency issue - it has not been merged). Unless you accept that you will have to maintain this code, you cannot really rely on it being 100% accurate and working in your environment, especially given that the original developer has no incentive to keep the code up to date. Also, what about other versions? You cannot use this particular playbook to install PXC 5.6 or any MariaDB version. Sure, there are other playbooks you can find. Will they work better, or will you spend another bunch of hours trying to make them work?


ClusterControl

Finally, let’s take a look at how ClusterControl can be used to deploy Percona XtraDB Cluster.

Configuring SSH Connectivity - 1 minute

ClusterControl requires SSH connectivity across all the nodes to connect and configure them. We generated an SSH key and manually distributed it across the nodes.

Setting Up ClusterControl - 3 minutes 15 seconds

A quick search for "ClusterControl install" pointed us to the relevant ClusterControl documentation page. We were looking for a "simpler way to install ClusterControl", therefore we followed the link and found the following instructions.

Downloading the script and running it took 3 minutes and 15 seconds. We had to take some actions while the installation proceeded, so it is not an unattended installation.

Logging Into UI and Deployment Start - 1 minute 10 seconds

We pointed our browser to the IP of ClusterControl node.

We passed the required contact information and we were presented with the Welcome screen:

Next step - we picked the deployment option.

We had to pass SSH connectivity details.

We also decided on the vendor, version, password and hosts to use. This whole process took 1 minute and 10 seconds.

Percona XtraDB Cluster Deployment - 12 minutes 5 seconds

The only thing left was to wait for ClusterControl to finish the deployment. After 12 minutes and 5 seconds the cluster was ready:

Total Installation Time - 17 minutes 30 seconds

We managed to deploy ClusterControl and then a PXC cluster using ClusterControl in 17 minutes and 30 seconds. The PXC deployment itself took 12 minutes and 5 seconds. At the end we have a working cluster, deployed according to best practices. ClusterControl also ensures that the configuration of the cluster makes sense. In short, even if you don't really know anything about MySQL or Galera Cluster, you can have a production-ready cluster deployed in a couple of minutes. ClusterControl is not just a deployment tool, it is also a management platform - it makes it even easier for people not experienced with MySQL and Galera to identify performance problems (through advisors) and perform management actions (scaling the cluster up and down, running backups, creating asynchronous slaves to Galera). What is important, ClusterControl will always be maintained and can be used to deploy all MySQL flavors (and not only MySQL/MariaDB, it also supports TimescaleDB, PostgreSQL and MongoDB). It also worked out of the box, something which cannot be said about the other methods we tested.

If you would like to experience the same, you can download ClusterControl for free. Let us know how you liked it.

Popular Docker Images for MySQL and MariaDB Server


A Docker image can be built by anyone who has the ability to write a script. That is why there are many similar images being built by the community, with minor differences but really serving a common purpose. A good (and popular) container image must have well-written documentation with clear explanations, an actively maintained repository and with regular updates. Check out this blog post if you want to learn how to build and publish your own Docker image for MySQL, or this blog post if you just want to learn the basics of running MySQL on Docker.

In this blog post, we are going to look at some of the most popular Docker images to run our MySQL or MariaDB server. The images we have chosen are general-purpose public images that can at least run a MySQL service. Some of them include non-essential MySQL-related applications, while others just serve as a plain mysqld instance. The listing here is based on search results from Docker Hub, the world's largest library and community for container images.

TLDR

The following table summarizes the different options:

| Aspect | MySQL (Docker) | MariaDB (Docker) | Percona (Docker) | MySQL (Oracle) | MySQL/MariaDB (CentOS) | MariaDB (Bitnami) |
|---|---|---|---|---|---|---|
| Downloads* | 10M+ | 10M+ | 10M+ | 10M+ | 10M+ | 10M+ |
| Docker Hub | mysql | mariadb | percona | mysql/mysql-server | mysql-80-centos7, mysql-57-centos7, mysql-56-centos7, mysql-55-centos7, mariadb-102-centos7, mariadb-101-centos7 | bitnami/mariadb |
| Project page | mysql | mariadb | percona-docker | mysql-docker | mysql-container | bitnami-docker-mariadb |
| Base image | Debian 9 | Ubuntu 18.04 (bionic), Ubuntu 14.04 (trusty) | CentOS 7 | Oracle Linux 7 | RHEL 7, CentOS 7 | Debian 9 (minideb), Oracle Linux 7 |
| Supported database versions | 5.5, 5.6, 5.7, 8.0 | 5.5, 10.0, 10.1, 10.2, 10.3, 10.4 | 5.6, 5.7, 8.0 | 5.5, 5.6, 5.7, 8.0 | 5.5, 5.6, 5.7, 8.0, 10.1, 10.2 | 10.1, 10.2, 10.3 |
| Supported platforms | x86_64 | x86, x86_64, arm64v8, ppc64le | x86, x86_64 | x86_64 | x86_64 | x86_64 |
| Image size (tag: latest) | 129 MB | 120 MB | 193 MB | 99 MB | 178 MB | 87 MB |
| First commit | May 18, 2014 | Nov 16, 2014 | Jan 3, 2016 | May 18, 2014** | Feb 15, 2015 | May 17, 2015 |
| Contributors | 18 | 9 | 15 | 14 | 30 | 20 |
| GitHub stars | 1267 | 292 | 113 | 320 | 89 | 152 |
| GitHub forks | 1291 | 245 | 121 | 1291** | 146 | 71 |

* Taken from Docker Hub page.
** Forked from MySQL docker project.

mysql (Docker)

The images are built and maintained by the Docker community with the help of the MySQL team. It can be considered the most popular publicly available MySQL server image hosted on Docker Hub and one of the earliest on the market (the first commit was on May 18, 2014). It has been forked ~1300 times and has 18 active contributors. It supports Docker versions down to 1.6 on a best-effort basis. At the time of writing, all the MySQL major versions are supported - 5.5, 5.6, 5.7 and 8.0 - on the x86_64 architecture only.

Most of the MySQL images built by others are inspired by the way this image was built. The MariaDB, Percona and MySQL Server (Oracle) images follow a similar set of environment variables, configuration file structure and container initialization process flow.

The following environment variables are available on most of the MySQL container images on Docker Hub:

  • MYSQL_ROOT_PASSWORD
  • MYSQL_DATABASE
  • MYSQL_USER
  • MYSQL_PASSWORD
  • MYSQL_ALLOW_EMPTY_PASSWORD
  • MYSQL_RANDOM_ROOT_PASSWORD
  • MYSQL_ONETIME_PASSWORD

The image (tag: latest) is relatively small (129 MB), easy to use, well maintained and updated regularly by the maintainer. If your application requires the latest MySQL database container, this is the most recommended public image you can use.
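As an example of how these environment variables are typically used, the command below starts a throwaway MySQL 8.0 container with a pre-created database and user (all names and passwords are placeholders):

$ docker run -d --name mysql-test \
    -e MYSQL_ROOT_PASSWORD=SuperS3cret \
    -e MYSQL_DATABASE=mydb \
    -e MYSQL_USER=myuser \
    -e MYSQL_PASSWORD=myS3cret \
    -p 3306:3306 \
    mysql:8.0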

mariadb (Docker)

The images are maintained by the Docker community with the help of the MariaDB team. The image uses the same building structure as the mysql (Docker) image, but comes with support for multiple architectures:

  • Linux x86-64 (amd64)
  • ARMv8 64-bit (arm64v8)
  • x86/i686 (i386)
  • IBM POWER8 (ppc64le)

At the time of this writing, the images support MariaDB version 5.5 up until 10.4, where the image with the "latest" tag is around 120 MB in size. This image serves as a general-purpose image and follows the same instructions, environment variables and configuration file structure as mysql (Docker). Most applications that require MySQL as the database backend are commonly compatible with MariaDB, since both speak the same protocol.

MariaDB server used to be a fork of MySQL, but it has since diverged from it. In terms of database architecture design, some MariaDB versions are not 100% compatible and no longer a drop-in replacement for their respective MySQL versions. Check out this page for details. However, there are ways to migrate between the two by using logical backup. Simply said, once you are in the MariaDB ecosystem, you probably have to stick with it. Mixing or switching between MariaDB and MySQL in a cluster is not recommended.

If you would like to set up a more advanced MariaDB setup (replication, Galera, sharding), there are other images built to achieve that objective much more easily, e.g., bitnami/mariadb, as explained further down.

percona (Docker)

Percona Server is a fork of MySQL created by Percona. These are the only official Percona Server Docker images, created and maintained by the Percona team. It supports both x86 and x86_64 architectures and the image is based on CentOS 7. Percona only maintains the latest 3 major MySQL versions as container images - 5.6, 5.7 and 8.0.

The code repository shows that the first commit was on Jan 3, 2016, with 15 active contributors, mostly from the Percona development team. Percona Server for MySQL comes with the XtraDB storage engine (a drop-in replacement for InnoDB) and follows the upstream Oracle MySQL releases very closely (including all the bug fixes in it), with some additional features like the MyRocks and TokuDB storage engines as well as Percona's own bug fixes. In a way, you can think of it as an improved version of Oracle's MySQL. You can easily switch between the MySQL and Percona Server images, provided you are running a compatible version.

The images recognize two additional environment variables for TokuDB and RocksDB for MySQL (available since v5.6); a usage example follows the list:

  • INIT_TOKUDB - Set to 1 to allow the container to be started with enabled TOKUDB storage engine.
  • INIT_ROCKSDB - Set to 1 to allow the container to be started with enabled ROCKSDB storage engine.
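For illustration, starting a Percona Server 5.7 container with TokuDB enabled could look like this (the password is a placeholder):

$ docker run -d --name percona-tokudb \
    -e MYSQL_ROOT_PASSWORD=SuperS3cret \
    -e INIT_TOKUDB=1 \
    percona:5.7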

mysql-server (Oracle)

The repository is forked from the Docker team's mysql repository. The images are created, maintained and supported by the MySQL team at Oracle, built on top of the Oracle Linux 7 base image. The MySQL 8.0 image comes with MySQL Community Server (minimal) and MySQL Shell from the minimal package repository, and the server is configured to expose the X Protocol on port 33060. The minimal package was designed for use by the official Docker images for MySQL. It cuts out some of the non-essential pieces of MySQL like innochecksum, myisampack and mysql_plugin, but is otherwise the same product. Therefore, it has a very small image footprint of around 99 MB.

One important point to note is that the images have a built-in health check script, which is very handy for those who need accurate availability logic. Otherwise, you have to write a custom Docker HEALTHCHECK command (or script) to check the container's health.
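For example, once a container (here assumed to be named mysql1) is running from this image, its health status can be read straight from Docker:

$ docker inspect --format '{{.State.Health.Status}}' mysql1   # prints "starting", "healthy" or "unhealthy"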

mysql-xx-centos7 & mariadb-xx-centos7 (CentOS)

The container images are built and maintained by the CentOS team and include the MySQL database server for OpenShift and general usage. For RHEL-based images, you can pull them from Red Hat's Container Catalog, while the CentOS-based images are hosted publicly at Docker Hub on different pages for every major version (only images with 10M+ downloads are listed in the table above).

The image structure is a bit different and it doesn't make use of image tags like the others do, thus the image name becomes a bit longer instead. Having said that, you have to go to the correct Docker Hub page to get the major version you want to pull.

According to the code repository page, 30 contributors have collaborated on the project since February 15, 2015. It supports MySQL 5.5 up until 8.0 and MariaDB 5.5 until 10.2 on the x86_64 architecture only. If you rely heavily on Red Hat containerization infrastructure like OpenShift, these are probably the most popular and well-maintained images for MySQL and MariaDB.

The following environment variables influence the MySQL/MariaDB configuration file and they are all optional:

  • MYSQL_LOWER_CASE_TABLE_NAMES (default: 0)
  • MYSQL_MAX_CONNECTIONS (default: 151)
  • MYSQL_MAX_ALLOWED_PACKET (default: 200M)
  • MYSQL_FT_MIN_WORD_LEN (default: 4)
  • MYSQL_FT_MAX_WORD_LEN (default: 20)
  • MYSQL_AIO (default: 1)
  • MYSQL_TABLE_OPEN_CACHE (default: 400)
  • MYSQL_KEY_BUFFER_SIZE (default: 32M or 10% of available memory)
  • MYSQL_SORT_BUFFER_SIZE (default: 256K)
  • MYSQL_READ_BUFFER_SIZE (default: 8M or 5% of available memory)
  • MYSQL_INNODB_BUFFER_POOL_SIZE (default: 32M or 50% of available memory)
  • MYSQL_INNODB_LOG_FILE_SIZE (default: 8M or 15% of available memory)
  • MYSQL_INNODB_LOG_BUFFER_SIZE (default: 8M or 15% of available memory)
  • MYSQL_DEFAULTS_FILE (default: /etc/my.cnf)
  • MYSQL_BINLOG_FORMAT (default: statement)
  • MYSQL_LOG_QUERIES_ENABLED (default: 0)

The images support MySQL auto-tuning when the MySQL image is running with the --memory parameter set. If you don't specify a value for the following parameters, their values will be automatically calculated based on the available memory (a usage example follows the list):

  • MYSQL_KEY_BUFFER_SIZE (default: 10%)
  • MYSQL_READ_BUFFER_SIZE (default: 5%)
  • MYSQL_INNODB_BUFFER_POOL_SIZE (default: 50%)
  • MYSQL_INNODB_LOG_FILE_SIZE (default: 15%)
  • MYSQL_INNODB_LOG_BUFFER_SIZE (default: 15%)
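For illustration, running the CentOS-based MySQL 5.7 image with a 1 GB memory cap, letting the image auto-tune its buffers accordingly, could look like this (the image name follows the Docker Hub listing above and the credentials are placeholders):

$ docker run -d --name mysql-centos --memory=1g \
    -e MYSQL_ROOT_PASSWORD=SuperS3cret \
    -e MYSQL_DATABASE=mydb \
    -e MYSQL_USER=myuser \
    -e MYSQL_PASSWORD=myS3cret \
    centos/mysql-57-centos7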

bitnami/mariadb

The images are built and maintained by Bitnami, experts in software packaging for virtual or cloud deployments. The images are released daily with the latest distribution packages available and use a minimalist Debian-based image called minideb. Thus, the image size for the latest tag is the smallest of them all, at around 87 MB. The project has 20 contributors, with the first commit happening on May 17, 2015. At the time of writing, it only supports MariaDB 10.1 up until 10.3.

One outstanding feature of this image is the ability to deploy a highly available MariaDB setup via Docker environment variables. A zero-downtime MariaDB master-slave replication cluster can easily be set up with the Bitnami MariaDB Docker image using the following environment variables:

  • MARIADB_REPLICATION_MODE: The replication mode. Possible values master/slave. No defaults.
  • MARIADB_REPLICATION_USER: The replication user created on the master on first run. No defaults.
  • MARIADB_REPLICATION_PASSWORD: The replication user's password. No defaults.
  • MARIADB_MASTER_HOST: Hostname/IP of replication master (slave parameter). No defaults.
  • MARIADB_MASTER_PORT_NUMBER: Server port of the replication master (slave parameter). Defaults to 3306.
  • MARIADB_MASTER_ROOT_USER: User on the replication master with access to MARIADB_DATABASE (slave parameter). Defaults to root.
  • MARIADB_MASTER_ROOT_PASSWORD: Password of the user on the replication master with access to MARIADB_DATABASE (slave parameter). No defaults.

In a replication cluster, you can have one master and zero or more slaves. When replication is enabled, the master node is in read-write mode, while the slaves are in read-only mode. For best performance, it's advisable to limit reads to the slaves.
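As an illustration only (MARIADB_ROOT_PASSWORD and the shared Docker network are assumptions not covered by the list above), a two-container master-slave pair could be brought up roughly like this:

$ docker network create mariadb-net
$ docker run -d --name mariadb-master --network mariadb-net \
    -e MARIADB_ROOT_PASSWORD=SuperS3cret \
    -e MARIADB_REPLICATION_MODE=master \
    -e MARIADB_REPLICATION_USER=repl_user \
    -e MARIADB_REPLICATION_PASSWORD=repl_pass \
    bitnami/mariadb:latest
$ docker run -d --name mariadb-slave --network mariadb-net \
    -e MARIADB_REPLICATION_MODE=slave \
    -e MARIADB_REPLICATION_USER=repl_user \
    -e MARIADB_REPLICATION_PASSWORD=repl_pass \
    -e MARIADB_MASTER_HOST=mariadb-master \
    -e MARIADB_MASTER_PORT_NUMBER=3306 \
    -e MARIADB_MASTER_ROOT_PASSWORD=SuperS3cret \
    bitnami/mariadb:latest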

In addition, these images also support deployment on Kubernetes as Helm Charts. You can read more about the installation steps in the Bitnami MariaDB Chart GitHub repository.

Conclusions

There are tons of MySQL server images that have been contributed by the community and we can't cover them all here. Keep in mind that these images are popular because they are built for general purpose usage. Some less popular images can do much more advanced stuff, like database container orchestration, automatic bootstrapping and automatic scaling. Different images provide different approaches that can be used to address other problems.

How to Migrate WHMCS Database to MariaDB Galera Cluster


WHMCS is an all-in-one client management, billing and support solution for web hosting companies. It's one of the leaders in the hosting automation world to be used alongside the hosting control panel itself. WHMCS runs on a LAMP stack, with MySQL/MariaDB as the database provider. Commonly, WHMCS is installed as a standalone instance (application and database) independently by following the WHMCS installation guide, or through software installer tools like cPanel Site Software or Softaculous. The database can be made highly available by migrating to a Galera Cluster of 3 nodes.

In this blog post, we will show you how to migrate the WHMCS database from a standalone MySQL server (provided by the WHM/cPanel server itself) to an external three-node MariaDB Galera Cluster to improve the database availability. The WHMCS application itself will be kept running on the same cPanel server. We’ll also give you some tuning tips to optimize performance.

Deploying the Database Cluster

  1. Install ClusterControl:
    $ whoami
    root
    $ wget https://severalnines.com/downloads/cmon/install-cc
    $ chmod 755 install-cc
    $ ./install-cc
    Follow the instructions accordingly until the installation is completed. Then, go to http://192.168.55.50/clustercontrol (192.168.55.50 being the IP address of the ClusterControl host) and register a super admin user with a password and other required details.
  2. Setup passwordless SSH from ClusterControl to all database nodes:
    $ whoami
    root
    $ ssh-keygen -t rsa # Press enter on all prompts
    $ ssh-copy-id 192.168.55.51
    $ ssh-copy-id 192.168.55.52
    $ ssh-copy-id 192.168.55.53
  3. Configure the database deployment for our 3-node MariaDB Galera Cluster. We are going to use the latest supported version, MariaDB 10.3:
    Make sure you get all green checks after pressing ‘Enter’ when adding the node details. Wait until the deployment job completes and you should see the database cluster is listed in ClusterControl.
  4. Deploy a ProxySQL node (we are going to co-locate it with the ClusterControl node) by going to Manage -> Load Balancer -> ProxySQL -> Deploy ProxySQL. Specify the following required details:
    Under "Add Database User", you can ask ClusterControl to create a new ProxySQL and MySQL user as it sets up , thus we put the user as "portal_whmcs", assigned with ALL PRIVILEGES on database "portal_whmcs.*". Then, check all the boxes for "Include" and finally choose "false" for "Are you using implicit transactions?".

Once the deployment finished, you should see something like this under Topology view:

Our database deployment is now complete. Keep in mind that we do not cover load balancer tier redundancy in this blog post. You can achieve that by adding a secondary load balancer and stringing them together with Keepalived. To learn more about this, check out ProxySQL Tutorials under chapter "4.2. High availability for ProxySQL".

WHMCS Installation

If you already have WHMCS installed and running, you may skip this step.

Take note that WHMCS requires a valid license which you have to purchase beforehand in order to use the software. They do not provide a free trial license, but they do offer a no-questions-asked 30-day money-back guarantee, which means you can always cancel the subscription before the offer expires without being charged.

To simplify the installation process, we are going to use cPanel Site Software (you may opt for the WHMCS manual installation instead) on one of our sub-domains, selfportal.mytest.io. After creating the account in WHM, go to cPanel > Software > Site Software > WHMCS and install the web application. Log in as the admin user and activate the license to start using the application.

At this point, our WHMCS instance is running as a standalone setup, connecting to the local MySQL server.


Migrating the WHMCS Database to MariaDB Galera Cluster

Running WHMCS on a standalone MySQL server exposes the application to a single point of failure (SPOF) from the database standpoint. MariaDB Galera Cluster provides redundancy to the data layer with built-in clustering features and support for multi-master architecture. Combine this with a database load balancer, for example ProxySQL, and we can improve the WHMCS database availability with minimal changes to the application itself.

However, there are a number of best practices that WHMCS (or any other application) has to follow in order to work efficiently on Galera Cluster, especially:

  • All tables must be running on InnoDB/XtraDB storage engine.
  • All tables should have a primary key defined (multi-column primary key is supported, unique key does not count).

Depending on the version installed - in our test environment (cPanel/WHM 11.78.0.23, WHMCS 7.6.0 via Site Software) - the above two requirements were not met. The default cPanel/WHM MySQL configuration comes with the following line inside /etc/my.cnf:

default-storage-engine=MyISAM

The above causes additional tables managed by WHMCS Addon Modules to be created in the MyISAM storage engine format if those modules are enabled. Here is the output of the storage engines after we enabled 2 modules (New TLDs and Staff Noticeboard):

MariaDB> SELECT tables.table_schema, tables.table_name, tables.engine FROM information_schema.tables WHERE tables.table_schema='whmcsdata_whmcs' and tables.engine <> 'InnoDB';
+-----------------+----------------------+--------+
| table_schema    | table_name           | engine |
+-----------------+----------------------+--------+
| whmcsdata_whmcs | mod_enomnewtlds      | MyISAM |
| whmcsdata_whmcs | mod_enomnewtlds_cron | MyISAM |
| whmcsdata_whmcs | mod_staffboard       | MyISAM |
+-----------------+----------------------+--------+

MyISAM support is experimental in Galera, which means you should not run it in production. In worse cases, it could compromise data consistency and cause writeset replication failures due to its non-transactional nature.

Another important point is that every table must have a primary key defined. Depending on the WHMCS installation procedure that you used (we used cPanel Site Software to install WHMCS), some of the tables created by the installer do not come with a primary key defined, as shown in the following output:

MariaDB [information_schema]> SELECT TABLES.table_schema, TABLES.table_name FROM TABLES LEFT JOIN KEY_COLUMN_USAGE AS c ON (TABLES.TABLE_NAME = c.TABLE_NAME AND c.CONSTRAINT_SCHEMA = TABLES.TABLE_SCHEMA AND c.constraint_name = 'PRIMARY' ) WHERE TABLES.table_schema <> 'information_schema' AND TABLES.table_schema <> 'performance_schema' AND TABLES.table_schema <> 'mysql' and TABLES.table_schema <> 'sys' AND c.constraint_name IS NULL;
+-----------------+------------------------------------+
| table_schema    | table_name                         |
+-----------------+------------------------------------+
| whmcsdata_whmcs | mod_invoicedata                    |
| whmcsdata_whmcs | tbladminperms                      |
| whmcsdata_whmcs | tblaffiliates                      |
| whmcsdata_whmcs | tblconfiguration                   |
| whmcsdata_whmcs | tblknowledgebaselinks              |
| whmcsdata_whmcs | tbloauthserver_access_token_scopes |
| whmcsdata_whmcs | tbloauthserver_authcode_scopes     |
| whmcsdata_whmcs | tbloauthserver_client_scopes       |
| whmcsdata_whmcs | tbloauthserver_user_authz_scopes   |
| whmcsdata_whmcs | tblpaymentgateways                 |
| whmcsdata_whmcs | tblproductconfiglinks              |
| whmcsdata_whmcs | tblservergroupsrel                 |
+-----------------+------------------------------------+

As a side note, Galera would still allow tables without a primary key to exist. However, DELETE operations are not supported on those tables, plus they would expose you to much bigger problems like node crashes, writeset certification performance degradation or rows appearing in a different order on different nodes.

To overcome this, our migration plan must include the additional step to fix the storage engine and schema structure, as shown in the next section.

Migration Plan

Due to restrictions explained in the previous chapter, our migration plan has to be something like this:

  1. Enable WHMCS maintenance mode
  2. Take a backup of the whmcs database using a logical backup
  3. Modify the dump files to meet the Galera requirements (convert the storage engine)
  4. Bring up one of the Galera nodes and shut down the remaining nodes
  5. Restore the backup to the chosen Galera node
  6. Fix the schema structure to meet the Galera requirements (missing primary keys)
  7. Bootstrap the cluster from the chosen Galera node
  8. Start the second node and let it sync
  9. Start the third node and let it sync
  10. Point the database connection to the appropriate endpoint
  11. Disable WHMCS maintenance mode

The new architecture can be illustrated as below:

Our WHMCS database name on the cPanel server is "whmcsdata_whmcs" and we are going to migrate this database to an external three-node MariaDB Galera Cluster deployed by ClusterControl. On top of the database servers, we have ProxySQL (co-located with ClusterControl) running to act as the MariaDB load balancer, providing a single endpoint to our WHMCS instance. The database name on the cluster will be changed to "portal_whmcs" instead, so we can easily distinguish it.

Firstly, enable the site-wide Maintenance Mode by going to WHMCS > Setup > General Settings > General > Maintenance Mode > Tick to enable - prevents client area access when enabled. This ensures there is no activity from end users during the database backup operation.

Since we have to make slight modifications to the schema structure to fit well into Galera, it's a good idea to create two separate dump files: one with the schema only and another with the data only. On the WHM server, run the following commands as root:

$ mysqldump --no-data -uroot whmcsdata_whmcs > whmcsdata_whmcs_schema.sql
$ mysqldump --no-create-info -uroot whmcsdata_whmcs > whmcsdata_whmcs_data.sql

Then, we have to replace all MyISAM occurrences in the schema dump file with 'InnoDB':

$ sed -i 's/MyISAM/InnoDB/g' whmcsdata_whmcs_schema.sql

Verify that we don't have MyISAM lines anymore in the dump file (it should return nothing):

$ grep -i 'myisam' whmcsdata_whmcs_schema.sql

Transfer the dump files from the WHM server to mariadb1 (192.168.55.51):

$ scp whmcsdata_whmcs_* 192.168.55.51:~

Create the MySQL database. From ClusterControl, go to Manage -> Schemas and Users -> Create Database and specify the database name. Here we use a different database name called "portal_whmcs". Otherwise, you can manually create the database with the following command:

$ mysql -uroot -p 
MariaDB> CREATE DATABASE portal_whmcs;

Create a MySQL user for this database with its privileges. From ClusterControl, go to Manage -> Schemas and Users -> Users -> Create New User and specify the following:

In case you choose to create the MySQL user manually, run the following statements:

$ mysql -uroot -p 
MariaDB> CREATE USER 'portal_whmcs'@'%' IDENTIFIED BY 'ghU51CnPzI9z';
MariaDB> GRANT ALL PRIVILEGES ON portal_whmcs.* TO portal_whmcs@'%';

Take note that the created database user has to be imported into ProxySQL, to allow the WHMCS application to authenticate against the load balancer. Go to Nodes -> pick the ProxySQL node -> Users -> Import Users and select "portal_whmcs"@"%", as shown in the following screenshot:

In the next window (User Settings), specify Hostgroup 10 as the default hostgroup:

Now the restoration preparation stage is complete.

In Galera, restoring a big database via mysqldump on a single-node cluster is more efficient, which improves the restoration time significantly. Otherwise, every node in the cluster would have to certify every statement from the mysqldump input, which would take a longer time to complete.

Since we already have a three-node MariaDB Galera Cluster running, let's stop the MySQL service on mariadb2 and mariadb3, one node at a time, for a graceful scale down. To shut down the database nodes, from ClusterControl, simply go to Nodes -> Node Actions -> Stop Node -> Proceed. Here is what you would see from the ClusterControl dashboard, where the cluster size is 1 and the status of mariadb1 is Synced and Primary:
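
If you prefer the command line over the ClusterControl dashboard, a quick way to confirm that only one node remains active is to check the standard Galera status counter on mariadb1 (an optional sanity check; the value should report 1 at this stage):

$ mysql -uroot -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"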

Then, on mariadb1 (192.168.55.51), restore the schema and data accordingly:

$ mysql -uportal_whmcs -p portal_whmcs < whmcsdata_whmcs_schema.sql
$ mysql -uportal_whmcs -p portal_whmcs < whmcsdata_whmcs_data.sql

Once imported, we have to fix the table structures by adding the necessary "id" column (except for table "tblaffiliates", which already has one), as well as adding a primary key to every table that is missing one:

$ mysql -uportal_whmcs -p
MariaDB> USE portal_whmcs;
MariaDB [portal_whmcs]> ALTER TABLE `tblaffiliates` ADD PRIMARY KEY (id);
MariaDB [portal_whmcs]> ALTER TABLE `mod_invoicedata` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
MariaDB [portal_whmcs]> ALTER TABLE `tbladminperms` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
MariaDB [portal_whmcs]> ALTER TABLE `tblconfiguration` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
MariaDB [portal_whmcs]> ALTER TABLE `tblknowledgebaselinks` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
MariaDB [portal_whmcs]> ALTER TABLE `tbloauthserver_access_token_scopes` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
MariaDB [portal_whmcs]> ALTER TABLE `tbloauthserver_authcode_scopes` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
MariaDB [portal_whmcs]> ALTER TABLE `tbloauthserver_client_scopes` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
MariaDB [portal_whmcs]> ALTER TABLE `tbloauthserver_user_authz_scopes` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
MariaDB [portal_whmcs]> ALTER TABLE `tblpaymentgateways` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
MariaDB [portal_whmcs]> ALTER TABLE `tblproductconfiglinks` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;
MariaDB [portal_whmcs]> ALTER TABLE `tblservergroupsrel` ADD `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;

Or, we can translate the above repeated statements using a loop in a bash script:

#!/bin/bash

db_user='portal_whmcs'
db_pass='ghU51CnPzI9z'
db_whmcs='portal_whmcs'
tables=$(mysql -u${db_user} "-p${db_pass}"  information_schema -A -Bse "SELECT TABLES.table_name FROM TABLES LEFT JOIN KEY_COLUMN_USAGE AS c ON (TABLES.TABLE_NAME = c.TABLE_NAME AND c.CONSTRAINT_SCHEMA = TABLES.TABLE_SCHEMA AND c.constraint_name = 'PRIMARY' ) WHERE TABLES.table_schema <> 'information_schema' AND TABLES.table_schema <> 'performance_schema' AND TABLES.table_schema <> 'mysql' and TABLES.table_schema <> 'sys' AND c.constraint_name IS NULL;")
mysql_exec="mysql -u${db_user} -p${db_pass} $db_whmcs -e"

for table in $tables
do
        if [ "${table}" = "tblaffiliates" ]
        then
                $mysql_exec "ALTER TABLE ${table} ADD PRIMARY KEY (id)";
        else
                $mysql_exec "ALTER TABLE ${table} ADD id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST";
        fi
done

At this point, it's safe to start the remaining nodes to sync up with mariadb1. Start with mariadb2 by going to Nodes -> pick db2 -> Node Actions -> Start Node. Monitor the job progress and make sure mariadb2 is in Synced and Primary state (monitor the Overview page for details) before starting up mariadb3.
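
Apart from the ClusterControl Overview page, the same state can be confirmed from the MySQL client on any node using the standard Galera status variables (an optional check; every node should report Synced and a cluster size of 3 once all three are up):

$ mysql -uroot -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'; SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';"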

Finally, change the database host to point to the ProxySQL host on port 6033 inside the WHMCS configuration file, which in our case is located at /home/whmcsdata/public_html/configuration.php:

$ vim configuration.php
<?php
$license = 'WHMCS-XXXXXXXXXXXXXXXXXXXX';
$templates_compiledir = 'templates_c';
$mysql_charset = 'utf8';
$cc_encryption_hash = 'gLg4oxuOWsp4bMleNGJ--------30IGPnsCS49jzfrKjQpwaN';
$db_host = '192.168.55.50';
$db_port = '6033';
$db_username = 'portal_whmcs';
$db_password = 'ghU51CnPzI9z';
$db_name = 'portal_whmcs';

$customadminpath = 'admin2d27';
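
Before lifting the maintenance mode, it may be worth confirming that the new credentials work through the ProxySQL endpoint (an optional sanity check using the standard client; host, port and credentials are the ones configured above):

$ mysql -uportal_whmcs -p -h192.168.55.50 -P6033 portal_whmcs -e "SELECT @@hostname, @@read_only;"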

Don't forget to disable WHMCS maintenance mode by going to WHMCS > Setup > General Settings > General > Maintenance Mode > uncheck "Tick to enable - prevents client area access when enabled". Our database migration exercise is now complete.

Testing and Tuning

You can verify whether the application queries are reaching the cluster by looking at ProxySQL's query entries under Nodes -> ProxySQL -> Top Queries:

For the most repeated read-only queries (you can sort them by Count Star), you may cache them to improve the response time and reduce the number of hits to the backend servers. Simply rollover to any query and click Cache Query, and the following pop-up will appear:

What you need to do is to only choose the destination hostgroup and click "Add Rule". You can then verify if the cached query got hit under "Rules" tab:

From the query rule itself, we can tell that reads (all SELECT except SELECT .. FOR UPDATE) are forwarded to hostgroup 20, where the connections are distributed to all nodes, while writes (everything other than SELECT) are forwarded to hostgroup 10, where the connections are forwarded to one Galera node only. This configuration minimizes the risk of deadlocks that may be caused by a multi-master setup, which improves the replication performance as a whole.
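
ClusterControl creates and manages these rules for you, but for reference, a read/write split like the one described above is typically expressed in ProxySQL with query rules roughly similar to the sketch below (rule IDs are arbitrary; hostgroups 10 and 20 match this setup), entered via the ProxySQL admin interface:

$ mysql -uadmin -p -h127.0.0.1 -P6032
-- route SELECT ... FOR UPDATE to the writer hostgroup, other SELECTs to the reader hostgroup
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (100, 1, '^SELECT .* FOR UPDATE', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (200, 1, '^SELECT', 20, 1);
-- anything that matches no rule falls back to the user's default hostgroup (10)
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;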

That's it for now. Happy clustering!

Database Automation with Puppet: Deploying MySQL & MariaDB Replication


Puppet is an open source systems management tool for centralizing and automating configuration management. Automation tools help to minimize manual and repetitive tasks, and can save a great deal of time.

Puppet works by default in a server/agent model. Agents fetch their “catalog” (final desired state) from the master and apply it locally. Then they report back to the server. The catalog is computed depending on “facts” the machine sends to the server, user input (parameters) and modules (source code).
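
As a quick illustration of what those facts are, you can inspect them on any node with the facter command that ships with the Puppet agent (the output naturally differs per host):

(db1.local)$ facter networking.fqdn os.name os.release.major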

In this blog, we’ll show you how to deploy and manage MySQL/MariaDB instances via Puppet. There are a number of technologies around MySQL/MariaDB such as replication (master-slave, Galera or group replication for MySQL), SQL-aware load balancers like ProxySQL and MariaDB MaxScale, backup and recovery tools and many more which we will cover in this blog series. There are also many modules available in the Puppet Forge built and maintained by the community which can help us simplify the code and avoid reinventing the wheel. In this blog, we are going to focus on MySQL Replication.

puppetlabs/mysql

This is the most popular Puppet module for MySQL and MariaDB (and probably the best in the market) right now. This module manages both the installation and configuration of MySQL, as well as extending Puppet to allow management of MySQL resources, such as databases, users, and grants.

The module is officially maintained by the Puppet team (via the puppetlabs GitHub repository) and supports Puppet Enterprise 2019.1.x, 2019.0.x and 2018.1.x, as well as Puppet >= 5.5.10 < 7.0.0, on RedHat, Ubuntu, Debian, SLES, Scientific, CentOS and OracleLinux platforms. Users have the option to install MySQL, MariaDB or Percona Server by customizing the package repository.

The following example shows how to deploy a MySQL server. On the puppet master install the MySQL module and create the manifest file:

(puppet-master)$ puppet module install puppetlabs/mysql
(puppet-master)$ vim /etc/puppetlabs/code/environments/production/manifests/mysql.pp

Add the following lines:

node "db1.local" {
  class { '::mysql::server':
    root_password => 't5[sb^D[+rt8bBYu',
    remove_default_accounts => true,
    override_options => {
      'mysqld' => {
        'log_error' => '/var/log/mysql.log',
        'innodb_buffer_pool_size' => '512M'
      },
      'mysqld_safe' => {
        'log_error' => '/var/log/mysql.log'
      }
    }
  }
}

Then on the puppet agent node, run the following command to apply the configuration catalog:

(db1.local)$ puppet agent -t

On the first run, you might get the following error:

Info: Certificate for db1.local has not been signed yet

Just run the following command on the Puppet master to sign the certificate:

(puppet-master)$ puppetserver ca sign --certname=db1.local
Successfully signed certificate request for db1.local

Retry again with "puppet agent -t" command to re-initiate the connection with the signed certificate.

The above definition will install the standard MySQL-related packages available in the OS distribution repository. For example, on Ubuntu 18.04 (Bionic), you would get MySQL 5.7.26 packages installed:

(db1.local) $ dpkg --list | grep -i mysql
ii  mysql-client-5.7                5.7.26-0ubuntu0.18.04.1           amd64        MySQL database client binaries
ii  mysql-client-core-5.7           5.7.26-0ubuntu0.18.04.1           amd64        MySQL database core client binaries
ii  mysql-common                    5.8+1.0.4                         all          MySQL database common files, e.g. /etc/mysql/my.cnf
ii  mysql-server                    5.7.26-0ubuntu0.18.04.1           all          MySQL database server (metapackage depending on the latest version)
ii  mysql-server-5.7                5.7.26-0ubuntu0.18.04.1           amd64        MySQL database server binaries and system database setup
ii  mysql-server-core-5.7           5.7.26-0ubuntu0.18.04.1           amd64        MySQL database server binaries

You may opt for other vendors like Oracle, Percona or MariaDB with extra configuration on the repository (refer to the README section for details). The following definition will install the MariaDB packages from MariaDB apt repository (requires apt Puppet module):

$ puppet module install puppetlabs/apt
$ vim /etc/puppetlabs/code/environments/production/manifests/mariadb.pp
# include puppetlabs/apt module
include apt

# apt definition for MariaDB 10.3
apt::source { 'mariadb':
  location => 'http://sgp1.mirrors.digitalocean.com/mariadb/repo/10.3/ubuntu/',
  release  => $::lsbdistcodename,
  repos    => 'main',
  key      => {
    id     => 'A6E773A1812E4B8FD94024AAC0F47944DE8F6914',
    server => 'hkp://keyserver.ubuntu.com:80',
  },
  include => {
    src   => false,
    deb   => true,
  },
}

# MariaDB configuration
class {'::mysql::server':
  package_name     => 'mariadb-server',
  service_name     => 'mysql',
  root_password    => 't5[sb^D[+rt8bBYu',
  override_options => {
    mysqld => {
      'log-error' => '/var/log/mysql/mariadb.log',
      'pid-file'  => '/var/run/mysqld/mysqld.pid',
    },
    mysqld_safe => {
      'log-error' => '/var/log/mysql/mariadb.log',
    },
  }
}

# Deploy on db2.local
node "db2.local" {
Apt::Source['mariadb'] ->
Class['apt::update'] ->
Class['::mysql::server']
}

Take note of the key->id value; there is a special way to retrieve the 40-character id, as shown below:

$ sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
$ apt-key adv --list-public-keys --with-fingerprint --with-colons
uid:-::::1459359915::6DC53DD92B7A8C298D5E54F950371E2B8950D2F2::MariaDB Signing Key <signing-key@mariadb.org>::::::::::0:
sub:-:4096:1:C0F47944DE8F6914:1459359915::::::e::::::23:
fpr:::::::::A6E773A1812E4B8FD94024AAC0F47944DE8F6914:

The id value is in the line starting with "fpr", which is 'A6E773A1812E4B8FD94024AAC0F47944DE8F6914'.

After the Puppet catalog is applied, you may access the MySQL console directly as root without an explicit password, since the module configures and manages ~/.my.cnf automatically. If we want to reset the root password to something else, simply change the root_password value in the Puppet definition and apply the catalog on the agent node.
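
For reference, the credentials end up in a standard MySQL client options file; a minimal sketch of what /root/.my.cnf roughly looks like (the actual file is generated and managed by the module):

[client]
user=root
password='t5[sb^D[+rt8bBYu'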

MySQL Replication Deployment

To deploy a MySQL Replication setup, one has to create at least two types of configuration to separate the master and slave configuration. The master will have read-only disabled to allow read/write, while slaves will be configured with read-only enabled. In this example, we are going to use GTID-based replication to simplify the configuration (since all nodes' configuration would be very similar). We will want to initiate the replication link to the master right after the slave is up.

Suppose we have a three-node MySQL master-slave replication setup:

  • db1.local - master
  • db2.local - slave #1
  • db3.local - slave #2

To meet the above requirements, we can write down our manifest to something like this:

# Puppet manifest for MySQL GTID-based replication MySQL 5.7 on Ubuntu 18.04 (Puppet v6.4.2) 
# /etc/puppetlabs/code/environments/production/manifests/replication.pp

# node's configuration
class mysql {
  class {'::mysql::server':
    root_password           => 'q1w2e3!@#',
    create_root_my_cnf      => true,
    remove_default_accounts => true,
    manage_config_file      => true,
    override_options        => {
      'mysqld' => {
        'datadir'                 => '/var/lib/mysql',
        'bind_address'            => '0.0.0.0',
        'server-id'               => $mysql_server_id,
        'read_only'               => $mysql_read_only,
        'gtid-mode'               => 'ON',
        'enforce_gtid_consistency'=> 'ON',
        'log-slave-updates'       => 'ON',
        'sync_binlog'             => 1,
        'log-bin'                 => '/var/log/mysql-bin',
        'binlog-format'           => 'ROW',
        'log-error'               => '/var/log/mysql/error.log',
        'report_host'             => $::fqdn,
        'innodb_buffer_pool_size' => '512M'
      },
      'mysqld_safe' => {
        'log-error'               => '/var/log/mysql/error.log'
      }
    }
  }
  
  # create slave user
  mysql_user { "${slave_user}@192.168.0.%":
      ensure        => 'present',
      password_hash => mysql_password("${slave_password}")
  }

  # grant privileges for slave user
  mysql_grant { "${slave_user}@192.168.0.%/*.*":
      ensure        => 'present',
      privileges    => ['REPLICATION SLAVE'],
      table         => '*.*',
      user          => "${slave_user}@192.168.0.%"
  }

  # /etc/hosts definition
  host {
    'db1.local': ip => '192.168.0.161';
    'db2.local': ip => '192.168.0.162';
    'db3.local': ip => '192.168.0.163';
  }

  # executes change master only if $master_host is defined
  if $master_host {
    exec { 'change master':
      path    => '/usr/bin:/usr/sbin:/bin',
      command => "mysql --defaults-extra-file=/root/.my.cnf -e \"CHANGE MASTER TO MASTER_HOST = '$master_host', MASTER_USER = '$slave_user', MASTER_PASSWORD = '$slave_password', MASTER_AUTO_POSITION = 1; START SLAVE;\"",
      unless  => "mysql --defaults-extra-file=/root/.my.cnf -e 'SHOW SLAVE STATUS\G' | grep 'Slave_SQL_Running: Yes'"
    }
  }
}

## node assignment

# global vars
$master_host = undef
$slave_user = 'slave'
$slave_password = 'Replicas123'

# master
node "db1.local" {
  $mysql_server_id = '1'
  $mysql_read_only = 'OFF'
  include mysql
}

# slave1
node "db2.local" {
  $mysql_server_id = '2'
  $mysql_read_only = 'ON'
  $master_host = 'db1.local'
  include mysql
}

# slave2
node "db3.local" {
  $mysql_server_id = '3'
  $mysql_read_only = 'ON'
  $master_host = 'db1.local'
  include mysql
}

Force the agent to apply the catalog:

(all-mysql-nodes)$ puppet agent -t

On the master (db1.local), we can verify all the connected slaves:

mysql> SHOW SLAVE HOSTS;
+-----------+-----------+------+-----------+--------------------------------------+
| Server_id | Host      | Port | Master_id | Slave_UUID                           |
+-----------+-----------+------+-----------+--------------------------------------+
|         3 | db3.local | 3306 |         1 | 2d0b14b6-8174-11e9-8bac-0273c38be33b |
|         2 | db2.local | 3306 |         1 | a9dfa4c7-8172-11e9-8000-0273c38be33b |
+-----------+-----------+------+-----------+--------------------------------------+

Pay extra attention to the "exec { 'change master': }" section: it means a MySQL command will be executed to initiate the replication link if the condition is met. All "exec" resources executed by Puppet must be idempotent, meaning the operation will have the same effect whether you run it once or 10,001 times. There are a number of condition attributes you may use, like "unless", "onlyif" and "creates", to safeguard the correct state and prevent Puppet from messing up your setup. You may delete or comment out that section if you want to initiate the replication link manually.
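
As a small, hypothetical illustration of the "creates" attribute, the following exec would run its command only as long as the marker file does not exist, which keeps a one-off task safe to leave in the catalog (the script name and marker path are made up):

  # /root/load_seed_data.sh is a hypothetical script expected to create /root/.seed_loaded on success
  exec { 'load seed data':
    path    => '/usr/bin:/usr/sbin:/bin',
    command => 'bash /root/load_seed_data.sh',
    creates => '/root/.seed_loaded',
  }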

MySQL Management

This module can be used to perform a number of MySQL management tasks:

  • configuration options (modify, apply, custom configuration)
  • database resources (database, user, grants)
  • backup (create, schedule, backup user, storage)
  • simple restore (mysqldump only)
  • plugins installation/activation

Database Resource

As you can see in the example manifest above, we have defined two MySQL resources - mysql_user and mysql_grant - to create a user and grant privileges to that user, respectively. We can also use the mysql::db class to ensure a database with an associated user and privileges is present, for example:

  # make sure the database and user exist with proper grant
  mysql::db { 'mynewdb':
    user          => 'mynewuser',
    password      => 'passw0rd',
    host          => '192.168.0.%',
    grant         => ['SELECT', 'UPDATE']
  } 

Take note that in MySQL replication, all writes must be performed on the master only. So, make sure that the above resource is assigned to the master. Otherwise, errant transactions could occur.

Backup and Restore

Commonly, only one backup host is required for the entire cluster (unless you replicate a subset of data). We can use the mysql::server::backup class to prepare the backup resources. Suppose we have the following declaration in our manifest:

  # Prepare the backup script, /usr/local/sbin/mysqlbackup.sh
  class { 'mysql::server::backup':
    backupuser     => 'backup',
    backuppassword => 'passw0rd',
    backupdir      => '/home/backup',
    backupdirowner => 'mysql',
    backupdirgroup => 'mysql',
    backupdirmode  => '755',
    backuprotate   => 15,
    time           => ['23','30'],   #backup starts at 11:30PM everyday
    include_routines  => true,
    include_triggers  => true,
    ignore_events     => false,
    maxallowedpacket  => '64M',
    optional_args     => ['--set-gtid-purged=OFF'] #extra argument if GTID is enabled
  }

Puppet will configure all the prerequisites before running a backup - creating the backup user, preparing the destination path, assigning ownership and permission, setting the cron job and setting up the backup command options to use in the provided backup script located at /usr/local/sbin/mysqlbackup.sh. It's then up to the user to run or schedule the script. To make an immediate backup, simply invoke:

$ mysqlbackup.sh

If we extract the actual mysqldump command based on the above, here is what it looks like:

$ mysqldump --defaults-extra-file=/tmp/backup.NYg0TR --opt --flush-logs --single-transaction --events --set-gtid-purged=OFF --all-databases

For those who want to use other backup tools like Percona Xtrabackup, MariaDB Backup (MariaDB only) or MySQL Enterprise Backup, the module provides the following private classes:

  • mysql::backup::xtrabackup (Percona Xtrabackup and MariaDB Backup)
  • mysql::backup::mysqlbackup (MySQL Enterprise Backup)

Example declaration with Percona Xtrabackup:

  class { 'mysql::backup::xtrabackup':
    xtrabackup_package_name => 'percona-xtrabackup',
    backupuser     => 'xtrabackup',
    backuppassword => 'passw0rd',
    backupdir      => '/home/xtrabackup',
    backupdirowner => 'mysql',
    backupdirgroup => 'mysql',
    backupdirmode  => '755',
    backupcompress => true,
    backuprotate   => 15,
    include_routines  => true,
    time              => ['23','30'], #backup starts at 11:30PM
    include_triggers  => true,
    maxallowedpacket  => '64M',
    incremental_backups => true
  }

The above will schedule two backups, one full backup every Sunday at 11:30 PM and one incremental backup every day except Sunday at the same time, as shown by the cron job output after the above manifest is applied:

(db1.local)$ crontab -l
# Puppet Name: xtrabackup-weekly
30 23 * * 0 /usr/local/sbin/xtrabackup.sh --target-dir=/home/backup/mysql/xtrabackup --backup
# Puppet Name: xtrabackup-daily
30 23 * * 1-6 /usr/local/sbin/xtrabackup.sh --incremental-basedir=/home/backup/mysql/xtrabackup --target-dir=/home/backup/mysql/xtrabackup/`date +%F_%H-%M-%S` --backup

For more details and options available for this class (and other classes), check out the option reference here.

For the restoration aspect, the module only supports restoration from mysqldump backups, by importing the SQL file directly into the database using the mysql::db class, for example:

mysql::db { 'mydb':
  user     => 'myuser',
  password => 'mypass',
  host     => 'localhost',
  grant    => ['ALL PRIVILEGES'],
  sql      => '/home/backup/mysql/mydb/backup.gz',
  import_cat_cmd => 'zcat',
  import_timeout => 900
}

The SQL file will be loaded only once and not on every run, unless enforce_sql => true is used.

Configuration Options

In this example, we used manage_config_file => true with override_options to structure our configuration lines, which are later pushed out by Puppet. Any modification in the manifest file will only be reflected in the content of the target MySQL configuration file. This module will neither load the configuration into runtime nor restart the MySQL service after pushing the changes into the configuration file. It is the sysadmin's responsibility to restart the service in order to activate the changes.

To add custom MySQL configuration, we can place additional files into the "includedir", which defaults to /etc/mysql/conf.d. This allows us to override settings or add additional ones, which is helpful if you don't use override_options in the mysql::server class. Making use of a Puppet template is highly recommended here. Place the custom configuration file under the module template directory (default: /etc/puppetlabs/code/environments/production/modules/mysql/templates) and then add the following lines in the manifest:

# Loads /etc/puppetlabs/code/environments/production/modules/mysql/templates/my-custom-config.cnf.erb into /etc/mysql/conf.d/my-custom-config.cnf

file { '/etc/mysql/conf.d/my-custom-config.cnf':
  ensure  => file,
  content => template('mysql/my-custom-config.cnf.erb')
}

To implement version specific parameters, use the version directive, for example [mysqld-5.5]. This allows one config for different versions of MySQL.
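
As a small illustrative snippet (the option names are picked for the example), a shared include file could carry settings that only a server of the matching version will read:

# /etc/mysql/conf.d/version-specific.cnf
[mysqld-5.5]
query_cache_size = 64M

[mysqld-5.7]
innodb_buffer_pool_instances = 4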

Puppet vs ClusterControl

Did you know that you can also automate the MySQL or MariaDB replication deployment by using ClusterControl? You can use ClusterControl Puppet module to install it, or simply by downloading it from our website.

When compared to ClusterControl, you can expect the following differences:

  • A bit of a learning curve to understand Puppet syntax, formatting and structure before you can write manifests.
  • Manifests must be tested regularly. It's very common to get a compilation error, especially when the catalog is applied for the first time.
  • Puppet presumes the code to be idempotent. The test/check/verify conditions fall under the author's responsibility, to avoid messing up a running system.
  • Puppet requires an agent on the managed node.
  • Backward incompatibility. Some old modules would not run correctly on the new version.
  • Database/host monitoring has to be set up separately.

ClusterControl’s deployment wizard guides the deployment process:

Alternatively, you may use the ClusterControl command line interface called "s9s" to achieve similar results. The following command creates a three-node MySQL replication cluster (provided passwordless SSH to all nodes has been configured beforehand):

$ s9s cluster --create \
  --cluster-type=mysqlreplication \
  --nodes="192.168.0.41?master;192.168.0.42?slave;192.168.0.43?slave" \
  --vendor=oracle \
  --cluster-name='MySQL Replication 8.0' \
  --provider-version=8.0 \
  --db-admin='root' \
  --db-admin-passwd='$ecR3t^word' \
  --log

The following MySQL/MariaDB replication setups are supported:

  • Master-slave replication (file/position-based)
  • Master-slave replication with GTID (MySQL/Percona)
  • Master-slave replication with MariaDB GTID
  • Master-master replication (semi-sync/async)
  • Master-slave chain replication (semi-sync/async)

Post deployment, nodes/clusters can be monitored and fully managed by ClusterControl, including automatic failure detection, master failover, slave promotion, automatic recovery, backup management, configuration management and so on. All of these are bundled together in one product. The community edition (free forever!) offers deployment and monitoring. On average, your database cluster will be up and running within 30 minutes. What it needs is only passwordless SSH to the target nodes.

In the next part, we are going to walk you through Galera Cluster deployment using the same Puppet module. Stay tuned!

What's New in MariaDB 10.4


MariaDB 10.4 is the current development branch of MariaDB. Recently, on the 21st of May, the third Release Candidate (10.4.5) was released, bringing us closer to the official release. That's why we thought it might be a good idea to take a look at the new 10.4 features. We will also share some thoughts on a recent blog post published by MariaDB Corporation. For information about the release itself, you can find all the details in the changelog of MariaDB 10.4.0.

Performance changes

Unicode character sets are typically slower than charsets like latin1, mostly due to their size. MySQL 8.0 brought significant improvements in this area, and MariaDB 10.4 should also be noticeably faster than 10.3 in this regard. It is quite an important improvement - people really love to use emojis, which require UTF8 to be enabled. Some work has been done in the optimizer - MariaDB 10.4 should work better for IN() subqueries as it is now possible to push conditions into materialized subqueries.
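
As a rough, made-up example of the query shape that benefits: when the IN() subquery is materialized, a condition from the outer WHERE clause can now also be applied inside the materialized subquery, so the temporary table the optimizer builds is smaller (whether the pushdown actually fires depends on the optimizer settings and the exact query):

SELECT *
FROM orders
WHERE (customer_id, total) IN (SELECT customer_id, MAX(amount)
                               FROM payments
                               GROUP BY customer_id)
  AND total > 1000;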

Starting and stopping InnoDB can take a while, depending on the amount of data in redo logs. MariaDB 10.4 will improve startup, shutdown and purging. Such improvements are especially important given the popularity of hot backup tools like mariabackup and xtrabackup. Those tools, in the end, go through the InnoDB startup process from an unclean shutdown when they apply redo logs, therefore every improvement in that area should reduce the time needed to restore backups.

InnoDB changes

MariaDB 10.4 has received an instant DROP COLUMN operation. It is now also possible to reorder the columns in a table without the need to rebuild it. We can't stress enough how important this is. You may wonder which operations are the most common in a production environment. We would say it is adding or removing an index, closely followed by operations on columns - adding a new column or removing an existing one. So far the most common approach has been to use external tools to do the job: pt-online-schema-change or, more recently, gh-ost. Both have their limitations (gh-ost doesn't work for Galera Cluster, for example) which may make it impossible to use them on your system. Especially tricky are foreign keys. With instant DROP COLUMN (instant ADD COLUMN is already available), a big part of schema changes can be performed ad hoc, without the detailed scheduling and planning that is required now. It's important to keep in mind that instant changes are what we want to have. There are non-blocking schema changes, like creating an index, but such operations pose a serious challenge when replication is used, as they induce replication lag. Thus, even though the operation could have been executed on a live system, we prefer to use workarounds like pt-online-schema-change to keep better control over the process.
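
In practice (using a made-up table and column), the operation is a regular ALTER where you can also request the instant algorithm explicitly, so the statement errors out instead of silently rebuilding the table if an instant change is not possible:

ALTER TABLE orders DROP COLUMN legacy_flag, ALGORITHM=INSTANT;
ALTER TABLE orders ADD COLUMN note VARCHAR(255), ALGORITHM=INSTANT;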

This is not the only improvement in how schema changes are executed. MariaDB 10.4 will benefit from faster extension of VARCHAR columns, additionally character set and collation changes on non-indexed columns will be instant.

General changes

One of the biggest changes is in user management. The mysql.host table will no longer be created and the mysql.user table is deprecated. User accounts and global privileges will be kept in the mysql.global_priv table. This is potentially a serious change for all the tools (including ClusterControl) which have an option to manage MySQL and MariaDB users - new cases will have to be written to cover user management in MariaDB 10.4 and onwards. While we acknowledge that changes are needed, this definitely does not help to maintain tools for both MariaDB and MySQL, making the tooling landscape even more divided than it already is. Talking about users, MariaDB 10.4 comes with an option for expiring user passwords. This is definitely a step in the right direction - it helps to enforce good practices regarding password management.
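
For example, password expiry can be set per account or enforced globally; with a made-up account name, the 10.4 syntax should look roughly like this:

CREATE USER 'app'@'10.0.0.%' IDENTIFIED BY 'S3cur3Pass!' PASSWORD EXPIRE INTERVAL 90 DAY;
SET GLOBAL default_password_lifetime = 180;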

Even though we will cover it in a separate blog in more detail, we have to mention here support for Galera 26.4 - MariaDB 10.4 will benefit from a new Galera version with features like streaming replication or improved SST thanks to backup locks.

Finally, in MariaDB 10.4 you can set sql_mode=MSSQL. This is an initial implementation, but sql_mode=ORACLE was also an initial implementation at some point. This shows MariaDB's focus on enterprise customers - if Oracle customers decide to migrate, it is quite likely that MariaDB adoption among Microsoft SQL Server users will also grow as more features are added and migration becomes less of an issue.

MariaDB being a fork

Quite recently we saw a blog post explaining MariaDB's stance on InnoDB changes and compatibility. The gist is that MariaDB will no longer merge InnoDB features from MySQL; the focus will be on the stability and performance improvements done by MariaDB. This basically means that MariaDB will become incompatible with MySQL. Even if you could do a binary upgrade in the past, this will not be possible in the future. Even right now it can be tricky to execute. This increases the importance of tools like mydumper/myloader, as a logical backup will be the only way for migration. On the positive side, MariaDB will be able to own the stability of its fork of InnoDB - they will not have to deal with issues introduced by upstream developers, therefore we can expect fewer bugs to be introduced.

Performance-wise we have to wait for benchmarks, but given the historical data, we can assume MariaDB will be slower than MySQL. In past benchmarks, what we typically saw is that a performance increase for MariaDB kicked in when a more recent InnoDB version had been integrated. This will no longer be the case, which makes us wonder how MariaDB will fare in performance comparisons from now on, and whether the improvements introduced by MariaDB will be enough to keep up with MySQL 8.0 and later versions.

For us users, all of this means that MariaDB 10.4 should be more stable than the previous releases. It also means that eventually we will have to learn internals of two different storage engines - especially if we care about the performance. This is far from ideal but that’s the way it is. Tools will have to be designed to work with one or the other version of InnoDB (or additional work will have to be added to support both MySQL and MariaDB). We will keep an eye on how this will progress. When you think of it, it is not such a surprising move - MariaDB always had to take its time to integrate with more recent InnoDB version. With more and more incompatible features being added to MariaDB and huge changes introduced in MySQL 8.0, it makes sense to focus on developing new functionality rather than on porting incompatible InnoDB from upstream MySQL.

We hope this short blog post gave you insight into changes that will hit the production systems when going to MariaDB 10.4.
