
What is MariaDB Enterprise Cluster?


MariaDB Enterprise Cluster is a subscription-based, highly available database solution from MariaDB Corporation, managed under an Enterprise Lifecycle. There are three aspects of the Enterprise Lifecycle provided by MariaDB: Enterprise Builds, Enterprise Releases, and Enterprise Support.

Enterprise Builds ensure you get the highest quality software, with optimized default parameters and prioritized bug fixes available to subscription customers.

Enterprise Releases give you predictable patch and update releases on a set schedule.

Enterprise Support provides the user with customer support, professional services, training, and documentation.

The MariaDB Enterprise Cluster consists of MariaDB Enterprise Server with Galera Cluster for redundancy, and MariaDB MaxScale for load balancing.

MariaDB Enterprise Server & Cluster

MariaDB Enterprise Cluster comes with an enterprise-grade database server called MariaDB Enterprise Server. It provides enterprise features such as:

  • MariaDB Enterprise Audit, a comprehensive audit plugin that provides detailed information about connections and database changes.
  • MariaDB Enterprise Backup, an enhanced version of MariaDB Backup that allows writes and schema changes while a backup is running. DDL blocking is reduced through backup stages and DDL logging.

Besides the enterprise features, there are standard features you might already know from MariaDB, for example: SQL-based account locking, password expiration, bitemporal tables, and automatic account locking after failed login attempts.
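For illustration, here is how a few of those standard features look in SQL (a sketch assuming MariaDB 10.4 or later; the user name is hypothetical):

ALTER USER 'app'@'%' ACCOUNT LOCK;
ALTER USER 'app'@'%' PASSWORD EXPIRE INTERVAL 90 DAY;
-- automatic account lock: block further attempts after repeated failed logins
SET GLOBAL max_password_errors = 5;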

MariaDB Enterprise Cluster and Galera Cluster

MariaDB Enterprise Cluster uses Galera Cluster for MariaDB, enhanced for enterprise use. Galera Cluster is a database clustering solution that provides synchronous multi-master replication between the nodes, synchronizing data to achieve redundancy and high availability.

Synchronous replication in Galera Cluster is certification-based, relying on group communication and transaction ordering. A transaction executes on a single node; at commit time, a certification process runs to enforce global consistency. A broadcast service establishes a global total order between transactions to achieve this coordination.

Certification-based replication requires certain database capabilities in order to work:

  • Transactional database: the database must be transactional, able to roll back uncommitted transactions.
  • Atomic changes: a transaction's changes must occur in the database completely or not at all.
  • Global ordering: replication events must be globally ordered, so that transactions are applied on all instances in the same order.
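When certification fails for a transaction, it is rolled back and the client typically sees a deadlock error. You can observe how often that happens on a node via a standard Galera status counter:

MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'wsrep_local_cert_failures';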

MariaDB Enterprise Cluster and MariaDB MaxScale

MariaDB Enterprise Cluster also comes with MariaDB MaxScale as a database proxy, which provides a highly available, scalable environment. Other popular proxies used by MySQL and MariaDB users include HAProxy and ProxySQL.

MaxScale has several features that benefit a scaling environment:

Automatic Failover

MaxScale monitors database server availability and automatically triggers a failover for service resiliency if a crash happens. In MariaDB Enterprise Cluster, where any node can accept reads and writes, MaxScale is used to minimize the impact of database failures. In addition, MaxScale can also split read/write traffic.

Traffic Control

MaxScale has several traffic control features. Query throttling lets you set a maximum threshold of queries per second, while the SQL firewall can restrict data access and block queries matching patterns defined in rules. Authentication support includes PAM and Kerberos.
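As a rough sketch, both of these are configured as MaxScale filter objects that you then attach to a service; the module names are real, but the thresholds and rules path below are purely illustrative:

[Query-Throttle]
type=filter
module=throttlefilter
# throttle sessions that exceed 500 queries per second
max_qps=500
# keep the session limited for 60 seconds
throttling_duration=60000

[SQL-Firewall]
type=filter
module=dbfwfilter
# pattern-based allow/deny rules are defined in a separate rules file
rules=/etc/maxscale.modules.d/rules.txt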

Load Balancing 

MaxScale load balances traffic across your database nodes and can be used to scale out your database by splitting read/write traffic between the nodes.

There are also some improvements in the latest MaxScale (version 2.4), such as a Change Data Capture (CDC) adapter, connection attempt throttling, smart query routing, and ClustrixDB support.

We hope this short blog post gives you an understanding of what is included in MariaDB Enterprise Cluster.


Exploring Storage Engine Options for MariaDB


MariaDB Server was originally derived from MySQL and has therefore inherited its pluggable storage engine architecture. Different storage engines have different characteristics in terms of performance, but also features and possibilities. This allows users to pick the right tool for the job instead of using the same storage engine no matter what the purpose of the data is, what the storage requirements are, and how the data should be accessed. In this blog post we would like to look at the options available in MariaDB and discuss potential use cases for the different storage engines.

What is a Storage Engine?

First, though, what is a storage engine? MariaDB consists of multiple layers that operate together. One of them parses SQL; MariaDB then reaches out for the data using a common API. Under the hood sits a storage engine that holds the data, reacts to data requests, extracts the rows, and makes them available to MariaDB.

In short, MariaDB sends a request for a row and it is all up to the storage engine to retrieve it and send it back. MariaDB does not care how exactly the row is stored or how it is going to be retrieved; it is all up to the implementation within the storage engine. Storage engines may also implement different features. Transactions, too, are handled entirely on the storage engine's side, which is why some engines support transactions and some do not. With this architecture it is possible to write different storage engines, dedicated to solving different problems.

Storage Engines in MariaDB Server

MariaDB comes with a set of storage engines. You can check which ones are available through a simple command:

MariaDB [(none)]> SHOW STORAGE ENGINES;
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
| Engine             | Support | Comment                                                                                         | Transactions | XA   | Savepoints |
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
| MRG_MyISAM         | YES     | Collection of identical MyISAM tables                                                           | NO           | NO   | NO         |
| CSV                | YES     | Stores tables as CSV files                                                                      | NO           | NO   | NO         |
| Aria               | YES     | Crash-safe tables with MyISAM heritage. Used for internal temporary tables and privilege tables | NO           | NO   | NO         |
| SEQUENCE           | YES     | Generated tables filled with sequential values                                                  | YES          | NO   | YES        |
| MEMORY             | YES     | Hash based, stored in memory, useful for temporary tables                                       | NO           | NO   | NO         |
| MyISAM             | YES     | Non-transactional engine with good performance and small data footprint                         | NO           | NO   | NO         |
| PERFORMANCE_SCHEMA | YES     | Performance Schema                                                                              | NO           | NO   | NO         |
| InnoDB             | DEFAULT | Supports transactions, row-level locking, foreign keys and encryption for tables                | YES          | YES  | YES        |
+--------------------+---------+-------------------------------------------------------------------------------------------------+--------------+------+------------+
8 rows in set (0.000 sec)

As you can see, there are quite a few of them; we will cover the most important ones.

InnoDB

InnoDB, obviously, is THE storage engine. Transactional, built to deal with OLTP traffic, it can provide really great performance. It is the default engine used in MariaDB and, unless you know what you are doing, you probably want to stick with it for your database.

MyISAM

MyISAM is one of the "original" storage engines available in MySQL and then MariaDB. It is not transactional, which makes it a poor fit for replication setups and, well, most other environments as well. It is still a very fast engine, especially for index access, making it suitable for read-only workloads that won't be affected by the locking of INSERTs and the overall fragility of MyISAM.

Aria

Aria is an engine created for MariaDB as a replacement for MyISAM. It is not transactional, but it is crash-safe, making it far more reliable. It is currently used for system and temporary tables, but it can also be used instead of MyISAM for workloads requiring fast, read-only access to data.

Memory

This is an all-in-memory engine that is typically used for temporary in-memory tables. It is not persistent, but it might work for some read-only workloads.

CSV

This storage engine is designed to store data in a file as comma-separated values. It is not the most widely used storage engine; it's very specialized, but it can still be used to easily move data from MariaDB into other database software, Excel, or similar tools.
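The engine is chosen per table, so mixing them within one schema is easy; a quick sketch (table names are illustrative):

-- create tables with explicit storage engines
CREATE TABLE lookup_cache (id INT, payload TEXT) ENGINE=Aria;
-- CSV tables require all columns to be NOT NULL
CREATE TABLE export_feed (id INT NOT NULL, payload TEXT NOT NULL) ENGINE=CSV;

-- verify which engine each table in the current schema uses
SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA = DATABASE();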

Storage Engines in MariaDB Enterprise Server

MariaDB Enterprise Server comes with a couple of additional storage engines beyond what is available in the community edition. Let's take a look at them as well.

ColumnStore

This is a dedicated storage engine for analytical workloads. Thanks to its specific, columnar way of storing data, it is faster at retrieving the large volumes of data frequently needed for reporting. This might be the storage engine of choice for OLAP (OnLine Analytical Processing) workloads.

S3

The S3 engine allows you to access data located in S3. It is a non-transactional engine intended to let users archive data in S3; after the table is created, it is available read-only.
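Usage is a simple ALTER, assuming the engine is enabled and the connection settings (s3_bucket, s3_access_key, s3_secret_key, s3_region) are already configured on the server; the table name is illustrative:

-- archive a cold table to S3; it becomes read-only afterwards
ALTER TABLE sales_2019 ENGINE=S3;

-- bring it back to InnoDB if it must be writable again
ALTER TABLE sales_2019 ENGINE=InnoDB;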

Spider

The Spider engine lets you connect multiple MariaDB databases across the network, creating sharded storage. It is transactional and makes it easier for users to scale out by splitting the data across numerous MariaDB Enterprise Servers, distributing the traffic and workload among them.

MyRocks

MyRocks is a storage engine developed at Facebook; it is intended to reduce write amplification and minimize the wear of SSD drives. It is a transactional engine that should handle OLTP workloads quite well, especially workloads typical of social media websites. MyRocks comes with pretty good compression, better than InnoDB's, which can significantly reduce storage expenses if the dataset becomes too large for InnoDB to handle properly.

Conclusion

As you can see, there are numerous options provided by both MariaDB Enterprise and Community Server for how data can be stored. There are storage engines that excel at read-only workloads, OLAP, or large datasets. It is up to the user to pick a good fit. Keep in mind that, when in doubt, you can always stick with InnoDB, which provides quite good performance in general and should be more than enough for the majority of cases. It is for the edge cases that you may need to look for something more suitable.

Using the MariaDB Audit Plugin for Database Security


There are different ways to keep your data safe. Practices such as controlling database access, securing configuration, and upgrading your system are all part of database security. It is even possible that you have security issues and don't realize it (until it is too late), which is why monitoring is a key piece of ensuring that if something unexpected happens, you will be able to catch it. This includes not only your system, but also your databases.

Auditing is a way to know what is happening in your database, and it is also required for many security regulations or standards (e.g. PCI - Payment Card Industry).

MariaDB Server, one of the most popular open-source database servers, has its own Audit Plugin (which also works on MySQL) to help with this auditing task. In this blog, you will see how to install and use this useful MariaDB Audit Plugin.

What is MariaDB Audit Plugin?

The Audit Plugin was developed by MariaDB to meet the requirement of recording user access, in order to be in compliance with auditing regulations.

For each client session, it records, in a log file (or syslog), who connected to the server, which queries were executed, which tables were accessed, and which server variables were changed.

It works with MariaDB, MySQL, and Percona Server. MariaDB has included the Audit Plugin by default since versions 10.0.10 and 5.5.37, and it can be installed in any version from MariaDB 5.5.20 onwards.

MariaDB Audit Plugin Installation

The plugin file (server_audit.so) is installed by default during the MariaDB installation in the plugin directory /usr/lib/mysql/plugin/:

$ ls -lah /usr/lib/mysql/plugin/ |grep server_audit
-rw-r--r-- 1 root  root  63K May  9 19:33 server_audit.so

So, you just need to add it into the MariaDB instance:

MariaDB [(none)]> INSTALL SONAME 'server_audit';
Query OK, 0 rows affected (0.003 sec)

MariaDB [(none)]> SHOW PLUGINS;
+--------------+--------+-------+-----------------+---------+
| Name         | Status | Type  | Library         | License |
+--------------+--------+-------+-----------------+---------+
| SERVER_AUDIT | ACTIVE | AUDIT | server_audit.so | GPL     |
+--------------+--------+-------+-----------------+---------+

And enable it using the SET GLOBAL command:

MariaDB [(none)]> SET GLOBAL server_audit_logging=ON;
Query OK, 0 rows affected (0.000 sec)

Or make it persistent in the my.cnf configuration file to start auditing:

[mysqld]
server_audit_logging=ON

Another way to add it into the MariaDB instance is by adding the plugin_load_add parameter in the my.cnf configuration file:

[mariadb]
plugin_load_add = server_audit

It is also recommended to set FORCE_PLUS_PERMANENT to prevent the plugin from being uninstalled:

[mariadb]
plugin_load_add = server_audit
server_audit=FORCE_PLUS_PERMANENT

Now that you have the MariaDB Audit Plugin installed, let's see how to configure it.

MariaDB Audit Plugin Configuration

To check the current configuration, look at the values of the "server_audit%" global variables by running the following command:

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE "server_audit%";
+-------------------------------+-----------------------+
| Variable_name                 | Value                 |
+-------------------------------+-----------------------+
| server_audit_events           |                       |
| server_audit_excl_users       |                       |
| server_audit_file_path        | server_audit.log      |
| server_audit_file_rotate_now  | OFF                   |
| server_audit_file_rotate_size | 1000000               |
| server_audit_file_rotations   | 9                     |
| server_audit_incl_users       |                       |
| server_audit_logging          | OFF                   |
| server_audit_mode             | 0                     |
| server_audit_output_type      | file                  |
| server_audit_query_log_limit  | 1024                  |
| server_audit_syslog_facility  | LOG_USER              |
| server_audit_syslog_ident     | mysql-server_auditing |
| server_audit_syslog_info      |                       |
| server_audit_syslog_priority  | LOG_INFO              |
+-------------------------------+-----------------------+
15 rows in set (0.001 sec)

You can modify these variables using the SET GLOBAL command or make them persistent in the my.cnf configuration file under the [mysqld] section.

Let’s describe some of the most important variables:

  • server_audit_logging:  Enables audit logging.
  • server_audit_events: Specifies the events that you want to record. By default, the value is empty, which means that all events are recorded. The options are CONNECT, QUERY, and TABLE.
  • server_audit_excl_users, server_audit_incl_users: These variables specify which users’ activity should be excluded or included in the audit log file. By default, all users’ activity is recorded.
  • server_audit_output_type: By default auditing output is sent to a file. The other option is syslog, meaning all entries go to the syslog facility.
  • server_audit_syslog_facility, server_audit_syslog_priority: Specifies the syslog facility and the priority of the events that should go to syslog.
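For example, a minimal sketch that records only connection and query events from a single application user (the user name and rotation size are illustrative):

SET GLOBAL server_audit_events = 'CONNECT,QUERY';
SET GLOBAL server_audit_incl_users = 'app_user';
-- rotate the log file at roughly 50 MB
SET GLOBAL server_audit_file_rotate_size = 50000000;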

After configuring it, you can see the audit events in the specified log file (or syslog). Let’s see how it looks.

MariaDB Audit Plugin Log

To see the events registered by the Audit Plugin, check the specified log file (by default server_audit.log, under the data directory):

$ tail -f /var/lib/mysql/server_audit.log
20200703 19:07:04,MariaDB1,cmon,10.10.10.116,64,915239,QUERY,information_schema,'FLUSH /*!50500 SLOW */ LOGS',0
20200703 19:07:05,MariaDB1,cmon,10.10.10.116,61,915240,QUERY,information_schema,'SHOW GLOBAL STATUS',0
20200703 19:07:05,MariaDB1,cmon,10.10.10.116,64,915241,WRITE,mysql,slow_log,
20200703 19:07:05,MariaDB1,cmon,10.10.10.116,64,915241,QUERY,information_schema,'SET GLOBAL SLOW_QUERY_LOG=1',0
20200703 19:07:06,MariaDB1,cmon,10.10.10.116,61,915242,QUERY,information_schema,'SHOW GLOBAL STATUS',0
20200703 19:15:42,MariaDB1,root,localhost,124,0,CONNECT,,,0
20200703 19:15:42,MariaDB1,root,localhost,124,917042,QUERY,,'select @@version_comment limit 1',0
20200703 19:15:48,MariaDB1,root,localhost,124,0,DISCONNECT,,,0
20200703 19:57:41,MariaDB1,root,localhost,135,925831,QUERY,,'create database test1',0
20200703 19:58:05,MariaDB1,root,127.0.0.1,136,0,FAILED_CONNECT,,,1045
20200703 19:58:05,MariaDB1,root,127.0.0.1,136,0,DISCONNECT,,,0
20200703 19:58:49,MariaDB1,root,localhost,137,926073,QUERY,,'SELECT DATABASE()',0
20200703 19:58:49,MariaDB1,root,localhost,137,926075,QUERY,test1,'show databases',0
20200703 19:58:49,MariaDB1,root,localhost,137,926076,QUERY,test1,'show tables',0
20200703 19:59:20,MariaDB1,root,localhost,137,926182,CREATE,test1,t1,
20200703 19:59:20,MariaDB1,root,localhost,137,926182,QUERY,test1,'create table t1 (id int, message text)',0
20200703 19:59:48,MariaDB1,root,localhost,137,926287,QUERY,test1,'insert into t1 values (4,\'message 1\')',0

As you can see in the log above, you get events for database connections and the queries run over them, depending on your server_audit_events configuration.

Using the MariaDB Audit Plugin in ClusterControl

In order to avoid manual configuration, you can enable the Audit Plugin from the ClusterControl UI. For this, you only need to go to ClusterControl -> Select the MariaDB Cluster -> Security -> Audit Log:

And you will have the plugin enabled without any manual installation or configuration.

Using ClusterControl, you can also take advantage of features beyond security, such as monitoring, management, and backups, among others.

Conclusion

Auditing is required by many security regulations, and it is also useful when you want to know what happened in your database, when it happened, and who was responsible.

The MariaDB Audit Plugin is an excellent way to audit your databases without using any external tool, and it is also compatible with MySQL and Percona Server. If you want to avoid configuring it manually, you can use ClusterControl to enable the Audit Plugin easily from its UI.

What's New in MariaDB MaxScale 2.4


MaxScale 2.4 was released on December 21st, 2019, and ClusterControl 1.7.6 supports monitoring and managing up to this version. However, for deployment, ClusterControl only supports up to version 2.3, so one has to upgrade the instance manually. Fortunately, the upgrade steps are very straightforward: just download the latest version from the MariaDB MaxScale download page and run the package installation command.

The following commands show how to upgrade from an existing MaxScale 2.3 to MaxScale 2.4 on a CentOS 7 box:

$ wget https://dlm.mariadb.com/1067184/MaxScale/2.4.10/centos/7/x86_64/maxscale-2.4.10-1.centos.7.x86_64.rpm
$ systemctl stop maxscale
$ yum localinstall -y maxscale-2.4.10-1.centos.7.x86_64.rpm
$ systemctl start maxscale
$ maxscale --version
MaxScale 2.4.10

In this blog post, we are going to highlight some of the notable improvements and new features of this version, and what they look like in action. For a full list of changes in MariaDB MaxScale 2.4, check out its changelog.

Interactive Mode Command History

This is basically a small improvement with a major impact on the efficiency of MaxScale administration and monitoring tasks. The interactive mode for MaxCtrl now has a command history, which lets you easily repeat a previously executed command by pressing the up or down arrow keys. However, Ctrl+R functionality (recalling the last command matching the characters you type) is still missing.

In previous versions, one had to use the standard shell mode to make sure the commands were captured in the .bash_history file.

GTID Monitoring for galeramon

This is a good enhancement for those who are running on Galera Cluster with geographical redundancy via asynchronous replication, also known as cluster-to-cluster replication, or MariaDB Galera Cluster replication over MariaDB Replication.

In MaxScale 2.3 and older, this is what it looks like if you have enabled master-slave replication between MariaDB Clusters:

[Screenshot: MaxScale 2.3 'list servers' output for a Galera cluster with cluster-to-cluster replication]

For MaxScale 2.4, it is now looking like this (pay attention to Galera1's row):

[Screenshot: MaxScale 2.4 'list servers' output; Galera1's row now shows the replication state]

It's now easier to see the replication state for all nodes from MaxScale, without the need to check on individual nodes repeatedly.

SmartRouter

This is one of the major new features in MaxScale 2.4, where MaxScale is now smart enough to learn which backend MariaDB server is best suited to process a query. SmartRouter keeps track of the performance, or execution time, of queries against the clusters. Measurements are stored with the canonical form of a query as the key. The canonical form of a query is the SQL with all user-defined constants replaced with question marks, for example:

UPDATE `money_in` SET `accountholdername` = ? , `modifiedon` = ? , `status` = ? , `modifiedby` = ? WHERE `id` = ? 

This is a very useful feature if you are running MariaDB in a multi-site geographical replication or with a mix of MariaDB storage engines in one replication chain, for example a dedicated slave to handle transactional workloads (OLTP) with the InnoDB storage engine and another dedicated slave to handle analytical workloads (OLAP) with the ColumnStore storage engine.

Suppose we have two sites - Sydney and Singapore - as illustrated in the following diagram:

[Diagram: two sites - Singapore with a MariaDB master and slave, Sydney with a read-only slave - each with a local MaxScale instance]

The primary site is located in Singapore and has a MariaDB master and a slave, while another read-only slave is located in Sydney. The application connects to the MaxScale instance located in its respective country with the following port settings:

  • Read-write split: 3306
  • Round robin: 3307
  • Smart router: 3308

Our SmartRouter service and listener definitions are:

[SmartQuery]
type=service
router=smartrouter
servers=DB_1,DB_2,DB_5
master=DB_1
user=maxscale
password=******
[SmartQuery-Listener]
type = listener
service = SmartQuery
protocol = mariadbclient
port = 3308

Restart MaxScale and start sending read-only queries to both MaxScale nodes located in Singapore and Sydney. If the query is processed by the round-robin router (port 3307), we can see the query being routed based on the round-robin algorithm:

(app)$ mysql -usbtest -p -h maxscale_sydney -P3307 -e 'SELECT COUNT(id),@@hostname FROM sbtest.sbtest1'
+-----------+--------------------+
| count(id) | @@hostname         |
+-----------+--------------------+
|   1000000 | mariadb_singapore2 |
+-----------+--------------------+

From the above, we can tell that Sydney's MaxScale forwarded the query to our Singapore slave, which is not the best routing option per se.

With SmartRouter listening on port 3308, we see the query routed to the nearest slave in Sydney:

(app)$ mysql -usbtest -p -h maxscale_sydney -P3308 -e 'SELECT COUNT(id),@@hostname FROM sbtest.sbtest1'
+-----------+-----------------+
| count(id) | @@hostname      |
+-----------+-----------------+
|   1000000 | mariadb_sydney1 |
+-----------+-----------------+

And if the same query is executed in our Singapore site, it will be routed to the MariaDB slave located in Singapore:

(app)$ mysql -usbtest -p -h maxscale_singapore -P3308 -e 'SELECT COUNT(id),@@hostname FROM sbtest.sbtest1'
+-----------+--------------------+
| count(id) | @@hostname         |
+-----------+--------------------+
|   1000000 | mariadb_singapore2 |
+-----------+--------------------+

There is a catch, though. When SmartRouter sees a read query whose canonical form has not been seen before, it sends the query to all clusters. The first cluster to respond is designated as the best one for that canonical form, and once the first response is received, the other in-flight queries are canceled. The response is sent to the client once all clusters have responded to the query or the cancellation.

This means that, in order to track a canonical (normalized) query and measure its performance, you will probably see the very first execution of the query fail, for example:

(app)$ mysql -usbtest -p -h maxscale_sydney -P3308 -e 'SELECT COUNT(id),@@hostname FROM sbtest.sbtest1'
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query

From the general log in MariaDB Sydney, we can tell that the first query (ID 74) was executed successfully (connect, query and quit), despite the "Lost connection" error from MaxScale:

  74 Connect  sbtest@3.25.143.151 as anonymous on 
  74 Query    SELECT COUNT(id),@@hostname FROM sbtest.sbtest1
  74 Quit

The identical subsequent query, however, was processed correctly and returned the correct response:

(app)$ mysql -usbtest -p -h maxscale_sydney -P3308 -e 'SELECT COUNT(id),@@hostname FROM sbtest.sbtest1'
+-----------+------------------------+
| count(id) | @@hostname             |
+-----------+------------------------+
|   1000000 | mariadb_sydney.cluster |
+-----------+------------------------+

Looking again at the general log in MariaDB Sydney (ID 75), the same processing events happened as for the first query:

  75 Connect  sbtest@3.25.143.151 as anonymous on 
  75 Query    SELECT COUNT(id),@@hostname FROM sbtest.sbtest1
  75 Quit

From this observation, we can conclude that occasionally, MaxScale has to fail the first query in order to measure performance and become smarter for subsequent identical queries. Your application must be able to handle this "first error" properly, for example by retrying the transaction once more before returning an error to the client.
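A hypothetical shell wrapper illustrating that retry logic around the same query used above:

for attempt in 1 2; do
    # tolerate the one-off "Lost connection" error that can occur while
    # SmartRouter measures a new canonical query
    mysql -usbtest -p"${PASS}" -h maxscale_sydney -P3308 \
      -e 'SELECT COUNT(id),@@hostname FROM sbtest.sbtest1' && break
    echo "Attempt ${attempt} failed, retrying..." >&2
done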

UNIX Socket for Server

There are multiple ways to connect to a running MySQL or MariaDB server: standard TCP/IP networking with a host IP address and port (remote connections), named pipes/shared memory on Windows, or UNIX socket files on Unix-based systems. The UNIX socket file is a special kind of file that facilitates communication between different processes, in this case the MySQL client and the server. The socket file is a file-based communication channel, so the socket cannot be accessed from another machine. It provides a faster connection than TCP/IP (no network overhead) and a more secure approach, because it can only be used when connecting to a service or process on the same computer.

Supposing MaxScale is installed on the MariaDB server itself, we can use the UNIX socket file instead. Under the server section, remove or comment out the "address" line and add the socket parameter with the location of the socket file:

[DB_2]
type=server
protocol=mariadbbackend
#address=54.255.133.39
socket=/var/lib/mysql/mysql.sock

Before applying the above changes, we have to create a MaxScale user connecting from localhost. On the master server:

MariaDB> CREATE USER 'maxscale'@'localhost' IDENTIFIED BY 'maxscalep4ss';
MariaDB> GRANT SELECT ON mysql.user TO 'maxscale'@'localhost';
MariaDB> GRANT SELECT ON mysql.db TO 'maxscale'@'localhost';
MariaDB> GRANT SELECT ON mysql.tables_priv TO 'maxscale'@'localhost';
MariaDB> GRANT SHOW DATABASES ON *.* TO 'maxscale'@'localhost';

After a restart, MaxScale will show the UNIX socket path instead of the actual address, and the server listing will look like this:

[Screenshot: 'list servers' output showing the UNIX socket path instead of an address for DB_2]

As you can see, the state and GTID information are retrieved correctly through the socket connection. Note that DB_2 is still listening on port 3306 for remote connections; MaxScale simply uses the socket to connect to this server for monitoring.

Using a socket is always better, as it only allows local connections and is more secure. You could also close your MariaDB server off from the network (e.g., --skip-networking) and let MaxScale handle the "external" connections, forwarding them to the MariaDB server via the UNIX socket file.
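A minimal sketch of that locked-down configuration in my.cnf (assuming the same socket path as above):

[mariadb]
# refuse all TCP/IP connections; only local socket clients such as MaxScale can connect
skip_networking = 1
socket = /var/lib/mysql/mysql.sock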

Server Draining

In MaxScale 2.4, backend servers can be drained, which means existing connections can continue to be used, but no new connections will be created to the server. With the drain feature, we can perform graceful maintenance activity without affecting the user experience on the application side. Note that draining a server can take some time, depending on the running queries that need to be gracefully closed.

To drain a server, use the following command:

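Assuming the server object is named DB_2, as in the earlier socket example:

maxctrl set server DB_2 drain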

The after-effect could be one of the following states:

  • Draining - The server is being drained.
  • Drained - The server has been drained; the number of connections to it has dropped to 0.
  • Maintenance - The server is under maintenance.

After a server has been drained, the state of the MariaDB server from MaxScale's point of view is "Maintenance":

[Screenshot: drained server displayed in the Maintenance state]

When a server is in maintenance mode, no connections will be created to it and existing connections will be closed.

Conclusion

MaxScale 2.4 brings a lot of improvements and changes over the previous version, and it's the best database proxy to handle MariaDB servers and all of their components.

Deploying MariaDB Replication for High Availability


MariaDB Server offers asynchronous and synchronous replication. It can be set up with multi-source replication or in a multi-master setup.

For a read- and write-intensive application, a master-slave setup is common, but the topology can differ based on the underlying stack needed to build a highly available database environment.

Having a master-slave replication setup might not satisfy your needs, especially in a production environment. MariaDB Server alone (a master-slave setup) is not enough to offer high availability, as it still has a single point of failure (SPOF).

MariaDB introduced an enterprise product (MariaDB Platform) to address this high availability issue. It includes various components: an enterprise version of MariaDB, MariaDB ColumnStore, MaxScale, and lightweight MariaDB Connectors. Compared to other vendors with the same kind of enterprise offering, it could be a cost-effective option; however, not everyone needs this level of complexity.

In this blog, we'll show you how to run MariaDB Server replication in a highly available environment, with the option of using either all free tools or our cost-efficient management software to run and monitor your MariaDB Server infrastructure.

MariaDB High-Availability Topology Setup

A usual master-slave topology with MariaDB Server uses an asynchronous or semi-synchronous approach, with just one master receiving writes, which then replicates its changes to the slaves, as in the diagram below:

[Diagram: master-slave replication with a single master receiving writes]

But again, this doesn't provide high availability on its own and has a single point of failure: if the master dies, your application can no longer function. We need to add to the stack an auto-failover mechanism to avoid the SPOF, plus load balancing for read-write splitting or round-robin distribution. We'll end up with the following topology:

[Diagram: master-slave replication with MaxScale providing failover and load balancing]

This topology offers more safety against a SPOF. MaxScale does the read-write splitting over the database nodes, directing writes to your master and reads to the slaves, and it is a perfect fit for this type of setup. MaxScale also has auto-detection built in: whatever changes occur in the state of your database nodes, it will detect them and act accordingly. MaxScale is able to perform a failover or even a switchover. To learn more about its failover mechanism, read our previous blog which covers the mechanics of MariaDB MaxScale failover.

Take note that MaxScale's failover mechanism with MariaDB Monitor has its limitations. It's best applied only to an uncomplicated master-slave setup; a master-master setup is not supported. However, MaxScale has more to offer. It does not only load balance and perform read-write splits; its built-in SmartRouter sends each query to the most performant node. Although this doesn't add high availability per se, it helps keep nodes from getting stuck in traffic and prevents certain database nodes from under-performing in a way that could cause timeouts, or even a totally unavailable server, due to ongoing resource-intensive activity.

One caveat of using MaxScale: it is distributed under the BSL (Business Source License). You might want to review the FAQ before adopting this software.

Another, more convenient option can also be cost-efficient: use ClusterControl with proxies in the middle, such as HAProxy, MaxScale, or ProxySQL. The latter can be configured from a lightweight setup up to a production-level configuration that does query routing, query filtering, firewalling, and security. See the illustration below:

[Diagram: ClusterControl managing a MariaDB replication cluster behind HAProxy, MaxScale, or ProxySQL]

Sitting on top of them is ClusterControl, set up for high availability, i.e. CMON HA. The proxy layer can be either HAProxy (a very lightweight option), MaxScale (as mentioned previously), or ProxySQL, which has a more refined set of parameters if you want more flexibility and a configuration ideal for a high-scale production setup. ClusterControl handles auto-detection of the nodes' health status, especially that of the master, to determine whether a failover is required. This setup can be more self-sufficient, but it adds cost due to the number of nodes required and the use of ClusterControl auto-failover, which requires our Advanced or Enterprise license. On the other hand, it provides safety, security, and observability for your database infrastructure, and is in fact a low-cost enterprise implementation compared to other solutions available on the global market.

Deploying Your MariaDB Master-Slave Replication for High Availability

Let's assume that you have an existing master-slave setup of MariaDB. For this example, we'll use the free ClusterControl community edition, which you can install and use free of charge; it makes setup quick and easy. To start, just import your existing MariaDB Replication cluster. Check out our previous blog on how to manage MariaDB with ClusterControl. For this blog, my initial MariaDB Replication cluster looks as follows:

[Screenshot: the imported MariaDB Replication cluster in ClusterControl]

Now, let's use MaxScale here as an alternative to MariaDB Platform that also offers high availability. With ClusterControl, a few clicks are enough to set up MaxScale running on top of your existing MariaDB Replication cluster. Just go to Manage → Load Balancer → MaxScale, and provide the appropriate values as seen below:

[Screenshot: the MaxScale deployment form in ClusterControl]

Then tick the checkboxes to select which servers should be added to your MaxScale monitoring. See below:

[Screenshot: selecting the servers to include in MaxScale]

Assuming that you have more than one MaxScale node to add, just repeat the same steps.

Lastly, we'll set up Keepalived to keep our MaxScale nodes available whenever necessary. This is quick and simple using ClusterControl: again, go to Manage → Load Balancer, but this time select Keepalived:

[Screenshot: the Keepalived deployment form in ClusterControl]

As you've noticed, I've placed Keepalived along with MaxScale on the same node as my slave (192.168.10.30), while the second Keepalived runs on 192.168.10.40, along with MaxScale on that host.

The resulting topology is production-ready, providing query routing, high availability, and auto-failover, equipped with extensive monitoring and observability through ClusterControl. See below:

[Screenshot: the resulting topology view in ClusterControl]

Conclusion

Using MariaDB Server replication alone does not give you high availability. Extending it with third-party tools can make your database stack highly available without relying solely on MariaDB products or the MariaDB Platform.

There are ways to achieve this cost-effectively. Solutions available on the market such as ClusterControl make a big difference, providing speed, less hassle, and of course observability, with real-time and up-to-date information on both the health of your database cluster and the events occurring in it.

 

How to Design a Geographically Distributed MariaDB Cluster


It is very common to see databases distributed across multiple geographical locations. One scenario for this type of setup is disaster recovery, where your standby data center is located in a separate location from your main datacenter. It might also be required so that the databases are located closer to the users.

The main challenge in achieving this setup is designing the database in a way that reduces the chance of issues related to network partitioning.

MariaDB Cluster can be a good choice to build such an environment for several reasons. We would like to discuss them here and also talk a bit about what such an environment may look like.

Why Use MariaDB Cluster for Geo-Distributed Environments?

The first reason is that MariaDB Cluster can support multiple writers. This makes write routing much easier to design - you just write to the local MariaDB nodes. Of course, given synchronous replication, latency impacts write performance and you may see your writes getting slower if you spread your cluster too far geographically. After all, you can't ignore the laws of physics, and they say, as of now at least, that even the speed of light in fiber connections is limited. Any routers added on top of that will increase latency as well, even if only by a couple of milliseconds.

Second, lag handling in MariaDB Cluster. Asynchronous replication is subject to replication lag - slaves may not be up to date if they struggle to apply all the changes in time. In MariaDB Cluster this is different - flow control is a mechanism intended to keep the cluster in sync. Well, almost: in some edge cases you can still observe lag, but we are talking typically about milliseconds, a couple of seconds at most, while in asynchronous replication the sky is the limit.
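You can gauge how much flow control is throttling a given node with a standard Galera status counter, the fraction of time the node spent paused since the last FLUSH STATUS:

MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused';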

[Diagram: Galera segments - one relay node per datacenter forwards writesets over the WAN]

Third, segments. By default, MariaDB Cluster uses all-to-all communication: every writeset is sent by a node to all the other nodes in the cluster. This behavior can be changed using segments, which allow users to split a MariaDB Cluster into several parts. Each segment may contain multiple nodes and elects one of them as a relay node. Relay nodes receive writesets from other segments and redistribute them across the MariaDB nodes local to the segment. As a result, as you can see in the diagram above, it is possible to reduce the replication traffic going over the WAN threefold - just two "replicas" of the replication stream are sent over the WAN, one per remote datacenter, compared to one per slave in asynchronous replication.
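Segments are assigned per node through the Galera provider options; a minimal sketch for a node in datacenter 1 (segment IDs are arbitrary integers, one per datacenter):

[mariadb]
# all nodes within the same datacenter share one segment ID
wsrep_provider_options="gmcast.segment=1"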

Finally, where MariaDB Cluster really shines is in handling network partitioning. MariaDB Cluster constantly monitors the state of the nodes in the cluster. Every node attempts to connect with its peers and exchange the state of the cluster. If a subset of nodes is not reachable, MariaDB attempts to relay the communication, so if there is a way to reach those nodes, they will be reached.

[Diagram: a node in DC3 relays cluster communication between DC1 and DC2]

An example can be seen in the diagram above: DC1 lost connectivity with DC2, but DC2 and DC3 can connect. In this case, one of the nodes in DC3 will be used to relay data from DC1 to DC2, ensuring that intra-cluster communication can be maintained.

[Diagram: a node in DC1 with partial network loss is removed from the cluster]

MariaDB is able to take action based on the state of the cluster. It implements a quorum: a majority of the nodes have to be available for the cluster to operate. If a node gets disconnected from the cluster and cannot reach any other node, it will cease to operate.

As can be seen in the diagram above, there is a partial loss of network communication in DC1, and the affected node is removed from the cluster, ensuring that the application will not access outdated data.

[Diagram: DC1 fully cut off; DC2 and DC3 keep the majority with 6 of 9 nodes]

This is also true on a larger scale. Here, DC1 had all of its communication cut off. As a result, the whole datacenter was removed from the cluster, and none of its nodes will serve traffic. The rest of the cluster maintained a majority (6 out of 9 nodes are available) and reconfigured itself to keep the connection between DC2 and DC3. In the diagram above we assumed the writer hits a node in DC2, but please keep in mind that MariaDB is capable of running with multiple writers.
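On any node, you can check whether it still belongs to the quorum-holding component; a node on the losing side of a partition reports non-Primary:

MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';
MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';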

Designing Geographically Distributed MariaDB Cluster

We went through some of the features that make MariaDB Cluster a nice fit for geo-distributed environments; let's now focus a bit on the design. First, let's explain the environment we are working with: three remote datacenters connected via a Wide Area Network (WAN). Each datacenter will receive writes from local application servers, and reads will also be local only. This is intended to avoid unnecessary traffic crossing the WAN.

To make this blog less complicated, we won't go into the details of what the connectivity should look like. We assume some sort of properly configured, secure connection across all datacenters; VPN or other tools can be used to implement it.

We will use MaxScale as a load balancer, deployed locally in each datacenter and routing traffic only to the local nodes. Remote nodes can always be added manually, and we will explain cases where this might be a good solution. Applications can be configured to connect to one of the local MaxScale nodes in a round-robin fashion. We can also use Keepalived and a Virtual IP to route the traffic towards a single MaxScale node, as long as one MaxScale node can handle all of the traffic.

Another possible solution is to collocate MaxScale with the application nodes and configure the application to connect to the proxy on localhost. This approach works quite well under the assumption that it is unlikely for MaxScale to be unavailable while the application on the same node still works fine. Typically what we see is either a node failure or a network failure, which would affect both MaxScale and the application at the same time.

[Diagram: per-datacenter MaxScale proxy farms in front of the local MariaDB nodes]

The diagram above shows a version of the environment where MaxScale forms proxy farms: all proxy nodes share the same configuration and are load balanced using Keepalived, or simply round robin from the application across all MaxScale nodes. MaxScale is configured to distribute the workload across all MariaDB nodes in the local datacenter. One of those nodes is picked to receive the writes, while SELECTs are distributed across all nodes. Having one dedicated writer node per datacenter helps to reduce the number of possible certification conflicts, typically leading to better performance. To reduce conflicts even further we would have to start sending write traffic over the WAN connection, which is not ideal, as bandwidth utilization would significantly increase. Right now, with segments in place, only two copies of each writeset are sent across datacenters - one per remote DC.
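One way to pin the writer per datacenter is a sketch along these lines: galeramon's use_priority option makes the monitor pick the master according to each server's priority parameter instead of the lowest wsrep_local_index (object names and addresses are illustrative):

[Local-Galera-Monitor]
type=monitor
module=galeramon
servers=dc1_node1,dc1_node2,dc1_node3
user=maxscale_monitor
password=******
# choose the master by the per-server priority parameter
use_priority=true

[dc1_node1]
type=server
address=10.0.1.10
port=3306
protocol=mariadbbackend
# the lowest positive priority wins the master role
priority=1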

Conclusion

As you can see, MariaDB Cluster can easily be used to create geo-distributed clusters that can work even across the world. The limiting factor will be network latency. If it is too high, you may have to consider using separate MariaDB clusters connected using asynchronous replication.

MaxScale Basic Management Using MaxCtrl for MariaDB Cluster


In the previous blog post, we covered an introduction to MaxScale installation, upgrade, and deployment using the MaxCtrl command-line client. In this blog post, we are going to cover the MaxScale management aspects of our MariaDB Cluster.

There are a number of MaxScale components that we can manage with MaxCtrl, namely:

  1. Server management
  2. Service management
  3. Monitor management
  4. Listener management
  5. Filter management
  6. MaxScale management
  7. Logging management

In this blog post, we are going to cover the first four components, which are commonly used with MariaDB Cluster. All of the commands in this blog post are based on MaxScale 2.4.11.

Server Management

List/Show Servers

List a summary of all servers in MaxScale:

 maxctrl: list servers
┌────────────────┬────────────────┬──────┬─────────────┬─────────────────────────┬─────────────┐
│ Server         │ Address        │ Port │ Connections │ State                   │ GTID        │
├────────────────┼────────────────┼──────┼─────────────┼─────────────────────────┼─────────────┤
│ mariadbgalera1 │ 192.168.10.201 │ 3306 │ 0           │ Slave, Synced, Running  │ 100-100-203 │
├────────────────┼────────────────┼──────┼─────────────┼─────────────────────────┼─────────────┤
│ mariadbgalera2 │ 192.168.10.202 │ 3306 │ 0           │ Slave, Synced, Running  │ 100-100-203 │
├────────────────┼────────────────┼──────┼─────────────┼─────────────────────────┼─────────────┤
│ mariadbgalera3 │ 192.168.10.203 │ 3306 │ 0           │ Master, Synced, Running │ 100-100-203 │
└────────────────┴────────────────┴──────┴─────────────┴─────────────────────────┴─────────────┘

For MariaDB Cluster, the server list summarizes the node and cluster state, with the MariaDB GTID shown only if the cluster is set to replicate from another cluster via standard MariaDB Replication. The state is used by MaxScale to control the behavior of the routing algorithm:

  • Master - For a Cluster, this is considered the Write-Master.
  • Slave - If all slaves are down, but the master is still available, then the router will use the master.
  • Synced - A Cluster node which is in a synced state with the cluster.
  • Running - A server that is up and running. All servers that MariaDB MaxScale can connect to are labeled as running.

Although MariaDB Cluster is capable of handling multi-master replication, MaxScale will always pick one node to hold the Master role, which will receive all writes for readwritesplit routing. By default, the Galera Monitor chooses the node with the lowest wsrep_local_index value as the master. This also means that two MaxScale instances running on different servers will choose the same server as the master.
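You can check the value each node reports directly; the node with the lowest index gets the Master label:

MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'wsrep_local_index';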

Show all servers in more detail:

maxctrl: show servers

Create Servers

This is commonly the first thing you need to do when setting up MaxScale as a load balancer. It's common to add all of the MariaDB Cluster nodes into MaxScale and label them with object names. In this example, we label the Galera nodes using the "mariadbgalera#" format:

maxctrl: create server mariadbgalera1 192.168.0.221 3306
maxctrl: create server mariadbgalera2 192.168.0.222 3306
maxctrl: create server mariadbgalera3 192.168.0.223 3306

The server state will only be reported correctly after we have activated the monitoring module, as shown under the Monitor Management section further down.

Delete a Server

To delete a server, one has to unlink it from any services or monitors beforehand. As an example, from the following server list we want to delete mariadbgalera3 from MaxScale:

  maxctrl: list servers
┌────────────────┬────────────────┬──────┬─────────────┬─────────────────────────┬─────────────┐
│ Server         │ Address        │ Port │ Connections │ State                   │ GTID        │
├────────────────┼────────────────┼──────┼─────────────┼─────────────────────────┼─────────────┤
│ mariadbgalera1 │ 192.168.10.201 │ 3306 │ 0           │ Slave, Synced, Running  │ 100-100-203 │
├────────────────┼────────────────┼──────┼─────────────┼─────────────────────────┼─────────────┤
│ mariadbgalera2 │ 192.168.10.202 │ 3306 │ 0           │ Slave, Synced, Running  │ 100-100-203 │
├────────────────┼────────────────┼──────┼─────────────┼─────────────────────────┼─────────────┤
│ mariadbgalera3 │ 192.168.10.203 │ 3306 │ 0           │ Master, Synced, Running │ 100-100-203 │
└────────────────┴────────────────┴──────┴─────────────┴─────────────────────────┴─────────────┘

List out all monitors and see if the server is part of any monitor module:

 

 maxctrl: list monitors
 ┌─────────────────┬─────────┬────────────────────────────────────────────────┐
 │ Monitor         │ State   │ Servers                                        │
 ├─────────────────┼─────────┼────────────────────────────────────────────────┤
 │ MariaDB-Monitor │ Running │ mariadbgalera1, mariadbgalera2, mariadbgalera3 │
 └─────────────────┴─────────┴────────────────────────────────────────────────┘

Looks like mariadbgalera3 is part of MariaDB-Monitor, so we have to remove it first by using the "unlink monitor" command:

 maxctrl: unlink monitor MariaDB-Monitor mariadbgalera3
 OK

Next, list out all services to check if the corresponding server is part of any MaxScale services:

  maxctrl: list services
┌─────────────────────┬────────────────┬─────────────┬───────────────────┬────────────────────────────────────────────────┐
│ Service             │ Router         │ Connections │ Total Connections │ Servers                                        │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ Read-Write-Service  │ readwritesplit │ 1           │ 1                 │ mariadbgalera1, mariadbgalera2, mariadbgalera3 │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ Round-Robin-Service │ readconnroute  │ 1           │ 1                 │ mariadbgalera1, mariadbgalera2, mariadbgalera3 │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ Replication-Service │ binlogrouter   │ 1           │ 1                 │                                                │
└─────────────────────┴────────────────┴─────────────┴───────────────────┴────────────────────────────────────────────────┘

As you can see, mariadbgalera3 is part of the Read-Write-Service and Round-Robin-Service. Remove the server from those services by using the "unlink service" command:

 maxctrl: unlink service Read-Write-Service mariadbgalera3
OK
 maxctrl: unlink service Round-Robin-Service mariadbgalera3
OK

Finally, we can remove the server from MaxScale by using the "destroy server" command:

 maxctrl: destroy server mariadbgalera3
OK

Verify using "list servers" that we have removed mariadbgalera3 from MaxScale:

  maxctrl: list servers
┌────────────────┬────────────────┬──────┬─────────────┬─────────────────────────┬──────┐
│ Server         │ Address        │ Port │ Connections │ State                   │ GTID │
├────────────────┼────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ mariadbgalera1 │ 192.168.10.201 │ 3306 │ 0           │ Master, Synced, Running │      │
├────────────────┼────────────────┼──────┼─────────────┼─────────────────────────┼──────┤
│ mariadbgalera2 │ 192.168.10.202 │ 3306 │ 0           │ Slave, Synced, Running  │      │
└────────────────┴────────────────┴──────┴─────────────┴─────────────────────────┴──────┘

Modify Server's Parameter

To modify a server's parameters, use the "alter server" command, which takes only one key/value pair at a time. For example:

  maxctrl: alter server mariadbgalera3 priority 10
 OK

Use the "show server" command and look into the Parameters section for a list of parameters that can be changed for the "server" object:

maxctrl: show server mariadbgalera3
...

│ Parameters       │ {                                         │
│                  │     "address": "192.168.10.203",          │
│                  │     "protocol": "mariadbbackend",         │
│                  │     "port": 3306,                         │
│                  │     "extra_port": 0,                      │
│                  │     "authenticator": null,                │
│                  │     "monitoruser": null,                  │
│                  │     "monitorpw": null,                    │
│                  │     "persistpoolmax": 0,                  │
│                  │     "persistmaxtime": 0,                  │
│                  │     "proxy_protocol": false,              │
│                  │     "ssl": "false",                       │
│                  │     "ssl_cert": null,                     │
│                  │     "ssl_key": null,                      │
│                  │     "ssl_ca_cert": null,                  │
│                  │     "ssl_version": "MAX",                 │
│                  │     "ssl_cert_verify_depth": 9,           │
│                  │     "ssl_verify_peer_certificate": false, │
│                  │     "disk_space_threshold": null,         │
│                  │     "priority": "10"                      │
│                  │ }                                         │

Note that the alter command takes effect immediately: the parameter's runtime value is modified, as well as the value in the server's individual MaxScale configuration file inside /var/lib/maxscale/maxscale.cnf.d/, for persistence across restarts.

Set Server State

MaxScale allows the backend Galera servers to be temporarily excluded from the load balancing set by activating the maintenance mode. We can achieve this by using the "set server" command:

 maxctrl: set server mariadbgalera3 maintenance
OK

When looking at the state of the server, we should see this:

 maxctrl: show server mariadbgalera3
...
│ State            │ Maintenance, Running
...

When a server is in maintenance mode, no connections will be created to it and existing connections will be closed. To clear the maintenance state from the host, use the "clear server" command:

 maxctrl: clear server mariadbgalera3 maintenance
OK

Verify with "show server":

 maxctrl: show server mariadbgalera3
...
│ State            │ Slave, Synced, Running                    │
...

Monitor Management

Create a Monitor

The MaxScale monitor module for MariaDB Cluster is called galeramon. Defining a correct monitoring module is necessary so MaxScale can determine the best routing for queries depending on the state of the nodes. For example, if a Galera node is serving as a donor for a joiner node, should it be counted among the healthy nodes? In some cases, such as when the database size is small, marking a donor node as healthy (by setting the parameter available_when_donor=true in MaxScale) is not a bad plan and sometimes improves query routing performance.
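Assuming the monitor is named MariaDB-Monitor, as in the example below, that parameter can also be toggled at runtime with "alter monitor":

 maxctrl: alter monitor MariaDB-Monitor available_when_donor true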

To create a monitor, one must first create a monitoring user on the backend MariaDB servers. For Galera Cluster, if the monitoring user does not exist, just create it on one of the nodes with the following privileges (where 192.168.0.220 is the IP address of the MaxScale host):

MariaDB> CREATE USER maxscale_monitor@'192.168.0.220' IDENTIFIED BY 'MaXSc4LeP4ss';
MariaDB> GRANT SELECT ON mysql.* TO 'maxscale_monitor'@'192.168.0.220';
MariaDB> GRANT SHOW DATABASES ON *.* TO 'maxscale_monitor'@'192.168.0.220';

Use the "create monitor" command and specify a name with galeramon as the monitor module:

  maxctrl: create monitor MariaDB-Monitor galeramon servers=mariadbgalera1,mariadbgalera2,mariadbgalera3 user=maxscale_monitor password=MaXSc4LeP4ss
OK
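
Module parameters can also be passed as key=value pairs at creation time. Instead of the bare command above, we could have created the monitor with an explicit monitoring interval and donor handling (the values are illustrative):

 maxctrl: create monitor MariaDB-Monitor galeramon servers=mariadbgalera1,mariadbgalera2,mariadbgalera3 user=maxscale_monitor password=MaXSc4LeP4ss monitor_interval=2000 available_when_donor=true
OK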

Note that we did not configure a MaxScale secret, which means the user password is stored in plain text. To enable encryption, see the example in this blog post, Introduction to MaxScale Administration Using maxctrl for MariaDB Cluster, under the Adding Monitoring into MaxScale section.

List/Show Monitors

To list out all monitors:

 maxctrl: list monitors
┌─────────────────┬─────────┬────────────────────────────────────────────────┐
│ Monitor         │ State   │ Servers                                        │
├─────────────────┼─────────┼────────────────────────────────────────────────┤
│ MariaDB-Monitor │ Running │ mariadbgalera1, mariadbgalera2, mariadbgalera3 │
└─────────────────┴─────────┴────────────────────────────────────────────────┘

To get a more detailed look at the monitor, use the "show monitor" command:

 maxctrl: show monitor MariaDB-Monitor

┌─────────────────────┬───────────────────────────────────────────┐
│ Monitor             │ MariaDB-Monitor                           │
├─────────────────────┼───────────────────────────────────────────┤
│ State               │ Running                                   │
├─────────────────────┼───────────────────────────────────────────┤
│ Servers             │ mariadbgalera1                            │
│                     │ mariadbgalera2                            │
│                     │ mariadbgalera3                            │
├─────────────────────┼───────────────────────────────────────────┤
│ Parameters          │ {                                         │
│                     │     "user": "maxscale_monitor",           │
│                     │     "password": "*****",                  │
│                     │     "passwd": null,                       │
│                     │     "monitor_interval": 2000,             │
│                     │     "backend_connect_timeout": 3,         │
│                     │     "backend_read_timeout": 1,            │
│                     │     "backend_write_timeout": 2,           │
│                     │     "backend_connect_attempts": 1,        │
│                     │     "journal_max_age": 28800,             │
│                     │     "disk_space_threshold": null,         │
│                     │     "disk_space_check_interval": 0,       │
│                     │     "script": null,                       │
│                     │     "script_timeout": 90,                 │
│                     │     "events": "all",                      │
│                     │     "disable_master_failback": false,     │
│                     │     "available_when_donor": true,         │
│                     │     "disable_master_role_setting": false, │
│                     │     "root_node_as_master": false,         │
│                     │     "use_priority": false,                │
│                     │     "set_donor_nodes": false              │
│                     │ }                                         │
├─────────────────────┼───────────────────────────────────────────┤
│ Monitor Diagnostics │ {                                         │
│                     │     "disable_master_failback": false,     │
│                     │     "disable_master_role_setting": false, │
│                     │     "root_node_as_master": false,         │
│                     │     "use_priority": false,                │
│                     │     "set_donor_nodes": false              │
│                     │ }                                         │
└─────────────────────┴───────────────────────────────────────────┘

Stop/Start Monitor

Stopping a monitor will pause the monitoring of the servers. This is commonly used in conjunction with the "set server" command to manually control server states. To stop the monitoring service, use the "stop monitor" command:

 maxctrl: stop monitor MariaDB-Monitor
OK

Verify the state with "show monitor":

 maxctrl: show monitor MariaDB-Monitor
┌─────────────────────┬───────────────────────────────────────────┐
│ Monitor             │ MariaDB-Monitor                           │
├─────────────────────┼───────────────────────────────────────────┤
│ State               │ Stopped                                   │
...

To start it up again, use the "start monitor" command:

 maxctrl: start monitor MariaDB-Monitor
OK
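
As a side note, while a monitor is stopped, server states can be assigned manually with "set server". A sketch (galeramon reassigns the states automatically once the monitor is started again):

 maxctrl: stop monitor MariaDB-Monitor
OK
 maxctrl: set server mariadbgalera1 master
OK
 maxctrl: start monitor MariaDB-Monitor
OK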

Modify Monitor's Parameter

To change a parameter for this monitor, use the "alter monitor" command and specify the parameter key/value as below:

 maxctrl: alter monitor MariaDB-Monitor available_when_donor true
OK

Use the "show monitor" command and look into the Parameters section, for a list of parameters that can be changed for the galeramon module:

maxctrl: show monitor MariaDB-Monitor
...
│ Parameters          │ {                                         │
│                     │     "user": "maxscale_monitor",           │
│                     │     "password": "*****",                  │
│                     │     "monitor_interval": 2000,             │
│                     │     "backend_connect_timeout": 3,         │
│                     │     "backend_read_timeout": 1,            │
│                     │     "backend_write_timeout": 2,           │
│                     │     "backend_connect_attempts": 1,        │
│                     │     "journal_max_age": 28800,             │
│                     │     "disk_space_threshold": null,         │
│                     │     "disk_space_check_interval": 0,       │
│                     │     "script": null,                       │
│                     │     "script_timeout": 90,                 │
│                     │     "events": "all",                      │
│                     │     "disable_master_failback": false,     │
│                     │     "available_when_donor": true,         │
│                     │     "disable_master_role_setting": false, │
│                     │     "root_node_as_master": false,         │
│                     │     "use_priority": false,                │
│                     │     "set_donor_nodes": false              │
│                     │ }                                         │

Delete a Monitor

In order to delete a monitor, one has to remove all servers linked with the monitor first. For example, consider the following monitor in MaxScale:

 maxctrl: list monitors
┌─────────────────┬─────────┬────────────────────────────────────────────────┐
│ Monitor         │ State   │ Servers                                        │
├─────────────────┼─────────┼────────────────────────────────────────────────┤
│ MariaDB-Monitor │ Running │ mariadbgalera1, mariadbgalera2, mariadbgalera3 │
└─────────────────┴─────────┴────────────────────────────────────────────────┘

Remove all servers from that particular monitor:

 maxctrl: unlink monitor MariaDB-Monitor mariadbgalera1 mariadbgalera2 mariadbgalera3
OK

Our monitor now looks like this:

 maxctrl: list monitors
┌─────────────────┬─────────┬─────────┐
│ Monitor         │ State   │ Servers │
├─────────────────┼─────────┼─────────┤
│ MariaDB-Monitor │ Running │         │
└─────────────────┴─────────┴─────────┘

Only then can we delete the monitor:

 maxctrl: destroy monitor MariaDB-Monitor
OK

Add/Remove Servers into Monitor

After creating a monitor, we can use the "link monitor" command to add the Galera servers into the monitor. Use the server's name as created under Create Servers section:

 maxctrl: link monitor MariaDB-Monitor mariadbgalera1 mariadbgalera2 mariadbgalera3
OK

Similarly, to remove a server from the monitor, just use the "unlink monitor" command:

 maxctrl: unlink monitor MariaDB-Monitor mariadbgalera3
OK

Verify with "list monitors" or "show monitors" command.

Service Management

Create a Service

To create a service (router), one must have a monitoring user on the backend MariaDB servers. Commonly, one would reuse the same monitoring user that we defined for the monitor module. For Galera Cluster, if the monitoring user does not exist, just create it on one of the nodes with the following privileges:

MariaDB> CREATE USER maxscale_monitor@'192.168.0.220' IDENTIFIED BY 'MaXSc4LeP4ss';
MariaDB> GRANT SELECT ON mysql.* TO 'maxscale_monitor'@'192.168.0.220';
MariaDB> GRANT SHOW DATABASES ON *.* TO 'maxscale_monitor'@'192.168.0.220';

Where 192.168.0.220 is the IP address of the MaxScale host.

Then, specify the name of the service, the routing type together with a monitoring user for MaxScale to connect to the backend servers:

 maxctrl: create service Round-Robin-Service readconnroute user=maxscale_monitor password=******
OK

Also, you can specify additional parameters when creating the service. In this example, we would like the "master" node to be included in the round-robin balancing set for our MariaDB Galera Cluster:

 maxctrl: create service Round-Robin-Service readconnroute user=maxscale_monitor password=****** router_options=master,slave
OK

Use the "show service" command to see the supported parameters. For round-robin router, the list as follows:

  maxctrl: show service Round-Robin-Service
...
│ Parameters          │ {                                          │
│                     │     "router_options": null,                │
│                     │     "user": "maxscale_monitor",            │
│                     │     "password": "*****",                   │
│                     │     "passwd": null,                        │
│                     │     "enable_root_user": false,             │
│                     │     "max_retry_interval": 3600,            │
│                     │     "max_connections": 0,                  │
│                     │     "connection_timeout": 0,               │
│                     │     "auth_all_servers": false,             │
│                     │     "strip_db_esc": true,                  │
│                     │     "localhost_match_wildcard_host": true, │
│                     │     "version_string": null,                │
│                     │     "weightby": null,                      │
│                     │     "log_auth_warnings": true,             │
│                     │     "retry_on_failure": true,              │
│                     │     "session_track_trx_state": false,      │
│                     │     "retain_last_statements": -1,          │
│                     │     "session_trace": 0                     │
│                     │ }                                          │

For the read-write split router, the supported parameters are:

  maxctrl: show service Read-Write-Service
...
│ Parameters          │ {                                                           │
│                     │     "router_options": null,                                 │
│                     │     "user": "maxscale_monitor",                             │
│                     │     "password": "*****",                                    │
│                     │     "passwd": null,                                         │
│                     │     "enable_root_user": false,                              │
│                     │     "max_retry_interval": 3600,                             │
│                     │     "max_connections": 0,                                   │
│                     │     "connection_timeout": 0,                                │
│                     │     "auth_all_servers": false,                              │
│                     │     "strip_db_esc": true,                                   │
│                     │     "localhost_match_wildcard_host": true,                  │
│                     │     "version_string": null,                                 │
│                     │     "weightby": null,                                       │
│                     │     "log_auth_warnings": true,                              │
│                     │     "retry_on_failure": true,                               │
│                     │     "session_track_trx_state": false,                       │
│                     │     "retain_last_statements": -1,                           │
│                     │     "session_trace": 0,                                     │
│                     │     "use_sql_variables_in": "all",                          │
│                     │     "slave_selection_criteria": "LEAST_CURRENT_OPERATIONS", │
│                     │     "master_failure_mode": "fail_instantly",                │
│                     │     "max_slave_replication_lag": -1,                        │
│                     │     "max_slave_connections": "255",                         │
│                     │     "retry_failed_reads": true,                             │
│                     │     "prune_sescmd_history": false,                          │
│                     │     "disable_sescmd_history": false,                        │
│                     │     "max_sescmd_history": 50,                               │
│                     │     "strict_multi_stmt": false,                             │
│                     │     "strict_sp_calls": false,                               │
│                     │     "master_accept_reads": false,                           │
│                     │     "connection_keepalive": 300,                            │
│                     │     "causal_reads": false,                                  │
│                     │     "causal_reads_timeout": "10",                           │
│                     │     "master_reconnection": false,                           │
│                     │     "delayed_retry": false,                                 │
│                     │     "delayed_retry_timeout": 10,                            │
│                     │     "transaction_replay": false,                            │
│                     │     "transaction_replay_max_size": "1Mi",                   │
│                     │     "optimistic_trx": false                                 │
│                     │ }                                                           │

List/Show Services

To list out all created services (routers), use the "list services" command:

 maxctrl: list services
┌─────────────────────┬────────────────┬─────────────┬───────────────────┬────────────────────────────────────────────────┐
│ Service             │ Router         │ Connections │ Total Connections │ Servers                                        │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ Read-Write-Service  │ readwritesplit │ 1           │ 1                 │ mariadbgalera1, mariadbgalera2, mariadbgalera3 │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ Round-Robin-Service │ readconnroute  │ 1           │ 1                 │ mariadbgalera1, mariadbgalera2, mariadbgalera3 │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ Replication-Service │ binlogrouter   │ 1           │ 1                 │                                                │
└─────────────────────┴────────────────┴─────────────┴───────────────────┴────────────────────────────────────────────────┘

In the above examples, we have created three services with three different routers. However, the Replication-Service for our binlog server is not linked to any servers yet.

To show all services in detail:

 maxctrl: show services

Or if you want to show a particular service:

 maxctrl: show service Round-Robin-Service

Stop/Start Services

Stopping a service will prevent all the listeners for that service from accepting new connections. Existing connections will still be handled normally until they are closed. To stop and start all services, use the "stop services" and "start services" commands:

 maxctrl: stop services
 maxctrl: show services
 maxctrl: start services
 maxctrl: show services

Or we can use the "stop service" command to stop only one particular service:

 maxctrl: stop service Round-Robin-Service

Delete a Service

In order to delete a service, one has to remove all servers and destroy the listeners associated with the service first. For example, consider the following services in MaxScale:

 maxctrl: list services
┌─────────────────────┬────────────────┬─────────────┬───────────────────┬────────────────────────────────────────────────┐
│ Service             │ Router         │ Connections │ Total Connections │ Servers                                        │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ Read-Write-Service  │ readwritesplit │ 1           │ 1                 │ mariadbgalera1, mariadbgalera2, mariadbgalera3 │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ Round-Robin-Service │ readconnroute  │ 1           │ 1                 │ mariadbgalera1, mariadbgalera2, mariadbgalera3 │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ Replication-Service │ binlogrouter   │ 1           │ 1                 │                                                │
└─────────────────────┴────────────────┴─────────────┴───────────────────┴────────────────────────────────────────────────┘

Let's remove Round-Robin-Service from the setup. Remove all servers from this particular service:

 maxctrl: unlink service Round-Robin-Service mariadbgalera1 mariadbgalera2 mariadbgalera3
OK

Our services now look like this:

 maxctrl: list services
┌─────────────────────┬────────────────┬─────────────┬───────────────────┬────────────────────────────────────────────────┐
│ Service             │ Router         │ Connections │ Total Connections │ Servers                                        │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ Read-Write-Service  │ readwritesplit │ 1           │ 1                 │ mariadbgalera1, mariadbgalera2, mariadbgalera3 │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ Round-Robin-Service │ readconnroute  │ 1           │ 1                 │                                                │
├─────────────────────┼────────────────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ Replication-Service │ binlogrouter   │ 1           │ 1                 │                                                │
└─────────────────────┴────────────────┴─────────────┴───────────────────┴────────────────────────────────────────────────┘

If the service is tied to a listener, we have to remove that as well. Use "list listeners" and specify the service name to look for it:

 maxctrl: list listeners Round-Robin-Service
┌──────────────────────┬──────┬─────────┬─────────┐
│ Name                 │ Port │ Host    │ State   │
├──────────────────────┼──────┼─────────┼─────────┤
│ Round-Robin-Listener │ 3307 │ 0.0.0.0 │ Running │
└──────────────────────┴──────┴─────────┴─────────┘

And then remove the listener:

 maxctrl: destroy listener Round-Robin-Service Round-Robin-Listener
OK

Finally, we can remove the service:

 maxctrl: destroy service Round-Robin-Service
OK

Modify Service's Parameter

Similar to the other objects, one can modify a service parameter by using the "alter service" command:

 maxctrl: alter service Read-Write-Service master_accept_reads true
OK

Some routers support runtime configuration changes to all parameters. Currently, all readconnroute, readwritesplit and schemarouter parameters can be changed at runtime. In addition to module-specific parameters, the following common service parameters can be altered at runtime:

  • user
  • passwd
  • enable_root_user
  • max_connections
  • connection_timeout
  • auth_all_servers
  • optimize_wildcard
  • strip_db_esc
  • localhost_match_wildcard_host
  • max_slave_connections
  • max_slave_replication_lag
  • retain_last_statements

Note that the alter command takes effect immediately: the parameter's value is modified at runtime and is also written to the object's individual MaxScale configuration file inside /var/lib/maxscale/maxscale.cnf.d/ for persistence across restarts.

Add/Remove Servers into Service

After creating a service, we can use the "link service" command to add our servers into the service. Use the server's name as created under the Create Servers section:

 maxctrl: link service Round-Robin-Service mariadbgalera1 mariadbgalera2 mariadbgalera3
OK

Similarly, to remove a server from the service, just use "unlink service" command:

 maxctrl: unlink service Round-Robin-Service mariadbgalera3
OK

Multiple servers can also be removed in a single command, as we did in the Delete a Service section. Verify with the "list services" or "show service" command.

Listener Management

List Listeners

To list all listeners, we need to know the service name in advance:

maxctrl: list services
┌──────────────────────┬────────────────┬─────────────┬───────────────────┬────────────────────────────────────────────────┐
│ Service              │ Router         │ Connections │ Total Connections │ Servers                                        │
├──────────────────────┼────────────────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ Read-Write-Service   │ readwritesplit │ 0           │ 0                 │ mariadbgalera1, mariadbgalera2, mariadbgalera3 │
├──────────────────────┼────────────────┼─────────────┼───────────────────┼────────────────────────────────────────────────┤
│ Round-Robin-Service  │ readconnroute  │ 0           │ 0                 │ mariadbgalera1, mariadbgalera2, mariadbgalera3 │
└──────────────────────┴────────────────┴─────────────┴───────────────────┴────────────────────────────────────────────────┘

In the above example, we have two services, Read-Write-Service and Round-Robin-Service. Then, we can list out the listener for that particular service. For Read-Write-Service:

 maxctrl: list listeners Read-Write-Service
┌─────────────────────┬──────┬─────────┬─────────┐
│ Name                │ Port │ Host    │ State   │
├─────────────────────┼──────┼─────────┼─────────┤
│ Read-Write-Listener │ 3306 │ 0.0.0.0 │ Running │
└─────────────────────┴──────┴─────────┴─────────┘

And for Round-Robin-Service:

 maxctrl: list listeners Round-Robin-Service
┌──────────────────────┬──────┬─────────┬─────────┐
│ Name                 │ Port │ Host    │ State   │
├──────────────────────┼──────┼─────────┼─────────┤
│ Round-Robin-Listener │ 3307 │ 0.0.0.0 │ Running │
└──────────────────────┴──────┴─────────┴─────────┘

Unlike other objects in MaxScale, a listener does not have "show" or "alter" commands, since it is a fairly simple object.

Create a Listener

Make sure a service has been created. In this example, taken from the Create Service section above, we will create a listener so MaxScale will listen on port 3307 to process the MariaDB connections in a round-robin fashion:

 maxctrl: create listener Round-Robin-Service Round-Robin-Listener 3307
OK
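
A service may have more than one listener. For instance, to expose the same service on an additional port (the listener name and port number are illustrative):

 maxctrl: create listener Round-Robin-Service Round-Robin-Listener-2 3308
OK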

Delete a Listener

To delete a listener, use the "destroy listener" command with the respective service name and listener name:

 maxctrl: destroy listener Round-Robin-Service Round-Robin-Listener
OK

This concludes this episode of basic MaxScale management tasks for MariaDB Cluster. In the next series, we are going to cover advanced MaxScale management tasks like service filters, MaxScale user management, and so on.

Tips for Monitoring MariaDB Cluster


In previous blog posts we have covered how to monitor your Galera Cluster, whether MySQL or MariaDB based. Although the underlying technology does not differ much between versions, MariaDB Cluster has seen major changes since version 10.4.2: it supports Galera Cluster 4 and brings some great new features that we will look at in this blog post.

For beginners not yet familiar with it, MariaDB Cluster is a virtually synchronous multi-master cluster for MariaDB. It is available on Linux only, and only supports the XtraDB/InnoDB storage engines (although there is experimental support for MyISAM - see the wsrep_replicate_myisam system variable).

The software is a bundled technology consisting of MariaDB Server, the MySQL-wsrep patch for MySQL Server and MariaDB Server developed by Codership (supporting Unix-like operating systems), and the Galera wsrep provider library.

You might compare this product with MySQL Group Replication or MySQL InnoDB Cluster, which also aim to provide high availability, though they differ considerably in the principles and approaches used to achieve it.

Now that we’ve covered the basics, in this blog we are going to provide tips we think beneficial when monitoring your MariaDB Cluster.

The Essentials of MariaDB Cluster

When you start using MariaDB Cluster, you have to identify exactly what your purpose is and why you chose MariaDB Cluster in the first place. First, digest the features and benefits of using MariaDB Cluster. The reason to identify these is that they are essentially the things that have to be monitored and checked in order for you to determine performance, normal health conditions, and whether it's running according to your plans.

Essentially, those benefits are: no slave lag, no lost transactions, read scalability, and smaller client latencies. Then questions arise, such as: how does it achieve no slave lag or no lost transactions? How does it make reads scalable, or keep latencies small on the client side? These are the key areas you need to look at and monitor, especially for heavy production usage.

The MariaDB Cluster itself can of course be customized. Applying changes to the default behavior, such as pc.weight or pc.ignore_quorum, or even using multicast with UDP for a large number of nodes, can impact the way you monitor your MariaDB Cluster. On the other hand, the most essential status variables are usually your silver lining here: they tell you whether the state and flow of your cluster are doing fine or degrading, revealing a possible problem before it leads to a catastrophic failure.

Always Monitor Your Server Activity (Network, Disk, Load, Memory, & CPU)

Monitoring your server activity can be a complex task if you have a very complicated stack intertwined with your database architecture. However, for a MariaDB Cluster, it's always best to set up your nodes as dedicated and as simple as possible. Although that doesn't stop you from using all the spare resources, below are the common key areas you have to look into.

Network

Galera Cluster 4 features streaming replication as one of its key changes from the previous version. Streaming replication addresses a drawback of previous releases: it allows write-sets larger than 2GB to be replicated by fragmenting big transactions, and it is highly recommended to enable it at the session level only. This means that monitoring your network activity is crucial to the normal operation of your MariaDB Cluster, as it helps you identify which node had the highest network traffic in a given period of time.
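
As a minimal sketch of the session-level approach mentioned above (the fragment size is illustrative), streaming replication can be switched on just for a large transaction and off again afterwards:

MariaDB> SET SESSION wsrep_trx_fragment_unit='bytes';
MariaDB> SET SESSION wsrep_trx_fragment_size=1048576;
-- run the large transaction here, then disable fragmentation again
MariaDB> SET SESSION wsrep_trx_fragment_size=0;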

So how does identifying the nodes with the highest network traffic help you improve? It provides room for improvement in your database topology or the architectural layer of your database cluster. Using load balancers or a database proxy allows you to route your database traffic proactively, especially when determining which specific writes shall go to a specific node. Say that, out of the 3 nodes, one is more capable of handling large and big queries due to differences in hardware specifications. This allows you to manage your capex better and improve your capacity planning as demand changes over a given period of time.

Disk

Network activity also ties in with your disk performance, especially during flushing. It's also best to determine how commit and retrieval times perform when peak load is reached. There are times when you stock up a database host not only as a dedicated Galera Cluster node, but also mash it up with other tools such as Docker or SQL proxies like ProxySQL or MaxScale. This gives you control over low-load servers and allows spare resources to be utilized for other beneficial purposes, especially in your database architecture stack. Once monitoring shows which node has the lowest load yet is still capable of managing its disk I/O utilization, you can select that specific node and watch it over time. Again, this gives you better management of your capacity planning.

CPU, Memory, and Load Activity

Let me briefly cover these three areas to look at when monitoring. In this section, it's always best to have observability of all of them at once: it's quicker and easier to understand, especially when ruling out a performance bottleneck or identifying bugs that cause your nodes to stall, which can also affect the other nodes and possibly bring down the cluster.

So how do CPU, memory, and load activity help when monitoring your MariaDB Cluster? As mentioned above, those are a few of the things that are a big factor in daily routine checks. They also help you identify whether issues are periodic or random occurrences. If periodic, they might be related to backups running on one of your Galera nodes, or to a massive query that requires optimization: for example, bad queries with no proper indexes, or imbalanced data retrieval such as string comparison over very large strings. That can be plainly unsuitable for OLTP-type databases such as MariaDB Cluster, depending on the nature and requirements of your application. It is better to use other analytical tools, such as MariaDB ColumnStore or third-party analytic processing tools (Apache Spark, Kafka, MongoDB, etc.), for large string data retrieval and/or string matching.

So with all these key areas being monitored, the question is: how should they be monitored? They have to be monitored at least per minute. More refined monitoring, i.e. per-second collection of metrics, can be resource intensive and greedy in terms of your resources; half-a-minute granularity is acceptable in most cases, but if your RPO (recovery point objective) is very low, you need more granular, real-time data metrics. It is very important that you are able to oversee the whole picture of your database cluster. Aside from this, whatever metrics you monitor, it's also important to have the right tool to grab your attention when things are in danger, or even just at warning level. Using the proper tool, such as ClusterControl, helps you manage these key areas. I'm using the free community edition of ClusterControl here; it helps me monitor my nodes without any hassle, from installation up to the monitoring of nodes, in just a few clicks. For example, see the screenshots below:

Monitoring MariaDB Cluster

The view is a more refined and quick overview of what's happening currently. A more granular graph can be used as well,

Monitoring MariaDB Cluster

or a more powerful and rich data model, which also supports a query language, can provide analysis of how your MariaDB Cluster performs based on historical data, comparing its performance over time. For example,

Monitoring MariaDB Cluster

That just provides you more visible metrics. So you see how important it really is to have the right tool when monitoring your MariaDB Cluster.

Ensure Collective Monitoring of Your MariaDB Cluster Statistic Variables

From time to time, it is inevitable that new MariaDB Cluster versions will introduce new stats to monitor, or enhance monitoring by providing more status variables and more refined values to look at. As mentioned above, I am using ClusterControl to monitor my nodes in this example blog. However, that doesn't mean it's the only tool out there. PMM from Percona, for example, is very rich when it comes to collecting every statistic variable; whenever MariaDB Cluster has newer statistic variables to offer, you can leverage this and even change it, as PMM is an open-source tool. It's a great advantage to have full visibility of your MariaDB Cluster, as every aspect counts, especially in a production database that caters to hundreds of thousands of requests per minute.

But let's get more specific about the problem here. Which statistical variables should you look into? There are many in a MariaDB Cluster, but let's focus again on the features and benefits that are the reason you use MariaDB Cluster in the first place.

Galera Cluster - Flow Control

The flow control of your MariaDB Cluster gives you an overview of how replication health performs across the cluster. The replication process in Galera Cluster uses a feedback mechanism: it signals all the nodes within the cluster and flags whether a node has to pause or resume replication according to its needs. This prevents any node from lagging too far behind while the others are applying incoming transactions. This is the function flow control serves within Galera. It must be watched and not overlooked when monitoring your MariaDB Cluster, since one of the stated benefits of using MariaDB Cluster is the avoidance of slave lag. Flow control will impact your Galera Cluster's performance when there is a long commit queue, when flushing of pages to disk goes very slowly due to disk issues, or when a bad query is running. If you're a beginner with how Galera works, you might be interested in reading this external post about what flow control is in Galera.
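
If you want to check these counters directly from SQL, outside of any monitoring tool, the wsrep status variables expose them, for example:

MariaDB> SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%';

A wsrep_flow_control_paused value creeping towards 1.0 indicates that the cluster spends most of its time paused, waiting for a slow node.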

Bytes Sent/Received

Bytes sent or received correlates with networking activity and is one of the key areas to look at side-by-side with flow control. It allows you to determine which node is the most impacted or is contributing to the performance issues within your Galera Cluster. It is very important, as you can check for degradation in hardware such as your network device, or in the underlying storage device, for which syncing of dirty pages can take too much time.
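
Again, a quick SQL check of the replication traffic counters can complement your monitoring tool, for example:

MariaDB> SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_replicated_bytes', 'wsrep_received_bytes');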

Cluster Load

This is more about database activity: how many changes or data retrievals have been queried or performed since the server's uptime. It helps you rule out which kinds of queries mostly affect your database cluster's performance, and provides room for improvement, especially in balancing the load of your database requests. Using ProxySQL helps here with a more refined and granular approach to query routing. Although MaxScale also offers this feature, ProxySQL is more granular, though it comes with a performance impact or cost as well. The impact comes when you have only one ProxySQL node working out the query routing, as it can struggle when high traffic is ongoing; the cost comes when you add more ProxySQL nodes, with an underlying Keepalived, to balance more of the traffic. This is a perfect combo, and it can be run at low cost until needed. But how will you determine when it is needed? That question remains, so a keen eye on these key areas is very important, not only for observability but also for improving the performance of your database cluster over time.

As such, there are tons of variables to look at in a MariaDB Cluster. The most important thing to take into account is the tool you are using to monitor your database cluster. As mentioned earlier, I prefer using the free license version of ClusterControl (Community Edition) in this blog, as it gives me more flexibility when looking at a Galera Cluster. See the example below,

Monitoring MariaDB Cluster

I have marked or circled in red the tabs that allow me to visually oversee the health of my MariaDB Cluster. Say your application makes heavy use of streaming replication from time to time and sends a large number of fragments (large network transfers) for cluster inter-activity; it's best to determine how well your nodes can handle the stress. Especially during stress testing, before pushing specific changes to your application, it's always best to test and determine the capacity management of your application product, and whether your current database nodes and design can handle the load of your application requirements.

Even on a community edition of the ClusterControl, I am able to gather granular and more refined results of the health of my MariaDB Cluster. See below,

Monitoring MariaDB Cluster

This is how you should approach the monitoring of your MariaDB Cluster. Good visualization is always easier and quicker to manage; when things go south, you cannot afford to lose productivity, and downtime can impact your business. Although a free tool does not provide all the luxury and comfort of managing high-traffic databases, having alarms, notifications, and database management in one place is an add-on that ClusterControl provides.

Conclusion

MariaDB Cluster is not as simple to monitor as traditional asynchronous MySQL/MariaDB master-slave setups. It works differently, and you must have the right tools to determine what's going on inside your database cluster. Always prepare your capacity planning ahead, and never run your MariaDB Cluster without proper monitoring in place. It's always best that your database load and activity are known prior to a catastrophic event.


Comparing MariaDB Server to MariaDB Cluster


MariaDB Server and MariaDB Cluster are open source products from MariaDB Corporation. MariaDB Server is one of the most popular relational databases; it was originally forked from MySQL Server.

MariaDB Cluster is a high availability solution built from MariaDB Server, using a Galera Cluster wsrep library to synchronize the data between nodes.  The replication method of Galera is synchronous (or ‘virtually synchronous’), ensuring the data will be the same on all the nodes.

MariaDB server can also be made highly available via standard replication. Replication can be asynchronous or semi-synchronous. 

So how does the MariaDB server with standard replication differ from MariaDB Cluster with Galera Cluster? In this blog, we will compare those two. We will be using ClusterControl to illustrate some of the differences. 

MariaDB Server Architecture

The architecture of MariaDB Server can be a single/standalone instance or master/slave replication as shown in the diagram below. 

MariaDB Server Architecture

The MariaDB Server single instance architecture consists of one node only. The drawback of having a single instance is a single point of failure for the database. If your database crashes and does not come back up, you have no failover mechanism, and you need to do a restore to recover your database from the last backup.

The master/slave architecture is a distributed setup, with the master acting as writer and the slave(s) as reader(s). Using a load balancer like MaxScale or ProxySQL, you can split the database traffic so that writes are sent to the master and reads to the slave(s). Having a replication setup eliminates the single point of failure for the database, but you need to be able to fail over automatically if the master fails; otherwise, applications will not be able to write to the database and will be affected. ClusterControl can be configured to provide automatic failover and recovery for MariaDB replication.

MariaDB Cluster Architecture

MariaDB Cluster is a high availability solution consisting of MariaDB Server and Galera Replication as shown in the architecture diagram below :

MariaDB Cluster Architecture

The replication is synchronous (“virtually synchronous”), and all of the nodes are writable. The synchronous replication guarantees that changes that happen on one of the Galera nodes will be available on all the other nodes in the cluster before being committed.

The big difference is that all the nodes are equal from the application point of view: applications can send write traffic to any of the database instances. Also, all nodes should have exactly the same data, so there is no data loss in case of node failure.

MariaDB Deployment

Both MariaDB Replication and MariaDB Cluster can be deployed via ClusterControl. When you deploy MariaDB Server, you need to start by choosing MySQL Replication, while for MariaDB Cluster you need to choose MySQL Galera.

For MariaDB Server, you can either deploy a single-node MariaDB instance or set up master/slave and bi-directional replication. The minimum number of nodes in a replication setup is two: you need one master and at least one slave. Just fill in the IP address for the master and add slaves (if you want a master/slave architecture). You can use the Add Second Master field if you want to set up bi-directional replication. A master-master setup will be provisioned with bi-directional replication, but one of the nodes will be set as read-only. The reason is to minimize the risk of data drift and ‘errant transactions’.

MariaDB Deployment

For MariaDB Cluster, you need at least 3 hosts for the target database nodes, because the cluster has to be able to handle network partitioning or “split brain” syndrome. You just need to fill in the IP address under Add Node when defining the MySQL Servers configuration.

MariaDB Deployment

Do not forget to choose MariaDB as the database vendor and the database version that you want to install, and fill in the root password. You can also change the default datadir to any other path.

Once everything is configured, just deploy the cluster. This will trigger a new job for the database deployment.

Note that it is also possible to have 2 Galera nodes and one Galera arbitrator aka garbd on a third host.  

MariaDB Server & Cluster Monitoring

Database monitoring is a critical part of operating a database, as it tells you the current state of database health. The difference between MariaDB Server and MariaDB Cluster monitoring is the Galera metrics for synchronization.

MariaDB Server & Cluster Monitoring
MariaDB Server & Cluster Monitoring

On MariaDB Server, you can check your current database health through the MySQL metrics: MySQL Server - General, MySQL Server - Caches, and MySQL InnoDB Metrics, which are also visible for MariaDB Cluster, as shown below:

MariaDB Server & Cluster Monitoring

MySQL Server - General gives you information about the current state of the InnoDB buffer pool hit ratio, database connections, queries, locking, and database memory utilization.

MySQL Server - Caches provides a lot of information, mostly related to caching in the database, e.g. buffer pool size and buffer pool instances. There is also information about table cache usage, hit ratio, cache hits and misses. You can also find thread cache usage and hit ratio information.

MySQL Server - InnoDB Metrics shows metrics related to InnoDB storage, e.g. buffer pool activity, InnoDB row operations, InnoDB log file size, and InnoDB data read/write.

MariaDB Server & Cluster Monitoring

On MariaDB Server, if you set up master/slave replication, there is one subcategory of metrics under MySQL Replication - Master, with information related to the master binary log file, the master binary log position, and the binlog creation frequency.

MariaDB Server exposes a lot of information related to the database, and it is also available for MariaDB Cluster. The difference is that there are two additional dashboards for MariaDB Cluster: Galera Overview and Galera Server Charts.

MariaDB Server & Cluster Monitoring

Galera Overview gives information related to the current state of Galera replication: cluster size, flow control sent, flow control received, flow control paused, and so on.

Galera Server Charts shows information such as the cluster name, cluster status, size, and global cache size.

Conclusion

MariaDB Server with standard replication and MariaDB Cluster are not really different products in terms of database service, but they have different characteristics depending on your requirements on availability and scalability. ClusterControl supports both MariaDB Server with standard replication and MariaDB Cluster deployments, so do give both setups a try and let us know your thoughts.

MariaDB Server Database Encryption Basics


Encryption is one of the most important security features to keep your data as secure as possible. Depending on the data that you are handling, it is not always a must, but you should at least consider it as a security improvement in your organization; in fact, it is recommended for avoiding data theft or unauthorized access.

In this blog, we will describe two basic types of encryption and how to configure them on a MariaDB Server.

What is Data Encryption?

There are two basic types of data encryption: at-rest and in-transit. Let’s see what they mean.

Data-at-Rest Encryption

Data stored in a system is known as data-at-rest. Encrypting this data consists of using an algorithm to convert text or code into an unreadable form. You must have an encryption key to decode the encrypted data.

Encrypting an entire database should be done with caution since it can result in a serious performance impact. It is therefore wise to encrypt only individual fields or tables.

Encrypting data-at-rest protects the data from physical theft of hard drives or unauthorized file storage access. This encryption also complies with data security regulations, especially if there is financial or health data stored on the filesystem.

Data-in-Transit Encryption

Data being transferred or moving between systems is known as data-in-transit. The data moving between the server and client while browsing web pages is a good example of this kind of data.

Since it is always on the move, it needs to be protected with proper encryption to avoid any theft or alteration to the data before it reaches its destination.

The ideal situation to protect data-in-transit is to have the data encrypted before it moves and is only decrypted when it reaches the final destination.

MariaDB Data-at-Rest Encryption

The encryption of tables and tablespaces was added in MariaDB 10.1, and it supports encryption for the XtraDB, InnoDB, and Aria storage engines, as well as for binary logs.

You can choose different ways to encrypt:

  • All tables
  • Individual tables
  • Everything, excluding individual tables

According to the documentation, using encryption has an overhead of roughly 3-5%, so it is important to have a test environment to stress it and see how it responds, to avoid issues in production.
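
For instance, one way to measure that overhead on your own workload is to run the same benchmark once with encryption disabled and once with it enabled. A sketch using sysbench (the host, credentials, and table sizes are illustrative):

$ sysbench oltp_read_write --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest --tables=8 --table-size=100000 prepare

$ sysbench oltp_read_write --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest --tables=8 --table-size=100000 --threads=16 --time=300 run

Comparing the transactions-per-second figures from both runs shows the real cost on your hardware.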

How to Configure Data-at-Rest Encryption on MariaDB

Let’s check an existing “city” table in a MariaDB database:

$ strings city.ibd |head

infimum

supremum

infimum

supremum

3ABW

3KHM

infimum

supremum

Kabul                              AFGKabol

Qandahar                           AFGQandahar

As you can see, you can read the data from there without any issue, using the strings Linux command for example. Now, let's see how to encrypt it.

Generate encryption keys using the openssl rand command:

$ mkdir -p /etc/mysql/encryption

$ for i in {1..4}; do openssl rand -hex 32 >> /etc/mysql/encryption/keyfile;  done;

Edit the generated file /etc/mysql/encryption/keyfile and add the key IDs which will be referenced when creating encrypted tables. The format should be as follows:

<encryption_key_id1>;<hex-encoded_encryption_key1>

<encryption_key_id2>;<hex-encoded_encryption_key2>

You can edit it using the sed Linux command in this way:

$ for i in {1..4}; do sed -i -e "$i s/^/$i;/" /etc/mysql/encryption/keyfile; done

So the file should be something like this:

$ cat /etc/mysql/encryption/keyfile

1;f237fe72e16206c0b0f6f43c3b3f4accc242564d77f5fe17bb621de388c193af

2;0c0819a10fb366a5ea657a71759ee6a950ae8f25a5ba7400a91f59b63683edc5

3;ac9ea3a839596dbf52492d9ab6b180bf11a35f44995b2ed752c370d920a10169

4;72afc936e16a8df05cf994c7902e588de0d11ca7301f9715d00930aa7d5ff8ab

Now, generate a random password using a similar openssl command to the one you saw earlier:

$ openssl rand -hex 128 > /etc/mysql/encryption/keyfile.key

Before proceeding to the next step, it is important to know the following details about encrypting the key file:

  • The only algorithm that MariaDB currently supports to encrypt the key file is Cipher Block Chaining (CBC) mode of Advanced Encryption Standard (AES).
  • The encryption key size can be 128-bits, 192-bits, or 256-bits.
  • The encryption key is created from the SHA-1 hash of the encryption password.
  • The encryption password has a max length of 256 characters.

Now, to encrypt the key file using the openssl enc command, run the following command:

$ openssl enc -aes-256-cbc -md sha1 -pass file:/etc/mysql/encryption/keyfile.key -in /etc/mysql/encryption/keyfile -out /etc/mysql/encryption/keyfile.enc

Finally, you need to add the following parameters in your my.cnf configuration file (located in /etc/ on RedHat-based OS or /etc/mysql/ on Debian-Based OS):

[mysqld]

…

#################### DATABASE ENCRYPTION ####################

plugin_load_add = file_key_management

file_key_management_filename = /etc/mysql/encryption/keyfile.enc

file_key_management_filekey = FILE:/etc/mysql/encryption/keyfile.key

file_key_management_encryption_algorithm = aes_cbc

encrypt_binlog = 1



innodb_encrypt_tables = ON

innodb_encrypt_log = ON

innodb_encryption_threads = 4

innodb_encryption_rotate_key_age = 0 

…

And restart the MariaDB service to apply the changes:

$ systemctl restart mariadb

At this point, everything is ready to use the encryption feature. Let's encrypt the same table that we showed earlier, "city". For this, you need to use the ALTER TABLE statement, setting the ENCRYPTED parameter to YES:

MariaDB [world]> ALTER TABLE city ENCRYPTED=YES;

Query OK, 0 rows affected (0.483 sec)

Records: 0  Duplicates: 0  Warnings: 0

Now, if you try to access the table directly from the file system, you will see something like this:

$ strings city.ibd |head

PU%O

!ybN)b

9,{9WB4

T3uG:

?oiN

,35sz

8g)Q

o(o

q_A1

k=-w

As you can see, the table is unreadable. You can also specify the encryption key ID by adding the ENCRYPTION_KEY_ID=<ID> option to the SQL statement, where <ID> is an ID number from the keyfile created previously.
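
For example, to encrypt the table with the second key from the keyfile we generated:

MariaDB [world]> ALTER TABLE city ENCRYPTED=YES ENCRYPTION_KEY_ID=2;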

New tables will be encrypted by default, as we set the innodb_encrypt_tables parameter to ON in the my.cnf configuration file.
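
To verify which tablespaces are actually encrypted and which key each one uses, you can query the information_schema (the world/% filter is just for our example database):

MariaDB [world]> SELECT NAME, ENCRYPTION_SCHEME, CURRENT_KEY_ID FROM information_schema.INNODB_TABLESPACES_ENCRYPTION WHERE NAME LIKE 'world/%';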

MariaDB Data-in-Transit Encryption

MariaDB allows you to encrypt data-in-transit between the server and clients using the Transport Layer Security protocol (TLS), formerly known as Secure Sockets Layer (SSL).

First of all, you need to ensure that your MariaDB server was compiled with TLS support. You can verify this by running the following SHOW GLOBAL VARIABLES statement:

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'version_ssl_library';

+---------------------+----------------------------+

| Variable_name       | Value                      |

+---------------------+----------------------------+

| version_ssl_library | OpenSSL 1.1.1  11 Sep 2018 |

+---------------------+----------------------------+

1 row in set (0.001 sec)

And check that it is not currently in use, using the SHOW VARIABLES statement:

MariaDB [(none)]> SHOW VARIABLES LIKE '%ssl%';

+---------------------+----------------------------+

| Variable_name       | Value                      |

+---------------------+----------------------------+

| have_openssl        | YES                        |

| have_ssl            | DISABLED                   |

| ssl_ca              |                            |

| ssl_capath          |                            |

| ssl_cert            |                            |

| ssl_cipher          |                            |

| ssl_crl             |                            |

| ssl_crlpath         |                            |

| ssl_key             |                            |

| version_ssl_library | OpenSSL 1.1.1  11 Sep 2018 |

+---------------------+----------------------------+

10 rows in set (0.001 sec)

You can also verify the SSL status using the status MariaDB command:

MariaDB [(none)]> status

--------------

mysql  Ver 15.1 Distrib 10.4.13-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2

Connection id: 22

Current database:

Current user: root@localhost

SSL: Not in use

Current pager: stdout

Using outfile: ''

Using delimiter: ;

Server: MariaDB

Server version: 10.4.13-MariaDB-1:10.4.13+maria~bionic-log mariadb.org binary distribution

Protocol version: 10

Connection: Localhost via UNIX socket

Server characterset: latin1

Db     characterset: latin1

Client characterset: utf8

Conn.  characterset: utf8

UNIX socket: /var/lib/mysql/mysql.sock

Uptime: 4 hours 28 min 25 sec

Threads: 11  Questions: 111668  Slow queries: 0  Opens: 92  Flush tables: 1  Open tables: 85  Queries per second avg: 6.933

--------------

How to Configure Data-in-Transit Encryption on MariaDB

Let’s create the certs directory to store all the certificates:

$ mkdir -p /etc/mysql/certs

Now, let’s generate the CA certificates that will be configured to encrypt the connection:

$ openssl genrsa 2048 > ca-key.pem

$ openssl req -new -x509 -nodes -days 365000 -key ca-key.pem -out ca-cert.pem

This last command will ask you to complete the following information:

Country Name (2 letter code) [AU]:

State or Province Name (full name) [Some-State]:

Locality Name (eg, city) []:

Organization Name (eg, company) [Internet Widgits Pty Ltd]:

Organizational Unit Name (eg, section) []:

Common Name (e.g. server FQDN or YOUR name) []:

Email Address []:

Now, you need to generate the server certificates:

$ openssl req -newkey rsa:2048 -nodes -keyout server-key.pem -out server-req.pem

This command will ask you to fill in the same information as before, plus an optional certificate password.

$ openssl rsa -in server-key.pem -out server-key.pem

$ openssl x509 -req -in server-req.pem -days 365000 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out server-cert.pem

And finally, you need to generate the client certificates:

$ openssl req -newkey rsa:2048 -nodes -keyout client-key.pem -out client-req.pem

This will also ask you to complete the information and an optional certificate password.

$ openssl rsa -in client-key.pem -out client-key.pem

$ openssl x509 -req -in client-req.pem -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out client-cert.pem

Make sure you’re using a different Common Name on each certificate, otherwise it won’t work and you will receive a message like:

ERROR 2026 (HY000): SSL connection error: self signed certificate

At this time, you will have something like this:

$ ls /etc/mysql/certs/

ca-cert.pem  ca-key.pem  client-cert.pem  client-key.pem  client-req.pem  server-cert.pem  server-key.pem  server-req.pem

And you can validate the certificates using the following command:

$ openssl verify -CAfile ca-cert.pem server-cert.pem client-cert.pem

server-cert.pem: OK

client-cert.pem: OK

So now let’s configure it in the my.cnf configuration file (located in /etc/ on RedHat-based OS or /etc/mysql/ on Debian-Based OS):

[mysqld]

ssl_ca=/etc/mysql/certs/ca-cert.pem

ssl_cert=/etc/mysql/certs/server-cert.pem

ssl_key=/etc/mysql/certs/server-key.pem



[client-mariadb]

ssl_ca=/etc/mysql/certs/ca-cert.pem

ssl_cert=/etc/mysql/certs/client-cert.pem

ssl_key=/etc/mysql/certs/client-key.pem

Make sure you are adding it under the corresponding section (mysqld and client-mariadb).

Change the certificate’s owner and restart the database service:

$ chown -R mysql:mysql /etc/mysql/certs/

$ systemctl restart mariadb

After this, if you take a look at the SHOW VARIABLES output, you should have this:

MariaDB [(none)]> SHOW VARIABLES LIKE '%ssl%';

+---------------------+----------------------------------+

| Variable_name       | Value                            |

+---------------------+----------------------------------+

| have_openssl        | YES                              |

| have_ssl            | YES                              |

| ssl_ca              | /etc/mysql/certs/ca-cert.pem     |

| ssl_capath          |                                  |

| ssl_cert            | /etc/mysql/certs/server-cert.pem |

| ssl_cipher          |                                  |

| ssl_crl             |                                  |

| ssl_crlpath         |                                  |

| ssl_key             | /etc/mysql/certs/server-key.pem  |

| version_ssl_library | OpenSSL 1.1.1  11 Sep 2018       |

+---------------------+----------------------------------+

10 rows in set (0.001 sec)

Now, let’s create a user with the REQUIRE SSL option:

MariaDB [(none)]> GRANT ALL PRIVILEGES ON *.* TO 's9s'@'%' IDENTIFIED BY 'root123' REQUIRE SSL;

Query OK, 0 rows affected (0.005 sec)
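
To test the encrypted connection, you can pass the client certificates explicitly (these options can also come from the [client-mariadb] section configured earlier):

$ mysql -u s9s -p -h 127.0.0.1 --ssl-ca=/etc/mysql/certs/ca-cert.pem --ssl-cert=/etc/mysql/certs/client-cert.pem --ssl-key=/etc/mysql/certs/client-key.pem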

If you use this user to access the database and run the status command, you will see SSL in use:

MariaDB [(none)]> status

--------------

mysql  Ver 15.1 Distrib 10.4.13-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2

Connection id: 15

Current database:

Current user: s9s@127.0.0.1

SSL: Cipher in use is TLS_AES_256_GCM_SHA384

Current pager: stdout

Using outfile: ''

Using delimiter: ;

Server: MariaDB

Server version: 10.4.13-MariaDB-1:10.4.13+maria~bionic-log mariadb.org binary distribution

Protocol version: 10

Connection: 127.0.0.1 via TCP/IP

Server characterset: latin1

Db     characterset: latin1

Client characterset: utf8

Conn.  characterset: utf8

TCP port: 3306

Uptime: 16 sec

Threads: 11  Questions: 136  Slow queries: 0  Opens: 17  Flush tables: 1  Open tables: 11  Queries per second avg: 8.500

--------------

How to Enable SSL Encryption with ClusterControl

Another way, and even an easier way, to enable SSL on your MariaDB database is by using ClusterControl. We will assume you have ClusterControl installed and you are managing your MariaDB database using it, so go to ClusterControl -> Select your MariaDB Cluster -> Security -> SSL Encryption -> Enable.

MariaDB Server Database Encryption Basics

And that’s it, you will have your SSL encryption enabled in your MariaDB database without any manual task.

At-Rest Encryption Limitations in MariaDB

There are some limitations related to MariaDB at-rest encryption to take into account:

  • Metadata (for example .frm files) and data sent to the client are not encrypted.
  • Only the MariaDB server knows how to decrypt the data, in particular
    • mysqlbinlog can read encrypted binary logs only when --read-from-remote-server is used.
    • Percona XtraBackup can’t back up instances that use encrypted InnoDB. However, Mariabackup can backup encrypted instances.
  • The disk-based Galera gcache is not encrypted in the community version of MariaDB Server, however, this file is encrypted in MariaDB Enterprise Server 10.4.
  • The Audit plugin can’t create encrypted output. Send it to syslog and configure the protection there instead.
  • File-based general query log and slow query log can’t be encrypted.
  • The Aria log is not encrypted. This affects only non-temporary Aria tables.
  • The MariaDB error log is not encrypted. The error log can contain query text and data in some cases, including crashes, assertion failures, and cases where InnoDB/XtraDB write monitor output to the log to aid in debugging. It can be sent to syslog too if needed.

Conclusion

Protecting data-in-transit is as important as protecting data-at-rest, and even if it is not a must in your organization, you should consider applying it as it can help you to avoid data theft or unauthorized access.

MariaDB makes it quite easy to implement encryption by following the steps mentioned earlier, but it is even easier using ClusterControl.

Tips for Monitoring MariaDB Replication with ClusterControl


MariaDB replication is one of the most popular high availability solutions for MariaDB, widely used by top companies like Booking.com and Google. It is very easy to set up, with some trade-offs in ongoing maintenance: software upgrades, schema changes, topology changes, failover and recovery have always been tricky. Nevertheless, with the right toolset, you should be able to handle the topology with ease. In this blog post, we are going to look into some tips to monitor MariaDB replication efficiently using ClusterControl.

Using the Topology Viewer

A replication setup consists of a number of roles. A node in a replication setup could be a:

  • Master - The primary writer/reader.
  • Backup master - A read-only slave with semi-sync replication, solely for master redundancy.
  • Intermediate master - Replicate from a master, while other slaves replicate from this node.
  • Binlog server - Only collect/store binlogs without serving data.
  • Slave - Replicate from a master, and commonly set as read-only.
  • Multi-source slave - Replicate from multiple masters.

Every role has its own responsibilities and limitations, and one must understand the topology when dealing with the database nodes. The same is true for the application, which must write only to the master node at any given time. Thus, it's important to have an overview of which node holds which role, so we don't screw up our database.

In ClusterControl, the Topology Viewer can give you an overview of the replication topology and its state, as shown in the following screenshot:

MariaDB Replication Topology Viewer

ClusterControl understands MariaDB replication and is able to visualize the topology with the correct replication data flow, as represented by the arrows pointed to the slave nodes. We can easily distinguish which node is the master, slaves and load balancers (MaxScale) in our replication setup. The green box indicates all the important services are running as expected with the assigned role.

Consider the following screenshot where a number of our nodes are having problems:

MariaDB Replication Topology Viewer

ClusterControl will immediately tell you what is wrong with the current topology. One of the slaves (red box) shows "Slave IO Running" as No, indicating a connectivity issue when replicating from the master, while the yellow box shows that our MaxScale service is not running. We can also tell that the MaxScale versions are not identical on the two nodes. You can perform management tasks directly by clicking on the gear icon (top right of every box), which reduces the risk of picking the wrong node.

Replication Lag

This is the most important thing if you rely on data replication consistency. Replication lag occurs when the slaves cannot keep up with the updates happening on the master. Unapplied changes accumulate in the slaves' relay logs and the version of the database on the slaves becomes increasingly different from the master.

In ClusterControl, you can find the replication lag histogram under Overview -> Replication Lag where ClusterControl constantly samples the Seconds_Behind_Master value from "SHOW SLAVE STATUS" output:

Replication lag happens when either the I/O thread or the SQL thread cannot cope with the demands placed upon it. If the I/O thread is suffering, this means that the network connection between the master and its slaves is slow or having problems. You might want to consider enabling slave_compressed_protocol to compress the network traffic, or raise the issue with your network administrator.
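
The variable is dynamic, so as a quick sketch it can be enabled on a running slave and then persisted under the [mysqld] section of the configuration file:

MariaDB [(none)]> SET GLOBAL slave_compressed_protocol = ON;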

If it's the SQL thread, then the problem is probably due to poorly-optimized queries that take the slave too long to apply. There may be long-running transactions or too much I/O activity. Having no primary key on the slave tables when using the ROW or MIXED replication format is also a common cause of lag on this thread. Check that the tables on both the master and the slave have a primary key; a query for finding such tables is sketched below.
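
A quick way to find tables without a primary key is to query information_schema (a sketch; extend the excluded schemas to fit your environment):

MariaDB [(none)]> SELECT t.table_schema, t.table_name FROM information_schema.tables t LEFT JOIN information_schema.statistics s ON s.table_schema = t.table_schema AND s.table_name = t.table_name AND s.index_name = 'PRIMARY' WHERE t.table_type = 'BASE TABLE' AND s.index_name IS NULL AND t.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');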

Some more tips and tricks are covered in this blog post, How to Reduce Replication Lag in Multi-Cloud Deployments.

Binary/Relay Log Size

It's important to monitor the binary and relay logs disk size because it could consume a considerable amount of storage on every node in a replication cluster. Commonly, one would set the expire_logs_days system variable to expire binary log files automatically after a given number of days, for example, expire_logs_days=7. The size of binary logs is totally dependent on the number of binary events created (incoming writes) and little that we know how much disk space it would consume before the logs are going to be expired by MariaDB. Keep in mind if you enable log_slave_updates on the slaves, the size of logs will be almost doubled because of the existence of both binary and relay logs on the same server.

For ClusterControl, we can set a disk space utilization threshold under ClusterControl -> Settings -> Thresholds to get a warning and critical notifications as below:

ClusterControl monitors all disk space related to MariaDB services, like the location of the MariaDB data directory, the binary logs directory and also the root partition. If you have reached the threshold, consider purging the binary logs manually with the PURGE BINARY LOGS command, as explained and discussed in this article.
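
As a sketch, you can purge either by date or up to a given file (the file name below is a hypothetical example; make sure no slave still needs the files you are about to remove):

MariaDB [(none)]> PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);

MariaDB [(none)]> PURGE BINARY LOGS TO 'mariadb-bin.000123';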

Enable Monitoring Dashboards

ClusterControl provides two monitoring options to sample the database nodes - agentless or agent-based. The default is agentless where sampling happens via SSH in a pull-only mechanism. Agent-based monitoring requires a Prometheus server to be running, and all monitored nodes to be configured with at least three exporters:

  • Process exporter (port 9011)
  • Node/system metrics exporter (port 9100)
  • MySQL/MariaDB exporter (port 9104)

To enable the agent-based monitoring dashboard, one has to go to ClusterControl -> Dashboards -> Enable Agent Based Monitoring. Once enabled, you will see a set of dashboards configured for our MariaDB replication, which gives us much better insight into our replication setup. The following screenshot shows what you would see for the master node:

Apart from MariaDB standard monitoring dashboards like general, caches and InnoDB metrics, you will be presented with a replication dashboard. For the master node, we can get a lot of useful information regarding the state of the master, the write throughput and binlog creation frequency. 

For the slaves, all the important states are sampled and summarized, as in the following screenshot. If everything is green, you are in good hands:

Understanding the MariaDB Error Log

MariaDB logs its important events in the error log, which is useful for understanding what was going on with the server, especially before, during and after a topology change. ClusterControl provides a centralized view of error logs under ClusterControl -> Logs -> System Logs by pulling them from every database node. Click on "Refresh Logs" to trigger a job to pull the latest logs from the server.

Collected files are represented in a navigation tree structure and a text area with syntax highlighting for better readability:

From the above screenshot, we can understand the sequence of events that happened to this node during a topology change. In the last 12 lines of the error log, the slave had an error when connecting to the master, and the last binary log file and position were recorded in the log before it stopped. Then a new CHANGE MASTER command was executed with GTID information, as shown in the line "Previous Using_Gtid=No. New Using_Gtid=Slave_Pos", and replication resumed as we wanted.

MariaDB Alert and Notifications

Monitoring is incomplete without alerts and notifications. All events and alarms generated by ClusterControl can be sent by email or to any other supported third-party tool. For email notifications, one can configure, per event type, whether it will be delivered immediately, ignored or digested (a daily summarized report):

For all critical severity events, it's recommended to set everything to "Deliver" so you will get the notifications as soon as possible. Set "Digest" to warning events so you are well aware of the cluster health and state.

You can integrate your preferred communication and messaging tools with ClusterControl by using the Notifications Management feature under ClusterControl -> Integrations -> 3rd Party Notifications. ClusterControl can send alarms and events to PagerDuty, VictorOps, OpsGenie, Slack, Telegram, ServiceNow or any user registered webhooks.

The following screenshot shows that all critical events will be pushed to the configured Telegram channel for our MariaDB 10.3 Replication cluster:

ClusterControl also supports chatbot integration, where you can interact with the controller service via s9s client right from your messaging tool as shown in this blog post, Automate Your Database with CCBot: ClusterControl Hubot Integration.

Conclusion

ClusterControl offers a complete set of proactive monitoring tools for your database clusters. Do use ClusterControl to monitor your MariaDB replication setup because most of the monitoring features are available for free in the community edition. Don't miss those out!

An Overview of MariaDB Xpand (formerly ClustrixDB)


MariaDB Xpand is a new product from MariaDB. It was formerly known as ClustrixDB which was acquired in September of 2018 by MariaDB Corporation. 

ClustrixDB is no longer available as a separate entity, but is now included as part of MariaDB Enterprise Server. Now called Xpand, it extends MariaDB Enterprise Server with distributed data and transaction processing, transforming it into a distributed SQL database capable of scaling to millions of transactions per second with a shared-nothing architecture. However, Xpand is not an all-or-nothing proposition, as DBAs can choose to use both replicated and distributed tables. Xpand is good for complex queries and analytics processing, as it can perform parallel queries across the available nodes within the cluster.

Basically, Xpand has a shared-nothing architecture and is designed as a scale-out SQL database, built from the ground up to run on commodity hardware with automatic data redistribution (so you never need to shard). It has built-in fault tolerance, all accessible through a simple SQL interface, and supports business-critical MySQL features (replication, triggers, stored routines, etc.). Its license is proprietary, so if you want to take advantage of this product, you have to contact MariaDB sales first to acquire a valid license.

When To Use MariaDB Xpand

Xpand is designed to handle large volumes of data, which lets you scale your database more efficiently. Scaling out your cluster is done easily and automatically by Xpand itself. Since the release of MariaDB Platform X5, Xpand has been part of the platform provided to customers as its distributed SQL solution. The Xpand smart engine allows customers to scale beyond the InnoDB storage engine's sweet spot - high-performance mixed read/write workloads on a single node, with the option of adding scale via replication - by employing a highly available, fault-tolerant distributed solution for large-scale workloads.

With Xpand, you have the flexibility to scale on a per table basis. Start by using Xpand for just a single table and expand the usage as your needs grow beyond what a single node can handle. Increase the use of distributed SQL as your enterprise needs grow beyond replication or clustering. When data or query volumes increase to the point of degrading performance, you can use Xpand to distribute tables or the entire database for improved throughput and concurrency. Xpand has built-in high-availability and elasticity, so nodes can be added or removed transparently as needed to scale-out.

Just as with MariaDB ColumnStore, the columnar smart engine, cross engine JOINs are possible (and encouraged) between replicated and distributed tables. Unlike other Distributed SQL implementations that distribute the entire database and have, therefore, significant overhead on smaller tables, MariaDB allows the combined use of InnoDB for replicated small data sets and massive distributed data sets via Xpand.

Unfortunately, there's no formal documentation regarding the state of change from ClustrixDB to MariaDB Xpand, so you might still want to rely on https://docs.clustrix.com/ for documentation regarding how ClustrixDB works. It's also known that GTID is not supported by ClustrixDB, though this might have changed since the release of MariaDB 10.5.

How Does MariaDB Xpand Work?

Deploying MariaDB Xpand requires MariaDB Enterprise Servers with the Xpand plugin installed and Xpand Nodes running alongside them. It's similar to how you set up MaxScale with a MariaDB Server replication setup for high availability: you can place MaxScale on top to manage connections and transparently fail over between the front-end Enterprise Server instances, which hold the smaller replicated data sets in InnoDB. For the best performance with Xpand, it's also recommended to run the front-end servers and the Xpand nodes on separate physical servers. See the MariaDB Xpand topology architecture from MariaDB below for how this works:

To explain the diagram above: Xpand splits each Xpand table into a number of slices. Each slice is stored on a primary node and then replicated to one or more other nodes to ensure fault tolerance. Each Xpand node can perform both reads and writes, and each node has a map of the data distribution.

For read operations, the major part of the query is pushed down to Xpand where the query is evaluated and relevant portions of the query are then sent to the appropriate Xpand nodes. MariaDB Enterprise Server collects the return data from the Xpand nodes to generate a result-set.

For write operations, MariaDB Xpand uses a component called the “rebalancer” to automatically and transparently distribute data across the available Xpand nodes.

MariaDB Xpand as a Distributed SQL

Each Xpand node is able to perform both reads and writes. When a query is received by MariaDB Enterprise Server, it is evaluated by a query optimizer and portions of the query are sent to the relevant Xpand nodes. The results are collected and a single result-set returned to the client.

MariaDB Xpand leverages a shared-nothing architecture; a single node handles each request, and memory and storage are not shared.

MariaDB Xpand HA and Fault Tolerance

MariaDB Xpand is fault tolerant by design. Xpand maintains two replicas of all data using a rebalancer process that runs in the background. Xpand can suffer a single node or zone failure without data loss.

Upon node failure, data is rebalanced from remaining nodes, automatically healing the data protection without intervention. In a zone failure, the rebalancer performs the same operation between nodes and remaining zones.

When the failed node is replaced, the rebalancer redistributes data, restoring MariaDB Xpand to its intended node count.

Horizontal Scale-Out with MariaDB Xpand

MariaDB Xpand is flexible by design. If the load on MariaDB Enterprise Server increases, you can add additional Servers to your deployment, load balancing between them using MariaDB MaxScale. Each Server can connect to the Xpand nodes to access data stored on Xpand tables.

If the load on MariaDB Xpand increases, you can scale out by adding new nodes. When you add an Xpand node to the deployment, the rebalancing process redistributes data from the existing nodes. Once complete, the Xpand node can now handle both read and write operations from MariaDB Enterprise Servers.

If the load on MariaDB Xpand decreases, you can scale down by removing nodes. When you remove an Xpand node from the deployment, the rebalancing process redistributes data to the remaining nodes, ensuring fault tolerance.

What Makes MariaDB Xpand scalable?

There are no bottlenecks and no single points of failure. All processors are enlisted in support of query processing. Queries are parallelized and distributed across the cluster to the relevant data. New nodes are automatically recognized and incorporated into the cluster. Workloads and data are automatically balanced across all nodes in the cluster. Cluster-wide SQL relational calculus and ACID properties eliminate multi-node complexity from the development and management of multi-tiered applications. The complexity commonly required to scale existing db models to handle large volumes of data is eliminated. And as your database grows, just add nodes.

There are several things that affect scalability and performance:

  • Shared-nothing architecture, which eliminates potential bottlenecks. Contrast this with shared-disk / shared-cache architectures that bottleneck, don't scale, and are difficult to manage.
  • Parallelization of queries, which are distributed to the node(s) with the relevant data. Results are created as close to the data as possible, then routed back to the requesting node for consolidation and returned to the client.

This is very different from other systems, which routinely move large amounts of data to the node that is processing the query and then eliminate all the data that doesn't fit the query (typically lots of data). By only moving qualified data across the network to the requesting node, Xpand significantly reduces the network traffic bottleneck. In addition, more processors participate in the data selection process. By selecting data on multiple nodes in parallel, the system produces results more quickly than if all data were selected by a single node, which would first have to collect all the required data from the other nodes in the system.

Since each node focuses on a particular partition and sends work items to other nodes rather than requesting raw data from other nodes, each node's cache contains more of that node's data, and less redundant data from other nodes. This means cache hit rates will be much higher, significantly reducing the need for slow disk accesses.

Deploying MariaDB Xpand

There are two separate deployment steps to start using MariaDB Xpand. Xpand deployments consist of MariaDB Enterprise Server instances, called the front-end servers, with the Xpand plugin installed, and Xpand Nodes running alongside these front-end servers. For the best performance, the Enterprise Servers and the Xpand nodes should be installed on separate physical servers.

  1. You need to set up the MariaDB Xpand Node. Xpand nodes are configured in a deployment to provide the storage back-end for MariaDB Enterprise Servers with the Xpand storage engine plugin. Servers store data for Xpand tables on Xpand nodes rather than the local file system. Installing the Xpand Node requires a license, which is a JSON object you can only acquire by reaching out to MariaDB Sales. The installation process is not as quick as a single command or click, so we suggest you follow their installation guide for the Xpand Node.
  2. Deploy a front-end Server. Judging by the recent changes, the recommended way to use Xpand is with MariaDB Enterprise Server 10.5 and the Xpand storage engine plugin installed.

MariaDB Xpand Hardware Compatibility

If you're curious about hardware compatibility, the MariaDB Platform can run in a variety of environments. As long as your MariaDB servers can run in the environment you are currently using, and you are able to set up the Xpand Nodes alongside the MariaDB servers with the Xpand plugin installed, this will work. From their documentation, the supported physical and cloud environments are listed below:

  • On-premises (on-prem)
  • Collocated (colo)
  • Private Cloud
  • Public Cloud
  • Hybridized

For the hardware architecture, it's worth noting that as of MariaDB Enterprise Server 10.4.10-4 (2019-11-18), MariaDB Enterprise Server supports only x86_64 hardware architecture platforms.

Conclusion

MariaDB Xpand delivers scalability in a very convenient fashion. The most appealing aspect of this product is that you can keep using MariaDB's standard SQL as well. It can be embedded into your existing MariaDB environment, which can then take advantage of its features and scalability. Although that may be enticing, the product requires special licensing and significant fees. If it serves a purpose for your enterprise application, then MariaDB Xpand might be worth a try.

 

Comparing MariaDB Enterprise Backup to ClusterControl Backup Management


MariaDB Enterprise Backup is a backup solution from MariaDB Corporation with a number of features such as non-blocking backups, full backup, incremental backup, partial backup and Point in Time Recovery.

We often get questions about the differences between MariaDB Backup and ClusterControl’s backup management features. So this is what this blog is about.

Creating backups vs managing them

MariaDB Backup is a fork of Percona XtraBackup, and is a tool to take physical backups of the MariaDB server. It allows you to take full, incremental and partial backups. One can also perform point-in-time recovery with the help of the binary logs. According to the documentation, the ‘Enterprise’ version of MariaDB Backup provides “DDL statement tracking, which reduces lock-time during backups”.

ClusterControl supports MariaDB Backup as a backup method for MariaDB. It provides a graphical user interface to schedule full, incremental and partial backups, restore backups, and automate point-in-time recovery. In addition, ClusterControl provides features like encryption, compression, upload to cloud storage (Azure, AWS, Google Cloud) and automatic verification of backups to ensure they are recoverable.

Full Backup and Restore

To perform a full backup using MariaDB Enterprise Backup, you use the mariabackup command-line utility. There are four modes of operation that can be passed to the mariabackup command. The parameters are:

  • Backup - used to back up the database with the mariabackup utility.
  • Prepare - to make the backup point-in-time consistent, you need to prepare it after the raw backup was taken.
  • Copy-back - used to restore the extracted backup to the default mysql data directory. It copies the backup into the mysql directory without removing the original backup files.
  • Move-back - used to restore the extracted backup to the mysql data directory by moving all the backup directories.

To back up or restore, you just pass the relevant parameter after the mariabackup command. Below is a sample full backup command using MariaDB Backup.

mariabackup --backup --target-dir=/backup/full/ --user=bkpuser --password=p4sswordb4ckup

There are some options you need to define, such as --target-dir, the target location for the backup files, --user, the database user used for the backup, and --password, that user's password.

To make the backup point-in-time consistent, you must run the prepare step after the full backup is finished. The data files are not consistent until you run the prepare, because they were copied at different points in time while the backup was running.

To run prepare backup:

mariabackup --prepare --target-dir=/backup/full

Running prepare makes the backup ready to be restored. When the prepare was successful, you will see a message like the following on the last line:

InnoDB: Shutdown completed; log sequence number 9553231

You can run the restore command using copy-back. Here is the sample script to restore the backup:

mariabackup --copy-back --target-dir=/backup/full

You can put the above commands in a shell script, give it executable permission, and schedule it with the operating system scheduler, as sketched below.
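
A minimal sketch of such a wrapper and its crontab entry, assuming the paths and credentials from the examples above (in practice, consider an option file instead of passing the password on the command line):

#!/bin/bash
# /usr/local/bin/mariadb_full_backup.sh - take a full backup and prepare it
TARGET=/backup/full/$(date +%F)
mkdir -p "$TARGET"
mariabackup --backup --target-dir="$TARGET" --user=bkpuser --password=p4sswordb4ckup
mariabackup --prepare --target-dir="$TARGET"

Make it executable and schedule it, for example daily at 01:00:

$ chmod +x /usr/local/bin/mariadb_full_backup.sh

$ echo "0 1 * * * root /usr/local/bin/mariadb_full_backup.sh" >> /etc/crontab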

Backup and restore using ClusterControl Backup Management is very easy. ClusterControl supports logical and physical backups: for logical backups it uses mysqldump, and for physical backups it uses mariabackup, both full and incremental.

There are two options on how you want to do the backup; you can create the backup directly or you can schedule the backup.

You can also enable options like encryption, compression and parallel copy threads, as shown below:

Restoring the backup is as easy as creating it. You just need to select the full backup file that you want to restore.

There are two options on how you want to restore the backup; you can restore the backup to the nodes where the backup was taken or you can restore the backup to a dedicated standalone host.

Incremental Backup and Restore

Taking a full backup of a very large database is time consuming and resource intensive. An incremental backup captures only the changes made since the last full backup was taken.

When an incremental backup runs, MariaDB Enterprise Backup compares it to the previous full or incremental backup to find the latest changes.

mariabackup --backup --incremental-basedir=/backup/full --target-dir=/backup/incr --user=bkpuser --password=p4sswordb4ckup

Before you perform the incremental backup, you need to ensure that the full backup has been prepared. After taking the incremental backup, you apply it to the last full backup:

mariabackup --prepare  --target-dir=/backup/full --incremental-dir=/backup/incr

After the incremental backup has been applied to the full backup, the full backup directory will now have all the backup data prepared.

Restoring the prepared full backup, with all the incremental changes applied, can be done with:

mariabackup --copy-back --target-dir=/backup/full

To perform an incremental backup in ClusterControl, choose the mariabackup incremental option. You need to have a prepared full backup before taking the incremental backup.

ClusterControl automatically finds the nearest full backup when you run the incremental backup. To restore, choose the prepared full backup and restore it; ClusterControl will prompt you for how you want to restore the backup, either on the node or on a standalone host, and will restore the backup including the incremental changes.

Partial Backup and Restore

A partial backup specifies which databases or tables you want to back up. You can either choose a list of databases and tables to back up, or exclude some databases and tables from the backup. The options include: --databases, --databases-exclude, --tables, --tables-exclude.

Below is a sample command to take a partial backup of the card_data table.

mariabackup --backup --target-dir=/backup/partial --user=bkpuser --password=p4sswordb4ckup --tables=card_data

You still need to prepare the partial backup to make it point-in-time consistent, by running the command below:

mariabackup --prepare --export --target-dir=/backup/partial

Performing a partial restore is very different from restoring a full or incremental backup. You need to prepare the tables and database in the running MariaDB Server, and then manually copy the data files into the mysql data directory.

For example, say you want to do a partial restore of the card_data table (a non-partitioned table):

  • Create an empty card_data table with the same structure in the target database.
  • Run DISCARD TABLESPACE on the card_data table:
    ALTER TABLE carddb.card_data DISCARD TABLESPACE;
  • Copy the data files into the mysql data directory:
    cp /backup/partial/carddb/card_data.* /var/lib/mysql/carddb
  • Change the owner of the files to mysql:
    chown mysql:mysql /var/lib/mysql/carddb/card_data.*
  • Finally, import the tablespace:
    ALTER TABLE carddb.card_data IMPORT TABLESPACE;

Partial backup in ClusterControl is really straightforward: just enable the Partial Backup option. It will give you the option to include or exclude databases and tables, as shown below:

The next part is similar to full and incremental backups: you can choose settings like encryption and compression.

Restoring a partial backup is exactly the same as restoring a full backup. You just need to choose the partial backup, and the rest will be handled by ClusterControl.

Point in Time Recovery

Restoring a full or incremental backup gives you the data from the time the backup was taken, but not any changes that came after that. Those changes are in the binary log. When you prepare a backup with the binlog enabled, a file called xtrabackup_binlog_info is created; it contains the binary log file name and the position of the last sequence number.

You can perform point-in-time recovery after the restore is done by extracting the subsequent changes as SQL: run mysqlbinlog on the source database node to extract the events up to a specific time, and apply the resulting SQL on the target/restored database node.
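
As a sketch, assuming xtrabackup_binlog_info pointed at a hypothetical binary log file mariadb-bin.000007 and position 385, replaying all changes from that position up to a chosen point in time against the restored server could look like this:

$ mysqlbinlog --start-position=385 --stop-datetime="2020-06-01 12:00:00" /var/lib/mysql/mariadb-bin.000007 | mysql -u root -p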

Point in Time Recovery (PITR) in ClusterControl can be enabled as shown below:

You need to define up to what point to recover; two options are supported: time based and position based. For time based, you just fill in the exact time to which the data should be restored. For position based, you fill in the binlog name and position. The rest of the restore is similar.

Conclusion

That’s it for now. As we have seen above, MariaDB Backup is a nice tool with lots of options. ClusterControl provides an easy to use GUI to perform the backup procedures. It also adds a number of features like encryption, compression, scheduling, retention management and automatic backup verification.

MySQL Storage Engine Optimization: Configuring InnoDB Optimization For High Performance


InnoDB is one of the most widely used storage engines in MySQL. It is known as a high-reliability, high-performance storage engine, and its key advantages include support for row-level locking and foreign keys, and adherence to the ACID model. InnoDB replaced MyISAM as the default storage engine in MySQL 5.5, which was released in 2010.

This storage engine can be incredibly performant and powerful if optimized properly. Today we’re taking a look at the things we can do to make it perform at the very best of its ability, but before we dive into InnoDB, we should understand what the aforementioned ACID model is.

What is ACID and Why is it Important?

ACID is a set of properties of database transactions. The acronym stands for Atomicity, Consistency, Isolation and Durability. In short, these properties ensure that database transactions are processed reliably and guarantee data validity despite errors, power outages or similar issues. A database management system that adheres to these principles is said to be an ACID-compliant DBMS. Here’s how everything works in InnoDB:

  • Atomicity ensures that the statements in a transaction operate as an indivisible unit and that their effects are seen collectively or not at all;
  • Consistency is handled by MySQL’s logging mechanisms which record all changes to the database;
  • Isolation refers to InnoDB’s row-level locking;
  • Durability is also maintained because InnoDB maintains a log file that tracks all changes to the system.

Understanding InnoDB

Now that we have covered ACID, we should probably look at how InnoDB works under the hood. Here’s what InnoDB looks like from the inside (image courtesy of Percona):

InnoDB Internals
InnoDB Internals

From the image above we can clearly see that InnoDB has a few parameters crucial to its performance and these are as follows:

  • The innodb_data_file_path parameter describes the system tablespace (the system tablespace is the storage area for the InnoDB data dictionary, the double write and change buffers and undo logs). The parameter depicts the file where data derived from InnoDB tables will be stored;
  • The innodb_buffer_pool_size parameter is a memory buffer that InnoDB uses to cache data and indexes of its tables;
  • The innodb_log_file_size parameter depicts the size of InnoDB log files;
  • The innodb_log_buffer_size parameter sets the size of the buffer that InnoDB uses to write to the log files on disk;
  • The innodb_flush_log_at_trx_commit parameter controls the balance between strict ACID compliance and higher performance;
  • The innodb_lock_wait_timeout parameter is the length of time in seconds an InnoDB transaction waits for a row lock before giving up;
  • The innodb_flush_method parameter defines the method used to flush data to InnoDB data files and log files which can affect I/O throughput.

InnoDB also stores the data from its tables in a file called ibdata1 - the logs however are stored in two separate files named ib_logfile0 and ib_logfile1: all of those three files reside in the /var/lib/mysql directory. 

In order to make InnoDB as performant as possible, we must fine tune these parameters and optimize them as much as we can by looking at our available hardware resources.

Tuning InnoDB For High Performance

In order to adjust InnoDB’s performance on your hardware, follow these steps:

  • In order to extend innodb_data_file_path automatically, specify the autoextend attribute in the setting and restart the server. For example:

innodb_data_file_path=ibdata1:10M:autoextend

When the autoextend parameter is used, the data file automatically grows each time space is required, in increments defined by innodb_autoextend_increment (8MB in older MySQL versions, 64MB by default since MySQL 5.6.6). A new auto-extending data file can also be specified like so (in this case, the new data file is called ibdata2):

innodb_data_file_path=ibdata1:10M;ibdata2:10M:autoextend
  • When using InnoDB, the main mechanism used is the buffer pool. InnoDB heavily relies on the buffer pool and as a rule of thumb, the innodb_buffer_pool_size parameter should be about 60% to 80% of the total available RAM on the server. Keep in mind that you should leave some RAM for the processes running in the OS as well;

  • InnoDB’s innodb_log_file_size should be set as big as possible, but not bigger than necessary. Keep in mind that a bigger log file size is better for performance, but the bigger it is, the longer the recovery time after a crash. As such, there is no “one size fits all” solution, but the combined size of the log files should be large enough (a common rule of thumb is about an hour’s worth of write traffic). This keeps the MySQL server from constantly checkpointing and flushing to disk, saving CPU and disk I/O so it can run smoothly at peak times or under high workload. That said, the recommended approach is to test and experiment yourself and find the optimal value;

  • The innodb_log_buffer_size value should be set to at least 16M. A large log buffer allows large transactions to run without the need to write the log to disk before the transactions commit, saving some disk I/O;

  • When tuning innodb_flush_log_at_trx_commit, keep in mind that this parameter accepts three values - 0, 1 and 2. With a value of 1 you get ACID compliance and with values 0 or 2 you get more performance, but less reliability because in that case transactions for which logs have not yet been flushed to disk can be lost in a crash;

  • In order to set innodb_lock_wait_timeout to a proper value, keep in mind that this parameter defines the time in seconds (the default value is 50) before issuing the following error and rolling back the current statement:

ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
  • In InnoDB, there are multiple flush methods available. If the value is left at NULL, the default is “async_unbuffered” on Windows machines and “fsync” on Linux machines. Here’s what the methods are and what they do:

  • normal - InnoDB will use simulated asynchronous I/O and buffered I/O.
  • unbuffered - InnoDB will use simulated asynchronous I/O and non-buffered I/O.
  • async_unbuffered - InnoDB will use Windows asynchronous I/O and non-buffered I/O. Default setting on Windows machines.
  • fsync - InnoDB will use the fsync() function to flush the data and the log files. Default setting on Linux machines.
  • O_DSYNC - InnoDB will use O_SYNC to open and flush the log files and the fsync() function to flush the data files. O_DSYNC is faster than O_DIRECT, but data may or may not be consistent due to latency or an outright crash.
  • nosync - Used for internal performance testing - unsupported.
  • littlesync - Used for internal performance testing - unsupported.
  • O_DIRECT - InnoDB will use O_DIRECT to open the data files and the fsync() function to flush both the data and the log files. In comparison with O_DSYNC, O_DIRECT is more stable and more data consistent, but slower. The OS cache is avoided with this setting - this is the recommended setting on Linux machines.
  • O_DIRECT_NO_FSYNC - InnoDB will use O_DIRECT during flushing I/O - the “NO_FSYNC” part means the fsync() function will be skipped.

 
  • You should also consider enabling the innodb_file_per_table setting. This parameter is ON by default in MySQL 5.6 and higher. It relieves you of management issues related to InnoDB tables by storing them in separate files, avoiding bloated main dictionaries and system tables. Enabling this variable also reduces data recovery complexity when a certain table gets corrupted;
  • Now that you have modified these settings per the instructions outlined above, you should be almost ready to go! A consolidated sample configuration is sketched below. Before you hit the ground running though, you should probably keep an eye on the busiest file in the entire InnoDB infrastructure - ibdata1.
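
Putting it all together, a consolidated sample [mysqld] section for a dedicated server with 16GB of RAM might look like the following (the sizes are illustrative assumptions, not universal recommendations):

[mysqld]

innodb_buffer_pool_size = 12G

innodb_log_file_size = 1G

innodb_log_buffer_size = 16M

innodb_flush_log_at_trx_commit = 1

innodb_flush_method = O_DIRECT

innodb_file_per_table = 1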

Dealing with ibdata1

There are several classes of information that are stored in ibdata1:

  1. The data of InnoDB tables;
  2. The indexes of InnoDB tables;
  3. InnoDB table metadata;
  4. Multiversion Concurrency Control (MVCC) data;
  5. The doublewrite buffer - such a buffer enables InnoDB to recover from half-written pages. The purpose of such a buffer is to prevent data corruption;
  6. The insert buffer - such a buffer is used by InnoDB to buffer updates to the same page so they can be performed at once and not one after another.

When dealing with big data sets, the ibdata1 file can get extremely large, and this is the core of a very frustrating problem - the file can only grow, and by default it cannot shrink. You can shut down MySQL and delete this file, but this is not recommended unless you know what you are doing. When it is deleted, MySQL will not function properly, as the dictionary and system tables are gone and the main system tablespace is corrupted.

In order to shrink ibdata1 once and for all, follow these steps:

  1. Dump all data from InnoDB databases. You can use mysqldump or mysqlpump for this action;
  2. Drop all databases except for the mysql, performance_schema and information_schema databases;
  3. Stop MySQL;
  4. Add the following to your my.cnf file:
    [mysqld]
    innodb_file_per_table = 1
    innodb_flush_method = O_DIRECT
    innodb_log_file_size = 25% of innodb_buffer_pool_size
    innodb_buffer_pool_size = up to 60-80% of available RAM.
  5. Delete the ibdata1 and ib_logfile* files (these will be recreated upon the next restart of MySQL);
  6. Start MySQL and restore the data from the dump you took before.

After performing the steps outlined above, the ibdata1 file will still grow, but it will no longer contain the data from InnoDB tables - the file will only contain metadata, and each InnoDB table will exist outside of ibdata1. Now, if you go to the /var/lib/mysql directory, you will see two files representing each table that uses the InnoDB engine. The files will look like so:

    1. demotable.frm
    2. demotable.ibd

The .frm file contains the storage engine header and the .ibd file contains the table data and indexes of your table.

Before rolling out the changes though, make sure to fine-tune the parameters according to your infrastructure. These parameters can make or break InnoDB performance so make sure to keep an eye on them at all times. Now you should be good to go!

Summary

To summarize, optimizing the performance of InnoDB can be of great benefit if you develop applications that require data integrity and high performance at the same time. InnoDB allows you to change how much memory the engine is allowed to consume, the log file size, the flush method the engine uses and so on, and these changes can make InnoDB perform extremely well if they are tuned properly. Before performing any enhancements though, beware of the consequences of your actions to both your server and MySQL.

As always, before optimizing anything for performance always take (and test!) backups so you can restore your data if necessary and always test any changes on a local server before rolling out the changes to production.

Building a Highly Available Database for Moodle Using MariaDB (Replication & MariaDB Cluster)


Face-to-face meetings are nowadays limited to the bare minimum; online activities have taken over as the main way for teacher - student interaction. This increased the stress on existing online “meeting” platforms (is there anyone who does not know what Zoom is nowadays?), but also on online learning platforms. High availability of these online tools is more important than ever, and operations teams rush to build durable, highly available architectures for their environments.

Most likely at least some of you have used Moodle - it is a standalone online learning platform that you can deploy on premises and use to deliver online training for your organization. As we mentioned, it is more important than ever to make it work in a durable, highly available fashion. We would like to propose a highly available solution that uses MariaDB as the backend database - with both asynchronous replication and Galera Cluster.

Environment Design Process

We would like to start by explaining the thought process behind designing the environment for Moodle. We want high availability, therefore a single database node does not work for us. We want multiple nodes, and this leads us to the first design decision: should we use asynchronous replication or Galera Cluster? The second question is: how will we distribute the workload across the nodes? Let’s start with the second one.

The latest Moodle version at the time of writing this blog (3.9) introduced a nice feature called safe reads. The problem to solve here is read-after-write. When you use one node, the world is a simple place: you write and then you read, and whatever you wrote is already there. When you add nodes, though, things change. With asynchronous replication, slaves may be lagging behind by tens of seconds or more. Whatever you write on the master may take minutes (if not more in extreme cases) to be applied to the slave. If you execute a write and then immediately attempt to read the same data from one of the slaves, you may be in for a nasty surprise - the data will not be there. Galera Cluster uses “virtually” synchronous replication, and in this particular case “virtually” makes a huge difference - Galera is not immune to read-after-write problems. There is always a delay between write execution on the local node and the writeset being applied to the remaining nodes of the cluster. Sure, it is most likely measured in milliseconds rather than seconds, but it still may break the assumption that you can immediately read what you wrote. The only place where you can safely read after writing is the node on which you wrote the data.

As Moodle relies on read-after-write quite a lot, we cannot easily scale reads just by adding more nodes to read from. For Galera Cluster, we could attempt to mitigate the issue by using the wsrep_sync_wait configuration setting to force Galera to ensure that reads are safe to execute. This creates a performance impact on the system, as all reads have to wait for pending writes to be applied before they can be executed. This is also a solution only for MariaDB Cluster (and other Galera-based solutions), not for asynchronous replication. Luckily, the feature from Moodle solves this issue: you can define a list of nodes that may possibly be lagging, and Moodle will use them only for reads that do not need to be up to date with the writes. All remaining reads that require the data to be always up to date will be directed to the writer node. So, Moodle’s scalability is somewhat limited, as only the “safe” reads can be scaled out. We will definitely want to use the 3.9 feature, given that this is the only safe method to determine which SELECT should go where. Given that everything is defined in Moodle’s configuration file, we would most likely want to use a load balancer, preferably ProxySQL, to create the logic that handles the read distribution.
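
For reference, here is a sketch of what forcing that check would look like on a Galera node - a value of 1 enables the causality check for SELECT statements (this illustrates the trade-off discussed above; it is not required when Moodle's safe reads feature is used):

MariaDB [(none)]> SET SESSION wsrep_sync_wait = 1;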

Should we use MariaDB Cluster or asynchronous replication? We will actually show you how to use both. In both cases the configuration for Moodle will be pretty much the same, and in both cases we will utilize ProxySQL as the loadbalancer. The main difference between the solutions is the failover. MariaDB Cluster is way easier to deal with - if one node is down, ProxySQL will simply move the write traffic to one of the remaining nodes. With asynchronous replication, things are slightly different: if the master goes down, a failover has to happen. This does not happen automatically; you either have to perform it by hand or rely on some software to accomplish it. In our case we will use ClusterControl to manage the environment and perform the failover, therefore, from the user’s standpoint, there is not much difference between asynchronous replication and MariaDB Cluster - in both cases writer failure will be handled automatically and the cluster will automatically recover.

What we have established is that we will showcase both asynchronous and virtually synchronous replication. We will use the safe reads feature from Moodle 3.9, and we will use ProxySQL as the loadbalancer. To ensure high availability, we will need more than one ProxySQL instance, therefore we will go with two of them, and to create a single point of entry into the database layer we will use Keepalived to create a Virtual IP and point it to one of the available ProxySQL nodes. Here’s how our database cluster may look:

Moodle MariaDB Cluster

For asynchronous replication this could look something like this:

Deploying a Highly Available Database Backend for Moodle Using MariaDB Replication

Let’s start with the MariaDB Replication. We are going to use ClusterControl to deploy the whole database backend including load balancers.

Deploying MariaDB Replication Cluster

At first, we need to pick “Deploy” from the wizard:

Then we should define SSH connectivity; passwordless, key-based SSH access is a requirement for ClusterControl to manage the database infrastructure.

When you have filled in those details, it’s time to pick a vendor and version, define the superuser’s password and decide on some other details.

We are going to use MariaDB 10.4 for now. As a next step we have to define the replication topology:

We should pass the hostnames of the nodes and how they should relate to each other. Once we are happy with the topology, we can deploy. For the purpose of this blog, we will use a master and two slaves as our backend.

We have our first cluster up and ready. Now, let’s deploy ProxySQL and Keepalived.

Deploying ProxySQL

For ProxySQL it’s required to fill in some details - pick the host to install it on, decide on the ProxySQL version, and set credentials for the administrative and monitoring users. You should also import existing database users or create a new one for your application. Finally, decide which database nodes you want to use with ProxySQL and whether your application uses implicit transactions. In the case of Moodle, it does not.

Deploying Keepalived

As the next step we will deploy Keepalived.

After passing details like the ProxySQL instances that should be monitored, the Virtual IP and the interface the VIP should bind to, we are ready to deploy. After a couple of minutes everything should be ready, and the topology should look like below:

Configure Moodle and ProxySQL for Safe Reads Scale-Out

The final step will be to configure Moodle and ProxySQL to use safe reads. While it is possible to hardcode database nodes in the Moodle configuration, it is much better to rely on ProxySQL to handle topology changes. What we can do is create an additional user in the database. That user will be configured in Moodle to execute safe reads. ProxySQL will be configured to send all traffic executed by that user to the available slave nodes.

First, let’s create a user that we’ll use for read-only access.

We are granting all privileges here, but it should be possible to limit that list, as in the sketch below.
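
For example, a more restrictive sketch would grant only SELECT on the Moodle schema (the user name, schema name and password follow the examples used in this post):

MariaDB [(none)]> CREATE USER 'moodle_safereads'@'%' IDENTIFIED BY 'pass';

MariaDB [(none)]> GRANT SELECT ON moodle.* TO 'moodle_safereads'@'%';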

The user we just created has to be added to both ProxySQL instances in the cluster, in order to allow ProxySQL to authenticate as that user. In the ClusterControl UI, you can use the “Import User” action.

We can search for the user that we just created:

ProxySQL uses a concept of hostgroups - groups of hosts that serve the same purpose. In our default configuration there are two hostgroups - hostgroup 10, which always points to the current master, and hostgroup 20, which points to the slave nodes. We want this user to send its traffic to the slave nodes, therefore we will assign HG 20 as its default hostgroup.

That’s it, the user will be shown on the list of the users:

Now we should repeat the same process on the other ProxySQL node or use the “Sync Instances” option. One way or the other, both ProxySQL nodes should have the moodle_safereads user added.

The last step will be to deploy Moodle. We won’t go through the whole process here, but there is one issue we have to address: ProxySQL presents itself as MySQL 5.5.30, and Moodle complains that this is too old. We can easily change the advertised version to whatever we want:
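
As a sketch, this can be done from the ProxySQL admin interface (port 6032 by default; the version string here is an arbitrary choice):

UPDATE global_variables SET variable_value='5.7.30' WHERE variable_name='mysql-server_version';

LOAD MYSQL VARIABLES TO RUNTIME;

SAVE MYSQL VARIABLES TO DISK;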

Once this is done, we have to temporarily send all of the traffic to the master. This can be accomplished by deleting all of the query rules in ProxySQL. The ‘moodle’ user has HG10 as its default hostgroup, which means that with no query rules, all traffic from that user will be directed to the master. The second user, for safe reads, has default hostgroup 20, which is pretty much all the configuration we want to have in place.
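
As a sketch, deleting the query rules from the ProxySQL admin interface would look like this (this removes all rules, so only do it if that is the intention):

DELETE FROM mysql_query_rules;

LOAD MYSQL QUERY RULES TO RUNTIME;

SAVE MYSQL QUERY RULES TO DISK;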

Once this is done, we should edit Moodle’s configuration file and enable the safe reads feature:

<?php  // Moodle configuration file



unset($CFG);

global $CFG;

$CFG = new stdClass();



$CFG->dbtype    = 'mysqli';

$CFG->dblibrary = 'native';

$CFG->dbhost    = '192.168.1.111';

$CFG->dbname    = 'moodle';

$CFG->dbuser    = 'moodle';

$CFG->dbpass    = 'pass';

$CFG->prefix    = 'mdl_';

$CFG->dboptions = array (

  'dbpersist' => 0,

  'dbport' => 6033,

  'dbsocket' => '',

  'dbcollation' => 'utf8mb4_general_ci',

  'readonly' => [

    'instance' => [

      'dbhost' => '192.168.1.111',

      'dbport' => 6033,

      'dbuser' => 'moodle_safereads',

      'dbpass' => 'pass'

    ]

  ]



);



$CFG->wwwroot   = 'http://192.168.1.200/moodle';

$CFG->dataroot  = '/var/www/moodledata';

$CFG->admin     = 'admin';



$CFG->directorypermissions = 0777;



require_once(__DIR__ . '/lib/setup.php');



// There is no php closing tag in this file,

// it is intentional because it prevents trailing whitespace problems!

What happened here is that we added a read-only connection through ProxySQL which uses the moodle_safereads user. This user will always point to the slaves. This concludes our setup of Moodle with MariaDB replication.

Deploying a Highly Available Database Backend for Moodle Using MariaDB Cluster

This time we’ll try to use MariaDB Cluster as our backend. Again, the first step is the same: we need to pick “Deploy” from the wizard:

Once you do that, we should define SSH connectivity; passwordless, key-based SSH access is a requirement for ClusterControl to manage the database infrastructure.

Then we should decide on the vendor, version, password, hosts and a couple more settings:

Once we fill in all the details, we are good to deploy.

We could continue further here, but given that all further steps are basically the same as with MariaDB replication, we would ask you to scroll up and check the “Deploying ProxySQL” section and everything that follows it. You have to deploy ProxySQL and Keepalived, reconfigure ProxySQL, change Moodle’s configuration file, and that is pretty much it. We hope that this blog will help you build highly available environments for Moodle backed by MariaDB Cluster or replication.


Scaling Out the Moodle Database


Moodle is a very popular platform for running online courses. With the situation we have seen in 2020, Moodle, along with communicators like Zoom, forms the backbone of the services that allow online learning and stay-at-home education. The demand put on Moodle platforms has increased significantly compared to previous years. New platforms have been built, and additional load has been put on platforms that historically acted only as a helper tool but are now intended to drive the whole educational effort. How to scale out Moodle? We have a blog on this topic. How to scale the database backend for Moodle? Well, that’s another story. Let’s take a look at it, as scaling out databases is not the easiest thing to do, especially since Moodle adds its own little twist.

As the entry point we will use the architecture described in one of our earlier posts. MariaDB Cluster with ProxySQL and Keepalived on top of things. 

Scaling Out the Moodle Database

As you can see, we have a three node MariaDB Cluster with ProxySQL that splits safe reads from the rest of the traffic based on the user. 

<?php  // Moodle configuration file

unset($CFG);
global $CFG;
$CFG = new stdClass();

$CFG->dbtype    = 'mysqli';
$CFG->dblibrary = 'native';
$CFG->dbhost    = '192.168.1.222';
$CFG->dbname    = 'moodle';
$CFG->dbuser    = 'moodle';
$CFG->dbpass    = 'pass';
$CFG->prefix    = 'mdl_';
$CFG->dboptions = array (
  'dbpersist' => 0,
  'dbport' => 6033,
  'dbsocket' => '',
  'dbcollation' => 'utf8mb4_general_ci',
  'readonly' => [
    'instance' => [
      'dbhost' => '192.168.1.222',
      'dbport' => 6033,
      'dbuser' => 'moodle_safereads',
      'dbpass' => 'pass'
    ]
  ]
);

$CFG->wwwroot   = 'http://192.168.1.200/moodle';
$CFG->dataroot  = '/var/www/moodledata';
$CFG->admin     = 'admin';

$CFG->directorypermissions = 0777;

require_once(__DIR__ . '/lib/setup.php');

// There is no php closing tag in this file,
// it is intentional because it prevents trailing whitespace problems!

The moodle_safereads user, as shown above, is defined in the Moodle configuration file. This allows us to automatically and safely send writes, and all SELECT statements that require data consistency, to the writer node while still sending some of the SELECTs to the remaining nodes in the MariaDB Cluster.

Let’s assume that this particular setup is not enough for us. What are the options that we have? We have two main elements in the setup - MariaDB Cluster and ProxySQL. We’ll consider issues on both sides:

  • What can be done if the ProxySQL instance cannot cope with traffic?
  • What can be done if MariaDB Cluster cannot cope with traffic?

Let’s start with the first scenario.

ProxySQL Instance is Overloaded

In the current environment only one ProxySQL instance handles the traffic - the one the Virtual IP points to. This leaves us with a ProxySQL instance that acts as a standby - up and running but not used for anything. If the active ProxySQL instance is getting close to CPU saturation, there are a couple of things you may want to do. First, obviously, you can scale vertically - increasing the size of the ProxySQL instance might be the easiest way to let it handle higher traffic. Please keep in mind that ProxySQL, by default, is configured to use 4 threads.


If you want to be able to utilize more CPU cores, this is the setting you need to change as well.
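The thread count is controlled by the mysql-threads variable, which ProxySQL only reads at startup. A sketch of the change, run on the ProxySQL admin interface and followed by a ProxySQL restart:

UPDATE global_variables SET variable_value = '8' WHERE variable_name = 'mysql-threads';
SAVE MYSQL VARIABLES TO DISK;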

Alternatively, you can attempt to scale out horizontally. Instead of using two ProxySQL instances with a VIP you can collocate ProxySQL with the Moodle hosts. Then you want to reconfigure Moodle to connect to ProxySQL on the local host, ideally through the Unix socket - it is the most efficient way of connecting to ProxySQL. Given that we use very little ProxySQL configuration, using multiple instances of ProxySQL should not add too much overhead. If you want, you can always set up a ProxySQL Cluster to help you keep the ProxySQL instances in sync regarding the configuration.
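As a sketch, the relevant part of Moodle’s config.php for a collocated ProxySQL could look like below; /tmp/proxysql.sock is a commonly used socket path for ProxySQL, so verify the mysql-interfaces setting in your installation before relying on it:

$CFG->dbhost    = 'localhost';
$CFG->dboptions = array (
  'dbpersist' => 0,
  'dbport'    => 6033,
  'dbsocket'  => '/tmp/proxysql.sock', // assumed path; check mysql-interfaces
  'dbcollation' => 'utf8mb4_general_ci'
);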

MariaDB Cluster is Overloaded

Now we are talking about a more serious issue. Of course, increasing the size of the instances will help, as usual. On the other hand, horizontal scale out is somewhat limited because of the “safe reads” limitation. Sure, you can add more nodes to the cluster, but you can use them only for the safe reads. To what extent this lets you scale out depends on the workload. For a pure read-only workload (browsing through the contents, forums etc.) it looks quite nice:

MySQL [(none)]> SELECT hostgroup, srv_host, srv_port, status, queries FROM stats_mysql_connection_pool WHERE hostgroup IN (20, 10) AND status='ONLINE';
+-----------+---------------+----------+--------+---------+
| hostgroup | srv_host      | srv_port | status | Queries |
+-----------+---------------+----------+--------+---------+
| 20        | 192.168.1.204 | 3306     | ONLINE | 5683    |
| 20        | 192.168.1.205 | 3306     | ONLINE | 5543    |
| 10        | 192.168.1.206 | 3306     | ONLINE | 553     |
+-----------+---------------+----------+--------+---------+
3 rows in set (0.002 sec)

This is pretty much a ratio of 1:20 - for one query that hits the writer we have 20 “safe reads” that can be spread across the remaining nodes. On the other hand, when we start to modify the data, the ratio quickly changes.

MySQL [(none)]> SELECT hostgroup, srv_host, srv_port, status, queries FROM stats_mysql_connection_pool WHERE hostgroup IN (20, 10) AND status='ONLINE';
+-----------+---------------+----------+--------+---------+
| hostgroup | srv_host      | srv_port | status | Queries |
+-----------+---------------+----------+--------+---------+
| 20        | 192.168.1.204 | 3306     | ONLINE | 3117    |
| 20        | 192.168.1.205 | 3306     | ONLINE | 3010    |
| 10        | 192.168.1.206 | 3306     | ONLINE | 6807    |
+-----------+---------------+----------+--------+---------+
3 rows in set (0.003 sec)

This is the output after issuing several grades, creating forum topics and adding some course content. As you can see, with such a ratio of safe to unsafe queries the writer will be saturated earlier than the readers, therefore scaling out by adding more nodes is not suitable.

What can be done about it? There is a setting called “latency”. As per the configuration file, it determines when it is safe to read a table after a write. When a write happens, the table is marked as modified, and for the “latency” period all SELECTs touching it are sent to the writer node. Once a period longer than “latency” has passed, SELECTs from that table may again be sent to the read nodes. Please keep in mind that with MariaDB Cluster, the time required for a writeset to be applied across all of the nodes is typically very low, counted in milliseconds. This would allow us to set the latency quite low in the Moodle configuration file; a value like 0.1s (100 milliseconds) should be quite ok. Of course, should you run into any problems, you can always increase this value further.
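A sketch of how this could look in Moodle’s config.php, assuming the 'latency' option sits next to 'instance' inside the readonly section of dboptions (check the config-dist.php of your Moodle version):

  'readonly' => [
    'latency'  => 0.1, // seconds; reads of a modified table stay on the writer for this long
    'instance' => [
      'dbhost' => '192.168.1.222',
      'dbport' => 6033,
      'dbuser' => 'moodle_safereads',
      'dbpass' => 'pass'
    ]
  ]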

Another option to test would be to rely purely on MariaDB Cluster to tell when a read is safe and when it is not. There is a wsrep_sync_wait variable that can be configured to force causality checks on several access patterns (reads, updates, inserts, deletes, replaces and SHOW commands). For our purpose it is enough to ensure that reads are executed with causality enforced, thus we shall set this variable to ‘1’.
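A minimal sketch of the change; to make it survive restarts, also add wsrep_sync_wait = 1 under the [mysqld] section of the server configuration file:

SET GLOBAL wsrep_sync_wait = 1;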

We are going to make this change on all of the MariaDB Cluster nodes. We will also need to reconfigure ProxySQL for read/write split based on the query rules, not just the users, as we had previously. We will also remove the ‘moodle_safereads’ user as it is not needed anymore in this setup.

We set up three query rules that distribute the traffic based on the query. SELECT … FOR UPDATE is sent to the writer node, all SELECT queries are sent to readers and everything else (INSERT, DELETE, REPLACE, UPDATE, BEGIN, COMMIT and so on) is sent to the writer node as well.
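A sketch of what those rules could look like on the ProxySQL admin interface, assuming hostgroup 10 is the writer and hostgroup 20 holds the readers, as in the earlier configuration (the rule IDs are arbitrary):

INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (10, 1, '^SELECT.*FOR UPDATE', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (20, 1, '^SELECT', 20, 1);
-- everything else falls through to the user's default hostgroup (the writer)
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;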

This allows us to ensure that all the reads can be spread across the reader nodes thus allowing horizontal scale out through adding more nodes to the MariaDB Cluster.

We hope that with this couple of tips you will be able to scale out your Moodle database backend much more easily and to a greater extent.

A Guide to MySQL Indexes


When MySQL query optimization is mentioned, indexes are one of the first things that get covered. Today, we will try to see why they are so important.

What are Indexes?

In general, an index is an alphabetical list of records with references to the pages on which they are mentioned. In MySQL, an index is a data structure used to quickly find rows. Indexes are also called keys, and those keys are critical for good performance - as the data grows larger, using indexes properly becomes more and more important. Using indexes is one of the most powerful ways to improve query performance - if indexes are used properly, query performance might increase by tens or even hundreds of times.

Today, we will try to explain the basic benefits and drawbacks of using indexes in MySQL. Keep in mind that MySQL indexes alone deserve an entire book so this post will not cover absolutely everything, but it will be a good starting point. For those who are interested in how indexes work on a deeper level, reading the book Relational Database Index Design and the Optimizers by Tapio Lahdenmäki and Michael Leach should provide more insight.

The Benefits of Using Indexes

There are a few main benefits of using indexes in MySQL and these are as follows:

  • Indexes allow MySQL to quickly find rows matching a WHERE clause;
  • Indexes might help queries avoid searching through certain rows thus reducing the amount of data the server needs to examine - if there is a choice between multiple indexes, MySQL usually uses the most selective index, that is, the index that finds the smallest number of rows;
  • Indexes might be used in order to retrieve rows from other tables in JOIN operations;
  • Indexes might be used to find the minimum or the maximum value of a specific column that uses an index;
  • Indexes might be used to sort or group a table if the operations are performed on a leftmost prefix of an index - similarly, a leftmost prefix of a multiple-column index might also be used by the query optimizer to look up rows;
  • Indexes might also be used to save disk I/O - when a covering index is in use, a query can return values straight from the index structure saving disk I/O.

Similarly, there are multiple types of indexes:

  • INDEX is a type of index where values do not need to be unique. This type of index accepts NULL values;
  • UNIQUE INDEX is frequently used to remove duplicate rows from a table - this type of index allows developers to enforce the uniqueness of row values;
  • FULLTEXT INDEX is an index that is applied on fields that utilize full text search capabilities. This type of index finds keywords in the text instead of directly comparing values to the values in the index;
  • DESCENDING INDEX is an index that stores rows in a descending order - the query optimizer will choose this type of an index when a descending order is requested by the query. This index type was introduced in MySQL 8.0;
  • PRIMARY KEY is also an index. In a nutshell, the PRIMARY KEY is a column or a set of columns that identifies each row in a table - frequently used together with fields having an AUTO_INCREMENT attribute. This type of index does not accept NULL values and once set, the values in the PRIMARY KEY cannot be changed.

Now, we will try to go through both the benefits and the drawbacks of using indexes in MySQL. We will start with the probably most frequently discussed upside - speeding up queries that match a WHERE clause.

Speeding up Queries Matching a WHERE Clause

Indexes are frequently used to speed up search queries that match a WHERE clause. The reason why an index makes such search operations faster is pretty simple - queries that use an index avoid a full table scan.

To see whether a query that matches a WHERE clause uses an index, you can make use of the EXPLAIN statement in MySQL. The statement EXPLAIN SELECT should provide you with some insight into how the MySQL query optimizer executes the query - it can also show you whether the query in question uses an index or not and which index it uses. Take a look at the following query explanation:

mysql> EXPLAIN SELECT * FROM demo_table WHERE field_1 = 'Demo'\G
*************************** 1. row ***************************
<...>
possible_keys: NULL
key: NULL
key_len: NULL
<...>

The above query does not use an index. However, if we add an index on “field_1”, the index would be used successfully:

mysql> EXPLAIN SELECT * FROM demo_table WHERE field_1 = 'Demo'\G
*************************** 1. row ***************************
<...>
possible_keys: field_1
key: field_1
key_len: 43
<...>

The possible_keys column describes the possible indexes that MySQL can choose, the key column describes the index actually chosen and the key_len column describes the length of the chosen key.

In this case, MySQL would perform a lookup of the values in the index and return any rows containing the specified value - as a result, the query would be faster. Although indexes do help certain queries to be faster, there are a couple of things that you need to keep in mind if you want your indexes to help your queries:

  • Isolate your columns - MySQL cannot use an index if the column it covers is not isolated in the query. For example, a query like this wouldn’t use the index:
    SELECT field_1 FROM demo_table WHERE field_1 + 5 = 10;

In order to solve this, leave the column that goes after the WHERE clause alone - simplify your query as much as possible and isolate the column (see the examples after this list);

  • Avoid using LIKE queries with a preceding wildcard - in this case, MySQL will not use an index because a preceding wildcard means that there can be anything before the text. If you must use LIKE queries with wildcards and want them to make use of indexes, make sure the wildcard is at the end of the search string.
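A quick sketch of both points, assuming an index exists on field_1:

-- Column isolation: rewrite the expression so the indexed column stands alone
SELECT field_1 FROM demo_table WHERE field_1 + 5 = 10; -- cannot use the index
SELECT field_1 FROM demo_table WHERE field_1 = 10 - 5; -- can use the index

-- Wildcard placement in LIKE
SELECT * FROM demo_table WHERE field_1 LIKE '%Demo'; -- preceding wildcard: full scan
SELECT * FROM demo_table WHERE field_1 LIKE 'Demo%'; -- trailing wildcard: index range scan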

Of course, speeding up queries that match a WHERE clause can also be done in other ways (for example, partitioning), but for the sake of simplicity, we won’t be looking further into that in this post.

What we might be interested in however are different kinds of index types, so we’ll look into that now.

Getting rid of Duplicate Values in a Column - UNIQUE Indexes

The purpose of a UNIQUE index in MySQL is to enforce the uniqueness of the values in a column. To create a UNIQUE index, run a CREATE UNIQUE INDEX query:

CREATE UNIQUE INDEX demo_index ON demo_table(demo_column);

You can also create a unique index when you create a table:

CREATE TABLE demo_table (
`demo_column` VARCHAR(100) NOT NULL,
UNIQUE KEY(demo_column)
);

That’s all it takes to add a unique index to a table. Now, when you try to add a duplicate value to the table MySQL will come back with the following error:

#1062 - Duplicate entry 'Demo' for key 'demo_column'

FULLTEXT Indexes

A FULLTEXT index is an index applied to columns that use full text search capabilities. This type of index has many unique capabilities including stopwords and search modes.

The InnoDB stopword list has 36 words while the MyISAM stopword list has 143. In InnoDB, the stopwords are derived from the table set in the variable innodb_ft_user_stopword_table, otherwise, if this variable is not set they are derived from the innodb_ft_server_stopword_table variable. If neither of those two variables are set, InnoDB uses the built-in list. To see the default InnoDB stopword list, query the INNODB_FT_DEFAULT_STOPWORD table.

In MyISAM, the stopwords are derived from the storage/myisam/ft_static.c file. The ft_stopword_file variable enables the default stopword list to be changed. Stopwords will be disabled if this variable is set to an empty string, but keep in mind that if this variable defines a file, the defined file is not parsed for comments - MyISAM will treat all of the words found in the file as stopwords.

FULLTEXT indexes are also famous for their unique search modes:

  • If a FULLTEXT search query with no modifiers is run, a natural language mode will be activated. The natural language mode can also be activated by using the IN NATURAL LANGUAGE MODE modifier;
  • The WITH QUERY EXPANSION modifier enables a search mode with query expansion. Such a search mode works by performing the search twice and when the search is run for the second time, the result set would include a few of the most relevant documents from the first search. In general, this modifier is useful when the user has some implied knowledge (for example, the user might search for “database” and hope to see “InnoDB” and “MyISAM” in the result set);
  • The IN BOOLEAN MODE modifier allows searching with boolean operators. For example, the +, - or * operators would each accomplish different tasks - the + operator would define that the value must be present in a row, the - operator would define that the value must not exist and the * operator would act as a wildcard.

A query that uses a FULLTEXT index looks like so:

SELECT * FROM demo_table WHERE MATCH(demo_field) AGAINST('value' IN NATURAL LANGUAGE MODE);

Keep in mind that FULLTEXT indexes are generally useful for MATCH() AGAINST() operations, not for WHERE operations - meaning that if a WHERE clause is used, the other index types remain just as useful.

It is also worth mentioning that FULLTEXT indexes have a minimum token length. In InnoDB, a FULLTEXT search can only be performed when the search query consists of a minimum of three characters - this limit is increased to four characters in the MyISAM storage engine (these limits are controlled by the innodb_ft_min_token_size and ft_min_word_len variables respectively).

DESCENDING Indexes

A DESCENDING index is such an index where InnoDB stores the entries in a descending order - the query optimizer will use such an index when a descending order is requested by the query. Such an index can be added to a column by running a query like below:

CREATE INDEX descending_index ON demo_table(column_name DESC);

An ascending index can also be added to a column - just replace DESC with ASC.

PRIMARY KEYs

A PRIMARY KEY serves as a unique identifier for each row in a table. A column with a PRIMARY KEY must contain unique values - no NULL values are allowed either. If a duplicate value is added to a column which has a PRIMARY KEY, MySQL will respond with error #1062:

#1062 - Duplicate entry 'Demo' for key 'PRIMARY'

If a NULL value is added to the column, MySQL will respond with error #1048:

#1048 - Column 'id' cannot be null
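For completeness, a sketch of defining a PRIMARY KEY, either at table creation time or afterwards (the table and column names are placeholders):

CREATE TABLE demo_table (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(100) NOT NULL
);

-- or, on an existing table:
ALTER TABLE demo_table ADD PRIMARY KEY (id);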

Primary indexes are also sometimes called clustered indexes (we discuss them later).

You can also create indexes on multiple columns at once - such indexes are called multicolumn indexes.

Multicolumn Indexes

Indexes on multiple columns are often misunderstood - sometimes developers and DBAs index all of the columns separately, or index them in the wrong order. The order of columns in a multicolumn index is one of the most common causes of confusion in this space - there are no “this way or the highway” solutions, since the correct column order depends on the queries that use the index. While this may seem pretty obvious, do remember that the column order is vital when dealing with multicolumn indexes - choose the column order such that the index is as selective as possible for the queries that will run most frequently.

In order to measure the selectivity of a specific column, get the ratio of the number of distinct indexed values to the total number of rows in the table - the column with the higher selectivity should be the first one.
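A sketch of such a measurement (the table and column names are placeholders):

SELECT COUNT(DISTINCT column_1) / COUNT(*) AS selectivity_1,
       COUNT(DISTINCT column_2) / COUNT(*) AS selectivity_2
FROM demo_table;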

Sometimes you also need to index very long character columns, and in that case, you can often save time and resources by indexing the first few characters - a prefix - instead of the whole value.

Prefix Indexes

Prefix indexes can be useful when the columns contain very long string values, which would mean that adding an index on the whole column would consume a lot of disk space. MySQL helps to address this issue by allowing you to only index a prefix of the value which in turn makes the index size smaller. Take a look:

CREATE TABLE `demo_table` (
`demo_column` VARCHAR(100) NOT NULL,
INDEX(demo_column(10))
);

The above query would create a prefix index on the demo column only indexing the first 10 characters of the value. You can also add a prefix index to an existing table:

CREATE INDEX index_name ON table_name(column_name(length));

So, for example, if you would want to index the first 5 characters of a demo_column on a demo_table, you could run the following query:

CREATE INDEX demo_index ON demo_table(demo_column(5));

You should choose a prefix that is long enough to provide good selectivity, but short enough to save space. This might be easier said than done though - you need to experiment and find the solution that works for you.
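One way to experiment is to compare the selectivity of a few candidate prefix lengths against the selectivity of the full column - a sketch, again with placeholder names:

SELECT COUNT(DISTINCT LEFT(demo_column, 5))  / COUNT(*) AS sel_5,
       COUNT(DISTINCT LEFT(demo_column, 10)) / COUNT(*) AS sel_10,
       COUNT(DISTINCT demo_column)           / COUNT(*) AS sel_full
FROM demo_table;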

Covering Indexes

A covering index “covers” all of the fields required to execute a query. In other words, when all fields in a query are covered by an index, a covering index is in use. For example, for a query like this:

SELECT id, title FROM demo_table WHERE id = 1;

A covering index might look like this:

ALTER TABLE demo_table ADD INDEX index_name (id, title);

If you want to make sure that a query uses a covering index, issue an EXPLAIN statement on it, then take a look at the Extra column. For example, if your table has a multicolumn index on id and title and a query that accesses only these two columns is executed, MySQL will use the index:

mysql> EXPLAIN SELECT id, title FROM demo_table\G
*************************** 1. row ***************************
<...>
type: index
key: index_name
key_len: 5
rows: 1000
Extra: Using index
<...>

Keep in mind that a covering index must store the values from the columns it covers. That means that MySQL can only use B-Tree indexes to cover queries because other kinds of indexes do not store these values.

Clustered, Secondary Indexes, and Index Cardinality

When indexes are discussed, you might also hear the terms clustered, secondary indexes, and index cardinality. Put simply, clustered indexes are an approach to data storage and all indexes other than clustered indexes are secondary indexes. Index cardinality on the other hand is the number of unique values in an index.

A clustered index speeds up queries because close values are also stored close to each other on the disk, but that’s also the reason why you can only have one clustered index in a table.

A secondary index is any index that isn’t the primary index. Such an index may have duplicates.

The Drawbacks of Using Indexes

The usage of indexes certainly has upsides, but we mustn’t forget that indexes can be one of the leading causes of issues in MySQL too. Some of the drawbacks of using indexes are as follows:

  • Indexes can degrade the performance of certain queries - even though indexes tend to speed up the performance of SELECT queries, they slow down the performance of INSERT, UPDATE, and DELETE queries because when the data is updated the index also needs to be updated together with it: any operation that involves manipulating the indexes will be slower than usual;
  • Indexes consume disk space - an index occupies its own space, so indexed data will consume more disk space too;
  • Redundant and duplicate indexes can be a problem - MySQL allows you to create duplicate indexes on a column and it does not “protect you” from making such a mistake. Take a look at this example: 
    CREATE TABLE `demo_table` (
      `id` INT(10) NOT NULL AUTO_INCREMENT PRIMARY KEY,
      `column_2` VARCHAR(10) NOT NULL,
      `column_3` VARCHAR(10) NOT NULL,
      INDEX(id),
      UNIQUE(id)
    );

An inexperienced user might think that this query makes the id column increment automatically, adds an index on the column and makes the column reject duplicate values. However, this isn’t what’s happening here. In this case, the same column has three indexes on it: an ordinary INDEX, and, since MySQL implements both PRIMARY KEY and UNIQUE constraints with indexes, two more indexes on the very same column!

Conclusion

To conclude, indexes in MySQL have their own place - they can be used in a multitude of scenarios, but each of those scenarios has its own downsides which must be considered in order to get the most out of the indexes in use.

To use indexes well, profile your queries, take a look at what options you have when it comes to indexes, know their benefits and disadvantages, decide what indexes you need based on your requirements and after you index the columns, make sure your indexes are actually used by MySQL. If you have indexed your schema properly, the performance of your queries should improve, but if the response time doesn’t satisfy you, see if a better index can be created in order to improve it.

MaxScale Basic Management Using MaxCtrl for MariaDB Cluster - Part Two


In the previous blog post, we have covered 4 basic management components using the MaxCtrl command-line client. In this blog post, we are going to cover the remaining part of the MaxScale components which are commonly used in a MariaDB Cluster:

  • Filter management
  • MaxScale management
  • Logging management

All of the commands in this blog post are based on MaxScale 2.5.3. 

Filter Management

A filter is a module in MaxScale which acts as a processing engine for a MaxScale service. The filtering happens between the client connection to MaxScale and the MaxScale connection to the backend database servers. This path (from the client side of MaxScale out to the actual database servers) can be considered a pipeline; filters can then be placed in that pipeline to monitor, modify, copy or block the content that flows through it.

There are many filters that can be applied to extend the processing capabilities of a MaxScale service, as shown in the following list:

  • Binlog - Selectively replicates the binary log events to slave servers, combined together with a binlogrouter service.
  • Cache - A simple cache that is capable of caching the result of SELECTs, so that subsequent identical SELECTs are served directly by MaxScale, without the queries being routed to any server.
  • Consistent Critical Read - Allows consistent critical reads to be done through MaxScale while still allowing scaleout of non-critical reads.
  • Database Firewall - Blocks queries that match a set of rules. This filter should be viewed as a best-effort solution intended for protecting against accidental misuse rather than malicious attacks.
  • Hint - Adds routing hints to a service, instructing the router to route a query to a certain type of server.
  • Insert Stream - Converts bulk inserts into CSV data streams that are consumed by the backend server via the LOAD DATA LOCAL INFILE mechanism.
  • Lua - Calls a set of functions in a Lua script.
  • Masking - Obfuscates the returned value of a particular column.
  • Maxrows - Restricts the number of rows that a SELECT, a prepared statement, or a stored procedure can return to the client application.
  • Named Server - Routes queries to servers based on regular expression (regex) matches.
  • Query Log All - Logs query content to a file in CSV format.
  • Regex - Rewrites query content using regular expression matches and text substitution.
  • Tee - Makes copies of requests from the client and sends the copies to another service within MariaDB MaxScale.
  • Throttle - Replaces and extends the limit_queries functionality of the Database Firewall filter.
  • Top - Monitors the query performance of the selected SQL statements that pass through the filter.
  • Transaction Performance Monitoring - Monitors every SQL statement that passes through the filter, grouped per transaction, for transaction performance analysis.

Every filter is configured in its own way, and filters are commonly attached to a MaxScale service. For example, a binlog filter can be applied to the binlogrouter service to replicate only a subset of data onto a slave server, which can hugely reduce the disk space needed for huge tables. Check out the MaxScale filters documentation for the right way to configure the parameters of the corresponding filter.

Create a Filter

Every MaxScale filter has its own way to be configured. In this example, we are going to create a masking filter, to mask our sensitive data for column "card_no" in our table "credit_cards". Masking requires a rule file, written in JSON format. Firstly, create a directory to host our rule files:

$ mkdir /var/lib/maxscale/rules

Then, create a text file:

$ vi /var/lib/maxscale/rules/masking.json

Specify the lines as below:

{
    "rules": [
        {
            "obfuscate": {
                "column": "card_no"
            }
        }
    ]
}

The above simple rule will obfuscate the output of the card_no column for any table, protecting the sensitive data from being seen by the MariaDB client.

After the rule file has been created, we can create the filter, using the following command:

maxctrl: create filter Obfuscates-card masking rules=/var/lib/maxscale/rules/masking.json
OK

Note that some filters require different parameters. As for this masking filter, the basic parameter is "rules", where we need to specify the created masking rule file in JSON format.

Attach a Filter to a Service

A filter can only be activated by attaching it to a service. Modifying an existing service using MaxCtrl is only supported for some parameters, and adding a filter is not one of them. We have to add the filter under MaxScale's service configuration file to attach it. In this example, we are going to apply the "Obfuscates-card" filter to our existing round-robin service called rr-service.

Go to /var/lib/maxscale/maxscale.cnf.d directory and find rr-service.cnf, open it with a text editor and then add the following line:

filters=Obfuscates-card

A MaxScale restart is required to load the new change:

$ systemctl restart maxscale

To test the filter, we will use a MariaDB client and compare the output by connecting to two different services. Our rw-service is attached to a listener listening on port 3306, without any filters configured. Hence, we should see the unfiltered response from MaxScale:

$ mysql -ucard_user -p -hmaxscale_host -P3306 -e "SELECT * FROM secure.credit_cards LIMIT 1"
+----+-----------+-----------------+-------------+-----------+---------+
| id | card_type | card_no         | card_expiry | card_name | user_id |
+----+-----------+-----------------+-------------+-----------+---------+
|  1 | VISA      | 425388910909238 | NULL        | BOB SAGAT |       1 |
+----+-----------+-----------------+-------------+-----------+---------+

When connecting to the rr-service listener on port 3307, which is configured with our filter, the "card_no" value is obfuscated into gibberish:

$ mysql -ucard_user -p -hmaxscale_host -P3307 -e "SELECT * FROM secure.credit_cards LIMIT 1"
+----+-----------+-----------------+-------------+-----------+---------+
| id | card_type | card_no         | card_expiry | card_name | user_id |
+----+-----------+-----------------+-------------+-----------+---------+
|  1 | VISA      | ~W~p[=&^M~5f~~M | NULL        | BOB SAGAT |       1 |
+----+-----------+-----------------+-------------+-----------+---------+

This filtering is performed by MaxScale, following the matching rules inside masking.json that we have created earlier.

List Filters

To list out all created filters, use the "list filters" command:

maxctrl: list filters
┌─────────────────┬────────────┬─────────────┐
│ Filter          │ Service    │ Module      │
├─────────────────┼────────────┼─────────────┤
│ qla             │            │ qlafilter   │
├─────────────────┼────────────┼─────────────┤
│ Obfuscates-card │ rr-service │ masking     │
├─────────────────┼────────────┼─────────────┤
│ fetch           │            │ regexfilter │
└─────────────────┴────────────┴─────────────┘

In the above examples, we have created 3 filters. However, only the Obfuscates-card filter is linked to a service.

To show all filters in detail:

maxctrl: show filters

Or if you want to show a particular filter:

maxctrl: show filter Obfuscates-card
┌────────────┬──────────────────────────────────────────────────────┐
│ Filter     │ Obfuscates-card                                      │
├────────────┼──────────────────────────────────────────────────────┤
│ Module     │ masking                                              │
├────────────┼──────────────────────────────────────────────────────┤
│ Services   │ rr-service                                           │
├────────────┼──────────────────────────────────────────────────────┤
│ Parameters │ {                                                    │
│            │     "check_subqueries": true,                        │
│            │     "check_unions": true,                            │
│            │     "check_user_variables": true,                    │
│            │     "large_payload": "abort",                        │
│            │     "prevent_function_usage": true,                  │
│            │     "require_fully_parsed": true,                    │
│            │     "rules": "/var/lib/maxscale/rules/masking.json", │
│            │     "treat_string_arg_as_field": true,               │
│            │     "warn_type_mismatch": "never"│
│            │ }                                                    │
└────────────┴──────────────────────────────────────────────────────┘

Delete a Filter

In order to delete a filter, one has to unlink it from the associated services first. For example, consider the following filters in MaxScale:

 maxctrl: list filters
┌─────────────────┬────────────┬───────────┐
│ Filter          │ Service    │ Module    │
├─────────────────┼────────────┼───────────┤
│ qla             │            │ qlafilter │
├─────────────────┼────────────┼───────────┤
│ Obfuscates-card │ rr-service │ masking   │
└─────────────────┴────────────┴───────────┘

For the qla filter, we can simply use the following command to delete it:

 maxctrl: destroy filter qla
OK

However, the Obfuscates-card filter has to be unlinked from rr-service first and unfortunately, this requires a configuration file modification and a MaxScale restart. Go to the /var/lib/maxscale/maxscale.cnf.d directory and find rr-service.cnf, open it with a text editor and then remove the following line:

filters=Obfuscates-card

You could also just remove the "Obfuscates-card" string from the above line and leave the "filters" line with an empty value. Then, save the file and restart the MaxScale service to load the changes:

$ systemctl restart maxscale

Only then can we remove the Obfuscates-card filter from MaxScale using the "destroy filter" command:

maxctrl: destroy filter Obfuscates-card
OK

MaxScale Management

List Users

To list all MaxScale users, use the "list users" command:

maxctrl: list users
┌───────┬──────┬────────────┐
│ Name  │ Type │ Privileges │
├───────┼──────┼────────────┤
│ admin │ inet │ admin      │
└───────┴──────┴────────────┘

Create a MaxScale User

By default, a created user is a read-only user:

 maxctrl: create user dev mySecret
OK

To create an administrator user, specify the --type=admin option:

 maxctrl: create user dba mySecret --type=admin
OK

Delete a MaxScale User

To delete a user, simply use the "destroy user" command:

 maxctrl: destroy user dba
OK

The last remaining administrative user cannot be removed. Create a replacement administrative user before attempting to remove the last administrative user.

Show MaxScale Parameters

To show all loaded parameters for the MaxScale instance, use the "show maxscale" command:

maxctrl: show maxscale
┌──────────────┬──────────────────────────────────────────────────────────────────────┐
│ Version      │ 2.5.3                                                                │
├──────────────┼──────────────────────────────────────────────────────────────────────┤
│ Commit       │ de3770579523e8115da79b1696e600cce1087664                             │
├──────────────┼──────────────────────────────────────────────────────────────────────┤
│ Started At   │ Mon, 21 Sep 2020 04:44:49 GMT                                        │
├──────────────┼──────────────────────────────────────────────────────────────────────┤
│ Activated At │ Mon, 21 Sep 2020 04:44:49 GMT                                        │
├──────────────┼──────────────────────────────────────────────────────────────────────┤
│ Uptime       │ 1627                                                                 │
├──────────────┼──────────────────────────────────────────────────────────────────────┤
│ Parameters   │ {                                                                    │
│              │     "admin_auth": true,                                              │
│              │     "admin_enabled": true,                                           │
│              │     "admin_gui": true,                                               │
│              │     "admin_host": "127.0.0.1",                                       │
│              │     "admin_log_auth_failures": true,                                 │
│              │     "admin_pam_readonly_service": null,                              │
│              │     "admin_pam_readwrite_service": null,                             │
│              │     "admin_port": 8989,                                              │
│              │     "admin_secure_gui": true,                                        │
│              │     "admin_ssl_ca_cert": null,                                       │
│              │     "admin_ssl_cert": null,                                          │
│              │     "admin_ssl_key": null,                                           │
│              │     "auth_connect_timeout": 10000,                                   │
│              │     "auth_read_timeout": 10000,                                      │
│              │     "auth_write_timeout": 10000,                                     │
│              │     "cachedir": "/var/cache/maxscale",                               │
│              │     "connector_plugindir": "/usr/lib/x86_64-linux-gnu/mysql/plugin", │
│              │     "datadir": "/var/lib/maxscale",                                  │
│              │     "debug": null,                                                   │
│              │     "dump_last_statements": "never",                                 │
│              │     "execdir": "/usr/bin",                                           │
│              │     "language": "/var/lib/maxscale",                                 │
│              │     "libdir": "/usr/lib/x86_64-linux-gnu/maxscale",                  │
│              │     "load_persisted_configs": true,                                  │
│              │     "local_address": null,                                           │
│              │     "log_debug": false,                                              │
│              │     "log_info": false,                                               │
│              │     "log_notice": false,                                             │
│              │     "log_throttling": {                                              │
│              │         "count": 0,                                                  │
│              │         "suppress": 0,                                               │
│              │         "window": 0                                                  │
│              │     },                                                               │
│              │     "log_warn_super_user": false,                                    │
│              │     "log_warning": false,                                            │
│              │     "logdir": "/var/log/maxscale",                                   │
│              │     "max_auth_errors_until_block": 10,                               │
│              │     "maxlog": true,                                                  │
│              │     "module_configdir": "/etc/maxscale.modules.d",                   │
│              │     "ms_timestamp": true,                                            │
│              │     "passive": false,                                                │
│              │     "persistdir": "/var/lib/maxscale/maxscale.cnf.d",                │
│              │     "piddir": "/var/run/maxscale",                                   │
│              │     "query_classifier": "qc_sqlite",                                 │
│              │     "query_classifier_args": null,                                   │
│              │     "query_classifier_cache_size": 0,                                │
│              │     "query_retries": 1,                                              │
│              │     "query_retry_timeout": 5000,                                     │
│              │     "rebalance_period": 0,                                           │
│              │     "rebalance_threshold": 20,                                       │
│              │     "rebalance_window": 10,                                          │
│              │     "retain_last_statements": 0,                                     │
│              │     "session_trace": 0,                                              │
│              │     "skip_permission_checks": false,                                 │
│              │     "sql_mode": "default",                                           │
│              │     "syslog": true,                                                  │
│              │     "threads": 1,                                                    │
│              │     "users_refresh_interval": 0,                                     │
│              │     "users_refresh_time": 30000,                                     │
│              │     "writeq_high_water": 16777216,                                   │
│              │     "writeq_low_water": 8192                                         │
│              │ }                                                                    │
└──────────────┴──────────────────────────────────────────────────────────────────────┘

Alter MaxScale Parameters

Only some of the MaxScale parameters can be altered at runtime via the "alter maxscale" command, among them:

  • auth_connect_timeout
  • auth_read_timeout
  • auth_write_timeout
  • admin_auth
  • admin_log_auth_failures
  • passive

The rest of the parameters must be set inside /etc/maxscale.cnf, which requires a MaxScale restart to apply the new changes.
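For example, a runtime change of one of the parameters listed above could look like this (a minimal sketch; the same syntax applies to the other alterable parameters):

maxctrl: alter maxscale passive true
OK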

MaxScale GUI

MaxGUI is a new browser-based tool for configuring and managing MaxScale, introduced in version 2.5. It's accessible via port 8989 of the MaxScale host on the localhost interface, 127.0.0.1. By default, it is required to set admin_secure_gui=true and configure both the admin_ssl_key and admin_ssl_cert parameters. However, in this blog post, we are going to allow connectivity via plain HTTP by adding the following line under the [maxscale] section inside /etc/maxscale.cnf:

admin_secure_gui = false

Restart MaxScale service to load the change:

$ systemctl restart maxscale

Since the GUI is listening on the localhost interface, we can use SSH tunneling to access the GUI from our local workstation:

$ ssh -L 8989:localhost:8989 ubuntu@<Maxscale public IP address>

Then, open a web browser, point the URL to http://127.0.0.1:8989/ and log in. MaxGUI uses the same credentials as maxctrl, thus the default username is "admin" with the password "mariadb". For security purposes, one should create a new admin user with a stronger password specifically for this purpose. Once logged in, you should see the MaxGUI dashboard as below:

Most of the MaxCtrl management commands that we have shown in this blog series can be performed directly from this GUI. If you click on the "Create New" button, you will be presented with the following dialog:

As you can see, all of the important MaxScale components can be managed directly from the GUI, with a nice, intuitive, clean look, which makes things much simpler and more straightforward to manage. For example, associating a filter with a service can be done directly from the UI, without the need to restart the MaxScale service, as shown under the "Attach a Filter to a Service" section in this blog post.

For more information about this new GUI, check out this MaxGUI guide.

Logging Management

Show Logging Parameters

To display the logging parameters, use the "show logging" command:

 maxctrl: show logging
┌────────────────────┬────────────────────────────────┐
│ Current Log File   │ /var/log/maxscale/maxscale.log │
├────────────────────┼────────────────────────────────┤
│ Enabled Log Levels │ alert                          │
│                    │ error                          │
│                    │ warning                        │
│                    │ notice                         │
├────────────────────┼────────────────────────────────┤
│ Parameters         │ {                              │
│                    │    "highprecision": true,     │
│                    │     "log_debug": false,        │
│                    │     "log_info": false,         │
│                    │     "log_notice": true,        │
│                    │     "log_warning": true,       │
│                    │     "maxlog": true,            │
│                    │     "syslog": true,            │
│                    │     "throttling": {            │
│                    │         "count": 10,           │
│                    │         "suppress_ms": 10000,  │
│                    │         "window_ms": 1000      │
│                    │     }                          │
│                    │ }                              │
└────────────────────┴────────────────────────────────┘

Edit Logging Parameters

All of the logging parameters shown above can be configured via the MaxCtrl command at runtime. For example, we can turn on log_info by using the "alter logging" command:

maxctrl: alter logging log_info true

Rotate Logs

By default, MaxScale provides a logrotate configuration file under /etc/logrotate.d/maxscale_logrotate. Based on the log rotation configuration, the log file is rotated monthly and makes use of MaxCtrl's "rotate logs" command. We can force log rotation to happen immediately with the following command:

$ logrotate --force /etc/logrotate.d/maxscale_logrotate

Verify with the following command:

$ ls -al /var/log/maxscale/
total 1544
drwxr-xr-x  2 maxscale maxscale    4096 Sep 21 05:53 ./
drwxrwxr-x 10 root     syslog      4096 Sep 20 06:25 ../
-rw-r--r--  1 maxscale maxscale      75 Sep 21 05:53 maxscale.log
-rw-r--r--  1 maxscale maxscale  253250 Sep 21 05:53 maxscale.log.1
-rw-r--r--  1 maxscale maxscale 1034364 Sep 18 06:25 maxscale.log.2
-rw-r--r--  1 maxscale maxscale  262676 Aug  1 06:25 maxscale.log.3

Conclusion

We have reached the end of this blog series on MaxScale deployment and management using the MaxCtrl client. Across this blog series, we have used a couple of different recent MaxScale versions (relative to the write-up date) and we have seen many significant improvements in every version. 

Kudos to the MariaDB MaxScale team for their hard work in making MaxScale one of the best database load balancer tools in the market.

How to Upgrade from MariaDB 10.4 to MariaDB 10.5


MariaDB 10.5 was released as GA in June 2020. The release added support for Amazon S3, or any third-party public or private cloud that supports the S3 API. It also features sophisticated privilege handling with extended granularity, which enables a DBA, for example, to grant limited privileges to a particular database user for tight security of your database. 

MariaDB 10.5 also boasts performance improvements in the InnoDB storage engine. Some new variables have been introduced, while other major variables have been marked as deprecated or removed entirely. For example, take note that in MariaDB 10.5, innodb_buffer_pool_instances has already been marked as deprecated and is set to be removed in version 10.6. If you're curious about the reasoning, please check out MDEV-15058. 

With all of these changes, it's best to deliver this blog as a guide on how to upgrade from MariaDB 10.4 to MariaDB 10.5. We'll go step-by-step through the things you need to consider when upgrading.

Things You Need Before Upgrading

It's never a good idea to upgrade your database live in production without testing first - the kind of move that earns the term SNAFU. You might hit Google to find out what the term means, but basically, it's always best not to touch systems that are functioning normally. However, your system cannot stay constant forever; it has to be upgraded to get the security patches, bug fixes, and advanced features present in newer releases. So you should always have a failback mechanism planned and set up ahead of the upgrade. If the upgrade runs into issues that went unnoticed, it can impact your business.

Always Create a Backup of Your Database

In this case, always create a backup of your data. You can use tools such as mariabackup or mydumper or, if you are a ClusterControl user, the Database Backup Management tool. If you are not sure yet what type of backup you need, check out the best practices for taking a backup.

Test...Test… and Test Again

While a backup provides the data to restore the system to its primary state if unforeseen problems occur, upgrading to a major release has to be tested first on a development or staging machine. For large enterprise companies, it's common practice to always run a regression test in a targeted QA or staging environment where the upgrade of the database servers to the new major version is applied first. All systems, from application to database, have to go through a regression test or a series of QA tests until everything passes. It's not a good idea to oversimplify the test cases your application runs against the database and conclude that everything is fine just because the database does not crash, or because a short, very simple test covering only a small percentage of your overall system succeeds. Testing your upgrade first in a staging or QA environment must be prioritized, so that your application reaches perfectly good shape without impacting the business side or the users of your application - far better than realizing too late that the database upgrade causes your system to behave abnormally due to changes you have not yet discovered.

Prepare A Restore Procedure

Everything has to be planned during the upgrade of your database. Whenever a backup is available and testing reveals strong and promising results, the upgrade of your production MariaDB database servers feels secure and predictable even if unexpected challenges occur. In this case, always write and prepare a procedure that makes things go back to normal smoothly and seamlessly. 

If your maintenance window is not too long, preparing a restore procedure with automation tools such as Ansible, Chef, Puppet, SaltStack, or Terraform can be a good choice. It minimizes human error and provides speed and agility for performing vital tasks. However, automation can fail too - a single error in the automation script can cause damage - so you cannot ignore that possibility. This also means the restore procedure itself has to be seamless and tested properly, so that it verifiably restores the system to a valid state.

MariaDB Upgrade Procedures

Upgrading from MariaDB version 10.4 to 10.5 is not a hassle; it is straightforward. Below are the steps you can follow to upgrade to the latest MariaDB 10.5 version.

Prepare Your Repository

Since you have MariaDB 10.4, we assume that a repository is already present on your current MariaDB server nodes. Otherwise, you can add a repository anyway, and that's simple to do. 

Ubuntu/Debian

For Ubuntu/Debian based systems with an existing MariaDB repository, you can simply edit the repository. First, verify which repositories are present on your host and whether there's an existing MariaDB repository among them. To do that, just run:

$ grep ^[^#] /etc/apt/sources.list /etc/apt/sources.list.d/*

Typically, you have a mariadb.list repository. In my setup on Ubuntu 18.04 (Bionic), this shows as the following:

root@debnode20:/vagrant# cat /etc/apt/sources.list.d/mariadb.list

deb [arch=amd64] http://ftp.osuosl.org/pub/mariadb/repo/10.4/ubuntu bionic main

Just run the following commands to add the MariaDB 10.5 repository:

$ . /etc/os-release
$ echo "deb [arch=amd64] http://ftp.osuosl.org/pub/mariadb/repo/10.5/${ID} ${VERSION_CODENAME} main" | sudo tee -a /etc/apt/sources.list.d/mariadb.list

Before MariaDB packages can be installed, the GPG public key used to verify the digital signatures of the packages in the repository must be imported. You can check your apt keys with the following:

$ apt-key list |grep -C2 -i 'mariadb'

If the key isn't imported, run:

$ sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xcbcb082a1bb943db

or, for newer Ubuntu/Debian based versions, i.e. starting with Debian 9 (Stretch), Debian Unstable (Sid), and Ubuntu 16.04 LTS (Xenial):

$ sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8

Once done, just run

$ sudo apt update

CentOS/RHEL

For CentOS/RHEL, if you have an existing repository, you can just add or edit the file. Otherwise, adding the lines below for the MariaDB 10.5 repository will satisfy the repository requirements (see mariadb.repo). For example, I have the following mariadb.repo in my CentOS 8.0 host:

[root@testnode30 ~]# cat /etc/yum.repos.d/mariadb.repo

[mariadb]

name = MariaDB Repository

baseurl = http://yum.mariadb.org/10.4/centos8-amd64

enabled = 1

gpgkey = https://yum.mariadb.org/RPM-GPG-KEY-MariaDB

gpgcheck = 1



[mariadb_10.5]

name = MariaDB Repository For 10.5

baseurl = http://yum.mariadb.org/10.5/centos8-amd64

enabled = 1

gpgkey = https://yum.mariadb.org/RPM-GPG-KEY-MariaDB

gpgcheck = 1

You can verify that the MariaDB repository is enabled and works fine:

[root@testnode32 ~]# dnf --disablerepo=* --enablerepo=mariadb_10.5 repolist 

repo id                                repo name                                              status

mariadb_10.5                           MariaDB Repository For 10.5                            83

Upgrade Your MariaDB Packages

Upgrading MariaDB is very straightforward. Make sure that you have shut down the MariaDB server properly first.

For a busy, live production server, ensure that you have no incoming connections and that the dirty pages are properly flushed to disk. Before shutting down the server, you can make the InnoDB storage engine flush dirty pages aggressively, so that they are all flushed and the shutdown process is faster:

SET GLOBAL innodb_max_dirty_pages_pct = 0;

Then monitor the dirty pages with the following,

$ mysqladmin ext -i10 | grep dirty

| Innodb_buffer_pool_pages_dirty                         | 0                                                |

| Innodb_buffer_pool_bytes_dirty                         | 0                                                |

Once the counters are down to zero, shut down the MariaDB instance,

systemctl stop mariadb

For a master/replica database cluster, it's good practice to always start your upgrade on the replica(s). So before the upgrade and after the shutdown, make sure you have added the following to your my.cnf configuration file,

[mysqld]
….
skip-slave-start

This prevents the replication threads from starting automatically when the MariaDB server starts, which gives you more safety and avoids further replication mistakes. Start the replication threads manually once you are ready, with the following statement,

START SLAVE;

Ubuntu/Debian

Upgrading with Ubuntu/Debian based systems is pretty straightforward,

sudo apt install --only-upgrade  mariadb-server mariadb-client mariadb-backup mariadb-common

Of course, do not supply the -y option so you can review the packages to be updated.

CentOS/RHEL

As with Ubuntu/Debian based systems, upgrading your current MariaDB 10.4 version on CentOS/RHEL is hassle-free. You can run the following command to carry out the process,

$ dnf --disablerepo=* --enablerepo=mariadb_10.5 upgrade MariaDB-server MariaDB-client MariaDB-backup MariaDB-common MariaDB-shared

Post Installation/Package Upgrade

Once the packages have been upgraded, and since this is a major upgrade, do not forget to reload the systemd daemon. Just run,

$ systemctl daemon-reload

Now that you're set, start the mariadb service

$ systemctl start mariadb

and run mysql_upgrade,

$ mysql_upgrade 

While running mysql_upgrade, always monitor the error log so you can catch any errors before resuming normal operations:

tail -5f /var/log/mariadb/mariadb.log

Upgrade Tips for ClusterControl Users

Since ClusterControl does not support major version upgrades, when performing a package upgrade, do not forget to turn off the auto recovery modes for your MariaDB cluster. Set the nodes to maintenance mode so that alerts are silenced and no false alerts are sent.

Top Open Source Tools for MySQL & MariaDB Migrations


Large organizations that use MySQL or MariaDB database platforms are often faced with the need to perform a database migration from one place to another. Regardless of the platform, the type of database software (such as moving from an RDBMS to NoSQL, or from NoSQL back to an RDBMS), or whether it's just a data migration, a migration involves a huge amount of work and cost.

A database migration will always involve the process of migrating data from one or more source databases to one or more target databases. This can involve a database migration service or a mashup set of tools that the engineers have built to create a service and tailor to this kind of problem. 

A database migration does not mean the target platform will end up exactly like the source of origin. Once a migration is finished, the dataset on the target databases may well end up restructured. What matters most, once the migration is fully done, is that the clients accessing the database are redirected to the new source databases, and that the new source database provides an exact copy of the data, with no performance degradation that could hurt the overall user experience.

Moving your data from one platform to the target destination platform is a huge task. It is what a database migration covers when an organization or company decides to switch off the lights on its current platform, for any number of reasons. The common reasons for migrating data are the cost effectiveness of the target platform, or its flexibility in deployment and scalability. A platform that incurs ever higher costs for hosting, upgrades, and scaling, and that makes even small changes burdensome to deploy, is a common driver, especially when those changes could be shipped easily on a microservice platform.

In this blog we're going to focus on the top open source tools you can use for MySQL and MariaDB migrations in a more homogeneous database migration scenario.

Backup Tools For Data Migration

The easiest path to take when performing a migration is to use database backup tools. We'll look at what these tools are and how you can use them during migration.

mysqldump/mysqlpump

This tool is one of the most famous utilities for MySQL or MariaDB, one that a database admin or system admin will reach for to migrate either a full database or a partial copy of it. For database admins not familiar with MySQL/MariaDB, this tool allows you to create a backup that generates a logical copy of the data, which you can then load into the target database.

A common setup with this tool, whenever the target database is located somewhere else and hosted on a different platform than the source, is to have the target act as a slave or replica. mysqldump is commonly invoked with --single-transaction on a busy system, and with --master-data it will provide you the binary log coordinates needed to set up a slave on the target database, which will then be used as the host for data migration. An alternative to mysqldump is mysqlpump, which has fewer features but can do parallel processing of databases, and of objects within databases, to speed up the dump process. The downside is that mysqlpump has no option such as --master-data, which is very useful if you want to create a replica to be used as the target destination for database migration.
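As a rough sketch of that setup, the dump below captures all databases along with the binary log coordinates; the host name and user are placeholder assumptions, not from the original post:

# Dump everything with binlog coordinates recorded as a comment in the header
$ mysqldump --single-transaction --master-data=2 \
    --routines --triggers --events --all-databases \
    -h source-db.example.com -u backup_user -p > full_dump.sql

Loading full_dump.sql on the target and issuing CHANGE MASTER TO with the coordinates recorded in the dump header then lets the target catch up as a replica.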

mysqlpump is advantageous if your data is mostly idle, or the database is put into maintenance mode so that no writes or changes are ongoing on the source database. In those cases it is faster compared to mysqldump.

mydumper/myloader 

mydumper/myloader is a very nifty and efficient tool that you can use for logical backups, especially for importing bulk data with faster processing speed, as it offers parallelism, the ability to send data in chunks, rate control through the number of threads, rows, and statement size, and compression of the result. It also records the binary log file and position, which is very helpful if you set up the target destination platform to act as a replica of the current source and production environment.

Basically, mydumper is the binary and the command you invoke on the command line to create the logical backup, whereas myloader is the binary and the command you use to load data into the desired target destination. Their flexibility allows you to manage the processing rate for both creating a backup and loading the data. Using mydumper, you can create a full backup or just a partial copy of your source database. This is very useful in case you need to move a large dataset or schema away from the current database host to another target destination while starting to set up new database shards. It can also be one way to migrate large data, by pulling a huge segment of the dataset and moving it over as a new shard node.
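A hedged sketch of a parallel dump and load follows; the connection details, thread counts, and paths are illustrative assumptions:

# On or near the source: dump in parallel, compressed, in 500k-row chunks
$ mydumper --host source-db.example.com --user backup_user --password 'secret' \
    --threads 8 --rows 500000 --compress --outputdir /backups/dump

# On the target: load the dump with the same degree of parallelism
$ myloader --host target-db.example.com --user restore_user --password 'secret' \
    --threads 8 --directory /backups/dump --overwrite-tables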

mydumper/myloader also has its limitations. Updates from the original authors stopped, but the project was picked up by Max Bubenick, and the tool is still widely used, even in production environments.

Percona XtraBackup/MariaDB Backup

Percona's XtraBackup is a gift for database administrators who do not want to spend money on the enterprise Oracle MySQL Enterprise Backup. MariaDB Backup is forked and derived from Percona XtraBackup; MariaDB also offers MariaDB Enterprise Backup.

Both of these tools share the same concepts when taking a backup. They are binary backups that offer hot online backups, PITR, incremental and full backups, and partial backups. They are also useful for data recovery, as they record the binary log file and position and support GTIDs, among much more. Although MariaDB Backup and Percona XtraBackup are two different pieces of software nowadays, each architected for the database it focuses on, MariaDB Backup is the one to use if you are taking backups from a MariaDB database source, whereas Percona XtraBackup applies to Oracle MySQL and Percona Server, or derived MySQL servers such as Percona XtraDB Cluster or Codership's version of Galera Cluster for MySQL.

Both backup tools are very beneficial for database migrations. Performing a hot online backup is quicker and produces a backup that you can directly load into your target database. Streaming backups are often handy as well: you can perform an online backup and stream the binary data to the target database using socat or netcat. This shortens migration time, since the data is streamed directly to the target destination.
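A minimal sketch of that streaming approach with Percona XtraBackup and netcat follows; the host names, port, and paths are illustrative assumptions, and MariaDB Backup's mariabackup can be streamed in a similar fashion:

# On the target host: listen and unpack the incoming stream
# (flag syntax varies slightly between netcat variants)
$ nc -l -p 9999 | xbstream -x -C /var/lib/mysql-backup

# On the source host: take the backup and stream it out
$ xtrabackup --backup --stream=xbstream --target-dir=/tmp | nc target-db.example.com 9999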

The most common approach to data migration with these tools is to copy the data from the source and stream it to the target destination. Once on the target database destination, you prepare the binary backup with the --prepare option, which applies the logs recorded while the backup was running, so the data is an exact copy from the point in time when the backup was taken. Then set the target database destination up as a replica or slave of the existing source cluster, so it replicates all the changes and transactions that have occurred on the main cluster since.
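The prepare step itself is a one-liner; the path below is again just an assumption:

$ xtrabackup --prepare --target-dir=/var/lib/mysql-backup

The binary log coordinates recorded in the xtrabackup_binlog_info file can then be fed into a CHANGE MASTER TO statement on the target to start replicating from the source cluster.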

Of course there are limitations to using these tools, and database administrators must know how to use them, and how to throttle and customize usage according to the desired outcome. You might not want to bog down your source database if it is handling heavy traffic or large processing at the time. Another limitation is that this only works in a homogeneous setup, where the target is a Linux compatible system and not a Windows type environment, since Percona XtraBackup and MariaDB Backup operate only on Linux.

Database Schema Migration Tools

Database migration is not just a matter of a single tool and a single task, after which the migration is done. There are a lot of considerations and subsequent tasks that have to be completed to fulfill a full database migration. One of these is schema migration. A schema in MySQL/MariaDB is a collection of data consisting of a group of tables with their columns and rows, events, triggers, stored procedures or routines, and functions. There are occasions when you might only want to migrate a schema, or even a single table. Say a specific table in a schema requires a change in its structure, which requires a DDL statement. The problem is that running a direct DDL statement such as ALTER TABLE ... ENGINE=InnoDB blocks any incoming transactions or connections that refer to or use the target table. For huge tables with long data definitions, this adds a real challenge and complicates things further, especially if the table is hot. And in a database migration, it can be hard to copy an exact, full copy of such a table without downtime on the source. So let's look at the tools that help here.

pt-online-schema-change

It's part of the famous Percona Toolkit, which originally derived from Maatkit and Aspersa. This tool is very useful when performing a table definition change, especially on a hot table holding a huge amount of data. The common yet naive approach to a table definition change is to run ALTER TABLE directly. Although that does the job, ALTER TABLE without ALGORITHM=INPLACE causes a full table copy that acquires a full metadata lock, which means your database can pile up and lock up for a long period of time, especially if the table is huge. This tool was built to solve that problem. It is also very beneficial for database migration when an inconsistent copy of a hot table with very large data is detected on your already set-up target database destination. Instead of taking a backup, either logical or binary/physical, pt-online-schema-change can be used to copy the rows from the source table to the target table chunk by chunk. You can customize the command with the parameters that match your requirements.

Under the hood, pt-online-schema-change uses triggers. Through these triggers, any subsequent or ongoing traffic that applies changes to the reference table is also copied to the target database, which acts as a replica of the current source database cluster. This copies the data exactly as the source database has it down to your target database, which may be on a different platform. Triggers are applicable to MySQL and MariaDB as long as the engine is InnoDB and the table has a primary key, which is a requirement. You may know that InnoDB uses a row-locking mechanism, which allows pt-online-schema-change to copy a number of chunks (groups of selected records) and apply INSERT statements to the target table. The target table is a dummy table which acts as the target copy of the soon-to-be replacement of the existing source table. pt-online-schema-change allows the user to either remove the dummy table or leave it in place until the administrator is ready to remove it. Take note that dropping or removing a table acquires a metadata lock. Since triggers are used, any subsequent changes are copied exactly to the target table, leaving no discrepancy on the target or dummy table. See the sketch after this paragraph for a typical invocation.
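A hedged example of such a run follows; the database, table, and ALTER clause are illustrative assumptions:

# Copy mydb.mytable chunk by chunk while applying the schema change
$ pt-online-schema-change \
    --alter "ADD COLUMN migrated_at DATETIME NULL" \
    --chunk-size 1000 --max-load "Threads_running=50" \
    --execute D=mydb,t=mytable,h=localhost,u=admin,p=secret

The --max-load check makes the tool pause copying when the server gets busy, which is one way to throttle the impact on a hot table.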

gh-ost

gh-ost shares the same goal as pt-online-schema-change but approaches it differently. I'd say this schema migration tool targets the production impediments that can slow your database down and possibly leave it stuck, forcing your database cluster into maintenance mode or downtime for an unknown period until the problem is solved. That problem is usually caused by triggers. If you have a busy or hot table undergoing a schema change or table definition change, triggers can cause your database to pile up due to lock contention. MySQL/MariaDB triggers allow your database to define triggers for INSERT, UPDATE, and DELETE. If the target table is a hotspot, things can end up nasty: your database gets slower and slower until it gets stuck, unless you can kill the incoming queries or, better, remove the triggers, but that is not the ideal approach at all.

Because of those issues, gh-ost addresses that problem. It acts as if it were a binary log server, reading the incoming events or transactions from the binary log, specifically using RBR (Row Based Replication). It is very safe, with fewer worries about the impact you would otherwise face. You also have the option to do a test or dry run (as with pt-online-schema-change), and even to test directly on a replica or slave node. This is perfect if you want to experiment and verify an exact copy on your target database during migration.
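As a hedged sketch, a trial run against a replica could look like this; the host, database, and table names are illustrative assumptions:

# Run the migration on a replica only, as a safe rehearsal
$ gh-ost \
    --host=replica-db.example.com --user=admin --password=secret \
    --database=mydb --table=mytable \
    --alter "ADD COLUMN migrated_at DATETIME NULL" \
    --test-on-replica --verbose

Dropping --test-on-replica and adding --execute would then perform the real migration once you're satisfied with the dry run.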

This tool is very flexible to your needs and provides assurance that your cluster will not get stuck, or end up needing a failover or data recovery if things go wrong. For more information on this tool, I suggest reading this post from Github by Shlomi Noach.

Other OSC Tools

Those two tools are the more recommendable approach, but there are other alternatives you can also try. Most of these tools apply MySQL/MariaDB triggers, so they share more or less the same concept as pt-online-schema-change. Here's the list:

  • LHM - Rails-style database migrations are a useful way to evolve your data schema in an agile manner. Most Rails projects start like this, and at first, making changes is fast and easy.
  • OnlineSchemaChange - Created and initiated by Facebook. This tool is used for making schema changes for MySQL tables in a non-blocking way.
  • TableMigrator - Initiated by Serious Business and ex-employees of Twitter. This tool shares the same principle of zero-downtime migrations of large tables in MySQL. It is implemented using Rails, so it can be useful if you have a Ruby-on-Rails application environment.
  • oak-online-alter-table - An old tool created by Shlomi Noach that takes roughly the same approach as pt-online-schema-change, performing a non-blocking ALTER TABLE operation.

Database Migration Wizard Tools

There are a few migration tools that offer free usage, which is very beneficial to some extent. What's advantageous about migration wizard tools is that they have a GUI, giving you the convenience of seeing the current structure or simply following the steps the UI provides during migration. There are numerous other services and wizard tools, but they are neither open source nor available for free. A database migration is a complex yet systematic process, and in some cases requires a large amount of work and effort. Let's take a look at these free tools.

MySQL Workbench

As the name suggests, it's for MySQL and derivative databases, such as Percona Server, and can be useful when it comes to database migration. Since MariaDB has shifted onto a different route, especially since version 10.2, there are some incompatibility issues you might encounter if you attempt to use this tool with a MariaDB source or target. Workbench can be used for heterogeneous types of databases, i.e. migrations coming from different source databases that need to dump the data into MySQL.

MySQL Workbench comes in community and enterprise versions. The community version is freely available under the GPL, which you can find here https://github.com/mysql/mysql-workbench. As the documentation states, MySQL Workbench allows you to migrate from Microsoft SQL Server, Microsoft Access, Sybase ASE, SQLite, SQL Anywhere, PostgreSQL, and other RDBMS tables, objects, and data to MySQL. Migration also supports migrating from earlier versions of MySQL to the latest releases.

phpMyAdmin

For those working as web developers using the LAMP stack, this tool comes to no surprise to be one of their swiss army knives when dealing with database tasks. phpMyAdmin is a free software tool written in PHP, intended to handle the administration of MySQL over the Web. phpMyAdmin supports a wide range of operations on MySQL and MariaDB. Frequently used operations (managing databases, tables, columns, relations, indexes, users, permissions, etc) can be performed via the user interface, while you still have the ability to directly execute any SQL statement.

Although it's quite simple when it comes to import and export, what's important is that it gets the job done. For larger and more complex migrations, though, it might not suffice for your needs.

HeidiSQL

HeidiSQL is free software with the aim of being easy to learn. "Heidi" lets you see and edit data and structures from computers running one of the database systems MariaDB, MySQL, Microsoft SQL, PostgreSQL, or SQLite. Invented in 2002 by Ansgar, HeidiSQL is among the most popular tools for MariaDB and MySQL worldwide.

For migration purposes, it allows you to export from one server/database directly to another server/database. It also has import features for text files such as CSV, and can export table rows into a wide range of supported file types such as CSV, HTML, XML, SQL, LaTeX, Wiki Markup, and PHP Array. Although it is built for db server administration purposes, you can use it for simple migration tasks.

Percona Toolkit As Your Swiss Army Knife

Percona Toolkit is notable software distributed as open source under the GPL. Percona Toolkit is a collection of advanced command-line tools commonly used internally by Percona, but it's also applicable for any database-related work, especially for MySQL/MariaDB servers.

So how and why is it helpful for data migration, especially MySQL/MariaDB migrations? It contains a number of tools that are beneficial to use during and after migration.

As mentioned earlier, a common approach to data migration is to set the target destination server up as a replica of the main source database cluster in a homogeneous setup. This means that, if you are moving from on-prem to a public cloud provider, you can set up an elected node on that platform and have it replicate all transactions from the main cluster. Using backup tools, you can achieve this type of migration setup. But it doesn't end there. Percona Toolkit has pt-table-checksum and pt-table-sync, for example, to help you identify data inconsistencies between on-prem and the target destination database server. With pt-table-checksum, you can perform checksum calculations based on a series of chunks for all databases, or selectively checksum particular databases, particular tables, or even a range of records in a table. pt-table-sync is then used to perform data synchronization, so your target databases will be refreshed with a new copy of the exact data from the main source cluster.
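A hedged sketch of that verification pass follows; the host names and credentials are illustrative assumptions:

# On the source (master): checksum all tables; results propagate to replicas
$ pt-table-checksum h=source-db.example.com,u=admin,p=secret

# Then resolve any differences found on the replicas, driven from the master
$ pt-table-sync --execute --replicate percona.checksums \
    h=source-db.example.com,u=admin,p=secret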

On the other hand, pt-upgrade is very useful after the migration from backup tools is performed. With pt-upgrade, you can perform an analysis by running a set of queries, for example from a slow query log file, and compare the results from the source database against the target database server.
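A minimal sketch of such a comparison; the hosts and the log path are illustrative assumptions:

# Replay queries from the slow log against both servers and compare results
$ pt-upgrade /var/log/mysql/mysql-slow.log \
    h=source-db.example.com h=target-db.example.com > pt_upgrade_report.txt

The report highlights queries whose results, warnings, or execution times differ between the two servers.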

Summary

Database migration, especially from a heterogeneous setup, can be very complicated. On a homogeneous setup it can be quite straightforward, regardless of whether the data is large or small, as long as you are equipped with proper tools and, of course, the correct systematic approach to determine that the migration is complete and the data consistent. There may be times when a migration requires consulting the experts, but it's always a great start to try these open source tools to achieve your desired database migration task.
