How to configure SELinux for MySQL-based systems (MySQL/MariaDB Replication + Galera)


In the era we live in, anything running in a less secure environment is an easy target for attack and a bounty for attackers. Compared to 20 years ago, hackers today are more advanced, both in their skills and in the tools they use, so it is no surprise that even giant companies get hacked and have their valuable data leaked.

In 2021 alone, there have already been more than 10 reported data breach incidents. The most recent was disclosed by BOSE, the well-known audio maker, and occurred in May. BOSE discovered that the personal information of some of its current and former employees had been accessed by attackers. The exposed data included names, Social Security Numbers, compensation information, and other HR-related information.

What is the purpose of this kind of attack, and what motivates hackers to do it? It is obviously all about money. Stolen data is frequently sold, so attacking big companies is profitable: not only can important data be sold to the business's competitors, but the hackers can demand a huge ransom at the same time.

So how does this relate to databases? Since the database is one of the company's biggest assets, it deserves enhanced security so that our valuable data stays protected. In my last blog post, we went through an introduction to SELinux: how to enable it, which modes it has, and how to configure it for MongoDB. Today, we will look at how to configure SELinux for MySQL-based systems.

Top 5 Benefits of SELinux

Before going further, perhaps some of you are wondering whether SELinux provides any real benefits, given that it is a bit of a hassle to enable. Here are the top five SELinux benefits you should consider:

  • Enforcing data confidentiality and integrity while also protecting processes

  • The ability to confine services and daemons so they behave more predictably

  • Reducing the risk of privilege escalation attacks

  • Policies that are enforced system-wide and administratively defined, not set at user discretion

  • Fine-grained access control

Before we start configuring SELinux for our MySQL instances, let's go through how to enable SELinux with ClusterControl for all MySQL-based deployments. Even though the steps are the same for all of these database systems, we think it is a good idea to include some screenshots for your reference.

Steps To Enable SELinux for MySQL Replication

In this section, we are going to deploy MySQL Replication with ClusterControl 1.8.2. The steps are the same for MariaDB, Galera Cluster or MySQL: assuming all nodes are ready and passwordless SSH is configured, let's start the deployment. To enable SELinux for our setup, we need to untick “Disable AppArmor/SELinux”, which means SELinux will be set to “permissive” on all nodes.

Next, we will choose Percona as the vendor (you can also choose MariaDB, Oracle or MySQL 8), then specify the “root” password. You may use the default locations or other directories, depending on your setup.

Once all hosts have been added, we can start the deployment and let it finish before we can begin with the SELinux configuration.

Steps To Enable SELinux for MariaDB Replication

In this section, we are going to deploy MariaDB Replication with ClusterControl 1.8.2.

We will choose MariaDB as a vendor and version 10.5 as well as specify the “root” password. You may use a default location or your other directories depending on your setup.

Once all hosts have been added, we can start the deployment and let it finish before we can proceed with the SELinux configuration.

Steps To Enable SELinux for Galera Cluster

In this section, we are going to deploy Galera Cluster with ClusterControl 1.8.2. Once again, untick “Disable AppArmor/SELinux” which means SELinux will be set as “permissive” for all nodes:

Next, we will choose Percona as a vendor and MySQL 8 as well as specify the “root” password. You may use a default location or your other directories depending on your setup. Once all hosts have been added, we can start the deployment and let it finish.


 

As usual, we can monitor the status of the deployment in the “Activity” section of the UI. 

How To Configure SELinux For MySQL

Since all our clusters are MySQL-based, the steps to configure SELinux are the same. Before we start, and since this is a newly set up environment, we suggest disabling automatic recovery for both the cluster and the nodes, as per the screenshot below. This way, we avoid the cluster running a failover while we are testing or restarting the service:

First, let’s see what is the context for “mysql”. Go ahead and run the following command to view the context:

$ ps -eZ | grep mysqld_t

An example of the output is shown below:

system_u:system_r:mysqld_t:s0       845 ?        00:00:01 mysqld

The definition for the output above is:

  • system_u - User

  • system_r - Role

  • mysqld_t - Type

  • s0 - Sensitivity level (the remaining columns, 845 and mysqld, are the process ID and process name in the ps output)

If you check the SELinux status, you can see that it is “permissive”, which means SELinux is not fully enforcing yet. We need to change the mode to “enforcing”, and to make the change permanent we have to edit the SELinux configuration file.

$ vi /etc/selinux/config
SELINUX=enforcing
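
To check the current runtime mode (and confirm the change later), you can use the standard getenforce or sestatus commands:

$ getenforce
$ sestatus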

Reboot the system after making the change. Since we are switching the mode from “permissive” to “enforcing”, we also need to relabel the file system. You can choose to relabel the entire file system or only the files of one application. Relabelling is required because “enforcing” mode needs the correct labels to work properly, and in some instances those labels get changed while running in “permissive” or “disabled” mode.

For this example, we will relabel only one application (MySQL) using the following command:

$ fixfiles -R mysqld restore

For a system that has been used for quite some time, it is a good idea to relabel the entire file system. The following command will do the job without rebooting and this process might take a while depending on your system:

$ fixfiles -f -F relabel
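
If a reboot is acceptable, another common way to trigger a full relabel is to create the /.autorelabel flag file, which makes SELinux relabel everything on the next boot:

$ touch /.autorelabel
$ reboot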

Like many other databases, MySQL needs to read and write a lot of files. Without the correct SELinux context for those files, access will unquestionably be denied. To configure the SELinux policy, “semanage” is required; it lets you change the configuration without recompiling the policy sources. On the majority of Linux systems this tool is installed by default. In our case, it is already installed, at the following version:

$ rpm -qa |grep semanage
python3-libsemanage-2.9-3.el8.x86_64
libsemanage-2.9-3.el8.x86_64

If it is not installed on your system, the following command will install it:

$ yum install -y policycoreutils-python-utils

Now, let's see what the MySQL file contexts are:

$ semanage fcontext -l | grep -i mysql

As you may notice, the command lists a bunch of file contexts related to MySQL. If you recall, at the beginning we used the default “Server Data Directory”. If your installation uses a different data directory location, you need to update the context “mysqld_db_t”, which by default refers to /var/lib/mysql/.

The first step is to change the SELinux context by using any of these options:

$ semanage fcontext -a -t mysqld_db_t /var/lib/yourcustomdirectory
$ semanage fcontext -a -e /var/lib/mysql /var/lib/yourcustomdirectory

After the step above, run the following command:

$ restorecon -Rv /var/lib/yourcustomdirectory

And lastly, restart the service:

$ systemctl restart mysql
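
To confirm the new label was applied to the custom directory from the example above, list it together with its SELinux context; the type shown should now be mysqld_db_t:

$ ls -ldZ /var/lib/yourcustomdirectory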

In some setups, a different log location may be required. In that situation, “mysqld_log_t” needs to be updated as well. “mysqld_log_t” is the context for the default location /var/log/mysqld.log, and the steps below update it for a custom path:

$ semanage fcontext -a -t mysqld_log_t "/your/custom/error.log"
$ restorecon -Rv /your/custom/error.log
$ systemctl restart mysql

There may also be situations where MySQL is configured to use a port other than the default 3306. For example, if you are using port 3303 for MySQL, you need to define the SELinux context for it with the following command:

$ semanage port -a -t mysqld_port_t -p tcp 3303

And to verify that the port has been updated, you may use the following command:

$ semanage port -l | grep mysqld
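
The output should now include the custom port alongside the defaults, similar to the following (the exact default port list depends on your policy version):

mysqld_port_t                  tcp      3303, 1186, 3306, 63132-63164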

Using audit2allow To Generate Policy

Another way to configure the policy is by using “audit2allow”, which was already installed together with the “semanage” packages earlier. This tool pulls log events from audit.log and uses that information to create a policy. Sometimes MySQL needs a non-standard policy, and this is the easiest way to build one.

First, let’s set the mode to permissive for the MySQL domain and verify the changes:

$ semanage permissive -a mysqld_t
$ semodule -l | grep permissive
permissive_mysqld_t
permissivedomains

The next step is to generate the policy. The first command below shows the general syntax, and the second is the actual command we used:

$ grep mysqld /var/log/audit/audit.log | audit2allow -M {yourpolicyname}
$ grep mysqld /var/log/audit/audit.log | audit2allow -M mysql_new

You should see output like the following (it will differ depending on the policy name you set):

******************** IMPORTANT ***********************

To make this policy package active, execute:

 

semodule -i mysql_new.pp

As stated, we need to execute “semodule -i mysql_new.pp” to activate the policy. Go ahead and execute it:

$ semodule -i mysql_new.pp
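
If you want to review exactly which rules the module grants, audit2allow also writes a human-readable .te file next to the .pp package, which you can inspect at any time:

$ cat mysql_new.te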

The final step is to put the MySQL domain back to the “enforcing” mode:

$ semanage permissive -d mysqld_t

libsemanage.semanage_direct_remove_key: Removing last permissive_mysqld_t module (no other permissive_mysqld_t module exists at another priority).

What Should You Do If SELinux is Not Working?

SELinux configuration usually requires a lot of testing. One of the best ways to test a configuration is to change the mode to “permissive”. If you want to do this only for the MySQL domain, you can use the following command; this is good practice, as it avoids putting the whole system into “permissive” mode:

$ semanage permissive -a mysqld_t

Once everything is done, you can switch the domain back to “enforcing” mode:

$ semanage permissive -d mysqld_t

In addition, /var/log/audit/audit.log contains all SELinux-related logs. This log helps a lot in identifying the root cause of a problem; all you have to do is filter for “denied” using “grep”:

$ more /var/log/audit/audit.log |grep "denied"
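
The ausearch utility (from the audit package), combined with audit2why, makes those denial records easier to read and even suggests why access was denied:

$ ausearch -m AVC -ts recent | audit2why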

We are now finished configuring the SELinux policy for a MySQL-based system. One thing worth mentioning is that the same configuration needs to be applied on all nodes of your cluster, so repeat the process on each of them.


Deploying MariaDB Sharding with Spider using ClusterControl


MariaDB offers built-in multi-host sharding capabilities with the Spider storage engine. Spider supports partitioning and XA transactions and allows remote tables of different MariaDB instances to be handled as if they were on the same instance. The remote table can be of any storage engine. The table linking is achieved by the establishment of the connection from a local MariaDB server to a remote MariaDB server, and the link is shared for all tables that are part of the same transaction.

In this blog post, we are going to walk you through the deployment of a cluster of two MariaDB shards using ClusterControl. We are going to deploy a handful of MariaDB servers (for redundancy and availability) to host a partitioned table based on a range of a selected shard key. The chosen shard key is basically a column that stores values with a lower and upper limit, in this case integer values between 0 and 1,000,000, making it a good candidate key to balance data distribution between the two shards. Therefore, we will divide the range into two partitions:

  • 0 - 499999: Shard 1

  • 500000 - 1000000: Shard 2

The following diagram illustrates our high-level architecture of what we are going to deploy:

Some explanations of the diagram:

  1. mariadb-gw-1: A MariaDB instance running the Spider storage engine, acting as a shard router. We name this host MariaDB Gateway 1, and it is going to be the primary (active) MariaDB server for reaching the shards. The application connects to this host like a standard MariaDB connection. This node connects to the shards via HAProxy listening on 127.0.0.1, ports 3307 (shard1) and 3308 (shard2).

  2. mariadb-gw-2: A MariaDB instance running the Spider storage engine, acting as a shard router. We name this host MariaDB Gateway 2, and it is going to be the secondary (passive) MariaDB server for reaching the shards. It has the same setup as mariadb-gw-1. The application connects to this host only if the primary MariaDB is down. This node connects to the shards via HAProxy listening on 127.0.0.1, ports 3307 (shard1) and 3308 (shard2).

  3. mariadb-shard-1a: MariaDB master that serves as the primary data node for the first partition. MariaDB gateway servers should only write to the shard's master.

  4. mariadb-shard-1b: MariaDB replica that serves as secondary data node for the first partition. It shall take over the master role in case the shard's master goes down (automatic failover is managed by ClusterControl).

  5. mariadb-shard-2a: MariaDB master that serves as primary data node for the second partition. MariaDB gateway servers only write to the shard's master.

  6. mariadb-shard-2b: MariaDB replica that serves as secondary data node for the second partition. It shall take over the master role in case the shard's master goes down (automatic failover is managed by ClusterControl).

  7. ClusterControl: A centralized deployment, management and monitoring tool for our MariaDB shards/clusters.

Deploying Database Clusters using ClusterControl

ClusterControl is an automation tool to manage the lifecycle of your open-source database management system. We are going to use ClusterControl as a centralized tool for cluster deployments, topology management and monitoring for the purpose of this blog post.

1) Install ClusterControl

2) Configure the passwordless SSH from ClusterControl server to all database nodes. On the ClusterControl node:

(clustercontrol)$ whoami
root
$ ssh-keygen -t rsa
$ ssh-copy-id root@192.168.22.101
$ ssh-copy-id root@192.168.22.102
$ ssh-copy-id root@192.168.22.111
$ ssh-copy-id root@192.168.22.112
$ ssh-copy-id root@192.168.22.121
$ ssh-copy-id root@192.168.22.122

3) Since we are going to deploy 4 sets of clusters, it is a good idea to use the ClusterControl CLI tool for this particular task to expedite and simplify the deployment process. Let's first verify if we can connect with the default credentials by running the following command (default credential is auto-configured at /etc/s9s.conf):

(clustercontrol)$ s9s cluster --list --long
Total: 0

If we don't get any errors and see a similar output as above, we are good to go.

4) Note that steps 4,5,6 and 7 can be executed at once since ClusterControl supports parallel deployment. We will start by deploying the first MariaDB Gateway server using ClusterControl CLI:

(clustercontrol)$ s9s cluster --create \
        --cluster-type=mysqlreplication \
        --nodes="192.168.22.101?master" \
        --vendor=mariadb \
        --provider-version=10.5 \
        --os-user=root \
        --os-key-file=/root/.ssh/id_rsa \
        --db-admin="root" \
        --db-admin-passwd="SuperS3cr3tPassw0rd" \
        --cluster-name="MariaDB Gateway 1"

5) Deploy the second MariaDB Gateway server:

(clustercontrol)$ s9s cluster --create \
        --cluster-type=mysqlreplication \
        --nodes="192.168.22.102?master" \
        --vendor=mariadb \
        --provider-version=10.5 \
        --os-user=root \
        --os-key-file=/root/.ssh/id_rsa \
        --db-admin="root" \
        --db-admin-passwd="SuperS3cr3tPassw0rd" \
        --cluster-name="MariaDB Gateway 2"

6) Deploy a 2-node MariaDB Replication for the first shard:

(clustercontrol)$ s9s cluster --create \
        --cluster-type=mysqlreplication \
        --nodes="192.168.22.111?master;192.168.22.112?slave" \
        --vendor=mariadb \
        --provider-version=10.5 \
        --os-user=root \
        --os-key-file=/root/.ssh/id_rsa \
        --db-admin="root" \
        --db-admin-passwd="SuperS3cr3tPassw0rd" \
        --cluster-name="MariaDB - Shard 1"

7) Deploy a 2-node MariaDB Replication for the second shard:

(clustercontrol)$ s9s cluster --create \
        --cluster-type=mysqlreplication \
        --nodes="192.168.22.121?master;192.168.22.122?slave" \
        --vendor=mariadb \
        --provider-version=10.5 \
        --os-user=root \
        --os-key-file=/root/.ssh/id_rsa \
        --db-admin="root" \
        --db-admin-passwd="SuperS3cr3tPassw0rd" \
        --cluster-name="MariaDB - Shard 2"

While the deployment is ongoing, we can monitor the job output from CLI:

(clustercontrol)$ s9s job --list --show-running
ID CID STATE   OWNER GROUP  CREATED  RDY TITLE
25   0 RUNNING admin admins 07:19:28  45% Create MySQL Replication Cluster
26   0 RUNNING admin admins 07:19:38  45% Create MySQL Replication Cluster
27   0 RUNNING admin admins 07:20:06  30% Create MySQL Replication Cluster
28   0 RUNNING admin admins 07:20:14  30% Create MySQL Replication Cluster

And also from the ClusterControl UI:

Once the deployment is complete, you should see the database clusters listed like this in the ClusterControl dashboard:

Our clusters are now deployed and running the latest MariaDB 10.5. Next, we need to configure HAProxy to provide a single endpoint to the MariaDB shards.

Configure HAProxy

HAProxy is necessary as a single endpoint to each shard's master-slave replication. Otherwise, if a master goes down, one would have to update Spider's server list using the CREATE OR REPLACE SERVER statement on the gateway servers, and then perform an ALTER TABLE to pass new connection parameters. With HAProxy, we can configure it to listen on the localhost of the gateway server and monitor the different MariaDB shards on different ports. We will configure HAProxy on both gateway servers as follows:

  • 127.0.0.1:3307 -> Shard1 (backend servers are mariadb-shard-1a and mariadb-shard-1b)

  • 127.0.0.1:3308 -> Shard2 (backend servers are mariadb-shard-2a and mariadb-shard-2b)

If a shard's master goes down, ClusterControl will fail over to the shard's slave and promote it as the new master, and HAProxy will reroute the connections to the new master accordingly. We are going to install HAProxy on the gateway servers (mariadb-gw-1 and mariadb-gw-2) using ClusterControl, since it automatically configures the backend servers (mysqlchk setup, user grants, xinetd installation), with a small trick as shown below.

First of all, on the ClusterControl UI, choose the first shard, MariaDB - Shard 1 -> Manage -> Load Balancers -> HAProxy -> Deploy HAProxy and specify the Server Address as 192.168.22.101 (mariadb-gw-1), similar to the following screenshot:

Similarly, but this one for shard 2, go to MariaDB - Shard 2 -> Manage -> Load Balancers -> HAProxy -> Deploy HAProxy and specify the Server Address as 192.168.22.102 (mariadb-gw-2). Wait until the deployment finishes for both HAProxy nodes.

Now we need to configure the HAProxy service on mariadb-gw-1 and mariadb-gw-2 to load balance all shards at once. Using a text editor (or ClusterControl UI -> Manage -> Configurations), edit the last two "listen" directives of /etc/haproxy/haproxy.cfg to look like this:

listen  haproxy_3307_shard1
        bind *:3307
        mode tcp
        timeout client  10800s
        timeout server  10800s
        tcp-check connect port 9200
        tcp-check expect string master\ is\ running
        balance leastconn
        option tcp-check
        default-server port 9200 inter 2s downinter 5s rise 3 fall 2 slowstart 60s maxconn 64 maxqueue 128 weight 100
        server 192.168.22.111 192.168.22.111:3306 check # mariadb-shard-1a-master
        server 192.168.22.112 192.168.22.112:3306 check # mariadb-shard-1b-slave

listen  haproxy_3308_shard2
        bind *:3308
        mode tcp
        timeout client  10800s
        timeout server  10800s
        tcp-check connect port 9200
        tcp-check expect string master\ is\ running
        balance leastconn
        option tcp-check
        default-server port 9200 inter 2s downinter 5s rise 3 fall 2 slowstart 60s maxconn 64 maxqueue 128 weight 100
        server 192.168.22.121 192.168.22.121:3306 check # mariadb-shard-2a-master
        server 192.168.22.122 192.168.22.122:3306 check # mariadb-shard-2b-slave

Restart the HAProxy service to load the changes (or use ClusterControl -> Nodes -> HAProxy -> Restart Node):

$ systemctl restart haproxy
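
To quickly confirm that HAProxy is now listening locally on both shard ports on each gateway server, you can check from the shell:

$ ss -ltnp | grep -E ':3307|:3308'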

From ClusterControl UI, we can verify that only one backend server is active per shard (indicated by the green lines), as shown below:

At this point, our database cluster deployment is now complete. We can proceed to configure the MariaDB sharding using the Spider storage engine.

Preparing MariaDB Gateway Servers

On both MariaDB Gateway servers (mariadb-gw-1 and mariadb-gw-2), perform the following tasks:

Install Spider plugin:

MariaDB> INSTALL PLUGIN spider SONAME 'ha_spider.so';

Verify if the storage engine is supported:

MariaDB> SELECT engine,support FROM information_schema.engines WHERE engine = 'spider';
+--------+---------+
| engine | support |
+--------+---------+
| SPIDER | YES     |
+--------+---------+

Optionally, we can also verify if the plugin is loaded correctly from the information_schema database:

MariaDB> SELECT PLUGIN_NAME,PLUGIN_VERSION,PLUGIN_STATUS,PLUGIN_TYPE FROM information_schema.plugins WHERE plugin_name LIKE 'SPIDER%';
+--------------------------+----------------+---------------+--------------------+
| PLUGIN_NAME              | PLUGIN_VERSION | PLUGIN_STATUS | PLUGIN_TYPE        |
+--------------------------+----------------+---------------+--------------------+
| SPIDER                   | 3.3            | ACTIVE        | STORAGE ENGINE     |
| SPIDER_ALLOC_MEM         | 1.0            | ACTIVE        | INFORMATION SCHEMA |
| SPIDER_WRAPPER_PROTOCOLS | 1.0            | ACTIVE        | INFORMATION SCHEMA |
+--------------------------+----------------+---------------+--------------------+

Add the following line under the [mysqld] section inside the MariaDB configuration file:

plugin-load-add = ha_spider

Create the first "data node" for the first shard which should be accessible via HAProxy 127.0.0.1 on port 3307:

MariaDB> CREATE OR REPLACE SERVER Shard1 
FOREIGN DATA WRAPPER mysql
OPTIONS (
   HOST '127.0.0.1',
   DATABASE 'sbtest',
   USER 'spider',
   PASSWORD 'SpiderP455',
   PORT 3307);

Create the second "data node" for the second shard which should be accessible via HAProxy 127.0.0.1 on port 3308:

CREATE OR REPLACE SERVER Shard2 
FOREIGN DATA WRAPPER mysql
OPTIONS (
   HOST '127.0.0.1',
   DATABASE 'sbtest',
   USER 'spider',
   PASSWORD 'SpiderP455',
   PORT 3308);

Now we can create the Spider table that needs to be partitioned. In this example, we are going to create a table called sbtest1 inside the database sbtest, partitioned by the integer value in column 'k':

MariaDB> CREATE SCHEMA sbtest;
MariaDB> CREATE TABLE sbtest.sbtest1 (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`, `k`)
)
  ENGINE=Spider
  COMMENT 'wrapper "mysql", table "sbtest1"'
  PARTITION BY RANGE (k) (
    PARTITION shard1 VALUES LESS THAN (500000) COMMENT = 'srv "Shard1"',
    PARTITION shard2 VALUES LESS THAN MAXVALUE COMMENT = 'srv "Shard2"'
);

Note that the COMMENT = 'srv "ShardX"' clauses of the CREATE TABLE statement are critical, where we pass connection information about the remote server. The value must be identical to the server name as in the CREATE SERVER statement. We are going to fill up this table using the Sysbench load generator as shown further below.
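If you want to double-check which remote servers Spider knows about, the CREATE OR REPLACE SERVER definitions are stored in the mysql.servers table and can be inspected with a regular query:

MariaDB> SELECT Server_name, Host, Port, Db, Wrapper FROM mysql.servers;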

Create the application database user to access the database, and allow it from the application servers:

MariaDB> CREATE USER sbtest@'192.168.22.%' IDENTIFIED BY 'passw0rd';
MariaDB> GRANT ALL PRIVILEGES ON sbtest.* TO sbtest@'192.168.22.%';

In this example, since this is a trusted internal network, we just use a wildcard in the statement to allow any IP address in the same range, 192.168.22.0/24.

We are now ready to configure our data nodes.

Preparing MariaDB Shard Servers

On both MariaDB Shard master servers (mariadb-shard-1a and mariadb-shard-2a), perform the following tasks:

1) Create the destination database:

MariaDB> CREATE SCHEMA sbtest;

2) Create the 'spider' user and allow connections from the gateway servers (mariadb-gw-1 and mariadb-gw-2). This user must have all privileges on the sharded table and also on the MySQL system database:

MariaDB> CREATE USER 'spider'@'192.168.22.%' IDENTIFIED BY 'SpiderP455';
MariaDB> GRANT ALL PRIVILEGES ON sbtest.* TO spider@'192.168.22.%';
MariaDB> GRANT ALL ON mysql.* TO spider@'192.168.22.%';

In this example, since this is a trusted internal network, we just use a wildcard in the statement to allow any IP address in the same range, 192.168.22.0/24.

3) Create the table that is going to receive the data from our gateway servers via Spider storage engine. This "receiver" table can be on any storage engine supported by MariaDB. In this example, we use InnoDB storage engine:

MariaDB> CREATE TABLE sbtest.sbtest1 (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `k` int(11) NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`, `k`)
) ENGINE = INNODB;

That's it. Don't forget to repeat the steps on the other shard.

Testing

To generate some database workloads using Sysbench, we first have to install it on the application server:

$ yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm
$ yum install -y sysbench

Generate some test workloads and send them to the first gateway server, mariadb-gw-1 (192.168.22.101):

$ sysbench \
/usr/share/sysbench/oltp_insert.lua \
--report-interval=2 \
--threads=4 \
--rate=20 \
--time=9999 \
--db-driver=mysql \
--mysql-host=192.168.22.101 \
--mysql-port=3306 \
--mysql-user=sbtest \
--mysql-db=sbtest \
--mysql-password=passw0rd \
--tables=1 \
--table-size=1000000 \
run

You may repeat the above test on mariadb-gw-2 (192.168.22.102), and the database connections should be routed to the correct shard accordingly.

When looking at the first shard (mariadb-shard-1a or mariadb-shard-1b), we can tell that this partition only holds rows where the shard key (column k) is smaller than 500000:

MariaDB [sbtest]> SELECT MIN(k),MAX(k) FROM sbtest1;
+--------+--------+
| min(k) | max(k) |
+--------+--------+
| 200175 | 499963 |
+--------+--------+

On the other shard (mariadb-shard-2a or mariadb-shard-2b), it holds data from 500000 up to 999999, as expected:

MariaDB [sbtest]> SELECT MIN(k),MAX(k) FROM sbtest1;
+--------+--------+
| min(k) | max(k) |
+--------+--------+
| 500067 | 999948 |
+--------+--------+

On the MariaDB Gateway servers (mariadb-gw-1 or mariadb-gw-2), we can see all rows, just as if the table existed locally on that MariaDB instance:

MariaDB [sbtest]> SELECT MIN(k),MAX(k) FROM sbtest1;
+--------+--------+
| min(k) | max(k) |
+--------+--------+
| 200175 | 999948 |
+--------+--------+

To test the high availability aspect: when a shard master becomes unavailable, for example when the master of shard 2 (mariadb-shard-2a) goes down, ClusterControl will automatically promote the slave (mariadb-shard-2b) to master. During this period, you will probably see this error:

ERROR 1429 (HY000) at line 1: Unable to connect to foreign data source: Shard2

And while it is unavailable, you will get the following subsequent error:

ERROR 1158 (08S01) at line 1: Got an error reading communication packets

In our measurement, the failover took around 23 seconds from the moment it commenced, and once the new master is promoted, you should be able to write to the table from the gateway server as usual.

Conclusion

The above setup is a proof of principle on how ClusterControl can be used to deploy a MariaDB sharded setup. It can also improve the service availability of a MariaDB sharding setup with its automatic node and cluster recovery feature, plus all of the industry-standard management and monitoring features to support your overall database infrastructure.

ClusterControl - Advanced Backup Management - mariabackup Part II


In the previous part we tested backup time and compression effectiveness for different backup compression levels and methods. In this blog we will continue our efforts and talk about more settings that most users probably never change, yet which may have a visible effect on the backup process.

The setup is the same as in the previous part: we will use a MariaDB master-slave replication cluster with ProxySQL and Keepalived.

We have generated 7.6GB of data using sysbench:

sysbench /root/sysbench/src/lua/oltp_read_write.lua --threads=4 --mysql-host=10.0.0.111 --mysql-user=sbtest --mysql-password=sbtest --mysql-port=6033 --tables=32 --table-size=1000000 prepare

Using PIGZ

This time we are going to enable the "Use PIGZ for parallel gzip" option for our backups. As before, we will test every compression level to see how it performs.

We are storing the backup locally on the instance, which is configured with 4 vCPUs.

The outcome is sort of expected. The backup process was significantly faster than when we used just a single CPU core. The size of the backup remains pretty much the same; there is no real reason for it to change significantly. It is clear that using pigz improves the backup time. There is a dark side to using parallel gzip though, and it is CPU utilization:

As you can see, CPU utilization skyrockets and reaches almost 100% for the higher compression levels. Increasing CPU utilization on the database server is not necessarily the best idea, as we typically want the CPU to be available for the database. On the other hand, if we happen to have a replica that is dedicated to taking backups and, let's say, heavier queries - a node that is not used for serving OLTP traffic - we can enable parallel gzip to greatly reduce the backup time. Clearly, it is not an option for everyone, but it is definitely something you may find useful in particular scenarios. Just keep in mind that CPU utilization is something you need to track, as it will impact query latency and, through it, the user experience - something we should always consider when working with databases.
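
For context, what ClusterControl enables here is essentially parallel gzip in the backup stream. A manually built equivalent would look roughly like the sketch below; the user, password, thread count and target path are placeholders, not values taken from our setup:

$ mariabackup --backup --stream=xbstream --user=backupuser --password=xxxx \
  | pigz -p 4 -6 > /backups/backup.xbstream.gz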

Xtrabackup Parallel Copy Threads

Another setting we want to highlight is Xtrabackup Parallel Copy Threads. To understand what it does, let's talk a bit about the way Xtrabackup (or MariaBackup) works. In short, those tools perform two actions at the same time: they copy the data (the physical files) from the database server to the backup location while monitoring the InnoDB redo logs for any updates. The backup consists of the files plus the record of all changes to InnoDB that happened during the backup process. This, together with backup locks or FLUSH TABLES WITH READ LOCK, allows creating a backup that is consistent as of the point in time when the data transfer finished. Xtrabackup Parallel Copy Threads defines the number of threads that perform the data transfer. If we set it to 1, one file is copied at a time; if we set it to 8, theoretically up to 8 files can be transferred at once. Of course, the storage has to be fast enough to actually benefit from such a setting. We are going to perform several tests, changing Xtrabackup Parallel Copy Threads from 1 through 2 and 4 to 8, running them at compression level 6 (the default) with and without parallel gzip enabled.
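
For reference, this setting corresponds to the --parallel option of mariabackup/xtrabackup. A hand-run equivalent with 4 copy threads would look roughly like this (user, password and target directory are placeholders):

$ mariabackup --backup --parallel=4 --target-dir=/backups/full \
  --user=backupuser --password=xxxx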

The first four backups (27 - 30) were created without parallel gzip, using 1, 2, 4 and 8 parallel copy threads. Then we repeated the same process for backups 31 to 34, this time using parallel gzip. As you can see, in our case there is hardly any difference between the parallel copy thread settings. This would most likely be more impactful with a larger data set, and the setting would also improve backup performance on faster, more reliable storage. As usual, your mileage will vary, and in different environments this setting may affect the backup process more than what we see here.

Network throttling

Finally, in this part of our short series we would like to talk about the ability to throttle the network usage.

As you may have seen, backups can be stored locally on the node or streamed to the controller host. Streaming happens over the network and, by default, it is done "as fast as possible".

In some cases, where your network throughput is limited (on cloud instances, for example), you may want to reduce the network usage caused by MariaBackup by setting a limit on the network transfer. When you do that, ClusterControl will use the 'pv' tool to limit the bandwidth available to the process.
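
To illustrate what such throttling looks like at the command level, pv simply sits in the stream pipeline and caps the transfer rate. A rough sketch, with a 10 MB/s limit and placeholder credentials and host name, could be:

$ mariabackup --backup --stream=xbstream --user=backupuser --password=xxxx \
  | pv -q -L 10m \
  | ssh clustercontrol-host 'cat > /backups/backup.xbstream'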

As you can see, the first backup took around 3 minutes but when we throttled the network throughput, backup took 13 minutes and 37 seconds.

In both cases we used pigz with compression level 1. The graph above shows that throttling the network also reduced CPU utilization. That makes sense: if pigz has to wait for the network to transfer the data, it doesn't have to push the CPU hard, as it is idle most of the time.

Hopefully you found this short blog interesting and maybe it will encourage you to experiment with some of the not-so-commonly-used features and options of MariaBackup. If you would like to share some of your experience, we would like to hear from you in the comments below.

ClusterControl - Advanced Backup Management - mariabackup Part III


So far in the previous two parts of this short blog series we have discussed several options that may impact the time and size of the backup. We have discussed different compression options and a setting related to throttling the network transfer should you stream the data from the node to the controller host. This time we would like to highlight something else - the ability to take partial backups using MariaBackup. First, let's talk about what partial backups are and the challenges related to them.

Partial backups

MariaBackup is a backup tool that creates physical backups. What this means is that it copies the data files stored on the database node to the target location. It creates a consistent backup of the database, which allows you to restore your data to a precise point in time - the time when the backup completed. All data in all tables and schemas will be consistent, which is important to keep in mind. Consistent backups can be used to provision replicas, run a Point-in-Time Restore, and so on.

Partial backups, on the other hand, are, well, partial. Only a subset of the tables is backed up. Obviously, this makes the backup inconsistent; it cannot be used to create a replica or to restore the data to the same point in time. Partial backups still have their uses: they can be used to restore a subset of the data - instead of restoring a whole backup you can restore just a single table and then extract the data you need. Sure, you can do the same with logical backups, but those are quite slow and not really suitable for larger deployments.

The downside is that a partial backup is not consistent in time. This should be quite obvious, as we are collecting just a subset of the data. Another challenge is the restore - you cannot easily restore partial backups directly on production systems: first, because it is not straightforward, and second, because it is not consistent. The safest way to restore a partial backup is to restore it on a separate node and then use mysqldump or SELECT INTO OUTFILE to extract the required data.
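
For example, once the partial backup has been restored on a separate node, pulling out a single table could be as simple as running mysqldump against that node (database and table names here are just examples):

$ mysqldump --single-transaction sbtest sbtest1 > /tmp/sbtest1.sql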

Let’s take a look at the options that ClusterControl provides us with regarding the partial backups.

Partial backups in ClusterControl

First of all, partial backups are not taken by default; you have to explicitly enable them. Then a set of options shows up that allows us to pick what we want to back up. We can pick a particular schema or a set of tables, back up all tables except some, or simply state that we want to back up tables A, B and C.


Of course, when you go to the drop-down, you’ll see all databases and all tables listed to pick from.

We have picked some of the tables and schemas and we are going to run this backup now. Of course, if you want that, you can schedule partial backups in exactly the same way as normal ones.

On the second screen we can configure mariabackup to our liking, just like we explained in our previous blog posts. That’s it, click on the Create Backup button and the process will start.
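
For reference, ClusterControl builds the mariabackup command for you, but the underlying tool exposes partial backups through options such as --databases, --databases-exclude, --tables and --tables-exclude. A hand-written equivalent for a single table could look roughly like this (schema and table names are examples, credentials are placeholders):

$ mariabackup --backup --target-dir=/backups/partial \
  --user=backupuser --password=xxxx --databases="sbtest.sbtest1"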

Restoring partial backup in ClusterControl

Once the backup is ready, it will become visible on the backup list.

We can see it is a partial backup because there is a list of schemas that are included in it.

When we attempt to restore a partial backup in an asynchronous replication cluster, we are presented with two options: "Restore on node" and "Restore and verify on standalone host". The former is definitely not something we want to do, as it would wipe out data that is not included in the backup. The latter option, on the other hand, allows you to deploy a separate node and restore the backup on it.

All that we need to do is to pick a hostname that is reachable by SSH from ClusterControl and ensure that it won’t be stopped after the backup is restored. This will let us restore the partial backup and then access it to extract any kind of data we may want.

We hope this short blog gives you some insight into how ClusterControl allows you to perform partial backups, what their use cases are, and how to restore them in a safe way.

Webinar Replay: Top Five Tips to Drive MariaDB Galera Cluster Performance for Nextcloud


Are you a MariaDB database administrator facing problems with your Galera Cluster implementation? Perhaps you have even tried to make it work in harmony with Nextcloud? If you answered "yes" to any of these questions, this webinar replay is for you! The webinar, featuring Björn Schiessle, Pre-sales Lead and Co-Founder at Nextcloud, and Ashraf Sharif, Senior Support Engineer at Severalnines, will take you through a deep dive into designing, implementing and optimizing MariaDB Galera Cluster performance for Nextcloud.

What is Nextcloud?

According to Nextcloud themselves, they have built the most-deployed on-premises file sharing and collaboration platform. Nextcloud is written in PHP and JavaScript and can be extended using plugins to improve its functionality even further. It enables users to access their documents and share them with people within and outside of their organization through an easy-to-use web interface. Nextcloud also includes extensive collaboration capabilities through Calendar, Contacts, Mail, private audio/video conferencing and more!

Nextcloud is an open source platform. Its services are being used by millions of people worldwide and with tens of thousands of servers deployed all across the globe, they sure know how to satisfy both small and gigantic companies. If you are looking for a file sharing and collaboration platform, Nextcloud is surely the way to go.

Here are some statistics from Nextcloud themselves:

Judging from the fact that the majority of people are using either single-instance MySQL or MariaDB deployments or Galera Cluster, this webinar is relevant to most Nextcloud installations out there.

What’s the Webinar Replay About?

In this webinar replay you will hear some valuable insights about Nextcloud and its database layer: since Nextcloud stores its data in a database, a poorly performing database can have a serious impact on Nextcloud itself. As MariaDB Galera Cluster is a recommended solution for gaining both performance and high availability, this webinar dives deeper into how to optimize MariaDB Galera Cluster for Nextcloud. Björn explains the architecture of Nextcloud, and Ashraf then takes us on a deep dive into how to design and optimize MariaDB Galera Cluster for Nextcloud, covering five tips to significantly improve performance and stability.

Agenda

The following things will be covered in the webinar:

  • Nextcloud architecture

  • Designing database architecture

  • Database proxy solutions

  • MariaDB performance tuning

  • MySQL storage engine (InnoDB) performance tuning

  • Nextcloud performance tuning

  • A Question & Answer (Q&A) session is included as well!

We are certain that this webinar will help you improve your MariaDB Galera Cluster performance for Nextcloud - if you’re interested, tune in!

What is MariaDB Enterprise and How to Manage it with ClusterControl?


Have you ever wondered what products MariaDB Enterprise has to offer? Is it different from MariaDB Community? Can I manage them with ClusterControl?

MariaDB provides two distributions of their software — Enterprise and Community. The Community consists of the MariaDB Server, which has Galera embedded; you can use either standard, asynchronous or semi-synchronous replication or, as an alternative, build a MariaDB Cluster based on Galera. Another addition to the Community distribution is MariaDB ColumnStore. MariaDB 10.6 Community comes with ColumnStore 5.5. MariaDB ColumnStore is a columnar analytics database that allows users to create fast reporting queries through a reporting-optimized way of storing the data. Finally, it is also possible to use MaxScale, a proxy developed by MariaDB, for free as long as you use up to two database nodes. This limit, however, means it’s not feasible for any production deployment and might be used as a never-ending trial.

This post will explore products included with MariaDB Enterprise and how it works with ClusterControl.

What Products does the MariaDB Enterprise Platform Include?

MariaDB Enterprise Server

Let's take a look at the Enterprise offering from MariaDB. MariaDB 10.6 Enterprise is the enhanced version of the Community version. It comes with features such as an improved MariaDB Enterprise Audit plugin that adds additional options to control the audited events. MariaDB Enterprise Backup is an improved version of MariaBackup with optimized lock handling, effectively reducing the blocking of writers while a backup is running. MariaDB Enterprise Cluster adds additional data-at-rest encryption for Galera, non-blocking DDLs for Galera, and a few other small features.

MariaDB Enterprise ColumnStore

A further difference is in other parts of the package. First, ColumnStore is available in the most recent version — 5.6 or 6.2. MariaDB Enterprise ColumnStore 6, as per MariaDB documentation, comes with new features like disk-based aggregation, which allows you to trade the performance of the aggregation operations for larger data sets that can be aggregated. So far, all data had to fit in memory. Now, it is possible to use disk for aggregation. Another improvement is introducing an LZ4 compression in addition to the already existing Snappy compression. The precision of the DECIMAL data type has also been increased from 18 to 38, and it’s now possible to update transactional data from ColumnStore data. We can execute updates on the InnoDB table that uses data from the ColumnStore table. In the past, only the other way around (updating ColumnStore based on InnoDB data) was supported.

Finally, another significant change between Enterprise and Community ColumnStore offerings is that MariaDB Enterprise ColumnStore comes with an option to deploy multi-node setups, allowing for better scalability and high availability.

MariaDB Xpand

MariaDB Xpand (previously Clustrix) is a database that, while still providing drop-in compatibility with MySQL, allows users to scale out by adding additional nodes to the cluster. MariaDB Xpand is ACID-compliant and provides fault tolerance, high availability, and scalability. On top of that, other features listed on the MariaDB website are parallel query evaluation and execution, columnar indexes, and automated data partitioning.

MaxScale

As we mentioned earlier, MaxScale, even though it is available to download for free, comes with a license that limits its free use to only two backend nodes, making it unusable for most production environments. In the Enterprise offering, MaxScale does not have such limitations, making it a feasible solution for building deployments based on the different elements of MariaDB Enterprise. MaxScale supports all of the software included in MariaDB Enterprise and acts as a core building block for any of the supported topologies. MaxScale can monitor the underlying databases, route the traffic among them, and perform automated actions like failovers should the need arise. This makes it a great solution for controlling database traffic and dealing with potential issues. Much older versions of MaxScale have been released to the public, but, realistically speaking, the recent versions are the most interesting feature-wise, which makes MariaDB Enterprise one of the main ways to use MaxScale.

How does MariaDB Enterprise work with ClusterControl?

ClusterControl itself does not provide access to MariaDB Enterprise repositories, nor does it allow users to get the MariaDB licenses. However, it can very easily be configured to work with MariaDB Enterprise. As usual, ClusterControl requires SSH connectivity to be in place:

Then we have another step where we can pick the MariaDB version and pass the password for the superuser in MySQL.

ClusterControl, by default, is configured to set up community repositories for MariaDB, but it is possible to pick the option "Do Not Setup Vendor Repositories". It is up to the user to configure repositories to use MariaDB Enterprise packages, but once this is done, ClusterControl can be told just to install the packages and not care where they come from. This is an excellent way of installing custom, non-community packages. Just make sure you pick the MariaDB version that matches the Enterprise repositories you have configured.
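
As an illustration only, on a RHEL-based system such a custom repository definition usually ends up as a .repo file; the URL and GPG key below are placeholders that must come from your MariaDB Enterprise subscription, not real values:

$ cat /etc/yum.repos.d/mariadb-enterprise.repo
[mariadb-enterprise-server]
name = MariaDB Enterprise Server
baseurl = https://<your-enterprise-repo-mirror>/10.6/rpm/rhel/8/x86_64
gpgkey = https://<your-enterprise-repo-mirror>/MariaDB-Enterprise-GPG-KEY
gpgcheck = 1
enabled = 1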

Alternatively, especially if you already have MariaDB Enterprise deployed in your environment, you can import those nodes into ClusterControl, given that the SSH connectivity is in place:

This allows ClusterControl to work with existing deployments of MariaDB Enterprise.

Such a MariaDB deployment, whether imported or deployed by ClusterControl, is fully supported, both for asynchronous replication and for MariaDB Galera Cluster. Backup schedules can be created and executed, failover will happen and replicas will be promoted as necessary, and should your Galera cluster switch to a non-primary state, the MariaDB cluster nodes will be restarted and the whole cluster bootstrapped.

As for the other elements of MariaDB Enterprise, ClusterControl supports the MaxScale load balancer. The same pattern we explained for the MariaDB database also applies here: if you deployed the cluster using existing repositories, MaxScale will be installed as long as it can be downloaded from one of the configured repositories.

Alternatively, it is possible to import the existing MaxScale instance:

This, again, allows you to import your existing environment into ClusterControl.

When imported, ClusterControl provides an interface for MaxScale’s command-line interface:

You can execute different commands directly from the graphical interface of ClusterControl.

As you can see, no matter if you are using MariaDB Community or MariaDB Enterprise, ClusterControl can help you to manage the database and MaxScale load balancer. 

Wrapping Up

Many elect to use MariaDB Enterprise for its advanced features to achieve ACID compliance, high availability, load balancing, security, scalability, and improved backups. Whether you're using MariaDB Community or MariaDB Enterprise, ClusterControl can help you manage the database and the MaxScale load balancer. If you want to see it all in action, you can evaluate ClusterControl free for 30 days.

If you go the route of MariaDB Enterprise and want to take advantage of load balancing, check out how to install and configure MaxScale, both manually and with the help of ClusterControl.

Stay in touch for more updates and best practices for managing your open-source-based databases, be sure to follow us on Twitter and LinkedIn, and subscribe to our newsletter.

What's New in MariaDB 10.6


As of January 2022, ClusterControl v1.9.2 introduced support for the latest version of MariaDB, version 10.6. MariaDB 10.6, released in July 2021, will be supported until July 2026.

In this post, we will highlight the top features of MariaDB 10.6.

Atomic DDL (Data Definition Language)

The first feature we will cover is Atomic DDL. By definition, "atomic" means that either the operation is successful and logged to the binary logs, or it is completely reversed. Starting with MariaDB 10.6.1, MariaDB has improved the reliability of DDL operations by making most of them atomic and the rest crash-safe, even if the server crashes while executing an operation. Both atomic and crash-safe behavior work with all storage engines, except for the S3 storage engine and the partitioning engine, which are still a work in progress.

In this version, ALTER TABLE, RENAME TABLE, CREATE TABLE, DROP TABLE, DROP DATABASE, and their related DDL statements are now atomic. The complete list of other atomic DDL operations can be found here. The great thing about the new atomic and crash-safe implementation is that the MariaDB server becomes much more stable and reliable, even in unstable environments.

SQL Syntax

In terms of the SQL Syntax category, a few new features were added. The first one that we are going to see is:

SELECT ... OFFSET ... FETCH

The OFFSET clause lets us skip a defined number of rows and return only the part of the result set that comes after that offset, while the FETCH clause restricts the number of rows to return. Either the singular ROW or the plural ROWS keyword can be used in the OFFSET and FETCH clauses; the choice has no impact on the result.
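
A minimal example, using a hypothetical table t1 ordered by its id column, skips the first 10 rows and returns the next 5:

MariaDB> SELECT id, payload FROM t1
         ORDER BY id
         OFFSET 10 ROWS
         FETCH NEXT 5 ROWS ONLY;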

SKIP LOCKED

Perhaps some of us are familiar with this syntax, since it has been imported and adapted from MySQL. With SKIP LOCKED, we can skip any locked rows when performing locking SELECT operations (FOR UPDATE or LOCK IN SHARE MODE). It is definitely a useful feature, especially for applications that let multiple users book limited resources like hotel rooms, flight seats, or concert tickets.
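
A sketch of the booking use case, with a hypothetical seats table: inside a transaction, each session grabs one seat that is not currently locked by another transaction instead of blocking on it; the application would then mark that seat as booked and commit:

MariaDB> START TRANSACTION;
MariaDB> SELECT seat_no FROM seats
         WHERE is_booked = 0
         LIMIT 1
         FOR UPDATE SKIP LOCKED;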

Ignored Indexes

Ignored Indexes are similar to the "invisible indexes" feature in MySQL 8. An ignored index is still visible and maintained, but it is not used by the optimizer. This is very useful when testing what would happen if an index were dropped, before actually dropping it. If any issue emerges, we can enable it again instantly by marking the index NOT IGNORED.
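
The syntax is a simple ALTER; here we mark a hypothetical index idx_k on table t1 as ignored, check with EXPLAIN that the optimizer no longer considers it, and then bring it back:

MariaDB> ALTER TABLE t1 ALTER INDEX idx_k IGNORED;
MariaDB> EXPLAIN SELECT * FROM t1 WHERE k = 42;
MariaDB> ALTER TABLE t1 ALTER INDEX idx_k NOT IGNORED;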

JSON_TABLE

This table function is also imported from MySQL, where it transforms JSON data or documents into relational form. With it, MariaDB provides a table view over JSON data stored in the database, so plain SQL queries can treat the JSON as a regular table.
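
A small self-contained example that turns an inline JSON array into rows (the JSON document and column names are made up for illustration):

MariaDB> SELECT jt.* FROM JSON_TABLE(
           '[{"name":"Alice","age":30},{"name":"Bob","age":25}]',
           '$[*]' COLUMNS (
             name VARCHAR(50) PATH '$.name',
             age  INT PATH '$.age'
           )
         ) AS jt;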

Oracle Compatibility

MariaDB was the pioneer in the open-source database world when it came to PL/SQL compatibility. Starting with MariaDB 10.3, many syntax elements and functions have been added to ease the migration from Oracle to MariaDB. MariaDB 10.6 introduces the following features to make MariaDB even more PL/SQL compatible (a short example follows the list):

  • Anonymous subqueries in a FROM clause (no AS clause) are permitted in ORACLE mode

  • ADD_MONTHS() added: adds/subtracts months from a given date value

  • TO_CHAR() added: accepts NUMBER, DATE, DATETIME, TIMESTAMP, etc. as parameters and returns a formatted/converted TEXT value

  • SYS_GUID() added: similar to the UUID function in MariaDB

  • MINUS is mapped to EXCEPT in UNION

  • ROWNUM function returns the current number of accepted rows in the current context
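
A quick illustration of a few of these functions (the values are arbitrary; if your server complains, make sure you are running 10.6.1 or later):

MariaDB> SELECT ADD_MONTHS('2022-01-31', 1)             AS next_month,
                TO_CHAR(NOW(), 'YYYY-MM-DD HH24:MI:SS') AS formatted_now,
                SYS_GUID()                              AS guid;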

Replication, Galera and Binlog

In this category, MariaDB has introduced binlog_expire_logs_seconds as an alias for expire_logs_days, which means any change to one of them is automatically reflected in the other; to make this possible, expire_logs_days now accepts a precision of 1/1000000 of a day. This is exceptionally useful with high-volume writes on the master and in environments with limited disk space.
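
Both variables are dynamic, so binary log retention can be adjusted at runtime, and setting one is immediately reflected in its alias; for example (172800 seconds corresponds to 2 days):

MariaDB> SET GLOBAL binlog_expire_logs_seconds = 172800;
MariaDB> SHOW GLOBAL VARIABLES LIKE 'expire_logs_days';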

In addition, MariaDB also introduced the wsrep_mode system variable. This variable enables WSREP features that are not part of the default behavior, such as BINLOG_ROW_FORMAT_ONLY, DISALLOW_LOCAL_GTID, REQUIRED_PRIMARY_KEY, REPLICATE_ARIA, REPLICATE_MYISAM, and STRICT_REPLICATION.
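
For example, assuming a running Galera node, you could require a primary key on every replicated table and only allow ROW binlog format by setting the variable like this (the exact combination of flags is up to your requirements):

MariaDB> SET GLOBAL wsrep_mode = 'BINLOG_ROW_FORMAT_ONLY,REQUIRED_PRIMARY_KEY';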

Sys Schema

The next feature is the sys schema, a collection of views, functions, and procedures. There is no doubt that the sys schema helps DBAs and developers in many ways when interpreting the data collected by the Performance Schema; a lot of diagnostic information can be gathered from it. It is used not only for troubleshooting performance issues but also for managing resources efficiently. Thankfully, it is now available in MariaDB 10.6.
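
To see what the new schema ships with on your installation, you can simply list its objects from the Information Schema:

MariaDB> SELECT table_name FROM information_schema.tables
         WHERE table_schema = 'sys' ORDER BY table_name;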

Information Schema

The newly added features in this category are the INFORMATION_SCHEMA.KEYWORDS and INFORMATION_SCHEMA.SQL_FUNCTIONS tables. The KEYWORDS table contains around 694 MariaDB keywords, while the SQL_FUNCTIONS table contains around 234 MariaDB functions. With these two tables, we can now query the list of supported keywords and functions directly from the Information Schema whenever we need it.
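
Both tables can be queried like any other Information Schema table, for example:

MariaDB> SELECT COUNT(*) FROM information_schema.KEYWORDS;
MariaDB> SELECT * FROM information_schema.SQL_FUNCTIONS LIMIT 10;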

Wrapping Up

In addition to these new features introduced in MariaDB 10.6, many other improvements were made that were not discussed in this post. While many of these features were taken from MySQL, these additions are still highly beneficial to users.

As previously mentioned, ClusterControl currently supports MariaDB 10.6. With ClusterControl, you can easily upgrade to the latest version stress-free. If you're not yet familiar with ClusterControl, you can evaluate it free for 30 days, no credit card required.

To stay up to date with all the latest news and best practices for the most popular open-source databases, don’t forget to follow us on Twitter and LinkedIn, and subscribe to our newsletter for updates.
