
Migrating from Maxscale to the ProxySQL Load Balancer


A database load balancer, or proxy, is a middleware service between the application layer and the database layer. Applications connect to the database proxy, and the proxy forwards the connections to the database. There are several benefits to using a database proxy, for example: splitting read and write queries, caching queries, distributing queries based on a routing algorithm, rewriting queries, and scaling your read-only workload. A database proxy also abstracts the database topology (and any changes to it) from the application layer, so applications only need to connect to one single endpoint.

There are various database proxies out there, from commercial to open source options, e.g., HAProxy, Nginx, ProxySQL, MaxScale, etc. In this blog, we will discuss how to migrate the database proxy layer from MaxScale to ProxySQL with the help of ClusterControl.

Current Architecture with Maxscale

Consider a highly available database architecture which consists of a 3-node Galera Cluster and, on top of it, 2 MaxScale and Keepalived services for high availability of the database proxy layer. Galera Cluster is a "virtually" synchronous replication solution; it uses certification-based replication to ensure your data is available on all the nodes. The current architecture is shown below:

Maxscale is a database proxy from MariaDB Corporation, which acts as middleware between applications and databases. 

Here's the topology of the Galera Cluster and MaxScale load balancers in ClusterControl. You can deploy all of this directly from ClusterControl, or import existing database and proxy nodes into it. You can see your database topology in the Topology tab.

Deploy ProxySQL & Keepalived

ProxySQL is another database proxy, developed by the ProxySQL company, which provides features such as query caching, query rewriting, and read/write splitting based on query patterns. To deploy ProxySQL in ClusterControl, go to Manage -> Load Balancers in your cluster. ClusterControl supports a few different database proxies: HAProxy, ProxySQL, and MaxScale.

Choose ProxySQL, and it will show the below page:

 

We need to choose the server address where ProxySQL will be installed. We can either install it on one of the existing nodes or, if we want a dedicated node for ProxySQL, just type its IP address in the list. Fill in the passwords for the Administration and Monitoring users, add the application user into ProxySQL (or configure it later), and enable the database servers to be included in the ProxySQL load balancing set. Then click the Deploy ProxySQL button. We need at least 2 ProxySQL instances for high availability.

If we forget to add a database user into ProxySQL during the setup, we can configure it later in the ProxySQL Users tab, as shown below:

ProxySQL requires database users to be configured in ProxySQL as well.
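
Outside of ClusterControl, the same application user can also be added through the ProxySQL admin interface. The sketch below is illustrative only; the admin port and credentials, the user name, and the default hostgroup are assumptions that need to match your own setup:

## Connect to the ProxySQL admin interface (default admin port is 6032)
$ mysql -u admin -padmin -h 127.0.0.1 -P6032

## Register the application user and point it to the writer hostgroup (hostgroup 10 is an assumption)
INSERT INTO mysql_users (username, password, default_hostgroup) VALUES ('app_user', 'app_passw0rd', 10);

## Activate the change and persist it to disk
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;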

After ProxySQL is deployed, we continue by configuring Keepalived on each ProxySQL host. The Keepalived services take master/backup roles across the ProxySQL instances. Keepalived uses a VIP (Virtual IP Address): the application connects to the virtual IP address, which lives on the node holding the master role, and the connection is forwarded to the local ProxySQL. If that node fails, the VIP is automatically floated to another node.

Deploying Keepalived in ClusterControl is done on the same page as the database proxy; you just need to choose the Keepalived tab. Choose the load balancer type, which is ProxySQL, and then add the current ProxySQL instances for Keepalived 1 and Keepalived 2. Fill in the Virtual IP Address and the network interface, and finally, click the Deploy Keepalived button.
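
For reference, a minimal Keepalived VRRP configuration for this setup might look roughly like the sketch below. This is illustrative only; the interface name, priorities, and virtual IP are assumptions and will differ in your environment:

vrrp_script chk_proxysql {
    script "killall -0 proxysql"   # consider the node healthy while the ProxySQL process is alive
    interval 2
    weight 2
}

vrrp_instance VI_PROXYSQL {
    interface eth0                 # assumption: the network interface chosen during deployment
    state MASTER                   # BACKUP on the second ProxySQL host
    virtual_router_id 51
    priority 101                   # lower priority (e.g. 100) on the backup node
    virtual_ipaddress {
        192.168.10.100             # assumption: the Virtual IP the application connects to
    }
    track_script {
        chk_proxysql
    }
}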

Running two ProxySQL instances with Keepalived services gives us a highly available proxy layer. In ClusterControl, it is shown in the topology view below:

Switchover

Switching over the traffic is really straightforward: just change the connection address in the application layer to use the Virtual IP Address for ProxySQL, and then monitor the traffic flowing through ProxySQL.
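
One quick way to verify that traffic is flowing through ProxySQL is to look at its query digest statistics from the admin interface. A minimal sketch, assuming the default admin port and credentials:

$ mysql -u admin -padmin -h 127.0.0.1 -P6032

## Top queries seen by ProxySQL, grouped per hostgroup
SELECT hostgroup, digest_text, count_star FROM stats_mysql_query_digest ORDER BY count_star DESC LIMIT 10;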

 


Migrating Amazon RDS (MySQL or MariaDB) to an On-Prem Server


Amazon Web Services is a technology giant, especially when it comes to pioneering top-of-the-line cloud computing services. Its fully managed database service, Amazon RDS, is one of a kind. But while it can be a perfect platform for some organizations, it can be a challenge to move out of it if it is not. There is always the concern of ending up in a vendor lock-in situation.

Some things to keep in mind when migrating from RDS to an on-premise platform are budget constraints, security, and data autonomy. Data is your most valuable asset, and retaining control of it wherever it resides is imperative for any organization that wants to remain competitive. No organization can afford cloud lock-in, and yet many enterprises find themselves exactly in that situation and start searching for alternative solutions that can be operated on-prem.

This blog will walk you through how to migrate from Amazon RDS to an on-prem server. Our target database on the on-prem server is a RHEL/CentOS Linux server, but the same procedure applies to other Linux distributions as long as the packages are properly installed.

There are existing third-party solutions that offer data migration, but most are not applicable to an on-premise platform, and they are not free. Migrating with free, open source solutions is always favorable and advantageous, although doubts and concerns exist since warranty and support are not bundled with open-source technologies. We will show you here how to achieve this in a straightforward procedure.

Since Amazon RDS supports MySQL and MariaDB, we will focus on them for this blog.

Migrating from Amazon RDS for MySQL or MariaDB

A typical approach to migrating your data from Amazon RDS to an on-prem server is to take a backup using a logical copy. This can be done using backup utilities that are able to operate against Amazon RDS, which is a fully-managed service. Fully-managed database services do not offer SSH logins, so taking a physical copy of backups is not an option.

Using mysqldump

mysqldump has to be installed on your target database node located on-prem. The target node has to be prepared as a replica of the AWS RDS node so that all subsequent transactions are replicated to it. To do this, follow the steps below.

AWS RDS source host: database-1.xxxxxxx.us-east-2.rds.amazonaws.com

On-Prem Server Host: 192.168.10.226 (testnode26)

Before starting the dump, make sure that the binlog retention hours parameter is set. To set it, run a procedure call like the example below on your Amazon RDS instance:

mysql> call mysql.rds_set_configuration('binlog retention hours', 24);

Query OK, 2 rows affected (0.23 sec)



mysql> CALL mysql.rds_show_configuration;

+------------------------+-------+------------------------------------------------------------------------------------------------------+

| name                   | value | description                                                                                          |

+------------------------+-------+------------------------------------------------------------------------------------------------------+

| binlog retention hours | 24    | binlog retention hours specifies the duration in hours before binary logs are automatically deleted. |

+------------------------+-------+------------------------------------------------------------------------------------------------------+

1 row in set (0.23 sec)



Query OK, 0 rows affected (0.23 sec)

Install mysqldump

  1. Prepare the repository. 

# For MySQL

$ yum install https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm

# For MariaDB

$ curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash
  2. Install mysql-client package

# For MySQL

$ yum install -y mysql-community-client.x86_64

# For MariaDB

$ yum install -y MariaDB-client
  3. Create a data dump using mysqldump by executing it inside the target node. Take note: with --master-data=2 specified as an option, this works only for MariaDB and not for MySQL, so extra work is needed for MySQL. We'll talk about this later.

## Applicable for MariaDB approach

[root@testnode26 ~]# mysqldump -h database-1.xxxxxxx.us-east-2.rds.amazonaws.com -uadmin -p --single-transaction --master-data=2 --databases db1 db2 db3  > backups/dump.sql

Enter password:

[root@testnode26 ~]# ls -alth backups/dump.sql

-rw-r--r--. 1 root root 196M Oct 18 02:34 backups/dump.sql
  4. Install the MySQL/MariaDB Server in the target database node

# For MySQL (always check what version repository is enabled in your yum repository. At this point, I'm using MySQL 5.7)

$ yum --disablerepo=* --enablerepo=mysql57-community install mysql-community-common mysql-community-client mysql-community-server

# For MariaDB

$ yum install MariaDB-server.x86_64
  5. Setup the MySQL/MariaDB Server instance (my.cnf, file permissions, directories), and start the server

# Setting up the my.cnf (using the my.cnf deployment used by ClusterControl)

[MYSQLD]

user=mysql

basedir=/usr/

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

pid_file=/var/lib/mysql/mysql.pid

port=3306

log_error=/var/log/mysql/mysqld.log

log_warnings=2

slow_query_log_file=/var/log/mysql/mysql-slow.log

long_query_time=2

slow_query_log=OFF

log_queries_not_using_indexes=OFF

innodb_buffer_pool_size=2G

innodb_flush_log_at_trx_commit=2

innodb_file_per_table=1

innodb_data_file_path=ibdata1:100M:autoextend

innodb_read_io_threads=4

innodb_write_io_threads=4

innodb_doublewrite=1

innodb_log_file_size=256M

innodb_log_buffer_size=32M

innodb_buffer_pool_instances=1

innodb_log_files_in_group=2

innodb_thread_concurrency=0

innodb_flush_method=O_DIRECT

innodb_rollback_on_timeout=ON

innodb_autoinc_lock_mode=2

innodb_stats_on_metadata=0

default_storage_engine=innodb

server_id=1126

binlog_format=ROW

log_bin=binlog

log_slave_updates=1

relay_log=relay-bin

expire_logs_days=7

read_only=OFF

report_host=192.168.10.226

key_buffer_size=24M

tmp_table_size=64M

max_heap_table_size=64M

max_allowed_packet=512M

skip_name_resolve=true

memlock=0

sysdate_is_now=1

max_connections=500

thread_cache_size=512

query_cache_type=0

query_cache_size=0

table_open_cache=1024

lower_case_table_names=0

performance_schema=OFF

performance-schema-max-mutex-classes=0

performance-schema-max-mutex-instances=0



[MYSQL]

socket=/var/lib/mysql/mysql.sock



[client]

socket=/var/lib/mysql/mysql.sock



[mysqldump]

socket=/var/lib/mysql/mysql.sock

max_allowed_packet=512M

## Reset the data directory and re-install the database system files

$ rm -rf /var/lib/mysql/*

## Create the log directories

$ mkdir /var/log/mysql

$ chown -R mysql.mysql /var/log/mysql

## For MySQL

$ mysqld --initialize

## For MariaDB

$ mysql_install_db

 

  6. Start the MySQL/MariaDB Server

## For MySQL

$ systemctl start mysqld

## For MariaDB

$ systemctl start mariadb
  7. Load the data dump we have taken from AWS RDS to the target database node on-prem

$ mysql --show-warnings < backups/dump.sql
  8. Create the replication user on the AWS RDS source node

MariaDB [(none)]> CREATE USER 'repl_user'@'149.145.213.%' IDENTIFIED BY 'repl_passw0rd';

Query OK, 0 rows affected (0.242 sec)



MariaDB [(none)]> GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'repl_user'@'149.145.213.%' IDENTIFIED BY 'repl_passw0rd';

Query OK, 0 rows affected (0.229 sec)
  9. Set up the MySQL/MariaDB Server as a replica/slave of the AWS RDS source node

## First, let's search or locate the CHANGE MASTER command

[root@testnode26 ~]# grep -rn -E -i 'change master to master' backups/dump.sql |head -1

22:-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin-changelog.000584', MASTER_LOG_POS=421;

## Run the CHANGE MASTER statement but add the replication user/password and the hostname as follows,

MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='database-1.xxxxxxx.us-east-2.rds.amazonaws.com', MASTER_LOG_FILE='mysql-bin-changelog.000584', MASTER_LOG_POS=421, MASTER_USER='repl_user', MASTER_PASSWORD='repl_passw0rd';

Query OK, 0 rows affected (0.004 sec)

## Then start the slave threads

MariaDB [(none)]> START SLAVE;

Query OK, 0 rows affected (0.001 sec)

## Check the slave status how it goes

MariaDB [(none)]> SHOW SLAVE STATUS \G

*************************** 1. row ***************************

                Slave_IO_State: Waiting for master to send event

                   Master_Host: database-1.xxxxxxx.us-east-2.rds.amazonaws.com

                   Master_User: repl_user

                   Master_Port: 3306

                 Connect_Retry: 60

               Master_Log_File: mysql-bin-changelog.000584

           Read_Master_Log_Pos: 421

                Relay_Log_File: relay-bin.000001

                 Relay_Log_Pos: 4

         Relay_Master_Log_File: mysql-bin-changelog.000584

              Slave_IO_Running: Yes

             Slave_SQL_Running: Yes

               Replicate_Do_DB:

           Replicate_Ignore_DB:

            Replicate_Do_Table:

        Replicate_Ignore_Table:

       Replicate_Wild_Do_Table:

   Replicate_Wild_Ignore_Table:

                    Last_Errno: 0

                    Last_Error:

                  Skip_Counter: 0

           Exec_Master_Log_Pos: 421

               Relay_Log_Space: 256

               Until_Condition: None

                Until_Log_File:

                 Until_Log_Pos: 0

            Master_SSL_Allowed: No

            Master_SSL_CA_File:

            Master_SSL_CA_Path:

               Master_SSL_Cert:

             Master_SSL_Cipher:

                Master_SSL_Key:

         Seconds_Behind_Master: 0

 Master_SSL_Verify_Server_Cert: No

                 Last_IO_Errno: 0

                 Last_IO_Error:

                Last_SQL_Errno: 0

                Last_SQL_Error:

   Replicate_Ignore_Server_Ids:

              Master_Server_Id: 1675507089

                Master_SSL_Crl:

            Master_SSL_Crlpath:

                    Using_Gtid: No

                   Gtid_IO_Pos:

       Replicate_Do_Domain_Ids:

   Replicate_Ignore_Domain_Ids:

                 Parallel_Mode: optimistic

                     SQL_Delay: 0

           SQL_Remaining_Delay: NULL

       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates

              Slave_DDL_Groups: 0

Slave_Non_Transactional_Groups: 0

    Slave_Transactional_Groups: 0

1 row in set (0.000 sec)

Now we have finally been able to replicate from RDS as the source, or master, of our replica located on-prem. But it's not done yet: in some cases you'll encounter replication errors such as,

Last_SQL_Errno: 1146

                Last_SQL_Error: Error 'Table 'mysql.rds_heartbeat2' doesn't exist' on query. Default database: 'mysql'. Query: 'INSERT INTO mysql.rds_heartbeat2(id, value) values (1,1602988485784) ON DUPLICATE KEY UPDATE value = 1602988485784'

Since the on-prem node does not need to replicate data from the mysql database for tables prefixed with 'rds%', we just ignore these tables during replication. Additionally, you might not want AWS RDS to update and change your mysql.user table. To do this, you can optionally ignore the whole schema or just a list of tables, such as:

STOP SLAVE;

Then,

SET GLOBAL replicate_wild_ignore_table='mysql.rds%';

or

SET GLOBAL replicate_wild_ignore_table='mysql.%';
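
Note that a filter applied with SET GLOBAL is not persistent across a restart of the on-prem server. A minimal sketch for making the filter permanent and resuming replication (place the option in whatever my.cnf your on-prem node uses):

## Persist the filter by adding it under the [mysqld] section of my.cnf on the on-prem node
replicate_wild_ignore_table=mysql.rds%

## Then resume replication and verify that the error is gone
MariaDB [(none)]> START SLAVE;
MariaDB [(none)]> SHOW SLAVE STATUS \G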

The MySQL Problem With --master-data=2

Taking the mysqldump with --master-data=2 requires sufficient privileges, namely the SUPER and RELOAD privileges. The problem is that AWS RDS does not grant these to the admin user during database setup and creation. To work around this issue, your AWS RDS setup must have a master and a replica (slave). Once you have a replica, use it as the source host when taking the mysqldump. Then stop the slave threads on your AWS RDS replica as follows,

rds-replica-mysql> CALL mysql.rds_stop_replication;

Then take the mysqldump without the --master-data option just like below,

mysqldump -h database-1.xxxxxxx.us-east-2.rds.amazonaws.com -uadmin -p --single-transaction --databases db1 db2 db3  > backups/dump.sql

Then run SHOW SLAVE STATUS\G on your AWS RDS replica and take note of the Master_Log_File and Exec_Master_Log_Pos values, which you will use when connecting to the AWS RDS master from your on-prem server. Use those coordinates when running CHANGE MASTER TO ... MASTER_LOG_FILE=<Master_Log_File>, MASTER_LOG_POS=<Exec_Master_Log_Pos>. Of course, once the backup is done, do not forget to start your RDS replica again so its replication threads resume,

rds-replica-mysql> CALL mysql.rds_start_replication;
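
Putting the workaround together, the sequence looks roughly like the sketch below. The binlog file name and position are taken from the earlier example output and are only placeholders for whatever SHOW SLAVE STATUS reports in your environment:

## On the RDS replica: stop replication, take the dump, and read the coordinates
rds-replica-mysql> CALL mysql.rds_stop_replication;
$ mysqldump -h <rds-replica-endpoint> -uadmin -p --single-transaction --databases db1 db2 db3 > backups/dump.sql
rds-replica-mysql> SHOW SLAVE STATUS \G
## Note Master_Log_File (e.g. mysql-bin-changelog.000584) and Exec_Master_Log_Pos (e.g. 421)

## On the on-prem node: load the dump, then point replication at the RDS master using those coordinates
on-prem-mysql> CHANGE MASTER TO MASTER_HOST='database-1.xxxxxxx.us-east-2.rds.amazonaws.com', MASTER_LOG_FILE='mysql-bin-changelog.000584', MASTER_LOG_POS=421, MASTER_USER='repl_user', MASTER_PASSWORD='repl_passw0rd';
on-prem-mysql> START SLAVE;

## Back on the RDS replica: resume its replication threads
rds-replica-mysql> CALL mysql.rds_start_replication;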

Using mydumper

mydumper can be an alternative option here, especially when the dataset is very large, as it offers parallelism and speed when taking a dump or backup copy of your dataset from a source RDS node. Follow the steps below, from installing mydumper to loading the data into your destination on-prem server.

  1. Install the binary. The binaries can be located here https://github.com/maxbube/mydumper/releases.

 $ yum install https://github.com/maxbube/mydumper/releases/download/v0.9.5/mydumper-0.9.5-2.el6.x86_64.rpm
  2. Take the backup from the RDS source node. For example,

[root@testnode26 mydumper-2]# /usr/bin/mydumper --outputdir=. --verbose=3 --host=database-1.xxxxxxx.us-east-2.rds.amazonaws.com --port=3306 --kill-long-queries --chunk-filesize=5120 --build-empty-files --events --routines --triggers --compress --less-locking --success-on-1146 --regex='(db1\.|db2\.|db3\.|mydb4\.|testdb5\.)' -u admin --password=admin123

** Message: Connected to a MySQL server



** (mydumper:18904): CRITICAL **: Couldn't acquire global lock, snapshots will not be consistent: Access denied for user 'admin'@'%' (using password: YES)

** Message: Started dump at: 2020-10-18 09:34:08



** Message: Written master status

** Message: Multisource slave detected.

** Message: Thread 5 connected using MySQL connection ID 1109

At this point, mydumper will have produced the backup files in the form of *.gz files.

  3. Load it to your destination on-premise server

$ myloader --host localhost --directory=$(pwd) --queries-per-transaction=10000 --threads=8 --compress-protocol --verbose=3

** Message: 8 threads created

** Message: Creating database `db1`

** Message: Creating table `db1`.`folders_rel`

** Message: Creating table `db2`.`p`

** Message: Creating table `db2`.`t1`

** Message: Creating table `db3`.`AddressCodeTest`
  4. Setup the destination node as a slave/replica. mydumper will include a file called metadata which consists of binary log coordinates, including GTID positions, for example:

$ cat metadata

Started dump at: 2020-10-18 10:23:35

SHOW MASTER STATUS:

        Log: mysql-bin-changelog.000680

        Pos: 676

        GTID:0-1675507089-3044

## Then run a change master from the replica or your target destination MySQL/MariaDB database node

MariaDB [jbmrcd_date]> CHANGE MASTER TO MASTER_HOST='database-1.cmu8qdlvkepg.us-east-2.rds.amazonaws.com', MASTER_USER='repl_user', MASTER_PASSWORD='repl_passw0rd', MASTER_LOG_FILE='mysql-bin-changelog.000680', MASTER_LOG_POS=676;

Query OK, 0 rows affected (0.002 sec)

## Start the slave

MariaDB [jbmrcd_date]> start slave;

Query OK, 0 rows affected (0.001 sec)

At this point, you are replicating from an Amazon RDS instance running MySQL/MariaDB. Once your application is ready to move away from your Amazon RDS instance, point the application endpoint to your on-prem server; all remaining transactions from the RDS instance will be replicated, so no data is missed on the on-prem server.

Check For Data Discrepancies

Once you have your data loaded or dumped to your on-prem server, acting as a replica of the AWS RDS instance, you should double check it by running checksum calculations to determine how far your data diverges from the source Amazon RDS. I suggest you use the pt-table-checksum tool by Percona, though you could roll your own using checksumming tools such as md5 or sha256, but that takes time to do. Additionally, using pt-upgrade can help as well after your data migration using this replication approach is done.
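
A minimal sketch of running pt-table-checksum against the RDS master is shown below, assuming the on-prem replica is reachable from it and advertised via report_host; the host, credentials, and database list are assumptions, and binlog format restrictions on RDS may require the --no-check-binlog-format flag:

$ pt-table-checksum h=database-1.xxxxxxx.us-east-2.rds.amazonaws.com,u=admin,p=admin123 \
    --databases=db1,db2,db3 \
    --no-check-binlog-format \
    --recursion-method=hosts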

Conclusion

mysqldump and mydumper are free, open-source tools, which is a great advantage, especially if your data is very confidential and you do not want a third party to access it. Although this approach is simple to take, there can be tedious and substantial work involved, as testing and double checks always follow in order to prove that the migration was fully achieved without any data inconsistencies.

Migrating Azure Database for MySQL/MariaDB to an On-Prem Server


Database migrations can impose huge challenges when you consider how to start, what tools to use, and how to achieve a full database migration successfully. Earlier, we listed the top open source tools you can use for migrating MySQL or MariaDB. In this blog, we'll show you how to migrate data from Microsoft Azure Database for MySQL or MariaDB.

Microsoft Azure is now known as a contender against the two other cloud tech giants, AWS and Google Cloud. It focuses mostly on its Microsoft products, especially its home-grown proprietary MSSQL database, but it also offers open source databases among its fully managed database services. Among its supported databases are MySQL and MariaDB.

Moving out of Azure Database for MySQL/MariaDB can be tedious, but how tedious depends on the type of architecture and the type of dataset you have hosted in Azure as your current cloud provider. With the right tools, it is achievable and a full migration can be done.

We'll focus on the tools we can use for data migrations for MySQL or MariaDB. For this blog, I'm using RHEL/CentOS to install the required packages. Let's go over and define the steps and procedures for how to do this.

Migrating From Azure Database for MySQL or MariaDB

A typical approach to migrating your data from Azure Database to an on-prem server is to take a backup using a logical copy. This can be done using backup utilities that are able to operate against Azure Database for MySQL or MariaDB, which is a fully-managed service. Fully-managed database services do not offer SSH logins, so taking a physical copy of backups is not an option.

Before you can migrate or dump your existing database from Azure, you have to take note of the following considerations.

Common Use-cases For Dump and Restore On-Prem

Most common use-cases are:

  • Using logical backup (such as mysqldump, mysqlpump or mydumper/myloader) and restore is the only option. Azure Database for MySQL or MariaDB does not support access to the underlying physical storage, as this is a fully-managed database service.
  • Only the InnoDB and Memory storage engines are supported. Azure Database for MySQL or MariaDB supports only the InnoDB storage engine and therefore does not support alternative storage engines. If your tables are configured with other storage engines, convert them into the InnoDB engine format before migrating to Azure Database for MySQL.
  • For example, if you have a WordPress site or web app using MyISAM tables, first convert those tables to the InnoDB format before restoring to Azure Database for MySQL. Use the clause ENGINE=InnoDB to set the engine when creating a new table, then transfer the data into the compatible table before the restore (see the sketch below this list).
  • If your source Azure Database is on a specific version, then your target on-premise server should also be on the same version as the source Azure Database.

So, given these limitations, you can expect the data in Azure to be only on the InnoDB storage engine, or Memory, if there is such a table in your dataset.
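
A minimal sketch for locating non-InnoDB tables and converting them before such a migration; the schema and table names below are assumptions:

## Find tables that are not using InnoDB
SELECT table_schema, table_name, engine FROM information_schema.tables WHERE engine = 'MyISAM';

## Convert a table in place (repeat per table; the table is locked while the ALTER runs)
ALTER TABLE wordpress_db.wp_posts ENGINE=InnoDB;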

Performance Considerations For Taking Logical Backup from Azure Database

The only way to take a logical backup of Azure Database is to use mysqldump or mysqlpump. To optimize performance when taking a dump with these tools, take note of these considerations when dumping large databases (an example command combining several of them follows the list):

  • Use the exclude-triggers option in mysqldump when dumping databases. Exclude triggers from dump files to avoid the trigger commands firing during the data restore.
  • Use the single-transaction option to set the transaction isolation mode to REPEATABLE READ and send a START TRANSACTION SQL statement to the server before dumping data. Dumping many tables within a single transaction causes some extra storage to be consumed during restore. The single-transaction option and the lock-tables option are mutually exclusive because LOCK TABLES causes any pending transactions to be committed implicitly. To dump large tables, combine the single-transaction option with the quick option.
  • Use the extended-insert multiple-row syntax that includes several VALUE lists. This results in a smaller dump file and speeds up inserts when the file is reloaded.
  • Use the order-by-primary option in mysqldump when dumping databases, so that the data is scripted in primary key order.
  • Use the disable-keys option in mysqldump when dumping data, to disable foreign key constraints before load. Disabling foreign key checks provides performance gains. Enable the constraints and verify the data after the load to ensure referential integrity.
  • Use partitioned tables when appropriate.
  • Load data in parallel. Avoid too much parallelism that would cause you to hit a resource limit, and monitor resources using the metrics available in the Azure portal.
  • Use the defer-table-indexes option in mysqlpump when dumping databases, so that index creation happens after the table's data is loaded.
  • Use the skip-definer option in mysqlpump to omit definer and SQL SECURITY clauses from the create statements for views and stored procedures. When you reload the dump file, it creates objects that use the default DEFINER and SQL SECURITY values.
  • Copy the backup files to an Azure blob/store and perform the restore from there, which should be a lot faster than performing the restore across the Internet.
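
Pulling several of these recommendations together, a dump command might look roughly like the sketch below. This is illustrative only; the host, user, and database names are assumptions, and note that the mysqldump flag that excludes triggers is spelled --skip-triggers:

## mysqldump with the recommended options for a large logical dump
$ MYSQL_PWD=<YOUR_AZURE_DB_PASS> mysqldump -h <YOUR_AZURE_DB_HOSTNAME> -u <YOUR_AZURE_USERNAME> --single-transaction --quick --skip-triggers --extended-insert --order-by-primary --disable-keys --databases db1 db2 > backups/dump.sql

## mysqlpump equivalent with deferred index creation and no DEFINER clauses
$ MYSQL_PWD=<YOUR_AZURE_DB_PASS> mysqlpump -h <YOUR_AZURE_DB_HOSTNAME> -u <YOUR_AZURE_USERNAME> --single-transaction --defer-table-indexes --skip-definer --databases db1 db2 > backups/pump.sql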

Unsupported

The following are unsupported:

  • DBA role: Restricted. Alternatively, you can use the administrator user (created during new server creation), which allows you to perform most DDL and DML statements.
  • SUPER privilege: Similarly, SUPER privilege is restricted.
  • DEFINER: Requires super privileges to create and is restricted. If importing data using a backup, remove the CREATE DEFINER commands manually or by using the --skip-definer command when performing a mysqldump.
  • System databases: The mysql system database is read-only and used to support various PaaS functionality. You cannot make changes to the mysql system database.
  • SELECT ... INTO OUTFILE: Not supported in the service.

Using mysqldump

mysqldump has to be installed on your target database node located on-prem. The target node has to be prepared as a replica of the Azure Database node so that all subsequent transactions are replicated to it. To do this, follow the steps below.

Install mysqldump

  1. Prepare the repository. 

# For MySQL

$ yum install https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm

# For MariaDB

$ curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash

 

  2. Install mysql-client package

# For MySQL

$ yum install -y mysql-community-client.x86_64

# For MariaDB

$ yum install -y MariaDB-client
  3. Create a data dump using mysqldump by executing it inside the target node.

$ MYSQL_PWD=<YOUR_MYSQL_PASS> mysqldump -h<YOUR_AZURE_DB_HOSTNAME>  -u<YOUR_AZURE_USERNAME> --single-transaction --master-data=2 --extended-insert --order-by-primary --disable-keys --databases maximusdb db2 db3 > backups/dump.sql
  4. Install the MySQL/MariaDB Server in the target database node

# For MySQL

$  yum install mysql-community-server.x86_64 mysql-community-client mysql-community-common

# For MariaDB

$ yum install MariaDB-server.x86_64
  5. Setup the MySQL/MariaDB Server instance (my.cnf, file permissions, directories), and start the server

# Setting up the my.cnf (using the my.cnf deployment used by ClusterControl)

[MYSQLD]

user=mysql

basedir=/usr/

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

pid_file=/var/lib/mysql/mysql.pid

port=3306

log_error=/var/log/mysql/mysqld.log

log_warnings=2

slow_query_log_file=/var/log/mysql/mysql-slow.log

long_query_time=2

slow_query_log=OFF

log_queries_not_using_indexes=OFF

innodb_buffer_pool_size=2G

innodb_flush_log_at_trx_commit=2

innodb_file_per_table=1

innodb_data_file_path=ibdata1:100M:autoextend

innodb_read_io_threads=4

innodb_write_io_threads=4

innodb_doublewrite=1

innodb_log_file_size=256M

innodb_log_buffer_size=32M

innodb_buffer_pool_instances=1

innodb_log_files_in_group=2

innodb_thread_concurrency=0

innodb_flush_method=O_DIRECT

innodb_rollback_on_timeout=ON

innodb_autoinc_lock_mode=2

innodb_stats_on_metadata=0

default_storage_engine=innodb

server_id=1126

binlog_format=ROW

log_bin=binlog

log_slave_updates=1

relay_log=relay-bin

expire_logs_days=7

read_only=OFF

report_host=192.168.10.226

key_buffer_size=24M

tmp_table_size=64M

max_heap_table_size=64M

max_allowed_packet=512M

skip_name_resolve=true

memlock=0

sysdate_is_now=1

max_connections=500

thread_cache_size=512

query_cache_type=0

query_cache_size=0

table_open_cache=1024

lower_case_table_names=0

performance_schema=OFF

performance-schema-max-mutex-classes=0

performance-schema-max-mutex-instances=0



[MYSQL]

socket=/var/lib/mysql/mysql.sock



[client]

socket=/var/lib/mysql/mysql.sock



[mysqldump]

socket=/var/lib/mysql/mysql.sock

max_allowed_packet=512M

## Reset the data directory and re-install the database system files

$ rm -rf /var/lib/mysql/*

## Create the log directories

$ mkdir /var/log/mysql

$ chown -R mysql.mysql /var/log/mysql

## For MySQL

$ mysqld --initialize

## For MariaDB

$ mysql_install_db
  6. Start the MySQL/MariaDB Server

## For MySQL

$ systemctl start mysqld

## For MariaDB

$ systemctl start mariadb
  7. Load the data dump we have taken from Azure Database to the target database node on-prem

$ mysql --show-warnings < backups/dump.sql
  8. Create the replication user on your Azure Database source node

CREATE USER 'repl_user'@'<your-target-node-ip>' IDENTIFIED BY 'repl_passw0rd';

GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO repl_user@'<your-target-node-ip>' IDENTIFIED BY 'repl_passw0rd';

Make sure you use your target node's IP address as the client host the replica will connect from.

  9. Set up the MySQL/MariaDB Server as a replica/slave of the Azure Database source node

## First, let's search or locate the CHANGE MASTER command

$ grep -rn -E -i 'change master to master' backups/dump.sql |head -1

22:-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=2938610;

## Run the CHANGE MASTER statement but add the replication user/password and the hostname as follows,

CHANGE MASTER TO MASTER_HOST='<YOUR_AZURE_DB_HOSTNAME>', MASTER_LOG_FILE='mysql-bin.000006', MASTER_LOG_POS=2938610, MASTER_USER='repl_user', MASTER_PASSWORD='repl_passw0rd';

## In some cases, you might have to ignore the mysql schema. Run the following statement:

SET GLOBAL replicate_wild_ignore_table='mysql.%';

## Then start the slave threads

START SLAVE;

## Check the slave status how it goes

SHOW SLAVE STATUS \G

Now we have finally been able to replicate from Azure Database for MySQL/MariaDB as the source of the replica located on-prem.

Using mydumper

Azure Database for MySQL or MariaDB in fact suggests using mydumper, especially for large backups such as 1TB, so it can be your alternative option. It offers parallelism and speed when taking a dump or backup copy of your dataset from a source Azure Database node.

Follow the steps below, from installing mydumper to loading the data into your destination on-prem server.

  1. Install the binary. The binaries can be located here https://github.com/maxbube/mydumper/releases.

 $ yum install https://github.com/maxbube/mydumper/releases/download/v0.9.5/mydumper-0.9.5-2.el6.x86_64.rpm
  2. Take the backup from the Azure Database source node. For example,

[root@testnode26 mydumper]# MYSQL_PWD=<YOUR_AZURE_DB_PASSWORD> /usr/bin/mydumper --outputdir=. --verbose=3 --host=<YOUR_AZURE_DB_HOSTNAME>  -u <YOUR_AZURE_USER>@<YOUR_AZURE_DB_HOSTNAME> --port=3306 --kill-long-queries --chunk-filesize=5120 --build-empty-files --events --routines --triggers --compress --less-locking --success-on-1146 --regex='(maximusdb\.|db1\.|db2\.)'

** Message: Connected to a MySQL server

** Message: Using Percona Backup Locks



** (mydumper:28829): CRITICAL **: Couldn't acquire LOCK BINLOG FOR BACKUP, snapshots will not be consistent: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'BINLOG FOR BACKUP' at line 1

** Message: Started dump at: 2020-10-26 01:34:05



** Message: Written master status

** Message: Multisource slave detected.

** Message: Thread 5 connected using MySQL connection ID 64315

** Message: Thread 6 connected using MySQL connection ID 64345

** Message: Thread 7 connected using MySQL connection ID 64275

** Message: Thread 8 connected using MySQL connection ID 64283

** Message: Thread 1 connected using MySQL connection ID 64253

** Message: Thread 2 connected using MySQL connection ID 64211

** Message: Thread 3 connected using MySQL connection ID 64200

** Message: Thread 4 connected using MySQL connection ID 64211



** (mydumper:28829): CRITICAL **: Error: DB: mysql - Could not execute query: Access denied for user 'mysqldbadmin'@'%' to database 'mysql'

** Message: Thread 5 shutting down

** Message: Thread 6 shutting down

** Message: Thread 7 shutting down

** Message: Thread 8 shutting down

** Message: Thread 1 dumping data for `db1`.`TB1`

** Message: Thread 2 dumping data for `db1`.`tb2

….

As you can see, there is a limitation when taking a backup from a managed database such as Azure. You might notice,

** (mydumper:28829): CRITICAL **: Couldn't acquire LOCK BINLOG FOR BACKUP, snapshots will not be consistent: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'BINLOG FOR BACKUP' at line 1

This is because the SUPER privilege is not supported and is restricted. Ideally, the best option is to take the backup from a replica of your Azure Database. We'll talk about this later.

At this point, mydumper will have produced the backup files in the form of *.gz files.

  3. Load it to your destination on-premise server

$ myloader --host localhost --directory=$(pwd) --queries-per-transaction=10000 --threads=8 --compress-protocol --verbose=3

** Message: 8 threads created

** Message: Creating database `maximusdb`

** Message: Creating table `maximusdb`.`usertbl`

** Message: Creating table `maximusdb`.`familytbl`

** Message: Creating table `db2`.`t1`

** Message: Creating table `db3`.`test1`

…

….
  4. Setup the destination node as a slave/replica. mydumper will include a file called metadata which consists of binary log coordinates, including GTID positions, for example:

$ cat metadata

Started dump at: 2020-10-26 01:35:12

SHOW MASTER STATUS:

        Log: mysql-bin.000007

        Pos: 801

        GTID:0-3649485694-1705



Finished dump at: 2020-10-26 01:37:12

## Then run a change master from the replica or your target destination MySQL/MariaDB database node

CHANGE MASTER TO MASTER_HOST='<YOUR_AZURE_DB_HOSTNAME>', MASTER_LOG_FILE='mysql-bin.000007', MASTER_LOG_POS=801, MASTER_USER='repl_user', MASTER_PASSWORD='repl_passw0rd';

## Start the slave

START SLAVE;

At this point, you are replicating from an Azure Database instance running MySQL/MariaDB. Once your application is ready to move away from your Azure Database instance, point the application endpoint to your on-prem server; all remaining transactions from the Azure instance will be replicated, so no data is missed on the on-prem server.

Handling Limitations With Managed Databases For MySQL or MariaDB in Azure

Dealing with these limitations comes down to making sure the backup dump of your dataset is 100% accurate as of the point in time at which you took it. That, of course, is the ideal for a migration to on-prem. To achieve it, the best architecture is to have a replication topology present in your Azure Database.

Once you have it and are ready for migration, mysqldump/mysqlpump or mydumper has to use the Azure Database replica as its source. Within that Azure Database replica, make sure that the SQL_THREAD is stopped so that you can snapshot or record the correct MASTER_LOG_FILE and EXEC_MASTER_LOG_POS from the result of SHOW SLAVE STATUS.

Of course, once the backup has been done, do not forget to start your Azure Database replica's replication threads again.
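
A minimal sketch of this flow on the Azure Database replica, assuming your replica allows stopping its SQL thread; the prompt names are only placeholders:

## On the Azure Database replica: pause applying events and record the coordinates
azure-replica-mysql> STOP SLAVE SQL_THREAD;
azure-replica-mysql> SHOW SLAVE STATUS \G
## Note Master_Log_File and Exec_Master_Log_Pos from the output, then take the logical dump from this replica

## Once the dump is finished, resume the SQL thread
azure-replica-mysql> START SLAVE SQL_THREAD;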

Check For Data Discrepancies

Once you have your data loaded or dumped to your on-prem server, acting as a replica of the Azure Database instance, you should double check it by running checksum calculations to determine how far your data diverges from the source Azure Database. We suggest you use the pt-table-checksum tool from Percona Toolkit, though you could roll your own using checksumming tools such as md5 or sha256, but that takes time to do. Additionally, using pt-upgrade from Percona Toolkit can help as well after your data migration using this replication approach is done.

Conclusion

The privilege limitations and unsupported features of Azure Database can be challenging, but with the appropriate flow and architecture, it is not impossible to migrate from a fully-managed database to on-prem. All you need to do is prepare the required steps, set up the required topology from your Azure Database source, and then start the migration, from taking backups, to replication, to total migration to your on-prem server.

Using the Percona Audit Log Plugin for Database Security


Why Do You Need To Use an Audit Plugin For Your Database?

Auditing in a database does not deviate from the general meaning of the word: to inspect, examine, and evaluate the database events and transactions that are logged or performed within your database. It adds value to databases, especially as a security feature, as it pushes the administrative side to be diligent in managing and processing data, and it embraces responsibility and accountability for data management.

A database audit requires that every transaction (i.e. DDLs and DMLs) be logged, so as to record traces and get a full overview of what happens during database operations. These operations can take the following into consideration:

  • Provides capability to monitor and debug so as to increase performance on the application side
  • Security and data privacy compliance such as PCI DSS, HIPAA, GDPR, etc. 
  • Provides data autonomy specific to multi-tenancy environments. This allows data analysis so as to differentiate and filter transactions based on sensitivity and privacy, for security and performance considerations.
  • Drives administrative actions to prevent database users from inappropriate actions, based on investigation of suspicious activity, or limits them by role. This means that read users, for example, should only be allowed to pull data and should have access only to the specific databases they are responsible for, in accordance with their job role.

What is the Percona Audit Log Plugin?

Previous approaches to auditing transactions or events running in your database could be heavy-handed: enabling the general log file or using the slow query log. These are not perfect approaches, so the audit log plugin adds more flexibility and customizable parameters to fill the gap. Percona claims their Audit Log Plugin is an alternative to MySQL Enterprise Audit. Although that is true, there is a caveat: Percona's Audit Log Plugin is not available as an installable package for Oracle's MySQL. There is no downloadable tarball for this binary, but it is easy to install by just copying an existing audit_log.so file from an existing Percona Server or Percona XtraDB Cluster installation. It is best to use or copy an existing audit_log.so from the same version of Percona Server as your MySQL community version. So if your target MySQL Community version is 8.x, then use the audit_log.so from a Percona Server 8.x version as well. We will show you how to do this on a MySQL community version later in this blog.

The Percona Audit Log Plugin is of course open-source and available free of charge. So if your enterprise application uses a backend database such as Percona Server or vanilla MySQL, you can use this plugin. MySQL Enterprise Audit is only available for MySQL Enterprise Server, and that comes with a price. Additionally, Percona is constantly updating and maintaining this software, which is a major advantage whenever a major release from the MySQL upstream becomes available. Percona also releases in step with its own major versions, and that affects updates and tested functionality for their audit log plugin tool as well. So any incompatibility with previous versions will be addressed so the plugin works with the most recent and secure version of MySQL.

The Percona Audit Log Plugin is tagged as a security tool, but let us clarify this again: this tool is used for audit logging. Its sole purpose is to log traces of transactions from your database. It does not do firewalling, nor does it apply preventive measures to block specific users. This tool is mainly for audit logging and for use in database transaction analysis.

Using The Percona Audit Log Plugin

In this section, we'll go over how to install and use the plugin, and how beneficial it can be, especially in real-world situations.

Installing The Plugin

Percona provides various sources for their database binaries. Once you install the database server properly, the standard installation will place the audit log plugin shared object in /usr/lib64/mysql/plugin/audit_log.so. Installing the plugin as a way to enable it within the Percona/MySQL server can be done with the actions below. These steps are done using Percona Server 8.0,

mysql> select @@version_comment, @@version\G

*************************** 1. row ***************************

@@version_comment: Percona Server (GPL), Release 12, Revision 7ddfdfe

        @@version: 8.0.21-12

1 row in set (0.00 sec)

Then the steps are as follows:

  1. Verify first if the plugin exists or not

## Check if the plugin is enabled or installed

mysql> select * from information_schema.PLUGINS where PLUGIN_NAME like '%audit%';

Empty set (0.00 sec)



mysql> show variables like 'audit%';

Empty set (0.00 sec)
  2. Install the plugin,

## Check where are the plugins located

mysql> show variables like 'plugin%';

+---------------+--------------------------+

| Variable_name | Value                    |

+---------------+--------------------------+

| plugin_dir    | /usr/lib64/mysql/plugin/ |

+---------------+--------------------------+

1 row in set (0.00 sec)



mysql> \! ls -a /usr/lib64/mysql/plugin/audit_log.so

/usr/lib64/mysql/plugin/audit_log.so

## Ready and then install

mysql> INSTALL PLUGIN audit_log SONAME 'audit_log.so';

Query OK, 0 rows affected (0.01 sec)
  3. Verify it once again

mysql> select * from information_schema.PLUGINS where PLUGIN_NAME like '%audit%'\G

*************************** 1. row ***************************

           PLUGIN_NAME: audit_log

        PLUGIN_VERSION: 0.2

         PLUGIN_STATUS: ACTIVE

           PLUGIN_TYPE: AUDIT

   PLUGIN_TYPE_VERSION: 4.1

        PLUGIN_LIBRARY: audit_log.so

PLUGIN_LIBRARY_VERSION: 1.10

         PLUGIN_AUTHOR: Percona LLC and/or its affiliates.

    PLUGIN_DESCRIPTION: Audit log

        PLUGIN_LICENSE: GPL

           LOAD_OPTION: ON

1 row in set (0.00 sec)



mysql> show variables like 'audit%';

+-----------------------------+---------------+

| Variable_name               | Value         |

+-----------------------------+---------------+

| audit_log_buffer_size       | 1048576       |

| audit_log_exclude_accounts  |               |

| audit_log_exclude_commands  |               |

| audit_log_exclude_databases |               |

| audit_log_file              | audit.log     |

| audit_log_flush             | OFF           |

| audit_log_format            | OLD           |

| audit_log_handler           | FILE          |

| audit_log_include_accounts  |               |

| audit_log_include_commands  |               |

| audit_log_include_databases |               |

| audit_log_policy            | ALL           |

| audit_log_rotate_on_size    | 0             |

| audit_log_rotations         | 0             |

| audit_log_strategy          | ASYNCHRONOUS  |

| audit_log_syslog_facility   | LOG_USER      |

| audit_log_syslog_ident      | percona-audit |

| audit_log_syslog_priority   | LOG_INFO      |

+-----------------------------+---------------+

18 rows in set (0.00 sec)

Installing the Percona Audit Plugin Over the MySQL Community Version

When installing on Oracle MySQL versions, as we mentioned above, always match the version of Percona Server from which the audit_log.so file came. So, for example, I have the following version of MySQL below,

nodeB $  mysqld --version

/usr/sbin/mysqld  Ver 8.0.22 for Linux on x86_64 (MySQL Community Server - GPL)

Whereas, my Percona Server is,

nodeA $ mysqld --version

/usr/sbin/mysqld  Ver 8.0.21-12 for Linux on x86_64 (Percona Server (GPL), Release 12, Revision 7ddfdfe)

All you need to do is copy from the Percona source to the server where you have MySQL Community Server installed.

nodeA $ scp /usr/lib64/mysql/plugin/audit_log.so nodeB:/tmp/

Then move it to /usr/lib64/mysql/plugin, which is where plugins are located.

root@nodeB > show global variables like 'plugin%';

+---------------+--------------------------+

| Variable_name | Value                    |

+---------------+--------------------------+

| plugin_dir    | /usr/lib64/mysql/plugin/ |

+---------------+--------------------------+

1 row in set (0.00 sec)



nodeB $ mv /tmp/audit_log.so /usr/lib64/mysql/plugin

For the rest, you can follow the steps above to continue installing or enabling the Percona Audit Log Plugin for MySQL Community Server.

Configuration and Managing Percona Audit Log Plugin

The Percona Audit Log Plugin is a very flexible tool that can be configured or customized to cater to your requirements as you log your database connections or transactions. Its configuration is applied in a linear fashion: even though it is flexible to customize via its parameters, only the given values will be logged and audited during the whole time your database runs, and logging is done asynchronously by default. Every parameter variable in this plugin is important, but below are the most important parameters that you can use to configure the plugin (an example of setting some of them dynamically follows the list):

  • audit_log_strategy - Used to specify the audit log strategy; applies when audit_log_handler is set to FILE. The following values are possible: 
    • ASYNCHRONOUS - (default) log using memory buffer, do not drop messages if buffer is full
    • PERFORMANCE - log using memory buffer, drop messages if buffer is full
    • SEMISYNCHRONOUS - log directly to file, do not flush and sync every event
    • SYNCHRONOUS - log directly to file, flush and sync every event
  • audit_log_file - Filename used to store the audit logs; defaults to ${datadir}/audit.log. You can use a file path relative to the datadir of your database, or an absolute file path.
  • audit_log_flush - Useful when you need to flush the log, such as when used in coordination with logrotate.
  • audit_log_buffer_size - By default, the Percona Audit Log records traces to the default file log. This variable is useful when audit_log_handler = FILE and audit_log_strategy = ASYNCHRONOUS or PERFORMANCE. When set, it specifies the size of the memory buffer used for logging, which allows you to avoid a performance penalty when audit logging is enabled.
  • audit_log_format - Format to use when recording or saving information to your audit log file. Accepts the formats OLD/NEW (XML based), JSON, and CSV. This is very useful, especially when you later integrate with external tools that pull your audit logs and support specific formats.
  • audit_log_exclude_accounts / audit_log_include_accounts - Used to specify the list of users you can include or exclude, respective to the parameter name. Accepts NULL, otherwise a comma separated list in the format user@host or 'user'@'host'. These variables are mutually exclusive, so one of them has to be unset (i.e. its value is NULL).
  • audit_log_include_commands / audit_log_exclude_commands - Used to specify the list of commands (either NULL or a comma separated list) for which filtering by SQL command type is applied. These variables are mutually exclusive, so one of them has to be unset (i.e. its value is NULL). To get the list of SQL command types in MySQL or Percona, do the following:
    • enable performance_schema=ON variable in your my.cnf (requires database server restart)
    • Run the following query: SELECT GROUP_CONCAT(SUBSTRING_INDEX(name, '/', -1) ORDER BY name) sql_statement FROM performance_schema.setup_instruments WHERE name LIKE "statement/sql/%"\G
  • audit_log_include_databases / audit_log_exclude_databases - Used to filter by database name, in conjunction with audit_log_{include,exclude}_commands, to filter the list of commands and be more granular when audit logging. These variables are mutually exclusive, so one of them has to be unset (i.e. its value is NULL).
  • audit_log_policy - Used to specify which events should be logged. Technically, you can set this variable dynamically to enable or disable (set the value to NONE) your audit logging. Possible values are:
    • ALL - all events will be logged
    • LOGINS - only logins will be logged
    • QUERIES - only queries will be logged
    • NONE - no events will be logged
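
For example, a few of these variables can be changed at runtime from the MySQL client, assuming your plugin version allows setting them dynamically. The account name below is an assumption:

## Log only queries (skip connect/disconnect events)
mysql> SET GLOBAL audit_log_policy = 'QUERIES';

## Audit only the application account; audit_log_exclude_accounts must stay NULL since they are mutually exclusive
mysql> SET GLOBAL audit_log_include_accounts = 'app_user@192.168.10.%';

## When log rotation is handled externally (e.g. by logrotate), ask the plugin to reopen its log file
mysql> SET GLOBAL audit_log_flush = ON;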

Managing the Audit Log Plugin

As mentioned, the default log file goes to ${datadir}/audit.log and uses XML format, just like the example below:

[root@testnode20 ~]# ls /var/lib/mysql/audit.log  | xargs tail -28

<AUDIT_RECORD

  NAME="Ping"

  RECORD="28692714_2020-10-28T19:12:18"

  TIMESTAMP="2020-10-29T09:39:56Z"

  COMMAND_CLASS="error"

  CONNECTION_ID="10"

  STATUS="0"

  SQLTEXT=""

  USER="cmon[cmon] @  [192.168.10.200]"

  HOST=""

  OS_USER=""

  IP="192.168.10.200"

  DB="information_schema"

/>

<AUDIT_RECORD

  NAME="Query"

  RECORD="28692715_2020-10-28T19:12:18"

  TIMESTAMP="2020-10-29T09:39:56Z"

  COMMAND_CLASS="show_status"

  CONNECTION_ID="10"

  STATUS="0"

  SQLTEXT="SHOW GLOBAL STATUS"

  USER="cmon[cmon] @  [192.168.10.200]"

  HOST=""

  OS_USER=""

  IP="192.168.10.200"

  DB="information_schema"

/>

Now, let's manage the Percona Audit Log Plugin in a real case scenario. Inspired by the work in Dani's blog post at Percona, let's consider changing the following variables in my.cnf,

[root@testnode20 ~]# grep -i 'audit' /etc/my.cnf

## Audit Log

audit_log_format=JSON

audit_log_strategy=PERFORMANCE

audit_log_policy=QUERIES

audit_log_exclude_databases=s9s

Then let's create the following database and table,

CREATE DATABASE s9s;

CREATE TABLE `audit_records` ( `id` int unsigned NOT NULL AUTO_INCREMENT,  `audit_record` json,   PRIMARY KEY (`id`) ) ENGINE=InnoDB;

Then let's use a named pipe, or FIFO, in Linux to collect the logs so they are ready for auditing and can be used later.

$ mkfifo /tmp/s9s_fifo

$ exec 1<>/tmp/s9s_fifo

$ tail -f /var/lib/mysql/audit.log 1>/tmp/s9s_fifo 2>&1

Then, let's insert the logs into our table `s9s`.`audit_records` using the script below,

#!/bin/bash

# Read audit log lines from the FIFO and insert them into s9s.audit_records
pipe=/tmp/s9s_fifo

while true; do
    if read line <$pipe; then
        if [[ "$line" == 'quit' ]]; then
            break
        fi
        mysql --show-warnings -vvv -e "INSERT INTO s9s.audit_records (audit_record) VALUES(\"${line//\"/\\\"}\")"
    fi
done
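
Assuming you save the script under a name of your choice, for example audit_collector.sh (a hypothetical name), you can leave it running in the background while the tail command keeps feeding the FIFO:

$ nohup bash audit_collector.sh > /tmp/audit_collector.log 2>&1 &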

Then I tried running a benchmark using sysbench. Now, I have the following entries,

mysql> select count(1) from audit_records\G

*************************** 1. row ***************************

count(1): 37856

1 row in set (0.11 sec)

I can do some auditing using the JSON data, which makes it feasible to perform auditing and investigation, or even performance analysis, of my database. For example,

mysql> SELECT top10_select_insert from ((select audit_record->"$.audit_record" as top10_select_insert from audit_records  where audit_record->"$.audit_record.command_class" in ('select') order by audit_records.id desc limit 10) union all (select audit_record->"$.audit_record" as top10_select_insert from audit_records  where audit_record->"$.audit_record.command_class" in ('insert')  order by audit_records.id desc limit 10)) AS b\G

*************************** 1. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263176_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 5001 AND 5100 ORDER BY c", "timestamp": "2020-10-29T11:11:56Z", "command_class": "select", "connection_id": "25143"}

*************************** 2. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263175_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "SELECT c FROM sbtest4 WHERE id BETWEEN 4875 AND 4974 ORDER BY c", "timestamp": "2020-10-29T11:11:56Z", "command_class": "select", "connection_id": "25143"}

*************************** 3. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263174_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "SELECT SUM(k) FROM sbtest1 WHERE id BETWEEN 5017 AND 5116", "timestamp": "2020-10-29T11:11:56Z", "command_class": "select", "connection_id": "25143"}

*************************** 4. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263173_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "SELECT c FROM sbtest8 WHERE id BETWEEN 4994 AND 5093", "timestamp": "2020-10-29T11:11:56Z", "command_class": "select", "connection_id": "25153"}

*************************** 5. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263172_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "SELECT c FROM sbtest3 WHERE id=4976", "timestamp": "2020-10-29T11:11:56Z", "command_class": "select", "connection_id": "25153"}

*************************** 6. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263171_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "SELECT c FROM sbtest3 WHERE id=5018", "timestamp": "2020-10-29T11:11:56Z", "command_class": "select", "connection_id": "25153"}

*************************** 7. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263170_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "SELECT c FROM sbtest3 WHERE id=5026", "timestamp": "2020-10-29T11:11:56Z", "command_class": "select", "connection_id": "25153"}

*************************** 8. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263169_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "SELECT c FROM sbtest3 WHERE id=5711", "timestamp": "2020-10-29T11:11:56Z", "command_class": "select", "connection_id": "25153"}

*************************** 9. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263168_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "SELECT c FROM sbtest3 WHERE id=5044", "timestamp": "2020-10-29T11:11:56Z", "command_class": "select", "connection_id": "25153"}

*************************** 10. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263167_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "SELECT c FROM sbtest3 WHERE id=5637", "timestamp": "2020-10-29T11:11:56Z", "command_class": "select", "connection_id": "25153"}

*************************** 11. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263151_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "INSERT INTO sbtest9 (id, k, c, pad) VALUES (4998, 4986, '02171032529-62046503057-07366460505-11685363597-46873502976-33077071866-44215205484-05994642442-06380315383-02875729800', '19260637605-33008876390-94789070914-09039113107-89863581488')", "timestamp": "2020-10-29T11:11:56Z", "command_class": "insert", "connection_id": "25124"}

*************************** 12. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263133_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "INSERT INTO sbtest8 (id, k, c, pad) VALUES (6081, 4150, '18974493622-09995560953-16579360264-35381241173-70425414992-87533708595-45025145447-98882906947-17081170077-49181742629', '20737943314-90440646708-38143024644-95915967543-47972430163')", "timestamp": "2020-10-29T11:11:56Z", "command_class": "insert", "connection_id": "25133"}

*************************** 13. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263126_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "INSERT INTO sbtest2 (id, k, c, pad) VALUES (5014, 5049, '82143477938-07198858971-84944276583-28705099377-04269543238-74209284999-24766869883-70274359968-19384709611-56871076616', '89380034594-52170436945-89656244047-48644464580-26885108397')", "timestamp": "2020-10-29T11:11:56Z", "command_class": "insert", "connection_id": "25135"}

*************************** 14. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263119_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "INSERT INTO sbtest5 (id, k, c, pad) VALUES (4995, 3860, '07500343929-19373180618-48491497019-86674883771-87861925606-04683804124-03278606074-05397614513-84175620410-77007118978', '19374966620-11798221232-19991603086-34443959669-69834306417')", "timestamp": "2020-10-29T11:11:56Z", "command_class": "insert", "connection_id": "25142"}

*************************** 15. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263112_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "INSERT INTO sbtest10 (id, k, c, pad) VALUES (5766, 5007, '46189905191-42872108894-20541866044-43286474408-49735155060-20388245380-67571749662-72179825415-56363344183-47524887111', '24559469844-22477386116-04417716308-05721823869-32876821172')", "timestamp": "2020-10-29T11:11:56Z", "command_class": "insert", "connection_id": "25137"}

*************************** 16. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263083_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "INSERT INTO sbtest7 (id, k, c, pad) VALUES (5033, 4986, '20695843208-59656863439-60406010814-11793724813-45659184103-02803540858-01466094684-30557262345-15801610791-28290093674', '14178983572-33857930891-42382490524-21373835727-23623125230')", "timestamp": "2020-10-29T11:11:56Z", "command_class": "insert", "connection_id": "25118"}

*************************** 17. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263076_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "INSERT INTO sbtest1 (id, k, c, pad) VALUES (5029, 5016, '72342762580-04669595160-76797241844-46205057564-77659988460-00393018079-89701448932-22439638942-02011990830-97695117676', '13179789120-16401633552-44237908265-34585805608-99910166472')", "timestamp": "2020-10-29T11:11:56Z", "command_class": "insert", "connection_id": "25121"}

*************************** 18. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263036_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "INSERT INTO sbtest1 (id, k, c, pad) VALUES (5038, 5146, '62239893938-24763792785-75786071570-64441378769-99060498468-07437802489-36899434285-44705822299-70849806976-77287283409', '03220277005-21146501539-10986216439-83162542410-04253248063')", "timestamp": "2020-10-29T11:11:55Z", "command_class": "insert", "connection_id": "25127"}

*************************** 19. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326263018_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "INSERT INTO sbtest4 (id, k, c, pad) VALUES (5004, 5028, '15487433957-59189974170-83116468418-96078631606-58760747556-09307871236-40520753062-17596570189-73692856496-38267942694', '98937710805-24695902707-05013528796-18454393948-39118534483')", "timestamp": "2020-10-29T11:11:55Z", "command_class": "insert", "connection_id": "25129"}

*************************** 20. row ***************************

top10_select_insert: {"db": "sbtest", "ip": "192.168.10.200", "host": "", "name": "Query", "user": "cmon[cmon] @  [192.168.10.200]", "record": "326262989_2020-10-29T10:35:07", "status": 0, "os_user": "", "sqltext": "INSERT INTO sbtest3 (id, k, c, pad) VALUES (5015, 5030, '30613877119-41343977889-67711116708-96041306890-46480766663-68231747217-07404586739-83073703805-75534384550-12407169697', '65220283880-37505643788-94809192635-84679347406-74995175373')", "timestamp": "2020-10-29T11:11:55Z", "command_class": "insert", "connection_id": "25139"}

20 rows in set (0.00 sec)

Aggregate Your Audit Logs With Other Tools

Now that you are able to parse the output of your audit logs, you can start feeding them into external tools and aggregating them with your current environment or technology stack, as long as those tools read or support JSON. For example, you can use ELK (Elasticsearch, Logstash, Kibana) to parse and centralize your logs, or integrate with Graylog or Fluentd. Alternatively, you might build your own viewer and integrate it with your existing software setup. Using the Percona Audit Log plugin makes this kind of analysis feasible, productive, and extensible.
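As a rough sketch of such an integration, a minimal Logstash pipeline that ships a JSON-formatted audit log file into Elasticsearch could look like the snippet below. The log file path and the Elasticsearch endpoint are assumptions; adjust them to wherever your audit plugin writes its log and to your own cluster.

input {
  file {
    # Assumed location of the JSON audit log produced by the audit plugin
    path => "/var/lib/mysql/audit.log"
    codec => "json"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    # Hypothetical Elasticsearch endpoint and index pattern
    hosts => ["127.0.0.1:9200"]
    index => "mysql-audit-%{+YYYY.MM.dd}"
  }
}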

An Overview of ProxySQL Clustering in ClusterControl


ProxySQL is a well known load balancer in the MySQL world - it comes with a great set of features that allow you to take control over your traffic and shape it however you see fit. It can be deployed in many different ways - dedicated nodes, collocated with application hosts, a silo approach - it all depends on the exact environment and business requirements. The common challenge is that, in most cases, you want your ProxySQL nodes to contain the same configuration. If you scale out your cluster and add a new server to ProxySQL, you want that server to be visible on all ProxySQL instances, not just on the active one. This leads to the question - how do you make sure the configuration stays in sync across all ProxySQL nodes?

You can try to update all nodes by hand, which is definitely not efficient. You can also use some sort of infrastructure orchestration tools like Ansible or Chef to keep the configuration across the nodes in a known state, making the modifications not on ProxySQL directly but through the tool you use to organize your environment.

If you happen to use ClusterControl, it comes with a set of features that allow you to synchronize the configuration between ProxySQL instances, but this solution has its cons - it is a manual action, and you have to remember to execute it after a configuration change. If you forget to do that, you may be in for a nasty surprise if, for example, Keepalived moves the Virtual IP to the non-updated ProxySQL instance.

None of those methods is simple or 100% reliable, and a situation where the ProxySQL nodes have different configurations is potentially dangerous.

Luckily, ProxySQL comes with a solution for this problem - ProxySQL Cluster. The idea is fairly simple - you define a list of ProxySQL instances that will talk to each other and inform the others about the version of the configuration that each of them contains. The configuration is versioned; therefore, any modification of any setting on any node increases the configuration version - this triggers configuration synchronization, and the new version of the configuration is distributed and applied across all nodes that form the ProxySQL cluster.
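For context, if you were to set this up by hand instead of through ClusterControl, the configuration lives in ProxySQL's admin interface. The sketch below is only an illustration, assuming two ProxySQL nodes at 10.0.0.131 and 10.0.0.132 (the same addresses used later in this post); the cluster credentials are placeholders.

-- Run on each ProxySQL node via the admin interface (port 6032)
-- Credentials used by the cluster members to talk to each other (placeholders);
-- the same user:password pair must also be listed in admin-admin_credentials
UPDATE global_variables SET variable_value='cluster_user' WHERE variable_name='admin-cluster_username';
UPDATE global_variables SET variable_value='cluster_pass' WHERE variable_name='admin-cluster_password';
LOAD ADMIN VARIABLES TO RUNTIME;
SAVE ADMIN VARIABLES TO DISK;

-- Define the peers that form the ProxySQL cluster
INSERT INTO proxysql_servers (hostname, port, weight, comment) VALUES ('10.0.0.131', 6032, 0, 'proxysql_group');
INSERT INTO proxysql_servers (hostname, port, weight, comment) VALUES ('10.0.0.132', 6032, 0, 'proxysql_group');
LOAD PROXYSQL SERVERS TO RUNTIME;
SAVE PROXYSQL SERVERS TO DISK;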

The recent version of ClusterControl allows you to set up ProxySQL clusters effortlessly. When deploying ProxySQL you should tick the “Use Native Clustering” option for all of the nodes you want to be part of the cluster.

Once you do that, you are pretty much done - the rest happens under the hood.

MySQL [(none)]> select * from proxysql_servers;

+------------+------+--------+----------------+

| hostname   | port | weight | comment        |

+------------+------+--------+----------------+

| 10.0.0.131 | 6032 | 0      | proxysql_group |

| 10.0.0.132 | 6032 | 0      | proxysql_group |

+------------+------+--------+----------------+

2 rows in set (0.001 sec)

On both of the servers the proxysql_servers table was set properly with the hostnames of the nodes that form the cluster. We can also verify that the configuration changes are properly propagated across the cluster:

We have increased the Max Connections setting on one of the ProxySQL nodes (10.0.0.131) and we can verify that the other node (10.0.0.132) will see the same configuration:

If we need to debug the process, we can always look at the ProxySQL log (typically located in /var/lib/proxysql/proxysql.log), where we will see information like this:

2020-11-26 13:40:47 [INFO] Cluster: detected a new checksum for mysql_servers from peer 10.0.0.131:6032, version 11, epoch 1606398059, checksum 0x441378E48BB01C61 . Not syncing yet ...

2020-11-26 13:40:49 [INFO] Cluster: detected a peer 10.0.0.131:6032 with mysql_servers version 12, epoch 1606398060, diff_check 3. Own version: 9, epoch: 1606398022. Proceeding with remote sync

2020-11-26 13:40:50 [INFO] Cluster: detected a peer 10.0.0.131:6032 with mysql_servers version 12, epoch 1606398060, diff_check 4. Own version: 9, epoch: 1606398022. Proceeding with remote sync

2020-11-26 13:40:50 [INFO] Cluster: detected peer 10.0.0.131:6032 with mysql_servers version 12, epoch 1606398060

2020-11-26 13:40:50 [INFO] Cluster: Fetching MySQL Servers from peer 10.0.0.131:6032 started. Expected checksum 0x441378E48BB01C61

2020-11-26 13:40:50 [INFO] Cluster: Fetching MySQL Servers from peer 10.0.0.131:6032 completed

2020-11-26 13:40:50 [INFO] Cluster: Fetching checksum for MySQL Servers from peer 10.0.0.131:6032 before proceessing

2020-11-26 13:40:50 [INFO] Cluster: Fetching checksum for MySQL Servers from peer 10.0.0.131:6032 successful. Checksum: 0x441378E48BB01C61

2020-11-26 13:40:50 [INFO] Cluster: Writing mysql_servers table

2020-11-26 13:40:50 [INFO] Cluster: Writing mysql_replication_hostgroups table

2020-11-26 13:40:50 [INFO] Cluster: Loading to runtime MySQL Servers from peer 10.0.0.131:6032

This is the log from 10.0.0.132 where we can clearly see that a configuration change for table mysql_servers was detected on 10.0.0.131 and then it was synced and applied on 10.0.0.132, making it in sync with the other node in the cluster.

As you can see, clustering ProxySQL is an easy yet efficient way to ensure its configuration stays in sync, and it helps significantly when running larger ProxySQL deployments. Let us know in the comments what your experience with ProxySQL clustering is.

How to Deploy MariaDB Cluster 10.5 for High Availability


ClusterControl 1.8.1 includes support for MariaDB Cluster 10.5. MariaDB 10.5 is equipped with:

  • More Granular Privileges
  • InnoDB Performance Improvements
  • Full GTID Support for Galera Cluster
  • More Metadata for Replication and Binary Logs
  • More SQL syntax statements are introduced (RETURNING statement to INSERT, EXCEPT ALL and INTERSECT ALL, …)
  • Performance Schema Updates to Match MySQL 5.7
  • The S3 Storage Engine

You can check our previous blog What’s New in MariaDB Server 10.5? for more details about the release. As for the updates specific to MariaDB Cluster, the key features in version 10.5 don’t differ much from MariaDB Cluster 10.4, but the list below covers some of the important changes in this version.

  • GTID consistency
  • Cluster inconsistency/error voting
  • Non-blocking DDL operations (available only in the enterprise version)
  • Black box (available only in the enterprise version)
  • Upgraded Galera wsrep library, with 26.4.6 being the latest version

XA Transaction Support was expected in this release (do not be confused: XA Transactions are supported by MariaDB Server, but not on Galera Cluster), but for various reasons it has been delayed and is now expected in the next MariaDB Cluster 10.6 release.

MariaDB Cluster For High Availability

MariaDB Cluster is basically a Galera Cluster that uses the MariaDB implementation as the database layer to interface with the InnoDB or XtraDB engine. MariaDB Galera Cluster is a virtually synchronous multi-master cluster for MariaDB. It is available on Linux only, and only supports the XtraDB/InnoDB storage engines (although there is experimental support for MyISAM - see the wsrep_replicate_myisam system variable). When Galera Cluster is in use, database reads and writes can be directed to any node. Any individual node can be lost without interruption in operations and without using complex failover procedures.

With Galera's nature adapted within MariaDB Cluster, it is basically a high availability solution with synchronous replication, failover, and resynchronization. It brings the benefits of no data loss, no slave lag, read and write scalability, and high availability across different data centers.

Deploying MariaDB Cluster 10.5

MariaDB provides a very straightforward and easy setup for installing your MariaDB Cluster 10.5. The manual process can be tedious, but with the automated scripts provided by MariaDB, repositories can be set up in accordance with your target database version, OS type, and OS version.

For this exercise, I have the following 3-node Galera Cluster setup with the following IP addresses: 192.168.40.210, 192.168.40.220, 192.168.40.230.

Setup Your Repository

As mentioned earlier, MariaDB has a script named mariadb_repo_setup. It's very straightforward and easy to use. You can specify the target version of your database, the OS type, and the version of your OS. 

For example, I am installing using CentOS 8,

curl -LsS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup |  sudo bash -s -- --mariadb-server-version="mariadb-10.5" --os-type=rhel --os-version=8

or installing it in Ubuntu Focal Fossa,

curl -LsS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup |  sudo bash -s -- --mariadb-server-version="mariadb-10.5" --os-type=ubuntu --os-version=focal

Take note that on Debian/Ubuntu the mariadb_repo_setup script requires the package apt-transport-https as a dependency, so install this package first before you can take advantage of the script.

apt update

apt install apt-transport-https

Now, run the command on your three nodes in accordance with their OS and, of course, with the MariaDB version set to 10.5.

Setup MySQL Configuration

The configuration file depends on your server resources, the type of server environment, and the assigned IP addresses. For this blog, you can use this production-ready MariaDB Cluster/PXC configuration setup which we use when deploying Percona XtraDB Cluster/MariaDB Cluster databases with ClusterControl. Notable variables that you need to change, or may want to adjust, are the following:

  • innodb_buffer_pool_size - set the buffer pool to 70% - 80% of the available RAM of your server
  • wsrep_provider - Path of the compiled Galera library. For RHEL/CentOS, the path is /usr/lib64/galera-4/libgalera_smm.so, whereas for Debian/Ubuntu it is /usr/lib/galera/libgalera_smm.so.
  • wsrep_node_address - This is the node IP address
  • wsrep_sst_method - you can change this, but we recommend using mariabackup. Possible values are rsync, mysqldump, xtrabackup, xtrabackup-v2, and mariabackup.
  • wsrep_cluster_name - The name of your MariaDB Cluster. It has to be identical to all your nodes in a single MariaDB Cluster.
  • wsrep_cluster_address - This contains the addresses of your nodes within the cluster. It has to be a valid IP, hostname, or FQDN.
  • wsrep_node_name - The name of your node. The name can be used in wsrep_sst_donor as a preferred donor. Note that multiple nodes in a cluster can have the same name.

For performing SST, the user and password in the [mysqldump], [xtrabackup], and [mysqld] sections can be changed if you want. For this exercise, let's keep it simple and leave the values as they are.

Now, copy the configuration file and place it in /etc/my.cnf. Do this on all three of your Galera nodes.
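To illustrate, a minimal sketch of the Galera-related part of /etc/my.cnf for the node 192.168.40.210 could look like the following. The IP addresses match the three nodes used in this exercise, while the cluster name, node name, buffer pool size, and library path are assumptions you should adapt to your environment.

[mysqld]
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_buffer_pool_size=2G                 # placeholder; size it to 70-80% of available RAM

# Galera settings (library path shown for RHEL/CentOS)
wsrep_on=ON
wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so
wsrep_cluster_name=mariadb_cluster_105
wsrep_cluster_address=gcomm://192.168.40.210,192.168.40.220,192.168.40.230
wsrep_node_address=192.168.40.210
wsrep_node_name=node1
wsrep_sst_method=mariabackup
wsrep_sst_auth=backupuser:backuppassword   # matches the SST user created later in this post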

Installing Required Packages

Install the packages for all the three Galera nodes. Follow the command below based on your target OS environment.

For RHEL/CentOS,

sudo yum install MariaDB-server MariaDB-client galera-4 MariaDB-backup

For Debian/Ubuntu,

sudo apt update

sudo apt-get install mariadb-server galera-4 mariadb-client libmariadb3 mariadb-backup mariadb-common

Once installation is done, stop the MariaDB process and initialize the cluster as a single node. This shall bootstrap your Galera Cluster. At this stage, I'm running it on node 192.168.40.210,

$ /usr/bin/galera_new_cluster

Create SST/IST User

Create the backup user which shall be used for SST or IST. Only run the following SQL statements on the first node, where you initiated the cluster. At this stage, I have executed them on node 192.168.40.210.

CREATE USER backupuser@localhost IDENTIFIED BY 'backuppassword';

GRANT PROCESS, RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'backupuser'@'localhost';

At this point, we're all set up and ready to connect the remaining nodes.

Start The MariaDB Server

Now that the first node is set up, we are ready to connect the remaining nodes. Simply start the mariadb service with the command below,

systemctl start mariadb

Run the command for the remaining nodes. Do it one at a time.

At this point, all nodes are in sync, and we can verify this with an SQL statement.
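Using the standard Galera status variables, a quick check looks like this:

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';

With all three nodes joined, wsrep_cluster_size should report 3 and wsrep_local_state_comment should report Synced on every node.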

Adding Steroids For Your MariaDB Cluster 10.5 for HA

In a production setup, simply deploying MariaDB Cluster 10.5 might not be enough for your HA needs. Adding more steroids, such as installing HAProxy together with Keepalived for redundancy, will bring more high availability to your database environment.

Setting up HAProxy and then Keepalived by hand adds more hassle to obtaining the desired topology and environment; this can be automated using ClusterControl. ClusterControl allows you to deploy MariaDB Cluster 10.5 and add more solutions such as ProxySQL, MaxScale, or garbd for load balancing, while Keepalived can be added to your cluster for redundancy and automatic failover in case a disaster happens.

You can download ClusterControl for free; it allows you to deploy these HA solutions and manage your databases, and it comes with a 30-day subscription license so you can enjoy the benefits of the advanced and enterprise features.

Deploying MariaDB 10.5 Cluster with ClusterControl

Once you have installed ClusterControl, click the icon in the upper right corner and you'll see the deployment wizard, just like below.

The setup is easy; just follow the series of steps based on the flow of the UI,

Deploy HAProxy For Load Balancing Management

At this point, I would assume you have your MariaDB Cluster 10.5 all set up. Now, let's deploy HAProxy,

Alternatively, you can go to Manage → Load Balancer → HAProxy.

Then select or type the address where HAProxy is to be installed and select the Galera nodes that have to be monitored by HAProxy. See the example below,

Add at least two HAProxy deployments for more availability, so that whenever one of your HAProxy instances goes down, your application will route over to the other node that is still available or online. This is very important, especially when you are handling database or system upgrades, aside from catastrophic or disaster events.

Deploy Keepalived

You can use the same process as when deploying HAProxy with ClusterControl. Below is an example of how you are going to deploy Keepalived,

As you may have noticed, I have two HAProxy instances on which I am going to install Keepalived, so it will be present on every node where HAProxy is running.

Finalizing your MariaDB Cluster 10.5 With High Availability

Now that everything is set up, your environment shall look like this,

Conclusion

This setup gives you the benefit of achieving high availability with more 9's, so to speak. HAProxy gives your MariaDB Cluster 10.5 more load balancing capability with its read and write separation, while Keepalived assures that in case one of your HAProxy instances dies, traffic will fail over to the next one available. Your application only needs to connect to the virtual IP address (which follows VRRP), and no extra configuration or setup is needed when such an HAProxy instance failure occurs.

Of course, this setup is not perfect when it comes to high availability. You can replace HAProxy with ProxySQL to add more flexibility and more read/write separation while using only a single port. A perfect HA setup is hard to achieve and comes with drawbacks as well, but this setup gives you better availability for your application and database interoperability. What matters most is that downtime is reduced as close to zero as possible.

What is a Query Outlier and How to Fix It


The MySQL database workload is determined by the number of queries that it processes. There are several situations in which MySQL slowness can originate. The first possibility is queries that do not use proper indexing. When a query cannot make use of an index, the MySQL server has to use more resources and time to process that query. By monitoring queries, you have the ability to pinpoint the SQL code that is the root cause of a slowdown and fix it before the overall performance degrades.

In this blog post, we are going to highlight the Query Outlier feature available in ClusterControl and see how it can help us improve the database performance. In general, ClusterControl performs MySQL query sampling in two ways:

  1. Fetch the queries from the Performance Schema (recommended).
  2. Parse the content of MySQL Slow Query.

If the Performance Schema is disabled, ClusterControl will then default to the Slow Query log. To learn more on how ClusterControl performs this, check out this blog post, How to use the ClusterControl Query Monitor for MySQL, MariaDB and Percona Server.

What are Query Outliers?

An outlier is a query that takes longer than the normal query time of its type. Do not take this literally as a "badly written" query; it should rather be treated as a potentially suboptimal common query that could be improved. After a number of samples, when ClusterControl has gathered enough statistics, it can determine whether the latency is higher than normal (2 sigmas + average_query_time); if so, the query is an outlier and will be added to the Query Outliers list.

This feature is dependent on the Top Queries feature. If Query Monitoring is enabled and Top Queries are captured and populated, Query Outliers will summarize these and provide a filter based on timestamp. To see the list of queries that require attention, go to ClusterControl -> Query Monitor -> Query Outliers and you should see some queries listed (if any):

As you can see from the screenshot above, the outliers are basically queries that took at least 2 times longer than the average query time. For the first entry, the average time is 34.41 ms while the outlier's query time is 140 ms (more than 2 times higher than the average time). Similarly, for the next entries, the Query Time and Avg Query Time columns are the two important values that justify flagging a particular query as an outlier.

It is relatively easy to find a pattern of a particular query outlier by looking at a bigger time period, like a week ago, as highlighted in the following screenshot:

By clicking on each row, you can see the full query which is really helpful to pinpoint and understand the problem, as shown in the next section.

Fixing the Query Outliers

To fix the outliers, we need to understand the nature of the query, the tables' storage engine, the database version, the clustering type, and how impactful the query is. In some cases, the outlier query does not really degrade the overall database performance. In this example, we have seen that the query has been standing out for the whole week and was the only query type being captured, so it is probably a good idea to fix or improve this query if possible.

As in our case, the outlier query is:

SELECT i2l.country_code AS country_code, i2l.country_name AS country_name 
FROM ip2location i2l 
WHERE (i2l.ip_to >= INET_ATON('104.144.171.139') 
AND i2l.ip_from <= INET_ATON('104.144.171.139')) 
LIMIT 1 
OFFSET 0;

And the query result is:

+--------------+---------------+
| country_code | country_name  |
+--------------+---------------+
| US           | United States |
+--------------+---------------+

Using EXPLAIN

The query is a read-only range select query to determine the user's geographical location information (country code and country name) for an IP address on table ip2location. Using the EXPLAIN statement can help us understand the query execution plan:

mysql> EXPLAIN SELECT i2l.country_code AS country_code, i2l.country_name AS country_name 
FROM ip2location i2l 
WHERE (i2l.ip_to>=INET_ATON('104.144.171.139') 
AND i2l.ip_from<=INET_ATON('104.144.171.139')) 
LIMIT 1 OFFSET 0;
+----+-------------+-------+------------+-------+--------------------------------------+-------------+---------+------+-------+----------+------------------------------------+
| id | select_type | table | partitions | type  | possible_keys                        | key         | key_len | ref  | rows  | filtered | Extra                              |
+----+-------------+-------+------------+-------+--------------------------------------+-------------+---------+------+-------+----------+------------------------------------+
|  1 | SIMPLE      | i2l   | NULL       | range | idx_ip_from,idx_ip_to,idx_ip_from_to | idx_ip_from | 5       | NULL | 66043 |    50.00 | Using index condition; Using where |
+----+-------------+-------+------------+-------+--------------------------------------+-------------+---------+------+-------+----------+------------------------------------+

The query is executed with a range scan on the table using index idx_ip_from with 50% potential rows (filtered).

Proper Storage Engine

Looking at the table structure of ip2location:

mysql> SHOW CREATE TABLE ip2location\G
*************************** 1. row ***************************
       Table: ip2location
Create Table: CREATE TABLE `ip2location` (
  `ip_from` int(10) unsigned DEFAULT NULL,
  `ip_to` int(10) unsigned DEFAULT NULL,
  `country_code` char(2) COLLATE utf8_bin DEFAULT NULL,
  `country_name` varchar(64) COLLATE utf8_bin DEFAULT NULL,
  KEY `idx_ip_from` (`ip_from`),
  KEY `idx_ip_to` (`ip_to`),
  KEY `idx_ip_from_to` (`ip_from`,`ip_to`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin

This table is based on the IP2Location database and is seldom updated/written to, usually only on the first day of the calendar month (as recommended by the vendor). So one option is to convert the table to the MyISAM (MySQL) or Aria (MariaDB) storage engine with a fixed row format to get better read-only performance. Note that this is only applicable if you are running standalone MySQL or MariaDB, or replication. On Galera Cluster and Group Replication, please stick to the InnoDB storage engine (unless you know what you are doing).

Anyway, to convert the table from InnoDB to MyISAM with fixed row format, simply run the following command:

ALTER TABLE ip2location ENGINE=MyISAM ROW_FORMAT=FIXED;

In our measurement, with 1000 random IP address lookup tests, the query performance improved around 20% with MyISAM and fixed row format:

  • Average time (InnoDB): 21.467823 ms
  • Average time (MyISAM Fixed): 17.175942 ms
  • Improvement: 19.992157565301 %

You can expect this result to be immediate after the table is altered. No modification on the higher tier (application/load balancer) is necessary.

Tuning the Query

Another way is to inspect the query plan and rewrite the query so it gets a more efficient execution plan. The same query can also be written using a subquery as below:

SELECT `country_code`, `country_name` FROM 
  (SELECT `country_code`, `country_name`, `ip_from` 
   FROM `ip2location` 
   WHERE ip_to >= INET_ATON('104.144.171.139') 
   LIMIT 1) 
AS temptable 
WHERE ip_from <= INET_ATON('104.144.171.139');

The tuned query has the following query execution plan:

mysql> EXPLAIN SELECT `country_code`,`country_name` FROM 
(SELECT `country_code`, `country_name`, `ip_from` 
FROM `ip2location` 
WHERE ip_to >= INET_ATON('104.144.171.139') 
LIMIT 1) 
AS temptable 
WHERE ip_from <= INET_ATON('104.144.171.139');
+----+-------------+--------------+------------+--------+---------------+-----------+---------+------+-------+----------+-----------------------+
| id | select_type | table        | partitions | type   | possible_keys | key       | key_len | ref  | rows  | filtered | Extra                 |
+----+-------------+--------------+------------+--------+---------------+-----------+---------+------+-------+----------+-----------------------+
|  1 | PRIMARY     | <derived2>   | NULL       | system | NULL          | NULL      | NULL    | NULL |     1 |   100.00 | NULL                  |
|  2 | DERIVED     | ip2location  | NULL       | range  | idx_ip_to     | idx_ip_to | 5       | NULL | 66380 |   100.00 | Using index condition |
+----+-------------+--------------+------------+--------+---------------+-----------+---------+------+-------+----------+-----------------------+

Using subquery, we can optimize the query by using a derived table that focuses on one index. The query should return only 1 record where the ip_to value is greater than or equal to the IP address value. This allows the potential rows (filtered) to reach 100% which is the most efficient. Then, check that the ip_from is less than or equal to the IP address value. If it is, then we should find the record. Otherwise, the IP address does not exist in the ip2location table.

In our measurement, the query performance improved around 99% using a subquery:

  • Average time (InnoDB + range scan): 22.87112 ms
  • Average time (InnoDB + subquery): 0.14744 ms
  • Improvement: 99.355344207017 %

With the above optimization, we can see a sub-millisecond query execution time of this type of query, which is a massive improvement considering the previous average time is 22 ms. However, we need to make some modifications to the higher tier (application/load balancer) in order to benefit from this tuned query.

Patching or Query Rewriting

Patch your applications to use the tuned query or rewrite the outlier query before it reaches the database server. We can achieve this by using a MySQL load balancer like ProxySQL (query rules) or MariaDB MaxScale (statement rewriting filter), or using the MySQL Query Rewriter plugin. In the following example, we use ProxySQL in front of our database cluster and we can simply create a rule to rewrite the slower query into the faster one, for example:
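While ClusterControl exposes this as a form, the equivalent rule can also be created directly on the ProxySQL admin interface. The sketch below is only an illustration of the mechanism: the regular expressions are placeholders and must be adapted to the exact query text and whitespace your application sends, as well as to the regex engine configured in ProxySQL.

-- On the ProxySQL admin interface (port 6032); patterns are illustrative placeholders
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, replace_pattern, apply)
VALUES (
  100, 1,
  '^SELECT i2l\.country_code AS country_code, i2l\.country_name AS country_name FROM ip2location i2l WHERE \(i2l\.ip_to >= INET_ATON\(''(.+)''\) AND i2l\.ip_from <= INET_ATON\(''.+''\)\) LIMIT 1 OFFSET 0',
  'SELECT country_code, country_name FROM (SELECT country_code, country_name, ip_from FROM ip2location WHERE ip_to >= INET_ATON(''\1'') LIMIT 1) AS temptable WHERE ip_from <= INET_ATON(''\1'')',
  1
);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;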

Save the query rule and monitor the Query Outliers page in ClusterControl. This fix will obviously remove the outlier queries from the list after the query rule is activated.

Conclusion

Query Outliers is a proactive query monitoring tool that can help us understand and fix a performance problem before it gets way out of control. As your application grows and becomes more demanding, this tool can help you maintain decent database performance along the way.

Database Design 101: Partitions in MySQL


In this blog post we are going to be discussing one of the most widely used features of MySQL - partitions.

What is Partitioning?

In MySQL, partitioning is a database design technique in which data is split into multiple tables, but is still treated as a single table by the SQL layer. Simply put, when you partition a table, you split it into multiple sub-tables. Partitioning is used because it improves the performance of certain queries by allowing them to access only a portion of the data. I/O operations can also be improved because data and indexes can be split across many disk volumes.

There are two types of partitioning: horizontal and vertical. Horizontal partitioning involves putting different rows into different tables; vertical partitioning, on the other hand, involves creating tables with fewer columns and using additional tables to store the remaining columns.

How Does Partitioning Work?

  • When SELECT queries are used, the partitioning layer opens and locks partitions, the query optimizer determines if any of the partitions can be pruned, then the partitioning layer forwards the handler API calls to the storage engine that handles the partitions.
  • When INSERT queries are used, the partitioning layer opens and locks partitions, determines which partition the row should belong to, then forwards the row to that partition.
  • When DELETE queries are used, the partitioning layer opens and locks partitions, determines which partition contains the row, then deletes the row from that partition.
  • When UPDATE queries are used, the partitioning layer opens and locks partitions, figures out which partition contains the row, fetches the row and modifies it, then determines which partition should contain the new row, forwards the row to the new partition with an insertion request, then forwards the deletion request to the original partition.

When Should You Use Partitioning?

In general, partitioning is useful when:

  • You have a lot of data that you need to query through.
  • Your tables are too big to fit in memory.
  • Your tables contain historical data and new data is added into the newest partition.
  • You think that you will need to distribute the contents of a table across different storage devices.
  • You think you will need to restore individual partitions.

If one or more of the scenarios described above describe your situation, partitioning may help. Before partitioning your data though, keep in mind that MySQL partitions have their own limitations:

  • Partitioning expressions do not permit the use of stored procedures, stored functions, user-defined functions (UDFs) or plugins, and offer only limited support for SQL functions. You also cannot use declared or stored variables.
  • Partitioned tables cannot contain or be referenced by foreign keys.
  • There is a limit of 1,024 partitions per table (starting from MariaDB 10.0.4, tables can contain a maximum of 8,192 partitions).
  • A table can only be partitioned if the storage engine supports partitioning.
  • The query cache is not aware of partitioning or partition pruning.
  • All partitions must use the same storage engine.
  • FULLTEXT indexes are not supported.
  • Temporary tables cannot be partitioned.

The points above should help you make up your mind about whether partitioning is an option for you or not.

Partitioning Types

If you decide to use partitions, keep in mind that you have a number of partitioning types to choose from. We will briefly cover your options below, then dive deeper into them:

  • Partitioning by RANGE can help you to partition rows based on column values falling within a given range.
  • Partitioning by LIST can help you to partition rows based on the membership of column values in a given list.
  • Partitioning by HASH can help you to partition rows based on a value returned by a user-defined expression.
  • Partitioning by KEY can help you to partition rows based on a hashing function provided by MySQL.

Partitioning by RANGE

Partitioning by RANGE is one of the most popular forms of partitioning MySQL tables. When you partition a table by RANGE, you partition the table in such a way that each partition contains a certain number of rows that fall within a given range. To define a partition, define the name of it, then tell it which values it should hold - to partition a table by range, add a PARTITION BY RANGE statement. For example, if you would want to name your partition p0 and make it hold every value that is less than 5, you would need to make sure that your query contains PARTITION p0 VALUES LESS THAN (5). Here’s an example of a partitioned table:

CREATE TABLE sample_table (
id INT(255) NOT NULL AUTO_INCREMENT PRIMARY KEY,
column_name VARCHAR(255) NOT NULL DEFAULT ''
...
) PARTITION BY RANGE (column_name) (
PARTITION p0 VALUES LESS THAN (5),
PARTITION p1 VALUES LESS THAN (10),
PARTITION p2 VALUES LESS THAN (15),
PARTITION p3 VALUES LESS THAN (20),
...
);

You can also define a partition that holds all of the values that do not fall in certain ranges like so:

PARTITION p5 VALUES LESS THAN MAXVALUE

The above partition is named p5 and it holds all values other partitions do not - MAXVALUE represents a value that is always higher than the largest possible value. You can also use functions when defining your partitions, like so:

PARTITION BY RANGE (YEAR(date)) (
    PARTITION p0 VALUES LESS THAN (2000),
    PARTITION p1 VALUES LESS THAN (2010),
    PARTITION p2 VALUES LESS THAN (2020),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

In this case, all values that are less than 2000 are stored in the partition p0, all values that are less than 2010 are stored in the partition p1, all values that are less than 2020 are stored in the partition p2 and all values that do not fall in any of these ranges are stored in the partition p3.
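To confirm that queries actually benefit from this layout, you can look at the partitions column in the EXPLAIN output. A quick sketch, assuming a table partitioned by YEAR(date) as above (the table name sales is a placeholder):

EXPLAIN SELECT * FROM sales WHERE date >= '2015-01-01' AND date < '2016-01-01';
-- The "partitions" column of the output should list only the partition(s)
-- that can contain 2015 rows (here p2), showing the others were pruned.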

Partitioning by LIST

Partitioning MySQL tables by LIST is similar to partitioning by RANGE - the main difference of partitioning tables by LIST is that when tables are partitioned by LIST each partition is defined and selected based on the membership of a column value in a set of value lists rather than a range of values. Partitioning by LIST can be useful when you know that, for example, you have data that can be divided into multiple smaller sets of data (say, regions). Suppose that you have a store that has 4 franchises: one in the central part of town, second in the north, third in the east, fourth in the west. You can partition a table in such a way that data belonging to a certain franchise would be stored in a partition dedicated to that franchise:

PARTITION BY LIST(store) (
PARTITION central VALUES IN (1,3,5),
PARTITION north VALUES IN (2,4,7),
PARTITION east VALUES IN (8,9),
PARTITION west VALUES IN (10,11)
);

Partitioning by HASH

Partitioning MySQL tables by HASH can be a way to make sure that data across partitions is distributed evenly. If you are partitioning your tables by HASH, you only need to specify how many partitions you need your data to be divided into - the rest is taken care of by MySQL. You can use partitioning by HASH by adding the following statement to CREATE TABLE:

PARTITION BY HASH(id)
PARTITIONS 5;

Replace 5 with the number of partitions you need your data to be divided into - the default number is 1.

MySQL also supports partitioning by LINEAR HASH - linear hashing differs from regular hashing because linear hashing utilizes a linear powers-of-two algorithm. To partition tables by a LINEAR HASH, replace PARTITION BY HASH with PARTITION BY LINEAR HASH.

Partitioning by KEY

Partitioning MySQL tables by KEY is similar to partitioning MySQL tables by HASH - in this case, the hashing function for key partitioning is supplied by the MySQL server. Any columns that are used as the partitioning key must comprise the entire table’s primary key or at least be a part of the table’s primary key. If no column name is specified as the partitioning key, the primary key will be used. If there is no primary key, but there is a unique key, the unique key will be used instead. For example, the following statements are both valid, even though the first statement does not even specify the partitioning key:

CREATE TABLE demo_table (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(255) NOT NULL DEFAULT ''
)
PARTITION BY KEY()
PARTITIONS 2;
CREATE TABLE demo_table (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
)
PARTITION BY KEY(id)
PARTITIONS 5;

Conclusion

To summarize, partitions can be helpful if you have a lot of data, your tables are too big to fit in memory or they contain historical data. Partitions can also be useful if you think that you will need to distribute the contents of a table across different storage mediums, also if you want to have the option to delete or restore individual partitions.

However, do keep in mind that partitions in MySQL have their own disadvantages. One of the major disadvantages of partitioning is that it will make your tables bigger - you cannot gain speed without compromising on space. If you have a very large set of data this can be a pretty big problem.


Tips for Storing Your MariaDB Backups in the Cloud


Having a good Disaster Recovery Plan is a must in every company to prevent data loss or reduce downtime in case of failure. Backups are a basic building block here, and it is essential to define which type of backup you need to create and where to store it. The best practice is to store the backup files in three different places: one locally on the database server (for faster recovery), another on a centralized backup server, and the last one in the cloud (or, if your infrastructure is already in the cloud, with a different cloud provider). In this blog, we will mention different things to take into account before storing your MariaDB backups in the cloud, and how to use ClusterControl for this task.

Cloud Providers

There are many cloud providers offering different backup storage options and features. You will need to check the features and the costs to make sure you are covering your needs and it fits your budget. Now, we will mention some important things that you should check here.

Security

This could be the most important point to check before storing your data in the cloud. The cloud provider should offer encryption for data-at-rest (and even in-transit) if you want to store the backup there. This encryption protects the data from being used by an unauthorized person during the time that it is stored in the cloud.

Compliance

The cloud provider should follow privacy laws and comply with some regulations to provide maximum data protection. The EU’s General Data Protection Regulation (GDPR) has strict regulations on storing sensitive data. Also, several EU member states don’t allow sensitive data to be stored outside their national boundaries, so it is important to take this into account.

Easy Management

The cloud provider should offer an easy management console where you can configure, manage, and monitor the backups stored in the cloud; otherwise, you can turn a simple task into a complex one, which doesn’t make sense.

Availability and Durability Policies

Some Cloud Providers have at least 99.99% uptime, but it is always good to check their SLA on the different offerings on availability and durability. The Cloud Providers might offer different solutions priced higher to achieve high availability and durability, and depending on the business, it could be necessary to use a different solution than the default one.

Costs

The cost could be the most crucial point and also quite complicated as Cloud Providers often display their cost to make it look cheap at a glance.

In general, there are three criteria for evaluating the cost of Cloud Storage:

  • Storage Cost: It is usually calculated per GB/MB depending on the type of data and activity level.
  • Access to data: Depends on how fast you will need to access the data. Storage for cold backups is usually lower but could increase based on volume and retention period.
  • SLA: Necessary if you require a guarantee on uptime and lower downtime.

After checking the basic points mentioned above, you will be able to store your MariaDB backups with the selected Cloud Provider, but you still have to decide how to upload them there. Of course, you can upload them manually, but that is tedious, so to avoid a manual task you would typically create a cron job or a custom script - which could fail, so you will also need to monitor the job. All this can be time-consuming, and this is where ClusterControl can make your life easier.
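As a rough sketch of what such a custom script could look like, the snippet below uses the AWS CLI to push the newest backup file to S3. The backup directory, bucket name, and schedule are placeholders, and this does not replace proper monitoring of the job.

#!/bin/bash
# upload_backup.sh - push the most recent backup file to S3 (illustrative only)
BACKUP_DIR=/root/backups              # placeholder: where your backups are written
BUCKET=s3://my-mariadb-backups        # placeholder: your S3 bucket
LATEST=$(ls -1t "$BACKUP_DIR" | head -n 1)

# Copy the file; exit non-zero so cron can alert on failure
aws s3 cp "$BACKUP_DIR/$LATEST" "$BUCKET/$LATEST" || exit 1

# Example crontab entry to run it every night at 01:30:
# 30 1 * * * /root/upload_backup.sh >> /var/log/upload_backup.log 2>&1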

Upload Your Backups to the Cloud with ClusterControl

ClusterControl is a management system for open source databases that automates deployment, backups,  and management functions, as well as health and performance monitoring for different database technologies and environments.

Let’s see how to store your MariaDB backups in the Cloud using AWS as an example, but actually, you can integrate ClusterControl with Google Cloud or Azure too. For this, we will assume you have ClusterControl installed and it is managing your MariaDB cluster.

Creating a Backup

For this task, go to ClusterControl -> Select MariaDB Cluster -> Backup -> Create Backup.

You can create a new backup or configure a scheduled one. For this example, we will create a single backup instantly.

You must choose the backup method (mysqldump, or mariabackup full/incremental), the server from which the backup will be taken, and where you want to store the backup. Here you can also upload your backup to the cloud by enabling the corresponding button.

Then you can specify the use of compression, compression level, encryption, retention, and more backup settings.

If you enable the upload backup to the cloud option, you will see a section to specify the cloud provider (in this case AWS, but you can add more Cloud Providers in ClusterControl -> Integrations -> Cloud Providers). For AWS, it uses the S3 service, so you must select an existing bucket, or create a new one, to store your backups there.

In the backup section, you will see the progress of the backup and information like method, size, location, and more. In “Storage Location”, you can find the Cloud Icon, which means that the backup is stored in the Cloud too.

When it finishes, you will find the backup in the selected location and in the Cloud Provider.

Conclusion

As data is an important asset in a company, storing your MariaDB backups in the cloud could be risky if you don’t take care of some basic things before uploading them, like security or availability. Also, cost is an important factor, as depending on the requirements, it could be more expensive than expected.

In this blog, we mentioned some important things to take into consideration before choosing a Cloud Provider to store your data, and how you can upload your backups easily by using ClusterControl for this task.

Boosting Performance by Using Read Write Splitting of Database Traffic with Moodle 3.9


Moodle is a very well known Learning Management System intended to help educational organizations organize their online learning activities. As you can imagine, given the online shift in 2020 caused by COVID-19, such systems became very popular and the load they have to handle has increased significantly. Many administrators are wondering how to improve the performance of the database that backs their Moodle installation. Luckily, if you are running Moodle 3.9 or later, you have some built-in options that can help you boost performance. In this blog post we will show you how to do it.

First of all, we assume that you have a Moodle installation with a single database node. Let’s take a look at the steps you may want to take to improve the performance of your Moodle database. Of course, all of the steps that we explain here can be performed by hand. We are going to use ClusterControl for that as we value our time.

Assuming you have ClusterControl installed, the first step will be to import an existing database node.

The SSH connectivity using a passwordless key has to be in place. We have it set up as a root user with an SSH key located in /root/.ssh/id_rsa.

As the next step we defined the superuser and its password. We also enabled information_schema queries (as we know we don’t have tens of thousands of tables) and both autorecovery options so ClusterControl will be able to recover our database if needed.

After a brief moment our database shows on the list of clusters:

Now, we can start to scale out our cluster by adding more slaves. We should ensure that the master has binary logs enabled; if not, it can be done from ClusterControl. Please keep in mind that enabling binary logs requires a restart, so you probably want to do it at a time when the load is the lowest and, ideally, after giving a heads-up to the users of your Moodle platform.
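A quick way to check whether binary logging is already enabled, using the standard server variables, is:

SHOW GLOBAL VARIABLES LIKE 'log_bin';
SHOW GLOBAL VARIABLES LIKE 'server_id';

log_bin should report ON on the master, and every node in the replication topology needs a unique server_id.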

We passed the IP (or hostname) of the node we want to use as a slave. ClusterControl will provision it with the data from our master node. We could also use backups to provision a slave but we haven’t taken any backups using ClusterControl so far.

Installation will take a couple of minutes, we can follow the progress by looking at the job logs in ClusterControl.

Adding a slave to our system doesn’t make any difference by itself. We have to tell Moodle to actually start using it. Luckily, Moodle has a feature that lets you configure slave nodes so that the “safe” reads will be redirected to them, reducing the load on the master and improving overall performance.

In the configuration file (config-dist.php) you can see the ‘readonly’ section of the ‘dboptions’ array. There you can define one or more slave nodes that Moodle will send read traffic to.

'readonly' => [          // Set to read-only slave details, to get safe reads
                         // from there instead of the master node. Optional.
                         // Currently supported by pgsql and mysqli variety classes.
                         // If not supported silently ignored.
  'instance' => [        // Readonly slave connection parameters
    [
      'dbhost' => '10.0.0.132',
      'dbport' => '',    // Defaults to master port
      'dbuser' => '',    // Defaults to master user
      'dbpass' => '',    // Defaults to master password
    ],
    [...],
  ],
],

As you can see, we can add more than one slave host, allowing us to spread the safe reads across multiple nodes, which you can easily provision from ClusterControl, reducing the load on the cluster.

If you are interested in more advanced, highly available database setups for Moodle, we have several blog posts on this topic describing, among others, how you can utilize Moodle with Galera Cluster as a backend. We also described the more advanced scaling techniques for Moodle, involving ProxySQL load balancing.

Let us know your thoughts and experience on working with Moodle.

How to Automatically Manage Failover of the MySQL Database for Moodle


In our previous blogs, we justified why you need database failover and explained how a failover mechanism works. I’m sharing this in case you have questions about why you should set up a failover mechanism for your MySQL database. If you do, please read our previous blog posts.

How To Setup Automatic Failover

The advantage of using MySQL or MariaDB for automatically managing your failover is that there are tools available that you can use and implement in your environment, from open source ones to enterprise-grade solutions. Most tools are not only failover capable; they also offer other features such as switchover, monitoring, and advanced capabilities that provide more management options for your MySQL database cluster. Below, we'll go over the most common ones that you can use.

Using MHA (Master High Availability)

We have covered this topic before, looking at MHA with its most common issues and how to fix them. We have also compared MHA with MRM and with MaxScale.

Setting up MHA for high availability might not be easy, but it is efficient and flexible to use, as there are tunable parameters you can define to customize your failover. MHA is well tested and widely used. But as technology advances, MHA has been lagging behind: it does not support GTID for MariaDB and it has not released any updates in the last 2 or 3 years.

By running the masterha_manager script, 

masterha_manager --conf=/etc/app1.cnf

Where a sample /etc/app1.cnf looks as follows,

[server default]
user=cmon
password=pass
ssh_user=root
# working directory on the manager
manager_workdir=/var/log/masterha/app1
# working directory on MySQL servers
remote_workdir=/var/log/masterha/app1

[server1]
hostname=node1
candidate_master=1

[server2]
hostname=node2
candidate_master=1

[server3]
hostname=node3
no_master=1

Parameters such as no_master and candidate_master are crucial, as they let you whitelist the nodes you want as target masters and mark the nodes that you do not want to become a master.
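Before starting the manager, it is common practice to verify SSH connectivity and replication health with the MHA helper scripts, for example (using the same configuration file as above):

# verify passwordless SSH between all MHA nodes
masterha_check_ssh --conf=/etc/app1.cnf
# verify the replication topology and settings
masterha_check_repl --conf=/etc/app1.cnf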

Once set, you are ready to have failover for your MySQL database in case a failure occurs on the primary or master. The masterha_manager script manages the failover (automatic or manual), decides when and where to fail over, and manages slave recovery during the promotion of the candidate master by applying differential relay logs. If the master database dies, MHA Manager will coordinate with the MHA Node agent as it applies differential relay logs to the slaves that do not have the latest binlog events from the master.

Check out what the MHA Node agent does and the scripts involved. Basically, these are the scripts that the MHA Manager invokes when failover occurs. The node agent waits for its mandate from MHA Manager, finds the latest slave that contains the binlog events, copies the missing events from that slave using scp, and applies them to itself. As mentioned, it applies relay logs, purges relay logs, or saves binary logs.

If you want to know more about tunable parameters and how to customize your failover management, check out the Parameters wiki page for MHA.

Using Orchestrator

Orchestrator is a MySQL and MariaDB high availability and replication management tool. It is released by Shlomi Noach under the terms of the Apache License, version 2.0. It is open source software that handles automatic failover, but there are tons of things you can customize or do to manage your MySQL/MariaDB databases aside from recovery or automatic failover.

Installing Orchestrator is straightforward. Once you have downloaded the packages required for your target environment, you are ready to register the cluster and nodes to be monitored by Orchestrator. It provides a UI that is very easy to manage, plus a large set of tunable parameters and commands you can use to fine-tune your failover management.

Let's assume you have finished the setup. Registering the cluster by adding our primary or master node can be done with the command below:

$ orchestrator -c discover -i pupnode21:3306

2021-01-07 12:32:31 DEBUG Hostname unresolved yet: pupnode21

2021-01-07 12:32:31 DEBUG Cache hostname resolve pupnode21 as pupnode21

2021-01-07 12:32:31 DEBUG Connected to orchestrator backend: orchestrator:?@tcp(127.0.0.1:3306)/orchestrator?timeout=1s

2021-01-07 12:32:31 DEBUG Orchestrator pool SetMaxOpenConns: 128

2021-01-07 12:32:31 DEBUG Initializing orchestrator

2021-01-07 12:32:31 INFO Connecting to backend 127.0.0.1:3306: maxConnections: 128, maxIdleConns: 32

2021-01-07 12:32:31 DEBUG Hostname unresolved yet: 192.168.40.222

2021-01-07 12:32:31 DEBUG Cache hostname resolve 192.168.40.222 as 192.168.40.222

2021-01-07 12:32:31 DEBUG Hostname unresolved yet: 192.168.40.223

2021-01-07 12:32:31 DEBUG Cache hostname resolve 192.168.40.223 as 192.168.40.223

pupnode21:3306

Now, we have our cluster added.

If the primary node fails (hardware failure or a crash), Orchestrator will detect it and promote the most advanced node as the new primary or master node.
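
Note that detection alone does not promote anything: automatic recovery has to be enabled in Orchestrator's configuration (usually /etc/orchestrator.conf.json). A minimal, hedged sketch, where the wildcard filter and the block period are placeholder values you would adjust for your own clusters:

{
  "RecoverMasterClusterFilters": ["*"],
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "RecoveryPeriodBlockSeconds": 3600
}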

Now, we have two nodes remaining in the cluster while the primary is down.

$ orchestrator-client -c topology -i pupnode21:3306

pupnode21:3306 [unknown,invalid,10.3.27-MariaDB-log,rw,ROW,>>,downtimed]

$ orchestrator-client -c topology -i pupnode22:3306

pupnode22:3306   [0s,ok,10.3.27-MariaDB-log,rw,ROW,>>]

+ pupnode23:3306 [0s,ok,10.3.27-MariaDB-log,ro,ROW,>>,GTID]

Using MaxScale

MariaDB MaxScale is supported as a database load balancer. Over the years MaxScale has grown and matured, gaining several rich features, including automatic failover. Since MariaDB MaxScale 2.2, it has offered replication cluster failover management. You can read our previous blog about the MaxScale failover mechanism.

MaxScale is released under the BSL: the software is freely available, but production use generally requires at least a service subscription with MariaDB. That might not suit everyone, but if you have acquired MariaDB enterprise services, it can be a great advantage when you require failover management and its other features.

Installing MaxScale is easy, but setting up the required configuration and defining its parameters is not, and it requires a good understanding of the software. You can refer to their configuration guide.

For a quick and easy deployment, you can use ClusterControl to install MaxScale in your existing MySQL/MariaDB environment.

Once installed, setting up your Moodle database connection is just a matter of pointing the application to the MaxScale IP or hostname and the read-write port.

In this setup, port 4008 is the read-write port of your service listener. For example, here is the service and listener configuration of my MaxScale:

$ cat maxscale.cnf.d/rw-listener.cnf

[rw-listener]

type=listener

protocol=mariadbclient

service=rw-service

address=0.0.0.0

port=4008

authenticator=MySQLAuth



$ cat maxscale.cnf.d/rw-service.cnf

[rw-service]

type=service

servers=DB_123,DB_122,DB_124

router=readwritesplit

user=maxscale_adm

password=42BBD2A4DC1BF9BE05C41A71DEEBDB70

max_slave_connections=100%

max_sescmd_history=15000000

causal_reads=true

causal_reads_timeout=10

transaction_replay=true

transaction_replay_max_size=32Mi

delayed_retry=true

master_reconnection=true

max_connections=0

connection_timeout=0

use_sql_variables_in=master

master_accept_reads=true

disable_sescmd_history=false

In your monitor configuration, do not forget to enable automatic failover, and also enable auto rejoin if you want the failed previous master to rejoin automatically when it comes back online. It goes like this:

$ egrep -r 'auto|^\['  maxscale.cnf.d/replication_monitor.cnf

[replication_monitor]

auto_failover=true

auto_rejoin=1

Take note that the values I have used here are only for this blog post and test purposes, not for production. The good thing with MaxScale is that once the primary or master goes down, it is smart enough to promote the best candidate to take over the master role. There is no need to change the application's IP and port, since the host/IP of the MaxScale node and its port remain the endpoint after the master goes down. For example:

[192.168.40.223:6603] MaxScale> list servers



┌────────┬────────────────┬──────┬─────────────┬─────────────────┬──────────────────────────┐

│ Server │ Address        │ Port │ Connections │ State           │ GTID                     │

├────────┼────────────────┼──────┼─────────────┼─────────────────┼──────────────────────────┤

│ DB_124 │ 192.168.40.223 │ 3306 │ 0           │ Slave, Running  │ 3-2003-876,5-2001-219541 │

├────────┼────────────────┼──────┼─────────────┼─────────────────┼──────────────────────────┤

│ DB_123 │ 192.168.40.221 │ 3306 │ 0           │ Master, Running │ 3-2003-876,5-2001-219541 │

├────────┼────────────────┼──────┼─────────────┼─────────────────┼──────────────────────────┤

│ DB_122 │ 192.168.40.222 │ 3306 │ 0           │ Slave, Running  │ 3-2003-876,5-2001-219541 │

└────────┴────────────────┴──────┴─────────────┴─────────────────┴──────────────────────────┘

Node DB_123 which points to 192.168.40.221 is the current master. Terminating the node DB_123 shall trigger MaxScale to perform a failover and it shall look like this,

[192.168.40.223:6603] MaxScale> list servers



┌────────┬────────────────┬──────┬─────────────┬─────────────────┬──────────────────────────┐

│ Server │ Address        │ Port │ Connections │ State           │ GTID                     │

├────────┼────────────────┼──────┼─────────────┼─────────────────┼──────────────────────────┤

│ DB_124 │ 192.168.40.223 │ 3306 │ 0           │ Slave, Running  │ 3-2003-876,5-2001-219541 │

├────────┼────────────────┼──────┼─────────────┼─────────────────┼──────────────────────────┤

│ DB_123 │ 192.168.40.221 │ 3306 │ 0           │ Down            │ 3-2003-876,5-2001-219541 │

├────────┼────────────────┼──────┼─────────────┼─────────────────┼──────────────────────────┤

│ DB_122 │ 192.168.40.222 │ 3306 │ 0           │ Master, Running │ 3-2003-876,5-2001-219541 │

└────────┴────────────────┴──────┴─────────────┴─────────────────┴──────────────────────────┘

Meanwhile, our Moodle database is still up and running, as MaxScale now points to the newly promoted master.

$ mysql -hmaxscale.local.domain -umoodleuser -pmoodlepassword -P4008

Welcome to the MariaDB monitor.  Commands end with ; or \g.

Your MariaDB connection id is 9

Server version: 10.3.27-MariaDB-log MariaDB Server



Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.



Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.



MariaDB [(none)]> select @@hostname;

+----------------+

| @@hostname     |

+----------------+

| 192.168.40.222 |

+----------------+

1 row in set (0.001 sec)

Using ClusterControl

ClusterControl can be downloaded freely and offers Community, Advanced, and Enterprise licenses. Automatic failover is only available in the Advanced and Enterprise editions. It is covered by our Auto-Recovery feature, which tries to recover a failed cluster or a failed node. If you want more details on how this works, check out our previous post How ClusterControl Performs Automatic Database Recovery and Failover. It offers tunable parameters which are very convenient and easy to use. Please also read our previous post on How to Automate Database Failover with ClusterControl.

Managing automatic failover for your Moodle database requires at least a virtual IP (VIP) as the endpoint through which the Moodle application reaches your database backend. To do this, deploy Keepalived with HAProxy (or ProxySQL, depending on your load balancer choice) on top of the cluster. In this case, the Moodle database endpoint points to the virtual IP, which is assigned by Keepalived once deployed, just like we showed earlier when setting up MaxScale. You can also check this blog on how to do it.

As mentioned above, tunable parameters are available which you can set in /etc/cmon.d/cmon_<CLUSTER_ID>.cnf on your ClusterControl host, where CLUSTER_ID is the id of your cluster. These are the parameters that help you manage your auto-failover more efficiently (a short configuration sketch follows the list):

  • replication_check_binlog_filtration_bf_failover
  • replication_check_external_bf_failover
  • replication_failed_reslave_failover_script
  • replication_failover_blacklist
  • replication_failover_events
  • replication_failover_wait_to_apply_timeout
  • replication_failover_whitelist
  • replication_onfail_failover_script
  • replication_post_failover_script
  • replication_post_unsuccessful_failover_script
  • replication_pre_failover_script
  • replication_skip_apply_missing_txs
  • replication_stop_on_error
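
As a hedged illustration (the host addresses, script paths, and values below are placeholders, not recommendations), a few of these parameters could be set in /etc/cmon.d/cmon_<CLUSTER_ID>.cnf like this, after which the cmon service is restarted:

# Only consider these slaves as promotion candidates
replication_failover_whitelist=10.0.0.12,10.0.0.13

# Never promote this delayed or backup slave
replication_failover_blacklist=10.0.0.14

# Abort the failover instead of promoting a slave with missing transactions
replication_stop_on_error=1

# Hook scripts executed before and after the failover
replication_pre_failover_script=/usr/local/bin/pre_failover.sh
replication_post_failover_script=/usr/local/bin/post_failover.sh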

ClusterControl is very flexible when managing the failover so you can do some pre-failover or post-failover tasks.

Conclusion

There are other great choices for setting up and automatically managing failover for the MySQL database behind Moodle. It depends on your budget and on what you are willing to spend money for. Open source tools require expertise and repeated testing to get familiar with, as there is no support to fall back on other than the community. Enterprise solutions come with a price, but offer support and ease of use, since the time-consuming work can be reduced. Take note that failover, if used mistakenly, can cause damage to your database if not properly handled and managed. Focus on what matters most and on how well you can operate the solutions you are using to manage your Moodle database failover.

How to Deploy the Chamilo MariaDB Database for High Availability


Learning Management System (LMS) platforms allow you to learn/teach remotely, something that is really important due to the current situation around the world.

That being said, we can also say High Availability is a must in this kind of platform, otherwise, it could not be accessible when it is needed.

In this blog, we will see how to deploy a MariaDB database for High Availability to be used for one of the most popular LMS platform options, Chamilo LMS.

What is Chamilo?

Chamilo LMS is a free Learning Management System (LMS) designed for online education and developed through the collaboration of many companies and individual developers.

As a teacher, using Chamilo you can access a series of useful tools to create an effective learning environment. Some of these tools are:

  • Import or create documents (audio, video, images) and publish them
  • Build tests and exams with automated scores and feedback as required
  • Set and receive virtual assignments
  • Describe the components of the course through description sections
  • Communicate through forums or chat
  • Publish announcements
  • Add links
  • Create work groups or laboratory groups
  • Set up a virtual classroom
  • Create surveys
  • Add a wiki to create documents collaboratively
  • Use a glossary and an agenda
  • Enable tracking of learners in your courses
  • Register attendances
  • Elaborate a class diary, and more

The Chamilo platform is extremely flexible. All its tools can be customized according to the needs of each course. It provides a friendly and intuitive user interface that requires no special prior technical knowledge or skills.

So, the question is, how can you deploy a MariaDB database for High Availability to be used for this system?

MariaDB Database for High Availability

There are different approaches to deploy a MariaDB Database for High Availability. Let’s see the two main options.

MariaDB Master-Slave Replication

You can run a master-slave setup using asynchronous or semi-synchronous replication. The advantage of this simple option is that, when the master is unavailable, you can promote one of the slaves and continue working as usual. The main issue with this setup is that failover has to be performed manually or by an external tool like ClusterControl. It means you will have a (short) downtime, which may or may not be acceptable for your business.

MariaDB Cluster

Another approach would be to use a Galera Cluster to store the data from Chamilo LMS. You can start using it with three nodes, and it can automatically handle the failure of one of these nodes. The remaining two nodes will continue working and receiving connections from the Chamilo application. It means you won't have downtime in this case, but as it is a more complex topology, you will need more knowledge about the technology and, depending on the workload, it might not be the best option.

Load Balancers

To improve High Availability, both options will require a Load Balancer in front of them, which would handle the traffic and redirect it to an available/healthy node.

ProxySQL is a dedicated load balancer for MySQL which comes with a variety of features including query redirecting, query caching, and traffic shaping. It can be used to easily set up a read-write split and redirect queries to separate backend nodes.

HAProxy is a load balancer that distributes traffic from one origin to one or more destinations and can define specific rules and/or protocols for this task. If any of the destinations stops responding, it is marked as offline, and the traffic is sent to the rest of the available destinations.

Keepalived is a service that allows you to configure a Virtual IP Address within an active/passive group of servers. This Virtual IP Address is assigned to an active server. If this server fails, the IP Address is automatically migrated to the “Secondary” passive server, allowing it to continue working with the same IP Address in a transparent way for the systems.
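
To give an idea of what this looks like, below is a minimal keepalived.conf sketch for the active load balancer node; the interface name, router id, and the 10.10.10.100 VIP are assumptions you would replace, and the passive node would use state BACKUP with a lower priority:

vrrp_instance VI_1 {
    state MASTER              # BACKUP on the passive load balancer
    interface eth0            # network interface that will carry the VIP
    virtual_router_id 51      # must match on both Keepalived nodes
    priority 101              # use a lower value (e.g. 100) on the passive node
    advert_int 1
    virtual_ipaddress {
        10.10.10.100          # the Virtual IP your application connects to
    }
}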

Using only one Load Balancer node will add a single point of failure in your topology, so, you can use the combination of ProxySQL+Keepalived or HAProxy+Keepalived using two Load Balancer nodes (at least) and Keepalived between them.

Now, let’s see how ClusterControl can help you to deploy a MariaDB Database adding Load Balancers and a Virtual IP Address to be used in your Chamilo application.

Chamilo MariaDB Database Deployment

As an example, we will deploy a MariaDB Master-Slave Replication, which will be used by the Chamilo application. For this, we will use ClusterControl to deploy 2 MariaDB Database nodes (master-slave), and 2 HAProxy Load Balancers with Keepalived configured between them.

MariaDB Database Deployment

To perform a deployment from ClusterControl, simply select the option “Deploy” and follow the instructions that appear.

When selecting MySQL Replication, you must specify User, Key or Password, and Port to connect by SSH to your servers. You can also add a name for your new cluster and if you want ClusterControl to install the corresponding software and configurations for you.

After setting up the SSH access information, you need to select the database vendor/version, and define the database credentials, port, and data directory. You can also specify which repository to use.

In the next step, you need to add your servers to the cluster that you are going to create using the IP Address or Hostname.

Once the task is finished, you can see your new MariaDB cluster in the main ClusterControl screen.

Now you have your cluster created, you can perform several tasks on it, like adding a Load Balancer or a new replica.

Load Balancer Deployment

To perform a Load Balancer deployment, select the option “Add Load Balancer” in the cluster actions, and complete the asked information.

You only need to add IP or Hostname, Port, Policy, and the nodes you are going to use for load balancing. You can deploy it using two different ports (read/write and read-only), or you can use just one read/write port to send all the traffic there.

Keepalived Deployment

To perform a Keepalived deployment, select the option “Add Load Balancer” in the cluster actions and then, go to the Keepalived Tab.

Here, select the HAProxy nodes, and specify the Virtual IP Address that will be used to access the database. 

Now, let’s connect this environment to the Chamilo application.

Chamilo Database Configuration

During the Chamilo deployment, in step 4, you will need to add the database configuration.

Here you should use the Virtual IP address to access your MariaDB Database, and the database credentials.

For more details about the Chamilo configuration, you can refer to the Official Documentation.

That’s it! You have your Chamilo Application using a MariaDB Database with High Availability.

ClusterControl Autorecovery Feature

In case of failure, ClusterControl will promote the most advanced slave node to master and notify you of the problem. It also reconfigures the rest of the slave nodes to replicate from the new master server.

By default, HAProxy is configured with two different ports: read-write and read-only. In the read-write port, you have your master node as online and the rest of the nodes as offline, and in the read-only port, you have both the master and the slave nodes online.

When HAProxy detects that one of your nodes is not accessible, it automatically marks it as offline and does not take it into account for sending traffic to it. Detection is done by health check scripts that are configured by ClusterControl at the time of deployment. These check whether the instances are up, whether they are undergoing recovery, or are read-only.

When ClusterControl promotes a slave node, HAProxy marks the old master as offline for both ports and puts the promoted node online in the read-write port.

If your active HAProxy, which is assigned a Virtual IP Address to which your systems connect, fails, Keepalived migrates this IP Address to your passive HAProxy automatically. This means that your systems are then able to continue to function normally.

MariaDB Database Deployment using the ClusterControl CLI

If you prefer to deploy the MariaDB Cluster using command-line, you can use the ClusterControl command-line client tool called "s9s". This tool will send a deployment job to the ClusterControl server and it will perform all the necessary steps to deploy the cluster.

For example, you can run the following command on the ClusterControl server to create a MariaDB master-slave replication:

$ s9s cluster --create \

--cluster-type=mysqlreplication \

--nodes='10.10.10.136;10.10.10.137' \

--vendor=mariadb \

--provider-version='10.5' \

--db-admin-passwd='root123' \

--os-user=root \

--os-key-file=/root/.ssh/id_rsa \

--cluster-name='MariaDB1' \

--log

The job log is printed to the console, so you can monitor the deployment progress there, or from ClusterControl UI -> Activity -> Jobs. When it is finished, the new cluster is listed in the ClusterControl UI.
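
If you prefer to stay on the command line, the same progress can also be followed with the s9s client, for example:

$ s9s job --list                      # list jobs and their current status

$ s9s job --log --job-id=<JOB_ID>     # print the log of a specific job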

Conclusion

In this blog, we mentioned some options to deploy a MariaDB Database for High Availability using ClusterControl and how to use it on a Chamilo LMS application.

We used a simple master-slave deployment and added load balancers to improve High Availability of this topology, but you can improve this environment even further by using a different approach, such as Galera+ProxySQL, or a different combination of these components.

How to Backup Your Moodle MariaDB Database


Previously, we blogged about backing up your Moodle MySQL database. This time, it's all about backing up your Moodle MariaDB database. The overall approach is the same for both. However, since MariaDB 10.2, MariaDB has slowly deviated from MySQL and continues to accumulate significant differences. In this regard, pay attention to the MariaDB-specific approaches covered here if you are coming from MySQL.

Best Practices for Making Your Moodle MariaDB Backup

Let's consider this topic first. Backing up your Moodle data should follow the best practices for MariaDB backups, as this gives you security and assurance when disaster strikes in unpredictable situations.

So what does this mean? Backing up your Moodle data should involve at least the following backup types:

  • Logical Backup
  • Physical copy of your backup
  • Point-in-Time (Incremental) Recovery

Logical Backup

A logical backup stores data in a human-readable format like SQL. Logical backups save information represented as the logical database structure (CREATE DATABASE, CREATE TABLE statements) and content (INSERT statements or delimited-text files). This type of backup is suitable for smaller amounts of data where you might edit the data values or table structure, or recreate the data on a different machine architecture. For a huge database, make sure you enable compression; the caveat is that a logical backup can still take a lot of disk space.

In MariaDB, the common tool is mysqldump, which is being renamed to mariadb-dump from MariaDB 10.4.6 and 10.5.2 onwards. Alternatively, a common tool among MySQL/MariaDB DBAs is mydumper, if you want a parallel logical backup. However, some versions have issues, particularly with the latest MariaDB releases, so if you use it, pay close attention to your backup logs when creating a copy.

Physical Backup

A physical backup contains the database binary data, which consists of raw copies of the directories and files that store database contents. This is an appropriate choice especially for a large database, as it is faster to restore a full copy of the database from a physical backup than from a logical one. On the other hand, taking a full physical backup takes time, especially with a very large data set, and the parameters you have enabled or set can also impact the backup's ETA.

A common tool to use for MariaDB is using mariabackup. Mariabackup is an open source tool provided by MariaDB. It is a fork of Percona XtraBackup designed to work with encrypted and compressed tables, and is the recommended backup method for MariaDB databases.

Point-in-Time Recovery (PITR)

Point-in-time recovery refers to the recovery of data changes up to a given point in time. This given point in time is the desired recovery objective that has to be put back in place during recovery. PITR is a point-forward recovery, meaning you can restore data from a desired starting time to a desired ending time; for the opposite direction, read Using MariaDB Flashback on a MySQL Server. PITR is also considered an additional method of data protection, as it safeguards against the loss of important information.

In common situations, PITR is performed after restoring a full backup that brings the server to its state as of the time the backup was made. Point-in-time recovery then brings the server up to date incrementally, from the time of the full backup to a more recent time. It also speeds up catching up a newly built replica with your MariaDB primary or active-writer database.

So what backups are these? In MariaDB, the common backups applicable for PITR are your binary logs. Binary logging has to be configured properly and enabled in your MariaDB database. If you are using ClusterControl, this is easy, as it is configured and enabled for you automatically, especially when setting up a replication cluster.

When applying PITR, the most common tool is mysqlbinlog, or mariadb-binlog from MariaDB 10.5.2 onwards.
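
As a rough sketch of how this fits together (the file names, position, and timestamp below are placeholders): binary logging is first enabled in the MariaDB configuration, and after restoring the full backup the binary logs are replayed up to just before the incident.

# /etc/my.cnf.d/server.cnf -- enable binary logging
[mariadb]
server_id=101
log_bin=/var/lib/mysql/mariadb-bin
binlog_format=ROW
expire_logs_days=7

# Replay events from the position recorded in the dump up to a point in time
$ mysqlbinlog --start-position=385 --stop-datetime="2021-01-25 18:00:00" \
      /var/lib/mysql/mariadb-bin.000123 | mysql -u root -p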

Best Approach To Backup Your Moodle MariaDB Database

When backing up your Moodle MariaDB database, always take the backup during non-peak hours or when traffic is low. Before relying on it, make sure that the backup procedure has been tested and finishes successfully. Once finished, test a restore to confirm the backup is usable and satisfies your needs, whether for data recovery or for creating another set of clusters (QA, dev, or extending to another data center). Basically, the following subsections describe the approach and setup you should follow.

Setup A Replica And Take The Backup On Replica

If you are not familiar with replication, read our white paper MySQL Replication for High Availability. Basically, do not use your active-writer or primary database node for running backups. A backup procedure must not be run in production until it has been tested, along with the set of commands and the backup policy you have created. Once it works, keep it aimed at a replica. You may take a backup from your primary/master MariaDB database only if you have no other choice, or at least if you are sure of what you are doing.

Run Backup During Non-peak Hours

When performing a backup, make sure it runs during non-peak hours. Your replica should have as close to zero lag as possible so the most up-to-date data is backed up.

Create A Backup Policy

The backup policy consists of the parameters and the schedule with which your backup runs. Make sure the parameters satisfy your needs, e.g. security, compression, etc. The schedule has to be permanent so your backup is available when disaster strikes and data recovery is needed. You should also determine your recovery time objective so that you can decide how often and when your backups should run.

How To Create Backup of Your Moodle MariaDB Database

Using mariadb-dump/mysqldump

The command below creates a database backup for Moodle. In this example, the database name is moodle. We include triggers, stored procedures (routines), and events, and print the GTID and master information, which is useful when you want to provision a replica from the primary or master database node.

$  /usr/bin/mariadb-dump --defaults-file=/etc/my.cnf --flush-privileges --hex-blob --opt --master-data=2 --single-transaction --skip-lock-tables --triggers --routines --events --gtid --databases moodle

You can simply replace mariadb-dump with mysqldump if you are using a MariaDB version older than 10.5.2. If you need compression, you can run the command below:

$  /usr/bin/mysqldump --defaults-file=/etc/my.cnf --flush-privileges --hex-blob --opt --master-data=2 --single-transaction --skip-lock-tables --triggers --routines --events --gtid --databases moodle |gzip -6 -c > /backups/backup-n1/mysqldump_2021-01-25_182643_schemaanddata.sql.gz
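
To check that the compressed dump is actually usable, you can restore it (ideally into a test server) by reversing the compression; this assumes the same file path as above:

$ gunzip -c /backups/backup-n1/mysqldump_2021-01-25_182643_schemaanddata.sql.gz | mysql -u root -p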

Using mariabackup

A mariabackup can be taken with a single command. Here we use xbstream as the streaming and archival format. You can use the command below:

$ /usr/bin/mariabackup --defaults-file=/etc/my.cnf --backup --parallel 1 --stream=xbstream > backup.xbstream

If you aim to compress it, you can do the following command:

$ /usr/bin/mariabackup --defaults-file=/etc/my.cnf --backup --parallel 1 --stream=xbstream | gzip -6 - > backup.xbstream.gz
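
To restore from this stream, the archive first has to be extracted with mbstream and then prepared before it can be copied back into an empty data directory. A minimal sketch, where /restore/full is an assumed target directory:

$ mkdir -p /restore/full

$ gunzip -c backup.xbstream.gz | mbstream -x -C /restore/full    # extract the streamed backup

$ mariabackup --prepare --target-dir=/restore/full               # apply the redo log so the copy is consistent

After the prepare step, mariabackup --copy-back --target-dir=/restore/full can move the files into the data directory of a stopped server.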

Using ClusterControl

If you switch to a managed solution for backing up your Moodle database, ClusterControl keeps things simple yet offers advanced backup features. Take a look at the screenshot below:

Taking a backup is very simple. All you need to do is go to <Choose Your Cluster> → Backup → Create Backup.

In the screenshot above, I have mysqldump, PITR-compatible mysqldump, and mariabackup. You can select which host to take the backup from. As mentioned earlier, take the backup on the replica or slave, so make sure the slave is selected. See the screenshot below:

When choosing mysqldump, ClusterControl allows the user to choose the type of data to be dumped, including a PITR-compatible dump. See the screenshot below:

For your binary or physical backup, aside from mysqldump as your logical backup, you can choose mariabackup either full or incremental. See below:

For storage location, the user can choose either to keep the backup on the node or stream it to the ClusterControl host. If you have an external server that is not registered with ClusterControl, you can mount the target volume over NFS and dump the backup there. That might not be ideal, but in some cases it is sufficient, especially if the network bandwidth is fast enough to stream the data to the other node.

Essentially, you can choose the backups to be uploaded to the cloud. You can see the screenshot earlier and just tick the checkbox to enable it just like below:

As mentioned earlier, ClusterControl keeps backups simple to use while providing advanced features. Here are the options you can set.

For mysqldump,

For mariabackup,

Both are very easy to use. ClusterControl also offers backup verification and backup restore, so it's easy to determine whether a backup is usable. This ensures that your Moodle database backup is useful, especially when data recovery has to be applied in an emergency.

Conclusion

Backing up your Moodle MariaDB database can be easy, but as data grows and traffic increases, it can become a great challenge. You just need to follow best practices, make sure you have secured your data, and ensure your backup is verified and usable when data recovery has to be applied.

Running MariaDB in a Hybrid Cloud Setup


The term “hybrid” is popular nowadays. It is used for vehicles, applications, financials, and also the cloud. In the vehicle use case, for example, hybrid means combining the power of a gasoline engine with an electric motor.

In the hybrid cloud environment, we combine and connect the resources of a private cloud or on-prem environment with the public cloud. One popular use case is to mirror an on-prem environment in the cloud for disaster recovery purposes. There are some points you need to consider when building a Hybrid Cloud database. Latency will determine which MariaDB architecture you can use. A reliable connection with low and predictable latency means you can spread one Galera Cluster across both environments, with the DR setup in the cloud being synchronously up-to-date with your on-prem environment. However, this also means that the performance of the entire cluster will be limited by the performance of the slowest node in the cluster.

Another alternative is to have two separate systems connected using regular asynchronous replication. For instance, it is possible to have two MariaDB Galera Clusters replicate asynchronously with each other. For those who prefer standard asynchronous replication, we propose two master-slave setups, with the second setup replicating from the first one.

In this blog, we will provide a quick hands on guide on how to run a highly available MariaDB replicated setup in a Hybrid Cloud environment.

Why Hybrid Cloud?

Hybrid cloud enables enterprise organizations to mix private on-prem and public cloud environments. This model provides the following benefits for the organization:

  • Scalability of infrastructure

You can quickly scale the infrastructure by combining private and public cloud as the business grows. The public cloud offers a cost-effective way to extend your infrastructure, whereas a private setup requires upfront planning and CAPEX.

  • Disaster Recovery 

With regards to the deployment model, a hybrid cloud lends itself to a Disaster Recovery Plan. The public cloud can be used as a disaster recovery site if something happens to the private datacenter (e.g., force majeure or a data center issue).

  • Better technical control and security

By having a hybrid cloud environment, organizations are able to segregate environments, share the load of services based on restricted access, and enable multi-tenancy with clearly separated layers.

  • Architectural Flexibility

Running hybrid cloud environments gives you flexibility in how you design services, based on the workload and the requirements from the application side. For example, a private cloud environment can have Internet access restricted, connecting only to the public cloud environment via VPN, while the public cloud environment handles communication with third-party services.

Connectivity

Running a hybrid cloud for databases needs a secure communication link between the private cloud and public cloud. Most of the cloud providers have some sort of connectivity option available, for instance AWS has AWS Direct Connect.

Achieving Hybrid Cloud using ClusterControl

There are a few deployment models for MariaDB in hybrid cloud environments. We can use MariaDB Master/Slave replication or MariaDB Galera Cluster. The difference between Master/Slave and Galera Cluster is the synchronization method. Master/Slave replication uses asynchronous replication of data that is written to a binlog, while MariaDB Galera Cluster uses “virtually” synchronous replication by broadcasting writesets to all nodes. It is also possible to have separate Galera Clusters replicate asynchronously via standard replication.

 

Deployment of MariaDB Master/Slave Replication on a hybrid cloud in ClusterControl is straightforward. You just go through the Deploy menu as shown below:

After clicking Deploy, choose MySQL Replication and fill the SSH user, password, and Cluster Name as shown below:

 

Then click Continue. Choose MariaDB as the database vendor and the version to be installed. There are custom options for the data directory and server port, or you can use the default values.

Fill in the root password for the database, and then click Continue. Add the IP Addresses of the hosts on private and public clouds as shown below:

Note that you will need to take care of the connectivity between the private and public environments and make sure it is secure. Then click Deploy, and it will deploy MariaDB Master/Slave Replication in your hybrid cloud environment. Both environments will have a replicated setup, and the DR setup in the public cloud will replicate asynchronously from the primary setup in your private datacenter.

Monitoring MariaDB Performance in a Hybrid Cloud


Performance is one of the areas we want to closely monitor for a MariaDB database running in production. Monitoring can demand a lot of time, work, and money if the architecture runs on a hybrid cloud. On top of that, there are certain areas to watch, especially the network links through which the on-premise or private cloud environment communicates with the public cloud (GCP, AWS, Azure, etc.) and vice versa.

This blog is all about monitoring the performance of your MariaDB databases on a hybrid cloud infrastructure. We'll cover the basics and the most important key indicators to watch when monitoring MariaDB database performance within a hybrid cloud setup.

Why Do You Need Performance Monitoring?

On a hybrid cloud, it can be complicated to monitor each service you consume. In your own data center or private cloud, you have full control over the hardware and software infrastructure; with the public cloud, there are limitations on the services you subscribe to, and additional costs may be incurred if you want extra services that provide metrics and logs. Security is also a concern with regards to the confidential data being collected.

Performance monitoring helps determine how efficiently and how fast your databases are running, whether on-prem, in a private cloud, or in a public cloud. In practice, it is a set of tested, results-based processes and tools that provide you with real-time or periodic metrics.

Within a hybrid cloud, not all monitoring tools are built to handle the key metrics that have to be observed. You need the knowledge to determine which metrics are required and which of them a given tool can provide. A hybrid cloud is, by nature, complex: the services are highly distributed and mixed with other services that are not bound to a single provider.

In that regard, your monitoring software needs these specialities, including the ability to identify which cloud each monitored component belongs to. A monitoring tool must be able to address bottlenecks, security issues, and latencies, provide scalability, notify you of ongoing issues, and provide predictions. Predictions can prevent further consequences that could lead to disaster or impact the efficiency of your MariaDB databases. This helps the whole team, including infrastructure engineers, database engineers, server administrators, and developers, ensure that the database servers are healthy and running at the expected level.

Things To Consider for Database Monitoring

When monitoring your MariaDB database cluster (replication or Galera) or node, there are two main things to take into account: the operating system and the database itself. You will need to define which metrics you are going to monitor from both sides and how you are going to do it. You need to monitor the metric always in the context of your system, and you should look for alterations in the behaviour pattern.

In most cases, you will need to use several tools (as it is nearly impossible to find one to cover all the desired metrics.) 

Keep in mind that when one of your metrics is affected, it can also affect others, making troubleshooting of the issue more complex. Having a good monitoring and alerting system is important to make this task as simple as possible.

Always Monitor Your Server Activity (Network, Disk, Load, Memory, & CPU)

Monitoring server activity can also be a complex task if you have a complicated stack intertwined with your database architecture. For a MariaDB database, it is best to set up each node as a dedicated server to get full introspection on a per-node basis. That does not stop you from using spare resources for other purposes; below are the common key areas you have to look into.

Network

On a hybrid cloud infrastructure, the network is one of the most important concerns, as you have to take into account the design and how traffic flows from the on-premise or private cloud to the public cloud and vice versa. Either way, one of the clusters or nodes has a specialized role, either as a primary receiving writes or as disaster recovery. In that regard, you do not want your recovery nodes to suffer latency or, worse, large replication lag. If lag builds up and the primary cluster in a particular cloud (say, your on-prem) goes down, your public cloud has to take over yet still serve the most up-to-date data. To cover this, you might add a pre-failover mechanism that takes care of incremental backups or PITR in case some transactions or writes were not yet applied on your recovery cluster (in this case, your public cloud cluster).

If you're using Galera, note that since MariaDB upgraded to Galera 4, streaming replication has been added as one of the key features and changes from the previous version. Streaming replication addresses drawbacks of previous releases by allowing write-sets larger than 2GB to be handled since Galera Cluster 4. It lets big transactions be fragmented, and it is highly recommended to enable it at the session level only (a small sketch follows below). All of this means that monitoring your network activity is crucial to the normal operation of your MariaDB Cluster, and it helps you identify which node had the highest network traffic over a given period of time.
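
Here is a small sketch of that session-level setting; the fragment size is an arbitrary example value, not a recommendation:

-- Enable streaming replication only for this session, replicating a fragment every 10,000 rows
SET SESSION wsrep_trx_fragment_unit = 'rows';
SET SESSION wsrep_trx_fragment_size = 10000;

-- ... run the large transaction here ...

-- Turn streaming replication off again for this session
SET SESSION wsrep_trx_fragment_size = 0;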

A good example of rendering network monitoring is using ClusterControl. It identifies each of the nodes and provides an overview of its network activity per node regardless of which cloud the node is located. See screenshot below:

CPU, Memory, and Load Activity

Let me briefly cover these three areas to watch when monitoring. It is best to have observability of all of them at once: they are quick and easy to understand and help in ruling out a performance bottleneck or identifying bugs that cause your nodes to stall, affect other nodes, or even bring the cluster down.

So how does monitoring CPU, memory, and load activity help your MariaDB? As mentioned above, these are only a few metrics, yet a big factor in daily routine checks. Monitoring them also helps you identify whether spikes are periodic or random. If periodic, they might be related to backups running on one of your MariaDB database nodes, or to a massive query that requires optimization, for example, bad queries with no proper indexes, or unbalanced data retrieval such as comparing very large strings. That kind of workload is often a poor fit for OLTP databases, even if it is genuinely required by your application. In that case, it is better to use analytical tools such as MariaDB ColumnStore, or other third-party analytical processing tools (Apache Spark, Kafka, MongoDB, etc.) for large string retrieval and/or string matching.

 

With all these key areas being monitored, the next question is how often to collect the metrics. They should be collected at least once per minute. More refined monitoring, i.e. per-second collection, can be resource intensive and greedy in terms of your resources; half-minute collection is acceptable, although if your RPO (recovery point objective) is very low you will need more granular, real-time metrics. It is very important to be able to oversee the whole picture of your database cluster. It is also important that, whatever metrics you are monitoring, you have the right tool to grab your attention when things are in danger, or even just on warnings. A proper tool such as ClusterControl helps you manage these key areas. I'm using the free community edition of ClusterControl, which lets me monitor my nodes without any hassle, from installation to node monitoring, with just a few clicks. For example, see the screenshots below:

It also provides a per-node view with a simple graph overview,

 

or, with a more powerful and rich data model that also supports a query language via Prometheus, it can provide analysis of how your MariaDB database performs based on historical data, comparing its performance over time. For example:

That gives you much more visible metrics, which shows how important it is to have the right tool when monitoring your MariaDB database in a hybrid cloud.

Collective Monitoring of Your MariaDB Statistic Variables

From time to time, new MariaDB versions inevitably introduce new statistics to monitor, or enhance database monitoring by providing more status variables and refined values to look at.

Bytes Sent/Received

The bytes sent and received correlate with network activity and are among the key metrics to watch side by side, especially in a hybrid cloud topology. They let you determine which node is most impacted or most responsible for the performance issues your MariaDB database is suffering. This is very important, as it can reveal degradation of hardware such as your network device or the underlying storage device, where syncing dirty pages can take too long.

See the example screenshot,
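
On the server itself, the raw counters can also be checked quickly from the client. They are cumulative since startup, so monitoring tools usually graph the delta per sampling interval:

MariaDB> SHOW GLOBAL STATUS LIKE 'Bytes_%';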

Cluster Load

This is more about database activity: how many changes and data retrievals have been executed since the server started. It helps you work out what kind of queries mostly affect your database cluster's performance, and gives you room for improvement, especially in balancing the load of your database requests.

There are tons of variables to look at in a MariaDB database server. The most important thing to take into account is the tool you are using to monitor your database cluster. ClusterControl (Community Edition) gives me the flexibility to look at many of them in a MariaDB database. See the example below:

You can also select other variables to watch from the drop-down menu:

This is very useful, for example, in a replication topology in a hybrid cloud. You get a quick overview of the state and performance of the network, along with other variables to consider, and can check which bottlenecks would affect your MariaDB performance in a hybrid cloud topology. You can determine whether your application is write-heavy, in which case replication and network transfer are impacted, and see the cluster's inter-activity across two or more cloud infrastructures. It is best to determine how well your nodes handle stress, especially during stress testing before pushing specific changes to your application; always test to establish the capacity of your application and whether your current database nodes and design can handle the load your application requires.

For more granular and rich data metrics, you can get more data using agent-based monitoring. See below,

This is how you should approach monitoring your MariaDB Cluster. Good visualization is always easier and quicker to manage. When things go south, you cannot afford to lose productivity, and downtime can impact your business. While the free version does not provide every comfort for managing high-traffic databases, having alarms, notifications, and database management in one place is something ClusterControl makes effortless.

Conclusion

Monitoring your MariaDB database servers in a hybrid cloud environment is not easy; it gets complicated when numerous services and complex relationships make up your technology stack. Using the right monitoring tools helps you manage your application effectively and improve productivity at the same time. With the right monitoring tools in hand, you will also have more time to focus on improving your applications and other business processes.


Tips and Tricks for Implementing Database Role-Based Access Controls for MariaDB


In a database management system (DBMS), role-based access control (RBAC) restricts access to database resources based on pre-defined groups of privileges, and it has become one of the main methods for advanced access control. Database roles can be created and dropped, and privileges can be granted to and revoked from them. Roles can be granted to and revoked from individual user accounts. The applicable active roles for an account can be selected from those granted to the account and changed during that account's sessions.

In this blog post, we will cover some tips and tricks on using the database role to manage user privileges and as an advanced access control mechanism for our database access. If you would like to learn about the basics of roles in MySQL and MariaDB, check out this blog post, Database User Management: Managing Roles for MariaDB.

MySQL vs MariaDB Roles

MySQL and MariaDB use two different role mechanisms. In MySQL 8.0 and later, the role is similar to another user, with username and host ('role1'@'localhost'). Yes, that is the role name, which is practically similar to the standard user-host definition. MySQL stores the role definition just like storing user privileges in the mysql.user system table.

MariaDB introduced roles and role privileges in version 10.0.5 (Nov 2013), several years before MySQL included this feature in MySQL 8.0. It follows the role management of SQL-compliant database systems, which is more robust and much easier to understand. MariaDB stores the definition in the mysql.user system table, flagged with a newly added column called is_role. MySQL stores the role differently, using a user-host combination similar to standard MySQL user management.

Having said that, role definitions are incompatible between these two DBMSs and cannot be migrated directly.

MariaDB Administrative and Backup Roles

MySQL has dynamic privileges, which provide a set of privileges for common administration tasks. For MariaDB, we can set similar things using roles, especially for backup and restore privileges. For MariaDB Backup, since it is a physical backup and requires a different set of privileges, we can create a specific role for it to be assigned to another database user.

Firstly, create a role and assign it with the right privileges:

MariaDB> CREATE ROLE mariadb_backup;
MariaDB> GRANT RELOAD, LOCK TABLES, PROCESS, REPLICATION CLIENT ON *.* TO mariadb_backup;

We can then create the backup user, grant it with mariadb_backup role and assign the default role:

MariaDB> CREATE USER mariabackup_user1@localhost IDENTIFIED BY 'passw0rdMMM';
MariaDB> GRANT mariadb_backup TO mariabackup_user1@localhost;
MariaDB> SET DEFAULT ROLE mariadb_backup FOR mariabackup_user1@localhost;

For mysqldump or mariadb-dump, the minimal privileges to create a backup can be set as below:

MariaDB> CREATE ROLE mysqldump_backup;
MariaDB> GRANT SELECT, SHOW VIEW, TRIGGER, LOCK TABLES ON *.* TO mysqldump_backup;

We can then create the backup user, grant it with the mysqldump_backup role and assign the default role:

MariaDB> CREATE USER dump_user1@localhost IDENTIFIED BY 'p4ss182MMM';
MariaDB> GRANT mysqldump_backup TO dump_user1@localhost;
MariaDB> SET DEFAULT ROLE mysqldump_backup FOR dump_user1@localhost;

Restoration commonly requires a different, somewhat broader set of privileges:

MariaDB> CREATE ROLE mysqldump_restore;
MariaDB> GRANT SUPER, ALTER, INSERT, CREATE, DROP, LOCK TABLES, REFERENCES, SELECT, CREATE ROUTINE, TRIGGER ON *.* TO mysqldump_restore;

We can then create the restore user, grant it with the mysqldump_restore role, and assign the default role:

MariaDB> CREATE USER restore_user1@localhost IDENTIFIED BY 'p4ss182MMM';
MariaDB> GRANT mysqldump_restore TO restore_user1@localhost;
MariaDB> SET DEFAULT ROLE mysqldump_restore FOR restore_user1@localhost;

By using this trick, we can simplify the administrative user creation process by assigning a role with pre-defined privileges. Thus, our GRANT statements are shorter and easier to understand.

Creating Role Over Role In MariaDB 

We can create another role over an existing role similar to a nested group membership with more fine-grained control over privileges. For example, we could create the following 4 roles:

MariaDB> CREATE ROLE app_developer, app_reader, app_writer, app_structure;

Grant the privileges to manage the schema structure to the app_structure role:

MariaDB> GRANT CREATE, ALTER, DROP, CREATE VIEW, CREATE ROUTINE, INDEX, TRIGGER, REFERENCES ON app.* to app_structure;

Grant the privileges for Data Manipulation Language (DML) to the app_writer role:

MariaDB> GRANT INSERT, DELETE, UPDATE, CREATE TEMPORARY TABLES ON app.* TO app_writer;

Grant the privileges for Data Query Language (DQL) to the app_reader role:

MariaDB> GRANT SELECT, LOCK TABLES, SHOW VIEW ON app.* TO app_reader;

And finally, we can assign all of the above roles to app_developer which should have full control over the schema:

MariaDB> GRANT app_structure TO app_developer;
MariaDB> GRANT app_reader TO app_developer;
MariaDB> GRANT app_writer TO app_developer;

The roles are ready and now we can create a database user with app_developer role:

MariaDB> CREATE USER 'michael'@'192.168.0.%' IDENTIFIED BY 'passw0rdMMMM';
MariaDB> GRANT app_developer TO 'michael'@'192.168.0.%';
MariaDB> GRANT app_reader TO 'michael'@'192.168.0.%';

Since Michael now belongs to the app_developer and app_reader roles, we can also assign the role with the lowest privileges as the default role, to protect him against unwanted human mistakes:

MariaDB> SET DEFAULT ROLE app_reader FOR 'michael'@'192.168.0.%';

The good thing about using a role is you can hide the actual privileges from the database user. Consider the following database user just logged in:

MariaDB> SELECT user();
+----------------------+
| user()               |
+----------------------+
| michael@192.168.0.10 |
+----------------------+

When trying to retrieve the privileges using SHOW GRANTS, Michael would see:

MariaDB> SHOW GRANTS FOR 'michael'@'192.168.0.%';
+----------------------------------------------------------------------------------------------------------------+
| Grants for michael@localhost                                                                                   |
+----------------------------------------------------------------------------------------------------------------+
| GRANT `app_developer` TO `michael`@`localhost`                                                                 |
| GRANT USAGE ON *.* TO `michael`@`localhost` IDENTIFIED BY PASSWORD '*2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19' |
+----------------------------------------------------------------------------------------------------------------+

And when Michael tries to look up the app_developer role's privileges, he would see this error:

MariaDB> SHOW GRANTS FOR app_developer;
ERROR 1044 (42000): Access denied for user 'michael'@'localhost' to database 'mysql'

This trick allows DBAs to expose only the logical grouping the user belongs to and nothing more. It reduces the attack surface, since users have no idea of the actual privileges assigned to them.

Enforcing Default Role In MariaDB

By enforcing a default role, a database user gets a first layer of protection against accidental human mistakes. For example, consider the user Michael, who has been granted the app_developer role, where app_developer is a superset of the app_structure, app_writer and app_reader roles, as illustrated below:

Since Michael belongs to the app_developer role, we can also set the role with the lowest privileges as the default role to protect him against accidental data modification:

MariaDB> GRANT app_reader TO 'michael'@'192.168.0.%';
MariaDB> SET DEFAULT ROLE app_reader FOR 'michael'@'192.168.0.%';

As for user "michael", he would see the following once logged in:

MariaDB> SELECT user(),current_role();
+-------------------+----------------+
| user()            | current_role() |
+-------------------+----------------+
| michael@localhost | app_reader     |
+-------------------+----------------+

His default role is app_reader, which grants read-only privileges on the database called "app". The current user can switch between any applicable roles using the SET ROLE feature. Michael can switch to another role with the following statement:

MariaDB> SET ROLE app_developer;

At this point, Michael should be able to write to the database 'app' since app_developer is a superset of app_writer and app_structure. To check the available roles for the current user, we can query the information_schema.applicable_roles table:

MariaDB> SELECT * FROM information_schema.applicable_roles;
+-------------------+---------------+--------------+------------+
| GRANTEE           | ROLE_NAME     | IS_GRANTABLE | IS_DEFAULT |
+-------------------+---------------+--------------+------------+
| michael@localhost | app_developer | NO           | NO         |
| app_developer     | app_writer    | NO           | NULL       |
| app_developer     | app_reader    | NO           | NULL       |
| app_developer     | app_structure | NO           | NULL       |
| michael@localhost | app_reader    | NO           | YES        |
+-------------------+---------------+--------------+------------+

This way, we are effectively setting a primary role for the user, and the primary role can carry the lowest privileges possible for that user. The user has to consciously choose its active role, switching to a more privileged one before executing any risky activity on the database server.

Role Mapping in MariaDB

MariaDB provides a role mapping table called mysql.roles_mapping. The mapping allows us to easily understand the correlation between a user and its roles, and how a role is mapped to another role:

MariaDB> SELECT * FROM mysql.roles_mapping;
+-------------+-------------------+------------------+--------------+
| Host        | User              | Role             | Admin_option |
+-------------+-------------------+------------------+--------------+
| localhost   | root              | app_developer    | Y            |
| localhost   | root              | app_writer       | Y            |
| localhost   | root              | app_reader       | Y            |
| localhost   | root              | app_structure    | Y            |
|             | app_developer     | app_structure    | N            |
|             | app_developer     | app_reader       | N            |
|             | app_developer     | app_writer       | N            |
| 192.168.0.% | michael           | app_developer    | N            |
| localhost   | michael           | app_developer    | N            |
| localhost   | root              | mysqldump_backup | Y            |
| localhost   | dump_user1        | mysqldump_backup | N            |
| localhost   | root              | mariadb_backup   | Y            |
| localhost   | mariabackup_user1 | mariadb_backup   | N            |
+-------------+-------------------+------------------+--------------+

From the above output, we can tell that a User without a Host is basically a role granted to another role, and that the users who created the roles are automatically assigned to them with the admin option (Admin_option = Y). To get the list of created roles, we can query the mysql.user table:

MariaDB> SELECT user FROM mysql.user WHERE is_role = 'Y';
+------------------+
| User             |
+------------------+
| app_developer    |
| app_writer       |
| app_reader       |
| app_structure    |
| mysqldump_backup |
| mariadb_backup   |
+------------------+

Final Thoughts

Using roles can improve database security by providing an additional layer of protection against accidental data modification by the database users. Furthermore, it simplifies the privilege management and maintenance operations for organizations that have many database users.

Tips and Tricks using Audit Logging for MariaDB


MariaDB’s Audit Plugin provides auditing functionality not only for MariaDB but also for MySQL (as of versions 5.5.34 and 10.0.7) and Percona Server. MariaDB started including the Audit Plugin by default from versions 10.0.10 and 5.5.37, and it can be installed in any version from MariaDB 5.5.20 onwards.

 

The purpose of the MariaDB Audit Plugin is to log the server's activity. For each client session, it records who connected to the server (i.e., user name and host), what queries were executed, which tables were accessed and which server variables were changed. This information is stored in a rotating log file, or it may be sent to the local syslogd.

 

In this blog post, we are going to show you some best-practice tunings and tips on how to configure audit logging for a MariaDB server. The writing is based on MariaDB 10.5.9, with the latest version of MariaDB Audit Plugin 1.4.4.
 

Installation Tuning

 

The recommended way to enable audit logging is by setting the following lines inside the MariaDB configuration file:

[mariadb]
plugin_load_add = server_audit # load plugin
server_audit=FORCE_PLUS_PERMANENT  # do not allow users to uninstall plugin
server_audit_file_path=/var/log/mysql/mariadb-audit.log # path to the audit log
server_audit_logging=ON  # enable audit logging

Do not forget to set "server_audit=FORCE_PLUS_PERMANENT" to enforce the audit plugin and disallow it from being uninstalled by other users via the UNINSTALL SONAME statement. By default, the logging destination is a log file inside the MariaDB data directory. We should put the audit log outside of this directory, because there is a chance that the datadir will be wiped out (SST for Galera Cluster) or replaced during a physical restore, such as datadir swapping when restoring a backup taken with MariaDB Backup.
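After restarting MariaDB, it is worth confirming that the plugin is loaded and pointing at the expected log file. For example (a quick sanity check, the output will vary per setup):

MariaDB> SELECT plugin_name, plugin_status FROM information_schema.plugins WHERE plugin_name = 'SERVER_AUDIT';
MariaDB> SHOW GLOBAL VARIABLES LIKE 'server_audit%';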

Further tuning is necessary, as shown in the following sections.

Audit Events Filtering

The MariaDB Audit Plugin supports several audit event types, depending on the plugin version. The following audit events are available in the latest plugin version, 1.4.4:

 

  • CONNECT - Connects, disconnects and failed connects, including the error code

  • QUERY - Queries executed and their results in plain text, including failed queries due to syntax or permission errors

  • TABLE - Tables affected by query execution

  • QUERY_DDL - Similar to QUERY, but filters only DDL-type queries (CREATE, ALTER, DROP, RENAME and TRUNCATE statements - except CREATE/DROP [PROCEDURE / FUNCTION / USER] and RENAME USER, which are not DDL)

  • QUERY_DML - Similar to QUERY, but filters only DML-type queries (DO, CALL, LOAD DATA/XML, DELETE, INSERT, SELECT, UPDATE, HANDLER and REPLACE statements)

  • QUERY_DML_NO_SELECT - Similar to QUERY_DML, but doesn't log SELECT queries (since version 1.4.4) (DO, CALL, LOAD DATA/XML, DELETE, INSERT, UPDATE, HANDLER and REPLACE statements)

  • QUERY_DCL - Similar to QUERY, but filters only DCL-type queries (CREATE USER, DROP USER, RENAME USER, GRANT, REVOKE and SET PASSWORD statements)

By default, it will track everything, since the server_audit_events variable is empty by default. Note that older plugin versions support fewer of the above event types, so make sure you are running the latest version if you want to do specific filtering.

 

If the query cache is enabled, and a query is returned from the query cache, no TABLE records will appear in the log since the server didn't open or access any tables and instead relied on the cached results. So you may want to disable query caching.

 

To filter out specific events, set the following line inside the MariaDB configuration file (requires restart):

server_audit_events = 'CONNECT,QUERY,TABLE'

Or set it dynamically in the runtime using SET GLOBAL (requires no restart, but not persistent):

MariaDB> SET GLOBAL server_audit_events = 'CONNECT,QUERY,TABLE';

Here is an example of one audit event:

20210325 02:02:08,ip-172-31-0-44,cmon,172.31.1.119,7,226,QUERY,information_schema,'SHOW GLOBAL VARIABLES',0

An entry of this log consists of comma-separated fields containing the following information:

  • Timestamp

  • The MySQL host (identical with the value of SELECT @@hostname)

  • The database user

  • Host where the user was connecting

  • Connection ID

  • Query ID

  • Operation

  • Database

  • SQL statement/command

  • Return code. 0 means the operation returned a success response (even an empty one), while a non-zero value means there was an error executing the operation, such as a failed query due to syntax or permission errors.

 

When filtering the entries, one would do a simple grep and look for a specific pattern:

$ grep -i global /var/lib/mysql/server_audit.log
20210325 04:19:17,ip-172-31-0-44,root,localhost,14,37080,QUERY,,'set global server_audit_file_rotate_now = 1',0
20210326 00:46:48,ip-172-31-0-44,root,localhost,35,329003,QUERY,,'set global server_audit_output_type = \'syslog\'',0
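Since every entry is comma-separated, standard shell tools can also summarize the log. For example, here is a small sketch that counts audit events per database user (the third field), assuming the log file path configured earlier:

$ awk -F',' '{print $3}' /var/log/mysql/mariadb-audit.log | sort | uniq -c | sort -rn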

By default, all password values will be masked with asterisks:

20210326 05:39:41,ip-172-31-0-44,root,localhost,52,398793,QUERY,mysql,'GRANT ALL PRIVILEGES ON sbtest.* TO sbtest@127.0.0.1 IDENTIFIED BY *****',0

Audit User Filtering

If you track everything, you will probably be flooded with entries generated by the monitoring user as part of its sampling duties, as shown in the example below:

20210325 02:02:08,ip-172-31-0-44,cmon,172.31.1.119,7,226,QUERY,information_schema,'SHOW GLOBAL VARIABLES',0
20210325 02:02:08,ip-172-31-0-44,cmon,172.31.1.119,7,227,QUERY,information_schema,'select @@global.wsrep_provider_options',0
20210325 02:02:08,ip-172-31-0-44,cmon,172.31.1.119,7,228,QUERY,information_schema,'SHOW SLAVE STATUS',0
20210325 02:02:08,ip-172-31-0-44,cmon,172.31.1.119,7,229,QUERY,information_schema,'SHOW MASTER STATUS',0
20210325 02:02:08,ip-172-31-0-44,cmon,172.31.1.119,7,230,QUERY,information_schema,'SHOW SLAVE HOSTS',0
20210325 02:02:08,ip-172-31-0-44,cmon,172.31.1.119,7,231,QUERY,information_schema,'SHOW GLOBAL VARIABLES',0
20210325 02:02:08,ip-172-31-0-44,cmon,172.31.1.119,7,232,QUERY,information_schema,'select @@global.wsrep_provider_options',0
20210325 02:02:08,ip-172-31-0-44,cmon,172.31.1.119,7,233,QUERY,information_schema,'SHOW SLAVE STATUS',0
20210325 02:02:08,ip-172-31-0-44,cmon,172.31.1.119,7,234,QUERY,information_schema,'SHOW MASTER STATUS',0
20210325 02:02:08,ip-172-31-0-44,cmon,172.31.1.119,7,235,QUERY,information_schema,'SHOW SLAVE HOSTS',0
20210325 02:02:08,ip-172-31-0-44,cmon,172.31.1.119,5,236,QUERY,information_schema,'SET GLOBAL SLOW_QUERY_LOG=0',0
20210325 02:02:08,ip-172-31-0-44,cmon,172.31.1.119,5,237,QUERY,information_schema,'FLUSH /*!50500 SLOW */ LOGS',0
20210325 02:02:08,ip-172-31-0-44,cmon,172.31.1.119,6,238,QUERY,information_schema,'SHOW GLOBAL STATUS',0

In the span of one second, we can see 14 QUERY events recorded by the audit plugin for our monitoring user called "cmon". In our test workload, the logging rate is around 32 KB per minute, which accumulates to roughly 46 MB per day. Depending on the storage size and IO capacity, this could be excessive for some workloads. So it would be better to filter out the monitoring user from the audit logging, giving us a cleaner output that is much easier to audit and analyze.

Depending on the security and auditing policies, we can filter out unwanted users such as the monitoring user by setting the following variable inside the MariaDB configuration file (requires restart):

server_audit_excl_users='cmon'

Or set it dynamically in the runtime using SET GLOBAL (requires no restart, but not persistent):

MariaDB> SET GLOBAL server_audit_excl_users = 'cmon'

You can add multiple database users, separated by commas. After adding the above, we get a cleaner audit log, as shown below (nothing from the 'cmon' user anymore):

$ tail -f /var/log/mysql/mysql-audit.log
20210325 04:16:06,ip-172-31-0-44,cmon,172.31.1.119,6,36218,QUERY,information_schema,'SHOW GLOBAL STATUS',0
20210325 04:16:06,ip-172-31-0-44,root,localhost,13,36219,QUERY,,'set global server_audit_excl_users = \'cmon\'',0
20210325 04:16:09,ip-172-31-0-44,root,localhost,13,36237,QUERY,,'show global variables like \'%server_audit%\'',0
20210325 04:16:12,ip-172-31-0-44,root,localhost,13,0,DISCONNECT,,,0
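Conversely, if the auditing policy only cares about a handful of accounts, the server_audit_incl_users variable restricts logging to the listed users instead (the user names below are just examples). If a user appears in both the include and exclude lists, the include list takes priority:

MariaDB> SET GLOBAL server_audit_incl_users = 'app_user,michael';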

Log Rotation Management

Since the audit log is going to capture a huge number of events, it is recommended to configure proper log rotation for it. Otherwise, we would end up with an enormous log file that is very difficult to analyze. While the server is running and server_audit_output_type=file, we can force logfile rotation by using the following statement:

MariaDB> SET GLOBAL server_audit_file_rotate_now = 1;

For automatic log rotation, we should set the following variables inside the MariaDB configuration file:

server_audit_file_rotate_size=1000000 # in bytes
server_audit_file_rotations=30

Or set them dynamically at runtime using SET GLOBAL (requires no restart):

MariaDB> SET GLOBAL server_audit_file_rotate_size=1000000;
MariaDB> SET GLOBAL server_audit_file_rotations=30;

To disable audit log rotation, simply set server_audit_file_rotations to 0 (the default value is 9). With the settings above, rotation happens automatically once the log file exceeds the configured size, and the last 30 rotated files are kept.
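With file output, the rotated logs simply accumulate next to the active file with numeric suffixes, so a quick listing of the configured path (file names assumed from the configuration above) shows what is currently retained:

$ ls -lh /var/log/mysql/mariadb-audit.log*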

Auditing using Syslog or Rsyslog Facility

Using the syslog or rsyslog facility makes log management easier, because it allows logs from different systems to be collected in a central repository. Instead of maintaining another logging component, we can instruct the MariaDB Audit Plugin to log to syslog. This is handy if you have a log collector/streamer for log analyzer services like Splunk, LogStash, Loggly or Amazon CloudWatch.

To do this, set the following lines inside MariaDB configuration file (requires restart):

server_audit_output_type = 'syslog'
server_audit_syslog_ident = 'mariadb-audit'

Or if you want to change in the runtime (requires no restart, but not persistent):

MariaDB> SET GLOBAL server_audit_output_type = 'syslog';
MariaDB> SET GLOBAL server_audit_syslog_ident = 'mariadb-audit';

The entries will then appear in the syslog format:

$ grep mariadb-audit /var/log/syslog
Mar 26 00:48:49 ip-172-31-0-44 mariadb-audit:  ip-172-31-0-44,root,localhost,36,329540,QUERY,,'SET GLOBAL server_audit_syslog_ident = \'mariadb-audit\'',0
Mar 26 00:48:54 ip-172-31-0-44 mariadb-audit:  ip-172-31-0-44,root,localhost,36,0,DISCONNECT,,,0

If you want to set up a remote logging service for a centralized logging repository, we can use rsyslog. The trick is to use the server_audit_syslog_facility variable, which gives us something to filter on when routing the log entries, similar to below:

MariaDB> SET GLOBAL server_audit_output_type = 'syslog';
MariaDB> SET GLOBAL server_audit_syslog_ident = 'mariadb-audit';
MariaDB> SET GLOBAL server_audit_syslog_facility = 'LOG_LOCAL6';

However, there are some prerequisite steps beforehand. Consider the following MariaDB master-slave replication architecture with a centralized rsyslog server:

 

In this example, all servers are running on Ubuntu 20.04. On the rsyslog destination server, we need to set the following inside /etc/rsyslog.conf:

module(load="imtcp")
input(type="imtcp" port="514")
if $fromhost-ip=='172.31.0.44' then /var/log/mariadb-centralized-audit.log
& ~
if $fromhost-ip=='172.31.0.82' then /var/log/mariadb-centralized-audit.log
& ~

Note that the "& ~" part is important, so don't miss it out. It basically tells rsyslog to log into /var/log/mariadb-centralized-audit.log and stop further processing of that message right after.

Next, create the destination log file with the correct file ownership and permission:

$ touch /var/log/mariadb-centralized-audit.log
$ chown syslog:adm /var/log/mariadb-centralized-audit.log
$ chmod 640 /var/log/mariadb-centralized-audit.log

Restart rsyslog:

$ systemctl restart rsyslog

Make sure it listens on all accessible IP addresses on TCP port 514:

$ netstat -tulpn | grep rsyslog
tcp        0      0 0.0.0.0:514             0.0.0.0:*               LISTEN      3143247/rsyslogd
tcp6       0      0 :::514                  :::*                    LISTEN      3143247/rsyslogd

We have completed configuring the destination rsyslog server. Now we are ready to configure the source part. On the MariaDB server, create a new separate rsyslog configuration file at /etc/rsyslog.d/50-mariadb-audit.conf and add the following lines:

$WorkDirectory /var/lib/rsyslog # where to place spool files
$ActionQueueFileName queue1     # unique name prefix for spool files
$ActionQueueMaxDiskSpace 1g     # 1GB space limit (use as much as possible)
$ActionQueueSaveOnShutdown on   # save messages to disk on shutdown
$ActionQueueType LinkedList     # run asynchronously
$ActionResumeRetryCount -1      # infinite retries if rsyslog host is down
local6.* action(type="omfwd" target="172.31.6.200" port="514" protocol="tcp")

The settings in the first section create an on-disk queue, which is recommended so that no log entries are lost if the remote server is unreachable. The last line is important: since we changed the server_audit_syslog_facility variable to LOG_LOCAL6 for the audit plugin, we specify "local6.*" as a filter to forward only syslog entries using facility local6 to rsyslog running on the rsyslog server 172.31.6.200, on port 514 via the TCP protocol.

The last step is to restart rsyslog on the MariaDB server to activate the changes:

$ systemctl restart rsyslog
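Before generating any database traffic, the forwarding path can be verified from the MariaDB server with the standard logger utility, since any message logged to facility local6 should now be shipped to the central server (the message text is arbitrary):

$ logger -p local6.info "mariadb-audit rsyslog forwarding test"

The test message should then show up in /var/log/mariadb-centralized-audit.log on the rsyslog server.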

 

Now rsyslog is correctly configured on the source node. We can test it out by accessing the MariaDB server and performing some activities to generate audit events. You should see the audit log entries forwarded to the destination server:

$ tail -f /var/log/mariadb-centralized-audit.log
Mar 26 12:56:18 ip-172-31-0-44 mariadb-audit:  ip-172-31-0-44,root,localhost,69,0,CONNECT,,,0
Mar 26 12:56:18 ip-172-31-0-44 mariadb-audit:  ip-172-31-0-44,root,localhost,69,489413,QUERY,,'select @@version_comment limit 1',0
Mar 26 12:56:19 ip-172-31-0-44 mariadb-audit:  ip-172-31-0-44,root,localhost,69,489414,QUERY,,'show databases',0
Mar 26 12:56:37 ip-172-31-0-44 mariadb-audit:  ip-172-31-0-44,root,localhost,69,0,DISCONNECT,,,0

Final Thoughts

MariaDB Audit Plugin can be configured in many ways to suit your security and auditing policies. Auditing information can help you troubleshoot performance or application issues, and lets you see exactly what SQL queries are being processed.

 

Announcing CCX Database as a Service from Severalnines


 

We at Severalnines are thrilled to announce the release of CCX, our brand new database as a service (DBaaS) offering!  It’s a fully managed service built atop the powerful ClusterControl automated operational database management platform.  CCX enables you to simply click to deploy and access managed, secured MySQL, MariaDB and PostgreSQL database clusters on multiple Availability Zones on AWS. At last, database high availability and performance meets extreme ease-of-use. 

CCX Is Not Your Average DBaaS

CCX is not your average DBaaS. It includes a combination of advanced technologies that other DBaaS vendors do not offer. Following are some of CCX’s capabilities.

Database Automation and Management

  • CCX leverages the ClusterControl automation and management platform to provide unrivaled ease of management of open source databases and database clusters.  

High Availability 

  • CCX uses the powerful multi-master technology of Galera Cluster to support preconfigured, highly available deployments for MySQL and MariaDB. 

  • For PostgreSQL, CCX supports streaming replication, which enables the continuous transfer of data between nodes, keeping them current in real time.

  • CCX leverages ClusterControl’s powerful self-healing functionality to detect node anomalies and failures and, when they occur, automatically switches to standby nodes and repairs broken ones.

Traffic Management

  • ProxySQL provides database-aware advanced traffic management by default with MySQL and MariaDB Clusters. It routes queries on-demand, separating write-traffic from read-traffic, optimizes connection handling and enables throttling.

Security

  • VPC peering enables CCX to securely route traffic between your applications and the database servers without any exposure to the internet.

  • Advanced user management ensures that databases and the data contained within can only be accessed by authorized users.

  • Data is protected by a firewall and encrypted between the client and the server.

Monitoring

  • CCX provides advanced database monitoring capabilities including query monitoring, system monitoring, and specialized stats on the database and load balancers.

Disaster Recovery 

  • Full database backups are taken daily and incremental backups are taken hourly so data is always available should something go wrong.

Upgrades and Patches

  • Security and minor upgrade patches are applied automatically, ensuring databases are up-to-date and secure.

Database Experts

  • CCX is supported and managed by database experts with years of open source database experience.

Break Free 

With CCX you can break free from mundane database management and maintenance tasks and leave them to the powerful combination of ClusterControl automation and Severalnines database experts. 

Learn more at the CCX site or request a demo to see the difference CCX can make.

Live Webinar: Tips to Drive MariaDB Galera Cluster Performance for Nextcloud


Join us for this webinar on Tips to Drive MariaDB Galera Cluster Performance for Nextcloud. The webinar features Björn Schiessle, Co-Founder and Pre-sales lead at Nextcloud, and Ashraf Sharif, senior support engineer at Severalnines. They will give you a deep dive into designing and optimising MariaDB Galera Cluster for Nextcloud, and share tips on significantly improving performance and stability.

Nextcloud: Regain Control Over Your Data

Nextcloud is an on-premises collaboration platform. It uniquely combines the convenience and ease of use of consumer-grade SaaS platforms with the security, privacy and control large organizations need.

Users gain access to their documents and can share them with others within and outside their organization with an easy to use web interface or clients for all popular platforms. Nextcloud also features extensive collaboration capabilities including Calendar, Contact, Mail, Online Office, private audio/video conferencing and a variety of planning and coordination tools as part of an extensive ecosystem of hundreds of apps.

Nextcloud deeply integrates with existing infrastructure like user directories and storage, and provides strong access control capabilities to ensure business policies are enforced. First class security, backed by a USD 10,000 security bug bounty program, provides the confidence that data meant to stay private will stay private.

Nextcloud is a fully open source platform, with hundreds of thousands of servers deployed on the web by both individual techies and large corporations. At scale, database performance is key for a good user experience and large deployments in government, for telecom providers, research universities or big enterprises work closely with Nextcloud and its partners like Severalnines to get the most out of their hardware.

About the webinar

Nextcloud uses its database to store a wide range of data, from file metadata to calendar entries and chat logs. A poorly performing database can have a serious impact on the performance and availability of Nextcloud. MariaDB Galera Cluster is the recommended database backend for production installations that require high availability and performance.

This talk is a deep dive into how to design and optimize MariaDB Galera Cluster for Nextcloud. We will cover 5 tips on how to significantly improve performance and stability.

Agenda:

  • Overview of Nextcloud architecture

  • Database architecture design

  • Database proxy

  • MariaDB and InnoDB performance tuning

  • Nextcloud performance tuning

  • Q&A

Learn more and sign up now!

How to configure AppArmor for MySQL-based systems (MySQL/MariaDB Replication + Galera)


Last week, we discussed how to configure AppArmor for MongoDB Replica Sets; the same concepts apply when configuring it for your MySQL-based systems. Security is very important, because you have to make sure that your data is well protected against unwanted information gathering from your business domain.

A little bit of history about AppArmor

AppArmor was first used in Immunix Linux 1998–2003. At the time, AppArmor was known as SubDomain, a reference to the ability for a security profile for a specific program to be segmented into different domains, which the program can switch between dynamically. AppArmor was first made available in SLES and openSUSE, and was first enabled by default in SLES 10 and in openSUSE 10.1.

In May 2005 Novell acquired Immunix and rebranded SubDomain as AppArmor and began code cleaning and rewriting for the inclusion in the Linux kernel. From 2005 to September 2007, AppArmor was maintained by Novell. Novell was taken over by SUSE who are now the legal owners of the trademarked name AppArmor.

AppArmor was first successfully ported/packaged for Ubuntu in April 2007. AppArmor became a default package starting in Ubuntu 7.10, and came as a part of the release of Ubuntu 8.04, protecting only CUPS by default. As of Ubuntu 9.04 more items such as MySQL have installed profiles. AppArmor hardening continued to improve in Ubuntu 9.10 as it ships with profiles for its guest session, libvirt virtual machines, the Evince document viewer, and an optional Firefox profile.

Why do we need AppArmor?

In our previous blog, we touched on what AppArmor is used for. It is a Mandatory Access Control (MAC) system, implemented on top of the Linux Security Modules (LSM). It is used and mostly enabled by default in systems such as Ubuntu, Debian (since Buster), SUSE, and other distributions. It is comparable to RHEL/CentOS SELinux, which requires good userspace integration to work properly. SELinux attaches labels to all files, processes, and objects and is therefore very flexible. However, configuring SELinux is considered very complicated and requires a supported filesystem. AppArmor, on the other hand, works using file paths, and its configuration can be easily adapted.

AppArmor, like most other LSMs, supplements rather than replaces the default Discretionary Access Control (DAC). As such it is impossible to grant a process more privileges than it had in the first place.

AppArmor proactively protects the operating system and applications from external or internal threats and even zero-day attacks by enforcing a specific rule set on a per-application basis. Security policies completely define what system resources individual applications can access, and with what privileges. Access is denied by default if no profile says otherwise. A few default policies are included with AppArmor and using a combination of advanced static analysis and learning-based tools, AppArmor policies for even very complex applications can be deployed successfully in a matter of hours.

Every breach of policy triggers a message in the system log, and AppArmor can be configured to notify users with real-time violation warnings.

AppArmor for MySQL

I have set up a MySQL replication-based cluster using ClusterControl on target database nodes running Ubuntu Bionic. You can follow this blog on how to deploy it, or follow this video tutorial. Take note that ClusterControl can disable AppArmor during deployment; you might have to uncheck this option according to your setup, just like below:

ClusterControl will just issue a warning that it is not touching your current AppArmor configuration. See below:

Managing your AppArmor profiles

A standard AppArmor installation in Ubuntu does not include utilities that help manage profiles efficiently, so let's install these packages:

$ apt install apparmor-profiles apparmor-utils

Once installed, check the status of AppArmor on the system by running the aa-status command. On the node I am using, I have the following output, without MySQL 8 installed yet.

$ aa-status
apparmor module is loaded.
15 profiles are loaded.
15 profiles are in enforce mode.
   /sbin/dhclient
   /usr/bin/lxc-start
   /usr/bin/man
   /usr/lib/NetworkManager/nm-dhcp-client.action
   /usr/lib/NetworkManager/nm-dhcp-helper
   /usr/lib/connman/scripts/dhclient-script
   /usr/lib/snapd/snap-confine
   /usr/lib/snapd/snap-confine//mount-namespace-capture-helper
   /usr/sbin/tcpdump
   lxc-container-default
   lxc-container-default-cgns
   lxc-container-default-with-mounting
   lxc-container-default-with-nesting
   man_filter
   man_groff
0 profiles are in complain mode.
0 processes have profiles defined.
0 processes are in enforce mode.
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.

Since I am using ClusterControl to deploy my MySQL replication-based cluster with AppArmor left enabled (i.e. ClusterControl won't touch my current AppArmor config), the deployment leaves the MySQL profile in place, and it shows up in the list when running aa-status:

$ aa-status
apparmor module is loaded.
56 profiles are loaded.
19 profiles are in enforce mode.
   ...
   /usr/sbin/mysqld
   ...
37 profiles are in complain mode.
   ...
1 processes have profiles defined.
1 processes are in enforce mode.
   /usr/sbin/mysqld (31501)
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.

It is worth noting that a profile is in one of the following modes:

  • Enforce - Default setting. Applications are prevented from taking actions restricted by the profile rules.

  • Complain - Applications are allowed to take restricted actions, and the actions are logged.

  • Disabled - Applications are allowed to take restricted actions, and the actions are not logged.

You can also mix enforce and complain profiles in your server.

Based on the output above, let's elaborate a bit more on complain mode. A profile in complain mode allows the application to perform almost all tasks without restriction (explicit deny rules in the profile are still enforced), but it logs those actions in the audit log as events. This is useful when you are attempting to create a profile for an application but are not sure what it needs access to. The unconfined status, on the other hand, allows the program to perform any task and does not log it. This usually occurs if a profile was loaded after the application was started, meaning it runs without restrictions from AppArmor. It is also important to note that only processes that have profiles are listed under the unconfined status; any other processes running on your system without a profile will not be listed by aa-status.

If you have disabled AppArmor but then realize you wanted to enhance your security or comply with security regulations, you can use this MySQL 8.0 profile that is provided by MySQL itself. To apply that, just run the following command:

$ cat /etc/apparmor.d/usr.sbin.mysqld | sudo apparmor_parser -a

It is worth noting that AppArmor profiles are stored by default in /etc/apparmor.d/. It is a good practice to add your profiles in that directory.

Diagnosing your AppArmor profiles

AppArmor logs can be found in the systemd journal, in /var/log/syslog and /var/log/kern.log (and /var/log/audit.log when auditd is installed). What you need to look for is the following:

  • ALLOWED (logged when a profile in complain mode violates the policy)

  • DENIED (logged when a profile in enforce mode actually blocks an operation)

The full log message should provide more information on what exact access has been denied. You can use this to edit profiles before turning them on in enforce mode.

For example,

$ grep -i -rn -E 'apparmor=.*denied|apparmor=.*allowed' /var/log/
/var/log/kern.log:503:Jun 18 18:54:09 ubuntu-bionic kernel: [  664.680141] audit: type=1400 audit(1624042449.006:19): apparmor="DENIED" operation="capable" profile="/usr/sbin/mysqld" pid=30349 comm="mysqld" capability=2  capname="dac_read_search"
Binary file /var/log/journal/877861ee473c4c03ac1512ed369dead1/system.journal matches
/var/log/syslog:1012:Jun 18 18:54:09 ubuntu-bionic kernel: [  664.680141] audit: type=1400 audit(1624042449.006:19): apparmor="DENIED" operation="capable" profile="/usr/sbin/mysqld" pid=30349 comm="mysqld" capability=2  capname="dac_read_search"
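The apparmor-utils package installed earlier also provides aa-logprof, which scans these log entries and interactively suggests profile updates, so you do not have to translate every DENIED line into a rule by hand. A minimal sketch of the workflow:

$ aa-logprof
# review each suggested rule, allow or deny it, then save the updated profile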

Customizing your AppArmor profile

Profiles prepared by Oracle MySQL are not a one-size-fits-all pattern. For example, you might decide to change the data directory where your MySQL instance data is located. After you apply the change to your configuration file and restart your MySQL instance, AppArmor will deny this action. For example,

$ egrep -i -rn 'apparmor=.*denied|apparmor=.*allowed' /var/log/
/var/log/kern.log:503:Jun 18 18:54:09 ubuntu-bionic kernel: [  664.680141] audit: type=1400 audit(1624042449.006:19): apparmor="DENIED" operation="capable" profile="/usr/sbin/mysqld" pid=30349 comm="mysqld" capability=2  capname="dac_read_search"
/var/log/kern.log:522:Jun 18 19:46:26 ubuntu-bionic kernel: [ 3801.151770] audit: type=1400 audit(1624045586.822:67): apparmor="DENIED" operation="mknod" profile="/usr/sbin/mysqld" name="/mysql-data/mysql.sock.lock" pid=5262 comm="mysqld" requested_mask="c" denied_mask="c" fsuid=1002 ouid=1002
Binary file /var/log/journal/877861ee473c4c03ac1512ed369dead1/system.journal matches
/var/log/syslog:1012:Jun 18 18:54:09 ubuntu-bionic kernel: [  664.680141] audit: type=1400 audit(1624042449.006:19): apparmor="DENIED" operation="capable" profile="/usr/sbin/mysqld" pid=30349 comm="mysqld" capability=2  capname="dac_read_search"
/var/log/syslog:1313:Jun 18 19:46:26 ubuntu-bionic kernel: [ 3801.151770] audit: type=1400 audit(1624045586.822:67): apparmor="DENIED" operation="mknod" profile="/usr/sbin/mysqld" name="/mysql-data/mysql.sock.lock" pid=5262 comm="mysqld" requested_mask="c" denied_mask="c" fsuid=1002 ouid=1002

In addition to the error I had earlier, there is now a new denial: I decided to use the /mysql-data directory, and access to it is denied.

To apply the changes, edit the file /etc/apparmor.d/usr.sbin.mysqld. You will find these lines:

# Allow data dir access
  /var/lib/mysql/ r,
  /var/lib/mysql/** rwk,

Flags such as r and rwk are the so-called access modes. They mean the following:

  r - read
  w - write -- conflicts with append
  k - lock

The man page explains those flags in more detail.

Now, I have changed it to the following:

# Allow data dir access
  /mysql-data/ r,
  /mysql-data/** rwk,

Then I reload the profiles as follows:

$ apparmor_parser -r -T /etc/apparmor.d/usr.sbin.mysqld

Restart the MySQL server:

$ systemctl restart mysql.service
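After the restart, it is worth re-checking the kernel log to confirm that no new DENIED entries appear for the new data directory, for example:

$ grep -i 'apparmor=.*denied' /var/log/kern.log | grep mysql-data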

What if I set my mysqld profile to complain mode?

As mentioned earlier, complain mode will still enforce any explicit deny rules in a profile. For example, this works:

$ aa-complain /usr/sbin/mysqld
Setting /usr/sbin/mysqld to complain mode.

Then,

$ aa-status
apparmor module is loaded.
56 profiles are loaded.
18 profiles are in enforce mode.
   ...
38 profiles are in complain mode.
   ...
1 processes have profiles defined.
0 processes are in enforce mode.
1 processes are in complain mode.
   /usr/sbin/mysqld (23477)
0 processes are unconfined but have a profile defined.

After you restart MySQL, it will run and produce log entries such as:

/var/log/syslog:1356:Jun 18 19:58:51 ubuntu-bionic kernel: [ 4545.427074] audit: type=1400 audit(1624046331.098:83): apparmor="ALLOWED" operation="open" profile="/usr/sbin/mysqld" name="/mysql-data/mysql.sock.lock" pid=5760 comm="mysqld" requested_mask="wrc" denied_mask="wrc" fsuid=1002 ouid=1002
/var/log/syslog:1357:Jun 18 19:58:51 ubuntu-bionic kernel: [ 4545.432077] audit: type=1400 audit(1624046331.102:84): apparmor="ALLOWED" operation="mknod" profile="/usr/sbin/mysqld" name="/mysql-data/mysql.sock" pid=5760 comm="mysqld" requested_mask="c" denied_mask="c" fsuid=1002 ouid=1002
/var/log/syslog:1358:Jun 18 19:58:51 ubuntu-bionic kernel: [ 4545.432101] audit: type=1400 audit(1624046331.102:85): apparmor="ALLOWED" operation="mknod" profile="/usr/sbin/mysqld" name="/mysql-data/mysql.pid" pid=5760 comm="mysqld" requested_mask="c" denied_mask="c" fsuid=1002 ouid=1002

And it will work. However, it will probably still have issues with networking, since complain mode still enforces the explicit deny rules defined in /etc/apparmor.d/usr.sbin.mysqld. For example, my replica is not able to connect to the primary:

                Last_IO_Error: error connecting to master 'rpl_user@192.168.40.246:3306' - retry-time: 10 retries: 1 message: Host '192.168.40.247' is not allowed to connect to this MySQL server
               Last_SQL_Errno: 0

In that case, using enforce mode and reloading your profile is an efficient and easy approach to managing your MySQL profiles with AppArmor.
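As a closing sketch, switching the profile back to enforce mode and confirming it can be done with the same tooling used throughout this post:

$ aa-enforce /usr/sbin/mysqld
$ aa-status | grep mysqld
$ systemctl restart mysql.service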
