In general, databases store data in row format and use SQL as the query language to access it. This storage method is not always the best in terms of performance, though; it depends on the workload. If you want to gather statistical data, you should most probably use another kind of database storage engine.
In this blog, we will see what Columnar Storage is and, more specifically, what MariaDB ColumnStore is and how to install it, so you can process your big data more efficiently for analytical purposes.
Columnar Storage
Columnar Storage is a type of database engine that stores data using a column-oriented model.
For example, in a common relational database, we could have a table like this:
id   | firstname | lastname | age
1001 | Oliver    | Smith    | 23
1002 | Harry     | Jones    | 65
1003 | George    | Williams | 30
1004 | Jack      | Taylor   | 41
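With a row-oriented layout like this, calculating something like the average age means reading every row in full, even though only one column is needed. Here is where a Columnar Storage engine comes into play. Instead of storing data in rows, it stores data in columns, so a query such as the average age only has to read the age column. The structure looks like this: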
id   | firstname     id   | lastname     id   | age
1001 | Oliver        1001 | Smith        1001 | 23
1002 | Harry         1002 | Jones        1002 | 65
1003 | George        1003 | Williams     1003 | 30
1004 | Jack          1004 | Taylor       1004 | 41
On the other hand, the cost of doing single inserts is higher than in a row-oriented database, and it is not the best option for “SELECT *” queries or transactional operations, so we can say that it fits better in an OLAP (Online Analytical Processing) database than in an OLTP (Online Transaction Processing) one.
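To make the trade-off concrete, here is a minimal Python sketch that uses the sample table above. Plain lists stand in for on-disk storage, and the function names are made up for this illustration; it only shows why an aggregate touches less data in a column layout while a single insert touches more structures.

```python
# Illustrative only: in-memory lists stand in for on-disk storage.
COLUMN_NAMES = ("id", "firstname", "lastname", "age")

rows = [
    (1001, "Oliver", "Smith", 23),
    (1002, "Harry", "Jones", 65),
    (1003, "George", "Williams", 30),
    (1004, "Jack", "Taylor", 41),
]

# Column-oriented copy of the same data: one list per column.
columns = {
    name: [row[index] for row in rows]
    for index, name in enumerate(COLUMN_NAMES)
}


def average_age_row_store(rows):
    """Every full row is visited just to pick out one field."""
    return sum(row[3] for row in rows) / len(rows)


def average_age_column_store(columns):
    """Only the 'age' column is read; the other columns are never touched."""
    ages = columns["age"]
    return sum(ages) / len(ages)


def insert_row(columns, row):
    """A single insert has to append to every column, one by one."""
    for name, value in zip(COLUMN_NAMES, row):
        columns[name].append(value)


print(average_age_row_store(rows))        # 39.75
print(average_age_column_store(columns))  # 39.75
```

Both functions return the same answer; the difference is in how much data each one has to walk through to get it.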
MariaDB ColumnStore
MariaDB ColumnStore is a columnar storage engine that uses a massively parallel distributed data architecture. It is a separate download, but it will be available as a storage engine for MariaDB Server starting with MariaDB 10.5.4, which was still in development at the time this blog was written.
It is designed for big data, using the benefits of columnar storage to deliver great performance with real-time response to analytical queries.
MariaDB ColumnStore Architecture
It is composed of one or more MariaDB Servers operating as modules that work together. These modules include User, Performance, and Storage.
User Module
It is a MariaDB Server instance configured to operate as a front-end to ColumnStore.
The User Module manages and controls the operation of end-user queries. When a client runs a query, it is parsed and distributed to one or more Performance Modules to process the query. The User module then collects the query results and assembles them into the result-set to return to the client.
The primary purpose of the User Module is to handle concurrency scaling. It never directly touches database files and does not need visibility into them.
Performance Module
It is responsible for storing, retrieving, and managing data, processing block requests for query operations, and passing the results back to the User Module or Modules to finalize the query requests. It doesn't see the query itself, only a set of instructions given to it by a User Module.
The module selects data from disk and caches it in a shared-nothing buffer that is part of the server on which it runs.
When there are multiple Performance Module nodes, a heartbeat mechanism ensures that all nodes are online and provides transparent failover if a particular node fails.
Storage
You can store data on local storage (on the Performance Modules) or on shared storage (a SAN).
When you create a table on MariaDB ColumnStore, the system creates at least one file per column in the table. So, for instance, a table created with three columns would have a minimum of three, separately addressable logical objects created on a SAN or on the local disk of a Performance Module.
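The one-file-per-column idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `.col` file names and the one-value-per-line text format are invented here and have nothing to do with ColumnStore's real on-disk format.

```python
import tempfile
from pathlib import Path


def write_table(directory, table):
    """Write each column to its own file and return the paths by column name."""
    paths = {}
    for name, values in table.items():
        path = Path(directory) / f"{name}.col"  # hypothetical naming scheme
        path.write_text("\n".join(str(value) for value in values))
        paths[name] = path
    return paths


def read_int_column(path):
    """Read back a single integer column without opening any other file."""
    return [int(line) for line in path.read_text().splitlines()]


# A table with three columns.
table = {
    "id": [1001, 1002, 1003, 1004],
    "lastname": ["Smith", "Jones", "Williams", "Taylor"],
    "age": [23, 65, 30, 41],
}

with tempfile.TemporaryDirectory() as directory:
    paths = write_table(directory, table)
    file_count = len(list(Path(directory).iterdir()))
    ages = read_int_column(paths["age"])

print(file_count)             # 3
print(sum(ages) / len(ages))  # 39.75
```

Three columns produce three separately addressable files, and the average age is computed by opening just one of them.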
ColumnStore optimizes its compression strategy for read performance from disk. It is tuned to accelerate the decompression rate, maximizing the performance benefits when reading from disk.
MariaDB ColumnStore uses the Version Buffer to store disk blocks that are being modified, manage transaction rollbacks, and service the MVCC (multi-version concurrency control) or "snapshot read" function of the database. This allows it to offer a query-consistent view of the database.
How MariaDB ColumnStore Works
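As a rough illustration of the "snapshot read" idea, the following Python sketch keeps the previous image of a block whenever it is modified, so a reader that started earlier still sees the older data. It is a generic, simplified MVCC toy; the class name, method names, and version numbering are assumptions made for this example and do not describe how ColumnStore implements its Version Buffer. Rollback handling is left out.

```python
class VersionedBlocks:
    """Toy block store that keeps old block images for snapshot reads."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # current image of each block
        self.version = 0            # version of the latest change
        self.version_buffer = {}    # block id -> [(replaced_at_version, old image)]

    def snapshot(self):
        """A reader remembers the version that was current when it started."""
        return self.version

    def write(self, block_id, new_image):
        """Apply a change, saving the previous image in the version buffer."""
        self.version += 1
        history = self.version_buffer.setdefault(block_id, [])
        history.append((self.version, self.blocks[block_id]))
        self.blocks[block_id] = new_image

    def read(self, block_id, snapshot):
        """Return the block as it looked at the given snapshot version."""
        # History is ordered by version, so the first image replaced after
        # the snapshot is the one that was current when the reader started.
        for replaced_at, old_image in self.version_buffer.get(block_id, []):
            if replaced_at > snapshot:
                return old_image
        return self.blocks[block_id]


store = VersionedBlocks({"age_block": [23, 65, 30, 41]})
reader_snapshot = store.snapshot()
store.write("age_block", [23, 66, 30, 41])

print(store.read("age_block", reader_snapshot))   # [23, 65, 30, 41]
print(store.read("age_block", store.snapshot()))  # [23, 66, 30, 41]
```

The reader that took its snapshot before the write keeps seeing the original values, which is the query-consistent view described above.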
Now, let’s see how MariaDB ColumnStore processes an end-user query, according to the official MariaDB ColumnStore documentation:
- Clients issue a query to the MariaDB Server running on the User Module. The server performs a table operation for all tables needed to fulfill the request and obtains the initial query execution plan.
- Using the MariaDB storage engine interface, ColumnStore converts the server table object into ColumnStore objects. These objects are then sent to the User Module processes.
- The User Module converts the MariaDB execution plan and optimizes the given objects into a ColumnStore execution plan. It then determines the steps needed to run the query and the order in which they need to be run.
- The User Module then consults the Extent Map to determine which Performance Modules to consult for the data it needs. It then performs Extent Elimination, eliminating any Performance Modules from the list that only contain data outside the range of what the query requires.
- The User Module then sends commands to one or more Performance Modules to perform block I/O operations.
- The Performance Module or Modules carry out predicate filtering, join processing, and initial aggregation of data from local or external storage, then send the data back to the User Module.
- The User Module performs the final result-set aggregation and composes the result-set for the query.
- The User Module / ExeMgr implements any window function calculations, as well as any necessary sorting on the result-set. It then returns the result-set to the server.
- The MariaDB Server performs any select list functions, ORDER BY and LIMIT operations on the result-set.
- The MariaDB Server returns the result-set to the client.
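The steps above can be sketched in Python to show how Extent Elimination and the split between partial and final aggregation fit together. Everything here is a simplified assumption for illustration: the `Extent` class, the module names `pm1` and `pm2`, the sample values, and the function names are hypothetical, and the real Extent Map holds far more than a min/max range per extent.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    module: str   # hypothetical Performance Module that holds the extent
    min_age: int  # smallest value stored in the extent
    max_age: int  # largest value stored in the extent
    ages: list    # the column values themselves


# Hypothetical extent map for an "age" column spread over two modules.
EXTENT_MAP = [
    Extent("pm1", 18, 29, [23, 19, 28]),
    Extent("pm1", 30, 45, [30, 41]),
    Extent("pm2", 46, 70, [65, 52]),
    Extent("pm2", 20, 35, [22, 35, 31]),
]


def eliminate_extents(extent_map, low, high):
    """Keep only extents whose [min, max] range overlaps [low, high]."""
    return [e for e in extent_map if e.max_age >= low and e.min_age <= high]


def performance_module_scan(extents, low, high):
    """Predicate filtering plus initial aggregation: returns (sum, count)."""
    matching = [age for e in extents for age in e.ages if low <= age <= high]
    return sum(matching), len(matching)


def user_module_average(extent_map, low, high):
    """Plan the scan, dispatch it per module, and do the final aggregation."""
    by_module = {}
    for extent in eliminate_extents(extent_map, low, high):
        by_module.setdefault(extent.module, []).append(extent)
    partials = [
        performance_module_scan(extents, low, high)
        for extents in by_module.values()
    ]
    total = sum(partial[0] for partial in partials)
    count = sum(partial[1] for partial in partials)
    average = total / count if count else None
    return average, sorted(by_module)


# Average age between 30 and 45: both modules still hold candidate extents.
print(user_module_average(EXTENT_MAP, 30, 45))  # (34.25, ['pm1', 'pm2'])
# Average age between 50 and 70: pm1 is eliminated and never contacted.
print(user_module_average(EXTENT_MAP, 50, 70))  # (58.5, ['pm2'])
```

In the second query, the min/max ranges alone are enough to rule out every extent on `pm1`, so only `pm2` does any block I/O.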
How to Install MariaDB ColumnStore
Now, let’s see how to install it. For more information, you can check the MariaDB official documentation.
We will use CentOS 7 as the operating system, but you can use any supported OS instead. The installation packages are available for download here.
First, you will need to install the Extra Packages repository:
$ yum install -y epel-release
Then, the following required packages:
$ yum install -y boost expect perl perl-DBI openssl zlib snappy libaio perl-DBD-MySQL net-tools wget jemalloc numactl-libs
And now, let’s download the latest MariaDB ColumnStore version, extract it, and install it:
$ wget https://downloads.mariadb.com/ColumnStore/latest/centos/x86_64/7/mariadb-columnstore-1.2.5-1-centos7.x86_64.rpm.tar.gz
$ tar zxf mariadb-columnstore-1.2.5-1-centos7.x86_64.rpm.tar.gz
$ rpm -ivh mariadb-columnstore-1.2.5-1-*.rpm
When it is finished, you will see the following message:
The next step is:
If installing on a pm1 node using non-distributed install
/usr/local/mariadb/columnstore/bin/postConfigure
If installing on a pm1 node using distributed install
/usr/local/mariadb/columnstore/bin/postConfigure -d
If installing on a non-pm1 using the non-distributed option:
/usr/local/mariadb/columnstore/bin/columnstore start
So, for this example, let’s just run the command:
$ /usr/local/mariadb/columnstore/bin/postConfigure
Now, it will ask you for some information about the installation:
This is the MariaDB ColumnStore System Configuration and Installation tool.
It will Configure the MariaDB ColumnStore System and will perform a Package
Installation of all of the Servers within the System that is being configured.
IMPORTANT: This tool requires to run on the Performance Module #1
Prompting instructions:
Press 'enter' to accept a value in (), if available or
Enter one of the options within [], if available, or
Enter a new value
===== Setup System Server Type Configuration =====
There are 2 options when configuring the System Server Type: single and multi
'single' - Single-Server install is used when there will only be 1 server configured
on the system. It can also be used for production systems, if the plan is
to stay single-server.
'multi' - Multi-Server install is used when you want to configure multiple servers now or
in the future. With Multi-Server install, you can still configure just 1 server
now and add on addition servers/modules in the future.
Select the type of System Server install [1=single, 2=multi] (2) > 1
Performing the Single Server Install.
Enter System Name (columnstore-1) >
===== Setup Storage Configuration =====
----- Setup Performance Module DBRoot Data Storage Mount Configuration -----
There are 2 options when configuring the storage: internal or external
'internal' - This is specified when a local disk is used for the DBRoot storage.
High Availability Server Failover is not Supported in this mode
'external' - This is specified when the DBRoot directories are mounted.
High Availability Server Failover is Supported in this mode.
Select the type of Data Storage [1=internal, 2=external] (1) >
Enter the list (Nx,Ny,Nz) or range (Nx-Nz) of DBRoot IDs assigned to module 'pm1' (1) >
===== Performing Configuration Setup and MariaDB ColumnStore Startup =====
NOTE: Setting 'NumBlocksPct' to 50%
Setting 'TotalUmMemory' to 25% of total memory.
Running the MariaDB ColumnStore setup scripts
post-mysqld-install Successfully Completed
post-mysql-install Successfully Completed
Starting MariaDB Columnstore Database Platform
Starting MariaDB ColumnStore Database Platform Starting, please wait ....... DONE
System Catalog Successfull Created
MariaDB ColumnStore Install Successfully Completed, System is Active
Enter the following command to define MariaDB ColumnStore Alias Commands
. /etc/profile.d/columnstoreAlias.sh
Enter 'mcsmysql' to access the MariaDB ColumnStore SQL console
Enter 'mcsadmin' to access the MariaDB ColumnStore Admin console
NOTE: The MariaDB ColumnStore Alias Commands are in /etc/profile.d/columnstoreAlias.sh
Run the generated script:
$ . /etc/profile.d/columnstoreAlias.sh
Now you can access the database by running the “mcsmysql” command:
$ mcsmysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 12
Server version: 10.3.16-MariaDB-log Columnstore 1.2.5-1
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>
That’s it. Now, you can load data in your MariaDB ColumnStore database.
Conclusion
Columnar Storage is a great database storage alternative for handling data for analytics purposes. MariaDB ColumnStore is a Columnar Storage engine designed for this task, and as we have seen, the installation is pretty easy, so if you need an OLAP database or need to process big data, you should give it a try.