Server manufacturers and cloud providers offer different kinds of storage solutions to cater for your database needs. When buying a new server or choosing a cloud instance to run our database, we often ask ourselves - how much disk space should we allocate? As we will find out, the answer is not trivial as there are a number of aspects to consider. Disk space is something that has to be thought of upfront, because shrinking and expanding disk space can be a risky operation for a disk-based database.
In this blog post, we are going to look into how to initially size your storage space, and then plan for capacity to support the growth of your MySQL or MariaDB database.
How MySQL Utilizes Disk Space
MySQL stores data in files on disk under a specific directory defined by the system variable "datadir". The contents of the datadir depend on the MySQL server version, and on the loaded configuration parameters and server variables (e.g., general_log, slow_query_log, binary log).
How data is actually stored and retrieved depends on the storage engine. For the MyISAM engine, a table's indexes are stored in the .MYI file in the data directory, along with the .MYD and .frm files for the table. For the InnoDB engine, the indexes are stored in the tablespace, along with the table data. If the innodb_file_per_table option is set, both data and indexes are stored in the table's own .ibd file, next to its .frm file. For the MEMORY engine, the data is stored in memory (heap) while the structure is stored in the .frm file on disk. In the upcoming MySQL 8.0, the metadata files (.frm, .par, db.opt) are removed with the introduction of the new data dictionary schema.
It's important to note that if you are using the InnoDB shared tablespace for storing table data (innodb_file_per_table=OFF), the physical size of your MySQL data is expected to keep growing, and it will not shrink even after you truncate tables or delete large numbers of rows. The only way to reclaim the free space in this configuration is to export the databases with mysqldump, delete them, and re-import them. Thus, it's important to set innodb_file_per_table=ON if you are concerned about disk space, so that truncating a table reclaims its space. Even with this configuration, a huge DELETE operation won't free up disk space unless OPTIMIZE TABLE is executed afterward.
MySQL stores each database in its own directory under the "datadir" path. In addition, log files and other related MySQL files like socket and PID files will, by default, be created under datadir as well. For performance and reliability reasons, it is recommended to store MySQL log files on a separate disk or partition - especially the MySQL error log and binary logs.
Database Size Estimation
The basic way of estimating size is to find the growth ratio between two different points in time, and then multiply that by the current database size. Measuring only your peak-hours database traffic for this purpose is not the best practice, as it does not represent your database usage as a whole. Think about a batch operation or a stored procedure that runs at midnight, or once a week. Your database could potentially grow significantly in the morning, before possibly being shrunk by a housekeeping operation at midnight.
One possible way is to use our backups as the base element for this measurement. Physical backups such as Percona XtraBackup, MariaDB Backup and filesystem snapshots produce a more accurate representation of your database size than logical backups, since they contain a binary copy of the data and indexes. A logical backup like mysqldump only stores the SQL statements that can be executed to reproduce the original database object definitions and table data. Nevertheless, you can still come up with a good growth ratio by comparing mysqldump backups.
We can use the following formula to estimate the database size:
Estimated size = ((Bn - Bn-1) x (Dbdata + Dbindex) x 52 x Y) / Bn-1
Where,
- Bn - Current week full backup size,
- Bn-1 - Previous week full backup size,
- Dbdata - Total database data size,
- Dbindex - Total database index size,
- 52 - Number of weeks in a year,
- Y - Year.
The total database size (data and indexes) in MB can be calculated by using the following statements:
mysql> SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) "DB Size in MB" FROM information_schema.tables;
+---------------+
| DB Size in MB |
+---------------+
| 2013.41 |
+---------------+
The above equation can be modified if you would like to use the monthly backups instead. Change the constant value of 52 to 12 (12 months in a year) and you are good to go.
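The growth estimate can also be scripted. The following is a minimal Python sketch, assuming the same linear growth model that the worked example later in this post applies; the function name estimate_db_size_mb and its parameters are illustrative, not part of any MySQL tooling.

```python
def estimate_db_size_mb(current_backup_mb, previous_backup_mb, db_size_mb,
                        periods_per_year=52, years=1):
    """Linear growth estimate based on two consecutive full backups.

    The growth ratio between the two backups is scaled by the current
    data + index size and by the number of backup periods in the horizon.
    """
    if previous_backup_mb <= 0:
        raise ValueError("previous backup size must be positive")
    growth_ratio = (current_backup_mb - previous_backup_mb) / previous_backup_mb
    return growth_ratio * db_size_mb * periods_per_year * years


# Figures taken from the scenario later in this post: weekly full backups
# of 1177 MB and 936 MB, a 2013 MB database and a 3-year horizon.
weekly_estimate = estimate_db_size_mb(1177, 936, 2013,
                                      periods_per_year=52, years=3)
print(f"{weekly_estimate:.1f} MB")
```

For monthly backups, pass periods_per_year=12 together with the sizes of two consecutive monthly backups.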
Also, don't forget to account for the InnoDB redo logs (innodb_log_file_size x 2), the system tablespace defined by innodb_data_file_path and, for Galera Cluster, the gcache.size value.
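As a rough illustration of this extra overhead, here is a small Python sketch. The helper name and the sizes passed to it are hypothetical example values, not recommendations; the default of two redo log files is an assumption that mirrors the "x 2" above, so read the real values from your own configuration.

```python
def innodb_fixed_overhead_mb(log_file_size_mb, log_files_in_group=2,
                             system_tablespace_mb=0, gcache_mb=0):
    """Sum the fixed-size files that sit on top of the table data:
    redo logs, the system tablespace and (Galera only) the gcache."""
    redo_logs_mb = log_file_size_mb * log_files_in_group
    return redo_logs_mb + system_tablespace_mb + gcache_mb


# Hypothetical example values: 512 MB redo log files, a 1024 MB system
# tablespace and a 1024 MB Galera gcache.
overhead_mb = innodb_fixed_overhead_mb(512, 2,
                                       system_tablespace_mb=1024,
                                       gcache_mb=1024)
print(overhead_mb)
```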
Binary Logs Size Estimation
Binary logs are generated by the MySQL master for replication and point-in-time recovery purposes. They are a set of log files that contain information about data modifications made on the MySQL server. The size of the binary logs depends on the number of write operations and the binary log format - STATEMENT, ROW or MIXED. Statement-based binary logs are usually much smaller than row-based binary logs, because they only contain the write statements, while row-based logs contain the modified row data.
The best way to estimate the maximum disk usage of binary logs is to measure the binary log size for a day and multiply it by the expire_logs_days value (default is 0 - no automatic removal). It's important to set expire_logs_days so you can estimate the size correctly. By default, each binary log is capped at around 1GB (max_binlog_size) before MySQL rotates the binary log file. We can use a MySQL event to simply flush the binary log for the purpose of this estimation.
Firstly, make sure the event_scheduler variable is enabled:
mysql> SET GLOBAL event_scheduler = ON;
Then, as a privileged user (with EVENT and RELOAD privileges), create the following event:
mysql> USE mysql;
mysql> CREATE EVENT flush_binlog
ON SCHEDULE EVERY 1 HOUR STARTS CURRENT_TIMESTAMP ENDS CURRENT_TIMESTAMP + INTERVAL 2 HOUR
COMMENT 'Flush binlogs per hour for the next 2 hours'
DO FLUSH BINARY LOGS;
For a write-intensive workload, you will probably need to shorten the interval to 30 minutes or 10 minutes so that the binary log is flushed before it reaches the 1GB maximum size, then scale the result up to an hour. Then verify the status of the event by using the following statement and look at the LAST_EXECUTED column:
mysql> SELECT * FROM information_schema.events WHERE event_name='flush_binlog'\G
...
LAST_EXECUTED: 2018-04-05 13:44:25
...
Then, take a look at the binary logs we have now:
mysql> SHOW BINARY LOGS;
+---------------+------------+
| Log_name | File_size |
+---------------+------------+
| binlog.000001 | 146 |
| binlog.000002 | 1073742058 |
| binlog.000003 | 1073742302 |
| binlog.000004 | 1070551371 |
| binlog.000005 | 1070254293 |
| binlog.000006 | 562350055 | <- hour #1
| binlog.000007 | 561754360 | <- hour #2
| binlog.000008 | 434015678 |
+---------------+------------+
We can then calculate the average growth of our binary logs, which is around 562 MB per hour during peak hours. Multiply this value by 24 hours and the expire_logs_days value (7 in this example):
mysql> SELECT (562 * 24 * @@expire_logs_days);
+---------------------------------+
| (562 * 24 * @@expire_logs_days) |
+---------------------------------+
| 94416 |
+---------------------------------+
We get 94416 MB, which is roughly 95 GB of disk space for our binary logs. A slave's relay logs are basically the same as the master's binary logs, except that they are stored on the slave side. Therefore, this calculation also applies to the slave relay logs.
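The binary log arithmetic above can be reproduced with a short Python sketch. It assumes decimal megabytes (1 MB = 1,000,000 bytes), which is what makes the two hourly files come out at about 562 MB, and a retention of 7 days, the value implied by the 94416 MB result. The function name and its parameters are illustrative only.

```python
from statistics import mean


def binlog_retention_mb(file_sizes_bytes, flush_interval_minutes=60,
                        retention_days=7):
    """Estimate binary log disk usage for the whole retention period.

    file_sizes_bytes: sizes of the binlogs closed by the flush event.
    flush_interval_minutes: how often the event flushed the binary log;
    shorter intervals are scaled up to an hourly rate.
    """
    if not file_sizes_bytes:
        raise ValueError("at least one binary log size is required")
    per_hour_bytes = mean(file_sizes_bytes) * (60 / flush_interval_minutes)
    per_hour_mb = round(per_hour_bytes / 1_000_000)  # decimal MB, as above
    return per_hour_mb * 24 * retention_days


# The two hourly files from the SHOW BINARY LOGS output above.
hourly_files = [562350055, 561754360]
binlog_estimate_mb = binlog_retention_mb(hourly_files,
                                         flush_interval_minutes=60,
                                         retention_days=7)
print(binlog_estimate_mb)
```

If the event flushes every 10 minutes instead, pass those file sizes with flush_interval_minutes=10 and the helper scales them to an hourly rate first.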
Spindle Disk or Solid State?
There are two types of I/O operations on MySQL files:
- Sequential I/O-oriented files:
  - InnoDB system tablespace (ibdata)
  - MySQL log files:
    - Binary logs (binlog.xxxx)
    - REDO logs (ib_logfile*)
    - General logs
    - Slow query logs
    - Error log
- Random I/O-oriented files:
  - InnoDB file-per-table data file (*.ibd) with innodb_file_per_table=ON (default).
Consider placing random I/O-oriented files on a high-throughput disk subsystem for best performance. This could be flash storage - either SSDs or an NVRAM card - or high-RPM spindle disks like SAS 15K or 10K, with a hardware RAID controller and battery backup unit. For sequential I/O-oriented files, storing them on HDDs with a battery-backed write cache should be good enough for MySQL. Take note that performance degradation is likely if the battery is dead.
We will cover this area (estimating disk throughput and file allocation) in a separate post.
Capacity Planning and Dimensioning
Capacity planning can help us build a production database server with enough resources to survive daily operations. We must also provision for unexpected needs and account for future storage and disk throughput requirements. Thus, capacity planning is important to ensure the database has enough room to breathe until the next hardware refresh cycle.
It's best to illustrate this with an example. Consider the following scenario:
- Next hardware cycle: 3 years
- Current database size: 2013 MB
- Current full backup size (week N): 1177 MB
- Previous full backup size (week N-1): 936 MB
- Delta size: 241MB per week
- Delta ratio: 25.7% increment per week
- Total weeks in 3 years: 156 weeks
- Total database size estimation: ((1177 - 936) x 2013 x 156)/936 = 80856 MB ~ 81 GB after 3 years
If you are using binary logs, add the value we got in the previous section:
- 81 + 95 = 176 GB of storage for database and binary logs.
Add at least 100% more room for operational and maintenance tasks (local backup, data staging, error log, operating system files, etc.):
- 176 + 176 = 352 GB of total disk space.
Based on this estimation, we can conclude that we would need at least 352 GB of disk space for our database for 3 years. You can use this value to justify your new hardware purchase. For example, if you want to buy a new dedicated server, you could opt for 6 x 128 GB SSDs in RAID 10 with a battery-backed RAID controller, which will give you around 384 GB of usable disk space. Or, if you prefer cloud, you could get 100 GB of block storage with provisioned IOPS for the 81 GB database and use standard persistent block storage for the 95 GB of binary logs and other operational usage.
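To double-check the final figures, the sketch below adds up the estimates and compares them with the usable capacity of the RAID 10 option. It assumes the usual RAID 10 rule that half of the raw capacity is usable; both function names are hypothetical helpers written for this example.

```python
def total_disk_gb(db_gb, binlog_gb, headroom_ratio=1.0):
    """Database plus binary logs, with extra room for operational tasks.
    A headroom_ratio of 1.0 means 100% more space on top."""
    return (db_gb + binlog_gb) * (1 + headroom_ratio)


def raid10_usable_gb(disk_count, disk_size_gb):
    """Usable capacity of a RAID 10 array: half of the raw capacity,
    because every disk is mirrored."""
    if disk_count < 4 or disk_count % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, 4 or more")
    return disk_count * disk_size_gb / 2


required_gb = total_disk_gb(81, 95)      # database + binary logs, doubled
usable_gb = raid10_usable_gb(6, 128)     # 6 x 128 GB SSDs in RAID 10
print(required_gb, usable_gb, usable_gb >= required_gb)
```

Lowering or raising headroom_ratio is a quick way to see how sensitive the purchase decision is to the amount of operational slack you plan for.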
Happy dimensioning!