The principal objective of backing up your data is, of course, the ability to roll back and access your archives in case of hardware failure. To do business today, you need the certainty of knowing that in the case of disaster, your data will be protected and accessible. You would need to store your backups offsite, in case your datacenter goes down in flames.
Data protection remains a challenge for small and medium-sized businesses. Small-to-medium sized businesses prefer to archive their company’s data using direct-attached storage, with the majority of firms having plans to do offsite backup copies. Local storage approach can lead to one of the most severe dilemmas the modern company can face - loss of data in case of disaster.
Many factors come into deliberation when judging on whether to allow a business critical database to be transferred offsite, and when choosing a suitable vendor to do so. Traditional methods like writing to tape and shipping to a remote location can be a complicated process that requires special hardware, adequately trained staff and procedures to ensure that backups are regularly produced, protected and that the information contained in them is verified for integrity. Small businesses usually have small IT budgets. Often they can not afford to have a secondary datacenter, even if they have a dedicated data center. But nevertheless, it is still important to keep a copy of your backup files offsite. Disasters like hurricane, flood, fire or theft can destroy your servers and storage. Keeping backed up data in the separate data center ensures data is safe, no matter what is going on in your primary datacenter. Cloud storage is a great way of addressing this problem.
With the cloud backup approach, there are a number of factors to consider. Some of the questions you have are:
- Is backed-up data secured at rest in the external data center?
- Is transfer to or from the external data center through the public internet network safe?
- Is there an effect on RTO (Recovery Time Objective)?
- Is the backup and recovery process easy enough for our IT staff?
- Are there any changes required to existing processes?
- Are the 3rd party backup tools needed?
- What are the additional costs in terms of required software or data transfer?
- What are the storage costs?
Backup features when doing a backup to the cloud
If your MySQL server or backup destination is located in an exposed infrastructure like a public cloud, hosting provider or connected through an untrusted WAN network, you need to think about additional actions in your backup policy. There are few different ways to perform database backups for MySQL, and depending on the type of backup, recovery time, size, and infrastructure options will vary. Since many of the cloud storage solutions are simply storage with different API front ends, any backup solution can be performed with a bit of scripting. So what are the options we have to make process smooth and secure?
Encryption
It is always a good idea to enforce encryption to enhance the security of backup data. A simple use case to implement encryption is where you want to push the backup to an offsite backup storage located in the public cloud.
When creating an encrypted backup, one thing to have in mind is that it usually takes more time to recover. The backup has to be decrypted before any recovery activities. With a big dataset, this could introduce some delays to the RTO.
On the other hand, if you are using private key for encryption, make sure to store the key in a safe place. If the private key is missing, the backup will be useless and unrecoverable. If the key is stolen, all created backups that use the same key would be compromised as they are no longer secured. You can use the popular GnuPG or OpenSSL to generate the private or public keys.
To perform mysqldump encryption using GnuPG, generate a private key and follow the wizard accordingly:
$ gpg --gen-key
Create a plain mysqldump backup as usual:
$ mysqldump --routines --events --triggers --single-transaction db1 | gzip > db1.tar.gz
Encrypt the dump file and remove the older plain backup:
$ gpg --encrypt -r ‘admin@email.com’ db1.tar.gz
$ rm -f db1.tar.gz
GnuPG will automatically append .gpg extension on the encrypted file. To decrypt,
simply run the gpg command with --decrypt flag:
$ gpg --output db1.tar.gz --decrypt db1.tar.gz.gpg
To create an encrypted mysqldump using OpenSSL, one has to generate a private key and a public key:
OpenSSL req -x509 -nodes -newkey rsa:2048 -keyout dump.priv.pem -out dump.pub.pem
This private key (dump.priv.pem) must be kept in a safe place for future decryption. For mysqldump, an encrypted backup can be created by piping the content to openssl, for example
mysqldump --routines --events --triggers --single-transaction database | openssl smime -encrypt -binary -text -aes256
-out database.sql.enc -outform DER dump.pub.pem
To decrypt, simply use the private key (dump.priv.pem) alongside the -decrypt flag:
openssl smime -decrypt -in database.sql.enc -binary -inform
DEM -inkey dump.priv.pem -out database.sql
Percona XtraBackup can be used to encrypt or decrypt local or streaming backups with xbstream option to add another layer of protection to the backups. Encryption is done with the libgcrypt library. Both --encrypt-key option and --encryptkey-file option can be used to specify the encryption key. Encryption keys can be generated with commands like
$ openssl rand -base64 24
$ bWuYY6FxIPp3Vg5EDWAxoXlmEFqxUqz1
This value then can be used as the encryption key. Example of the innobackupex command using the --encrypt-key:
$ innobackupex --encrypt=AES256 --encrypt-key=”bWuYY6FxIPp3Vg5EDWAxoXlmEFqxUqz1” /storage/backups/encrypted
The output of the above OpenSSL command can also be redirected to a file and can be treated as a key file:
openssl rand -base64 24 > /etc/keys/pxb.key
Use it with the --encrypt-key-file option instead:
innobackupex --encrypt=AES256 --encrypt-key-file=/etc/keys/pxb.key /storage/backups/encrypted
To decrypt, simply use the --decrypt option with appropriate --encrypt-key or --encrypt-key-file:
$ innobackupex --decrypt=AES256 --encrypt-key=”bWuYY6FxIPp3Vg5EDWAxoXlmEFqxUqz1”
/storage/backups/encrypted/2018-11-18_11-10-09/
For more information about MySQL and MariaDB encryption, please check our another blog post.
Compression
Within the database cloud backup world, compression is one of your best friends. It can not only save storage space, but it can also significantly reduce the time required to download/upload data.
There are lots of compression tools available out there, namely gzip, bzip2, zip, rar, and 7z.
Normally, mysqldump can have best compression rates as it is a flat text file. Depending on the compression tool and ratio, a compressed mysqldump can be up to 6 times smaller than the original backup size. To compress the backup, you can pipe the mysqldump output to a compression tool and redirect it to a destination file. You can also skip several things like comments, lock tables statement (if InnoDB), skip GTID purged and triggers:
mysqldump --single-transaction --skip-comments --skip-triggers --skip-lock-tables --set-gtid-purged OFF --all-databases | gzip > /storage/backups/all-databases.sql.gz
With Percona Xtrabackup, you can use the streaming mode (innobackupex), which sends the backup to STDOUT in special tar or xbstream format instead of copying files to the backup directory. Having a compressed backup could save you up to 50% of the original backup size, depending on the dataset. Append the --compress option in the backup command. By using the xbstream in streaming backups, you can speed up the compression process by using the --compress-threads option. This option specifies the number of threads created by xtrabackup for parallel data compression. The default value for this option is 1. To use this feature, add the option to a local backup. An example backup with compression:
innobackupex --stream=xbstream --compress --compress-threads=4 > /storage/backups/backup.xbstream
Before applying logs during the preparation stage, compressed files will need to be
decompressed using xbstream:
Then, use qpress to extract each file ending with .qp in their respective directory before
running --apply-log command to prepare the MySQL data.
$ xbstream -x < /storage/backups/backup.xbstream
Limit network throughput
An great option for cloud backups is to limit network streaming bandwidth (Mb/s) when doing a backup. You can achieve that with pv tool. The pv utility comes with data modifiers option -L RATE, --rate-limit RATE which limit the transfer to a maximum of RATE bytes per second. Below example will restrict it to 2MB/s.
$ pv -q -L 2m
In below example, you can see xtrabackup with parallel gzip, encryption
/usr/bin/innobackupex --defaults-file=/etc/mysql/my.cnf --galera-info --parallel 4 --stream=xbstream --no-timestamp . | pv -q -L 2m | pigz -9 - | openssl enc -aes-256-cbc -pass file:/var/tmp/cmon-008688-19992-72450efc3b6e9e4f.tmp > /home/ubuntu/backups/BACKUP-3445/backup-full-2018-11-28_213540.xbstream.gz.aes256 ) 2>&1.
Transfer backup to Cloud
Now when your backup is compressed and encrypted, it is ready for transfer.
Google cloud
The gsutil command line tool is used to manage, monitor and use your storage buckets on Google Cloud Storage. If you already installed the gcloud util, you already have the gsutil installed. Otherwise, follow the instructions for your Linux distribution from here.
To install the gcloud CLI you can follow below procedure:
curl https://sdk.cloud.google.com | bash
Restart your shell:
exec -l $SHELL
Run gcloud init to initialize the gcloud environment:
gcloud init
With the gsutil command line tool installed and authenticated, create a regional storage bucket named mysql-backups-storage in your current project.
gsutil mb -c regional -l europe-west1 gs://severalnines-storage/
Creating gs://mysql-backups-storage/
Amazon S3
If you are not using RDS to host your databases, it is very probable that you are doing your own backups. Amazon’s AWS platform, S3 (Amazon Simple Storage Service) is a data storage service that can be used to store database backups or other business critical files. Either it’s Amazon EC2 instance or your on-prem environment you can use the service to secure your data.
While backups can be uploaded through the web interface, the dedicated s3 command line interface can be used to do it from the command line and through backup automation scripts. If backups are to be kept for a very long time, and recovery time isn’t a concern, backups can be transferred to Amazon Glacier service, providing much cheaper long-term storage. Files (amazon objects) are logically stored in a huge flat container named bucket. S3 presents a REST interface to its internals. You can use this API to perform CRUD operations on buckets and objects, as well as to change permissions and configurations on both.
The primary distribution method for the AWS CLI on Linux, Windows, and macOS is pip, a package manager for Python. Instruction can be found here.
aws s3 cp severalnines.sql s3://severalnine-sbucket/mysql_backups
By default S3 provides eleven 9s object durability. It means that if you store 1.000.000.000 (1 billion) objects into it, you can expect to lose 1 object every 10 years on average. The way S3 achieves that impressive number of 9s is by replicating the object automatically in multiple Availability Zones, which we’ll talk about in another post. Amazon has regional datacenters all around the world.
Microsoft Azure Storage
Microsoft’s public cloud platform, Azure, has storage options with their control line interface. Information can be found here. The open-source, cross-platform Azure CLI provides a set of commands for working with the Azure platform. It gives much of the functionality seen in the Azure portal, including rich data access.
The installation of Azure CLI is fairly simple, you can find instructions here. Below you can find how to transfer your backup to Microsoft storage.
az storage blob upload --container-name severalnines --file severalnines.sql --name severalnines_backup
Hybrid Storage for MySQL and MariaDB backups
With the growing public and private cloud storage industry, we have a new category called hybrid storage. This technology allows the files to be stored locally, with changes automatically synced to remote in the cloud. Such an approach is coming from the need of having recent backups stored locally for fast restore (lower RTO), as well as business continuity objectives.
The important aspect of efficient resource usage is to have separate backup retentions. Data that is stored locally, on redundant disk drives would be kept for a shorter period while cloud backup storage would be held for a longer time. Many times the requirement for longer backup retention comes from legal obligations for different industries (like telecoms having to store connection metadata). Cloud providers like Google Cloud Services, Microsoft Azure and Amazon S3 each offer virtually unlimited storage, decreasing local space needs. It allows you to retain your backup files longer, for as long as you would like and not have concerns around local disk space.
When scheduling backup with ClusterControl, each of the backup methods are configurable with a set of options on how you want the backup to be executed. The most important for the hybrid cloud storage would be:
- Network throttling
- Encryption with the build in key management
- Compression
- Retention period for the local backups
- Retention period for the cloud backups
Conclusion
The cloud has changed the data backup industry. Because of its affordable price point, smaller businesses have an offsite solution that backs up all of their data.
Your company can take advantage of cloud scalability and pay-as-you-go pricing for growing storage needs. You can design a backup strategy to provide both local copies in the datacenter for immediate restoration, and a seamless gateway to cloud storage services from AWS, Google and Azure.
Advanced TLS and AES 256-bit encryption and compression features support secure backups that take up significantly less space in the cloud.