In the earlier blog posts in this series, we concluded that MaxScale combined with MariaDB Replication Manager still has some way to go as a failover solution.
The failover mechanism relied on MariaDB GTID, needed a wrapper script around the decoupled replication manager, and had no protection against flapping. Since then, both MaxScale and MariaDB Replication Manager (MRM) have received a couple of updates. For MaxScale, the greatest improvement is arguably the availability of the community edition repository.
Integration
Previously, to have MaxScale execute MariaDB Replication Manager upon master failure, you had to add a wrapper shell script that translated MaxScale's parameters into MRM's command line options. This has been improved: the wrapper script is no longer needed, which also reduces the chance of a parameter mismatch.
The new configuration syntax for MaxScale is now:
[MySQL Monitor]
type=monitor
module=mysqlmon
servers=svr_10101811,svr_10101812,svr_10101813
user=admin
passwd=B4F3CB4FD8132F78DC2994A3C2AC7EC0
monitor_interval=1000
script=/usr/local/bin/replication-manager --user root:admin --rpluser repluser:replpass --hosts $INITIATOR,$NODELIST --failover=force --interactive=false --logfile=/var/log/failover.log
events=master_down
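The placeholders in the script line are expanded by MaxScale before MRM is called. As we understand the MaxScale monitor documentation, $INITIATOR becomes the address of the server that triggered the event and $NODELIST the list of servers still running, so prepending $INITIATOR hands MRM the full topology including the failed master. The Python sketch below models that expansion; the expand_script function and the IP addresses are hypothetical, and MaxScale performs its own substitution internally.

```python
import shlex
from typing import List

# The command line from the monitor's script= parameter above.
TEMPLATE = (
    "/usr/local/bin/replication-manager --user root:admin "
    "--rpluser repluser:replpass --hosts $INITIATOR,$NODELIST "
    "--failover=force --interactive=false --logfile=/var/log/failover.log"
)


def expand_script(template: str, initiator: str, nodelist: List[str]) -> List[str]:
    """Hypothetical model of placeholder expansion, returning an argv list.

    initiator: host:port of the server that triggered master_down
    nodelist:  host:port of the servers MaxScale still sees as running
    """
    expanded = template.replace("$INITIATOR", initiator)
    expanded = expanded.replace("$NODELIST", ",".join(nodelist))
    return shlex.split(expanded)


# Hypothetical addresses for svr_10101811 (failed master) and its two slaves.
argv = expand_script(TEMPLATE, "10.10.18.11:3306",
                     ["10.10.18.12:3306", "10.10.18.13:3306"])
print(argv[argv.index("--hosts") + 1])
# 10.10.18.11:3306,10.10.18.12:3306,10.10.18.13:3306
```

Because MRM now receives its options directly, any mismatch between what MaxScale passes and what MRM expects is visible in this one line of configuration rather than hidden in a wrapper script.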
To find out what happened during a failover, read the failover log set with the --logfile option above.
Preferred master
In the second blog post, we mentioned that you can't set candidate masters the way you are used to with MHA. Actually, as the author of MRM pointed out in the comments, this is possible with MRM: you define the non-candidate masters as servers that MRM should ignore during slave promotion.
The syntax in the MaxScale configuration would be:
script=/usr/local/bin/replication-manager --user root:admin --rpluser repluser:replpass --hosts $INITIATOR,$NODELIST --ignore-servers='172.16.2.123:3306,172.16.2.126' --failover=force --interactive=false --logfile=/var/log/failover.log
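The effect of --ignore-servers can be pictured with a small, hypothetical model of slave selection: every host passed via --hosts remains a candidate unless it is the failed master or appears in the ignore list. This is not MRM's actual selection logic, which presumably also weighs replication position. The example ignore list gives one entry without a port, so the sketch assumes such an entry means the default port 3306; the addresses 172.16.2.121 and 172.16.2.122 are made up for illustration.

```python
from typing import List

# Assumption: an ignore entry without a port refers to the default MySQL port.
DEFAULT_PORT = 3306


def normalize(addr: str) -> str:
    """Return addr as host:port, appending the assumed default port if absent."""
    addr = addr.strip()
    return addr if ":" in addr else f"{addr}:{DEFAULT_PORT}"


def promotion_candidates(hosts: List[str], failed_master: str,
                         ignore_servers: str) -> List[str]:
    """Hypothetical filter: hosts that may still be promoted to master.

    hosts:          the servers passed to --hosts
    failed_master:  the server that triggered the failover
    ignore_servers: the comma-separated value given to --ignore-servers
    """
    ignored = {normalize(a) for a in ignore_servers.split(",") if a.strip()}
    excluded = ignored | {normalize(failed_master)}
    return [h for h in hosts if normalize(h) not in excluded]


hosts = ["172.16.2.121:3306", "172.16.2.122:3306",
         "172.16.2.123:3306", "172.16.2.126:3306"]
print(promotion_candidates(hosts, "172.16.2.121:3306",
                           "172.16.2.123:3306,172.16.2.126"))
# ['172.16.2.122:3306']
```

In other words, the ignore list works as an inverted candidate list: instead of naming the preferred masters as in MHA, you name every server that must never be promoted.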
Flapping
We also concluded that the combination of MRM and MaxScale lacks protection against flapping back and forth between nodes. This is mostly because MRM and MaxScale are decoupled: each implements its own topology discovery, and MRM simply exits after it has performed its tasks. This could lead to an infinite loop where a slave gets promoted and then fails under the increased load, while the old master has become healthy again and is re-promoted.
MRM actually has protection against flapping when used in the so-called monitoring mode, where MRM runs as a long-lived application. The monitoring mode is an interactive mode in which a DBA can either invoke a failover manually or have it done automatically. With the failover-limit parameter, you can limit the number of failovers before MRM backs off and stops promoting. Naturally, this only works because MRM keeps state in interactive mode.
It would make sense to add this functionality to the non-interactive mode as well, by persisting state about the most recent failover(s) somewhere. MRM would then be able to refuse to perform multiple failovers within a short timeframe.
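A minimal sketch of what such persisted state could look like, assuming a JSON file holding the timestamps of earlier failovers and a sliding time window. FAILOVER_LIMIT, WINDOW_SECONDS, the state file and the may_fail_over function are all hypothetical and are not MRM options; the sketch only illustrates the idea of keeping state between otherwise stateless invocations.

```python
import json
import os
import time
from typing import List, Optional

# Hypothetical policy values; these are not MRM options.
FAILOVER_LIMIT = 3      # maximum number of failovers allowed within the window
WINDOW_SECONDS = 3600   # length of the sliding window in seconds


def load_history(path: str) -> List[float]:
    """Read the timestamps of previous failovers, or [] if no state exists yet."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return json.load(f)


def may_fail_over(path: str, now: Optional[float] = None) -> bool:
    """Return True and record this failover if the limit has not been reached."""
    now = time.time() if now is None else now
    # Keep only failovers that happened within the window.
    recent = [t for t in load_history(path) if now - t < WINDOW_SECONDS]
    if len(recent) >= FAILOVER_LIMIT:
        return False  # back off: too many failovers in a short timeframe
    recent.append(now)
    with open(path, "w") as f:
        json.dump(recent, f)
    return True
```

Until MRM offers something like this itself, a guard of this kind could sit in front of the replication-manager call, at the cost of reintroducing the wrapper script that the new integration just removed.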
Monitoring mode
MRM also features a so-called "monitoring" mode in which it constantly monitors the topology and can fail over automatically when the master fails. With MaxScale, we always set the failover mode to "force" so that MRM performs the failover without asking for confirmation. The monitoring mode, however, runs interactively, so unless you run it inside screen, you can't have MRM run in the background and perform the failover automatically for you.
Conclusion
MariaDB Replication Manager has improved over the past few weeks and has become more useful as a result. The number of issues being opened (and resolved) indicates that people are starting to test and use it. If MariaDB provided binaries for MRM, the tool could see wider adoption among MariaDB users.