Looking for current and maintained information and documentation on (Linux ) Open Source High Availability HA Clustering?
You probably should be reading the Pacemaker site clusterlabs.org
This site conserves Heartbeat specific stuff. See Site news for details.

Talk:MySQL (resource agent)

From Linux-HA

Jump to: navigation, search

Created page for the mysql RA. Given the fact that this RA is currently undergoing major changes with MySQL replication support being added, please use this talk page to share your views and comments. Please sign your comments and include a timestamp (easily done with ~~~~). Thanks!

Florian 19:02, 27 April 2010 (UTC)

Doing MySQL HA with replication is in theory easy but in practice, it can quickly become a nightmare. Pacemaker offers a very good framework and the foundation stone has been laid by Florian and Andrew (? not sure) with the new MySQL resource agent. The current agent (1.0.3), although it works, has some shortcomings that need to be addressed in order to have a usable solution that can replace a tool like MMM.

The whole process relies on features like 'read_only' and, of course, replication. The problem with 'read_only' is that users with the SUPER privilege override the setting. We will need to either implement a verification tool or write, in big red letters, a warning against this privilege in the documentation. Also, replication can be filtered. Once again, either we force a verification in the agent or we warn against such filters in the documentation.

I am far from a Pacemaker expert (I am learning though) but I do know MySQL well. Here are some use cases that will happened and needs to be covered in some way. I don't pretend I know or have ideas about how to implement these requirements.

Transition of the master role:

Let's consider the following 3 servers: a master becoming a slave (MS), a slave becoming a master (SM) and a slave (S). Here are steps I see that are required.

  1. Remove connections from the master (MS)
    1. remove the writer VIP (Florian 16:17, 12 May 2010 (UTC): no need to do this within the RA, Pacemaker handles this via colocation constraint)
    2. if mysqld is accessible, kill all the user sessions expect the agent one and the system ones (Florian 16:17, 12 May 2010 (UTC): that would be a case for resource freeze/thaw. Unfortunately not available in Pacemaker yet)
    3. set read_only (Florian 16:17, 12 May 2010 (UTC): done)
    4. if mysqld is not accessible then fencing/stonith of the process or the host is the way to go (Florian 16:17, 12 May 2010 (UTC): intrinsic Pacemaker feature, nothing to do)
  2. Promotion to master (SM) (Florian 16:17, 12 May 2010 (UTC): done)
    1. check the status to the SQL thread so that it is at state "Has read all relay log;....". we might have to wait for that. Ideally the chosen "new master" should be the slave with the smallest second_behind_master value. This needs to have a timeout and a check for replication errors. On error or timeout, another slave must be promoted to master. (Florian 16:17, 12 May 2010 (UTC): All done)
    2. "stop slave;" and "reset slave;" (Florian 16:17, 12 May 2010 (UTC): Done)
    3. Record the current master_log_file and master_log_pos from "show master status"; (Florian 16:17, 12 May 2010 (UTC): We don't do that, instead we do RESET MASTER)
    4. Publish the new master IP along with current master_log_file and master_log_pos (Florian 16:17, 12 May 2010 (UTC): No need to do that, freshly reset master)
    5. Remove read_only setting (Florian 16:17, 12 May 2010 (UTC): done)
    6. Add the writer VIP (Florian 16:17, 12 May 2010 (UTC): no need to do this within the RA, Pacemaker handles this via colocation constraint)
  3. Reconfigure slaves (S) and (MS)
    1. "stop slave;" (Florian 16:17, 12 May 2010 (UTC): done)
    2. issue a "change master to " pointing to the new master_host, master_log_file and master_log_pos (Florian 16:17, 12 May 2010 (UTC): no need, slaves always connect to a freshly reset master)
    3. "start slave;" (Florian 16:17, 12 May 2010 (UTC): done)

Slaves management:

The other basic thing we need to achieve is pulling out a slave from the read pool when either replication is broken or it lags behind. We cannot fence or restart the mysql instance (or host) under these conditions since it will remove any chance the slave would have to catch up with the master. MMM achieves this with a pool of reader VIPs (one per slave and one for the master) and it moves those VIPs around. At some point, if all the slaves are lagging behind by more than a configurable amount of time, all the reader VIPs will be locate on the master host. That might be fairly tricky to achieve but given the flexibility of the Pacemaker configuration, I hope it is possible.

Yves 21:06, 11 May 2010 (UTC)

Personal tools