Jan 032017
 

In this step by step guide I will take you through all steps required to configure a highly available, 2-node MySQL cluster (plus witness server) in Google Cloud Platform (Google Compute Engine, aka GCE).  The guide includes both screenshots, shell commands and code snippets as appropriate.  I assume that you are somewhat familiar with Google Cloud Platform and already have an account.  If not, you can sign up for a free trial today.  I’m also going to assume that you have basic linux system administration skills as well as understand basic failover clustering concepts like Virtual IPs, quorum, etc.

Disclaimer: The cloud is a rapidly moving target.  As such, features/screens/buttons are bound to change over time so your experience may vary slightly from what you’ll see below.  While this guide will show you how to make a MySQL database highly available, you could certainly adapt this information and process to protect other applications or databases, like SAP,PostgreSQL, Oracle, WebSphere MQ, NFS file servers, and more.

These are the high level steps to create a highly available MySQL database within Google Compute Engine:

  1. Create a Project
  2. Create Instances (Virtual Machines)
  3. Create Instance Groups
  4. Create Firewall Rule to allow VNC access
  5. Linux OS Configuration
  6. Install and Configure MySQL
  7. Install and Configure Cluster
  8. Create an Internal Load Balancer
  9. Create Firewall Rules for Internal Load Balancer
  10. Test Cluster Connectivity

Overview

This article will describe how to create a cluster within a single Google Cloud region.  The cluster nodes (node1, node2 and the witness server) will all reside in the region “us-central1” (10.128.0.0/20 network) but you can select your region accordingly.

The configuration will look like this:

The following IP addresses will be used:

  • node1: 10.128.0.2
  • node2: 10.128.0.3
  • witness: 10.128.0.4
  • Internal Load Balancer IP: 10.128.0.99
  • MySQL port: 3306

Create a Project

Upon first login, you’ll see an empty Home Dashboard and will be asked to create a Project.  All Google Compute Engine resources we will be creating will belong to this Google Cloud Platform project.

Here we will call our newly created Project “LinuxCluster”:

Create Instances (Virtual Machines)

We will be provisioning 3 Virtual Machines in this guide.  The first two VMs (I’ll call them “node1” and “node2”) will function as cluster nodes with the ability to bring the MySQL database and it’s associated resources online.  The 3rd VM will act as the cluster’s witness server for added protection against split-brain.

To ensure maximum availability, all 3 VMs will reside in different zones within the region (in this example: us-central1-a, us-central1-b, us-central1-c).

Create “node1” Instance

Create your first VM instance (“node1”).  If this is the first time you’ve created an instance, your screen will look something like the image below. Click “Create Instance” button in the center of your screen:

If you have other instances already running in GCE, your screen will look a bit different.  Still, click “create instance” to continue:

By default, Debian linux is usually selected by default.  We *DON’T* want this as we’ll be using CentOS 6.X in this guide.

Give the instance a name (“node1”), select the first zone (a) within our region (us-central1) and make sure to click “Change” to select the proper boot image.  You can size the instance based on your workload requirements, but for the purposes of this guide we’ll be using the default size to minimize cost, which is a fairly small VM (1 vCPU and only 3.75GB RAM)

In the Boot disk pop up screen, select CentOS 6 and at the bottom we will go with an SSD boot disk.  10GB is more than sufficient for the purposes of this guide.  You can size your systems accordingly:

After clicking “Select”, you’ll be brought back to the Create An Instance screen.  Towards the bottom, click “Management, disk, networking, SSH keys” because we will be adding a 2nd disk to our VM.  This 2nd disk will be used to store our databases, and it’s what will later be replicated/synchronized by the clustering software.

 

Select the “Disks” tab, and click “+ Add item” to add a 2nd disk to this instance:

Click “Create disk”:

Give the new disk a name, select desired type, and start with a blank disk.  10 GB should be more than sufficient for our needs here in this example configuration.  NOTE: Remember the value you set here.  Both cluster nodes (node1 and node2) needs to be the SAME SIZE:

Finally, click the “Networking” tab, and give node1 a customer Internal IP.

Click “Create” to launch your new instance:

Create “node2”

Repeat the steps above twice to create your second cluster node (“node2”).  Create this instance just like you did node1, including the addition of the 2nd disk.

IMPORTANT: make sure it’s in a different zone (us-central1-b) and give it a unique IP (10.128.0.3)

Create the “witness” VM

Create your third VM (“witness”) and make sure it’s in a different zone (us-central1-c) from the first two instances.

NOTE: This instance DOESN’T need the extra disk added.

It may take a little while for your 3 VM instances to provision.  Once complete, you’ll see your VMs listed on the VM Instances screen within your Google Cloud Console.  Verify that you properly launched each VM into a different zone:

 

Create Instance Groups

Later in this guide we will be creating an Internal Load Balancer to route traffic to the active cluster node.  All of the possible load balancer configurations available on Google Cloud Platform require instance groups to serve the traffic sent from the load balancer.

Two instance groups will be created, and each one will contain one cluster node.

Create Instance Group 1

Give your first instance group a name (“instance-group-1a”), select “Single-zone” and make sure to properly select the Zone where your first VM instance resides.  Here, we select us-central-1a, because that is where “node1” was deployed.  Below, choose “Select existing instances” and pick “node1” from the VM instances drop down:

Create Instance Group 2

Repeat the previous step once more, this time selecting the zone your second node resides in.  us-central-1b and node2:

Create Firewall Rule to allow VNC access

You can see all of your existing firewall rules by navigating to:

Networking -> Firewall Rules

By default, the only ports open in the “Google firewall” from the outside world into your VMs are are ping, SSH (port 22) and RDP (port 3389).

Later in the guide, we’ll be using VNC to access the desktop of “node1” and configure the cluster using a GUI.  Create an Firewall Rule to allow VNC access.  In this guide port 5902 is used.  Adjust this according based on your VNC configuration.

 

Linux OS Configuration

Next, we will need to configure the Linux OS of our instances and get our hands dirty on the command line, which as a Linux administrator you should be used to by now.

There are a number of ways to connect to the console of your linux VMs.  You can initiate an SSH connection directly from the GCE web interface, or you can install the Google Cloud SDK locally on your laptop/workstation.

To SSH using your browser, simple go to Compute -> VM Instances and to the right of the VM you wish to connect to, under “Connect” select “Open in browser window”.

If you would prefer to have the Google Cloud command line tools installed natively on your laptop/workstation, please see the documentation here:

https://cloud.google.com/sdk/docs/quickstarts

Once connected use the “sudo” command to gain root privileges:

$sudo su -

Edit /etc/hosts

Unless you have already have a DNS server setup, you’ll want to create host file entries on all 3 servers so that they can properly resolve each other by name

Add the following lines to the end of your /etc/hosts file:

10.128.0.2    node1
10.128.0.3    node2
10.128.0.4    witness
10.128.0.99   mysql-vip

Disable SELinux

Edit /etc/sysconfig/linux and set “SELINUX=disabled”:

# vi /etc/sysconfig/selinux

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

Install various RPM packages

Next, install a handful of rpm packages that will be needed later as pre-requisites for our clustering software:

# yum install redhat-lsb patch

Install and Configure VNC (and related packages)

In order to access the GUI of our linux servers, to later configure our cluster, install VNC server on your cluster node.  In my setup I only did this on “node1”

# yum install tigervnc-server xterm
# vncpasswd
# vi /etc/sysconfig/vncservers

      VNCSERVERS="2:root"
      VNCSERVERARGS[2]="-geometry 1024x768"

# service vncserver start
# chkconfig vncserver on

Test connectivity by opening a VNC client on your laptop/desktop, and connecting to the Public IP of your cluster node

Reboot Cluster Nodes

Reboot so that SELinux is disabled. All 3 systems (node1, node2, witness) need to be rebooted.

Partition and Format the “data” disk

During VM instance creation, an extra disk was added to each cluster node to store the application data we will be protecting.  In this case it happens to be MySQL databases.

The disk configuration of our VMs is as follows:

  • /dev/sda – OS disk
  • /dev/sdb – data disk

The 2nd disk added during instance creation  /dev/sdb.  You can run the “fdisk -l” command to verify.  You’ll see that /dev/sda (OS) already has a disk partition and is being used.

# fdisk -l

Disk /dev/sda: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00035e98

 Device Boot Start End Blocks Id System
/dev/sda1 * 1 1306 10484736 83 Linux

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
64 heads, 32 sectors/track, 10240 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x762b810b


Here we will create a partition (/dev/sdb1), format it, and mount it at the default location for MySQL, which is /var/lib/mysql.  Perform the following steps on BOTH “node1” and “node2”:

# fdisk /dev/sdb
Command (m for help): n
Command action
e   extended
p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1305, default 1): <enter>
Using default value 1
Last cylinder, +cylinders or +size{K,M,G} (1-1305, default 1305): <enter>
Using default value 1305
 
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
[root@node1 ~]#

# mkfs.ext4 /dev/sdb1
# mkdir /var/lib/mysql

On node1, mount the filesystem:

# mount /dev/sdb1 /var/lib/mysql

Install and Configure MySQL

Next, install install the MySQL packages, initialize a sample database, and set “root” password for MySQL.

On “node1”:

# yum -y install mysql mysql-server
# /usr/bin/mysql_install_db --datadir="/var/lib/mysql/" --user=mysql
# mysqld_safe --user=root --socket=/var/lib/mysql/mysql.sock --port=3306 --datadir=/var/lib/mysql --log &
#
# # NOTE: This next command allows remote connections from ANY host.  NOT a good idea for production!
# echo "update user set Host='%' where Host='node1'; flush privileges" | mysql mysql
#
# #Set MySQL's root password to 'SIOS'
# echo "update user set Password=PASSWORD('SIOS') where User='root'; flush privileges" | mysql mysql

Create a MySQL configuration file. We will place this on the data disk  (that will later be replicated – /var/lib/mysql/my.cnf).  Example:

# vi /var/lib/mysql/my.cnf

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
pid-file=/var/lib/mysql/mysqld.pid
user=root
port=3306
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
 
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
 
[client]
user=root
password=SIOS

Delete the original MySQL configuration file, located in /etc, if it exists:

# rm /etc/my.cnf

On “node2”:

On “node2”, you ONLY need to install the MySQL packages.  The other steps aren’t required:

[root@node2 ~]# yum -y install mysql mysql-server

Install and Configure the Cluster

At this point, we are ready to install and configure our cluster.  SIOS Protection Suite for Linux (aka SPS-Linux) will be used in this guide as the clustering technology.  It provides both high availability failover clustering features (LifeKeeper) as well as real-time, block level data replication (DataKeeper) in a single, integrated solution.  SPS-Linux enables you to deploy a “SANLess” cluster, aka a “shared nothing” cluster meaning that cluster nodes don’t have any shared storage, as is the case with Azure VMs.

Install SIOS Protection Suite for Linux

Perform the following steps on ALL 3 VMs (node1, node2, witness):

Download the SPS-Linux installation image file (sps.img) and and obtain either a trial license or purchase permanent licenses.  Contact SIOS for more information.

You will loopback mount it and run the “setup” script inside, as root (or first “sudo su -” to obtain a root shell)

For example:

# mkdir /tmp/install
# mount -o loop sps.img /tmp/install
# cd /tmp/install
# ./setup

During the installation script, you’ll be prompted to answer a number of questions.  You will hit Enter on almost every screen to accept the default values.  Note the following exceptions:

  • On the screen titled “High Availability NFS” you may select “n” as we will not be creating a highly available NFS server
  • Towards the end of the setup script, you can choose to install a trial license key now, or later. We will install the license key later, so you can safely select “n” at this point
  • In the final screen of the “setup” select the ARKs (Application Recovery Kits, i.e. “cluster agents”) you wish to install from the list displayed on the screen.
    • The ARKs are ONLY required on “node1” and “node2”.  You do not need to install on “witness”
    • Navigate the list with the up/down arrows, and press SPACEBAR to select the following:
      • lkDR – DataKeeper for Linux
      • lkSQL – LifeKeeper MySQL RDBMS Recovery Kit
    • This will result in the following additional RPMs installed on “node1” and “node2”:
      • steeleye-lkDR-9.0.2-6513.noarch.rpm
      • steeleye-lkSQL-9.0.2-6513.noarch.rpm

Install Witness/Quorum package

The Quorum/Witness Server Support Package for LifeKeeper (steeleye-lkQWK) combined with the existing failover process of the LifeKeeper core allows system failover to occur with a greater degree of confidence in situations where total network failure could be common. This effectively means that failovers can be done while greatly reducing the risk of “split-brain” situations.

Install the Witness/Quorum rpm on all 3 nodes (node1, node2, witness):

# cd /tmp/install/quorumrpm -Uvh steeleye-lkQWK-9.0.2-6513.noarch.rpm

On ALL 3 nodes (node1, node2, witness), edit /etc/default/LifeKeeper, set

NOBCASTPING=1

On ONLY the Witness server (“witness”), edit /etc/default/LifeKeeper, set

WITNESS_MODE=off/none

Install a License key

On all 3 nodes, use the “lkkeyins” command to install the license file that you obtained from SIOS:

# /opt/LifeKeeper/bin/lkkeyins <path_to_file>/<filename>.lic

Start LifeKeeper

On all 3 nodes, use the “lkstart” command to start the cluster software:

# /opt/LifeKeeper/bin/lkstart

Set User Permissions for LifeKeeper GUI

On all 3 nodes, edit /etc/group and add the “tony” user (or whatever username you are logged in as) to the “lkadmin” group to grant access to the LifeKeeper GUI.  By default only “root” is a member of the group, and we don’t have the root password in :

# vi /etc/group

lkadmin:x:502:root,tony

Open the LifeKeeper GUI

Make a VNC connection to the Public IP address of node1.  Based on the VNC and Firewall Rule configuration from above, you would connect to <Public_IP>:2 using the VNC password you specified earlier.  Once logged in, open a terminal window and run the LifeKeeper GUI using the following command:

# /opt/LifeKeeper/bin/lkGUIapp &

You will be prompted to connect to your first cluster node (“node1”).  Enter the linux userid and password specified during VM creation:

lk-gui-connect1

Next, connect to both “node2” and “witness” by clicking the “Connect to Server” button highlighted in the following screenshot:

lk-gui-connect2

You should now see all 3 servers in the GUI, with a green checkmark icon indicating they are online and healthy:

lk-gui-connect3

Create Communication Paths

Right-click on “node1” and select Create Comm Path

comm path1

Select BOTH “node2” and “witness” and then follow the wizard.  This will create comm paths between:

  • node1 & node2
  • node1 & witness

comm path2

A comm path still needs to be created between node2 & witness.   Right click on “node2” and select Create Comm Path.  Follow the wizard and select “witness” as the remote server:

comm path3

At this point the following comm paths have been created:

  • node1 <—> node2
  • node1 <—> witness
  • node2 <—> witness

The icons in front of the servers have changed from a green “checkmark” to a yellow “hazard sign”.  This is because we only have a single communication path between nodes.

If the VMs had multiple NICs (information on creating Azure VMs with multiple NICs can be found here, but won’t be covered in this article), you would create redundant comm paths between each server.

comm path4

To remove the warning icons, go to the View menu and de-select “Comm Path Redundancy Warning”:

comm path5

Result:

comm path6

Verify Communication Paths

Use the “lcdstatus” command to view the state of cluster resources.  Run the following commands to verify that you have correctly created comm paths on each node to the other two servers involved:

# /opt/LifeKeeper/bin/lcdstatus -q -d node1

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO

node2    TCP     10.128.0.2/10.128.0.3  ALIVE        1

witness  TCP     10.128.0.2/10.128.0.4  ALIVE        1

#/opt/LifeKeeper/bin/lcdstatus -q -d node2

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO

node1    TCP     10.128.0.3/10.128.0.2  ALIVE        1

witness  TCP     10.128.0.3/10.128.0.4  ALIVE        1

#/opt/LifeKeeper/bin/lcdstatus -q -d witness

MACHINE  NETWORK ADDRESSES/DEVICE   STATE     PRIO

node1    TCP     10.128.0.4/10.128.0.2  ALIVE        1

node2    TCP     10.128.0.4/10.128.0.3  ALIVE        1

Create a Data Replication cluster resource (i.e. Mirror)

Next, create a Data Replication resource to replicate the /var/lib/mysql partition from node1 (source) to node2 (target).  Click the “green plus” icon to create a new resource:

data replication1

Follow the wizard with these selections:

Please Select Recovery Kit:  Data Replication
Switchback Type: intelligent
Server: node1
Hierarchy Type: Replicate Exiting Filesystem
Existing Mount Point: /var/lib/mysql
Data Replication Resource Tag: datarep-mysql
File System Resource Tab: /var/lib/mysql
Bitmap File: (default value)
Enable Asynchronous Replication:  No

After the resource has been created, the “Extend” (i.e. define backup server) wizard will appear.  Use the following selections:

Target Server: node2
Switchback Type: Intelligent
Template Priority: 1
Target Priority: 10
Target Disk: /dev/sdb1
Data Replication Resource Tag: datarep-mysql
Bitmap File: (default value)
Replication Path: 10.128.0.2/10.128.0.3
Mount Point: /var/lib/mysql
Root Tag: /var/lib/mysql

The cluster will look like this:

data replication2

 

Create the MySQL resource hierarchy

Next, create a MySQL cluster resource.  The MySQL resource is responsible for stopping/starting/monitoring of your MySQL database.  To create, click the “green plus” icon to create a new resource:

Follow the wizard with to create the IP resource with these selections:

Select Recovery Kit: MySQL Database
Switchback Type: Intelligent
Server: node1
Location of my.cnf: /var/lib/mysql
Location of MySQL executables: /usr/bin
Database Tag: mysql

Extend the IP resource with the following selections:

Target Server: node2
Switchback Type: intelligent
Template Priority: 1
Target Priority: 10

As a result, your cluster will look as follows.  Notice that the Data Replication resource was automatically moved underneath the database (dependency automatically created) to ensure it’s always brought online before the database:

 

Create an Internal Load Balancer

If this was a typical on-premises cluster using either physical or virtual servers, you’d be done at this point.  Clients and Applications would connect into the Virtual IP of the cluster (10.128.0.99) to reach the active node.  In Google Cloud, this doesn’t work without some additional configuration.

To connect into the cluster, Google provides a feature were you can setup an Internal Load Balancer (ILB).  Essentially, when you connect to the IP address of the ILB (which we will set to 10.128.0.99) you are routed to the currently active cluster node.

Create a TCP Load Balancer:

This will be an internal load balancer, so select “Only between my VMs”:

Next, give the load balancer a name (“internal-lb”) and then click Backend configuration:

Make sure you select the proper region (“us-central1”) and configure the Backend.  Click “+ Add backend” and add both instance groups (instance-group-1a AND instance-group-1b):

The load balancer decides which node to route traffic to based on a health check.  In this example, a health check to see if MySQL is running (checking for default port 3306) will be configured.  Select “Create a health check”:

Give the new health check a name (“mysql-health-check”) and configured it for TCP port 3306:

Next, configure the Frontend of the load balancer.  Select “Frontend configuration” and under IP address, define a custom static internal IP of 10.128.0.99.  The port should be 3306, which is the default port for MySQL:

Finally, review and finalize load balancer creation.  Click “Create”:

Result.  You will see that the load balancer is online, however, it’s not showing either instance group as healthy! (as indicated by the 0/0).  We will fix that in the next section:

Create Firewall Rules for Internal Load Balancer

Per Google documentation (see “Configure a firewall rule to allow internal load balancing” section), two firewall rules need to be created.  The first allows traffic to the load balancer and from the load balancer to the instances. The second allows health check probes from the health checker.

Create a new firewall rule:

Give the new rule a name (allow-internal-lb) and specify “10.128.0.0/20” as the source IP range.  The allowed protocols and ports should be “tcp:3306“:

After clicking “Create”, you’ll be returned to the Firewall rules page and can see the newly created rule in the list.  Click “Create Firewall Rule” again so we can create the 2nd required rule:

Give the 2nd rule a name (“allow-health-check”).  TWO different source IP ranges will need to be defined:

  • 130.211.0.0/22
  • 35.191.0.0/16

Note: Always a good idea to double check with the google cloud documentation to make sure that these IP ranges are still valid.

You should now see both newly created firewall rules in the list:

Test Cluster Connectivity

At this point, all of our Google Cloud and Cluster configurations are complete!

Cluster resources are currently active on node1:

You will also notice that the Internal Load Balancer is showing node1, which is a member of instance-group-1a as “healthy” and as such, routing traffic coming into the virtual IP (10.128.0.99) to node1:

SSH into the witness server, “sudo su -” to gain root access.   Install the mysql client if needed:

[root@witness ~]# yum -y install mysql

Test MySQL connectivity to the cluster:

[root@witness ~]# mysql --host=10.128.0.99 mysql -u root -p

Execute the following MySQL query to display the hostname of the active cluster node:

mysql> select @@hostname;
+------------+
| @@hostname |
+------------+
| node1      |
+------------+
1 row in set (0.00 sec)
mysql>

Using LifeKeeper GUI, failover from Node1 -> Node2″.  Right click on the mysql resource underneath node2, and select “In Service…”:

After failover, resources are brought online on node2:

You will now see that the Internal Load Balancer is showing instance-group-1b, which contains node2, as healthy.  Traffic is now routed to node2:

After failover has completed, re-run the MySQL query.  You’ll notice that the MySQL client has detected that the session was lost (during failover) and automatically reconnects:

Execute the following MySQL query to display the hostname of the active cluster node, verifying that now “node2” is active:

mysql> select @@hostname;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    48
Current database: mysql
+------------+
| @@hostname |
+------------+
| node2      |
+------------+
1 row in set (0.56 sec)
mysql>