Creating a Red Hat Cluster: Part 3

Here is the third in a series of articles describing how to create a Linux Red Hat/CentOS cluster. By the end of this article we will have a working cluster; all that will be left to do is the creation of the GFS filesystem and the scripts that will start, stop and report the status of our FTP and web services. You can refer to “Part 1”, “Part 2” and our network diagram before reading this article, but for now let’s move on and continue our journey into building our cluster.

Defining the fencing devices

A fencing device is used by the cluster software to power off a node when it is considered to be in trouble. We need to define a fencing device for each of the nodes that we defined in the previous article.

In our cluster, the fencing device used is the HP iLO interface. Select “HP ILO Device” from the pull-down menu and enter the information needed to connect to it. If you would like to see which fencing devices are supported by Red Hat Cluster, you can consult the fencing FAQ on this page.

You could use manual fencing for testing purposes, but it is not supported in a production environment, since manual intervention is required to fence a server.

  • We chose to prefix the fencing device name with a lowercase “f_”, followed by the name assigned to its IP in /etc/hosts.
  • The login name used to access the device.
  • The password used to authenticate to the device.
  • The host name, as defined in our hosts file, used to connect to the device.

 

Repeat the operation for each server in the cluster.
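Before relying on a fence device, it is worth testing it from the command line. Here is a minimal sketch using the “fence_ilo” agent shipped with the cluster suite; the iLO host name, login and password below are only placeholders for the values you entered in the GUI:

root@bilbo:~# fence_ilo -a ilo-bilbo.maison.ca -l fence_login -p fence_password -o status

The “-o status” action only queries the power state of the server, so it is safe to run; “-o off” or “-o reboot” would actually fence the node.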

Defining the fence level

Next, we need to associate a fencing device with each server in our cluster. Let’s begin by clicking on the node name “hbbilbo.maison.ca”, then pressing the button named “Manage Fencing For This Node” at the bottom right of the screen. You will be presented with a screen similar to the one below.

Now click on the “Add a New Fence Level” button; this will create a new fence level named “Fence-Level-1”.

Next, select “Fence-Level-1” on the left of the screen and then click on the “Add a New Fence to this Level” button. A little pop-up will appear, allowing you to assign a fencing device to the host. In our case we select the fencing device named “f_bilbo”. We have just created an association between a fencing device and a node.

Click on the “OK” button to close the screen above and then press the “Close” button.

Back on this screen, press the “Close” button again to complete the definition of our fence level.

 

We need to repeat the operation for each node in our cluster.

Once a fence level is created for all nodes, you can proceed with the next step.
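Once the configuration has been propagated to the cluster (we will do that a little later in this article), you can quickly verify that every node has a fence method attached. A simple sanity check, assuming the “ccs_tool” utility from the cluster suite is installed:

root@bilbo:~# ccs_tool lsnode
root@bilbo:~# ccs_tool lsfence

“lsnode” lists the cluster nodes with their votes and the fence type assigned to each, and “lsfence” lists the fence devices and the agent they use.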

Defining the failover domains

A failover domain is a named subset of cluster nodes that are eligible to run a cluster service in the event of a node failure. The names of our failover domains will always begin with a lowercase “fd_”; this is the standard that I chose.

So let’s begin defining our failover domains by clicking on “Failover Domains” on the right side of the GUI and pressing the “Create a Failover Domain” button.

Now enter the name of the failover domain; in this case, for the “bilbo” server it will be “fd_bilbo” to stick to our standard. Press the “OK” button to proceed.

The failover domain configuration lists all the servers that will be part of the specified failover domain. We chose to have each member of the cluster listed in our failover domains.

So select each server, one at a time, from the “Available Cluster Nodes” selection list until they have all been selected. When all the servers are selected, the selection list will display “No Cluster Nodes Available”.

We chose to restrict the failover domain to the list of servers we have included. For this cluster that means all of them, but there can be situations where you have six nodes in a cluster and want a service to run on only three of the servers because they have more CPU and memory than the other three.

We also want the service to be prioritized, so check the “Prioritized List” checkbox.

You may then highlight the nodes listed in the “Member Node” box and click the “Adjust Priority” buttons to select the order in which failover should occur. The node with the highest priority (priority 1) will be the initial node for running the service. If that node fails, the service will be relocated to the node with the next highest priority (priority 2), and so forth.

So for the failover domain “fd_bilbo”, “hbbilbo” will be prioritized (1); if that server is powered off, the service will be moved to “hbgandalf” (2), our passive node, and if it is unavailable the service will move to the server “hbgollum” (3).

The preferred server for the failover domain “fd_gollum” will be “hbgollum” (1); if it is not available, our passive server “hbgandalf” (2) will take over. In the eventuality that both servers are unavailable, “hbbilbo” (3) will take over.

For the failover domain “fd_gandalf” we have chosen to give “hbgandalf” priority one, “hbgollum” priority two and “hbbilbo” priority three.
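For reference, here is roughly what the GUI writes into “/etc/cluster/cluster.conf” for one of these failover domains. This is only a sketch based on the standard cluster.conf syntax; the exact attributes on your system may differ:

<failoverdomain name="fd_gollum" ordered="1" restricted="1">
<failoverdomainnode name="hbgollum.maison.ca" priority="1"/>
<failoverdomainnode name="hbgandalf.maison.ca" priority="2"/>
<failoverdomainnode name="hbbilbo.maison.ca" priority="3"/>
</failoverdomain>

The “ordered” attribute corresponds to the “Prioritized List” checkbox and “restricted” to the restriction of the domain to its listed members.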

Defining cluster resources

Cluster resources can be IP addresses, scripts, NFS and/or GFS filesystem mount points, or databases that are needed to run a service. In this case our services will be an FTP server and a web site.

If you remember our cluster network diagram, we’ll be running two services in our cluster. The first will be an FTP server named “ftp.maison.ca” running at IP 192.168.1.204, and the second a web service “www.maison.ca” running at IP 192.168.1.211.

So let’s define an “IP Resource” for our FTP server at 192.168.1.204. Click on “Resources” on the left part of the screen and then click on the “Create a Resource” button at the lower right of the screen. From the drop-down list, select the “IP Address” resource type.

Enter the IP of our FTP server, “192.168.1.204”, and make sure that the “Monitor Link” check box is selected. This means that if for one reason or another this IP is no longer responding, the cluster will trigger the move of our FTP service to the next server defined in the failover domain for that service.

Repeat the process for our web site at IP 192.168.1.211.
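Once both IP resources are created, the resources section of “/etc/cluster/cluster.conf” should contain something along these lines (again only a sketch based on the standard syntax, shown for reference):

<resources>
<ip address="192.168.1.204" monitor_link="1"/>
<ip address="192.168.1.211" monitor_link="1"/>
</resources>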

Defining our cluster services

We have decided to run two services in our cluster: one for the FTP server and one for the web site.

Now is the time to define these services. We will begin by creating our FTP service: click on “Services” at the bottom left of the screen and then press the “Create a Service” button to begin the process.

First, we need to assign a name to our service. We chose to prefix our services with “srv_”; this will be our standard. We also chose short names, because they display more nicely in the output of the “clustat” command that we will see later on. So enter the service name “srv_ftp” and click on the “OK” button.

Next you will see a screen similar to this one.

First, we need to assign a failover domain to our FTP service. As planned in our cluster network diagram, we are going to run the FTP service on the gollum server, so from the failover domain drop-down list we select the failover domain “fd_gollum” for our FTP service.

Next we need to make our FTP server IP (192.168.1.204) part of the service. To do so, click on the “Add a Shared Resource to this service” button and select the IP used for our FTP service.

After pressing the button above,  you should see a screen similar to this one.

Click on the IP that our FTP server will use (192.168.1.204) and press the “OK” button.

One last thing before finishing the definition of our FTP service: make sure you select “Relocate” as the “Recovery Policy”. This allows our service to be moved automatically to another server if the FTP service becomes inaccessible.

We can now press the “Close” button and repeat the operation to create our web service.

Follow the same procedure as for the FTP service to create the web service.

The name we decided to give the web site service is “srv_www”.

The web site service “srv_www” will be part of the “fd_bilbo” failover domain, so it will start on the “bilbo” server.

If the web service becomes unavailable, it will be relocated to another server within the cluster. The server to which the web service will be moved is based on the priority order we defined in the failover domain, and the IP will automatically move with the service when it is relocated. We will demonstrate that later in this article.
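At this point the service definitions in “/etc/cluster/cluster.conf” should look roughly like the sketch below (attribute values such as “autostart” may differ on your system):

<service autostart="1" domain="fd_gollum" name="srv_ftp" recovery="relocate">
<ip ref="192.168.1.204"/>
</service>
<service autostart="1" domain="fd_bilbo" name="srv_www" recovery="relocate">
<ip ref="192.168.1.211"/>
</service>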

Propagate cluster configuration

We should now have a working cluster. It does not have its final configuration, but we will be able to bring it up and move services from one node to another. The missing parts needed to finalize our cluster are the creation of the GFS filesystem and the scripts that will bring our FTP and web sites up and down. We will look at that in the next article.

But for now, let’s push this configuration to our cluster by pressing the “Send to Cluster” button. This will copy our new configuration file “/etc/cluster/cluster.conf” to every member of the cluster and activate it.

We will need to confirm our intention by pressing the “Yes” button in the pop-up below.

On my first attempt, I had some problems pushing the initial configuration to the other members of the cluster. If this happens, you may have to copy it manually. The cluster configuration file is stored in /etc/cluster and is named cluster.conf. If you ran the “system-config-cluster” GUI on the “bilbo” server, then issue the following commands on “bilbo” to copy the configuration file to the other servers:

scp /etc/cluster/cluster.conf  gandalf:/etc/cluster

scp /etc/cluster/cluster.conf  gollum:/etc/cluster

If the copy is done manually, you will have to restart the cluster services on each node or reboot all the nodes. Once you have a working cluster, you will not have to manually copy the cluster configuration file; pressing the “Send to Cluster” button will be enough. You can also use the “ccs_tool” command to propagate a cluster configuration change (described below).
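For reference, restarting the cluster services after a manual copy looks roughly like this on each node. This is only a sketch: it assumes the RHEL/CentOS 5 style init scripts, where the “cman” service wraps ccsd and fenced; on RHEL/CentOS 4, ccsd, cman, fenced and rgmanager are separate init scripts that are started in that order (and stopped in reverse).

root@gandalf:~# service rgmanager stop
root@gandalf:~# service cman stop
root@gandalf:~# service cman start
root@gandalf:~# service rgmanager start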

Manual adjustment to cluster configuration file

I made the following changes to the cluster configuration file to prevent the reboot of a server after it is fenced (for the HP iLO). When a server is fenced, its service is transferred to another node and the server that was hosting that service is powered off (fenced). If we do not make the following change, the server that was powered off will automatically reboot, and if the same error condition is still present (network, switch, FC problem) it could keep powering off and powering on over and over again. We want to eliminate that.

Every time you update the cluster configuration file manually, you need to increment the configuration version number so that the update is propagated to the other servers.

root@gandalf:~# vi /etc/cluster/cluster.conf

 

Increment the version number

Since we are changing the configuration file, we need to increment the version number.

Before the change

<cluster alias="our_cluster" config_version="74" name="our_cluster">

After the change

<cluster alias="our_cluster" config_version="75" name="our_cluster">

 

Changes to prevent rebooting of the node after a power-off

We need to make this modification in each “clusternode” section. It will prevent a restart of the node after it has been fenced.

Before the change

<clusternode name="hbbilbo.maison.ca" nodeid="1" votes="1">
<fence>
<method name="1">
<device name="f_bilbo"/>
</method>
</fence>
</clusternode>

After the change

<clusternode name="hbbilbo.maison.ca" nodeid="1" votes="1">
<fence>
<method name="1">
<device action="off" name="f_bilbo"/>
</method>
</fence>
</clusternode>

 

Increase resource manager verbosity

Here is something optional: if you are having problems with your cluster, increasing the verbosity of the resource manager daemon “rgmanager” may help you. Changing the <rm> line as shown below will send more debugging information to the /var/log/rgmanager file (syslogd needs a matching rule and a restart; see the example after the change).

Before the change

</fencedevices>
<rm>
<failoverdomains>

After the change

</fencedevices>
<rm log_facility="local4" log_level="7">
<failoverdomains>
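Since the <rm> tag above sends the rgmanager messages to the “local4” syslog facility, syslogd must be told where to write them. A minimal sketch, assuming the stock syslogd and the /var/log/rgmanager destination mentioned above: add the following line to /etc/syslog.conf, then restart syslogd.

local4.*                                /var/log/rgmanager

root@gandalf:~# service syslog restart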

Whenever you update the configuration file manually, you can use the “ccs_tool” command to propagate the new cluster configuration file to all cluster members. Don’t forget to update the version number first.

Distribute new cluster config file
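A sketch of the command, run from the node where the file was edited (the path is the standard location of the cluster configuration file):

root@gandalf:~# ccs_tool update /etc/cluster/cluster.conf

You can then check that the new configuration version is active on every node with the “cman_tool version” command.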

 

Checking cluster functionality

We can check the status of our cluster in two different ways. The simplest one is to use the “clustat” command (see output below). If you want a continuous display, you can add “-i 2” to the “clustat” command to have the output refreshed every two seconds (press CTRL-C to stop the display).

clustat command output
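The output of “clustat” on this cluster should look roughly like the sketch below; the exact column layout varies between cluster suite releases, and the node and service names are the ones defined in this article.

root@gollum:~# clustat
Member Status: Quorate

Member Name                        ID   Status
------ ----                        ---- ------
hbbilbo.maison.ca                     1  Online, rgmanager
hbgandalf.maison.ca                   2  Online, rgmanager
hbgollum.maison.ca                    3  Online, Local, rgmanager

Service Name                  Owner (Last)                  State
------- ----                  ----- ------                  -----
service:srv_ftp               hbgollum.maison.ca            started
service:srv_www               hbbilbo.maison.ca             started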

This is the normal output you should get: all our members are “Online” with the resource manager running on them.

The name of each member of the cluster, its “Node ID” and its status are displayed.

Our two services, “srv_ftp” and “srv_www”, are running (started) on their selected servers.

system-config-cluster (cluster mgmt tab)

The second way to get the cluster status is to run the cluster GUI (system-config-cluster) and then click on the “Cluster Management” tab.

In the upper part of the screen, we can see the current members of our cluster and their “Node ID”.

Below that, we can see the status of each service defined within the cluster.

We can also see that our service “srv_ftp” is running (started) on “hbgollum” and that the service “srv_www” is running on “hbbilbo.maison.ca”.

From the “Cluster Management” tab, we can disable (stop), enable (start) or restart each of these services; the same operations can also be done from the command line, as shown below. More on this later.
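Here is a quick sketch of the corresponding “clusvcadm” commands (the service and node names are the ones used in this article):

Disable (stop) a service:      clusvcadm -d srv_www
Enable (start) a service:      clusvcadm -e srv_www
Relocate a service to a node:  clusvcadm -r srv_www -m hbgollum.maison.ca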

Let’s check if our service IPs are alive.

The IP address of the FTP service that we defined is “192.168.1.204”, and from the “clustat” command above we know that it is running on the server “gollum”. So let’s log on to that server and check whether that IP is active. Be aware that the output of “ifconfig -a” does not include our FTP IP; we need to use the “ip” command to see it. From the output below, we can see that the server IP “192.168.1.104” is active on “eth0” and that our FTP server IP “192.168.1.204” is also active on the same interface. So our cluster software is doing its job: we have the “ftp.maison.ca” IP defined on the interface “eth0”. You will also notice that on “eth1” our heartbeat IP “10.10.10.104” is active.

root@gollum:~# ip addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:50:da:68:df:9b brd ff:ff:ff:ff:ff:ff
inet 192.168.1.104/24 brd 192.168.1.255 scope global eth0
inet 192.168.1.204/24 scope global secondary eth0
inet6 fe80::250:daff:fe68:df9b/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:02:a5:b1:a0:d4 brd ff:ff:ff:ff:ff:ff
inet 10.10.10.104/24 brd 10.10.10.255 scope global eth1
inet6 fe80::202:a5ff:feb1:a0d4/64 scope link
valid_lft forever preferred_lft forever
4: sit0: <NOARP> mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
root@gollum:~#

 

On the server “bilbo” we have our web IP (192.168.1.211) active on the interface “eth0” along with the server IP (192.168.1.111). Our heartbeat IP is defined on the interface “eth1” .

root@bilbo:~# ip addr list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:01:02:75:80:58 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.111/24 brd 192.168.1.255 scope global eth0
inet 192.168.1.211/24 scope global secondary eth0
inet6 fe80::201:2ff:fe75:8058/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:50:8b:f4:c5:59 brd ff:ff:ff:ff:ff:ff
inet 10.10.10.111/24 brd 10.10.10.255 scope global eth1
inet6 fe80::250:8bff:fef4:c559/64 scope link
valid_lft forever preferred_lft forever
4: sit0: <NOARP> mtu 1480 qdisc noop
link/sit 0.0.0.0 brd 0.0.0.0
root@bilbo:~#

I think we should stop the article here; we now have a working cluster that we will build upon in our next article.

In our next article, we will create our service scripts and our GFS filesystem so that our servers can share the same filesystem. This feature will give us the ability to share information between servers via a common filesystem, so when a service has to move from one server to another, it can keep using the same data it was using on the previous server.

I don’t know whether the next article will be the last one in this series on clustering, but we will see.

Stay tuned for Part 4 of this series of articles on creating a Linux Red Hat cluster.

Part 1 – Creating a Linux Red Hat/CentOS cluster

Part 2 – Creating a Linux Red Hat/CentOS cluster

Part 3 – Creating a Linux Red Hat/CentOS cluster

Part 4 – Creating a Linux Red Hat/CentOS cluster

Part 5 – Creating a Linux Red Hat/CentOS cluster

 
