High-Availability Solutions

Building Low Cost Data Centers Using Open Source UNIX-based Server Clusters

Mike Pietruszka & Rudy Host


Introduction
Synopsis
Cluster and Network Design
  Cluster Design
  Network Design
Technical Requirements
  Hardware Requirements
    Hardware - Servers
    Hardware - Network Devices
  Software Requirements
    Software - Operating System
    Software - Load Balancers
    Software - Web Services
    Software - Database Services
    Software - DNS
    Software – NTP
In-Depth
  In-Depth - Load Balancers
    Server Nodes Configuration - /etc/network/interfaces
    Server Node Configuration - /etc/sysctl.conf
    Load Balancer Configuration - /etc/ha.d/ha.cf
    Load Balancer Configuration - /etc/ha.d/haresources
    Load Balancer Configuration - /etc/ha.d/authkeys
    Load Balancer Configuration - /etc/ha.d/ldirectord.cf
  In-Depth - Web Services
  In-Depth - Database Services
  In-Depth - DNS Services
  In-Depth - Management Services
Technical Problems
  MySQL Database Replication Issues
  MySQL Virtual IP Issues


Introduction

There comes a time in an IT student's life when he is about to pause his education and graduate. By pause, we mean that as IT students, we never end our education. With technology changing daily, we have to stay on top of things at all times and learn something new every day.

Therefore, when we graduate from college or a course, we only pause our education before the next chapter begins. This pause has to be summarized, usually with a test or some sort of exam. But for us IT students, there is a project that encompasses everything we have learned in our college careers: the senior project, which is there for us to show off our skills. The senior project is a weeks-long exam in which we pull together our collective knowledge and create something that shows off our experience and proves to our faculty and instructors that we have what it takes to go out into the world and work for employers, where we will have to take on numerous such projects.

Our group's senior project was to build a high-availability, load-balancing UNIX cluster that provides high-uptime, self-redundant, and scalable network services. It is meant to give enterprises the same level of service as more expensive products for a quarter of the cost, and to serve businesses that are still young and cannot afford colocation. Our mission statement is aimed at whoever is interested in our project and its goal, which is usually a business startup or someone looking to host various services on an inexpensive array of homebrewed supercomputer clusters. But first off, what is a cluster?

A cluster is a group of computers that are linked together to form a single computer. There are three types of clusters:

• High Availability (HA),

• Beowulf, and

• Mosix.

High Availability clusters are fault-tolerant systems designed to keep network services as close to 100% uptime as possible. They have two modes of operation: Standby and Takeover. In the Standby configuration, only one node performs work while the other waits to step in, whereas in the Takeover configuration the work is done on both machines, and if one fails the other takes over its load. Beowulf clusters, on the other hand, are used whenever there is a need for a huge amount of processing power (such as rendering). Beowulf clusters (“poor people’s supercomputers”) are implemented through parallel computing, where the workload is distributed across numerous machines. Finally, Mosix clusters present a single system image for all the machines within the cluster. This simply means that the cluster appears as one machine, with its hardware resources shared across numerous physical computers. For example, when a person logs onto any of the Mosix nodes, everything will be the same across the board. Mosix sits at the kernel level, which means it does not require any upper-level software packages to distribute tasks or resources.

As previously stated, we chose to build a High-Availability cluster to show that cheap high-availability solutions can be just as viable as $90,000 F5 Big-IP traffic managers. UNIX computing and clustering play a huge role in today’s IT world, where time is money. A new IT specialization has also emerged, called “Cloud Computing”, which deals with providing centralized online services that can be accessed by large numbers of people from anywhere in the world. Clusters are used for cloud computing because they provide high availability and load balancing to ensure 24/7 uptime and no user disruption.

Clustering is also a huge topic in large enterprises as in today’s tough economic times, many of them are trying to downsize or consolidate their IT infrastructure. Cluster computing can help that by squeezing higher processing power and higher availability out of cheaper and less powerful machines (that is how Google started). Clustering also provides load balancing for large websites and availability for many key services such as database systems, DNS and DHCP servers, Mail servers, as well as DFS systems. Buying powerful systems to provide a service can be costly, but building a cluster system out of small, cheap computers can pay off with much better performance than a single machine and with lower TCO (Total Cost of Ownership).

We believe we can learn a lot about providing network services with the least downtime and the best achievable performance. This is an obscure area of study in many universities, and that is why we felt that working on such a project was important: to show others that this is a growing area of IT that needs to be discussed and studied. With more and more companies relying on redundancy, high uptime, and mission-critical recovery techniques, it is imperative to have at least some experience or understanding in this field.


Synopsis

Our cluster was built and tested to allow a large number of users to access its resources (such as websites with dynamic content, database systems, and DNS servers). What made our cluster a superb project was the idea that it could span multiple computers both locally and globally, no matter the location. We also ensured that it was fully redundant and aware of the status of its nodes (unlike DNS Round Robin, which simply cycles through DNS entries without keeping any status on the nodes).

What we set out to accomplish in this project was the following:

• Build a load-balanced high-availability UNIX cluster out of 2 load balancers and 2 nodes

• Provide non-interruptive access to network services such as web servers, database servers, and the like

• Implement successful static content (data that does not change often) replication without the use of expensive NAS/SAN/NFS solutions

• Implement easy resource management through various CLI/GUI tools

We can honestly say that all goals set at the beginning of our project were met, except for resource management through GUI tools. We had to resort to the command line and had only limited graphical user interface management (done through Webmin). Nevertheless, even with time constraints and limited hardware resources, we were able to build a successful cluster that ran various Content Management Systems (such as Drupal and MediaWiki) and numerous MySQL databases which were synchronized in real time.

Some of the limitations that we encountered in this project include the following:

• Bandwidth constraints

o Our nodes operated on a 10 Mbit/s link between each other, which proved to be a huge bottleneck when transferring large amounts of static data. rsync synchronization took a long time over large amounts of data, and because of the short interval between syncs, this quickly became a serious issue.

• Server hardware limitations

o Pentium 4 machines are great if you are providing resources that are not getting a lot of traffic – but if you start servicing more users that are trying to access more data, you will hit the bottleneck extremely fast.

• Database replication was limited to two nodes

o Time and hardware constraints limited us from deploying MySQL Cluster
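To get a feel for the bandwidth constraint listed above, the short sketch below estimates how long a full copy of a data set would take over links of different speeds. The 2 GB data-set size is a made-up figure for illustration, not a measurement from our project, and the calculation ignores protocol overhead and rsync's delta transfers, so real numbers would differ.

```python
def transfer_seconds(size_bytes, link_mbit_per_s):
    """Ideal time to move size_bytes over a link, ignoring all overhead."""
    bits = size_bytes * 8
    return bits / (link_mbit_per_s * 1_000_000)


# Hypothetical 2 GB of static content (an assumption, not a project figure).
SIZE = 2 * 1000**3

for speed in (10, 100, 1000):
    minutes = transfer_seconds(SIZE, speed) / 60
    print(f"{speed:>5} Mbit/s: {minutes:6.1f} min")
```

Even in this idealized case, the 10 Mbit/s link needs close to half an hour for a full copy, which is why a short sync interval can overlap with the previous run.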

Nevertheless, we were still able to build a successful UNIX cluster that gave us a great understanding of High Availability clustering. We can honestly say that we learned what we will need in the future to implement more robust clusters that span many machines.

There is something else worth mentioning: this project was built entirely out of Open Source Software. Everything that the servers and routers ran was Open Source, which is growing in popularity all over the world. OSS allows programmers to create software that is usually free and stable, and that is reviewed by many people to help ensure it works on a variety of hardware. Open Source licenses also ensure that creators receive credit for their work whenever it changes hands. OSS gives authors the freedom to choose their type of license and allows people to change the code to their liking or needs. Did we mention that most Open Source Software is also free and backed by huge communities? It is gaining support in many of the world’s governments, institutions, and businesses. It is too good a thing to pass up.


Cluster Design

In order to successfully execute this clustering project, it was imperative to come up with a solid network design. We had to draw up numerous designs and schemas for the cluster farm in order to know what needed to be done. In the end we came up with a solid schema that fit our hardware resources. We decided to use two machines for server nodes, two for load balancers, and one for DNS / NTP services.

The two load balancing nodes would have their own IP addresses, whereas the server nodes would have their own IP addresses as well as a virtual IP address handled by a Loopback interface. Users would access the services hosted by the cluster through that virtual IP address. This IP address had to be shared by two or more server nodes, and the load balancers listen on it and direct traffic. The traffic is directed by a process called Linux Director, which uses scheduling algorithms to forward the incoming requests. Both load balancers monitor each other and the server nodes through the use of Heartbeat. We also decided that both dynamic data (content that changes often and on the fly) and static data (content that does not change too often) would reside on the two server nodes. This data would be kept exactly the same on both machines, since connecting users could be sent to either server depending on the traffic. We chose to keep both types of data on the same machines, rather than on dedicated NAS/SAN and DRBD (Distributed Replicated Block Device) [1] solutions or another server with GlusterFS / NFS, due to cost effectiveness and lack of hardware.

[1] DRBD (Distributed Replicated Block Device) is a distributed storage system for Linux that is similar to RAID 1, except that it runs over a network. Its main advantage is that the data is replicated across networked machines instead of sitting on one machine, which would be a single point of failure.


Figure 1: Cluster Workflow Diagram

There is, of course, a reason for having two load balancers. With a single load balancer, if it goes offline, nothing can reach the server nodes, as all traffic passes through the load balancer. With two load balancers, on the other hand, one of them stays active while the other stays in passive mode, only listening in case the first one goes offline. If that happens, the passive balancer becomes active and takes over the first one’s tasks.
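The active/passive behavior described above can be sketched in a few lines. This is only an illustration of the idea, not Heartbeat's actual implementation: the class name, the `deadtime` value of 10 seconds, and the timestamps are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StandbyBalancer:
    """Passive load balancer that watches the active peer's heartbeats."""
    deadtime: float              # seconds of silence before the peer is declared dead (assumed)
    last_heartbeat: float = 0.0  # time the last heartbeat was heard
    active: bool = False         # False = passive, True = has taken over

    def on_heartbeat(self, now):
        # A heartbeat from the peer proves it is still alive.
        self.last_heartbeat = now

    def tick(self, now):
        # Promote ourselves if the peer has been silent longer than deadtime.
        if not self.active and now - self.last_heartbeat > self.deadtime:
            self.active = True
        return self.active


standby = StandbyBalancer(deadtime=10.0)
standby.on_heartbeat(now=2.0)
print(standby.tick(now=5.0))    # peer heard 3 s ago: stays passive
print(standby.tick(now=13.0))   # 11 s of silence: takes over
```

Once promoted, the standby in this sketch stays active even if the peer returns, which is one of the two possible policies; a real deployment would choose between them in its cluster configuration.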


Network Design

The cluster’s network design was quite simple. We used a single Class C subnet, which allowed us to have up to 254 hosts. A Class A or B network would have been more realistic in a true-to-life scenario, as the cluster could easily grow beyond 254 hosts depending on the needs. We chose to keep the server nodes at the beginning of the subnet, while the load balancers and the gateway stayed towards the end. This was done more for ease of management and network aesthetics. A tidy network is every sysop’s dream.
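The 254-host figure can be checked with Python's standard `ipaddress` module. The sketch assumes the subnet was 192.168.2.0/24, which matches the virtual IP address used elsewhere in this document; the 172.16.0.0/16 network is just an example of a larger (Class B sized) range.

```python
import ipaddress

# Assumed subnet; it is consistent with the 192.168.2.235 virtual IP.
net = ipaddress.ip_network("192.168.2.0/24")

hosts = list(net.hosts())       # excludes the network and broadcast addresses
print(len(hosts))               # number of usable host addresses
print(hosts[0], hosts[-1])      # first and last usable address

vip = ipaddress.ip_address("192.168.2.235")
print(vip in net)               # the virtual IP sits inside the subnet

# For comparison, a Class B sized network (example range only).
bigger = ipaddress.ip_network("172.16.0.0/16")
print(bigger.num_addresses - 2)
```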


Hardware Requirements

Hardware - Servers

The hardware in our project consisted of 5 IBM NetVista computers. Each machine had a 2.8 GHz Pentium 4 processor, 1 GB of RAM, and a 40 GB hard drive. Two of the machines were set aside as the server nodes on which the data would be stored. Another two were reserved for load balancing, and the last machine was our router / DNS server. We also used unrelated machines for management purposes, as we built our project with remote management in mind.

These are by far not the fastest machines, but they were the only ones available for our project. Faster machines would have been more stable, would have been able to serve more clients, and, in case something went wrong, would have processed the data replication much more quickly. Having 100 Mbit/s full-duplex Ethernet cards was a plus, but across the 10 Mbit/s switch we had at hand, data syncing hit the bottleneck quite easily.


Hardware - Network Devices

Our network devices consisted of a server-based router, a dedicated gateway router, and a Cisco Catalyst 1900 series switch. At first we went with a dedicated server-based router, which ran pfSense Firewall on top of a FreeBSD subsystem. Unfortunately, pfSense Firewall was limited in terms of DNS and Time Server daemons, which were necessary in our project.

One must also note that this was not a grand-scale project. Our main goal was to simulate a data center operation on a small scale. If it were up to us, we would have been running multiple rack-mount servers with dedicated fiber channels, F5 Big-IP local and global traffic managers, and full gigabit Ethernet switches. On top of that, we would have implemented multiple offsite backup plans. Sadly, that is nothing but every sysadmin’s dream, and it was not viable in this project. Besides, our job was to create an inexpensive alternative to all the expensive high-availability solutions. We also took on this project with very limited time and little planning.


Software Requirements

Software - Operating System

The operating system installed on the computers was Ubuntu Server Edition 8.04 LTS. This wasn’t bleeding-edge software, as bleeding edge would most likely have broken many of the software packages we used or overwritten our configuration files without our knowledge. We also looked into other UNIX-like systems, such as Arch Linux (which is known for its fast, well-tuned startup sequence and general responsiveness, its Pacman package manager, and a simplicity similar to Slackware Linux and other minimalistic distributions) and FreeBSD (which is best known for being the most popular BSD variant and for its security), but Ubuntu Server proved to be the one with the fewest problems.

Ubuntu Server seemed like the best solution for this project. It is widely used, and its Debian-based apt-get package manager is one of the best package managers created for Linux and UNIX-like systems. With Ubuntu Server, we also received a very small and basic operating system without all the bells and whistles that we did not need (though we could add them at any time). Ubuntu is also known for its extensive and rapidly growing community. Distrowatch.com currently ranks Ubuntu as the number one Linux distribution, and there is a good reason behind that.

Another reason we chose Ubuntu Server was simply that it would give us the fewest problems. Finding support on the Internet, as well as howtos and tutorials, is extremely easy, as there is a huge number of them available. Most of the time, everything will work on Ubuntu (even though we may suffer performance losses compared to highly optimized distributions), and that we could live with during the course of this project. With only limited time (10 weeks), we had to get going to get this project done. We could not afford to search through tons of documentation to fix one problem only to find out that it was an issue with code beyond our scope, or to get stuck on a compile that never wanted to cooperate with the library set provided in a specific Linux distribution or UNIX variant.


Software - Load Balancers

Our load balancers used the Heartbeat package from the Linux High Availability project, which provides clustering solutions for UNIX-like operating systems in order to promote reliability, availability, and serviceability of network services. The package is a communications module that is now part of numerous clustering projects, such as Pacemaker, Linux Virtual Server, and UltraMonkey, which provide Cluster Resource Management. Heartbeat is now simply a messaging layer (just like OpenAIS) that supports two-node configurations, whereas Pacemaker is an advanced resource manager which allows monitoring of the cluster nodes and configurations with many more nodes. We decided to implement an older version of Heartbeat along with the UltraMonkey load balancer / resource monitor instead of Pacemaker, as it was simpler and less time consuming, and it gave us a solid understanding of clustering upon which we could later implement more advanced CRMs. It also suited our hardware, as we only planned to run two server nodes. The older version of Heartbeat still provides resource monitoring and management, which allows the cluster to take failed nodes offline and shift the traffic onto the remaining nodes. Either way, Heartbeat can still be used in UNIX clusters, and it provides a great foundation for more complex implementations. To this date, there are thousands of Heartbeat configurations in operation around the world.


Software - Web Services

The web services hosted on our cluster consisted of a basic LAMP stack. LAMP is a common acronym in the Linux community: Linux, Apache, MySQL, and PHP, a set of packages which together form a basic web server. We figured that this would be the best representative of a service that needs to be load balanced as well as up 24/7, with as little downtime as possible. Simply put, LAMP provides web site hosting services. It is one of the most popular ways to host a website on the Internet, as it runs on around 50% of the web servers currently online. The key component of every LAMP stack is the Apache HTTP web server, which played an important role in the early growth of the WWW. In development since 1995, Apache played a major role in our project, as it was one of the original services that we provided on our cluster. Redundancy and high availability are important when it comes to web servers, as they are accessed by a very large number of people. Each connection is handled by a separate process or thread in the operating system, which requires additional processor resources. Through load balancing, we were able to divide the work of the servers by moving traffic to other server nodes. This allowed for greater performance and better uptime, as the servers locked up far less often under load.


Software - Database Services

The database services are as much a part of the web services section as they are a standalone service package that does not require Apache or PHP. This means that users could connect to the cluster through a virtual IP address to store data in databases or run queries against them. We implemented MySQL, as it is currently one of the most popular RDBMSs in the wild. Of course, with a little bit of tweaking, we could have implemented PostgreSQL or something else.


Software - DNS

DNS played a huge (but transparent) role in our project, as it provided domain name resolution. Each and every host in our cluster needed to have its IP address mapped to a hostname. This spares the user from typing in the IP address whenever they want to access a host; instead, they can type a human-readable name that is then resolved to the IP address. The simplest way operating systems do this is through the hosts file. On Windows, that file is located under the System32 directory (in drivers\etc), and on UNIX-like systems it is located in the /etc/ directory. When a network contains many hosts, it becomes cumbersome to edit those files on every machine.

Instead, a DNS server is used to provide a single point of management and querying of DNS entries. Our DNS server ran the BIND9 daemon, which is by far the most popular DNS server software available.


Software – NTP

In order for replication to function correctly, it is important that the time on all of our systems is synchronized. To achieve this, we used the Network Time Protocol (NTP). We dedicated one system to act as a local NTP server. It uses the ntpd application in both client and server modes. It is configured to connect to 4 random time servers on the Internet, as provided by the pool.ntp.org project, and to calculate the correct time based upon their responses. This method may not achieve the exact time, but it will be accurate to within a few hundred milliseconds, which is close enough for our purposes.

Once this synchronization occurs, the server can provide the time to the local systems on our network. The difference between our NTP client machines and the NTP server is that the clients connect only to our local server and do not operate in server mode. Using the same calculations as the server, the clocks on our servers were accurate to each other within 10 milliseconds, which is more than adequate for replication to work.
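The calculation mentioned above rests on four timestamps per exchange: when the client sent the request (t0), when the server received it (t1), when the server replied (t2), and when the client received the reply (t3). The sketch below applies the standard NTP offset and round-trip delay formulas to made-up timestamps; ntpd itself does considerably more (filtering samples and selecting among several servers), so this is only the core arithmetic.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Standard NTP clock offset and round-trip delay from four timestamps.

    t0: client transmit time (client clock)
    t1: server receive time  (server clock)
    t2: server transmit time (server clock)
    t3: client receive time  (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Hypothetical exchange: the client clock runs 50 ms behind the server,
# each network leg takes 10 ms, and the server needs 1 ms to answer.
offset, delay = ntp_offset_and_delay(t0=100.000, t1=100.060,
                                     t2=100.061, t3=100.021)
print(f"offset={offset:.3f} s, delay={delay:.3f} s")
```

A positive offset means the client clock is behind the server and should be moved forward by that amount.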


In-Depth - Load Balancers

As previously stated, our load balancers ran the Heartbeat / Ultra Monkey packages, which gave us basic abilities to stop and start resources and monitor their availability. Heartbeat is, in reality, an authenticated communications tool used across the network for node failure detection. It can be configured to talk using unicast, broadcast, or multicast UDP packets, as well as non-IP serial links. Heartbeat tracks the health of the machines through Heartbeat pings. On top of Heartbeat, it is imperative to implement a Cluster Resource Manager (CRM) in order to maintain resource configurations and management. Older versions of Heartbeat and Ultra Monkey (which ships a patched Heartbeat 1.2.x version) use the haresources file and Linux Virtual Server as the CRM, whereas newer versions of Heartbeat require the Pacemaker CRM (crm.xml file), which handles resource management better and comes with optional GUI tools. Nonetheless, even the base Heartbeat package that we used provides high availability for many network services, such as Apache, MySQL, BIND9, IBM’s WebSphere, OpenLDAP, FTP, and the like.

To manage resource availability across our two server nodes, we decided to set up two load balancers, as this secured our network services’ availability in case one of the load balancers was taken offline. If that happened, the other load balancer simply took over the job from the first one. The two load balancers shared a virtual IP address that was also assigned on the server nodes (in our case, 192.168.2.235), which users would use to access the various network resources. A user would send a request packet, which UltraMonkey would accept. The load balancers would then split the incoming requests between the nodes, selecting a server to answer each request. Each load balancer would also monitor the other, so both of them were on the lookout for any failures. This proved to be a much better solution than RoundRobin DNS [2]. If a load balancer does not receive a response from a node in a timely manner (with a timeout that we can of course set ourselves to fine-tune the performance), UltraMonkey will mark the node as dead and take it out of the operating pool.

To properly configure the load balancers we needed to set up some configuration files in the /etc/ directory on both the server nodes and the load balancing nodes. This directory is roughly comparable to the System32 directory on Windows operating systems: it houses the configuration files used by the operating system, and it is also the directory where a UNIX administrator will spend most of their working life. We set up the following files in order to have our load balancers run without any issues:

• /etc/network/interfaces
  o This is the standard Ubuntu network interface configuration
• /etc/sysctl.conf
  o This file sets up packet forwarding and ARP behavior for the virtual IP address
• /etc/ha.d/ha.cf
  o Sets specified machines as load balancers
• /etc/ha.d/haresources
  o Sets up the virtual IP address on which the load balancers would listen for requests
• /etc/ha.d/authkeys
  o Sets up the authentication between the load balancers
• /etc/ha.d/ldirectord.cf
  o Linux Director configuration that specifies the actual resource that is to be balanced

[2] RoundRobin DNS works by linking numerous IP addresses to one hostname, with each DNS request using the next IP address. The problems with this technique are the caching of DNS entries and the lack of any real load balancing. DNS will link the user with the IP of the geographically closest machine, but will not check whether that machine is actually available.

Server Nodes Configuration - /etc/network/interfaces

The server nodes first needed a virtual IP address which would be shared between the nodes and on which they would serve the content / network resources. We decided during our preliminary network layout discussions that this IP address would be 192.168.2.235. We configured this IP address on a Loopback alias interface using the IP Aliasing feature on UNIX. It should be noted that UNIX-based operating systems support easy configuration of Interface Aliasing (having multiple IP addresses on one interface) as well as Interface Bonding (having numerous physical interfaces share one logical IP address). These techniques can aid a system administrator greatly with redundancy and load balancing. For example, Interface Bonding can be set up when providing replication services across Ethernet: we could use multiple physical interfaces and one IP address to move the data faster across the link(s).


Figure 5: ifconfig Command Output

As one can see, we have a Loopback lo:0 interface set up with a virtual IP address of 192.168.2.235. It is set up on both server nodes, which accept traffic for that IP address. Whenever that IP address is requested, the load balancers simply pass the request to the nodes.

Server Node Configuration - /etc/sysctl.conf

This file was very important from the packet forwarding standpoint. Since the requests come to the 192.168.2.235 virtual IP address, that address has to be announced on the network by an interface. The issue is that, by default, our loopback address would be announced by the server nodes instead of by ldirectord (Linux Director), which needs to receive the traffic in order to process the load balancing tasks. Linux Director in the Ultra Monkey package uses LVS, which provides Layer 4 switching using information available from Layers 3 and 4 of the OSI stack (IP addresses and ports). Linux Director uses a scheduling algorithm to assign connections from the clients to the server nodes, and it forwards the accepted packets directly to the server node it chooses. Linux Director also provides connection integrity, as subsequent packets are forwarded to the same physical server node until a timeout happens. This matters where a client computer needs to work with the same data for an extended period of time. Linux Director can forward packets in three different ways: Direct Routing, IP-IP Encapsulation, and NAT. Our project used Direct Routing, which required our server nodes to accept traffic sent to the virtual IP address.
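The "same server until a timeout" behavior described above can be illustrated with a small table keyed by client address. This is a conceptual sketch only, not LVS code: the class, the node names `node1` and `node2`, the client addresses, and the 300-second timeout are all assumptions chosen for the example.

```python
import itertools


class PersistentDirector:
    """Round-robin director with per-client persistence (illustrative only)."""

    def __init__(self, servers, timeout):
        self._next = itertools.cycle(servers)
        self.timeout = timeout
        self._table = {}   # client IP -> (server, time of last packet)

    def route(self, client_ip, now):
        entry = self._table.get(client_ip)
        if entry is not None and now - entry[1] <= self.timeout:
            server = entry[0]           # still within the timeout: same server
        else:
            server = next(self._next)   # new or expired client: schedule afresh
        self._table[client_ip] = (server, now)
        return server


director = PersistentDirector(["node1", "node2"], timeout=300)
print(director.route("10.0.0.5", now=0))     # first contact
print(director.route("10.0.0.6", now=1))     # a different client
print(director.route("10.0.0.5", now=100))   # same client, within timeout
```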

Figure 6: Linux Director Diagram

The scheduling algorithm actually encompasses various types of algorithms that can be used for packet forwarding. Per the Ultra Monkey documentation, there are 10 scheduling algorithms at this point [3]:

• Least-Connection (lc): Allocate connections to the real-server with the least number of connections.
• Weighted Least-Connection (wlc): Weighted version of Least-Connection.
• Round-Robin (rr): Place the real-servers in a circular list and allocate connections to each real-server in turn.
• Weighted Round-Robin (wrr): Weighted version of Round-Robin.
• Locality-Based Least-Connection (lblc): Try to assign connections addressed to the same IP address to the same real-server. This is frequently used in conjunction with transparent HTTP proxy services.
• Locality-Based Least-Connection with Replication (lblcr): Variation of Locality-Based Least-Connection that allows a pool of servers for a given destination IP address to be maintained in situations of high load.
• Destination-Hashing (dh): Use a static hash of the destination IP address to allocate connections.
• Source-Hashing (sh): Similar to Destination-Hashing, but the source IP address is hashed.
• Shortest Expected Delay (sed): Allocate connections to the server that will service the request with the shortest expected delay.
• Never Queue (nq): Allocate a connection to an idle real-server if there is one, else use the Shortest Expected Delay algorithm.
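To make a few of the algorithms listed above more concrete, here is a minimal sketch of rr, lc, wlc, and sh. It illustrates the selection rules only and is not how LVS implements them in the kernel; the `RealServer` class, the node names, the weights, and the use of CRC32 as the hash function are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import cycle
import zlib


@dataclass
class RealServer:
    name: str
    weight: int = 1
    connections: int = 0   # currently active connections


def round_robin(servers):
    """rr: hand out servers in a fixed circular order."""
    return cycle(servers)


def least_connection(servers):
    """lc: pick the server with the fewest active connections."""
    return min(servers, key=lambda s: s.connections)


def weighted_least_connection(servers):
    """wlc: pick the lowest connections-to-weight ratio."""
    return min(servers, key=lambda s: s.connections / s.weight)


def source_hash(servers, client_ip):
    """sh: a static hash of the source IP always maps to the same server."""
    return servers[zlib.crc32(client_ip.encode()) % len(servers)]


# Hypothetical pool: node2 is assumed to be three times as capable as node1.
pool = [RealServer("node1", weight=1, connections=4),
        RealServer("node2", weight=3, connections=6)]

rr = round_robin(pool)
print([next(rr).name for _ in range(3)])       # alternates between the nodes
print(least_connection(pool).name)             # fewest connections wins
print(weighted_least_connection(pool).name)    # 4/1 versus 6/3
```

With these numbers, plain Least-Connection prefers node1, while the weighted variant prefers node2 because its load relative to its weight is lower.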


The scheduling algorithm is specified in /etc/ha.d/ldirectord.cf, which is the actual configuration file of the Linux Director.

Table 1: /etc/sysctl.conf – Packet Forwarding Configuration

# Enable configuration of arp_ignore option
net.ipv4.conf.all.arp_ignore = 1

# When an arp request is received on eth0, only respond if that address is
# configured on eth0. In particular, do not respond if the address is
# configured on lo
net.ipv4.conf.eth0.arp_ignore = 1

# Ditto for eth1, add for all ARPing interfaces
#net.ipv4.conf.eth1.arp_ignore = 1

# Enable configuration of arp_announce option
net.ipv4.conf.all.arp_announce = 2

# When making an ARP request sent through eth0 Always use an address that
# is configured on eth0 as the source address of the ARP request. If this
# is not set, and packets are being sent out eth0 for an address that is on
# lo, and an arp request is required, then the address on lo will be used.
# As the source IP address of arp requests is entered into the ARP cache on
# the destination, it has the effect of announcing this address. This is
# not desirable in this case as adresses on lo on the real-servers should
# be announced only by the linux-director.
net.ipv4.conf.eth0.arp_announce = 2

# Ditto for eth1, add for all ARPing interfaces
#net.ipv4.conf.eth1.arp_announce = 2
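Because a single missing line in this file lets a server node answer ARP for the virtual IP address, it can be handy to check the file mechanically. The sketch below parses sysctl.conf-style text and reports which of the four settings from Table 1 are absent or different. The helper names and the sample text are hypothetical; this is not a tool we used in the project.

```python
# The four active settings from Table 1.
EXPECTED = {
    "net.ipv4.conf.all.arp_ignore": "1",
    "net.ipv4.conf.eth0.arp_ignore": "1",
    "net.ipv4.conf.all.arp_announce": "2",
    "net.ipv4.conf.eth0.arp_announce": "2",
}


def parse_sysctl(text):
    """Return {key: value} for every active 'key = value' line."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue                      # skip blanks and comments
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def missing_settings(text, expected=EXPECTED):
    """Return the expected settings that are absent or set differently."""
    found = parse_sysctl(text)
    return {k: v for k, v in expected.items() if found.get(k) != v}


# Hypothetical file in which one line was left commented out.
SAMPLE = """
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
#net.ipv4.conf.eth0.arp_announce = 2
"""
print(missing_settings(SAMPLE))
```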

The load balancers were configured through the /etc/ha.d/ha.cf files, which resided on both load balancers. Here is a dump of the file:

Table 2: /etc/ha.d/ha.cf

logfile /var/log/ha-log
logfacility local0
bcast eth0 # Linux
mcast eth0 225.0.0.1 694 1 0
auto_failback off
node amish-seesaw1
node amish-seesaw2
respawn hacluster /usr/lib/heartbeat/ipfail
apiauth ipfail gid=haclient uid=hacluster

From this file we can see that the two load-balancing nodes were named amish-seesaw1 and amish-seesaw2. We chose the amish-* naming convention because the project was, in a way, Amish: built out of ordinary workstation computers. The snippet also shows the log file location, the multicast IP address used for sending the Heartbeat pings, and the port on which the pings are heard. The auto_failback parameter controls what happens when a failed primary node comes back: with auto_failback off, the resources stay on the node that took them over until that node fails; with auto_failback on, they move back to the primary as soon as it recovers.
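The effect of auto_failback can be illustrated with a small simulation. This is a simplified sketch of the behaviour, not Heartbeat code; the function name and the event format are our own assumptions.

```python
def resource_owner(events, primary, secondary, auto_failback):
    """Replay fail/recover events and return the node holding the virtual IP.

    events is a list of (node, "fail" | "recover") tuples (hypothetical format).
    """
    alive = {primary: True, secondary: True}
    owner = primary
    for node, what in events:
        alive[node] = (what == "recover")
        if not alive[owner]:
            # The current owner died: hand resources to the surviving peer.
            other = secondary if owner == primary else primary
            if alive[other]:
                owner = other
        elif auto_failback and alive[primary]:
            # With auto_failback on, the preferred node reclaims its resources.
            owner = primary
    return owner


events = [("amish-seesaw1", "fail"), ("amish-seesaw1", "recover")]
stays = resource_owner(events, "amish-seesaw1", "amish-seesaw2", auto_failback=False)
moves_back = resource_owner(events, "amish-seesaw1", "amish-seesaw2", auto_failback=True)
```

With auto_failback off, as in Table 2, the virtual IP remains on amish-seesaw2 after amish-seesaw1 recovers, which avoids a second interruption of service.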

Load Balancer Configuration - /etc/ha.d/haresources

Next up is the /etc/ha.d/haresources file, which specifies the virtual IP address resource on which both load balancers listen:

Table 3: /etc/ha.d/haresources

amish-seesaw1 \
        ldirectord::ldirectord.cf \
        LVSSyncDaemonSwap::master \

This file sits on both load balancers and tells them to listen on that virtual IP address and to use the Linux Director as a resource. It is the Linux Director’s job to load balance the server nodes. We can define numerous virtual IP addresses on numerous Ethernet interfaces and set them all up in the /etc/ha.d/haresources file if we wish. This would allow each hosted network service to have its own virtual IP address, even though they all share one physical network interface.

Load Balancer Configuration - /etc/ha.d/authkeys

The /etc/ha.d/authkeys file holds a shared key, used with MD5, that both load balancers check to make sure that Heartbeat communications come from a trusted peer. This is a rather simple file; as we can see, Corp123 was set as the key. This file must be readable by the root user only.

Table 4: /etc/ha.d/authkeys

auth 3

3 md5 Corp123
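The requirement that authkeys be private to root can be checked programmatically. The sketch below is an assumption-laden illustration: the helper name is ours, it treats mode 0600 as the expected permission, and it runs against a throwaway temporary file rather than the real /etc/ha.d/authkeys.

```python
import os
import stat
import tempfile


def authkeys_is_private(path):
    """Return True if only the owner can read or write the file (mode 0600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600


# Demonstrate on a temporary file with placeholder contents.
with tempfile.NamedTemporaryFile("w", delete=False) as fh:
    fh.write("auth 3\n3 md5 example-key\n")
    demo_path = fh.name

os.chmod(demo_path, 0o644)
too_open = authkeys_is_private(demo_path)     # world-readable: not acceptable
os.chmod(demo_path, 0o600)
locked_down = authkeys_is_private(demo_path)  # owner-only: acceptable
os.remove(demo_path)
```

On the real load balancers the same check would be run as root against /etc/ha.d/authkeys, whose owner should also be root.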

Load Balancer Configuration - /etc/ha.d/ldirectord.cf

The ldirectord.cf file is the actual load balancer configuration, i.e. that of the Linux Director. It tells the cluster which network services are to be load balanced and on which virtual IP address. Our project load balanced two network services: the Apache web server and the MySQL database management system. Let’s break down the file:

Table 5: /etc/ha.d/ldirectord.cf

checktimeout=10
checkinterval=2
autoreload=no
logfile="local0"
quiescent=yes

virtual=192.168.2.235:80
        real=192.168.2.1:80 gate
        real=192.168.2.2:80 gate
        fallback=127.0.0.1:80 gate
        service=http
        request="ldirectord.html"
        receive="Test Page"
        scheduler=rr
        protocol=tcp
        checktype=negotiate

virtual=192.168.2.235:3306
        real=192.168.2.1:3306 gate
        real=192.168.2.2:3306 gate
        service=mysql
        checktype = negotiate
        login = "ldirector"
        passwd = "ldirectorpassword"
        database = "ldirectordb"
        request = "SELECT * FROM connectioncheck"
        scheduler = wrr

The above file sets the checktimeout and checkinterval values to 10 and 2 respectively. The first value is the timeout after which a server node is marked as dead, or failed: if there is no response within 10 seconds, the node is marked as offline. The second value defines how often the health of the server nodes is checked; we set it to 2 seconds. Both values can be tuned to suit performance needs.

The checktype = negotiate option sets the type of check that ldirectord performs. With negotiate, the load balancers request a simple connection and an expected string from the server nodes. If a server node answers the check successfully, it is kept in the pool. If not, it is taken out of the pool until it comes back online. Currently, the Linux Director supports HTTP, HTTPS, FTP, IMAP, POP, SMTP, NNTP, and MySQL checks, and adding your own is quite easy.

The virtual=192.168.2.235:80 and virtual=192.168.2.235:3306 lines give the virtual IP address and ports on which the load balancers listen and on which the services are offered. The real values are the real IP addresses of our server nodes. The remaining entries are the credentials and request strings that the load balancers use to check the health of the nodes. For the Apache web server, the request file is ldirectord.html, whose contents ldirectord checks for the expected string ("Test Page") to see whether the node is still operational. After 10 seconds of being unable to access the file, the load balancers mark the node as dead. MySQL, on the other hand, is checked by running a query against a specific table of a specified database. If the query fails, the host at that real IP address is marked as dead.
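The negotiate check for the web service can be sketched as follows. This is a minimal model under stated assumptions, not ldirectord's code: the function names are ours, and the fetch callables stand in for a real HTTP request that would time out after checktimeout seconds.

```python
def negotiate_check(fetch, request, receive):
    """Return True when the fetched page contains the expected string.

    fetch is any callable that takes the request path and returns the body
    text; it is expected to raise an exception on timeout or refusal.
    """
    try:
        body = fetch(request)
    except Exception:
        return False
    return receive in body


def pool_after_check(real_servers, fetchers, request, receive):
    # Keep only the real servers whose check succeeded.
    return [rs for rs in real_servers
            if negotiate_check(fetchers[rs], request, receive)]


def healthy(_path):
    return "<html><body>Test Page</body></html>"


def broken(_path):
    raise TimeoutError("no answer within checktimeout")


fetchers = {"192.168.2.1:80": healthy, "192.168.2.2:80": broken}
alive = pool_after_check(list(fetchers), fetchers, "ldirectord.html", "Test Page")
```

Here the second real server is simulated as unreachable, so only 192.168.2.1:80 stays in the pool until a later check succeeds again.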

After the aforementioned files were created and populated, we would start the Heartbeat daemon with the following commands:

Table 6: Starting Heartbeat daemon

/etc/init.d/ldirectord stop
/etc/init.d/heartbeat start

As can be seen, we stopped the ldirectord daemon before starting Heartbeat. This ensured that ldirectord was not already running, since Heartbeat starts ldirectord on its own. To check the health of the nodes, we ran the ldirectord ldirectord.cf status and ipvsadm -L -n commands on both load balancers to see which one was active at a given moment. Here is the output of an active load balancer for reference:


Figure 7: ldirectord ldirectord.cf status Command Output - Status Stopped

Figure 8: ldirectord ldirectord.cf status Command Output - Status Running

Figure 9: ipvsadm -L -n Output on an Active Load Balancer

From the above screenshot, we can see that two services are being balanced on the amish-seesaw1 load balancer. We can tell which services they are by their port numbers (:80 and :3306, httpd and MySQL respectively). We also see the virtual IP address and the number of connections currently active on each node.

An idle load balancer would not display any connections as it would not be in service:

All in all, the whole cluster configuration is quite easy. The tricky part is configuring the services to listen properly on both IP addresses (the real and the virtual IPs), which requires a separate section for each.


In-Depth - Web Services

The famous Apache Software Foundation provides what is probably the best-known web server in the world: the Apache web server. Its reach is immense; over 70% of the web servers in the world run it. It is a modular web server that grew out of the NCSA HTTPd server originally written by Robert McCool at the National Center for Supercomputing Applications. Apache has its own way of setting up redundancy with mod_proxy_balancer, which is an Apache module, but we still chose to load balance it through our own load balancer to keep everything standard.

The load balancing worked much as it did for the other services. In our project, client computers connected to the web servers through the load balancers, either by requesting the virtual IP address or through the hostnames created in the DNS configuration. A hostname still reaches the cluster through the virtual IP address, but it also allows virtual hosts, which we used to run multiple websites on one IP address. Alternatively, we could have assigned numerous virtual IP addresses in our /etc/ha.d/haresources file and mapped them to different virtual hosts in the Apache web server.

You might be wondering what a virtual host is. Virtual hosting is a technique that allows a web server to host multiple domain names on one physical machine. It works on a machine with one IP address or with several. This saves cost and hardware, as multiple websites can be hosted from one machine. There are three types of virtual hosting methods:

• Name-based

o Name-based virtual hosts allow a server with one IP address to host multiple domains by using DNS and the Host field of the HTTP header. This requires a DNS server to point multiple hostnames / FQDNs (Fully Qualified Domain Names) to one IP address. When a user connects to the web server at that IP address, the web server reads the HTTP header and serves the site for the hostname / FQDN specified in the Host field. The DNS server must be properly configured, as multiple domain names resolve to the same IP address and therefore to the same machine (see the In-Depth - DNS Services section for further information).

• IP Address-based

o In this method, the DNS server points multiple hostnames to multiple IP addresses. This technique requires the web server to be configured with multiple physical interfaces that each have their own IP address, one physical interface with multiple IP addresses (IP aliasing), or virtual network interfaces on one physical network interface. It appears as if connections are made to multiple machines (as there are multiple IP addresses), even though in reality the client connects to one physical machine that simply has multiple IP addresses.

• Port-based

o Each website is hosted on a different TCP port. The default port for HTTP servers is port 80, but websites can be hosted on different ports to which clients connect. The issue with this is that many people are not familiar with non-standard port numbers. Web browsers connect to port 80 on the web server by default, so non-standard ports require the user to specify the port manually at the end of the FQDN (e.g., www.google.com:8080).

In our project we used the Name-based virtual hosting method as it met our needs and criteria since we decided to keep our load balancing configuration simple and only use a single virtual IP address for our cluster.
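Name-based virtual hosting boils down to a lookup on the Host header. The sketch below is an illustration rather than Apache's logic: the hostnames and directories are those used in this project, while the function name and the DEFAULT_ROOT fallback are hypothetical.

```python
VHOSTS = {
    "drupal.amish.paradise.local": "/var/www-drupal",
    "wiki.amish.paradise.local": "/var/www-wiki",
}
DEFAULT_ROOT = "/var/www"  # hypothetical fallback for unknown hostnames


def document_root(raw_request):
    """Pick a document root from the Host header of a raw HTTP request."""
    for line in raw_request.split("\r\n")[1:]:
        if not line:
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().split(":")[0].lower()  # drop optional :port
            return VHOSTS.get(host, DEFAULT_ROOT)
    return DEFAULT_ROOT


request = "GET / HTTP/1.1\r\nHost: wiki.amish.paradise.local\r\n\r\n"
root = document_root(request)
```

Both hostnames resolve to the same virtual IP address, so the Host header is the only thing that tells the two sites apart.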


Table 7: drupal.amish.paradise.local Virtual Host

<VirtualHost *>

    DocumentRoot /var/www-drupal
    ServerName drupal.amish.paradise.local
    <Directory "/var/www-drupal">
        allow from all
        Options +Indexes
    </Directory>
</VirtualHost>

Table 8: wiki.amish.paradise.local Virtual Host

<VirtualHost *>

    DocumentRoot "/var/www-wiki"
    ServerName wiki.amish.paradise.local
    <Directory "/var/www-wiki">
        allow from all
        Options +Indexes
    </Directory>
</VirtualHost>

The above files are placed in the /etc/httpd or /etc/apache2 directories, depending on the operating system used. Sometimes they are included in the main Apache configuration file (httpd.conf), and sometimes they are kept in separate files. They define the virtual hosts by specifying the address the virtual host answers on (* here, meaning any address), the directory (DocumentRoot), the name (ServerName), and various options concerning access and indexing. The httpd.conf file (apache2.conf in our case) did not need any specific configuration. The default file was fine, since our main concern was the virtual host configuration.


In-Depth - Database Services

We chose MySQL as our database engine and management system because it is extremely popular on the UNIX platform; UNIX and MySQL simply go hand in hand. The load balancing part of this section of the project was quite easy, but the replication proved to be tricky, as we discuss in a later section. Databases are delicate by nature because they deal with dynamic content, the kind that changes very quickly, especially in a network with many users or websites.

The MySQL DBMS was load balanced through our Heartbeat package. We added a MySQL-specific section to our /etc/ha.d/ldirectord.cf file; the same file can also be used to load balance many other services, such as LDAP, POP3/SMTP, DNS, and DHCP.

Table 9: MySQL Specific Configuration in /etc/ha.d/ldirectord.cf

virtual=192.168.2.235:3306
        real=192.168.2.1:3306 gate
        real=192.168.2.2:3306 gate
        service=mysql
        checktype = negotiate
        login = "ldirector"
        passwd = "ldirectorpassword"
        database = "ldirectordb"
        request = "SELECT * FROM connectioncheck"
        scheduler = wrr

Whichever content management system we used had to connect through the virtual IP, which directed its queries to whichever database server was available. Since the virtual IP connected the user to only one database server at a time, a query was recorded only in the database on that server, and the data would not show up in the database on the other server. To overcome this issue, we implemented a master-to-master database replication scheme as part of the project. Whatever was recorded on one node was replayed on the other almost immediately, which kept the two in sync. Also, if one node failed, then once it came back up its database would catch up with the one holding the latest information. Since this was a simple two-node configuration, we used the regular MySQL package. With more than two nodes, we would have had to deploy the MySQL Cluster package (a free Community download is available). MySQL Cluster encompasses the MySQL base server package and provides multi-node clustering capabilities. It is a dedicated project aimed at providing true enterprise clustering services with maximum availability.

Replication was configured in the /etc/mysql/my.cnf file, since everything was performed inside MySQL:

Table 10: /etc/mysql/my.cnf Output

#
# The MySQL database server configuration file.
#
# You can copy this to one of:
# - "/etc/mysql/my.cnf" to set global options,
# - "~/.my.cnf" to set user-specific options.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

# This will be passed to all mysql clients
# It has been reported that passwords should be enclosed with ticks/quotes
# escpecially if they contain "#" chars...
# Remember to edit /etc/mysql/debian.cnf when changing the socket location.
[client]
port = 3306

# Here is entries for some specific programs
# The following values assume you have at least 32M ram

# This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0

[mysqld]
#
# * Basic Settings
#

#
# * IMPORTANT
# If you make changes to these settings and your system uses apparmor, you may
# also need to also adjust /etc/apparmor.d/usr.sbin.mysqld.
#

user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
language = /usr/share/mysql/english
skip-external-locking
#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
#bind-address = 0.0.0.0
#
# * Fine Tuning
#
key_buffer = 16M
max_allowed_packet = 16M
thread_stack = 128K
thread_cache_size = 8
#max_connections = 100
#table_cache = 64
#thread_concurrency = 10
#
# * Query Cache Configuration
#
query_cache_limit = 1M
query_cache_size = 16M
#
# * Logging and Replication
#
# Both location gets rotated by the cronjob.
# Be aware that this log type is a performance killer.
#log = /var/log/mysql/mysql.log
#
# Error logging goes to syslog. This is a Debian improvement :)
#
# Here you can see queries with especially long duration
#log_slow_queries = /var/log/mysql/mysql-slow.log
#long_query_time = 2
#log-queries-not-using-indexes
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
# other settings you may need to change.
server-id = 2
master-host = amish-barn001
master-user = root
master-password = Corp123
master-port = 3306
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 10
max_binlog_size = 100M
binlog_do_db = joomla
binlog_do_db = test
binlog_do_db = wikidb
replicate_do_db = joomla
replicate_do_db = test
replicate_do_db = wikidb
binlog_ignore_db = mysql
auto-increment-increment = 10
auto-increment-offset = 2
#
# * BerkeleyDB
#
# Using BerkeleyDB is now discouraged as its support will cease in 5.1.12.
skip-bdb
#
# * InnoDB
#
# InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
# Read the manual for more InnoDB related options. There are many!
# You might want to disable InnoDB to shrink the mysqld process by circa 100MB.
#skip-innodb
#
# * Security Features
#
# Read the manual, too, if you want chroot!
# chroot = /var/lib/mysql/
# For generating SSL certificates I recommend the OpenSSL GUI "tinyca".
#
# ssl-ca=/etc/mysql/cacert.pem
# ssl-cert=/etc/mysql/server-cert.pem
# ssl-key=/etc/mysql/server-key.pem

[mysqldump]
quick
quote-names
max_allowed_packet = 16M

[mysql]
#no-auto-rehash # faster start of mysql but no tab completition

[isamchk]
key_buffer = 16M
#
# * NDB Cluster
#
# See /usr/share/doc/mysql-server-*/README.Debian for more information.
#
# The following configuration is read by the NDB Data Nodes (ndbd processes)
# not from the NDB Management Nodes (ndb_mgmd processes).
#
# [MYSQL_CLUSTER]
# ndb-connectstring=127.0.0.1
#
# * IMPORTANT: Additional settings that can override those from this file!
# The files must end with '.cnf', otherwise they'll be ignored.
#
!includedir /etc/mysql/conf.d/

The above is an entire MySQL configuration file. It seems big and scary at first, but on a careful read it is a lot simpler than one might think. It is also far shorter than the Apache web server’s configuration file. What we are really interested in, though, is the following set of lines:


Table 11: Simple MySQL Replication Configuration

server-id = 2
master-host = amish-barn001
master-user = root
master-password = Corp123
master-port = 3306
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 10
max_binlog_size = 100M
binlog_do_db = drupal
binlog_do_db = test
binlog_do_db = wikidb
replicate_do_db = drupal
replicate_do_db = test
replicate_do_db = wikidb
binlog_ignore_db = mysql

Since both server nodes were “master” servers to each other, they had almost the same configuration. The only differences lay in the server-id and the master-* settings, which always point to the opposite server, the one with which replication is performed. Now, let’s break down this configuration.

The master-* settings identify the master server from which the holder of this configuration replicates its data. Since both servers use master-to-master replication, each carries the same kind of configuration, filled in with the details of the opposite server. The log_bin setting is important because it specifies the Binary Log file of the database, which records what has changed and where replication left off. Each node writes its own Binary Log, and its peer reads that log to apply the same changes, which is how both servers end up with the same data. The binlog_do_db and replicate_do_db settings specify the databases to be replicated through the Binary Log (these settings have to be the same on both servers), whereas binlog_ignore_db specifies which databases to ignore. The Binary Log contains every statement that updates data, since all such statements and queries are stored as events within the log. The log is sent from the master server to the slave server so that the slave can replicate the master database. The file is also useful for database recovery, as it contains everything that has changed in the database. Events are written to it immediately after the SQL query completes. The file can be inspected further with the mysqlbinlog utility, which comes standard with every base installation.

There are also minor options, such as the Binary Log file size and how long log files are kept before they are deleted, but they do not need to be discussed here. Many more options can be used to enhance MySQL replication, but the above is the actual juice that fuels it and gets the job done. The MySQL Reference Manual describes all sorts of further options that can be used with replication.
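One pair of options from the full file in Table 10 deserves a short illustration: auto-increment-increment = 10 and auto-increment-offset = 2. In a master-to-master setup these keep the two nodes from handing out the same AUTO_INCREMENT value. The sketch below models the sequence each node generates; the generator name is ours, and the offset of 1 for the other node is an assumption, since only one node's file is shown.

```python
from itertools import islice


def auto_increment_values(offset, increment):
    """Yield the IDs a server hands out: offset, offset + increment, ..."""
    value = offset
    while True:
        yield value
        value += increment


# The node in Table 10 uses offset 2; offset 1 for its peer is assumed.
node_a = list(islice(auto_increment_values(offset=1, increment=10), 4))
node_b = list(islice(auto_increment_values(offset=2, increment=10), 4))
```

Because the two sequences never overlap, rows inserted on either node at the same moment cannot collide on their primary key when they are replicated to the other.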


In-Depth - Replication and Storage Services

Static data replication was performed with the UNIX rsync utility. We originally looked at various other software and hardware projects, schemas, and replication techniques to find the best storage scheme for our needs. Some of them included the Unison project and dedicated hardware NAS/SAN setups, which would have taken far more development time. Using those would have required another server to act as the storage device on which all the data would reside, shared by our server nodes. The problem with this scenario is obvious: what would happen if that storage device went offline? We would be left with no static data storage, which would defeat the whole purpose of a high-availability cluster. We could have created a cluster of SAN devices with elaborate RAID implementations, but that was beyond the scope of our research and time, not to mention that we were aiming for cheap solutions. We needed something that would stay on both nodes and be mirrored in a master-to-master fashion (just like the MySQL database replication). Our quick-and-dirty solution was an old utility that is well known in the UNIX world: rsync.

What rsync does is sync file changes between two nodes over an encrypted SSH (Secure SHell) connection. We were able to sync any changes to the static content by running the rsync script periodically (usually every 1 to 3 minutes) on both machines at the same time. The script would update files in favor of the newer copies. This is where the NTP daemon (discussed in a later section) played an important role: all the nodes had to be synced to one time source so that their clocks agreed. Any time offset would make the file timestamps unreliable.

The way rsync worked was that it would compare both nodes and see whether there were any discrepancies in the data. If there were, it would securely transfer the differences over to the node that was either missing the file or missing the necessary contents.

We set the rsync script to run every minute on each node. We could have changed this interval to anything we wanted, but we figured that with the limited hardware this was the best choice, as it did not affect the general performance of the nodes and the network. We also wanted to make sure that rsync had plenty of time to finish before the next run started. We set the interval through Cron, the “Scheduled Tasks” daemon for UNIX. It is pretty much standard on every UNIX and UNIX-like machine, just like rsync.

Table 12: rsync Script Used to Replicate Static Content

#!/bin/sh

rsync -a /var/www-wiki root@amish-barn001:/var/
rsync -a /var/www-drupal root@amish-barn001:/var/

The above script is quite simple to read. All it does is sync two directories on our server nodes:

• /var/www-wiki

o MediaWiki directory

• /var/www-drupal

o Drupal directory

Both directories contain static files and subdirectories for the two Content Management Systems that we ran. Whatever changes in those files shows up on both servers. rsync compares the timestamp (and size) of each file to decide what needs to be transferred; note that skipping a file that is already newer on the receiving node requires rsync’s --update (-u) option, which the script in Table 12 does not pass. We could also have set it to replicate any other file or folder, but we chose those two directories as they contained the only data that needed to be synchronized.
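The newest-file-wins rule described above can be sketched as a pure decision function. This models the intent only and is not rsync's algorithm: the function name and the sample timestamps are made up, and with the real utility this behaviour corresponds to running rsync with --update, which is an assumption on our part.

```python
def plan_sync(local, remote):
    """Decide which files to push to the peer.

    local and remote map relative paths to modification times in seconds.
    A file is pushed when the peer lacks it or holds an older copy.
    """
    return sorted(path for path, mtime in local.items()
                  if path not in remote or remote[path] < mtime)


# Hypothetical directory listings for the two server nodes.
barn001 = {"index.php": 100, "logo.png": 50, "new.css": 120}
barn002 = {"index.php": 130, "logo.png": 50}

push_to_barn002 = plan_sync(barn001, barn002)
push_to_barn001 = plan_sync(barn002, barn001)
```

Running the plan in both directions gives the master-to-master effect, and it also shows why the clocks must agree: a skewed timestamp on one node would make a stale file look newer.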


In-Depth - DNS Services

In Ubuntu, rather than making changes directly to the named.conf file, custom configurations are inserted into the /etc/bind/named.conf.local file. This file, shown in Table 13, contains the entries for the different DNS zones. We created a forward lookup zone called amish.paradise.local and a reverse lookup zone for the IP scheme of our network. In both zones the type is set to master, which makes BIND return authoritative answers to queries for them. The file entries specify where the files with the zone records are located.

Table 13: /etc/bind/named.conf.local

//
// Do any local configuration here
//

// Consider adding the 1918 zones here, if they are not used in your
// organization
//include "/etc/bind/zones.rfc1918";

zone "amish.paradise.local" {
        type master;
        file "/etc/bind/zone.amish.paradise.local";
};

zone "2.168.192.in-addr.arpa" {
        type master;
        file "/etc/bind/2.168.192.in-addr.arpa";
};

Table 14 shows the file containing the forward lookups for our domain. A forward lookup is a query for a domain name whose response is the IP address. The file contains an SOA record, which indicates the Start of Authority for the zone. The NS line points to the authoritative name server. The rest of the lines are the actual entries, each specifying the hostname, followed by the type of record (IN A), then the IP address of the host.


Table 14: /etc/bind/zone.amish.paradise.local

; BIND db file for amish.paradise.local
$TTL 86400
@               IN      SOA     amish-shepherd.amish.paradise.local. root.amish.paradise.local. (
                        2009081301      ; serial number YYYYMMDDNN
                        28800           ; Refresh
                        7200            ; Retry
                        864000          ; Expire
                        86400           ; Min TTL
                        )
                        NS      amish-shepherd.amish.paradise.local.
$ORIGIN amish.paradise.local.
amish-shepherd  IN      A       192.168.2.253
amish-gateway   IN      A       192.168.2.254
amish-barn001   IN      A       192.168.2.1
amish-barn002   IN      A       192.168.2.2
amish-seesaw1   IN      A       192.168.2.240
amish-seesaw2   IN      A       192.168.2.241
wiki            IN      A       192.168.2.235
drupal          IN      A       192.168.2.235
virtual         IN      A       192.168.2.235

Table 15 shows the reverse lookup zone for our IP scheme. A reverse lookup is the opposite of a forward lookup: the query is an IP address and the response is the Fully Qualified Domain Name (FQDN). This file also indicates that the server is the Start of Authority. The main difference between the forward and reverse zones is that the hostname in a reverse entry has to be the FQDN. The entries otherwise follow the same format as the forward zone: the last octet of the IP address, the entry type (IN PTR), and the FQDN.

Table 15: /etc/bind/2.168.192.in-addr.arpa

; BIND db file for amish.paradise.local
$TTL 86400
@       IN      SOA     amish-shepherd.amish.paradise.local. root.amish.paradise.local. (
                2009081304      ; serial number YYYYMMDDNN
                28800           ; Refresh
                7200            ; Retry
                864000          ; Expire
                86400           ; Min TTL
                )
        IN      NS      amish-shepherd.amish.paradise.local.
1       IN      PTR     amish-barn001.amish.paradise.local.
2       IN      PTR     amish-barn002.amish.paradise.local.
240     IN      PTR     amish-seesaw1.amish.paradise.local.
241     IN      PTR     amish-seesaw2.amish.paradise.local.
253     IN      PTR     amish-shepherd.amish.paradise.local.
254     IN      PTR     amish-gateway.amish.paradise.local.
235     IN      PTR     virtual.amish.paradise.local.
235     IN      PTR     wiki.amish.paradise.local.
235     IN      PTR     drupal.amish.paradise.local.
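The relationship between an IP address and its entry in the reverse zone can be shown with Python's standard ipaddress module. This is a small sketch; the two helper names are our own, and the zone name is the one used in this project.

```python
import ipaddress


def ptr_name(ip):
    """Return the full reverse-lookup name for an IP address."""
    return ipaddress.ip_address(ip).reverse_pointer


def label_in_zone(ip, zone="2.168.192.in-addr.arpa"):
    """Return the record label inside the zone (e.g. '1' for 192.168.2.1)."""
    full = ptr_name(ip)
    suffix = "." + zone
    if not full.endswith(suffix):
        raise ValueError(f"{ip} is outside {zone}")
    return full[: -len(suffix)]


full_name = ptr_name("192.168.2.1")
label = label_in_zone("192.168.2.235")
```

The octets appear in reverse order in the full name, which is why the zone for the 192.168.2.x network is called 2.168.192.in-addr.arpa and why each record in it is keyed by the last octet only.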


In-Depth - Management Services

We decided to make our lives easy and use a GUI for management of the cluster (even though the CLI is just as easy for us). We used Webmin to manage all the nodes, as it already had a Heartbeat module that made editing Heartbeat settings and adding nodes a breeze, especially for less experienced UNIX specialists. Webmin is a web-based GUI management tool for UNIX-like operating systems. It builds a graphical shell around the various configuration files within /etc and provides a streamlined, easy-to-use administration tool that is consistent across many platforms. We set up Webmin on each individual machine and configured the main management PC (a computer not part of the cluster) to talk to each node, including the load balancers. If we were to add more server nodes, our best bet would be to set up a PXE server that loads all the software (or should we say, an image), including Webmin, for us. We could then set the main management PC to scan for new nodes automatically. This would score extra points for the manageability of the cluster farm.


Figure 11: Webmin Front Page

The Webmin front page shows the status of the node that we have logged into. It presents information about the processor, memory, and hard disk space, as well as uptime. All of this can also be found using UNIX commands such as cat /proc/cpuinfo, uname -a, or free.


Figure 12: Cluster Webmin Servers

Cluster Webmin Servers allows the administrator to administer various machines on the network as a cluster. This simplifies administrative tasks such as adding and deleting users and groups, managing Cron jobs, and running maintenance commands. The administrator only has to log into one machine to administer all of the others. With security in mind, the administrator needs to know the root password of each machine that he or she wants to administer, as Webmin is only a graphical web interface to UNIX commands: whatever Webmin does has to be translated into commands on the fly.


Figure 13: Cluster Users and Groups

In the above figure, we are presented with the Cluster Users and Groups window, in which we can modify users and groups across the cluster nodes. This makes it easy to change the /etc/passwd and /etc/group files on our Linux boxes. The feature is particularly useful when deploying new services that require dedicated user groups.

Cluster Shell Commands allows users to run commands on one, several, or all of the server nodes (as well as the load balancers). This can greatly simplify maintenance tasks that would otherwise require logging into each node.


Figure 16: Webmin Heartbeat Monitor

This is the Heartbeat Monitor landing page in Webmin. It offers simple GUI versions of the text files that we configured in the project; saving changes here simply overwrites whatever is in the configuration files.


Figure 17: Heartbeat Monitor Configuration Options

Above is the Webmin version of the Heartbeat configuration. Seeing the graphical Heartbeat module in Webmin makes all the command-line legwork feel almost like a waste, but that legwork does leave a sense of accomplishment as well as an overall picture of what is going on behind the scenes. Either way, if one needs to quickly configure extra nodes, change port settings and interfaces, or even adjust the dead time, this is the place to go.

Webmin proved to be an excellent tool for what we were doing. Many CRMs and High Availability solutions also provide their own GUIs, which can be even more robust in managing cluster farms. Graphical management tools prove extremely useful when dealing with many server nodes in a fairly large cluster farm, as the management tasks grow with it. Intuitive, easy-to-use interfaces, automated tasks, and the ability to see the big picture are some of the key things to look for in cluster solutions. That is one of the reasons why many businesses choose to pay large amounts of money for dedicated high-availability solutions for their networks. Unfortunately, many startups and small businesses cannot afford expensive hardware and support. This is where UNIX-based homebrew clusters come into play. They can provide great performance for little cost, at the expense of ease of use.
