Number 990903
DNS ROUND ROBIN HIGH-AVAILABILITY LOAD SHARING
An increasing number of organizations are clustering web and other application servers together in order to achieve highly available systems that can manage an increasing base of client traffic. While the web has primarily driven this trend, other applications are also being deployed in a clustered environment.
One important area of clustering is balancing or sharing the load of client traffic among multiple servers. Recent releases of DNS/BIND include a capability for load sharing using what is known as the round robin technique. Although this is widely available, easy to use, and low cost, it suffers from a few limitations, one of which is its inability to handle server failures. This white paper discusses how to set up DNS round robin for load sharing in a multiple server environment and describes its limitations. Finally, it shows how to use the Polyserve Understudy product together with DNS round robin to eliminate the server failure problem. The purpose of this paper is to provide instruction on how to set up DNS round robin in a highly available server cluster.
This paper contains the following sections:
• Configuring DNS Round Robin
• Limitations of DNS Round Robin
• Configuring DNS and Understudy for High Availability Load Sharing
Configuring DNS Round Robin
Features were placed in BIND 4.9 that allow simple load sharing to be configured among multiple servers. An excellent overview exists on page 259 of the most recent O’Reilly book: DNS and BIND (O’Reilly & Associates, 1998, Third Edition). This book discusses BIND 4.8.3 (which used a shuffle address record scheme and required a patch) and BIND 4.9, which has embedded load-sharing capabilities. We recommend using BIND 4.9 and later versions (especially BIND 8) and will discuss how to set this up below.
BIND 4.9 and more recent versions allow A records (address records) to be duplicated for a specific host, each with a different IP address. The name server then rotates the addresses it returns for any name that has multiple A records.
This is known as DNS round robin.
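The rotation can be pictured with a minimal sketch in Python. It illustrates the idea only and is not BIND's implementation; the class name `RoundRobinName` and its `answer` method are hypothetical, and the addresses are those of the three-server example used in this paper.

```python
from collections import deque


class RoundRobinName:
    """Toy model of one DNS name that owns several A records (illustrative only)."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        """Return the address list for one query, then rotate it for the next one."""
        current = list(self._addresses)
        self._addresses.rotate(-1)  # move the first address to the end
        return current


www = RoundRobinName(["150.1.1.1", "150.1.1.2", "150.1.1.3"])

# Record the first address handed out on four consecutive lookups.
first_choices = [www.answer()[0] for _ in range(4)]
print(first_choices)
```

Because a client usually connects to the first address in the answer, successive lookups in this model are spread across the pool in turn.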
As an example, let us assume that we at Polyserve have three (3) web servers. Their real names and IP addresses are:
www.polyserve1.com 150.1.1.1
www.polyserve2.com 150.1.1.2
www.polyserve3.com 150.1.1.3
If we wanted to set up our servers so that DNS requests by clients (in this case for web server access) are round robin rotated, we can do so by placing multiple A records in the authoritative name server files. For our example above, we want all clients to access our site by using www.polyserve.com, but for these requests to be shared between our three servers using DNS round robin. To do so, we need to place the following A records in the name server:
www.polyserve.com. 60 IN A 150.1.1.1
www.polyserve.com. 60 IN A 150.1.1.2
www.polyserve.com. 60 IN A 150.1.1.3
WHITE PAPER 918 Parker Street
Berkeley, California 94710 (510) 665-2929
www.polyserve.com
Note two important items here. The first is the ‘.’ after the name www.polyserve.com on each A record. This is mandatory; without it the name server will append the domain origin to the name. The second is the TTL (time to live) value, which is the 60 shown on each A record in the example above. The TTL field tells other name servers to remove these entries from their cache after this many seconds. A value of 60 seconds ensures that the records are not cached for a great length of time on intermediate name servers that don’t support round robin. For a much more thorough discussion of these and other related issues, see the O’Reilly DNS and BIND book mentioned above.
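The effect of the trailing ‘.’ can be shown with a small Python sketch. The function `qualify` is a hypothetical stand-in for what a zone file parser does with each name, and the origin `polyserve.com.` is an assumption made for this illustration.

```python
def qualify(name, origin):
    """Return the fully qualified name a zone file parser would store.

    A name ending in '.' is taken as already complete; any other name is
    treated as relative and has the zone origin appended.
    """
    if name.endswith("."):
        return name
    return name + "." + origin


ORIGIN = "polyserve.com."  # assumed zone origin for this illustration

with_dot = qualify("www.polyserve.com.", ORIGIN)
without_dot = qualify("www.polyserve.com", ORIGIN)
print(with_dot)
print(without_dot)
```

Without the dot, the record in this sketch ends up under the unintended name www.polyserve.com.polyserve.com., which clients never look up.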
Most of the high profile systems, such as Solaris, NT, and Linux (as well as others) support BIND 4.9 and later versions.
The best way to be sure your specific name server supports the DNS round robin feature is to contact your vendor’s technical support line or check their web page. As an example, Microsoft NT requires SP4 for round robin support. Most of the Linux vendors, as well as Solaris, come with these capabilities already included.
DNS round robin supports pools of servers for any application, not just web servers. Pools of web, email, ftp, database, and other servers can all be set up to load-share using DNS.
Limitations of DNS Round Robin
DNS Round Robin has a number of advantages and a few limitations. The main advantage is its simplicity and low cost.
A simple addition to the name-server configuration file allows a pool of servers to be clustered and appear to act as a single host to the clients, when in reality requests are being alternated between all the hosts in the pool. The required software is standard on most of these systems (or can be obtained at no or low cost). For this reason it is very effective for small to medium size businesses and organizations. It is extremely popular among ISPs, e-commerce sites, universities, and other cost sensitive sites.
Load Balancing vs. Load Sharing
There are limitations with this architecture and they should be noted. The first is that DNS round robin is actually not a load balancing mechanism; instead it is a load sharing mechanism. Load balancing has become popular at large enterprise web sites that need to support many hosts at potentially different geographic locations. These hardware and software solutions measure the “load” on the systems and gauge where to send client requests in order to spread the load among the servers. There are a variety of algorithms to do this, including using:
• CPU load
• Response Time
• Least Connection
• Assigned Weight
• Service Level Agreements
• Custom Rules
• Simple Round Robin (but not using DNS)
While these products are all good at what they do, they tend to be costly to deploy, and are therefore most practical for larger organizations. Most of them include security, integrated management, application monitoring & failover, and sophisticated APIs for defining homegrown service monitors.
DNS round robin does not gauge server “load” in any way; instead it simply alternates client requests among the pool of servers defined in the name server files. This shares the load among multiple hosts but does not equalize it: one or more of the hosts in the pool will tend to get more activity than the other servers. DNS round robin should be quite effective up to about 10 servers per virtual cluster (a virtual cluster being defined as a pool of servers acting as a single server for client requests).
These hosts would all be in the same physical location, most likely on a number of different high-speed switch ports.
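The difference between sharing and balancing load can be seen in a short Python sketch. The request costs below are invented purely for illustration; the only behaviour modelled is that round robin hands each request to the next server in turn without looking at how much work it represents.

```python
from itertools import cycle

servers = ["www.polyserve1.com", "www.polyserve2.com", "www.polyserve3.com"]

# Hypothetical cost of six consecutive requests, in arbitrary work units.
request_costs = [9, 1, 1, 9, 1, 1]

requests_per_server = {name: 0 for name in servers}
work_per_server = {name: 0 for name in servers}

# Round robin ignores cost: each request simply goes to the next server in turn.
for server, cost in zip(cycle(servers), request_costs):
    requests_per_server[server] += 1
    work_per_server[server] += cost

print(requests_per_server)
print(work_per_server)
```

In this made-up case every server receives the same number of requests, yet one server does most of the work. A load balancing product would take such measurements into account; DNS round robin does not.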
Our research has shown that load sharing is effective for small to medium size organizations. Recognize that at some point you may need to consider a larger product that does load balancing and provides the scalability that DNS round robin will not allow. This is especially true when multiple site support is required.
DNS Round Robin and Server Failures
How does DNS operate if one of the servers crashes or is down for maintenance? Simply enough, requests from clients will still go to this IP address when it is its turn in the round robin pool. Existing client sessions will still be sent to this address. The result is that all of these requests will go to a host that is not operating. This is a serious limitation of the DNS round robin feature. Many shops simply cannot allow a potentially large number of client requests to go unanswered. This is obviously not good for business.
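A rough Python sketch of this limitation follows. It assumes the three-address pool from the earlier example, with the first server marked as down, and counts what happens to nine consecutive requests when the name server keeps rotating regardless.

```python
from itertools import cycle

pool = ["150.1.1.1", "150.1.1.2", "150.1.1.3"]
down = {"150.1.1.1"}  # assumption: the first server has crashed

answered = 0
unanswered = 0

# DNS keeps rotating through all three addresses; it has no idea one host is down.
for _, address in zip(range(9), cycle(pool)):
    if address in down:
        unanswered += 1
    else:
        answered += 1

print(answered, unanswered)
```

In this model one request in three is directed at the failed host for as long as it stays down.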
In the next section, we will show how this problem can be rectified without having to purchase sophisticated traffic management solutions.
Configuring DNS and Understudy for High Availability Load Sharing
Polyserve Understudy is a high availability clustering software product that currently runs on Linux, Solaris, and NT.
Understudy runs on each server in the cluster and performs automatic failover and service monitoring. It can be configured with DNS round robin to eliminate the server failure or maintenance problem discussed in the limitations section above. By using Understudy with DNS round robin, we will demonstrate how virtual pools of servers can be configured to guarantee that all client requests are sent to active, operational servers, even when a portion of the server pool is down.
Understudy High Availability Server Configuration
Before explaining how to configure Understudy and DNS round robin, let’s first understand how Understudy works using the three server example we discussed above. Assume that we at Polyserve have three web servers: www.polyserve1.com, www.polyserve2.com, and www.polyserve3.com. Using Understudy, we define a virtual host www.polyserve.com which is the client access name for all three Polyserve servers. Figure 1 shows how Understudy manages this virtual host and server pool.
Understudy manages the virtual server www.polyserve.com by allowing one host in the real server pool to be marked the primary, while the others are the backup hosts in the pool. As can be seen from figure 1, www.polyserve1.com is the primary server, while www.polyserve2.com is the first backup server and www.polyserve3.com is the final backup server.
When a client accesses www.polyserve.com, the requests are all sent to the primary host www.polyserve1.com. This is because the virtual host IP address for www.polyserve.com is mapped to the MAC address for www.polyserve1.com. Note in figure 1 that the two backup servers are currently inactive in the cluster. This doesn’t mean these servers are performing other functions; it just means that in the virtual cluster www.polyserve.com, they are not handling client requests.
Figure 1: Clients on the web access “www.polyserve.com” through the router. www.polyserve1.com (Primary) is ACTIVE; www.polyserve2.com (Backup1) and www.polyserve3.com (Backup2) are Inactive.
Now let’s see what happens when one of the servers fails. Figure 2 shows how Understudy manages the cluster when www.polyserve1.com goes down (or is taken out of the cluster for maintenance reasons). Understudy runs on each server in the cluster and periodically communicates with each to validate that all servers are up and operational. Understudy also can be configured to monitor specific services such as HTTP, SMTP, FTP, and various TCP/IP ports. In the case of figure 2, Understudy has detected that www.polyserve1.com is down. To perform the failover, Understudy sends a gratuitous ARP that tells the router that www.polyserve2.com is now handling all traffic for the virtual host www.polyserve.com.
Understudy marks the primary as down and the first backup server as active. The second backup server (www.polyserve3.com) is still inactive in the cluster.
When www.polyserve1.com is back up, it sends a gratuitous ARP that tells the router that it is once again handling requests for www.polyserve.com. In this way, automatic failover is invisible to the client, who is simply accessing www.polyserve.com and doesn’t know which of the three servers is actually handling the requests. Understudy can support 2 or more hosts per cluster (up to a maximum of 10 servers).
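The selection rule described above (use the primary whenever it is up, otherwise the first available backup) can be sketched in Python. This is only a model of the rule as stated in the text, not Understudy's code; the function `active_server` is hypothetical, and the gratuitous ARP step is not modelled.

```python
def active_server(priority_order, is_up):
    """Return the first server in priority order that is up, or None if all are down."""
    for server in priority_order:
        if is_up.get(server, False):
            return server
    return None


# Priority order for the virtual host www.polyserve.com: primary first, then backups.
order = ["www.polyserve1.com", "www.polyserve2.com", "www.polyserve3.com"]

all_up = {name: True for name in order}
primary_down = {**all_up, "www.polyserve1.com": False}

normal = active_server(order, all_up)             # before the failure
failed_over = active_server(order, primary_down)  # primary has gone down
restored = active_server(order, all_up)           # primary is back up
print(normal, failed_over, restored)
```

The second backup only becomes active in this model if both the primary and the first backup are down.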
Configuring Understudy to support DNS Round Robin
Understudy can also be configured to support highly available DNS load sharing. Again, our three-server Polyserve web site example illustrates the point. Figure 3 shows how DNS and Understudy can be configured to guarantee that all round robin load sharing server requests are sent only to servers that are active. First, note that instead of defining a single www.polyserve.com virtual host, we now define 3 virtual hosts, one for each server in our DNS round robin pool.
Figure 2: Clients on the web access “www.polyserve.com” through the router. www.polyserve1.com (Primary) is DOWN, www.polyserve2.com (Backup1) is ACTIVE, and www.polyserve3.com (Backup2) is Inactive.
Let’s look at the first virtual host: www.virtualpoly1.com. We configure www.polyserve1.com as the primary server, and www.polyserve2.com and www.polyserve3.com are the backup servers (in that order). The IP address of the virtual host www.virtualpoly1.com is 160.1.1.1. All requests to www.virtualpoly1.com (160.1.1.1) go to www.polyserve1.com since this is the primary server for this virtual host. If www.polyserve1.com were to fail then www.polyserve2.com would be the next backup in this virtual host pool. The configuration for this virtual cluster is:
Virtual Host www.virtualpoly1.com (IP address 160.1.1.1)
Primary is www.polyserve1.com
Backup #1 is www.polyserve2.com
Backup #2 is www.polyserve3.com
In the same manner, two more virtual hosts are defined. In the second cluster the primary is www.polyserve2.com, while the third cluster has www.polyserve3.com as its primary server. They have the following configurations:
Virtual Host www.virtualpoly2.com (IP address 160.1.1.2)
Primary is www.polyserve2.com
Backup #1 is www.polyserve3.com
Backup #2 is www.polyserve1.com
Virtual Host www.virtualpoly3.com (IP address 160.1.1.3)
Primary is www.polyserve3.com
Backup #1 is www.polyserve1.com
Backup #2 is www.polyserve2.com
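The pattern in these three configurations is a simple rotation of the server list, so that each virtual host has a different primary. The Python sketch below generates that layout; the function `virtual_host_plan` and the dictionary format are hypothetical and are not Understudy's configuration syntax.

```python
def virtual_host_plan(servers, virtual_ips):
    """Pair each virtual IP with the server list rotated so a different server is primary."""
    plan = {}
    for index, ip in enumerate(virtual_ips):
        # The server at position `index` becomes primary; the rest follow in order.
        plan[ip] = servers[index:] + servers[:index]
    return plan


servers = ["www.polyserve1.com", "www.polyserve2.com", "www.polyserve3.com"]
virtual_ips = ["160.1.1.1", "160.1.1.2", "160.1.1.3"]

plan = virtual_host_plan(servers, virtual_ips)
for ip, order in plan.items():
    print(ip, "primary:", order[0], "backups:", order[1:])
```

The same rotation extends to larger pools, assuming one virtual IP address is defined per real server.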
Figure 3: Clients on the web resolve www.polyserve.com through the DNS server, whose round robin setup returns 160.1.1.1, 160.1.1.2, and 160.1.1.3 in turn. Behind the router, these addresses belong to the virtual hosts www.virtualpoly1.com, www.virtualpoly2.com, and www.virtualpoly3.com respectively.
Finally, the DNS server is configured for round robin with the following A records added to the correct files:
www.polyserve.com. 60 IN A 160.1.1.1
www.polyserve.com. 60 IN A 160.1.1.2
www.polyserve.com. 60 IN A 160.1.1.3
At this point you might be asking, “Why have we defined three virtual hosts, each containing all three Polyserve web servers?” As we will now show you, the key is that each virtual host cluster has a different primary host. Let’s run through an example to see how this setup operates.
Figure 3 shows three clients that will all attempt to access www.polyserve.com. Client #1 makes the first attempt. A DNS lookup is done and because 160.1.1.1 is the first A record in the DNS file, it is returned as the target address. This is the virtual host www.virtualpoly1.com and because Understudy is configured for the primary server to be www.polyserve1.com in this virtual host pool, it receives the client request. When client #2 makes the request, the DNS lookup returns the next A record in the DNS server (160.1.1.2), which is www.virtualpoly2.com and is handled by www.polyserve2.com (the primary server for the virtual host www.virtualpoly2.com). Finally, when client #3 accesses www.polyserve.com, its request is ultimately managed by www.polyserve3.com.
So why did we need 3 virtual host clusters for this to work? As we will show, it is when a server fails or needs to be taken offline that this setup is most effective. If no servers fail, then the cluster operates just as you would expect. Each server handles 1/3 of the requests via the DNS round robin entry in the name server file. Figure 4 shows what happens when www.polyserve1.com (the real server) goes down. If Understudy were not used, the DNS round robin setup would forward every third request to this server and none of these requests would be handled (they would go into the proverbial “void”).
Figure 4: The configuration of figure 3 after www.polyserve1.com goes down. Clients on the web still resolve www.polyserve.com through the DNS server to 160.1.1.1, 160.1.1.2, and 160.1.1.3 (virtual hosts www.virtualpoly1.com, www.virtualpoly2.com, and www.virtualpoly3.com).
But with Understudy configured as shown in figure 4, as long as a single server is up and operational, all client requests will go to active servers. Let’s see how this works. Assume www.polyserve1.com goes down. This is shown as the red host in each virtual host pool. The server could have crashed, its HTTP service might no longer be operating correctly, or it might have been removed for maintenance reasons. Within a few seconds (the default is 10 seconds), Understudy realizes the virtual host cluster www.virtualpoly1.com has lost its primary server. It then makes www.polyserve2.com the active server for this virtual host. Each new client that is resolved by DNS to 160.1.1.1 (one out of every 3 requests will go to this address) goes to the virtual host www.virtualpoly1.com. Since this virtual host is now pointing to www.polyserve2.com, this host now handles all requests for www.virtualpoly1.com. Since the primary servers of the other virtual hosts (www.virtualpoly2.com and www.virtualpoly3.com) are up, these servers continue to handle each request that comes their way. The fact that www.polyserve1.com went down has no effect on the requests to www.virtualpoly2.com or www.virtualpoly3.com.
As a result, each request that DNS sends to the 160.1.1.1 host (www.virtualpoly1.com) now actually goes to www.polyserve2.com instead of www.polyserve1.com (which is down). Two out of every three round robin client requests go to www.polyserve2.com, while the third goes to www.polyserve3.com. Each client request through DNS goes to an active, operational machine and is not transferred to the “void.” Obviously www.polyserve2.com is handling more requests than www.polyserve3.com, but this is certainly much better than every third request not being handled correctly.
The more servers in the pool, the less load each will have to handle in case of a server failure.
And what about existing sessions that do not need to go through DNS again? Will they continually be sent to the failed server? In fact they will also be re-routed to www.polyserve2.com, since the gratuitous ARP message forces all requests to the virtual host 160.1.1.1 to the backup server. Therefore, even existing sessions will be routed to working servers.
One issue that often comes up is that if this backup server does not have the same data as the original server, it is possible that the client request will not have access to the same data. Fortunately, Understudy supports data replication and synchronization, so the servers can automatically have their data replicated and synchronized for complete cluster control (if this is required).
When www.polyserve1.com comes back online, all requests for 160.1.1.1 will again be routed to www.polyserve1.com (since it is the primary for this virtual host and will be used whenever it is up). What happens if both www.polyserve1.com and www.polyserve2.com go down? www.polyserve3.com will handle all client requests for www.polyserve.com.
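The failure scenarios in this section can be checked with a short Python sketch. It assumes that DNS round robin sends an equal share of requests to each of the three virtual host addresses and that each virtual host is served by the first server in its priority order that is still up. The function `request_shares` is hypothetical and only models the behaviour described in the text.

```python
from collections import Counter
from fractions import Fraction

# Priority order (primary first) for each virtual host address, as configured above.
VIRTUAL_HOSTS = {
    "160.1.1.1": ["www.polyserve1.com", "www.polyserve2.com", "www.polyserve3.com"],
    "160.1.1.2": ["www.polyserve2.com", "www.polyserve3.com", "www.polyserve1.com"],
    "160.1.1.3": ["www.polyserve3.com", "www.polyserve1.com", "www.polyserve2.com"],
}


def request_shares(down):
    """Return each live server's share of requests when the servers in `down` have failed."""
    handled = Counter()
    for order in VIRTUAL_HOSTS.values():
        live = [server for server in order if server not in down]
        if live:
            handled[live[0]] += 1  # first live server in priority order takes the address
    total = len(VIRTUAL_HOSTS)
    return {server: Fraction(count, total) for server, count in handled.items()}


one_down = request_shares({"www.polyserve1.com"})
two_down = request_shares({"www.polyserve1.com", "www.polyserve2.com"})
print(one_down)
print(two_down)
```

With one server down the model gives www.polyserve2.com two thirds of the requests and www.polyserve3.com one third; with two servers down, www.polyserve3.com takes all of them.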
Together, Understudy and DNS round robin are a powerful, low cost alternative to purchasing expensive, complex load balancing and clustering solutions. A multitude of applications can be supported in this configuration, including: HTTP, FTP, SMTP, and various TCP/IP applications. Understudy has a range of other features and as discussed earlier, supports Linux, Solaris, and NT. An evaluation copy of the product can be requested online by going to PolyServe’s web page at www.polyserve.com.