Load Balancing Algorithms

Load balancing is a strategy for distributing requests from a single IP address and port to multiple processes across multiple servers. When it’s time to bring a service to pro- duction, it provides three essential benefits: failover, scalability, and decreased latency. First, even in the smallest deployment, load balancing ensures that a service remains available when a portion of a Ruby process is unable to process requests (a capability known as failover). Second, as you need to handle increased capacity, load balancing allows you to scale by adding processes. Third, with a good algorithm, load balancing helps maintain the best possible latencies in the face of response time variation.

Services can be scaled horizontally or vertically. Vertical scaling works by increasing the resources (memory, CPU, I/O, and so on) of each server so each server can do more work. Horizontal scaling, on the other hand, means increasing capacity by adding more servers. Services are well suited to horizontal scaling; therefore, balancing the load across multiple servers is particularly important. In a service-oriented design, the load balancer sits between the application processes and the service processes. Figure 8.1 illustrates how the different pieces fit together.

As a load balancer receives each request, it must decide which process (or backend) should handle the request. The simplest algorithm a load balancer could implement might be to randomly select a known backend, but the result would not be very bal- anced. Some backends might receive too many requests while others sit idle. Fortu- nately, there are a number of better load balancing strategies that are easy to use, including round-robin, least-connections, and URI-based load balancing.

ptg

Round-Robin Load Balancing

A load balancer operating using a round-robin algorithm keeps an internal counter of which backend was most recently used, and it uses the next choice from a sequential list of all known backends. As each request arrives, the next backend is selected, and the counter is incremented, looping back to the beginning of the list when necessary. If every request requires the same amount of computation to process, the workload is distributed evenly. Figure 8.2 shows how round-robin load balancing works.

Round robin is straightforward and can provide two of the primary benefits of load balancing—horizontal scalability and failover—but it doesn’t perform well when there is variability in response times. If a single-threaded Ruby process is processing a slower request and a fast request is distributed to it, the fast request has to wait until the slower request is finished before it can start to be handled. In practice, most web services have some actions that respond much more quickly than others, so an algorithm that is designed to handle such situations gracefully is often preferable. The least-connections algorithm, discussed next, is designed specifically to handle the scenario of variable response times.

Rails Application

Service Load Balancer

Service Service Service

ptg

Least-Connections Load Balancing

With the least-connections algorithm, the load balancer attempts to route requests based on the current amount of load on each backend. It accomplishes this by keep- ing track of the number of active connections for each backend. A backend’s connection counter is incremented as requests are dispatched to it and decremented as they complete. When a request arrives, the load balancer routes it to the backend with the fewest active connections. Figure 8.3 shows the operation of a least-connections load balancer.

Unlike round robin, least-connections algorithms avoid the problem of a request queuing behind another when a backend is available. The requests still take the same amount of time to process in the backend, but the latency, as seen by the service client, is more consistent.

Rails Application

Load Balancer Last Used Backend: svcs2

svcs1 svcs2 svcs3

ptg

URI-Based Load Balancing

An alternative approach to load balancing relies on the data in the request rather than the state of the clusters processing the data. URI-based load balancers work by using a hash algorithm to map each path to a backend. In a service-based architecture that leverages REST-style URLs, this means all requests for a piece of data can be routed to the same backend (server) while it’s available. If the backend goes down, the load balancer adjusts by routing requests to available backends in the pool. Figure 8.4 shows how a URL-based load balancer distributes load.

The question of why a service implementer would want to load balance based on URL instead of active connections is an interesting one. The answer usually relates to caching. If a given backend can cache the representations of N objects, where N is sig- nificantly smaller than the total number of objects in the system, then a URL-based load balancing strategy can significantly increase the number of requests that result in

Rails Application Load Balancer svcs1 connections: 1 svcs2 connections: 0 svcs3 connections: 1 svcs1 svcs2 svcs3

ptg

a cache hit (a metric known as cache hit ratio). Sacrificing connection-based load bal- ancing for cache locality is a reasonable approach in some situations, but it is the sort of scaling consideration that should be carefully weighed and measured.

In document SERVICE-ORIENTED DESIGN WITH RUBY AND RAILS (Page 170-174)