Web Intelligence with High Availability A Demand Driven Approach


How to build a highly available system to provide a thin-client tool for query, reporting and analysis.


Business Case

Data warehouse experts say that the use of data warehouses is growing at a phenomenal rate. As the number of users rises day by day, it becomes imperative that users can access the required information over the web, since a client-side installation for every user would add to the cost. We therefore need a scalable system in which users access reports over the web. The flip side is that network traffic goes up along with the usage of the servers.

As a fundamental part of its philosophy, an organisation would want a high degree of resilience and fault tolerance from the system, which has to be available on a 24 x 7 basis. This in turn increases the business users' confidence in the system.

This white paper focuses on load balancing and resilience for Web Intelligence, a product of Business Objects, but the concept is generic and may be applied to other products as well. Although Business Objects products have their own internal method of load balancing, it is not adequate when very high availability is required. This white paper explores options for load balancing external to Business Objects using CISCO switches, and the different load balancing techniques that can be used to make optimum use of the available infrastructure.

Brief Introduction to Web Intelligence

WEBINTELLIGENCE, a product of Business Objects, is the industry’s leading thin-client tool for query, reporting, and analysis. End users access WEBINTELLIGENCE using standard web browsers such as Microsoft Internet Explorer or Netscape Navigator.


The WEBINTELLIGENCE system is the infrastructure on which the Business Objects distributed solution relies. This solution gives business users the ability to access, analyse, and share information in intranet, extranet, and e-business environments. It provides IT departments with the tools they need to effectively control and manage enterprise-wide and inter-enterprise user access. Business Objects, which is installed on the desktop, is used to create reports. When such reports need to be viewed over the intranet or extranet, the Web Intelligence distributed solution provides the means to do it. Web Intelligence has several modules; the important ones are the Session Manager, which manages the session information of the users, and the BOManager, which processes the reports. To obtain optimum performance at optimum cost, these modules can be installed separately on different machines. This in turn helps to achieve load balancing and fault tolerance.

Achieving High Availability and Load Balancing for WebIntelligence

Load balancing and fail over can be achieved using two methods:

1. Software Load Balancing

2. Hardware Load Balancing

Many software products provide their own means of load balancing. In the case of Web Intelligence, each cluster manager can have a number of nodes attached to it. The cluster manager forwards requests to process reports to the node that is least loaded at that point in time. Further, depending on the configuration environment, if WebIntelligence is installed on application servers like WebSphere or WebLogic, these can provide their own internal load balancing mechanism, but it may not be as fast and reliable as hardware load balancing. The reason is that hardware load balancers are built just for this purpose; the flip side is that they are costly.

Hardware load balancing can be achieved with the help of IP redirectors. There are several IP redirectors available in the market, and among them the Cisco CSS (Content Services Switch) is the most popular. This paper uses the Cisco CSS as an example, but other IP redirectors can be used in its place.

Using CISCO CSS as IP redirectors

CISCO switches forward requests from clients to different servers based on the load-balancing algorithm that has been implemented. In a nutshell, the CISCO switches forward requests from clients to servers that have the least load or that are up and running, thereby ensuring high availability, fail over and performance. The Session Manager of WebIntelligence maintains the session, and there is no mechanism by which one session manager can share this session information with another session manager.


Hence when a machine that hosts the session manager fails, the user has to log in again so that the session is created again in the next session manager. Keeping this behaviour of Web Intelligence in mind, we need a load balancing strategy that gives optimum performance at optimum cost.

Hence we require a load balancing mechanism in which, once a user is authenticated by one cluster manager, the user sticks to the same cluster manager until choosing to log out. During a session, the CSS helps to maintain an association between a client and a server. This is known as stickiness. If the CSS determines that a client is already stuck to a particular service, then the CSS places the request on that service, regardless of the load balancing method specified. This stickiness can be implemented with the help of cookies.

Client cookies uniquely identify clients to the servers providing content. A cookie is a small data structure used by a server to deliver data to a Web client and request that the client store the information, and in certain applications, return the information to the server to maintain the state between a client and a server.

Because the application must distinguish each user or group of users, the CSS needs to determine how a particular user is stuck to a specific Web server. The CSS can use a variety of methods to do this.

The Session Manager module of WebIntelligence maintains the sessions of the logged-in users. Hence requests from one client should always go to the same session manager: once the CSS determines that a client is stuck to a particular service, it should place that client's requests on the same server every time. This sticky configuration is imperative when configuring load balancing for WebIntelligence.
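The rule that an existing sticky association overrides the balance method can be sketched in a few lines of Python. This is only an illustration of the idea, not CSS configuration or CSS internals: the class `StickyBalancer`, the client keys and the service names are all hypothetical, and round robin is assumed as the underlying balance method.

```python
from itertools import cycle


class StickyBalancer:
    """Round-robin balancer in which an existing sticky entry always wins."""

    def __init__(self, services):
        self._next_service = cycle(services)  # assumed balance method
        self._sticky = {}                     # client key -> service

    def route(self, client_key):
        # A client that is already stuck goes to the same service,
        # regardless of what the balance method would pick next.
        if client_key not in self._sticky:
            self._sticky[client_key] = next(self._next_service)
        return self._sticky[client_key]

    def logout(self, client_key):
        # Drop the association once the user ends the session.
        self._sticky.pop(client_key, None)


balancer = StickyBalancer(["session-manager-1", "session-manager-2"])
first = balancer.route("client-a")    # new client: round robin picks
second = balancer.route("client-b")   # new client: next in rotation
repeat = balancer.route("client-a")   # stuck client: same service as before
```

What identifies a client (the `client_key` here) is exactly the design choice discussed in the following sections: a source IP address, an IP address and port pair, or a string carried in a cookie or URL.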

Various methods of implementing the sticky behaviour

Source IP address (Layer 3)

In this method, the client is stuck to one particular server based on the IP address of the client machine. The CSS can keep 32K simultaneous users stuck to particular servers; the next user would be unstuck. If the volume of the site is such that there are more than 32K users at a time, or if a large percentage of users come through a mega proxy, a different approach to the sticky mechanism needs to be looked at.

Source IP address and destination port (Layer 4)

In this method, stickiness is maintained based on the combination of the source IP address and the destination port number. This is similar to Layer 3 and has the same limitations as the Layer 3 sticky mechanism. If the CSS sees the same IP address with two different destination ports, it will use two entries. If there is concern about the number of simultaneous users, it would be wiser to use the advanced balance methods.
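The difference in sticky-table usage between the two methods can be shown with a small Python sketch. It only models the keys that would occupy entries; the function `sticky_entries` and the sample addresses are hypothetical and are not part of any CSS interface.

```python
def sticky_entries(connections, layer):
    """Return the set of sticky-table keys for the given connections.

    connections: iterable of (source_ip, destination_port) tuples.
    layer: 3 keys on source IP only; 4 keys on source IP plus destination port.
    """
    if layer == 3:
        return {ip for ip, _port in connections}
    if layer == 4:
        return {(ip, port) for ip, port in connections}
    raise ValueError("layer must be 3 or 4")


connections = [
    ("10.0.0.5", 80),
    ("10.0.0.5", 443),  # same client, second destination port
    ("10.0.0.9", 80),
]
layer3_count = len(sticky_entries(connections, 3))
layer4_count = len(sticky_entries(connections, 4))
print(layer3_count, layer4_count)  # prints "2 3"
```

The same two clients consume two entries under Layer 3 stickiness but three under Layer 4, so a site close to the 32K limit reaches it sooner with the Layer 4 method.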


String found in a cookie or a URL (String method)

Stickiness can also be implemented based on the cookie that is sent from the server. There are three advanced balance methods, namely:

• Cookie method

• Url method

• Cookieurl method

Whenever any of the above options is chosen for the load balancing mechanism, the cookie or the URL or cookieurl sent by the server is analysed by the CSS to determine to which server the request should be sent.

The advantage of the above three methods is that the application controls how the load balancing works. At times this may prove to be a disadvantage, as programmers may not want their application to take care of load balancing and may not consider it within their remit. In that case, we would have to use the advanced balance ArrowPoint cookie.
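As a rough illustration of the cookie method, the Python sketch below looks for a string in the request's Cookie header and maps it to a service. The cookie name `server`, the values `cm1` and `cm2`, and the service names are assumptions made for the example; in practice the application decides what string it places in the cookie, and the switch is configured to match it.

```python
from http.cookies import SimpleCookie

# Hypothetical mapping from the string in the cookie to a back-end service.
SERVICES_BY_STRING = {
    "cm1": "cluster-manager-1",
    "cm2": "cluster-manager-2",
}


def service_from_cookie(cookie_header, cookie_name="server"):
    """Return the service named by the cookie, or None if there is no match."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(cookie_name)
    if morsel is None:
        return None  # no sticky string: fall back to the balance method
    return SERVICES_BY_STRING.get(morsel.value)


stuck = service_from_cookie("lang=en; server=cm2")
unstuck = service_from_cookie("lang=en")
```

Here `stuck` resolves to the second cluster manager, while `unstuck` is `None`, meaning the request carries no sticky string and would be balanced normally.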

Using ArrowPoint Cookie

Stickiness can also be achieved by using the advanced balance ArrowPoint cookie, in which the client is stuck to the server based on the unique service identifier of the selected server carried in the ArrowPoint-generated cookie. In this method, there is no need to configure any string match criteria, and hence there is no dependency on the application when configuring the load balancing mechanism in the Cisco switches.

Design Decisions

Based on the features provided by WebIntelligence and CISCO switches, various architectures are possible. Depending on the nature of the business, one can choose the architecture that best suits the business needs. Let us look briefly at the possible architectures and at where CISCO switches prove useful in providing an infrastructure for Web Intelligence with very high resilience, fail over and load balancing capability.

Note that whenever load balancing is implemented, the storage where the reports and universes are cached has to be shared across all the cluster managers and nodes. It is recommended to use RAID storage, shared using a CIFS technology.

Load Balancing and Fail Over using a Backup/Redundant Cluster Manager


Pros:

• Only one Cluster Manager to administer at a time.

Cons:

• Requires manual intervention to bring the system up when the main Cluster Manager fails.

• The backup/redundant cluster manager is idle when the main cluster manager is working. No optimum use of the infrastructure.

[Fig 1: diagram showing the LAN/WAN, the Main Cluster Manager, the Backup Cluster Manager, the nodes and the shared storage/RAID, with the label "Processes Reports, Dynamically Builds HTML".]


Figure 1 shows an approach where load balancing is achieved by using several nodes working in tandem with the main cluster manager. The cluster manager forwards requests for processing reports to the node machine that has the least load at that point in time. Further, based on the processing power of each node, a parameter called the node load factor can be set, which enables a node with high processing power to process more reports than the less powerful nodes.
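One way to picture least-loaded dispatch with a node load factor is the Python sketch below. It assumes, purely for illustration, that the cluster manager compares each node's active report count divided by its load factor; how WebIntelligence actually weighs the load factor is not described here, and the names `Node`, `pick_node` and `dispatch` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    load_factor: float       # assumed: higher means a more powerful machine
    active_reports: int = 0


def pick_node(nodes):
    """Pick the node with the lowest load relative to its load factor."""
    return min(nodes, key=lambda n: n.active_reports / n.load_factor)


def dispatch(nodes, report_count):
    """Assign reports one at a time and return the final count per node."""
    for _ in range(report_count):
        pick_node(nodes).active_reports += 1
    return {n.name: n.active_reports for n in nodes}


nodes = [Node("node-a", load_factor=2.0), Node("node-b", load_factor=1.0)]
result = dispatch(nodes, 6)
# result == {"node-a": 4, "node-b": 2}
```

With a load factor of 2.0 against 1.0, the more powerful node ends up with twice as many of the six reports, which is the effect the parameter is meant to achieve.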

In order to provide fault tolerance or fail over capability, there can be either a Backup Cluster Manager or a Redundant Cluster Manager. In the case of a Backup Cluster Manager, the OSAgent port used by the backup cluster manager is the same as that of the main cluster manager. If there is a fault in the main cluster manager, it has to be shut down and the backup cluster manager and its nodes have to be started.

In case a redundant cluster manager is used, it is up and running all the time on its own OSAgent port number. So in the event the main cluster manager fails, users have to access the system using a different URL.

The disadvantage of this type of load balancing and fail over is that at any point in time the entire infrastructure is not put to optimum use. Further, manual intervention is required to bring up the redundant or backup cluster manager when the main cluster manager fails.

Load Balancing and Fail Over using Cluster Managers that are Load Balanced

Figure 2 shows two cluster managers that are load balanced and also capable of fail over. The cluster managers are load balanced by the CISCO switches in a round robin fashion using a sticky mechanism. A sticky mechanism on the Cisco switches is required when load balancing is implemented on WebI servers, as the session created in one cluster manager does not exist in another cluster manager. Hence a suitable sticky mechanism has to be designed before the load balancing and fail over mechanism is implemented using CISCO switches.

Once a user logs into the system, the session is created in one of the cluster managers. The user sticks to the same cluster manager until logging out. If one of the cluster managers fails, the user just has to log in again in order to create another session. Hence the fail over is close to seamless, with apparently no down time. Once it is identified that a cluster manager has failed, it just has to be brought up again so that the CISCO switches recognise it and resume sending requests to it.
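The fail over behaviour described above can be sketched as follows. This is a simplified model under stated assumptions, not a description of how the switch is implemented: the class `FailoverBalancer`, the health flags set through `mark_failed` and `mark_recovered`, and the manager names are hypothetical, and a real switch would detect failures through its own health checks.

```python
class FailoverBalancer:
    """Round robin with stickiness that skips failed cluster managers."""

    def __init__(self, managers):
        self.managers = list(managers)
        self.healthy = set(managers)
        self.sticky = {}   # user -> cluster manager
        self._turn = 0

    def _round_robin(self):
        # Skip cluster managers that are currently marked as failed.
        for _ in range(len(self.managers)):
            candidate = self.managers[self._turn % len(self.managers)]
            self._turn += 1
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no cluster manager is available")

    def route(self, user):
        """Return (cluster_manager, must_login) for this user's request."""
        current = self.sticky.get(user)
        if current in self.healthy:
            return current, False
        # New user, or the stuck cluster manager failed: the old session
        # does not exist elsewhere, so the user has to log in again.
        self.sticky[user] = self._round_robin()
        return self.sticky[user], True

    def mark_failed(self, manager):
        self.healthy.discard(manager)

    def mark_recovered(self, manager):
        self.healthy.add(manager)


lb = FailoverBalancer(["cm-1", "cm-2"])
login = lb.route("alice")        # ("cm-1", True): first login
stuck = lb.route("alice")        # ("cm-1", False): same session
lb.mark_failed("cm-1")
relogin = lb.route("alice")      # ("cm-2", True): must log in again
lb.mark_recovered("cm-1")
newcomer = lb.route("bob")       # ("cm-1", True): recovered manager is used
```

The only cost of a failure to the user in this model is the second login; no manual intervention is needed to redirect traffic.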


Pros:

• Optimum use of infrastructure, as all the machines are put into use at any point in time.

Cons:

• The NLB is a single point of failure.

• Two cluster managers to be administered at a time.

[Fig 2: diagram showing the LAN/WAN, the NLB, two Cluster Managers, the nodes and the shared storage/RAID.]


Load Balancing with Remote Site Fail Over

Pros:

• System with very high availability.

• No single point of failure.

Cons:

• Multiple cluster managers to be administered.

• Additional NLB required for configuring backup services to remove any single point of failure.

[Fig 3: diagram showing Site A and Site B, four Cluster Managers, an NLB farm with two NLBs, the LAN/WAN and the shared storage/RAID.]


These days, organisations deem it necessary to provide for remote site fail over as well. In the event of a disaster with complete loss of one site, the system should still be available using the infrastructure that is still up and running at another site.

Figure 3 shows a very high availability Web Intelligence system with high resilience and remote fail over capability, in which there is no single point of failure. Assuming the loss of the entire Site A, Site B would still be available. To achieve this, we require CISCO switches in Site A and Site B, configured in master and backup mode. The switches are configured in such a way that even if one CISCO switch fails, the backup service configured in the other switch comes into effect.

In the previous two architectures we find that the cluster manager is a single point of failure and hence a downtime would be inevitable in the event the cluster manager fails.

This architecture provides the following advantages:

1. No single point of failure.

2. Even if an entire site fails, the system is still available at the other site.

3. If a CISCO switch fails, the service is still available, as a backup service is available in the switch at the alternate site.
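The master and backup arrangement between the two sites can be pictured with a minimal Python sketch. It assumes each switch simply reports whether it is up; the function `active_switch` and the switch names are hypothetical, and the actual redundancy protocol used between the switches is not modelled.

```python
def active_switch(switches):
    """Return the name of the switch that should serve traffic, or None.

    switches: list of (name, role, is_up) tuples, where role is
    "master" or "backup". The master is preferred whenever it is up.
    """
    for role in ("master", "backup"):
        for name, switch_role, is_up in switches:
            if switch_role == role and is_up:
                return name
    return None


normal = active_switch([
    ("css-site-a", "master", True),
    ("css-site-b", "backup", True),
])
site_a_lost = active_switch([
    ("css-site-a", "master", False),
    ("css-site-b", "backup", True),
])
```

In normal operation the Site A switch serves traffic; when Site A is lost, the backup service on the Site B switch takes over, which is what removes the switch as a single point of failure.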

Conclusion

Every organisation would require a highly available system to provide a thin-client tool for query, reporting and analysis. Based on its existing infrastructure and available budget, an organisation can set up such a system, though it may face some challenges along the way. This white paper is an effort to show that organisations have a wide range of available technologies to choose from.

ABOUT THE AUTHOR

Zebah Singh Alfred is a Business Objects and BI consultant with Wipro Technologies and is responsible for recommending BI architectures. He holds a degree in Electrical and Electronics Engineering from Government College of Technology, Coimbatore. The author can be contacted at [email protected]


ABOUT WIPRO TECHNOLOGIES

Wipro is the first PCMM Level 5 and SEI CMMi Level 5 certified IT services company globally. Wipro provides comprehensive IT solutions and services (including systems integration, IS outsourcing, package implementation, software application development and maintenance) and research & development services (hardware and software design, development and implementation) to corporations globally. Wipro’s unique value proposition is further delivered through our pioneering Offshore Outsourcing Model and stringent quality processes of SEI and Six Sigma.

WIPRO IN DATA WAREHOUSING AND BUSINESS INTELLIGENCE

Wipro provides end-to-end Data Warehousing and Business Intelligence services to customers across verticals like Finance, Insurance, Utilities, Telecom, Retail, Logistics, Manufacturing and Healthcare. Wipro has implemented Business Intelligence and Data Warehousing solutions for over 90 customers worldwide including 30 Fortune 1000 clients. Wipro has evolved its ‘Insta Intelligence’ project management and delivery methodology around leading edge technologies in Data Acquisition, Data Modelling, Data Management, OLAP, Data Mining, and Meta-data Management to deliver innovative, sure-fire solutions to its customers.

For Further information, please visit: