TABLE 7 – ROUTINE TASKS - OUAF Technical Best Practices

TASK | COMMENTS

Perform Backups | Back up the database and file system using the site procedures and the tools designated for your site.

Post Process Logs | Check the log files for any error conditions that may need to be addressed. Refer to Post Process Logs and Check Logs For Errors for more details.

Process Performance Data | Collate and process the day's performance data to assess it against any Service Level targets. Identify any badly performing transactions.

Perform Batch Schedule | Execute the batch schedule agreed for your site. This will include overnight, daily, hourly and ad hoc background processes.

Rebuild Statistics | DB2 and Oracle require the database statistics for the product schemas to be rebuilt on a regular basis so that SQL access paths are optimized. At DB2 sites, a rebind is also required to reflect the changes in the execution plans/packages.

File Cleanup | On a regular basis, the output files and logs from the background processes need to be archived and removed to minimize disk space usage.

Archive Data not required | The Oracle Utilities Application Framework features an inbuilt archiving facility that can transfer transaction data no longer required for online processing to another environment or to a file, or simply delete it. Refer to Archiving for more details.

Run Cleanup Batch Jobs | A number of background processes remove staging records that have already been successfully processed. Refer to "Removal of Staging Records" for more details.

Note: The tasks listed above do not constitute a comprehensive list of what needs to be performed. During the implementation you will decide what additionally needs to be done for your site.
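Part of the Post Process Logs task can be automated with a simple scan. The sketch below is a minimal illustration under three assumptions: the log files are plain text, they sit in a single directory, and error lines contain one of the markers ERROR, FATAL or SEVERE. The function name `find_errors` and the markers are hypothetical, so adapt them to the log format and locations used at your site.

```python
import re
from pathlib import Path

# Hypothetical severity markers; adjust to the log format used at your site.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL|SEVERE)\b")


def find_errors(log_dir: str, pattern: str = "*.log") -> list[tuple[str, int, str]]:
    """Return (file name, line number, line text) for each line matching ERROR_PATTERN.

    Files are scanned in name order; undecodable bytes are replaced rather than
    aborting the scan.
    """
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if ERROR_PATTERN.search(line):
                    hits.append((path.name, lineno, line.rstrip("\n")))
    return hits
```

For example, a scheduled job could call `find_errors("/path/to/logs")` each day and pass any matches to an operator for review.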

Typical Business Day

One pattern seen at many sites is a common definition of a business day. The business day is typically defined during the implementation for planning purposes. It defines when the call center is at peak or off-peak, when background processing can be performed and when backups are taken.

Figure 5 – Example Typical Business Day (24-hour timeline from 0 to 24 hours showing peak and off-peak online periods, backups, overnight batch, daily/ad-hoc/hourly batch and monitoring)

Note: The above diagram is for illustrative purposes only and could vary for your site.

Typically a business day contains the following elements:

• There is a peak online period during which the majority of call center business is performed. This typically falls within business hours, which vary according to local custom.

• There is a call center off-peak period during which the volume of call center traffic is greatly reduced compared to the peak period. In call centers that operate 24x7, this typically covers overnight and weekends. During this time the call center is reduced in size (usually a skeleton shift).

Some sites do not operate during off-peak periods and rely on automated technology (e.g. IVR) to process transactions such as payments.

• Backups are performed either at the start or at the end of the peak period. The decision is based on the risk of the background processing failing and the impact such a failure would have on online processing. The product-specific background processes can be run at any time, but avoiding them during peak time maximizes the computing resources available for processing call center transactions. Taking the backup at the end of the peak period is the most common pattern amongst product customers.

• Background processes are run at both peak and off-peak times. The majority of the background processing is performed at off-peak times to maximize the computing resources available for its successful completion. The background processing run at peak times is usually there to check ongoing call center transactions for adherence to business rules and to process interface transactions ready for overnight processing.

• Monitoring is performed throughout both peak and off-peak times. The monitoring regime may combine manual and automated tools and utilities to monitor compliance against agreed service levels. Any non-compliance is tracked and resolved.

The definition of the business day for your site is crucial for scheduling background processing and for setting monitoring regimes appropriate to the expected traffic levels.
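Once agreed, the business day can be recorded in a machine-readable form that scheduling or monitoring scripts can consult. The sketch below uses hypothetical windows: peak from 08:00 to 18:00, backup at the end of the peak period, and overnight batch from 20:00 to 06:00. These hours are illustrative only, not product defaults, and the names `Window`, `BUSINESS_DAY` and `activities_at` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    name: str
    start: int  # hour of day, inclusive (0-23)
    end: int    # hour of day, exclusive (1-24); a window may wrap past midnight

    def contains(self, hour: int) -> bool:
        if self.start <= self.end:
            return self.start <= hour < self.end
        # Window wraps midnight, e.g. 20:00 to 06:00.
        return hour >= self.start or hour < self.end


# Hypothetical business day; replace with the windows agreed for your site.
BUSINESS_DAY = [
    Window("peak online", 8, 18),
    Window("off-peak online", 18, 8),
    Window("backup", 18, 20),
    Window("overnight batch", 20, 6),
    Window("daily/ad-hoc/hourly batch", 0, 24),
    Window("monitoring", 0, 24),
]


def activities_at(hour: int) -> list[str]:
    """Return the planned activities for a given hour of the day."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return [w.name for w in BUSINESS_DAY if w.contains(hour)]
```

A batch submission script could, for example, refuse to start a heavy job when `"peak online"` appears in `activities_at(current_hour)`.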

Note: This facility is available in Oracle Utilities Application Framework V4 and above only.

In past releases of the Oracle Utilities Application Framework, the userid used to log in was restricted to 8 characters in length.

In Oracle Utilities Application Framework V4 and above, the concept of a Login Id is supported. The framework uses this attribute to authenticate the user. For backward compatibility, the 8-character userid field is still used internally for auditing purposes. Therefore both the Userid and the Login Id should be populated. They can hold the same or different values.

The Login Id can be set manually, provisioned via Oracle Identity Manager, or auto-generated by a class extension.

Figure 6 – Login Id
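Some sites generate one of the two values from the other. The sketch below shows one way to derive a unique 8-character userid from a longer Login Id. It is a hypothetical rule in Python, not the OUAF class extension API; an actual class extension would be written against the framework's own interfaces. The function name and the upper-casing and suffix rules are assumptions.

```python
import re

USERID_MAX_LEN = 8  # legacy userid length limit described above


def derive_userid(login_id: str, existing: set[str]) -> str:
    """Derive a unique userid of at most 8 characters from a Login Id.

    Hypothetical rule: keep only letters and digits, upper-case, truncate,
    and append a numeric suffix if the candidate is already in use.
    """
    base = re.sub(r"[^A-Za-z0-9]", "", login_id).upper()[:USERID_MAX_LEN]
    if not base:
        raise ValueError("Login Id contains no usable characters")
    candidate = base
    counter = 1
    while candidate in existing:
        suffix = str(counter)
        # Shorten the base so that base + suffix still fits in 8 characters.
        candidate = base[: USERID_MAX_LEN - len(suffix)] + suffix
        counter += 1
    return candidate
```

For example, a Login Id of `jane.smith@example.com` would yield `JANESMIT`, or `JANESMI1` if `JANESMIT` is already taken.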

Hardware Architecture Best Practices

Note: There is a more detailed discussion of effective architectures in the Oracle Utilities Application Framework Architecture Guidelines whitepaper. Refer to that document for further advice.

The product can be run on various combinations of hardware architectures. When choosing the architecture best suited to a site, there are a number of key factors that must be considered:

• Cost – When deciding on a preferred architecture, the total cost of the machine(s) and infrastructure needs to be taken into consideration. This should include the ongoing costs of maintenance as well as power costs.

• IT Maintenance Effort – When deciding on a preferred architecture, the manual or automated effort of maintaining the hardware in that architecture needs to be factored into the solution.

• Availability – One of the chief motivations for settling on a multi-machine architecture is the requirement to support high availability. When deciding on a preferred architecture, the tolerance for unavailability and the cost of availability need to be factored into the solution.

Single Server Architecture

If the site is cost sensitive and/or the availability requirements allow it, then having all the architecture on a single machine is appropriate. This is known as the single server architecture. This configuration is popular with some sites because:

• The cost of the hardware can be minimal (or at least very cost effective).

• Maintenance costs can be minimized with the minimal hardware.

• Virtualization software (typically part of the operating system or third party virtualization software) can be used to partition the machine into virtual machines.

The one issue that makes this solution less than ideal is the risk of unavailability due to hardware failure. Customers that choose this solution typically address this shortcoming by buying a second machine of similar size and using it for failover and disaster recovery as well as non-production. In essence, if the primary hardware fails, the backup machine assumes responsibility for production until the hardware fault is resolved. In this case, additional effort is required to keep the secondary machine synchronized with the primary.

The diagram below illustrates the single server architecture:

Figure 7 – Example Single Server Architecture (Browser Client connecting to one machine hosting the Web Application Server, Business Application Server and Database Server)

Simple Multi-tier Architecture

One of the variations on the single server architecture is the "simple multi-tier architecture". In this hardware architecture, the database server and Web Application Server/Business Application Server are separated on different machines. For product V1.x customers, you can also separate the Web Application and Business Application Servers.

This architecture is chosen by customers who want to optimize the hardware (machine size and settings) for each tier and to separate the maintenance effort for each server. For example, Database Administrators need only access the Database Server to perform their duties and set the operating system parameters optimized for the database.

Unfortunately, this solution can cost more than the single server solution and still does not address the unavailability of any machine in the architecture. Customers that have used this model adopt a similar solution to the single server architecture (duplicate machines at a secondary site). They also have the option of making both machines in the architecture the same size and shifting roles when availability is compromised. For example, if the Database Server fails, the Web Application Server can be configured to act as both the Database Server and the Web Application Server.

Figure 8 – Variations of the Simple Multi-Tier Architecture (Browser Client with the Database Server on its own machine; the Web Application Server and Business Application Server either share a second machine or are split across separate machines)

Machines in this architecture can be the same size or different sizes, depending on the costs and benefits of each variation. Customers typically use a smaller machine for the Web Application Server than for the Database Server.

Multiple Web Application Servers

To support higher availability for the product, some sites consider having multiple Web Application Servers. This allows online users to be spread across machines and, in the case of a failure, to be diverted to a machine that is available. To achieve this, the site must use a load balancer (see "Load balancers" discussion later in this document). At the time of failover, the load balancer redirects traffic to the available server. This is possible because the product is stateless.
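The failover behaviour described above relies on each request being independent, so any healthy server can handle the next transaction. The class below is a minimal round-robin sketch of that idea. It does not model any particular load balancer product, and the class name and server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Minimal round-robin load balancer that skips servers marked as down.

    Because the application is stateless, any healthy server can take the
    next request, so failover is simply "skip the failed server".
    """

    def __init__(self, servers: list[str]) -> None:
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._ring = cycle(self._servers)

    def mark_down(self, server: str) -> None:
        self._healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self) -> str:
        # Try each server at most once per call before giving up.
        for _ in range(len(self._servers)):
            server = next(self._ring)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy Web Application Server available")
```

With servers `web1` and `web2`, requests alternate between the two until `mark_down("web1")` is called. After that, every request goes to `web2` until `web1` is marked up again.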

The Web Application Servers are either clustered or managed. Refer to the discussion in the Clustering or Managed? section of this document for advice.

This architecture is quite common because it is flexible. One of the Web Application Servers can be dedicated to batch processing in non-business hours, which makes the architecture more cost effective. Typically the Web Application Server software on that machine is shut down so that batch processing can use the machine's full resources. Meanwhile, users (usually a small subset) continue to process online transactions on the remaining servers.

The only drawbacks of this solution are a potentially higher cost than a multi-tier solution and the potential impact of database unavailability. Customers that use this architecture address database unavailability in one of two ways: a secondary site acts as the failover, or one of the Web Application Servers takes on a failover database server role. The latter is less common, as most customers find it more complex to configure, but it is possible with this architecture.

Figure 9 – Example Multi-Application Server Architecture (Browser Client and Web Application Server Load Balancer in front of multiple Web Application Server/Business Application Server machines sharing a Database Server)

High Availability Architecture

The most hardware-intensive solution is one where every tier in the architecture has multiple machines for high availability and distribution of traffic. The solution can vary (number of machines, etc.), but such solutions share the following attributes:

• There is no single point of failure: there is redundancy at all levels of the architecture. This excludes redundancy in the network itself, which is typically out of scope for most implementations.

• The number of servers will depend on how traffic is segmented between call centers, non-call-center users, interfaces and batch processing. It is possible to reuse existing servers or set up dedicated servers for different types of traffic.

• Availability can be managed with either hardware based solutions, software based solutions or a valid combination of both.

• The number of users will dictate the number of machines to some extent. Experience has shown that a large number of users tends to be better served, from a performance and availability point of view, by multiple machines. Refer to the "What is the number of Web Application instances do I need?" section for a discussion on this topic.

• The Web Application Servers are either clustered or managed. Refer to the discussion in the Clustering or Managed? section of this document for advice.

• Database clustering is typically handled by the clustering or grid support supplied with the database management system.

This solution has the highest cost from both a hardware and a maintenance perspective. Historically, customers with large volumes of data or specific high availability requirements have used this solution successfully.
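The benefit of removing single points of failure can be estimated with standard reliability arithmetic. The sketch below assumes that node failures are independent and that failover is instantaneous, which real deployments only approximate. The function names and the 99% per-node figure in the example are hypothetical.

```python
from math import prod


def parallel_availability(node_availability: float, nodes: int) -> float:
    """Availability of a tier with `nodes` redundant, independent nodes.

    The tier is unavailable only when every node is down at the same time.
    """
    return 1 - (1 - node_availability) ** nodes


def series_availability(tier_availabilities: list[float]) -> float:
    """Availability of tiers that must all be up for a transaction to succeed."""
    return prod(tier_availabilities)
```

Take a hypothetical per-node availability of 0.99. Three non-redundant tiers (web, business application, database) give 0.99³ ≈ 0.9703. Duplicating every tier gives roughly 0.9999³ ≈ 0.9997.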

The figure below illustrates the High Availability Architecture:

Figure 10 – Example High Availability Server Architecture (Browser Client; Web Application Server Load Balancer in front of multiple Web Application Servers; Business Application Server Load Balancer in front of multiple Business Application Servers; Database Server/Cluster)

Failover Best Practices

Failover occurs when a server in your architecture becomes unavailable due to hardware or software failure. Immediately after the failure, the active components in the architecture route the transactions around the failed component to maintain a level of availability. This routing can be done automatically through the use of high availability software/hardware or manually by operators.

The Oracle Utilities product architecture supports failover at all tiers of the architecture, using either hardware or software based solutions. Failover solutions can be varied but a few principles have been adopted successfully by existing customers:

• Failover solutions that are automated are preferable to manual intervention. Depending on the hardware architecture used, the failover capability can be automated.

• Availability goals play a big part in the extent of a failover solution. Sites with high availability targets tend to favor more expensive, comprehensive hardware and software solutions. Sites with lower availability goals (or no goals) tend to use manual processes to handle failures.

• Failover is built into the software used by the products (though it may entail an additional license from the relevant vendor). For example, Web Application Server vendors have inbuilt failover capabilities including load balancing, which is popular with customers.

• Hardware vendors will have failover capabilities at the hardware or operating system level. In some cases, it is an option offered as part of the hardware. Sites use the hardware solution in combination with a software based solution to offer protection at the hardware level. In this case, the hardware solution will detect the failure of the hardware and work in conjunction with the software solution to route the traffic around the unavailable component.

• Failover is easier to implement for the product because the Web Application is stateless. Users only need a connection to the server while they are actively sending or receiving data. While they are entering data and talking on the phone, they are not consuming resources on the machine. For each transaction, the infrastructure routes the calls across the active components of the architecture.

• At the database level, the most commonly used failover facility is the one provided by the database vendor. For example, Oracle Database customers typically implement RAC. Failover configuration at the database is the least used by existing sites, as the cost of the additional hardware is usually prohibitive (or at least not cost effective).

• Sites that want both failover and disaster recovery but cannot afford both separately consider a solution that combines the two. In this case, the disaster recovery configuration is also used as the failover for non-disaster failures.

• For any failover solution to be effective, the site typically analyses all the potential areas of failure in its architecture and configures the hardware and software to cover each eventuality. In some cases, sites have chosen NOT to cover eventualities of extremely low probability. Mean Time Between Failures (MTBF) values from hardware vendors can assist in this decision.
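Vendor MTBF figures are usually combined with an estimated mean time to repair (MTTR) to give the steady-state availability of a component. MTTR is not discussed above and must be estimated for your site. The figures in the example are purely illustrative.

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}
\qquad \text{e.g.} \quad
A = \frac{50\,000\ \mathrm{h}}{50\,000\ \mathrm{h} + 8\ \mathrm{h}} \approx 0.99984
```

In this example, the expected downtime is about (1 − A) × 8 760 h ≈ 1.4 hours per year for that component. This kind of figure can help decide whether covering a particular failure is worth its cost.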

When designing a failover solution, the following considerations are important:

• Determine what the availability goals are for your site.

• Determine the inbuilt failover capabilities of the hardware and software that your site is using. This may reduce the cost of implementing a failover solution if it is already in place.

• List all the components that need to be covered by a failover solution. Review the list to ensure all aspects of "what can fail?" are covered.

• Using all of the above information, design a failover solution that is automated (within reason) for your site. Ensure the solution is simple and reuses already available infrastructure to save costs.

• Sites commonly use the following failover techniques in the architecture:

TABLE 8 – COMMONLY USED FAILOVER TECHNIQUES

TIER | COMMON FAILOVER SOLUTION

Network | Load balancer (hardware for large numbers of users; software-based for others). Consider redundant load balancers for "no single point of failure" requirements.

Web Application Server/Business Application Server | Use inbuilt clustering/failover facilities unless the load balancer is doing this. Consider hardware solutions for batch or interface servers.

Database Server | Use inbuilt failover facilities in the database unless a hardware solution is more cost effective.
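The "what can fail?" review can start from a simple inventory of the components in each tier. The sketch below flags any tier served by a single component. The function name and the component names in the example are hypothetical. It does not consider network paths, power or other shared dependencies, which also need review.

```python
from collections import Counter


def single_points_of_failure(inventory: dict[str, str]) -> list[str]:
    """Return the tiers (sorted by name) served by exactly one component.

    `inventory` maps a component name (e.g. a host) to the tier it serves.
    """
    counts = Counter(inventory.values())
    return sorted(tier for tier, n in counts.items() if n == 1)
```

Consider an inventory with one load balancer, two Web Application Servers and one database server. It would flag the Database Server and Network tiers as single points of failure.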

Online and Batch tracing and Support Utilities

The Oracle Utilities Application Framework provides a set of utilities to aid in capturing information for support. Refer to My Oracle Support Doc ID 1206793.1 (Master Note for Oracle Utilities Framework Products - Online and Batch tracing and Support Utilities) for details and training on using these utilities to provide critical information to help expedite support requests.

General Troubleshooting Techniques

Whilst the troubleshooting features of the product are documented in detail in the online help, the Performance Troubleshooting Guides and other manuals, there are a number of techniques and guidelines that can be used to help identify problems:

• Check the logs in the right order – The log files are usually the best place to look for errors, as the product automatically logs any error to them. The most efficient method is to check the logs from the bottom up: if the error appears in the lower tiers of the architecture, that is most likely where the error occurred.
