An Augmented Normalization Mechanism for Capacity Planning & Modelling: Elegant Approach with Artificial Intelligence

13th Annual International Software Testing Conference 2013

Bangalore, 4th-5th December 2013

Arit Kumar Bishwas

Capgemini India Pvt. Ltd.

A-1, Technology Park, MIDC, Talwade,

Pune-412114, Maharashtra, India

Abstract

Software performance engineering in a heterogeneous environment is a complex and challenging field to practice. The performance of a software application plays a vital role in determining whether the application remains viable, so even a small degradation in performance can seriously threaten its sustainability. We need a model that can intelligently and effectively forecast all the quantifiable dimensions of an application's sustainability in a dynamic, heterogeneous, complex environment that must accommodate future growth. Here I present an approach to developing an intelligent software performance prophecy machine that helps forecast the application's state under different estimates of future workload. This machine has been designed and developed on the basis of a machine learning algorithm, the Artificial Neural Network. It has been used to explore in depth how well an application's performance will hold up in situations where every quantifiable dimension of the multi-tier heterogeneous system architecture changes dynamically along with newly specified performance load targets. The proposed machine is independent in nature and keeps learning from the currently available performance analysis information and other related benchmark information to explore the sustainability of any software application against its future requirements.

Key Factors: Capacity Modelling & Planning, Performance Engineering, Machine Learning, Artificial Neural Network

1. Introduction

In today's fast, technology-driven world we are all surrounded by a complex web of information technology and are highly dependent on computer applications. Computer applications are now part of our day-to-day life, ranging from simple phones to extraterrestrial investigation technology. Such software should therefore be robust and reliable in terms of both its sustainability and its scalable performance.

In this busy world where time is precious, users always expect fast response times when interacting with software applications. As software managers, architects, developers and testers, it is our responsibility to concentrate on the factors that make software robust, reliable and sustainable in its performance.

We need a model that intelligently predicts the performance sustainability of any application. In this paper we discuss one such model. It predicts the performance sustainability of an application using artificial intelligence, and it learns from experience. The model has been developed with one of the most popular machine learning algorithms, the Artificial Neural Network, which is a very accurate predictor for classification problems, offers a robust approach to approximation, and supports real-valued, discrete-valued and vector-valued functions.

Experiments have shown clearly how this new artificial intelligence model, named the Intelligent Performance Sustainable Forecasting Model (IPSFM), helped in predicting the performance sustainability of a software system under increasing future user growth, which in turn demonstrates a significant improvement in the process of capacity planning and modelling.

2. Capacity Planning & Modelling

Background

An organization needs to measure its production capacity, through capacity planning and modelling, in order to meet the changing demands for its products, services, and business.

2.1. Capacity Planning

Capacity planning is the science and art of estimating computing resources, such as hardware, software and infrastructure, in order to identify the appropriate amount of resources needed to handle current and future service requirements. It is a continuous, ongoing process in which the future needs of the business are projected to explore the range of probable "What If" scenarios.

2.2. Capacity Modelling

Capacity modelling, in general, directly relates the capacity forecasts to the associated capital and operational costs. It enables the business to balance the cost of capacity requirements against the benefits associated with business growth or service enhancements. The capacity models portray a complete picture to justify capacity-related decisions.

3. Intelligent Performance Sustainable Forecasting Model (IPSFM)

IPSFM is built on an advanced artificial intelligence mechanism to predict the performance of any software application.

We first need to understand some of the key concepts behind this model; we will then look into the working mechanism of IPSFM.

Figure 1: Intelligent Performance Sustainable Forecasting Model (IPSFM)

The above diagram shows the architecture of the Intelligent Performance Sustainable Forecasting Model (IPSFM).

3.1. Algorithm for Working with IPSFM

The following algorithm defines the working flow of this model.

Step 1:

Configure the Learning Machine and prepare the Future Configuration as per the requirements.

Step 2:

Train the Learning Machine of IPSFM with the Training Data Matrix Set.

Step 3:

After the learning phase, the Learning Machine generates the learned Performance Model.

Step 4:

Execute the Performance Model with the Future Configuration to predict the Future Performance Status.

Step 5:

Analyse the Future Performance Status to understand the expected capacity requirements.

Step 6:

If the results of Step 5 are satisfactory, configure the application's architecture based on the Future Configuration. Otherwise, go back to Step 1.
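The six steps above form a simple retry loop. The following is a minimal Python sketch of that control flow only; the function name `run_ipsfm`, its parameters, and the toy stand-ins for training and prediction are hypothetical and are not part of the IPSFM implementation described in this paper.

```python
def run_ipsfm(attempts, train, predict, is_satisfactory):
    """Try (learner_settings, future_config) pairs until one is satisfactory."""
    for learner_settings, future_config in attempts:      # Step 1
        model = train(learner_settings)                   # Steps 2-3
        status = predict(model, future_config)            # Step 4
        if is_satisfactory(status):                       # Steps 5-6
            return future_config, status
    return None  # no attempt met the requirement


# Toy stand-ins (hypothetical, for illustration only).
def toy_train(settings):
    tps_per_core = settings["tps_per_core"]
    return lambda config: config["cores"] * tps_per_core


def toy_predict(model, config):
    return model(config)


def meets_target(status):
    return status >= 100


attempts = [
    ({"tps_per_core": 10}, {"cores": 4}),
    ({"tps_per_core": 10}, {"cores": 16}),
]
result = run_ipsfm(attempts, toy_train, toy_predict, meets_target)
print(result)
```

In this toy run the first configuration falls short of the target, so the loop moves on to the second one, mirroring the "otherwise go back to Step 1" branch.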

3.2. Training Data Matrix Set

The Training Data Matrix Set is an N-dimensional matrix of information that is used to train the Learning Machine. A separate matrix is created for each distinct set of data. As shown in Figure 2, each 2-dimensional matrix contains a different, related set of information.

Figure 2: Training Data Matrix Set Structure

Training Data Matrix Set has been categorized in the following two sections:

3.2.1. Master Training Data Matrix Set

This data matrix set contains the latest benchmark information of various types. Some of the most important benchmark data are listed below:

1. Benchmark data matrix for database transaction processing rate

2. Benchmark data matrix for Java operation execution rate with JVM

This Master Training Data Matrix Set must be kept up to date with newly published benchmark information.

3.2.2. Current Performance Status Matrix

This data matrix set contains information about the current performance of the application. Some of the most important current performance engineering statistics required to train the Learning Machine are listed below:

1. Transaction response time

2. Transaction processing rate

3. 90th Percentile of response time

4. Standard Deviation of response time

5. Server CPU Utilization

6. Server Memory Utilization

7. JVM Details

8. Hits/Sec at web server

9. Throughput

10. Inter-server Network connections information

11. Database server machine configurations details (CPU, Memory, and Disk)

12. Application server machine configurations details (CPU, Memory, Disk)

The Current Performance Status Matrix information listed above can be gathered through performance test engineering activities.

3.3. Learning Machine

The Learning Machine is the artificial intelligence module of the Intelligent Performance Sustainable Forecasting Model (IPSFM). It learns from the Training Data Matrix Set.

3.3.1. Learning Technique Method

Choosing a good learning mechanism is very important: it should be a very accurate predictor, and it should support real-valued, discrete-valued and vector-valued functions for our complex, real-time software applications.

I have therefore selected the Artificial Neural Network as the learning algorithm with which to build the Learning Machine.

3.3.2. Artificial Neural Network

An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true of ANNs as well.

3.3.3. Neuron Architecture

An ANN consists of a number of processing units, or neurons, which are connected to each other through weighted links; each of these links has its own value, known as the weight of the link.

Figure 3: Neuron Architecture

In Figure 3, $A_1, A_2, \ldots, A_n$ are the inputs and $W_1, W_2, \ldots, W_n$ are their respective weights for the neuron. The Activation Function calculates the output, which becomes the next input value for the other links connected to this neuron. A Combination Function is used to calculate the argument value $X$ for the Activation Function, where $W_{ji}$ denotes the weight on the link from input $i$ into neuron $j$:

$$X = \sum_{i=1}^{n} A_i W_{ji} = A_1 W_{j1} + A_2 W_{j2} + \cdots + A_n W_{jn}$$

The Activation Function normally used is a sigmoid function, which generates an output between 0 and 1:

$$\text{output} = g(X) = \frac{1}{1 + e^{-X}}, \quad \text{where } X = \sum_{i=1}^{n} A_i W_{ji}$$

These units (neurons) are organized into several layers, namely the input layer, the hidden layers and the output layer. The input layer receives an external activation vector and passes it via weighted connections to the units in the first hidden layer. The hidden layers compute their own activations and pass them to the neurons in succeeding layers.
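The combination function, the sigmoid activation and the layer-to-layer hand-off described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the input and weight values are assumptions chosen for the example and are not taken from the experiment.

```python
import math


def sigmoid(x):
    """Activation function g(X) = 1 / (1 + e^-X); output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights):
    """Combination function (weighted sum) followed by the sigmoid activation."""
    x = sum(a * w for a, w in zip(inputs, weights))
    return sigmoid(x)


def layer_output(inputs, weight_rows):
    """One layer: every neuron sees the same inputs but has its own weights."""
    return [neuron_output(inputs, row) for row in weight_rows]


# Illustrative values: 3 inputs -> 2 hidden neurons -> 1 output neuron.
hidden = layer_output([0.5, 0.2, 0.9], [[0.1, -0.4, 0.3], [0.7, 0.2, -0.5]])
output = neuron_output(hidden, [0.6, -0.3])
print(hidden, output)
```

The outputs of the hidden layer become the inputs of the output neuron, which is the forward pass through a network with one hidden layer.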

Figure 4: Artificial Neural Network

3.3.4. Training of Artificial Neural Network

The ANN is now trained with the training data set. Take Figure 4 as the reference ANN: it has one input layer, one hidden layer and one output layer. At initialization, some random inputs are provided to the input layer of the ANN. An ANN can only take inputs in a standardized form, so all the input variables have to be encoded to take values between 0 and 1, even categorical variables. For continuous variables, we use the following formula to encode the input variables into the standardized form:

$$X^* = \frac{X - \min(X)}{\text{range}(X)} = \frac{X - \min(X)}{\max(X) - \min(X)}$$

where $X^*$ is the encoded value.
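As a small illustration of this encoding, the following Python sketch applies the min-max formula to one continuous variable. The function name and the sample values are assumptions for the example; handling of a constant column (zero range) is a choice made here, since the formula itself is undefined in that case.

```python
def min_max_encode(values):
    """Encode a continuous variable into [0, 1] with (X - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Zero range: the formula is undefined, so map everything to 0.0
        # (an assumption made for this sketch).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


cpu_utilization = [20.0, 35.0, 50.0, 80.0]  # illustrative values only
encoded = min_max_encode(cpu_utilization)
print(encoded)
```

The smallest observed value maps to 0, the largest to 1, and everything else falls proportionally in between.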

These values are used by the input nodes to progress forward in the network. The nodes of the hidden layer in turn need inputs to progress further, so what are the input and output values for these nodes? As we saw in the earlier section on Neuron Architecture, the Activation Function produces the output that is used as the input to the next connected neurons in the network. As shown in Figure 4:

Input of node $N_1$ = $X_1$

Output of node $N_1$ carried on the link to node $N_A$ = $X_1 W_{1A}$

Input of node $N_A$ = $Z = \sum_{i} X_i W_{iA}$

Output of node $N_A$ = $g(Z) = \frac{1}{1 + e^{-Z}}$

where, in general, $Z = \sum_{i} X_i W_{ij}$; $i = 1, 2, 3$; $j = A, B, K$

The encoding of the input values described above is called normalization. The output we get from the neural network is likewise in standardized form, so de-normalization is required to decode it into a real-valued result, using the following formula:

$$\text{prediction} = \text{output} \times \text{data\_range} + \text{minimum}$$

Here,

prediction: the predicted output, in real-valued form, for the given input values

output: the output of the neural network in standardized form

data_range: the difference between the maximum and minimum actual output values in the data set

minimum: the minimum actual output value in the data set
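De-normalization is simply the inverse of the min-max encoding. A minimal Python sketch follows; the function name and the numbers are illustrative assumptions, not values from the experiment.

```python
def denormalize(output, minimum, maximum):
    """Decode a standardized network output back into a real-valued prediction."""
    data_range = maximum - minimum
    return output * data_range + minimum


# Illustrative values: target variable observed between 1000 and 5000.
prediction = denormalize(0.75, minimum=1000.0, maximum=5000.0)
print(prediction)
```

An output of 0 decodes to the minimum observed target value and an output of 1 decodes to the maximum, so the network can never predict outside the range seen in the training data under this scheme.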

3.3.5. Supervised Learning

I have used the Supervised Learning technique for the Learning System. Supervised learning is the machine learning task of inferring a function from supervised training data. The training data consists of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function. The inferred function should predict the correct output value for any valid input object. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.

Let $x_i$ denote the set of input variables and $y_i$ denote the set of output variables that the algorithm will try to predict for the given set of input variables. A pair $(x_i, y_i)$ is called a training example, and the dataset that we will use to learn, a list of $m$ training examples $\{(x_i, y_i);\ i = 1, \ldots, m\}$, is called a training set.

Figure 5: Supervised Learning

So in a supervised learning algorithm our goal is, given a training set, to learn a function $h: X \rightarrow Y$ so that $h(x)$ is a "good" predictor for the corresponding value of $y$. Here $h$ is known as the hypothesis. Figure 5 represents this.

3.3.6. Sum of Square Error

The predicted output may not match the actual output, and there may be some degree of error between the two. The output is therefore tuned towards the expected value: it is compared to the actual value of the target variable for this training set observation, and the error (actual - output) is calculated. This prediction error is analogous to the residuals in regression models. To measure how well the output predictions fit the actual target values, most Artificial Neural Networks use the Sum of Square Errors (SSE):

$$SSE = \frac{1}{2} \sum_{\text{records}} \sum_{\text{output nodes}} (\text{actual} - \text{output})^2$$

where the squared prediction errors are summed over all the output nodes and over all the records in the training set.
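The double sum can be written directly as two nested loops. The Python sketch below assumes each record is a list of values, one per output node; the function name and the sample data are illustrative only.

```python
def sum_of_squared_errors(actual_rows, output_rows):
    """SSE = 1/2 * sum over records and output nodes of (actual - output)^2."""
    total = 0.0
    for actual, output in zip(actual_rows, output_rows):   # over records
        for a, o in zip(actual, output):                   # over output nodes
            total += (a - o) ** 2
    return 0.5 * total


# Illustrative values: three records, one output node each.
actual = [[1.0], [0.0], [1.0]]
output = [[0.5], [0.5], [1.0]]
sse = sum_of_squared_errors(actual, output)
print(sse)
```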

3.3.7. Error Minimization

The Gradient Descent method has been used for error minimization. It is an optimization method that helps us find the set of weights that minimizes the Sum of Square Errors.

Suppose we have a set of weights $W = (W_0, W_1, W_2, \ldots, W_m)$ in our ANN model and we want to find the values of these weights that, together, minimize SSE. The gradient of SSE with respect to the weight vector $W$ is the vector derivative:

$$\nabla SSE(W) = \left[ \frac{\partial SSE}{\partial W_0}, \frac{\partial SSE}{\partial W_1}, \ldots, \frac{\partial SSE}{\partial W_m} \right]$$

that is, the vector of partial derivatives of SSE with respect to each of the weights. The gradient descent procedure for minimizing SSE starts from a random $W$ and, at each step, updates $W$ in the direction opposite to the gradient. The Gradient Descent method thus gives us the direction in which we should adjust the weights in order to decrease SSE:

$$\Delta W_i = -\eta \frac{\partial SSE}{\partial W_i}, \quad \forall i$$

$$W_i = W_i + \Delta W_i$$

where $\eta$ is the learning rate, which determines how far to move in that direction.
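To make the update rule concrete, the following Python sketch applies gradient descent to the simplest possible case: a single weight in a linear model, output = w * x, rather than a full ANN. The model, the data and the starting weight are assumptions chosen so that the run is easy to follow and reproducible (a fixed start replaces the random start described above).

```python
def sse_gradient(w, xs, ys):
    """d/dw of 1/2 * sum((y - w*x)^2), which equals -sum((y - w*x) * x)."""
    return -sum((y - w * x) * x for x, y in zip(xs, ys))


def gradient_descent(xs, ys, w=0.0, eta=0.05, steps=50):
    """Repeatedly move w against the gradient of SSE."""
    for _ in range(steps):
        delta_w = -eta * sse_gradient(w, xs, ys)   # delta W = -eta * dSSE/dW
        w = w + delta_w                            # W = W + delta W
    return w


# Illustrative data generated by y = 2x, so the ideal weight is 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w = gradient_descent(xs, ys)
print(w)
```

With this data each step shrinks the remaining distance to the ideal weight by a constant factor, so the weight settles very close to 2. A learning rate that is too large would overshoot instead, which is why the choice of the learning rate matters.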

3.3.8. Back-propagation Rule

The prediction error (actual - output) for a particular record, as measured by the SSE method, is back-propagated to the hidden units, assigning partitioned responsibility for the error to the various connections. The weights on these connections are then adjusted to decrease the error, using the Gradient Descent method. Tom M. Mitchell [1] derived the Back-propagation rule by using the Activation Function and Gradient Descent:

$$W_{ij,\text{new}} = W_{ij,\text{current}} + \Delta W_{ij}, \quad \text{where } \Delta W_{ij} = \eta\, \delta_j X_{ij}$$

Here $X_{ij}$ represents the $i$th input to the $j$th node, and $\delta_j$ represents the responsibility for a particular error belonging to node $j$. The error responsibility is computed using the partial derivative of the sigmoid function with respect to its argument, and takes one of the following forms, depending on whether the node in question lies in the output layer or the hidden layer:

For output layer nodes:

$$\delta_j = \text{output}_j (1 - \text{output}_j)(\text{actual}_j - \text{output}_j)$$

For hidden layer nodes:

$$\delta_j = \text{output}_j (1 - \text{output}_j) \sum_{k} W_{jk}\, \delta_k$$

where $\sum_{k} W_{jk} \delta_k$ is the weighted sum of the error responsibilities of the nodes $k$ downstream from the particular hidden layer node.

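Momentum is also added for better and faster learning:

$$\Delta W_{\text{current}} = -\eta \frac{\partial SSE}{\partial W_{\text{current}}} + \alpha\, \Delta W_{\text{previous}}$$

where $\Delta W_{\text{previous}}$ represents the previous weight adjustment, and $0 \le \alpha \le 1$.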
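The back-propagation rule with momentum can be sketched as a small, self-contained Python program. This is not the WEKA-based Learning Machine used in the experiment: the network size (two hidden nodes, one output node), the fixed starting weights, the learning rate, the momentum value and the toy OR data set are all assumptions made for illustration. A constant input of 1.0 stands in for a bias term.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Return hidden activations (plus a constant 1.0) and the network output."""
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, ws))) for ws in w_hidden]
    hidden_b = hidden + [1.0]  # constant bias input for the output node
    out = sigmoid(sum(h * w for h, w in zip(hidden_b, w_out)))
    return hidden_b, out


def sse(data, w_hidden, w_out):
    return 0.5 * sum((t - forward(x, w_hidden, w_out)[1]) ** 2 for x, t in data)


def train(data, w_hidden, w_out, eta=0.5, alpha=0.5, epochs=2000):
    """Per-record back-propagation; each step adds alpha * previous adjustment."""
    prev_hidden = [[0.0] * len(ws) for ws in w_hidden]
    prev_out = [0.0] * len(w_out)
    for _ in range(epochs):
        for x, target in data:
            hidden_b, out = forward(x, w_hidden, w_out)
            # Output node: delta = output * (1 - output) * (actual - output)
            delta_out = out * (1 - out) * (target - out)
            # Hidden nodes: delta = output * (1 - output) * sum(W_jk * delta_k),
            # computed with the output weights as they were before this update.
            delta_hidden = [h * (1 - h) * w_out[j] * delta_out
                            for j, h in enumerate(hidden_b[:-1])]
            for j, h in enumerate(hidden_b):
                prev_out[j] = eta * delta_out * h + alpha * prev_out[j]
                w_out[j] += prev_out[j]
            for j, ws in enumerate(w_hidden):
                for i, xi in enumerate(x):
                    prev_hidden[j][i] = (eta * delta_hidden[j] * xi
                                         + alpha * prev_hidden[j][i])
                    ws[i] += prev_hidden[j][i]
    return w_hidden, w_out


# Toy data: logical OR; the third input is the constant 1.0 bias.
data = [([0.0, 0.0, 1.0], 0.0), ([0.0, 1.0, 1.0], 1.0),
        ([1.0, 0.0, 1.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]
w_hidden = [[0.1, -0.2, 0.05], [-0.1, 0.2, -0.05]]  # fixed start for repeatability
w_out = [0.1, -0.1, 0.05]

sse_before = sse(data, w_hidden, w_out)
train(data, w_hidden, w_out)
sse_after = sse(data, w_hidden, w_out)
print(sse_before, sse_after)
```

Because the weight adjustment for an output or hidden link equals the learning rate times the error responsibility times the link's input, it is the same quantity as the negative gradient step in the momentum formula, and the momentum term simply carries part of the previous adjustment forward.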

3.4. Performance Model

Once the Learning Machine has been trained with the Training Data Matrix Set, we obtain the learned model, the Performance Model. This model is then used to predict the performance of the application under different future system configurations.

3.5. Future Configuration Matrix

Figure 6: Future Configuration Matrix

The Future Configuration Matrix is the input data matrix. It contains the details of the expected future system configurations. The data matrix is fed to the Performance Model module, which then predicts the performance status of the application based on the input configuration. Figure 6 above shows the structure of the Future Configuration Matrix used as the input data matrix.

3.6. Future Performance Status

This module shows the performance status of the application, as predicted from the Future Configuration Data Matrix Set.

4. Experiment

For the purpose of the experiment I have taken one of the client projects.

4.1. Requirement

We need to predict the maximum number of transactions that the system can handle once it is upgraded to the proposed future system configuration.

4.2. Work Flow

Step 1:

We used the performance engineering analysis report of the stable existing application, and prepared the Current Performance Status Matrix Set by analysing this report.

Step 2:

We prepared the Training Data Matrix Set, with updated benchmark information where available.

Step 3:

We trained the Learning Machine with the Training Data Matrix Set.

Step 4:

The trained Learning Machine was used as the Performance Model to predict the maximum number of transactions that the application can handle.

Step 5:

We predicted the maximum number of transactions that the application can handle based on the given proposed future system configurations.

4.3. Selection of Data Mining Tool

Many data mining tools are available in the market; some are licensed and some are open source. One of the most popular freely available data mining tools is WEKA (Waikato Environment for Knowledge Analysis). It is a suite of machine learning software, written in Java, and is free software available under the GNU General Public License. Many learners are available within WEKA. I used WEKA to design the Learning Machine. The architecture of the IPSFM has been designed in Java.

4.4. Experiment Setup

4.4.1. Training Data Matrix Set

1. Master Training Data Matrix Set

To populate the Master Training Data Matrix Set of the Training Data Matrix Set, we used the TPC and SPEC benchmark data sets.

Figure 7: Database tpmC Benchmark Information by TPC

Figure 8: max-jOPS Benchmark Information by SPEC

2. Current Performance Status Matrix

To populate the Current Performance Status Matrix of the Training Data Matrix Set, we used the current performance test engineering results and the infrastructure configuration information of the existing application. Some of the information captured during performance test execution is shown below.

Figure 9: Average Transaction Response Time

Figure 10: Primary Application Server CPU Utilization

Figure 11: Primary Application Server Memory Utilization

Figure 12: Primary Application Server Disk Utilization

Figure 13: Database Server CPU Utilization

Figure 14: Database Server Memory Utilization

Figure 15: Database Server Disk Utilization

4.4.2. Learning with Learning Machine

Initially, a number of experiments were carried out to train the Learning Machine with the Master Training Data Matrix Set and the Current Performance Status Matrix, using various combinations of values for the artificial neural network parameters.

After 10 rounds of experiments with different parameter combinations, the Learning Machine achieved a satisfactory level of learning, with ±5% error variance, and was saved as our Performance Model. This was achieved with the following parameter values in the Learning Machine:

a) No of hidden layers: 2

b) Learning rate: 0.05

c) Momentum: 0.0050

d) No of epochs or training time: 500000

e) ValidationSetSize: 0

f) ValidationThreshold: 20

4.4.3. Future Configuration Data Matrix Set

Once the expected Performance Model has been generated, it uses the following Future Configuration Data Matrix Set to predict the maximum number of transactions that can be achieved in the system with the proposed future configuration.

Table 1: Future Configuration Data Matrix Set

| Attributes | Values |
|---|---|
| DB Company | Oracle |
| DB System | SPARC SuperCluster with T3-4 Servers |
| DB Spec. Revision | 5.11.0 |
| Database Software | Oracle Database 11g R2 Enterprise Edition w/RAC w/Partitioning |
| Operating System | Oracle Solaris 10 09/10 |
| TP Monitor | Tuxedo CFS-R |
| Server CPU Type | Oracle SPARC T3 - 1.65 GHz |
| Server CPUNo | 108 |
| Total Server Processors | 108 |
| Total Server Cores | 1728 |
| Total Server Threads | 13824 |
| Cluster | Y |
| Front EndsNo | 81 |
| Total FE Processors | 162 |
| Total FE Cores | 972 |
| Total FE Threads | 1944 |
| AppServerHW Name and Model | Cisco UCS C220 M3 |
| JVM Name | Oracle Java SE 7u11 |
| JVM Version | Java HotSpot(TM) 64-Bit Server VM_ version 1.7.0_11 |
| NetworkbetweenAppServerToDBServer (MBPS) | 100 |
| NetworkbetweenWebServerToAppServer (MBPS) | 100 |
| Max_Transaction_CanBeAchieved Per Sec | ? |

4.5. Experiment Results

The IPSFM predicted that the maximum number of transactions achievable with the proposed future configuration is 399540 transactions per second, with ±5% variance.
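For planning purposes it can help to turn this figure into a range. The short Python sketch below assumes that the ±5% is a relative band around the predicted value; that interpretation, and the variable names, are assumptions made for this example.

```python
predicted_tps = 399540   # predicted maximum transactions per second
variance_percent = 5     # assumed to be a relative band around the prediction

# Integer arithmetic keeps the bounds exact.
lower = predicted_tps * (100 - variance_percent) // 100
upper = predicted_tps * (100 + variance_percent) // 100
print(lower, upper)
```

Under this assumption the predicted capacity lies between 379563 and 419517 transactions per second, and a cautious capacity plan would size against the lower bound.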

4.6. Challenges

1. The Master Training Data Matrix Set needs to be updated on a regular basis as new benchmark data becomes available.

2. The Learning Machine needs to be trained many times with different parameter settings to achieve a satisfactory level of learning with minimum error variance.

5. Conclusion

This paper has demonstrated how we can predict the performance of a software application under future configurations. IPSFM explores the performance sustainability of a software application: using this mechanism, one can predict the maximum number of transactions that can be achieved with a proposed future configuration. IPSFM helps in forecasting the effects of increasing user load on the system and provides guidance for improving the system so that the application can handle the increased user load in future. This helps in identifying future performance traps in advance, so that serious business losses due to unexpected user load can be avoided.

Appendix A

Number of graphs: 15

Number of tables: 1

References

[1] Machine Learning, Tom M Mitchell, McGraw-Hill International Editions 1997.

[2] Introduction to Machine Learning, Ethem Alpaydin, PHI, 2008

[3] Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig, 3rd Edition

[4] http://www.tpc.org/

[5] http://www.spec.org/benchmarks.html

[6] http://www.cs.waikato.ac.nz/ml/weka/

[7] The Data Mining Approach to Automated Software Testing, Mark Last, Menahem Friedman, Abraham Kandel

[8] Probability and Statistics, Murray R Spiegel, John J Schiller, R Alu Srinivasan, 3rd Edition, McGraw-Hill

[9] A First Course in Probability, Sheldon Ross, 6th Edition, Pearson Education

[10] Discovering Knowledge in Data: An Introduction to Data Mining, Daniel T. Larose, Director of Data Mining, Central Connecticut State University

[11] Stanford University Lecture Notes by Andrew Ng, www.stanford.edu/class/cs229/materials.html

[12] An Elegant Approach To Automated Software Testing Using The Dimension Of Artificial Intelligence; 10th Annual International Software Testing Conference 2010 Bangalore, November 2010

About the Author

Arit Kumar Bishwas holds an MS degree from BITS, Pilani. He is currently working with Capgemini India Pvt. Ltd. as a Performance Engineering Consultant. He has 9+ years of total work experience, of which about 7 years are in the software industry and 2+ years are in computer science teaching. He is a Sun Certified Java Programmer (SCJP 1.5) and is ISTQB certified. Earlier he worked with IBM India Pvt. Ltd., Mindtree Ltd. and Zensar Technologies Ltd. In 2010, his paper was selected as one of the top ten papers at STC.