• No results found

Mining Invariant Relationships for Failure Analysis of Batch Software Systems

N/A
N/A
Protected

Academic year: 2021

Share "Mining Invariant Relationships for Failure Analysis of Batch Software Systems"

Copied!
16
0
0

Loading.... (view fulltext now)

Full text

(1)

tesi di laurea magistrale

relatori

Ch.mo Prof. Stefano Russo Ch.mo Prof. Marcello Cinque

correlatori

Ch.mo Ing. Flavio Frattini

Ch.mo Dr. Santonu Sarkar, Infosys Ltd., Bangalore, INDIA

candidato

Agostino Savignano Matr. M63000176

Mining Invariant Relationships for Failure Analysis of

Batch Software Systems

(2)

This thesis is in part the result of a research work done during the

InStep internship experience with Infosys Ltd

multinational provider of business consulting, information technology, software engineering and outsourcing services.

160.000 employees, 8 billion revenues.

Next Gen Computing Labs, within the Infosys campus of Bangalore, India.

(3)

Context

A batch software system is a system where jobs run without end user interaction, or can be scheduled to run as resources permit. This class of systems shows a very repetitive behaviour. They are often used:

to process large amount of transaction data

for scientific computation

as there may be failures in the execution of jobs, malfunctions have to be detected and actions have to be taken accordingly; manual detection is inefficient.

A feasible approach to automatically and dynamically characterise the behaviour of a system, in order to detect execution problems, is represented by flow intensity invariants.

(4)

A simple example of

flow intensity invariant

A platform for the batch processing of files: each file goes through a stage of elaboration. The processing time is proportional to the size

Measuring the file size and the time spent in a stage, I(x) and I(y), (the

flow intensities), the equation

invariant relationship characterising the expected behaviour of the batch system.

If there is an execution problem (e.g., file processing hangs) the equation does not hold any more.

)

(

)

(

y

k

I

x

(5)

Contribution

An approach for the

automatic

discovery of a set of flow

intensity

invariant relationships

in batch software systems

,

useful to characterise the

expected behaviour

of the

system.

A detection mechanism that helps the operations personnel in identifying deviations of the actual system behaviour from the correct expected one is also presented.

The application of a tool implementing the approach,

MinerVa

, in two real world batch systems:

A SaaS application by Infosys *

SCoPE, the Supercomputer of the Federico II University

* S. Sarkar, R. Ganesan, M. Cinque, F. Frattini, S. Russo, and A. Savignano, “Mining Invariants from SaaS Application Logs”,

accepted for publication in the proceedings of the European Dependable Computing Conference, May 2014, Newcastle upon Tyne, UK‏

(6)

Approach

Source of knowledge of the system behaviour: event logs

Objective

automatically mine a set of invariant relationships between

time series measurements from system logs (learning phase)

use such relationships for detecting deviations of the system behaviour (detection phase)

(7)

Approach Data Flow

Learning phase

Off-line

Past observations

RESULTS: a set of linear relationships,

representing a model of the system

Detection phase

Either on-line or off-line

Uses the model to

predict the behaviour of a system on new

(8)

Sampling: from Event Logs to Time Series

PROBLEM

How to retrieve time series from event logs

SOLUTION

With a certain

time interval:

retrieve the events in the application logs

count the events,

according to the different

flow intensities

the “

sampling time

” is determined using a

tupling

algorithm, applied on particular events, e.g., job arrivals or

completions.

(9)

Mining: from Time Series to Invariants

PROBLEM

How to discover relationships between time series

SOLUTION

For each possible pair of flow intensities:

Perform a linear regression using:

AutoRegressive models with eXogenous inputs (ARX)

Recursive Least Squares (RLS) algorithm Test the goodness of the regression:

 R2 coefficient of determination

All the pairwise relationships which have a R2 value greater than a

threshold are considered as linear :

(10)

Detection: from Invariants to Alerts

PROBLEM

How to detect a deviation from the expected behaviour

SOLUTION

Using the regressed coefficients to predict the expected behaviour

For each regressed invariant relationship:

compute the expected output

compare the expected output with the actual output

If the difference between the expected and the actual output is greater than an adaptive threshold, raise an alert.

(11)

SaaS CPG application by Infosys

A platform provided as a service by means of virtualized machines that:

Receives transaction details data daily from several subscribers all around the world.

Stores the data and performs analysis on the stored data.

The work items consisting of several records are provided as a file in a specific format and with different frequencies by the various

(12)

SaaS CPG application by Infosys: Mining

9 months of observations 108.000 workitems

Tupling on job completions Sampling time: 10 min

Identified flow intensities: 33 Linear regressions performed: 528

Discovered invariant relationships: 11

Time spent in post-processing validation - Time spent in archiving (i1)

Time spent in transaction validation - # rows completed (i4)

# files arrived - # files completed (i7)

# files started - # files completed (i8)

# files arrived - # files started (i9)

# rows started - # rows completed(i10)

(13)

SaaS CPG application: Broken Invariants

Detected broken invariants with p.i. threshold correspond or include the SLA violations discussed with the operations personnel.

Maintenance staff confirmed the possibility of violations for invariants

i1-i11.

Month Samples Threshold i1 … i7 i8 i9 …

M1 4320 10% 4035 … 3865 3754 3182 … p.i. 5 … 14 13 4 … M2 4464 10% 4204 … 3988 3903 3050 … p.i. 2 … 17 18 6 … M4 4598 10% 4306 … 4208 4106 3422 … p.i. 0 … 8 12 4 ...

(14)

SCoPE: Mining

8 months of observations 1.104.139 jobs

Tupling on job arrivals Sampling time: 60 s

Regression performed on a subset of node processing the same kind of jobs

Identified flow intensities: 15 Linear regressions performed: 105

Discovered invariant relationships: 69

# jobs active - # jobs arrived (i5)

WORKLOAD_VMEMORY - # jobs started (i13)

# jobs active - # jobs started (i16)

# jobs started - # jobs arrived (i19)

virtual memory - # of cores (i40)

CPU% - # jobs active (i48)

memory - # jobs started (i56)

(15)

SCoPE: Broken Invariants

i5, i13, i16 are almost never broken

i21 is broken for more than the 50% of samples and

i40 is broken for almost each samples

i19, i36, i48, i56, i59, i63, i66 are broken for less than the 2% of

samples.

Month Samples i5 i13 i16 i19 i40 i48 i66

M1 44640 13 0 0 431 41443 3328 6442

M2 43200 3 0 0 555 42097 4026 6743

(16)

Conclusions and Future Work

It is possible to automatically

discover invariants in batch systems

They are useful to the operation personnel to

reduce the quantity of data to be analysed for understanding the system behaviour in case of execution problems

In some cases, to detect the execution problem itself (SLA violations)

Future avenues of research:

Use logs from different layers Improve the detection

mechanism:

Different adaptive threshold

Improve the mining process:

Perform a continuous invariant mining: forgetting factor RLS algorithm.

Invariant identification with

Prediction Error Method instead of the Least Squared Method

Model relationships between

multiple time series instead of only pairwise relationships

References

Related documents

Therefore, positive affecting factors on the province tourism development could be counted as: establishment of an efficient system for general tourist attraction,

The single-crystal X-ray analysis showed that the complex forms a zig-zag chain structure, in which the chloro ligands bridge the dinuclear units at the axial positions with the

Frederick Banting and Charles Best extracted insulin from bovine pancreas in 1922, who received the Nobel Prize for their contribution in Medical field with John

The IC 50 values of methanolic extract of Artocarpus heterophyllus were found 35.26 µgm/ml and 35.27 µgm/ml against A549 cell line by MTT and SRB assay methods respectively

The data on primary and secondary enrollment, the assistance amount, the openness to trade and the investment figures for these years were collected from different secondary

As a result of national developmental projects such as increased accessibility to safe water during recent years in Iran, contaminated water is not a main rout of

It includes the small monopole antenna design such as meandered line antenna, inverted-L antenna, inverted-F antenna, planar inverted-F antenna, and multiband antenna