• No results found

Applying Process Mining to IT Big Data

N/A
N/A
Protected

Academic year: 2021

Share "Applying Process Mining to IT Big Data"

Copied!
18
0
0

Loading.... (view fulltext now)

Full text

(1)

Richard F. Eng

PRINCE2, PMP, CSQE, CRE, CQE

14 March 2014

Applying Process Mining to

IT Big Data

The author’s affiliation with The MITRE Corporation

is provided for identification purposes only, and is

not intended to convey or imply MITRE’s

concurrence with, or support for, the positions,

opinions, or viewpoints expressed by the author.

(2)

Biography

Big Data Definitions

Process Mining

Quantitative Analysis Stages and Steps

Applying Process Mining

Lessons Learned

(3)

Biography

Principal Software Systems Engineer at the MITRE

Corporation

Over 20 years of industry experience in telecommunications

and software systems

Areas of interests

Solving systems level problems

Applying quantitative methods to improve business , IT

processes, and software quality

Education:

M.S. in Data Analytics, University of Maryland (Expected 2015)

MBA, Georgetown University

M.S. in Bioengineering, Brooklyn Polytechnic Institute

B.S. in Chemistry, Brooklyn Polytechnic Institute

(4)

Big data is data which

“exceed(s) the capacity or capability of

current or conventional methods and systems.”

In other words,

the notion of “big” is relative to the current standard of

computation.

The National Institute of Standards and Technology

“… the increasing size of data, the increasing rate at which it is

produced and the increasing range of formats and

representations employed. This report predated the term “big

data” but proposed a three-fold definition encompassing the

three Vs”: Volume, Velocity and Variety

. This idea has since

become popular and sometimes includes a

fourth V: Veracity

, to

cover questions of trust and uncertainty

.”

Gartner. In 2001, a Meta (now Gartner) report

(5)

“Big data is the term increasingly used to describe the process of

applying

serious computing power

—the latest in

machine learning

and

artificial intelligence

— to

seriously massive

and often

highly

complex sets of information

.”

Microsoft

“Big data opportunities emerge in organizations generating a

median of

300 terabytes of data a week.

The most common forms of

data analyzed in this way are

business transactions

stored in

relational databases

, followed by

documents, e-mail, sensor data,

blogs, and social media.

Intel

The Method for an Integrated Knowledge Environment open-source

project. The MIKE project argues that big data is

not a function of the

size of a data set

but its

complexity

. Consequently, it is the high

degree of

permutations

and

interactions

within a data set that

defines big data.

(6)

Data analytics technique

System agnostic

Examines large amounts of process data

Provides the ability to

Evaluate and understand actual process flows

Compare actual against expected process flows from a

Policy, procedures, and/or Information Technology perspective

Impact of visualizing actual process flows

Detection of process patterns

Identification of process inefficiencies

Identification of anomalous behavior

Process Mining

(7)

Frame the Problem

• Problem

recognition

• Review previous

results

Solving the Problem

• Modeling and

variable selection

• Data collection

• Data analysis

Communicating and

Acting On Results

• Results

presentation and

action

Quantitative Analysis 3 Stages and 6 Steps

Source: Keeping up with the Quants: Your Guide

to Understanding and Using Analytics –

(8)

Problem recognition

Figure out where the problems are with the

SYSTEM

& fix them fast

E-Commerce site experiencing significant operational availability

issues

Site performance degrades and crashes as more customers access services

No current documentation (e.g., Systems requirements, design specifications,

applications software, test procedures, test scripts, etc.)

– We were Agile. We didn’t have time to …

Review of previous results

Ad hoc problem solving huddles

It’s not my system. It must be your …

Try this, it should/might fix it …

No tracing of business processes to software applications to IT

infrastructure

Manual review of system logs

Frame the Problem

“Furious activity is no substitute for understanding.”

- H. H. Williams

(9)

Modeling and variable selection

Apply process mining to model end user transactions through the

E-commerce site

Variables needed

Unique Transaction ID

, Start and Stop

Time Stamp

, and

Activity

Data collection

Gain access to system log files

Normal operations

During outages and low operational availability

Data analysis

Run process model

Review results

Provide feedback to stakeholders that own the processes,

application software, and IT infrastructure

(10)

Results presentation and action

Demonstrate process mining results to stakeholders

Capture process mining visualization as a video and e-mail to

stakeholders so they can see the process bottlenecks and

deviations

Explain the findings

(11)

Modeling and variable selection

Process mining

Visualize normal and anomalous operations

– Transaction process flows

– Application process flows

– IT Infrastructure process flows

Variables needed

Unique Transaction ID

Start and End Time Stamps

Business Process

Application Software

IT Infrastructure

IP Address

(12)

Data collection

Talk with system administrators to capture log data

Explain the data fields and formats you want

Transform the log files into the format that the process mining

system needs

Data analysis

Load the data into the process mining model

Run the process mining models

Visualize and analyze data

Transaction business process flows

Application software process flows

IT infrastructure process flows

Review the results

Applying Process Mining to IT Big Data

(Continued)

Data Collection

Transform Data

Run the Model & Data Analysis

(13)

Communicate results

Provide feedback to stakeholders

Process owners

Application software team

IT Infrastructure team

Benefits

Visualization of the actual process flows

Comparison of normal and anomalous operations - rather than

manually reviewing individual system logs

More rapidly identify spurious processes

Applying Process Mining to IT Big Data

(Continued)

(14)

Illustration of Process Mining Data

User ID Start Time Stamp End Time Stamp Business Process Step Application Software Infrastructure IP Address

m1000000 10/15/2013 12:03 10/15/2013 12:04 Registration REG APP Server 1 255.130.155.33 m1000000 10/15/2013 12:04 10/15/2013 12:05 Enrollment ENROLL APP Server 1 255.130.155.34 m1000000 10/15/2013 12:05 10/15/2013 12:07 Demographic Information CUSTOMER INFO APP Server 2 255.130.155.35 m1000000 10/15/2013 12:07 10/15/2013 12:08 Customer Verification CUST VERIFICATION APP DB 1 255.130.155.36 m1000000 10/15/2013 12:08 10/15/2013 12:10 Review Service Options SERVICE MENU APP Server 3 255.130.155.37 m1000000 10/15/2013 12:10 10/15/2013 12:12 Select Service SERVICE SELECTION APP Server 4 255.130.155.38 m1000000 10/15/2013 12:12 10/15/2013 12:13 Purchase Service PURCHASE APP Server 4 255.130.155.39 m1000000 10/15/2013 12:13 10/15/2013 12:15 Make Payment PAYMENT APP Server 5 255.130.155.40 m1000000 10/15/2013 12:15 10/15/2013 12:18 Payment Confirmation CONFIRM PAYMENT APP DB 2 255.130.155.41 m1000001 10/15/2013 12:18 10/15/2013 12:22 Exit EXIT APP Server 5 255.130.155.42 m1000002 10/15/2013 12:04 10/15/2013 12:05 Registration REG APP Server 1 255.130.155.33 m1000002 10/15/2013 12:05 10/15/2013 12:07 Enrollment ENROLL APP Server 1 255.130.155.34 m1000002 10/15/2013 12:07 10/15/2013 12:08 Demographic Information CUSTOMER INFO APP Server 2 255.130.155.35 m1000002 10/15/2013 12:08 10/15/2013 12:10 Customer Verification CUST VERIFICATION APP DB 1 255.130.155.36 m1000002 10/15/2013 12:10 10/15/2013 12:12 Review Service Options SERVICE MENU APP Server 3 255.130.155.37 m1000002 10/15/2013 12:12 10/15/2013 12:13 Select Service SERVICE SELECTION APP Server 4 255.130.155.38 m1000002 10/15/2013 12:13 10/15/2013 12:15 Purchase Service PURCHASE APP Server 4 255.130.155.39 m1000002 10/15/2013 12:15 10/15/2013 12:18 Make Payment PAYMENT APP Server 5 255.130.155.40 m1000002 10/15/2013 12:18 10/15/2013 12:22 Payment Confirmation CONFIRM PAYMENT APP DB 2 255.130.155.41 m1000002 10/15/2013 12:22 10/15/2013 12:23 Exit EXIT APP Server 5 255.130.155.42

(15)

Transaction Business Process Flow

Application Software Process Flow

IT Infrastructure and IP Address Process Flow

(16)

Good news

Process mining concept works

Bad news

Not all of the IT infrastructure was instrumented to collect log data

In some cases the log data was aggregate rather than per

transaction

Some logs lacked unique transaction IDs

System time was not synchronized across the IT infrastructure

IaaS provider never contracted to provide system log data

Parting thoughts

Make sure the network and system time for all of your infrastructure

are synchronized

Require all XaaS providers to provide system log data

Plan for process mining at the start of the project

(17)

“The most difficult subjects can be explained to the most

slow-witted man if he has not formed any idea of them already; but

the simplest thing cannot be made clear to the most intelligent

man if he is firmly persuaded that he knows already, without a

shadow of doubt, what is laid before him.” – Leo Tolstoy

(18)

Richard F. Eng

[email protected]

[email protected]

703-201-9112

Figure

Illustration of Process Mining Data

References

Related documents

Central to our confidence in the success of the program: the powerful combination of NMSDC’s 43 years of national experience with MBEs and corporations, the executive

Specifically, passive participants were shown all the screenshots given to active participants in Stage I and Stage II, and, after being informed about whether that active

While suppliers will continue to work to monetize the computing and network assets that underpin the cloud services, it is the operational expertise of billing

Whether grown as freestanding trees or wall- trained fans, established figs should be lightly pruned twice a year: once in spring to thin out old or damaged wood and to maintain

property, it is often the desire of the shareholders to transfer the leased ground (and perhaps some equipment) to the newly created corporation with the intention that the departing

Building on current knowledge, this thesis aims to investigate the link between the colloidal and aggregation behaviour measured by dynamic light scattering, and the

Potential explanations for the large and seemingly random price variation are: (i) different cost pricing methods used by hospitals, (ii) uncertainty due to frequent changes in

Players can create characters and participate in any adventure allowed as a part of the D&D Adventurers League.. As they adventure, players track their characters’