(1)

Batch Schedule Performance Optimization

Lydia Duijvestijn, Jan van Cappelle, Coen Burki

13-15 March 2012

(2)

Agenda

▪ Problem statement
▪ Solution outline
▪ The engagement
▪ The Source2VALUE™ portal
▪ Results and demo
▪ Next steps

(3)

From the newspapers we infer that many IT problems can have their root cause in the batch...

▪ Internet banking unavailable
▪ Automatic collections booked twice... and after that a faulty correction
▪ ATMs unavailable
▪ PIN payments in shops unavailable
▪ Stock exchange unavailable

(4)

When we listen to the client, this is what we hear...

▪ “Between 10:00 am and 4:00 pm we had bad online transaction response times.”
▪ “The time window to run my batch is too short.”
▪ “Processor utilization is too high.”
▪ “When we merge company A with company B, will there still be enough time to run our batch jobs?”
▪ “Our batch schedule is so complex that we no longer know how our applications work. We are worried that when there are problems we won’t be able to resolve them quickly.”
▪ “Can you help us reduce our batch processing costs by 20%?”
▪ “There is no specific department responsible for our batch schedule, every department has its own standards, and no one thinks about the overall corporate perspective and requirements; how can we standardize?”
▪ “We want to clean up our batch schedule but do not know where to start.”

(5)

Why do companies have those problems?

▪ The core batch systems, or Good Old Systems, were built 20 to 30 years ago or even earlier
▪ New systems were built and connected to the old core
▪ Standards and guidelines were written but never kept up to date, and sometimes every department has its own standards
▪ The world changed from 9-to-5 to 24x7, so the question is where there is still a time slot to run the batch
▪ Companies that used to have one mainframe now have one mainframe surrounded by, and connected to, hundreds of distributed platforms

(7)

The solution is to implement Batch Governance and in the meantime start a Batch Optimization program

▪ Batch Governance
  – To ensure alignment with the IT strategy and operational processes
▪ BSPO = Batch Schedule Performance Optimization
  – Short term BSPO to answer questions such as
    • What is the effect of a merger between two banks?
    • What is the risk of an outage caused by batch window overrun?
  – Long term BSPO to clean up the batch maintenance backlog
    • Agree and install standards and guidelines for batch processing
    • Investigate batch critical timelines
    • Prioritize issues and draw up roadmap
    • Solve issues in order of priority

Long term BSPO is a cyclic approach, based on anti-pattern detection: Detect, Analyse, Change, Test

(8)

Batch governance ensures alignment with enterprise architecture, IT strategy and operational processes

[Diagram. Governance layers: Operational Governance, Batch Governance, IT Governance. Parties: Batch Design Authority, IT Management, Enterprise Architecture. Change the business: Projects, Design Authority, Change Management. Run the business: Incident & Problem Management, Application Support, Operations. Flow labels: problems, requests, technical control, quality assurance, advice, threatening trends, failure patterns, RCAs, timelines, escalation reporting, priorities.]

Align batch architecture with Enterprise architecture

(9)

Batch Governance scope and tasks

▪ Scope:
  – Design the end-to-end batch schedule of the enterprise.
  – Document and enforce the standards and guidelines to run the batch schedule.
  – Assure the quality of the batch schedule during design, build, operation, and change phases.
  – Make architectural decisions about the batch schedule, especially for new features in advance.
  – Make proposals for improvement.
▪ Tasks:
  – Short Term - addresses the current batch problems and how they can be solved quickly. Manages risk in the current environment and keeps the standards and guidelines up to date.
  – Medium Term - creates and maintains the future architecture, including new machines and new applications. Involved with the creation of new solutions.
  – Long Term - aligns the business strategy and the batch strategy and implements this strategy effectively. Develops strategic and innovative initiatives.

(10)

BSPO is based upon the detection of anti-patterns and replacement by patterns

Some definitions

▪ An anti-pattern (DON’T) is similar to a pattern, but leads to negative consequences
  – An anti-pattern describes what to avoid and how to fix a problem when you find it
▪ A pattern (DO) is a named solution to a recurring problem in a context
  – A documented pattern specification communicates expertise from an expert to a novice, educating the novice about what makes the problem difficult and how to go about solving it
  – It should provide the reader with enough information to decide whether this pattern is applicable to her problem, and enough understanding to be able to use it successfully
  – A pattern describes how to solve a problem but does not advocate a particular implementation of the solution.

(11)

What makes the BSPO approach unique

▪ With BSPO, IBM compares the schedule on the one hand with the program sources (programming language, scripting language and data manipulation language) on the other, and checks for anti-patterns.
▪ By using the Source2VALUE™ portal IBM is able to connect all kinds of different data sources
  – Batch schedule: TWS, CA7, Control-M
  – Scripting language: JCL, Bash, Perl
  – Programming language: COBOL, PL/I, C
  – Database: IMS, DB2, Oracle
  – Platform: Mainframe, Unix, iSeries, Windows
  – More than one schedule

(12)

When looking for performance (anti-)patterns we have to bear in mind the KPIs for batch performance

▪ Elapsed time
  – Elapsed time is batch “response time”
  – Elapsed time is the time taken from the start of a job to its end
  – Elapsed time includes I/O time and all other types of wait
  – The focus of the example BSPO engagement has been on elapsed time
▪ Throughput
  – Throughput is defined as the number of units of work that a system can process in a given unit of time
▪ Capacity and resource utilisation
  – Capacity is the amount of available system resources to process the batch
  – Resource utilisation is the extent to which the capacity of a resource is used
  – Resources are CPU, memory, disk and network bandwidth
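The three KPIs above can be made concrete with a small calculation. The following is a minimal sketch, assuming a hypothetical job record with start and end timestamps, consumed CPU seconds and a count of processed records; the field names and figures are illustrative and do not come from the engagement.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class JobRun:
    name: str
    start: datetime
    end: datetime
    cpu_seconds: float  # CPU time consumed (hypothetical field)
    records: int        # units of work processed (hypothetical field)


def elapsed_seconds(run: JobRun) -> float:
    """Elapsed time: start to end, including I/O and every other wait."""
    return (run.end - run.start).total_seconds()


def throughput(run: JobRun) -> float:
    """Units of work processed per second of elapsed time."""
    return run.records / elapsed_seconds(run)


def cpu_utilisation(run: JobRun, cpus: int = 1) -> float:
    """Fraction of the available CPU capacity used during the run."""
    return run.cpu_seconds / (elapsed_seconds(run) * cpus)


run = JobRun(
    name="JOBA",  # hypothetical job name
    start=datetime(2012, 3, 13, 1, 0),
    end=datetime(2012, 3, 13, 1, 30),
    cpu_seconds=450.0,
    records=90_000,
)
print(elapsed_seconds(run))   # 1800.0
print(throughput(run))        # 50.0
print(cpu_utilisation(run))   # 0.25
```

A job that is busy for only a quarter of its elapsed time, as in this example, spends the rest waiting, which is why elapsed time rather than CPU time is the KPI of interest here.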

(13)

Focus of the example BSPO engagement has been on batch elapsed time

Batch elapsed time is composed of three components

▪ Wait time
  • Step 1: Eliminate wait times from critical path
▪ Work that should not be done
  • Step 2: Eliminate unnecessary processing from critical path
▪ Work that needs to be done
  • Step 3: Optimize necessary processing

In case there are many critical timelines, the priorities have to be decided
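Steps 1 and 2 only pay off for jobs that lie on the critical path. As an illustration, the critical path of a schedule can be derived as the longest chain of dependent jobs. This is a minimal sketch, assuming job durations in minutes and a predecessor map; the job names and durations are hypothetical, and this is not a description of how Source2VALUE™ computes timelines. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def critical_path(durations, predecessors):
    """Return (total_minutes, path) for the longest chain of dependent jobs."""
    graph = {job: set(predecessors.get(job, ())) for job in durations}
    finish = {}     # earliest possible finish time per job
    best_pred = {}  # predecessor that determines each job's start
    for job in TopologicalSorter(graph).static_order():
        preds = graph[job]
        start = max((finish[p] for p in preds), default=0)
        best_pred[job] = max(preds, key=lambda p: finish[p], default=None)
        finish[job] = start + durations[job]
    last = max(finish, key=finish.get)
    total = finish[last]
    path = []
    while last is not None:
        path.append(last)
        last = best_pred[last]
    return total, path[::-1]


# Hypothetical schedule: LOAD waits for both SORT and BACKUP.
durations = {"EXTRACT": 30, "BACKUP": 20, "SORT": 10, "LOAD": 25}
predecessors = {"SORT": ["EXTRACT"], "LOAD": ["SORT", "BACKUP"]}
print(critical_path(durations, predecessors))
# (65, ['EXTRACT', 'SORT', 'LOAD'])
```

In this example BACKUP is not on the critical path, so shortening or removing it would not reduce the end time of LOAD; the effort is better spent on EXTRACT, SORT or LOAD.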

(15)

The merger of two banks will cause an increase in workload, potentially leading to blocking performance problems in multiple critical batch timelines

Problem statement

Account balances not processed in time

Stock exchange system unavailable

Money transfer foreign currencies unavailable

Bank-office applications unavailable

Service to private clients unsatisfactory

(16)

Optimizing the batch is not exactly a “one-dimensional” problem

Analyzing one timeline can be a nightmare, but companies tend to have several critical timelines...

• This client had over 600 critical timelines
• In the scheduling package there are over 15,000 applications
• With over 300,000 jobs running
• With more than half a million relations
• IBM has seen jobs with 75 steps

(17)

The client had good standards and had already carried out several optimization projects in the past.

• 1 job has 1 procedure has 1 step
• Always top-down re-runnable
• No problems with restart (temporary files)
• No time loss with restart
• Special resource only for 1 step
• When a file is ready, it can be delivered to the next job
• 20% of the components (TWS, schedule JCL, procedure JCL and control cards) are generated.
• Components (TWS, JCL and control cards) are built on development and promoted by a tool to test, acceptance and production.

(18)

We proposed a four-phased short term BSPO approach

▪ Preparation
  – Revisit user interface of the tool
  – Create automated source collection mechanism
  – Agree relevant measurement points
  – Agree business critical endpoints
  – Understand migration plans
▪ Collect, load, store & process
  – Load data
  – Collect and send sources
  – Store and process sources
▪ Analysis & weekly intermediate reporting
  – Perform high level analysis and store results
  – Perform detailed analysis
  – Report findings
  – Define and capture new anti-patterns
▪ Final reporting
  – Compose end report and present results

(19)

Scope

▪ Analysis of critical timelines
  – Four production timelines
  – Core applications online point
▪ Analysis of dress rehearsal runs
  – Several try-outs
  – One full dress rehearsal run
▪ Sources
  – TWS Application DB Database
  – TWS Current Plans
  – TWS Track log information
  – TWS Critical timelines
  – Schedule JCL
  – Procedure JCL
  – SMF data

Timeline analysis

Critical timelines compared to baseline

Anti-pattern analysis for critical timelines

Whitespace analysis

Analysis on TWS track log data

Long running jobs

Delayed endpoints

Special request analysis

(20)

Wait times can be recognized as “large gaps between jobs”

(Each anti-pattern is listed with its source; no or little business involvement needed)

▪ Anti-pattern: Job is waiting for unknown reason; threshold has been exceeded (e.g. # of // servers); deadlocks and abends
  – Source: 1. Cross reference between time, active job and workstation. 2. Solution time analysis of abends, followed by zooming in on system log (777)
▪ Anti-pattern: Waiting for normal resources (tape units) [+TWS DB]
  – Source: [optional] Cross reference per gap within critical path based on planning + JCL + SMF
  – Action: if incorrect use of normal resources → remove
▪ Anti-pattern: Waiting for special resources (VSAM datasets, DBs) [+JCL]
  – Source: Cross reference per gap within critical path based on planning + TWS AD DB
  – Action: if incorrect use of special resources → remove
▪ Anti-pattern: Dependencies [+JCL+Program]
  – Source: Cross reference per critical path based on PSB + JCL (for IMS) or Views + COBOL sources + JCL (for DB2)
  – Action: if dependency (technically) incorrect → remove
▪ Anti-pattern: Early start times [plan only]
  – Source: Gantt chart per critical path based on plan + actuals (track log)
  – Action: if early start time incorrect → remove

(21)

Work that should not be done takes on many forms

(Each anti-pattern is listed with its source; business involvement needed to judge (in)correctness of operations)

▪ Anti-pattern: IEFBR14 jobs
  – Source: [optional, maintenance work] Lists, based on plan, actuals and JCL
  – Action: remove IEFBR14 jobs
▪ Anti-pattern: Stop and start of databases
  – Source: Lists, based on plan, actuals and JCL
  – Action: keep 7x24 hrs online
▪ Anti-pattern: Stop and start of transactions
  – Source: Lists, based on plan, actuals and JCL
  – Action: stop and start no more than once per night
▪ Anti-pattern: Frequent backups
  – Source: Lists, based on plan and JCL
  – Action, of datasets → backups should be based on naming convention, e.g. CTY is backed up by system
  – Action, of databases → reduce frequency
▪ Anti-pattern: Duplicate copies of datasets and databases
  – Source: Lists, based on plan and JCL (parameterised service procedures)
  – Action: remove from critical path or remove entirely

(22)

Work that must be done must be done as efficiently as possible

(Each (anti-)pattern is listed with its source; heavy business / application maintenance vendor involvement needed)

▪ Parallelisation where possible
  – Source: Sources, input / output datasets
  – Run sequential jobstreams in parallel instead
  – Split jobs into multiple parallel jobs
▪ Job optimisation
  – Source: Joblogs, sources, input / output datasets, SMF
  – Remove unnecessary resource allocations
  – Remove wait times
  – Remove redundancy and duplications
  – Remove deadlocks
▪ Find the 20% of jobs that constitute 80% of the work
  – Source: Analysis of the actuals, joblogs

(23)

16 anti-patterns were analysed and the list is growing

ID            Name
Anti_SW_000   Superfluous relation
Anti_SW_001   Superfluous special resource
Anti_SW_002   Superfluous early start time
Anti_SW_003   Redundant copies of files
Anti_SW_004   Superfluous dummy steps
Anti_SW_005   Redundant image copies (backups)
Anti_SW_006   Unused code (COBOL)
Anti_SW_007   Unused code (JCL)
Anti_SW_008   Redundant start and stop (transactions)
Anti_SW_009   Redundant start and stop (databases)
Anti_SW_010   Redundant change status (tablespaces)
Anti_SW_011   Offline programs (DBB/DLI)
Anti_SW_012   GSAM init. in critical path
Anti_SW_013   Immediate start when after extend current plan
Anti_SW_014   Wait for input BMP
Anti_SW_015   Runcycle not in line with jobname

(25)

Source2VALUE™ Portal detects and shows anti-patterns across platforms (TWS, JCL, COBOL, ...)

▪ Anti-patterns are translated to signals on the dashboard
▪ Whitespace shows gaps in critical timeline

(26)

Source2VALUE™ Portal increases analyzability and saves time (and money)

▪ Different views:
  – Dashboard view
  – Timeline view (job flow visualization)
  – Cross Reference view
  – Source view
▪ All elements are hyperlinked across platforms
▪ Two starting points for analysis
  – The dashboard for analysing anti-patterns
  – The job flow visualisation for white space analysis

(27)

Source2VALUE™ Dashboard shows all jobs with configured anti-patterns

Optimization opportunity: 7.5% of the jobs in this critical timeline contain image copies

(28)

Source2VALUE™ Timeline shows gaps in the plan

▪ Timeline
  – Contains slots of continuous work in a critical timeline
  – Whitespace means an opportunity to optimize
▪ Timeslot Gantt
  – Contains individual jobs and dependencies within a timeslot
  – Used to find the starting jobs in a timeslot to start further analysis
  – Used to detect suspected dependencies and durations

Analyze white space
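The idea of slots of continuous work separated by whitespace can be sketched as an interval merge. The example below is a minimal illustration, assuming each job on a timeline is available as a (start, end) pair taken from the track log; the timestamps are hypothetical and the function names are not part of the Source2VALUE™ portal.

```python
from datetime import datetime, timedelta


def continuous_slots(jobs, tolerance=timedelta(0)):
    """Merge (start, end) job intervals into slots of continuous work."""
    slots = []
    for start, end in sorted(jobs):
        if slots and start <= slots[-1][1] + tolerance:
            # Job overlaps or touches the current slot: extend it.
            slots[-1][1] = max(slots[-1][1], end)
        else:
            slots.append([start, end])
    return [tuple(slot) for slot in slots]


def whitespace(slots):
    """Return the idle gaps between consecutive slots."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(slots, slots[1:])]


# Hypothetical job intervals on one critical timeline.
jobs = [
    (datetime(2012, 3, 13, 22, 0), datetime(2012, 3, 13, 22, 40)),
    (datetime(2012, 3, 13, 22, 30), datetime(2012, 3, 13, 23, 10)),
    (datetime(2012, 3, 13, 23, 10), datetime(2012, 3, 13, 23, 20)),
    (datetime(2012, 3, 14, 0, 45), datetime(2012, 3, 14, 1, 30)),
]
slots = continuous_slots(jobs)
gaps = whitespace(slots)
for gap_start, gap_end in gaps:
    print(gap_start.time(), "to", gap_end.time(), "idle for", gap_end - gap_start)
```

Here the first three jobs merge into one slot from 22:00 to 23:20, and the 85-minute gap before the job that starts at 00:45 is the whitespace to investigate, for example for an early start time.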

(29)

Source2VALUE™ lets you focus on optimization opportunities

▪ Adaptable
  – New anti-patterns can be added
  – New sources can be added (e.g. PSB)
▪ Configurable
  – The dashboard can be configured depending on the desired focus
▪ Repeatable
  – Trend analyses
  – Difference analysis

<expression>
  <or> <gt metric="DELTA_TARGET" value="1"/> </or>
  <or> <gt metric="DELTA_DEADLINE" value="1"/> </or>
  <or> <gt metric="critAndADStartTime" value="1"/> </or>
  <or> <gt metric="NO_IMS_IC" value="1"/> </or>
  <or> <gt metric="NO_DB2_IC" value="1"/> </or>
  <or> <gt metric="NO_IMS_SS" value="1"/> </or>
  <or> <gt metric="NO_DB2_SS" value="1"/> </or>
  <or> <gt metric="NO_DUMMYJOBS" value="1"/> </or>
</expression>
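The slide does not explain how the dashboard evaluates this expression. As an illustration only, the sketch below assumes that `gt` means "metric greater than value" and that a job is flagged as soon as any one condition holds; both are assumptions, and the evaluator is hypothetical rather than the portal's own code. The expression is shortened to two of the metrics shown above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the dashboard expression, using two of the slide's metrics.
EXPRESSION = """
<expression>
  <or> <gt metric="NO_IMS_IC" value="1"/> </or>
  <or> <gt metric="NO_DUMMYJOBS" value="1"/> </or>
</expression>
"""


def job_is_flagged(expression_xml, metrics):
    """Assumed semantics: flag the job if any <gt> condition holds."""
    root = ET.fromstring(expression_xml)
    for condition in root.iter("gt"):
        observed = metrics.get(condition.get("metric"), 0)
        if observed > float(condition.get("value")):
            return True
    return False


print(job_is_flagged(EXPRESSION, {"NO_DUMMYJOBS": 3}))  # True
print(job_is_flagged(EXPRESSION, {"NO_IMS_IC": 1}))     # False
```

Keeping the conditions in configuration rather than in code is what makes the dashboard adaptable: a new anti-pattern becomes one more metric and one more condition.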

(31)

Example: Dummy jobs in Client database

Sort jobs on anti-pattern and start analysis

25 dummy jobs in critical timeline

Click for job details

(32)

Example: Dummy jobs in Client database

Analyze source code

Dummy job can be removed and

dependencies adjusted
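A first pass at finding dummy-job candidates in JCL sources might look like the sketch below. It assumes that a dummy job is one whose every step executes IEFBR14, which is an assumption made for illustration and not necessarily the rule used in the portal; the JCL samples and job names are hypothetical. Since IEFBR14 steps are also commonly used together with DD statements to allocate or delete datasets, a hit is only a candidate for review, not proof that the job can be removed.

```python
import re

# Matches a JCL EXEC statement and captures its first operand.
EXEC_RE = re.compile(r"^//(\S*)\s+EXEC\s+(\S+)", re.IGNORECASE)


def exec_targets(jcl_text):
    """Return the first operand of every EXEC statement, e.g. 'PGM=IEFBR14'."""
    targets = []
    for line in jcl_text.splitlines():
        if line.startswith("//*"):  # JCL comment line
            continue
        match = EXEC_RE.match(line)
        if match:
            targets.append(match.group(2).split(",")[0].upper())
    return targets


def is_dummy_candidate(jcl_text):
    """True if the job has steps and all of them run IEFBR14 (assumed rule)."""
    targets = exec_targets(jcl_text)
    return bool(targets) and all(t == "PGM=IEFBR14" for t in targets)


DUMMY_JOB = """\
//DUMMY01  JOB (ACCT),'PLACEHOLDER'
//STEP1    EXEC PGM=IEFBR14
"""

REAL_JOB = """\
//REAL01   JOB (ACCT),'WORK'
//* cleanup step
//STEP1    EXEC PGM=IEFBR14
//STEP2    EXEC PGM=SORT
"""

print(is_dummy_candidate(DUMMY_JOB))  # True
print(is_dummy_candidate(REAL_JOB))   # False
```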

(33)

Example: 17% image copies, start/stops and dummy jobs in critical timeline

Sort jobs on anti-pattern and start analysis

17% of the jobs in the critical timeline are not core functionality

(125+133+53+141+100)/2531

4 sequential IMS Image copies in
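The share of non-core jobs can be recomputed from the counts printed on the slide. Note that, as transcribed here, the five counts add up to 552 of 2531 jobs, which is about 21.8% rather than the 17% in the slide title; the text gives no way to tell which figure is off, so the numbers are left as printed.

```python
counts = [125, 133, 53, 141, 100]  # per-category job counts as printed on the slide
total_jobs = 2531                  # jobs in the critical timeline, as printed

share = sum(counts) / total_jobs
print(f"{sum(counts)} of {total_jobs} jobs = {share:.1%}")
# 552 of 2531 jobs = 21.8%
```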

(34)

Example: Gap in timeline due to early start in Triple A timeline

Analyze gap by investigating timeslot after gap

(35)

Example: Gap in timeline due to early start in critical timeline

Investigate job details to determine early start

The “FTP incoming stock exchange” job has an early start defined in the Application Database.

The business has to be consulted to determine optimization possibilities

(36)

Extra: Standalone job with no early start is potential risk

Investigate why job starts at 3:30

Job pdofi60t is a GSAM initialization job that runs directly after plan extend at 3:30 instead of at 12:00... Potential risk when previous plan is

(37)

Baseline

• 16 January to 11 February
• Trimmed averages (see http://nl.wikipedia.org/wiki/Getrimd_gemiddelde)

Critical timeline compared to baseline

• e.g. ET online point
• for DR4-C1 and DR4-C2 dates

Baseline characteristics

Baseline start date                     16 Jan
Baseline end date                       11 Feb
Total nr of jobs available              1053138
Total nr of jobs analysed               858267
Total nr of jobs filtered out           194871
Total nr of unique jobs analysed        41940

ET online point: 12-4 to 16-4 and 19-4 to 23-4 (bottom)
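A trimmed average discards a fixed share of the lowest and highest observations before averaging, so that a single abnormal night does not distort the baseline. The sketch below is a minimal illustration; the 10% trim fraction and the elapsed times are assumptions, since the slide does not state which fraction was used.

```python
def trimmed_mean(values, trim_fraction=0.1):
    """Mean after discarding the lowest and highest trim_fraction of values."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    k = int(len(ordered) * trim_fraction)  # observations cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical elapsed times (minutes) of one job over ten baseline nights;
# the 180-minute run represents a night with an incident.
elapsed_minutes = [42, 44, 45, 43, 41, 46, 44, 43, 45, 180]
print(sum(elapsed_minutes) / len(elapsed_minutes))  # 57.3
print(trimmed_mean(elapsed_minutes))                # 44.0
```

The plain mean of 57.3 minutes is dominated by the one outlier, whereas the trimmed mean of 44.0 minutes reflects a typical night and is a fairer reference for the dress rehearsal runs.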

(39)

© 2012 IBM Corporation

Notices

This information was developed for products and services offered in the U.S.A.

Note to U.S. Government Users Restricted Rights — Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.

(40)

Trademarks

▪ This presentation contains trademarked IBM products and technologies. Refer to the following Web site:

(41)

Participate in the System z Expert and Superhero contest! Fill in your answer to the question below on the scorecard and deposit your card in the box!

An example of a performance anti-pattern impacting batch elapsed time is

a. The presence of dummy jobs in the schedule
b. Redundant use of image copies
c. Existence of obsolete relations in the schedule
d. All of the above

(42)

More information on zEnterprise

▪ IBM zEnterprise landing page: http://www.ibm.com/systems/z/hardware/zenterprise/index.html
▪ IBM zEnterprise 114 (z114): http://www.ibm.com/systems/z/hardware/zenterprise/z114.html
▪ IBM zEnterprise Events Landing Page: http://www.ibm.com/systems/breakthrough
▪ IBM software for zEnterprise: http://www.ibm.com/software/os/systemz/announcements
▪ IBM System Storage: http://www.ibm.com/systems/storage/product/z.html
▪ IBM Global Financing: http://www.ibm.com/financing/us/lifecycle/acquire/zenterprise/
▪ IBM Services for zEnterprise: http://www.ibm.com/services/us/gts/zenterprise/index.html

(43)

Thank You

Merci

Bedankt

Gracias!

Obrigado

Danke
