Batch Schedule
Performance Optimization
Lydia Duijvestijn, Jan van Cappelle, Coen Burki
13-15 March 2012
Agenda
Problem statement
Solution outline
The engagement
The Source2VALUE™ portal
Results and demo
Next steps
From the newspapers we infer that many IT problems have their root
cause in the batch...
Internet banking unavailable
Automatic collections doubly booked...
...and after that a faulty correction
ATMs unavailable
Pin payments in shops unavailable
Stock exchange unavailable
When we listen to the client, this is what we hear...
“Between 10:00 am and 4:00 pm we had bad online transaction
response times.”
“The time window to run my batch is too short.”
“Processor utilization is too high.”
“When we merge company A with company B, will there still be
enough time to run our batch jobs?”
“Our batch schedule is so complex that we no longer know how
our applications work. We are worried that when there are problems we won’t be able to resolve them quickly.”
“Can you help us reduce our batch processing costs by 20%?”
“There is no specific department responsible for our batch schedule;
every department has its own standards, and no one thinks about the overall corporate perspective and requirements. How can we
standardize?”
“We want to clean up our batch schedule but do not know where to
Why do companies have those problems?
The core batch systems or Good Old Systems were built 20-30 years ago, or even longer
New systems were built and connected to the old core
Standards and guidelines were made but never kept up to date, and sometimes every department has its own standards.
The world changed from 9-to-5 to 24x7. The question is: where do we find a time slot to run the batch?
Companies that used to have one mainframe now have one mainframe with
hundreds of distributed platforms surrounding and connected to this mainframe
The solution is to implement Batch Governance and in the meantime start a
Batch Optimization program
Batch Governance
– To ensure alignment with the IT strategy and operational processes
BSPO = Batch Schedule Performance Optimization
– Short term BSPO to answer questions such as
• What is the effect of a merger between two banks?
• What is the risk of an outage caused by batch window overrun?
– Long term BSPO to clean up the batch maintenance backlog
• Agree and install standards and guidelines for batch processing
• Investigate batch critical timelines
• Prioritize issues and draw up roadmap
• Solve issues in order of priority
Long term BSPO is a cyclic approach, based on anti-pattern detection
Detect
Analyse
Change
Test
Batch governance ensures alignment with enterprise architecture, IT
strategy and operational processes
[Diagram relating Batch Governance to IT Governance and Operational Governance. Labels shown: Batch Design Authority, IT Management, Enterprise Architecture; Change the business, Run the business; Projects, Design Authority, Change Management, Incident & Problem Management, Application Support, Operations; Problems, Requests, Technical control, Quality assurance, Advice, Threatening trends, Failure patterns, RCAs, Timelines, Escalation reporting, Priorities.]
Align batch architecture with Enterprise architecture
Batch Governance scope and tasks
Scope:
– Design the end to end batch schedule of the enterprise.
– Document and enforce the standards and guidelines to run the batch schedule.
– Assure the quality of the batch schedule during design, build, operation, and change phases.
– Make architectural decisions about the batch schedule, especially for new features in advance.
– Make proposals for improvement.
Tasks:
– Short Term - addresses the current batch problems and how they can be solved quickly. Manages risk in the current environment and keeps the standards and guidelines up to date.
– Medium Term - creates and maintains the future architecture, including new machines and new applications. Involved with the creation of new solutions.
– Long Term - aligns the business strategy and the batch strategy and implements this strategy effectively. Develops strategic and innovative initiatives.
BSPO is based upon the detection of anti-patterns and replacement
by patterns
some definitions
An anti-pattern (DON’T) is similar to a pattern, but leads to negative
consequences
– An anti-pattern describes what to avoid and how to fix a problem when you find it
A pattern (DO) is a named solution to a recurring problem in a context
– A documented pattern specification communicates expertise from an expert to a novice, educating the novice about what makes the problem difficult and how to go about solving it
– It should provide the reader with enough information to decide whether this pattern is applicable to her problem, and enough understanding to be able to use it successfully
– A pattern describes how to solve a problem but does not advocate a particular implementation of the solution.
What makes the BSPO approach unique
With BSPO, IBM compares the schedule on the one hand with the program
sources (programming language, scripting language and data manipulation language) on the other, and checks for anti-patterns.
By using the Source2VALUE™ portal IBM is able to connect all kinds of
different data sources
– Batch schedule: TWS, CA7, Control-M
– Scripting language: JCL, BASH, Perl
– Programming language: Cobol, PL/I, C
– Database: IMS, DB2, Oracle
– Platform: Mainframe, Unix, iSeries, Windows
– More than one schedule
When looking for performance (anti-)patterns we have to bear in mind the
KPIs for batch performance
Elapsed time
– Elapsed time is batch “response time”
– Elapsed time is the time taken from the start of a job to its end
– Elapsed time includes I/O time and all other types of wait
The focus of the example BSPO engagement has been on elapsed time
Throughput
– Throughput is defined as the number of units of work that a system can
process in a given unit of time
Capacity and resource utilisation
– Capacity is the amount of available system resources to process the batch
– Resource utilisation is the extent to which the capacity of a resource is used
– Resources are CPU, Memory, Disk, Network bandwidth
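The three KPIs can be made concrete with a small calculation. The following is a minimal sketch, assuming each job run is available as a record with start time, end time, CPU seconds and a count of processed units; the `JobRun` type, its field names and the sample values are hypothetical and not taken from the engagement.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class JobRun:
    name: str
    start: datetime
    end: datetime
    cpu_seconds: float  # CPU time consumed by the job (assumed to be available, e.g. from SMF)
    records: int        # units of work processed (hypothetical measure)


def elapsed_seconds(run: JobRun) -> float:
    """Elapsed time: wall-clock time from job start to job end, waits included."""
    return (run.end - run.start).total_seconds()


def throughput(run: JobRun) -> float:
    """Throughput: units of work processed per second of elapsed time."""
    return run.records / elapsed_seconds(run)


def cpu_utilisation(run: JobRun, cpus: int = 1) -> float:
    """Fraction of the available CPU capacity the job used while it ran."""
    return run.cpu_seconds / (elapsed_seconds(run) * cpus)


run = JobRun("JOB_A", datetime(2012, 3, 13, 1, 0), datetime(2012, 3, 13, 1, 10),
             cpu_seconds=150.0, records=120_000)
print(elapsed_seconds(run))  # 600.0
print(throughput(run))       # 200.0
print(cpu_utilisation(run))  # 0.25
```

A job with low CPU utilisation and a long elapsed time, as in this sample, spends most of its time waiting, which is why elapsed time rather than CPU time is the KPI to watch in the batch window.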
Focus of the example BSPO engagement has been on batch
elapsed time
Batch elapsed time is composed of three components
– Wait time
• Step 1: Eliminate wait times from critical path
– Work that should not be done
• Step 2: Eliminate unnecessary processing from critical path
– Work that needs to be done
• Step 3: Optimize necessary processing
In case there are many critical timelines, the priorities have to
be decided
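All three steps work on the critical path, so it helps to be precise about what that is: the chain of dependent jobs that determines when an endpoint can finish. A minimal sketch, assuming job durations and predecessor relations are known; the four jobs and their durations are hypothetical.

```python
from functools import lru_cache

# Hypothetical jobs: elapsed minutes and predecessor lists.
DURATION = {"A": 10, "B": 30, "C": 5, "D": 20}
PREDS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}


@lru_cache(maxsize=None)
def earliest_finish(job: str) -> int:
    """Earliest finish = own duration + latest earliest-finish among predecessors."""
    start = max((earliest_finish(p) for p in PREDS[job]), default=0)
    return start + DURATION[job]


def critical_path(endpoint: str) -> list[str]:
    """Walk back from the endpoint, always following the latest-finishing predecessor."""
    path = [endpoint]
    while PREDS[path[-1]]:
        path.append(max(PREDS[path[-1]], key=earliest_finish))
    return path[::-1]


print(earliest_finish("D"))  # 60
print(critical_path("D"))    # ['A', 'B', 'D']
```

In this sample, shortening job C would not help the endpoint at all; only changes to A, B or D shorten the timeline.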
The merger of two banks will cause an increase in workload,
potentially leading to blocking performance problems in multiple
critical batch timelines
Problem statement
Account balances not processed in time
Stock exchange system unavailable
Money transfer foreign currencies unavailable
Bank-office applications unavailable
Service to private clients unsatisfactory
Optimizing the batch is not exactly a “one-dimensional” problem
Analyzing one timeline can be a nightmare, but companies tend to
have several critical timelines...
• This client had over 600 critical timelines
• In the scheduling package there are over 15,000 applications
• With over 300,000 jobs running
• With more than half a million relations
• IBM has seen jobs with 75 steps
The client had good standards and had already carried out several
optimization projects in the past.
• 1 job has 1 procedure has 1 step
• Always top down re-runnable
• no problems with restart (temporary files)
• No time loss with restart
• Special resource only for 1 step
• When file is ready file can be delivered to next job
• 20% of the components (TWS, schedule JCL, procedure JCL and
control cards) are generated.
• Components (TWS, JCL and control cards) are built in
development and promoted by tool to test, acceptance and production.
We proposed a four-phased short term BSPO approach
Preparation
– Revisit user interface of the tool
– Create automated source collection mechanism
– Agree relevant measurement points
– Agree business critical endpoints
– Understand migration plans
Collect, load, store & process
– Load data
– Collect and send sources
– Store and process sources
Analysis & weekly intermediate reporting
– Perform high level analysis and store results
– Perform detailed analysis
– Report findings
– Define and capture new anti-patterns
Final reporting
– Compose end report and present results
Scope
– Analysis of critical timelines
• Four production timelines
• Core applications online point
– Analysis of dress rehearsal runs
• Several try-outs
• One full dress rehearsal run
Sources
– TWS Application Database
– TWS Current Plans
– TWS Track log information
– TWS Critical timelines
– Schedule JCL
– Procedure JCL
– SMF data
Timeline analysis
– Critical timelines compared to baseline
– Anti-pattern analysis for critical timelines
– Whitespace analysis
Analysis on TWS track log data
– Long running jobs
– Delayed endpoints
Special request analysis
Wait times can be recognized as “large gaps between jobs”
(No or little business involvement needed)
– Anti-pattern: Job is waiting for unknown reason; threshold has been exceeded (e.g. # of // servers); deadlocks and abends
• Source: 1. Cross reference between time, active job and workstation 2. Solution time analysis of abends, followed by zooming in on system log (777)
– Anti-pattern: Waiting for normal resources (tape units) [+TWS DB]: if incorrect use of normal resources, remove
• Source: [optional] Cross reference per gap within critical path based on planning + JCL + SMF
– Anti-pattern: Waiting for special resources (VSAM datasets, DBs) [+JCL]: if incorrect use of special resources, remove
• Source: Cross reference per gap within critical path based on planning + TWS AD DB
– Anti-pattern: Dependencies [+JCL+Program]: if dependency (technically) incorrect, remove
• Source: Cross reference per critical path based on PSB + JCL (for IMS) or Views + COBOL Sources + JCL (for DB2)
– Anti-pattern: Early start times [plan only]: if early start time incorrect, remove
• Source: Gantt chart per critical path based on plan + actuals (track log)
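An early start time becomes a wait-time anti-pattern when it holds a job back after its predecessors have already finished. A minimal sketch of how that idle time could be measured, assuming the completion time of the predecessors (from the track log) and the planned early start are known; the function name and sample times are hypothetical.

```python
from datetime import datetime, timedelta


def early_start_wait(predecessors_done: datetime, early_start: datetime) -> timedelta:
    """Idle time caused only by the early start; zero if predecessors finish later."""
    return max(early_start - predecessors_done, timedelta(0))


done = datetime(2012, 3, 13, 1, 10)
print(early_start_wait(done, datetime(2012, 3, 13, 3, 30)))  # 2:20:00
print(early_start_wait(done, datetime(2012, 3, 13, 0, 30)))  # 0:00:00
```

A non-zero result only marks a candidate: whether the early start can be removed still has to be confirmed against the plan.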
Work that should not be done takes on many forms
(Business involvement needed to judge (in)correctness of operations)
– Anti-pattern: IEFBR14 jobs: remove IEFBR14 jobs
• Source: [optional, maintenance work] Lists, based on plan, actuals and JCL
– Anti-pattern: Stop and start of databases: keep 7x24 hrs online
• Source: Lists, based on plan, actuals and JCL
– Anti-pattern: Stop and start of transactions: stop and start no more than once per night
• Source: Lists, based on plan, actuals and JCL
– Anti-pattern: Frequent backups: backups of datasets should be based on naming convention, e.g. CTY is backed up by system; for databases, reduce frequency
• Source: Lists, based on plan and JCL
– Anti-pattern: Duplicate copies of datasets and databases: remove from critical path or remove entirely
• Source: Lists, based on plan and JCL (parameterised service procedures)
Work that must be done must be done as efficiently as possible
(Heavy business / application maintenance vendor involvement needed)
– (Anti-)pattern: Parallelisation where possible: run sequential jobstreams in parallel instead; split jobs into multiple parallel jobs
• Source: Sources, Input / Output datasets
– (Anti-)pattern: Job optimisation: remove unnecessary resource allocations; remove wait times; remove redundancy and duplications; remove deadlocks
• Source: Joblogs, sources, Input / Output datasets, SMF
– (Anti-)pattern: Find the 20% jobs that constitute 80% of the work
• Source: Analysis of the actuals, joblogs
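Finding the 20% of jobs that constitute 80% of the work is a Pareto selection over the actuals. A minimal sketch, assuming elapsed time per job has already been extracted from the actuals; the function name, job names and numbers are hypothetical.

```python
def heavy_hitters(elapsed_by_job: dict[str, float], share: float = 0.8) -> list[str]:
    """Return the longest-running jobs that together cover `share` of total elapsed time."""
    total = sum(elapsed_by_job.values())
    selected, running = [], 0.0
    # Walk the jobs from longest to shortest until the requested share is covered.
    for job, elapsed in sorted(elapsed_by_job.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        selected.append(job)
        running += elapsed
    return selected


actuals = {"J1": 500, "J2": 350, "J3": 80, "J4": 40, "J5": 30}
print(heavy_hitters(actuals))  # ['J1', 'J2']
```

The same selection could be run on CPU seconds instead of elapsed time, depending on which KPI the engagement focuses on.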
16 anti-patterns were analysed and the list is growing
ID / Name
Anti_SW_000 Superfluous relation
Anti_SW_001 Superfluous special resource
Anti_SW_002 Superfluous early start time
Anti_SW_003 Redundant copies of files
Anti_SW_004 Superfluous dummy steps
Anti_SW_005 Redundant image copies (backups)
Anti_SW_006 Unused code (COBOL)
Anti_SW_007 Unused code (JCL)
Anti_SW_008 Redundant start and stop (transactions)
Anti_SW_009 Redundant start and stop (databases)
Anti_SW_010 Redundant change status (tablespaces)
Anti_SW_011 Offline programs (DBB/DLI)
Anti_SW_012 GSAM init. in critical path
Anti_SW_013 Immediate start after extend of current plan
Anti_SW_014 Wait for input BMP
Anti_SW_015 Runcycle not in line with jobname
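Several of these anti-patterns can be detected by scanning sources. As an illustration, the sketch below lists JCL steps that execute IEFBR14, the usual form of a dummy step. It is a simplified assumption of how such a check might look, not the Source2VALUE™ implementation; the regular expression, function name and sample JCL are hypothetical.

```python
import re

# An EXEC statement that runs IEFBR14, e.g. "//STEP010  EXEC PGM=IEFBR14".
# The negative lookahead skips JCL comment lines, which start with "//*".
IEFBR14_STEP = re.compile(r"^//(?!\*)(\S*)\s+EXEC\s+PGM=IEFBR14\b", re.MULTILINE)

SAMPLE_JCL = """\
//DAILY01  JOB (ACCT),'DEMO'
//STEP010  EXEC PGM=IEFBR14
//STEP020  EXEC PGM=SORT
//* EXEC PGM=IEFBR14 (commented out)
//STEP030  EXEC PGM=IEFBR14
//DD1      DD DSN=DEMO.WORK.FILE,DISP=(MOD,DELETE),UNIT=SYSDA
"""


def iefbr14_steps(jcl: str) -> list[str]:
    """Return the names of all steps that execute IEFBR14."""
    return IEFBR14_STEP.findall(jcl)


print(iefbr14_steps(SAMPLE_JCL))  # ['STEP010', 'STEP030']
```

A hit is only a candidate. An IEFBR14 step with DD statements, like STEP030 here, allocates or deletes datasets and may still be needed, so the source has to be inspected before the step is removed.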
Anti-patterns are translated to signals on the dashboard
Whitespace shows gaps in critical timeline
Source2VALUE™ Portal detects and shows
anti-patterns across platforms (TWS, JCL, Cobol, ..)
Different views:
– Dashboard view
– Timeline view (job flow visualization)
– Cross Reference view
– Source view
All elements are hyperlinked across platforms
Two starting points for analysis
The dashboard for analysing anti-patterns
The job flow visualisation for white space analysis
Source2VALUE™ Portal increases analyzability
and saves time (and money)
Source2VALUE™ Dashboard shows all jobs with configured
anti-patterns
Optimization opportunity: 7.5% of the jobs in this critical timeline
are image copy jobs
Source2VALUE™ Timeline shows gaps in the plan
Timeline
– Contains slots of continuous work in a critical timeline.
– Whitespace means an opportunity to optimize
Timeslot Gantt
– Contains individual jobs and dependencies within a timeslot
– Used to find the starting jobs in a timeslot to start further analysis
– Used to detect suspected dependencies and durations
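The notions of timeslot and whitespace can be expressed as interval merging: overlapping or adjacent job runs form one slot of continuous work, and the space between two slots is whitespace. A minimal sketch, assuming job runs are given as (start, end) pairs in minutes since the start of the timeline; the function names and sample data are hypothetical and this is not the portal's actual algorithm.

```python
def timeslots(jobs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching (start, end) job intervals into slots of continuous work."""
    slots: list[tuple[int, int]] = []
    for start, end in sorted(jobs):
        if slots and start <= slots[-1][1]:
            # Job starts before the current slot ends: extend the slot.
            slots[-1] = (slots[-1][0], max(slots[-1][1], end))
        else:
            slots.append((start, end))
    return slots


def whitespace(slots: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Gaps between consecutive slots, as (gap_start, gap_end) pairs."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(slots, slots[1:])]


runs = [(0, 30), (20, 45), (45, 60), (150, 170), (160, 200)]
slots = timeslots(runs)
print(slots)              # [(0, 60), (150, 200)]
print(whitespace(slots))  # [(60, 150)]
```

The first job of the slot that follows a gap is the natural starting point for further analysis, since something held it back for the length of the gap.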
Analyze white space
Source2VALUE™ lets you focus on optimization opportunities
Adaptable
– New anti-patterns can be added
– New sources can be added (e.g. PSB)
Configurable
– The dashboard can be configured depending on the desired focus
Repeatable
– Trend analyses
– Difference analysis
<expression>
  <or> <gt metric="DELTA_TARGET" value="1"/> </or>
  <or> <gt metric="DELTA_DEADLINE" value="1"/> </or>
  <or> <gt metric="critAndADStartTime" value="1"/> </or>
  <or> <gt metric="NO_IMS_IC" value="1"/> </or>
  <or> <gt metric="NO_DB2_IC" value="1"/> </or>
  <or> <gt metric="NO_IMS_SS" value="1"/> </or>
  <or> <gt metric="NO_DB2_SS" value="1"/> </or>
  <or> <gt metric="NO_DUMMYJOBS" value="1"/> </or>
</expression>
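The dashboard expression reads as a list of threshold checks on job metrics. The sketch below shows one plausible way such an expression could be evaluated. It assumes that `gt` means "metric strictly greater than value" and that a job is flagged when any branch holds; the source does not document these semantics, and the function name and the shortened expression are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened version of the dashboard expression, for illustration.
EXPRESSION = """
<expression>
  <or> <gt metric="NO_IMS_IC" value="1"/> </or>
  <or> <gt metric="NO_DUMMYJOBS" value="1"/> </or>
</expression>
"""


def is_flagged(expression_xml: str, metrics: dict[str, float]) -> bool:
    """True if any <gt> check holds; metrics missing from the input count as 0."""
    root = ET.fromstring(expression_xml)
    return any(
        metrics.get(gt.get("metric"), 0) > float(gt.get("value"))
        for gt in root.iter("gt")
    )


print(is_flagged(EXPRESSION, {"NO_DUMMYJOBS": 3}))  # True
print(is_flagged(EXPRESSION, {"NO_DUMMYJOBS": 1}))  # False
```

Keeping the thresholds in configuration rather than in code is what lets the dashboard be refocused without changing the analysis itself.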
Example: Dummy jobs in Client database
Sort jobs on anti-pattern and start analysis
25 dummy jobs in critical timeline
Click for job details
Example: Dummy jobs in Client database
Analyze source code
Dummy job can be removed and
dependencies adjusted
Example: 17% image copies, start/stops and dummy jobs in critical timeline
Sort jobs on anti-pattern and start analysis
17% of jobs in critical timeline are not core
functionality
(125+133+53+141+100)/2531
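As a quick check of the arithmetic, the expression can be evaluated directly. The sketch below only computes the ratio as written; which anti-pattern each of the five counts belongs to is not stated in the source, so the variable names are illustrative assumptions.

```python
non_core_counts = [125, 133, 53, 141, 100]  # counts as they appear in the expression
total_jobs = 2531

share = sum(non_core_counts) / total_jobs
print(f"{sum(non_core_counts)} of {total_jobs} jobs = {share:.1%}")  # 552 of 2531 jobs = 21.8%
```

As written, the expression comes to roughly 22% rather than 17%. The headline figure may be based on a different subset of the counts, or on jobs that carry more than one anti-pattern; the source does not say.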
4 sequential IMS Image copies in
Example: Gap in timeline due to early start in Triple A timeline
Analyze gap by investigating timeslot after gap
Example: Gap in timeline due to early start in critical timeline
Investigate job details to determine early start
FTP incoming stock exchange Job has early
start defined in Application Database.
Business has to be consulted to determine optimization possibilities
Extra: Standalone job with no early start is potential risk
Investigate why job starts at 3:30
Job pdofi60t is a GSAM initialization job that runs directly after plan extend
at 3:30 instead of at 12:00... Potential risk when previous plan is
Baseline
• 16 January to 11 February
• Trimmed averages
(See http://nl.wikipedia.org/wiki/Getrimd_gemiddelde)
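A trimmed average discards the most extreme observations before averaging, so a single abnormal night does not distort the baseline. A minimal sketch, assuming a symmetric trim of 10% at each end; the trim percentage actually used for the baseline is not stated in the source, and the sample elapsed times are hypothetical.

```python
def trimmed_mean(values: list[float], proportion: float = 0.1) -> float:
    """Mean after discarding the lowest and highest `proportion` of the sorted values."""
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)  # number of values dropped at each end
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)


# Elapsed minutes of one job over ten nights; one night had an abnormal run.
elapsed = [42, 40, 41, 39, 43, 40, 41, 38, 44, 180]
print(sum(elapsed) / len(elapsed))  # 54.8
print(trimmed_mean(elapsed))        # 41.25
```

The plain mean is pulled up by the one 180-minute run, while the trimmed mean stays close to a typical night, which makes it the more useful reference when comparing a dress rehearsal run to normal production.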
Critical timeline compared to baseline
• e.g. ET online point
• for DR4-C1 and DR4-C2 dates
Baseline characteristics
Baseline startdate: 16-jan
Baseline enddate: 11-feb
Total nr of jobs available: 1053138
Total nr of jobs analysed: 858267
Total nr of jobs filtered out: 194871
Total nr of unique jobs analysed: 41940
ET online point: 12-4 to 16-4 and 19-4 to 23-4 (bottom)
© 2012 IBM Corporation
Notices
This information was developed for products and services offered in the U.S.A.
Note to U.S. Government Users Restricted Rights — Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES
CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.
Trademarks
This presentation contains trade-marked IBM products and technologies. Refer to the following Web site:
Participate in the System z Expert and Superhero contest! Fill in your answer to the question below on the scorecard and deposit your card in the box!
An example of a performance anti-pattern impacting batch elapsed time is
a. The presence of dummy jobs in the schedule
b. Redundant use of image copies
c. Existence of obsolete relations in the schedule
d. All of the above
More information on zEnterprise
IBM zEnterprise landing page: http://www.ibm.com/systems/z/hardware/zenterprise/index.html
IBM zEnterprise 114 (z114): http://www.ibm.com/systems/z/hardware/zenterprise/z114.html
IBM zEnterprise Events Landing Page: http://www.ibm.com/systems/breakthrough
IBM software for zEnterprise: http://www.ibm.com/software/os/systemz/announcements
IBM System Storage: http://www.ibm.com/systems/storage/product/z.html
IBM Global Financing: http://www.ibm.com/financing/us/lifecycle/acquire/zenterprise/
IBM Services for zEnterprise: http://www.ibm.com/services/us/gts/zenterprise/index.html