WAS Performance on i5/OS. Lisa Wellman, May 2010


(1)

WAS Performance on i5/OS

Lisa Wellman

peace@us.ibm.com

(2)

A simplified view: major WAS functions widely used
• Administered Java runtime environment
• HTTP request routing
• Web container
  - Web thread pool
  - Servlet (and JSP) lifecycle
• Database connections
  - Pooling connections, prepared statements
• Security
  - Authentication, authorization
  - Administration, application
• Transaction control
• EJB container
• JMS services
• Web services
• Etc.

WAS is middleware, it doesn’t do anything without an application

[Diagram: HTTP Server; WebSphere Application Server (Web container, Database connection pool)]

(3)

WebSphere Application Server performance


(4)

WAS Queue “Funnel”
• Queues
  - HTTP Server threads
  - Web container threads
  - ORB pool
  - Data source pools
  - Etc.
• Queues get smaller as the request goes deeper into the system
  - Better to wait “near the network”
• Don’t overload the system, bigger is not always better
  - Some requests serviced without backend resources
• http://publib.boulder.ibm.com/infocenter/wsdoc400/v6r0/index.jsp
  - Tuning performance → Tuning the application server
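The funnel idea can be checked mechanically: list the pool sizes from the network inward and flag any stage that is larger than the one in front of it. The sketch below is only an illustration; the stage names and sizes are hypothetical values, not recommendations, and real sizing still has to account for requests that never reach the backend.

```python
from typing import List, Tuple


def funnel_violations(stages: List[Tuple[str, int]]) -> List[str]:
    """Return one message per stage that is larger than the stage before it.

    `stages` is ordered from the network inward, e.g. HTTP server threads
    first and data source connections last.
    """
    problems = []
    for (outer_name, outer_size), (inner_name, inner_size) in zip(stages, stages[1:]):
        if inner_size > outer_size:
            problems.append(
                f"{inner_name} ({inner_size}) is larger than {outer_name} ({outer_size})"
            )
    return problems


# Hypothetical pool sizes, ordered from the network inward.
example = [
    ("HTTP server threads", 40),
    ("Web container threads", 30),
    ("Data source pool", 20),
]

print(funnel_violations(example))
```

An empty list means every queue is no larger than the one before it, which is the shape the funnel calls for.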

(5)
(6)

Performance Tools for WAS Environments
• Tools for underlying infrastructure (Java, OS) plus…
  - Performance Monitoring Infrastructure (PMI)
    * Tivoli Performance Viewer (TPV)
  - i5/OS Web Admin GUI
    * HTTP real time stats
    * Web performance advisor (WPA)
    * Web performance monitor (WPM)

(7)

Tivoli Performance Viewer (TPV)
• TPV is a way of viewing PMI data
• Impact of each level
  - See documentation for each counter
    * Monitoring → Monitoring overall system health → Performance Monitoring Infrastructure (PMI) → PMI data organization (link at bottom of page) → counter pages (links at bottom of page), “overhead” column
• Never use JVMPI
  - Level “All”, any JVM subcategories
• Recommend enabling PMI service
  - Levels can be dynamically set as needed
  - Use “basic” or “custom” levels

(8)

i5/OS Web Administration GUI (port 2001)
• HTTP server real time statistics
• Web performance advisor (WPA)
• Web performance monitor (WPM)

(9)

Average response time in sec

HTTP data also in Collection Services

(10)

Web performance advisor (WPA)
• Looks at static configuration information
  - Not tailored to your load
• Checks basic settings on system, HTTP server, WAS
  - Gives recommendations and allows acceptance
    * Advice gives information
• Like having a performance expert review your configuration, and provide a report

(11)

HW, TCP, etc

HTTP, WAS config

(12)

Web performance monitor (WPM)
• Uses ARM under the covers
  - Restarts the HTTP server and WAS in order to enable ARM
    * Restart again to turn it off
  - ARM overhead ~20%
• See performance at different layers
  - HTTP, WAS, DB2

(13)

Web performance monitor (WPM)

HTTP and Application Server are restarted!!!

ARM is enabled (20% overhead)

(14)

Look at CPU and response time

Hit Refresh

Threads

Transactions

(15)

Monitor transaction for specific IP (Client)

(16)

Use it for clients with long response times, to see where time is spent (HTTP, application, DB)

(17)

Other Java-based tools
• ITCAM – IBM Tivoli Composite App Mgr
• WSAD profiler
• 3rd party, such as Wily Introscope
• Open source / freeware?

(18)

Review
• Monitor WAS queuing network with
  - Performance Monitoring Infrastructure (PMI)
    * Tivoli Performance Viewer (TPV)
  - i5/OS Web Admin GUI
    * HTTP real time stats
    * Web performance advisor (WPA)
    * Web performance monitor (WPM)

(19)

Java performance


(20)

IBM i JVM options

WAS \ i5/OS   V5R3      V5R4                 V6R1                             V7R1
6.0           Classic   Classic              Classic                          NA
6.1           Classic   Classic, IT 32-bit   Classic, IT 32-bit, IT 64-bit    IT 32-bit, IT 64-bit
7.0           -         Classic, IT 32-bit   Classic, IT 32-bit, IT 64-bit    IT 32-bit, IT 64-bit

6.0 EOS September 30, 2010

(21)

32-bit vs. 64-bit
• Maximum heap size
  - 32-bit gives limited heap size due to limited pointer addressability
    * ~4GB theoretical limit for the entire job
    * In reality, WAS limits Java heap to ~2GB or less (1.5GB is safe)
  - 64-bit gives unlimited heap from a practical perspective
• Runtime heap size
  - Smaller pointers result in a smaller heap
  - Smaller heap means better performance

(22)

Reduced memory requirements
• Footprint can be large due to a few factors
  - 64-Bit JVM requires internal pointers to be twice as large
  - Asynchronous GC can cause heap to get larger
  - Implementation could not move Java objects to compact heap
• IT JVM has about a 40% smaller memory footprint
  - 32-Bit JVM has smaller addressability
  - Stop-the-world GCs are performed when heap approaches max
  - Implementation can move Java objects, allowing a heap compaction

(23)
(24)

GC Policies
• optthruput (default)
  - Gives best performance overall
  - No concurrent mark or sweep (completely STW)
• optavgpause
  - Use to reduce STW GC pause times
  - Uses concurrent mark and sweep
• gencon
  - Generational Concurrent Garbage Collector
  - Good for apps with many short-lived objects
  - Objects created in nursery, which is further split into allocate and survivor areas; “scavenge” is the term used for cleaning (GC) done in this area
    * Copying scheme of nursery reduces fragmentation
    * Adaptive size and tilting ratio
  - Moved to tenured area after reaching threshold age
  - Uses concurrent mark in tenure area. Does not use concurrent sweep.
• subpool
  - Scales well on very large multi-processor machines
  - Reduces contention on allocation lock by using many size-based free lists
  - No concurrent mark or sweep

Specify with gcpolicy, for example “-Xgcpolicy:optavgpause”

Gencon has been working well for WAS environments

(25)

Tuning garbage collection
• Reasonable to just try different policies and measure throughput / pause times
• Other strategy is to turn on verbosegc and interpret resulting data
  - -verbose:gc (or -verbosegc) writes its output to the standard error stream (native_stderr for WAS)
  - Use -Xverbosegclog:filename to direct output elsewhere
    * Best since ‘other’ output does not mess up the verbose GC format so tools can read it.
    * For WAS, I like to put the output in the logs folder, so I use -Xverbosegclog:logs/verbosegc

You must manage these files, WAS does not!
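Because WAS does not rotate or delete verbose GC logs, some housekeeping job has to. The sketch below is one possible approach, not part of WAS: it assumes the logs live in a single directory, that their names start with a known prefix (here `verbosegc`, matching the example option above), and that age by modification time is an acceptable retention rule. The function name and defaults are hypothetical.

```python
import time
from pathlib import Path
from typing import List


def prune_gc_logs(
    log_dir: str,
    prefix: str = "verbosegc",
    max_age_days: float = 14.0,
    dry_run: bool = True,
) -> List[Path]:
    """Find (and optionally delete) verbose GC logs older than max_age_days.

    Only regular files whose names start with `prefix` are considered.
    With dry_run=True nothing is deleted; the stale files are just returned.
    """
    cutoff = time.time() - max_age_days * 86400  # seconds per day
    stale = []
    for path in sorted(Path(log_dir).glob(prefix + "*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            stale.append(path)
            if not dry_run:
                path.unlink()
    return stale
```

Running it first with `dry_run=True` shows what would be removed. A log that the JVM is still writing has a recent modification time, so it is left alone.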

(26)

Recommendations
• Maximum heap size
  - Look at used heap, maximum value
  - Add 25% and set the value for the JVM
  - Max for WAS is 1.5-2.0GB
• Pause times
  - Check whether pause times are too long
  - Maybe choose another GC policy
• Time between garbage collections
  - Look at intervals between garbage collections
  - If they are short (GC runs almost continuously), increase max heap size
• Compaction times
  - Look at compact times
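The maximum heap rule above is simple arithmetic: take the peak used heap, add 25%, and do not exceed the WAS ceiling. A minimal sketch follows; the function name is hypothetical, and the default cap of 1536 MB is an assumption taken from the "1.5 safe" figure for 32-bit JVMs (pass `cap_mb=None` where no such ceiling applies, for example on a 64-bit JVM).

```python
import math
from typing import Optional


def recommended_max_heap_mb(
    peak_used_mb: float,
    headroom: float = 0.25,
    cap_mb: Optional[int] = 1536,
) -> int:
    """Peak used heap plus headroom, rounded up to whole MB and optionally capped."""
    if peak_used_mb <= 0:
        raise ValueError("peak_used_mb must be positive")
    target = math.ceil(peak_used_mb * (1 + headroom))
    if cap_mb is not None:
        target = min(target, cap_mb)
    return target


# Hypothetical peak of 800 MB used heap observed in verbose GC output.
print(recommended_max_heap_mb(800))
```

If the capped value is smaller than peak plus headroom, the workload does not fit comfortably in a 32-bit heap, and that is a sign to look at the application or at a 64-bit JVM rather than at the cap.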

(27)

Java Memory Usage
• Java in particular is adversely affected by paging
  - GC must touch every object in the heap
• Disparate workloads result in more paging than similar workloads
• Separating workloads facilitates performance monitoring and tuning
• Additional memory, if any, is minimal

(28)

Java Memory Usage
• Separate workloads with memory pools, LPAR
• Do NOT allow automatic memory adjustment (system value QPFRADJ)
  - Prefer to move memory with scheduled jobs if required (e.g. nightly batch jobs)
  - If enabled, protect WAS pool with a sufficient minimum value (WRKSHRPOOL F11) or use a private pool
• Determine memory requirements by adding JVM sizes (heap AND native memory)
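The last point is a plain sum over every JVM that will run in the pool, counting both heap and native memory for each. As a rough sketch, with hypothetical JVM names and sizes in MB (the native figures in particular are placeholders that would have to be measured):

```python
from typing import Dict, Tuple


def pool_requirement_mb(jvms: Dict[str, Tuple[int, int]]) -> int:
    """Sum heap and native memory (both in MB) over all JVMs in the pool."""
    return sum(heap_mb + native_mb for heap_mb, native_mb in jvms.values())


# Hypothetical JVMs: name -> (max heap MB, native memory MB).
example_jvms = {
    "server1": (1024, 300),
    "nodeagent": (256, 150),
}

print(pool_requirement_mb(example_jvms))
```

The result is a floor for the WAS memory pool, whether it is set as a private pool or as the minimum for a shared pool.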

(29)

Performance Tools for (IT) Java workloads
• CL commands for Java
• SST jvminfo macro
• Traces or dumps / IBM support assistant (ISA) tools
  - verboseGC / GC and Memory Visualizer
  - Java dump / Thread Analyzer
  - Heap dump / MDD4J
  - System dump
• PEX
  - TPROF
• iDoctor
  - JobWatcher
  - PTDV
• “Normal” i5/OS tools like collection services and i5/OS commands

(30)

V6R1 CL commands
• WRKJVMJOB
• PRTJVMJOB
• GENJVMDMP

Use to work with IT JVMs (not Classic JVMs)

(31)

jvminfo SST macro
• V5R4 PTFs MF42160, MF42128, SI28174, SI28142
• Parms
    <none>                  dumps JVM addresses
    -gcCycles <vm>          last 300 GCs
    -threadsl <vm>          stacks and locks
    -java <vm>              javacore file
    -heap <vm>              heapdump (phd file)
    -system <vm>            core file
    -verbosegc <vm> [off]   turn on/off verbose GC

…

(32)

IBM Support Assistant (ISA)
• The convergence spot for all tools and information from IBM. Based on Eclipse technology and product updater.
  - Support documentation and troubleshooting guides
  - Tools
  - Problem submission into IBM
• Free, download from

(33)

Cross-platform IT JVM tools
• IT JVM is cross-platform, and so are the tools
• Diagnostics are primarily “dump and analyze”
  1. Generate a dump of data
  2. Use an ISA tool to analyze the dump

There are additional ISA tools not covered; these are currently the strategic ones / the ones I find most useful.

(34)

Trace: Verbose GC

Full name: Verbose Garbage Collection
Type of tool: JVM log, mid-level analysis
Overhead: Minimal
Complexity: Moderate
What to use it for: Monitor garbage collector behavior, and check for object leaks.
How to get it: In WAS use checkbox in console, otherwise use -verbose:gc JVM option. Output goes to native_stderr file unless you use -Xverbosegclog:logs/verbosegc. Or turn on/off dynamically with WRKJVMJOB or SST.
Key things to look for:
• Cycles which begin for a reason other than “threshold allocation reached”
• Heap growth over time (live objects or current heap size)
• Long collection time, especially if one cycle starts as soon as the previous one ends

(35)

Tool: GC Visualizer

Full name: IBM Monitoring and Diagnostic Tools for Java™ - Garbage Collection and Memory Visualizer
Type of tool: Parsing tool of a Verbose GC collection
Overhead: Minimal (Verbose GC only)
Complexity: Simple to moderate
What to use it for: Detecting object leaks and monitoring heap usage. Compare different runs.
How to get it: Part of ISA
Key things to look for:
• High level info and recommendation in report, details in line plots, focus on “Used heap (after collection)”
• This tool is supposed to be strategic and supported

Verbose GC
Output: On line plot, change axis for different views and use VGC Data menu for data points; report gives executive overview; data is summary of GCs
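The reason to focus on used heap after collection is that a steady upward trend in that series suggests a leak, while a flat series with noise does not. The sketch below shows the idea with an ordinary least-squares slope over successive samples. It is only an illustration of the trend check, not what the Visualizer itself computes, and the sample values are made up.

```python
from typing import Sequence


def heap_growth_slope(used_after_gc_mb: Sequence[float]) -> float:
    """Least-squares slope of used-heap-after-GC samples, in MB per GC cycle.

    The x values are simply the sample indexes 0, 1, 2, ...
    """
    n = len(used_after_gc_mb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(used_after_gc_mb) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(used_after_gc_mb))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


# Made-up samples: one steady workload, one that grows every cycle.
steady = [500, 505, 498, 502, 500, 501]
growing = [500, 520, 540, 560, 580]

print(heap_growth_slope(steady))
print(heap_growth_slope(growing))
```

A slope near zero over a long run is the healthy case. A clearly positive slope that persists across many cycles is the cue to take heap dumps and look for the objects that keep accumulating.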

(36)

Dump: Javacore

Full name: Javacore, JavaDump, or thread dump
Type of tool: Dump of the current status of the JVM (human-readable)
Overhead: Minimal
Complexity: Moderate
What to use it for: Dump information about a running JVM, including the classpath, basic heap information and thread information (state, locks and stacks).
How to get it: Mechanism included in J9 JVM
• The javacore will be generated (by default) when:
  - JVM terminates unexpectedly
  - Signal sent via “kill -QUIT <pid>”
• User code calls com.ibm.jvm.Dump.JavaDump()
• SST jvminfo -java <task>
• GENJVMDMP *JAVA (V6R1)
Key things to look for:
• Current heap size
• Threads which are “stuck” (stack information)

(37)

Tool: Thread and Monitor Dump Analyzer

Full name: IBM Thread and Monitor Dump Analyzer (TMDA)
Type of tool: Parsing tool of a javacore
Overhead: Minimal (client post processing of a javacore file)
Complexity: Simple
What to use it for: Detecting Java hangs and delays. Can compare dumps.
How to get it: Part of ISA
Key things to look for:
• Java thread state and stacks out of place.
• Deadlock situations that are occurring.
• Thread leaks

Javacore dump

(38)

Dump: HeapDump

Full name: Heapdump file (phd files)
Type of tool: Binary file only readable by parsing programs. Use opts=CLASSIC for human-readable form (e.g. -Xdump:heap:opts=CLASSIC+PHD)
Overhead: Heavy, client overhead very heavy
What to use it for:
• Analyze the file with tools, such as MAT.
• Debug object leaks
How to get it: Mechanism included in J9 JVM
• The heap dump will be generated (by default) when:
  - OutOfMemoryError occurs in the JVM
  - Specify -Xdump:heap for other options, including signal option with “kill -QUIT <pid>”
• User code calls com.ibm.jvm.Dump.HeapDump()
• SST jvminfo -heap <task>
• GENJVMDMP *HEAP (V6R1)
• wsadmin for WAS
Key things to look for:
• Continuous growth of objects

(39)

Heap analysis tools

Heap/memory analysis is very hard
Tools sometimes help
Tools are resource intensive (memory)
• MDD4J intended for relatively simple, first-pass analysis, target casual users
  - Will remain beta, no enhancements
• MAT (Memory Analyzer Tool) for more complex

(40)

ISA Tools: Java strategy
• Health Center
  - Newer, may have promise

IBM’s JVM team is converging on a family of tools - “IBM Monitoring and Diagnostic Tools for Java”. These have the best chance of being strategic and supported.

(41)

“Normal” IBM i performance tools / interfaces
• WRKACTJOB
• WRKSYSSTS
• WRKSYSACT
• WRKDSKSTS
• Collection Services
• Management Central Monitors
• Performance Data Investigator
• iDoctor (JW, PA, PTDV)

(42)

IBM i Performance Tools and IT JVM

Tool                                            Works with IT JVM?
PEX TProf                                       Yes, with V5R4 PTFs, but no stacks
PEX Java events (object creates, entry/exit)    No
SST macros                                      Yes, with V5R4 PTFs
DMPJVM, ANZJVM                                  No
iDoctor JobWatcher                              Yes, in V6R1. V5R4 limited to jobs/threads only
WRKJOB command (stacks)                         Yes, with V5R4 PTFs
iDoctor HeapAnalysis                            No

(43)

PEX TPROF
• Identifies users of CPU by sampling
  - ADDPEXDFN DFN(TPROF5) TYPE(*PROFILE) PRFTYPE(*JOB) JOB(*ALL) TASK(*ALL) MAXSTG(100000) INTERVAL(5) TEXT('TProf - 5 ms sampling interval')
  - STRPEX / ENDPEX
    * Trace for 500K events, or as long as possible
  - Use PRTPEXRPT or PTDV for analysis
    * PRTPEXRPT MBR(TEST) LIB(MYLIB) TYPE(*PROFILE) PROFILEOPT(*SAMPLECOUNT *PROCEDURE) ORDER(ASCENDING)
    * Also leave out PROFILEOPT parameter to use default *PROGRAM value instead of *PROCEDURE
• Measure Java GC
  - Target 15% or less
• Identify application problems
  - Operations either run frequently or are processor intensive

Can also use PEX Analyzer
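The 15% GC target can be checked from profile sample counts: divide the samples attributed to garbage collection by the total samples for the JVM job. The sketch below assumes those two counts have already been read off a TPROF report by hand; the function names and the example numbers are hypothetical, and nothing here parses PEX output.

```python
def gc_cpu_share(gc_samples: int, total_samples: int) -> float:
    """Fraction of profile samples spent in garbage collection."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    if not 0 <= gc_samples <= total_samples:
        raise ValueError("gc_samples must be between 0 and total_samples")
    return gc_samples / total_samples


def gc_within_target(gc_samples: int, total_samples: int, target: float = 0.15) -> bool:
    """True when GC uses no more than the target share of CPU samples."""
    return gc_cpu_share(gc_samples, total_samples) <= target


# Hypothetical counts read from a profile report.
print(gc_cpu_share(120, 1000), gc_within_target(120, 1000))
```

When the share is over the target, the heap size and GC policy are the first things to revisit.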

(44)

iDoctor
• 4 components
  - Job Watcher
    * Collection Services Investigator
    * Disk Watcher
  - PEX analyzer
  - Heap Analysis
  - PTDV (Performance Trace Data Visualizer)
• Client and server components
• Command and GUI interfaces

[Legend, color coding not recoverable: fee with free 45-day trial; free]

(45)
(46)
(47)
(48)

IT JVM Tools

(Columns: Data/Tool; What it is used for; Complexity; Cost)

IBM i tools such as PEX, Collection Services, Performance Investigator (Moderate to Complex; Free)
• System resource usage such as CPU, memory pools, disk and IO

iDoctor (Moderate to Complex; Free & fee)
• Monitor heap / GC
• Waits, run signature, CPU users
• … and more

SST macros, V6R1 commands (Moderate; Free)
• Display / dump various information

Verbose GC (Moderate; Free)
• At every GC cycle, information is logged about the Java heap and GC functions.
• Useful to determine if your application has memory leaks, monitor your current heap size, frequency and length of GC cycles, etc

IBM Support Assistant (Simple; Free)
• IBM portal for solving both functional and performance issues. Work in progress as tools are added.
• Provides searching, problem reporting, updating tools and managing dumps.

GC and Memory Visualizer (Moderate; Free)
• Monitor heap size and GC behavior

[Legend: ISA tools are marked with a distinct background on the original slide]

(49)

IT JVM Tools (cont)

(Columns: Data/Tool; What it is used for; Complexity; Cost)

javacore file (Moderate; Free)
• Also referred to as a JavaDump.
• The Javacore shows information about threads within the JVM (state, stack, locking)

Thread and Monitor Dump Analyzer (Simple; Free)
• javacore parsing tool
• Compare thread stacks between dumps
• Monitor (Java lock) analysis

ThreadAnalyzer (Simple; Free)
• javacore parsing tool
• Analysis to get very high level view of work via grouping similar thread stacks
• Monitor (Java lock) analysis

Heapdump file (Complex; Free)
• Binary dump file with the contents of the Java heap.
• Feed into tools to parse the output.

MDD4J (Complex; Free)
• Analyze heap dumps
• Pinpoint object leaks and who is rooting the object

MAT (Complex; Free)
• Analyze heap dumps
• Various analysis and report options

core file (Complex; Free)
• Generated by JVM when serious error occurs
• Let IBM do analysis

(50)

• Monitoring GC: verboseGC and Garbage Collection and Memory Visualizer, WRKJVMJOB
• Stack dumps: javacore and ThreadAnalyzer or Thread and Monitor Dump Analyzer, WRKJVMJOB
• Heap dumps: heapdump and MDD4J or Memory Analyzer
• CPU usage: Health Monitor, PEX TPROF

(51)

Review
• Java and memory
• IT JVM GC behavior and tuning
• IT JVM Performance tools
  - V6R1 CL commands
  - jvminfo SST macro
  - Traces or dumps / IBM support assistant (ISA) tools
    * verboseGC / GC and Memory Visualizer
    * Java dump / Thread & Monitor Dump Analyzer
    * Heap dump / MDD4J & MAT
    * System dump
  - PEX
  - iDoctor
  - “Normal” i5/OS tools like collection services, i5/OS commands, Systems Director

(52)

Performance Roadmap: IT JVM
• High CPU
  - WRKACTJOB
  - In WAS jobs: TPROF, PTDV
  - In DB jobs: DBMon
• Other Problems
  - Look at GC health: SST/WRKJVMJOB, Verbose GC
    * Tune GC (Policies, heap sizes)
    * Leak (HeapDump, MAT)
  - Javacore file (Thread & Monitor Dump Analyzer)
  - Performance Monitoring Infrastructure (PMI)
  - iDoctor JobWatcher

(53)

If you only have 5 minutes to collect data (e.g. need to shut down and recover)
• Dump GcCycles with SST or WRKJVMJOB (V6R1) to printer
  - Not needed if verbosegc is on
• Check CPU, if high run a TPROF
• Create javacore file (or several)
• Run a JobWatcher trace
• Grab WAS logs

(54)

HTTP server Performance


(55)

General Performance Tips
• Minimize the number of requests per page
  - Each resource reference on a page is a separate request
• Flatten the directory structure and use short paths
• Configure FRCA caching of static content
• Configure memory caching of SSL static content
• Configure Server Side Includes (SSI) only in the scope of where they are used
• Do not configure DNS client hostname lookups for logging
• Do not use .htaccess files
  - Set AllowOverride directive to None
• Build CGI programs in "named" activation group
• Use StartCGI directive to "pre-start" CGI jobs

(56)

Tune ThreadsPerChild Directive
• Controls the number of concurrent requests that the server can process
  - Default is 40
• More is not necessarily going to be better
  - Too many can cause processor thrashing
• It is best to tune this in a controlled environment
  - With some simulation tool driving transactions
• Or, change it and let it run for a day
  - Use access log reports to analyze your traffic
  - And fine tune over a period of time

(57)

Use Persistent Connections
• The System i has specific code that allows for effective use of persistent connections (Keepalive).
  - Asynchronous I/O support avoids having a thread tied up with a single request.
• Allow persistent connections
  - Use a single "socket" connection for multiple requests from a single client
• Time to wait between requests
  - Time the server keeps the "socket" connection open waiting for another request
• Maximum requests per connection
  - Number of requests allowed before the server closes the

(58)

JDBC access of DB2/400


(59)

JDBC drivers
• Native JDBC driver
  - Best performance when database is local (same partition) to client
• Toolbox JDBC driver
  - Best performance when database is remote from client

(60)

Toolbox JDBC Driver

Native JDBC driver

(61)

Database Performance … is a huge topic

Performance analysis starting from the backend DB is very effective
• DB tuning can result in large gains
• Workload elimination also possible
• A good way to understand the application at a

(62)

Optimizing Performance with Caching


(63)

Cache strategy
• Cache as close to the network as possible
• Cache configuration requires application and business knowledge
• Benefit can be large, but effort is required (some caches more than others). Weigh costs and potential benefits; target those with the most potential.

(64)

Caching layers
• Cache as close to the edge as possible
  - e.g. Edge Server cache
  - HTTP server
    * Use FRCA for public content
    * Local caches for secure content
    * ESI caching in WAS HTTP plugin
  - WAS
    * Dynamic cache
  - Application

(65)

Performance Tuning Parameters for Java/WAS

(66)

Tuning Areas
• System
• HTTP
• WAS
• Database

Tune when you have major changes (e.g. new applications, upgrades, or more users), and when you have performance concerns

(67)

i5/OS
• System values
  - QPFRADJ, QPRCMLTTSK, QQRYDEGREE, QMAXACT, etc.
• Pools
  - Separate different workloads
    * Probably database
    * Especially Java
  - Memory, max active

(68)

HTTP server
• Number of threads
• Compression
  - Useful for slow networks, has CPU cost
• Cache settings
  - FRCA, local, etc.

(69)

WAS
• Queuing network
  - Web container threads
  - ORB threads
  - Data source connection pools
• Session persistence
• Performance tools
  - Trace, PMI, ARM
• Cache settings
• Class reloading
• Security
• Isolation levels
• Transaction boundaries
• Topology

Java
• IT (vs. Classic)
• 32 vs. 64 bit
• Policy
• Maximum heap size
• Minimum heap size

Usually only the queuing network, Java, and perhaps topology need to be adjusted, otherwise defaults are good nearly universally. Of course you can change other values, and you have a good probability of causing problems if you do!

(70)

Summary
• Tune the request flow queues
• Ensure there is enough (dedicated) memory for Java
  - Separate memory pool
  - QPFRADJ=0 or minimum on memory pool
• Tune Java GC
• Leave everything else alone
  - Most defaults are best for almost all applications and environments
  - Don't adjust anything you do not fully understand
  - Identify problem areas before adjusting anything

(71)

WebSphere Application Server on i5/OS
Performance Monitoring and Tuning


(72)

Primary Metrics to Monitor
• Paging rates
  - WRKSYSSTS, Collection Services
• CPU consumption
  - WRKACTJOB, WRKSYSSTS, Collection Services
• Java garbage collection (GC) health
  - DMPJVM (Classic JVM), SST macros, verbosegc, WRKJVMJOB (IT JVM)
• Database server jobs
  - DSPACTPJ
• WAS pools
  - Tivoli Performance Viewer (TPV)

(73)

Monitoring strategy
• Always run
  - Collection Services
  - Job Watcher Monitors
• Run the other tools
  - When problems arise
  - Occasionally to know what’s “normal”

Monitor all the time to understand trends and have some data when problems occur

(74)

Roadmap
• High CPU
  - WRKACTJOB
  - In WAS jobs: TPROF, PTDV
  - In DB jobs: DBMon
• Paging in WAS jobs (with IT JVM this becomes part of ‘other’)
  - Look at GC health (SST, JobWatcher, verbosegc, DSPJVMJOB)
  - Look for heap growth (leaks)
• Neither of the above, but slow responses
  - DMPJVM (Classic JVM), javacore dump (IT JVM)
  - TPV
  - iDoctor JobWatcher
  - Collection Services, WRKDSKSTS, etc

(75)

If you only have 5 minutes to collect data (i.e. need to shut down and recover)
• Dump GcCycles to printer (SST or PRTJVMJOB for IT in V6R1)
  - Not necessary if verbosegc is on
• Check CPU, if high run a TPROF
• Classic JVM: Check paging, if NOT high do a DMPJVM
• IT JVM: Get a javacore via ‘kill -QUIT’ or SST ‘jvminfo -java’ or GENJVMDMP
  - Ideally get several dumps
• Run a JobWatcher trace
• Save WebSphere Logs

(76)

Loadtests
• CPU
  - TPROF
• GC / Heap Size
  - JobWatcher (IT V6R1+)
  - Verbose GC / GC and Memory Visualizer
• Pools
  - Web Threads, Connections: PMI / TPV
• Database Tuning
  - DBMon
