COMPUTER MEASUREMENT GROUP INDIA. Large Centralized Tax Processing Solution Benchmark and Performance Analysis

(1)

Large Centralized Tax Processing Solution

Benchmark and Performance Analysis

www.cmgindia.org

COMPUTER MEASUREMENT GROUP

INDIA

(2)

Client Server

Architecture

Replace With One Single Centralized System For The Country

Background

(Existing Architecture)

Zonal Centre A

(3)

Zonal Centre B

Web server Application Server Database Server

WAN

Background

(Proposed Architecture)

Benchmark to Determine if This is Technically Feasible

Zonal Centre A

(4)

RFP Objectives & Ground Rules

• Application Benchmarking a Pre-Condition

• To Evaluate Scalability Targets and Recommend Hardware Configuration

for the Application

• Deterministic Think Time Fixed at 5 Seconds

• Only Configuration Parameter Tuning of Web, App, DB Servers Permitted

• No Application Code Optimization Permitted

• 4000 Concurrent Users (To test in steps of 1000, 2000, 4000 users)

• Average Server Side Response Time/Screen < 1 sec

(5)

Main Transaction to be Benchmarked

Average Response Time Per Screen < 1 sec

Think Time for entire transaction = 5 secs

“Observed Cycle Time” was 2 minutes!!!

Login

Step 1:

Main Screen

Steps

2 and 3

:

Compute

Steps 4

and 5

:

Refund

12.5%

Steps 6,7,

and 8

:

Print Result

Logout

(6)

Test Configuration (1000 Users)

16 CPU

8 CPU

2 CPU

Database Servers Application Servers Web Servers

(7)

Test Result & Outcome

• With 5 sec think time response time target surpassed at 500 users

• Tuning and adding resources were of little help

• Raised the issue with customer

• Technical Committee revised think time to 40 Secs and left to the

vendors bidding for the RFP to do proper capacity planning

(8)

Test Configuration (2000 Users)

8 CPU

32 CPU

8 CPU

2 CPU

(9)

Throughput

32 16 CPU Used (DB) 1 M 1 M Returns processed

Utilization

Response Times (sec)

< 10% ~15% Web CPU % ~40% ~40% Apps CPU % ~50% ~45% DB CPU % User Exit Print Result Refund Compute Main Screen 1.1 0.4 0.3 0.3 0.4 0.5 1.1 0.3 0.2 0.1 0.3 0.4 User Login 05:45 11:10 Completion Time (Hrs) 48 25 Business TPS 2000 Users 1000 Users Test Type  1.2 0.3 0.2 0.1 0.4 0.5 Avg 95th pct 1.0 0.4 0.2 0.2 0.3 0.4

Test Results: 1000 & 2000 Users

Response Time

Meets RFP

Criteria

Throughput scales

almost linearly – no

apparent bottleneck

(10)

Database Tuning

• Cached highly used sequences

• Increased transaction slots at block level (initrans) to manage high

concurrency

• Pinned packages/Functions in memory to avoid reload

• Raised the priority of Oracle Processes

• Introduced session cursor caching to reduce latch requirement

• Created reverse key indexes for surrogated key columns

1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 …. …. …. …. 2001 2002 2003 2004 Normal BTree Different instances inserting 2005, 2006, 2007, 2008 Contention 1001 1002 1003 1004 …. …. …. …. 5001 5002 …. …. …. …. 6001 6002 Reverse BTree

(11)

4000 User Test

• Results on App/DB configuration used for 2000 users

• Results on higher number of CPUs

• Scalability limiting analysis

• Further analysis of response time

8 CPU

32 CPU

Base Configuration

2 CPU

(12)

4000 User Test Results

32 1 M < 10% ~40% ~50% 05:45 48 2000 Users 1.0 0.4 0.2 0.2 0.3 0.4 32 CPU Used (DB) 1 M Returns processed ~5% Web CPU % ~45% Apps CPU % ~50% DB CPU % User Exit Print Result Refund Compute Main Screen 1.2 29.3 4.5 17.1 2.1 1.6 User Login 06:00 Completion Time (Hrs) 47 Business TPS 4000 Users Test Type  1.1 24.0 2.1 13.7 0.7 0.6

Throughput

Response Time

Avg 95

th

pct

Utilization

(13)

4000 User Test: Impact of Adding DB CPUs

No. of Users 2000 4000 4000 4000

No. of CPU (DB) 32 32 48 56

No. of App Server Business Throughput 4 48 tps 4 47 tps 5 47 tps 5 47 tps

Response Times (seconds)

User Login 0.5 2 Main Screen 0.4 2 Same as for 32 CPUs 32 CPUs Same as for 32 CPUs Compute _{0. 3} ₁₇ Refund _{0. 3} ₅ Print Result _0.4 ₂₉ User Exit ₁ ₁

Utilization

DB CPU % ~50% ~50% ~45% ~40% Apps CPU % ~40% ~45% ~40% ~40%

(14)

4000 User Test:

Network, Disk, Memory on DB server

Nothing to be alarmed

about.

No Disk, Memory,

Network bottlenecks

Database does not

scale with CPUs

Are there any bad

SQLs, Contention

problems, Wait

Events?

< 192Kbps on 1 Gbps lan

< 45%

Constant at 45%

(15)

SQL Processing Contribution

SQL ordered by Gets for DB: ABC Instance: ABC

CPU Elapsd

Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value

--- --- --- --- --- --- --- 454,871,006 87,688 5,187.4 46.8 2778.07 20102.85 3184176672 Module: f90runm@server1 (TNS V1-V3)

SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE _CD,BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_Y R,PAN,DT_FILED,NAME,RET_INC FROM AA_RETURN WHERE (SEQ_NO IN (SEL ECT a.SEQ_NO FROM aa_return a WHERE A.RANGE_CD = :1 AND A.AO_NO = :2 AND A.AO_TYP = :3 AND A.area_cd = :4)) and (AST_YR=:5) and

SQL ordered by Reads for DB: ABC Instance: ABC

CPU Elapsd

Physical Reads Executions Reads per Exec %Total Time (s) Time (s) Hash Value --- --- --- --- --- --- --- 14,416,210 87,688 164.4 83.6 2778.07 20102.85 3184176672 Module: f90runm@server1 (TNS V1-V3)



Extract of Oracle Statspack report for 4000 user test

High

physical and

logical

reads

(16)

SQL Processing Contribution

SQL ordered by Gets for DB: ABC Instance: ABC

CPU Elapsd

Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value

--- --- --- --- --- --- --- 454,871,006 87,688 5,187.4 46.8 2778.07 20102.85 3184176672 Module: f90runm@server1 (TNS V1-V3)

SQL ordered by Reads for DB: ABC Instance: ABC

CPU Elapsd

Physical Reads Executions Reads per Exec %Total Time (s) Time (s) Hash Value --- --- --- --- --- --- --- 14,416,210 87,688 164.4 83.6 2778.07 20102.85 3184176672 Module: f90runm@server1 (TNS V1-V3)



Extract of Oracle Statspack report for 4000 user test

High

physical and

logical

reads

Avg. Response time/Execution= 20102.85/87688 =0.23 sec

Avg. Rsp Time for bus txn = 40.5 sec

0.23 sec due to this SQL

Is it worth tuning?

Avg.

Main Screen

0.7

Compute

13.7

Refund Details

2.1

Print Result

24.0

(17)

What is a Latch in Oracle DB

•

Simple, low-level serialization mechanism to protect shared data structures in the

System Global Area (SGA)

•

Provide only exclusive access to protected data structures needed briefly

Latches are

applied in

these stages

Syntax

Check

Semantic

Analysis

Was the statement

already parsed by

another session?

Hard

Parse

Soft

Parse

No Yes

(18)

Database Wait Event Analysis

STATSPACK report for

Snap Id Snap Time Sessions Curs/Sess Comment

--- --- --- --- --- Begin Snap: 2 31-Oct-07 18:36:38 4,009 118.7

End Snap: 3 31-Oct-07 19:06:39 4,006 120.7 Elapsed: 30.02 (mins)

Top 5 Timed Events

~~~~~~~~~~~~~~~~~~~~~ % Total

Event Waits Time (s) Ela Time --- --- --- --- latch free 1,539,663 3,587,743 98.96 CPU time 28,487 .79 db file sequential read 17,221,454 7,500 .21 log file sync 46,102 773 .02 enqueue 5,299 680 .02 ---

Wait Events for DB: ABC Instance: ABC

Avg.

Total Wait wait Waits Event Waits Timeouts Time (s) (ms) /txn --- --- --- --- --- --- latch free 1,539,663 0 3,587,743 2330 5.1 db file sequential read 17,221,454 0 7,500 0 60.6 log file sync 246,102 0 773 3 0.9 enqueue 5,299 8 680 128 0.0

Excessively

high Latch

contention

(99% of total

wait time)



Extract of Oracle Statspack report for 4000 user test: 30 min

Latch Free wait during 2000 user test was 8.31% and Avg. wait time was 2 ms

Average Wait / DB txn

= 2.3 * 5.1 = 11.73 sec

(19)

157.70 DB txns/sec

for 47 business

txns/sec



157.7/47 =

3.36 db txn/bus txn

Latch Wait’s Contribution to Response Time

Snap Id Snap Time Sessions Curs/Sess Comment

--- --- --- --- --- Begin Snap: 2 31-Oct-07 18:36:38 4,009 118.7

End Snap: 3 31-Oct-07 19:06:39 4,006 120.7 Elapsed: 30.02 (mins)

Load Profile

~~~~~~~~~~~~ Per Second Per Transaction --- --- Redo size: 1,537,284.62 9,748.08 Logical reads: 539,210.76 3,419.19 Block changes: 8,615.05 54.63 Physical reads: 9,570.96 60.69 Physical writes: 982.97 6.23 User calls: 22,725.02 144.10 Parses: 14,450.63 91.63 Hard parses: 0.00 0.00 Sorts: 3,699.62 23.46 Logons: 0.00 0.00 Executes: 38,242.79 242.50 Transactions: 157.70

Contribution of Latch Wait Time to Response Time

= Avg Wait Time/DB txn * No. of DB txn/Business Txn

= 11.73 * 3.36

= 39.41 seconds

Average Response Time for 4000 Users

= 0.7 + 13.7 + 2.1 + 24.0

= 40.5 seconds

X*

Avg.

Main Screen

0.7

Compute

13.7

Refund Details

2.1

Print Result

24.0

(20)

Parsing Analysis

SQL ordered by Parse Calls for DB: ABC Instance: ABC

% Total

Parse Calls Executions Parses Hash Value --- --- --- --- 14,285,993 14,286,265 54.89 2588670467 Module: f90runm@server1 (TNS V1-V3)

declare p varchar2(32767); begin p := GLOBAL_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000); :v4 := substr(p,12001,4000); :v5 :=

substr(p,16001,4000); :v6 := substr(p,20001,4000);

2,939,514 2,939,618 11.29 2294365478 Module: f90runm@server1 (TNS V1-V3)

declare p varchar2(32767); begin p := MASTER_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000); :v4 := substr(p,12001,4000); :v5 :=

substr(p,16001,4000); :v6 := substr(p,20001,4000);

349,887 2,988,725 1.34 3106935379 Module: f90runm@server1 (TNS V1-V3)

SELECT ABC_NUM From EMP_ABC WHERE ORAUSER = USER



Extract of Oracle Statspack report for 4000 user test

Nearly 66% of total parse calls are for two functions :

GLOBAL_POLICY & MASTER_POLICY

(21)

declare p varchar2(32767); begin p :=

GLOBAL_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000);

call count cpu elapsed disk query current rows --- --- --- --- --- --- --- --- Parse 207 0.01 0.02 0 0 0 0 Execute 207 0.03 0.04 0 420 0 207 Fetch 0 0.00 0.00 0 0 0 0 --- --- --- --- --- --- --- --- total 414 0.04 0.07 0 420 0 207

declare p varchar2(32767); begin p :=

MASTER_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000); call count cpu elapsed disk query current rows --- --- --- --- --- --- --- --- Parse 40 0.01 0.00 0 0 0 0 Execute 40 0.01 0.01 0 54 0 40 Fetch 0 0.00 0.00 0 0 0 0 --- --- --- --- --- --- --- --- total 80 0.02 0.02 0 54 0 40

Function Call Analysis



Extract of Oracle TKPROF report of test

247 function calls per business

transaction

= 247 * 47 = 11,609 per sec

= 41.8 million per hour !!!!

X*

Number of calls cannot be reduced by

parameter tuning or adding hardware.

Only way out is to change the

application and hence reduce number

of parses / latches / sec.

(22)

2000 to 4000 User Analysis:

(40 Secs Think Time)

2000 user

4000 user

Average Rsp Time of

Entire Business Txn

1.1 sec

40.5 sec

Business Throughput

48/sec

47/sec

Average Latch Wait Time

2 ms

2,330 ms

N = 2000, R = 1.1, Z = 40



D

_max

= 20 ms

N = 4000, Z = 40, D

_max

= 20 ms



R = 4000 * 20ms – 40

= 40 sec

If business throughput is not changing why does response time and wait time

increase so drastically by doubling the number of users ?

Queueing Theory : Closed System under saturation

R = N * D

_max

– Z

where R = average response time of business txn

N = number of users

Z = think time

D

_max

= demand at bottleneck resource

= visit count at bottleneck resource/business txn x

average service time at bottleneck resource

(23)

Application

Tuning

Infrastructure

Workload

Think Time

in Transactions

Time Spent in

Components

Resource

Consumption

Config parameters well

tuned

Adding more CPUs in

one box does not help

Determinants of Performance

Is 40 sec

Think Time

meaningful?

No changes allowed

during benchmark.

Main bottlenecks

identified

(24)

Think Time Analysis

• Assuming :

No. of hours/day = 5

Throughput required to process 50 Million returns

= 35 TXNs/Sec

• Applying Little’s Law :

N = (R+Z) * X

4000 4 ? 35

• Avg. Think Time = (4000/35) – 4 = 110 Sec

With 40 sec think time :

50 Million returns can be processed in ~ 1.5 months

Though the target is 4 months

(25)

Test Results: 100 Secs Think Time

½ M 1 M 1 M 1 M Returns processed ~5% ~5% ~10% ~15% Web CPU % ~35% ~45% ~45% ~40% Apps CPU % ~45% ~50% ~50% ~45% DB CPU % 1.0 1 1.1 1.2 User Exit 0.4 29 0.4 0.3 Print Result 0.2 5 0.3 0.2 Refund 0.2 17 0.3 0.1 Compute 0.4 2 0.4 0.4 Main Screen 0.5 2 0.5 0.5 User Login 3:35 6:00 5:45 11:10 Completion Time (Hrs) 38 47 48 25 Business TPS 4000 Users (Think Time =100 S) 4000 Users 2000 Users 1000 Users Test Type 

Throughput

Utilization

(26)

Conclusions

• Benchmark has proven that application scales linearly till 2000 users with the

specified think time

• Recommendations given for further scalability

• Analysis of the results and workload has proved that the workload defined for

the benchmark is unrealistic

• System can meet the target of processing 50 million returns in 4 months, with

reasonable think time and workload

IMPACT

The RFP was scrapped and new requirements were laid down with

more realistic workload and flexibility

(27)

Learnings

• Gauge the problem - Is it feasible to achieve target with resolution?

• Whenever doing Benchmark, work on Performance Engineering Model

to derive whether it will work or not

• Evaluate contribution of Wait Events to response time while doing

Database Analysis

• Every Application has certain scalability limits, no matter what

hardware you deploy upon

“Adding hardware is not the solution for all the problems”

• Choosing realistic workload is imperative to ensure its achievability

and applicability of the results

(28)

COMPUTER MEASUREMENT GROUP INDIA. Large Centralized Tax Processing Solution Benchmark and Performance Analysis

Large Centralized Tax Processing Solution

Web server Application Server Database Server

Background

• Only Configuration Parameter Tuning of Web, App, DB Servers Permitted

Login

• Tuning and adding resources were of little help

Throughput

Criteria

• Introduced session cursor caching to reduce latch requirement

1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 …. …. …. …. 2001 2002 2003 2004 Normal B*Tree Different instances inserting 2005, 2006, 2007, 2008 Contention 1001 1002 1003 1004 …. …. …. …. 5001 5002 …. …. …. …. 6001 6002 Reverse B*Tree

4000 User Test

Throughput

No. of Users 2000 4000 4000 4000

Response Times (seconds)

Are there any bad

SQL ordered by Gets for DB: ABC Instance: ABC

SQL ordered by Reads for DB: ABC Instance: ABC

Extract of Oracle Statspack report for 4000 user test

SQL ordered by Gets for DB: ABC Instance: ABC

SQL ordered by Reads for DB: ABC Instance: ABC

Extract of Oracle Statspack report for 4000 user test

What is a Latch in Oracle DB

Was the statement

STATSPACK report for

Excessively

txns/sec

Contribution of Latch Wait Time to Response Time

SQL ordered by Parse Calls for DB: ABC Instance: ABC

Extract of Oracle Statspack report for 4000 user test

Function Call Analysis

of parses / latches / sec.

increase so drastically by doubling the number of users ?

Tuning

Main bottlenecks

Test Results: 100 Secs Think Time

the benchmark is unrealistic

Learnings

hardware you deploy upon

1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 …. …. …. …. 2001 2002 2003 2004 Normal BTree Different instances inserting 2005, 2006, 2007, 2008 Contention 1001 1002 1003 1004 …. …. …. …. 5001 5002 …. …. …. …. 6001 6002 Reverse BTree