Benchmark and Performance Analysis. of a. Large Centralized Tax Processing Solution

(1)

ROSATEA – Boston, 10-11 July 2007

Benchmark and Performance Analysis

of a

Large Centralized Tax Processing

Solution

Vijay Jain

Venkata Sai Jayanti Murty

2 ROSATEA – Boston, 10-11 July 2007

Oracle Forms 4.5

Oracle Reports 2.5

Oracle 8i database

RCC City A

RCC City B

RCC City C

RCC – Regional Computing Centre

Total 36 RCC

Client server

architecture

Each RCC works in isolation

Oracle Forms 4.5 becoming obsolete

Customer needed consolidation as business strategy

Replace with one single centralized

system for the country

(2)

3 ROSATEA – Boston, 10-11 July 2007 Oracle HTTP

server Oracle 10g AS Oracle 9i NCC -National Computing Centre

RCC City C RCC City A RCC City B LBS’s LBS’s Forms & Report server

Background

Benchmark to determine if this is technically feasible.

(Proposed Architecture)

Profile

• 52 million returns to be processed in 4 months

• 10000 assessing officers across 750 locations

• Floated a RFP for System Integration

• Application Benchmarking a pre-condition in RFP

• Benchmark to be done on one business transaction “Processing of

Income-Tax Returns”

• TCSL led consortium responded to the RFP

• Carried out benchmark in HP Labs, China

(3)

Objectives of the Benchmark

• To verify server side performance

• To evaluate scalability targets

• Apply modeling techniques to measured parameters

• Recommend hardware configuration for the application.

Rules of the Benchmark

• Server Partition, Disk and Volume size frozen by customer

• Application/Database/Load Runner scripts frozen

• Think Time fixed at 3 seconds

• Deterministic Think Time

• No reorganization of tablespaces structure and database

permitted

• No application optimization permitted

• Configuration parameter tuning of web, app, DB servers

permitted

• Tests to be executed: 1000, 2000 users for 1 million

transactions and 4000 users for 1, 2, 4 million transactions

(4)

Logical Diagram

Load

Controller Load Load Controller Controller Load Generator #1 Load Load Generator Generator #1 #1 Load Generator #3 Load Load Generator Generator #3 #3 Load Generator #2 Load Load Generator Generator #2 #2 Load Generator #4 Load Load Generator Generator #4 #4 Load Generator #10 Load Load Generator Generator #10 #10 Load Balancer Load Balancer Load Balancer Web Server #1 Web Server Web Server #1 #1 Web Server#3 Web Server Web Server #3 #3 Web Server #2 Web Server Web Server #2 #2 Web Server#4 Web Server Web Server #4 #4 Web Server#6 Web Server Web Server #6 #6 Application Server #1 Application Application Server Server #1 #1 Application Server #3 Application Application Server Server #3 #3 Application Server #2 Application Application Server Server #2 #2 Application Server #4 Application Application Server Server #4 #4 Database Server #2 (NCC) Database Database Server Server #2 (NCC) #2 (NCC) Database Server #1 (RCC) Database Database Server Server #1 (RCC) #1 (RCC) Mercury LoadRunner 8.0 Xeon 2 CPU/4GB DL480 Win2K rp8420 8CPU/32GB partition PA8900 1.1GHz Superdome 32CPU/128GB partition PA8800 1Ghz Oracle 9.2.0.4 Superdome 32CPU/128GB partition PA8800 1Ghz Oracle 9.2.0.4 Radware WSD 3.0 Oracle 10gAS 9.0.4.0 1 8 1 4 146 GB 10K rpm XP12000 32GB Cache App & DB server partitioning as per RFP 8 ROSATEA – Boston, 10-11 July 2007

(T

0

+) T

1

+ T

2

+ T

3

+ T

4

+ T

5

+ T

6

+ T

7

+ T

8

+ T

9

+ I = 3 sec

R

_i

< 1 sec for i = 0,..,9

Think Time Response Time Inter-iteration Time Login

Main Screen Adjustment/Demand

Save Comp-uted Records

Open Refund Details

Open Print Results Save Refund Details Close Print Results T0 R1 R₂ R3 R4 R₀ R₅ R6 R₇ R8 R9 T1 _T₂ T₃ T₄ T₅ T6 T7 T8 T₉ I Logout R₁ 12.5%

Observed cycle time of 120 sec in reality

Business Transaction Benchmarked

(5)

Test Configuration

16 CPU

32GB

RCC

8 CPU

32 GB

8 CPU

32 GB

2 CPU

4 GB

2 CPU

4 GB

Database Server Application Server Web Server

1000 Users

1 Million Returns

16 CPU

32GB

NCC

Initial Glitches

• With 3 sec think time response time target surpassed

at 500 users itself

• Heavy buffer busy, row lock waits

• Tuning and adding resources were of little help

• Raised the issue with customer

(6)

(T

₀

+) T

₁

+ T

₂

+ T

₃

+ T

₄

+ T

₅

+ T

₆

+ T

₇

+ T

₈

+ T

₉

+ I = 3 sec

Ri < 1 sec for i = 0,..,9 Think Time Response Time Inter-iteration Time Login

Main Screen Adjustment/Demand

Save Comp-uted Records

Open Refund Details

Open Print Results Save Refund Details Close Print Results T0 R₁ R₂ R₃ R₄ R0 R₅ R6 R₇ R8 R9 T1 _T₂ T₃ T₄ T₅ T₆ T7 T8 T9 I Logout R₁ 12.5%

39 sec

Observed cycle time of 120 sec in reality

Business Transaction Benchmarked

Test Configuration

8 CPU

32 GB

8 CPU

32 GB

2 CPU

4 GB

2 CPU

4 GB

32 CPU

128 GB

RCC

8 CPU

32 GB

8 CPU

32 GB

2 CPU

4 GB

2 CPU

4 GB

2000 Users 1 Million Returns

32 CPU

128 GB

(7)

Test Results: 1000 & 2000 users

Throughput 32 16 CPU Used (RCC) 1 M 1 M Returns processed Utilization

Response Times sec

< 10% ~15% Web CPU % ~40% ~40% Apps CPU % ~50% ~45% DB CPU % User Exit Print Result Refund Details Compute Main Screen 1.1 0.4 0.3 0.3 0.4 0.5 1.1 0.3 0.2 0.1 0.3 0.4 User Login 05:43 11:09 Completion Time (Hrs) 48 25 Business TPS 2000/ 1 million 1000/ 1million Test Type à 1.2 0.3 0.2 0.1 0.4 0.5 Avg 95th pct 1.0 0.4 0.2 0.2 0.3 0.4 Meets RFP Criteria Throughput scales almost linearly – no apparent bottleneck 15 ROSATEA – Boston, 10-11 July 2007

4000 user test

Results on App/DB configuration used for 2000 users

Results on higher number of CPUs

Scalability limiting analysis

Further analysis of response time

2 CPU

4 GB

2 CPU

4 GB

2 CPU

4 GB

2 CPU

4 GB

8 CPU

32 GB

8 CPU

32 GB

8 CPU

32 GB

8 CPU

32 GB

32 CPU

128 GB

RCC

32 CPU

128 GB

NCC

2 CPU

4 GB

2 CPU

4 GB

Base Configuration

(8)

4000 user results: 32CPU App, 32 CPU RCC

32 1 M < 10% ~40% ~50% 05:43 48 2000/ 1 million 1.0 0.4 0.2 0.2 0.3 0.4 32 CPU Used (RCC) 1 M Returns processed ~5% Web CPU % ~45% Apps CPU % ~50% DB CPU % User Exit Print Result Refund Details Compute Main Screen 1.2 29.3 4.5 17.1 2.1 1.6 User Login 06:03 Completion Time (Hrs) 47 Business TPS 4000/ 1 million Test Type à 1.1 24.0 2.1 13.7 0.7 0.6 Throughput

Response TimeAvg 95_pctth

Utilization

X*

4000 user test: Impact of adding DB CPUs

5 47 tps 5 47 tps 4 47 tps 4 48 tps No. of App Server

Business Throughput

Utilization

Response Times (seconds)

~40% ~40% ~45% ~40% Apps CPU % ~40% ~45% ~50% ~50% DB CPU % 1 1 User Exit 29 0.4 Print Result 5 0. 3 Refund Details 32 CPUs 32 CPUs 17 0. 3 Compute same as for same as for 2 0.4 Main Screen 2 0.5 User Login 56 48 32 32 No. of RCC CPU 4000 4000 4000 2000 No. of Users

(9)

4000 user test: Network, Disk, Memory on DB server

Nothing to be alarmed

about

No disk, memory,

network bottlenecks

Database does not scale

with CPUs

Are there any bad SQLs, contention problems, wait events?

< 192Kbps on 1 Gbps lan

< 45%

Constant at 45%

SQL processing contribution

SQL ordered by Gets for DB: RCC Instance: RCC

CPU Elapsd

Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value --- --- --- --- --- --- ---454,871,006 87,688 5,187.4 46.8 2778.07 20102.85 3184176672 Module: f90runm@rp84201 (TNS V1-V3)

SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE _CD,BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_Y R,PAN,DT_FILED,NAME,RET_INC FROM SS_RETURN WHERE (SEQ_NO IN (SEL ECT a.SEQ_NO FROM ss_return a WHERE A.RANGE_CD = :1 AND A.AO_NO = :2 AND A.AO_TYP = :3 AND A.area_cd = :4)) and (AST_YR=:5) and

SQL ordered by Reads for DB: RCC Instance: RCC

CPU Elapsd

Physical Reads Executions Reads per Exec %Total Time (s) Time (s) Hash Value --- --- --- --- --- ---

---14,416,210 87,688 164.4 83.6 2778.07 20102.85 3184176672 Module: f90runm@rp84201 (TNS V1-V3)

SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE _CD,BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_Y R,PAN,DT_FILED,NAME,RET_INC FROM SS_RETURN WHERE (SEQ_NO IN (SEL ECT a.SEQ_NO FROM ss_return a WHERE A.RANGE_CD = :1 AND A.AO_NO = :2 AND A.AO_TYP = :3 AND A.area_cd = :4)) and (AST_YR=:5) and

Y

Extract of Oracle Statspack report for 4000 user test

High

physical

and logical

reads

Avg. Response time/Execution= 20102.85/87688 =0.23 sec

Avg Rsp Time for bus txn = 40.5 sec

0.23 sec due to this SQL

Is it worth tuning?

24.0 Print Result 2.1 Refund Details 13.7 Compute 0.7 Main Screen Avg.

(10)

What is a latch in Oracle DB

• Simple, low-level serialization mechanism to protect shared data structures

in the system global area (SGA)

• Provide only exclusive access to protected data structures needed briefly

Observed excessively high Latch contention

Latches are

applied in

these stages

Latches are

applied in

these stages

Syntax Check Syntax Check Semantic Analysis Semantic Analysis

Was the statement already parsed by another session?

Was the statement already parsed by another session? Hard Parse Hard Parse Soft Parse Soft Parse No Yes 22 ROSATEA – Boston, 10-11 July 2007

Database Wait Event Analysis

STATSPACK report for

Snap Id Snap Time Sessions Curs/Sess Comment

--- --- --- --- ---Begin Snap: 2 31-Oct-05 18:36:38 4,009 118.7

End Snap: 3 31-Oct-05 19:06:39 4,006 120.7 Elapsed: 30.02 (mins)

Top 5 Timed Events

~~~~~~~~~~~~~~~~~~~~~ % Total

Event Waits Time (s) Ela Time --- --- --- ---latch free 1,539,663 3,587,743 98.96 CPU time 28,487 .79 db file sequential read 17,221,454 7,500 .21 log file sync 46,102 773 .02 enqueue 5,299 680 .02

---Wait Events for DB: RCC Instance: RCC

Avg.

Total Wait wait Waits Event Waits Timeouts Time (s) (ms) /txn --- --- --- --- --- ---latch free 1,539,663 0 3,587,743 2330 5.4 db file sequential read 17,221,454 0 7,500 0 60.6 log file sync 246,102 0 773 3 0.9 enqueue 5,299 8 680 128 0.0

Excessively

high Latch

contention

(99% of total

wait time)

Y

Extract of Oracle Statspack report for 4000 user test: 30 min

5.1

Latch Free wait during 2000 user test was 8.31% and Avg. wait time was 2 ms

Average Wait / DB txn

= 2.3 * 5.1 = 11.73 sec

(11)

157.70 DB txns/sec

for 47 business

txns/sec

Þ

157.7/47 =

3.36 db txn/bus txn

Latch Wait’s Contribution to Response Time

Snap Id Snap Time Sessions Curs/Sess Comment

--- ---- --- --- ---Begin Snap: 2 31-Oct-05 18:36:38 4,009 118.7

End Snap: 3 31-Oct-05 19:06:39 4,006 120.7 Elapsed: 30.02 (mins)

Load Profile

~~~~~~~~~~~~ Per Second Per Transaction --- ---Redo size: 1,537,284.62 9,748.08 Logical reads: 539,210.76 3,419.19 Block changes: 8,615.05 54.63 Physical reads: 9,570.96 60.69 Physical writes: 982.97 6.23 User calls: 22,725.02 144.10 Parses: 14,450.63 91.63 Hard parses: 0.00 0.00 Sorts: 3,699.62 23.46 Logons: 0.00 0.00 Executes: 38,242.79 242.50 Transactions: 157.70

Contribution of Latch Wait Time to Response Time

= Avg Wait Time/DB txn * No. of DB txn/Business Txn = 11.73 * 3.36

= 39.41 seconds

Average Response Time for 4000 users

= 0.7 + 13.7 + 2.1 + 24.0 = 40.5 seconds 24.0 Print Result 2.1 Refund Details 13.7 Compute 0.7 Main Screen Avg.

X*

Parsing Analysis

SQL ordered by Parse Calls for DB: RCC Instance: RCC

% Total

Parse Calls Executions Parses Hash Value --- --- --- ---14,285,993 14,286,265 54.89 2588670467 Module: f90runm@rp84201 (TNS V1-V3)

declare p varchar2(32767); begin p := GF_GLOBAL_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000); :v4 := substr(p,12001,4000); :v5 := substr(p,16001,4000); :v6 := substr(p,20001,4000);

2,939,514 2,939,618 11.29 2294365478 Module: f90runm@rp84201 (TNS V1-V3)

declare p varchar2(32767); begin p := GF_MASTER_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000); :v4 := substr(p,12001,4000); :v5 := substr(p,16001,4000); :v6 := substr(p,20001,4000);

349,887 2,988,725 1.34 3106935379 Module: f90runm@rp84201 (TNS V1-V3)

SELECT RCC_NUM From GS_EMP_RCC WHERE ORAUSER = USER

Y

Extract of Oracle Statspack report for 4000 user test

Nearly 66% of total parse calls are for two functions :

GF_GLOBAL_POLICY & GF_MASTER_POLICY

Nearly 66% of total parse calls are for two functions :

_{GF_GLOBAL_POLICY & GF_MASTER_POLICY}

(12)

declare p varchar2(32767); begin p :=

GF_GLOBAL_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000); call count cpu elapsed disk query current rows --- --- -- -- -- -- ---Parse 207 0.01 0.02 0 0 0 0 Execute 207 0.03 0.04 0 420 0 207 Fetch 0 0.00 0.00 0 0 0 0 --- --- --- - - - - ---total 414 0.04 0.07 0 420 0 207 declare p varchar2(32767); begin p :=

GF_MASTER_POLICY(:sn, :on); :v1 := substr(p,1,4000);

:v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000);

call count cpu elapsed disk query current rows --- --- -- -- -- -- ---Parse 40 0.01 0.00 0 0 0 0 Execute 40 0.01 0.01 0 54 0 40 Fetch 0 0.00 0.00 0 0 0 0 --- --- --- ---total 80 0.02 0.02 0 54 0 40

Function Call Analysis

Y

Extract of Oracle TKPROF report of test

1 session in tracefile.

435 user SQL statements in trace file.

70 internal SQL statements in trace file. 505 SQL statements in trace file. 162 unique SQL statements in trace file. 145 SQL statements EXPLAINed using schema:

AST.prof$plan_table Default table was used. Table was created. Table was dropped. 5295 lines in trace file. 247 function calls per business

transaction

= 247 * 47 = 11,609 per sec = 41.8 million per hour !!!!

WinterCorp 2005 Report:

Peak OLTP worldwide on Unix

= 8.6 million SQL calls per hour!!!!

WinterCorp 2005 Report:

Peak OLTP worldwide on Unix

= 8.6 million SQL calls per hour!!!!

Database user calls per

business transaction

**( 435 * 47 = 20445 per sec)**

=

73.6 million per hour!!!!

X*

207

40

Number of calls cannot be reduced by parameter tuning or adding hardware.

Only way out is change the application and hence reduce number of parses /

latches / sec.

2000 to 4000 user analysis:

39 sec Think Time

2,330 ms

2 ms

Average Latch Wait Time

47/sec

48/sec

Business Throughput

40.5 sec

1.1 sec

Average Rsp Time of

Entire Business Txn

4000 user

2000 user

N = 2000, R = 1.1, Z = 39 ÞDmax= 20 ms N = 4000, Z = 39, Dmax= 20 ms ÞR = 4000 * 20ms – 39 = 41 sec

If business throughput is not changing why does response time and wait time increase so drastically by doubling the number of users ?

Queueing Theory : Closed System under saturation R = N * Dmax– Z

where R = average response time of business txn N = number of users

Z = think time

Dmax= demand at bottleneck resource

= visit count at bottleneck resource/business txn x average service time at bottleneck resource

(13)

Application

Tuning

Infrastructure

Workload

Think Time

in Transactions

Time Spent in

Components

Resource

Consumption

Config parameters well

tuned

Adding more in one box

does not help

Determinants of Performance

No changes

allowed during

benchmark. Main

bottlenecks

identified.

Is 39 sec

meaningful?

Think Time Analysis

• Considering :

No. of hours/day = 5

Throughput required to process 52 Million returns

=

36.11 TXNs/Sec

• Applying Little’s Law :

N = (R+Z) * X

4000 4 ? 36.11

• Avg. Think Time = (4000/36.11) – 4 =

106.77 Sec

With 39 sec think time :

52 Million returns can be processed in mere 1.6 months

Though production target is 4 months

(14)

Test Results: 95 sec Think Time

½ M 1 M 1 M 1 M Returns processed Throughput Utilization

Response Times sec – 95th_percentile

~5% ~5% ~10% ~15% Web CPU % ~35% ~45% ~45% ~40% Apps CPU % ~45% ~50% ~50% ~45% DB CPU % 1.0 1 1.1 1.2 User Exit 0.4 29 0.4 0.3 Print Result 0.2 5 0.3 0.2 Refund Details 0.2 17 0.3 0.1 Compute 0.4 2 0.4 0.4 Main Screen 0.5 2 0.5 0.5 User Login 03:36 06:03 05:43 11:09 Completion Time (Hrs) 38 47 48 25 Business TPS 4000/ ½ million (95) 4000/ 1 million 2000/ 1 million 1000/ 1million Test Type à 30 ROSATEA – Boston, 10-11 July 2007