Large Centralized Tax Processing Solution
Benchmark and Performance Analysis
www.cmgindia.org
COMPUTER MEASUREMENT GROUP
INDIA
Client Server
Architecture
Replace With One Single Centralized System For The Country
Background
(Existing Architecture)
Zonal Centre A
Zonal Centre B
Web server Application Server Database Server
WAN
Background
(Proposed Architecture)
Benchmark to Determine if This is Technically Feasible
Zonal Centre A
RFP Objectives & Ground Rules
• Application Benchmarking a Pre-Condition
• To Evaluate Scalability Targets and Recommend Hardware Configuration
for the Application
• Deterministic Think Time Fixed at 5 Seconds
• Only Configuration Parameter Tuning of Web, App, DB Servers Permitted
• No Application Code Optimization Permitted
• 4000 Concurrent Users (To test in steps of 1000, 2000, 4000 users)
• Average Server Side Response Time/Screen < 1 sec
Main Transaction to be Benchmarked
Average Response Time Per Screen < 1 sec
Think Time for entire transaction = 5 secs
“Observed Cycle Time” was 2 minutes!!!
Login
Step 1:
Main Screen
Steps
2 and 3
:
Compute
Steps 4
and 5
:
Refund
12.5%
Steps 6,7,
and 8
:
Print Result
Logout
Test Configuration (1000 Users)
16 CPU
8 CPU
8 CPU
2 CPU
2 CPU
Database Servers Application Servers Web ServersTest Result & Outcome
• With 5 sec think time response time target surpassed at 500 users
• Tuning and adding resources were of little help
• Raised the issue with customer
• Technical Committee revised think time to 40 Secs and left to the
vendors bidding for the RFP to do proper capacity planning
Test Configuration (2000 Users)
8 CPU
8 CPU
32 CPU
8 CPU
8 CPU
2 CPU
2 CPU
2 CPU
2 CPU
Database Servers Application Servers Web ServersThroughput
32 16 CPU Used (DB) 1 M 1 M Returns processedUtilization
Response Times (sec)
< 10% ~15% Web CPU % ~40% ~40% Apps CPU % ~50% ~45% DB CPU % User Exit Print Result Refund Compute Main Screen 1.1 0.4 0.3 0.3 0.4 0.5 1.1 0.3 0.2 0.1 0.3 0.4 User Login 05:45 11:10 Completion Time (Hrs) 48 25 Business TPS 2000 Users 1000 Users Test Type 1.2 0.3 0.2 0.1 0.4 0.5 Avg 95th pct 1.0 0.4 0.2 0.2 0.3 0.4
Test Results: 1000 & 2000 Users
Response Time
Meets RFP
Criteria
Throughput scales
almost linearly – no
apparent bottleneck
Database Tuning
• Cached highly used sequences
• Increased transaction slots at block level (initrans) to manage high
concurrency
• Pinned packages/Functions in memory to avoid reload
• Raised the priority of Oracle Processes
• Introduced session cursor caching to reduce latch requirement
• Created reverse key indexes for surrogated key columns
1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 …. …. …. …. 2001 2002 2003 2004 Normal B*Tree Different instances inserting 2005, 2006, 2007, 2008 Contention 1001 1002 1003 1004 …. …. …. …. 5001 5002 …. …. …. …. 6001 6002 Reverse B*Tree
4000 User Test
• Results on App/DB configuration used for 2000 users
• Results on higher number of CPUs
• Scalability limiting analysis
• Further analysis of response time
8 CPU
8 CPU
8 CPU
8 CPU
32 CPU
Base Configuration
2 CPU
2 CPU
2 CPU
2 CPU
2 CPU
2 CPU
Database Servers Application Servers Web Servers4000 User Test Results
32 1 M < 10% ~40% ~50% 05:45 48 2000 Users 1.0 0.4 0.2 0.2 0.3 0.4 32 CPU Used (DB) 1 M Returns processed ~5% Web CPU % ~45% Apps CPU % ~50% DB CPU % User Exit Print Result Refund Compute Main Screen 1.2 29.3 4.5 17.1 2.1 1.6 User Login 06:00 Completion Time (Hrs) 47 Business TPS 4000 Users Test Type 1.1 24.0 2.1 13.7 0.7 0.6Throughput
Response Time
Avg 95th
pct
Utilization
4000 User Test: Impact of Adding DB CPUs
No. of Users 2000 4000 4000 4000
No. of CPU (DB) 32 32 48 56
No. of App Server Business Throughput 4 48 tps 4 47 tps 5 47 tps 5 47 tps
Response Times (seconds)
User Login 0.5 2 Main Screen 0.4 2 Same as for 32 CPUs 32 CPUs Same as for 32 CPUs Compute 0. 3 17 Refund 0. 3 5 Print Result 0.4 29 User Exit 1 1
Utilization
DB CPU % ~50% ~50% ~45% ~40% Apps CPU % ~40% ~45% ~40% ~40%4000 User Test:
Network, Disk, Memory on DB server
Nothing to be alarmed
about.
No Disk, Memory,
Network bottlenecks
Database does not
scale with CPUs
Are there any bad
SQLs, Contention
problems, Wait
Events?
< 192Kbps on 1 Gbps lan
< 45%
Constant at 45%
SQL Processing Contribution
SQL ordered by Gets for DB: ABC Instance: ABC
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--- --- --- --- --- --- --- 454,871,006 87,688 5,187.4 46.8 2778.07 20102.85 3184176672 Module: f90runm@server1 (TNS V1-V3)
SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE _CD,BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_Y R,PAN,DT_FILED,NAME,RET_INC FROM AA_RETURN WHERE (SEQ_NO IN (SEL ECT a.SEQ_NO FROM aa_return a WHERE A.RANGE_CD = :1 AND A.AO_NO = :2 AND A.AO_TYP = :3 AND A.area_cd = :4)) and (AST_YR=:5) and
SQL ordered by Reads for DB: ABC Instance: ABC
CPU Elapsd
Physical Reads Executions Reads per Exec %Total Time (s) Time (s) Hash Value --- --- --- --- --- --- --- 14,416,210 87,688 164.4 83.6 2778.07 20102.85 3184176672 Module: f90runm@server1 (TNS V1-V3)
SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE _CD,BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_Y R,PAN,DT_FILED,NAME,RET_INC FROM AA_RETURN WHERE (SEQ_NO IN (SEL ECT a.SEQ_NO FROM aa_return a WHERE A.RANGE_CD = :1 AND A.AO_NO = :2 AND A.AO_TYP = :3 AND A.area_cd = :4)) and (AST_YR=:5) and
Extract of Oracle Statspack report for 4000 user test
High
physical and
logical
reads
SQL Processing Contribution
SQL ordered by Gets for DB: ABC Instance: ABC
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--- --- --- --- --- --- --- 454,871,006 87,688 5,187.4 46.8 2778.07 20102.85 3184176672 Module: f90runm@server1 (TNS V1-V3)
SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE _CD,BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_Y R,PAN,DT_FILED,NAME,RET_INC FROM AA_RETURN WHERE (SEQ_NO IN (SEL ECT a.SEQ_NO FROM aa_return a WHERE A.RANGE_CD = :1 AND A.AO_NO = :2 AND A.AO_TYP = :3 AND A.area_cd = :4)) and (AST_YR=:5) and
SQL ordered by Reads for DB: ABC Instance: ABC
CPU Elapsd
Physical Reads Executions Reads per Exec %Total Time (s) Time (s) Hash Value --- --- --- --- --- --- --- 14,416,210 87,688 164.4 83.6 2778.07 20102.85 3184176672 Module: f90runm@server1 (TNS V1-V3)
SELECT ROWID,SEQ_NO,IND_STAT,BNDL_AREA_CD,BNDL_AO_TYP,BNDL_RANGE _CD,BNDL_AO_NO,BNDL_FIN_YR,BNDL_CNTR_NO,BNDL_SEQ_NO,ACK_NO,AST_Y R,PAN,DT_FILED,NAME,RET_INC FROM AA_RETURN WHERE (SEQ_NO IN (SEL ECT a.SEQ_NO FROM aa_return a WHERE A.RANGE_CD = :1 AND A.AO_NO = :2 AND A.AO_TYP = :3 AND A.area_cd = :4)) and (AST_YR=:5) and
Extract of Oracle Statspack report for 4000 user test
High
physical and
logical
reads
Avg. Response time/Execution= 20102.85/87688 =0.23 sec
Avg. Rsp Time for bus txn = 40.5 sec
0.23 sec due to this SQL
Is it worth tuning?
Avg.
Main Screen
0.7
Compute
13.7
Refund Details
2.1
Print Result
24.0
What is a Latch in Oracle DB
•
Simple, low-level serialization mechanism to protect shared data structures in the
System Global Area (SGA)
•
Provide only exclusive access to protected data structures needed briefly
Latches are
applied in
these stages
Syntax
Check
Semantic
Analysis
Was the statement
already parsed by
another session?
Hard
Parse
Soft
Parse
No YesDatabase Wait Event Analysis
STATSPACK report for
Snap Id Snap Time Sessions Curs/Sess Comment
--- --- --- --- --- Begin Snap: 2 31-Oct-07 18:36:38 4,009 118.7
End Snap: 3 31-Oct-07 19:06:39 4,006 120.7 Elapsed: 30.02 (mins)
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time --- --- --- --- latch free 1,539,663 3,587,743 98.96 CPU time 28,487 .79 db file sequential read 17,221,454 7,500 .21 log file sync 46,102 773 .02 enqueue 5,299 680 .02 ---
Wait Events for DB: ABC Instance: ABC
Avg.
Total Wait wait Waits Event Waits Timeouts Time (s) (ms) /txn --- --- --- --- --- --- latch free 1,539,663 0 3,587,743 2330 5.1 db file sequential read 17,221,454 0 7,500 0 60.6 log file sync 246,102 0 773 3 0.9 enqueue 5,299 8 680 128 0.0
Excessively
high Latch
contention
(99% of total
wait time)
Extract of Oracle Statspack report for 4000 user test: 30 min
Latch Free wait during 2000 user test was 8.31% and Avg. wait time was 2 ms
Average Wait / DB txn
= 2.3 * 5.1 = 11.73 sec
157.70 DB txns/sec
for 47 business
txns/sec
157.7/47 =
3.36 db txn/bus txn
Latch Wait’s Contribution to Response Time
Snap Id Snap Time Sessions Curs/Sess Comment
--- --- --- --- --- Begin Snap: 2 31-Oct-07 18:36:38 4,009 118.7
End Snap: 3 31-Oct-07 19:06:39 4,006 120.7 Elapsed: 30.02 (mins)
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction --- --- Redo size: 1,537,284.62 9,748.08 Logical reads: 539,210.76 3,419.19 Block changes: 8,615.05 54.63 Physical reads: 9,570.96 60.69 Physical writes: 982.97 6.23 User calls: 22,725.02 144.10 Parses: 14,450.63 91.63 Hard parses: 0.00 0.00 Sorts: 3,699.62 23.46 Logons: 0.00 0.00 Executes: 38,242.79 242.50 Transactions: 157.70
Contribution of Latch Wait Time to Response Time
= Avg Wait Time/DB txn * No. of DB txn/Business Txn
= 11.73 * 3.36
= 39.41 seconds
Average Response Time for 4000 Users
= 0.7 + 13.7 + 2.1 + 24.0
= 40.5 seconds
X*
Avg.
Main Screen
0.7
Compute
13.7
Refund Details
2.1
Print Result
24.0
Parsing Analysis
SQL ordered by Parse Calls for DB: ABC Instance: ABC
% Total
Parse Calls Executions Parses Hash Value --- --- --- --- 14,285,993 14,286,265 54.89 2588670467 Module: f90runm@server1 (TNS V1-V3)
declare p varchar2(32767); begin p := GLOBAL_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000); :v4 := substr(p,12001,4000); :v5 :=
substr(p,16001,4000); :v6 := substr(p,20001,4000);
2,939,514 2,939,618 11.29 2294365478 Module: f90runm@server1 (TNS V1-V3)
declare p varchar2(32767); begin p := MASTER_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000); :v4 := substr(p,12001,4000); :v5 :=
substr(p,16001,4000); :v6 := substr(p,20001,4000);
349,887 2,988,725 1.34 3106935379 Module: f90runm@server1 (TNS V1-V3)
SELECT ABC_NUM From EMP_ABC WHERE ORAUSER = USER
Extract of Oracle Statspack report for 4000 user test
Nearly 66% of total parse calls are for two functions :
GLOBAL_POLICY & MASTER_POLICY
declare p varchar2(32767); begin p :=
GLOBAL_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000);
call count cpu elapsed disk query current rows --- --- --- --- --- --- --- --- Parse 207 0.01 0.02 0 0 0 0 Execute 207 0.03 0.04 0 420 0 207 Fetch 0 0.00 0.00 0 0 0 0 --- --- --- --- --- --- --- --- total 414 0.04 0.07 0 420 0 207
declare p varchar2(32767); begin p :=
MASTER_POLICY(:sn, :on); :v1 := substr(p,1,4000); :v2 := substr(p,4001,4000); :v3 := substr(p,8001,4000); call count cpu elapsed disk query current rows --- --- --- --- --- --- --- --- Parse 40 0.01 0.00 0 0 0 0 Execute 40 0.01 0.01 0 54 0 40 Fetch 0 0.00 0.00 0 0 0 0 --- --- --- --- --- --- --- --- total 80 0.02 0.02 0 54 0 40
Function Call Analysis
Extract of Oracle TKPROF report of test
247 function calls per business
transaction
= 247 * 47 = 11,609 per sec
= 41.8 million per hour !!!!
X*
Number of calls cannot be reduced by
parameter tuning or adding hardware.
Only way out is to change the
application and hence reduce number
of parses / latches / sec.
2000 to 4000 User Analysis:
(40 Secs Think Time)
2000 user
4000 user
Average Rsp Time of
Entire Business Txn
1.1 sec
40.5 sec
Business Throughput
48/sec
47/sec
Average Latch Wait Time
2 ms
2,330 ms
N = 2000, R = 1.1, Z = 40
D
max= 20 ms
N = 4000, Z = 40, D
max= 20 ms
R = 4000 * 20ms – 40
= 40 sec
If business throughput is not changing why does response time and wait time
increase so drastically by doubling the number of users ?
Queueing Theory : Closed System under saturation
R = N * D
max– Z
where R = average response time of business txn
N = number of users
Z = think time
D
max= demand at bottleneck resource
= visit count at bottleneck resource/business txn x
average service time at bottleneck resource
Application
Tuning
Infrastructure
Workload
Think Time
in Transactions
Time Spent in
Components
Resource
Consumption
Config parameters well
tuned
Adding more CPUs in
one box does not help
Determinants of Performance
Is 40 sec
Think Time
meaningful?
No changes allowed
during benchmark.
Main bottlenecks
identified
Think Time Analysis
• Assuming :
No. of hours/day = 5
Throughput required to process 50 Million returns
= 35 TXNs/Sec
• Applying Little’s Law :
N = (R+Z) * X
4000 4 ? 35
• Avg. Think Time = (4000/35) – 4 = 110 Sec
With 40 sec think time :
50 Million returns can be processed in ~ 1.5 months
Though the target is 4 months
Test Results: 100 Secs Think Time
½ M 1 M 1 M 1 M Returns processed ~5% ~5% ~10% ~15% Web CPU % ~35% ~45% ~45% ~40% Apps CPU % ~45% ~50% ~50% ~45% DB CPU % 1.0 1 1.1 1.2 User Exit 0.4 29 0.4 0.3 Print Result 0.2 5 0.3 0.2 Refund 0.2 17 0.3 0.1 Compute 0.4 2 0.4 0.4 Main Screen 0.5 2 0.5 0.5 User Login 3:35 6:00 5:45 11:10 Completion Time (Hrs) 38 47 48 25 Business TPS 4000 Users (Think Time =100 S) 4000 Users 2000 Users 1000 Users Test Type Throughput
Utilization
Conclusions
• Benchmark has proven that application scales linearly till 2000 users with the
specified think time
• Recommendations given for further scalability
• Analysis of the results and workload has proved that the workload defined for
the benchmark is unrealistic
• System can meet the target of processing 50 million returns in 4 months, with
reasonable think time and workload
IMPACT
The RFP was scrapped and new requirements were laid down with
more realistic workload and flexibility
Learnings
• Gauge the problem - Is it feasible to achieve target with resolution?
• Whenever doing Benchmark, work on Performance Engineering Model
to derive whether it will work or not
• Evaluate contribution of Wait Events to response time while doing
Database Analysis
• Every Application has certain scalability limits, no matter what
hardware you deploy upon
“Adding hardware is not the solution for all the problems”
• Choosing realistic workload is imperative to ensure its achievability
and applicability of the results