© 2012 Wellesley Information Services. All rights reserved.
20 Tips and Tricks to Improve Data Load Performance
Jesper Christensen
COMERIT
In This Session …
• Gain insight into SAP NetWeaver® BW data load processes, how
they work, and what tools are available to monitor and optimize their performance
• Receive best practices to maximize data load performance while
reducing long-term maintenance costs
• Understand the benefits of optimized data load processes
• Find out how to enable version history to track code changes and
how to create reusable ETL logic to improve throughput and reduce data load time
• Get tips on when and how to use customer exits in DataSources
and variables to manage risk and reduce maintenance costs
• Identify the challenges and benefits of semantic partitioning and
the importance of efficient data models
What We’ll Cover …
• Loading data in SAP NetWeaver BW
• Finding performance bottlenecks
• Optimizing the database
• Optimizing the ABAP code
• Optimizing the data models
• Optimizing the data updates
• Wrap-up
SAP NetWeaver BW Data Load Processing Overview
• SAP NetWeaver BW data load processing consists of three main
activities:
Extraction = Collecting the data in the source systems and
preparing it before sending it to SAP NetWeaver BW
Transformation = Transforming the data using routines,
lookups, formulas, etc.
Load = Updating the data into InfoProviders: DataStore Objects (DSOs), cubes, and master data
Dataflow in SAP NetWeaver BW
Extraction Interface Types
DataSources Supported by SAP NetWeaver Extraction
• SAP NetWeaver BW Service API
Allows data from SAP systems in standardized form to be
extracted and accessed directly
These can be SAP application systems or SAP NetWeaver BW systems
• File interface
The file interface permits the extraction from and direct access
to files, such as csv files
• Web services
Permit you to send data to the SAP NetWeaver BW system
DataSources Supported by SAP NetWeaver Extraction (cont.)
• Universal Data (UD) Connect
Permits the extraction from and direct access to relational data
• Database (DB) Connect
Permits the extraction from and direct access to data located in
tables or views of a database management system
• Staging Business Application Programming Interfaces (BAPIs)
Open interfaces that SAP BusinessObjects DataServices and
certified third-party tools can use to extract data from older systems
Extraction Time Can Be Split into Two Categories
• Extraction time
DB time to select the data to be extracted
Logic applied during extraction such as joins, lookups, and
filtering
• Middleware and network time
The time used to transfer the data from the source system to
the target SAP NetWeaver BW system
Interface types such as Web services and Universal Data (UD) Connect are suitable for small amounts of data but not for large volumes
Fixed-format files are larger to transfer but faster to load into SAP NetWeaver BW
Transformation Types
• SAP NetWeaver BW supports the 3.x and the 7.x versions of transforming the data
3.x uses transfer rules and update rules
Two steps of logic to process the dataset
Loads to different targets must be processed together
Used to have better performance than transformations
Old method; no more development or performance enhancements; do not continue to use
7.x uses transformations
A single step of logic to process the dataset
Loads to different targets can be processed independently
Better performance
Loading Data to Information Providers Types
• Loading of the data to InfoProviders differs depending on type
DSO
Update of the activation queue
Activation of data (update of active table and change log)
SID determination (should in general be switched off for DSOs)
Master data
Update of master data tables
SID determination
Check duplicate key values (very time consuming for time-dependent attributes)
Attribute change run to activate the master data
Generate navigation data
Loading Data to Information Providers Types (cont.)
• Loading of the data to InfoProviders differs depending on type (cont.)
Cubes
Update of data to the InfoCube star schema
SID determination
Roll up data to aggregates
Update data to SAP NetWeaver BW Accelerator (SAP NetWeaver BWA)
• Performance considerations for loading the data
Ensure that the database parameters are in place
Implement the correct SAP NetWeaver BW settings for your
InfoProviders
Tip 01: SAP NetWeaver BW 7.x Statistics
• SAP NetWeaver BW includes a great statistics tool
It collects information on most SAP NetWeaver BW-specific
activity
Such as data loads and queries
It’s delivered as business content
So you must activate it just like all business content
See the "How to Activate Admin Cockpit" document on help.sap.com:
http://help.sap.com/saphelp_nw04s/helpdata/en/46/f9bd550d40537de10000000a1553f6/frameset.htm
Tip 01: SAP NetWeaver BW 7.x Statistics (cont.)
• Define standard measures that can be monitored on a daily, weekly, and monthly basis to evaluate data load performance trends
Records processed per minute, or time to process 1 million records
Time spent on extraction
Time spent in transformations
Top 10 long-running loads
Total time spent on attribute and hierarchy change runs
• Use the standard queries and reports as a starting point
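The first measure above is simple arithmetic, sketched here as a standalone report. The report name and the figures are made up for illustration; in practice the record count and runtime would come from the SAP NetWeaver BW statistics.

```abap
REPORT zbw_load_kpi_demo.

* Hypothetical figures for a single load (not taken from a real system).
DATA: lv_records     TYPE i VALUE 2500000,
      lv_runtime_sec TYPE i VALUE 750,
      lv_rec_per_min TYPE p LENGTH 15 DECIMALS 0,
      lv_min_per_mio TYPE p LENGTH 15 DECIMALS 2.

START-OF-SELECTION.
* Guard against division by zero for empty or unmeasured loads.
  IF lv_records > 0 AND lv_runtime_sec > 0.
*   Throughput: records processed per minute.
    lv_rec_per_min = lv_records * 60 / lv_runtime_sec.
*   Inverse view: minutes needed to process 1 million records.
    lv_min_per_mio = lv_runtime_sec * 1000000 / ( lv_records * 60 ).
    WRITE: / 'Records per minute:', lv_rec_per_min,
           / 'Minutes per 1 million records:', lv_min_per_mio.
  ENDIF.
```

With these sample figures the load runs at 200,000 records per minute, or 5 minutes per million records. Tracking the same two numbers per load over time makes trends comparable even when data volumes change.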
Tip 02: See Details About Performance in the Monitor
• The load monitor transaction code RSMO gives more details
about the processing steps
InfoPackage details
Data Transfer Process (DTP) details
Tip 03: Use SE30 to Test Performance
• Transaction code SE30 (ABAP Runtime Analysis) gives a detailed view of performance
Remember to set the accuracy to Low
Run transaction code RSA3 from within SE30 to measure the extraction
Tip 03: Use SE30 to Test Performance (cont.)
• Detailed Runtime will show you the bottlenecks
Sort descending by Net Time and you will see your bottleneck at the top
Tip 04: Implement the Correct DB Parameters
• Key DB parameters
SAP has recommended some parameter values for SAP
NetWeaver BW that usually improve performance
Expect to evaluate these parameter settings frequently, though,
to ensure that the DB operates optimally
See three key SAP Notes:
830576 – Parameter recommendations for Oracle 10g
387946 – Use of locally managed tablespaces for BW systems
1044441 – Basis parameterization for NW 7.0 BI systems
Tip 05: Manage Database Statistics
• DB statistics are also crucial for SAP NetWeaver BW performance
Without DB statistics, the DB cannot determine the optimal execution path for an SQL statement
To set up DB statistics:
Set up a BRCONNECT job using DB20 to recalculate DB statistics
Use program RSANAORA to analyze specific tables
DB statistics can run very slowly under Oracle when you use SAP NetWeaver BW programs or DB
Tip 06: Build Secondary Indices
• The select statements used during extraction or in user exit enhancements should always use a database index
Build secondary indices in transaction code SE11, or on the DSOs used in select statements
Tip 07: Coding Tips — Dynamic Calls
• Code the extractor user exits so that they call a dynamic program
per DataSource
Isolate the code per DataSource in a self-contained program
Minimize risk that a syntax error in code for one DataSource
impacts extraction from all other DataSources
• Example
Program name = ZBW + <DataSource name>
Form name = DOZBW + <DataSource name>
• This same technique can be used with customer exit variable
code
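A minimal sketch of the dynamic call described above, assuming the naming convention from the example (ZBW and DOZBW prefixes). The report name, the line type, and the DataSource name ZDS_AGR_USER are placeholders; in a real extractor exit the DataSource name and the data table would come from the exit's own parameters.

```abap
REPORT zbw_exit_dispatch_demo.

* Hypothetical line type standing in for a DataSource extract structure.
TYPES: BEGIN OF ty_line,
         vbeln TYPE c LENGTH 10,
       END OF ty_line.

DATA: gt_data       TYPE STANDARD TABLE OF ty_line,
      gv_datasource TYPE c LENGTH 30 VALUE 'ZDS_AGR_USER',
      gv_program    TYPE c LENGTH 40,
      gv_form       TYPE c LENGTH 40.

START-OF-SELECTION.
* Build the program and form names from the DataSource name.
* Dynamic names must be in upper case.
  CONCATENATE 'ZBW'   gv_datasource INTO gv_program.
  CONCATENATE 'DOZBW' gv_datasource INTO gv_form.

* Call the DataSource-specific form. IF FOUND skips the call
* when no such program or form exists.
  PERFORM (gv_form) IN PROGRAM (gv_program) IF FOUND
    TABLES gt_data.
```

Because the target is resolved at runtime, the dispatching code never needs to change when a new DataSource is enhanced; only a new program following the naming convention is added.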
Tip 08: Coding Tips — Field Symbols
• Performance consideration: Where possible, use field symbols to
populate fields in the data package
The move costs of a LOOP ... INTO statement depend on the
size of a table line
The larger the line size, the longer the move will take
By applying a LOOP... ASSIGNING statement you can attach a
field symbol to the table lines and operate directly on the line contents
This is a much faster way to access the internal table lines
without moving their contents
Tip 08: User Exit — Field Symbols
• Illustration: Sample use of field symbols
User Exit (without field-symbols)
REPORT YBWZDS_AGR_USER.
*****************************************************************
* Form called dynamically must start with DOYBW + <DataSource>
*****************************************************************
FORM DOYBWZDS_AGR_USER
  TABLES C_T_DATA STRUCTURE ZOXBWD0001.
  data: l_logsys type logsys,
        l_s_data like ZOXBWD0001.
  select single logsys from t000 into l_logsys
    where mandt = sy-mandt.
* Each pass copies the line out of the table and back in again
  loop at c_t_data into l_s_data.
    l_s_data-load_dt = sy-datum.
    l_s_data-logsys = l_logsys.
    modify c_t_data from l_s_data index sy-tabix.
  endloop.
ENDFORM.
User Exit (with field-symbols)
REPORT YBWZDS_AGR_USER.
*****************************************************************
* Form called dynamically must start with DOYBW + <DataSource>
*****************************************************************
FORM DOYBWZDS_AGR_USER
  TABLES C_T_DATA STRUCTURE ZOXBWD0001.
  data: l_logsys type logsys.
  field-symbols: <fs> like line of c_t_data.
  select single logsys from t000 into l_logsys
    where mandt = sy-mandt.
* The field symbol points at the table line, so no copy or MODIFY
  loop at c_t_data assigning <fs>.
    <fs>-load_dt = sy-datum.
    <fs>-logsys = l_logsys.
  endloop.
ENDFORM.
Tip 09: Coding Tips — Read Instead of Loop
• Use a READ statement to access a table rather than a LOOP
WHERE
The cost of a LOOP WHERE is much higher than a READ with
table key or binary search statement
When several matching rows must be processed, use a READ first to find the index of the first match, then LOOP ... FROM that index instead of using LOOP WHERE
Tip 09: User Exit: Read Instead of Loop
• Illustration: Sample use of READ instead of LOOP WHERE
User Exit (without read)
REPORT YBW2LIS_13_VDITM.
*****************************************************************
* Form called dynamically must start with DOYBW + <DataSource>
*****************************************************************
FORM DOYBW2LIS_13_VDITM
  TABLES C_T_DATA STRUCTURE ZOXBWD0001.
  data: l_logsys type logsys,
        l_s_data like ZOXBWD0001.
  field-symbols: <fs> like line of c_t_data,
                 <fs1> like VBAP.
* itab holds the VBAP rows read earlier (declaration not shown)
  loop at c_t_data assigning <fs>.
    loop at itab assigning <fs1> where VBELN = <fs>-VBELN.
      <fs>-NETVALUE = <fs>-NETVALUE + <fs1>-NETWR.
    endloop.
  endloop.
ENDFORM.
User Exit (with read)
REPORT YBW2LIS_13_VDITM.
*****************************************************************
* Form called dynamically must start with DOYBW + <DataSource>
*****************************************************************
FORM DOYBW2LIS_13_VDITM
  TABLES C_T_DATA STRUCTURE ZOXBWD0001.
  data: l_logsys type logsys,
        l_idx type sy-tabix.
  field-symbols: <fs> like line of c_t_data,
                 <fs1> like VBAP.
* itab must be sorted by VBELN for BINARY SEARCH to work
  loop at c_t_data assigning <fs>.
    read table itab with key VBELN = <fs>-VBELN
      binary search transporting no fields.
    if sy-subrc = 0.
      l_idx = sy-tabix.
      loop at itab assigning <fs1> from l_idx.
*       Leave the inner loop at the first row of another document
        if <fs1>-VBELN <> <fs>-VBELN.
          exit.
        endif.
        <fs>-NETVALUE = <fs>-NETVALUE + <fs1>-NETWR.
      endloop.
    endif.
  endloop.
ENDFORM.
Tip 10: Delta Enable Generic DataSources
• Improve extract performance by creating delta-enabled generic
DataSources
• Simple:
By date
By timestamp
By sequential number (unique table key)
• Complex:
Pointers – ABAP techniques can be used to record an array of
Tip 10: Delta Enable Generic DataSources (cont.)
• Illustration: Delta enabling a generic DataSource
Ensure that you set the upper or lower
limits correctly based on the data you are extracting!
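To illustrate why the limits matter, here is a simplified sketch of how a date-based delta window could be derived from safety intervals. The variable names, the report name, and the one-day values are assumptions for illustration only; in a real generic DataSource the delta pointer is maintained by the system and the intervals are entered in the DataSource's delta settings.

```abap
REPORT zbw_delta_window_demo.

* All values are hypothetical. The system, not custom code,
* keeps the real delta pointer.
DATA: lv_last_upper   TYPE d VALUE '20120301', " upper limit of last delta
      lv_safety_lower TYPE i VALUE 1,  " days to re-read at the start
      lv_safety_upper TYPE i VALUE 1,  " days to hold back at the end
      lv_from         TYPE d,
      lv_to           TYPE d.

START-OF-SELECTION.
* Lower limit: step back from the previous upper limit.
  lv_from = lv_last_upper - lv_safety_lower.
* Upper limit: stay behind the current date.
  lv_to   = sy-datum - lv_safety_upper.

  IF lv_from <= lv_to.
    WRITE: / 'Select rows changed from', lv_from, 'to', lv_to.
  ELSE.
    WRITE: / 'Nothing to extract yet.'.
  ENDIF.
```

A lower safety interval re-reads rows that were already extracted, so the target typically needs to tolerate duplicates (for example, a DSO that overwrites). An upper safety interval delays the newest rows until the next delta. Setting either one too small risks missing records; setting it too large increases volume or latency.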
Tip 11: Lookups
• Do not use single selects for lookups!
• For better performance:
Use start routines to read lookup data to an internal table
Read internal table to populate field values in routines
• For best performance:
Add lookup fields to InfoSource
Use start routine and field symbols to populate blank fields for entire data package at one time (see illustration on slide titled “User Exit — Field Symbols”)
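The pattern above can be sketched as a standalone report: one array select fills a hashed internal table, then each record is completed through a field symbol and a keyed read. The report name and the package layout are hypothetical; USR02 (BNAME, CLASS) is used as the lookup source because the session's own illustrations use it.

```abap
REPORT zbw_lookup_demo.

* Hypothetical data package layout and lookup buffer.
TYPES: BEGIN OF ty_pkg,
         tctusernm TYPE usr02-bname,
         usergroup TYPE usr02-class,
       END OF ty_pkg,
       BEGIN OF ty_usr,
         bname TYPE usr02-bname,
         class TYPE usr02-class,
       END OF ty_usr.

DATA: gt_package TYPE STANDARD TABLE OF ty_pkg,
      gt_usr02   TYPE HASHED TABLE OF ty_usr WITH UNIQUE KEY bname.

FIELD-SYMBOLS: <ls_pkg> TYPE ty_pkg,
               <ls_usr> TYPE ty_usr.

START-OF-SELECTION.
* Sample record so the report has something to process.
  APPEND INITIAL LINE TO gt_package ASSIGNING <ls_pkg>.
  <ls_pkg>-tctusernm = sy-uname.

* One database access for the whole package. The guard matters:
* FOR ALL ENTRIES with an empty driver table selects every row.
  IF gt_package IS NOT INITIAL.
    SELECT bname class FROM usr02
      INTO TABLE gt_usr02
      FOR ALL ENTRIES IN gt_package
      WHERE bname = gt_package-tctusernm.
  ENDIF.

* Fill the blank field in place using a keyed read per record.
  LOOP AT gt_package ASSIGNING <ls_pkg>.
    READ TABLE gt_usr02 ASSIGNING <ls_usr>
         WITH TABLE KEY bname = <ls_pkg>-tctusernm.
    IF sy-subrc = 0.
      <ls_pkg>-usergroup = <ls_usr>-class.
    ENDIF.
  ENDLOOP.
```

A hashed table is one reasonable choice here because every read uses the full unique key; a sorted table with BINARY SEARCH would serve the same purpose.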
Tip 12: Program Includes
• Use includes for all complex routine logic
• Access logic by using “perform” statements
• Increase portability of transformation logic
Use same read statements for multiple lookups
Reduce risk of errors in obscure places
• Decrease maintenance cost of complex update rules
One place to go to fix/enhance logic
Code is consistent and easier to follow
• Enable version management of code
Track changes over time
Compare between systems
Revert to previous versions
Tip 12: Program Includes (cont.)
• Illustration – Select into internal table
Start routine
FORM startup
  TABLES   MONITOR STRUCTURE RSMONITOR "user defined monitoring
           MONITOR_RECNO STRUCTURE RSMONITORS
           DATA_PACKAGE STRUCTURE DATA_PACKAGE
  USING    RECORD_ALL LIKE SY-TABIX
           SOURCE_SYSTEM LIKE RSUPDSIMULH-LOGSYS
  CHANGING ABORT LIKE SY-SUBRC. "set ABORT <> 0 to cancel update
*
*$*$ begin of routine - insert your code only below this line *-*
* fill the internal tables "MONITOR" and/or "MONITOR_RECNO",
* to make monitor entries
  perform READ_USR02_TO_MEMORY_FOR_0BWTC_C02
    TABLES   MONITOR DATA_PACKAGE
    USING    RECORD_ALL SOURCE_SYSTEM
    CHANGING ABORT.
* if abort is not equal zero, the update process will be canceled
* ABORT = 0.
*$*$ end of routine - insert your code only before this line *-*
ENDFORM.
Program include
*****************************************************************
* INITIALIZATION (ONE-TIME PER DATA PACKET) *********************
* TO READ FROM DATABASE (ALL RECORDS FOR DATA PACKAGE) **********
*****************************************************************
* FORM READ_USR02_TO_MEMORY_FOR_0BWTC_C02
*---*
Form READ_USR02_TO_MEMORY_FOR_0BWTC_C02
  TABLES   MONITOR STRUCTURE RSMONITOR
           DATA_PACKAGE STRUCTURE /BIC/CS80BWTC_C02
  USING    RECORD_ALL LIKE SY-TABIX
           SOURCE_SYSTEM LIKE RSUPDSIMULH-LOGSYS
  CHANGING ABORT LIKE SY-SUBRC.
* Refresh the internal table.
  refresh: GT_USR02.
* Read USR02 user data to memory for this data package.
* An empty driver table would make FOR ALL ENTRIES read every row.
  if not DATA_PACKAGE[] is initial.
    select * into corresponding fields of table GT_USR02 from USR02
      FOR ALL ENTRIES IN DATA_PACKAGE
      where BNAME = DATA_PACKAGE-TCTUSERNM
      order by primary key.
  endif.
* if abort is not equal zero, the update process will be canceled
  ABORT = 0.
ENDFORM.
Tip 12: Program Includes (cont.)
• Illustration – Include perform statements
Update routine
FORM compute_key_field
  TABLES   MONITOR STRUCTURE RSMONITOR "user defined monitoring
  USING    COMM_STRUCTURE LIKE /BIC/CS0BWTC_C02
           RECORD_NO LIKE SY-TABIX
           RECORD_ALL LIKE SY-TABIX
           SOURCE_SYSTEM LIKE RSUPDSIMULH-LOGSYS
  CHANGING RESULT LIKE /BI0/V0BWTC_C02T-USERGROUP
           RETURNCODE LIKE SY-SUBRC
           ABORT LIKE SY-SUBRC. "set ABORT <> 0 to cancel update
*
*$*$ begin of routine - insert your code only below this line *-*
* fill the internal table "MONITOR", to make monitor entries
  PERFORM READ_GT_USR02
    USING    COMM_STRUCTURE-TCTUSERNM RECORD_NO RECORD_ALL SOURCE_SYSTEM
    CHANGING GS_USR02 ABORT.
  RESULT = GS_USR02-CLASS.
* if abort is not equal zero, the update process will be canceled
*$*$ end of routine - insert your code only before this line *-*
ENDFORM.
Program include
*****************************************************************
* RECORD PROCESSING (RUN PER RECORD) ****************************
* TO READ FROM MEMORY (ONE RECORD) ******************************
*****************************************************************
* FORM READ_GT_USR02
*---*
FORM READ_GT_USR02
  USING    TCTUSERNM LIKE USR02-BNAME
           RECORD_NO LIKE SY-TABIX
           RECORD_ALL LIKE SY-TABIX
           SOURCE_SYSTEM LIKE RSUPDSIMULH-LOGSYS
  CHANGING GS_USR02
           ABORT LIKE SY-SUBRC. "ABORT<>0 cancels update
  STATICS: L_RECORD LIKE SY-TABIX.
* Read only once per record, even if several routines call this form
  IF RECORD_NO <> L_RECORD.
    L_RECORD = RECORD_NO.
    CLEAR GS_USR02.
* Read user data from internal table GT_USR02
    READ TABLE GT_USR02
      WITH KEY BNAME = TCTUSERNM INTO GS_USR02.
  ENDIF.
ENDFORM.
Tip 13: Use Start and End Routines
• Start routines can be used to process the data efficiently before single-record processing starts
This is the most efficient place to delete records from the data package, before any time is spent on processing them
• End routines in SAP NetWeaver BW 7.x allow processing of the data after it has passed through the transformation
This is the most efficient place to copy data records (e.g., for generating year-to-date figures)
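Both ideas can be sketched on a plain internal table. The record layout, the document type 'ZX', and the report name are invented for illustration; a real start or end routine would operate on the package table generated for the transformation rather than on these local tables.

```abap
REPORT zbw_start_end_demo.

* Hypothetical record layout standing in for a data package line.
TYPES: BEGIN OF ty_rec,
         doc_type TYPE c LENGTH 4,
         period   TYPE n LENGTH 2,
         amount   TYPE p LENGTH 8 DECIMALS 2,
       END OF ty_rec.

DATA: gt_package TYPE STANDARD TABLE OF ty_rec,
      gt_ytd     TYPE STANDARD TABLE OF ty_rec,
      gs_rec     TYPE ty_rec.

START-OF-SELECTION.
* Sample data: one unwanted record and one record in period 11.
  gs_rec-doc_type = 'ZX'.
  gs_rec-period   = '01'.
  gs_rec-amount   = '10.00'.
  APPEND gs_rec TO gt_package.
  gs_rec-doc_type = 'ZA'.
  gs_rec-period   = '11'.
  gs_rec-amount   = '25.00'.
  APPEND gs_rec TO gt_package.

* Start-routine step: remove unwanted records in one statement,
* before any per-record logic runs.
  DELETE gt_package WHERE doc_type = 'ZX'.

* End-routine step: copy each record to its own period and to all
* later periods, so that summing by period gives year-to-date.
  LOOP AT gt_package INTO gs_rec.
    WHILE gs_rec-period <= 12.
      APPEND gs_rec TO gt_ytd.
      gs_rec-period = gs_rec-period + 1.
    ENDWHILE.
  ENDLOOP.
```

With the sample data, gt_package keeps only the 'ZA' record and gt_ytd receives two rows, for periods 11 and 12.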
Tip 14: Data Modeling: Defining Dimensions
• Use as many dimensions as possible
Separate common filter characteristics into their own dimension
• Use line-item dimensions for high cardinality characteristics such
as document numbers
Do not set the high cardinality flag!
• Define related characteristics in the same dimension
Calculate expected number of dimensional entries
Try not to exceed 10% of expected fact table entries
Verify the dimension design after the first data loads using program SAP_INFOCUBE_DESIGNS
• Add all relevant time characteristics
If 0CALMONTH is lowest granularity, add 0CALMONTH2,
0CALQUARTER, 0CALQUART1, 0HALFYEAR, and 0CALYEAR
Tip 15: Implement Semantic Partitioning
• What is it?
An architectural design to enable parallel data loading and
query execution
Partitioning criteria: Year, Region, or Actual/Plan
Tip 15: Implement Semantic Partitioning (cont.)
• Benefits of semantic partitioning:
Reduction in SAP NetWeaver BWA footprint (when partitioned
by year)
Parallel data loading (when not partitioned by year)
Parallel query execution
Best case when partitioning criterion is set as constant
Almost as good to create variables to filter on 0INFOPROV
Archival of a single InfoCube does not impact others
Easier DB maintenance
Performance benefits are so significant … semantic partitioning should be deployed on virtually every data model!
Tip 15: Implement Semantic Partitioning (cont.)
• Example: Semantic partitioning by year
[Diagram labels: DataSource; write-optimized DSO (no SIDs) holding ALL years; two layers of InfoProviders partitioned by year (Current Year - 3, Current Year - 2, Current Year - 1, Current Year, Current Year + 1); History (Summarized); MultiProvider]
Ex: Current Year + 1 = 2010, Current Year = 2009, Current Year - 1 = 2008, Current Year - 2 = 2007, Current Year - 3 = 2006
Tip 16: Switch Off SID Determination for DSOs
• Switch off SID determination for DSOs that are not used in
reporting
SID determination is required only for report DSOs and take up
Tip 17: Activate Parallel Processing
• Parallel processing is possible for most steps in SAP NetWeaver BW
DTP parallel processing
DSO settings: transaction code RSODSO_SETTINGS
Tip 18: Compress Data
• Compression of InfoCubes helps with two things in the dataflow:
It makes the tables that are updated smaller and hence faster to update
The process variant that drops and recreates indices during loading deletes only the indices on the F fact table, so with a small F table the indices are rebuilt much faster
• Recommendation
Compress data that is older than 2-8 days depending on your
Tip 19: Implement Number Range Buffering of DIMs and SIDs
• The number range table (NRIV) is read for every new distinct record that is loaded to SAP NetWeaver BW, either as master data or as a dimension entry in an InfoCube
The NRIV table is accessed with a select for update statement, which can be quite slow
Buffering should be done as follows:
Determine the large number ranges (document numbers, dimensions with documents or many distinct values)
Go to transaction code SNRO and set up buffering
Tip 20: Implement SAP NetWeaver BW Accelerator
• SAP NetWeaver BWA is superior to aggregates when it comes to
improving performance
• Aggregates require continuous tuning as the data and query
requirements change over time
• SAP NetWeaver BWA requires limited maintenance effort in
comparison
Tip 20: Implement SAP NetWeaver BW Accelerator (cont.)
• Disk speed is growing slower than other hardware components
Architectural drivers
1990: Disk-based data storage; simple consumption of apps (fat client UI, EDI); general-purpose, application-agnostic database
2010: In-memory data stores; multi-channel UI, high event volume, cross-industry value chains; application-aware and intelligent data management
Technology drivers
                      1990          2010            Improvement
Addressable memory    2^16          2^64            2^48x
Memory                0.02 MB/$     50.15 MB/$      2502x
CPU                   0.05 MIPS/$   253.31 MIPS/$   5066x
Disk data transfer    5 MBPS        600 MBPS        120x
Network speed         100 Mbps      100 Gbps        1000x
Source: 1990 numbers SAP AG, 2010 numbers, Dr. Berg
Physical hard drive speeds grew by only 120 times since 1990. All other hardware components grew faster.
Source: SAP
Tip 20: Implement SAP NetWeaver BW Accelerator (cont.)
• In this example, the average query execution took 58.8 seconds; after SAP NetWeaver BW Accelerator, the average query took 17.9 seconds (295% faster overall)
Tip 20: Implement SAP NetWeaver BW Accelerator (cont.)
• With SAP NetWeaver BW 7.3, you can have data in SAP NetWeaver BW Accelerator; InfoCubes are not required
• This saves the loading time to the BW cube star schema
• You should implement SAP NetWeaver BWA if you want to consistently improve query performance and data load performance
Tip 20: Implement SAP NetWeaver BW Accelerator (cont.)
• SAP NetWeaver BWA is an appliance, but it does require some
maintenance activities to keep it running smoothly
Monitor SAP NetWeaver BWA utilization to avoid overloading
The rule of thumb is that you should have data that is less than 50% of the memory size
Overloading SAP NetWeaver BWA will cause performance degradation
Compress the cubes and rebuild indices on a regular basis
SAP NetWeaver BWA is not a cheap toy. The licensing is based on blades used.
Avoid using more space than needed by dropping and rebuilding the SAP NetWeaver BWA indices on a regular basis
• Gather information about end-user query requirements and drill-down patterns
• You can suggest aggregates based on query design
• Execute the query multiple times using realistic drill-down scenarios
• Allow time for users to execute queries and collect SAP NetWeaver BW statistics
• You can suggest aggregates based on SAP NetWeaver BW statistics
• Analyze the use of aggregates
• Modify aggregates for optimization
[Screenshots: before and after aggregate creation]
Tip 20: Implement SAP NetWeaver BW Accelerator (cont.)
• Avoid aggregates, but consider them as a backup for SAP NetWeaver BW Accelerator
They come at a cost
Additional step in data loading
Longer runtime for master data and hierarchy activations
Check that the query is using the aggregate via RSRT
Resources
• Joe Darlak of COMERIT, SAP NetWeaver BI and Portals 2010
conference (Orlando, Florida)
Practical Tips to Improve Data Loading Performance and
Efficiency in SAP NetWeaver by Up to 75%
• Training
BW360 BW – Performance and Administration class
7 Key Points to Take Home
• Use the SAP NetWeaver BW statistics to find data loads that
require optimization – target to optimize top 5-10 every month
• Use SE30 to analyze ABAP runtime for DataSources and
transformations
• Review and implement the recommended database parameters for
SAP NetWeaver BW
• Ensure that all SQL statements used in the data loading process
are using indices and that statistics are calculated for the tables
• Make sure that the ABAP coding used in extraction exits and
transformation is optimized
• Review and optimize the data models to avoid unnecessary
processing
Disclaimer
SAP, R/3, mySAP, mySAP.com, SAP NetWeaver®, Duet®, PartnerEdge, and other SAP products and services mentioned herein as well as their
respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All other product and service names mentioned are the trademarks of their respective companies. Wellesley Information Services is neither owned nor controlled by SAP.