Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC

(1)

Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1

Mark Rittman, Director, Rittman Mead Consulting for Collaborate’09, Florida, USA, May 2009

The Oracle BI Applications 7.9.5 consists of a pre‐defined dimensional data warehouse, ETL routines, an Oracle BI Enterprise Edition repository and example dashboards and reports. The ETL routines are built using Informatica PowerCenter and are scheduled and orchestrated using the Oracle Data Warehouse Administration Console.

The data warehouse provided with the Oracle BI Applications is designed to be deployed on either Oracle Database Enterprise Edition, Microsoft SQL Server or IBM DB/2. As such, whilst it uses common data warehousing features such as bitmap indexes, it does not make use of any Oracle‐specific features such as segment compression, partitioning or materialized views. It is possible however to make use of these features, and this paper sets out a methodology for their use with Oracle BI Applications 7.9.5, along with the Oracle Data Warehouse Administration Console 10.1.3.4 or higher.

An Oracle Business Analytics Warehouse Overview

The Oracle Business Analytics Warehouse consists of a number of staging and presentation tables that together support the loading and querying of enterprise data via a conformed dimensional model.

Tables are created as regular heap tables, with a minimal amount of NOT NULL check constraints but no primary keys, foreign keys, partitions or other additional items of metadata. Tables are loaded via Informatica PowerCenter 8.1.1, using the PowerCenter Integration Service and row‐by‐row data loading.

Aggregate tables are created and populated to support key fact tables, using separate ETL processes after the main fact table loads that truncate, and then rebuild the aggregates.

It is however possible to customize the Oracle Business Analytics Warehouse to take advantage of features such as segment compression, partitioning, materialized views and other Oracle data

warehouse features. It is also possible to add additional metadata such as primary key and foreign key constraints, dimensions and other features to support more efficient querying of detail‐level and summarized data. To illustrate how these Oracle features can be used to optimize the loading and querying of the Oracle Business Analytics Warehouse, this paper will take on of the fact tables within the data warehouse and apply these techniques to it.

Performance Optimization Scenario

The Oracle Business Analytics Warehouse contains a table called W_SALES_INVOICE_LINE_F that

contains fact data on the sales invoices generated by the business. It is supported by an aggregate table,

W_SALES_INVOICE_LINE_A that takes data from the original table and summarizes it to improve

query performance. In the sample data set used in this paper, these two tables had the following row

count and size.

(2)

select count(*)

from w_sales_invoice_line_a;

COUNT(*) --- 36264

select count(*)

from w_sales_invoice_line_f;

COUNT(*) --- 478571

select segment_name

, bytes/1024/1024 "Size in MB"

from user_segments

where segment_name in ('W_SALES_INVOICE_LINE_F’, 'W_SALES_INVOICE_LINE_A');

SEGMENT_NAME Size in MB --- --- W_SALES_INVOICE_LINE_A 9 W_SALES_INVOICE_LINE_F 189.125 These tables are loaded by two DAC tasks, and a DAC Task Group:

1. TASK_GROUP_Load_SalesFact calls the following SIL and PLP tasks, and when they have completed recreates any indexes required for supporting queries.

2. SIL_SalesInvoiceLinesFact , which initially drops the indexes on the fact table, then calls either the SIL_SalesInvoiceLinesFact and SIL_SalesInvoiceLinesFact_Full Informatica workflows for incremental and full loads respectively, and then recreates just those indexes required for the rest of the ETL process.

3. PLP_SalesInvoiceLineAggregate , which again drops indexes this time on the aggregate table, then calls either the PLP_SalesInvoiceLinesAggregate_Load and

PLP_SalesInvoiceLinesAggregate_Load_Full Informatica Workflows for incremental and full loads of the aggregate table, then recreates the indexes required for the rest of the ETL process.

Initial Benchmarks

To create baseline figures to compare your optimizations to, start the Oracle Data Warehouse

Administration Console 10.1.3.4 or higher, and create a new subject area within the DAC Repository that

uses the two tasks and one task group listed above.

(3)

Now switch to the Execute view in the DAC Console and create a new execution plan that deploys this new subject area. Create the parameter list for the execution plan and build the set of ordered tasks, so that you are ready to run the execution plan and generate some baseline ETL timings.

Now run the execution plan two times, firstly in FULL mode and then in INCREMENTAL mode, so that you can compare them against subsequent timings to establish the benefit that each feature provides.

Adding Compression to the Fact Table

The first optimization task is to add the COMPRESS clause to the W_SALES_INVOICE_LINE_F fact table, so that rows that are inserted using direct path operations are compressed. Compression is an Oracle Database Enteprise Edition feature, and stores more rows of data into each individual data block to provide two main benefits:

• less space is taken up by data warehouse data, and

(4)

• full table scans can be performed faster as less blocks are required to retrieve all the table’s data.

To test the benefits of compressing the W_SALES_INVOICE_LINE_F fact table, first truncate it and then alter the table to add compression.

SQL> truncate table w_sales_invoice_line_a;

Table truncated.

SQL> alter table w_sales_invoice_line_f compress;

Table altered.

You can then restore the ETL source tables back to their original state and run the full, and then incremental loads into the fact and the aggregate tables in order to test that compression is working as expected. Note that tables will only be compressed when data is inserted, as Informatica PowerCenter by default uses bulk load functionality perform table inserts. Updates, or mixed insert/update loads will not benefit from compression as Informatica will revert to conventional‐path inserts, and of course updates remove compression from Oracle tables unless you are using the Advanced Compression Option to the database.

Partitioning the Fact Table

Partitioning is an option to the Enterprise Edition of the Oracle Database that allows you to split one large physical table into several smaller physical ones, with users still seeing it as one big table but giving you far more flexibility in how you can store and manage data within it. Partitioning is typically used with large fact tables and allows the DBA to assign each partition to separate tablespaces, which can then be stored on different physical disk units and backed up independently. As with table compression though, unfortunately the Data Warehouse Administration console does not have any concept of partitioning and you will therefore have to carry out some additional steps to use this feature.

Tables such as the W_SALES_INVOICE_LINE_F table are normally created by the DAC administrator when initially installing the Oracle Business Analytics Warehouse, by selecting Tools > ETL Management

> Configure from the DAC menu. However there is no provision to create tables using the PARTITION BY(or COMPRESS) clauses, and so we can either create the table outside of the DAC, as we did in the previous step for the COMPRESS clause, or we can use the Actions feature in DAC 10.1.3.4.1 to create our table for us, using the requisite clause, before we try and do a full load into the table.

Actions are a new feature of the 10.1.3.4.1 version of the DAC and allow us to create table, index and task actions.

• Table actions allow us to override the Truncate and Analyze steps carried out on tables during an ETL process

• Index actions allow us to override the creation and dropping actions associated with indexes

• Task actions allow us to execute SQL and PL/SQL steps before or after a task executes.

(5)

Note that the 10.1.3.4 version of the DAC has a bug in it that corrupts the SQL text for an action. You will need to apply a patch over this release to be able to carry out the actions in this paper (available from Oracle Metalink, patch number TBA at the time of writing this paper), or install the 10.1.3.4.1 release when it becomes available.

As DAC actions cannot override the creation step for a table, only the truncate and analyze steps, you will add a new task action for the SIL_SalesInvoiceLinesFact task, that will run when the task is run in FULL mode, and that will drop the existing, non‐partitioned version of the table and recreate it using the required partitioning clause. To start this process, select Tools > Seed Data > Actions > Task Actions from the DAC application menu.

At the Task Action dialog, press New to create a new action, call the action Create

W_SALES_INVOICE_LINE_F Partitioned , save the action and click in the Value text box to set the

table creation scripts.

(6)

Using the action Value dialog, create two steps, one to drop the table and the other to create it.

Make sure the drop step is listed above the create step is set to Continue on Fail, and enter the

following SQL statement to drop the table:

drop table w_sales_invoice_line_f

For the create table step, do not check the Continue on Fail checkbox, then enter the appropriate table creation command into the SQL Statement text box, remembering to add the PARTITION BY clause to the script, and the COMPRESS clause if you would like the table to be compressed as well.

create table w_sales_invoice_line_f (sales_ordln_id varchar2(80 char), ……

x_custom varchar2(80 char) compress

partition by range (cost_center_wid)

(partition cc_1 values less tan 4000)

…….

(7)

Do not place any semi‐colons at the end of the SQL script as this will cause it to fail when run.

Even though this new task action will drop and then recreate, using partitioning, the

W_SALES_INVOICE_LINE_F table, the DAC still holds details of it in its repository together with details of the indexes that are associated with it. As things stand, the DAC would drop these indexes as part of the SIL task and recreate them using the task group task, however it does not normally understand the concept of local indexes and will try and create them without any local or global clause, which has the effect of creating them as global indexes. To instruct the DAC to create our indexes as local indexes, you now need to create an Index Action to override the normal index creation process for these indexes.

The first step in this process is to define the index action, then you will associate it with the relevant indexes. To create the index action, select Tools > Seed Data > Actions > Index Action, name the index action, press Save and then enter the Value editor.

This index action will execute for every index that we associate it with. It consists of an SQL statement

that uses a built‐in DAC function to return the name of the index in question, and the list of columns

that it indexes.

(8)

The SQL statement to be used is shown below, with the DAC functions highlighted in bold:

create bitmap index getIndexName() on getTableName() (getIndexColumns()) local nologging parallel

When the action is used, the DAC will substitute the index name, table name and index columns into the SQL statement and thereby create the index in question as a local index. Next, save the action and return to the main DAC console.

Now you need to associate the index action with the bitmap indexes that need to be created as local

indexes. To do this, navigate to the Indices tab in the Design view of the DAC, and query the repository

to return just the bitmap indexes associated with the W_SALES_INVOICE_LINE_F table. When the list of

indexes are displayed, right‐click anywhere on the list and select Add Actions…

(9)

Using the Add Actions dialog, select Create Index as the action type, Both as the Load Type, and then select the index action that you created in the previous step for the Action Type.

(10)

Now you can associate the task action with the SIL DAC task that populates the

W_SALES_INVOICE_LINE_F table, so that it drops and recreates the table using partitioning when it runs in full load mode. To do this, locate the SIL_SalesInvoiceLinesFact task using the Task tab in the Design view, and select the Actions tab when the task is displayed. Then, select Preceding Action as the Action Type, Full as the Load Type and then select the task action that you created earlier.

Now you can re‐run your execution plan, which will now include these index and task actions in the

steps that are carried out. After the execution plan completes, you can check the list of steps carried out

by the SIL_SalesInvoiceLinesFact task to see your task action being carried out

(11)

Using Materialized Views for Table Aggregation

The Oracle BI Applications use aggregate tables to improve the performance of queries that request aggregated data. These aggregate tables are then mapped into the Oracle Business Intelligence Enterprise Edition metadata layer, so that the BI Server can internally rewrite queries to use these aggregates. Post‐Load Processing (PLP) Informatica workflows load data into these aggregate tables, either as a complete refresh or incrementally, as part of the subject area load.

The Enterprise Edition of the Oracle Database has similar functionality in the form of Materialized Views.

These allow database administrators to define aggregates in the database, which are then used at query runtime to satisfy queries that required aggregated data. Materialized Views can be fast refreshable and can be designed to either satisfy a single aggregation, a range of aggregations or can even be created using an OLAP Analytic Workspace to meet the aggregation needs of an entire star schema.

Like partitioned tables and local indexes, the DAC does not contain out‐of‐the‐box functionality to create and maintain materialized views. To add this functionality to your ETL process, you therefore need to add two new actions to the DAC repository:

1. An action to create the W_SALES_INVOICE_LINE_A object as a materialized view rather than a regular table, and to create the required Materialized View Logs to support fast refresh of this object.

2. An action to perform the refresh of the materialized view, which you will use in place of the regular PLP_SalesInvoiceLinesAggregate_Load and

PLP_SalesInvoiceLinesAggregate_Load_Full Informatica workflows .

To see how the existing aggregated table is populated, and to extract the base SQL statement that you

will need to create the materialized view, open the Informatica Designer application and locate the

PLP_SalesInvoiceLinesAggregate_Load_Full mapping. When you view the mapping logic, you

will see that the W_SALES_INVOICE_LINE_F table is joined to the W_DAY_D table through a Source

(12)

Qualifier mapping, which is then supplemented with a sequence number that is used to populate the ROW_WID column.

Whilst we cannot reproduce the sequence functionality with a materialized view, you will be able to take the data from these two tables and use it to initially populate, and then refresh, the materialized view.

Like the previous example where you create a partitioned table, creation of the materialized view will be performed by a new task action that you will associate with the

PLP_SalesInvoiceLinesAggregate_Load DAC task when run in Full mode. To create the action, select Tools > Seed Data > Actions > Task Action, and create and save a new task action. At the Value dialog, create individual steps to drop and recreate the required materialized view logs, then drop and recreate the materialized view, using the following SQL statements:

1. Drop Materialized View Log on W_SALES_INVOICE_LINE_F

drop materialized view log on W_SALES_INVOICE_LINE_F 2. Drop materialized view log on W_SALES_INVOICE_LINE_F

drop materialized view log on W_DAY_D

(13)

3. Create Materialized View Log on W_SALES_INVOICE_LINE_F

create materialized view log on W_SALES_INVOICE_LINE_F with sequence, rowed

( sales_ordln_id, sales_pckln_id ...

discount_line_flg) including new values

4. Create Materialized View Log on W_DAY_D

create materialized view log on W_DAY_D with sequence, rowed

( row_wid, calendar_date ...

x_custom) including new values 5. Drop Materialized View

drop materialized view W_SALES_INVOICE_LINE_A 6. Create Materialized View

create materialized view W_SALES_INVOICE_LINE_A pctfree 0

build immediate refresh fast as

select 1 as row_wid,

w_sales_invoice_line_f.chnl_type_wid

…

from w_sales_invoice_line_f, w_day_d

where ….

group by … )

Note that with the materialized view definition script, you will need to populate the ROW_WID column with a constant, as it is not possible to create a materialized view that uses an Oracle sequence to populate a column. You will also need to include COUNT() and COUNT(column) columns for all aggregated columns in order for the materialized view to be fast‐refreshable. See the Oracle Data* Warehousing Guide available on http://docs.oracle.com for full details on creating materialized views.

Make sure that you mark all “drop” steps as Continue on Fail, so that the whole ETL process doesn’t

stop because the object did not exist in the first place, something that will happen when you first make

use of the action.

(14)

Now that you have the task action in place to create the materialized view and it’s associated logs, you can now create another task action to refresh the materialized view. To do this, select Tools > Seed Data

> Actions > Task Action again, and this time create a new action to perform the refresh. Enter the following anonymous PL/SQL block into the SQL Statement text box to refresh the materialized view:

begin

dbms_mview.refresh(‘OBAW.W_SALES_INVOICE_LINE_A’) end;

Remember to replace the name of the schema with the one appropriate for your database.

In addition, create another task action called “Dummy Refresh” or similar that you will associate with the PLP task run in full mode, create a new step within it but do not enter any SQL text. This is required as running the task in full mode will create and refresh the materialized view automatically, but we need an action to associate with the task to make it valid.

Once all of your task actions are created, including the ones used in the previous example, your list of

task actions should look like this:

(15)

Next, locate the PLP_SalesInvoiceLinesAggregate_Load task in the DAC Design view and change the Execution Type to SQL File, then replace the Command for Incremental Load command with a call to the Fast Refresh task action created previously, and the Command for Full Load command with a call to the dummy action you created at the same time.

(16)

Then, switch to the Target Tables tab and un‐check the Truncate for Full Load checkbox, otherwise the

DAC will automatically truncate the materialized view just after you have created and refreshed it, and

subsequent fast refreshes will fail with an ORA-32320 error due to the truncation having counted as a

partition maintenance operation (PMOP).

(17)

Finally, to get the DAC to create the materialized view for you when the task is first run, add a new Preceding Task action to the task to run the Create Materialized View task action you created **previously. Be sure to drop the existing W_SALES_INVOICE_LINE_A aggregate table using SQL*Plus** before you re‐run your ETL, as trying to drop it whilst referring to it as a materialized view will cause an error and fail to drop the table.

Delete, and recreate the execution plan for your subject area, to pick up the changes to the PLP

mapping. Once you have done this, you are now ready to re‐run your DAC execution plan, to assess what improvement to processing time these changes have made.

Quantifying the Improvements

When run against the author’s installation of Oracle Business Intelligence Applications 7.9.5, using a subset of the Oracle E‐Business Suite Vision dataset, the following timings were recorded using these scenarios:

1. Baseline run of the standard out‐of‐the‐box ETL routines

2. Addition of the COMPRESS clause to the W_SALES_INVOICE_LINE_F table

3. Adding of partitioning to the W_SALES_INVOICE_LINE_F table, and keeping compression.

(18)

4. All of the above, and replacement of the W_SALES_INVOICE_LINE_A table with a fast‐refresh materialized view.

The results of these scenarios are shown in the table below.

Scenario Load Type Rows

Loaded Elapsed

Time Time

Improvemen t

Vs. Baseline

Fact Table Size

Size

improvemen t

Vs. Baseline

1 – Baseline Full 478571 904 secs n/a 189Mb n/a

1 – Baseline Incremental 553 499 secs n/a 189Mb n/a

2 – With

Compression Full 478571 887 secs 2% 43MB 77%

2 – With

Compression Incremental 553 482 secs 3% 43MB 77%

3 – With Partitioning

&

Compression

Full 478571 834 secs 7% 44MB 77%

3 – With Partitioning

&

Compression

Incremental 553 470 secs 6% 44MB 77%

4 – With Partitioning, Compression and

Materialized View

Full 478571 437 secs 51% 44MB 77%

4 – With Partitioning, Compression and

Materialized View

Incremental 553 334 secs 33% 44MB 77%

Overall, using table compression on the main fact table reduced its storage requirement by 77%, from 189MB to 44MB. Using a fast‐refreshable materialized view, along with partitioning and compression, reduced the ETL time for the main fact table and associated aggregate by 51% for a full load and 33% for an incremental load.

In addition, queries against the partitioned version of the fact table that can benefit from partition

elimination can experience significantly lower execution plan costs. The following query and execution

plan were executed against the original, non‐partitioned version of the W_SALES_INVOICE_LINE_F

table:

(19)

When the table is subsequently partitioned though, queries that benefit from partition elimination show a significant drop in their cost. In the example below, the table is being partitioned on

COST_CENTER_WID, whereas in reality it is likely to be partitioned on a date column, so that queries that only require data for a particular range of months or years can avoid scanning the entire table.

Further Opportunities for Optimization

As well as the actions outlined in this paper, there are further opportunities for optimizing the ETL and

query processing carried out by the Oracle BI Applications when working with the Enterprise Edition of

the Oracle Database.

(20)

For example, the DAC does not create primary key or foreign key constraints on the tables that it creates, which together with the default setting for the STAR_TRANSFORMATION_ENABLED parameter for newly created databases, in most cases means that star transformations are not used when the database handles queries against fact tables that involve filtering against two or more dimensions. You could therefore add addition task actions to SIL and PLP tasks to create and drop these constraints, possibly using the RELY NOVALIDATE clauses to minimize unnecessary redo, set the

STAR_TRANSFORMATION_ENABLED parameter appropriately and take advantage of this key Oracle data warehousing feature.

Another optimization possibility is to use the OLAP Option to Oracle Database 11g to replace the materialized view outlined in this paper with a Cube Organized Materialized View, which could provide aggregations for an entire star schema at multiple levels of aggregation. You would need to use Oracle Analytic Workspace Manager (a free download from http://otn.oracle.com) to create the cube

organized materialized view, but once it is created it could be refreshed in the same manner as the materialized view that this paper describes.

Conclusions

The ETL and query optimization techniques provided out‐of‐the‐box with the Oracle BI Applications provides are appropriate for generic databases, but can be improved apon if you make use of the specific data warehouse optimizations available on your actual target database. The Enterprise Edition of the Oracle Database provides many such features including segment compression, materialized views and partitioning, and this paper sets out how they can be used in conjunction with the new “actions”

feature available with the Oracle Data Warehouse Administration Console 10.1.3.4 and higher.

About the Author

Mark Rittman is an Oracle ACE Director and is co‐founder of Rittman Mead Consulting, a specialist Oracle partner delivering Oracle data warehousing, business intelligence and performance management solutions. Mark is co‐chair of the ODTUG BI&DW SIG, is editor of Oracle Scene, the magazine of the UK Oracle User Group, writes regularly for Oracle Magazine, Oracle Technology Network and the ODTUG Technical Journal, and runs a blog at http://www.rittmanmead.com/blog. Mark can be reached at [email protected] if you would like to discuss the contents of this white paper.