slay Bad data In essBase 2
2.4 Four steps to Data Quality anD suCCess
2.4.1 Extract Essbase Dimensionality to a Table
If ASosamp was fully built from known dimension sources that, in turn, were con-strained against the fact data, there would be no need to test that the data matches the database metadata; it could not be any other way. This absolute match of data and metadata achieved through referential integrity is the hallmark of a data warehouse.
As I have noted, however, many Essbase databases are not built from data warehouses.
Frequently, Essbase databases are built from multiple data sources, some of which are more reliable than others. Some are downright unreliable. I am talking about spread-sheets and text files that are “always right,” except when they are not, which is often.
Figure 2.4 Mnemonic naming convention in Designer.
Table 2.2
Object Type Icon Prefix Sample
Interface INT INT_Years
OS Command OS OS_MaxLDimLoad
Load Plan PLAN PLAN_ASOsamp
Procedure PROC PROC_LogSuccessStatus
Scenarios SCN SCN_PKG_HMISSINGMEMBERS
Variable VAR VAR_Object
moreover, dimension maintenance of some but not all dimensions is often manual.
What happens when part of a data source does not match dimensionality or an admin-istrator erroneously renames a member? We both know the answer: kABoom! Insert the mental image of a bad data mushroom cloud rising ominously over your Essbase database.
A process that reverses out the dimensions to a SQL table so that constraints in the form of InnEr JoIns can occur is the first step in guaranteeing that the fact data can be successfully loaded into the database.
2.4.1.1 Extracting Dimensions While this step could be done through one of the sev-eral freeware outline extractors (Applied olap’s outlineExtractor and Sebastién roux’s outlinereader are the two most common) or via Essbase 11.1.2.x’s maxL “export outline”
command, oDI was used for this part because it can write directly to a SQL table. The freeware products only write to flat files and maxL outputs only to xmL.
Before ASosamp’s dimensions are extracted to the Dimensions table through a series of dimension-specific Interfaces, Data models for Essbase and the Dimensions table target and their concomitant Physical and Logical schemas must be defined first in topology manager. your environment will dictate the specific schema and other defini-tions. After you reverse the objects, the Essbase ASosamp and SQL table Dimension Data models will look like table 2.3.
The Dimensions Data model contains some fields that are for the dimension tagged as “Accounts” only, e.g., timeBalance, varreporting, etc. These properties are ignored for non-Accounts dimensions on reversal and are set to nuLL. oDI will automatically map Essbase and SQL field names in Interfaces if the names are identical. The Essbase Data model names are defined by oDI so it is up to you to define your SQL target fields in the same way. The alternative is to manually map fields, but why would anyone want to create manual, error-prone Interface mapping requirements?
Table 2.3
Staging Areas and Work Tables
ODI uses a concept called Extract, Load, and Transform (E-LT), which means that it performs data processing on the target using the target’s own internal language. This target orientation can lead to a smaller footprint memory/disk footprint, fewer servers, and better performance than a traditional ETL tool, which does its processes in a separate staging area using its own processing engine. In ODI, E-LT’s target orientation means that any temporary work tables are dynamically created and used solely in the target environment.
By default, these temporary tables are deleted to clean up the data target. However, this automatic deletion can be undesirable as the work tables can be a rich source of debugging information if something goes wrong. It is nice to be able to see what the last successful (or otherwise) pull of information was in an n-step process beyond what the ODI Operator scheduler and job console shows.
To keep those temporary tables from being deleted, go to the Flow tab in a given Interface, select the Source Set object (that’s what you’re reading from) and, as in Figure 2.5, set the DELETE_TEMPORARY_OBJECTS option from the default TRUE to FALSE. ODI will automatically drop and recreate the tables on the next run.
However, Figure 2.6 illustrates the problem when E-LT processing is used as these temporary tables appear in the database/schema list as below.
This is a minor annoyance with one work table; when extended to multiple work tables, it can clutter the table list and make navigation difficult. Moreover, it is highly likely that the SQL username ODI is using will have table write access, but will not have table create rights in the target environment, resulting in execution errors when ODI tries to create those tables. Can ODI’s temporary work tables exist in an E-LT paradigm?
Yes, by using a different data store for its processing than the target. While this can be done in the ODI server’s memory, using either the MEMORY_ENGINE or the SUNOPSIS_MEMORY_ENGINE as defined in the Interface Definition
Figure 2.6 ODI Data Model of target SQL databases.
Figure 2.5 DELETE_TEMPORARY_OBJECTS ODI Interface Flow Dialog Box.
tab, the ephemeral nature of these RAM-based processing areas makes debugging impossible as once the process is finished, any temporary tables are destroyed. Additionally, while the in-memory engines are fast, they do not scale well as they are limited by available memory.
A better approach is to use a separate SQL data store that has already been defined as an ODI Data Model. That means that there are three data stores in a given process: the source, the staging area, and the target. “What,” you gasp as you fight for air, “break the E-LT architecture after Oracle went through all that trouble to make it faster/better/
smarter?” You bet, because with Essbase the concept of E-LT has already struck out:
• There are no tables at the Essbase target as Essbase is not relational, so tables cannot be created there, only in some third area. Strike one against E-LT for Essbase.
• There is no data manipulation language at the Essbase target. Essbase has MDX formulas and external ASO Calc Scripts, BSO Calc Scripts, MaxL, Esscmd, and the various APIs, but nothing that acts as a data manipu-lation language. Strike two against E-LT for Essbase.
• The lack of target tables and data manipulation functionality means something has to perform Extract, Transform, and Load processing. Essbase thus requires an ODI server agent and engine. Note: the ODI server and agent can coexist with many other products on a given physical server; it need not truly be a separate box.
Strike three against E-LT for Essbase.
• No change on your part is required; the default processing within the in-memory engine using implicit SQL puts paid to the notion of Essbase E-LT: Essbase with ODI is not E-LT. For your purposes, this is great because you can take advantage of the traditional ETL architecture by picking a separate relational data store to cleanly store those temporary work tables, as shown in Figure 2.7. Choosing this relational data store is as easy as ticking the “Staging Area Different From Target” box on the Interface Overview tab and then selecting the relational staging Data Model, in this example named ODISTAGE.
This selection results in temporary tables written to the separate staging area, thus providing the best of both worlds:
supplemental debugging information and an uncluttered, easy to read target data store.
Each extracted dimension requires its own Interface. There are only a few things to note in Figure 2.8’s Interface:
• The Dimname field is hard coded with the dimension name.
• Identical field names map automatically.
• membername is the primary key for the Dimensions table.
• All mapping takes place in the Staging Area.
Figure 2.7 Selecting a different staging area.
If you got excited by the dimension name constant in the target, e.g., Age, geography, etc., do not bother; you are stuck with setting up each dimension separately as each Interface is inextricably linked to a physical Essbase dimension as shown in the source pane on the left. After something as important (and permanent) as that is put into play, a mere string constant is not material. Alas, this is how oDI works when it comes to revers-ing Essbase dimensions short of programmatically generatrevers-ing Interfaces through code.
This is an advanced book, but not that advanced. This object-based approach is used for reversing dimensions because of its ease; the other components of this solution use more flexible approaches. And, if you can figure out a way to generalize this extraction portion, please let me know (contact information is on page x).
Compiling the individual Interfaces to Scenarios and then linking them together in a Package, as shown in Figure 2.9, allows all of the dimensions to be extracted to the Dimensions table.
Figure 2.8 INT_ExtractAge Interface.
Figure 2.9 Package PKG_Extract_ASOsamp.