2
PowerCenter Developer:
Tips & Tricks for Mapping Designer
Lingaraju Ramasamy (Raju),
Technical Architecture Manager
Informatica Professional Services
Agenda
•
Introduction
•
Architecture Best Practices
•
Mapping Tips & Tricks
•
Transformation Techniques
•
Use of Metadata
•
Repository maintenance
4
Presenter Contact
•
Lingaraju Ramasamy (Raju)
•
[email protected]
•
408-368-2475 (Mobile)
•
Technical Architecture Manager, Informatica
Professional Services
6
Architecture Best Practices
•
Consistency
•
Applying consistent standards reduces long term complications
•
Naming Conventions (Velocity)
•
Descriptions
•
Environments
•
Documentation (Hyperlink to SharePoint)
•
Modularity
•
Develop according to a modular design
•
Common Error Handling
•
Reprocessing
•
Mapping Assistants
8
Architecture Best Practices
•
Scalability
•
Keep volumes in mind in order to create efficient mappings
•
Caching
•
Queries
•
Partitioning
•
Initial vs. Incremental Loads
•
Simplicity
•
Multiple simple processes are often better than few complex
processes
•
Multiple mappings
•
Simple Queries
•
Staging Tables
•
Advantages
Sample Complex Mapping
EXP_SEQ_HEA DER_HIERARC HY_RECS11 EXP_SEQ_DET AIL_HIERARCH Y_RECS SQ_SC_T_STR _ATTR_OUTL_ ORG_WK SC_T_STR_AT TR_OUTL_LAN G_FN (Oracle) JAVA_GENERA TE_MSGID JAVA_GENERA TE_SESSID JNR_OUTL_HR _HOL JNR_OUTL_HR S SEQ_HR_HDR SC_T_STR_AT TR_OUTL_HRS _WK1 (Oracle) SC_T_STR_AT TR_OUTL_TM_ FN (Oracle) SC_T_STR_AT TR_OUTL_ORG _WK (Oracle) SC_T_STR_AT TR_OUTL_HRS _WK (Oracle) SC_T_STR_AT TR_OUTL_HOL _FN (Oracle) SC_T_STR_AT TR_OUTL_FN1 (Oracle) SEQ_HOL TC_TRANSACT ION_RECS RTR_HDR_DE T_DATA SEQ_PK_FK_O UTLET EXP_SEQ_DET AIL_HOLIDAY_ RECS EXP_HRS_BOO KEND EXP_SEQ_REC _TM_ZONE SEQ_HRS EXP_LANG_BOOKEND JNR_OUTL_LA NG_HOL_HRS
EXP_SEQ_HEA DER_TEAM_M EMBER_RECS SQ_TEAM_ME
MBER EXP_TEAM_ME MBER SEQ_TEAM_ME MBER JNR_TEAM_ME MBER JNR_HRS_HDR EXP_HRS_HDR SQ_SC_T_STR _ATTR_OUTL_ FN1 EXP_OUTL_BO OKEND SQ_SC_T_STR _ATTR_OUTL_ HRS_WK SEQ_ORG JNR_ORGANIZ ATION SEQ_LANG EXP_HOL_BOO KEND EXP_SEQ_SER VICE_REC_3 EXP_HEADER_ HOURS EXP_CREATE_ HDR_ELEMEN TS EXP_SEQ_REC _GEO_TYPE SQ_SC_T_STR _ATTR_OUTL_ HOL_FN EXP_SEQ_REC _BASIC SEQ_OTHERS EXP_DETAIL_H OURS SRT SQ_SC_T_STR _ATTR_OUTL_ LANG_FN SQ_SC_T_STR _ATTR_OUTL_ HRS_WK1 EXP_CHK_NE W_RECS EXP_OUTL_OR G_BOOKEND SEQ_OUTLET EXP_PASS_TH ROUGH
10
Simplified Complex Mapping
RTR_HDR_DE T_DATA EXP_DETAIL_L ANG EXP_CHK_NE W_RECS SEQ_OUTLET LANG_DETAIL_ STR_ATTR_OU TL_WK (Oracle) LANG_HEADER _SC_T_STR_A TTR_OUTL_W K (Oracle) SQ_SC_T_STR _ATTR_OUTL_ CHG_WK SC_T_STR_AT TR_OUTL_CHG _WK (Oracle) SC_T_STR_AT TR_OUTL_LAN G_FN (Oracle) SC_LKP_GET_ MSGID SC_EXP_CREA TE_HDR_ELEM ENTS SRT_INCM_RE CS SEQ_LANG SQ_SC_T_STR _ATTR_OUTL_ LANG_FN EXP_HEADER_ LANG SEQ_OTHERS EXP_PASS_TH ROUGH EXP_OUTL_BO OKEND JNR_OUTL_LA NG_HOL_HRS EXP_LANG_BO OKEND SQ_SC_T_STR _ATTR_OUTL_ HRS_WK SC_LKP_GET_ MSGID SC_T_STR_AT TR_OUTL_HRS _WK (Oracle) SC_EXP_CREA TE_HDR_ELEM ENTS DETAIL_HOUR S_SC_T_STR_ ATTR_OUTL_ WK1 (Oracle) HOURS_HDR_ SC_T_STR_AT TR_OUTL_WK ( Oracle) EXP_SEQ_DAY _OF_WEEK EXP_SRC_BOO KEND SEQ_HRS SQ_SC_T_STR _ATTR_OUTL_ CHG_WK SC_T_STR_AT TR_OUTL_CHG _WK1 (Oracle) EXP_HRS_BOO
KEND EXP_CHK_NE W_RECS RTR_HDR_DE T_DATA DAY_OF_WEEK _SC_T_STR_A TTR_OUTL_W K2 (Oracle) SRT_INCM_RE CS EXP_DETAIL_H OURS EXP_HEADER_ HOURS JNR_OUTL_HR S SEQ_OTHERS EXP_PASS_TH ROUGH SEQ_OUTLET SC_T_STR_AT TR_OUTL_CHG _WK (Oracle) EXP_CHK_NE W_RECS EXP_LANG_BO OKEND SC_T_STR_AT TR_OUTL_LAN G_FN (Oracle) SEQ_OUTLET LANG_DETAIL_ STR_ATTR_OU TL_WK (Oracle) RTR_HDR_DE T_DATA EXP_DETAIL_L ANG SQ_SC_T_STR _ATTR_OUTL_ CHG_WK SC_LKP_GET_ MSGID SC_EXP_CREA TE_HDR_ELEM ENTS SRT_INCM_RE CS LANG_HEADER _SC_T_STR_A TTR_OUTL_W K (Oracle) SEQ_LANG SQ_SC_T_STR _ATTR_OUTL_ LANG_FN EXP_HEADER_ LANG SEQ_OTHERS EXP_PASS_TH ROUGH EXP_OUTL_BO OKEND JNR_OUTL_LA NG_HOL_HRS
Staging
1
Staging 2
Staging 3
Staging –> WebServices
JAVA_GEN_MS G_ID SEQ_ID EXP_GET_SEQ _NUM SQ_SC_T_STR _ATTR_SITE_F N WSC_STR_ATT R_SAVE_SITE EXP_SRC_BOO KEND JAVA_GEN_SE SSID SC_T_STR_AT TR_SITE_FN1 ( Oracle) SC_T_STR_AT TR_SITE_FN (O racle)
12
Mapping Tips
•
Sources and Targets
•
Use shortcuts from shared folders
•
Extract only what is necessary
•
Limit reads on source
•
Distinguish between similar sources and targets
• Example
• DIM_CUSTOMER1 = DIM_CUSTOMER_insert
Mapping Tricks
Parameters & Variables
• Reduce overhead of creating multiple mappings • Replace hard coded values
• Use to incrementally extract data
Example
UpdateDateTime >= TO_DATE (‘$$PREV_RUN_TS’)
14
Mapping Tricks
Parameters & Variables
•
Assign Parameter/Variable values in a Session
•
Pass values from one session to a subsequent session in same
workflow/worklet
•
On Components Tab in Session Properties
•
Use user-defined workflow/worklet variables
Mapping Tricks
Built-in Mapping Variables
•
Mapping Name
•
Workflow Name
•
Session Name
•
Integration Service Name
•
Repository Service Name
•
Repository User Name
•
Folder Name
•
Session Run Mode
•
Source Table Names
16
Mapping Tricks
Group Expression (Anchor transformation)
•
Add expression transformation after a source qualifier and
before a target
•
If source or target definition changes, reconnecting ports is
much easier
Mass Update
•
pmrep massupdate
•
Session properties
•
Session config attributes
•
Transformation instance attributes
18
Mapping Assistants
Preview Data
•
View Data
• Accommodate anomalies early
• Verification of extraction/loading strategies
•
Type of Data
• Source/Targets
• Relational, Flat file
• XML Files
•
For further analysis, use Informatica Analyst
• Analyze the content, quality and structure of source data
Mapping Assistants
Mapping Wizard
• Pass-Through
• Slowly Changing Dimension
• Type 1 Dimension (No History)
• Type 2 Dimension (All History)
• Version Data
• Flag Current
• Effective Date Range
20
Mapping Assistants
•
Standardize specifications
•
Enhance collaboration between analyst and developer
•
Improve documentation & audit ability of business logic
Mapping Analyst for Excel (MAE)
Data Analyst
Defines Business Terms
Specifies Transformation Rules Standardize Excel format
DI Developer Augments, Tunes Generated Mappings from Specifications Generate Mapping Generate Specification
Mapping Assistants
• Define consistent methodology & structure for data integration projects
• Build custom wizard based on pattern without coding
• Generate multiple mappings at one time
• Document data flow
Mapping Architect for Visio (MAV)
DI Architect
Creates & Publishes mapping template DI Developer Augments, Tunes Generated Mappings Generate Mappings Informatica Toolbar Informatica Stencil Drawing Window Template File Publish Template
22
Mapping Assistants
•
Mapping Architect for Visio (MAV)
Case Study #1•7 templates were used across 2 projects to generate 600 mappings
•97% of mappings were automatically generated and required no additional changes
•3% needed to be manually modified or custom developed
Case Study #2
•1 template was used to create 150 mappings for a data migration project along with PowerCenter sessions and workflows
•Total effort was less than one day
•Equivalent effort to create 150 mappings manually would have been 2 weeks (10x effort)
24
•
Apply Default Query when possible
• Utilize SQ Attributes
(i.e., User Defined Join, Source Filter)
•
Understand advantages and limitations of the SQL override
PROS• Utilize database optimizers (i.e., indexes, hints)
• Can accommodate complex queries CONS
• Processing impacts database resources
• Lose transformation logic in metadata searched
• Unable to utilize Partitioning or Pushdown Optimization options
•
Minimize complicated queries
•
Add the SQL Override Query to the Description
Source Qualifier
•
Understand Port process order
• INs or IN/OUTs
• VARIABLEs
• OUTPUTs
•
Reduce code complexity
• Use local variables
• Redundant calculations
• Check previous record
• Provide comments (-- or //) in expressions
•
Optimize Expressions
• Numeric operations are faster than string operations
• Operators are faster than functions
• Un-Nest complicated logic (use IIF or DECODE)
Expressions
26
Transformation Tips
•
Build complex expressions and reuse
them within repository
•
Two Types:
• Public: Callable from any transformation expression
• Private: Only callable from another user-defined function
•
Include any valid function except
aggregate functions
•
Can export to XML Files
Transformation Tips
•
Consider Source Qualifier with
a filter to limit rows within
relational sources
•
Filter as close to the source as
possible
•
Replace multiple filters with a
router
•
Pertaining to routers, rows will
go to each path where the
criteria is TRUE
28
•
Use sorted input to decrease use of aggregate caches
•
Limit connected input/output or output port
•
Filter data before aggregating
•
Use as early as possible
Joiners
•
Perform joins in Source Qualifier when possible
• Limit use to heterogeneous and flat file sources
•
Perform normal joins when possible
•
Join sorted input when possible
•
Designate the master source as the source with fewer rows
Transformation Tips
Aggregators
Transformation Tips
•
Using SQL Override in Lookup
• Similar to Source Qualifier, avoid when possible
• Can apply Parameters and Variables
• Can query against multiple tables in same database
• Suppress ORDER BY statement by appending two dashes (--)
•
Add indexes to database columns
•
Replace large lookup tables with joins in the Source Qualifier
when possible
•
Relational Lookups should only return ports that meet the
condition
•
Remove all ports not used downstream or in the SQL Override
30
Transformation Tips
• Lookup Cache Types• Persistent Caches
• Save lookup cache files for reuse
• Dynamic Caches
• Retains the latest changes to data as rows pass through the mapping
• Updating a master table
• Real-time sessions
• Slowly changing dimension
• Cache Sizes
• Eliminate Paging
• Stores condition values in index, .idx, files
• Stores output values in data, .dat, files
• Apply the Cache Calculator in Session
Transformation Tips
• Cache Updates• Update the dynamic lookup cache with results of an expression
Use Case: Update QTY on hand for new timestamp
Add WHERE incoming row timestamp > cached timestamp
• SQL Overrides for Uncached Lookups
• You must choose the Use Any Value on Lookup Policy on Multiple Match condition
• Multiple Rows Return
Use Case: Aggregate customer orders and store the total value
• Database Deadlock Resilience
Lookups
32
Transformation Tricks
•
Perform a lookup on an application source that is not a
relational table or file
•
Partial pipeline contains Source & Source Qualifier but no target
•
Integration Service reads source data and passes to Lookup
Transformation to create cache
•
Create partitions to improve performance
Transformation Tips
•
Transaction in PowerCenter is
a set of rows bound by commit
or rollback
•
Control commit and rollback
transactions based on a row or
set of rows that pass through
the transformation
Use Case: Each invoice number is committed to the target database as a single transaction
•
Change Tracing Level to
‘Terse’
• At higher tracing levels, every
34
Transformation Tips
Associated Source Qualifier
•
Use ASQ when MQ data is flat file or COBOL
Transformation Tips
•
Non-Reusable the counter is 0
•
Performance will be affected if cached is low
•
Increase of caching will improve the performance
•
This doesn’t involve any database operation
•
The caching allows to reserve number of rows in the
memory
36
File Source and Target Commands
Commands for File Sources
•
Use a command to generate flat file source data
input rows or file list or a session
•
Unix – any valid UNIX command or shell script
•
Windows – any valid DOS or batch file.
•
Service process variables ($PMSourceFileDir)
can be used in the command.
38
File Source Command
•
Input Type: Command (default: file)
•
Command Type: Command Generating File List
•
Command writes list of file names to stdout
•
PowerCenter interprets this as a file list.
File Source Command
•
Input Type: Command
(default: file)
•
Command Type: Command Generating Data
•
Command generates rows to stdout
•
Flat file reader reads directly from stdout
•
Removes need for staging data
•
Example use, reading compressed files
•
uncompress –c $PMSourceFileDir/myCompressedFile.Z
40
File Target Command
•
Output Type: Command
(default: file)
•
Flat file writer writes to the command
•
Writing compressed files
•
compress -c - > $PMTargetFileDir/myCompressedFile.Z
•
Sorting output data
Filename Port
•
Input Filename can be processed and passed
to target
42
Filename Port
•
Write records to a dynamically named flat file
Target Filename
44
Change Detection for Updates
•
Challenge: a record with a lot of columns needs
to be checked for changes
•
Solution: calculate an MD5 checksum on the
columns and use a lookup to compare the value
with any existing record
Sample Change data detection
•
Calculate MD5 for all columns except key
•
Create lookup for primary key and MD5 value
46
Querying the PowerCenter repository
•
Query in designer
•
Limit querying on OPB tables
•
Use the MX views instead
•
Utilize Reporting Service
•
Use Meta Query tool
48
50
Repository Maintenance
Purge repository versions
•
Define version strategy for Dev,
QA and Prod
•
Archieve if required for future
analysis
•
Purge unwanted versions
•
Run the purge in regular interval
daily, weekly or monthly
pmrep connect -r $REPOSITORY_NAME -d $DOMAIN_NAME -n
$ADMIN_USER -X INFA_ENCRYPTED_PASSWD
Repository Maintenance
Purge repository logs
•
Define log strategy for Dev, QA and Prod
•
Archieve if required for future analysis
•
Purge unwanted logs
•
Run the purge in regular interval daily, weekly or monthly
Compute statistics on metadata tables
pmrep connect -r $REPOSITORY_NAME -d $DOMAIN_NAME
-n $ADMIN_USER -X INFA_ENCRYPTED_PASSWD
pmrep truncatelog -t $DAYS_TO_KEEP
52