Introduction ETL Life Cycle
The typical real-life ETL cycle consists of the following execution steps: 1. Cycle initiation
2. Build reference data 3. Extract (from sources) 4. Validate
5. Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates) 6. Stage (load into staging tables, if used)
7. Audit reports (for example, on compliance with business rules. Also, in case of failure, helps to diagnose/repair) 8. Publish (to target tables)
9. Archive 10. Clean up Best practices
Four-layered approach for ETL architecture design
• Functional layer: Core functional ETL processing (extract, transform, and load).
• Operational management layer: Job-stream definition and management, parameters, scheduling, monitoring, communication and alerting.
• Audit, balance and control (ABC) layer: Job-execution statistics, balancing and controls, rejects- and error-handling, codes management.
• Utility layer: Common components supporting all other layers. Use file-based ETL processing where possible
• Storage costs relatively little
• Intermediate files serve multiple purposes: • Used for testing and debugging
• Used for restart and recover processing • Used to calculate control statistics
• Helps to reduce dependencies - enables modular programming. • Allows flexibility for job-execution and -scheduling
• Better performance if coded properly, and can take advantage of parallel processing capabilities when the need arises.
Use data-driven methods and minimize custom ETL coding • Parameter-driven jobs, functions, and job-control • Code definitions and mapping in database
• Consideration for data-driven tables to support more complex code-mappings and business-rule application. Qualities of a good ETL architecture design :
• Performance • Scalable • Migratable
• Recoverable (run_id, ...)
• Operable (completion-codes for phases, re-running from checkpoints, etc.)
• Auditable (in two dimensions: business requirements and technical troubleshooting) What is Informatica
Informatica Power Center is a powerful ETL tool from Informatica Corporation. Informatica Corporation products are:
• Informatica Power Center • Informatica on demand
• Informatica B2B Data Exchange • Informatica Data Quality • Informatica Data Explorer
Informatica Power Center is a single, unified enterprise data integration platform for accessing, discovering, and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise at any speed. Informatica Power Center Editions :
Because every data integration project is different and includes many variables such as data volumes, latency requirements, IT infrastructure, and methodologies—Informatica offers three Power Center Editions and a suite of Power Center Options to meet your project’s and organization’s specific needs.
• Standard Edition • Real Time Edition • Advanced Edition
Informatica Power Center Standard Edition:
Power Center Standard Edition is a single, unified enterprise data integration platform for discovering, accessing, and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise to improve operational efficiency.
Key features include:
• A high-performance data integration server • A global metadata infrastructure
• Visual tools for development and centralized administration
• Productivity tools to facilitate collaboration among architects, analysts, and developers . Informatica Power Center Real Time Edition :
Packaged for simplicity and flexibility, Power Center Real Time Edition extends Power Center Standard Edition with additional capabilities for integrating and provisioning transactional or operational data in real-time. Power Center Real Time Edition provides the ideal platform for developing sophisticated data services and delivering timely information as a service, to support all business needs. It provides the perfect real-time data integration complement to service-oriented architectures, application integration approaches, such as enterprise application integration (EAI), enterprise service buses (ESB), and business process management (BPM).
Key features include:
• Change data capture for relational data sources • Integration with messaging systems
• Built-in support for Web services
• Dynamic partitioning with data smart parallelism • Process orchestration and human workflow capabilities Informatica Power Center Real Time Edition :
Power Center Advanced Edition addresses requirements for organizations that are Standardizing data integration at an enterprise level, across a number of projects and departments. It combines all the capabilities of Power Center Standard Edition and features additional capabilities that are ideal for data governance and Integration Competency Centers. Key features include:
• Dynamic partitioning with data smart parallelism • Powerful metadata analysis capabilities
• Web-based data profiling and reporting capabilities Power Center includes the following components:
• Power Center domain • Administration Console • Power Center repository • Power Center Client • Repository Service • Integration Service • Web Services Hub • SAP BW Service • Data Analyzer • Metadata Manager
• Power Center Repository Reports POWERCENTER CLIENT
The Power Center Client consists of the following applications that we use to manage the repository, design mappings, mapplets, and create sessions to load the data:
1. Designer 2. Data Stencil 3. Repository Manager 4. Workflow Manager 5. Workflow Monitor 1. Designer:
Use the Designer to create mappings that contain transformation instructions for the Integration Service.
The Designer has the following tools that you use to analyze sources, design target Schemas, and build source-to-target mappings:
• Source Analyzer: Import or create source definitions. • Target Designer: Import or create target definitions.
• Transformation Developer: Develop transformations to use in mappings. You can also develop user-defined functions to use in expressions.
• Mapplet Designer: Create sets of transformations to use in mappings.
• Mapping Designer: Create mappings that the Integration Service uses to Extract, transform, and load data.
2.Data Stencil
Use the Data Stencil to create mapping template that can be used to generate multiple mappings. Data Stencil uses the Microsoft Office Visio interface to create mapping templates. Not used by a developer usually.
3.Repository Manager
Use the Repository Manager to administer repositories. You can navigate through multiple folders and repositories, and complete the following tasks:
• Manage users and groups: Create, edit, and delete repository users and User groups. We can assign and revoke repository privileges and folder Permissions.
• Perform folder functions: Create, edit, copy, and delete folders. Work we perform in the Designer and Workflow Manager is stored in folders. If we want to share metadata, you can configure a folder to be shared.
• View metadata: Analyze sources, targets, mappings, and shortcut dependencies, search by keyword, and view the properties of repository Objects. We create repository objects using the Designer and Workflow Manager Client tools.
We can view the following objects in the Navigator window of the Repository Manager:
• Source definitions: Definitions of database objects (tables, views, synonyms) or Files that provide source data. • Target definitions: Definitions of database objects or files that contain the target data.
• Mappings: A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Integration Service uses to transform and move data.
• Reusable transformations: Transformations that we use in multiple mappings. • Mapplets: A set of transformations that you use in multiple mappings.
• Sessions and workflows: Sessions and workflows store information about how and When the Integration Service moves data. A workflow is a set of instructions that Describes how and when to run tasks related to extracting, transforming, and loading Data. A session is a type of task that you can put in a workflow. Each session Corresponds to a single mapping.
4.Workflow Manager :
Use the Workflow Manager to create, schedule, and run workflows. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data.
The Workflow Manager has the following tools to help us develop a workflow: • Task Developer: Create tasks we want to accomplish in the workflow.
• Work let Designer: Create a worklet in the Worklet Designer. A worklet is an object that groups a set of tasks. A worklet is similar to a workflow, but without scheduling information. We can nest worklets inside a workflow.
• Workflow Designer: Create a workflow by connecting tasks with links in the Workflow Designer. You can also create tasks in the Workflow Designer as you develop the workflow.
When we create a workflow in the Workflow Designer, we add tasks to the workflow. The Workflow Manager includes tasks, such as the Session task, the Command task, and the Email task so you can design a workflow. The Session task is based on a mapping we build in the Designer.
We then connect tasks with links to specify the order of execution for the tasks we created. Use conditional links and workflow variables to create branches in the workflow.
5.Workflow Monitor
Use the Workflow Monitor to monitor scheduled and running workflows for each Integration Service. We can view details about a workflow or task in Gantt chart view or Task view. We Can run, stop, abort, and resume workflows from the Workflow Monitor. We can view Sessions and workflow log events in the Workflow Monitor Log Viewer.
The Workflow Monitor displays workflows that have run at least once. The Workflow Monitor continuously receives information from the Integration Service and Repository Service. It also fetches information from the repository to display historic Information.
Services Behind Scene
INTEGRATION SERVICE PROCESS
The Integration Service starts an Integration Service process to run and monitor workflows. The Integration Service process accepts requests from the Power Center Client and from pmcmd. It performs the following tasks:
• Manages workflow scheduling. • Locks and reads the workflow. • Reads the parameter file. • Creates the workflow log.
• Runs workflow tasks and evaluates the conditional links connecting tasks. • Starts the DTM process or processes to run the session.
• Writes historical run information to the repository. • Sends post-session email in the event of a DTM failure. LOAD BALANCER
The Load Balancer is a component of the Integration Service that dispatches tasks to achieve optimal performance and scalability. When we run a workflow, the Load Balancer dispatches the Session, Command, and predefined Event-Wait tasks within the workflow.
The Load Balancer dispatches tasks in the order it receives them. When the Load Balancer needs to dispatch more Session and Command tasks than the Integration Service can run, it places the tasks it cannot run in a queue. When nodes become available, the Load Balancer dispatches tasks from the queue in the order determined by the workflow service level. DTM PROCESS
When the workflow reaches a session, the Integration Service process starts the DTM process. The DTM is the process associated with the session task. The DTM process performs the following tasks:
• Retrieves and validates session information from the repository.
• Performs pushdown optimization when the session is configured for pushdown optimization. • Adds partitions to the session when the session is configured for dynamic partitioning.
• Expands the service process variables, session parameters, and mapping variables and parameters. • Creates the session log.
• Validates source and target code pages. • Verifies connection object permissions.
• Runs pre-session shell commands, stored procedures, and SQL.
• Sends a request to start worker DTM processes on other nodes when the session is configured to run on a grid. • Creates and run mapping, reader, writer, and transformation threads to extract, transform, and load data. • Runs post-session stored procedures, SQL, and shell commands.
• Sends post-session email. PROCESSING THREADS
The DTM allocates process memory for the session and divides it into buffers. This is also known as buffer memory. The default memory allocation is 12,000,000 bytes.
The DTM uses multiple threads to process data in a session. The main DTM thread is called the master thread. The master thread can create the following types of threads:
• Mapping Threads: One mapping thread for each session. • Pre- and Post-Session Threads: One thread created. • Reader Threads: One thread for each partition
• Writer Threads: One thread for each partition CODE PAGES and DATA MOVEMENT
A code page contains the encoding to specify characters in a set of one or more languages. An encoding is the assignment of a number to a character in the character set.
The Integration Service can move data in either ASCII or Unicode data movement mode. These modes determine how the Integration Service handles character data.
We choose the data movement mode in the Integration Service configuration settings. If we want to move multi byte data, choose Unicode data movement mode.
ASCII Data Movement Mode: In ASCII mode, the Integration Service recognizes 7-bit ASCII and EBCDIC characters and stores each character in a single byte.
Unicode Data Movement Mode: Use Unicode data movement mode when sources or targets use 8-bit or multi byte character sets and contain character data.
Try U R Hand's on Admin-Console Repository Manager Tasks:
• Add domain connection information • Add and connect to a repository
• Work with Power Center domain and repository connections • Search for repository objects or keywords
• View object dependencies • Compare repository objects
• Truncate session and workflow log entries • View user connections
• Release locks
• Exchange metadata with other business intelligence tools
Add a repository to the Navigator, and then configure the domain connection information when we connect to the repository.
1.Adding a Repository to the Navigator :
1. In any of the Power Center Client tools, click Repository > Add.
2. Enter the name of the repository and a valid repository user name. 3. Click OK.
Before we can connect to the repository for the first time, we must configure the Connection information for the domain that the repository belongs to.
2.Configuring a Domain Connection
1. In a Power Center Client tool, select the Repositories node in the Navigator. 2. Click Repository > Configure Domains to open the Configure Domains dialog box. 3. Click the Add button. The Add Domain dialog box appears.
4. Enter the domain name, gateway host name, and gateway port number. 5. Click OK to add the domain connection.
3.Connecting to a Repository
1. Launch a Power Center Client tool.
2. Select the repository in the Navigator and click Repository > Connect, or double-click the repository. 3. Enter a valid repository user name and password.
Click on more button to add, change or view domain information.
4.Viewing Object Dependencies
Before we change or delete repository objects, we can view dependencies to see the impact on other objects. For example, before you remove a session, we can find out which workflows use the session. We can view dependencies for repository objects in the Repository Manager, Workflow Manager, and Designer tools.
Steps:
1. Connect to the repository.
2. Select the object of use in navigator.
3. Click Analyze and Select the dependency we want to view. 5.Validating Multiple Objects
We can validate multiple objects in the repository without fetching them into the workspace. We can save and optionally check in objects that change from invalid to valid status as a result of the validation. We can validate sessions, mappings, mapplets, workflows, and worklets.
Steps:
1. Select the objects you want to validate. 2. Click Analyze and Select Validate
3. Select validation options from the Validate Objects dialog box 4. Click Validate.
5. Click a link to view the objects in the results group. 6.Comparing Repository Objects
We can compare two repository objects of the same type to identify differences between the objects. For example, we can compare two sessions to check for differences. When we compare two objects, the Repository Manager displays their attributes.
Steps:
1. In the Repository Manager, connect to the repository. 2. In the Navigator, select the object you want to compare. 3. Click Edit > Compare Objects.
4. Click Compare in the dialog box displayed. 7.Truncating Workflow and Session Log Entries
When we configure a session or workflow to archive session logs or workflow logs, the Integration Service saves those logs in local directories. The repository also creates an entry for each saved workflow log and session log. If we move or delete a session log or workflow log from the workflow log directory or session log directory, we can remove the entries from the repository.
Steps:
1. In the Repository Manager, select the workflow in the Navigator window or in the Main window. 2. Choose Edit > Truncate Log. The Truncate Workflow Log dialog box appears.
3. Choose to delete all workflow and session log entries or to delete all workflow and session log entries with an end time before a particular date.
4. If you want to delete all entries older than a certain date, enter the date and time. 5. Click OK.
8.Managing User Connections and Locks
In the Repository Manager, we can view and manage the following items:
Repository object locks: The repository locks repository objects and folders by user. The repository creates different types of locks depending on the task. The Repository Service locks and unlocks all objects in the repository.
User connections: Use the Repository Manager to monitor user connections to the repository. We can end connections when necessary.
Types of locks created:
1. In-use lock: Placed on objects we want to view
2. Write-intent lock: Placed on objects we want to modify.
3. Execute lock: Locks objects we want to run, such as workflows and sessions Steps:
1. Launch the Repository Manager and connect to the repository. 2. Click Edit > Show User Connections or Show locks
3. The locks or user connections will be displayed in a window. 4. We can do the rest as per our need.
9.Managing Users and Groups
1. In the Repository Manager, connect to a repository. 2. Click Security > Manage Users and Privileges. 3. Click the Groups tab to create Groups. or 4. Click the Users tab to create Users
5. Click the Privileges tab to give permissions to groups and users. 6. Select the options available to add, edit, and remove users and groups. There are two default repository user groups:
Administrators: This group initially contains two users that are created by default.
The default users are Administrator and the database user that created the repository. We cannot delete these users from the repository or remove them from the Administrators group.
Public: The Repository Manager does not create any default users in the Public group. 10 Working with Folders
We can create, Edit or delete folder as per our need.
1. In the Repository Manager, connect to a repository. 2. Click Folder > Create.
Enter the following information:
3. Click ok.
Difference Between 7.1 and 8.6
1. Target from Transformation: In Informatica 8X we can create target from Transformation by dragging transformation in Target designer
2. Pushdown optimization: Uses increased performance by pushing Transformation logic to the database by analyzing the transformations and Issuing SQL statements to sources and targets. Only processes any Transformation logic that it cannot push to the database.
3. New function in expression editor: New function have been introduced in Informatica 8X like reg_extract and reg_match
4. Repository query available in both versioned and non versioned Repositories previously it was available only for versioned repository.
5. UDF (User defined function) similar to macro in excel
6. FTP: We can have partitioned FTP targets and Indirect FTP File source (with file list).
7. Propagating Port Descriptions: In Informatica 8 we can edit a port description and propagate the description to other transformations in the mapping.
8. Environment SQL Enhancements: Environment SQL can still be used to Execute an SQL statement at start of connection to the database. We can Use SQL commands that depend upon a transaction being opened during The entire read or write process. For example, the following SQL command Modifies how the session handles characters: Alter session set
NLS_DATE_FORMAT='DD/MM/YYYY';".
9. Concurrently write to multiple files in a session with partitioned targets. 10. Flat File Enhancements:
• Reduced conversion of data types
• Delimited file performance has improved
• Flat file now can have integer and double data types • Data can be appended to existing flat files
Informatica power center 8 is having the following features which makes it more powerful, easy to use and manage when compared to previous versions.
• Supports Service oriented architecture
• Access to structured, unstructured and semi structured data • Support for grid computing
• High availability • Pushdown optimization • Dynamic partitioning
• Metadata exchange enhancements • Team based Development
• Global Web-based Admin console • New transformations
• 23 New functions • User defined functions
• Custom transformation enhancements • Flat file enhancements
• New Data Federation option • Enterprise GRID
Testing Unit Testing
Unit testing can be broadly classified into 2 categories. Quantitative Testing
Validate your Source and Target
a) Ensure that your connectors are configured properly.
b) If you are using flat file make sure have enough read/write permission on the file share. c) You need to document all the connector information.
Analyze the Load Time
a) Execute the session and review the session statistics.
b) Check the Read and Write counters. How long it takes to perform the load. c) Use the session and workflow logs to capture the load statistics.
d) You need to document all the load timing information. Analyze the success rows and rejections.
a) Have customized SQL queries to check the source/targets and here we will perform the Record Count Verification. b) Analyze the rejections and build a process to handle those rejections. This requires a clear business requirement from the business on how to handle the data rejections. Do we need to reload or reject and inform etc? Discussions are required and appropriate process must be developed.
Performance Improvement a) Network Performance b) Session Performance c) Database Performance
d) Analyze and if required define the Informatica and DB partitioning requirements. Qualitative Testing
Analyze & validate your transformation business rules. More of functional testing.
e) You need review field by field from source to target and ensure that the required transformation logic is applied.
f) If you are making changes to existing mappings make use of the data lineage feature Available with Informatica Power Center. This will help you to find the consequences of Altering or deleting a port from existing mapping.
g) Ensure that appropriate dimension lookup’s have been used and your development is in Sync with your business requirements.
Integration Testing
After unit testing is complete; it should form the basis of starting integration testing. Integration testing should Test out initial and incremental loading of the data warehouse.
Integration testing will involve following 1. Sequence of ETL jobs in batch.
2. Initial loading of records on data warehouse.
4. Testing the rejected records that don’t fulfill transformation rules. 5. Error log generation.
Integration Testing would cover End-to-End Testing for DWH. The coverage of the tests would include the below: Count Validation
Record Count Verification: DWH backend/Reporting queries against source and target as an initial check.
Control totals: To ensure accuracy in data entry and processing, control totals can be compared by the system with manually entered or otherwise calculated control totals using the data fields such as quantities, line items, documents, or dollars, or simple record counts
Hash totals: This is a technique for improving data accuracy, whereby totals are obtained on identifier fields (i.e., fields for which it would logically be meaningless to construct a total), such as account number, social security number, part number, or employee number. These totals have no significance other than for internal system control purposes.
Limit checks: The program tests specified data fields against defined high or low value limits (e.g., quantities or dollars) for acceptability before further processing.
Dimensional Analysis
Data integrity between the various source tables and relationships. Statistical Analysis
Validation for various calculations.
• When you validate the calculations you don’t require loading the entire rows into target and Validating it. • Instead you use the Enable Test Load feature available in Informatica Power Center.
Property Description
Enable Test Load You can configure the Integration Service to perform a test load.
With a test load, the Integration Service reads and transforms data without writing to targets. The Integration Service generates all session files, and performs all pre- and post-session Functions, as if running the full session.
The Integration Service writes data to relational targets, but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the Integration Service does not write data to the targets.
Enter the number of source rows you want to test in the Number of Rows to Test field. You cannot perform a test load on sessions using XML sources. You can perform a test load for relational targets when you configure a session for normal Mode. If you configure the session for bulk mode, the session fails.
Number of Rows to Test Enter the number of source rows you want the Integration Service to test load. The Integration Service reads the number you configure for the test load.
Data Quality Validation
Check for missing data, negatives and consistency. Field-by-Field data verification can be done to check the consistency of source and target data.
Overflow checks: This is a limit check based on the capacity of a data field or data file area to accept data. This programming technique can be used to detect the truncation of a financial or quantity data field value after computation (e.g., addition, multiplication, and division). Usually, the first digit is the one lost.
Format checks: These are used to determine that data are entered in the proper mode, as numeric or alphabetical characters, within designated fields of information. The proper mode in each case depends on the data field definition.
Sign test: This is a test for a numeric data field containing a designation of an algebraic sign, + or - , which can be used to denote, for example, debits or credits for financial data fields.
Size test: This test can be used to test the full size of the data field. For example, a social security number in the United States should have nine digits
Granularity
Validate at the lowest granular level possible Other validations
Audit Trails, Transaction Logs, Error Logs and Validity checks.
Note: Based on your project and business needs you might have additional testing requirements. User Acceptance Test
In this phase you will involve the user to test the end results and ensure that business is satisfied with the quality of the data. Any changes to the business requirement will follow the change management process and eventually those changes have to follow the SDLC process.
Optimize Development, Testing, and Training Systems
• Dramatically accelerate development and test cycles and reduce storage costs by creating fully functional, smaller targeted data subsets for development, testing, and training systems, while maintaining full data integrity.
• Quickly build and update nonproduction systems with a small subset of production data and replicate current subsets of nonproduction copies faster.
• Simplify test data management and shrink the footprint of nonproduction systems to significantly reduce IT infrastructure and maintenance costs.
• Reduce application and upgrade deployment risks by properly testing configuration updates with up-to-date, realistic data before introducing them into production .
• Easily customize provisioning rules to meet each organization’s changing business requirements. • Lower training costs by standardizing on one approach and one infrastructure.
• Train employees effectively using reliable, production-like data in training systems. Support Corporate Divestitures and Reorganizations
• Untangle complex operational systems and separate data along business lines to quickly build the divested organization’s system.
• Accelerate the provisioning of new systems by using only data that’s relevant to the divested organization. • Decrease the cost and time of data divestiture with no reimplementation costs .
Reduce the Total Cost of Storage Ownership
• Dramatically increase an IT team’s productivity by reusing a comprehensive list of data objects for data selection and updating processes across multiple projects, instead of coding by hand—which is expensive, resource intensive, and time consuming .
• Accelerate application delivery by decreasing R&D cycle time and streamlining test data management.
• Improve the reliability of application delivery by ensuring IT teams have ready access to updated quality production data.
• Lower administration costs by centrally managing data growth solutions across all packaged and custom applications.
• Substantially accelerate time to value for subsets of packaged applications. • Decrease maintenance costs by eliminating custom code and scripting. Informatica Power Center Testing
Debugger: Very useful tool for debugging a valid mapping to gain troubleshooting information about data and error conditions. Refer Informatica documentation to know more about debugger tool.
Test Load Options – Relational Targets. Running the Integration Service in Safe Mode
• Test a development environment. Run the Integration Service in safe mode to test a development environment before migrating to production
• Troubleshoot the Integration Service. Configure the Integration Service to fail over in safe mode and
troubleshoot errors when you migrate or test a production environment configured for high availability. After the Integration Service fails over in safe mode, you can correct the error that caused the Integration Service to fail over. Syntax Testing: Test your customized queries using your source qualifier before executing the session. Performance Testing for identifying the following bottlenecks:
• Target • Source • Mapping • Session • System
Use the following methods to identify performance bottlenecks:
• Run test sessions. You can configure a test session to read from a flat file source or to write to a flat file target to identify source and target bottlenecks.
• Analyze performance details. Analyze performance details, such as performance counters, to determine where session performance decreases.
• Analyze thread statistics. Analyze thread statistics to determine the optimal number of partition points.
• Monitor system performance. You can use system monitoring tools to view the percentage of CPU use, I/O waits, and paging to identify system bottlenecks. You can also use the Workflow Monitor to view system resource usage. Use Power Center conditional filter in the Source Qualifier to improve performance.
• Share metadata. You can share metadata with a third party. For example, you want to send a mapping to someone else for testing or analysis, but you do not want to disclose repository connection information for security reasons. You can export the mapping to an XML file and edit the repository connection information before sending the XML file. The third party can import the mapping from the XML file and analyze the metadata.
Debugger
You can debug a valid mapping to gain troubleshooting information about data and error conditions. To debug a mapping, you configure and run the Debugger from within the Mapping Designer. The Debugger uses a session to run the mapping on the Integration Service. When you run the Debugger, it pauses at breakpoints and you can view and edit transformation output data.
You might want to run the Debugger in the following situations:
• Before you run a session. After you save a mapping, you can run some initial tests with a debug session before you create and configure a session in the Workflow Manager.
• After you run a session. If a session fails or if you receive unexpected results in the target, you can run the Debugger against the session. You might also want to run the Debugger against a session if you want to debug the mapping using the configured session properties.
Debugger Session Types:
You can select three different debugger session types when you configure the Debugger. The Debugger runs a workflow for each session type. You can choose from the following Debugger session types when you configure the Debugger:
• Use an existing non-reusable session. The Debugger uses existing source, target, and session configuration properties. When you run the Debugger, the Integration Service runs the non-reusable session and the existing workflow. The Debugger does not suspend on error.
• Use an existing reusable session. The Debugger uses existing source, target, and session configuration properties. When you run the Debugger, the Integration Service runs a debug instance of the reusable session And creates and runs a debug workflow for the session.
• Create a debug session instance. You can configure source, target, and session configuration properties through the Debugger Wizard. When you run the Debugger, the Integration Service runs a debug instance of the debug workflow and creates and runs a debug workflow for the session.
Debug Process
To debug a mapping, complete the following steps:
1. Create breakpoints. Create breakpoints in a mapping where you want the Integration Service to evaluate data and error conditions.
2. Configure the Debugger. Use the Debugger Wizard to configure the Debugger for the mapping. Select the session type the Integration Service uses when it runs the Debugger. When you create a debug session, you configure a subset of session properties within the Debugger Wizard, such as source and target location. You can also choose to load or discard target data.
3. Run the Debugger. Run the Debugger from within the Mapping Designer. When you run the Debugger, the Designer connects to the Integration Service. The Integration Service initializes the Debugger and runs the debugging session and workflow. The Integration Service reads the breakpoints and pauses the Debugger
when the breakpoints evaluate to true.
4. Monitor the Debugger. While you run the Debugger, you can monitor the target data, transformation and mapplet output data, the debug log, and the session log. When you run the Debugger, the Designer displays the following windows:
• Debug log. View messages from the Debugger. • Target window. View target data.
5. Modify data and breakpoints. When the Debugger pauses, you can modify data and see the effect on transformations, mapplets, and targets as the data moves through the pipeline. You can also modify breakpoint information.
The Designer saves mapping breakpoint and Debugger information in the workspace files. You can copy breakpoint information and the Debugger configuration to another mapping. If you want to run the Debugger from another Power Center Client machine, you can copy the breakpoint information and the Debugger configuration to the other Power Center Client machine.
Running the Debugger:
When you complete the Debugger Wizard, the Integration Service starts the session and initializes the Debugger. After initialization, the Debugger moves in and out of running and paused states based on breakpoints and commands that you issue from the Mapping Designer. The Debugger can be in one of the following states:
• Initializing. The Designer connects to the Integration Service. • Running. The Integration Service processes the data.
• Paused. The Integration Service encounters a break and pauses the Debugger.
Note: To enable multiple users to debug the same mapping at the same time, each user must configure different port numbers in the Tools > Options > Debug tab.
Monitoring the Debugger :
When you run the Debugger, you can monitor the following information: • Session status. Monitor the status of the session.
• Data movement. Monitor data as it moves through transformations. • Breakpoints. Monitor data that meets breakpoint conditions. • Target data. Monitor target data on a row-by-row basis.
The Mapping Designer displays windows and debug indicators that help you monitor the session:
• Debug indicators. Debug indicators on transformations help you follow breakpoints and data flow.
• Instance window. When the Debugger pauses, you can view transformation data and row information in the Instance window.
• Target window. View target data for each target in the mapping.
• Output window. The Integration Service writes messages to the following tabs in the Output window: • Debugger tab. The debug log displays in the Debugger tab.
• Session Log tab. The session log displays in the Session Log tab. • Notifications tab. Displays messages from the Repository Service.
While you monitor the Debugger, you might want to change the transformation output data to see the effect on subsequent transformations or targets in the data flow. You might also want to edit or add more breakpoint information to monitor the session more closely.
Restrictions
You cannot change data for the following output ports:
• Normalizer transformation. Generated Keys and Generated Column ID ports. • Rank transformation. RANKINDEX port.
• Router transformation. All output ports.
• Sequence Generator transformation. CURRVAL and NEXTVAL ports.
• Lookup transformation. NewLookupRow port for a Lookup transformation configured to use a dynamic cache. • Custom transformation. Ports in output groups other than the current output group.
• Java transformation. Ports in output groups other than the current output group. Additionally, you cannot change data associated with the following:
• Mapplets that are not selected for debugging • Input or input/output ports
• Output ports when the Debugger pauses on an error breakpoint Constraint-Based Loading:
In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the
Integration Service loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements:
• Active source. Related target tables must have the same active source. • Key relationships. Target tables must have key relationships.
• Target connection groups. Targets must be in one target connection group.
• Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint based loading.
Active Source:
When target tables receive rows from different active sources, the Integration Service reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible. For example, a mapping contains three distinct pipelines. The first two contain a source, source qualifier, and target. Since these two targets receive data from different active sources, the Integration Service reverts to normal loading for both targets. The third pipeline contains a source, Normalizer, and two targets. Since these two targets share a single active source (the Normalizer), the Integration Service performs constraint-based loading: loading the primary key table first, then the foreign key table. Key Relationships:
When target tables have no key relationships, the Integration Service does not perform constraint-based loading.
Similarly, when target tables have circular key relationships, the Integration Service reverts to a normal load. For example, you have one target containing a primary key and a foreign key related to the primary key in a second target. The second target also contains a foreign key that references the primary key in the first target. The Integration Service cannot enforce constraint-based loading for these tables. It reverts to a normal load.
Target Connection Groups:
The Integration Service enforces constraint-based loading for targets in the same target connection group. If you want to specify constraint-based loading for multiple targets that receive data from the same active source, you must verify the tables are in the same target connection group. If the tables with the primary key-foreign key relationship are in different target connection groups, the Integration Service cannot enforce constraint-based loading when you run the workflow. To verify that all targets are in the same target connection group, complete the following tasks:
• Verify all targets are in the same target load order group and receive data from the same active source. • Use the default partition properties and do not add partitions or partition points.
• Define the same target type for all targets in the session properties.
• Define the same database connection name for all targets in the session properties. • Choose normal mode for the target load type for all targets in the session properties. Treat Rows as Insert:
Use constraint-based loading when the session option Treat Source Rows As is set to insert. You might get inconsistent data if you select a different Treat Source Rows As option and you configure the session for constraint-based loading.
When the mapping contains Update Strategy transformations and you need to load data to a primary key table first, split the mapping using one of the following options:
• Load primary key table in one mapping and dependent tables in another mapping. Use constraint-based loading to load the primary table.
• Perform inserts in one mapping and updates in another mapping.
Constraint-based loading does not affect the target load ordering of the mapping. Target load ordering defines the order the Integration Service reads the sources in each target load order group in the mapping. A target load order group is a
collection of source qualifiers, transformations, and targets linked together in a mapping. Constraint based loading establishes the order in which the Integration Service loads individual targets within a set of targets receiving data from a single source qualifier.
Example
In the first pipeline, target T_1 has a primary key, T_2 and T_3 contain foreign keys referencing the T1 primary key. T_3 has a primary key that T_4 references as a foreign key.
Since these tables receive records from a single active source, SQ_A, the Integration Service loads rows to the target in the following order:
1. T_1
2. T_2 and T_3 (in no particular order) 3. T_4
The Integration Service loads T_1 first because it has no foreign key dependencies and contains a primary key referenced by T_2 and T_3. The Integration Service then loads T_2 and T_3, but since T_2 and T_3 have no dependencies, they are not loaded in any particular order. The Integration Service loads T_4 last, because it has a foreign key that references a primary key in T_3.After loading the first set of targets, the Integration Service begins reading source B. If there are no key
relationships between T_5 and T_6, the Integration Service reverts to a normal load for both targets.
If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive data from a single active source, the Aggregator AGGTRANS, the Integration Service loads rows to the tables in the following order:
• T_5 • T_6
T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database connection for each target, and you use the default partition properties. T_5 and T_6 are in another target connection group together if you use the same
database connection for each target and you use the default partition properties. The Integration Service includes T_5 and T_6 in a different target connection group because they are in a different target load order group from the first four targets. Enabling Constraint-Based Loading:
When you enable constraint-based loading, the Integration Service orders the target load on a row-by-row basis. To enable constraint-based loading:
1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property. 2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering.
3. Click OK. Target Load Order
When you use a mapplet in a mapping, the Mapping Designer lets you set the target load plan for sources within the mapplet.
Setting the Target Load Order
You can configure the target load order for a mapping containing any type of target definition. In the Designer, you can set the order in which the Integration Service sends rows to targets in different target load order groups in a mapping. A target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping. You can set the target load order if you want to maintain referential integrity when inserting, deleting, or updating tables that have the primary key and foreign key constraints.
The Integration Service reads sources in a target load order group concurrently, and it processes target load order groups sequentially.
To specify the order in which the Integration Service sends data to targets, create one source qualifier for each target within a mapping. To set the target load order, you then determine in which order the Integration Service reads each source in the mapping.
The following figure shows two target load order groups in one mapping:
In this mapping, the first target load order group includes ITEMS, SQ_ITEMS, and T_ITEMS. The second target load order group includes all other objects in the mapping, including the TOTAL_ORDERS target. The Integration Service processes the first target load order group, and then the second target load order group.
When it processes the second target load order group, it reads data from both sources at the same time. To set the target load order:
1. Create a mapping that contains multiple target load order groups. 2. Click Mappings > Target Load Plan.
3. The Target Load Plan dialog box lists all Source Qualifier transformations in the mapping and the targets that receive data from each source qualifier.
4. Select a source qualifier from the list.
5. Click the Up and Down buttons to move the source qualifier within the load order. 6. Repeat steps 3 to 4 for other source qualifiers you want to reorder. Click OK. Advanced Concepts
MAPPING PARAMETERS & VARIABLES
Mapping parameters and variables represent values in mappings and mapplets.
When we use a mapping parameter or variable in a mapping, first we declare the mapping parameter or variable for use in each mapplet or mapping. Then, we define a value for the mapping parameter or variable before we run the session. MAPPING PARAMETERS
• A mapping parameter represents a constant value that we can define before running a session. • A mapping parameter retains the same value throughout the entire session.
Example: When we want to extract records of a particular month during ETL process, we will create a Mapping Parameter of data type and use it in query to compare it with the timestamp field in SQL override.
• After we create a parameter, it appears in the Expression Editor.
• We can then use the parameter in any expression in the mapplet or mapping.
• We can also use parameters in a source qualifier filter, user-defined join, or extract override, and in the Expression Editor of reusable transformations.
MAPPING VARIABLES
• Unlike mapping parameters, mapping variables are values that can change between sessions.
• The Integration Service saves the latest value of a mapping variable to the repository at the end of each successful session.
• We can override a saved value with the parameter file.
• We can also clear all saved values for the session in the Workflow Manager.
We might use a mapping variable to perform an incremental read of the source. For example, we have a source table
containing time stamped transactions and we want to evaluate the transactions on a daily basis. Instead of manually entering a session override to filter source data each time we run the session, we can create a mapping variable, $$IncludeDateTime. In the source qualifier, create a filter to read only rows whose transaction date equals $$IncludeDateTime, such as:
TIMESTAMP = $$IncludeDateTime
In the mapping, use a variable function to set the variable value to increment one day each time the session runs. If we set the initial value of $$IncludeDateTime to 8/1/2004, the first time the Integration Service runs the session, it reads only rows dated 8/1/2004. During the session, the Integration Service sets $$IncludeDateTime to 8/2/2004. It saves 8/2/2004 to the repository at the end of the session. The next time it runs the session, it reads only rows from August 2, 2004.
Used in following transformations: • Expression
• Filter • Router
• Update Strategy Initial and Default Value:
When we declare a mapping parameter or variable in a mapping or a mapplet, we can enter an initial value. When the Integration Service needs an initial value, and we did not declare an initial value for the parameter or variable, the Integration Service uses a default value based on the data type of the parameter or variable.
Data ->Default Value Numeric ->0
String ->Empty String Date time ->1/1/1
Variable Values: Start value and current value of a mapping variable Start Value:
The start value is the value of the variable at the start of the session. The Integration Service looks for the start value in the following order:
1. Value in parameter file 2. Value saved in the repository 3. Initial value
4. Default value Current Value:
The current value is the value of the variable as the session progresses. When a session starts, the current value of a variable is the same as the start value. The final current value for a variable is saved to the repository at the end of a successful session. When a session fails to complete, the Integration Service does not update the value of the variable in the repository.
Note: If a variable function is not used to calculate the current value of a mapping variable, the start value of the variable is saved to the repository.
Variable Data type and Aggregation Type When we declare a mapping variable in a mapping, we need to configure the Data type and aggregation type for the variable. The IS uses the aggregate type of a Mapping variable to determine the final current value of the mapping variable.
Aggregation types are:
• Count: Integer and small integer data types are valid only.
• Max: All transformation data types except binary data type are valid. • Min: All transformation data types except binary data type are valid. Variable Functions
Variable functions determine how the Integration Service calculates the current value of a mapping variable in a pipeline. SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type set to Min.
SetCountVariable: Increments the variable value by one. It adds one to the variable value when a row is marked for insertion, and subtracts one when the row is Marked for deletion. It ignores rows marked for update or reject. Aggregation type set to Count.
SetVariable: Sets the variable to the configured value. At the end of a session, it compares the final current value of the variable to the start value of the variable. Based on the aggregate type of the variable, it saves a final value to the repository. Creating Mapping Parameters and Variables
1. Open the folder where we want to create parameter or variable.
2. In the Mapping Designer, click Mappings > Parameters and Variables. -or- In the Mapplet Designer, click Mapplet > Parameters and Variables.
3. Click the add button.
4. Enter name. Do not remove $$ from name.
5. Select Type and Data type. Select Aggregation type for mapping variables. 6. Give Initial Value. Click ok.
Example: Use of Mapping of Mapping Parameters and Variables • EMP will be source table.
• Create a target table MP_MV_EXAMPLE having columns: EMPNO, ENAME, DEPTNO, TOTAL_SAL, MAX_VAR, MIN_VAR, COUNT_VAR and SET_VAR.
• TOTAL_SAL = SAL+ COMM + $$BONUS (Bonus is mapping parameter that changes every month) • SET_VAR: We will be added one month to the HIREDATE of every employee.
• Create shortcuts as necessary. Creating Mapping
1. Open folder where we want to create the mapping. 2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give name. Ex: m_mp_mv_example 4. Drag EMP and target table.
5. Transformation -> Create -> Select Expression for list -> Create –> Done.
6. Drag EMPNO, ENAME, HIREDATE, SAL, COMM and DEPTNO to Expression. 7. Create Parameter $$Bonus and Give initial value as 200.
8. Create variable $$var_max of MAX aggregation type and initial value 1500. 9. Create variable $$var_min of MIN aggregation type and initial value 1500.
10. Create variable $$var_count of COUNT aggregation type and initial value 0. COUNT is visible when datatype is INT or SMALLINT.
12. Create 5 output ports out_ TOTAL_SAL, out_MAX_VAR, out_MIN_VAR, out_COUNT_VAR and out_SET_VAR.
13. Open expression editor for TOTAL_SAL. Do the same as we did earlier for SAL+ COMM. To add $$BONUS to it, select variable tab and select the parameter from mapping parameter. SAL + COMM + $$Bonus
14. Open Expression editor for out_max_var.
15. Select the variable function SETMAXVARIABLE from left side pane. Select
$$var_max from variable tab and SAL from ports tab as shown below.SETMAXVARIABLE($$var_max,SAL)
17. Open Expression editor for out_min_var and write the following expression: SETMINVARIABLE($$var_min,SAL). Validate the expression.
18. Open Expression editor for out_count_var and write the following expression: SETCOUNTVARIABLE($$var_count). Validate the expression.
19. Open Expression editor for out_set_var and write the following expression: SETVARIABLE($$var_set,ADD_TO_DATE(HIREDATE,'MM',1)). Validate. 20. Click OK. Expression Transformation below:
22. See mapping picture on next page.
PARAMETER FILE
• A parameter file is a list of parameters and associated values for a workflow, worklet, or session. • Parameter files provide flexibility to change these variables each time we run a workflow or session.
• We can create multiple parameter files and change the file we use for a session or workflow. We can create a parameter file using a text editor such as WordPad or Notepad.
• Enter the parameter file name and directory in the workflow or session properties. A parameter file contains the following types of parameters and variables:
• Workflow variable: References values and records information in a workflow.
• Worklet variable: References values and records information in a worklet. Use predefined worklet variables in a parent workflow, but we cannot use workflow variables from the parent workflow in a worklet.
• Session parameter: Defines a value that can change from session to session, such as a database connection or file name.
• Mapping parameter and Mapping variable USING A PARAMETER FILE
Parameter files contain several sections preceded by a heading. The heading identifies the Integration Service, Integration Service process, workflow, worklet, or session to which we want to assign parameters or variables.
• Make session and workflow.
• Give connection information for source and target table. • Run workflow and see result.
Sample Parameter File for Our example:
In the parameter file, folder and session names are case sensitive. Create a text file in notepad with name Para_File.txt
[Practice.ST:s_m_MP_MV_Example] $$Bonus=1000
$$var_max=500 $$var_min=1200 $$var_count=0
We can specify the parameter file name and directory in the workflow or session properties. To enter a parameter file in the workflow properties:
1. Open a Workflow in the Workflow Manager. 2. Click Workflows > Edit.
3. Click the Properties tab.
4. Enter the parameter directory and name in the Parameter Filename field. 5. Click OK.
To enter a parameter file in the session properties: 1. Open a session in the Workflow Manager.
2. Click the Properties tab and open the General Options settings.
3. Enter the parameter directory and name in the Parameter Filename field. 4. Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt 5. Click OK.
MAPPLETS
• A mapplet is a reusable object that we create in the Mapplet Designer.
• It contains a set of transformations and lets us reuse that transformation logic in multiple mappings. • Created in Mapplet Designer in Designer Tool.
We need to use same set of 5 transformations in say 10 mappings. So instead of making 5 transformations in every 10 mapping, we create a mapplet of these 5 transformations. Now we use this mapplet in all 10 mappings. Example: To create a surrogate key in target. We create a mapplet using a stored procedure to create Primary key for target table. We give target table name and key column name as input to mapplet and get the Surrogate key as output.
Mapplets help simplify mappings in the following ways:
• Include source definitions: Use multiple source definitions and source qualifiers to provide source data for a mapping.
• Accept data from sources in a mapping
• Include multiple transformations: As many transformations as we need.
• Pass data to multiple transformations: We can create a mapplet to feed data to multiple transformations. Each Output transformation in a mapplet represents one output group in a mapplet.
• Contain unused ports: We do not have to connect all mapplet input and output ports in a mapping. Mapplet Input:
Mapplet input can originate from a source definition and/or from an Input transformation in the mapplet. We can create multiple pipelines in a mapplet.
• We use Mapplet Input transformation to give input to mapplet. • Use of Mapplet Input transformation is optional.
Mapplet Output:
The output of a mapplet is not connected to any target table.
• We must use Mapplet Output transformation to store mapplet output.
• A mapplet must contain at least one Output transformation with at least one connected port in the mapplet. Example1: We will join EMP and DEPT table. Then calculate total salary. Give the output to mapplet out transformation. · EMP and DEPT will be source tables.
· Output will be given to transformation Mapplet_Out. Steps:
1. Open folder where we want to create the mapping. 2. Click Tools -> Mapplet Designer.
3. Click Mapplets-> Create-> Give name. Ex: mplt_example1 4. Drag EMP and DEPT table.
5. Use Joiner transformation as described earlier to join them.
6. Transformation -> Create -> Select Expression for list -> Create -> Done
7. Pass all ports from joiner to expression and then calculate total salary as described in expression transformation. 8. Now Transformation -> Create -> Select Mapplet Out from list –> Create -> Give name and then done.
9. Pass all ports from expression to Mapplet output. 10. Mapplet -> Validate
11. Repository -> Save Use of mapplet in mapping:
• We can mapplet in mapping by just dragging the mapplet from mapplet folder on left pane as we drag source and target tables.
• When we use the mapplet in a mapping, the mapplet object displays only the ports from the Input and Output transformations. These are referred to as the mapplet input and mapplet output ports.
• Make sure to give correct connection information in session. Making a mapping: We will use mplt_example1, and then create a filter transformation to filter records whose Total Salary is >= 1500.
· mplt_example1 will be source.
· Create target table same as Mapplet_out transformation as in picture above. Creating Mapping 1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give name. Ex: m_mplt_example1 4. Drag mplt_Example1 and target table.
5. Transformation -> Create -> Select Filter for list -> Create -> Done. 6. Drag all ports from mplt_example1 to filter and give filter condition.
7. Connect all ports from filter to target. We can add more transformations after filter if needed. 8. Validate mapping and Save it.
• Make session and workflow.
• Give connection information for mapplet source tables. • Give connection information for target table.
• Run workflow and see result. PARTITIONING
• A pipeline consists of a source qualifier and all the transformations and Targets that receive data from that source qualifier.
• When the Integration Service runs the session, it can achieve higher Performance by partitioning the pipeline and performing the extract, Transformation, and load for each partition in parallel.
A partition is a pipeline stage that executes in a single reader, transformation, or Writer thread. The number of partitions in any pipeline stage equals the number of Threads in the stage. By default, the Integration Service creates one partition in every pipeline stage.
PARTITIONING ATTRIBUTES 1. Partition points
• By default, IS sets partition points at various transformations in the pipeline. • Partition points mark thread boundaries and divide the pipeline into stages. • A stage is a section of a pipeline between any two partition points.
2. Number of Partitions
• When we increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all Partition points in the pipeline.
• increasing the number of partitions or partition points increases the number of threads.
• The number of partitions we create equals the number of connections to the source or target. For one partition, one database connection will be used.
3. Partition types
• The Integration Service creates a default partition type at each partition point.
• If we have the Partitioning option, we can change the partition type. This option is purchased separately. • The partition type controls how the Integration Service distributes data among partitions at partition points. PARTITIONING TYPES
1. Round Robin Partition Type
• In round-robin partitioning, the Integration Service distributes rows of data evenly to all partitions. • Each partition processes approximately the same number of rows.
• Use round-robin partitioning when we need to distribute rows evenly and do not need to group data among partitions.
2. Pass-Through Partition Type
• In pass-through partitioning, the Integration Service processes data without Redistributing rows among partitions. • All rows in a single partition stay in that partition after crossing a pass-Through partition point.
• Use pass-through partitioning when we want to increase data throughput, but we do not want to increase the number of partitions.
3. Database Partitioning Partition Type
• Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets only. • Use any number of pipeline partitions and any number of database partitions.
• We can improve performance when the number of pipeline partitions equals the number of database partitions. Database Partitioning with One Source
When we use database partitioning with a source qualifier with one source, the Integration Service generates SQL queries for each database partition and distributes the data from the database partitions among the session partitions Equally. For example, when a session has three partitions and the database has five partitions, 1st and 2nd session partitions will receive data from 2 database partitions each. Thus four DB partitions used. 3rd Session partition will receive Data from the remaining 1 DB partition.
Partitioning a Source Qualifier with Multiple Sources Tables
The Integration Service creates SQL queries for database partitions based on the Number of partitions in the database table with the most partitions.
4. Hash Auto-Keys Partition Type
• The Integration Service uses all grouped or sorted ports as a compound Partition key.
• Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and Unsorted Aggregator transformations to ensure that rows are grouped Properly before they enter these transformations.
5. Hash User-Keys Partition Type
• The Integration Service uses a hash function to group rows of data among Partitions. • we define the number of ports to generate the partition key.
• we choose the ports that define the partition key . 6. Key range Partition Type
• We specify one or more ports to form a compound partition key.
• The Integration Service passes data to each partition depending on the Ranges we specify for each port. • Use key range partitioning where the sources or targets in the pipeline are Partitioned by key range.
• Example: Customer 1-100 in one partition, 101-200 in another and so on. We Define the range for each partition. WORKING WITH LINKS
• Use links to connect each workflow task.
• We can specify conditions with links to create branches in the workflow.
• The Workflow Manager does not allow us to use links to create loops in the workflow. Each link in the workflow can run only once.
Valid Workflow :
Example of loop:
Specifying Link Conditions:
• Once we create links between tasks, we can specify conditions for each link to determine the order of execution in the workflow.
• If we do not specify conditions for each link, the Integration Service runs the next task in the workflow by default. • Use predefined or user-defined workflow variables in the link condition.
Steps:
1. In the Workflow Designer workspace, double-click the link you want to specify. 2. The Expression Editor appears.
3. In the Expression Editor, enter the link condition. The Expression Editor provides predefined workflow variables, user-defined workflow variables, variable functions, and Boolean and arithmetic operators.
Using the Expression Editor:
The Workflow Manager provides an Expression Editor for any expressions in the workflow. We can enter expressions using the Expression Editor for the following:
• Link conditions • Decision task • Assignment task SCHEDULERS
We can schedule a workflow to run continuously, repeat at a given time or interval, or we can manually start a workflow. The Integration Service runs a scheduled workflow as configured.
By default, the workflow runs on demand. We can change the schedule settings by editing the scheduler. If we change schedule settings, the Integration Service reschedules the workflow according to the new settings.
• A scheduler is a repository object that contains a set of schedule settings. • Scheduler can be non-reusable or reusable.
• The Workflow Manager marks a workflow invalid if we delete the scheduler associated with the workflow. • If we choose a different Integration Service for the workflow or restart the Integration Service, it reschedules all
workflows.
• If we delete a folder, the Integration Service removes workflows from the schedule. • The Integration Service does not run the workflow if:
• The prior workflow run fails.
• We remove the workflow from the schedule • The Integration Service is running in safe mode Creating a Reusable Scheduler
• For each folder, the Workflow Manager lets us create reusable schedulers so we can reuse the same set of scheduling settings for workflows in the folder.
• Use a reusable scheduler so we do not need to configure the same set of scheduling settings in each workflow. • When we delete a reusable scheduler, all workflows that use the deleted scheduler becomes invalid. To make the
workflows valid, we must edit them and replace the missing scheduler. Steps:
1. Open the folder where we want to create the scheduler. 2. In the Workflow Designer, click Workflows > Schedulers. 3. Click Add to add a new scheduler.
4. In the General tab, enter a name for the scheduler. 5. Configure the scheduler settings in the Scheduler tab. 6. Click Apply and OK.
Configuring Scheduler Settings
Configure the Schedule tab of the scheduler to set run options, schedule options, start options, and end options for the schedule.
There are 3 run options: 1. Run on Demand 2. Run Continuously