Notes Informatica

(1)

ETL Tools

 ETL Tools are meant to extract, transform and load the data into the data warehouse for decision making.

 Before ETL Tools, ETL Process was done manually by SQL code created by programmers.  This task was cumbersome and tedious since it involved many resources, complex coding

and more work hours.

 Maintaining the code placed a great challenge among the programmers.

 ETL tools are very powerful and then offer many advantages in all stages of ETL process starting from execution, data cleansing(Purification), data profiling, and transformation, debugging and loading the data into data warehouse when compared to old method. Popular ETL tools:

Tool Name Company Name

Informatica Informatica Corporation

Data Stage IBM

Ab Initio Ab Inito Sofware Corporation

Oracle Warehouse Builder Oracle Corporation

Why Informatica?

 Informatica is powerful ETL tool from Informatica Corporation.

 In Informatica all the metadata about source system, target system and transformations are stored in the Informatica Repository.

 Informatica PowerCenter client and Repository server access this repository to store and retrieve metadata.

 Informatica is very powerful in designing and building data warehouses.

 It can connect to the several source and targets to extract metadata from sources and targets, transform and load the data into the target systems.

Basic Details about Informatica

 Informatica was founded in 1993.

 its headquarter is in Redwood City, California  founded by Diaz Nesamoney and Gaurav Dhillion  Sohaib Abbasi is the chairman and CEO

(2)

Informatica Versions

Version Release Date Comments

Informatica Powercenter 4.1 Informatica Powercenter 5.1 Informatica Powercenter 6.1.2

Informatica Powercenter 7.1.2 Nov 2003

Informatica Powercenter 8.1 Jun 2006 Service Oriented Architecture Informatica Powercenter 8.5 Nov 2007

Informatica Powercenter 8.6 Jun 2008 Informatica Powercenter 9.1 Jun 2011 Informatica Powercenter 9.5 May 2012

Informatica power centre contains below tools

 Repository Manager

 Designer

 Workflow manager  Workflow Monitor

Repository Manager is used to  Configure domains  Create the repository  Deploy the code  Change password

Designer has the below tools

 Source Analyzer: - Import or create source definitions for flat file, XML, COBOL, Application, and relational sources.

 Target Designer: - Import or create target definitions.

 Transformation Developer: - Create reusable transformations.

(3)

 Mapping Designer:- Create mappings

Workflow Manager Tools

 To create a workflow, you first create tasks such as a session, which contains the mapping you build in the Designer.

 You then connect tasks with conditional links to specify the order of execution for the tasks you created.

 The Workflow Manager consists of three tools to help you develop a workflow:

 Task Developer. Use the Task Developer to create tasks you want to run in the workflow.  Workflow Designer. Use the Workflow Designer to create a workflow by connecting tasks

with links. You can also create tasks in the Workflow Designer as you develop the workflow.

 Worklet Designer: - it is used to create a worklet.

Workflow Tasks

You can create the following types of tasks in the Workflow Manager:  Assignment. Assigns a value to a workflow variable.

 Command. Specifies a shell command to run during the workflow.  Control. Stops or aborts the workflow.

 Decision. Specifies a condition to evaluate.  Email. Sends email during the workflow.

 Event-Raise:- Notifies the Event-Wait task that an event has occurred.  Event-Wait: - Waits for an event to occur before executing the next task.  Session:-Runs a mapping you create in the Designer.

 Timer:-Waits for a timed event to trigger

(4)

Workflow Monitor is used to  Monitor workflow and tasks.

 We can run, stop, abort and resume workflow in workflow monitor.

Basic Definitions:

 Mapping represents data flow from sources to targets.

 Mapplet is a set of transformations. That can be used in one or more mappings.  Session is a set of instructions to move data from sources to targets.

 Workflow is a set of instructions that tell the Informatica server how to execute the tasks.  Worklet is an object that represents a set of tasks.

Informatica Architecture

Informatica ETL product, known as Informatica Power Center consists of 3 main components. Informatica PowerCenter Client Tools:

 Designer

 Repository manager  Workflow manager  Workflow monitor

Informatica PowerCenter Repository

 Repository is the heart of the Informatica tool.

 It is kind of database where all the data related to mappings, sources, targets are Stored.  All the client tools and Informatica Server fetch data from Repository.

 Informatica client and server without repository is same as a PC without memory/harddisk, which has got the ability to process data but has no data to process. This can be treated as backend of Informatica.

Informatica PowerCenter Server

 Server is the place, where all the executions take place.

 Server makes physical connections to sources/targets, fetches data, applies the transformations mentioned in the mapping and loads the data in the target system.

(5)

This architecture is visually explained in diagram below: Sources Standard: RDBMS, Flat Files, XML, ODBC Applications: SAP R/3, SAP BW, PeopleSoft, Siebel, JD Edwards, i2

EAI: MQ Series, Tibco, JMS, Web Services Legacy: Mainframes (DB2, VSAM, IMS, IDMS, Adabas)AS400 (DB2, Flat File) Remote Sources Targets Standard: RDBMS, Flat Files, XML, ODBC Applications: SAP R/3, SAP BW, PeopleSoft, Siebel, JD Edwards, i2

EAI: MQ Series, Tibco, JMS, Web Services

Legacy: Mainframes (DB2)AS400 (DB2)

(6)

How components works in the Informatica Architecture

 Repository: - Repository is nothing but a relational database which stores all the

metadata created in power center. Whenever you develop any mapping, session,

workflow, an entries are made in the repository.

 Integration Service: - it extracts data from sources, processes it as per business logic

and load data to targets.

 Repository Service: - It connects to the repository, fetches data from the repository

and sends back them to the requested components (mostly client tools and

integration service).

 Power Center Client tools: - The PowerCenter Client consists of multiple tools. They

are used to manage users, define sources and targets, build mappings and mapplets

with the transformation logic, and create workflows to run the mapping logic. The

PowerCenter Client connects to the repository through the Repository Service to

fetch details. It connects to the Integration Service to start workflows. So essentially

client tools are used to code and give instructions to PowerCenter servers.

 PowerCenter Administration Console: This is simply a web-based administration

tool you can use to administer the PowerCenter installation.

(7)

1. What are the functionalities we can do with source qualifier

transformation?

Source qualifier is an active and connected transformation. It is used to represent the rows that the integrations service reads when it runs a session.

 Source qualifier transformation converts the source data types to the Informatica native data types.

 Joins: You can join two or more tables from the same source database.  Filter rows: You can filter the rows from the source database

 Sorting input: You can sort the source data by specifying the number for sorted ports. The Integration Service adds an ORDER BY clause to the default SQL query

 Distinct rows: You can get distinct rows from the source by choosing the "Select Distinct" property. The Integration Service adds a SELECT DISTINCT statement to the default SQL query.

 Custom SQL Query: You can write your own SQL query to do calculations.

2. Joiner Transformation Functionalities

The joiner transformation is an active and connected transformation used to join two heterogeneous sources.

Join Type

The joiner transformation supports the following four types of joins.  Normal Join

 Master Outer Join  Details Outer Join  Full Outer Join

We will learn about each join type with an example. Let say i have the following students and subjects tables as the source.

Table Name: Subjects Subject_Id subject_Name --- 1 Maths

(8)

3 Physics

Table Name: Students Student_Id Subject_Id --- 10 1 20 2 30 NULL

Assume that subjects source is the master and students source is the detail and we will join these sources on the subject_id port.

Normal Join:

The joiner transformation outputs only the records that match the join condition and discards all the rows that do not match the join condition. The output of the normal join is

Master Ports | Detail Ports

--- Subject_Id Subject_Name Student_Id Subject_Id --- 1 Maths 10 1

2 Chemistry 20 2

Master Outer Join:

In a master outer join, the joiner transformation keeps all the records from the detail source and only the matching rows from the master source. It discards the unmatched rows from the master source. The output of master outer join is

(9)

2 Chemistry 20 2 NULL NULL 30 NULL

Detail Outer Join:

In a detail outer join, the joiner transformation keeps all the records from the master source and only the matching rows from the detail source. It discards the unmatched rows from the detail source. The output of detail outer join is

2 Chemistry 20 2 3 Physics NULL NULL

Full Outer Join:

The full outer join first brings the matching rows from both the sources and then it also keeps the non-matched records from both the master and detail sources. The output of full outer join is

(10)

2 Chemistry 20 2 3 Physics NULL NULL NULL NULL 30 NULL

Joiner Transformation Performance Improve Tips

To improve the performance of a joiner transformation follow the below tips

 If possible, perform joins in a database. Performing joins in a database is faster than performing joins in a session.

 You can improve the session performance by configuring the Sorted Input option in the joiner transformation properties tab.

 Specify the source with fewer rows and with fewer duplicate keys as the master and the other source as detail.

Limitations of Joiner Transformation

The limitations of joiner transformation are

 You cannot use joiner transformation when the input pipeline contains an update strategy transformation.

(11)

Why do we need Lookup?

 Lookup transformation is used to look up data in a flat file, relational table, view or synonym.

 Lookup is a passive/active transformation

 It can be used in both connected/unconnected modes.

 From Informatica version 9 onwards lookup is an active transformation. The lookup transformation can return a single row or multiple rows.

The lookup transformation is used to perform the following tasks:

 Get a Related Value: You can get a value from the lookup table based on the source value. As

an example, we can get the related value like city name for the zip code value.

 Get Multiple Values: You can get multiple rows from a lookup table. As an example, get all the

states in a country.

 Perform Calculation. We can use the value from the lookup table and use it in calculations.  Update Slowly Changing Dimension tables: Lookup transformation can be used to

determine whether a row exists in the target or not.

What is the difference between Static and Dynamic Lookup Cache?

We can configure a Lookup transformation to cache the underlying lookup table. In case of static or read-only lookup cache the Integration Service caches the lookup table at the beginning of the session and does not update the lookup cache while it processes the Lookup transformation.

In case of dynamic lookup cache the Integration Service dynamically inserts or updates data in the lookup cache and passes the data to the target. The dynamic cache is synchronized with the target.

(12)

Difference Between Joiner Transformation And Lookup Transformation

Joiner Lookup

Active Transformation Active/Passive Transformation

We cannot override the query in joiner We can override the query in lookup to fetch the data from multiple tables.

Support Equi Join only Support Equi Join And Non Equi Join In joiner we cannot configure to use persistence

cache, shared cache, uncached and dynamic cache

Where as in lookup we can configure to use persistence cache, shared cache, uncached and dynamic cache.

We can perform outer join in joiner transformation.

We cannot perform outer join in lookup transformation.

Joiner used only as Source Lkp used as Source and as well as Target

Difference between Active and Passive Transformation

What is a Transformation

A transformation is a repository object which reads the data, modifies the data and passes the data.

Transformations can be classified as active or passive, connected or unconnected.

Active Transformations:

A transformation can be called as an active transformation if it performs any of the following actions.

 Change the number of rows: For example, the filter transformation is active because it removes the rows that do not meet the filter condition.

 Change the transaction boundary: The transaction control transformation is active because it defines a commit or roll back transaction.

 Change the row type: Update strategy is active because it flags the rows for insert, delete, update or reject.

(13)

Passive Transformations:

Transformations which does not change the number of rows passed through them, maintains the transaction boundary and row type are called passive transformation

List of Active and Passive Transformation

Active Transformation - An active transformation changes the number of rows that pass through the mapping.

1. Source Qualifier Transformation 2. Sorter Transformations 3. Aggregator Transformations 4. Filter Transformation 5. Union Transformation 6. Joiner Transformation 7. Normalizer Transformation 8. Rank Transformation 9. Router Transformation

10. Update Strategy Transformation

11. Advanced External Procedure Transformation

Passive Transformation - Passive transformations do not change the number of rows that pass through the mapping.

1. Expression Transformation

2. Sequence Generator Transformation 3. Lookup Transformation

4. Stored Procedure Transformation 5. XML Source Qualifier Transformation 6. External Procedure Transformation 7. Input Transformation(Mapplet) 8. Output Transformation(Mapplet)

(14)

Is look up Active Transformation?

Before Informatica version 9.1, Look up transformation was passive. For each input row that we pass to lookup transformation, we can get only one output row even if we get multiple rows as output.

Look up Policy on multiple matches:

This property determines which rows to return when the Lookup transformation finds

multiple rows that match the lookup condition. Select one of the following values:

Report Error. The Integration Service reports an error and does not return a row.

Use First Value. Returns the first row that matches the lookup condition.

Use Last Value. Return the last row that matches the lookup condition.

Use Any Value. The Integration Service returns the first value that matches the lookup

condition. It creates an index based on the key ports instead of all Lookup

transformation ports.

From Informatica 9.1 onwards Lookup transformation can returns multiple rows as output. There is new value added to the above property.



Use All Values. Return all matching rows.

(15)

Rank Transformation

 Rank Transformation is an active and connected transformation.  It is used to select top or bottom rank of data.

 It Allows us to select a group of top or bottom values, not just one value

Update Strategy Transformation

 It is an active and connected transformation.

 It is used to insert, update and delete records in target table.  It can also reject the records before reaching target tables.

Flagging Rows in Mapping with Update Strategy:

You have to flag each row for inserting, updating, deleting or rejecting. The constants and their numeric equivalents for each database operation are listed below.

 DD_INSERT: Numeric value is 0. Used for flagging the row as Insert.  DD_UPDATE: Numeric value is 1. Used for flagging the row as Update.  DD_DELETE: Numeric value is 2. Used for flagging the row as Delete.  DD_REJECT: Numeric value is 3. Used for flagging the row as Reject.

The integration service treats any other numeric value as an insert.

Important Note:

Update strategy works only when we have a primary key on the target table. If there is no primary key available on the target table, then you have to specify a primary key in the target definition in the mapping for update strategy transformation to work

(16)

Stored Procedure Transformation

 It is passive transformation

 It can work as a connected or unconnected transformation  It is used to run the stored procedure in database

You can perform below tasks in stored procedure transformation

 Check the status of the target database before loading data into it  Determine if enough space exists in data base or not

 Perform a specialized calculation  Dropping and re-creating indexes

Stored Procedure Transformation Overview:

One of the important features of stored procedure is that you can send data to the stored procedure and receive data from the stored procedure. There are three types of data which pass between the integration service and the stored procedure:

 Input / Output Parameters: Used to send and receive data from the stored procedure.

 Return Values: After running a stored procedure, most databases returns a value. This value can either be user-definable, which means that it can act similar to a single output parameter, or it may only return an integer value. If a stored procedure returns a result set rather than a single return value, the Stored Procedure transformation takes only the first value returned from the procedure.  Status Codes: Status codes provide error handling for the Integration Service during a workflow.

Stored procedure issues a status code that notifies whether or not the stored procedure completed successfully. You cannot see this value.

Connected and Unconnected Stored Procedure Transformation:

 Connected Stored Procedure Transformation: The stored procedure transformation is connected to other transformations in the flow of the mapping. Use connected Stored Procedure transformation when you need data from an input port sent as an input parameter to the stored procedure, or the results of a stored procedure sent as an output parameter to another

transformation.

 Unconnected Stored Procedure Transformation: The stored procedure transformation is not connected directly to the flow of the mapping. It either runs before or after the session, or is called by an expression in another transformation in the mapping.

(17)

Specifying when the Stored Procedure Runs:

The property, "Stored Procedure Type" is used to specify when the stored procedure runs. The different values of this property are shown below:

 Normal: The stored procedure transformation runs for each row passed in the mapping. This is useful when running a calculation against an input port. Connected stored procedures run only in normal mode.

 Pre-load of the Source: Runs before the session reads data from the source. Useful for verifying the existence of tables or performing joins of data in a temporary table.

 Post-load of the Source: Runs after reading data from the source. Useful for removing temporary tables.

 Pre-load of the Target: Runs before the session sends data to the target. This is useful for verifying target tables or disk space on the target system.

 Post-load of the Target: Runs after loading data into the target. This is useful for re-creating indexes on the database.

Connected and Unconnected Lookup

Connected Lookup Unconnected Lookup

Receives input values directly

from the pipeline. :LKP expression in another transformation. Receives input values from the result of a We can use a dynamic or static

cache. We can use a static cache. Cache includes all lookup columns

used in the mapping. the lookup condition and the lookup/return Cache includes all lookup/output ports in port.

If there is no match for the lookup condition, the Power Center Server returns the default

value for all output ports.

If there is no match for the lookup condition, the Power Center Server returns

NULL.

If there is a match for the lookup condition, the Power Center Server returns the result of the

lookup condition for all lookup/output ports.

If there is a match for the lookup condition,the Power Center Server returns

the result of the lookup condition into the return port.

Pass multiple output values to

another transformation. Pass one output value to another transformation. Supports user-defined default

(18)

Number of Transformations in Informatica?

Around 30

Normalizer Transformation in Informatica  It is an active transformation

 Can output multiple rows from each input row

 Can transpose the data(transposing columns to rows)

Transposing data using Normalizer

Let's imagine we have a table like below that stores the sales figure for 4 quarters of a year in 4 different columns. As you can see each row represent one shop and the columns represent the corresponding sales. Next, imagine - our task is to generate a result-set where we will have separate rows for every quarter. We can configure a Normalizer transformation to return a separate row for each quarter like below..

The following source rows contain four quarters of sales by store: Source Table

Store Quarter1 Quarter2 Quarter3 Quarter4

Shop 1 100 300 500 700

Shop 2 250 450 650 850

The Normalizer returns a row for each shop and sales combination. It also returns an index - called GCID (we will know later in detail) - that identifies the quarter number:

Target Table

Shop Sales Quarter

(19)

Shop 1 300 2 Shop 1 500 3 Shop 1 700 4 Shop 2 250 1 Shop 2 450 2 Shop 2 650 3 Shop 2 850 4 Persistent cache:

Lookups are cached by default in Informatica. This means that Informatica by default brings in the entire data of the lookup table from database server to Informatica Server as a part of lookup cache building activity during session run.

Persistent cache is required when lookup table size is huge and the same lookup table is being used in different mappings.

In persistent cache, the integration service saves the lookup cache files after a successful session run.

If by mistake, persistent cache gets deleted then integration services generates the cache file again when the session run next time.

So the solution is to use Named Persistent Cache.

In the first mapping we will create the Named Persistent Cache file by setting three properties in the Properties tab of Lookup transformation.

(20)

 Lookup cache persistent:

To be checked i.e. a Named Persistent Cache will be used.

 Cache File Name Prefix:

user_defined_cache_file_name i.e. the Named Persistent cache file name that will be used in all the other mappings using the same lookup table. Enter the prefix name only. Do not enter .idx or .dat

 Re-cache from lookup source:

To be checked i.e. the Named Persistent Cache file will be rebuilt or refreshed with the current data of the lookup table.

Next in all the mappings where we want to use the same already built Named Persistent Cache we need to set two properties in the Properties tab of Lookup transformation.

(21)

 Lookup cache persistent:

To be checked i.e. the lookup will be using a Named Persistent Cache that is already saved in Cache Directory and if the cache file is not there the session will not fail it will just create the cache file instead.

 Cache File Name Prefix:

user_defined_cache_file_name i.e. the Named Persistent cache file name that was defined in the mapping where the persistent cache file was created.

If there is any Lookup SQL Override then the SQL statement in all the lookups should match exactly even also an extra blank space will fail the session that is using the already built persistent cache file.

(22)

So if the incoming source data volume is high, the lookup table’s data volume that need to be cached is also high, and the same lookup table is used in many mappings then the best way to handle the situation is to use one-time build, already created persistent named cache

1. Connected or unconnected transformation, which is good in performance wise?  Unconnected better than connected.

 If you are calling lookup transformation multiple times in the same mapping, then go with unconnected. So that unconnected will build the cache once so that performance will reduce than connected transformation.

2. What will be your approach when a workflow is running for more hours?  Check which session is taking long time.

 Which process is taking more time.. Like.. building cache, reading data from the source, loading data into the target

 What was the regular source count and today’s count  Check if any database lock in source or target table.

 While building cache, we need to check if we have enough space in the server or not.

3. Why should we go for stored procedure transformation in Informatica, why can't we build the same logic in Informatica itself?

 Yes, we can do in Informatica as well but there are some situation where we must have to go with Stored procedure transformation like

 If the complex logic has to be built and if we try to build through Informatica, then we have to keep many number of transformations in the mapping

 Which will make more complex to understand?

4. What are the functionalities we can do in source qualifier transformation?  filtration of the records if source is RDBMS

 Joins of the tables

 Remove duplicate records thorough distinct option  Ordering the records

5. Joiner operations functionality, main properties in Joiner Transformations?

 joins more than one table

 Table with less records are master table  Table with more records are details table

(23)

6. Workflow failed, out of 200, 100 records got loaded, now how do you re-run the workflow?

If there is a lookup on target to check if we get new records then insert it otherwise update it, in this case we can re-run the workflow from the starting itself.

7. How do you handle duplicate record to be loaded into the Informatica?

 If the source system is RDBMS, and then to eliminate duplicate records we can check the DISTINCT option of the source qualifier in the source table and load the data into the target.  If the source system is flat file, and then we can use SORTER transformation to eliminate

duplicate records.

8. SCD type 2.. is lookup required?  Yes, it is required.

9. Difference between persistent cache and non-persistent cache? What if persistent cache gets deleted by mistake?

 Persistent cache does not delete the lookup cache even after the session has completed.  If the lookup tables has huge data and in many mappings it is being used. In this scenario we

should go for persistent cache.

 Because each time when we run the session, it has to build the cache and the time for building the cache depends on the size of the lookup table.

 If the size of the lookup table is small, then building cache will require less time.

 If the size of the lookup table is huge, then building cache will require more time. Hence, performance degrades.

 To avoid these performance degrade, build the cache once and use the same cache whenever required.

 If any changes in the data in the lookup table then need to refresh the persistent lookup cache.

10. How do you update target table without using update strategy transformation.  Yes, we can update target table without using update Strategy transformation.

 For this, we need to define the key in the target table in Informatica level and

 Then we need to connect the key and the field we want to update in the mapping Target.  In the session level, we should set the target property as "Update as Update" in the mapping

(24)

11. Which Informatica version you have worked? Informatica 9.1

12. Which scheduler you have used? Informatica scheduler

13. Which schema follows your project? Star Schema

14. How many mappings u have created in the project? Around 40 to 50

15. How do you load data into dimension table and fact table?

First we will load the data from source to the stage tables then stage to the dimensional table. And stage to the fact table.

16. How many fact and dimensional tables in your project? Fact Table – 1

Dimensional Table – 6

17. How many mapplets you have created in your project? Around 10

18. List of Errors which u faced in your project?

Issue: "Repository Error ([REP_55101] Connection to the Repository Service [INF_EN2_REP] is broken.)"

"Error updating sequence generator [SEQTRANS]."

Cause: Sequence generator value is not updated

Ticket to other teams : Raise SC with GAMS team

Solution: Sequence generator value should be updated with max (primary key) in that mapping. If the data is completely loaded into the target running the session is not required

(25)

Issue: Severity Timestamp Node Thread Message Code Message

ERROR 22/05/2010 12:11:26 p.m. NODE_INF_EN2 READER_1_1_1 RR_4035 SQL Error [

ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

Database driver error... Function Name : Execute

Cause: Low memory space

Ticket to other teams :

Raise SC with DB team to increase the table space.

Issue: “Database driver error... CMN_1022 [

Database driver error... Function Name: Logon

ORA-01017: invalid username/password; logon denied

Database driver error... Function Name: Connect

Database Error: Failed to connect to database using user [csdwh_inf] and connection string [EUERDWP1.AE.GE.COM].]”

Cause: Invalid Login details for this user

(26)

Issue: “ORA-01089: immediate shutdown in progress - no operations are permitted” Cause:: Network issue.

Solution: Rerun the session once the issue is fixed.

Issue: “ SQL Error [

ORA-01722: invalid number

Database driver error...

Cause:: This error occurs when an invalid data comes from ERP.

Ticket to other teams : Raise a SC with GAMS team.

Issue_{: Error connecting to database [Database driver error...} Function Name : Logon

ORA-12541: TNS : no listener Database driver error... Function Name : Connect

Database Error: Failed to connect to database using user [MKT_APP] and connection string [GPSOR252]

(27)

Cause - DB is changed to etpsblp2 from GPSOR252

Raise SC with ERP DB team and the DB team has to add the

TNS on the new server and the connection string should be changed to etpsblp2

Issue_: SQL Error [

ORA-00942: table or view does not exist

Cause:

GEPSREADONLY user is replaced with GEPSCSIDW

Work order should be raised with ERP DB team for getting access to the tables for GEPSCSIDW user.

Issue_: SQL Error [

ORA-02019: connection description for remote database not found

Cause - GEPSCSIDW user dont have access to P76ORP99R DB link

(28)

Issue_:

“Database driver error... CMN_1022 [

Database driver error... Function Name: Logon

ORA-03113: end-of-file on communication channel

Database driver error... Function Name : Connect

Database Error: Failed to connect to database using user [GEPSREADONLY] and connection string [gpsesp76]. “

Cause: DB is down.

Solution: Check with DB team if the DB is fine and rerun the session.

Issue_:

FATAL ERROR: Aborting the DTM process due to fatal signal/exception.

Cause:: DB issue

Solution: Rerun the session

Issue: “FATAL ERROR: Failed to allocate memory (out of virtual memory).”

(29)

Solution: Rerun the session

Issue Transformation Evaluation Error [<<Expression Error>> [TO_DATE]: invalid string for converting to Date

TO_DATE(s:LTRIM(s:RTRIM(s:'10.09am',s:' '),s:' '),s:'MM-DD-YYYY HH12:MI:SS AM')]

Cause:: Issue with mapping.

Ticket to other teams : Raise SC with GAMS team.

Issue: Unique Constraint violated error.

Cause Sequence generator value is not updated.

Solution: Sequence generator value must be updated with the value max (primary key) in that mapping.

Issue:: “Database driver error... CMN_1022 [

Database driver error... Function Name : Logon ORA-12541: TNS: no listener

(30)

Function Name: Connect

Database Error: Failed to connect to database using user [epsuser] and connection string [atlorp38].] “

Cause: TNS entries changed on the server.

Raise SC with ERP DB team to get the Tns details.

Issue: “Error connecting to database [ Database driver error...

Function Name : Logon

ORA-12537: TNS:connection closed

Database driver error... Function Name : Connect

Database Error: Failed to connect to database using user [gepsreadonly] and connection string [gpsescp1].].”

Cause: DB is down.

Solution: Check with DB team and if the instance is up then rerun the session.

Issue:

"Error opening file

[/dba96/d031/staging/SrcFiles/CSDWH01/INFA/CSDWH_TC_REQUEST_30Jun2008_28sep2008.dat ]. Operating system error message [No such file or directory]."

(31)

Issue:

"Database driver error... Function Name: Logon

ORA-12545: Connect failed because target host or object does not exist

Database driver error... Function Name: Connect

Database Error: Failed to connect to database using user [epsuser] and connection string [ATLORP38]."

Cause: Change in the login details.

Ticket to other teams : Raise SC with ERP DB team for login details

Issue:

Writer initialization failed [Error 13 opening session bad (reject) file [/dba96/d031/staging/Prod/BadFiles/csdwh_etech_doc_t1.bad]]

[Error 13 opening session bad (reject) file

[/dba96/d031/staging/Prod/BadFiles/tgt_csdwh_etech_milestone_d_upd1.bad]]

(32)

Project Architecture:

Source

Systems Staging Layer

(33)

What kind of enhancements you have worked on?

I have worked on many enhancements as per business requirements. In some enhancement, we had source as flat files like .dat and .csv files,

We need to develop a mapping to load the data from source systems to the staging environment after performing all field level validations. From stage to base table, we call PLSQL procedure to load the data.

All file level validations will be done through UNIX shell scripting.

In few enhancements, our source was database. We need to fetch the data from the table and generate a file files and send it to the integration team.

SESSSTARTTIME

 SESSSTARTTIME returns the current date and time value on the node that runs the session when the Integration Service initializes the session.

 Use SESSSTARTTIME in a mapping or a mapplet. You can reference SESSSTARTTIME only within the expression language.

SYSDATE

SYSDATE returns the current date and time up to seconds on the node that runs the session for each row passing through the transformation

(34)

What is the complex mapping you have developed? There was few mappings which I feel quite complex. I have used more transformation in the mapping.

Used around 10 different transformations in the mapping.  Source qualifier  Expression  Lookup  Stored procedure  Filter  Update Strategy  Router  Union

Field Level Validations includes

 Trimming the spaces through LTRIM and RTRIM  Check if the value is NULL or not using ISNULL function

 Check if the value contains any spaces using IS_SPACE function

 Check if the length is greater than specified limit using LENGTH function  Check if the value is NUMBER or not using NOT IS_NUMBER function.

 We get the country code then we do lookup on the PITSTBT_CNTRY master table and get the country name

 Calendar date, we will get as a string and then convert it to date/time using TO_DATE function.

 Used lookup sql override to fetch data by querying multiple tables. 

(35)

Pushdown Optimization

When you run a session configured for pushdown optimization, the Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database

(36)

This mapping contains an Expression transformation that creates an item ID based on the store number 5419 and the item ID from the source. To push the transformation logic to the database, the Integration Service generates the following SQL statement:

INSERT INTO T_ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC) SELECT CAST((CASE WHEN 5419 IS NULL THEN '' ELSE 5419 END) + '_' + (CASE WHEN ITEMS.ITEM_ID IS NULL THEN '' ELSE

ITEMS.ITEM_ID END) AS INTEGER), ITEMS.ITEM_NAME, ITEMS.ITEM_DESC FROM ITEMS2 ITEMS The Integration Service generates an INSERT SELECT statement to retrieve the ID, name, and description values from the source table, create new item IDs, and insert the values into the ITEM_ID, ITEM_NAME, and ITEM_DESC columns in the target table. It concatenates the store number 5419, an underscore, and the original ITEM ID to get the new item ID.

Pushdown Optimization Types

You can configure the following types of pushdown optimization:

Source-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the source database.

Target-side pushdown optimization. The Integration Service pushes as much transformation logic as possible to the target database.

Full pushdown optimization. The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.

Running Source-Side Pushdown Optimization Sessions

When you run a session configured for source-side pushdown optimization, the Integration Service analyzes the mapping from the source to the target .

The Integration Service generates and executes a SELECT statement based on the transformation logic for each transformation it can push to the database. Then, it reads the results of this SQL query and processes the remaining transformations.

(37)

When you run a session configured for target-side pushdown optimization, the Integration Service analyzes the mapping from the target to the source. It generates an INSERT, DELETE, or UPDATE statement based on the transformation logic for each transformation it can push to the target database.

The Integration Service processes the transformation logic up to the point that it can push the transformation logic to the database. Then, it executes the generated SQL on the target database.

Running Full Pushdown Optimization Sessions

 To use full pushdown optimization, the source and target databases must be in the same relational database management system.

 When you configure a session for full optimization, the Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the target database.

 If the Integration Service cannot push all transformation logic to the target database, it tries to push all transformation logic to the source database.

 If it cannot push all transformation logic to the source or target, the Integration Service pushes as much transformation logic to the source database, processes intermediate transformations that it cannot push to any database, and then pushes the remaining transformation logic to the target database.

 The Integration Service generates and executes an INSERT SELECT, DELETE, or UPDATE statement for each database to which it pushes transformation logic.

For example, a mapping contains the following transformations:

The Rank transformation cannot be pushed to the source or target database. If you configure the session for full pushdown optimization, the Integration Service pushes the Source Qualifier transformation and the Aggregator transformation to the source, processes the Rank

transformation, and pushes the Expression transformation and target to the target database. The Integration Service does not fail the session if it can push only part of the transformation logic to the database.

What are the types of loading in Informatica?

There are mainly two types of loading in Informatica

(38)

1. Normal 2. Bulk

In Normal Load, record by record gets loaded into the target table and it generates log for that but it takes time.

In Bulk Load, a number of records get loaded into the target table but it ignores logs. It takes less time to load the data into the target table.

Difference between Full Load and Incremental Load?

In Full Load or One-time load or History Load, complete data from source table will be loaded into the target table in single time. It truncates all rows and loads from scratch. It takes more time. In Incremental load, difference between target and source data is loaded at regular interval. Timestamp of pervious load has to be maintained. It takes less time.

By Full Load or One-time load we mean that all the data in the Source table(s) should be processed. This contains historical data usually. Once the historical data is loaded we keep on doing incremental loads to process the data that came after one-time load

PMCMD Command

(39)

 Start workflows.

 Start workflow from a specific task.  Stop Abort workflows and Sessions.  Schedule the workflows.

How to use PMCMD Command in Informatica:

1. Start workflow

the following pmcmd command starts the specified workflow:

pmcmd startworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name

2. Stop workflow

Pmcmd command to stop the Informatica workflow is shown below:

pmcmd stopworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name

3. Start workflow from a task

you can start the workflow from a specified task. This is shown below:

pmcmd startask -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-user-name -w workflow-user-name -startfrom task-user-name

4. Stopping a task.

The following pmcmd command stops the specified task instance:

pmcmd stoptask -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-user-name -w workflow-user-name task-user-name

5. Aborting workflow and task.

(40)

pmcmd abortworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name

pmcmd aborttask -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name task-name

6. Scheduling the workflow

the pmcmd command syntax for scheduling the workflow is shown below:

pmcmd scheduleworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name

You cannot specify the scheduling options here. This command just schedules the workflow for the next run.

Tried with below command

pmcmd startworkflow -service Alpha_Dev_UTF8_IS -d Capital_Americas_Development -u 502197909 -p D502197909 -f CFDW_LOAN_LAYER -wait wflow_test_1001

(41)

Partitioning In Informatica

 Is used to improve performance in Informatica  It is done at session Level

 Adding partitions in the pipeline  Use more of the system hardware

 Achieve performance through parallel data processing

 A pipeline consists of a source qualifier and all the transformations and Targets that receive data from that source qualifier.

 When the Integration Service runs the session, it can achieve higher Performance by partitioning the pipeline and performing the extract, Transformation, and load for each partition in parallel.

PARTITIONING ATTRIBUTES 1. Partition points

 By default, IS sets partition points at various transformations in the pipeline.

(42)

 We can define up to 64 partitions at any partition point in a pipeline.

 When we increase or decrease the number of partitions at any partition point, the

Workflow Manager increases or decreases the number of partitions at all Partition points in the pipeline.

 Increasing the number of partitions or partition points increases the number of threads.

 The number of partitions we create equals the number of connections to the source or target. For one partition, one database connection will be used.

(43)

3. Partition types

 The Integration Service creates a default partition type at each partition point.

 If we have the Partitioning option, we can change the partition type. This option is purchased separately.

 The partition type controls how the Integration Service distributes data among partitions at partition points.

Types of Partition

 Database Partitioning

In Database Partitioning Integration Service queries the oracle database system for table partition information. It reads the partitioned data from the corresponding nodes in the database.

 Hash auto-keys. The Integration Service uses a hash function to group rows of data among partitions. The Integration Service groups the data based on a partition key.

 Hash user keys. The Integration Service uses a hash function to group rows of data among partitions. You define the number of ports to generate the partition key.

 Key range. With key range partitioning, the Integration Service distributes rows of data based on a port or set of ports that you define as the partition key. For each port, you define a range of values. The Integration Service uses the key and ranges to send rows to the appropriate partition. Use key range partitioning when the sources or targets in the pipeline are partitioned by key range.

 Pass-through. In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions. All rows in a single partition stay in the partition after crossing a pass-through partition point. Choose pass-through partitioning when you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions

 Round-robin. The Integration Service distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows

(44)

New Features in Informatica 9.1

 Database deadlock resilience feature - this will ensure that your session does not immediately fail if it encounters any database deadlock, it will now retry the operation again. You can configure number of retry attempts

Multiple rows return

Lookups can now be configured as an Active transformation to return Multiple Rows.We can configure the Lookup transformation to return all rows that match a lookup condition

Limit the Session Log

You can limit the size of session logs for real-time sessions.

Passive transformation

We can configure the SQL transformation to run in passive mode instead of active mode. When the SQL transformation runs in passive mode, the SQL transformation returns one output row for each input row.

Important Notes on above features

Database Deadlock Resilience

When a database deadlock error occurs, the session does not fail. The Integration Service attempts to re-execute the last statement for a specified retry period.

You can configure the number of deadlock retries and the deadlock sleep interval for an Integration Service. You can override these values at the session level as custom properties.

(45)

Configure following Integration Service Properties:

NumOfDeadlockRetries. The number of times the PowerCenter Integration Service retries a target write on a database deadlock. Minimum is 0. Default is 10. If you want the session to fail on

deadlock set NumOfDeadlockRetries to zero.

DeadlockSleep. Number of seconds before the PowerCenter Integration Service retries a target write on database deadlock.

If a deadlock occurs, the Integration Service attempts to run the statement. The Integration Service waits for a delay period between each retry attempt. If all attempts fail due to deadlock, the session fails. The Integration Service logs a message in the session log whenever it retries a statement.

(46)

DTM PROCESS

DTM means Data Transformation Manager. In informatica this is the main background process which runs after completion of the Load Manager.

When Powercenter Server runs a workflow it initializes Load manager and the Load Manager is responsible to perform the below tasks.

1. Locks the workflow and reads the workflow properties 2. Reads the parameter file and expands the workflow Variables 3. Created the workflow log file.

4. Distributes the session to worker servers. 5. Starts the session and Create DTM process DTM Process gets started it does the below tasks

1. It fetches session and mapping metadata from the repository. 2. Creates and expands session variables

3. Creates the session log file.

4. Validates the source and target code pages. 5. Verifies the connection object permissions 6. Runs pre and post session commands

7. It creates the three processes i.e Reader thread, Transformation Thread and Writer thread to extract, transform and load the data.

Expression Transformation

 It is passive and connected transformation.  It is used to calculate values on a single row.

 Examples of calculations are concatenating the first and last name, adjusting the employee salaries, converting strings to date etc.

 Expression transformation can also be used to test conditional statements before passing the data to other transformations.

(47)

Sorter Transformation

 It is active and connected transformation.

 It is used to sort the data in ascending or descending order.

 The sorter transformation is used to sort the data from relational or flat file sources.  The sorter transformation can also be used for case-sensitive sorting and can be used to

specify whether the output rows should be distinct or not.

Performance improvement Tip

Use the sorter transformation before the aggregator and joiner transformation and sort the data for better performance.

Filter Transformation

 It is active and connected transformation.  It is used to filter out rows in the mapping.

Performance Tuning Tips

 Use the filter transformation as close as possible to the sources in the mapping. This will reduce the number of rows to be processed in the downstream transformations.

 In case of relational sources, if possible use the source qualifier transformation to filter the rows. This will reduce the number of rows to be read from the source.

Note: The input ports to the filter transformation must come from a single transformation. You cannot connect ports from more than one transformation to the filter.

(48)

Aggregator Transformation

 It is active and connected transformation.

 It is used to perform calculations such as sums, averages, counts on groups of data.

Configuring the aggregator transformation:

You can configure the following components in aggregator transformation

 Aggregate Cache: The integration service stores the group values in the index cache and row data in the data cache.

 Aggregate Expression: You can enter expressions in the output port or variable port.

 Group by Port: This tells the integration service how to create groups. You can configure input, input/output or variable ports for the group.

 Sorted Input: This option can be used to improve the session performance. You can use this option only when the input to the aggregator transformation in sorted on group by ports.

Incremental Aggregation:

After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes source data through the mapping and uses historical cache data to perform aggregation calculations incrementally.

Sorted Input:

You can improve the performance of aggregator transformation by specifying the sorted input. The Integration Service assumes all the data is sorted by group and it performs aggregate calculations as it reads rows for a group. If you specify the sorted input option without actually sorting the data, then integration service fails the session.

(49)

Rank Transformation

 It is an Active and Connected Transformation.  It is used to select top or bottom rank of data.

 When we create RANK Transformation, by default it creates RANKINDEX port. This port is used to store the ranking position of each row in the group.

 In the ports tab, check the Rank (R) option for the port which you want to do ranking. You can check the Rank (R) option for only one port. Optionally you can create the groups for ranked rows. Select the Group By option for the ports that define the groups.

Configure the following properties of Rank transformation

 Top/Bottom: Specify whether you want to select the top or bottom rank of data.  Number of Ranks: specify the number of rows you want to rank.

 Rank Data Cache Size: The data cache size default value is 2,000,000 bytes. You can set a numeric value, or Auto for the data cache size. In case of Auto, the Integration Service determines the cache size at runtime.

 Rank Index Cache Size: The index cache size default value is 1,000,000 bytes. You can set a numeric value, or Auto for the index cache size. In case of Auto, the Integration Service determines the cache size at runtime.

(50)

Stored Procedure Transformation

 It is passive and can be acts as connected or unconnected transformation.

 Stored Procedure Transformation is used to run the stored procedure in the database.

Some of the Tasks we can perform through Stored Procedure Transformation.  Check the status of a target database before loading data into it.

 Determine if enough space exists in a database.  Perform a specialized calculation.

 Dropping and recreating indexes

Connected Stored Procedure Transformation

It is connected to other transformations in the mapping. Use connected stored procedure transformation when data from an port sent as input parameters to the stored procedure transformation.

Unconnected Stored Procedure Transformation

It is not directly connected to other transformations in the mapping. It runs either before or after the session or is being called by an expression in other transformation in the mapping.

Specifying when the Stored Procedure Runs:

The property, "Stored Procedure Type" is used to specify when the stored procedure runs. The different values of this property are shown below:

 Normal: The stored procedure transformation runs for each row passed in the mapping. This is useful when running a calculation against an input port. Connected stored procedures run only in normal mode.

 Pre-load of the Source: Runs before the session reads data from the source. Useful for verifying the existence of tables or performing joins of data in a temporary table.

(51)

 Post-load of the Source: Runs after reading data from the source. Useful for removing temporary tables.

 Pre-load of the Target: Runs before the session sends data to the target. This is useful for verifying target tables or disk space on the target system.

 Post-load of the Target: Runs after loading data into the target. This is useful for re-creating indexes on the database.

Update Strategy Transformation

 It is an Active and connected transformation.

 It is used to insert, update and delete the records in the target table.  It can also reject the records without reaching target tables.

In the Informatica, you can set the update strategy at two different levels:

 Session Level: Configuring at session level instructs the integration service to either treat all rows in the same way (Insert or update or delete).

 Mapping Level: Use update strategy transformation to flag rows for insert, update, delete or reject.

Flagging Rows in Mapping with Update Strategy:

You have to flag each row for inserting, updating, deleting or rejecting. The constants and their numeric equivalents for each database operation are listed below.

 DD_INSERT: Numeric value is 0. Used for flagging the row as Insert.  DD_UPDATE: Numeric value is 1. Used for flagging the row as Update.  DD_DELETE: Numeric value is 2. Used for flagging the row as Delete.  DD_REJECT: Numeric value is 3. Used for flagging the row as Reject.

The integration service treats any other numeric value as an insert

Important Note:

Update strategy works only when we have a primary key on the target table. If there is no primary key available on the target table, then you have to specify a primary key in the target definition in the mapping for update strategy transformation to work.

(52)

Lookup Transformation

 It is passive/Active transformation.

 It can be used in both connected/unconnected modes.  It is used to look up data in flat file or relational database.

 From Informatica 9 onwards, Lookup is an active transformation. It can return single row or multiple rows.

The lookup transformation is used to perform the following tasks:

 Get a Related Value: You can get a value from the lookup table based on the source value. As an example, we can get the related value like city name for the zip code value.

 Get Multiple Values: You can get multiple rows from a lookup table. As an example, get all the states in a country.

 Perform Calculation. We can use the value from the lookup table and use it in calculations.  Update Slowly Changing Dimension tables: Lookup transformation can be used to determine

whether a row exists in the target or not.

Connected or Unconnected lookup: A connected lookup receives source data, performs a lookup and returns data to the pipeline. An unconnected lookup is not connected to source or target or any other transformation. A transformation in the pipeline calls the lookup transformation with the :LKP expression. The unconnected lookup returns one column to the calling transformation.

Cached or Un-cached Lookup: You can improve the performance of the lookup by caching the lookup source. If you cache the lookup source, you can use a dynamic or static cache. By default, the lookup cache is static and the cache does not change during the session. If you use a dynamic cache, the integration service inserts or updates row in the cache. You can lookup values in the cache to determine if the values exist in the target, then you can mark the row for insert or update in the target.

(53)

Union Transformation

 It is Active and Connected Transformation.

 It is used to merge the data from multiple pipelines into single pipelines.  It is same as UNION ALL transformation in SQL.

 It does not remove any duplicate rows.

Why union transformation is active

Union is an active transformation because it combines two or more data streams into one. Though the total number of rows passing into the Union is the same as the total number of rows passing out of it, and the sequence of rows from any given input stream is preserved in the output, the positions of the rows are not preserved, i.e. row number 1 from input stream 1 might not be row number 1 in the output stream. Union does not even guarantee that the output is repeatable

Joiner Transformation

 It is Active and Connected Transformation.  It is used to join two heterogeneous sources.

 The integration service joins both the sources based on the Join Condition.  The two input pipelines include a master and a detail pipeline.

 To join more than our sources, you need to join the output of joiner transformation with another source.

 To join n number of sources, we have to use n-1 joiner transformations in the mapping.

(54)

 Drag the ports from the first source into the joiner transformation. By default the designer creates the input/output ports for the source fields in the joiner transformation as detail fields.

 Now drag the ports from the second source into the joiner transformation. By default the designer configures the second source ports as master fields

Join Condition

The integration service joins both the input sources based on the join condition. The join condition contains ports from both the input sources that must match. You can specify only the equal (=) operator between the join columns. Other operators are not allowed in the join condition. As an example, if you want to join the employees and departments table then you have to specify the join condition as department_id1= department_id. Here department_id1 is the port of departments source and department_id is the port of employees source.

Join Type

The joiner transformation supports the following four types of joins.  Normal Join

 Master Outer Join  Details Outer Join  Full Outer Join

We will learn about each join type with an example. Let say i have the following students and subjects tables as the source.

Table Name: Subjects Subject_Id subject_Name --- 1 Maths

2 Chemistry 3 Physics

Table Name: Students Student_Id Subject_Id

(55)

--- 10 1

20 2 30 NULL

Assume that subjects source is the master and students source is the detail and we will join these sources on the subject_id port.

Normal Join:

The joiner transformation outputs only the records that match the join condition and discards all the rows that do not match the join condition. The output of the normal join is

2 Chemistry 20 2

Master Outer Join:

In a master outer join, the joiner transformation keeps all the records from the detail source and only the matching rows from the master source. It discards the unmatched rows from the master source. The output of master outer join is

2 Chemistry 20 2 NULL NULL 30 NULL

(56)

Detail Outer Join:

In a detail outer join, the joiner transformation keeps all the records from the master source and only the matching rows from the detail source. It discards the unmatched rows from the detail source. The output of detail outer join is

2 Chemistry 20 2 3 Physics NULL NULL

Full Outer Join:

The full outer join first brings the matching rows from both the sources and then it also keeps the non-matched records from both the master and detail sources. The output of full outer join is

(57)

2 Chemistry 20 2 3 Physics NULL NULL NULL NULL 30 NULL

Joiner Transformation Performance Improve Tips

To improve the performance of a joiner transformation follow the below tips

 If possible, perform joins in a database. Performing joins in a database is faster than performing joins in a session.

 You can improve the session performance by configuring the Sorted Input option in the joiner transformation properties tab.

 Specify the source with fewer rows and with fewer duplicate keys as the master and the other source as detail

Normalizer Transformation

 It is active and connected Transformation.  It returns multiple rows for a source row. The two new ports are

 GK field generate sequence number starting from the value as defined in sequence field.  GCID holds column number of the occurrence field.

Router Transformation

 Router Transformation is an Active and Connected Transformation.  It is used to filter the data based on some condition.

 In a filter transformation, you can specify only one condition and drops the rows that do not satisfy the condition.

 Where as in a router transformation, you can specify more than one condition and provides the ability for route the data that meet the test condition.

 Use router transformation if you need to test the same input data on multiple conditions

Advantages of Using Router over Filter Transformation

Use router transformation to test multiple conditions on the same input data. If you use more than one filter transformation, the integration service needs to process the input for each filter