What are the points we need to look for? - Informatica Question & Answer Set

Answer:

When a source flat file is loaded to a staging database table, generally we focus on the below items:

 Define proper file-format for the input file (Delimited/Fixed-width), Code Page etc.

 Header information having any Processing date to be checked with sysdate or some other business logic.

 Check the detail records count in the file with the information in the Trailer information if any.

 Sum of any measure fields of detail records matches with Header/Trailer information if any.

 In case of Indirect Loading we can add the filename and record number in file as part of columns in the staging table.

Basically everything depends on your/business requirement.

9. What is the difference between Joiner and Lookup. Performance wise which one is better to use.

Answer:

DWBIConcepts DWBIConcepts DWBIConcepts DWBIConcepts

 Only “=” operator can be used in join condition

 Supports normal, full outer, left/right outer join

 Active transformation

 Earlier a Passive transformation, 9 onwards an Active transformation (Can return more than 1 rec-ords in case of multiple match)

 Supports SQL override

 Connected/Unconnected

 Supports dynamic cache update

 Relational/FlatFile source/target

 Pipeline Lookup

Selection between these two transformations is completely dependent on project requirement. It’s a debat-able topic to conclude which one among these two serves good in terms of performance.

10. What is the B2B in Informatica? How can we use it in Informatica?

Answer:

B2B allows to parse and read unstructured data such as PDF, EXCEL, HTML etc. It has the capability to read binary data such as Messages, EBCDIC File etc. and has a very large list of supported formats.

B2B Data Transformation Studio is the Developer tool, by which the parsing of (reading) the unstructured da-ta is done. B2B mostly gives the output as an XML file.

B2B Data Transformation is integrated with Informatica PowerCenter using a Transformation "Unstructured Data Transformation", This transformation can receive the output of B2B Data Transformation studio and load into any Target supported by PowerCenter.

11. What is CDC, SCD and MD5 in Informatica?

Answer:

 CDC - Changed Data Capture. How, only the changed data is captured from the Source System.

 SCD- Slowly Changing Dimension. How, history data is maintained in the Dimension tables.

 MD5- MD5 Checksum Encoding. It generates 32 character HEX code encoding, can be used to decide Insert/Update strategy for target records.

DWBIConcepts DWBIConcepts DWBIConcepts DWBIConcepts

12. How can we implement an SCD Type2 mapping without using a lookup transformation?

Answer:

The entire implementation will be same as that using a lookup. The only thing we need to replace the Lookup transformation with a Joiner transformation. In the Joiner transformation the Source table will be used as Master and the Target table as Detail. The join condition will be same as that of lookup condition and the join type being Detail Outer Join.

13. How does Joiner and Lookup transformation treat NULL value matching?

Answer:

A NULL value is not equal to another NULL value in Joiner whereas, Lookup transformation matches null val-ues.

14. Does Microsoft SQL server supports bulk loading? If yes, What happens when you specify bulk mode and data driven for SQL server target

Answer:

Yes MS SQL Server supports Bulk Loading. But if we select Treat Source Rows as Data Driven with the Target Load Type as Bulk then the session will fail. We have to select Normal Load with Data Driven source records.

15. How can you utilize COM components in Informatica?

Answer:

By writing C+, VB, VC++ code in External Stored Procedure Transformation

16. What is SQL transformation in Informatica?

Answer:

A SQL transformation can processes any SQL queries midstream in an Informatica pipeline. It supports mostly all the DDL, DML, DCL, TCL.

For quick reference following are some important notes:-

 We can configure the SQL transform in two modes that makes it Active/Passive.

 Active, Query mode fires the SQL query in the database defined in the transformation.

DWBIConcepts DWBIConcepts DWBIConcepts DWBIConcepts

 Query mode can be configured to handle Static SQL Query (i.e. the SQL query is the same with bind variables) or Dynamic SQL Query (i.e. different query statements for each input row).

 In case of Dynamic Query when we substitute the entire SQL query of the Query_Port is called Full Query or portion of the query statement called Partial Query.

 We can configure the SQL transformation to connect to a database with a Static Connection (i.e. se-lecting a particular connection object) or Dynamic Connection (i.e. based on the logic it will dynami-cally select the connection object to connect to a database).

Also we can pass the entire database connection information (i.e. username,password, connectstring, codepage) called Full Database Connection.

17. What is a XML source qualifier?

Answer:

The XML source qualifier represents the data elements that the Informatica server reads when it runs a ses-sion with XML sources.

18. What is the “metadata extensions” tab in Informatica?

Answer:

PowerCenter allows end users and partners to extend the metadata stored in the repository by associating information with individual objects in the repository. That why it’s called Metadata Extension.

For example, when we create a mapping, we can store the information like the mapping functionality, busi-ness user information, CR information. Similarly for Session we can store schedule information, contact per-son for failed session information. We basically associate the information with repository metadata using metadata extensions.

When we create reusable metadata extensions for a repository object using the Repository Manager, the metadata extension becomes part of the properties of that type of object. For example, we can create a re-usable metadata extension for source definition called SourceCreator. When we create or edit any source definition in the Designer, the SourceCreator extension appears on the Metadata Extensions tab. anyone who creates or edits a source can enter the name of the person that created the source into this field.

PowerCenter Client applications can contain the following types of metadata extensions:-

 Vendor-defined. Third-party application vendors create vendor-defined metadata extensions. We can view and change the values of vendor-defined metadata extensions, but we cannot create, de-lete, or redefine them.

 User-defined. We create user-defined metadata extensions using PowerCenter. We can create, edit, delete, and view user-defined metadata extensions. We can also change the values of user-defined extensions.

DWBIConcepts DWBIConcepts DWBIConcepts DWBIConcepts

party applications or other Informatica products, we may see domains such as Ariba or PowerExchange for Siebel. We cannot edit vendor-defined domains or change the metadata extensions in them.

User-defined metadata extensions exist within the User Defined Metadata Domain. When we create metadata extensions for repository objects, we add them to this domain.

Both vendor and user-defined metadata extensions can exist for the repository objects- Source definitions, Target definitions, Transformations, Mappings, Mapplets, Sessions, Tasks, Workflows, Worklets.

19. Describe some of the ETL Best Practices

Answer:

A lot of best practices may be applicable to a certain tool and pointless for the other. In a very high level and in a very tool independent way-

 Naming conventions for ETL objects

 Naming conventions for Database objects

 Parameterization of connections (so that things are easy for moving from 1 environment to other)

 Maintaining of ETL job log - ideally automated maintenance through logging of job run

 Handling of rejected records (and logging)

 Data reconciliation

 Meta data management- e.g. - maintaining Meta data columns in tables (Use of Audit columns e.g.

load date/ load user/ batch id etc.)

 Error reporting

 ETL job Performance evaluation

 Following generic coding standards

 Documentation

 Decomposing complex logic in multiple ETL stages - load balancing (pushdown optimization wherev-er applicable) etc.

 Removal of unwanted ports from different transformations used in a mapping

 Using Shortcuts for source, target and lookups

 Using mapplet, worklet as and when required

 Write some comments for every transformation

 Use Decode function rather that “if than else”

 make sure that the sorted data is moved into the aggregator transformation

 If the target table is having indexes, loading data into such tables will decrease the performance; in such situations, use pre SQL to drop the index before loading the data into target tables and once the data is loaded then, re-create the index using post SQL.

20. Is there a scope of cloud computing in Data warehousing technology?

Answer:

This is not only possible; in fact, this is the way to go for many of the providers of the modern day BI tools.

There are certain advantages and benefits of using cloud computing for Business Intelligence applications and this is a big topic of discussion today. I will quickly touch upon a few points that will substantiate the

DWBIConcepts DWBIConcepts DWBIConcepts DWBIConcepts

 High Infrastructure requirement, leading to high upfront investment

 High development cost (needs special talent) as well high maintenance cost

 Unpredictable workload (data volume), and skewed business growth pattern

All these lead to the issues of longer cycle time and limited adoption of BI solutions. Now cloud platform, as opposed to typical in-house software platform, is basically an alternative delivery method for the software service. When you deliver the software or platform or infrastructure (as a service) through cloud, you can instantly start to get the following benefits:

 Lower entry cost

 Lower maintenance cost (pay as you use)

 Faster deployment

 Reduced risk

 Lower TCO (total cost of ownership)

 Multiple deployment model etc. etc.

Moreover, Small and medium enterprises (SMEs) can easily adapt to this model given their typical con-straints of small business. Companies like Pentaho etc. are already “in” with their products in SaaS (soft-ware as a service) model of cloud computing. But cloud models like SaaS has some typical problems (e.g.

no flexibility of design, security concerns etc.).

As opposed to SaaS model, we have another cloud model called PaaS - Platform as a service - which has the benefit of design flexibility. PaaS is very suitable for custom applications and even enterprise level BI applications. This cloud service is being offered by almost everyone in the BI market - - BusinessObjects - SAS - Microsoft Azure (check here: http://en.wikipedia.org/wiki/SQL_Azure ) - Vertica - Greenplum etc.

DWBIConcepts DWBIConcepts DWBIConcepts DWBIConcepts

17. Mapping

1. Scenario Implementation 1

Suppose we have a source port called ename with data type varchar(20) and the corresponding target port as ename with varchar(20). The data type is now altered to varchar(50) in both source and target database.

Describe the changes required to modify the mapping.

Answer:

Reimport the source and target definition. Next open the mapping and Right click on the source port ename and use "Propagate Attribute" option. This option allows us to change the properties of one port across mul-tiple transformations without manually modifying the port in each and every transformation. We can choose the direction of propagation (forward / backward / both) and can also select attributes of propagation e.g.

data type, scale, precision etc.

2. What are mapping parameters and variables?

Answer:

A mapping parameter is a user definable constant that takes up a value before running a session.

It can be used in SQ expressions, Expression transformation etc.

A mapping variable is also defined similar to the parameter except that the value of the variable is subjected to change. It picks up the value in the following order.

 From the Session parameter file

 As stored in the repository object in the previous run

 As defined in the initial values in the designer

 Data type Default values

3. Which type of variables or parameters can be declared in parameter file?

$, $$, $$$ - Can all be declared or not.

Answer:

There is a difference between variable and parameter.

 Variable, as the name suggests, is like a variable value which can change within a session run.

 Parameters are fixed and their values don't change during session run.

 $ - for session level parameters which can be declared in parameter files.

 $$ - for mapping level parameters which can be declared in parameter files.

 $$$- Inbuilt Informatica system variables that cannot be declared in parameter files

E.g. $$$SessStartTime these are constant throughout the mapping and cannot be changed.

DWBIConcepts DWBIConcepts DWBIConcepts DWBIConcepts

Read this article to get a detail understanding:http://www.dwbiconcepts.com/etl/14-etl-informatica/74-stop-hardcoding-follow-parameterization-technique.html

4. What are the default values for variables?

Answer:

 String = Null

 Number = 0

 Date = 1/1/1753

5. What does first column of bad file (rejected rows) indicates?

Answer:

 First Column - Row indicator (0, 1, 2, 3)

 Second Column – Column Indicator (D, O, N, T)

6. Out of 100000 source rows some rows get discard at target, how will you trace them and where it gets loaded?

Answer:

 Rejected records are loaded into bad files. It has record indicator and column indicator.

 Record indicator identified by (0-insert,1-update,2-delete,3-reject) and

 Column indicator identified by (D-valid,O-overflow,N-null,T-truncated).

 Normally data may get rejected in different reason due to transformation logic

7. What is Reject loading?

Answer:

During a session, the Informatica server creates a reject file for each target instance in the mapping. If the writer or the target rejects data, the Informatica server writes the rejected row into reject file. The reject file and session log contain information that helps you determine the cause of the reject. You can correct reject files and load them to relational targets using the Informatica reject load utility. The reject loader also cre-ates another reject file for the data that the writer or target reject during the reject loading.

Reject Loading

DWBIConcepts DWBIConcepts DWBIConcepts DWBIConcepts

data and re-load them to relational targets, using the reject loading utility. (You cannot load rejected data in-to a flat file target) Each time, you run a session, the server appends a rejected data in-to the reject file.

Locating the BadFiles

 $PMBadFileDir / Filename.bad

When you run a partitioned session, the server creates a separate reject file for each partition.

Reading Rejected data

Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00

To help us in finding the reason for rejecting, there are two main things.

 Row indicator - Row indicator tells the writer, what to do with the row of wrong data.

Row indicator Meaning Rejected By o 0 Insert Writer or target o 1 Update Writer or target o 2 Delete Writer or target o 3 Reject Writer

If a row indicator is 3, the writer rejected the row because an update strategy expression marked it for reject.

 Column indicator - Column indicator is followed by the first column of data, and another column in-dicator. They appears after every column of data and define the type of data preceding it

Column Indicator Meaning Writer Treats as

o D Valid Data Good Data. The target accepts it unless a database error occurs, such as finding duplicate key.

o Overflow Bad Data.

o N Null Bad Data.

o T Truncated Bad Data

NOTE: NULL columns appear in the reject file with commas marking their column.

Correcting Reject File

Use the reject file and the session log to determine the cause for rejected data. Keep in mind that correcting the reject file does not necessarily correct the source of the reject. Correct the mapping and target database to eliminate some of the rejected data when you run the session again. Trying to correct target rejected rows before correcting writer rejected rows is not recommended since they may contain misleading column indicator. For example, a series of “N” indicator might lead you to believe the target database does not accept NULL values, so you decide to change those NULL values to Zero. However, if those rows also had a 3 in row indicator. Column, the row was rejected b the writer because of an update strategy expression, not

DWBIConcepts DWBIConcepts DWBIConcepts DWBIConcepts

8. Why Informatica writer thread may reject a record?

Answer:

 Data overflowed column constraints

 An update strategy expression

9. Why target database can reject a record?

Answer:

 Data contains a NULL column

 Database errors, such as key violations

10. Describe various steps for loading reject file?

Answer:

 After correcting the rejected data, rename the rejected file to reject_file.in

 The rejloader used the data movement mode configured for the server. It also used the code page of server/OS. Hence do not change the above, in middle of the reject loading

 Use the reject loader utility Pmrejldr pmserver.cfg [folder name] [session name]

11. Variable v1 has values set as 5 in designer (default), 10 in parameter file, and 15 in reposi-tory. While running session which value Informatica will read?

Answer:

Informatica read value 15 from repository

12. What are shortcuts? Where it can be used? What are the advantages?

Answer:

There are 2 shortcuts (Local and global) Local used in local repository and global used in global repository.

The advantage is reusing an object without creating multiple objects. Say for example a source definition want to use in 10 mappings in 10 different folders without creating 10 multiple source you create 10 shortcuts.

DWBIConcepts DWBIConcepts DWBIConcepts DWBIConcepts

13. Can we have an Informatica mapping with two pipelines, where one flow is having a Transaction Control transformation and another not. Explain why?

Answer:

No it is not possible. Whenever we have a Transaction Control transformation in a mapping, the session commit type is ‘User Defined’. Whereas for a pipeline without the Transaction Control transform, the session expects the commit type to be either Source based or Target based.

Hence we cannot have both the pipelines in a single mapping; rather we have to develop single mappings for each of the pipelines.

14. How can we implement Reverse Pivoting using Informatica transformations?

Answer:

Pivoting can be done using Normalizer transformation. For reverse-pivoting we will need to use an aggrega-tor transformation like below:

From,

Col1 Col2 A 10 B 20

To,

Col1 Col2

A B

10 20

can be done using one Expression transformation and one Aggregator transformation:

In Expression transform, create two ports, o_col_a, o_col_b.

o_col_a = IIF (col1="A", ColB, 0) o_col_b = IIF (col1="B", ColB, 0)

Next in the aggregator transform, take the MAX () of o_col_a, o_col_b and map it to target A and B columns.

(We may need to take SUM (), instead of MAX () if we have multiple A, B rows)

15. Is it possible to update a Target table without any key column in target?

Answer:

DWBIConcepts DWBIConcepts DWBIConcepts DWBIConcepts

Yes it is possible to update the target table either by defining keys at Informatica level in Warehouse designer or by using Update Override.

DWBIConcepts DWBIConcepts DWBIConcepts DWBIConcepts

18. Mapplet

1. What is a Mapplet?

Answer:

Mapplets are reusable objects that represent collection of transformations.

2. What is the difference between Reusable transformation and Mapplet?

Answer:

Any Informatica Transformation created in the Transformation Developer or a non-reusable pro-moted to reusable transformation from the mapping designer which can be used in multiple mappings is known as Reusable Transformation. When we add a reusable transformation to a mapping, we actually add an instance of the transformation. Since the instance of a reusable transformation is a pointer to that transformation, when we change the transformation in the Transformation Developer, its instances reflect these changes.

A Mapplet is a reusable object created in the Mapplet Designer which contains a set of transformations and lets us reuse the transformation logic in multiple mappings. A Mapplet can contain as many transformations as we need. Like a reusable transformation when we use a mapplet in a mapping, we use an instance of the mapplet and any change made to the mapplet in Mapplet Designer, is inherited by all instances of the mapplet.

3. What are the transformations that are not supported in Mapplet?

Answer:

 Normalizer

 Cobol sources

 XML sources

 XML Source Qualifier

 Target definitions

 Pre- and Post- session Stored Procedures

 Other Mapplet

4. Is it possible to convert reusable transformation to a non-reusable one?

In document Informatica Question & Answer Set (Page 65-79)