• No results found

What can you do to increase performance or explain Performance tuning in Informatica?

In document informatica (Page 60-65)

What can you do to increase performance or explain Performance tuning in

Informatica?

What can you do to increase performance or explain Performance tuning in Informatica?

The goal of performance tuning is to optimize session performance so sessions run during the available load window for the Informatica Server.Increase the session performance by following:

The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Thus network connections often affect on session performance.So aviod

netwrok connections.

Flat files: If your flat files stored on a machine other than the informatca server, move those files to the machine that consists of informatica server.

Relational datasources: Minimize the connections to sources, targets and informatica server to improve session performance.Moving target database into server system may improve session performance.

Staging areas: If you use staging areas you force informatica server to perform multiple datapasses.Removing of staging areas may improve session performance.

You can run the multiple informatica servers’ againist the same repository.Distibuting the session load to multiple informatica servers may improve session performance.

Run the informatica server in ASCII datamovement mode improves the session performance. Because ASCII datamovement mode stores a character value in one byte.Unicode mode takes 2 bytes to store a character.

If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.

We can improve the session performance by configuring the network packet size, which allows data to cross the network at one time. To do this go to server manger, choose server configure database connections.

If your target consists key constraints and indexes you slow the loading of data. To improve the session performance in this case drop constraints and indexes before you run the session and rebuild them after completion of session.

Running parallel sessions by using concurrent batches will also reduce the time of loading the data. So concurent batches may also increase the session performance.

sources and targets and loads data in paralel pipe lines.

In some cases if a session contains an aggregator transformation, you can use incremental aggregation to improve session performance.

Aviod transformation errors to improve the session performance.

If the session contained lookup transformation you can improve the session performance by enabling the look up cache.

If your session contains filter transformation, create that filter transformation nearer to the sources or you can use filter condition in source qualifier.

Aggreagator, Rank and joiner transformation may often decrease the session performance .Because they must group data before processing it. To improve session performance in this case use sorted ports option.

What is snow flake scheme design in database?

Snow flake schema is one of the designs that are present in database design. Snow flake schema serves the purpose of dimensional modeling in data warehousing. If the dimensional table is split into many tables, where the schema is inclined slightly towards normalization, then the snow flake design is utilized. It contains joins in depth. The reason is that, the tables split further.

Explain the difference between star and snowflake schemas?

Star schema: A highly de-normalized technique. A star schema has one fact table and is

associated with numerous dimensions table and depicts a star.

Snow flake schema: The normalized principles applied star schema is known as Snow flake

schema. Every dimension table is associated with sub dimension table.

Differences:

• A dimension table will not have parent table in star schema, whereas snow flake schemas have one or more parent tables.

• The dimensional table itself consists of hierarchies of dimensions in star schema, where as hierarchies are split into different tables in snow flake schema. The drilling down data from top most hierarchies to the lowermost hierarchies can be done.

A view is created by combining data from different tables. Hence, a view does not have data of itself.

On the other hand, Materialized view usually used in data warehousing has data. This data helps in decision making, performing calculations etc. The data stored by calculating it before hand using queries.

When a view is created, the data is not stored in the database. The data is created when a query is fired on the view. Whereas, data of a materialized view is stored.

What is junk dimension?

A single dimension is formed by lumping a number of small dimensions. This dimension is called a junk dimension. Junk dimension has unrelated attributes. The process of grouping random flags and text attributes in dimension by transmitting them to a distinguished sub dimension is related to junk dimension.

What is degenerate dimension table?

A degenerate table does not have its own dimension table. It is derived from a fact table. The column (dimension) which is a part of fact table but does not map to any dimension.

E.g. employee_id

What is conformed fact and conformed dimensions use for?

Conformed fact in a warehouse allows itself to have same name in separate tables. They can be compared and combined mathematically. Conformed dimensions can be used across multiple data marts. These conformed dimensions have a static structure. Any dimension table that is used by multiple fact tables can be conformed dimensions.

What is the difference between Informatica 7.0 and 8.0?

The architecture of Power Center 8 has changed a lot:

1. PC8 is service-oriented for modularity, scalability and flexibility.

2. The Repository Service and Integration Service (as replacement for Rep Server and Informatica Server) can be run on different computers in a network (so called nodes), even redundantly.

3. Management is centralized, that means services can be started and stopped on nodes via a central web interface.

4. Client Tools access the repository via that centralized machine, resources are distributed dynamically.

5. Running all services on one machine is still possible, of course.

6. It has a support for unstructured data which includes spreadsheets, email, Microsoft Word files, presentations and .PDF documents. It provides high availability, seamless fail over, eliminating single points of failure.

7. It has added performance improvements (To bump up systems performance, Informatica has added "push down optimization" which moves data transformation processing to the native relational database I/O engine whenever it is most appropriate.)

8. Informatica has now added more tightly integrated data profiling, cleansing, and matching capabilities.

9. Informatica has added a new web based administrative console. 10. Ability to write a Custom Transformation in C++ or Java.

11. Midstream SQL transformation has been added in 8.1.1, not in 8.1. 12. Dynamic configuration of caches and partitioning

13. Java transformation is introduced. 14. User defined functions

15. PowerCenter 8 release has "Append to Target file" feature.

What is Data warehousing?

A data warehouse can be considered as a storage area where interest specific or relevant data is stored irrespective of the source. What actually is required to create a data warehouse can be considered as Data Warehousing. Data warehousing merges data from multiple sources into an easy and complete form.

What are fact tables and dimension tables?

As mentioned, data in a warehouse comes from the transactions. Fact table in a data warehouse consists of facts and/or measures. The nature of data in a fact table is usually numerical.

On the other hand, dimension table in a data warehouse contains fields used to describe the data in fact tables. A dimension table can provide additional and descriptive information (dimension) of the field of a fact table.

e.g. If I want to know the number of resources used for a task, my fact table will store the actual measure (of resources) while my Dimension table will store the task and resource details. Hence, the relation between a fact and dimension table is one to many.

What is ETL process in data warehousing?

ETL stands for Extraction, transformation and loading. That means extracting data from different sources such as flat files, databases or XML data, transforming this data depending on the application’s need and loads this data into data warehouse.

Explain the difference between data mining and data warehousing?

Data mining is a method for comparing large amounts of data for the purpose of finding

patterns. Data mining is normally used for models and forecasting. Data mining is the process of correlations, patterns by shifting through large data repositories using pattern recognition

techniques.

Data warehousing is the central repository for the data of several business systems in an enterprise. Data from various resources extracted and organized in the data warehouse selectively for analysis and accessibility.

What is an OLTP system and OLAP system?

OLTP stands for OnLine Transaction Processing. Applications that supports and manges

transactions which involve high volumes of data are supported by OLTP system. OLTP is based on client-server architecture and supports transactions across networks.

OLAP stands for OnLine Analytical Processing. Business data analysis and complex

calculations on low volumes of data are performed by OLAP. An insight of data coming from various resources can be gained by a user with the support of OLAP.

What are cubes?

Multi dimensional data is logically represented by Cubes in data warehousing. The dimension and the data are represented by the edge and the body of the cube respectively. OLAP environments view the data in the form of hierarchical cube. A cube typically includes the aggregations that are needed for business intelligence queries.

In document informatica (Page 60-65)