data warehouse question and answers 3

Q. What is the necessity of having surrogate keys?

Production may reuse keys that it has purged but that you are still maintaining

Production might legitimately overwrite some part of a product description or a customer description with new values but not change the product key or the customer key to a new value.

We might be wondering what to do about the revised attribute values (slowly changing dimension crisis)

Production may generalize its key format to handle some new situation in the transaction system.

E.g. changing the production keys from integers to alphanumeric or may have 12-byte keys you are used to have become 20-byte keys

Acquisition of companies

Q. What are the advantages of using Surrogate Keys?

We can save substantial storage space with integer valued surrogate keys

Eliminate administrative surprises coming from production

Potentially adapt to big surprises like a merger or an acquisition

Have a flexible mechanism for handling slowly changing dimensions

Q. What are Factless Fact tables?

Fact tables which do not have any facts are called factless fact tables. They may consist of nothing but keys.

There are two kinds of fact tables that do not have any facts at all.

The first type of factless fact table is a table that records an event. Many event-tracking tables in dimensional data warehouses turn out to be factless.

E.g. A student tracking system that detects each student attendance event each day.

The second type of factless fact table is called a coverage table. Coverage tables are frequently needed when a primary fact table in a dimensional data warehouse is sparse.

E.g. A sales fact table that records the sales of products in stores on particular days under each promotion condition. The sales fact table does answer many interesting questions but cannot answer questions about things that did not happen. For instance, it cannot answer the question,

“which products were in promotion that did not sell?” because it contains only the records of products that did sell. In this case the coverage table comes to the rescue. A record is placed in the coverage table for each product in each store that is on promotion in each time period.

Q. What are Causal dimension?

A causal dimension is a kind of advisory dimension that should not change the fundamental grain of a fact table.

E.g. why the customer bought the product? It can be due to promotion, sales etc.

Q. What is meant by Drill Through? (Mascot)

Operating Data Source - directly connects to application database Q. What is Operational Data Store? (Mascot)

Q. What is BI? And why do we need BI?

Business Intelligence, it is an ongoing process of various integration packages to analyze data.

Q What is Slicing and Dicing ? How we can do in Impromptu (We cannot do)? It is done only in Powerplay.

GENERAL

Q. Explain the Project. (Polaris)

Explain about the various projects (MIDAS2/VIP).

Why was MIDAS2 or VIP or SCI developed.

Q. What is the size of the database in your project? (Polaris) Approximately 900GB.

Q. What is the daily data volume (in GB/records)? Or What is the size of the data extracted in the extraction process? (Polaris)

Q. How many Data marts are there in your project?

Q. How many Fact and Dimension tables are there in your project?

Q. What is the size of Fact table in your project?

Q. How many dimension tables did you had in your project and name some dimensions (columns)? (Mascot)

Q. Name some measures in your fact table? (Mascot) Q. Why couldn’t u go for Snowflake schema? (Mascot) Q. How many Measures u have created? (Mascot)

Q. How many Facts & Dimension Tables are there in your Project? (Mascot) Q. Have u created Datamarts? (Mascot)

Q. What is the difference between OLTP and OLAP?

OLAP - Online Analytical processing, mainly required for DSS, data is in denormalized manner and mainly used for non volatile data, highly indexed, improve query response time

OLTP - Transactional Processing - DML, highly normalized to reduce deadlock & increase concurrency

Q. What is the difference between OLTP and data warehouse?

Operational System Data Warehouse Transaction Processing Query Processing Time Sensitive History Oriented

Operator View Managerial View

Organized by transactions (Order, Input, Inventory) Organized by subject (Customer, Product) Relatively smaller database Large database size

Many concurrent users Relatively few concurrent users Volatile Data Non Volatile Data

Stores all data Stores relevant data Not Flexible Flexible

Q. Explain the DW life cycle

Data warehouses can have many different types of life cycles with independent data marts. The following is an example of a data warehouse life cycle.

In the life cycle of this example, four important steps are involved.

Extraction - As a first step, heterogeneous data from different online transaction processing systems is extracted. This data becomes the data source for the data warehouse.

Cleansing/transformation - The source data is sent into the populating systems where the data is cleansed, integrated, consolidated, secured and stored in the corporate or central data

warehouse.

Distribution - From the central data warehouse, data is distributed to independent data marts specifically designed for the end user.

Analysis - From these data marts, data is sent to the end users who access the data stored in the data mart depending upon their requirement.

Q. What is the life cycle of DW?

Getting data from OLTP systems from diff data sources

Analysis & staging - Putting in a staging layer- cleaning, purging, putting surrogate keys, SCM ,

dimensional modeling Loading

Writing of metadata

Q. What are the different Reporting and ETL tools available in the market?

Q. What is a data warehouse?

A data warehouse is a database designed to support a broad range of decision tasks in a specific organization. It is usually batch updated and structured for rapid online queries and managerial summaries. Data warehouses contain large amounts of historical data which are derived from transaction data, but it can include data from other sources also. It is designed for query and analysis rather than for transaction processing.

It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.

The term data warehousing is often used to describe the process of creating, managing and using a data warehouse.

Q. What is a data mart?

A data mart is a selected part of the data warehouse which supports specific decision support application requirements of a company’s department or geographical region. It usually contains simple replicates of warehouse partitions or data that has been further summarized or derived from base warehouse data. Instead of running ad hoc queries against a huge data warehouse, data marts allow the efficient execution of predicted queries over a significantly smaller database.

Q. How do I differentiate between a data warehouse and a data mart? (KPIT Infotech Pune, Mascot)

A data warehouse is for very large databases (VLDBs) and a data mart is for smaller databases.

The difference lies in the scope of the things with which they deal.

A data mart is an implementation of a data warehouse with a small and more tightly restricted scope of data and data warehouse functions. A data mart serves a single department or part of an organization. In other words, the scope of a data mart is smaller than the data warehouse. It is a data warehouse for a smaller group of end

In document informatica (Page 33-36)