(1)

Enterprise Solutions

Data Warehouse & Business Intelligence

Chapter-8

(2)

Learning Objectives

Concepts of Data Warehouse

Business Intelligence, Analytics & Big Data

(3)

Concepts of Data Warehouse

A Data Warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process; that is, it is built to analyse a particular subject area or domain, or a subject area within a domain. For example, "sales" within "retail banking" would be a particular subject area.

A Data Warehouse integrates data from multiple data sources across the organization.

The data in a DWH is historical and hence time-variant.

Because it is historical, the data is also permanent and non-volatile; it should never be deleted or modified.

(4)

Data Warehouse Architecture

Enterprise DWHs are extremely complex entities and there is no single generic design or architecture.

However, all DWHs have the components or layers shown in the representation below.

(5)

Data Warehouse Architecture

• Data Source Layer - represents the different data sources that feed data into the DWH; these sources may be of any type and format

• Data Extraction Layer - the layer where data is pulled from the data sources into the DWH, with some data cleansing but without any major data transformation

• Staging Area - where the data sits before being cleansed and transformed to DWH or Data Mart quality

• ETL Layer - where data gains its "intelligence", as logic is applied to transform the data from a transactional nature to an analytical nature (a minimal sketch follows this list)

• Data Storage Layer - the final store for the cleansed and transformed data

• Data Logic Layer - where business rules are stored

• Data Presentation Layer - where reporting tools are used; this layer gives shape to the format in which information is presented to the user

• Metadata Layer - metadata is data about the data stored in the DWH system

• System Operations Layer - includes information on how the DWH system operates, such as ETL job status, access history and system performance
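As a rough illustration of the extraction, ETL and storage layers above, here is a minimal pipeline sketch in Python. It is a hypothetical example, not part of any particular product: the file name, column names and transformation rules are assumptions made purely for illustration, with an in-memory SQLite database standing in for the warehouse storage layer.

import csv
import sqlite3
from datetime import datetime

def extract(path):
    # Data Extraction Layer: pull raw rows from an assumed CSV source file
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # ETL Layer: apply logic to turn transactional rows into analytical ones
    out = []
    for r in rows:
        out.append({
            "sale_date": datetime.strptime(r["date"], "%d/%m/%Y").date().isoformat(),
            "product": r["product"].strip().upper(),
            "revenue": float(r["qty"]) * float(r["unit_price"]),
        })
    return out

def load(rows, conn):
    # Data Storage Layer: persist the cleansed, transformed rows
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact "
                 "(sale_date TEXT, product TEXT, revenue REAL)")
    conn.executemany("INSERT INTO sales_fact VALUES (:sale_date, :product, :revenue)", rows)
    conn.commit()

if __name__ == "__main__":
    # In practice extract() would read from the real source systems; here a tiny
    # source file is faked so the sketch runs end to end.
    with open("sales_extract.csv", "w", newline="") as f:
        f.write("date,product,qty,unit_price\n01/03/2021,deposit acct,2,150.0\n")
    conn = sqlite3.connect(":memory:")
    load(transform(extract("sales_extract.csv")), conn)
    print(conn.execute("SELECT * FROM sales_fact").fetchall())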

(6)

Components of a Data Warehouse

Data Integrity - refers to the validity of data: the data is consistent and correct.

Conceptual Data Model - identifies the highest-level relationships between the different entities.

Logical Data Model - describes the data in as much detail as possible, without regard to how it will be physically implemented in the database.

Physical Data Model - represents how the model will be built in the database.

Data Mart - data marts are small slices of the data warehouse, with a more limited audience and/or data content.

Fact Table - a table that stores quantitative information for analysis.

Dimensional Model - dimensional data modelling is used to create summary information, for example summarization of sales by day, week, month and year, or by region (see the sketch after this list).

Aggregation - a summary of detail data that is stored with, or referred to by, a cube.

ETL (Extraction, Transformation and Loading) - the process of extracting data from different types of systems, transforming it into a structure more appropriate for reporting and analysis, and finally loading it into the database and/or cube(s).

OLAP (On-Line Analytical Processing) - provides end users a quick way of slicing and dicing the data.
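To make the fact table, dimensional model and aggregation terms above concrete, here is a minimal star-schema sketch in Python using the standard-library sqlite3 module. The table and column names (sales_fact, date_dim, region_dim) are illustrative assumptions; the final query shows the kind of "summarize revenue by month and region" aggregation that dimensional models and OLAP tools are built for.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: descriptive attributes used to slice and dice the facts
cur.execute("CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER)")
cur.execute("CREATE TABLE region_dim (region_key INTEGER PRIMARY KEY, region TEXT)")

# Fact table: quantitative measures keyed to the dimensions
cur.execute("CREATE TABLE sales_fact ("
            "date_key INTEGER REFERENCES date_dim(date_key), "
            "region_key INTEGER REFERENCES region_dim(region_key), "
            "revenue REAL)")

cur.executemany("INSERT INTO date_dim VALUES (?,?,?,?)",
                [(1, "2021-01-05", "2021-01", 2021), (2, "2021-02-11", "2021-02", 2021)])
cur.executemany("INSERT INTO region_dim VALUES (?,?)", [(1, "North"), (2, "South")])
cur.executemany("INSERT INTO sales_fact VALUES (?,?,?)",
                [(1, 1, 100.0), (1, 2, 250.0), (2, 1, 175.0)])

# Aggregation: summarize revenue by month and region, the typical dimensional query
for row in cur.execute(
        "SELECT d.month, r.region, SUM(f.revenue) "
        "FROM sales_fact f "
        "JOIN date_dim d ON d.date_key = f.date_key "
        "JOIN region_dim r ON r.region_key = f.region_key "
        "GROUP BY d.month, r.region"):
    print(row)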

(7)

Data Cleansing

• Data cleansing is a key process in a DWH

• Data cleansing is the process of altering data in a given store to ensure that it is accurate and correct, e.g. deleting old, incomplete or duplicated data

• There are many ways to pursue data cleansing across different software and data storage architectures, mostly focused on careful review of data sets and the protocols associated with the particular data storage technology

• Data cleansing issues are similar to the problems that archivists, database administrators and others face

What is dirty data?

• Dirty data consists of data anomalies that create wrong outputs

• Dirty data is created when reality differs from what is captured and stored

Typical steps in Data Cleansing (a minimal sketch follows this list):

• Data Analysis

• Defining Transformation Workflows and Mapping Rules - depends on the number of data sources, their degree of heterogeneity and the "dirtiness" of the data

• Verification - of the correctness and effectiveness of a transformation

• Transformation - execution of the transformation steps, for example by running the ETL workflow

• Backflow of cleaned data - after (single-source) errors are removed, the cleaned data should also replace the dirty data in the original sources
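As a rough illustration of the verification, transformation and backflow steps above, the sketch below cleans a small in-memory data set with plain Python: it drops duplicate records, rejects rows with missing or implausible values, and standardizes the remaining fields. The field names and validation rules are assumptions made purely for the example.

raw = [
    {"id": "1", "name": " Alice ", "email": "alice@example.com", "age": "34"},
    {"id": "1", "name": "Alice",   "email": "alice@example.com", "age": "34"},   # duplicate
    {"id": "2", "name": "Bob",     "email": "",                  "age": "200"},  # dirty row
    {"id": "3", "name": "carol",   "email": "carol@example.com", "age": "29"},
]

def is_valid(row):
    # Verification step: reject rows with missing or implausible values
    return bool(row["email"]) and row["age"].isdigit() and 0 < int(row["age"]) < 120

def standardize(row):
    # Transformation step: trim whitespace and normalize casing/types
    return {
        "id": row["id"],
        "name": row["name"].strip().title(),
        "email": row["email"].strip().lower(),
        "age": int(row["age"]),
    }

seen, cleaned = set(), []
for row in raw:
    key = (row["id"], row["email"].strip().lower())
    if key in seen:          # drop duplicated records
        continue
    seen.add(key)
    if is_valid(row):
        cleaned.append(standardize(row))

print(cleaned)               # cleaned data, ready for backflow or loading into the DWH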

(8)

Business Intelligence & Big Data

• BI is a set of sophisticated analytical techniques and tools used to identify, extract and analyse 'hard' business data, such as sales revenue by product or department, or associated costs and incomes

• The objectives of a BI exercise include understanding a firm's internal and external strengths and weaknesses, understanding the relationships between different data for better decision making, detecting opportunities for innovation and cost reduction, and deploying resources optimally

• BI is accomplished through the use of specialised software and helps companies organize and analyse data to make better decisions; the data may be internal or from external data sources

• BI is therefore not one piece of software; it generally includes data mining tools, operational dashboards, reporting tools, search and query tools, analytics processing software and content viewers

• ISVs in the BI space include SAP, Oracle, IBM, Microsoft, Information Builders, MicroStrategy and SAS. Some of the smaller niche players are Actuate Corporation, Alteryx, Logi Analytics, QlikTech and Tableau

(9)

Business Intelligence & Big Data

"Big Data" is a popular term used to describe the exponential growth and availability of data, both structured and unstructured

Big Data is defined in terms of Volume, Velocity, Variety, Complexity and Variability

Why Big Data?

The hopeful vision is that organizations will be able to take data from any source, harness relevant data and analyse it to find answers that resolve key issues such as product development strategies & cost rationalization

It is intended that by combining big data and high-powered analytics, it may be possible to recalculate entire risk portfolios in minutes, or identify root causes of failures

(10)

DWH & BI Products

DWH & BI products are categorized as: Data Modelling, Data Mining, OLAP Tools, ETL Tools, BI Tools and Reporting Tools

The major companies in the DWH/BI space are IBM, SAS, Oracle, TIBCO, MicroStrategy, SAP, Information Builders, etc

There are also smaller, niche product companies

Some of the popular tools in the market are

• Erwin, Rational, Power Designer and Oracle Designer - for Data Modelling

• IBM Cognos, IBM SPSS, SAS Enterprise Miner, TIBCO, etc. - for Data Mining

• BO, Cognos, MicroStrategy, Hyperion - for OLAP

• Informatica, Cognos, BO, WebSphere - for ETL

• Cognos, NetWeaver, BO, Siebel - for BI

It is to be remembered that the DWH & BI space is replete with tools from a slew of companies, and most large organizations use multiple (seemingly) redundant tool-sets in their operations
