• No results found

Microsoft. MCSA upgrade to SQL Server 2012 Certification Courseware. Version 1.0

N/A
N/A
Protected

Academic year: 2021

Share "Microsoft. MCSA upgrade to SQL Server 2012 Certification Courseware. Version 1.0"

Copied!
25
0
0

Loading.... (view fulltext now)

Full text

(1)

Microsoft

MCSA upgrade to SQL Server

2012 Certification

Courseware

(2)

1

5/9/2013 1 ©2007 – Body Temple

5/9/2013

1

SQL Server 2012 for Data Warehousing

Appendix

MSCA SQL Server 2012 Upgrade Course

5/9/2013 2 ©2007 – Body Temple

5/9/2013

2

Deployment Models

(3)

5/9/2013 3 ©2007 – Body Temple

5/9/2013

3

Package Deployment Model

Storage

• MSDB • File System

Package Configurations

• Property values to be

set dynamically at run time

Package Deployment

Utility

• Generate all required files for easier deployment

Project Deployment Model

The SSIS catalog

• Storage and management for SSIS projects on a SQL Server 2012 instance

Folders

(4)

3

5/9/2013 5 ©2007 – Body Temple

5/9/2013

5

Deployment Model Comparison

Feature Package Deployment Project Deployment

Unit of Deployment Package Project

Storage File system or MSDB SSIS Catalog

Dynamic configuration Package configurations Environment variables

mapped to project-level parameters and connection managers

Compiled format Multiple .dtsx files Single .ispac file

Troubleshooting Configure logging for each

package

SSIS catalog includes built-in reports and views

5/9/2013 6 ©2007 – Body Temple

5/9/2013

6

Creating an SSIS Catalog

Pre-requisites • SQL Server 2012 • SQL CLR enabled Creating a catalog

• Use SQL Server Management Studio

(5)

5/9/2013 7 ©2007 – Body Temple

5/9/2013

7

Environments and Variables

•Environments

Execution contexts for projects

•Variables

Environment-specific values

that can be mapped to project parameters and connection manager properties at run time

Deploying an SSIS Project

Integration Services

Deployment Wizard

• SQL Server Data Tools • SQL Server

(6)

5

5/9/2013 9 ©2007 – Body Temple

5/9/2013

9

Viewing Project Execution Information

Integration Services

Dashboard provides

built-in reports

Additional sources

of information:

• Event Handlers • Error Outputs • Logging

• Debug Dump Files

5/9/2013 10 ©2007 – Body Temple

5/9/2013

10

Data Quality Services

(7)

5/9/2013 11 ©2007 – Body Temple

5/9/2013

11

What Is a Knowledge Base?

Repository of knowledge about data:

• Domains define values and rules for each field

• Matching policies define rules for identifying duplicate records

10

1

1

0

0

0

110

KB

What Is a Domain?

•Domains are specific to a data field

•Domains contain the rules for the data

•Domains can be individual or composite

KB

City Name State

Family Name First

(8)

7

5/9/2013 13 ©2007 – Body Temple 5/9/2013 13

KB

Name Family Name First Name •The Azure Marketplace hosts

specialist data cleansing providers

Set up an account Subscribe to a reference

service

Map your domain to the

reference service

What Is a Reference Data Service?

Address

5/9/2013 14 ©2007 – Body Temple

5/9/2013

14

Creating a Data Cleansing Project

1. Select a knowledge base 2. Map columns to domains

(9)

5/9/2013 15 ©2007 – Body Temple

5/9/2013

15

Viewing Cleansed Data

Output – The values for all fields after data cleansing

Source – The original value for fields that were mapped to domains and cleansed

Reason – The reason the output value was selected by the cleansing operation

Confidence – An indication of the confidence Data Quality Services estimates for corrected values

Status – The status of the output column (correct or corrected)

Using the Data Cleansing Data Flow

Transformation

Input columns to be cleansed

Select knowledge base and map

columns to domains

(10)

9

5/9/2013 17 ©2007 – Body Temple

5/9/2013

17

Creating a Matching Policy

•Define matching rules for business entities •Rules match entities based on domains:

 Similarity: Similar or exact match

 Weight: Percentage to apply if match succeeds

 Prerequisite: Mandatory domain match for rule to succeed

•If the combined weight of all matches meets or exceeds the rule’s minimum matching score, the entities are duplicates

5/9/2013 18 ©2007 – Body Temple

5/9/2013

18

Creating a Data Matching Project

1. Select a knowledge base 2. Map columns to domains 3. Review match clusters 4. Export matches and survivors

• Select survivorship rule:

• Pivot record

• Most complete and longest record

• Most complete record

(11)

5/9/2013 19 ©2007 – Body Temple

5/9/2013

19

Viewing Data Matching Results

Cluster ID – Identifier for a cluster of matched records Record ID – Identifier for a matched record

Matching Rule – The rule that produced the match Score – Combined weighting of match criteria

Pivot Mark – A matched record chosen arbitrarily by Data Quality Services as the pivot record for a cluster

Master Data Services

(12)

11

5/9/2013 21 ©2007 – Body Temple

5/9/2013

21

Master Data Services and Data Quality Services

Domain-Focused

Applicable to any dataset that contains domains in the KB

• Is “Seattle” a valid value for City?

Master Data Services

•Entity-Focused

•Applicable to specific instances of

business entities

 What is the correct name, address, and phone number for customer 1235?

Data Quality Services

Use Data Quality Services to:

•Cleanse data before importing into master data hub

•Apply matching policies to find duplicate master data entities

5/9/2013 22 ©2007 – Body Temple

5/9/2013

22

Components of Master Data Services

Master Data Services database

• Contains the objects that support Master Data Services Master Data Manager Web application

• Enables administrators to create and manage Master Data Services objects, such as entities and attributes • Enables data stewards to manage master data Master Data Services Configuration Manager

• Enables administrators to create the Master Data Services database and Web application Master Data Services Add-In for Microsoft Excel

• Enables administrators and data stewards to manage master data in Excel

(13)

5/9/2013 23 ©2007 – Body Temple

5/9/2013

23

What Is a Master Data Services Model?

A versioned data model for specific business item or area of the business Contains definitions for entities required in the business area

• Often an entity with the same name as the model, as well as related entities

Each entity has a defined set of attributes

• All entities have Code and Name attributes

• Attributes can be categorized in

attribute groups

Each instance of an entity is a known as a member

Customers Model

Version 1

Version 2 Version 3

Account Type Entity Customer Entity Attributes: • Code (string) • Name (string) • Code: 1 • Name: Standard Member • Code: 2 • Name: Premier Member • Code: 1235

• Name: Ben Smith

• Address: 1 High St, Seattle

• Phone: 555-12345

• AccountType: 1

• CreditLimit: 1000

Member

Attributes:

• Code (free-form text)

• Name (free-form text)

• Address (free-form text)

• Phone (free-form text)

• AccountType (domain-based)

• CreditLimit (free-form number) Contact Details Attribute Group

Editing a Model in Microsoft Excel

Use the Master Data Services Add-In for

Excel to connect to a model Create entities

(14)

13

5/9/2013 25 ©2007 – Body Temple

5/9/2013

25

Hierarchies and Collections

Derived Hierarchies

• Natural hierarchical groupings based on domain-based relationships

• Code: 1

• Name: Standard Account Type Member

• Code: 1235

• Name: Ben Smith

• Account Type: 1 Customer Member

• Code: 1267

• Name: Amy Strande

• Account Type: 1 Customer Member • Code: CustUS • Name: US Customers Consolidated Member • Code: 1235

• Name: Ben Smith

Leaf Member

• Code: 1267

• Name: Amy Strande

Leaf Member

• Code: CustEU

• Name: European Customers

Consolidated Member

• Code: 2214

• Name: Sabina Schütz

Leaf Member

Special Delivery Customers

• Code: Cust-EU

• Name: European Customers

Consolidated Member

• Code: 2600

• Name: Andrew Sinclair

Leaf Member

• Code: 1785

• Name: Doris Krieger

Leaf Member •Collections

Non-hierarchical groupings of

specific members

Can include leaf and

consolidated members from any hierarchy •Explicit Hierarchies Hierarchical groupings of specific members Can be mandatory or non-mandatory 5/9/2013 26 ©2007 – Body Temple 5/9/2013 26

Finding Duplicate Members

Create a Data Quality Services knowledge base with domains for entity attributes

(15)

5/9/2013 27 ©2007 – Body Temple

5/9/2013

27

Validating Members with Business Rules

•Define business rules

with the business rule expression editor •Publish business rules •Use business rules to

validate entity member data in:

 Explorer

 Excel

Master Data Hub Architecture

Users insert and update data in application data stores

CRM

Marketing System Order Processing System

Master Data Hub

Data Steward

Other consumers (e.g. Data Warehouse ETL)

SSIS

SSIS SSIS

(16)

15

5/9/2013 29 ©2007 – Body Temple

5/9/2013

29

Master Data Services Staging Tables

MDS generates a staging table for each:

• Leaf member • Consolidated member • Relationship stg.EntityName_Consolidated • ID • ImportType • ImportStatus_ID • Batch_ID • BatchTag • HierarchyName • ErrorCode • Code • Name • NewCode • <AttributeName> stg.EntityName_Relationship • ID • RelationshipType • ImportStatus_ID • Batch_ID • BatchTag • HierarchyName • ParentCode • ChildCode • SortOrder • ErrorCode stg.EntityName_Leaf • ID • ImportType • ImportStatus_ID • Batch_ID • BatchTag • ErrorCode • Code • Name • NewCode • <AttributeName> 5/9/2013 30 ©2007 – Body Temple 5/9/2013 30

Staging and Importing Data

1.

Load data into staging tables

• Using SSIS, Import and Export Wizard, Transact-SQL, etc.

2.

Run staging stored procedures

• stg.udp.EntityName_Leaf

• stg.udp.EntityName_Consolidated • stg.udp.EntityName_Relationship

3.

View import status in Master Data Manager

4.

Match data to identify any duplicates

(17)

5/9/2013 31 ©2007 – Body Temple

5/9/2013

31

Consuming Master Data with Subscription Views

•Create subscription views

in Master Data Manager •Formats:

 Leaf Attributes

 Consolidated Attributes

 Collection Attributes

 Collections

 Explicit Parent Child

 Explicit levels

 Derived Parent Child

 Derived Levels

Incremental ETL

(18)

17

5/9/2013 33 ©2007 – Body Temple

5/9/2013

33

Overview of Data Warehouse Load Cycles

Extract changes from data sources

Refresh the data warehouse based

on changes

Data Warehouse Staging Database

ETL process inserts or modifies data in the data warehouse based on changes

ETL process extracts new and modified data

Users modify data in business applications

5/9/2013 34 ©2007 – Body Temple

5/9/2013

34

Considerations for Incremental ETL

Data modifications to be tracked

Load order

Dimension keys

Updating dimension members

Updating fact records

(19)

5/9/2013 35 ©2007 – Body Temple

5/9/2013

35

Slowly Changing Dimensions

Types of change to a dimension member:

• Type 1: Changing attributes are updated in the dimension record

• Type 2: Historical attribute changes result in a new record

• Type 3: The original and current values of historical attributes are stored in the dimension record

Key AltKey Name Phone City

101 C123 Mary 5551234 New York

Key AltKey Name Phone City

101 C123 Mary 5554321 New York

Key AltKey Name Phone City Current

101 C123 Mary 5551234 New York True Key AltKey Name Phone City Current

101 C123 Mary 5551234 New York False

102 C123 Mary 5551234 Seattle True

Key AltKey Name Phone OriginalCity CurrentCity EffectiveDate

101 C123 Mary 5551234 New York Seattle 6/7/11

Key AltKey Name Phone OriginalCity CurrentCity EffectiveDate

101 C123 Mary 5551234 New York New York 1/1/00

Options for Extracting Modified Data

•Extract all records

•Store a primary key and checksum

•Use a datetime column as a “high water mark”

•Use Change Data Capture

(20)

19

5/9/2013 37 ©2007 – Body Temple

5/9/2013

37

Extracting Rows Based on a Datetime Column

1. Note the current time

2. Retrieve the last extraction time from an extraction log

3. Extract and transfer records that were modified between the last extraction and the current time

4. Replace the stored last extraction value with the current time

Staging Database Data Source 1 2 3 4 Extraction Log 5/9/2013 38 ©2007 – Body Temple 5/9/2013 38

Change Data Capture

1. Enable Change Data Capture

2. Map start and end times to log sequence numbers

3. Handle null log sequence numbers

4. Extract changes between log sequence numbers EXEC sys.sp_cdc_enable_db

EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'Customers', @role_name = NULL, @supports_net_changes = 1

DECLARE @from_lsn binary(10), @to_lsn binary(10);

SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @StartDate) SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @EndDate)

IF (@from_lsn IS NULL) OR (@to_lsn IS NULL)

-- There may have been no transactions in the timeframe

(21)

5/9/2013 39 ©2007 – Body Temple

5/9/2013

39

Extracting Data with Change Data Capture

1. Identify the end-point for the extraction (LSN or DateTime)

2. Retrieve the last extraction end-point from an extraction log

3. Extract and transfer records that were modified during the LSN range defined by the previous extraction end-point and the current end-point

4. Replace the logged end-point value with the current end-point

Staging Database Data Source

1 2

3 4

Extraction Log

The CDC Control Task and Data Flow Components

Initial Extraction Incremental Extraction

CDC Control

Mark Initial Load Start

Source

Staged Inserts

CDC Control

Mark Initial Load End

CDC State Table CDC State

Variable

CDC Control

Get Processing Range

CDC Source

Staged Inserts

CDC Control

Mark Processed Range

CDC State Variable

CDC Splitter

Staged Updates Deletes Staged

1. A CDC Control Task records the starting LSN 1. CDC Control Task establishes the range of LSNs to be

(22)

21

5/9/2013 41 ©2007 – Body Temple

5/9/2013

41

Change Tracking

1. Enable Change Tracking

2. Record the current version and extract the initial data

3. Extract changes since the last extracted version, and then update the last extracted version

ALTER DATABASE Sales

SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON) ALTER TABLE Salespeople

ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = OFF)

SET @CurrentVersion = CHANGE_TRACKING_CURRENT_VERSION(); SELECT * FROM Salespeople

SET @LastExtractedVersion = @CurrentVersion

SET @CurrentVersion = CHANGE_TRACKING_CURRENT_VERSION();

SELECT * FROM CHANGETABLE(CHANGES Salespeople, @LastExtractedVersion) CT INNER JOIN Salespeople s ON CT.SalespersonID = s.SalespersonID SET @LastExtractedVersion = @CurrentVersion

Tip: Use snapshot isolation to ensure consistency

5/9/2013 42 ©2007 – Body Temple

5/9/2013

42

Extracting Data with Change Tracking

1. Retrieve the last version number that was extracted from an extraction log

2. Extract and transfer records that were modified since the last version, retrieving the current version number

3. Replace the logged version number with the current version number

Staging Database Data Source

1

2 3

(23)

5/9/2013 43 ©2007 – Body Temple

5/9/2013

43

Options for Incrementally Loading Data

Insert, Update, or Delete from CDC Output Tables

Use a Lookup transformation

Use the Slowly Changing Dimension transformation

Use the MERGE statement

Use a checksum

Using CDC Output Tables

Staging DB Data Warehouse

Staging DB Data Warehouse

Execute SQL Task INSERT… FROM Execute SQL Task UPDATE… FROM JOIN ON BizKey Source Staged Inserts Source Staged Updates Destination Dimension Table OLE DB Command UPDATE…

Staging and Data Warehouse Co-located Remote Data Warehouse

(24)

23

5/9/2013 45 ©2007 – Body Temple

5/9/2013

45

The Lookup Transformation

Redirect non-matched rows to the no match output

Look up extracted data in a dimension or fact table based on a

business key or unique combination of keys

If no match is found, insert a new record

Optionally, if a match is found, update non-key columns to apply a

type 1 change

Match No Match Lookup Update Insert 5/9/2013 46 ©2007 – Body Temple 5/9/2013 46

The Slowly Changing Dimension Transformation

Changing Attributes

(Type 1) Inferred members

SCD OLE DB Command to

update existing record

OLE DB Destination for new records OLE DB Command to insert minimal record Historical Attributes (Type 2) Derived Column to

add current row indicator column

OLE DB Command to set existing record’s

current row indicator to false

Union All

(25)

5/9/2013 47 ©2007 – Body Temple

5/9/2013

47

The MERGE Statement

Matches source and target rows

Performs insert, update, or delete operations based on row

matching results

MERGE INTO DW.dbo.FactSales as tgt

USING

(SELECT c.CustomerKey, s.OrderNo, s.Quantity, s.Amount FROM Staging.dbo.Sales s

INNER JOIN DW.dbo.DimCustomer c ON s.CustomerID = c.CustomerBizKey) AS src (CustomerKey, OrderNo, Quantity, Amount)

ON

(src.CustomerKey = tgt.CustomerKey AND src.OrderNo = tgt.OrderNo) WHEN MATCHED THEN

UPDATE

SET Quantity = src.Quantity, Amount = src.Amount

WHEN NOT MATCHED THEN

INSERT (CustomerKey, OrderNo, Quantity, Amount)

VALUES (src.CustomerKey, src.OrderNo, src.Quantity, src.Amount)

The target table being loaded Matching criteria to identify existing records Action when existing records are found Action when no existing records are found source data

SQL Server 2012 for Data Warehousing

Appendix

MSCA SQL Server 2012 Upgrade Course

References

Related documents

10331 Introduction to Microsoft SQL Server 2008 R2 Master Data Services English. 10333 Introduction to Microsoft SQL Server 2008 R2 Parallel Data Warehouse

A master data management (mdm) solution enables businesses to manage and align their enterprise master data assets (product, customer, vendor, etc.) and build and support

According to the international experience, federal authorities can carry out six groups of functions for support of mechanisms of development of innovative

Icrh}{ionw ln {hawän kh {hawän kh H}bxhwva} H}bxhwva} Madgicnka} nplimnglh} Madgicnka} nplimnglh} hc }xphwbimih} hc }xphwbimih} kh pnwhk kh pnwhk khlonkn,

The present study considers three types of source models and three different ground motion prediction equations each for various tectonic provinces in the study area.. The source

The Microsoft MDM solution will be built upon the solid foundations of Microsoft SQL Server, Microsoft Office SharePoint Server, Windows Communication Foundation, the ADO.Net

In this section, we will construct transfer maps between (Grothendieck-)Witt groups with respect to proper morphisms and establish some properties such as the base change and

Several inventory control strategies for airline includes material management and supply chain synchronization and outsourcing component management will also be discussed..