Microsoft
MCSA upgrade to SQL Server
2012 Certification
Courseware
1
5/9/2013 1 ©2007 – Body Temple
5/9/2013
1
SQL Server 2012 for Data Warehousing
Appendix
MSCA SQL Server 2012 Upgrade Course
5/9/2013 2 ©2007 – Body Temple
5/9/2013
2
Deployment Models
5/9/2013 3 ©2007 – Body Temple
5/9/2013
3
Package Deployment Model
Storage
• MSDB • File SystemPackage Configurations
• Property values to beset dynamically at run time
Package Deployment
Utility
• Generate all required files for easier deployment
Project Deployment Model
The SSIS catalog
• Storage and management for SSIS projects on a SQL Server 2012 instance
Folders
3
5/9/2013 5 ©2007 – Body Temple
5/9/2013
5
Deployment Model Comparison
Feature Package Deployment Project Deployment
Unit of Deployment Package Project
Storage File system or MSDB SSIS Catalog
Dynamic configuration Package configurations Environment variables
mapped to project-level parameters and connection managers
Compiled format Multiple .dtsx files Single .ispac file
Troubleshooting Configure logging for each
package
SSIS catalog includes built-in reports and views
5/9/2013 6 ©2007 – Body Temple
5/9/2013
6
Creating an SSIS Catalog
Pre-requisites • SQL Server 2012 • SQL CLR enabled Creating a catalog
• Use SQL Server Management Studio
5/9/2013 7 ©2007 – Body Temple
5/9/2013
7
Environments and Variables
•Environments
Execution contexts for projects
•Variables
Environment-specific values
that can be mapped to project parameters and connection manager properties at run time
Deploying an SSIS Project
Integration Services
Deployment Wizard
• SQL Server Data Tools • SQL Server
5
5/9/2013 9 ©2007 – Body Temple
5/9/2013
9
Viewing Project Execution Information
Integration Services
Dashboard provides
built-in reports
Additional sources
of information:
• Event Handlers • Error Outputs • Logging• Debug Dump Files
5/9/2013 10 ©2007 – Body Temple
5/9/2013
10
Data Quality Services
5/9/2013 11 ©2007 – Body Temple
5/9/2013
11
What Is a Knowledge Base?
Repository of knowledge about data:
• Domains define values and rules for each field
• Matching policies define rules for identifying duplicate records
10
1
1
0
0
0
110KB
What Is a Domain?
•Domains are specific to a data field
•Domains contain the rules for the data
•Domains can be individual or composite
KB
City Name State
Family Name First
7
5/9/2013 13 ©2007 – Body Temple 5/9/2013 13KB
Name Family Name First Name •The Azure Marketplace hostsspecialist data cleansing providers
Set up an account Subscribe to a reference
service
Map your domain to the
reference service
What Is a Reference Data Service?
Address
5/9/2013 14 ©2007 – Body Temple
5/9/2013
14
Creating a Data Cleansing Project
1. Select a knowledge base 2. Map columns to domains
5/9/2013 15 ©2007 – Body Temple
5/9/2013
15
Viewing Cleansed Data
• Output – The values for all fields after data cleansing
• Source – The original value for fields that were mapped to domains and cleansed
• Reason – The reason the output value was selected by the cleansing operation
• Confidence – An indication of the confidence Data Quality Services estimates for corrected values
• Status – The status of the output column (correct or corrected)
Using the Data Cleansing Data Flow
Transformation
Input columns to be cleansed
Select knowledge base and map
columns to domains
9
5/9/2013 17 ©2007 – Body Temple
5/9/2013
17
Creating a Matching Policy
•Define matching rules for business entities •Rules match entities based on domains:
Similarity: Similar or exact match
Weight: Percentage to apply if match succeeds
Prerequisite: Mandatory domain match for rule to succeed
•If the combined weight of all matches meets or exceeds the rule’s minimum matching score, the entities are duplicates
5/9/2013 18 ©2007 – Body Temple
5/9/2013
18
Creating a Data Matching Project
1. Select a knowledge base 2. Map columns to domains 3. Review match clusters 4. Export matches and survivors
• Select survivorship rule:
• Pivot record
• Most complete and longest record
• Most complete record
5/9/2013 19 ©2007 – Body Temple
5/9/2013
19
Viewing Data Matching Results
Cluster ID – Identifier for a cluster of matched records Record ID – Identifier for a matched record
Matching Rule – The rule that produced the match Score – Combined weighting of match criteria
Pivot Mark – A matched record chosen arbitrarily by Data Quality Services as the pivot record for a cluster
Master Data Services
11
5/9/2013 21 ©2007 – Body Temple
5/9/2013
21
Master Data Services and Data Quality Services
Domain-Focused
Applicable to any dataset that contains domains in the KB
• Is “Seattle” a valid value for City?
Master Data Services
•Entity-Focused
•Applicable to specific instances of
business entities
What is the correct name, address, and phone number for customer 1235?
Data Quality Services
Use Data Quality Services to:
•Cleanse data before importing into master data hub
•Apply matching policies to find duplicate master data entities
5/9/2013 22 ©2007 – Body Temple
5/9/2013
22
Components of Master Data Services
Master Data Services database
• Contains the objects that support Master Data Services Master Data Manager Web application
• Enables administrators to create and manage Master Data Services objects, such as entities and attributes • Enables data stewards to manage master data Master Data Services Configuration Manager
• Enables administrators to create the Master Data Services database and Web application Master Data Services Add-In for Microsoft Excel
• Enables administrators and data stewards to manage master data in Excel
5/9/2013 23 ©2007 – Body Temple
5/9/2013
23
What Is a Master Data Services Model?
A versioned data model for specific business item or area of the business Contains definitions for entities required in the business area
• Often an entity with the same name as the model, as well as related entities
Each entity has a defined set of attributes
• All entities have Code and Name attributes
• Attributes can be categorized in
attribute groups
Each instance of an entity is a known as a member
Customers Model
Version 1
Version 2 Version 3
Account Type Entity Customer Entity Attributes: • Code (string) • Name (string) • Code: 1 • Name: Standard Member • Code: 2 • Name: Premier Member • Code: 1235
• Name: Ben Smith
• Address: 1 High St, Seattle
• Phone: 555-12345
• AccountType: 1
• CreditLimit: 1000
Member
Attributes:
• Code (free-form text)
• Name (free-form text)
• Address (free-form text)
• Phone (free-form text)
• AccountType (domain-based)
• CreditLimit (free-form number) Contact Details Attribute Group
Editing a Model in Microsoft Excel
Use the Master Data Services Add-In forExcel to connect to a model Create entities
13
5/9/2013 25 ©2007 – Body Temple
5/9/2013
25
Hierarchies and Collections
Derived Hierarchies
• Natural hierarchical groupings based on domain-based relationships
• Code: 1
• Name: Standard Account Type Member
• Code: 1235
• Name: Ben Smith
• Account Type: 1 Customer Member
• Code: 1267
• Name: Amy Strande
• Account Type: 1 Customer Member • Code: CustUS • Name: US Customers Consolidated Member • Code: 1235
• Name: Ben Smith
Leaf Member
• Code: 1267
• Name: Amy Strande
Leaf Member
• Code: CustEU
• Name: European Customers
Consolidated Member
• Code: 2214
• Name: Sabina Schütz
Leaf Member
Special Delivery Customers
• Code: Cust-EU
• Name: European Customers
Consolidated Member
• Code: 2600
• Name: Andrew Sinclair
Leaf Member
• Code: 1785
• Name: Doris Krieger
Leaf Member •Collections
Non-hierarchical groupings of
specific members
Can include leaf and
consolidated members from any hierarchy •Explicit Hierarchies Hierarchical groupings of specific members Can be mandatory or non-mandatory 5/9/2013 26 ©2007 – Body Temple 5/9/2013 26
Finding Duplicate Members
Create a Data Quality Services knowledge base with domains for entity attributes
5/9/2013 27 ©2007 – Body Temple
5/9/2013
27
Validating Members with Business Rules
•Define business ruleswith the business rule expression editor •Publish business rules •Use business rules to
validate entity member data in:
Explorer
Excel
Master Data Hub Architecture
Users insert and update data in application data stores
CRM
Marketing System Order Processing System
Master Data Hub
Data Steward
Other consumers (e.g. Data Warehouse ETL)
SSIS
SSIS SSIS
15
5/9/2013 29 ©2007 – Body Temple
5/9/2013
29
Master Data Services Staging Tables
MDS generates a staging table for each:
• Leaf member • Consolidated member • Relationship stg.EntityName_Consolidated • ID • ImportType • ImportStatus_ID • Batch_ID • BatchTag • HierarchyName • ErrorCode • Code • Name • NewCode • <AttributeName> stg.EntityName_Relationship • ID • RelationshipType • ImportStatus_ID • Batch_ID • BatchTag • HierarchyName • ParentCode • ChildCode • SortOrder • ErrorCode stg.EntityName_Leaf • ID • ImportType • ImportStatus_ID • Batch_ID • BatchTag • ErrorCode • Code • Name • NewCode • <AttributeName> 5/9/2013 30 ©2007 – Body Temple 5/9/2013 30
Staging and Importing Data
1.
Load data into staging tables
• Using SSIS, Import and Export Wizard, Transact-SQL, etc.
2.
Run staging stored procedures
• stg.udp.EntityName_Leaf
• stg.udp.EntityName_Consolidated • stg.udp.EntityName_Relationship
3.
View import status in Master Data Manager
4.
Match data to identify any duplicates
5/9/2013 31 ©2007 – Body Temple
5/9/2013
31
Consuming Master Data with Subscription Views
•Create subscription viewsin Master Data Manager •Formats:
Leaf Attributes
Consolidated Attributes
Collection Attributes
Collections
Explicit Parent Child
Explicit levels
Derived Parent Child
Derived Levels
Incremental ETL
17
5/9/2013 33 ©2007 – Body Temple
5/9/2013
33
Overview of Data Warehouse Load Cycles
Extract changes from data sources
Refresh the data warehouse based
on changes
Data Warehouse Staging Database
ETL process inserts or modifies data in the data warehouse based on changes
ETL process extracts new and modified data
Users modify data in business applications
5/9/2013 34 ©2007 – Body Temple
5/9/2013
34
Considerations for Incremental ETL
Data modifications to be tracked
Load order
Dimension keys
Updating dimension members
Updating fact records
5/9/2013 35 ©2007 – Body Temple
5/9/2013
35
Slowly Changing Dimensions
Types of change to a dimension member:
• Type 1: Changing attributes are updated in the dimension record
• Type 2: Historical attribute changes result in a new record
• Type 3: The original and current values of historical attributes are stored in the dimension record
Key AltKey Name Phone City
101 C123 Mary 5551234 New York
Key AltKey Name Phone City
101 C123 Mary 5554321 New York
Key AltKey Name Phone City Current
101 C123 Mary 5551234 New York True Key AltKey Name Phone City Current
101 C123 Mary 5551234 New York False
102 C123 Mary 5551234 Seattle True
Key AltKey Name Phone OriginalCity CurrentCity EffectiveDate
101 C123 Mary 5551234 New York Seattle 6/7/11
Key AltKey Name Phone OriginalCity CurrentCity EffectiveDate
101 C123 Mary 5551234 New York New York 1/1/00
Options for Extracting Modified Data
•Extract all records•Store a primary key and checksum
•Use a datetime column as a “high water mark”
•Use Change Data Capture
19
5/9/2013 37 ©2007 – Body Temple
5/9/2013
37
Extracting Rows Based on a Datetime Column
1. Note the current time
2. Retrieve the last extraction time from an extraction log
3. Extract and transfer records that were modified between the last extraction and the current time
4. Replace the stored last extraction value with the current time
Staging Database Data Source 1 2 3 4 Extraction Log 5/9/2013 38 ©2007 – Body Temple 5/9/2013 38
Change Data Capture
1. Enable Change Data Capture
2. Map start and end times to log sequence numbers
3. Handle null log sequence numbers
4. Extract changes between log sequence numbers EXEC sys.sp_cdc_enable_db
EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'Customers', @role_name = NULL, @supports_net_changes = 1
DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @StartDate) SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @EndDate)
IF (@from_lsn IS NULL) OR (@to_lsn IS NULL)
-- There may have been no transactions in the timeframe
5/9/2013 39 ©2007 – Body Temple
5/9/2013
39
Extracting Data with Change Data Capture
1. Identify the end-point for the extraction (LSN or DateTime)
2. Retrieve the last extraction end-point from an extraction log
3. Extract and transfer records that were modified during the LSN range defined by the previous extraction end-point and the current end-point
4. Replace the logged end-point value with the current end-point
Staging Database Data Source
1 2
3 4
Extraction Log
The CDC Control Task and Data Flow Components
Initial Extraction Incremental Extraction
CDC Control
Mark Initial Load Start
Source
Staged Inserts
CDC Control
Mark Initial Load End
CDC State Table CDC State
Variable
CDC Control
Get Processing Range
CDC Source
Staged Inserts
CDC Control
Mark Processed Range
CDC State Variable
CDC Splitter
Staged Updates Deletes Staged
1. A CDC Control Task records the starting LSN 1. CDC Control Task establishes the range of LSNs to be
21
5/9/2013 41 ©2007 – Body Temple
5/9/2013
41
Change Tracking
1. Enable Change Tracking
2. Record the current version and extract the initial data
3. Extract changes since the last extracted version, and then update the last extracted version
ALTER DATABASE Sales
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON) ALTER TABLE Salespeople
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = OFF)
SET @CurrentVersion = CHANGE_TRACKING_CURRENT_VERSION(); SELECT * FROM Salespeople
SET @LastExtractedVersion = @CurrentVersion
SET @CurrentVersion = CHANGE_TRACKING_CURRENT_VERSION();
SELECT * FROM CHANGETABLE(CHANGES Salespeople, @LastExtractedVersion) CT INNER JOIN Salespeople s ON CT.SalespersonID = s.SalespersonID SET @LastExtractedVersion = @CurrentVersion
Tip: Use snapshot isolation to ensure consistency
5/9/2013 42 ©2007 – Body Temple
5/9/2013
42
Extracting Data with Change Tracking
1. Retrieve the last version number that was extracted from an extraction log
2. Extract and transfer records that were modified since the last version, retrieving the current version number
3. Replace the logged version number with the current version number
Staging Database Data Source
1
2 3
5/9/2013 43 ©2007 – Body Temple
5/9/2013
43
Options for Incrementally Loading Data
Insert, Update, or Delete from CDC Output Tables
Use a Lookup transformation
Use the Slowly Changing Dimension transformation
Use the MERGE statement
Use a checksum
Using CDC Output Tables
Staging DB Data Warehouse
Staging DB Data Warehouse
Execute SQL Task INSERT… FROM Execute SQL Task UPDATE… FROM JOIN ON BizKey Source Staged Inserts Source Staged Updates Destination Dimension Table OLE DB Command UPDATE…
Staging and Data Warehouse Co-located Remote Data Warehouse
23
5/9/2013 45 ©2007 – Body Temple
5/9/2013
45
The Lookup Transformation
Redirect non-matched rows to the no match output
Look up extracted data in a dimension or fact table based on a
business key or unique combination of keys
If no match is found, insert a new record
Optionally, if a match is found, update non-key columns to apply a
type 1 change
Match No Match Lookup Update Insert 5/9/2013 46 ©2007 – Body Temple 5/9/2013 46The Slowly Changing Dimension Transformation
Changing Attributes
(Type 1) Inferred members
SCD OLE DB Command to
update existing record
OLE DB Destination for new records OLE DB Command to insert minimal record Historical Attributes (Type 2) Derived Column to
add current row indicator column
OLE DB Command to set existing record’s
current row indicator to false
Union All
5/9/2013 47 ©2007 – Body Temple
5/9/2013
47
The MERGE Statement
Matches source and target rows
Performs insert, update, or delete operations based on row
matching results
MERGE INTO DW.dbo.FactSales as tgt
USING
(SELECT c.CustomerKey, s.OrderNo, s.Quantity, s.Amount FROM Staging.dbo.Sales s
INNER JOIN DW.dbo.DimCustomer c ON s.CustomerID = c.CustomerBizKey) AS src (CustomerKey, OrderNo, Quantity, Amount)
ON
(src.CustomerKey = tgt.CustomerKey AND src.OrderNo = tgt.OrderNo) WHEN MATCHED THEN
UPDATE
SET Quantity = src.Quantity, Amount = src.Amount
WHEN NOT MATCHED THEN
INSERT (CustomerKey, OrderNo, Quantity, Amount)
VALUES (src.CustomerKey, src.OrderNo, src.Quantity, src.Amount)
The target table being loaded Matching criteria to identify existing records Action when existing records are found Action when no existing records are found source data