SAP Data Services Hacks Auto Generating Data Migration Jobs Shobhit Acharya Session# 3507

(1)

SAP Data Services Hacks

Auto Generating Data Migration Jobs

Shobhit Acharya Session# 3507

(2)

 Improve data migration efficiency using SAP Data

Services and implementing a few custom approaches that will speed up the extraction and load of source data.

These approaches deal with

 Programmatically generating data migration jobs to replace labor

intensive and monotonous job design

 Using xml import mechanism to create job templates

 Using datastore configurations to ingest multiple instances of

identical source databases

(3)

Introductions & Overview

Data Migration & SAP EIM tools

• Data Services: Designer & Workbench • Information Steward

• Review a baseline framework

The use case for DS Workbench Alternative solutions for efficiency

• Understanding the DS Job xml structure

• Developing job generation programs for automation • Datastore configurations

• When to use these programs

Demo

(4)

Data Migration

Assess Extract Consolidate Cleanse Load Reconcile

Data migrations at an enterprise scale need a focus on migrating data efficiently, quickly, repeatedly and assuredly. The goal is not just to move and convert data; its to ensure that the data is of high quality and supports the business processes and target system’s operational needs.

(5)

Data Migration

80% of organizations will underestimate the costs related to the data acquisition tasks by an average of 50

percent.

(6)

SAP Data Services & Information Steward

(7)

One Solution that Provides • Data Integration • Data Quality • Text Data Processing One server to execute all capabilities One design environment to manage all development One administration console to monitor all functions

(8)

Business

Measure and compare against Information governance rules and standards

SAP Information Steward

IT

Share data quality metrics and problems with business

(9)

Case Study

Review a baseline Data Migration Framework

Staging Area IVM Target System Legacy / Staging 1 2 3 4 7 8 BoA Staging BOBJ (Reporting) Source SAP BODS SAP IS 9 10 12 13 14 15 16 Extraction 1 Ingestion into Stage 2 Initial Data Profiling 3 Transform to common structure 4 Initial Health Check 5 Auto De-dup & Cleanse 6 Apply Relevancy Rules 7 Secondary Health Check 8 _Facilitated Cleansing 9 Consolidate cleansed data 10 Reference Data validation 11 Fix Reference Data issues 12 Pre-load sign off 13 Load to Target System 14 Reconciliation 15 Post load sign off 16 Transformed 5 6 Target Relevant Load BoA 11

(10)

 25+ distinct source systems

 Multiple source product versions

 Need separate job streams

 Need separate job control

 Over 400 databases to ingest

 Hadoop as staging

 Multiple waves of migration

Case Study

Ingestion Scope Legacy / Staging 1 2 3 Source SAP BODS Extraction 1 Ingestion into Stage 2 _{Initial Data} Profiling 3

(11)

The use case for DS Workbench

 Quick to build data replication projects

 Data flow design

 Additional customizations in DS Designer

 Progressively additional

(12)

 Data Replication Design

(13)

 Data flows

(14)

 Goodies

The use case for DS Workbench

View data

(15)

The use case for DS Workbench

 No big data sources or targets

 Little or no workflow customizations

 A small list of supported transforms

 Additional work could be required in DS Designer for job control and customizations

(16)

 Generate your own jobs in XML

 Dataflow= Source Tables -> Query Transforms -> Target (including HDFS)

 Workflows

 Custom script stages

 Jobs

 Datastores and Configurations

 Flat file formats

 Import generated XML as DS Designer Jobs, workflows, dataflow

 Configure datastore for multiple deployments of the same source product database

(17)

Example : 1 Dataflow in a job

Data Services job export in XML

Understanding the structure of a simple job

(18)

Data Services job export in XML

… 350 lines of xml Lets look closer …

(19)

Data Services job export in XML

DIDatabaseDatastore DIAttributes

DSConfigurations --- <odbc_data_source>*DATASTORE*</odbc_data_source>

DITable --- name="*TABLE_NAME*" owner="*DBOWNER*“ datastore="*DATASTORE”

DIProperties

DIColumn --- List all columns and column properties, *COLUMN_NAME*”

DIDataflow ---name=“ *DATAFLOW*"

DITransforms

DIAttributes --- array sizes, static parameters

DIDatabaseTableSource --- datastoreName="*DATASTORE*“tableName="*TABLE_NAME*"

DIOutputView --- name="*TABLE_NAME*"

DIFileTarget --- formatName="*DATASTORE*_*TABLE_NAME*" filename="*TABLE_NAME*

DIAttributes --- HDFS File location/path + static parameters

DIQuery

DIAttribute --- name="*TABLE_NAME*" value="*TABLE_NAME*"

DISchema

DIElement --- List all columns and column properties, *COLUMN_NAME*

DISelect

DIProjection

DIExpression --- column="*COLUMN_NAME*"

DIFrom

DIFlatFileDatastore --- Name="*DATASTORE*_*TABLE_NAME*"

DISchema

DIElement --- List all columns and column properties, *COLUMN_NAME*

DIAttributes --- Datastore input and output file store attributes, 1 per job

DIUIOptions --- name="*DATASTORE*_*TABLE_NAME*" value="*TABLE_NAME*"

*Variables* Repeat for each

(20)

Generating the xml programmatically

Understanding what you need Programmers for your code

• Sql programming skills • Or on java/python/.net

Data Services

• Sandbox Repository

Source database table and column definitions

• Oracle : all_tab_columns, Sql Server : information_schema, Progress DB: sysprogress.syscolumns_full ….

(21)

Generating the xml programmatically

Applying the understanding for complex designs

Imported xml This example: 1 Source Datastore 20+ Configurations 400+ Tables 400+ HDFS formats 400+ Dataflow 400+ Workflows 1 Job Auto generated

(22)

Generating the xml programmatically

Applying the understanding for complex designs Workflow

Scripts

Dataflow Datastores

(23)

Data Services Datastore Configurations

 Contains alternate connection

parameters for the datastore

 Typically used for promotions to

new environments

 Could be leveraged for using

template jobs on multiple

databases with identical schemas (e.g. QAD Progress databases)

(24)

 Always isolate any custom xml imports into a sandbox repository

 Use datastore configurations to maximum effect

 Pre-import , export (to xml) the intended source and destination datastore without any tables included

 Post-import , re-import these datastores to override the generated datastores

(25)

Generating the xml programmatically

(26)

Return on Investment

 Initial assessment at JCI for developing the custom programs needed (Codename : ATLGEN)

 Effort invested in ATLGEN development: 1 pers. week

 Typical pre-use efforts: 2 weeks per source

 Potential post use efficiency : 10x per source

(27)

(28)

STAY INFORMED

Follow the ASUGNews team:

Tom Wailgum: @twailgum

Chris Kanaracus: @chriskanaracus

(29)

SAP Data Services Hacks Auto Generating Data Migration Jobs Shobhit Acharya Session# 3507

Data Migration

Data Migration

SAP Data Services & Information Steward

SAP Information Steward

Case Study

Case Study

The use case for DS Workbench

The use case for DS Workbench

The use case for DS Workbench

Data Services job export in XML

Data Services job export in XML

Data Services job export in XML

Generating the xml programmatically

Generating the xml programmatically

Generating the xml programmatically

Data Services Datastore Configurations

Generating the xml programmatically

Return on Investment

STAY INFORMED

SESSION CODE

3507