• No results found

SAP Data Services Hacks Auto Generating Data Migration Jobs Shobhit Acharya Session# 3507

N/A
N/A
Protected

Academic year: 2021

Share "SAP Data Services Hacks Auto Generating Data Migration Jobs Shobhit Acharya Session# 3507"

Copied!
29
0
0

Loading.... (view fulltext now)

Full text

(1)

SAP Data Services Hacks

Auto Generating Data Migration Jobs

Shobhit Acharya Session# 3507

(2)

 Improve data migration efficiency using SAP Data

Services and implementing a few custom approaches that will speed up the extraction and load of source data.

These approaches deal with

 Programmatically generating data migration jobs to replace labor

intensive and monotonous job design

 Using xml import mechanism to create job templates

 Using datastore configurations to ingest multiple instances of

identical source databases

(3)

Introductions & Overview

Data Migration & SAP EIM tools

• Data Services: Designer & Workbench • Information Steward

• Review a baseline framework

The use case for DS Workbench Alternative solutions for efficiency

• Understanding the DS Job xml structure

• Developing job generation programs for automation • Datastore configurations

• When to use these programs

Demo

(4)

Data Migration

Assess Extract Consolidate Cleanse Load Reconcile

Data migrations at an enterprise scale need a focus on migrating data efficiently, quickly, repeatedly and assuredly. The goal is not just to move and convert data; its to ensure that the data is of high quality and supports the business processes and target system’s operational needs.

(5)

Data Migration

80% of organizations will underestimate the costs related to the data acquisition tasks by an average of 50

percent.

(6)

SAP Data Services & Information Steward

(7)

One Solution that Provides • Data Integration • Data Quality • Text Data Processing One server to execute all capabilities One design environment to manage all development One administration console to monitor all functions

(8)

Business

Measure and compare against Information governance rules and standards

SAP Information Steward

IT

Share data quality metrics and problems with business

(9)

Case Study

Review a baseline Data Migration Framework

Staging Area IVM Target System Legacy / Staging 1 2 3 4 7 8 BoA Staging BOBJ (Reporting) Source SAP BODS SAP IS 9 10 12 13 14 15 16 Extraction 1 Ingestion into Stage 2 Initial Data Profiling 3 Transform to common structure 4 Initial Health Check 5 Auto De-dup & Cleanse 6 Apply Relevancy Rules 7 Secondary Health Check 8 Facilitated Cleansing 9 Consolidate cleansed data 10 Reference Data validation 11 Fix Reference Data issues 12 Pre-load sign off 13 Load to Target System 14 Reconciliation 15 Post load sign off 16 Transformed 5 6 Target Relevant Load BoA 11

(10)

 25+ distinct source systems

 Multiple source product versions

 Need separate job streams

 Need separate job control

 Over 400 databases to ingest

 Hadoop as staging

 Multiple waves of migration

Case Study

Ingestion Scope Legacy / Staging 1 2 3 Source SAP BODS Extraction 1 Ingestion into Stage 2 Initial Data Profiling 3

(11)

The use case for DS Workbench

 Quick to build data replication projects

 Data flow design

 Additional customizations in DS Designer

 Progressively additional

(12)

 Data Replication Design

(13)

 Data flows

(14)

 Goodies

The use case for DS Workbench

View data

(15)

The use case for DS Workbench

 No big data sources or targets

 Little or no workflow customizations

 A small list of supported transforms

 Additional work could be required in DS Designer for job control and customizations

(16)

 Generate your own jobs in XML

 Dataflow= Source Tables -> Query Transforms -> Target (including HDFS)

 Workflows

 Custom script stages

 Jobs

 Datastores and Configurations

 Flat file formats

 Import generated XML as DS Designer Jobs, workflows, dataflow

 Configure datastore for multiple deployments of the same source product database

(17)

Example : 1 Dataflow in a job

Data Services job export in XML

Understanding the structure of a simple job

(18)

Data Services job export in XML

Understanding the structure of a simple job

… 350 lines of xml Lets look closer …

(19)

Data Services job export in XML

Understanding the structure of a simple job

DIDatabaseDatastore DIAttributes

DSConfigurations --- <odbc_data_source>*DATASTORE*</odbc_data_source>

DITable --- name="*TABLE_NAME*" owner="*DBOWNER*“ datastore="*DATASTORE

DIProperties

DIColumn --- List all columns and column properties, *COLUMN_NAME*

DIDataflow ---name=“ *DATAFLOW*"

DITransforms

DIAttributes --- array sizes, static parameters

DIDatabaseTableSource --- datastoreName="*DATASTORE*“tableName="*TABLE_NAME*"

DIOutputView --- name="*TABLE_NAME*"

DIFileTarget --- formatName="*DATASTORE*_*TABLE_NAME*" filename="*TABLE_NAME*

DIAttributes --- HDFS File location/path + static parameters

DIQuery

DIAttribute --- name="*TABLE_NAME*" value="*TABLE_NAME*"

DISchema

DIElement --- List all columns and column properties, *COLUMN_NAME*

DISelect

DIProjection

DIExpression --- column="*COLUMN_NAME*"

DIFrom

DIFlatFileDatastore --- Name="*DATASTORE*_*TABLE_NAME*"

DISchema

DIElement --- List all columns and column properties, *COLUMN_NAME*

DIAttributes --- Datastore input and output file store attributes, 1 per job

DIUIOptions --- name="*DATASTORE*_*TABLE_NAME*" value="*TABLE_NAME*"

*Variables* Repeat for each

(20)

Generating the xml programmatically

Understanding what you need Programmers for your code

• Sql programming skills • Or on java/python/.net

Data Services

• Sandbox Repository

Source database table and column definitions

• Oracle : all_tab_columns, Sql Server : information_schema, Progress DB: sysprogress.syscolumns_full ….

(21)

Generating the xml programmatically

Applying the understanding for complex designs

Imported xml This example: 1 Source Datastore 20+ Configurations 400+ Tables 400+ HDFS formats 400+ Dataflow 400+ Workflows 1 Job Auto generated

(22)

Generating the xml programmatically

Applying the understanding for complex designs Workflow

Scripts

Dataflow Datastores

(23)

Data Services Datastore Configurations

 Contains alternate connection

parameters for the datastore

 Typically used for promotions to

new environments

 Could be leveraged for using

template jobs on multiple

databases with identical schemas (e.g. QAD Progress databases)

(24)

 Always isolate any custom xml imports into a sandbox repository

 Use datastore configurations to maximum effect

 Pre-import , export (to xml) the intended source and destination datastore without any tables included

 Post-import , re-import these datastores to override the generated datastores

(25)

Generating the xml programmatically

(26)

Return on Investment

 Initial assessment at JCI for developing the custom programs needed (Codename : ATLGEN)

 Effort invested in ATLGEN development: 1 pers. week

 Typical pre-use efforts: 2 weeks per source

 Potential post use efficiency : 10x per source

(27)
(28)

STAY INFORMED

Follow the ASUGNews team:

Tom Wailgum: @twailgum

Chris Kanaracus: @chriskanaracus

(29)

SESSION CODE

3507

References

Related documents

For current students, some key highlights of the event included opportunities for professional networking, advice for career development and advancement, and gaining insights into

[r]

administration. q) All IV therapy should be administered using a 10ml syringe or larger, to comply with manufacturers guidelines. r) For intermittent injection, the patency of

The Seckford Education Trust and our schools should be environments in which pupils and students or their parents/carers can feel comfortable and confident

Some qualifying countries have also experienced strong growth in foreign direct investment aimed at taking advantage of AGOA with positive spin-offs for increased employment

This section outlines the method used to acquire and analyze the data required to determine the factorial study of the analysis of factors affecting maintenance

ETL Reject Source System 1 E T L Staging 1 Staging 2 MDM Hub Rejection Cleansing Standardization Source System 2 MDM System Data Harmonization Data Publication Match MDM

U nastavku rada opisani su: komunikacijsko-funkcionalan pristup učenju, koji je usko povezan s napuštanjem tradicionalnog poučavanja te usvajanjem metoda i postupaka koje će