
SAP Data Services 4.X

An Enterprise Information Management Solution


Table of Contents

I. SAP Data Services 4.X
    Highlights
    Training Objectives
    Audience
    Prerequisites
    Keys to Success
    Certification

II. SAP Data Services Core Components
    Designer
    Repository
    Job Server
    Access Server
    Engine / Service
    Data Services Management Console (DMC)
    Central Management Console (CMC)

III. Data Services Designer
    Fundamentals
    Key Areas
    Login
    Object Hierarchy
    Define Metadata
    Prerequisites
    Use Case
    Naming Standards

IV. Working with Flat Files
    Define a Flat File
    Query
    Case
    Merge
    Validation
    Custom Functions
    Working with Variables

V. Data Integrator Transforms
    Performance Tuning
    Strategy and Implementation
    Life Cycle Management / Deployment Model
    Failure Recovery
    Scheduling

VI. Appendix
    Known Issues
    References


SAP Data Services 4.X

The SAP Data Services application delivers a single enterprise-class solution for data integration, data quality, data profiling, and text analysis, allowing you to integrate, transform, improve, and deliver trusted data that supports critical business processes and enables sound decisions.

The software customizes and manages data access and uniquely combines industry-leading technologies for delivering data to analytic, supply-chain management, customer relationship management, and Web applications.

Highlights

An enterprise can manage data as a corporate asset independent of any single system, integrating data across many systems and reusing that data for many purposes.

A single infrastructure for batch and real-time data movement enables faster and lower-cost implementations.

Use SAP Data Services to develop enterprise data integration for batch and real-time uses.

Distributed Architecture


Training Objectives

Participants will gain the knowledge needed to take part in implementing, administering, and managing data migration and integration projects.

Audience

Project team members with basic knowledge of Data Warehousing and data integration concepts

Data Warehousing consultants and project managers who work with data integration

Prerequisites

Basic knowledge of data warehousing and ETL (Extraction, Transformation, and Loading) concepts

Keys to Success

Basic knowledge of SQL language

Basic knowledge of elementary procedural programming

The author recommends combining education courses with hands-on experience to prepare for the certification exam, as questions will test your ability to apply the knowledge gained in training.

Certification

The certification test "SAP Certified Application Associate - Data Integration with SAP Data Services 4.x" verifies proven skills and fundamental knowledge in implementing ETL projects and in administering and managing projects using SAP Data Services. The exam measures your knowledge of SAP Data Services, from understanding Data Integrator concepts through managing the Data Integrator environment and profiling data. It also tests your skills in applying performance tuning and sizing methods, implementing complex design methodology, deploying Data Integrator using best practices, and troubleshooting and supporting Data Integrator.

SAP Data Services Core Components

SAP Data Services combines industry-leading data quality and integration into one platform.

With Data Services, your organization can transform and improve data anywhere. You can have a single environment for development, runtime, management, security, and data connectivity.

Designer

One of the fundamental capabilities of Data Services is extracting, transforming, and loading (ETL) data from heterogeneous sources into a target database or data warehouse. You create applications (jobs) that specify data mappings and transformations by using the Designer.

The Designer is a development tool with an easy-to-use graphical user interface. It enables developers to define data management applications that consist of data mappings, transformations, and control logic.

Use the Designer to create applications containing work flows (job execution definitions) and data flows (data transformation definitions).

Repository

The SAP Data Services repository is a set of tables that hold user-created and predefined system objects, source and target metadata, and transformation rules. Set up repositories on an open client/server platform to facilitate sharing metadata with other enterprise tools. Each repository must be stored on an existing RDBMS and registered in the Central Management Console (CMC).

Local

A local repository is used by an application designer to store definitions of objects (like projects, jobs, work flows, and data flows) and source/target metadata.

Central

A central repository is an optional component that can be used to support multi-user development. The central repository provides a shared object library allowing developers to check objects in and out of their local repositories.

While each user works on applications in a unique local repository, the team uses a central repository to store the master copy of the entire project. The central repository preserves all versions of an application's objects, so you can revert to a previous version if needed.

Multi-user development includes other advanced features such as labeling and filtering to provide you with more flexibility and control in managing application objects.

Job Server

The SAP Data Services Job Server starts the data movement engine that integrates data from multiple heterogeneous sources, performs complex data transformations, and manages extractions and transactions from ERP systems and other sources. The Job Server can move data in either batch or real-time mode and uses distributed query optimization, multi-threading, in-memory caching, in-memory data transformations, and parallel processing to deliver high data throughput and scalability.


Access Server

The SAP Data Services Access Server is a real-time, request-reply message broker that collects message requests, routes them to a real-time service, and delivers a message reply within a user-specified time frame. The Access Server queues messages and sends them to the next available real-time service across any number of computing resources. This approach provides automatic scalability because the Access Server can initiate additional real-time services on additional computing resources if traffic for a given real-time service is high. You can configure multiple Access Servers.

Engine / Service

When Data Services jobs are executed, the Job Server starts engine processes to perform data extraction, transformation, and movement. The engine processes use parallel processing and in-memory data transformations to deliver high data throughput and scalability.

The Data Services service is installed when Job and Access Servers are installed. The service starts Job Servers and Access Servers when you restart your system. The Windows service name is SAP Data Services. The UNIX equivalent is a daemon named AL_JobService.

Data Services Management Console (DMC)

The Data Services Management Console is a browser-based module that includes utilities such as the Administrator, Auto Documentation, Impact and Lineage Analysis, Data Validation, Data Quality Reports, and Operational Dashboards.

Administrator

The Administrator is a valuable utility for the production support team. It covers tasks such as scheduling, monitoring, and executing batch jobs; configuring, starting, and stopping real-time services; configuring and managing adapters; and publishing batch jobs and real-time services via Web services.

Auto Documentation

Data Services Auto Documentation reports capture critical information for understanding your jobs, so you can see the entire ETL process at a glance.

After creating a project, you can use Auto documentation reports to quickly create a PDF or Microsoft Word file that captures a selection of job, workflow, and/or data flow information including graphical representations and key mapping details.

Data Validation Reports

In the Data Validation application, create functional areas.

Verify that your source data meets your business rules.

Take the appropriate actions when the data does not meet your business rules.

Data Quality Reports

Many data quality transforms generate information about the data being processed. Data quality reports provide access to that data processing information.

The statistics-generating transforms include Match, USA Regulatory Address Cleanse, and Global Address Cleanse transforms.

Report types include job summaries, transform-specific reports, and transform group reports.

Impact and Lineage Analysis

Manage a single metadata repository to capture the relationships between different extraction and access methods and provide integrated lineage and impact analysis.

Analyze the impact of changing a source table, column, element, or field on existing documents and reports.

Analyze lineage to determine data sources of documents and reports.

Track different versions (changes) to each object over time.


Central Management Console (CMC)

SAP Data Services uses the Central Management Console for user accounts and rights management.

Before you can access Data Services repositories in other components such as the Designer and Management Console, you must configure them appropriately.

Create the repository in a database using the Repository Manager.

Register the repository in the Central Management Console (CMC).

Manage security settings for the repository in the CMC.

Data Services Designer

Fundamentals

The fundamental capability of Data Services is extracting, transforming, and loading (ETL) data from heterogeneous sources into a target database or data warehouse.

Broad source and target support, including SAP applications and databases with bulk loading and CDC (changed data capture), and file types such as comma-delimited, fixed-width, COBOL, XML, and Excel

Instant traceability, with impact analysis and data lineage capabilities that include the data quality process

Multi-user support and versioning via a central repository

Administration tool with scheduling capabilities and monitoring dashboards

Scalable scripting language with a rich set of built-in functions

Interoperability and flexibility with Web services-based applications

Debugging, built-in profiling, and viewing data

Designer


Key Areas

Project area - 1

Contains the current project (and the job(s) and other objects within it) available to you at a given time.

In Data Services, all entities you create, modify, or work with are objects.

Workspace - 2

The area of the application window in which you define, display, and modify objects.

Local object library - 3

Provides access to local repository objects including built-in system objects, such as transforms and transform configurations, and the objects you build and save, such as jobs and data flows.

Tool palette - 4

Buttons on the tool palette enable you to add new objects to the workspace.

Login

TBD

Object Hierarchy

Projects

A data integration application typically contains one or more Data Services projects. Each project contains one or more jobs, which can load data in batch mode or respond to real-time requests. Projects are a convenient way to group related real-time jobs.

Jobs

There are two types of jobs: batch and real-time. A batch job extracts, transforms, and loads data. A real-time job receives a request, extracts, transforms and loads data to support that request, and then responds to the requestor in real-time.

Work flows

Work flows define the order in which steps within a job are executed. Work flows call data flows but cannot manipulate data themselves. Work flows are called from inside other work flows or jobs.

Scripts

A script is a collection of Data Services scripting language statements. Batch jobs start and end with scripts (an initialization and finalization script). Scripts can also be used inside components to execute SQL statements, set variable values, or execute Data Services functions. All variables used in a data flow must be initialized in a script.

Data flows

A data flow defines the steps to extract data from one or more sources, transform the data, and load the data into one or more target tables.

ABAP Dataflow


An ABAP data flow is specific to SAP applications and must be defined to extract data from an SAP application.

An ABAP data flow is required to use a transport object.

File Formats

File formats describe the metadata structure.

A file format is a set of properties describing the structure of a flat file (ASCII).

A file format describes a specific file.

A file format template is a generic description that can be used for many data files

The object library stores file format templates that you use to define specific file formats as sources and targets in data flows

Object Hierarchy

Define Metadata

Datastore definitions need to be consistent and secure.


Datastores will be defined once and referenced by all developers and jobs.

Developers have read-only access to the datastore definitions but may modify their local definitions, including creating a system configuration for unit-testing purposes.

Developers will “re-get” the datastore definitions when changes to structure or passwords are made.

Prerequisites

Define database schemas as follows:

ODS
DS_STAGE
SALESMART

Note: Refer to the Data Services tutorial folder to create test data.
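A minimal sketch of this setup in SQL, assuming an ANSI-style RDBMS such as SQL Server (Oracle environments typically create users/schemas differently, so adapt the syntax to your platform):

    -- Create the three schemas used in the use case below.
    CREATE SCHEMA ODS;        -- source: operational data store tables (ODS_*)
    CREATE SCHEMA DS_STAGE;   -- staging tables (STG_ODS_*)
    CREATE SCHEMA SALESMART;  -- target star schema (dimension and fact tables)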

Use Case

The objective is to load a data mart (Salesmart) to meet various reporting requirements.

The source of the data is an operational data store (ODS), and the following tables are used to populate the Salesmart star schema.

Source Schema (ODS)      Stage Schema (DS_STAGE)      Target Schema (Salesmart)

ODS_Customer             STG_ODS_Customer             CUST_DIM
ODS_Employee             STG_ODS_Employee             EMPLOYEE_DIM
ODS_Material             STG_ODS_Material             MRTL_DIM
ODS_Region               STG_ODS_Region               SALESORG_DIM
ODS_SalesOrder           STG_ODS_SalesOrder           SALES_FACT
ODS_SalesItem            STG_ODS_SalesItem            SALES_FACT
ODS_Delivery             STG_ODS_Delivery             SALES_FACT


Naming Standards

The best practice for building a multi-user ETL environment is to follow proper naming standards.

Prefix    Object       Example

Job       Job          Job_ODS_Salesmart_Extract
WF        Workflow     WF_ODS_Dimension_Extract, WF_ODS_Fact_Extract
DF        Dataflow     DF_ODS_Customer_Extract
DS        Datastore    DS_ODS, DS_STAGE, DS_SALESMART

Working with Flat Files

Define a Flat File

Create a file format template that defines the structure for a file.

Create a specific source or target file format in a data flow.

The source or target file format is based on a template and specifies connection information such as the file path / file name.

File format objects can describe files in:

Delimited format — Characters such as commas or tabs separate each field.

Fixed width format — The column width is specified by the user.

All files within the ETL area must be placed on the Job Server.

If the File format is used in multiple places, it should be considered a reusable object and managed like a Datastore.


Query

Maps columns from the input schema to the output schema.

Performs transformations and functions on the source data.

Assigns primary keys to output schema columns.

Adds new output columns, nested schemas, and function calls to the output schema.

Performs data nesting and unnesting with sub-schemas of the output schema, and can make a sub-schema the current schema.

Generates a distinct result set for the input schema.

Joins data from multiple input source schemas; both inner joins and outer joins are supported.

Filters input source data.

Performs aggregation based on input column groups.

Generates a sorted data set based on the specified input column order.

Can also generate a DTD, XML Schema, or file format based on the input or output schema.

Query Editor:

Use the Query editor to specify the Schema In, Schema Out, and Options for the Query transform.

The areas can be resized in order to expand the area in which you are working. You can also expand and contract the columns to change the width of properties displayed in the input and output schema areas.

Important Tabs in Query Editor:


Mapping Tab:

The Mapping tab defines column mappings, including complex ones. Drag and drop input schemas and columns into the output schema to enable the editor, then use the function wizard and expression editor to build expressions.

Select Tab:

This tab specifies whether to output only distinct rows (discarding any identical duplicate rows).

From Tab:

This tab specifies all input schemas that are used in the current schema.

Where tab:

This tab specifies conditions that determine which rows are output.

Group by tab:

This tab specifies how the output rows are grouped. (If Required)

Order by Tab:

This tab specifies the sequence of output rows (ascending or descending, if required).

Advanced Tab:

Use the Advanced tab to set up Data Services so that it creates separate sub data flows.

Sub data flows process any of the following resource-intensive query clauses:

DISTINCT | GROUP BY | JOIN | ORDER BY

Find Tab:

Use this tab to locate input or output elements containing a specific word or term.
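Conceptually, the Select, From, Where, Group by, and Order by tabs assemble the same clauses as a SQL SELECT statement. As a rough sketch, a Query transform that summarizes orders by status would correspond to SQL along these lines (table and column names are taken from the use case; the aggregate output column is illustrative):

    SELECT   Order_Status,               -- output schema column
             COUNT(*) AS Order_Count     -- mapped column with an aggregate function
    FROM     ODS_SalesOrder              -- From tab: input schema
    WHERE    Order_Status IS NOT NULL    -- Where tab: filter condition (example)
    GROUP BY Order_Status                -- Group by tab
    ORDER BY Order_Status;               -- Order by tab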

JOINS IN QUERY TRANSFORM

Source Tables:

ODS_CUSTOMER | ODS_SALESORDER

Output Columns:

Cust_id | Cust_name | Order_Date | Order_Status | Net_Order_Date

Join Condition:

ODS_CUSTOMER.CUST_ID = ODS_SALESORDER.CUST_ID
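As a sketch, the Query transform described above corresponds roughly to the following inner-join SQL (the assignment of each output column to a source table is assumed, and Net_Order_Date may in practice be a mapped expression rather than a plain column):

    SELECT ODS_CUSTOMER.Cust_id,
           ODS_CUSTOMER.Cust_name,
           ODS_SALESORDER.Order_Date,
           ODS_SALESORDER.Order_Status,
           ODS_SALESORDER.Net_Order_Date
    FROM   ODS_CUSTOMER
           INNER JOIN ODS_SALESORDER
           ON ODS_CUSTOMER.CUST_ID = ODS_SALESORDER.CUST_ID;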


Case

Merge


Validation

Custom Functions

Working with Variables

Data Integrator Transforms

Performance Tuning

Strategy and Implementation

Life Cycle Management / Deployment Model

Failure Recovery

Scheduling


Appendix

Known Issues

References
