Data Migration Service – An Overview

Metalogic Systems Pvt Ltd

J – 1/1, Block EP & GP, Sector V, Salt Lake Electronic Complex, Calcutta 700091 Phones: +91 33 2357-8991 to 8994

Metalogic Systems: Data Migration Services

1. Introduction

Data migration is an extremely important but often neglected or wrongly estimated phase of any software implementation or migration activity. Moving data from the source platform to the target platform sounds easy, but in practice it can quickly become a daunting task.

The objective of this document is to give a general idea of the activities involved in migrating or transforming data, together with a technical overview of the processes within the data migration service provided by Metalogic Systems (MLS).

2. Scope of Data Migration

Data may need to be transformed or migrated from one application to another for various reasons, including:

• Technical obsolescence of old operating environment (hardware, o/s, application s/w) and a compulsion to migrate the application along with the data to newer platforms

• Implementation of new custom-built applications or packages requiring enterprise data from the legacy systems

• Requirement to provide web-enabled services that must have access to enterprise data, either online or updated at a pre-determined interval

• To provide a reliable and efficient electronic mechanism to make archived data available.

The actual scope of data migration needs to be determined after analyzing the various requirements of the customer. We need to answer some basic questions before finalizing the scope. Some of them are given below; the list is not exhaustive:

a) Is it a 1:1 data migration (i.e., the attributes of the entities remain mostly unchanged between the source and target platforms), or is there a need for data transformation (i.e., the source and target data models differ by design)? For application migration, the functionality offered by the original application generally remains the same and only the source and target environments (i.e., hardware, o/s, database products) change. But where the migrated data is going to be used by a new custom application or package, the entities and relationships differ substantially between the source and target databases. Naturally, the latter case takes much more effort, and one needs to be thorough about the business rules for transforming the data.

b) Does the original application have to continue, or will it be retired once the target application goes into production? In the latter case the migration effort is in all likelihood a one-time exercise, whereas in the former case we may need to put in place a process by which the two applications exchange data in both directions at regular intervals. Often, when legacy systems are replaced module by module with new or migrated systems, the exchange of data between the two environments becomes a critical requirement.

c) In some cases the original application has already been retired and the new application is operational with current data, but there is still a requirement to transfer the older or archived data. The source environment itself may no longer be available, and in such cases one will need to devise special handling to interpret this kind of data correctly (e.g., computational / embedded sign fields, data residing on machines with 6/9 bit architecture etc.).

Apart from the issues outlined above, we need to consider various operational aspects of the specific application and the environment as listed below. However, some of these may not apply depending on the response to the above questions.

d) Which data is static and which is transactional in nature, and what are the respective volumes? How frequently do the various source database entities change?

e) How much disk space is available in the source environment to download the data? If the download process needs to be split into several phases, how will changes to the transactional data that happen during these phases be managed?

f) Can the source application be shut down during the data download and upload phases? What is the estimated process run-time, and is it lower than the maximum time allowed for such a shutdown?

The answers to the above questions will determine whether we need to download and upload the data incrementally, and will also determine the periodicity of the data exchanges. An incremental download and upload process is far more complicated than a set of simpler extraction and loading programs that run as a single one-time batch job.

Lastly, if the source and target applications have to run side by side and depend on each other, the actual requirements of any data exchange process between them must be evaluated carefully. We need to determine the mode of data transfer in such cases. Should the data travel in batches at a pre-determined interval (e.g., nightly transfer by FTP or similar means), or do we need inter-process communication between the two applications (through RPC or similar mechanisms)?
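The incremental approach can be sketched as follows. This is only a minimal illustration: it assumes that every source record carries a last-modified timestamp (the hypothetical field `modified_at`) and that the time of the previous extraction is kept as a watermark. Neither assumption is prescribed by the process described in this document; legacy sources often need journals or change flags instead.

```python
from datetime import datetime

def extract_increment(records, watermark):
    """Return the records changed after `watermark`, plus the new watermark.

    `records` is an iterable of dicts with a hypothetical `modified_at`
    datetime field; this field name is an assumption for illustration.
    """
    changed = [r for r in records if r["modified_at"] > watermark]
    # Advance the watermark only as far as the newest record actually seen.
    new_watermark = max((r["modified_at"] for r in changed), default=watermark)
    return changed, new_watermark

source = [
    {"id": 1, "modified_at": datetime(2021, 1, 1, 9, 0)},
    {"id": 2, "modified_at": datetime(2021, 1, 2, 9, 0)},
    {"id": 3, "modified_at": datetime(2021, 1, 3, 9, 0)},
]
batch, mark = extract_increment(source, datetime(2021, 1, 1, 12, 0))
print([r["id"] for r in batch], mark)
```

Running the extraction again with the returned watermark yields an empty batch until further source records change, which is what makes the exchange repeatable at a fixed interval.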

3. Overview of the Data Migration services provided by MLS

MLS has acquired specialized expertise in carrying out data migration projects successfully across the globe. While executing data migration projects for our customers, we have implemented certain processes to streamline all the activities involved in this kind of work.

This process is based around a set of software tools developed in-house that generate some of the essential components like data download and upload programs. The tool-based approach automates the migration to a great extent, thereby ensuring a faster completion time and reducing the chances of errors.

[Figure: Source Data on the Source Platform (H/W & S/W) passes through Transformation to become Target Data on the Target Platform (H/W & S/W).]

The above picture is a top-level depiction of the most common type of data transformation activity. The source data are transformed and transported to the target platform after a series of operations has been applied to them in various stages. The scope generally ends with uploading the data into the target platform.

Simple though it sounds, all aspects of the hardware, application software, database design (e.g., table structure, relationship, other objects like DB procedures etc.) and installation / deployment details of source and target environments must be considered in order to complete a successful data migration. While designing the service at MLS, we have tried our best to cover all these aspects.

4. Process Overview

The process of Transformation involves:

4.1 Study and analysis of source and target data models and extraction of the mapping rules for transforming the business entities in the source application into the target platform.

4.2 Mapping existing data types available in the source platform to equivalent data types in the target database.

4.3 Translation/re-coding and manipulation of the source data as per requirements of the target application.

4.4 Scraping and Cleansing of data invalid in the new environment
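Step 4.2 can be illustrated with a small function that maps COBOL picture clauses to SQL column types. This is a sketch under stated assumptions: the function name `pic_to_sql`, the limited clause syntax it accepts, and the chosen target types are all illustrative, and the right choices depend on the actual source and target products.

```python
import re

def pic_to_sql(pic):
    """Map a simple COBOL PIC clause to an illustrative SQL column type."""
    pic = pic.upper().replace(" ", "")
    # Alphanumeric field: X(n)
    m = re.fullmatch(r"X\((\d+)\)", pic)
    if m:
        return f"VARCHAR({int(m.group(1))})"
    # Numeric field, optionally signed, with optional implied decimals: S9(n)V9(m)
    m = re.fullmatch(r"S?9\((\d+)\)(?:V9\((\d+)\))?", pic)
    if m:
        whole = int(m.group(1))
        frac = int(m.group(2) or 0)
        if frac == 0 and whole <= 9:
            return "INTEGER"          # up to 9 digits fits a 32-bit integer
        return f"DECIMAL({whole + frac},{frac})"
    raise ValueError(f"unsupported PIC clause: {pic}")

for clause in ("X(30)", "9(4)", "S9(7)V9(2)"):
    print(clause, "->", pic_to_sql(clause))
```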

The following figure represents the major processes involved in a data migration activity and the sequence in which they are performed. The square boxes each represent a separate process with further elaboration in the subsequent sections of this document.

[Figure: process flow of a data migration activity. Process boxes: Source Data Model Analysis, Target Data Model Analysis, Attribute Mapping, Mapper Creation, Repository Population, Data Validation Routine Generation, Data Download and Upload Generation, Data Download and Upload Testing, Live Data Migration. Other labels: Tool Processes, Source Data, Target Database, Application Programs / Database Definition Scripts, Tested Data Download and Upload Utilities, Error Reports.]

Process Input and Output

Process: Source Data Model Analysis
Input: Source data model; source file layouts; source database scripts (DDL/DDS/SDDL/DBD/PSB/SQL etc.) and application programs
Output: COBOL layout finalization for flat and ISAM files; source field/record rules; source database entities; discrimination rules for each file/record/segment

Process: Creation of data mapping rules and validations
Input: (not listed)
Output: Data Mapping and Validation Rules

Process: Repository Population
Input: Source file layouts
Output: 1. Populated Repository (2); 2. Generated Data Validation Programs

Process: Data Validation Programs
Input: Populated Repository; Data Mapping and Validation Rules
Output: Sample Error Reports on Invalid Data (3)

Process: Data Download & Upload Program Generation
Input: Populated Repository; Data Mapping and Validation Rules
Output: Generated Data Download & Upload programs

Process: Data Download Testing
Input: 1. Test Database in the Source Environment; 2. Generated Data Download programs
Output: 1. Sample source data in plain ASCII format; 2. Sample Error Reports on Invalid Data

Process: Data Upload Testing
Input: Sample source data in plain ASCII format
Output: 1. Populated Test Database in the Target environment; 2. Test results on the target platform

Process: Tested Data Download and Upload programs
Input: Tested programs, compilation and installation scripts for both platforms
Output: Data Download and Upload programs installed on respective platforms

Process: Data Migration – Dry Run
Input: Sample data for all sources (related and complete)
Output: Migrated data in target environment corresponding to the provided sample; rough estimate of actual time needed in final run

Process: Data Migration Plan
Input: (not listed)
Output: Plan document

Process: Test Data Migration
Input: Full data for all sources for identified phase(s)
Output: Migrated data in staging environment; actual estimate of time required for final run; revised plan document

Process: Live Data Migration
Input: Source Data; Storage on Target platform
Output: Transformed data migrated to target platform; control report to ensure complete migration

(1) Validation Rules may be defined for implementation during data transformation on source fields/records. Applying expressions or functions on source field(s) may generate target data elements.

(2) The repository is a complex set of data structures that stores all information related to source data models. It can be used to produce a variety of reports about the source data models and generate the download programs.

(3) This will be a repetitive task; inspection of the error reports coming out of this step will gradually refine the data mapping rules. Only after a few iterations will it be possible to extract all the rules prevalent in the source entities.

5. A brief look inside the Processes

5.1 Source Data Model Analysis

i. Identify all types of storage (Network/Hierarchical/Relational databases, ISAM files, sequential files, etc.) and respective Data Definition scripts (Schema/Sub-schema/DBD/PSB/SQL scripts etc.).

ii. Identify all data storage units (records/tables) requiring transformation.

iii. Identify all possible layouts for each data storage unit.

iv. Determine layouts of individual data storage units – with break-up of data elements to the lowest possible levels.

v. Identify rules to validate records and/or fields in each storage unit. For example, whether a field is a date field or not and if yes, what is the format of this date field. Or, if a field contains a set of valid codes, what are these codes (e.g., M for Male, F for Female) and so on.

vi. Identify records with multiple layouts and the rules to distinguish the different layouts. For example, if there are multiple record types all put together in a single file, which field identifies the record type?
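The discrimination rule in item vi can be sketched as follows, assuming a fixed-width file in which the first two characters of each record hold a record-type code. The codes, layouts, and field names below are hypothetical and serve only to show the mechanism.

```python
# Hypothetical layouts: record-type code -> list of (field name, start, end)
LAYOUTS = {
    "01": [("rec_type", 0, 2), ("cust_id", 2, 8), ("name", 8, 28)],
    "02": [("rec_type", 0, 2), ("cust_id", 2, 8), ("order_no", 8, 16)],
}

def parse_record(line):
    """Pick the layout from the discriminator field, then slice out the fields."""
    rec_type = line[0:2]
    layout = LAYOUTS.get(rec_type)
    if layout is None:
        raise ValueError(f"unknown record type {rec_type!r}")
    return {name: line[start:end].strip() for name, start, end in layout}

lines = [
    "01000042Jane Example        ",
    "0200004200001234",
]
for line in lines:
    print(parse_record(line))
```

Records with an unknown type code raise an error rather than being skipped silently, so that missing discrimination rules surface during analysis.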

5.2 Target Data Model Analysis

i. Determine target data model

ii. Determine significance of each data element in target data model with respect to data migration requirement

iii. Identify data elements to be populated by migrated data.

5.3 Attribute Mapping

i. Identify rules to transform source data types to target data types

ii. Map each target data element (column) to its source data

iii. Identify rules (expressions, functions, etc.) to be applied on source data element(s) in order to populate target data elements

iv. Identify rules to validate transformed data elements

v. Identify rules to transform source records/files into target records/tables (viz., merge, split, etc.)

vi. Identify discrepancies related to data types, sizes, and formats of data elements

vii. Resolve discrepancies related to data types, sizes, and formats of data elements
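A minimal sketch of how such mapping rules might be expressed and applied is given below. Each target column is associated with an expression over the source record. The source field names (`CUST-NO`, `DOB`, and so on), the target column names, and the date formats are assumptions made for illustration only.

```python
from datetime import datetime

# Hypothetical mapping rules: target column -> function of the source record.
MAPPING = {
    "customer_id": lambda src: int(src["CUST-NO"]),
    # Merge two source fields into one target column.
    "full_name": lambda src: f'{src["FIRST-NAME"].strip()} {src["LAST-NAME"].strip()}',
    # Source stores dates as DDMMYYYY text; the target expects ISO 8601.
    "birth_date": lambda src: datetime.strptime(src["DOB"], "%d%m%Y").date().isoformat(),
    # Re-code a single-letter source code into the target's vocabulary.
    "gender": lambda src: {"M": "MALE", "F": "FEMALE"}[src["SEX"]],
}

def transform(src):
    """Apply every mapping rule to one source record."""
    return {target: rule(src) for target, rule in MAPPING.items()}

row = transform({"CUST-NO": "000042", "FIRST-NAME": "Jane ", "LAST-NAME": "Example ",
                 "DOB": "31011980", "SEX": "F"})
print(row)
```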

5.4 Mapper Creation

Create Map Information files on the basis of the source and target data models and the rules identified above, to aid population of the repository. Map Information files store data mapping rules and validations in pre-defined formats that the transformation tool recognizes.

5.5 Repository Population

Outputs of all preceding processes are utilized to populate the repository. The repository is a complex data structure that stores all Source and Target data definitions and the rules for transformation.
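The internal structure of the repository is not described in this document. Purely as an illustration of the kind of information it holds, the following hypothetical sketch models data definitions and transformation rules with plain data classes; all class and attribute names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FieldDef:
    name: str
    data_type: str          # e.g. a COBOL PIC clause or an SQL type
    length: int

@dataclass
class EntityDef:
    name: str               # source record/file or target table
    fields: List[FieldDef] = field(default_factory=list)

@dataclass
class MappingRule:
    target: str             # "table.column"
    sources: List[str]      # "record.field" names feeding the target
    expression: str         # transformation rule, kept as text here

@dataclass
class Repository:
    source_entities: Dict[str, EntityDef] = field(default_factory=dict)
    target_entities: Dict[str, EntityDef] = field(default_factory=dict)
    rules: List[MappingRule] = field(default_factory=list)

    def unmapped_target_columns(self):
        """Report target columns that no mapping rule populates yet."""
        mapped = {rule.target for rule in self.rules}
        return [f"{ent.name}.{col.name}"
                for ent in self.target_entities.values()
                for col in ent.fields
                if f"{ent.name}.{col.name}" not in mapped]

repo = Repository()
repo.target_entities["customer"] = EntityDef(
    "customer", [FieldDef("id", "INTEGER", 4), FieldDef("name", "VARCHAR", 30)])
repo.rules.append(MappingRule("customer.id", ["CUSTREC.CUST-NO"], "int(CUST-NO)"))
print(repo.unmapped_target_columns())
```

Keeping definitions and rules in one structure is what allows reports such as the one above to be produced from the repository.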

5.6 Data Validation Routine Generation

These programs validate the supplied mapping rules. Not all the rules may be readily available to the customer on day one; they have to evolve over a period of time. The programs help validate the rules by sampling actual data from the source databases and generating error reports. The supplied set of rules is then extended or modified until the error reports shrink to an acceptable level and the actual rules have been determined.
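The idea of checking supplied rules against sampled data can be sketched as below. The rules, field names, and sample records are hypothetical; the generated validation programs themselves are not shown in this document.

```python
from collections import Counter
from datetime import datetime

def is_date(value, fmt="%Y%m%d"):
    """True if `value` parses as a date in the given format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

# Hypothetical rules: field name -> (rule description, predicate)
RULES = {
    "dob": ("must be a YYYYMMDD date", is_date),
    "sex": ("must be M or F", lambda v: v in {"M", "F"}),
}

def error_report(sample):
    """Count rule violations per field across a sample of records."""
    errors = Counter()
    for record in sample:
        for name, (_, check) in RULES.items():
            if not check(record.get(name, "")):
                errors[name] += 1
    return errors

sample = [
    {"dob": "19800131", "sex": "F"},
    {"dob": "19801340", "sex": "M"},   # not a valid calendar date
    {"dob": "19750615", "sex": "U"},   # code missing from the supplied rule
]
report = error_report(sample)
for name, count in report.items():
    print(f"{name}: {count}/{len(sample)} invalid ({RULES[name][0]})")
```

A violation such as the code `U` may indicate either bad data or an incomplete rule; deciding which is the iterative refinement described above.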

5.7 Data Download / Upload Program generation

5.7.1 Data Download

Data download programs are generated by the tool and run on the source platforms. There is one program for every file/record/table in the respective source databases, which dumps the contents of that data store into a flat file with all fields converted to plain ASCII text, removing all platform dependencies (e.g., embedded sign fields, computational fields etc.). The download programs may also generate a control file to preserve the set relationships of the current record with other records, ordering and other information as required, so that no information is lost while pulling the data out of the existing environments.
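To show what removing such platform dependencies involves, the sketch below decodes two common encodings: a zoned-decimal field with an embedded (overpunched) sign in its last character, and a packed-decimal (COMP-3) field. It is not the generated download code. It assumes the overpunch characters appear in their usual ASCII-translated form (`{`, `A` to `I` for positive; `}`, `J` to `R` for negative), and the function names are illustrative.

```python
from decimal import Decimal

# Overpunch characters as they appear after EBCDIC-to-ASCII translation.
POSITIVE = "{ABCDEFGHI"   # +0 .. +9
NEGATIVE = "}JKLMNOPQR"   # -0 .. -9

def decode_overpunch(text, scale=0):
    """Decode a zoned-decimal field whose last character carries the sign."""
    body, last = text[:-1], text[-1]
    if last in POSITIVE:
        sign, digit = 1, POSITIVE.index(last)
    elif last in NEGATIVE:
        sign, digit = -1, NEGATIVE.index(last)
    elif last.isdigit():
        sign, digit = 1, int(last)        # unsigned field
    else:
        raise ValueError(f"bad sign character {last!r}")
    return Decimal(sign * int(body + str(digit))).scaleb(-scale)

def decode_comp3(raw, scale=0):
    """Decode packed decimal: two digits per byte, sign in the last nibble."""
    nibbles = []
    for byte in raw:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign_nibble = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble in packed field")
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:               # 0xD marks a negative value
        value = -value
    return Decimal(value).scaleb(-scale)

print(decode_overpunch("0012C"))
print(decode_overpunch("0012L", scale=2))
print(decode_comp3(b"\x12\x34\x5D", scale=2))
```

The `scale` argument supplies the implied decimal places, which exist only in the record layout and not in the stored bytes.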

5.7.2 Data Upload

Data Upload programs generated or developed during this stage take as input the ASCII data extracted in the previous step. The Data Mapping document, which contains all mapping rules between the source and target databases, supplies the specifications for this task.
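A minimal upload sketch follows. It assumes a pipe-delimited ASCII flat file and uses an in-memory SQLite database as a stand-in for the target database; the table, columns, and delimiter are illustrative assumptions, not the format produced by the MLS tools.

```python
import csv
import io
import sqlite3

# Stand-in for a downloaded flat file; the pipe delimiter is an assumption.
flat_file = io.StringIO(
    "000042|Jane Example|1980-01-31\n"
    "000043|John Sample|1975-06-15\n"
)

conn = sqlite3.connect(":memory:")   # stand-in for the target database
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, full_name TEXT, birth_date TEXT)"
)

rows = [(int(cust_id), name.strip(), dob)
        for cust_id, name, dob in csv.reader(flat_file, delimiter="|")]
with conn:   # commits on success, rolls back if an insert fails
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)

loaded = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(f"{loaded} rows loaded")
```

Loading inside a single transaction means a failed batch leaves the target table unchanged, which simplifies re-running the upload.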

Data Download Testing

The generated Download programs are tested on a test database in the source database environment.

Testing is an iterative process. The test results will confirm the correctness of the download process.

Data Upload Testing

Data Upload programs will be run on the development environment after setting up the target databases there. The test results at this stage will confirm correctness of the entire migration process.

Tested Data Download and Upload programs

The tested data Download and Upload programs are then delivered and installed on respective platforms with appropriate scripts to compile and execute them.

Data Migration – Dry Run

A test run on all Source data units with sample data to ensure success for the live run. This stage will also provide a rough estimate of the time required for the final data migration.
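The rough time estimate can be obtained by scaling the dry-run duration to the full data volume, as sketched below. The figures and the 8-hour shutdown window are hypothetical, and linear scaling is itself an assumption: real run-times may not grow linearly with volume, which is why the later test migration with full data gives the actual estimate.

```python
def estimate_full_run_seconds(sample_rows, sample_seconds, total_rows):
    """Scale the dry-run duration linearly to the full data volume."""
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    return sample_seconds * total_rows / sample_rows

# Hypothetical figures: 50,000 sample rows took 120 s; 20 million rows in total.
estimate = estimate_full_run_seconds(50_000, 120, 20_000_000)
window = 8 * 3600   # assumed maximum shutdown window of 8 hours
print(f"estimated {estimate / 3600:.1f} h, fits window: {estimate <= window}")
```

If the estimate exceeds the permitted window, as it does with these figures, the migration has to be split into phases or run incrementally.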

Data Migration Plan

A plan for the live data migration is produced. The plan takes all logistics and contingencies into account.

Test Data Migration

Test run of data migration with full set of operational data for the identified phase(s). The test run is to be carried out on the staging environment. This will provide an estimate of actual time required for the final data migration.

This stage makes it possible to determine the most suitable phases for the entire migration process and produces the final data migration plan.

Live Data Migration

An approved migration plan is followed to undertake the Live Data Migration. The correctness of the transformation is confirmed by comparing control reports generated for both Source and Target data.
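The content of the control reports is not specified here. One plausible form, sketched below under that caveat, compares a row count, a control total over a numeric column, and an order-independent checksum computed on both sides. The function name and sample rows are illustrative.

```python
import hashlib

def control_report(rows, amount_index):
    """Summarise a data set: row count, column total, order-independent checksum."""
    count, total, digest = 0, 0, 0
    for row in rows:
        count += 1
        total += row[amount_index]
        line = "|".join(str(v) for v in row).encode("utf-8")
        # XOR of per-row hashes does not depend on row order. Pairs of
        # identical rows cancel out, so the row count is compared as well.
        digest ^= int.from_bytes(hashlib.sha256(line).digest()[:8], "big")
    return {"rows": count, "total": total, "checksum": f"{digest:016x}"}

source_rows = [(1, "A", 100), (2, "B", 250), (3, "C", 75)]
target_rows = [(3, "C", 75), (1, "A", 100), (2, "B", 250)]   # same data, different order

src = control_report(source_rows, 2)
tgt = control_report(target_rows, 2)
print("match" if src == tgt else f"mismatch: {src} vs {tgt}")
```

For the comparison to be meaningful, both reports must be computed over the same representation of the data, for example the downloaded ASCII values on the source side and the equivalent formatted values on the target side.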
