• No results found

Implementing evolution: Database migration

N/A
N/A
Protected

Academic year: 2021

Share "Implementing evolution: Database migration"

Copied!
9
0
0

Loading.... (view fulltext now)

Full text

(1)

2IS55 Software Evolution

Implementing

evolution:

Database migration

Alexander Serebrenik

Sources

/ SET / W&I 7-6-2010PAGE 1

Last week

Assignment 8How is it going?

Questions to Marcel: [email protected]

/ SET / W&I 7-6-2010PAGE 2

Recapitulation

Last week: introduction to DB migrationDB migration

Data is important, technology is outdated

S – DB schema

D – DB data

P – data manipulation programs

/ SET / W&I 7-6-2010PAGE 3

Physical

conversion /

Conceptual

conversion

Wrap / Statement

rewriting /

Logical rewriting

Schema conversion: Physical vs Conceptual

/ SET / W&I 7-6-2010PAGE 4

Physical

Conceptual

Schema conversion: Physical

/ SET / W&I 7-6-2010PAGE 5

Easy to automate

Existing work: COBOL ⇒⇒⇒⇒relational, hierarchical ⇒⇒⇒⇒relational, relational ⇒⇒⇒⇒OO

“Migration as translation” vs “migration as improvement”Semantics is ignored

Limitations of COBOL ⇒⇒⇒⇒Design decisions in the legacy system ⇒⇒⇒⇒Automatic conversion ⇒⇒⇒⇒the same design decisions in the new system

(2)

Schema conversion: Physical vs Conceptual

/ SET / W&I 7-6-2010PAGE 6

Conceptual

Refinement: Data and code may contain implicit constraints (field refinement, foreign key, cardinality) on the schema

Schema conceptualization

Conceptualization: Remove implementation details

/ SET / W&I 7-6-2010PAGE 7

Conceptual

Conceptualization

Preparation: “clean up” to understand

e.g., rename attributes, drop one-element compounds

Untranslation: separate logic from limitations of technology

De-optimization: separate logic from performance

Conceptual normalization:Entities vs. relations and attributes

Explicit IS-A relations

/ SET / W&I 7-6-2010PAGE 8

Untranslation: Foreign keys

COBOL allows “direct access” via foreign keysER requires a relationship

set to connect two entitiesWhat would be the

appropriate cardinality?One customer can place

multiple orders

Every order can be placed only by one customer

/ SET / W&I 7-6-2010PAGE 9

De-optimization

Recall:

ORD-DETAIL is a complex multi-valued attributeHighly efficient COBOL trick

ORD-DETAIL cannot exist without an order

How would you model this in ER?Weak entity set

Conceptual normalization

What would you like to improve in this schema?Are the cardinality

constraints meaningful?

Which entities are, in fact, relations?

Are there unneeded structures?

(3)

Conceptual normalization

/ SET / W&I 7-6-2010PAGE 12

Logical design: schema concepts ⇒⇒⇒⇒DB tablesPhysical design: e.g., naming conventions

/ SET / W&I 7-6-2010PAGE 13

Conceptual

Hainaut 2009: Before and After

/ SET / W&I 7-6-2010PAGE 14

Another case study (Ch. 6)

Refined schema: decomposed attributes Address = Street, Number, City, ZIP, StateSchema refinement:

89 foreign keys, 37 computer foreign keys, 60 redundanciesRelational DB2

↑↑entities: decomposition of arrays

/ SET / W&I 7-6-2010PAGE 15

Recall…

So far we have considered DB schemas onlyNext step:

data migration

/ SET / W&I 7-6-2010PAGE 16

Data migration

Strategy depends on the schema migration strategy

Physical conversion: straightforward Data format conversion

Conceptual conversion

Data may violate implicit constraints

Hence, data cleaning is required as preprocessing

Once the data has been cleaned up: akin to physical conversion

(4)

What should be cleaned? 1 source

[Rahm, Do]Schema-level

Can be solved with appropriate integrity constraints

Instance-level

/ SET / W&I 7-6-2010PAGE 18

What should be cleaned? Multiple sources

Which DB tuples refer to the same real-world entity?

/ SET / W&I 7-6-2010PAGE 19

Scheme: name and structure conflicts

Instance: data representation, duplication, identifiers

How to clean up data?

/ SET / W&I 7-6-2010PAGE 20

Analyse:

Define inconsistencies and detect them

Define individual transformations and the workflowVerify correctness and effectiveness

Sample/copy of the dataTransform

Backflow if needed

If the “old” data still will be used, it can benefit from the improvements.

Data cleaning: Analysis

Data profiling

Instance analysis of individualattributes

Min, max, distribution, cardinality, uniqueness, null values

max(age) > 150? count(gender) > 2?

Data mining

Instance analysis of relationsbetween the attributes

E.g., detect association rules

Confidence(A ⇒⇒⇒⇒B) = 99%

1% of the cases might require cleaning

/ SET / W&I 7-6-2010PAGE 21

Data cleaning: Analysis (continued)

Record matching problem:

Smith Kris L., Smith Kristen L., Smith Christian, …Matching based on

Simplest case: unique identifiers (primary keys)

Approximate matching

Different weights for different attributes

Strings:

Edit distance

Keyboard distance

Phonetic similarity

Very expensive for large data sets

Define data transformations

Use transformation languages

Proprietary (e.g., DataTransformationService of Microsoft)

SQL extended with user-defined functions (UDF):

CREATE VIEW Customer2(LName, FName, Street, CID) AS SELECT LastNameExtract(Name),

FirstNameExtract(Name), Street, CID)

FROM Customer

CREATE FUNCTION LastNameExtract(Name VARCHAR(255)) RETURNS VARCHAR(255)

(5)

UDF: advantages and disadvantages

Advantages

Does not require learning a separate language

Disadvantages

Suited only for information already in a DB

What about COBOL files?

Ease of programming depends on availability of specific functions in the chosen SQL dialect

Splitting/merging are supported but have to be reimplemented for every separate field

Folding/unfolding of complex attributes not supported at all.

/ SET / W&I 7-6-2010PAGE 24

Inconsistency resolution

If inconsistency has been detected, the offending instances

Are removed

Are modified so the offending data becomes NULL

Are modified by following user-defined preferences

One table might be more reliable than the other

One attribute may be more reliable than the other

Are modified to reduce the (total) number of modifications required to restore consistency

/ SET / W&I 7-6-2010PAGE 25

Inconsistency resolution

If multiple inconsistencies are detectedWhich one has to be selected to be resolved?

Usually resolutions are not independent

[Wijsen 2006]:

Single database instead of multiple uprepairs

Special kind of updates

Which one is more important?

/ SET / W&I 7-6-2010PAGE 26

From data to programs

So far: schemas and dataNext : programs

Wrapping

Statement rewriting

Program rewriting

/ SET / W&I 7-6-2010PAGE 27

Wrappers

/ SET / W&I 7-6-2010PAGE 28

Legacy code

Legacy data representation

Legacy code

New data representation

Wrapper

?

Wrappers

Replace “standard” OPEN, CLOSE, READ, WRITE with wrapped operations

/ SET / W&I 7-6-2010PAGE 29

Start wrapping action “READ”

Actual implementation of “READ”

(6)

Wrappers

[Thiran, Hainaut]: wrapper code can be reused

/ SET / W&I 7-6-2010PAGE 30

Upper wrapper

Manually written Model

wrapper

Instance wrapper

Automatically generated

Common to all DMS in the family: cursor, transaction

Specific to the given DB: query translation, access optimization Cannot be

expressed in the DB itself

Wrapping: Pro and Contra

Wrapping

Preserves logic of the legacy system

Can be (partially) automatedPhysical + wrapper:

Almost automatic (cheap and fast)

Quality is poor, unless the legacy DB is well-structuredConceptual + wrapper:

More complex/expensive

Quality is reasonable: “First schema, then – code”

Possible performance penalty due to complexity of wrappers

Mismatch: “DB-like” schema and “COBOL like” code

/ SET / W&I 7-6-2010PAGE 31

Wrapping in practice

Wrappers159 wrappers

450 KLOC

/ SET / W&I 7-6-2010PAGE 32

Cursor?..

Control structure for the successive traversal of records in a query result

/ SET / W&I 7-6-2010PAGE 33

Cursor declaration

What will this cursor return? CUS_CODE CODE

J11 12

J12 11

J13 14

K01 15

O_CUST = J12 Why would you like to use such a cursor?

COBOL READ: Sequential reading starting from the first tuple with the given key

Cursor?..

Control structure for the successive traversal of records in a query result

Cursor declarationOpening a

cursorRetrieving

data

Closing cursor

Statement rewriting

Legacy code

Legacy data representation

Legacy code

New data representation

(7)

Statement rewriting

Replace “standard” OPEN, CLOSE, READ, WRITE with explicit SQL operations

/ SET / W&I 7-6-2010PAGE 36

Statement rewriting

Replace “standard” OPEN, CLOSE, READ, WRITE with explicit SQL operations

/ SET / W&I 7-6-2010PAGE 37

O-CUST does not appear in ORDERS

Statement rewriting

Replace “standard” OPEN, CLOSE, READ, WRITE with explicit SQL operations

/ SET / W&I 7-6-2010PAGE 38

Files can have multiple keys and multiple READ commands

We need to remember which key/READ is used!

Statement rewriting

Replace “standard” OPEN, CLOSE, READ, WRITE with explicit SQL operations

/ SET / W&I 7-6-2010PAGE 39

Prepare the cursor for READing

READ the data

Statement rewriting

Replace “standard” OPEN, CLOSE, READ, WRITE with explicit SQL operations

/ SET / W&I 7-6-2010PAGE 40

We need additional cursor and procedure to read the order details:

Legacy DB New DB

Statement rewriting: Pro and Contra

Statement rewriting

Preserves logic of the legacy system

Intertwines legacy code with new access techniques

Detrimental for maintainabilityPhysical + statement

Inexpensive and popular

Blows up the program: from 390 to ~1000 LOC

Worst strategy possibleConceptual + statement

Good quality DB, unreadable code: “First schema, then – code”

Meaningful if the application will be rewritten on the short term

(8)

Alternative 3: Logic Rewriting

Akin to conceptual conversione.g., COBOL loop ⇒⇒⇒⇒SQL join

And meaningful only in combination with it

Otherwise: high effort with poor results

/ SET / W&I 7-6-2010PAGE 42

Legacy DB

New DB

Alternative 3: Logic Rewriting

Manual transformation with automatic supportIdentify file access statements

Identify and understand data and statements that depend on these statements

Rewrite these statements and redefine the objects

/ SET / W&I 7-6-2010PAGE 43

Logic rewriting: Pro and Contra

Logic rewriting + physicalLow quality DB

High costs due to logic rewriting

Unfeasible

Logic rewriting + conceptualHigh quality

Highest costs

/ SET / W&I 7-6-2010PAGE 44

Putting it all together

/ SET / W&I 7-6-2010PAGE 45

Physical, Wrapping Conceptual,

Wrapping

Physical, Statement Conceptual,

Statement

Physical, Logic Conceptual,

Logic

Scheme

CodeAll combinations are possible

Not all are desirable

Conc.

Phys.

Statement

Wrapping Logic

Putting it all together

Zero time Better DB, performance

penalty

Popular, $, Bad Better DB, the application is rewritten later

Very bad, $$ Best but also

$$$

Scheme

CodeAll combinations are possible

Not all are desirable

Conc.

Phys.

Statement

Wrapping Logic

Tools

DB-MAIN CASE tool (University of Namur, ReVeR)DDL extraction

Schema storage, analysis and manipulation

Implicit constraint validation

Schema mapping management

Data analysis & migration

Wrapper generation (COBOL-to-SQL, CODASYL-to-SQL)

Transformations

Eclipse Modelling Framework: ATL

(9)

People ((ex)-FUNDP)

Anthony Cleve

Jean-Luc Hainaut

Philippe Thiran

/ SET / W&I 7-6-2010PAGE 48

Conclusions

3 levels of DB migration: schema, data, code

Schema: physical/conceptualData: determined by schema

Code: wrapper/statement rewriting/logical rewriting

Popular but bad: physical + statementExpensive but good: conceptual + logicAlternatives to consider:

conceptual + wrapping/statement

physical + wrapping (zero time)

References

Related documents

S tudents in UMaine’s Graduate School of Biomedical Sciences (GSBS) collaborate with more than 80 world-class researchers from UMaine and six partners—The Jackson Laboratory;

When a flash memory device leaves factory, it comes with a minimal number of initial bad blocks during production or out-of-factory as there is no currently

Southwest Chicken Salad 7.00 – Spring mix lettuce with crispy chicken strips, diced tomato, roasted corn, black beans and a chipotle avocado ranch dressing. Bistro Roast Beef

Future agrarian feminist theory may contribute to important social change in several ways including: examining the media representation of'farmer', organizational inclusion of

• fixture management scales from simple unit to complex functional testing, allowing to parametrize fixtures and tests according to configuration and component options, or to

Partners from North East of Gothenburg: the Community Center Hammarkullen, the Folk High School in Angered, the Library and Citizens’ Advice Bureau operated by the City District of

support policies for agricultural producers include fixed direct payments, disaster assistance, and environmental payments in the green box; market price supports for dairy and

Through smart use of technology, Government will improve the effectiveness and efficiency of its internal processes, better support front line services across all channels and