• No results found

Agile Development and Schema Evolution. Schema Evolution Agile Development & Databases Case Study on the Herschel Project

N/A
N/A
Protected

Academic year: 2021

Share "Agile Development and Schema Evolution. Schema Evolution Agile Development & Databases Case Study on the Herschel Project"

Copied!
41
0
0

Loading.... (view fulltext now)

Full text

(1)

Agile Development and Schema Evolution

Schema Evolution

Agile Development & Databases

(2)

Schema Evolution

Adapt the logical/physical data model of a database

Reasons

Performance optimisation

(3)

Software Development Cycle

Any Software

Information Systems

develop deploy use feedback develop deploy evolve database use feedback

(4)

Agile Development

Cycle time

 Waterfall: years

 Agile: weeks

Agile properties relevant for DBMS

 Iterative development  Constant stream of changes

 Users use software early on  Users have databases

 Frequent software releases  Frequent evolution of user-DBMS

Easy solution

 Scrap databases after every release  Often not feasible

Real solution

(5)

Schemas in Information Systems

Application schema

 Class definitions & hierarchy

Database schema

 Class definitions & hierarchy (ODBMS)

 Tables, views, … (RDBMS) C OODBMS C C C RDBMS C C T T T O/R mapping T T Application Layer Database Layer

(6)

Object-Relational Mapping (ORM)

Frameworks for mapping OO-model

to relational model

 Tools to create & maintain layers

 Automation limited

Generic ORM layers

 Hibernate, DataNucleus, ..

ORDBMS

 PostgreSQL, Oracle, DB2, …

Require

evolution

of

mapping

and

RDBMS

 By hand: XML, SQL, …

 Programmatically: SQL extensions (type, …).

Apparently these are hardly used.

C RDBMS C C T T T O/R mapping T T

ORM

provider

(7)

ODBMS evolution

Database model == application model

 Only one model to evolve, no ‘views’

 No mappings to evolve

Still, the one model has two representations (1:1 mapping)

 Application code (evolve via IDE)

 ODBMS schema (evolve via API)

C OODBMS C C C RDBMS C C T T T O/R mapping T T

(8)

Evolution Example

Change field type: String to Person

Replace “author” with unique Person instance

1

st

attempt:

 Schema evolution:

 Add class “Person”

 Class Comment:

Add field “Person author;”

  name clash

2

nd

attempt:

 Schema evolution:

 Add class “Person”

 Class Comment: Remove “String author;”

Class Comment: Now add “Person author;”

 How to initialise new attribute?

Old Model New Model Comment String author; Comment Person author; Person String name;

(9)

Database Evolution

Database evolution = Schema evolution ?

Database evolution

=

Schema evolution / Class evolution

+

(10)

Evolution Example – Correct Ordering

Change field type: String to Person

 Replace “author” with unique Person instance

Schema evolution

 Add class “Person”;

 Add field “Person auth2;”

Data evolution

 Look up Person

 If not found, create Person

 Assign Person to “auth2”

Schema evolution

 Remove field “author”

 Rename field “auth2” to “author”

Comment String author; Comment String author; Person auth2; Person String name; Comment String author; Person author; Person String name;

(11)

Evolution Example

Change field type: String to Person

Replace “author” with unique Person instance

Schema evolution

 Add class “Person”;

 Add field “Person auth2;”

Data evolution

Comment c = …;

Person p=find(c.author);

if (p==null) p=new Person(); c.auth2 = p;

 There is no appl.-class with auth2!  We need something like reflection!

Comment String author; Comment String author; Person auth2; Person String name; Comment String author; Person author; Person String name;

(12)

Evolution Example – Evolution API

Change field type: String to Person

Replace “author” with unique Person instance

Schema evolution

 Add class “Person”;

 Add field “Person auth2;”

Data evolution

Handle c = ... String a = c.getField(“author”); Person p=findPerson(a); if (p==null) p=createPerson(a); c.setField(“auth2”, p);

Schema evolution

 Remove field “author”

 Rename field “auth2” to “author”

Comment String author; Comment String author; Person auth2; Person String name; Comment String author; Person author; Person String name;

(13)

Database Evolution Taxonomy

Basic

schema

operations

 Class: Add, remove, rename

 Attribute: Add, remove, rename

Basic

data

operations

 Attribute: Change (initialize) value

 Class: Create/remove instances

Complex Operations (often not directly supported, can be

constructed from basic operations)

 Attribute: Change type (≈ rename + add + init + delete)

 Attribute: Move in class hierarchy (≈ add + init + remove)

 Class: Change super-class

(rename class, add/remove class, move attributes)

 … There are more: split/merge class, …

RDBMS

Class  Table(s) Attr.  Column

Change cell, add/rem row(s) Add/remove row(s)

(14)

db4o: Database Evolution

Idea: Evolution by changing Java code

Ignore changes to API, static or transient fields

Removing

a field

 new objects stored in new format

 old field ignored in objects stored in old format

(updating object will remove it)

Adding

a field

 new objects stored in new format

 additional field set to default value in objects stored in old format

Changing the

type

of a field

 simply stored as a new field

(15)

db4o: Database Evolution

db4o provides refactoring API similar to Java reflection

Renaming a field

 old field is deleted and a new inserted

Renaming a class

Merging fields

Splitting fields

manual using a helper program

Moving fields

CommonConfiguration#objectClass(...).objectField(...).rename(...);

(16)

db4o: Inheritance Evolution

Refactoring of inheritance structure not supported directly

 deleting classes from inheritance hierarchy

 inserting classes into inheritance hierarchy

 swap classes in inheritance hierarchy

Workaround

 copy class

 create one new instance for each old instance

 delete the old class & rename the new one

Author String name; Person String name; Author2 Author String name; Person String name; Author2 Author String name; Person String name; Author data

(17)

Implementing Evolution

How to evolve your database?

Automatic

evolution

 Analyse Java code and evolve

 db4o: add/remove only

 Nice, but very limited …

Scripted

evolution

 Use DBMS’ API to explicitly instruct schema changes

 For example:

db().getClass(“MyClass”).addAttr(name, type);

(18)

Automatic Evolution

Idea

 Compare old and new model

 Extract changes

 Generate evolution code

Often advertised

Conceptually limited

Model version 2 Person int age Car 0:* 0:* Evolve database Database v1 Model version 1 Person int age; Evolution Code while (iter.hasNext()) { //assign cars Person p = iter.next(); p.addCar(new Car()); }

Database v2

(19)

Problem: Type changes

String author; to Person author; Unique? How to initialise?

Problem: Add-remove / rename ambiguity

Problem: No semantic knowledge

float length; [inch] to [meter]

 Data evolution can be required without schema change!

 Still useful, because ‘add’ is the most common change

Automatic Evolution – Problems

29 April 2014 Tilmann Zäschke – [email protected] 19

Person String label;

Person String name; rename attr.

add attr. & remove attr.

(20)

Accidental evolution

Old classes in classpath

Multi-user DB: User with old software version

Partial evolution

 Evolves only classes that are used

 What happens in subsequent evolutions?

(21)

Scripted Evolution

Implement evolution by hand

No ambiguities

Fully support attribute type changes

 String to Person, …

Allows semantic changes

 [inch] to [meter]

 Split name into firstName, lastName

 …

No accidental evolution (e.g. triggered by old class, …)

Requires manual coding and testing

(22)

Evolution Strategies – Single User & Small DB

Many strategies on

how

and

when

to evolve a database

Data evolution: O(n)

 EITHER: Shut down application for prolonged time

 OR: Interference with other applications (Locking, performance, …)

Most databases are:

 Small

 Single-User

 For example address-books or office documents

 Evolve them during load or save

Multi-user databases are more complicated

 How to synchronize multiple users with their own clients?

(23)

Evolution Strategies #1 – Schema Mapping Layer

Schema mapping layers

 Search evolved and unevolved data, evolve on load

 Forward/backward compatibility: Restart one client at a time

 Old and new applications can get along for a while

 Popular with RDBMS (logical independence  ‘views’)

 Difficult with ODBMS because there is usually no mapping

Stop Old App. #1 Start New App

#1 Schema + data mapper New application Application #2 ... Reverse mapper (optional) Old application (optional) Trigger evolution Backup

(24)

Evolution Strategies #1 – Schema Mapping Layer

Pro

 No downtime. But is this always an issue?

 Access for old applications (high availability)

Contra

 Access for old applications

 Development phase: Old/buggy apps remain running (people forget to update)

 Glitches in performance or locking

 No separation of concerns …

 Exceptions in schema code  Posed to User

 Code cluttering / accumulation  Additional maintenance

 Bug hunting can be hard (Which app version? Mapping layer? …)

 Not easily revertible (backup always outdated)

 Mapping layer bugs will surface only during application runtime

(25)

Evolution Strategies #2 – Eager Evolution

Eager evolution

 Stop clients

 Backup

 Install new software

 Evolve database

 Start clients

Pro

 Simple to implement

 Simple to roll back

Contra

 Long downtime Evolve Stop user applications Start user applications [hours?] Backup

(26)

Evolution Strategies #3 – Lazy Evolution

Lazy evolution

 Stop clients

 Backup

 Install new software

Evolve schema and critical data

 Start clients

 Evolve remaining data

Pro

 Short downtime

Contra

 May affect running applications

 More implementation effort

 Not always possible

 Reverting can be harder

Evolve schema + some data

Backup Start user

applications Evolve remaining data [minutes?] Stop user applications

(27)

Evolution Strategies – Which One Is Best?

No clear rules

Downtime allowed?  Consider

Eager

or

Lazy

 Simple: Avoids schema mapping layers

 Best recovery options, separation of concerns, stops old apps, ….

 Eager: Ideal for small DBMS (office docs, …)

 Lazy (vs Eager)

 Pro: Can significantly reduce downtime

 Contra: Affects performance, more work, not always possible

 Case by case…

No downtime allowed during development (?)

 Go for schema mapping layers (maybe even RDBMS…)

(28)

Conclusion

Agile development may result in lots of database evolution

Application Model == DB Model?

 Reduces evolution effort, e.g. use ODBMS for OO-data

Database evolution: Schema + Data

 Data evolution usually more difficult and takes time

Only the simple parts of evolution can be automated

(29)

The Herschel Space Observatory

by the European Space Agency (ESA)

Herschel

 William Herschel

 Infrared observatory

 Largest mirror in space (3.5m)

 Coldest observatory (0.3 K)

 1.5M km from Earth

 Operation 2009-2013

 1 Million Euro per day

 Distributed development

 250 persistent classes

 1TB data during development

 5-15GB/d  10-15TB in total

(30)

The Herschel Common Science System (HCSS)

Observation proposals Proposal evaluation Proposal scheduling

Calibration & instrument programming

Satellite command generation

OODBMS

Astronomer Obs. Time Allocator Mission Planner Cal. Scientist / Inst. Engineer

Further processing Decompression & calibration Astronomer Telemetry ingestion Cal. Scientist / Inst. Engineer OODBMS OODBMS OODBMS

(31)

Agile development

50+ developers (≈10 full-time) in 5+ teams

Close communication with customer

 4-5 peer-customers (e.g. DBAs)

During development:

≈100 databases, up to 100GB, ≈10 use replication

Core Team

HCSS Project Structure

Developer 1 Scientists 1 Scientists 2 Peer 1 Peer 1

(32)

HCSS Schema Evolution

 Cycle:

 Incoming changes

 Implementation & testing

 Beta testing

 Code freeze (system tests, …)

 Total: 36 evolutions  500 changes(??)

Request schema changes

Week 1 Week 4 Week 6 Week 7

User Release Release of New Schema Version User Release Week 3 Week 2 Week 5 Further testing Code Freeze Developers Schema Developer Customers Implementa-tation and testing

(33)

HCSS Database Evolution Approach

ODBMS?

Automatic vs Scripted

(34)

HCSS: Agile & ODBMS?

Agile was a given (and appreciated)

ODBMS was a given, but fitted well with agile development

(ODBMS is quite common in defense and aviation industry)

(35)

HCSS: Automatic vs Scripted Evolution

Evolve automatically when opening DB or rely on evolution

API of DBMS?

Agile development

 Many refactorings

 Many changes too complex for automatic evolution

Prevent accidental evolution: Every DB stores a schema

version ID which is checked against the S/W version ID.

(36)

HCSS: Schema Evolution Strategy

Eager vs Lazy vs Schema Mapping (F/B compat.)?

DB downtime

 Not welcome but accepted  Only during development phase!

Lazy evolution

 When do users need evolved data?

 Mainly interested in newly created data

 Old data could wait a day or two, just needs proper indication

(37)

Lessons Learned:

Why Evolution May Fail Despite Tests

Limited time for testing

Test DBs ≠ real DBs

 Set-up (grid configuration, hardware, …)

 Scalability

Data varies

a lot between users

Erroneous data

 Software is still in development!

 Missing refs, inconsistent values, “P. Schmid” vs “Schmid, P.”

 Sometimes due to user-written applications or use of nightly builds

 Varies a lot and can be rare (rare is bad!)

(38)

Lessons Learned: Database Evolution Code

Evolution code is critical

 Evolution speed (8 hours vs 24…)

 Error during evolution  restore backup: x hours

 Error when starting applications afterwards  restore: x hours

 Database inconsistent/deleted, noticed only much later…  WCS

Evolution code is developed ad-hoc

 Plenty of reasons for problems (bad data: S/W still in development!)

 Limited testing

 Non-iterative!

Criticality Maturity

(39)

HCSS: Agile Database Evolution Wrap-Up

Agile development

 Lots of evolution, but manageable with ODBMS

 DB evolution dev. effort: avg. 3days/month, with 5-20 changes each

Agile was throughout appreciated

 Responsibility (Req. engineering, architecture, design, impl., …)

 Motivation (Personal relations to project and customer)

 Stable and known software at v1.0

 S/W was (mostly) finished on time

ODBMS  DBA == developer

 No communication bottleneck, can save a lot of time/headaches

(40)

HCSS: Conclusion

36 evolutions in ~6 years

No evolution since v1.0 (now: v10.0)

 Core (uplink) was hardly changed

Agile & ODBMS work well together

The whole thing worked fine

 Operation of satellite ceased 1 year ago (April 29th 2013)

(41)

References

http://herschel.esac.esa.int/latest_news.shtml

http://www.esa.int/Our_Activities/Space_Science/Herschel/

Operating_Herschel

T. Zäschke, M. Norrie; “Revisiting Schema Evolution In

Object Databases”; ICOODB 2010

References

Related documents