Agile Development and Schema Evolution. Schema Evolution Agile Development & Databases Case Study on the Herschel Project

(1)

Agile Development and Schema Evolution

• Schema Evolution

• Agile Development & Databases

(2)

Schema Evolution

Adapt the logical/physical data model of a database

Reasons



Performance optimisation

(3)

Software Development Cycle

Any software: develop → deploy → use → feedback

Information systems: develop → deploy → evolve database → use → feedback

(4)

Agile Development



Cycle time
 • Waterfall: years
 • Agile: weeks

Agile properties relevant for DBMS
 • Iterative development → Constant stream of changes
 • Users use software early on → Users have databases
 • Frequent software releases → Frequent evolution of user-DBMS

Easy solution
 • Scrap databases after every release → Often not feasible



Real solution

(5)

Schemas in Information Systems



Application schema
 • Class definitions & hierarchy

Database schema
 • Class definitions & hierarchy (ODBMS)
 • Tables, views, … (RDBMS)

[Figure: application layer with classes (C) above the database layer. An OODBMS stores the classes directly; an RDBMS stores tables (T) behind an O/R mapping.]

(6)

Object-Relational Mapping (ORM)



Frameworks for mapping OO-model to relational model
 • Tools to create & maintain layers
 • Automation limited

Generic ORM layers
 • Hibernate, DataNucleus, …

ORDBMS
 • PostgreSQL, Oracle, DB2, …

Require evolution of mapping and RDBMS
 • By hand: XML, SQL, …
 • Programmatically: SQL extensions (type, …). Apparently these are hardly used.

[Figure: classes (C) mapped to RDBMS tables (T) through an O/R mapping supplied by the ORM provider.]

(7)

ODBMS evolution



Database model == application model
 • Only one model to evolve, no ‘views’
 • No mappings to evolve

Still, the one model has two representations (1:1 mapping)
 • Application code (evolve via IDE)
 • ODBMS schema (evolve via API)

[Figure: an OODBMS stores classes (C) directly; an RDBMS stores tables (T) behind an O/R mapping.]

(8)

Evolution Example



Change field type: String to Person
 • Replace “author” with unique Person instance

1st attempt:
 • Schema evolution:
   • Add class “Person”
   • Class Comment: Add field “Person author;”
 → name clash

2nd attempt:
 • Schema evolution:
   • Add class “Person”
   • Class Comment: Remove “String author;”
   • Class Comment: Now add “Person author;”
 → How to initialise new attribute?

[Figure: old model: Comment { String author; }. New model: Comment { Person author; } and Person { String name; }.]

(9)

Database Evolution

Database evolution = Schema evolution?

Database evolution = Schema evolution / Class evolution + Data evolution

(10)

Evolution Example – Correct Ordering



Change field type: String to Person
 • Replace “author” with unique Person instance

Schema evolution
 • Add class “Person”;
 • Add field “Person auth2;”

Data evolution
 • Look up Person
 • If not found, create Person
 • Assign Person to “auth2”

Schema evolution
 • Remove field “author”
 • Rename field “auth2” to “author”

[Figure: the model in three stages: (1) Comment { String author; }; (2) Comment { String author; Person auth2; } and Person { String name; }; (3) Comment { Person author; } and Person { String name; }.]

(11)

Evolution Example



Change field type: String to Person
 • Replace “author” with unique Person instance

Schema evolution

Data evolution

    Comment c = …;
    Person p = find(c.author);
    if (p == null) p = new Person();
    c.auth2 = p;

 → There is no application class with auth2!
 → We need something like reflection!

(12)

Evolution Example – Evolution API



Change field type: String to Person
 • Replace “author” with unique Person instance

Schema evolution

Data evolution

    Handle c = ...
    String a = c.getField("author");
    Person p = findPerson(a);
    if (p == null) p = createPerson(a);
    c.setField("auth2", p);

Schema evolution
 • Remove field “author”
 • Rename field “auth2” to “author”
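The three phases above (schema, data, schema) can be sketched in plain Java. This is not the API of any particular ODBMS: `MiniDb`, its `Handle` class and the methods `addField`, `removeField`, `renameField` and `findOrCreatePerson` are hypothetical stand-ins for a reflection-like evolution API, and the objects are simply kept as maps in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an ODBMS with a generic evolution API.
class MiniDb {
    // A stored object, accessed by field name instead of through a compiled class.
    static class Handle {
        final Map<String, Object> fields = new HashMap<>();
        Object getField(String name) { return fields.get(name); }
        void setField(String name, Object value) { fields.put(name, value); }
    }

    final List<Handle> comments = new ArrayList<>();
    final Map<String, Handle> personsByName = new HashMap<>();

    Handle addComment(String author) {
        Handle c = new Handle();
        c.setField("author", author);
        comments.add(c);
        return c;
    }

    // Basic schema operations on class Comment (hypothetical names).
    void addField(String name) {
        for (Handle c : comments) {
            if (!c.fields.containsKey(name)) c.fields.put(name, null);
        }
    }

    void removeField(String name) {
        for (Handle c : comments) c.fields.remove(name);
    }

    void renameField(String oldName, String newName) {
        for (Handle c : comments) c.setField(newName, c.fields.remove(oldName));
    }

    // Look up the unique Person for a name, creating it if it is missing.
    Handle findOrCreatePerson(String name) {
        return personsByName.computeIfAbsent(name, n -> {
            Handle p = new Handle();
            p.setField("name", n);
            return p;
        });
    }

    // The complete evolution: schema, then data, then schema again.
    void evolveAuthorToPerson() {
        addField("auth2");                              // schema evolution
        for (Handle c : comments) {                     // data evolution
            String a = (String) c.getField("author");
            c.setField("auth2", findOrCreatePerson(a));
        }
        removeField("author");                          // schema evolution
        renameField("auth2", "author");
    }
}
```

Because the data step works on field names rather than on a compiled `Comment` class, it never needs an application class that contains both `author` and `auth2`.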

(13)

Database Evolution Taxonomy



Basic schema operations
 • Class: Add, remove, rename
 • Attribute: Add, remove, rename

Basic data operations
 • Attribute: Change (initialize) value
 • Class: Create/remove instances

Complex operations (often not directly supported, can be constructed from basic operations)
 • Attribute: Change type (≈ rename + add + init + delete)
 • Attribute: Move in class hierarchy (≈ add + init + remove)
 • Class: Change super-class (rename class, add/remove class, move attributes)
 • … There are more: split/merge class, …

RDBMS equivalents
 • Class → Table(s), Attribute → Column
 • Change attribute value → Change cell, add/remove row(s)
 • Create/remove instances → Add/remove row(s)

(14)

db4o: Database Evolution



Idea: Evolution by changing Java code



Ignore changes to API, static or transient fields



Removing a field
 • new objects stored in new format
 • old field ignored in objects stored in old format (updating object will remove it)

Adding a field
 • new objects stored in new format
 • additional field set to default value in objects stored in old format

Changing the type of a field
 • simply stored as a new field

(15)

db4o: Database Evolution



db4o provides refactoring API similar to Java reflection



Renaming a field

 • old field is deleted and a new one inserted



Renaming a class



Merging fields



Splitting fields

manual using a helper program



Moving fields

CommonConfiguration#objectClass(...).objectField(...).rename(...);

(16)

db4o: Inheritance Evolution



Refactoring of inheritance structure not supported directly
 • deleting classes from inheritance hierarchy
 • inserting classes into inheritance hierarchy
 • swapping classes in inheritance hierarchy

Workaround
 • copy class
 • create one new instance for each old instance
 • delete the old class & rename the new one

[Figure: three stages of the workaround with the classes Person (String name;), Author (String name;) and the copy Author2; the data is moved to the new class, which is then renamed to Author.]

(17)

Implementing Evolution



How to evolve your database?



Automatic evolution
 • Analyse Java code and evolve
 • db4o: add/remove only
 • Nice, but very limited …

Scripted evolution
 • Use DBMS’ API to explicitly instruct schema changes
 • For example:

    db().getClass("MyClass").addAttr(name, type);

(18)

Automatic Evolution



Idea
 • Compare old and new model
 • Extract changes
 • Generate evolution code

Often advertised

Conceptually limited

[Figure: Database v1 (model version 1: Person { int age; }) is evolved into Database v2 (model version 2: Person { int age } with a 0:* to 0:* association to Car) by the following evolution code.]

    while (iter.hasNext()) {
        // assign cars
        Person p = iter.next();
        p.addCar(new Car());
    }

(19)



Automatic Evolution – Problems

Problem: Type changes
 • String author; to Person author; Unique? How to initialise?

Problem: Add-remove / rename ambiguity
 • Person { String label; } to Person { String name; }: rename attribute, or add attribute & remove attribute?

Problem: No semantic knowledge
 • float length; [inch] to [meter]
 • Data evolution can be required without schema change!

→ Still useful, because ‘add’ is the most common change

29 April 2014, Tilmann Zäschke
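A minimal sketch of automatic change extraction makes these limits concrete. It assumes a class is described only by a map from field name to type name; the class `ModelDiff` and its output strings are made up for this illustration and are not taken from db4o or any other product.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of automatic change extraction: compares two versions of one class,
// each given as a map from field name to type name.
class ModelDiff {
    static List<String> diff(Map<String, String> oldModel, Map<String, String> newModel) {
        List<String> changes = new ArrayList<>();
        // TreeMap gives a stable, alphabetical order of the reported changes.
        for (Map.Entry<String, String> e : new TreeMap<>(oldModel).entrySet()) {
            String newType = newModel.get(e.getKey());
            if (newType == null) {
                changes.add("remove " + e.getKey());
            } else if (!newType.equals(e.getValue())) {
                changes.add("change type of " + e.getKey());
            }
        }
        for (String name : new TreeMap<>(newModel).keySet()) {
            if (!oldModel.containsKey(name)) {
                changes.add("add " + name);
            }
        }
        return changes;
    }
}
```

Comparing `{label: String}` with `{name: String}` reports a removal and an addition, so the diff cannot tell whether a rename was intended. Comparing `{length: float}` with `{length: float}` reports nothing, even if the unit changed from inch to meter. A type change is detected, but the diff says nothing about how to initialise the new values.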

(20)

Accidental evolution



Old classes in classpath



Multi-user DB: User with old software version



Partial evolution
 • Evolves only classes that are used
 • What happens in subsequent evolutions?

(21)

Scripted Evolution

Implement evolution by hand



No ambiguities



Fully support attribute type changes
 • String to Person, …

Allows semantic changes
 • [inch] to [meter]
 • Split name into firstName, lastName
 • …



No accidental evolution (e.g. triggered by old class, …)



Requires manual coding and testing

(22)

Evolution Strategies – Single User & Small DB



Many strategies on how and when to evolve a database

Data evolution: O(n)
 • EITHER: Shut down application for prolonged time
 • OR: Interference with other applications (Locking, performance, …)

Most databases are:
 • Small
 • Single-User
 • For example address-books or office documents
 → Evolve them during load or save

Multi-user databases are more complicated
 • How to synchronize multiple users with their own clients?
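For the small single-user case, evolving during load could look like the sketch below. It assumes that each document stores its own schema version and that one evolution step exists per version; `DocumentLoader`, the field names and the two example steps are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a single-user document that carries its schema version and is
// upgraded step by step when it is loaded. All names are hypothetical.
class DocumentLoader {
    static final int CURRENT_VERSION = 3;

    static Map<String, Object> load(Map<String, Object> stored) {
        Map<String, Object> doc = new HashMap<>(stored);
        int version = (Integer) doc.getOrDefault("version", 1);
        if (version > CURRENT_VERSION) {
            throw new IllegalStateException("Document is newer than this software");
        }
        // Apply each evolution step in order until the document is current.
        while (version < CURRENT_VERSION) {
            switch (version) {
                case 1: // v1 -> v2: split "name" into "firstName" and "lastName"
                    String[] parts = ((String) doc.remove("name")).split(" ", 2);
                    doc.put("firstName", parts[0]);
                    doc.put("lastName", parts.length > 1 ? parts[1] : "");
                    break;
                case 2: // v2 -> v3: "length" changes its unit from inch to meter
                    doc.put("length", ((Double) doc.get("length")) * 0.0254);
                    break;
                default:
                    throw new IllegalStateException("No evolution step for version " + version);
            }
            version++;
            doc.put("version", version);
        }
        return doc;
    }
}
```

The stored copy is left untouched, so the old file acts as a backup until the evolved document is saved.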

(23)

Evolution Strategies #1 – Schema Mapping Layer



Schema mapping layers
 • Search evolved and unevolved data, evolve on load
 • Forward/backward compatibility: Restart one client at a time
 • Old and new applications can get along for a while
 • Popular with RDBMS (logical independence → ‘views’)
 • Difficult with ODBMS because there is usually no mapping

[Figure: timeline: backup, stop old application #1, start new application #1, then application #2, ... The new application uses a schema + data mapper that triggers evolution; an optional reverse mapper serves the optional old application.]

(24)

Evolution Strategies #1 – Schema Mapping Layer



Pro
 • No downtime. But is this always an issue?
 • Access for old applications (high availability)

Contra
 • Access for old applications
   • Development phase: Old/buggy apps remain running (people forget to update)
 • Glitches in performance or locking
 • No separation of concerns …
   • Exceptions in schema code → Shown to the user
   • Code cluttering / accumulation → Additional maintenance
   • Bug hunting can be hard (Which app version? Mapping layer? …)
 • Not easily revertible (backup always outdated)
 • Mapping layer bugs will surface only during application runtime

(25)

Evolution Strategies #2 – Eager Evolution



Eager evolution
 • Stop clients
 • Backup
 • Install new software
 • Evolve database
 • Start clients

Pro
 • Simple to implement
 • Simple to roll back

Contra
 • Long downtime

[Figure: timeline: stop user applications, backup, evolve [hours?], start user applications.]
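The backup, evolve and roll back steps can be sketched on an in-memory list that stands in for the database. `EagerEvolution` is a hypothetical helper, and the sketch assumes immutable records (the backup is a shallow copy) and that no client is running while it executes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch of eager evolution on an in-memory "database" (a list of records).
class EagerEvolution {
    // Evolves every record while no client is running. If any record fails,
    // the backup taken beforehand is restored and the error is rethrown.
    static <T> void evolve(List<T> database, UnaryOperator<T> evolveRecord) {
        List<T> backup = new ArrayList<>(database);     // backup
        try {
            database.replaceAll(evolveRecord);          // evolve database, O(n)
        } catch (RuntimeException e) {
            database.clear();                           // roll back
            database.addAll(backup);
            throw e;
        }
    }
}
```

Since nothing else touches the data during the run, restoring the backup is all that a rollback needs, which is why this strategy is simple to revert.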

(26)

Evolution Strategies #3 – Lazy Evolution



Lazy evolution
 • Stop clients
 • Backup
 • Install new software
 • Evolve schema and critical data
 • Start clients
 • Evolve remaining data

Pro
 • Short downtime

Contra
 • May affect running applications
 • More implementation effort
 • Not always possible
 • Reverting can be harder

[Figure: timeline: stop user applications, backup, evolve schema + some data [minutes?], start user applications, evolve remaining data.]
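One way to picture the "evolve remaining data" step is a store that remembers which records are still in the old format and evolves them on first access or in a later sweep. The class `LazyStore` and its methods are hypothetical, and records are plain strings to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Sketch of lazy evolution: unevolved records are tracked and evolved either
// on first access or by a sweep that runs after the clients were restarted.
class LazyStore {
    private final Map<String, String> records;
    private final Set<String> pending;   // keys of records still in the old format
    private final UnaryOperator<String> evolveRecord;

    LazyStore(Map<String, String> oldData, UnaryOperator<String> evolveRecord) {
        this.records = new HashMap<>(oldData);
        this.pending = new HashSet<>(oldData.keySet());
        this.evolveRecord = evolveRecord;
    }

    // Clients always see evolved data; the cost is paid on first access.
    synchronized String get(String key) {
        evolveIfPending(key);
        return records.get(key);
    }

    // Sweep over everything that has not been touched yet.
    synchronized void evolveRemaining() {
        for (String key : new ArrayList<>(pending)) {
            evolveIfPending(key);
        }
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    private void evolveIfPending(String key) {
        if (pending.remove(key)) {
            records.put(key, evolveRecord.apply(records.get(key)));
        }
    }
}
```

The `synchronized` methods hint at the cost listed under Contra: the sweep and the running clients compete for the same data.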

(27)

Evolution Strategies – Which One Is Best?



No clear rules

Downtime allowed? → Consider Eager or Lazy
 • Simple: Avoids schema mapping layers
 • Best recovery options, separation of concerns, stops old apps, …
 • Eager: Ideal for small DBMS (office docs, …)
 • Lazy (vs Eager)
   • Pro: Can significantly reduce downtime
   • Contra: Affects performance, more work, not always possible
 • Case by case…

No downtime allowed during development (?)
 • Go for schema mapping layers (maybe even RDBMS…)

(28)

Conclusion



Agile development may result in lots of database evolution



Application Model == DB Model?
 • Reduces evolution effort, e.g. use ODBMS for OO-data

Database evolution: Schema + Data
 • Data evolution usually more difficult and takes time



Only the simple parts of evolution can be automated

(29)

The Herschel Space Observatory

by the European Space Agency (ESA)

Herschel
 • William Herschel
 • Infrared observatory
 • Largest mirror in space (3.5m)
 • Coldest observatory (0.3 K)
 • 1.5M km from Earth
 • Operation 2009-2013
 • 1 Million Euro per day
 • Distributed development
 • 250 persistent classes
 • 1TB data during development
 • 5-15GB/d → 10-15TB in total

(30)

The Herschel Common Science System (HCSS)

[Figure: HCSS processing chain built around several OODBMS instances. Steps: observation proposals, proposal evaluation, proposal scheduling, calibration & instrument programming, satellite command generation, telemetry ingestion, decompression & calibration, further processing. Roles: Astronomer, Obs. Time Allocator, Mission Planner, Cal. Scientist / Inst. Engineer.]

(31)



HCSS Project Structure

Agile development

50+ developers (≈10 full-time) in 5+ teams

Close communication with customer
 • 4-5 peer-customers (e.g. DBAs)

During development: ≈100 databases, up to 100GB, ≈10 use replication

[Figure: project structure with the core team, developers, scientists and peers.]

(32)

HCSS Schema Evolution
 • Cycle:
   • Incoming changes
   • Implementation & testing
   • Beta testing
   • Code freeze (system tests, …)
 • Total: 36 evolutions → 500 changes(??)

[Figure: release cycle over weeks 1 to 7 involving developers, the schema developer and customers: request schema changes, implementation and testing, release of new schema version, further testing, code freeze, user release.]

(33)

HCSS Database Evolution Approach



ODBMS?



Automatic vs Scripted

(34)

HCSS: Agile & ODBMS?



Agile was a given (and appreciated)



ODBMS was a given, but fitted well with agile development

(ODBMS is quite common in defense and aviation industry)

(35)

HCSS: Automatic vs Scripted Evolution



Evolve automatically when opening DB or rely on evolution API of DBMS?

Agile development
 • Many refactorings
 • Many changes too complex for automatic evolution

Prevent accidental evolution: Every DB stores a schema version ID which is checked against the S/W version ID.
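Such a version check might look like the following sketch. The slides only state that the two IDs are compared; the class `SchemaVersionGuard`, the integer version numbers and the three outcomes are assumptions made for illustration.

```java
// Sketch of a guard against accidental evolution; all names are hypothetical.
class SchemaVersionGuard {
    enum Action { OPEN, EVOLVE_FIRST, REJECT }

    // Compares the schema version stored in the database with the version
    // the software was built for.
    static Action check(int databaseSchemaVersion, int softwareSchemaVersion) {
        if (databaseSchemaVersion == softwareSchemaVersion) {
            return Action.OPEN;            // versions match, safe to use
        }
        if (databaseSchemaVersion < softwareSchemaVersion) {
            return Action.EVOLVE_FIRST;    // run the evolution script explicitly
        }
        return Action.REJECT;              // old software must not touch a newer DB
    }
}
```

A client would call `check` before opening the database and refuse to continue on anything other than `OPEN`, so old classes on the classpath can no longer trigger an evolution by accident.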

(36)

HCSS: Schema Evolution Strategy



Eager vs Lazy vs Schema Mapping (F/B compat.)?



DB downtime
 • Not welcome but accepted → Only during development phase!

Lazy evolution
 • When do users need evolved data?
 • Mainly interested in newly created data
 • Old data could wait a day or two, just needs proper indication

(37)

Lessons Learned: Why Evolution May Fail Despite Tests

Limited time for testing

Test DBs ≠ real DBs
 • Set-up (grid configuration, hardware, …)
 • Scalability

Data varies a lot between users

Erroneous data
 • Software is still in development!
 • Missing refs, inconsistent values, “P. Schmid” vs “Schmid, P.”
 • Sometimes due to user-written applications or use of nightly builds
 • Varies a lot and can be rare (rare is bad!)

(38)

Lessons Learned: Database Evolution Code



Evolution code is critical
 • Evolution speed (8 hours vs 24…)
 • Error during evolution → restore backup: x hours
 • Error when starting applications afterwards → restore: x hours
 • Database inconsistent/deleted, noticed only much later… → worst-case scenario (WCS)

Evolution code is developed ad-hoc
 • Plenty of reasons for problems (bad data: S/W still in development!)
 • Limited testing
 • Non-iterative!

[Figure: criticality vs. maturity]

(39)

HCSS: Agile Database Evolution Wrap-Up



Agile development
 • Lots of evolution, but manageable with ODBMS
 • DB evolution dev. effort: avg. 3 days/month, with 5-20 changes each

Agile was appreciated throughout
 • Responsibility (Req. engineering, architecture, design, impl., …)
 • Motivation (Personal relations to project and customer)
 • Stable and known software at v1.0
 • S/W was (mostly) finished on time

ODBMS → DBA == developer
 • No communication bottleneck, can save a lot of time/headaches

(40)

HCSS: Conclusion



36 evolutions in ~6 years



No evolution since v1.0 (now: v10.0)
 • Core (uplink) was hardly changed

Agile & ODBMS work well together

The whole thing worked fine
 • Operation of satellite ceased 1 year ago (April 29th, 2013)

(41)

References



http://herschel.esac.esa.int/latest_news.shtml



http://www.esa.int/Our_Activities/Space_Science/Herschel/Operating_Herschel

T. Zäschke, M. Norrie; “Revisiting Schema Evolution In Object Databases”; ICOODB 2010