Agile Development and Schema Evolution
•
Schema Evolution
•
Agile Development & Databases
Schema Evolution
Adapt the logical/physical data model of a database
Reasons
Performance optimisation
Software Development Cycle
Any Software
Information Systems
develop deploy use feedback develop deploy evolve database use feedback
Agile Development
Cycle time
Waterfall: years
Agile: weeks
Agile properties relevant for DBMS
Iterative development Constant stream of changes
Users use software early on Users have databases
Frequent software releases Frequent evolution of user-DBMS
Easy solution
Scrap databases after every release Often not feasible
Real solution
Schemas in Information Systems
Application schema
Class definitions & hierarchy
Database schema
Class definitions & hierarchy (ODBMS)
Tables, views, … (RDBMS) C OODBMS C C C RDBMS C C T T T O/R mapping T T Application Layer Database Layer
Object-Relational Mapping (ORM)
Frameworks for mapping OO-model
to relational model
Tools to create & maintain layers
Automation limited
Generic ORM layers
Hibernate, DataNucleus, ..
ORDBMS
PostgreSQL, Oracle, DB2, …
Require
evolution
of
mapping
and
RDBMS
By hand: XML, SQL, …
Programmatically: SQL extensions (type, …).
Apparently these are hardly used.
C RDBMS C C T T T O/R mapping T T
ORM
providerODBMS evolution
Database model == application model
Only one model to evolve, no ‘views’
No mappings to evolve
Still, the one model has two representations (1:1 mapping)
Application code (evolve via IDE)
ODBMS schema (evolve via API)
C OODBMS C C C RDBMS C C T T T O/R mapping T T
Evolution Example
Change field type: String to Person
Replace “author” with unique Person instance
1
stattempt:
Schema evolution:
Add class “Person”
Class Comment:
Add field “Person author;”
name clash
2
ndattempt:
Schema evolution:
Add class “Person”
Class Comment: Remove “String author;”
Class Comment: Now add “Person author;”
How to initialise new attribute?
Old Model New Model Comment String author; Comment Person author; Person String name;
Database Evolution
Database evolution = Schema evolution ?
Database evolution
=
Schema evolution / Class evolution
+
Evolution Example – Correct Ordering
Change field type: String to Person
Replace “author” with unique Person instance
Schema evolution
Add class “Person”;
Add field “Person auth2;”
Data evolution
Look up Person
If not found, create Person
Assign Person to “auth2”
Schema evolution
Remove field “author”
Rename field “auth2” to “author”
Comment String author; Comment String author; Person auth2; Person String name; Comment String author; Person author; Person String name;
Evolution Example
Change field type: String to Person
Replace “author” with unique Person instance
Schema evolution
Add class “Person”;
Add field “Person auth2;”
Data evolution
Comment c = …;
Person p=find(c.author);
if (p==null) p=new Person(); c.auth2 = p;
There is no appl.-class with auth2! We need something like reflection!
Comment String author; Comment String author; Person auth2; Person String name; Comment String author; Person author; Person String name;
Evolution Example – Evolution API
Change field type: String to Person
Replace “author” with unique Person instance
Schema evolution
Add class “Person”;
Add field “Person auth2;”
Data evolution
Handle c = ... String a = c.getField(“author”); Person p=findPerson(a); if (p==null) p=createPerson(a); c.setField(“auth2”, p);
Schema evolution
Remove field “author”
Rename field “auth2” to “author”
Comment String author; Comment String author; Person auth2; Person String name; Comment String author; Person author; Person String name;
Database Evolution Taxonomy
Basic
schema
operations
Class: Add, remove, rename
Attribute: Add, remove, rename
Basic
data
operations
Attribute: Change (initialize) value
Class: Create/remove instances
Complex Operations (often not directly supported, can be
constructed from basic operations)
Attribute: Change type (≈ rename + add + init + delete)
Attribute: Move in class hierarchy (≈ add + init + remove)
Class: Change super-class
(rename class, add/remove class, move attributes)
… There are more: split/merge class, …
RDBMS
Class Table(s) Attr. Column
Change cell, add/rem row(s) Add/remove row(s)
db4o: Database Evolution
Idea: Evolution by changing Java code
Ignore changes to API, static or transient fields
Removing
a field
new objects stored in new format
old field ignored in objects stored in old format
(updating object will remove it)
Adding
a field
new objects stored in new format
additional field set to default value in objects stored in old format
Changing the
type
of a field
simply stored as a new field
db4o: Database Evolution
db4o provides refactoring API similar to Java reflection
Renaming a field
old field is deleted and a new inserted
Renaming a class
Merging fields
Splitting fields
manual using a helper program
Moving fields
CommonConfiguration#objectClass(...).objectField(...).rename(...);
db4o: Inheritance Evolution
Refactoring of inheritance structure not supported directly
deleting classes from inheritance hierarchy
inserting classes into inheritance hierarchy
swap classes in inheritance hierarchy
Workaround
copy class
create one new instance for each old instance
delete the old class & rename the new one
Author String name; Person String name; Author2 Author String name; Person String name; Author2 Author String name; Person String name; Author data
Implementing Evolution
How to evolve your database?
Automatic
evolution
Analyse Java code and evolve
db4o: add/remove only
Nice, but very limited …
Scripted
evolution
Use DBMS’ API to explicitly instruct schema changes
For example:
db().getClass(“MyClass”).addAttr(name, type);
Automatic Evolution
Idea
Compare old and new model
Extract changes
Generate evolution code
Often advertised
Conceptually limited
Model version 2 Person int age Car 0:* 0:* Evolve database Database v1 Model version 1 Person int age; Evolution Code while (iter.hasNext()) { //assign cars Person p = iter.next(); p.addCar(new Car()); }Database v2
Problem: Type changes
String author; to Person author; Unique? How to initialise?
Problem: Add-remove / rename ambiguity
Problem: No semantic knowledge
float length; [inch] to [meter]
Data evolution can be required without schema change!
Still useful, because ‘add’ is the most common change
Automatic Evolution – Problems
29 April 2014 Tilmann Zäschke – [email protected] 19
Person String label;
Person String name; rename attr.
add attr. & remove attr.
Accidental evolution
Old classes in classpath
Multi-user DB: User with old software version
Partial evolution
Evolves only classes that are used
What happens in subsequent evolutions?
Scripted Evolution
Implement evolution by hand
No ambiguities
Fully support attribute type changes
String to Person, …
Allows semantic changes
[inch] to [meter]
Split name into firstName, lastName
…
No accidental evolution (e.g. triggered by old class, …)
Requires manual coding and testing
Evolution Strategies – Single User & Small DB
Many strategies on
how
and
when
to evolve a database
Data evolution: O(n)
EITHER: Shut down application for prolonged time
OR: Interference with other applications (Locking, performance, …)
Most databases are:
Small
Single-User
For example address-books or office documents
Evolve them during load or save
Multi-user databases are more complicated
How to synchronize multiple users with their own clients?
Evolution Strategies #1 – Schema Mapping Layer
Schema mapping layers
Search evolved and unevolved data, evolve on load
Forward/backward compatibility: Restart one client at a time
Old and new applications can get along for a while
Popular with RDBMS (logical independence ‘views’)
Difficult with ODBMS because there is usually no mapping
Stop Old App. #1 Start New App
#1 Schema + data mapper New application Application #2 ... Reverse mapper (optional) Old application (optional) Trigger evolution Backup
Evolution Strategies #1 – Schema Mapping Layer
Pro
No downtime. But is this always an issue?
Access for old applications (high availability)
Contra
Access for old applications
Development phase: Old/buggy apps remain running (people forget to update)
Glitches in performance or locking
No separation of concerns …
Exceptions in schema code Posed to User
Code cluttering / accumulation Additional maintenance
Bug hunting can be hard (Which app version? Mapping layer? …)
Not easily revertible (backup always outdated)
Mapping layer bugs will surface only during application runtime
Evolution Strategies #2 – Eager Evolution
Eager evolution
Stop clients
Backup
Install new software
Evolve database
Start clients
Pro
Simple to implement
Simple to roll back
Contra
Long downtime Evolve Stop user applications Start user applications [hours?] BackupEvolution Strategies #3 – Lazy Evolution
Lazy evolution
Stop clients
Backup
Install new software
Evolve schema and critical data
Start clients
Evolve remaining data
Pro
Short downtime
Contra
May affect running applications
More implementation effort
Not always possible
Reverting can be harder
Evolve schema + some data
Backup Start user
applications Evolve remaining data [minutes?] Stop user applications
Evolution Strategies – Which One Is Best?
No clear rules
Downtime allowed? Consider
Eager
or
Lazy
Simple: Avoids schema mapping layers
Best recovery options, separation of concerns, stops old apps, ….
Eager: Ideal for small DBMS (office docs, …)
Lazy (vs Eager)
Pro: Can significantly reduce downtime
Contra: Affects performance, more work, not always possible
Case by case…
No downtime allowed during development (?)
Go for schema mapping layers (maybe even RDBMS…)
Conclusion
Agile development may result in lots of database evolution
Application Model == DB Model?
Reduces evolution effort, e.g. use ODBMS for OO-data
Database evolution: Schema + Data
Data evolution usually more difficult and takes time
Only the simple parts of evolution can be automated
The Herschel Space Observatory
by the European Space Agency (ESA)
Herschel
William Herschel
Infrared observatory
Largest mirror in space (3.5m)
Coldest observatory (0.3 K)
1.5M km from Earth
Operation 2009-2013
1 Million Euro per day
Distributed development
250 persistent classes
1TB data during development
5-15GB/d 10-15TB in total
The Herschel Common Science System (HCSS)
Observation proposals Proposal evaluation Proposal schedulingCalibration & instrument programming
Satellite command generation
OODBMS
Astronomer Obs. Time Allocator Mission Planner Cal. Scientist / Inst. Engineer
Further processing Decompression & calibration Astronomer Telemetry ingestion Cal. Scientist / Inst. Engineer OODBMS OODBMS OODBMS
Agile development
50+ developers (≈10 full-time) in 5+ teams
Close communication with customer
4-5 peer-customers (e.g. DBAs)
During development:
≈100 databases, up to 100GB, ≈10 use replication
Core Team
HCSS Project Structure
Developer 1 Scientists 1 Scientists 2 Peer 1 Peer 1HCSS Schema Evolution
Cycle:
Incoming changes
Implementation & testing
Beta testing
Code freeze (system tests, …)
Total: 36 evolutions 500 changes(??)
Request schema changes
Week 1 Week 4 Week 6 Week 7
User Release Release of New Schema Version User Release Week 3 Week 2 Week 5 Further testing Code Freeze Developers Schema Developer Customers Implementa-tation and testing
HCSS Database Evolution Approach
ODBMS?
Automatic vs Scripted
HCSS: Agile & ODBMS?
Agile was a given (and appreciated)
ODBMS was a given, but fitted well with agile development
(ODBMS is quite common in defense and aviation industry)
HCSS: Automatic vs Scripted Evolution
Evolve automatically when opening DB or rely on evolution
API of DBMS?
Agile development
Many refactorings
Many changes too complex for automatic evolution
Prevent accidental evolution: Every DB stores a schema
version ID which is checked against the S/W version ID.
HCSS: Schema Evolution Strategy
Eager vs Lazy vs Schema Mapping (F/B compat.)?
DB downtime
Not welcome but accepted Only during development phase!
Lazy evolution
When do users need evolved data?
Mainly interested in newly created data
Old data could wait a day or two, just needs proper indication
Lessons Learned:
Why Evolution May Fail Despite Tests
Limited time for testing
Test DBs ≠ real DBs
Set-up (grid configuration, hardware, …)
Scalability
Data varies
a lot between users
Erroneous data
Software is still in development!
Missing refs, inconsistent values, “P. Schmid” vs “Schmid, P.”
Sometimes due to user-written applications or use of nightly builds
Varies a lot and can be rare (rare is bad!)
Lessons Learned: Database Evolution Code
Evolution code is critical
Evolution speed (8 hours vs 24…)
Error during evolution restore backup: x hours
Error when starting applications afterwards restore: x hours
Database inconsistent/deleted, noticed only much later… WCS
Evolution code is developed ad-hoc
Plenty of reasons for problems (bad data: S/W still in development!)
Limited testing
Non-iterative!
Criticality Maturity
HCSS: Agile Database Evolution Wrap-Up
Agile development
Lots of evolution, but manageable with ODBMS
DB evolution dev. effort: avg. 3days/month, with 5-20 changes each
Agile was throughout appreciated
Responsibility (Req. engineering, architecture, design, impl., …)
Motivation (Personal relations to project and customer)
Stable and known software at v1.0
S/W was (mostly) finished on time
ODBMS DBA == developer
No communication bottleneck, can save a lot of time/headaches
HCSS: Conclusion
36 evolutions in ~6 years
No evolution since v1.0 (now: v10.0)
Core (uplink) was hardly changed
Agile & ODBMS work well together
The whole thing worked fine
Operation of satellite ceased 1 year ago (April 29th 2013)