Domain driven design, NoSQL and multi-model databases
Max Neunhöffer
Java Meetup New York, 10 November 2014
Max Neunhöffer
I am a mathematician
“Earlier life”: Research in Computer Algebra (Computational Group Theory)
Always juggled with big data
Now: working in database development, NoSQL, ArangoDB
I like:
research, hacking, teaching,
tickling the highest performance out of computer systems.
A typical Project: a Web Shop
The Specification Workshop
(need recommendation engine, need statistics, etc.)
The Developers get to work
. . . (tables, relations, normalisation, schemas, queries, front-ends, etc.)
HANDOVER
(Why can I not . . . ? This is unusable!)
Solution: Agile Approach and Domain Driven Design
These days, many use (or try to use):
agile methods (Scrum, sprints, rapid prototyping)
with continuous feedback from product owners to developers,
promising fewer surprises in deployment and high flexibility.
Domain Driven Design (Eric Evans, 2004):
identify a Domain (area in which software is applied)
make a Model (abstract description of situation)
use a Ubiquitous Language (that all team members speak)
clearly define the Context in which the model applies.
Model your data as close to the domain as possible.
Example: object oriented programming
Fundamental Problem: need a ubiquitous Language
Listening to team members, you hear completely different things:
Product Managers talk about
customers “browsing” through the shop,
powerful search for products (with the “good ones” up),
“useful” recommendations.
Developers talk about
tables, normalisation, queries and joins
secondary indexes, front-end pages
object oriented, model view controller, responsive design
⇒ both groups think the others are morons
The problem is rooted very deeply
functionality not gathered methodically
⇓
“obvious” functions are missing
no common language
⇓
misunderstandings about details
NoSQL: Richer Data Models are closer to the Domain
Some terms used by Evans as part of the ubiquitous language:
Entity: has an identity and mutable state (e.g. a person)
Value object: is identified by its attributes and immutable (e.g. an address)
Aggregate: is a combination of entities and value objects into one transactional unit (e.g. a customer with its orders)
Association: is a relation between entities and value objects, can have attributes, usually immutable
Consequences
These terms coming from the Domain must be present in the Design.
The whole team must understand the same thing when talking about them.
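A minimal sketch of how these four terms could show up directly in code, assuming a web shop domain. All class and attribute names (`Address`, `OrderLine`, `Order`, `Customer`, `place_order`) are hypothetical and chosen only for illustration; they are not taken from Evans or from any particular framework.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Address:
    """Value object: identified by its attributes and immutable."""
    street: str
    city: str


@dataclass(frozen=True)
class OrderLine:
    """Association between an order and a product, with its own attribute."""
    product_id: str
    quantity: int


@dataclass(eq=False)
class Order:
    """Entity that lives inside the customer aggregate."""
    order_id: str
    lines: List[OrderLine] = field(default_factory=list)


@dataclass(eq=False)
class Customer:
    """Entity and aggregate root: identity is customer_id, state is mutable."""
    customer_id: str
    name: str
    address: Address
    orders: List[Order] = field(default_factory=list)

    def __eq__(self, other: object) -> bool:
        # Two customers are the same entity if their identity matches,
        # even when their mutable state differs.
        return isinstance(other, Customer) and self.customer_id == other.customer_id

    def __hash__(self) -> int:
        return hash(self.customer_id)

    def place_order(self, order: Order) -> None:
        # Changes to the aggregate go through its root.
        self.orders.append(order)
```

In this sketch two `Address` objects with the same street and city compare equal, while two `Customer` objects compare equal only if they share a `customer_id`.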
Polyglot Persistence
Idea
Use the right data model for each part of a system.
For an application, persist
an object or structured data as a JSON document,
a hash table in a key/value store,
relations between objects in a graph database,
a homogeneous array in a relational DBMS.
If the table has many empty cells or inhomogeneous rows, use a column-based database.
Take scalability needs into account!
Document and Key/Value Stores
Document store
A document store stores sets of documents, which usually means JSON data; these sets are called collections. The database has access to the contents of the documents.
each document in the collection has a unique key
secondary indexes possible, leading to more powerful queries
different documents in the same collection: structure can vary
no schema is required for a collection
database normalisation can be relaxed
Key/value store
Opaque values, only key lookup without secondary indexes:
⇒ high performance and perfect scalability
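The difference between the two models can be sketched with two small in-memory classes. This is only an illustration under simplifying assumptions (no persistence, no concurrency, indexed values must be hashable); the names `KeyValueStore`, `Collection`, `ensure_index` and `find_by` are hypothetical and do not belong to any real product's API.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set


class KeyValueStore:
    """Opaque values: the store can only look them up by key."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]


class Collection:
    """Schema-free documents with unique keys and optional secondary indexes."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}
        self._indexes: Dict[str, Dict[Any, Set[str]]] = {}

    def ensure_index(self, attribute: str) -> None:
        # Build a secondary index: attribute value -> keys of matching documents.
        index: Dict[Any, Set[str]] = defaultdict(set)
        for key, doc in self._docs.items():
            if attribute in doc:
                index[doc[attribute]].add(key)
        self._indexes[attribute] = index

    def insert(self, key: str, doc: Dict[str, Any]) -> None:
        if key in self._docs:
            raise KeyError(f"duplicate key: {key}")
        self._docs[key] = doc
        # Keep every existing index up to date.
        for attribute, index in self._indexes.items():
            if attribute in doc:
                index[doc[attribute]].add(key)

    def get(self, key: str) -> Dict[str, Any]:
        return self._docs[key]

    def find_by(self, attribute: str, value: Any) -> List[Dict[str, Any]]:
        index = self._indexes.get(attribute)
        if index is not None:
            # Indexed lookup: the store can see inside the documents.
            return [self._docs[k] for k in sorted(index.get(value, ()))]
        # No index: fall back to scanning the whole collection.
        return [d for _, d in sorted(self._docs.items())
                if attribute in d and d[attribute] == value]
```

The key/value store cannot answer "all products in category X" because it never looks inside the values; the collection can, with or without an index, and its documents need not share a structure.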
Graph Databases
Graph database
A graph database stores a labelled graph. Vertices and edges are documents. Graphs are good to model relations.
graphs often describe data very naturally (e.g. the Facebook friendship graph)
graphs can be stored using tables, however, graph queries notoriously lead to expensive joins
there are interesting and useful graph algorithms like “shortest path” or “neighbourhood”
need a good query language to reap the benefits
horizontal scalability is troublesome
graph databases vary widely in scope and usage, no standard
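As an illustration of the “shortest path” algorithm mentioned above, here is a sketch of a breadth-first search over edges stored as plain documents. It assumes an undirected, unweighted friendship graph; the attribute names `_from` and `_to` and the function name `shortest_path` are assumptions made for this example.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(edges: List[Dict[str, str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns one shortest path or None if unreachable."""
    # Build an adjacency list from the edge documents (undirected).
    neighbours: Dict[str, List[str]] = {}
    for edge in edges:
        neighbours.setdefault(edge["_from"], []).append(edge["_to"])
        neighbours.setdefault(edge["_to"], []).append(edge["_from"])

    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # Walk back through the predecessors to reconstruct the path.
            path: List[str] = []
            current: Optional[str] = vertex
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in neighbours.get(vertex, []):
            if neighbour not in previous:
                previous[neighbour] = vertex
                queue.append(neighbour)
    return None


friendships = [
    {"_from": "alice", "_to": "bob"},
    {"_from": "bob", "_to": "carol"},
    {"_from": "carol", "_to": "dave"},
    {"_from": "alice", "_to": "carol"},
]
print(shortest_path(friendships, "alice", "dave"))  # ['alice', 'carol', 'dave']
```

Expressed over relational tables, each hop of this search would correspond to another self-join on the edge table, which is where the expensive joins come from.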
A typical Use Case — an Online Shop
We need to hold
customer data: usually homogeneous, but still variations
⇒ use a document store
product data: even for a specialised business quite inhomogeneous
⇒ use a document store
shopping carts: need very fast lookup by session key
⇒ use a key/value store
order and sales data: relate customers and products
⇒ use a document store
recommendation engine data: links between different entities
⇒ use a graph database
Polyglot Persistence is nice, but . . .
Consequence: One needs multiple database systems in the persistence layer of a single project!
Polyglot persistence introduces some friction through
data synchronisation,
data conversion,
increased installation and administration effort,
more training needs.
Wouldn’t it be nice, . . .
. . . to enjoy the benefits without the disadvantages?
The Multi-Model Approach
Multi-model database
A multi-model database combines a document store with a graph database and a key/value store.
Vertices are documents in a vertex collection, edges are documents in an edge collection.
a single, common query language for all three data models
is able to compete with specialised products on their turf
allows for polyglot persistence using a single database
queries can mix the different data models
can replace an RDBMS in many cases
A Map of the NoSQL Landscape
[Figure: map of the NoSQL landscape. Labels: Map/reduce, Column Stores, Extensibility, Documents, Massively distributed, Graphs, Structured Data, Key/Value, Operational DBs, Analytic DBs, Complex queries]
ArangoDB
is a multi-model database (document store & graph database),
is open source and free (Apache 2 license),
offers convenient queries (via HTTP/REST and AQL), including joins between different collections,
offers strong consistency guarantees using transactions,
is memory efficient by shape detection,
uses JavaScript throughout (Google’s V8 built into server),
has an API extensible by JavaScript code in the Foxx framework,
offers many drivers for a wide range of languages,
is easy to use with web front end and good documentation, and
enjoys good community as well as professional support.
The ArangoDB Territory
[Figure: map of the NoSQL landscape with the ArangoDB territory marked. Labels: Map/reduce, Column Stores, Extensibility, Documents, Massively distributed, Graphs, Structured Data, Key/Value, Operational DBs, Analytic DBs, Complex queries]
Strong Consistency
ArangoDB offers
atomic and isolated CRUD operations for single documents,
transactions spanning multiple documents and multiple collections,
snapshot semantics for complex queries,
very secure durable storage using append-only writes and storing multiple revisions,
all this for documents as well as for graphs.
In the (near) future, ArangoDB will
offer the same ACID semantics even with sharding,
implement complete MVCC semantics to allow for lock-free concurrent transactions.
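The combination of append-only storage, multiple revisions and snapshot reads can be illustrated with a toy in-memory store. This is a sketch of the general idea only and makes no claim about how ArangoDB's storage engine is actually implemented; the class `AppendOnlyStore` and its methods are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


class AppendOnlyStore:
    """Every write appends a new revision; nothing is overwritten in place."""

    def __init__(self) -> None:
        # Log entries are (revision, key, document); None marks a removal.
        self._log: List[Tuple[int, str, Optional[Dict[str, Any]]]] = []

    def write(self, key: str, document: Optional[Dict[str, Any]]) -> int:
        revision = len(self._log) + 1
        self._log.append((revision, key, document))
        return revision

    def remove(self, key: str) -> int:
        # A removal is just another appended revision (a tombstone).
        return self.write(key, None)

    def snapshot(self) -> int:
        # A snapshot is the number of revisions visible at this moment.
        return len(self._log)

    def read(self, key: str, snapshot: Optional[int] = None) -> Optional[Dict[str, Any]]:
        # Only look at revisions that existed when the snapshot was taken.
        limit = len(self._log) if snapshot is None else snapshot
        for _, entry_key, document in reversed(self._log[:limit]):
            if entry_key == key:
                return document
        return None
```

A long-running query that took a snapshot before a later write keeps seeing the older revision, because the newer one was appended rather than written over it.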
Replication and Sharding — horizontal scalability
Right now, ArangoDB provides
easy setup of (asynchronous) replication, which allows read access parallelisation (master/slaves setup),
sharding with automatic data distribution to multiple servers.
Very soon, ArangoDB will feature
fault tolerance by automatic failover and synchronous replication in cluster mode,
zero administration by a self-repairing and self-balancing cluster architecture.
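One common way to distribute data automatically is to hash each document key and take the result modulo the number of shards. The sketch below shows that generic technique; the slides do not say which scheme ArangoDB uses, so `shard_for` and `distribute` are assumptions for illustration only.

```python
import hashlib
from typing import List


def shard_for(key: str, number_of_shards: int) -> int:
    """Map a document key to a shard number in a stable, deterministic way."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % number_of_shards


def distribute(keys: List[str], number_of_shards: int) -> List[List[str]]:
    """Group keys by the shard they are assigned to."""
    shards: List[List[str]] = [[] for _ in range(number_of_shards)]
    for key in keys:
        shards[shard_for(key, number_of_shards)].append(key)
    return shards
```

Because the shard is computed from the key alone, any server can route a lookup by key without consulting a central directory; the cost of this simple scheme is that changing the number of shards moves most keys.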
Powerful query language: AQL
The built-in Arango Query Language AQL allows complex, powerful and convenient queries,
with transaction semantics,
allowing joins,
with user definable functions (in JavaScript).
AQL is independent of the driver used and offers protection against injections by design.
For Version 2.3, we are reengineering the AQL query engine:
use a C++ implementation for high performance,
optimise distributed queries in the cluster.
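A sketch of what a join in AQL can look like, and of one mechanism that helps against injection: bind parameters, which keep user input out of the query text. The collections `customers` and `orders` and their attributes are hypothetical. The Python helper only builds the JSON body that would be sent to ArangoDB's HTTP cursor endpoint (`POST /_api/cursor`, with the fields `query` and `bindVars`); it does not contact a server.

```python
import json
from typing import Any, Dict

# AQL join between two (hypothetical) collections; @name is a bind parameter.
JOIN_QUERY = """
FOR c IN customers
  FILTER c.name == @name
  FOR o IN orders
    FILTER o.customer == c._key
    RETURN { customer: c.name, order: o._key }
"""


def cursor_request_body(query: str, bind_vars: Dict[str, Any]) -> str:
    """Build the JSON body for a cursor request; values travel separately from the query."""
    return json.dumps({"query": query, "bindVars": bind_vars})


# Hostile input stays a plain string value and never becomes part of the query text.
body = cursor_request_body(JOIN_QUERY, {"name": 'x" || true || "'})
```

Since the request is plain JSON over HTTP, the same query text works unchanged from any driver or language.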