Domain driven design, NoSQL and multi-model databases

(1)

Max Neunhöffer

Java Meetup New York, 10 November 2014

(2)

Max Neunhöffer

I am a mathematician

“Earlier life”: Research in Computer Algebra (Computational Group Theory)

Always juggled with big data

Now: working in database development, NoSQL, ArangoDB

I like:

research, hacking, teaching,

tickling the highest performance out of computer systems.

(3)

A typical Project: a Web Shop

The Specification Workshop

(need recommendation engine, need statistics, etc.)

The Developers get to work

. . .

(tables, relations, normalisation, schemas, queries, front-ends, etc.)

HANDOVER

(Why can I not . . . ? This is unusable!)

(4)

Solution: Agile Approach and Domain Driven Design

These days, many use (or try to use):

agile methods (Scrum, sprints, rapid prototyping)

with continuous feedback from product owners to developers,

promising fewer surprises in deployment and high flexibility.

Domain Driven Design (Eric Evans, 2004):

identify a Domain (area in which software is applied),

make a Model (abstract description of situation),

use a Ubiquitous Language (that all team members speak),

clearly define the Context in which the model applies.

Model your data as close to the domain as possible.

Example: object oriented programming

(5)

Fundamental Problem: need a ubiquitous Language

Listening to team members, you hear completely different things:

Product Managers talk about

customers “browsing” through the shop,

powerful search for products (with the “good ones” at the top),

“useful” recommendations.

Developers talk about

tables, normalisation, queries and joins,

secondary indexes, front-end pages,

object oriented, model view controller, responsive design

=⇒ both groups think the others are morons

(6)

The problem is rooted very deeply

functionality not gathered methodically

⇓

“obvious” functions are missing

no common language

⇓

misunderstandings about details

(7)

NoSQL: Richer Data Models are closer to the Domain

Some terms used by Evans as part of the ubiquitous language:

Entity: has an identity and mutable state (e.g. a person)

Value object: is identified by its attributes and immutable (e.g. an address)

Aggregate: is a combination of entities and value objects into one transactional unit (e.g. a customer with its orders)

Association: is a relation between entities and value objects, can have attributes, usually immutable

Consequences

These terms coming from the Domain must be present in the Design. The whole team must understand the same thing when talking about them.
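
The four terms above can be made concrete in code. The following is a minimal sketch in Python and not part of the talk; the class names `Address`, `Order`, `Customer` and `Purchased` are assumptions picked to match the web shop example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Address:
    """Value object: identified by its attributes, immutable."""
    street: str
    city: str


@dataclass
class Order:
    """Entity that lives inside the customer aggregate."""
    order_id: str
    items: List[str] = field(default_factory=list)


@dataclass(eq=False)
class Customer:
    """Entity and aggregate root: has an identity (key) and mutable state."""
    key: str
    name: str
    address: Address
    orders: List[Order] = field(default_factory=list)

    def __eq__(self, other):
        # Entities compare by identity, not by their current state.
        return isinstance(other, Customer) and self.key == other.key

    def __hash__(self):
        return hash(self.key)

    def place_order(self, order: Order) -> None:
        # Changes to the aggregate go through its root.
        self.orders.append(order)


@dataclass(frozen=True)
class Purchased:
    """Association: relates two entities and carries its own attributes."""
    customer_key: str
    product_key: str
    quantity: int
```

Two `Address` objects with the same street and city are equal, while two `Customer` objects are equal only if they share a key, even when the name has changed in between.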

(8)

Polyglot Persistence

Idea

Use the right data model for each part of a system.

For an application, persist

an object or structured data as a JSON document,

a hash table in a key/value store,

relations between objects in a graph database,

a homogeneous array in a relational DBMS.

If the table has many empty cells or inhomogeneous rows, use a column-based database.

Take scalability needs into account!

(9)

Document and Key/Value Stores

Document store

A document store stores sets of documents, which usually means JSON data; these sets are called collections. The database has access to the contents of the documents.

each document in the collection has a unique key

secondary indexes possible, leading to more powerful queries

different documents in the same collection: structure can vary

no schema is required for a collection

database normalisation can be relaxed

Key/value store

Opaque values, only key lookup without secondary indexes:

=⇒ high performance and perfect scalability
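
The difference between the two store types can be sketched with two toy in-memory classes. This is an illustration only, not how any real product is implemented; the names `KeyValueStore`, `Collection` and `find_by_index` are hypothetical.

```python
from collections import defaultdict


class KeyValueStore:
    """Opaque values: the store can only look them up by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class Collection:
    """Toy document collection: unique keys plus one secondary index."""

    def __init__(self, indexed_field):
        self._docs = {}
        self._indexed_field = indexed_field
        self._index = defaultdict(set)  # field value -> set of document keys

    def insert(self, key, doc: dict):
        if key in self._docs:
            raise KeyError(f"duplicate key: {key}")
        self._docs[key] = doc
        # The store can read the document, so it can index one of its fields.
        if self._indexed_field in doc:
            self._index[doc[self._indexed_field]].add(key)

    def get(self, key):
        return self._docs.get(key)

    def find_by_index(self, value):
        keys = self._index.get(value, ())
        return [self._docs[k] for k in sorted(keys)]


customers = Collection(indexed_field="city")
customers.insert("c1", {"name": "Alice", "city": "New York"})
customers.insert("c2", {"name": "Bob", "city": "New York", "vip": True})
customers.insert("c3", {"name": "Carol"})  # no schema: the field may be absent
```

The key/value store cannot offer `find_by_index`, because it never looks inside the values; that restriction is what keeps it simple to distribute.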

(10)

Graph Databases

Graph database

A graph database stores a labelled graph. Vertices and edges are documents. Graphs are good to model relations.

graphs often describe data very naturally (e.g. the Facebook friendship graph)

graphs can be stored using tables; however, graph queries notoriously lead to expensive joins

there are interesting and useful graph algorithms like “shortest path” or “neighbourhood”

need a good query language to reap the benefits

horizontal scalability is troublesome

graph databases vary widely in scope and usage, no standard
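
As an illustration of the “shortest path” algorithm mentioned above, here is a minimal sketch of a breadth-first search over edges stored as documents. The field names `_from` and `_to` follow the ArangoDB convention for edge documents; the function name and the sample data are hypothetical, and the graph is treated as undirected and unweighted.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return the list of vertices on a shortest path, or None if unreachable."""
    # Build an adjacency list from the edge documents.
    neighbours = {}
    for edge in edges:
        neighbours.setdefault(edge["_from"], []).append(edge["_to"])
        neighbours.setdefault(edge["_to"], []).append(edge["_from"])

    previous = {start: None}  # vertex -> vertex we reached it from
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # Walk the chain of predecessors back to the start.
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for nxt in neighbours.get(vertex, []):
            if nxt not in previous:
                previous[nxt] = vertex
                queue.append(nxt)
    return None


friendships = [
    {"_from": "alice", "_to": "bob"},
    {"_from": "bob", "_to": "carol"},
    {"_from": "carol", "_to": "dave"},
    {"_from": "alice", "_to": "eve"},
    {"_from": "eve", "_to": "dave"},
]
```

Expressed over relational tables, each hop of this search would be one more self-join on the edge table, which is why such queries become expensive there.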

(11)

A typical Use Case — an Online Shop

We need to hold

customer data: usually homogeneous, but still variations

=⇒ use a document store

product data: even for a specialised business quite inhomogeneous

shopping carts: need very fast lookup by session key

=⇒ use a key/value store

order and sales data: relate customers and products

recommendation engine data: links between different entities

=⇒ use a graph database

(12)

Polyglot Persistence is nice, but . . .

Consequence: One needs multiple database systems in the persistence layer of a single project!

Polyglot persistence introduces some friction through

data synchronisation,

data conversion,

increased installation and administration effort,

more training needs.

Wouldn’t it be nice, . . .

. . . to enjoy the benefits without the disadvantages?

(13)

The Multi-Model Approach

Multi-model database

A multi-model database combines a document store with a graph database and a key/value store.

Vertices are documents in a vertex collection, edges are documents in an edge collection.

a single, common query language for all three data models

is able to compete with specialised products on their turf

allows for polyglot persistence using a single database

queries can mix the different data models

can replace an RDBMS in many cases
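
What “mixing the data models” in one query means can be sketched in plain Python. This is a toy in-memory illustration, not a database query; the collection contents and the function `recommend` are hypothetical. Products sit in a vertex collection keyed by document handle, and purchases are edge documents with `_from` and `_to`.

```python
# Vertex collection: product documents, looked up by key.
products = {
    "products/book": {"name": "book", "price": 20},
    "products/lamp": {"name": "lamp", "price": 45},
    "products/pen": {"name": "pen", "price": 3},
    "products/desk": {"name": "desk", "price": 200},
}

# Edge collection: one edge document per purchase.
purchased = [
    {"_from": "customers/alice", "_to": "products/book"},
    {"_from": "customers/alice", "_to": "products/lamp"},
    {"_from": "customers/bob", "_to": "products/book"},
    {"_from": "customers/bob", "_to": "products/pen"},
    {"_from": "customers/carol", "_to": "products/desk"},
]


def recommend(product_id, max_price):
    """Names of products bought by customers who also bought `product_id`."""
    # Graph step 1: follow purchase edges backwards to the buyers.
    buyers = {e["_from"] for e in purchased if e["_to"] == product_id}
    # Graph step 2: follow purchase edges forwards to their other products.
    candidates = {e["_to"] for e in purchased if e["_from"] in buyers}
    candidates.discard(product_id)
    # Document step: key lookup plus a filter on a document attribute.
    return sorted(
        products[pid]["name"]
        for pid in candidates
        if products[pid]["price"] <= max_price
    )
```

With separate specialised systems, the two graph steps and the document filter would run in different databases and the application would have to join the results itself.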

(14)

A Map of the NoSQL Landscape

[Figure: map of the NoSQL landscape. Labels: Map/reduce, Column Stores, Extensibility, Documents, Massively distributed, Graphs, Structured Data, Key/Value, Operational DBs, Analytic DBs, Complex queries.]

(15)

ArangoDB

is a multi-model database (document store & graph database),

is open source and free (Apache 2 license),

offers convenient queries (via HTTP/REST and AQL), including joins between different collections,

gives strong consistency guarantees using transactions,

is memory efficient by shape detection,

uses JavaScript throughout (Google’s V8 built into server),

has an API extensible by JavaScript code in the Foxx framework,

offers many drivers for a wide range of languages,

is easy to use with web front end and good documentation, and

enjoys good community as well as professional support.

(16)

A Map of the NoSQL Landscape

[Figure: map of the NoSQL landscape. Labels: Map/reduce, Documents, Graphs, Structured Data.]

(17)

The ArangoDB Territory

[Figure: the ArangoDB territory on the map. Labels: Map/reduce, Documents, Graphs, Structured Data.]

(18)

Strong Consistency

ArangoDB offers

atomic and isolated CRUD operations for single documents,

transactions spanning multiple documents and multiple collections,

snapshot semantics for complex queries,

very secure durable storage using append-only writes and storing multiple revisions,

all this for documents as well as for graphs.
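
How append-only storage with multiple revisions can give a query a stable snapshot is sketched below. This is a toy model under stated assumptions and does not describe ArangoDB's storage engine; the class `AppendOnlyStore` and its methods are hypothetical.

```python
class AppendOnlyStore:
    """Every write appends a new revision; nothing is overwritten in place."""

    def __init__(self):
        self._log = []  # list of (revision, key, document)

    def write(self, key, doc):
        revision = len(self._log) + 1
        self._log.append((revision, key, doc))
        return revision

    def snapshot(self):
        # A snapshot is just the number of log entries visible to the reader.
        return len(self._log)

    def read(self, key, at_revision=None):
        limit = len(self._log) if at_revision is None else at_revision
        # Scan backwards so the newest visible revision wins.
        for _revision, k, doc in reversed(self._log[:limit]):
            if k == key:
                return doc
        return None


store = AppendOnlyStore()
store.write("c1", {"name": "Alice"})
snap = store.snapshot()
store.write("c1", {"name": "Alice B."})  # a later write, invisible to `snap`
```

A long-running query that reads with `at_revision=snap` keeps seeing the old document, while new readers see the latest revision.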

In the (near) future, ArangoDB will

offer the same ACID semantics even with sharding,

implement complete MVCC semantics to allow for lock-free concurrent transactions.

(19)

Replication and Sharding — horizontal scalability

Right now, ArangoDB provides

easy setup of (asynchronous) replication,

which allows read access parallelisation (master/slaves setup),

sharding with automatic data distribution to multiple servers.
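
One common way to distribute data automatically is to hash each document key and take the result modulo the number of shards. The talk does not say how ArangoDB assigns documents to shards, so the sketch below is an assumption for illustration; the function names are hypothetical.

```python
import hashlib


def shard_for(key: str, number_of_shards: int) -> int:
    """Map a document key to a shard number in a deterministic way."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % number_of_shards


def distribute(keys, number_of_shards):
    """Group keys by the shard they belong to."""
    shards = [[] for _ in range(number_of_shards)]
    for key in keys:
        shards[shard_for(key, number_of_shards)].append(key)
    return shards
```

Because the shard follows from the key alone, any server can compute where a document lives without consulting a central directory. The price is that changing the number of shards moves most keys.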

Very soon, ArangoDB will feature

fault tolerance by automatic failover and synchronous replication in cluster mode,

zero administration by a self-repairing and self-balancing cluster architecture.

(20)

Powerful query language: AQL

The built-in Arango Query Language AQL allows

complex, powerful and convenient queries,

with transaction semantics,

allowing joins,

with user definable functions (in JavaScript).

AQL is independent of the driver used and offers protection against injections by design.
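
One way a query language can protect against injections is to keep user input out of the query text and pass it as bind parameters. The sketch below builds the JSON body for ArangoDB's HTTP cursor endpoint (`POST /_api/cursor`) with a `query` and its `bindVars`. The collection `customers`, the attribute `name` and the helper `build_cursor_request` are hypothetical, and the request is only constructed, not sent.

```python
import json

# The query text is a constant; user input never becomes part of it.
QUERY = "FOR c IN customers FILTER c.name == @name RETURN c"


def build_cursor_request(name: str) -> str:
    """Return the JSON body for a cursor request with a bound parameter."""
    body = {
        "query": QUERY,
        # The bind parameter is referenced as @name in the query text.
        "bindVars": {"name": name},
    }
    return json.dumps(body)


# Input that would break a query built by string concatenation.
hostile = '" || true || "'
request_body = build_cursor_request(hostile)
```

Whatever the user types ends up as a plain string value in `bindVars`, so it is compared against `c.name` and never parsed as AQL.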

For Version 2.3, we are reengineering the AQL query engine:

use a C++ implementation for high performance,

optimise distributed queries in the cluster.