• No results found

NOSQL DATABASE SYSTEMS

N/A
N/A
Protected

Academic year: 2021

Share "NOSQL DATABASE SYSTEMS"

Copied!
106
0
0

Loading.... (view fulltext now)

Full text

(1)
(2)

NoSQL Database Systems

Categorization – Data Model – Storage Layout – Query Models – Solution Architectures • Data Modeling Application Development

Scalability, Availability and Consistency – Partitioning, Replication

– Consistency Models and Transactions • Select the Right DBMS

– Performance and Benchmarks – Polyglot Persistence

id ti …

(3)

NoSQL Database Systems

Considered Categories of NoSQL Database Systems – Key-Value Database Systems

– Document Database Systems

– Column Family Database Systems

(4)

Key-Value Database Systems

Data Model

– Key-value pairs

• Unique keys • Values

arbitrary type (serialized byte arrays) or – strings, lists, sets, ordered sets (of strings)

Schema-free Storage Layout

– Hash-Maps, B-Trees, … • Indexes

Primary indexes (Hash, B-tree) on key Secondary indexes on values?

key value key value key value key value key value NoSQL

(5)

Key-Value Database Systems (Cont.)

Query Models – Simple API

• set (key, value) • value = get (key) • delete (key)

• Operations on values?

– More complex operations

Language Bindings

MapReduce  later in this chapter

Systems

– Oracle Berkeley DB (mid-90s)

– Caches (EHCache, Memcache)

– Amazon Dynamo/S3, Redis, Riak, Voldemort, …

Big Data Technologies: NoSQL DBMS - SoSe 2015 5

key value key value key value key value key value

h_da Prof. Dr. Uta Störl

(6)

{

"id": 1,

"name": “football boot", "price": 199, "stock": { "warehouse": 120, "retail": 10 } }

Document Store Database Systems

Data Model

– Key-value pairs with “documents” as value

– Document format: JSON or BSON (Binary JSON)

• Loosely structured name(key)-value pairs • Hierarchical

Additionally, MongoDB uses collections

• “arbitrary” documents could be grouped together

• documents in a collection should be similar to facilitate effective indexing

Storage Layout

– B-Trees to store the documents

– MongoDB: Documents in a single collection are stored together

(7)

Document Store Database Systems (Cont.)

Indexes

– Primary indexes on documentId (key) – Secondary indexes on JSON-names

• Default or user defined

• Composite indexes may be supported

Query Models

– Simple API: set/get/delete

– Further query support differ widely

• Powerful ad-hoc queries with integrated query language (MongoDB) • No ad-hoc queries, predefined views with indexes only (CouchDB &

Couchbase)

– Language Bindings

MapReduce  later in this chapter Systems

– MongoDB, CouchDB, Couchbase, …

{

"id": 1,

"name": “football boot", "price": 199, "stock": { "warehouse": 120, "retail": 10 } } NoSQL

(8)

Column Family Database Systems

Data Model

– Loosely structured by columns and column families (“set of nested maps”)

Column Family

• set of columns grouped together into a bundle • Column families have to be predefined

Column

• Not predefined; any type or data (can be nested)

Column Family Column Family Row Key1 column Row Key2 column column column column column column column column Table NoSQL

(9)

Column Family Database Systems (Cont.)

Data Model (Cont.) – Example:

Column family database systems support multiple versions of each cell by timestamps:

Row Key: title Column Family text Column Family revision

"NoSQL" text:content: "A NoSQL database provides

a mechanism …" revision:author: "Mendel" revision:comment": "changed … " "Redis" text:content: "Redis is an open-source,

networked …" revision:author: "Torben" revision:comment: "initial …"

9 Big Data Technologies: NoSQL DBMS - SoSe 2015

Row Key: title Time Stamp Column Family text Column Family revision

"NoSQL" t5 text:content: "…" revision:author: "Mendel" revision:comment: "changed …" t4 revision:author: "Torben"

revision:comment: "there …" "Redis" t3 text:content: "…" revision:author: "Torben"

revision:comment: "initial …"

h_da Prof. Dr. Uta Störl

(10)

Column Family Database Systems (Cont.)

Storage Layout

– Data is stored by column family

Row Key: title Time Stamp Column Family text column: content

NoSQL t5 A NoSQL database provides a mechanism … Redis t3 Redis is an open-source, networked …

Row Key: title Time Stamp ColumnFamily revision

column: author column: comment

NoSQL t5 Mendel changed view …

NoSQL t4 Torben there should be

Redis t3 Torben initial …

Row Key: title Time Stamp Column Family text Column Family revision

"NoSQL" t5 text:content: "…" revision:author: "Mendel"

revision:comment: "changed view … "

t4 revision:author: "Torben"

revision:comment: “there should be …"

"Redis" t3 text:content: "…" revision:author: "Torben" revision:comment: "initial …"

(11)

Column Family Database Systems (Cont.)

• Classical example: Web table

Row Key Time Stamp Column Family contents Column Family anchor

"com.cnn.www" t9 anchor:anchor:"cnnsi.com“ anchor:anchortext:"CNN" t8 anchor:anchor:"my.look.ch“ anchor:anchortext: "CNN.com" t6 "<html>…" t5 "<html>…" NoSQL

(12)

Column Family Database Systems (Cont.)

Query Models

– Simple API

• set (table, row, column, value) • value = get (table, row, column) • delete (table, row, column)

timestamp optional

– Language Bindings

– More powerful query engines integrated (Cassandra Query Language) or as additional software products (e.g. Google App Engine / Google

Datastore for BigTable, Hive for Data Warehousing on HBase) – MapReduce  later in this chapter

Indexes

– Primary indexes (B-Trees  sorted ordered) – Default or user defined secondary indexes • Systems

– Google BigTable, HBase, Cassandra, Amazon SimpleDB, …

(13)

NoSQL (Not only SQL): Definition

“NoSQL Definition: Next Generation Databases mostly addressing some

of the points: being non-relational, distributed, open-source and horizontally scalable.

The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more

characteristics apply such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data and more. So the misleading term "nosql" (the community now translates it mostly with "not only sql") should be seen as an alias to something like the definition above.”

Source: S. Edlich, nosql-database.org

(14)

NoSQL (Not only SQL): Definition

Next Generation Databases mostly addressing some of the points: • non-relational

schema-free simple API

distributed and horizontally scalable easy replication support

eventually consistent / BASE (not ACID) open-source ???

more complex APIs currently under development

BASE as well as ACID are supported nowadays

(15)

NoSQL: The Essence

Data Model

non-relational schema-free

Scalability

distributed and horizontally scalable easy replication support

(16)

NoSQL Database Systems: Use Cases

Key-Value Database

Systems Document Store Database Systems Column Family Database Systems

Suitable Use Cases • Storing Session

Information User Profiles, Preferences Shopping Cart Data Event Logging Content Management Systems Blogging Platforms Web Analytics or Real-Time Analytics Event Logging Content Management Systems Blogging Platforms

Examples • Amazon (shopping carts) • Temetra (meter data) • … • Forbes (CMS) • MTV (CMS) • …

• Google (web pages) • Facebook (messaging) • Twitter (places of interest) • … NoSQL

(17)

NoSQL Family Tree

Source: cloudant.com

(18)

Google Stack Source: Saake/Schallehn:2011

Solution Architectures (Examples)

Hadoop Stack

(19)

NoSQL Database Systems

Categorization Data Model Storage Layout Query Models Solution Architectures Data Modeling Application Development

Scalability, Availability and Consistency – Partitioning, Replication

– Consistency Models and Transactions • Select the Right DBMS

– Performance and Benchmarks – Polyglot Persistence

id ti …

(20)

Data Modeling

Object-relational impedance mismatch • Example: blog, blogpost, comment, author

– Object-oriented modeling

– Mapping to relational database

(21)

Data Modeling Decisions

Primary Decision: Embedding vs. Referencing However, to consider

There are no join operations within NoSQL database systems! There are no distributed transactions within NoSQL!

Advantages and Disadvantages of Embedding

Advantages and Disadvantages of Referencing

• Martin Fowler: Aggregate-Oriented Modeling

Big Data Technologies: NoSQL DBMS - SoSe 2015 21

h_da Prof. Dr. Uta Störl

(22)

Data Modeling: Document Store DBS

• How to realize references? • Direction of references?

• Embedding: What about denormalization and redundancy?

(23)

Data Modeling: Column Family DBS

• How to implement embedded objects in column family database systems?

– Variant 1: Using run-time named column qualifiers – Variant 2: Using timestamps (or other id’s)

New (Cassandra CQL3): Using collection types (map, set, list)

• What about column families?

(24)

Data Modeling

• What about data modeling in key-value database systems?

Data Modeling: Conclusion – More degrees of freedom – Embedding vs. referencing

– Denormalization and redundancy

(25)

NoSQL Database Systems

Categorization Data Model Storage Layout Query Models Solution Architectures Data Modeling Application Development

Scalability, Availability and Consistency – Partitioning, Replication

– Consistency Models and Transactions • Select the Right DBMS

– Performance and Benchmarks – Polyglot Persistence

id ti …

(26)

Application Development for NoSQL

• Simple command line APIs

• REST-API

• (Some) more powerful query languages / query engines • Language Bindings

– Java, Ruby, C#, Python, Erlang, PHP, Perl, – REST

(27)

Application Development for NoSQL

• Example: title, content from blogpost with id = 042

// HBase

get 'blogposts', '042', { COLUMN => ['blogpost_data:title', 'blogpost_data:content'] }

// Cassandra

SELECT title, content FROM blogposts WHERE id = '042';

// MongoDB

db.blogposts.find( { _id : '042' }, { title: 1, content: 1 } ) // Couchbase

function (doc) {

if (doc._id == '042') {

emit(doc._id, [doc.title, doc.content]); }

(28)

Application Development for NoSQL

Challenge – Big data

– Data distributed over several hundred notes (remember: scale out) – Data-to-Code or Code-to-Data?

Executing jobs in parallel over several nodes

(29)

MapReduce: Basic Idea

• Old idea from functional programming (LISP, ML, Erlang, Scala etc.) – Divide tasks into small discrete tasks and run them in parallel – Never change original data (pipe concept)

Different operations on the same data do not influence No concurrency conflicts

No deadlocks

No race conditions MapReduce

– Basic idea and framework introduced by Google 2004:

J. Dean and S. Gehmawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04. 2004

http://labs.google.com/papers/mapreduce.html

Big Data Technologies: NoSQL DBMS - SoSe 2015 29

(30)

MapReduce: Basic Idea & WordCount Example

• Developers should implement two primary methods

Map: (key1, val1) → [(key2, val2)]

Reduce: (key2, [val2]) → [(key3, val3)]

Documents

Sport, Handball, Soccer

Soccer, FIFA

Documents

Sport, Gym, Money Soccer, FIFA, Money Key Value Sport 1 Handball 1 Soccer 1 Soccer 1

FIFA Key 1 Value Sport 1 Gym 1 Money 1 Soccer 1 FIFA 1 Money 1 Key Value Sport 2 Handball 1 Soccer 3 Key Value FIFA 2 Gym 1 Money 2 MAP MAP REDUCE REDUCE Doc1 Doc2 Doc3 Doc4

(31)

MapReduce: Architecture and Phases

(32)

Map & Reduce Functions (Example)

Hadoop Example

public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, …) … { String line = value.toString();

StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); output.collect(word, one); } } }

public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, …) … { int sum = 0;

while (values.hasNext()) { sum += values.next().get(); }

output.collect(key, new IntWritable(sum)); }

}

(33)

MapReduce: Optional Combine Phase

• Decrease the shuffling cost

– Reduce the result size of map functions

– Perform reduce-like function in each machine

Documents

Sport, Handball, Soccer

Soccer, FIFA

Documents

Sport, Gym, Money Soccer, FIFA, Money Key Value Sport 1 Handball 1 Soccer 1 Soccer 1

FIFA Key 1 Value Sport 1 Gym 1 Money 1 Soccer 1 FIFA 1 Money 1 MAP MAP REDUCE REDUCE Key Value Sport 1 Handball 1 Soccer 2 FIFA 1 Key Value Sport 1 Gym 1 Money 2 Soccer 1 FIFA 1 COMBINE COMBINE

(34)

MapReduce Frameworks

• MapReduce frameworks take care of – Scaling

– Fault tolerance – (Load balancing)

MapReduce Frameworks

– Google (however, Google now promotes Dataflow)

– Apache Hadoop

• standalone or integrated in NoSQL (and SQL) DBMS

• Also commercial distributors: Cloudera, MapR, HortonWorks, …

(35)

Map Reduce and Query Languages

• MapReduce paradigm is too low-level

– Only two declarative primitives (map + reduce)

– Custom code for simple operations like projection and filtering – Code is difficult to reuse and maintain

Combination of high-level declarative querying and low-level programming with MapReduce

Dataflow Programming Languages

– HiveQL

– Pig – (Jaql)

(36)

Hadoop Stack

(37)

HiveQL

• Hive: data warehouse infrastructure built on top of Hadoop, providing: – Data Summarization

– Ad hoc querying

• Simple query language: HiveQL (based on SQL) • Extendable via custom mappers and reducers

• Developed by Facebook, now subproject of Hadoop • http://hadoop.apache.org/hive/

(38)
(39)

Pig

• A platform for analyzing large data sets • Pig consists of two parts:

– PigLatin: A Data Processing Language

– Pig Infrastructure: An Evaluator for PigLatin programs – Pig compiles Pig Latin into physical plans

– Plans are to be executed over Hadoop

• Interface between the declarative style of SQL and low-level, procedural style of MapReduce

(40)
(41)

MapReduce in Practice

VLDB 2012: Chen, Alspaugh, Katz: Interactive Analytical Processing in Big Data

Systems: A CrossIndustry Study of MapReduce Workloads:

(42)
(43)

MapReduce Trends

Hadoop 2.0 with YARN (Abstract from MapReduce)

Apache

– “In-Memory” Hadoop

– Performance! – Written in Scala

Big Data Technologies: NoSQL DBMS - SoSe 2015 43

h_da Prof. Dr. Uta Störl

(44)

Application Development for NoSQL

 MapReduce: Concept and Frameworks

• „State of the art“ application development

– With relational database systems: Object-Relational Mapping (ORM) frameworks and standards (Java Persistence API etc.) – Frameworks for Object-NoSQL mapping?!

(45)

Object-NoSQL Mapper: Architecture

Objekt-NoSQL Mapper

Applikation

SELECT titel, text FROM blogposts WHERE id = ’042’;

get ’blogposts’, ’042’,

{ COLUMN => [’blogpost_daten:titel’, ’blogpost_daten:text’] } SELECT b.titel, b.text FROM blogpost b WHERE b.id = ’042’

id titel 042 … id titel 042 … id tit … id tit … db.blogposts.find ( { _id : ’042’ } , { titel: 1, text: 1 } ) { "id" : "042", "titel" : ... }

(46)

Object-NoSQL Mapper: Market Overview

Mapper for different Programming Languages – Java, .NET, Python, Ruby …

– Volatile Market …

Main Focus: Object-NoSQL Mapper for Java

– Standardization: Java Persistence API (JPA) with Java Persistence Query Language (JPQL)

Categorization

– Multi Data Store Mapper – Single Data Store Mapper

(47)

Java Multi Data Store Mapper

• Support for Document Store, Column Family, and Graph Database Systems in Java Multi Data Store Mapper

Data

Nucleus Eclipse Link Hibernate OGM Kundera PlayORM Spring Data

Document Store Couchbase CouchDB   MongoDB       Column-Family DBMS Cassandra     HBase    Graph DBMS Neo4J    

(48)

Java Multi Data Store Mapper

• Support for Key-Value Database Systems in Java Multi Data Store Mapper

Data

Nucleus Eclipse Link Hibernate Kundera PlayORM Spring Data

Key-Value DBMS AmazonDynamoDB Apache Solr Ehcache Elasticsearch   GemFire Infinispan Oracle NoSQL   Redis  

(49)

Java Object-NoSQL Mapper:

Supported Functionality

Single Data Store Mapper

*Limited functionality (depending from the underlying NoSQL data store)

Source: Störl/Hauf/Klettke/Scherzinger: Schemaless NoSQL Data Stores – Object-NoSQL Mappers to the Rescue? BTW 2015, Hamburg, March 2015

(50)

Object-NoSQL Mapper: Query Language Support

Challenge: Different Query Language Interfaces – Examples:

• Most systems do not support any JOINS

• Many systems do not offer aggregate functions, LIKE operator, or NOT operator, …

Approaches

1. Offer only the particular subset of features that is implemented by all supported NoSQL data stores, i.e. the intersection of features 2. Distinguish by data store and offer only the set of features

implemented by a particular NoSQL data store

3. Offer the same set of features for all supported NoSQL data

stores, possibly complementing missing features by implementing them inside the Object-NoSQL Mapper

(51)

Object-NoSQL Mapper: Query Language Support

Approach 2: NoSQL data store specific support of JPQL operators – Drawback: restricted portability

– Systems: Hibernate OGM, Kundera, EclipseLink

• Example: JPQL operators (selection) in Kundera

(52)

Object-NoSQL Mapper: Query Language Support

Approach 2: NoSQL data store specific support of JPQL operators Extension: Use third-party libraries to offer more functionality for

some but not for all supported NoSQL data stores – Systems: Hibernate OGM

(Hibernate Search), Kundera each with Apache Lucene NoSQL-DBMS Object-NoSQL Mapper Search Engine Index Application

(53)

Object-NoSQL Mapper: Query Language Support

Approach 3: Offer the same set of features for all supported NoSQL

data stores

– Complementing missing features by implementing them inside the Object-NoSQL Mapper

– Benefit: Portability

– Drawback: Performance

– Systems: DataNucleus,

Hibernate OGM (announced)

NoSQL-DBMS

Object-NoSQL Mapper

(54)

Object-NoSQL Mapper: Query Language Support

Outlook: Combination of Approach 2 and 3

– Systems: Hibernate OGM (announced)

NoSQL-DBMS Search Engine Index

Object-NoSQL Mapper

(55)

Conclusion: Java Object-NoSQL Mapper

Vendor Independency / Portability

Standardized Query Language (JPQL) – Support for different NoSQL data stores

– Supported query operators often depend on the capabilities of the underlying NoSQL data stores

Performance (as of end of 2014)

In reading data, there is only a small gap between native access and the Object-NoSQL Mappers for the majority of the evaluated

products

– Yet in writing, object mappers introduce a significant overhead – Further reading: U. Störl, Th. Hauf, M. Klettke and S. Scherzinger:

Schemaless NoSQL Data Stores – Object-NoSQL Mappers to the Rescue? BTW 2015, Hamburg, March 2015

(56)

NoSQL Database Systems

Categorization Data Model Storage Layout Query Models Solution Architectures Data Modeling Application Development

Scalability, Availability and Consistency – Partitioning, Replication

– Consistency Models and Transactions • Select the Right DBMS

– Performance and Benchmarks – Polyglot Persistence

id ti …

(57)

Partitioning

• Vertical vs. horizontal partitioning? • Horizontal Partitioning

Range-based Partitioning

• Based on an attribute value • Disadvantage?

Key-based Partitioning

• Based on key value

• Implemented with hash function usually – why? • Disadvantage?

(58)

Partitioning: Consistent Hashing

Consistent Hashing

Karger et al 1997: Consistent hashing and random trees: Distributed

caching protocols for relieving hot spots on the World Wide Web. – Keys and nodes (identified by id or server name) are mapped onto a

“circle”

– Keys are assigned to the node that is next to them in a clockwise direction

(59)

Partitioning: Consistent Hashing (Cont.)

• Consistent Hashing: Example – Removal of server C

– Addition of server D

Big Data Technologies: NoSQL DBMS - SoSe 2015 59

Source: http://weblogs.java.net/blog/2007/11/27/consistent-hashing h_da Prof. Dr. Uta Störl

(60)

Partitioning: Consistent Hashing (Cont.)

(61)

Partitioning in NoSQL Database Systems

• Key-based Partitioning

– Redis (client-side hash function), Riak, CouchBase, … • Range-based Partitioning

(62)

Replication

Motivation / Benefit

– Performance enhancement

– Availability enhancement (fault tolerance)

Tradeoff between benefits of replication and work required to keep replicas consistent

(63)

Types of Replication

• Two basic (and orthogonal) parameters: – Where is the update performed?

everywhere vs. selected copy

When are the updates propagated?

• synchronous vs. asynchronous

Types of Replication

Master-Slave Replication

• Where: Selected copy for update

Multi-Master Replication

(64)

Master-Slave Replication

Where is the update performed?

One selected copy (primary node / primary copy) – All other replicas are read only

– Different data items can have different primary nodes • When are the updates propagated?

(65)

Update Propagation

Synchronous update propagation (eager replication) – Expensive, blocking 

Asynchronous update propagation (lazy replication) – Consistency 

– Weak consistency  later in this chapter

Ensure consistency without synchronous update propagation?

– Read only from master node (replicas for failover purposes only)

• e.g. CouchBase

(66)

Terminology

N: the number of nodes that store replicas of the data

W: the number of replicas that need to acknowledge the receipt of the update before the update completes

R: the number of replicas that are contacted when a data item is accessed through a read operation

W W+R > N Quorum Assembly Strong Consistency W W+R <= N Weak Consistency R R

(67)

Quorum Consensus: Server-Side

W+R > N  strong consistency through quorum assembly

– W=N and R = 1  read optimized strong consistency (ROWA) – W=1 and R = N  write optimized strong consistency

– R = W = N/2 + 1  Majority Consensus

– A common choice in NoSQL database systems is N=3, R=2, W=2

(68)

Quorum Consensus: Variations

Unweighted vs. weighted votes

• Weighted votes

– Each node is assigned a weight, e.g. for “better” replicas – Instead of sum of nodes N use sum of weights of nodes

Static vs. dynamic quorum

• Dynamic Quorum

– Quorums can be chosen separately for each item, e.g. in case of unavailable nodes

(69)

Quorum Consensus: Client-Side Example

Cassandra (Version 1.2)

Write Consistency Level

Zero: A write must be written to at least one node. (If all replica nodes for the

given row key are down  hinted handoff)

One/Two/Three: A write must be written to at least one/two/three replica

node(s).

Quorum*: A write must be written on a quorum of replica nodes (N/2 +1).

All: A write must be written on all replica nodes.

Read Consistency Level

One/Two/Three: Returns a response from the closest / two / three of the

closest replica (may be inconsistent).

Quorum*: Returns the record with the most recent timestamp once a quorum

of replicas (N/2 + 1) has responded.

All: Returns the record with the most recent timestamp once all replicas have

responded. The read operation will fail if a replica does not respond.

(70)

Asynchronous Write:

Write-Related Strategies

• How to propagate a write operation to an unavailable node? • Hinted Handoff Algorithms

– Do a “hinted” write to an alive node (e.g., nearest live replica) – When the failed node returns to the cluster, the updates received

by the neighboring nodes are handed off to it

System can continue to handle requests as if the node where still there

– Implemented in Cassandra, Riak, etc.

– But: How does a node learn when a node is available?

• E.g., gossip protocols

– each node periodically sends its current view of the ring state to a randomly-selected peer (or other protocols to choose the peers)

(71)

Asynchronous Write:

Read-Related Strategies

Read-Related Strategies

– A write has not propagated to all replicas

• Repair outdated replicas after read  Read Repair

• Repair outdated replicas that have not been read  Anti-Entropy

Read Repair Algorithm

– A system may detect that several nodes are out of sync with older versions of the data requested in a read operation

Mark the nodes with the stale data with a Read Repair flag Synchronizing the stale nodes with newest version of the data

requested

− Implemented in Cassandra, Riak, etc.

Big Data Technologies: NoSQL DBMS - SoSe 2015 71

(72)

Multi-Master Replication

Where is the update performed? Update everywhere

New challenge: concurrent writes When are the updates propagated?

– Synchronous

• No concurrent writes  • Expensive, blocking, …  • Deadlocks 

– Asynchronous

(73)

Conflict Detection and Resolution

• How to detect older versions of data? • How to detect concurrent writes?

– Timestamps!

– Timestamps in a distributed environment?!

• global unique timestamp:= <node identifier, unique local timestamp> • Define within each node Ni a logical clock (LCi)*, which generates the

unique local timestamp

• If Ni received a request from a transaction T with timestamp < Nj , LCj> and LCi < LCj  set LCi = LCj + 1

Common approach in NoSQL database systems: Vector Clocks

(74)

Conflict Detection: Vector Clocks

• Vector Clocks

– Vector clock = list of (node, counter)

– On receive: element-wise maximum

A B C C:1 B:1 C:1 A:1 B:1 C:1 A:1 B:2 C:2 A:1 B:2 C:1 C:1 C:1 B:1 C:1 B:1 C:1 A:1 B:1 C:1 A:1 B:2 C:1 A:1 B:1 C:1 read read A:1 B:2 C:1

(75)

Conflict Detection: Vector Clocks (Cont.)

Vector Clocks allows to determine whether one object is a direct

descendant of the other / direct descendant of a common parent / are unrelated in recent heritage

A B C C:1 B:1 C:1 A:1 B:1 C:1 B:2 C:1 C:1 C:1 B:1 C:1 B:1 C:1 B:2 C:1 A:1 B:1 C:1 Concurrent writes cannot be resolved automatically!

(76)

Replication and Availability: Systems

Master-Slave Replication

– Redis, HBase, MongoDB, CouchBase, etc. – Master: single point of failure

automatic failover mechanisms necessary

• Special instances responsible for monitoring (e.g. Redis, HBase) • Handle failover with existing instances (e.g. MongoDB, CouchBase)

Multi-Master Replication – Riak

(77)

NoSQL Database Systems

Categorization Data Model Storage Layout Query Models Solution Architectures Data Modeling Application Development

Scalability, Availability and Consistency Partitioning, Replication

– Consistency Models and Transactions • Select the Right DBMS

– Performance and Benchmarks – Polyglot Persistence

id ti …

(78)

CAP Theorem

CAP Theorem (Eric Brewer, 2000): in a distributed database system, you can only have at most two of the following three characteristics:

Consistency: all clients have the same view, even in case of updates Availability: every request received by a non-failing node in the

system must result in a response (i.e., even when severe network failures occur, every request must terminate)

Partition tolerance: system properties hold even when the network (system) is partitioned (i.e., nodes can still function when

(79)

CAP Theorem

Asymmetry of CAP properties

– Some are properties of the system in general

– Some are properties of the system only when there is a partition

A

P

C

DBMS X, Y DBMS A, B DBMS K, L

(80)

Critism of CAP Theorem

[Aba2012]: Abadi, J. Consistency Tradeoffs in Modern Distributed Database

System Design. In IEEE Computer 45(2)

http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf

– “… CAP has become increasingly misunderstood and misapplied, potentially causing significant harm. In particular, many designers

incorrectly conclude that the theorem imposes certain restrictions on a DDBS during normal system operation, and therefore implement an

unnecessarily limited system. In reality, CAP only posits limitations in the

face of certain types of failures, and does not constrain any system capabilities during normal operation. …”

The theorem simply states that a network partition causes the

system to have to decide between reducing availability or consistency

General: Fundamental tradeoff between consistency, availability

(81)

PACELC

[Aba2012]: Rewriting CAP as PACELC:

if there is a partition (P), how does the system trade off availability and consistency (A and C);

else (E), when the system is running normally in the absence of partitions, how does the system trade off latency (L) and

consistency (C)? PA/EL

– Default versions of Amazon Dynamo, Cassandra, Riak – However, R + W > N  more consistency

PC/EC

– HBase, BigTable • PA/EC

(82)

Strong vs. Weak Concistency

Strong consistency

– After the update completes, any subsequent access will return the updated value, i.e., any subsequent access from A, B, C will return D1

Weak consistency

– The system does not guarantee that subsequent accesses will return the updated value D1 (a number of conditions need to be met before D1 is returned)

(83)

Eventual Concistency

Eventual consistency (Vogel, 2008) – Specific form of weak consistency

– Guarantees that if no new updates are made, eventually all accesses will return D1

– If no failures occur, the maximum size of the inconsistency window can be determined based on factors such as communication delays, the load on the system, and the number of replicas involved in the replication scheme.

(84)

BASE

Alternative consistency model: BASE (Eric Brewer, 2000) – Basically available

• Availability of the system even in case of failures

– Soft-state

• The state of the system may change over time, even without input (clients must accept stale state under certain circumstances.)

– Eventually consistent

• The system will become consistent over time, given that the system doesn't receive input during that time

(85)

Variants of Eventual Consistency

Causal consistency:

– If A notifies B about the update, B will read D1 (but not C!) • Read-your-writes:

– A will always read D1 after its own update • Session consistency:

– Read-your-writes inside a session • Monotonic reads:

– If a process has seen Dk, any subsequent access will never return any Di with i < k

Monotonic writes:

– Guarantees to serialize the writes of the same process

(86)

Consistency and Replication: Example

• Facebook‘s strategy:

– The master copy is always in one location, a remote user typically has a closer but potentially stale copy.

– However, when users update their pages, the update goes to the master copy directly as do all the user’s reads for a short time, despite higher latency.

– After 20 seconds, the user’s traffic reverts to the closer copy, which by that time should reflect the update.

Source: Eric A. Brewer: Pushing the CAP: Strategies for Consistency and

Availability. In IEEE Computer 45(2)

Based on: J. Sobel: Scaling Out. Facebook Engineering Notes, 20 Aug. 2008;

(87)

Transactions

ACID vs. BASE?

What about isolation on the same node?  MVCC

BASE focus mainly on C and I – however, what bout A and D in ACID?

Atomicity

• Most NoSQL systems:

– No concept of transactional features

– Atomicity only for a single object, row, or document respectively • Few NoSQL systems only:

– Concept of transactional features over multiple objects

• Redis, Google Datastore, …

(88)

Transactions

Durability

• Durability in relational database systems: write-ahead logging (WAL) Concepts of Persistence and Durability in NoSQL systems

• In-memory only

– Durability by replication only

• In-memory and write-ahead logging by configuration

– Redis, CouchBase…

• Write-ahead logging by default – Riak, HBase, MongoDB, …

(89)

NoSQL Database Systems

Categorization Data Model Storage Layout Query Models Data Modeling Application Development

Scalability, Availability and Consistency Partitioning, Replication

Consistency Models and Transactions Select the Right DBMS

– Performance and Benchmarks – Polyglot Persistence

id ti …

(90)

Performance / Benchmarks

Traditional database benchmarks

– Benchmarks simulate typical usage scenarios (OLTP: TPC-C, OLAP: TPC-H, SAP benchmarks)

– Metrics:

• Performance (transactions per minute) • Price/performance

Benchmarks for NoSQL database systems? – What is a „typical“ NoSQL scenario?

• Facebook? Log analytics? Web Caching?

– Metrics?

• Scalability • Availability

• Partition Tolerance • …

(91)

Benchmarks for NoSQL Systems

• Open research field!

• Up to now: Specific workloads only

Yahoo! Cloud Serving Benchmark YCSB(SoCC 2010)

– Simple operations only (read, insert, delete, range scans) – No specific use-case; different workload scenarios

Workload Operations Workload R 100 % Reads Workload U 100 % Updates Workload I 100 % Inserts Workload M 50 % Reads 25 % Updates 25 % Inserts

(92)

Yahoo! Cloud Serving Benchmark

Yahoo! Cloud Serving Benchmark: Metrics Performance

• For constant hardware increase throughput  measure latency

Scaling

Scale-up: Increase hardware, data size and workload proportionally  measure latency

• Example:

Elastic Speedup: measure latency during dynamically server addition

h_da Prof. Dr. Uta Störl Big Data Technologies: NoSQL DBMS - SoSe 2015 92

(93)

Yahoo! Cloud Serving Benchmark

Status Quo YCSB

– Lot of variants publishes (over 560 forks on GitHub …)

• Including implementation errors regarding measurement as well as analysis of results (see H. Wegert: Benchmarking von

NoSQL-Datenbanksystemen, master’s thesis, University of Applied Sciences,

April 2015)

General challenge: Comparison of different configurations of NoSQL database systems

– No independent Benchmarking Council for NoSQL database

benchmarking up to now (like TPC for relational database systems)

(94)

Benchmarking: Persistence

Couchbase 3.0.1 (2015)

• YCSB Thumbtack Technologies Version

• 4 Server Nodes, 4 Clients (Big Data Cluster h_da)

1 10 100 1.000 10.000 100.000 1.000.000 I U Ø O pe ra tio ne n pr o S ek unde (l og ) Workload Keine Bestätigung Eine Bestätigung Zwei Bestätigungen Drei Bestätigungen Vier Bestätigungen

(95)

Benchmarking: Durability

Cassandra 2.1.2 (2015)

• YCSB Thumbtack Technologies Version

• 4 Server Nodes, 4 Clients (Big Data Cluster h_da)

1 10 100 1.000 10.000 100.000 1.000.000 I U M Ø O pe ra tio ne n pr o S ek unde (l og ) Workload Periodic (10.000 ms) Batch (50 ms) WAL deaktiviert

(96)

Select the Right DBMS

• Choosing between NoSQL or RDBMS?!

• Select the right (NoSQL) DBMS?! • Criteria

– Data Analysis

• Estimated size of date • Complexity of data • Type of navigation

– Consistency Requirements – Query Requirements

– Performance Requirements (latency, scalability)

– Non functional Requirements (license, company policies, security, documentation etc.)

– Costs (including development and administration)

– More detailed list: http://nosql-database.org/select-the-right-database.html

(97)

Polyglot Persistence

• „One Size Fits All“?

• Alternative: Choose the right system for the right task! – Example: Amadeus Log Service

• Hundreds of terabyte log data each week (SOA architecture with several servers)

• Architecture (Prototype, Kossmann:2012)

– Distributed file system (HDFS) for compressed log data

– NoSQL system (HBase) for storage and instant random access (indexing by timestamp and SessionID)

– Full text search engine (Apache Solr) for queries on log messages – MapReduce framework (Apache Hadoop) for analysis (usage

statistics and error)

– Relational DBMS (Oracle) for meta data (user infos etc.)

(98)

Polyglot Persistence

Example (Source: Sadalage, P. J., & Fowler, M. NoSQL Distilled. Pearson Education, 2013)

Key-Value

DBMS Store DBMS Document (Legacy DB) RDBMS Graph DBMS

Shopping cart

and session data Completed Orders Inventory and Item Price social graph Customer

(99)

One size fits all?

NewSQL DBMS

Idea behind: Best of both worlds

• SQL • ACID

• Non locking concurrency control • High per-node performance

• Scale out, shared nothing architecture

Opportunity 1: Development of new database systems

• VoltDB (Michael Stonebraker) • Drizzle

• …

Opportunity 2: Integration in existing database systems

• MySQL Cluster • JSON integration

Big Data Technologies: NoSQL DBMS - SoSe 2015 99

(100)

Trends: JSON Integration

PostgreSQL 9.2

– Native JSON support since release 9.2 (2012) – Proprietary JSON Query API

IBM DB2

– Native JSON support since 10.5 (June 2013)

– Using MongoDB API

IBM Informix

– Native JSON support since 12.10 (September 2013)

– Using MongoDB API

Oracle

– Native JSON Support since 12.1.0.2 (July 2014) – Proprietary JSON Query API

(101)

• JSON stored as BSON in BLOB column

Trends: JSON Integration – Example DB2

(102)

Trends: Integration of MapReduce

Trend: MapReduce (Hadoop) Integration in relational DBMS and data

warehouse systems

– 2012 available

• Oracle BigData-Appliance

• Oracle NoSQL 2.0 (Key-Value-Store) • IBM Infosphere with Hadoop support

• Microsoft SQL Server 2012 with Hadoop* support • …

– 2013

• Integration of Hadoop in SAPs BigData portfolio (SAP HANA, SAP Sybase IQ, SAP Data Integrator, SAP Business Objects)

• Hadoop Integration in Teradata with

SQL-H-API (instead of writing Map-Reduce jobs) • SAS on Hadoop

(103)
(104)
(105)

Database Popularity

Scoring:

Google/Bing results, Google Trends,

Stackoverflow, job offers, LinkedIn

(106)

Big Data Technologies

 Introduction

 NoSQL Database Systems

• Column Store Database Systems

• In-Memory Database Systems

References

Related documents

Whereas this signs refers to the ‘Art Education Resources’ and designers could be used other appropriate sign (like-educational resources, art education,

Premium-Discount Formula and Other Bond Pricing Formulas.. 1

In this section, we look at protocols for secure service migra- tion [8]. As stated previously, the main issues are fraudulent Cloud providers that attempt to entice service

I dette kapittelet har vi sett hvordan skoleelevene fikk delta aktivt i museumsoppleggene jeg observerte og hvordan også det kan settes i sammenheng med Falk og Dierkings teorier om

In addition to the lower probability of receiving Social Security benefits and an employer-sponsored pension, this research found that never married people also possessed

mononematous, macronematous, rarely semimacronematous, septate, straight or flexuous, geniculate at upper part, cell size decreasing towards apex, irregularly branched, cell

Level 4 – being able to cite at least 5 relevant cases to support their argument with accurate names and some factual description and make reference to specific sections of

Stefanova Graphic Designer [email protected] mobil +45 22883497 skoleprojekt 2011 visuel identitet af et nyt produkt. Projekt 2009 animation/