Academic year: 2021


(1)

Databases 2 (VU) (707.030)

Introduction to NoSQL

Denis Helic

KMI, TU Graz

(2)

Outline

1 NoSQL Motivation
2 NoSQL Systems
3 NoSQL Examples
4 NoSQL Approaches

Slides

Slides are based on the “Introduction to Databases” course from Stanford University by Jennifer Widom

(3)

NoSQL Motivation

NoSQL: What is in the name?

Very often “SQL” = traditional relational DBMS

Experience from the past decade: not every data management/analysis problem is best solved using a traditional relational DBMS

“NoSQL” = not using a traditional relational DBMS

“NoSQL” ≠ don’t use the SQL language

(4)

NoSQL Motivation

Databases: Types of problems

Not every data management/analysis problem is best solved using a traditional DBMS

DBMS

A Database Management System (DBMS) provides efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data.

(5)

NoSQL Motivation

Databases: Types of problems

Not every data management/analysis problem is best solved using a traditional DBMS

Convenient, multi-user, safe, persistent, reliable, massive, efficient

(6)

NoSQL Motivation

Databases: Types of problems

Not every data management/analysis problem is best solved using a traditional DBMS

E.g. a tagging system

http://delicious.com/ http://www.bibsonomy.org/

The relation between tags and URLs is a “many-to-many” (m : n) relation

(7)

NoSQL Motivation

Tagging

url id   url
1        kti.tugraz.at
2        www.tugraz.az
. . .    . . .

(8)

NoSQL Motivation

Tagging

tag id   tag
1        knowledge
2        data
3        school
. . .    . . .

Table: Tag table

(9)

NoSQL Motivation

Tagging

tag id   url id
1        1
2        1
3        2
. . .    . . .

(10)

NoSQL Motivation

Tagging

To retrieve tags and URLs, you JOIN the tables and select the records that satisfy a criterion

E.g. to form a tag cloud

http://www.bibsonomy.org/

How to retrieve related tags?
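A minimal sketch of the three tagging tables and the JOIN, using Python's built-in sqlite3 module. The table and column names (urls, tags, tag_url, url_id, tag_id) and the helper functions are assumptions made for illustration; the rows are the sample rows from the tables.

```python
import sqlite3

# In-memory database holding the three tables from the tagging example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE urls (url_id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE tag_url (tag_id INTEGER, url_id INTEGER);
    INSERT INTO urls VALUES (1, 'kti.tugraz.at'), (2, 'www.tugraz.az');
    INSERT INTO tags VALUES (1, 'knowledge'), (2, 'data'), (3, 'school');
    INSERT INTO tag_url VALUES (1, 1), (2, 1), (3, 2);
""")


def urls_for_tag(tag):
    """JOIN the three tables and keep the rows matching one tag."""
    rows = conn.execute(
        """SELECT u.url
           FROM tags t
           JOIN tag_url tu ON tu.tag_id = t.tag_id
           JOIN urls u ON u.url_id = tu.url_id
           WHERE t.tag = ?
           ORDER BY u.url""",
        (tag,),
    )
    return [row[0] for row in rows]


def tag_cloud():
    """Count how many URLs carry each tag (the input for a tag cloud)."""
    rows = conn.execute(
        """SELECT t.tag, COUNT(*)
           FROM tags t
           JOIN tag_url tu ON tu.tag_id = t.tag_id
           GROUP BY t.tag"""
    )
    return dict(rows.fetchall())
```

Both queries need at most two JOINs. Retrieving related tags requires walking from a tag to its URLs and back to other tags, which adds JOINs for every further step.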

(11)

NoSQL Motivation

Tagging

Retrieve all URLs for a tag, then for each URL retrieve all tags

“Hub” tags have millions of URLs

That is just one-hop neighbors

Two-hop, three-hop neighbors?

What is the problem here?

Relational data model is not well-suited for tagging

(13)

NoSQL Motivation

Tagging

[Figure: tags knowledge, data, school, ...]

(14)

NoSQL Motivation

Tagging

Graph-based model is better suited

Bipartite graph

Finding related tags is equivalent to traversing graphs

E.g. breadth-first search
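The traversal idea can be sketched as a breadth-first search over a bipartite tag/URL graph. The sample data, the function names, and the ("tag", name) / ("url", name) node encoding are assumptions for illustration. Note that max_hops counts edges, so a tag that shares a URL with the start tag is two edges away.

```python
from collections import deque

# Hypothetical sample data: each tag maps to the URLs it labels.
tag_to_urls = {
    "knowledge": {"kti.tugraz.at"},
    "data": {"kti.tugraz.at"},
    "school": {"www.tugraz.az"},
}


def build_graph(tag_to_urls):
    """Build an undirected bipartite graph with tag nodes and URL nodes."""
    graph = {}
    for tag, urls in tag_to_urls.items():
        for url in urls:
            graph.setdefault(("tag", tag), set()).add(("url", url))
            graph.setdefault(("url", url), set()).add(("tag", tag))
    return graph


def related_tags(graph, start_tag, max_hops=2):
    """Breadth-first search from a tag, following at most max_hops edges."""
    start = ("tag", start_tag)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return sorted(
        name for (kind, name), d in distance.items() if kind == "tag" and d > 0
    )


graph = build_graph(tag_to_urls)
```

Raising max_hops widens the neighborhood without changing the code, whereas the relational version needs additional JOINs for each extra step.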

(15)

NoSQL Systems

NoSQL systems

Alternative to traditional relational DBMS

Advantages

Flexible schema

Quicker/cheaper to set up

Massive scalability

(16)

NoSQL Systems

NoSQL systems

Alternative to traditional relational DBMS

Disadvantages

No declarative query language → more programming

Relaxed consistency → fewer guarantees

(17)

NoSQL Examples

Example: Web log analysis

Each record: UserID, URL, timestamp, additional-info

Task: Load into database system

(18)

NoSQL Examples

Example: Web log analysis

Relational DBMS

1 Data cleaning
2 Data extraction
3 Verification
4 Schema specification

NoSQL

1 Nothing

(19)

NoSQL Examples

Example: Web log analysis

Each record: UserID, URL, timestamp, additional-info

Task: Find all records for:

Given UserID

Given URL

Given timestamp

Certain construct appearing in additional-info

None of the operations requires SQL: “NoSQL”

Highly parallelizable
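A minimal sketch of such lookups as plain record scans. The LogRecord class, the sample records, and the chunking helper are assumptions for illustration. The chunks are scanned one after another here, but each chunk could go to a separate worker because no record depends on another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    user_id: int
    url: str
    timestamp: int
    additional_info: str


def scan(chunk, predicate):
    """Return the records in one chunk that satisfy the predicate."""
    return [record for record in chunk if predicate(record)]


def find_records(records, predicate, num_chunks=2):
    """Split the log into chunks, scan each independently, merge the results."""
    size = max(1, -(-len(records) // num_chunks))  # ceiling division
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    matches = []
    for chunk in chunks:
        matches.extend(scan(chunk, predicate))
    return matches


# Hypothetical sample log.
log = [
    LogRecord(1, "example.org/a", 100, "agent=firefox"),
    LogRecord(2, "example.org/a", 105, "agent=chrome"),
    LogRecord(1, "example.org/b", 110, "agent=firefox"),
]

by_user = find_records(log, lambda r: r.user_id == 1)
by_url = find_records(log, lambda r: r.url == "example.org/a")
by_info = find_records(log, lambda r: "chrome" in r.additional_info)
```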

(20)

NoSQL Examples

Example: Web log analysis

Each record: UserID, URL, timestamp, additional-info

Task: Find all pairs of UserIDs accessing the same URL

Here a self-join is required

But it is very unlikely that we will have such a query at all
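For comparison, the same pairing can be sketched without a query language by grouping the records by URL first. The function name and the (UserID, URL) tuple layout are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def user_pairs_same_url(records):
    """Return all pairs of distinct users that accessed at least one common URL.

    records is an iterable of (user_id, url) tuples.
    """
    users_by_url = defaultdict(set)
    for user_id, url in records:
        users_by_url[url].add(user_id)

    pairs = set()
    for users in users_by_url.values():
        # Every unordered pair within one URL group is a match.
        pairs.update(combinations(sorted(users), 2))
    return sorted(pairs)
```

The grouping step plays the role of the join condition; the number of pairs still grows quadratically with the number of users per URL.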

(21)

NoSQL Examples

Example: Web log analysis

Each record: UserID, URL, timestamp, additional-info

Separate records: UserID, name, age, gender, . . .

Task: Find the average age of users accessing a given URL

SQL-like

Some aspects of NoSQL: do we need consistency?

Statistical operation: e.g. we could sample
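A small sketch of the sampling idea: the average is computed either over all matching accesses or over a random sample of them. The function name, the data layout (a list of (UserID, URL) tuples and a UserID-to-age mapping), and the fixed seed are assumptions for illustration.

```python
import random


def average_age(log, ages_by_user, url, sample_size=None, seed=0):
    """Average age over accesses to url; optionally estimated from a sample.

    A user who accessed the URL several times is counted once per access.
    """
    ages = [ages_by_user[uid] for uid, u in log if u == url and uid in ages_by_user]
    if not ages:
        return None
    if sample_size is not None and sample_size < len(ages):
        # Trade exactness for less work: look at a random subset only.
        ages = random.Random(seed).sample(ages, sample_size)
    return sum(ages) / len(ages)
```

The sampled result is an estimate, which is acceptable when the question is statistical and a small error does not matter.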

(22)

NoSQL Examples

Example: Social network

Each record: UserID1, UserID2

Separate records: UserID, name, age, gender, . . .

Task: Find all friends of given user

Simple operation: we do not need a complicated query language

(23)

NoSQL Examples

Example: Social network

Each record: UserID1, UserID2

Separate records: UserID, name, age, gender, . . .

Task: Find all friends of friends of given user

The number of JOINs increases

(24)

NoSQL Examples

Example: Social network

Each record: UserID1, UserID2

Separate records: UserID, name, age, gender, . . .

Task: Find all women friends of men friends of given user

The number of JOINs increases

(25)

NoSQL Examples

Example: Social network

Each record: UserID1, UserID2

Separate records: UserID, name, age, gender, . . .

Task: Find all friends of friends of friends of . . . friends of given user

The number of JOINs explodes

Graph-based data model

Do we need consistency?
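With a graph-based model, the number of hops becomes a parameter instead of a number of JOINs. The sketch below assumes that each (UserID1, UserID2) record denotes a symmetric friendship; the function name is hypothetical.

```python
def friends_within(friend_pairs, user, hops):
    """Return everyone reachable from user in at most the given number of hops."""
    adjacency = {}
    for a, b in friend_pairs:
        # Assumption: friendship is symmetric, so store both directions.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = {user}
    frontier = {user}
    for _ in range(hops):
        # Expand the frontier by one hop, skipping users already reached.
        frontier = {f for u in frontier for f in adjacency.get(u, ())} - seen
        if not frontier:
            break
        seen |= frontier
    return seen - {user}
```

Here hops=1 gives the friends, hops=2 adds friends of friends, and so on, all with the same loop.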

(26)

NoSQL Examples

Example: Wikipedia

Large collection of documents

Combination of structured and unstructured data

Task: Retrieve introductory paragraph of all pages about U.S. presidents before 1900

Not suitable for SQL systems because of the mix of structured and unstructured data

Do we need consistency?

(29)

NoSQL Approaches

Types of NoSQL systems

Map-Reduce framework

Key-value stores

Document stores

Graph database systems

(30)

NoSQL Approaches

Map-Reduce Framework

Originally from Google

Open source Hadoop

No data model, data stored in files

User provides specific functions

(31)

NoSQL Approaches

Map-Reduce Framework

System provides data processing “glue”, fault-tolerance, scalability

Map and Reduce functions

Map: Divide problem into subproblems
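A single-process sketch of the idea, counting accesses per URL in a web log. The slide text shows only the Map step; the Reduce step here follows the commonly described pattern (combine all values emitted for the same key) and is an assumption, as are all function names. A real framework would run the user functions on many machines and supply the grouping "glue".

```python
from collections import defaultdict


def map_fn(record):
    """User-provided Map: emit (key, value) pairs for one input record."""
    user_id, url, timestamp = record
    yield (url, 1)


def reduce_fn(key, values):
    """User-provided Reduce: combine all values emitted for one key."""
    return (key, sum(values))


def run_map_reduce(records, map_fn, reduce_fn):
    """Stand-in for the framework: apply map, group by key, apply reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())
```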

(32)

NoSQL Approaches

Key-Value Stores

Extremely simple interface

Data model: (key, value) pairs

Operations: Create(key,value), Retrieve(key), Update(key), Delete(key)

CRUD operations
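The interface can be sketched as a thin wrapper around a dictionary. This is an in-memory illustration only, not how any particular system implements it. The class name and the error behavior are assumptions, and update takes the new value as a second argument here, which the operation list above does not spell out.

```python
class KeyValueStore:
    """In-memory sketch of the four CRUD operations on (key, value) pairs."""

    def __init__(self):
        self._data = {}

    def create(self, key, value):
        if key in self._data:
            raise KeyError(f"key already exists: {key!r}")
        self._data[key] = value

    def retrieve(self, key):
        return self._data[key]  # raises KeyError if the key is missing

    def update(self, key, value):
        if key not in self._data:
            raise KeyError(f"no such key: {key!r}")
        self._data[key] = value

    def delete(self, key):
        del self._data[key]  # raises KeyError if the key is missing
```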

(33)

NoSQL Approaches

Key-Value Stores

Implementation: efficiency, scalability, fault-tolerance

Records distributed to nodes based on key

Replication
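One simple way to picture key-based distribution with replication is hash partitioning: hash the key to pick a primary node and place the copies on the following nodes. This is an illustrative scheme under assumed names and parameters; the example systems use their own, more elaborate placement strategies.

```python
import hashlib


def nodes_for_key(key, num_nodes, replicas=2):
    """Return the node indices that store the record for a given key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    primary = int.from_bytes(digest[:8], "big") % num_nodes
    # Place each additional copy on the next node, wrapping around.
    copies = min(replicas, num_nodes)
    return [(primary + i) % num_nodes for i in range(copies)]
```

Because the placement depends only on the key, any client can compute where a record lives without consulting a central directory.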

(34)

NoSQL Approaches

Key-Value Stores

Some allow (non-uniform) columns within value

Some allow Retrieve on range of keys

Example systems

Google BigTable, Amazon Dynamo, Cassandra, Voldemort, HBase, etc.

(35)

NoSQL Approaches

Document Stores

Like Key-Value Stores, except that the value is a document

Data model: (key, document) pairs

Document: JSON, XML, other semistructured formats

Basic operations: Create(key,document), Retrieve(key), Update(key), Delete(key)

(36)

NoSQL Approaches

Document Stores

Also Retrieve based on document contents

Example systems

CouchDB, MongoDB, SimpleDB, etc.
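The difference from a key-value store can be sketched with documents represented as Python dictionaries (as a stand-in for JSON) and a lookup by field values. The class and its find method are hypothetical and do not mirror the API of any of the systems named above.

```python
class DocumentStore:
    """In-memory sketch: (key, document) pairs plus retrieval by contents."""

    def __init__(self):
        self._docs = {}

    def create(self, key, document):
        self._docs[key] = dict(document)

    def retrieve(self, key):
        return self._docs[key]

    def find(self, **criteria):
        """Return the keys of all documents whose fields match the criteria."""
        return sorted(
            key
            for key, doc in self._docs.items()
            if all(f in doc and doc[f] == v for f, v in criteria.items())
        )


store = DocumentStore()
# Documents need not share the same fields (flexible schema).
store.create("u1", {"name": "Alice", "age": 30})
store.create("u2", {"name": "Bob", "city": "Graz"})
store.create("u3", {"name": "Carol", "age": 30, "city": "Graz"})
```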

(37)

NoSQL Approaches

Graph database systems

Data model: nodes and links

Nodes may have properties (including ID)

Links may have labels, roles, or other properties
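A minimal sketch of this data model, with properties on nodes and labels plus properties on links. The class names, the directed-link choice, and the sample data are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Link:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}
        self.links = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_link(self, source, target, label, **properties):
        self.links.append(Link(source, target, label, properties))

    def neighbors(self, node_id, label=None):
        """Targets of outgoing links, optionally restricted to one label."""
        return [
            link.target
            for link in self.links
            if link.source == node_id and (label is None or link.label == label)
        ]


g = Graph()
g.add_node("alice", age=30)
g.add_node("bob", age=25)
g.add_node("tugraz", kind="university")
g.add_link("alice", "bob", "friend", since=2020)
g.add_link("alice", "tugraz", "studies_at")
```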

(38)

NoSQL Approaches

Graph database systems

Interfaces and query languages vary

Single-step versus “path expressions” versus full recursion

Example systems

Neo4j, FlockDB, Pregel, etc.

RDF “triple stores” can map to graph databases
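A sketch of how (subject, predicate, object) triples can be read as labeled links, together with a simple path expression that follows a fixed sequence of predicates. The sample triples and function names are made up for illustration.

```python
def triples_to_graph(triples):
    """Index triples as subject -> predicate -> set of objects."""
    graph = {}
    for subject, predicate, obj in triples:
        graph.setdefault(subject, {}).setdefault(predicate, set()).add(obj)
    return graph


def follow_path(graph, start, predicates):
    """Follow a sequence of predicates from start; return the nodes reached."""
    current = {start}
    for predicate in predicates:
        current = {
            obj
            for node in current
            for obj in graph.get(node, {}).get(predicate, ())
        }
    return current


# Hypothetical sample triples.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "TU Graz"),
]
rdf_graph = triples_to_graph(triples)
```

A single call with one predicate corresponds to a single-step query; a longer predicate list corresponds to a path expression.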
