The MongoDB Tutorial Introduction for MySQL Users. Stephane Combaudon April 1st, 2014

(1)

The MongoDB Tutorial

Introduction for MySQL Users

Stephane Combaudon

(2)

Agenda

• _Introduction

• _{Install & First Steps}

• _CRUD

• _{Aggregation Framework} • _{Performance Tuning}

• _{Replication and High Availability} • _Sharding

(3)

Before we start

• _{No knowledge of MongoDB is required} • _{Several interesting topics will}

intentionally be left out

• _{Any problem, any question? Raise}

your hand!

(4)

Agenda

• _Introduction

• _CRUD

(5)

Typical problems with RDBMS

• _{SQL is difficult to understand}

• _{Changing the schema is difficult with}

large tables

– _{aka “The relational model is not flexible”}

• _{Scaling out and HA are difficult}

– _{Support for sharding is limited or}

non-existent

– _{Many HA solutions, which one to choose?}

(6)

MongoDB solutions (1)

● “SQL is difficult to understand”

– Everything is a JSON document in MongoDB – No joins but a powerful way to write complex

queries (aggregation framework)

– Queries are easy to read/write

● SQL: SELECT * FROM people

WHERE name = 'Stephane'

● MongoDB: db.people.find({name:'Stephane'})

(7)

MongoDB solutions (2)

● “The relational model is not flexible”

– MongoDB is schemaless, no ALTER TABLE! db.people.insert({name: 'Stephane'});

db.people.insert({name: 'Joe',age:30});

● But be careful, this is also allowed

db.people.insert({name: 'Stephane') db.people.insert({n: 'Joe'})

(8)

MongoDB solutions (3) ● “Scaling and HA are difficult”

8 M Shard 1 ... Application mongos Shard N mongod Replica set ... mongos Application mongod mongod Config servers Shard 2

(9)

Summary 9 Functionality Scalability & Performance Memcached RDBMS MongoDB

(10)

Agenda

• _Introduction

• _CRUD

(11)

MongoDB setup

11

(12)

Useful terminology

12

Relational MongoDB Database Database Table Collection

(13)

JSON

• _{Stands for JavaScript Object Notation}

– But not tied to Javascript

• _{Open standard for human-readable}

data interchange

• _{Lightweight alternative to XML}

• _{Internally, docs are stored in BSON}

– Binary JSON

– See http://bsonspec.org

(14)

Invoking the shell

$ mongo (--port=xxx, default 27017)

(15)

Inserting data (1)

use test

db.people.insert({name:'Stephane',country:'FR'})

• _{This inserts a document}

– In the people collection

– Collection is in the test database

• _{Collection is created if it did not exist}

(16)

Inserting data (2)

• _{The shell embeds a JS interpreter}

– You can also use a variable to build

the JSON document step by step

(17)

_id

• _{Unique identifier of a doc} • _{You can set it explicitly}

db.people.insert({_id:'Stephane',country:'FR'})

• _{If you don't, it will be created for you} "_id" : ObjectId("523ef7bf8108101415e7d1d1")

• _{Monotonically increasing on a single}

node

– Think auto_increment in MySQL

(18)

Structure of a document

• _{Set of key/value pairs}

– _{{'key1':'value1','key2':'value2',...}}

• _{A value can be an array}

• _{A value can be another document}

(19)

Lab #1

• _{See lab1.rst}

(20)

Agenda

• _Introduction

• _CRUD

(21)

Inserting a document

• _SQL

– _{INSERT INTO t (fn, ln) VALUES ('John', 'Doe')}

• _MongoDB

– _{db.t.insert({fn:'John', ln: 'Doe'})}

• _{No need to specify non-existing fields}

(22)

Updating a document (1)

• _SQL

– _{UPDATE t SET ln='Smith' WHERE ln='Doe'}

• _MongoDB

– _{db.t.update({ln:'Doe'},{$set:{ln:'Smith'}})}

• _{Only 1 doc. is updated by default}

– Specify _{multi:true} for multi-doc

updates

– _{db.t.update({ln:'Doe'},{$set:{ln:'Smith'}},}

{multi:true})

(23)

Updating a document (2)

• _{To add a new field}

– No need to run ALTER TABLE! – It is a regular update

– _{db.t.update({ln:'Smith'},{$set:{age:30}})}

(24)

Removing documents

• _SQL

– _{DELETE FROM t WHERE fn='John'}

• _MongoDB

– _{db.t.remove({fn:'John'})}

(25)

Dropping a collection • _SQL – _{DROP TABLE t} • _MongoDB – _db.t.drop() 25

(26)

Selecting all documents • _SQL – _{SELECT * FROM t} • _MongoDB – _db.t.find() • _SQL

– _{SELECT id, name FROM t}

• _MongoDB

– _{db.t.find({},{name:1})}

(27)

Specifying a “WHERE” clause

• _SQL

– _{SELECT * FROM t WHERE fn='John'}

• _MongoDB

– _{db.t.find({ln:'John'})}

• _SQL

– _{SELECT id, age FROM t WHERE fn='John' AND ln='Smith'}

• _MongoDB

– _{db.t.find({fn:'John',ln:'Smith'},{age:1})}

(28)

Sorts, limits

• _SQL

– _{SELECT country FROM t}

WHERE fn='John'

ORDER BY age DESC LIMIT 10 • _MongoDB – _db.t.find( {fn:'John'}, {country:1, _id:0} ).sort({age:1}).limit(10) 28

(29)

Inequalities • _SQL – _{SELECT fn,ln FROM t} WHERE age > 30 • _MongoDB – _db.t.find( {age:{$gt:30}}, {fn:1, ln:1, _id:0} ) 29

(30)

Conditions on subdocuments { _id:... fn:'John', ln:'Doe' address:{ country:'US', state:'CA' } } db.t.find( {address.country:'US'}, {fn:1, ln:1, _id:0} ) 30

(31)

Conditions on arrays

{

_id:...

hobbies:['music', 'MongoDB', 'Python'] }

• _{Exact match on array}

db.t.find({hobbies:['music', 'MongoDB', 'Python']}) • _{Match on array element}

db.t.find({hobbies:'Python'})

(32)

Counting

• _SQL

– _{SELECT COUNT(*) FROM t}

WHERE country='US'

• _MongoDB

– _{db.t.count({country:'US'})}

(33)

Lab #2

• _{See lab2.rst file}

(34)

Agenda

• _Introduction

• _CRUD

(35)

Aggregation

• _{“Means” GROUP BY, SUM(), …}

• _{Not possible with the operators we}

saw previously (exception: count)

• _{2 ways to aggregate}

– Map-Reduce # Not covered here – Aggregation Framework

(36)

Aggregation Framework

• _{Available from MongoDB 2.2}

• _{Easier and faster than Map-Reduce} • _{Some limitations}

– AF is perfect for simple cases

– Map-Reduce can be needed for

complex situations

(37)

10,000 feet overview

• _{Documents are modified through}

pipelines

• _{Similar to Unix pipelines}

grep oo /etc/passwd | sort -rn | awk -F ':' '{print $1,$3,$4}'

37

Stage 1 Stage 2 Stage 3

(38)

First example

• _SQL

SELECT fn,count(*) AS total FROM people WHERE fn >= 'M%' GROUP BY fn

• _MongoDB

db.people.aggregate({$match:{fn:{$gte:'M'}}},{$group: {_id:"$fn",total:{$sum:1}}})

• _{You probably want some explanation!}

(39)

First example, explained

db.people.aggregate(

{$match:XXX}, # Pipeline 1, filtering criteria {$group:XXX} # Pipeline 2, group by

)

• _{Filtering condition like with find()}

$match:{fn:{$gte:'M'}}

• _{Specifying the grouping field}

$group:{_id:"$fn",...}

• _{Specifying the aggregated fields}

$group:{..., total:{$sum:1}}

(40)

Pipeline order matters db.people.aggregate( {$match:...}, {$group:...} ) vs db.people.aggregate( {$group:...}, {$match:...} ) 40

(41)

SQL analogy

SELECT … FROM people WHERE …

GROUP BY … vs

SELECT … FROM people GROUP BY …

HAVING …

(42)

Clean the output

• _{The projecting operator allows}

renaming keys/values

db.people.aggregate( {$match:...},

{$group:...},

{$project:{_id:0, name:{$toUpper:”$_id”}, total:1}} )

(43)

Final query

• _SQL

SELECT UPPER(fn) AS name, count(*) AS total FROM people WHERE fn >= 'M%' GROUP BY fn • _MongoDB db.people.aggregate( {$match:{fn:{$gte:'M'}}}, {$group:{_id:"$fn",total:{$sum:1}}},

{$project:{_id:0, name:{$toUpper:”$_id”}, total:1}} )

(44)

Other operators • _$sort – _{{$sort: {total: -1}}} • _$limit – _{{$limit: 10}} • _$skip – _{{$skip: 100}} • _$unwind

– Splits a document with an array

into multiple documents

(45)

Lab #3

• _{See lab3.rst}

(46)

Agenda

• _Introduction

• _CRUD

(47)

Indexes

• _{MongoDB supports indexes}

– At the collection level

– Similar to indexes on RDBMS

• _{Can be used for}

– More efficient filtering – More efficient sorting

– Index-only queries (covering index)

(48)

Main types of indexes

• _{Single-field/multiple field index}

– _{db.t.ensureIndex({fn:1})}

– _{db.t.ensureIndex({fn:1, age:-1})}

• _{Unique index}

– All collections have an index on _id

– _{db.t.ensureIndex({username:1}, {unique:true})}

• _{Indexes on arrays, embedded fields,}

subdocuments are also supported

(49)

Compound indexes (1)

• _{Sort order}

– Traversal in either direction

db.t.find().sort({country:1,age:-1}) – Good indexes ● {country:1, age:-1} ● {country:-1, age:1} – Bad index ● {country:1, age:1} 49

(50)

Compound indexes (2)

• _Prefixes

– Leftmost prefixes are supported

db.t.ensureIndex({age:1,country:1})

– Good for

● db.t.find({age:30})

● db.t.find({age:30,country:'US'}) – Not good for

● db.t.find({country:'US'})

(51)

Lab #4

• _{See lab4.rst}

(52)

Agenda

• _Introduction

• _CRUD

(53)

Overview

• _{Replication is asynchronous} • _{All writes go to the master} • _{Secondary can accept reads}

• _{A new master is elected if the current}

master fails

• _{An arbiter (no data) can be set up for}

the election process

(54)

54 Primary Secondary Heartbeat Writes Secondary

(55)

Setup of a 3 node replica set

• _{Start 3 instances with}_{--replSet rsname} • _{Then on the master}

rsconf = { _id: "RS", members: [{ _id: 0, host: "localhost:30001" }] } rs.initiate(rsconf) rs.add("localhost:30002") rs.add("localhost:30003") 55

(56)

If the master crashes

• _{If a heartbeat does not return within}

10s, the master is considered unavailable

• _{Election of a new master starts then} • _{You can influence the choice by}

setting a priority (between 0 and 100)

(57)

Setting priorities

cfg = rs.conf()

cfg.members[1].priority = 0 cfg.members[2].priority = 10 rs.reconfig(cfg)

• _{A node set to have higher priority}

than the master will be elected the new master

(58)

Write concerns (aka w option)

• _{How many nodes should ack a write?}

– w=1

– Primary only (default setting)

– w=2

– Primary and one secondary – w=majority

– Majority of the nodes

(59)

Setting a write concern cfg = rs.conf() cfg.settings.getLastErrorDefaults = {w: "majority"} rs.reconfig(cfg) 59

(60)

Read preferences • _{Can be any of} – primary – primaryPreferred – secondary – secondaryPreferred – nearest

– Or you can use a custom tag

(61)

Lab #5

• _{See lab5.rst}

(62)

Agenda

• _Introduction

• _CRUD

(63)

When to shard

• _{When the dataset no longer fits in a}

single server

• _{When write activity exceeds the}

capacity of a single node

• _{When the working set no longer fits in}

memory

(64)

Architecture of a sharded cluster 64 M Shard 1 ... Application mongos Shard N mongod Replica set ... mongos mongod mongod Config servers Shard 2 Application

(65)

Setup (1)

• _{Start the config servers}

– _{mongod --configsvr --dbpath <path> --port 40001} – _{mongod --configsvr --dbpath <path> --port 40002} – _{mongod --configsvr --dbpath <path> --port 40003}

• _{Start a router}

– _{mongos –configdb localhost:40001,}

localhost:40002, localhost:40003 --port 31000

• _{Connect to mongos}

– _{mongo --host localhost --port 31000}

(66)

Setup (2)

• _{Add shards}

– _{sh.addShard('RS/localhost:30001')}

• _{Enable sharding for a database}

– _{sh.enableSharding('test')}

• _{Enable sharding for a collection}

– _{sh.shardCollection('test.people',{'fn':1})}

(67)

Balancing (1)

• _{Data is stored in chunks}

sh.addShard(shard1) 67 Shard 1 -∞ 32 33 45 46 82 83 +∞

(68)

Balancing (2) 68 sh.addShard(shard2) Shard 1 -∞ 32 46 82 Shard 2 33 45 83 +∞

(69)

Balancing (3) 69 sh.addShard(shard3) Shard 1 -∞ 32 Shard 2 33 45 83 +∞ Shard 3 46 82

(70)

Lab #6

• _{See lab6.rst}

(71)

Thank you for your attention!

[email protected]