Agenda
– What is NoSQL
What is NoSQL?
NoSQL Definition
NoSQL DEFINITION (http://nosql-database.org/): Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply, such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data, and more. So the misleading term "nosql" (the community now translates it mostly with "not only sql") should be seen as an alias to something like the definition above.
Who Uses NoSQL?
• Twitter uses FlockDB/MySQL and Cassandra
• Cassandra is an open-source project originally developed at Facebook
• Digg and Reddit use Cassandra
• bit.ly, foursquare, SourceForge and the New York Times use MongoDB
UNDERSTANDING THE
Why SQL sucks..
• O/R mapping (to bridge the object-relational impedance mismatch)
• Data-Model changes are hard and expensive
• SQL databases are designed for high throughput, not
low latency
• SQL databases do not scale out well
• Microsoft, Oracle, and IBM charge big bucks for
databases
– And then you need to hire a database admin
• Look at it from the perspective of Google, Twitter, Facebook and Amazon:
– Your databases are among the biggest in the world and nobody pays you for that feature
What has NoSQL done?
• Implemented the most common use cases
as a piece of software
NoSQL Data Models
• Key-Value
• Document-Oriented
NoSQL Data Model: Document-Oriented
• Data is stored as “documents”
• We are not talking about Word documents
• Comparable to Aggregates in DDD
• Mostly schema-free, structured data
• Can be queried
• Is easily mapped to OO systems (Domain Model, DDD)
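To make the difference between the key-value and document models concrete, here is a minimal in-memory sketch in plain JavaScript (no database; the `kv` map, the `posts` array and all field names are hypothetical). A key-value store can only return a value by its key and treats it as opaque, while a document store can query inside schema-free documents, including embedded arrays that together form one aggregate.

```javascript
// Key-value model: the value is an opaque string; only lookup by key is possible.
const kv = new Map();
kv.set("post:1", JSON.stringify({ title: "Hello", tags: ["nosql"] }));

// Document model: a collection of schema-free documents. The two posts
// do not share the same fields, and comments are embedded (one aggregate).
const posts = [
  {
    _id: 1,
    title: "Hello",
    tags: ["nosql", "mongodb"],
    comments: [{ author: "ann", text: "Nice" }],
  },
  { _id: 2, title: "Draft", published: false }, // no tags, no comments
];

// Query inside the documents: posts tagged "nosql" with at least one comment.
// Optional chaining tolerates documents that lack a field entirely.
const tagged = posts.filter(
  (p) => p.tags?.includes("nosql") && (p.comments?.length ?? 0) > 0
);

// The key-value store can only hand back the whole value, which we must parse.
console.log(JSON.parse(kv.get("post:1")).title);
console.log(tagged.map((p) => p._id));
```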
Network Communications
• REST/JSON
• TCP/BSON (client drivers)
BSON [bee · sahn], short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec (for example, a Date type and a BinData type).
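The sketch below hand-encodes a tiny document following the byte layout in the BSON specification (bsonspec.org): an int32 total length, a list of typed elements, and a trailing 0x00 byte. It is a simplified illustration that assumes only string (type 0x02) and 32-bit integer (type 0x10) values; real drivers support many more types, such as embedded documents, arrays, dates and binary data.

```javascript
// Null-terminated UTF-8 string ("cstring" in the BSON spec), used for field names.
function cstring(s) {
  return Buffer.concat([Buffer.from(s, "utf8"), Buffer.from([0])]);
}

// 32-bit little-endian integer.
function int32(n) {
  const b = Buffer.alloc(4);
  b.writeInt32LE(n, 0);
  return b;
}

// Minimal encoder: supports only string (0x02) and int32 (0x10) values.
function encodeDocument(doc) {
  const elements = [];
  for (const [key, value] of Object.entries(doc)) {
    if (typeof value === "string") {
      const str = cstring(value); // string = int32 byte length (incl. 0x00) + bytes
      elements.push(Buffer.from([0x02]), cstring(key), int32(str.length), str);
    } else if (Number.isInteger(value) && value >= -2147483648 && value <= 2147483647) {
      elements.push(Buffer.from([0x10]), cstring(key), int32(value));
    } else {
      throw new TypeError(`unsupported value for key ${key}`);
    }
  }
  const body = Buffer.concat(elements);
  // Total length = 4-byte length prefix + element list + trailing 0x00.
  return Buffer.concat([int32(4 + body.length + 1), body, Buffer.from([0])]);
}

const bytes = encodeDocument({ hello: "world" });
console.log(bytes.length, bytes.toString("hex"));
// 22 160000000268656c6c6f0006000000776f726c640000
```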
Client Drivers (Apache License)
• MongoDB currently has client support for the following programming languages:
– C
– C++
– Erlang
– Haskell
– Java
– JavaScript
– .NET (C#, F#, PowerShell, etc.)
– Perl
Collections (Table in SQL) vs. Capped Collections
• Collections
– blog.posts
– blog.comments
– forum.users
– etc.
• Capped collections (ring buffer)
– Logging
– Caching
– Archiving
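Capped collections behave like a fixed-size ring buffer: documents stay in insertion order and, once the limit is reached, the oldest ones are overwritten. The class below is a hypothetical in-memory model of that behavior, bounded by document count only. In the mongo shell a capped collection is created with something like `db.createCollection("log", {capped: true, size: 100000})`, where `size` is in bytes and an optional `max` limits the number of documents.

```javascript
// Hypothetical in-memory model of a capped collection, bounded by document count.
class CappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.slots = new Array(maxDocs); // fixed-size storage
    this.next = 0;                   // slot the next insert will (over)write
    this.count = 0;                  // number of occupied slots
  }

  insert(doc) {
    this.slots[this.next] = doc;
    this.next = (this.next + 1) % this.maxDocs; // wrap around: ring buffer
    this.count = Math.min(this.count + 1, this.maxDocs);
  }

  // Return documents in insertion order, oldest first.
  find() {
    const start = this.count < this.maxDocs ? 0 : this.next;
    const result = [];
    for (let i = 0; i < this.count; i++) {
      result.push(this.slots[(start + i) % this.maxDocs]);
    }
    return result;
  }
}

const log = new CappedCollection(3);
for (let i = 1; i <= 5; i++) {
  log.insert({ msg: `event ${i}` });
}
console.log(log.find().map((d) => d.msg)); // [ 'event 3', 'event 4', 'event 5' ]
```

Logging is the classic use case: the collection never grows past its limit, so no cleanup job is needed.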
Indexes
• Every field in the document can be indexed
• Simple indexes: db.cities.ensureIndex({city: 1});
• Compound indexes: db.cities.ensureIndex({city: 1, zip: 1});
• Unique indexes: db.cities.ensureIndex({city: 1, zip: 1}, {unique: true});
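As a rough illustration of what the compound unique index `{city: 1, zip: 1}` enforces, the hypothetical class below keeps one entry per (city, zip) pair, rejects duplicates, and scans entries in ascending city-then-zip order. MongoDB actually stores indexes as B-trees; this in-memory model only mimics the observable behavior.

```javascript
// Hypothetical in-memory model of a compound unique index, all fields ascending.
class CompoundUniqueIndex {
  constructor(fields) {
    this.fields = fields;     // e.g. ["city", "zip"]
    this.entries = new Map(); // serialized key -> document
  }

  keyOf(doc) {
    return JSON.stringify(this.fields.map((f) => doc[f]));
  }

  insert(doc) {
    const key = this.keyOf(doc);
    if (this.entries.has(key)) {
      throw new Error(`duplicate key ${key}`); // what {unique: true} prevents
    }
    this.entries.set(key, doc);
  }

  // Return documents ordered by the first field, then the second, like an index scan.
  scan() {
    return [...this.entries.values()].sort((a, b) => {
      for (const f of this.fields) {
        if (a[f] < b[f]) return -1;
        if (a[f] > b[f]) return 1;
      }
      return 0;
    });
  }
}

const cityIndex = new CompoundUniqueIndex(["city", "zip"]);
cityIndex.insert({ city: "Boston", zip: "02118" });
cityIndex.insert({ city: "Atlanta", zip: "30303" });
cityIndex.insert({ city: "Atlanta", zip: "30301" });

let rejected = false;
try {
  cityIndex.insert({ city: "Atlanta", zip: "30303" }); // same (city, zip) pair
} catch (err) {
  rejected = true;
}
console.log(rejected); // true
console.log(cityIndex.scan().map((d) => `${d.city} ${d.zip}`));
// [ 'Atlanta 30301', 'Atlanta 30303', 'Boston 02118' ]
```

The same city may appear many times; only the full (city, zip) combination has to be unique.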
Relations
• ObjectId
db.users.insert(
{name: "Umbert", car_id: ObjectId("<GUID>")});
• DBRef
db.users.insert(
{name: "Umbert", car: new DBRef("cars", ObjectId("<GUID>"))});
db.users.findOne(
Queries (Regular Expressions)
{field: /regular.*expression/i}
// get all cities that start with "atl" and end with "a" (e.g. Atlanta)
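The comment describes a case-insensitive, anchored pattern. Assuming the intended expression is `/^atl.*a$/i` (our reading of the comment; the original query is not shown), the snippet applies it to a few sample city names in plain JavaScript, which uses the same regular-expression literal syntax as the mongo shell.

```javascript
// Assumed pattern for "starts with atl and ends with a", case-insensitive (i flag).
const pattern = /^atl.*a$/i;

const cities = ["Atlanta", "Atlantic City", "Alta", "ATLANTA", "Boston"];

// Same idea as a hypothetical db.cities.find({city: /^atl.*a$/i}) on a real server.
const matches = cities.filter((name) => pattern.test(name));
console.log(matches); // [ 'Atlanta', 'ATLANTA' ]
```

"Atlantic City" fails the `a$` anchor and "Alta" fails the `^atl` anchor; without the anchors both would match.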
Queries (2): LINQ
https://github.com/craiggwilson/fluent-mongo
Equals
x => x.Age == 21 will translate to {"Age": 21}
Greater Than, $gt:
x => x.Age > 18 will translate to {"Age": {$gt: 18}}
Greater Than Or Equal, $gte:
x => x.Age >= 18 will translate to {"Age": {$gte: 18}}
Less Than, $lt:
x => x.Age < 18 will translate to {"Age": {$lt: 18}}
Less Than Or Equal, $lte:
x => x.Age <= 18 will translate to {"Age": {$lte: 18}}
Not Equal, $ne:
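The translations listed above follow a simple pattern: `==` becomes a plain field match and every other comparison wraps the value in an operator document. The function below is a hypothetical sketch of that mapping (it is not fluent-mongo's implementation), together with a tiny matcher that evaluates a single-field filter against in-memory documents. The `!=` to `$ne` row is included on the assumption that it follows the same pattern as the others.

```javascript
// Comparison operator -> MongoDB query operator.
const OPERATORS = { ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte", "!=": "$ne" };

// Build a filter document such as {"Age": {$gt: 18}} for `x => x.Age > 18`.
function toFilter(field, op, value) {
  if (op === "==") {
    return { [field]: value }; // equality is a plain field match
  }
  const mongoOp = OPERATORS[op];
  if (!mongoOp) {
    throw new Error(`unsupported operator ${op}`);
  }
  return { [field]: { [mongoOp]: value } };
}

// Evaluate a single-field filter against a plain object (small subset of real semantics).
function matches(doc, filter) {
  const [[field, condition]] = Object.entries(filter);
  const actual = doc[field];
  if (condition === null || typeof condition !== "object") {
    return actual === condition;
  }
  const [[mongoOp, expected]] = Object.entries(condition);
  switch (mongoOp) {
    case "$gt": return actual > expected;
    case "$gte": return actual >= expected;
    case "$lt": return actual < expected;
    case "$lte": return actual <= expected;
    case "$ne": return actual !== expected;
    default: throw new Error(`unsupported operator ${mongoOp}`);
  }
}

const people = [{ Name: "Ann", Age: 17 }, { Name: "Bob", Age: 21 }, { Name: "Cy", Age: 18 }];
const adults = toFilter("Age", ">=", 18);
console.log(JSON.stringify(adults)); // {"Age":{"$gte":18}}
console.log(people.filter((p) => matches(p, adults)).map((p) => p.Name)); // [ 'Bob', 'Cy' ]
```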
Atomic Operations (Optimistic Locking)
• Update if current:
– Fetch the object.
– Modify the object locally.
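"Update if current" is a compare-and-set: after fetching and modifying the object locally, the client sends an update whose query also matches the old value, so the write only succeeds if nobody changed the document in between (on a real server this means putting the old field values into the update's query). The sketch below simulates the pattern in memory; the `InventoryCollection` class and its `updateIfCurrent` method are hypothetical, and the conditional-update step is our completion of the steps listed above.

```javascript
// Hypothetical single-process stand-in for a collection. Each call to
// updateIfCurrent is atomic here because JavaScript runs it to completion.
class InventoryCollection {
  constructor(docs) {
    this.docs = new Map(docs.map((d) => [d._id, { ...d }]));
  }

  findOne(id) {
    const doc = this.docs.get(id);
    return doc ? { ...doc } : null; // hand out a copy, like a driver would
  }

  // Replace the document only if the stored copy still equals `expected`.
  // (Field order matters for this simple JSON comparison.)
  // Returns the number of documents updated: 0 or 1.
  updateIfCurrent(expected, replacement) {
    const current = this.docs.get(expected._id);
    if (!current || JSON.stringify(current) !== JSON.stringify(expected)) {
      return 0; // modified concurrently: caller should re-fetch and retry
    }
    this.docs.set(expected._id, { ...replacement });
    return 1;
  }
}

const inventory = new InventoryCollection([{ _id: 1, sku: "abc", qty: 5 }]);

// Step 1: two clients fetch the same document.
const seenByA = inventory.findOne(1);
const seenByB = inventory.findOne(1);

// Step 2: each modifies its copy locally.
const updatedA = { ...seenByA, qty: seenByA.qty - 1 };
const updatedB = { ...seenByB, qty: seenByB.qty - 2 };

// Step 3: conditional update. A wins; B's expected version is stale.
const nA = inventory.updateIfCurrent(seenByA, updatedA);
const nB = inventory.updateIfCurrent(seenByB, updatedB);
console.log(nA, nB, inventory.findOne(1).qty); // 1 0 4
```

No lock is held between fetch and update; a losing client simply retries, which is cheap when conflicts are rare.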
Atomic Operations: Sample
> t=db.inventory
> s = t.findOne({sku:'abc'})
{"_id" : "49df4d3c9664d32c73ea865a" , "sku" : "abc" , "qty" : 1}
> t.update({sku:"abc",qty:{$gt:0}}, { $inc : { qty : -1 } } ) ;
> db.$cmd.findOne({getlasterror:1})
{"err" : null , "updatedExisting" : true , "n" : 1 , "ok" : 1} // it worked
> t.update({sku:"abcz",qty:{$gt:0}}, { $inc : { qty : -1 } } ) ;
> db.$cmd.findOne({getlasterror:1})
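The transcript relies on the query and the modifier being applied together as one atomic operation on the server: `{sku: "abc", qty: {$gt: 0}}` only matches while stock remains, so concurrent decrements cannot push `qty` below zero, and `n` reports whether anything was updated. The function below is a hypothetical in-memory model of that single-document update; it mimics the `n` and `updatedExisting` fields of the getlasterror reply but is not the server's code.

```javascript
// In-memory stand-in for the inventory collection used in the transcript.
const inventoryDocs = [{ _id: "49df4d3c9664d32c73ea865a", sku: "abc", qty: 1 }];

// Model of t.update({sku, qty: {$gt: 0}}, {$inc: {qty: delta}}): find the first
// matching document and increment it in one step, with nothing in between.
function decrementIfInStock(docs, sku, delta) {
  const doc = docs.find((d) => d.sku === sku && d.qty > 0);
  if (!doc) {
    return { updatedExisting: false, n: 0 };
  }
  doc.qty += delta;
  return { updatedExisting: true, n: 1 };
}

const first = decrementIfInStock(inventoryDocs, "abc", -1);  // matches: qty 1 -> 0
const second = decrementIfInStock(inventoryDocs, "abc", -1); // qty is 0, no match
const third = decrementIfInStock(inventoryDocs, "abcz", -1); // unknown sku, no match
console.log(first, second, third);
```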
Atomic Operations: multiple items
db.products.update(
{cat: "boots", $atomic: 1},
{$inc: {price: 10.0}},
false, //no upsert
Replica set (1)
• Automatic failover
• Automatic recovery of servers that were offline
• Distribution over more than one data center
• Automatic election of a new primary (master) server in case of a failure
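Automatic failover works because the surviving members elect a new primary, and only a strict majority of the replica set may do so; this prevents two primaries from existing on both sides of a network partition. The function below is a deliberately simplified, hypothetical model of that rule (highest priority among reachable members, and only with a majority). MongoDB's real election protocol also considers factors such as how up to date each member's data is.

```javascript
// Simplified model: members = [{ name, reachable, priority }].
// Returns the name of the new primary, or null if no primary can be elected.
function electPrimary(members) {
  const reachable = members.filter((m) => m.reachable);
  // A strict majority of the whole set must be able to vote.
  if (reachable.length * 2 <= members.length) {
    return null; // no member can accept writes until a majority is back
  }
  // Priority-0 members may vote but are never eligible to become primary.
  const candidates = reachable.filter((m) => m.priority > 0);
  if (candidates.length === 0) {
    return null;
  }
  candidates.sort((a, b) => b.priority - a.priority);
  return candidates[0].name;
}

const replicaSet = [
  { name: "db1", reachable: false, priority: 2 }, // the old primary has failed
  { name: "db2", reachable: true, priority: 1 },
  { name: "db3", reachable: true, priority: 0.5 },
];
const afterFirstFailure = electPrimary(replicaSet);
console.log(afterFirstFailure); // db2

// If db2 also fails, only 1 of 3 members is left: no primary is elected.
replicaSet[1].reachable = false;
const afterSecondFailure = electPrimary(replicaSet);
console.log(afterSecondFailure); // null
```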
Mongo Sharding
• Partitioning data across multiple physical servers to provide application scale-out
• Can distribute databases, collections or objects in a collection
• Choose how you partition data (shard key)
• Balancing, migrations and management are all automatic
• Range-based
• Can convert from a single master to a sharded system with zero downtime
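With range-based sharding, the shard key space is split into contiguous chunks, each owned by one shard, and the `mongos` router sends each document or query to the shard whose range contains the key. Below is a hypothetical sketch with a numeric shard key `userId` and a hard-coded chunk table; in a real cluster this metadata lives on the config servers and the balancer moves chunks between shards automatically.

```javascript
// Hypothetical chunk table for shard key { userId: 1 }: [min, max) ranges,
// sorted and contiguous. -Infinity / Infinity stand in for MinKey / MaxKey.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 5000, shard: "shardB" },
  { min: 5000, max: Infinity, shard: "shardC" },
];

// Route a single shard-key value to the shard that owns its range (binary search).
function routeToShard(key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = chunks[mid];
    if (key < c.min) hi = mid - 1;
    else if (key >= c.max) lo = mid + 1;
    else return c.shard;
  }
  throw new Error(`no chunk covers key ${key}`);
}

// A range query only needs the shards whose chunks overlap [from, to).
function shardsForRange(from, to) {
  const hits = chunks.filter((c) => c.min < to && from < c.max).map((c) => c.shard);
  return [...new Set(hits)];
}

console.log(routeToShard(42), routeToShard(1000), routeToShard(99999)); // shardA shardB shardC
console.log(shardsForRange(900, 1200)); // [ 'shardA', 'shardB' ]
```

Ranges keep nearby keys together, so range queries touch few shards; the trade-off is that monotonically increasing keys all land in the last chunk.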
Map Reduce
http://www.joelonsoftware.com/items/2006/08/01.html
• It is a two-step calculation where one
step (map) is used to simplify the data, and the
second step (reduce) is used to summarize the
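The two steps can be sketched in plain JavaScript: `map` emits key/value pairs, the pairs are grouped by key, and `reduce` folds each group into one summary value. This is a hypothetical single-process illustration (the `mapReduce` helper is ours, not MongoDB's) computing a grouped sum in the spirit of the LINQ examples that follow. MongoDB may call reduce more than once per key on partial results, so a reduce function should be associative and commutative and return the same shape as the emitted values, as this one does.

```javascript
// Generic map-reduce over an in-memory array.
// map(doc, emit) calls emit(key, value) zero or more times;
// reduce(key, values) folds the values for one key into a single value.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit)); // step 1: map (simplify)
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce(key, values);   // step 2: reduce (summarize)
  }
  return result;
}

const people = [
  { LastName: "Smith", Age: 30 },
  { LastName: "Stone", Age: 25 },
  { LastName: "Brown", Age: 40 },
];

// Sum of ages grouped by the first letter of the last name.
const ageSumByLetter = mapReduce(
  people,
  (p, emit) => emit(p.LastName[0], p.Age),
  (letter, ages) => ages.reduce((a, b) => a + b, 0)
);
console.log(ageSumByLetter); // { S: 55, B: 40 }
```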
Map Reduce using LINQ
https://github.com/craiggwilson/fluent-mongo/wiki/Map-Reduce
• LINQ is by far an easier way to compose map-reduce functions.
// Compose a map reduce to get the sum of everyone's ages.
var sum = collection.AsQueryable().Sum(x => x.Age);
// Compose a map reduce to get the age range of everyone grouped by the first letter of their last name.
var ageRanges =
    from p in collection.AsQueryable()
    group p by p.LastName[0] into g
    select new
    {
        FirstLetter = g.Key,
        AverageAge = g.Average(x => x.Age),
        MinAge = g.Min(x => x.Age),
Storing Large Files: GridFS
• The database supports native storage of binary data within BSON objects (documents are limited in size to 4–16 MB, depending on the MongoDB version).
• GridFS is a specification for storing large files in MongoDB
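GridFS works around the per-document size limit by splitting a file into chunks: one metadata document in the `fs.files` collection and one document per chunk in `fs.chunks`, each holding `files_id`, a sequence number `n`, and the binary `data`. The sketch below models that layout in memory with a tiny chunk size so the result is easy to read; the `storeFile` and `readFile` helpers and the numeric id scheme are hypothetical, and real drivers use a much larger default chunk size.

```javascript
// In-memory stand-ins for the fs.files and fs.chunks collections.
const files = [];
const chunksCollection = [];

// Split `data` (a Buffer) into chunks of at most `chunkSize` bytes.
function storeFile(filename, data, chunkSize) {
  const fileId = files.length + 1; // hypothetical id scheme; GridFS uses ObjectIds
  files.push({ _id: fileId, filename, length: data.length, chunkSize });
  for (let n = 0; n * chunkSize < data.length; n++) {
    chunksCollection.push({
      files_id: fileId,
      n, // chunk sequence number, starting at 0
      data: data.subarray(n * chunkSize, (n + 1) * chunkSize),
    });
  }
  return fileId;
}

// Reassemble a file by concatenating its chunks in order of n.
function readFile(fileId) {
  const parts = chunksCollection
    .filter((c) => c.files_id === fileId)
    .sort((a, b) => a.n - b.n)
    .map((c) => c.data);
  return Buffer.concat(parts);
}

const id = storeFile("hello.txt", Buffer.from("hello gridfs"), 5);
console.log(chunksCollection.map((c) => c.data.toString())); // [ 'hello', ' grid', 'fs' ]
console.log(readFile(id).toString()); // hello gridfs
```

Because each chunk is an ordinary document, a client can also fetch only the chunks covering a byte range instead of the whole file.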
Performance
On MySQL, SourceForge was reaching the limits of
performance at its current user load. Using some of
the easy scale-out options in MongoDB, they fully
replaced MySQL and found MongoDB could handle
the current user load easily. In fact, after some
testing, they found their site can now handle 100
times the number of users it currently supports.
It means you can charge a lot less per user of
Performance (2)
http://www.michaelckennedy.net/blog/2010/04/29/MongoDBVsSQLServer2008PerformanceShowdown.aspx