noSQL and
Non-Relational Databases
Matthias Lee
What NoSQL?
Yes, no SQL...
At least, not only SQL
Large class of non-relational databases
Trading consistency for availability
Easily Scalable (Partitioning)
Highly fault tolerant
Google, Facebook, Amazon, Twitter et al.
What? Why Non-Relational?
No complicated Relationships
Schema light/free
Fewer interdependencies
Easier scaling
Higher fault tolerance
Distributed Computing
Store and search
Hashing & MapReduce
CAP Theorem
Consistency
Availability
Partition tolerance
Choose 2 and work around the other.
Eric Brewer, UC Berkeley
Dropping some of ACID
Compromises must be made
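A minimal sketch of the "choose 2" trade-off, assuming a toy store with two replicas. The class names and the "CP"/"AP" mode strings are hypothetical, picked only for illustration. During a network partition the CP store refuses writes (it gives up availability), while the AP store accepts them and lets the replicas disagree (it gives up consistency).

```python
class Replica:
    """One node holding a copy of the data."""

    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store; mode is 'CP' or 'AP' (hypothetical names)."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True when a and b cannot talk to each other

    def write(self, key, value):
        if not self.partitioned:
            # Healthy network: both replicas receive the write.
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.mode == "CP":
            # Stay consistent: refuse the write, giving up availability.
            return False
        # AP: stay available, accept on one side; the replicas now disagree.
        self.a.data[key] = value
        return True


cp, ap = TwoNodeStore("CP"), TwoNodeStore("AP")
for store in (cp, ap):
    store.write("x", 1)
    store.partitioned = True  # simulate the network split

cp_ok = cp.write("x", 2)
ap_ok = ap.write("x", 2)
print(cp_ok, cp.a.data == cp.b.data)  # False True
print(ap_ok, ap.a.data == ap.b.data)  # True False
```

Real systems sit somewhere between these two extremes; the sketch only shows why a partition forces a choice.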
ACID
Atomicity – all or nothing? Partial
Consistency – "eventually consistent"
Isolation – "revision" history
Durability – "written in stone" sometimes
BASE
Basically Available
Soft State
Eventual consistency
ASYNC conflict resolution and repair
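One way to picture asynchronous conflict resolution is a last-write-wins merge. This is a sketch under assumptions: the slides do not name a strategy, and the function name and the logical timestamps are hypothetical. Each replica stores a (timestamp, value) pair per key, and a repair pass keeps the newest pair.

```python
def merge_last_write_wins(replica_a, replica_b):
    """Return the repaired state: per key, keep the newest (timestamp, value)."""
    merged = dict(replica_a)
    for key, (ts, value) in replica_b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# Each replica maps key -> (logical timestamp, value); they have drifted apart.
replica_a = {"name": (1, "Joe"), "zip": (3, "32818")}
replica_b = {"name": (2, "Joseph"), "street": (1, "Cedar")}

repaired = merge_last_write_wins(replica_a, replica_b)
replica_a = replica_b = repaired  # after the repair pass both replicas agree
print(repaired)
```

Until the repair runs, readers may see either version of "name". That window is the "soft state" part of BASE.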
ID            Content
<unique id>   Key1: Value, Key2: Value, Key3: Value, Key4: Value
<unique id>   Key2: Value, Key3: Value
<unique id>   Key8: Value, Key9: Value, Key5: Value, Key4: Value
<unique id>   Key1: Value, Key2: Value, Key3: Value, Key4: Value, Key5: Value, Key6: Value, Key7: Value, Key8: Value
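The table above can be sketched as plain dictionaries, where each record carries only the keys it needs. The ids, the values and the helper name find_with_key are made up for illustration.

```python
# Each record has its own set of keys; there is no shared schema.
records = {
    "id-1": {"Key1": "a", "Key2": "b", "Key3": "c", "Key4": "d"},
    "id-2": {"Key2": "e", "Key3": "f"},
    "id-3": {"Key8": "g", "Key9": "h", "Key5": "i", "Key4": "j"},
}


def find_with_key(records, key):
    """Return ids of records that contain `key`, skipping those that lack it."""
    return sorted(rid for rid, content in records.items() if key in content)


print(find_with_key(records, "Key4"))  # ['id-1', 'id-3']
```

Queries have to tolerate missing keys, which is the price of a schema-free layout.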
Highlights
Fast processing/specific tasks
Usage of distributed queries and operations
MapReduce/Hadoop
Async reads and writes
Fire & Forget
Flexible schema (often)
And it's scalable / easy replication
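"Fire & Forget" can be sketched with a queue and a background worker. This is a toy model and not any particular database's client API: the caller queues a write and returns immediately, and the write lands later. The names store, pending and fire_and_forget are hypothetical.

```python
import queue
import threading

store = {}               # stands in for the database
pending = queue.Queue()  # writes waiting to be applied


def writer():
    """Background worker: applies queued writes one at a time."""
    while True:
        key, value = pending.get()
        store[key] = value
        pending.task_done()


threading.Thread(target=writer, daemon=True).start()


def fire_and_forget(key, value):
    """Queue the write and return immediately, without waiting for it to land."""
    pending.put((key, value))


for i in range(5):
    fire_and_forget(f"k{i}", i)

pending.join()  # only for the demo: wait until all queued writes are applied
print(store)
```

Without the final join() a reader could query store before a write has landed, which is the consistency cost of async writes.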
Things these DBs do... Easily
Distributed storage (performance/fault tolerance)
Improved response time and fault tolerance
[Diagram: User, Request, Master, Slave 1 to Slave 4]
Distributed storage (locality)
[Diagram: Server 1 to Server 4 placed across North America, Europe and Australia]
Issues and challenges
ACID goes out of the window
No direct translation between SQL and noSQL
Relatively new field
Many similar solutions
All solutions have different trade-offs
Things these DBs *don't* do... Easily
Querying noSQL
Map/Reduce
Various query languages
rasql – rasdaman
CQL – Cassandra
noSQL means mostly no SQL
Easily distributed method of processing data
Fault tolerant
map() and reduce()
Input reader/partitioner
map()
Sort and partition
reduce()
Output writer
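The five stages above can be sketched in a single process as a word count. This is a toy under stated assumptions: a real framework such as Hadoop runs map() and reduce() on many machines, and the function names here are hypothetical.

```python
from collections import defaultdict


def read_chunks(text, n_chunks):
    """Input reader/partitioner: split the lines into n_chunks chunks."""
    lines = text.splitlines()
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_fn(chunk):
    """map(): emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped):
    """Sort and partition: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_fn(key, values):
    """reduce(): combine all values for one key."""
    return key, sum(values)


def map_reduce(text, n_chunks=2):
    chunks = read_chunks(text, n_chunks)
    mapped = [map_fn(c) for c in chunks]  # each chunk could run on its own node
    grouped = shuffle(mapped)
    # Output writer: collect the reduced pairs into one result.
    return dict(reduce_fn(k, v) for k, v in grouped.items())


result = map_reduce("no sql\nnot only sql\nno problem")
print(result)
```

Because map_fn only sees its own chunk and reduce_fn only sees one key, both stages can be spread over many nodes without coordination.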
Map/Reduce
INPUT: Chunk_1 Chunk_2 Chunk_3 Chunk_4 Chunk_5 Chunk_6 Chunk_7 Chunk_8
map(): map_out_1 map_out_2 map_out_3 map_out_4 map_out_5 map_out_6 map_out_7 map_out_8
sorting
reduce(): red_out_1 red_out_2 red_out_3
magic
Types of noSQL DBs
Document Databases
Key/Value Stores
Array Databases
Column-Oriented Datastores
Graph Databases (they exist)
Column Families
[Diagram: column families "Name" and "Location"]
Column-Oriented Datastores
Indexing over Column families
Fast aggregation & searching
Inline compression
Easy sharding
id Fname Lname Zip Street
1 Joe Shmoe 32818 Cedar
2 Ralph Peters 65636 Birch
3 Mary Lewis 10337 Green
Joe,Ralph,Mary
Shmoe,Peters,Lewis
32818,65636,10337
Cedar,Birch,Green
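The row table and its column layout above can be reproduced with a small transformation. A minimal sketch, assuming in-memory lists; the helper name to_columns is hypothetical and real column stores add compression and indexing on top.

```python
rows = [
    {"id": 1, "Fname": "Joe", "Lname": "Shmoe", "Zip": "32818", "Street": "Cedar"},
    {"id": 2, "Fname": "Ralph", "Lname": "Peters", "Zip": "65636", "Street": "Birch"},
    {"id": 3, "Fname": "Mary", "Lname": "Lewis", "Zip": "10337", "Street": "Green"},
]


def to_columns(rows):
    """Turn a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)
print(",".join(columns["Fname"]))  # Joe,Ralph,Mary

# Searching one field scans a single contiguous list, not every full row.
matches = [i for i, z in enumerate(columns["Zip"]) if z.startswith("3")]
print(matches)  # [0]
```

Values in one column share a type and often repeat, which is why inline compression and fast aggregation come naturally to this layout.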
Column-Oriented Datastores
BigTable and its clones
HBase
Facebook, Hulu and StumbleUpon
Hypertable
Baidu and Rediff
Key/Value Stores (simple)
Some of the earliest noSQL systems (early '90s)
Easily distributed storage and searching
Hashtable-like structure
MapReduce
Often used as caching engine
O(1) average lookup time
[hash] : bytes[N]
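A minimal sketch of the "[hash] : bytes[N]" idea, assuming an in-memory store split into a fixed number of shards. The class name ShardedKV and the choice of SHA-256 are illustrative assumptions. The hash of the key decides which shard holds the value, so a lookup touches one shard and takes O(1) time on average.

```python
import hashlib


class ShardedKV:
    """Toy key/value store: the key's hash picks which shard holds the bytes."""

    def __init__(self, n_shards=4):
        # Each dict stands in for one storage node.
        self.shards = [{} for _ in range(n_shards)]

    def _shard_for(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key, value: bytes):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)


kv = ShardedKV()
kv.put("user:1", b"Joe")
kv.put("user:2", b"Ralph")
print(kv.get("user:1"))  # b'Joe'
```

Plain modulo sharding reshuffles most keys when the shard count changes; production systems usually use consistent hashing to limit that.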
Document stores (mostly structured K,V store)
MongoDB
Foursquare, Shutterfly, Intuit, GitHub & more
CouchDB
BBC, Canonical, CERN, Android apps & more
Redis
Digg, Flickr, StackOverflow, Craigslist & more
Key/Value Stores
Berkeley DB
MySQL, Bitcoin, MemcacheDB, SVN & more
Redis
Digg, Flickr, StackOverflow, Craigslist & more
Cassandra (CQL)
Facebook, Reddit, Twitter, Netflix & many more
Document stores (mostly structured K,V store)
Versatile, dynamic schema
Eventual consistency
Highly Parallelizable
Easy replication
{
    "_id": "4eea98de1550e2cc04000000",
    "lastModified": "2011-12-15 20:03:26",
    "name": "Peter Lustig",
    "avatar": "4eea61a11550e26f7d000000",
    "email": "Peter.Lustig@void.net",
    "hobbies": "sleeping"
}
CouchDB – "set it up and relax"
"Cluster of unreliable commodity hardware"
"RESTful" JSON API
Document store with easy replication
Eventual consistency
Lightweight (runs on phones)
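The "RESTful" JSON API means a document is stored with a plain HTTP PUT to /database/document-id. A minimal sketch that only builds the request, assuming a local server on CouchDB's default port 5984; the database name "people", the document id "peter" and the helper build_put_document are hypothetical.

```python
import json
import urllib.request


def build_put_document(base_url, db, doc_id, doc):
    """Build (but do not send) a PUT request that would store doc under doc_id."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = build_put_document(
    "http://localhost:5984",  # assumed local server address
    "people",                 # hypothetical database name
    "peter",                  # hypothetical document id
    {"name": "Peter Lustig", "hobbies": "sleeping"},
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Sending the request needs a running server with the database already created, and recent CouchDB versions also expect credentials.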
Distributed TinyURL crawler
TinyURL crawler
Quick deploy TinyURL resolver
Master-slave architecture
Replicating Databases
Amazon EC2 spot instances
[Diagram: Distributed TinyURL crawler: TinyURL, resolver, an Amazon EC2 master and seven nodes labeled R]
Thanks for listening
Interested in this? Want to know more?
#jhuacm on irc.freenode.net
+Matthias Lee
github.com/madmaze