3 Case Studies of NoSQL and Java Apps in the Real World
Eugene Ciurana [email protected] - pr3d4t0r ##java, irc.freenode.net
This presentation is available from:
About Eugene...
• 15+ years building mission-critical, high-availability systems
• 15+ years of Java work
• Open source evangelist
• MapReduce + Hadoop early adopter
• VP of R&D at badoo.com - largest social network in Europe (120M subscribers worldwide!)
• State of the art main line of business at the largest companies in the world - not a web guy!
Very Important!
Please Ask Questions!
What Is NoSQL?
• Database...
• Horizontally scalable
• Non-relational
• Built-in application support
• Custom file system designed for supporting NoSQL operations
• Best for non-OLTP applications
• Unstructured data
NoSQL Topology
Virtual File System
logical table management, load balancing, garbage collection (HDFS, GridFS, Hypertable)
Tablet Server 0
Tablet Server 1
Tablet Server n
Distributed File System
FS 0 FS 1 FS 2 FS n
Node Node Node Node
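A minimal sketch of how a logical table could be spread across the tablet servers in the diagram above, assuming range partitioning by row key. `TabletRouter`, the split keys, and the server names are hypothetical; systems such as HBase or Hypertable keep this mapping internally and rebalance it on their own.

```python
import bisect


class TabletRouter:
    """Toy range-partitioned router: maps a row key to the server owning its key range."""

    def __init__(self, split_keys, servers):
        # n split keys divide the key space into n + 1 contiguous ranges.
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per key range")
        if list(split_keys) != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        self.split_keys = list(split_keys)
        self.servers = list(servers)

    def server_for(self, row_key):
        # bisect_right counts the split keys <= row_key, which is the range index.
        return self.servers[bisect.bisect_right(self.split_keys, row_key)]


router = TabletRouter(
    split_keys=["h", "p"],
    servers=["tablet-server-0", "tablet-server-1", "tablet-server-n"],
)
print(router.server_for("apple"))  # keys below "h"
print(router.server_for("mongo"))  # keys from "h" up to, not including, "p"
print(router.server_for("zebra"))  # keys from "p" upward
```

Adding a server then means adding a split key, which is the sense in which this layout scales horizontally.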
Areas of Application
• Document storage and management
• Object databases
• Graph databases
• Key/value stores
• Eventually consistent key/value stores
• Financial modeling
• Click stream analytics
• Simulations
• Protein folding
Brewer’s CAP Theorem
Pick Any Two!
C - Consistency
A - Availability
P - Partition tolerance
Data models: Relational, Key-Value, Column-Oriented, Document-Oriented
[Figure: CAP triangle; the systems shown on the diagram appear in three groups]
RDBMSs (Oracle, MySQL), Aster Data, Greenplum, Vertica
Dynamo, Voldemort, Tokyo Cabinet, KAI, Cassandra, SimpleDB, CouchDB, Riak
mongoDB, Terrastore, Datastore, Hypertable, HBase, Redis, Berkeley DB, MemcacheDB, Scalaris
Three NoSQL Systems
• mongoDB
  • Horizontally scalable
  • Document-oriented database
  • No JOIN operations, no row-level locking
• GigaSpaces XAP
  • Data grid for replacing application servers
  • Event processing model
  • Front-end to various data stores (SQL and NoSQL)
• Hadoop/Hive/HBase
  • MapReduce framework foundation
  • Optimized for fast search and retrieval
mongoDB
• Document-oriented storage
• Querying via JavaScript or custom APIs for all major programming languages
• In-place updates for atomicity
• Any attribute in a document can be indexed
• Built-in MapReduce
• Built-in caching
mongoDB
mongoDB Server (master)
Data Storage
mongod
Database daemon
mongos
Sharding daemon
mongoDB Server (slave)
Data Storage
mongod
Database daemon
mongos
Sharding daemon
Consumer
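A toy, in-memory illustration of three mongoDB features from the list above: schema-free documents, an index on any attribute, and in-place updates. This is not the mongoDB API. `ToyCollection`, its methods, and the sample documents are hypothetical names chosen only to show the idea.

```python
from collections import defaultdict


class ToyCollection:
    """In-memory stand-in for a document collection (not the mongoDB API)."""

    def __init__(self):
        self.docs = {}      # _id -> document (a plain dict)
        self.indexes = {}   # field -> {value -> set of _ids}

    def insert(self, doc):
        self.docs[doc["_id"]] = doc
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc["_id"])

    def create_index(self, field):
        # Any attribute can be indexed; documents lacking it are simply skipped.
        index = defaultdict(set)
        for _id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(_id)
        self.indexes[field] = index

    def find(self, field, value):
        if field in self.indexes:
            ids = self.indexes[field].get(value, set())
            return [self.docs[i] for i in sorted(ids)]
        # No index on this field: fall back to a full scan.
        return [d for d in self.docs.values() if d.get(field) == value]

    def update_in_place(self, _id, field, value):
        # Mutate the stored document directly and keep the index in step.
        doc = self.docs[_id]
        index = self.indexes.get(field)
        if index is not None:
            if field in doc:
                index[doc[field]].discard(_id)
            index[value].add(_id)
        doc[field] = value


users = ToyCollection()
users.insert({"_id": 1, "name": "ana", "city": "London"})
users.insert({"_id": 2, "name": "boris", "city": "Moscow", "langs": ["php", "java"]})
users.create_index("city")
users.update_in_place(1, "city", "Moscow")
print([d["name"] for d in users.find("city", "Moscow")])
```

The two documents have different shapes, which a relational table would not allow without a schema change.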
GigaSpaces XAP
• Data persistence
• Distributed processing
• Caching
• Multi-language support
• NoSQL operations:
  • SQLQuery - SQL-like syntax
  • Persistency - RDBMS through wrapper
  • memcached
GigaSpaces XAP
Application Frameworks
Jetty JEE
Spring Mule
Groovy .Net
C++ Java
XAP Management
and Monitoring
XAP Deployment Virtualization
XAP Middleware Virtualization
(Virtualized Clustering Layer)
Hadoop and HBase
• HDFS - distributed high performance file system
  • Runs on top of ext3, HFS+, whatever
  • Alternatives: AWS S3, CloudStore, others
• MapReduce - framework for running jobs
  • Java or anything that works with stdin, stdout
• Chukwa - large log analysis framework (not very popular)
• Hive - Data warehousing, ETL, and SQL-like language
• HBase - Column-oriented NoSQL database
Hadoop and HBase
HDFS
Disk Disk Disk Disk
MapReduce HBase
Sqoop Chukwa
Hive PIG
ZooKeeper
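The "anything that works with stdin, stdout" point refers to Hadoop Streaming, which pipes text records through external programs: the mapper emits tab-separated key/value lines, Hadoop sorts them by key, and the reducer reads the sorted stream. A minimal word-count sketch of that contract follows. The function names and the `run_locally` helper are assumptions for illustration, and the sort step is simulated in-process rather than run on a cluster.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts of consecutive records that share a key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort (shuffle) -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # As a streaming job, the same file would be run once with "map" and once
    # with "reduce", reading sys.stdin and writing sys.stdout each time.
    stage = sys.argv[1] if len(sys.argv) > 1 else "demo"
    if stage == "map":
        sys.stdout.writelines(r + "\n" for r in mapper(sys.stdin))
    elif stage == "reduce":
        sys.stdout.writelines(r + "\n" for r in reducer(sys.stdin))
    else:
        print(run_locally(["HBase on HDFS", "Hive on HDFS"]))
```

The reducer depends on its input being sorted by key, which is why it only needs to compare each record with the one before it.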
Case Study 1: Large FI Stock Trades
• Stock trading system is based on a large commercial database
• It can store only up to 4 weeks of trades
  • Otherwise it’s too expensive
• Inability to run long-term forecasting or trend analysis
• Robust, Java-based
• Mule-based - all messaging going through the ESB
Case Study 1: Large FI Stock Trades
• Syphon trades as they fly by through the ESB
• Copy every trade to HDFS
• Use MapReduce to break the data down for analysis
• Commit initial analysis to HBase
• Run queries and further mine data through HBase and MapReduce
• Data mining and presentation using WEKA
• Forecasting accuracy increased by 11.3% in the first 180
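A rough sketch of the pipeline described in the bullets above, under loud assumptions: a Python list stands in for HDFS, a dict keyed by `symbol#day` stands in for an HBase table, and the map and reduce steps run in-process. The trade fields (`symbol`, `ts`, `qty`, `price`), the function names, and the sample trades are all invented for illustration. The real system was Java and Mule-based, and its analysis is not described in the slides.

```python
import json
from collections import defaultdict


def wire_tap(trade, deliver, archive):
    """Deliver the trade unchanged and append a serialized copy to the archive."""
    archive.append(json.dumps(trade, sort_keys=True))
    return deliver(trade)


def map_trade(line):
    """Map step: one archived trade -> ((symbol, day), (shares, notional))."""
    t = json.loads(line)
    return (t["symbol"], t["ts"][:10]), (t["qty"], t["qty"] * t["price"])


def reduce_trades(pairs):
    """Reduce step: total shares and volume-weighted average price per key."""
    totals = defaultdict(lambda: [0, 0.0])
    for key, (qty, notional) in pairs:
        totals[key][0] += qty
        totals[key][1] += notional
    return {
        f"{symbol}#{day}": {"volume": qty, "vwap": notional / qty}
        for (symbol, day), (qty, notional) in totals.items()
    }


archive = []    # stand-in for files on HDFS
delivered = []  # stand-in for the existing trading system behind the ESB
for trade in [
    {"symbol": "ACME", "ts": "2010-06-01T09:30:00", "qty": 100, "price": 10.0},
    {"symbol": "ACME", "ts": "2010-06-01T15:59:00", "qty": 300, "price": 12.0},
    {"symbol": "INIT", "ts": "2010-06-01T10:00:00", "qty": 50, "price": 4.0},
]:
    wire_tap(trade, delivered.append, archive)

summary = reduce_trades(map(map_trade, archive))  # stand-in for rows in HBase
print(summary["ACME#2010-06-01"])
```

The tap never alters or delays the original message, so the trading system is untouched while the archive grows past the 4-week limit.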
Large SaaS
[Figure: architecture diagram]
Components: Client Relationships App, Dispatcher, CRM, Custom Queuing System, Main App, Search, Queue, Static Files (S3), Reporting, Rich Docs (GridFS), Netezza, Lucene
Flows: query, reply, update
Service Providers: Various services providers throughout the Internet. Some are public, some are partners
End Users / Service Consumers: Browser, RSS, Outlook, CWS, EWS
Internal Service Providers: Heavy web services. Some XML, some custom
Firewall
Legend: HTTP, SOAP, Custom RPC, ODBC/JDBC, Direct/API
Internal End Users
Large SaaS
[Figure: architecture diagram]
Search, Static Files (S3), Reporting
Service Providers: Various services providers throughout the Internet. Some are public, some are partners
End Users / Service Consumers: Browser, RSS, Outlook, CWS, EWS
Internal Service Providers
Mule ESB Container: Services, Message Routing, and Transformations
Client Relations Services, Dispatcher Services, Main App Services, OpenMQ, Other New Services
Tomcat App Container
Main App (zone instance), Client Relations (Zone Manager), Dispatcher, New Apps, New System Acquisition (.Net, PHP, etc.), cron Services
Rich Documents (GridFS), memcached, Local DBs, Other Resource
Cloud Firewall, Enterprise Services, Corporate Firewall
Large SaaS
[Figure: architecture diagram]
Databases, Search, Hive, Static Files (S3), Reporting, Pig
Mule ESB Container: Services, Message Routing, and Transformations
Client Relations Services, Dispatcher Services, Main App Services, OpenMQ, Other New Services
Tomcat App Container
Main App (zone instance), Client Relations (Zone Manager), Dispatcher, New Apps, cron Services
Rich Documents (GridFS), memcached, Internal Services
HDFS, GridFS, Data Warehouse
Hadoop, DB cluster, computational network
External Service or Consumer
Cloud-based MapReduce/NoSQL Infrastructure - expand and contract
SOBA Labs
[Figure: SOBA deployment diagram]
sobaDB 192.168.0.42, Other Consumer 192.168.0.42, sobaEngine localhost, Ubuntu Landscape
REST SOBA interface - implementation is transparent to caller!
http://soba.myserver.com/manage/resource
Firewall
Oracle vm_uuid: b220c8db
Xen Host, SOBA Agent, Xen XML-RPC API, REST SOBA interface, Xen Python, SOBA Python
Amazon EC2, End-user App ami-322ec65b, End-user App ami-322ec65b
SOBA Labs
Mule-based SOBA Engine abstracts provisioning, configuration, and monitoring through web services
[Figure: SOBA Engine integration diagram]
Java and Python Web Services Interface
Canonical Landscape, Other Application - easy integration! JSON, REST web services
SOBA Engine, Python API, Native Application - easy integration! Python dict
Amazon EC2 API, Xen Server API, Rackspace Cloud Servers API
SOBA Agent, Python dict, EC2 web services API, Xen XML-RPC API, REST, JSON
SOBA Data (mongoDB), EC2 Data (XML), EC2 Query (XML), Config Data (Puppet?)
Ubuntu Server, Ubuntu Server, Ubuntu Server
DRY Interface
Don't Repeat Yourself!
Provisioning, configuration or monitoring via SOBA is the same regardless of target: Same API call, same data payload, same data format, etc.
Implementation is abstracted from the
dict
SOBA
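A minimal sketch of the "same API call, same data payload" idea, assuming an adapter per target behind a single entry point. Every class, method, and field name here is hypothetical; the slides do not show the SOBA API. The adapters return canned dicts where a real one would call the EC2 or Xen APIs.

```python
class Provider:
    """Base class for a target-specific adapter (hypothetical)."""

    name = "abstract"

    def provision(self, request):
        raise NotImplementedError


class Ec2Provider(Provider):
    name = "ec2"

    def provision(self, request):
        # A real adapter would call the Amazon EC2 web services API here.
        return {"target": self.name, "image": request["image"], "state": "requested"}


class XenProvider(Provider):
    name = "xen"

    def provision(self, request):
        # A real adapter would call the Xen XML-RPC API here.
        return {"target": self.name, "image": request["image"], "state": "requested"}


class Engine:
    """Routes one request format to whichever adapter matches the target."""

    def __init__(self, providers):
        self.providers = {p.name: p for p in providers}

    def provision(self, request):
        return self.providers[request["target"]].provision(request)


engine = Engine([Ec2Provider(), XenProvider()])
payload = {"image": "ubuntu-server"}
results = [engine.provision({**payload, "target": t}) for t in ("ec2", "xen")]
print(results)
```

The caller changes only the `target` value. The payload, the call, and the shape of the reply stay the same, which is the point of the DRY interface.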
Plug - Know Any High Caliber Coders?
• badoo.com is hiring!
• Top talent - we’re very demanding
  • PHP, MySQL developers and sr. developers
  • Java with a Business Intelligence twist for Pentaho and Hadoop
  • Mobile: Android, iOS, BlackBerry, WAP, JME
  • QA sr. lead - highly technical, web, web services, and mobile
• €2,000 referral bonus for you if we hire your friend!
  • Paid 90 days after hiring (trial period ends)
  • If your friend can legally work in Russia or the UK, but doesn’t live in Moscow or London, we’ll work out relocation
• Contact: [email protected]
Q&A
Comments?
Anything else?
Eugene Ciurana [email protected] - pr3d4t0r ##java, irc.freenode.net
http://ciurana.eu/scalablesystems