
(1)

No-SQL Databases for High Volume Data

Edward Wijnen

3 November 2014

(2)

The New Connected World Needs a Revolutionary New DBMS

“The Internet of Things”

[Figure: timeline (1970’s, 1990’s, today) of computing eras: Mainframe, Client-Server, Social, Mobile, Cloud; connectivity labels: Isolated, Semi-Connected, Radically Connected]

©2014 DataStax Confidential. Do not distribute without consent.

(3)

Businesses Must Close the Gap…and Fast

(4)

[Figure: Connected Customers, Connected Partners, Connected Employees, Connected Devices, Connected Products]

(5)

You Would End With This

[Figure: Connected Partners, Connected Employees, Connected Devices and Connected Products linked by a Distributed Transactional Database]

(6)

Apache Cassandra™

• Apache Cassandra™ is a massively scalable, open source, NoSQL, distributed database built for modern, mission-critical online applications

• Written in Java; a hybrid of Amazon Dynamo (distribution model) and Google BigTable (data model)

• Masterless, with no single point of failure

• Distributed and data centre aware

• Designed for 100% uptime

• Predictable scaling

[Figure: Dynamo + BigTable → Cassandra]

BigTable: http://research.google.com/archive/bigtable-osdi06.pdf
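The "masterless" and "distributed" bullets come from the Dynamo side of the design. Nodes sit on a hash ring, and a partition key is hashed to a token. The row is stored on the node that owns that token, plus the next nodes clockwise until the replication factor is met. No node is special, so any node can coordinate a request.

The sketch below is a simplified model, not Cassandra's code. It rests on these assumptions:

- It hashes with MD5 (as the older RandomPartitioner did) instead of the default Murmur3Partitioner.
- Each node has one token, with no virtual nodes.
- Replicas are placed SimpleStrategy-style, ignoring racks and data centres.

`TokenRing`, `replicas_for` and the node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key onto the ring. MD5 stands in for Cassandra's Murmur3Partitioner."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class TokenRing:
    """Toy masterless ring: one token per node, no virtual nodes, no racks/DCs."""

    def __init__(self, nodes: list[str]) -> None:
        # Hypothetical: each node's token is derived from its name.
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas_for(self, key: str, rf: int) -> list[str]:
        """First replica: first node whose token is >= the key's token (wrapping
        past the end of the ring); remaining replicas: next nodes clockwise."""
        if not 1 <= rf <= len(self._ring):
            raise ValueError("replication factor must be between 1 and node count")
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]


if __name__ == "__main__":
    ring = TokenRing(["node1", "node2", "node3", "node4", "node5"])
    # Placement depends only on the ring, so any node could coordinate this request.
    print(ring.replicas_for("user:42", rf=3))
```

Removing a node that is not one of a key's replicas leaves that key's placement unchanged. Only the key ranges next to the removed node move, which is what makes scaling out or in predictable.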

(7)

Apache Cassandra™

(8)

Distributed Transactional Database Advantages

[Figure, built up over slides 8 to 13: a map showing The Hague; later builds add Cassandra (C*) nodes in The Hague and London]

(14)

Delivers 150+ Billion Content Recommendations Per Month

Serves content for the largest media brands in the world, including Reuters, The Wall Street Journal and USA Today

Needed a massively scalable data store

High velocity of data with 58,000 links to content per second

Always-on data architecture
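The two volume figures on this slide agree with each other. This check assumes a 30-day month and counts each link served as one recommendation (both are assumptions):

```latex
\frac{150 \times 10^{9}\ \text{recommendations/month}}{30 \times 86\,400\ \text{s/month}}
  = \frac{150 \times 10^{9}}{2\,592\,000\ \text{s}}
  \approx 57\,900\ \text{per second} \approx 58{,}000\ \text{per second}
```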

(15)

 

Distributed Transactional Database Advantages

[Figure, built up over slides 15 to 21: Cassandra (C*) data centres in The Hague and London, with a third in Groningen added]

(22)

Netflix Delights Customers with Personal Recommendations

World’s leading streaming media provider with digital revenue $1.5BN+

Tailors content delivery based on viewing preference data captured in Cassandra

Increased market cap by 600% since 2012

Introduction of ‘Profiles’ drove throughput to over 10M transactions per second

Replaced Oracle in six data centers worldwide, 100% in the cloud

(23)

 

Cassandra – Always On, No Matter What

[Figure: Cassandra (C*) data centres in The Hague, Groningen and London]

• Transactional Backbone

• Industry Leading Performance

• Predictable Scalability

• Operational Simplicity

• Business Flexibility

(24)
(25)
(26)

Cassandra – Tunable Consistency

• Consistency Level (CL)

• Client specifies per read or write

• Handles multi-data center operations

• ALL = all replicas ack

• QUORUM = a majority (more than 50%) of replicas ack

• LOCAL_QUORUM = a majority of replicas in the local DC ack

• ONE = one replica must ack

• Plus more (see docs)

• Blog: Eventual Consistency != Hopeful Consistency

http://planetcassandra.org/blog/post/a-netflix-experiment-eventual-consistency-hopeful-consistency-by-christos-kalantzis/

[Figure: a five-node ring (Node 1 to Node 5); a write at CL=QUORUM is sent in parallel to the 1st copy (Node 1), 2nd copy (Node 2) and 3rd copy (Node 3); acks shown after 5 μs, 12 μs, 500 μs and 12 μs]
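The consistency levels on this slide come down to counting acks. Cassandra's quorum is floor(RF / 2) + 1 replicas. QUORUM sums the replication factors of all data centres, while LOCAL_QUORUM uses only the coordinator's data centre.

The sketch below applies that rule to the parallel write in the figure. It assumes RF = 3 and reads the 5, 12 and 500 μs acks as the three replicas' responses, which is an interpretation of the diagram. Only the levels named on the slide are modelled; `required_acks` and `response_time_us` are hypothetical helpers.

```python
def required_acks(level: str, rf_by_dc: dict[str, int], local_dc: str) -> int:
    """Replica acks the coordinator waits for before answering the client."""
    total_rf = sum(rf_by_dc.values())
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return total_rf // 2 + 1  # majority of all replicas across all DCs
    if level == "LOCAL_QUORUM":
        return rf_by_dc[local_dc] // 2 + 1  # majority of the local DC's replicas
    if level == "ALL":
        return total_rf
    raise ValueError(f"consistency level not modelled here: {level}")


def response_time_us(ack_latencies_us: list[float], needed: int) -> float:
    """The write goes to every replica in parallel, so the client is answered
    when the `needed`-th fastest ack arrives (coordinator overhead ignored).
    For LOCAL_QUORUM, pass only the local DC's ack latencies."""
    if needed > len(ack_latencies_us):
        raise ValueError("too few replicas respond to satisfy this level")
    return sorted(ack_latencies_us)[needed - 1]


if __name__ == "__main__":
    rf = {"dc1": 3}      # hypothetical single data centre with RF = 3
    acks = [5, 12, 500]  # replica ack latencies in microseconds, read off the figure
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, response_time_us(acks, required_acks(level, rf, "dc1")))
    # prints "ONE 5", "QUORUM 12" and "ALL 500" on separate lines
```

With QUORUM the client hears back after the second ack (12 μs), so the slow 500 μs replica does not delay the write. The same arithmetic gives a common rule of thumb. When the replicas acked on write plus those consulted on read exceed RF, every read overlaps at least one replica holding the latest acknowledged write. For example, QUORUM for both at RF = 3 gives 2 + 2 > 3.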

(27)

• Messaging

• Product Catalogs and Playlists

• Recommendation/Personalization

• Fraud Detection

• Internet of Things/Sensor Data

(28)

A product catalog is an organized collection of products or services. Playlists are user-defined queues of songs, movies, games and lessons. Examples: shopping carts, gift registries, media playlists.

Challenges

• Rigidity of relational databases

• Increase in volume and diversity of data

• Application must have zero downtime

• Predictable scalability is hard

• Desire to operate in the cloud

Why DataStax?

• Real-time database infrastructure

• Rich analytics for flexible access to information

• Fast search and indexing of data

• Add new features while the application is online

• Multiple data centers to ensure applications and data have 100% uptime

(29)

Recommendation and personalization engines understand each person’s unique habits and preferences and bring to light products and items that the user may be unaware of and is not looking for. Examples: news sites, shopping carts.

Challenges

• Large volumes of user data make accuracy challenging

• Merging real-time and historical information

• Cross-product information

• Response times need to be fast

• Predictable scalability is hard

Why DataStax?

• Rich query language and enterprise search to store, search and analyze user activity data

• Integrations with data lakes allow real-time and historical data to be merged

• Multi-data center replication ensures applications and data suffer no downtime

• Linear scalability is predictable

(30)

Fraud detection solutions identify out-of-the-ordinary patterns to prevent malicious attacks on digital and physical assets by unauthorized applications and individuals. Examples: credit card monitoring, application infiltration.

Challenges

• Increasing volume of fraudulent attacks across all industries

• Technology sophistication

• Limited historical and trend information

• Information is stored across multiple channels

• The customer can be the first to spot the fraud

Why DataStax?

• Easy management of high data volumes

• Real-time monitoring across channels, sites and data centers

• Integrations with data lakes allow real-time and historical data to be merged

• Ease of use in managing and monitoring data

• Multi-data center replication ensures applications and data suffer no downtime

(31)

The Internet of Things (IoT) refers to the revolution driven by a growing number of internet-connected devices that can network and communicate with each other.

Challenges

• Vast and diverse amounts of unstructured data from internet-enabled devices

• Number of sensors is increasing exponentially

• Fast-changing technology

• Supporting multiple channels with varying data types

• Predictable scalability is hard

Why DataStax?

• Easy management of high data volumes

• Rich query language and enterprise search to store, search and analyze data

• Dynamic database schema

• Linear scalability offers predictability

(32)

Messaging facilitates communication, interaction and collaboration between diverse user groups and applications via social networks, cloud services and more. Examples: SMS, email and instant messaging.

Challenges

• Managing large data volumes at a reasonable cost

• Real-time updates and information, with detailed alerts and notifications

• Predictable scalability is hard

• Information is stored across multiple platforms and systems

• Agility

Why DataStax?

• Easy management of high data volumes

• Real-time monitoring across channels, sites and data centers

• Multi-data center replication ensures applications and data suffer no downtime

• Ease of use in managing and monitoring data

• Dynamic database schema

(33)

The Weather Channel on Learning to use Cassandra

“If you had a look in the past, you may have found Cassandra had a high learning curve and a fair amount of complexity. CQL3, the native drivers, and virtual nodes have changed the game entirely, making Cassandra a much more accessible and friendly platform. While I have years of experience using Cassandra, my team was mostly new to it; CQL made their transition essentially painless. But where Cassandra really shines is in speed and operational simplicity, and I would say those two points were critical.”

(34)

Application drivers and connectors for all popular developer languages exist for Cassandra and DataStax Enterprise.

CQL (Cassandra Query Language) is the primary API.

Drivers/connectors include:

• Java

• C++

• Python

• Ruby

• PHP
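Whichever driver is used, the application talks to the cluster in CQL. The sketch below ties this to the earlier multi-data-centre slides. It builds the CQL statement that creates a keyspace replicated across data centres with NetworkTopologyStrategy.

The keyspace name, data-centre names and replication factors are hypothetical. In a real cluster, the data-centre names must match those the cluster itself reports. The resulting string would then be passed to a driver, for example `session.execute(...)` in the DataStax Python driver.

```python
import re

_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")  # unquoted CQL identifier


def create_keyspace_cql(keyspace: str, rf_by_dc: dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement that places `rf` replicas in each
    named data centre via NetworkTopologyStrategy."""
    if not _IDENT.fullmatch(keyspace):
        raise ValueError(f"invalid keyspace name: {keyspace!r}")
    if not rf_by_dc:
        raise ValueError("at least one data centre is required")
    for dc, rf in rf_by_dc.items():
        # Reject quotes so a DC name cannot break out of its string literal.
        if "'" in dc or rf < 1:
            raise ValueError(f"invalid replication setting: {dc!r}: {rf}")
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_by_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', {dcs}}};"
    )


if __name__ == "__main__":
    # Hypothetical keyspace and data-centre names echoing the earlier map slides.
    print(create_keyspace_cql("playlists", {"TheHague": 3, "London": 3}))
```

Keeping three replicas in each data centre lets LOCAL_QUORUM (two of the three local replicas) keep working in one city even if the other data centre is unreachable.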

(35)

DataStax – Enabling The Future

[Figure: Connected Partners, Connected Employees, Connected Devices and Connected Products linked by a Distributed Transactional Database]

(36)

Thank you!

[email protected]

@edwardwijnen
