• No results found

SCALABLE DATA SERVICES

N/A
N/A
Protected

Academic year: 2021

Share "SCALABLE DATA SERVICES"

Copied!
31
0
0

Loading.... (view fulltext now)

Full text

(1)

SCALABLE DATA SERVICES

2110414 Large Scale Computing Systems Natawut Nupairoj, Ph.D.

(2)

Outline

2110414 - Large Scale Computing Systems 2

Overview

MySQL Database Clustering

GlusterFS

(3)

Overview

3

(4)

Problems of Data Services

2110414 - Large Scale Computing Systems 4

 Data retrieval is usually the bottleneck  Searching

 Transferring

 Basic performance improvement schemes  Data partitioning

 Data replication – need to maintain consistency  Other techniques

 Database Clustering

 High-performance File Systems  In-memory caching

(5)

Adapted from G. Vanderkelen,

“MySQL Cluster: An introduction”, 2006

MySQL Database Clustering

5

(6)

Quick intro to MySQL

 MySQL is a DBMS running on most OS

 Reputation for speed, quality, reliability and easy

to use

 Storage Engines (MyISAM, InnoDB, ..)

 Support standard SQL and other features  Stored procedures

 Triggers, Updatable Views, Cursors  Precision math

 Data dictionary (INFORMATION_SCHEMA database)  and more..

(7)

MySQL Architecture

Connection Pool

(Authentication, limits, caches,..)

API (PHP,Java,C/C++,.NET,..) SQL Interface DML/DDL Stored Procedures, Triggers, Views,.. Parser SQL Translation, Object Privileges Optimizer Execution Plan, Statistics Caches & Buffers Query Cache, global & per engine

Storage Engines

MyISAM InnoDB Memory Federated NDB

(custom)

Archive

(8)

What is MySQL Cluster?

 In-memory storage

 data and indices in-memory  check-pointed to disk

 Shared-Nothing architecture  No single point of failure

 Synchronous replication between nodes  Fail-over in case of node failure

 Row level locking  Hot backups

(9)

Cluster Nodes

 Participating processes are called „nodes‟  Nodes can be on same computers

 Two tiers in Cluster:  SQL layer

 SQL nodes (also called API nodes)

 Storage layer

 Data nodes, Management nodes

(10)

Components of a Cluster

MySQL (mysqld) NDB Application .Net JBoss PHP Management (ndb_mgmd) SQL Nodes Data Nodes Management Nodes ndbd ndbd ndbd ndbd Data Nodes MySQL (mysqld) NDB MySQL (mysqld) NDB Sql Layer Storage Layer

(11)

 Contain data and index

 Used for transaction coordination

 Each data node is connect to the others  Shared-nothing architecture

 Up to 48 data nodes

(12)

SQL nodes

 Usually MySQL servers  Also called API nodes

 Each SQL node is connected to all data nodes  Applications access data using SQL

 Native NDB application (e.g. ndb_restore)  Client application written using NDB API

(13)

Management nodes

 Controls setup and configuration  Needed on startup for other nodes  Cluster can run without

 Can act as arbitrator during network partitioning  Cluster logs

(14)

A Configuration

MySQL

NDB

MySQL

+ ndb_mgmd

NDB

Application Application

Application

Data Nodes

Node Group 1

(15)

Failure: MySQL server

 Applications can use other  mysqld reconnects

MySQL

NDB

MySQL

+ ndb_mgmd

NDB

Application Application

Application

Data Nodes

Node Group 1

(16)

Failure: Data Node

 Other data nodes know  Transaction aborted

 Min. 1 node per group needed  0 nodes in group = shutdown

MySQL NDB MySQL + ndb_mgmd NDB Application Application Application Data Nodes

Node Group 1

(17)

Example: Web Sessions

 Without Cluster

 One MySQL server holding data  Single point of failure

17

MySQL

(mysqld)

Apache/PHP Apache/PHP Apache/PHP Apache/PHP

(18)

18

Web Sessions

Apache + PHP mysqld

Load balancer

 With Cluster

 MySQL distributed

 No Single point of failure  Shared storage, but

redundant

Apache + PHP mysqld

Apache + PHP mysqld

Apache + PHP mysqld

ndbd ndbd

ndbd ndbd

(19)

GlusterFS

19

(20)

What is GlusterFS?

 Open source, clustered file system

 Scale up to several petabytes for thousands of clients  Aggregate disk and memory resources into a single

global namespace over network

 Leverage commodity hardware  Lead to storage virtualization

 Allow administrators to dynamically expand, shrink,

rebalance, and migrate volumes

 Provide linear scalability, high performance, high

availability, and ease of management

2110414 - Large Scale Computing Systems 20

(21)

GlusterFS Overview

2110414 - Large Scale Computing Systems 21

(22)

GlusterFS Architecture

2110414 - Large Scale Computing Systems 22

(23)

GlusterFS Load Balancing Mode

 Data are stored as files

and folders

 Use tokens

 Extended attributes of a file  Identify the location of a file  Distributed across directories  No need for dedicated

metadata server

 Gluster translates the

requested file name to a token and access the files directly

23

(24)

GlusterFS Replication Mode

 Support auto replication

across multiple storages

 Provide high availability

(auto fail-over) and auto self-healing

 Uses load balancing to

access replicated instances

24

(25)

Memcached

25

(26)

What is Memcached?

2110414 - Large Scale Computing Systems 26

 General-purpose high-performance open source

distributed memory caching system

 giant hash table distributed across multiple machines  Speed up dynamic database-driven websites by

caching data and objects in RAM

 Being used by many popular web sites

 LiveJournal, Wikipedia, Facebook, Flickr, Twitter, Youtube  API is available in many languages

(27)

Basic Memcached Operations

2110414 - Large Scale Computing Systems 27

(28)

Memcached with Java

MemcachedClient c=new MemcachedClient(

new InetSocketAddress("127.0.0.1", 11211)); c.set("someKey", 3600, someObject);

Object myObject=c.get("someKey"); c.delete("someKey")

2110414 - Large Scale Computing Systems 28

(29)

Memcached and MySQL

 Caching the results of database queries

 “SELECT * FROM users WHERE userid = ?” with (userid:user)

29

(30)

Putting It All Together:

Facebook Architecture

2110414 - Large Scale Computing Systems 30

(31)

References

 P. Strassmann, “ Introduction to Virtualization”,

http://www.strassmann.com/pubs/gmu/2008-10.pdf, George Mason University, 2008

 Gluster Community, “Gluster 3.1.x Documentation”,

http://www.gluster.com/community/documentation/index.php/Main_Page, 2010

 “Designing and Implementing Scalable Applications with Memcached and

MySQL”, MySQL White Paper, 2008

 Wikipedia, “Memcached”, http://en.wikipedia.org/wiki/Memcached,

2010

2110414 - Large Scale Computing Systems 31

References

Related documents

Try Scribd FREE for 30 days to access over 125 million titles without ads or interruptions. Start

If you are using a Commercial release of MySQL NDB Cluster 7.6, see the MySQL NDB Cluster 7.6 Community Release License Information User Manual for licensing information,

10 Database Server mysqld sshd snmpd /proc/ diskstats Cacti MySQL (contains cacti configuration) .rrd Files (contains rrd data) Apache (serves web interface) Poller (cronjob which

To attain and retain ICSA Labs Network IPS Certification, the candidate being tested must repeatedly prevent any and all attacks targeting the vulnerability set while 80% of

As indicated in section 1, Culpeper and Archer (2008) found no evidence for ability-CIRs involving can you or could you (or their variant forms) in their detailed study of requests

Beim Vergleich mit den Ergebnissen der Indirekten Methode wird jedoch deutlich, dass auf diese Weise erzielte Ergebnisse einer systematischen Abweichung unterworfen sind,

Our  development  team  is  close‐knit  and  highly  motivated  with  a  keen  interest  in  the  products 

If the engineer were to make a structural design error resulting in financial loss to the client, the engineer would be potentially liable to the client on