An Overview of a Scalable
Distributed Database System:
SD-SQL Server
Witold LITWIN, Soror SAHRI & Thomas SCHWARZ
[email protected] [email protected] [email protected]
Ceria Laboratory Comp. Eng. Dep.
Paris-Dauphine University Santa Clara U.
BNCOD 2006
1. Introduction
2. Architecture
3. Command Interface 4. Processing
5. Performance
6. Conclusion & Future Work
Overview
Most DBSs have distributed/parallel versions with partitioned tables
SQL Server, Oracle, DB2, MySQL, Postgres…
Partitioned Tables
BENEFITS OF PARTITIONING
BENEFITS OF PARTITIONING
Partitioning can provide tremendous benefits to a wide variety of applications by improving manageability, performance, and availability. It is not unusual for
partitioning to improve the performance of certain queries or maintenance
operations by an order of magnitude. Moreover, partitioning can greatly simplify common administration tasks.
Partitioning also enables database designers and administrators to tackle some of the toughest problems posed by cutting-edge applications. Partitioning is a key tool for building multi-terabyte systems or systems with extremely high
availability requirements.
Partitioning in Oracle Database 10g Release 2 An Oracle White Paper May 2005
DBSs require manual partitioning
And manual repartitioning when tables scale-up
DBSs do not provide dynamically scalable tables
Hassle of Partitioning
http://ceria.dauphine.fr/CERIA-publications.html
Research Report, December 2005
[Oracle Database 10g]
Facts
Scalable Distributed
Partitioning of Relational Tables
Scalable Distributed Database System
SD-DBS
SD-SQL Server Goal
Several SDDS schemes are well-known by now:
LH*, RP*, k-RP*, LH*RS…Chord, VBI & most of P2P schemes
The domain has over 20.000 references on Google
An SD-DBS reuses SDDS design principles
With DB management specificity
A scalable distributed data structure
Specifically designed for possibly very large data on multi-computers or networks of WSs
P2P & Grids in modern vocabulary
Why SDDS Role Model?
Role Model: an SDDS
The first and yet the only SD-DBS
Implements the SD-DBS architecture
• Litwin, Schwartz & Risch (2002)
Runs on Microsoft SQL Server 2000
Shared Nothing Architecture
Up to 250 nodes at present
SD-SQL Server
Linked SQL Servers
D1 D2 Di Di+1
S S P
C D1_T
_D1_T _D1_T
User/Application
T
sd_create_table
User/Application
sd_insert
_D1_T Split
NDBs
SD-SQL server SD-SQL
server SD-SQL
client SD-SQL
Server Managers
SD-SQL peer
Gross Architecture
DB1
Node1 Node2 Node3 Nodei
DB1
……
DB1 MDB
DB1 SDB DB2 SDB
DB2 DB2
Nodes, SDBs & NDBs
Primary NDB
First for an SDB
Client NDB
• Interfaces applications & users
• Carries only images
• No actual tables with application data
Server NDB
• Carries actual tables
• segments
Peer NDB
• Both functions
NDB Types
For the application: a table of an SDB
Internally: a collection of segments behind client images
A segment is an SQL table
• One per NDB of the SDB
• Sharing the scalable table scheme
• Except its check constraint
• Min and Max value of the partition key
• With size limit
• Splitting when overflows occur
The check constraints partition the partition key space
Scalable (Distributed) Table
The primary segment
First allocated for a new table
At some server or peer NDB of SDB
• The peer creating the table
• The primary server of the client creating the table
Scalable (Distributed) Table
Linked SQL Servers
D1 D2 Di Di+1
S S P
C D1_T
_D1_T _D1_T
User/Application
T
sd_create_table
User/Application
sd_insert
_D1_T Split
NDBs
SD-SQL server SD-SQL
server SD-SQL
client SD-SQL
Server Managers
SD-SQL peer
Gross Architecture
For the application, it the client image is the table
The image name is the table name
Primary image
Created during the scalable table creation
• at the client or peer NDB creating the table
Secondary images
Created later on
On other NDBs of the SDB
For local applications
By a dedicated command
• sd_Create Image….
Scalable (Distributed) Table
Internally, every image is a specific SQL Server view of the segments:
Distributed partitioned union view
CREATE VIEW T AS SELECT * FROM N2.DB1.SD._N1_T UNION ALL SELECT * FROM N3.DB1.SD._N1_T UNION ALL SELECT * FROM N4.DB1.SD._N1_T
Updatable
• Through the check constraints
With or without Lazy Schema Validation
Scalable (Distributed) Table
Linked SQL Servers
D1 D2 Di Di+1
S S P
C D1_T
_D1_T _D1_T
User/Application
T
sd_create_table
User/Application
sd_insert
_D1_T Split
NDBs
SD-SQL server SD-SQL
server SD-SQL
client SD-SQL
Server Managers
SD-SQL peer
Gross Architecture
Store various SD-SQL Server meta-data
In particular about each scalable table
At each server or peer NDB
• SD.Size meta-table
• Segment capacity
• The number of stored tuples triggering a split
• Same for every segment at present
• SD.RP meta-table
• the actual partitioning of the scalable table
• The location of each segment
• SD.Primary table
• The location of the SD.RP table for each segment in the NDB
SD SQL Server Meta-Tables
At every client or peer NDB
In SD.Image table
• All the local images
• The name of the image
• The type
• Primary or secondary
• The number of segments
• As seen by an image
• Not necessarily the actual one
SD SQL Server Meta-Tables
At every NDB
SD.SDBNode points towards the primary NDB
SD.MDBNode points towards the MDB
At MDB
SD.Nodes indicates all the available SD-SQL Server nodes
• Over linked SQL Server nodes
SD.SDB describes all the SDBs
At every primary NDB
SD.NDB points to every NDB of the SDB
SD SQL Server Meta-Tables
…….
DB1 SDB
N1.DB1 N2.DB1 N3.DB1
T Scalable Table
Ni.DB1
Size 1000
N1.DB1 Primary
Ni.DB1 Nodes
N1.DB1 N2.DB1 N3.DB1 RP
Meta-Tables Meta-Tables
Scalable Tables: Meta-data
SDBNode
The number of segments in a scalable table may grow
An overflowing segment splits
• Creating one or more new segments
A split occurs when an insert overflows the segment capacity
The trigger launches the split as an asynchronous job called splitter
To avoid the application level timeout
Scalable Table Expansion
Linked SQL Servers
D1 D2 Di Di+1
S S P
C D1_T
_D1_T _D1_T
User/Application
T
sd_create_table
User/Application
sd_insert
_D1_T Split
NDBs
SD-SQL server SD-SQL
server SD-SQL
client SD-SQL
Server Managers
SD-SQL peer
Gross Architecture
Every new segment
Is basically created at an existing NDB that does not yet have any segments of the expanding
table
• provided there is any
Otherwise a new NDB is first appended to SDB
• Provided there is an available SD SQL Server node
Inherits the “father”’s schema
Gets its new check constraint
Gets indexes as defined at the “father”
Scalable Table Expansion
S b+1
S S1
b+1-p p
p=INT(b/2)
C( S)= { c: c < h = c (b+1-p)}
C( S1)={c: c > = c (b+1-p)}
Check Constraint?
b
SELECT TOP Pi * INTO Ni.Si FROM S ORDER BY C ASC SELECT TOP Pi * WITH TIES INTO Ni.S1 FROM S ORDER BY C ASC
Single Segment Split
Single Tuple Insert
Single Segment Split
Bulk Insert
Single segment split
Multi-Segment Split
Bulk Insert
Multi-segment split
SDB DB1 SDB DB1
Scalable Table T sd_insert
N1 N2 N3 N4
NDB
DB1
NDB
DB1
NDB
DB1
sd_insert
NDB
DB1
Ni
sd_create_node
sd_insert
N3 NDB
DB1
sd_create_node_database
NDB
DB1
…….
sd_create_node_database
SDB DB1
Split with SDB Expansion
Image Adjustment
The splits do not modify synchronously the images
Any split makes every image outdated
The client or peer verifies every image dynamically when a query to the image comes in
Image checking
Image adjustment if necessary
Get the number of segments presented in the image, N1
Get the number of segments of the scalable table, N2
Compare N1 and N2:
If N1<N2 then Image Adjustment
Alter the partitioned view definition
Image Adjustment
Image: Example
N1.DB1 N2.DB1 N3.DB1
T Scalable Table
CREATE VIEW T AS SELECT * FROM N2.DB1.SD._N1_T CREATE VIEW T AS SELECT * FROM N2.DB1.SD._N1_T UNION ALL SELECT * FROM N3.DB1.SD._N1_T
UNION ALL SELECT * FROM N4.DB1.SD._N1_T Primary
Image
DB1 SDB
N4.DB1
T
Image
The application interface manipulates scalable tables through SD-SQL Server commands
The SD-SQL Server commands start with ‘sd_’
to distinguish from SQL Server commands for static tables
INSERT sd_insert
CREATE TABLE sd_create_table
Application Interface
Node Creation
sd_create_node ‘Dell1’ /* Server by default */
sd_create_node ‘Ceria’, ‘client’
Node Alteration
sd_alter_node ‘Ceria’, ‘ADD server’ /* Becomes peer*/
Node Removal
sd_drop_node ‘Ceria’
Nodes Management
SDB Creation
sd_create_scalable_database ‘SkyServer’, ‘Dell1’, ‘Server’, 2
/* Creates the primary SkyServer NDB as well at Dell1*/
SDB Alteration
sd_create_node_database ‘SkyServer’, ‘Ceria’, ‘Client’
SDB Removal
sd_drop_scalable_database ‘SkyServer’
SDB & NDB Management
Scalable Table Creation
sd_create_table ‘PhotoObj (objid BIGINT PRIMARY KEY…)’, 10000
• No foreign keys yet
Scalable Table Alteration
sd_alter_table ‘PhotoObj ADD t INT’, 1000
sd_create_index ‘run_index ON Photoobj (run)’
sd_drop_index ‘PhotoObj.run_index’
Scalable Table Removal
sd_drop_table ‘PhotoObj’
Scalable Tables Management
Secondary Image Creation
sd_create_image ‘Ceria’, ‘PhotoObj’
sd_create_image ‘Dell2’, ‘PhotoObj’
Secondary Image Removal
sd_drop_image 'PhotoObj’
Image Adjustment
A view of an image
Involving perhaps static tables
And perhaps static views
…
Declared under SD-SQL Server by the SQL Server CREATE VIEW command
Scalable View
USE SkyServer /* SQL Server command */
Scalable Update Queries
sd_insert ‘INTO PhotoObj SELECT * FROM Ceria5.Skyserver-S.PhotoObj’
Scalable Search Queries
sd_select ‘* FROM PhotoObj’
sd_select ‘TOP 5000 * INTO PhotoObj1 FROM PhotoObj’, 500
Scalable Queries Management
Image Binding
Let Q a scalable query using the PhotoObj image:
sd_select ‘COUNT (*) FROM PhotoObj’
Find Images in Q
Check PhotoObj Image for Correctness Adjust PhotoObj Image if needed Send Q’ to SQL Server for Execution
Command Processing
Concurrency
SD-SQL Server processes every command as SQL distributed transaction at Repeatable Read
isolation level
Tuple level locks
Shared locks
Exclusive 2PL locks
Much less blocking than the Serializable Level
Splits use exclusive locks on segments and tuples in RP meta-table.
Shared locks on other meta-tables: Primary, NDB meta-tables
Scalable queries use basically shared locks on meta-tables and any other table involved
All the conccurent executions can be shown serializable
Concurrency
Splitter Dell1 sd_alter_table Dell2
Dell3 RP
PhotoObj
Exclusive Lock Waiting
Exclusive Lock
Shared Lock
Exclusive Lock
X X
Concurrency: Example
Experimental Environment
6 Machines Pentium IV 1.7 GHz
RAM: 780 Mb & 1 Gb
Operating System: Windows 2K Server
Ethernet Network: max bandwidth of 1 Gb/s
Use of SQL Analyzer for editing queries
Use of SQL Profiler to take measurements
The SkyServer Benchmark
We use SkyServer database as benchmark
Provided and installed at Ceria by Dr. Gray
SkyServer brings the entire database of the Sloan Digital Sky Survey, SDSS
We use of the PhotoObj table as an example scalable table
In our experiments, PhotoObj has almost 159 K tuples (about 260 MB)
•
Splitting PhotoObj with 160 k tuples into 2…5 segments, according to segment capacity
Split Time
Split Time
Splitting PhotoObj with 160 k tuples and indexes into 2… 5 segments according to segment capacity
Split Time Analysis
Longer split time may timeout a query put on wait
Future solution: Incremental Splitting
The splitter moves tuples by an increment at a time
• Let us say 1000 tuples
Then ends up by calling upon itself
• The query may proceed as the splitter releases the exclusive lock on the RP tuple
The process continues for next increment etc as long as there are tuples to move
(Q) sd_select ‘COUNT (*) FROM PhotoObj’
Query (Q1) execution time
Image Adjustment
Scalable View Processing
(Q) sd_select ‘COUNT (*) FROM Ti’
(Q): sd_select ‘COUNT (*) FROM PhotoObj’
Execution time of (Q) on SQL Server and SD-SQL Server
SD-SQL Server / SQL Server
Scalable tables are now a reality
with SD-SQL Server
No more manual repartitioning
• Unlike in any other DBS we know about
Performance analysis proves
Efficiency of our design
Immediate utility of SD-SQL Server
Conclusion
SQL Server 2005 portage
Incremental splits
Virtual repository of eGov documents
SQL Server XML View
Foreign keys for scalable tables
More performance measurements
Skyserver & other benchmarks
Error processing
High availability
Parity segments
Application to other DBMSs
Future Works
Thank you for your attention
Work performed between 2003 -2006 Partly founded by
MsResearch EEC Icons Project EEC E-Gov Project