Big Data & Scripting storage networks and distributed file systems

(1)

Big Data & Scripting

storage networks and

distributed file systems

(2)

adaptivity: Cut-and-Paste

1

• distribute blocks to [0,1] using hash function

• start withn nodes: n equal parts of [0,1] [0,1_n]→N1, (i−_n1,_ni]→Ni

• adding a node:

– reassign stripe of width _n₊₁1 from each part to new nodeNn+1 – move blocks in these parts to Nn+1

– fuse new part together sorted by previous block assignment consider as one continuous interval

1_{Brinkmann, Salzwedel, Scheideler}_{Efficient, distributed data placement}

(3)

adaptivity: Cut-and-Paste

• assumption: hash function yields uniform random distribution

• expected equal distribution of blocks to nodes

• node addresses can be determined without keeping table

– retrace splitting scheme

– ordering before fusing allows exact location

• not applicable in heterogeneous situations

• extensions to heterogeneous case exist2

2_{Brinkmann, Salzwedel, Scheideler}_{Compact adaptive placement schemes for}

non-uniform requirements , 2002

(4)

data access

• until now: store and retrieve blocks

• distribution and redundancy by duplicates

• open: accessing this data

• possible guarantees/properties:

– parallel data access distributed among nodes

– synchronization with read operations (i.e. read always current state) – synchronization of write/update operations (e.g. which update

wins)

– performance:

latency = time until system reacts to request

bandwidth = transferable data per time unit (e.g. Mb/s)

update strategy determines guarantees and strategies for read access

(5)

data access: consistency and synchronicity

• redundancy →block update affects multiple nodes

synchronous updates

• sequence of read write accesses yields correct results example:

– write operationA1 affects one copy of a block Bi – next read access A2 reads different copy of Bi • all copies ofBi should be identical (synchronous)

consistency

• parallel updates should not lead to “alternative histories” example: simultaneous updates A1 and A2 to different copies of

blockBi [] two copies of Bi in different states • determine winner copy of Bi or prevent parallel update

(6)

data access: master copy

the master copy can be used ensure consistency:

• blockBi has r duplicates: Bi1, . . . ,Bir

• a single, distinguished copy is Bi’s master copy, sayBi1

• consistency is ensured by limiting write access to the master copy

• reading:

– use any copy

reading in parallel is allowed

• writing:

– write only to master copy – update duplicates from master

(7)

data access: master copy

updates can be executed using different strategies, resulting in different guarantees

immediately

• hold reading access when master copy is written

• can guarantee synchronicity

lazily

• e.g. when system load allows without blocking access

• copies can be in obsolete state

(8)

data access: master copy

• access-blocking needs centralized write access

• centralization leads to bottle-necks in efficiency and security guarantees like synchronicity can often only be given in exchange for efficiency:

synchronous systems with immediate update

• have block reading access on updates (→centralization)

• can guarantee synchronicity and consistency

asynchronous / lazy updates

• can still guarantee consistency

(9)

data access: without master copy

• writing is allowed to any duplicate

• synchronicity could be achieved by blocking duplicates

– blocking in a distributed situation as complicated – easily leads to locking situations

updates without blocking or master copy

• parallel writing is allowed

• two duplicates can be in different updated states

• neither synchronicity nor consistency guaranteed

• synchronization strategy needed to resolve inconsistencies

– duplicates in different states have to be reunified – example: latest update always wins

(10)

data access: transactions

abstraction of grouped updates

• implement invariants in storage systems

• example: online store

– (1) outstanding debts, (2) bills to be send – (3) goods available, (4) goods to be send – when order is placed, ensure (1)-(4) are

• all updated (purchase successful, transaction completed) • not updated at all (purchase failed, transaction failed) • transactions group individual updates together

usually either all updates are executed or none (atomic)

• systemcan guarantee properties for transactions

e.g. ACID (atomic, consistent, isolated, durable) in RDBMS

(11)

data access: the CAP-theorem

formalize guarantees and network model:

• consistency

– transactions, guarantee ACID-properties – allows to keep system in a consistent state

• availability

– data still accessible if nodes fail (e.g. backups)

• partition tolerance

– tolerate disconnection of the network – individual parts function independently – allows arbitrary scaling

• network model

– no global clock (asynchronous) – nodes decide on local information

(12)

data access: the CAP theorem

CAP theorem

Maximal two properties out of consistency, availability and partition tolerance can be guaranteed.

• conjectured by Brewer, 2000

• proven for an asynchronous network model3

• improvements are possible when nodes have a global timer (synchronous network model)

• keeping all nodes on single, global time is very complicated (if possible at all)

3_{Lynch, Gilbert,}_{Brewer’s conjecture and the feasibility of consistent,}

(13)

data access: trade-offs

• redundancy

– enables parallelism and failure tolerance – enforces updates in multiple places

• collides with

– efficiency

– synchronicity andconsistency

• systems differ in the subset of properties they provide

– optimal system for particular task result of required guarantees – distributed RDBMS systems: consistency, availability

– NOSQL4 systems: availability, partition tolerance

4_{not only SQL}