• No results found

Big Data & Scripting storage networks and distributed file systems

N/A
N/A
Protected

Academic year: 2021

Share "Big Data & Scripting storage networks and distributed file systems"

Copied!
13
0
0

Loading.... (view fulltext now)

Full text

(1)

Big Data & Scripting

storage networks and

distributed file systems

(2)

adaptivity: Cut-and-Paste

1

• distribute blocks to [0,1] using hash function

• start withn nodes: n equal parts of [0,1] [0,1n]→N1, (in1,ni]→Ni

adding a node:

– reassign stripe of width n+11 from each part to new nodeNn+1 – move blocks in these parts to Nn+1

– fuse new part together sorted by previous block assignment consider as one continuous interval

1Brinkmann, Salzwedel, ScheidelerEfficient, distributed data placement

(3)

adaptivity: Cut-and-Paste

assumption: hash function yields uniform random distribution

• expected equal distribution of blocks to nodes

• node addresses can be determined without keeping table

– retrace splitting scheme

– ordering before fusing allows exact location

• not applicable in heterogeneous situations

• extensions to heterogeneous case exist2

2Brinkmann, Salzwedel, ScheidelerCompact adaptive placement schemes for

non-uniform requirements , 2002

(4)

data access

• until now: store and retrieve blocks

• distribution and redundancy by duplicates

• open: accessing this data

• possible guarantees/properties:

– parallel data access distributed among nodes

– synchronization with read operations (i.e. read always current state) – synchronization of write/update operations (e.g. which update

wins)

– performance:

latency = time until system reacts to request

bandwidth = transferable data per time unit (e.g. Mb/s)

update strategy determines guarantees and strategies for read access

(5)

data access: consistency and synchronicity

• redundancy →block update affects multiple nodes

synchronous updates

• sequence of read write accesses yields correct results example:

– write operationA1 affects one copy of a block Bi – next read access A2 reads different copy of Bi • all copies ofBi should be identical (synchronous)

consistency

• parallel updates should not lead to “alternative histories” example: simultaneous updates A1 and A2 to different copies of

blockBi [] two copies of Bi in different states • determine winner copy of Bi or prevent parallel update

(6)

data access: master copy

the master copy can be used ensure consistency:

• blockBi has r duplicates: Bi1, . . . ,Bir

• a single, distinguished copy is Bi’s master copy, sayBi1

• consistency is ensured by limiting write access to the master copy

• reading:

– use any copy

reading in parallel is allowed

• writing:

– write only to master copy – update duplicates from master

(7)

data access: master copy

updates can be executed using different strategies, resulting in different guarantees

immediately

• hold reading access when master copy is written

• can guarantee synchronicity

lazily

• e.g. when system load allows without blocking access

• copies can be in obsolete state

(8)

data access: master copy

• access-blocking needs centralized write access

• centralization leads to bottle-necks in efficiency and security guarantees like synchronicity can often only be given in exchange for efficiency:

synchronous systems with immediate update

• have block reading access on updates (→centralization)

• can guarantee synchronicity and consistency

asynchronous / lazy updates

• can still guarantee consistency

(9)

data access: without master copy

• writing is allowed to any duplicate

• synchronicity could be achieved by blocking duplicates

– blocking in a distributed situation as complicated – easily leads to locking situations

updates without blocking or master copy

• parallel writing is allowed

• two duplicates can be in different updated states

• neither synchronicity nor consistency guaranteed

• synchronization strategy needed to resolve inconsistencies

– duplicates in different states have to be reunified – example: latest update always wins

(10)

data access: transactions

abstraction of grouped updates

• implement invariants in storage systems

• example: online store

– (1) outstanding debts, (2) bills to be send – (3) goods available, (4) goods to be send – when order is placed, ensure (1)-(4) are

• all updated (purchase successful, transaction completed) • not updated at all (purchase failed, transaction failed) • transactions group individual updates together

usually either all updates are executed or none (atomic)

• systemcan guarantee properties for transactions

e.g. ACID (atomic, consistent, isolated, durable) in RDBMS

(11)

data access: the CAP-theorem

formalize guarantees and network model:

consistency

– transactions, guarantee ACID-properties – allows to keep system in a consistent state

availability

– data still accessible if nodes fail (e.g. backups)

partition tolerance

– tolerate disconnection of the network – individual parts function independently – allows arbitrary scaling

network model

– no global clock (asynchronous) – nodes decide on local information

(12)

data access: the CAP theorem

CAP theorem

Maximal two properties out of consistency, availability and partition tolerance can be guaranteed.

• conjectured by Brewer, 2000

• proven for an asynchronous network model3

• improvements are possible when nodes have a global timer (synchronous network model)

• keeping all nodes on single, global time is very complicated (if possible at all)

3Lynch, Gilbert,Brewer’s conjecture and the feasibility of consistent,

(13)

data access: trade-offs

redundancy

– enables parallelism and failure tolerance – enforces updates in multiple places

• collides with

efficiency

synchronicity andconsistency

• systems differ in the subset of properties they provide

– optimal system for particular task result of required guarantees – distributed RDBMS systems: consistency, availability

– NOSQL4 systems: availability, partition tolerance

4not only SQL

References

Related documents

We have found that bank type matters as they have different role in bank networks, the Brazilian bank network is characterized by money centers, in which large banks are

The WHOQOL-BREF (WHO, 1996) was used to assess the health related quality of life of respondents. This is a 26 item self administered questionnaire. However, the questions were

Using graduate competencies as a way to specify expected outcomes gives us a powerful way to tell the story about the strengths of our graduates, continue to bolster our image,

— the Government has put in place a broad strategy to address the problem of mortgage arrears and family home repossessions which has included an extensive suite

This paper presents the results of a systematic analysis of all judgments handed down by the High Court, Court of Appeal, and House of Lords in defamation claims brought by non-human

Intratec Production Cost Reports describe specific chemical production processes and present detailed and up-to-date analyses of their cost structure, encompassing capital

In order to gain a better insight into the causes of this phenomenon it is important to observe where the informal economy is more prominent and where its share is small or

mononematous, macronematous, rarely semimacronematous, septate, straight or flexuous, geniculate at upper part, cell size decreasing towards apex, irregularly branched, cell