Virtual Full Replication for Scalable Distributed Real-Time Databases


Thesis Proposal

Technical Report HS-IKI-TR-06-006

Gunnar Mathiason


University of Skövde

June, 2006


Abstract

Distributed real-time systems increase in size and complexity, and the nodes in such systems become difficult to implement and test. In particular, communication for synchronization of shared information in groups of nodes becomes complex to manage. Several authors have proposed using a distributed database as a communication subsystem, to off-load explicit communication from database applications. The task of information dissemination is then done by the replication mechanisms of the database. With increasingly larger systems, however, there is a need to manage the scalability of such a database approach. Furthermore, timeliness for database clients requires predictable resource usage, and scalability requires bounded resource usage in the database system. Thus, predictable resource management is an essential function for realizing timeliness in a large scale setting.

We discuss scalability problems and methods for distributed real-time databases in the context of the DeeDS database prototype. Here, all transactions can be executed timely at the local node due to main memory residence, full replication and detached replication of updates. Full replication contributes to timeliness and availability, but has a high cost in excessive usage of bandwidth, storage, and processing, since all updates are sent to all nodes regardless of whether the updates will be used there or not. In particular, unbounded resource usage is an obstacle for building large scale distributed databases.

For many application scenarios it can be assumed that most of the database is shared by only a limited number of nodes. Under this assumption it is reasonable to believe that the degree of replication can be bounded, so that a bound also can be set on resource usage.

The thesis proposal identifies and elaborates research problems for bounding resource usage in large scale distributed real-time databases. One objective is to bound resource usage by taking advantage of pre-specified data needs, but also by detecting unspecified data needs and adapting resource management accordingly. We elaborate and evaluate the concept of virtual full replication, which provides an image of a fully replicated database to database clients. It makes data objects available where needed, while fulfilling timeliness and consistency requirements on the data.

In the first part of our work, virtual full replication makes data available where needed by taking advantage of pre-specified data accesses to the distributed database. For hard real-time systems, the required data accesses are usually known, since such systems need to be well specified to guarantee timeliness. However, there are many applications where data accesses cannot be specified before execution. The second part of our work extends virtual full replication to be used with such applications. By detecting new and changed data accesses during execution and adapting database replication, virtual full replication can continuously provide the image of full replication while preserving scalability.

One objective of the thesis work is to quantify scalability in the database context, so that actual benefits and achievements can be evaluated. Further, we identify the conditions for setting bounds on resource usage for scalability, under both static and dynamic data requirements.


2.4.1 Basic notation
2.4.2 Definition
2.5 Incremental recovery
3 Problem formulation
3.1 Motivation
3.2 Problem statement
3.3 Problem decomposition
3.4 Aims
3.5 Objectives
4 Methodology
4.1 Virtual full replication for scalability
4.2 Static segmentation
4.2.1 Static segmentation algorithm
4.2.2 Properties, dependencies and rules
4.2.3 Analysis model and assumptions
4.2.4 Analysis
4.2.5 Implementation
4.2.6 Discussion
4.3 Simulation for scalability evaluation
4.3.1 Motivation for a simulation study
4.3.2 Simulation objectives
4.3.3 Validation of software simulations
4.3.4 Modeling detail
4.3.5 Approaches for validity and credibility
4.3.6 Experimental process and experiment design
4.3.8 Experiment execution
4.3.9 Discussion and results
4.4 Adaptive segmentation
4.4.1 Adaptive segmentation with pre-fetch
4.5 A framework of properties
4.6 Case study and implementation
5 Related work
5.1 Strict consistency replication
5.2 Replication with relaxed consistency
5.3 Partial replication
5.4 Asynchronous replication as a scalability approach
5.5 Adaptive replication and local replicas
5.6 Usage of data properties
6 Conclusions
6.1 Summary
6.2 Initial results
6.3 Expected contributions
6.4 Expected future work
A Project plan and publications
A.1 Time plan and milestones
A.2 Publications
A.2.1 Current


List of papers

This Thesis Proposal is based on the work in the following papers.

G. Mathiason and S. F. Andler. Virtual full replication: Achieving scalability in distributed real-time main-memory systems. In Proc. of the Work-in-Progress Session of the 15th Euromicro Conf. on Real-Time Systems, pages 33-36, July 2003. (ISBN 972-8688-11-3) [52]

G. Mathiason, S. F. Andler, and D. Jagszent. Virtual full replication by static segmentation for multiple properties of data objects. In Proceedings of Real-time in Sweden (RTiS 05), pages 11-18, Aug 2005. (ISBN 91-631-7349-2, ISSN 1653-2325) [53]

G. Mathiason. A simulation approach for evaluating scalability of a virtually fully replicated real-time database. Technical Report HS-IKI-TR-06-002, University of Skövde, Sweden, Mar 2006. [50]


1 Introduction

In a distributed database, availability of data can be improved by allocating data at the local nodes where data is used. For real-time databases, transaction timeliness is a major concern. Data can be allocated to the local node, to avoid remote data access with the risk of unpredictable delays on the network. Further, multiple replicas of the same data at different nodes allow data redundancy that can be used for recovery from failures, avoiding corruption or loss of data when nodes fail. In a fully replicated database the entire database is available at all nodes. This offers full local availability of all data objects at each node.

The DeeDS [6] database prototype stores the fully replicated database in main memory to make transaction timeliness independent of disk accesses. Full replication of data together with detached replication of updates allows transactions to execute entirely on the local node, independent of network delays. Detached replication sends updates to other nodes independently of the execution of the transaction. Database clients get the perception of a single local database and need not consider specific data locations or how to synchronize concurrent updates of different replicas of the same data.

Full replication uses excessive system resources, since the system must replicate all updates to all the nodes in such a system. This causes a scalability problem for resource usage, such as bandwidth for replication of updates, storage for data replicas, and processing for replicating updates and resolving conflicts for concurrent updates at different nodes for the same data object.

With existing approaches, real-time databases do not scale well, while the need for increasingly larger real-time databases grows. With this thesis we aim to show how to bound resource usage in a distributed real-time database such as DeeDS, by using virtual full replication to make it scalable. We also quantify scalability for the domain, to evaluate the benefits.

We elaborate virtual full replication [6] as an approach to manage resources for scalability, to replicate and store updates only for those data objects that are used at a node, and to maintain scalability over time. This avoids irrelevant replication and maintains real-time properties, while providing an image of full replication for local availability of data. Virtual full replication reduces resource usage to meet the actual need, rather than using resources to replicate all updates blindly to all nodes, resulting in a scalable distributed real-time database. Virtual full replication uses knowledge about the needs of the application, and uses application requirements for properties of the data, such as location, consistency model, and storage media, to support timeliness requirements of the clients in a scalable system. Furthermore, properties have relations that can be used to control resource usage in more detail.

This Thesis Proposal claims that a replicated database with virtual full replication and eventual consistency enables large scale distributed real-time databases, where each database client has an image of full availability of a local database.

1.1 Document layout

Section 2 contains a background on distributed real-time databases and in particular how full replication may support timeliness. A generic approach for scalability evaluation is introduced, and we indicate how such an approach can be used for a scalability evaluation of a distributed database. Section 3 details the problem of scalability in a fully replicated database, how we decompose the problem, and how the problem can be defined in terms of assessable objectives. Section 4 describes in detail our methodology for reaching the objectives. Some of the steps in the methodology have been performed already and their results are presented together with the steps, while other steps remain to be done. Section 5 describes related work, both for distributed real-time databases and for other areas that relate to our approach. Finally, section 6 summarizes the conclusions in this proposal and highlights the expected contributions and expected consequences for subsequent future work.


2 Background

2.1 A distributed real-time database architecture

The main property of real-time systems is timeliness, which can only be achieved with predictable resource usage and with sufficiently efficient execution. Predictability is essential, and for a hard real-time system the consequence of a missed deadline may be fatal, while for soft real-time systems a missed deadline lowers the value of the provided service. Thus, for real-time systems, predictable resource usage is the primary design concern that enables timeliness.

To improve predictability and efficiency, the database of the distributed real-time database system DeeDS [6] resides entirely in main memory, removing the dependence on disk I/O delays caused by unpredictable access times for hard drives. Also, accesses to main memory are many times faster than disk accesses.

To further improve predictability of database transactions, the database is fully replicated to all nodes, to make transaction execution independent of network delays or network partitioning. With full replication there is no need for remote data access during transactions. Replication also improves fault tolerance, since there are redundant copies of the data. Full replication allows a transaction to run all of its operations at the local node. A fully replicated database with detached replication [32], where replication is done after transaction commit, allows independent updates [18], that is, concurrent and unsynchronized updates of replicas of the same data objects. Independent updates may cause database replicas to become inconsistent, and inconsistencies must be resolved in the replication process by a conflict detection and resolution mechanism. In DeeDS, updates are replicated to all nodes detached from transaction execution, by propagation from the node executing an updating transaction after transaction commit, and integration of replicated updates at all the other nodes. Conflicting updates that result from independent updates are resolved at integration time. Temporary inconsistencies are allowed and guaranteed to be resolved at some point in time, giving the database the property of eventual consistency. Applications that use eventually consistent databases need to be tolerant of temporarily inconsistent replicas, and many distributed applications are. Further, in an eventually consistent database that supports a bounded time for such temporary inconsistencies (which can be achieved by using bounded time replication), applications that have requirements on timely replication and consistency can use a temporarily inconsistent database as well.
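The commit, propagate and integrate steps described above can be illustrated with a small sketch. It is not the DeeDS implementation: the class `Node`, the function `propagate`, and in particular the conflict rule (the update with the highest `(timestamp, node name)` version wins) are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    store: dict = field(default_factory=dict)    # object id -> (value, version)
    outbox: list = field(default_factory=list)   # committed updates not yet propagated

    def commit(self, obj, value, timestamp):
        """Commit locally; no other node is contacted during the transaction."""
        version = (timestamp, self.name)
        self.store[obj] = (value, version)
        self.outbox.append((obj, value, version))

    def integrate(self, obj, value, version):
        """Integrate a replicated update; the higher version wins (assumed rule)."""
        current = self.store.get(obj)
        if current is None or version > current[1]:
            self.store[obj] = (value, version)


def propagate(nodes):
    """Detached propagation: ship committed updates to all other nodes."""
    for sender in nodes:
        for update in sender.outbox:
            for receiver in nodes:
                if receiver is not sender:
                    receiver.integrate(*update)
        sender.outbox.clear()


a, b = Node("A"), Node("B")
a.commit("x", 1, timestamp=10)
b.commit("x", 2, timestamp=11)      # independent, conflicting update
diverged = a.store["x"] != b.store["x"]
propagate([a, b])
converged = a.store["x"] == b.store["x"]
```

Between the two commits and the call to `propagate`, the replicas differ; after integration both hold the same value, which is the eventual consistency property in miniature.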

In a fully replicated database using detached replication, a number of predictability problems can be avoided that are associated with synchronization of concurrent updates at different nodes, such as agreement protocols or distributed locking of replicas of objects, and reliance on stable communication to access data. Also, application programming is easier since the application programmer may assume that the entire database is available, and that the application program has exclusive access to it.

2.2 The concept of scalability

Scalability is a concept that intuitively may appear obvious. It is used in many areas of research, but the availability of a generic definition and a theoretical framework with metrics is limited. A system is scalable if the growth function for required amount of resources, req(p), does not exceed the function for available amount of resources, res(p), when the system is scaled up for some system parameter p (also called the scale factor). The resource usage may follow some function of the scale factor, and the upper bound for this function, O(req(p)), must not exceed the function of available resources. For a system with linear scalability this relation must be valid for all sizes of p, but for other systems scalability may be related to only certain sizes of p. In a few research areas, scalability concepts are well developed and related metrics for scalability are available, namely in the areas of parallel computing systems [86] [57], in particular for resource management [55] and for shared virtual memory [74], for design of system architectures for distributed systems [16], and for network resource management [3].

Frölund and Garg define a list of generic terms for scalability analysis in distributed application design [29]:

• Scalability: A distributed (software) design D is scalable if its performance model predicts that there are possible deployment and implementation configurations of D that would meet the Quality of Service (QoS) expected by the end user, within the scalability tolerance, over a range of scale factor variations, within the scalability limits.

• Scale factor: A variable that captures a dimension of the input vector that defines the usage of a system.

• Scaling enabler: Entities of design, implementation or deployment that can be changed to enable scalability of the design.

Further, the authors also define Scalability point, Scalability tolerance, Scalability limits and Scaling parameters, which we exclude here.

We can map the terms above into our context. QoS can be seen as timeliness of local transactions (deadline miss ratio) and the level of consistency (consistency model properties and the bound on replication delay). Scale factors may be the database size, the number of nodes, the ratio and frequency of update transactions, or others.


Jogalekar and Woodside [40] have presented a generic metric for scalability in distributed systems based on productivity, and a system is regarded as scalable if productivity is maintained as the scale changes. Given the quantities:

• $\lambda(k)$ = throughput in responses/sec, at scale $k$

• $f(k)$ = average value of each response, calculated from its quality of service at scale $k$

• $C(k)$ = cost at scale $k$, expressed as running cost per second to be uniform with $\lambda$

The value function $f(k)$ includes appropriate system measures, such as response delay, availability, timeouts or probability of data loss. The productivity $F(k)$ is the value delivered per second, divided by the cost per second:

$$F(k) = \frac{\lambda(k) \cdot f(k)}{C(k)} \qquad (1)$$

The scalability metric $\psi(k_1, k_2)$ relates the productivity at two different scales such that

$$\psi(k_1, k_2) = \frac{F(k_2)}{F(k_1)} \qquad (2)$$

With $\psi(k_x, k_y)$, design alternatives x and y can be compared, such that a higher ratio indicates better scalability for $k_y$. It is not possible to compare scalability between different systems, since the ratio is based on specific value functions, but scalability can be evaluated for finding alternative scaling enablers and settings for scale factors for a particular system. The authors use the scalability metric for evaluating actual systems, and the metric has also been used by other authors using other value functions [17] [42]. Similarly, we need to define an appropriate value function for a distributed real-time database, which would be connected to the real-time properties of the system. Typically this would include the timeliness of transactions for different classes of transactions. The cost function would relate to the amount of resources used, in terms of bandwidth, storage and processing time.
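As a worked illustration of equations (1) and (2), the following sketch computes the productivity at two scales and their ratio. The function names and all numbers are made up for the example; in particular, using the fraction of transactions that meet their deadline as the value function is only an assumption about what a suitable value function for a real-time database might look like.

```python
def productivity(throughput, value, cost):
    """Equation (1): F(k) = lambda(k) * f(k) / C(k)."""
    return throughput * value / cost


def scalability(f_k1, f_k2):
    """Equation (2): psi(k1, k2) = F(k2) / F(k1)."""
    return f_k2 / f_k1


# Hypothetical measurements at a small scale k1 and a larger scale k2.
# value = assumed fraction of transactions that meet their deadline.
f_small = productivity(throughput=1000.0, value=1.0, cost=10.0)
f_large = productivity(throughput=4000.0, value=0.9, cost=40.0)
psi = scalability(f_small, f_large)
```

With these figures, throughput and cost both grow by a factor of four while the value per response drops, so the ratio ends up below 1: productivity is not fully maintained when the system is scaled up.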

2.3 Scalability in a fully replicated real-time main memory database

For a fully replicated database with immediate consistency, updates are replicated to all data replicas during the execution of the updating transaction, using a distributed agreement algorithm to update all replicas at one instant, so that there is no state where replicas can differ during the update. Such updates must lock a majority of replicas of the updated object during the transaction. With detached replication, transaction timeliness does not depend on timeliness for locking replicas, the network delays, or the waiting time for release of locks set by transactions at other nodes. Thus, detached replication is a scaling enabler for such a database. However, a fully replicated database with detached replication has another scalability problem in that all updates need to be sent to all other nodes, regardless of whether the data will be used there or not. Full replication also requires that replicas of all data objects are stored at all the nodes, independent of whether the data will ever be used there or not. Also, updates in fully replicated databases must be integrated at all nodes, requiring integration processing of updates at all nodes.

Under the assumption that replicas of data objects will only be used at a bounded subset of the nodes, the required degree of replication becomes lower than in a fully replicated database. The resource usage of bandwidth, storage and processing depends on the degree of replication, and these resources are wasted in a fully replicated database, compared to a database with a lower degree of replication. With replication of only those data objects that are actually used, resources can be saved and thereby scalability can be improved.
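The effect of a bounded degree of replication can be illustrated with a simple message-count model, checked against the scalability condition req(p) ≤ res(p) from section 2.2. This is only a sketch under stated assumptions: one message per update per remote replica, a fixed update rate per node, and a per-node capacity that makes available bandwidth grow linearly with the number of nodes. All function names and figures are hypothetical.

```python
def updates_sent_full(nodes, updates_per_node):
    """Full replication: every update is sent to all other nodes."""
    return nodes * updates_per_node * (nodes - 1)


def updates_sent_bounded(nodes, updates_per_node, degree):
    """Bounded replication: every update is sent to the other replicas only."""
    return nodes * updates_per_node * (min(degree, nodes) - 1)


def is_scalable(req, res, scales):
    """Scalable over `scales` if required resources never exceed available ones."""
    return all(req(p) <= res(p) for p in scales)


# Assumed figures: 10 updates/s per node, 200 messages/s capacity per node,
# and no object replicated at more than 5 nodes.
def capacity(p):
    return 200 * p


def full(p):
    return updates_sent_full(p, 10)


def bounded(p):
    return updates_sent_bounded(p, 10, degree=5)


scales = range(2, 101)
full_ok = is_scalable(full, capacity, scales)
bounded_ok = is_scalable(bounded, capacity, scales)
largest_full = max(p for p in scales if full(p) <= capacity(p))
```

Under these assumptions the fully replicated configuration stays within capacity only up to 21 nodes, because its message load grows quadratically, while the bounded configuration grows linearly and stays within capacity over the whole range.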

2.4 Virtual full replication

With virtual full replication [6] the database clients have an image of a fully replicated database. The database system manages knowledge of what is needed for database clients to perceive such an image. This knowledge includes a specification of the data accesses required by database clients. Virtual full replication uses the knowledge to replicate data objects to the subset of the nodes where data is needed, and also to replicate with the least resource usage needed to maintain the image. With virtual full replication several consistency models for data may coexist in the same database, and the knowledge is also used to ensure the consistency model for each data object.

Scalable usage of resources is dependent on the number of replicas to update. In a fully replicated database there are as many replicas of each object as there are nodes, while a virtually fully replicated database has fewer replicas, so its resource consumption is lower. The overall degree of replication is lower and replication processing that serves no purpose is avoided. Only those data objects that are used at a node are replicated there, which reduces resource usage of bandwidth, storage and processing. With such resource management, scalability is improved without changing the application’s assumption of having a complete database replica available at the local node.

However, virtual full replication that considers only specified data accesses does not make data available for accesses to arbitrary data at an arbitrarily selected node, which makes such a system less flexible than a fully replicated database. For unspecified data accesses and for changes in data requirements, virtual full replication adapts by detecting changes and reconfiguring replica allocation and replication. This preserves scalability by managing the degree of replication over time.

Segmentation of the database is an approach for limiting the degree of replication in a virtually fully replicated database. A segment is a subset of the objects in the database, and each segment has an individual degree of replication over the nodes. The degree of replication is a result of allocating segments only to the nodes where their data objects are accessed. This is typically far fewer nodes than are used in a fully replicated database. Also, a database may have multiple segmentations, each for a different purpose. Segmenting for data availability strives to minimize the degree of replication, while a segmentation for consistency points out the method for replicating updates in the database.
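One simple way to form segments for data availability is to group the objects that are accessed from exactly the same set of nodes, and to allocate each group to those nodes only. The sketch below shows this idea; it is a simplification for illustration and not the segmentation algorithm of section 4.2.1. The function name `segment_by_access` and the example access specification are assumptions.

```python
from collections import defaultdict


def segment_by_access(accesses):
    """Group objects that are accessed by exactly the same set of nodes.

    accesses: dict mapping object id -> set of node ids that use the object.
    Returns a dict mapping frozenset of nodes -> sorted list of object ids.
    """
    segments = defaultdict(list)
    for obj, nodes in accesses.items():
        segments[frozenset(nodes)].append(obj)
    return {nodes: sorted(objs) for nodes, objs in segments.items()}


# Hypothetical access specification for a three-node database.
accesses = {
    "o0": {"N0", "N1"},
    "o1": {"N0", "N1"},
    "o2": {"N2"},
    "o3": {"N0", "N1", "N2"},
}
segments = segment_by_access(accesses)

total_nodes = 3
replicas_full = len(accesses) * total_nodes
replicas_segmented = sum(len(nodes) * len(objs) for nodes, objs in segments.items())
```

In this example full replication stores 12 object replicas, while the three segments need only 8, and each update is sent only to the nodes of the segment that holds the updated object.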

2.4.1 Basic notation

In this thesis proposal the following notation is used.

A database maintains a finite set of logical data objects $O = \{o_0, o_1, \ldots\}$, representing database values. Object replicas are physical manifestations of logical objects. A distributed database is stored at a finite set of nodes $\mathcal{N} = \{N_0, N_1, \ldots\}$. A replicated database contains a set of object replicas $\mathcal{R} = \{r_0, r_1, \ldots\}$.

The function $R : O \times \mathcal{N} \rightarrow \mathcal{R}$ identifies the replica $r \in \mathcal{R}$ of a logical object $o \in O$ on a node $N \in \mathcal{N}$, if such a replica exists. $R(o, N) = r$ if $r$ is the replica of $o$ on node $N$. If no such replica exists, $R(o, N) = \mathit{null}$. $\mathit{node}(r)$ is the node where replica $r$ is located, i.e. $\mathit{node}(R(o, N)) = N$. $\mathit{object}(r)$ is the logical object that $r$ represents, i.e. $\mathit{object}(R(o, N)) = o$.

A distributed global database (or simply database) $D$ is a tuple $\langle O, \mathcal{R}, \mathcal{N} \rangle$, where $O$ is the set of objects in $D$, $\mathcal{R}$ is the set of replicas of objects in $O$, and $\mathcal{N}$ is the set of nodes such that each node $N \in \mathcal{N}$ hosts at least one replica in $\mathcal{R}$, i.e. $\mathcal{N} = \{N \mid \exists r \in \mathcal{R}\,(\mathit{node}(r) = N)\}$.

We model transaction programs, $T$, using four parameters, including the set of objects read by the transaction, $\mathit{READ}_T$ (the read set), the set of objects written by the transaction, $\mathit{WRITE}_T$ (the write set), and the conflict set, $\mathit{CONFLICT}_T$, which is the set of objects that conflict with updates at other nodes. The transaction program $T$ can thus be defined as $T = \{\mathit{READ}_T, \mathit{WRITE}_T, \mathit{CONFLICT}_T\}$. We refer to the size of the read set as $r_T = |\mathit{READ}_T|$, the size of the write set as $w_T = |\mathit{WRITE}_T|$, and the size of the conflict set as $c_T = |\mathit{CONFLICT}_T|$. Also, the working set $\mathit{WS}_T$ is the union of the read and write sets.

A transaction instance $T_j$ of a transaction program executes at a given node $n$ with a certain maximal frequency $f_j$. We define such a transaction instance by a tuple $T_j = \langle f_j, n, T \rangle$. When the node for execution is implicit and we only need the sizes of the read, write, and conflict sets, we simplify the notation to $T_j = \langle f_j, r_j, w_j, c_j \rangle$.

Further, $\mathit{node}(T_j)$ is the node where transaction $T_j$ is executed, and $\mathit{begin}(T_j)$ is the time at which transaction $T_j$ began its execution. $\mathit{commit}(T_j)$ is the commit time of $T_j$ and $\mathit{abort}(T_j)$ is the abort time of $T_j$ (if $T_j$ is aborted, $\mathit{commit}(T_j) = \bot$; if $T_j$ is committed, $\mathit{abort}(T_j) = \bot$). $\mathit{end}(T_j)$ is the completion time of $T_j$.

2.4.2 Definition

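Virtual full replication ensures that for each transaction that reads or writes a database object at a node, a replica of that object exists at that node. Formally,

$$\forall o \in O \;\; \forall T \;\; \big(o \in \mathit{READ}_T \cup \mathit{WRITE}_T \rightarrow \exists r \in \mathcal{R}\,(r = R(o, \mathit{node}(T)))\big)$$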
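The definition can be read as a predicate over a set of transactions and a replica allocation. The following sketch checks it directly; the class `Transaction`, the representation of the replica function as a set of (object, node) pairs, and the example data are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    node: str
    read_set: frozenset
    write_set: frozenset


def missing_replicas(transactions, replicas):
    """Return the (object, node) pairs that violate the definition.

    replicas: set of (object, node) pairs for which a replica exists,
    i.e. the pairs where R(o, N) is not null.
    """
    missing = set()
    for t in transactions:
        for obj in t.read_set | t.write_set:
            if (obj, t.node) not in replicas:
                missing.add((obj, t.node))
    return missing


def is_virtually_fully_replicated(transactions, replicas):
    return not missing_replicas(transactions, replicas)


replicas = {("o0", "N0"), ("o1", "N0"), ("o1", "N1")}
t0 = Transaction("N0", frozenset({"o0"}), frozenset({"o1"}))
t1 = Transaction("N1", frozenset({"o1"}), frozenset())
t2 = Transaction("N1", frozenset({"o0"}), frozenset())   # o0 has no replica at N1

ok_before = is_virtually_fully_replicated([t0, t1], replicas)
violations = missing_replicas([t0, t1, t2], replicas)
```

Here the allocation satisfies the definition for `t0` and `t1`, although object `o0` is stored at one node only. Adding `t2` produces a violation, which is the kind of unspecified data need that an adaptive scheme would have to detect and resolve by creating a replica of `o0` at `N1`.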

2.5 Incremental recovery

With incremental recovery, a node in a fully replicated distributed database can be recovered into a consistent database replica, without any of the other working replicas needing to be stopped or even locked [45]. Incremental recovery has been proven to give a consistent database copy at the recovered node.

In a main memory database, data is stored in memory pages, which are the smallest memory entities accessed during read or write operations. It is assumed that memory management circuitry ensures that a read or write operation has exclusive access to a certain memory page during the operation. Fuzzy checkpointing uses this mechanism for sequentially copying all memory pages at a node (the recovery source), and such a copy can be sent over the network to recover a failed node (the recovery target). The recovery source is selected by a negotiation process that chooses the most appropriate source node. Each page that is copied is logged at the recovery source, and pages in the log that are updated after the memory page was sent to the target node need to be sent again. For such updates, updated memory pages are forwarded to the recovery target as long as fuzzy checkpointing proceeds. Once the entire database has been copied and all subsequent updates have been forwarded, the fuzzy checkpoint finishes atomically with a consistent replica at the recovered node.
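The copy, log and re-send steps can be sketched in a small single-threaded simulation. This is a deliberate simplification and not the recovery protocol of [45]: concurrent transactions are modelled as a table of updates that occur right after a given page has been copied, changed pages are forwarded in one final pass rather than continuously, and the negotiation of the recovery source and the atomic finish are left out. The function name and its parameters are hypothetical.

```python
def fuzzy_checkpoint(source, updates_during_copy):
    """Copy `source` (a list of page contents) to a new target, page by page.

    updates_during_copy: dict mapping the index of the page just copied to a
    (page number, new content) pair, modelling a transaction that updates
    the source while the copy is in progress. `source` is modified in place.
    Returns the target pages and the list of pages that had to be re-sent.
    """
    target = [None] * len(source)
    sent = set()     # log of pages already copied to the target
    dirty = set()    # logged pages that were updated after being sent

    for page in range(len(source)):
        target[page] = source[page]
        sent.add(page)
        if page in updates_during_copy:
            updated_page, content = updates_during_copy[page]
            source[updated_page] = content
            if updated_page in sent:
                dirty.add(updated_page)

    # Forward the pages that changed after they were sent.
    resent = sorted(dirty)
    for page in resent:
        target[page] = source[page]
    return target, resent


source = ["a0", "b0", "c0", "d0"]
updates = {1: (0, "a1"),   # page 0 is updated after it was already sent
           2: (3, "d1")}   # page 3 is updated before it is copied
target, resent = fuzzy_checkpoint(source, updates)
```

Only page 0 needs to be sent twice, since page 3 is updated before the sequential copy reaches it, and the target ends up identical to the source without the source ever being stopped or locked.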


3 Problem formulation

3.1 Motivation

A fully replicated distributed database scales badly since it replicates all updates to all nodes, using excess resources. Scalability is an increasingly important issue for distributed real-time databases, since the number of nodes, the database size, the number of users and the workload involved in typical distributed database applications are increasing. For many such systems it can be assumed that only a fraction of the replicated data is used at the local node, which is motivated by hot-spot and locality behavior of accesses in distributed databases.

This thesis argues that a fully replicated distributed main memory real-time database can be made scalable through effective resource management using virtual full replication, and that degrees of scalability can be quantified by metrics. Different scale factors and scaling enablers, which influence resource usage differently, are varied to evaluate scalability of the database. Also, resource usage is compared to alternative approaches for timeliness in large scale distributed databases, such as fully replicated databases, or approaches that use partial replication and remote transactions.

For many applications, we believe that a distributed real-time database can be a suitable infrastructure for communication between nodes. Publishing and using exchange data through a distributed real-time database facilitates structured storage and access, implicit consistency management between replicas, fault-tolerance and higher independence by lower coupling between applications. With a distributed database as an infrastructure there is no need to explicitly coordinate communication in a distributed application, which reduces complexity for the communicating application, in particular where the groups that communicate often change.

Consider a wildfire fighting mission. In such a dynamic situation, hundreds of different actors need to coordinate and distribute information in real time. In such a scenario, actors and information are added and removed dynamically when the mission situation suddenly changes. A distributed database has been pointed out as a suitable infrastructure for emergency management [75], and could be used as a white-board (also called 'black-board' [56], in particular as a software architecture approach [30], or for loosely coupled agents [54]), for reading and publishing current information about the state of a mission, supporting situation awareness from information assembled from all the actors. Using a database, implying the usage of transactions, ensures consistency of the information and also avoids the need for specific addressing within the communication infrastructure. Ethnographical field studies [43] show that such an infrastructure supports sense-making by enabling humans to interact, and it also gives support for the chain of command and strengthens organizational awareness, which is essential for success of the mission [58]. In such an infrastructure actors may have access to the complete information when needed, but each actor will most of the time use only parts of the information for their local actions and for collaboration with close-by peers. By using virtual full replication, scalability of such a distributed database can be preserved.

Virtual full replication uses known properties about data objects, the applications and the database system, to reduce resource usage by resource management based on the data needs. We have shown a database segmentation approach that considers multiple properties for data, and relations between the data properties. So far, only a few selected properties have been used, but an extended set of useful properties and their relations will need to be defined, to set up segments that meet data requirements from database applications. Since properties are related, it can be expected that groups of properties can be structured into profiles of data properties to reduce the resulting number of segments for our approach.

3.2 Problem statement

Predictable execution of transactions is inherent for distributed real-time databases. Predictable transactions can be achieved by predictable resource usage and sufficient efficiency of execution, and by excluding sources of unpredictability from the execution of transactions. With main memory residence, access times become independent of disk access times. With full replication, the entire database becomes locally available and local accesses become independent of network delays. With detached replication, the network delays for updating replicas are separated from the execution time of the transaction. Combining main memory residence, full replication and detached replication of updates gives predictable transactions in a distributed real-time database.

Such a database does not scale well, since all updates need to be replicated to all other nodes. By using knowledge about data needs, irrelevant replication can be avoided and the database becomes scalable, but such replication makes data available only for the data needs that are known prior to execution. Thereby, the database loses the flexibility to execute arbitrary transactions at arbitrary nodes. To maintain scalability while making data available, the virtually fully replicated database also needs to detect and adapt to unspecified data needs that arise during execution.

Our initial work shows that segmentation of the database, based on a priori known data needs and database and application properties, improves scalability by scalable usage of three key resources: bandwidth, storage and processing. Therefore, reconfigurable segmentation of the database seems to be a viable extension to be able to adapt to changed data needs and to regain the flexibility lost by fixed segmentation. Further, the scalability of such an approach needs to be evaluated. A quantification of scalability is needed to evaluate the influence of different scale factors and alternative scaling enablers.

This thesis explores, by detailed simulation, the scalability of a large scale distributed real-time database that manages resources by using virtual full replication. Such a database is also capable of adapting to changed data needs while preserving the real-time properties of the application. The hypothesis of this thesis is that such a system is scalable, and that scalability can be preserved by adaptation during execution.

3.3 Problem decomposition

The problem has several aspects and we choose to divide it into the following components:

• Concept elaboration. The concept of virtual full replication has been pointed out as an approach for providing an image of full replication, while replicating only what is needed to create such an image [6]. The approach needs to be elaborated with algorithms and an architecture that can support it. Also, the conditions under which virtual full replication manages to provide such an image need to be identified.

• Formation and allocation of units of replication. Segmentation of the database, and replication of segments only to the nodes where the data is used, seems to be a useful approach for bounding resource requirements. The cost of segmentation needs to be elaborated, both the processing cost of running a segmentation algorithm, in particular during system run time, and the storage cost of the data structures that maintain segments. A part of this subproblem is to define how segments are appropriately formed and adapted, and which data properties to consider for saving resources by using segmentation. Also, architectural support needs to be identified for segment management and replication of updates.

• Adaptation of database configuration. To maintain scalability of the database throughout execution, the virtually fully replicated database will need to adapt to changed data needs of the database clients. For this purpose, adaptation algorithms and architectures need to be developed and evaluated, and the types of changes to consider for adaptation need to be identified.

• Properties and their relations. Virtual full replication is based on knowledge about data properties and application requirements. In our present work, we have used only a few of the data properties that could be considered for segmentation and resource management. Further, we have shown that there exist relations between some of the properties, and such relations may be used to improve management of resources. However, we have not defined a full framework of properties and their relations. In such a framework it may be useful to define application profiles of related properties that match typical groups of applications, to reduce the amount of information used for resource management.

• Scalability analysis and evaluation. From our initial studies we see that different scale factors influence scalability differently. So far, we have only evaluated a few scale factors, such as the number of nodes and the bound on replication degree, and we will add others for a more extensive evaluation. The concept of virtual full replication does not put limitations on how the image of full replication is built. Therefore, alternative scaling enablers may be explored and combined for resource management, including approaches for replication of updates, segmentation that considers differences in the cost of communication links, or different propagation topologies.

3.4 Aims

Our aims with this thesis are:

A1 To bound resource usage to achieve scalability in a distributed main memory real-time database, by exploring virtual full replication as a scaling enabler.

A2 To show how scalability can be quantified for such a system.

A3 To show how properties and their relations can be used in resource management for virtual full replication.

3.5 Objectives

O1 Elaborate the concept of virtual full replication.

O2 Bound used resources by virtual full replication for expected data requirements.

O3 Bound used resources by virtual full replication for unexpected data requirements.

O4 Define conditions for scalability, for expected and unexpected requirements.

O5 Quantify scalability in the domain, and evaluate the scalability achieved in relation to the conditions.


4 Methodology

We develop and evaluate an approach to resource management for scalability using virtual full replication. We assess scalability using analysis, simulation and implementation, while evaluating the proposed scalability enablers. We start by elaborating the problem with full replication and we present possible approaches for virtual full replication. As next steps we refine the approach by using static and adaptive segmentation, and we elaborate segmentation algorithms for multiple properties of data and multiple requirements from database applications. For an initial evaluation of segmentation, we implement static segmentation in the DeeDS database prototype. We develop a simulation for evaluating adaptive segmentation and for studying large scale systems. Adaptive segmentation is more complex to analyze and also to implement in the actual database system, so we study scalability in an accurate simulation of the system. As a sanity check of the simulation, we replicate the experiments already done with the DeeDS implementation. Further, the approach for adaptive segmentation is extended with pre-fetch of data for availability. Finally, we summarize the findings about data properties and their relations, as used in our segmentation approach, to document a framework of useful properties that can be used to improve scalability of distributed real-time databases. Some of the steps in our methodology have already been done while other steps remain. Some of the steps described can optionally be excluded, which is indicated at each step.

4.1 Virtual full replication for scalability

M1: In the first research step, virtual full replication was introduced and segmentation was proposed as a means for achieving it [52]. The concept of virtual full replication was elaborated. By using knowledge about the actual data needs, and also recognizing differences in properties of the data, segments can be allocated only to the nodes where data is used, thereby bounding resource requirements at a lower level while maintaining the same degree of (perceived) availability of data for the application. This reduces flexibility compared to a fully replicated database, since not all objects are available at all nodes. Consequently, availability for unspecified accesses cannot be guaranteed. For hard real-time systems, access patterns and resource requirements are usually known a priori and the data needs can easily be specified. For such systems the reduced flexibility is not a problem, since virtual full replication then guarantees availability for all data needs.

The aspect of cohesive nodes is emphasized, where some nodes share information more frequently than other nodes, or some nodes use data with the same consistency model. The paper describes work in progress and points out essential research problems in the area, such as the need for an analysis model of how system parameters influence critical resources. A first definition of scalability for a fully replicated database ('replication effort') was presented and an algorithm for segmentation was introduced, although with high complexity and a combinatorial problem when using multiple data properties.

Multiple consistency models may coexist in the same database, since different segments can use different consistency models. To evaluate this approach, co-existence of bounded and unbounded replication time for updates was implemented in the DeeDS prototype [48], following the proposed architecture for virtual full replication. In this implementation there are two segments with different types of eventual consistency, one with unbounded replication delays and another with bounded replication delays.

4.2 Static segmentation

M2: In the second research step, we pursued deeper knowledge about static segmentation. The segmentation approach for virtual full replication was refined [53].

We explore how virtual full replication can be supported by grouping data objects into segments. The database is segmented to group data objects that need to be accessed at the same set of nodes, and each segment is allocated only at the nodes where its data objects need to be accessed. Segmentation is an approach for managing resources by limiting the degree of replication for data objects, to avoid excessive resource usage. The data objects in a segment share key properties, where node allocation is one such property.

Segmenting a database for the known data references is a trivial problem, since a partitioning and replication schema can be derived directly from the list of required accesses. This can be done by a database designer or automatically from a list of operations in transactions. However, when adding other data properties, there will be a combinatorial increase in the resulting number of segments with unique combinations of properties, as described in [49].

With a segmented database, each segmentation enables support for a specific property, and by allowing multiple segmentations on the same database it is possible to support several properties. A segmentation on consistency allows multiple concurrent consistency models in the same replicated database. This enables new types of applications for the DeeDS database system, since it allows data objects that cannot use eventual consistency. With a segmentation on storage medium, some segments may be allocated to disk instead of memory, when the requirement is to prioritize durability over timeliness or even efficiency.

By segmenting the database on combinations of properties we can find the largest possible physical segments, where all data objects of a segment can be managed uniformly. We can recover segments faster by block-reading an entire physical segment from a backup node and also prioritize recovery of critical segments.

When combining several properties of the data to replicate, the number of segments multiplies with the number of data properties used and the number of values that each property can take. Consider a segmentation that allows two consistency models combined with a segmentation for allocation. The resulting number of possible segments may double compared to a segmentation for allocation only. To find the segments in a database where multiple properties are combined, a naive algorithm needs to loop through the combinations of values of each property for all data objects. Also, the more properties that are used for segmenting the database, the smaller the segments become. In the case where each object has its own unique combination of properties, each segment contains a single object. This gives an upper bound on the number of possible segments, but the case is unlikely in a database where the usage of objects tends to cluster into cohesive sets of data objects. Thus, there is a need for an approach to segmenting data on multiple properties that is efficient and considers clustering, but that also captures and uses knowledge about how properties can be combined.
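As an illustration of this growth, the following Python sketch groups objects by their combination of property values and counts the resulting segments. The allocation sets are borrowed from the example in Section 4.2.1, while the consistency values and all names (`count_segments`, `alloc`, `consistency`) are invented here purely for illustration.

```python
from collections import defaultdict

def count_segments(objects, properties):
    """Group objects by their combination of values for the chosen
    properties and return the number of distinct groups (segments)."""
    groups = defaultdict(list)
    for name, props in objects.items():
        key = tuple(props[p] for p in properties)
        groups[key].append(name)
    return len(groups)

# Hypothetical data: consistency values are assumptions, chosen so that
# every object ends up with a unique combination of properties.
objects = {
    "o1": {"alloc": frozenset({"N1", "N4"}), "consistency": "asap"},
    "o2": {"alloc": frozenset({"N2", "N5"}), "consistency": "bounded"},
    "o3": {"alloc": frozenset({"N2", "N3"}), "consistency": "asap"},
    "o4": {"alloc": frozenset({"N3"}), "consistency": "asap"},
    "o5": {"alloc": frozenset({"N2", "N3"}), "consistency": "bounded"},
    "o6": {"alloc": frozenset({"N1", "N4"}), "consistency": "bounded"},
}

by_alloc = count_segments(objects, ["alloc"])
by_both = count_segments(objects, ["alloc", "consistency"])
print(by_alloc, by_both)
```

With these assumed values, adding a two-valued property raises the segment count from four to six, which is the worst case of one object per segment rather than the full doubling to eight.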

In this step we have presented an algorithm that handles the combinatorial problem of segmenting a database on multiple properties, where multiple and overlapping segmentations of the database are allowed, and where dependencies between properties are considered [53]. We introduce logical segments that allow overlapping segmentations, where each segmentation represents a property or a set of properties of interest. From logical segments, we can derive physical segments, such as allocation and recovery units. With the refined segmentation algorithm presented, we can segment a database by multiple properties without a combinatorial problem. This algorithm also recognizes the relations between data properties, where properties can be dependent on each other, unrelated to each other, or mutually dependent. The dependency relation implies that a dependent property cannot be supported if the property it depends upon is not supported. Relations between properties reduce the combinations of segmentations for multiple properties, and are the means by which profiles may be created. Our approach allows control of relations by rules that can be applied to the set of combined properties. The approach also allows specification of data clustering, which sets the common properties equal for clustered objects that are typically shared by cohesive nodes.

In current work, we use our algorithm for automatic segmentation and for allocation of the units of distribution (physical segments), based on application knowledge about the data used by transactions and on other knowledge about data properties originating in application semantics.


For our future work, we plan to extend the algorithm to maintain scalability at execution time, supporting mode changes and unspecified data requests.

4.2.1 Static segmentation algorithm

Our approach to static segmentation is based on associating a set of properties with each object, so that objects with the same or similar property sets can be combined into segments. We introduce an example by first considering a single property: where data is accessed.

Consider a scenario with the following five transaction programs executing as seven transaction instances, in a database with at least six objects replicated to at least five nodes: $T_{1.1} = \langle r: o_1, o_6, w: o_6, N_1 \rangle$, $T_{1.2} = \langle r: o_1, o_6, w: o_6, N_4 \rangle$, $T_2 = \langle r: o_3, w: o_5, N_3 \rangle$, $T_3 = \langle r: o_5, w: o_3, N_2 \rangle$, $T_{4.1} = \langle r: o_2, w: o_2, N_2 \rangle$, $T_{4.2} = \langle r: o_2, w: o_2, N_5 \rangle$, $T_5 = \langle r: o_4, N_3 \rangle$. Based on these accesses the objects have the following access sets: $o_1 = \{N_1, N_4\}$, $o_2 = \{N_2, N_5\}$, $o_3 = \{N_2, N_3\}$, $o_4 = \{N_3\}$, $o_5 = \{N_2, N_3\}$, $o_6 = \{N_1, N_4\}$.

These particular access sets can give a segmentation that has an optimal placement of data: $s_1 = \langle \{o_4\}, N_3 \rangle$, $s_2 = \langle \{o_3, o_5\}, N_2, N_3 \rangle$, $s_3 = \langle \{o_1, o_6\}, N_1, N_4 \rangle$, $s_4 = \langle \{o_2\}, N_2, N_5 \rangle$.

For an implementation, we use a table to collect all properties of interest. The data accesses of all transactions are marked in this table, with data objects in rows and nodes in columns. See Figure 1 (left) for a database of 6 objects replicated to 5 nodes. Note that rows in the table could equally well represent groups of data objects, such as object classes or user-defined object clusters that share the same properties. By assigning a binary value to each column, each row can be interpreted as a binary number that forms an object key identifying the nodes where the object is accessed. By sorting the table on the object key, we get the table in Figure 1 (right). Passing through this table once, we can collect rows with the same key value into unique segments and allocate each segment at the nodes marked. See Algorithm 1 for pseudo code.


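(left) Access locations before sorting

object         N1  N2  N3  N4  N5   object key
column value    1   2   4   8  16
o1              x   -   -   x   -        9
o2              -   x   -   -   x       18
o3              -   x   x   -   -        6
o4              -   -   x   -   -        4
o5              -   x   x   -   -        6
o6              x   -   -   x   -        9

(right) Access locations after sorting on the object key

object         N1  N2  N3  N4  N5   segment key   segment
o4              -   -   x   -   -        4        Seg 1
o3              -   x   x   -   -        6        Seg 2
o5              -   x   x   -   -        6        Seg 2
o1              x   -   -   x   -        9        Seg 3
o6              x   -   -   x   -        9        Seg 3
o2              -   x   -   -   x       18        Seg 4

Figure 1: Segmentation principle, used for segment allocation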

The sort operation contributes the highest computational complexity in this algorithm. Thus, the algorithm segments the database in $O(o \log o)$, where $o$ is the number of objects in the database.
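The table-and-key procedure described above can be sketched in a few lines of Python. This is a minimal illustration of the principle, not the DeeDS implementation: the tuple layout for transactions and the function name `segment` are assumptions, while the transactions, nodes and column values follow the example.

```python
from itertools import groupby

# Transaction instances from the example: (read set, write set, node).
TRANSACTIONS = [
    ({"o1", "o6"}, {"o6"}, "N1"),  # T1.1
    ({"o1", "o6"}, {"o6"}, "N4"),  # T1.2
    ({"o3"}, {"o5"}, "N3"),        # T2
    ({"o5"}, {"o3"}, "N2"),        # T3
    ({"o2"}, {"o2"}, "N2"),        # T4.1
    ({"o2"}, {"o2"}, "N5"),        # T4.2
    ({"o4"}, set(), "N3"),         # T5
]
NODES = ["N1", "N2", "N3", "N4", "N5"]

def segment(transactions, nodes):
    # Binary column value per node: N1 -> 1, N2 -> 2, N3 -> 4, ...
    col_value = {n: 1 << j for j, n in enumerate(nodes)}
    # Mark accesses: an object's key is the OR of the column values of
    # every node at which the object is read or written.
    obj_key = {}
    for reads, writes, node in transactions:
        for obj in reads | writes:
            obj_key[obj] = obj_key.get(obj, 0) | col_value[node]
    # Sort on the object key (the O(o log o) step), then make one pass
    # that collects runs of equal keys into segments.
    ordered = sorted(obj_key, key=lambda o: (obj_key[o], o))
    segments = []
    for k, run in groupby(ordered, key=lambda o: obj_key[o]):
        allocation = [n for n in nodes if k & col_value[n]]
        segments.append((k, list(run), allocation))
    return segments

segments = segment(TRANSACTIONS, NODES)
for key, objs, alloc in segments:
    print(key, objs, alloc)
```

For the example transactions this yields four segments with keys 4, 6, 9 and 18, matching $s_1$ to $s_4$ above.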

Such a property set can be extended with additional knowledge. We can uniformly handle more properties of objects by extending the property sets. Assume that the database clients require that $o_2$ has a bound on the replication time, and that $o_1$ and $o_6$ can be allowed to be temporarily inconsistent. Also, assume that objects $o_3$, $o_4$, $o_5$ need to be immediately consistent. Further, assume that objects $o_1$, $o_5$, $o_6$ are specified to be stored in main memory, while objects $o_2$, $o_3$, $o_4$ are stored on disk. To control multiple properties, we extend the property sets as follows (resulting in the table shown in Figure 2):

$o_1 = \{N_1, N_4\}, \{asap\}, \{memory\}$,
$o_2 = \{N_2, N_5\}, \{bounded\}, \{disk\}$,
$o_3 = \{N_2, N_3\}, \{immediate\}, \{disk\}$,
$o_4 = \{N_3\}, \{immediate\}, \{disk\}$,
$o_5 = \{N_2, N_3\}, \{immediate\}, \{memory\}$,
$o_6 = \{N_1, N_4\}, \{asap\}, \{memory\}$.

The sort operation contributes the highest computational complexity in the implemented algorithm. One sort operation is used for each segmentation of interest, and typically only a small number of segmentations are used. Thus, the algorithm segments the database in $O(o \log o)$ for multiple properties as well, where $o$ is the number of objects in the database. This is far better than the naive algorithm for multiple-property segmentation presented in [49], which has a computational complexity of $O(o!)$.


Algorithm 1: Static segmentation

/* Mark the read and write sets */
clear(access);
for i ← 1 to numtransactions do
    for j ← 1 to T[i].numnodes do
        n ← T[i].node[j];    /* node at which instance j of T[i] executes */
        for k ← 1 to T[i].R.numreads do
            access(n, T[i].R.read[k]) = 1;
        end
        for k ← 1 to T[i].W.numwrites do
            access(n, T[i].W.write[k]) = 1;
        end
    end
end
/* Assign key values for nodes */
for j ← 1 to numnodes do
    access.colKey[j] = 2^j;
end
/* Calculate object key */
for i ← 1 to numobjects do
    access.Key[i] = BuildKey(access, i, numnodes);
end
/* Sort lines on object key */
SortTable(access.Key, access);
/* Find segments and allocations */
segmID = 0;
currKey = -1;
currSeg = NIL;
for i ← 1 to numobjects do
    if currKey != access.Key(i) then
        /* a new key value starts a new segment */
        currKey = access.Key(i);
        segmID = segmID + 1;
        currSeg = NewSeg(currKey);
    end
    currSeg.Add(access.oid(i));
end


[Figure 2: Multiple property table. Rows are the objects o1 to o6; the column groups are ACCESSES (N1 to N5), CONSISTENCY (asap, bound, imm) and MEDIUM (disk, mem). Each column carries a binary column value (1 to 512) and each row an object key.]

from the complete property set. For instance, units of physical allocation can be generated from a subset consisting of the properties 'access' and 'storage medium' only. By executing the segmentation algorithm on this subset, the resulting segmentation gives the segments for physical allocation to nodes and for recovery of segments.
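Segmenting on a subset of the properties can be sketched as follows in Python. The property sets are those listed for the example; the column order, the resulting column values, and the names `GROUPS`, `object_key` and `segmentation` are assumptions made for this sketch and need not match Figure 2.

```python
from itertools import groupby

# Property sets from the example: (access nodes, consistency, medium).
PROPERTY_SETS = {
    "o1": ({"N1", "N4"}, "asap", "memory"),
    "o2": ({"N2", "N5"}, "bounded", "disk"),
    "o3": ({"N2", "N3"}, "immediate", "disk"),
    "o4": ({"N3"}, "immediate", "disk"),
    "o5": ({"N2", "N3"}, "immediate", "memory"),
    "o6": ({"N1", "N4"}, "asap", "memory"),
}

# Assumed column layout; column at position i gets the value 2^i.
GROUPS = {
    "access": ["N1", "N2", "N3", "N4", "N5"],
    "consistency": ["asap", "bounded", "immediate"],
    "medium": ["disk", "memory"],
}
COLUMNS = GROUPS["access"] + GROUPS["consistency"] + GROUPS["medium"]
COL_VALUE = {c: 1 << i for i, c in enumerate(COLUMNS)}

def object_key(props, selected_groups):
    """Sum the column values of the marked columns, restricted to the
    property groups selected for this segmentation."""
    nodes, consistency, medium = props
    marked = set(nodes) | {consistency, medium}
    wanted = {c for g in selected_groups for c in GROUPS[g]}
    return sum(COL_VALUE[c] for c in marked & wanted)

def segmentation(property_sets, selected_groups):
    """One sort per segmentation, then one pass collecting equal keys."""
    keyed = sorted((object_key(p, selected_groups), o)
                   for o, p in property_sets.items())
    return [[o for _, o in run]
            for _, run in groupby(keyed, key=lambda pair: pair[0])]

allocation_only = segmentation(PROPERTY_SETS, ["access"])
physical = segmentation(PROPERTY_SETS, ["access", "medium"])
print(allocation_only)
print(physical)
```

Under these assumptions, segmenting on access alone gives four segments, while adding the storage medium splits $o_3$ and $o_5$ into separate physical segments because they are stored on different media.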

4.2.2 Properties, dependencies and rules

There may be conditions for combining properties. For instance, to guarantee transaction timeliness three conditions must be fulfilled: data needs to be stored in main memory, data needs to be available at the node where it is accessed, and the database needs to replicate by detached replication. Detached replication can be done as soon as possible (asap) or in bounded time (bounded). Consider $o_2$ in the property set above. To guarantee timeliness, we need to change the storage of object $o_2$ to main memory, to make the replication method property consistent with the storage property. Thus, we use a known dependency between properties to ensure consistency among the settings of the properties in use.

Also, we may combine objects $o_3$ and $o_5$ into the same segment by changing the storage of either of them. Both need immediate consistency, but there is no relation rule that requires a certain type of storage for that consistency. We can select storage either in memory or on disk, but to put them in the same segment both need to use the same storage setting. A choice could be to store them on disk, since disk storage is a cheaper resource than storage in memory.

Further, by additionally allocating $o_4$ to node $N_2$ we can create a segment consisting of objects $o_3$, $o_4$, $o_5$. This placement is no longer optimal according to the specification of known data accesses. Still, this may be a valid choice to reduce the number of segments in the segmentation.


By considering properties of data and the relations between them, segmentations can be made consistent by fulfilling rules that specify the property dependence relation. There may also be other rules that influence the table, such as guaranteeing a minimum degree of data replication to provide a certain level of fault tolerance, or labeling of clustered data, that is, data objects that are replicated together and have the same property values. With such a clustering rule, the entries for the clustered data objects are set to the same value, typically based on the most restrictive combination of properties of the clustered objects.
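The dependency and clustering rules discussed above can be sketched as simple transformations of the property sets before segmentation. The following Python fragment is only an illustration: the rule functions, the strictness ordering of consistency levels, and the choice to pass the cluster's storage medium explicitly are assumptions of this sketch, not part of the DeeDS implementation.

```python
# Property sets from the running example: (nodes, consistency, medium).
property_sets = {
    "o1": (frozenset({"N1", "N4"}), "asap", "memory"),
    "o2": (frozenset({"N2", "N5"}), "bounded", "disk"),
    "o3": (frozenset({"N2", "N3"}), "immediate", "disk"),
    "o4": (frozenset({"N3"}), "immediate", "disk"),
    "o5": (frozenset({"N2", "N3"}), "immediate", "memory"),
    "o6": (frozenset({"N1", "N4"}), "asap", "memory"),
}

# Assumed ordering from least to most restrictive consistency level.
STRICTNESS = {"asap": 0, "bounded": 1, "immediate": 2}

def apply_dependency_rule(sets):
    """Dependency rule: detached replication (asap or bounded) is taken
    to require main memory storage, so storage is adjusted to match."""
    for oid, (nodes, consistency, _medium) in list(sets.items()):
        if consistency in ("asap", "bounded"):
            sets[oid] = (nodes, consistency, "memory")

def apply_cluster_rule(sets, cluster, medium):
    """Clustering rule: all objects in the cluster get the union of
    their node sets, the most restrictive consistency level, and one
    common storage medium chosen by the designer."""
    nodes = frozenset().union(*(sets[o][0] for o in cluster))
    consistency = max((sets[o][1] for o in cluster), key=STRICTNESS.get)
    for o in cluster:
        sets[o] = (nodes, consistency, medium)

apply_dependency_rule(property_sets)
apply_cluster_rule(property_sets, ["o3", "o4", "o5"], medium="disk")
print(property_sets)
```

After both rules, $o_2$ is stored in main memory and $o_3$, $o_4$, $o_5$ share one property set, so they fall into a single segment at nodes $N_2$ and $N_3$.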

4.2.3 Analysis model and assumptions

To evaluate resource usage for the implementation of static segmentation, we present an analysis model for the usage of three key resources. Bandwidth usage is reduced when fewer nodes need to be updated. Also, such a system requires less overall storage, since each node hosts only a subset of the database objects. As a consequence, the overall processing for conflict detection and resolution for inconsistent replicas is lowered, since fewer nodes host conflicting replicas.

We evaluate our approach by examining how three important system resources are used when selected system parameters are scaled up. The baseline for comparison is a fully replicated database with detached replication, such as DeeDS. First, a large-scale distributed database with many nodes requires scalable bandwidth usage, and full replication does not scale well since updates generate $O(n^2)$ replication messages for $n$ nodes. Second, a large distributed and fully replicated database stores the entire database on every local node, requiring storage of $O(nm)$ data objects. Third, conflict detection and resolution requires $O(n^2)$ processing time, since every node that has a replica must resolve a conflict caused by any other node. Based on our database model, we present an analysis of how these three resources scale, for both a virtually fully replicated and a fully replicated system.

We assume that the database is distributed to $n$ nodes, and that with segmentation the degree of replication is limited to $k \le n$ nodes. In such a database, every update needs to be replicated to $k - 1$ nodes. With network functions for multicast or broadcast, the system could replicate data to a specific set of nodes with a single send message. This would reduce the processing time required to send the data and the number of messages sent on the network, but it would not reduce the integration time or the conflict detection and resolution time of an update. Currently, we focus on networks with point to point communication only, since we argue that scalability in such networks is a problem that can be generalized to many different kinds of networks. Replication of several updates for the same segment replica may also be coordinated to reduce communication overhead, but in our current analysis model we assume that each update is sent individually. Thus, we consider each object update to generate a send of $k - 1$ single update messages. Our simple analysis model in this paper does not consider individual degrees of replication $k_i$ for each segment, but the same degree of replication, $k$, for all segments.

Selective replication by multicast has been studied before [4] and may be included in future work as part of a larger study involving other network topologies than the single network resource used in the current analysis model. We also consider coordinated propagation of replication messages to be a future extension of the work in this paper.

The number of transaction instances in the database is denoted $|T_j|$. We characterize each transaction, $T_j$, by its frequency, $f_j$, the size of its read set, $r_j$, the size of its write set, $w_j$, and the size of its conflict set, $c_j$. Thus, for the purpose of the analysis, the characteristics of each transaction can be modeled as $T_j = \langle f_j, r_j, w_j, c_j \rangle$.

An analysis of network usage is essential, since a distributed system does not scale well if the communication cost prevents large systems from being built. Bandwidth is a shared resource that is critical for scalability of the system. We choose to model this resource as a single shared network link, as is the case with traditional Ethernet or with time-shared communication, such as that used in wireless sensor networks. A single shared network resource is more restrictive for scalability than, for example, a switched network, where the design topology can relieve congestion by distributing communication over multiple independent network links.

A segmented database stores $k$ replicas of database objects, instead of the $n$ replicas that are stored in a fully replicated database. For applications that typically have a low replication degree, $k$, much memory can be saved overall by segmentation, and scalability is improved. For an analysis of storage we may consider the storage of database object replicas, the administrative data structures for segments and the internal variables of the database system. The analysis in this paper considers the storage of data object replicas only, since additional data structures are expected to be small or independent of whether the system is fully replicated or segmented.

Processing time for storing updates on the local node includes time for locking, updating and unlocking the data objects, which we denote $L$. Updates for data objects that have replicas on other nodes use additional processing time to propagate the update, including logging, packing, marshalling, and sending the update on the network. We denote the processing time for propagation $P$. With point to point communication, the sending node spends $P$ time for each node that has a replica of the updated data object. A node receiving a replicated update from another node must integrate the update. This includes receiving, unpacking, unmarshalling, locking data objects, detecting conflicts, resolving conflicts, updating database objects, and finally unlocking the data objects that were replicated. We denote the time used for integration $I$, excluding conflict detection and resolution, while the time used for conflict detection and resolution is denoted $C$. Processing time $C$ only occurs when there are conflicting updates, while $I$ is used for all updates replicated to a node. Integration processing is needed at all nodes having a replica of the originally updated data object.

4.2.4 Analysis

We analyze scalability by examining how the chosen resources are used according to our analysis model.

Bandwidth usage depends on the number of updates replicated, including the new values and their associated version vectors. In our model every transaction $T_j$, executed at frequency $f_j$, generates one network message for each member of its write set $w_j$ to update its replicas at $k - 1$ nodes. Such network messages are generated by all transactions at all nodes. We express bandwidth usage in Equation 3.

$$(k - 1) \sum_{i=1}^{|T_j|} w_i f_i \quad \text{[messages/sec]} \tag{3}$$

From the formula we see that bandwidth usage scales with the degree of replication. For a virtually fully replicated database with a limit $k$ on the degree of replication, bandwidth usage scales with the number of replicas, $O(k)$, rather than with the number of nodes, $O(n)$, as is the case with a fully replicated database. However, the number of transactions often depends on the number of nodes, $O(n)$, so bandwidth usage becomes $O(kn)$ for the virtually fully replicated database and $O(n^2)$ for the fully replicated database.
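To make the difference between $O(kn)$ and $O(n^2)$ concrete, Equation 3 can be evaluated for a synthetic workload, as in the Python sketch below. The workload (two transactions per node, each at 10 Hz with a write set of size 3) and the helper names `bandwidth` and `workload` are assumptions chosen for illustration only.

```python
def bandwidth(transactions, k):
    """Equation 3: (k - 1) * sum(w_i * f_i), in messages per second.
    Each transaction is given as a pair (f, w)."""
    return (k - 1) * sum(w * f for f, w in transactions)

def workload(n, per_node=2, f=10.0, w=3):
    """Hypothetical workload where the number of transactions grows
    linearly with the number of nodes n."""
    return [(f, w)] * (per_node * n)

for n in (10, 20):
    vfr = bandwidth(workload(n), k=3)   # replication degree bounded at 3
    fr = bandwidth(workload(n), k=n)    # full replication: k = n
    print(n, vfr, fr)
```

Doubling the number of nodes doubles the bandwidth for the bounded case but roughly quadruples it for full replication, as the asymptotic expressions suggest.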

For storage of a database with $s$ segments, there are $k_i$ replicas of each segment of size $s_i$. We express the required storage for a virtually fully replicated database in Equation 4.

$$\sum_{i=1}^{s} (s_i k_i) \quad \text{[objects]} \tag{4}$$

In our analysis model we assume the same degree of replication, $k$, for all segments, $\forall i\,(k_i = k)$. Thus, each data object in the database is replicated at $k$ nodes and the required storage can be expressed as $o \cdot k$, where $o$ is the number of objects in the database. With full replication to $n$ nodes, $n$ replicas of each data object are stored and the required storage is $o \cdot n$. This means that storage for a virtually fully replicated database scales with the limit on replication, $k$, rather than with the number of nodes, $n$, as is the case with a fully replicated database.
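As a small worked instance of Equation 4, the Python sketch below applies it to the four segments of the running example from Section 4.2.1, using the individual replication degrees $k_i$ of that segmentation, and compares the result with full replication to five nodes. The function name `storage` is an assumption of this sketch.

```python
def storage(segment_sizes, degrees):
    """Equation 4: sum of s_i * k_i over all segments, in objects."""
    return sum(s * k for s, k in zip(segment_sizes, degrees))

# Segments s1..s4 of the running example: {o4}@N3, {o3,o5}@N2,N3,
# {o1,o6}@N1,N4 and {o2}@N2,N5.
sizes = [1, 2, 2, 1]
degrees = [1, 2, 2, 2]

segmented = storage(sizes, degrees)
# Full replication: every segment is replicated at all 5 nodes.
fully_replicated = storage(sizes, [5] * len(sizes))
print(segmented, fully_replicated)
```

For this example the segmented database stores 11 object replicas, compared with 30 under full replication.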

The processing time used for replication depends on the write set of each transaction. Additionally, the conflict set, $c_i$, requires conflict detection and resolution time, $C$. Thus, we can express the processing time as $\sum_{i=1}^{|T_j|} f_i \{[L + (k-1)P + (k-1)I] w_i + [(k-1)C] c_i\}$, or as Equation 5.

$$\sum_{i=1}^{|T_j|} f_i [L w_i + (k-1)\{(P + I) w_i + C c_i\}] \quad \text{[sec]} \tag{5}$$

Similarly to Equations 3 and 4, this formula shows that processing depends on the degree of replication and does not grow with the number of nodes in the system. The sizes of the write set and the conflict set also influence the amount of processing time required.
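The two forms of the processing expression can be checked against each other numerically. The Python sketch below does so for two hypothetical transactions; the cost values for $L$, $P$, $I$ and $C$ (integers, thought of as microseconds) and the function names are assumptions for illustration and are not measurements from DeeDS.

```python
def processing(transactions, k, L, P, I, C):
    """Equation 5: sum of f * (L*w + (k-1) * ((P + I)*w + C*c)).
    Each transaction is given as a triple (f, w, c)."""
    return sum(f * (L * w + (k - 1) * ((P + I) * w + C * c))
               for f, w, c in transactions)

def processing_expanded(transactions, k, L, P, I, C):
    """The unfactored form given in the text before Equation 5."""
    return sum(f * ((L + (k - 1) * P + (k - 1) * I) * w + (k - 1) * C * c)
               for f, w, c in transactions)

# Hypothetical transactions (frequency, write set size, conflict set size)
# and hypothetical per-operation costs.
transactions = [(10, 3, 1), (5, 2, 0)]
costs = dict(L=20, P=50, I=40, C=100)

total = processing(transactions, k=3, **costs)
print(total, processing_expanded(transactions, k=3, **costs))
```

Both forms give the same total, and changing `k` changes the result while adding nodes without replicas does not enter the expression at all.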

With multicast operations available in the network used, the analysis formulas become different for bandwidth and for processing. Replication of an update sends only one message for all the replicas, and bandwidth usage can be expressed as in Equation 6. For processing, each update is sent only once on the network, but integration, as well as conflict detection and resolution, is still done at every replica receiving an update. Thus, the corresponding processing time can be expressed as in Equation 7.

$$\sum_{i=1}^{|T_j|} w_i f_i \quad \text{[messages/sec]} \tag{6}$$

$$\sum_{i=1}^{|T_j|} f_i [L w_i + P w_i + (k-1)\{I w_i + C c_i\}] \quad \text{[sec]} \tag{7}$$
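The effect of multicast on the two formulas can be illustrated with the same kind of hypothetical numbers. In the Python sketch below, the transactions, the cost values and the function names are assumptions; the expressions themselves follow Equations 3, 6 and 7.

```python
def bandwidth_p2p(transactions, k):
    """Equation 3: point to point, (k - 1) messages per written object."""
    return (k - 1) * sum(w * f for f, w, _c in transactions)

def bandwidth_multicast(transactions):
    """Equation 6: one message per written object, independent of k."""
    return sum(w * f for f, w, _c in transactions)

def processing_multicast(transactions, k, L, P, I, C):
    """Equation 7: propagation cost P is paid once per written object,
    while integration (I) and conflict handling (C) are still paid at
    each of the k - 1 receiving replicas."""
    return sum(f * (L * w + P * w + (k - 1) * (I * w + C * c))
               for f, w, c in transactions)

# Hypothetical transactions (frequency, write set size, conflict set size)
# and hypothetical costs (integers, thought of as microseconds).
transactions = [(10, 3, 1), (5, 2, 0)]
costs = dict(L=20, P=50, I=40, C=100)

print(bandwidth_p2p(transactions, k=3), bandwidth_multicast(transactions))
print(processing_multicast(transactions, k=3, **costs))
```

With $k = 3$, multicast halves the number of messages in this example, while the processing saving is limited to the propagation term.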

4.2.5 Implementation

The segmentation algorithm has been used to implement virtual full replication in the DeeDS prototype, and the database was segmented into fixed segments with individual degrees of replication [53].

In this implementation, transactions are specified by a configuration file that describes the transactions with their data requirements, the data properties given by the application semantics, and specification of key capabilities of the database system.

For this implementation, transaction specifications were recorded in a configuration file that was translated into the property table. Some basic rules were hard coded and use only a few relations between properties. A generic system would need support for explicit rule specification. The replication schema and physical segments are derived from the property table using the static segmentation algorithm (Algorithm 1).

In the experiment we show here, all parameters and all scale factors but one were fixed. The number of nodes was varied from 1 to 10, and the three resources of bandwidth, storage and processing were measured. For virtual full replication, we limited the maximum degree of replication to 3 replicas.


Figure 3: Bandwidth usage vs. Number of nodes


Figure 4: Storage vs. Degree of replication

Figure 5: Processing vs. Number of nodes

Scalability has been examined by evaluating the usage of the resources. Additional experiments were done where we varied the degree of replication and the ratio of write operations in transactions. In these experiments, we compared the original implementation for a fully replicated database with the new implementation for segmentation, using both full replication and a lower degree of replication as typically used in a virtually fully replicated setting.

Figure 3 shows bandwidth usage for up to 10 nodes, with a replication degree of $k = 3$ for virtual full replication (VFR). For systems larger than 3 nodes, the bandwidth usage remains constant with virtual full replication, while for a fully replicated (FR) database it increases with the number of nodes added. The measurements show the total number of messages sent on the network to replicate one fixed size segment in the database during a fixed length run of the experiment.

Figure 4 shows how storage requirements change for a system of up to 10 nodes, and the measurements show the amount of storage for one fixed size segment. For a virtually fully replicated database, storage scales with the limit on replication, while for a fully replicated database it scales with the number of nodes. In this experiment we also measure the overall storage overhead for segment management in the system, shown at the bottom of Figure 4. The data for segment storage management scales with the number of nodes in the current implementation. This needs to be further analyzed and improved, to ideally scale with the degree of replication instead.

Figure 5 shows the processing used at each node to replicate updates. Processing for the virtually fully replicated database (VFR) does not increase with the number of nodes added, but remains constant, determined by the degree of replication. Processing for the fully replicated database (FR) scales with the number of nodes, while the implementation of Virtual full replication configured as a fully replicated database (FR new) adds processing overhead compared to FR.

4.2.6 Discussion

The curves match the behavior predicted by our analysis in Equations 3, 4 and 5 well. For a full large-scale evaluation, experiments need to be run with a larger number of nodes, which could not be done successfully with the current database prototype implementation. Our analysis in 4.2.4 complements the implementation evaluation for this research step by providing an analytical model of large-scale behavior. Our next step addresses the problem of studying a large-scale implementation.

4.3 Simulation for scalability evaluation

M3: We use simulation to examine large scale systems. The first simulation implementation experiment has been performed [50], strengthening the findings about bandwidth usage from the implementation of static segmentation in DeeDS. The simulator will be further developed to run experiments with 1000 nodes and more. Such large scale systems cannot easily be run as a DeeDS implementation, since that would require an infeasible amount of resources. Our simulation approach uses a representation of bandwidth, storage and processing, rather than using the actual resources as is the case with an implementation in the DeeDS prototype. One example of the advantage of simulation is the storage of the database itself: the simulation does not store the actual data, but only information that describes the data objects. By using only a value for the size of an object instead of storing the object, the amount of storage required for the data objects can still be modeled, while using fewer resources. With such a representation, network messages never replicate actual data object values, but only information about the size and version of the update. A simulation of large scale systems requires that the system is represented in such a way that the simulation itself is scalable. Since the aim is to study the resource usage of the replication mechanisms used by DeeDS and of the extensions for virtual full replication, these mechanisms are what is primarily modeled. The representation used for this needs to be further refined to be scalable to 1000 nodes and more.
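The representation described above can be sketched as follows. This is a minimal illustration, not the simulator reported in [50]: the class names `ObjectDescriptor`, `UpdateMessage` and `SimNode`, and the choice of counters, are assumptions made for this example. Data objects are represented only by size and version, and replication messages carry the same two values instead of object contents.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectDescriptor:
    """Stands in for a stored data object: only size and version are kept."""
    size: int          # bytes the real object would occupy
    version: int = 0


@dataclass
class UpdateMessage:
    """Stands in for a replication message: no object value is carried."""
    object_id: str
    size: int
    version: int


@dataclass
class SimNode:
    name: str
    store: dict = field(default_factory=dict)  # object_id -> ObjectDescriptor
    bytes_sent: int = 0        # modeled bandwidth usage
    updates_applied: int = 0   # modeled processing usage

    def storage_used(self) -> int:
        """Modeled storage: sum of the sizes of the described objects."""
        return sum(d.size for d in self.store.values())

    def write(self, object_id: str, size: int, replicas: list) -> None:
        """Update locally, then propagate a descriptor to the other replicas."""
        old = self.store.get(object_id)
        version = old.version + 1 if old else 1
        self.store[object_id] = ObjectDescriptor(size, version)
        msg = UpdateMessage(object_id, size, version)
        for peer in replicas:
            if peer is not self:
                self.bytes_sent += msg.size  # account for bandwidth only
                peer.apply(msg)

    def apply(self, msg: UpdateMessage) -> None:
        """Install a newer version and count the work as processing."""
        current = self.store.get(msg.object_id)
        if current is None or msg.version > current.version:
            self.store[msg.object_id] = ObjectDescriptor(msg.size, msg.version)
        self.updates_applied += 1


# Five simulated nodes; object "x" is replicated on the first three only.
nodes = [SimNode(f"n{i}") for i in range(5)]
nodes[0].write("x", size=256, replicas=nodes[:3])
```

Because each node holds only small descriptors and integer counters, the memory needed per simulated node is independent of the modeled object sizes, which is what makes experiments with many nodes plausible in such a design.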

Simulation enables an evaluation of static segmentation for large scale virtually fully replicated databases. In previous work we have implemented Virtual full replication in the DeeDS database prototype. That implementation could not give precise results, since only a few nodes could be run in the experiment, so only limited conclusions about large scale behavior could be drawn from it. In a simulation experiment, the database nodes are simulated, and the resources that are used can be represented as data entities rather than by using actual resources in the system.

The simulation is only a model of the real system, but with a properly designed simulation we have the opportunity to gain an understanding of the large scale behavior of Virtual Full Replication. Before a simulation is used, it must be shown to be valid for the purpose of the study, so our approach in this research step has two stages. The first stage has been to assemble relevant literature as a background on the validation of simulations. The second stage is to model and implement the simulation, which is work in progress. An initial simulation has been done and is reported in [50].

4.3.1 Motivation for a simulation study

Our motivations for simulating a distributed real-time database are:

• Large scale experiments with the prototype database system are infeasible, due to the amount of hardware and the size of the installation required.

• The analysis done in the previous step, of resource usage for bandwidth, storage and
