Fault tolerance in cloud technologies presented as a service

(1)


Pavel Dzhunev, PhD student

International Scientific Conference Computer Science’2015

(2)

INTRODUCTION

Improvements in virtualization techniques and in the performance of cloud networking technologies over the last decade have made it possible to achieve greater abstraction in the form of cloud computing.

A cloud can be defined as a framework for delivering services for various kinds of resources, such as computing, software and storage, over the global network -

(3)

RELEVANCE OF THE PROBLEM

Tolerance of failures in cloud computing is one of the main concerns of both users and providers of this type of modern service. Today public clouds provide reliability and good connectivity.

Take the example of a global site that we all know, EBAY.com. Today the uptime of this site is 99.95%, while 6-9 years ago it was 99.1%.
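To make the two uptime figures easier to compare, they can be converted into hours of downtime per year. The sketch below is only an illustration; the function name and the 365-day year are assumptions, not part of the original work.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours per year during which a service is unavailable at the given uptime."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


for uptime in (99.1, 99.95):
    hours = downtime_hours_per_year(uptime)
    print(f"{uptime}% uptime -> {hours:.2f} hours of downtime per year")
```

Under these assumptions, 99.1% uptime corresponds to roughly 79 hours of downtime per year, and 99.95% to a little over 4 hours.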

(4)

RELEVANCE OF THE PROBLEM

In this work, we introduce the concept of fault tolerance as a service (FTaaS), based on user requirements and on the agreement between the provider and the user.

Through this model, a tenant of the cloud service can choose from a large set of specifications, such as where the virtual machine is located, the type of connection, the speed of information delivery, and many other parameters.

(5)

BASIC MODEL

Let T = {T1, T2, T3, …, Tp} be a set of p tenants. The FTaaS has a planning span of τ slots over a time duration T, with every time slot of the same duration.

The costs associated with the two fault tolerance methods are as follows:

$C_{SS}$ - price of storage for each client

$C_{VM}$ - price of each virtual machine

$C_{cp}$ - price a tenant has to pay for each invocation of a coordinated checkpoint



(6)

BASIC MODEL

Let V be the maximum number of Virtual Machines that can be provisioned and let S be the maximum amount of storage [4] that is available for storing

checkpoints.

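The SLA for tenant Ti is represented by the tuple

\[ \langle N_i, S_i, t_i^{sla}, c_i^{sla}, R_i \rangle \]

where

$N_i$ - checkpoint interval function of tenant Ti;

$S_i$ - size of storage for tenant Ti;

$t_i^{sla}$ - indicator function for TMR, with values 0 or 1;

$c_i^{sla}$ - indicator function for COORD, with values 0 or 1;

$R_i$ - whether the slot is critical for tenant Ti (yes or no).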
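The SLA tuple can be pictured as a small data structure. The following is a minimal sketch, not the author's implementation: the class and field names are hypothetical, and it assumes the SLA is stored slot by slot, with a tenant asking for either TMR or COORD in a slot but never both.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SlotSLA:
    """One tenant's fault tolerance request for a single time slot k."""
    checkpoint_interval_hr: Optional[float]  # N_i(k); None when TMR is used
    tmr: int                                 # t_i^sla(k), 0 or 1
    coord: int                               # c_i^sla(k), 0 or 1
    critical: bool                           # R_i(k)

    def __post_init__(self):
        if self.tmr not in (0, 1) or self.coord not in (0, 1):
            raise ValueError("tmr and coord must be 0 or 1")
        if self.tmr + self.coord > 1:
            raise ValueError("a slot can use TMR or COORD, not both")


@dataclass(frozen=True)
class TenantSLA:
    """SLA of one tenant: checkpoint size S_i plus one SlotSLA per slot."""
    checkpoint_size_mb: float
    slots: Tuple[SlotSLA, ...]


# Example: a critical slot served by TMR, so no checkpoint interval is needed.
critical_tmr_slot = SlotSLA(None, tmr=1, coord=0, critical=True)
```

Validating the indicators at construction time keeps inconsistent requests out of any later capacity check.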

(7)

BASIC MODEL

Tenant can request to be served either by TMR or COORD as fault tolerance [5]. Hence,

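\[ c_i^{sla}(k) + t_i^{sla}(k) \le 1 \]

\[ c_i^{sla}(k) = \begin{cases} 1 & \text{if tenant } T_i \text{ requests COORD in slot } k \\ 0 & \text{otherwise} \end{cases} \]

Similarly, $t_i^{sla}(k) = 1$ when the tenant wants to be served in TMR mode in slot k, and 0 otherwise.

Example of fault tolerance SLA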

Below is an example of the fault tolerance requested by two tenants of the cloud service. The test is planned for one day, a total of 24 hours. This whole day is divided into 24 slots, each lasting one hour.

(8)

EXAMPLE OF FAULT TOLERANCE SLA


Tab. 1 (checkpoint size: 2 MB)

Slot  | Fault tolerance method | Checkpoint interval (Ni) | Critical, yes or no (Ri)
1-4   | COORD                  | 0.25 hr                  | N
5-9   | COORD                  | 0.5 hr                   | N
10-14 | TMR                    | -                        | N
15-22 | COORD                  | 0.5 hr                   | N
23    | TMR                    | -                        | Y
24    | COORD                  | 0.5 hr                   | N

(9)

EXAMPLE OF FAULT TOLERANCE SLA

Tab. 2 (checkpoint size: 4 MB)

Slot  | Fault tolerance method | Checkpoint interval (Ni) | Critical, yes or no (Ri)
1-4   | COORD                  | 0.5 hr                   | N
5-8   | TMR                    | -                        | N
9-12  | COORD                  | 0.5 hr                   | N
13-16 | TMR                    | -                        | N
17-20 | COORD                  | 0.5 hr                   | Y
21-24 | TMR                    | -                        | Y

(10)

EXAMPLE OF FAULT TOLERANCE SLA

Slot 23 is marked as critical for both the tenants and the service provider, so a request in that slot should either be denied or be served by adding a new virtual machine from the service provider.

Tables 1 and 2 show the test data for two different clients over the full day.

If the FTaaS provider has a maximum capacity of 4

virtual machines, it will not be able to meet the

(11)

DEFINITION OF THE PROBLEM

The results above give a clear idea of the main role of FTaaS. The provider has a number of virtual machines and an amount of storage space that are known in advance, and the main task of FTaaS is to analyse whether it is possible to meet the fault tolerance requirements of all tenants.

Let us look at the main problems of the global Internet in terms of denial of service.

We use the following variable: $y_i$, which equals 1 if tenant Ti is selected to be served and 0 otherwise.

In any slot, the total number of VMs cannot be more than what the FTaaS provider can provision. The TMR mode requires 3 VMs for each FTVM, and the COORD mode requires 1 VM for each FTVM.
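The VM rule can be checked against the two example SLAs. The sketch below is illustrative only: it assumes that each tenant runs a single FTVM and that both tenants are served, neither of which is stated in the source. The schedules are transcribed from Tab. 1 and Tab. 2, and the function names are hypothetical.

```python
# VMs needed per fault-tolerant VM (FTVM), as stated in the text.
VMS_PER_MODE = {"TMR": 3, "COORD": 1}

# (first_slot, last_slot, mode), transcribed from Tab. 1 and Tab. 2.
TENANT_1 = [(1, 4, "COORD"), (5, 9, "COORD"), (10, 14, "TMR"),
            (15, 22, "COORD"), (23, 23, "TMR"), (24, 24, "COORD")]
TENANT_2 = [(1, 4, "COORD"), (5, 8, "TMR"), (9, 12, "COORD"),
            (13, 16, "TMR"), (17, 20, "COORD"), (21, 24, "TMR")]


def vm_demand(schedule, slots=24):
    """Return the VM demand of one tenant for slots 1..slots (one FTVM assumed)."""
    demand = [0] * slots
    for first, last, mode in schedule:
        for k in range(first, last + 1):
            demand[k - 1] = VMS_PER_MODE[mode]
    return demand


def overloaded_slots(schedules, capacity):
    """Return the 1-based slots where combined VM demand exceeds the capacity V."""
    per_tenant = [vm_demand(s) for s in schedules]
    totals = [sum(column) for column in zip(*per_tenant)]
    return [k for k, total in enumerate(totals, start=1) if total > capacity]


print(overloaded_slots([TENANT_1, TENANT_2], capacity=4))
```

With a capacity of 4 VMs, this check flags every slot in which both tenants ask for TMR at the same time, which includes the critical slot 23.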

(12)

DEFINITION OF THE PROBLEM

Therefore,

\[ \sum_{i=1}^{p} \left( 3\, t_i^{sla}(k) + c_i^{sla}(k) \right) \cdot y_i \le V, \quad \forall k : 1 \le k \le \tau \]

In any slot, the total amount of stable storage in use cannot exceed the amount of stable storage (S) available to the FTaaS provider. The TMR mode does not require any storage [6]. However, each FTVM in COORD mode requires stable storage commensurate with its checkpoint size (Si).
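As a worked instance of this storage rule, take the two example SLAs (checkpoint sizes of 2 MB and 4 MB) and assume, purely for illustration, that both tenants are served and that each runs a single FTVM. The storage demand in three representative slots is then:

```latex
\begin{align*}
\text{slot } 1 \ (\text{COORD, COORD}) &: \quad 2 \cdot 1 + 4 \cdot 1 = 6 \ \text{MB} \\
\text{slot } 5 \ (\text{COORD, TMR})   &: \quad 2 \cdot 1 + 4 \cdot 0 = 2 \ \text{MB} \\
\text{slot } 23 \ (\text{TMR, TMR})    &: \quad 2 \cdot 0 + 4 \cdot 0 = 0 \ \text{MB}
\end{align*}
```

Under these assumptions the provider needs at least 6 MB of stable storage to serve both tenants in the slots where both use COORD, while the slots that stress VM capacity (both tenants in TMR) need no storage at all.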

(13)

DEFINITION OF THE PROBLEM

Therefore, for stable storage,

\[ \sum_{i=1}^{p} \left( S_i \cdot c_i^{sla}(k) \right) \cdot y_i \le S, \quad \forall k : 1 \le k \le \tau \]

In order to maximize the gain, the solver will try to select a set of tenants such that the combined expression of gain is maximized, while all other constraints are honored.
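The selection problem can be illustrated with a tiny exhaustive search. This is a stand-in for the ILP solved with CPLEX in the original work, not a reproduction of it: the search is exponential in the number of tenants, the per-tenant gain values are hypothetical (the source does not give the gain expression), and all numbers in the toy instance are made up.

```python
from itertools import product


def best_selection(vm_need, storage_need, gain, V, S):
    """Exhaustively choose y in {0,1}^p that maximizes gain under both constraints.

    vm_need[i][k]      -- VMs tenant i needs in slot k (3 for TMR, 1 for COORD)
    storage_need[i][k] -- storage tenant i needs in slot k (S_i if COORD, else 0)
    gain[i]            -- hypothetical revenue from serving tenant i
    """
    p, slots = len(gain), len(vm_need[0])
    best_y, best_gain = (0,) * p, 0
    for y in product((0, 1), repeat=p):
        # Both the VM and the storage constraint must hold in every slot.
        feasible = all(
            sum(vm_need[i][k] * y[i] for i in range(p)) <= V
            and sum(storage_need[i][k] * y[i] for i in range(p)) <= S
            for k in range(slots)
        )
        total = sum(gain[i] * y[i] for i in range(p))
        if feasible and total > best_gain:
            best_y, best_gain = y, total
    return best_y, best_gain


# Toy instance: three tenants, two slots; every number here is invented.
vm_need = [[3, 1], [1, 1], [3, 3]]
storage_need = [[0, 2], [4, 4], [0, 0]]
gain = [50, 30, 60]
print(best_selection(vm_need, storage_need, gain, V=4, S=6))
```

In this toy instance the first and third tenants cannot be served together because both need 3 VMs in the first slot, so the search has to trade one of them against the remaining tenant. A real deployment with 20 tenants would hand the same constraints to an ILP solver instead of enumerating all combinations.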

(14)

Results

The results presented here are based on simulated data and were obtained with the IBM ILOG CPLEX optimizer [8].

We consider 20 clients and a day divided into 24 one-hour slots, i.e. 24 hours. The fault tolerance requirements placed on the service provider are not restricted to one method and range from 10 to 100, while the cost of virtual machines ranges from 20 to 200. For our experiment we assumed that the TMR fault tolerance method is more expensive than COORD.

(15)

Results

(16)

Results

Fig. 1 shows the total revenue as a function of V for a fixed value of S, as obtained from the ILP formulation. As expected, more revenue can be obtained when more virtual machines are available.

However, once the maximum possible number of tenants is served, the revenue saturates. As expected, the revenue generated with our model is higher than the revenue generated by the simple model, because our model has the strategic flexibility to accommodate more tenants per virtual machine. However, when a provider has a sufficiently large number of virtual machines on hand, this strategic advantage decreases.

Fig. 2 shows the total revenue as a function of S for a fixed value of V, as obtained from the ILP formulation. As expected, more revenue is generated with more stable storage for both methods. However, our model generates more revenue than the simplified model, thanks to its strategic flexibility.

(17)

CONCLUSION

In this report we presented an SLA-based solution for fault tolerance in cloud computing. Turning the fault tolerance concept into a service offered to customers could make it much more efficient and flexible to use in modern communication environments. With the expansion of computer services and the growing trend towards cloud computing, we believe that such a solution can produce promising results in meeting business needs in cloud computing.

(18)

REFERENCES

[1] M. Al-Fares, A. Loukissas, and A. Vahdat, “A scalable, commodity data center network architecture,” in Proceedings of the ACM SIGCOMM 2008 conference on Data communication. ACM, 2008, pp. 63–74.

[2] J. von Neumann, “Probabilistic logics and the synthesis of reliable organisms from unreliable components,” Automata Studies, pp. 43–98, 1956.

[3] B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and A. Warfield, “Remus: high availability via asynchronous virtual machine replication,” in 5th USENIX Symp. on Networked Systems Design and Implementation, ser. NSDI. USENIX Association, 2008, pp. 161–170.

[4] “Hp chilled-water performance optimized data centers 20c and 40c,” 2011. [Online]. Available: http://h20195.www2.hp.

(19)

REFERENCES

[5] C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, and S. Lu, “DCell: A scalable and fault-tolerant network structure for data centers,” ACM SIGCOMM Computer Communication Review, vol. 38, no. 4, pp. 75–86, 2008.

[6] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat, “Hedera: Dynamic flow scheduling for data center networks,” in Proceedings of the 7th USENIX conference on Networked systems design and implementation. USENIX Association, 2010, p. 19.

[7] Y. Tamura. (2008, Jun) Kemari: Virtual Machine Synchronization for Fault Tolerance using DomT.

[8] “IBM ilog cplex optimizer,” http://www-01.ibm.com/software/in/integration/optimization/cplex/.

(20)