Fault tolerance in cloud technologies
presented as a service
Pavel Dzhunev, PhD
student
International Scientific Conference Computer
Science’2015
INTRODUCTION
Improvements in techniques for virtualization and
performance in cloud networking technologies over
the last decade have made it possible to achieve
greater abstraction in the form of cloud computing.
A cloud can be defined as a framework for delivering
services for various kinds of resources, such as computing,
software, and storage, over the global network.
RELEVANCE OF THE PROBLEM
Fault tolerance in cloud computing is one of
the main concerns of both users and providers of this type
of modern service. Today public clouds provide
reliability and good connectivity.
Take the example of a well-known global site,
eBay.com. Today the uptime of this site is
99.95%, while 6-9 years ago it was
99.1%.
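To make the two uptime figures easier to compare, the short sketch below converts an uptime percentage into hours of downtime per year. It assumes a 365-day year and uses only the two percentages quoted above; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def downtime_hours_per_year(uptime_percent):
    """Hours per year a service is unavailable at the given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


for uptime in (99.1, 99.95):
    hours = downtime_hours_per_year(uptime)
    print(f"{uptime}% uptime -> {hours:.2f} hours of downtime per year")
```

Under these assumptions, 99.1% uptime corresponds to roughly 78.84 hours of downtime per year, and 99.95% to roughly 4.38 hours.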
In this work, we introduce the concept of fault
tolerance as a service (FTaaS), based on user requirements and the
agreement between provider and user.
Through this model, a tenant of the cloud service can choose among
a wide range of specifications, such as where the virtual machine
is located, the type of connection, the speed of information
delivery, and many other parameters.
BASIC MODEL
Let T = {T1, T2, T3, …, Tp} be a set of p tenants. The
FTaaS has a planning span of τ slots over a time
duration T, with each time slot of the same duration.
The costs associated with the two fault
tolerance methods are as follows:
$C_{SS}$ - price of storage for each client;
$C_{VM}$ - price of each virtual machine;
$C_{cp}$ - price a tenant has to pay for each invocation
of a coordinated checkpoint.
Let V be the maximum number of Virtual Machines that can be provisioned and let S be the maximum amount of storage [4] that is available for storing
checkpoints.
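The SLA for tenant Ti is represented as:
$$\langle N_i,\ S_i,\ t_i^{sla},\ c_i^{sla},\ R_i \rangle$$
where
$N_i$ - checkpoint interval function of tenant Ti;
$S_i$ - size of storage for tenant Ti;
$t_i^{sla}$ - function indicating TMR, with values in {0, 1};
$c_i^{sla}$ - function indicating COORD, with values in {0, 1};
$R_i$ - whether a slot is critical for tenant Ti (yes or no).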
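The SLA tuple can be sketched as a small data structure. This is only an illustration: the class name, field names, and the example tenant are hypothetical, and the sketch assumes every component except the storage size is a function of the slot index k.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TenantSLA:
    """Illustrative container for the SLA tuple of one tenant."""
    checkpoint_interval: Callable[[int], float]  # N_i(k), in hours
    storage_size_mb: float                       # S_i, checkpoint size
    tmr: Callable[[int], int]                    # t_i^sla(k), 0 or 1
    coord: Callable[[int], int]                  # c_i^sla(k), 0 or 1
    critical: Callable[[int], bool]              # R_i(k)


def is_well_formed(sla, slots):
    """Check that both indicators only take the values 0 or 1 in every slot."""
    return all(
        sla.tmr(k) in (0, 1) and sla.coord(k) in (0, 1)
        for k in range(1, slots + 1)
    )


# Hypothetical tenant: COORD with a 0.5 hr interval in slots 1-12, TMR afterwards.
example = TenantSLA(
    checkpoint_interval=lambda k: 0.5 if k <= 12 else 0.0,
    storage_size_mb=2.0,
    tmr=lambda k: 0 if k <= 12 else 1,
    coord=lambda k: 1 if k <= 12 else 0,
    critical=lambda k: k == 24,
)

print(is_well_formed(example, slots=24))
```

Representing the per-slot components as functions mirrors the notation $t_i^{sla}(k)$ and $c_i^{sla}(k)$; a list indexed by slot would work equally well.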
A tenant can request to be served by either TMR or COORD as the fault tolerance method [5]. Hence, for each slot k,
$$t_i^{sla}(k) + c_i^{sla}(k) = 1$$
$$c_i^{sla}(k) = 1 \Rightarrow t_i^{sla}(k) = 0$$
Similarly, when the tenant wants to be served in appropriate
EXAMPLE OF FAULT TOLERANCE SLA
Below is an example of the fault tolerance SLAs specified by two tenants of the cloud service. The test is planned for one day, 24 hours in total, divided into 24 slots of one hour each.
Slot     Fault tolerance method    Checkpoint interval (Ni)    Critical, Y/N (Ri)
1-4      COORD                     0.25 hr                     N
5-9      COORD                     0.5 hr                      N
10-14    TMR                       -                           N
15-22    COORD                     0.5 hr                      N
23       TMR                       -                           Y
24       COORD                     0.5 hr                      N
Checkpoint size: 2 MB
Tab. 1
Slot     Fault tolerance method    Checkpoint interval (Ni)    Critical, Y/N (Ri)
1-4      COORD                     0.5 hr                      N
5-8      TMR                       -                           N
9-12     COORD                     0.5 hr                      N
13-16    TMR                       -                           N
17-20    COORD                     0.5 hr                      Y
21-24    TMR                       -                           Y
Checkpoint size: 4 MB
Tab. 2
Slot 23 is marked as critical for both tenants, and
both request TMR in that slot, so one request must either
be denied or the service provider must add a new
virtual machine.
Tables 1 and 2 show the test over a full day
for two different clients.
If the FTaaS provider has a maximum capacity of 4
virtual machines, it will not be able to meet the
requirements of both tenants in that slot, because two
TMR requests need 6 virtual machines.
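The capacity check for the two example tenants can be sketched as follows. The slot data is transcribed from Tab. 1 and Tab. 2, and the VM counts per method (3 for TMR, 1 for COORD) are the ones stated in the problem definition; the function names and data layout are illustrative assumptions.

```python
# SLA requests transcribed from Tab. 1 and Tab. 2 as
# (first_slot, last_slot, method, critical) ranges.
TENANT_1 = [(1, 4, "COORD", False), (5, 9, "COORD", False),
            (10, 14, "TMR", False), (15, 22, "COORD", False),
            (23, 23, "TMR", True), (24, 24, "COORD", False)]
TENANT_2 = [(1, 4, "COORD", False), (5, 8, "TMR", False),
            (9, 12, "COORD", False), (13, 16, "TMR", False),
            (17, 20, "COORD", True), (21, 24, "TMR", True)]

# VMs needed per fault-tolerant VM: TMR needs 3, COORD needs 1.
VMS_PER_METHOD = {"TMR": 3, "COORD": 1}


def expand(ranges):
    """Turn slot ranges into a dict: slot -> (method, critical)."""
    return {k: (method, critical)
            for first, last, method, critical in ranges
            for k in range(first, last + 1)}


def vm_demand(tenants):
    """Total VMs requested in each slot if every tenant is served."""
    per_tenant = [expand(t) for t in tenants]
    return {k: sum(VMS_PER_METHOD[p[k][0]] for p in per_tenant)
            for k in sorted(per_tenant[0])}


def overloaded_slots(tenants, capacity):
    """Slots in which the combined VM demand exceeds the provider's capacity."""
    return [k for k, demand in vm_demand(tenants).items() if demand > capacity]


def critical_for_all(tenants):
    """Slots that every tenant marks as critical."""
    per_tenant = [expand(t) for t in tenants]
    return [k for k in sorted(per_tenant[0]) if all(p[k][1] for p in per_tenant)]


print(overloaded_slots([TENANT_1, TENANT_2], capacity=4))
print(critical_for_all([TENANT_1, TENANT_2]))
```

With a capacity of 4 VMs, the demand exceeds capacity in slots 13, 14, and 23, and slot 23 is the only one that is critical for both tenants.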
PROBLEM DEFINITION
The results above give a clear idea of the main role of FTaaS. The provider has a number of virtual machines and an amount of storage space that are known in advance, and the main task of FTaaS is to analyse whether it is possible to meet the fault tolerance requirements of all tenants.
Let us look at the main problems of the global Internet network in terms of denial of service.
Let us introduce the following variable: $y_i \in \{0, 1\}$, equal to 1 when tenant Ti is selected to be served.
At any slot, the total number of VMs cannot be more than what the FTaaS provider can provision. The TMR mode requires 3 VMs for each FTVM, and the COORD mode requires 1 VM per FTVM. Therefore,
$$\sum_{i=1}^{p} \left(3\,t_i^{sla}(k) + c_i^{sla}(k)\right) y_i \le V, \quad \forall k : 1 \le k \le \tau$$
At any slot, the total amount of stable storage cannot exceed the amount of stable storage (S) available to the FTaaS provider. The TMR mode does not require any storage [6]. However, each FTVM in COORD mode requires stable storage commensurate with its checkpoint size (Si).
The storage constraint is therefore:
$$\sum_{i=1}^{p} \left(S_i\,c_i^{sla}(k)\right) y_i \le S, \quad \forall k : 1 \le k \le \tau$$
In order to maximize the gain, the solver tries to select a set of tenants such that the combined gain is maximized while all constraints are honored.
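The tenant selection problem can be illustrated on a toy instance. The sketch below is not the ILP formulation solved with CPLEX: it swaps in exhaustive enumeration of tenant subsets, which is only practical for a handful of tenants. The per-tenant gain values, the tenant data, and all names are hypothetical, since the gain expression itself is not reproduced here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Tenant:
    name: str
    methods: tuple        # methods[k] is "TMR" or "COORD" for slot k (0-based)
    checkpoint_mb: float  # S_i
    gain: float           # hypothetical revenue if the tenant is accepted (y_i = 1)


def is_feasible(selected, V, S):
    """Check the VM and storage constraints in every slot for the chosen tenants."""
    if not selected:
        return True
    for k in range(len(selected[0].methods)):
        vms = sum(3 if t.methods[k] == "TMR" else 1 for t in selected)
        storage = sum(t.checkpoint_mb for t in selected if t.methods[k] == "COORD")
        if vms > V or storage > S:
            return False
    return True


def best_selection(tenants, V, S):
    """Enumerate all subsets and keep the feasible one with the highest total gain."""
    best, best_gain = (), 0.0
    for r in range(1, len(tenants) + 1):
        for subset in combinations(tenants, r):
            gain = sum(t.gain for t in subset)
            if gain > best_gain and is_feasible(subset, V, S):
                best, best_gain = subset, gain
    return best, best_gain


# Toy instance with 3 slots and 3 tenants (all values are made up).
a = Tenant("A", ("TMR", "COORD", "COORD"), 2.0, 50.0)
b = Tenant("B", ("TMR", "TMR", "COORD"), 4.0, 80.0)
c = Tenant("C", ("COORD", "COORD", "COORD"), 1.0, 30.0)

chosen, gain = best_selection([a, b, c], V=4, S=10.0)
print([t.name for t in chosen], gain)
```

In this toy instance, tenants A and B cannot be served together because both request TMR in the first slot, so the best feasible choice is B and C with a total gain of 110. An ILP solver reaches the same kind of decision without enumerating every subset.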
Results
The results presented are based on simulated data and were obtained
with the IBM ILOG CPLEX Optimizer [8].
We have 20 clients and 24 slots in a day, i.e., each slot lasts
one hour. The prices charged by service providers for the
fault tolerance methods range from 10 to 100, and the cost
of virtual machines ranges from 20 to 200. For our experiment we
assumed that the TMR fault tolerance method is
more expensive than COORD.
Fig. 1 shows the total revenue as a function of V for a fixed value of S, as obtained from the ILP formulation. As expected, more revenue can be obtained with a larger number of virtual machines.
However, once the maximum possible number of tenants is served, the revenue saturates. As expected, the revenue generated with our model is higher than the revenue generated by the simple model, because our model has the strategic flexibility to serve more tenants per virtual machine. However, when a
provider has a sufficiently large number of virtual machines on hand, this strategic advantage decreases.
Fig. 2 shows the total revenue as a function of S for a fixed value of V, as obtained from the ILP formulation. As expected, more revenue is generated with more stable storage for both models. However, our model generates more revenue than the simple model, thanks to its strategic flexibility.
CONCLUSION
In this report we presented an SLA-based solution for fault tolerance in
cloud computing. Turning the fault tolerance concept into a
service offered to customers could make it
much more efficient and flexible to use in
modern communication environments. With the expansion of
computer services and the growing trend of cloud computing,
we believe that such a solution would produce
promising results in meeting business needs in cloud
computing.
REFERENCES
[1] M. Al-Fares, A. Loukissas, and A. Vahdat, “A scalable,
commodity data center network architecture,” in Proceedings
of the ACM SIGCOMM 2008 conference on Data communication. ACM, 2008, pp. 63–74.
[2] J. von Neumann, “Probabilistic logics and the synthesis of reliable organisms from unreliable components,” Automata
Studies, pp. 43–98, 1956.
[3] B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and A. Warfield, “Remus: high availability via asynchronous
virtual machine replication,” in 5th USENIX Symp. on Networked Systems Design and Implementation, ser. NSDI.
USENIX Association, 2008, pp. 161–170.
[4] “Hp chilled-water performance optimized data centers 20c and 40c,” 2011. [Online]. Available: http://h20195.www2.hp.
[5] C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, and S. Lu, “DCell: A scalable and fault-tolerant network structure for data centers,” ACM SIGCOMM Computer Communication
Review, vol. 38, no. 4, pp. 75–86, 2008.
[6] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat, “Hedera: Dynamic flow scheduling for data center networks,” in Proceedings of the 7th USENIX conference on
Networked systems design and implementation. USENIX Association, 2010, p. 19.
[7] Y. Tamura. (2008, Jun) Kemari: Virtual Machine Synchronization for Fault Tolerance using DomT.
[8] “IBM ILOG CPLEX Optimizer,” http://www-01.ibm.com/software/in/integration/optimization/cplex/.