Performance Comparison of Erasure Codes for Different Churn Models in P2P Storage Systems


D.-S. Huang et al. (Eds.): ICIC 2010, LNAI 6216, pp. 418–425, 2010. © Springer-Verlag Berlin Heidelberg 2010

Jinxing Li, Guangping Xu, and Hua Zhang

Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology

Tianjin, China

venus.l@126.com

Abstract. We evaluate the performance of erasure codes under different churn models in P2P storage systems. The comparative analysis assumes that node session length follows an exponential distribution (ED), a Pareto distribution (PD), or a Weibull distribution (WD). Reliability theory is used to evaluate the mean data availability. Moreover, we evaluate the impact of both node-join churn and the erasure coding parameters m and n on data duration, and we compare the results for the different churn models by simulation. The simulations are driven by both real and synthetic traces. The results show that node-join churn has no impact on data duration if node session length follows ED, but does affect data duration if node session length follows PD or WD. For both WD and PD, the impact grows as the node-join churn degree rises. At a fixed redundancy rate, increasing m and n reduces the data duration.

Keywords: Erasure codes; P2P storage systems; churn model; data duration.

1 Introduction

In recent years, P2P systems have shown great advantages in file sharing, streaming media, and other applications, and they are expected to become a main architecture and technology for building new large-scale Internet applications. P2P storage systems are considered a very promising application, but owing to substantial technical challenges they have not yet achieved the expected commercial success.

An important issue in P2P storage systems is data availability [1]. Although 100% availability is impossible, high availability can be achieved through redundant storage. The usual form of redundant storage is to create extra replicas, which is a simple trade-off between storage overhead and data availability.

It is well known that erasure codes [2,3] can be used to achieve higher availability. In erasure coding, an object is divided into m blocks of equal size, and the m blocks are then encoded into n (n > m) blocks of the same size. The original object can be reconstructed from any m of the n encoded blocks. The redundancy rate in this case is n/m, and the availability of an object can be expressed using the k-out-of-n component reliability model [4] from reliability theory.

The availability of objects stored in P2P storage systems is directly affected by node-join churn. Before discussing its impact, we define a metric, the node-join churn degree J = j/n, where j is the number of newly joined nodes and n is the erasure code parameter n. In a P2P storage system, a node first joins the system, contributes some resources, and then leaves. This join-participate-leave cycle is called a session, and the collective effect created by the independent arrival and departure of thousands or millions of peers is called churn [5,7]. Session length is typically modeled by ED [6,7], PD [8,9], or WD [5,12].

In this paper, we analyze the performance of erasure codes in P2P storage systems under three typical churn models, in which the session length of each node follows ED, PD, or WD. The component reliability model from reliability theory is used to calculate the mean data availability. It has been shown that node-join churn negatively impacts replicated data duration in structured P2P storage systems [10]. However, the impact of node-join churn on the performance of erasure codes in P2P storage systems has not been evaluated. We therefore investigate the impact of node-join churn on data duration in erasure-code-based P2P storage systems. By controlling the node-join churn degree J, we compare how node-join churn affects data duration when the session length follows ED, PD, and WD. We also examine the relationship between the erasure code parameters m and n and the data duration at a fixed redundancy rate.

The remainder of the paper is structured as follows: Section 2 reviews related work; Section 3 presents the duration model used to analyze the availability of erasure-coded data in P2P storage systems; Section 4 provides the simulation results and our evaluation; Section 5 concludes the paper.

2 Related Work

In a dynamic P2P storage system, data redundancy is indispensable for ensuring high data availability. The main redundancy strategies are replication-based and code-based. Erasure coding [2,3] is a typical and efficient code-based redundancy strategy.

The distribution that node session length follows directly affects the performance of P2P storage systems. Moreover, every simulation or analysis of a P2P storage system relies on a model of churn. Researchers and developers therefore need an accurate churn model in order to draw accurate conclusions about P2P storage systems.

ED, PD, and WD are used extensively as the fundamental models for analyzing the effect of churn on the performance of P2P storage systems. The results of Nurmi et al. [13] indicated that either a hyper-exponential or a Weibull model effectively represents machine availability in Internet computing environments. Stutzbach et al. [5] presented a thorough analysis of churn in three real-life P2P systems (Gnutella, Kad, and BitTorrent) and concluded that session lengths were not heavy-tailed or Pareto, but were more accurately modeled by a Weibull distribution. Steiner et al. [11] explored peer behavior by crawling a real system, Kad, continuously for six months. They found that the distribution of session length is best characterized by a Weibull distribution with shape parameter k < 1.

Xu et al. [10] analyzed the impact of node-join churn on replicated data duration in P2P storage systems. They concluded that node-join churn had a negative impact on data duration for Planet and Skype. Furthermore, their mathematical analysis showed that node-join churn negatively impacts replicated data duration if node session length follows PD or WD, but not if it follows ED, because of the memoryless property of ED.

3 Analysis Model

In this section, we analyze the mean performance of erasure codes in P2P storage systems, i.e. availability [1,2] and available duration, using reliability theory as the mathematical model.

The three typical churn models correspond to three probability distributions: ED, PD, and WD. In this section, we mainly introduce the method for calculating the expected value of the available probability or available duration when the available probability or session length of each node follows ED.

3.1 Churn Models

3.1.1 Exponential Distribution

The cumulative distribution function (CDF) of an ED is given by:

$$F_e(t; \lambda) = \begin{cases} 1 - e^{-t/\lambda}, & t \ge 0, \\ 0, & t < 0, \end{cases} \tag{1}$$

where λ > 0 is a scale parameter of the distribution. The mean or expected value of an exponentially distributed variable T with scale parameter λ is λ.

3.1.2 Pareto Distribution

The cumulative distribution function (CDF) of a PD is given by:

$$F_p(t; \alpha, \beta) = \begin{cases} 1 - \left(\dfrac{\beta}{t}\right)^{\alpha}, & t \ge \beta, \\ 0, & t < \beta, \end{cases} \tag{2}$$

where β is a scale parameter of the distribution and is the minimum possible value of the random variable T, which is necessarily positive, while α represents the degree of heavy-tailedness. The mean or expected value of a random variable T that follows PD with scale parameter β is $\alpha\beta/(\alpha - 1)$, which is finite for α > 1.

3.1.3 Weibull Distribution

The cumulative distribution function (CDF) of a WD is given by:

$$F_w(t; k, \lambda) = \begin{cases} 1 - e^{-(t/\lambda)^{k}}, & t \ge 0, \\ 0, & t < 0, \end{cases} \tag{3}$$

where k > 0 is called the shape parameter and λ > 0 is called the scale parameter. When k = 1, it is equivalent to an ED; when k = 2, it is the Rayleigh distribution. The expected value of a random variable T that follows WD with shape parameter k and scale parameter λ is $\lambda\,\Gamma\!\left(1 + \frac{1}{k}\right)$, where Γ is the Gamma function.
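As an illustration, the three churn models above can be written down in a few lines of Python. The following is a minimal sketch under stated assumptions, not the simulator used in this paper: all function names are hypothetical, and the samplers rely on Python's standard `random` module, where `expovariate` takes a rate (1/λ here), `paretovariate` has minimum value 1 (scaled by β here), and `weibullvariate` takes the scale first and the shape second.

```python
import math
import random


def ed_cdf(t, lam):
    """Exponential CDF with scale (mean) lam, as in Eq. (1)."""
    return 1.0 - math.exp(-t / lam) if t >= 0 else 0.0


def pd_cdf(t, alpha, beta):
    """Pareto CDF with shape alpha and scale (minimum value) beta, as in Eq. (2)."""
    return 1.0 - (beta / t) ** alpha if t >= beta else 0.0


def wd_cdf(t, k, lam):
    """Weibull CDF with shape k and scale lam, as in Eq. (3)."""
    return 1.0 - math.exp(-((t / lam) ** k)) if t >= 0 else 0.0


def ed_mean(lam):
    return lam


def pd_mean(alpha, beta):
    # The Pareto mean is finite only for alpha > 1.
    return alpha * beta / (alpha - 1.0) if alpha > 1 else math.inf


def wd_mean(k, lam):
    return lam * math.gamma(1.0 + 1.0 / k)


def sample_session(model, rng, **p):
    """Draw one session length (hypothetical helper; model is 'ED', 'PD' or 'WD')."""
    if model == "ED":
        return rng.expovariate(1.0 / p["lam"])          # rate = 1 / mean
    if model == "PD":
        return p["beta"] * rng.paretovariate(p["alpha"])  # rescale minimum 1 to beta
    if model == "WD":
        return rng.weibullvariate(p["lam"], p["k"])      # (scale, shape)
    raise ValueError(f"unknown churn model: {model}")
```

With the parameters used in Section 4 (α=1.5, β=10000 for PD; λ=100000, k=0.5 for WD), `pd_mean` and `wd_mean` evaluate to 30000 and 200000 respectively.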

According to the method for calculating the reliability of k-out-of-n systems in reliability theory, the availability of an object in P2P storage systems can be expressed as:

$$A(p_1, p_2, \ldots, p_n) = \sum_{i=m}^{n} \binom{n}{i} p^{i} (1 - p)^{n-i}, \tag{4}$$

where $p_1, p_2, \ldots, p_n$ are the reliabilities of the n nodes on which the n blocks generated by the erasure coding process reside. Here it is assumed that each $p_i$ equals a mean reliability p. If the session length follows ED, we suppose that the mean session length of each node is the expected value λ, and that the reliability of node i at time t is $F_i(t) = e^{-t/\lambda}$. Then the expected value of the available probability or available duration can be calculated by

$$E[A] = \int_{0}^{\infty} \sum_{i=m}^{n} \binom{n}{i} \left(e^{-t/\lambda}\right)^{i} \left(1 - e^{-t/\lambda}\right)^{n-i} \, dt. \tag{5}$$

The final expression for the expected value of the available probability or available duration is:

$$E[A] = \sum_{i=m}^{n} \frac{n!}{(n-i)!\, i!} \cdot \frac{\lambda\, (i-1)!\, (n-i)!}{n!} = \lambda \sum_{i=m}^{n} \frac{1}{i}. \tag{6}$$

Thus, when the available probability or session length of nodes follows ED, we can calculate the expected value of the available probability or available duration and then evaluate the performance of erasure codes in P2P storage systems through formula (6).
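Formulas (4) to (6) can be checked numerically. The sketch below is an illustration rather than part of the original analysis; the function names, the integration cutoff, and the step count are assumptions chosen for the example. It evaluates the k-out-of-n availability with a common reliability p, integrates it over time with p = e^(-t/λ) as in formula (5), and compares the result with the closed form λ·Σ 1/i (i = m, ..., n).

```python
import math


def availability(m, n, p):
    """Formula (4) with a common per-node reliability p: at least m of n blocks alive."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(m, n + 1))


def expected_duration_closed_form(m, n, lam):
    """Closed form of formula (6): lam * sum_{i=m}^{n} 1/i."""
    return lam * sum(1.0 / i for i in range(m, n + 1))


def expected_duration_numeric(m, n, lam, cutoff=30.0, steps=60_000):
    """Formula (5) by the trapezoidal rule on [0, cutoff * lam].

    The cutoff and step count are illustrative assumptions; the integrand
    decays like exp(-m * t / lam), so the truncated tail is negligible.
    """
    h = cutoff * lam / steps
    total = 0.0
    for s in range(steps + 1):
        weight = 0.5 if s in (0, steps) else 1.0
        total += weight * availability(m, n, math.exp(-s * h / lam))
    return total * h


closed = expected_duration_closed_form(2, 4, 1.0)   # 1/2 + 1/3 + 1/4
numeric = expected_duration_numeric(2, 4, 1.0)
print(closed, numeric)
```

The two values should agree to several decimal places, which is a quick sanity check that the integral in (5) collapses to the harmonic-type sum in (6).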

3.2 Node-Join Churn

When a new node joins a P2P storage system, redundant data inevitably migrates from the current node to the new node. Although this process cannot itself lead to data loss, node-join churn may affect the data duration, depending on the distribution that the session length follows. As shown in Figure 1, a data object enters the system at time t, and one of its erasure-coded replicas resides on node p; at time s, a new node q joins the system and this replica migrates from node p to node q. Let Rt and Rs denote the residual durations of the node at times t and s, and let L be the whole duration of the new node. After the migration, the original duration of the data replica is replaced by the duration L of the newly joined node, so the duration of this data replica changes.

[Figure 1: timelines of the original node p and the new node q, with labels t, s, r, Rt, Rs, and L and the events join, replace, and fail.]

Fig. 1. Relationship between node duration and residue duration

In the next section, we verify that node-join churn does not influence data duration under ED because of its memoryless property, and we evaluate through simulation how node-join churn affects data duration if node session length follows PD or WD.
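The role of the memoryless property can be illustrated with a small Monte Carlo experiment. This is a sketch under simplifying assumptions, not the simulation used in this paper: it only compares the mean lifetime of a freshly joined node (the duration L) with the mean residual lifetime of a node that has already survived to a given age (the duration Rs). The helper name, the seed, the age, and the unit scale parameters are all hypothetical choices.

```python
import random


def mean_fresh_and_residual(sample, age, trials, rng):
    """Return (mean lifetime of a fresh node,
               mean remaining lifetime of nodes still alive at `age`)."""
    fresh_total = 0.0
    residual_total = 0.0
    survivors = 0
    for _ in range(trials):
        lifetime = sample(rng)
        fresh_total += lifetime
        if lifetime > age:
            residual_total += lifetime - age
            survivors += 1
    return fresh_total / trials, residual_total / survivors


rng = random.Random(2010)                       # fixed seed, for repeatability only
ed = lambda r: r.expovariate(1.0)               # ED with mean 1
wd = lambda r: r.weibullvariate(1.0, 0.5)       # WD with scale 1, shape k = 0.5

ed_fresh, ed_resid = mean_fresh_and_residual(ed, 1.0, 200_000, rng)
wd_fresh, wd_resid = mean_fresh_and_residual(wd, 1.0, 200_000, rng)
print("ED:", ed_fresh, ed_resid)
print("WD:", wd_fresh, wd_resid)
```

Under ED the two means coincide up to sampling noise, so replacing a residual duration with a fresh one changes nothing on average. Under WD with k < 1 the surviving node is expected to live longer than a fresh one, so in this simplified view the replacement shortens the expected duration of the migrated replica.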

4 Simulation and Results

In this section, the practical performance of erasure codes in P2P storage systems is analyzed through simulation.

The simulations use both real Skype traces and synthetic traces in which node session length follows one of three distributions: ED, PD, or WD. We set three churn degrees, J = 0, J = 0.5, and J = 1, to examine the relationship between node-join churn and data duration. To examine the impact of the erasure code parameters, three parameter groups are used: m=2, n=4; m=5, n=10; and m=50, n=100. All three share the fixed redundancy rate n/m = 2, which makes the simulation results directly comparable.

Each of the above experiments is repeated 100 times to obtain more precise results. The mean results are shown in the following comparative figures.
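For the case J = 0, the core of such an experiment can be sketched as follows. This is a hypothetical reconstruction under explicit assumptions, not the simulator used here: all n blocks are placed at time 0 on freshly joined nodes, there is no repair and no node-join migration, and the object is lost once fewer than m blocks remain. The function names and the seed are illustrative, and the sketch uses more runs than the 100 repetitions above only to tighten the estimate.

```python
import random


def data_duration(m, n, sample, rng):
    """Duration of one object: the time at which fewer than m of its n blocks
    survive, i.e. the m-th largest of n independent session lengths
    (assumes no repair and no node-join migration)."""
    lifetimes = sorted((sample(rng) for _ in range(n)), reverse=True)
    return lifetimes[m - 1]


def mean_data_duration(m, n, sample, runs, rng):
    """Average the data duration over independent runs."""
    return sum(data_duration(m, n, sample, rng) for _ in range(runs)) / runs


rng = random.Random(418)                        # fixed seed, for repeatability only
lam = 1_600_000.0                               # ED scale parameter from Section 4.2
ed = lambda r: r.expovariate(1.0 / lam)         # expovariate takes a rate

estimate = mean_data_duration(2, 4, ed, 20_000, rng)
theory = lam * sum(1.0 / i for i in range(2, 5))  # lam * (1/2 + 1/3 + 1/4)
print(estimate, theory)
```

Under these assumptions the estimate for ED should land close to λ·Σ 1/i (i = m, ..., n), which gives a simple way to validate a simulator before node-join churn is added.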

4.1 Trace Data of Skype

The Skype trace data is used to drive the simulation. The experimental results are shown in Figure 2.

Figure 2 shows that data duration is negatively affected by node-join churn, and that the negative impact becomes more evident as the node-join churn degree J increases. Across the three groups of erasure code parameters, larger values of m and n reduce the data duration, and the decline becomes more significant as m and n grow.

4.2 Trace Data of ED

Synthetic trace data in which session length follows ED is used to drive the simulation. The parameter λ of ED is 1600000. The experimental results are shown in Figure 3.

Fig. 2. Data duration of Skype

Fig. 3. Data duration for ED

Because of the memoryless property of ED, theoretical analysis predicts that node-join churn has no impact on data duration under ED. In Figure 3, the two groups of results for J = 0.5 and J = 1 are consistent with the results for J = 0, which verifies the theoretical prediction based on the memoryless property of ED. Comparing the three groups of erasure code parameters in the simulations (m=2, n=4; m=5, n=10; and m=50, n=100), we can see that the data duration declines as m and n rise. This decline can be verified through formula (6) in the previous section. The theoretical durations corresponding to the three groups of erasure code parameters are 1.73e+006 s, 1.35e+006 s, and 1.13e+006 s. The simulation results shown in Figure 3 are approximately equal to the theoretical results.
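The three theoretical durations can be reproduced by hand. The worked arithmetic below assumes that formula (6) reduces to E[A] = λ·Σ 1/i (i = m, ..., n) and uses λ = 1600000 from this subsection; the rounded partial sums are computed here for illustration.

```latex
\begin{align*}
m=2,\ n=4:\quad & E[A] = 1.6\times10^{6}\left(\tfrac{1}{2}+\tfrac{1}{3}+\tfrac{1}{4}\right)
  \approx 1.6\times10^{6}\times 1.0833 \approx 1.73\times10^{6}\ \text{s} \\
m=5,\ n=10:\quad & E[A] = 1.6\times10^{6}\sum_{i=5}^{10}\frac{1}{i}
  \approx 1.6\times10^{6}\times 0.8456 \approx 1.35\times10^{6}\ \text{s} \\
m=50,\ n=100:\quad & E[A] = 1.6\times10^{6}\sum_{i=50}^{100}\frac{1}{i}
  \approx 1.6\times10^{6}\times 0.7082 \approx 1.13\times10^{6}\ \text{s}
\end{align*}
```

At the fixed ratio n/m = 2, the sum decreases toward ln 2 ≈ 0.693 as m grows, which is consistent with the decline in duration observed for larger m and n.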

4.3 Trace Data of PD

Synthetic trace data in which session length follows PD is used to drive the simulation. The parameters of PD are set to α=1.5, β=10000. The experimental results are shown in Figure 4.

The impact of node-join churn on data duration for PD differs from that in the above simulations. When m=2, n=4, node-join churn causes a negative impact on data duration, as observed for the Skype trace. However, when the erasure code parameters are raised to m=5, n=10, the impact of node-join churn on data duration becomes indeterminate and not evident: sometimes negative, sometimes absent, and sometimes even positive. Moreover, when the erasure code parameters are set to m=50, n=100, node-join churn has a positive impact on data duration. Both the negative and the positive impacts mentioned above become more evident as the churn degree J rises. The impact of the erasure code parameters on data duration is consistent with the other groups of simulations: the data duration declines as m and n increase.

4.4 Trace Data of WD

Synthetic trace data in which session length follows WD is used to drive the simulation. The parameters of WD are set to λ=100000, k=0.5. The experimental results are shown in Figure 5.

Data duration is clearly negatively affected by node-join churn, and the negative impact becomes more evident as J increases. Comparing the three groups of simulations (m=2, n=4; m=5, n=10; and m=50, n=100), it is easy to see that, at a fixed redundancy rate, larger values of m and n do not increase data duration but reduce it, and the decline becomes more significant as m and n grow.

5 Conclusion

In this paper, the performance of erasure codes under three different churn models (ED, PD, and WD) in P2P storage systems is comparatively analyzed. From the simulations, we draw the following conclusions. Node-join churn has a negative impact on the data duration of the real system Skype. If node session length follows ED, node-join churn has no impact on data duration. When node session length follows WD, the negative impact of node-join churn on data duration is similar to that on the Skype system. If node session length follows PD, the impact of node-join churn on data duration varies with the erasure code parameters. Where node-join churn has an impact, that impact grows as J rises, regardless of which distribution the session length follows. At a fixed redundancy rate, raising m and n does not increase the data duration but reduces it. The simulation results give a clearer picture of the different impacts of node-join churn on data duration, and a better understanding of the impact of the erasure code parameters m and n.

Acknowledgment

This work is partly supported by the Informatization Project of Tianjin (No. 071035012) and the Tianjin Technical Commissioner Project (No. SB20080051). The authors are grateful to the anonymous reviewers for their constructive comments.

References

1. Bhagwan, R., Savage, S., Voelker, G.: Understanding Availability. In: Proc. of the 2nd Int’l Workshop Peer-to-Peer Systems, IPTPS ’05 (2005)

2. Rodrigues, R.: High availability in DHTs: Erasure coding vs. replication. In: Castro, M., van Renesse, R. (eds.) IPTPS 2005. LNCS, vol. 3640, pp. 226–239. Springer, Heidelberg (2005)

3. Lin, W.K., Chiu, D.M., Lee, Y.B.: Erasure code replication revisited. In: Proc. of the 4th International Conference on Peer-to-Peer Computing, pp. 90–97 (2004)

4. Aven, T., Jensen, U.: Stochastic models in reliability. Springer, Heidelberg (1999)

5. Stutzbach, D., Rejaie, R.: Understanding Churn in Peer-to-Peer Networks. In: Proc. of the 6th ACM SIGCOMM on Internet measurement, Brazil, pp. 189–202 (2006)

6. Liben-Nowell, D., Balakrishnan, H., Karger, D.: Analysis of the Evolution of Peer-to-Peer Systems. In: ACM Symposium on Principles of Distributed Computing, pp. 233–242 (2000)

7. Rhea, S., Geels, D., Roscoe, T., Kubiatowicz, J.: Handling Churn in a DHT. In: Proc. of the USENIX Annual Technical Conference, Boston, USA, pp. 127–140 (2004)

8. Saroiu, S., Gummadi, P.K.: A measurement study of peer-to-peer file sharing systems. In: SPIE/ACM Conference on Multimedia Computing and Networking, pp. 156–170 (2002)

9. Leonard, D., Rai, V., Loguinov, D.: On Lifetime-Based Node Failure and Stochastic Resilience of Decentralized Peer-to-Peer Networks. In: ACM SIGMETRICS, pp. 26–37 (2005)

10. Xu, G., Ma, W.: Churn Impact on Replicated Data Duration in Structured P2P Networks. In: WAIM (2008)

11. Steiner, M., En-Najjary, T., Biersack, E.W.: A Global View of KAD. In: Proc. of the 7th ACM Internet measurement, IMC’07, San Diego, USA, pp. 117–122 (2007)

12. Zhonghong, O., Erkki, H., Mika, Y.: Effects of different churn models on the performance of structured peer-to-peer networks. In: PIMRC’09 (2009)

13. Nurmi, D., Brevik, J., Wolski, R.: Modeling machine availability in enterprise and wide-area distributed computing environments. In: Cunha, J.C., Medeiros, P.D. (eds.) Euro-Par 2005. LNCS, vol. 3648, pp. 432–441. Springer, Heidelberg (2005)
