Delft University of Technology
Parallel and Distributed Systems Report Series
The Peer-to-Peer Trace Archive:
Design and Comparative Trace Analysis
Boxun Zhang, Alexandru Iosup, and Dick Epema
{B.Zhang,A.Iosup,D.H.J.Epema}@tudelft.nl
Completed April 2010.
To be submitted after revision
report number PDS-2010-003
PDS
ISSN 1387-2109
Published and produced by:
Parallel and Distributed Systems Section
Faculty of Information Technology and Systems Department of Technical Mathematics and Informatics Delft University of Technology
Zuidplantsoen 4 2628 BZ Delft The Netherlands
Information about Parallel and Distributed Systems Report Series: [email protected]
Information about Parallel and Distributed Systems Section: http://pds.ewi.tudelft.nl/
© 2010 Parallel and Distributed Systems Section, Faculty of Information Technology and Systems, Department of Technical Mathematics and Informatics, Delft University of Technology. All rights reserved. No part of this series may be reproduced in any form or by any means without prior written permission of the publisher.
Abstract
Real-world measurements play a key role in studying the characteristics and improving the design of Peer-to-Peer (P2P) systems. Although many P2P measurements have been carried out in the last decade, few traces are publicly accessible, and those that are available online come in different formats. This situation hampers researchers in exchanging, studying, and reusing existing traces. As a result, many P2P studies have been based on unrealistic assumptions about the characteristics of P2P systems, and many P2P algorithms and methods still lack a realistic evaluation. To address this problem, in this work we introduce the P2P Trace Archive, which we design as a virtual meeting place for the community to exchange P2P traces. First, we design the Trace Archive, including a single, flexible data format for storing anonymized P2P traces. Using the tools we have developed as part of the Archive, we add to the Archive more than 20 traces collected from 12 P2P communities; the traces capture the characteristics of millions of user sessions between 2003 and 2010. Second, we make a comparative analysis of traces in the Archive that focuses on content characteristics, peer arrivals and departures, and peer sharing behavior. We find that the characteristics and usage patterns differ significantly among systems and among communities, and that they change significantly over multi-year intervals. Third, we investigate how different methods for identifying peers and sessions in P2P traces may lead to very different analysis results.
Contents
1 Introduction 5
2 Requirements for a P2P Trace Archive 6
3 The P2P Trace Archive 6
3.1 A Unified Trace Format . . . 6
3.2 The Archive Design . . . 7
4 Traces Currently in the Archive 8
4.1 Community Dataset: SuprNova . . . 8
4.2 Community Dataset: PirateBay . . . 8
4.3 Community Dataset: FileList.org . . . 9
4.4 Community Dataset: LegalTorrents.com . . . 9
4.5 Community Dataset: eTree.org . . . 10
4.6 Community Dataset: tlm-project.org . . . 10
4.7 Community Dataset: transamrit.net . . . 10
4.8 Community Dataset: unix-ag.uni-kl.de . . . 11
4.9 Community Dataset: idsoftware.com . . . 11
4.10 Community Dataset: boenielsen.dk . . . 11
4.11 Community Dataset: alluvion.org . . . 11
4.12 Community Dataset: Gnutella . . . 12
4.13 Community Dataset: eDonkey . . . 12
4.14 Community Dataset: PP Live . . . 12
4.15 Community Dataset: Skype . . . 12
5 A Comparative Trace Analysis 12
5.1 Content characteristics . . . 12
5.2 Peer arrival and departure . . . 15
5.3 Bandwidth characteristics . . . 18
5.4 Peer Sharing Behavior . . . 22
6 Identifying Peers and Sessions 31
6.1 Peer Identification . . . 31
6.2 Session Identification . . . 36
7 Related Work 39
8 Conclusion and Ongoing Work 39
List of Figures
1 CDF of the file size in 6 traces collected between 2003 and 2005 (horizontal axis in logarithmic scale). . . 13
2 CDF of the file size in 4 communities measured in 2005 and 2009. . . 13
3 CDF of the file popularity in 6 traces collected between 2003 and 2005 (horizontal axis in logarithmic scale). . . 16
4 CDF of the file popularity in 4 communities measured in 2005 and 2009 (horizontal axis in logarithmic scale). . . 16
5 CDF of the (hourly) peer arrival rate in 5 traces collected between 2003 and 2005 (horizontal axis in logarithmic scale). . . 19
6 CDF of the (hourly) peer arrival rate in 4 communities measured in 2005 and 2009 (horizontal axis in logarithmic scale). . . 19
7 CDF of the peer session length in 5 traces collected between 2003 and 2005 (horizontal axis in logarithmic scale). . . 20
8 CDF of the peer session length in 4 traces collected in 2009 (horizontal axis in logarithmic scale). 20
9 CDF of the peer session length in 4 communities measured in 2005 and 2009 (horizontal axis in logarithmic scale). . . 21
10 CDF of the peer download speed in 5 traces collected between 2003 and 2005. . . 24
11 CDF of the peer download speed in 4 communities measured in 2009. . . 24
12 Comparison of the peer upload speed distributions in 4 traces collected in 2005 (horizontal axis in logarithmic scale). . . 25
13 Comparison of the peer upload speed distributions in 4 communities measured in 2009 (horizontal axis in logarithmic scale). . . 25
14 CDF of the download completion of traces collected between 2003 and 2005. . . 27
15 CDF of the download completion in 4 communities measured in 2005 and 2009. . . 27
16 CDF of the seeding time in 5 traces collected between 2003 and 2005 (horizontal axis in logarithmic scale). . . 29
17 CDF of the seeding time in 4 communities measured in 2009 (horizontal axis in logarithmic scale). 29
18 CDF of the seeding-after-leeching time in 5 traces collected between 2003 and 2005 (horizontal axis in logarithmic scale). . . 32
19 CDF of the seeding-after-leeching time in 4 communities measured in 2009 (horizontal axis in logarithmic scale). . . 32
20 CDF of peer arrival rate for various peer identification intervals (horizontal axis in logarithmic scale). . . 34
21 CDF of session length resulting for various peer identification intervals (horizontal axis in logarithmic scale). . . 34
22 CDF of download speed for various peer identification intervals (horizontal axis in logarithmic scale). . . 35
23 CDF of peer arrival rate resulting from various session identification intervals (horizontal axis in logarithmic scale). . . 37
24 CDF of session length resulting from various session identification intervals (horizontal axis in logarithmic scale). . . 37
25 CDF of download speed resulting from various session identification intervals (horizontal axis in logarithmic scale). . . 38
List of Tables
1 Data format for dynamic peer-level data. . . 7
2 Summary of the datasets . . . 9
3 File Size Statistics . . . 14
4 P-values from KS and AD test for file size distributions. . . 14
5 Parameters of fitting distributions for file size. . . 15
6 File Popularity Statistics . . . 15
7 P-values from KS and AD test for file popularity distributions. . . 17
8 Parameters of fitting distributions for file popularity. . . 17
9 Peer Arrival Rate Statistics . . . 18
10 Peer Arrival Rate Statistics . . . 18
11 Peer Arrival Rate Statistics . . . 21
12 Session Length Statistics . . . 22
13 Session Length Statistics . . . 22
14 Session Length Statistics . . . 23
15 Download Speed Statistics . . . 23
16 Download Speed Statistics . . . 23
17 Download Speed Statistics . . . 24
18 Upload Speed Statistics . . . 26
19 Upload Speed Statistics . . . 26
20 Upload Speed Statistics . . . 27
21 Download Completion Statistics . . . 28
22 Download Completion Statistics . . . 28
23 Download Completion Statistics . . . 28
24 Seeding Time Statistics . . . 30
25 Seeding Time Statistics . . . 30
26 Seeding Time Statistics . . . 30
27 Seeding-after-Leeching Time Statistics . . . 31
28 Seeding-after-Leeching Time Statistics . . . 31
29 Seeding-after-Leeching Time Statistics . . . 33
30 Hourly Peer Arrival Rate Statistics . . . 33
31 Session Length Statistics . . . 34
32 Peer Download Speed Statistics . . . 34
33 Peer Difference Statistics . . . 35
34 Peer Difference Statistics . . . 35
35 Peer Difference Statistics . . . 35
36 Peer Difference Statistics . . . 35
37 Hourly Peer Arrival Rate Statistics . . . 36
38 Session Length Statistics . . . 36
39 Peer Download Speed Statistics . . . 36
40 Session Difference Statistics . . . 37
41 Session Difference Statistics . . . 38
1 Introduction
Peer-to-Peer (P2P) systems have gained phenomenal popularity in the past few years, and several studies [1, 20] show that P2P applications generate large amounts of Internet traffic. Measurement data collected from real P2P systems are fundamental for gaining solid knowledge and understanding of the usage patterns and the characteristics of these systems. Thus, measurement data are important for the modeling, the design, and the evaluation of P2P systems. Although many P2P measurements have been carried out in the last decade, few measurement results [16, 12, 15] are publicly available, and for these few the data are presented in different formats. This situation makes it difficult for researchers to exchange, study, and reuse existing traces. Furthermore, due to the lack of available datasets, many P2P studies have been based on unrealistic assumptions about the characteristics and usage patterns of P2P systems, and as a consequence, many P2P algorithms and methods still lack a realistic evaluation. Until now, no effort has been put into making existing P2P traces accessible to the research community. To remedy this situation, in this work we present the P2P Trace Archive (P2PTA): a virtual meeting place that facilitates the collection and exchange of P2P traces. In addition, we perform a comparative analysis of many of the traces in the P2PTA.
One of the main benefits of building the P2PTA is that the Archive paves the way for comparative studies of P2P systems, which may help researchers to consider various (types of) P2P systems and to capture their overall characteristics simultaneously, and to discover the long-term evolution in the behavior of P2P systems. Such studies may lead to better knowledge of the commonalities and differences in usage patterns in P2P systems, so that it becomes possible to envisage the usage pattern of a new P2P system by looking at those of similar systems. Another important benefit of the P2PTA is that it complements the current model-based approaches with a trace-based approach. In this way, the hidden patterns that exist in real traces of existing P2P systems will be implicitly used to improve the testing and tuning of P2P systems.
In this work, we first present the P2P Trace Archive. The main design goal of the Archive is to facilitate and simplify the exchange of P2P traces. To achieve this goal, we design a unified data format to represent traces in the Archive with three main considerations. First, the data format is designed to fully reflect the structure of P2P systems. Second, the data format can be easily extended for new traces, and extending the data format will not affect the traces already stored in the Archive. Third, the data format ensures the anonymization of user information in the Archive. With the tools associated with the data format, we add to the Archive more than 20 traces collected from 12 communities, which capture the characteristics of millions of users between 2003 and 2009. Besides the unified data format, the Archive also has several software modules for trace collection, anonymization, and processing.
Secondly, we perform a comparative analysis of traces in the Archive, both across multiple P2P systems and across time. The analysis focuses on content characteristics, peer arrivals and departures, peer bandwidth, and peer sharing behavior. We find that these characteristics differ significantly in different communities, and some characteristics also change dramatically over the years. This result indicates the need to calibrate P2P models and algorithms with a sufficient number of traces. We also investigate how different ways of identifying peers and sessions in traces in the face of dynamic IP-address reassignment influence the analysis results.
Our contribution in this work is threefold:
1. We establish the largest P2P trace archive to date, and adopt a unified data format to represent anonymized traces (Section 3).
2. We conduct a multi-angle comparative trace analysis, and we find that P2P systems differ significantly and evolve rapidly over the years (Section 5).
3. We investigate how different ways of identifying peers and sessions in the face of dynamic IP-address reassignment impact the results of analyzing P2P traces (Section 6).
2 Requirements for a P2P Trace Archive
In this section, we formulate five requirements for building a P2P trace archive. The first three of these are for designing the data format used to include traces in the Archive, while the last two are for building the actual Archive.
Requirement 1: Trace Archiving. First, the data format used to include traces in the Archive must reflect as much as possible the structure of P2P systems. Thus, a common set of operational levels must be found across P2P systems. Second, because of the complexity and fast evolution of P2P systems, the format must be flexible and extensible, in order to support not only existing but also future traces. Finally, existing traces in the Archive should not be affected when the data format is extended for new traces.
Requirement 2: Trace Comparison. The data format should ease the process of trace comparison. The data format should organize the traces in such a way that it is straightforward for researchers to compare traces collected from different P2P systems, traces collected from the same P2P system but in different years, and traces collected with different measurement techniques.
Requirement 3: Trace Processing. First, for privacy and ethical reasons, information that can be used to identify users should be anonymized in the Archive. Previous privacy breaches of AOL [5] and Netflix [19] indicate that simply anonymizing user names is not enough to preserve privacy, as other relevant information can still be used to identify users. Thus, all user-related information in the Archive must be anonymized thoroughly to ensure user privacy. Furthermore, traces originally represented in other formats must be converted into the trace format without losing useful information.
Requirement 4: Trace Using. To facilitate the usage of traces, the Archive should provide a set of tools to extract commonly used properties of P2P traces, such as peer arrival rate and bandwidth. The Archive should also provide tools for generating input to P2P simulators.
Requirement 5: Trace Sharing. The Archive must host its traces at a place that is accessible to large numbers of users. The Archive should also allow researchers to rank traces and share use cases of these traces. This information will be considered as feedback on and suggestions for improving the Archive, and will provide other prospective trace users extra information about traces in the Archive, helping them to select the appropriate traces for their research.
3 The P2P Trace Archive
In this section, we present our P2P Trace Archive (P2PTA). We first introduce the data format of the traces, and then we introduce the main software modules in the Archive.
3.1 A Unified Trace Format
In order to simplify the exchange and reuse of traces, we design a unified data format to represent all the traces in the Archive; this data format extends the format proposed in our previous work [26]. We now introduce the main design features of this data format, and also show how this design addresses Requirements 1-3 formulated in Section 2.
To address Requirement 1, in our design the trace data are stored at four different levels: three correspond to the community, the swarm, and the peer levels, respectively, and a fourth level stores data that characterize the interaction between the P2P application and the resources it uses (hard disks, bandwidth, etc.). At each level, we distinguish between static and dynamic data, which are stored separately. As an example, Table 1 shows the format for storing dynamic peer-level data. Each peer event, such as starting to download a file or sending a query message, is stored in a record with information identifying the peer, the event type, etc., and one or more data fields. In our experience, three value fields per record (an integer, a float, and a string value) are enough for all the traces in the Archive. We keep a separate event mapping table that records for every event type its ID and additional information about the event type. In this way, new event types can be easily added by adding new entries to the event mapping table without affecting existing traces in the Archive, which also addresses one part of Requirement 3 of Section 2. Static information such as file names and sizes is stored in a similar format, without a time stamp.
To address Requirement 2 of Section 2, in our design we distinguish between traces and community datasets. A trace is the result of a single measurement collected from a P2P community. A community dataset is a set of traces collected from the same P2P community by possibly different authors, in different years, and with different measurement techniques. Traces in a community dataset are further grouped by the year when they were collected and by the measurement techniques used to collect them. This design simplifies the study of characteristics of different systems (by comparing different community datasets), the study of the evolution of P2P systems (by comparing traces collected in different years in one community dataset), and also the study of measurement techniques (by comparing traces within one community dataset but collected with different measurement techniques).
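The grouping of traces into community datasets can be illustrated with a minimal Python sketch. The class `Trace`, the function `group_into_community_datasets`, and all identifiers and technique labels below are hypothetical names chosen for this example; they are not part of the Archive's tools.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """One measurement of a P2P community (all field names are illustrative)."""
    trace_id: str
    community: str
    year: int
    technique: str


def group_into_community_datasets(traces):
    """Group traces by community, then by (year, measurement technique)."""
    datasets = defaultdict(lambda: defaultdict(list))
    for trace in traces:
        key = (trace.year, trace.technique)
        datasets[trace.community][key].append(trace.trace_id)
    # Convert to plain dicts so that missing keys raise instead of being created.
    return {community: dict(groups) for community, groups in datasets.items()}


# Hypothetical traces; the technique labels are placeholders.
traces = [
    Trace("trace-1", "community-X", 2005, "tracker-scrape"),
    Trace("trace-2", "community-X", 2009, "tracker-scrape"),
    Trace("trace-3", "community-X", 2005, "peer-crawl"),
    Trace("trace-4", "community-Y", 2005, "tracker-scrape"),
]
datasets = group_into_community_datasets(traces)
```

With such a structure, comparing `datasets["community-X"]` with `datasets["community-Y"]` corresponds to a cross-system study, while comparing the 2005 and 2009 groups within one community corresponds to a study of evolution over time.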
To address Requirement 3 of Section 2, in our design we employ user mapping tables, one per trace, to store the relationships between information identifying users (e.g., IP address) and integer user identifiers generated by tools in the Archive. When the user mapping table for a converted dataset is not made public, this approach effectively anonymizes the traces, with the notable drawback of losing information (e.g., (approximate) geographical location). When converting a trace into our format, all event types in the original trace should first be identified and added to the corresponding event mapping table, and then the data related to actual events are stored in the records for dynamic and static data, respectively. Since the unique identifiers in the mapping tables are stored as integral values, the mapping tables greatly reduce the storage requirements and significantly increase the speed of processing the stored data, especially for dynamic data. Another benefit of using event mapping tables is that the research community can work together to establish such tables with generic event types of the systems, which will further simplify the trace comparison process.
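A user mapping table of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `UserMappingTable`, its methods, and the choice of assigning sequential integers in order of first appearance are ours, and the IP addresses are taken from documentation-only ranges.

```python
class UserMappingTable:
    """Maps identifying user information (e.g., an IP address) to integer IDs."""

    def __init__(self):
        self._ids = {}

    def anonymize(self, user_info: str) -> int:
        """Return the integer ID for user_info, assigning the next free ID if new."""
        if user_info not in self._ids:
            # Assumption: IDs are sequential in order of first appearance.
            self._ids[user_info] = len(self._ids) + 1
        return self._ids[user_info]

    def export(self) -> dict:
        """Return a copy of the private mapping (kept outside the public trace)."""
        return dict(self._ids)


table = UserMappingTable()
raw_events = [
    ("192.0.2.10", "start"),
    ("198.51.100.7", "start"),
    ("192.0.2.10", "stop"),
]
# Only the integer identifiers appear in the converted trace.
anonymized = [(table.anonymize(ip), event) for ip, event in raw_events]
```

The converted trace keeps the same peer distinguishable across events (both events of the first address map to the same integer), while the address itself is only recoverable from the mapping returned by `export()`, which would not be published.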
Furthermore, since the data volume of most P2P traces is large, the Archive's trace format is designed to include initially only the minimal amount of information needed to reproduce accurately the original trace. However, through its extensibility features our format can also store information derived through intensive computation, thus reducing the trace processing efforts of the Archive users.
| ID | Field | Description |
|----|-------|-------------|
| 1 | Time Stamp | Timestamp when data are collected (only for dynamic data) |
| 2 | Swarm ID | Unique identifier of the swarm or the group the measured peer belongs to |
| 3 | Peer ID | Unique identifier of the measured peer |
| 4 | Event ID | Unique identifier of peer event type |
| 5 | iVal | The integer value of the peer event |
| 6 | fVal | The float value of the peer event |
| 7 | sVal | The string value of the peer event |

Table 1: Data format for dynamic peer-level data.
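To make the record layout of Table 1 concrete, the following Python sketch models one dynamic peer-level record together with an event mapping table. The on-disk encoding (one tab-separated line per record, empty fields for missing values), the event IDs, and all names (`PeerEvent`, `EVENT_TYPES`, `register_event_type`, `parse_record`) are assumptions made for this example; the report does not specify them.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical event mapping table: event ID -> description of the event type.
EVENT_TYPES = {
    1: "peer starts downloading a file",
    2: "peer sends a query message",
}


@dataclass(frozen=True)
class PeerEvent:
    """One dynamic peer-level record with the seven fields of Table 1."""
    timestamp: int
    swarm_id: int
    peer_id: int
    event_id: int
    i_val: Optional[int] = None
    f_val: Optional[float] = None
    s_val: Optional[str] = None


def register_event_type(description: str) -> int:
    """Add a new event type; existing IDs, and thus stored records, are untouched."""
    new_id = max(EVENT_TYPES, default=0) + 1
    EVENT_TYPES[new_id] = description
    return new_id


def parse_record(line: str) -> PeerEvent:
    """Parse a tab-separated record; empty fields are treated as missing values."""
    ts, swarm, peer, event, i_val, f_val, s_val = line.rstrip("\n").split("\t")
    return PeerEvent(
        int(ts), int(swarm), int(peer), int(event),
        int(i_val) if i_val else None,
        float(f_val) if f_val else None,
        s_val or None,
    )


record = parse_record("1070668800\t3\t42\t1\t512\t\t")
new_event_id = register_event_type("peer finishes downloading a file")
```

Because records refer to event types only by ID, registering a new event type extends the mapping table without rewriting any record that is already stored.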
3.2 The Archive Design
We envision three main roles for P2PTA members. The contributor is the legal owner of P2P traces, and agrees to offer these traces to the Archive. The archive administrator manages the operation of the P2PTA and helps contributors to add and convert traces. The trace user uses the traces in the Archive for research but does not own these traces. We now introduce the main software modules in the P2PTA; collectively, they make the P2PTA meet Requirements 4 and 5 of Section 2.
The trace collection module is responsible for collecting traces from contributors. If the collected trace is already anonymized by the contributor, it will be converted into the unified trace format directly by the trace conversion module. Otherwise, the trace will be anonymized by the trace anonymization module, and mapping tables for user relevant information will be sent back to the trace contributor but will not be included in the Archive.
The trace processing module provides basic functions to extract common features of P2P systems, like peer bandwidth and content popularity, and the simulator module is designed for generating input for simulators. Both these modules are open to the research community, so that they can be improved by the community for future research. We also invite the community to contribute tools for complex trace analysis to the Archive. Finally, the trace sharing module is responsible for hosting the traces in the Archive and providing space for users to rank and comment on traces.
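As an illustration of the kind of function a trace processing module could offer, the sketch below derives hourly peer arrival counts and their empirical CDF from a list of events. It is not the Archive's implementation: the event tuple layout, the use of Unix-style timestamps in seconds, and the function names are assumptions made for this example.

```python
from collections import Counter

SECONDS_PER_HOUR = 3600


def hourly_arrival_counts(events, arrival_event_id):
    """Count arrival events per hour.

    events: iterable of (timestamp_seconds, peer_id, event_id) tuples.
    Returns a dict mapping hour index (timestamp // 3600) to arrival count.
    Hours without any arrival do not appear in the result.
    """
    counts = Counter()
    for timestamp, _peer_id, event_id in events:
        if event_id == arrival_event_id:
            counts[timestamp // SECONDS_PER_HOUR] += 1
    return dict(counts)


def empirical_cdf(values):
    """Return sorted (value, cumulative fraction) pairs for the given values."""
    ordered = sorted(values)
    n = len(ordered)
    return [(value, (index + 1) / n) for index, value in enumerate(ordered)]


# Hypothetical events: event ID 1 marks a peer arrival, event ID 2 something else.
events = [(0, 1, 1), (100, 2, 1), (3599, 3, 2), (3600, 4, 1), (7300, 5, 1)]
counts = hourly_arrival_counts(events, arrival_event_id=1)
cdf = empirical_cdf(counts.values())
```

Note that this sketch omits hours with zero arrivals; whether such hours should count towards the arrival-rate distribution is a design choice that affects the resulting CDF.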
4 Traces Currently in the Archive
In this section we present the traces included in the Archive: currently more than 20 heterogeneous traces collected from the BitTorrent, the Gnutella, and the eDonkey P2P systems by various researchers. In particular, the Archive includes a rich collection of traces taken from BitTorrent, one of the most popular file-sharing systems. From the community perspective, the BitTorrent traces focus on communities with either general or very specific types of content, and communities that are accessible either to everyone or that are only open to a small number of users and adopt sharing-ratio enforcement. From the community size perspective, these traces have been collected from the largest communities in the world at the time of the data collection down to small communities, both in terms of number of users and number of shared files. Table 2 gives an overview of all the traces in the P2PTA; many of these traces have not been analyzed before.
Besides including traces that we have collected ourselves in the Archive, we have also converted traces collected by others, such as the community dataset eDonkey (T13’03 and T13’04) [12, 17], the trace T5’05 (small) [4], and the trace T11’03 [16]. The trace T12’04 is a subset of the Gnutella trace collected by [15]; because of time constraints, we currently only include data of 10 out of the 56 days of the trace collected in the original measurement, but we plan to include the rest of the data in the Archive in the near future. Below, we describe the community datasets in some detail.
4.1 Community Dataset: SuprNova
Trace (1): T1’03
This trace was collected from SuprNova between 2003 and 2004, and it was first studied by [21]. SuprNova was the biggest BitTorrent community at that time, and it distributed various types of content. This trace contains detailed peer-level data collected from 12 big swarms between Dec 6, 2003 and Jan 17, 2004, with a sampling interval of 2.5 minutes; in total, 28,423,470 sessions were captured. For each peer, the trace records the IP address, port number, download progress (number of downloaded chunks), and error messages.
4.2 Community Dataset: PirateBay
Trace (1): T2’05
This trace was collected from ThePirateBay between 05 May 2005 and 11 May 2005, and it was first studied by [13]. The ThePirateBay community distributes various types of content. The trace contains peer-level data collected from 4,000 swarms with a sampling interval of 2.5 minutes; in total, 35,881,338 sessions were captured. For each peer, the trace records the IP address, port number, client ID, download progress (number of downloaded chunks), and error messages. The estimated annual throughput of this community during that period is 12 PB.

| ID | Trace description (content type) | Period | Sampling | Files | Sessions | Traffic |
|----|----------------------------------|--------|----------|-------|----------|---------|
| T1’03 | SuprNova (general) | 06 Dec 2003 to 17 Jan 2004 | 2.5 min | 120 | 28,423,470 | n/a |
| T2’05 | ThePirateBay (general) | 05-11 May 2005 | 2.5 min | 4,800 | 35,881,338 | 12 PB/year |
| T3’05 | FileList.org (general) | 14 Dec 2005 until 4 Apr 2006 | 6 min | 3,000 | 2,172,738 | n/a |
| T4’05 | LegalTorrents.com (general) | 22 Mar to 19 Jul 2005 | 5 min | 41 | n/a | 698 GB/day |
| T4’09 | LegalTorrents.com (general) | 24 Sep 2009 to Feb 2010 | 5 min | 183 | n/a | 1.1 TB/day |
| T5’05 | eTree.org (recorded events & meetings) | 22 Mar to 19 Jul 2005 | 15 min | 52 | 165,168 | 9 GB/day |
| T5’05 (small) | eTree.org (recorded events & meetings), collected by [4] | Mar 2004 | 30 min | 1,505 | 81,584 | n/a |
| T5’10 | eTree.org (recorded events & meetings) | 24 Sep 2009 to Feb 2010 | 15 min | 45 | 169,768 | 143 GB/day |
| T6’05 | tlm-project.org (Linux OS) | 22 Mar to 30 Apr 2005 | 10 min | 264 | 149,071 | 735 GB/day |
| T6’09 | tlm-project.org (Linux OS) | 24 Sep 2009 to Feb 2010 | 10 min | 74 | 21,529 | 15 GB/day |
| T7’05 | transamrit.net (Slackware OS) | 22 Mar to 19 Jul 2005 | 5 min | 14 | 130,253 | 258 GB/day |
| T7’09 | transamrit.net (Slackware OS) | 24 Sep 2009 to Feb 2010 | 5 min | 60 | 61,011 | 840 GB/day |
| T8’05 | unix-ag.uni-kl.de (Knoppix OS) | 22 Mar to 19 Jul 2005 | 5 min | 11 | 279,323 | 493 GB/day |
| T8’09 | unix-ag.uni-kl.de (Knoppix OS) | 24 Sep 2009 to Feb 2010 | 5 min | 12 | 160,522 | 348 GB/day |
| T9’05 | zerowing.idsoftware.com (game demos) | 22 Mar to 19 Jul 2005 | 5 min | 13 | 48,271 | 19 GB/day |
| T9’09 | zerowing.idsoftware.com (game demos) | 24 Sep 2009 to Feb 2010 | 5 min | 37 | 14,697 | 12 GB/day |
| T10’05 | boegenielsen.dk (Knoppix OS) | 22 Mar to 19 Jul 2005 | 5 min | 15 | 36,391 | 308 GB/day |
| T11’03 | alluvion.org (general) [16] | Oct 27 2003 to Jan 16 2004 | 30 min | 1,476 | 173,532 | n/a |
| T12’04 | Gnutella (general) | Mar 19 2004 to Mar 28 2004 | n/a | 2,896,885 | n/a | n/a |
| T13’03 | eDonkey (general) | Oct 14 2003 to Oct 16 2003 | n/a | 1,282,420 | n/a | n/a |
| T13’04 | eDonkey (general) | Dec 9 2003 to Feb 2 2004 | n/a | 23,965,651 | n/a | n/a |
| T14’07 | PPLive (Streaming & VoD) | Jan 2007 | 10 min | n/a | 67,051 | n/a |
| T15’07 | Skype (VoIP) | Sep 2005 | 30 min | n/a | 29,218 | n/a |

Table 2: Summary of the traces in the P2PTA.
4.3 Community Dataset: FileList.org
Trace (1): T3’05
The trace T3’05 was collected from FileList.org between Dec 14, 2005 and Apr 4, 2006, and it was first studied by [22]. FileList.org is a private BitTorrent community that distributes various types of content. This community adopts a sharing-ratio enforcement scheme and removes users who do not actively contribute to the community, which is its main difference from most other BitTorrent communities represented in the Archive. This trace contains data collected from 3,000 swarms in this community; in each swarm, each peer’s ID, download and upload amount, download and upload speed, connectivity, and connected time are recorded. In total, the trace captures 2,172,738 sessions. At the time when this trace was collected, the FileList.org community had around 110,000 members.
4.4 Community Dataset: LegalTorrents.com
Traces (2): T4’05, T4’09
T4’05 was collected from LegalTorrents.com between 22 Mar 2005 and 17 Jul 2005, and T4’09 has been collected from this community since 24 Sep 2009 with a 5-minute sampling interval. This community mainly distributes general types of content. Both datasets contain only community-level data: the number of leechers and seeders, the total number of completed downloads, and the traffic of each swarm. Both datasets also contain descriptive information about the measured torrents, including file name, added time, file size, number of files in each torrent, and description.
In 2005, 41 swarms were measured and the daily throughput of this community was 698 GB. In 2009, 183 swarms have been measured so far, and the daily throughput of this community is 1.1 TB.
4.5 Community Dataset: eTree.org
Traces (3): T5’05, T5’05(small), T5’09
T5’05 was collected from etree.org between 22 Mar 2005 and 17 Jul 2005, T5’09 has been collected from this community since 24 Sep 2009 with a 15-minute sampling interval, and T5’05(small) was collected over a 10-day period in May 2005 with a 30-minute sampling interval. T5’05 and T5’09 were collected by the PDS group of TU Delft, and T5’05(small) was collected by [4]. This community mainly distributes recorded events and only provides legal content. Both datasets contain only swarm-level data: each peer’s IP address with the last byte blinded, client type, port number, download amount, upload amount, connected time, sharing ratio, download progress, download speed, and upload speed in each swarm. Both datasets also contain descriptive information about the measured torrents, including file name, infohash, added time, file size, number of files in each torrent, and torrent description.
In 2005, 165,168 sessions in 52 swarms were measured and the daily throughput of this community was 9 GB. In 2009, 169,768 sessions in 45 swarms have been measured so far, and the daily throughput of this community is 143 GB.
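Blinding the last byte of an IP address, as done for the peer addresses in these traces, can be sketched as follows. The report does not state how the blinding is performed; this example assumes IPv4 addresses and simply zeroes the last byte, and the function name `blind_last_byte` is hypothetical.

```python
import ipaddress


def blind_last_byte(ip: str) -> str:
    """Zero the last byte of an IPv4 address (one possible way of blinding it)."""
    address = ipaddress.IPv4Address(ip)
    # Keep the upper 24 bits and clear the lowest 8 bits.
    blinded = int(address) & 0xFFFFFF00
    return str(ipaddress.IPv4Address(blinded))


blinded_address = blind_last_byte("192.0.2.123")
```

With this kind of blinding, all peers in the same /24 address block become indistinguishable by address alone, which is one reason why peer identification in such traces may have to rely on additional fields such as the port number or client type.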
4.6 Community Dataset: tlm-project.org
Traces (2): T6’05, T6’09
T6’05 was collected from tlm-project.org between 22 Mar 2005 and 30 Apr 2005, and T6’09 has been collected from this community since 24 Sep 2009 with a 10-minute sampling interval. This community mainly distributes various Linux distributions and only provides legal content. Both datasets contain community-level and swarm-level data: community-level data contain the number of leechers and seeders, the total number of completed downloads, and the traffic of each measured swarm; peer-level data contain each peer’s IP address with the last byte blinded, port number, download amount, upload amount, download progress, connected time, and sharing ratio in each swarm, and T6’09 also includes each peer’s download and upload speed. Both datasets also contain descriptive information about the torrents, including file name, infohash, added time, file size, and number of files in each torrent.
In 2005, 149,071 sessions in 264 swarms were measured and the daily throughput of this community was 735 GB. In 2009, 21,529 sessions in 74 torrents have been measured so far, and the daily throughput of this community is 15 GB.
4.7 Community Dataset: transamrit.net
Traces (2): T7’05, T7’09
T7’05 was collected from transamrit.net between 22 Mar 2005 and 19 Jul 2005, and T7’09 has been collected from this community since 24 Sep 2009 with a 5-minute sampling interval. This community mainly distributes Slackware Linux distributions and only provides legal content. Both datasets contain community-level and swarm-level data: community-level data contain the number of leechers and seeders, the total number of completed downloads, and the traffic of each measured swarm; peer-level data contain each peer’s IP address with the last byte blinded, port number, download amount, upload amount, connected time, sharing ratio, download progress, download speed, and upload speed in each measured swarm. Both datasets also contain descriptive information about the torrents, including file name, infohash, added time, file size, and number of files in each torrent.
In 2005, 130,253 sessions in 14 swarms were measured and the daily throughput of this community was 258 GB. In 2009, 61,011 sessions in 60 swarms have been measured so far, and the daily throughput of this community is 840 GB.
4.8 Community Dataset: unix-ag.uni-kl.de
Traces (2): T8’05, T8’09
T8’05 was collected from unix-ag.uni-kl.de during the period between 22 Mar 2005 and 19 Jul 2005, and T8’09 has been collected from this community since 24 Sep 2009 with 5 minute sampling interval. This com-munity mainly distributes Knoppix linux distributions and only provides legal contents. Both datasets contain community level and swarm level data: community level data contains the number of leechers and seeders, total number of completed downloads, total traffic and average download progress of all participating peers of each swarm; peer level data contains peer’s ip address with last byte blinded, port number, download amount, upload amount, connected time, sharing ratio, download progress, download speed and upload speed in each measured swarm. And both datasets contain descriptive information of torrents including file name, infohash, added time, file size and number of files in each torrent.
In 2005, 279,323 sessions in 11 swarms were measured, and the daily throughput of this community was 493 GB. In 2009, 160,522 sessions in 12 swarms have been measured so far, and the daily throughput of this community is 348 GB.
4.9
Community Dataset: idsoftware.com
Traces (2): T9’05, T9’09
T9’05 was collected from idsoftware.com between 22 Mar 2005 and 19 Jul 2005, and T9’09 has been collected from this community since 24 Sep 2009 with a 5-minute sampling interval. This community distributes demos of games from id Software and provides only legal content. Both datasets contain community-level and swarm-level data: community-level data contains the number of leechers and seeders in each swarm; peer-level data contains the peer’s IP address with the last byte blinded, port number, download amount, upload amount, connected time, download progress, and sharing ratio in each measured swarm. Both datasets also contain descriptive information about the torrents, including file name, infohash, added time, file size, and number of files in each torrent.
In 2005, 48,271 sessions in 13 swarms were measured, and the daily throughput of this community was 19 GB. In 2009, 14,697 sessions in 37 swarms have been measured so far, and the daily throughput of this community is 12 GB.
4.10
Community Dataset: boenielsen.dk
Trace (1): T10’05
T10’05 was collected from boegenielsen.dk between 22 Mar 2005 and 19 Jul 2005 with a 5-minute sampling interval. This community mainly distributed Knoppix Linux distributions and provided only legal content. The dataset contains community-level and swarm-level data: community-level data contains the number of leechers and seeders, the total number of completed downloads, the total traffic, and the average download progress of all peers of each swarm; peer-level data contains the peer’s IP address with the last byte blinded, port number, download amount, upload amount, connected time, download progress, and sharing ratio in the measured swarms. The dataset also contains descriptive information about the torrents, including file name, infohash, added time, file size, and number of files in each torrent.
In 2005, 36,391 sessions in 15 swarms were measured and the daily throughput of this community was 308 GB.
4.11
Community Dataset: alluvion.org
The trace T11’03 was collected by [16] from alluvion.org between Dec 14, 2005 and Apr 4, 2006. Alluvion.org is a BitTorrent tracker for users of the Something Awful (SA) forums: SA members can upload torrents, and anyone can download them. This trace contains data collected from 1,476 swarms in this community; in each swarm, each peer’s ID, download and upload amount, download and upload speed, connectivity, and connected time are recorded. In total, the trace captures 173,532 sessions.
4.12
Community Dataset: Gnutella
Trace (1): T12’04
The trace T12’04 is a subset of the Gnutella trace collected by [15]; because of time constraints, we currently only include data of 10 out of the 56 days of the trace collected in the original measurement, but we plan to include the rest of the data in the Archive in the near future.
4.13
Community Dataset: eDonkey
Traces (2): T13’03, T13’04
Traces T13’03 and T13’04 were collected by [12,17] on 14-16 October 2003 and from 09 December 2003 to 02 February 2004, respectively. These traces were collected by a fake client that connected to other clients and asked for their lists of files.
4.14
Community Dataset: PP Live
Trace: T14’07
The trace T14’07 was collected and studied by [24] in 2007 by taking snapshots of the PP Live network. The measurement covered two video channels on PP Live for one day, with a 10-minute sampling interval. As a result, 67,051 sessions were collected.
4.15
Community Dataset: Skype
Trace: T15’05
The trace T15’05 was collected and studied by [11] in 2005 by pinging the super nodes in the Skype network. With a 30-minute sampling interval, 29,218 sessions were collected.
5
A Comparative Trace Analysis
In this section, we present a comparative analysis of the traces currently in the Archive. Our analysis focuses on content characteristics, peer arrivals and departures, peer bandwidth, and peer sharing behavior. We show how these characteristics differ across P2P communities and evolve over the years. In the analysis results, IQR stands for the Inter-Quartile Range of a stochastic variable.
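The statistics tables in this section report the same columns for every metric (Max, Mean, StDev, Q1, Median, Q3, IQR). As a minimal sketch of how such a row can be computed from raw values, the helper below uses only the Python standard library. The quartile convention (`method="inclusive"`) and the use of the sample standard deviation are assumptions, since the report does not state which conventions were used; the function name `summarize` is hypothetical.

```python
import statistics


def summarize(values):
    """Compute one row of summary statistics for a list of numbers.

    Assumptions: quartiles follow the "inclusive" convention and StDev
    is the sample standard deviation; other conventions give slightly
    different numbers.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Max": max(values),
        "Mean": statistics.mean(values),
        "StDev": statistics.stdev(values),
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "IQR": q3 - q1,  # Inter-Quartile Range
    }


if __name__ == "__main__":
    # Hypothetical file sizes in MB, for illustration only.
    for name, value in summarize([3, 4, 5, 8, 70, 205, 700]).items():
        print(f"{name}: {value:.1f}")
```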
5.1
Content characteristics
The size and the popularity of the content distributed in P2P systems are basic properties for characterizing these systems. We find that the content size distributions differ significantly across P2P communities. In Gnutella and eDonkey, more than 70% of the files are smaller than 10 MB. In contrast, most of the files distributed in most of the BitTorrent communities are much larger, as shown in Figure 1. We also notice that in some communities the file size distribution changes dramatically over time, and that the evolution trend differs among communities: most of the files distributed in LegalTorrents (T4’05,’09) were smaller in 2009 than in 2005, while most of the files in eTree (T5’05,’09) and id Software (T9’05,’09) were larger in 2009 than in 2005; the file size distribution of tlm-project (T6’05,’09) remained almost unchanged between 2005 and 2009, as shown in Figure 2. Statistics of the file size in the traces analyzed in this section further support this finding, as shown in Table 3.
[Plot: CDF vs. File Size (MB); IQR; Gnutella, eDonkey, BitTorrent; traces T3’05, T6’05, T11’05, T12’04, T13’04]
Figure 1: CDF of the file size in 6 traces collected between 2003 and 2005 (horizontal axis in logarithmic scale).
[Plot: four panels, CDF vs. File Size (MB); T4’05 and T4’09, T5’05 and T5’09, T6’05 and T6’09, T9’05 and T9’09]
Figure 2: CDF of the file size in 4 communities measured in 2005 and 2009.
The file size distributions of most traces can be fit with Weibull, Log-Normal, or Gamma distributions, but only the file size distribution of eDonkey (T13) can be fit with a Pareto distribution, as shown in Table 4. Table 5 shows the parameters of each fitted distribution.
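To illustrate the kind of fitting reported in these tables, the sketch below fits an exponential and a log-normal distribution by maximum likelihood and computes the Kolmogorov-Smirnov (KS) distance between the data and each fitted distribution. It is an illustration under stated assumptions, not the procedure used for the Archive: the fitting method is assumed to be maximum likelihood, only the KS distance is computed (not a p-value, and not the AD test), and all function names and sample values are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def fit_exponential(xs):
    """Maximum-likelihood estimate of the mean mu of Exp(mu)."""
    return fmean(xs)


def fit_lognormal(xs):
    """Maximum-likelihood estimates (mu, sigma) of LogN(mu, sigma).

    Requires strictly positive values with some variation.
    """
    logs = [math.log(x) for x in xs]
    mu = fmean(logs)
    sigma = math.sqrt(fmean([(v - mu) ** 2 for v in logs]))
    return mu, sigma


def exponential_cdf(mu):
    return lambda x: 1.0 - math.exp(-x / mu)


def lognormal_cdf(mu, sigma):
    normal = NormalDist(mu, sigma)
    return lambda x: normal.cdf(math.log(x))


def ks_distance(xs, cdf):
    """Largest gap between the empirical CDF of xs and the model CDF."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


if __name__ == "__main__":
    sizes_mb = [3, 4, 5, 8, 70, 205, 700]  # hypothetical file sizes (MB)
    mu = fit_exponential(sizes_mb)
    print("Exp  KS distance:", ks_distance(sizes_mb, exponential_cdf(mu)))
    m, s = fit_lognormal(sizes_mb)
    print("LogN KS distance:", ks_distance(sizes_mb, lognormal_cdf(m, s)))
```

A smaller KS distance indicates a closer fit; turning the distance into a p-value additionally requires the sample size and the distribution of the test statistic.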
We also find that the file popularity distributions are very different across P2P communities. Files distributed in most BitTorrent communities are requested by thousands of peers. In contrast, more than 80% of the files in eDonkey and Gnutella are owned or requested by fewer than 10 peers, as shown in Figure 3. Like the file size distributions, the file popularity distributions change significantly over time in some communities, and again the evolution trend is not the same among communities. Many of the files distributed in tlm-project (T6’05,’09) and unix-ag.uni-kl (T8’05,’09) were requested by far fewer peers in 2009 than in 2005, some of the files in id Software (T9’05,’09) were requested by more peers in 2009 than in 2005, and the file popularity distribution in transamrit (T7’05,’09) remained almost unchanged between 2005 and 2009, as shown in Figure 4. Statistics of the file popularity in the traces analyzed in this section further support this finding, as shown in Table 6.
Trace  Max  Mean  StDev  Q1  Median  Q3  IQR
T3’05  17,963  1,053  1,344  349  604  1,212  863
T4’05  1,010  492  247  329  434  700  371
T4’10  10,032  303  807  39  80  219  180
T5’05  1,955  793  322  656  824  937  281
T5’09  18,200  6,648  7,175  790  1,087  15,682  14,891
T6’05  3,205  410  329  92  439  644  552
T6’09  2,079  411  333  135  390  667  532
T9’05  463  129  168  8  17  258  250
T9’09  745  285  212  103  248  446  343
T11’03  9,953  702  879  186  672  716  530
T12’04  4,096  14  70  3  4  5  3
T13’04  4,096  76  205  0  3  8  8
Table 3: File Size Statistics (MB).
Trace      Exponential   Weibull       Pareto        Log-Normal    Gamma
T1’03      0.003 0.253   0.094 0.715   0.000 0.000   0.040 0.633   0.057 0.669
T2’05      0.120 0.739   0.093 0.739   0.000 0.036   0.108 0.732   0.087 0.752
T3’05      0.094 0.321   0.121 0.344   0.000 0.000   0.177 0.487   0.089 0.319
T4’05      0.026 0.252   0.257 0.630   0.000 0.000   0.074 0.307   0.245 0.578
T4’09      0.014 0.099   0.198 0.435   0.000 0.000   0.340 0.579   0.129 0.330
T5’05      0.007 0.069   0.211 0.579   0.000 0.000   0.122 0.514   0.165 0.555
T5’05(s)   0.013 0.053   0.363 0.495   0.000 0.000   0.348 0.514   0.427 0.565
T5’09      0.000 0.087   0.005 0.187   0.000 0.001   0.007 0.216   0.005 0.190
T6’05      0.116 0.326   0.108 0.304   0.000 0.000   0.086 0.269   0.112 0.303
T6’09      0.193 0.406   0.207 0.414   0.000 0.000   0.167 0.409   0.203 0.418
T7’05      0.000 0.015   0.214 0.723   0.000 0.000   0.114 0.633   0.117 0.648
T7’09      0.000 0.044   0.000 0.056   0.000 0.000   0.000 0.056   0.000 0.060
T8’05      0.000 0.083   0.056 0.754   0.000 0.000   0.033 0.720   0.031 0.721
T8’09      0.003 0.314   0.000 0.335   0.000 0.000   0.000 0.296   0.000 0.298
T9’05      0.000 0.221   0.013 0.445   0.000 0.018   0.024 0.524   0.012 0.452
T9’09      0.240 0.645   0.210 0.680   0.000 0.000   0.258 0.666   0.221 0.677
T10’05     0.000 0.018   0.002 0.324   0.000 0.000   0.002 0.255   0.002 0.302
T11’03     0.220 0.481   0.209 0.492   0.000 0.000   0.139 0.367   0.212 0.490
T12’04     0.000 0.003   0.015 0.104   0.000 0.000   0.015 0.106   0.004 0.051
T13’03     0.000 0.000   0.215 0.494   0.009 0.053   0.173 0.469   0.083 0.297
T13’04     0.000 0.000   0.216 0.503   0.009 0.053   0.221 0.510   0.081 0.284
Trace  Exp(µ)   Wbl(λ, κ)   Pareto   LogN(µ, σ)   Gam(κ, λ)
T1’03  1767.71   1977.07 2.90   -1.09 3422.23   7.38 0.52   5.14 343.79
T2’05  1.05   1.11 1.14   -0.03 1.09   -0.38 0.98   1.31 0.81
T3’05  1056.93   1015.81 0.93   0.27 777.97   6.38 1.07   1.00 1061.88
T4’05  492.48   546.36 1.91   -0.95 964.90   5.97 1.00   2.29 215.09
T4’09  308.58   200.68 0.65   0.74 102.14   4.55 1.50   0.53 581.47
T5’05  793.34   891.80 2.59   -0.57 1139.92   6.58 0.47   5.37 147.72
T5’05(s)  743.04   837.43 2.01   -0.19 848.74   6.49 0.53   4.19 177.34
T5’09  6647.98   5540.23 0.74   0.88 2603.20   7.87 1.49   0.65 10163.75
T6’05  409.94   431.04 1.17   -0.09 442.90   5.54 1.19   1.19 344.59
T6’09  411.36   440.61 1.24   -0.14 466.01   5.62 1.01   1.39 295.48
T7’05  649.59   655.85 65.71   -1.69 1126.39   6.48 0.02   1972.54 0.33
T7’09  1112.78   1207.79 1.23   0.01 1103.58   6.73 0.65   1.90 584.68
T8’05  693.72   696.24 174.38   -1.47 1026.86   6.54 0.01   15388.67 0.05
T8’09  1865.59   2011.65 1.24   -1.10 4684.01   7.17 0.85   1.52 1223.75
T9’05  138.58   96.61 0.62   1.37 28.10   3.68 1.81   0.51 272.86
T9’09  285.24   309.23 1.31   -0.58 466.53   5.28 1.00   1.50 189.92
T10’05  653.34   682.51 14.42   -1.72 1205.29   6.47 0.13   67.76 9.64
T11’03  702.33   675.70 0.92   0.11 618.93   5.89 1.47   0.88 794.92
T12’04  13.98   6.62 0.59   0.48 4.16   1.10 1.88   0.42 32.94
T13’03  63.26   8.21 0.32   3.80 0.23   0.36 3.70   0.20 324.08
T13’04  76.11   9.04 0.31   3.93 0.21   0.43 3.71   0.19 400.49
Table 5: Parameters of fitting distributions for file size.
Table 7 and Table 8 show the significance values from the GOF tests and the parameters of the fitted distributions for file popularity, respectively.
5.2
Peer arrival and departure
The peer arrival rate is one of the key elements for modeling churn in P2P networks, and we find that it differs significantly across P2P communities. The peer arrival rate in SuprNova (T1’03) can reach a few thousand peers per hour, while in alluvion (T11’04) it is below 10 peers per hour most of the time, as shown in Figure 5. When comparing the peer arrival rates of the same communities in different years, we do not find significant differences in most communities, except in transamrit (T7’05, ’09), where the peer arrival rate was lower in 2009 than in 2005 most of the time, as shown in Figure 6. Statistics of the peer arrival rate in the traces analyzed in this section are shown in Table 9.
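As a minimal sketch of how an hourly peer arrival rate can be derived from a trace, the function below counts session start times per one-hour bin. It assumes that each session has a start timestamp in seconds and that hours without arrivals count as zero; whether empty hours were included in the reported statistics is not stated, so this is an assumption. The function name `hourly_arrivals` is hypothetical.

```python
from collections import Counter


def hourly_arrivals(arrival_times, start, end):
    """Count peer arrivals in each one-hour bin of [start, end).

    arrival_times, start, and end are integer timestamps in seconds.
    Assumption: hours without any arrival are reported as 0.
    """
    n_hours = (end - start + 3599) // 3600  # round up to whole hours
    counts = Counter(
        (t - start) // 3600 for t in arrival_times if start <= t < end
    )
    return [counts.get(h, 0) for h in range(n_hours)]


if __name__ == "__main__":
    # Hypothetical session start times (seconds since trace start).
    starts = [0, 10, 3600, 7300]
    print(hourly_arrivals(starts, 0, 4 * 3600))
```

The resulting list of per-hour counts is the sample from which a CDF or summary statistics of the arrival rate can be computed.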
Trace  Max  Mean  StDev  Q1  Median  Q3  IQR
T2’05  9,951  441  751  8  36  691  683
T3’05  13,618  709  1,253  73  245  783  710
T6’05  8,553  576  1,295  28  114  420  392
T6’09  30,162  634  3,124  2  8  40  38
T7’05  49,199  9,304  15,873  166  290  13,374  13,209
T7’09  90,485  7,892  21,595  176  427  6,470  6,294
T8’05  129,978  25,393  35,039  5,794  10,784  27,579  21,785
T8’09  227,098  19,640  42,038  57  4,505  19,902  19,845
T9’05  23,493  3,448  5,917  360  1,445  3,411  3,051
T9’09  9,725  2,811  2,728  925  1,289  4,539  3,614
T11’04  2,364  114  198  15  50  124  109
T12’04  9,011  2  14  1  1  1  0
T13’04  5,533  2  11  1  1  2  1
[Plot: CDF vs. File Popularity; IQR; traces T3’05, T6’05, T11’05, T12’04, T13’04]
Figure 3: CDF of the file popularity in 6 traces collected between 2003 and 2005 (horizontal axis in logarithmic scale).
[Plot: four panels, CDF vs. File Popularity; T6’05 and T6’09, T7’05 and T7’09, T8’05 and T8’09, T9’05 and T9’09]
Figure 4: CDF of the file popularity in 4 communities measured in 2005 and 2009 (horizontal axis in logarithmic scale).
Trace      Exponential   Weibull       Pareto        Log-Normal    Gamma
T1’03      0.138 0.716   0.126 0.704   0.000 0.000   0.119 0.671   0.132 0.711
T2’05      0.002 0.071   0.104 0.377   0.000 0.017   0.117 0.398   0.116 0.381
T3’05      0.100 0.203   0.471 0.595   0.000 0.003   0.453 0.606   0.429 0.542
T5’05      0.042 0.182   0.332 0.619   0.006 0.042   0.206 0.552   0.334 0.603
T5’05(s)   0.218 0.402   0.455 0.582   0.000 0.003   0.473 0.656   0.379 0.521
T5’09      0.153 0.417   0.224 0.455   0.000 0.000   0.404 0.620   0.258 0.501
T6’05      0.008 0.041   0.433 0.576   0.001 0.010   0.485 0.642   0.278 0.430
T6’09      0.000 0.001   0.047 0.329   0.002 0.263   0.268 0.510   0.008 0.071
T7’05      0.000 0.046   0.070 0.591   0.003 0.108   0.122 0.665   0.044 0.499
T7’09      0.000 0.034   0.137 0.656   0.001 0.065   0.192 0.686   0.083 0.489
T8’05      0.136 0.639   0.171 0.678   0.000 0.016   0.161 0.723   0.158 0.654
T8’09      0.004 0.078   0.123 0.552   0.001 0.045   0.099 0.519   0.194 0.541
T9’05      0.063 0.418   0.308 0.741   0.002 0.058   0.236 0.722   0.264 0.671
T9’09      0.076 0.407   0.078 0.415   0.000 0.000   0.094 0.515   0.064 0.418
T10’05     0.000 0.060   0.196 0.656   0.006 0.223   0.235 0.703   0.107 0.583
T11’03     0.167 0.306   0.472 0.610   0.000 0.004   0.388 0.600   0.424 0.555
T12’04     0.000 0.440   0.000 0.451   0.000 0.362   0.000 0.651   0.000 0.436
T13’03     0.000 0.525   0.000 0.567   0.000 0.300   0.000 0.736   0.000 0.675
T13’04     0.001 0.485   0.000 0.488   0.000 0.316   0.003 0.690   0.002 0.484
Table 7: P-values from KS and AD test for file popularity distributions.
Trace  Exp(µ)   Wbl(λ, κ)   Pareto   LogN(µ, σ)   Gam(κ, λ)
T1’03  32458.00   33880.10 1.13   -0.30 43056.37   9.91 1.22   1.19 27326.07
T2’05  441.22   212.61 0.48   2.25 25.92   4.20 2.31   0.36 1242.71
T3’05  709.12   508.32 0.66   0.75 275.49   5.40 1.72   0.54 1311.33
T5’05  3514.23   2083.22 0.54   1.12 877.53   6.57 2.31   0.41 8531.34
T5’05(s)  51.34   42.44 0.77   0.46 28.59   3.08 1.36   0.70 73.34
T5’09  7435.00   7876.55 1.15   0.05 7049.89   8.55 0.81   1.53 4863.84
T6’05  575.57   290.42 0.53   1.28 94.18   4.68 2.02   0.39 1463.51
T6’09  634.22   46.14 0.34   2.01 5.54   2.51 2.42   0.19 3365.89
T7’05  9303.86   2738.32 0.39   2.56 225.11   6.53 2.84   0.27 34590.87
T7’09  7892.19   2061.94 0.40   2.00 323.77   6.31 2.81   0.26 29848.23
T8’05  25393.09   22634.27 0.83   0.37 16439.22   9.39 1.36   0.79 32039.18
T8’09  19640.38   5839.71 0.37   3.94 130.40   7.10 3.29   0.25 77191.58
T9’05  3448.00   2280.77 0.60   0.80 1241.14   6.78 2.14   0.47 7318.12
T9’09  2810.50   2937.25 1.11   -0.06 2974.18   7.52 0.91   1.32 2122.07
T10’05  2599.43   789.10 0.39   2.93 48.61   5.26 2.96   0.27 9648.87
T11’03  113.57   85.30 0.68   0.60 53.28   3.65 1.68   0.57 198.31
T12’04  2.17   1.81 0.84   0.21 1.35   0.22 0.61   1.03 2.10
T13’03  1.48   1.61 1.22   0.04 1.40   0.20 0.46   2.84 0.52
T13’04  2.54   2.26 0.88   0.22 1.72   0.41 0.71   1.10 2.31
Trace  Max  Mean  StDev  Q1  Median  Q3  IQR
T1’03  2,614  83  132  23  51  86  63
T3’05  4,503  12  30  2  5  12  10
T5’05  29  5  3  3  4  5  2
T5’09  315  10  15  3  5  11  8
T6’09  83  3  4  1  2  4  3
T7’05  321  13  12  2  10  20  18
T7’09  516  10  19  1  3  6  5
T11’04  477  2  4  1  1  3  2
T8’05  681  17  29  2  6  14  12
T8’09  430  16  22  3  7  17  14
T9’05  44  3  3  1  2  4  3
T9’09  36  2  2  1  2  3  2
Table 9: Peer Arrival Rate Statistics (number of peers per hour).
Trace       Exponential   Weibull       Pareto        Log-Normal    Gamma
T1’03       0.347 0.524   0.393 0.567   0.000 0.000   0.371 0.606   0.361 0.539
T2’05       0.394 0.636   0.393 0.642   0.000 0.005   0.438 0.713   0.386 0.623
T3’05       0.118 0.416   0.250 0.588   0.000 0.059   0.350 0.670   0.236 0.496
T5’05       0.014 0.445   0.038 0.719   0.000 0.016   0.041 0.728   0.039 0.744
T5’05(s)    0.000 0.606   0.000 0.639   0.000 0.309   0.002 0.720   0.002 0.672
T5’09       0.341 0.609   0.346 0.623   0.000 0.016   0.348 0.705   0.340 0.608
T6’05       0.002 0.682   0.012 0.670   0.000 0.185   0.025 0.725   0.025 0.686
T6’09       0.021 0.691   0.050 0.682   0.000 0.119   0.077 0.729   0.072 0.685
T7’05       0.275 0.637   0.275 0.644   0.000 0.003   0.165 0.545   0.281 0.649
T7’09       0.013 0.287   0.080 0.517   0.000 0.107   0.198 0.583   0.060 0.426
T8’05       0.057 0.311   0.237 0.541   0.000 0.038   0.363 0.627   0.191 0.457
T8’09       0.222 0.528   0.334 0.599   0.000 0.010   0.439 0.685   0.301 0.565
T9’05       0.019 0.697   0.064 0.688   0.000 0.109   0.082 0.735   0.091 0.700
T9’09       0.001 0.640   0.043 0.723   0.000 0.117   0.036 0.774   0.059 0.759
T10’05      0.197 0.675   0.211 0.678   0.000 0.052   0.190 0.719   0.231 0.682
T11’03      0.001 0.638   0.004 0.647   0.000 0.214   0.016 0.716   0.014 0.640
T14’07(1)   0.233 0.452   0.278 0.627   0.000 0.000   0.205 0.556   0.247 0.594
T15’07      0.018 0.111   0.120 0.235   0.000 0.000   0.452 0.610   0.334 0.475
Table 10: P-values from KS and AD test for arrival rate distributions.
Table 10 and Table 11 show the significance values from the GOF tests and the parameters of the fitted distributions for peer arrival rate, respectively.
Session length is another important element for modeling churn in P2P systems. We find that the session length distributions are very different in communities of different types, as shown in Figure 7. We also find that the session length distributions in communities of similar types are very close, as shown in Figure 8. Furthermore, the session length distribution does not change dramatically within one community over the years, as shown in Figure [?]. This result suggests a possible correlation between the session length distribution and the community type. Statistics of the session length in the traces analyzed in this section are shown in Table 12.
Table 13 and Table 14 show the significance values from the GOF tests and the parameters of the fitted distributions for session length, respectively.
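Several of the traces are collected by periodic sampling (for example, with a 5-minute interval), so session lengths have to be reconstructed from the times at which a peer was observed. The sketch below shows one possible reconstruction and is not necessarily the method used for the Archive: consecutive observations are merged into one session as long as no more than `max_missed` samples are missing, and each session is credited with one sampling interval for its last observation. Both rules, and the names `sessions_from_samples` and `session_lengths_minutes`, are assumptions made for illustration.

```python
def sessions_from_samples(sample_times, interval=300, max_missed=1):
    """Group observation times (seconds) of one peer into sessions.

    Assumption: two observations belong to the same session if at most
    `max_missed` consecutive samples are missing between them.
    Returns a list of (first_seen, last_seen) tuples.
    """
    gap = interval * (max_missed + 1)
    sessions = []
    for t in sorted(sample_times):
        if sessions and t - sessions[-1][1] <= gap:
            sessions[-1][1] = t  # extend the current session
        else:
            sessions.append([t, t])  # start a new session
    return [(first, last) for first, last in sessions]


def session_lengths_minutes(sessions, interval=300):
    """Session lengths in minutes, crediting one interval per session."""
    return [(last - first + interval) / 60 for first, last in sessions]


if __name__ == "__main__":
    # Hypothetical observation times of a single peer (seconds).
    seen = [0, 300, 600, 3000, 3300]
    sessions = sessions_from_samples(seen)
    print(sessions)
    print(session_lengths_minutes(sessions))
```

Changing `max_missed` changes how many sessions are found and how long they are, which illustrates why different session identification methods can lead to different analysis results.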
5.3
Bandwidth characteristics
Bandwidth is one of the most frequently investigated properties in empirical P2P studies, as it is closely related to the service capacity of P2P systems. We find that the peer download speed differs significantly across P2P communities, and that the download speed has increased, to varying degrees, over the years in all measured communities, as shown in Figures 10 and 11, respectively. Statistics of the peer download speed in the traces analyzed in this section are shown in Table 15.
[Plot: CDF vs. Peer Arrival Rate (per hour); IQR; traces T1’03, T3’05, T7’05, T9’05, T11’04]
Figure 5: CDF of the (hourly) peer arrival rate in 5 traces collected between 2003 and 2005 (horizontal axis in logarithmic scale).
[Plot: four panels, CDF vs. Peer Arrival Rate (hourly); T6’05 and T6’09, T7’05 and T7’09, T8’05 and T8’09, T9’05 and T9’09]
Figure 6: CDF of the (hourly) peer arrival rate in 4 communities measured in 2005 and 2009 (horizontal axis in logarithmic scale).
[Plot: CDF vs. Session Length (min); IQR; traces T1’03, T3’05, T7’05, T9’05, T11’04]
Figure 7: CDF of the peer session length in 5 traces collected between 2003 and 2005 (horizontal axis in logarithmic scale).
[Plot: CDF vs. Session Length (min); IQR; traces T6’09, T7’09, T8’09, T9’09]
Trace  Exp(µ)   Wbl(λ, κ)   Pareto   LogN(µ, σ)   Gam(κ, λ)
T1’03  83.36   77.52 0.88   0.24 62.33   3.77 1.21   0.89 93.35
T2’05  11.63   11.55 0.99   0.16 9.75   1.94 1.02   1.10 10.52
T3’05  12.49   9.96 0.75   0.50 6.30   1.66 1.24   0.70 17.92
T5’05  4.70   5.30 1.71   -0.18 5.40   1.37 0.62   2.91 1.61
T5’05(s)  1.82   1.96 1.19   0.07 1.69   0.34 0.59   2.07 0.88
T5’09  9.68   9.25 0.92   0.22 7.41   1.69 1.05   1.00 9.66
T6’05  2.46   2.63 1.18   0.05 2.33   0.58 0.71   1.72 1.43
T6’09  3.23   3.38 1.11   0.07 3.01   0.78 0.82   1.43 2.27
T7’05  12.84   12.92 1.02   -0.01 12.99   1.98 1.23   1.01 12.76
T7’09  10.06   7.67 0.72   0.67 4.07   1.37 1.26   0.65 15.52
T8’05  16.65   12.95 0.72   0.65 7.16   1.87 1.33   0.65 25.72
T8’09  15.78   14.34 0.85   0.36 10.38   2.06 1.19   0.84 18.75
T9’05  3.19   3.39 1.16   0.02 3.14   0.80 0.80   1.52 2.10
T9’09  2.30   2.56 1.44   -0.06 2.42   0.61 0.63   2.40 0.96
T10’05  5.22   5.28 1.03   0.11 4.62   1.17 0.96   1.18 4.42
T11’03  2.44   2.53 1.08   0.12 2.12   0.53 0.72   1.54 1.58
T14’07(1)  1396.71   1546.83 1.51   -0.63 2272.43   6.95 0.86   1.84 758.95
T15’07  45.30   49.84 1.30   0.03 43.91   3.62 0.54   2.79 16.26
Table 11: Parameters of fitting distributions for peer arrival rate in all traces.
[Plot: four panels, CDF vs. Session Length (min); T6’05 and T6’09, T7’05 and T7’09, T8’05 and T8’09, T9’05 and T9’09]
Figure 9: CDF of the peer session length in 4 communities measured in 2005 and 2009 (horizontal axis in logarithmic scale).
Trace  Max  Mean  StDev  Q1  Median  Q3  IQR
T1’03  50,264  1,013  2,357  92  404  1,065  973
T3’05  40,246  421  949  45  136  459  415
T7’05  42,630  332  1,003  35  85  285  250
T7’09  48,240  401  1,468  45  95  255  210
T8’05  62,220  355  1,473  40  80  250  210
T8’09  124,485  348  1,730  30  80  205  175
T9’05  80,310  366  2,529  40  75  190  150
T9’09  58,160  531  3,159  45  80  180  135
T11’04  64,799  1,551  3,004  240  660  1,500  1,260
Table 12: Session Length Statistics (minutes).
Trace       Exponential   Weibull       Pareto        Log-Normal    Gamma
T1’03       0.000 0.000   0.063 0.124   0.000 0.003   0.415 0.579   0.000 0.000
T2’05       0.000 0.000   0.015 0.082   0.000 0.000   0.081 0.204   0.007 0.039
T3’05       0.080 0.175   0.447 0.564   0.000 0.001   0.459 0.612   0.367 0.479
T5’05       0.046 0.246   0.020 0.391   0.000 0.049   0.101 0.519   0.042 0.315
T5’05(s)    0.190 0.406   0.389 0.568   0.000 0.008   0.453 0.652   0.313 0.497
T5’09       0.060 0.424   0.101 0.598   0.000 0.020   0.055 0.596   0.125 0.606
T6’05       0.023 0.158   0.310 0.521   0.000 0.015   0.434 0.654   0.202 0.362
T6’09       0.032 0.200   0.278 0.506   0.000 0.011   0.434 0.664   0.179 0.374
T7’05       0.028 0.146   0.385 0.574   0.000 0.009   0.433 0.642   0.253 0.435
T7’09       0.010 0.074   0.308 0.514   0.000 0.006   0.449 0.644   0.159 0.304
T8’05       0.015 0.098   0.310 0.528   0.000 0.006   0.420 0.646   0.173 0.337
T8’09       0.009 0.066   0.347 0.541   0.000 0.013   0.419 0.642   0.181 0.330
T9’05       0.004 0.048   0.249 0.478   0.000 0.004   0.377 0.637   0.103 0.246
T9’09       0.000 0.011   0.187 0.412   0.000 0.007   0.352 0.623   0.044 0.140
T10’05      0.024 0.105   0.399 0.561   0.000 0.007   0.448 0.645   0.247 0.392
T11’03      0.169 0.350   0.402 0.565   0.000 0.002   0.445 0.632   0.311 0.471
T14’07(1)   0.042 0.143   0.458 0.589   0.000 0.025   0.448 0.637   0.332 0.458
T15’07      0.049 0.108   0.384 0.496   0.000 0.002   0.449 0.581   0.282 0.402
Table 13: P-values from KS and AD test for session length distributions.
Table 16 and Table 17 show the significance values from the GOF tests and the parameters of the fitted distributions for download speed, respectively.
We find that the difference in peer upload speed across P2P communities is not as significant as that in peer download speed, as shown in Figure 12. Surprisingly, we find that the peer upload speed has not increased but decreased in some communities over these years; in particular, the average peer upload speed in tlm-project (T6’05, ’09) has decreased dramatically, as shown in Figure 13. Statistics of the peer upload speed in the traces analyzed in this section are shown in Table 18.
Table 19 and Table 20 show the significance values from the GOF tests and the parameters of the fitted distributions for upload speed, respectively.
5.4
Peer Sharing Behavior
Understanding the peer sharing behavior is key to obtaining in-depth knowledge of the usage patterns and to modeling user behavior in P2P systems. We analyze the peer sharing behavior with three metrics. First, the download completion is the percentage of a file that is downloaded in a single session. Second, the seeding time is the amount of time a seeder (that is, a peer that entered the system with a complete copy of a file) stays in the system. Third, the seeding-after-leeching time is the amount of time a peer stays in the system after finishing its download.
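The three metrics defined above can be sketched as follows. The `Session` record and its field names are hypothetical and only serve this illustration; they do not reflect the Archive's data format. Times are assumed to be in minutes and sizes in MB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """Hypothetical per-session record (not the Archive's format)."""
    start: float                 # arrival time (minutes)
    end: float                   # departure time (minutes)
    file_size: float             # size of the file (MB)
    downloaded: float            # amount downloaded in this session (MB)
    arrived_complete: bool       # entered with a complete copy of the file
    finished_at: Optional[float] = None  # time the download finished, if any


def download_completion(s: Session) -> float:
    """Percentage of the file downloaded in this single session."""
    return 100.0 * min(s.downloaded, s.file_size) / s.file_size


def seeding_time(s: Session) -> Optional[float]:
    """Time a seeder stays in the system; None for non-seeders."""
    return s.end - s.start if s.arrived_complete else None


def seeding_after_leeching_time(s: Session) -> Optional[float]:
    """Time a peer stays after finishing its download; None otherwise."""
    if s.arrived_complete or s.finished_at is None:
        return None
    return s.end - s.finished_at


if __name__ == "__main__":
    leecher = Session(0, 200, 700, 700, False, finished_at=150)
    print(download_completion(leecher), seeding_after_leeching_time(leecher))
```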
Trace  Exp(µ)   Wbl(λ, κ)   Pareto   LogN(µ, σ)   Gam(κ, λ)
T1’03  5214005443.99   612.27 0.34   0.68 404.67   5.72 1.71   0.05 99619224948.77
T2’05  102.37   30.96 0.48   1.16 9.24   2.52 1.62   0.32 317.12
T3’05  421.00   295.93 0.66   0.73 160.19   4.90 1.60   0.55 766.84
T5’05  6525.77   5602.84 0.82   0.43 3636.19   8.07 1.01   0.83 7836.69
T5’05(s)  1268.09   1062.80 0.78   0.47 701.28   6.32 1.29   0.73 1746.03
T5’09  1863.87   1486.82 0.69   0.44 1152.65   6.45 1.82   0.57 3241.96
T6’05  464.76   287.93 0.63   0.76 143.79   4.91 1.47   0.51 907.91
T6’09  368.31   246.24 0.67   0.64 136.23   4.80 1.36   0.56 654.21
T7’05  332.11   205.71 0.62   0.80 102.47   4.52 1.61   0.49 671.05
T7’09  401.39   222.56 0.60   0.79 109.28   4.61 1.56   0.46 864.87
T8’05  355.06   203.09 0.61   0.77 101.72   4.53 1.56   0.48 746.44
T8’09  348.33   181.91 0.59   0.79 90.43   4.39 1.61   0.44 786.58
T9’05  366.15   175.47 0.59   0.72 90.45   4.40 1.49   0.43 844.97
T9’09  531.34   200.80 0.55   0.76 98.66   4.50 1.55   0.38 1416.36
T10’05  413.45   238.85 0.61   0.79 119.99   4.67 1.61   0.47 874.30
T11’03  1550.52   1249.08 0.75   0.49 810.54   6.46 1.33   0.68 2267.88
T14’07  87.86   53.89 0.60   0.96 23.56   3.12 1.74   0.47 186.26
T15’07  976.15   651.81 0.64   0.79 325.15   5.69 1.57   0.53 1847.40
Table 14: Parameters of fitting distributions for session length.
Trace  Max  Mean  StDev  Q1  Median  Q3  IQR
T1’03  6,019  103  127  28  62  133  105
T3’05  24,851  317  493  68  167  373  305
T6’05  9,490  189  379  23  68  186  163
T6’09  8,484  343  758  29  98  334  305
T7’05  23,115  333  585  50  139  366  316
T7’09  102,621  1,231  3,149  119  404  1,210  1,091
T8’05  19,091  515  783  86  247  653  567
T8’09  116,042  1,313  2,396  158  531  1,390  1,232
T9’05  7,062  366  521  48  168  451  403
T9’09  18,002  1,015  1,911  113  376  1,036  923
T11’04  61,608  131  271  23  68  157  134
Table 15: Peer Download Speed Statistics (Kbps).
Trace      Exponential   Weibull       Pareto        Log-Normal    Gamma
T1’03      0.473 0.582   0.471 0.591   0.000 0.000   0.479 0.608   0.468 0.579
T2’05      0.042 0.109   0.416 0.526   0.000 0.002   0.503 0.629   0.295 0.403
T3’05      0.361 0.484   0.471 0.595   0.000 0.000   0.385 0.537   0.445 0.568
T5’05      0.180 0.296   0.428 0.590   0.000 0.003   0.307 0.483   0.463 0.605
T5’05(s)   0.442 0.546   0.479 0.618   0.000 0.000   0.343 0.516   0.461 0.593
T5’09      0.275 0.419   0.475 0.610   0.000 0.000   0.442 0.589   0.440 0.572
T6’05      0.117 0.215   0.452 0.583   0.000 0.001   0.447 0.598   0.397 0.519
T6’09      0.038 0.103   0.459 0.601   0.000 0.003   0.461 0.597   0.344 0.494
T7’05      0.184 0.310   0.462 0.600   0.000 0.001   0.385 0.557   0.401 0.544
T7’09      0.074 0.157   0.484 0.593   0.000 0.003   0.472 0.600   0.394 0.516
T8’05      0.271 0.406   0.496 0.622   0.000 0.001   0.401 0.541   0.465 0.591
T8’09      0.149 0.256   0.481 0.603   0.000 0.002   0.435 0.584   0.448 0.567
T9’05      0.212 0.347   0.495 0.610   0.000 0.001   0.436 0.577   0.486 0.604
T9’09      0.113 0.212   0.491 0.599   0.000 0.002   0.464 0.606   0.400 0.522
T10’05     0.205 0.336   0.490 0.616   0.000 0.001   0.404 0.543   0.466 0.586
T11’03     0.299 0.428   0.493 0.611   0.000 0.001   0.402 0.560   0.461 0.584
[Plot: CDF vs. Download Speed (kbps); IQR; traces T1’03, T3’05, T9’05, T11’04]
Figure 10: CDF of the peer download speed in 5 traces collected between 2003 and 2005.
[Plot: four panels, CDF vs. Download Speed (kbps); T6’05 and T6’09, T7’05 and T7’09, T8’05 and T8’09, T9’05 and T9’09]
Figure 11: CDF of the peer download speed in 4 communities measured in 2009.
Trace  Exp(µ)   Wbl(λ, κ)   Pareto   LogN(µ, σ)   Gam(κ, λ)
T1’03  102.88   99.93 0.94   0.15 87.55   4.03 1.21   0.96 106.68
T2’05  0.82   0.52 0.63   0.75 0.27   -1.45 1.61   0.51 1.62
T3’05  317.03   276.97 0.80   0.32 216.79   4.93 1.58   0.73 435.46
T5’05  18.66   13.81 0.63   0.62 9.33   1.67 2.16   0.50 37.03
T5’05(s)  168.79   151.32 0.82   0.24 127.84   4.33 1.58   0.75 225.46
T5’09  253.28   216.49 0.78   0.39 160.76   4.66 1.57   0.69 365.24
T6’05  188.69   133.84 0.66   0.67 77.31   4.08 1.75   0.54 349.10
T6’09  342.76   210.06 0.59   0.92 98.33   4.43 1.99   0.46 750.56
T7’05  333.38   254.11 0.69   0.56 165.16   4.72 1.85   0.57 582.91
T7’09  1231.29   811.31 0.62   0.79 426.68   5.83 1.91   0.49 2490.25
T8’05  515.05   423.39 0.73   0.42 314.70   5.27 1.79   0.63 820.32
T8’09  1313.12   968.77 0.67   0.64 585.86   6.05 1.80   0.55 2370.87
T9’05  365.97   290.08 0.70   0.53 196.62   4.85 1.81   0.59 620.61
T9’09  1014.64   717.72 0.65   0.71 405.07   5.74 1.78   0.53 1902.28
T10’05  372.86   289.99 0.70   0.52 197.79   4.86 1.81   0.59 636.32
T11’03  131.31   110.11 0.76   0.38 83.57   3.96 1.65   0.66 197.54
[Plot: CDF vs. Upload Speed (kbps); IQR; traces T3’05, T6’09, T9’05, T11’04]
Figure 12: Comparison of the peer upload speed distributions in 4 traces collected in 2005 (horizontal axis in logarithmic scale).
[Plot: four panels, CDF vs. Upload Speed (kbps); T6’05 and T6’09, T7’05 and T7’09, T8’05 and T8’09, T9’05 and T9’09]
Figure 13: Comparison of the peer upload speed distributions in 4 communities measured in 2009 (horizontal axis in logarithmic scale).
Trace  Max  Mean  StDev  Q1  Median  Q3  IQR
T3’05  106,324  85  475  6  22  70  65
T6’05  13,162  41  139  3  14  40  37
T6’09  8,319  17  86  1  3  10  10
T7’05  11,539  42  129  5  18  46  40
T7’09  97,748  82  873  5  18  64  59
T8’05  46,679  53  289  4  18  54  50
T8’09  104,755  59  651  3  12  42  39
T9’05  11,744  24  103  1  7  24  23
T9’09  1,475  20  58  1  5  18  17
T11’04  15,307  85  212  12  38  92  80
Table 18: Peer Upload Speed Statistics (Kbps).
Trace      Exponential   Weibull       Pareto        Log-Normal    Gamma
T3’05      0.000 0.000   0.085 0.159   0.001 0.005   0.448 0.587   0.000 0.000
T5’05      0.058 0.131   0.479 0.610   0.002 0.009   0.380 0.533   0.500 0.602
T5’05(s)   0.000 0.000   0.127 0.216   0.000 0.001   0.338 0.493   0.000 0.000
T5’09      0.095 0.181   0.488 0.607   0.001 0.004   0.417 0.577   0.405 0.543
T6’05      0.066 0.136   0.492 0.603   0.000 0.003   0.431 0.578   0.430 0.536
T6’09      0.005 0.016   0.474 0.579   0.001 0.009   0.483 0.614   0.266 0.375
T7’05      0.186 0.282   0.486 0.592   0.000 0.001   0.432 0.568   0.454 0.552
T7’09      0.015 0.046   0.490 0.601   0.001 0.004   0.435 0.582   0.327 0.432
T8’05      0.085 0.160   0.507 0.608   0.000 0.003   0.397 0.555   0.449 0.538
T8’09      0.007 0.028   0.478 0.592   0.002 0.009   0.457 0.606   0.335 0.447
T9’05      0.034 0.086   0.502 0.620   0.002 0.010   0.403 0.571   0.470 0.561
T9’09      0.029 0.074   0.490 0.608   0.001 0.007   0.441 0.592   0.401 0.516
T10’05     0.165 0.267   0.507 0.617   0.000 0.001   0.406 0.555   0.465 0.564
T11’03     0.198 0.304   0.475 0.604   0.000 0.001   0.377 0.535   0.438 0.553
T12’04     0.000 0.000   0.006 0.107   0.000 0.002   0.055 0.464   0.000 0.018
Table 19: P-values from KS and AD test for upload speed distributions.
We find that the download completion distributions differ significantly in communities of different types, as shown in Figure 14. Merely 20% of the sessions in SuprNova (T1’03) download more than 50% of the file. In contrast, more than 40% of the sessions in Filelist, transamrit, id Software, and alluvion (T3’05, T7’05, T9’05, T11’04) download more than 50% of the file, and around 20% of the sessions complete the download. Although the reason for the low download completion in SuprNova is not clear, this result suggests the prevalence of multi-session downloads in this community. We also find that the download completion distributions in some communities change significantly over time, and that the evolution trend differs among communities. In tlm-project and id Software (T6’05,’09, T9’05,’09), most of the sessions download much more of a file in 2009 than in 2005. In contrast, in transamrit (T7’05, ’09), most of the sessions download less in one session in 2009 than in 2005. The download completion distributions in unix-ag.uni-kl (T8’05, ’09) do not change much between 2005 and 2009, as shown in Figure 15. Statistics of the download completion in the traces analyzed in this section are shown in Table 21.
Table 22 and Table 23 show the significance values from the GOF tests and the parameters of the fitted distributions for download completion, respectively.
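Statements such as "more than 40% of the sessions download more than 50% of the file" are read off an empirical CDF. As a minimal sketch, assuming a list of per-session download completion percentages, the empirical CDF and the fraction of sessions above a threshold can be computed as follows. The function names and the sample values are hypothetical.

```python
from bisect import bisect_right


def ecdf(values):
    """Return the empirical CDF F, where F(x) = fraction of values <= x."""
    xs = sorted(values)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def fraction_above(values, threshold):
    """Fraction of values strictly greater than the threshold."""
    return 1.0 - ecdf(values)(threshold)


if __name__ == "__main__":
    # Hypothetical download completion percentages of five sessions.
    completions = [10, 20, 60, 100, 100]
    print("sessions above 50%:", fraction_above(completions, 50))
```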
We find that the seeding time distributions are very different in communities of different types. Most of the seeders in alluvion (T11’04) seed for several hours, while most of the seeders in id Software (T9’05) seed for only around one hour, as shown in Figure 16. We also find that the seeding time distributions in most communities do not change much over the years, except in id Software (T9’05, ’09), where the seeding time is considerably longer in 2009 than in 2005, as shown in Figure 17. Another noticeable finding is that the ratio of the number of seeding sessions to the total number of sessions is very different across communities. In Filelist (T3’05) and
Trace      Exp(µ)          Wbl(λ, κ)     Pareto        LogN(µ, σ)   Gam(κ, λ)
T3’05      1516386634.83   38.81, 0.31   0.94, 20.61   2.84, 2.03   0.05, 31591083797.19
T5’05      23.14           13.99, 0.55   1.16, 5.69    1.56, 2.37   0.41, 55.80
T5’05(s)   9306810.31      62.14, 0.37   0.48, 50.58   3.36, 1.97   0.07, 138803130.51
T5’09      51.67           34.55, 0.62   0.79, 18.59   2.64, 1.97   0.49, 105.80
T6’05      40.91           25.42, 0.60   0.82, 13.36   2.33, 1.96   0.46, 88.22
T6’09      16.80           6.53, 0.50    1.34, 2.03    0.82, 2.17   0.34, 49.64
T7’05      41.99           30.29, 0.67   0.53, 20.26   2.61, 1.74   0.55, 75.77
T7’09      82.30           38.40, 0.55   1.02, 16.76   2.67, 2.10   0.38, 215.78
T8’05      52.63           32.53, 0.60   0.78, 17.99   2.56, 2.00   0.46, 114.45
T8’09      59.26           25.55, 0.52   1.23, 9.07    2.20, 2.19   0.36, 166.21
T9’05      24.05           13.20, 0.54   1.24, 4.80    1.53, 2.25   0.40, 60.32
T9’09      20.35           10.87, 0.54   1.16, 4.14    1.37, 2.15   0.40, 50.71
T10’05     54.52           38.88, 0.66   0.56, 25.61   2.83, 1.82   0.54, 101.41
T11’03     85.38           63.05, 0.68   0.51, 43.33   3.33, 1.81   0.56, 152.52
T12’04     455.16          3.53, 0.44    0.73, 1.52    0.46, 1.40   0.14, 3312.31
Table 20: Parameters of fitting distributions for upload speed.
[Figure 14 plot: CDF (0 to 1) versus download completion (%, 0 to 100) for T1’03, T3’05, T7’05, T9’05, and T11’04.]
Figure 14: CDF of the download completion of traces collected between 2003 and 2005.
[Figure 15 plot, four panels: CDF versus download completion (%) for T6’05 and T6’09, T7’05 and T7’09, T8’05 and T8’09, and T9’05 and T9’09.]
Trace    Mean  StDev  Q1  Median  Q3   IQR
T1’03    24    31     2   10      34   32
T3’05    59    41     13  69      100  87
T9’05    68    38     29  91      100  71
T9’09    77    34     55  100     100  45
T6’05    41    39     4   25      87   83
T6’09    68    39     28  98      100  72
T7’05    49    41     7   39      100  93
T7’09    39    39     4   20      82   78
T11’04   66    34     38  79      97   59
T8’04    59    40     15  68      100  85
T8’09    62    40     17  82      100  83
Table 21: Download Completion Statistics (values represent download completion percentages).
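The columns of Table 21 are standard summary statistics, with the IQR column equal to Q3 minus Q1 (for example, 34 - 2 = 32 for T1’03). The sketch below shows one way to compute such a row from per-session completion percentages using only the Python standard library. The report does not state which quartile interpolation rule or which standard deviation estimator was used, so the inclusive quartile method and the sample standard deviation chosen here are assumptions, and the input values are hypothetical.

```python
import statistics

def completion_summary(values):
    """Mean, StDev, Q1, Median, Q3, and IQR for a list of completion percentages."""
    # Assumption: quartiles use the 'inclusive' interpolation rule.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        # Assumption: sample (n - 1) standard deviation.
        "stdev": statistics.stdev(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,
    }

# Hypothetical per-session completion percentages (illustrative only).
row = completion_summary([0, 25, 50, 75, 100])
print(row)
```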
Trace      Exponential    Weibull        Pareto         Log-Normal     Gamma
T1’03      0.098, 0.207   0.469, 0.593   0.000, 0.003   0.422, 0.564   0.432, 0.569
T2’05      0.022, 0.067   0.280, 0.486   0.002, 0.004   0.308, 0.487   0.274, 0.490
T3’05      0.069, 0.323   0.069, 0.308   0.000, 0.000   0.038, 0.311   0.070, 0.346
T5’05      0.285, 0.460   0.369, 0.525   0.000, 0.000   0.218, 0.368   0.401, 0.560
T5’05(s)   0.094, 0.244   0.146, 0.300   0.000, 0.000   0.036, 0.153   0.122, 0.262
T5’09      0.151, 0.286   0.388, 0.539   0.000, 0.000   0.379, 0.538   0.389, 0.546
T6’05      0.103, 0.225   0.261, 0.437   0.000, 0.000   0.191, 0.411   0.277, 0.457
T6’09      0.023, 0.269   0.025, 0.233   0.000, 0.000   0.010, 0.233   0.024, 0.263
T7’05      0.113, 0.272   0.173, 0.379   0.000, 0.000   0.079, 0.370   0.194, 0.418
T7’09      0.095, 0.230   0.294, 0.491   0.000, 0.000   0.281, 0.518   0.310, 0.511
T8’05      0.077, 0.278   0.081, 0.292   0.000, 0.000   0.024, 0.261   0.074, 0.299
T8’09      0.048, 0.230   0.047, 0.233   0.000, 0.000   0.019, 0.237   0.048, 0.254
T9’05      0.025, 0.219   0.034, 0.212   0.000, 0.000   0.008, 0.174   0.029, 0.220
T9’09      0.004, 0.127   0.007, 0.185   0.000, 0.000   0.002, 0.152   0.005, 0.169
T10’05     0.120, 0.289   0.168, 0.379   0.000, 0.000   0.078, 0.345   0.176, 0.408
T11’03     0.034, 0.185   0.096, 0.251   0.000, 0.000   0.015, 0.166   0.059, 0.227
Table 22: P-values from the KS and AD tests for download completion distributions.
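To make the goodness-of-fit step more concrete, the sketch below fits an exponential distribution to a sample and computes the Kolmogorov-Smirnov (KS) statistic D, the largest gap between the empirical CDF and the fitted CDF. It is a simplified illustration under stated assumptions, not the procedure used to produce Table 22: the exponential mean is estimated as the sample mean (the maximum-likelihood estimate), the sample values are hypothetical, and the conversion from D to a p-value, as well as the AD test, are left out.

```python
import math

def fit_exponential_mean(samples):
    """Maximum-likelihood estimate of the exponential mean: the sample mean."""
    return sum(samples) / len(samples)

def ks_statistic(samples, cdf):
    """Largest distance between the empirical CDF of samples and cdf."""
    ordered = sorted(samples)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x,
        # so both sides of the jump are compared with the model.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical sample values (illustrative only).
samples = [1.0, 2.0, 3.0]
mu = fit_exponential_mean(samples)
d = ks_statistic(samples, lambda x: 1.0 - math.exp(-x / mu))
print(f"fitted mean = {mu:.2f}, KS statistic D = {d:.4f}")
```

A smaller D indicates a closer fit. Note that estimating the parameter from the same sample that is then tested biases a standard KS p-value upward, which is one reason the p-value step is omitted from this sketch.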
Trace      Exp(µ)   Wbl(λ, κ)     Pareto          LogN(µ, σ)   Gam(κ, λ)
T1’03      24.31    18.01, 0.64   0.83, 9.59      1.98, 1.94   0.52, 46.52
T2’05      33.42    21.95, 0.56   1.55, 6.07      2.03, 2.25   0.44, 76.28
T3’05      58.65    58.37, 0.99   -1.22, 121.59   3.37, 1.69   0.84, 69.68
T5’05      32.51    29.65, 0.81   -0.29, 42.85    2.60, 1.86   0.69, 47.38
T5’05(s)   58.12    62.13, 1.35   -1.22, 121.80   3.61, 1.36   1.24, 46.93
T5’09      32.92    27.33, 0.72   0.41, 21.52     2.47, 1.87   0.60, 54.63
T6’05      41.18    34.24, 0.70   -1.16, 115.88   2.63, 2.07   0.57, 72.22
T6’09      67.84    71.70, 1.29   -1.11, 111.43   3.71, 1.49   1.12, 60.58
T7’05      48.74    43.53, 0.77   -1.25, 124.92   2.90, 2.12   0.63, 77.91
T7’09      38.58    32.15, 0.72   0.20, 31.56     2.61, 1.94   0.59, 64.92
T8’05      58.95    58.34, 0.97   -1.50, 150.45   3.34, 1.90   0.80, 73.51
T8’09      62.23    63.32, 1.06   -1.32, 132.35   3.48, 1.71   0.90, 68.84
T9’05      67.61    71.44, 1.30   -1.16, 115.83   3.70, 1.54   1.12, 60.55
T9’09      77.18    84.37, 2.00   -2.16, 215.73   4.05, 1.14   1.84, 41.89
T10’05     49.90    45.66, 0.80   -1.28, 127.79   2.99, 2.01   0.66, 75.31
T11’03     65.99    71.86, 1.65   -1.48, 148.24   3.83, 1.22   1.55, 42.60
[Figure 16 plot: CDF versus seeding time (min, logarithmic axis from 10 to 10000) for T1’03, T3’05, T7’05, T9’05, and T11’04.]
Figure 16: CDF of the seeding time in 5 traces collected between 2003 and 2005 (horizontal axis in logarithmic scale).
[Figure 17 plot, four panels: CDF versus seeding time (min, logarithmic axis from 10^1 to 10^5) for T6’05 and T6’09, T7’05 and T7’09, T8’05 and T8’09, and T9’05 and T9’09.]
Figure 17: CDF of the seeding time in 4 communities measured in 2009 (horizontal axis in logarithmic scale).
alluvion (T11’04), more than 50% of all sessions are seeding sessions. In contrast, this percentage is below 5% in the other communities, as shown in the Ratio column of Table 24.
Table 25 and Table 26 show the significance values from the GOF tests and the parameters of the fitted distributions for seeding time, respectively.
Similar to the seeding time, we find that the seeding-after-leeching time distributions differ significantly across communities of different types, as shown in Figure 18. Notably, around 10% of seeding-after-leeching sessions seed for less than one minute after leeching, which means that these peers leave the system almost immediately after finishing their downloads. We also find no significant change in the seeding-after-leeching time distributions over time within the same community, as shown in Figure 19. Furthermore, the ratio of the number of seeding-after-leeching sessions to the total number of sessions varies less across P2P communities than the ratio of seeding sessions, and it is below 20% in all measured communities, as shown in Table 27.