Understanding the peer sharing behavior is key to obtaining in-depth knowledge of the usage patterns and to modeling user behavior in P2P systems. We analyze the peer sharing behavior with three metrics. First, the download completion is the percentage of a file that is downloaded in a single session. Second, the seeding time is the amount of time a seeder (that is, a peer who entered the system with a complete copy of a file) stays in the system. Third, the seeding-after-leeching time is the amount of time a peer stays in the system after finishing its download.
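As a minimal sketch of how these three metrics could be computed from per-session records, consider the following. The `Session` record and all of its field names are hypothetical and are not taken from the trace formats analyzed here; the sketch only illustrates the definitions given above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """One peer session. All field names are hypothetical."""
    start: float                          # time the peer joined (seconds)
    end: float                            # time the peer left (seconds)
    bytes_at_start: int                   # bytes of the file held when joining
    bytes_at_end: int                     # bytes of the file held when leaving
    file_size: int                        # total file size in bytes
    finished_at: Optional[float] = None   # time the download completed, if it did


def download_completion(s: Session) -> float:
    """Percentage of the file downloaded within this single session."""
    return 100.0 * (s.bytes_at_end - s.bytes_at_start) / s.file_size


def seeding_time(s: Session) -> Optional[float]:
    """Session duration for a peer that joined with the complete file."""
    if s.bytes_at_start == s.file_size:
        return s.end - s.start
    return None  # not a seeding session


def seeding_after_leeching_time(s: Session) -> Optional[float]:
    """Time spent in the system after finishing the download in this session."""
    if s.bytes_at_start < s.file_size and s.finished_at is not None:
        return s.end - s.finished_at
    return None  # the peer joined as a seeder or never finished
```

Under these assumptions, a session that starts with the complete file counts only toward the seeding time, while a session that finishes its download contributes to the seeding-after-leeching time.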
Trace Exp(µ) Wbl(λ, κ) Pareto LogN(µ, σ) Gam(κ, λ)
T1’03 5214005443.99 612.27 0.34 0.68 404.67 5.72 1.71 0.05 99619224948.77
T2’05 102.37 30.96 0.48 1.16 9.24 2.52 1.62 0.32 317.12
T3’05 421.00 295.93 0.66 0.73 160.19 4.90 1.60 0.55 766.84
T5’05 6525.77 5602.84 0.82 0.43 3636.19 8.07 1.01 0.83 7836.69
T5’05(s) 1268.09 1062.80 0.78 0.47 701.28 6.32 1.29 0.73 1746.03
T5’09 1863.87 1486.82 0.69 0.44 1152.65 6.45 1.82 0.57 3241.96
T6’05 464.76 287.93 0.63 0.76 143.79 4.91 1.47 0.51 907.91
T6’09 368.31 246.24 0.67 0.64 136.23 4.80 1.36 0.56 654.21
T7’05 332.11 205.71 0.62 0.80 102.47 4.52 1.61 0.49 671.05
T7’09 401.39 222.56 0.60 0.79 109.28 4.61 1.56 0.46 864.87
T8’05 355.06 203.09 0.61 0.77 101.72 4.53 1.56 0.48 746.44
T8’09 348.33 181.91 0.59 0.79 90.43 4.39 1.61 0.44 786.58
T9’05 366.15 175.47 0.59 0.72 90.45 4.40 1.49 0.43 844.97
T9’09 531.34 200.80 0.55 0.76 98.66 4.50 1.55 0.38 1416.36
T10’05 413.45 238.85 0.61 0.79 119.99 4.67 1.61 0.47 874.30
T11’03 1550.52 1249.08 0.75 0.49 810.54 6.46 1.33 0.68 2267.88
T14’07 87.86 53.89 0.60 0.96 23.56 3.12 1.74 0.47 186.26
T15’07 976.15 651.81 0.64 0.79 325.15 5.69 1.57 0.53 1847.40
Table 14: parameters of fitting distributions for session length.
Trace Max Mean StDev Q1 Median Q3 IQR
T1’03 6,019 103 127 28 62 133 105
T3’05 24,851 317 493 68 167 373 305
T6’05 9,490 189 379 23 68 186 163
T6’09 8,484 343 758 29 98 334 305
T7’05 23,115 333 585 50 139 366 316
T7’09 102,621 1,231 3,149 119 404 1,210 1,091
T8’05 19,091 515 783 86 247 653 567
T8’09 116,042 1,313 2,396 158 531 1,390 1,232
T9’05 7,062 366 521 48 168 451 403
T9’09 18,002 1,015 1,911 113 376 1,036 923
T11’04 61,608 131 271 23 68 157 134
Table 15: Peer Download Speed Statistics (Kbps).
Trace Exponential Weibull Pareto Log-Normal Gamma
T1’03 0.473 0.582 0.471 0.591 0.000 0.000 0.479 0.608 0.468 0.579
T2’05 0.042 0.109 0.416 0.526 0.000 0.002 0.503 0.629 0.295 0.403
T3’05 0.361 0.484 0.471 0.595 0.000 0.000 0.385 0.537 0.445 0.568
T5’05 0.180 0.296 0.428 0.590 0.000 0.003 0.307 0.483 0.463 0.605
T5’05(s) 0.442 0.546 0.479 0.618 0.000 0.000 0.343 0.516 0.461 0.593
T5’09 0.275 0.419 0.475 0.610 0.000 0.000 0.442 0.589 0.440 0.572
T6’05 0.117 0.215 0.452 0.583 0.000 0.001 0.447 0.598 0.397 0.519
T6’09 0.038 0.103 0.459 0.601 0.000 0.003 0.461 0.597 0.344 0.494
T7’05 0.184 0.310 0.462 0.600 0.000 0.001 0.385 0.557 0.401 0.544
T7’09 0.074 0.157 0.484 0.593 0.000 0.003 0.472 0.600 0.394 0.516
T8’05 0.271 0.406 0.496 0.622 0.000 0.001 0.401 0.541 0.465 0.591
T8’09 0.149 0.256 0.481 0.603 0.000 0.002 0.435 0.584 0.448 0.567
T9’05 0.212 0.347 0.495 0.610 0.000 0.001 0.436 0.577 0.486 0.604
T9’09 0.113 0.212 0.491 0.599 0.000 0.002 0.464 0.606 0.400 0.522
T10’05 0.205 0.336 0.490 0.616 0.000 0.001 0.404 0.543 0.466 0.586
T11’03 0.299 0.428 0.493 0.611 0.000 0.001 0.402 0.560 0.461 0.584
Table 16: P-values from KS and AD test for download speed distributions.
Figure 10: CDF of the peer download speed in 5 traces collected between 2003 and 2005.
Figure 11: CDF of the peer download speed in 4 communities measured in 2009.
Trace Exp(µ) Wbl(λ, κ) Pareto LogN(µ, σ) Gam(κ, λ)
T1’03 102.88 99.93 0.94 0.15 87.55 4.03 1.21 0.96 106.68
T2’05 0.82 0.52 0.63 0.75 0.27 -1.45 1.61 0.51 1.62
T3’05 317.03 276.97 0.80 0.32 216.79 4.93 1.58 0.73 435.46
T5’05 18.66 13.81 0.63 0.62 9.33 1.67 2.16 0.50 37.03
T5’05(s) 168.79 151.32 0.82 0.24 127.84 4.33 1.58 0.75 225.46
T5’09 253.28 216.49 0.78 0.39 160.76 4.66 1.57 0.69 365.24
T6’05 188.69 133.84 0.66 0.67 77.31 4.08 1.75 0.54 349.10
T6’09 342.76 210.06 0.59 0.92 98.33 4.43 1.99 0.46 750.56
T7’05 333.38 254.11 0.69 0.56 165.16 4.72 1.85 0.57 582.91
T7’09 1231.29 811.31 0.62 0.79 426.68 5.83 1.91 0.49 2490.25
T8’05 515.05 423.39 0.73 0.42 314.70 5.27 1.79 0.63 820.32
T8’09 1313.12 968.77 0.67 0.64 585.86 6.05 1.80 0.55 2370.87
T9’05 365.97 290.08 0.70 0.53 196.62 4.85 1.81 0.59 620.61
T9’09 1014.64 717.72 0.65 0.71 405.07 5.74 1.78 0.53 1902.28
T10’05 372.86 289.99 0.70 0.52 197.79 4.86 1.81 0.59 636.32
T11’03 131.31 110.11 0.76 0.38 83.57 3.96 1.65 0.66 197.54
Table 17: parameters of distributions for download speed in all traces.
Figure 12: Comparison of the peer upload speed distributions in 4 traces collected in 2005 (horizontal axis in logarithmic scale).
Figure 13: Comparison of the peer upload speed distributions in 4 communities measured in 2009 (horizontal axis in logarithmic scale).
Trace Max Mean StDev Q1 Median Q3 IQR
T3’05 106,324 85 475 6 22 70 65
T6’05 13,162 41 139 3 14 40 37
T6’09 8,319 17 86 1 3 10 10
T7’05 11,539 42 129 5 18 46 40
T7’09 97,748 82 873 5 18 64 59
T8’05 46,679 53 289 4 18 54 50
T8’09 104,755 59 651 3 12 42 39
T9’05 11,744 24 103 1 7 24 23
T9’09 1,475 20 58 1 5 18 17
T11’04 15,307 85 212 12 38 92 80
Table 18: Peer Upload Speed Statistics (Kbps).
Trace Exponential Weibull Pareto Log-Normal Gamma
T3’05 0.000 0.000 0.085 0.159 0.001 0.005 0.448 0.587 0.000 0.000
T5’05 0.058 0.131 0.479 0.610 0.002 0.009 0.380 0.533 0.500 0.602
T5’05(s) 0.000 0.000 0.127 0.216 0.000 0.001 0.338 0.493 0.000 0.000
T5’09 0.095 0.181 0.488 0.607 0.001 0.004 0.417 0.577 0.405 0.543
T6’05 0.066 0.136 0.492 0.603 0.000 0.003 0.431 0.578 0.430 0.536
T6’09 0.005 0.016 0.474 0.579 0.001 0.009 0.483 0.614 0.266 0.375
T7’05 0.186 0.282 0.486 0.592 0.000 0.001 0.432 0.568 0.454 0.552
T7’09 0.015 0.046 0.490 0.601 0.001 0.004 0.435 0.582 0.327 0.432
T8’05 0.085 0.160 0.507 0.608 0.000 0.003 0.397 0.555 0.449 0.538
T8’09 0.007 0.028 0.478 0.592 0.002 0.009 0.457 0.606 0.335 0.447
T9’05 0.034 0.086 0.502 0.620 0.002 0.010 0.403 0.571 0.470 0.561
T9’09 0.029 0.074 0.490 0.608 0.001 0.007 0.441 0.592 0.401 0.516
T10’05 0.165 0.267 0.507 0.617 0.000 0.001 0.406 0.555 0.465 0.564
T11’03 0.198 0.304 0.475 0.604 0.000 0.001 0.377 0.535 0.438 0.553
T12’04 0.000 0.000 0.006 0.107 0.000 0.002 0.055 0.464 0.000 0.018
Table 19: P-values from KS and AD test for upload speed distributions.
We find that the download completion distributions differ significantly across communities of different types, as shown in Figure 14. Only 20% of the sessions in SuprNova (T1’03) download more than 50% of the file.
In contrast, more than 40% of the sessions in Filelist, transamrit, id Software, and alluvion (T3’05, T7’05, T9’05, T11’04) download more than 50% of the file, and around 20% of the sessions complete the download. Although the reason for the low download completion in SuprNova is not clear, this result suggests that multi-session downloads are prevalent in this community. We also find that the download completion distributions in some communities change significantly over time, and that the trend differs among communities. In tlm-project and id Software (T6’05, ’09 and T9’05, ’09), most sessions download much more of a file in 2009 than in 2005. In contrast, in transamrit (T7’05, ’09), most sessions download less in one session in 2009 than in 2005. The download completion distributions in unix-ag.uni-kl (T8’05, ’09) change little between 2005 and 2009, as shown in Figure 15. Statistics of the download completion in the traces analyzed in this section are shown in Table 21.
Tables 22 and 23 show the significance values from the GOF tests and the parameters of the fitted distributions for download completion, respectively.
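To make the fitting and goodness-of-fit (GOF) step concrete, the sketch below fits an exponential distribution by maximum likelihood and computes the Kolmogorov-Smirnov (KS) distance between the empirical CDF and the fitted CDF. It is an illustration only: the function names are hypothetical, only the exponential case is covered, and the conversion of the KS distance into a p-value (as reported in the tables) is not shown, since the exact test procedure is not described in this section.

```python
import math


def fit_exponential(samples):
    """Maximum-likelihood estimate of the exponential mean (the sample mean)."""
    return sum(samples) / len(samples)


def exponential_cdf(x, mu):
    """CDF of an exponential distribution with mean mu."""
    return 1.0 - math.exp(-x / mu) if x > 0 else 0.0


def ks_statistic(samples, cdf):
    """Largest vertical gap between the empirical CDF and a candidate CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Usage with made-up values (not taken from any trace):
values = [12.0, 35.0, 60.0, 80.0, 100.0]
mu = fit_exponential(values)
distance = ks_statistic(values, lambda x: exponential_cdf(x, mu))
```

A smaller distance indicates a closer fit. The same `ks_statistic` function accepts any other fitted CDF passed as a callable.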
We find that the seeding time distributions are very different in communities of different types. Most of the seeders in alluvion (T11’04) seed for several hours, while most of the seeders in id Software (T9’05) seed for only around one hour, as shown in Figure 16. We also find that the seeding time distributions in most communities change little over the years, except in id Software (T9’05, ’09), where the seeding time is considerably longer in 2009 than in 2005, as shown in Figure 17. Another noticeable finding is that the ratio of the number of seeding sessions to the total number of sessions differs greatly across communities. In Filelist (T3’05) and
Trace Exp(µ) Wbl(λ, κ) Pareto LogN(µ, σ) Gam(κ, λ)
T3’05 1516386634.83 38.81 0.31 0.94 20.61 2.84 2.03 0.05 31591083797.19
T5’05 23.14 13.99 0.55 1.16 5.69 1.56 2.37 0.41 55.80
T5’05(s) 9306810.31 62.14 0.37 0.48 50.58 3.36 1.97 0.07 138803130.51
T5’09 51.67 34.55 0.62 0.79 18.59 2.64 1.97 0.49 105.80
T6’05 40.91 25.42 0.60 0.82 13.36 2.33 1.96 0.46 88.22
T6’09 16.80 6.53 0.50 1.34 2.03 0.82 2.17 0.34 49.64
T7’05 41.99 30.29 0.67 0.53 20.26 2.61 1.74 0.55 75.77
T7’09 82.30 38.40 0.55 1.02 16.76 2.67 2.10 0.38 215.78
T8’05 52.63 32.53 0.60 0.78 17.99 2.56 2.00 0.46 114.45
T8’09 59.26 25.55 0.52 1.23 9.07 2.20 2.19 0.36 166.21
T9’05 24.05 13.20 0.54 1.24 4.80 1.53 2.25 0.40 60.32
T9’09 20.35 10.87 0.54 1.16 4.14 1.37 2.15 0.40 50.71
T10’05 54.52 38.88 0.66 0.56 25.61 2.83 1.82 0.54 101.41
T11’03 85.38 63.05 0.68 0.51 43.33 3.33 1.81 0.56 152.52
T12’04 455.16 3.53 0.44 0.73 1.52 0.46 1.40 0.14 3312.31
Table 20: parameters of fitting distributions for upload speed.
Figure 14: CDF of the download completion of traces collected between 2003 and 2005.
Figure 15: CDF of the download completion in 4 communities measured in 2005 and 2009.
Trace Mean StDev Q1 Median Q3 IQR
T1’03 24 31 2 10 34 32
T3’05 59 41 13 69 100 87
T9’05 68 38 29 91 100 71
T9’09 77 34 55 100 100 45
T6’05 41 39 4 25 87 83
T6’09 68 39 28 98 100 72
T7’05 49 41 7 39 100 93
T7’09 39 39 4 20 82 78
T11’04 66 34 38 79 97 59
T8’04 59 40 15 68 100 85
T8’09 62 40 17 82 100 83
Table 21: Download Completion Statistics (values represent download completion percentages).
Trace Exponential Weibull Pareto Log-Normal Gamma
T1’03 0.098 0.207 0.469 0.593 0.000 0.003 0.422 0.564 0.432 0.569
T2’05 0.022 0.067 0.280 0.486 0.002 0.004 0.308 0.487 0.274 0.490
T3’05 0.069 0.323 0.069 0.308 0.000 0.000 0.038 0.311 0.070 0.346
T5’05 0.285 0.460 0.369 0.525 0.000 0.000 0.218 0.368 0.401 0.560
T5’05(s) 0.094 0.244 0.146 0.300 0.000 0.000 0.036 0.153 0.122 0.262
T5’09 0.151 0.286 0.388 0.539 0.000 0.000 0.379 0.538 0.389 0.546
T6’05 0.103 0.225 0.261 0.437 0.000 0.000 0.191 0.411 0.277 0.457
T6’09 0.023 0.269 0.025 0.233 0.000 0.000 0.010 0.233 0.024 0.263
T7’05 0.113 0.272 0.173 0.379 0.000 0.000 0.079 0.370 0.194 0.418
T7’09 0.095 0.230 0.294 0.491 0.000 0.000 0.281 0.518 0.310 0.511
T8’05 0.077 0.278 0.081 0.292 0.000 0.000 0.024 0.261 0.074 0.299
T8’09 0.048 0.230 0.047 0.233 0.000 0.000 0.019 0.237 0.048 0.254
T9’05 0.025 0.219 0.034 0.212 0.000 0.000 0.008 0.174 0.029 0.220
T9’09 0.004 0.127 0.007 0.185 0.000 0.000 0.002 0.152 0.005 0.169
T10’05 0.120 0.289 0.168 0.379 0.000 0.000 0.078 0.345 0.176 0.408
T11’03 0.034 0.185 0.096 0.251 0.000 0.000 0.015 0.166 0.059 0.227
Table 22: P-values from KS and AD test for download completion distributions.
Trace Exp(µ) Wbl(λ, κ) Pareto LogN(µ, σ) Gam(κ, λ)
T1’03 24.31 18.01 0.64 0.83 9.59 1.98 1.94 0.52 46.52
T2’05 33.42 21.95 0.56 1.55 6.07 2.03 2.25 0.44 76.28
T3’05 58.65 58.37 0.99 -1.22 121.59 3.37 1.69 0.84 69.68
T5’05 32.51 29.65 0.81 -0.29 42.85 2.60 1.86 0.69 47.38
T5’05(s) 58.12 62.13 1.35 -1.22 121.80 3.61 1.36 1.24 46.93
T5’09 32.92 27.33 0.72 0.41 21.52 2.47 1.87 0.60 54.63
T6’05 41.18 34.24 0.70 -1.16 115.88 2.63 2.07 0.57 72.22
T6’09 67.84 71.70 1.29 -1.11 111.43 3.71 1.49 1.12 60.58
T7’05 48.74 43.53 0.77 -1.25 124.92 2.90 2.12 0.63 77.91
T7’09 38.58 32.15 0.72 0.20 31.56 2.61 1.94 0.59 64.92
T8’05 58.95 58.34 0.97 -1.50 150.45 3.34 1.90 0.80 73.51
T8’09 62.23 63.32 1.06 -1.32 132.35 3.48 1.71 0.90 68.84
T9’05 67.61 71.44 1.30 -1.16 115.83 3.70 1.54 1.12 60.55
T9’09 77.18 84.37 2.00 -2.16 215.73 4.05 1.14 1.84 41.89
T10’05 49.90 45.66 0.80 -1.28 127.79 2.99 2.01 0.66 75.31
T11’03 65.99 71.86 1.65 -1.48 148.24 3.83 1.22 1.55 42.60
Table 23: parameters of fitting distributions for download completion.
Figure 16: CDF of the seeding time in 5 traces collected between 2003 and 2005 (horizontal axis in logarithmic scale).
Figure 17: CDF of the seeding time in 4 communities measured in 2009 (horizontal axis in logarithmic scale).
alluvion (T11’04), more than 50% of all sessions are seeding sessions. In contrast, this percentage is less than 5% in the other communities, as shown in the Ratio column of Table 24.
Tables 25 and 26 show the significance values from the GOF tests and the parameters of the fitted distributions for seeding time, respectively.
Similar to the seeding time, we find that the seeding-after-leeching time distributions differ significantly in communities of different types, as shown in Figure 18. Noticeably, the seeding-after-leeching time of around 10% of seeding-after-leeching sessions is shorter than one minute, which means that these peers leave the system almost immediately after finishing their downloads. We also find that there is no significant change of the seeding-after-leeching time distributions over time within the same communities, as shown in Figure 19.
Furthermore, the ratio of the number of seeding-after-leeching sessions to the total number of sessions varies less across P2P communities than the ratio of seeding sessions does, and it is below 20% in all measured communities, as shown in Table 27.
Tables 28 and 29 show the significance values from the GOF tests and the parameters of the fitted distributions
Trace Ratio (%) Mean StDev Q1 Median Q3 IQR
T1’03 2.6 560 1,316 74 251 659 584
T3’05 61.6 392 950 40 113 400 360
T6’05 1.1 803 2,170 60 180 580 520
T6’09 1.9 534 1,551 50 110 380 330
T7’05 1.56 902 2,832 55 175 675 620
T7’09 2.0 1,145 3,187 45 120 714 669
T8’09 0.6 1,150 3,801 65 150 585 520
T9’05 2.7 332 972 30 70 191 161
T9’09 0.6 952 2,708 45 135 495 450
T11’04 79.0 1,425 2,765 240 600 1,380 1,140
Table 24: Seeding Time Statistics (minutes).
Trace Exponential Weibull Pareto Log-Normal Gamma
T1’03 0.199 0.309 0.489 0.596 0.000 0.001 0.428 0.577 0.484 0.561
T2’05 0.027 0.083 0.436 0.555 0.001 0.005 0.437 0.589 0.376 0.484
T3’05 0.055 0.136 0.451 0.560 0.000 0.001 0.483 0.617 0.326 0.445
T5’05 0.000 0.575 0.002 0.708 0.000 0.016 0.001 0.770 0.001 0.754
T5’05(s) 0.266 0.478 0.423 0.589 0.000 0.006 0.443 0.665 0.365 0.530
T5’09 0.057 0.723 0.097 0.726 0.000 0.034 0.030 0.569 0.095 0.725
T6’05 0.012 0.076 0.369 0.528 0.000 0.021 0.448 0.633 0.220 0.361
T6’09 0.009 0.103 0.277 0.503 0.000 0.020 0.421 0.637 0.169 0.351
T7’05 0.009 0.053 0.401 0.546 0.000 0.011 0.486 0.651 0.243 0.384
T7’09 0.001 0.029 0.184 0.439 0.000 0.014 0.299 0.559 0.104 0.265
T8’05 0.011 0.054 0.350 0.502 0.000 0.009 0.462 0.622 0.221 0.343
T8’09 0.001 0.019 0.214 0.411 0.000 0.006 0.362 0.582 0.105 0.236
T9’05 0.006 0.064 0.249 0.480 0.000 0.009 0.391 0.625 0.123 0.315
T9’09 0.001 0.018 0.312 0.506 0.000 0.017 0.450 0.629 0.144 0.298
T10’05 0.009 0.071 0.405 0.589 0.001 0.019 0.448 0.647 0.244 0.431
T11’03 0.183 0.360 0.398 0.571 0.000 0.003 0.441 0.644 0.322 0.486
T12’04 0.071 0.187 0.306 0.480 0.002 0.011 0.182 0.330 0.336 0.526
T13’03 0.000 0.015 0.455 0.625 0.000 0.000 0.069 0.135 0.336 0.434
T13’04 0.247 0.399 0.369 0.485 0.000 0.000 0.367 0.517 0.349 0.463
Table 25: P-values from KS and AD test for seeding time distributions.
Trace Exp(µ) Wbl(λ, κ) Pareto LogN(µ, σ) Gam(κ, λ)
T1’03 560.55 425.69 0.70 0.50 292.02 5.29 1.62 0.59 945.20
T2’05 211.93 129.59 0.58 1.16 47.40 3.94 1.87 0.46 465.51
T3’05 392.04 260.28 0.64 0.81 130.34 4.75 1.64 0.52 757.69
T5’05 1964.15 2234.00 2.06 -0.51 2841.21 7.48 0.43 4.85 405.15
T5’05(s) 1105.13 969.84 0.82 0.38 689.37 6.26 1.23 0.79 1396.34
T5’09 3205.32 3361.03 1.15 -0.18 3769.78 7.60 1.24 1.21 2651.78
T6’05 802.84 448.84 0.58 1.00 181.03 5.25 1.66 0.45 1783.69
T6’09 533.60 307.82 0.61 0.88 137.00 4.94 1.52 0.48 1120.17
T7’05 902.27 464.42 0.56 1.06 179.49 5.25 1.78 0.42 2154.25
T7’09 1145.25 474.20 0.50 1.37 128.58 5.18 1.88 0.36 3182.83
T8’05 1333.92 715.25 0.57 1.05 275.66 5.70 1.70 0.43 3068.43
T8’09 1150.28 504.58 0.53 1.12 173.99 5.33 1.70 0.39 2985.44
T9’05 332.04 177.63 0.58 0.89 77.99 4.36 1.59 0.45 743.47
T9’09 951.56 407.26 0.51 1.26 127.29 5.04 1.89 0.37 2587.97
T10’05 700.27 376.69 0.56 1.15 135.35 5.00 1.86 0.42 1663.78
T11’03 1426.24 1164.35 0.76 0.47 770.72 6.40 1.31 0.70 2033.16
T12’04 3165.14 2060.99 0.53 1.06 1039.14 6.41 2.78 0.40 7931.95
T13’03 1619.29 1774.43 3.67 -1.24 3092.07 7.31 0.60 6.16 262.72
T13’04 17424.71 16133.51 0.87 0.29 12720.26 9.06 1.26 0.84 20826.17
Table 26: parameters of fitting distributions for seeding time.
Trace Ratio (%) Mean StDev Q1 Median Q3 IQR
T1’03 14.8 312 1,174 21 103 296 275
T3’05 18.0 517 1,054 58 219 568 510
T6’05 7.4 345 823 20 100 360 340
T6’09 2.8 277 768 20 60 210 190
T7’05 11.3 333 693 25 120 380 355
T7’09 2.8 417 1,271 35 110 380 345
T8’05 12.7 351 1,017 25 100 360 335
T8’09 1.9 393 1,741 25 70 245 220
T9’05 13.4 182 424 10 55 165 155
T9’09 3.5 184 826 5 40 100 95
T11’04 12.7 1,823 3,495 240 660 1,800 1,560
Table 27: Seeding-after-Leeching Time Statistics (minutes).
Trace Exponential Weibull Pareto Log-Normal Gamma
T1’03 0.000 0.000 0.059 0.101 0.001 0.003 0.420 0.577 0.000 0.000
T2’05 0.384 0.507 0.498 0.614 0.000 0.000 0.412 0.579 0.487 0.599
T3’05 0.213 0.332 0.493 0.590 0.000 0.001 0.456 0.602 0.431 0.526
T5’05 0.000 0.521 0.002 0.660 0.000 0.015 0.001 0.732 0.002 0.724
T5’05(s) 0.141 0.420 0.369 0.602 0.000 0.004 0.443 0.664 0.320 0.565
T5’09 0.039 0.174 0.344 0.547 0.000 0.019 0.349 0.608 0.270 0.470
T6’05 0.113 0.319 0.447 0.589 0.000 0.012 0.458 0.654 0.356 0.505
T6’09 0.026 0.195 0.263 0.534 0.000 0.026 0.400 0.648 0.176 0.392
T7’05 0.164 0.334 0.485 0.621 0.000 0.004 0.438 0.627 0.427 0.553
T7’09 0.051 0.170 0.412 0.553 0.000 0.005 0.487 0.651 0.293 0.429
T8’05 0.084 0.226 0.454 0.587 0.000 0.006 0.471 0.649 0.379 0.496
T8’09 0.005 0.052 0.330 0.509 0.000 0.012 0.449 0.632 0.173 0.306
T9’05 0.076 0.261 0.416 0.600 0.000 0.009 0.425 0.646 0.319 0.507
T9’09 0.008 0.089 0.269 0.507 0.000 0.023 0.426 0.645 0.145 0.317
T10’05 0.138 0.299 0.470 0.605 0.000 0.006 0.466 0.636 0.400 0.536
T11’03 0.149 0.343 0.397 0.565 0.000 0.002 0.476 0.655 0.312 0.492
Table 28: P-values from KS and AD test for seeding-after-leeching time distributions.
for seeding-after-leeching time, respectively.
6 Identifying Peers and Sessions
Identifying peers and sessions in P2P traces is an important analysis step. Until now, no empirical study has examined how different ways of identifying peers and sessions in BitTorrent traces could influence the analysis results. In this section, we investigate how two parametric methods, one for identifying peers and one for identifying sessions in BitTorrent traces, affect the analysis results. To this end, we use a subset of trace T1’03 and compare the following analysis results: the peer arrival rate, the session length, and the peer download speed (service capacity).
6.1 Peer Identification
A recent study [18] of a residential broadband network shows that users’ IP addresses are re-assigned very frequently: 50% of the IP addresses are re-assigned at least twice, and some even more than 10 times, in 24 hours. This means that the same IP address can be assigned to multiple peers over time, and that using the IP address as a permanent identifier for peers in P2P systems may lead to inaccurate analysis results. To examine how IP reassignment may influence the analysis results, we adopt the following simple approach when
Figure 18: CDF of the seeding-after-leeching time in 5 traces collected between 2003 and 2005 (horizontal axis in logarithmic scale).
Figure 19: CDF of the seeding-after-leeching time in 4 communities measured in 2009 (horizontal axis in logarithmic scale).
Trace Exp(µ) Wbl(λ, κ) Pareto LogN(µ, σ) Gam(κ, λ)
T1’03 70042582.59 201.34 0.34 0.65 128.72 4.58 1.65 0.06 1103937114.54
T2’05 291.67 263.22 0.83 0.26 218.02 4.90 1.44 0.77 379.15
T3’05 540.02 422.78 0.72 0.52 277.93 5.31 1.51 0.63 860.04
T5’05 1940.11 2205.87 2.07 -0.49 2769.65 7.47 0.43 5.00 388.18
T5’05(s) 1854.76 1585.91 0.78 0.50 1043.46 6.68 1.40 0.71 2609.84
T5’09 1338.23 932.23 0.64 0.99 397.30 6.00 1.67 0.52 2549.41
T6’05 396.60 295.03 0.70 0.61 175.60 4.96 1.45 0.60 657.33
T6’09 314.89 204.22 0.65 0.76 101.67 4.58 1.42 0.54 588.02
T7’05 370.61 285.66 0.71 0.55 183.02 4.90 1.55 0.61 607.57
T7’09 449.40 293.82 0.65 0.72 156.08 4.91 1.54 0.53 852.78
T8’05 392.58 272.42 0.66 0.70 151.07 4.82 1.60 0.54 721.13
T8’09 439.08 222.45 0.58 0.93 95.05 4.57 1.62 0.43 1020.91
T9’05 218.09 157.29 0.68 0.69 86.97 4.30 1.52 0.57 381.06
T9’09 222.11 122.11 0.61 0.76 60.69 4.04 1.48 0.47 472.12
T10’05 408.48 303.95 0.69 0.60 184.53 4.96 1.55 0.59 694.37
T11’03 2013.22 1629.00 0.75 0.51 1041.90 6.72 1.36 0.68 2958.60
Table 29: parameters of fitting distributions for seeding-after-leeching time.
Interval Max Mean StDev Q1 Median Q3 IQR
1 hour 2,764 731 467 399 648 906 507
5 hours 2,363 260 219 131 200 307 176
10 hours 2,328 180 183 86 130 203 117
1 day 2,286 129 156 54 88 140 86
1 week 2,269 112 149 44 70 116 72
Table 30: Hourly Peer Arrival Rate Statistics with Various Peer Identification Intervals (number of joining peers per hour).
identifying peers by their IP addresses. If the time interval between two consecutive observations of the same IP address is longer than what we call the peer identification interval, we assume that these two observations correspond to different peers. We then analyze the peer arrival rate, the session length, and the download speed of this trace. We use different peer identification intervals ranging from one hour to a few weeks, and then compare the analysis results of the traces derived with different intervals. Although this approach only considers possible IP reassignment during long observation gaps and omits reassignments that could happen at any other time, we believe that it is enough to show the effects of IP reassignment on the analysis results.
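The identification rule described above can be sketched as follows. This is a minimal illustration, assuming that the trace is available as a list of `(timestamp, ip)` observations with timestamps in seconds; the function name and the `(ip, k)` peer labels are hypothetical.

```python
from collections import defaultdict


def identify_peers(observations, interval):
    """Label each (timestamp, ip) observation with a peer identifier.

    Two consecutive observations of the same IP address are attributed to
    the same peer only if they are at most `interval` seconds apart.
    Returns (timestamp, ip, peer_id) tuples sorted by timestamp, where
    peer_id is the hypothetical label (ip, k) for the k-th peer on that IP.
    """
    last_seen = {}               # ip -> timestamp of its previous observation
    counter = defaultdict(int)   # ip -> index of the current peer on that ip
    labelled = []
    for ts, ip in sorted(observations):
        if ip in last_seen and ts - last_seen[ip] > interval:
            counter[ip] += 1     # gap exceeds the interval: assume a new peer
        last_seen[ip] = ts
        labelled.append((ts, ip, (ip, counter[ip])))
    return labelled


# Usage with made-up observations and a one-hour identification interval:
obs = [(0, "10.0.0.1"), (1800, "10.0.0.1"), (9000, "10.0.0.1"), (100, "10.0.0.2")]
labelled = identify_peers(obs, interval=3600)
```

With the one-hour interval, the observation at 9000 seconds is attributed to a second peer on `10.0.0.1` because it follows a two-hour gap. With a one-week interval, all three observations of that address map to a single peer.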
We find that the peer identification interval has a significant impact on the distributions of the peer arrival rate and the session length: a smaller peer identification interval leads to higher peer arrival rates and shorter session lengths, as shown in Figures 20 and 21, respectively. The reason for this is that with a small interval, a series of observations of the same IP address will be identified as corresponding to more peers than with a large interval. Consequently, the sessions identified with a small interval are shorter than those identified with a large interval. In addition, we find that the peer identification interval also affects the download speed in the derived trace. Using small peer identification intervals leads to a higher download speed: the average peer download speed in the derived traces decreases from 108 Kbps to 71 Kbps as the peer identification interval is increased from 1 hour to 1 week, as shown in Figure 22. The reason for this is that when calculating the download speed with small intervals, the observation gaps are larger than with large peer identification intervals. Tables 30, 31, and 32 provide, respectively, the statistics of the peer arrival rate, session length, and download speed of traces derived with various peer identification intervals.
Tables 33 to 36 show the significance values from the GOF tests and the parameters of the fitted distributions for the peer arrival rate (Tables 33 and 34) and the session length (Tables 35 and 36) resulting from different peer identification intervals.
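The summary columns used in the statistics tables (Max, Mean, StDev, Q1, Median, Q3, IQR) could be computed as in the sketch below. The quartile method is an assumption: Python's `statistics.quantiles` defaults to the "exclusive" method, which may differ from the one used to produce the tables, and the function name is hypothetical.

```python
import statistics


def summarize(values):
    """Summary statistics matching the columns of the statistics tables."""
    # Assumption: quartiles use the default "exclusive" interpolation method.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,                      # interquartile range
    }


# Usage with made-up hourly arrival counts (not taken from any trace):
stats = summarize([1, 2, 3, 4, 5, 6, 7])
```

Applying `summarize` to the arrival rates, session lengths, or download speeds derived with each identification interval yields one table row per interval.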
[Figure 20 plot: CDF (0 to 1) against peer arrival rate (10 to 10000); one curve per peer identification interval: 1 hour, 5 hours, 10 hours, 1 day, 1 week.]
Figure 20: CDF of peer arrival rate for various peer identification intervals (horizontal axis in logarithmic scale).
[Figure 21 plot: CDF (0 to 1) against session length (1 to 10000); one curve per peer identification interval: 1 hour, 5 hours, 10 hours, 1 day, 1 week.]
Figure 21: CDF of session length for various peer identification intervals (horizontal axis in logarithmic scale).
Interval Max Mean StDev Q1 Median Q3 IQR
1 hour 48,982 106 626 41 56 59 18
5 hours 48,982 352 1028 141 276 298 157
10 hours 48,982 519 1226 124 441 595 470
1 day 49,424 716 1475 98 426 1058 960
1 week 49,863 863 1876 86 378 965 879
Table 31: Session Length Statistics with Various Peer Identification Intervals (minutes).
Interval Max Mean StDev Q1 Median Q3 IQR
1 hour 3,717 108 119 34 72 138 104
5 hours 3,664 90 106 27 57 113 86
10 hours 3,664 82 101 25 51 102 77
1 day 3,664 74 96 23 46 90 68
1 week 3,664 71 96 22 45 86 64
Table 32: Peer Download Speed Statistics with Various Peer Identification Intervals (Kbps).
[Figure 22 plot: CDF (0 to 1) against download speed in kbps (1 to 10000); one curve per peer identification interval: 1 hour, 5 hours, 10 hours, 1 day, 1 week.]
Figure 22: CDF of download speed for various peer identification intervals (horizontal axis in logarithmic scale).
Interval Exponential Weibull Pareto Log-Normal Gamma
1h 0.083 0.235 0.357 0.566 0.000 0.000 0.294 0.526 0.367 0.583
5h 0.163 0.364 0.333 0.537 0.000 0.000 0.361 0.573 0.371 0.579
10h 0.200 0.420 0.298 0.503 0.000 0.000 0.349 0.571 0.337 0.543
24h 0.250 0.469 0.284 0.491 0.000 0.000 0.355 0.580 0.322 0.521
168h 0.235 0.452 0.249 0.462 0.000 0.000 0.385 0.594 0.250 0.468
Table 33: p-values of KS and AD test for arrival rates with different peer identification intervals.
Interval Exp(µ) Wbl(κ, λ) Pareto LogN(µ, σ) Gam(κ, λ)
1h 731.29 818.23 1.64 -0.33 948.04 6.37 0.76 2.36 310.38
5h 259.91 284.50 1.33 -0.07 278.16 5.26 0.84 1.81 143.64
10h 180.43 192.85 1.19 0.01 179.37 4.84 0.90 1.54 116.93
24h 128.80 132.99 1.07 0.08 117.61 4.42 0.98 1.29 99.52
168h 112.37 113.58 1.02 0.14 95.87 4.26 0.96 1.21 92.48
Table 34: parameters of distributions for arrival rates with different peer identification intervals.
Interval Exponential Weibull Pareto Log-Normal Gamma
1h 0.000 0.000 0.000 0.008 0.000 0.000 0.003 0.097 0.000 0.000
5h 0.000 0.000 0.009 0.042 0.000 0.000 0.033 0.217 0.000 0.000
10h 0.000 0.000 0.043 0.095 0.000 0.000 0.119 0.343 0.000 0.000
24h 0.000 0.000 0.118 0.201 0.000 0.001 0.293 0.479 0.000 0.000
1w 0.000 0.000 0.181 0.283 0.000 0.002 0.391 0.557 0.000 0.000
Table 35: p-values of KS and AD test for session length with different peer identification intervals.
Interval Exp(µ) Wbl(κ, λ) Pareto LogN(µ, σ) Gam(κ, λ)
1h 1284223.71 70.22 0.48 0.30 54.26 3.88 0.85 0.08 15712186.48
5h 3617197.83 292.36 0.47 0.19 259.44 5.25 1.13 0.08 42969895.97
10h 5210676.38 423.30 0.45 0.18 400.59 5.53 1.37 0.08 62362001.43
24h 7314544.64 532.07 0.44 0.29 496.98 5.65 1.61 0.08 89305559.32
1w 8442285.33 553.43 0.43 0.61 387.07 5.62 1.70 0.08 104620683.43
Table 36: parameters of distributions for session length with different peer identification intervals.
Interval Max Mean StDev Q1 Median Q3 IQR
10 minutes 3,583 446 405 135 338 624 488
30 minutes 3,524 251 317 33 141 349 316
1 hour 3,565 163 256 30 75 212 183
10 hours 2,188 106 169 24 54 119 95
1 day 2,189 102 164 24 51 112 88
Table 37: Hourly Peer Arrival Rate Statistics with Various Session Identification Intervals (number of joining peers per hour).
Interval Max Mean StDev Q1 Median Q3 IQR
10 minutes 4,144 70 153 7 17 61 54
30 minutes 6,962 157 270 22 56 162 140
1 hour 10,995 277 387 44 122 361 318
10 hours 11,430 563 736 81 315 772 691
1 day 23,757 637 866 87 355 881 793
Table 38: Session Length Statistics with Various Session Identification Intervals (minutes).