7.3 Simulation-Based Analysis of DTR1 and DTR2
7.3.2 Results and Analysis
7.3.2.1 Scalability.
Section 4.5 argued that, in DTR1, the CA is unlikely to be a performance bottleneck in a relatively stable P2P system (one with a low churn rate). The CA maintains a list of the peers currently in the network, which would not scale well. However, such a list could be removed by using a more complex mechanism for issuing certificates. The main role of the CA is to issue neighbor certificates to peers during churn events. Figure 7.3.1 shows the number of certificates produced by the CA in the simulation. As expected, this workload grows with the number of nodes and the churn rate.
To see whether the workload at the CA could be a performance bottleneck, the simulation execution times of DTR1 and DTR2 were recorded; the results are shown in Figure 7.3.2. The simulation of DTR1 takes longer to complete than that of DTR2, but the differences are small, even with the high workload incurred at the CA. Therefore, it can be concluded that the negative impact of the CA on the system’s performance is visible but, compared to the non-CA system (DTR2), not substantial enough for the CA to be adjudged a bottleneck.
Figure 7.3.1: The workload at the CA during the simulation. nLPs = 32, nNodes varies from 4096 to 32768, cRate varies from 0.005 to 0.01, and mPeriod = 100.
[Plot: "Number of certificates issued by CA, nLPs = 32"; x-axis: nNodes; y-axis: # certificates; series: cRate = 0.005, mPeriod = 100; cRate = 0.01, mPeriod = 100]
Figure 7.3.2: Simulation execution time of DTR1 vs DTR2. nLPs = 32, cRate = 0.01, mPeriod = 100.
[Plot: "Simulation time of DTR1 vs DTR2"; x-axis: nNodes; y-axis: Simulation time (sec); series: DTR1, DTR2]
Figure 7.3.3: The average hop count in DTR1. nLPs = 32 and nNodes varies from 4096 to 32768.
[Plot: "Hop counts in DTR1 (nLPs = 32)"; x-axis: nNodes; y-axis: Hop count; series: cRate = 0.005, mPeriod = 50; cRate = 0.005, mPeriod = 100; cRate = 0.01, mPeriod = 50; cRate = 0.01, mPeriod = 100; log16(N)]
Figure 7.3.4: The average hop count in DTR2. nLPs = 32 and nNodes varies from 4096 to 32768.
[Plot: "Hop counts in DTR2 (nLPs = 32)"; x-axis: nNodes; y-axis: Hop count; series: cRate = 0.005, mPeriod = 50; cRate = 0.005, mPeriod = 100; cRate = 0.01, mPeriod = 50; cRate = 0.01, mPeriod = 100; log16(N)]
Figure 7.3.5: Average rate of successful joins in DTR1 vs DTR2. nLPs = 32 and nNodes varies from 4096 to 32768.
[Plot: "Average ratio of successful joins in DTR1 vs DTR2 (nLPs = 32)"; x-axis: nNodes; y-axis: ratio of successful joins; series: DTR1, DTR2]
Figures 7.3.3 and 7.3.4 show the average hop count in DTR1 and DTR2. The hop count is close to log_{2^b}(N), the hop count of Pastry under static network conditions. DTR1 and DTR2 are similar regarding this metric, because the verifications occur only at the very last steps of the routing. In fact, these figures demonstrate that the hop count is mainly influenced by the churn rate. In particular, higher churn rates result in higher hop counts.
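As a rough point of reference for the hop-count discussion, the static-network value log_{2^b}(N) can be tabulated for the simulated network sizes. The sketch below assumes b = 4, which is inferred from the log16(N) reference series in the plots of Figures 7.3.3 and 7.3.4; the function name expected_hops is hypothetical and is not part of dPeerSim.

```python
import math


def expected_hops(n_nodes: int, b: int = 4) -> float:
    """Static-network Pastry reference hop count, log base 2^b of N.

    b = 4 is an assumption taken from the log16(N) series in the plots.
    log_{2^b}(N) is computed as log2(N) / b.
    """
    return math.log2(n_nodes) / b


# Network sizes used in the simulations (nNodes from 4096 to 32768).
reference = {n: expected_hops(n) for n in (4096, 8192, 16384, 32768)}

for n, hops in reference.items():
    print(f"nNodes = {n:5d}  log16(N) = {hops:.2f}")
```

Over this range the reference value only moves from 3 to 3.75 hops, so the static bound changes little across the simulated sizes; this is in line with the observation that the measured hop count is driven mainly by the churn rate.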
7.3.2.2 Robustness.
Figure 7.3.6: Average rate of query failure per completed query in DTR1 vs DTR2. nLPs = 32, nNodes varies from 4096 to 32768.
[Plot: "Average ratio of failed queries over all completed queries in DTR1 vs DTR2, nLPs = 32"; x-axis: nNodes; y-axis: ratio of completed queries; series: DTR1, DTR2]
The rates of successful joins for varying values of cRate and mPeriod are averaged, and the results are shown in Figure 7.3.5. Two observations stand out. First, unsuccessful join events always exist, as the rates of successful joins are always below 1. One explanation is that churn causes the routing towards the closest neighbors of the joining nodes to fail occasionally. Second, the rate of successful joins in DTR1 is always larger than that in DTR2, which can be explained as follows. In DTR1, a join event is always considered successful once the joining node receives a leafset from other nodes. In DTR2, a join event can fail when the joining node fails to get the correct tokens from its neighbor, which can happen under churn because the routing of the joining query ended up at the wrong neighbor. Therefore, the number of successful joins in DTR2 is smaller than that in DTR1.
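The difference between the two join rules can be made concrete with a small sketch. It is a simplification, not the dPeerSim implementation: the JoinAttempt record and its two flags are hypothetical, and it assumes that in DTR2 the correct tokens are obtained exactly when the joining query reaches the true closest neighbor.

```python
from dataclasses import dataclass


@dataclass
class JoinAttempt:
    """Hypothetical record of one join event (not a dPeerSim type)."""
    received_leafset: bool          # the joining node received a leafset
    reached_closest_neighbor: bool  # the join query ended at the true closest neighbor


def join_succeeds(attempt: JoinAttempt, scheme: str) -> bool:
    """Apply the success rule described in the text for DTR1 or DTR2."""
    if not attempt.received_leafset:
        return False
    if scheme == "DTR1":
        # DTR1: receiving a leafset is sufficient.
        return True
    if scheme == "DTR2":
        # DTR2 (assumed): tokens are correct only if routing reached
        # the true closest neighbor.
        return attempt.reached_closest_neighbor
    raise ValueError(f"unknown scheme: {scheme}")


def success_ratio(attempts: list[JoinAttempt], scheme: str) -> float:
    """Fraction of join attempts that count as successful under a scheme."""
    if not attempts:
        return 0.0
    return sum(join_succeeds(a, scheme) for a in attempts) / len(attempts)


# Illustrative (made-up) attempts: one misrouted join, one lost join.
attempts = [
    JoinAttempt(received_leafset=True, reached_closest_neighbor=True),
    JoinAttempt(received_leafset=True, reached_closest_neighbor=False),
    JoinAttempt(received_leafset=False, reached_closest_neighbor=False),
    JoinAttempt(received_leafset=True, reached_closest_neighbor=True),
]
dtr1_ratio = success_ratio(attempts, "DTR1")
dtr2_ratio = success_ratio(attempts, "DTR2")
```

Under these rules every join that succeeds in DTR2 also succeeds in DTR1, so the DTR1 ratio can never be lower than the DTR2 ratio for the same set of attempts.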
Figure 7.3.6 illustrates the rate of query failure per completed query in DTR1 and DTR2. The results are averaged over runs with varying values of cRate and mPeriod. As stated earlier, a completed query is one that arrives at a node that cannot forward it further. A successful query requires, in addition, that the node show evidence that it is indeed the root node of the search key. In other words, a query is deemed unsuccessful, or failed, in two cases:
1. The query is dropped because it is forwarded to a node which is no longer in the network.
2. The query arrives at the node that cannot present the correct certificate (in DTR1) or tokens (in DTR2).
The first case depends on the churn rate, while the second is attributed to the implementation of DTR1 and DTR2. Figure 7.3.6 demonstrates that DTR1 and DTR2 incur high but comparable rates of query failure per completed query. In both systems, of all the queries that arrive at some destination node, fewer than 50% reach the correct root node of the search key. This provides more evidence for the negative effects of churn in DTR1 and DTR2. Interestingly, the failure rate in DTR1 is always lower than in DTR2. This can be related to the higher level of successful queries in DTR1, as explained by the following example. Consider a node p_n joining the system, whose joining query arrives at a node p_d which is not the closest neighbor of p_n in the current network. p_n always gets its certificate in DTR1, but fails to get its tokens in DTR2. In both cases, p_n stays in the network and participates in the routing protocol. When a query for a key k whose root node is p_n arrives at p_n, the query will succeed in DTR1 but fail in DTR2. Therefore, the chance of a given query being unsuccessful in DTR2 is higher than that in DTR1.
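The query outcomes discussed above can be summarized in a short sketch. It reflects one reading of the metric, namely that dropped queries are excluded from the set of completed queries, so the rate in Figure 7.3.6 is taken as failed completed queries divided by all completed queries. The Outcome enum and both function names are hypothetical.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    DROPPED = "dropped"        # case 1: forwarded to a node that has left
    FAILED = "failed"          # case 2: completed, but no valid certificate/tokens
    SUCCESSFUL = "successful"  # completed, and the destination proves it is the root


def classify(dropped: bool, evidence_ok: bool) -> Outcome:
    """Classify one query from two observations about its routing."""
    if dropped:
        return Outcome.DROPPED
    return Outcome.SUCCESSFUL if evidence_ok else Outcome.FAILED


def failure_rate_per_completed(outcomes: list[Outcome]) -> float:
    """Failed completed queries over all completed queries (assumed definition)."""
    counts = Counter(outcomes)
    completed = counts[Outcome.FAILED] + counts[Outcome.SUCCESSFUL]
    if completed == 0:
        return 0.0
    return counts[Outcome.FAILED] / completed


# Illustrative (made-up) outcomes: one dropped, two failed, one successful.
outcomes = [
    classify(dropped=True, evidence_ok=False),
    classify(dropped=False, evidence_ok=False),
    classify(dropped=False, evidence_ok=True),
    classify(dropped=False, evidence_ok=False),
]
rate = failure_rate_per_completed(outcomes)
```

In the example of p_n above, the same query for key k would have evidence_ok set to True in DTR1 and False in DTR2, which is how a failed join in DTR2 later shows up as a failed query.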
7.3.2.3 Summary of the results.
The discussion of the simulation results above can be summarized as follows:
1. The CA introduces overheads to DTR1. For a typical P2P system, such overheads are not substantial enough for the CA to be considered as a performance bottleneck. However, it is clear that for systems under highly frequent churn, the CA can cause scalability issues.
2. The negative effects of churn in DTR1 and DTR2 are evident. The rates of successful joins and the rates of query failure per completed query indicate that the performance of DTR1 and DTR2 is comparable, but DTR1 seems more robust under churn.
7.4 Related Work and Discussion
This chapter presents an experimental analysis of DTR1 and DTR2 using a large-scale simulation tool called dPeerSim. Other simulation tools for studying P2P systems exist. Generic network simulators such as NS-2 [40] and p2pSim [41] are used occasionally, while bespoke simulators are found in many other P2P studies. The detailed implementation and source code of bespoke simulators are often unavailable, which makes it difficult to reproduce the results or to carry out comparative evaluations. The aim of PeerSim is to fulfill the need for a generic, easily extendable open-source P2P simulator.
Most existing simulators are limited by the total number of nodes they can simulate. Bespoke simulators achieve 100,000 nodes [17] at most, while NS-2 claims a scalability of 260,000 nodes. This limit is largely due to resource constraints at the simulation machine. PDNS is a distributed simulation platform based on NS. It supports packet-level simulation, as opposed to dPeerSim, which supports application-level simulation.
The current simulation models of DTR1 and DTR2 implemented in dPeerSim have a number of limitations. First, the detailed protocols involving the security devices (TPMs in DTR1 and TTMs in DTR2) are not implemented. In the current churn model, in which nodes fail gracefully, and with the assumption that the issuing of a certificate is an atomic operation, the actual number of messages handled by the devices could be extrapolated from the number of successful joining and leaving events. It would be interesting to implement these protocols in dPeerSim and to observe the workload at the devices when more complex churn models (Byzantine failure, for example) are considered. Second, the maintenance protocol is very simple: peers send keep-alive messages and routing tables to each other at fixed intervals. dPeerSim has been used to evaluate more complex, adaptive maintenance protocols [76]. The current protocol might account for the high rates of query failure per completed query observed in DTR1 and DTR2. Therefore, it would be interesting to investigate how other maintenance schemes help improve the performance of DTR1 and DTR2 under churn. Third, network adversaries are not included in the simulation models. In practice, an adversary could perform Denial of Service (DoS) attacks to undermine the system’s performance. Simulating the network adversary and analyzing its impact on the system is an interesting and challenging direction for future work.
As discussed in Sections 4.5 and 5.7, the current designs of DTR1 and DTR2 present a number of avenues for future work. For example, the churn model can be made more realistic by taking fail-stop or Byzantine failures into consideration. In addition, a more complex mechanism for issuing certificates in DTR1 could remove the need for the CA to maintain the list of peers. Once DTR1 and DTR2 have been redesigned, a necessary step is to evaluate their performance by experimenting with the new protocols using dPeerSim.
Regarding dPeerSim, an extension that adds support for packet-level simulation would make the tool more useful for studying P2P. From a distributed simulation perspective, it would be interesting to study the effect of optimistic synchronization on the scalability of the simulation. Finally, implementing a load balancing technique in dPeerSim could prove useful when simulating P2P systems under frequent churn, because a more balanced workload among LPs would result in a shorter simulation execution time.
CHAPTER 8
CONCLUSION AND FUTURE WORK
This concluding chapter summarizes the results presented in the previous chapters, and discusses the extent to which the goal of the thesis, which has been to investigate the reliability of trust systems for structured P2P, has been met. Section 8.1 discusses this thesis’s contributions to knowledge and highlights its limitations. Section 8.2 presents a number of research directions that may be taken in the future.
8.1 Contributions and Evaluation
Peer-to-Peer (P2P) infrastructure has received a great amount of research attention. Thanks to its scalability, structured P2P in particular has been used for designing many large-scale distributed systems. Security is one of many challenges that must be addressed before the potential of structured P2P can be fully utilized. A trust system for P2P that allows one node to assess the trustworthiness of another before interacting with it can help mitigate security problems as well as some other problems in P2P. A trust system comprises a reputation metric and a feedback mechanism, and it should be both reliable and efficient. This thesis set out to investigate how to make the current state of trust systems for P2P more reliable. Recall that the goal of this thesis is to seek answers to the following questions:
Sybil manipulations? Can they be improved? Are the feedback models used by those metrics strong enough? Can they be made more realistic?
2. Regarding the existing feedback mechanisms for structured P2P applications, is it always possible for peers to leave feedback for each other after their transactions? If not, can we design mechanisms that overcome such a limitation?
Overall, this thesis has answered both questions. To a large extent, it has demonstrated that the existing reputation metrics and feedback mechanisms can be improved to make the trust system more reliable. It has also made other contributions regarding the methods for evaluating P2P systems.