Figure 3.11: The average bandwidth gain as a function of the size of the borrowers set. (Plot: average bandwidth gain [kbps] versus borrowers set size.)
model derived from a real-world trace.
Figure 3.11 shows the relationship between the average bandwidth gain, computed in the same way as in Section 3.5.3, and the size of the borrowers set. Here the number of randomly selected borrowers is set to the smallest integer greater than or equal to one tenth of the borrowers set size. The experimental evaluation confirms the analytical result that the gain converges quickly as the borrowers set size grows.
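The borrower-selection rule above is an integer ceiling of one tenth of the set size. The following minimal Python sketch makes it explicit, assuming borrowers are identified by arbitrary hashable IDs; the function names are hypothetical and are not part of the ATFT specification.

```python
import random


def num_random_borrowers(borrowers_set_size: int) -> int:
    """Smallest integer >= borrowers_set_size / 10 (integer ceiling, no floats)."""
    if borrowers_set_size < 0:
        raise ValueError("set size must be non-negative")
    return (borrowers_set_size + 9) // 10


def pick_random_borrowers(borrowers, rng: random.Random):
    """Hypothetical helper: draw ceil(|borrowers| / 10) distinct borrowers uniformly."""
    pool = list(borrowers)  # random.sample() needs a sequence, not a set
    return rng.sample(pool, num_random_borrowers(len(pool)))


# Example set sizes and the resulting number of random borrowers.
for size in (500, 1000, 1500, 2000):
    print(size, num_random_borrowers(size))  # 50, 100, 150, 200 borrowers
```

Using `(n + 9) // 10` instead of `math.ceil(n / 10)` keeps the computation in exact integer arithmetic.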
3.6 Summary
In this chapter we have proposed to replace the traditional incentive model of P2P networks, which is based on content exchange, with a novel incentive model based on bandwidth exchange. Bandwidth as a resource has a universal value for all the peers, unlike content, which is relevant only to the group of peers interested in it. Therefore, the mechanisms for keeping track of the bandwidth exchanged between peers can be simpler than the mechanisms for maintaining the content contributions of peers across multiple downloads. As a proof of the bandwidth exchange concept, we have designed the ATFT protocol based on mechanisms employed by BitTorrent TFT for content exchange. ATFT offers an elegant solution to the problems encountered in BitTorrent, including bootstrapping of newcomers, providing seeding incentive, efficient support of asymmetric links, and anonymity. We have formally proven that the ATFT protocol provides incentives for contributing bandwidth, we have discussed the selection of protocol parameters, and we have evaluated ATFT using a trace of a real-world P2P community.
Chapter 4
Optimizing peer relationships in a super-peer network
Super-peer architectures exploit the heterogeneity of nodes in a P2P network by assigning additional responsibilities to higher-capacity nodes. The most common use of super-peer architectures is content search in file-sharing P2P networks. In the design of a super-peer network for file sharing, several issues have to be addressed: how client (ordinary, weak) peers are related to super-peers, how super-peers locate files, how the load is balanced among the super-peers, and how the system deals with node failures. These issues are difficult to resolve in current super-peer networks, which assume a fixed topology connecting client peers to randomly selected super-peers.
Addressing the limitations of current super-peer networks based on a fixed topology, we propose a Self-Organizing Super-Peer Network architecture (SOSPNET). SOSPNET maintains a super-peer network topology that dynamically adapts to the semantic similarity of peers sharing content interests. Super-peers maintain semantic caches of pointers to files that are requested by peers with similar interests. Client peers, on the other hand, dynamically select the super-peers offering the best search performance. We show how this simple approach can be employed not only to optimize searching, but also to solve problems that are generally difficult in P2P architectures, such as load balancing and fault tolerance. The evaluation results indicate that SOSPNET achieves close-to-optimal file search performance, quickly adjusts to changes in the environment, survives even catastrophic node failures, and efficiently distributes the system load.
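The client-side half of this idea, selecting super-peers by observed search performance, can be sketched in a few lines of Python. This is a hypothetical illustration, not the SOSPNET algorithm: it assumes a client scores super-peers only by how many of its requests each one satisfied, and falls back to randomly chosen known super-peers when it has too little history, as a newly joined peer would.

```python
import random
from collections import Counter


class ClientPeer:
    """Hypothetical weak-peer state: rank super-peers by satisfied requests."""

    def __init__(self, known_super_peers, rng=None):
        self.known = list(known_super_peers)
        self.hits = Counter()  # super-peer -> number of satisfied requests
        self.rng = rng or random.Random()

    def record(self, super_peer, satisfied):
        # Only successful searches improve a super-peer's score.
        if satisfied:
            self.hits[super_peer] += 1

    def choose(self, k):
        # Best-performing super-peers first (ties keep first-seen order) ...
        ranked = [sp for sp, _ in self.hits.most_common(k)]
        # ... then fill any remaining slots with random, not-yet-chosen ones.
        others = [sp for sp in self.known if sp not in ranked]
        self.rng.shuffle(others)
        return ranked + others[: k - len(ranked)]


peer = ClientPeer(["s1", "s2", "s3", "s4"], random.Random(0))
peer.record("s3", True)
peer.record("s3", True)
peer.record("s1", True)
peer.record("s2", False)
print(peer.choose(2))  # ['s3', 's1']
```

The random fallback reflects the requirement that a newcomer needs no prior knowledge; as its history grows, the ranked part of the selection takes over.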
The remainder of this chapter is organized as follows. Section 4.1 specifies the problem domain addressed by the design of SOSPNET. Section 4.2 positions our design in the context of related work, while Section 4.3 describes the architecture of SOSPNET in detail. Section 4.4 introduces a model of P2P networks with semantic relationships between peers and files based on real-world traces. This model is further used in Section 4.5 to evaluate the performance of SOSPNET. Section 4.6 summarizes this chapter.
4.1 Organizing peer relationships
The vast majority of mechanisms for optimizing different performance aspects of P2P networks rely in one way or another on organizing the relationships between peers. These relationships are organized by defining, for each peer, the set of other peers it interacts with, called its neighbors.
In symmetric P2P networks such as Gnutella [116] and Freenet [39], any two peers are potential neighbors. In hybrid approaches such as Napster [10], all peers have a single neighbor: a central server that keeps information on all peers and responds to requests for that information. In super-peer networks [145] such as Kazaa [83], Gnutella ultrapeers [123], and Chord super-peers [97], neighbors are selected from the set of high-capacity peers called super-peers; low-capacity peers (the client peers) cannot become neighbors.
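The three ways of defining neighbor sets can be contrasted with a small Python sketch. It is a simplification under stated assumptions: peers are plain string IDs, `capacity` is a hypothetical per-peer capacity map, and `threshold` is a hypothetical cut-off separating super-peers from client peers; real systems use richer criteria.

```python
def potential_neighbors(peer, peers, model, capacity, threshold, server="server"):
    """Return the set of peers that `peer` may choose as neighbors.

    `capacity` (peer -> capacity) and `threshold` are hypothetical; they
    only serve to split peers into super-peers and client peers.
    """
    others = {p for p in peers if p != peer}
    if model == "symmetric":  # e.g., Gnutella, Freenet: any two peers
        return others
    if model == "hybrid":  # e.g., Napster: a single central server
        return {server}
    if model == "super-peer":  # e.g., Kazaa: only high-capacity peers
        return {p for p in others if capacity[p] >= threshold}
    raise ValueError(f"unknown model: {model}")


peers = ["a", "b", "c", "d"]
cap = {"a": 100, "b": 5000, "c": 200, "d": 8000}
print(sorted(potential_neighbors("a", peers, "super-peer", cap, 1000)))  # ['b', 'd']
```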
In this chapter we aim at solving the problems of the existing super-peer networks related to the issue of establishing relationships between peers. Before presenting our approach, we identify the weak points of existing super-peer architectures. Each of the popular super-peer protocols proposed in the literature, including Kazaa, Gnutella ultrapeers, and Chord super-peers, makes at least one of the following three assumptions:
1. Every peer is assigned to a fixed, very small number (usually one) of super-peers. Consequently, super-peers become bottlenecks in terms of fault tolerance. Restoring the system structures such as routing tables back to a consistent state after a super-peer crash requires a considerable effort.
2. Peers are assigned to super-peers randomly and statically. The randomness of the assignment is explicit (as in Gnutella) or implicit (as in Chord, where the super-peer selection is based on peer identifiers, which are selected randomly). This static assignment does not adapt to changes in the network structure or in peer characteristics (e.g., content interests).
3. The peer-to-super-peer assignment has the so-called all-or-nothing property. When a peer connects to a super-peer, the latter takes responsibility for all the content stored at the peer. Such an assignment does not take into account the possible diversity of the peer's interests, and makes balancing the load among the super-peers difficult.
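To make the consequences of these assumptions concrete, the following Python sketch models a hypothetical static, random, all-or-nothing assignment. It does not reproduce any particular protocol; the function names and the per-client file counts are illustrative assumptions. It shows that a single super-peer crash orphans every client bound to it, and that a client with a large collection loads its one super-peer with its entire index.

```python
import random


def assign_statically(clients, super_peers, rng):
    """Assumptions 1-3 (hypothetical model): each client is bound to exactly
    one randomly chosen super-peer, which indexes all of that client's content."""
    return {c: rng.choice(super_peers) for c in clients}


def orphaned_by_crash(assignment, crashed):
    """Clients that lose their only super-peer (and index) when it fails."""
    return sorted(c for c, sp in assignment.items() if sp == crashed)


def load_per_super_peer(assignment, files_per_client):
    """Index load per super-peer: every file of a client lands on one super-peer."""
    load = {}
    for client, sp in assignment.items():
        load[sp] = load.get(sp, 0) + files_per_client[client]
    return load


fixed = {"c1": "s1", "c2": "s1", "c3": "s2"}
print(orphaned_by_crash(fixed, "s1"))  # ['c1', 'c2']
print(load_per_super_peer(fixed, {"c1": 10, "c2": 300, "c3": 5}))  # {'s1': 310, 's2': 5}
```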
In the rest of this chapter we show how to overcome all these limitations by introducing our self-organizing super-peer architecture, SOSPNET. The design of SOSPNET is guided by the following requirements. First, SOSPNET should be self-organizing in that it is able to discover and exploit the semantic structure present in the network regardless of the initial topology. Second, a new peer joining the network should not need to have any knowledge about the system; the longer a peer stays in the system, the more information it can collect and exploit for improving the performance of its searches. Third, the time it takes a new peer to achieve its optimal performance should be minimized.
SOSPNET uses two-level semantic caches deployed at both the super-peer and the weak-peer level to maintain relationships between related peers and files. The cache maintained by a super-peer contains references to the files that were recently requested by its weak peers, while the cache of a weak peer stores references to the super-peers that satisfied most of its requests. We propose a novel mixed caching policy that combines the advantages of the traditional least-frequently used (LFU) and least-recently used (LRU) policies to improve the cache hit rates for less popular files. Furthermore, SOSPNET incorporates in its design a mechanism for balancing the load among super-peers. Load balancing is fully integrated with the content search algorithm and requires neither additional information exchange between super-peers nor a separate, external control component. The load-balancing decisions are made independently by individual super-peers based on local information.
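This passage does not specify how SOSPNET's mixed policy combines LFU and LRU, so the following Python sketch is a hypothetical hybrid, chosen only to show one way the two can be combined in a super-peer's file cache. Entries map a file ID to a pointer (the peer holding the file). On eviction, the cache considers only the `window` least-recently used entries and removes the least frequently used among them; `window` is an assumed parameter.

```python
from collections import OrderedDict


class MixedCache:
    """Hypothetical LRU/LFU hybrid: evict the least-frequently used entry
    among the `window` least-recently used ones."""

    def __init__(self, capacity, window=2):
        if capacity < 1 or window < 1:
            raise ValueError("capacity and window must be positive")
        self.capacity = capacity
        self.window = window
        self.entries = OrderedDict()  # file id -> holder peer, oldest first
        self.freq = {}  # file id -> access count

    def lookup(self, file_id):
        if file_id not in self.entries:
            return None
        self.entries.move_to_end(file_id)  # now the most recently used
        self.freq[file_id] += 1
        return self.entries[file_id]

    def insert(self, file_id, holder):
        if file_id in self.entries:
            self.entries[file_id] = holder
            self.entries.move_to_end(file_id)
            self.freq[file_id] += 1
            return
        if len(self.entries) >= self.capacity:
            candidates = list(self.entries)[: self.window]  # LRU side
            # LFU side; min() returns the first (oldest) entry on ties.
            victim = min(candidates, key=lambda k: self.freq[k])
            del self.entries[victim]
            del self.freq[victim]
        self.entries[file_id] = holder
        self.freq[file_id] = 1


cache = MixedCache(capacity=3, window=2)
for f in "abc":
    cache.insert(f, "peer-" + f)
cache.lookup("a")
cache.lookup("a")
cache.insert("d", "peer-d")  # evicts b (least frequent among b, c)
cache.lookup("c")
cache.insert("e", "peer-e")  # evicts d, keeping the frequently used a
print(list(cache.entries))  # ['a', 'c', 'e']
```

With `window=1` the policy degenerates to pure LRU, and with `window=capacity` to LFU with LRU tie-breaking; intermediate values trade off between the two.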
4.2 Related work
Several protocols have been proposed to exploit super-peers [96, 101, 144, 145]; they are described in more detail in Section 1.4. None of the super-peer protocols proposed to date is capable of optimizing relationships between peers by taking into account their content interests as deduced from their (possibly changing) behavior. In contrast to existing super-peer networks, in the SOSPNET architecture described in this chapter the relationships between peers are discovered, maintained, and exploited automatically, without any need for user intervention or explicit mechanisms.
While some researchers have focused on exploiting static properties of shared data [115, 129, 147], the possibility of exploiting patterns in dynamic peer behavior has also attracted the attention of the research community. Such patterns in peer behavior have been reported by several measurement studies [59, 62, 67], which revealed correlations between the search requests made by users of popular P2P systems. It was observed that the performance of locating content can be greatly improved [79, 80] by grouping peers interested in similar files and routing their search requests within these groups. Various approaches can be used to capture semantic proximity between peers. Some approaches rely on a predefined ontology (semantic classification) [43]. Unfortunately, classifying items is not easy, and the classification may have to change over time to reflect changes in semantic profiles. Another approach is to add semantic shortcuts (i.e., additional links) between peers that share some interest [61, 126]. These links are created dynamically between peers, for instance based on the set of most recent downloads. Such a mechanism is very reactive to evolving download patterns. Nevertheless, the non-intrusive nature of this approach does not make it possible to exploit further available information, such as the overlap between caches, which has also been used to approximate the semantic proximity between peers [140]. A refined proximity measure takes into account not only the content of peers' caches but also their generosity and the popularity of the shared files [31].
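As an illustration of using cache overlap as a proximity measure, the following Python sketch ranks peers by the Jaccard coefficient of their caches. The Jaccard coefficient is a stand-in chosen here for simplicity; the measure used in [140] may differ, and the generosity and popularity terms of [31] are not modeled. The function names are hypothetical.

```python
def jaccard(cache_a, cache_b):
    """Overlap of two peers' caches as |A & B| / |A | B| (0.0 for two empty caches)."""
    a, b = set(cache_a), set(cache_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_peers(me, caches, k):
    """Hypothetical shortcut selection: the k peers whose caches overlap most with mine."""
    others = [p for p in caches if p != me]
    # sort() is stable even with reverse=True, so ties keep their input order.
    others.sort(key=lambda p: jaccard(caches[me], caches[p]), reverse=True)
    return others[:k]


caches = {
    "p": {"f1", "f2", "f3"},
    "q": {"f2", "f3"},
    "r": {"f9"},
    "s": {"f1"},
}
print(closest_peers("p", caches, 2))  # ['q', 's']
```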
The semantic relationships between peers and files can be discovered relatively easily [139]. The biggest challenge is, thus, to build an architecture that maintains and exploits the discovered semantic structure. In this chapter we present the design and evaluation of a P2P architecture that combines the homogeneity of peer interests with the heterogeneity of peer capacities to solve the problem of effective peer relationship management.