

2.2.2 Content Exchange Process in P2P File-Sharing Systems

In a P2P file-sharing system, each downloader must go through two stages to download the desired content:

• locating peers with sharable content;

• connecting to the located peers to download the content.

Correspondingly, each uploader also goes through two steps to complete a content exchange:

• publishing shareable content;

• selecting the downloaders to which content is uploaded.

In the following, we review these two stages in turn.

2.2.2.1 Content Information Publishing/Retrieving in P2P File-Sharing Systems

In order to help uploaders publish content information and downloaders retrieve it, the corresponding peers in the system must first be discovered. P2P file-sharing systems build the unstructured or structured logical networks described in Section 2.1 to facilitate this process, using the index server method, the query-flooding method, or the DHT method to discover peers.

• Index server method: This method can be used for both content publishing and content retrieval. With the index server method, a certain number of nodes act as servers that are widely known to peers. The information about these servers is usually published out-of-band, for example on well-known websites. Instead of uploading the whole content to these servers, a peer publishes only the location information of its shareable content (e.g., the peer's own location information) to the servers.

As a result, other peers can retrieve the location information of their desired content from these index servers and then download the content directly from the sharing peers. The disadvantage of the index server method is that it breaks the fully distributed structure of P2P systems by introducing central nodes. Failures of these central index servers greatly affect the functionality and robustness of the system. Currently, popular P2P file-sharing systems such as BitTorrent and aMule/eMule employ an index server method as one of their content publishing/retrieval mechanisms.

• Query-flooding method: The query-flooding method, which was adopted by the Gnutella P2P file-sharing system [6], helps P2P file-sharing systems maintain their fully distributed feature. It is only used to help peers look for content. When a peer wants to retrieve particular content information, it sends requests to its known peers. If these known peers do not hold the required content, they forward the original request to the peers that they know in turn. Under this strategy, each peer is required either to answer the request, if it holds the content, or to forward the request to its known peers. The request is thus iteratively flooded through the P2P network. Compared with the index server method, the query-flooding method does not require content information to be published. However, since requests cannot be flooded indefinitely, the maximum number of hops a request may travel is usually limited [6], and peers face the risk of not finding the content even though it exists in the system.

• DHT method: The DHT method also maintains the fully distributed feature of P2P systems. Like the index server method, it can be used for both content publishing and content retrieval. In a P2P file-sharing network using DHT technology, each peer holds a unique identity obtained by applying a special hash function. When a peer plans to publish some content information, it first obtains a hash value by hashing the information with the same hash function. After that, the content information is stored on several peers whose identities are close to the hash value. Storing it on several peers instead of one improves system redundancy and robustness. If a peer later wants to retrieve this published information, it first obtains the same hash value by hashing the desired content information. It then searches different parts of the distributed hash table to locate the peers whose identities are closest to the hash value; those peers should be the ones holding the published information. Each peer must keep information about its neighbors in the DHT network, and this neighbor information can be organized into a logical structure such as a tree, a circle, or a chain. With this logical structure, peers can iteratively send requests to neighbors to locate other peers in the system in a relatively short period of time. Currently, aMule/eMule builds a DHT called the KAD network to facilitate content publishing and retrieval. BitTorrent has also begun to integrate DHT functionality into its network.
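The three discovery methods above can be contrasted in a minimal Python sketch. All names here (`IndexServer`, `flood_query`, `dht_id`, `closest_peers`) are hypothetical and do not come from any of the systems mentioned. The sketch assumes SHA-1 as the hash function and XOR as the notion of "closeness" between identities (a Kademlia-style choice); the text above does not prescribe either, so both are illustrative assumptions only.

```python
import hashlib
from collections import deque


class IndexServer:
    """Index server method: a central table maps content names to peer addresses."""

    def __init__(self):
        self.locations = {}  # content name -> set of peer addresses

    def publish(self, content_name, peer_addr):
        # Only the location is stored on the server, never the content itself.
        self.locations.setdefault(content_name, set()).add(peer_addr)

    def lookup(self, content_name):
        return sorted(self.locations.get(content_name, set()))


def flood_query(neighbors, holdings, start, content_name, max_hops):
    """Query-flooding method: return holders reachable within max_hops.

    neighbors: peer -> list of known peers; holdings: peer -> set of content names.
    """
    found = []
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        peer, hops = queue.popleft()
        if content_name in holdings.get(peer, set()):
            found.append(peer)  # this peer answers instead of forwarding
            continue
        if hops == max_hops:
            continue  # hop limit reached: the request is not forwarded further
        for nxt in neighbors.get(peer, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, hops + 1))
    return found


def dht_id(text, bits=16):
    """DHT method: derive an identity or key by hashing (SHA-1 is an assumption)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)


def closest_peers(peer_ids, key, k):
    """Pick the k peers whose identities are closest to key (XOR distance assumed)."""
    return sorted(peer_ids, key=lambda pid: pid ^ key)[:k]
```

With a chain of peers A, B, C, D in which only D holds the content, `flood_query` started at A with `max_hops=2` returns an empty list even though the content exists, which is the risk noted for the query-flooding method. Raising the limit to 3 finds D.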

2.2.2.2 Content Downloading/Uploading in P2P File-Sharing Systems

After obtaining information about the peers that share the desired content, downloaders build connections with those peers and begin the downloading process. They can choose to download the content from a single peer, which may take a relatively long time if the content is large. To speed up the download, they can instead download the content from multiple peers simultaneously, which is the approach generally adopted by modern P2P file-sharing systems. To implement this multiple-downloading scheme, the content is usually divided into multiple same-size parts called chunks. If the content consists of multiple chunks, then instead of downloading it entirely from one peer, the downloader can simultaneously download different chunks from different peers. Furthermore, with the chunk-based method, each downloaded chunk can be uploaded to other peers immediately, without waiting for the entire file to arrive. After downloading the different chunks from different peers, the downloader reassembles them to recover the whole content in its original format.
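The chunk-based scheme can be illustrated with a short Python sketch. The function names and the round-robin assignment of chunks to peers are hypothetical choices made for illustration; real systems select the source peer for each chunk with their own policies. The sketch also assumes that the final chunk may be shorter than the others when the content size is not a multiple of the chunk size.

```python
def split_into_chunks(data, chunk_size):
    """Divide content into fixed-size chunks (the last one may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def assign_chunks(num_chunks, peers):
    """Map each chunk index to a source peer, round-robin (an assumed policy)."""
    if not peers:
        raise ValueError("at least one peer is required")
    return {i: peers[i % len(peers)] for i in range(num_chunks)}


def reassemble(received):
    """Rebuild the content from {chunk index: bytes}, whatever the arrival order."""
    return b"".join(received[i] for i in sorted(received))
```

Because chunks are keyed by index, they can arrive from different peers in any order and still be reassembled into the original content.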

An uploader also needs to decide how to allocate its upload bandwidth to downloaders. The uploader can put downloaders into a waiting queue and assign its entire upload bandwidth to one downloader selected by a particular policy, such as first in, first out (FIFO) or round robin. Alternatively, the uploader may divide its upload bandwidth into multiple slots and upload to multiple downloaders at the same time.
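The two allocation strategies can be sketched in Python as follows. The function names are hypothetical, and the equal split of bandwidth across slots is an assumption made for illustration; the text does not specify how bandwidth is divided among slots.

```python
def fifo_allocation(waiting, bandwidth):
    """Give the whole upload bandwidth to the downloader at the head of the queue."""
    if not waiting:
        return {}
    return {waiting[0]: bandwidth}


def slot_allocation(waiting, bandwidth, num_slots):
    """Serve up to num_slots downloaders at once, splitting bandwidth equally.

    The equal split is an assumption, not a policy taken from the text.
    """
    served = list(waiting)[:num_slots]
    if not served:
        return {}
    share = bandwidth / len(served)
    return {peer: share for peer in served}
```

Under FIFO, one downloader is served at full speed while the others wait. With slots, several downloaders progress at once, each at a lower rate.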