
P2P protocols operate at the application layer of P2P networks. Their main responsibility is to form and maintain the overlay network and its participating nodes. They provide a node with mechanisms for joining the network, leaving the network, publishing its contents and searching for contents. Many P2P protocols have been introduced since 1999, among them Ares, Bitcoin [40], BitTorrent [41], FastTrack [42], eDonkey [42], Gnutella and GIA.

In the following subsections, popular P2P protocols are discussed in more detail.

2.4.1.  Gnutella

The idea of a P2P file-sharing system was introduced by Napster, which relied on centralized servers for file sharing. Soon afterwards, decentralized file sharing and search was proposed in the form of Gnutella, an open, decentralized search protocol used mainly for file sharing. The term ‘Gnutella’ also refers to the overlay network of Gnutella-speaking applications connected via the internet, together with a number of smaller and often private disconnected networks. Gnutella gained popularity after Napster was closed down and has turned out to be one of the most popular systems to date [43].

In the Gnutella protocol, when a new peer joins the system, it first executes a bootstrapping process to find and connect to potential peers. Once connected to the overlay, it sends a ping message to all connected peers to announce its presence. Each receiving peer replies with a pong message that contains its IP address and port number.
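The ping/pong exchange can be illustrated with a minimal in-memory sketch. It is not the Gnutella wire format: the `Peer` and `Pong` classes, the `connect` helper (which stands in for bootstrapping) and the addresses are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Pong:
    """Reply to a ping: the responder's contact details."""
    ip: str
    port: int


@dataclass
class Peer:
    ip: str
    port: int
    neighbours: list["Peer"] = field(default_factory=list)
    known_addresses: set[tuple[str, int]] = field(default_factory=set)

    def connect(self, other: "Peer") -> None:
        """Open a symmetric overlay link (assumed stand-in for bootstrapping)."""
        self.neighbours.append(other)
        other.neighbours.append(self)

    def on_ping(self) -> Pong:
        """A receiving peer answers with its own IP address and port."""
        return Pong(self.ip, self.port)

    def announce(self) -> None:
        """Ping every connected peer and record the pongs that come back."""
        for neighbour in self.neighbours:
            pong = neighbour.on_ping()
            self.known_addresses.add((pong.ip, pong.port))


# Hypothetical peers and addresses.
newcomer = Peer("10.0.0.1", 6346)
peer_b = Peer("10.0.0.2", 6346)
peer_c = Peer("10.0.0.3", 6347)
newcomer.connect(peer_b)
newcomer.connect(peer_c)
newcomer.announce()
```

After `announce()`, the newcomer's `known_addresses` holds the address and port reported by each neighbour.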

Query propagation in Gnutella is based on a flooding mechanism: the search query propagates from the requesting peer to all of its connected neighbours and is flooded further until the content is found or the time-to-live expires. Each intermediate node checks its own repository for the desired file and, if the file is not there, replicates the query and forwards it to all of its reachable neighbours in the overlay. The query-hit message (the query response) is returned along the reverse path towards the requesting peer. Where a random-walk procedure is used instead of flooding, the peer that has to forward the query selects one of its immediate neighbours at random, without applying any selection criterion.

To reduce the potentially enormous number of messages in the network, queries are restricted by a time-to-live (TTL) value, which specifies the number of hops a message can travel before it is discarded. The TTL is decreased by one at each hop, and the message is discarded when the TTL reaches zero. Broadcasting the query to all reachable peers in the overlay within the TTL limit is called a Breadth-First Search (BFS) mechanism [44].
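A TTL-limited flood can be sketched as a breadth-first traversal of the overlay. The sketch below is a simplified simulation, not the Gnutella implementation: the function name `flood_search`, the dictionary-based overlay and the example topology are assumptions made for illustration. A peer that holds the file answers instead of forwarding, and a query is never sent back to the neighbour it came from.

```python
from collections import deque


def flood_search(overlay, files, origin, wanted, ttl):
    """Breadth-first flood of a query, limited to `ttl` hops.

    Returns (holders, messages): the peers found to hold `wanted`
    and the number of query messages sent.
    """
    seen = {origin}                       # peers that already handled the query
    queue = deque([(origin, None, ttl)])  # (peer, sender, remaining hops)
    holders, messages = [], 0

    while queue:
        peer, sender, remaining = queue.popleft()
        if peer != origin and wanted in files.get(peer, ()):
            holders.append(peer)          # a hit is answered, not forwarded
            continue
        if remaining == 0:
            continue                      # TTL expired: discard the query
        for neighbour in overlay[peer]:
            if neighbour == sender:
                continue                  # do not echo the query back
            messages += 1
            if neighbour not in seen:     # duplicate deliveries are dropped
                seen.add(neighbour)
                queue.append((neighbour, peer, remaining - 1))
    return holders, messages


# Hypothetical overlay: A is linked to B, C and E; D is two hops from A.
overlay = {
    "A": ["B", "C", "E"], "B": ["A"], "C": ["A", "D"],
    "D": ["C", "F"], "E": ["A"], "F": ["D"],
}
files = {"D": {"song.mp3"}}

print(flood_search(overlay, files, "A", "song.mp3", ttl=2))  # (['D'], 4)
print(flood_search(overlay, files, "A", "song.mp3", ttl=1))  # ([], 3)
```

With a TTL of 1 the query dies one hop short of peer D. This shows the trade-off behind the TTL: a smaller value saves messages but can miss content that exists in the overlay.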

Figure 2.2: Gnutella Search Algorithm

Figure 2.2 shows the search process of Gnutella. Assume that peer A needs to find a particular file. It first generates a search query, which is forwarded to all of its connected neighbours, peers B, C and E. When those neighbours receive the query message, they check whether their repository holds the required file. If it does not, they forward the query on to their own neighbours. Suppose that peer D holds the required file. Peer D sends a response along the reverse path to the peer from which it received the query, which is peer C. Peer C then forwards the response to the query originator, peer A. Finally, peer A contacts peer D directly to download the required file.
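The reverse-path routing in this example can be sketched by letting each peer remember which neighbour first delivered the query. The topology below is an assumption reconstructed from the description of Figure 2.2 (A linked to B, C and E; D reached through C); the function name and file name are hypothetical.

```python
from collections import deque


def query_with_reverse_path(overlay, files, origin, wanted, ttl):
    """Flood a query and return the path a query-hit takes back to `origin`.

    Each peer records the neighbour that first delivered the query; the
    response retraces those links in reverse. Returns None on a miss.
    """
    came_from = {origin: None}
    queue = deque([(origin, ttl)])
    while queue:
        peer, remaining = queue.popleft()
        if peer != origin and wanted in files.get(peer, ()):
            path = [peer]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path                   # holder first, originator last
        if remaining == 0:
            continue                      # TTL expired
        for neighbour in overlay[peer]:
            if neighbour not in came_from:
                came_from[neighbour] = peer
                queue.append((neighbour, remaining - 1))
    return None


# Assumed topology, following the walk-through of Figure 2.2.
overlay = {"A": ["B", "C", "E"], "B": ["A"], "C": ["A", "D"],
           "D": ["C"], "E": ["A"]}
files = {"D": {"report.pdf"}}

path = query_with_reverse_path(overlay, files, "A", "report.pdf", ttl=3)
print(path)  # ['D', 'C', 'A']
```

The returned path covers only the query-hit message. The download itself takes place over a direct connection between peer A and peer D, outside the overlay path.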

The design goals of Gnutella are as follows:

•   The Ability to Operate in a Dynamic Environment: Gnutella-based applications operate in an environment where nodes join and leave the network very frequently. Transparent resource handling is therefore required to keep operation smooth.

•   Performance and Scalability: A P2P network is beneficial mainly at large scale, where the limitations of the client-server paradigm become evident. Scalability refers to the capability of the network to handle a large number of participants.

•   Reliability: Network attacks should not significantly degrade the performance of the network.

•   Anonymity: The privacy of users who seek or provide unpopular information should be protected.

2.4.2.  GIA

One of the influential unstructured P2P approaches is GIA [16], which derives its name from GIAnduia and is based on the common P2P Gnutella algorithm. By 2007, Gnutella, which GIA sets out to improve, was the most popular file-sharing protocol, with an estimated market share of more than 40%. Because Gnutella adopted a decentralized search algorithm, it had one fundamental problem: its nodes became overloaded quickly under a high aggregate query rate [16]. Since resource placement performance is significantly affected by the overlay topology, GIA proposed several modifications to the Gnutella design to cope with the high aggregate query rate. These can be summarized as the following four components:

•   An Active Flow Control: To ensure that nodes are not overloaded with queries, a sender node can direct a query only to a neighbour that has signalled its readiness by issuing the sender a flow-control token. Each token represents one query message that the neighbour is ready to accept.

•   One-hop Replication: Each node keeps an index of the content of its connected neighbours. When a node receives a query message, it therefore responds not only with its own matching contents but also with the matching contents of its connected neighbours.

•   A Biased Random Walk (BRW) Search Protocol: Instead of using flooding or random-walk search methods, the search process in GIA directs search queries towards high-capacity nodes rather than towards randomly chosen nodes, as random walkers do. The algorithm for the BRW is shown in Figure 2.3.

•   A Dynamic Topology Adaptation: The most important component in GIA ensures that all nodes in the overlay are connected to high-capacity nodes, based on the pseudo-code in Figure 2.3. To achieve the main goal of topology adaptation, each node in GIA calculates a value between 0 and 1. This value, the Level of Satisfaction (S), measures how satisfied the node is with its connected neighbours. A value of 0 means that the node is completely unsatisfied, whereas a value of 1 means that the node is totally satisfied.
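The first three components can be combined in a short simulation. This is an illustrative sketch and not the pseudo-code of Figure 2.3: the function name, the dictionary layout and the example overlay are hypothetical, the one-hop index is simulated by reading the neighbours' file sets directly, and skipping already visited nodes is a simplification.

```python
def biased_random_walk(nodes, start, wanted, max_hops):
    """Walk a query towards high-capacity nodes.

    `nodes[name]` is a dict with the keys:
      capacity   - the node's ability to handle queries
      neighbours - list of neighbour names
      files      - set of content items held by the node
      tokens     - {neighbour: count} flow-control tokens received
    Returns the name of a node holding `wanted`, or None.
    """
    current, visited = start, {start}
    for _ in range(max_hops + 1):
        node = nodes[current]
        if wanted in node["files"]:
            return current
        # One-hop replication: the node can answer for its neighbours.
        for nb in node["neighbours"]:
            if wanted in nodes[nb]["files"]:
                return nb
        # Active flow control: only neighbours that issued a token qualify.
        candidates = [nb for nb in node["neighbours"]
                      if node["tokens"].get(nb, 0) > 0 and nb not in visited]
        if not candidates:
            return None
        # Bias: forward to the highest-capacity candidate.
        nxt = max(candidates, key=lambda nb: nodes[nb]["capacity"])
        node["tokens"][nxt] -= 1          # forwarding consumes one token
        visited.add(nxt)
        current = nxt
    return None


def make_node(capacity, neighbours, files=()):
    """Hypothetical helper: every neighbour starts by granting one token."""
    return {"capacity": capacity, "neighbours": neighbours,
            "files": set(files), "tokens": {nb: 1 for nb in neighbours}}


nodes = {
    "A": make_node(1, ["B", "C"]),
    "B": make_node(10, ["A", "D"]),
    "C": make_node(100, ["A", "E"]),
    "D": make_node(1, ["B"], files=["x"]),
    "E": make_node(1000, ["C", "F"]),
    "F": make_node(1, ["E"], files=["x"]),
}

found = biased_random_walk(nodes, "A", "x", max_hops=3)
print(found)  # F
```

In this run the query moves from A to C and then to E, where the one-hop index reveals that F holds the item. The walk also spends A's only token for C, so a second query from A would have to go through B instead.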

Figure 2.3: GIA Pseudo-Code [16]

Where:

o   Num_nbrs: Number of neighbours of a peer.

o   Max_nbrs: Maximum number of neighbours that a node can have.

o   Ci: Capacity of a node to handle requests.
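Using the quantities defined above, a satisfaction level can be sketched as follows. This is one plausible form, modelled on the general description of GIA's topology adaptation, and it should not be read as a transcription of Figure 2.3. The `min_nbrs` parameter and the representation of each neighbour as a (capacity, number of neighbours) pair are assumptions introduced for this example.

```python
def satisfaction_level(own_capacity, neighbours, max_nbrs, min_nbrs=1):
    """Return a satisfaction value S in the range [0, 1].

    own_capacity - Ci, the capacity of the node itself
    neighbours   - list of (capacity, num_nbrs) pairs, one per neighbour
    max_nbrs     - maximum number of neighbours the node can have
    min_nbrs     - assumed lower bound below which the node is unsatisfied
    """
    num_nbrs = len(neighbours)
    if num_nbrs < min_nbrs:
        return 0.0                 # too few neighbours: unsatisfied
    if num_nbrs >= max_nbrs:
        return 1.0                 # no room for more neighbours: satisfied
    # Each neighbour contributes its capacity, shared among its own neighbours.
    total = sum(capacity / degree for capacity, degree in neighbours)
    return min(1.0, total / own_capacity)


# A low-capacity node attached to a strong neighbour is fully satisfied.
print(satisfaction_level(10, [(100, 4), (10, 2)], max_nbrs=5))   # 1.0
# A high-capacity node with weak neighbours keeps looking for better ones.
print(satisfaction_level(100, [(10, 2), (40, 4)], max_nbrs=5))   # 0.15
```

A node whose value stays below 1 would keep trying to replace or add neighbours, which is what drives the overlay towards high-capacity nodes.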

With the aim of improving the consistency of GIA overlay networks in mobile networks, the study in [45] presented a new technique based on GIA, called M-GIA. The main concept introduced by M-GIA is the physical distance between two nodes: information about a node’s location is treated as an additional parameter of the GIA protocol. However, the distance between nodes changes as the nodes move, so it has to be recalculated with each movement, which in turn burdens the network. Furthermore, some studies, such as GES [46], have drawn on the idea of topology adaptation in GIA to improve their look-up performance, with the difference being that topology adaptation in GES is mainly used to organize semantically relevant nodes into the same semantic groups.

Start environment has been affected by the significant growth in communication and wireless technologies, particularly with mobile phones.