2.4 Overlay Algorithms
2.4.1 Distributed Hash Tables (DHTs)
We use Chord [Stoica et al., 2003] to explain the underlying concepts of distributed hash tables. For a detailed survey of different overlay algorithms including DHTs, we refer the reader to RFC 4981 [Risson and Moors, 2007]. Figure 2.2 illustrates a Chord network. Each node in the Chord network has a fixed-length identifier that is chosen randomly by the node or assigned by a central authority. Similar to a hash table, the key space in Chord ranges from zero to the maximum value of the fixed-length identifier. A node maintains two tables to stay connected to the Chord network: the successor list and the routing table. The successor list of a node X contains the IP addresses and port numbers of the nodes running the Chord protocol whose IDs are numerically closest to X. We refer to the successor list as the neighbor table of a node. The routing table of a node X contains the IP addresses of log N Chord nodes, assuming there are N nodes in the system. For the i-th row of its routing table, a node selects a node whose ID lies in the interval [X + 2^i, X + 2^(i+1)). A node uses the successor list to maintain a consistent view of the overlay, whereas it uses the routing table to quickly send messages to other nodes in the Chord network. A message traverses on the order of log N hops on average. The nodes in the Chord network have to periodically check the liveness of the nodes in their successor (neighbor) and routing tables. A node stores any key-value pairs whose keys lie between its predecessor's identifier and its own. When a node joins the Chord network, it takes ownership of a portion of the data that its successor stores. Similarly, when it leaves the Chord network, its successor must take ownership of the data stored by the departing node.
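The routing table intervals and the key ownership rule described above can be illustrated with a small sketch. This is not the Chord implementation; it assumes a 6-bit identifier space, a globally known list of node IDs, and hypothetical helper names (`in_half_open`, `routing_table`, `owner`) chosen only for illustration. It follows the description in the text, so a row stays empty when no node falls in its interval; the Chord paper instead defines the entry as the first node at or after X + 2^i, which fills such rows.

```python
M = 6            # identifier length in bits (assumed small for illustration)
RING = 2 ** M    # size of the identifier space


def in_half_open(x, start, end):
    """True if x lies in the ring interval [start, end), wrapping modulo RING."""
    x, start, end = x % RING, start % RING, end % RING
    if start < end:
        return start <= x < end
    return x >= start or x < end  # the interval wraps past zero


def routing_table(x, node_ids):
    """Row i holds a node whose ID lies in [x + 2^i, x + 2^(i+1)), or None."""
    table = []
    for i in range(M):
        start, end = x + 2 ** i, x + 2 ** (i + 1)
        candidates = [n for n in node_ids if in_half_open(n, start, end)]
        # Pick the candidate closest (clockwise) to the start of the interval.
        entry = min(candidates, key=lambda n: (n - start) % RING, default=None)
        table.append(entry)
    return table


def owner(key, node_ids):
    """Node that stores key: the first node at or after key, going clockwise."""
    return min(node_ids, key=lambda n: (n - key) % RING)


# Hypothetical ten-node ring used only for this example.
nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]

print(routing_table(8, nodes))   # [None, None, 14, 21, 32, 42]
print(owner(24, nodes))          # 32: key 24 lies between predecessor 21 and node 32
print(owner(24, nodes + [26]))   # 26: a joining node takes over part of its successor's keys
```

In this sketch, a join or leave changes only the result of `owner` for keys between the affected node and its predecessor, which mirrors the transfer of data between a node and its successor described in the text.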
Part I
Chapter 3
Protocols for Building Peer-to-Peer Communication Systems
3.1 Introduction
The research in peer-to-peer systems has focused on the design of structured or unstructured protocols [Stoica et al., 2003; Ratnasamy et al., 2001; Rowstron and Druschel, 2001a; Rhea et al., 2004; Maymounkov and Mazieres, 2002; Chawathe et al., 2003], file sharing [Rowstron and Druschel, 2001b; Rhea et al., 2003], and streaming [Zhang et al., 2005] that distribute the functionality of servers to nodes. The designers of these protocols, more often than not, need to reinvent mechanisms for data model, message reliability, security, and NAT and firewall traversal. Such reinvention increases the time to build and deploy a p2p system. Further, many of the above referenced protocols ignore the issue of NATs and firewalls altogether, even though it is central to the deployment of a p2p communication system.
Skype [Skype, 2010a] is the first peer-to-peer VoIP application that enables user agents to establish media sessions with minimal use of managed servers. It does so by distributing the directory service, proxy server, and media relaying functions to the Skype user agents. Specifically, the Skype user agents cooperate to provide a distributed directory service for locating the network address of other Skype user agents, and for exchanging signaling messages to establish a media session. They can then directly exchange media traffic such as voice, video, or IM. However, restrictive NATs and firewalls may prevent them from directly exchanging packets. The Skype network enables media session establishment between these user agents by using other Skype nodes with unrestricted Internet connectivity to relay the signaling and media traffic between these user agents.
Although Skype works, it uses a proprietary and encrypted protocol and requires Internet access to connect to Skype's p2p network. Consequently, it cannot work in environments with no Internet connectivity. Further, our research has shown that the success rate of Skype media sessions depends on the characteristics of a network connection such as a NAT, and that the selection of a Skype user agent to relay media traffic is suboptimal [Kho et al., 2008]. Moreover, the relaying of media sessions consumes network bandwidth on the machine running the Skype application. Our conversations with Skype users suggest that at times they have become annoyed with the use of their machine's resources by the Skype application and decided to terminate it. This action can result in the failure of the relayed media session.
The thesis devises an open, standardized, and interoperable protocol for building peer-to-peer communication systems. The protocol is motivated by the desire to prevent the reinvention of mechanisms for data model, message reliability, security, and NAT and firewall traversal, while keeping the protocol extensible and flexible for non-VoIP uses. The protocol facilitates the design and development of VoIP systems that require minimal or no use of managed infrastructure, and does so under many different and often conflicting requirements of network connectivity, scalability, resource and service discovery, reliability, monitoring and diagnostics, and security. The protocol is extensible: it allows incorporating a p2p protocol (which we refer to as an overlay algorithm (Section 3.2)) for file sharing, streaming, or VoIP, and reuses the same protocol machinery for data model, message reliability, security, and NAT and firewall traversal across different overlay algorithms, thereby preventing reinvention of this machinery for each overlay algorithm.
The rest of this chapter is organized as follows. In Section 3.2, we describe the requirements of designing such a protocol. In Sections 3.3-3.6, we present the Peer-to-Peer Protocol (P2PP) that we have designed. The protocol can be used to build p2p communication systems for ad hoc, enterprise, and Internet-scale environments. In Chapter 4, we present OpenVoIP, a p2p communication system that we have built using P2PP that implements three different DHTs. As shown in Section 4.4, 85% of the total lines of code (approximately 16,000) are independent of the overlay algorithm implemented using P2PP. This result confirms our assertion that the same protocol machinery for data model, message reliability, security, and NAT and firewall traversal can be kept across different overlay algorithms.