Golgi Apparatus Project


Project

A Distributed Peer-to-Peer Web Cache Final Report Draft

ABSTRACT

This final report describes the design and implementation of a distributed peer-to-peer web cache for Internet Explorer and Netscape running on Windows 2000 and Windows XP clients which pools and shares the caching automatically performed by these web browsers. This provides an efficient, scalable and resilient content cache without the need for dedicated hardware or administration. This document describes the prototype of the product developed and future directions for continuing the work. It also presents an analysis and evaluation of the prototype and of the software engineering procedures used to develop the prototype.

Copyright © 2002 by Chris Lord, Nimit Sawhney, Madhur Joshi, Sajjid Salyani and Anupam Dhanuka.

Authors: Chris Lord, Nimit Sawhney, Madhur Joshi, Sajjid Salyani and Anupam Dhanuka

Software Version: Golgi Apparatus 0.0.001

Document Version: 0.0.003

Edit Number: 25

Last Revised: 5/10/2002 4:50 PM by Chris Lord

Web Site: www.andrew.cmu.edu/~nsawhney/ds

Source Library: arsenic.ini.cmu.edu/golgi

Date Printed: 5/10/2002 4:50 PM


Project Team

Name Email Address Homepage

Anupam Dhanuka [email protected] http://www.andrew.cmu.edu/~adhanuka

Madhur Joshi [email protected] http://www.andrew.cmu.edu/~mjoshi

Chris Lord [email protected] http://www.chrisandtrudi.com/Chris/Chris.html

Sajjid Salyani [email protected] http://www.andrew.cmu.edu/~ssalyani

Nimit Sawhney [email protected] http://www.andrew.cmu.edu/~nsawhney

Document Revision History

Date Version Authors Description

06-May-02 0.0.001 CCL Initial

09-May-02 0.0.002 CCL Incorporate sections and contributions

10-May-02 0.0.003 CCL Finish it after everyone else has left or become unmotivated.

Related Web Sites

Site Location Comment

Apache www.apache.org Open source web server.

IANA www.iana.org Internet Assigned Numbers Authority.

IETF www.ietf.org Internet Engineering Task Force standards body.

Jupiter Media Metrix www.caching.com Caching resources (vendor profiles, news, analysis and whitepapers).

RFC Archive www.faqs.org Internet FAQs, RFCs and Standards archive.

Squid Web Proxy Cache www.squid-cache.org Open source Internet caching and proxy software.


References

[1] Amitabh, D., “Proxies, Application Interfaces and Distributed Systems,” University of Illinois, 1997.

[2] Bentley, J.L., and R. Sedgewick, “Fast Algorithms for Sorting and Searching Strings,” Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms, January 1997.

[3] Berners-Lee, T., “Uniform Resource Identifiers (URI): Generic Syntax,” RFC 2396, August 1998.

[4] Berners-Lee, T., “Universal Resource Identifiers in WWW,” RFC 1630, June 1994.

[5] Bernstein, D. J., “A structure for constant databases”, September 1996.

[6] Bernstein, D., “CDB Specification,” http://cr.yp.to/cdb.html.

[7] Braden, R., “Requirements for Internet Hosts—Communication Layers,” RFC 1122, October 1989.

[8] Comer, D., Internetworking with TCP/IP: Volume 1 - Principles, Protocols and Architecture, 1995.

[9] D. H. Crocker, “Standard for The Format of ARPA Internet Text Messages,” STD 11, RFC 822, August 1982.

[10] Fielding, R., “Relative Uniform Resource Locators,” RFC 1808, June 1995.

[11] Fielding, R., et al, “Hypertext Transfer Protocol -- HTTP/1.1,” RFC 2616, June 1999.

[12] Franks, J., et al, “An Extension to HTTP: Digest Access Authentication,” RFC 2069, January 1997.

[13] Horton, M., and R. Adams. “Standard for Interchange of USENET Messages,” RFC 1036, December 1987.

[14] Iyer, S., et al, “Squirrel: A Decentralized Peer-to-Peer Web Cache,” 2001.

[15] John Dilley, Martin Arlitt, “Improving Proxy Cache Performance – Analyzing Three Cache Replacement Policies”, HPL-1999-142, Hewlett Packard Laboratories, Palo Alto, CA, USA, May 1999.

[16] Meyers, J., and Rose, M., “The Content-MD5 Header Field,” RFC 1864, October 1995.

[17] Nebel, E., and L. Masinter. “Form-based File Upload in HTML,” RFC 1867, November 1995.

[18] Patricia Seybold Group, Network Caching Guide: Optimizing Web Content Delivery, March 1999.

[19] Reynolds, J. and J. Postel. “Assigned Numbers,” STD 2, RFC 1700, October 1994.

[20] W3C, “HTML 4.01 Specification,” W3C Recommendation, December 1999.

[21] Wessels, D., ICP and the Squid Web Cache, 1997.

[22] Wang, J., “A Survey of Web Caching Schemes for the Internet,” ACM Computer Communication Review, October 1999.

[23] Wolman, A., et al, “On the Scale and Performance of Cooperative Web Proxy Caching,” Proceedings of the 17th ACM Symposium on Operating Systems Principles, December 1999.

[24] Caceres, R., et al, “Web Proxy Caching: The Devil is in the Details,” Proceedings of the Workshop on Internet Server Performance, May 1999.

[25] Rabinovich, M., and Spatscheck, O., Web Caching and Replication, 2002.

[26] Feldmann, A., et al, “Performance of Web Proxy Caching in Heterogeneous Bandwidth Environments,” Proceedings of INFOCOM, 1999.

Many of these references are on the project web site and in the project source control library. The list does not include the many associated RFCs. These can be found at any of the online repositories such as www.faqs.org.

Although not expressly acknowledged in the text, some definitions are derived from www.whatis.com, an online dictionary of IT terms (URI, etc.). Microsoft’s MSDN resources were also used frequently.


1 Introduction

The typical personal computer running a web browser such as Internet Explorer or Netscape performs extensive local caching of web content to avoid refetching content that has not changed. This increases performance and thereby contributes to a better user experience, masking the typically slow connections to the content origin. The performance improvement can be dramatic because much of web content consists of content that changes far less frequently than it is accessed and personal computers have gigabytes to spare for a content cache. In a large campus or enterprise environment, the amount of cached content could easily reach terabyte scales. None of it is shared.

But it could be. Studies have shown that over 60% of web content is static or changes infrequently enough to be effectively cached [18]. A recent study of Microsoft’s corporate Internet traffic also showed that 30-40% of all web requests could be satisfied by content that was already present in the intranet, assuming that content that was cachable was actually cached [14]. The fact that this study was done by analyzing traffic between Microsoft’s intranet and the Internet further supports the fact that caches are not shared to the extent they could be.

The Golgi Apparatus project is a shared, distributed web cache that functions across self-defined affinity groups within an intranet characterized by high bandwidth, high speed network connections and few comparatively slower Internet connections. It improves the performance and robustness of accessing Internet content from within an intranet by allowing private content caches to be shared. The benefits include:

Faster Content Access Individual users obtain content from anywhere it exists in the intranet, not just from their own local caches. The source is chosen based on a number of factors including the historical performance of the source, the content age, any resource limits imposed by the source, and of course the availability of the source. Being able to obtain content from nearer sources (those with higher bandwidth connections to the user) provides faster access to the desired content.

Better Availability Regardless of where content is obtained (from an intranet peer or from the content origin), the content is cached on the requesting system. This creates redundant copies of popular content, which further improves performance for others requesting the same content and improves the availability of content when any of the copies or the content origin is not reachable. The more popular the content, the more copies proliferate. Should all intranet-based copies be unavailable, the origin is used. Should the origin also be unavailable, then the user is no worse off than before.

Lower Costs The more content that is obtained from the shared cache, the fewer demands on the connections to the Internet and the more bandwidth available for other applications. Furthermore, the need for dedicated resources for a proxy server is eliminated.

Some of these benefits are also claimed by caching proxy servers that cache web content on behalf of a group of clients. Indeed, the Golgi Apparatus has much in common with traditional caching proxy servers. There are, however, differences that make the distributed approach a more attractive solution. Affinity groups are independent of administrative domains. This is in contrast to departmental or organizational proxy servers, which must exist within an administrative domain of the clients served. To be sure, affinity groups may split along departmental lines (such as in the case of an engineering team) but they equally may not (such as in the case of a group of students participating in the same course). Groups might even span an entire intranet, allowing all caches to be shared regardless of the level of expected content overlap.

Such an approach eliminates the administrative overhead of traditional proxy servers and simultaneously empowers users to improve their performance without administrative involvement. The costs of providing the cache and distributing content are shared across all participants. The Golgi Apparatus therefore eliminates the need for a proxy server and provides a lower-cost scalable distributed solution with better performance, improved content availability and no single point of failure.


1.1 Related Research

A number of research efforts are related or similar to the Golgi Apparatus. These include:

Squid Squid is a high-performance proxy caching server for web clients, supporting FTP, gopher, and HTTP data objects. It is the most available, most studied and most used proxy cache. It therefore serves as a benchmark for all other proxy servers, both in terms of functionality and performance.

The Golgi Apparatus differs from Squid primarily in that it does not attempt to be a full proxy server, but rather uses the mechanisms of a proxy server to integrate smoothly with existing applications and intercept web requests. It is possible that other interfaces exist (such as Internet API filters under Windows) that could accomplish a similar, albeit platform-specific, result. [21]

Squirrel A decentralized peer-to-peer web cache. Of all existing projects, this one most closely parallels the work being done for the Golgi Apparatus. While the problem being addressed is the same, Squirrel represents a proof-of-concept approach and is built on other infrastructures, most notably Pastry. The Golgi Apparatus represents new development of all infrastructural services tailored to the problem of distributed content caches. [14]


2 Product Specification

The Golgi Apparatus provides the functional, distributed system and interoperability capabilities described in the following sections. These meet most of the requirements specified in the original system specification.

2.1 Functional Capabilities

This section details the end-user functional characteristics of the Golgi Apparatus. The actual mechanisms and interfaces are not defined here.

2.1.1 Caching

The Golgi Apparatus takes over all responsibility for all web content caching, a task usually provided by Internet Explorer or Netscape. This caching capability is a superset of that provided by the browsers. It supports additional user-defined replacement policies that are more effective at managing content. Unlike browser caching, the caching provided by Golgi Apparatus can be used by any application that supports a proxy configuration.

2.1.2 Groups

The Golgi Apparatus allows the user to join groups by selecting the host name or address from a list of hosts available on the local subnet (or entering a specific address or resolvable name). The metaphor used is identical to that of creating buddy lists for instant messaging and the user interface is designed to be familiar to anyone who has used IM applications. When you select a host to join a group, that host must also agree. This represents the true peering nature of the product: the offer to join a group and acceptance represent a non-transitive agreement between the two hosts to share content. Each host may then have a different set of peer relationships. The content distribution network formed is therefore similar in structure to small-world social networks rather than hierarchical, flat or centralized networks.
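The offer-and-acceptance agreement described above can be illustrated with a small sketch. This is not the project's Affinity Manager; the class name `PeeringSketch` and its methods are hypothetical, and hosts are identified by plain strings for simplicity. It only shows that a relationship exists once both hosts have agreed, and that agreements do not propagate to third parties.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical illustration of pairwise, non-transitive peering.
class PeeringSketch {
public:
    // Host `from` offers to share content with host `to`.
    void Offer(const std::string& from, const std::string& to) {
        offers_.insert({from, to});
    }

    // Host `acceptor` accepts an offer made by `offerer`.
    // Returns false if no such offer is pending.
    bool Accept(const std::string& acceptor, const std::string& offerer) {
        auto it = offers_.find({offerer, acceptor});
        if (it == offers_.end()) return false;
        offers_.erase(it);
        peers_.insert(Key(offerer, acceptor));
        return true;
    }

    // True only if the two hosts have an accepted agreement with each other.
    bool ArePeers(const std::string& a, const std::string& b) const {
        return peers_.count(Key(a, b)) != 0;
    }

private:
    using Pair = std::pair<std::string, std::string>;

    // Store each agreement once, independent of who made the offer.
    static Pair Key(const std::string& a, const std::string& b) {
        return a < b ? Pair{a, b} : Pair{b, a};
    }

    std::set<Pair> offers_;  // pending offers as (offerer, target)
    std::set<Pair> peers_;   // accepted agreements, order-normalized
};
```

Because each agreement is stored per pair of hosts, two hosts that share a common peer are not peers of each other unless they make their own agreement.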

2.1.3 Configuration

Each component is configurable to control parameters that affect performance or resource usage. For example, the cache replacement policy and the maximum amount of disk space consumed will be user-settable. Likewise, the maximum amount of memory consumed for the shared namespace can be limited. Configuration parameters are controlled through the use of a resource file consisting of name-value pairs.
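As an illustration of name-value configuration, the following is a minimal sketch of a resource-file reader. The file syntax (one `name = value` pair per line, `#` starting a comment), the class name `ConfigSketch`, and the parameter names used in the usage note are assumptions made for this example; the report does not specify them.

```cpp
#include <cassert>
#include <exception>
#include <istream>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical reader for a resource file of "name = value" lines.
class ConfigSketch {
public:
    // Reads all pairs from the stream; lines without '=' are ignored.
    void Load(std::istream& in) {
        std::string line;
        while (std::getline(in, line)) {
            const auto hash = line.find('#');
            if (hash != std::string::npos) line.erase(hash);  // strip comment
            const auto eq = line.find('=');
            if (eq == std::string::npos) continue;
            const std::string name = Trim(line.substr(0, eq));
            if (!name.empty()) values_[name] = Trim(line.substr(eq + 1));
        }
    }

    // Named lookup; empty optional if the parameter is not set.
    std::optional<std::string> Get(const std::string& name) const {
        const auto it = values_.find(name);
        if (it == values_.end()) return std::nullopt;
        return it->second;
    }

    // Numeric lookup with a fallback for missing or non-numeric values.
    long GetLong(const std::string& name, long fallback) const {
        const auto value = Get(name);
        if (!value) return fallback;
        try {
            return std::stol(*value);
        } catch (const std::exception&) {
            return fallback;
        }
    }

private:
    static std::string Trim(const std::string& s) {
        const char* ws = " \t\r\n";
        const auto b = s.find_first_not_of(ws);
        if (b == std::string::npos) return "";
        const auto e = s.find_last_not_of(ws);
        return s.substr(b, e - b + 1);
    }

    std::map<std::string, std::string> values_;
};
```

A caller might load a file containing a line such as `CacheMaxDiskMB = 512` (a made-up parameter name) and read it with `GetLong("CacheMaxDiskMB", 256)`, so that a missing or malformed setting falls back to a default.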

2.2 Distributed Systems Capabilities

These individual caches are distributed with partial replication based on content access patterns and accessed through a global namespace with loosely-coupled replicas on all systems participating in an affinity group. Updates to the namespace are published to the group on a periodic basis. Group membership is maintained by an affinity manager which, like the namespace, provides a loosely consistent view of the membership across all members.

With this in mind, the Golgi Apparatus provides the following key features of a distributed system:

Location Transparency

Users that participate in the distributed cache do not know, and should not know, where the content actually comes from. The system is responsible for selecting the source that balances the resource constraints imposed by all participating members and simultaneously provides the best performance. In the common case, this will result in faster content access than is possible by directly contacting the source. In rare cases (such as during membership update windows), content access may be slower.

Failure Transparency To the extent that a desired piece of content is available, the cache tolerates a variety of individual failures: if a peer is down, content can come from another peer; if all peers are down, content can come from the origin. If the origin is down, a client is no worse off than a system without the Golgi Apparatus.

2.3 Interoperability Capabilities

The Golgi Apparatus is designed to interoperate smoothly with existing web applications and browsers such as Internet Explorer and Netscape. It is configured as the default proxy server for these applications and from that point on takes on responsibility for all local caching as well as participation in the distributed cache.

The Golgi Apparatus does not require any modifications to applications, clients or the existing network infrastructure. It can coexist with dedicated proxies (which may be necessary due to the multifunctional role many proxies play offering NAT, firewall and monitoring services).

2.4 Operating Environment

The software and hardware environment required by the product covers both the configuration of the clients and the interconnections between them.

2.4.1 Network Infrastructure

The benefits from sharing caches depend not only on the number of clients participating, but on network throughput between clients being substantially greater than the throughput to the content origins. The Golgi Apparatus is therefore well suited to environments with large numbers of LAN-based clients and few Internet connections at much slower speeds. This is typical of corporate and campus environments. This environmental requirement is imposed by the design.

2.4.2 Hardware and Software Configuration

The Golgi Apparatus will initially be designed to support Internet Explorer 5.x and 6.x and Netscape 4.x and 6.x on Windows 2000 and Windows XP platforms. This will allow the project to meet its time-to-market constraint and provide a product that satisfies a significant portion of the user base. There is, however, nothing about that design that precludes it from being ported to other platforms in the future. This environmental requirement is imposed by the implementation.

The Golgi Apparatus also requires ample memory and disk space. The service consumes 8-16MB of memory which is consistent with other peer sharing applications. The amount of disk space allocated to the cache can be the same as that already allocated to existing Internet Explorer or Netscape caches.


3 System Architecture

A proxy web cache is a computer application that provides a shared cache to a set of web clients. A proxy receives all requests from any client configured to use it, regardless of which origin servers those requests are addressed to. The proxy either responds to these requests using previously cached responses or obtains responses from the origin server on behalf of the client, forwarding the response to the client and optionally storing the response in its cache for future use. By servicing subsequent requests for the same object from its cache (a cache hit), a proxy is expected to provide benefits including reduced latency (improved response time) and reduced bandwidth usage, resulting in cost savings. Various studies have been done to quantify the benefits of a proxy and will be discussed in subsequent sections.

Consider the diagram below, which shows two streams of requests generated by two browser clients. The names of the requested objects are marked on the time axis as O1, O2, O3 and so on. Since neither client revisits any object in its own request stream, both will have a zero hit rate in their browser caches. However, if both clients are connected to the same proxy, every object requested by Client 2 will already be in the proxy cache as a result of the earlier requests from Client 1.

Thus the hit rate at the proxy cache will be a respectable 50%.

[Figure: request streams of Client 1 and Client 2]
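The 50% figure can be checked with a short simulation. The sketch below assumes an unbounded shared cache, that Client 1 issues all of its requests before Client 2, and that each client requests four objects (O1 through O4); the function name `SharedCacheHitRate` is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Fraction of requests served from a single cache shared by all streams.
// Streams are replayed one after another; the cache never evicts.
double SharedCacheHitRate(const std::vector<std::vector<std::string>>& streams) {
    std::set<std::string> cache;
    std::size_t hits = 0;
    std::size_t total = 0;
    for (const auto& stream : streams) {
        for (const auto& object : stream) {
            ++total;
            // insert() reports false when the object was already cached: a hit.
            if (!cache.insert(object).second) ++hits;
        }
    }
    return total == 0 ? 0.0
                      : static_cast<double>(hits) / static_cast<double>(total);
}
```

With two identical streams of four distinct objects, the four requests from the first client all miss and the four from the second all hit, giving 4 hits out of 8 requests. A single client on its own sees no hits, which matches the zero browser-cache hit rate described above.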

Intuitively, one would think the benefit of a shared cache would increase with the size of a client population using it.

However there are three potential drawbacks associated with a central proxy:

- One point of failure

- Load scalability – rate at which a proxy has to handle requests

- Geographical scalability – a proxy may be far removed in the network from many of its clients and the advantage of obtaining objects from the proxy cache rather than from the origin server becomes questionable

Cooperative proxy caching is a technique whereby autonomous proxy web caches coordinate and share content with each other’s clients; it offers potential solutions to the above problems. When a client sends a request for an object to a proxy cache that participates in a cooperative caching scheme and the proxy has the object in its own cache, the request is called a local hit. If the request does not hit locally, it is a local miss. In the context of cooperative caching, the proxy can then fetch the object from another proxy, in which case the request is called a remote hit. Finally, if the proxy fetches the object from the origin server, the request is called a global miss. The diagram below illustrates a cooperative proxy cache.

[Figure: Client 1 and Client 2 on a campus LAN, connected through a proxy and the Internet to the origin server; requests for O1, O2, O3 and O4 plotted against time]


Golgi Apparatus falls under such a category of applications. The rationale is that such an application can be found useful in improving hit ratios in a group of small organizations. There are many forms of cooperative web caching broadly divided into:

Hierarchical web caching, where upper layers of the hierarchy often cover geographically larger domains. A proxy sends all locally missed requests to its parents in the hierarchy. Examples include some versions of Squid.

Hash-based schemes, where clients hash a request to one of multiple dedicated proxy caches. Each proxy becomes responsible for a well-known subset of all web resources. Examples include Squirrel [14].

Directory-based schemes, where a directory service tracks the whereabouts of cached objects in order to redirect client requests that cannot be satisfied locally. Examples include the CRISP proxy cache [Gadde et al 1997b] and the Cache mesh application [Wang 1997].

Broadcast (Multicast) based systems where a proxy looking for an object issues a query to other proxies with which it cooperates

Golgi is based on a slightly different paradigm requiring no more infrastructure and can be thought of as a combination of the last two categories. Cooperative caching is enabled at each node by running an instance of the Golgi Apparatus and having the user subscribe to a peer group using the affinity manager. The web browser is then configured to use the Golgi Apparatus instance as its proxy cache. The browser and the Golgi Apparatus share a single cache; one way to do this is by disabling the browser cache. No other changes to the browser (or to the external web servers) are necessary. The browser and Golgi instance in each node now play the dual role of web browsing and cooperative caching. The diagram below shows how the Golgi Apparatus interfaces with the browser on each client node and how a peer group of three would operate.

[Figure: cooperating proxies, each serving a set of clients, connected through the Internet to web servers]


The Namespace component is similar to a directory: it keeps track of object locations among a group of peer clients and is published at set time intervals to all subscribing peers. The Namespace is a fully replicated directory within the peer group but may lose consistency between updates, which can be controlled to occur within a certain time interval. Once the client node subscribes to a peer group, the Affinity Manager exchanges cache content information between peers using selective multicast.

3.1 Functional Components

The major functional components of the Golgi Apparatus are described below. Each of these components has well defined interfaces and interactions with other components.

Proxy The Proxy component acts as a traditional proxy server to applications, with the exception that it runs on the local host. This component is responsible for intercepting all HTTP requests from web applications and overseeing the process of locating the best source for the content (local, peer, origin), satisfying the freshness constraints on content and updating and sharing content with peers.

Namespace Manager The Namespace Manager manages the URI namespace for all content known within an affinity group. It provides fast partial lookups using URIs as keys and returns the information necessary to obtain content (location and host access, age, type). The namespace is maintained via updates from the Proxy and from the Affinity Manager and is initially populated with locally cached content.

Content Cache The Content Cache manages all stored content that has been accessed by a browser. This replaces the default cache maintained by the browser so that the project can manage its own aging and replacement policies.

Publisher and Subscriber The Publisher and Subscriber serialize and coordinate the publication of changes to locally cached objects to peers through the Affinity Manager. This module is responsible for propagating changes to local namespaces to peers and managing namespace consistency.

Affinity Manager The Affinity Manager manages peer groups with which a client shares cached content and tracks the performance, availability and reachability of those peers. It updates the namespace with changes received from other peers and accumulates and publishes local changes made to the namespace.

Connection Manager The Connection Manager provides a common interface to the network with the additional functionality of managing persistent connections to other hosts to reduce the overhead of connection setup and teardown and the penalty of slow start suffered by many HTTP requests. It also reports reachability problems to the Affinity Manager.

Peer Management GUI The GUI is the only exposed user-interface component of the project. It allows the user to manage membership in affinity groups. In the future it will also allow users to manage the caching and sharing policies, import and export cache contents, and monitor and limit the resources and performance of the service (such as network bandwidth and CPU consumed as a result of participation in the distributed cache).

Configuration Manager This component provides a platform-independent means of accessing configuration parameters using named lookups. It obtains these from either the registry or a configuration file.
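To make the Namespace Manager's role more concrete, here is a minimal sketch of a URI-keyed directory with exact and prefix ("partial") lookups. It uses an ordered `std::map` as a stand-in for whatever index structure the project actually uses, which this report section does not specify. The names `NamespaceSketch` and `NamespaceEntrySketch` and the entry fields are assumptions based on the description above (location, age, type).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of where a piece of content can be obtained.
struct NamespaceEntrySketch {
    std::string host;   // peer (or local host) holding the content
    long ageSeconds;    // age of the cached copy
    std::string type;   // content type
};

// Hypothetical URI directory; std::map stands in for the real index.
class NamespaceSketch {
public:
    // Create or update the entry for a URI.
    void Update(const std::string& uri, const NamespaceEntrySketch& entry) {
        entries_.insert_or_assign(uri, entry);
    }

    // Exact lookup; returns nullptr if the URI is unknown.
    const NamespaceEntrySketch* Find(const std::string& uri) const {
        const auto it = entries_.find(uri);
        return it == entries_.end() ? nullptr : &it->second;
    }

    // Partial lookup: all known URIs that start with `prefix`, in key order.
    std::vector<std::string> FindByPrefix(const std::string& prefix) const {
        std::vector<std::string> matches;
        // Keys sharing a prefix are contiguous in an ordered map,
        // starting at the first key that is not less than the prefix.
        for (auto it = entries_.lower_bound(prefix);
             it != entries_.end() &&
             it->first.compare(0, prefix.size(), prefix) == 0;
             ++it) {
            matches.push_back(it->first);
        }
        return matches;
    }

private:
    std::map<std::string, NamespaceEntrySketch> entries_;
};
```

In this sketch both the Proxy (after caching content locally) and the Affinity Manager (after receiving peer updates) would call `Update`, mirroring the two update paths described for the namespace.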

[Figure: three client nodes, each with standard browsers, a proxy and a namespace, exchanging cached objects and namespace updates]


3.2 Data Flow

This section describes the overall request and response dataflow that occurs when a web application requests and obtains a piece of content. This is described in terms of the functional components introduced above. Additional detail is available in Chapter 4 on component design and implementation.

3.2.1 Request Data Flow

[Diagram: Web Applications, Network Dispatcher, Proxy and Web Server, Namespace Manager, Content Cache, Affinity Manager, Publisher and Subscriber, Configuration Manager, Golgi Peer Management GUI, Peer Group and Origin Web Server, connected by Find, Fetch, Dispatch, Update, Get, Send and Receive operations numbered 1 to 9]

Figure 3-1: Request Data Flow

The steps in the request data flow can be summarized as follows:

1. Web applications make HTTP GET requests, which are intercepted by the network dispatcher.

2. The dispatcher assigns a worker thread to process the request and hands it to the proxy.

3. The proxy looks up the request URI in the namespace, which tracks local and remote sources for content.

4. If fresh content exists locally, then it is delivered from the cache.

5. If fresh content exists on a peer, then a request is sent to the peer to obtain the content.

6. Lastly, if the content doesn’t exist in the group or is stale, the request is sent to the origin server.

7. In parallel with this, the client is receiving updates about new content available in the peer group.

8. These updates are added to the namespace so the proxy can learn about new sources for content.

9. The peer group GUI manages group membership and monitors the availability of other hosts.
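Steps 3 through 6 amount to a source-selection rule: prefer a fresh local copy, then a fresh peer copy, then the origin. The sketch below illustrates that rule under stated assumptions. Freshness is modeled as a simple maximum age in seconds, the first fresh peer found is chosen (the real system also weighs peer performance and resource limits), and all type and function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class SourceSketch { LocalCache, Peer, Origin };

// Hypothetical view of one copy of the content known to the namespace.
struct KnownCopySketch {
    bool local;        // true if held in this host's content cache
    std::string host;  // holder of the copy
    long ageSeconds;   // age of the copy
};

struct ChoiceSketch {
    SourceSketch source;
    std::string host;  // empty when the origin server is used
};

// Prefer a fresh local copy, then a fresh peer copy, then the origin.
ChoiceSketch ChooseSource(const std::vector<KnownCopySketch>& copies,
                          long maxAgeSeconds) {
    const KnownCopySketch* freshPeer = nullptr;
    for (const auto& copy : copies) {
        if (copy.ageSeconds > maxAgeSeconds) continue;  // stale: skip
        if (copy.local) return {SourceSketch::LocalCache, copy.host};
        if (freshPeer == nullptr) freshPeer = &copy;    // remember first fresh peer
    }
    if (freshPeer != nullptr) return {SourceSketch::Peer, freshPeer->host};
    return {SourceSketch::Origin, ""};
}
```

If no copy is known, or every known copy is stale, the function falls through to the origin server, matching step 6.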


3.2.2 Response Data Flow

[Diagram: Web Applications, Network Dispatcher, Proxy and Web Server, Namespace Manager, Content Cache, Affinity Manager, Publisher and Subscriber, Configuration Manager, Golgi Peer Management GUI and Origin Web Server, connected by Receive, Store, Update, Send, Response and Done operations numbered 1 to 8]

Figure 3-2: Response Data Flow

The steps in the response data flow can be summarized as follows:

1. If the content existed on a peer, it is obtained from the peer via the private Web server interface.

2. If the content doesn’t exist in the peer group, the content is received from the origin server.

3. The metadata associated with cacheable content is sent to the publisher for distribution in the background.

4. Cacheable content is stored in the local content cache.

5. A namespace entry is created or updated with the locally cached content information.

6. The request is completed through the network dispatcher.

7. The response is delivered to the application.

8. In the background, updates to locally cached content are published to the peer group.

3.3 Base Class Architecture

The entire project is written in C++ using object-oriented design techniques and well-defined classes. At the base of the class hierarchy are three classes that standardize most of the common functionality in the system. These are shown in Figure 3-3.


CObject
+AssertValid() +Dump() +operator new() +operator delete()

CCommon
-ReferenceCount : unsigned long
+AddRef() +Release() +GetRef()

CInterface
-State : EInterfaceState
+Attach() +Detach() +Initialize() +Start() +Run() +Stop() +Uninitialize() +GetState()

CLock
-WriteMutex : CMutex
-ReadEvent : CEvent
-ReadMutex : CMutex
-WaitingReaders : unsigned long = 0
-WaitingWriters : unsigned long = 0
-LockTimeout : unsigned long = -1
+RequestWriteLock() +RequestReadLock() +ReleaseWriteLock() +ReleaseReadLock() +Lock() +Unlock()

EInterfaceState «enumeration»
+INTERFACE_UNINITIALIZED +INTERFACE_INITIALIZING +INTERFACE_INITIALIZED -INTERFACE_STARTING -INTERFACE_STARTED -INTERFACE_RUNNING -INTERFACE_STOPPING -INTERFACE_STOPPED -INTERFACE_UNINITIALIZING

Figure 3-3: Base Class Hierarchy

Detailed information on the base classes is available in the online documentation at the following URIs.

http://128.2.237.196/myweb/code/golgi/classCCommon.html

http://128.2.237.196/myweb/code/golgi/classCLock.html

http://128.2.237.196/myweb/code/golgi/classCInterface.html

3.3.1 CLock

CLock provides multiple readers/single writer locking semantics on any object, with optional timeouts to detect potential deadlocks. The Lock and Unlock methods are synonyms for RequestWriteLock and ReleaseWriteLock, respectively.
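The locking semantics can be illustrated with standard C++ facilities. The sketch below is not the project's CLock, which Figure 3-3 shows is built from its own mutex and event classes; it substitutes `std::shared_timed_mutex` to show the same idea of shared read locks, an exclusive write lock, and a timeout after which a lock request fails. The class name `LockSketch` and the use of a finite millisecond timeout are assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <shared_mutex>

// Hypothetical readers/writer lock with a timeout on every request.
class LockSketch {
public:
    explicit LockSketch(std::chrono::milliseconds timeout) : timeout_(timeout) {}

    // Many readers may hold the lock at once. Returns false on timeout.
    bool RequestReadLock() { return mutex_.try_lock_shared_for(timeout_); }
    void ReleaseReadLock() { mutex_.unlock_shared(); }

    // Only one writer may hold the lock, and only when no reader does.
    bool RequestWriteLock() { return mutex_.try_lock_for(timeout_); }
    void ReleaseWriteLock() { mutex_.unlock(); }

    // Synonyms for the write-lock operations.
    bool Lock() { return RequestWriteLock(); }
    void Unlock() { ReleaseWriteLock(); }

private:
    std::shared_timed_mutex mutex_;
    std::chrono::milliseconds timeout_;
};
```

A failed request (a `false` return) is how a caller in this sketch would notice a potential deadlock. Note that a thread must not request the lock again while it already holds it, since the standard mutex is not recursive.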

3.3.2 CCommon

CCommon provides atomic reference counting and dynamic release of objects, in addition to the common allocators and diagnostic methods inherited from CObject. Because objects are passed between modules with potentially many references and pointers to them, dynamic release ensures that only objects that are no longer referenced are deleted. A module adds a reference to an object when it doesn’t want the object to disappear (such as when it is


3.3.3 CInterface

The CInterface class solves the problem of initializing many mutually dependent modules by breaking the initialization process into four distinct phases and standardizing the types of operations that can be performed at each stage. These are captured in the methods that must be supported by a module interface:

Constructor The object constructor should construct base classes and other attributes, but perform no initialization.

Initialize Allocate memory, open files, and perform any setup that isn't dependent on other modules. At the end of this stage, the interface should be usable by other modules (although certain functionality might be delayed until the next stage).

Start Attach to dependent interfaces and complete any mutually dependent initialization.

Run Interfaces call this method to indicate startup code is done.

In addition, there are defined stages for returning an interface to its uninitialized state, which are captured in the Stop and Uninitialize methods:

Stop Detach from other interfaces and return to the Initialized state. The interface may still be used by others.

Uninitialize Return to the pre-initialized state. Object must be stopped at this point and all dependents detached.

Derived classes that override any of the default functions must call the CInterface versions before returning; these keep track of an interface’s state and ensure consistency between modules.

The startup module will perform construction, initialization and starting of all module interfaces. The configuration module is initialized first and is available for all other interfaces at their construction time. When an interface is used, the caller must first Attach() to it. When completed, call Detach(). This provides reference counting and dynamic release of interfaces.
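The staged lifecycle described in this section can be sketched as a small state machine. This is a simplified illustration, not the project's CInterface: it collapses the transitional states of EInterfaceState (such as INTERFACE_STARTING) into the stable ones, is not thread-safe, and uses hypothetical names (`InterfaceSketch`, `SketchState`, `CacheModuleSketch`). It does show the ordering rules: Initialize before Start before Run, Stop returning to the initialized state, and Uninitialize refused while callers are still attached.

```cpp
#include <cassert>

enum class SketchState { Uninitialized, Initialized, Started, Running };

// Hypothetical base class tracking lifecycle state and attached callers.
class InterfaceSketch {
public:
    virtual ~InterfaceSketch() = default;

    virtual bool Initialize() { return Advance(SketchState::Uninitialized, SketchState::Initialized); }
    virtual bool Start()      { return Advance(SketchState::Initialized, SketchState::Started); }
    virtual bool Run()        { return Advance(SketchState::Started, SketchState::Running); }

    // Return to the initialized state; the interface may still be used.
    virtual bool Stop() {
        if (state_ != SketchState::Started && state_ != SketchState::Running) return false;
        state_ = SketchState::Initialized;
        return true;
    }

    // Only allowed once stopped and with all dependents detached.
    virtual bool Uninitialize() {
        if (state_ != SketchState::Initialized || attached_ != 0) return false;
        state_ = SketchState::Uninitialized;
        return true;
    }

    // Callers attach before use and detach when done.
    bool Attach() {
        if (state_ == SketchState::Uninitialized) return false;
        ++attached_;
        return true;
    }
    void Detach() { if (attached_ > 0) --attached_; }

    SketchState GetState() const { return state_; }

private:
    bool Advance(SketchState from, SketchState to) {
        if (state_ != from) return false;
        state_ = to;
        return true;
    }

    SketchState state_ = SketchState::Uninitialized;
    unsigned long attached_ = 0;
};

// A derived module does its own setup, then calls the base version
// so the state change is recorded.
class CacheModuleSketch : public InterfaceSketch {
public:
    bool Initialize() override {
        // Module-specific setup (allocate memory, open files) would go here.
        return InterfaceSketch::Initialize();
    }
};
```

A startup routine in this sketch would call `Initialize()` on every module first and only then call `Start()` on each, so that any module can attach to any other during its start phase regardless of construction order.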


4 Component Design and Implementation

Detailed information on the methods and functions provided by each functional component is available in the online documentation in the project website (http://128.2.237.196/myweb/code/golgi/index.html). What follows here is a description of the design, operation and implementation that is not contained or may not be obvious from the individual method documentation available online.

4.1 Proxy and Web Server

The Proxy Web Server acts as a traditional proxy server on behalf of client applications. It intercepts all local HTTP requests and peer requests in order to service them. It oversees the process of locating the best source for the content (local, peer or origin) that satisfies the freshness constraints on the content, and it updates the namespace with the new distribution of objects, thus increasing the chances of sharing.

4.1.1 Design

Each request to the proxy arrives on a separate worker thread and is serviced by the operations that the proxy class defines. In addition to servicing local and peer requests, the proxy also provides local ratings and system information to the Publisher class. Below is a diagram that illustrates how each request gets serviced. In general, the request handler implements the following steps to service a request:

1. It queries the namespace to determine if the content can be found in the local cache and is fresh. If the query is successful, it services the client using the cached content.

2. If the query is unsuccessful, it determines its peering status and queries the namespace again to determine whether the content can be found fresh at any of its peer proxies.

3. If the content cannot be found at a peer, it gets the content from the origin server.

4. Before sending the content, it caches the content, provided it meets the caching directives, and sends an update to the namespace.
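The order of these steps can be sketched as follows. The Lookups structure, the Resolve function and the Source enumeration are hypothetical names introduced for this illustration; the real handler works with the Namespace, Cache and HTTP parser classes, and decides cacheability from the response headers rather than from the URI alone.

```cpp
#include <functional>
#include <optional>
#include <string>

enum class Source { LocalCache, Peer, Origin };

struct Resolution {
    Source source;     // where the content came from
    std::string body;  // the content itself
};

// Hypothetical hooks standing in for the namespace, peer and origin lookups.
struct Lookups {
    std::function<std::optional<std::string>(const std::string&)> local;  // fresh local copy, if any
    std::function<std::optional<std::string>(const std::string&)> peer;   // fresh peer copy, if any
    std::function<std::string(const std::string&)> origin;                // always answers
    std::function<bool(const std::string&)> cacheable;                    // caching directives allow it?
    std::function<void(const std::string&, const std::string&)> store;    // cache + namespace update
};

Resolution Resolve(const std::string& uri, bool peeringEnabled, const Lookups& l) {
    // Step 1: fresh content in the local cache is served directly.
    if (auto body = l.local(uri)) return {Source::LocalCache, *body};

    // Step 2: otherwise try the peers, if peering is active.
    std::optional<std::string> body;
    Source source = Source::Origin;
    if (peeringEnabled) {
        body = l.peer(uri);
        if (body) source = Source::Peer;
    }

    // Step 3: fall back to the origin server.
    if (!body) body = l.origin(uri);

    // Step 4: cache the content if allowed, then hand it back.
    if (l.cacheable(uri)) l.store(uri, *body);
    return {source, *body};
}
```

Keeping the three sources behind separate hooks makes the fallback order explicit and lets each source be replaced independently in a test.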

[Figure: how the proxy services a request. Labels: Interface, Worker thread, Request, Buffer, Web, Web Page, Outside Proxy, Inside Proxy, NameSpace, Peer Server, CProxyEntry, CCacheEntry, HTTP request, Request handler, Peer Status.]


4.1.2 Implementation

Below is a flow chart that illustrates the decision points implemented in the request handler. There are basically two scenarios: handling requests from the local application and handling requests from peers.

4.1.2.1 Handling requests from a local client (web browser)

[Flow chart: handling a request from a local client. The recoverable steps and decision points are listed below, with step numbers as labeled in the chart.]

Common steps:
1. Create a parser object and call its requestparse method to parse the request.
2. Query the Namespace with the URI.

Decision points: Is the URI found in the Namespace? Is it cached locally? Does the local cache still have it? Is the peer alive? Was the page found on the peer? What does the server say?

Content fetched from the Web Server:
3. Invoke the GetHttp function to get the requested web page from the Web Server.
4. Use the parser object to parse the response headers from the server.
5. Based on the HTTP cache directives, check whether the page can be cached.
6. If the page is cacheable, add or update the entry in the Namespace and insert the page into the cache along with its expiry date.
7. Give the page to the worker thread.

Content cached locally:
4. Ask the Web Server whether the content is still valid (conditional GET).
6. Give the cached content to the worker thread.

Content cached at a peer:
4. Get the name of the peer where the content is cached.
5. Ask the Affinity Manager if the peer is still alive.
6. Invoke the GetHttp function to get the page from the peer.
5. Cache the content locally if it was obtained from the peer.

4.1.2.2 Handling Requests from Peers

1. Create a parser object and call its requestparse method to parse the request.

2. Query the Cache with the URI.

3. Does the local cache still have it? If yes, give the cached content to the worker thread. If no, return “404 Not Found”.


4.1.3 Interface

4.2 Namespace Manager

The Namespace Manager provides a hierarchical namespace which supports very fast URI lookups and translation from request URIs supplied by Web applications to target local or remote objects using prefix matching and substitution. Local objects are those that are cached or available through the local file system; remote objects are those available on a particular host or through a participating peer group.

These services provide flexibility in managing and organizing content and content sources, similar to the filter specifications, map and alias directives in many HTTP servers. This functionality exceeds what the current project requires; it was included in anticipation of expanding the scope of the project in the future. Consider, for example, the simple source and translated URIs in Figure 4-1.

[Class diagram: Proxy classes.

HttpParse: methods HttpParse(), requestparse(), responseparse(), gethflag(), getrhflag(), setCTexpires(), gethostname(), getreqmethod(), geturi(); attributes ReqMessage, ResMessage, reqHead, resHead, reqUri : CUri.

CUri.

CProxyEntry: methods operator=(), AddSource(), ClearLocalSource(), FindSource(), GetLocalSource(), GetRemoteSource(), GetRemoteSourceCount(), GetUri(), IsLocal(), RemoveSource(), SetLocalSource(), SetUri(); attributes m_bLocal, m_LocalSource, m_SourceList, m_Uri.

CProxySource: attributes m_Host, m_RemoteHost, m_Source, m_ulRating.

CProxySourceData: attributes m_Created, m_ETag, m_Expires, m_Modified.

CProxy («interface»): methods CProxy(), CInitialize(), CReset(), CStart(), CStop(), GetHostName(), GetHostNameLength(), GetServiceRating(), IsCacheable(), PeerHandleRequest(), ProxyHandleRequest().

Related classes: CInterface, CCommon, CNamespaceEntry.]


URI Namespace Map

Request URI: /images/banner.gif
Alias: /images/banner.gif; Object: /images/banner_cmu.gif; Type: Local File Mapping (Exact)
Translated URI: /images/banner_cmu.gif

Request URI: /estore/pricing/collardgreens.html
Alias: /estore/pricing/*; Object: /presidentspecial/*; Type: Local File (Partial-Partial)
Translated URI: /presidentspecial/collardgreens.html

Request URI: /news/pa.html
Alias: /news/*; Object: http://128.2.11.43/reuters/*; Type: Cached Proxy (Partial-Partial)
Translated URI: http://128.2.11.43/reuters/pa.html

Request URI: /estore/index.html
Alias: /estore/*; Object: /subst/catalog/feb2002/*; Type: Local File (Partial-Partial)
Translated URI: /subst/catalog/feb2002/index.html

Request URI: /estore/monkeytoys/sticks/sharpones.html
Alias: /estore/monkeytoys/*; Object: /notavail.html; Type: Local File (Partial-Exact)
Translated URI: /notavail.html

Request URI: /checkout.html?custid=542123&oid=42
Alias: /checkout.html*; Object: http://128.2.11.43/checkout.html*; Type: Redirect (Partial-Partial)
Translated URI: http://128.2.11.43/checkout.html?custid=542123&oid=42

Request URI: /images/smileyarrow.gif
Translated URI: Invalid URI

Figure 4-1: URI Mapping

As illustrated above, the mapping service supports several variations of partial and exact matches. In a partial to partial mapping the target is substituted for the matching portion of the source and the resultant URI is used to locate the object. This is used when a complete portion of the URI namespace resides locally or remotely in its entirety and allows a compact representation. Partial-partial matches do not need to be aligned on directory boundaries. The service also supports exact matches where the entire request URI must match and it is replaced with the entire target URI. Lastly, the service supports partial to exact matches. In this case, all URIs with a matching prefix result in the same response. The exact and partial-partial mappings are the most used in the Golgi Apparatus.
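The three mapping variations can be sketched with a small translation function. The Alias structure, the MatchKind enumeration and the Translate function are hypothetical names, and the linear scan over a vector is a simplification used only for illustration; the Namespace Manager itself performs the lookup in its search tree. The sketch assumes that the longest matching alias wins.

```cpp
#include <optional>
#include <string>
#include <vector>

enum class MatchKind { Exact, PartialPartial, PartialExact };

// One mapping rule. For partial rules, `source` is the prefix without the trailing '*'.
struct Alias {
    std::string source;
    std::string target;
    MatchKind kind;
};

// Returns the translated URI for the longest matching alias, or nullopt
// when no alias matches (an invalid URI in the terms of Figure 4-1).
std::optional<std::string> Translate(const std::vector<Alias>& map, const std::string& uri) {
    const Alias* best = nullptr;
    for (const Alias& a : map) {
        const bool matches = (a.kind == MatchKind::Exact)
            ? uri == a.source                                  // whole URI must match
            : uri.compare(0, a.source.size(), a.source) == 0;  // prefix must match
        if (matches && (!best || a.source.size() > best->source.size())) best = &a;
    }
    if (!best) return std::nullopt;
    if (best->kind == MatchKind::PartialPartial)
        return best->target + uri.substr(best->source.size());  // substitute the matched prefix
    return best->target;  // Exact and PartialExact replace the entire URI
}
```

With the aliases of Figure 4-1, a request for /estore/pricing/collardgreens.html matches both /estore/* and /estore/pricing/*, and the longer prefix determines the result.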

The namespaces on members of an affinity group are loosely consistent replicas. As local updates occur to the namespace (primarily due to new content additions and cache replacement) these are logged to the Publisher and periodically sent to the affinity group based on either the time elapsed since the last update or on the quantity of changes since the last update. The window during which namespaces will vary between any two members of an affinity group can be reduced, at the expense of network traffic, by increasing the frequency of updates, but the mechanisms provided in this design never eliminate it completely.

4.2.1 Canonical URIs and Paths

URIs and file paths must be in a canonical form before they can function as keys or be tested for equivalence.

Likewise, raw file names and paths must be converted to valid URI syntax before being used in HTTP requests or responses. The rules governing URIs are defined in RFC 2396 which describes a generic syntax applicable to many URL schemes, including HTTP. This, and the specific interpretations in HTTP 1.1 [11], provide the basis for converting between raw paths and URIs. These URIs can take one of two forms:

Absolute URI http://host[:port][path[?query]]

Absolute Path path[?query]

The Namespace provides a separate simple class which handles the conversion of URIs to and from canonical form. It implements the rules described in the following sections.

Converting URIs to Paths

Windows file names can be up to 255 characters and consist of a restricted character set. This requires escaping mechanisms and canonicalization algorithms to make a filename which maps to a URI (and thereby avoid a separate index). The following rules form the basis of a conversion algorithm from URIs to canonical form (also known as canonical paths):


1. Start with the URI path. The access scheme, host and port should match those used to access the server but may be inaccurate or spoofed and are therefore discarded.

2. Discard any fragment identifier (a “#” and all characters following). Clients should never pass URI fragment requests to servers.

3. If the path is not present, assume it to be “/”, the virtual root. Empty paths are illegal in absolute paths but may occur in absolute URIs.

4. If the path is present but is not absolute (does not begin with “/”) return a 404 status (“Not Found”). Clients should always convert relative URIs to absolute form using a base URI.

5. Convert all escaped characters (“%” followed by two hex digits) except an escaped slash (%2F) into their ASCII byte equivalents. The slash is escaped in HTTP URIs so that its unescaped form can unambiguously be used as a path separator. Unescaped slashes are path separators and filename slashes will appear in escaped form (“%2F”). Escaped forms are always converted to uppercase.

HTTP does not place any limit on the length of a URI; however, the Golgi Apparatus limits URIs in canonical form to a maximum of 255 characters per path component and a maximum total length of 512 characters. URIs in excess of these limits will not be cached and will be passed through to the origin server. URIs are generated by a variety of clients, some of which may not adhere to the standards and some of which may be deliberately malicious. The conversion from URIs to paths is therefore designed to be both flexible and robust.
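Rules 1 through 5 above can be sketched as a single function. The name CanonicalPath is hypothetical, as is the host used in the example below. The sketch makes three assumptions that the rules do not state: the query string is passed through unmodified, a malformed escape (a “%” not followed by two hex digits) is copied as-is, and the length limits are enforced elsewhere.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Returns the canonical path, or nullopt where the rules call for a 404 response.
std::optional<std::string> CanonicalPath(std::string uri) {
    // Rule 2: discard any fragment identifier.
    if (auto hash = uri.find('#'); hash != std::string::npos) uri.erase(hash);

    // Rule 1: for an absolute URI, discard the scheme, host and port.
    const auto scheme = uri.find("://");
    if (scheme != std::string::npos && uri.find_first_of("/?") == scheme + 1) {
        const auto pathStart = uri.find_first_of("/?", scheme + 3);
        uri = (pathStart == std::string::npos) ? std::string() : uri.substr(pathStart);
    }

    // Assumption: the query is split off here and reattached unmodified.
    std::string query;
    if (auto q = uri.find('?'); q != std::string::npos) {
        query = uri.substr(q);
        uri.erase(q);
    }

    if (uri.empty()) uri = "/";                   // Rule 3: missing path is the virtual root
    if (uri.front() != '/') return std::nullopt;  // Rule 4: relative path is rejected

    // Rule 5: unescape %XX, except an escaped slash, which stays as uppercase %2F.
    std::string out;
    for (std::size_t i = 0; i < uri.size(); ++i) {
        if (uri[i] == '%' && i + 2 < uri.size() &&
            std::isxdigit(static_cast<unsigned char>(uri[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(uri[i + 2]))) {
            const int value = std::stoi(uri.substr(i + 1, 2), nullptr, 16);
            if (value == 0x2F) out += "%2F";
            else out += static_cast<char>(value);
            i += 2;
        } else {
            out += uri[i];
        }
    }
    return out + query;
}
```

For example, CanonicalPath("http://example.com:8080/a%20b/c%2fd.html#top") yields "/a b/c%2Fd.html": the host, port and fragment are dropped, the escaped space is decoded, and the escaped slash is kept in uppercase form.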

Converting Paths to URIs

The URI generic syntax allows most ASCII characters except those conventionally used as delimiters around URIs (“<” and “>”, for example) and non-printable control characters. Each scheme, context and URI component imposes its own set of restrictions. Only the characteristics of the URI path component need to be considered when converting from paths to URIs. These are as follows:

• Path characters that should appear in unescaped form are “a”…“z”, “A”…“Z”, “0”…“9”, “:”, “@”, “&”, “+”, “$”, “,”, “-”, “_”, “.”, “!”, “~”, “*”, “'”, “(”, and “)”. All other characters must be escaped. The escaping of path characters unnecessarily is deprecated, but does not change the semantics of the URI.

• Path segments (which include both directory and file names) consist of one or more path characters. There is no limit on the maximum length nor are there restrictions on the initial character. Within a path segment, the characters “/”, “;”, “=”, and “?” are reserved. Each path segment may include a sequence of parameters, indicated by the semicolon (“;”). The parameters are not significant to the parsing of relative references or in the processing of aliases during URI mapping.

• The path consists of a sequence of path segments separated by a single slash character (“/”). Absolute paths begin with a slash, indicating the virtual root. A path may be followed by query parameters indicated by the question mark (“?”).

To convert a path to a URI for use in an HTTP request or response, all characters outside those allowed in unescaped form are converted to their escaped form (an escaped slash (%2F) is left unmodified). The Golgi Apparatus preserves query parameters from the original request.
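The reverse conversion can be sketched in the same way. The function name PathToUri is hypothetical. The sketch treats letters, digits (unreserved in RFC 2396) and the punctuation listed above as unescaped characters, keeps “/” as the path separator, and leaves an existing “%2F” untouched; it handles the path only and assumes the query is reattached by the caller.

```cpp
#include <cctype>
#include <string>

// Escapes every character of a canonical path that may not appear unescaped in a URI path.
std::string PathToUri(const std::string& path) {
    static const std::string kUnescaped = ":@&+$,-_.!~*'()/";
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < path.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(path[i]);
        if (std::isalnum(c) || kUnescaped.find(path[i]) != std::string::npos) {
            out += path[i];                      // allowed as-is
        } else if (path.compare(i, 3, "%2F") == 0) {
            out += "%2F";                        // escaped slash is left unmodified
            i += 2;
        } else {
            out += '%';                          // everything else becomes %XX
            out += kHex[c >> 4];
            out += kHex[c & 0x0F];
        }
    }
    return out;
}
```

For example, PathToUri("/a b/c%2Fd.html") yields "/a%20b/c%2Fd.html", which is the inverse of the canonical path conversion for that name.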

4.2.2 Design

The namespace is managed as a ternary search tree consisting of construction nodes and user nodes (those associated with a user object stored in the namespace). This structure was chosen because it balances the memory efficiency of binary search trees against the search speed of digital search tries.


[Figure: binary search tree] A binary search tree that represents 12 common two-letter words. For every node, all nodes down the left child have smaller values; all nodes down the right child have greater values.

[Figure: digital search trie] Digital search tries store strings character by character. This tree represents the same set of words; each input word is shown beneath the node that represents it. In a tree representing lowercase words, each node has 26-way branching. Searches are fast: a search for “is” starts at the root, takes the “i” branch, then the “s” branch, and ends at the desired node. At every node, we have one of 26 possible branches. Tries have exorbitant space requirements: nodes with 26-way branching typically occupy 104 bytes, and 256-way nodes consume 1K.

[Figure: ternary search tree] A balanced ternary search tree for the same set of words. The low and high pointers are shown as solid lines, while equal pointers are shown as dashed lines. Each input word is shown beneath its terminal node. A search for the word “is” starts at the root, proceeds down the equal child to the node with value “s,” and stops there after two comparisons. A search for “ax” makes three comparisons to the first letter (“a”) and two comparisons to the second letter (“x”) before reporting that the word is not in the tree.

The disadvantage of this approach is that searches are efficient only when the tree is balanced. When the namespace is first populated by the cache, entries are added in balanced order. Since the cache content can change over time, it may be necessary to add support for rebalancing (or simply rebuilding) the tree when the depth grows too large relative to the breadth. This can be done in the background.
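A minimal ternary search tree illustrates the structure. The Node and TernaryTree names are hypothetical; the project's own nodes are CNamespaceEntry objects, which also carry a parent pointer and user data. The sketch stores keys only, marks user nodes with a flag, and performs no rebalancing.

```cpp
#include <memory>
#include <string>

// One tree node: a split character and three children.
struct Node {
    char split;
    bool isEntry = false;  // true for a "user node" that ends a stored key
    std::unique_ptr<Node> low, equal, high;
    explicit Node(char c) : split(c) {}
};

class TernaryTree {
public:
    void Add(const std::string& key) {
        if (key.empty()) return;
        std::unique_ptr<Node>* link = &m_root;
        std::size_t i = 0;
        while (true) {
            if (!*link) *link = std::make_unique<Node>(key[i]);  // construction node
            Node* n = link->get();
            if (key[i] < n->split) {
                link = &n->low;
            } else if (key[i] > n->split) {
                link = &n->high;
            } else {
                if (++i == key.size()) { n->isEntry = true; return; }
                link = &n->equal;
            }
        }
    }

    // Length of the longest stored key that is a prefix of `text` (0 when none).
    std::size_t LongestPrefix(const std::string& text) const {
        const Node* n = m_root.get();
        std::size_t i = 0, best = 0;
        while (n && i < text.size()) {
            if (text[i] < n->split) {
                n = n->low.get();
            } else if (text[i] > n->split) {
                n = n->high.get();
            } else {
                ++i;
                if (n->isEntry) best = i;
                n = n->equal.get();
            }
        }
        return best;
    }

    // Exact match: the longest stored prefix is the key itself.
    bool Contains(const std::string& key) const {
        return !key.empty() && LongestPrefix(key) == key.size();
    }

private:
    std::unique_ptr<Node> m_root;
};
```

Because the insertion order determines the shape of the tree, adding keys in sorted order degrades each low/high chain into a list, which is the imbalance discussed above.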

4.2.3 Interface

The Namespace Manager provides interfaces for use by the Proxy, Cache and Publisher modules that allow them to find entries that were previously added to the namespace, add entries to the namespace and delete existing entries.

The relationships and use of these methods by other modules is illustrated in Figure 4-2. Additional detail is available in the online documentation at:

http://128.2.237.196/myweb/code/golgi/classCNamespace.html http://128.2.237.196/myweb/code/golgi/classCNamespaceEntry.html


[Figure: the Proxy, the Publisher and Subscriber, and the Cache use the Find, Add and Delete methods of the Namespace Manager.]

Figure 4-2: Namespace Interface and Class Diagram

Objects to be stored in the namespace must be derived from the CNamespaceEntry class. This class provides the attributes necessary to manage a node in the ternary tree. Instances of the CNamespaceEntry class are used to build the intermediate nodes in the tree. The module uses runtime type information to determine if a tree node represents a user object or a structural object. The namespace maintains no information about the key used to add an object (although this information is implicit in the location of an object in the tree) so managing keys is the responsibility of callers. In the case of CProxyEntry, the URI is used as the key and carried in the CProxyEntry object.

The Namespace Manager interface is exposed through the global instantiation of the CNamespace class.

Synchronization with other modules is performed at the interface level. That is, the namespace methods lock the entire namespace interface for write (Delete, Add, Purge methods) or read (Find method). This greatly simplifies the management of the tree structure. The effect on performance is minimal because all operations complete very quickly and the most common operations are read-only and can therefore proceed concurrently. The second most common operation is updates to proxy objects, but these take place outside the namespace after a successful Find so namespace interface locking is not involved. The methods provided by CNamespace are described below.

Namespace Method Summary

Name Description

Add Add an object derived from CNamespaceEntry to the namespace using the specified ASCIIZ string as a key. This string is only used to construct the control structure and is not required after completion. If the operation is successful, the function returns a reference to the mapping object that corresponds to the URI. This is either a new object or, if the string already exists in the namespace, the existing one. The caller is responsible for determining whether or not the entry is a duplicate by checking the contents of the associated object (new entries are always allocated clean). The entries are created in a ternary search tree, which exhibits the best performance when it is well balanced. This should be taken into consideration when creating entries.

Delete Given a string, remove an object from the namespace. If there is a user object satisfying an exact match on the string, the namespace removes the reference it maintains on the object and returns the mapping entry object. The mapping object exists until it is specifically deleted by the caller. The underlying tree structure is not deleted as a result of this call, nor is there any rebalancing performed.

Find Given a string, search the namespace for a matching object and return the best match found. This may be either an exact match or the most complete prefix match. For example, if a map contains {ABCD,

[Class diagram: Namespace classes.

CNamespace: public methods Add(), Find(), Delete(), Purge(), Dump(); private methods Traverse(), DumpEntry(), PurgeEntry(); private attributes pRoot : CNamespaceEntry*, EntryCount : unsigned long, TotalCount : unsigned long, MaximumDepth : unsigned long, MaximumWidth : unsigned long.

CNamespaceEntry: public assignment and copy operators; private attributes SplitChar : char, Inserted : bool, VisitIndex : int, pLow : CNamespaceEntry*, pEqual : CNamespaceEntry*, pHigh : CNamespaceEntry*, pParent : CNamespaceEntry*.

Related classes: CCommon, CInterface, CProxyEntry.]


4.3 Content Cache

The cache serves as a database (and a database manager) where the ‘content’ is managed and stored. Our cache is responsible for maintaining all the stored content that has been accessed by the user’s browser. This replaces the default cache maintained by the browser so that we can manage our own aging and replacement policies.

4.3.1 Design

The cache provides a fast and effective mapping between a URI (uniform resource identifier) and the corresponding content. The user can specify the size of the cache to share with the ‘affinity group’, and the content cache is required to keep its size within this limit.

The cache manager is designed to optimize retrieval of cached content. In this light, the other operations on the cache, such as removal of expired content and replacement of content, are delayed until ‘absolutely necessary’ (lazy updates). In addition, these ‘secondary’ operations are performed as and when necessary within the execution of the ordinary operations on the cache (namely insert and fetch operations).

The intuition behind this design decision was the consideration that the goal of the Golgi Apparatus was to reduce (if not eliminate) user perceived latency, i.e., the latency observed when the user accesses content (a cache fetch).

The volume of fetch operations is expected to outnumber the volume of other operations (e.g. removing expired content from the cache). This follows from the fact that the content stored in the cache will be

1. primarily of a static nature
2. frequently accessed by the user.

In our implementation, we use a hash table with 4096 buckets. The hash key that we compute is a 32-bit number. Thus, if we compute a hash key x, the entry will be placed in the hash bucket numbered “x mod 4096”.

4.3.2 Interface

BOOL Fetch ( in URI , out Buffer , out Size)

The Cache interfaces with the Proxy Server, which accesses the cache to fetch cached pages (or pages that are requested by members of the affinity group). The cache will return the content referred to by the URI or FALSE if the content was not found.

BOOL Insert ( in URI , in Buffer , in Size , in Expiry)

The Proxy Server also informs the Cache when it fetches a new page that was not previously in the cache. The page is then added to the cache. The function returns FALSE if an error occurs.

BOOL Resize ( in Size )

This method allows the size of the cache to be reset.

BOOL MoveDir ( )

Allows the user to change the directory that is used to store cached content.

4.3.3 Other Private Methods

BOOL Chain ( in CacheEntry )

Inserts a cache entry into the age list and the hash table.

BOOL AddIndexFileEntry ( in CacheEntry )

Stores information about the cache entry in the index file so that it can be reloaded on next startup.

LPSTR GetFreshFileName ( )

Returns an unused filename for storing the next cache entry.

CACHEKEY GenerateHashKey ( in URI )

Generates the MD5 key value for the given URI.

BOOL PopulateCache ( )

Loads the cache at startup.

BOOL Remove ( in CacheEntry )

Removes the given entry from the cache.

4.3.4 Cache Lookup (Fetch)

A page in the cache is accessed by an external entity (the Namespace or the Proxy Server) by providing the URI for that particular page. The Cache returns the contents of the page to the requesting entity. The pages in the cache will be referenced using a fast hash function; namely MD5.


4.3.4.1 Motivation for using MD5

• It has proven effective for use in web caching and has been adopted by many cache implementations (Squid being one such).

• It is highly collision resistant and fair.

• It has been formally verified.

By choosing a hash-table structure for content retrieval, the computational complexity for a cache fetch is kept at O(1).

4.3.4.2 Typical Scenario

Suppose we have 40960 entries in our cache (which is reasonable for a cache size of 100MB). Assuming that the hashing function is fair and collision resistant, this will mean an average of 10 entries per hash-bucket. It is ‘unlikely’ (i.e., the event occurs with negligible probability) that two entries in the same hash bucket will have the same key value. To illustrate this, 2^32 / 4096 = 1048576 different key values map to the same hash-bucket. The probability that a requested key coincides with the key of a different URI among the 10 entries in its bucket is therefore about 10/1048576 = 9.5367*10^-6, and the probability that any two of the 10 entries share a key is about 45/1048576 = 4.2915*10^-5, since 10 entries form 45 pairs.

4.3.4.3 Algorithm

• Calculate the hash value (=x) of the URI (=U) whose contents are to be fetched

• Use hash-bucket whose number is the last 12 bits of x (same as x mod 4096)

• For each entry in this bucket (whose hash value is y and URI is U’)
    o If x == y
        If U == U’ // Content found
            If Content is not expired
                Read the content from disk.
                Move content to the head of the age list
                Return TRUE to the parent function
            Else
                Perform Cache Removal
                Return FALSE to parent function

• If no match is found, Return FALSE to the parent function

In the worst case, the expensive computations involved will be:

• One hash-key computation

• (If match is found) One string compare (URI) and one time comparison (expiry)

• One disk read
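The fetch algorithm can be sketched with in-memory structures. All names here (CacheEntry, HashKey, CacheSketch, Put) are hypothetical. Two substitutions are made for brevity and should not be read as the project's implementation: a 32-bit FNV-1a hash stands in for the MD5-derived key, and a string held in the entry stands in for the file on disk. The Namespace update performed on removal is omitted.

```cpp
#include <array>
#include <cstdint>
#include <ctime>
#include <list>
#include <string>
#include <vector>

struct CacheEntry {
    std::uint32_t key;     // 32-bit hash key of the URI
    std::string uri;
    std::string content;   // stands in for the cached file on disk
    std::time_t expires;
};

// FNV-1a, used here only as a stand-in for the MD5-derived 32-bit key.
std::uint32_t HashKey(const std::string& uri) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : uri) { h ^= c; h *= 16777619u; }
    return h;
}

class CacheSketch {
public:
    using AgeList = std::list<CacheEntry>;

    // Minimal insertion so the sketch can be exercised; no replacement policy.
    void Put(const std::string& uri, const std::string& content, std::time_t expires) {
        m_age.push_front(CacheEntry{HashKey(uri), uri, content, expires});
        m_buckets[m_age.front().key & 0xFFF].push_back(m_age.begin());
    }

    bool Fetch(const std::string& uri, std::time_t now, std::string& out) {
        const std::uint32_t x = HashKey(uri);
        auto& bucket = m_buckets[x & 0xFFF];  // last 12 bits, same as x mod 4096
        for (auto it = bucket.begin(); it != bucket.end(); ++it) {
            AgeList::iterator entry = *it;
            if (entry->key != x || entry->uri != uri) continue;  // cheap key test first
            if (entry->expires <= now) {  // expired: lazy removal during fetch
                m_age.erase(entry);
                bucket.erase(it);
                return false;
            }
            out = entry->content;                       // the "disk read"
            m_age.splice(m_age.begin(), m_age, entry);  // move to head of age list
            return true;
        }
        return false;  // no match in the bucket
    }

    const std::string& NewestUri() const { return m_age.front().uri; }
    std::size_t Size() const { return m_age.size(); }

private:
    AgeList m_age;                                               // head = most recently used
    std::array<std::vector<AgeList::iterator>, 4096> m_buckets;  // hash table
};
```

Comparing the 32-bit keys before the URIs means the string comparison runs only for the entry that is almost certainly the match.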

4.3.5 Cache Removal (Remove)

Pages are removed from the cache as and when they expire. Also new pages that are fetched for the browser may replace older pages (and infrequently used pages) from the cache. These changes in the cache are propagated to the Namespace Manager so that all members of the affinity group are made aware of the most current state of the distributed cache.

Cache removal is performed in two circumstances.

1. The content has expired (called during a cache fetch or while initializing the cache)


stored is minimal. Though it may not be as effective as LFUDA or GDSF, it has marginally lower performance plus the added benefits.

• Remove the entry from the age list (either we have a pointer to the entry or it is the tail of the age list)

• Remove the entry from the hash table

• Remove the entry from disk

• Remove the entry from the Namespace

4.3.6 Cache Insertion (Insert)

As long as there is space available in the cache, the insert operations will also be low overhead. Once the cache reaches maximum capacity, there will be (on average) one removal operation performed for every insert operation.

Noting that cache insertion is done after the content has been passed to the browser, even this overhead is unlikely to be apparent to the user.

• Calculate the hash-key (=x) for the content

• Look up the hash-bucket (x mod 4096)
    If the entry exists
        Write the new contents to the file in which the content was stored
        Update the expiry time of the content
        Move the entry to the head of the age list
        Update the Namespace
        Return to the parent function
    Else
        Insert the entry into the hash-bucket
        Insert the entry at the head of the age list
        Write the contents to a file on disk
        Update the Namespace
        Return to the parent function
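The update-or-insert logic, together with the removal that follows once the cache is full, can be sketched as follows. The InsertSketch class is hypothetical. It assumes that replacement removes entries from the tail of the age list, uses std::unordered_map in place of the 4096-bucket hash table, keeps content in memory instead of on disk, and omits the Namespace updates.

```cpp
#include <cstddef>
#include <ctime>
#include <list>
#include <string>
#include <unordered_map>

class InsertSketch {
    struct Entry {
        std::string uri;
        std::string content;  // stands in for the file on disk
        std::time_t expires;
    };

public:
    explicit InsertSketch(std::size_t maxBytes) : m_maxBytes(maxBytes) {}

    bool Insert(const std::string& uri, const std::string& content, std::time_t expires) {
        if (content.size() > m_maxBytes) return false;  // can never fit

        auto found = m_index.find(uri);
        if (found != m_index.end()) {
            // Existing entry: overwrite content, refresh expiry, move to head.
            m_bytes -= found->second->content.size();
            found->second->content = content;
            found->second->expires = expires;
            m_age.splice(m_age.begin(), m_age, found->second);
        } else {
            // New entry: add to the head of the age list and to the index.
            m_age.push_front(Entry{uri, content, expires});
            m_index[uri] = m_age.begin();
        }
        m_bytes += content.size();

        // Over capacity: remove from the tail of the age list until the cache fits.
        while (m_bytes > m_maxBytes) {
            const Entry& victim = m_age.back();
            m_bytes -= victim.content.size();
            m_index.erase(victim.uri);
            m_age.pop_back();
        }
        return true;
    }

    bool Contains(const std::string& uri) const { return m_index.count(uri) != 0; }
    std::size_t Bytes() const { return m_bytes; }

private:
    std::size_t m_maxBytes;
    std::size_t m_bytes = 0;
    std::list<Entry> m_age;  // head = most recently inserted or updated
    std::unordered_map<std::string, std::list<Entry>::iterator> m_index;
};
```

Because an oversized item is rejected up front, the removal loop always stops before it reaches the entry that was just placed at the head of the list.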

4.3.7 Cache Population (Populate)

At startup time, the contents of the cache have to be read from disk and the data structures (cache entries, hash table, and age list) have to be set up. The Namespace also has to be updated with the current contents of the cache.

This process can be (conceptually) viewed as equivalent to one cache insertion per entry that is stored in the cache (minus the write to disk). The cache stores its last consistent state in an index file, which is used to populate the cache during startup. The cache index file is also updated every time content is inserted into the cache.

The index file format is as follows:

index_file = entry*
entry = filename | TAB | size | TAB | expiry_time | TAB | uri
filename = A +
uri = {A , 1 , $} +
size = 1 +
expiry_time = year | , | mon | , | day | , | hr | , | mn | , | sc
year = 1 | 1 | 1 | 1
mon = day = hr = mn = sc = 1 | 1

LEGEND
| = concatenation
1 = a digit
A = alphabets
$ = special character

In a sense, Cache population is similar to cache insertion, except that in this case, the entries are already on the disk, we just have to inform the Cache about their presence and update its data structures accordingly.
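Reading one entry of the index file can be sketched as follows. The IndexEntry structure and the ParseIndexLine function are hypothetical names, and the sketch assumes that each entry occupies one line of the file, which the grammar above does not state. The file name and values in the example are invented for illustration.

```cpp
#include <cstdio>
#include <optional>
#include <sstream>
#include <string>

struct IndexEntry {
    std::string filename;
    unsigned long size = 0;
    int year = 0, mon = 0, day = 0, hr = 0, mn = 0, sc = 0;  // expiry_time fields
    std::string uri;
};

// Parses "filename TAB size TAB expiry_time TAB uri"; returns nullopt on a malformed line.
std::optional<IndexEntry> ParseIndexLine(const std::string& line) {
    std::istringstream in(line);
    IndexEntry e;
    std::string size, expiry;
    if (!std::getline(in, e.filename, '\t') || !std::getline(in, size, '\t') ||
        !std::getline(in, expiry, '\t') || !std::getline(in, e.uri))
        return std::nullopt;

    // size = one or more digits (very large values are not guarded against here)
    if (e.filename.empty() || size.empty() ||
        size.find_first_not_of("0123456789") != std::string::npos)
        return std::nullopt;
    e.size = std::stoul(size);

    // expiry_time = year,mon,day,hr,mn,sc
    if (std::sscanf(expiry.c_str(), "%4d,%2d,%2d,%2d,%2d,%2d",
                    &e.year, &e.mon, &e.day, &e.hr, &e.mn, &e.sc) != 6)
        return std::nullopt;
    return e;
}
```

For example, the line "cacheaaab<TAB>2048<TAB>2002,05,10,16,50,00<TAB>/news/pa.html" parses into an entry of 2048 bytes for /news/pa.html. During population, each parsed entry would be chained into the hash table and age list without rewriting the content file.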
