

3.4.2 Algorithm Development and Analysis

Our approach supports both initial configuration, based on requests and responses recorded in an existing log file, and dynamic adaptation of that configuration in response to observed changes in real traffic as they happen. Synchronous Internet Nodes for Consistency (SINC) is structured as a set of basic policies responsible for handling HTTP/1.1 request and response messages and for caching documents. Communication is carried over the HTTP/1.1 protocol, with changes to the messaging to handle invalidation. Our measured criteria are messages, bytes, control messages, file transfers, unnecessary updates, staleness count, period of staleness, and server state overhead. We use a workload of traces from various organizations. Since we need to compare a wide variety of characteristics, the benchmark approach provides the best way to evaluate alternatives.

Feasibility of Server-Based Invalidation

4.1 Introduction

The popularity of the Web and its inherent problems have produced numerous research efforts to reduce bandwidth requirements and document latency, both of which concern users and network administrators. Most of this research observes Web performance by studying traffic under certain predefined conditions. This chapter presents preliminary work done to show the feasibility of employing server-based invalidation. The study uses an analytical approach to determine which cache consistency algorithms work best given certain network parameters.

The purpose of this study is to address the cache consistency issue. It provides an analytical approach to approximate which cache consistency algorithms work best given certain network parameters. A study was done to ascertain the rate at which documents on the Web change and to reveal how many messages are required to communicate these changes to proxies. This is accomplished by measuring when changes occur in documents at regular intervals and then estimating the number of messages that would be needed to propagate those changes. This allows us to enumerate the circumstances in which certain cache consistency mechanisms are most appropriate.
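The message estimate described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the function names are hypothetical, a change is assumed to be visible as a difference between consecutive samples of a document's last-modified stamp, invalidation is assumed to cost one message per proxy holding a copy, and polling is assumed to cost one conditional GET plus one response per proxy request.

```python
def count_changes(samples):
    """Count observed changes in last-modified stamps sampled at regular intervals.

    Changes that occur more than once between two samples are seen as one,
    so this is a lower bound on the true number of modifications.
    """
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur != prev)


def invalidation_messages(samples, proxies_holding_copy):
    """Assumed cost model: one invalidation message per change per proxy."""
    return count_changes(samples) * proxies_holding_copy


def polling_messages(num_requests):
    """Assumed cost model: each request sends a conditional GET
    (If-Modified-Since) and receives one response."""
    return 2 * num_requests


if __name__ == "__main__":
    # Hypothetical last-modified stamps for one document, sampled six times.
    stamps = [100, 100, 130, 130, 130, 170]
    print(count_changes(stamps))
    print(invalidation_messages(stamps, proxies_holding_copy=5))
    print(polling_messages(num_requests=40))
```

Under these assumptions, invalidation is cheaper whenever the number of changes multiplied by the number of proxies is smaller than twice the number of requests, which is the kind of circumstance the study sets out to enumerate.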

Two experiments are performed. First, we measure the fraction of documents that are modified often and try to determine whether the changes occur in a predictable manner. In other words, if a large number of documents change in a short period of time, do they change every x hours? If x is large, but the rate of read accesses is small, then today’s version of HTTP generates many unnecessary requests to the server. A major part of this study is to determine how many proxies would be notified of changes in a server document if server invalidation were used. We also analyze the data to reveal how many proxies actually access more than one document at a server or access a document multiple times. This reveals how much overhead is involved in using the invalidation method to disseminate changes at the server. We want to provide consistent documents while minimizing the number of consistency checks necessary to accomplish this task.
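One way to test whether a document changes "every x hours" is to look at how regular the gaps between its observed modifications are. The sketch below is an assumption for illustration, not the method used in the study: the function names and the 0.25 threshold are hypothetical, and regularity is measured by the coefficient of variation (standard deviation divided by mean) of the gaps.

```python
from statistics import mean, pstdev


def change_intervals(mod_times):
    """Return the gaps between consecutive modification times (e.g. in hours)."""
    times = sorted(mod_times)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def is_predictable(mod_times, max_cv=0.25):
    """Treat a document as changing predictably when its gaps are nearly equal.

    max_cv is a hypothetical cutoff on the coefficient of variation.
    At least two gaps are required before any judgment is made.
    """
    intervals = change_intervals(mod_times)
    if len(intervals) < 2:
        return False
    avg = mean(intervals)
    if avg == 0:
        return False
    return pstdev(intervals) / avg <= max_cv


if __name__ == "__main__":
    print(is_predictable([0, 6, 12, 18, 24]))   # modified every 6 hours
    print(is_predictable([0, 1, 20, 22, 60]))   # irregular modifications
```

For a document judged predictable, the mean gap gives an estimate of x, which can then be compared against the read-access rate.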

The workload characteristics used in the feasibility study are based on log file analysis. The logs for this study are dated, mainly due to the lack of access to current log files. Many companies, government organizations, and Internet service providers do not provide access to their log files due to security and privacy concerns. This results in a reliance on log files from universities that engage in Web and Internet traffic analysis. We are aware that the Web has changed since 1999 with the increase in e-commerce and the rise in broadband users. According to the Nielsen NetRatings 2005 report, the number of Internet users has grown to 200,933,147, or 67.8% of the U.S. population. This is roughly double the usage reported by UTI in 2000 of 95,354,000 Internet users, or 33.1% of the U.S. population. However, we are only using these log files to evaluate existing algorithms for feasibility of implementation. While the workload for this study is not current, we use more recent files for the performance analysis in Chapter 6.

The overall intent of our research is to define criteria for restructuring Web consistency protocols to adapt to specific conditions of the Internet. This includes understanding how popular documents in the Web change, determining their request history, and identifying the problems with existing coherency mechanisms. This is accomplished by studying and understanding the pattern of access to documents in terms of the frequency of requests and the number of proxy accesses for a document. A primary goal of this study is to determine the effectiveness of server invalidation and its role in providing consistency by defining when invalidation is most useful. Finally, a set of effective consistency policies is identified from the investigation. We make recommendations based on how frequently documents change and are accessed.

We answer the following questions.

• Are the documents modified in a predictable manner (never, sometimes, often, always)?

• How does the access and modification rate affect the number of messages sent?

• How many web sites need to be notified of a change in a document if server invalidation is used?

• How often do proxies contact the server for the same document and for different documents?
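The questions about notification fan-out and repeated proxy accesses can be explored with a single pass over an access log. The following is a minimal sketch under stated assumptions: the log has already been parsed into (proxy, document) pairs, and the function name and return values are hypothetical rather than taken from the study.

```python
from collections import Counter, defaultdict


def summarize_accesses(records):
    """Summarize (proxy, document) access records.

    Returns a tuple of:
      fan_out           - dict: document -> number of distinct proxies that
                          requested it (proxies to notify on a change)
      repeat_pairs      - number of (proxy, document) pairs seen more than once
      multi_doc_proxies - number of proxies that requested more than one document
    """
    proxies_per_doc = defaultdict(set)
    docs_per_proxy = defaultdict(set)
    hits = Counter()

    for proxy, doc in records:
        proxies_per_doc[doc].add(proxy)
        docs_per_proxy[proxy].add(doc)
        hits[(proxy, doc)] += 1

    fan_out = {doc: len(proxies) for doc, proxies in proxies_per_doc.items()}
    repeat_pairs = sum(1 for count in hits.values() if count > 1)
    multi_doc_proxies = sum(1 for docs in docs_per_proxy.values() if len(docs) > 1)
    return fan_out, repeat_pairs, multi_doc_proxies


if __name__ == "__main__":
    # Hypothetical parsed log entries.
    log = [("p1", "/a"), ("p1", "/a"), ("p2", "/a"), ("p1", "/b"), ("p3", "/b")]
    fan_out, repeat_pairs, multi_doc_proxies = summarize_accesses(log)
    print(fan_out, repeat_pairs, multi_doc_proxies)
```

The fan-out per document bounds the number of invalidation messages a single change would trigger, while the repeat and multi-document counts indicate how much per-proxy state the server would need to keep.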

4.2 Analytical Evaluation of Consistency Mechanisms