A Framework for Providing Strong Cache Consistency on the Web

Felicia Doswell

Department of Computer Science

Virginia Tech

Introduction

The popularity of the World Wide Web (Web) has generated so much network traffic that there is growing concern about how the Internet will scale to meet future demand. The growing user population and the large size of transmitted files raise different concerns for different kinds of Internet users: server administrators want a manageable load on their servers, network administrators need to eliminate unnecessary traffic so that more bandwidth is available for useful information, and end users want faster document retrieval. Proxy caches reduce the number of messages that enter the network by satisfying requests before they reach the server. However, proxies introduce the problem of maintaining consistency among cached versions of a document. The consistency protocols currently used in the Web are proving insufficient for its growing population; many messages, for example, result from caches guessing whether their copies are stale. One option is to apply the cache coherence strategies already used in other distributed systems, such as parallel computers, but these methods do not suit the Web because of its much larger size and wider range of users. Our research examines how popular Web documents change and what their request histories look like. The main objective of our work is to use these findings to design and implement a consistency algorithm that outperforms the mechanisms presented in the literature. We propose an algorithm that adapts based on requests for, and updates to, documents.
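
To make the idea concrete, the following is a minimal sketch, in Java (the language of our prototype), of the kind of decision an adaptive policy could make from a document's request and update history. The class name, thresholds, and rate comparison are illustrative assumptions for exposition, not the actual parameters of our algorithm.

```java
// Hypothetical sketch of an adaptive consistency policy: documents that are
// requested often relative to how often they change are kept strongly
// consistent (server-driven invalidation); rarely requested or rapidly
// changing documents fall back to TTL-based validation. Thresholds and
// names are illustrative, not the dissertation's actual parameters.
import java.time.Duration;
import java.time.Instant;

public class AdaptivePolicy {
    public enum Mode { SERVER_INVALIDATION, TTL_VALIDATION }

    private long requestCount = 0;           // requests seen at the proxy
    private long updateCount = 0;            // modifications observed at the server
    private final Instant firstSeen = Instant.now();

    public void recordRequest() { requestCount++; }
    public void recordUpdate()  { updateCount++; }

    /** Choose a consistency mode from the request/update history. */
    public Mode chooseMode() {
        double hours = Math.max(1.0,
                Duration.between(firstSeen, Instant.now()).toHours());
        double requestRate = requestCount / hours;
        double updateRate  = updateCount / hours;
        // Popular and relatively stable: worth the server's bookkeeping.
        if (requestRate > 10.0 && requestRate > 5.0 * updateRate) {
            return Mode.SERVER_INVALIDATION;
        }
        return Mode.TTL_VALIDATION;
    }
}
```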

Theoretical Background

Due to its simple interface to a wide array of media types and its accessibility from multiple platforms, the Web has become a major form of information dissemination. It provides easy access to information via Web browsers, sometimes referred to as "clients". This massive availability of information has led to a number of concerns. End users experience high latency when retrieving documents and images, and high bandwidth consumption and network congestion are also evident. In addition, a large share of Web traffic is due to fetching documents that have recently been retrieved, possibly by users at the same site. When multiple Internet users request the same item, network traffic and congestion can clearly be reduced if the information is stored closer to the users. Likewise, if a user or group of users accesses an item that has been retrieved before and the item has not changed, it is more efficient to serve the previously retrieved copy than to make a costly request to a remote server.

 

Proxy caching is one way to address many of these concerns. Proxy caches act as intermediate agents that allow multiple clients to quickly access a group of popular Web pages. In this approach, requests for documents are redirected to a cache rather than being serviced directly by the server. In general, cached copies should be updated when the originals change, which imposes strong consistency. However, users may accept an out-of-date copy in exchange for retrieving the document more quickly; most existing Web caches provide this type of weak consistency. On the other hand, if a stale copy is not tolerable, proxies must guarantee that the user sees the same version as the origin server. Current technology requires the user to ensure freshness, when desired, by pressing the reload button in the browser, which burdens the server as well as the user. We propose and analyze a framework that provides strong consistency through negotiation between the proxy and the server. Many algorithms in the literature combine strong and weak consistency, but usually to achieve bandwidth savings. We are concerned with maintaining strong consistency for popular documents while minimizing the server overhead of notifying every proxy that holds copies of its documents and while managing bandwidth consumption.
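
As an illustration of the server side of such negotiation, the sketch below shows a server that remembers which proxies hold a copy of each document and pushes invalidations when a document changes, so proxies never have to guess. The interfaces and names are assumptions made for exposition, not the actual classes of our framework.

```java
// Minimal sketch of server-driven invalidation for strong consistency:
// the server tracks the proxies caching each URL and notifies them when
// the document changes. Class and method names are illustrative.
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class InvalidationServer {
    /** Callback a proxy registers so the server can push invalidations. */
    public interface ProxyChannel {
        void invalidate(String url);
    }

    // url -> proxies that currently hold a cached copy
    private final Map<String, Set<ProxyChannel>> holders = new ConcurrentHashMap<>();

    /** Called when a proxy fetches and caches a document. */
    public void register(String url, ProxyChannel proxy) {
        holders.computeIfAbsent(url, k -> ConcurrentHashMap.newKeySet()).add(proxy);
    }

    /** Called when the origin document is modified. */
    public void documentChanged(String url) {
        Set<ProxyChannel> proxies = holders.remove(url);
        if (proxies == null) return;
        for (ProxyChannel p : proxies) {
            p.invalidate(url);   // each proxy drops or revalidates its copy
        }
    }
}
```

The cost the framework must manage is visible here: the server's bookkeeping and notification traffic grow with the number of proxies holding popular documents, which is exactly the overhead we aim to bound.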

Research Goals

This research effort will provide possible solutions to the consistency problem introduced when caching is employed in the Web. The principal objective is to develop a new technique for providing consistent documents while minimizing network and server cost. The objectives of the proposed research are the following:

 

  1. Identify which performance measures are most useful for evaluating consistency of cached documents.
  2. Quantitatively assess the cost of maintaining consistency with HTTP/1.x using the measures identified in Objective 1.
  3. Evaluate the feasibility of server-based invalidation as compared to client-based consistency (a sketch of the client-based validation mechanism follows this list).
  4. Acquire data (server log files) and perform pre-analysis on the data to determine the characteristics of Web access.
  5. Develop a new algorithm for consistency that results in low staleness and minimal cost. This includes making the necessary modifications to HTTP/1.1 in order to use the new algorithm.
  6. Establish the correctness of the new algorithm.
  7. Demonstrate the performance of the algorithm on benchmarks using measures from Objective 1.
  8. Compare the new algorithm to the existing proposed algorithms.
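
For comparison with server-based invalidation (Objective 3), the following hedged sketch shows the client-driven validation that HTTP/1.x already supports: the proxy revalidates its copy with a conditional GET and reuses it when the server replies 304 Not Modified. The URL and the decision of when to poll are placeholders; only the validation mechanism matters here.

```java
// Client-driven (polling) consistency: revalidate a cached document with
// If-Modified-Since and rely on 304 Not Modified to reuse the cached copy.
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ClientValidation {
    /**
     * Returns true if the cached copy is still fresh (server answered 304),
     * false if the server would send a new copy.
     */
    public static boolean stillFresh(String url, long cachedLastModified)
            throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        // If-Modified-Since carries the timestamp of our cached copy.
        conn.setIfModifiedSince(cachedLastModified);
        int status = conn.getResponseCode();
        conn.disconnect();
        return status == HttpURLConnection.HTTP_NOT_MODIFIED;  // 304
    }

    public static void main(String[] args) throws IOException {
        long cachedAt = System.currentTimeMillis() - 60_000;   // cached a minute ago
        boolean fresh = stillFresh("http://example.com/index.html", cachedAt);
        System.out.println(fresh ? "serve cached copy" : "fetch new copy");
    }
}
```

Every such poll costs a round trip and a server request whether or not the document changed, which is the overhead Objective 2 quantifies and server-based invalidation seeks to avoid.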

Research Status

We have completed all parts of this research except the last two items. This required implementing our own proxy and server, because we wanted an efficient application stripped of the unnecessary features present in the proxy and server systems currently available to the public. The proxy and server with the consistency algorithm have been implemented in the Java programming language (for ease of implementation) and in the C programming language (for more efficiency in running the benchmark). We are currently preparing the benchmark to run thousands of proxies, to simulate traffic between each proxy and the server using the new consistency algorithm, and to compare it with three other existing algorithms.
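
The structure of the benchmark driver is simple; a simplified sketch of the idea appears below, with an assumed simulateRequest placeholder standing in for one proxy-to-server exchange. The proxy and request counts are illustrative, not the actual benchmark configuration.

```java
// Illustrative sketch (not the actual benchmark) of driving many simulated
// proxies concurrently: each task stands in for one proxy issuing requests
// against the server under the consistency algorithm being measured.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BenchmarkDriver {
    public static void main(String[] args) throws InterruptedException {
        final int proxyCount = 1000;        // scaled to thousands in the real runs
        final int requestsPerProxy = 100;
        ExecutorService pool = Executors.newFixedThreadPool(64);

        for (int i = 0; i < proxyCount; i++) {
            final int proxyId = i;
            pool.submit(() -> {
                for (int r = 0; r < requestsPerProxy; r++) {
                    // Placeholder for one proxy-to-server request under the
                    // algorithm under test; metrics (staleness, messages,
                    // bytes) would be recorded here.
                    simulateRequest(proxyId, r);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }

    private static void simulateRequest(int proxyId, int requestId) {
        // Stand-in for issuing a request and recording cost measures.
    }
}
```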

Interim Conclusion

The cache consistency problem needs to be addressed to the point that administrators prefer to use the mechanism without the many costs that arise from the currently implemented and proposed methods. Many experiments still need to be carried out to understand the characteristics of data on the Web, to determine which consistency mechanism to use based on that data and on other trends in the Internet, and to establish how best to maintain consistent copies throughout the Web. We have investigated some characteristics of Web documents that affect how consistency should be performed in the Web. We conclude that strong consistency is needed for popular documents and that in some cases weak consistency is an acceptable option.

Program Status

I have completed my Qualifying Exam and the Ph.D. Preliminary defense. I am in the final stages of my Ph.D. dissertation work. I plan to complete my work by May 2003.

Expectation from Doctoral Consortium Participation

This is my first time participating in a doctoral consortium. I hope to gain from the experience of presenting to a broad audience of Computer Science professionals and educators who are not directly involved in my current research.

References

1. P. Cao and C. Liu. Maintaining strong cache consistency in the World Wide Web. In Proceedings of ICDCS '97, pages 12-21, May 1997.

2. C. Gray and D. Cheriton. Leases: An efficient fault-tolerant mechanism for distributed cache consistency. In Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, pages 202-210, 1989.

3. J. Gwertzman and M. Seltzer. World-Wide Web cache consistency. In USENIX Symposium on Internetworking Technologies and Systems, pages 141-152, January 1996.

4. B. Krishnamurthy and C. E. Wills. Piggyback server invalidation for proxy cache coherency. In Seventh International World Wide Web Conference, volume 30, pages 185-193, April 1998.

5. A. Luotonen. Web Proxy Servers. Prentice-Hall, London, 1998.

6. K. Worrell. Invalidation in large scale network object caches. Master's thesis, University of Colorado, Boulder, 1994.

7. J. Yin, L. Alvisi, M. Dahlin, and C. Lin. Volume leases for consistency in large-scale systems. IEEE Transactions on Knowledge and Data Engineering, 1999.