A Framework for Providing Strong Cache Consistency on the Web
Felicia Doswell
Department of Computer Science
Virginia Tech
The popularity of the World Wide Web (Web) has generated so much network traffic that there is growing concern about how the Internet will scale to meet future demand. The increasing number of users and the large size of transmitted files raise concerns for several classes of Internet users. Server administrators want a manageable load on their servers. Network administrators need to eliminate unnecessary traffic, thereby freeing bandwidth for useful information. End users desire faster document retrieval. Proxy caches decrease the number of messages that enter the network by satisfying requests before they reach the server. However, the use of proxies introduces the problem of maintaining consistency among cached document versions. The consistency protocols currently used on the Web are proving insufficient for its growing population; for example, many messages are generated by caches guessing when their copies have become inconsistent. One option is to apply the cache coherence strategies already in use for other distributed systems, such as parallel computers. However, these methods do not suit the Web, with its much larger size and broader range of users. Our research includes characterizing how popular Web documents change and determining their request histories. The main objective of our work is to use these findings to design and implement a consistency algorithm that improves on the mechanisms presented in the literature. We propose an algorithm that adapts based on the requests for, and updates to, each document.
Due to its simple interface to a wide array of media types and its accessibility from multiple platforms, the Web has become a major form of information dissemination. It provides easy access to information via Web browsers, sometimes referred to as "clients". This massive availability of information has led to a number of concerns. End users experience high latency when retrieving documents and images. High bandwidth consumption and network congestion are also evident. In addition, a large amount of Web traffic is due to fetching documents that have recently been retrieved, possibly by users at the same site. When multiple Internet users request the same item, network traffic and congestion can clearly be reduced if the information is stored closer to the users. Moreover, if a user or a group of users accesses an item that has been retrieved before and has not changed since, it is more efficient to serve the previously retrieved copy than to make a costly request to a remote server.
Proxy caching is one way to address many of these concerns. A proxy acts as an intermediate agent that allows multiple clients to quickly access a group of popular Web pages. In this approach, requests for documents are redirected to a cache rather than being serviced directly by the server. In general, cached copies should be updated whenever the originals change, imposing strong consistency. However, users may accept an out-of-date copy in order to retrieve the document more quickly; most existing Web caches provide this type of weak consistency. On the other hand, if a stale copy is not tolerable, proxies must guarantee that the user sees the same version as the origin server. Current technology requires the user to ensure freshness, when desired, by pressing the reload button on a browser, which burdens the server as well as the user. We propose and analyze a framework that provides strong consistency by allowing negotiation between the proxy and the server. Many algorithms proposed in the literature combine strong and weak consistency, but usually to achieve bandwidth savings. We are concerned with maintaining strong consistency for popular documents while minimizing the server overhead of notifying all proxies that hold copies of its documents and managing bandwidth consumption.
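The contrast between the two consistency models can be sketched in code. The following is a minimal, hypothetical illustration (class and method names are invented, not taken from the paper's implementation): weak consistency serves a copy while a heuristic lifetime has not expired, whereas strong consistency validates the cached version against the origin server before serving.

```java
// Hypothetical sketch of the two consistency checks a proxy might perform.
// All names (ConsistencySketch, CacheEntry, isFreshWeak, isFreshStrong)
// are illustrative assumptions, not the paper's actual code.
public class ConsistencySketch {
    static class CacheEntry {
        long fetchedAt;      // time the proxy cached this copy (ms)
        long ttlMillis;      // heuristic lifetime used for weak consistency
        long serverVersion;  // document version the proxy last saw

        CacheEntry(long fetchedAt, long ttlMillis, long serverVersion) {
            this.fetchedAt = fetchedAt;
            this.ttlMillis = ttlMillis;
            this.serverVersion = serverVersion;
        }
    }

    // Weak consistency: serve the copy while its TTL has not expired,
    // even though the origin may already have changed.
    static boolean isFreshWeak(CacheEntry e, long now) {
        return now - e.fetchedAt <= e.ttlMillis;
    }

    // Strong consistency: compare against the origin's current version
    // before serving, so the user never sees a stale copy.
    static boolean isFreshStrong(CacheEntry e, long currentServerVersion) {
        return e.serverVersion == currentServerVersion;
    }

    public static void main(String[] args) {
        CacheEntry e = new CacheEntry(0L, 60_000L, 3L);
        System.out.println(isFreshWeak(e, 30_000L));  // within TTL
        System.out.println(isFreshWeak(e, 90_000L));  // TTL expired
        System.out.println(isFreshStrong(e, 3L));     // origin unchanged
        System.out.println(isFreshStrong(e, 4L));     // origin has changed
    }
}
```

The weak check costs no network messages but can return stale data; the strong check requires learning the origin's current version, which is exactly the notification/validation traffic the proposed framework tries to minimize.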
This research effort will provide possible solutions to the consistency problem introduced when caching is employed on the Web. The principal objective is to develop a new technique for providing consistent documents while minimizing network and server cost. The objectives of the proposed research are the following:
We have completed all parts of this research except the last two items. This required implementing our own proxy and server, because we wanted an efficient application free of the unnecessary features present in publicly available proxy and server implementations. The proxy and server, along with the consistency algorithm, have been implemented in the Java programming language (for ease of implementation) and in the C programming language (for greater efficiency when running the benchmark). We are currently preparing the benchmark to run thousands of proxies and to simulate traffic between each proxy and the server using the new consistency algorithm, comparing it with three other existing algorithms.
The cache consistency problem needs to be addressed to the point that administrators prefer to use the mechanism without the many costs that arise from the currently implemented and proposed methods. Many experiments must be carried out to understand the characteristics of data on the Web, to determine what consistency mechanism to use based on the data and other trends in the Internet, and to decide how best to maintain consistent copies throughout the Web. We have investigated some characteristics of Web documents that affect how consistency should be performed on the Web. We conclude that strong consistency is needed for popular documents and that in some cases weak consistency is an option.
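This conclusion suggests a per-document policy choice, in the spirit of the adaptive algorithm proposed above. The sketch below is a hypothetical illustration only: the threshold and decision rule are invented for the example and are not the paper's actual algorithm. It selects strong consistency (server invalidation) for documents that are both popular and changing, and falls back to a weak TTL-based scheme otherwise.

```java
// Hypothetical adaptive policy sketch: pick a consistency mode per document
// from its observed request and update history. The threshold value is an
// assumption for illustration, not taken from the paper.
public class AdaptivePolicy {
    enum Mode { STRONG_INVALIDATION, WEAK_TTL }

    // Decide the mode from counts observed over a fixed observation window.
    static Mode choose(int requests, int updates) {
        final int POPULARITY_THRESHOLD = 10; // assumed cut-off for "popular"
        // Popular documents that change should never be served stale,
        // so the server keeps their proxies strongly consistent.
        if (requests >= POPULARITY_THRESHOLD && updates > 0) {
            return Mode.STRONG_INVALIDATION;
        }
        // Rarely requested or never-updated documents tolerate weak
        // consistency, saving invalidation traffic at the server.
        return Mode.WEAK_TTL;
    }

    public static void main(String[] args) {
        System.out.println(choose(50, 5)); // popular and changing
        System.out.println(choose(3, 5));  // unpopular
        System.out.println(choose(50, 0)); // popular but never updated
    }
}
```

The design intuition matches the paper's stated goal: invalidation messages are spent only where staleness would actually be observed by many users, bounding the server's notification overhead.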
I have completed my Qualifying Exam and the Ph.D. Preliminary defense. I am in the final stages of my Ph.D. dissertation work. I plan to complete my work by May 2003.
This is my first time participating in a doctoral consortium. I hope to gain from the experience of presenting to a broad audience of Computer Science professionals and educators that are not directly involved in my current research.
1. P. Cao and C. Liu. Maintaining strong cache consistency in the World Wide Web. In Proceedings of ICDCS '97, pages 12-21, May 1997.
2. C. Gray and D. Cheriton. Leases: An efficient fault-tolerant mechanism for distributed cache consistency. In Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, pages 202-210, 1989.
3. J. Gwertzman and M. Seltzer. World-Wide Web cache consistency. In Proceedings of the USENIX Symposium on Internetworking Technologies and Systems, pages 141-152, January 1996.
4. B. Krishnamurthy and C. E. Wills. Piggyback server invalidation for proxy cache coherency. In Seventh International World Wide Web Conference, pages 185-193, April 1998.
5. A. Luotonen. Web Proxy Servers. Prentice-Hall, London, 1998.
6. K. Worrell. Invalidation in large scale network object caches. Master's thesis, University of Colorado, Boulder, 1994.
7. J. Yin, L. Alvisi, M. Dahlin, and C. Lin. Volume leases for consistency in large-scale systems. IEEE Transactions on Knowledge and Data Engineering, 1999.