Component-Based Framework for Memory Management Studies

Stephen P. Carl (scarl@cs.wright.edu)
Dept. of Computer Science and Engineering, Wright State University

Overview

Programs developed using modern programming languages make extensive use of dynamic allocation to create and manipulate objects on the heap. Once an object is no longer needed in a program, the object must be removed from the heap either manually or automatically. Such recycling of objects on the heap is necessary because the heap is a bounded resource. Runtime systems which provide automatic memory management systems for allocation and reclamation of objects on the heap are a key element in making software systems more reliable and modular, by freeing programmers from the error-prone task of reclaiming objects from memory, and by reducing the dependencies between modules which share references to heap objects.

The performance of the runtime system will have an impact on the performance of the program being executed. Traditionally an execution environment provides a single mechanism for handling storage allocation and another for handling reclamation, yet it has been shown for both allocators [WJNB95] and collectors [FT00] that different types of applications perform better when mechanisms other than those ``built-in'' to the system are used. In short, the memory usage patterns of an application (or even distinct phases of execution of an application) determine what memory management strategy will result in the best performance.

We are building a memory management simulator to measure the performance of different mechanisms on particular applications, under the assumption that choosing a mechanism which matches a program's allocation behavior will improve performance more than trying to tune a built-in mechanism to the program. The simulator is component-based in the sense that management mechanisms can be prototyped as units and loaded either at startup or dynamically as the simulation system runs. Further, the simulator has the following properties:

Experimental Control. Simulator is controlled by scripts which allow the user to specify such experimental parameters as heap size, input format, manager mechanisms to use, and what performance information to collect.
Prototype Framework. Memory management policies and mechanisms can be prototyped in the language to more adequately explore the design space. Simulations for different runtimes can be built quickly and flexibly compared to developing new managers .
Language independence. The user will be able to tailor the system to any specific language model.
Extensibility. Work in memory management systems encompasses a wide range of architectures for runtime systems and new proposals are forthcoming. The runtime components provide a foundation but may be extended so that our studies can proceed from basic systems, to more advanced systems, to newly-proposed systems.

Previous Work

Memory managment. Automatic systems for managing heap memory date from the early 1960's. Two classical storage reclamation techniques are reference counting and mark/sweep collectors [JL96]. Reference counting associates a count with each allocated object equal to the number of references to that object;  when the count reaches zero, the object is unused and can be reclaimed.  An advantage of reference counting is that the work of reclaiming storage is distributed throughout the normal execution of an application. One major disadvantage to reference counting is that objects in cyclic data structures will never have a reference count of zero, and so will never be recycled.

The mark/sweep method periodically "takes over" execution from a running program to examine the objects on the heap. In the mark phase, all heap objects which are reachable through some sequence of references are marked as "live". Then the sweep phase looks at each object currently allocated, and deallocates those which do not have a mark (or, are "dead"). A disadvantage is that every object on the heap must be examined, which for large heaps with many objects can result in a substantial pause time during which the program can do no work.

An improvement intended to address pause times is called copying collectors. The simplest of these divides the heap into to semispaces, and allocation occurs in only one at a time. When the currently allocated semispace is full, all live objects are copied from it into the other semispace and the process continues. In 1981, Lieberman and Hewitt developed the weak-generational hypothesis, which posits that most objects have a short life-time and can be recycled soon after they are allocated [LH83]. This led to the development of generational copying collectors, in which the heap is divided into two or more "generations."  All objects are allocated from the first generation (sometimes called the nursery); those objects which are still live when this generation becomes full are "promoted" or "tenured" into a succeeding generation. One special consideration which is specific to this type of collector is how to keep track of references between generations [JL96].  Generational collection is very flexible in the number of partitions in the heap and the way in which each partition is managed. For example, the Java HotSpot Performance Engine manages tenured objects using a mark/sweep collector; it is also possible to use the advanced train algorithm to incrementally manage this generation [HM92].

Memory Simulators. Simulators have been used for a number of years to study both the allocation behavior of specify applications and the behavior of memory management techniques. One of the first described systems was MARS, (Memory Allocation and Reference Simulator), described in Ben Zorn's dissertation [Zorn89]. This simulator was attached to a running Lisp system and allowed the user to study the impact of using different (simulated) garbage collection algorithms with a set of applications. Simulation is also important in the work of Darkovic [DS98], Holzle and Scheider [DH98], and Hanson [Han02]. Of these three, only [DH98] gives details about the simulator framework. Few if any such systems have been made available to the research community.

Runtime System Construction. As an alternative to simulation, some researchers provide interfaces for replacing existing memory management systems with newly developed systems. The Jikes Research VM from IBM [IBM-URL] is a Java virtual machine that allows the programmer to replace the existing manager at build-time, either with one of several provided managers or a user-defined manager. A set of Java classes are provided as an example of the interface to the runtime system. Earlier, the GC Toolkit was a library for programmers that allowed them to build generational collectors for Smalltalk and other languages [HMDW]; a version of this library has been developed to work specifically with the Jikes RVM [GCtk].

Research Goals

Our aim is to better understand the interfaces necessary to structure actual runtime environments such that memory management policies and mechanisms can be made to adapt to the requirements of a running application. We would like to show that structuring runtime systems as in the simulator can flexibly support a range of management requirements while retaining the efficiency required for today's applications.

Program Study Status

I am currently in my last term before proposing a research topic as part of the Candidacy Exam.  I expect to have completed the Exam by the time of the Consortium.

Interest in the Doctoral Consortium

While I am a part-time instructor and have classroom lecture experience, I have not had many opportunities to present my research ideas to a wide audience.  I would like to know if my presentations are accessible to computer scientists outside of the area, and I would like my experience in giving formal presentations.

References

[DH98]  Sylvia Dieckmann and Urs Holzle. A Study of the Allocation Behavior of the SPECjvm98 Java Benchmarks. Technical Report 1998-33, UCSB Computer Science Department. December, 1998.
[FT00]  Robert Fitzgerald and David Tarditi. The Case for Profile-Directed Selection of Garbage Collectors. In Proceedings of the Internation Symposium on Memory Management, October 2000.
[GCtk]  A Garbage Collection Toolkit. Architecture & Language Implementation Laboratory, Department of Computer Science, University of Massachusetts.
[Han02]  Lars T. Hansen and William D. Clinger. An Experimental Study of Renewal-Older-First Garbage Collection. International Conference on Functional Programming, 247-258, 2002.
[HMDW]  Richard L. Hudson, J. Eliot B. Moss, Amer Diwan, and Christopher F. Weight. A language-independent garbage collector toolkit. COINS Technical Report 91-47, University of Massachusetts, Amherst, September 1991. 
[HM92]  Richard L. Hudson and J. Eliot B. Moss. Incremental garbage collection for mature objects. In Proceedings of International Workshop on Memory Management, Lecture Notes in Computer Science 637. Springer-Verlag, September 1992.
[IBM-URL]  Jikes Research Virtual Machine from IBM 
[JL96]  Richard Jones and Rafael Lins. Garbage Collection: Algorithms for Automatic Dynamic Memory Management. John Wiley and Sons, 1996.
[LH83]  Henry Lieberman and Carl Hewitt. A Real-Tim Garbage Collector Baswed on the Lifetimes of Objects, Communications of the ACM 26(6), pp. 419-429, June 1983.
[DS98]  Darko Stefanovic. Properties of Age-based Memory Reclamation Algorithms. Ph.D. tehsis, University of Massachusetts, February 1999.
[WJNB95]  Paul R. Wilson, Mark S. Johnstone, Michael Neely, and David Boles. Dynamic Storage Allocation: A Survey and Critical Review. In 1995 International Workshop on Memory Management, LNCS 986. Springer, 1995.
[Zorn89]  Benjamin Zorn. Comparitive Performance Evaluation of Garbage Collection Algorithms. Published as CSD-89-544 from University of California, Berkeley. 
Last modified: Sat, Nov 30, 2002, 9:28 pm
HTML conversion by TeX2page 4p4k3