Pointer Variables and Dynamic Allocation

Pointers: The Problem

Pointers - The Problem

Problem: Knowing the size of an object at compile time is not always possible

Example: List with varying number of elements
Example: Array whose elements are members of a type hierarchy

Parent: Person. Children: Staff, Student. Array of Person

Problem: Making a copy of an object (eg value semantics) is not always efficient

Example: a Large record

Problem: Making a copy of an object (eg value semantics) is not always feasible

May not want more than one instance of a dictionary, a DB, or an table in the OS

The Solution

Solution: Use pointers to reference objects

Why does this help:

Makes it possible for programmer dynamically allocate memory as needed

Pointers are fixed size, even if the items they point to are not

Terminology

We use these terms as roughly synonymous:

Pointer
Reference
Access

Other terms:

Access type: type whose value is an access of a value of another type

Dereference: follow a pointer to the object it points to

Object: anything in memory (not just as in OO)

Memory address: the location in memory at which an object is stored

Introduction: Box Example

Java References: Class Box

Let's start with something you already know: Java references

Example:

        class Box{ 
            int l;
            int w;
        }

        class Client{
            psvm(){
                Box p;          -- What's allocated for this local variable?
                p = new Box();

                p.l := 3;
                p.w := 4;
                S.o.p(p.l + p.w);

                Box q = null;   -- What if no initialization?
                S.o.p(q.l);     -- What happens here?
            }
        }

Start with type Box

Shows a type for a Box with a length and a width:

        type Box is record
            len, wid: Integer;
        end record;

        -- What is allocated
        b1, b2: Box
    begin
        b1.len := 11;
        b1.wid := 22;

        b2 := (33, 44);

boxexample.adb (prettified)

Now Introduce Type BoxPtr

Shows:

type BoxPointer is an access type - it points to a Box
we allocate boxes dynamically with new Box or new Box(1, 2)
Dynamic allocation creates the box and returns its address
.all dereferences (ie follows the pointer)

Example:

        type Box is record
            len, wid: Integer := 0;
        end record;

        -- A type for pointers to boxes
        type BoxPtr is access Box;

        b1: Box;

        -- What is allocated here?
        p1, p2, p3: BoxPtr;
    begin
        b1.len := 11;
        b1.wid := 22;

        p1 := new Box;  -- Similar to Java

        -- .all means to dereference the pointer
        p1.all.len := 33;
        p1.all.wid := 44;

        p2 := new Box;
        p2.all := (55, 66);  -- Aggregate assignment

        p3 := new Box'(77, 88);  -- Allocate and initialize

boxptrexample1.adb (prettified)

Refer to the Entire Box, with .all

Can refer to the entire box with .all

Example:

        type Box is record
            len, wid: Integer := 0;
        end record;

        -- A type for pointers to boxes
        type BoxPtr is access Box;

        b1: Box := (11, 22);

        -- What is allocated here?
        p1, p2: BoxPtr;
    begin
        p1 := new Box'(33, 44);  

        p1.all := b1;

        p1.all := (33, 44);

        p2 := new Box;
        p2.all := p1.all;

        p2 := p2;    -- What does this do?

Some Things Never Change ...

Another similarity with Java

        type Box is record
            len, wid: Integer := 0;
        end record;

        type BoxPtr is access Box;

        b1 := (1, 2);

        p1, p2: BoxPtr;
    begin
        p1.all.len := 11;    -- Okay?

        p1.all := p2.all;    -- Okay?

        p1.all := b1;        -- Okay?

        p1 := p2;            -- Okay?

Some Strong Typing

Now let's have two pointers

        type Box is record
            len, wid: Integer := 0;
        end record;

        type Box3D is record
            len, wid, height: Integer := 0;
        end record;

        type BoxPtr is access Box;
        type Box3DPtr is access Box3D;

        bp: BoxPtr;
        bp3D: Box3DPtr;
    begin
        bp := new Box;
        bp3D := new Box3D;

        bp := bp3D?  -- Okay?

        bp := new Box3DPtr?  -- Okay?

        bp3D.all := (len => 11, wid => 22);  -- Okay?

Can .all can be Omitted?

Sometimes .all can be omitted

Example:

        type Box is record
            len, wid: Integer := 0;
        end record;

        -- A type for pointers to boxes
        type BoxPtr is access Box;

        p1: BoxPtr;
    begin
        p1 := new Box'(11, 22);

        p1.all.len := 33;
        p1.wid := 44;

        put(p1.len);
        put(p1.all.wid);

Shorthand notation: in some cases you get automatic dereference

Sometimes .all is Required

.all is required when referring to the entire object

Look carefully at the two put routines and their calls:

        type Box is record
            len, wid: Integer := 0;
        end record;

        -- A type for pointers to boxes
        type BoxPtr is access Box;

        procedure put(b: Box) is begin
            put(b.len); 
            put(b.wid);
        end put;

        procedure put(b: BoxPtr) is begin
            put(b.len); 
            put(b.all.wid);
        end put;

        p1: BoxPtr;
    begin
        p1 := new Box'(11, 22);

        -- Is there a difference here?
        p1.all := (33, 44);
        p1     := (33, 44);

        put(p1);       -- What gets called?

        put(p1.all);   -- What gets called?

Type BoxPtr: Another Example

Here is another sample program

boxptrexample2.adb (prettified)

Illustrates:

Allocation with initial values: p := new Box'(3,4);
A pointer parameter
Strong typing
.all is optional: p.all.l := 55; p.w := 66;

Some Practice and Strong Typing

In the code below, consider the expressions in each statement

    p: BoxPtr := new Box'(1, 2);
    q: BoxPtr := new Box'(3, 4);
    b: Box;
begin
    p.all.x := q.all.x;
    p.x := q.x;
    p.all := q.all;
    p.all := b;
    p := q;

    -- Strong typing - these fail type checking
    p := q.all;
    p := b;

    p.all := 3;
    p.all := q;

One Type in Java vs Two in Ada

Java and the Object Referenced

Now consider some Java statements:

    Box p = new Box(1, 2);
    Box q = new Box(3, 4);

    // p.all.x := q.all.x;   // Let's do this in java?
    p.x = q.x;

    // p.all := q.all;       // Can we do this in java?

How do we implement something similar in Java?

Java Comparison: Two Types vs One

Example:

        class Box{ 
            int l, w;
        }

        class Client{
            psvm(){
                Box p = new Box();
            }
        }

Two types vs one:

Ada uses separate types for the pointer and the thing it points to

Informally, Java uses the same type for the pointer and the object it refers to

Java Terminology: Two Types vs One

Terminology:

Actually, Java does distinguish between the two:

Actually, Java says that p has type Box and the object has class Box
The object is said to belong to class Box

Sometimes we say the variable has a run-time type which refers to the class of the object that the variable points to
The First Edition of the Java Language Specification had a section entitled "Variables Have Types, Objects Have Classes"

Dynamic Structures and Linked Lists

Problem and Solution

Problem: we want structures whose size is dynamic

Solution:

Allocate and deallocate mememory as needed

Typically using a linked structure

Examples: linked lists, stacks, queues

Dynamic Stacks and Queues

Dynamic stacks and queues:

We can use linked structures to create lists, stacks, and queues

These structures can grow and shrink as needed

Types for Linked Structures - Version 1

What we want: Define a type for a node that ...

contains a field, typically called next, that ...

points to another (typically the next) node

What would a picture of the memory for this look like:

Possible solution (with a possible problem):

        type Node is record
            data: Integer;
            next: access Node;
        end Node;

How would we access the list?

What would the type of the access value be?
What if we wanted a named type for the access?

Types for Linked Structures - Version 2

Let's try this named type for the access Node:

        type Node is record
            data: Integer;
            next: NodePointer;
        end Node;

        type NodePointer is access Node;

        p: NodePointer;

Okay, but p and the field next should have the same named type

Let's try this:

        type NodePointer is access Node;

        type Node is record
            data: Integer;
            next: NodePointer;
        end Node;

What is the problem?

Solution: Use a partial definition

        type Node;      -- A partial definition of the name  Node 

        type NodePointer is access Node;

        type Node is record
            data: Integer;
            next: NodePointer;
        end Node;

Problem: Circular definition

Making a List

Let's create a list with three nodes holding integers 1, 2, 3

First draw a picture

Now let's write code to PRINT the list

In what order do we add the nodes?

Now the code:

        p, q: Node
    begin
        p := new Node;
        p.all.data := 3;
        p.all.next := null;

        t := p;
        p := new Node'(2, t);
        p.all.data := 3;
        p.all.next := t;

        -- Can we create the second node in one statement
        -- Can we create the second node in one statement
        -- Can we create the third node in one statement

Let's write the code to print each element of the list

Dynamic Stacks and Queues

A Dynamic Stack

Our stack will add and remove nodes as we push and pop

Types for a Stack and a StackNode (ignoring privacy, for now)

        type StackNode;
        type Stack is access StackNode;

        type StackNode is record
            data: Integer;
            next: Stack;
        end Node;

what is a Stack:

A Stack is a pointer to a StackNode

How would we code these routines:

            s: Stack;
        begin
            push(i, s);
            pop(s);
            put(top(s));

What modes are the parameters?

Stack specification (and prettified)

An Empty Stack

Notice that empty stack is different in Ada and Java

Ada:

            s: Stack;
        begin
            if isempty(s) then ...

What happens in Java?

        Stack s;
        if (s.isNull())

What does an empty stack look like

In Ada
In Java

A Dynamic Queue

Example using a dynamic queue

        myQ: Queue;
        enqueue(3, myQ);
        enqueue(4, myQ);
        while not isempty(myQ) loop
            f := front(myQ);
            dequeue(myQ);
        end loop;

Looks just like a regular queue

Dynamic queue implementation:

A Queue is a linked list of nodes (like a stack)

Needs pointers to both the front and back of the queue

The front and back pointers point to the first and last nodes in the list

Comparison with a dynamic stack:

A stack contains just one pointer (ie the top)
A variable of type Stack IS a pointer to a StackNode

A variable of type Queue IS NOT a pointer to a QueueNode

It is a pair of pointers

Code for Dynamic Stacks and Queues

Examples specifications and bodies:

Stack specification (and prettified)

Partial Stack implementation (and prettified)

Queue specification (and prettified)

Class Exercises for Linked Structures

Write the implementation of operations that will print the first, second, third, last, and all elements of a stack.

Write the implementation of a copy operation that would efficiently make a copy of an existing stack/queue using the underlying dynamic implementation, without using push and pop.

Write the implementation of a = operation that would efficiently determine whether or not two stacks/queues are the same, using the underlying dynamic implementation.

Problems with Pointers - And Solutions

Three Possible Kinds of Errors

We will look at three kinds of errors that can happen with pointers:

Null reference
Garbage creation (ie memory leak)
Dangling reference

Error 1: Null Pointers

What happens here?

    type Box is ...
    type BoxPtr is access Box;

    b1: Box := (2, 3);
    p1: BoxPtr;
begin
    p1.all := b1;

    put(p1.all.w);

Error 2: Garbage

What is garbage?

In Java, how can we create garbage:

    Foo f;

    f = new Foo();
    f = new Foo();

In Ada the same code creates garbage, too:

    declare
        f: Foo:
    begin
        f := new Foo;
        f := new Foo;

In Java, what happens to garbage?

In Ada, garbage normally creates a memory leak:

Memory leak: increasing amounts of unreachable memory
Garbage is unreachable memory

In Ada we avoid garbage with explicit deallocation

Ada systems can also have garbage collection

Like Java
Available on systems whose compilers generate byte code
But garbage collected Ada systems is rare
GC can be problematic on real time systems

Avoid Garbage with Explicit Deallocation of Memory

To deallocate an anonymous variable of a specific type,

we must create a deallocation procedure specific for

Create the deallocation procedure by instantiating a generic procedure Unchecked_Deallocation for the specified type

Example: procedure dispose is new Unchecked_Deallocation(Box, BoxPtr)

Then use the deallocation routine as required

Example: dispose(p);

Example: boxdispose.adb (prettified)

Other points:

You can choose any name - free is another common name
Two params needed to instantiate dispose, one needed to call it
Separate Deallocation routine required for each pointer type
Deallocation sets pointer to null
Deallocation of null is okay

Error 3: Dangling References

Dangling References are the result of having aliases

Before looking at Dangling References, we need to understand Aliases

Aliases

Alias: alternate name for a variable

Pointers are a common way to create an alias

Example

    type Box ...
    type BoxPtr is access Box;

    p, q: BoxPtr;
begin
    p := new Box'(1, 2);

    q := p;

What is another way that aliases are created?

Think procedures

Aliases are unavoidable, but they also cause problems

Make debugging difficult:

        p := new Box(1, 2);

        q := p;

        q.w := 99;

        put_line(p);  -- What gets printed?

Dangling References

Dangling Reference: pointer to memory that is invalid

    p, q: BoxPtr;
begin
    p := new Box'(1, 2);
    q := p;

    dispose(q);

    put_line(p);     -- output unpredictable

Invalid memory could be

memory that is no longer allocated
reallocated memory of correct or incorrect type

Dangling References and Reallocated Memory

Sometimes new (re)allocates the most recently deallocated memory

Example danglingref.adb (prettified)

    p, q: BoxPtr;
begin
    p := new Box'(1, 2);
    put_line(p);     -- output: 1 2

    q := p;          -- Create an alias

    dispose(q);

    put_line(p);     -- output unpredictable (example: 0 2)

    q := new Box'(3, 4);

    -- p may be an alias to the new object
    put_line(p);     -- may output 3 4

    p.all.w := 99;   -- What does this modify?

    put_line(q);     -- may output 3 99

Dangling References and Reallocation of Different Types

Reallocation can cause havoc with types
Example:

    p, q: BoxPtr;
    r: PersonPtr;
begin
    p := new Box'(1, 2);
    q := p;

    dispose(q);

    put_line(p);     -- outputs 1 2

    r := new Person;

    -- What does this do to Person r?
    p.all.w := 99;

Avoiding Dangling References

Two possibilities when deallocating q with alias p:

Pointer p is still needed

In this case, deallocation of q is an error

Pointer p is not needed

Can set it to null before deallocation

Requires keeping track of aliases, which is difficult

Java will not deallocate this object

Dangling References can't occur if it's impossible to create aliases

Limited private types can't have aliases

Summary of Errors

Watch out for several pointer errors:

Dereference a null pointer

Note that pointer variables have default value null

Creating garbage

Garbage is a memory location that is allocated but is unaccessible (ie has no pointers to it)
How to avoid (when no garbage collection): explicit deallocation

Dangling reference

A dangling reference is a pointer to invalid memory
Avoid by setting alias pointers to null prior to deallocating other aliases

Memory Organization

Memory Management and Kinds of Variables

When learning about pointers, knowledge of memory management is useful

Modern languages recognize 3 different kinds of variables:

Local variables: declared in a routine (or method)
Anonymous variables: allocated with new (in both Ada and Java)
Package variables

What are some Ada examples that we've seen?
What are some Java examples? (Trick question)

Each kind of variable is allocated and deallocated differently:

Local variables follow a stack protocol:

Allocate on entering the routine (or block)
Deallocate on leaving

Anonymous variables: allocated with new (in both Ada and Java)

Allocate with explicit new
Deallocate explicitly (Ada, C, C++) or with Garbage Collection (Java)

Package variables

Allocate when program beings
Deallocate when program ends

Memory Organization - Three Areas of Memory

Each kind of memory management is done in a different area of memory:

Stack: Holds local variables

Variable allocated when enter scope
Deallocated when exit scope
Locals allocated following a Stack protocol

Heap: Holds objects allocated with new

allocated when created
deallocated explicitly or by garbage collection

Static: ...

Allocated when program begins
Deallocated when program ends
Where is static memory used?

Ada: Variables declared in package specs and bodies: eg ...
C and C++: variables declared as static
Java: Trick question

Memory Organization Diagram

Diagram: useful to draw stack next to heap
A more accurate diagram: static at bottom, heap next, stack at other end

Stack and heap grow toward each other

Looking at Memory Addresses

Printing Addresses

It's interesting and informative to see actual memory addresses

We will look at addresses of stack and heap variables

First we must learn about modular types

Modular Types

Example modular type: minutes.adb (prettified)

Values of type minuteType range from 0 to 59 and they wrap
Arithmetic for type minuteType wraps

procedure minutes is

    type MinuteType is mod 60;

	package Minute_IO is new Text_IO.Modular_IO (Num => MinuteType);
	use Minute_IO;

    m: MinuteType;

 begin
    m := 50;
    m := m + 20;
    put(m);    -- Outputs 10

Side Note: Modular types allow some bitwise operations such as and, or, not, xor:

    type Unsigned_Byte is mod 256;
    u: Unsigned_Byte := 2#0011_0101#
    v: Unsigned_Byte := 2#1111_0000#
    w: Unsigned_Byte := u and v;

Modular Types and Addreses

Our motivation for looking at modular types: define a type that will allow us to print addresses

Addresses are 32 bits long (on a 32 bit machine)
Addresses take values from 0 to 2**32 - 1
The largest address (2**32-1) is 2#1111_1111_1111_1111_1111_1111_1111_1111# in binary
Ada.Integer_Text_IO would print 2#1111_1111_1111_1111_1111_1111_1111_1111# as -1 rather than as a large integer

We use a Modular Type to hold 32 bit addresses

 type AddressType is mod 2**32;

We use generic package modular_io to define an io package for AddressTypes

Converting Pointers to Our Address Type

Generic function Unchecked_Conversion allows a program to ignore type rules:

Use Unchecked_Conversion to convert from a pointer type to a modular type

Example: Create Copy which takes an IntPointer and returns an AddressType

        type IntPointer is access Integer;
        type AddressType is mod 2**32;

        function Copy is
            new Unchecked_Conversion (Source => IntPointer,
                                      Target => AddressType);

        a: IntPointer;
        b: AddressType
        ...
        b := Copy(a)

Variable a is copied to b, even though types don't match

Example of use: printmem.adb and prettified: printmem.adb.html

Shows Modular Type and Unchecked_Conversion
Shows heap addresses and reuse of heap locations

Pointing to Variables on the Stack

Addresses of local variables, allocated on the stack, decrease

Rules for pointers to local variables:

If a pointer is going to point to a local variable, that variable must be marked as aliased
The type IntPointer must be defined as an access all Integer
Use 'access to get the address of a variable

Example 1: aliasedlocal.adb (and prettified)

Shows addresses of locals decreasing and of heap increasing
Also shows creation and use of a dangling reference and reallocation to same location

Example 2: aliasedlocal2.adb (and prettified)

Shows addresses of locals in the main procedure and in a called procedure:

Another Example: uninitlocals.adb (and prettified)

Main calls putlocaladdr1 then main calls putlocaladdr2
Both putlocaladdr1 and putlocaladdr2 share the same memory
Variable b in putlocaladdr2 is uninitialized, but it uses the value left in memory by putlocaladdr1

Another way to print addresses:

Package System has a routine that will convert an address into a string
The 'Address attribute returns the address of a variable

            with System.Address_Image;
            i: Integer;
            ...
            put_line("Address of i: " & System.Address_Image(i'address));

Pointers in Other Languages

In java and ada we get automatic dereferencing of pointer variables; in other languages (eg c and c++), dereferencing is not necessarily automatic.

C makes heavy use of pointers. For example, it uses pointers to get the effect of out and in out parameters. Arrays are also accessed using pointers. C does not do strong typing on pointers.

// Swap routine in C
swap(int *ip, int *jp){   // ip and jp are pointers to integers
   int temp = *ip;        // Explicit dereference of pointer ip
   *ip = *jp;
   *jp = temp
}

// Client code:
int i=5, j=6;
    ...
swap(&i, &j);   // Pass explicit pointers to i and j

C++ is similar to C with pointers. C++ does allow reference mode parameters that are effectively in out mode parameters. They are automatically passed by address and have automatic dereferencing.

// Swap routine in C++
swap(int &ip, int &jp){   // i and j are reference parameters
   int temp = ip;         // Automatic dereference of pointer ip
   ip = jp;
   jp = temp
}

// Client code:
int i=5, j=6;
    ...
swap(i, j);   // Pass implicit pointers to i and j

More Discussion of Pointers

Memory and Addresses

Pointers and references are variables whose values are addresses of other memory locations

We say that pointer p points to a location if it contains the address of that location

Assume memory divided into cells.

Each cell stores a value
Each cell has an address

We draw pointers like this: ...

Buzzword: Dereference: follow a pointer to whatever it points to

Comparison of Java References and Ada Pointers

Pointers are similar to references, with some differences

Similarities:

Both refer to values that are allocated dynamically on the heap with new
Both are strongly typed
Both use null for pointers that don't point to allocated values
The values of both are the address of the value to which it points/refers
Both can create garbage
Implicit dereferencing only

Differences

Ada has separate types for the pointer and the value to which it points
Ada allows accessing the value as a separate entity
Ada pointers can also point to objects on the stack
Some languages (eg C) can do pointer arithmetic [Ada, can too, if you disable type checking]
No garbage collection (normally)
Explicit deallocation required
Dangling references possible
Implicit or explicit dereferencing

Why Pointers/References

Why do languages have pointers/references?

Answer: flexibility:

They allow programmer to control memory management
References are all of the same size

Memory management allows, for example, use of linked data structures (eg linked lists)

These structures can grow as needed
Their size does not need to initially be known

Since Java uses references, the size of all reference variables is known at compile time

Primitive variables (eg int, double) of course have known size
References are all of the same size, regardless of the size of the object they point to

Example - Assume Student inherits from Person and has extra fields (ie is larger):

            Person p = new Person();
            p = new Student();

Why Pointers

Creating dynamic structures

Size can grow and shrink as needed
Examples: Dynamic stacks and queues

Allocating heap memory

Java objects (of any size) are on heap

Accessing variable-sized heap objects with constant sized references

Makes OO and polymorphism possible
Ada objects use pointers to avoid size issue

Heap objects can change size

Example: Java strings (s="abc"; s="de";)
Example: Java ArrayList

Allocate a certain size
When run out of space, allocate more (may copy into new space)

Known as amortized doubling

Parameter passing

Can pass large or unknown-size parameters by copying reference instead of value

Ada in out mode:

small values copied in/out
large values pointer is passed

Java: Parameters that are objects are passed as (in mode) references to those objects

C: to change a parameter, you must pass an address (see above)
C++: Like c, but also have reference parameters (see above)

Allows multiple access to an object and avoids copying

Example: clock
Lookup table
Array of students
Print queue: queue of pointers to print jobs

C arrays:

    int a[] = {11, 22, 33};
    a[2] = 44;
    *(a+2) = 55;   // * means dereference.  +2 means 2 words