Pointer Variables and Dynamic Allocation
Pointers: The Problem
Pointers - The Problem
- Problem: Knowing the size of an object at compile time is not always possible
- Example: List with varying number of elements
- Example: Array whose elements are members of a type hierarchy
- Parent: Person. Children: Staff, Student. Array of Person
- Problem: Making a copy of an object (eg value semantics) is not always efficient
- Problem: Making a copy of an object (eg value semantics) is not always feasible
- May not want more than one instance of a dictionary, a DB, or an table in the OS
The Solution
- Solution: Use pointers to reference objects
- Why does this help:
- Makes it possible for programmer dynamically allocate memory as needed
- Pointers are fixed size, even if the items they point to are not
Terminology
- We use these terms as roughly synonymous:
- Other terms:
- Access type: type whose value is an access of a value of another type
- Dereference: follow a pointer to the object it points to
- Object: anything in memory (not just as in OO)
- Memory address: the location in memory at which an object is stored
Introduction: Box Example
Java References: Class Box
- Let's start with something you already know: Java references
- Example:
class Box{
int l;
int w;
}
class Client{
psvm(){
Box p; -- What's allocated for this local variable?
p = new Box();
p.l := 3;
p.w := 4;
S.o.p(p.l + p.w);
Box q = null; -- What if no initialization?
S.o.p(q.l); -- What happens here?
}
}
Start with type Box
- Shows a type for a Box with a length and a width:
type Box is record
len, wid: Integer;
end record;
-- What is allocated
b1, b2: Box
begin
b1.len := 11;
b1.wid := 22;
b2 := (33, 44);
boxexample.adb
(prettified)
Now Introduce Type BoxPtr
- Shows:
- type BoxPointer is an access type - it points to a Box
- we allocate boxes dynamically with new Box or new Box(1, 2)
- Dynamic allocation creates the box and returns its address
- .all dereferences (ie follows the pointer)
- Example:
type Box is record
len, wid: Integer := 0;
end record;
-- A type for pointers to boxes
type BoxPtr is access Box;
b1: Box;
-- What is allocated here?
p1, p2, p3: BoxPtr;
begin
b1.len := 11;
b1.wid := 22;
p1 := new Box; -- Similar to Java
-- .all means to dereference the pointer
p1.all.len := 33;
p1.all.wid := 44;
p2 := new Box;
p2.all := (55, 66); -- Aggregate assignment
p3 := new Box'(77, 88); -- Allocate and initialize
boxptrexample1.adb (prettified)
Refer to the Entire Box, with .all
- Can refer to the entire box with .all
- Example:
type Box is record
len, wid: Integer := 0;
end record;
-- A type for pointers to boxes
type BoxPtr is access Box;
b1: Box := (11, 22);
-- What is allocated here?
p1, p2: BoxPtr;
begin
p1 := new Box'(33, 44);
p1.all := b1;
p1.all := (33, 44);
p2 := new Box;
p2.all := p1.all;
p2 := p2; -- What does this do?
Some Things Never Change ...
- Another similarity with Java
type Box is record
len, wid: Integer := 0;
end record;
type BoxPtr is access Box;
b1 := (1, 2);
p1, p2: BoxPtr;
begin
p1.all.len := 11; -- Okay?
p1.all := p2.all; -- Okay?
p1.all := b1; -- Okay?
p1 := p2; -- Okay?
Some Strong Typing
- Now let's have two pointers
type Box is record
len, wid: Integer := 0;
end record;
type Box3D is record
len, wid, height: Integer := 0;
end record;
type BoxPtr is access Box;
type Box3DPtr is access Box3D;
bp: BoxPtr;
bp3D: Box3DPtr;
begin
bp := new Box;
bp3D := new Box3D;
bp := bp3D? -- Okay?
bp := new Box3DPtr? -- Okay?
bp3D.all := (len => 11, wid => 22); -- Okay?
Can .all can be Omitted?
- Sometimes .all can be omitted
- Example:
type Box is record
len, wid: Integer := 0;
end record;
-- A type for pointers to boxes
type BoxPtr is access Box;
p1: BoxPtr;
begin
p1 := new Box'(11, 22);
p1.all.len := 33;
p1.wid := 44;
put(p1.len);
put(p1.all.wid);
Shorthand notation: in some cases you get automatic dereference
Sometimes .all is Required
- .all is required when referring to the entire object
- Look carefully at the two put routines and their calls:
type Box is record
len, wid: Integer := 0;
end record;
-- A type for pointers to boxes
type BoxPtr is access Box;
procedure put(b: Box) is begin
put(b.len);
put(b.wid);
end put;
procedure put(b: BoxPtr) is begin
put(b.len);
put(b.all.wid);
end put;
p1: BoxPtr;
begin
p1 := new Box'(11, 22);
-- Is there a difference here?
p1.all := (33, 44);
p1 := (33, 44);
put(p1); -- What gets called?
put(p1.all); -- What gets called?
Type BoxPtr: Another Example
- Here is another sample program
- boxptrexample2.adb (prettified)
- Illustrates:
- Allocation with initial values:
p := new Box'(3,4);
- A pointer parameter
- Strong typing
-
.all
is optional: p.all.l := 55; p.w := 66;
Some Practice and Strong Typing
- In the code below, consider the expressions in each statement
p: BoxPtr := new Box'(1, 2);
q: BoxPtr := new Box'(3, 4);
b: Box;
begin
p.all.x := q.all.x;
p.x := q.x;
p.all := q.all;
p.all := b;
p := q;
-- Strong typing - these fail type checking
p := q.all;
p := b;
p.all := 3;
p.all := q;
One Type in Java vs Two in Ada
Java and the Object Referenced
- Now consider some Java statements:
Box p = new Box(1, 2);
Box q = new Box(3, 4);
// p.all.x := q.all.x; // Let's do this in java?
p.x = q.x;
// p.all := q.all; // Can we do this in java?
How do we implement something similar in Java?
Java Comparison: Two Types vs One
class Box{
int l, w;
}
class Client{
psvm(){
Box p = new Box();
}
}
Two types vs one:
- Ada uses separate types for the pointer and the thing it points to
- Informally, Java uses the same type for the pointer and the object it refers to
Java Terminology: Two Types vs One
- Terminology:
- Actually, Java does distinguish between the two:
- Actually, Java says that
p
has type Box
and
the object has class Box
- The object is said to belong to class Box
- Sometimes we say the variable has a run-time type which refers to the
class of the object that the variable points to
- The First Edition of the Java Language Specification had a section entitled
"Variables Have Types, Objects Have Classes"
Dynamic Structures and Linked Lists
Problem and Solution
- Problem: we want structures whose size is dynamic
- Solution:
- Allocate and deallocate mememory as needed
- Typically using a linked structure
- Examples: linked lists, stacks, queues
Dynamic Stacks and Queues
- Dynamic stacks and queues:
- We can use linked structures to create lists, stacks, and queues
- These structures can grow and shrink as needed
Types for Linked Structures - Version 1
- What we want: Define a type for a node that ...
- contains a field, typically called next, that ...
- points to another (typically the next) node
- What would a picture of the memory for this look like:
- Possible solution (with a possible problem):
type Node is record
data: Integer;
next: access Node;
end Node;
How would we access the list?
- What would the type of the access value be?
- What if we wanted a named type for the access?
Types for Linked Structures - Version 2
- Let's try this named type for the access Node:
type Node is record
data: Integer;
next: NodePointer;
end Node;
type NodePointer is access Node;
p: NodePointer;
Okay, but p
and the field next
should have the same named type
Let's try this:
type NodePointer is access Node;
type Node is record
data: Integer;
next: NodePointer;
end Node;
What is the problem?
Solution: Use a partial definition
type Node; -- A partial definition of the name Node
type NodePointer is access Node;
type Node is record
data: Integer;
next: NodePointer;
end Node;
Problem: Circular definition
Making a List
- Let's create a list with three nodes holding integers 1, 2, 3
- First draw a picture
- Now let's write code to PRINT the list
- In what order do we add the nodes?
- Now the code:
p, q: Node
begin
p := new Node;
p.all.data := 3;
p.all.next := null;
t := p;
p := new Node'(2, t);
p.all.data := 3;
p.all.next := t;
-- Can we create the second node in one statement
-- Can we create the second node in one statement
-- Can we create the third node in one statement
Let's write the code to print each element of the list
Dynamic Stacks and Queues
A Dynamic Stack
- Our stack will add and remove nodes as we push and pop
- Types for a Stack and a StackNode (ignoring privacy, for now)
type StackNode;
type Stack is access StackNode;
type StackNode is record
data: Integer;
next: Stack;
end Node;
what is a Stack
:
- A
Stack
is a pointer to a StackNode
How would we code these routines:
s: Stack;
begin
push(i, s);
pop(s);
put(top(s));
What modes are the parameters?
Stack
specification
(and prettified)
An Empty Stack
- Notice that empty stack is different in Ada and Java
- Ada:
s: Stack;
begin
if isempty(s) then ...
What happens in Java?
Stack s;
if (s.isNull())
What does an empty stack look like
A Dynamic Queue
- Example using a dynamic queue
myQ: Queue;
enqueue(3, myQ);
enqueue(4, myQ);
while not isempty(myQ) loop
f := front(myQ);
dequeue(myQ);
end loop;
Looks just like a regular queue
Dynamic queue implementation:
- A Queue is a linked list of nodes (like a stack)
- Needs pointers to both the front and back of the queue
- The front and back pointers point to the first and last nodes in the list
- Comparison with a dynamic stack:
- A stack contains just one pointer (ie the top)
- A variable of type Stack IS a pointer to a StackNode
- A variable of type Queue IS NOT a pointer to a QueueNode
Code for Dynamic Stacks and Queues
- Examples specifications and bodies:
Class Exercises for Linked Structures
- Write the implementation of operations that will print the
first, second, third, last, and all elements of a stack.
- Write the implementation of a copy operation that
would efficiently make a copy of an existing stack/queue using the
underlying dynamic implementation, without using push and pop.
- Write the implementation of a = operation that would
efficiently determine whether or not two stacks/queues are the same,
using the underlying dynamic implementation.
Problems with Pointers - And Solutions
Three Possible Kinds of Errors
- We will look at three kinds of errors that can happen with pointers:
- Null reference
- Garbage creation (ie memory leak)
- Dangling reference
Error 1: Null Pointers
type Box is ...
type BoxPtr is access Box;
b1: Box := (2, 3);
p1: BoxPtr;
begin
p1.all := b1;
put(p1.all.w);
Error 2: Garbage
- What is garbage?
- In Java, how can we create garbage:
Foo f;
f = new Foo();
f = new Foo();
In Ada the same code creates garbage, too:
declare
f: Foo:
begin
f := new Foo;
f := new Foo;
In Java, what happens to garbage?
In Ada, garbage normally creates a memory leak:
- Memory leak: increasing amounts of unreachable memory
- Garbage is unreachable memory
In Ada we avoid garbage with explicit deallocation
- Ada systems can also have garbage collection
- Like Java
- Available on systems whose compilers generate byte code
- But garbage collected Ada systems is rare
- GC can be problematic on real time systems
Avoid Garbage with Explicit Deallocation of Memory
- To deallocate an anonymous variable of a specific type,
- we must create a deallocation procedure specific for
- Create the deallocation procedure by instantiating a generic procedure
Unchecked_Deallocation
for the specified type
- Example:
procedure dispose is new Unchecked_Deallocation(Box,
BoxPtr)
- Then use the deallocation routine as required
- Example: boxdispose.adb (prettified)
- Other points:
- You can choose any name -
free
is another common name
- Two params needed to instantiate dispose, one needed to call it
- Separate Deallocation routine required for each pointer type
- Deallocation sets pointer to null
- Deallocation of null is okay
Error 3: Dangling References
- Dangling References are the result of having aliases
- Before looking at Dangling References, we need to understand Aliases
Aliases
- Alias: alternate name for a variable
- Pointers are a common way to create an alias
- Example
type Box ...
type BoxPtr is access Box;
p, q: BoxPtr;
begin
p := new Box'(1, 2);
q := p;
What is another way that aliases are created?
Aliases are unavoidable, but they also cause problems
- Make debugging difficult:
p := new Box(1, 2);
q := p;
q.w := 99;
put_line(p); -- What gets printed?
Dangling References
- Dangling Reference: pointer to memory that is invalid
p, q: BoxPtr;
begin
p := new Box'(1, 2);
q := p;
dispose(q);
put_line(p); -- output unpredictable
Invalid memory could be
- memory that is no longer allocated
- reallocated memory of correct or incorrect type
Dangling References and Reallocated Memory
p, q: BoxPtr;
begin
p := new Box'(1, 2);
put_line(p); -- output: 1 2
q := p; -- Create an alias
dispose(q);
put_line(p); -- output unpredictable (example: 0 2)
q := new Box'(3, 4);
-- p may be an alias to the new object
put_line(p); -- may output 3 4
p.all.w := 99; -- What does this modify?
put_line(q); -- may output 3 99
Dangling References and Reallocation of Different Types
- Reallocation can cause havoc with types
- Example:
p, q: BoxPtr;
r: PersonPtr;
begin
p := new Box'(1, 2);
q := p;
dispose(q);
put_line(p); -- outputs 1 2
r := new Person;
-- What does this do to Person r?
p.all.w := 99;
Avoiding Dangling References
- Two possibilities when deallocating q with alias p:
- Pointer p is still needed
- In this case, deallocation of q is an error
- Pointer p is not needed
- Can set it to null before deallocation
- Requires keeping track of aliases, which is difficult
- Java will not deallocate this object
- Dangling References can't occur if it's impossible to create aliases
- Limited private types can't have aliases
Summary of Errors
- Watch out for several pointer errors:
- Dereference a null pointer
- Note that pointer variables have default value null
- Creating garbage
- Garbage is a memory location that is allocated but is
unaccessible (ie has no pointers to it)
- How to avoid (when no garbage collection): explicit deallocation
- Dangling reference
- A dangling reference is a pointer to invalid memory
- Avoid by setting alias pointers to null prior to
deallocating other aliases
Memory Organization
Memory Management and Kinds of Variables
- When learning about pointers, knowledge of memory management is useful
- Modern languages recognize 3 different kinds of variables:
- Local variables: declared in a routine (or method)
- Anonymous variables: allocated with new (in both Ada and Java)
- Package variables
- What are some Ada examples that we've seen?
- What are some Java examples? (Trick question)
- Each kind of variable is allocated and deallocated differently:
- Local variables follow a stack protocol:
- Allocate on entering the routine (or block)
- Deallocate on leaving
- Anonymous variables: allocated with new (in both Ada and Java)
- Allocate with explicit new
- Deallocate explicitly (Ada, C, C++) or with Garbage Collection (Java)
- Package variables
- Allocate when program beings
- Deallocate when program ends
Memory Organization - Three Areas of Memory
- Each kind of memory management is done in a different area of memory:
- Stack: Holds local variables
- Variable allocated when enter scope
- Deallocated when exit scope
- Locals allocated following a Stack protocol
- Heap: Holds objects allocated with
new
- allocated when created
- deallocated explicitly or by garbage collection
- Static: ...
- Allocated when program begins
- Deallocated when program ends
- Where is static memory used?
- Ada: Variables declared in package specs and bodies: eg ...
- C and C++: variables declared as
static
- Java: Trick question
Memory Organization Diagram
- Diagram: useful to draw stack next to heap
- A more accurate diagram: static at bottom, heap next, stack at other end
- Stack and heap grow toward each other
Looking at Memory Addresses
Printing Addresses
- It's interesting and informative to see actual memory addresses
- We will look at addresses of stack and heap variables
- First we must learn about modular types
Modular Types
- Example modular type: minutes.adb
(prettified)
- Values of type
minuteType
range from 0 to 59 and they wrap
- Arithmetic for type
minuteType
wraps
procedure minutes is
type MinuteType is mod 60;
package Minute_IO is new Text_IO.Modular_IO (Num => MinuteType);
use Minute_IO;
m: MinuteType;
begin
m := 50;
m := m + 20;
put(m); -- Outputs 10
Side Note: Modular types allow some bitwise operations such as and, or, not, xor:
type Unsigned_Byte is mod 256;
u: Unsigned_Byte := 2#0011_0101#
v: Unsigned_Byte := 2#1111_0000#
w: Unsigned_Byte := u and v;
Modular Types and Addreses
- Our motivation for looking at modular types: define a type that will allow us to
print addresses
- Addresses are 32 bits long (on a 32 bit machine)
- Addresses take values from 0 to 2**32 - 1
- The largest address (2**32-1) is 2#1111_1111_1111_1111_1111_1111_1111_1111# in binary
- Ada.Integer_Text_IO would print 2#1111_1111_1111_1111_1111_1111_1111_1111# as -1
rather than as a large integer
- We use a Modular Type to hold 32 bit addresses
type AddressType is mod 2**32;
We use generic package modular_io
to define an io package for
AddressTypes
Converting Pointers to Our Address Type
- Generic function Unchecked_Conversion allows a program
to ignore type rules:
- Use Unchecked_Conversion to convert from a pointer type
to a modular type
- Example: Create
Copy
which takes an IntPointer
and returns an AddressType
type IntPointer is access Integer;
type AddressType is mod 2**32;
function Copy is
new Unchecked_Conversion (Source => IntPointer,
Target => AddressType);
a: IntPointer;
b: AddressType
...
b := Copy(a)
- Variable a is copied to b, even though types don't match
Example of use: printmem.adb and
prettified: printmem.adb.html
- Shows Modular Type and Unchecked_Conversion
- Shows heap addresses and reuse of heap locations
Pointing to Variables on the Stack
Pointers in Other Languages
Pointers in Other Languages
- In java and ada we get automatic dereferencing of pointer
variables; in other languages (eg c and c++), dereferencing is not
necessarily automatic.
- C makes heavy use of pointers. For example, it uses
pointers to get the effect of out and in out parameters. Arrays are also
accessed using pointers. C does not do strong typing on pointers.
// Swap routine in C
swap(int *ip, int *jp){ // ip and jp are pointers to integers
int temp = *ip; // Explicit dereference of pointer ip
*ip = *jp;
*jp = temp
}
// Client code:
int i=5, j=6;
...
swap(&i, &j); // Pass explicit pointers to i and j
- C++ is similar to C with pointers. C++ does allow reference
mode parameters that are effectively in out mode parameters. They are automatically
passed by address and have automatic dereferencing.
// Swap routine in C++
swap(int &ip, int &jp){ // i and j are reference parameters
int temp = ip; // Automatic dereference of pointer ip
ip = jp;
jp = temp
}
// Client code:
int i=5, j=6;
...
swap(i, j); // Pass implicit pointers to i and j
More Discussion of Pointers
Memory and Addresses
- Pointers and references are variables whose values are addresses of other memory locations
- We say that pointer p points to a location if it contains the address of
that location
- Assume memory divided into cells.
- Each cell stores a value
- Each cell has an address
- We draw pointers like this: ...
- Buzzword: Dereference: follow a pointer to whatever it points to
Comparison of Java References and Ada Pointers
- Pointers are similar to references, with some differences
- Similarities:
- Both refer to values that are allocated dynamically on the heap
with
new
- Both are strongly typed
- Both use null for pointers that don't point to
allocated values
- The values of both are the address of the value to
which it points/refers
- Both can create garbage
- Implicit dereferencing only
- Differences
- Ada has separate types for the pointer and the value to
which it points
- Ada allows accessing the value as a separate entity
- Ada pointers can also point to objects on the stack
- Some languages (eg C) can do pointer arithmetic [Ada,
can too, if you disable type checking]
- No garbage collection (normally)
- Explicit deallocation required
- Dangling references possible
- Implicit or explicit dereferencing
Why Pointers/References
- Why do languages have pointers/references?
- Answer: flexibility:
- They allow programmer to control memory management
- References are all of the same size
- Memory management allows, for example, use of linked data structures (eg linked lists)
- These structures can grow as needed
- Their size does not need to initially be known
- Since Java uses references, the size of all reference variables is known at compile time
- Primitive variables (eg int, double) of course have known size
- References are all of the same size, regardless of the size of the object they point to
- Example - Assume Student inherits from Person and has extra fields (ie is larger):
Person p = new Person();
p = new Student();
Why Pointers
- Creating dynamic structures
- Size can grow and shrink as needed
- Examples: Dynamic stacks and queues
- Allocating heap memory
- Java objects (of any size) are on heap
- Accessing variable-sized heap objects with constant sized
references
- Makes OO and polymorphism possible
- Ada objects use pointers to avoid size issue
- Heap objects can change size
- Example: Java strings (
s="abc"; s="de";
)
- Example: Java ArrayList
- Allocate a certain size
- When run out of space, allocate more (may copy into new
space)
- Known as amortized doubling
- Parameter passing
- Can pass large or unknown-size parameters by copying reference instead of value
- Ada in out mode:
- small values copied in/out
- large values pointer is passed
- Java: Parameters that are objects are passed as
(in mode) references to those objects
- C: to change a parameter, you must pass an address (see
above)
- C++: Like c, but also have reference parameters (see above)
- Allows multiple access to an object and avoids copying
- Example: clock
- Lookup table
- Array of students
- Print queue: queue of pointers to print jobs
- C arrays:
int a[] = {11, 22, 33};
a[2] = 44;
*(a+2) = 55; // * means dereference. +2 means 2 words