This is a two-part lab.
ejderrick and ibarland sections:
Hw09 will be: complete all parts of lab12a
,
due the Wednesday after the break (Dec.03).
Consider a person who leaves a pub, and travels in the
following manner:
This is called a random walk
(along one
dimension: that is, only walking along a line going north-south.)
If somebody takes a random walk of, say,
WALK_LENGTH steps,
where will end up?
Sometimes they might be fully WALK_LENGTH
moves north of the pub, but that's not very likely.
Sometimes they'll be exactly back at the pub entrance
(but that's still pretty lucky, if they went north
exactly as many times as they went south).
Most of the time they'll probably be somewhat near the pub,
but not exactly where they started from.
So a natural question is, on average, where do they end up?
If fifty such random walkers all leave the pub,
how far will they get, on average?
How much variation will there be---will these fifty people
all be roughly the same distance away from the pub,
or will some be very close to the pub while others
are nearly a full WALK_LENGTH blocks away?
Aha, we are now in a position to experimentally find how
far away they are!
We will:
TO DO
First, write a function
Initially, the walker starts 0 blocks north of the pub,
and for the following numSteps times,
it takes adds a random amount of -1, 0, or +1.
Test your function by first calling it with a small number of steps (say, a random walk of 5 steps, or even just 1 step). Repeat several times, so you get an idea of whether your method seems to be giving reasonable results. Then, try calling it with larger number of steps, like 150 or so.
Tip
How to generate a random number in {-1,0,+1}:
Create a random-number-generator (java.util.Random),
as we saw last week.
Create a number in {0,1,2} (by calling the Random's method nextInt(3)),
and then subtract one from the result.
This gives you a random number in the range you want.
Try it in the code pad, a couple of times!
TO DO
Write a method to simulate many people, each taking their own random walk:
numWalks ints,
and then
initialize each element of the array
so it holds the result of doing on random walk.
(Call the function you just wrote above!).
TO DO Write a driver method. For right now, it will just call our previous method, and print out the array of results. (We'll add more things to our driver later.)
But how do we print out an array?
Alas, the built-in toString method is unhelpful, for arrays.
There are several ways:
String representing an array:
loops through the array,
each time tacking on the next number to a stringSoFar.
toString method
mentioned in the table comparing Lists and arrays.
We are going to want to calculate the average of several different
data sets (arrays).
Rather than write the
code-to-calculate-the-average-of-an-array repeatedly:
TO DO:
let's just write a method avg
which calculates the average of any array.
What is its signature?
TO DO: Go back and modify your driver method, so that it prints out the average of all the random walks. (This average should be close to zero...on average. About half the walkers will end up north of the pub, about half will end up south of the pub, and these will roughly cancel each other out. But not always, of course -- occasionally a set-of-random-walks might include an unusual percentage of people who all end up (say) to the north, and the average of that set will be further away from 0.)
We might know the average distance of a random-walk,
but
how far away from the average is a typical walk?
(That is, are walks tightly clustered around the average,
or spread far away?)
TO DO:
Modify your driver method
to take the results of the random walks (which are both positive and negative),
and modify that array so that
each location stores how far each walker got (no negative numbers).
That is, replace each item with is absolute value.
(Use Math.abs to convert a number to its absolute value.)
TO DO: Now that the driver method has an array just of distances, calculate and print the average of this modified array. (Remember, from a previous step, you already have a method which will compute any array's average. Don't repeat that code; just call that method!) This gives an informative number: The average distance walked.
TO DO
Write a method which takes in one array of numbers:
first it calculates the average value of the array,
and then it returns a whole new array whose elements
are the distance from each original number was from the average.
TO DO
At last,
we can provide answers to the question of how tightly clustered random walks are
—
that is, how-far-from-average a walk tends to be (on average!).
Have your driver print out the
the average value of the distance-from-the-average.
Of course, getting this should be easy:
calling eachDistFromAvg gives you an array of the differences,
and then calling average will give us the result:
avg( eachDistFromAvg( data ) ).
However,
there is
one annoying gotcha
:
Our method avg wanted an array of ints.
But eachDistFromAvg returns an array of doubles,
so
we can't actually pass the result of
eachDistFromAvg to avg!
Sigh:
The best we can do in JavaInteger and Double
for our data,
instead of the primitive int and double,
then we could have made avg taking in an array-of-Numberarray of
.
<? extends Number>Integer or
array-of-Double.
…It kinda makes one wonder,
why Java includes the primitive types in the first place,
if it just means you need to duplicate code for methods like avg, eh?
avg -- one
which takes in double[] instead of int[]. (Grrr!)
From a coding standpoint,
one thing to notice is that although there are several "TO DO" tasks,
each one solves a nice, restricted problem, and is only about 3-4 lines of code.
Moreover,
by calling methods we have previously written,
we leverage this code greatly -- we don't
have the code for computing the average repeated four times;
we only write it onceint[] and once taking in double[].
Grrr again!
This approach is called a Monte Carlo simulation: rather than mathematically figuring out exactly what the average is, we do a whole bunch of simulations, and get an idea of what's going on.
Challenge:
Run your program
on walks with different lengths
(say 100 steps, 1000 steps, 10,000 steps, 100,000 steps).
Do you see a pattern, in
distFromMean[] to store
the distance from our data to its mean.
That is, if meanOfData is 4,
and data[17] happened to be -1,
then distFromMean[17] would be 5
(the distance between -1 and 4).\footnote{Remember
that in math.h, the function double fabs(double x);
is declared, which returns the absolute value of its double
(floating-point (f)) argument.}
That same aside: when debugging/testing your program, this would
be a good time to print out the array distFromMean
to make sure things are working okay. Just call the
function printDoubleArray you already wrote!
However, when submitting your final program, please comment out
this call to printDoubleArray, as it will just
make your output printouts long.
\item
Hey, now that we have all the distances from the mean
in the array distFromMean[],
we can find the average of this array!
Easy, since we already have a function to do this.
Now, We will run our program, using NUM_DATA different
random walkers leaving the pub.
The function fillArray will put into each
array element the result of a single random walk:
That is, arr[0] will be the ending position
of {\em one} random walk of length WALK_LENGTH.
Seperately, arr[1] will be the ending
position of another random walk, and so on.
So fillArray is almost identical to what it
was before, except that instead of filling arr[i]
with the value i, it fills it with
randWalk( WALK_LENGTH ).
The rest of your program, without changing it\footnote{
You may want to modify what your program prints out
when reporting its results, though.},
is now calculating the mean ending position of all those
random walkers, and how far away they are from this mean
position, on average.
This approach is called a Monte Carlo simulation:
rather than mathematically figuring out exactly what
the average is, we do a whole bunch of simulations,
and get an idea of what's going on.
Can you guess a mathematical expression which gives
how far the walkers tend to be from their mean position,
as WALK_LENGTH gets larger and larger?
We will write a program which creates an array,
calculates the average value, and then
calculates how far away from the average those values
tend to be (i.e., are the numbers clustered around the
average, or do they vary widely?).
Your main program should do the following.
\begin{itemize}
\item
\#define NUM_DATA 50
(For debugging your program, I suggest using something
like 5, and once you are confident it works for small
cases, try something bigger like 50.)
\item Declare two arrays of size NUM_DATA, of
type double.
Call them (say) data and distFromMean.
(That is, data[0] is the first item of data,
and data[NUM_DATA - 1] is the final item.)
\item Put interesting values into data[]
by
calling a function fillArray.
This function fillArray takes as arguments
an array arr of doubles, and an integer size,
and fills arr[0] through arr[size-1] with
``interesting" values.
What does ``interesting" mean? We'll come back and spice it up
later, but for now write fillArray
so that element i of
the array just has value i, to keep things simple.
An aside:
When testing/debugging your program,
you may want to (for the moment) have main print out the values
of data, just to make sure fillArray is working as
expected.
An all-around handy tool to have is a function printDoubleArray,
which takes two arguments (an array of doubles, and an int representing
the size of the array), and prints the array on the screen.
However, when submitting your final program, please comment out
this call to printDoubleArray, as it will just
make your output printouts long.
\item
Print a message, informing the user of the mean,
and the average distance from the mean (ta-dum!).
\end{itemize}
\newpage
\subsection*{Random Walks}
One more thing: earlier, I said we'd later let
fillArray use more interesting
values than $\set{0, 1, \ldots, size-1}$.
Instead, we'll use data generated from random walks.
Consider a person who leaves a pub, and travels in the
following manner: a third of the time they walk north
a block. A third of the time they walk south a block.
And the remaining third of the time they don't move
at all. This is called a ``random walk" (along one
dimension, only walking along a line going north-south.)
After taking, say, WALK_LENGTH moves,
where will they be? Sometimes they'll be fully WALK_LENGTH
moves north of the pub, but that's not very likely.
Sometimes they'll be {\em exactly} back at the pub entrance
(but that's still pretty lucky, if they went north
exactly as many times as they went south).
Most of the time they'll probably be somewhat near the pub,
but not exactly back where they started from.
So a natural question is, on average, where do they end up?
%On average, how far away from the pub do they end up
%(regardless of whether they are north or south of the pub)?
If fifty such random walkers all leave the pub, where
will they be, on average?
How much variation will there be---will these fifty people
all be roughly the same distance away from the pub,
or will some be very close to the pub while others
are nearly a full WALK_LENGTH blocks away?
Aha, we are now in a position to experimentally find how
far away they are!
First, write a function int randWalk( int numSteps ),
which simulates a random walk of numSteps steps,
and returns the final position (in blocks north of the pub;
this number will of course be negative if the final position was south
of the pub).
You'll call this function, passing it WALK_LENGTH,
which you should \#define to be 150.
Initially, the walker starts 0 blocks north of the pub,
and for the following numSteps times,
it takes adds a random amount of $-1$, $0$, or $+1$.
(Hmm, last week's function randBetween could
come in handy here!)
We will run our program, using NUM_DATA different
random walkers leaving the pub.
% (Changing the name to
%something more descriptive, like NUM_WALKERS,
%would certainly be reasonable.
The function fillArray will put into each
array element the result of a single random walk:
That is, arr[0] will be the ending position
of {\em one} random walk of length WALK_LENGTH.
Seperately, arr[1] will be the ending
position of another random walk, and so on.
So fillArray is almost identical to what it
was before, except that instead of filling arr[i]
with the value i, it fills it with
randWalk( WALK_LENGTH ).
The rest of your program, without changing it\footnote{
You may want to modify what your program prints out
when reporting its results, though.},
is now calculating the mean ending position of all those
random walkers, and how far away they are from this mean
position, on average.