This is a two-part lab.
ejderrick and ibarland sections: Hw09 will be: complete all parts of lab12a, due the Wednesday after the break (Dec.03).

Random walks

Consider a person who leaves a pub, and travels in the following manner: A third of the time they walk north a block. A third of the time they walk south a block. And the remaining third of the time they don't move at all. This is called a random walk (along one dimension: that is, only walking along a line going north-south.)

If somebody takes a random walk of, say, WALK_LENGTH steps, where will end up? Sometimes they might be fully WALK_LENGTH moves north of the pub, but that's not very likely. Sometimes they'll be exactly back at the pub entrance (but that's still pretty lucky, if they went north exactly as many times as they went south). Most of the time they'll probably be somewhat near the pub, but not exactly where they started from.

So a natural question is, on average, where do they end up? If fifty such random walkers all leave the pub, how far will they get, on average? How much variation will there be---will these fifty people all be roughly the same distance away from the pub, or will some be very close to the pub while others are nearly a full WALK_LENGTH blocks away?

Aha, we are now in a position to experimentally find how far away they are! We will: Write a method which simulates one random walk. Collect data on many random walks, storing our results in an array. Calculate the average-distance-walked of all the numbers in the array. Calculate how far (on average) each walk tended to vary from the average.


TO DO First, write a function /** See how far somebody gets, in a random walk. * @param numSteps The total number of steps taken. * @return The final position, after the walk: * 0 represents the starting point; * the result can be positive or negative. */ int randWalk( int numSteps )

Initially, the walker starts 0 blocks north of the pub, and for the following numSteps times, it takes adds a random amount of -1, 0, or +1.

Test your function by first calling it with a small number of steps (say, a random walk of 5 steps, or even just 1 step). Repeat several times, so you get an idea of whether your method seems to be giving reasonable results. Then, try calling it with larger number of steps, like 150 or so.

Tip How to generate a random number in {-1,0,+1}: Create a random-number-generator (java.util.Random), as we saw last week. Create a number in {0,1,2} (by calling the Random's method nextInt(3)), and then subtract one from the result. This gives you a random number in the range you want. Try it in the code pad, a couple of times!

Analyzing Random Walks

TO DO Write a method to simulate many people, each taking their own random walk: /** Perform many random walks, recording the result. * @param numWalks How many random walks to simulate. * @param numSteps The length of each of those walks. * @return an array of length `numWalks`. */ int[] takeManyWalks( int numWalks, int numSteps ) You method must create (allocate) an array of numWalks ints, and then initialize each element of the array so it holds the result of doing on random walk. (Call the function you just wrote above!).

TO DO Write a driver method. For right now, it will just call our previous method, and print out the array of results. (We'll add more things to our driver later.)

But how do we print out an array? Alas, the built-in toString method is unhelpful, for arrays. There are several ways: Write a helper-method which loops through an array, printing each item. More flexibly, write a method which returns a String representing an array: loops through the array, each time tacking on the next number to a stringSoFar. You can use the improved toString method mentioned in the table comparing Lists and arrays. In BlueJ's Code Pad, you can inspect (double-click) on an array object.


Lab12, continued:

We are going to want to calculate the average of several different data sets (arrays). Rather than write the code-to-calculate-the-average-of-an-array repeatedly:
TO DO: let's just write a method avg which calculates the average of any array. What is its signature?

TO DO: Go back and modify your driver method, so that it prints out the average of all the random walks. (This average should be close to zero...on average. About half the walkers will end up north of the pub, about half will end up south of the pub, and these will roughly cancel each other out. But not always, of course -- occasionally a set-of-random-walks might include an unusual percentage of people who all end up (say) to the north, and the average of that set will be further away from 0.)

For example, The average endpoint of random walks. (This should be close to zero...on average. About half the walkers will end up north of the pub, about half will end up south of the pub, and these will roughly cancel each other out.) The average distance from the pub, when the walk ends (regardless of whether it was north or south). Suppose that, after doing many random walks, the average distance reached was (say) 20 steps. We know some walks were more than 20, and some were less than 20. Suppose we're told one walker ended up 30 steps from the pub: is that unusually far away, or is that routine? That is, does this average of 20 come from some walks being 22 and some 18 (and it's remarkable if somebody ended up 10 steps away), or does this average of 20 come from some walks being 30 and some walks being 10 (and it's typical for somebody to end up 10 steps away). We want Okay, The average (regardless of whether it was north or south).

We might know the average distance of a random-walk, but how far away from the average is a typical walk? (That is, are walks tightly clustered around the average, or spread far away?)
TO DO: Modify your driver method to take the results of the random walks (which are both positive and negative), and modify that array so that each location stores how far each walker got (no negative numbers). That is, replace each item with is absolute value. (Use Math.abs to convert a number to its absolute value.)

TO DO: Now that the driver method has an array just of distances, calculate and print the average of this modified array. (Remember, from a previous step, you already have a method which will compute any array's average. Don't repeat that code; just call that method!) This gives an informative number: The average distance walked.

TO DO Write a method which takes in one array of numbers: first it calculates the average value of the array, and then it returns a whole new array whose elements are the distance from each original number was from the average. /** Given an array of data, * compute how far each element is from the average. * @param data The array of data. * @return an array containing the distance (non-negative) * of each item from the average. * For example: if `data` contains {18.0, 20.0, 22.0}, * this method would first compute the average (20.0), * and return the array {2.0, 0.0, 2.0}. * (since 18 is distance-two from the average, * as is 22 -- we don't care about positive or negative). */ double[] eachDistFromAvg( int[] data )

TO DO
At last, we can provide answers to the question of how tightly clustered random walks are — that is, how-far-from-average a walk tends to be (on average!). Have your driver print out the the average value of the distance-from-the-average.
Of course, getting this should be easy: calling eachDistFromAvg gives you an array of the differences, and then calling average will give us the result: double[] allDistances = eachDistFromAvg( data ); double avgDistFromAvg = avg( allDistances ); or even in one expession (without needing local variables): avg( eachDistFromAvg( data ) ).
However, there is one annoying gotcha: Our method avg wanted an array of ints. But eachDistFromAvg returns an array of doubles, so we can't actually pass the result of eachDistFromAvg to avg!
Sigh: The best we can do in Java If we had started out using the wrapper classes Integer and Double for our data, instead of the primitive int and double, then we could have made avg taking in an array-of-Number Actually, it's not even that simple: you'd need to use the type array of <? extends Number>. , and it would accept either array-of-Integer or array-of-Double. …It kinda makes one wonder, why Java includes the primitive types in the first place, if it just means you need to duplicate code for methods like avg, eh? is write a second version of avg -- one which takes in double[] instead of int[]. (Grrr!)

Reflecting

From a coding standpoint, one thing to notice is that although there are several "TO DO" tasks, each one solves a nice, restricted problem, and is only about 3-4 lines of code. Moreover, by calling methods we have previously written, we leverage this code greatly -- we don't have the code for computing the average repeated four times; we only write it onceYeah yeah, twice, once taking in int[] and once taking in double[]. Grrr again!, and then later we can just call the method.

This approach is called a Monte Carlo simulation: rather than mathematically figuring out exactly what the average is, we do a whole bunch of simulations, and get an idea of what's going on.

Challenge: Run your program on walks with different lengths (say 100 steps, 1000 steps, 10,000 steps, 100,000 steps). Do you see a pattern, in The average distance walked? (Can you guess a formula for it?) The average-distance-from-the-average? (Can you guess a formula for it?)

\item We are not so much concerned with the average value of our data, but rather how far the data tends to vary from its average value (from its ``mean"). For instance, the values $\set{3,4,5}$ and the values $\set{-1,9}$ both have a mean of 4, but in the first case the values tend to be close to 4, but in the second they are further away. In particular, how far are the numbers $\set{3,4,5}$ from their mean, 4, on average? We add up the distance of each number from 4, and then divide by how many numbers we had: (1 + 0 + 1)/3 = 2/3. This indicates that the data is clustered close to its mean: on average, it's only 2/3 away from the mean. In the second case, how far does $\set{-1,9}$ tend to be from 4? This should be a larger value than before, since these numbers are spread further away from their mean. (So don't confused the two things we are averaging: the average value (mean) of the array, and later the average distance from the mean.\footnote{ Two other common ways to measure how spread out data is from its mean are the {\em variance} and the {\em standard deviation}. These quantities are nicer to doeal with mathematically, but aren't quite as intuitive a quantity as what we're calculating. The variance of data is the average {\em squared} distance between each datum from its mean. The standard deviation is the square root of the variance (to try to make things kosher again). However note that $\sqrt{3^2 + 4^5}$ is different than $3+4$; by squaring, then adding, then taking the root we've distorted things a bit.}) So all that said, how can our program calculate the average distance of our data from its mean? We already have a function which calculates the average of an array of numbers, so let's use it again! Use the array distFromMean[] to store the distance from our data to its mean. That is, if meanOfData is 4, and data[17] happened to be -1, then distFromMean[17] would be 5 (the distance between -1 and 4).\footnote{Remember that in math.h, the function double fabs(double x); is declared, which returns the absolute value of its double (floating-point (f)) argument.} That same aside: when debugging/testing your program, this would be a good time to print out the array distFromMean to make sure things are working okay. Just call the function printDoubleArray you already wrote! However, when submitting your final program, please comment out this call to printDoubleArray, as it will just make your output printouts long. \item Hey, now that we have all the distances from the mean in the array distFromMean[], we can find the average of this array! Easy, since we already have a function to do this. Now, We will run our program, using NUM_DATA different random walkers leaving the pub. The function fillArray will put into each array element the result of a single random walk: That is, arr[0] will be the ending position of {\em one} random walk of length WALK_LENGTH. Seperately, arr[1] will be the ending position of another random walk, and so on. So fillArray is almost identical to what it was before, except that instead of filling arr[i] with the value i, it fills it with randWalk( WALK_LENGTH ). The rest of your program, without changing it\footnote{ You may want to modify what your program prints out when reporting its results, though.}, is now calculating the mean ending position of all those random walkers, and how far away they are from this mean position, on average. This approach is called a Monte Carlo simulation: rather than mathematically figuring out exactly what the average is, we do a whole bunch of simulations, and get an idea of what's going on. Can you guess a mathematical expression which gives how far the walkers tend to be from their mean position, as WALK_LENGTH gets larger and larger? We will write a program which creates an array, calculates the average value, and then calculates how far away from the average those values tend to be (i.e., are the numbers clustered around the average, or do they vary widely?). Your main program should do the following. \begin{itemize} \item \#define NUM_DATA 50 (For debugging your program, I suggest using something like 5, and once you are confident it works for small cases, try something bigger like 50.) \item Declare two arrays of size NUM_DATA, of type double. Call them (say) data and distFromMean. (That is, data[0] is the first item of data, and data[NUM_DATA - 1] is the final item.) \item Put interesting values into data[] by calling a function fillArray. This function fillArray takes as arguments an array arr of doubles, and an integer size, and fills arr[0] through arr[size-1] with ``interesting" values. What does ``interesting" mean? We'll come back and spice it up later, but for now write fillArray so that element i of the array just has value i, to keep things simple. An aside: When testing/debugging your program, you may want to (for the moment) have main print out the values of data, just to make sure fillArray is working as expected. An all-around handy tool to have is a function printDoubleArray, which takes two arguments (an array of doubles, and an int representing the size of the array), and prints the array on the screen. However, when submitting your final program, please comment out this call to printDoubleArray, as it will just make your output printouts long. \item Print a message, informing the user of the mean, and the average distance from the mean (ta-dum!). \end{itemize} \newpage \subsection*{Random Walks} One more thing: earlier, I said we'd later let fillArray use more interesting values than $\set{0, 1, \ldots, size-1}$. Instead, we'll use data generated from random walks. Consider a person who leaves a pub, and travels in the following manner: a third of the time they walk north a block. A third of the time they walk south a block. And the remaining third of the time they don't move at all. This is called a ``random walk" (along one dimension, only walking along a line going north-south.) After taking, say, WALK_LENGTH moves, where will they be? Sometimes they'll be fully WALK_LENGTH moves north of the pub, but that's not very likely. Sometimes they'll be {\em exactly} back at the pub entrance (but that's still pretty lucky, if they went north exactly as many times as they went south). Most of the time they'll probably be somewhat near the pub, but not exactly back where they started from. So a natural question is, on average, where do they end up? %On average, how far away from the pub do they end up %(regardless of whether they are north or south of the pub)? If fifty such random walkers all leave the pub, where will they be, on average? How much variation will there be---will these fifty people all be roughly the same distance away from the pub, or will some be very close to the pub while others are nearly a full WALK_LENGTH blocks away? Aha, we are now in a position to experimentally find how far away they are! First, write a function int randWalk( int numSteps ), which simulates a random walk of numSteps steps, and returns the final position (in blocks north of the pub; this number will of course be negative if the final position was south of the pub). You'll call this function, passing it WALK_LENGTH, which you should \#define to be 150. Initially, the walker starts 0 blocks north of the pub, and for the following numSteps times, it takes adds a random amount of $-1$, $0$, or $+1$. (Hmm, last week's function randBetween could come in handy here!) We will run our program, using NUM_DATA different random walkers leaving the pub. % (Changing the name to %something more descriptive, like NUM_WALKERS, %would certainly be reasonable. The function fillArray will put into each array element the result of a single random walk: That is, arr[0] will be the ending position of {\em one} random walk of length WALK_LENGTH. Seperately, arr[1] will be the ending position of another random walk, and so on. So fillArray is almost identical to what it was before, except that instead of filling arr[i] with the value i, it fills it with randWalk( WALK_LENGTH ). The rest of your program, without changing it\footnote{ You may want to modify what your program prints out when reporting its results, though.}, is now calculating the mean ending position of all those random walkers, and how far away they are from this mean position, on average.