Newsgroups: comp.ai.neural-nets
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Kangaroos (Was Re: BackProp without Calculus?)
Date: Wed, 1 Sep 1993 21:34:56 GMT

Training a network is a form of numerical optimization, which can be
likened to a kangaroo searching for the top of Mt. Everest. Everest is
the global optimum, but the top of any other really high mountain such
as K2 would be nearly as good. (We're talking about maximization now,
while neural networks are usually discussed in terms of minimization,
but if you multiply everything by -1 it works out the same.)

Initial weights are usually chosen randomly, which means that the
kangaroo may start out anywhere in Asia. If you know something about
the scales of the inputs, you may be able to get the kangaroo to start
near the Himalayas. However, if you make a really stupid choice of
distributions for the random initial weights, or if you have really
bad luck, the kangaroo may start in South America.

With Newton-type (2nd-order) algorithms, the Himalayas are covered
with a dense fog, and the kangaroo can only see a little way around
its location. Judging from the local terrain, the kangaroo makes a
guess about where the top of the mountain is and tries to jump all the
way there. In a stabilized Newton algorithm, the kangaroo has an
altimeter, and if the jump takes him to a lower point, he backs up to
where he was and takes a shorter jump. If the algorithm isn't
stabilized, the kangaroo may mistakenly jump to Shanghai and get
served for dinner in a Chinese restaurant. (I never claimed this
analogy was realistic.)

In steepest ascent with line search, the fog is _very_ dense, and the
kangaroo can only tell which direction leads up. The kangaroo hops in
this direction until the terrain starts going down again, then chooses
another direction.

In standard backprop or stochastic approximation, the kangaroo is
blind and has to feel around on the ground to make a guess about which
way is up. He may be fooled by rough terrain unless you use batch
training. If the kangaroo ever gets near the peak, he may jump back
and forth across it without ever landing on it. If you use a decaying
step size, the kangaroo gets tired and makes smaller and smaller hops,
so if he ever gets near the peak he has a better chance of actually
landing on it before the Himalayas erode away.

In backprop with momentum, the kangaroo has poor traction and can't
make sharp turns.

I have been unable to devise a kangaroo analogy for cascade
correlation. Any ideas, Scott?

Notice that in all the methods discussed so far, the kangaroo can, at
best, hope to find the top of a mountain close to where he starts.
There's no guarantee that this mountain will be Everest, or even a
very high mountain. Various methods are used to try to find the actual
global optimum.

In simulated annealing, the kangaroo is drunk and hops around randomly
for a long time. However, he gradually sobers up and tends to hop
uphill.

In genetic algorithms, there are lots of kangaroos that are parachuted
into the Himalayas (if the pilot didn't get lost) at random places.
These kangaroos do not know that they are supposed to be looking for
the top of Mt. Everest. However, every few years, you shoot the
kangaroos at low altitudes and hope the ones that are left will be
fruitful and multiply.

--
Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513       those of SAS Institute.
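For readers who would rather watch the blind kangaroo than read about
him, here is a minimal Python sketch of backprop-style search with a
decaying step size and momentum. None of it comes from the article:
the landscape, the function names, and every constant are invented
purely for illustration.

import math
import random

def altitude(x, y):
    # A made-up two-mountain world: Everest is the global optimum,
    # K2 would be nearly as good. Heights in kilometers.
    everest = 8.848 * math.exp(-((x - 5) ** 2 + (y - 5) ** 2) / 20)
    k2 = 8.611 * math.exp(-((x + 4) ** 2 + (y + 3) ** 2) / 20)
    return everest + k2

def feel_slope(x, y, eps=1e-4):
    # The blind kangaroo feels the ground around its feet to guess
    # which way is up (a finite-difference slope estimate).
    dx = (altitude(x + eps, y) - altitude(x - eps, y)) / (2 * eps)
    dy = (altitude(x, y + eps) - altitude(x, y - eps)) / (2 * eps)
    return dx, dy

random.seed(1993)
x, y = random.uniform(-10, 10), random.uniform(-10, 10)  # anywhere in Asia
vx = vy = 0.0
momentum = 0.9  # poor traction: the old hop direction carries over

for t in range(1, 2001):
    step = 0.5 / t  # decaying step size: the kangaroo gets tired
    dx, dy = feel_slope(x, y)
    vx = momentum * vx + step * dx
    vy = momentum * vy + step * dy
    x, y = x + vx, y + vy  # hop uphill (we add, since we maximize)

print("landed at (%.2f, %.2f), altitude %.2f km" % (x, y, altitude(x, y)))

Change the seed and rerun a few times to see the moral of the story:
the kangaroo finds, at best, the top of a mountain near where it
started, and sometimes it just wanders around on nearly flat ground.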
[follow-up to first article follows]

From: prechelt@i41s18.ira.uka.de (Lutz Prechelt)
Subject: Re: Kangaroos (Was Re: BackProp without Calculus?)
Date: 2 Sep 1993 16:31:45 GMT

In article , saswss@hotellng.unx.sas.com (Warren Sarle) writes:
|>
|> Training a network is a form of numerical optimization, which can
|> be likened to a kangaroo searching for the top of Mt. Everest.
...
|> Initial weights are usually chosen randomly, which means that the
|> kangaroo may start out anywhere in Asia.

To shed some more light on what this wonderful article means with
respect to the original question (how to understand the
backpropagation algorithm without calculus), here are a few additional
remarks:

1. The analogy only explains WHAT the algorithm does, but not HOW it
does the most intricate part: choosing the direction of the next jump.
Basically, what the kangaroo does is the following: Wherever it
stands, it carves two ditches whose walls meet so as to form a
V-shape, one ditch in the north-south direction and one in the
east-west direction. It plates the walls of these ditches first with
steel and then with teflon so as to minimize friction (most, but not
all, variants of backprop in fact reduce the friction to zero) and so
that all small valleys or hills the ditch may have had are averaged
out. Then the kangaroo takes a bowling ball out of its pouch, puts it
into the north-south ditch, and measures how far, and in which
direction, it rolls along the ditch in a certain time. This procedure
is repeated for the east-west ditch. Let's assume the ball rolled 8
centimeters north in the first ditch and 14 centimeters east in the
second ditch. Then a kangaroo that uses learning rate 50 will jump to
a point that is 4 meters north and 7 meters east of where it was
before. It is not important for the algorithm whether the kangaroo
uses the same bowling ball over and over again or throws it away after
each measurement and picks a new one from its pouch next time. This is
because in the backpropagation world, bowling balls bio-degrade in
zero time.

2. Like all nice simplifications, this one, too, has a slight
drawback. In this case, the limitation is that the analogy only
explains the case of a network with two weights (which is fewer than
*any* useful backpropagation network must have). These two weights are
represented by the two orthogonal search directions of the kangaroo
(north-south and east-west). In order to generalize the example to,
say, a fully connected network with three layers containing 10 input
nodes, 5 hidden nodes, and 8 output nodes (having 10x5 + 5x8 = 50 + 40
= 90 weights), you have to imagine the same situation in a world
existing in 91-dimensional space instead of our 3-dimensional one. I
assure you that visualizing this generalization is just as easy for a
non-math person as it is for any calculus professor.
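For the pocket-calculator-inclined, the arithmetic of remark 1 can be
restated in a few lines of Python; the variable names are mine, but
the numbers are exactly the ones from the text:

learning_rate = 50

# What the bowling ball measured, converted to meters:
slope_north = 0.08  # ball rolled 8 cm north in the north-south ditch
slope_east = 0.14   # ball rolled 14 cm east in the east-west ditch

# The jump rule: new position = old position + learning_rate * slope.
jump_north = learning_rate * slope_north
jump_east = learning_rate * slope_east
print(jump_north, jump_east)  # -> 4.0 and 7.0 meters (give or take
                              #    floating-point rounding)

In weight language this is just new_weight = old_weight +
learning_rate * slope, applied once per weight; the 90-weight network
of remark 2 simply digs 90 ditches instead of 2.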
3. Oh yes, one more very important question: Why do the Himalayas look
just the way they do? The answer is: they don't. The mountains in
which the kangaroo jumps around are `induced' by your training data.
Each example suggests certain hills or mountains at certain points on
the surface of the (otherwise absolutely flat) earth. If the kangaroo
performs a `batch' search, the world looks like the arithmetic average
of what the training examples suggest.

If the kangaroo performs an `online' search, the situation is more
complicated: there is one world for each training example, and each of
these worlds looks exactly the way the one training example it was
made from suggests. The kangaroo takes one jump in the first world
according to the above method and is then magically transferred to the
equivalent point in the next world, that is, to the point with the
same longitude and latitude, er, the same x and y coordinates (since
the worlds have to be rectangular for backprop, instead of spherical).
In each of the worlds the same procedure is applied, and then the
kangaroo continues in the first world again. Interestingly enough, the
magical inter-world transfer is so inspiring to the kangaroo that it
can make one jump in every one of the `online' worlds (no matter how
many there are) in about the same time it needs for only two jumps in
the `batch' world. This is the reason why `online' kangaroos often
find the point that provides the best compromise between the altitudes
in all the worlds much faster than `batch' kangaroos find the top of
the highest mountain in their single world. Sometimes, however, the
inter-world transfers are so confusing to the `online' kangaroo that
it never (or only very slowly) finds the optimal point.

There are lots of heuristics to further improve the speed and/or
precision of the kangaroo's search. Most of them, though, require a
pocket calculator or lots of note paper or both.

From all this we can conclude that the best methods to find Mount
Everest are (in order):
1. to know where it is,
2. to have a map on which you can find it,
3. to know someone who knows where it is or who has a map, or
4. to send a kangaroo to search for it.
And even if you have to send a kangaroo, it is useful if you know at
least
1. where the mountain range is in which Mount Everest may be, and
2. how to bring your kangaroo to that mountain range.

  Lutz

P.S.: The newest research results in the neural network area indicate
that backprop also works with frogs if you replace the bowling ball
with something appropriate (for instance, a solar-powered
electro-mechanical 3-bit steep-O-meter).
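Remark 3's batch and online kangaroos can also be restated as a short
hypothetical Python sketch. Everything here is invented for
illustration: the function names are mine, and each example's world is
taken to be the negative squared error of a two-weight linear fit,
which is one common way such worlds arise in practice.

import random

def uphill_slope(w, example):
    # Slope (uphill direction) of the one world induced by a single
    # training example. That world's altitude is
    # -(w0*x + w1 - target)**2: a ridge that peaks wherever this
    # example is fit exactly.
    x, target = example
    err = w[0] * x + w[1] - target
    return (-2 * err * x, -2 * err)

def batch_kangaroo(w, examples, lr=0.02, hops=200):
    # One world: the arithmetic average of all the example worlds.
    for _ in range(hops):
        g0 = sum(uphill_slope(w, e)[0] for e in examples) / len(examples)
        g1 = sum(uphill_slope(w, e)[1] for e in examples) / len(examples)
        w = (w[0] + lr * g0, w[1] + lr * g1)
    return w

def online_kangaroo(w, examples, lr=0.02, passes=200):
    # One jump per world, then a magical transfer to the next world.
    for _ in range(passes):
        random.shuffle(examples)  # visit the worlds in a fresh order
        for e in examples:
            g0, g1 = uphill_slope(w, e)
            w = (w[0] + lr * g0, w[1] + lr * g1)
    return w

examples = [(1.0, 2.1), (2.0, 2.9), (3.0, 4.2)]
print(batch_kangaroo((0.0, 0.0), examples))
print(online_kangaroo((0.0, 0.0), list(examples)))

With these three examples, the online kangaroo typically ends up near
the best compromise point (roughly w = (1.05, 0.97), the least-squares
line) in fewer passes than the batch kangaroo, but it keeps jittering
a little from world to world, exactly the trade-off described above.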