In machine learning applications one will often come across the term “random” (or, in the jargon of ML, “stochastic”). What does it mean when we speak about random (stochastic) models, and does the “randomness” of models mean they are not reliable?
Let us begin with a simple analogy:
One day you wake up in the middle of a forest. Trees reach high into the sky, with a thick, viney canopy blocking out the light. In every direction you see nothing but trees and vines. Panicking slightly, you realise you are lost. What to do?
In the distance you hear what you think is the sound of a highway – if you can get to that highway you will be rescued! The problem facing you is that the sound is very faint and with the breeze rustling the canopy, birds tweeting and the occasional howling wolf (!) you are not quite sure which way to go (your environment is “noisy” in the parlance of machine learning).
You listen intently and decide the highway is in a certain direction and you head off – after 20 steps you pause and re-evaluate. Can you hear any better from here? Which way is the highway now? You decide on a direction and head off – another 20 steps.
Stop. Evaluate. Take 20 steps. Repeat (perhaps hundreds of times).
Eventually you find yourself on the highway – right near mile marker 35. Well done!
Your path through the forest was random (stochastic), with each step taking you closer to your target on average but with some steps taking you away. Imagine in your mind’s eye the path you took, wandering across the forest floor in a “drunken stagger”. Sometimes (from the vantage point of an eagle, and with hindsight), you headed directly to the highway (so close!) and other times you moved away (oh no!). But, on average, you got closer and closer until you reached your goal.
If you were to repeat this process many, many times, each path would likely differ, and you might exit at mile marker 34, 35, 36 and so on. But in all cases you would end up at the highway. This random walk, with an outcome in a narrow band of acceptable results (reach the highway), is a stochastic model. We could repeat the exercise hundreds of times, carefully writing down each literal step we took, then decide which of the attempts was quickest or took the fewest steps (or was most scenic!), and publish a guide, Lost in Woods, for anyone in those exact circumstances. This would be a “model”.
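The walk through the forest can be simulated in a few lines of Python. This is only a sketch under made-up assumptions: the highway is a straight line 1,000 steps due east, each "listen" gives the true direction plus Gaussian noise, and the names `walk_to_highway`, `highway_x`, `stride` and `noise_sd` are hypothetical, chosen for illustration.

```python
import math
import random

def walk_to_highway(seed, highway_x=1000.0, stride=20.0,
                    noise_sd=1.0, max_rounds=10_000):
    """Simulate one noisy walk; all parameters are illustrative assumptions."""
    rng = random.Random(seed)      # a different seed gives a different walk
    x, y = 0.0, 0.0                # start lost, somewhere in the forest
    rounds = 0                     # how many times we stopped to listen
    away = 0                       # rounds that moved us away from the highway
    while x < highway_x and rounds < max_rounds:
        # The true direction is 0 radians (due east); we hear it with noise.
        heading = rng.gauss(0.0, noise_sd)
        dx = stride * math.cos(heading)
        dy = stride * math.sin(heading)
        if dx < 0:
            away += 1              # this round took us further from the goal
        x += dx
        y += dy
        rounds += 1
    return x, y, rounds, away

# Repeat the exercise with five different seeds.
results = [walk_to_highway(seed) for seed in range(5)]
for x, y, rounds, away in results:
    print(f"exit position along highway: {y:8.1f}  "
          f"rounds: {rounds:4d}  rounds heading away: {away}")
```

Each run should reach the highway, but at a different point along it and after a different number of rounds, with a minority of rounds heading the wrong way. Picking the run with the fewest rounds would be the simulated equivalent of publishing the guide.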
Machine learning models follow the same basic stochastic approach. They start off lost. They try to learn to predict an outcome by taking small steps in the direction that leads to their goal: each step follows the negative gradient of a loss function, estimated from a noisy sample of the data, towards a local or global minimum. When they reach a final solution, that solution is just one of many possible solutions, all “good enough” (getting to the highway solved your problem; it was not your aim to reach mile marker X in Y steps and Z minutes).
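To make the parallel concrete, here is a minimal sketch of stochastic gradient descent fitting a single weight. Everything in it is an assumption made for illustration: the synthetic data (y = 3x plus noise), the squared-error loss, the learning rate, and the helper names `make_data` and `sgd_fit`. It is not a recipe from any particular library.

```python
import random

def make_data(n=200, true_w=3.0, noise_sd=0.5, seed=0):
    """Synthetic, noisy data: y = true_w * x + noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_w * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def sgd_fit(xs, ys, seed, lr=0.02, epochs=20):
    """Fit y ~ w * x with plain SGD, one data point per step."""
    rng = random.Random(seed)
    w = rng.uniform(-5.0, 5.0)          # start off lost: a random initial guess
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)              # visit the data in a random order
        for i in order:
            # Gradient of the squared error (w * x - y)**2 with respect to w,
            # estimated from this single noisy data point.
            grad = 2.0 * (w * xs[i] - ys[i]) * xs[i]
            w -= lr * grad              # small step against the gradient
    return w

xs, ys = make_data()
# Same data, five different random seeds: five different "paths".
fits = [sgd_fit(xs, ys, seed) for seed in range(5)]
for seed, w in enumerate(fits):
    print(f"seed {seed}: learned w = {w:.3f}")
```

The learned weights should differ slightly from seed to seed, yet all sit close to the value used to generate the data, much as every walk exits near, but not exactly at, the same mile marker.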
In short, the random nature of machine learning (stochastic gradient descent being the best-known example) is a way of searching for a local or global minimum so as to approximate the best solution in a noisy, non-deterministic setting. It underpins much of modern machine learning and should not concern you. Embrace it.