I am often asked to explain complex technical concepts to people who are not technical. These people are often experts in their own field and are either generally curious about the subject matter or need a non-technical grasp of it for decision-making purposes.
This article is aimed at non-technical people, to assist with a basic understanding of machine learning and provide insight into the process. Data scientists may argue with some of the points or analogies I make, or feel that some parts are oversimplified.
I always find the best way to explain any subject is through an analogy, woven into a story for the more complex subjects. We build toward an understanding of the subject matter through an exploration of material the student already knows.
And we can apply this same concept to AI (or machine learning).
Before we get into the nuts and bolts of the topic, let us first take a simple analogy and work from there. If I were to program a robotic arm to make a cup of tea using conventional programming languages, the task would be tedious and time-consuming. Now imagine explaining the same task to a five-year-old. You would begin by saying “Take a tea cup from the cupboard and place it next to the kettle. Put some water in the kettle and turn it on. When it is boiling, turn it off and pour some water into the tea cup.” The five-year-old may well succeed at this task, but when you analyse what you have instructed, you will realise the enormous amount of information the child had at her disposal that you did not need to explain. What is a tea cup? Where are they kept? What is water? How do I get water into the kettle? This is all domain knowledge the child has accumulated over a few years. A conventionally programmed robot has none of it: procedural programming languages cannot learn, so every one of these details would have to be spelled out. The child has learned each step through trial and error over many attempts, and is able to build a new cohesive solution from these partial skills.
Taking one step closer to machine learning from the above analogy, show the child a photo of a common animal (dog, cat, horse, etc.) and ask what animal it is. You would likely be disappointed if they could not tell you. But how does a child perform this feat? How do they know, instinctively, what animal it is? It could be a dog breed they have never before encountered, and yet most five-year-olds would recognise it immediately. What is it about a photo of a dog that says “this is a dog”? We humans just “know” things like “if it has big floppy ears it is more likely to be a dog than a cat”, and we take all these “known” facts, add them up instantly in our brain and say “dog!”. We are able to take partial pieces of information and build them into one coherent result. These partial pieces of information are our first concept. In machine learning the “partial pieces of information that allow us to decide on an answer” are called features.
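To make the idea of features a little more concrete, here is a minimal Python sketch of how two animals might be written down as features. The feature names and values (floppy_ears, retractable_claws, weight_kg) are hypothetical and chosen only for illustration; a real system would use many more.

```python
# Hypothetical features, chosen only for illustration.
FEATURE_NAMES = ["floppy_ears", "retractable_claws", "weight_kg"]

# 1 means "yes", 0 means "no"; the weight is in kilograms.
labrador = {"floppy_ears": 1, "retractable_claws": 0, "weight_kg": 30}
house_cat = {"floppy_ears": 0, "retractable_claws": 1, "weight_kg": 4}


def to_feature_list(animal):
    """Turn an animal's description into a plain list of numbers."""
    return [animal[name] for name in FEATURE_NAMES]


print(to_feature_list(labrador))   # [1, 0, 30]
print(to_feature_list(house_cat))  # [0, 1, 4]
```

A list of numbers like this is all the machine ever "sees" of the animal.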
Machines learn through a process of identifying which features are most important to a problem area and using those learned importances to make decisions. Going back to the previous example of animal identification, a feature such as “has a head” is easily dismissed by humans. ALL animals have a head (at least those that are alive!). But do they? A starfish does not – but we would still recognise it! If you have ever played “20 questions” you will understand that asking certain questions up front can lead you to a smaller subset of possibilities quickly, but can also cause you to go down the wrong track from the start. “Does it fly” is such a question. If the answer is “No” then we eliminate all birds and start thinking of land or sea animals. But the answer may have been “Ostrich”.
The machine learning algorithm has to decide which features are most important, and for which outcome. These “decisions” are called weights. Features can also be learned in combination, and each combination has its own weight. This allows us to overcome the “does it fly/ostrich” type of problem. A combined question such as “does it fly and stand under 1.4m tall?” is one example.
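As a rough illustration of weights, the sketch below scores an animal for "is this a dog?" by multiplying each feature by a weight and adding up the results. The feature names and the weight values are invented for this example; in a real system the weights would be learned from data, not typed in by hand.

```python
# Hypothetical weights for the question "is this a dog?".
# Positive weights count towards "dog", negative weights count against.
DOG_WEIGHTS = {
    "floppy_ears": 2.0,
    "weighs_20kg_or_more": 1.5,
    "retractable_claws": -3.0,
    "slit_eyes": -2.0,
}


def dog_score(animal):
    """Add up each feature (1 = yes, 0 = no) multiplied by its weight."""
    return sum(DOG_WEIGHTS[name] * animal[name] for name in DOG_WEIGHTS)


labrador = {"floppy_ears": 1, "weighs_20kg_or_more": 1,
            "retractable_claws": 0, "slit_eyes": 0}
german_shepherd = {"floppy_ears": 0, "weighs_20kg_or_more": 1,
                   "retractable_claws": 0, "slit_eyes": 0}
house_cat = {"floppy_ears": 0, "weighs_20kg_or_more": 0,
             "retractable_claws": 1, "slit_eyes": 1}

print(dog_score(labrador))         # 3.5
print(dog_score(german_shepherd))  # 1.5
print(dog_score(house_cat))        # -5.0
```

Note that the German shepherd has no floppy ears and still scores above zero: no single feature decides the answer, only the weighted total.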
Within the structure of a machine learning system we have a number of decision-making nodes called neurons. For the sake of simplicity we will only have one layer of neurons for our discussion, and in a future article we will add more. Each neuron is fed information about the problem domain (so each neuron gets access to all the features). Each neuron then determines the importance of the features it has been given and provides information to the output (the part that guesses the answer). If the neuron contributed to the correct answer, that is, its “instinct” was correct, it is rewarded through a reinforcement process that strengthens its decision, and if it was wrong it is told to try something else next time. This repeats for all the neurons. Once the system has worked through one game of “20 questions” or “Guess the animal” it is given a second turn, and the process repeats. It will repeat for every example we have. These examples are called samples.
And we can rerun the full set of samples multiple times (these reruns are called epochs, but we need not worry about that for now).
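For readers who would like to see samples and epochs in action, here is a small sketch of a single neuron learning to tell dogs from cats. It uses the classic perceptron update rule as a simple stand-in for the "strengthen or try something else" process described above; real systems use more refined methods. The samples, the feature order and the function names are all made up for illustration.

```python
# Each sample: ([floppy_ears, retractable_claws, weighs_20kg_or_more], label)
# Label 1 means "dog", label 0 means "cat". All values are made up.
SAMPLES = [
    ([1, 0, 1], 1),
    ([0, 1, 0], 0),
    ([0, 0, 1], 1),
    ([1, 1, 0], 0),  # a cat with folded ears
    ([1, 0, 0], 1),
]


def predict(weights, bias, features):
    """Guess 1 (dog) if the weighted total is above zero, else 0 (cat)."""
    total = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if total > 0 else 0


def train(samples, epochs):
    """Show every sample to the neuron, and repeat for a number of epochs."""
    weights = [0, 0, 0]
    bias = 0
    for epoch in range(epochs):
        mistakes = 0
        for features, label in samples:
            error = label - predict(weights, bias, features)  # -1, 0 or +1
            if error != 0:
                # Wrong guess: nudge each weight towards the right answer.
                mistakes += 1
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
        if mistakes == 0:
            break  # a full pass with no mistakes: nothing left to learn
    return weights, bias


weights, bias = train(SAMPLES, epochs=10)
print(weights, bias)
```

After training, the weight for "retractable claws" ends up negative (it counts against "dog"), even though nobody told the neuron that claws matter.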
Over the course of many (many!) samples the machine can learn which features are most likely to indicate that the animal is a dog, cat, horse, etc. Some neurons will get excited when they see the pairing of “floppy ears” and “20kg or more”, or of “retractable claws” and “slit eyes”. This excitement is called an activation. When neurons activate they send a signal to the output node, and given a specific set of activations, the output node will be able to guess what animal was shown.
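The sketch below shows one simple way an activation could work: a neuron stays silent unless the features it cares about add up past a threshold, and the output picks whichever neuron is most excited. The two neurons, their weights and the threshold of 1.5 are assumptions made for this example, not values from any real system.

```python
def activation(inputs, weights, threshold):
    """Return how 'excited' a neuron is: 0.0 unless the weighted inputs exceed its threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return max(0.0, total - threshold)


# Input order: [floppy_ears, weighs_20kg_or_more, retractable_claws, slit_eyes]
# With a threshold of 1.5, each neuron needs BOTH of its features to fire.
DOG_NEURON = {"weights": [1, 1, 0, 0], "threshold": 1.5}
CAT_NEURON = {"weights": [0, 0, 1, 1], "threshold": 1.5}


def guess(inputs):
    """The output node: pick the animal whose neuron is most activated."""
    activations = {
        "dog": activation(inputs, **DOG_NEURON),
        "cat": activation(inputs, **CAT_NEURON),
    }
    best = max(activations, key=activations.get)
    return best if activations[best] > 0 else "unsure"


print(guess([1, 1, 0, 0]))  # dog
print(guess([0, 0, 1, 1]))  # cat
print(guess([1, 0, 0, 0]))  # unsure
```

Floppy ears on their own are not enough to excite the dog neuron here; it is the pairing that triggers the activation.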
If you played a game of “20 questions” but it was instead “2 questions”, you would think the game was rigged. This is because you realise that the more information you have, the better your guess will be. The same is generally true for machine learning: the more samples we have, and the more features per sample, the better the result tends to be. In order to distinguish which animal it is (or, in a business setting, which clients are likely to lapse or not pay us) we need enough samples with enough important features. Note here that we need important features. And, much like the “does it have a head?” example earlier, it is not always a good idea for us humans to decide which features are good or bad. It may well be that “hair colour”, as an example, is a good indicator of whether or not a person will subscribe to our service! (Maybe our marketing campaigns are subconsciously influencing a certain segment?)
The last part of this mini article I want to touch on is the importance of good data. Imagine we are playing “20 questions” with a small child. We ask “does it fly?” and they say “yes”. So we guess “eagle”, “sparrow”, “dragonfly” etc. When we lose, the child says “Ostrich! haha”. The child just assumed that since it has wings, it flies. This is an example of how poor information can cause the system to learn the wrong things. In a business setting, if our call centre operators routinely classify people as, say, old or young by the sound of their voice or the nature of the language they use, without asking for an age, our data would be incorrect. This incorrect data will then shape the features and weights in a system, and our results could be invalid. It is not that the machine made an error; it made the correct decisions based on the data it had.
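A tiny sketch can show why this matters. The function below "learns" how often winged animals fly, simply by counting. It is run once on accurate records and once on records written by someone who assumes wings always mean flight. The animals and records are invented for illustration.

```python
def learned_fly_rate(samples):
    """From (has_wings, flies) samples, learn how often winged animals fly."""
    winged = [flies for has_wings, flies in samples if has_wings]
    return sum(winged) / len(winged)


# Accurate records: sparrow, eagle, ostrich, penguin, dog.
accurate = [(True, True), (True, True), (True, False), (True, False), (False, False)]

# The same animals, recorded by someone who assumes wings always mean flight.
assumed = [(True, True), (True, True), (True, True), (True, True), (False, False)]

print(learned_fly_rate(accurate))  # 0.5
print(learned_fly_rate(assumed))   # 1.0
```

The counting is identical in both runs. Only the data differs, and with the second set the system confidently concludes that everything with wings flies.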
In this article we discussed features, weights and activations and described basic machine learning in the context of how humans learn. If you are left feeling “this is too simple” then fear not. While the mechanics of the process are complex, involving statistics and calculus, the “how it does it in simple terms” is accurate. You can understand the basics of machine learning without the mathematics, just as you can understand how a child knows what animal she sees without knowing anything about neuroscience!