
Author's note: This is a high-level, introductory piece aimed at introducing the core tenets of Machine Learning. For more technical pieces, please visit our blog.
Machine Learning is a hot topic, but it covers a huge range of applications, and with so many people talking about it, it can be overwhelming to navigate, especially when you're just starting out. So we put together this article to introduce you to three of the big schools of Machine Learning (ML): Supervised Learning, Unsupervised Learning, and Reinforcement Learning. These three are responsible for many of the innovations in Machine Learning in recent years, many of which you've probably experienced without realising it.
So, what’s the difference?
Supervised Learning, Unsupervised Learning, and Reinforcement Learning
The "Learning" component of these three applications (and machine learning more generally) refers to how a computer 'learns' to do a task using an algorithm. The key difference between these learning types is how the algorithm achieves the task, or arrives at the right answer.
Before we continue, a few comments on the language used within this piece: "learning", for an algorithm, refers to training the algorithm on a data set until it achieves an acceptable degree of accuracy. What is acceptable will depend on various factors, including the end application of the algorithm. For example, if one were training an algorithm to detect cancer, one would want the highest degree of confidence possible, both to avoid missing a diagnosis and to avoid needlessly stressing patients with false alarms. On the other hand, if one were to train an algorithm to detect whether a picture contained a jar of jam or a lemon meringue pie, the stakes are much lower.
Supervised Learning

When you use Supervised Learning (SL) algorithms, the algorithm is trained on labelled data. This is usually done with the help of training sets: data sets that are already labelled with the right answer, which the algorithm uses to figure out how to get to that answer. Then, when the algorithm encounters new, unlabelled data, it can use what it's learnt to label the new data correctly. Or, if it doesn't get it right, the training set 'corrects' it and the algorithm keeps learning. For example, in the case of the jam jar/meringue quandary posed earlier, we would have a data set of 10,000 photos: half containing a jar of jam, half containing a lemon meringue pie, each tagged as such. We would take 9,000 of the photos for training. The algorithm would "look" at each picture (Note: link to "How computer vision works"), and gradually it would learn to tell the two apart. Once trained, if it were shown an unlabelled picture (from the 1,000 we thoughtfully didn't train it on), the algorithm would be able to output whether it thought the photo contained a jar of jam or a lemon meringue pie, along with a degree of confidence, expressed as a percentage.
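If you're curious what this looks like in practice, here's a minimal sketch in Python using scikit-learn. It's purely illustrative: random feature vectors stand in for the photos (a real system would first convert each image into numbers), and the data and model choice here are assumptions, not a prescription.

```python
# A minimal, illustrative sketch of supervised learning with scikit-learn.
# Random feature vectors stand in for the 10,000 photos; a real system
# would first convert each image into numbers (e.g. pixel values).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# 10,000 "photos": half jam jars (label 0), half meringue pies (label 1).
# Each class is drawn from a slightly different distribution so there is
# a pattern for the algorithm to learn.
jam = rng.normal(loc=0.0, scale=1.0, size=(5000, 16))
pie = rng.normal(loc=1.0, scale=1.0, size=(5000, 16))
X = np.vstack([jam, pie])
y = np.array([0] * 5000 + [1] * 5000)  # the labels: the "right answers"

# Shuffle, then hold back 1,000 photos the model never sees in training.
idx = rng.permutation(10000)
X, y = X[idx], y[idx]
X_train, y_train = X[:9000], y[:9000]
X_test, y_test = X[9000:], y[9000:]

# Training: the model adjusts itself until it maps features to labels well.
model = LogisticRegression()
model.fit(X_train, y_train)

# Prediction on an unseen photo, with confidence expressed as a percentage.
probs = model.predict_proba(X_test[:1])[0]
label = "lemon meringue pie" if probs[1] > probs[0] else "jar of jam"
print(f"{label} ({100 * max(probs):.1f}% confident)")
```

The shape of the workflow is what matters here: labelled examples in, a trained model out, and a prediction with a confidence score for anything new it's shown.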

Unsupervised Learning

Unlike in Supervised Learning, algorithms that use Unsupervised Learning (UL) do not start with labelled data. They are trained on unlabelled data (and it is in this way that they are "unsupervised"). In the example of jam jars and meringues, the algorithm would be shown the same dataset, this time without the labels. During training, the algorithm instead has to look for relationships in the data. By doing this, it can either sort the data into groups based on common features or try to find rules that describe as much of the data as possible. Then it's up to the user to decide whether the algorithm's answer is useful. If it is, then that's great! If not, the user can run a different algorithm and see what kind of answer it comes up with. For jam that comes in a jar, and pies of the lemon meringue ilk, this might mean the algorithm sorts the photos into two groups on its own, without ever being told what either group contains. In a business sense, you could train an algorithm on your customer data and find relationships you didn't know existed, which could lead to new opportunities. A famous example of this was Netflix realising that geography and age were not accurate indicators of taste from person to person. Instead, if they clustered their audience by what they had watched, then recommended other shows that viewers in the same cluster had watched, they could greatly improve the accuracy of their recommendation system.
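And here's the unsupervised counterpart of the earlier sketch: the same stand-in data, but with the labels stripped away and handed to a clustering algorithm. We use k-means here as one common choice among many; again, this is purely illustrative.

```python
# A minimal, illustrative sketch of unsupervised learning with k-means
# clustering. The same stand-in "photos" as before, but no labels at all.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)

# 10,000 unlabelled "photos" as feature vectors.
jam = rng.normal(loc=0.0, scale=1.0, size=(5000, 16))
pie = rng.normal(loc=1.0, scale=1.0, size=(5000, 16))
X = rng.permutation(np.vstack([jam, pie]))

# Ask the algorithm to sort the data into two groups based purely on the
# relationships it finds between the feature vectors.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
groups = kmeans.fit_predict(X)

# The algorithm only reports "group 0" and "group 1"; it's up to us to
# inspect the clusters and decide whether the split is useful.
print(np.bincount(groups))  # roughly [5000, 5000] if the split is clean
```

Notice that the algorithm never learns the words "jam" or "meringue"; it just finds the groups, and interpreting them is our job.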
