Supervised learning is a machine learning approach in which algorithms are trained on labelled datasets (data that already includes the correct outputs or classifications). The model learns to map inputs to outputs from these examples, which allows it to make predictions when presented with new, unseen data.
The input is the data you feed into the system, such as text, numbers, or images. The output is the result you expect, such as "cat" or "not cat" for a photo. During training, the model examines many input-output examples and finds patterns that link each input to its correct output. Once trained, it can predict the output for new inputs it has not seen before.
Supervised learning powers many of the intelligent systems we rely on daily. It enables email platforms to recognise and filter spam, helps banks identify potentially fraudulent transactions, and supports voice assistants in understanding and responding to spoken commands. These systems can improve over time as they are retrained on new labelled examples, making supervised learning a cornerstone of building reliable and accurate AI applications.
Classification is used when the output belongs to a specific category or group. It helps answer questions like "Will this customer buy product X, product Y, or neither?" The model figures out which existing group new information belongs to by recognising patterns from past observations.
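To make the idea concrete, here is a minimal sketch of one simple classifier, k-nearest neighbours, applied to the product question above. The customer features (age and site visits), the data, and the function name `predict` are hypothetical, chosen only for illustration. The model labels a new customer with the most common group among the k most similar past customers.

```python
import math
from collections import Counter

# Hypothetical labelled examples: (age, site visits last month) -> what the customer bought.
TRAINING_DATA = [
    ((22, 15), "product X"),
    ((25, 12), "product X"),
    ((47, 3), "product Y"),
    ((52, 5), "product Y"),
    ((35, 0), "neither"),
    ((40, 1), "neither"),
]


def predict(features, training_data, k=3):
    """Label a new input by majority vote among its k nearest labelled examples."""
    # Sort the labelled examples by Euclidean distance to the new input.
    by_distance = sorted(
        training_data,
        key=lambda example: math.dist(features, example[0]),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k neighbours becomes the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


print(predict((24, 14), TRAINING_DATA))  # → product X
```

Because age and visit counts sit on different scales, a real application would normally normalise the features first; k-nearest neighbours is only one of many classification algorithms.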
Regression is used when the output is a real number or a continuous value. It helps answer questions like "How much will this product sell for?" The model learns how inputs, like past prices or weather patterns, relate to numeric outcomes.
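The simplest regression case is fitting a straight line to a single numeric input. The sketch below uses ordinary least squares in plain Python; the data (a product's previous price and its later selling price) and the name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: previous price (input) and later selling price (output).
past_prices = [10.0, 12.0, 14.0, 16.0]
sale_prices = [11.0, 13.5, 15.0, 17.5]

slope, intercept = fit_line(past_prices, sale_prices)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # → slope=1.05, intercept=0.60
print(f"predicted sale price at 18.0: {slope * 18.0 + intercept:.2f}")  # → 19.50
```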
| Aspect | Supervised Learning | Unsupervised Learning | Reinforcement Learning |
|---|---|---|---|
| Data | Uses labelled data (input + correct output) | Uses unlabelled data (no correct output) | Learns by interacting with the environment |
| Goal | Learn the mapping from inputs to outputs | Find hidden patterns or structure in data | Maximise cumulative reward through actions |
| Learning Style | Guided learning using examples with known outcomes | Unguided learning from raw data | Trial-and-error learning based on feedback (rewards/penalties) |
| Example | Predicting house prices or classifying emails as spam | Grouping customers based on purchasing habits | Teaching a robot to walk by rewarding it for correct moves |
| Feedback Mechanism | Feedback comes from comparison to known correct answers | No feedback on correct output; learns from data structure | Feedback comes in the form of rewards or penalties after actions |
Commonly used supervised learning algorithms include linear regression, decision trees, support vector machines, and neural networks.
Supervised learning models can be trained on large datasets of emails labelled as "spam" or "not spam." These models learn the patterns and characteristics that usually indicate spam, such as certain keywords or sender behaviour.
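One classic way to learn such keyword patterns is a naive Bayes classifier. The text above does not name a specific algorithm; naive Bayes is swapped in here purely as an illustration. The sketch assumes a tiny, hypothetical dataset, splits emails on whitespace, and uses add-one smoothing; `train` and `classify` are illustrative names, not a library API.

```python
import math
from collections import Counter

# Hypothetical, tiny labelled dataset; a real filter would need thousands of emails.
TRAINING_EMAILS = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("cheap prize offer win", "spam"),
    ("meeting moved to monday", "not spam"),
    ("please review the attached report", "not spam"),
    ("lunch on monday with the team", "not spam"),
]


def train(emails):
    """Count word occurrences per label and how many emails carry each label."""
    word_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for text, label in emails:
        words = text.split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary


def classify(text, word_counts, label_counts, vocabulary):
    """Return the label with the highest naive Bayes log-probability."""
    total_emails = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_emails in label_counts.items():
        # log P(label): how common this label is in the training data.
        score = math.log(n_emails / total_emails)
        n_words = sum(word_counts[label].values())
        for word in text.split():
            # log P(word | label) with add-one smoothing, so unseen words never give probability zero.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_EMAILS)
print(classify("free prize money", *model))  # → spam
print(classify("team meeting on monday", *model))  # → not spam
```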
When historical stock data and the corresponding price outcomes are fed into a supervised learning model, it can learn how factors such as trading volume, economic indicators, and news sentiment relate to stock prices. The model then tries to forecast future price trends or suggest buy/sell actions.
Voice recognition relies on supervised learning to convert speech into text. These systems learn from audio examples that are already tagged with their written equivalents, enabling them to power popular AI assistants such as Siri, Alexa, and Google Assistant.
In healthcare, supervised learning helps doctors identify conditions like cancer or heart disease. Models are trained on medical images, lab results, or patient data with correct diagnoses.
Creating labelled data means someone has to tag each input with the correct output manually. This process is slow and often costly, especially for large datasets. Without enough labelled data, the model cannot learn effectively.
A model overfits when it memorises its training data, including irrelevant details, errors, and random noise, instead of learning the underlying pattern. This reduces its ability to generalise, so it performs well on the training set but poorly on new, real-world data.
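The effect can be shown with a toy example, assuming data whose true relationship is y = 2x plus a little label noise. A straight line and a polynomial that passes exactly through every training point are fitted to the same six points. The polynomial memorises the noise, so its training error is zero while its error on unseen inputs is much larger than the line's. All data and function names are hypothetical.

```python
# Hypothetical data: the true relationship is y = 2x, but each training label carries +/-0.5 noise.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [2 * x + noise for x, noise in zip(train_x, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5])]
# New, unseen inputs with noise-free targets.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [2 * x for x in test_x]


def fit_line(xs, ys):
    """Simple model: an ordinary least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(xs, ys):
    """Overly flexible model: the Lagrange polynomial (degree len(xs) - 1) through every point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


for name, fit in [("line", fit_line), ("polynomial", fit_interpolating_polynomial)]:
    model = fit(train_x, train_y)
    print(f"{name}: train MSE={mse(model, train_x, train_y):.3f}, "
          f"test MSE={mse(model, test_x, test_y):.3f}")
```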
As the size of the dataset grows, the time and resources needed to process it also increase. Training large models on big datasets can require powerful computers and take a long time.
It’s a type of machine learning where the model learns from labelled data, meaning the input comes with the correct answer.
Supervised learning uses labelled data with known outcomes, while unsupervised learning works with raw data without any labels.
An email dataset labelled as "spam" or "not spam" is a common example.
It’s used in spam filtering, voice recognition, fraud detection, and medical diagnosis.
It requires large labelled datasets, is prone to overfitting, and can be resource-intensive.
Common ones include linear regression, decision trees, support vector machines, and neural networks.
Accuracy is usually checked by comparing the model’s predictions to the correct labels on test data.
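Accuracy is the fraction of test examples whose predicted label matches the correct label. A minimal sketch, with hypothetical labels and predictions standing in for a trained model's output on held-out test data:

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the correct labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical predictions from a spam model on five held-out test emails.
test_labels = ["spam", "not spam", "spam", "not spam", "not spam"]
predictions = ["spam", "not spam", "not spam", "not spam", "spam"]
print(accuracy(predictions, test_labels))  # → 0.6
```

Accuracy can mislead when one class is rare: a model that always predicts "not spam" scores highly on a mostly legitimate inbox, which is why precision and recall are often reported alongside it.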