When learning machine learning, it helps to use its terminology clearly and consistently.
Model – a set of patterns learned from data.
Algorithm – a specific ML process used to train a model.
Training data – the dataset from which the algorithm learns the model.
Test data – a separate dataset, not used for training, for reliably evaluating model performance.
Features – variables (columns) in the dataset used to train the model.
Target variable – the variable you’re trying to predict.
Observations – data points (rows) in the dataset.
For example, let’s say you have a dataset of 150 primary school students, and you wish to predict their Height based on their Age, Gender, and Weight…
- You have 150 observations…
- 1 target variable (Height)…
- 3 features (Age, Gender, Weight)…
- You might then separate your dataset into two subsets:
  - A set of 120 used to train several models (training set)
  - A set of 30, held out from training, used to evaluate those models and pick the best one (test set)
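
The example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student records are randomly generated placeholder values (not real measurements), the names `students`, `feature_names`, `X_train`, and so on are made up for this sketch, and the split is a plain shuffle-and-slice rather than any particular library's method.

```python
import random

random.seed(0)  # fixed seed so the shuffle is repeatable

# Assumption: synthetic, made-up observations, one dict per student (row).
students = [
    {
        "Age": random.randint(6, 11),
        "Gender": random.choice(["F", "M"]),
        "Weight": round(random.uniform(18.0, 45.0), 1),
        "Height": round(random.uniform(105.0, 155.0), 1),
    }
    for _ in range(150)
]

feature_names = ["Age", "Gender", "Weight"]  # the 3 features (columns)
target_name = "Height"                       # the 1 target variable

# Shuffle before splitting so neither subset depends on the row order.
random.shuffle(students)
train_rows, test_rows = students[:120], students[120:]

# Separate the features (inputs) from the target variable (output).
X_train = [[row[name] for name in feature_names] for row in train_rows]
y_train = [row[target_name] for row in train_rows]
X_test = [[row[name] for name in feature_names] for row in test_rows]
y_test = [row[target_name] for row in test_rows]

print(len(students), len(feature_names), len(X_train), len(X_test))
# prints: 150 3 120 30
```

An algorithm would then learn a model from `X_train` and `y_train`, and the 30 held-out rows in `X_test` and `y_test` would be used only to check how well that model predicts Height for students it has not seen.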