Challenge 05: Regression in Machine Learning

Estimated Time

25-35 min | Cost: Free | Domain: Machine Learning on Azure (15-20%)

Exam skills covered

  • Identify regression machine learning scenarios
  • Describe how training data is used in regression
  • Identify features and labels in a dataset
  • Understand model evaluation metrics for regression

Overview

Regression is the machine learning technique used to predict a numeric value. Whenever the answer to your question is a number — a price, a temperature, a duration, a quantity — you're looking at a regression problem.

Think of regression like drawing the best-fit line through a scatter plot of data points. If you plot house sizes on one axis and their prices on the other, regression finds the pattern (line or curve) that lets you predict the price of a new house based on its size. The model learns: "for every extra 100 square feet, the price increases by approximately $X."
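The best-fit line idea can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a single feature and ordinary least squares. The house sizes and prices are made-up values chosen to lie exactly on a line so the arithmetic is easy to check, and the names `fit_line` and `predict_price` are hypothetical.

```python
# Hypothetical training data: house sizes (sq ft) and sale prices ($).
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [200_000, 275_000, 350_000, 425_000, 500_000]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sizes, prices)


def predict_price(sqft):
    """Use the fitted line to estimate the price of an unseen house."""
    return slope * sqft + intercept


print(f"Each extra 100 sq ft adds about ${slope * 100:,.0f}")
print(f"Predicted price for 1,800 sq ft: ${predict_price(1800):,.0f}")
```

Real data never falls perfectly on a line, so a fitted model will normally have some error on every example; the evaluation metrics in Task 4 measure that error.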

The key vocabulary: features are the input data (square footage, number of rooms, location), and the label is what you're predicting (the price). Training data is historical examples where both features AND the label are known — the model learns the relationship between them.

Explore

Task 1: Understand regression terminology

Term | Definition | Example (predicting house price)
Features | Input variables used for prediction | Square footage, bedrooms, zip code, year built
Label | The value being predicted (output) | Sale price ($)
Training data | Historical examples with known features AND labels | Past house sales with all details
Model | The mathematical relationship learned from training data | "Price = $150 × sqft + $20,000 × bedrooms + ..."
Prediction | The model's output for new, unseen data | Estimated price for a house not yet sold
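The terms in the table map onto code roughly as follows. This is a sketch only: the records, field names, and the helpers `split_features_and_label` and `toy_model` are hypothetical, and `toy_model` hard-codes just the two terms shown in the table's example formula rather than learning anything.

```python
# Hypothetical past house sales: each record holds features plus the known label.
training_data = [
    {"sqft": 1400, "bedrooms": 3, "zip_code": "98052", "year_built": 1995, "sale_price": 310_000},
    {"sqft": 2100, "bedrooms": 4, "zip_code": "98004", "year_built": 2008, "sale_price": 455_000},
]

LABEL = "sale_price"


def split_features_and_label(record, label_name=LABEL):
    """Separate the inputs (features) from the value being predicted (label)."""
    features = {k: v for k, v in record.items() if k != label_name}
    return features, record[label_name]


def toy_model(features):
    """Illustrative stand-in for a learned model, using only the two terms
    from the table; a real model would learn its own coefficients."""
    return 150 * features["sqft"] + 20_000 * features["bedrooms"]


features, label = split_features_and_label(training_data[0])
print("Features:", features)
print("Label:", label)

# Prediction: a new house has features but no label yet.
new_house = {"sqft": 1800, "bedrooms": 3, "zip_code": "98052", "year_built": 2001}
estimate = toy_model(new_house)
print(f"Estimated price: ${estimate:,}")
```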

Task 2: Identify regression scenarios

Which of these are regression problems? (Answer: all the ones predicting a NUMBER)

Scenario | Regression? | Why
Predicting tomorrow's high temperature | ✅ Yes | Output is a numeric value (degrees)
Predicting a student's exam score | ✅ Yes | Output is a number (0-100)
Determining if an email is spam | ❌ No | Output is a category (spam/not-spam); this is classification
Predicting how long a delivery will take | ✅ Yes | Output is a number (minutes/hours)
Sorting photos into "cat" or "dog" | ❌ No | Output is a category; this is classification
Estimating a car's fuel efficiency (MPG) | ✅ Yes | Output is a numeric value (miles per gallon)
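The "is the output a number?" test can be written as a rough rule of thumb. This is a sketch, not a real diagnostic: the function `problem_type` and the sample label values are hypothetical, and the rule would misjudge categories that happen to be stored as numeric codes (for example 0 and 1 for not-spam and spam).

```python
def problem_type(example_labels):
    """Rule of thumb: numeric labels suggest regression,
    anything else suggests classification."""
    is_numeric = all(
        # bool is a subclass of int in Python, so exclude it explicitly.
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in example_labels
    )
    return "regression" if is_numeric else "classification"


# Hypothetical example labels for four of the scenarios above.
scenarios = {
    "tomorrow's high temperature": [71.5, 68.0, 74.2],
    "email is spam": ["spam", "not-spam", "spam"],
    "delivery time (minutes)": [32, 45, 28],
    "photo subject": ["cat", "dog", "cat"],
}

for name, labels in scenarios.items():
    print(f"{name}: {problem_type(labels)}")
```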

Task 3: Explore Azure ML Designer sample regression

  1. Visit Azure Machine Learning Studio
  2. If you don't have a workspace, review this sample pipeline conceptually:
    • Dataset: Automobile price data (features: make, body-style, engine-size, horsepower, etc.)
    • Algorithm: Linear Regression
    • Goal: Predict the price of a car based on its features
  3. The Designer provides a drag-and-drop experience to build ML pipelines without code
  4. Sample pipelines demonstrate regression with real datasets

Task 4: Understand regression evaluation metrics

After training a regression model, you evaluate how good its predictions are:

Metric | What it measures | Good value
MAE (Mean Absolute Error) | Average absolute difference between predicted and actual values, in the label's units | Lower is better
RMSE (Root Mean Squared Error) | Square root of the average squared error; penalizes large mistakes more heavily than MAE | Lower is better
R² (R-squared) | Proportion of the variation in the label that the model explains | Closer to 1.0 is better

Example: If a model predicts house prices with an MAE of $15,000, its predictions are off by $15,000 from the actual price on average.
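You won't need to calculate these on the exam, but seeing the arithmetic can make the three metrics easier to tell apart. The sketch below uses the standard textbook definitions; the actual and predicted prices are made-up values, and the function names are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors, ignoring direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: squaring makes large errors count more."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def r_squared(actual, predicted):
    """R-squared: 1 minus (unexplained variation / total variation)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical house prices ($) and a model's predictions for them.
actual = [300_000, 250_000, 400_000, 350_000]
predicted = [310_000, 240_000, 370_000, 350_000]

print(f"MAE:  ${mae(actual, predicted):,.0f}")
print(f"RMSE: ${rmse(actual, predicted):,.0f}")
print(f"R²:   {r_squared(actual, predicted):.3f}")
```

With these numbers the RMSE comes out larger than the MAE because the single $30,000 miss is weighted more heavily once the errors are squared.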

Exam strategy

The exam tests whether you can IDENTIFY regression scenarios, not whether you can calculate metrics. The key question: "Is the output a number?" If yes → regression. If it's a category → classification.

Key Concepts

Concept | Definition
Regression | ML technique that predicts a continuous numeric value
Features | Input variables (predictors) used by the model
Label | The target value being predicted
Training data | Historical data with known features and labels used to train the model
Linear regression | Simplest regression: finds a straight-line relationship between features and label
Mean Absolute Error (MAE) | Average magnitude of errors in predictions
R-squared (R²) | Proportion of variance in the label explained by the model (usually 0 to 1; negative if the model does worse than always predicting the mean)
Overfitting | Model memorizes training data instead of learning general patterns

Common Misconceptions

Misconception | Reality
"Regression means the data goes down (regresses)" | In ML, regression means predicting a numeric value. The term comes from statistics ("regression to the mean") and has nothing to do with declining trends
"Regression can only predict future values" | Regression predicts any numeric value: past, present, or future. Estimating the age of a fossil and estimating the price of a painting are both regression problems
"More features always make a better model" | Irrelevant features add noise and can worsen predictions. Feature selection (choosing the RIGHT inputs) is crucial
"Regression can only model straight lines" | Basic linear regression fits a straight-line relationship, but regression as a whole is not limited to lines. Azure ML offers other regression algorithms (such as decision-tree-based and neural network regression) that can model complex curves
"A high R² always means the model is good" | A very high R² on training data might indicate overfitting: the model memorized the training data but won't generalize to new data
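The last misconception can be made concrete with a deliberately naive model. The sketch below is an illustration under stated assumptions: the `memorizer` function (a nearest-neighbor lookup over the training set) and all data values are hypothetical, chosen to show a perfect R² on training data alongside a lower R² on houses the model has not seen.

```python
def r_squared(actual, predicted):
    """R-squared: 1 minus (unexplained variation / total variation)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical (sqft, price) pairs; values are made up for illustration.
train = [(1000, 210_000), (1500, 260_000), (2000, 365_000), (2500, 410_000)]
test = [(1200, 225_000), (1800, 320_000), (2300, 400_000)]


def memorizer(sqft):
    """Return the price of the closest training house. It reproduces every
    training example exactly, which is a crude form of overfitting."""
    nearest = min(train, key=lambda pair: abs(pair[0] - sqft))
    return nearest[1]


train_r2 = r_squared([p for _, p in train], [memorizer(s) for s, _ in train])
test_r2 = r_squared([p for _, p in test], [memorizer(s) for s, _ in test])

print(f"R² on training data: {train_r2:.3f}")
print(f"R² on unseen data:   {test_r2:.3f}")
```

The gap between the two scores is the warning sign: a model should be judged on data it was not trained on.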

Knowledge Check

1. A company wants to predict how many units of a product they will sell next month based on historical sales data, advertising spend, and seasonal trends. What type of ML problem is this?

2. In a dataset used to predict house prices, which of the following would be the LABEL?

3. A regression model has an R-squared value of 0.92. What does this tell you?

4. Which scenario is NOT a regression problem?

5. What is the role of training data in a regression model?

Learn More