Challenge 05: Regression in Machine Learning

Estimated Time

25-35 min | Cost: Free | Domain: Machine Learning on Azure (15-20%)

Exam skills covered

Identify regression machine learning scenarios
Describe how training data is used in regression
Identify features and labels in a dataset
Understand model evaluation metrics for regression

Overview

Regression is the machine learning technique used to predict a numeric value. Whenever the answer to your question is a number — a price, a temperature, a duration, a quantity — you're looking at a regression problem.

Think of regression like drawing the best-fit line through a scatter plot of data points. If you plot house sizes on one axis and their prices on the other, regression finds the pattern (line or curve) that lets you predict the price of a new house based on its size. The model learns: "for every extra 100 square feet, the price increases by approximately $X."
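The "best-fit line" is usually found with ordinary least squares: choose the slope and intercept that minimize the sum of squared differences between predicted and actual prices. Below is a minimal Python sketch for a single feature (house size). The sale prices are made up for illustration, and `fit_line` is a hypothetical helper, not an Azure API.

```python
# Ordinary least squares for one feature: find the slope and intercept that
# minimize the sum of squared differences between predicted and actual prices.

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: size in square feet -> sale price in dollars
sizes = [1000, 1500, 2000, 2500]
prices = [205_000, 255_000, 322_000, 378_000]

slope, intercept = fit_line(sizes, prices)
print(f"Each extra 100 sq ft adds about ${slope * 100:,.0f}")
print(f"Predicted price for 1,800 sq ft: ${slope * 1800 + intercept:,.0f}")
```

The slope plays the role of "$X per 100 square feet" in the sentence above; the intercept is the baseline price the line implies.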

The key vocabulary: features are the input data (square footage, number of rooms, location), and the label is what you're predicting (the price). Training data is historical examples where both features AND the label are known — the model learns the relationship between them.
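To make the vocabulary concrete, here is a minimal Python sketch that separates training records into features (inputs) and a label (the value to predict). The records are hypothetical, and `split_features_label` is an illustrative helper, not part of any Azure SDK.

```python
# Each training record has known features AND a known label (the sale price).
# Hypothetical records for illustration only.
training_data = [
    {"sqft": 1400, "bedrooms": 3, "year_built": 1995, "price": 310_000},
    {"sqft": 2100, "bedrooms": 4, "year_built": 2008, "price": 455_000},
    {"sqft": 900, "bedrooms": 2, "year_built": 1978, "price": 189_000},
]

LABEL = "price"  # the value the model will learn to predict

def split_features_label(rows, label):
    """Separate each row into a feature dict (inputs) and a label value (output)."""
    features = [{k: v for k, v in row.items() if k != label} for row in rows]
    labels = [row[label] for row in rows]
    return features, labels

X, y = split_features_label(training_data, LABEL)

# A new, unseen house has features only; its label is what we want to predict.
new_house = {"sqft": 1650, "bedrooms": 3, "year_built": 2001}
```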

Explore

Task 1: Understand regression terminology

Term	Definition	Example (predicting house price)
Features	Input variables used for prediction	Square footage, bedrooms, zip code, year built
Label	The value being predicted (output)	Sale price ($)
Training data	Historical examples with known features AND labels	Past house sales with all details
Model	The mathematical relationship learned from training data	"Price = $150 × sqft + $20,000 × bedrooms + ..."
Prediction	The model's output for new, unseen data	Estimated price for a house not yet sold
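The Model and Prediction rows connect directly: a trained linear model is a formula applied to the features of a new house. This sketch uses only the two coefficients shown in the table ($150 per square foot, $20,000 per bedroom). The table's remaining terms ("...") are not given, so the hypothetical `predict_price` below omits them and is deliberately incomplete.

```python
# Coefficients from the example model in the table; the "..." terms are omitted.
COEFFICIENTS = {"sqft": 150.0, "bedrooms": 20_000.0}

def predict_price(features: dict[str, float]) -> float:
    """Weighted sum of the known feature coefficients (a linear model)."""
    return sum(weight * features[name] for name, weight in COEFFICIENTS.items())

# Prediction for a house that has not been sold yet
print(predict_price({"sqft": 1800, "bedrooms": 3}))
```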

Task 2: Identify regression scenarios

Which of these are regression problems? (Answer: all the ones predicting a NUMBER)

Scenario	Regression?	Why
Predicting tomorrow's high temperature	✅ Yes	Output is a numeric value (degrees)
Predicting a student's exam score	✅ Yes	Output is a number (0-100)
Determining if an email is spam	❌ No	Output is a category (spam/not-spam) — this is classification
Predicting how long a delivery will take	✅ Yes	Output is a number (minutes/hours)
Sorting photos into "cat" or "dog"	❌ No	Output is a category — classification
Estimating a car's fuel efficiency (MPG)	✅ Yes	Output is a numeric value (miles per gallon)
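The rule of thumb behind this table ("is the output a number?") can be written as a rough check on a column of label values. This is a hypothetical helper for illustration only. As its docstring notes, categories stored as numeric codes would fool it, so what the value means matters more than its data type.

```python
from numbers import Number

def problem_type(label_values) -> str:
    """Rough heuristic: numeric labels suggest regression, anything else classification.

    Caveat: categories stored as numbers (e.g. 0 = not spam, 1 = spam) would be
    misread as regression, so always check what the number represents.
    """
    def is_numeric(v):
        # bool is a subclass of int in Python, but True/False are categories
        return isinstance(v, Number) and not isinstance(v, bool)

    return "regression" if all(is_numeric(v) for v in label_values) else "classification"

print(problem_type([71.5, 68.0, 74.2]))            # tomorrow's high temperature
print(problem_type(["spam", "not-spam", "spam"]))  # email filter
print(problem_type([28.4, 33.1, 41.0]))            # fuel efficiency (MPG)
```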

Task 3: Explore Azure ML Designer sample regression

Visit Azure Machine Learning Studio. If you don't have a workspace, review this sample pipeline conceptually:
- Dataset: Automobile price data (features: make, body-style, engine-size, horsepower, etc.)
- Algorithm: Linear Regression
- Goal: Predict the price of a car based on its features

The Designer provides a drag-and-drop experience for building ML pipelines without writing code, and its sample pipelines demonstrate regression on real datasets.

Task 4: Understand regression evaluation metrics

After training a regression model, you evaluate how good its predictions are:

Metric	What it measures	Good value
MAE (Mean Absolute Error)	Average absolute difference between predicted and actual values, in the label's units	Lower is better
RMSE (Root Mean Squared Error)	Square root of the average squared difference; penalizes large mistakes more heavily than MAE	Lower is better
R² (R-squared)	Proportion of the variation in the label that the model explains	Closer to 1.0 is better

Example: If a model predicts house prices with an MAE of $15,000, its predictions differ from the actual price by $15,000 on average, whether too high or too low.
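All three metrics can be computed by hand from paired actual and predicted values. A minimal Python sketch using made-up house prices; `regression_metrics` is an illustrative helper (libraries such as scikit-learn provide equivalent functions).

```python
import math

def regression_metrics(actual, predicted):
    """Return (MAE, RMSE, R²) for paired lists of actual and predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # variation left unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation in the label
    r2 = 1 - ss_res / ss_tot
    return mae, rmse, r2

# Hypothetical house prices (actual) and a model's predictions
actual = [300_000, 450_000, 250_000, 600_000]
predicted = [310_000, 430_000, 265_000, 560_000]
mae, rmse, r2 = regression_metrics(actual, predicted)
print(f"MAE=${mae:,.0f}  RMSE=${rmse:,.0f}  R²={r2:.3f}")
```

Here RMSE (about $24,109) comes out higher than MAE ($21,250) because squaring gives the single $40,000 miss extra weight.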

Exam strategy

The exam tests whether you can IDENTIFY regression scenarios, not whether you can calculate metrics. The key question: "Is the output a number?" If yes → regression. If it's a category → classification.

Key Concepts

Concept	Definition
Regression	ML technique that predicts a continuous numeric value
Features	Input variables (predictors) used by the model
Label	The target value being predicted
Training data	Historical data with known features and labels used to train the model
Linear regression	One of the simplest regression algorithms; fits a linear (straight-line) relationship between the features and the label
Mean Absolute Error (MAE)	Average magnitude of errors in predictions
R-squared (R²)	Proportion of variance in the label explained by the model (usually between 0 and 1; it can be negative when a model predicts worse than simply using the mean)
Overfitting	Model memorizes training data instead of learning general patterns

Common Misconceptions

Misconception	Reality
"Regression means the data goes down (regresses)"	In ML, regression means predicting a numeric value. The term comes from statistics ("regression to the mean") — it has nothing to do with declining trends
"Regression can only predict future values"	Regression predicts any numeric value — past, present, or future. Predicting the age of a fossil or the price of a painting are both regression
"More features always make a better model"	Irrelevant features add noise and can worsen predictions. Feature selection — choosing the RIGHT inputs — is crucial
"Regression can only model straight lines"	Linear regression fits straight-line (linear) relationships, though adding transformed features such as x² lets it fit some curves. Azure ML also offers other regression algorithms (decision trees, neural networks) that can model complex, nonlinear patterns
"A high R² always means the model is good"	A very high R² on training data might indicate overfitting — the model memorized the training data but won't generalize to new data

Knowledge Check

1. A company wants to predict how many units of a product they will sell next month based on historical sales data, advertising spend, and seasonal trends. What type of ML problem is this?

2. In a dataset used to predict house prices, which of the following would be the LABEL?

3. A regression model has an R-squared value of 0.92. What does this tell you?

4. Which scenario is NOT a regression problem?

5. What is the role of training data in a regression model?
