L29 — Lines of Best Fit

Key Terms

Line of best fit: A straight line that minimises the sum of squared vertical distances from each data point to the line (least-squares line).
ŷ = a + bx: The equation of the regression line; ŷ (y-hat) is the predicted value of y for a given x.
Slope b: The rate of change — for each 1-unit increase in x, y changes by b units on average.
y-intercept a: The predicted value of y when x = 0; only meaningful if x = 0 is within (or near) the data range.
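Interpolation: Predicting y for an x value within the observed range of data; generally reliable.
Extrapolation: Predicting y for an x value outside the observed range of data; unreliable, so use with caution.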
Residual: The difference (actual y − predicted ŷ); positive residuals sit above the line, negative residuals below.

Line of best fit (least-squares regression line)

A line of best fit is a straight line drawn through a scatter plot to best represent the trend in the data. The least-squares line minimises the sum of squared vertical distances from each point to the line.

Equation of the line of best fit

ŷ = a + bx

Symbol	Meaning
ŷ (y-hat)	Predicted value of y
a	y-intercept (predicted y when x = 0)
b	Slope (change in y per unit increase in x)
x	Value of the explanatory variable

Finding the equation

Method	How
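By hand (two points)	Read two points on the drawn line; find the slope b = (y₂ − y₁)/(x₂ − x₁), then substitute one point into ŷ = a + bx to find a
Calculator / technology	Enter data, run linear regression; gives a and b directly
Mean point	The least-squares line always passes through (x̄, ȳ); with a known slope b, a = ȳ − bx̄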
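
The calculator route in the table can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard least-squares formulas b = Sxy / Sxx and a = ȳ − b·x̄; the function name `least_squares_line` is our own, and the data is borrowed from Worked Example 1.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy: sum of products of deviations; Sxx: sum of squared x deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept chosen so the line passes through the mean point
    return a, b


xs = [1, 2, 3, 4, 5, 6]
ys = [5, 7, 9, 8, 12, 13]
a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f}x")   # y-hat = 3.60 + 1.54x
```

A line drawn by eye through the same points will usually have slightly different values of a and b, because it only approximates the least-squares line.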

Interpolation and extrapolation

Term	Meaning	Reliability
Interpolation	Predicting within the range of the data	Generally reliable
Extrapolation	Predicting outside the range of the data	Unreliable — use with caution
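
The distinction in the table comes down to comparing the new x value with the smallest and largest observed x values. A minimal sketch, assuming a hypothetical set of observed x values and a helper name (`prediction_type`) of our own choosing:

```python
def prediction_type(x_new, observed_xs):
    """Classify a prediction at x_new relative to the observed x range."""
    lowest, highest = min(observed_xs), max(observed_xs)
    if lowest <= x_new <= highest:
        return "interpolation"
    return "extrapolation"


observed_xs = [12, 15, 20, 24, 30]          # hypothetical x values
print(prediction_type(18, observed_xs))     # interpolation
print(prediction_type(45, observed_xs))     # extrapolation
```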

Residuals

A residual = actual y − predicted ŷ. Positive residuals sit above the line; negative residuals sit below. A good fit has residuals randomly scattered around zero.
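
The residual rule can be applied to a whole data set at once. The sketch below is illustrative only: it uses the data and the by-eye line ŷ = 2 + 2x from Worked Example 1, and the function name `residuals` is our own.

```python
def residuals(a, b, xs, ys):
    """Return actual y minus predicted y-hat for each point, using y-hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5, 6]
ys = [5, 7, 9, 8, 12, 13]
for x, r in zip(xs, residuals(2, 2, xs, ys)):
    position = "above" if r > 0 else "below" if r < 0 else "on"
    print(f"x = {x}: residual = {r:+} ({position} the line)")
```

Here the residuals are +1, +1, +1, −2, 0 and −1: a mix of positive and negative values with no obvious pattern, which is what a reasonable fit looks like.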

Figure: The line of best fit passes through (x̄, ȳ); residuals are shown as dashed lines.

Hot Tip: The least-squares line of best fit always passes through the mean point (x̄, ȳ). Use this to check your equation: substitute x̄ and verify that the predicted value equals ȳ (allowing for rounding in a and b).
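
The mean point check can be written as a short routine. This is a sketch under stated assumptions: the helper names and the tolerance `tol=0.05` (to allow for rounded coefficients) are our own choices, and the data comes from Worked Example 1.

```python
def mean_point(xs, ys):
    """Return (x-bar, y-bar), the mean point of the data."""
    return sum(xs) / len(xs), sum(ys) / len(ys)


def passes_through_mean_point(a, b, xs, ys, tol=0.05):
    """Check whether y-hat = a + b*x predicts y-bar at x-bar, within tol."""
    x_bar, y_bar = mean_point(xs, ys)
    return abs((a + b * x_bar) - y_bar) <= tol


xs = [1, 2, 3, 4, 5, 6]
ys = [5, 7, 9, 8, 12, 13]
print(mean_point(xs, ys))                        # (3.5, 9.0)
print(passes_through_mean_point(2, 2, xs, ys))   # True:  2 + 2(3.5) = 9
print(passes_through_mean_point(4, 2, xs, ys))   # False: 4 + 2(3.5) = 11
```

Passing this check does not prove an equation is the least-squares line, since many different lines pass through the mean point. Failing it does show the equation is wrong.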

Worked Example 1 — Drawing a line of best fit by eye

Data: (1,5), (2,7), (3,9), (4,8), (5,12), (6,13). Draw a line of best fit and find its equation.

Step 1: Plot the points. The pattern is positive and roughly linear.

Step 2: Draw a line with roughly equal numbers of points above and below.

Step 3: Read two points on the line: approximately (1, 4) and (6, 14).

Step 4: Slope b = (14 − 4)/(6 − 1) = 10/5 = 2. Intercept: 4 = 2(1) + a ⇒ a = 2.

Equation: ŷ = 2 + 2x.
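
Steps 3 and 4 can be checked with a small sketch. The function name `line_through` is hypothetical; it computes the slope from two points and then solves for the intercept, as in Step 4.

```python
def line_through(p1, p2):
    """Return (a, b) for the line y-hat = a + b*x through two points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)   # slope: rise over run
    a = y1 - b * x1             # intercept: rearrange y1 = a + b*x1
    return a, b


a, b = line_through((1, 4), (6, 14))
print(f"y-hat = {a:g} + {b:g}x")   # y-hat = 2 + 2x
```

The two points must have different x values; otherwise the slope calculation divides by zero.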

Worked Example 2 — Using the equation to predict

A line of best fit for temperature (x, °C) vs ice cream sales (y, units) is ŷ = −20 + 15x. Predict sales when temperature is 28°C.

ŷ = −20 + 15(28) = −20 + 420 = 400 units.

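Note: if 28°C is within the range of observed data, this is interpolation and generally reliable. If 28°C lies outside the observed range, it is extrapolation and should be treated with caution.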

Worked Example 3 — Interpreting slope and intercept

ŷ = 30 + 4.5x, where x = hours of training, y = fitness score (out of 100).

Slope (b = 4.5): For each additional hour of training, the predicted fitness score increases by 4.5 points on average.

Intercept (a = 30): A person who does zero hours of training is predicted to score 30. (Only meaningful if x = 0 is in the data range.)
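
Both interpretations can be seen by evaluating the line at a few values. This is a minimal sketch; the function name `predict_fitness` is our own.

```python
def predict_fitness(hours, a=30, b=4.5):
    """Predicted fitness score from y-hat = a + b * hours."""
    return a + b * hours


for hours in (0, 1, 2, 3):
    print(hours, predict_fitness(hours))
# Consecutive predictions differ by the slope, 4.5.
# At 0 hours the prediction equals the intercept, 30.

print(predict_fitness(20))   # 120.0, above the maximum score of 100
```

The last line shows why the equation should only be used within the range of the data: far enough outside it, the line predicts scores that are impossible.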

Worked Example 4 — Calculating a residual

A student studied 4 hours and scored 78. The line of best fit is ŷ = 2 + 15x, so the predicted score is ŷ = 2 + 15(4) = 62. Find the residual.

Residual = actual − predicted = 78 − 62 = +16.

This student scored 16 marks above what the model predicted. They sit above the line.

Worked Example 5 — Mean point check

Data: x = {2, 4, 6, 8}, y = {10, 14, 20, 24}. Verify that (x̄, ȳ) lies on the least-squares line ŷ = 5 + 2.4x.

x̄ = (2+4+6+8)/4 = 20/4 = 5. ȳ = (10+14+20+24)/4 = 68/4 = 17.

Predicted: ŷ = 5 + 2.4(5) = 5 + 12 = 17 = ȳ. ✓ (The line passes through the mean point.)


Reading the line of best fit. Fluency
- (a) A line of best fit passes through (2, 10) and (8, 28). Find its equation.
- (b) ŷ = 5 + 3x. Predict y when x = 7.
- (c) ŷ = 100 − 4x. What is the y-intercept? What is the slope?
- (d) ŷ = 12 + 2.5x. Predict y when x = 0. What does this represent?
Interpreting slope and intercept. Fluency
- (a) ŷ = 50 + 6x, where x = weeks of exercise and y = fitness score. Interpret the slope.
- (b) ŷ = 200 − 3x, where x = years and y = resale value ($00s). Interpret the slope and intercept.
- (c) ŷ = 8 + 0.5x, where x = hours of sleep and y = alertness rating. What alertness is predicted for 7 hours?
- (d) ŷ = 60 − 2x, where x = number of absences and y = exam mark. What does a slope of −2 mean?
Residuals. Fluency
- (a) Actual y = 45, predicted ŷ = 38. Find the residual. Is the point above or below the line?
- (b) ŷ = 10 + 4x. For x = 5, actual y = 25. Find the residual.
- (c) A point has residual = −8. Is the actual value higher or lower than predicted?
- (d) Why is the sum of residuals for a least-squares line always (approximately) zero?
Interpolation vs extrapolation. Fluency
- (a) Data was collected for x = 10 to x = 50. A prediction is made at x = 35. Is this interpolation or extrapolation?
- (b) Using the same data, a prediction is made at x = 70. Which type? Is it reliable?
- (c) ŷ = −5 + 2x predicts negative values when x < 2.5. If x ≥ 3 in all observed data, is predicting y at x = 1 sensible?
- (d) Why is extrapolation risky even when r is very close to 1?
Line of best fit from a scatter plot. Understanding

The scatter plot shows hours of revision (x) and exam score (y) for 8 students. A line of best fit has been drawn.
- (a) Read two points on the line of best fit and find its equation.
- (b) Predict the score for a student who revised for 7 hours.
- (c) One student revised 3 hours and scored 65. Calculate their residual.
- (d) Is it reliable to use the line to predict a score for 15 hours of revision? Why?
Finding a line through the mean point. Understanding

Data: x = {2, 4, 6, 8, 10}, y = {14, 18, 20, 26, 32}.
- (a) Find x̄ and ȳ.
- (b) Using technology, the slope is b = 2.2. Find the y-intercept a using the mean point.
- (c) Write the equation of the line of best fit.
- (d) Predict y for x = 5 and x = 12. Which prediction is more reliable?
Interpreting a regression line. Understanding

A study of 20 cars gives the regression line: ŷ = 22.5 − 1.8x, where x = age of car (years) and y = resale value ($000s).
- (a) Predict the resale value of a 5-year-old car.
- (b) Interpret the slope in context.
- (c) The data covers cars aged 1 to 10 years. Predict the value of a 15-year-old car. Is this reliable?
- (d) At what age does the model predict the car has no value (y = 0)? Is this realistic?
Residual analysis. Understanding

ŷ = 40 + 5x for a dataset of study hours vs test score.

x (hours)	2	4	6	8	10
y (actual)	50	55	72	80	88
- (a) Calculate the predicted ŷ for each x.
- (b) Calculate the residual for each data point.
- (c) Which student performed most above prediction?
- (d) Do the residuals suggest the linear model is a good fit? Explain.
Temperature and fuel consumption. Problem Solving

An engineer records daily temperature x (°C) and fuel consumption y (L/100 km) for a bus over 8 days:

x	5	8	10	15	18	22	25	30
y	14.2	13.5	13.0	12.2	11.8	11.0	10.5	9.8
- (a) Describe the correlation (direction, form, strength).
- (b) x̄ = 16.6, ȳ = 12.0. The slope is b = −0.174. Find the y-intercept a.
- (c) Write the equation and predict fuel use at 20°C.
- (d) Predict fuel use at 40°C. Is this reliable? What limitations apply?
Choosing the better model. Problem Solving

Two researchers analyse the same dataset (age x vs reaction time y in milliseconds). Researcher A proposes ŷ = 200 + 3x. Researcher B proposes ŷ = 150 + 5x.

x (age)	20	30	40	50	60
y (actual)	258	298	350	395	440
- (a) Calculate the predictions from each model at each x.
- (b) Calculate the residuals for each model.
- (c) Which model has smaller overall residuals (closer to 0)? This is the better fit.
- (d) For which age does Model A give a better prediction than Model B?
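
One common way to make "smaller overall residuals" precise is to add up the squared residuals for each model, so that positive and negative residuals cannot cancel each other out. The sketch below uses hypothetical data and two hypothetical candidate lines, not the values from this question; the function name `sum_squared_residuals` is our own.

```python
def sum_squared_residuals(a, b, xs, ys):
    """Sum of (actual - predicted)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data and candidate lines, for illustration only
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 10]
ssr_a = sum_squared_residuals(1, 2, xs, ys)      # line A: y-hat = 1 + 2x
ssr_b = sum_squared_residuals(0, 2.5, xs, ys)    # line B: y-hat = 2.5x
print(ssr_a, ssr_b)   # 1 0.5
```

In this illustration line B has the smaller total, so it would be judged the better fit by this measure.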
