Practice Maths

L28 — Scatter Plots and Correlation

Key Terms

Bivariate data
Two variables measured on the same individual — we look for a relationship between them.
Explanatory variable
The variable on the x-axis that we think influences the response variable (also called the independent variable).
Response variable
The variable on the y-axis that responds to changes in the explanatory variable (also called the dependent variable).
Correlation
The strength and direction of a linear relationship between two variables, described as positive, negative, or none.
Pearson's r
A number between −1 and 1 measuring the strength and direction of linear correlation; |r| close to 1 means strong, close to 0 means weak.
Outlier
A data point that lies far from the general pattern and can strongly influence the value of r.

Bivariate data

Bivariate data involves two variables measured on the same individual or unit. We look for a relationship (association) between the explanatory variable (x-axis, independent) and the response variable (y-axis, dependent).

Scatter plots

A scatter plot displays bivariate data as ordered pairs (x, y) plotted on a Cartesian plane. Each point represents one observation.

Correlation

Term                  Meaning
Positive correlation  As x increases, y tends to increase
Negative correlation  As x increases, y tends to decrease
No correlation        No discernible pattern
Strong correlation    Points cluster tightly around a line
Weak correlation      Points scattered widely
Linear                Pattern approximates a straight line
Non-linear            Pattern is curved

Pearson’s correlation coefficient r

r measures the strength and direction of linear correlation. −1 ≤ r ≤ 1.
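
To see what r measures, here is a minimal Python sketch that computes it with the standard formula r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² × Σ(y − ȳ)²). The formula is not stated in this lesson, and the function name pearson_r and the three small data sets are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the standard formula (hypothetical helper name)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]          # deviations from the mean of x
    dy = [y - mean_y for y in ys]          # deviations from the mean of y
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Made-up data for illustration only.
xs = [-3, -2, -1, 0, 1, 2, 3]
line_up = [2 * x + 1 for x in xs]    # points exactly on a rising line
line_down = [-x for x in xs]         # points exactly on a falling line
curve = [x * x for x in xs]          # points exactly on a U-shaped curve

r_up = pearson_r(xs, line_up)        # 1.0
r_down = pearson_r(xs, line_down)    # -1.0
r_curve = pearson_r(xs, curve)       # 0.0
print(r_up, r_down, r_curve)
```

The two straight-line data sets give r = 1 and r = −1, while the U-shaped data gives r = 0 even though y depends entirely on x. That is why r is described as a measure of linear correlation only.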

Value of r         Interpretation
r = 1              Perfect positive linear
0.7 ≤ r < 1        Strong positive
0.3 ≤ r < 0.7      Moderate positive
0 < r < 0.3        Weak positive
r = 0              No linear correlation
−0.3 < r < 0       Weak negative
−0.7 < r ≤ −0.3    Moderate negative
−1 < r ≤ −0.7      Strong negative
r = −1             Perfect negative linear
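
The table above can be turned into a small lookup. The Python sketch below is one way to do it; the function name describe_r is made up, and the cut-offs 0.3 and 0.7 are the ones used in this lesson (other textbooks may draw the lines elsewhere).

```python
def describe_r(r):
    """Label a value of r using this lesson's cut-offs (0.3 and 0.7)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return f"perfect {direction} linear"
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


# Values of r that appear in this lesson's worked examples.
labels = [describe_r(r) for r in (-0.85, -0.62, 0.91)]
print(labels)
```

A label like this only describes the linear pattern. It says nothing about whether one variable causes the other.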

Correlation does NOT imply causation!

[Figure: Three types of scatter plot correlation. Panels: Positive (r ≈ 0.97), No correlation (r ≈ 0), Negative (r ≈ −0.98).]
Hot Tip: Correlation does not imply causation — even r = −1 or r = 1 only tells you the variables move together, not that one causes the other. Always ask: could a lurking variable explain the pattern?

Worked Example 1 — Describing a scatter plot

A scatter plot shows hours of exercise (x) vs resting heart rate in bpm (y) for 10 people. The points slope downward from left to right in a fairly tight band.

Direction: Negative (as exercise hours increase, heart rate decreases).

Form: Linear (points roughly follow a straight line pattern).

Strength: Strong (points cluster closely).

Estimate: a value such as r ≈ −0.85 would be consistent with this description (strong negative linear correlation).

Worked Example 2 — Identifying explanatory and response variables

A researcher records daily temperature (°C) and ice cream sales ($). Which is the explanatory variable?

Explanatory (x): Temperature — this is the variable we think influences the other.

Response (y): Ice cream sales — this responds to changes in temperature.

We plot temperature on the x-axis and sales on the y-axis.

Worked Example 3 — Interpreting r

A study finds r = −0.62 between screen time (hours/day) and sleep quality score.

Interpretation: Moderate negative linear correlation. As daily screen time increases, sleep quality tends to decrease. However, this does not prove that screen time causes poor sleep — there may be confounding variables (e.g. stress).

Worked Example 4 — Spotting an outlier

In a scatter plot of study hours vs test score, most points follow a positive trend. One student studied 8 hours but scored only 32%. How should this be handled?

Step 1: Identify: (8, 32%) is an outlier — it lies far from the main cluster.

Step 2: Investigate: Is the data recorded correctly? Was the student unwell?

Step 3: Report: Describe the overall trend, and note the outlier separately. Do not automatically remove it.

Worked Example 5 — Correlation vs causation

A study shows strong positive correlation (r = 0.91) between the number of swimming pools in a suburb and the crime rate. Does having more swimming pools cause more crime?

No. A lurking variable is a more likely explanation. For example, a factor such as suburb wealth or population density could affect both the number of pools and the amount of reported crime. Correlation alone does not establish causation; showing a causal link would need a controlled experiment.

  1. Describing correlation. Fluency

    • (a) A scatter plot shows points rising steeply from lower-left to upper-right in a tight band. Describe the correlation.
    • (b) r = −0.85. Is the correlation positive or negative? Strong or weak?
    • (c) r = 0.15. Describe the correlation.
    • (d) Which value of r indicates the strongest linear relationship: 0.6, −0.8, 0.3, or −0.2?
  2. Explanatory and response variables. Fluency

    • (a) Study hours and exam mark. Which is the explanatory variable?
    • (b) Rainfall (mm) and crop yield (tonnes). Which goes on the x-axis?
    • (c) A researcher measures shoe size and IQ score. Can we identify an explanatory variable? Why or why not?
    • (d) Age of a car (years) and its resale value ($). Which is the response variable?
  3. Reading scatter plot features. Fluency

    • (a) How many data points are in a scatter plot if it has 15 plotted dots?
    • (b) A scatter plot has points clustered in a curve (not a straight line). Is the relationship linear or non-linear?
    • (c) One point sits far away from all others. What is this called?
    • (d) Two variables show r = 0. Does this mean they are not related? Explain.
  4. Estimating r from description. Fluency

    • (a) Height of parents vs height of their children — moderate positive. Which of these is most likely? r = 0.5, r = 0.95, r = −0.5.
    • (b) Outside temperature vs hot chocolate sales — strong negative. Most likely r?
    • (c) Number of absences vs final grade — moderate negative. Most likely r?
    • (d) Output of one random number generator (x) vs output of a second, independent one (y). Most likely r?
  5. Analysing a scatter plot. Understanding

    The scatter plot below shows average daily temperature (°C) vs number of visitors to a beach on 10 different days.

    [Scatter plot: Temperature (°C) on the horizontal axis (ticks 18 to 30) against Visitors on the vertical axis (ticks 0 to 1000), with one point labelled "outlier?".]
    • (a) Describe the direction, form, and strength of the correlation for the main cluster of 9 points.
    • (b) Estimate the value of r for the 9-point cluster.
    • (c) Identify the outlier and suggest a possible reason for it.
    • (d) Does the scatter plot suggest that higher temperatures cause more beach visitors? Explain.
  6. Correlation vs causation. Understanding

    • (a) A study shows a strong positive correlation between the number of fire trucks at a fire and the amount of damage done. Does sending more trucks cause more damage?
    • (b) Shoe size and reading ability show positive correlation in primary school students. Explain why.
    • (c) Give an example of two variables with positive correlation that are not causally related.
    • (d) A controlled experiment removes confounding variables. Why can’t we run a controlled experiment to test “does smoking cause cancer” on humans?
  7. Plotting and reading a scatter plot. Understanding

    The data below shows hours of study (x) and test score (y) for 7 students.

    x (hours)    1   2   3   4   5   6   7
    y (score)   42  55  58  70  74  80  85
    • (a) Describe the overall trend.
    • (b) Which student’s result is most surprising?
    • (c) Estimate the score a student who studied 4.5 hours might achieve.
    • (d) Is it reasonable to predict the score for a student who studied 15 hours? Why or why not?
  8. Comparing scatter plots. Understanding

    • (a) Scatter plot A has r = 0.92 and scatter plot B has r = 0.45. Which shows a stronger linear relationship?
    • (b) A scatter plot has r = −0.87. What does this tell us about the relationship?
    • (c) Two scatter plots each have r = 0.7 but different shapes (one linear, one curved). Why might r be misleading for the curved one?
    • (d) r = 0.04 between people’s birthdate and their income. What can we conclude?
  9. Fitness study. Problem Solving

    A fitness researcher collects data on 8 participants: daily steps (thousands) and body mass index (BMI).

    Steps (000s)   3   4   5   6   7   8   9  10
    BMI           31  29  27  26  24  23  21  20
    • (a) Describe the direction and strength of the relationship.
    • (b) Identify the explanatory and response variables.
    • (c) Can the researcher conclude that walking more steps causes a lower BMI? What other factors might be involved?
    • (d) Estimate r from the pattern. Is it closer to −0.3, −0.7, or −0.98?
  10. Misleading correlations. Problem Solving

    A news headline reads: “Areas with more dentists have lower rates of tooth decay. The government should hire more dentists to fix tooth decay.”

    • (a) What type of correlation is described?
    • (b) Suggest a lurking variable that might explain the correlation without dentists causing the decrease.
    • (c) Design a better study to test whether more dentists cause lower tooth decay rates.
    • (d) Give two examples from everyday life where a strong correlation exists but causation cannot be assumed.