Practice Maths

Topic Review — Bivariate Data and Scatter Plots

Mixed questions covering all three lessons.

  1. Correlation basics. Fluency

    • (a) r = 0.91. Describe the correlation strength and direction.
    • (b) r = −0.35. Is this strong or weak?
    • (c) Which has a stronger linear relationship: r = 0.65 or r = −0.78?
    • (d) r = 0. Does this mean the variables are unrelated?
  2. Lines of best fit. Fluency

    • (a) ŷ = 15 + 4x. Find ŷ when x = 6.
    • (b) A line passes through (2, 20) and (8, 44). Find its equation.
    • (c) ŷ = 80 − 3x. What does the slope tell us?
    • (d) Actual y = 52, ŷ = 47. Find the residual.
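    The calculations in question 2 can be checked with a short Python sketch. The function names (line_through, predict, residual) are illustrative and not part of the lesson. The residual is taken as actual minus predicted.

```python
# Sketch for question 2. Function names are illustrative assumptions.

def line_through(p1, p2):
    """Return (intercept, slope) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)   # rise over run
    intercept = y1 - slope * x1     # solve y1 = intercept + slope * x1
    return intercept, slope

def predict(intercept, slope, x):
    """Evaluate y-hat = intercept + slope * x."""
    return intercept + slope * x

def residual(actual, predicted):
    """Residual = actual value minus predicted value."""
    return actual - predicted

a, b = line_through((2, 20), (8, 44))
print(a, b)                # intercept and slope for part (b)
print(predict(15, 4, 6))   # part (a)
print(residual(52, 47))    # part (d)
```

    A positive residual means the actual value lies above the line; a negative one means it lies below.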
  3. r² and interpretation. Fluency

    • (a) r = 0.9. Calculate r² and interpret it.
    • (b) r² = 0.36. What is |r|?
    • (c) r = −0.7. What percentage of variation in y is unexplained by x?
    • (d) A study finds r² = 0.02. Is x a good predictor of y?
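    The conversions between r and r² in question 3 can be sketched in Python. The helper names are illustrative. Note that the square root recovers only |r|, because the sign of r is lost when squaring.

```python
import math

# Sketch for question 3. Helper names are illustrative assumptions.

def r_squared(r):
    """Coefficient of determination from the correlation coefficient."""
    return r ** 2

def abs_r(r2):
    """Magnitude of r from r squared (the sign cannot be recovered)."""
    return math.sqrt(r2)

def unexplained_percent(r):
    """Percentage of variation in y NOT explained by the linear model."""
    return (1 - r ** 2) * 100

print(round(r_squared(0.9), 2))           # part (a)
print(round(abs_r(0.36), 2))              # part (b)
print(round(unexplained_percent(-0.7)))   # part (c)
```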
  4. Interpolation, extrapolation, and causation. Fluency

    • (a) Data range x = 5 to 25. A prediction is made at x = 18. Interpolation or extrapolation?
    • (b) A prediction is made at x = 40. Which type?
    • (c) A study finds a strong positive correlation between shoe size and maths ability in children aged 6–14. What is the most likely explanation?
    • (d) Name one situation where a high r is expected and causation is genuinely plausible.
  5. Scatter plot analysis. Understanding

    The scatter plot below shows advertising spend ($000s) vs monthly sales ($000s) for a company over 10 months. A line of best fit is shown.

    [Figure: scatter plot with line of best fit. Horizontal axis: Advertising ($000s), 0 to 10. Vertical axis: Sales ($000s), 50 to 210.]
    • (a) Describe the correlation shown by the main cluster.
    • (b) Read two points on the line of best fit and find its equation.
    • (c) r = 0.98. Calculate r² and interpret it in context.
    • (d) Can we conclude that more advertising causes higher sales? What other factors might be involved?
  6. Two-way table. Understanding

    400 students: phone use (>4 h/day = High, ≤4 h = Low) vs sleep (≥8 h = Good, <8 h = Poor).

                      Good Sleep    Poor Sleep    Total
    High phone use        40           160         200
    Low phone use        130            70         200
    Total                170           230         400
    • (a) What percentage of high phone users get good sleep?
    • (b) What percentage of low phone users get good sleep?
    • (c) Does the table suggest an association? State the direction.
    • (d) Can we say high phone use causes poor sleep? What study design would help establish this?
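    Parts (a) to (c) of question 6 compare row percentages: each count is divided by its own row total, not by the grand total of 400. A minimal Python sketch, using the counts from the table (the dictionary layout and function name are illustrative):

```python
# Sketch for question 6. The data layout and names are illustrative assumptions.

table = {
    "High phone use": {"Good sleep": 40, "Poor sleep": 160},
    "Low phone use": {"Good sleep": 130, "Poor sleep": 70},
}

def row_percentages(counts):
    """Convert one row of counts to percentages of that row's total."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

for group, counts in table.items():
    print(group, row_percentages(counts))
```

    If the two rows gave similar percentages, the table would suggest no association; a large gap between them suggests one.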
  7. Regression in context. Understanding

    ŷ = 3.5 + 0.12x, where x = number of pages in a book, y = reading time (hours). Data range: x = 100 to 500 pages.

    • (a) Predict reading time for a 300-page book.
    • (b) Interpret the slope in context.
    • (c) Predict reading time for a 700-page book. Is this reliable?
    • (d) A reader finished a 250-page book in 40 hours. Find the residual and interpret.
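    Question 7 combines prediction with a check on whether the x-value lies inside the data range. The sketch below assumes the model and range stated in the question; the function name predict_with_range is illustrative.

```python
# Sketch for question 7. The function name is an illustrative assumption.

def predict_with_range(intercept, slope, x, x_min, x_max):
    """Return the prediction and whether it is interpolation or extrapolation."""
    y_hat = intercept + slope * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y_hat, kind

# Model from the question: y-hat = 3.5 + 0.12x, data range 100 to 500 pages.
for pages in (300, 700):
    y_hat, kind = predict_with_range(3.5, 0.12, pages, 100, 500)
    print(pages, round(y_hat, 1), kind)

# Part (d): residual = actual minus predicted for the 250-page book.
y_hat_250, _ = predict_with_range(3.5, 0.12, 250, 100, 500)
print(round(40 - y_hat_250, 1))
```

    The same helper applies to any question that gives a line and a data range: only the arguments change.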
  8. Outlier investigation. Understanding

    • (a) A scatter plot of income vs savings has r = 0.82. When one data point (very low income, very high savings) is removed, r rises to 0.94. Describe the outlier’s effect.
    • (b) A dataset has n = 4 points. One outlier strongly pulls the regression line. How does small sample size increase the impact of an outlier?
    • (c) You discover that one data point in a study was recorded with a decimal place error (e.g. 45.0 instead of 4.50). What should you do?
    • (d) When should an outlier be kept in the analysis?
  9. Crop yield prediction. Problem Solving

    A farmer records rainfall (x, mm) and wheat yield (y, tonnes/hectare) over 8 seasons. r = 0.87, ŷ = 1.2 + 0.05x, data range x = 200–600 mm.

    • (a) Calculate r² and state how much variation in yield is explained by rainfall.
    • (b) Predict yield for a season with 450 mm rainfall.
    • (c) For a year with 800 mm of rainfall, the model predicts ŷ = 1.2 + 0.05(800) = 41.2 t/ha. Give two reasons this prediction may be unreliable.
    • (d) Rainfall is just one factor. List three other variables that affect wheat yield.
  10. Designing and evaluating a study. Problem Solving

    A school wants to investigate whether class size (x, students) affects student performance (y, average test score).

    • (a) Identify the explanatory and response variables.
    • (b) Data from 20 classes gives r = −0.68, ŷ = 85 − 0.4x. Interpret the slope.
    • (c) The school uses this to justify hiring more teachers (reducing class sizes from 30 to 20). What does the model predict will happen? Calculate the change.
    • (d) List two confounding variables that might explain the negative correlation between class size and performance without class size directly causing the improvement.
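    For part (c) of question 10, the predicted change can be found either by evaluating the line at both class sizes or by multiplying the slope by the change in x. A short Python sketch with illustrative variable names:

```python
# Sketch for question 10(c). Variable names are illustrative assumptions.

intercept, slope = 85, -0.4    # model from the question: y-hat = 85 - 0.4x
old_size, new_size = 30, 20

score_old = intercept + slope * old_size
score_new = intercept + slope * new_size

# Both routes give the same predicted change in average score.
change_from_predictions = score_new - score_old
change_from_slope = slope * (new_size - old_size)

print(round(score_old, 1), round(score_new, 1))
print(round(change_from_predictions, 1), round(change_from_slope, 1))
```

    This is only what the fitted line predicts; whether the change would actually occur depends on the causation issues raised in part (d).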
  11. Mean point and line equation. Problem Solving

    Five data points: x = {3, 5, 7, 9, 11}, y = {22, 30, 38, 46, 54}.

    • (a) Find x̄ and ȳ.
    • (b) From the pattern, estimate the slope.
    • (c) Use the mean point to find the y-intercept a.
    • (d) Verify your equation passes through all 5 points. What does this suggest about r?
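    The steps in question 11 can be checked numerically. The sketch below uses the standard least-squares formulas (slope = Sxy / Sxx, intercept = ȳ − slope × x̄), which may differ from the pattern-based estimate the question asks for but give the same line when the points are exactly collinear. Variable names are illustrative.

```python
import math

# Sketch for question 11. Variable names are illustrative assumptions.

xs = [3, 5, 7, 9, 11]
ys = [22, 30, 38, 46, 54]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squares and cross-products about the mean point.
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

slope = sxy / sxx
intercept = y_bar - slope * x_bar   # the line passes through (x_bar, y_bar)
r = sxy / math.sqrt(sxx * syy)

# Residuals are all zero when every point lies exactly on the line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(x_bar, y_bar)
print(slope, intercept)
print(r, residuals)
```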
  12. Full bivariate report. Problem Solving

    A health researcher studies hours of exercise per week (x) vs resting heart rate (y, bpm) in 50 adults aged 20–40. Results: r = −0.79, ŷ = 85 − 2.3x (data range: x = 0–10 hours).

    • (a) Write a complete description of the association (direction, form, strength, r²).
    • (b) Predict heart rate for someone exercising 5 hours per week.
    • (c) Predict heart rate for someone exercising 15 hours per week. Comment on the reliability of this prediction.
    • (d) Write a caution statement about this study’s findings.