Topic Review — Bivariate Data and Scatter Plots
Mixed questions covering all three lessons.
-
Correlation basics. Fluency
- (a) r = 0.91. Describe the correlation strength and direction.
- (b) r = −0.35. Is this strong or weak?
- (c) Which has a stronger linear relationship: r = 0.65 or r = −0.78?
- (d) r = 0. Does this mean the variables are unrelated?
-
Lines of best fit. Fluency
- (a) ŷ = 15 + 4x. Find ŷ when x = 6.
- (b) A line passes through (2, 20) and (8, 44). Find its equation.
- (c) ŷ = 80 − 3x. What does the slope tell us?
- (d) Actual y = 52, ŷ = 47. Find the residual.
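The arithmetic in parts (a), (b), and (d) can be checked with a short script. The following is a minimal Python sketch rather than part of the original worksheet: the helper names `line_through`, `predict`, and `residual` are hypothetical, and the residual is assumed to mean actual minus predicted.

```python
def line_through(p1, p2):
    """Return (intercept, slope) of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)   # rise over run
    intercept = y1 - slope * x1     # solve y1 = a + b*x1 for a
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted value y-hat = a + b*x."""
    return intercept + slope * x


def residual(actual, predicted):
    """Residual = actual - predicted (assumed sign convention)."""
    return actual - predicted


if __name__ == "__main__":
    print(predict(15, 4, 6))               # part (a)
    print(line_through((2, 20), (8, 44)))  # part (b): (intercept, slope)
    print(residual(52, 47))                # part (d)
```

A positive residual means the actual value lies above the line, and a negative residual means it lies below.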
-
r² and interpretation. Fluency
- (a) r = 0.9. Calculate r² and interpret it.
- (b) r² = 0.36. What is |r|?
- (c) r = −0.7. What percentage of variation in y is unexplained by x?
- (d) A study finds r² = 0.02. Is x a good predictor of y?
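The conversions between r and r² in this question follow from one relationship: r² is the proportion of variation in y explained by x, and 1 − r² is the proportion left unexplained. A small Python sketch, with hypothetical function names, might look like this:

```python
import math


def r_squared(r):
    """Coefficient of determination from the correlation coefficient."""
    return r ** 2


def abs_r_from_r_squared(r2):
    """Magnitude of r recovered from r squared (the sign is lost)."""
    return math.sqrt(r2)


def unexplained_percent(r):
    """Percentage of variation in y NOT explained by x."""
    return (1 - r ** 2) * 100


if __name__ == "__main__":
    print(round(r_squared(0.9), 4))              # part (a)
    print(round(abs_r_from_r_squared(0.36), 4))  # part (b)
    print(round(unexplained_percent(-0.7), 2))   # part (c)
```

Squaring discards the sign, so r² alone cannot say whether the association is positive or negative. That is why part (b) asks for |r| rather than r.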
-
Interpolation, extrapolation, and causation. Fluency
- (a) Data range x = 5 to 25. A prediction is made at x = 18. Interpolation or extrapolation?
- (b) A prediction is made at x = 40. Which type?
- (c) A study finds a strong positive correlation between shoe size and maths ability in children aged 6–14. What is the most likely explanation?
- (d) Name one situation where a high r is expected and causation is genuinely plausible.
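For parts (a) and (b), the distinction depends only on whether the x-value lies inside the observed data range. As an illustrative sketch (the function name `prediction_type` is an assumption, not worksheet terminology), the check could be written as:

```python
def prediction_type(x, x_min, x_max):
    """Classify a prediction at x relative to the observed data range."""
    if x_min <= x <= x_max:
        return "interpolation"   # inside the range of the data
    return "extrapolation"       # outside the range of the data


if __name__ == "__main__":
    print(prediction_type(18, 5, 25))  # part (a)
    print(prediction_type(40, 5, 25))  # part (b)
```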
-
Scatter plot analysis. Understanding
The scatter plot below shows advertising spend ($000s) vs monthly sales ($000s) for a company over 10 months. A line of best fit is shown.
- (a) Describe the correlation shown by the main cluster.
- (b) Read two points on the line of best fit and find its equation.
- (c) r = 0.98. Calculate r² and interpret it in context.
- (d) Can we conclude that more advertising causes higher sales? What other factors might be involved?
-
Two-way table. Understanding
400 students: phone use (>4 h/day = High, ≤4 h = Low) vs sleep (≥8 h = Good, <8 h = Poor).
                 Good Sleep   Poor Sleep   Total
High phone use       40          160        200
Low phone use       130           70        200
Total               170          230        400
- (a) What percentage of high phone users get good sleep?
- (b) What percentage of low phone users get good sleep?
- (c) Does the table suggest an association? State the direction.
- (d) Can we say high phone use causes poor sleep? What study design would help establish this?
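Parts (a) to (c) compare percentages within each row, so each count is divided by its own row total rather than by 400. A minimal Python sketch of that calculation, using the counts from the table and a hypothetical helper `row_percentages`, is:

```python
# Counts taken from the two-way table in this question.
table = {
    "High phone use": {"Good sleep": 40, "Poor sleep": 160},
    "Low phone use": {"Good sleep": 130, "Poor sleep": 70},
}


def row_percentages(counts):
    """Convert one row of counts to percentages of that row's total."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


if __name__ == "__main__":
    for group, counts in table.items():
        print(group, row_percentages(counts))
```

If the good-sleep percentages differ noticeably between the two rows, the table suggests an association between phone use and sleep.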
-
Regression in context. Understanding
ŷ = 3.5 + 0.12x, where x = number of pages in a book, y = reading time (hours). Data range: x = 100 to 500 pages.
- (a) Predict reading time for a 300-page book.
- (b) Interpret the slope in context.
- (c) Predict reading time for a 700-page book. Is this reliable?
- (d) A reader finished a 250-page book in 40 hours. Find the residual and interpret.
-
Outlier investigation. Understanding
- (a) A scatter plot of income vs savings has r = 0.82. When one data point (very low income, very high savings) is removed, r rises to 0.94. Describe the outlier’s effect.
- (b) A dataset has n = 4 points. One outlier strongly pulls the regression line. How does small sample size increase the impact of an outlier?
- (c) You discover that one data point in a study was recorded with a decimal place error (e.g. 45.0 instead of 4.50). What should you do?
- (d) When should an outlier be kept in the analysis?
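The effect described in part (a) can be explored numerically. The sketch below computes Pearson's r by hand, with and without one unusual point. The income and savings figures are invented purely for illustration (they are not the data behind the r values quoted in the question), and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data (NOT from the worksheet): income and savings in $000s.
income = [30, 40, 50, 60, 70, 80]
savings = [3, 5, 6, 8, 9, 11]
outlier = (20, 12)  # very low income, very high savings

r_without = pearson_r(income, savings)
r_with = pearson_r(income + [outlier[0]], savings + [outlier[1]])

if __name__ == "__main__":
    print(f"r without outlier: {r_without:.3f}")
    print(f"r with outlier:    {r_with:.3f}")
```

With only a handful of points, a single observation far from the trend contributes a large share of each sum in the formula, which is the idea behind part (b).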
-
Crop yield prediction. Problem Solving
A farmer records rainfall (x, mm) and wheat yield (y, tonnes/hectare) over 8 seasons. r = 0.87, ŷ = 1.2 + 0.05x, data range x = 200–600 mm.
- (a) Calculate r² and state how much variation in yield is explained by rainfall.
- (b) Predict yield for a season with 450 mm rainfall.
- (c) For a year with 800 mm rainfall, the model predicts ŷ = 1.2 + 0.05(800) = 41.2 t/ha. Give two reasons this may be unreliable.
- (d) Rainfall is just one factor. List three other variables that affect wheat yield.
-
Designing and evaluating a study. Problem Solving
A school wants to investigate whether class size (x, students) affects student performance (y, average test score).
- (a) Identify the explanatory and response variables.
- (b) Data from 20 classes gives r = −0.68, ŷ = 85 − 0.4x. Interpret the slope.
- (c) The school uses this to justify hiring more teachers (reducing class sizes from 30 to 20). What does the model predict will happen? Calculate the change.
- (d) List two confounding variables that might explain the negative correlation between class size and performance without class size directly causing the improvement.
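For part (c), the predicted change depends only on the slope and the change in x, because the intercept cancels when two predictions are subtracted. A short Python sketch under that observation, with hypothetical function names and the model from part (b) as default arguments, is:

```python
def predicted_score(class_size, intercept=85.0, slope=-0.4):
    """Predicted average test score from the model y-hat = 85 - 0.4x."""
    return intercept + slope * class_size


def predicted_change(old_size, new_size, slope=-0.4):
    """Change in the prediction when x moves from old_size to new_size."""
    return slope * (new_size - old_size)


if __name__ == "__main__":
    print(predicted_score(30), predicted_score(20))
    print(predicted_change(30, 20))  # part (c)
```

The result describes what the fitted line predicts. Whether class size is actually responsible is the subject of part (d).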
-
Mean point and line equation. Problem Solving
Five data points: x = {3, 5, 7, 9, 11}, y = {22, 30, 38, 46, 54}.
- (a) Find x̄ and ȳ.
- (b) From the pattern, estimate the slope.
- (c) Use the mean point to find the y-intercept a.
- (d) Verify your equation passes through all 5 points. What does this suggest about r?
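The steps in this question can be checked with a few lines of Python. Note one substitution: the question asks for the slope to be estimated from the pattern, whereas this sketch uses the least-squares formula b = Sxy / Sxx. The intercept then comes from the mean point, a = ȳ − b·x̄. The variable names are assumptions chosen for illustration.

```python
# Data from this question.
xs = [3, 5, 7, 9, 11]
ys = [22, 30, 38, 46, 54]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


x_bar, y_bar = mean(xs), mean(ys)

# Least-squares slope b = Sxy / Sxx (swapped in for "estimate from the pattern").
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sxy / sxx

# The fitted line passes through the mean point, so a = y_bar - b * x_bar.
a = y_bar - b * x_bar

# Part (d): does the line pass exactly through every data point?
fits_all = all(a + b * x == y for x, y in zip(xs, ys))

if __name__ == "__main__":
    print("mean point:", (x_bar, y_bar))
    print("slope b:", b, "intercept a:", a)
    print("passes through all points:", fits_all)
```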
-
Full bivariate report. Problem Solving
A health researcher studies hours of exercise per week (x) vs resting heart rate (y, bpm) in 50 adults aged 20–40. Results: r = −0.79, ŷ = 85 − 2.3x (data: x = 0–10 hours).
- (a) Write a complete description of the association (direction, form, strength, r²).
- (b) Predict heart rate for someone exercising 5 hours per week.
- (c) Predict heart rate for someone exercising 15 hours per week. Comment on reliability.
- (d) Write a caution statement about this study’s findings.