Bivariate Data Analysis — Topic Review
This review covers all four lessons in Bivariate Data Analysis: Scatterplots & Correlation, Least-Squares Regression, Prediction & Interpolation, and Residual Analysis.
Review Questions
A researcher records the number of hours per week spent exercising (x) and the systolic blood pressure in mmHg (y) for 15 adults. The data gives Pearson’s r = −0.84.
- Describe the association fully (direction, form, strength).
- Which variable is the explanatory variable? Justify.
- Does this mean exercise causes lower blood pressure? Explain.
Match each r value to its correct description:
r value    Description
0.12       Strong negative linear
−0.79      Moderate positive linear
0.64       Weak positive linear
−0.97      Very strong negative linear
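As a rough check on the matching, the sketch below labels an r value by direction and strength. The cut-offs (|r| ≥ 0.9 very strong, ≥ 0.75 strong, ≥ 0.5 moderate, otherwise weak) are an assumed convention chosen for illustration; textbooks draw these boundaries in slightly different places, and the function name `describe_r` is hypothetical.

```python
def describe_r(r: float) -> str:
    """Label the direction and strength of a linear association from Pearson's r.

    The strength cut-offs are an assumed convention (textbooks differ).
    r says nothing about form, so "linear" assumes the scatterplot looks linear.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear association"
    size = abs(r)
    if size >= 0.9:
        strength = "very strong"
    elif size >= 0.75:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} linear"


for value in (0.12, -0.79, 0.64, -0.97):
    print(value, describe_r(value))
```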
A least-squares regression line for data on fertiliser applied (x, kg/ha) and crop yield (y, t/ha) is found to be ŷ = 1.8 + 0.24x.
- Interpret the gradient 0.24 in context.
- Interpret the y-intercept 1.8 in context.
- Use the equation to predict the yield when 30 kg/ha of fertiliser is applied.
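The prediction is a direct substitution into ŷ = a + bx. This minimal sketch (the helper `predict` is a hypothetical name) also shows what the gradient and intercept mean numerically: increasing x by 1 changes ŷ by b, and x = 0 gives ŷ = a.

```python
def predict(intercept: float, gradient: float, x: float) -> float:
    """Return the predicted response y-hat = intercept + gradient * x."""
    return intercept + gradient * x


INTERCEPT = 1.8   # t/ha, from the question's equation
GRADIENT = 0.24   # t/ha per extra kg/ha of fertiliser

yield_0 = predict(INTERCEPT, GRADIENT, 0)      # y-intercept: predicted yield with no fertiliser
yield_30 = predict(INTERCEPT, GRADIENT, 30)    # the prediction asked for in the question
step = predict(INTERCEPT, GRADIENT, 31) - yield_30  # effect of one extra kg/ha

print(f"0 kg/ha: {yield_0:.2f} t/ha")
print(f"30 kg/ha: {yield_30:.2f} t/ha")
print(f"30 -> 31 kg/ha changes the prediction by {step:.2f} t/ha")
```

The last line prints a change of 0.24 t/ha, which is exactly the gradient: that is what "for each additional 1 kg/ha" means in the interpretation.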
The table below shows data on average daily temperature (°C) and number of hot drinks sold at a café:
Temp x (°C)      10   15   20   25   30
Drinks sold y    85   74   62   48   35
- Identify the explanatory and response variables.
- Technology gives the regression equation ŷ = 111.2 − 2.52x. Interpret the gradient.
- What does the negative gradient tell us about the relationship?
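Technology computes the least-squares line from the formulas b = S_xy / S_xx and a = ȳ − b·x̄, where S_xy = Σ(x − x̄)(y − ȳ) and S_xx = Σ(x − x̄)². A minimal sketch applying them to the café table, assuming the five rows shown are the full dataset (the function name `least_squares` is hypothetical):

```python
from statistics import fmean
from typing import Sequence, Tuple


def least_squares(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (a, b) for the least-squares line y-hat = a + b*x.

    Uses b = S_xy / S_xx and a = y_bar - b * x_bar, so the line passes
    through the point of means (x_bar, y_bar).
    """
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


temps = [10, 15, 20, 25, 30]     # explanatory variable x (deg C)
drinks = [85, 74, 62, 48, 35]    # response variable y (drinks sold)

a, b = least_squares(temps, drinks)
print(f"intercept a = {a:.1f}, gradient b = {b:.2f}")  # intercept a = 111.2, gradient b = -2.52
```

Because a = ȳ − b·x̄, the fitted line always passes through (x̄, ȳ) = (20, 60.8), which is a quick way to sanity-check any regression equation quoted for this table.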
The regression equation for house price (y, $’000) and floor area (x, m²) is ŷ = 180 + 2.35x. The data was collected from houses with floor areas between 80 m² and 280 m².
- Predict the price of a house with floor area 150 m². Is this interpolation or extrapolation?
- Predict the price of a house with floor area 400 m². Is this reliable? Explain.
- A house sells for $650,000 (y = 650). Find the residual if its floor area is 180 m².
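Whether a prediction is interpolation or extrapolation depends only on whether x lies inside the range of the data. A sketch with a hypothetical helper `predict_with_range`, using the 80 to 280 m² range stated in the question:

```python
def predict_with_range(a, b, x, x_min, x_max):
    """Return (y_hat, kind) for y-hat = a + b*x.

    kind is "interpolation" if x lies inside the observed range [x_min, x_max],
    otherwise "extrapolation".
    """
    y_hat = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y_hat, kind


# House-price model from the question: y in $'000, x in m^2, data for 80-280 m^2.
results = {area: predict_with_range(180, 2.35, area, 80, 280) for area in (150, 400)}
for area, (price, kind) in results.items():
    print(f"{area} m^2: ${price * 1000:,.0f} ({kind})")
```

The label flags the 400 m² case but cannot say how wrong that prediction might be; that judgement still depends on whether the linear trend plausibly continues beyond 280 m².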
A regression line is fitted to data on study hours (x) and exam score (y). The equation is ŷ = 38 + 6.4x, based on students who studied between 1 and 10 hours.
- Predict the score for a student who studied 5 hours.
- A student claims the model predicts a score of 166 for 20 hours of study. Evaluate this claim.
- Why does extrapolation become increasingly unreliable the further we go beyond the data range?
A regression line ŷ = 3 + 2.1x is fitted to a dataset. The residuals (in x-order) are: −2.1, −1.5, −0.8, +0.2, +1.1, +1.9, +2.8.
- Describe the pattern in these residuals.
- What does this pattern tell you about the linear model?
- What type of model would be more appropriate?
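One rough way to make "pattern" concrete is to count how often consecutive residuals change sign. If the signs were independent and equally likely to be positive or negative, about half of the neighbouring pairs would be expected to differ; a single change across seven residuals points to a systematic trend rather than random scatter. This is an informal heuristic (related in spirit to a runs test), not a formal test, and `sign_changes` is a hypothetical name.

```python
def sign_changes(residuals):
    """Count sign switches between consecutive residuals (assumed to be in x-order).

    Residuals that are exactly zero are skipped.
    """
    signs = [e > 0 for e in residuals if e != 0]
    return sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)


residuals = [-2.1, -1.5, -0.8, 0.2, 1.1, 1.9, 2.8]  # from the question, in x-order
changes = sign_changes(residuals)
pairs = len(residuals) - 1
print(f"{changes} sign change(s) across {pairs} neighbouring pairs")
```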
For a study on advertising spend and sales, technology gives r = 0.88.
- Calculate the coefficient of determination r².
- Interpret r² in context.
- A colleague says “r² = 0.88 means the model is almost perfect.” Is this correct? Explain.
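r² is simply the square of r, so it is never larger than |r| (for 0 < |r| < 1 it is strictly smaller). A short sketch of the calculation for this study:

```python
r = 0.88                 # Pearson's r for advertising spend vs sales
r_squared = r ** 2       # coefficient of determination

# r^2 is the proportion of the variation in sales explained by the
# linear relationship with advertising spend.
print(f"r^2 = {r_squared:.4f}, i.e. about {r_squared:.1%} of the variation in sales")
print(f"r^2 < r: {r_squared < r}")
```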
Data on the age of a machine (x, years) and its maintenance cost (y, $) gives the regression equation ŷ = 420 + 185x, with r = 0.94 and data collected for machines aged 1–8 years.
- Predict the maintenance cost for a 5-year-old machine.
- Calculate and interpret r².
- Is it appropriate to use this equation to predict the maintenance cost of a 15-year-old machine? Explain.
- The actual maintenance cost for a 5-year-old machine is $1,450. Find and interpret the residual.
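A residual is actual minus predicted, and its sign says which side of the line the point lies on. A sketch for the 5-year-old machine (the helper `residual` is a hypothetical name):

```python
def residual(actual: float, predicted: float) -> float:
    """Residual = actual value - predicted value."""
    return actual - predicted


predicted = 420 + 185 * 5        # model prediction for a 5-year-old machine, in $
res = residual(1450, predicted)  # actual cost from the question, in $

if res > 0:
    note = "point lies above the line: the model underestimates this cost"
elif res < 0:
    note = "point lies below the line: the model overestimates this cost"
else:
    note = "point lies on the line"
print(f"predicted ${predicted}, residual ${res} ({note})")
```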
A scatterplot of body mass index (x) and resting metabolic rate (y, calories/day) for 20 participants shows a moderate positive linear association. Technology gives: ŷ = 820 + 14.3x and r = 0.72.
- Interpret the gradient in context.
- Calculate r² and interpret it.
- Predict the resting metabolic rate for a person with BMI = 28.
- Explain why this regression line should not be used to establish that high BMI causes a higher metabolic rate.
In a study of primary school children, the number of books read per month (x) and mathematics test score (y) have r = 0.73. A school principal concludes that encouraging reading will improve maths scores.
- Identify a possible lurking variable.
- Does r = 0.73 prove the principal’s conclusion? Explain.
- What study design could better investigate a causal link?
A linear model for rainfall (x, mm) and dam water level (y, m) gives r² = 0.81. The residual plot shows a random scatter pattern.
- What does r² = 0.81 tell you?
- What does the random scatter in the residual plot confirm?
- Calculate r given r² = 0.81. (Assume a positive association.)
- Would you trust predictions from this model? Justify.
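Going from r² back to r needs a square root plus the direction of the association, because squaring discards the sign. A sketch with the direction supplied by the caller, as the question instructs (the function name `r_from_r_squared` is hypothetical):

```python
import math


def r_from_r_squared(r_squared: float, positive: bool) -> float:
    """Recover r from r^2; the sign must come from the direction of the association."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r^2 must lie between 0 and 1")
    r = math.sqrt(r_squared)
    return r if positive else -r


print(r_from_r_squared(0.81, positive=True))
```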
A marine biologist measures water temperature (x, °C) and coral bleaching percentage (y, %) across 12 reef sites. Technology gives: ŷ = −28.4 + 3.7x and r = 0.91.
- Describe the association (direction, form, strength).
- Interpret the gradient in context.
- Calculate r² and interpret it.
- Predict the bleaching percentage when water temperature is 29°C.
- The actual bleaching at 29°C is 82%. Find the residual and interpret it.
A dataset on bacterial population (y, thousands) after x hours shows a curved scatterplot. A log(y) vs x transformation gives r = 0.997 and a residual plot showing random scatter. The linear model (no transformation) has r² = 0.78 and a curved residual plot.
- Calculate r² for the transformed model.
- Which model is more appropriate? Justify using both numerical and graphical evidence.
- What type of growth does the successful transformation suggest?
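The idea behind the log(y) transformation can be seen on made-up data. The sketch below uses hypothetical values that double every hour (not the question's dataset): y against x is curved, so r is well below 1, while log(y) against x is exactly linear, so r is essentially 1. The base of the logarithm does not change r, since switching base only rescales log(y) by a positive constant. The helper `pearson_r` is a hypothetical name.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical exponential growth: 5 thousand bacteria, doubling every hour.
hours = [0, 1, 2, 3, 4, 5, 6]
population = [5 * 2 ** h for h in hours]

r_raw = pearson_r(hours, population)                              # y vs x (curved)
r_log = pearson_r(hours, [math.log10(y) for y in population])     # log(y) vs x (straight)
print(f"r (y vs x):     {r_raw:.3f}")
print(f"r (log y vs x): {r_log:.3f}")
```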
The data below shows the selling price ($’000) of 6 used cars and their age (years):
Age x (years)      1    2    3    5    7    9
Price y ($’000)   32   27   23   16   11    7
Technology gives: ŷ = 33.26 − 3.09x, r = −0.989.
- Describe the association.
- Calculate r² and interpret it.
- Predict the price of a 4-year-old car. State whether this is interpolation or extrapolation.
- The residual for the 5-year-old car is −1.8. Show that this residual is consistent with the price in the table.
- Would it be appropriate to use the model to predict the price of a 20-year-old car? Justify.
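A sketch that computes the least-squares line, r and all six residuals for the used-car table directly from the data, using the same formulas as technology (b = S_xy / S_xx, a = ȳ − b·x̄, r = S_xy / √(S_xx·S_yy)). A useful check on any least-squares fit is that its residuals sum to zero, up to rounding.

```python
import math

ages = [1, 2, 3, 5, 7, 9]           # x, years
prices = [32, 27, 23, 16, 11, 7]    # y, $'000

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(prices) / n
s_xx = sum((x - x_bar) ** 2 for x in ages)
s_yy = sum((y - y_bar) ** 2 for y in prices)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, prices))

b = s_xy / s_xx                     # gradient
a = y_bar - b * x_bar               # intercept
r = s_xy / math.sqrt(s_xx * s_yy)   # Pearson's r
residuals = [y - (a + b * x) for x, y in zip(ages, prices)]  # actual - predicted

print(f"y-hat = {a:.2f} + ({b:.2f})x, r = {r:.3f}")
print("residuals:", [round(e, 2) for e in residuals])
print("sum of residuals:", round(sum(residuals), 10))
```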