Practice Maths

Bivariate Data Analysis — Topic Review

This review covers all four lessons in Bivariate Data Analysis: Scatterplots & Correlation, Least-Squares Regression, Prediction & Interpolation, and Residual Analysis.
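As a quick reminder before the questions, the quantities used throughout can be written in the usual textbook form. The letters a (intercept) and b (gradient) are labels assumed here for illustration; the questions themselves give the numbers directly, as in ŷ = 1.8 + 0.24x.

```latex
\begin{align*}
\hat{y} &= a + bx
  && \text{least-squares line with intercept } a \text{ and gradient } b \\
\text{residual} &= y - \hat{y}
  && \text{actual value minus predicted value} \\
r^2 &= r \times r, \quad -1 \le r \le 1, \quad 0 \le r^2 \le 1
  && \text{coefficient of determination}
\end{align*}
```

A prediction is interpolation when x lies inside the range of the collected data and extrapolation when it lies outside that range.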

Review Questions

  1. A researcher records the number of hours per week spent exercising (x) and the systolic blood pressure in mmHg (y) for 15 adults. The data gives Pearson’s r = −0.84.

    1. Describe the association fully (direction, form, strength).
    2. Which variable is the explanatory variable? Justify.
    3. Does this mean exercise causes lower blood pressure? Explain.
  2. Match each r value to its correct description:

    r value    Description
    0.12       Strong negative linear
    −0.79      Moderate positive linear
    0.64       Weak positive linear
    −0.97      Very strong negative linear
  3. A least-squares regression line for data on fertiliser applied (x, kg/ha) and crop yield (y, t/ha) is found to be ŷ = 1.8 + 0.24x.

    1. Interpret the gradient 0.24 in context.
    2. Interpret the y-intercept 1.8 in context.
    3. Use the equation to predict the yield when 30 kg/ha of fertiliser is applied.
  4. The table below shows data on average daily temperature (°C) and number of hot drinks sold at a café:

    Temp x (°C)      10  15  20  25  30
    Drinks sold y    85  74  62  48  35

    1. Identify the explanatory and response variables.
    2. Technology gives the regression equation ŷ = 111.2 − 2.52x. Interpret the gradient.
    3. What does the negative gradient tell us about the relationship?
  5. The regression equation for house price (y, $’000) and floor area (x, m²) is ŷ = 180 + 2.35x. The data was collected from houses with floor areas between 80 m² and 280 m².

    1. Predict the price of a house with floor area 150 m². Is this interpolation or extrapolation?
    2. Predict the price of a house with floor area 400 m². Is this reliable? Explain.
    3. A house sells for $650,000 (y = 650). Find the residual if its floor area is 180 m².
  6. A regression line is fitted to data on study hours (x) and exam score (y). The equation is ŷ = 38 + 6.4x, based on students who studied between 1 and 10 hours.

    1. Predict the score for a student who studied 5 hours.
    2. A student claims the model predicts a score of 166 for 20 hours of study. Evaluate this claim.
    3. Why does extrapolation become increasingly unreliable the further we go beyond the data range?
  7. A regression line ŷ = 3 + 2.1x is fitted to a dataset. The residuals (in x-order) are: −2.1, −1.5, −0.8, +0.2, +1.1, +1.9, +2.8.

    1. Describe the pattern in these residuals.
    2. What does this pattern tell you about the linear model?
    3. What type of model would be more appropriate?
  8. For a study on advertising spend and sales, technology gives r = 0.88.

    1. Calculate the coefficient of determination r².
    2. Interpret r² in context.
    3. A colleague says “r² = 0.88 means the model is almost perfect.” Is this correct? Explain.
  9. Data on the age of a machine (x, years) and its maintenance cost (y, $) gives the regression equation ŷ = 420 + 185x, with r = 0.94 and data collected for machines aged 1–8 years.

    1. Predict the maintenance cost for a 5-year-old machine.
    2. Calculate and interpret r².
    3. Is it appropriate to use this equation to predict the maintenance cost of a 15-year-old machine? Explain.
    4. The actual maintenance cost for a 5-year-old machine is $1,450. Find and interpret the residual.
  10. A scatterplot of body mass index (x) and resting metabolic rate (y, calories/day) for 20 participants shows a moderate positive linear association. Technology gives: ŷ = 820 + 14.3x and r = 0.72.

    1. Interpret the gradient in context.
    2. Calculate r² and interpret it.
    3. Predict the resting metabolic rate for a person with BMI = 28.
    4. Explain why this regression line should not be used to establish that high BMI causes a higher metabolic rate.
  11. In a study of primary school children, the number of books read per month (x) and mathematics test score (y) have r = 0.73. A school principal concludes that encouraging reading will improve maths scores.

    1. Identify a possible lurking variable.
    2. Does r = 0.73 prove the principal’s conclusion? Explain.
    3. What study design could better investigate a causal link?
  12. A linear model for rainfall (x, mm) and dam water level (y, m) gives r² = 0.81. The residual plot shows a random scatter pattern.

    1. What does r² = 0.81 tell you?
    2. What does the random scatter residual plot confirm?
    3. Calculate r given r² = 0.81. (Assume a positive association.)
    4. Would you trust predictions from this model? Justify.
  13. A marine biologist measures water temperature (x, °C) and coral bleaching percentage (y, %) across 12 reef sites. Technology gives: ŷ = −28.4 + 3.7x and r = 0.91.

    1. Describe the association (direction, form, strength).
    2. Interpret the gradient in context.
    3. Calculate r² and interpret it.
    4. Predict the bleaching percentage when water temperature is 29°C.
    5. The actual bleaching at 29°C is 82%. Find the residual and interpret it.
  14. A dataset on bacterial population (y, thousands) after x hours shows a curved scatterplot. Fitting a line to log(y) against x gives r = 0.997 and a residual plot showing random scatter. The untransformed linear model (y against x) has r² = 0.78 and a curved residual plot.

    1. Calculate r² for the transformed model.
    2. Which model is more appropriate? Justify using both numerical and graphical evidence.
    3. What type of growth does the successful transformation suggest?
  15. The data below shows the selling price ($’000) of 6 used cars and their age (years):

    Age x (years)      1   2   3   5   7   9
    Price y ($’000)    32  27  23  16  11  7

    Technology gives: ŷ = 33.26 − 3.09x, r = −0.989.

    1. Describe the association.
    2. Calculate r² and interpret it.
    3. Predict the price of a 4-year-old car. State whether this is interpolation or extrapolation.
    4. The residual for the 5-year-old car is −1.81. Show that the actual price is consistent with the table.
    5. Would it be appropriate to use the model to predict the price of a 20-year-old car? Justify.
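
Checking numerical answers

The prediction, residual and r² parts of the questions above all follow the same arithmetic, so one way to check working is with a small script. The sketch below is only an illustration: the class `LinearModel` and the function `r_squared_percent` are hypothetical names introduced here, not part of any library or of the original worksheet. It is demonstrated on the numbers stated in Question 9 (ŷ = 420 + 185x, r = 0.94, machines aged 1 to 8 years).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    """Hypothetical helper: a line y-hat = intercept + gradient * x,
    together with the x-range of the data it was fitted to."""
    intercept: float
    gradient: float
    x_min: float
    x_max: float

    def predict(self, x: float) -> float:
        # Substitute x into the regression equation.
        return self.intercept + self.gradient * x

    def residual(self, x: float, y_actual: float) -> float:
        # Residual = actual value minus predicted value.
        return y_actual - self.predict(x)

    def is_interpolation(self, x: float) -> bool:
        # True when x lies inside the range of the collected data;
        # False means the prediction is an extrapolation.
        return self.x_min <= x <= self.x_max


def r_squared_percent(r: float) -> float:
    # Coefficient of determination, expressed as a percentage.
    return 100 * r ** 2


# Values taken from Question 9.
machine = LinearModel(intercept=420, gradient=185, x_min=1, x_max=8)

predicted = machine.predict(5)         # 420 + 185 * 5 = 1345
residual = machine.residual(5, 1450)   # 1450 - 1345 = 105

print("Predicted cost at 5 years:", predicted)
print("Residual for actual cost $1,450:", residual)
print("Is 15 years inside the data range?", machine.is_interpolation(15))
print("r squared (%):", round(r_squared_percent(0.94), 2))  # about 88.36
```

The script only checks the arithmetic. The interpretation in context (what a positive residual means, why extrapolating to 15 years is unreliable) still has to be written in words.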