Solutions: Prediction and Interpolation

(a) ŷ = 32 + 7.5(6) = 32 + 45 = 77

(b) Interpolation. The x-value of 6 hours falls within the observed data range of 1 to 10 hours. Because the regression line is based on evidence across this range, predictions within it are generally reliable.
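
The two parts above can be checked together in a few lines of Python. This is only a minimal sketch: the function name `predict_with_check` and its parameter names are illustrative and not part of the original exercise, while the coefficients (32 and 7.5) and the observed range of 1 to 10 hours are taken from the solution.

```python
def predict_with_check(x, intercept, slope, x_min, x_max):
    """Return (prediction, label) for the line y-hat = intercept + slope * x."""
    y_hat = intercept + slope * x
    # Interpolation: x lies inside the observed range (endpoints included).
    label = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y_hat, label


y_hat, label = predict_with_check(6, intercept=32, slope=7.5, x_min=1, x_max=10)
print(y_hat, label)  # 77.0 interpolation
```

The label only reports whether x is inside the observed range; it does not by itself measure how accurate the prediction is.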
r² = 0.85² = 0.7225

Interpretation: 72.25% of the variation in resting heart rate (bpm) is explained by the linear relationship with daily exercise time (minutes). The remaining 27.75% is due to other factors such as age, diet, and genetics.
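
The arithmetic behind the interpretation can be sketched as follows. The helper name `explained_variation` is hypothetical; the only input taken from the solution is r = 0.85.

```python
def explained_variation(r):
    """Return (r_squared, percent_explained, percent_unexplained) for a correlation r."""
    r_squared = r ** 2
    percent_explained = 100 * r_squared
    return r_squared, percent_explained, 100 - percent_explained


r_sq, explained, unexplained = explained_variation(0.85)
print(f"r^2 = {r_sq:.4f}, explained = {explained:.2f}%, unexplained = {unexplained:.2f}%")
```

Because r is squared, a correlation of −0.85 would give the same percentage; r² describes how much variation is explained, not the direction of the relationship.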
(a) x = 5 (i.e., $5,000): ŷ = 120 + 18.5(5) = 120 + 92.5 = 212.5 units. This is interpolation (x = 5 is within the range 1–8), so the prediction is reliable.

(b) x = 15 (i.e., $15,000): ŷ = 120 + 18.5(15) = 120 + 277.5 = 397.5 units. This is extrapolation — $15,000 is well beyond the observed range of $1,000–$8,000. The linear relationship may not hold at much higher spend levels (e.g., market saturation). This prediction should be treated with caution.
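
One informal way to see why part (b) deserves more caution than part (a) is to express how far x lies outside the observed range as a fraction of the range width. The sketch below assumes this ratio as a rough heuristic; it is not a standard statistic, and the name `overshoot_fraction` is made up for illustration.

```python
def overshoot_fraction(x, x_min, x_max):
    """Distance of x outside [x_min, x_max] as a fraction of the range width (0.0 if inside)."""
    width = x_max - x_min
    if x > x_max:
        return (x - x_max) / width
    if x < x_min:
        return (x_min - x) / width
    return 0.0


# Spend in thousands of dollars; observed range 1 to 8 (from the solution above).
for x in (5, 15):
    y_hat = 120 + 18.5 * x
    print(x, y_hat, overshoot_fraction(x, 1, 8))
# 5 212.5 0.0
# 15 397.5 1.0
```

A value of 1.0 means x = 15 sits a full range-width beyond the largest observed spend, so the line is being stretched as far again as the data that support it.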
r² = 0.73² = 0.5329

About 53.3% of the variation in the response variable is explained by the linear relationship with the explanatory variable. This indicates a moderate linear model — roughly half the variation is explained, and half is due to other factors.
(a) r² = 0.88² = 0.7744. About 77.4% of the variation in the number of coral bleaching events is explained by the linear relationship with water temperature.

(b) ŷ = −12.4 + 1.8(26) = −12.4 + 46.8 = 34.4 bleaching events. This is interpolation (x = 26 is within 22–30), so the prediction is reliable.

(c) The slope tells us that for each 1°C increase in water temperature, the number of coral bleaching events is predicted to increase by 1.8.
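
The slope interpretation in part (c) can be verified numerically: predictions at any two temperatures 1°C apart differ by the slope. This is a small sketch using the fitted line from part (b); the function name `bleaching_events` is an illustrative choice.

```python
def bleaching_events(temp_c):
    """Predicted bleaching events from the fitted line y-hat = -12.4 + 1.8 * temperature."""
    return -12.4 + 1.8 * temp_c


# The difference between predictions 1 degree apart equals the slope,
# regardless of the starting temperature (up to floating-point rounding).
for t in (23, 26, 29):
    diff = bleaching_events(t + 1) - bleaching_events(t)
    print(t, round(diff, 6))
```

The starting temperatures 23, 26, and 29 are arbitrary values inside the observed range of 22–30.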
(a) ŷ = 68 + 1.9(25) = 68 + 47.5 = 115.5 mmHg

(b) BMI = 45 is outside the observed data range of 18–38. This is extrapolation, so the prediction is unreliable. At extreme BMI values, the relationship between BMI and blood pressure may become non-linear or involve additional medical complications not captured by the model.

(c) 61% of the variation in blood pressure is explained by the linear relationship with BMI. The remaining 39% is attributable to other factors such as age, diet, physical activity, and stress.
(a) In both datasets, 64% of the variation in y is explained by the linear relationship with x. Both models account for about two-thirds of the variability in the response variable.

(b) Dataset B (80 points) gives a more reliable prediction at x = 30. With more data, the regression line is a more stable estimate — it is less influenced by individual points. A larger sample size produces narrower confidence intervals around the prediction.

(c) Both predictions are extrapolations (x = 55 > 50), which is equally concerning in principle. However, Dataset A’s line is based on only 8 points and may be poorly estimated, making its extrapolation doubly unreliable. Both predictions should be flagged as untrustworthy.
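
The claim in part (b) that a line fitted to 80 points is more stable than one fitted to 8 can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the population line y = 10 + 2x, the noise level, and the x-range of 0 to 50 are invented and are not the exercise's datasets.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Least-squares slope: sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def slope_spread(n, trials=1000, seed=0):
    """Standard deviation of the fitted slope over repeated simulated samples of size n."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(trials):
        xs = [rng.uniform(0, 50) for _ in range(n)]
        # Hypothetical population: y = 10 + 2x plus normal noise with sd 20.
        ys = [10 + 2 * x + rng.gauss(0, 20) for x in xs]
        slopes.append(fit_slope(xs, ys))
    return statistics.stdev(slopes)


print(slope_spread(8), slope_spread(80))
```

Under these assumptions the fitted slope varies noticeably more from sample to sample when n = 8 than when n = 80, which is the sense in which the larger dataset gives a more stable line.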
(a) Because the LSRL always passes through (x̄, ȳ), substituting x̄ = 88 into the regression equation gives ȳ = 4.2 + 0.052(88) = 4.2 + 4.576 = 8.776 L/100km. The point (88, 8.776) therefore lies on the regression line.

(b) r² = 0.91² = 0.8281. About 82.8% of the variation in fuel consumption is explained by the linear relationship with vehicle speed. The linear model is a very good fit.

(c) ŷ = 4.2 + 0.052(95) = 4.2 + 4.94 = 9.14 L/100km. This is interpolation (95 is within 60–110), so the prediction is reliable.
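
The property used in part (a), that a least-squares line passes through (x̄, ȳ), follows from how the intercept is computed. The sketch below demonstrates it on made-up (speed, fuel consumption) pairs; these numbers are invented for illustration and are not the exercise data, so the fitted coefficients will not match 4.2 and 0.052.

```python
import statistics

# Made-up (speed, fuel consumption) pairs for illustration only.
speeds = [60, 70, 80, 90, 100, 110]
fuel = [7.4, 7.8, 8.5, 8.8, 9.5, 9.9]

x_bar, y_bar = statistics.fmean(speeds), statistics.fmean(fuel)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(speeds, fuel))
sxx = sum((x - x_bar) ** 2 for x in speeds)

slope = sxy / sxx
# The intercept is defined as y_bar - slope * x_bar, which is exactly
# what forces the fitted line through the mean point.
intercept = y_bar - slope * x_bar

y_hat_at_mean = intercept + slope * x_bar
print(abs(y_hat_at_mean - y_bar) < 1e-9)  # True
```

Substituting x̄ into ŷ = a + bx with a = ȳ − bx̄ gives ŷ = ȳ for any dataset, which is why the check succeeds regardless of the values chosen.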
(a) Slope 0.034: for each additional 1 kg/ha of fertiliser, crop yield is predicted to increase by 0.034 tonnes/ha. y-intercept 1.8: at 0 kg/ha, the model predicts a yield of 1.8 t/ha. Although a crop can grow without fertiliser, x = 0 is outside the data range (50–200), so the y-intercept should be interpreted with caution.

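(b) r² = 0.87² = 0.7569, so about 75.7% of the variation in crop yield is explained by the linear relationship with fertiliser rate. The remaining ~24.3% could be due to rainfall variation, soil type, pest pressure, irrigation, or management practices.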

(c) ŷ = 1.8 + 0.034(250) = 1.8 + 8.5 = 10.3 t/ha. The colleague’s claim is incorrect: x = 250 lies beyond the observed maximum of 200, so this is extrapolation. Crop yields typically plateau at high fertiliser rates (due to soil saturation and environmental limits). The prediction should not be presented as a reliable forecast without additional evidence that the linear model extends beyond 200 kg/ha.

(d) ŷ = 1.8 + 0.034(125) = 1.8 + 4.25 = 6.05 t/ha. This is the mean crop yield ȳ, confirming the regression line passes through (x̄, ȳ) = (125, 6.05).
(a) r² = 0.93 means that 93% of the variation in global temperature anomaly is explained by the linear relationship with CO₂ concentration. Only 7% is unexplained. The model is an excellent fit to the historical data.

(b) ŷ = −10.2 + 0.030(380) = −10.2 + 11.4 = 1.2°C. Interpolation (380 is within 316–412), so this is a reliable prediction.

(c) ŷ = −10.2 + 0.030(445) = −10.2 + 13.35 = 3.15°C. This is extrapolation (445 > 412). Reliability is reduced, but the extrapolation is modest (~33 ppm beyond the observed range), and physical climate models provide theoretical support. The prediction should be presented with uncertainty bounds rather than as a precise forecast.

(d) The journalist overstates what statistics can prove. A high r² shows a very strong linear association, but correlation does not establish causation by itself. However, causation in this case is also supported by well-understood physical mechanisms (the greenhouse effect), controlled laboratory experiments, ice-core paleoclimate data, and the absence of plausible alternative explanations. The correct statement is: “the data show a very strong linear association (r² = 0.93); combined with established physical theory, the evidence for causation is compelling.”