L51 — Statistics, Probability and Mixed Problem Solving
Key Terms
- Line of best fit
- The straight line that best models a scatter plot; always passes through the mean point (x̅, y̅).
- Correlation coefficient r
- Measures strength and direction of linear association: r near ±1 is strong, r near 0 is weak. Correlation ≠ causation.
- Conditional probability
- P(A|B) = P(A ∩ B) / P(B) — the probability of A given that B has already occurred.
- Extrapolation
- Predicting outside the observed data range using a fitted model — results are unreliable as the relationship may not continue.
- Index laws
- Key rules: a^m × a^n = a^(m+n), a^m ÷ a^n = a^(m−n), (a^m)^n = a^(mn), a^0 = 1, a^(−n) = 1/a^n.
- Logarithm
- log_b(x) = y means b^y = x; use log laws (product, quotient, power) and the inverse relationship to solve exponential equations.
Statistics and probability summary
| Concept | Key idea | Watch out |
|---|---|---|
| Line of best fit | Passes through (x̅, y̅); minimises the sum of squared vertical distances to the points | Avoid extrapolating far outside the data range |
| Multiplication rule | P(A and B) = P(A) × P(B \| A) | Only equals P(A) × P(B) if A and B are independent |
| Complementary events | P(A') = 1 − P(A) | Useful when direct calculation is complex |
| Growth/decay | A = A_0 × b^t (b > 1: growth, 0 < b < 1: decay) | Take logs to solve for t |
Worked Example — Exponential Growth
A bacteria population doubles every 3 hours. There are initially 500 bacteria. When will there be 8 000?
Step 1: Write the model. A = 500 × 2^(t/3), where t is the time in hours.
Step 2: Set up the equation. 8000 = 500 × 2^(t/3) ⇒ 16 = 2^(t/3).
Step 3: Solve using logs or recognition. 2^4 = 16 ⇒ t/3 = 4 ⇒ t = 12 hours.
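The same answer can be checked numerically. The short Python sketch below is only an illustration: it rearranges the doubling model to t = 3 × log2(8000/500), and the variable names are chosen for this example rather than taken from the lesson.

```python
import math

initial = 500        # starting population (from the worked example)
target = 8000        # population we want to reach
doubling_time = 3    # hours for the population to double

# Rearranging target = initial * 2 ** (t / doubling_time) gives
# t = doubling_time * log2(target / initial).
t = doubling_time * math.log2(target / initial)
print(t)  # 12.0

# Substitute t back into the model as a check.
check = initial * 2 ** (t / doubling_time)
print(check)  # 8000.0
```

When the ratio is not an exact power of 2, the same rearrangement still works; the answer is simply no longer a whole number.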
Bivariate Data and Scatter Plots (T3T1)
A scatter plot shows the relationship between two numerical variables. The line of best fit (least squares regression line) passes through the mean point (x̅, y̅) and summarises the linear trend. The correlation coefficient r measures the strength and direction of the relationship:
- r near +1: strong positive correlation.
- r near −1: strong negative correlation.
- r near 0: weak or no linear correlation.
Interpolation (predicting within the data range) is generally reliable when the linear trend is strong. Extrapolation (predicting outside the range) is risky because the linear pattern may not continue.
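To see how these quantities are calculated, here is a minimal Python sketch that finds the least-squares line and r from their definitions. The five data points are made up for illustration (they are not from the exercises), and the helper name `predict` is an assumption of this sketch.

```python
import math

# Hypothetical illustrative data, not taken from the exercises.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n   # mean of x
y_bar = sum(ys) / n   # mean of y

# Sums of products of deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

gradient = s_xy / s_xx
intercept = y_bar - gradient * x_bar   # forces the line through (x_bar, y_bar)
r = s_xy / math.sqrt(s_xx * s_yy)

def predict(x):
    """Return (predicted y, True if x lies inside the observed x range)."""
    inside = min(xs) <= x <= max(xs)
    return gradient * x + intercept, inside

print(round(gradient, 3), round(intercept, 3))  # 0.6 2.2
print(round(r, 3))                              # 0.775
print(predict(3))   # interpolation: second value is True
print(predict(9))   # extrapolation: second value is False, treat with caution
```

Because the intercept is defined as y̅ − gradient × x̅, substituting x = x̅ into the line always returns y̅, which is why the line passes through the mean point.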
Probability (T3T2)
For mutually exclusive events: P(A or B) = P(A) + P(B). For any two events: P(A or B) = P(A) + P(B) − P(A and B). Venn diagrams and two-way tables are powerful tools for organising probability information.
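Both rules can be checked by counting outcomes. The sketch below assumes a single roll of a fair six-sided die as the sample space; the events A, B and C are chosen for this illustration only.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
sample_space = set(range(1, 7))
A = {n for n in sample_space if n % 2 == 0}   # even number: {2, 4, 6}
B = {n for n in sample_space if n > 4}        # greater than 4: {5, 6}
C = {1}                                       # rolling a 1 (cannot overlap with A)

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# General addition rule: subtract the overlap so it is not counted twice.
general_lhs = prob(A | B)
general_rhs = prob(A) + prob(B) - prob(A & B)
print(general_lhs, general_rhs)  # 2/3 2/3

# Mutually exclusive events: the overlap is empty, so the rule reduces to a sum.
exclusive_lhs = prob(A | C)
exclusive_rhs = prob(A) + prob(C)
print(exclusive_lhs, exclusive_rhs)  # 2/3 2/3
```

Here `|` is set union ("or") and `&` is set intersection ("and"), which mirrors the regions of a Venn diagram.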
Indices and Logarithms (T3T3)
To solve exponential equations:
- Isolate the exponential term.
- If the base matches: equate exponents directly.
- Otherwise: take the log of both sides and use the power law log(a^x) = x log(a).
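The steps above can be sketched in Python. The function name `solve_exponential` and the two sample equations (2^x = 32 and 3^x = 20) are assumptions made for this illustration; the function applies the power law to rearrange a^x = c into x = log(c) / log(a).

```python
import math

def solve_exponential(a, c):
    """Solve a**x == c for x, assuming a > 0, a != 1 and c > 0."""
    # Taking log base 10 of both sides: x * log(a) = log(c).
    return math.log10(c) / math.log10(a)

# Matching bases: 2**x = 32 = 2**5, so x should be 5 (up to rounding error).
x_same_base = solve_exponential(2, 32)
print(round(x_same_base, 2))  # 5.0

# No matching base: 3**x = 20 needs logs.
x_general = solve_exponential(3, 20)
print(round(x_general, 2))    # 2.73

# Check by substituting back into the original equation.
print(round(3 ** x_general, 6))  # 20.0
```

Any log base gives the same answer, since the base cancels in the ratio log(c) / log(a).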
Networks (T3T4)
A graph consists of vertices (nodes) and edges. Key applications:
- Minimum spanning tree: connects all vertices with minimum total edge weight (Prim's or Kruskal's algorithm).
- Shortest path: minimum weight path between two vertices (Dijkstra's algorithm).
- Euler circuit: exists in a connected graph when every vertex has even degree. Euler path (that is not a circuit): exists in a connected graph when exactly 2 vertices have odd degree.
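The three ideas above can be sketched in Python on a small made-up graph. The graph (vertices P, Q, R, S) and the function names are assumptions for illustration, and the graph is not one of the exercise graphs. This is a minimal version of Kruskal's and Dijkstra's algorithms, not an optimised implementation.

```python
import heapq
from collections import Counter

# Hypothetical weighted graph: (vertex, vertex, weight).
edges = [("P", "Q", 4), ("P", "R", 1), ("Q", "R", 2), ("Q", "S", 5), ("R", "S", 8)]
vertices = {v for u, w, _ in edges for v in (u, w)}

def kruskal(vertices, edges):
    """Return the edges of a minimum spanning tree (graph assumed connected)."""
    parent = {v: v for v in vertices}   # each vertex starts in its own component

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    tree = []
    for u, w, weight in sorted(edges, key=lambda e: e[2]):   # lowest weight first
        root_u, root_w = find(u), find(w)
        if root_u != root_w:            # different components, so no cycle is created
            parent[root_u] = root_w
            tree.append((u, w, weight))
    return tree

def dijkstra(edges, start, goal):
    """Return (distance, path) for a shortest path; weights assumed non-negative."""
    neighbours = {}
    for u, w, weight in edges:
        neighbours.setdefault(u, []).append((w, weight))
        neighbours.setdefault(w, []).append((u, weight))
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        dist, v, path = heapq.heappop(queue)   # closest unvisited vertex so far
        if v == goal:
            return dist, path
        if v in visited:
            continue
        visited.add(v)
        for nxt, weight in neighbours.get(v, []):
            if nxt not in visited:
                heapq.heappush(queue, (dist + weight, nxt, path + [nxt]))
    return None   # goal not reachable from start

def odd_degree_vertices(edges):
    """Return the sorted list of vertices with odd degree."""
    degree = Counter()
    for u, w, _ in edges:
        degree[u] += 1
        degree[w] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)

mst = kruskal(vertices, edges)
print(mst, sum(weight for _, _, weight in mst))
# [('P', 'R', 1), ('Q', 'R', 2), ('Q', 'S', 5)] 8
print(dijkstra(edges, "P", "S"))    # (8, ['P', 'R', 'Q', 'S'])
print(odd_degree_vertices(edges))   # ['Q', 'R'] so an Euler path exists, but no circuit
```

Note that a spanning tree on 4 vertices has 3 edges, and that the shortest path here happens to use the same edges as the tree; in general the two need not coincide.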
Worked Example 2 — Conditional Probability
In a class of 30, 18 play sport (S) and 12 play music (M). 6 play both. A student is chosen at random. Given they play sport, what is the probability they also play music?
Solution
P(M|S) = P(M ∩ S) / P(S) = (6/30) / (18/30) = 6/18 = 1/3.
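As a quick check, the calculation can be reproduced with exact fractions in Python. The variable names below are chosen for this sketch; the counts come from the question.

```python
from fractions import Fraction

total = 30   # students in the class
sport = 18   # students who play sport
both = 6     # students who play both sport and music

p_sport = Fraction(sport, total)   # P(S)
p_both = Fraction(both, total)     # P(M ∩ S)

# Conditional probability: restrict attention to the students who play sport.
p_music_given_sport = p_both / p_sport
print(p_music_given_sport)  # 1/3

# Shortcut: the class total cancels, leaving both / sport.
print(Fraction(both, sport))  # 1/3
```

The shortcut shows why conditioning on S amounts to treating the 18 sport players as the new sample space.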
Scatter plots and correlation (Fluency)
A dataset gives hours of study (x) and test score (y): (2,55), (4,63), (5,70), (7,78), (8,85), (10,90).
- (a) Calculate x̅ and y̅.
- (b) Describe the correlation (strength and direction).
- (c) A student studies for 6 hours. Use the approximate line of best fit through (2,55) and (10,90) to predict their score.
- (d) Would it be valid to predict the score for 20 hours of study? Explain.
Probability basics (Fluency)
A bag contains 5 red, 3 blue and 2 green marbles. One marble is drawn at random.
- (a) Find P(red).
- (b) Find P(not green).
- (c) Two marbles are drawn without replacement. Find P(both red).
- (d) Two marbles are drawn with replacement. Find P(one red, one blue) in either order.
Index laws and exponential equations (Fluency)
- (a) Simplify: (3x²y)³ ÷ (9xy²).
- (b) Solve: 2^x = 64.
- (c) Solve: 3^(2x−1) = 27.
- (d) Solve: 5^x = 100 (use log base 10; round to 2 decimal places).
Networks (Fluency)
A graph has vertices A, B, C, D, E with edges: AB=5, AC=3, BC=4, BD=6, CD=2, DE=7, CE=8.
- (a) List the degree of each vertex.
- (b) Is an Euler path possible? Justify.
- (c) Find the minimum spanning tree using Kruskal's algorithm (add lowest-weight edges without creating cycles).
- (d) Find the shortest path from A to E.
Line of best fit and prediction (Understanding)
The equation of a least-squares line is y = 4.5x + 12, where x = temperature (°C) and y = ice cream sales per day.
- (a) Interpret the gradient and y-intercept in context.
- (b) Predict ice cream sales when temperature is 28°C.
- (c) The actual sales at 28°C were 142. Calculate the residual.
- (d) The data was collected for temperatures between 15°C and 35°C. Comment on predicting sales at 5°C.
Conditional probability and Venn diagrams (Understanding)
In a survey of 200 people: 120 own a car (C), 80 own a bicycle (B), 40 own both.
- (a) Draw a Venn diagram and label all regions with their counts.
- (b) Find P(C only).
- (c) Find P(C | B) — the probability of owning a car given they own a bicycle.
- (d) Find P(neither C nor B).
Growth and decay (Understanding)
- (a) A car worth $24 000 depreciates by 15% per year. Write a formula for its value V after t years.
- (b) Find its value after 3 years (to the nearest dollar).
- (c) After how many complete years is it worth less than $10 000?
- (d) A savings account pays 4.5% per annum compound interest. How long to double the initial investment? (Use logarithms.)
Networks: minimum spanning tree and paths (Understanding)
Six towns are connected by roads with distances (km): A–B=12, A–C=18, B–C=10, B–D=14, C–D=8, C–E=20, D–E=6, D–F=16, E–F=10.
- (a) How many edges would a spanning tree of this graph have?
- (b) Use Kruskal's algorithm to find the minimum spanning tree.
- (c) Find the total length of the minimum spanning tree.
- (d) Find the shortest path from A to F.
Mixed: statistics and probability (Problem Solving)
A school surveyed 150 students on the sports they enjoy: 60 chose football (F), 55 chose basketball (B), 55 chose neither. Of those who chose football, 20 also chose basketball.
- (a) Construct a two-way table or Venn diagram.
- (b) Find P(F ∩ B).
- (c) Find P(F | B) — probability of preferring football given they prefer basketball.
- (d) A student is chosen at random. Find P(only one sport chosen).
Extended: mixed Year 10 problem (Problem Solving)
A researcher models the number of website visitors per day, t days after launch, with V(t) = 200 × 1.08^t.
- (a) How many visitors on the day of launch?
- (b) How many visitors after 10 days? (Round to nearest whole number.)
- (c) After how many complete days does the daily visitor count first exceed 1 000? (Use logarithms.)
- (d) The server can handle 5 000 visitors per day. After how many complete days must they upgrade? (Use logarithms, round up.)