Topic Review — Sampling and Proportions — Solutions


This review covers all lessons in this topic: random sampling and bias, sample proportions as random variables, and the distribution of sample proportions.

Review Questions

Explain the difference between a population parameter and a sample statistic. Give one example of each.
A population parameter is a fixed numerical value that describes the entire population (often unknown). Example: the true proportion p of left-handed adults in Australia.
A sample statistic is a value calculated from sample data and used to estimate the population parameter. Example: the sample proportion p̂ of left-handed people in a survey of 200 adults.
A survey of 400 voters finds 220 support a proposed law. Calculate the sample proportion p̂ and explain what it estimates.
p̂ = 220/400 = 0.55
This estimates the true population proportion p of all voters who support the proposed law.
In a random sample of size n from a population with true proportion p, state the mean and standard deviation of the sample proportion p̂.
E(p̂) = p
SD(p̂) = √(p(1−p)/n)
The sample proportion p̂ is an unbiased estimator of p because its expected value equals p.
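To see E(p̂) = p and SD(p̂) = √(p(1−p)/n) empirically, here is a minimal simulation sketch in Python. It repeatedly draws random samples of size n = 100 from a population with p = 0.4 and compares the mean and standard deviation of the simulated p̂ values with the formulas. The function name simulate_p_hats, the seed and the number of trials are illustrative choices, not part of the course material.

```python
import math
import random
import statistics

def simulate_p_hats(p, n, trials, seed=0):
    """Draw `trials` random samples of size n from a population with true
    proportion p and return the sample proportion p-hat of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    p_hats = []
    for _ in range(trials):
        # each of the n individuals is a "success" with probability p
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

p, n = 0.4, 100
p_hats = simulate_p_hats(p, n, trials=10_000)

# The simulated mean should be close to p, and the simulated SD close to
# sqrt(p(1-p)/n), with some random variation from the simulation itself.
print(f"mean of p-hats: {statistics.fmean(p_hats):.4f}  (theory: {p})")
print(f"SD of p-hats:   {statistics.pstdev(p_hats):.4f}  (theory: {math.sqrt(p * (1 - p) / n):.4f})")
```

Because the simulation is random, the printed values will agree with the formulas only approximately; more trials give closer agreement.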
For p = 0.4 and n = 100, find: (a) E(p̂) (b) Var(p̂) (c) SD(p̂).
(a) E(p̂) = p = 0.4
(b) Var(p̂) = p(1−p)/n = (0.4 × 0.6)/100 = 0.0024
(c) SD(p̂) = √0.0024 ≈ 0.0490
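The three formulas can be packaged into one small helper as a check on the arithmetic. This is a minimal sketch; the name p_hat_moments is hypothetical.

```python
import math

def p_hat_moments(p, n):
    """Return (mean, variance, standard deviation) of the sample proportion
    for random samples of size n from a population with proportion p."""
    mean = p                  # E(p-hat) = p
    var = p * (1 - p) / n     # Var(p-hat) = p(1-p)/n
    return mean, var, math.sqrt(var)

mean, var, sd = p_hat_moments(0.4, 100)
print(mean, round(var, 6), round(sd, 4))  # prints: 0.4 0.0024 0.049
```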
State the two conditions that must be met for the distribution of p̂ to be approximately normal. Why are these conditions needed?
The conditions are:
1. np ≥ 5 (expected number of successes is at least 5)
2. n(1−p) ≥ 5 (expected number of failures is at least 5)
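These conditions ensure the sample size is large enough relative to p. The count X = np̂ has a binomial distribution, and when p is close to 0 or 1 that distribution is strongly skewed unless n is large. When both expected counts are at least 5, the binomial distribution of X (and hence the distribution of p̂) is well approximated by a normal distribution, in line with the Central Limit Theorem.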
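The two conditions translate directly into a yes/no check. In this sketch the function name normal_approx_ok is hypothetical, and the threshold is left as a parameter because some textbooks use 10 instead of 5.

```python
def normal_approx_ok(n, p, threshold=5):
    """Rule-of-thumb check: expected successes n*p and expected failures
    n*(1-p) must both be at least `threshold`."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold

print(normal_approx_ok(200, 0.5))   # True  (np = 100, n(1-p) = 100)
print(normal_approx_ok(50, 0.04))   # False (np = 2 is below 5)
```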
A coin is suspected of being biased. In 200 flips, 112 heads appear. Check the normality conditions (using p = 0.5) and state the approximate distribution of p̂.
np = 200 × 0.5 = 100 ≥ 5 ✓
n(1−p) = 200 × 0.5 = 100 ≥ 5 ✓
Both conditions are met, so p̂ ~ N(p, p(1−p)/n) = N(0.5, 0.5 × 0.5/200) = N(0.5, 0.00125), where the second parameter is the variance.
SD(p̂) = √0.00125 ≈ 0.0354
Using the distribution from the previous (coin) question, find P(p̂ > 0.56).
p̂ ~ N(0.5, 0.00125), SD = 0.0354
Z = (0.56 − 0.5) / √0.00125 ≈ 1.697
P(p̂ > 0.56) = P(Z > 1.697) ≈ 1 − 0.9552 ≈ 0.0448
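The same probability can be checked with Python's standard-library statistics.NormalDist (available from Python 3.8). Note that NormalDist takes the standard deviation as its second argument, whereas the N(·, ·) notation in these solutions gives the variance. This is a sketch for checking the table-based answer, not a required method.

```python
import math
from statistics import NormalDist

n, p = 200, 0.5
sd = math.sqrt(p * (1 - p) / n)          # SD(p-hat) ≈ 0.0354
p_hat_dist = NormalDist(mu=p, sigma=sd)  # approximate distribution of p-hat

prob = 1 - p_hat_dist.cdf(0.56)          # P(p-hat > 0.56)
print(round(prob, 4))                    # prints: 0.0448
```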
Explain what “sampling variability” means. Why does increasing n reduce sampling variability?
Sampling variability refers to the fact that different random samples from the same population will produce different values of p̂. The sample proportion varies from sample to sample.
Increasing n reduces sampling variability because SD(p̂) = √(p(1−p)/n) decreases as n increases. With larger samples, individual sample proportions cluster more tightly around the true p, making estimation more precise.
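A quick way to see the effect of n is to tabulate SD(p̂) for increasing sample sizes. This sketch assumes p = 0.5 purely for illustration; because SD(p̂) is proportional to 1/√n, each fourfold increase in n halves the standard deviation.

```python
import math

def sd_p_hat(p, n):
    """Standard deviation of the sample proportion for samples of size n."""
    return math.sqrt(p * (1 - p) / n)

p = 0.5  # illustrative value only
for n in (25, 100, 400, 1600):
    # each fourfold increase in n halves SD(p-hat)
    print(n, round(sd_p_hat(p, n), 4))
```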
At a large school, 35% of students walk to school. Random samples of size n = 200 are repeatedly taken.
- (a) Find the mean and standard deviation of the distribution of p̂.
- (b) Find the probability that a sample proportion falls between 0.30 and 0.40.
- (c) Find the value c such that P(p̂ > c) = 0.025.
(a) E(p̂) = 0.35, SD(p̂) = √(0.35 × 0.65/200) = √0.0011375 ≈ 0.03373
(b) Z₁ = (0.30 − 0.35)/0.03373 ≈ −1.483
Z₂ = (0.40 − 0.35)/0.03373 ≈ 1.483
P(0.30 < p̂ < 0.40) = P(−1.483 < Z < 1.483) ≈ 0.8618
(c) P(p̂ > c) = 0.025 ⇒ P(p̂ < c) = 0.975
Z = 1.96, so c = 0.35 + 1.96 × 0.03373 ≈ 0.4161
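Parts (b) and (c) can be checked with statistics.NormalDist: cdf gives the interval probability, and inv_cdf(0.975) gives the 97.5th percentile directly instead of going through Z = 1.96. This is a minimal sketch; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n = 0.35, 200
sd = math.sqrt(p * (1 - p) / n)    # ≈ 0.03373
dist = NormalDist(mu=p, sigma=sd)  # approximate distribution of p-hat

# (b) P(0.30 < p-hat < 0.40)
prob_between = dist.cdf(0.40) - dist.cdf(0.30)

# (c) c such that P(p-hat > c) = 0.025, i.e. the 97.5th percentile of p-hat
c = dist.inv_cdf(0.975)

print(round(prob_between, 4), round(c, 4))
```

inv_cdf uses the unrounded critical value (about 1.95996) rather than 1.96, but the answer agrees to four decimal places.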
A national survey finds that 68% of adults own a smartphone. A journalist takes a random sample of 150 adults and finds 92 own smartphones.
- (a) Find the observed sample proportion p̂.
- (b) Find P(p̂ ≤ 92/150) given p = 0.68.
- (c) Is the journalist’s result unusual? Justify using probability.
(a) p̂ = 92/150 ≈ 0.6133
(b) SD(p̂) = √(0.68 × 0.32/150) = √0.001451 ≈ 0.03809
Z = (0.6133 − 0.68) / 0.03809 ≈ −1.750
P(p̂ ≤ 0.6133) ≈ P(Z ≤ −1.750) ≈ 0.0401
(c) If p = 0.68, there is only about a 4% chance of getting a sample proportion this low or lower. This is somewhat unusual (probability below 5%), suggesting either that the journalist obtained an unusually low sample by chance or that the true proportion in the population the journalist sampled differs from 68%.
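The calculation in (a) to (c) can be reproduced as follows. In this sketch, the 5% cut-off for "unusual" is the one used in the solution, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n, x = 0.68, 150, 92
p_hat = x / n                            # observed sample proportion ≈ 0.6133
sd = math.sqrt(p * (1 - p) / n)          # SD(p-hat) under p = 0.68, ≈ 0.03809
prob_low = NormalDist(p, sd).cdf(p_hat)  # P(p-hat <= 92/150) if p = 0.68

is_unusual = prob_low < 0.05             # 5% cut-off used in part (c)
print(round(p_hat, 4), round(prob_low, 4), is_unusual)
```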
Explain in one sentence why p̂ is called an unbiased estimator of p.
p̂ is an unbiased estimator of p because its expected value equals the true population proportion: E(p̂) = p, meaning the sample proportion neither systematically overestimates nor underestimates p on average.
A factory produces items, 8% of which are defective. Quality control takes samples of n = 400.
- (a) Verify normality conditions for p̂.
- (b) Find P(p̂ > 0.10).
(a) np = 400 × 0.08 = 32 ≥ 5 ✓; n(1−p) = 400 × 0.92 = 368 ≥ 5 ✓
(b) SD(p̂) = √(0.08 × 0.92/400) = √0.000184 ≈ 0.01356
Z = (0.10 − 0.08)/0.01356 ≈ 1.475
P(p̂ > 0.10) = P(Z > 1.475) ≈ 1 − 0.9299 ≈ 0.0701
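Here is a sketch that checks the conditions and the probability in one place. Because software does not round Z to 1.475, the result may differ from the table-based answer in the fourth decimal place (about 0.0702 rather than 0.0701).

```python
import math
from statistics import NormalDist

p, n = 0.08, 400
# Rule-of-thumb normality conditions from part (a): np >= 5 and n(1-p) >= 5
assert n * p >= 5 and n * (1 - p) >= 5

sd = math.sqrt(p * (1 - p) / n)         # SD(p-hat) ≈ 0.01356
prob = 1 - NormalDist(p, sd).cdf(0.10)  # P(p-hat > 0.10)
print(round(prob, 4))
```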
Describe a potential source of sampling bias for each of these scenarios:
- (a) Estimating average weekly screen time by asking students in the school library.
- (b) Estimating the proportion of adults who exercise daily using an online health forum.
(a) Selection bias: students in a library are likely to study more and may have less screen time than average, making the sample unrepresentative of all students.
(b) Voluntary response bias / self-selection bias: people who join health forums are already interested in health and exercise, so the sample will overestimate the proportion who exercise daily.
A polling organisation finds that in a sample of n = 600 voters, p̂ = 0.52 support Candidate A.
- (a) Find P(p̂ ≥ 0.52 | p = 0.50), i.e. the probability of a result at least this extreme if the race is tied.
- (b) Does this provide strong evidence that Candidate A is ahead? Justify.
(a) Under p = 0.50: SD(p̂) = √(0.50 × 0.50/600) = √(1/2400) ≈ 0.02041
Z = (0.52 − 0.50)/0.02041 ≈ 0.9800
P(p̂ ≥ 0.52 | p = 0.50) = P(Z ≥ 0.98) ≈ 1 − 0.8365 ≈ 0.1635
(b) No, this does not provide strong evidence. There is about a 16% chance of getting p̂ ≥ 0.52 even if the race is exactly tied. This probability is too high to be considered statistically significant at conventional levels (α = 0.05). The result is plausible under the null hypothesis p = 0.50.
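The one-sided probability in (a) and the comparison with α = 0.05 in (b) can be sketched as follows. The variable names and the printed wording are illustrative.

```python
import math
from statistics import NormalDist

n, p0, p_hat = 600, 0.50, 0.52
sd = math.sqrt(p0 * (1 - p0) / n)         # SD(p-hat) if the race is tied, ≈ 0.02041
prob = 1 - NormalDist(p0, sd).cdf(p_hat)  # P(p-hat >= 0.52 | p = 0.50)

alpha = 0.05                              # conventional significance level
verdict = "significant" if prob < alpha else "not significant at 5%"
print(round(prob, 4), verdict)
```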
Suppose n = 150 and p = 0.48. Explain what would happen to the distribution of p̂ if (a) n were increased to 1500 while p remained 0.48, and (b) p changed to 0.20 while n stayed at 150.
(a) n increased to 1500: E(p̂) stays at 0.48, but SD(p̂) = √(0.48 × 0.52/1500) ≈ 0.0129, compared with √(0.48 × 0.52/150) ≈ 0.0408 originally. The distribution becomes narrower and taller: sample proportions cluster more tightly around 0.48, so estimates are more precise.
(b) p changed to 0.20: E(p̂) = 0.20 (the centre shifts left) and SD(p̂) = √(0.20 × 0.80/150) ≈ 0.0327. The distribution is now centred at 0.20 and slightly narrower than when p = 0.48, because p(1−p) = 0.16 is smaller than 0.48 × 0.52 = 0.2496 (p(1−p) is largest at p = 0.5).
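The three scenarios can be compared side by side. This sketch also scans a grid of p values to confirm that p(1−p), and therefore SD(p̂) for fixed n, is largest at p = 0.5; the helper name sd_p_hat and the grid are illustrative choices.

```python
import math

def sd_p_hat(p, n):
    """Standard deviation of the sample proportion for samples of size n."""
    return math.sqrt(p * (1 - p) / n)

scenarios = {
    "original (p = 0.48, n = 150)": (0.48, 150),
    "(a) p = 0.48, n = 1500": (0.48, 1500),
    "(b) p = 0.20, n = 150": (0.20, 150),
}
for label, (p, n) in scenarios.items():
    print(f"{label}: mean {p:.2f}, SD {sd_p_hat(p, n):.4f}")

# p(1-p) is largest at p = 0.5, so SD(p-hat) is largest there for fixed n
grid = [i / 100 for i in range(1, 100)]
print(max(grid, key=lambda q: q * (1 - q)))  # prints: 0.5
```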