Sampling Distributions and the Central Limit Theorem — Worked Solutions
-
Q1 — Expected Value and Standard Error
A population has mean μ = 50 and standard deviation σ = 10. Random samples of size n = 25 are drawn. Find E(X̄) and SD(X̄).
Expected value:
E(X̄) = μ = 50
The sample mean is an unbiased estimator of the population mean, regardless of sample size.
Standard deviation (standard error):
SD(X̄) = σ/√n = 10/√25 = 10/5 = 2
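So X̄ has mean 50 and standard deviation 2 (the standard error). The question does not state the population shape, and n = 25 is slightly below the usual n ≥ 30 rule of thumb for the CLT. Assuming the population is normal, X̄ ∼ N(50, 4). Here N(μ, σ²) lists the mean and the variance, so the 4 is SE² = 2².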
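These two results can be checked numerically. The following is a minimal Python sketch. The normal population used in the Monte Carlo part is an assumption made only for illustration, and `standard_error` is a hypothetical helper name.

```python
import math
import random
import statistics

def standard_error(sigma: float, n: int) -> float:
    """SD of the sample mean of n independent draws: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

mu, sigma, n = 50.0, 10.0, 25
print("E(Xbar) =", mu, " SD(Xbar) =", standard_error(sigma, n))

# Monte Carlo check, assuming (hypothetically) a N(50, 10^2) population.
rng = random.Random(1)
means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(20_000)]
# The mean of the simulated sample means should be near 50, and their SD near 2.
print(round(statistics.fmean(means), 2), round(statistics.stdev(means), 2))
```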
-
Q2 — Probability Above a Value
Using μ = 50, σ = 10, n = 25, find P(X̄ > 52).
From Q1: X̄ ∼ N(50, 4), so SD(X̄) = 2.
Standardise:
Z = (X̄ − μ) / (σ/√n) = (52 − 50) / 2 = 2/2 = 1.0
Find probability:
P(X̄ > 52) = P(Z > 1.0) = 1 − Φ(1.0) = 1 − 0.8413 = 0.1587
There is approximately a 15.87% chance that a sample mean of n = 25 observations exceeds 52, when the true population mean is 50.
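The table lookup can also be done in code. This sketch uses Python's `statistics.NormalDist` (Python 3.8+) in place of printed tables. Like the solution above, it assumes X̄ is normal with mean 50 and standard error 2.

```python
from statistics import NormalDist

# Sampling distribution of the mean from Q1: mean 50, standard error 2.
xbar = NormalDist(mu=50, sigma=2)
p_above = 1 - xbar.cdf(52)          # P(Xbar > 52)

z = (52 - 50) / 2
p_from_z = 1 - NormalDist().cdf(z)  # same probability via the standard normal
print(round(p_above, 4), round(p_from_z, 4))
```

Working directly with `NormalDist(50, 2)` skips the explicit standardisation step. The second line shows that the result matches the Z-score route.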
-
Q3 — Probability Below a Value (Normal Population)
Heights are normally distributed with μ = 175 cm and σ = 8 cm. For random samples of size n = 16, find P(X̄ < 173).
Since the population is normally distributed, X̄ is exactly normal for any n.
Standard error:
SE = σ/√n = 8/√16 = 8/4 = 2
So X̄ ∼ N(175, 4).
Standardise:
Z = (173 − 175)/2 = −2/2 = −1.0
Find probability:
P(X̄ < 173) = P(Z < −1.0) = Φ(−1.0) = 1 − Φ(1.0) = 1 − 0.8413 = 0.1587
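Here is a short sketch of the same calculation, again using `statistics.NormalDist`. It also checks the symmetry identity Φ(−z) = 1 − Φ(z) used in the last step.

```python
import math
from statistics import NormalDist

mu, sigma, n = 175.0, 8.0, 16
se = sigma / math.sqrt(n)                 # 8 / 4 = 2.0
p_below = NormalDist(mu, se).cdf(173)     # exact here, because heights are normal

phi = NormalDist().cdf
# Symmetry of the standard normal: Phi(-z) = 1 - Phi(z)
assert math.isclose(phi(-1.0), 1 - phi(1.0))
print(se, round(p_below, 4))
```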
-
Q4 — Effect of Quadrupling Sample Size
What happens to the standard error when n is quadrupled?
The standard error is SE = σ/√n.
If n is replaced by 4n:
New SE = σ/√(4n) = σ/(2√n) = (1/2) × (σ/√n)
The standard error is halved.
In general, to reduce the standard error by a factor of k, you must multiply n by k². So halving the SE takes 4 times the sample size, and cutting it to a tenth takes 100 times. The SE shrinks only in proportion to 1/√n, so each further gain in precision costs disproportionately more data.
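The k² rule can be tabulated quickly. This is a minimal sketch. The values σ = 10 and n = 25 are borrowed from Q1 only as an illustration, and the rule itself holds for any σ and n.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    return sigma / math.sqrt(n)

sigma, n = 10.0, 25  # illustrative values reused from Q1
base = standard_error(sigma, n)
for k in (2, 3, 4):
    # Multiplying n by k**2 divides the standard error by k.
    se_k = standard_error(sigma, k**2 * n)
    print(f"n x {k**2}: SE = {se_k:.3f}, ratio to original = {se_k / base:.3f}")
```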
-
Q5 — CLT for a Uniform Population
A population is uniform on [0, 10]. For random samples of size n = 50, describe the approximate distribution of X̄ and find P(X̄ > 5.5).
Population parameters for Uniform[0, 10]:
μ = (0 + 10)/2 = 5
σ² = (10 − 0)²/12 = 100/12 = 25/3
σ = √(25/3) = 5/√3 ≈ 2.887
Apply the CLT:
Since n = 50 ≥ 30, by the Central Limit Theorem:
X̄ ≈ N(5, (25/3)/50) = N(5, 1/6)
Standard error = 5/(√3 × √50) = 5/√150 ≈ 0.4082
Find P(X̄ > 5.5):
Z = (5.5 − 5)/0.4082 ≈ 1.225
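P(X̄ > 5.5) = P(Z > 1.225) = 1 − Φ(1.225) ≈ 1 − 0.8897 = 0.1103
(Rounding z to 1.22 for a two-decimal table gives 1 − 0.8888 = 0.1112.)
The population is uniform, not normal. Even so, the CLT implies that sample means of size n = 50 are approximately normal. Because the uniform distribution is symmetric with no heavy tails, the approximation is very good at this sample size.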
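A small simulation can compare the CLT answer with what actually happens for a uniform population. The following is a minimal sketch. The seed and the number of repetitions are arbitrary choices, so the simulated proportion is only an estimate.

```python
import math
import random
import statistics
from statistics import NormalDist

mu = 5.0
se = math.sqrt((100 / 12) / 50)          # sqrt(1/6), about 0.4082
clt_approx = 1 - NormalDist(mu, se).cdf(5.5)

# Monte Carlo: draw many samples of size 50 from Uniform[0, 10].
rng = random.Random(42)
reps = 10_000
means = [statistics.fmean(rng.uniform(0, 10) for _ in range(50)) for _ in range(reps)]
simulated = sum(m > 5.5 for m in means) / reps
print(round(clt_approx, 4), round(simulated, 4), round(statistics.stdev(means), 4))
```

The simulated proportion should land close to the normal-approximation value. The SD of the simulated means should also be close to 0.408.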
-
Q6 — Machine Filling: Probability Within a Range
A machine fills bags with mean 500 g and SD 15 g. For a sample of n = 36, find P(492 < X̄ < 508).
Standard error:
SE = 15/√36 = 15/6 = 2.5
Since n = 36 ≥ 30, by the CLT: X̄ ≈ N(500, 6.25).
Standardise both bounds:
Z1 = (492 − 500)/2.5 = −8/2.5 = −3.2
Z2 = (508 − 500)/2.5 = 8/2.5 = 3.2
Find probability:
P(−3.2 ≤ Z ≤ 3.2) = 2Φ(3.2) − 1 = 2(0.9993) − 1 = 0.9986
About 99.86% of samples of size 36 will have a mean between 492 g and 508 g. A sample mean outside this range would therefore be very unusual if the machine is filling at its stated mean of 500 g.
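The two-sided probability can be computed directly as the difference of two CDF values. This sketch uses `statistics.NormalDist` and, as in the solution, relies on the CLT approximation.

```python
from statistics import NormalDist

xbar = NormalDist(mu=500, sigma=15 / 36**0.5)   # CLT approximation, SE = 2.5
p_within = xbar.cdf(508) - xbar.cdf(492)        # P(492 < Xbar < 508)
print(round(p_within, 4))
```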
-
Q7 — Required Sample Size for a Precision Requirement
How large must n be so that P(|X̄ − μ| < 2) ≥ 0.95, when σ = 10?
We need P(−2 < X̄ − μ < 2) ≥ 0.95.
Standardising: P(−2/SE < Z < 2/SE) ≥ 0.95, where SE = 10/√n.
For a standard normal, P(−z* < Z < z*) = 0.95 requires z* = 1.96.
So we need:
2/SE ≥ 1.96
SE ≤ 2/1.96
10/√n ≤ 2/1.96 = 1.0204
√n ≥ 10/1.0204 = 9.8
n ≥ 9.8² = 96.04
Minimum sample size: n = 97.
Check: SE = 10/√97 ≈ 1.015. z = 2/1.015 = 1.970 > 1.96. ✓
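The steps above reduce to n ≥ (z*·σ/E)², rounded up, where E = 2 is the required precision. Here is a minimal sketch of that formula. `min_sample_size` is a hypothetical helper name. It takes z* from `NormalDist().inv_cdf` (≈ 1.95996 rather than the table value 1.96), and both values give n = 97.

```python
import math
from statistics import NormalDist

def min_sample_size(sigma: float, margin: float, coverage: float = 0.95) -> int:
    """Smallest n with P(|Xbar - mu| < margin) >= coverage, treating Xbar as normal."""
    z_star = NormalDist().inv_cdf(0.5 + coverage / 2)   # about 1.96 for 95%
    return math.ceil((z_star * sigma / margin) ** 2)

n = min_sample_size(sigma=10, margin=2)
se = 10 / math.sqrt(n)
achieved = 2 * NormalDist().cdf(2 / se) - 1   # coverage actually reached at this n
print(n, round(achieved, 4))
```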
-
Q8 — Why the CLT Matters
Explain why the Central Limit Theorem is useful even when the population distribution is unknown.
In practice, we almost never know the exact shape of the population distribution. The CLT is powerful because:
- Minimal distributional assumptions: The CLT holds for independent observations from any population with a finite mean and variance, whether it is skewed, bimodal, uniform, or any other shape.
- Enables probability calculations: Once we know X̄ is approximately normal, we can compute probabilities, construct confidence intervals, and perform hypothesis tests using standard normal tables — all without needing to know the population distribution.
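- Justifies inference procedures: Many classical inference procedures (z-tests, t-tests, confidence intervals for a mean) assume that X̄ is normally distributed. For large n, the CLT makes this approximately true even when the population is not normal.
- Large samples make it more reliable: The approximation improves as n grows. The rule of thumb n ≥ 30 works well for many populations, but strongly skewed or heavy-tailed populations can require considerably larger samples.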
In essence, the CLT bridges the gap between messy real-world data and the mathematically clean normal distribution, making statistical inference broadly applicable.
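One way to see these points is to simulate sample means from a clearly non-normal population and watch their skewness fall as n grows. This sketch uses an Exponential(rate 1) population, chosen here as an illustrative assumption. For the mean of n such draws the theoretical skewness is 2/√n, so the printed values should be roughly 2.0, 0.89 and 0.37, up to simulation noise.

```python
import random
import statistics

def skewness(xs):
    """Moment-based sample skewness: the average cubed z-score."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(0)
reps = 10_000
skews = {}
for n in (1, 5, 30):
    # Sample means of n draws from a right-skewed Exponential(rate 1) population.
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    skews[n] = skewness(means)
    print(n, round(skews[n], 2))
```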
-
Q9 — Borderline CLT Application
A non-normal population has μ = 30 and σ = 6. For n = 9, can the CLT be applied? Calculate P(X̄ > 32) and discuss.
Can the CLT be applied?
n = 9 < 30, so the usual rule of thumb for applying the CLT is not satisfied. Strictly, the CLT cannot be relied upon here for a non-normal population. If the population were normal, X̄ would be exactly normal for any n, here N(30, 36/9) = N(30, 4). However, normality is not stated.
Calculation assuming normality (for illustration):
SE = 6/√9 = 6/3 = 2
Z = (32 − 30)/2 = 1.0
P(X̄ > 32) = P(Z > 1.0) = 1 − 0.8413 = 0.1587
Discussion:
If the population is non-normal, the calculated probability of 0.1587 is an approximation that may be unreliable for such a small sample. The actual probability depends on the true population shape. For mildly non-normal populations (e.g., slightly skewed), n = 9 may give a reasonable approximation. For heavily skewed or heavy-tailed distributions, this calculation could be significantly in error. In practice, one might use bootstrapping or transformation methods for small samples from non-normal populations.
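To see how much the population shape can matter at n = 9, one can pick a concrete skewed population with μ = 30 and σ = 6 and compute the exact answer. The following is a sketch under a loudly hypothetical choice: X = 24 + 6E with E ∼ Exponential(rate 1), which the question does not specify. The sum of 9 independent Exp(1) variables is Erlang, and its tail probability has the closed form P(S > x) = Σ_{j<9} e^(−x) x^j / j!.

```python
import math
from statistics import NormalDist

# Hypothetical skewed population (an assumption, not given in the question):
# X = 24 + 6E with E ~ Exponential(rate 1), so mean = 30 and SD = 6.
n, mu, sigma = 9, 30.0, 6.0
shift = mu - sigma

def erlang_sf(k: int, x: float) -> float:
    """P(S > x) where S is the sum of k independent Exp(1) variables."""
    return sum(math.exp(-x) * x**j / math.factorial(j) for j in range(k))

def exact_tail(c: float) -> float:
    """Exact P(Xbar > c) for this population: Xbar > c  <=>  S > n * (c - shift) / sigma."""
    return erlang_sf(n, n * (c - shift) / sigma)

normal = NormalDist(mu, sigma / math.sqrt(n))   # the normal approximation used above
for c in (32, 36):
    print(c, round(exact_tail(c), 4), round(1 - normal.cdf(c), 4))
```

Under this assumed population, the normal answer for P(X̄ > 32) is fairly close: about 0.155 exact against 0.1587. Three standard errors out, at 36, the exact tail probability is roughly five times the normal value. This is the kind of error the discussion warns about for skewed populations.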
-
Q10 — Middle 90% Interval for Battery Life
Batteries have mean life 200 h and SD 20 h. Quality control samples 64 batteries. Find the interval within which the middle 90% of sample means fall.
Distribution of X̄:
SE = 20/√64 = 20/8 = 2.5
By CLT (n = 64 ≥ 30): X̄ ≈ N(200, 6.25).
Find the 90% middle interval:
The middle 90% leaves 5% in each tail. From standard normal tables:
P(Z < 1.645) ≈ 0.95, so z* = 1.645.
The interval is: μ ± z* × SE
200 ± 1.645 × 2.5
200 ± 4.1125
Middle 90% interval: (195.89 h, 204.11 h)
Interpretation: if quality control repeatedly samples 64 batteries, 90% of those sample means will fall between approximately 195.9 h and 204.1 h, assuming the true mean is 200 h.
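The interval endpoints are the 5th and 95th percentiles of the approximate sampling distribution. This sketch reads them off with `NormalDist.inv_cdf`, which uses z* ≈ 1.6449 instead of the table value 1.645. After rounding to two decimals, the two give the same endpoints.

```python
from statistics import NormalDist

xbar = NormalDist(mu=200, sigma=20 / 64**0.5)   # SE = 2.5, CLT approximation
lower, upper = xbar.inv_cdf(0.05), xbar.inv_cdf(0.95)
print(round(lower, 2), round(upper, 2))
```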