Sampling Distributions and the Central Limit Theorem
Key Terms
- The sample mean X̄ is a random variable. For a population with mean μ and standard deviation σ, a random sample of size n gives E(X̄) = μ and SD(X̄) = σ/√n (the standard error).
- If the population is normal, then X̄ ∼ N(μ, σ²/n) exactly, for any n.
- Central Limit Theorem (CLT): for any population with mean μ and finite standard deviation σ, the distribution of X̄ is approximately normal for sufficiently large n, that is, X̄ ≈ N(μ, σ²/n) when n is large (rule of thumb: n ≥ 30).
- To find probabilities, standardise: Z = (X̄ − μ) / (σ/√n) ∼ N(0, 1).
- As n increases, the spread of X̄ decreases: larger samples give more precise estimates of μ.
- E(X̄) = μ
- Var(X̄) = σ²/n
- SD(X̄) = σ/√n (standard error)
- X̄ ∼ N(μ, σ²/n) exactly if population is normal; approximately if n ≥ 30 (CLT)
- Z = (X̄ − μ) / (σ/√n) ∼ N(0,1)
Worked Example 1 — Distribution of the Sample Mean
A population has μ = 60 and σ = 12. Random samples of size n = 36 are taken.
(a) State the distribution of X̄.
Standard error = σ/√n = 12/√36 = 12/6 = 2.
By the CLT (n = 36 ≥ 30): X̄ ≈ N(60, 4) [variance = 2² = 4].
(b) Find P(X̄ > 63).
Z = (63 − 60)/2 = 1.5
P(X̄ > 63) = P(Z > 1.5) = 1 − Φ(1.5) = 1 − 0.9332 = 0.0668
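The table lookup in Worked Example 1 can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `normal_cdf` and `upper_tail_sample_mean` are illustrative choices, not part of the original notes, and the calculation assumes the model X̄ ≈ N(μ, σ²/n).

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def upper_tail_sample_mean(x: float, mu: float, sigma: float, n: int) -> float:
    """P(X-bar > x), assuming X-bar ~ N(mu, sigma^2 / n)."""
    standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n)
    z = (x - mu) / standard_error           # standardise the sample mean
    return 1.0 - normal_cdf(z)

# Values from Worked Example 1: mu = 60, sigma = 12, n = 36.
p = upper_tail_sample_mean(63, mu=60, sigma=12, n=36)
print(round(p, 4))  # 0.0668
```

The result agrees with the Z-table answer to four decimal places.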
Worked Example 2 — Using the CLT for a Non-Normal Population
Task completion times have μ = 25 min and σ = 8 min (skewed distribution). Random samples of size n = 64 are taken.
(a) Why can we use the normal distribution for X̄?
n = 64 ≥ 30, so by the CLT the sample mean is approximately normal even though the population is skewed: X̄ ≈ N(25, σ²/n) = N(25, 64/64) = N(25, 1). Standard error = 8/√64 = 1.
(b) Find P(23 ≤ X̄ ≤ 26).
Z1 = (23 − 25)/1 = −2, Z2 = (26 − 25)/1 = 1
P(−2 ≤ Z ≤ 1) = Φ(1) − Φ(−2) = 0.8413 − 0.0228 = 0.8185
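As a cross-check on Worked Example 2, the interval probability can be computed directly from the approximate sampling distribution. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `prob_sample_mean_between` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_between(lo, hi, mu, sigma, n):
    """P(lo <= X-bar <= hi), assuming X-bar ~ N(mu, sigma^2 / n)."""
    # NormalDist takes the standard deviation, so pass the standard error.
    sampling_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))
    return sampling_dist.cdf(hi) - sampling_dist.cdf(lo)

# Values from Worked Example 2: mu = 25, sigma = 8, n = 64.
p = prob_sample_mean_between(23, 26, mu=25, sigma=8, n=64)
print(round(p, 4))
```

The computed value is about 0.8186. It differs from the table-based 0.8185 in the last digit only because the table entries 0.8413 and 0.0228 are themselves rounded.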
Why Does Sampling Produce a Distribution?
When we draw a random sample from a population and compute the sample mean X̄, we get a single number. But if we were to repeat this process — drawing many samples of the same size n and computing X̄ each time — we would get many different values. These values form a distribution: the sampling distribution of X̄. Understanding this distribution is the foundation of all statistical inference.
The key insight is that X̄ is not a fixed number — it is a random variable. Before we draw our sample, X̄ could take many possible values. The sampling distribution describes exactly which values are likely and which are not.
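The idea of repeating the sampling process can be imitated on a computer. The sketch below is a simulation under assumed conditions, not part of the original notes: the population is taken to be uniform on [0, 10] (so μ = 5 and σ = 10/√12 ≈ 2.89), and the names `simulate_sample_means` and `draw_uniform` are hypothetical.

```python
import random
import statistics

def simulate_sample_means(draw, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(draw(rng) for _ in range(n))
        for _ in range(repetitions)
    ]

def draw_uniform(rng):
    """One observation from the assumed population, uniform on [0, 10]."""
    return rng.uniform(0, 10)

means = simulate_sample_means(draw_uniform, n=25, repetitions=2000)
print("first five sample means:", [round(m, 2) for m in means[:5]])
print("mean of the sample means:", round(statistics.fmean(means), 3))
print("SD of the sample means:  ", round(statistics.stdev(means), 3))
```

Each run of the inner loop gives a different value of X̄. Collected together, the 2000 values approximate the sampling distribution, and their mean and spread should land close to μ = 5 and σ/√n ≈ 0.58.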
Properties of the Sampling Distribution
For any population with mean μ and finite standard deviation σ, and for random samples of n independent observations, the following hold exactly (not just approximately), whatever the shape of the population:
- E(X̄) = μ: The sample mean is an unbiased estimator of the population mean. On average, across all possible samples, X̄ equals μ.
- Var(X̄) = σ²/n: The variance of the sample mean decreases as n increases. Larger samples yield less variable estimates.
- SD(X̄) = σ/√n: This is called the standard error of the mean. It measures the typical distance of X̄ from μ.
If the underlying population is itself normal, N(μ, σ²), then X̄ is also normal for any sample size n: X̄ ∼ N(μ, σ²/n).
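That E(X̄) = μ and Var(X̄) = σ²/n hold exactly, even for a non-normal population, can be verified by brute force on a small case. The sketch below assumes a fair six-sided die as the population (an illustrative choice, not from the notes) and enumerates every possible sample using exact fractions; `exact_sample_mean_moments` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

def exact_sample_mean_moments(values, n):
    """Exact E(X-bar) and Var(X-bar) for n independent draws,
    each value in `values` being equally likely."""
    outcomes = list(product(values, repeat=n))   # every possible sample
    weight = Fraction(1, len(outcomes))          # all samples equally likely
    means = [Fraction(sum(sample), n) for sample in outcomes]
    expected = sum(m * weight for m in means)
    variance = sum((m - expected) ** 2 * weight for m in means)
    return expected, variance

die = [1, 2, 3, 4, 5, 6]
mu = Fraction(sum(die), len(die))                                # 7/2
sigma_sq = sum((Fraction(v) - mu) ** 2 for v in die) / len(die)  # 35/12

for n in (1, 2, 3):
    expected, variance = exact_sample_mean_moments(die, n)
    assert expected == mu              # E(X-bar) = mu
    assert variance == sigma_sq / n    # Var(X-bar) = sigma^2 / n
    print(n, expected, variance)
```

Because the arithmetic uses fractions rather than floating point, the equalities are checked exactly rather than to within rounding error.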
The Central Limit Theorem
The Central Limit Theorem is arguably the most important theorem in all of statistics. It states that regardless of the shape of the underlying population distribution (provided it has a finite mean and standard deviation), the sampling distribution of X̄, once standardised, approaches the standard normal distribution as n → ∞.
Formally: if X1, X2, …, Xn are independent and identically distributed (i.i.d.) with mean μ and finite standard deviation σ, then (X̄ − μ)/(σ/√n) converges in distribution to N(0, 1) as n → ∞. In practice this is used as the approximation:
X̄ ≈ N(μ, σ²/n) for large n.
The rule of thumb n ≥ 30 is adequate for many distributions met in practice. Highly skewed or heavy-tailed distributions may require larger n. If the population is already normal, no approximation is needed: X̄ is exactly normal for all n ≥ 1.
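One way to watch the CLT at work is to track how the skewness of the sample mean shrinks as n grows. The sketch below is a simulation under assumptions not found in the notes: the population is exponential with rate 1 (strongly right-skewed, with skewness 2), and the helper names `skewness` and `sample_mean_skewness` are hypothetical. For this population, theory gives a skewness of 2/√n for X̄, so the simulated values should fall towards 0, the skewness of a normal distribution.

```python
import random
import statistics

def skewness(data):
    """Sample skewness: the mean of the cubed standardised values."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean(((x - m) / s) ** 3 for x in data)

def sample_mean_skewness(n, repetitions=5000, seed=1):
    """Skewness of simulated sample means for samples of size n
    from an exponential population with rate 1 (an assumed example)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repetitions)
    ]
    return skewness(means)

for n in (1, 5, 30, 100):
    print(n, round(sample_mean_skewness(n), 2))
```

The exact figures depend on the seed, but the downward trend illustrates why a skewed population needs a reasonably large n before the normal approximation is trustworthy.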
Standardising the Sample Mean
Once we know X̄ is (approximately) normal, we standardise to use the Z-table. The standardisation formula for X̄ is:
Z = (X̄ − μ) / (σ/√n)
This Z has the standard normal distribution N(0, 1): exactly if the population is normal, and approximately for large n otherwise. Notice that σ/√n (the standard error) plays the role that σ plays when standardising individual observations. Do not confuse these two situations.
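The difference between standardising a single observation and standardising a sample mean is easy to see side by side. This is a minimal sketch with illustrative function names, reusing the numbers μ = 60, σ = 12, n = 36 from Worked Example 1.

```python
import math

def z_for_observation(x, mu, sigma):
    """Standardise one observation X: divide by sigma."""
    return (x - mu) / sigma

def z_for_sample_mean(xbar, mu, sigma, n):
    """Standardise a sample mean X-bar: divide by the standard error sigma / sqrt(n)."""
    return (xbar - mu) / (sigma / math.sqrt(n))

print(z_for_observation(63, mu=60, sigma=12))        # 0.25
print(z_for_sample_mean(63, mu=60, sigma=12, n=36))  # 1.5
```

The same value, 63, is only 0.25 standard deviations above μ as a single observation, but 1.5 standard errors above μ as the mean of 36 observations, so it is far less likely to be exceeded.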
Effect of Sample Size
As n increases, the standard error σ/√n decreases in proportion to 1/√n, so the sampling distribution becomes more tightly concentrated around μ. Doubling n reduces the standard error by a factor of √2 ≈ 1.41; quadrupling n halves it. This is why larger samples give more reliable estimates.
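The 1/√n scaling can be tabulated directly. The sketch below uses assumed values σ = 12 and a base sample size of 25 purely for illustration; `standard_error` is a hypothetical helper name.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 12.0   # assumed population SD for illustration
base_n = 25    # assumed starting sample size

for factor in (1, 2, 4, 100):
    n = base_n * factor
    se = standard_error(sigma, n)
    # How many times smaller the SE is than at the base sample size.
    reduction = standard_error(sigma, base_n) / se
    print(f"n = {n:5d}  SE = {se:.3f}  reduction factor = {reduction:.2f}")
```

Multiplying n by 100 shrinks the standard error only tenfold, which is why extra precision becomes progressively more expensive in terms of sample size.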
Mastery Practice
- State the distribution of X̄. (Fluency)
  A population has μ = 50 and σ = 8. Random samples of size n = 16 are taken. State the distribution of X̄, giving its mean and variance.
- Probability for sample mean. (Fluency)
  Heights are normally distributed with μ = 170 cm and σ = 10 cm. For random samples of size n = 25, find P(X̄ > 172).
- Probability below the mean. (Fluency)
  A machine fills cans with mean μ = 500 mL and standard deviation σ = 5 mL (normally distributed). Find P(X̄ < 498) for samples of size n = 25.
- Probability within a range. (Fluency)
  A population has μ = 100 and σ = 15. Find the probability that a sample mean of size n = 36 lies within 3 of 100, i.e. P(97 ≤ X̄ ≤ 103).
- Two-sided probability. (Understanding)
  The time to complete a task has μ = 20 min and σ = 4 min. For random samples of size n = 64, find P(19 ≤ X̄ ≤ 21).
- Applying the CLT to a non-normal distribution. (Understanding)
  A non-normal distribution has μ = 30 and σ = 6. For samples of size n = 50, use the CLT to find P(X̄ > 31).
- Find the unknown mean. (Understanding)
  If P(X̄ < 52) = 0.9332 for samples of size n = 16 from a normal distribution with σ² = 25, find μ.
- Effect of sample size. (Understanding)
  A population has μ = 40 and σ = 10. Compare the distributions of X̄ for samples of size n = 10, n = 30, and n = 100. For each, state the mean and standard error of X̄. Describe what happens to the variability as n increases.
- Skewed population and the CLT. (Problem Solving)
  A survey finds household incomes are right-skewed with mean $75 000 and standard deviation $30 000. Random samples of size n = 100 are taken.
  (a) Explain why X̄ is approximately normally distributed.
  (b) Find P(X̄ > $78 000).
- Required sample size. (Problem Solving)
  Quality control: packets must have mean weight 500 g. A machine has output with σ = 8 g. What is the minimum sample size n needed so that P(|X̄ − 500| > 2) < 0.05?