Distribution of Sample Proportions

Key Terms

When n is large, p̂ is approximately normally distributed by the CLT

p̂ ~ N(p, p(1−p)/n) approximately, when np ≥ 5 and n(1−p) ≥ 5

The standard error of p̂ is SE(p̂) = √[p(1−p)/n]

P(p̂ < x) can be found using normCdf(−10⁹⁹, x, p, SE) on CAS, where −10⁹⁹ stands in for −∞ as the lower bound

As n increases, the sampling distribution becomes more symmetric and bell-shaped, even when p is far from 0.5 and small samples give a skewed distribution

Sampling distribution of p̂:
E(p̂) = p
Var(p̂) = p(1−p)/n
SE(p̂) = √[p(1−p)/n]

Normal approximation (when np ≥ 5 and n(1−p) ≥ 5):
p̂ ~ N(p, p(1−p)/n)

Z-score: Z = (p̂ − p) / √[p(1−p)/n]
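
The formulas above can be sketched in Python as a quick check on hand calculations. This is only an illustration: the function names proportion_summary and z_score, and the values p = 0.2, n = 64, p̂ = 0.25, are assumptions chosen here to give round numbers, not part of the notes.

```python
import math

def proportion_summary(p, n):
    """Return (mean, variance, standard error) of the sample proportion."""
    mean = p                        # E(p-hat) = p
    variance = p * (1 - p) / n      # Var(p-hat) = p(1 - p)/n
    se = math.sqrt(variance)        # SE(p-hat) = sqrt(p(1 - p)/n)
    return mean, variance, se

def z_score(p_hat, p, n):
    """Standardise an observed sample proportion p_hat."""
    return (p_hat - p) / math.sqrt(p * (1 - p) / n)

# Illustrative values only (assumed for this sketch)
mean, variance, se = proportion_summary(0.2, 64)
z = z_score(0.25, 0.2, 64)
print(mean, round(variance, 4), round(se, 4), round(z, 4))
```

With these values the variance is 0.16/64 = 0.0025, so SE = 0.05 and the sample proportion 0.25 sits one standard error above p.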

Worked Example: 40% of voters support a candidate. A random sample of n = 100 voters is taken. Find P(p̂ < 0.35).

p = 0.4, n = 100
Check: np = 40 ≥ 5 ✓, n(1−p) = 60 ≥ 5 ✓
SE = √(0.4 × 0.6/100) = √0.0024 ≈ 0.04899
p̂ ~ N(0.4, 0.0024)
P(p̂ < 0.35) = normCdf(−10⁹⁹, 0.35, 0.4, 0.04899) ≈ 0.1537
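
If no CAS is at hand, the same calculation can be checked in Python. The sketch below assumes Python 3.8 or later, where the standard library provides statistics.NormalDist; its cdf method plays the role of normCdf with a lower bound of −∞. The variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.4, 100

# Standard error of p-hat, used as the sigma of the normal approximation
se = sqrt(p * (1 - p) / n)

# Approximate sampling distribution of p-hat: N(p, SE^2)
sampling_dist = NormalDist(mu=p, sigma=se)

# P(p-hat < 0.35) is the area to the left of 0.35
prob = sampling_dist.cdf(0.35)
print(round(se, 5), round(prob, 3))
```

This should print a standard error of about 0.04899 and a probability of roughly 0.154, in line with the worked example.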

Hot Tip: When using CAS for sampling distribution problems, enter the standard error (SE = √[p(1−p)/n]) as the σ parameter in normCdf, not the variance. Always check both conditions, np ≥ 5 and n(1−p) ≥ 5, before applying the normal approximation.

The Sampling Distribution of p̂

When we take random samples from a population and calculate the sample proportion p̂ for each sample, these values form a sampling distribution. Understanding the shape, centre and spread of this distribution is essential for statistical inference.

Key facts:

The sampling distribution of p̂ is centred at the true proportion p (since E(p̂) = p)
Its spread depends on both p and n: Var(p̂) = p(1−p)/n
For small samples, the distribution can be skewed; for large samples it becomes approximately normal
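
A small simulation can make these facts concrete. The sketch below is an assumption-laden illustration, not part of the notes: it uses p = 0.4, n = 100 and 5000 repeated samples (all chosen here), draws each sample with Python's random module, and compares the simulated centre and spread of p̂ with E(p̂) = p and Var(p̂) = p(1−p)/n.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def sample_proportion(p, n):
    """Draw one random sample of size n and return its proportion of successes."""
    successes = sum(1 for _ in range(n) if random.random() < p)
    return successes / n

# Illustrative settings (assumed for this sketch)
p, n, trials = 0.4, 100, 5000
p_hats = [sample_proportion(p, n) for _ in range(trials)]

sim_mean = statistics.mean(p_hats)
sim_var = statistics.pvariance(p_hats)

print("mean of p-hat values:    ", round(sim_mean, 4))
print("variance of p-hat values:", round(sim_var, 5))
print("theoretical variance:    ", round(p * (1 - p) / n, 5))
```

The simulated mean should land close to 0.4 and the simulated variance close to 0.0024, though the exact figures depend on the seed and the number of trials.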

The Central Limit Theorem for Proportions

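The Central Limit Theorem (CLT) states that for large enough samples, the distribution of a sample mean is approximately normal, regardless of the shape of the population distribution. A sample proportion p̂ is the mean of n outcomes coded 1 (success) or 0 (failure), so the CLT applies to p̂ as well.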

Specifically: if X ~ Bin(n, p), then p̂ = X/n is approximately N(p, p(1−p)/n) when:

np ≥ 5 (expect at least 5 successes)
n(1−p) ≥ 5 (expect at least 5 failures)

These conditions ensure the binomial distribution is reasonably symmetric and bell-shaped.
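
The two conditions are easy to check mechanically. The helper below is a hypothetical sketch (the name check_conditions and the test values are assumptions made here); it returns the expected numbers of successes and failures along with whether both reach the threshold of 5 used in these notes.

```python
def check_conditions(p, n, threshold=5):
    """Return (expected successes, expected failures, both conditions met?)."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    ok = expected_successes >= threshold and expected_failures >= threshold
    return expected_successes, expected_failures, ok

# Illustrative cases (assumed values)
large = check_conditions(0.4, 100)   # 40 successes and 60 failures expected
small = check_conditions(0.1, 20)    # only 2 successes expected

print(large)
print(small)
```

In the second case np = 2 is below 5, so the normal approximation should not be used and the binomial distribution itself is the safer tool.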

Calculating Probabilities for p̂

Once we know p̂ is approximately normal, we can find probabilities using CAS:

P(p̂ < x) = normCdf(−10⁹⁹, x, p, SE) where SE = √[p(1−p)/n]
P(p̂ > x) = normCdf(x, 10⁹⁹, p, SE)
P(a < p̂ < b) = normCdf(a, b, p, SE)
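
The three CAS patterns above have direct Python counterparts. This is a minimal sketch assuming Python 3.8 or later (for statistics.NormalDist); the function names and the example values p = 0.4, n = 100 are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def p_hat_dist(p, n):
    """Normal approximation to the sampling distribution of p-hat."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

def prob_less(x, p, n):
    """P(p-hat < x): area to the left of x."""
    return p_hat_dist(p, n).cdf(x)

def prob_greater(x, p, n):
    """P(p-hat > x): area to the right of x."""
    return 1 - p_hat_dist(p, n).cdf(x)

def prob_between(a, b, p, n):
    """P(a < p-hat < b): area between a and b."""
    dist = p_hat_dist(p, n)
    return dist.cdf(b) - dist.cdf(a)

# Illustrative values (assumed): p = 0.4, n = 100
left = prob_less(0.35, 0.4, 100)
right = prob_greater(0.45, 0.4, 100)
middle = prob_between(0.35, 0.45, 0.4, 100)
print(round(left, 3), round(middle, 3), round(right, 3))
```

Because 0.35 and 0.45 are equally far from p = 0.4, the left and right tail areas match, and the three areas add up to 1.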

Effect of Sample Size on the Shape

As n increases:

The distribution of p̂ becomes more bell-shaped (closer to normal)
The spread decreases (SE = √[p(1−p)/n] gets smaller)
Most sample proportions cluster tightly around the true p

The key insight: larger samples don’t change the centre (E(p̂) = p always), but they reduce the variability of p̂, making our estimate more reliable. Because SE is proportional to 1/√n, quadrupling the sample size halves the standard error.
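
The effect of n on the spread can be tabulated with a few lines of Python. The values p = 0.4 and n = 100, 400, 1600 are assumptions picked for this sketch so that each sample is four times the size of the one before.

```python
from math import sqrt

def standard_error(p, n):
    """SE(p-hat) = sqrt(p(1 - p)/n)."""
    return sqrt(p * (1 - p) / n)

p = 0.4                     # illustrative value (assumed)
sizes = [100, 400, 1600]    # each sample is 4 times larger than the last
errors = [standard_error(p, n) for n in sizes]

for n, se in zip(sizes, errors):
    print(f"n = {n:5d}   SE = {se:.5f}")
```

Each fourfold increase in n cuts the standard error in half, since SE depends on 1/√n while the centre stays at p throughout.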

Mastery Practice

State the two conditions required for the normal approximation of p̂ to be valid. Are they satisfied when n = 60 and p = 0.1?
For p = 0.5 and n = 200, find: (a) E(p̂) (b) Var(p̂) (c) SE(p̂) (d) the approximate distribution of p̂
A population has p = 0.3. Random samples of size n = 100 are taken. Find P(p̂ < 0.28).
It is known that 55% of households own a pet. In a random sample of 150 households, find P(p̂ > 0.60).
A coin is suspected to be biased with p = 0.55 (probability of heads). The coin is tossed n = 80 times. Find P(0.5 < p̂ < 0.65). Verify conditions first.
Compare the standard error of p̂ when p = 0.5 for:
- (a) n = 50
- (b) n = 200
- (c) n = 800
What pattern do you notice? What does this mean for sampling?
A quality control analyst knows that 8% of items from a production line are defective. A sample of 200 items is taken. Find the probability that the sample proportion of defective items exceeds 10%.
The proportion of left-handed people in a population is 0.12. Explain what happens to the shape of the sampling distribution of p̂ as n increases from 20 to 500. At what sample size do the conditions for normal approximation first become satisfied?
A market researcher claims that 70% of consumers prefer Brand A over Brand B. A competitor conducts a survey of 250 consumers to test this claim.
- (a) Under the assumption that p = 0.7, find the mean and standard error of p̂.
- (b) The survey finds p̂ = 0.65. Find P(p̂ ≤ 0.65) assuming p = 0.7.
- (c) Do you think the competitor’s result is strong evidence against the 70% claim? Justify your answer using part (b).
In a large city, 25% of residents commute by public transport. Random samples of size 100 and size 400 are taken from the city.
- (a) Find P(p̂ > 0.30) for the sample of size 100.
- (b) Find P(p̂ > 0.30) for the sample of size 400.
- (c) Explain the large difference between these probabilities in terms of the CLT and the benefit of larger samples.