Solutions — Distribution of Sample Proportions

The two conditions for the normal approximation are:
1. np ≥ 5
2. n(1−p) ≥ 5

When n = 60 and p = 0.1:
np = 60 × 0.1 = 6 ≥ 5 ✓
n(1−p) = 60 × 0.9 = 54 ≥ 5 ✓
Both conditions are satisfied.
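
The condition check above can be sketched in Python. The function name normal_approx_ok and its threshold parameter are illustrative choices, not part of the original solution.

```python
def normal_approx_ok(n, p, threshold=5):
    """Return True when both np and n(1-p) are at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold


# n = 60, p = 0.1: np = 6 and n(1-p) = 54, so both conditions hold.
print(normal_approx_ok(60, 0.1))
```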
p = 0.5, n = 200
(a) E(p̂) = p = 0.5
(b) Var(p̂) = p(1−p)/n = 0.5 × 0.5/200 = 0.25/200 = 0.00125
(c) SE(p̂) = √0.00125 ≈ 0.03536
(d) p̂ ~ N(0.5, 0.00125) approximately
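
As a quick check of parts (a) to (c), here is a minimal sketch that computes the mean, variance and standard error of p̂ from p and n. The helper name sample_proportion_moments is hypothetical.

```python
import math


def sample_proportion_moments(p, n):
    """Return (mean, variance, standard error) of the sample proportion."""
    mean = p                    # E(p_hat) = p
    variance = p * (1 - p) / n  # Var(p_hat) = p(1-p)/n
    se = math.sqrt(variance)    # SE(p_hat) = sqrt(Var(p_hat))
    return mean, variance, se


mean, variance, se = sample_proportion_moments(0.5, 200)
print(mean, variance, round(se, 5))
```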
p = 0.3, n = 100
Conditions: np = 30 ≥ 5 ✓, n(1−p) = 70 ≥ 5 ✓
SE = √(0.3 × 0.7/100) = √0.0021 ≈ 0.04583
P(p̂ < 0.28) = normCdf(−10⁹⁹, 0.28, 0.3, 0.04583)
Z = (0.28 − 0.3)/0.04583 ≈ −0.436
P(p̂ < 0.28) ≈ 0.3313
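
The normCdf call can be reproduced without a calculator. This sketch assumes Python 3.8 or later, where the standard library provides statistics.NormalDist; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n = 0.3, 100
se = math.sqrt(p * (1 - p) / n)

# Model p_hat as approximately normal with mean p and standard deviation se.
p_hat = NormalDist(mu=p, sigma=se)

# Lower-tail probability P(p_hat < 0.28); no explicit lower bound is needed.
prob_below = p_hat.cdf(0.28)
print(round(se, 5), round(prob_below, 4))
```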
p = 0.55, n = 150
Conditions: np = 82.5 ≥ 5 ✓, n(1−p) = 67.5 ≥ 5 ✓
SE = √(0.55 × 0.45/150) = √0.00165 ≈ 0.04062
P(p̂ > 0.60) = normCdf(0.60, 10⁹⁹, 0.55, 0.04062)
Z = (0.60 − 0.55)/0.04062 ≈ 1.231
P(p̂ > 0.60) ≈ 0.1092
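
An upper-tail probability can also be found by standardising first and then using the standard normal distribution. This is a sketch of that route, assuming Python 3.8 or later; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n, cutoff = 0.55, 150, 0.60
se = math.sqrt(p * (1 - p) / n)

# Standardise the cutoff: Z = (cutoff - p) / SE.
z = (cutoff - p) / se

# Upper tail of the standard normal: P(Z > z) = 1 - Phi(z).
prob_above = 1 - NormalDist().cdf(z)
print(round(z, 3), round(prob_above, 4))
```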
p = 0.55, n = 80
Check: np = 44 ≥ 5 ✓, n(1−p) = 36 ≥ 5 ✓
SE = √(0.55 × 0.45/80) = √0.0030938 ≈ 0.05562
P(0.5 < p̂ < 0.65) = normCdf(0.5, 0.65, 0.55, 0.05562)
Z₁ = (0.5 − 0.55)/0.05562 ≈ −0.899
Z₂ = (0.65 − 0.55)/0.05562 ≈ 1.798
P(0.5 < p̂ < 0.65) = P(Z < 1.798) − P(Z < −0.899) ≈ 0.9639 − 0.1843 = 0.7796
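
A probability between two bounds is the difference of two cumulative probabilities. A minimal sketch, assuming Python 3.8 or later, with illustrative variable names:

```python
import math
from statistics import NormalDist

p, n = 0.55, 80
lower, upper = 0.5, 0.65
se = math.sqrt(p * (1 - p) / n)
p_hat = NormalDist(mu=p, sigma=se)

# P(lower < p_hat < upper) = F(upper) - F(lower), where F is the normal CDF.
prob_between = p_hat.cdf(upper) - p_hat.cdf(lower)
print(round(se, 5), round(prob_between, 4))
```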
SE = √(0.5 × 0.5/n) = √(0.25/n) = 0.5/√n
(a) n=50: SE = 0.5/√50 ≈ 0.0707
(b) n=200: SE = 0.5/√200 ≈ 0.0354
(c) n=800: SE = 0.5/√800 ≈ 0.0177
Pattern: When n is multiplied by 4, SE halves. SE is proportional to 1/√n. This means that to double the precision (halve the SE), you need four times as many observations — sampling is expensive!
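
The 1/√n pattern is easy to see by tabulating the standard error for the three sample sizes. The helper name se_at_half is hypothetical and is specific to p = 0.5.

```python
import math


def se_at_half(n):
    """Standard error of p_hat when p = 0.5, i.e. 0.5 / sqrt(n)."""
    return 0.5 / math.sqrt(n)


for n in (50, 200, 800):
    print(n, round(se_at_half(n), 4))

# Quadrupling n halves the standard error.
ratio = se_at_half(50) / se_at_half(200)
print(round(ratio, 6))
```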
p = 0.08, n = 200
Conditions: np = 16 ≥ 5 ✓, n(1−p) = 184 ≥ 5 ✓
SE = √(0.08 × 0.92/200) = √0.000368 ≈ 0.01918
P(p̂ > 0.10) = normCdf(0.10, 10⁹⁹, 0.08, 0.01918)
Z = (0.10 − 0.08)/0.01918 ≈ 1.043
P(p̂ > 0.10) ≈ 0.1486
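
As an optional cross-check that goes slightly beyond the solution, the normal approximation can be compared with the exact binomial tail. This sketch assumes the sample count follows a Bin(200, 0.08) distribution, so that p̂ > 0.10 corresponds to more than 20 successes; it uses math.comb, available from Python 3.8.

```python
import math
from statistics import NormalDist

n, p = 200, 0.08

# Normal approximation used in the solution: P(p_hat > 0.10).
se = math.sqrt(p * (1 - p) / n)
approx = 1 - NormalDist(p, se).cdf(0.10)

# Exact binomial tail: p_hat > 0.10 means X > 20, i.e. X >= 21 successes.
exact = sum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(21, n + 1)
)

print(round(approx, 4), round(exact, 4))
```

The two values need not agree closely, since the solution applies no continuity correction and np = 16 is only moderately large.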
p = 0.12
As n increases, the CLT implies that the sampling distribution of p̂ approaches a normal distribution.
For n = 20: np = 2.4 < 5 ✗, so the conditions are not satisfied and the distribution of p̂ is right-skewed.
For n = 500: np = 60 ≥ 5 ✓, n(1−p) = 440 ≥ 5 ✓, so the distribution of p̂ is approximately normal.

Minimum n for normal approximation:
Need np ≥ 5: n × 0.12 ≥ 5 ⇒ n ≥ 41.7 ⇒ n ≥ 42
Need n(1−p) ≥ 5: n × 0.88 ≥ 5 ⇒ n ≥ 5.7 (always satisfied when n≥42)
Conditions first satisfied at n = 42.
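
The minimum sample size can be computed directly by rounding each bound up and taking the larger one. This is a sketch with a hypothetical helper name; p = 0.12 is taken from the working above.

```python
import math


def min_n_for_normal_approx(p, threshold=5):
    """Smallest n with n*p >= threshold and n*(1-p) >= threshold.

    Note: if threshold/p is a whole number, floating-point rounding could
    shift the result by one, so check the boundary value by hand.
    """
    n_from_np = math.ceil(threshold / p)         # from np >= threshold
    n_from_nq = math.ceil(threshold / (1 - p))   # from n(1-p) >= threshold
    return max(n_from_np, n_from_nq)


n_min = min_n_for_normal_approx(0.12)
print(n_min)  # 42
```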
p = 0.7, n = 250
Conditions: np = 175 ≥ 5 ✓, n(1−p) = 75 ≥ 5 ✓
(a) E(p̂) = 0.7; SE = √(0.7 × 0.3/250) = √0.00084 ≈ 0.02898
(b) P(p̂ ≤ 0.65) = normCdf(−10⁹⁹, 0.65, 0.7, 0.02898)
Z = (0.65 − 0.7)/0.02898 ≈ −1.725
P(p̂ ≤ 0.65) ≈ 0.0423
(c) If p really is 0.7, there is only about a 4.2% chance of observing p̂ ≤ 0.65. Such a result would be unlikely to occur by chance alone, so it is fairly strong evidence against the 70% claim. A probability below 5% is conventionally treated as noteworthy, though a formal hypothesis test would be needed for a definitive conclusion.
p = 0.25
(a) n = 100: SE = √(0.25×0.75/100) = √0.001875 ≈ 0.04330
P(p̂ > 0.30) = normCdf(0.30, 10⁹⁹, 0.25, 0.04330)
Z = (0.30−0.25)/0.04330 ≈ 1.155
P ≈ 0.1241
(b) n = 400: SE = √(0.25×0.75/400) = √0.0004688 ≈ 0.02165
P(p̂ > 0.30) = normCdf(0.30, 10⁹⁹, 0.25, 0.02165)
Z = (0.30−0.25)/0.02165 ≈ 2.309
P ≈ 0.0105
(c) With n=100, there is a 12.4% chance of the sample proportion exceeding 0.30; with n=400, only a 1.05% chance. The larger sample has a much smaller SE (0.022 vs 0.043), so p̂ is far more tightly concentrated around p=0.25. The normal approximation is reasonable for both samples, since both conditions hold in each case (np = 25 and n(1−p) = 75 for n=100; np = 100 and n(1−p) = 300 for n=400). Larger samples make it less likely to observe large deviations from the true proportion purely by chance.
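
The comparison between the two sample sizes can be reproduced with one small function. A minimal sketch, assuming Python 3.8 or later; the name upper_tail is hypothetical.

```python
import math
from statistics import NormalDist


def upper_tail(p, n, cutoff):
    """Normal-approximation estimate of P(p_hat > cutoff)."""
    se = math.sqrt(p * (1 - p) / n)
    return 1 - NormalDist(mu=p, sigma=se).cdf(cutoff)


# Same p and cutoff, two sample sizes: the tail shrinks as n grows.
for n in (100, 400):
    print(n, round(upper_tail(0.25, n, 0.30), 4))
```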