Practice Maths

Solutions — Distribution of Sample Proportions

  1. The two conditions for the normal approximation are:
    1. np ≥ 5
    2. n(1−p) ≥ 5

    When n = 60 and p = 0.1:
    np = 60 × 0.1 = 6 ≥ 5 ✓
    n(1−p) = 60 × 0.9 = 54 ≥ 5 ✓
    Both conditions are satisfied.
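The two checks above are easy to script. The following is a minimal Python sketch; the function name `normal_approx_ok` and the `threshold` parameter are illustrative choices, not part of the original solutions.

```python
def normal_approx_ok(n: int, p: float, threshold: float = 5.0) -> bool:
    """Return True when both np >= threshold and n(1-p) >= threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold


# Question 1: n = 60, p = 0.1
print(normal_approx_ok(60, 0.1))
```

Keeping the threshold as a parameter makes it simple to try a stricter rule such as 10, which some textbooks prefer.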
  2. p = 0.5, n = 200
    (a) E(p̂) = p = 0.5
    (b) Var(p̂) = p(1−p)/n = 0.5 × 0.5/200 = 0.25/200 = 0.00125
    (c) SE(p̂) = √0.00125 ≈ 0.03536
    (d) p̂ ~ N(0.5, 0.00125) approximately
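The mean, variance and standard error in parts (a) to (c) follow directly from p and n. A small Python sketch, assuming a hypothetical helper called `sample_proportion_moments`, shows the arithmetic:

```python
import math


def sample_proportion_moments(p: float, n: int) -> tuple[float, float, float]:
    """Return (mean, variance, standard error) of the sample proportion."""
    mean = p                      # E(p-hat) = p
    variance = p * (1 - p) / n    # Var(p-hat) = p(1-p)/n
    std_error = math.sqrt(variance)
    return mean, variance, std_error


mean, variance, std_error = sample_proportion_moments(0.5, 200)
print(mean, variance, round(std_error, 5))
```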
  3. p = 0.3, n = 100
    Conditions: np = 30 ≥ 5 ✓, n(1−p) = 70 ≥ 5 ✓
    SE = √(0.3 × 0.7/100) = √0.0021 ≈ 0.04583
    P(p̂ < 0.28) = normCdf(−10^99, 0.28, 0.3, 0.04583)
    Z = (0.28 − 0.3)/0.04583 ≈ −0.436
    P(p̂ < 0.28) ≈ 0.3313
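For readers without a CAS calculator, the normCdf call can be imitated with Python's standard library. This is only a sketch: `norm_cdf` is a hypothetical wrapper written to mirror the calculator's argument order (lower, upper, mean, standard deviation), and −1e99 stands in for the calculator's very large negative lower bound.

```python
import math
from statistics import NormalDist


def norm_cdf(lower: float, upper: float, mu: float, sigma: float) -> float:
    """Probability that a N(mu, sigma^2) variable lies between lower and upper."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Question 3: p = 0.3, n = 100, lower tail below 0.28
se = math.sqrt(0.3 * 0.7 / 100)
prob = norm_cdf(-1e99, 0.28, 0.3, se)
print(round(prob, 4))
```

The printed value should land close to the probability worked out above; small differences in the fourth decimal place come from rounding the SE or the Z value.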
  4. p = 0.55, n = 150
    Conditions: np = 82.5 ≥ 5 ✓, n(1−p) = 67.5 ≥ 5 ✓
    SE = √(0.55 × 0.45/150) = √0.00165 ≈ 0.04062
    P(p̂ > 0.60) = normCdf(0.60, 10^99, 0.55, 0.04062)
    Z = (0.60 − 0.55)/0.04062 ≈ 1.231
    P(p̂ > 0.60) ≈ 0.1092
  5. p = 0.55, n = 80
    Check: np = 44 ≥ 5 ✓, n(1−p) = 36 ≥ 5 ✓
    SE = √(0.55 × 0.45/80) = √0.0030938 ≈ 0.05562
    P(0.5 < p̂ < 0.65) = normCdf(0.5, 0.65, 0.55, 0.05562)
    Z1 = (0.5 − 0.55)/0.05562 ≈ −0.899
    Z2 = (0.65 − 0.55)/0.05562 ≈ 1.798
    P(0.5 < p̂ < 0.65) = P(−0.899 < Z < 1.798) ≈ 0.9639 − 0.1843 = 0.7796
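An interval probability like this one can be cross-checked by standardising both endpoints and differencing the standard normal CDF. The sketch below assumes a hypothetical helper named `interval_probability`; it uses only the Python standard library.

```python
import math
from statistics import NormalDist


def interval_probability(low: float, high: float, p: float, n: int) -> float:
    """Approximate P(low < p-hat < high) using the normal approximation."""
    se = math.sqrt(p * (1 - p) / n)
    z_low = (low - p) / se      # Z1 in the working above
    z_high = (high - p) / se    # Z2 in the working above
    std_normal = NormalDist()   # mean 0, standard deviation 1
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


# Question 5: p = 0.55, n = 80
print(round(interval_probability(0.5, 0.65, 0.55, 80), 4))
```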
  6. SE = √[0.5 × 0.5/n] = √(0.25/n) = 0.5/√n
    (a) n=50: SE = 0.5/√50 ≈ 0.0707
    (b) n=200: SE = 0.5/√200 ≈ 0.0354
    (c) n=800: SE = 0.5/√800 ≈ 0.0177
    Pattern: When n is multiplied by 4, SE halves. SE is proportional to 1/√n. This means that to double the precision (halve the SE), you need four times as many observations — sampling is expensive!
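The 1/√n pattern can be confirmed numerically. In this short Python sketch, `standard_error` is an illustrative helper name; the loop reproduces the three cases and the final line checks that quadrupling n halves the SE.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of the sample proportion, sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


for n in (50, 200, 800):
    print(n, round(standard_error(0.5, n), 4))

# Ratio of SE at n = 50 to SE at n = 200 (four times the sample size)
print(standard_error(0.5, 50) / standard_error(0.5, 200))
```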
  7. p = 0.08, n = 200
    Conditions: np = 16 ≥ 5 ✓, n(1−p) = 184 ≥ 5 ✓
    SE = √(0.08 × 0.92/200) = √0.000368 ≈ 0.01918
    P(p̂ > 0.10) = normCdf(0.10, 10^99, 0.08, 0.01918)
    Z = (0.10 − 0.08)/0.01918 ≈ 1.043
    P(p̂ > 0.10) ≈ 0.1486
  8. Here p = 0.12. As n increases, the CLT implies that the sampling distribution of p̂ approaches a normal distribution.
    For n = 20: np = 2.4 < 5 ✗ — conditions not satisfied; distribution is right-skewed
    For n = 500: np = 60 ≥ 5 ✓, n(1−p) = 440 ≥ 5 ✓ — clearly normal

    Minimum n for normal approximation:
    Need np ≥ 5: n × 0.12 ≥ 5 ⇒ n ≥ 41.7 ⇒ n ≥ 42
    Need n(1−p) ≥ 5: n × 0.88 ≥ 5 ⇒ n ≥ 5.7 ⇒ n ≥ 6 (automatically satisfied once n ≥ 42)
    Conditions first satisfied at n = 42.
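The minimum sample size can also be found by a direct search, which avoids rounding slips. This is a sketch only; `min_sample_size` is a hypothetical name, and the threshold of 5 is the rule used throughout these solutions.

```python
def min_sample_size(p: float, threshold: float = 5.0) -> int:
    """Smallest n with both n*p >= threshold and n*(1-p) >= threshold."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    n = 1
    # Step n upward until both conditions hold
    while n * p < threshold or n * (1 - p) < threshold:
        n += 1
    return n


# Question 8: p = 0.12
print(min_sample_size(0.12))
```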
  9. p = 0.7, n = 250
    Conditions: np = 175 ≥ 5 ✓, n(1−p) = 75 ≥ 5 ✓
    (a) E(p̂) = 0.7; SE = √(0.7 × 0.3/250) = √0.00084 ≈ 0.02898
    (b) P(p̂ ≤ 0.65) = normCdf(−10^99, 0.65, 0.7, 0.02898)
    Z = (0.65 − 0.7)/0.02898 ≈ −1.725
    P(p̂ ≤ 0.65) ≈ 0.0423
    (c) If p truly is 0.7, there is only a 4.2% chance of getting p̂ ≤ 0.65. This is fairly strong evidence against the 70% claim, as such a result would be unlikely to occur by chance alone. Most statisticians would consider a result this extreme (under 5%) as noteworthy, though a formal hypothesis test would be needed for a definitive conclusion.
  10. p = 0.25
    (a) n = 100: SE = √(0.25×0.75/100) = √0.001875 ≈ 0.04330
    P(p̂ > 0.30) = normCdf(0.30, 10^99, 0.25, 0.04330)
    Z = (0.30−0.25)/0.04330 ≈ 1.155
    P ≈ 0.1241
    (b) n = 400: SE = √(0.25×0.75/400) = √0.0004688 ≈ 0.02165
    P(p̂ > 0.30) = normCdf(0.30, 10^99, 0.25, 0.02165)
    Z = (0.30−0.25)/0.02165 ≈ 2.309
    P ≈ 0.0105
    (c) With n=100, there is a 12.4% chance of the sample proportion exceeding 0.30; with n=400, only a 1.05% chance. The larger sample has a much smaller SE (0.022 vs 0.043), so p̂ is far more tightly concentrated around p=0.25. The normal approximation is reasonable for both samples, since np ≥ 5 and n(1−p) ≥ 5 in each case (np = 25 and n(1−p) = 75 for n=100; np = 100 and n(1−p) = 300 for n=400). Larger samples make it less likely to observe large deviations from the true proportion purely by chance.
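The comparison between the two sample sizes can be reproduced with a short script. The helper name `upper_tail` is an assumption made for illustration; it applies the same normal approximation used in parts (a) and (b).

```python
import math
from statistics import NormalDist


def upper_tail(x: float, p: float, n: int) -> float:
    """Approximate P(p-hat > x) using N(p, p(1-p)/n)."""
    se = math.sqrt(p * (1 - p) / n)
    return 1 - NormalDist(mu=p, sigma=se).cdf(x)


# Question 10: p = 0.25, threshold 0.30, two sample sizes
for n in (100, 400):
    print(n, round(upper_tail(0.30, 0.25, n), 4))
```

Trying other values of n in the loop shows how quickly the tail probability shrinks as the sample grows.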