The sample proportion is p̂ = X/n, where X is the number of successes and n is the sample size.
p̂ is a random variable because its value varies from sample to sample — different random samples from the same population will give different numbers of successes, and hence different values of p̂.
The population proportion p is a fixed (unknown) parameter, while p̂ is a statistic that varies with each sample.
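The sample-to-sample variation described above can be illustrated with a small simulation. This is only a sketch: the population proportion p_true = 0.3, the sample size 120, and the helper name sample_proportion are assumptions chosen for illustration, not values given in the text.

```python
import random

def sample_proportion(n, p, rng):
    """Draw one random sample of size n and return p_hat = X / n."""
    # Each trial is a success with probability p; X counts the successes.
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

rng = random.Random(1)   # fixed seed so the run is repeatable
p_true = 0.3             # hypothetical fixed population proportion
n = 120                  # hypothetical sample size

# Five independent samples from the same population give five values of p_hat.
p_hats = [sample_proportion(n, p_true, rng) for _ in range(5)]
print(p_hats)
```

The parameter p_true stays fixed throughout, while each call produces its own value of p̂, which is the distinction between a parameter and a statistic.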
p̂ = X/n = 36/120 = 0.3
This means 30% of the sampled voters supported the policy. This is our estimate of the population proportion p, but the true proportion is unknown.
As n increases, Var(p̂) = p(1−p)/n decreases, so the distribution of p̂ clusters more tightly around the true proportion p.
Effect: larger samples give more precise estimates of p.
The standard deviation of p̂ is √[p(1−p)/n], which is proportional to 1/√n. To halve the standard deviation, you must therefore quadruple the sample size.
Var(p̂) = p(1−p)/n, with p = 0.4 so that p(1−p) = 0.4 × 0.6 = 0.24
(a) n=100: Var = 0.24/100 = 0.0024; SD = √0.0024 ≈ 0.0490
(b) n=400: Var = 0.24/400 = 0.0006; SD = √0.0006 ≈ 0.0245
(c) n=1600: Var = 0.24/1600 = 0.00015; SD = √0.00015 ≈ 0.01225
Pattern: when n is multiplied by 4, SD is halved. SD is proportional to 1/√n.
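The 1/√n pattern can be checked numerically. The sketch below assumes p = 0.4 (the value implied by the product 0.4 × 0.6) and uses a hypothetical helper named sd_p_hat.

```python
import math

def sd_p_hat(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.4
for n in (100, 400, 1600):
    # Each step multiplies n by 4, so the SD should halve each time.
    print(n, round(sd_p_hat(p, n), 5))

# Ratio of SDs when n is quadrupled; this should be very close to 2.
ratio = sd_p_hat(p, 100) / sd_p_hat(p, 400)
print(ratio)
```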
The condition for the normal approximation is:
np ≥ 5 AND n(1−p) ≥ 5
(a) n=20, p=0.4: np = 8 ≥ 5 ✓ and n(1−p) = 12 ≥ 5 ✓ Satisfied
(b) n=10, p=0.2: np = 2 < 5 ✗ NOT satisfied (sample too small relative to p)
(c) n=50, p=0.95: n(1−p) = 2.5 < 5 ✗ NOT satisfied (p is very close to 1, so failures are rare)
(d) n=100, p=0.07: np = 7 ≥ 5 ✓ and n(1−p) = 93 ≥ 5 ✓ Satisfied
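The same check can be written as a short function. This is a minimal sketch: the name normal_approx_ok is hypothetical, and the threshold of 5 is the one used in this text (other sources use a different cutoff), so it is left as a parameter.

```python
def normal_approx_ok(n, p, threshold=5):
    """Return True when both np and n(1 - p) reach the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold

# The four cases worked through above.
cases = [(20, 0.4), (10, 0.2), (50, 0.95), (100, 0.07)]
for n, p in cases:
    print(n, p, normal_approx_ok(n, p))
```

Both counts have to pass: case (c) has plenty of expected successes but too few expected failures, so the condition still fails.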
p = 0.35, n = 200
Check conditions: np = 200 × 0.35 = 70 ≥ 5 ✓ and n(1−p) = 200 × 0.65 = 130 ≥ 5 ✓
E(p̂) = 0.35
Var(p̂) = 0.35 × 0.65 / 200 = 0.2275/200 = 0.0011375
SD(p̂) = √0.0011375 ≈ 0.03373
By the normal approximation, p̂ is approximately N(0.35, 0.0011375), or N(0.35, 0.001138) to four significant figures. Here N(μ, σ²) notation is used, so the second argument is the variance, not the standard deviation.
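The approximating distribution can be built with Python's statistics.NormalDist, which takes the standard deviation rather than the variance. The sketch below uses p = 0.35 and n = 200 from the working above; the cutoff 0.40 in the final query is a hypothetical value chosen only to show how the distribution might be used.

```python
import math
from statistics import NormalDist

p, n = 0.35, 200
variance = p * (1 - p) / n          # Var(p_hat) = p(1 - p) / n

# NormalDist is parameterised by sigma (the SD), so take the square root.
p_hat_dist = NormalDist(mu=p, sigma=math.sqrt(variance))

print(round(p_hat_dist.mean, 2), round(p_hat_dist.stdev, 5))

# Illustrative query: approximate P(p_hat <= 0.40), with 0.40 a hypothetical cutoff.
print(p_hat_dist.cdf(0.40))
```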
(a) p̂ is an unbiased estimator of p because E(p̂) = p. On average, across many random samples, the sample proportion equals the true population proportion.
(b) Unbiasedness is important because it means our estimation method does not systematically overestimate or underestimate the true value. Even though any one sample gives a specific value of p̂ that may differ from p, on average we “get it right”. This is a fundamental property we want from estimation procedures.
p̂ = X/n where X ~ Bin(n, p)
E(X) = np, so E(p̂) = E(X/n) = (1/n)E(X) = (1/n)(np) = p ✓
Var(X) = np(1−p), so Var(p̂) = Var(X/n) = (1/n²)Var(X) = (1/n²)(np(1−p)) = p(1−p)/n ✓
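These two results can also be checked empirically. The simulation below is only a sketch under assumed values (n = 50, p = 0.3, 20000 trials, and a hypothetical helper simulate_p_hats); it compares the simulated mean and variance of p̂ with p and p(1−p)/n.

```python
import random
from statistics import fmean, pvariance

def simulate_p_hats(n, p, trials, seed=0):
    """Return a list of p_hat values, one per simulated sample of size n."""
    rng = random.Random(seed)  # seeded for a repeatable run
    # Each comparison is True (counted as 1) with probability p.
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]

n, p = 50, 0.3                 # hypothetical values for illustration
p_hats = simulate_p_hats(n, p, trials=20000)

mean_sim = fmean(p_hats)       # should be close to p
var_sim = pvariance(p_hats)    # should be close to p(1 - p) / n
var_theory = p * (1 - p) / n

print(mean_sim, p)
print(var_sim, var_theory)
```

The simulated values will not match the formulas exactly, but the gap should shrink as the number of trials grows.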
p = 0.25, n = 80
E(p̂) = 0.25; Var(p̂) = 0.25 × 0.75/80 = 0.1875/80 ≈ 0.002344
SD(p̂) ≈ 0.04841
Conditions: np = 20 ≥ 5 ✓ and n(1−p) = 60 ≥ 5 ✓
(a) The distribution of p̂ is approximately N(0.25, 0.002344).
(b) The distribution is centred at p = 0.25 with a relatively small spread (σ ≈ 0.048). Under the normal approximation, about 95% of sample proportions lie within two standard deviations of the mean, which is roughly 0.10 here, so most values of p̂ will fall between 0.15 and 0.35.
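The statement that most sample proportions fall between 0.15 and 0.35 can be checked against the normal approximation. This sketch uses p = 0.25 and n = 80 from the working above and relies on the approximation itself, so the result is approximate rather than an exact binomial probability.

```python
import math
from statistics import NormalDist

p, n = 0.25, 80
sd = math.sqrt(p * (1 - p) / n)      # SD(p_hat)
dist = NormalDist(mu=p, sigma=sd)    # approximate distribution of p_hat

# Approximate probability that p_hat lands within 0.10 of p.
prob_within = dist.cdf(0.35) - dist.cdf(0.15)
print(round(sd, 5), round(prob_within, 3))
```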