Practice Maths

Sample Proportions as Random Variables

Key Terms

The sample proportion p̂ = X/n, where X is the count of “successes” in a sample of size n
p̂ is a random variable — it takes a different value each time a new sample is drawn from the same population
E(p̂) = p: the expected value of p̂ equals the true population proportion p, so p̂ is an unbiased estimator of p
Var(p̂) = p(1 − p)/n: variance decreases as sample size n increases
SD(p̂) = √(p(1 − p)/n): also called the standard error of the proportion
Increasing n makes p̂ less variable and more concentrated around p
p is the population proportion (fixed, usually unknown); p̂ is the sample proportion (observed, varies between samples)
Sample Proportion Formulas:
p̂ = X/n     E(p̂) = p     Var(p̂) = p(1 − p)/n     SD(p̂) = √(p(1 − p)/n)
Worked Example: In a population where 40% prefer Brand A (p = 0.4), a sample of n = 100 is taken.

p̂ = X/100 where X = number in sample preferring Brand A
E(p̂) = 0.4
Var(p̂) = 0.4 × 0.6 / 100 = 0.0024
SD(p̂) = √0.0024 ≈ 0.0490
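
The arithmetic in the worked example can be checked with a few lines of Python. This is only a minimal sketch: the function name sample_proportion_summary is made up for illustration and is not part of any library.

```python
import math

def sample_proportion_summary(p, n):
    """Hypothetical helper: return (mean, variance, sd) of p-hat for given p and n."""
    mean = p                       # E(p-hat) = p
    variance = p * (1 - p) / n     # Var(p-hat) = p(1 - p)/n
    sd = math.sqrt(variance)       # SD(p-hat) = sqrt(p(1 - p)/n)
    return mean, variance, sd

# Values from the worked example: p = 0.4, n = 100
mean, variance, sd = sample_proportion_summary(0.4, 100)
print(f"E = {mean}, Var = {variance:.4f}, SD = {sd:.4f}")
# prints: E = 0.4, Var = 0.0024, SD = 0.0490
```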
Hot Tip: The key insight is that p̂ is random — not a fixed number. E(p̂) = p means “on average across all possible samples, the sample proportion equals the true proportion.” This is what “unbiased” means. SD(p̂) tells you how spread out the sample proportions are; it gets smaller as n increases, which is why larger samples give more reliable estimates.

What is a Sample Proportion?

When we take a random sample from a population and count how many individuals have a particular characteristic, we call that count X. The sample proportion is defined as p̂ = X/n, where n is the sample size. For example, if we survey 200 voters and 90 say they will vote Yes, then p̂ = 90/200 = 0.45.

The population proportion p (the true fraction of the whole population with the characteristic) is a fixed but typically unknown parameter. The sample proportion p̂ is our observable estimate of p.

p̂ as a Random Variable

Crucially, p̂ is a random variable. Before we take the sample, we do not know which n individuals we will select, so we do not know what X will be. If we took many different samples of the same size from the same population, we would get a different value of p̂ each time. The collection of all possible values p̂ could take, along with their probabilities, is called the sampling distribution of p̂.
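
One way to see this variation is to simulate it. The sketch below assumes each sampled individual is independently a "success" with probability p; the values p = 0.4 and n = 100, the number of repetitions, the seed, and the function name are all illustrative choices.

```python
import random

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Hypothetical helper: draw num_samples samples of size n and return each p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Count successes: each individual is a success with probability p
        successes = sum(rng.random() < p for _ in range(n))
        proportions.append(successes / n)
    return proportions

p_hats = simulate_sample_proportions(p=0.4, n=100, num_samples=2000)
print(p_hats[:5])                 # the first few values of p-hat differ from one another
print(sum(p_hats) / len(p_hats))  # the average of the simulated p-hats should be near p
```

The list p_hats is an approximation to the sampling distribution: with more repetitions, the relative frequency of each value settles towards its probability.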

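Provided the sample is random and the population is large relative to n (or sampling is done with replacement), X follows a binomial distribution B(n, p), at least to a good approximation. We can therefore derive the properties of p̂ = X/n using standard rules for expectations and variances.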

Expected Value: E(p̂) = p

Using E(X) = np for a binomial random variable:

E(p̂) = E(X/n) = E(X)/n = np/n = p

This tells us that p̂ is an unbiased estimator of p: on average, the sample proportion equals the true population proportion. There is no systematic tendency to over- or under-estimate p.
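
The same result can be checked numerically by weighting every possible value k/n by its binomial probability. This is a rough sketch under the binomial model above; the helper names and the values n = 100, p = 0.4 are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def expected_sample_proportion(n, p):
    """Hypothetical helper: E(p-hat) as the sum of (k/n) * P(X = k) over all k."""
    return sum((k / n) * binomial_pmf(k, n, p) for k in range(n + 1))

# Should agree with p = 0.4 up to floating-point rounding
print(expected_sample_proportion(100, 0.4))
```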

Variance and Standard Deviation of p̂

Using Var(X) = np(1 − p) for a binomial:

Var(p̂) = Var(X/n) = Var(X)/n² = np(1 − p)/n² = p(1 − p)/n

SD(p̂) = √(p(1 − p)/n)
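
As a check on the variance formula, the sketch below computes Var(p̂) directly from the binomial probabilities using exact fractions, so no rounding is involved. The function name and the values n = 20, p = 2/5 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def exact_variance_of_p_hat(n, p):
    """Hypothetical helper: Var(p-hat) computed from the binomial pmf with exact fractions."""
    p = Fraction(p)
    # P(X = k) for each possible count k = 0, 1, ..., n
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(Fraction(k, n) * pmf[k] for k in range(n + 1))
    # Variance = sum of (value - mean)^2 * probability
    return sum((Fraction(k, n) - mean) ** 2 * pmf[k] for k in range(n + 1))

n, p = 20, Fraction(2, 5)
variance = exact_variance_of_p_hat(n, p)
print(variance)                      # 3/250
print(variance == p * (1 - p) / n)   # True: matches p(1 - p)/n exactly
```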

The standard deviation of p̂ (often called the standard error) measures how much p̂ typically varies from sample to sample. It depends on both p and n:

  • Larger n → smaller SD(p̂) → p̂ is more tightly clustered around p → more precise estimates
  • p near 0.5 → largest SD(p̂) for a given n (maximum variability when outcomes are most unpredictable)
  • p near 0 or 1 → smaller SD(p̂) (less variability when one outcome is dominant)
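
The dependence on p can be tabulated for a fixed sample size. In this sketch the sample size n = 50 and the list of p values are arbitrary illustrative choices, and sd_of_p_hat is a made-up helper name.

```python
import math

def sd_of_p_hat(p, n):
    """Hypothetical helper: SD(p-hat) = sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

n = 50
for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(f"p = {p:.2f}  SD = {sd_of_p_hat(p, n):.4f}")
```

The printed values rise towards p = 0.5 and fall away symmetrically on either side, because p(1 − p) is unchanged when p is replaced by 1 − p.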

The Effect of Sample Size

Doubling n reduces SD(p̂) by a factor of √2 (not 2). To halve the standard deviation, you need to quadruple the sample size. This is a fundamental result in statistics: precision improves only in proportion to √n, which is why highly precise estimates are expensive to obtain.
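
The square-root relationship can be confirmed numerically. The sketch below uses an illustrative p = 0.2 and a starting sample size of 50; the ratios do not depend on these choices, and sd_of_p_hat is a made-up helper name.

```python
import math

def sd_of_p_hat(p, n):
    """Hypothetical helper: SD(p-hat) = sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.2
base_n = 50
base_sd = sd_of_p_hat(p, base_n)

for n in (base_n, 2 * base_n, 4 * base_n):
    # How many times smaller the SD is compared with the starting sample size
    ratio = base_sd / sd_of_p_hat(p, n)
    print(f"n = {n:3d}  SD = {sd_of_p_hat(p, n):.4f}  reduction factor = {ratio:.3f}")
```

The reduction factor for the doubled sample is about 1.414 (that is, √2), and for the quadrupled sample it is 2.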

Exam technique: When asked “what does E(p̂) = p mean?” write: “The expected value of the sample proportion equals the population proportion, meaning p̂ is an unbiased estimator of p.” When asked why larger samples are better, link it to Var(p̂) = p(1 − p)/n decreasing as n increases.

Mastery Practice

  1. Fluency: A sample of 80 households is surveyed and 28 are found to have solar panels. Calculate the sample proportion p̂. What does p̂ estimate?

  2. Fluency: The proportion of students in a school who ride to school is p = 0.25. A random sample of n = 60 students is selected. What is E(p̂)? Explain in plain language what this means.

  3. Fluency: In a large city, 60% of residents recycle regularly (p = 0.6). A sample of n = 150 is taken. Calculate Var(p̂).

  4. Fluency: Using p = 0.6 and n = 150 (from the previous question), calculate SD(p̂). Give your answer correct to 4 decimal places.

  5. Understanding: For p = 0.3, calculate SD(p̂) for sample sizes n = 25, n = 100, and n = 400. Describe the pattern you observe.

  6. Understanding: A quality control manager takes two samples from a production line where p = 0.08 (proportion of defective items): Sample A has n = 200 and Sample B has n = 800. Which sample proportion is likely to be closer to p = 0.08? Justify using SD(p̂).

  7. Understanding: For a certain population, SD(p̂) = 0.05 when n = 100. What sample size is needed to reduce SD(p̂) to 0.025?

  8. Understanding: A researcher states: “The sample proportion p̂ = 0.42 from my survey is a random variable.” A student objects: “But p̂ = 0.42 is a fixed number — how can it be random?” Write a correct response to the student.

  9. Problem Solving: Explain why p̂ is called an “unbiased” estimator of p. Does “unbiased” mean that p̂ = p for every sample? Why or why not?

  10. Problem Solving: In a large community, 35% of people support a proposed new park (p = 0.35). A journalist takes a random sample of n = 400 residents. (a) Find E(p̂) and SD(p̂). (b) The journalist reports p̂ = 0.31. Is this a surprising result? Use SD(p̂) to justify your answer.