Sample Proportions as Random Variables
Key Terms
- The sample proportion p̂ = X/n, where X is the count of “successes” in a sample of size n
- p̂ is a random variable: it takes a different value each time a new sample is drawn from the same population
- E(p̂) = p: the expected value of p̂ equals the true population proportion p, so p̂ is an unbiased estimator of p
- Var(p̂) = p(1 − p)/n: the variance decreases as the sample size n increases
- SD(p̂) = √(p(1 − p)/n): also called the standard error of the proportion
- Increasing n makes p̂ less variable and more concentrated around p
- p is the population proportion (fixed, usually unknown); p̂ is the sample proportion (observed, varies between samples)
p̂ = X/n
E(p̂) = p
Var(p̂) = p(1 − p)/n
SD(p̂) = √(p(1 − p)/n)
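Example: the proportion of people who prefer Brand A is p = 0.4, and a random sample of n = 100 is taken.
p̂ = X/100, where X = number in the sample preferring Brand A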
E(p̂) = 0.4
Var(p̂) = 0.4 × 0.6 / 100 = 0.0024
SD(p̂) = √0.0024 ≈ 0.0490
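The worked example above can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `sample_proportion_moments` is our own illustrative choice, not part of any library.

```python
import math

def sample_proportion_moments(p: float, n: int) -> tuple[float, float, float]:
    """Return (E(p-hat), Var(p-hat), SD(p-hat)) for samples of size n from a population with proportion p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be a positive integer")
    mean = p                   # E(p-hat) = p
    var = p * (1 - p) / n      # Var(p-hat) = p(1 - p)/n
    sd = math.sqrt(var)        # SD(p-hat) = sqrt(p(1 - p)/n)
    return mean, var, sd

mean, var, sd = sample_proportion_moments(0.4, 100)
print(f"E = {mean}, Var = {var:.4f}, SD = {sd:.4f}")  # E = 0.4, Var = 0.0024, SD = 0.0490
```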
What is a Sample Proportion?
When we take a random sample from a population and count how many individuals have a particular characteristic, we call that count X. The sample proportion is defined as p̂ = X/n, where n is the sample size. For example, if we survey 200 voters and 90 say they will vote Yes, then p̂ = 90/200 = 0.45.
The population proportion p (the true fraction of the whole population with the characteristic) is a fixed but typically unknown parameter. The sample proportion p̂ is our observable estimate of p.
p̂ as a Random Variable
Crucially, p̂ is a random variable. Before we take the sample, we do not know which n individuals we will select, so we do not know what X will be. If we took many different samples of the same size from the same population, we would get a different value of p̂ each time. The collection of all possible values p̂ could take, along with their probabilities, is called the sampling distribution of p̂.
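To see this sample-to-sample variability concretely, the sketch below simulates repeated sampling. It assumes each individual independently has the characteristic with probability p, which is a reasonable model when the population is much larger than the sample. The function `simulate_p_hats` and the values p = 0.5, n = 200 are hypothetical choices for illustration.

```python
import random

def simulate_p_hats(p: float, n: int, num_samples: int, seed: int = 1) -> list[float]:
    """Draw num_samples independent samples of size n and return each sample's p-hat.

    Each individual is modelled as an independent 'success' with probability p
    (an assumption that approximates sampling from a very large population).
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    p_hats = []
    for _ in range(num_samples):
        x = sum(1 for _ in range(n) if rng.random() < p)  # X = count of successes
        p_hats.append(x / n)                              # p-hat = X / n
    return p_hats

# Ten repeated samples of size 200 from a population with p = 0.5 (hypothetical value)
print(simulate_p_hats(0.5, 200, 10))
```

Each printed value is a different realisation of p̂; a histogram of many such values approximates the sampling distribution of p̂.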
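Because X follows a binomial distribution B(n, p) (assuming the n selections are independent, as when sampling with replacement or from a population much larger than n), we can derive the properties of p̂ = X/n using standard rules for expectations and variances.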
Expected Value: E(p̂) = p
Using E(X) = np for a binomial random variable:
E(p̂) = E(X/n) = E(X)/n = np/n = p
This tells us that p̂ is an unbiased estimator of p: on average, the sample proportion equals the true population proportion. There is no systematic tendency to over- or under-estimate p.
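Unbiasedness is a statement about the average over many samples, not about any single sample. As a hedged illustration under the same independent-draws assumption, averaging p̂ over many simulated samples should land very close to p. The helper `average_p_hat` and the values p = 0.2, n = 50 are illustrative, not from the text.

```python
import random

def average_p_hat(p: float, n: int, num_samples: int, seed: int = 2) -> float:
    """Average p-hat over many simulated samples of size n (each individual is an independent success with probability p)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = sum(rng.random() < p for _ in range(n))  # True counts as 1, so this is X
        total += x / n                               # accumulate p-hat
    return total / num_samples

# Close to 0.2, although individual samples scatter around it
print(average_p_hat(0.2, 50, 20_000))
```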
Variance and Standard Deviation of p̂
Using Var(X) = np(1 − p) for a binomial:
Var(p̂) = Var(X/n) = Var(X)/n² = np(1 − p)/n² = p(1 − p)/n
SD(p̂) = √(p(1 − p)/n)
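The variance formula can also be checked without simulation by summing directly over the binomial probabilities of X. This sketch uses Python's `math.comb` (Python 3.8+); `var_p_hat_from_pmf` is a hypothetical helper name, and n = 100, p = 0.4 simply reuse the Brand A example.

```python
from math import comb, isclose

def var_p_hat_from_pmf(n: int, p: float) -> float:
    """Var(p-hat) computed directly from the B(n, p) probabilities P(X = k)."""
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * pk for k, pk in enumerate(probs))             # E(p-hat)
    return sum((k / n - mean) ** 2 * pk for k, pk in enumerate(probs))  # E[(p-hat - mean)^2]

n, p = 100, 0.4
exact = var_p_hat_from_pmf(n, p)
formula = p * (1 - p) / n
print(exact, formula, isclose(exact, formula, rel_tol=1e-9))  # last value printed is True
```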
The standard deviation of p̂ (often called the standard error) measures how much p̂ typically varies from sample to sample. It depends on both p and n:
- Larger n → smaller SD(p̂) → p̂ is more tightly clustered around p → more precise estimates
- p near 0.5 → largest SD(p̂) for a given n (maximum variability when outcomes are most unpredictable)
- p near 0 or 1 → smaller SD(p̂) (less variability when one outcome is dominant)
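The dependence on p can be tabulated for a fixed sample size. In this sketch n = 100 is an arbitrary illustrative choice; a grid search over p confirms that p(1 − p), and hence SD(p̂), is largest at p = 0.5 and symmetric about it. The helper `sd_p_hat` is our own name.

```python
import math

def sd_p_hat(p: float, n: int) -> float:
    """Standard error of the sample proportion: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

n = 100  # illustrative sample size
for p in (0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95):
    print(f"p = {p:.2f}  SD(p-hat) = {sd_p_hat(p, n):.4f}")

# Search a grid of p values in steps of 0.01 for the one giving the largest SD
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=lambda q: sd_p_hat(q, n))
print("SD is largest at p =", best_p)  # prints 0.5
```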
The Effect of Sample Size
Doubling n divides SD(p̂) by √2 (not by 2), because n appears under a square root. To halve the standard deviation, you need to quadruple the sample size. This is a fundamental result in statistics: precision improves only slowly as the sample size grows, which is why highly precise estimates are expensive to obtain.
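The square-root relationship can be checked numerically. This sketch fixes p = 0.2 (an illustrative value, not from the text) and compares SD(p̂) at n = 100, 200 and 400 using a hypothetical helper `sd_p_hat`.

```python
import math

def sd_p_hat(p: float, n: int) -> float:
    """Standard error of the sample proportion: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)

p = 0.2                  # illustrative population proportion
base = sd_p_hat(p, 100)  # reference SD at n = 100
for n in (100, 200, 400):
    sd = sd_p_hat(p, n)
    # shrink factor = how many times smaller SD is than at n = 100
    print(f"n = {n:3d}  SD = {sd:.4f}  shrink factor = {base / sd:.3f}")
```

The shrink factor comes out as 1.000, 1.414 and 2.000: doubling n shrinks SD(p̂) by about √2, and only quadrupling it halves SD(p̂).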
Mastery Practice
- Fluency: A sample of 80 households is surveyed and 28 are found to have solar panels. Calculate the sample proportion p̂. What does p̂ estimate?
- Fluency: The proportion of students in a school who ride to school is p = 0.25. A random sample of n = 60 students is selected. What is E(p̂)? Explain in plain language what this means.
- Fluency: In a large city, 60% of residents recycle regularly (p = 0.6). A sample of n = 150 is taken. Calculate Var(p̂).
- Fluency: Using p = 0.6 and n = 150 (from the previous question), calculate SD(p̂). Give your answer correct to 4 decimal places.
- Understanding: For p = 0.3, calculate SD(p̂) for sample sizes n = 25, n = 100, and n = 400. Describe the pattern you observe.
- Understanding: A quality control manager takes two samples from a production line where p = 0.08 (proportion of defective items): Sample A has n = 200 and Sample B has n = 800. Which sample proportion is likely to be closer to p = 0.08? Justify using SD(p̂).
- Understanding: For a certain population, SD(p̂) = 0.05 when n = 100. What sample size is needed to reduce SD(p̂) to 0.025?
- Understanding: A researcher states: “The sample proportion p̂ = 0.42 from my survey is a random variable.” A student objects: “But p̂ = 0.42 is a fixed number, so how can it be random?” Write a correct response to the student.
- Problem Solving: Explain why p̂ is called an “unbiased” estimator of p. Does “unbiased” mean that p̂ = p for every sample? Why or why not?
- Problem Solving: In a large community, 35% of people support a proposed new park (p = 0.35). A journalist takes a random sample of n = 400 residents. (a) Find E(p̂) and SD(p̂). (b) The journalist reports p̂ = 0.31. Is this a surprising result? Use SD(p̂) to justify your answer.