Effect of Sample Size on Statistics
Key Ideas
Key Terms
- population parameter: the true value for the entire population (e.g. the true mean height of all Year 8 students).
- sample statistic: the value calculated from a sample (e.g. the mean height of 20 selected students). May differ from the population parameter.
- sampling variability: natural variation between different samples from the same population. Small samples have high variability; large samples have low variability.
- Law of Large Numbers: as more trials or observations are made, the sample mean approaches the true population mean.
Worked Example
Context: A bag contains tiles numbered 1–10. The true mean is 5.5.
Sample 1 (n=3): {2, 7, 9} → mean = 6.0
Sample 2 (n=3): {1, 3, 4} → mean = 2.7
Sample 3 (n=8): {1, 3, 5, 6, 7, 8, 9, 10} → mean = 6.1
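The two small samples give very different results (6.0 and 2.7), so a single small sample is hard to trust: Sample 1 happens to land 0.5 from the true mean of 5.5, while Sample 2 misses by 2.8. The larger sample has a mean of 6.1, which is 0.6 from the true mean. It is not the closest of the three, but means of larger samples vary much less from sample to sample, so a larger sample is the more reliable choice.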
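The arithmetic in this worked example can be checked with a short Python sketch. The names `samples`, `results` and `TRUE_MEAN` are illustrative choices, not part of the lesson.

```python
from statistics import mean

TRUE_MEAN = 5.5  # mean of the tiles numbered 1 to 10

# The three samples from the worked example.
samples = {
    "Sample 1 (n=3)": [2, 7, 9],
    "Sample 2 (n=3)": [1, 3, 4],
    "Sample 3 (n=8)": [1, 3, 5, 6, 7, 8, 9, 10],
}

# Sample mean, and its distance from the true mean, for each sample.
results = {}
for name, values in samples.items():
    sample_mean = mean(values)
    results[name] = sample_mean
    distance = abs(sample_mean - TRUE_MEAN)
    print(f"{name}: mean = {sample_mean:.1f}, distance from true mean = {distance:.3f}")
```

A comparison like this covers only one draw of each size. The advantage of larger samples shows up when many samples of each size are compared.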
The Problem with Small Samples
Imagine you want to know the average height of all Year 8 students in Queensland. You measure 3 students and get heights of 145 cm, 182 cm, and 156 cm — giving a mean of 161 cm. But those 3 students might not be typical. One is very tall, which pulls the average up. With only 3 data points, you have very little confidence that your answer reflects the true population average.
This is called sampling variability. The statistics you calculate from a sample (mean, median, etc.) change from sample to sample, and with small samples they can change a lot. Two different small samples from the same population might give quite different results, which makes small samples unreliable.
How Larger Samples Improve Reliability
As you increase the sample size, the statistics calculated from the sample tend to get closer to the true population values. This is one of the most important ideas in statistics, sometimes called the law of large numbers: as the number of trials or observations grows, the sample average tends to approach the true population average.
A practical example: flip a fair coin 10 times. You might get 7 heads and 3 tails — that looks like 70% heads. Flip it 1000 times and you will almost certainly get very close to 50% heads. The more trials, the closer you get to the true theoretical probability of 0.5.
The same idea applies to any survey or measurement. A sample of 10 students gives you a rough idea of student opinion. A sample of 300 students gives you a much more reliable estimate — random variation has less power to distort the result.
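The coin example can be tried out with a small simulation. This is a minimal sketch, assuming Python's built-in pseudo-random generator is a fair enough coin; the function name `proportion_heads` and the seed value are arbitrary choices made for illustration.

```python
import random


def proportion_heads(num_flips, rng):
    """Flip a simulated fair coin num_flips times; return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips


rng = random.Random(8)  # fixed seed (arbitrary) so the run can be repeated

for n in (10, 100, 1_000, 100_000):
    print(f"{n:>7} flips: proportion of heads = {proportion_heads(n, rng):.3f}")
```

With a different seed the small runs can land well away from 0.5, while the largest run should stay very close to it.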
Variability in Small Samples — Seeing It for Yourself
If you take many different small samples from the same population, each sample gives a different mean, and these means bounce around quite a bit. But if you take many different large samples, the means stay much closer together and cluster tightly around the true population mean.
This is why scientists run experiments on hundreds or thousands of participants, not just a handful. It is also why opinion polls report a "margin of error" — this gives a sense of how much the sample result might differ from the true population value. Larger samples have smaller margins of error.
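One way to see this clustering is to draw many samples of each size from a made-up population and compare how widely the sample means spread. The sketch below assumes a hypothetical population of 1000 heights generated by the program itself; the numbers are not real data, and `spread_of_sample_means` is an illustrative helper.

```python
import random
from statistics import mean, pstdev


def spread_of_sample_means(population, sample_size, num_samples, rng):
    """Draw num_samples random samples; return (smallest, largest, std dev) of their means."""
    means = [mean(rng.sample(population, sample_size)) for _ in range(num_samples)]
    return min(means), max(means), pstdev(means)


rng = random.Random(2024)  # fixed seed (arbitrary) so the run can be repeated

# Hypothetical population: 1000 heights in cm, invented for this illustration.
population = [rng.gauss(160, 8) for _ in range(1000)]

for n in (5, 50, 500):
    low, high, sd = spread_of_sample_means(population, n, 200, rng)
    print(f"n = {n:>3}: sample means range from {low:.1f} to {high:.1f} (std dev {sd:.2f})")
```

For simple random samples, the spread of the sample means shrinks roughly in proportion to one over the square root of the sample size, which is why margins of error fall as samples grow.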
Representativeness vs. Size
It is important to understand that a large sample is not enough on its own. The sample also needs to be representative — it needs to reflect the diversity of the population. A sample of 10 000 people all from the same suburb is less useful than a well-chosen stratified sample of 500 people from across the state.
Think of it this way: size helps reduce random variability, but it cannot fix a biased selection method. If your sampling method consistently over-represents one group (e.g. only surveying students who stay back at school after hours, who might be more motivated than average), making the sample bigger just gives you a bigger biased result.
The gold standard is: a large sample + a random (or well-stratified) selection method. Together, these give you results you can trust.
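The difference between size and representativeness can also be simulated. The sketch below uses an invented population of weekly homework hours in which students who stay back after school do more homework than the rest. All figures are assumptions made for illustration, and the smaller sample is a simple random sample rather than a stratified one.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed (arbitrary) so the run can be repeated

# Hypothetical population (invented for illustration): weekly homework hours.
# 10 000 students who stay back after school average about 10 hours;
# 40 000 other students average about 5 hours. Hours cannot be negative.
stay_back = [max(0.0, rng.gauss(10, 2)) for _ in range(10_000)]
others = [max(0.0, rng.gauss(5, 2)) for _ in range(40_000)]
population = stay_back + others
true_mean = mean(population)

# Large but biased: 5000 students, all drawn from the stay-back group.
biased_mean = mean(rng.sample(stay_back, 5_000))

# Smaller but random: 500 students drawn from the whole population.
random_mean = mean(rng.sample(population, 500))

print(f"True population mean:       {true_mean:.2f} hours")
print(f"Biased sample (n = 5000):   {biased_mean:.2f} hours")
print(f"Random sample (n = 500):    {random_mean:.2f} hours")
```

In this setup the biased sample is ten times larger, yet it estimates the mean of the stay-back group rather than the mean of the whole population.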
Mastery Practice
-
Calculate the mean of each sample. Then compare your results to the population mean given. (Fluency)
Population: scores from 1 to 10, population mean = 5.5
- Sample A (n=3): {2, 5, 8}
- Sample B (n=3): {1, 4, 10}
- Sample C (n=3): {3, 6, 9}
- Sample D (n=5): {2, 4, 6, 7, 9}
- Sample E (n=5): {1, 3, 5, 8, 10}
- Sample F (n=8): {1, 2, 4, 5, 6, 7, 9, 10}
- Sample G (n=8): {1, 3, 4, 5, 6, 7, 8, 10}
- Sample H (n=10): {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
-
The population mean is given. Calculate the sample mean and state how far it is from the population mean. (Fluency)
- Population mean = 50. Sample: {42, 48, 55, 60, 45}
- Population mean = 7.5. Sample: {6, 8, 9, 7}
- Population mean = 100. Sample: {88, 94, 102, 115}
- Population mean = 20. Sample: {15, 18, 22, 25, 30}
- Population mean = 15. Sample: {12, 13, 14, 16, 17, 18}
- Population mean = 200. Sample: {185, 190, 205, 210, 220, 230}
-
In each pair, which sample gives a more reliable estimate of the population mean? Explain your answer. (Fluency)
- Sample P: n = 8. Sample Q: n = 80.
- Sample A: 10 people surveyed about their age. Sample B: 500 people surveyed about their age.
- Tossing a coin 5 times vs. tossing it 200 times to estimate the probability of heads.
- Measuring the pH of 3 water samples vs. 30 water samples from a river.
- Asking 15 students vs. 150 students about their favourite genre of music.
- Recording the temperature at one location on one day vs. at 100 locations over 30 days.
- Testing 5 products from a factory batch vs. testing 200 products from the same batch.
- Using 4 data points vs. 40 data points to estimate the average time spent on social media.
-
Survey A used n = 20; Survey B used n = 200. Both asked the same question about the same population. Explain why their results might differ in each context. (Understanding)
- The question: “Do you prefer summer or winter?” Survey A found 70% prefer summer; Survey B found 58% prefer summer.
- The question: “How many hours of sleep do you get on school nights?”
- The question: “Have you exercised in the last week?”
- The question: “What is your favourite subject?”
- The question: “Do you own a pet?”
- The question: “How often do you eat fast food?”
-
Describe how you would improve the reliability of each study. (Understanding)
- A school surveyed 5 students to find out what Year 8 students think about the canteen menu.
- A scientist collected data from one river sampling site to determine water quality across the entire river system.
- A health study used 8 participants to test the effectiveness of a new vitamin supplement.
- An election poll surveyed 30 voters at a single shopping centre to predict state election results.
- A teacher used 3 students’ quiz results to estimate the class average.
- A company tested 4 tyres to estimate the average lifespan of a production run of 10 000 tyres.
-
Solve each problem involving sample size and variability. (Problem Solving)
- Coin flipping simulation: A student flips a fair coin and records the proportion of heads after each flip.
Results: After 5 flips: 4 heads (80%); after 20 flips: 13 heads (65%); after 100 flips: 53 heads (53%); after 500 flips: 251 heads (50.2%).
- What is the theoretical probability of heads?
- Describe the trend as the number of flips increases.
- Which result is most reliable? Explain.
- Drawing from a bag: A bag contains 10 balls: 4 red, 3 blue, 2 green, 1 yellow.
A student draws balls (replacing each one) and records results:
n=5: 3 red, 1 blue, 1 green (60% red)
n=20: 9 red, 7 blue, 3 green, 1 yellow (45% red)
n=100: 41 red, 29 blue, 19 green, 11 yellow (41% red)
- What is the true proportion of red balls in the bag?
- Which sample most closely matches the true proportions for all colours?
- Explain why the n=5 sample gave such a different result.
- Survey reliability: Three surveys asked: “Do you support a 4-day school week?”
Survey 1 (n=10): 8 said yes (80%)
Survey 2 (n=50): 34 said yes (68%)
Survey 3 (n=500): 310 said yes (62%)
- Which survey is most likely to reflect the true population view? Explain.
- What is a possible reason Survey 1 shows 80% support when Survey 3 shows 62%?
- If you were advising a school principal, which survey result would you use? Justify your recommendation.
-
State whether each statement is True or False. If false, write the correct statement. (Understanding)
- Larger samples always give sample statistics exactly equal to the population parameter.
- A sample of n = 200 is generally more reliable than a sample of n = 20 from the same population.
- Two samples of the same size from the same population will always give the same mean.
- The Law of Large Numbers states that as sample size increases, sample means get closer to the population mean.
- With very small samples, one unusual value can greatly affect the mean.
- A census always gives more reliable results than a large random sample.
- As the number of coin flips increases, the proportion of heads should get closer to 0.5.
-
Three students each took samples from a jar of numbered cards (population mean = 8.0). Their results are shown below. (Understanding)
Student A (n = 4): {3, 9, 12, 7} → mean = 7.75
Student B (n = 10): {5, 7, 8, 8, 9, 9, 10, 7, 6, 8} → mean = 7.7
Student C (n = 20): {6, 7, 7, 8, 8, 8, 9, 9, 9, 8, 7, 8, 9, 8, 7, 8, 9, 8, 7, 8} → mean = 7.9
- How far is each sample mean from the population mean of 8.0?
- Which sample gives the most reliable estimate? Explain why.
- Student A says: “My mean (7.75) is just as good as Student C’s mean (7.9) because both are less than 8.” Do you agree? Explain.
- If Student B took another sample of n = 10, would they necessarily get the same mean? Explain.
-
Answer each question about sampling variability. (Understanding)
- A politician says: “Our poll of 15 people found 80% support our policy.” Give two reasons why this result might not reflect the true population view.
- A scientist conducts the same experiment three times with n = 5 each time. Results: mean = 4.2, 5.8, 3.9. Does this prove the experiment is unreliable? Explain using the concept of sampling variability.
- Two students sample heights from the same year group. Student A uses n = 8 and gets mean = 162 cm. Student B uses n = 40 and gets mean = 165 cm. Which result should be used to estimate the year group’s average height? Why?
- Explain why increasing the sample size does not guarantee that the sample mean will equal the population mean, but still improves reliability.
-
Design and evaluate a sampling strategy. (Problem Solving)
A factory makes batteries and claims the average battery life is 20 hours. A consumer group wants to test this claim.
- The consumer group tests 5 batteries and gets lifetimes: {17, 19, 21, 22, 18}. Calculate the mean. Does this support or refute the factory’s claim?
- Another 5 batteries are tested: {14, 16, 17, 22, 25}. Calculate the mean. Compare to the first sample.
- Both samples are combined into one sample of 10. Calculate the mean of all 10 values. Is this mean more reliable than either individual sample? Explain.
- The consumer group decides to test 100 batteries. Their mean is 18.6 hours. Does this provide strong evidence that the factory’s claim of 20 hours is wrong? Justify your answer using what you know about sample size and reliability.
- The factory says the sample of 100 is “not representative” because all batteries were taken from one production shift. What type of bias might this introduce, and how could the consumer group improve their sampling?