Effect of Sample Size on Statistics — Solutions
Calculate sample means (population mean = 5.5)
- Sample A {2,5,8}: mean = 15 ÷ 3 = 5.0
- Sample B {1,4,10}: mean = 15 ÷ 3 = 5.0
- Sample C {3,6,9}: mean = 18 ÷ 3 = 6.0
- Sample D {2,4,6,7,9}: mean = 28 ÷ 5 = 5.6
- Sample E {1,3,5,8,10}: mean = 27 ÷ 5 = 5.4
- Sample F {1,2,4,5,6,7,9,10}: mean = 44 ÷ 8 = 5.5
- Sample G {1,3,4,5,6,7,8,10}: mean = 44 ÷ 8 = 5.5
- Sample H {1,2,3,4,5,6,7,8,9,10}: mean = 55 ÷ 10 = 5.5
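These means can be verified with a short script using Python's standard `statistics` module; the sample labels follow the worksheet:

```python
from statistics import mean

# Samples from the worksheet; the population mean is 5.5
samples = {
    "A": [2, 5, 8],
    "B": [1, 4, 10],
    "C": [3, 6, 9],
    "D": [2, 4, 6, 7, 9],
    "E": [1, 3, 5, 8, 10],
    "F": [1, 2, 4, 5, 6, 7, 9, 10],
    "G": [1, 3, 4, 5, 6, 7, 8, 10],
    "H": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
}

for label, values in samples.items():
    print(f"Sample {label} (n={len(values)}): mean = {mean(values)}")
```

Notice that the larger samples (F, G, H) land exactly on or very near the population mean of 5.5.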
Compare sample mean to population mean
- Population mean = 50. Sample {42,48,55,60,45}: sample mean = 250 ÷ 5 = 50, equal to the population mean.
- Population mean = 7.5. Sample {6,8,9,7}: sample mean = 30 ÷ 4 = 7.5, equal to the population mean.
- Population mean = 100. Sample {88,94,102,115}: sample mean = 399 ÷ 4 = 99.75, slightly below the population mean (by 0.25).
- Population mean = 20. Sample {15,18,22,25,30}: sample mean = 110 ÷ 5 = 22, above the population mean by 2.
- Population mean = 15. Sample {12,13,14,16,17,18}: sample mean = 90 ÷ 6 = 15, equal to the population mean.
- Population mean = 200. Sample {185,190,205,210,220,230}: sample mean = 1240 ÷ 6 ≈ 206.67, above the population mean by about 6.67.
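The deviations above can be checked the same way; the population means come from the questions:

```python
from statistics import mean

# (population mean, sample) pairs from the worksheet
cases = [
    (50, [42, 48, 55, 60, 45]),
    (7.5, [6, 8, 9, 7]),
    (100, [88, 94, 102, 115]),
    (20, [15, 18, 22, 25, 30]),
    (15, [12, 13, 14, 16, 17, 18]),
    (200, [185, 190, 205, 210, 220, 230]),
]

for pop_mean, sample in cases:
    m = mean(sample)
    print(f"n={len(sample)}: sample mean = {m:.2f}, "
          f"deviation from {pop_mean} = {m - pop_mean:+.2f}")
```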
Which sample size is more reliable?
- Sample Q (n=80) is more reliable. A larger sample reduces variability and gives a more stable estimate of the population mean.
- Sample B (500 people) is more reliable. With 500 responses, random variation has much less influence on the result.
- 200 flips is more reliable. With only 5 flips, getting 4 heads (80%) is not unusual by chance. With 200 flips, the proportion will be much closer to the true probability of 0.5.
- 30 water samples is more reliable. Three samples may all come from a similar stretch of the river; 30 samples across the whole system better represent overall water quality.
- 150 students is more reliable. A larger sample is more likely to include students with a full range of music preferences.
- 100 locations over 30 days is far more reliable. One location on one day gives a single data point highly affected by local weather conditions.
- 200 products is more reliable. Testing 200 from the batch gives a much more stable estimate of the defect rate or average lifespan.
- 40 data points is more reliable. A small dataset of 4 can easily be skewed by one unusual value; 40 points gives a more stable average.
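The coin-flip comparison above can be quantified with the binomial distribution; `prob_at_least` is a helper defined here for illustration, not a library function:

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# At least 4 heads in 5 flips of a fair coin: 6/32 = 0.1875,
# so an 80% heads rate is quite plausible in a tiny sample.
print(f"{prob_at_least(4, 5):.4f}")

# At least 160 heads in 200 flips (80% heads) is vanishingly unlikely.
print(f"{prob_at_least(160, 200):.2e}")
```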
Why do Survey A (n=20) and Survey B (n=200) give different results?
- With only 20 people, one or two extra “summer” responses can push the percentage noticeably higher. Survey B (200 people) captures more of the population’s actual preference, so its result of 58% is more trustworthy.
- With n=20, the sample may happen to include mostly students with very regular or very irregular sleep. With n=200, the range of sleep patterns is more fully represented and individual extremes have less influence on the mean.
- A small sample of 20 might happen to include a particularly active group (e.g. from a sports club) or a particularly inactive group. Larger samples are more likely to include people of all activity levels.
- With only 20 students, niche subjects might appear more or less popular by chance (e.g. all 20 could happen to enjoy maths). With 200, all subjects get more responses and the proportions stabilise.
- By chance, all 20 people surveyed might happen to be pet owners (or non-owners). The larger sample better reflects the true mix in the population.
- Small samples are more sensitive to who happens to be selected. With 200, the diverse range of eating habits in the population is better captured.
Improve study reliability
- Increase the sample to at least 30–50 students, using stratified or random sampling to ensure all class groups are represented, not just convenience selections.
- Collect samples from multiple sites along the entire river (upstream, midstream, downstream, and tributaries) rather than one fixed point.
- Increase the number of participants to at least 30–100, use a control group (participants who do not take the supplement), and randomly assign participants to groups.
- Increase the sample to at least 500–1000 voters, and use random sampling across multiple locations (not just one shopping centre) and at different times of day.
- Include all students in the class (census) or randomly select a much larger proportion — using 3 of 30 students (10%) is too small and likely unrepresentative.
- Test at least 50–200 tyres to get a reliable estimate. With only 4, one unusually durable or weak tyre can greatly distort the average lifespan estimate.
Simulated experiment problems
- Coin flipping:
- The theoretical probability of heads = 0.5 (50%).
- As the number of flips increases, the proportion of heads gets closer and closer to 50%. The results show 80% → 65% → 53% → 50.2%, steadily approaching the theoretical value.
- The 500-flip result (50.2%) is most reliable because the large number of trials reduces the effect of random variation. This is the Law of Large Numbers in action.
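The trend described above can be reproduced with a quick simulation. The seed is arbitrary, so the exact percentages will differ from the worksheet's figures, but the proportions settle toward 50% as the number of flips grows:

```python
import random

random.seed(1)  # arbitrary seed, chosen only for reproducibility

for n_flips in (5, 20, 100, 500):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    print(f"{n_flips} flips: {heads / n_flips:.1%} heads")
```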
- Drawing from a bag (true proportions: red=40%, blue=30%, green=20%, yellow=10%):
- True proportion of red balls = 4 out of 10 = 40%.
- The n=100 sample most closely matches all true proportions (red 41%, blue 29%, green 19%, yellow 11%) compared to the actual values (40%, 30%, 20%, 10%).
- With only 5 draws, random chance has a huge effect — drawing 3 red balls in 5 is entirely plausible even if they are only 40% of the bag. Small samples can produce extreme results that don’t reflect the true composition.
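A simulation of the bag makes the contrast concrete; `random.choices` draws with replacement using the true proportions from the question, and the seed is arbitrary:

```python
import random
from collections import Counter

random.seed(0)  # arbitrary seed for reproducibility

colours = ["red", "blue", "green", "yellow"]
true_props = [0.40, 0.30, 0.20, 0.10]

for n in (5, 100, 10_000):
    draws = Counter(random.choices(colours, weights=true_props, k=n))
    print(n, {c: f"{draws[c] / n:.0%}" for c in colours})
```

With n=5 the observed proportions can be wildly off; by n=10,000 they sit within a percentage point or two of the true values.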
- School survey reliability:
- Survey 3 (n=500) is most likely to reflect the true population view. With 500 respondents, random variation has the least influence and the sample is most likely to include a representative mix of views.
- Survey 1 may have selected by chance a group of students who strongly favour the 4-day week (e.g. all friends, a particular class, or students who were told about the survey and chose to respond). The small sample size magnifies the effect of these coincidental selections.
- Survey 3 should be used. It has the largest sample (n=500), the most reliable estimate (62%), and the lowest sampling error. Surveys 1 and 2 are too small to draw confident conclusions about the full school population.
True or False about sample size
- False — larger samples give statistics that are closer to the population parameter, but they do not guarantee an exact match. Random variation still exists.
- True.
- False — due to random variation, two samples of the same size from the same population will almost certainly give different means.
- True.
- True.
- False — a census gives exact population parameters (assuming no measurement error), but a large well-designed random sample often produces very accurate estimates and is more practical when a census is impossible.
- True.
Analyse sample means (population mean = 8.0)
- Student A: |7.75 − 8.0| = 0.25; Student B: |7.7 − 8.0| = 0.30; Student C: |7.95 − 8.0| = 0.05.
- Student C’s sample (n=20) gives the most reliable estimate because it is the largest sample. Larger samples reduce the effect of random variation and produce statistics closer to the true population mean.
- No. Although both are below 8, Student A’s sample (n=4) has much higher sampling variability. A sample of 4 can easily deviate from the true mean by chance. The reliability of an estimate depends on sample size, not just whether it happens to be close on one occasion.
- No. Due to random variation, a new sample of n=10 would almost certainly give a different mean. Sample statistics vary between samples, especially at smaller sample sizes.
Sampling variability in context
- (1) The sample of 15 is too small to reliably represent the full population; random chance could easily produce 80% support in such a small group. (2) The sampling method is unknown — if only supporters were asked, or if the sample was not random, the result is biased.
- No, this does not prove the experiment is unreliable. The variation in means (4.2, 5.8, 3.9) is a normal consequence of sampling variability with small samples (n=5). With such a small sample size, one or two unusual values can greatly shift the mean. The experiment should be repeated with larger samples before drawing conclusions.
- Student B’s result (n=40, mean=165 cm) should be used. The larger sample size reduces the influence of chance and gives a more stable, reliable estimate of the year group’s true average height.
- Increasing sample size reduces sampling variability — the range of possible sample means narrows as n increases. However, because samples are still drawn randomly, a sample mean will not exactly equal the population mean unless every member of the population is included. The key improvement is that larger samples are more likely to be close to the true value.
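The narrowing effect can be demonstrated by drawing repeated samples from a synthetic population (heights centred on 165 cm to echo the earlier height question; all numbers here are illustrative):

```python
import random
from statistics import mean, stdev

random.seed(42)  # arbitrary seed
# Synthetic population: 10,000 heights (cm), mean 165, sd 8
population = [random.gauss(165, 8) for _ in range(10_000)]

for n in (5, 40, 200):
    sample_means = [mean(random.sample(population, n)) for _ in range(1000)]
    print(f"n={n}: spread of sample means (sd) = {stdev(sample_means):.2f} cm")
```

The spread of the sample means shrinks as n grows (roughly in proportion to 1/√n), which is exactly why Student B's n=40 estimate is more trustworthy than a small sample's.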
Design and evaluate a sampling strategy
- Mean = (17+19+21+22+18) ÷ 5 = 97 ÷ 5 = 19.4 hours. This is close to the factory’s claimed average of 20 hours — it does not provide strong evidence against the claim with such a small sample.
- Mean = (14+16+17+22+25) ÷ 5 = 94 ÷ 5 = 18.8 hours. This is further below the claim. The two samples give different means (19.4 vs. 18.8), illustrating sampling variability with small samples.
- Combined mean = (97+94) ÷ 10 = 191 ÷ 10 = 19.1 hours. Yes, the combined mean is more reliable than either individual sample of 5 because the larger sample size (n=10) reduces the effect of random variation.
- Yes, this provides stronger evidence. A sample of n=100 is much more reliable than n=5. A mean of 18.6 hours is 1.4 hours below the claim — this is a consistent and meaningful difference that is unlikely to be due to chance alone with a large sample.
- This introduces time-based bias (or cluster sampling bias) — batteries from one shift may have been produced under the same conditions (same machines, same materials, same workers), so they may be systematically better or worse than batteries from other shifts. To improve the sample, the consumer group should randomly select batteries from different production shifts, machines, and time periods.
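The arithmetic in this section can be verified directly:

```python
from statistics import mean

sample_1 = [17, 19, 21, 22, 18]  # first sample of 5 batteries (hours)
sample_2 = [14, 16, 17, 22, 25]  # second sample of 5 batteries (hours)

print(mean(sample_1))             # 19.4
print(mean(sample_2))             # 18.8
print(mean(sample_1 + sample_2))  # 19.1 (combined, n=10)
```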