Collecting Data and Sample Size — Solutions
Classify each data collection method
- Rainfall records:
- Every student’s test score:
- 20 classmates surveyed:
- Market research firm data:
- Every bird counted:
- ABS unemployment figures:
- Random sample of club members:
- Every resident in council database:
Population and sample
- Population: all light bulbs produced; Sample: 50 out of every 1000 bulbs tested
- Population: all students at the school; Sample: the 8 selected students
- Population: all listeners of the radio station; Sample: the 200 listeners contacted
- Population: all clients of the vet clinic; Sample: the 30 clients surveyed
- Population: all 80 000 residents of the city; Sample: the 400 residents surveyed
- Population: all 4000 trees in the forest; Sample: the 25 trees selected
- Population: all shoppers at the supermarket; Sample: the 100 shoppers surveyed on Saturday morning
- Population: all 60 swimmers at the carnival; Sample: the 12 swimmers timed
Classify the sampling method
- Names drawn from a hat:
- Every 10th customer:
- First 15 people at the bus stop:
- Proportional groups (40% admin, 60% production):
- Numbered tickets drawn from a barrel:
- Willing shoppers who stop to chat:
- 10 students from each year level:
- Every 20th item on a production line:
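A minimal sketch of how three of the common sampling methods differ in practice, using their usual textbook names (simple random, systematic, convenience). The customer list, its size, and the seed are hypothetical and are not taken from the questions above.

```python
import random

# Hypothetical list of 100 customers in arrival order (illustration only).
customers = [f"customer_{i}" for i in range(1, 101)]
rng = random.Random(42)  # fixed seed so the run is repeatable

# Simple random sampling: every customer has the same chance of selection,
# like drawing names from a hat.
simple_random = rng.sample(customers, 10)

# Systematic sampling: pick a random starting point, then take every 10th customer.
start = rng.randrange(10)
systematic = customers[start::10]

# Convenience sampling: take whoever is easiest to reach, here the first 15 arrivals.
convenience = customers[:15]

print(simple_random)
print(systematic)
print(convenience)
```

Only the first two methods use chance to decide who is selected, which is why a convenience sample is the most likely of the three to be biased.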
Sample size and reliability
- 1000 students gives more reliable results. A larger sample reduces the effect of random variation and better represents the full Queensland school population.
- 200 patients gives more reliable results. A sample of 5 is too small to detect real differences — chance variation could make an ineffective drug appear effective.
- 150 people gives more reliable results. A sample of 15 may not capture the full range of preferences in the suburb.
- 2000 voters gives more reliable results. Polling more voters reduces sampling error and produces a more accurate prediction.
- 500 flips gives more reliable results. With only 10 flips, getting 7 heads does not mean the coin is unfair, because small samples vary a lot by chance. For a fair coin, the proportion of heads settles closer to 0.5 as the number of flips increases.
- 2000 students gives more reliable results. A sample of 20 from a large country may reflect only one region or school type.
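The idea that larger samples reduce the effect of random variation can be checked with a short simulation. This sketch assumes a fair coin and repeats the 10-flip and 500-flip experiments 200 times each. The number of repeats, the seed, and the function name are illustrative choices, not part of the original questions.

```python
import random
import statistics

def heads_proportions(n_flips, n_trials, rng):
    """Run n_trials experiments of n_flips fair-coin flips each and
    return the proportion of heads seen in each experiment."""
    return [
        sum(rng.random() < 0.5 for _ in range(n_flips)) / n_flips
        for _ in range(n_trials)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable
small = heads_proportions(10, 200, rng)   # 200 experiments of 10 flips
large = heads_proportions(500, 200, rng)  # 200 experiments of 500 flips

# The standard deviation measures how much the proportion of heads
# jumps around from one experiment to the next.
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)
print(f"10 flips:  spread of proportions = {spread_small:.3f}")
print(f"500 flips: spread of proportions = {spread_large:.3f}")
```

The spread for 500 flips should come out much smaller than for 10 flips, which is the sense in which the larger sample is more reliable.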
Identify bias in surveys
- Biased sample — employees have a vested interest in saying the food is good. They are not representative of the general public.
- Leading question — the phrase “Don’t you agree” pushes respondents towards answering yes.
- Biased sample — only students interested in sport attend the carnival, so results will not represent all Year 8 students.
- Self-selection bias — only people who actively visit the website will respond, missing those who don’t use it.
- Loaded language — the word “waste” implies a negative judgment, which may cause people to underreport their television viewing.
- Biased sample — customers at a shopping centre are not representative of all people in the area (e.g. non-shoppers are excluded).
Plan a data collection study
- School lunch habits:
(i) Primary data — observing or surveying students directly.
(ii) Survey — impractical to observe every student every day.
(iii) Stratified sampling — select students from each year level proportionally.
(iv) Possible bias: surveying only one day of the week may not represent typical behaviour.
- Screen time and sleep:
(i) Primary data — participants self-report screen time and sleep duration.
(ii) Survey — too many teenagers in Australia for a census.
(iii) Random sampling — randomly select teenagers from a national register.
(iv) Possible bias: self-reported data may be inaccurate; teenagers may underreport screen time.
- Native bird species:
(i) Primary data — researcher observes and records birds directly in the reserve.
(ii) Survey (or census if the reserve is small enough) — multiple observation sessions across the reserve.
(iii) Systematic sampling — walk set transects across the reserve at set time intervals.
(iv) Possible bias: birds are more active at certain times of day, so surveys done only at midday may miss species active at dawn or dusk.
Data collection comparison table
Row 1 — What do Year 8 students eat for lunch?
- Primary or Secondary:
- Census or Survey:
- Sampling method:
- Possible bias:
Row 2 — Average rainfall in Brisbane last year
- Primary or Secondary:
- Possible bias:
Row 3 — Number of defective phones in a factory batch of 10 000
- Primary or Secondary:
- Census or Survey:
- Sampling method:
- Possible bias:
Evaluate a data collection plan
- Problems: (1) The PE class is a convenience sample — it does not represent students who do not take PE or who chose different electives; (2) surveying on a Friday afternoon may over-represent sports popular at that time. Improvement: Use random or stratified sampling across all Year 8 classes on different days.
- Problems: (1) Self-selection bias — only people who already follow the council’s social media respond, missing non-social-media users; (2) the sample size (300 out of 50 000) is only 0.6%, which is very small. Improvement: Use random phone or mail surveys to reach a broader cross-section of residents.
- Problems: (1) The sample of 6 students is too small to be reliable; (2) measuring only high-achieving students introduces bias — they may not be representative of all students in the school. Improvement: Randomly select students from across all year levels and ability groups.
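The 0.6% figure for the council survey is the sample size as a percentage of the population. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def sampling_percentage(sample_size, population_size):
    """Return the sample size as a percentage of the population."""
    return 100 * sample_size / population_size

# Council survey: 300 responses from 50 000 residents.
council = sampling_percentage(300, 50_000)
print(f"{council:.1f}%")
```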
Categorical vs numerical data
- Colour of school bag:
- Number of siblings:
- Student’s height in cm:
- Type of transport to school:
- Number of books on a shelf:
- Room temperature in °C:
- Favourite subject:
- Time taken to run 100 m:
Design a complete data collection plan
- Research question: “Are students at our school satisfied with the canteen’s food options, prices, and service?”
- Population: all 800 students. Sample size: approximately 80 students (10%). A 10% sample balances practicality with reliability; it is large enough to detect genuine trends across the school.
- Stratified random sampling: divide students by year level (e.g. Years 7–12) and randomly select proportional numbers from each year. This ensures all year levels are represented and the sample reflects the full school population.
- Closed question (e.g. “How satisfied are you with the canteen? Rate from 1 (very dissatisfied) to 5 (very satisfied).”); Open-ended question (e.g. “What one change would most improve the canteen?”)
- Potential bias: students surveyed immediately after a long queue may rate satisfaction lower. Minimise by surveying students at different times of day and on different days of the week.
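The stratified plan above can be sketched in code. The year-level enrolments below are hypothetical (the solution only gives the total of 800 students and a sample of about 80), and the function name and seed are illustrative.

```python
import random

def proportional_allocation(strata_sizes, sample_size):
    """Share the sample across strata in proportion to each stratum's size.
    Simple rounding is used, so totals can drift by one or two for
    enrolments that do not divide evenly."""
    total = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in strata_sizes.items()
    }

# Hypothetical enrolments that add up to the 800 students in the plan.
enrolments = {
    "Year 7": 150, "Year 8": 140, "Year 9": 140,
    "Year 10": 130, "Year 11": 120, "Year 12": 120,
}
allocation = proportional_allocation(enrolments, 80)

# Within each year level, pick students at random.
# Students are represented here by their position on the class roll.
rng = random.Random(1)  # fixed seed so the run is repeatable
sample = {
    year: rng.sample(range(enrolments[year]), k)
    for year, k in allocation.items()
}

for year, k in allocation.items():
    print(f"{year}: survey {k} of {enrolments[year]} students")
```

With these enrolments each year level contributes 10% of its students, so the sample mirrors the make-up of the school.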