Issues in Data Collection
Types of Bias in Data Collection
- Voluntary response bias — only people who feel strongly bother to respond.
- Convenience bias — surveying people who happen to be nearby (e.g. friends, class).
- Leading question bias — question wording nudges respondents toward a particular answer.
- Small sample bias — too few respondents, so chance variation has a large effect on the results.
Worked Example
Question: A student writes: "Do you agree that our school canteen food is terrible?" Identify the problem and suggest an improvement.
Step 1 — Identify the bias. The word "terrible" is loaded and negative — this is a leading question.
Step 2 — Explain the effect. Respondents are nudged to say "yes" because of the wording, not genuine dislike.
Step 3 — Neutral version: "How would you rate the quality of the school canteen food?"
Options: Excellent / Good / Average / Poor / Very Poor
Key Terms
- population: the entire group you want information about (e.g. all students at a school)
- sample: a smaller group selected from the population to represent it
- bias: a systematic error in data collection that causes results to consistently favour certain outcomes
- primary data: data you collect yourself through surveys, experiments, or observations
- secondary data: data collected by someone else (e.g. ABS, news reports, databases)
Why Data Collection Matters
The way you collect data can completely change what the results say — and whether you can trust them. Good data collection is fair, representative, and free from bias. Poor data collection leads to conclusions that sound convincing but are misleading.
Populations and Samples
A population is the entire group you want information about. A sample is a smaller group selected to represent it. We use samples because surveying everyone is usually impossible.
- A well-chosen sample gives results close to what the whole population would say
- A poorly chosen sample gives biased results — they favour one group over another
- Larger samples are more reliable only if chosen properly
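The idea that a well-chosen random sample gives results close to the population can be demonstrated with a short simulation. The sketch below uses only Python's standard library; the population figures (1 000 students, 60% preferring soccer) are invented purely for illustration.

```python
import random

random.seed(1)  # fixed seed so the demonstration is reproducible

# Hypothetical population: 1000 students, 600 of whom prefer soccer (invented figures)
population = ["soccer"] * 600 + ["other"] * 400

# Draw a random sample of 50 students
sample = random.sample(population, 50)

true_prop = population.count("soccer") / len(population)
sample_prop = sample.count("soccer") / len(sample)

print(f"Population proportion: {true_prop:.2f}")
print(f"Sample proportion:     {sample_prop:.2f}")
```

Re-running with different seeds shows the sample proportion hovering near the population value, which is exactly what "representative" means in practice.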
Types of Bias
- Leading question bias: the wording pushes people toward a particular answer. Example: "Don't you agree the canteen food is terrible?" A neutral version: "How do you rate the canteen food?" with balanced options.
- Convenience bias: the sample consists only of easy-to-reach people — like asking friends or only your class. These may not represent the wider population.
- Voluntary response bias: people self-select to respond (e.g. an online poll). People with strong opinions are over-represented.
- Small sample bias: too few people means chance variation has a big effect on results.
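Small sample bias in particular can be seen directly by repeating the same survey at two different sample sizes. A minimal sketch (Python standard library; the 50/50 population split is an invented assumption):

```python
import random

random.seed(42)  # reproducible for demonstration

# Hypothetical population where exactly half answer "yes" (invented)
population = ["yes"] * 500 + ["no"] * 500

def yes_percentage(sample_size):
    """Run one survey of the given size and return the % answering yes."""
    sample = random.sample(population, sample_size)
    return 100 * sample.count("yes") / sample_size

# Repeat each survey 5 times to see how much the results vary
small = [yes_percentage(5) for _ in range(5)]
large = [yes_percentage(100) for _ in range(5)]

print("Samples of 5:  ", small)   # tends to swing widely around 50%
print("Samples of 100:", large)   # tends to stay close to 50%
```

The small surveys can easily land at 20% or 80% purely by chance, while the larger ones cluster near the true 50% figure.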
Primary vs Secondary Data
- Primary data — collected yourself (surveys, experiments, observations). You control how it is collected.
- Secondary data — collected by someone else (ABS, newspapers, websites). Useful for large datasets, but evaluate the source for trustworthiness, currency, and relevance.
Primary or Secondary Data
Classify each as primary or secondary data.
- You survey 30 classmates about their favourite sport.
- You download population statistics from the Australian Bureau of Statistics website.
- You record the temperature outside your house every morning for two weeks.
- You read a newspaper article reporting average household income.
- You weigh the school bags of students in your class.
Biased or Unbiased?
Classify each sample as biased or unbiased. Give a brief reason.
- To find out students' favourite lunch, a teacher selects every 5th student from the school roll.
- A TV show asks viewers to text in their vote for the "best ever" episode.
- A student surveys only her friends about whether students should have less homework.
- A doctor randomly selects patient files from the database to study recovery times.
- An online poll lets website visitors vote as many times as they like.
Identify the Type of Bias
Identify the type of bias in each scenario: voluntary response, convenience, leading question, or small sample.
- A journalist asks: "Wouldn't you agree that the new road is an improvement?"
- A student surveys the 3 people sitting next to him at the library to draw conclusions about the whole school.
- A website posts: "Do you think social media is harmful? Click here to vote." Only 12% of visitors click.
- A researcher draws conclusions about Australian eating habits after only surveying people at one café in Sydney CBD at lunchtime.
- A radio station runs a competition where listeners call in to share opinions on local traffic; only very frustrated commuters call.
Rewrite the Biased Question
Rewrite each biased question to make it neutral and fair.
- "Don't you think that too much homework is unfair to students?"
- "Surely you agree that the new park design is much better than the old one?"
- "How often do you waste time on your phone?"
- "Do you support the unnecessary and costly new traffic lights on Main Street?"
Problem Solving
- A school wants to find out what sport students most enjoy playing at lunch. Design a fair survey process: write the survey question, describe who you would survey, and explain your sampling method.
- A student wants to find out students' preferred school start time. She plans to survey students arriving at the bus stop at 7:45 am. Explain why this convenience sample is likely to be problematic and what kind of bias it may introduce.
- A newspaper headline reads: "80% of parents support longer school holidays — survey of 15 parents at school fete." Identify two problems with this survey and explain how they could affect the reliability of the result.
True or False?
State True or False and explain your reasoning.
- A sample is always less reliable than surveying the whole population.
- If you survey 500 people, the results must be accurate.
- Random selection helps reduce bias.
- A biased sample of 1 000 people gives more reliable results than an unbiased sample of 50 people.
Identify Population and Sample
For each scenario, identify the population and the sample.
- A researcher wants to know how many hours per week Queensland teenagers spend on social media. She surveys 400 Year 7–12 students across 10 schools.
- A council wants to know if local residents support a new dog park. They survey 80 people who attend the community meeting.
- A scientist wants to test the quality of water in all rivers in Australia. She takes water samples from 25 rivers.
- A company wants to know if its product is popular with Australian households. It calls 1 200 randomly selected phone numbers.
Evaluate and Improve the Method
For each survey method, identify the problem and suggest an improvement.
- To find out if Year 7 students enjoy maths, a teacher only asks the top maths class.
- A health researcher wants to find average Australian adult sleep patterns. She posts a survey on a fitness app and receives 2 000 responses.
- To determine the most popular school subject, a student asks everyone at lunchtime on Monday. About 60 out of 300 students respond.
- A local council wants to know if residents support a new skate park. They place a form in the council newsletter and 45 people respond out of 8 000 households.
Data Collection Design
- You want to find out the most popular after-school activity among all Year 7 students at your school (approximately 120 students). Describe two different sampling methods and explain one advantage of each.
- Explain why surveying the whole population (a census) is not always practical. Give two real-world examples where a sample is used instead.
- What is the difference between a random sample and a representative sample? Can a sample be random but not representative? Give an example.
Extended Investigation
A student is investigating: "How many hours of homework do Year 7 students do each week?" She asks 10 students in her class on a Friday afternoon: {2, 4, 1, 3, 5, 2, 3, 4, 6, 2}.
- Identify two potential sources of bias in how she collected this data.
- Explain how each bias could affect her results.
- Redesign the data collection process to make it more reliable.
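As a quick numerical check alongside the bias questions, the centre of the ten responses listed above can be computed with a short snippet:

```python
data = [2, 4, 1, 3, 5, 2, 3, 4, 6, 2]  # hours reported by the 10 students

mean_hours = sum(data) / len(data)
print(f"Mean: {mean_hours} hours per week")  # 32 / 10 = 3.2
```

Note that a tidy mean of 3.2 hours says nothing about whether the sample was fair; a biased sample still produces a precise-looking number.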
A news article reports: "A survey found that 3 in 4 Australians believe the government is doing a poor job." The survey was posted on an opposition party's social media page; 1 200 followers responded.
- Identify the type(s) of bias present.
- Explain why the large sample size of 1 200 does not make the results reliable.
- Describe how you would conduct a more reliable version of the same survey.