Collecting Data and Sample Size

Key Ideas

Key Terms

population: the entire group being studied (e.g. all Year 8 students in Queensland).
sample: a smaller group selected from the population, used to make inferences about the whole.
census: data collected from every member of the population.
survey: data collected from a sample of the population.
primary data: data collected firsthand by the researcher (e.g. conducting a survey or experiment).
secondary data: data collected by someone else and used by the researcher (e.g. ABS statistics, published studies).
random sampling: every member of the population has an equal chance of selection. Other methods: systematic (every nth member of a list), stratified (random selection from each subgroup, in proportion to its size), convenience (whoever is easiest to reach).
sample size: the number of individuals in a sample. Larger samples give more reliable estimates with less variability.

Hot Tip: A sample must be representative of the population. Convenience samples (such as asking your friends) are often biased because they do not reflect the wider group.

Worked Example

Question: A school wants to know students’ favourite sport. They survey every 5th student on the roll. Identify: (a) the population (b) the sample (c) the sampling method.

(a) Population: All students at the school.

(b) Sample: Every 5th student on the roll.

(c) Sampling method: Systematic sampling — students are selected at regular intervals.

Populations and Samples

In statistics, the population is the entire group you want to study — every single member of it. For example, if you want to know the average height of all Year 8 students in Queensland, the population is every Year 8 student in Queensland. That could be tens of thousands of students!

Because it is usually impossible (or too expensive and time-consuming) to measure every member of a population, we instead study a smaller group called a sample. A sample is a subset of the population that we use to make inferences about the whole group. The key goal is for the sample to be representative — it should reflect the diversity of the population, not just a convenient or biased subset.

When data is collected from the entire population, it is called a census. Australia's national census, for example, collects information from every household in the country. When data is collected from only a part of the population, it is a survey or sample study.

Types of Sampling

Simple random sampling gives every member of the population an equal chance of being selected. Drawing names from a hat or using a random number generator are examples. This is the gold standard for avoiding bias.
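
As a rough illustration of drawing names from a hat, the short Python sketch below uses random.sample from the standard library. The roll of 30 students, the sample size of 10 and the function name simple_random_sample are all made up for this example.

```python
import random

# Hypothetical class roll: 30 made-up student labels.
roll = [f"Student {i}" for i in range(1, 31)]

def simple_random_sample(population, k, seed=None):
    """Pick k members so that every member has the same chance of selection."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable for checking
    return rng.sample(population, k)  # selection without replacement

chosen = simple_random_sample(roll, 10, seed=1)
print(chosen)
```

Because the selection is made without replacement, no student can be drawn twice, just as a name taken out of the hat is not put back.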

Systematic sampling selects every nth member of a list. For example, choosing every 5th student on a roll. It is convenient but can be biased if there is a pattern in the list that repeats with the same frequency as your selection interval.
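
The sketch below shows one possible way to take every nth member of a list in Python, and then shows how a repeating pattern in the list can spoil the sample. The rolls are invented, and the optional random starting position is a common refinement assumed here, not something the lesson requires.

```python
import random

def systematic_sample(ordered_list, interval, start=None, seed=None):
    """Take every `interval`-th member, beginning at position `start`."""
    if start is None:
        # Assumption: if no start is given, choose one at random within the first interval.
        start = random.Random(seed).randrange(interval)
    return ordered_list[start::interval]

# Hypothetical roll of 40 students, numbered in roll order.
roll = list(range(1, 41))
every_fifth = systematic_sample(roll, 5, start=4)
print(every_fifth)  # [5, 10, 15, 20, 25, 30, 35, 40]

# Pattern problem: suppose the roll alternates Year 7, Year 8, Year 7, ...
alternating = ["Year 7", "Year 8"] * 20
picked_levels = set(systematic_sample(alternating, 2, start=0))
print(picked_levels)  # {'Year 7'}
```

In the second case the selection interval (2) matches the repeating pattern in the list, so one year level is never selected at all.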

Stratified sampling divides the population into subgroups (strata) — such as year levels, genders, or suburbs — and then randomly selects from each stratum in proportion to its size. This ensures all subgroups are fairly represented.
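
A minimal sketch of proportional stratified sampling is given below. The school of 800 students, the three year-level sizes and the function name stratified_sample are assumptions chosen so the arithmetic works out to whole numbers.

```python
import random

def stratified_sample(strata, total, seed=None):
    """Randomly select from each stratum in proportion to its size."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Share of the sample = share of the population, rounded to a whole person.
        k = round(total * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical school of 800 students split across three year levels.
strata = {
    "Year 7": [f"Y7-{i}" for i in range(320)],
    "Year 8": [f"Y8-{i}" for i in range(280)],
    "Year 9": [f"Y9-{i}" for i in range(200)],
}
sample = stratified_sample(strata, 80, seed=2)
counts = {name: len(chosen) for name, chosen in sample.items()}
print(counts)  # {'Year 7': 32, 'Year 8': 28, 'Year 9': 20}
```

Year 7 makes up 320 of the 800 students (40%), so it receives 40% of the 80 places, which is 32. With less tidy numbers the rounding step can make the total differ slightly from the target sample size.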

Convenience sampling selects whoever is easiest to reach — like asking your friends or people at a local shopping centre. This is quick but often biased, because it does not represent the full population.

Primary and Secondary Data

Primary data is collected firsthand by the researcher for a specific purpose. Examples include conducting your own survey, running an experiment, or making direct observations. Primary data is fresh and tailored to your question, but it takes time and effort to collect.

Secondary data is data that someone else has already collected, which you then access and use. Examples include government statistics (like ABS data), published research papers, weather records, or school databases. Secondary data is quicker to obtain but may not perfectly match your research question, and you need to consider whether it is reliable and up to date.

Sample Size and Reliability

The sample size is the number of individuals in your sample. In general, larger samples give more reliable and accurate estimates of the population. A small sample might happen to include unusual values that skew the results. This chance difference between a sample's result and the true population value is called sampling variation or sampling error.

Think of it this way: if you flip a coin 10 times, you might get 8 heads just by chance. But if you flip it 1000 times, you would expect results much closer to 50% heads, 50% tails. The same principle applies to surveys: a poll of 50 people could easily give a misleading result, while a poll of 2000 people is likely to be close to the true population value.
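
The coin and poll comparison can be checked with a little arithmetic. The sketch below works out the chance of at least 8 heads in 10 flips of a fair coin, and then a rough measure of chance error for polls of 50 and 2000 people. That measure, the standard error of a proportion, goes beyond this lesson and is included only as an illustration. It assumes a simple random sample and a true population value of 50%.

```python
from math import comb, sqrt

def prob_at_least(heads, flips):
    """Chance of at least `heads` heads in `flips` tosses of a fair coin."""
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips

def standard_error(p, n):
    """Typical size of the chance error in a sample proportion."""
    return sqrt(p * (1 - p) / n)

print(round(prob_at_least(8, 10), 3))  # 0.055
for n in (50, 2000):
    print(n, round(standard_error(0.5, n), 3))  # 50 -> 0.071, 2000 -> 0.011
```

So 8 or more heads in 10 flips happens roughly 1 time in 18 by chance alone. Under the stated assumptions, a poll of 50 people is typically off by about 7 percentage points, while a poll of 2000 is typically off by about 1.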

However, a larger sample size does not fix a badly designed sample. If your sample is biased (e.g. only asking students who like sport whether sport should be compulsory), a larger sample just gives you the same biased answer from more people.
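
To see why size alone does not remove bias, the simulation sketch below compares two samples of the same size taken from a made-up school. Every number in it is an assumption for illustration: 800 students, 300 of whom like sport, with 90% of sport fans and 30% of other students saying yes to compulsory sport.

```python
import random

rng = random.Random(3)  # fixed seed so the simulation can be repeated

# Build the hypothetical school: each student is (likes_sport, says_yes).
students = []
for i in range(800):
    likes_sport = i < 300
    says_yes = rng.random() < (0.9 if likes_sport else 0.3)
    students.append((likes_sport, says_yes))

def percent_yes(group):
    """Percentage of a group answering yes to compulsory sport."""
    return 100 * sum(yes for _, yes in group) / len(group)

fans_only = [s for s in students if s[0]]
biased_sample = rng.sample(fans_only, 200)  # large, but only sport fans
random_sample = rng.sample(students, 200)   # same size, drawn from everyone

print("whole school:", round(percent_yes(students)))
print("biased sample:", round(percent_yes(biased_sample)))
print("random sample:", round(percent_yes(random_sample)))
```

Both samples contain 200 students, but only the random sample should land near the whole-school figure. The sample of sport fans stays far too high however many fans are asked.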

Key tip: The two most important things about a sample are that it is representative and large enough. A sample of 5 friends is almost never enough to draw conclusions about all 800 students in a school. When evaluating someone else's claim, always ask: How big was the sample? How was it selected? Could there be any bias?

Mastery Practice

Classify each data collection method. Write Primary or Secondary, and Census or Survey. Fluency
1. A researcher reads last year’s government rainfall records.
2. A teacher records the test score of every student in the class.
3. A student asks 20 classmates what their favourite subject is.
4. A company uses sales data collected by a market research firm.
5. A biologist counts every bird in a national park.
6. A journalist uses unemployment figures published by the ABS.
7. A sports club surveys a random selection of members about training times.
8. A council records the age of every resident in its database.
For each scenario, identify the population and the sample. Fluency
1. A factory tests 50 out of every 1000 light bulbs it produces to check for defects.
2. A Year 8 teacher selects 8 students from her class to represent the school in a survey about uniforms.
3. A radio station rings 200 listeners to ask about their favourite music genre.
4. A vet clinic asks 30 clients about their pets’ dietary habits.
5. A city council surveys 400 of the city’s 80 000 residents about a new park.
6. A researcher selects 25 trees from a forest of 4000 to study leaf size.
7. A supermarket surveys 100 shoppers exiting the store on a Saturday morning.
8. A swim coach times 12 of the 60 swimmers at a carnival.
Classify each as random, systematic, stratified, or convenience sampling. Fluency
1. A teacher puts all students’ names in a hat and draws 10.
2. A researcher surveys every 10th customer entering a shopping centre.
3. A student asks the first 15 people she sees at the bus stop.
4. A company ensures 40% of surveyed employees are from administration and 60% are from production, matching the actual workforce proportions.
5. Numbered tickets are placed in a barrel and 50 are drawn at random.
6. A journalist interviews shoppers who are willing to stop and chat.
7. A school selects 10 students from each year level for a wellbeing survey.
8. A quality controller checks every 20th item on a production line.
For each scenario, state whether a larger or smaller sample would give more reliable results, and explain why. Understanding
1. Estimating the proportion of left-handed students in Queensland schools: asking 10 students vs. 1000 students.
2. Testing whether a new pain relief tablet works: trialling on 5 patients vs. 200 patients.
3. Determining the most popular flavour of ice-cream in a suburb: asking 15 people vs. 150 people.
4. Predicting election results: polling 50 voters vs. 2000 voters.
5. Checking whether a coin is fair: flipping it 10 times vs. 500 times.
6. Estimating the average height of Year 8 students in Australia: measuring 20 students vs. 2000 students.
Identify the potential bias or problem with each survey question or method. Understanding
1. A fast food company asks its own employees: “Do you think our food is great?”
2. A survey question reads: “Don’t you agree that the school should have a longer lunch break?”
3. To find out what Year 8 students think about sport, a researcher only surveys students at a sports carnival.
4. A survey is conducted online, but only people who visit the school’s website see it.
5. A question asks: “How many hours do you waste watching television each week?”
6. A shopping centre surveys its customers to represent the views of all people in the local area.
For each research question, design a data collection study. State: (i) primary or secondary data, (ii) census or survey, (iii) sampling method, (iv) one possible source of bias. Problem Solving
1. Research question: What percentage of students at your school bring their lunch from home?
2. Research question: How does screen time affect sleep duration in teenagers across Australia?
3. Research question: What is the most common native bird species in a local bushland reserve?

Complete the table by filling in the most appropriate answer for each scenario. Understanding

Research question	Primary or Secondary?	Census or Survey?	Sampling method	Possible bias?
What do Year 8 students eat for lunch?	?	?	?	?
Average rainfall in Brisbane last year	?	N/A	N/A	?
Number of defective phones in a factory batch of 10 000	?	?	?	?

Read each data collection plan. Identify two things that could go wrong and suggest an improvement. Understanding
1. A student wants to know the favourite sport of all Year 8 students at their school (200 students). They survey their PE class of 25 students on a Friday afternoon.
2. A journalist wants to know whether people in a city support a new highway. They post a poll on the city council’s social media page and get 300 responses out of 50 000 residents.
3. A school nurse wants to know the average height of students. She measures all 6 students in the highest-performing academic class.
Classify each variable as categorical or numerical (discrete or continuous). Understanding
1. The colour of a student’s school bag.
2. The number of siblings a student has.
3. A student’s height in centimetres.
4. The type of transport used to get to school.
5. The number of books on a shelf.
6. The temperature of a room in degrees Celsius.
7. A student’s favourite subject.
8. The time taken to run 100 m.
Design a complete data collection plan for the following scenario. Include all details. Problem Solving

Scenario: Your school principal wants to know whether students are satisfied with the school canteen. There are 800 students at your school.
1. State the research question clearly.
2. Identify the population and suggest an appropriate sample size. Justify your choice.
3. Choose a sampling method and explain why it would give a representative sample.
4. Write two survey questions you would include (one closed, one open-ended).
5. Identify one potential source of bias and explain how you would minimise it.

