L55 — Mixed Data Analysis
A complete statistical inquiry follows this cycle:
- Pose a question
- Collect or obtain data
- Display the data
- Calculate measures (mean, median, mode, range)
- Interpret — what do the measures tell you?
- Conclude — answer the original question
Worked Example
Dataset: {45, 52, 67, 71, 43, 58, 72, 65, 49, 76, 61, 54}
Sorted: 43, 45, 49, 52, 54, 58, 61, 65, 67, 71, 72, 76
4 | 3 5 9
5 | 2 4 8
6 | 1 5 7
7 | 1 2 6
Key: 4|3 = 43
Mean = 713 ÷ 12 ≈ 59.4 (1 d.p.) | Median = (58 + 61) ÷ 2 = 59.5 | No mode | Range = 76 − 43 = 33
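The arithmetic in the worked example can be checked with a short Python sketch. It assumes Python 3 and uses only the standard library (`statistics.mean`, `statistics.median`, `collections.Counter`). The function name `summarise` is our own, not a library function, and an empty list of modes stands for "no mode".

```python
from collections import Counter
from statistics import mean, median


def summarise(data):
    """Return the mean, median, modes and range of a list of numbers.

    'modes' is an empty list when every value occurs only once ("no mode").
    """
    counts = Counter(data)
    top = max(counts.values())
    # If nothing repeats there is no mode, which is different from "the mode is 0".
    modes = sorted(v for v, c in counts.items() if c == top) if top > 1 else []
    return {
        "mean": mean(data),
        "median": median(data),  # median() sorts a copy of the data itself
        "modes": modes,
        "range": max(data) - min(data),
    }


scores = [45, 52, 67, 71, 43, 58, 72, 65, 49, 76, 61, 54]
result = summarise(scores)
print(round(result["mean"], 1), result["median"], result["modes"], result["range"])
# prints: 59.4 59.5 [] 33
```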
Conclusions: "The class averaged around 59–60, suggesting mid-range achievement. The range of 33 marks indicates moderate variability."
Key Terms
- statistical inquiry cycle: the process of posing a question, collecting data, displaying it, calculating measures, interpreting, and concluding
- conclusion: a statement that answers the original question using specific data values; must stay within what the data shows
- limitation: a factor that restricts what conclusions can be drawn, e.g. small sample, biased collection, or data from a different context
- over-generalise: to draw conclusions that go further than the data supports, e.g. using one class's results to claim something about all students in Australia
Putting It All Together
This lesson applies everything from Year 7 Data. A complete analysis has three stages: organise and display, calculate measures, then interpret and communicate. Skipping any stage makes your analysis incomplete.
Stage 1 — Choose and Create a Display
Ask: what type of data is this, and what am I trying to show?
- Stem-and-leaf / dot plot — small numerical datasets; shows full distribution
- Back-to-back stem-and-leaf — comparing two groups
- Line graph — showing how a quantity changes over time
- Column / bar graph — comparing categories
Always include a title, labelled axes, and a key if needed.
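Building a stem-and-leaf plot is mechanical enough to sketch in Python. This sketch assumes whole numbers from 0 to 99, so the stem is the tens digit and the leaf is the units digit; `stem_and_leaf` is a hypothetical helper name. Run on the worked-example data, it reproduces that plot.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Build the rows of a stem-and-leaf plot for whole numbers 0-99.

    Stem = tens digit, leaf = units digit. Every stem from the smallest to
    the largest is listed, even if it has no leaves.
    """
    rows = defaultdict(list)
    for value in sorted(data):  # sort first so each row's leaves are in order
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    lines = []
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows[stem])
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


for line in stem_and_leaf([45, 52, 67, 71, 43, 58, 72, 65, 49, 76, 61, 54]):
    print(line)
print("Key: 4|3 = 43")
# 4 | 3 5 9
# 5 | 2 4 8
# 6 | 1 5 7
# 7 | 1 2 6
# Key: 4|3 = 43
```

Listing empty stems (for example "2 |") keeps gaps in the data visible, which matters when you read the shape of the distribution.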
Stage 2 — Calculate the Measures
For a complete statistical summary:
- Mean — sum ÷ count
- Median — middle value in sorted data
- Mode — most frequent value
- Range — maximum − minimum
If there is an outlier, note it and consider whether mean or median better represents the data.
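To see why the outlier note matters, this sketch compares the mean and median with and without one suspected outlier. The function `outlier_effect` is a hypothetical helper, and the extra score of 2 added to the worked-example marks is made up purely for illustration.

```python
from statistics import mean, median


def outlier_effect(data, outlier):
    """Compare the mean and median with and without one suspected outlier."""
    without = list(data)
    without.remove(outlier)  # removes a single occurrence of the outlier
    return {
        "mean": (mean(data), mean(without)),
        "median": (median(data), median(without)),
    }


# Hypothetical example: the worked-example marks plus one made-up score of 2.
marks = [45, 52, 67, 71, 43, 58, 72, 65, 49, 76, 61, 54, 2]
effect = outlier_effect(marks, 2)
for measure, (with_it, without_it) in effect.items():
    print(f"{measure}: {with_it:.1f} with the outlier, {without_it:.1f} without")
# mean: 55.0 with the outlier, 59.4 without
# median: 58.0 with the outlier, 59.5 without
```

With this made-up value the mean falls by more than 4 marks while the median moves by only 1.5, which is why the median is usually the safer summary when an outlier is present.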
Stage 3 — Interpret and Write Conclusions
A good conclusion:
- Answers the original question using actual numbers from the data
- Uses language like "the data suggests...", "on average...", "the results indicate..."
- Does not over-generalise beyond the sample
- Notes any limitations — small sample? biased collection? outliers?
When comparing two groups, structure your conclusion as: "Group A had a [higher/lower] [measure] of [value] compared to Group B's [measure] of [value], suggesting..."
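The comparison template can be filled in mechanically, which is a useful reminder that every claim needs an actual value and the size of the difference. `compare_sentence` is a hypothetical helper and the group values in the example are made up; the interpretation after "suggesting" is deliberately left for you to write.

```python
def compare_sentence(group_a, value_a, group_b, value_b, measure, unit=""):
    """Fill in the comparison template using actual values.

    The interpretation after "suggesting" is left open, because it depends on
    context that the numbers alone cannot supply.
    """
    if value_a == value_b:
        return f"{group_a} and {group_b} had the same {measure} of {value_a}{unit}."
    direction = "higher" if value_a > value_b else "lower"
    difference = round(abs(value_a - value_b), 2)  # round to avoid float noise
    return (f"{group_a} had a {direction} {measure} of {value_a}{unit} "
            f"compared to {group_b}'s {measure} of {value_b}{unit} "
            f"(a difference of {difference}{unit}), suggesting ...")


# Made-up values for illustration only.
print(compare_sentence("Group A", 62, "Group B", 55, "median", " marks"))
```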
Common Mistakes to Avoid
- Forgetting to sort data before finding the median
- Using the mean when there is a clear outlier
- Writing "Group A is better" without saying by how much or in what way
- Drawing a line graph for categorical data
- Confusing "no mode" with "the mode is 0"
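The first mistake in the list is easy to demonstrate on the worked-example data. This sketch takes the middle values without sorting, then with sorting; `middle_of` is a hypothetical helper. Python's `statistics.median` sorts the data itself, so it always agrees with the sorted version.

```python
from statistics import median


def middle_of(values):
    """Take the middle value(s) in the order given, WITHOUT sorting."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


scores = [45, 52, 67, 71, 43, 58, 72, 65, 49, 76, 61, 54]
print(middle_of(scores))          # 65.0 (wrong: the data was not sorted first)
print(middle_of(sorted(scores)))  # 59.5 (sort first, then take the middle)
print(median(scores))             # 59.5 (statistics.median sorts for you)
```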
Calculate All Four Measures
For each dataset, find the mean, median, mode, and range.
- Dataset A — Goals scored per game (10 games): {3, 5, 2, 4, 5, 1, 3, 6, 5, 2}
- Dataset B — Heights (cm) of 12 students: {148, 152, 165, 147, 158, 152, 171, 149, 162, 152, 155, 161}
- Dataset C — Time (min) to complete a puzzle (14 participants): {8, 12, 7, 15, 9, 11, 8, 14, 10, 8, 13, 6, 11, 9}
Choose the Best Display Type
For each dataset, choose the best display type and explain your choice. (Display types: dot plot, stem-and-leaf plot, line graph, column/bar graph, pie/sector chart)
- Number of pets owned by each student in a class of 28 students (values range from 0 to 6).
- Percentage of students who chose each of four different sports as their favourite.
- Heights (cm) of 30 students to show the overall shape of the distribution.
- Temperature recorded every hour over a 24-hour period.
Full Analysis
Number of books read by 15 students over the school holidays: {4, 7, 2, 9, 5, 3, 7, 1, 8, 6, 7, 4, 5, 3, 6}
- Write out the stems and leaves for a stem-and-leaf plot of this data.
- Calculate the mean, median, mode, and range.
- Write three conclusions about reading habits during the holidays based on your analysis.
Sort and Calculate
Sort each dataset and find mean, median, mode, and range.
- Dataset D — Daily steps over 8 days: {6 200, 8 400, 5 900, 7 100, 8 400, 9 000, 6 800, 8 400}
- Dataset E — Marks on 10 spelling tests: {14, 16, 20, 18, 14, 17, 20, 14, 19, 16}
- Dataset F — Ages of members of a chess club: {12, 14, 13, 15, 12, 14, 16, 12, 13, 14, 15, 12}
Which Measure of Centre?
For each dataset, state which measure of centre best summarises it. Explain your choice.
- House prices: {$420 000, $450 000, $380 000, $410 000, $1 800 000, $430 000, $400 000}
- Shoe sizes in a class of 25 students.
- 1 km run times (minutes): {5.2, 5.8, 6.1, 5.5, 5.9, 6.3, 5.7, 5.4, 5.6, 6.0}
- Number of siblings each student has: {0, 1, 2, 1, 0, 3, 1, 0, 2, 1, 0, 1, 4, 1, 0}
Outliers and Their Effect
- Dataset: {4, 5, 6, 5, 7, 6, 5, 4, 6, 50}. Calculate the mean, median, and mode with and without the outlier (50). Which measure is most affected?
- A student scores {72, 68, 75, 74, 71, 70, 69, 3} on 8 tests (she scored 3 because she was sick). Calculate the mean with and without the score of 3. How does the mean change?
- Explain in your own words why the median is often preferred when a dataset contains an outlier.
- Give a real-world example where an outlier would make the mean a misleading summary statistic.
Stem-and-Leaf Plots
- Draw a stem-and-leaf plot for: {34, 41, 27, 38, 52, 29, 45, 33, 48, 56, 31, 44, 37, 50, 26}
- Use your plot to find the median and mode.
- Calculate the mean and range for this dataset.
- Using the plot, identify the stem (tens digit) in which most values appear.
- Read the following plot and list all the original data values:
2 | 1 4 8
3 | 0 3 5 9
4 | 2 6
5 | 1 7
Key: 2|1 = 21
- For the plot above, find the median, mode, and range.
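Reading a plot back into data is the reverse of building it: multiply each stem by 10 and add each leaf. This sketch assumes a key like 4|3 = 43 and uses the worked-example plot; `read_plot` is a hypothetical helper. Because the leaves in each row are already in order, the values come out sorted, ready for finding the median.

```python
def read_plot(rows):
    """Turn stem-and-leaf rows back into data values.

    'rows' maps each stem (tens digit) to its list of leaves (units digits),
    assuming a key like 4|3 = 43.
    """
    return [stem * 10 + leaf for stem in sorted(rows) for leaf in rows[stem]]


worked_example = {4: [3, 5, 9], 5: [2, 4, 8], 6: [1, 5, 7], 7: [1, 2, 6]}
values = read_plot(worked_example)
print(values)  # [43, 45, 49, 52, 54, 58, 61, 65, 67, 71, 72, 76]
```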
Compare Two Classes
Test scores for two classes:
Class A (leaves) | Stem | Class B (leaves)
9 8 5 2 | 2 | 3 6 7
6 4 1 | 3 | 1 4 5 8
8 5 3 0 | 4 | 2 4 7
2 | 5 | 0
Key: the stem is the tens digit for both classes. Class A leaves sit to the left of the stem (e.g. leaf 2 on stem 5 = 52); Class B leaves sit to the right (e.g. stem 5, leaf 0 = 50).
- List all scores for Class A and Class B.
- Calculate the mean and range for each class.
- Find the median for each class.
- Which class performed better? Use statistics to justify your answer.
- Which class had more spread in their results? Explain.
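In a back-to-back plot the stem is shared, so a leaf on either side combines with the stem in the same way (tens digit, then units digit); only the direction in which the leaves are written differs. This sketch expands rows given as (left leaves, stem, right leaves). `read_back_to_back` is a hypothetical helper; the stem-5 row is the one used in the key, while the stem-4 row is made up for illustration.

```python
def read_back_to_back(rows):
    """Expand back-to-back stem-and-leaf rows into two lists of values.

    Each row is (left_leaves, stem, right_leaves). Leaves on BOTH sides use
    the same stem as the tens digit, so left leaf 2 on stem 5 is 52 and
    right leaf 0 on stem 5 is 50. Left leaves are written decreasing towards
    the stem, so they are sorted here before use.
    """
    left, right = [], []
    for left_leaves, stem, right_leaves in rows:
        left.extend(stem * 10 + leaf for leaf in sorted(left_leaves))
        right.extend(stem * 10 + leaf for leaf in right_leaves)
    return left, right


# Stem-4 row is made up; stem-5 row matches the key above.
class_a, class_b = read_back_to_back([([7, 1], 4, [3, 8]), ([2], 5, [0])])
print(class_a, class_b)  # [41, 47, 52] [43, 48, 50]
```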
Interpreting a Column Graph
How Year 7 students travel to school:
Method | Walk | Bus | Car | Train | Cycle
Number of students | 28 | 45 | 37 | 18 | 12
- How many students are represented in total?
- What percentage of students travel by bus? (Round to 1 decimal place.)
- What fraction of students travel by car or walk? Express as a simplified fraction.
- Which pairs of methods together account for more than half of all students?
- A student says the mode of this data is "Bus." Is she correct? Explain what mode means in this context.
- Write two conclusions about how Year 7 students travel to school, using the data to support each.
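The column-graph questions involve totals, percentages and fractions, which this sketch computes from the table so you can check your working. It assumes Python 3's `fractions.Fraction` (which reduces fractions to simplest form automatically) and `itertools.combinations`; the helper names are hypothetical. `pairs_over_half` lists every pair of methods whose combined count exceeds half the total, since more than one pair may qualify.

```python
from fractions import Fraction
from itertools import combinations


def category_summary(counts):
    """Total, percentage per category (1 d.p.) and the modal category."""
    total = sum(counts.values())
    percentages = {k: round(100 * v / total, 1) for k, v in counts.items()}
    mode = max(counts, key=counts.get)  # assumes a single highest count
    return total, percentages, mode


def fraction_of(counts, *categories):
    """Share of the total held by the given categories, in simplest form."""
    return Fraction(sum(counts[c] for c in categories), sum(counts.values()))


def pairs_over_half(counts):
    """Every pair of categories that together make up more than half the total."""
    half = sum(counts.values()) / 2
    return [pair for pair in combinations(counts, 2)
            if counts[pair[0]] + counts[pair[1]] > half]


travel = {"Walk": 28, "Bus": 45, "Car": 37, "Train": 18, "Cycle": 12}
total, percentages, mode = category_summary(travel)
print(total, percentages["Bus"], mode)
print(fraction_of(travel, "Car", "Walk"))
print(pairs_over_half(travel))
```

For categorical data like this, the mode is a category (the most common method), not a number.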
Full Statistical Investigation
Minutes per day spent reading for pleasure (20 Year 7 students):
{0, 15, 30, 20, 45, 10, 0, 25, 60, 15, 30, 20, 10, 45, 0, 30, 20, 15, 40, 25}
- Sort the data and construct a stem-and-leaf plot (use stems 0, 1, 2, 3, 4, 5, 6).
- Calculate the mean, median, mode, and range. Show all working.
- The researcher says: "Most Year 7 students don't read at all for pleasure." Does the data support this claim? Explain using specific statistics.
- Identify any potential outliers and explain how they affect the mean.
- Write a brief report (4–5 sentences) summarising the reading habits of this group. Use measures of centre and spread.