Comparing Data Sets
Key Terms
- Five-number summary: the minimum, Q1, median (Q2), Q3 and maximum; the five values needed to draw a box plot.
- Box plot (box-and-whisker plot): a graphical display of the five-number summary. The box spans Q1 to Q3, the line inside the box marks the median, and the whiskers extend to the minimum and maximum values, excluding outliers.
- Parallel box plots: two or more box plots drawn on the same scale, enabling direct comparison of centre and spread.
- Outlier on a box plot: a value more than 1.5 × IQR below Q1 or above Q3, shown as an individual point (× or dot) beyond the whisker.
- Comparing distributions: always comment on centre (median), spread (IQR and range), shape (skew or symmetry), and any outliers.
- IQR (comparing): the width of the box represents the IQR; a wider box means the middle 50% of the data is more spread out.
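As a rough illustration of how these terms fit together, the Python sketch below computes a five-number summary and applies the 1.5 × IQR rule. It assumes the median-of-halves convention for quartiles (the median is left out of both halves when the number of values is odd); other textbooks and software use slightly different quartile rules. The function names and the sample data are made up for this example.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumes at least two data values.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]  # values below the median position
    # For an odd count, skip the median itself when forming the upper half.
    upper = values[half + 1:] if n % 2 else values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]


def outliers(data):
    """Return values more than 1.5 x IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


sample = [3, 7, 8, 9, 10, 12, 30]   # illustrative data only
print(five_number_summary(sample))  # (3, 7, 9, 12, 30)
print(outliers(sample))             # [30]
```

Here the IQR is 12 − 7 = 5, so the fences sit at −0.5 and 19.5, and only the value 30 would be plotted as an individual point.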
Comparing Two Distributions
1. Shape — symmetric, positively skewed (right tail), negatively skewed (left tail)
2. Centre — compare means or medians (use median for skewed data)
3. Spread — compare IQR or standard deviations
Skewness rule of thumb:
• Mean > Median → positive skew (right tail)
• Mean < Median → negative skew (left tail)
• Mean ≈ Median → roughly symmetric
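The rule of thumb can be sketched in Python as follows. The function name `describe_skew` and the `tolerance` parameter (how close the mean and median must be to count as "roughly equal") are assumptions made for this example, not a standard definition, and the data are illustrative only.

```python
from statistics import mean, median


def describe_skew(data, tolerance=0.0):
    """Classify skew by comparing mean and median (rule of thumb only)."""
    gap = mean(data) - median(data)
    if abs(gap) <= tolerance:
        return "roughly symmetric"
    return "positive skew" if gap > 0 else "negative skew"


print(describe_skew([1, 2, 3, 4, 20]))                 # positive skew (mean 6, median 3)
print(describe_skew([1, 17, 18, 19, 20]))              # negative skew (mean 15, median 18)
print(describe_skew([4, 5, 6, 7, 9], tolerance=0.5))   # roughly symmetric (mean 6.2, median 6)
```

The rule is only a guide: it is still worth checking a plot of the data before describing the shape.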
Back-to-Back Stem-and-Leaf Plots
| Feature | How to read |
|---|---|
| Stem | Centre column — shared by both datasets (tens digit) |
| Left group | Leaves read right-to-left from the stem (units digit) |
| Right group | Leaves read left-to-right from the stem (units digit) |
| Purpose | Visual comparison of shape, centre, and spread side-by-side |
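One possible way to build the rows of a back-to-back plot is sketched below in Python. It assumes non-negative two-digit whole numbers, so the stem is the tens digit and the leaf is the units digit; the function name `back_to_back_rows` and the data are hypothetical.

```python
def back_to_back_rows(left, right):
    """Group values by tens digit into (left leaves, stem, right leaves) rows."""
    combined = list(left) + list(right)
    rows = []
    for stem in range(min(combined) // 10, max(combined) // 10 + 1):
        # Left leaves are read outward from the stem, so the largest is written first.
        left_leaves = sorted((v % 10 for v in left if v // 10 == stem), reverse=True)
        right_leaves = sorted(v % 10 for v in right if v // 10 == stem)
        rows.append((left_leaves, stem, right_leaves))
    return rows


for left_leaves, stem, right_leaves in back_to_back_rows([12, 15, 21], [10, 23, 27]):
    left_text = " ".join(map(str, left_leaves))
    right_text = " ".join(map(str, right_leaves))
    print(f"{left_text:>5} | {stem} | {right_text}")
#   5 2 | 1 | 0
#     1 | 2 | 3 7
```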
Worked Example 1 — Back-to-Back Stem-and-Leaf
The following back-to-back stem-and-leaf plot shows reaction times (milliseconds) for two groups:
| Group A (treated) | Stem | Group B (control) |
|---|---|---|
| 9 8 7 | 2 | 1 3 5 |
| 6 5 4 2 | 3 | 0 2 4 7 8 |
| 8 3 | 4 | 1 2 6 |
| 1 | 5 | 3 7 |
Reading: Group A: 27, 28, 29, 32, 34, 35, 36, 43, 48, 51
Group B: 21, 23, 25, 30, 32, 34, 37, 38, 41, 42, 46, 53, 57
Comparison:
Centre: Group A median = 34.5 (the mean of the 5th and 6th of 10 values, 34 and 35); Group B median = 37 (the 7th of 13 values). Group A is slightly faster.
Spread: Group A range = 51−27 = 24; Group B range = 57−21 = 36 — Group A is more consistent.
Shape: Both are roughly symmetric; Group B is more spread out and extends to higher values.
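The reading and comparison can be checked with a short Python sketch. The rows are copied from the plot above; the helper name `decode` is made up for this example.

```python
from statistics import median

# Rows copied from the plot as (Group A leaves, stem, Group B leaves).
rows = [
    ([9, 8, 7], 2, [1, 3, 5]),
    ([6, 5, 4, 2], 3, [0, 2, 4, 7, 8]),
    ([8, 3], 4, [1, 2, 6]),
    ([1], 5, [3, 7]),
]


def decode(rows, side):
    """Rebuild the sorted data values for one side (0 = left, 2 = right)."""
    return sorted(10 * row[1] + leaf for row in rows for leaf in row[side])


group_a = decode(rows, 0)
group_b = decode(rows, 2)

for name, values in (("Group A", group_a), ("Group B", group_b)):
    print(name, "n =", len(values), "median =", median(values),
          "range =", values[-1] - values[0])
# Group A n = 10 median = 34.5 range = 24
# Group B n = 13 median = 37 range = 36
```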
Full Lesson: Comparing Data Sets
The Language of Comparison
Statistical comparison requires precise language. Rather than “Group A is better,” a strong response states: “Group A has a higher median (52 vs 47), indicating a higher typical score. However, Group A also has a larger IQR (18 vs 12), suggesting greater variability in performance.”
Parallel Box Plots
Parallel (or side-by-side) box plots display two or more datasets against the same number-line axis. Each box plot shows the five-number summary of one dataset. Comparing them visually reveals:
• Which group has the higher centre (compare the median lines)
• Which group has more spread (compare box widths = IQR)
• Whether distributions overlap significantly
• Skewness (a longer whisker or longer half of the box suggests skew in that direction)
• Outliers (individual dots)
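The visual checks listed above can also be made numerically from the five-number summaries. The sketch below is one possible approach in Python; the `Summary` type, the function names and both summaries are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Summary(NamedTuple):
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float


def box_facts(s):
    """Read spread and a skew hint off one five-number summary."""
    lower_half, upper_half = s.median - s.q1, s.q3 - s.median
    if upper_half > lower_half:
        hint = "longer upper half of box (suggests positive skew)"
    elif upper_half < lower_half:
        hint = "longer lower half of box (suggests negative skew)"
    else:
        hint = "median centred in box"
    return {"iqr": s.q3 - s.q1, "range": s.maximum - s.minimum, "hint": hint}


def boxes_overlap(a, b):
    """True when the Q1-Q3 intervals of the two box plots share any values."""
    return a.q1 <= b.q3 and b.q1 <= a.q3


# Hypothetical summaries, for illustration only.
group_1 = Summary(20, 30, 34, 46, 60)
group_2 = Summary(35, 48, 55, 60, 70)

print(box_facts(group_1)["iqr"], box_facts(group_2)["iqr"])  # 16 12
print(boxes_overlap(group_1, group_2))                       # False
```

When the boxes do not overlap, as here, the middle 50% of one group lies entirely above the middle 50% of the other, which is strong visual evidence of a difference in centre.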
Skewness and the Choice of Measure
The shape of a distribution determines which statistics are appropriate:
Positively skewed (right tail): Median and IQR are more representative. Outliers on the right pull the mean above the median.
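Negatively skewed (left tail): Median and IQR are preferred. Low values on the left pull the mean below the median.
Roughly symmetric: The mean and median are close, so the mean and standard deviation are also appropriate.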
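A small Python sketch shows why the median is the safer measure of centre when one tail is stretched. The data are invented for illustration.

```python
from statistics import mean, median

times = [2, 3, 3, 4, 5]          # illustrative data only
with_outlier = times + [40]      # one extreme value on the right

print(mean(times), median(times))                # 3.4 3
print(mean(with_outlier), median(with_outlier))  # 9.5 3.5
```

A single extreme value moves the mean from 3.4 to 9.5, while the median shifts only from 3 to 3.5.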
The Statistical Investigation Process
When comparing two groups, follow the investigation framework:
- Question: Formulate a specific statistical question (e.g. “Do Year 11 students at School A score higher than those at School B on standardised maths tests?”)
- Data: Identify the data collected (sample, population, how obtained)
- Analysis: Calculate relevant statistics, construct appropriate displays
- Conclusion: Answer the original question with reference to specific statistics. Acknowledge limitations.
Mastery Practice
-
Fluency
State whether each distribution is positively skewed, negatively skewed, or symmetric:
- Mean = 50, Median = 55
- Mean = 70, Median = 70
- Mean = 80, Median = 65
-
Fluency
Five-number summaries: Group X: Min = 10, Q1 = 22, Median = 30, Q3 = 41, Max = 55. Group Y: Min = 15, Q1 = 25, Median = 35, Q3 = 48, Max = 60. Compare the spread of the two groups (use range and IQR).
-
Fluency
The following back-to-back stem-and-leaf plot shows test scores for Class A (left) and Class B (right):
| Class A | Stem | Class B |
|---|---|---|
| 5 | 6 | 7 8 |
| 7 6 5 | 7 | 5 6 7 8 |
| 3 2 | 8 | 2 3 |
| 1 | 9 | 1 |
List all scores for each class. How many students are in each class?
-
Understanding
Compare the two datasets using mean, median, range and IQR:
Dataset 1: 12, 15, 18, 20, 22, 25, 28
Dataset 2: 5, 8, 20, 20, 20, 32, 35
-
Understanding
Rainfall (mm) for two cities over 6 months:
Brisbane: 45, 60, 80, 55, 40, 70
Cairns: 120, 200, 350, 180, 80, 150
Calculate the mean for each city. Which city has more rainfall on average and which is more variable? Calculate the range for each to support your answer.
-
Understanding
A researcher compares weekly study hours for two groups:
Males: 8, 5, 12, 6, 9, 7, 11, 8, 10, 4
Females: 10, 12, 8, 14, 11, 9, 13, 7, 12, 14
Calculate the mean and range for each group. Comment on the difference in centre and consistency.
-
Understanding
A box plot has: Min = 5, Q1 = 12, Median = 18, Q3 = 26, Max = 34.
- Find the IQR and range.
- Use the IQR fence test to check for outliers.
- Is the distribution symmetric or skewed? Justify using the position of the median within the box.
-
Problem Solving
Sprint times (seconds) for two training groups:
Group 1: 10.2, 10.5, 10.8, 11.0, 11.2, 11.5, 11.8
Group 2: 9.8, 10.1, 10.9, 11.1, 11.4, 12.0, 12.3
- Find the five-number summary (Min, Q1, Median, Q3, Max) for each group.
- Calculate the IQR for each group.
- Compare the two groups using shape, centre and spread. Which group is faster? Which is more consistent?
-
Problem Solving
Pulse rates (bpm) for two groups:
Athletes: 52, 58, 60, 62, 65, 68, 70
Non-athletes: 68, 72, 75, 78, 80, 82, 88
Calculate the mean, median, and IQR for each group. Write a structured comparison addressing shape, centre, and spread.
-
Problem Solving
Statistical Investigation: A teacher wants to know whether girls spend more time on homework per night than boys.
Boys (hours/night): 1.5, 2.0, 1.0, 2.5, 1.5, 3.0, 2.0, 1.5
Girls (hours/night): 2.0, 2.5, 3.0, 2.5, 2.0, 3.5, 2.5, 2.0
- State a specific statistical question for this investigation.
- Calculate mean and median for each group.
- Compare the distributions using shape, centre and spread.
- State a conclusion in context, referring to the specific values you calculated.