Practice Maths

Comparing Data Sets

Key Terms

Five-number summary
Minimum, Q1, median (Q2), Q3, maximum — the five values needed to draw a box plot.
Box plot (box-and-whisker)
A graphical display of the five-number summary; the box shows Q1–Q3, the line inside the box shows the median, whiskers extend to the min and max (excluding outliers).
Parallel box plots
Two or more box plots on the same scale, enabling direct comparison of centre and spread.
Outlier on a box plot
Shown as an individual point (×) or dot beyond the whisker; a value counts as an outlier if it lies more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3.
Comparing distributions
Always comment on: centre (median), spread (IQR and range), shape (skew or symmetry), and any outliers.
IQR (comparing)
The width of the box represents the IQR; a wider box means the middle 50% of the data is more spread out.
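
The 1.5 × IQR outlier rule can be sketched in a few lines of Python. This is only an illustration: the function names `outlier_fences` and `find_outliers`, the quartiles and the data values are all made up for the example.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values that lie outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Hypothetical quartiles (Q1 = 20, Q3 = 30) and hypothetical data values
lower, upper = outlier_fences(20, 30)
outliers = find_outliers([3, 18, 25, 44, 47], 20, 30)

print(lower, upper)  # 5.0 45.0
print(outliers)      # [3, 47]
```

Here IQR = 10, so the fences sit at 20 − 15 = 5 and 30 + 15 = 45; only 3 and 47 fall outside them.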

Comparing Two Distributions

Always compare three aspects:
1. Shape — symmetric, positively skewed (right tail), negatively skewed (left tail)
2. Centre — compare means or medians (use median for skewed data)
3. Spread — compare IQR or standard deviations

Skewness rule of thumb:
• Mean > Median → positive skew (right tail)
• Mean < Median → negative skew (left tail)
• Mean ≈ Median → roughly symmetric
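
The rule of thumb can be written as a small Python check. This is a rough sketch only: the function name `classify_skew` is made up, and the `tolerance` used to decide when the mean and median are "approximately equal" is an assumption, not a fixed rule. In practice that judgement depends on the scale of the data.

```python
def classify_skew(mean, median, tolerance=0.5):
    """Apply the mean-vs-median rule of thumb.

    `tolerance` is an assumed cut-off for "mean is about equal to median".
    """
    if abs(mean - median) <= tolerance:
        return "roughly symmetric"
    if mean > median:
        return "positive skew (right tail)"
    return "negative skew (left tail)"


# Hypothetical summary statistics
print(classify_skew(62, 55))    # positive skew (right tail)
print(classify_skew(40, 48))    # negative skew (left tail)
print(classify_skew(70.2, 70))  # roughly symmetric
```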

Back-to-Back Stem-and-Leaf Plots

Feature: How to read
Stem: Centre column, shared by both datasets (tens digit)
Left group: Leaves read right-to-left from the stem (units digit)
Right group: Leaves read left-to-right from the stem (units digit)
Purpose: Visual comparison of shape, centre, and spread side-by-side

Worked Example 1 — Back-to-Back Stem-and-Leaf

The following back-to-back stem-and-leaf shows reaction times (milliseconds) for two groups:

Group A (treated) | Stem | Group B (control)
            9 8 7 |  2   | 1 3 5
          6 5 4 2 |  3   | 0 2 4 7 8
              8 3 |  4   | 1 2 6
                1 |  5   | 3 7

Key: 7 | 2 | 1 means 27 ms for Group A and 21 ms for Group B.

Reading:
Group A: 27, 28, 29, 32, 34, 35, 36, 43, 48, 51 (10 values)
Group B: 21, 23, 25, 30, 32, 34, 37, 38, 41, 42, 46, 53, 57 (13 values)

Comparison:
Centre: Group A median = 34.5 (midway between the 5th and 6th values, 34 and 35); Group B median = 37 (the 7th of 13 values). Group A is slightly faster.
Spread: Group A range = 51−27 = 24; Group B range = 57−21 = 36 — Group A is more consistent.
Shape: Both are roughly symmetric; Group B is more spread out and reaches higher values.
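
As a check on the centre and spread values above, the plot can be expanded into raw values with a short Python sketch. The dictionary layout and the helper name `expand` are illustrative choices; the leaves are typed in row by row from the plot (Group A leaves first, then Group B leaves).

```python
from statistics import median

# stem: (Group A leaves as printed, Group B leaves as printed)
plot = {
    2: ([9, 8, 7], [1, 3, 5]),
    3: ([6, 5, 4, 2], [0, 2, 4, 7, 8]),
    4: ([8, 3], [1, 2, 6]),
    5: ([1], [3, 7]),
}


def expand(plot, side):
    """Turn stems and leaves into sorted values (side 0 = left group, 1 = right group)."""
    return sorted(
        10 * stem + leaf
        for stem, leaves in plot.items()
        for leaf in leaves[side]
    )


group_a = expand(plot, 0)
group_b = expand(plot, 1)

print(median(group_a), median(group_b))  # 34.5 37
print(max(group_a) - min(group_a))       # 24
print(max(group_b) - min(group_b))       # 36
```

Sorting inside `expand` means the right-to-left order of the left-hand leaves does not need to be handled separately.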

Hot Tip: When comparing two datasets in a QCAA exam, always explicitly address all three aspects: shape, centre AND spread. A response that only comments on centre (e.g. “Group A has a higher mean”) will not receive full marks. Use precise language: “Group A has a higher median of X compared to Group B’s median of Y, indicating…”

Full Lesson: Comparing Data Sets

The Language of Comparison

Statistical comparison requires precise language. Rather than “Group A is better,” a strong response states: “Group A has a higher median (52 vs 47), indicating a higher typical score. However, Group A also has a larger IQR (18 vs 12), suggesting greater variability in performance.”

Parallel Box Plots

Parallel (or side-by-side) box plots display two or more datasets on the same number line axis. Each box plot shows the five-number summary of one dataset. Comparing them visually reveals:

• Which group has the higher centre (compare median lines)
• Which group has more spread (compare box widths, which show the IQR, and overall ranges)
• Whether distributions overlap significantly
• Skewness (a longer whisker or a wider half of the box on one side suggests skew in that direction)
• Outliers (individual dots)
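
The numbers behind a pair of parallel box plots can be computed with a sketch like the one below. Two things are assumptions here: the data sets are hypothetical, and the quartiles are found as the medians of the lower and upper halves (leaving out the middle value when the count is odd). Calculators and software may use slightly different quartile conventions.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, skip the middle value when forming the upper half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical test scores for two groups
scores_a = [41, 45, 47, 50, 52, 55, 58, 60, 63]
scores_b = [38, 42, 44, 46, 47, 49, 51, 53]

summary_a = five_number_summary(scores_a)
summary_b = five_number_summary(scores_b)
iqr_a = summary_a[3] - summary_a[1]
iqr_b = summary_b[3] - summary_b[1]

print(summary_a, iqr_a)  # (41, 46.0, 52, 59.0, 63) 13.0
print(summary_b, iqr_b)  # (38, 43.0, 46.5, 50.0, 53) 7.0
```

For this made-up data, the first group has the higher median (52 vs 46.5) and the wider box (IQR 13 vs 7), so it has the higher typical score but is less consistent.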

Skewness and the Choice of Measure

The shape of a distribution determines which statistics are appropriate:

Symmetric: Mean and standard deviation are appropriate.
Positively skewed (right tail): Median and IQR are more representative. Outliers on the right pull the mean above the median.
Negatively skewed (left tail): Median and IQR preferred. The mean is pulled below the median.
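
A quick numerical illustration of why the median is preferred for skewed data, using a small made-up data set: adding a single large value moves the mean a long way but barely shifts the median.

```python
from statistics import mean, median

# Hypothetical data, then the same data with one large value added on the right
base = [10, 12, 13, 14, 15]
with_outlier = base + [60]

print(mean(base), median(base))                            # 12.8 13
print(round(mean(with_outlier), 2), median(with_outlier))  # 20.67 13.5
```

The mean jumps from 12.8 to about 20.67, while the median only moves from 13 to 13.5, so the mean now sits well above the median (positive skew).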

The Statistical Investigation Process

When comparing two groups, follow the investigation framework:

  1. Question: Formulate a specific statistical question (e.g. “Do Year 11 students in School A score higher than School B on standardised maths tests?”)
  2. Data: Identify the data collected (sample, population, how obtained)
  3. Analysis: Calculate relevant statistics, construct appropriate displays
  4. Conclusion: Answer the original question with reference to specific statistics. Acknowledge limitations.

Lesson Tip: Back-to-back stem-and-leaf plots work well for small datasets (n < 30). For larger datasets, parallel box plots are more practical. Both should lead to the same comparison conclusions when done correctly.

Mastery Practice

  1. Fluency

    State whether each distribution is positively skewed, negatively skewed, or symmetric:

    1. Mean = 50, Median = 55
    2. Mean = 70, Median = 70
    3. Mean = 80, Median = 65
  2. Fluency

    Five-number summaries: Group X: Min = 10, Q1 = 22, Median = 30, Q3 = 41, Max = 55. Group Y: Min = 15, Q1 = 25, Median = 35, Q3 = 48, Max = 60. Compare the spread of the two groups (use range and IQR).

  3. Fluency

    The following back-to-back stem-and-leaf shows test scores for Class A (left) and Class B (right):

    Class A | Stem | Class B
          5 |  6   | 7 8
      7 6 5 |  7   | 5 6 7 8
        3 2 |  8   | 2 3
          1 |  9   | 1

    List all scores for each class. How many students are in each class?

  4. Understanding

    Compare the two datasets using mean, median, range and IQR:

    Dataset 1: 12, 15, 18, 20, 22, 25, 28

    Dataset 2: 5, 8, 20, 20, 20, 32, 35

  5. Understanding

    Rainfall (mm) for two cities over 6 months:
    Brisbane: 45, 60, 80, 55, 40, 70
    Cairns: 120, 200, 350, 180, 80, 150
    Calculate the mean for each city. Which city has more rainfall on average and which is more variable? Calculate the range for each to support your answer.

  6. Understanding

    A researcher compares weekly study hours for two groups:
    Males: 8, 5, 12, 6, 9, 7, 11, 8, 10, 4
    Females: 10, 12, 8, 14, 11, 9, 13, 7, 12, 14
    Calculate the mean and range for each group. Comment on the difference in centre and consistency.

  7. Understanding

    A boxplot has: Min = 5, Q1 = 12, Median = 18, Q3 = 26, Max = 34.

    1. Find the IQR and range.
    2. Use the IQR fence test to check for outliers.
    3. Is the distribution symmetric or skewed? Justify using the position of the median within the box.
  8. Problem Solving

    Sprint times (seconds) for two training groups:
    Group 1: 10.2, 10.5, 10.8, 11.0, 11.2, 11.5, 11.8
    Group 2: 9.8, 10.1, 10.9, 11.1, 11.4, 12.0, 12.3

    1. Find the five-number summary (Min, Q1, Median, Q3, Max) for each group.
    2. Calculate the IQR for each group.
    3. Compare the two groups using shape, centre and spread. Which group is faster? Which is more consistent?
  9. Problem Solving

    Pulse rates (bpm) for two groups:
    Athletes: 52, 58, 60, 62, 65, 68, 70
    Non-athletes: 68, 72, 75, 78, 80, 82, 88
    Calculate mean, median, and IQR for each group. Write a structured comparison addressing shape, centre, and spread.

  10. Problem Solving

    Statistical Investigation: A teacher wants to know whether girls spend more time on homework per night than boys.

    Boys (hours/night): 1.5, 2.0, 1.0, 2.5, 1.5, 3.0, 2.0, 1.5
    Girls (hours/night): 2.0, 2.5, 3.0, 2.5, 2.0, 3.5, 2.5, 2.0

    1. State a specific statistical question for this investigation.
    2. Calculate mean and median for each group.
    3. Compare the distributions using shape, centre and spread.
    4. State a conclusion in context, referring to the specific values you calculated.