Univariate Data Analysis — Topic Review — Solutions
15 questions covering all Univariate Data sub-topics: data classification, displays, measures of centre, measures of spread, normal distribution and z-scores, and comparing data sets.
-
Fluency
Classify each of the following as categorical (nominal or ordinal) or numerical (discrete or continuous):
- Eye colour
- Number of siblings
- Height in centimetres
- Movie rating (G / PG / M / MA)
(a) Eye colour → Categorical (nominal) — no natural order.
(b) Number of siblings → Numerical (discrete) — counted values, cannot be fractional.
(c) Height in cm → Numerical (continuous) — measured, can take any value in a range.
(d) Movie rating → Categorical (ordinal) — categories with a natural order (G < PG < M < MA).
-
Fluency
Find the mean, median and mode of: 6, 9, 4, 9, 7, 3, 9, 8, 5, 10
Ordered: 3, 4, 5, 6, 7, 8, 9, 9, 9, 10
Mean = (3+4+5+6+7+8+9+9+9+10)/10 = 70/10 = 7
Median = (7+8)/2 = 7.5 (average of 5th and 6th values)
Mode = 9 (appears 3 times)
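The three measures above can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original question.

```python
from statistics import mean, median, multimode

data = [6, 9, 4, 9, 7, 3, 9, 8, 5, 10]

data_mean = mean(data)        # sum of the values divided by the count
data_median = median(data)    # average of the two middle values when n is even
data_modes = multimode(data)  # list of the most frequent value(s); handles ties

print(data_mean, data_median, data_modes)
```

`multimode` is used rather than `mode` so that a dataset with two equally common values would report both.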
-
Fluency
Find Q1, Q3 and IQR for: 8, 12, 15, 18, 22, 26, 30, 35
n = 8. Median = (18+22)/2 = 20 (between 4th and 5th values).
Lower half: 8, 12, 15, 18 → Q1 = (12+15)/2 = 13.5
Upper half: 22, 26, 30, 35 → Q3 = (26+30)/2 = 28
IQR = 28 − 13.5 = 14.5
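The working above splits the ordered data into a lower and an upper half and takes the median of each. A sketch of that method follows; the helper name `quartiles` is hypothetical, and the rule of leaving the middle value out of both halves when n is odd is an assumption that matches the working used elsewhere in these solutions.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    For an odd number of values the middle value is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, med, q3 = quartiles([8, 12, 15, 18, 22, 26, 30, 35])
iqr = q3 - q1
print(q1, med, q3, iqr)
```

Library routines such as `statistics.quantiles` interpolate between values and can give slightly different quartiles for the same data, so results from software may not match hand working exactly.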
-
Fluency
Heights follow a normal distribution: μ = 175 cm, σ = 8 cm. Find the z-score for a height of 191 cm.
z = (x − μ) / σ = (191 − 175) / 8 = 16/8 = 2
A height of 191 cm is exactly 2 standard deviations above the mean.
-
Fluency
A normal distribution has μ = 50, σ = 5. What percentage of values lie between 40 and 60?
40 = 50 − 2(5) = μ − 2σ and 60 = 50 + 2(5) = μ + 2σ
By the 68–95–99.7% rule, 95% of values lie within μ ± 2σ.
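The 95% figure is a rounded rule of thumb. As a rough check, the area under the normal curve between 40 and 60 can be computed directly; this sketch assumes Python's `statistics.NormalDist`, and the variable names are illustrative.

```python
from statistics import NormalDist

values = NormalDist(mu=50, sigma=5)

# Area between mu - 2*sigma and mu + 2*sigma
within_two_sd = values.cdf(60) - values.cdf(40)

print(f"{within_two_sd:.2%}")
```

The computed area is a little above 95%, which is why the 68–95–99.7% rule is described as approximate.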
-
Understanding
Dataset: 14, 17, 19, 21, 23, 25, 27, 62. Use the IQR outlier method to determine whether 62 is an outlier. Show all working.
Ordered: 14, 17, 19, 21, 23, 25, 27, 62. n = 8.
Q1 = (17+19)/2 = 18. Q3 = (25+27)/2 = 26. IQR = 26 − 18 = 8.
Upper fence = Q3 + 1.5 × IQR = 26 + 12 = 38.
62 > 38, so 62 is an outlier.
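The fence test above can be sketched as a small function. The name `iqr_fences` is hypothetical, and quartiles are found with the same median-of-halves method as the hand working.

```python
from statistics import median

def iqr_fences(values):
    """Return (lower_fence, upper_fence) using Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2:]  # skips the middle value when n is odd
    q1, q3 = median(lower_half), median(upper_half)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [14, 17, 19, 21, 23, 25, 27, 62]
low, high = iqr_fences(data)
outliers = [x for x in data if x < low or x > high]
print(low, high, outliers)
```

The lower fence (18 − 12 = 6) is also computed here; no value falls below it, so 62 is the only outlier.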
-
Understanding
Two datasets have mean = 25. Dataset A has SD = 3, Dataset B has SD = 10. Describe what each standard deviation tells you about the spread of each dataset.
Dataset A (SD = 3): Values are tightly clustered around the mean of 25. If the data are roughly bell-shaped, about 68% of values fall within 25 ± 3, i.e., between 22 and 28. This indicates a highly consistent dataset.
Dataset B (SD = 10): Values are widely spread around the same mean. On the same assumption, about 68% of values fall within 25 ± 10, i.e., between 15 and 35. This indicates far greater variability in the data.
-
Understanding
Biology test: μ = 60, σ = 12. Chemistry test: μ = 72, σ = 15. Sam scored 78 on Biology and 90 on Chemistry. In which test did Sam perform better relative to the rest of the class? Use z-scores.
Biology z-score: z = (78 − 60)/12 = 18/12 = 1.5
Chemistry z-score: z = (90 − 72)/15 = 18/15 = 1.2
Sam’s Biology z-score (1.5) is higher than Chemistry (1.2), so Biology was the better relative performance, despite the lower raw score. Both scores are 18 marks above the class mean, but that margin is 1.5 standard deviations in Biology and only 1.2 in Chemistry.
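The comparison can be sketched in code by standardising each score and picking the larger z-score. The function name `z_score` and the dictionary layout are illustrative choices, not part of the question.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

results = {
    "Biology": z_score(78, mu=60, sigma=12),
    "Chemistry": z_score(90, mu=72, sigma=15),
}

# The subject with the highest z-score is the best relative performance
best_subject = max(results, key=results.get)
print(results, best_subject)
```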
-
Understanding
A box plot has: Min = 5, Q1 = 12, Median = 18, Q3 = 26, Max = 34.
- Find the IQR.
- Use the fence test to check for outliers.
- Describe the shape of the distribution.
(a) IQR = 26 − 12 = 14
(b) Lower fence = 12 − 1.5(14) = 12 − 21 = −9. Upper fence = 26 + 1.5(14) = 26 + 21 = 47.
Min = 5 > −9 and Max = 34 < 47. No outliers.
(c) Median (18) is 6 units from Q1 and 8 units from Q3, so the upper box half is slightly longer. This suggests a slight positive skew (right tail).
-
Understanding
Pulse rates (bpm): Athletes: 52, 58, 60, 62, 65, 68, 70. Non-athletes: 68, 72, 75, 78, 80, 82, 88. Find the median and IQR for each group. Compare the centre and spread.
Athletes: Median = 62 bpm. Q1 = 58, Q3 = 68, IQR = 10.
Non-athletes: Median = 78 bpm. Q1 = 72, Q3 = 82, IQR = 10.
Comparison: Non-athletes have a higher median pulse rate (78 vs 62 bpm), indicating a higher typical pulse rate. Both groups have the same IQR (10 bpm), so the middle 50% of each group is equally spread. The 16 bpm difference in medians is large relative to that spread and is consistent with the effect of athletic fitness, although this sample alone cannot establish the cause.
-
Understanding
In a normal distribution, what percentage of data lies: (a) above μ + 2σ? (b) below μ − σ? (c) between μ and μ + σ?
(a) 95% within μ ± 2σ → 5% outside → by symmetry: 2.5% above μ + 2σ.
(b) 68% within μ ± σ → 32% outside → by symmetry: 16% below μ − σ.
(c) 68% within μ ± σ → by symmetry, half of that is between mean and μ + σ: 34%.
-
Problem Solving
Exam scores for two classes:
Class 1: 45, 52, 61, 68, 74, 78, 82, 89
Class 2: 38, 55, 66, 70, 72, 75, 83, 91
Calculate mean, median and IQR for each class. Compare using shape, centre and spread. Which class performed better overall?
Class 1: Mean = 549/8 = 68.6. Median = (68+74)/2 = 71. Q1 = (52+61)/2 = 56.5, Q3 = (78+82)/2 = 80. IQR = 23.5.
Class 2: Mean = 550/8 = 68.75. Median = (70+72)/2 = 71. Q1 = (55+66)/2 = 60.5, Q3 = (75+83)/2 = 79. IQR = 18.5.
Comparison:
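Centre: Nearly identical (means of 68.6 and 68.75, median = 71 for both).
Spread: Class 2 has a smaller IQR (18.5 vs 23.5), meaning the middle 50% of Class 2 scores are more tightly grouped. Class 1's lower quartile sits further down (56.5 vs 60.5), which widens its box. Class 2 does have the larger range (91 − 38 = 53 vs 89 − 45 = 44) because of its single low score of 38.
Shape: In both classes the mean lies slightly below the median of 71, which suggests a slight negative skew in each. Class 2's middle 50% is more symmetric about the median, but its low score of 38 stretches the lower tail.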
Conclusion: Performance is very similar. Class 2 is slightly more consistent (smaller IQR) while having a marginally higher mean, suggesting marginally better performance.
-
Problem Solving
Weekly wages ($): 520, 540, 560, 580, 600, 620, 650, 2400.
- Calculate the mean and median.
- Which measure better represents a ‘typical’ wage? Explain.
- Calculate the IQR and use the fence test to determine if $2400 is an outlier.
(a) Mean = (520+540+560+580+600+620+650+2400)/8 = 6470/8 = $808.75
Median = (580+600)/2 = $590 (average of 4th and 5th values)
(b) The median ($590) better represents a typical wage. The $2400 outlier dramatically inflates the mean to $808.75, well above 7 of the 8 actual wages. The median is resistant to this outlier.
(c) Q1 = (540+560)/2 = 550. Q3 = (620+650)/2 = 635. IQR = 635 − 550 = 85.
Upper fence = 635 + 1.5(85) = 635 + 127.5 = 762.5.
$2400 > $762.50, so $2400 is an outlier.
-
Problem Solving
Heights follow a normal distribution: μ = 170 cm, σ = 10 cm.
- What percentage of people are between 150 cm and 190 cm?
- What percentage are taller than 190 cm?
- In a group of 500 people, how many would you expect to be shorter than 150 cm?
(a) 150 = 170 − 2(10) = μ − 2σ and 190 = μ + 2σ. Within μ ± 2σ: 95%.
(b) 190 = μ + 2σ. Above μ + 2σ: (100% − 95%)/2 = 2.5%.
(c) 150 = μ − 2σ. Below μ − 2σ: 2.5%. Expected count = 2.5% × 500 = 12.5 ≈ 13 people.
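Part (c) multiplies a tail percentage by the group size. The sketch below repeats that calculation and, for comparison, uses the exact normal tail area in place of the rounded 2.5%. It assumes Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)
group_size = 500

p_short_rule = 0.025              # empirical rule: (100% - 95%) / 2
p_short_exact = heights.cdf(150)  # exact area below mu - 2*sigma

expected_rule = p_short_rule * group_size
expected_exact = p_short_exact * group_size
print(expected_rule, round(expected_exact, 1))
```

The exact tail gives a slightly smaller expected count than the rule of thumb, so an answer of about 12 or 13 people is reasonable for this level of working.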
-
Problem Solving
A researcher records data for 8 students: study hours per week (x): 2, 3, 3, 4, 5, 6, 6, 7, and corresponding test scores (%): 55, 62, 65, 70, 74, 80, 78, 85.
- Calculate the mean study hours and mean test score.
- Describe the shape of the test score distribution (use mean vs median).
- Does the data appear to support a claim that more study leads to higher scores? Comment using the data.
- What further statistical analysis would be needed to confirm this relationship?
(a) Mean study hours = (2+3+3+4+5+6+6+7)/8 = 36/8 = 4.5 hrs.
Mean score = (55+62+65+70+74+80+78+85)/8 = 569/8 = 71.1%.
(b) Sorted scores: 55, 62, 65, 70, 74, 78, 80, 85. Median = (70+74)/2 = 72. Mean (71.1) < median (72) very slightly, suggesting a near-symmetric distribution with a slight negative skew.
(c) The data does appear to support the claim. As study hours increase from 2 to 7, scores generally increase from 55% to 85%. Every student who studied more hours than another also scored higher; the only variation is between students with the same hours (62% and 65% at 3 hours, 80% and 78% at 6 hours). This is a positive association. With only 8 students and no control of other factors, it does not by itself show that extra study causes the higher scores.
(d) To confirm the relationship formally, we would need to calculate the Pearson correlation coefficient (r) to measure the strength and direction of the linear association, and potentially fit a least-squares regression line (covered in Units 3/4 Bivariate Data).
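As an illustration of the analysis suggested in part (d), the Pearson correlation coefficient can be computed from its definition. This is a sketch only: the function name `pearson_r` is hypothetical, and the scores are assumed to be paired with the study hours in the order listed in the question.

```python
from math import sqrt

hours = [2, 3, 3, 4, 5, 6, 6, 7]
scores = [55, 62, 65, 70, 74, 80, 78, 85]

def pearson_r(xs, ys):
    """Pearson correlation: S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(hours, scores)
print(round(r, 3))
```

A value of r close to 1 indicates a strong positive linear association. It measures association only and does not show that study time causes the higher scores.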