Solutions: Types of Data and Statistical Displays
-
Fluency
-
Discrete — you count pets; only whole number values are possible (0, 1, 2, 3, …)
-
Nominal — hair colour is a named category with no natural ordering (brown is not “greater than” blonde)
-
Continuous — temperature is measured; it can take any value (e.g. 28.4°C, 31.07°C)
-
Ordinal — grades have a natural order (A > B > C > D > E) but the gaps between grades are not necessarily equal
-
Discrete — shoe sizes come in fixed increments (5, 5.5, 6, 6.5, …); you cannot have size 6.3. Countable separate values.
-
Continuous — height is measured; any value is possible (e.g. 182.3 cm, 195.87 cm)
-
Nominal — postcodes are labels/identifiers. Arithmetic on them is meaningless (e.g. 4000 + 4001 ≠ a meaningful postcode).
-
Continuous — time is measured; any positive real value is theoretically possible (e.g. 9.87 s, 12.403 s)
-
-
Fluency
-
Dot plot — discrete data with a small range; shows each value individually
-
Bar chart — nominal (categorical) data; bars separated by gaps to show distinct categories
-
Histogram — continuous numerical data; bars touch to reflect the continuous scale
-
Bar chart (ordered) — ordinal data; bars can be ordered A to E to show the ranking
-
Dot plot or bar chart — discrete with distinct values; a dot plot shows every value, a bar chart shows frequency of each size
-
Histogram or stem-and-leaf plot — continuous data; histogram shows grouped distribution; stem-and-leaf preserves individual values
-
Bar chart — nominal data (postcodes are just labels); a bar chart shows how many observations fall in each postcode
-
Histogram or stem-and-leaf plot — continuous measured data; class intervals group the times; stem-and-leaf (using seconds and tenths) retains precision
-
-
Fluency
-
Total = 3 + 7 + 12 + 8 + 4 = 34 guests
-
The 20–24 age group has the highest frequency (12 guests).
-
Percentage = 12 ÷ 34 × 100 ≈ 35.3% (to one decimal place)
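The arithmetic above can be checked with a short Python sketch. The variable names are illustrative assumptions. The frequencies are listed in the order they are added in the solution, and only the 20–24 group (12 guests) is named there, so the other groups are left unlabelled.

```python
# Frequencies of the five age groups, in the order used in the total above.
frequencies = [3, 7, 12, 8, 4]

total = sum(frequencies)                        # all guests
highest = max(frequencies)                      # frequency of the modal group
percentage = round(highest / total * 100, 1)    # share of guests in that group

print(total, highest, percentage)
```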
-
-
Fluency
-
Minimum = 32 (stem 3, leaf 2)
-
Maximum = 62 (stem 6, leaf 2)
-
Count leaves: 3 + 5 + 3 + 1 = 12 values
-
Stem 4, leaves 1, 4, 4, 7, 9: values are 41, 44, 44, 47, 49
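Reading a stem-and-leaf row amounts to joining the stem (tens digit) to each leaf (units digit). The sketch below assumes a key of the form 4|1 = 41, which matches the minimum and maximum given above. The function name `decode` is hypothetical, and only the rows and values stated in the solution are used.

```python
def decode(stem, leaves):
    """Join one stem (tens) to each of its leaves (units)."""
    return [stem * 10 + leaf for leaf in leaves]

# Stem 4 with leaves 1, 4, 4, 7, 9, as listed in the solution.
row_4 = decode(4, [1, 4, 4, 7, 9])
print(row_4)  # [41, 44, 44, 47, 49]

# Number of values = total number of leaves across the four stems.
leaves_per_stem = [3, 5, 3, 1]
n_values = sum(leaves_per_stem)
print(n_values)
```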
-
-
Understanding
-
Nominal — sport names are categories with no natural order.
-
Bar chart — appropriate for nominal categorical data; bars are separated to show distinct sports.
-
Sport        Frequency   Relative Frequency (%)
Football     7           35%
Basketball   5           25%
Swimming     4           20%
Tennis       3           15%
Other        1           5%
Total        20          100%
-
A histogram would not be suitable. A histogram is used for continuous numerical data grouped into class intervals. Favourite sports are categorical (nominal) data — there is no numerical scale on which to place bars that touch.
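Each relative frequency in the table is the frequency divided by the total, multiplied by 100. A minimal Python sketch of that calculation follows, using the counts from the table. The dictionary names are assumptions made for illustration.

```python
# Frequencies from the favourite-sport table.
counts = {"Football": 7, "Basketball": 5, "Swimming": 4, "Tennis": 3, "Other": 1}

total = sum(counts.values())

# Relative frequency (%) = frequency * 100 / total
relative = {sport: count * 100 / total for sport, count in counts.items()}

for sport, pct in relative.items():
    print(f"{sport:<12}{counts[sport]:>3}{pct:>6.0f}%")
```

The relative frequencies should sum to 100%, which is a quick check that no category has been missed.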
-
-
Understanding
-
Symmetric (approximately bell-shaped / mound-shaped). The distribution rises to a single peak in the middle and falls away on both sides equally.
-
Negatively skewed (left-skewed). Most values are concentrated in the upper stems (70s and 80s), with a long tail extending to the left (lower scores).
-
Bimodal. Two separate peaks suggest the data may come from two distinct subgroups within the population.
-
The distribution is positively skewed with an outlier. The bulk of the data is between 1 and 10, but the single point at 25 is far removed from the rest, creating a long tail to the right.
-
-
Understanding
-
Class interval   Frequency
40–49            1
50–59            2
60–69            5
70–79            8
80–89            6
90–99            3
Total            25
-
The distribution is approximately symmetric with a slight negative skew. Frequencies build from the low scores, peak in the 70–79 class, then decrease; the tail on the low-score side is slightly longer (three classes below the peak, two above). The majority of students (19 of 25) scored between 60 and 89.
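The total, the modal class and the share of students scoring between 60 and 89 can be read off the grouped frequency table with a few lines of Python. This is a sketch only. The tuple layout (lower bound, upper bound, frequency) and the variable names are assumptions.

```python
# (lower bound, upper bound, frequency) for each class interval in the table.
classes = [
    (40, 49, 1),
    (50, 59, 2),
    (60, 69, 5),
    (70, 79, 8),
    (80, 89, 6),
    (90, 99, 3),
]

total = sum(freq for _, _, freq in classes)

# Modal class: the interval with the largest frequency.
modal = max(classes, key=lambda c: c[2])

# Students whose class lies entirely within 60 to 89.
in_60_to_89 = sum(freq for lo, hi, freq in classes if lo >= 60 and hi <= 89)
share = in_60_to_89 * 100 / total

print(total, modal, in_60_to_89, share)
```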
-
-
Understanding
-
A histogram is not appropriate for categorical data because a histogram requires a numerical scale on the x-axis where values can be placed in order along a continuous number line. Categorical groups (e.g. “red”, “blue”, “green”) have no numerical position and cannot be meaningfully placed on such a scale. The touching bars of a histogram imply a continuous range between adjacent values — which does not exist for categories.
-
A bar chart is not appropriate for continuous data because a bar chart has gaps between bars to indicate that each bar represents a completely separate, distinct category. Continuous data flows without interruption along a number line — grouping it into class intervals and displaying it with gaps would incorrectly suggest that no data values could exist between the intervals. A histogram (with touching bars) correctly communicates the continuous nature of the data.
-
-
Problem Solving
-
Back-to-back stem-and-leaf plot:
    Sample A (left) | Stem | Sample B (right)
9 8 7 6 6 5 4 4 3 2 |  1   | 0 1 2 3 4 5 5 6 7 7
            3 2 1 0 |  2   | 0 1 2
Key: 1|2 = 12 cm (Sample A reads right to left)
-
Sample A: The distribution is approximately uniform/symmetric across the 12–23 cm range, with values spread fairly evenly and a slight concentration in the teens.
Sample B: The distribution is similar in shape, with most values in the teens and a few in the 20s. Both distributions are slightly positively skewed (more values at the lower end).
-
Observation 1 (Centre): Sample A has a slightly higher typical height (median 16.5 cm compared with 15 cm for Sample B). Its values are more concentrated in the upper teens (17–20 cm range), whereas Sample B has more values in the lower teens (10–15 cm range).
Observation 2 (Spread): Both samples have a similar range (Sample A: 12–23 cm, a range of 11 cm; Sample B: 10–22 cm, a range of 12 cm), so the spread of heights is comparable. Neither sample shows outliers or unusual gaps.
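The comparison of centre and spread can be checked by decoding the back-to-back plot and summarising each sample. The sketch below assumes the plot has two stems: stem 1 with leaves 9 8 7 6 6 5 4 4 3 2 (Sample A) and 0 1 2 3 4 5 5 6 7 7 (Sample B), and stem 2 with leaves 3 2 1 0 (Sample A) and 0 1 2 (Sample B). That reading is consistent with the key and with the minimum and maximum values quoted above. The helper names `decode` and `summarise` are hypothetical.

```python
from statistics import median

def decode(stem_rows):
    """Turn {stem: [leaves]} into a sorted list of values (key: 1|2 = 12)."""
    return sorted(stem * 10 + leaf
                  for stem, leaves in stem_rows.items()
                  for leaf in leaves)

def summarise(values):
    """Basic measures of centre and spread for one sample."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "median": median(values),
    }

# Leaves as read from the plot (assumed row split, see the note above).
sample_a = decode({1: [9, 8, 7, 6, 6, 5, 4, 4, 3, 2], 2: [3, 2, 1, 0]})
sample_b = decode({1: [0, 1, 2, 3, 4, 5, 5, 6, 7, 7], 2: [0, 1, 2]})

summary_a = summarise(sample_a)
summary_b = summarise(sample_b)
print("A:", summary_a)
print("B:", summary_b)
```

The median is used for centre here because it can be read directly from an ordered stem-and-leaf plot without further calculation.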
-
-
Problem Solving
-
Variables to collect:
- Exercise frequency (e.g. days per week)
- Academic result (e.g. overall grade percentage or GPA)
- Year level (to control for age/difficulty)
- Gender (optional covariate)
- Duration of each exercise session (if frequency is not enough)
-
Exercise frequency (days/week): Discrete (counted, whole number values 0–7)
Academic result (%): Continuous (measured score, can be any value 0–100)
Year level: Ordinal (Year 7, 8, 9 … have a natural order)
Gender: Nominal (categories with no natural order)
-
Survey method: Distribute a voluntary, anonymous questionnaire to a random sample of at least 50 students from across multiple year levels at the school. Ask: (1) How many days per week do you exercise for at least 30 minutes? (2) What is your current overall academic average (%)? (3) What year level are you in? Collect data over one school term to ensure consistency.
-
Exercise frequency: Dot plot or bar chart (discrete, small number of values 0–7)
Academic result: Histogram (continuous, grouped into class intervals such as 50–59%, 60–69%, etc.)
To show relationship: Scatter plot (exercise days on x-axis, academic result on y-axis)
-
Voluntary response bias: Students who are motivated, healthy and organised are more likely to complete the survey. These students may exercise more and have higher academic results for the same underlying reason (self-discipline). This would overstate the relationship between exercise and academic performance. The sample would not be representative of all students.
-