Solutions: Types of Data and Statistical Displays

Fluency
1. Discrete — you count pets; only whole number values are possible (0, 1, 2, 3, …)
2. Nominal — hair colour is a named category with no natural ordering (brown is not “greater than” blonde)
3. Continuous — temperature is measured; it can take any value (e.g. 28.4°C, 31.07°C)
4. Ordinal — grades have a natural order (A > B > C > D > E) but the gaps between grades are not necessarily equal
5. Discrete — shoe sizes come in fixed increments (5, 5.5, 6, 6.5, …), so the possible values are separate and countable; there is no size 6.3.
6. Continuous — height is measured; any value is possible (e.g. 182.3 cm, 195.87 cm)
7. Nominal — postcodes are labels/identifiers. Arithmetic on them is meaningless (e.g. the sum 4000 + 4001 tells you nothing about either place).
8. Continuous — time is measured; any positive real value is theoretically possible (e.g. 9.87 s, 12.403 s)
Fluency
1. Dot plot — discrete data with a small range; shows each value individually
2. Bar chart — nominal (categorical) data; bars separated by gaps to show distinct categories
3. Histogram — continuous numerical data; bars touch to reflect the continuous scale
4. Bar chart (ordered) — ordinal data; bars can be ordered A to E to show the ranking
5. Dot plot or bar chart — discrete with distinct values; a dot plot shows every value, a bar chart shows frequency of each size
6. Histogram or stem-and-leaf plot — continuous data; histogram shows grouped distribution; stem-and-leaf preserves individual values
7. Bar chart — nominal data (postcodes are just labels); a bar chart shows how many observations fall in each postcode
8. Histogram or stem-and-leaf plot — continuous measured data; class intervals group the times; stem-and-leaf (using seconds and tenths) retains precision
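
The pairings used in the answers above can be summarised as a lookup from data type to suitable displays. This is only a rule-of-thumb sketch: the dictionary and the function name `suggest_displays` are illustrative, and the best display in practice also depends on how many values there are and how wide the range is.

```python
# Rule-of-thumb pairing of data type and display, taken from the answers above.
# The names here are illustrative, not part of any library.
DISPLAY_FOR_TYPE = {
    "nominal": ["bar chart"],
    "ordinal": ["bar chart (ordered)"],
    "discrete": ["dot plot", "bar chart"],
    "continuous": ["histogram", "stem-and-leaf plot"],
}


def suggest_displays(data_type):
    """Return the displays suggested for a data type (case-insensitive)."""
    return DISPLAY_FOR_TYPE[data_type.strip().lower()]


print(suggest_displays("Continuous"))
print(suggest_displays("nominal"))
```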
Fluency
1. Total = 3 + 7 + 12 + 8 + 4 = 34 guests
2. The 20–24 age group has the highest frequency (12 guests).
3. Percentage = 12 ÷ 34 × 100 ≈ 35.3% (to 1 decimal place)
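
The three answers above can be checked with a few lines of Python. This is a minimal sketch that uses only the frequencies quoted in the working; the list is assumed to be in table order, and only the class with frequency 12 (the 20–24 group) is named in the solution.

```python
# Frequencies from the working above, assumed to be in table order.
frequencies = [3, 7, 12, 8, 4]

total = sum(frequencies)            # total number of guests
highest = max(frequencies)          # frequency of the modal class
percentage = highest / total * 100  # share of guests in the modal class

print(total)
print(highest)
print(round(percentage, 1))
```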
Fluency
1. Minimum = 32 (stem 3, leaf 2)
2. Maximum = 62 (stem 6, leaf 2)
3. Count leaves: 3 + 5 + 3 + 1 = 12 values
4. Stem 4, leaves 1, 4, 4, 7, 9: values are 41, 44, 44, 47, 49
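
Reading a row of a stem-and-leaf plot amounts to combining the stem with each leaf. The sketch below assumes the key implied by the answers above (stem = tens, leaf = units, so 3|2 = 32); the helper name `expand_row` is illustrative.

```python
def expand_row(stem, leaves):
    """Turn one stem-and-leaf row into data values (assumed key: 4|1 = 41)."""
    return [stem * 10 + leaf for leaf in leaves]


# The row for stem 4, as given in answer 4.
stem_4 = expand_row(4, [1, 4, 4, 7, 9])

# Number of leaves on each stem, as counted in answer 3.
leaves_per_stem = [3, 5, 3, 1]
n_values = sum(leaves_per_stem)

print(stem_4)
print(n_values)
```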

Understanding

Nominal — sport names are categories with no natural order.
Bar chart — appropriate for nominal categorical data; bars are separated to show distinct sports.

Sport	Frequency	Relative Frequency (%)
Football	7	35%
Basketball	5	25%
Swimming	4	20%
Tennis	3	15%
Other	1	5%
Total	20	100%
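
The relative frequency column is each count divided by the total, times 100. A short sketch that reproduces the table from its frequency column (the variable names are illustrative):

```python
# Frequencies from the table above.
counts = {"Football": 7, "Basketball": 5, "Swimming": 4, "Tennis": 3, "Other": 1}

total = sum(counts.values())

# Relative frequency (%) = frequency x 100 / total
relative = {sport: count * 100 / total for sport, count in counts.items()}

for sport, pct in relative.items():
    print(f"{sport}\t{counts[sport]}\t{pct:.0f}%")
print(f"Total\t{total}\t{sum(relative.values()):.0f}%")
```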

A histogram would not be suitable. A histogram is used for continuous numerical data grouped into class intervals. Favourite sports are categorical (nominal) data — there is no numerical scale on which to place bars that touch.

Understanding
1. Symmetric (approximately bell-shaped / mound-shaped). The distribution rises to a single peak in the middle and falls away roughly equally on both sides.
2. Negatively skewed (left-skewed). Most values are concentrated in the upper stems (70s and 80s), with a long tail extending to the left (lower scores).
3. Bimodal. Two separate peaks suggest the data may come from two distinct subgroups within the population.
4. The distribution is positively skewed with an outlier. The bulk of the data is between 1 and 10, but the single point at 25 is far removed from the rest, creating a long tail to the right.

Understanding

Class interval	Frequency
40–49	1
50–59	2
60–69	5
70–79	8
80–89	6
90–99	3
Total	25

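The distribution is approximately symmetric with a slight negative skew: the tail extends further towards the low scores (three classes below the peak, two above). Frequencies build from low scores, peak in the 70–79 class, then decrease. The majority of students (19 of 25) scored between 60 and 89.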
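
The total, the modal class and the share of students in the 60–89 band can be read straight off the frequency table. A minimal sketch using the table's values (the variable names are illustrative):

```python
# Frequency table from above: class interval -> frequency.
table = {
    "40–49": 1,
    "50–59": 2,
    "60–69": 5,
    "70–79": 8,
    "80–89": 6,
    "90–99": 3,
}

total = sum(table.values())

# The modal class is the interval with the highest frequency.
modal_class = max(table, key=table.get)

# Students scoring between 60 and 89 (three middle classes).
middle = table["60–69"] + table["70–79"] + table["80–89"]
share = middle * 100 / total

print(total, modal_class, middle, share)
```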

Understanding
1. A histogram is not appropriate for categorical data because a histogram requires a numerical scale on the x-axis where values can be placed in order along a continuous number line. Categorical groups (e.g. “red”, “blue”, “green”) have no numerical position and cannot be meaningfully placed on such a scale. The touching bars of a histogram imply a continuous range between adjacent values — which does not exist for categories.
2. A bar chart is not appropriate for continuous data because a bar chart has gaps between bars to indicate that each bar represents a completely separate, distinct category. Continuous data flows without interruption along a number line — grouping it into class intervals and displaying it with gaps would incorrectly suggest that no data values could exist between the intervals. A histogram (with touching bars) correctly communicates the continuous nature of the data.

Problem Solving

Back-to-back stem-and-leaf plot:

Sample A (left)	Stem	Sample B (right)
9 8 7 6 6 5 4 4 3 2	1	0 1 2 3 4 5 5 6 7 7
3 2 1 0	2	0 1 2

Key: 1|2 = 12 cm (Sample A reads right to left)

Sample A: The distribution is spread fairly evenly across the 12–23 cm range, with a slight concentration in the teens (10 of the 14 values).

Sample B: The distribution is similar in shape, with most values in the teens (10 of the 13 values) and a few in the 20s. Both distributions are slightly positively skewed (more values at the lower end).

Observation 1 (Centre): Sample A has a slightly higher typical height (median 16.5 cm compared with 15 cm for Sample B). Its values appear more concentrated in the upper teens (17–20 cm range), whereas Sample B has slightly more values in the lower teens (10–15 cm range).

Observation 2 (Spread): Both samples have a similar range (12–23 cm vs 10–22 cm, i.e. 11 cm vs 12 cm), so the spread of heights is comparable. Neither sample shows outliers or unusual gaps.
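
The centre and spread comparisons can be checked by listing the values read from the back-to-back plot and computing the median and range of each sample. This sketch uses Python's standard `statistics.median`; the two lists are transcribed from the plot using the key 1|2 = 12 cm.

```python
from statistics import median

# Values read from the back-to-back stem-and-leaf plot (key: 1|2 = 12 cm).
sample_a = [12, 13, 14, 14, 15, 16, 16, 17, 18, 19, 20, 21, 22, 23]
sample_b = [10, 11, 12, 13, 14, 15, 15, 16, 17, 17, 20, 21, 22]

# Print sample size, median and range for each sample.
for name, data in (("A", sample_a), ("B", sample_b)):
    print(name, len(data), median(data), max(data) - min(data))
```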

Problem Solving
1. Variables to collect:
  - Exercise frequency (e.g. days per week)
  - Academic result (e.g. overall grade percentage or GPA)
  - Year level (to control for age/difficulty)
  - Gender (optional covariate)
  - Duration of each exercise session (if frequency is not enough)
2. Exercise frequency (days/week): Discrete (counted, whole number values 0–7)
  
  Academic result (%): Continuous (measured score, can be any value 0–100)
  
  Year level: Ordinal (Year 7, 8, 9 … have a natural order)
  
  Gender: Nominal (categories with no natural order)
3. Survey method: Distribute a voluntary, anonymous questionnaire to a random sample of at least 50 students from across multiple year levels at the school. Ask: (1) How many days per week do you exercise for at least 30 minutes? (2) What is your current overall academic average (%)? (3) What year level are you in? Collect data over one school term to ensure consistency.
4. Exercise frequency: Dot plot or bar chart (discrete, small number of values 0–7)
  
  Academic result: Histogram (continuous, grouped into class intervals such as 50–59%, 60–69%, etc.)
  
  To show relationship: Scatter plot (exercise days on x-axis, academic result on y-axis)
5. Voluntary response bias: Students who are motivated, healthy and organised are more likely to complete the survey. These students may exercise more and have higher academic results for the same underlying reason (self-discipline). This would overstate the relationship between exercise and academic performance. The sample would not be representative of all students.