Types of Data and Statistical Displays

Key Terms

Categorical data: Data that represents named categories or groups; cannot be measured numerically (e.g. colour, brand, gender).
Numerical data: Data with number values; discrete (countable whole numbers) or continuous (measured on a scale).
Frequency table: A table showing how often each value or category occurs in a dataset.
Histogram: A graph for continuous numerical data; bars are adjacent (no gaps); x-axis shows class intervals, y-axis shows frequency.
Stem-and-leaf plot: Splits each data value into a stem (leading digit(s)) and leaf (last digit); retains the actual data values.
Back-to-back stem-and-leaf: Compares two data sets side by side sharing the same stems; leaves go left and right.

Classifying Data

Every piece of data we collect belongs to a specific type. Identifying the type correctly tells us which displays and statistics are appropriate.

Category	Type	Description	Examples	Best Display
Categorical	Nominal	Named groups — no natural order	Eye colour, blood type, suburb	Bar chart, pie chart
Categorical	Ordinal	Named groups — natural order exists	Satisfaction (1–5), grade (A/B/C), size (S/M/L)	Bar chart, ordered frequency table
Numerical	Discrete	Counted — only specific values possible	Number of children, goals scored, shoe size	Dot plot, stem-and-leaf, bar chart
Numerical	Continuous	Measured — any value in a range	Height, temperature, time, mass	Histogram, stem-and-leaf
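
The table above can be read as a lookup from data type to suitable displays. A minimal Python sketch of that idea follows; the names `DISPLAYS` and `suggest_displays` are illustrative assumptions, not part of the course material.

```python
# Suitable displays for each data type, copied from the classification table.
DISPLAYS = {
    "nominal": ["bar chart", "pie chart"],
    "ordinal": ["bar chart", "ordered frequency table"],
    "discrete": ["dot plot", "stem-and-leaf", "bar chart"],
    "continuous": ["histogram", "stem-and-leaf"],
}


def suggest_displays(data_type):
    """Return the displays the table lists for a data type (case-insensitive)."""
    key = data_type.strip().lower()
    if key not in DISPLAYS:
        raise ValueError(f"unknown data type: {data_type!r}")
    return DISPLAYS[key]


print(suggest_displays("Continuous"))  # ['histogram', 'stem-and-leaf']
```

The lookup only works once the variable has been classified by hand; the classification itself still needs the reasoning shown in the table.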

Statistical Displays

Common displays used in General Maths:

Dot plot — best for small datasets; shows every individual value; reveals clusters, gaps and outliers
Stem-and-leaf plot — shows shape and retains individual values; for two-digit data, stems are tens and leaves are units
Bar chart — for categorical data; bars are separated by gaps
Histogram — for continuous data grouped into class intervals; bars touch (no gaps)
Frequency table — organises data into classes; can calculate totals and percentages

Figure 1: Dot plot of the data set {2, 3, 3, 4, 4, 4, 5, 5, 6, 7}
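
The data set above is small enough to plot in plain text. The following is a minimal Python sketch, assuming single-digit whole-number values so that the columns line up; the function name `dot_plot_rows` is illustrative.

```python
from collections import Counter

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]


def dot_plot_rows(values):
    """Build the rows of a text dot plot, top row first, axis labels last."""
    counts = Counter(values)
    lo, hi = min(counts), max(counts)
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):
        # Place a dot in a column only if that value occurs at least `level` times.
        cells = ["o" if counts.get(v, 0) >= level else " " for v in range(lo, hi + 1)]
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(str(v) for v in range(lo, hi + 1)))
    return rows


for row in dot_plot_rows(data):
    print(row)
```

Each column of dots sits above its value, so the tallest column (at 4) marks the most frequent value.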

Figure 2: Histogram with class intervals on the x-axis and frequency on the y-axis

Note: bars in a histogram touch — no gaps between them.

Worked Example 1 — Classifying Variables

Classify each variable as nominal, ordinal, discrete or continuous:

Variable	Type	Reason
Favourite sport	Nominal	Categories with no natural order
Movie rating (1–5 stars)	Ordinal	Ordered categories; cannot add or subtract ratings meaningfully
Number of siblings	Discrete	Counted; only whole number values possible
Weight of a parcel	Continuous	Measured; can take any value in a range
Postcode	Nominal	Labels only — arithmetic on postcodes is meaningless
Length of a fish	Continuous	Measured; infinitely many possible values in any interval

Worked Example 2 — Frequency Table and Distribution Shape

The ages of 34 party guests were recorded. Frequency table:

Age group	Frequency	Relative frequency
10–14	3	3/34 ≈ 8.8%
15–19	7	7/34 ≈ 20.6%
20–24	12	12/34 ≈ 35.3%
25–29	8	8/34 ≈ 23.5%
30–34	4	4/34 ≈ 11.8%
Total	34	100%
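
The relative frequency column can be checked with a few lines of Python. This is a sketch only; the function name `relative_frequencies` is an illustrative assumption, and the frequencies are the ones from the table.

```python
# Frequencies from the party-guest table.
age_groups = {"10–14": 3, "15–19": 7, "20–24": 12, "25–29": 8, "30–34": 4}


def relative_frequencies(freqs):
    """Return each frequency as a percentage of the total, to 1 decimal place."""
    total = sum(freqs.values())
    return {group: round(100 * f / total, 1) for group, f in freqs.items()}


total = sum(age_groups.values())
rel = relative_frequencies(age_groups)
for group, f in age_groups.items():
    print(f"{group}\t{f}\t{f}/{total} ≈ {rel[group]}%")
print(f"Total\t{total}")
```

Each relative frequency is the class frequency divided by the total of 34, expressed as a percentage.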

Distribution shape: The frequencies rise from the 10–14 group, peak at 20–24, then fall. This is approximately symmetric (mound-shaped/bell-shaped), with the peak in the middle class.

If most values were in the lower age groups and a few in the higher groups, we would say the distribution is positively skewed (tail to the right).

Hot Tip: For a histogram, the x-axis shows class intervals (continuous ranges), and bars must touch — no gaps. A bar chart for categorical data has gaps between bars. Mixing these up is one of the most common errors in Statistics. Ask yourself: “Is the data a measurement or a category?”

Full Lesson: Types of Data and Statistical Displays

Why Does Data Type Matter?

Statistics is about making sense of data. But before we can calculate anything or draw a graph, we need to understand what kind of data we have. Using the wrong display or statistic for a data type leads to misleading results. For example, finding the “average” postcode is mathematically possible but completely meaningless.

The Two Major Categories: Categorical and Numerical

Categorical data describes qualities or groups. We cannot perform arithmetic on it in a meaningful way.

Nominal: Groups have no natural order, so listing them in a different order changes nothing. Examples: eye colour (blue, green, brown), blood type (A, B, AB, O), suburb names.
Ordinal: Groups have a natural order, but the gaps between categories are not necessarily equal. Examples: satisfaction survey (strongly disagree, disagree, neutral, agree, strongly agree), academic grades (A, B, C, D, E).

Numerical data represents quantities. Arithmetic operations (addition, subtraction, averages) are meaningful.

Discrete: Values are counted; only specific values (usually integers) are possible. You cannot have 2.3 children. Examples: number of goals, number of students, shoe size.
Continuous: Values are measured along a continuous scale. Any value within a range is theoretically possible. Examples: height (162.4 cm), time (9.58 s), temperature (37.2°C).

Choosing the Right Display

The choice of display depends mainly on the data type and, for some displays, on the size of the dataset:

Dot plot: Ideal for small datasets (up to ~30 values) of discrete or continuous data. Each dot represents one data point. Excellent for spotting the shape, clusters, gaps and outliers. Preserves all individual values.
Stem-and-leaf plot: For numerical data. The “stem” is the leading digit(s), the “leaf” is the last digit. Back-to-back stem plots compare two groups on the same stem. Preserves all individual values while showing shape.
Bar chart: For categorical data. Each category gets a separate bar. Bars must have gaps between them to indicate that the categories are distinct, not continuous.
Histogram: For continuous numerical data that has been grouped into class intervals. Bars touch to reflect the continuous nature of the scale. The area of each bar is proportional to frequency; when all class widths are equal, the height of each bar is proportional to frequency as well.
Frequency table: Organises both categorical and numerical data. For grouped data, class width should be consistent. Allows calculation of relative frequencies and percentages.
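
Grouping continuous data into class intervals of equal width, as a histogram or grouped frequency table requires, can be sketched in Python. The heights below are invented purely for illustration, the function name `group_into_classes` is hypothetical, and the convention that each class includes its lower boundary but not its upper boundary is an assumption.

```python
# Hypothetical heights in cm, invented for illustration only.
heights = [151.2, 158.9, 160.0, 162.4, 163.7, 165.1, 168.8, 170.0, 171.5, 179.9]


def group_into_classes(values, start, width, n_classes):
    """Count values in equal-width classes: start to <start+width, and so on."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the class intervals")
        counts[index] += 1
    return counts


counts = group_into_classes(heights, start=150, width=10, n_classes=3)
for i, c in enumerate(counts):
    lo = 150 + 10 * i
    # One '#' per value gives a sideways histogram with equal class widths.
    print(f"{lo} to <{lo + 10}\t{c}\t{'#' * c}")
```

Because every class has the same width, a value on a boundary such as 160.0 belongs to exactly one class, and the bar lengths can be compared directly.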

Describing Distribution Shape

When you look at a histogram or dot plot, describe its overall pattern:

Symmetric: The left and right halves are mirror images. A “bell curve” is symmetric.
Positively skewed (right-skewed): Tail extends to the right; most values are in the lower range with a few very high values. Example: house prices.
Negatively skewed (left-skewed): Tail extends to the left; most values are in the upper range with a few very low values. Example: scores on an easy test.
Bimodal: Two separate peaks, suggesting the data may come from two distinct groups.
Uniform: All values are roughly equally frequent.
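
A common rule of thumb, which goes beyond the list above, is that a long right tail tends to pull the mean above the median, and a long left tail tends to pull it below. The sketch below uses that rule to hint at skew direction. The function name `skew_hint`, the `tolerance` cut-off of 0.05 and the sample values are all assumptions made for illustration. The check cannot recognise bimodal or uniform shapes, so it complements looking at the display rather than replacing it.

```python
from statistics import mean, median


def skew_hint(values, tolerance=0.05):
    """Rough hint at skew direction from the gap between mean and median.

    `tolerance` is an arbitrary cut-off, expressed as a fraction of the range.
    """
    spread = max(values) - min(values)
    if spread == 0:
        return "no spread"
    gap = (mean(values) - median(values)) / spread
    if gap > tolerance:
        return "possibly positively skewed"
    if gap < -tolerance:
        return "possibly negatively skewed"
    return "roughly symmetric"


# Invented values: most are low, with a few high values forming a right tail.
print(skew_hint([1, 1, 2, 2, 2, 3, 3, 4, 9, 15]))
```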

Reading a stem-and-leaf plot: The stem plot below shows test scores.

3 | 2 5 8    → values: 32, 35, 38
4 | 1 4 4 7 9    → values: 41, 44, 44, 47, 49
5 | 0 3 6    → values: 50, 53, 56
6 | 2    → value: 62

Key: 3|2 = 32

This gives 12 values in total (3 + 5 + 3 + 1). The distribution is roughly symmetric with a slight positive skew: most values sit in the 30s to 50s, and a single value in the 60s forms a short right tail. The minimum is 32 and the maximum is 62.
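
The same plot can be rebuilt from the raw scores, which is a useful check on the count, minimum and maximum. This is a minimal Python sketch assuming two-digit values; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Test scores read off the stem-and-leaf plot above.
scores = [32, 35, 38, 41, 44, 44, 47, 49, 50, 53, 56, 62]


def stem_and_leaf(values):
    """Group values by tens digit (stem), with units digits as sorted leaves."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)


plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")

print("count:", len(scores), "min:", min(scores), "max:", max(scores))
```

Note that this sketch skips any stem with no leaves, whereas a hand-drawn plot should still list an empty stem so that gaps in the data remain visible.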

Lesson Tip: A common mistake is calling shoe size “continuous” because it looks like a number. But shoe sizes come in fixed increments (7, 7.5, 8, 8.5, …) — you cannot have a shoe size of 8.3. This makes it discrete. Always ask: “Could this value take any number in a range, or only specific separate values?”

Mastery Practice

Fluency
Classify each of the following variables as nominal, ordinal, discrete or continuous:
1. Number of pets owned by a family
2. Hair colour of students in a class
3. Daily maximum temperature (°C) in Brisbane
4. Student academic grade (A, B, C, D, E)
5. Shoe size (Australian sizing: 5, 5.5, 6, 6.5, …)
6. Height of players on a basketball team
7. Postcode of a suburb
8. Time taken (seconds) to run 100 m
Fluency
For each variable in Question 1 (parts 1–8), state the most appropriate statistical display and give one reason for your choice.

Fluency

A frequency table shows the ages of guests at a party:

Age group	Frequency
10–14	3
15–19	7
20–24	12
25–29	8
30–34	4

1. How many guests attended the party in total?
2. Which age group was most common?
3. What percentage of guests were in the 20–24 age group? (Round to 1 decimal place.)

Fluency

A stem-and-leaf plot shows test scores for a class:

Stem	Leaf
3	2 5 8
4	1 4 4 7 9
5	0 3 6
6	2

Key: 3|2 = 32

1. What is the minimum score?
2. What is the maximum score?
3. How many values are in the dataset?
4. List all the values in the 40s.

Understanding
A survey asks 20 students to name their favourite sport. Results: football ×7, basketball ×5, swimming ×4, tennis ×3, other ×1.
1. Classify this data type.
2. What is the most appropriate display for this data?
3. Construct a frequency table including a “Relative Frequency (%)” column.
4. Which display would not be suitable for this data? Explain why.
Understanding
Describe the shape of each distribution. Use terms such as symmetric, positively skewed, negatively skewed, bimodal or uniform.
1. A histogram where frequencies rise gradually to a peak in the middle class, then decrease symmetrically.
2. A stem-and-leaf plot where most values are in the upper stems (70s and 80s), with only a few in the lower stems.
3. A histogram with two separate peaks at opposite ends of the scale.
4. A dot plot with values spread evenly from 1 to 10, but one dot sitting far to the right at 25.
Understanding
A class of 25 students received the following scores on a test:

45, 52, 58, 62, 63, 65, 67, 68, 70, 71, 72, 72, 74, 75, 75, 76, 78, 80, 82, 84, 86, 88, 90, 92, 95
1. Group the data into class intervals 40–49, 50–59, 60–69, 70–79, 80–89, 90–99 and construct a frequency table.
2. Describe the shape of the distribution.
Understanding
Explain clearly:
1. Why a histogram is not an appropriate display for categorical data.
2. Why a bar chart is not appropriate for displaying continuous numerical data.
Problem Solving
A sample of 15 plants has the following heights (cm):

12, 13, 14, 14, 15, 16, 16, 17, 18, 18, 19, 20, 21, 22, 23

A second sample of 15 plants was grown under different conditions:

10, 11, 12, 13, 14, 15, 15, 16, 17, 17, 18, 19, 20, 21, 22
1. Construct a back-to-back stem-and-leaf plot comparing the two samples. Use tens as the stems.
2. Comment on the shape of each distribution.
3. Make two observations comparing the two groups (consider typical value and spread).
Problem Solving
Design a statistical investigation to answer the question: “Do students who exercise more frequently achieve higher academic results?”
1. Identify all the variables you would need to collect data on.
2. Classify each variable (nominal, ordinal, discrete or continuous).
3. Describe in detail how you would collect data. Include sample size and method.
4. State the most appropriate display for each variable.
5. Identify one potential source of bias in your investigation and explain how it could affect results.