Types of Data and Statistical Displays
Key Terms
- Categorical data: Data that represents named categories or groups; cannot be measured numerically (e.g. colour, brand, gender).
- Numerical data: Data with number values; discrete (counted, taking only separate values, usually whole numbers) or continuous (measured on a scale).
- Frequency table: A table showing how often each value or category occurs in a dataset.
- Histogram: A graph for continuous numerical data; bars are adjacent (no gaps); x-axis shows class intervals, y-axis shows frequency.
- Stem-and-leaf plot: Splits each data value into a stem (leading digit(s)) and leaf (last digit); retains the actual data values.
- Back-to-back stem-and-leaf: Compares two data sets side by side sharing the same stems; leaves go left and right.
Classifying Data
Every piece of data we collect belongs to a specific type. Identifying the type correctly tells us which displays and statistics are appropriate.
| Category | Type | Description | Examples | Best Display |
|---|---|---|---|---|
| Categorical | Nominal | Named groups with no natural order | Eye colour, blood type, suburb | Bar chart, pie chart |
| Categorical | Ordinal | Named groups with a natural order | Satisfaction (1–5), grade (A/B/C), size (S/M/L) | Bar chart, ordered frequency table |
| Numerical | Discrete | Counted; only specific values possible | Number of children, goals scored, shoe size | Dot plot, stem-and-leaf, bar chart |
| Numerical | Continuous | Measured; any value in a range | Height, temperature, time, mass | Histogram, stem-and-leaf |
Statistical Displays
Common displays used in General Maths:
- Dot plot: best for small datasets; shows every individual value; reveals clusters, gaps and outliers
- Stem-and-leaf plot: shows shape and retains individual values; for two-digit data the stems are the tens and the leaves are the units
- Bar chart: for categorical data; bars are separated by gaps
- Histogram: for continuous data grouped into class intervals; bars touch (no gaps)
- Frequency table: organises data into classes; lets you calculate totals and percentages
Figure 1: Dot plot of the data set {2, 3, 3, 4, 4, 4, 5, 5, 6, 7}
Figure 2: Histogram with class intervals on the x-axis and frequency on the y-axis
Note: bars in a histogram touch — no gaps between them.
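A dot plot can also be sketched in plain text, which may help when checking a hand-drawn one. The snippet below is a minimal sketch in Python using the data set {2, 3, 3, 4, 4, 4, 5, 5, 6, 7}; the function name `text_dot_plot` is made up for this illustration, and the layout assumes single-digit whole-number values.

```python
from collections import Counter

def text_dot_plot(values):
    """Return the lines of a simple text dot plot (one 'o' per data value)."""
    counts = Counter(values)
    lowest, highest = min(values), max(values)
    tallest = max(counts.values())
    lines = []
    # Build the rows from the top of the tallest stack down to the axis.
    for level in range(tallest, 0, -1):
        row = " ".join(
            "o" if counts[x] >= level else " "
            for x in range(lowest, highest + 1)
        )
        lines.append(row.rstrip())
    # Axis labels (assumes single-digit integers so the columns line up).
    lines.append(" ".join(str(x) for x in range(lowest, highest + 1)))
    return lines

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]
plot_lines = text_dot_plot(data)
print("\n".join(plot_lines))
```

Each column of dots sits above its value on the axis, so the tallest stack (above 4) marks the most frequent value.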
Worked Example 1 — Classifying Variables
Classify each variable as nominal, ordinal, discrete or continuous:
| Variable | Type | Reason |
|---|---|---|
| Favourite sport | Nominal | Categories with no natural order |
| Movie rating (1–5 stars) | Ordinal | Ordered categories; cannot add or subtract ratings meaningfully |
| Number of siblings | Discrete | Counted; only whole number values possible |
| Weight of a parcel | Continuous | Measured; can take any value in a range |
| Postcode | Nominal | Labels only — arithmetic on postcodes is meaningless |
| Length of a fish | Continuous | Measured; infinite possible values in any interval |
Worked Example 2 — Frequency Table and Distribution Shape
The ages of 34 party guests were recorded. Frequency table:
| Age group | Frequency | Relative frequency |
|---|---|---|
| 10–14 | 3 | 3/34 ≈ 8.8% |
| 15–19 | 7 | 7/34 ≈ 20.6% |
| 20–24 | 12 | 12/34 ≈ 35.3% |
| 25–29 | 8 | 8/34 ≈ 23.5% |
| 30–34 | 4 | 4/34 ≈ 11.8% |
| Total | 34 | 100% |
Distribution shape: The frequencies rise from the 10–14 group, peak at 20–24, then fall. This is approximately symmetric (mound-shaped/bell-shaped), with the peak in the middle class.
If most values were in the lower age groups and a few in the higher groups, we would say the distribution is positively skewed (tail to the right).
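The relative frequencies in the table can be checked with a few lines of Python. This is a minimal sketch: the variable names are chosen for illustration only, and the percentages are rounded to one decimal place as in the table.

```python
# Frequencies copied from the party guest table above.
age_groups = ["10-14", "15-19", "20-24", "25-29", "30-34"]
frequencies = [3, 7, 12, 8, 4]

total = sum(frequencies)

# Relative frequency = frequency / total, written as a percentage.
relative = [round(100 * f / total, 1) for f in frequencies]

for group, f, pct in zip(age_groups, frequencies, relative):
    print(f"{group}: {f}/{total} = {pct}%")

# The modal class is the group with the highest frequency.
modal_group = age_groups[frequencies.index(max(frequencies))]
print("Total:", total, "| Modal class:", modal_group)
```

Adding the frequencies first and dividing each one by that total is the same calculation as the third column of the table.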
Full Lesson: Types of Data and Statistical Displays
Why Does Data Type Matter?
Statistics is about making sense of data. But before we can calculate anything or draw a graph, we need to understand what kind of data we have. Using the wrong display or statistic for a data type leads to misleading results. For example, finding the “average” postcode is mathematically possible but completely meaningless.
The Two Major Categories: Categorical and Numerical
Categorical data describes qualities or groups. We cannot perform arithmetic on it in a meaningful way.
- Nominal: Groups have no ordering. Flipping the order changes nothing. Examples: eye colour (blue, green, brown), blood type (A, B, AB, O), suburb names.
- Ordinal: Groups have a natural order, but the gaps between categories are not necessarily equal. Examples: satisfaction survey (strongly disagree, disagree, neutral, agree, strongly agree), academic grades (A, B, C, D, E).
Numerical data represents quantities. Arithmetic operations (addition, subtraction, averages) are meaningful.
- Discrete: Values are counted; only specific values (usually integers) are possible. You cannot have 2.3 children. Examples: number of goals, number of students, shoe size.
- Continuous: Values are measured along a continuous scale. Any value within a range is theoretically possible. Examples: height (162.4 cm), time (9.58 s), temperature (37.2°C).
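The questions behind this classification (is it a quantity? is it ordered? is it counted?) can be written as a small decision function. The Python below is only an illustrative sketch: the function name `classify_variable` and its yes/no parameters are hypothetical, and you still have to answer each question yourself for the variable in front of you.

```python
def classify_variable(is_quantity, is_ordered=False, is_counted=False):
    """Classify a variable from yes/no answers about it.

    is_quantity: True if the values are amounts where arithmetic is meaningful.
    is_ordered:  (categorical only) True if the categories have a natural order.
    is_counted:  (numerical only) True if values are counted, not measured.
    """
    if not is_quantity:
        return "ordinal" if is_ordered else "nominal"
    return "discrete" if is_counted else "continuous"

# Variables taken from Worked Example 1.
print(classify_variable(is_quantity=False))                   # favourite sport
print(classify_variable(is_quantity=False, is_ordered=True))  # movie rating
print(classify_variable(is_quantity=True, is_counted=True))   # number of siblings
print(classify_variable(is_quantity=True))                    # weight of a parcel
print(classify_variable(is_quantity=False))                   # postcode (a label)
```

A postcode is written with digits, but it is answered as "not a quantity" because arithmetic on postcodes is meaningless, so it comes out as nominal.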
Choosing the Right Display
The choice of display depends mainly on the data type, and for some displays on the size of the dataset:
- Dot plot: Ideal for small datasets (up to ~30 values) of discrete or continuous data. Each dot represents one data point. Excellent for spotting the shape, clusters, gaps and outliers. Preserves all individual values.
- Stem-and-leaf plot: For numerical data. The “stem” is the leading digit(s), the “leaf” is the last digit. Back-to-back stem plots compare two groups on the same stem. Preserves all individual values while showing shape.
- Bar chart: For categorical data. Each category gets a separate bar. Bars must have gaps between them to indicate that the categories are distinct, not continuous.
- Histogram: For continuous numerical data that has been grouped into class intervals. Bars touch to reflect the continuous nature of the scale. The area of each bar is proportional to frequency.
- Frequency table: Organises both categorical and numerical data. For grouped data, class width should be consistent. Allows calculation of relative frequencies and percentages.
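Grouping continuous data into equal-width class intervals is the step that turns raw measurements into a frequency table or histogram. The Python sketch below shows one way to do it. The heights are hypothetical values invented for this illustration, and the function name `grouped_frequencies` is likewise an assumption, not part of any library.

```python
def grouped_frequencies(values, start, width, n_classes):
    """Count the values in each class [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        i = int((v - start) // width)  # index of the class containing v
        if 0 <= i < n_classes:
            counts[i] += 1
        else:
            raise ValueError(f"{v} is outside the class intervals")
    return counts

# Hypothetical heights in cm (made up for illustration only).
heights_cm = [150.2, 153.8, 155.0, 158.4, 161.1,
              162.4, 164.9, 167.3, 171.0, 178.6]

counts = grouped_frequencies(heights_cm, start=150, width=10, n_classes=3)

# Print a sideways text histogram: one '#' per value in the class.
for i, c in enumerate(counts):
    lower = 150 + 10 * i
    print(f"{lower} to <{lower + 10}: {'#' * c}")
```

Every class has the same width (10 cm here), which matches the advice above that class width should be consistent.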
Describing Distribution Shape
When you look at a histogram or dot plot, describe its overall pattern:
- Symmetric: The left and right halves are mirror images. A “bell curve” is symmetric.
- Positively skewed (right-skewed): Tail extends to the right; most values are in the lower range with a few very high values. Example: house prices.
- Negatively skewed (left-skewed): Tail extends to the left; most values are in the upper range with a few very low values. Example: scores on an easy test.
- Bimodal: Two separate peaks, suggesting the data may come from two distinct groups.
- Uniform: All values are roughly equally frequent.
Example: reading a stem-and-leaf plot
3 | 2 5 8 → values: 32, 35, 38
4 | 1 4 4 7 9 → values: 41, 44, 44, 47, 49
5 | 0 3 6 → values: 50, 53, 56
6 | 2 → value: 62
Key: 3|2 = 32
This gives 12 values in total (3 + 5 + 3 + 1). The distribution is roughly symmetric with a slight positive skew (more values in the 40s than the 60s). The minimum is 32 and the maximum is 62.
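The same plot can be rebuilt from the raw values by splitting each one into a tens digit (stem) and a units digit (leaf). The Python below is a minimal sketch that assumes two-digit whole numbers; the function name `stem_and_leaf` is chosen for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 32 -> stem 3, leaf 2
        plot[stem].append(leaf)
    return dict(plot)

# The values read off the stem-and-leaf plot above.
values = [32, 35, 38, 41, 44, 44, 47, 49, 50, 53, 56, 62]
plot = stem_and_leaf(values)

# Print every stem from lowest to highest, even if it has no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
print("Key: 3|2 = 32")

# Counting the leaves gives the number of data values.
n_values = sum(len(leaves) for leaves in plot.values())
print("Number of values:", n_values)
```

Counting the leaves is a quick check that no value was dropped or repeated when the plot was drawn.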
Mastery Practice
Question 1 (Fluency)
Classify each of the following variables as nominal, ordinal, discrete or continuous:
- Number of pets owned by a family
- Hair colour of students in a class
- Daily maximum temperature (°C) in Brisbane
- Student academic grade (A, B, C, D, E)
- Shoe size (Australian sizing: 5, 5.5, 6, 6.5, …)
- Height of players on a basketball team
- Postcode of a suburb
- Time taken (seconds) to run 100 m
Question 2 (Fluency)
For each variable in Question 1 (parts a–h), state the most appropriate statistical display and give one reason for your choice.
Question 3 (Fluency)
A frequency table shows the ages of guests at a party:
| Age group | Frequency |
|---|---|
| 10–14 | 3 |
| 15–19 | 7 |
| 20–24 | 12 |
| 25–29 | 8 |
| 30–34 | 4 |
- How many guests attended the party in total?
- Which age group was most common?
- What percentage of guests were in the 20–24 age group? (Round to 1 decimal place.)
Question 4 (Fluency)
A stem-and-leaf plot shows test scores for a class:
Stem | Leaf
3 | 2 5 8
4 | 1 4 4 7 9
5 | 0 3 6
6 | 2
Key: 3|2 = 32
- What is the minimum score?
- What is the maximum score?
- How many values are in the dataset?
- List all the values in the 40s.
Question 5 (Understanding)
A survey asks 20 students to name their favourite sport. Results: football ×7, basketball ×5, swimming ×4, tennis ×3, other ×1.
- Classify this data type.
- What is the most appropriate display for this data?
- Construct a frequency table including a “Relative Frequency (%)” column.
- Which display would not be suitable for this data? Explain why.
Question 6 (Understanding)
Describe the shape of each distribution. Use terms such as symmetric, positively skewed, negatively skewed, bimodal or uniform.
- A histogram where frequencies rise gradually to a peak in the middle class, then decrease symmetrically.
- A stem-and-leaf plot where most values are in the upper stems (70s and 80s), with only a few in the lower stems.
- A histogram with two separate peaks at opposite ends of the scale.
- A dot plot with values spread evenly from 1 to 10, but one dot sitting far to the right at 25.
Question 7 (Understanding)
A class of 25 students received the following scores on a test:
45, 52, 58, 62, 63, 65, 67, 68, 70, 71, 72, 72, 74, 75, 75, 76, 78, 80, 82, 84, 86, 88, 90, 92, 95
- Group the data into class intervals 40–49, 50–59, 60–69, 70–79, 80–89, 90–99 and construct a frequency table.
- Describe the shape of the distribution.
Question 8 (Understanding)
Explain clearly:
- Why a histogram is not an appropriate display for categorical data.
- Why a bar chart is not appropriate for displaying continuous numerical data.
Question 9 (Problem Solving)
A sample of 15 plants has the following heights (cm):
12, 13, 14, 14, 15, 16, 16, 17, 18, 18, 19, 20, 21, 22, 23
A second sample of 15 plants was grown under different conditions:
10, 11, 12, 13, 14, 15, 15, 16, 17, 17, 18, 19, 20, 21, 22
- Construct a back-to-back stem-and-leaf plot comparing the two samples. Use tens as the stems.
- Comment on the shape of each distribution.
- Make two observations comparing the two groups (consider typical value and spread).
Question 10 (Problem Solving)
Design a statistical investigation to answer the question: “Do students who exercise more frequently achieve higher academic results?”
- Identify all the variables you would need to collect data on.
- Classify each variable (nominal, ordinal, discrete or continuous).
- Describe in detail how you would collect data. Include sample size and method.
- State the most appropriate display for each variable.
- Identify one potential source of bias in your investigation and explain how it could affect results.