Practice Maths

Mean, Variance and Standard Deviation

Key Terms

  • Expected value (mean): E(X) = ∑ x · P(X = x)
  • Mean of squares: E(X²) = ∑ x² · P(X = x)
  • Variance: Var(X) = E(X²) − [E(X)]²
  • Standard deviation: SD(X) = √Var(X)
  • Linear transformations (a, b constants): E(aX + b) = aE(X) + b; Var(aX + b) = a²Var(X); SD(aX + b) = |a| · SD(X)
  • Independent X and Y: E(X + Y) = E(X) + E(Y); Var(X + Y) = Var(X) + Var(Y)
Worked Example: X has the distribution: P(X=1)=0.2, P(X=2)=0.5, P(X=3)=0.3. Find E(X), Var(X), SD(X).

E(X): 1(0.2) + 2(0.5) + 3(0.3) = 0.2 + 1.0 + 0.9 = 2.1

E(X²): 1²(0.2) + 2²(0.5) + 3²(0.3) = 0.2 + 2.0 + 2.7 = 4.9

Var(X): E(X²) − [E(X)]² = 4.9 − (2.1)² = 4.9 − 4.41 = 0.49

SD(X): √0.49 = 0.7
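
The worked example can be checked numerically. A minimal Python sketch, using the distribution given above:

```python
# Verify the worked example: P(X=1)=0.2, P(X=2)=0.5, P(X=3)=0.3.
dist = {1: 0.2, 2: 0.5, 3: 0.3}  # x -> P(X = x)

mean = sum(x * p for x, p in dist.items())         # E(X)
mean_sq = sum(x**2 * p for x, p in dist.items())   # E(X²)
var = mean_sq - mean**2                            # Var(X) = E(X²) − [E(X)]²
sd = var ** 0.5                                    # SD(X)

print(round(mean, 2), round(mean_sq, 2), round(var, 2), round(sd, 2))
# 2.1 4.9 0.49 0.7
```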
Hot Tip: Var(aX + b) = a²Var(X) — note that adding a constant b does not change the variance! Shifting a distribution left or right does not change its spread. Only scaling (multiplying by a) affects the variance, and it scales by a². This is one of the most common exam tricks.

The Expected Value: A Weighted Average

The expected value E(X) is the long-run average value of a random variable — if you repeated the random experiment many times and averaged the outcomes, the average would approach E(X). Importantly, E(X) is not necessarily a value that X can actually take; it is a theoretical average.

The formula E(X) = ∑ x · P(X = x) is a weighted average of all possible values of X, where each value is weighted by its probability. Values that occur more often contribute more heavily to the average.

For example, if a game pays $10 with probability 0.1 and $1 with probability 0.9, the expected payout is 10(0.1) + 1(0.9) = $1.90. Over many plays, you would average $1.90 per game.
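
The payout calculation above is the same weighted-average sum in code:

```python
# Expected payout for the game in the text: $10 w.p. 0.1, $1 w.p. 0.9.
payouts = {10: 0.1, 1: 0.9}  # payout -> probability
expected_payout = sum(x * p for x, p in payouts.items())
print(round(expected_payout, 2))  # 1.9, i.e. $1.90 per game on average
```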

Variance: Measuring Spread

The variance Var(X) measures how spread out the distribution is around the mean. A high variance means outcomes vary widely from the mean; a low variance means they cluster close to the mean.

The defining formula is Var(X) = E[(X − μ)²], the expected squared deviation from the mean. However, the computational formula is far easier:

Var(X) = E(X²) − [E(X)]²

You compute E(X) and E(X²) separately from the distribution table, then subtract the square of the mean from the mean of the squares. This formula is algebraically equivalent to the defining formula but avoids having to compute deviations from the mean for each outcome.
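
The equivalence of the two formulas can be confirmed on any distribution; here is a quick check on an assumed example distribution:

```python
# The defining formula E[(X − μ)²] and the computational formula
# E(X²) − [E(X)]² give the same variance. Example distribution is assumed.
dist = {0: 0.3, 1: 0.4, 2: 0.2, 3: 0.1}  # x -> P(X = x)

mu = sum(x * p for x, p in dist.items())
var_defining = sum((x - mu) ** 2 * p for x, p in dist.items())
var_computational = sum(x**2 * p for x, p in dist.items()) - mu**2

print(abs(var_defining - var_computational) < 1e-12)  # True
```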

Standard Deviation

The standard deviation SD(X) = √Var(X) is simply the square root of the variance. It is expressed in the same units as X, making it easier to interpret than variance. If X is measured in dollars, SD(X) is in dollars too — variance would be in dollars squared.

Linear Transformations of a Random Variable

Suppose you transform X into Y = aX + b (scaling by a and shifting by b). The key rules are:

  • E(aX + b) = aE(X) + b: The mean scales and shifts in the same way as the transformation. This makes intuitive sense — if you multiply every outcome by 2 and add 3, the average also multiplies by 2 and gains 3.
  • Var(aX + b) = a²Var(X): Adding a constant b does not change the variance. Shifting all outcomes by the same amount does not change how spread out they are. Scaling by a multiplies the standard deviation by |a|, so variance multiplies by a².

These rules are essential for standardising random variables and solving problems where you need to find the distribution of a transformed variable.
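
Both rules can be verified by transforming a distribution directly. A sketch with arbitrary choices a = 3, b = 2, reusing the worked example's distribution:

```python
# Check E(aX+b) = aE(X)+b and Var(aX+b) = a²Var(X) for Y = aX + b.
a, b = 3, 2  # arbitrary constants for illustration
dist_x = {1: 0.2, 2: 0.5, 3: 0.3}
dist_y = {a * x + b: p for x, p in dist_x.items()}  # distribution of Y

def mean(d): return sum(v * p for v, p in d.items())
def var(d):  return sum(v**2 * p for v, p in d.items()) - mean(d) ** 2

print(abs(mean(dist_y) - (a * mean(dist_x) + b)) < 1e-9)  # E(aX+b) = aE(X)+b
print(abs(var(dist_y) - a**2 * var(dist_x)) < 1e-9)       # Var(aX+b) = a²Var(X)
```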

Combining Independent Random Variables

For two independent random variables X and Y:

  • E(X + Y) = E(X) + E(Y) and E(X − Y) = E(X) − E(Y)
  • Var(X + Y) = Var(X) + Var(Y) and Var(X − Y) = Var(X) + Var(Y)

Note that variances always add for independent variables, whether you are adding or subtracting the variables. This is because variance measures squared deviation — subtracting Y is the same as adding −Y, and Var(−Y) = (−1)²Var(Y) = Var(Y).
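
This can be checked by enumerating the joint distribution of two small independent variables (the distributions below are assumed for illustration):

```python
# For independent X and Y, Var(X − Y) = Var(X) + Var(Y).
dist_x = {1: 0.5, 3: 0.5}  # Var(X) = 1
dist_y = {0: 0.5, 4: 0.5}  # Var(Y) = 4

def mean(d): return sum(v * p for v, p in d.items())
def var(d):  return sum(v**2 * p for v, p in d.items()) - mean(d) ** 2

# Distribution of D = X − Y under independence: P(x, y) = P(x) · P(y).
dist_d = {}
for x, px in dist_x.items():
    for y, py in dist_y.items():
        dist_d[x - y] = dist_d.get(x - y, 0) + px * py

print(var(dist_d), var(dist_x) + var(dist_y))  # both equal 5.0
```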

Exam strategy: In a mean/variance question, always set up a column for x, P(X=x), x·P(X=x), and x²·P(X=x). Sum the third column for E(X) and the fourth column for E(X²). Then Var(X) = E(X²) − [E(X)]². Show each column clearly — this earns method marks even if you make an arithmetic error.
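
The column layout described above can be sketched in code, using the worked example's distribution:

```python
# Build the four exam-strategy columns: x, P(X=x), x·P(X=x), x²·P(X=x).
dist = {1: 0.2, 2: 0.5, 3: 0.3}

print(f"{'x':>4} {'P(X=x)':>8} {'x·P':>8} {'x²·P':>8}")
for x, p in dist.items():
    print(f"{x:>4} {p:>8} {x*p:>8.2f} {x*x*p:>8.2f}")

mean = sum(x * p for x, p in dist.items())         # sum of third column = E(X)
mean_sq = sum(x * x * p for x, p in dist.items())  # sum of fourth column = E(X²)
print(f"E(X) = {mean:.1f}, E(X²) = {mean_sq:.1f}, Var(X) = {mean_sq - mean**2:.2f}")
```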

Mastery Practice

  1. Fluency X has the distribution: P(X=0)=0.2, P(X=1)=0.5, P(X=2)=0.3. Find E(X).
  2. Fluency Using the distribution from Q1 (P(X=0)=0.2, P(X=1)=0.5, P(X=2)=0.3), find Var(X) and SD(X).
  3. Fluency If E(X) = 4 and Var(X) = 9, find E(3X − 2) and Var(3X − 2).
  4. Fluency X has the distribution: P(X=−1)=1/4, P(X=0)=1/2, P(X=1)=1/4. Find E(X) and E(X²).
  5. Understanding X has the distribution: P(X=0)=0.3, P(X=1)=0.4, P(X=2)=0.2, P(X=3)=0.1. Find E(X), Var(X), and SD(X).
  6. Understanding A game pays $5 if you roll a 6 on a fair die, and you lose $1 otherwise. Find the expected profit per game.
  7. Understanding X takes values 1, 2, 3 with P(X = k) = k/6 for k = 1, 2, 3. Find E(X), E(X²), and Var(X).
  8. Understanding X and Y are independent with E(X)=3, Var(X)=2, E(Y)=5, Var(Y)=4. Using E(X+Y)=E(X)+E(Y) and Var(X+Y)=Var(X)+Var(Y), find E(2X−Y) and Var(2X−Y).
  9. Problem Solving Lottery tickets cost $10 each. Out of 1000 tickets: one wins $1000, five win $50, and the rest win nothing. Find the expected net gain per ticket.
  10. Problem Solving X has E(X) = μ and Var(X) = σ². Prove that E[(X − μ)²] = σ² by expanding the square and using the formula Var(X) = E(X²) − [E(X)]².