Bernoulli Distributions
Key Terms
- A Bernoulli trial has exactly two outcomes: success (probability p) and failure (probability 1 − p)
- X = 1 (success) with probability p; X = 0 (failure) with probability 1 − p
- E(X) = p
- Var(X) = p(1 − p)
- SD(X) = √(p(1 − p))
- Maximum variance occurs at p = 0.5 (maximum uncertainty)
- The Bernoulli distribution is the building block for the Binomial distribution
Key Formulas
P(X = 1) = p (success)
P(X = 0) = 1 − p (failure)
Mean: E(X) = p
Variance: Var(X) = p(1 − p)
Standard Deviation: SD(X) = √(p(1 − p))
Linear transformation: E(aX + b) = ap + b; Var(aX + b) = a²p(1 − p)
Worked Example
X ~ Bernoulli(0.7):
E(X) = p = 0.7
Var(X) = p(1 − p) = 0.7 × 0.3 = 0.21
SD(X) = √0.21 ≈ 0.458
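The same numbers can be checked in a few lines with scipy.stats.bernoulli (a quick sketch; it assumes SciPy is installed):

```python
# Numerical check of the Bernoulli(0.7) worked example using SciPy.
from scipy.stats import bernoulli

X = bernoulli(0.7)
print(X.pmf(1))   # P(X = 1) = 0.7
print(X.pmf(0))   # P(X = 0) = 0.3
print(X.mean())   # E(X) = p = 0.7
print(X.var())    # Var(X) = p(1 - p) = 0.21
print(X.std())    # SD(X) = sqrt(0.21), about 0.458
```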
What Is a Bernoulli Trial?
A Bernoulli trial is the simplest possible random experiment: one with exactly two outcomes. We call these outcomes success and failure — though in practice, success just means the outcome we are counting, which may not be something desirable. For example, in a quality control context, “success” might mean detecting a defective item.
The probability of success is p, where 0 ≤ p ≤ 1, and the probability of failure is therefore 1 − p. Any experiment that can be framed as a yes/no question with a fixed success probability can be modelled as a Bernoulli trial: a coin flip, testing whether a randomly selected voter supports a candidate, or checking whether a machine part is within tolerance.
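A single Bernoulli trial is easy to simulate: draw a uniform random number and compare it with p. This is a minimal sketch (the function name bernoulli_trial is ours, not a standard library name):

```python
import random

def bernoulli_trial(p):
    """Return 1 (success) with probability p, else 0 (failure)."""
    return 1 if random.random() < p else 0

# Ten inspections of machine parts, each within tolerance with probability 0.95.
print([bernoulli_trial(0.95) for _ in range(10)])
```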
The Bernoulli Random Variable
We assign numbers to the outcomes: X = 1 for success and X = 0 for failure. This numerical encoding is deliberate — it makes X a genuine random variable we can compute with. The probability function is:
P(X = 1) = p
P(X = 0) = 1 − p
We write X ~ Bernoulli(p) to indicate this. Note that the distribution is entirely determined by the single parameter p.
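Written as a function of x, the probability function is just a two-branch lookup. A minimal sketch (the helper name bernoulli_pmf is ours):

```python
def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p); returns 0 for any x other than 0 or 1."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

print(bernoulli_pmf(1, 0.3))  # 0.3
print(bernoulli_pmf(0, 0.3))  # 0.7
```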
Deriving the Mean and Variance
From the definition E(X) = ∑ x·P(X=x):
E(X) = 1 × p + 0 × (1−p) = p
For variance, first find E(X²). Since X only takes values 0 and 1, X² = X (squaring 0 or 1 gives the same value). Therefore E(X²) = E(X) = p.
Var(X) = E(X²) − [E(X)]² = p − p² = p(1 − p)
The variance is maximised when p = 0.5 (equal chance of success or failure — maximum uncertainty) and is zero when p = 0 or p = 1 (the outcome is certain).
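Both results can be confirmed by simulation: for several values of p, the sample mean settles near p and the sample variance near p(1 − p), with the variance largest at p = 0.5. A rough sketch using only the standard library:

```python
import random

def sample_mean_var(p, n=100_000):
    """Simulate n Bernoulli(p) trials; return the sample mean and variance."""
    xs = [1 if random.random() < p else 0 for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

for p in (0.1, 0.5, 0.9):
    mean, var = sample_mean_var(p)
    print(p, round(mean, 3), round(var, 3))  # mean is close to p, var to p(1 - p)
```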
The Bernoulli Distribution as a Building Block
The Bernoulli distribution is the foundation for the Binomial distribution. When you conduct n independent Bernoulli trials, each with the same success probability p, and count the total number of successes, you obtain a Binomial random variable. Understanding Bernoulli distributions thoroughly is therefore essential preparation for Binomial distributions.
When n independent Bernoulli(p) variables X1, X2, …, Xn are summed, the total Y = X1 + X2 + ··· + Xn has E(Y) = np and Var(Y) = np(1−p). These are the mean and variance of the Binomial distribution, and they follow directly from the Bernoulli results: expectations always add, and variances add for independent variables.
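This is easy to check empirically: build each Binomial value as a sum of simulated Bernoulli outcomes, then compare the sample mean and variance of the totals with np and np(1 − p). A sketch under the same assumptions as the earlier simulations:

```python
import random

def binomial_draw(n, p):
    """One Binomial(n, p) value, built as the sum of n Bernoulli(p) outcomes."""
    return sum(1 if random.random() < p else 0 for _ in range(n))

n, p, reps = 20, 0.3, 50_000
ys = [binomial_draw(n, p) for _ in range(reps)]
mean = sum(ys) / reps
var = sum((y - mean) ** 2 for y in ys) / reps
print(round(mean, 2))  # close to np = 6.0
print(round(var, 2))   # close to np(1 - p) = 4.2
```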
Mastery Practice
- Fluency A fair die is rolled. Success is defined as rolling a 3. Find p, P(X=1), and P(X=0).
- Fluency For a Bernoulli(0.3) trial, find E(X) and Var(X).
- Fluency A coin has P(H) = 0.6. X = 1 if heads, X = 0 if tails. Find P(X=1), E(X), and SD(X).
- Fluency For X ~ Bernoulli(p), verify that E(X²) = p and hence derive Var(X) = p(1 − p).
- Understanding Find the value of p that maximises Var(X) = p(1 − p) for a Bernoulli distribution. Justify using calculus or completing the square.
- Understanding Two independent Bernoulli(0.4) trials are conducted. Let Y = total number of successes. List all outcomes, and find P(Y=0), P(Y=1), P(Y=2).
- Understanding X ~ Bernoulli(0.7). Find E(3X − 1) and Var(3X − 1).
- Understanding In a production line, 5% of items are defective. You inspect one item. Define X and find its distribution, E(X), and Var(X).
- Problem Solving Prove that for X ~ Bernoulli(p), SD(X) ≤ 0.5, with equality if and only if p = 0.5.
- Problem Solving You conduct 3 independent Bernoulli(p) trials. Let Y = the sum of the outcomes. Show that Y can take the values 0, 1, 2, 3 and find P(Y = k) for k = 0, 1, 2, 3 in terms of p. What well-known distribution does Y follow?