Bernoulli Distributions
Key Terms
- A Bernoulli trial has exactly two outcomes: success (probability p) and failure (probability 1 − p)
- X = 1 (success) with probability p; X = 0 (failure) with probability 1 − p
- E(X) = p
- Var(X) = p(1 − p)
- SD(X) = √(p(1 − p))
- Maximum variance occurs at p = 0.5 (maximum uncertainty)
- The Bernoulli distribution is the building block for the Binomial distribution
Key Formulas
P(X = 1) = p (success)
P(X = 0) = 1 − p (failure)
Mean: E(X) = p
Variance: Var(X) = p(1 − p)
Standard Deviation: SD(X) = √(p(1 − p))
Linear transformation: E(aX + b) = ap + b; Var(aX + b) = a²p(1 − p)
Worked Example
X ~ Bernoulli(0.7):
E(X) = p = 0.7
Var(X) = p(1 − p) = 0.7 × 0.3 = 0.21
SD(X) = √0.21 ≈ 0.458
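The same numbers can be checked in a few lines with scipy.stats.bernoulli (a quick sketch; it assumes SciPy is installed):

```python
# Numerical check of the Bernoulli(0.7) worked example using SciPy.
from scipy.stats import bernoulli

X = bernoulli(0.7)
print(X.pmf(1))   # P(X = 1) = 0.7
print(X.pmf(0))   # P(X = 0) = 0.3
print(X.mean())   # E(X) = p = 0.7
print(X.var())    # Var(X) = p(1 - p) = 0.21
print(X.std())    # SD(X) = sqrt(0.21), about 0.458
```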
What Is a Bernoulli Trial?
A Bernoulli trial is the simplest possible random experiment: one with exactly two outcomes. We call these outcomes success and failure — though in practice, success just means the outcome we are counting, which may not be something desirable. For example, in a quality control context, “success” might mean detecting a defective item.
The probability of success is p, where 0 ≤ p ≤ 1, and the probability of failure is therefore 1 − p. Any experiment that can be framed as a yes/no question with a fixed success probability can be modelled as a Bernoulli trial: a coin flip, testing whether a randomly selected voter supports a candidate, or checking whether a machine part is within tolerance.
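A single Bernoulli trial is easy to simulate: draw a uniform random number and compare it with p. This is a minimal sketch (the function name bernoulli_trial is ours, not a standard library name):

```python
import random

def bernoulli_trial(p):
    """Return 1 (success) with probability p, else 0 (failure)."""
    return 1 if random.random() < p else 0

# Ten inspections of machine parts, each within tolerance with probability 0.95.
print([bernoulli_trial(0.95) for _ in range(10)])
```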
The Bernoulli Random Variable
We assign numbers to the outcomes: X = 1 for success and X = 0 for failure. This numerical encoding is deliberate — it makes X a genuine random variable we can compute with. The probability function is:
P(X = 1) = p
P(X = 0) = 1 − p
We write X ~ Bernoulli(p) to indicate this. Note that the distribution is entirely determined by the single parameter p.
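Written as a function of x, the probability function is just a two-branch lookup. A minimal sketch (the helper name bernoulli_pmf is ours):

```python
def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p); returns 0 for any x other than 0 or 1."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

print(bernoulli_pmf(1, 0.3))  # 0.3
print(bernoulli_pmf(0, 0.3))  # 0.7
```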
Deriving the Mean and Variance
From the definition E(X) = ∑ x·P(X=x):
E(X) = 1 × p + 0 × (1−p) = p
For variance, first find E(X²). Since X only takes values 0 and 1, X² = X (squaring 0 or 1 gives the same value). Therefore E(X²) = E(X) = p.
Var(X) = E(X²) − [E(X)]² = p − p² = p(1 − p)
The variance is maximised when p = 0.5 (equal chance of success or failure — maximum uncertainty) and is zero when p = 0 or p = 1 (the outcome is certain).
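Both results can be confirmed by simulation: for several values of p, the sample mean settles near p and the sample variance near p(1 − p), with the variance largest at p = 0.5. A rough sketch using only the standard library:

```python
import random

def sample_mean_var(p, n=100_000):
    """Simulate n Bernoulli(p) trials; return the sample mean and variance."""
    xs = [1 if random.random() < p else 0 for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

for p in (0.1, 0.5, 0.9):
    mean, var = sample_mean_var(p)
    print(p, round(mean, 3), round(var, 3))  # mean is close to p, var to p(1 - p)
```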
The Bernoulli Distribution as a Building Block
The Bernoulli distribution is the foundation for the Binomial distribution. When you conduct n independent Bernoulli trials, each with the same success probability p, and count the total number of successes, you obtain a Binomial random variable. Understanding Bernoulli distributions thoroughly is therefore essential preparation for Binomial distributions.
When n independent Bernoulli(p) variables X1, X2, …, Xn are summed, the total Y = X1 + X2 + ··· + Xn has E(Y) = np and Var(Y) = np(1−p). These are the mean and variance of the Binomial distribution, and they follow directly from the Bernoulli results: expectations always add, and variances add for independent variables.
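This is easy to check empirically: build each Binomial value as a sum of simulated Bernoulli outcomes, then compare the sample mean and variance of the totals with np and np(1 − p). A sketch under the same assumptions as the earlier simulations:

```python
import random

def binomial_draw(n, p):
    """One Binomial(n, p) value, built as the sum of n Bernoulli(p) outcomes."""
    return sum(1 if random.random() < p else 0 for _ in range(n))

n, p, reps = 20, 0.3, 50_000
ys = [binomial_draw(n, p) for _ in range(reps)]
mean = sum(ys) / reps
var = sum((y - mean) ** 2 for y in ys) / reps
print(round(mean, 2))  # close to np = 6.0
print(round(var, 2))   # close to np(1 - p) = 4.2
```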
Mastery Practice
- Fluency A fair die is rolled. Success is defined as rolling a 3. Find p, P(X=1), and P(X=0).
- Fluency For a Bernoulli(0.3) trial, find E(X) and Var(X).
- Fluency A coin has P(H) = 0.6. X = 1 if heads, X = 0 if tails. Find P(X=1), E(X), and SD(X).
- Fluency For X ~ Bernoulli(p), verify that E(X²) = p and hence derive Var(X) = p(1 − p).
- Understanding Find the value of p that maximises Var(X) = p(1 − p) for a Bernoulli distribution. Justify using calculus or completing the square.
- Understanding Two independent Bernoulli(0.4) trials are conducted. Let Y = total number of successes. List all outcomes, and find P(Y=0), P(Y=1), P(Y=2).
- Understanding X ~ Bernoulli(0.7). Find E(3X − 1) and Var(3X − 1).
- Understanding In a production line, 5% of items are defective. You inspect one item. Define X and find its distribution, E(X), and Var(X).
- Problem Solving Prove that for X ~ Bernoulli(p), SD(X) ≤ 0.5, with equality if and only if p = 0.5.
- Problem Solving You conduct 3 independent Bernoulli(p) trials. Let Y = the sum of the outcomes. Show that Y can take the values 0, 1, 2, 3 and find P(Y = k) for k = 0, 1, 2, 3 in terms of p. What well-known distribution does Y follow?