Lesson 6: The Bradley-Terry Model

In Lesson 1 we introduced Elo's formula for predicting wins. But why that formula? Why not some other function of the rating difference? The answer comes from a 1952 statistical model by Ralph Bradley and Milton Terry.

The Setup

You have N players. You can't measure their skill directly; you can only observe pairwise outcomes. You want a model that turns those outcomes into a win probability for any pairing.

Bradley-Terry's answer: Give each player a positive strength parameter p_i. The probability that i beats j is simply their share of the combined strength:

P(i beats j) = p_i / (p_i + p_j)

That's it. The simplest possible "ratio model." If Alice has strength 3 and Bob has strength 1, Alice wins 3/(3+1) = 75% of the time.
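A minimal sketch of the ratio model in Python. The function name bt_win_prob is made up for this illustration and does not come from any library.

```python
def bt_win_prob(p_i: float, p_j: float) -> float:
    """Probability that player i beats player j under Bradley-Terry."""
    if p_i <= 0 or p_j <= 0:
        raise ValueError("strengths must be positive")
    return p_i / (p_i + p_j)


# Alice (strength 3) against Bob (strength 1), as in the example above.
alice, bob = 3.0, 1.0
print(bt_win_prob(alice, bob))  # 0.75
print(bt_win_prob(bob, alice))  # 0.25
```

The two probabilities always sum to 1, so the model as written leaves no room for draws.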

From Ratios to Elo's Formula

The Bradley-Terry formula uses raw strength values. Elo uses ratings on a log scale. Here's the connection:

Step 1: Define the rating as the log of strength:

R_i = c × log₁₀(p_i)

where c = 400 (Elo's scaling constant).

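Step 2: Invert the definition to get p_i = 10^{R_i / 400}, then substitute into Bradley-Terry:

P(i beats j) = p_i / (p_i + p_j)
= 1 / (1 + p_j/p_i)
= 1 / (1 + 10^{R_j / 400} / 10^{R_i / 400})
= 1 / (1 + 10^{(R_j - R_i) / 400})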

Result: Exactly the Elo expected score formula from Lesson 1.

Elo's formula IS the Bradley-Terry model expressed on a logarithmic rating scale. The logistic curve isn't arbitrary — it's the unique shape that emerges from "probability equals ratio of strengths."
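The equivalence can be checked numerically. The sketch below assumes c = 400 and base-10 logs as in Step 1; the helper names are illustrative only. With this definition a strength of 1 maps to a rating of 0. Real rating lists add an offset, which cancels in the difference R_j - R_i.

```python
import math

C = 400.0  # Elo's scaling constant from Step 1


def strength_to_rating(p: float) -> float:
    """R = c * log10(p)."""
    return C * math.log10(p)


def rating_to_strength(r: float) -> float:
    """Inverse of the mapping above: p = 10^(R / c)."""
    return 10 ** (r / C)


def bt_win_prob(p_i: float, p_j: float) -> float:
    """Bradley-Terry: share of the combined strength."""
    return p_i / (p_i + p_j)


def elo_expected(r_i: float, r_j: float) -> float:
    """Elo expected score for player i against player j."""
    return 1.0 / (1.0 + 10 ** ((r_j - r_i) / C))


r_a, r_b = 1600.0, 1400.0
p_a, p_b = rating_to_strength(r_a), rating_to_strength(r_b)

# Both routes should give the same probability, up to floating-point error.
print(elo_expected(r_a, r_b))
print(bt_win_prob(p_a, p_b))
```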

Why Ratios? Three Justifications

1. The Exponential Race Argument

Imagine each player has a random "performance time" T_i drawn from an exponential distribution. Player i's mean time is 1/p_i, so the rate is p_i (stronger = faster). The probability that i finishes before j:

P(T_i < T_j) = p_i / (p_i + p_j)

This falls directly out of the math of exponential distributions. It means: if you believe performance is memoryless and strength scales the rate, the Bradley-Terry formula follows with no further assumptions.
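The race argument can be checked by simulation. This is a rough sketch, assuming strengths 3 and 1 and a fixed seed for reproducibility; simulate_race is a name invented here. Note that random.expovariate takes the rate, so passing p_i gives a mean time of 1/p_i.

```python
import random


def simulate_race(p_i: float, p_j: float, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate P(T_i < T_j) when T_i ~ Exp(rate=p_i) and T_j ~ Exp(rate=p_j)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        t_i = rng.expovariate(p_i)  # mean 1/p_i
        t_j = rng.expovariate(p_j)  # mean 1/p_j
        if t_i < t_j:
            wins += 1
    return wins / trials


estimate = simulate_race(3.0, 1.0)
exact = 3.0 / (3.0 + 1.0)
print(f"simulated: {estimate:.4f}  Bradley-Terry: {exact:.4f}")
```

The simulated frequency should land close to 0.75, with the gap shrinking as the number of trials grows.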

2. The Independence of Irrelevant Alternatives

The Bradley-Terry model satisfies a key axiom: adding or removing other players from the tournament doesn't change the predicted probability between any pair. The ratio p_i/(p_i + p_j) depends only on i and j — not on who else exists.

This is exactly what you want for a rating system. Alice's chance of beating Bob shouldn't change just because Carol joined the ladder.
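A small illustration of that property, using made-up strengths for Alice, Bob, and Carol. It only shows that the predicted probability for a pair depends on that pair's strengths. It says nothing about how estimated strengths might shift as new game results arrive.

```python
def pairwise_table(strengths: dict[str, float]) -> dict[tuple[str, str], float]:
    """Bradley-Terry win probability for every ordered pair of distinct players."""
    return {
        (a, b): strengths[a] / (strengths[a] + strengths[b])
        for a in strengths
        for b in strengths
        if a != b
    }


ladder = {"Alice": 3.0, "Bob": 1.0}  # hypothetical strengths
before = pairwise_table(ladder)[("Alice", "Bob")]

ladder["Carol"] = 6.0  # Carol joins the ladder
after = pairwise_table(ladder)[("Alice", "Bob")]

print(before, after)  # 0.75 0.75
```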

3. Log-Odds Are Linear

log( P(i beats j) / P(j beats i) ) = log(p_i) - log(p_j) = λ_i - λ_j

where λ_i = log(p_i). The log-odds of winning is simply the difference in log-strengths. This linearity is what makes ratings additive and interpretable: a 200-point gap always means the same thing, regardless of whether we're at 1200 vs 1400 or 2600 vs 2800.
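The 200-point claim is easy to verify. The sketch below assumes the Elo expected score formula with c = 400; the helper names are illustrative. On that scale the natural-log odds for a gap of d points come out to d × ln(10) / 400.

```python
import math


def elo_expected(r_i: float, r_j: float) -> float:
    """Elo expected score for player i against player j (c = 400)."""
    return 1.0 / (1.0 + 10 ** ((r_j - r_i) / 400.0))


def log_odds(prob: float) -> float:
    """Natural-log odds of an event with probability prob."""
    return math.log(prob / (1.0 - prob))


for low, high in [(1200, 1400), (2600, 2800)]:
    e = elo_expected(high, low)
    print(f"{high} vs {low}: expected score {e:.4f}, log-odds {log_odds(e):.4f}")

# Log-odds predicted by the linear relation for a 200-point gap.
print(f"200 * ln(10) / 400 = {200 * math.log(10) / 400:.4f}")
```

Both pairings should print the same expected score and the same log-odds, matching the last line.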

Why not a normal (Gaussian) model? Elo originally assumed performances were normally distributed, not logistic. In practice the two give nearly identical predictions: over typical rating gaps the logistic and normal curves differ by only a percentage point or two. Bradley-Terry uses the logistic because it's algebraically cleaner: the win probability has a closed-form expression, whereas the normal CDF does not. Modern systems (Chatbot Arena, most implementations) use the logistic/Bradley-Terry form.
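To get a feel for how close the two curves are, here is a rough comparison. The normal model's scale is an assumption of this sketch: each player's performance is given a standard deviation of 200 points, so the difference of two performances has standard deviation 200 × √2. Other scalings would shift the numbers slightly.

```python
import math


def logistic_expected(gap: float) -> float:
    """Logistic (Bradley-Terry / Elo) expected score for a rating gap."""
    return 1.0 / (1.0 + 10 ** (-gap / 400.0))


def normal_expected(gap: float, sigma_player: float = 200.0) -> float:
    """Normal-model expected score; sigma_player = 200 is an assumed scale."""
    sigma_diff = sigma_player * math.sqrt(2.0)
    # Standard normal CDF evaluated at gap / sigma_diff, written via erf.
    return 0.5 * (1.0 + math.erf(gap / (sigma_diff * math.sqrt(2.0))))


for gap in range(0, 801, 100):
    lo, no = logistic_expected(gap), normal_expected(gap)
    print(f"gap {gap:3d}: logistic {lo:.4f}  normal {no:.4f}  diff {abs(lo - no):.4f}")
```

Under these assumptions the curves agree closely for small gaps and drift apart by a percentage point or so for gaps of several hundred points, where the normal model gives the favorite slightly more credit.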

Sources: Bradley & Terry, "Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons," Biometrika 39(3/4):324–345, 1952. Wikipedia: Bradley-Terry model.

Quick Check

Player A has strength p=5, Player B has strength p=3. What's P(A beats B)?

Why does Elo use ratings (log-scale) instead of raw Bradley-Terry strengths?

What does "independence of irrelevant alternatives" mean for a rating system?
