Lesson 6: The Bradley-Terry Model — Why This Formula?

In Lesson 1 we introduced Elo's formula for predicting wins. But why that formula? Why not some other function of the rating difference? The answer comes from a 1952 statistical model by Ralph Bradley and Milton Terry.

The Setup

You have N players. You can't measure their skill directly — you can only observe pairwise outcomes. You want a model that:

  1. Assigns each player a single strength parameter
  2. Predicts win probability from those parameters
  3. Is internally consistent (if A > B > C, the probabilities should reflect that)

Bradley-Terry's answer: Give each player a positive strength parameter pi. The probability that i beats j is simply their share of the combined strength:

P(i beats j) = pi / (pi + pj)

That's it. The simplest possible "ratio model." If Alice has strength 3 and Bob has strength 1, Alice wins 3/(3+1) = 75% of the time.
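
The ratio rule is small enough to write out directly. The following is a minimal sketch; the function name bt_win_prob is chosen here for illustration and does not come from any particular library.

```python
def bt_win_prob(p_i: float, p_j: float) -> float:
    """Bradley-Terry probability that player i beats player j.

    p_i and p_j are positive strength parameters (illustrative names).
    """
    if p_i <= 0 or p_j <= 0:
        raise ValueError("strengths must be positive")
    return p_i / (p_i + p_j)


# Alice (strength 3) against Bob (strength 1), as in the example above.
alice, bob = 3.0, 1.0
print(bt_win_prob(alice, bob))  # 0.75
print(bt_win_prob(bob, alice))  # 0.25
```

Note that the two probabilities always sum to 1, so the model has no notion of a draw.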

From Ratios to Elo's Formula

The Bradley-Terry formula uses raw strength values. Elo uses ratings on a log scale. Here's the connection:

Step 1: Define the rating as the log of strength:

Ri = c × log₁₀(pi)

where c = 400 (Elo's scaling constant).

Step 2: Substitute into Bradley-Terry:

P(i beats j) = pi / (pi + pj)
  = 1 / (1 + pj/pi)
  = 1 / (1 + 10^((Rj - Ri) / 400))

Result: Exactly the Elo expected score formula from Lesson 1.
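
The substitution can also be checked numerically. The sketch below assumes the scaling constant c = 400 from Step 1; the helper names (strength_to_rating, rating_to_strength, elo_expected) are hypothetical and used only for this illustration.

```python
import math

ELO_SCALE = 400.0  # the constant c from Step 1


def strength_to_rating(p: float) -> float:
    """R = c * log10(p)."""
    return ELO_SCALE * math.log10(p)


def rating_to_strength(r: float) -> float:
    """Inverse of strength_to_rating: p = 10^(R / c)."""
    return 10 ** (r / ELO_SCALE)


def bt_win_prob(p_i: float, p_j: float) -> float:
    """Bradley-Terry ratio form."""
    return p_i / (p_i + p_j)


def elo_expected(r_i: float, r_j: float) -> float:
    """Elo expected score for player i against player j."""
    return 1.0 / (1.0 + 10 ** ((r_j - r_i) / ELO_SCALE))


# Compare the two forms for an arbitrary pair of ratings.
r_i, r_j = 1600.0, 1400.0
via_elo = elo_expected(r_i, r_j)
via_bt = bt_win_prob(rating_to_strength(r_i), rating_to_strength(r_j))
print(math.isclose(via_elo, via_bt))  # True
```

Because only the ratio pj/pi enters the formula, multiplying every strength by the same constant (equivalently, adding the same number of points to every rating) leaves all predictions unchanged.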

Elo's formula IS the Bradley-Terry model expressed on a logarithmic rating scale. The logistic curve isn't arbitrary — it's the unique shape that emerges from "probability equals ratio of strengths."

Why Ratios? Three Justifications

1. The Exponential Race Argument

Imagine each player has a random "performance time" drawn from an exponential distribution. Player i's mean time is 1/pi (stronger = faster). The probability that i finishes before j:

P(i finishes first) = pi / (pi + pj)

This falls directly out of the math of exponential distributions. It means: if you believe performance is memoryless and strength scales the rate, the Bradley-Terry formula follows automatically.
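
A quick simulation makes the race argument concrete. This is a sketch under the assumptions stated above (independent exponential finishing times with rates pi and pj); the function name, trial count, and seed are arbitrary choices for illustration.

```python
import random


def simulate_race(p_i: float, p_j: float, trials: int = 200_000, seed: int = 0) -> float:
    """Estimate the fraction of races in which player i finishes before player j.

    Each finishing time is exponential with rate equal to the player's
    strength, so the mean time is 1 / strength.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        t_i = rng.expovariate(p_i)  # rate p_i, mean 1 / p_i
        t_j = rng.expovariate(p_j)  # rate p_j, mean 1 / p_j
        if t_i < t_j:
            wins += 1
    return wins / trials


estimate = simulate_race(3.0, 1.0)
exact = 3.0 / (3.0 + 1.0)
print(f"simulated: {estimate:.3f}  Bradley-Terry: {exact:.3f}")
```

With enough trials the simulated fraction should settle close to pi / (pi + pj), up to sampling noise.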

2. The Independence of Irrelevant Alternatives

The Bradley-Terry model satisfies a key axiom: adding or removing other players from the tournament doesn't change the predicted probability between any pair. The ratio pi/(pi + pj) depends only on i and j — not on who else exists.

This is exactly what you want for a rating system. Alice's chance of beating Bob shouldn't change just because Carol joined the ladder.

3. Log-Odds Are Linear

Take the log-odds of i beating j:

log(P(i beats j) / P(j beats i)) = log(pi/pj) = λi − λj

where λ = log(p). The log-odds of winning is simply the difference in log-strengths. This linearity is what makes ratings additive and interpretable: a 200-point gap always means the same thing, regardless of whether we're at 1200 vs 1400 or 2600 vs 2800.
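
To see the "same gap, same meaning" property in numbers, the sketch below evaluates the Elo expected score for the two 200-point gaps mentioned above. It assumes the 400-point scale from earlier in the lesson; the helper names are illustrative.

```python
import math


def elo_expected(r_i: float, r_j: float) -> float:
    """Elo expected score for player i against player j (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_j - r_i) / 400.0))


def log_odds(prob: float) -> float:
    """Natural-log odds of a probability strictly between 0 and 1."""
    return math.log(prob / (1.0 - prob))


for low, high in [(1200, 1400), (2600, 2800)]:
    p = elo_expected(high, low)
    print(f"{high} vs {low}: P = {p:.4f}, log-odds = {log_odds(p):.4f}")
```

Both lines print the same probability (about 0.76) and the same log-odds, because the formula sees only the rating difference. In natural-log units the log-odds equal (Ri - Rj) × ln(10) / 400, which is linear in the gap.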

Why not a normal (Gaussian) model? Elo originally assumed performances were normally distributed, not logistic. In practice the two give nearly identical predictions: with Elo's usual scaling, the logistic and normal curves differ by at most a percentage point or two across typical rating gaps. Bradley-Terry uses the logistic because it is algebraically cleaner: the win probability and the log-odds have closed-form expressions, whereas the normal CDF does not. Modern systems (Chatbot Arena, most implementations) use the logistic/Bradley-Terry form.
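
The size of the gap between the two curves can be measured directly. The sketch below rests on an assumption that is not stated in this lesson: that the normal model gives each player's performance a standard deviation of 200 rating points, so the difference of two performances has standard deviation 200 × √2. The function names are illustrative.

```python
import math


def logistic_expected(diff: float) -> float:
    """Logistic (Bradley-Terry) expected score for a rating advantage of diff."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def normal_expected(diff: float, sigma: float = 200.0) -> float:
    """Normal-model expected score for a rating advantage of diff.

    Assumes independent performances, each with standard deviation sigma,
    so the performance difference has standard deviation sigma * sqrt(2).
    """
    z = diff / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def max_gap(max_diff: int = 800) -> float:
    """Largest absolute difference between the two curves for gaps 0..max_diff."""
    return max(
        abs(logistic_expected(d) - normal_expected(d))
        for d in range(0, max_diff + 1)
    )


print(f"max gap over 0-800 points: {max_gap():.4f}")
```

The two curves agree closely near a gap of zero and drift apart in the tails, where the logistic assigns the underdog a somewhat higher chance than the normal model does.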

Sources: Bradley & Terry, "Rank Analysis of Incomplete Block Designs," Biometrika 39(3/4):324–345, 1952. Wikipedia: Bradley-Terry model.

Quick Check

Player A has strength p=5, Player B has strength p=3. What's P(A beats B)?

Why does Elo use ratings (log-scale) instead of raw Bradley-Terry strengths?

What does "independence of irrelevant alternatives" mean for a rating system?

Recommended Reading

The Wikipedia article on Bradley-Terry gives the formal likelihood function and connects to maximum likelihood estimation. For how this underpins modern LLM evaluation, see §2 of Chiang et al. (2024).