You've learned the mechanics of Elo. This lesson connects those mechanics to a deeper framework: probability as inference, as championed by E.T. Jaynes in Probability Theory: The Logic of Science.
Under this view, when we say "Player A has a 76% chance of beating Player B," we are not claiming some physical mechanism produces wins 76% of the time. We are saying: given what we know (their rating histories), 76% is our best-justified degree of belief.
An Elo rating is not measuring a physical property of a player, like their height or reaction time. It is our best estimate of their latent skill given the outcomes we have observed.
Physical-property framing: "Player A's true skill is 1600. We measure it through games."
Jaynesian framing: "Given 40 games of evidence, our rational belief about A's skill is centred at 1600."
The Jaynesian framing is more honest: a different set of opponents, or a different sequence of games, would produce a different number. The rating is a property of our state of knowledge, not of the player alone.
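To make the dependence on game order concrete, here is a minimal sketch. It assumes the standard 400-point logistic expected-score formula and an illustrative K-factor of 32. The helper names `expected_score` and `run_games` are hypothetical, chosen for this example only.

```python
from typing import List, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score for A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def run_games(start: float, games: List[Tuple[float, float]], k: float = 32.0) -> float:
    """Apply Elo updates in order; each game is (opponent_rating, score)."""
    rating = start
    for opponent, score in games:
        rating += k * (score - expected_score(rating, opponent))
    return rating


# The same two results against the same 1500-rated opponent, in opposite order.
# K = 32 and the fixed opponent rating are assumptions made for illustration.
win_then_loss = run_games(1500.0, [(1500.0, 1.0), (1500.0, 0.0)])
loss_then_win = run_games(1500.0, [(1500.0, 0.0), (1500.0, 1.0)])

print(round(win_then_loss, 2), round(loss_then_win, 2))
```

The evidence is identical in both runs (one win, one loss), yet the final numbers differ slightly. That is hard to square with the rating being a fixed property of the player, and easy to square with it being a summary of what we have observed and in what order.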
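Compare these two operations side by side: a Bayesian update, which turns a prior belief into a posterior belief in light of new evidence, and an Elo update, $R' = R + K(S - E)$, which turns a current rating into a new rating after a game.
They have the same structure: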
| Bayesian concept | Elo equivalent |
|---|---|
| Prior belief | Current rating R |
| Likelihood of evidence | Logistic model (the E formula) |
| Surprise (data − prediction) | S − E |
| Learning rate / prior weakness | K-factor |
| Posterior belief | New rating R' |
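The correspondence in the table can be sketched as a single update step. This is an illustration, not a reference implementation: the function name `elo_update`, the K values, and the example ratings are all assumptions made here.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score for A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def elo_update(rating: float, opponent: float, score: float, k: float = 32.0):
    """Return (new_rating, surprise) after one game.

    rating   -> the "prior" (current rating R)
    score    -> the observed result S (1 = win, 0.5 = draw, 0 = loss)
    k        -> how strongly one result is allowed to move the rating
    """
    prediction = expected_score(rating, opponent)  # E, from the logistic model
    surprise = score - prediction                  # S - E
    new_rating = rating + k * surprise             # the "posterior" R'
    return new_rating, surprise


prior = 1600.0
# An upset: the 1600-rated player loses to a 1400-rated opponent.
posterior, surprise = elo_update(prior, 1400.0, score=0.0)
# The same upset with a smaller K moves the rating less.
cautious_posterior, _ = elo_update(prior, 1400.0, score=0.0, k=16.0)

print(round(surprise, 3), round(posterior, 2), round(cautious_posterior, 2))
```

A large K behaves like a weak prior: one surprising game shifts the rating a lot. A small K behaves like a strong prior that needs more evidence before it moves.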
The Elo expected-score formula:
$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$
is a likelihood function: it answers "given assumed skills $R_A$ and $R_B$, how probable is A's win?" This is precisely what Bayes' theorem needs as input. The formal statistical model is the Bradley-Terry model (1952), whose latent strength parameters are typically estimated by maximum likelihood from pairwise comparisons.
Source: Bradley & Terry, "Rank Analysis of Incomplete Block Designs" (1952), Biometrika 39(3/4):324–345.
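The link between the expected-score formula and Bradley-Terry can be checked numerically. The sketch below assumes the standard 400-point Elo scale and maps each rating to a Bradley-Terry strength of 10^(R/400). The function names are hypothetical. It also shows the two-player maximum-likelihood estimate of a rating gap, which has a closed form when the only data are wins and losses between the same two players.

```python
import math


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score for A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def bt_strength(rating: float) -> float:
    """Map an Elo rating to a Bradley-Terry strength, gamma = 10^(R/400)."""
    return 10.0 ** (rating / 400.0)


def bt_win_probability(r_a: float, r_b: float) -> float:
    """Bradley-Terry: P(A beats B) = gamma_A / (gamma_A + gamma_B)."""
    g_a, g_b = bt_strength(r_a), bt_strength(r_b)
    return g_a / (g_a + g_b)


def mle_rating_gap(wins: int, losses: int) -> float:
    """Maximum-likelihood rating gap (A minus B) from A's wins and losses vs B.

    With two players and no draws, the likelihood p^wins * (1 - p)^losses peaks
    at p = wins / (wins + losses); inverting the logistic gives the gap.
    Requires wins > 0 and losses > 0.
    """
    return 400.0 * math.log10(wins / losses)


p_elo = expected_score(1700.0, 1500.0)
p_bt = bt_win_probability(1700.0, 1500.0)
gap = mle_rating_gap(76, 24)

print(round(p_elo, 4), round(p_bt, 4), round(gap, 1))
```

The two probabilities agree, and a 200-point gap gives roughly a 76% expected score. Going the other way, 76 wins in 100 games gives a maximum-likelihood gap of about 200 points.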
Elo is a point estimate — a single number. A rigorous Jaynesian analysis would maintain a full probability distribution over each player's skill:
Elo says: "Alice is rated 1600."
Full Bayesian says: "Our belief about Alice's skill is a distribution centred at 1600 with standard deviation 45, meaning that, if the distribution is roughly normal, we think there's about a 68% chance her true skill is between 1555 and 1645."
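One way to see what a full distribution over skill looks like is a brute-force grid posterior. This is a minimal sketch under stated assumptions, not how any production rating system works. The Gaussian prior (mean 1500, standard deviation 200), the grid range, the made-up game results, and the treatment of opponent ratings as exactly known are all choices made here for illustration.

```python
import math
from typing import List, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score for A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def grid_posterior(games: List[Tuple[float, int]],
                   prior_mean: float = 1500.0,
                   prior_sd: float = 200.0,
                   lo: int = 1000,
                   hi: int = 2200) -> Tuple[float, float]:
    """Return (posterior mean, posterior sd) of one player's skill.

    games: (opponent_rating, score) pairs with score 1 for a win, 0 for a loss.
    The posterior is evaluated on a 1-point grid from lo to hi.
    """
    grid = [float(s) for s in range(lo, hi + 1)]
    log_post = []
    for skill in grid:
        # Unnormalised log of the Gaussian prior.
        lp = -0.5 * ((skill - prior_mean) / prior_sd) ** 2
        # Add the log-likelihood of each observed result.
        for opponent, score in games:
            p_win = expected_score(skill, opponent)
            lp += math.log(p_win if score == 1 else 1.0 - p_win)
        log_post.append(lp)

    peak = max(log_post)  # subtract the peak before exponentiating, for stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    probs = [w / total for w in weights]

    mean = sum(s * p for s, p in zip(grid, probs))
    var = sum((s - mean) ** 2 * p for s, p in zip(grid, probs))
    return mean, math.sqrt(var)


few_games = [(1500.0, 1), (1500.0, 0), (1600.0, 1), (1550.0, 1)]
many_games = few_games * 10  # the same pattern of results, ten times over

mean_few, sd_few = grid_posterior(few_games)
mean_many, sd_many = grid_posterior(many_games)

print(round(mean_few), round(sd_few), round(mean_many), round(sd_many))
```

The posterior standard deviation shrinks as games accumulate. That is the information a single Elo number cannot carry: two players can share a rating while our uncertainty about them differs widely.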
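Systems that do this properly include Glicko, which attaches a ratings deviation (RD) to every rating, and TrueSkill, which models each player's skill as a Gaussian with a mean and a standard deviation.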
These are all attempts to do what Jaynes would insist on: report your uncertainty, not just your best guess.
Under the Jaynesian view, what does an Elo rating represent?
What does K-factor correspond to in Bayesian inference?
What does full Bayesian Elo (e.g., TrueSkill) provide that classical Elo doesn't?
Chapter 1 of Jaynes' Probability Theory: The Logic of Science (the "plausible reasoning" chapter) lays the philosophical groundwork. For the formal connection to rating systems, see Glickman's "The Glicko System" paper, which explicitly derives ratings as Bayesian estimation.