Lesson 5: Elo as Bayesian Inference

You've learned the mechanics of Elo. This lesson connects those mechanics to a deeper framework: probability as inference, as championed by E.T. Jaynes in Probability Theory: The Logic of Science.

Jaynes' Core Thesis

Probability is not a frequency. It is a degree of plausibility assigned by a reasoning agent, given the information available. Probabilities live in the map (our model), not the territory (reality).

Under this view, when we say "Player A has a 76% chance of beating Player B," we are not claiming some physical mechanism produces wins 76% of the time. We are saying: given what we know (their rating histories), 76% is our best-justified degree of belief.
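The lesson doesn't say which ratings give the 76% figure. Under the standard Elo logistic curve (base 10, 400-point scale), a rating gap of about 200 points produces it. The sketch below assumes hypothetical ratings of 1700 and 1500 and ignores draws, so the expected score can be read as a win probability.

```python
def expected_score(r_a, r_b):
    """Elo expected score for A against B (standard base-10, 400-point logistic)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Hypothetical ratings chosen so that the gap is 200 points.
p = expected_score(1700, 1500)
print(f"P(A beats B) = {p:.2f}")  # prints "P(A beats B) = 0.76"
```

The number is a statement about the model and the ratings we currently hold. Change either one and the "76%" changes with it.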

Elo Ratings Are Beliefs

An Elo rating is not measuring a physical property of a player — like their height or reaction time. It is our best estimate of their latent skill given the outcomes we have observed.

Frequentist framing

"Player A's true skill is 1600. We measure it through games."

Jaynesian framing

"Given 40 games of evidence, our rational belief about A's skill is centred at 1600."

The Jaynesian framing is more honest: a different set of opponents, or a different sequence of games, would produce a different number. The rating is a property of our state of knowledge, not of the player alone.
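The claim that a different sequence of games yields a different number can be checked directly. This is a minimal sketch under simplifying assumptions: K = 32, and the opponent's rating is held fixed at 1500. The player gets the same evidence (one win, one loss) in two different orders.

```python
def expected_score(r_a, r_b):
    # Standard Elo logistic expectation on the 400-point scale.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def play_sequence(start, results, opponent=1500.0, k=32.0):
    """Update one player's rating through a sequence of results (1 = win, 0 = loss).

    The opponent's rating is held fixed for simplicity (an assumption)."""
    r = start
    for s in results:
        r += k * (s - expected_score(r, opponent))
    return r

win_then_loss = play_sequence(1500.0, [1, 0])
loss_then_win = play_sequence(1500.0, [0, 1])
print(round(win_then_loss, 2), round(loss_then_win, 2))  # 1499.26 1500.74
```

Both runs see exactly one win and one loss against the same opponent, yet they end about 1.5 points apart. The final rating depends on the path the evidence took, not only on the evidence itself.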

The Update Rule Is Bayesian

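Bayesian updating (conceptual, error-correction form): posterior = prior + learning_rate × (evidence − prediction)
Elo update: R' = R + K × (S − E)

Bayes' theorem itself is multiplicative (posterior ∝ likelihood × prior). The additive form above is what it reduces to for a single best estimate, for example the mean of a Gaussian belief.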
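The Elo update R' = R + K × (S − E) fits in a few lines. This sketch assumes K = 32 for both players, which is a common choice but not one fixed by the lesson. S is the actual score (1 win, 0.5 draw, 0 loss) and E is the model's prediction.

```python
def expected_score(r_a, r_b):
    """Predicted score E for A against B (the 'prediction' in the update)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, s_a, k=32.0):
    """Return both players' new ratings after one game.

    s_a is A's actual score: 1 for a win, 0.5 for a draw, 0 for a loss.
    """
    e_a = expected_score(r_a, r_b)
    surprise = s_a - e_a                           # evidence minus prediction
    return r_a + k * surprise, r_b - k * surprise  # equal K: B moves the opposite way

new_a, new_b = elo_update(1500, 1500, 1.0)
print(new_a, new_b)  # 1516.0 1484.0
```

With equal K for both players, whatever A gains B loses, so the total rating in the pool is conserved.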

Bayesian concept	Elo equivalent
Prior belief	Current rating R
Likelihood of evidence	Logistic model (the E formula)
Surprise (data − prediction)	S − E
Learning rate / prior weakness	K-factor
Posterior belief	New rating R'

K-factor measures how weakly you hold your prior; it is inversely related to prior strength. High K says "I don't trust my current estimate; let new evidence dominate." Low K says "I'm fairly confident already; resist large revisions." This mirrors the Bayesian tradeoff between prior conviction and the weight given to the likelihood.
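Elo does not derive K from a prior. The analogy can still be made precise in the simplest conjugate case: a Gaussian prior on a quantity, plus an observation of it with Gaussian noise. The posterior mean is then prior mean + gain × (observation − prior mean), where gain = prior variance / (prior variance + noise variance). This is the textbook normal-normal update, offered here only as an analogy. The variances and the observation of 1700 are made-up numbers.

```python
def gaussian_update(prior_mean, prior_var, obs, obs_var):
    """Conjugate normal-normal update: returns (posterior mean, posterior variance, gain)."""
    gain = prior_var / (prior_var + obs_var)            # plays the role of K: large when the prior is weak
    post_mean = prior_mean + gain * (obs - prior_mean)  # prior + gain * surprise
    post_var = (1 - gain) * prior_var                   # uncertainty shrinks after seeing data
    return post_mean, post_var, gain

# Weak prior (large variance): the observation dominates.
print(gaussian_update(1500, 300**2, 1700, 100**2))
# Strong prior (small variance): the estimate barely moves.
print(gaussian_update(1500, 30**2, 1700, 100**2))
```

In the first call the gain is 0.9 and the mean moves to 1680. In the second the gain is about 0.08 and the mean only reaches about 1516.5. A weak prior behaves like a high K, and a strong prior like a low K.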

The Likelihood Function

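The expected-score formula E_A = 1 / (1 + 10^((R_B − R_A) / 400)) is a likelihood function: it answers "given assumed skills R_A and R_B, how probable is A's win?" This is precisely what Bayes' theorem needs as input. The corresponding formal statistical model is the Bradley-Terry model (1952). In it, each player has a positive strength π and P(A beats B) = π_A / (π_A + π_B). Setting π = 10^(R/400) recovers the Elo formula. The strength parameters are usually fitted to pairwise comparisons by maximum likelihood.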
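The link between the two forms can be checked numerically. With π = 10^(R/400), the Bradley-Terry probability π_A / (π_A + π_B) equals the Elo expected score. The sketch also checks a standard calculus fact. For one game, the derivative of the log-likelihood S·ln E + (1 − S)·ln(1 − E) with respect to R_A is (ln 10 / 400) × (S − E). So the Elo update can be read as a gradient-ascent step on that log-likelihood, with the constant absorbed into K. The function names and the 1620/1500 example are illustrative.

```python
import math

def expected_score(r_a, r_b):
    # Elo's logistic expectation on the 400-point scale.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def bradley_terry(pi_a, pi_b):
    # Bradley-Terry: probability that A beats B given positive strengths.
    return pi_a / (pi_a + pi_b)

def log_likelihood(r_a, r_b, s):
    # Log-probability of the observed result s (1 = A wins, 0 = B wins).
    e = expected_score(r_a, r_b)
    return s * math.log(e) + (1 - s) * math.log(1 - e)

r_a, r_b, s = 1620.0, 1500.0, 0.0

# 1. Same probability under both parameterisations.
elo_p = expected_score(r_a, r_b)
bt_p = bradley_terry(10 ** (r_a / 400), 10 ** (r_b / 400))
print(elo_p, bt_p)

# 2. Central-difference gradient of the log-likelihood vs. (ln 10 / 400) * (S - E).
h = 1e-4
numeric = (log_likelihood(r_a + h, r_b, s) - log_likelihood(r_a - h, r_b, s)) / (2 * h)
analytic = math.log(10) / 400 * (s - expected_score(r_a, r_b))
print(numeric, analytic)
```

The two printed pairs agree up to floating-point error. That agreement is why "surprise times a step size" is a principled update rather than an ad hoc one.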

Source: Bradley & Terry, "Rank Analysis of Incomplete Block Designs" (1952), Biometrika 39(3/4):324–345.

Where Elo Falls Short of Full Jaynes

Elo is a point estimate — a single number. A rigorous Jaynesian analysis would maintain a full probability distribution over each player's skill:

Elo says: "Alice is rated 1600."

Full Bayesian says: "Our belief about Alice's skill is a Gaussian distribution centred at 1600 with standard deviation 45, meaning we think there's roughly a 68% chance her true skill lies between 1555 and 1645."
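The 68% figure holds only if the belief is Gaussian: it is the mass within one standard deviation of the mean. This sketch uses Python's standard-library `statistics.NormalDist` (Python 3.8+) with the example's numbers. It also asks the reverse question of which interval holds 95% of the belief.

```python
from statistics import NormalDist

# Belief about Alice's skill, assumed Gaussian with the numbers from the example.
alice = NormalDist(mu=1600, sigma=45)

p_within_1sd = alice.cdf(1645) - alice.cdf(1555)
print(f"{p_within_1sd:.3f}")  # 0.683

# The reverse question: which central interval holds 95% of our belief?
lo, hi = alice.inv_cdf(0.025), alice.inv_cdf(0.975)
print(round(lo), round(hi))  # 1512 1688
```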

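Rating systems such as Glicko, Glicko-2 and TrueSkill keep an uncertainty term alongside each rating: Glicko's rating deviation, or TrueSkill's standard deviation σ. They are all attempts to do what Jaynes would insist on: report your uncertainty, not just your best guess.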

Jaynes' robot: In Probability Theory: The Logic of Science, Jaynes imagines a reasoning robot that must assign consistent plausibilities given its information. An Elo system can be read as a simplified version of such a robot: it takes game outcomes as input and produces a belief about relative skill. Its main shortcoming is that classical Elo discards the uncertainty and keeps only a single best estimate.
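Here is a minimal sketch of a robot that keeps the whole distribution instead of a single number. It places Alice's skill on a grid, starts from a prior, and applies Bayes' theorem game by game with the logistic model as the likelihood. Everything specific here is hypothetical: the grid, the prior (mean 1500, sd 200), the opponents and the results. Opponents' ratings are treated as exactly known, a simplification that full Bayesian systems avoid. Only wins and losses are handled.

```python
import math

def expected_score(r_a, r_b):
    # Logistic likelihood of A beating B on Elo's 400-point scale.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Grid of candidate skill values for Alice (hypothetical range and spacing).
grid = [1000 + 5 * i for i in range(241)]  # 1000, 1005, ..., 2200

# Prior: Gaussian belief centred at 1500 with sd 200 (an assumption), normalised on the grid.
posterior = [math.exp(-0.5 * ((r - 1500) / 200) ** 2) for r in grid]
total = sum(posterior)
posterior = [p / total for p in posterior]

# Hypothetical games: (opponent rating, Alice's result: 1 = win, 0 = loss).
games = [(1550, 1), (1600, 1), (1650, 0), (1500, 1), (1700, 0)]

for opp, s in games:
    # Bayes' theorem on the grid: posterior is proportional to prior times likelihood.
    posterior = [p * (expected_score(r, opp) if s == 1 else 1 - expected_score(r, opp))
                 for p, r in zip(posterior, grid)]
    total = sum(posterior)
    posterior = [p / total for p in posterior]  # renormalise

mean = sum(p * r for p, r in zip(posterior, grid))
sd = math.sqrt(sum(p * (r - mean) ** 2 for p, r in zip(posterior, grid)))
print(f"belief about Alice: mean {mean:.0f}, sd {sd:.0f}")
```

The output is a centre and a spread. More games would narrow the spread, which is exactly the information classical Elo throws away.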

The Deep Shared Insight

Both Jaynes and Elo answer the same question: How do you build a consistent, self-correcting model of something you cannot directly observe, using only indirect evidence?

Jaynes: assign priors, update with evidence, remain coherent.
Elo: assign ratings, update with game results, let surprises drive learning.

They are the same operation at different levels of mathematical rigour.

Quick Check

Under the Jaynesian view, what does an Elo rating represent?

What does K-factor correspond to in Bayesian inference?

What does full Bayesian Elo (e.g., TrueSkill) provide that classical Elo doesn't?