Lesson 1: The Core Idea Behind Elo

You want to rank players (or LLMs, or ping pong colleagues) — but you have no absolute measuring stick. All you have is who beat whom.

Key Insight: Elo assumes every player has a hidden "true strength" number. The difference between two players' numbers predicts the probability that one will beat the other.

The Model

Arpad Elo proposed: if Player A has rating R_A and Player B has rating R_B, then A's expected score (A's probability of winning, if draws are ignored) is:

E_A = 1 / (1 + 10^{(R_B - R_A) / 400})

That's it. Everything else in the system, including how ratings change after each game, builds on this one formula.
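To make the formula concrete, here is a minimal Python sketch of it. The function name `expected_score` and the example ratings (1700 and 1500) are illustrative assumptions, not part of any standard library or of Elo's original text.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model.

    Direct transcription of E_A = 1 / (1 + 10^((R_B - R_A) / 400)).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Hypothetical ratings, chosen only for illustration.
    e_strong = expected_score(1700, 1500)
    e_weak = expected_score(1500, 1700)
    print(f"stronger player: {e_strong:.3f}")
    print(f"weaker player:   {e_weak:.3f}")
    # The two expectations are complementary: they always sum to 1.
    print(f"sum:             {e_strong + e_weak:.3f}")
```

Note that only the difference R_B - R_A enters the formula, so shifting both ratings by the same amount leaves the prediction unchanged.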

What the Numbers Mean

Equal ratings (difference = 0): each player has a 50% chance of winning.
+400 difference: the stronger player is expected to win ~91% of the time.
+200 difference: the stronger player wins ~76% of the time.
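
The percentages above can be checked by plugging each difference into the formula. In the worked arithmetic below, D = R_A - R_B is shorthand introduced here for the rating difference in favor of the stronger player, so the exponent (R_B - R_A) / 400 becomes -D / 400.

```latex
\begin{align*}
D = 0:   &\quad E = \frac{1}{1 + 10^{0}} = \frac{1}{2} = 0.5 \\
D = 400: &\quad E = \frac{1}{1 + 10^{-400/400}} = \frac{1}{1 + 0.1} \approx 0.909 \\
D = 200: &\quad E = \frac{1}{1 + 10^{-200/400}} = \frac{1}{1 + 10^{-0.5}} \approx \frac{1}{1.316} \approx 0.760
\end{align*}
```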

Source: Wikipedia — Elo rating system; Elo, The Rating of Chessplayers (1978), Ch. 2.

Why This Solves Your Problem

In a ping pong ladder or an LLM arena, you never measure "absolute skill." You only observe: A beat B. Elo converts these pairwise outcomes into a single number per player that is consistent — if A usually beats B, and B usually beats C, then A's rating will be higher than C's, even if A and C never play.
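
One way to see why A and C become comparable without ever playing: the formula can be inverted to turn a win rate into a rating gap, and gaps add along a chain. The sketch below assumes hypothetical win rates (A beats B 64% of the time, B beats C 64% of the time) and uses illustrative function names. It shows what the model predicts, not a guarantee about real results.

```python
import math


def rating_gap_from_win_rate(win_rate: float) -> float:
    """Rating difference implied by a win rate (inverse of the Elo formula)."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


def win_rate_from_rating_gap(gap: float) -> float:
    """Expected score for a player rated `gap` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


# Hypothetical observed win rates, for illustration only.
gap_ab = rating_gap_from_win_rate(0.64)  # A over B, roughly 100 points
gap_bc = rating_gap_from_win_rate(0.64)  # B over C, roughly 100 points

# Ratings sit on a single scale, so the A-over-C gap is the sum of the two.
gap_ac = gap_ab + gap_bc

print(f"implied A-C gap: {gap_ac:.0f} points")
print(f"predicted A-beats-C rate: {win_rate_from_rating_gap(gap_ac):.2f}")
```

Under these assumed numbers the model predicts that A beats C about 76% of the time, matching the +200 row earlier in the lesson.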

No fixed criteria needed. The ratings are entirely relative. They emerge from wins and losses alone. That's why Elo works for chess, ping pong, and LLM comparison — anywhere you can ask "which of these two is better?" without needing to define "better" on an absolute scale.

Quick Check

Player X has rating 1600. Player Y has rating 1400. What's the rating difference from X's perspective?

Two players have identical ratings. What's the expected win probability for each?
