Lesson 1: The Core Idea Behind Elo

You want to rank players (or LLMs, or ping pong colleagues) — but you have no absolute measuring stick. All you have is who beat whom.

Key Insight: Elo assumes every player has a hidden "true strength" number. The difference between two players' numbers predicts the probability that one will beat the other.

The Model

Arpad Elo proposed: if Player A has rating R_A and Player B has rating R_B, then A's expected score (roughly, A's probability of winning, with a draw counted as half a win) is:

E_A = 1 / (1 + 10^((R_B - R_A) / 400))

That's it. The entire system flows from this one formula.
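
For readers who want to try numbers, here is a minimal Python sketch of the formula. The function name expected_score and its parameter names are illustrative choices, not something taken from Elo's book.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    # The exponent uses (B - A): the further A is ahead, the smaller
    # the power of ten and the closer the result is to 1.
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Equal ratings give an even match.
    print(expected_score(1500, 1500))  # 0.5
    # A 400-point lead corresponds to 10-to-1 odds, i.e. 10/11.
    print(round(expected_score(1900, 1500), 3))  # 0.909
```

Note that the two players' expected scores always sum to 1: whatever A is expected to score, B is expected to score the rest.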

What the Numbers Mean

Source: Wikipedia — Elo rating system; Elo, The Rating of Chessplayers (1978), Ch. 2.
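
Plugging a few rating differences into the formula gives a feel for the scale. The sketch below computes the values directly from the formula rather than quoting them from the cited sources; the helper name expected_score_from_diff and the chosen differences are illustrative.

```python
def expected_score_from_diff(diff: float) -> float:
    """Expected score for a player rated `diff` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


for diff in (0, 100, 200, 400):
    print(f"{diff:>4} points higher: {expected_score_from_diff(diff):.0%}")

# Output:
#    0 points higher: 50%
#  100 points higher: 64%
#  200 points higher: 76%
#  400 points higher: 91%
```

Only the difference matters: 1600 vs. 1400 gives the same prediction as 2600 vs. 2400.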

Why This Solves Your Problem

In a ping pong ladder or an LLM arena, you never measure "absolute skill." You only observe: A beat B. Elo converts these pairwise outcomes into a single number per player that is consistent — if A usually beats B, and B usually beats C, then A's rating will be higher than C's, even if A and C never play.
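
To see how a shared scale yields a prediction for a pair that has never met, consider three hypothetical ratings (the names and numbers below are made up for illustration). Because the formula depends only on the rating difference, the A-versus-B gap and the B-versus-C gap add up to an A-versus-C gap:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings; A and C are assumed never to have played each other.
ratings = {"A": 1600, "B": 1500, "C": 1400}

a_vs_b = expected_score(ratings["A"], ratings["B"])
b_vs_c = expected_score(ratings["B"], ratings["C"])
a_vs_c = expected_score(ratings["A"], ratings["C"])

print(f"A vs B: {a_vs_b:.2f}")  # 0.64
print(f"B vs C: {b_vs_c:.2f}")  # 0.64
print(f"A vs C: {a_vs_c:.2f}")  # 0.76
```

The A-versus-C prediction comes entirely from where each player sits on the common scale, not from any head-to-head result.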

No fixed criteria needed. The ratings are entirely relative. They emerge from wins and losses alone. That's why Elo works for chess, ping pong, and LLM comparison — anywhere you can ask "which of these two is better?" without needing to define "better" on an absolute scale.

Quick Check

Player X has rating 1600. Player Y has rating 1400. What's the rating difference from X's perspective?

Two players have identical ratings. What's the expected win probability for each?

Recommended Reading

For the full picture, read the Wikipedia article on Elo — particularly the "Mathematical details" section. It's clear, well-sourced, and covers the formula derivation in more depth.