Elo Rating System — Quick Reference

Core Formulas

Expected score of A against B (win probability plus half the draw probability):

E_A = 1 / (1 + 10^{(R_B - R_A) / 400})
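
As a quick numerical check of the formula, here is a minimal Python sketch. The function name `expected_score` is illustrative and not taken from any particular library.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B on the 400-point logistic scale."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


print(round(expected_score(1500, 1500), 3))  # 0.5   (equal ratings)
print(round(expected_score(1600, 1500), 3))  # 0.64  (100-point edge)
print(round(expected_score(1900, 1500), 3))  # 0.909 (400-point edge)
```

The two players' expected scores always sum to 1, so E_B = 1 - E_A.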

Rating update after a game:

R'_A = R_A + K × (S_A - E_A)

Where S_A is A's actual score: 1 (win), 0.5 (draw), 0 (loss), and K is the K-factor (typical values below).
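
A worked example of the update rule, sketched in Python. The name `update_rating` and the default K = 32 are assumptions chosen for illustration.

```python
def update_rating(r_a: float, r_b: float, s_a: float, k: float = 32.0) -> float:
    """Return A's new rating after one game against B.

    s_a is A's actual score: 1 (win), 0.5 (draw) or 0 (loss).
    """
    e_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))  # expected score of A
    return r_a + k * (s_a - e_a)


# A (1500) plays a stronger opponent B (1700); E_A is about 0.24.
print(round(update_rating(1500, 1700, 1.0), 1))  # 1524.3 (upset win)
print(round(update_rating(1500, 1700, 0.5), 1))  # 1508.3 (draw still gains points)
print(round(update_rating(1500, 1700, 0.0), 1))  # 1492.3 (expected loss costs little)
```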

K	Use case
10	Top-level established players (low volatility)
20	Standard competitive play
32	General purpose / Chatbot Arena / casual ladders
40–64	Provisional / new entrants (fast convergence)
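
To make the table concrete, the sketch below prints how many points a win between equally rated players (E = 0.5) is worth at several K values from the table. The helper name `rating_change` is hypothetical.

```python
def rating_change(k: float, s_a: float, e_a: float) -> float:
    """Points gained (positive) or lost (negative) in one game."""
    return k * (s_a - e_a)


# Win against an equally rated opponent: the gain is always K / 2.
for k in (10, 20, 32, 40, 64):
    print(k, rating_change(k, 1.0, 0.5))
# 10 5.0
# 20 10.0
# 32 16.0
# 40 20.0
# 64 32.0
```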

Elo rating: A number representing relative skill, meaningful only in comparison to others in the same pool.
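K-factor: Maximum number of points a player can gain or lose in a single game. A larger K makes ratings react faster but fluctuate more.
Expected score (E): Predicted average score (win probability plus half the draw probability), determined solely by the rating difference.
Zero-sum: When both players use the same K, points gained by the winner = points lost by the loser, so total rating in a closed pool is conserved.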
Convergence: The process by which ratings approach a player's "true" skill after sufficient games.
Provisional rating: A rating during the first N games, often computed with higher K for faster adjustment.
Bootstrap confidence interval: Statistical technique (used by Chatbot Arena) to estimate uncertainty in a rating by resampling the data.
Rating inflation/deflation: Systematic drift in average rating over time, caused for example by players entering/leaving the pool or by unequal K-factors.
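
The bootstrap idea from the glossary can be sketched in a few lines of Python: resample the game list with replacement, recompute ratings on each resample, and read off percentiles. This is an illustrative sketch, not Chatbot Arena's actual code; the names `run_elo` and `bootstrap_ci`, the initial rating of 1000, K = 32, and the sample games are all assumptions.

```python
import random


def expected_score(r_a, r_b):
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def run_elo(games, k=32.0, init=1000.0):
    """Sequential Elo over (player_a, player_b, score_a) tuples."""
    ratings = {}
    for a, b, s_a in games:
        r_a, r_b = ratings.get(a, init), ratings.get(b, init)
        delta = k * (s_a - expected_score(r_a, r_b))
        ratings[a] = r_a + delta  # same K for both players,
        ratings[b] = r_b - delta  # so each update is zero-sum
    return ratings


def bootstrap_ci(games, player, rounds=200, alpha=0.05, seed=0, init=1000.0):
    """Percentile bootstrap interval for one player's rating."""
    rng = random.Random(seed)  # fixed seed for repeatable output
    samples = []
    for _ in range(rounds):
        resampled = [rng.choice(games) for _ in games]  # with replacement
        samples.append(run_elo(resampled, init=init).get(player, init))
    samples.sort()
    lo = samples[round((alpha / 2) * (rounds - 1))]
    hi = samples[round((1 - alpha / 2) * (rounds - 1))]
    return lo, hi


# Made-up results: A beats B six times, loses twice, draws twice.
games = [("A", "B", 1.0)] * 6 + [("A", "B", 0.0)] * 2 + [("A", "B", 0.5)] * 2
print(run_elo(games))
print(bootstrap_ci(games, "A"))
```

Because sequential Elo depends on game order, each resample also reshuffles the sequence, so the interval reflects both sampling noise and ordering effects.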