Elo Rating System — Quick Reference

Core Formulas

Expected score (probability A beats B):

E_A = 1 / (1 + 10^((R_B - R_A) / 400))

Rating update after a game:

R'_A = R_A + K × (S_A - E_A)

Where S = 1 (win), 0.5 (draw), 0 (loss).
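
As a minimal sketch, the two formulas above could be written in Python as follows. The function names `expected_score` and `update_rating` are illustrative and do not come from any particular library.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating_a: float, rating_b: float,
                  score_a: float, k: float = 32.0) -> float:
    """New rating for A; score_a is 1 (win), 0.5 (draw), or 0 (loss)."""
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


# Worked example: a 1500-rated player beats a 1700-rated player with K = 32.
new_rating = update_rating(1500, 1700, score_a=1.0, k=32)
print(round(new_rating, 1))  # 1524.3
```

The underdog's expected score here is about 0.24, so the win is worth roughly 32 × 0.76 ≈ 24.3 points.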

Rating Difference → Win Probability

Difference    Stronger player wins
0             50%
100           64%
200           76%
300           85%
400           91%
500           95%
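
For differences not listed, the same percentages can be recomputed from the expected-score formula. The short sketch below rebuilds the table; `win_probability` is a hypothetical helper name, and draws are ignored.

```python
def win_probability(diff: float) -> float:
    """Expected score of the stronger player, given a rating lead of diff points."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


# Rebuild the table for leads of 0, 100, ..., 500 points.
table = {diff: round(100 * win_probability(diff)) for diff in range(0, 501, 100)}
for diff, pct in table.items():
    print(f"{diff:>4}  {pct}%")
```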

K-Factor Guidelines

K        Use case
10       Top-level established players (low volatility)
20       Standard competitive play
32       General purpose / Chatbot Arena / casual ladders
40–64    Provisional / new entrants (fast convergence)

Ping Pong Ladder Quick-Start

  1. Everyone starts at 1500
  2. Use K = 40 for first 10 games per player, then K = 32
  3. Expect stable ratings after ~20 games per person
  4. Update after every match: the winner gains and the loser loses (zero-sum only when both players use the same K)
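
The steps above might be sketched as a small in-memory ladder. The `Ladder` class, its method names, and the player names are all hypothetical; the constants are the ones from the list.

```python
from collections import defaultdict

START_RATING = 1500.0
PROVISIONAL_GAMES = 10
K_PROVISIONAL = 40.0
K_ESTABLISHED = 32.0


class Ladder:
    """Minimal in-memory ladder; players are identified by arbitrary strings."""

    def __init__(self) -> None:
        self.ratings = defaultdict(lambda: START_RATING)
        self.games_played = defaultdict(int)

    def k_factor(self, player: str) -> float:
        """K = 40 for a player's first 10 games, K = 32 afterwards."""
        if self.games_played[player] < PROVISIONAL_GAMES:
            return K_PROVISIONAL
        return K_ESTABLISHED

    def record_match(self, winner: str, loser: str) -> None:
        r_w, r_l = self.ratings[winner], self.ratings[loser]
        e_w = 1.0 / (1.0 + 10.0 ** ((r_l - r_w) / 400.0))
        e_l = 1.0 - e_w
        # Both updates use the pre-match ratings and each player's own K.
        self.ratings[winner] = r_w + self.k_factor(winner) * (1.0 - e_w)
        self.ratings[loser] = r_l + self.k_factor(loser) * (0.0 - e_l)
        self.games_played[winner] += 1
        self.games_played[loser] += 1


ladder = Ladder()
ladder.record_match("alice", "bob")
print(ladder.ratings["alice"], ladder.ratings["bob"])  # 1520.0 1480.0
```

Because K is tracked per player, a match between a provisional player and an established one moves the two ratings by different amounts, so the exchange is exactly zero-sum only when both players share the same K.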

Glossary

Elo rating
A number representing relative skill, meaningful only in comparison to others in the same pool.
K-factor
Maximum points gained/lost per game. Controls how fast ratings react.
Expected score (E)
Predicted score for a game, based on the rating difference. Equals the win probability when draws are not possible.
Zero-sum
Points gained by the winner = points lost by the loser, so total rating in the pool is conserved. This holds only when both players are updated with the same K.
Convergence
The process by which ratings approach a player's "true" skill after sufficient games.
Provisional rating
A rating during the first N games, often computed with higher K for faster adjustment.
Bootstrap confidence interval
Statistical technique (used by Chatbot Arena) to estimate uncertainty in a rating by resampling the data.
Rating inflation/deflation
Systematic drift in average rating over time, caused by players entering/leaving the pool.
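
To illustrate the bootstrap confidence interval entry, here is a generic percentile-bootstrap sketch using only the standard library. It is not Chatbot Arena's code; the function names, the fixed K = 32, the 1500 starting rating, and the match log are all assumptions made for illustration.

```python
import random


def run_elo(matches, k=32.0, start=1500.0):
    """Replay (winner, loser) pairs in order and return the final ratings."""
    ratings = {}
    for winner, loser in matches:
        r_w = ratings.get(winner, start)
        r_l = ratings.get(loser, start)
        e_w = 1.0 / (1.0 + 10.0 ** ((r_l - r_w) / 400.0))
        ratings[winner] = r_w + k * (1.0 - e_w)
        ratings[loser] = r_l - k * (1.0 - e_w)
    return ratings


def bootstrap_interval(matches, player, rounds=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for one player's final rating."""
    rng = random.Random(seed)
    samples = []
    for _ in range(rounds):
        # Resample the match log with replacement, keeping its length.
        resampled = rng.choices(matches, k=len(matches))
        samples.append(run_elo(resampled).get(player, 1500.0))
    samples.sort()
    lo_idx = round((1.0 - level) / 2.0 * (rounds - 1))
    hi_idx = round((1.0 + level) / 2.0 * (rounds - 1))
    return samples[lo_idx], samples[hi_idx]


# Made-up match log: "ann" wins 14 of 20 games against "ben".
matches = [("ann", "ben")] * 14 + [("ben", "ann")] * 6
low, high = bootstrap_interval(matches, "ann")
print(f"ann: 95% interval [{low:.0f}, {high:.0f}]")
```

Since Elo updates depend on game order, each resample also changes the order in which games are replayed, so the interval reflects both sampling noise and order sensitivity.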