Elo Rating System — Quick Reference
Core Formulas
Expected score for A against B (the win probability, counting a draw as half a win):
EA = 1 / (1 + 10^((RB - RA) / 400))
Rating update after a game:
R'A = RA + K × (SA - EA)
Where SA is A's actual score: 1 (win), 0.5 (draw), 0 (loss).
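As a minimal sketch, the two formulas translate to Python roughly as follows. The function names `expected_score` and `update_rating`, and the default K of 32, are illustrative choices rather than part of any standard library.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (base 10, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> float:
    """Return A's new rating; score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


if __name__ == "__main__":
    # Two equally rated players: the expected score is 0.5, so the winner gains K/2.
    print(update_rating(1500, 1500, 1.0))  # 1516.0
```

The two expected scores for a pairing always sum to 1, so B's expected score can be taken as `1 - expected_score(rating_a, rating_b)`.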
Rating Difference → Win Probability
| Difference | Stronger player wins |
| 0 | 50% |
| 100 | 64% |
| 200 | 76% |
| 300 | 85% |
| 400 | 91% |
| 500 | 95% |
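The percentages above follow directly from the expected-score formula. A short sketch that recomputes them (the helper names `expected_score` and `probability_table` are hypothetical):

```python
def expected_score(diff: float) -> float:
    """Expected score of the higher-rated player, given the rating gap in points."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def probability_table(diffs=(0, 100, 200, 300, 400, 500)):
    """Return (difference, percentage rounded to the nearest whole number) pairs."""
    return [(d, round(100 * expected_score(d))) for d in diffs]


for diff, pct in probability_table():
    print(f"{diff:>4} | {pct}%")
```

Passing other gaps to `probability_table` extends the table, for example `probability_table((50, 150, 250))`.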
K-Factor Guidelines
| K | Use case |
| 10 | Top-level established players (low volatility) |
| 20 | Standard competitive play |
| 32 | General purpose / Chatbot Arena / casual ladders |
| 40–64 | Provisional / new entrants (fast convergence) |
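To get a feel for what K does, the sketch below applies the update formula to one fixed result, an upset win by a player rated 100 points below the opponent, under several K values from the table. The function name `rating_change` and the 100-point scenario are illustrative assumptions.

```python
def rating_change(k: float, score: float, expected: float) -> float:
    """Points added to a player's rating: K * (S - E)."""
    return k * (score - expected)


# Expected score of a player rated 100 points below the opponent (about 0.36).
underdog_expected = 1.0 / (1.0 + 10.0 ** (100 / 400.0))

for k in (10, 20, 32, 40, 64):
    gain = rating_change(k, 1.0, underdog_expected)
    print(f"K={k:>2}: underdog gains {gain:.1f} points for the upset")
```

The gain scales linearly with K and never exceeds K itself, which is why a higher K suits players whose ratings are still far from settled.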
Ping Pong Ladder Quick-Start
- Everyone starts at 1500
- Use K = 40 for each player's first 10 games, then K = 32
- Expect stable ratings after ~20 games per person
- Update after every match: the winner gains and the loser loses (zero-sum whenever both players have the same K)
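The quick-start rules can be sketched as a tiny in-memory ladder. Everything named here (`Player`, `record_match`, the constants) is hypothetical, and the sketch assumes each player's update uses that player's own K, so a match between a provisional and an established player is not exactly zero-sum.

```python
from dataclasses import dataclass

START_RATING = 1500.0
PROVISIONAL_GAMES = 10
K_PROVISIONAL = 40.0
K_ESTABLISHED = 32.0


@dataclass
class Player:
    name: str
    rating: float = START_RATING
    games: int = 0

    @property
    def k(self) -> float:
        # Higher K during the first 10 games, then the regular K.
        return K_PROVISIONAL if self.games < PROVISIONAL_GAMES else K_ESTABLISHED


def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def record_match(winner: Player, loser: Player) -> None:
    """Update both players in place after a decisive match (no draws)."""
    e_winner = expected_score(winner.rating, loser.rating)
    # Assumption: each side applies its own K to its own surprise (S - E).
    winner_delta = winner.k * (1.0 - e_winner)
    loser_delta = loser.k * (0.0 - (1.0 - e_winner))
    winner.rating += winner_delta
    loser.rating += loser_delta
    winner.games += 1
    loser.games += 1


alice, bob = Player("Alice"), Player("Bob")
record_match(alice, bob)
print(alice.rating, bob.rating)  # 1520.0 1480.0
```

A ladder that must conserve total rating could instead apply a single shared K per match, for example the lower of the two players' values.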
Glossary
- Elo rating: A number representing relative skill, meaningful only in comparison to others in the same pool.
- K-factor: Maximum points a player can gain or lose in one game. Controls how fast ratings react.
- Expected score (E): Predicted result of a game on a 0 to 1 scale, based on the rating difference. Equals the win probability when draws are impossible.
- Zero-sum: Points gained by the winner = points lost by the loser, so total rating in the pool is conserved. Holds only when both players use the same K.
- Convergence: The process by which ratings approach a player's "true" skill after sufficient games.
- Provisional rating: A player's rating during their first N games, often computed with a higher K for faster adjustment.
- Bootstrap confidence interval: Statistical technique (used by Chatbot Arena) to estimate uncertainty in a rating by resampling the data.
- Rating inflation/deflation: Systematic drift in average rating over time, caused by players entering/leaving the pool.
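To make the bootstrap entry concrete, here is a simplified sketch of a percentile bootstrap over a match history. It is not Chatbot Arena's code: the function names, the fixed K of 32, the 1500 starting rating, and the (winner, loser) tuple format are all assumptions for illustration. Resampling with replacement also reorders the matches, which matters because sequential Elo is order-dependent.

```python
import random
from collections import defaultdict


def compute_elo(matches, k=32.0, start=1500.0):
    """Run sequential Elo updates over (winner, loser) pairs; return {player: rating}."""
    ratings = defaultdict(lambda: start)
    for winner, loser in matches:
        e_winner = 1.0 / (1.0 + 10.0 ** ((ratings[loser] - ratings[winner]) / 400.0))
        delta = k * (1.0 - e_winner)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


def bootstrap_interval(matches, player, rounds=1000, alpha=0.05, seed=0, start=1500.0):
    """Percentile bootstrap interval for one player's rating."""
    rng = random.Random(seed)
    samples = []
    for _ in range(rounds):
        # Resample the match list with replacement, keeping its original length.
        resampled = rng.choices(matches, k=len(matches))
        # A player absent from a resample keeps the starting rating.
        samples.append(compute_elo(resampled, start=start).get(player, start))
    samples.sort()
    lo = samples[int((alpha / 2) * rounds)]
    hi = samples[int((1 - alpha / 2) * rounds) - 1]
    return lo, hi


history = [("a", "b")] * 8 + [("b", "a")] * 2 + [("a", "c")] * 5 + [("c", "b")] * 3
low, high = bootstrap_interval(history, "a", rounds=200)
print(f"95% interval for a: {low:.0f} to {high:.0f}")
```

A wide interval signals that the player has too few games, or too inconsistent a record, for the point rating to be trusted.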