Lesson 3: Convergence — How Ratings Stabilize

You now know the formula and the update rule. But does the system actually work? Do ratings converge to something meaningful, or do they just bounce around forever?

The Self-Correcting Mechanism

Built-in negative feedback: If your rating is too high relative to your true skill, you'll lose more often than predicted → your rating drops. If too low, you'll win more often than predicted → your rating rises. Individual games still add noise, but on average the system pushes every rating toward the level where predicted and actual results agree.

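This happens because the update term K × (S - E) carries a sign: when your actual score S is above your expected score E, the change is positive, and when S is below E, it is negative. An overrated player has an inflated E, so S falls short of it more often than not.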
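To make the sign argument concrete, here is a minimal sketch that computes the average rating change per game, K × (S_avg - E). It assumes the standard Elo expected-score curve and assumes that a player's "true skill" can be written as a number on the same scale; the names expected_score and mean_drift are illustrative, not part of any library.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score E for a player against one opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def mean_drift(rating: float, true_skill: float, opponent: float, k: float = 32) -> float:
    """Average rating change per game, K * (S_avg - E).

    S_avg is how the player really scores on average (set by true_skill,
    an assumed quantity); E is what the current rating predicts.
    """
    s_avg = expected_score(true_skill, opponent)
    e = expected_score(rating, opponent)
    return k * (s_avg - e)


# A player whose true skill is 1500, facing a 1500-rated opponent,
# while carrying a rating that is too high, correct, or too low.
for rating in (1700, 1500, 1300):
    print(rating, round(mean_drift(rating, true_skill=1500, opponent=1500), 2))
```

The drift is negative when the rating sits above true skill, zero when they match, and positive when the rating sits below, which is the negative feedback described above.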

How Many Games to Stabilize?

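This depends on K and the rating pool. A larger K moves ratings toward their equilibrium faster, at the cost of larger game-to-game swings.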

Rule of thumb: With K = 32, a player's rating typically stabilizes within 20–30 games against opponents of varied strength. FIDE, for comparison, applies a higher K of 40 to new players until they have completed 30 rated games.

For a small ping pong ladder (8–12 people), expect ratings to feel "right" after everyone has played roughly 15–20 matches each.
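A rough way to see how K affects settling time is to follow the average trajectory of a rating, with luck averaged out. The sketch below is an illustration under stated assumptions, not a prediction: the player's true skill (1700), the single fixed opponent (1600), and the 50-point tolerance are all hypothetical choices, and real results will be noisier because each game is a win or a loss rather than an average.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score E for a player against one opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def games_to_settle(
    k: float,
    start: float = 1500.0,
    true_skill: float = 1700.0,   # assumed "real" strength, for illustration
    opponent: float = 1600.0,     # assumed fixed opponent rating
    tolerance: float = 50.0,      # how close counts as "settled"
    max_games: int = 10_000,
) -> int:
    """Count games until the average rating path is within tolerance of true_skill.

    Each step applies the mean update K * (S_avg - E) instead of a random
    win/loss, so this shows the trend only. Returns max_games if the
    tolerance is never reached.
    """
    s_avg = expected_score(true_skill, opponent)
    rating = start
    for games in range(max_games):
        if abs(rating - true_skill) <= tolerance:
            return games
        rating += k * (s_avg - expected_score(rating, opponent))
    return max_games


for k in (16, 32, 40):
    print(f"K = {k}: about {games_to_settle(k)} games on the average path")
```

Raising K shortens the average path to the true level, which is the motivation for using a higher K while a player is new.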

The Starting Rating Problem

Everyone has to start somewhere. Common approaches:

  1. Fixed start (e.g., everyone begins at 1500). Simple, but early games produce wild swings as the system figures out who's actually good.
  2. Provisional period with high K. New players use K = 40–64 for their first N games, then drop to K = 32. This lets them find their level quickly without permanently distorting others' ratings.
  3. Placement matches. Play several games before assigning a rating. Chatbot Arena does this: a new model is matched broadly at first to quickly estimate its strength.

For your ping pong ladder: Start everyone at 1500 with K = 40 for the first 10 games, then switch to K = 32. Ratings will feel reasonable after 2–3 weeks of regular play.
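The ladder recommendation above can be sketched as a K schedule plus the usual update rule. This is one possible way to write it; the function names k_factor and update_rating and the games_played counter are hypothetical, and the thresholds simply mirror the numbers suggested above (K = 40 for the first 10 games, then K = 32).

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score E for a player against one opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def k_factor(
    games_played: int,
    provisional_games: int = 10,
    provisional_k: float = 40,
    settled_k: float = 32,
) -> float:
    """Higher K while the player is still in the provisional period."""
    return provisional_k if games_played < provisional_games else settled_k


def update_rating(rating: float, opponent: float, score: float, games_played: int) -> float:
    """Apply R' = R + K * (S - E), with K chosen from the player's game count.

    score is 1 for a win, 0.5 for a draw, 0 for a loss.
    games_played counts games completed before this one.
    """
    k = k_factor(games_played)
    return rating + k * (score - expected_score(rating, opponent))


# A brand-new player beats an equally rated opponent, then the same
# result again once the provisional period is over.
print(update_rating(1500, 1500, score=1, games_played=0))
print(update_rating(1500, 1500, score=1, games_played=10))
```

Each player uses their own K here, so a newcomer's rating moves quickly while an established opponent's rating shifts by the smaller settled amount.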

When Elo Doesn't Stabilize Well

Watch out for these failure modes:

Source: Wikipedia — Ratings inflation; Chiang et al. (2024) §3 on convergence in Chatbot Arena.

Quick Check

A player's rating is much higher than their true skill. Over many games, what happens?

In a 10-person ping pong ladder, what's the main risk of starting everyone at 1500 with K = 10?

Recommended Reading

The Yi Zhu blog post on Chatbot Arena Elo covers how LMSYS handles convergence and bootstrapping for new models being added to an existing leaderboard.