
Tournaments

Jacob Marshall edited this page Jan 21, 2024 · 1 revision

Inspired in part by Tom Murphy's Elo World, tournaments provide a way to compare agents and their algorithms in terms of their Elo rating. Elo ratings are generated by playing a round-robin tournament in which every competitor plays every other competitor N times. After the game outcomes are collected, the results are shuffled, and ratings are adjusted according to the following:

$$ E_{A} = 1 / (1 + 10^{(R_{B} - R_{A})/400}) $$

$$ E_{B} = 1 / (1 + 10^{(R_{A} - R_{B})/400}) $$

$$ R_{A}' = R_{A} + K (S_{A} - E_{A}) $$

$$ R_{B}' = R_{B} + K (S_{B} - E_{B}) $$

$E_{A}$ & $E_{B}$ give the expected game result for player $A$ and player $B$, given the ratings of each player: $R_{A}$ & $R_{B}$.

$R_{A}'$ & $R_{B}'$ provide the updated ratings for each player, given game results $S_{A}$ and $S_{B}$ (1.0 for a win, 0.0 for a loss, 0.5 for a draw), expected results $E_{A}$ and $E_{B}$, and constant K=16.
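The update rule above can be sketched as a small Python function. This is an illustrative sketch (the function name and signature are not from the repository), but the arithmetic matches the equations, with K=16:

```python
def elo_update(r_a, r_b, s_a, k=16):
    """Return updated ratings (R_A', R_B') after one game.

    s_a is player A's result: 1.0 for a win, 0.0 for a loss,
    0.5 for a draw. Player B's result is 1.0 - s_a.
    """
    # Expected results, given the current ratings.
    e_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    e_b = 1 / (1 + 10 ** ((r_a - r_b) / 400))
    s_b = 1.0 - s_a
    # Adjust each rating toward the actual result.
    return r_a + k * (s_a - e_a), r_b + k * (s_b - e_b)
```

Note that the total rating is conserved: whatever A gains, B loses, so the pool average stays at the initial value.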

All players' ratings are initialized to 1500. The tournament simulation is run n_tournaments times to provide a better approximation of strength, since the order in which games are played can have a large effect on a competitor's final rating.
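A single tournament simulation might look like the following sketch. The names (`run_tournament`, `play_game`) are hypothetical, not the repository's API; the structure shows the collect-then-shuffle-then-update ordering described above:

```python
import itertools
import random

def run_tournament(competitors, play_game, n_games=10, k=16, seed=None):
    """Round-robin Elo tournament sketch.

    play_game(a, b) should return a's result: 1.0, 0.5, or 0.0.
    """
    rng = random.Random(seed)
    ratings = {c: 1500.0 for c in competitors}
    # Collect every game outcome first: each pair plays n_games times.
    games = [(a, b, play_game(a, b))
             for a, b in itertools.combinations(competitors, 2)
             for _ in range(n_games)]
    # Shuffle before applying rating updates, since the order of
    # updates affects the final ratings.
    rng.shuffle(games)
    for a, b, s_a in games:
        e_a = 1 / (1 + 10 ** ((ratings[b] - ratings[a]) / 400))
        ratings[a] += k * (s_a - e_a)
        ratings[b] += k * ((1.0 - s_a) - (1.0 - e_a))
    return ratings
```

Repeating this whole function n_tournaments times and averaging the resulting ratings per competitor smooths out the order dependence.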

If running in a notebook, a matchup heatmap is generated, showing the win rate of a competitor against each of its opponents, ordered by Elo rating:

*(matchup heatmap image)*

Tournament Parameters

  • n_games: number of games to play between each competitor in the tournament
  • n_tournaments: number of times to simulate the tournament (as Elo scores can greatly vary depending on game simulation order)
  • tournament_name: name assigned to results file
  • competitors: list of competitor objects of the format:
    • name: unique identifier assigned to competitor
    • algo_config: evaluator configuration
    • checkpoint (optional): path to the checkpoint file to load the competitor's model from, when applicable
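Assembled together, the parameters above might look like the following sketch. All field values here are hypothetical placeholders, not taken from the repository:

```python
# Illustrative tournament configuration; every value below is made up.
tournament_config = {
    "n_games": 10,
    "n_tournaments": 5,
    "tournament_name": "example_tournament",
    "competitors": [
        {
            "name": "random_agent",            # unique identifier
            "algo_config": {"type": "random"}, # evaluator configuration
        },
        {
            "name": "trained_agent",
            "algo_config": {"type": "learned"},
            "checkpoint": "path/to/checkpoint",  # optional model checkpoint
        },
    ],
}
```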

Example Configurations
