Optimising an AI Player

The framework has automatic optimisation facilities built in (with plans to add more in the future). To use these facilities there are two core components to understand - a set of tunable parameters and the ParameterSearch class - plus one optional component, a JSON search-space definition:

The optimisation algorithm used is NTBEA (N-Tuple Bandit Evolutionary Optimisation), and full details of this can be found in the following paper:

Lucas, Simon M., Jialin Liu, and Diego Perez-Liebana. 2018. ‘The N-Tuple Bandit Evolutionary Algorithm for Game Agent Optimisation’. In IEEE Congress on Evolutionary Computation (CEC). Rio de Janeiro. https://doi.org/10.1109/CEC.2018.8477869.


A set of parameters to be optimised must implement the ITunableParameters interface. To make this as easy as possible, the following implementation hierarchy is useful:

  • ITunableParameters - the main interface.
    • TunableParameters - an abstract class.
      • PlayerParameters - an abstract class that includes parameter settings for players' computational budgets - whether by ForwardModel calls, wall-clock time or number of iterations. Two example concrete sub-classes of this are:
        • MCTSParams for Monte Carlo Tree Search (MCTSPlayer)
        • RMHCParams for Random Mutation Hill Climbing (RMHCPlayer)

For the detail of the interface methods have a look at the code; the emphasis of this page is to explain how to quickly add new tunable parameters to an AI agent, and to then tune it for a specific game or environment. This is most easily shown by example. In the code below we add four new parameters (one boolean, one int, one double, one enum) to MCTSParams to show what needs to be added to make them tunable (and why). String parameters are also supported, although we’d usually recommend using an enum instead.

Step one is to add these as instance properties to the parameters class as normal, and then use these properties in your AI code as needed.

public class MCTSParams extends PlayerParameters {

  public double K = Math.sqrt(2);
  public int maxTreeDepth = 10;
  public boolean openLoop = false;
  public MCTSEnums.TreePolicy treePolicy = UCB;

Step two is to mark these as tunable in the parameter class constructor. The addTunableParameter() method inherited from TunableParameters will do all the internal record-keeping. This takes three arguments:

  1. The name of the parameter (a String)
  2. The default value of the parameter (can be any of int, enum, double, String, boolean)
  3. [Optionally] A List of the standard settings that this value could be tuned to.

In the example below treePolicy is set to a single value (it is still tunable, as will be explained later). The other three have between 2 and 6 possible values they can be tuned between (by default).

    public MCTSParams(long seed) {
        addTunableParameter("K", Math.sqrt(2), Arrays.asList(0.0, 0.1, 1.0, Math.sqrt(2), 3.0, 10.0));
        addTunableParameter("maxTreeDepth", 10, Arrays.asList(1, 3, 10, 30));
        addTunableParameter("openLoop", false, Arrays.asList(false, true));
        addTunableParameter("treePolicy", UCB);

Step three is to implement a _reset() method on the parameters class to update the main class instance properties with a tuned set. This method is called by the tuner to initialise a new set of parameters (see ParameterSearch in the next section).

    public void _reset() {
        super._reset(); // This is important to ensure that PlayerParameters are also picked up (if being co-tuned)
        K = (double) getParameterValue("K");
        maxTreeDepth = (int) getParameterValue("maxTreeDepth");
        openLoop = (boolean) getParameterValue("openLoop");
        treePolicy = (MCTSEnums.TreePolicy) getParameterValue("treePolicy");

And that’s it. These four new parameters are all now set up to be accessible for auto-tuning using ParameterSearch.
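As a self-contained illustration of this pattern, the miniature sketch below mimics the bookkeeping that addTunableParameter(), getParameterValue() and _reset() perform. This is an invented stand-in for the framework classes, not the framework's actual implementation; the tuner-facing setParameterValue() method is a simplifying assumption.

```java
import java.util.*;

// Invented miniature of the TunableParameters pattern described above,
// covering steps one to three. Not the framework's real implementation.
public class Main {

    static class Params {
        // Step one: instance properties used directly by the agent code
        double K = Math.sqrt(2);
        int maxTreeDepth = 10;

        // Internal record-keeping that addTunableParameter() maintains
        private final Map<String, Object> currentValues = new HashMap<>();
        private final Map<String, List<Object>> possibleValues = new HashMap<>();

        Params() {
            // Step two: register each parameter with default and candidate values
            addTunableParameter("K", Math.sqrt(2), Arrays.asList(0.0, 0.1, 1.0, Math.sqrt(2), 3.0, 10.0));
            addTunableParameter("maxTreeDepth", 10, Arrays.asList(1, 3, 10, 30));
        }

        void addTunableParameter(String name, Object defaultValue, List<?> settings) {
            currentValues.put(name, defaultValue);
            possibleValues.put(name, new ArrayList<>(settings));
        }

        Object getParameterValue(String name) {
            return currentValues.get(name);
        }

        // The tuner picks an index into each parameter's list of settings...
        void setParameterValue(String name, int settingIndex) {
            currentValues.put(name, possibleValues.get(name).get(settingIndex));
        }

        // ...and step three: _reset() copies the chosen values back onto the fields
        void _reset() {
            K = (double) getParameterValue("K");
            maxTreeDepth = (int) getParameterValue("maxTreeDepth");
        }
    }

    public static void main(String[] args) {
        Params p = new Params();
        p.setParameterValue("K", 5);            // pick 10.0 from K's list
        p.setParameterValue("maxTreeDepth", 1); // pick 3
        p._reset();
        System.out.println(p.K + " " + p.maxTreeDepth); // 10.0 3
    }
}
```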

There is a step four if you are creating a completely new type of agent with a new class that inherits from PlayerParameters - this is to implement the instantiate() method of ITunableParameters, which returns a fully formed AbstractPlayer with the current parameter settings. Have a look at any of the existing concrete sub-classes mentioned at the start for guidelines here.
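A hypothetical sketch of this fourth step is shown below. AbstractPlayer here is a self-contained stand-in for the framework class of the same name, and MyAgent / MyAgentParams are invented names for illustration only; consult the existing concrete sub-classes for the real signatures.

```java
// Invented sketch of instantiate() for a brand-new agent type.
// AbstractPlayer, MyAgent and MyAgentParams are stand-ins, not framework code.
public class Main {

    static abstract class AbstractPlayer { }

    static class MyAgent extends AbstractPlayer {
        final double explorationConstant;
        MyAgent(double explorationConstant) {
            this.explorationConstant = explorationConstant;
        }
    }

    static class MyAgentParams {
        double explorationConstant = 1.0;

        // Mirrors ITunableParameters.instantiate(): return a fully formed
        // player built from the current parameter settings
        AbstractPlayer instantiate() {
            return new MyAgent(explorationConstant);
        }
    }

    public static void main(String[] args) {
        MyAgentParams params = new MyAgentParams();
        params.explorationConstant = 0.7; // as set by the tuner via _reset()
        MyAgent agent = (MyAgent) params.instantiate();
        System.out.println(agent.explorationConstant); // 0.7
    }
}
```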


ParameterSearch

The ParameterSearch class provides the main access point for optimising an agent. It is designed to be usable via the command line, and requires a minimum of three arguments:

  1. Either the class name of the parameters to be optimised, or a JSON file defining the search space (see JSON section). For example:
    • players.mcts.MCTSParams
    • config\NTBEA\MCTSSearchSpace.json
  2. Number of NTBEA iterations to run.
  3. The game to optimise for. This is a String that matches a GameType enumeration. For example ‘LoveLetter’, or ‘Uno’.

This can be run via an IDE, or via the ParameterSearch-jar-with-dependencies.jar generated by the maven build. There are then a multitude of other options available - full details can be found in the code, or by executing:

java -jar ParameterSearch-jar-with-dependencies.jar --help

When run, this will log the best set of parameters found for the agent to maximise the objective on the specified game (this objective can be the win rate, the game score, or the ordinal position in the game - see the documentation references above for details).

The most important options, which we would always recommend you use, are:

  • evalGames=n, where n is the number of games to run the final ‘best’ setting on. This will then provide an estimate of the true win rate / score of the chosen setting.
  • repeat=r, where r is the number of independent NTBEA runs to execute. Any one NTBEA run can give a poor recommendation, and given a budget of X total iterations it is empirically better to run 10 runs, each using X/10 iterations. (See this paper for detailed experiments on test domains that show this.)

Other important options allow you to specify the number of players, the precise opponents to play against, and verbose logging to get a feel for the fitness landscape.
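Combining the three required arguments with the recommended options, a full invocation might look like the following. The argument values here are illustrative only, and the exact option syntax should be checked against the --help output:

```shell
# Illustrative: tune players.mcts.MCTSParams for Love Letter with a total
# budget of 1000 NTBEA iterations split over 10 independent runs, then
# evaluate the winning configuration over 500 games
java -jar ParameterSearch-jar-with-dependencies.jar players.mcts.MCTSParams 1000 LoveLetter repeat=10 evalGames=500
```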

JSON Search Space

As described in the TunableParameters section, a default set of values is defined when a parameter is coded. However, there are many situations in which it is desirable to change this set without changing the code; for example if you want to fix several parameters, and then tune one or two others at a much finer grain.

For these situations a JSON file can be used to define the search space used by ParameterSearch (as the first argument). A detailed example is shown below.

	{
	"class" : "players.mcts.MCTSParams",
	"K" : [0.01, 0.1, 1.0, 10.0, 100.0],
	"rolloutLength" : [0, 5, 10, 20, 50, 100],
	"maxTreeDepth" : [1, 3, 10, 30, 100],
	"rolloutType" : "RANDOM",
	"budgetType" : "BUDGET_TIME",
	"information" : ["Closed_Loop", "Open_Loop", "Information_Set"],
	"selectionPolicy" : ["ROBUST", "SIMPLE"],
	"treePolicy" : ["UCB", "EXP3", "AlphaGo", "RegretMatching"],
	"opponentTreePolicy" : ["SelfOnly", "Paranoid"],
	"fmCallsBudget" : 4000,
	"budget" : 40,
	"epsilon" : 1e-6,
	"breakMS" : 0,
	"exploreEpsilon" : [0.01, 0.03, 0.1, 0.3]
	}

The most important entry is the class attribute, which defines the sub-class of TunableParameters to be tuned.

Each further entry in the main object is for one tunable parameter. It must have the same name as the parameter (any unknown names will be reported by ParameterSearch, to help spot typing errors). The value is either a single value - which is then fixed for the optimisation run - or an array of all the values to be considered. An array can hold values of any one of the core JSON types: Integer, Double, Boolean, String. Enum parameters are handled by entering their different values as Strings; these are converted to the enum constant of the same exact name (using Enum.valueOf()) - see treePolicy in the snippet above for an example.
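The String-to-enum conversion can be seen in the small self-contained example below. TreePolicy here is a stand-in enum defined locally for illustration, not the framework's MCTSEnums.TreePolicy:

```java
// Demonstrates the Enum.valueOf() conversion used for enum-valued parameters.
// TreePolicy is a local stand-in enum, not the framework's class.
public class Main {

    enum TreePolicy { UCB, EXP3, AlphaGo, RegretMatching }

    public static void main(String[] args) {
        String fromJson = "EXP3"; // as read from the search-space file
        TreePolicy policy = Enum.valueOf(TreePolicy.class, fromJson);
        System.out.println(policy); // EXP3

        // A name that does not match exactly throws IllegalArgumentException,
        // which is how typos in the JSON surface
        try {
            Enum.valueOf(TreePolicy.class, "ucb");
        } catch (IllegalArgumentException e) {
            System.out.println("unknown enum constant: ucb");
        }
    }
}
```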

Heuristic tuning and recursive search spaces

The JSON below indicates how a heuristic can be tuned alongside MCTS parameters. In this case one of the parameters of MCTS (heuristic) is itself a TunableParameters instance, represented by a nested JSONObject.

	{
	"class" : "players.mcts.MCTSParams",
	"K" : 1.0,
	"rolloutLength" : [0, 10, 100],
	"maxTreeDepth" : 10,
	"rolloutType" : "RANDOM",
	"treePolicy" : "UCB",
	"opponentTreePolicy" : "SelfOnly",
	"information" : "Information_Set",
	"budgetType" : "BUDGET_TIME",
	"budget" : 40,
	"breakMS" : 0,
	"heuristic" : {
		"class" : "games.dominion.DominionHeuristic",
		"victoryPoints" : [-1.0, -0.2, 0.0, 0.2, 1.0],
		"treasureValue" : [-1.0, -0.2, 0.0, 0.2, 1.0],
		"actionCards" : [-1.0, -0.2, 0.0, 0.2, 1.0],
		"treasureInHand" : [-1.0, -0.2, 0.0, 0.2, 1.0],
		"actionCardsInHand" : [-1.0, -0.2, 0.0, 0.2, 1.0],
		"actionsLeft" : [-1.0, -0.2, 0.0, 0.2, 1.0],
		"buysLeft" : [-1.0, -0.2, 0.0, 0.2, 1.0],
		"provinceCount" : [-1.0, -0.2, 0.0, 0.2, 1.0],
		"duchyCount" : [-1.0, -0.2, 0.0, 0.2, 1.0],
		"estateCount" : [-1.0, -0.2, 0.0, 0.2, 1.0]
	}
	}

In this case all of the parameters with arrays, at all levels of recursion, will be amalgamated into a single search space. It is also possible to use the same class (games.dominion.DominionHeuristic) in two independent places - for example if we were simultaneously tuning one heuristic for the MCTS rollout evaluation, and another for an opponent model - as the parameters in each JSONObject are given their own namespace to avoid any clashes.
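The namespacing idea can be sketched as below: each nested JSONObject's parameters are registered under a prefixed name, so two uses of the same heuristic class never collide. The "prefix.name" key format and the prefix names are illustrative assumptions, not necessarily the framework's exact convention:

```java
import java.util.*;

// Sketch of namespacing for recursive search spaces. The key format
// ("prefix.name") and prefix names are assumptions for illustration.
public class Main {

    static Map<String, List<Double>> buildSearchSpace() {
        Map<String, List<Double>> searchSpace = new LinkedHashMap<>();
        // The same heuristic parameters registered twice, once per role,
        // each under its own namespace so the names cannot clash
        for (String prefix : Arrays.asList("rolloutHeuristic", "opponentHeuristic")) {
            searchSpace.put(prefix + ".victoryPoints", Arrays.asList(-1.0, -0.2, 0.0, 0.2, 1.0));
            searchSpace.put(prefix + ".treasureValue", Arrays.asList(-1.0, -0.2, 0.0, 0.2, 1.0));
        }
        return searchSpace;
    }

    public static void main(String[] args) {
        // Four distinct dimensions in the amalgamated search space
        System.out.println(buildSearchSpace().keySet());
    }
}
```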