Performance

We want games to run fast. This is true whether using statistical forward planning techniques such as MCTS, or Reinforcement Learning (once we get around to implementing a clear interface for this). The ForwardModel.next() and GameState.copy() methods are called very frequently, so they should not be too slow. However, the usual software engineering adage still holds: write a good solution first, and only optimise for performance when you have a problem.

To quote Donald Knuth:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

As a guide to times for the next and copy methods, compare the games outlined below (an exhaustive list as of January 2021):

Game	Copy (μs)	Next (μs)
Diamant	6	3.3
Dominion	29	6.2
Exploding Kittens	25	4.2
Colt Express	18	2.8
Virus	18	5.9
Uno	14	1.0
Dots and Boxes	57	1.7
Love Letter	7	1.6

These figures are from a rather old desktop computer, but give a rough order of magnitude. They are medians calculated with Random agent play using GameReport. Only bother optimising if your game is significantly slower than these examples; the times will obviously always depend on the complexity of the game and the number of mutable components that need to be copied or updated.
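As a rough illustration of how a median per-call time can be measured, here is a minimal sketch. It is not how GameReport works internally; the class MedianTimer, its methods, and the array clone standing in for GameState.copy() are all hypothetical names chosen for this example.

```java
import java.util.Arrays;

public class MedianTimer {

    // Sink for results, so the JIT is less likely to optimise the timed work away
    static int[] sink;

    // Median of the samples; for an even count, the mean of the two middle values
    static double median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Times each of `repetitions` calls separately and returns the median in microseconds
    static double medianMicros(Runnable operation, int repetitions) {
        long[] nanos = new long[repetitions];
        for (int i = 0; i < repetitions; i++) {
            long start = System.nanoTime();
            operation.run();
            nanos[i] = System.nanoTime() - start;
        }
        return median(nanos) / 1000.0;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a game state: in real use, time GameState.copy() here
        int[] fakeState = new int[10_000];
        double micros = medianMicros(() -> sink = fakeState.clone(), 10_001);
        System.out.printf("median copy time: %.2f us%n", micros);
    }
}
```

Timing individual calls with System.nanoTime() is noisy at the microsecond scale, so treat the output as an order-of-magnitude guide only, in the same spirit as the table above.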

Using lambdas, method references and streams

One point of contention has been the relative performance of lambdas, method references and streams in code. See for example here, or here.

Out of curiosity we ran some rough-and-ready benchmark tests to see how much we should care. Firstly, we generated a list of 10,000 random integers and summed it using the three distinct implementations below:

    double useMethodReference(List<Number> input) {
        return input.stream().mapToDouble(Number::doubleValue).sum();
    }

    double useLambda(List<Number> input) {
        return input.stream().mapToDouble(i -> i.doubleValue()).sum();
    }

    double useUglyJava(List<Number> input) {
        double retValue = 0.0;
        for (Number n : input) {
            retValue += n.doubleValue();
        }
        return retValue;
    }

The table below reports the results in microseconds. Several iterations were run so that initial transients as the JVM warmed up could be accounted for.

Iteration	Simple Java	Method Ref.	Lambda
1	736	6445	1401
2	419	167	159
3	110	151	147
4	82	149	148
5	81	152	145
6	82	142	154

This shows that once the JVM has optimised (at runtime) for the code patterns, the non-stream approach (useUglyJava, the "Simple Java" column) takes 40-50% less time: roughly 82 μs against 145-150 μs. It is also noticeable that the plain Java code is far cheaper on the first iteration (736 μs against 1401-6445 μs), implying that the JVM has less funky runtime optimisation to do, which makes sense. This latter point is not that relevant, as in this context it is only code called frequently enough to be fully runtime-optimised that we should really consider optimising. This test is highly abstract and does not do anything as complex as real game code.
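For anyone wanting to repeat this kind of rough-and-ready test, a harness along the following lines would do. This is a sketch under assumptions, not the code that produced the table: the class name StreamBenchmark, the fixed seed, the range of the random integers, and timing a single call per iteration are all choices made for this example, and usePlainLoop is the same loop as useUglyJava above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

public class StreamBenchmark {

    static double useMethodReference(List<Number> input) {
        return input.stream().mapToDouble(Number::doubleValue).sum();
    }

    static double useLambda(List<Number> input) {
        return input.stream().mapToDouble(i -> i.doubleValue()).sum();
    }

    static double usePlainLoop(List<Number> input) {
        double retValue = 0.0;
        for (Number n : input) {
            retValue += n.doubleValue();
        }
        return retValue;
    }

    // Builds a list of `size` random integers (assumed range 0-999)
    static List<Number> randomInput(int size, long seed) {
        Random rnd = new Random(seed);
        List<Number> input = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            input.add(rnd.nextInt(1000));
        }
        return input;
    }

    // Times one call of the implementation, in whole microseconds
    static long timeMicros(ToDoubleFunction<List<Number>> impl, List<Number> input) {
        long start = System.nanoTime();
        double result = impl.applyAsDouble(input);
        long elapsed = System.nanoTime() - start;
        // Use the result so the call cannot simply be optimised away
        if (Double.isNaN(result)) throw new IllegalStateException("unexpected NaN");
        return elapsed / 1000;
    }

    public static void main(String[] args) {
        List<Number> input = randomInput(10_000, 42L);
        System.out.println("Iteration\tPlain loop\tMethod ref.\tLambda");
        // Repeat so that the JVM warm-up transient is visible in the early rows
        for (int iteration = 1; iteration <= 6; iteration++) {
            long loop = timeMicros(StreamBenchmark::usePlainLoop, input);
            long methodRef = timeMicros(StreamBenchmark::useMethodReference, input);
            long lambda = timeMicros(StreamBenchmark::useLambda, input);
            System.out.printf("%d\t%d\t%d\t%d%n", iteration, loop, methodRef, lambda);
        }
    }
}
```

Absolute numbers will differ by machine and JVM version, and the order in which the three implementations run can affect the first row. A dedicated benchmarking tool would give more trustworthy figures than a hand-rolled loop like this one.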

Secondly, we therefore wrote two versions of the core MCTS algorithm, one using streams and one avoiding them completely, in three functions that are called multiple times in each iteration: counting the number of node visits, totalling the value of a node, and finding all unexpanded nodes.

    // Firstly two plain Java functions
    private double actionTotValueV1(AbstractAction action, int playerId) {
        double retValue = 0.0;
        for (SingleTreeNodeNoStreams node : children.get(action)) {
            if (node != null)
                retValue += node.totValue[playerId];
        }
        return retValue;
    }

    private List<AbstractAction> unexpandedActionsV1() {
        List<AbstractAction> retValue = new ArrayList<>();
        for (AbstractAction action : actionsFromState) {
            if(children.get(action) == null)
                retValue.add(action);
        }
        return retValue;
    }

    // and then their streamed/method referenced versions
    private double actionTotValueV2(AbstractAction action, int playerId) {
        return Arrays.stream(children.get(action))
                .filter(Objects::nonNull)
                .mapToDouble(n -> n.totValue[playerId])
                .sum();
    }

    private List<AbstractAction> unexpandedActionsV2() {
        return actionsFromState.stream().filter(a -> children.get(a) == null).collect(toList());
    }

We then ran comparisons on two games.

Dots and Boxes with 1s of time per move, OSLA opponents, 4 players, Closed Loop MCTS, and no rollouts or redeterminisation. The lack of rollouts and using Closed Loop means that we do very little work in the forward model, and hence the computational work is concentrated in the MCTS algorithm itself.
Love Letter with 2s of time per move, random opponents, 4 players, Open Loop, Information Set MCTS but again with no rollouts.

Game	Stream Iter.	No Stream Iter.	Gain
DotsAndBoxes	3600	4100	10-15%
LoveLetter	16000	18000	10-15%

The reported figures are the median number of MCTS iterations conducted within the computational budget for a move. The mean is skewed by outliers in both directions: a small number of iterations early on (while the JVM warms up) and a huge number at the end of a game, when we can easily run hundreds of thousands of iterations to decide on the final move. The median is robust to these outliers.
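To see why the median is the better summary here, consider the small sketch below. The iteration counts are invented purely for illustration (one slow warm-up move, four typical moves, and one end-of-game move) and are not measurements; MeanVsMedian is a hypothetical class name.

```java
import java.util.Arrays;

public class MeanVsMedian {

    static double mean(int[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Median of the values; for an even count, the mean of the two middle values
    static double median(int[] values) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Invented per-move iteration counts: warm-up, typical moves, end of game
        int[] iterationsPerMove = {200, 3500, 3600, 3600, 3700, 250_000};
        System.out.println("mean = " + mean(iterationsPerMove)
                + ", median = " + median(iterationsPerMove));
        // prints: mean = 44100.0, median = 3600.0
    }
}
```

A single end-of-game move drags the mean an order of magnitude above a typical move, while the median stays at the typical value.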

Conclusion

In this case we decided to convert the four relevant MCTS functions to the non-stream version. This is because even a 10% performance increase in a core method used by every game may be helpful. However, in general, a more functional, stream-based style has the following benefits:

clearer code
more concise code
less error-prone code

And outside of this quite specific core the decision would have gone the other way. So, knock yourself out with streams and lambdas and only bother optimising if there may actually be a problem.