The first part of the Similarity Report 2019 compared the first-choice (best) move of over a hundred chess engines, each set to search to depth 1 (d=1) and return its best move. Each engine returned a best move for 8000+ epd test positions, which Don Dailey selected around 2011 for “move similarity between engines” testing with the Simex Similarity tester. Similarity between a pair of engines is expressed as a percentage: the number of positions on which both engines chose the same move, divided by the total number of positions in the test suite.
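As a minimal sketch of the similarity measure just described (not the Simex implementation itself), the percentage for one engine pair could be computed as follows. The function name `move_similarity` and the use of move strings such as `e2e4` are assumptions made for illustration.

```python
from typing import Sequence


def move_similarity(moves_a: Sequence[str], moves_b: Sequence[str]) -> float:
    """Percentage of positions on which two engines chose the same move.

    moves_a[i] and moves_b[i] are the moves each engine returned for
    position i of the same test suite (hypothetical input layout).
    """
    if len(moves_a) != len(moves_b):
        raise ValueError("both engines must answer the same positions")
    if not moves_a:
        raise ValueError("empty test suite")
    # Count the positions where the two chosen moves are identical.
    same = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * same / len(moves_a)


if __name__ == "__main__":
    engine_a = ["e2e4", "d2d4", "g1f3", "c2c4"]
    engine_b = ["e2e4", "d2d4", "b1c3", "c2c4"]
    print(move_similarity(engine_a, engine_b))
```

The measure is symmetric, so each engine pair only needs to be computed once.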
To verify the Simex results, we tested again with around 20 times as many test positions (160,000 epds against 8000+). Because we don’t know for sure the epd selection method used by Don Dailey (it has been suggested he chose positions from computer chess games where the evaluation was within plus/minus one pawn), we used, for the larger testing, an epd selection method based on the following:
- sampling was from a fully shuffled suite of 1,000,000,000 epds, originally created in 2018 for neural network training, containing positions from both human and computer games.
- the 1000 most common naturally occurring piece configurations were computed from this set.
- positions were sampled into four groups based on game phase (opening to endgame), each group a batch of 20,000,000 samples in which the 1000 most common piece configurations are roughly equally represented.
- at this point we have 4 x 20,000,000 sets of epds.
- each set is shuffled, and 40,000 positions taken, in batches of 10,000.
- we now have 16 epd suites, each of 10,000 positions, organised by four game phases.
- each selected EPD was tested for count of legal moves available. Fewer than 0.3 of one percent of the selected EPDs had only one legal move.
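The final shuffle-and-batch steps above can be sketched as follows. This is an illustration under stated assumptions, not the script actually used: the function name `draw_suites`, the dictionary layout and the fixed seed are all hypothetical, and the defaults simply mirror the 40,000 and 10,000 figures in the list.

```python
import random
from typing import Dict, List, Sequence


def draw_suites(
    phase_sets: Dict[str, Sequence[str]],
    take: int = 40_000,
    batch: int = 10_000,
    seed: int = 0,
) -> Dict[str, List[List[str]]]:
    """Draw `take` positions per game phase and split them into suites.

    phase_sets maps a phase name to its pool of epd strings
    (hypothetical layout). Sampling without replacement is equivalent
    to shuffling the pool and keeping the first `take` positions.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    suites: Dict[str, List[List[str]]] = {}
    for phase, positions in phase_sets.items():
        if len(positions) < take:
            raise ValueError(f"not enough positions for phase {phase!r}")
        sample = rng.sample(positions, take)
        # Cut the sample into consecutive suites of `batch` positions each.
        suites[phase] = [sample[i:i + batch] for i in range(0, take, batch)]
    return suites
```

With four phase pools and the default sizes, this yields 4 x 4 = 16 suites of 10,000 positions each.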
We posit that our sampling method randomly samples naturally occurring chess positions across a balanced range of material configurations, and that the result is a suitably representative, wide-ranging sample of chess positions.
Each engine was tested again, set to search at depth=1, against each of the 16 x 10,000 test suites. Note that in this testing we compute move similarity for each of four game stages: opening, early middlegame, late middlegame and ending.
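One way the per-stage figures could be computed is sketched below. It assumes, purely for illustration, that each engine's results are stored per suite under a `(stage, suite_index)` key and that the four suites of a stage are pooled before the percentage is taken; the report does not say how suites were combined, and the function name `similarity_by_stage` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SuiteKey = Tuple[str, int]  # (game stage, suite index), an assumed layout


def similarity_by_stage(
    moves_a: Dict[SuiteKey, List[str]],
    moves_b: Dict[SuiteKey, List[str]],
) -> Dict[str, float]:
    """Percentage of identical move choices per game stage for one engine pair."""
    same: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for key, a_moves in moves_a.items():
        stage, _ = key
        b_moves = moves_b[key]  # both engines must have run the same suites
        if len(a_moves) != len(b_moves):
            raise ValueError(f"suite {key} has mismatched lengths")
        # Pool matches and position counts over all suites of this stage.
        same[stage] += sum(1 for x, y in zip(a_moves, b_moves) if x == y)
        total[stage] += len(a_moves)
    return {s: 100.0 * same[s] / total[s] for s in total if total[s]}
```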