Sim03, a chess engine similarity measuring tool, was developed in the period 2010-12 by Don Dailey, one of the programmers of the Komodo series of chess engines. A chess engine was presented with 8238 chess positions and asked for its top move choice in each. A pair of different chess engines could then be compared for similarity by counting the number of positions for which both engines chose the same move.
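In outline the comparison is simple to express. The following Python sketch shows the counting step under some assumptions: the function name is illustrative and the two move lists are taken to have been collected beforehand, one chosen move per test position; none of this is part of Sim03 itself.

    def similarity(moves_a, moves_b):
        # Percentage of test positions on which two engines chose the same top move.
        # moves_a and moves_b are equal-length lists, one chosen move per position.
        assert len(moves_a) == len(moves_b)
        matches = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
        return 100.0 * matches / len(moves_a)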
Its successor, Simex, is thus a similarity-by-outcome tool. Chess engines 'think' using a combination of evaluation (the smart part) and search, which tends to be more algorithmic. The increasing ability of engines to search very deeply over the last few years has decreased the value of the Sim03 tool, as, arguably, deep searchers will be more inclined to choose similar moves, leaving Simex less able to discriminate between them.
Ed Schröder, the programmer of the Rebel series of chess engines, recently devised a technique to make Simex consider only the evaluation part of a chess engine, by limiting the engine's search depth to one move ahead, effectively disabling its search function. An added bonus was that the 8238 chess positions could now be evaluated by any chess engine supporting the UCI (Universal Chess Interface) protocol in a matter of minutes or less.
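As a rough illustration of how such depth-1 probing can be done over UCI, here is a sketch using the python-chess library; the engine path, the EPD file of test positions, and the function name are placeholder assumptions rather than part of Simex.

    import chess
    import chess.engine

    def depth_one_choices(engine_path, epd_file):
        # Ask a UCI engine for its preferred move in each position while limiting
        # the search to a single ply, so the choice reflects evaluation alone.
        moves = []
        with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
            with open(epd_file) as f:
                for line in f:
                    if not line.strip():
                        continue
                    board, _ = chess.Board.from_epd(line)
                    result = engine.play(board, chess.engine.Limit(depth=1))
                    moves.append(result.move.uci())
        return moves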
Around one hundred and fifty chess engines were available for testing. Some proved unsuitable, but one hundred and thirty-five remained, and after testing all of these engines and comparing every engine pair we were able to produce a 135x135 chess engine similarity (or correlation) matrix containing around nine thousand correlation values, each engine-pair value expressed as a percentage figure.
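The matrix build itself can be sketched as follows, assuming a dictionary mapping engine names to their depth-1 move lists (for example, as produced by the probing sketch above) and reusing the similarity() function from the first sketch; again, the names are illustrative.

    def similarity_matrix(move_lists):
        # move_lists: dict of engine name -> list of chosen moves, one per position.
        # Returns the engine names plus a symmetric table of pairwise percentages.
        names = sorted(move_lists)
        matrix = {}
        for i, a in enumerate(names):
            for b in names[i:]:
                value = similarity(move_lists[a], move_lists[b])
                matrix[(a, b)] = matrix[(b, a)] = value
        return names, matrix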
The meaning of the correlation/similarity figures.
The upper bound of similarity/correlation is 100%: we would expect two identical engines to agree with each other's move choice 100% of the time. We find a lower bound in the data of around 30 to 40% for the chess positions used; presumably unconnected engines appear to generate the same move for a given position around one time in three, suggesting the test suite has, on average, around three 'sensible' candidate moves that a 'sensible' evaluation function would choose between.
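That reading of the floor can be sanity-checked with a toy simulation, under the assumption that two unrelated engines each pick uniformly at random from k equally plausible candidate moves per position: their expected agreement is then 1/k, so an observed floor of roughly 33% points to about three candidates on average.

    import random

    def simulated_agreement(k, positions=8238):
        # Two imaginary, unrelated engines each pick one of k candidate moves at
        # random per position; return the percentage of positions where they agree.
        matches = sum(random.randrange(k) == random.randrange(k)
                      for _ in range(positions))
        return 100.0 * matches / positions

    print(round(simulated_agreement(3), 1))   # typically close to 33.3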
All tested engines in this report are of the alpha-beta type, so our proposed baseline is an alpha-beta baseline. When we test as many neural net engines as possible for our next report, we may well discover a different baseline figure for move variance, since, anecdotally, neural net engines evaluate positions differently from the handcrafted evaluation functions of alpha-beta engines.