SIMEX (SIMilarity EXperiments) is the successor to Don Dailey's famous SIM03, which became extremely important in the chaotic 2010-2015 period when the computer chess community was flooded with Rybka 3 clones and derivatives. SIM03 detected not only Ippolit, Robolito and friends as Rybka 3 derivatives, but Fruit 2.1 derivatives as well.
Nowadays a large number of strong engines are available on GitHub, and a starting programmer has a rich choice of engines to start from. So far so good. What's not good is a deliberate lack of transparency: making a few changes and releasing the result as if it were an original work, while it is a clone and often a violation of the GPL. SIM03 was developed to unmask this lack of transparency, and the system has proven itself accurate.
SIMEX works the same way but is more user-friendly and has more features. Its main advantage is that you are no longer limited to the built-in 8238 positions of SIM03: you can create your own position sets using EPD. The download includes 7 EPD sets for demonstration purposes. SIMEX uses MEA by Ferdinand Mosca as its base.
First, a comparison between SIM03 and SIMEX to check its reliability. We tested both SIM03 and SIMEX with the 8238 SIM03 positions at 5 time controls: 100ms, 250ms, 500ms, 1000ms and finally 2500ms.
From the comparison we can see that the SIMEX numbers are somewhat higher than those of SIM03, which can be explained by the two different approaches. What's really interesting, and was hardly discussed in the 2010-2015 period, is that similarity increases as the time control increases, and that a program like Ethereal 11.25 slightly crosses the 60% line at 2.5 seconds with 3 other engines.
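To make the percentages concrete, here is a minimal Python sketch. It assumes that similarity is simply the share of positions on which two engines choose the same move, which is how SIM03 reports it. The exact scoring SIMEX derives through MEA may differ, and the function name and the move lists are made up for illustration.

```python
def similarity(moves_a, moves_b):
    """Percentage of positions on which both engines chose the same move."""
    if len(moves_a) != len(moves_b):
        raise ValueError("both engines must be tested on the same positions")
    if not moves_a:
        raise ValueError("no positions given")
    same = sum(1 for a, b in zip(moves_a, moves_b) if a == b)
    return 100.0 * same / len(moves_a)


# Hypothetical best moves of two engines for the same five positions.
engine_a = ["e2e4", "g1f3", "d2d4", "c2c4", "e1g1"]
engine_b = ["e2e4", "g1f3", "d7d5", "c2c4", "f1e1"]

# The engines agree on 3 of the 5 positions.
print(f"{similarity(engine_a, engine_b):.1f}%")
```

With real data each list would hold one move per EPD position, so a set of 8238 positions gives 8238 comparisons per engine pair.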
Quick guide: after installation, double-click the example.bat batch file to get a first impression. It runs a small EPD of only 100 positions with Stockfish 9 and Stockfish 10, and after about 30 seconds you will see the similarity result. The 6 other batch files are for real.
simex.epd contains the original 8238 SIM03 positions as already discussed above.
sts.epd contains the 1500 positions of the famous STS test. As one can see from the link, the similarity looks alarming, but in reality STS is a bad choice because its positions are too easy. To test engines for similarity use random positions; more on that below.
It also shows (and this is extremely important) that each created set of positions will have its own orange and red markers.
MEA wasn't created for SIMEX but for other purposes, such as OKE or creating opening books, and for that reason it requires a special EPD tag. SOMU 1.5a will do that job for you; see the [F9] and [F10] options marked with "new" on that page.
In a nutshell:
[F9] - creates a suitable SIMEX EPD from a PGN.
[F10] - converts an EPD that contains the "bm" tag for use in SIMEX.
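For readers unfamiliar with EPD, the following Python sketch shows how a line with a "bm" (best move) operation can be split into the position and its operations. It only reads standard EPD; it does not produce the special tag MEA requires, which is SOMU's job. The function name and the sample line are made up for illustration, and the parser assumes no semicolons occur inside quoted operands.

```python
def parse_epd(line):
    """Split an EPD line into its four position fields and its operations."""
    fields = line.strip().split(None, 4)
    if len(fields) < 4:
        raise ValueError("an EPD line needs four position fields")
    position = " ".join(fields[:4])
    rest = fields[4] if len(fields) > 4 else ""

    ops = {}
    # Operations are separated by semicolons: "opcode operand;"
    for chunk in rest.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        opcode, _, operand = chunk.partition(" ")
        ops[opcode] = operand.strip().strip('"')
    return position, ops


# Made-up sample line: the start position with an arbitrary "bm" operation.
sample = 'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4; id "start.001";'
position, ops = parse_epd(sample)
print(position)
print(ops.get("bm"))
```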
1. SIM03 sends the whole game history to the engine while SIMEX uses EPD. This might cause differences.
2. The time control is fundamentally different. SIM03 is in control: it sends a stop command to the engine when time is up. MEA leaves it to the engine and to how its programmer has implemented the fixed move time. Unfortunately not every engine implements this accurately.
An extreme example is Rybka1. With SIM03 it uses 17 minutes to finish the 8238 positions at 100ms (already about 3½ minutes too much!), but with SIMEX it takes a notable 1 hour and 2 minutes. One can check the end of the log file to verify the sanity of the time an engine has used. For Rybka1 we got:
Time allocation : BAD!! spending more time
ActualTime > ExpectedTime + MarginTime
ExpectedTime : 823.8s
ActualTime : 3669.8s
However, Rybka1 is a big exception; the other engines we tested stay within reasonable margins. Still, reason (2) explains why the SIMEX similarity percentages are somewhat higher than with SIM03.
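The time check in the log comes down to simple arithmetic: 8238 positions at 100ms each should take 823.8 seconds. A minimal Python sketch of that check, using the Rybka1 figures from the log, could look as follows. The function name and the 60-second margin are assumptions for illustration only; the log tells us that a MarginTime exists, not how large it is.

```python
def check_time(positions, movetime_ms, actual_s, margin_s):
    """Return the expected run time and whether the actual time is acceptable."""
    expected_s = positions * movetime_ms / 1000.0
    ok = actual_s <= expected_s + margin_s
    return expected_s, ok


# Rybka1 figures from the log; margin_s is a hypothetical value.
expected, ok = check_time(8238, 100, actual_s=3669.8, margin_s=60.0)

print(f"ExpectedTime : {expected:.1f}s")
print(f"Overshoot    : {3669.8 / expected:.2f}x")
print("Time allocation :", "OK" if ok else "BAD")
```

For Rybka1 the actual time is roughly 4.5 times the expected time, far beyond any reasonable margin.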
1. SIMEX has parameters to manipulate the data for better results; see the README file.
2. Add comments to HTML reports by storing them in legend.txt; see the example.
3. Make a dendrogram from *.data files for *.png visualization; see the example. While creating an HTML report, SIMEX also creates an Excel-readable file called dendrogram.csv, which can be used by the dendrogram tool of Ferdinand Mosca. Just double-click dendrogram.bat in case you want such a picture.
4. Chris Whittington created EPD sets of 10,000 positions each. Each set contains a specific piece distribution. In total there are 100 sets, representing the most common board positions in use. See the list. Example with SIMEX.