It is generally known by now that similarity testing on moves does not work with NNUE nets. On this page we investigate whether it is possible using other methods. One method is to calculate the root-mean-square deviation (RMS) of the scores instead of the moves, since NNUE is, after all, a set of scores. We present the data and the source code here for discussion elsewhere.
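The RMS idea can be illustrated with a small sketch. This is not the actual sim-score code, and the score lists are made-up illustration data:

```python
import math

def rms_deviation(scores_a, scores_b):
    """RMS of the score differences over a shared set of positions."""
    assert len(scores_a) == len(scores_b)
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(scores_a, scores_b)) / len(scores_a)
    )

# Example: evaluations (in centipawns) of two nets on five positions.
net1 = [25, -10, 140, 0, 60]
net2 = [30, -15, 120, 5, 55]
print(rms_deviation(net1, net2))  # prints 10.0
```

A low RMS means the two nets score the same positions almost identically; a high RMS means they disagree.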
Let us start at the beginning: the summer of 2020, the starting point of the NNUE revolution, when the Stockfish team implemented the Sergio nets. Our first goal is to measure the stability of the RMS of Stockfish NNUE nets. From the Sergio nets we calculate the RMS of the very first 3 (July) and the last 3 (September) and compare them with the final SF12 net, see table 1. In table 2 the nets between SF12 and SF13 are compared, plus 5 nets released after SF13.
NNUE Research Project
March 10, 2021
[Table: SF13 NNUE versus …]
As we can see, from the last Sergio net (September 14, 2020) up to SF13 the RMS fluctuates between 54 and 60. The 5 new kennyfrc test NNUEs show a decreasing RMS, meaning a high similarity. Next we compare nets of several NNUE engines found on the internet (table 3) and SF13 versus Fat Fritz 2 (table 4).
Let us begin by saying that the above data is far too limited to draw conclusions, let alone final conclusions; more data is needed. Still, the currently available data is good enough for an initial discussion.
1. Igel - The current data implies that the Igel NNUE (versions 2.70 and 2.80), with an RMS above 100 and a SIM of 30-35, is dissimilar to the Stockfish nets. The webpage states those nets have their origin in Dietrich Kappe's Night Nurse network. With version 2.90 the author moved to an Igel 2.60 based NNUE. RMS=69 | SIM=57.
2. Fat Fritz 2 - Of the engines tested so far, it shows the strongest similarity to the Stockfish nets.
3. Minic - We learn from its website that from Minic 2.46 to Minic 2.53 the author experimented with Stockfish NNUE, but later moved to nets of his own, such as Napping Nexus and Nascent Nutrient, which is consistent with our RMS findings.
4. Nemorino - The website states: "Nemorino uses a slightly adapted file format compared to Stockfish, but you can use the network parameter files from Stockfish (and other NNUE engines) as well. Those files will be copied and converted automatically at first usage by Nemorino." It is hard to tell what this means in practice, so we ran Nemorino with an SF13 NNUE. The result, like the evaluations, was unrealistic (RMS=156.18 and SIM=21.19), and no copy was made either.
5. RubiChess - From the website we learn the author created his own NNUE. The RMS implies that this is true.
6. Orion 0.7 - An experimental NNUE version that uses a Stockfish net, as the RMS shows.
7. Orion 0.8 - The author states the NNUE is an original work, which our findings seem to confirm.
1. Testing is done with the standard simex utility, which creates EPDs in the EPD_OUT folder. The sim-score utility compares 2 EPDs and calculates the RMS and SIM. That's basically it. A debug.txt file is also created to check the calculation. All analysis is done at 100ms, except for Fat Fritz 2, for which we made an exception and also tested at 250ms and 500ms.
2. IMPORTANT - The sim-score utility is strictly meant for NNUE network comparison. It might also work for Lc0 networks, since like NNUE they produce a set of values, but this has not been tested. We very much doubt its use for AB engines because of their handcrafted evaluations; although not tested, we fear a lot of false positives. For those it is best to keep the original simex approach of checking moves, not scores.
3. Syntax - sim-score file1.epd file2.epd. Example in the download: sim-score sf13.epd sf12.epd, which outputs the RMS and SIM for the two files.
The RMS is calculated with the standard formula. The SIM score follows a slightly different path: it checks the square-root deviation and makes a categorization from 0 to 5. The dev0 scores are very similar, the dev1 scores are similar, and the dev2-dev5 scores show no correlation. The SIM is calculated as: (dev0 * 1.25 + dev1) / number of positions in the EPD. The 2 different types of "mean" calculation seem to complement each other.
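As a sketch, the SIM calculation as described could look like the code below. The dev0-dev5 thresholds on the square-root deviation are not given here, so the bucket limits are hypothetical placeholders, and the result is expressed as a percentage to match the SIM values quoted above (e.g. SIM=57):

```python
import math

# Hypothetical upper bounds for buckets dev0..dev4 on the square-root
# deviation sqrt(|a - b|); anything above the last limit falls into
# dev5. The real sim-score limits may differ.
BUCKET_LIMITS = [2.0, 4.0, 6.0, 8.0, 10.0]

def sim_score(scores_a, scores_b):
    """SIM = (dev0 * 1.25 + dev1) / number of positions, as a percentage."""
    buckets = [0] * 6
    for a, b in zip(scores_a, scores_b):
        dev = math.sqrt(abs(a - b))
        for i, limit in enumerate(BUCKET_LIMITS):
            if dev <= limit:
                buckets[i] += 1
                break
        else:
            buckets[5] += 1  # dev5: no correlation
    return (buckets[0] * 1.25 + buckets[1]) / len(scores_a) * 100.0
```

The 1.25 weight boosts the very-similar dev0 positions, so two identical score files reach a SIM of 125 rather than 100.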
4. The source code of sim-score and the executable are public domain. Improvements and other formulas are welcome.