nnue

It's generally known by now that similarity testing on moves does not work with NNUE nets. On this page we investigate whether it is possible using other methods. One method is to calculate the root-mean-square deviation (RMS) of the scores instead of comparing moves, since an NNUE net is, after all, a set of scores. We present data and the source code for discussion elsewhere.
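As a minimal sketch of what the RMS of the scores means here, assume both nets have scored the same positions in the same order and that the scores are in centipawns. The function name `rms_deviation` and the sample scores are ours, for illustration only; they are not taken from the sim-score source.

```python
import math

def rms_deviation(scores_a, scores_b):
    """Root-mean-square deviation between two equally long lists of scores."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    # Square the per-position score difference, average, then take the root.
    squared = [(a - b) ** 2 for a, b in zip(scores_a, scores_b)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical centipawn scores for four positions from two nets.
net_a = [25, -40, 110, 0]
net_b = [30, -35, 90, 10]
print(round(rms_deviation(net_a, net_b), 2))
```

Identical score lists give an RMS of 0; the further the two nets disagree on the same positions, the larger the value.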


Let's start at the beginning of NNUE, in the summer of 2020, the starting point of the NNUE revolution, when the Stockfish team implemented the Sergio nets. Our first goal is to measure the stability of the RMS of Stockfish NNUE nets. From the Sergio nets we calculate the RMS of the very first 3 nets (July) and the last 3 (September) and compare them with the final SF12 net, see table one. In table two the nets between SF12 and SF13 are compared, plus 5 nets after the release of SF13.

NNUE Research Project

March 10, 2021

Table one

SF12-NNUE versus       RMS      SIM
sv-20200720-1017     64.68    57.44
sv-20200721-0909     64.28    58.19
sv-20200721-1432     64.17    59.82
sv-20200906-1046     55.14    69.52
sv-20200908-1733     54.33    68.81
sv-20200914-1520     54.41    69.46

Table two

SF13-NNUE versus       RMS      SIM
SF12                 59.92    64.43
sf-0c6fc5ef48e1      54.72    65.31
sf-dd0c4c630f7e      56.85    62.92
sf-516f5b95189a      56.41    62.07
sf-6b7a4192c303      47.51    74.14
sf-94816594b327      48.33    70.43

As we can see, for the NNUE nets between the last Sergio net (September 14, 2020) and SF13 the RMS fluctuates between 54 and 60. The 5 new kennyfrc test NNUEs show a decreasing RMS, meaning a higher similarity. Next we compare nets we found on the internet for several NNUE engines (table 3) and SF13 versus Fat Fritz 2 (table 4).

Table three

SF12-NNUE versus       RMS      SIM
(name missing)      105.87    34.11
Igel-280            108.58    32.05
Igel-290             69.38    57.10
(name missing)       51.30    72.83
Minic-nascent        96.02    38.57
Minic-nexus          74.60    56.31
(name missing)       67.93    61.09
(name missing)       92.73    48.48
(name missing)       36.27    92.16
(name missing)       70.23    55.56

Table four

SF13-NNUE versus       RMS      SIM
Fat Fritz 2 100 ms   50.92    70.74
Fat Fritz 2 250 ms   46.15    74.68
Fat Fritz 2 500 ms   45.16    78.09

Click on the links in table 3 to view the policy regarding NNUE of the engine authors.


The SIM (similarity) number is explained below, but in principle the RMS value is our guide. As we can see, the lower the RMS, the higher the SIM.
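The inverse relation between RMS and SIM can be checked roughly on the published numbers. The sketch below copies the (RMS, SIM) pairs from tables one, two and four and computes a plain Pearson correlation; the helper `pearson` is our own illustration and is not part of sim-score.

```python
import math

# (RMS, SIM) pairs copied from tables one, two and four.
pairs = [
    (64.68, 57.44), (64.28, 58.19), (64.17, 59.82),   # table one, July nets
    (55.14, 69.52), (54.33, 68.81), (54.41, 69.46),   # table one, September nets
    (59.92, 64.43), (54.72, 65.31), (56.85, 62.92),   # table two
    (56.41, 62.07), (47.51, 74.14), (48.33, 70.43),   # table two
    (50.92, 70.74), (46.15, 74.68), (45.16, 78.09),   # table four, Fat Fritz 2
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rms_values = [p[0] for p in pairs]
sim_values = [p[1] for p in pairs]
r = pearson(rms_values, sim_values)
print(f"Pearson r between RMS and SIM: {r:.2f}")
```

A value close to -1 means the two measures move in opposite directions on this small sample; it says nothing yet about nets outside these tables.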

___________________________________________________________________________________________________


Observations / Remarks


Let us begin by saying that the data above is far too little to draw conclusions, let alone final conclusions. More data is needed, but the currently available data is good enough for an initial discussion.


1. Igel - The current data implies that the Igel NNUE (versions 2.70 and 2.80), with an RMS above 100 and a SIM of 30-35, is dissimilar to the Stockfish nets. The webpage states these have their origin in Dietrich Kappe's Night Nurse network. With version 2.90 the author moved to an Igel 2.60 based NNUE. RMS=69 | SIM=57.


2. Fat Fritz 2 - shows the strongest similarity to the Stockfish nets of all the engines we have tested to date.


3. Minic - we learn from its website that from Minic 2.46 to Minic 2.53 the author experimented with Stockfish NNUE but later moved to own nets such as napping nexus and nascent nutrient, which is consistent with the RMS values we measured against Stockfish.


4. Nemorino - the website states: "Nemorino uses a slightly adapted file format compared to Stockfish, but you can use the network parameter files from Stockfish (and other NNUE engines) as well. Those files will be copied and converted automatically at first usage by Nemorino." It is hard to tell what this means in practice, so we ran Nemorino with an SF13 NNUE. The result (like the evaluations) was unrealistic (RMS=156.18 and SIM=21.19), and no copy was made.


5. Rubichess - From the website we learn the author created their own NNUE. The RMS implies that this is true.


6. Orion 0.7 - An experimental NNUE version using a Stockfish net, as the RMS shows.


7. Orion 0.8 - The author states the NNUE is an original work, which our findings seem to confirm.

___________________________________________________________________________________________________


Operation


1. Testing is done with the standard simex utility, which creates EPD files in the EPD_OUT folder. The sim-score utility compares 2 EPD files and calculates the RMS and SIM. That's basically it. A debug.txt file is also created to check the calculation. All analysis is done at 100 ms, except for Fat Fritz 2, which we also tested at 250 ms and 500 ms.


2. IMPORTANT - The sim-score utility is strictly meant for NNUE network comparison. It might also work for Lc0 networks, as those nets, like NNUE, contain a set of values; however, this has not been tested. We very much doubt its use for AB engines because of their handcrafted evaluations. Although not tested, we fear a lot of false positives; it's best to keep the original simex approach of checking moves, not scores.


3. Syntax - sim-score file1.epd file2.epd. Example in the download: sim-score sf13.epd sf12.epd outputs:


dev0    dev1    dev2    dev3    dev4    dev5     RMS     SIM
2592    2061    1491     880     530     673   59.92   64.43


The RMS is calculated with the standard formula. The SIM score follows a slightly different path: it checks the square root deviation and sorts it into categories 0 to 5. dev0 scores are very similar, dev1 scores are similar, and dev2-dev5 show no correlation. The SIM is calculated as: (dev0 * 1.25 + dev1) / number of positions in the EPD, expressed as a percentage. The 2 different types of "mean" calculation seem to complement each other.


4. The source code of sim-score and the executable are public domain. Improvements and other formulas are welcome.
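The SIM formula from item 3 can be checked against the example output line. This is only a sketch: the function name `sim_from_buckets` is ours, the number of positions is assumed to be the sum of dev0..dev5, and the factor 100 is inferred from the example output rather than read from the sim-score source. The thresholds that sort a position into dev0..dev5 are not given on this page, so the sketch starts from the bucket counts.

```python
def sim_from_buckets(dev_counts):
    """SIM percentage from the six bucket counts dev0..dev5."""
    if len(dev_counts) != 6:
        raise ValueError("expected six counts: dev0..dev5")
    # Assumption: every position lands in exactly one bucket,
    # so the bucket counts add up to the number of positions in the EPD.
    positions = sum(dev_counts)
    dev0, dev1 = dev_counts[0], dev_counts[1]
    return 100.0 * (dev0 * 1.25 + dev1) / positions

# Bucket counts from the sf13.epd versus sf12.epd example output.
example = [2592, 2061, 1491, 880, 530, 673]
print(f"{sim_from_buckets(example):.2f}")  # 64.43
```

With these counts the EPD holds 8227 positions and the result matches the SIM column of the example. Because dev0 is weighted 1.25, the formula as written is not capped at 100: a pair of files with every position in dev0 would come out at 125.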
