NICE is a tool meant to significantly shorten long-running engine testing, such as engine-versus-engine matches of 10,000, 15,000 ... 30,000 bullet games with CUTE-chess-cli. However, NICE can never replace CUTE as the final arbiter.
Running NICE with the provided sfx.epd or lcx.epd (84,000+ positions) at 250ms per position on 20 cores takes about 20 minutes. Investing 20 minutes before putting an engine change into cute-chess looks attractive to me if NICE can report an Elo gain or regression with a precision of 1 Elo.
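The 20 minute figure can be sanity-checked with simple arithmetic. The sketch below is only an estimate under the assumption that every position uses its full 250ms and the work is spread evenly over the cores; the function name is ours, not part of NICE or MEA.

```python
def estimated_minutes(positions: int, movetime_ms: int, cores: int) -> float:
    """Pure search time in minutes, assuming an even split over all cores."""
    total_search_seconds = positions * movetime_ms / 1000
    return total_search_seconds / cores / 60


# 84,000 positions at 250ms on 20 cores
minutes = estimated_minutes(84_000, 250, 20)
print(minutes)
```

This gives 17.5 minutes of pure search time, which fits "about 20 minutes" once engine start-up and file handling are added.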
Elo gain example of NICE
ProDeo has a very well tuned evaluation, and it was hard to find a candidate improvement with NICE until we arrived at testing the passed pawn setting for the endgame, controlled by the parameter [Passed Pawns = 100]. Testing various settings, NICE showed the values 70 and 75 on top, with a predicted Elo gain of 3 points for the 75 setting and even 8 Elo for the 70 setting compared with the default setting of 100.
EPD : epd\sfx.epd    Time : 250ms

#  Engine          Points   Used Time   Found   Pos     Elo   Max Points  Score   Time
1  PP-70           528.677  07:28:52.0  34.607  84.617  2811  846.170     62.48%  250ms
2  PP-75           527.637  07:26:50.2  34.426  84.617  2806  846.170     62.36%  250ms
3  ProDeo-default  527.048  07:32:48.7  34.417  84.617  2803  846.170     62.29%  250ms
Time to put the 75 value into CUTE: 10,000 games resulted in a 4.5 Elo point gain with a LOS of 98%. Total run time NICE: 2 x 20 minutes = 40 minutes. Total run time Cutechess: 18 hours. At this moment we haven't yet tested the 70 setting; our interest is not in improving the old dinosaur ProDeo, but in the reliability of NICE.
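The Score column in the NICE result above appears to be Points divided by Max Points. The sketch below checks that reading against the three rows; it assumes the dots in the table are thousands separators, and it says nothing about how NICE derives the Elo column, which is not described here.

```python
# (engine, points, max_points) copied from the NICE result table above.
# Assumption: the dots in the table are thousands separators.
rows = [
    ("PP-70", 528_677, 846_170),
    ("PP-75", 527_637, 846_170),
    ("ProDeo-default", 527_048, 846_170),
]


def score_percent(points: int, max_points: int) -> float:
    """Score as a percentage of the maximum number of points, 2 decimals."""
    return round(100 * points / max_points, 2)


scores = {name: score_percent(points, max_points) for name, points, max_points in rows}
for name, score in scores.items():
    print(f"{name}: {score}%")
```

The computed values match the 62.48%, 62.36% and 62.29% reported in the table.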
What's in the download?
The main parts
MEA is the heart of the system; it analyzes EPD sets via batch (*.bat) files.
• The folder contains 84,000 positions (sfx.epd) analyzed with Stockfish 11 at a long time control with multiple threads, and about the same 84,000 positions (lcx.epd) analyzed with Lc0 at a long time control on an RTX 2080.
• Additionally, sfx.epd and lcx.epd are each split into 2 parts of 40,000 positions for faster testing. Using fewer positions is not recommended; volume (as in cute-chess) is extremely important for accuracy.
NICE tells MEA what to do: which engine to use, which EPD set to analyze, what time control, how much hash, and how many threads.
1. Split Run splits the EPD of your choice (say sfx.epd with 84,000 positions) over a user-defined number of threads (2 to 64) and creates a batch file, @start.bat. Double-click @start.bat and the analysis will start.
2. When finished, choose Combine Results and select x1.csv; the results of the threads are combined into one output.csv. You are then offered the choice to add the results to an existing *.csv database of your choice. After that, your browser will show you the (updated) result.
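The splitting idea behind Split Run can be sketched as follows. This is a minimal illustration, not NICE's actual code: the function name is ours, the chunks are contiguous by assumption, and the input lines are placeholders rather than real EPD records. Only the 2 to 64 thread range is taken from the description above.

```python
def split_positions(lines: list[str], parts: int) -> list[list[str]]:
    """Split lines into `parts` contiguous chunks whose sizes differ by at most 1."""
    if not 2 <= parts <= 64:
        raise ValueError("parts must be between 2 and 64")
    base, extra = divmod(len(lines), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional line each.
        size = base + (1 if i < extra else 0)
        chunks.append(lines[start:start + size])
        start += size
    return chunks


positions = [f"position {i}" for i in range(40_000)]  # placeholder lines, not real EPD
chunks = split_positions(positions, 8)
print([len(chunk) for chunk in chunks])
```

With 40,000 lines and 8 parts, every chunk holds 5,000 lines, and joining the chunks in order gives back the original list.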
Getting familiar with
Pre-installed are the engines Laser 1.6 (rated 3188 at CCRL 40/2) and Laser 1.7 (rated 3274 at CCRL 40/2).
1. Start NICE and choose Split Run, select laser16.bat and enter the number of threads. Type 8, for example, and NICE will split the 40,000 positions into 8 parts of 5,000 positions each and create the 8 batch files needed. Exit the program.
2. Double-click @start.bat and MEA will start the analysis process.
3. When all threads are finished, start NICE again and choose Combine Results. Select x1.csv and, when asked what to do with the results combined in output.csv, choose add and append the result to laser.csv, after which your browser will show you the (updated) result.
4. Repeat steps 1-3 but now use laser17.bat instead.
5. When finished, our PC produced this html, which showed that Laser 1.7 gained 88 Elo points, comparable with the CCRL Elo gain of 86. Note that this is a cherry-picked example; with other engines we tested, NICE reported a lower Elo gain than CCRL.
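The combine step (step 3) can be pictured as concatenating the per-thread CSV files. The sketch below is an assumption-heavy illustration: the layout of NICE's CSV files is not described here, so it assumes each part starts with the same header row, and the column names in the sample data are hypothetical.

```python
import csv
import io


def combine_csv(parts: list[str]) -> str:
    """Concatenate CSV texts: keep the first header, append all data rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for text in parts:
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue  # skip empty parts
        if header is None:
            header = rows[0]
            writer.writerow(header)
        elif rows[0] != header:
            raise ValueError("CSV parts have different headers")
        writer.writerows(rows[1:])
    return out.getvalue()


# Hypothetical per-thread results; the real columns may differ.
x1 = "engine,points\nlaser16,10\n"
x2 = "engine,points\nlaser16,7\n"
combined = combine_csv([x1, x2])
print(combined)
```

Appending to an existing database such as laser.csv would follow the same pattern, with the database as the first part.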
Remarks, hints and limitations
It's assumed that the people interested in this utility (mainly chess programmers) know how batch files work.
1. MEA produces an excellent EPD with the score and depth of each position in the epd_out folder.
2. Our experience (so far) is that we had more success with the sfx EPDs than with the lcx EPDs.
3. To make sharing results easy, a text file is produced from each created html output.
4. The disadvantage of the MEA fixed movetime [MT] is that it abruptly aborts the search when the movetime limit is reached. An alternative is including depth, for example:
Tells the engine to finish after 20 plies with a maximum time of 1 second.
5. Keep in mind about an abrupt time control with [MT] of, say, 250ms:
• It isn't comparable with a regular time control;
• 250ms per move is a factor 12 less thinking time than CCRL 40/2 (40 moves in 2 minutes, about 3 seconds per move).
6. Some engines (in fact a bit more than some) have the habit of suppressing the main lines of the early plies: some hide the first x plies, some hide the main lines for the first 100ms, and some even for 1 full second. It's probably inspired by a speed gain of 1-2 milliseconds. When there is no main line, NICE reports the number of cases found in the error column of the html, and the result will be unreliable.
7.1 When an engine uses too much time, "Used Time" in the HTML will be colored red.
7.2 When an engine plays too fast, "Used Time" in the HTML will be colored blue.
8. For more examples, technical details, the inner workings of NICE, or questions, check out this thread on Talkchess.
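Remark 4 mentions limiting a search by depth as well as by time. At the level of the UCI protocol, such a limit corresponds to a `go` command carrying both a `depth` and a `movetime` value. The sketch below only builds that standard UCI command; the helper name is ours, and how MEA itself expects the limit to be written in its batch files is not shown here. How an engine behaves when both limits are given can also differ between engines.

```python
from typing import Optional


def uci_go(depth: Optional[int] = None, movetime_ms: Optional[int] = None) -> str:
    """Build a UCI `go` command from an optional depth and an optional move time."""
    parts = ["go"]
    if depth is not None:
        parts += ["depth", str(depth)]
    if movetime_ms is not None:
        parts += ["movetime", str(movetime_ms)]  # UCI movetime is in milliseconds
    return " ".join(parts)


command = uci_go(depth=20, movetime_ms=1000)
print(command)
```

For 20 plies and a maximum of 1 second this yields `go depth 20 movetime 1000`.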
Ferdinand Mosca for MEA
Jeffrey An and Michael An, for the use of Laser.
Chris Whittington for the analysis of the 2 x 80,000 positions with Stockfish 11 and with Lc0 on an RTX 2080.
Dann Corbit for the vast majority of the 80,000