Strategic Test Suite

Download

MRT.EXE creates a rating list, in the format used here, from a MEA Excel (*.csv) file, so you can produce (and publish) your own findings. Syntax: mrt [source] [destination]

MRE.EXE creates 2 EPD files from a MEA log-file: one containing the positions on which the engine scored no points at all, and a second containing the positions on which the best (10 points) move was not found. See the examples [ one ] and [ two ] from the Stockfish 9 log-file at 60 seconds per move. Syntax: mre [source]
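The split into two EPD files can be sketched in a few lines of Python. This is only an illustration of the selection rule described above, not the code of MRE.EXE: it assumes the log has already been parsed into (EPD line, points scored) pairs, and the names split_positions and write_epd are hypothetical. The MEA log format itself is not reproduced here.

```python
from typing import Iterable, List, Tuple

BEST_MOVE_POINTS = 10  # points awarded for the best move, as stated above


def split_positions(results: Iterable[Tuple[str, int]]) -> Tuple[List[str], List[str]]:
    """Split (epd, points) records into the two groups described above.

    Assumption: the log has already been parsed into (epd, points) pairs.
    """
    zero_points: List[str] = []   # positions where the engine scored nothing
    missed_best: List[str] = []   # positions where the 10-point move was not found
    for epd, points in results:
        if points == 0:
            zero_points.append(epd)
        if points < BEST_MOVE_POINTS:
            missed_best.append(epd)
    return zero_points, missed_best


def write_epd(path: str, positions: Iterable[str]) -> None:
    """Write one EPD record per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for epd in positions:
            handle.write(epd + "\n")
```

Note that under this rule every position in the first file also appears in the second, since a position with 0 points is by definition one where the best move was missed.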

The STS test suite was developed in 2009-2010 by Dann Corbit and Swaminathan Natarajan. It contains 1500 theme-based positions intended for tuning engines, in particular the evaluation function. Ferdinand Mosca's new tool MEA inspired me to give the STS test suite more attention in the form of 6 rating lists.

Bullet 0.2 second per move
Blitz 0.5 second per move
1 second per move
5 seconds per move
10 seconds per move
About 60 seconds per move, approximated by running 10 seconds per move on 8 real cores (no hyperthreading)

An Elo rating (between 0 and a maximum of 4000) is assigned based on the score (points) an engine produces. An engine that scores the maximum of 1500*10 points gets an Elo rating of 4000; any lower score gives a lower rating.
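As a rough illustration, the conversion from points to rating could look like the sketch below. The text only fixes the two end points (0 and 4000 at 1500*10 points); the straight linear scaling in between is an assumption made for this example, and the function name sts_rating is hypothetical. The formula actually used for the lists may differ.

```python
POSITIONS = 1500                  # number of STS positions
MAX_POINTS_PER_POSITION = 10      # best move scores 10 points
MAX_SCORE = POSITIONS * MAX_POINTS_PER_POSITION
MAX_RATING = 4000                 # rating assigned to a perfect score


def sts_rating(score: int) -> float:
    """Map a total STS score to a rating between 0 and MAX_RATING.

    Assumption: linear scaling between the two end points given in the text.
    """
    if not 0 <= score <= MAX_SCORE:
        raise ValueError(f"score must be between 0 and {MAX_SCORE}")
    return MAX_RATING * score / MAX_SCORE
```

Under this assumption a perfect score maps to 4000 and half the points map to 2000.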

It's amazing to see the old (2010-2012) Rybka 3 -> Ippolit -> RobboLito based clique (Houdini 1.5, Bouquet, Critter) dominate the various lists, while programs rated 250-300 Elo higher, like Komodo and Stockfish, are unable to surpass them. The likely reason is that these were the (then top) programs used to develop and error-check the STS test suite.

In other words, it can't serve as an alternative rating list.

____________________________________________________________________________________________________

TOOLS

___________________________________________________________________________________________________

Usefulness

Experiments with ProDeo

We want to find out whether STS can be used for engine tuning. As a first step we deliberately fed ProDeo various settings that are known to be small regressions (5-15 Elo at most) to see whether these changes systematically produce a lower score. They did, which is a good sign. We might therefore expect the opposite to hold as well: settings that produce a higher score are good candidates for real improvements, to be proven by engine-vs-engine testing.

Hunting for candidate improvements

We ran ProDeo under MEA with several different parameters at 3 time controls: 1 second, 5 seconds and 10 seconds per move. Within 1½ days we found 3 candidate improvements, settings that produce a better score at all 3 time controls. See the overview below.

Change	Parameter old	Parameter new
ks=95	[King Safety = 105]	[King Safety = 95]
mob=50	[Mobility = 55]	[Mobility = 50]
ks2.eng	[King Front Square = 8] [King Front pawn Square = 8]	[King Front Square = 16] [King Front pawn Square = 0]
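The selection rule used for these candidates (a better score at every one of the 3 time controls) can be sketched as follows. The function name is_candidate, the time-control labels and the scores in the usage example are hypothetical placeholders, not measured results.

```python
from typing import Dict

# Labels for the 3 time controls used in the experiment (names are illustrative).
TIME_CONTROLS = ("1s", "5s", "10s")


def is_candidate(baseline: Dict[str, int], variant: Dict[str, int]) -> bool:
    """Return True only if the variant beats the baseline at every time control."""
    return all(variant[tc] > baseline[tc] for tc in TIME_CONTROLS)


# Hypothetical STS scores, for illustration only (not real ProDeo results).
baseline_scores = {"1s": 10000, "5s": 10500, "10s": 10800}
variant_scores = {"1s": 10020, "5s": 10530, "10s": 10810}

if is_candidate(baseline_scores, variant_scores):
    print("candidate improvement: verify with engine-vs-engine testing")
```

Requiring a gain at all 3 time controls, rather than on average, filters out settings that only help at one search depth.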