Test Suite Creator

TSC version 1.0

20 Mb

TSC is a tool that can turn any EPD file into an STS-like test suite. The STS (Strategic Test Suite) was created by Swaminathan and Dann Corbit and contains 1500 carefully chosen positions, each with multiple good moves awarded bonus points ranging from 10 down to 1. The 1500 positions were checked and double-checked with the best software available at the time. And herein lies a problem: over the years engines have gained 300-400 Elo points in strength (advantage) and the set has become a bit outdated (disadvantage).


With TSC we try to use the advantage to fix the disadvantage: no more handcrafted work, everything runs automatically. Using (freely chosen) top engines, here is how it works:


1. We ran the 1500 positions with SF11, MultiPV=4, 20 cores, 60 seconds per move, result: sts-sf11.epd

The best move gets 10 points and, based on the differences in evaluation score, the points of the next 3 moves are calculated, varying from 9 to 0.


2. We ran the 1500 positions with Lc0, MultiPV=4, 60 seconds per move, result: sts-lc0.epd

So now we already have 2 new STS sets.


3. To enrich the newly created sets (more variation) we can combine the analyses of SF11 and Lc0, result: sts-lc0-sf11.epd. Points are added when moves are equal, new moves are included.
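
The combining rule in step 3 can be pictured with a small Python sketch. It is only an illustration, not TSC's actual code: the function name combine_points, the dictionary layout and the sample moves and points are made up, and it assumes that "points are added" means the two scores for a shared move are summed.

```python
def combine_points(points_a, points_b):
    """Merge two move->points tables for one position.

    Assumption: a move present in both tables gets the sum of its points,
    a move present in only one table is carried over unchanged.
    """
    combined = dict(points_a)  # start from a copy of the first table
    for move, pts in points_b.items():
        combined[move] = combined.get(move, 0) + pts
    return combined


# Made-up sample data for a single position (not taken from a real set).
sf11 = {"f5": 10, "Be5+": 6}
lc0 = {"f5": 10, "Bf2": 7}

print(combine_points(sf11, lc0))
```

With this sample, f5 is chosen by both engines and ends up with 20 points, while Be5+ and Bf2 keep the points from the one engine that listed them.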


And so finally we have 4 STS sets: a) sts.epd, the old and standard one, b) one with SF11 results, c) one with Lc0 results and d) one with the combined results of SF11 and Lc0.


Which of the 4 is best? Hard to say. Posted forum examples [ one ] [ two ] look good but a final conclusion would be premature.


______________________________________________________________________________________________



Using TSC

step by step


As an introductory example we take the MATS.EPD positional test suite (found in the EPD folder) and turn it into an STS-like one. The set contains only 23 positions, so it takes little time.


1. Double click analyze_set_one.bat and mats.epd will be analyzed by SF11 at 5 seconds per move.


2. Double click tsc.exe and with the arrow keys go to MEA epd ONE and select

mats_multipv4_Stockfish_11_mt5000ms_epd.epd and press enter.


3. Go to Create Set and press enter.


4. In the next menu you are offered the choice to combine 2 sets (as described in point 3 of the previous chapter) but for now just press enter.


5. The STS-like set mats.epd is created. Move this file to the MEA\EPD sub-folder and testing can begin.


6. Then go to the MEA folder and double click run-mats.bat.


7. Ten engines will be tested at one second per move.


8. Besides the mats.htm output, mats.txt is created for copy and paste purposes.


That's the main story.


________________________________________________________________________________________________


The details


What's pre-installed?


1. Everything runs via batch files (*.bat), so there is no need to go to the command prompt; just double click executables or batch files in Windows Explorer.


2. Pre-installed are the 4 versions of the 1500 STS positions. From the MEA folder double click "run-sts" or "run-sts-lc0" or "run-sts-sf11" or "run-sts-lc0-sf11" to start 10 engines running.


3. The same applies to the second pre-installed test suite, rebel.epd, which contains 635 mostly positional test positions.


To change the time control, add or delete engines, or change EPD sets, modify the batch file with a text editor. To change the time control to (say) 5 seconds per move, change the set MT=1000 parameter to set MT=5000 and save the file. MT means Move Time in milliseconds.


4. The MEA\RESULTS folder - a place to keep results. It contains the various test results (*.htm and *.txt) of the STS and REBEL EPD sets.


_______________________________________________________________________________________________


The TSC parameters


1. Base Points - in STS the maximum number of points for a position is 10; with TSC we want to be more flexible, so you can increase or decrease that number.


2. Score Margin - this is an extremely important parameter; the default value is 0.05. Score Margin is used to calculate the number of points for the remaining moves when MultiPV=4 is active. For example, if the best move has a score of 1.00 and the second best move has a score of 0.80, then the points for that move will be 10-(100-80)/5=6. In other words, with the default margin each 0.05 of score difference (5 centipawns) costs one point.


3. MultiPV - the default value is 4, otherwise 1.


4. The TSC\EPD folder - contains a number of MEA compatible test suites (like the above mentioned MATS.EPD example) that you can use with analyze_set_one.bat by changing the parameter set EPD=epd\mats.epd to the desired EPD file.


5. Raw EPD files can be made MEA compatible with SOMU -> F10 before they can be used with TSC.
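
To make the interplay of Base Points and Score Margin concrete, here is a minimal Python sketch of the worked example from parameter 2. It is not TSC's source code: the function name move_points and its arguments are hypothetical, scores are given in centipawns (1.00 = 100, margin 0.05 = 5), and two details are assumptions, namely that fractional penalties are truncated and that a non-best move is capped at Base Points minus one and never drops below zero (in line with the "9 to 0" range mentioned earlier).

```python
def move_points(best_cp, move_cp, base_points=10, margin_cp=5):
    """Points for a non-best move, given scores in centipawns.

    Assumptions (not confirmed by the TSC documentation):
    - the penalty is the score difference divided by the margin, truncated;
    - the result is clamped to the range 0 .. base_points - 1.
    """
    diff = best_cp - move_cp          # how much worse than the best move
    penalty = diff // margin_cp       # one point per full margin step
    points = base_points - penalty
    return max(0, min(base_points - 1, points))


# The example from the text: best move 1.00, second best 0.80.
print(move_points(100, 80))   # 10 - (100 - 80) / 5 = 6
```

A smaller Score Margin makes the penalty grow faster, so fewer alternative moves keep any points; a larger one is more forgiving.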


_______________________________________________________________________________________________

Future work


1. Add an option that lists the EPD lines for which an engine failed to find the best move.


2. Add an option that lists the EPD lines on which an engine scored zero points.

Credits

Ferdinand Mosca

for MEA

Dann Corbit, Swaminathan Natarajan

for STS

The programmers for the fair use of their engines