Evaluation Rating List

I have a dream....

that one day many chess programmers will participate in this new type of competition.


The goal is to improve the evaluation in a new way, that is, without the obstacle of search. Imagine a reasonably strong open-source engine with a reasonably good search and readable source code, in which we replace the evaluation function with our own. What are the advantages and disadvantages?




1. It is much easier to discover the weaknesses of your evaluation, since search hardly plays its usual dominant role, if at all. You don't lose (or win) a game because you are outsearched. You lose (or win) a game because of your evaluation.


2. When playing X versus Y, the two searches are identical, so you are measuring evaluation strength.


3. If we can determine strength, we can create a competition based on fixed-depth games. This avoids the last issue that may influence the result: engines X and Y differ in speed, as engine X might have a slow evaluation while engine Y has a fast one. Playing at fixed depth removes this last obstacle to a reasonably fair estimate of who has the strongest evaluation within the scope of this project.


4. The learning effect. This will depend on the number of participants, given that the project is open source and GPL.





I can name a few disadvantages, but I leave the issue open for public discussion first on Talkchess (aka CCC).






The technical part


Step-1 - I found a good candidate in TOGA II 3.0 that will serve as a base. It is FRUIT based, uses a mailbox board, has readable source code and is probably well known to many programmers. Its CCRL rating is 2852, which ranks it 61st.


Step-2 - In a nutshell, I isolated the evaluation of my first engine for the PC (Mephisto Gideon 1993), included it in EVAL.CPP and replaced the TOGA evaluation with GIDEON. I compiled it and played 2 matches of 4000 games each, at fixed depths D=8 and D=10. See results below.


Step-3 - I made a start on updating the GIDEON (1993) evaluation to the ProDeo (2017) evaluation. I am halfway and have called it REBEL for the moment. I played the same 2 matches against TOGA II 3.0. Results so far:

  Engine    Rating  Points  Games  Score
1 Toga3  :  2879.6  4468.5  8000   55.8%
2 Rebel  :  2862.0  4169.5  8000   52.1%
3 Gideon :  2814.4  3362.0  8000   42.0%

Pairing         Games  Score (first engine)
Gideon - Toga3  4000   40.7%
Gideon - Rebel  4000   43.4%

Rebel  - Toga3  4000   47.6%
Rebel  - Gideon 4000   56.6%

  Engine    Rating  Points  Games  Score
1 Toga3  :  2882.8  4520.0  8000   56.5%
2 Rebel  :  2866.1  4239.0  8000   53.0%
3 Gideon :  2807.1  3241.0  8000   40.5%

Pairing         Games  Score (first engine)
Gideon - Toga3  4000   39.8%
Gideon - Rebel  4000   41.2%

Rebel  - Toga3  4000   47.2%
Rebel  - Gideon 4000   58.8%

As one can see, the difference between D8 and D10 is small, which is a good thing.
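
To put the match percentages on a rating scale, a score fraction can be converted into an approximate Elo difference with the standard logistic Elo formula. The sketch below is only an illustration: the function name elo_diff_from_score is made up here, and the ratings in the tables above come from a rating tool working on all games together, so a pairwise conversion like this will not reproduce them exactly.

```cpp
#include <cmath>

// Elo difference implied by a score fraction p (0 < p < 1), using the
// standard logistic Elo model: p = 1 / (1 + 10^(-D/400)).
// Hypothetical helper name, not part of the TOGA sources.
double elo_diff_from_score(double p) {
    return -400.0 * std::log10(1.0 / p - 1.0);
}
```

For example, the 47.6% that Rebel scored against Toga3 in the first set of results corresponds to roughly -17 Elo under this model, and a 50% score corresponds to 0.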



Download and compilation


Step-1 - Create a project and (only) add the TOGA *.cpp and *.h files into your project.


Step-2 - Open EVAL.CPP and you will see #define ENGINE GIDEON // 0=TOGA | 1=GIDEON | 2=REBEL


As shipped, this compiles the GIDEON engine. Change GIDEON to REBEL to compile the current REBEL engine, or to TOGA to compile the original TOGA II 3.0 engine. To compile your own engine, create a new #define and add your #include files.


Step-3 - Go to line 347, the start of the TOGA evaluation function. Here we convert the TOGA board and color to our own engine format. Then we call our evaluation and return the score in centipawns. That's all there is to it. As a bonus I added the search items ply, alpha and beta for lazy-eval lovers. They are there in case new engine authors want to use this project as a base for creating a chess engine instead of participating in the evaluation competition, a step that can always be taken later.
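
The convert-then-evaluate flow of Step-3 can be sketched as follows. Everything here is hypothetical: HostBoard is a simplified stand-in and not the real TOGA board structure, MyBoard, convert, my_eval and host_eval are invented names, the evaluation is material only, and the lazy-eval margin of 200 is an arbitrary placeholder.

```cpp
#include <array>

// Simplified stand-in for the host engine's board (NOT the real TOGA layout).
// 0 = empty, 1..6 = white P N B R Q K, -1..-6 = the same pieces for black.
struct HostBoard {
    std::array<int, 64> piece;
    bool white_to_move;
};

// Hypothetical private format of "our" engine: one character per square,
// upper case for white, lower case for black, '.' for empty.
struct MyBoard {
    std::array<char, 64> sq;
    bool white_to_move;
};

// Convert the host board and side to move into our own format.
MyBoard convert(const HostBoard& h) {
    static const char symbols[] = ".PNBRQK";
    MyBoard b{};
    for (int i = 0; i < 64; ++i) {
        int p = h.piece[i];
        char c = symbols[p < 0 ? -p : p];
        b.sq[i] = (p < 0) ? static_cast<char>(c - 'A' + 'a') : c;
    }
    b.white_to_move = h.white_to_move;
    return b;
}

// Material value in centipawns, positive for white pieces.
int piece_value(char c) {
    switch (c) {
        case 'P': return 100;  case 'p': return -100;
        case 'N': return 300;  case 'n': return -300;
        case 'B': return 300;  case 'b': return -300;
        case 'R': return 500;  case 'r': return -500;
        case 'Q': return 900;  case 'q': return -900;
        default:  return 0;    // kings and empty squares
    }
}

// Our evaluation: score in centipawns from the side to move's point of view.
// ply, alpha and beta are passed through so a lazy exit is possible.
int my_eval(const MyBoard& b, int ply, int alpha, int beta) {
    (void)ply;  // unused in this sketch
    int score = 0;
    for (char c : b.sq) score += piece_value(c);
    if (!b.white_to_move) score = -score;

    const int margin = 200;  // placeholder lazy-eval margin
    if (score + margin <= alpha || score - margin >= beta)
        return score;        // lazy exit: skip the expensive terms

    // Positional terms would be added here.
    return score;
}

// What the host's evaluation entry point would do: convert, then call ours.
int host_eval(const HostBoard& h, int ply, int alpha, int beta) {
    return my_eval(convert(h), ply, alpha, beta);
}
```

With a lone white queen on the board, host_eval returns 900 when White is to move and -900 when Black is to move, which shows the side-to-move convention assumed in this sketch; check which convention the host search actually expects before relying on it.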