DATA UPDATES

387 Mb

Updates

MillionBase 2.9 - 2.9 million quality chess games in PGN format. The two-yearly TWIC update added 350.000 new games from 2016 and 2017. The database is up to date through January 2018. Next update: January 2020.

Purpose of this page


Updates of data files without the need to release a new version.


Manual installation required.

______________________________________________________________________________________________


Data files for material odds testing

pgn_to_epd_v2 is a tool written by Chris Whittington that produces EPD positions from PGN files for engine vs engine material odds testing.


Operation - double-click make.bat and it will create 8 EPD sets:


1. nite-odds.epd
2. bishop-odds.epd
3. queen-odds.epd
4. pawn-f2.epd
5. no-castling.epd
6. rook-odds.epd
7. queen-for-rook.epd
8. queen-for-nite.epd

Parameters


--halfmoves 6

--epdpoolsize 500

--analysistime 250

pgn_to_epd_v2

45 Mb

______________________________________________________________________________________________


Data files for Texel Tuning


pgn_to_epd is a tool written by Chris Whittington that produces EPD positions from PGN files for Texel Tuning. Below are ready-to-use EPD files made from the CCRL archives.

Download PGN and EPD

PGN                   Games      Texel EPD             Positions
ccrl-40/15-elo-3400   2.587      ccrl-40/15-elo-3400   75.000
ccrl-40/15-elo-3300   15.017     ccrl-40/15-elo-3300   406.000
ccrl-40/15-elo-3200   59.125     ccrl-40/15-elo-3200   1.5 million
ccrl-40/15-elo-3100   156.212    ccrl-40/15-elo-3100   4 million
ccrl-40/15-elo-3000   284.538    ccrl-40/15-elo-3000   7.5 million

ccrl-40/2-elo-3400    32.659     ccrl-40/2-elo-3400    1 million
ccrl-40/2-elo-3300    66.243     ccrl-40/2-elo-3300    2 million
ccrl-40/2-elo-3200    146.141    ccrl-40/2-elo-3200    4.3 million
ccrl-40/2-elo-3100    267.034    ccrl-40/2-elo-3100    8 million
ccrl-40/2-elo-3000    410.069    ccrl-40/2-elo-3000    12.1 million

EPD documentation

Example:

2q5/1p2kp2/3n4/P2Pp1p1/3pP1Pp/3B1P2/2QK3P/8 b - - 30 56; c8b8 - pgn=0.5 len=160

Tag       Meaning

30        Moves since the last capture, promotion or castling.
          Can be used to filter out closed positions in which the engines make no progress.

56        Game move number.

-         Can mean several things:
            x - move is a capture.
            + - move gives check.
            e - move is an evasion, i.e. it gets out of check.
            P - move is a promotion.
          These can all be used to filter for quiet versus non-quiet positions.

pgn       Game result from the point of view of the side to move.
          pgn=0.0 means the side to move loses, pgn=0.5 is a draw.

len 160   The EPD comes from a game with a total length of 160 half moves.
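The record layout documented above can be read with a few lines of Python. This is only a minimal sketch, not part of pgn_to_epd: the names TexelEpd, parse_texel_epd and is_quiet are made up for illustration, and it assumes every line follows exactly the layout of the example (six position fields, a semicolon, the move, the flag field, then pgn= and len=).

```python
from dataclasses import dataclass


@dataclass
class TexelEpd:
    position: str        # the six position fields before the semicolon
    halfmove_clock: int  # 5th field, "30" in the example
    move_number: int     # 6th field, "56" in the example
    move: str            # move played from the position, e.g. "c8b8"
    flags: str           # "-" or any of x, +, e, P
    result: float        # pgn= value, point of view of the side to move
    game_len: int        # len= value, game length in half moves


def parse_texel_epd(line: str) -> TexelEpd:
    """Split one record into its documented parts (layout assumed from the example)."""
    position, _, rest = line.partition(";")
    fields = position.split()
    move, flags, *tags = rest.split()
    values = dict(tag.split("=", 1) for tag in tags)
    return TexelEpd(
        position=position.strip(),
        halfmove_clock=int(fields[4]),
        move_number=int(fields[5]),
        move=move,
        flags=flags,
        result=float(values["pgn"]),
        game_len=int(values["len"]),
    )


def is_quiet(record: TexelEpd) -> bool:
    """True when none of the capture, check, evasion or promotion flags is set."""
    return not any(flag in record.flags for flag in "x+eP")


example = ("2q5/1p2kp2/3n4/P2Pp1p1/3pP1Pp/3B1P2/2QK3P/8 b - - 30 56; "
           "c8b8 - pgn=0.5 len=160")
record = parse_texel_epd(example)
print(record.move, record.result, is_quiet(record))
```

A filter for quiet positions with a low clock value could then combine is_quiet(record) with a limit on record.halfmove_clock; the limit itself is a choice left to the user.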

pgn_to_epd

37 Mb

Making your own EPDs

Operation


1. Store one or more PGN files in the PGNS folder.

2. Double click make.bat

3. When finished the result is in the normalised-epds folder.

4. If there are multiple EPD files, merge them into one by typing the following on the command line: copy *.epd my-epd.epd
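For systems without the Windows copy command, the merge in step 4 can be sketched in Python. The function name merge_epds is hypothetical; the sketch assumes plain text EPD files, skips empty lines and leaves the output file itself out of the inputs.

```python
from pathlib import Path


def merge_epds(folder: str, target_name: str = "my-epd.epd") -> int:
    """Concatenate every .epd file in `folder` into `target_name`.

    Returns the number of non-empty lines written.
    """
    folder_path = Path(folder)
    # Collect the inputs first, excluding the output file from an earlier run.
    sources = sorted(p for p in folder_path.glob("*.epd") if p.name != target_name)
    written = 0
    with (folder_path / target_name).open("w", encoding="utf-8") as out:
        for source in sources:
            with source.open(encoding="utf-8") as handle:
                for line in handle:
                    if line.strip():
                        out.write(line.rstrip("\r\n") + "\n")
                        written += 1
    return written


if __name__ == "__main__":
    # "normalised-epds" is the result folder named in step 3.
    if Path("normalised-epds").is_dir():
        print(merge_epds("normalised-epds"), "positions merged")
```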


Remarks

1. Make sure your PGN contains the White and Black Elo tags (WhiteElo and BlackElo); games without them are skipped.


2. Command line parameters:


   --elo 3000             - skips games between players with an Elo below 3000.


   --samplingrate 4       - sets the number of positions sampled from each game to len / 4.

                          - for the example above that means 160 / 4 = 40 positions.


   --analysisengine sf11  - defines the engine to use; Stockfish 11 is the default.
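The two remarks above can be illustrated with a small Python sketch. It is not the code of pgn_to_epd: the helper names are invented, and it assumes that a game passes only when both players reach the threshold and that len / 4 is rounded down.

```python
import re

# Matches PGN tag pairs such as [WhiteElo "3050"].
TAG_RE = re.compile(r'\[(\w+)\s+"([^"]*)"\]')


def read_tags(pgn_game: str) -> dict:
    """Return the tag pairs of one PGN game as a dictionary."""
    return dict(TAG_RE.findall(pgn_game))


def passes_elo(tags: dict, min_elo: int = 3000) -> bool:
    """False when an Elo tag is missing or unreadable, or a player is below min_elo.

    Assumption: both players must reach the threshold.
    """
    try:
        white = int(tags["WhiteElo"])
        black = int(tags["BlackElo"])
    except (KeyError, ValueError):
        return False
    return min(white, black) >= min_elo


def positions_to_sample(game_len: int, sampling_rate: int = 4) -> int:
    """Number of positions taken from a game of game_len half moves."""
    return game_len // sampling_rate


game = '[White "Engine A"]\n[Black "Engine B"]\n[WhiteElo "3120"]\n[BlackElo "3045"]\n\n1. e4 e5 1/2-1/2\n'
print(passes_elo(read_tags(game)), positions_to_sample(160))
```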

Credits


Chris Whittington for pgn_to_epd

The Stockfish people for the use of Stockfish 11

______________________________________________________________________________________________


Data files for Texel Tuning (part II)


Chris Whittington contributed 41 million EPD positions analysed by Stockfish 11 at 25ms for Texel Tuning. In his own words:


Files contain 41 million EPD’s, sampled from LiChess PGN Database of human games on rebel13.nl

 

All with evaluation by Stockfish 11, search set at 25 milliseconds. Format is 6-part FEN plus centipawn evaluation, POV side on move. Filtered for a) legality, b) availability of more than one legal move from the position and c) not immediately game terminal. Sampling rate from PGNs was around 12%

 

rnb1k2r/ppppqppp/5n2/4N3/1bPP4/2N5/PP2PPPP/R1BQKB1R b KQkq - 0 1; sf11=-89.0

r1bqk2r/pp1pppbp/2n3p1/8/2P1N3/1P3N2/P2QPPPP/R1B1KB1R b KQkq - 0 1; sf11=170.0

1r1q2kr/p6p/1nB2p1B/3b1P2/3p2P1/5Q1P/PP6/R4RK1 w - - 0 1; sf11=381.0

8/k7/P5R1/1Pb5/2P2r2/3K4/8/8 b - - 0 1; sf11=0.0

rn3k1b/4p2p/bqp2np1/p3P3/1p6/3B1P2/PPPQNNP1/R3K2R b KQ - 0 1; sf11=-678.0

2r1n1k1/1p3p1p/3p2p1/1p1Pp3/1B2P3/1P2bP2/P3N1PP/1R5K w - - 0 1; sf11=-20.0

8/4kp2/7p/5K1P/BP4P1/b7/8/8 b - - 0 1; sf11=-13.0

 

Suitable, and designed for (Texel) tuning of chess evaluation function. These are suitable for first shot tuning, proof of concept. Chess engine programmers would probably want to later develop their own testing sets and their own evaluations.

 

Release is into Public Domain. Thanks to Ed Schroeder for hosting.

 

Cordially,

Chris Whittington

622 Mb
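As a starting point for using these files, the sketch below reads one "FEN; sf11=value" line and maps the centipawn score to an expected game score with the logistic curve commonly used in Texel tuning. The function names and the scaling constant k are assumptions for illustration only; k is normally fitted to the data by each programmer.

```python
def parse_sf11_line(line: str) -> tuple[str, float]:
    """Split 'FEN; sf11=value' into the FEN and the centipawn score (side to move)."""
    fen, _, tag = line.partition(";")
    name, _, value = tag.strip().partition("=")
    if name != "sf11":
        raise ValueError(f"unexpected tag: {tag.strip()!r}")
    return fen.strip(), float(value)


def expected_score(centipawns: float, k: float = 1.0) -> float:
    """Logistic mapping from centipawns to an expected score between 0 and 1.

    k is a hypothetical default scaling constant, to be tuned by the user.
    """
    return 1.0 / (1.0 + 10.0 ** (-k * centipawns / 400.0))


line = "8/k7/P5R1/1Pb5/2P2r2/3K4/8/8 b - - 0 1; sf11=0.0"
fen, score = parse_sf11_line(line)
print(fen, score, expected_score(score))
```

A tuner would compare expected_score of its own evaluation with expected_score of the sf11 value for each position and minimise the squared difference over the whole set.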

____________________________________________________________________________________________


Selected Lichess games

Lichess offers all games played for download, currently more than 700 million. From the 2017-12 through 2019-06 archives we created 2 databases.


1. A human vs human database of games between players rated at least 2200 Elo. Games: 3.784.887

Human

519 Mb

2. A database with only (quoting the web-site) games that include Stockfish analysis evaluations.

Month      Games        Download size
2017-12    984.113      411 Mb
2018-01    1.088.616    452 Mb
2018-02    1.048.521    435 Mb
2018-03    1.185.883    490 Mb
2018-04    1.169.227    483 Mb
2018-05    1.198.755    495 Mb
2018-06    1.090.840    450 Mb
2018-07    1.123.155    465 Mb
2018-08    1.249.779    510 Mb
2018-09    1.150.071    474 Mb
2018-10    1.252.763    517 Mb
2018-11    1.421.087    586 Mb
2018-12    1.763.654    725 Mb
2019-01    1.824.684    750 Mb
2019-02    1.595.372    656 Mb
2019-03    1.775.418    729 Mb
2019-04    1.625.806    669 Mb
2019-05    1.583.024    651 Mb
2019-06    1.496.678    616 Mb

In total 25.6 million annotated Stockfish games.


Assuming an average game length of 60 moves (120 half moves) and rounding down to 25 million games, we get:

25.000.000 x 120 = 3.000.000.000 positions

for NN training or other purposes.
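The total and the estimate can be checked with a few lines of Python. The monthly figures are copied from the table above; the 60 move average is the page's own assumption, not a measured value.

```python
# Monthly game counts from the table, written with "." as thousands separator.
MONTHLY_GAMES = {
    "2017-12": "984.113",   "2018-01": "1.088.616", "2018-02": "1.048.521",
    "2018-03": "1.185.883", "2018-04": "1.169.227", "2018-05": "1.198.755",
    "2018-06": "1.090.840", "2018-07": "1.123.155", "2018-08": "1.249.779",
    "2018-09": "1.150.071", "2018-10": "1.252.763", "2018-11": "1.421.087",
    "2018-12": "1.763.654", "2019-01": "1.824.684", "2019-02": "1.595.372",
    "2019-03": "1.775.418", "2019-04": "1.625.806", "2019-05": "1.583.024",
    "2019-06": "1.496.678",
}


def parse_dotted(number: str) -> int:
    """Convert a figure such as '1.088.616' to an integer."""
    return int(number.replace(".", ""))


total_games = sum(parse_dotted(n) for n in MONTHLY_GAMES.values())

AVERAGE_MOVES = 60                      # assumed average game length in moves
half_moves_per_game = AVERAGE_MOVES * 2
estimated_positions = 25_000_000 * half_moves_per_game

print(f"{total_games / 1e6:.1f} million games")
print(f"{estimated_positions:,} positions")
```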

How it is done

so you can do it yourself


1. Download and install SOMU 1.5


2. Extract Lichess downloads in the PGN folder.


3. Double-click SOMU.EXE and press [F8]


4. Select the PGN from the folder.


5. And make the selections of your choice.


6. The result is stored in OUTPUT.PGN



Related hint


Once you have installed the human database of 3.784.887 games, you can extract databases with a higher minimum Elo from it, such as 2300, 2400 or 2500, with SOMU 1.5 [F8].