Updates
MillionBase 3.45 - 3.45 million (human) quality chess games in PGN and SCID format. The two-yearly TWIC update added 550.000 new games from 2018, 2019 and 2020. The database is up to date until October 6, 2020.
_____________________________________________________________________________________________
Correspondence 2.0 - 2 million correspondence chess games in PGN, contributed by Dann Corbit. A Polyglot opening book was created from these 2 million games.
______________________________________________________________________________________________
Data files for material odds testing
pgn_to_epd_v2 is a tool written by Chris Whittington that produces EPD positions from PGN files for engine-vs-engine material odds testing.
Operation: double-click make.bat and it will create 8 EPD sets:
1. nite-odds.epd
2. bishop-odds.epd
3. queen-odds.epd
4. pawn-f2.epd
5. no-castling.epd
6. rook-odds.epd
7. queen-for-rook.epd
8. queen-for-nite.epd
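make.bat does all the work, but to show what a material-odds position is, here is a minimal sketch that removes one piece from a FEN, e.g. the b1 knight for knight odds or the f2 pawn for the pawn-f2 set. This is not pgn_to_epd_v2's actual code: the helper names are hypothetical, and which side gives the odds and from which square are assumptions. Removing a rook from its corner also drops the matching castling right so the resulting FEN stays consistent.

```python
# Hypothetical helpers, not part of pgn_to_epd_v2: remove one piece from a FEN.
START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

# Castling right that disappears when the rook on that corner is removed.
ROOK_CORNERS = {"a1": "Q", "h1": "K", "a8": "q", "h8": "k"}


def expand_rank(rank: str) -> list[str]:
    """Turn a FEN rank such as 'R1BQKBNR' into 8 squares ('.' = empty)."""
    squares: list[str] = []
    for ch in rank:
        if ch.isdigit():
            squares.extend(["."] * int(ch))
        else:
            squares.append(ch)
    return squares


def compress_rank(squares: list[str]) -> str:
    """Inverse of expand_rank: collapse runs of empty squares into digits."""
    out, empty = [], 0
    for sq in squares:
        if sq == ".":
            empty += 1
            continue
        if empty:
            out.append(str(empty))
            empty = 0
        out.append(sq)
    if empty:
        out.append(str(empty))
    return "".join(out)


def remove_piece(fen: str, square: str) -> str:
    """Return fen with the piece on `square` (e.g. 'b1') removed."""
    fields = fen.split()
    ranks = fields[0].split("/")           # ranks[0] is rank 8
    file_idx = ord(square[0]) - ord("a")
    rank_idx = 8 - int(square[1])
    row = expand_rank(ranks[rank_idx])
    if row[file_idx] == ".":
        raise ValueError(f"no piece on {square}")
    row[file_idx] = "."
    ranks[rank_idx] = compress_rank(row)
    fields[0] = "/".join(ranks)
    lost_right = ROOK_CORNERS.get(square)
    if lost_right:
        fields[2] = fields[2].replace(lost_right, "") or "-"
    return " ".join(fields)
```

For example, remove_piece(START_FEN, "b1") gives the knight-odds start position rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/R1BQKBNR w KQkq - 0 1.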
______________________________________________________________________________________________
Data files for Texel Tuning
pgn_to_epd is a tool written by Chris Whittington that produces EPD positions from PGN files for Texel tuning. Below are ready-to-use EPD files made from the CCRL archives.
PGN | Games | Texel EPD | Positions |
ccrl-40/15-elo-3400 | 2.587 | ccrl-40/15-elo-3400 | 75.000 |
ccrl-40/15-elo-3300 | 15.017 | ccrl-40/15-elo-3300 | 406.000 |
ccrl-40/15-elo-3200 | 59.125 | ccrl-40/15-elo-3200 | 1.5 million |
ccrl-40/15-elo-3100 | 156.212 | ccrl-40/15-elo-3100 | 4 million |
ccrl-40/15-elo-3000 | 284.538 | ccrl-40/15-elo-3000 | 7.5 million |
ccrl-40/2-elo-3400 | 32.659 | ccrl-40/2-elo-3400 | 1 million |
ccrl-40/2-elo-3300 | 66.243 | ccrl-40/2-elo-3300 | 2 million |
ccrl-40/2-elo-3200 | 146.141 | ccrl-40/2-elo-3200 | 4.3 million |
ccrl-40/2-elo-3100 | 267.034 | ccrl-40/2-elo-3100 | 8 million |
ccrl-40/2-elo-3000 | 410.069 | ccrl-40/2-elo-3000 | 12.1 million |
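Texel tuning, as it is usually described, tunes the evaluation so that a logistic function of the evaluation predicts the game result as well as possible, measured by mean squared error. A minimal sketch of that error function, assuming results in the pgn= convention (1.0 win, 0.5 draw, 0.0 loss for the side to move) and a scaling constant k that you would normally fit first; `evaluate` is a hypothetical hook into your own engine returning centipawns from the side to move.

```python
from typing import Callable, Iterable, Tuple


def win_probability(score_cp: float, k: float = 1.0) -> float:
    """Texel sigmoid: expected score (0..1) for the side to move."""
    # Clamp the exponent so huge (mate-like) scores cannot overflow a float.
    exponent = max(-300.0, min(300.0, -k * score_cp / 400.0))
    return 1.0 / (1.0 + 10.0 ** exponent)


def texel_error(samples: Iterable[Tuple[str, float]],
                evaluate: Callable[[str], float],
                k: float = 1.0) -> float:
    """Mean squared error between game results and predicted scores.

    samples:  (fen, result) pairs, result = pgn value from the side to move.
    evaluate: hypothetical engine hook, centipawns from the side to move.
    """
    total, n = 0.0, 0
    for fen, result in samples:
        diff = result - win_probability(evaluate(fen), k)
        total += diff * diff
        n += 1
    if n == 0:
        raise ValueError("no samples")
    return total / n
```

The tuner then nudges each evaluation parameter up and down and keeps any change that lowers texel_error; that search loop is left out here.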
EPD documentation
Example
2q5/1p2kp2/3n4/P2Pp1p1/3pP1Pp/3B1P2/2QK3P/8 b - - 30 56; c8b8 - pgn=0.5 len=160
Tag | Meaning |
30 | Moves since the last capture, promotion or castling. Can be used to filter out closed positions in which the engines make no progress. |
56 | Game move number. |
- | Move-type flag, which can take several values, e.g. x means the move is a capture. The flags can be used to filter for quiet versus non-quiet positions. |
pgn | Game result from the point of view of the side to move: pgn=0.0 means the side to move lost, pgn=0.5 is a draw and pgn=1.0 a win. |
len | len=160 means the position comes from a game that was 160 half-moves long. |
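To filter on these tags you first need to split a line into its parts. The sketch below assumes the layout of the single example above: a 6-field FEN, a semicolon, the move (c8b8, not documented in the table), the move-type flag, then key=value pairs. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TexelEpd:
    fen: str             # full 6-field FEN
    move: str            # e.g. "c8b8" (meaning not documented above)
    flag: str            # "-" or e.g. "x" for a capture
    result: float        # pgn= value, POV side to move
    length: int          # len= value, game length in half-moves
    halfmove_clock: int  # FEN field 5 (the "30" in the example)
    fullmove: int        # FEN field 6 (the "56" in the example)


def parse_texel_epd(line: str) -> TexelEpd:
    position, sep, annotations = line.partition(";")
    fields = position.split()
    if not sep or len(fields) != 6:
        raise ValueError(f"not a 6-field FEN followed by ';': {line!r}")
    tokens = annotations.split()
    move, flag = tokens[0], tokens[1]
    # Remaining tokens are assumed to be key=value pairs such as pgn=0.5.
    pairs = dict(token.split("=", 1) for token in tokens[2:])
    return TexelEpd(
        fen=" ".join(fields),
        move=move,
        flag=flag,
        result=float(pairs["pgn"]),
        length=int(pairs["len"]),
        halfmove_clock=int(fields[4]),
        fullmove=int(fields[5]),
    )
```

With a parsed record, keeping only quiet positions is a check such as rec.flag != "x", and stalled closed positions can be dropped by putting a limit on rec.halfmove_clock.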
Making your own EPDs
Operation
1. Put one or more PGN files in the PGNS folder.
2. Double-click make.bat.
3. When it has finished, the results are in the normalised-epds folder.
4. If there are multiple EPD files, merge them into one from the command line with: copy *.epd my-epd.epd
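copy *.epd my-epd.epd is a Windows cmd command. On other systems, a small Python sketch does the same job; the function names are hypothetical and the output name simply mirrors the command. Unlike copy, it also normalises line endings and drops blank lines.

```python
from pathlib import Path


def merge_texts(texts: list[str]) -> str:
    """Join EPD file contents; drop blank lines, end every line with '\n'."""
    lines: list[str] = []
    for text in texts:
        lines.extend(line for line in text.splitlines() if line.strip())
    return "".join(line + "\n" for line in lines)


def merge_epds(folder: str, output_name: str = "my-epd.epd") -> int:
    """Merge all *.epd files in folder into output_name; return line count."""
    root = Path(folder)
    output = root / output_name
    # Exclude the output file itself in case it already exists.
    sources = sorted(p for p in root.glob("*.epd") if p != output)
    merged = merge_texts([p.read_text(encoding="utf-8") for p in sources])
    output.write_text(merged, encoding="utf-8")
    return merged.count("\n")
```

Run it as merge_epds("normalised-epds") after make.bat has finished.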
Remarks
1. Make sure your PGN contains the WhiteElo and BlackElo tags; games without them are skipped.
2. Command-line parameters:
--elo 3000 - skips games between players with an Elo below 3000.
--samplingrate 4 - sets the number of positions sampled from each game to len / 4.
- For the example above, that gives 160 / 4 = 40 positions.
--analysisengine sf11 - sets the analysis engine; Stockfish 11 is the default.
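The two filters above can be pictured as follows. This is a hedged sketch, not pgn_to_epd's implementation: it assumes --elo requires both players to reach the limit, and that the len / samplingrate positions are picked at random from the game's half-moves. Both function names are hypothetical.

```python
import random


def passes_elo_filter(headers: dict, min_elo: int) -> bool:
    """Assumed reading of --elo: both WhiteElo and BlackElo must be >= min_elo.

    Games with a missing or non-numeric Elo tag are skipped, matching the
    remark that games without Elo tags are ignored.
    """
    try:
        white = int(headers["WhiteElo"])
        black = int(headers["BlackElo"])
    except (KeyError, ValueError):
        return False
    return min(white, black) >= min_elo


def sample_plies(game_length: int, sampling_rate: int,
                 rng: random.Random) -> list[int]:
    """Pick game_length // sampling_rate distinct half-move indices.

    Random selection is an assumption; the tool may sample differently.
    """
    if sampling_rate < 1:
        raise ValueError("sampling_rate must be >= 1")
    count = game_length // sampling_rate
    return sorted(rng.sample(range(game_length), count))
```

For the len=160 example with --samplingrate 4, sample_plies(160, 4, random.Random()) returns 40 half-move indices.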
Credits
Chris Whittington for pgn_to_epd
The Stockfish people for the use of Stockfish 11
______________________________________________________________________________________________
Data files for Texel Tuning (part II)
Chris Whittington contributed 41 million EPD positions analysed by Stockfish 11 at 25ms for Texel Tuning. In his own words:
Files contain 41 million EPD’s, sampled from LiChess PGN Database of human games on rebel13.nl
All with evaluation by Stockfish 11, search set at 25 milliseconds. Format is 6-part FEN plus centipawn evaluation, POV side on move. Filtered for a) legality, b) availability of more than one legal move from the position and c) not immediately game terminal. Sampling rate from PGNs was around 12%
rnb1k2r/ppppqppp/5n2/4N3/1bPP4/2N5/PP2PPPP/R1BQKB1R b KQkq - 0 1; sf11=-89.0
r1bqk2r/pp1pppbp/2n3p1/8/2P1N3/1P3N2/P2QPPPP/R1B1KB1R b KQkq - 0 1; sf11=170.0
1r1q2kr/p6p/1nB2p1B/3b1P2/3p2P1/5Q1P/PP6/R4RK1 w - - 0 1; sf11=381.0
8/k7/P5R1/1Pb5/2P2r2/3K4/8/8 b - - 0 1; sf11=0.0
rn3k1b/4p2p/bqp2np1/p3P3/1p6/3B1P2/PPPQNNP1/R3K2R b KQ - 0 1; sf11=-678.0
2r1n1k1/1p3p1p/3p2p1/1p1Pp3/1B2P3/1P2bP2/P3N1PP/1R5K w - - 0 1; sf11=-20.0
8/4kp2/7p/5K1P/BP4P1/b7/8/8 b - - 0 1; sf11=-13.0
Suitable, and designed for (Texel) tuning of chess evaluation function. These are suitable for first shot tuning, proof of concept. Chess engine programmers would probably want to later develop their own testing sets and their own evaluations.
Release is into Public Domain. Thanks to Ed Schroeder for hosting.
Cordially,
Chris Whittington
622 MB
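These files carry a Stockfish score instead of a game result. One option is to squash that score through the same logistic curve used in Texel tuning and fit against it. A sketch, assuming the one-line FEN; sf11=<centipawns> layout shown in the samples and a scaling constant k = 1 that you would normally fit to your own data; the function names are hypothetical.

```python
from typing import Iterator, Tuple


def parse_sf11_line(line: str) -> Tuple[str, float]:
    """Split 'FEN; sf11=<centipawns>' into (fen, centipawns)."""
    position, _, annotation = line.partition(";")
    key, _, value = annotation.strip().partition("=")
    if key != "sf11":
        raise ValueError(f"expected an sf11= annotation: {line!r}")
    return position.strip(), float(value)


def eval_to_target(score_cp: float, k: float = 1.0) -> float:
    """Map a side-to-move centipawn score to an expected score in 0..1."""
    # Clamp the exponent so extreme scores cannot overflow a float.
    exponent = max(-300.0, min(300.0, -k * score_cp / 400.0))
    return 1.0 / (1.0 + 10.0 ** exponent)


def load_targets(path: str) -> Iterator[Tuple[str, float]]:
    """Yield (fen, target) pairs from one of the downloaded files."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                fen, cp = parse_sf11_line(line)
                yield fen, eval_to_target(cp)
```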
____________________________________________________________________________________________
Selected Lichess games
Lichess offers all games played on the site for download, currently more than 700 million. From the 2017-12 to 2019-06 archives we made 2 selections:
1. A human vs human database of games between players rated at least 2200 Elo: 3.784.887 games.
2. A database containing only games that (quoting the website) "include Stockfish analysis evaluations".
Month | Games | Size |
2017-12 | 984.113 | 411 MB |
2018-01 | 1.088.616 | 452 MB |
2018-02 | 1.048.521 | 435 MB |
2018-03 | 1.185.883 | 490 MB |
2018-04 | 1.169.227 | 483 MB |
2018-05 | 1.198.755 | 495 MB |
2018-06 | 1.090.840 | 450 MB |
2018-07 | 1.123.155 | 465 MB |
2018-08 | 1.249.779 | 510 MB |
2018-09 | 1.150.071 | 474 MB |
2018-10 | 1.252.763 | 517 MB |
2018-11 | 1.421.087 | 586 MB |
2018-12 | 1.763.654 | 725 MB |
2019-01 | 1.824.684 | 750 MB |
2019-02 | 1.595.372 | 656 MB |
2019-03 | 1.775.418 | 729 MB |
2019-04 | 1.625.806 | 669 MB |
2019-05 | 1.583.024 | 651 MB |
2019-06 | 1.496.678 | 616 MB |
In total, 25.6 million games with Stockfish evaluations.
Assuming an average game length of 60 moves (120 half-moves, so 120 positions per game), that gives roughly:
25.000.000 x 120 = 3.000.000.000 positions
for NN training or other purposes.
How it is done, so you can do it yourself
1. Download and install SOMU 1.5
2. Extract the Lichess downloads into the PGN folder.
3. Double-click SOMU.EXE and press [F8].
4. Select the PGN from the folder.
5. Make the selections of your choice.
6. The result is stored in OUTPUT.PGN
Related hint
Once the human database of 3.784.887 games is installed, you can use SOMU 1.5 [F8] to extract higher-rated databases from it, such as 2300, 2400, 2500 Elo and up.
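SOMU 1.5 is a Windows program. On other platforms the same selections can be approximated in Python. This is a rough sketch, not what SOMU does internally, and it rests on assumptions: each game in a Lichess PGN starts with an [Event tag, analysed games contain [%eval ...] comments in the movetext, and bot accounts carry a BOT title in WhiteTitle or BlackTitle. All function names are hypothetical.

```python
import re
from typing import Iterator

HEADER_RE = re.compile(r'^\[(\w+) "(.*)"\]\s*$')


def split_games(pgn_text: str) -> Iterator[str]:
    """Yield one game (headers + movetext) at a time."""
    game: list[str] = []
    for line in pgn_text.splitlines():
        # A new [Event tag marks the start of the next game.
        if line.startswith("[Event ") and game:
            yield "\n".join(game)
            game = []
        game.append(line)
    if any(line.strip() for line in game):
        yield "\n".join(game)


def headers_of(game: str) -> dict[str, str]:
    headers = {}
    for line in game.splitlines():
        match = HEADER_RE.match(line)
        if match:
            headers[match.group(1)] = match.group(2)
    return headers


def keep_game(game: str, min_elo: int = 2200, require_eval: bool = False,
              humans_only: bool = True) -> bool:
    """Both players >= min_elo, optionally human-only and engine-annotated."""
    headers = headers_of(game)
    try:
        if min(int(headers["WhiteElo"]), int(headers["BlackElo"])) < min_elo:
            return False
    except (KeyError, ValueError):
        return False  # missing or "?" ratings
    if humans_only and "BOT" in (headers.get("WhiteTitle"), headers.get("BlackTitle")):
        return False
    return not require_eval or "[%eval" in game
```

Database 1 roughly corresponds to keep_game(g), database 2 to keep_game(g, min_elo=0, require_eval=True, humans_only=False), and the higher-rated subsets to keep_game(g, min_elo=2300) and so on.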