Time-odds matches

As chess engines grow stronger, the draw rate of their matches rises too. Between top engines it often reaches 80-90%; see for instance the TCEC super final of 2021.


Does this mean progress becomes harder and harder? Certainly. Does this mean that progress will sooner or later become hard to measure, because of the law of diminishing returns, with the draw rate between engines reaching 95% or worse? In the 1990s some said that all games would be drawn once top engines reached a search depth of 14 plies. We now know that is not true; in fact the end is nowhere in sight, as we will demonstrate by playing time-odds matches with the current two top engines, Stockfish 14 and Komodo Dragon 2.5.


About 2 years ago we already demonstrated with time-odds matches how strong Stockfish 11 is and how large a time factor its nearest competitors needed before they could beat it. This time we do it the other way around: each engine plays itself (SF14 vs SF14, and likewise for Dragon) at time odds of factor 2, 4, 8 and 16, and we measure the elo progress as an indication of how much room there is for improvement and how far the draw rate will fall.

Balanced openings

Time odds     Stockfish 14   Dragon 2.5
Equal time    50.0%          50.0%
Factor 2      55.4%          57.4%
Factor 4      59.9%          64.1%
Factor 8      62.5%          70.7%
Factor 16     66.8%          74.4%

PGN

ORDO calculation

Time odds     Stockfish 14   Dragon 2.5
Equal time    3565           3552
Factor 2      3602           3637
Factor 4      3635           3714
Factor 8      3654           3789
Factor 16     3687           3833

____________________________________________________________________________________


Technical


1. The four matches of each engine (Komodo and Stockfish) use 40/40 as the base time control: 40 moves in 40 seconds, an average of one second per move.


Match-1 : SF14 (40/40) vs SF14 (40/80) - one-vs-two-seconds
Match-2 : SF14 (40/40) vs SF14 (40/160) - one-vs-four-seconds
Match-3 : SF14 (40/40) vs SF14 (40/320) - one-vs-eight-seconds
Match-4 : SF14 (40/40) vs SF14 (40/640) - one-vs-sixteen-seconds


Likewise for Komodo.
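
The time controls in this list follow directly from the base control and the time-odds factor. A minimal sketch of that arithmetic, assuming the notation 40/N means 40 moves in N seconds; the function names are illustrative and do not belong to any tournament tool:

```python
# Base time control from the text: 40 moves in 40 seconds
# (one second per move on average).
BASE_MOVES = 40
BASE_SECONDS = 40
FACTORS = [2, 4, 8, 16]


def time_control(factor: int) -> str:
    """Return the 'moves/seconds' string for a given time-odds factor."""
    return f"{BASE_MOVES}/{BASE_SECONDS * factor}"


def match_schedule(engine: str) -> list[str]:
    """List the four self-play matches: base time vs. base time * factor."""
    lines = []
    for number, factor in enumerate(FACTORS, start=1):
        lines.append(
            f"Match-{number} : {engine} ({time_control(1)}) "
            f"vs {engine} ({time_control(factor)})"
        )
    return lines


for line in match_schedule("SF14"):
    print(line)
```

Calling match_schedule with another engine name gives the same four pairings for that engine.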


2. To know how strong Stockfish 14 and Komodo Dragon 2.5 are at 40/40 we play 2 time-odds matches at the GRL time control (40/120) against 3 of their nearest competitors.

. Stockfish 14 at 40/40 rates : 3565

. Komodo Dragon 2.5 at 40/40 rates : 3552


So we now roughly know at which elo Komodo and Stockfish play at an average of one second per move, and with ORDO we can calculate an imaginary elo (imaginary because the results are based on self-play) for both engines, as listed above.


____________________________________________________________________________________


Draw rate overview

Draw rate balanced openings

Time odds     Stockfish 14   Dragon 2.5
Equal time
Factor 2      84.1%          75.8%
Factor 4      77.4%          67.4%
Factor 8      72.2%          55.8%
Factor 16     65.6%          48.0%

Conclusions


1. Despite the high draw rates between the two strongest current chess engines there is still plenty of room for improvement, as engines playing at ~3550 elo can still lose significantly.


2. Komodo Dragon scales a lot better than Stockfish.
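
To see how decisive the games behind these conclusions are, a score and a draw rate can be split into win and loss shares, since score = wins + draws/2 and wins + draws + losses = 1. The sketch below assumes that the percentages in the score table are the scores of the side with more time; the helper name is hypothetical.

```python
def win_loss_shares(score: float, draw_rate: float) -> tuple[float, float]:
    """Split a match result into win and loss shares.

    score     = wins + draws / 2   (fraction of all games)
    draw_rate = draws              (fraction of all games)
    """
    wins = score - draw_rate / 2
    losses = 1.0 - draw_rate - wins
    return wins, losses


# (score, draw rate) per engine and time-odds factor, copied from the tables.
# Assumption: the score belongs to the side with more time.
results = {
    ("Stockfish 14", 2): (0.554, 0.841),
    ("Stockfish 14", 16): (0.668, 0.656),
    ("Dragon 2.5", 2): (0.574, 0.758),
    ("Dragon 2.5", 16): (0.744, 0.480),
}

for (engine, factor), (score, draws) in results.items():
    wins, losses = win_loss_shares(score, draws)
    print(f"{engine}, factor {factor}: wins {wins:.1%}, losses {losses:.1%}")
```

Under that assumption, Dragon 2.5 at factor 16 wins roughly half of its games against its one-second self and loses under 2%.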

___________________________________________________________________________________________________

The issue of diminishing returns


Ideally we should play more matches to measure the effect of diminishing returns, so we also play:


. Factor 2 vs Factor 4, 8 and 16

. Factor 4 vs Factor 8 and 16

. Factor 8 vs Factor 16


And calculate the diminishing returns.

Balanced openings

Time odds            Stockfish 14   Dragon 2.5
2 secs vs 4 secs     55.1%          56.2%
2 secs vs 8 secs     56.6%          63.3%
2 secs vs 16 secs    61.4%          66.5%
4 secs vs 8 secs     52.2%          55.1%
4 secs vs 16 secs    54.6%          61.8%
8 secs vs 16 secs    53.4%          53.4%

Draw rate

Time odds            Stockfish 14   Dragon 2.5
2 secs vs 4 secs     83.4%          80%
2 secs vs 8 secs     83.2%          70.6%
2 secs vs 16 secs    74.8%          62%
4 secs vs 8 secs     88.4%          81%
4 secs vs 16 secs    86.8%          73.2%
8 secs vs 16 secs    89.2%          91.9%

Now that both engines have played 10,000 games each, we can take stock, calculate an imaginary elo (imaginary because the results are based on self-play) and view the diminishing returns for each engine.

Diminishing Returns for Stockfish 14

ORDO calculation

Engine            Rating   Gain   Games
Sixteen seconds   3679     +27    1000
Eight seconds     3652     +15    1750
Four seconds      3637     +35    2250
Two seconds       3602     +37    2250
One second        3565            2750

Diminishing Returns for Komodo Dragon 2.5

ORDO calculation

Engine            Rating   Gain   Games
Sixteen seconds   3732     +34    1000
Eight seconds     3698     +44    1750
Four seconds      3654     +47    2250
Two seconds       3607     +55    2250
One second        3552            2750

As we can see (most clearly from the Komodo results), the elo gain shrinks with each doubling of time.
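
The Gain column is simply the difference between neighbouring ratings. A small sketch that recomputes it from the ORDO ratings listed above and checks whether the gains fall with every doubling; the function names are illustrative:

```python
# ORDO ratings from the two tables, ordered from one second to sixteen seconds.
RATINGS = {
    "Stockfish 14": [3565, 3602, 3637, 3652, 3679],
    "Komodo Dragon 2.5": [3552, 3607, 3654, 3698, 3732],
}


def gains_per_doubling(ratings: list[int]) -> list[int]:
    """Rating gained at each doubling of the time control."""
    return [later - earlier for earlier, later in zip(ratings, ratings[1:])]


def strictly_falling(values: list[int]) -> bool:
    """True if every value is lower than the one before it."""
    return all(b < a for a, b in zip(values, values[1:]))


for engine, ratings in RATINGS.items():
    gains = gains_per_doubling(ratings)
    print(f"{engine}: gains {gains}, strictly falling: {strictly_falling(gains)}")
```

For Komodo Dragon 2.5 the gains are 55, 47, 44 and 34, falling at every step. For Stockfish 14 they are 37, 35, 15 and 27, so the trend is downward but not strictly so.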


These 20,000 games were played single core (which took more than a week); the elo gains are expected to fall further still when multiple threads are used.

PGN

20,000 games

____________________________________________________________________________________________________


What about lower rated engines?


We test 9 other engines in the range of 2600 - 3600 elo (time-odds factor 2 only) and compare the elo gain and draw rate.

Time control : 1 second vs 2 seconds

Engine         GRL elo   Result   Elo Gain   Draw rate
Stockfish 14   ~3700     55.4%    38         84.1%
Dragon 2.5     ~3650     57.4%    52         75.8%
Dragon 2.0     ~3600     59.8%    68         73.4%
Stockfish 11   ~3500     62.6%    88         65.0%
Koivisto 6.0   ~3400     61.4%    80         68.1%
Clover 2.4     ~3200     66.0%    112        58.1%
Counter 3.8    ~3000     62.8%    89         63.0%
Wasp 1.02      ~2900     67.1%    120        45.8%
ProDeo 3.1     ~2800     68.1%    127        45.0%
Fruit 2.1      ~2700     67.3%    121        39.2%
Zevra 2.4      ~2600     64.5%    101        47.5%
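
A match score can be translated into an elo difference with the standard logistic Elo formula, diff = 400 * log10(score / (1 - score)). The sketch below uses that formula as an assumption: the article does not state how its Elo Gain column was derived, so small differences from the table are to be expected, and they grow for more lopsided scores.

```python
import math


def elo_gain(score: float) -> float:
    """Elo difference implied by a match score (0 < score < 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


def expected_score(elo_diff: float) -> float:
    """Inverse of elo_gain: expected score for a given elo difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Scores of the side with double time, taken from the table.
for engine, score in [("Stockfish 14", 0.554), ("Dragon 2.5", 0.574)]:
    print(f"{engine}: {score:.1%} -> {elo_gain(score):+.0f} elo")
```

With this formula, 55.4% corresponds to about +38 elo and 57.4% to about +52 elo, in line with the first two rows of the table.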

It is surprising to see that even a 3500-elo engine like Stockfish 11 produces such a high elo gain at this time-odds level. Time to double the time control; see the next table.

Time control : 2 seconds vs 4 seconds

Engine         GRL elo   Result   Elo Gain   Draw rate
Stockfish 14   ~3700     55.1%    35         83.4%
Dragon 2.5     ~3650     56.2%    43         80.0%
Dragon 2.0     ~3600     57.4%    52         74.8%
Stockfish 11   ~3500     61.6%    81         69.6%
Koivisto 6.0   ~3400     58.2%    57         75.6%
Clover 2.4     ~3200     64.0%    98         56.8%
Counter 3.8    ~3000     64.8%    103        56.8%
Wasp 1.02      ~2900     65.2%    106        46.4%
ProDeo 3.1     ~2800     63.2%    92         52.8%
Fruit 2.1      ~2700     66.4%    114        38.4%
Zevra 2.4      ~2600     63.6%    95         44.0%

The elo gains are still high. We do the same for:

. 4 secs vs 8 secs (faster than CCRL 40/2 and CEGT 40/4)

. 8 secs vs 16 secs (close to CCRL 40/15 and CEGT 40/20)

And present the results in a different (final) format.

_____________________________________________________________________________________________________


Presenting the final results

Diminishing elo returns, time-odds factor 2

Engine         GRL elo   1 vs 2   2 vs 4   4 vs 8   8 vs 16
Stockfish 14   ~3700     +38      +35      +15      +24
Dragon 2.5     ~3650     +52      +43      +35      +24
Dragon 2.0     ~3600     +68      +52      +45      +29
Stockfish 11   ~3500     +88      +81      +74      +54
Koivisto 6.0   ~3400     +80      +57      +53      +47
Clover 2.4     ~3200     +112     +98      +87      +64
Counter 3.8    ~3000     +89      +103     +66      +82
Wasp 1.02      ~2900     +120     +106     +89      +71
ProDeo 3.1     ~2800     +127     +92      +112     +88
Fruit 2.1      ~2700     +121     +114     +124     +112
Zevra 2.4      ~2600     +101     +95      +61      +85

Draw rates, time-odds factor 2

Engine         GRL elo   1 vs 2   2 vs 4   4 vs 8   8 vs 16
Stockfish 14   ~3700     84.1%    83.4%    88.4%    89.2%
Dragon 2.5     ~3650     75.8%    80.0%    81.0%    91.9%
Dragon 2.0     ~3600     73.4%    74.8%    77.6%    82.0%
Stockfish 11   ~3500     65.0%    69.6%    68.4%    75.6%
Koivisto 6.0   ~3400     68.1%    75.6%    75.2%    78.4%
Clover 2.4     ~3200     58.1%    56.8%    64.0%    68.6%
Counter 3.8    ~3000     63.0%    56.8%    64.4%    66.0%
Wasp 1.02      ~2900     45.8%    46.4%    52.0%    53.2%
ProDeo 3.1     ~2800     45.0%    52.8%    47.2%    58.0%
Fruit 2.1      ~2700     39.2%    38.4%    36.4%    44%
Zevra 2.4      ~2600     47.5%    44.0%    51.2%    53.2%

Observations - which are not the same as conclusions :-)


1. The sharp fall in elo gain (green vs red) at 8 vs 16 seconds seems to indicate that, on the road to further progress for top engines, NNUE evaluation becomes more and more important, perhaps even more important than search improvements, although of course the two always go hand in hand.


2. There is a clear pattern (with a few exceptions): after each doubling of the time control in these factor-2 time-odds matches, the elo gain falls while the draw rate rises.


3. Stockfish 11 is interesting: it is an HCE (hand-crafted evaluation) engine, while the engines marked in orange are NNUE, and it seems to profit more from the doubling of the time control.


4. The lower-rated engines profit the most; for them search seems to be the dominant factor.
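
The pattern in observation 2 and its exceptions can be counted directly from the two final tables. A minimal sketch; the data is copied from those tables and the function name is illustrative:

```python
# Values per engine for 1 vs 2, 2 vs 4, 4 vs 8 and 8 vs 16 seconds.
ELO_GAINS = {
    "Stockfish 14": [38, 35, 15, 24],
    "Dragon 2.5": [52, 43, 35, 24],
    "Dragon 2.0": [68, 52, 45, 29],
    "Stockfish 11": [88, 81, 74, 54],
    "Koivisto 6.0": [80, 57, 53, 47],
    "Clover 2.4": [112, 98, 87, 64],
    "Counter 3.8": [89, 103, 66, 82],
    "Wasp 1.02": [120, 106, 89, 71],
    "ProDeo 3.1": [127, 92, 112, 88],
    "Fruit 2.1": [121, 114, 124, 112],
    "Zevra 2.4": [101, 95, 61, 85],
}
DRAW_RATES = {
    "Stockfish 14": [84.1, 83.4, 88.4, 89.2],
    "Dragon 2.5": [75.8, 80.0, 81.0, 91.9],
    "Dragon 2.0": [73.4, 74.8, 77.6, 82.0],
    "Stockfish 11": [65.0, 69.6, 68.4, 75.6],
    "Koivisto 6.0": [68.1, 75.6, 75.2, 78.4],
    "Clover 2.4": [58.1, 56.8, 64.0, 68.6],
    "Counter 3.8": [63.0, 56.8, 64.4, 66.0],
    "Wasp 1.02": [45.8, 46.4, 52.0, 53.2],
    "ProDeo 3.1": [45.0, 52.8, 47.2, 58.0],
    "Fruit 2.1": [39.2, 38.4, 36.4, 44.0],
    "Zevra 2.4": [47.5, 44.0, 51.2, 53.2],
}


def count_steps(table: dict, falling: bool) -> tuple[int, int]:
    """Count doublings that move in the expected direction.

    Returns (matching_steps, total_steps) over all engines.
    """
    matching = total = 0
    for series in table.values():
        for before, after in zip(series, series[1:]):
            total += 1
            moved_as_expected = after < before if falling else after > before
            if moved_as_expected:
                matching += 1
    return matching, total


gain_steps = count_steps(ELO_GAINS, falling=True)
draw_steps = count_steps(DRAW_RATES, falling=False)
print(f"elo gain falls in {gain_steps[0]} of {gain_steps[1]} doublings")
print(f"draw rate rises in {draw_steps[0]} of {draw_steps[1]} doublings")
```

With the table values, the elo gain falls in 27 of 33 doublings and the draw rate rises in 24 of 33.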

________________________________________________________________________________________


One step further

A comparison of the GRL (single core) with the GRL (20 cores) and their draw rates

Draw Rate Comparison

Engine              one core   20 cores
Stockfish 14        39%        61%
Komodo-Dragon 2.5   48%        58%
Komodo-Dragon 2.0   43%        65%
Ethereal 13.25      47%        63%
Koivisto 6.16       49%        58%
SlowChess 2.7       51%        61%
RubiChess 2.2       47%        61%

Average Search Depth Comparison

Engine              one core   20 cores
Stockfish 14        28.59      37.12
Komodo-Dragon 2.5   27.72      35.70
Komodo-Dragon 2.0   25.86      32.08
Ethereal 13.25      25.56      32.10
Koivisto 6.16       27.12      30.53
SlowChess 2.7       21.37      24.94
RubiChess 2.2       29.50      35.03

Still low draw rates with 20 cores.


Maybe unbalanced but playable positions (like the gambit positions) are the future, at least for the entertainment side.

This study is the work of playing 35,750 games that took about 12 days in total using 20 cores.


Last update - October 22, 2021