Open Chess Game Database Standard - Page 3 (2024)

Skip to content

  • Quick links
    • Unanswered topics
    • Active topics
    • Search
  • FAQ
  • Login
  • Register
  • Computer Chess ClubBoard indexComputer Chess Club ForumsComputer Chess Club: Programming and Technical Discussions
  • Search

Moderators: hgm, Rebel, chrisw

Post Reply

  • Print view

284 posts

  • Page 3 of 29
    • Jump to page:
  • Previous
  • 1
  • 2
  • 3
  • 4
  • 5
  • 29
  • Next

phhnguyen
Posts: 1434
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham
Contact:

Contact phhnguyen

Re: Open Chess Game Database Standard

  • Quote

Postby phhnguyen »

Fulvio wrote: Sat Oct 23, 2021 11:09 amGlad to finally see a serious approach with some real benchmarks.
However, to give a brief summary of how slow that is::
converting that PGN to SCID4: 54 seconds (247x faster)
size of uncompressed db: 509 MB (2.7 times smaller)
size of compressed db: 292 MB (1.7 times smaller)
searching the games with result == '1-0' and length >= 100: 0.132 seconds (20x faster)

Very impressive speeds! The gaps between SCID and our experiments are so huge!

I guess that is why many users prefer, mention to SCID databases! Perhaps, it is also one of the main reasons some people become disappointed and gave up their SQL tryings!

BTW, may I know what is your computer? Did you use multi-threads (if yes, how many threads did you use for converting, querying)? I need to understand how real large the gaps are. Narrowing those gaps is one of the important tasks we will do.

https://banksiagui.com
The most features chess GUI, based on opensource Banksia - the chess tournament manager

Top

mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor
Contact:

Contact mvanthoor

Re: Open Chess Game Database Standard

  • Quote

Postby mvanthoor »

phhnguyen wrote: Sat Oct 23, 2021 1:12 pmYou are right! But… not totally Open Chess Game Database Standard - Page 3 (3)
...
At the moment for SQL we store a game as two strings, one for the starting FEN, another for the move list (SAN moves).

Don't get me wrong, I fully intend to follow this thread (and possibly provide some input or try some things in Rust myself), because I'd rather have a somewhat larger SQL database than a smaller binary database. I'm not trying to store the 100 million games Lichess puts out every day; if I can store my old MegaBase 2017 with TWIC updates (roughly 8 million games), that'd be fine; even if it's bigger than the Chessbase database.

The database could be compressed with zip or 7-zip; if the uncompressed DB is under 24 GB in size, I'd probably be able to load it right into the computer's main memory if need be. But we'd get to that later.

One way to quickly find every position in each was described by Daniel: you could store a Zobrist hash and a GUID for each position you encounter during the PGN -> DB conversion. The DB could then have a table in between the game and position table (don't know the english name: Dutch is "koppel-tabel", or "connection table"), where each record stores a reference to the game, to a position, and an order. So you could query a game from that table, order it on the "order" field, and you have all the positions occurring in that game. You could also query the table on a certain position and (distinct) game Id's, and you instantly got any game in which this position occurs.

I'm not too concerned with regard to performance (is SQLite multi-threaded? Probably depends on the programming library you use), but it would probably take significantly more storage than a binary format. Even if it takes 1.5 or 2x the storage, I'd probably prefer this over yet another binary format.

Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL

Top

mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor
Contact:

Contact mvanthoor

Re: Open Chess Game Database Standard

  • Quote

Postby mvanthoor »

phhnguyen wrote: Sat Oct 23, 2021 1:20 pmI guess that is why many users prefer, mention to SCID databases!

SCID is an impressive program, with the clunkiest user-interface I've ever seen. Even on Linux, it doesn't integrate into any desktop. The best thing that could be done for SCID (after documenting the database format) is to create a programming library to use SCID databases and then to write a new front-end GUI for it. (I know; easier said than done and lots of work. I'm doing something similar at work at the moment; write a completely new backend, keeping compatibility with the existing system, and then write a new GUI front-end and/or apps for that new backend.)

phhnguyen wrote: Sat Oct 23, 2021 1:20 pmPerhaps, it is also one of the main reasons some people become disappointed and gave up their SQL tryings!

PS: don't forget to index the columns on which searches will be run often. (You might end up indexing half the database though... Open Chess Game Database Standard - Page 3 (5) )

Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL

Top

dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

  • Quote

Postby dangi12012 »

Fulvio wrote: Sat Oct 23, 2021 11:09 am

phhnguyen wrote: Sat Oct 23, 2021 2:34 amSQLite database sizes:
mb-3.45.ocgdb.db3: 1.38 GB
mb-3.45.ocgdb.db3.zip (compressed): 509.7 MB

Building (convert from PGN games into the SQL database):
#games: 3456762
#errors: 5985 (caused by ignoring chess960 games)
elapsed: 3:42:22
speed: 259 games/s

Near 4 hours for converting! Terrible slow!

Code: Select all

SELECT g.id, w.name white, white_elo, b.name black, black_elo, timer, date, result, eco, length, fen, moves FROM game g INNER JOIN player w ON white_id = w.id INNER JOIN player b ON black_id = b.idWHERE g.id = ?

#queries: 1000000
elapsed: 00:33
speed: 30185 queries/s

Which that information we can be sure the query speed is not an issue to display/extract data from that database.

Glad to finally see a serious approach with some real benchmarks.
However, to give a brief summary of how slow that is::
converting that PGN to SCID4: 54 seconds (247x faster)
size of uncompressed db: 509 MB (2.7 times smaller)
size of compressed db: 292 MB (1.7 times smaller)
searching the games with result == '1-0' and length >= 100: 0.132 seconds (20x faster)

Fulvio hang in there I will release my tool soon!
It does literally 100x your speed with 200-300k games / s!

Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer

Top

dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

  • Quote

Postby dangi12012 »

phhnguyen wrote: Sat Oct 23, 2021 1:12 pm

mvanthoor wrote: Sat Oct 23, 2021 8:06 amThe problem with chess databases is not finding players, games tournaments, or results in the database. The problem is finding position X across all games. That is one of the points where most people who design a database format get stuck on SQL.

You are right! But… not totally Open Chess Game Database Standard - Page 3 (6)

Position searching is always a huge challenge for any kind of database, from traditional binary to SQL ones.

At the moment for SQL we store a game as two strings, one for the starting FEN, another for the move list (SAN moves). That’s so simple, straightforward, making it looks like a simple copy of a PGN game. How can a binary database store a game? Perhaps, by using a binary array for the starting position and another binary array for the move list. That should be a simple binary version of the PGN game! The cores of binary and SQL ones are almost the same. Can one use those arrays for position searching? Probably not or with terrible performances since the search function has to create all extra information on the fly. I don’t think there is any magic with those arrays only!

To implement the position search ability, we typically have to add a lot of extra information, index a lot of things. However, what we can add to a binary database, we can add to a SQL database too. There is almost no limitation: SQL can store binary data, from bytes, big numbers to huge arrays. There may be some difficulties and differences between those databases such as speeds, data sizes but I believe they can work similarly.

However, with SQL, we can use existing-almost-perfect-query/search engines in the background, work on easy, clear, clean environments to debug, alternate data structures, thus developers can save much more time and energy to focus on the main work.

In the contrast, the traditional database developers must build almost everything from zero, using heavy buggy development environments, data itself is always the magic, puzzle, changing data structures may cause massive code changes and bugs. That’s why position searching became a very hard and nearly impossible task. That’s why almost all database apps don’t have full features (for position searching).

Position search is enabled by my tool too. The trick is to store a 128bit hash for all positions. A game becomes a list of hashIDs (IDs not the hashes themselves - so around 48 bits per position.

But its not too big. A few GB for 100s of thousands of games.

Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer

Top

Fulvio
Posts: 395
Joined: Fri Aug 12, 2016 8:43 pm

Re: Open Chess Game Database Standard

  • Quote

Postby Fulvio »

phhnguyen wrote: Sat Oct 23, 2021 1:20 pm

Fulvio wrote: Sat Oct 23, 2021 11:09 amGlad to finally see a serious approach with some real benchmarks.
However, to give a brief summary of how slow that is::
converting that PGN to SCID4: 54 seconds (247x faster)
size of uncompressed db: 509 MB (2.7 times smaller)
size of compressed db: 292 MB (1.7 times smaller)
searching the games with result == '1-0' and length >= 100: 0.132 seconds (20x faster)

Very impressive speeds! The gaps between SCID and our experiments are so huge!

I guess that is why many users prefer, mention to SCID databases! Perhaps, it is also one of the main reasons some people become disappointed and gave up their SQL tryings!

BTW, may I know what is your computer? Did you use multi-threads (if yes, how many threads did you use for converting, querying)? I need to understand how real large the gaps are. Narrowing those gaps is one of the important tasks we will do.

Ryzen 3600, 16GB ram, SSD 970 EVO Plus.
Querying uses only 1 thread.
Conversion uses 2 threads: one to read the PGN (it also verify that all the moves are valid, it is the part that takes the most time) and the other inserts the games into the SCID4 database. It is possible to disable multithreading, if I remember correctly it's about 30% slower.
If you want to quickly try it on your pc you can download one of the "portable" versions that do not need installation:
https://sourceforge.net/projects/scid/f ... %20Latest/

Top

mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor
Contact:

Contact mvanthoor

Re: Open Chess Game Database Standard

  • Quote

Postby mvanthoor »

dangi12012 wrote: Sat Oct 23, 2021 7:27 pmBut its not too big. A few GB for 100s of thousands of games.

But that could be a problem; the Chessbase format is a few GB for 10 million games. It would not be economical to need 50 GB to store 10 million games, if software such as Chessbase can store it in 4-5 GB. MegaBase2017+TWIC updates (close to 8 million games) takes 4.3 GB.

Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL

Top

dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

  • Quote

Postby dangi12012 »

mvanthoor wrote: Sat Oct 23, 2021 7:46 pm

dangi12012 wrote: Sat Oct 23, 2021 7:27 pmBut its not too big. A few GB for 100s of thousands of games.

But that could be a problem; the Chessbase format is a few GB for 10 million games. It would not be economical to need 50 GB to store 10 million games, if software such as Chessbase can store it in 4-5 GB. MegaBase2017+TWIC updates (close to 8 million games) takes 4.3 GB.

You cannot have all 3- Insane query speed with the flexibility to query against everything and a small db.

This SQL standard would have very high flexibility with a moderate size and a high query speed.
If you would like to know the top 3 opening traps that backfire against 2000 Elo rated players but work below that - a sql query can look that up for you Open Chess Game Database Standard - Page 3 (8) - and you can literally write that query in under 5 minutes. - and 1 minute if you are fluent with sql.

Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer

Top

dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Open Chess Game Database Standard

  • Quote

Postby dangi12012 »

Another idea would be for tournament preparation:
Load all games of a player into an "opening book for player <xxx>" and find his common responses to an opening where he tends to play into an inaccuracy.

This can be done with the pgns from there https://database.lichess.org/ - because they also contain the player names and Stockfish evaluations. Pgn contains all the data but a nice little query can be written in sql just like that.

I think an SQL standard really is a great idea and I thank OP for opening this thread.

Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer

Top

mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor
Contact:

Contact mvanthoor

Re: Open Chess Game Database Standard

  • Quote

Postby mvanthoor »

dangi12012 wrote: Sat Oct 23, 2021 10:53 pmI think an SQL standard really is a great idea and I thank OP for opening this thread.

The one reason I think SQL would be a great idea (assuming it is workable with regard to speed and size) is that I don't have to write my own binary format when I start on my own user interface some day. I'm fully capable of designing such a format (did it before, for embedded projects where SQL was totally not an option), but I'd rather not.

Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL

Top

Post Reply

  • Print view

284 posts

  • Page 3 of 29
    • Jump to page:
  • Previous
  • 1
  • 2
  • 3
  • 4
  • 5
  • 29
  • Next

Return to “Computer Chess Club: Programming and Technical Discussions”

Jump to

  • Computer Chess Club Forums
  • ↳ Computer Chess Club: General Topics
  • ↳ Computer Chess Club: Tournaments and Matches
  • ↳ Computer Chess Club: Programming and Technical Discussions
  • ↳ Computer Chess Club: Kindergarten
  • ↳ Chess Players Forum
  • Computer Chess ClubBoard index
  • All times are UTC+02:00
  • Delete cookies
  • Contact us
Open Chess Game Database Standard - Page 3 (2024)
Top Articles
Latest Posts
Article information

Author: Twana Towne Ret

Last Updated:

Views: 6098

Rating: 4.3 / 5 (44 voted)

Reviews: 91% of readers found this page helpful

Author information

Name: Twana Towne Ret

Birthday: 1994-03-19

Address: Apt. 990 97439 Corwin Motorway, Port Eliseoburgh, NM 99144-2618

Phone: +5958753152963

Job: National Specialist

Hobby: Kayaking, Photography, Skydiving, Embroidery, Leather crafting, Orienteering, Cooking

Introduction: My name is Twana Towne Ret, I am a famous, talented, joyous, perfect, powerful, inquisitive, lovely person who loves writing and wants to share my knowledge and understanding with you.