PROs: - its easier/faster to check&skip already existing results w.r.t. walking the filesystem - no additional deps (built-in in python) - still a multi-language self-contained file-based storage - we can implement several logic on SQL (object filtering and counting, conditional indexing, etc.) CONs: - SQL-like management, we'll have to cope with migrations - probably will use more disk space - embeddings as BLOBs. Save and load are pretty fast though, on SSD: > Bulk insertion of 100k 1024-d vectors took 5.6198 seconds. > Reading out 100k 1024-d vectors took 2.3810 seconds.
PROs:
CONs: