[collection] consider SQLite for keeping analysis results #20

@fabiocarrara

PROs:

  • it's easier and faster to check for and skip already existing results than to walk the filesystem
  • no additional deps (built-in in python)
  • still a multi-language self-contained file-based storage
  • we can implement various kinds of logic in SQL (object filtering and counting, conditional indexing, etc.)
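
A minimal sketch of the check-and-skip and SQL-side logic mentioned above, using only the built-in `sqlite3` module. The `results` table, its columns, the `idx_failed` index, and the helpers `already_done` and `pending` are hypothetical names chosen for illustration, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per (item, analysis) result.
conn = sqlite3.connect(":memory:")  # a file path would be used in practice
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS results (
        item_id  TEXT NOT NULL,
        analysis TEXT NOT NULL,
        status   TEXT NOT NULL DEFAULT 'done',
        PRIMARY KEY (item_id, analysis)
    )
    """
)
# Conditional (partial) index: only rows with status = 'failed' are indexed.
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_failed "
    "ON results (analysis) WHERE status = 'failed'"
)


def already_done(conn, item_id, analysis):
    """Return True if a result row exists (a primary-key lookup)."""
    row = conn.execute(
        "SELECT 1 FROM results WHERE item_id = ? AND analysis = ?",
        (item_id, analysis),
    ).fetchone()
    return row is not None


def pending(conn, item_ids, analysis):
    """Return the items that still have no stored result for this analysis."""
    return [i for i in item_ids if not already_done(conn, i, analysis)]


# Insert two example rows inside a single transaction.
with conn:
    conn.executemany(
        "INSERT INTO results (item_id, analysis, status) VALUES (?, ?, ?)",
        [("a", "objects", "done"), ("b", "objects", "failed")],
    )

todo = pending(conn, ["a", "b", "c"], "objects")

# Counting done in SQL rather than in Python.
counts = dict(
    conn.execute("SELECT status, COUNT(*) FROM results GROUP BY status")
)
```

Here the skip check is a single indexed lookup per item instead of a directory walk. Whether failed rows should count as "existing" is a design choice this sketch leaves open.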

CONs:

  • SQL-style schema management: we'll have to cope with migrations
  • probably will use more disk space
  • embeddings have to be stored as BLOBs. Saving and loading are pretty fast on an SSD, though:

    Bulk insertion of 100k 1024-d vectors took 5.6198 seconds.
    Reading out 100k 1024-d vectors took 2.3810 seconds.
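
To illustrate the BLOB point, here is a rough sketch of storing and reading back float32 vectors with the standard library only. It is not the code behind the timings above. The `embeddings` table, the helpers `to_blob` and `from_blob`, and the tiny dimensionality `DIM` are assumptions made for the example.

```python
import sqlite3
from array import array

DIM = 4  # tiny dimensionality for the example; the issue mentions 1024-d

conn = sqlite3.connect(":memory:")  # a file path would be used in practice
conn.execute(
    "CREATE TABLE embeddings (item_id TEXT PRIMARY KEY, vec BLOB NOT NULL)"
)


def to_blob(values):
    """Pack a list of floats into raw float32 bytes (native byte order)."""
    return array("f", values).tobytes()


def from_blob(blob):
    """Unpack raw float32 bytes back into a list of Python floats."""
    vec = array("f")
    vec.frombytes(blob)
    return vec.tolist()


# Example data: values chosen to be exactly representable as float32.
vectors = {f"item{i}": [float(i)] * DIM for i in range(3)}

# Bulk insertion inside a single transaction.
with conn:
    conn.executemany(
        "INSERT INTO embeddings (item_id, vec) VALUES (?, ?)",
        ((key, to_blob(vec)) for key, vec in vectors.items()),
    )

# Read everything back and decode the BLOBs.
loaded = {
    key: from_blob(blob)
    for key, blob in conn.execute("SELECT item_id, vec FROM embeddings")
}
```

The raw bytes carry no dtype, shape, or byte-order information, so those would have to be fixed by convention or stored in extra columns if the file is meant to be read from other languages or machines.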

Labels: enhancement
