
Recommendation Systems #70

@psilabs-dev


Research Issue

No plans to turn this into a solid feature request yet, since there are a couple of moving parts involved.

References:

Tag Similarity Search

This is suitable for LRR instances whose archives carry an abundance of tags (e.g. Pixiv/NH/EH). It would be essentially useless for sources such as Twitter, where archives do not have that abundance of tags.
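As a rough illustration of what tag similarity could look like, here is a minimal sketch that ranks archives by Jaccard overlap of their tag sets. Jaccard is an assumption, not something decided in this issue, and the archive IDs, tags, and function names are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two tag sets: |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0  # two untagged archives share no evidence of similarity
    return len(a & b) / len(union)


def similar_by_tags(query_id, archives, top_k=5):
    """Rank all other archives by tag overlap with the query archive."""
    query_tags = archives[query_id]
    scored = [
        (other_id, jaccard(query_tags, tags))
        for other_id, tags in archives.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical data: archive ID -> tag set.
archives = {
    "a1": {"artist:foo", "landscape", "color"},
    "a2": {"artist:foo", "landscape", "monochrome"},
    "a3": {"artist:bar", "portrait"},
    "a4": set(),  # an untagged archive
}

top = similar_by_tags("a1", archives, top_k=2)
```

An untagged archive like `a4` scores 0 against everything, which is why this approach only pays off on tag-rich instances.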

Image similarity search

This can be done by vectorizing all images (or better: all archives). Initially, perhaps only PNG/JPG images would be supported, i.e. no GIFs, PDFs, or other unusual file types. Once that is in place, we can start testing.

Initial indexing will take a lot of time, but once it is done, the ongoing load should be light.
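One way to go from per-image vectors to a per-archive vector is sketched below. It assumes page embeddings already exist (the embedding model is out of scope here) and uses mean pooling followed by L2 normalization, which is an assumption rather than a chosen design. The names `is_supported` and `archive_vector` are hypothetical.

```python
import math
from pathlib import PurePosixPath

# Only PNG/JPG pages are considered; ".jpeg" is treated as JPG.
SUPPORTED = {".png", ".jpg", ".jpeg"}


def is_supported(filename: str) -> bool:
    """Return True if the page looks like a PNG/JPG image by extension."""
    return PurePosixPath(filename).suffix.lower() in SUPPORTED


def archive_vector(page_vectors):
    """Mean-pool page embeddings into one L2-normalized archive vector."""
    if not page_vectors:
        raise ValueError("archive has no supported pages")
    dim = len(page_vectors[0])
    count = len(page_vectors)
    mean = [sum(vec[i] for vec in page_vectors) / count for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm else mean


pages = ["001.PNG", "002.jpg", "extra.gif", "notes.pdf"]
usable = [name for name in pages if is_supported(name)]
vec = archive_vector([[1.0, 0.0], [0.0, 1.0]])
```

Mean pooling keeps one vector per archive, which shrinks the index considerably compared with storing every page, at the cost of blurring archives with very mixed content.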

Pixiv produces useful metadata for training. Twitter is good for testing. NH also provides metadata.

Once the index is built, there is a reasonable expectation that search itself will be fast.

Recommendation System

However, what if this only surfaces duplicates instead of "recommendations"? Once an instance reaches 100k+ archives, a similarity search that returns nothing but duplicate images loses much of its value.

A better approach would be to find images which are similar, but not too similar, to the image being viewed. This may only need a simple tweak to the image similarity search logic.
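The "similar, but not too similar" idea could be sketched as a similarity band: keep candidates whose cosine similarity falls between a lower and an upper threshold. The thresholds `low` and `high` are hypothetical values that would need tuning, and the brute-force scan stands in for whatever vector index ends up being used.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_id, vectors, low=0.5, high=0.95, top_k=5):
    """Return items inside the similarity band [low, high), best first."""
    query = vectors[query_id]
    candidates = []
    for other_id, vec in vectors.items():
        if other_id == query_id:
            continue
        score = cosine(query, vec)
        if low <= score < high:  # drop near-duplicates and unrelated items
            candidates.append((other_id, score))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]


# Hypothetical 2-D embeddings, purely for illustration.
vectors = {
    "query": [1.0, 0.0],
    "dup": [1.0, 0.01],       # near-duplicate of the query
    "related": [1.0, 0.5],    # similar but distinct
    "unrelated": [0.0, 1.0],  # orthogonal to the query
}

picks = recommend("query", vectors)
```

Here the near-duplicate is excluded by the upper bound and the unrelated item by the lower bound, so only `related` is returned.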

Text-to-Image similarity search

This is harder. We need to take a text query, then find the images most closely related to it. Theoretically this is possible with CLIP, but will it be performant at scale?


Then, we might do things like:

  • search for all images of hands
  • search for all images similar to this image

The main questions are: 1) is it accurate and useful, and 2) is it fast enough?

Database Options

  • redis: already part of the stack (and allegedly supports vector search well), so it would be nice if we could just use it.
  • postgres: also reliable and SQL-based, but limited.
  • qdrant: an open-source vector database.

Labels: downstream (a downstream issue, not to be brought upstream), feature (new feature or request)
