
@ashkrisk ashkrisk commented Jan 21, 2026

JVector provides methods for reading fvec and ivec files in jvector-examples/.../SiftLoader. These methods load every vector into memory and return them as a List. That fails when the total size of the vectors exceeds the available memory, so consuming applications such as BenchYAML cannot work with larger-than-memory datasets.

This PR adds FvecSegmentReader to jvector-native as an experimental API. In the future, it could be integrated with MultiFileDataSource to allow benchmarking of larger-than-memory datasets.
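
For readers unfamiliar with the approach, here is a minimal sketch of reading an fvec file through a memory-mapped MemorySegment. It is not the FvecSegmentReader from this PR: the class name `MappedFvecs` and its methods are hypothetical, and it assumes JDK 22 or later (where `java.lang.foreign` is final), the usual fvecs layout (each record is a little-endian int32 dimension followed by that many float32 values), and that every record has the same dimension.

```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Objects;

// Hypothetical illustration only; not the FvecSegmentReader added by this PR.
final class MappedFvecs implements AutoCloseable {
    // fvecs data is little-endian and records are not guaranteed to be aligned.
    private static final ValueLayout.OfInt LE_INT =
            ValueLayout.JAVA_INT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
    private static final ValueLayout.OfFloat LE_FLOAT =
            ValueLayout.JAVA_FLOAT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);

    private final Arena arena = Arena.ofShared(); // shared: readable from any thread
    private final MemorySegment segment;
    private final int dimension;
    private final long recordBytes;
    private final int size;

    MappedFvecs(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel closes, until the arena closes.
            segment = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
        } catch (IOException | RuntimeException e) {
            arena.close();
            throw e;
        }
        if (segment.byteSize() < Integer.BYTES) {
            arena.close();
            throw new IOException("file too small to contain an fvec record: " + path);
        }
        // Assumption: all records share the dimension stored in the first header.
        dimension = segment.get(LE_INT, 0);
        recordBytes = Integer.BYTES + (long) dimension * Float.BYTES;
        size = Math.toIntExact(segment.byteSize() / recordBytes);
    }

    int size() { return size; }

    int dimension() { return dimension; }

    // Copies vector i out of the mapping; only touched pages need to be resident.
    float[] getVector(int i) {
        Objects.checkIndex(i, size);
        float[] out = new float[dimension];
        long offset = (long) i * recordBytes + Integer.BYTES; // skip the dimension header
        MemorySegment.copy(segment, LE_FLOAT, offset, out, 0, dimension);
        return out;
    }

    @Override
    public void close() { arena.close(); }
}
```

A caller would open the file in a try-with-resources block and fetch vectors by ordinal with `getVector(i)`. Because the operating system pages the mapped data in on demand, the file can be larger than the JVM heap, which is the property that loading everything into a List lacks.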

Alternate approaches

  • Keep using SiftLoader.readFvecs. This is fundamentally limited because it cannot process larger-than-memory datasets.
  • Use the existing MemorySegmentReader for IO, through ReaderSupplierFactory.open(). This reuses existing code and automatically falls back to MMapReader on lower JDK versions. However, the RandomAccessReader interface is not thread-safe, so the implementation would have to treat an otherwise thread-safe MemorySegment defensively, which makes it somewhat clunky. Using the MemorySegment API directly makes the code cleaner and more self-contained.
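
The thread-safety point in the second alternative can be illustrated with a small sketch. It assumes JDK 22 or later; the class `SharedSegmentReads` is hypothetical and does not use any JVector types. A segment allocated from `Arena.ofShared()` can be read from any thread, and reads take an absolute index, so there is no shared cursor for threads to contend over.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.stream.IntStream;

// Hypothetical illustration only; no JVector classes are involved.
final class SharedSegmentReads {
    // Reads every float by absolute index from the parallel stream's worker threads.
    static double parallelSum(MemorySegment segment, int count) {
        return IntStream.range(0, count)
                .parallel()
                .mapToDouble(i -> segment.getAtIndex(ValueLayout.JAVA_FLOAT, i))
                .sum();
    }

    static double demo() {
        // A confined arena would reject access from other threads; a shared arena allows it.
        try (Arena arena = Arena.ofShared()) {
            int count = 1_000;
            MemorySegment segment = arena.allocate((long) count * Float.BYTES, Float.BYTES);
            for (int i = 0; i < count; i++) {
                segment.setAtIndex(ValueLayout.JAVA_FLOAT, i, (float) i);
            }
            return parallelSum(segment, count);
        }
    }
}
```

A reader that keeps a current position has to be synchronized or duplicated per thread, whereas the absolute-index reads above need neither.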

Possible next steps

  • Use this in BenchYAML to add support for larger-than-memory datasets.
  • Add a fallback implementation that works on lower JDK versions.

@ashkrisk ashkrisk marked this pull request as ready for review January 22, 2026 12:04
@ashkrisk ashkrisk changed the title Add MemorySegment-based reader for Fvec files Add MemorySegment-based RAVV for Fvec files Jan 22, 2026