Add MemorySegment-based RAVV for Fvec files #601

ashkrisk · 2026-01-21T14:22:25Z

JVector provides methods for reading fvec and ivec files in jvector-examples/.../SiftLoader. These methods load every vector into memory and return them as a List. This fails when the total size of the vectors exceeds the available memory, so consuming applications such as BenchYAML cannot work with larger-than-memory datasets.

This PR adds FvecSegmentReader to jvector-native as an experimental API. In the future, it could be integrated with MultiFileDataSource to allow benchmarking of larger-than-memory datasets.

Alternate approaches

Keep using SiftLoader.readFvecs. This cannot process larger-than-memory datasets, which is a fundamental limitation.
Use the existing MemorySegmentReader for IO, through ReaderSupplierFactory.open(). This reuses existing code and automatically falls back to MMapReader on lower JDK versions. However, the RandomAccessReader interface is not thread-safe, so the implementation would have to handle the reader defensively even though the underlying MemorySegment is itself thread-safe, which makes it somewhat clunky. Using the MemorySegment API directly keeps the code cleaner and more self-contained.
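
To illustrate the direct MemorySegment approach, here is a minimal sketch of random access over a memory-mapped fvec file. It is not the code in this PR: the class name FvecSegmentSketch and its methods are hypothetical, it assumes JDK 22+ (where java.lang.foreign is final), and it assumes the usual fvec layout of a little-endian int32 dimension followed by that many float32 values, with every vector sharing the same dimension.

```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.Objects;

// Hypothetical sketch, not the class added in this PR. Requires JDK 22+.
final class FvecSegmentSketch implements AutoCloseable {
    // fvec data is little-endian regardless of the host platform.
    private static final ValueLayout.OfInt LE_INT =
            ValueLayout.JAVA_INT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);
    private static final ValueLayout.OfFloat LE_FLOAT =
            ValueLayout.JAVA_FLOAT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN);

    private final Arena arena = Arena.ofShared(); // shared arena: readable from any thread
    private final MemorySegment segment;
    private final int dimension;
    private final long recordBytes;
    private final int size;

    FvecSegmentSketch(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // The mapping outlives the channel; the OS pages data in on demand.
            segment = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
        }
        if (segment.byteSize() < Integer.BYTES) {
            arena.close();
            throw new IOException("File too small to hold a vector: " + path);
        }
        // Assumption: all vectors share the dimension stored in the first record.
        dimension = segment.get(LE_INT, 0);
        if (dimension <= 0) {
            arena.close();
            throw new IOException("Invalid dimension " + dimension + " in " + path);
        }
        recordBytes = Integer.BYTES + (long) dimension * Float.BYTES;
        size = Math.toIntExact(segment.byteSize() / recordBytes);
    }

    int size() { return size; }

    int dimension() { return dimension; }

    // Copies vector i out of the mapping; only this one record is touched.
    float[] getVector(int i) {
        Objects.checkIndex(i, size);
        float[] out = new float[dimension];
        long offset = i * recordBytes + Integer.BYTES; // skip the per-record dimension header
        MemorySegment.copy(segment, LE_FLOAT, offset, out, 0, dimension);
        return out;
    }

    @Override
    public void close() { arena.close(); } // unmaps the file

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sketch", ".fvecs");
        ByteBuffer bb = ByteBuffer.allocate(2 * 12).order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt(2).putFloat(1f).putFloat(2f);
        bb.putInt(2).putFloat(3f).putFloat(4f);
        Files.write(tmp, bb.array());
        try (FvecSegmentSketch vectors = new FvecSegmentSketch(tmp)) {
            System.out.println(vectors.size() + " vectors, second = "
                    + Arrays.toString(vectors.getVector(1)));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because each call reads at an explicit offset and the sketch keeps no cursor state, concurrent callers do not need per-thread readers, which is the property the RandomAccessReader route would have to work around.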

Possible next steps

Use this in BenchYAML to add support for larger-than-memory datasets.
Add a fallback implementation that works on lower JDK versions.
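
One possible shape for such a fallback, sketched here only as an assumption and not as the planned design, is to use positional FileChannel reads, which are available on older JDKs, do not move the channel's position, and are not subject to the 2 GB limit of a single MappedByteBuffer. The class name FvecChannelSketch is hypothetical, and the same fvec layout assumptions apply (little-endian int32 dimension followed by float32 values, uniform dimension).

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical fallback sketch using only pre-MemorySegment APIs.
final class FvecChannelSketch implements AutoCloseable {
    private final FileChannel channel;
    private final int dimension;
    private final long recordBytes;
    private final int size;

    FvecChannelSketch(Path path) throws IOException {
        channel = FileChannel.open(path, StandardOpenOption.READ);
        try {
            // Assumption: all vectors share the dimension stored in the first record.
            dimension = readFully(Integer.BYTES, 0).getInt();
            if (dimension <= 0) {
                throw new IOException("Invalid dimension " + dimension + " in " + path);
            }
            recordBytes = Integer.BYTES + (long) dimension * Float.BYTES;
            size = Math.toIntExact(channel.size() / recordBytes);
        } catch (IOException | RuntimeException e) {
            channel.close();
            throw e;
        }
    }

    // Positional read: does not change the channel's own position.
    private ByteBuffer readFully(int length, long position) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, position + buf.position());
            if (n < 0) {
                throw new EOFException("Truncated fvec file");
            }
        }
        buf.flip();
        return buf;
    }

    int size() { return size; }

    int dimension() { return dimension; }

    // Reads exactly one record's floats from disk on each call.
    float[] getVector(int i) throws IOException {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("Index " + i + " out of range for size " + size);
        }
        long offset = i * recordBytes + Integer.BYTES; // skip the per-record dimension header
        float[] out = new float[dimension];
        readFully(dimension * Float.BYTES, offset).asFloatBuffer().get(out);
        return out;
    }

    @Override
    public void close() throws IOException { channel.close(); }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sketch", ".fvecs");
        ByteBuffer bb = ByteBuffer.allocate(2 * 12).order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt(2).putFloat(1f).putFloat(2f);
        bb.putInt(2).putFloat(3f).putFloat(4f);
        Files.write(tmp, bb.array());
        try (FvecChannelSketch vectors = new FvecChannelSketch(tmp)) {
            System.out.println(vectors.size() + " vectors, first = "
                    + Arrays.toString(vectors.getVector(0)));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

This trades one system call and one small allocation per vector for portability, so it would likely be slower than a mapped segment; whether that matters depends on the benchmark being run.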

ashkrisk added 3 commits January 21, 2026 19:25

Add MemorySegment-based reader for Fvec files

672a2e3

Add unit tests for FvecSegmentRavv

04868e0

Add license header

ed4fe7e

ashkrisk marked this pull request as ready for review January 22, 2026 12:04

ashkrisk requested review from MarkWolters, jshook, marianotepper and tlwillke as code owners January 22, 2026 12:04

ashkrisk changed the title ~~Add MemorySegment-based reader for Fvec files~~ Add MemorySegment-based RAVV for Fvec files Jan 22, 2026
