This pipeline uses s3vectors (Amazon S3 Vectors) to store vector representations of data in S3, index them, and run similarity queries against them. It also creates the S3 vector bucket and the other resources the process needs.
If you use your own dataset, set all the required parameters in the "Initializations" section.
Make sure your files (dataset, queries, and true_neighbors) follow the correct format (as indicated in the notebook).
Upload the CSV files from files.zip to any S3 bucket.
Execute the "Initializations" section to:
- Import the necessary packages
- Create the required S3 vector resources
Note:
If the resources already exist, creation fails with an error.
In that case, run "Clean environment" first to delete the existing resources.
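As a rough illustration of what "Create the required S3 vector resources" involves, here is a minimal sketch. It assumes `client` is a boto3 `s3vectors` client (for example `boto3.client("s3vectors")`); the function name, the `float32` data type and the default `cosine` metric are assumptions for illustration, not necessarily what the notebook uses.

```python
def create_vector_resources(client, bucket_name, index_name, dimension,
                            distance_metric="cosine"):
    """Create a vector bucket and a single index inside it.

    `client` is assumed to be boto3.client("s3vectors").
    All names and defaults here are hypothetical.
    """
    # The vector bucket is the container for one or more vector indexes.
    client.create_vector_bucket(vectorBucketName=bucket_name)
    # The index fixes the vector dimension and distance metric up front.
    client.create_index(
        vectorBucketName=bucket_name,
        indexName=index_name,
        dataType="float32",
        dimension=dimension,
        distanceMetric=distance_metric,
    )
```

Calling this a second time with the same names is expected to fail, which matches the note above.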
Execute the "Vectors Indexing" section to insert vectors into S3 Vectors.
You have two options:
- Insert the entire dataset at once
- Insert vectors one by one
Important:
Specify the name of the S3 bucket where you uploaded the CSV files (the regular S3 bucket, not the vector bucket).
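The two insertion options can be sketched with one helper that sends vectors in batches. This is a sketch under assumptions: `client` is a boto3 `s3vectors` client, `vectors` is a list of `(key, values)` pairs already loaded from the CSV, and the default batch size of 500 is an assumed per-request limit that should be checked against the current service quotas. The helper name is hypothetical.

```python
def put_vectors_in_batches(client, bucket_name, index_name, vectors,
                           batch_size=500):
    """Insert (key, values) pairs into an index, batch by batch.

    `client` is assumed to be boto3.client("s3vectors").
    Returns the number of vectors sent.
    """
    sent = 0
    for start in range(0, len(vectors), batch_size):
        # Convert each pair into the request shape used by put_vectors.
        batch = [
            {"key": str(key), "data": {"float32": [float(x) for x in values]}}
            for key, values in vectors[start:start + batch_size]
        ]
        client.put_vectors(
            vectorBucketName=bucket_name,
            indexName=index_name,
            vectors=batch,
        )
        sent += len(batch)
    return sent
```

With `batch_size=1` this behaves like the one-by-one option; a larger value approximates inserting the whole dataset at once while keeping each request bounded.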
Execute the "Querying" section to perform queries on the indexed dataset.
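A single similarity query might look like the sketch below. It assumes `client` is a boto3 `s3vectors` client and that the response lists matches under a `vectors` field, each with a `key`; the helper name and the default `k` are hypothetical.

```python
def query_top_k(client, bucket_name, index_name, query, k=10):
    """Return the keys of the k nearest indexed vectors to `query`.

    `client` is assumed to be boto3.client("s3vectors").
    """
    response = client.query_vectors(
        vectorBucketName=bucket_name,
        indexName=index_name,
        queryVector={"float32": [float(x) for x in query]},
        topK=k,
        returnDistance=True,
    )
    # Keep only the keys, in the order the service returned them.
    return [match["key"] for match in response["vectors"]]
```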
Execute the "Query Recall" section to calculate the recall of the queries, that is, how many of the true neighbors (from true_neighbors) the queries actually returned.
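For reference, recall at k can be computed as in the sketch below. It assumes `retrieved` and `true_neighbors` are lists with one list of IDs per query, using the same ID type in both (for example, both strings); the function name is hypothetical and the notebook may compute it differently.

```python
def recall_at_k(retrieved, true_neighbors, k):
    """Fraction of the true top-k neighbors found in the retrieved top-k,
    pooled over all queries."""
    hits = 0
    total = 0
    for found, truth in zip(retrieved, true_neighbors):
        truth_k = set(truth[:k])
        # Count true neighbors that appear among the first k results.
        hits += len(truth_k.intersection(found[:k]))
        total += len(truth_k)
    return hits / total if total else 0.0
```

For example, if two queries each have three true neighbors and the results contain two of them for the first query and one for the second, the recall is 3 / 6 = 0.5.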
Execute "Get Vectors" to retrieve a specific vector from the dataset.
Execute "Clean environment" to delete both local and S3 vector resources.
Recommended order:
Initializations → Vectors Indexing → Querying → Query Recall
(Optional: Get Vectors, Clean environment)