S3 VECTORS PIPELINE

This pipeline uses Amazon S3 Vectors (s3vectors) to store and query vector representations of data. It indexes vectors in an S3 vector bucket and runs similarity queries against them. It also creates the S3 vector bucket and the other resources needed to run the whole process.

STEPS TO EXECUTE PIPELINE

Using a Custom Dataset

If you use your own dataset, set all the required parameters in the "Initializations" section.
Make sure your files (dataset, queries, and true_neighbors) follow the format described in the notebook.
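The exact file layout is defined in the notebook and is not reproduced here. As a rough pre-flight check before uploading a custom dataset, the sketch below assumes a headerless layout: one vector per row in the dataset and queries files, and one row of neighbor row indices per query in true_neighbors. All function names are hypothetical.

```python
import csv
from typing import List


def parse_float_rows(text: str) -> List[List[float]]:
    """Parse CSV text where each non-empty row is one vector of floats (no header assumed)."""
    return [[float(x) for x in row] for row in csv.reader(text.splitlines()) if row]


def parse_int_rows(text: str) -> List[List[int]]:
    """Parse CSV text where each row lists one query's true-neighbor row indices."""
    return [[int(x) for x in row] for row in csv.reader(text.splitlines()) if row]


def load_float_rows(path: str) -> List[List[float]]:
    """Read a dataset or queries file from disk."""
    with open(path, newline="") as f:
        return parse_float_rows(f.read())


def check_dimensions(vectors: List[List[float]], dimension: int) -> None:
    """Raise if any vector length differs from the index dimension chosen in Initializations."""
    for i, v in enumerate(vectors):
        if len(v) != dimension:
            raise ValueError(f"row {i} has {len(v)} values, expected {dimension}")
```

Running check_dimensions on both the dataset and the queries catches a dimension mismatch locally instead of at insert or query time.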

1 - Upload CSV Files

Extract the CSV files from files.zip and upload them to any regular S3 bucket (not the vector bucket).
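If you prefer to script this step, here is a minimal sketch that extracts the CSVs and uploads them with a boto3 S3 client (`boto3.client("s3")`, whose `upload_file(Filename, Bucket, Key)` method is used here). The function name, output directory, and key prefix are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_and_upload(zip_path: str, out_dir: str, s3_client, bucket: str, prefix: str = "") -> List[str]:
    """Extract every .csv in zip_path into out_dir, then upload each to s3://bucket/<prefix><file name>.

    s3_client is assumed to be boto3.client("s3"). Returns the uploaded object keys.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    keys = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            # Skip directory entries and non-CSV members.
            if not name.lower().endswith(".csv"):
                continue
            local_path = Path(zf.extract(name, out))
            key = prefix + local_path.name
            s3_client.upload_file(str(local_path), bucket, key)
            keys.append(key)
    return keys
```

Usage, with a hypothetical bucket name: `extract_and_upload("files.zip", "files", boto3.client("s3"), "my-csv-bucket")`.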

2 - Run Initializations

Execute the "Initializations" section to:

Import the necessary packages
Create the required S3 vector resources

Note:

If the resources already exist, this section fails with an error.
Run "Clean environment" to delete the existing resources, then run this section again.
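The resource-creation part of this step can be sketched as follows, assuming the notebook uses the boto3 `s3vectors` client (`boto3.client("s3vectors")`). The function name and default metric are illustrative assumptions; the dimension must match your vectors.

```python
def create_vector_resources(client, bucket_name: str, index_name: str,
                            dimension: int, distance_metric: str = "cosine") -> None:
    """Create a vector bucket and one float32 vector index inside it.

    client is assumed to be boto3.client("s3vectors"). If either resource already
    exists, the service call raises an error, so clean up first.
    """
    # S3 Vectors indexes accept "cosine" or "euclidean" as the distance metric.
    if distance_metric not in ("cosine", "euclidean"):
        raise ValueError("distance_metric must be 'cosine' or 'euclidean'")
    client.create_vector_bucket(vectorBucketName=bucket_name)
    client.create_index(
        vectorBucketName=bucket_name,
        indexName=index_name,
        dataType="float32",
        dimension=dimension,
        distanceMetric=distance_metric,
    )
```

The index dimension is fixed at creation time, which is why a custom dataset needs its parameters set here before any vectors are inserted.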

3 - Run Vectors Indexing

Execute the "Vectors Indexing" section to insert vectors into S3 Vectors.

You have two options:

Insert the entire dataset at once
Insert vectors one by one

Important:

Specify the name of the S3 bucket where the CSV files are located.
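Both insertion options can be sketched as one batched loop over `put_vectors` from the boto3 `s3vectors` client: `batch_size=1` sends vectors one by one, and a large batch sends as many as a single request allows. The default of 500 is an assumption about the per-request limit (check the current S3 Vectors quotas), and using row indices as string keys is a hypothetical convention.

```python
from typing import Iterator, List, Sequence


def to_put_batches(vectors: Sequence[Sequence[float]], batch_size: int = 500) -> Iterator[List[dict]]:
    """Yield lists of PutVectors entries, batch_size vectors at a time, keyed by row index."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(vectors), batch_size):
        yield [
            {"key": str(i), "data": {"float32": [float(x) for x in vectors[i]]}}
            for i in range(start, min(start + batch_size, len(vectors)))
        ]


def index_vectors(client, bucket_name: str, index_name: str,
                  vectors: Sequence[Sequence[float]], batch_size: int = 500) -> int:
    """Insert all vectors with put_vectors; returns the number of requests sent."""
    requests = 0
    for batch in to_put_batches(vectors, batch_size):
        client.put_vectors(vectorBucketName=bucket_name, indexName=index_name, vectors=batch)
        requests += 1
    return requests
```

Batching trades fewer round trips for larger requests; one-by-one insertion is simpler to debug but much slower on a large dataset.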

4 - Run Querying

Execute the "Querying" section to perform queries on the indexed dataset.
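A single similarity query can be sketched with `query_vectors` from the boto3 `s3vectors` client. The function name and the (key, distance) return shape are illustrative choices, not necessarily what the notebook does.

```python
from typing import List, Sequence, Tuple


def query_top_k(client, bucket_name: str, index_name: str,
                query: Sequence[float], k: int = 10) -> List[Tuple[str, float]]:
    """Return (key, distance) pairs for the k nearest indexed vectors, closest first."""
    response = client.query_vectors(
        vectorBucketName=bucket_name,
        indexName=index_name,
        queryVector={"float32": [float(x) for x in query]},
        topK=k,
        returnDistance=True,
    )
    hits = [(v["key"], v["distance"]) for v in response.get("vectors", [])]
    # Sort defensively by distance so callers can rely on the order.
    return sorted(hits, key=lambda h: h[1])
```

Running this once per row of the queries file produces the result lists that the recall step compares against true_neighbors.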

5 - Run Query Recall

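Execute the "Query Recall" section to measure query accuracy as recall: the fraction of each query's true nearest neighbors (from true_neighbors) that appear in the returned results.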
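Recall@k can be computed as the mean, over all queries, of the overlap between the top-k returned keys and the top-k true neighbors, divided by k. This is a generic sketch of that metric; the notebook's exact formula may differ. Returned keys and true-neighbor IDs must use the same type (for example, both strings) for the set intersection to match.

```python
from typing import Hashable, Sequence


def recall_at_k(retrieved: Sequence[Sequence[Hashable]],
                true_neighbors: Sequence[Sequence[Hashable]], k: int) -> float:
    """Mean over queries of |top-k retrieved ∩ top-k true| / k."""
    if len(retrieved) != len(true_neighbors):
        raise ValueError("need exactly one result list per query")
    if not retrieved:
        return 0.0
    total = 0.0
    for got, truth in zip(retrieved, true_neighbors):
        total += len(set(got[:k]) & set(truth[:k])) / k
    return total / len(retrieved)
```

For example, with k=3, one query that finds 2 of its 3 true neighbors and another that finds 1 of 3 give a recall of (2/3 + 1/3) / 2 = 0.5.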

6 - (Optional) Get Vectors

Execute "Get Vectors" to retrieve a specific vector from the dataset.
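Fetching one vector by key can be sketched with `get_vectors` from the boto3 `s3vectors` client. The helper name is hypothetical, and the key format depends on how the vectors were indexed.

```python
from typing import List, Optional


def get_vector(client, bucket_name: str, index_name: str, key: str) -> Optional[List[float]]:
    """Return the float32 data stored under key, or None if the key is not in the index."""
    response = client.get_vectors(
        vectorBucketName=bucket_name,
        indexName=index_name,
        keys=[key],
        returnData=True,  # vector data is only included when explicitly requested
    )
    for v in response.get("vectors", []):
        if v["key"] == key:
            return v["data"]["float32"]
    return None
```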

7 - (Optional) Clean Environment

Execute "Clean environment" to delete both local and S3 vector resources.
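A cleanup sketch using the boto3 `s3vectors` client follows. The index is deleted before the bucket on the assumption that a vector bucket must be empty before it can be removed. What counts as "local resources" in the notebook is not specified; the `local_dir` parameter (for example, the folder of extracted CSVs) is a hypothetical stand-in.

```python
import shutil
from pathlib import Path


def clean_environment(client, bucket_name: str, index_name: str, local_dir: str = "") -> None:
    """Delete the vector index, then its vector bucket, then an optional local directory."""
    client.delete_index(vectorBucketName=bucket_name, indexName=index_name)
    client.delete_vector_bucket(vectorBucketName=bucket_name)
    if local_dir and Path(local_dir).exists():
        shutil.rmtree(local_dir)
```

After this runs, "Initializations" can create the resources again without the already-exists error.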

Recommended order:
Initializations → Vectors Indexing → Querying → Query Recall
(Optional: Get Vectors, Clean environment)
