CS6400 Hybrid Vector Search (Group 11)

Efficient vector search combining semantic similarity with metadata filtering.

Team Members

Zhangding Liu - Baselines & Evaluation
Yao-Ting Huang - Data Module
Zaowei Dai - Indexing Module
Yichang Xu - Search Module

📋 Important Documents - READ FIRST!

All team members MUST read these before starting:

docs/GIT_WORKFLOW.md - Git collaboration workflow (branching strategy, PR process, commit conventions)
docs/API_CONTRACT.md - Module interface specifications (ensures code integration)
TASK_ASSIGNMENT.md - Individual task assignments

Quick Start

# 1. Clone and install
git clone https://github.com/ZhangdingLiu/CS6400_Project_Group11.git
cd CS6400_Project_Group11
pip install -r requirements.txt

# 2. Read the important docs above ⭐

# 3. Create your feature branch from develop
git checkout develop
git pull origin develop
git checkout -b feature/your-module-name

# 4. Start coding following API_CONTRACT.md

Project Structure

CS6400_Project_Group11/
├── docs/              # 📋 Documentation
│   ├── GIT_WORKFLOW.md    # ⭐ Git workflow (MUST READ!)
│   └── API_CONTRACT.md    # ⭐ Interface specs (MUST READ!)
├── data/              # Data loading & preprocessing (Yao-Ting)
├── indexing/          # IVF-PQ index & signatures (Zaowei)
├── search/            # Search engine (Yichang)
├── baselines/         # Baseline methods (Zhangding)
├── evaluation/        # Evaluation framework (Zhangding)
├── experiments/       # Experiment runners
├── utils/             # Utilities (shared)
├── config/            # Configuration files
├── TASK_ASSIGNMENT.md # ⭐ Task assignments (MUST READ!)
└── requirements.txt   # Python dependencies

How It Works

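Filter-Aware Pruning: Each IVF list carries a metadata signature, so lists that cannot contain any vector satisfying the filter are skipped without being scanned
Adaptive Deepening: Search parameters (for example, how many lists are probed, capped by nprobe_max) are adjusted dynamically based on intermediate results
Hybrid Search: Combines vector similarity ranking with structured metadata filtering in a single query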
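
The first two ideas above can be illustrated with a minimal, pure-Python sketch. It is not the project's implementation: the names IVFList, prune_lists, and adaptive_search are hypothetical, the signature is modeled as a plain set of category values, distances are precomputed stand-ins, and the lists are assumed to arrive already ordered nearest-centroid-first. The real interfaces are defined in docs/API_CONTRACT.md.

```python
# Illustrative sketch only: every name here is hypothetical, not the project's API.
from dataclasses import dataclass, field


@dataclass
class IVFList:
    """One inverted list plus a signature of the metadata values it contains."""
    ids: list
    categories: list                    # metadata value per vector, same order as ids
    distances: list                     # stand-in for precomputed query distances
    signature: set = field(init=False)  # set of categories present in this list

    def __post_init__(self):
        self.signature = set(self.categories)


def prune_lists(lists, wanted_category):
    """Filter-aware pruning: keep only lists whose signature admits the filter."""
    return [lst for lst in lists if wanted_category in lst.signature]


def adaptive_search(lists, wanted_category, k, nprobe_start=1, growth=2, nprobe_max=8):
    """Adaptive deepening: probe more lists until k filtered hits are found or a cap is hit."""
    candidates = prune_lists(lists, wanted_category)
    nprobe = nprobe_start
    while True:
        hits = []
        for lst in candidates[:nprobe]:
            for vid, cat, dist in zip(lst.ids, lst.categories, lst.distances):
                if cat == wanted_category:
                    hits.append((dist, vid))
        # Stop when enough results exist or there is nothing more to probe.
        if len(hits) >= k or nprobe >= min(nprobe_max, len(candidates)):
            return sorted(hits)[:k], nprobe
        # Grow nprobe by the growth factor, always advancing by at least one list.
        nprobe = min(max(nprobe + 1, int(nprobe * growth)), nprobe_max)


# Toy data, assumed to be ordered nearest-centroid-first.
lists = [
    IVFList(ids=[0, 1], categories=["news", "news"], distances=[0.1, 0.2]),
    IVFList(ids=[2, 3], categories=["blog", "news"], distances=[0.3, 0.4]),
    IVFList(ids=[4, 5], categories=["blog", "blog"], distances=[0.5, 0.6]),
    IVFList(ids=[6], categories=["wiki"], distances=[0.7]),
]
results, probed = adaptive_search(lists, "blog", k=2)
print(results, probed)
```

In this toy run the first and last lists are pruned because their signatures contain no "blog" entries, and the search deepens from one probed list to two before it has two matching results. A production signature would more likely be a compact structure such as a bitmap or Bloom filter; a set is used here only to keep the sketch short.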

Running Experiments

# After all modules are implemented:
python experiments/run_experiments.py
python experiments/analyze_results.py

Development Workflow

Full process: See docs/GIT_WORKFLOW.md

Quick steps:

1. Read TASK_ASSIGNMENT.md to understand your tasks
2. Read docs/API_CONTRACT.md for interface specifications
3. Create a feature branch from develop
4. Write code in your assigned module folder
5. Add unit tests
6. Submit a PR to the develop branch (NOT main!)
7. Wait for code review and merge

Configuration

Edit config/config.yaml to adjust:

Dataset size and embedding method
Index parameters (nlist, m, nbits)
Search parameters (nprobe_max, growth factors)
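
When changing the index parameters, a quick sanity check can help. The sketch below is hypothetical: the class IndexParams, the field dim, and the example values are assumptions for illustration, and the actual keys in config/config.yaml may be named or nested differently. It relies only on two general properties of product quantization: the vector dimension must be divisible by m, and each sub-quantizer has 2^nbits centroids.

```python
# Hypothetical sketch: field names mirror the parameters listed above, but the
# actual keys and layout of config/config.yaml may differ.
from dataclasses import dataclass


@dataclass
class IndexParams:
    dim: int    # embedding dimensionality (assumed; depends on the embedding method)
    nlist: int  # number of IVF lists (coarse clusters)
    m: int      # number of PQ sub-quantizers
    nbits: int  # bits per sub-quantizer code

    def validate(self):
        """Reject combinations that product quantization cannot use."""
        if self.dim % self.m != 0:
            raise ValueError(f"dim={self.dim} must be divisible by m={self.m}")
        if self.nlist <= 0 or self.nbits <= 0:
            raise ValueError("nlist and nbits must be positive")

    def code_bytes(self):
        """Bytes used by one PQ code: m sub-codes of nbits each, rounded up."""
        return (self.m * self.nbits + 7) // 8

    def centroids_per_subquantizer(self):
        return 2 ** self.nbits


# Example values chosen for illustration only.
params = IndexParams(dim=384, nlist=1024, m=48, nbits=8)
params.validate()
print(params.code_bytes())                  # 48
print(params.centroids_per_subquantizer())  # 256
```

With these example values each vector compresses to 48 bytes, which shows how m and nbits trade memory for accuracy.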

Dependencies

FAISS (vector indexing)
NumPy, Pandas (data processing)
PyArrow (Parquet files)
PyTest (testing)

See requirements.txt for the full list.

Timeline

Weeks 1-2: Data + Indexing
Weeks 2-3: Search + Baselines
Week 4: Integration + Experiments
