Efficient vector search combining semantic similarity with metadata filtering.
- Zhangding Liu - Baselines & Evaluation
- Yao-Ting Huang - Data Module
- Zaowei Dai - Indexing Module
- Yichang Xu - Search Module
All team members MUST read these before starting:
- docs/GIT_WORKFLOW.md - Git collaboration workflow (branching strategy, PR process, commit conventions)
- docs/API_CONTRACT.md - Module interface specifications (ensures code integration)
- TASK_ASSIGNMENT.md - Individual task assignments
# 1. Clone and install
git clone https://github.com/ZhangdingLiu/CS6400_Project_Group11.git
cd CS6400_Project_Group11
pip install -r requirements.txt
# 2. Read the important docs above ⭐
# 3. Create your feature branch from develop
git checkout develop
git pull origin develop
git checkout -b feature/your-module-name
# 4. Start coding following API_CONTRACT.md
CS6400_Project_Group11/
├── docs/ # 📋 Documentation
│ ├── GIT_WORKFLOW.md # ⭐ Git workflow (MUST READ!)
│ └── API_CONTRACT.md # ⭐ Interface specs (MUST READ!)
├── data/ # Data loading & preprocessing (Yao-Ting)
├── indexing/ # IVF-PQ index & signatures (Zaowei)
├── search/ # Search engine (Yichang)
├── baselines/ # Baseline methods (Zhangding)
├── evaluation/ # Evaluation framework (Zhangding)
├── experiments/ # Experiment runners
├── utils/ # Utilities (shared)
├── config/ # Configuration files
├── TASK_ASSIGNMENT.md # ⭐ Task assignments (MUST READ!)
└── requirements.txt # Python dependencies
- Filter-Aware Pruning: Metadata signatures eliminate IVF lists that can't satisfy filters
- Adaptive Deepening: Dynamically adjusts search parameters (such as the number of probed IVF lists) based on intermediate results
- Hybrid Search: Combines vector similarity with structured metadata filtering
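To make the first two ideas concrete, here is a minimal pure-Python sketch, not the project's actual implementation. It assumes each IVF list can be summarized by the set of category values it contains (a real signature would more likely be a bitmap or Bloom filter), and that deepening multiplies the number of probed lists by a growth factor. The record layout and the names `build_signatures`, `prune`, and `filtered_search` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical record layout: (distance_to_query, doc_id, category)
Record = Tuple[float, int, str]


def build_signatures(lists: Dict[int, List[Record]]) -> Dict[int, Set[str]]:
    """Summarize each IVF list by the set of category values it contains."""
    return {list_id: {rec[2] for rec in recs} for list_id, recs in lists.items()}


def prune(list_ids: List[int], signatures: Dict[int, Set[str]], wanted: str) -> List[int]:
    """Filter-aware pruning: drop lists that cannot contain a matching record."""
    return [lid for lid in list_ids if wanted in signatures.get(lid, set())]


def filtered_search(
    ranked_lists: List[int],
    lists: Dict[int, List[Record]],
    signatures: Dict[int, Set[str]],
    wanted: str,
    k: int,
    nprobe: int = 1,
    growth: int = 2,
    nprobe_max: int = 8,
) -> Tuple[List[Record], int]:
    """Adaptive deepening: probe more lists until k filtered hits are found."""
    limit = min(nprobe_max, len(ranked_lists))
    nprobe = max(1, min(nprobe, limit))
    while True:
        survivors = prune(ranked_lists[:nprobe], signatures, wanted)
        hits = sorted(
            rec for lid in survivors for rec in lists[lid] if rec[2] == wanted
        )
        if len(hits) >= k or nprobe >= limit:
            return hits[:k], nprobe
        # Always grow by at least one list so the loop terminates.
        nprobe = min(max(nprobe * growth, nprobe + 1), limit)


# Toy data: four IVF lists, already ranked by centroid distance to the query.
lists = {
    0: [(0.10, 1, "news"), (0.20, 2, "news")],
    1: [(0.30, 3, "blog"), (0.35, 4, "news")],
    2: [(0.40, 5, "blog")],
    3: [(0.50, 6, "wiki")],
}
ranked_lists = [0, 1, 2, 3]
signatures = build_signatures(lists)
hits, used_nprobe = filtered_search(ranked_lists, lists, signatures, wanted="blog", k=2)
print(hits, used_nprobe)
```

In this toy run the closest list holds only "news" records, so it is skipped without being scanned, and the search widens until two "blog" records are found.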
# After all modules are implemented:
python experiments/run_experiments.py
python experiments/analyze_results.py
Full process: See docs/GIT_WORKFLOW.md
Quick steps:
- Read TASK_ASSIGNMENT.md to understand your tasks
- Read docs/API_CONTRACT.md for interface specifications
- Create feature branch from develop
- Write code in your assigned module folder
- Add unit tests
- Submit PR to develop branch (NOT main!)
- Wait for code review and merge
Edit config/config.yaml to adjust:
- Dataset size and embedding method
- Index parameters (nlist, m, nbits)
- Search parameters (nprobe_max, growth factors)
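As a rough illustration of how these parameters interact, the sketch below validates a set of IVF-PQ values and expands a probing schedule. It does not read config/config.yaml; the class `IndexParams`, the function `nprobe_schedule`, the field names, and the example values are all hypothetical, and treating the growth factor as a multiplier on nprobe is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IndexParams:
    dim: int    # embedding dimensionality (hypothetical field, not listed above)
    nlist: int  # number of IVF lists (coarse clusters)
    m: int      # number of PQ sub-quantizers
    nbits: int  # bits per sub-quantizer code

    def validate(self) -> None:
        """Product quantization splits the vector into m equal sub-vectors."""
        if self.nlist < 1 or self.m < 1 or self.nbits < 1:
            raise ValueError("nlist, m and nbits must be positive")
        if self.dim % self.m != 0:
            raise ValueError("dim must be divisible by m")

    def code_bytes(self) -> int:
        """Size of one compressed vector in bytes, rounded up."""
        return (self.m * self.nbits + 7) // 8


def nprobe_schedule(start: int, growth: int, nprobe_max: int) -> List[int]:
    """List the nprobe values tried when growing by `growth` up to nprobe_max."""
    if start < 1 or growth < 2 or nprobe_max < start:
        raise ValueError("need start >= 1, growth >= 2, nprobe_max >= start")
    schedule = []
    nprobe = start
    while nprobe < nprobe_max:
        schedule.append(nprobe)
        nprobe *= growth
    schedule.append(nprobe_max)  # the final step is capped at nprobe_max
    return schedule


# Example values only; they are not taken from the project's config.
params = IndexParams(dim=768, nlist=1024, m=64, nbits=8)
params.validate()
print(params.code_bytes())        # bytes per compressed vector
print(nprobe_schedule(1, 2, 10))  # nprobe values tried in order
```

A check like this can catch an incompatible `m` before an index build is attempted.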
- FAISS (vector indexing)
- NumPy, Pandas (data processing)
- PyArrow (Parquet files)
- PyTest (testing)
See requirements.txt for full list.
- Week 1-2: Data + Indexing
- Week 2-3: Search + Baselines
- Week 4: Integration + Experiments