PipelinePal - Engineer Search Pipeline

A fast MVP that allows internal team members to search for engineers using natural language queries.

🎯 Project Goal

Build a fast MVP that allows internal team members to search for engineers using natural language queries, such as:

"Looking for backend engineers with NestJS and AWS, 3+ years of experience in fintech."

🧱 Architecture Overview

         ┌────────────────────────────┐
         │      Next.js Frontend      │
         │ - Query input              │
         │ - Results list             │
         └────────────┬───────────────┘
                      │
                      ▼
         ┌────────────────────────────┐
         │        FastAPI Backend     │
         │ - /ingest endpoint         │
         │ - /search endpoint         │
         └────────────┬───────────────┘
                      │
      ┌───────────────┴────────────────────┐
      ▼                                    ▼
[Greenhouse API]                    [Pinecone Vector DB]
- Fetch resumes                    - Store semantic embeddings
- PDFs + metadata                  - Search similar profiles
          ▼                                  ▲
    [OpenAI (GPT-4)]               [OpenAI Embeddings]
- Parse resume text to JSON        - Convert query into vector
- Extract skills, years, domains
                      │
                      ▼
                [PostgreSQL Database]
       - Store resume text + structured metadata

🧰 Technology Stack

Layer           Tool                            Why
Frontend        Next.js                         Fast React-based framework, easy to deploy
Backend         FastAPI                         Simple, async Python API layer
Resume Source   Greenhouse API                  Source of resumes and candidate data
Resume Parsing  pdfplumber + OpenAI GPT-4       Accurate text extraction and structuring
Embedding       OpenAI text-embedding-ada-002   Reliable embeddings for semantic search
Vector DB       Pinecone                        Hosted, scalable vector database
Relational DB   PostgreSQL                      Stores resume text and structured data
Deployment      Docker                          Clean, reproducible local/cloud deployment

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • Node.js 18+
  • Docker and Docker Compose
  • PostgreSQL
  • Pinecone account
  • OpenAI API key
  • Greenhouse API access

Environment Setup

  1. Clone the repository
  2. Copy .env.example to .env and fill in your API keys
  3. Run the setup scripts
# Backend setup
cd backend
pip install -r requirements.txt

# Frontend setup (back out of backend/ first)
cd ../frontend
npm install

# Database setup (from the project root)
cd ..
docker-compose up -d postgres
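Step 2 above copies .env.example to .env; a sketch of what that file might contain, based on the stack described here (these variable names are assumptions — check .env.example for the actual keys):

```
# Hypothetical variable names; match them to .env.example
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=...
PINECONE_INDEX=pipelinepal
GREENHOUSE_API_KEY=...
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/pipelinepal
```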

Running the Application

# Start backend (in one terminal)
cd backend
uvicorn main:app --reload

# Start frontend (in a second terminal)
cd frontend
npm run dev

📁 Project Structure

pipelinepal/
├── backend/                 # FastAPI backend
│   ├── app/
│   │   ├── api/            # API routes
│   │   ├── core/           # Configuration and utilities
│   │   ├── models/         # Database models
│   │   ├── services/       # Business logic
│   │   └── utils/          # Helper functions
│   ├── requirements.txt
│   └── main.py
├── frontend/               # Next.js frontend
│   ├── components/         # React components
│   ├── pages/              # Next.js pages
│   ├── styles/             # CSS styles
│   └── package.json
├── docker-compose.yml      # Database services
└── README.md
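The docker-compose.yml above only needs to provide the database service. A minimal sketch, assuming default Postgres credentials and a named volume (service name, credentials, and ports are illustrative, not the project's actual file):

```yaml
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: pipelinepal
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```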

🔄 System Flow

1. Ingestion Phase

  • Pull resumes from Greenhouse API (PDFs + candidate metadata)
  • Convert PDFs to text using pdfplumber
  • Use OpenAI (GPT-4) to extract skills, job titles, companies, domains, years of experience
  • Save structured data + full text in PostgreSQL
  • Generate embeddings using OpenAI Embeddings
  • Store embeddings in Pinecone for semantic search
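The ingestion steps above can be sketched in Python. This is a hypothetical illustration, not the project's actual code: it assumes the OpenAI v1 Python client and the Pinecone client are passed in, and the parsing prompt and record shape are made up for the example.

```python
import json

# Illustrative prompt; the real project's prompt may differ
PARSE_PROMPT = (
    "Extract skills, job titles, companies, domains, and years of "
    "experience from this resume. Respond with JSON only."
)

def build_vector_record(candidate_id: str, embedding: list, parsed: dict) -> dict:
    """Shape a parsed resume into a Pinecone upsert record (assumed schema)."""
    return {
        "id": candidate_id,
        "values": embedding,
        "metadata": {
            "skills": parsed.get("skills", []),
            "years_experience": parsed.get("years_experience"),
            "domains": parsed.get("domains", []),
        },
    }

def ingest_resume(candidate_id, pdf_path, openai_client, index):
    # 1. PDF -> text with pdfplumber
    import pdfplumber
    with pdfplumber.open(pdf_path) as pdf:
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)

    # 2. GPT-4 -> structured JSON (skills, titles, domains, years)
    resp = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": PARSE_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    parsed = json.loads(resp.choices[0].message.content)

    # 3. Persisting text + metadata to PostgreSQL would happen here (omitted)

    # 4. Embed the full text and upsert into Pinecone
    emb = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=text
    ).data[0].embedding
    index.upsert(vectors=[build_vector_record(candidate_id, emb, parsed)])
```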

2. Search Phase

  • User submits natural language query via UI
  • Query is embedded using OpenAI
  • Perform vector search in Pinecone
  • Retrieve top matching candidates and metadata from PostgreSQL
  • Return results to frontend for display
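The search phase can be sketched the same way. Again a hypothetical illustration under the same client assumptions; `format_match` and the result shape are invented for the example (in the real flow, full candidate records would be fetched from PostgreSQL by ID).

```python
def format_match(match: dict) -> dict:
    """Flatten one Pinecone match into a result row for the frontend."""
    meta = match.get("metadata", {})
    return {
        "candidate_id": match["id"],
        "score": round(match["score"], 4),
        "skills": meta.get("skills", []),
    }

def search_candidates(query: str, openai_client, index, top_k: int = 10) -> list:
    # 1. Embed the natural-language query with the same model used at ingest
    emb = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=query
    ).data[0].embedding
    # 2. Nearest-neighbour search in Pinecone
    res = index.query(vector=emb, top_k=top_k, include_metadata=True)
    # 3. Shape the matches for the UI
    return [format_match(m) for m in res["matches"]]
```

Using the same embedding model for queries and resumes is what makes the vector comparison meaningful.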

📋 Features

  • Resume ingestion from Greenhouse API
  • GPT-4-based resume parsing
  • Embedding pipeline using OpenAI
  • PostgreSQL storage for resumes and metadata
  • Pinecone vector storage for semantic search
  • Search endpoint with natural language queries
  • Next.js frontend with search interface
  • Docker deployment setup

🔧 API Endpoints

  • POST /ingest - Ingest resumes from Greenhouse
  • GET /search - Search candidates with natural language
  • GET /candidates - List all candidates
  • GET /health - Health check

🗓️ Timeline

Week     Focus
Week 1   Resume ingestion, parsing, storage, and embedding
Week 2   Search pipeline, frontend UI, filters, testing, and deployment

💡 Notes

  • Using LLMs instead of rule-based NLP enables rapid prototyping and handles messy resume formats
  • The MVP favors development speed and result quality over scale
  • Future improvements: add filters, improve ranking, move to a local LLM if needed
