PipelinePal - Engineer Search Pipeline

A fast MVP that allows internal team members to search for engineers using natural language queries.

🎯 Project Goal

Build a fast MVP that allows internal team members to search for engineers using natural language queries, such as:

"Looking for backend engineers with NestJS and AWS, 3+ years of experience in fintech."

🧱 Architecture Overview

         ┌────────────────────────────┐
         │      Next.js Frontend      │
         │ - Query input              │
         │ - Results list             │
         └────────────┬───────────────┘
                      │
                      ▼
         ┌────────────────────────────┐
         │        FastAPI Backend     │
         │ - /ingest endpoint         │
         │ - /search endpoint         │
         └────────────┬───────────────┘
                      │
      ┌───────────────┴────────────────────┐
      ▼                                    ▼
[Greenhouse API]                    [Pinecone Vector DB]
- Fetch resumes                    - Store semantic embeddings
- PDFs + metadata                  - Search similar profiles
          ▼                                  ▲
    [OpenAI (GPT-4)]               [OpenAI Embeddings]
- Parse resume text to JSON        - Convert query into vector
- Extract skills, years, domains
                      │
                      ▼
                [PostgreSQL Database]
       - Store resume text + structured metadata

🧰 Technology Stack

Layer           Tool                            Why
Frontend        Next.js                         Fast React-based framework, easy to deploy
Backend         FastAPI                         Simple, async Python API layer
Resume Source   Greenhouse API                  Source of resumes and candidate data
Resume Parsing  pdfplumber + OpenAI GPT-4       Accurate text extraction and structuring
Embedding       OpenAI text-embedding-ada-002   Reliable embeddings for semantic search
Vector DB       Pinecone                        Hosted, scalable vector database
Relational DB   PostgreSQL                      Stores resume text and structured data
Deployment      Docker                          Clean, reproducible local/cloud deployment

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • Node.js 18+
  • Docker and Docker Compose
  • PostgreSQL
  • Pinecone account
  • OpenAI API key
  • Greenhouse API access

Environment Setup

  1. Clone the repository
  2. Copy .env.example to .env and fill in your API keys
  3. Run the setup scripts
# Backend setup
cd backend
pip install -r requirements.txt

# Frontend setup (back out of backend/ first)
cd ../frontend
npm install

# Database setup (from the project root)
cd ..
docker-compose up -d postgres
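Step 2 above copies .env.example to .env; a sketch of what that file might contain, based on the stack described here (these variable names are assumptions — check .env.example for the actual keys):

```
# Hypothetical variable names; match them to .env.example
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=...
PINECONE_INDEX=pipelinepal
GREENHOUSE_API_KEY=...
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/pipelinepal
```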

Running the Application

# Start backend (in one terminal)
cd backend
uvicorn main:app --reload

# Start frontend (in a second terminal)
cd frontend
npm run dev

📁 Project Structure

pipelinepal/
├── backend/                 # FastAPI backend
│   ├── app/
│   │   ├── api/            # API routes
│   │   ├── core/           # Configuration and utilities
│   │   ├── models/         # Database models
│   │   ├── services/       # Business logic
│   │   └── utils/          # Helper functions
│   ├── requirements.txt
│   └── main.py
├── frontend/               # Next.js frontend
│   ├── components/         # React components
│   ├── pages/              # Next.js pages
│   ├── styles/             # CSS styles
│   └── package.json
├── docker-compose.yml      # Database services
└── README.md
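The docker-compose.yml above only needs to provide the database service. A minimal sketch, assuming default Postgres credentials and a named volume (service name, credentials, and ports are illustrative, not the project's actual file):

```yaml
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: pipelinepal
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```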

🔄 System Flow

1. Ingestion Phase

  • Pull resumes from Greenhouse API (PDFs + candidate metadata)
  • Convert PDFs to text using pdfplumber
  • Use OpenAI (GPT-4) to extract skills, job titles, companies, domains, years of experience
  • Save structured data + full text in PostgreSQL
  • Generate embeddings using OpenAI Embeddings
  • Store embeddings in Pinecone for semantic search
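The ingestion steps above can be sketched in Python. This is a hypothetical illustration, not the project's actual code: it assumes the OpenAI v1 Python client and the Pinecone client are passed in, and the parsing prompt and record shape are made up for the example.

```python
import json

# Illustrative prompt; the real project's prompt may differ
PARSE_PROMPT = (
    "Extract skills, job titles, companies, domains, and years of "
    "experience from this resume. Respond with JSON only."
)

def build_vector_record(candidate_id: str, embedding: list, parsed: dict) -> dict:
    """Shape a parsed resume into a Pinecone upsert record (assumed schema)."""
    return {
        "id": candidate_id,
        "values": embedding,
        "metadata": {
            "skills": parsed.get("skills", []),
            "years_experience": parsed.get("years_experience"),
            "domains": parsed.get("domains", []),
        },
    }

def ingest_resume(candidate_id, pdf_path, openai_client, index):
    # 1. PDF -> text with pdfplumber
    import pdfplumber
    with pdfplumber.open(pdf_path) as pdf:
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)

    # 2. GPT-4 -> structured JSON (skills, titles, domains, years)
    resp = openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": PARSE_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    parsed = json.loads(resp.choices[0].message.content)

    # 3. Persisting text + metadata to PostgreSQL would happen here (omitted)

    # 4. Embed the full text and upsert into Pinecone
    emb = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=text
    ).data[0].embedding
    index.upsert(vectors=[build_vector_record(candidate_id, emb, parsed)])
```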

2. Search Phase

  • User submits natural language query via UI
  • Query is embedded using OpenAI
  • Perform vector search in Pinecone
  • Retrieve top matching candidates and metadata from PostgreSQL
  • Return results to frontend for display
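The search phase can be sketched the same way. Again a hypothetical illustration under the same client assumptions; `format_match` and the result shape are invented for the example (in the real flow, full candidate records would be fetched from PostgreSQL by ID).

```python
def format_match(match: dict) -> dict:
    """Flatten one Pinecone match into a result row for the frontend."""
    meta = match.get("metadata", {})
    return {
        "candidate_id": match["id"],
        "score": round(match["score"], 4),
        "skills": meta.get("skills", []),
    }

def search_candidates(query: str, openai_client, index, top_k: int = 10) -> list:
    # 1. Embed the natural-language query with the same model used at ingest
    emb = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=query
    ).data[0].embedding
    # 2. Nearest-neighbour search in Pinecone
    res = index.query(vector=emb, top_k=top_k, include_metadata=True)
    # 3. Shape the matches for the UI
    return [format_match(m) for m in res["matches"]]
```

Using the same embedding model for queries and resumes is what makes the vector comparison meaningful.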

📋 Features

  • Resume ingestion from Greenhouse API
  • GPT-4-based resume parsing
  • Embedding pipeline using OpenAI
  • PostgreSQL storage for resumes and metadata
  • Pinecone vector storage for semantic search
  • Search endpoint with natural language queries
  • Next.js frontend with search interface
  • Docker deployment setup

🔧 API Endpoints

  • POST /ingest - Ingest resumes from Greenhouse
  • GET /search - Search candidates with natural language
  • GET /candidates - List all candidates
  • GET /health - Health check

🗓️ Timeline

Week     Focus
Week 1   Resume ingestion, parsing, storage, and embedding
Week 2   Search pipeline, frontend UI, filters, testing, and deployment

💡 Notes

  • Using LLMs instead of rule-based NLP enables rapid prototyping and handles messy resume formats
  • The MVP favors development speed and result quality over scale
  • Future improvements: add filters, improve ranking, move to a local LLM if needed
