Jobzy is a full-stack job aggregation platform that automatically scrapes job listings from multiple Indian and global job portals — LinkedIn, Naukri, Foundit (Monster India), and Indeed — on a configurable schedule, deduplicates them, and surfaces them in a clean Next.js dashboard.
You define Preferences (search criteria + platforms + polling interval), and Jobzy keeps your job feed fresh in the background — automatically.
- 🔍 Multi-platform scraping — LinkedIn, Naukri, Foundit, Indeed scraped via Playwright + stealth mode
- ⚙️ Preference-driven scheduling — set keywords, location, experience range, job type, and poll interval per preference
- 📦 BullMQ queues — one queue per platform; repeatable background jobs run on your schedule
- 🧠 Smart deduplication — Redis-backed seen-job cache prevents duplicate DB inserts
- 🕐 24-hour freshness cap — stale jobs are filtered out regardless of datePosted preference
- ✅ Applied-jobs tracking — mark jobs as applied; they disappear from the main feed automatically
- 🗑️ Ignore jobs — permanently remove unwanted listings from your feed instantly
- 📊 Stats API — aggregated job counts by platform, total preferences count
- ♻️ Self-healing on restart — server re-enqueues any preferences that lost their queue entries
- 🌐 Next.js frontend — job feed, applied-jobs list, preference management dashboard
- 🐳 Docker Compose — one command to spin up API, workers, and frontend together
```
┌──────────────────────────────────────────────────────┐
│                    Docker Compose                    │
│                                                      │
│  ┌──────────────┐   ┌──────────────┐  ┌───────────┐  │
│  │   Next.js    │   │ Express API  │  │  BullMQ   │  │
│  │   Frontend   │ → │ (port 3000)  │  │  Workers  │  │
│  │ (port 3001)  │   └──────┬───────┘  └─────┬─────┘  │
│  └──────────────┘          │                │        │
└────────────────────────────┼────────────────┼────────┘
                             │                │
               ┌─────────────┴──┐     ┌───────┴──────┐
               │ MongoDB Atlas  │     │ Redis Cloud  │
               │ (Jobs, Prefs,  │     │ (Queues +    │
               │  AppliedJobs)  │     │  Dedup cache)│
               └────────────────┘     └──────────────┘
```
| Component | Tech | Purpose |
|---|---|---|
| API Server | Express.js | REST API for preferences, jobs, stats, health |
| Workers | BullMQ + Node.js | Background scraping jobs per platform |
| Scrapers | Playwright + stealth | Browser automation for each job portal |
| Database | MongoDB Atlas | Stores jobs, preferences, applied jobs |
| Queue / Cache | Redis Cloud | BullMQ job queues + seen-job dedup |
| Frontend | Next.js 14 (App Router) | Dashboard UI |
```
jobzy/
├── src/
│   ├── server.js             # Express app entry point
│   ├── browser.js            # Playwright browser factory (stealth mode)
│   ├── db/
│   │   ├── mongoose.js       # MongoDB connection
│   │   └── models/
│   │       ├── Preference.js # Search preference schema
│   │       ├── Job.js        # Scraped job schema
│   │       └── AppliedJob.js # Applied-job snapshot schema
│   ├── routes/
│   │   ├── jobs.js           # LinkedIn scraper route
│   │   ├── naukri.js         # Naukri scraper route
│   │   ├── foundit.js        # Foundit scraper route
│   │   ├── indeed.js         # Indeed scraper route
│   │   └── preferences.js    # Preference CRUD + jobs-list + applied-jobs
│   ├── scrapers/
│   │   ├── linkedin.js       # LinkedIn scraper
│   │   ├── naukri.js         # Naukri scraper
│   │   ├── foundit.js        # Foundit scraper
│   │   └── indeed.js         # Indeed scraper
│   ├── workers/
│   │   ├── index.js          # Worker process entry point (starts all 4 workers)
│   │   ├── linkedin.worker.js
│   │   ├── naukri.worker.js
│   │   ├── foundit.worker.js
│   │   ├── indeed.worker.js
│   │   └── extractors.js     # Platform-specific job ID extractors
│   ├── queues/
│   │   └── index.js          # BullMQ queue definitions & helpers
│   ├── dedup/
│   │   └── redis.js          # Redis client + isJobSeen / markJobSeen helpers
│   └── utils/
│       └── jobFreshness.js   # 24h freshness filter
├── web/                      # Next.js frontend
│   ├── app/
│   │   ├── page.tsx          # Main jobs dashboard
│   │   ├── layout.tsx
│   │   └── jobs/             # Job detail pages
│   ├── components/           # Reusable UI components
│   ├── lib/                  # API helper utilities
│   └── next.config.ts
├── scripts/
│   └── reset.js              # CLI tool to clear queue / DB data
├── Dockerfile                # API + Worker image
├── docker-compose.yml
├── .env.example
└── package.json
```
- Node.js v18+ and npm
- MongoDB Atlas free cluster (atlas.mongodb.com)
- Redis Cloud free subscription (redis.io)
- Docker (optional, for containerised deployment)
- Create a Preference via the dashboard or API (`POST /api/preferences`).
- The API saves it to MongoDB and calls `enqueuePreference()`, which adds a repeatable BullMQ job to each selected platform's queue with `repeat: { every: repeatEvery, immediately: true }`.
- The Worker process (running separately) picks up jobs from each queue.
- Each worker calls the corresponding Playwright scraper, which launches a headless Chromium browser in stealth mode to scrape results.
- For each scraped job:
  - A unique `platformJobId` is extracted from the URL.
  - The Redis seen-job cache (`isJobSeen`) is checked; duplicates are skipped.
  - A 24-hour freshness check filters out old jobs.
  - Fresh, unseen jobs are inserted into MongoDB and marked in Redis (`markJobSeen`).
- The frontend polls `/api/jobs-list` and displays the aggregated, deduplicated feed.
- Clicking Apply calls `POST /api/preferences/apply-job/:id`, which snapshots the job to `AppliedJob` and removes it from the main feed.
- Clicking Ignore calls `DELETE /api/preferences/job/:id`, which hard-deletes the record.
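The enqueue step above can be sketched as follows. This is an illustrative reconstruction, not the actual `src/queues/index.js`: the function name, queue naming, and job naming are assumptions, while the clamping bounds follow the documented 3–60 minute range.

```javascript
// Sketch: turn a preference into repeatable BullMQ job descriptors,
// one per selected platform. Names here are illustrative.
const MIN_EVERY_MS = 3 * 60 * 1000;   // documented lower bound (3 min)
const MAX_EVERY_MS = 60 * 60 * 1000;  // documented upper bound (60 min)

function buildRepeatableJobs(pref) {
  // Clamp the poll interval into the supported range.
  const every = Math.min(Math.max(pref.repeatEvery, MIN_EVERY_MS), MAX_EVERY_MS);
  return pref.platforms.map((platform) => ({
    queueName: platform,                      // one BullMQ queue per platform
    jobName: `pref:${pref._id}`,
    data: { prefId: pref._id, filters: pref.filters },
    opts: { repeat: { every, immediately: true } },
  }));
}

// Each descriptor would then be handed to the matching queue:
//   await queue.add(job.jobName, job.data, job.opts);
```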
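The per-job filtering (dedup plus the 24-hour cap) can be sketched with an in-memory `Set` standing in for the Redis seen-job cache; the real `isJobSeen`/`markJobSeen` helpers live in `src/dedup/redis.js`, and the field names follow the Job schema.

```javascript
// Sketch of the per-job pipeline: dedup check, then the 24-hour freshness
// cap. An in-memory Set stands in for the Redis seen-job cache here.
const FRESHNESS_MS = 24 * 60 * 60 * 1000;
const seen = new Set();

function isFresh(job, now = Date.now()) {
  const posted = Date.parse(job.postedAt);       // ISO string from the scraper
  return !Number.isNaN(posted) && now - posted <= FRESHNESS_MS;
}

function filterNewJobs(jobs, now = Date.now()) {
  const inserts = [];
  for (const job of jobs) {
    const key = `${job.platform}:${job.platformJobId}`;
    if (seen.has(key)) continue;       // duplicate, skip
    if (!isFresh(job, now)) continue;  // older than 24h, skip
    seen.add(key);                     // equivalent of markJobSeen
    inserts.push(job);                 // would be inserted into MongoDB
  }
  return inserts;
}
```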
```bash
git clone https://github.com/your-username/jobzy.git
cd jobzy
npm install
npm run install:browsers
# Equivalent to: npx playwright install chromium
cp .env.example .env
```

Edit `.env`:

```
PORT=3000
HEADLESS=true

# Redis Cloud — use rediss:// (double-s) for TLS
REDIS_URL=rediss://default:<password>@<host>:<port>

# MongoDB Atlas SRV connection string
MONGODB_URI=mongodb+srv://<username>:<password>@cluster0.xxxxx.mongodb.net/jobzy?retryWrites=true&w=majority

# Next.js — API base URL seen by the browser
NEXT_PUBLIC_API_URL=http://localhost:3000
```

Start the API server:

```bash
npm run dev
# or for production:
npm start
```

Start the workers (in a separate terminal):

```bash
npm run dev:worker
# or for production:
npm run worker
```

Start the frontend:

```bash
cd web
npm install
npm run dev   # Runs on http://localhost:3001
```

Docker Compose spins up the API, worker, and Next.js frontend in one command:

```bash
# Build and start all services
docker compose up --build

# Run in the background
docker compose up -d --build
```

| Service | Port | Description |
|---|---|---|
| `api` | 3000 | Express REST API |
| `worker` | — | BullMQ background workers (no HTTP port) |
| `web` | 3001 | Next.js frontend |

Note: Redis and MongoDB run in the cloud. Only `REDIS_URL` and `MONGODB_URI` in `.env` are needed — no local Redis/Mongo containers required.
`GET /health`

Returns API status and a summary of all available endpoints.
Preferences are the core concept — each defines what to search, where, and how often.
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/preferences` | List all preferences with live queue status |
| `POST` | `/api/preferences` | Create a new preference |
| `DELETE` | `/api/preferences/:id` | Delete preference and remove from queue |
| `POST` | `/api/preferences/:id/start` | Resume scraping for a preference |
| `POST` | `/api/preferences/:id/pause` | Pause scraping (keeps DB record) |
| `GET` | `/api/preferences/:id/status` | Get live queue status (active \| paused) |
| `GET` | `/api/preferences/stats` | Job counts per platform + total prefs |
```json
{
  "name": "Senior React Developer",
  "filters": {
    "keywords": "React Node.js",
    "location": "Bangalore",
    "experience": "2-5",
    "experienceMin": 2,
    "experienceMax": 5,
    "datePosted": 60,
    "jobType": "fulltime"
  },
  "platforms": ["naukri", "foundit", "linkedin", "indeed"],
  "repeatEvery": 600000,
  "startNow": true
}
```

| Field | Type | Description |
|---|---|---|
| `name` | string | Required. Human-readable label |
| `filters.keywords` | string | Job title / skills to search |
| `filters.location` | string | City or region |
| `filters.experience` | string | Experience string for LinkedIn/Naukri (e.g. "2-5") |
| `filters.experienceMin` | number | Min years of experience (Foundit) |
| `filters.experienceMax` | number | Max years of experience (Foundit) |
| `filters.datePosted` | number | Age filter in minutes (e.g. 60 = last hour, 1440 = last day) |
| `filters.jobType` | string | Job type filter |
| `platforms` | array | Platforms to scrape. Defaults to all four |
| `repeatEvery` | number | Poll interval in ms. Clamped to 3 min – 60 min |
| `startNow` | boolean | Auto-start on creation. Defaults to true |
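As a sketch, applying the documented defaults client-side before POSTing might look like this. `buildPreferencePayload` is a hypothetical helper, not part of Jobzy, and the 10-minute `repeatEvery` fallback is an assumption taken from the example body above; the server applies its own defaults and clamping.

```javascript
// Hypothetical helper that fills in the documented defaults before
// POSTing to /api/preferences.
const ALL_PLATFORMS = ['naukri', 'foundit', 'linkedin', 'indeed'];

function buildPreferencePayload({ name, filters = {}, platforms, repeatEvery, startNow }) {
  if (!name) throw new Error('name is required');
  return {
    name,
    filters,
    platforms: platforms && platforms.length ? platforms : ALL_PLATFORMS, // defaults to all four
    repeatEvery: repeatEvery ?? 600000,                                   // assumed 10-min default
    startNow: startNow ?? true,                                           // defaults to true
  };
}

// Usage:
//   await fetch(`${API_URL}/api/preferences`, {
//     method: 'POST',
//     headers: { 'Content-Type': 'application/json' },
//     body: JSON.stringify(buildPreferencePayload({ name: 'Senior React Developer' })),
//   });
```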
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/jobs-list` | Paginated job feed (excludes applied jobs) |
| `POST` | `/api/preferences/apply-job/:jobId` | Mark a job as applied |
| `GET` | `/api/preferences/applied-jobs` | List all applied jobs |
| `DELETE` | `/api/preferences/job/:jobId` | Permanently delete/ignore a job |
| Param | Type | Description |
|---|---|---|
| `platform` | string | Filter by platform (linkedin, naukri, etc.) |
| `prefId` | string | Filter by preference ID |
| `keyword` | string | Case-insensitive title search |
| `page` | number | Page number (default: 1) |
| `limit` | number | Results per page (default: 50) |
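For example, a hypothetical client-side helper could assemble the query string, applying the documented defaults (page 1, limit 50):

```javascript
// Hypothetical helper that builds a /api/jobs-list URL from the supported
// query params above, dropping unset filters.
function jobsListUrl(base, { platform, prefId, keyword, page = 1, limit = 50 } = {}) {
  const params = new URLSearchParams({ page: String(page), limit: String(limit) });
  if (platform) params.set('platform', platform);
  if (prefId) params.set('prefId', prefId);
  if (keyword) params.set('keyword', keyword);
  return `${base}/api/jobs-list?${params}`;
}

// jobsListUrl('http://localhost:3000', { platform: 'naukri', keyword: 'react' })
//   → 'http://localhost:3000/api/jobs-list?page=1&limit=50&platform=naukri&keyword=react'
```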
These endpoints trigger a one-off scrape without creating a preference, which is useful for testing:
| Endpoint | Platform | Key Query Params |
|---|---|---|
| `GET /api/jobs/search` | LinkedIn | keywords, location, datePosted, jobType |
| `GET /api/naukri/search` | Naukri | keywords, location, experience, datePosted |
| `GET /api/foundit/search` | Foundit | keywords, location, experienceMin, experienceMax, datePosted, jobType |
| `GET /api/indeed/search` | Indeed | keywords, location, datePosted |
```js
// Preference schema
{
  name: String,            // e.g. "Frontend Roles - Bangalore"
  filters: {
    keywords: String,      // Search query
    location: String,
    experience: String,    // LinkedIn / Naukri format
    experienceMin: Number, // Foundit min years
    experienceMax: Number, // Foundit max years
    datePosted: Number,    // Minutes (60, 1440, etc.)
    jobType: String,
  },
  platforms: [String],     // ['naukri', 'foundit', 'linkedin', 'indeed']
  repeatEvery: Number,     // Interval in ms (3min–60min)
  createdAt: Date,
}
```

```js
// Job schema
{
  platform: String,        // 'naukri' | 'foundit' | 'linkedin' | 'indeed'
  platformJobId: String,   // Unique job ID from source platform
  preferenceIds: [String],
  title: String,
  company: String,
  location: String,
  experience: String,
  salary: String,
  postedAt: String,        // ISO date string from source
  skills: [String],
  url: String,
  easyApply: Boolean,
  fetchedAt: Date,
}
```

AppliedJob is a snapshot of a job at the time it was marked applied. Its fields mirror `Job`, plus `appliedAt: Date`.
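The apply-job snapshot can be sketched as a plain field copy plus an `appliedAt` stamp. This is illustrative only; the real handler runs inside `POST /api/preferences/apply-job/:jobId` and also removes the job from the main feed.

```javascript
// Sketch of snapshotting a Job into an AppliedJob document. Field names
// follow the Job schema; the function name is hypothetical.
function snapshotAppliedJob(job, appliedAt = new Date()) {
  const { _id, ...fields } = job; // drop the Job's Mongo _id; AppliedJob gets its own
  return { ...fields, appliedAt };
}
```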
| Script | Command | Description |
|---|---|---|
| `start` | `node src/server.js` | Start API server (production) |
| `dev` | `nodemon src/server.js` | Start API server with auto-reload |
| `worker` | `node src/workers/index.js` | Start all BullMQ workers (production) |
| `dev:worker` | `nodemon src/workers/index.js` | Start workers with auto-reload |
| `reset` | `node scripts/reset.js` | Clear queue jobs only |
| `reset:all` | `node scripts/reset.js --all` | Clear queue jobs + MongoDB data |
| `install:browsers` | `npx playwright install chromium` | Install Playwright Chromium |
On every server startup, Jobzy checks all preferences in MongoDB and automatically re-enqueues any whose BullMQ repeatable jobs are missing (e.g. after a Redis flush or container restart). This makes the system resilient to infrastructure restarts with zero manual intervention.
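The check can be sketched as a diff between stored preferences and the repeatable-job names BullMQ currently knows about. Function and naming conventions here are assumptions for illustration.

```javascript
// Sketch of the self-healing pass: given all stored preferences and the
// names of the repeatable jobs currently registered in BullMQ, return the
// preferences that lost their queue entries and need re-enqueueing.
// The 'pref:<id>' job-name convention is an assumption.
function findMissingPreferences(preferences, repeatableJobNames) {
  const existing = new Set(repeatableJobNames);
  return preferences.filter((pref) => !existing.has(`pref:${pref._id}`));
}

// On startup the server would then re-enqueue each result:
//   for (const pref of findMissingPreferences(prefs, names)) await enqueuePreference(pref);
```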
| Variable | Required | Description |
|---|---|---|
| `PORT` | No | API server port (default: 3000) |
| `HEADLESS` | No | Set to `true` for headless Playwright (recommended in production) |
| `REDIS_URL` | Yes | Redis Cloud connection string (`rediss://...`) |
| `MONGODB_URI` | Yes | MongoDB Atlas connection string |
| `NEXT_PUBLIC_API_URL` | Yes (web) | API base URL visible to the browser (`http://localhost:3000` locally, `http://api:3000` in Docker) |
- Rate limiting: Scrapers use stealth mode but may hit CAPTCHA challenges on aggressive poll intervals. Keep `repeatEvery` at 15+ minutes for production use.
- LinkedIn: Requires a publicly accessible listing URL; sign-in-gated jobs are not scraped.
- Headless mode: Set `HEADLESS=false` locally to visually debug scrapers.
- Dedup TTL: Redis seen-job keys should be given a TTL in production; currently they persist indefinitely.
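One way to close the dedup-TTL gap: with a real Redis client the mark call would typically become a `SET` with an expiry option rather than a bare `SET`. The expiry semantics can be illustrated with a timestamp map; the 7-day window is an assumption, not a project default.

```javascript
// Illustration of TTL-based seen-job dedup using a Map of expiry
// timestamps. With Redis this would be a SET with an EX/PX option,
// letting the server expire keys instead of tracking them here.
const SEEN_TTL_MS = 7 * 24 * 60 * 60 * 1000; // assumed 7-day window
const seenUntil = new Map();

function markJobSeen(key, now = Date.now()) {
  seenUntil.set(key, now + SEEN_TTL_MS);
}

function isJobSeen(key, now = Date.now()) {
  const expiry = seenUntil.get(key);
  if (expiry === undefined) return false;
  if (now >= expiry) {
    seenUntil.delete(key); // expired, treat as unseen again
    return false;
  }
  return true;
}
```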
- Fork the repository
- Create a feature branch: `git checkout -b feature/my-feature`
- Commit your changes: `git commit -m 'feat: add my feature'`
- Push to the branch: `git push origin feature/my-feature`
- Open a Pull Request
MIT © 2025 Jobzy