
🚀 Jobzy — Automated Job Aggregator

Jobzy is a full-stack job aggregation platform that automatically scrapes job listings from multiple Indian and global job portals — LinkedIn, Naukri, Foundit (Monster India), and Indeed — on a configurable schedule, deduplicates them, and surfaces them in a clean Next.js dashboard.

You define Preferences (search criteria + platforms + polling interval), and Jobzy keeps your job feed fresh in the background — automatically.


✨ Features

  • 🔍 Multi-platform scraping — LinkedIn, Naukri, Foundit, Indeed scraped via Playwright + stealth mode
  • ⚙️ Preference-driven scheduling — set keywords, location, experience range, job type, and poll interval per preference
  • 📦 BullMQ queues — one queue per platform; repeatable background jobs run on your schedule
  • 🧠 Smart deduplication — Redis-backed seen-job cache prevents duplicate DB inserts
  • 🕐 24-hour freshness cap — stale jobs are filtered out regardless of datePosted preference
  • Applied-jobs tracking — mark jobs as applied; they disappear from the main feed automatically
  • 🗑️ Ignore jobs — permanently remove unwanted listings from your feed instantly
  • 📊 Stats API — aggregated job counts by platform, total preferences count
  • ♻️ Self-healing on restart — server re-enqueues any preferences that lost their queue entries
  • 🌐 Next.js frontend — job feed, applied-jobs list, preference management dashboard
  • 🐳 Docker Compose — one command to spin up API, workers, and frontend together

🏗️ Architecture

┌──────────────────────────────────────────────────────┐
│                    Docker Compose                    │
│                                                      │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────┐  │
│  │  Next.js     │  │  Express API │  │  BullMQ   │  │
│  │  Frontend    │→ │  (port 3000) │  │  Workers  │  │
│  │  (port 3001) │  └──────┬───────┘  └─────┬─────┘  │
│  └──────────────┘         │                │        │
└──────────────────────────────────────────────────────┘
                            │                │
              ┌─────────────┴──┐     ┌───────┴──────┐
              │  MongoDB Atlas │     │  Redis Cloud │
              │  (Jobs, Prefs, │     │  (Queues +   │
              │   AppliedJobs) │     │   Dedup cache│
              └────────────────┘     └──────────────┘

Components

| Component     | Tech                    | Purpose                                       |
| ------------- | ----------------------- | --------------------------------------------- |
| API Server    | Express.js              | REST API for preferences, jobs, stats, health |
| Workers       | BullMQ + Node.js        | Background scraping jobs per platform         |
| Scrapers      | Playwright + stealth    | Browser automation for each job portal        |
| Database      | MongoDB Atlas           | Stores jobs, preferences, applied jobs        |
| Queue / Cache | Redis Cloud             | BullMQ job queues + seen-job dedup            |
| Frontend      | Next.js 14 (App Router) | Dashboard UI                                  |

📁 Project Structure

jobzy/
├── src/
│   ├── server.js               # Express app entry point
│   ├── browser.js              # Playwright browser factory (stealth mode)
│   ├── db/
│   │   ├── mongoose.js         # MongoDB connection
│   │   └── models/
│   │       ├── Preference.js   # Search preference schema
│   │       ├── Job.js          # Scraped job schema
│   │       └── AppliedJob.js   # Applied-job snapshot schema
│   ├── routes/
│   │   ├── jobs.js             # LinkedIn scraper route
│   │   ├── naukri.js           # Naukri scraper route
│   │   ├── foundit.js          # Foundit scraper route
│   │   ├── indeed.js           # Indeed scraper route
│   │   └── preferences.js      # Preference CRUD + jobs-list + applied-jobs
│   ├── scrapers/
│   │   ├── linkedin.js         # LinkedIn scraper
│   │   ├── naukri.js           # Naukri scraper
│   │   ├── foundit.js          # Foundit scraper
│   │   └── indeed.js           # Indeed scraper
│   ├── workers/
│   │   ├── index.js            # Worker process entry point (starts all 4 workers)
│   │   ├── linkedin.worker.js
│   │   ├── naukri.worker.js
│   │   ├── foundit.worker.js
│   │   ├── indeed.worker.js
│   │   └── extractors.js       # Platform-specific job ID extractors
│   ├── queues/
│   │   └── index.js            # BullMQ queue definitions & helpers
│   ├── dedup/
│   │   └── redis.js            # Redis client + isJobSeen / markJobSeen helpers
│   └── utils/
│       └── jobFreshness.js     # 24h freshness filter
├── web/                        # Next.js frontend
│   ├── app/
│   │   ├── page.tsx            # Main jobs dashboard
│   │   ├── layout.tsx
│   │   └── jobs/               # Job detail pages
│   ├── components/             # Reusable UI components
│   ├── lib/                    # API helper utilities
│   └── next.config.ts
├── scripts/
│   └── reset.js                # CLI tool to clear queue / DB data
├── Dockerfile                  # API + Worker image
├── docker-compose.yml
├── .env.example
└── package.json

⚙️ Prerequisites

  • Node.js v18+ and npm
  • MongoDB Atlas free cluster (atlas.mongodb.com)
  • Redis Cloud free subscription (redis.io)
  • Docker (optional, for containerised deployment)

🧩 How It Works — End to End

  1. Create a Preference via the dashboard or API (POST /api/preferences).
  2. The API saves it to MongoDB and calls enqueuePreference(), which adds a repeatable BullMQ job to each selected platform's queue with repeat: { every: repeatEvery, immediately: true }.
  3. The Worker process (running separately) picks up jobs from each queue.
  4. Each worker calls the corresponding Playwright scraper, which launches a headless Chromium browser in stealth mode to scrape results.
  5. For each scraped job:
    • A unique platformJobId is extracted from the URL.
    • The Redis seen-job cache (isJobSeen) is checked — duplicates are skipped.
    • A 24-hour freshness check filters out old jobs.
    • Fresh, unseen jobs are inserted into MongoDB and marked in Redis (markJobSeen).
  6. The frontend polls /api/jobs-list and displays the aggregated, deduplicated feed.
  7. Clicking Apply calls POST /api/preferences/apply-job/:id, which snapshots the job to AppliedJob and removes it from the main feed.
  8. Clicking Ignore calls DELETE /api/preferences/job/:id, which hard-deletes the record.
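
The per-job handling in step 5 can be sketched as follows. This is a minimal, runnable illustration: the Redis cache and MongoDB insert are replaced with in-memory stand-ins, and only the helper names `isJobSeen` / `markJobSeen` and the 24-hour window come from the repository — the rest is assumed for the sketch.

```javascript
const seen = new Set();                    // stands in for the Redis seen-job cache
const db = [];                             // stands in for the MongoDB Job collection
const FRESHNESS_MS = 24 * 60 * 60 * 1000;  // 24-hour freshness cap

function isJobSeen(platform, platformJobId) {
  return seen.has(`${platform}:${platformJobId}`);
}

function markJobSeen(platform, platformJobId) {
  seen.add(`${platform}:${platformJobId}`);
}

function isFresh(postedAt, now = Date.now()) {
  return now - new Date(postedAt).getTime() <= FRESHNESS_MS;
}

function processScrapedJob(job) {
  if (isJobSeen(job.platform, job.platformJobId)) return 'duplicate';
  if (!isFresh(job.postedAt)) return 'stale';
  db.push(job);                                      // insert fresh, unseen job
  markJobSeen(job.platform, job.platformJobId);      // so later polls skip it
  return 'inserted';
}
```

The ordering matters: the seen-check runs before the freshness check, so a duplicate is skipped without re-parsing its date.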

🛠️ Local Setup

1. Clone & Install

git clone https://github.com/your-username/jobzy.git
cd jobzy
npm install

2. Install Playwright Browser

npm run install:browsers
# Equivalent to: npx playwright install chromium

3. Configure Environment

cp .env.example .env

Edit .env:

PORT=3000
HEADLESS=true

# Redis Cloud — use rediss:// (double-s) for TLS
REDIS_URL=rediss://default:<password>@<host>:<port>

# MongoDB Atlas SRV connection string
MONGODB_URI=mongodb+srv://<username>:<password>@cluster0.xxxxx.mongodb.net/jobzy?retryWrites=true&w=majority

# Next.js — API base URL seen by the browser
NEXT_PUBLIC_API_URL=http://localhost:3000

4. Start the API Server

npm run dev
# or for production:
npm start

5. Start the Background Workers (separate terminal)

npm run dev:worker
# or for production:
npm run worker

6. Start the Frontend

cd web
npm install
npm run dev     # Runs on http://localhost:3001

🐳 Docker Compose (Recommended)

Spins up the API, worker, and Next.js frontend in one command:

# Build and start all services
docker compose up --build

# Run in the background
docker compose up -d --build

| Service | Port | Description                               |
| ------- | ---- | ----------------------------------------- |
| api     | 3000 | Express REST API                          |
| worker  |      | BullMQ background workers (no HTTP port)  |
| web     | 3001 | Next.js frontend                          |

Note: Redis and MongoDB run on the cloud. Only REDIS_URL and MONGODB_URI in .env are needed — no local Redis/Mongo containers required.


📡 API Reference

Health Check

GET /health

Returns API status and a summary of all available endpoints.


Preferences

Preferences are the core concept — each defines what to search, where, and how often.

| Method | Endpoint                    | Description                                 |
| ------ | --------------------------- | ------------------------------------------- |
| GET    | /api/preferences            | List all preferences with live queue status |
| POST   | /api/preferences            | Create a new preference                     |
| DELETE | /api/preferences/:id        | Delete preference and remove from queue     |
| POST   | /api/preferences/:id/start  | Resume scraping for a preference            |
| POST   | /api/preferences/:id/pause  | Pause scraping (keeps DB record)            |
| GET    | /api/preferences/:id/status | Get live queue status (active \| paused)    |
| GET    | /api/preferences/stats      | Job counts per platform + total preferences |

Create Preference — Request Body

{
  "name": "Senior React Developer",
  "filters": {
    "keywords": "React Node.js",
    "location": "Bangalore",
    "experience": "2-5",
    "experienceMin": 2,
    "experienceMax": 5,
    "datePosted": 60,
    "jobType": "fulltime"
  },
  "platforms": ["naukri", "foundit", "linkedin", "indeed"],
  "repeatEvery": 600000,
  "startNow": true
}

| Field                 | Type    | Description                                                  |
| --------------------- | ------- | ------------------------------------------------------------ |
| name                  | string  | Required. Human-readable label                               |
| filters.keywords      | string  | Job title / skills to search                                 |
| filters.location      | string  | City or region                                               |
| filters.experience    | string  | Experience string for LinkedIn/Naukri (e.g. "2-5")           |
| filters.experienceMin | number  | Min years of experience (Foundit)                            |
| filters.experienceMax | number  | Max years of experience (Foundit)                            |
| filters.datePosted    | number  | Age filter in minutes (e.g. 60 = last hour, 1440 = last day) |
| filters.jobType       | string  | Job type filter                                              |
| platforms             | array   | Platforms to scrape. Defaults to all four                    |
| repeatEvery           | number  | Poll interval in ms. Clamped to 3–60 minutes                 |
| startNow              | boolean | Auto-start on creation. Defaults to true                     |
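The repeatEvery clamp described above can be sketched as a one-liner. The 3–60 minute bounds come from the field description; the function and constant names here are hypothetical, not the repository's actual identifiers.

```javascript
// Clamp a requested poll interval (ms) to the allowed 3–60 minute window.
const MIN_REPEAT_MS = 3 * 60 * 1000;   // 3 minutes
const MAX_REPEAT_MS = 60 * 60 * 1000;  // 60 minutes

function clampRepeatEvery(ms) {
  return Math.min(MAX_REPEAT_MS, Math.max(MIN_REPEAT_MS, ms));
}

// The 600000 (10-minute) interval from the example request body
// already sits inside the window, so it passes through unchanged.
```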

Jobs Feed

| Method | Endpoint                          | Description                                |
| ------ | --------------------------------- | ------------------------------------------ |
| GET    | /api/jobs-list                    | Paginated job feed (excludes applied jobs) |
| POST   | /api/preferences/apply-job/:jobId | Mark a job as applied                      |
| GET    | /api/preferences/applied-jobs     | List all applied jobs                      |
| DELETE | /api/preferences/job/:jobId       | Permanently delete/ignore a job            |

Jobs List Query Parameters

| Param    | Type   | Description                                 |
| -------- | ------ | ------------------------------------------- |
| platform | string | Filter by platform (linkedin, naukri, etc.) |
| prefId   | string | Filter by preference ID                     |
| keyword  | string | Case-insensitive title search               |
| page     | number | Page number (default: 1)                    |
| limit    | number | Results per page (default: 50)              |
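A jobs-list request with these parameters might be assembled like this. `URLSearchParams` is standard in Node 18+; the base URL mirrors the NEXT_PUBLIC_API_URL value from the setup section, and the parameter values are just examples.

```javascript
// Build a /api/jobs-list URL from the documented query parameters.
const params = new URLSearchParams({
  platform: 'linkedin',
  keyword: 'react',
  page: '1',
  limit: '50',
});

const url = `http://localhost:3000/api/jobs-list?${params}`;
// → http://localhost:3000/api/jobs-list?platform=linkedin&keyword=react&page=1&limit=50
```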

On-Demand Scraper Endpoints

Trigger a one-off scrape without creating a preference (useful for testing):

| Endpoint                | Platform | Key Query Params                                                      |
| ----------------------- | -------- | --------------------------------------------------------------------- |
| GET /api/jobs/search    | LinkedIn | keywords, location, datePosted, jobType                               |
| GET /api/naukri/search  | Naukri   | keywords, location, experience, datePosted                            |
| GET /api/foundit/search | Foundit  | keywords, location, experienceMin, experienceMax, datePosted, jobType |
| GET /api/indeed/search  | Indeed   | keywords, location, datePosted                                        |

🗄️ Data Models

Preference

{
  name: String,            // e.g. "Frontend Roles - Bangalore"
  filters: {
    keywords:      String, // Search query
    location:      String,
    experience:    String, // LinkedIn / Naukri format
    experienceMin: Number, // Foundit min years
    experienceMax: Number, // Foundit max years
    datePosted:    Number, // Minutes (60, 1440, etc.)
    jobType:       String,
  },
  platforms: [String],    // ['naukri', 'foundit', 'linkedin', 'indeed']
  repeatEvery: Number,    // Interval in ms (3min–60min)
  createdAt: Date,
}

Job

{
  platform: String,       // 'naukri' | 'foundit' | 'linkedin' | 'indeed'
  platformJobId: String,  // Unique job ID from source platform
  preferenceIds: [String],
  title: String,
  company: String,
  location: String,
  experience: String,
  salary: String,
  postedAt: String,       // ISO date string from source
  skills: [String],
  url: String,
  easyApply: Boolean,
  fetchedAt: Date,
}
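
The platformJobId field above is derived from each listing URL by extractors.js. A sketch for a LinkedIn-style `/jobs/view/<id>` URL is below; the URL shape and function name are illustrative assumptions, not the repository's actual extractor code.

```javascript
// Illustrative extractor: pull the numeric job ID from a
// LinkedIn-style listing URL such as
//   https://www.linkedin.com/jobs/view/3954210987/
// The real extractors.js may use different patterns per platform.
function extractLinkedInJobId(jobUrl) {
  const match = new URL(jobUrl).pathname.match(/\/jobs\/view\/(\d+)/);
  return match ? match[1] : null;
}
```

Returning null for non-matching URLs lets a worker skip listings it cannot identify instead of inserting them without a dedup key.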

AppliedJob

Snapshot of a job at the time it was marked applied. Fields mirror Job. Includes appliedAt: Date.
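
The apply-job snapshot from step 7 of the end-to-end flow can be sketched as plain-object logic. The real route works against the Mongoose Job and AppliedJob models; this in-memory version only illustrates the shape of the operation.

```javascript
// Snapshot a job into an AppliedJob-shaped record and drop it from
// the live feed (in-memory illustration of POST /apply-job/:id).
function applyJob(feed, jobId) {
  const idx = feed.findIndex((j) => j._id === jobId);
  if (idx === -1) return null;                 // unknown job ID
  const [job] = feed.splice(idx, 1);           // remove from main feed
  return { ...job, appliedAt: new Date() };    // AppliedJob mirrors Job + appliedAt
}
```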


🔧 NPM Scripts

| Script           | Command                         | Description                           |
| ---------------- | ------------------------------- | ------------------------------------- |
| start            | node src/server.js              | Start API server (production)         |
| dev              | nodemon src/server.js           | Start API server with auto-reload     |
| worker           | node src/workers/index.js       | Start all BullMQ workers (production) |
| dev:worker       | nodemon src/workers/index.js    | Start workers with auto-reload        |
| reset            | node scripts/reset.js           | Clear queue jobs only                 |
| reset:all        | node scripts/reset.js --all     | Clear queue jobs + MongoDB data       |
| install:browsers | npx playwright install chromium | Install Playwright Chromium           |

🔄 Self-Healing Queue

On every server startup, Jobzy checks all preferences in MongoDB and automatically re-enqueues any whose BullMQ repeatable jobs are missing (e.g. after a Redis flush or container restart). This makes the system resilient to infrastructure restarts with zero manual intervention.
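
The startup check reduces to a set difference. In the sketch below, BullMQ's repeatable-job listing is replaced by a plain Set of preference IDs that still have queue entries; the function name is hypothetical.

```javascript
// Given all preferences from MongoDB and the preference IDs that
// currently have repeatable jobs in BullMQ, return the ones that
// need to be re-enqueued after a Redis flush or container restart.
function findMissingPreferences(preferences, queuedIds) {
  return preferences.filter((p) => !queuedIds.has(p._id));
}

// On startup the server would call enqueuePreference() for each
// preference this returns.
```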


🌍 Environment Variables Reference

| Variable            | Required  | Description                                                                                    |
| ------------------- | --------- | ---------------------------------------------------------------------------------------------- |
| PORT                | No        | API server port (default: 3000)                                                                |
| HEADLESS            | No        | Set to true for headless Playwright (recommended in production)                                |
| REDIS_URL           | Yes       | Redis Cloud connection string (rediss://...)                                                   |
| MONGODB_URI         | Yes       | MongoDB Atlas connection string                                                                |
| NEXT_PUBLIC_API_URL | Yes (web) | API base URL visible to the browser (http://localhost:3000 locally, http://api:3000 in Docker) |

🚧 Limitations & Known Behaviour

  • Rate limiting: Scrapers use stealth mode but may get CAPTCHA challenges on aggressive poll intervals. Keep repeatEvery at 15+ minutes for production use.
  • LinkedIn: Requires a publicly accessible listing URL; sign-in-gated jobs are not scraped.
  • Headless mode: Set HEADLESS=false locally to visually debug scrapers.
  • Dedup TTL: Redis seen-job keys should be given a TTL in production; currently they persist indefinitely.
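
The dedup-TTL fix from the last point can be illustrated in memory. In production this would be a Redis SET with an expiry option instead of a Map; the 48-hour TTL below is an assumption, chosen only because it must exceed the 24-hour freshness window so a still-fresh job can never be re-inserted after its key expires.

```javascript
// In-memory illustration of giving seen-job keys a TTL.
const seen = new Map();                  // key -> expiry timestamp (ms)
const SEEN_TTL_MS = 48 * 60 * 60 * 1000; // assumption: 48h > 24h freshness cap

function markJobSeen(key, now = Date.now()) {
  seen.set(key, now + SEEN_TTL_MS);
}

function isJobSeen(key, now = Date.now()) {
  const expiry = seen.get(key);
  if (expiry === undefined) return false;
  if (now > expiry) {                    // lazily evict expired keys
    seen.delete(key);
    return false;
  }
  return true;
}
```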

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/my-feature
  3. Commit your changes: git commit -m 'feat: add my feature'
  4. Push to the branch: git push origin feature/my-feature
  5. Open a Pull Request

📄 License

MIT © 2025 Jobzy
