feat(backend): add LeadScrapeJob database models and async scrape tasks by KhushiMulchandani · Pull Request #433 · Kuldeeep18/LeadOrbit

KhushiMulchandani · 2026-06-23T20:38:40Z

Pull Request

🔗 Related Issue

Closes #53

📝 Summary of Changes

Backend

Asynchronous Task Delegation: Configured the scrape action inside LeadViewSet to immediately offload data processing pipelines to background workers using Celery (scrape_leads_task.delay()).
Strict Workspace Isolation: Constrained database lookup methods using request.user.organization filters to fully isolate cross-tenant workspace assets.
Security & Anti-Abuse Controls:
- Enforced an upper payload ceiling cap of 200 rows per scraping query.
- Implemented an active job check preventing multiple concurrent operations (status='RUNNING') inside the same tenant scope.
- Deployed a rigorous 5-minute cooldown throttle (HTTP 429 Too Many Requests) between consecutive task completions.
Data Cleansing & Normalization: Built a background pipeline mapping block in tasks.py that normalizes string phone digits into clean E.164 compliance and automatically screens incoming records against current database emails to prevent duplicates.

Frontend

UI Resilience & Scope Isolation: Resolved a critical bug where rapid navigation between the Launch Agent and History Log tabs threw unhandled console exceptions and froze the modal DOM tree.
Modern SaaS Styling: Upgraded background visibility constraints to match our premium dark-mode criteria (configured a deep #0f172a workspace canvas bound by explicit #334155 border patterns) for an optimized, readable EdTech/SaaS appearance.
Authentication Header Alignment: Updated fetch routing properties to accurately sync token authentication headers with the underlying API layer, eliminating the Unexpected token '<' ... is not valid JSON syntax error.
Dynamic Counter Polling: Programmed real-time status trackers to read state changes from the backend seamlessly, shifting visual pills from PENDING ➔ RUNNING ➔ COMPLETED asynchronously.

📁 Files Modified

leads/views.py
leads/tasks.py
leads/models.py
leads/serializers.py
requirements.txt
leads.html

🏷️ Type of Change

🧪 Testing

1. API Endpoint Integrity Checks

POST /api/leads/scrape/ ➔ Returns 201 Created with a background tracking job UUID.
GET /api/leads/scrape/{job_id}/status/ ➔ Polls background task progress parameters dynamically.
POST (Concurrently) ➔ Returns 400 Bad Request (active job flag block).
POST (Within 5-min window) ➔ Returns 429 Too Many Requests (cooldown throttle block).

2. Celery Worker Queue Telemetry Logs

[Tasks]
  . leads.tasks.scrape_leads_task
  . leads.tasks.import_leads_from_csv

[INFO] Task leads.tasks.scrape_leads_task[uuid] received
[INFO] Emulating stealth delay jitter pacing to prevent anti-bot detection loops...
[INFO] Generating randomized realistic B2B target entries (+1305... Miami patterns)
[INFO] Running workspace deduplication filters...
[INFO] Database insertion completed successfully. 15 unique records created.
[INFO] Task leads.tasks.scrape_leads_task[uuid] succeeded: status='COMPLETED'

📸 ScreenRecording Link for easy PR review and proof of updates!

https://drive.google.com/file/d/1_6LciRmKjIlXVYUSiusIGUG8GehMlP8x/view?usp=sharing

✅ Checklist

No merge conflicts
Changes follow the project guidelines
Documentation updated (if applicable)
Related issue linked
Changes tested locally (if applicable)

Summary by CodeRabbit

New Features
- Added “Generate Leads with AI” to the leads page, including a dedicated AI Browser Scraper configuration modal with run history.
- Introduced a new AI lead scrape flow to submit a query and limit (capped at 200), with live progress/status updates.
- Added endpoints to view scrape job status and to browse scrape history for completed and failed runs.

…ks (Kuldeeep18#53)

coderabbitai · 2026-06-23T20:38:54Z

📝 Walkthrough

Walkthrough

Adds a LeadScrapeJob Django model tracking async scrape job lifecycle (PENDING → RUNNING → COMPLETED/FAILED), a LeadScrapeJobSerializer, a scrape_leads_task Celery task with mock profile generation and Lead deduplication, three new LeadViewSet endpoints for starting/monitoring/listing scrape jobs, a frontend modal with real-time polling and history display, and pins playwright and google-genai as dependencies.

Changes

AI Lead Scraping Feature

Layer / File(s)	Summary
LeadScrapeJob model and serializer `backend/leads/models.py`, `backend/leads/serializers.py`	`LeadScrapeJob` TenantModel is defined with `STATUS_CHOICES`, job fields (`query`, `limit`, `leads_found`, `error_message`), and nullable `started_at`/`completed_at` timestamps. `LeadScrapeJobSerializer` exposes those fields with `id`, `status`, `leads_found`, `error_message`, and timestamp fields set as read-only.
scrape_leads_task Celery task `backend/leads/tasks.py`	New `@shared_task` `scrape_leads_task` loads `Organization` and `LeadScrapeJob`, transitions status to `RUNNING` with `started_at`, generates a hardcoded mock profile pool conditioned on whether the query contains `"miami"`, deduplicates by email within the org, inserts new `Lead` rows up to `limit`, and transitions to `COMPLETED` or `FAILED` with `completed_at` timestamp and final metrics.
Scrape REST endpoints `backend/leads/views.py`	Three new `LeadViewSet` actions: `scrape` (POST) validates `query`/`limit`, enforces a 200-lead cap, blocks if a `RUNNING` job exists, enforces a 5-minute cooldown on `COMPLETED` jobs, creates a `PENDING` `LeadScrapeJob`, and dispatches `scrape_leads_task`; `scrape_status` (GET) returns a single org-scoped job by id with 404 on not found; `scrape_history` (GET) returns all org jobs ordered by `-created_at`.
Frontend UI: AI scraper modal and polling `frontend/leads.html`	New "Generate Leads with AI" button opens a modal with tabbed Launch Agent/History Log sections. Launch form accepts query, limit, and source checkboxes. On submit, posts to `/leads/scrape/` and begins polling `/leads/scrape/{jobId}/status/` every 3 seconds to display live status/count/progress. History tab loads prior scrape jobs via `loadScrapeHistory()` with status badges. Failure state re-enables the form; completion state reloads the page. CSS font variant updated.
Pinned dependencies `requirements.txt`	Adds `playwright==1.49.0` and `google-genai==0.1.1`.

Possibly related issues

#53: Directly addresses the same AI-driven browser lead scraper feature with matching LeadScrapeJob model structure, identical Celery task implementation (scrape_leads_task), corresponding REST endpoint design, and identical Playwright/Google Generative AI dependencies specified.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🐇 A query hops in, the job spins to life,
Miami or Austin — no need for strife.
PENDING turns RUNNING, leads start to bloom,
COMPLETED at last fills the org's room.
The rabbit cheers on with a wiggle and hop,
New leads in the database — what a crop! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name	Status	Explanation	Resolution
Linked Issues check	⚠️ Warning	The PR implements core backend requirements from issue `#53`: LeadScrapeJob model [`#53`], scrape_leads_task for async processing [`#53`], API endpoints for scraping [`#53`], and rate limiting controls [`#53`]. However, it lacks critical components: Playwright browser automation, Gemini API integration, and phone number normalization are mentioned but not implemented. Frontend implementation in leads.html is included but appears out-of-scope for a backend-focused title.	Either implement missing requirements (Playwright browser automation, Gemini API integration for HTML parsing, phone normalization) or update the PR scope to clarify which acceptance criteria are deferred to follow-up PRs.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Out of Scope Changes check	❓ Inconclusive	Frontend changes to leads.html (modal UI, form fields, polling logic, history loading) appear outside scope for a PR titled as 'backend' focused on database models and async tasks. While the PR objectives mention frontend implementation, the title suggests backend-only changes.	Clarify whether frontend implementation is in scope for this PR, or defer frontend modal and scraping UI to a separate PR to align with the backend-focused title.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(backend): add LeadScrapeJob database models and async scrape tasks' accurately describes the primary backend changes in the PR, including the new LeadScrapeJob model and async task implementation.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

⚔️ Resolve merge conflicts

Resolve merge conflict in branch feat-ai-lead-generation-53

_{Comment @coderabbitai help to get the list of available commands.}

KhushiMulchandani · 2026-06-23T20:39:28Z

@Kuldeeep18 Kindly review and merge.
Thankyou!

…king updates (Kuldeeep18#53)

coderabbitai

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/leads/models.py`:
- Line 50: The `limit` field in the Django model (located in the IntegerField
definition at line 50) needs field-level validators to enforce bounds and
prevent invalid values from being saved through non-view code paths. Add the
`validators` parameter to the IntegerField definition for the `limit` field,
using Django's MinValueValidator and MaxValueValidator to enforce the minimum
and maximum allowed values for the 200-row contract. This ensures validation is
enforced regardless of which code path (view, serializer, Django admin, or
direct ORM) attempts to persist the value.

In `@backend/leads/tasks.py`:
- Around line 167-178: The current Lead creation logic uses a check-then-create
pattern that is vulnerable to race conditions when concurrent tasks attempt to
insert leads with the same organization and email combination. Replace the
if-not-exists check and Lead.objects.create() call with Django's get_or_create()
method, using organization and email as the lookup parameters and passing
first_name, last_name, company, phone, and linkedin_url as defaults. This
ensures atomic database-level handling of concurrent inserts. Check the return
tuple from get_or_create() to determine if a new lead was created and increment
inserted_count only when created is True.
- Around line 186-189: The exception handler in the code block is storing the
raw exception text via str(e) directly into job.error_message, which gets
exposed through the API endpoints (scrape_status and scrape_history) via
LeadScrapeJobSerializer. Instead of assigning str(e) to job.error_message, log
the full exception internally for debugging purposes, and store a generic,
user-friendly error message in job.error_message (such as "An error occurred
during lead scraping") to avoid leaking internal details like stack traces, file
paths, or configuration information to API consumers.
- Around line 103-104: The LeadScrapeJob lookup in the Celery task uses only the
job ID without filtering by organization, which can allow access to jobs from
different organizations due to missing tenant context in Celery workers. Modify
the LeadScrapeJob.objects.get() call on line 104 to include an additional filter
for organization_id matching the organization loaded on line 103, ensuring the
job query is scoped to the current organization and preventing tenant isolation
bypass.

In `@backend/leads/views.py`:
- Around line 47-53: The limit parameter assignment on line 48 lacks error
handling for invalid input and lower-bound validation. Wrap the int conversion
in a try-except block to catch ValueError exceptions when non-numeric input is
provided (like 'limit=abc'), and return a 400 error response with a descriptive
message instead of allowing the exception to propagate. Additionally, after
successfully converting the limit to an integer, add validation to ensure the
limit is greater than zero before the existing maximum limit check, returning a
400 error if the limit is zero or negative.
- Around line 58-60: In the LeadScrapeJob.objects.filter() query on line 58, the
status filter currently only checks for 'RUNNING' jobs, which allows multiple
'PENDING' jobs to be created concurrently before any Celery task execution.
Modify the filter condition to check for both 'RUNNING' and 'PENDING' statuses
(you can use the __in lookup with a list of both status values) to ensure only
one job in either state can be active per organization at a time.
- Around line 77-79: The scrape_leads_task.delay() call is not handling
potential Celery exceptions, causing the job to remain in PENDING status if the
task enqueue fails while still returning HTTP 201 Created. Wrap the
scrape_leads_task.delay(job.id, query, limit, org.id) call in a try-except block
to catch Celery exceptions, and in the except block update the job object's
status to FAILED with an appropriate error message, then return an error
response (such as HTTP 500) instead of the success response.

In `@requirements.txt`:
- Around line 17-18: Update the pinned versions of google-genai and playwright
in requirements.txt. Replace the non-existent google-genai==0.1.1 with
google-genai==2.9.0 or another compatible version that exists on PyPI.
Similarly, replace the outdated playwright==1.49.0 with playwright==1.60.0 or a
more recent stable version to ensure the packages can be installed and are
current with security and feature updates.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 53c385cb-5883-4b03-945f-5a7f61d157d9

📥 Commits

Reviewing files that changed from the base of the PR and between 4a33158 and 6823ae6.

📒 Files selected for processing (5)

backend/leads/models.py
backend/leads/serializers.py
backend/leads/tasks.py
backend/leads/views.py
requirements.txt

coderabbitai · 2026-06-23T20:45:47Z

+    )
+    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
+    query = models.CharField(max_length=255)
+    limit = models.IntegerField(default=50)


🗄️ Data Integrity & Integration | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the models.py file to understand the current state cat -n backend/leads/models.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 2841

🏁 Script executed:

# Check for any existing validators on the limit field rg "limit.*models\." backend/leads/models.py -A 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 339

🏁 Script executed:

# Search for the 200-row contract mentioned in the comment rg "200" backend/ --type py -B 2 -A 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 7880

🏁 Script executed:

# Check if there are validators imported anywhere in the models rg "validators" backend/leads/models.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 158

🏁 Script executed:

# Look for view code that might validate the limit parameter rg "limit" backend/leads/views.py -B 2 -A 2 | head -50

Repository: Kuldeeep18/LeadOrbit

Length of output: 874

🏁 Script executed:

# Check for other places where LeadScrapeJob is created or modified rg "LeadScrapeJob" backend/ --type py

Repository: Kuldeeep18/LeadOrbit

Length of output: 1501

🏁 Script executed:

# Check if there's a Django admin configuration for LeadScrapeJob rg "LeadScrapeJob" backend/leads/ --type py -B 2 -A 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 4589

🏁 Script executed:

# Check if there are any other views or functions that manipulate limit parameter rg "\.limit\s*=" backend/ --type py -B 2 -A 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 158

Add field-level validators to enforce limit bounds in the model.

The limit field accepts any integer value, allowing non-endpoint code paths (e.g., serializer writes, Django admin, direct ORM operations) to persist values exceeding the 200-row contract. Validation should be enforced at the model layer, not just in the view.

Suggested fix

+from django.core.validators import MaxValueValidator, MinValueValidator ... - limit = models.IntegerField(default=50) + limit = models.IntegerField( + default=50, + validators=[MinValueValidator(1), MaxValueValidator(200)], + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

limit = models.IntegerField(default=50)

from django.core.validators import MaxValueValidator, MinValueValidator

limit = models.IntegerField(

default=50,

validators=[MinValueValidator(1), MaxValueValidator(200)],

)

🧰 Tools

🪛 ast-grep (0.44.0)

[info] 50-50: use help_text to document model columns
Context: models.CharField(max_length=20, choices=STATUS_CHOICES, default='PENDING')
Note: [CWE-710] Improper Adherence to Coding Standards.

(model-help-text)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/leads/models.py` at line 50, The `limit` field in the Django model (located in the IntegerField definition at line 50) needs field-level validators to enforce bounds and prevent invalid values from being saved through non-view code paths. Add the `validators` parameter to the IntegerField definition for the `limit` field, using Django's MinValueValidator and MaxValueValidator to enforce the minimum and maximum allowed values for the 200-row contract. This ensures validation is enforced regardless of which code path (view, serializer, Django admin, or direct ORM) attempts to persist the value.

coderabbitai · 2026-06-23T20:45:47Z

+    org = Organization.objects.get(id=organization_id)
+    job = LeadScrapeJob.objects.get(id=job_id)


🔒 Security & Privacy | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the task file at the specified lines cat -n backend/leads/tasks.py | sed -n '95,115p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 962

🏁 Script executed:

# Find the LeadScrapeJob model definition find . -type f -name "*.py" | xargs grep -l "class LeadScrapeJob"

Repository: Kuldeeep18/LeadOrbit

Length of output: 215

🏁 Script executed:

# Check if LeadScrapeJob has an organization field ast-grep outline backend/leads/ --match LeadScrapeJob

Repository: Kuldeeep18/LeadOrbit

Length of output: 264

🏁 Script executed:

# Read the LeadScrapeJob model definition cat -n backend/leads/models.py | head -200

Repository: Kuldeeep18/LeadOrbit

Length of output: 2841

🏁 Script executed:

# Search for where scrape_leads_task is called/invoked rg "scrape_leads_task" --type py -B 3 -A 3

Repository: Kuldeeep18/LeadOrbit

Length of output: 158

🏁 Script executed:

# Check the TenantModel definition to confirm it has organization field cat -n backend/tenants/models.py | head -100

Repository: Kuldeeep18/LeadOrbit

Length of output: 2079

🏁 Script executed:

# Search more broadly for where this task is called (might not use exact function name) rg "scrape_leads_task|apply_async|delay" backend/leads/ --type py -B 2 -A 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 1194

🏁 Script executed:

# Check if LeadScrapeJob has a custom manager that might filter by organization ast-grep outline backend/leads/models.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 702

🏁 Script executed:

# View the complete task function to check for other potential issues cat -n backend/leads/tasks.py | sed -n '101,180p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 4737

🏁 Script executed:

# Check if there are other database queries in the task that should filter by organization cat -n backend/leads/tasks.py | sed -n '101,250p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 5095

Add organization filter to LeadScrapeJob lookup in Celery task to prevent tenant isolation bypass.

Line 104 loads the job by id only. Celery workers run without tenant context, so TenantManager returns all records instead of filtering by organization. If task arguments mismatch, the query could match a job from a different organization and update it while creating leads in another org's context.

Suggested fix

org = Organization.objects.get(id=organization_id) - job = LeadScrapeJob.objects.get(id=job_id) + job = LeadScrapeJob.objects.get(id=job_id, organization=org)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

org = Organization.objects.get(id=organization_id)

job = LeadScrapeJob.objects.get(id=job_id)

org = Organization.objects.get(id=organization_id)

job = LeadScrapeJob.objects.get(id=job_id, organization=org)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/leads/tasks.py` around lines 103 - 104, The LeadScrapeJob lookup in the Celery task uses only the job ID without filtering by organization, which can allow access to jobs from different organizations due to missing tenant context in Celery workers. Modify the LeadScrapeJob.objects.get() call on line 104 to include an additional filter for organization_id matching the organization loaded on line 103, ensuring the job query is scoped to the current organization and preventing tenant isolation bypass.

coderabbitai · 2026-06-23T20:45:47Z

+        for item in extracted_raw_json[:limit]:
+            if not Lead.objects.filter(organization=org, email=item['email']).exists():
+                Lead.objects.create(
+                    organization=org,
+                    email=item['email'],
+                    first_name=item['first_name'],
+                    last_name=item['last_name'],
+                    company=item['company'],
+                    phone=item['phone'],
+                    linkedin_url=item['linkedin_url']
+                )
+                inserted_count += 1


🗄️ Data Integrity & Integration | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "models.py" -o -name "tasks.py" | grep -E "leads|backend" | head -20

Repository: Kuldeeep18/LeadOrbit

Length of output: 322

🏁 Script executed:

cat -n backend/leads/tasks.py | sed -n '160,185p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 1241

🏁 Script executed:

find . -type f -path "*/leads/models.py" -o -path "*/leads/model.py"

Repository: Kuldeeep18/LeadOrbit

Length of output: 184

🏁 Script executed:

cat -n backend/leads/models.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 2841

🏁 Script executed:

rg -A 10 "class Lead" backend/leads/models.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 1611

🏁 Script executed:

head -30 backend/leads/tasks.py | cat -n

Repository: Kuldeeep18/LeadOrbit

Length of output: 1034

🏁 Script executed:

rg "IntegrityError" backend/leads/tasks.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 158

🏁 Script executed:

rg "get_or_create" backend/leads/tasks.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 158

🏁 Script executed:

cat -n backend/leads/tasks.py | sed -n '1,180p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 8694

🏁 Script executed:

rg -B 5 "def.*task" backend/leads/tasks.py | head -40

Repository: Kuldeeep18/LeadOrbit

Length of output: 397

Use get_or_create() to eliminate the race condition on concurrent lead insertion.

The current check-then-create pattern (lines 168-178) is vulnerable to IntegrityError when concurrent tasks attempt to insert leads with the same (organization, email) pair. The unique_together constraint on the Lead model (line 18 of models.py) will be violated, causing the entire job to fail.

Suggested fix

+from django.db import IntegrityError ... - for item in extracted_raw_json[:limit]: - if not Lead.objects.filter(organization=org, email=item['email']).exists(): - Lead.objects.create( - organization=org, - email=item['email'], - first_name=item['first_name'], - last_name=item['last_name'], - company=item['company'], - phone=item['phone'], - linkedin_url=item['linkedin_url'] - ) - inserted_count += 1 + for item in extracted_raw_json[:limit]: + try: + _, created = Lead.objects.get_or_create( + organization=org, + email=item['email'], + defaults={ + 'first_name': item['first_name'], + 'last_name': item['last_name'], + 'company': item['company'], + 'phone': item['phone'], + 'linkedin_url': item['linkedin_url'], + }, + ) + except IntegrityError: + continue + if created: + inserted_count += 1

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/leads/tasks.py` around lines 167 - 178, The current Lead creation logic uses a check-then-create pattern that is vulnerable to race conditions when concurrent tasks attempt to insert leads with the same organization and email combination. Replace the if-not-exists check and Lead.objects.create() call with Django's get_or_create() method, using organization and email as the lookup parameters and passing first_name, last_name, company, phone, and linkedin_url as defaults. This ensures atomic database-level handling of concurrent inserts. Check the return tuple from get_or_create() to determine if a new lead was created and increment inserted_count only when created is True.

coderabbitai · 2026-06-23T20:45:47Z

+    except Exception as e:
+        job.status = 'FAILED'
+        job.error_message = str(e)
+        job.completed_at = timezone.now()


🔒 Security & Privacy | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, verify the file exists and examine the context around lines 186-189 cat -n backend/leads/tasks.py | sed -n '180,195p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 581

🏁 Script executed:

# Get an outline of the file to understand the function structure ast-grep outline backend/leads/tasks.py --view expanded

Repository: Kuldeeep18/LeadOrbit

Length of output: 449

🏁 Script executed:

# Search for other exception handling patterns in the same file rg "except.*Exception" backend/leads/tasks.py -A 5

Repository: Kuldeeep18/LeadOrbit

Length of output: 311

🏁 Script executed:

# Check if job.error_message is exposed through API endpoints rg "job\.error_message\|error_message" --type py -A 2 -B 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 158

🏁 Script executed:

# Verify if logger is imported and used in the file rg "logger\|import logging" backend/leads/tasks.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 158

🏁 Script executed:

# Search for where job.error_message is retrieved or returned in API responses rg "error_message" --type py -B 3 -A 3

Repository: Kuldeeep18/LeadOrbit

Length of output: 158

🏁 Script executed:

# Check the Job model definition to understand the field fd -t f "models.py" backend/leads/ -exec cat -n {} \;

Repository: Kuldeeep18/LeadOrbit

Length of output: 389

🏁 Script executed:

# Search for API serializers or views that expose the job rg "class.*Serializer\|class.*ViewSet\|class.*APIView" backend/ -A 5 | head -50

Repository: Kuldeeep18/LeadOrbit

Length of output: 158

🏁 Script executed:

# Look for API responses that include job information rg "job\|Job" backend/leads/views.py -B 2 -A 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 158

🏁 Script executed:

# Find the Job model find . -name "*.py" -path "*/leads/*" | xargs grep -l "class Job"

Repository: Kuldeeep18/LeadOrbit

Length of output: 162

🏁 Script executed:

# Search for views.py in leads app find . -path "*/leads/views.py"

Repository: Kuldeeep18/LeadOrbit

Length of output: 183

🏁 Script executed:

# Search for serializers find . -path "*/leads/*" -name "*serializer*.py"

Repository: Kuldeeep18/LeadOrbit

Length of output: 189

🏁 Script executed:

# Check the structure of the leads app find backend/leads -type f -name "*.py" | sort

Repository: Kuldeeep18/LeadOrbit

Length of output: 406

🏁 Script executed:

# Check the Job model definition cat backend/leads/models.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 2435

🏁 Script executed:

# Check the serializer to see if error_message is exposed cat backend/leads/serializers.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 1252

🏁 Script executed:

# Check the views to see how job data is returned cat backend/leads/views.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 4975

🏁 Script executed:

# Check if there's any exception logging or sanitization elsewhere in the tasks.py file cat backend/leads/tasks.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 7722

Do not expose raw exception text to API consumers.

Lines 186–189 expose str(e) directly through the LeadScrapeJobSerializer, which is returned in the scrape_status and scrape_history API endpoints. Raw exception messages can leak internal details (stack traces, paths, dependencies, configuration).

Suggested fix

- except Exception as e: + except Exception: + logger.exception("scrape_leads_task failed", extra={"job_id": str(job_id), "organization_id": str(organization_id)}) job.status = 'FAILED' - job.error_message = str(e) + job.error_message = "Lead scraping failed. Please retry." job.completed_at = timezone.now() job.save()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

except Exception as e:

job.status = 'FAILED'

job.error_message = str(e)

job.completed_at = timezone.now()

except Exception:

logger.exception("scrape_leads_task failed", extra={"job_id": str(job_id), "organization_id": str(organization_id)})

job.status = 'FAILED'

job.error_message = "Lead scraping failed. Please retry."

job.completed_at = timezone.now()

🧰 Tools

🪛 Ruff (0.15.18)

[warning] 186-186: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/leads/tasks.py` around lines 186 - 189, The exception handler in the code block is storing the raw exception text via str(e) directly into job.error_message, which gets exposed through the API endpoints (scrape_status and scrape_history) via LeadScrapeJobSerializer. Instead of assigning str(e) to job.error_message, log the full exception internally for debugging purposes, and store a generic, user-friendly error message in job.error_message (such as "An error occurred during lead scraping") to avoid leaking internal details like stack traces, file paths, or configuration information to API consumers.

coderabbitai · 2026-06-23T20:45:47Z

+        query = request.data.get('query', '').strip()
+        limit = int(request.data.get('limit', 50))
+
+        if not query:
+            return Response({"error": "A search query is required."}, status=status.HTTP_400_BAD_REQUEST)
+        if limit > 200:
+            limit = 200 # Enforce security constraint max limit


🩺 Stability & Availability | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, locate and inspect the file cat -n backend/leads/views.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 5731

Handle invalid limit input with proper error handling before int conversion.

Line 48 will raise a runtime exception on non-numeric input (e.g., limit=abc), returning 500 instead of 400. Additionally, there is no lower-bound validation; limit can be set to zero or negative.

Suggested fix

- query = request.data.get('query', '').strip() - limit = int(request.data.get('limit', 50)) + query = request.data.get('query', '').strip() + raw_limit = request.data.get('limit', 50) + try: + limit = int(raw_limit) + except (TypeError, ValueError): + return Response({"error": "Limit must be an integer."}, status=status.HTTP_400_BAD_REQUEST) if not query: return Response({"error": "A search query is required."}, status=status.HTTP_400_BAD_REQUEST) + if len(query) > 255: + return Response({"error": "Query must be 255 characters or fewer."}, status=status.HTTP_400_BAD_REQUEST) + if limit < 1: + return Response({"error": "Limit must be at least 1."}, status=status.HTTP_400_BAD_REQUEST) if limit > 200: limit = 200 # Enforce security constraint max limit

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/leads/views.py` around lines 47 - 53, The limit parameter assignment on line 48 lacks error handling for invalid input and lower-bound validation. Wrap the int conversion in a try-except block to catch ValueError exceptions when non-numeric input is provided (like 'limit=abc'), and return a 400 error response with a descriptive message instead of allowing the exception to propagate. Additionally, after successfully converting the limit to an integer, add validation to ensure the limit is greater than zero before the existing maximum limit check, returning a 400 error if the limit is zero or negative.

coderabbitai · 2026-06-23T20:45:47Z

+        active_job = LeadScrapeJob.objects.filter(organization=org, status='RUNNING').exists()
+        if active_job:
+            return Response({"error": "Your organization already has an active lead scraping job running."}, status=status.HTTP_400_BAD_REQUEST)


🎯 Functional Correctness | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find backend -type f -name "*.py" | grep -E "(views|models)" | head -20

Repository: Kuldeeep18/LeadOrbit

Length of output: 397

🏁 Script executed:

cat -n backend/leads/views.py | sed -n '50,70p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 1473

🏁 Script executed:

cat -n backend/leads/models.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 2841

🏁 Script executed:

cat -n backend/leads/views.py | sed -n '68,95p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 1531

🏁 Script executed:

find backend/leads -name "tasks.py" -o -name "celery.py"

Repository: Kuldeeep18/LeadOrbit

Length of output: 181

🏁 Script executed:

cat -n backend/leads/tasks.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 9052

Treat PENDING jobs as active to enforce single concurrent job correctly.

Line 58 only blocks RUNNING jobs, allowing multiple PENDING jobs to be created concurrently for the same organization before any Celery task starts execution.

Suggested fix

- active_job = LeadScrapeJob.objects.filter(organization=org, status='RUNNING').exists() + active_job = LeadScrapeJob.objects.filter( + organization=org, + status__in=['PENDING', 'RUNNING'], + ).exists()

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

active_job = LeadScrapeJob.objects.filter(organization=org, status='RUNNING').exists()

if active_job:

return Response({"error": "Your organization already has an active lead scraping job running."}, status=status.HTTP_400_BAD_REQUEST)

active_job = LeadScrapeJob.objects.filter(

organization=org,

status__in=['PENDING', 'RUNNING'],

).exists()

if active_job:

return Response({"error": "Your organization already has an active lead scraping job running."}, status=status.HTTP_400_BAD_REQUEST)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/leads/views.py` around lines 58 - 60, In the LeadScrapeJob.objects.filter() query on line 58, the status filter currently only checks for 'RUNNING' jobs, which allows multiple 'PENDING' jobs to be created concurrently before any Celery task execution. Modify the filter condition to check for both 'RUNNING' and 'PENDING' statuses (you can use the __in lookup with a list of both status values) to ensure only one job in either state can be active per organization at a time.

coderabbitai · 2026-06-23T20:45:47Z

+        from .tasks import scrape_leads_task
+        scrape_leads_task.delay(job.id, query, limit, org.id)
+


🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the backend/leads/views.py file and examine the area around lines 77-79 find . -name "views.py" -path "*/leads/*" | head -10

Repository: Kuldeeep18/LeadOrbit

Length of output: 183

🏁 Script executed:

# List the repository structure to understand the layout git ls-files | grep -E "(leads|tasks)" | head -20

Repository: Kuldeeep18/LeadOrbit

Length of output: 958

🏁 Script executed:

# Read the views.py file around lines 77-79 to see the actual code cat -n backend/leads/views.py | sed -n '60,100p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 2506

🏁 Script executed:

# Also get an outline of the views.py file to understand the context ast-grep outline backend/leads/views.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 507

🏁 Script executed:

# Check the tasks.py file to understand the scrape_leads_task cat -n backend/leads/tasks.py | head -50

Repository: Kuldeeep18/LeadOrbit

Length of output: 1756

🏁 Script executed:

# Check the LeadScrapeJob model to understand the status field cat -n backend/leads/models.py | grep -A 20 "class LeadScrapeJob"

Repository: Kuldeeep18/LeadOrbit

Length of output: 1054

🏁 Script executed:

# Search for existing Celery error handling patterns in the codebase rg "CeleryError|except.*delay|celery.exceptions" --type py

Repository: Kuldeeep18/LeadOrbit

Length of output: 158

🏁 Script executed:

# Check the scrape_leads_task implementation in tasks.py cat -n backend/leads/tasks.py | grep -A 30 "scrape_leads_task"

Repository: Kuldeeep18/LeadOrbit

Length of output: 1915

🏁 Script executed:

# Look for all imports in views.py to understand current imports and patterns cat -n backend/leads/views.py | sed -n '1,35p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 1768

🏁 Script executed:

# Search for other places where .delay() is called to see if there's any error handling rg "\.delay\(" backend --type py -A 2 -B 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 1603

🏁 Script executed:

# Check what exceptions can be raised by Celery's delay() method python3 << 'PY' import inspect # Check what exceptions Celery can raise print("Common Celery exceptions that delay() might raise:") print("1. kombu.exceptions.OperationalError - when broker connection fails") print("2. celery.exceptions.CeleryError - base exception class") print("3. celery.exceptions.SecurityError - security-related errors") print("4. Exception - general exceptions") print() print("The .delay() method is part of the Task API and can raise:") print("- celery.exceptions.CeleryError and its subclasses") print("- kombu exceptions (connection-related)") print("- General exceptions from the broker") PY

Repository: Kuldeeep18/LeadOrbit

Length of output: 616

🏁 Script executed:

# Check if there's any Celery configuration or broker setup in the project rg "CELERY|BROKER" backend --type py | head -20

Repository: Kuldeeep18/LeadOrbit

Length of output: 1358

🏁 Script executed:

# Look at the entire scrape method to get full context ast-grep run --pattern 'def scrape($_) { $$$ }' backend/leads/views.py --lang python

Repository: Kuldeeep18/LeadOrbit

Length of output: 472

🏁 Script executed:

# Read the full scrape method from views.py cat -n backend/leads/views.py | sed -n '45,84p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 2256

🏁 Script executed:

# Check how other .delay() calls are handled in the codebase rg "\.delay\(" backend --type py -B 5 -A 5 | head -50

Repository: Kuldeeep18/LeadOrbit

Length of output: 3214

Handle enqueue failures for Celery task and update job status consistently.

If scrape_leads_task.delay() fails, the endpoint returns HTTP 201 Created but the job remains in PENDING status indefinitely and never executes. Wrap the task enqueue call in a try-except block to catch Celery exceptions, update the job to FAILED status with an error message, and return an appropriate error response.

Suggested fix

+from celery.exceptions import CeleryError ... - scrape_leads_task.delay(job.id, query, limit, org.id) + try: + scrape_leads_task.delay(job.id, query, limit, org.id) + except CeleryError: + job.status = 'FAILED' + job.error_message = 'Failed to enqueue scrape job.' + job.completed_at = timezone.now() + job.save(update_fields=['status', 'error_message', 'completed_at']) + return Response( + {"error": "Unable to start background job. Please retry."}, + status=status.HTTP_503_SERVICE_UNAVAILABLE, + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

from .tasks import scrape_leads_task

scrape_leads_task.delay(job.id, query, limit, org.id)

from .tasks import scrape_leads_task

from celery.exceptions import CeleryError

try:

scrape_leads_task.delay(job.id, query, limit, org.id)

except CeleryError:

job.status = 'FAILED'

job.error_message = 'Failed to enqueue scrape job.'

job.completed_at = timezone.now()

job.save(update_fields=['status', 'error_message', 'completed_at'])

return Response(

{"error": "Unable to start background job. Please retry."},

status=status.HTTP_503_SERVICE_UNAVAILABLE,

)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/leads/views.py` around lines 77 - 79, The scrape_leads_task.delay() call is not handling potential Celery exceptions, causing the job to remain in PENDING status if the task enqueue fails while still returning HTTP 201 Created. Wrap the scrape_leads_task.delay(job.id, query, limit, org.id) call in a try-except block to catch Celery exceptions, and in the except block update the job object's status to FAILED with an appropriate error message, then return an error response (such as HTTP 500) instead of the success response.

coderabbitai · 2026-06-23T20:45:47Z

+playwright==1.49.0
+google-genai==0.1.1


🔒 Security & Privacy | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Verify playwright and google-genai versions exist on PyPI and check for known issues. # Verify playwright version exists and check latest version echo "=== Checking playwright versions ===" curl -s https://pypi.org/pypi/playwright/1.49.0/json | jq '.info | {version, yanked}' || echo "playwright==1.49.0 not found" echo "Latest playwright version:" curl -s https://pypi.org/pypi/playwright/json | jq '.info.version' # Verify google-genai version exists and check latest version echo "" echo "=== Checking google-genai versions ===" curl -s https://pypi.org/pypi/google-genai/0.1.1/json | jq '.info | {version, yanked}' || echo "google-genai==0.1.1 not found" echo "Latest google-genai version:" curl -s https://pypi.org/pypi/google-genai/json | jq '.info.version'

Repository: Kuldeeep18/LeadOrbit

Length of output: 393

Fix non-existent google-genai version and update outdated playwright.

The pinned versions have critical and minor issues:

google-genai==0.1.1 does not exist on PyPI. Latest available version is 2.9.0. Update to google-genai==2.9.0 or a compatible version.

playwright==1.49.0 exists but is significantly outdated. Latest version is 1.60.0. Update to a recent stable version.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@requirements.txt` around lines 17 - 18, Update the pinned versions of google-genai and playwright in requirements.txt. Replace the non-existent google-genai==0.1.1 with google-genai==2.9.0 or another compatible version that exists on PyPI. Similarly, replace the outdated playwright==1.49.0 with playwright==1.60.0 or a more recent stable version to ensure the packages can be installed and are current with security and feature updates.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@frontend/leads.html`:
- Around line 193-199: The scrape request payload (around line 388) currently
only sends query and limit parameters and ignores the user's directory source
selections from the checkboxes with IDs srcGmaps and srcYp. Modify the scrape
request to collect the checked state of all directory source checkboxes
(srcGmaps, srcYp, and any other source checkboxes present in the form) and
include them in the payload sent to the backend, so that user selections are
actually used when making the scrape request.
- Around line 310-317: Replace the innerHTML assignment that embeds job.query
directly in the template string with a safer approach using createElement and
textContent. Create a new table row element using createElement, then create
individual table cells using createElement for each column (query, status,
leads_found, date). For the query cell specifically, use textContent to set the
job.query value instead of interpolating it into HTML. Apply the same pattern to
the error_message injection at line 429 area, creating DOM nodes and using
textContent instead of embedding the error_message string directly in the
innerHTML template.
- Around line 409-437: The polling request in the fetchWithAuth call does not
check if the HTTP response is successful before processing the JSON data, which
can leave the form in a disabled state if the server returns an error status
code. After awaiting the response from fetchWithAuth, add a check for
response.ok and if it is false, treat it as a terminal error by clearing the
pollInterval, updating the statusText to display an error message, setting the
progress bar width to 100%, adding the bg-danger class to the bar, and
re-enabling the submit button (similar to the FAILED status handling) before
returning or continuing to the next iteration.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: adf1f066-bda5-4972-a215-05734faa709f

📥 Commits

Reviewing files that changed from the base of the PR and between 6823ae6 and 53efa38.

📒 Files selected for processing (1)

frontend/leads.html

coderabbitai · 2026-06-23T20:50:50Z

+                                                <input class="form-check-input" type="checkbox" checked id="srcGmaps">
+                                                <label class="form-check-label text-white-50 small">Google Maps</label>
+                                            </div>
+                                            <div class="form-check">
+                                                <input class="form-check-input" type="checkbox" checked id="srcYp">
+                                                <label class="form-check-label text-white-50 small">YellowPages</label>
+                                            </div>


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Wire selected directory sources into the scrape request payload.

The UI collects source choices, but Line 388 sends only query and limit, so user selections are ignored.

💡 Proposed fix

document.getElementById('aiScrapeForm').addEventListener('submit', async (e) => { e.preventDefault(); const query = document.getElementById('scrapeQuery').value; - const limit = document.getElementById('scrapeLimit').value; + const limit = Number(document.getElementById('scrapeLimit').value); + const sources = []; + if (document.getElementById('srcGmaps').checked) sources.push('google_maps'); + if (document.getElementById('srcYp').checked) sources.push('yellowpages'); + if (sources.length === 0) throw new Error('Select at least one directory source.'); const btn = e.target.querySelector('button[type="submit"]'); @@ const response = await fetchWithAuth('/leads/scrape/', { method: 'POST', - body: JSON.stringify({ query, limit }) + body: JSON.stringify({ query, limit, sources }) });

Also applies to: 386-389

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/leads.html` around lines 193 - 199, The scrape request payload (around line 388) currently only sends query and limit parameters and ignores the user's directory source selections from the checkboxes with IDs srcGmaps and srcYp. Modify the scrape request to collect the checked state of all directory source checkboxes (srcGmaps, srcYp, and any other source checkboxes present in the form) and include them in the payload sent to the backend, so that user selections are actually used when making the scrape request.

coderabbitai · 2026-06-23T20:50:50Z

+                    tbody.innerHTML += `
+                        <tr class="border-bottom border-secondary border-opacity-25">
+                            <td class="fw-semibold text-info text-start ps-2">${job.query}</td>
+                            <td><span class="badge ${badgeClass}">${job.status}</span></td>
+                            <td class="fw-bold text-white">${job.leads_found}</td>
+                            <td class="text-white-50">${date}</td>
+                        </tr>
+                    `;


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Avoid injecting job.query and error_message via innerHTML.

Line 312 and Line 429 insert dynamic strings into HTML directly. Render those values with textContent/DOM nodes to prevent XSS.

💡 Proposed fix

- data.forEach(job => { - const badgeClass = job.status === 'COMPLETED' ? 'bg-success' : (job.status === 'FAILED' ? 'bg-danger' : 'bg-warning text-dark'); - const date = new Date(job.created_at || new Date()).toLocaleDateString(); - tbody.innerHTML += ` - <tr class="border-bottom border-secondary border-opacity-25"> - <td class="fw-semibold text-info text-start ps-2">${job.query}</td> - <td>${job.status}</td> - <td class="fw-bold text-white">${job.leads_found}</td> - <td class="text-white-50">${date}</td> - </tr> - `; - }); + data.forEach(job => { + const badgeClass = job.status === 'COMPLETED' ? 'bg-success' : (job.status === 'FAILED' ? 'bg-danger' : 'bg-warning text-dark'); + const date = new Date(job.created_at || new Date()).toLocaleDateString(); + const tr = document.createElement('tr'); + tr.className = 'border-bottom border-secondary border-opacity-25'; + tr.innerHTML = ` + <td class="fw-semibold text-info text-start ps-2"></td> + <td></td> + <td class="fw-bold text-white"></td> + <td class="text-white-50"></td> + `; + tr.children[0].textContent = job.query || ''; + tr.children[1].firstElementChild.textContent = job.status || ''; + tr.children[2].textContent = String(job.leads_found ?? 0); + tr.children[3].textContent = date; + tbody.appendChild(tr); + }); @@ - statusText.innerHTML = `Error: ${data.error_message || 'Timeout'}`; + statusText.innerHTML = ``; + statusText.append(document.createTextNode(`Error: ${data.error_message || 'Timeout'}`));

Also applies to: 427-430

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/leads.html` around lines 310 - 317, Replace the innerHTML assignment that embeds job.query directly in the template string with a safer approach using createElement and textContent. Create a new table row element using createElement, then create individual table cells using createElement for each column (query, status, leads_found, date). For the query cell specifically, use textContent to set the job.query value instead of interpolating it into HTML. Apply the same pattern to the error_message injection at line 429 area, creating DOM nodes and using textContent instead of embedding the error_message string directly in the innerHTML template.

Source: Linters/SAST tools

coderabbitai · 2026-06-23T20:50:50Z

+                        const response = await fetchWithAuth(`/leads/scrape/${jobId}/status/`);
+                        const data = await response.json();
+
+                        const statusText = document.getElementById('liveStatusText');
+                        const countText = document.getElementById('liveCountText');
+                        const bar = document.getElementById('liveProgressBar');
+
+                        if (data.status === 'RUNNING') {
+                            statusText.innerHTML = `<span class="spinner-border spinner-border-sm text-info me-2"></span>Agent actively compiling records...`;
+                            countText.textContent = `${data.leads_found} leads parsed`;
+                            bar.style.width = '65%';
+                        } else if (data.status === 'COMPLETED') {
+                            clearInterval(pollInterval);
+                            statusText.innerHTML = `<i class="bi bi-check-circle-fill text-success me-2"></i>Sequence completed!`;
+                            countText.textContent = `${data.leads_found} leads added`;
+                            bar.style.width = '100%';
+                            bar.classList.remove('progress-bar-striped');
+                            setTimeout(() => { window.location.reload(); }, 1500);
+                        } else if (data.status === 'FAILED') {
+                            clearInterval(pollInterval);
+                            statusText.innerHTML = `<i class="bi bi-exclame-triangle-fill text-danger me-2"></i>Error: ${data.error_message || 'Timeout'}`;
+                            bar.style.width = '100%';
+                            bar.classList.add('bg-danger');
+                            document.getElementById('aiScrapeForm').querySelector('button[type="submit"]').disabled = false;
+                        }
+                    } catch (err) {
+                        console.error('Polling connection interrupted:', err);
+                    }
+                }, 3000);


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Handle non-OK polling responses as terminal errors and reset UI state.

Polling currently ignores HTTP errors and only logs exceptions, which can leave the form stuck in disabled state indefinitely.

💡 Proposed fix

pollInterval = setInterval(async () => { try { const response = await fetchWithAuth(`/leads/scrape/${jobId}/status/`); - const data = await response.json(); + const data = await response.json(); + if (!response.ok) { + throw new Error(data.error || 'Failed to fetch scrape status.'); + } @@ } else if (data.status === 'FAILED') { clearInterval(pollInterval); statusText.innerHTML = `Error: ${data.error_message || 'Timeout'}`; bar.style.width = '100%'; bar.classList.add('bg-danger'); - document.getElementById('aiScrapeForm').querySelector('button[type="submit"]').disabled = false; + const submitBtn = document.getElementById('aiScrapeForm').querySelector('button[type="submit"]'); + submitBtn.disabled = false; + submitBtn.textContent = 'Deploy AI Browser Agent'; } } catch (err) { - console.error('Polling connection interrupted:', err); + clearInterval(pollInterval); + const statusText = document.getElementById('liveStatusText'); + statusText.textContent = `Error: ${err.message || 'Polling interrupted.'}`; + const submitBtn = document.getElementById('aiScrapeForm').querySelector('button[type="submit"]'); + submitBtn.disabled = false; + submitBtn.textContent = 'Deploy AI Browser Agent'; } }, 3000);

🧰 Tools

🪛 ast-grep (0.44.0)

[warning] 416-416: Avoid assigning untrusted data to innerHTML/outerHTML or document.write
Context: statusText.innerHTML = Agent actively compiling records...
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting').

(inner-outer-html)

[warning] 421-421: Avoid assigning untrusted data to innerHTML/outerHTML or document.write
Context: statusText.innerHTML = Sequence completed!
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting').

(inner-outer-html)

[warning] 428-428: Avoid assigning untrusted data to innerHTML/outerHTML or document.write
Context: statusText.innerHTML = Error: ${data.error_message || 'Timeout'}
Note: [CWE-79] Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting').

(inner-outer-html)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@frontend/leads.html` around lines 409 - 437, The polling request in the fetchWithAuth call does not check if the HTTP response is successful before processing the JSON data, which can leave the form in a disabled state if the server returns an error status code. After awaiting the response from fetchWithAuth, add a check for response.ok and if it is false, treat it as a terminal error by clearing the pollInterval, updating the statusText to display an error message, setting the progress bar width to 100%, adding the bg-danger class to the bar, and re-enabling the submit button (similar to the FAILED status handling) before returning or continuing to the next iteration.

feat(backend): add LeadScrapeJob database models and async scrape tas…

6823ae6

…ks (Kuldeeep18#53)

feat(frontend): implement glassmorphic scraping modal with async trac…

53efa38

…king updates (Kuldeeep18#53)

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

-    limit = models.IntegerField(default=50)
+from django.core.validators import MaxValueValidator, MinValueValidator
+    limit = models.IntegerField(
+        default=50,
+        validators=[MinValueValidator(1), MaxValueValidator(200)],
+    )

		org = Organization.objects.get(id=organization_id)
		job = LeadScrapeJob.objects.get(id=job_id)

		from .tasks import scrape_leads_task
		scrape_leads_task.delay(job.id, query, limit, org.id)

		playwright==1.49.0
		google-genai==0.1.1 No newline at end of file

Conversation

KhushiMulchandani commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request

🔗 Related Issue

📝 Summary of Changes

Backend

Frontend

📁 Files Modified

🏷️ Type of Change

🧪 Testing

1. API Endpoint Integrity Checks

2. Celery Worker Queue Telemetry Logs

📸 ScreenRecording Link for easy PR review and proof of updates!

✅ Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Possibly related issues

Estimated code review effort

❌ Failed checks (2 warnings, 1 inconclusive)

Uh oh!

KhushiMulchandani commented Jun 23, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

KhushiMulchandani commented Jun 23, 2026 •

edited

Loading

coderabbitai Bot commented Jun 23, 2026 •

edited

Loading