Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 19 additions & 0 deletions backend/leads/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,3 +37,22 @@ class LeadTag(TenantModel):

class Meta:
unique_together = ('lead', 'tag')

class LeadScrapeJob(TenantModel):
STATUS_CHOICES = (
('PENDING', 'Pending'),
('RUNNING', 'Running'),
('COMPLETED', 'Completed'),
('FAILED', 'Failed'),
)
id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
query = models.CharField(max_length=255)
limit = models.IntegerField(default=50)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ—„οΈ Data Integrity & Integration | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the models.py file to understand the current state
cat -n backend/leads/models.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 2841


🏁 Script executed:

# Check for any existing validators on the limit field
rg "limit.*models\." backend/leads/models.py -A 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 339


🏁 Script executed:

# Search for the 200-row contract mentioned in the comment
rg "200" backend/ --type py -B 2 -A 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 7880


🏁 Script executed:

# Check if there are validators imported anywhere in the models
rg "validators" backend/leads/models.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 158


🏁 Script executed:

# Look for view code that might validate the limit parameter
rg "limit" backend/leads/views.py -B 2 -A 2 | head -50

Repository: Kuldeeep18/LeadOrbit

Length of output: 874


🏁 Script executed:

# Check for other places where LeadScrapeJob is created or modified
rg "LeadScrapeJob" backend/ --type py

Repository: Kuldeeep18/LeadOrbit

Length of output: 1501


🏁 Script executed:

# Check if there's a Django admin configuration for LeadScrapeJob
rg "LeadScrapeJob" backend/leads/ --type py -B 2 -A 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 4589


🏁 Script executed:

# Check if there are any other views or functions that manipulate limit parameter
rg "\.limit\s*=" backend/ --type py -B 2 -A 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 158


Add field-level validators to enforce limit bounds in the model.

The limit field accepts any integer value, allowing non-endpoint code paths (e.g., serializer writes, Django admin, direct ORM operations) to persist values exceeding the 200-row contract. Validation should be enforced at the model layer, not just in the view.

Suggested fix
+from django.core.validators import MaxValueValidator, MinValueValidator
...
-    limit = models.IntegerField(default=50)
+    limit = models.IntegerField(
+        default=50,
+        validators=[MinValueValidator(1), MaxValueValidator(200)],
+    )
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
limit = models.IntegerField(default=50)
from django.core.validators import MaxValueValidator, MinValueValidator
limit = models.IntegerField(
default=50,
validators=[MinValueValidator(1), MaxValueValidator(200)],
)
🧰 Tools
πŸͺ› ast-grep (0.44.0)

[info] 50-50: use help_text to document model columns
Context: models.CharField(max_length=20, choices=STATUS_CHOICES, default='PENDING')
Note: [CWE-710] Improper Adherence to Coding Standards.

(model-help-text)

πŸ€– Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/leads/models.py` at line 50, The `limit` field in the Django model
(located in the IntegerField definition at line 50) needs field-level validators
to enforce bounds and prevent invalid values from being saved through non-view
code paths. Add the `validators` parameter to the IntegerField definition for
the `limit` field, using Django's MinValueValidator and MaxValueValidator to
enforce the minimum and maximum allowed values for the 200-row contract. This
ensures validation is enforced regardless of which code path (view, serializer,
Django admin, or direct ORM) attempts to persist the value.

status = models.CharField(max_length=20, choices=STATUS_CHOICES, default='PENDING')
leads_found = models.IntegerField(default=0)
error_message = models.TextField(blank=True, null=True)
started_at = models.DateTimeField(null=True, blank=True)
completed_at = models.DateTimeField(null=True, blank=True)

def __str__(self):
return f"Job {self.id} - {self.query} ({self.status})"
7 changes: 7 additions & 0 deletions backend/leads/serializers.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
from rest_framework import serializers
from .models import Lead, Tag, LeadTag
from .models import LeadScrapeJob

class TagSerializer(serializers.ModelSerializer):
class Meta:
Expand All @@ -17,3 +18,9 @@ class Meta:
def get_tags(self, obj):
tags = Tag.objects.filter(tagged_leads__lead=obj)
return TagSerializer(tags, many=True).data

class LeadScrapeJobSerializer(serializers.ModelSerializer):
class Meta:
model = LeadScrapeJob
fields = ['id', 'query', 'limit', 'status', 'leads_found', 'error_message', 'started_at', 'completed_at', 'created_at']
read_only_fields = ['organization', 'id', 'status', 'leads_found', 'error_message', 'started_at', 'completed_at', 'created_at']
97 changes: 97 additions & 0 deletions backend/leads/tasks.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,12 @@
from .models import Lead
from tenants.models import Organization
import logging
import random

import json
import random
from django.utils import timezone
from .models import LeadScrapeJob

logger = logging.getLogger(__name__)

Expand Down Expand Up @@ -91,3 +97,94 @@ def import_leads_from_csv(file_contents, organization_id):
summary = f"Processed {leads_created} new, {leads_updated} updated, {skipped} skipped for organization {org.name}"
logger.info(summary)
return summary

@shared_task
def scrape_leads_task(job_id, query, limit, organization_id):
org = Organization.objects.get(id=organization_id)
job = LeadScrapeJob.objects.get(id=job_id)
Comment on lines +103 to +104

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ”’ Security & Privacy | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the task file at the specified lines
cat -n backend/leads/tasks.py | sed -n '95,115p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 962


🏁 Script executed:

# Find the LeadScrapeJob model definition
find . -type f -name "*.py" | xargs grep -l "class LeadScrapeJob"

Repository: Kuldeeep18/LeadOrbit

Length of output: 215


🏁 Script executed:

# Check if LeadScrapeJob has an organization field
ast-grep outline backend/leads/ --match LeadScrapeJob

Repository: Kuldeeep18/LeadOrbit

Length of output: 264


🏁 Script executed:

# Read the LeadScrapeJob model definition
cat -n backend/leads/models.py | head -200

Repository: Kuldeeep18/LeadOrbit

Length of output: 2841


🏁 Script executed:

# Search for where scrape_leads_task is called/invoked
rg "scrape_leads_task" --type py -B 3 -A 3

Repository: Kuldeeep18/LeadOrbit

Length of output: 158


🏁 Script executed:

# Check the TenantModel definition to confirm it has organization field
cat -n backend/tenants/models.py | head -100

Repository: Kuldeeep18/LeadOrbit

Length of output: 2079


🏁 Script executed:

# Search more broadly for where this task is called (might not use exact function name)
rg "scrape_leads_task|apply_async|delay" backend/leads/ --type py -B 2 -A 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 1194


🏁 Script executed:

# Check if LeadScrapeJob has a custom manager that might filter by organization
ast-grep outline backend/leads/models.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 702


🏁 Script executed:

# View the complete task function to check for other potential issues
cat -n backend/leads/tasks.py | sed -n '101,180p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 4737


🏁 Script executed:

# Check if there are other database queries in the task that should filter by organization
cat -n backend/leads/tasks.py | sed -n '101,250p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 5095


Add organization filter to LeadScrapeJob lookup in Celery task to prevent tenant isolation bypass.

Line 104 loads the job by id only. Celery workers run without tenant context, so TenantManager returns all records instead of filtering by organization. If task arguments mismatch, the query could match a job from a different organization and update it while creating leads in another org's context.

Suggested fix
    org = Organization.objects.get(id=organization_id)
-   job = LeadScrapeJob.objects.get(id=job_id)
+   job = LeadScrapeJob.objects.get(id=job_id, organization=org)
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
org = Organization.objects.get(id=organization_id)
job = LeadScrapeJob.objects.get(id=job_id)
org = Organization.objects.get(id=organization_id)
job = LeadScrapeJob.objects.get(id=job_id, organization=org)
πŸ€– Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/leads/tasks.py` around lines 103 - 104, The LeadScrapeJob lookup in
the Celery task uses only the job ID without filtering by organization, which
can allow access to jobs from different organizations due to missing tenant
context in Celery workers. Modify the LeadScrapeJob.objects.get() call on line
104 to include an additional filter for organization_id matching the
organization loaded on line 103, ensuring the job query is scoped to the current
organization and preventing tenant isolation bypass.


job.status = 'RUNNING'
job.started_at = timezone.now()
job.save()

try:
import time
time.sleep(random.uniform(2.0, 4.0))

# 1. Dynamically toggle mock data payload based on the user's query
normalized_query = query.lower()
# extracted_raw_json = []
if "miami" in normalized_query:


# Pools of authentic names and business styles
first_names = ["Carlos", "Elena", "Marcus", "Sofia", "Ricardo", "Amanda", "Devon", "Priyanka", "Alejandro", "Melissa", "Jorge", "Alina"]
last_names = ["Mendez", "Rostova", "Vance", "Blanco", "Vega", "Gomez", "Chen", "Patel", "Cordova", "Suarez", "Levine", "Sinclair"]
clinic_styles = ["Dental Smiles", "Bayside Dental Care", "Magic City Orthodontics", "Brickell Dental Studio", "Elite Dental Group", "Ocean Drive Dentistry"]
email_domains = ["miamidentalsmiles.com", "baysidedentalcare.com", "magiccityortho.com", "brickelldental.com", "elitedentalfl.com", "oceandrivedental.io"]

extracted_raw_json = []

# Generate 15 distinct, highly realistic B2B profiles
for _ in range(15):
f_name = random.choice(first_names)
l_name = random.choice(last_names)
clinic = random.choice(clinic_styles)
domain = random.choice(email_domains)

# Create a realistic corporate email structure
email_style = random.choice([
f"{f_name.lower()}.{l_name.lower()}@{domain}",
f"dr.{l_name.lower()}@{domain}",
f"contact@{domain}"
])

# Generate a valid-looking Miami (+1 305) local phone line
random_phone = f"+1305555{random.randint(1000, 9999)}"

extracted_raw_json.append({
"first_name": f_name,
"last_name": l_name,
"email": email_style,
"company": clinic,
"phone": random_phone,
"linkedin_url": f"https://linkedin.com/in/{f_name.lower()}-{l_name.lower()}-dds"
})
else:
# Default fallback to original Austin tech profile mock data
extracted_raw_json = [
{"first_name": "Amit", "last_name": "Sharma", "email": "amit.sharma@austintech.com", "company": "Austin Tech Solutions", "phone": "+15125550192", "linkedin_url": "https://linkedin.com/in/amit-sharma-tech"},
{"first_name": "Sarah", "last_name": "Jenkins", "email": "sjenkins@apexgrowth.io", "company": "Apex Growth Corp", "phone": "+15125550143", "linkedin_url": "https://linkedin.com/in/sjenkins-growth"},
{"first_name": "Rajesh", "last_name": "Patel", "email": "rajesh@lonestarventures.co", "company": "Lone Star Ventures", "phone": "+15125550188", "linkedin_url": "https://linkedin.com/in/rpatel-invest"}
]

# Real-time status update broadcast emulation
job.leads_found = 1
job.save()
time.sleep(1)

inserted_count = 0
for item in extracted_raw_json[:limit]:
if not Lead.objects.filter(organization=org, email=item['email']).exists():
Lead.objects.create(
organization=org,
email=item['email'],
first_name=item['first_name'],
last_name=item['last_name'],
company=item['company'],
phone=item['phone'],
linkedin_url=item['linkedin_url']
)
inserted_count += 1
Comment on lines +167 to +178

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ—„οΈ Data Integrity & Integration | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "models.py" -o -name "tasks.py" | grep -E "leads|backend" | head -20

Repository: Kuldeeep18/LeadOrbit

Length of output: 322


🏁 Script executed:

cat -n backend/leads/tasks.py | sed -n '160,185p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 1241


🏁 Script executed:

find . -type f -path "*/leads/models.py" -o -path "*/leads/model.py"

Repository: Kuldeeep18/LeadOrbit

Length of output: 184


🏁 Script executed:

cat -n backend/leads/models.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 2841


🏁 Script executed:

rg -A 10 "class Lead" backend/leads/models.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 1611


🏁 Script executed:

head -30 backend/leads/tasks.py | cat -n

Repository: Kuldeeep18/LeadOrbit

Length of output: 1034


🏁 Script executed:

rg "IntegrityError" backend/leads/tasks.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 158


🏁 Script executed:

rg "get_or_create" backend/leads/tasks.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 158


🏁 Script executed:

cat -n backend/leads/tasks.py | sed -n '1,180p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 8694


🏁 Script executed:

rg -B 5 "def.*task" backend/leads/tasks.py | head -40

Repository: Kuldeeep18/LeadOrbit

Length of output: 397


Use get_or_create() to eliminate the race condition on concurrent lead insertion.

The current check-then-create pattern (lines 168-178) is vulnerable to IntegrityError when concurrent tasks attempt to insert leads with the same (organization, email) pair. The unique_together constraint on the Lead model (line 18 of models.py) will be violated, causing the entire job to fail.

Suggested fix
+from django.db import IntegrityError
...
-        for item in extracted_raw_json[:limit]:
-            if not Lead.objects.filter(organization=org, email=item['email']).exists():
-                Lead.objects.create(
-                    organization=org,
-                    email=item['email'],
-                    first_name=item['first_name'],
-                    last_name=item['last_name'],
-                    company=item['company'],
-                    phone=item['phone'],
-                    linkedin_url=item['linkedin_url']
-                )
-                inserted_count += 1
+        for item in extracted_raw_json[:limit]:
+            try:
+                _, created = Lead.objects.get_or_create(
+                    organization=org,
+                    email=item['email'],
+                    defaults={
+                        'first_name': item['first_name'],
+                        'last_name': item['last_name'],
+                        'company': item['company'],
+                        'phone': item['phone'],
+                        'linkedin_url': item['linkedin_url'],
+                    },
+                )
+            except IntegrityError:
+                continue
+            if created:
+                inserted_count += 1
πŸ€– Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/leads/tasks.py` around lines 167 - 178, The current Lead creation
logic uses a check-then-create pattern that is vulnerable to race conditions
when concurrent tasks attempt to insert leads with the same organization and
email combination. Replace the if-not-exists check and Lead.objects.create()
call with Django's get_or_create() method, using organization and email as the
lookup parameters and passing first_name, last_name, company, phone, and
linkedin_url as defaults. This ensures atomic database-level handling of
concurrent inserts. Check the return tuple from get_or_create() to determine if
a new lead was created and increment inserted_count only when created is True.


# Complete tracking job metrics lifecycle cleanly
job.status = 'COMPLETED'
job.leads_found = inserted_count
job.completed_at = timezone.now()
job.save()

except Exception as e:
job.status = 'FAILED'
job.error_message = str(e)
job.completed_at = timezone.now()
Comment on lines +186 to +189

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ”’ Security & Privacy | πŸ”΄ Critical

🧩 Analysis chain

🏁 Script executed:

# First, verify the file exists and examine the context around lines 186-189
cat -n backend/leads/tasks.py | sed -n '180,195p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 581


🏁 Script executed:

# Get an outline of the file to understand the function structure
ast-grep outline backend/leads/tasks.py --view expanded

Repository: Kuldeeep18/LeadOrbit

Length of output: 449


🏁 Script executed:

# Search for other exception handling patterns in the same file
rg "except.*Exception" backend/leads/tasks.py -A 5

Repository: Kuldeeep18/LeadOrbit

Length of output: 311


🏁 Script executed:

# Check if job.error_message is exposed through API endpoints
rg "job\.error_message\|error_message" --type py -A 2 -B 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 158


🏁 Script executed:

# Verify if logger is imported and used in the file
rg "logger\|import logging" backend/leads/tasks.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 158


🏁 Script executed:

# Search for where job.error_message is retrieved or returned in API responses
rg "error_message" --type py -B 3 -A 3

Repository: Kuldeeep18/LeadOrbit

Length of output: 158


🏁 Script executed:

# Check the Job model definition to understand the field
fd -t f "models.py" backend/leads/ -exec cat -n {} \;

Repository: Kuldeeep18/LeadOrbit

Length of output: 389


🏁 Script executed:

# Search for API serializers or views that expose the job
rg "class.*Serializer\|class.*ViewSet\|class.*APIView" backend/ -A 5 | head -50

Repository: Kuldeeep18/LeadOrbit

Length of output: 158


🏁 Script executed:

# Look for API responses that include job information
rg "job\|Job" backend/leads/views.py -B 2 -A 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 158


🏁 Script executed:

# Find the Job model
find . -name "*.py" -path "*/leads/*" | xargs grep -l "class Job"

Repository: Kuldeeep18/LeadOrbit

Length of output: 162


🏁 Script executed:

# Search for views.py in leads app
find . -path "*/leads/views.py"

Repository: Kuldeeep18/LeadOrbit

Length of output: 183


🏁 Script executed:

# Search for serializers
find . -path "*/leads/*" -name "*serializer*.py"

Repository: Kuldeeep18/LeadOrbit

Length of output: 189


🏁 Script executed:

# Check the structure of the leads app
find backend/leads -type f -name "*.py" | sort

Repository: Kuldeeep18/LeadOrbit

Length of output: 406


🏁 Script executed:

# Check the Job model definition
cat backend/leads/models.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 2435


🏁 Script executed:

# Check the serializer to see if error_message is exposed
cat backend/leads/serializers.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 1252


🏁 Script executed:

# Check the views to see how job data is returned
cat backend/leads/views.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 4975


🏁 Script executed:

# Check if there's any exception logging or sanitization elsewhere in the tasks.py file
cat backend/leads/tasks.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 7722


Do not expose raw exception text to API consumers.

Lines 186–189 expose str(e) directly through the LeadScrapeJobSerializer, which is returned in the scrape_status and scrape_history API endpoints. Raw exception messages can leak internal details (stack traces, paths, dependencies, configuration).

Suggested fix
-    except Exception as e:
+    except Exception:
+        logger.exception("scrape_leads_task failed", extra={"job_id": str(job_id), "organization_id": str(organization_id)})
         job.status = 'FAILED'
-        job.error_message = str(e)
+        job.error_message = "Lead scraping failed. Please retry."
         job.completed_at = timezone.now()
         job.save()
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except Exception as e:
job.status = 'FAILED'
job.error_message = str(e)
job.completed_at = timezone.now()
except Exception:
logger.exception("scrape_leads_task failed", extra={"job_id": str(job_id), "organization_id": str(organization_id)})
job.status = 'FAILED'
job.error_message = "Lead scraping failed. Please retry."
job.completed_at = timezone.now()
🧰 Tools
πŸͺ› Ruff (0.15.18)

[warning] 186-186: Do not catch blind exception: Exception

(BLE001)

πŸ€– Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/leads/tasks.py` around lines 186 - 189, The exception handler in the
code block is storing the raw exception text via str(e) directly into
job.error_message, which gets exposed through the API endpoints (scrape_status
and scrape_history) via LeadScrapeJobSerializer. Instead of assigning str(e) to
job.error_message, log the full exception internally for debugging purposes, and
store a generic, user-friendly error message in job.error_message (such as "An
error occurred during lead scraping") to avoid leaking internal details like
stack traces, file paths, or configuration information to API consumers.

job.save()
59 changes: 59 additions & 0 deletions backend/leads/views.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,10 @@
from .models import Lead, Tag
from .serializers import LeadSerializer, TagSerializer

from django.utils import timezone
from .models import LeadScrapeJob
from .serializers import LeadScrapeJobSerializer

class LeadViewSet(viewsets.ModelViewSet):
serializer_class = LeadSerializer
queryset = Lead.objects.all()
Expand Down Expand Up @@ -37,6 +41,61 @@ def import_csv(self, request):
import_leads_from_csv.delay(file_contents, request.user.organization.id)

return Response({"message": "File received. Processing in background.", "filename": file_obj.name}, status=status.HTTP_202_ACCEPTED)

@action(detail=False, methods=['post'], url_path='scrape')
def scrape(self, request):
query = request.data.get('query', '').strip()
limit = int(request.data.get('limit', 50))

if not query:
return Response({"error": "A search query is required."}, status=status.HTTP_400_BAD_REQUEST)
if limit > 200:
limit = 200 # Enforce security constraint max limit
Comment on lines +47 to +53

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | πŸ”΄ Critical

🧩 Analysis chain

🏁 Script executed:

# First, locate and inspect the file
cat -n backend/leads/views.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 5731


Handle invalid limit input with proper error handling before int conversion.

Line 48 will raise a runtime exception on non-numeric input (e.g., limit=abc), returning 500 instead of 400. Additionally, there is no lower-bound validation; limit can be set to zero or negative.

Suggested fix
-        query = request.data.get('query', '').strip()
-        limit = int(request.data.get('limit', 50))
+        query = request.data.get('query', '').strip()
+        raw_limit = request.data.get('limit', 50)
+        try:
+            limit = int(raw_limit)
+        except (TypeError, ValueError):
+            return Response({"error": "Limit must be an integer."}, status=status.HTTP_400_BAD_REQUEST)
         
         if not query:
             return Response({"error": "A search query is required."}, status=status.HTTP_400_BAD_REQUEST)
+        if len(query) > 255:
+            return Response({"error": "Query must be 255 characters or fewer."}, status=status.HTTP_400_BAD_REQUEST)
+        if limit < 1:
+            return Response({"error": "Limit must be at least 1."}, status=status.HTTP_400_BAD_REQUEST)
         if limit > 200:
             limit = 200 # Enforce security constraint max limit
πŸ€– Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/leads/views.py` around lines 47 - 53, The limit parameter assignment
on line 48 lacks error handling for invalid input and lower-bound validation.
Wrap the int conversion in a try-except block to catch ValueError exceptions
when non-numeric input is provided (like 'limit=abc'), and return a 400 error
response with a descriptive message instead of allowing the exception to
propagate. Additionally, after successfully converting the limit to an integer,
add validation to ensure the limit is greater than zero before the existing
maximum limit check, returning a 400 error if the limit is zero or negative.


org = request.user.organization

# Constraint 1: Check for an active running job in this organization
active_job = LeadScrapeJob.objects.filter(organization=org, status='RUNNING').exists()
if active_job:
return Response({"error": "Your organization already has an active lead scraping job running."}, status=status.HTTP_400_BAD_REQUEST)
Comment on lines +58 to +60

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find backend -type f -name "*.py" | grep -E "(views|models)" | head -20

Repository: Kuldeeep18/LeadOrbit

Length of output: 397


🏁 Script executed:

cat -n backend/leads/views.py | sed -n '50,70p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 1473


🏁 Script executed:

cat -n backend/leads/models.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 2841


🏁 Script executed:

cat -n backend/leads/views.py | sed -n '68,95p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 1531


🏁 Script executed:

find backend/leads -name "tasks.py" -o -name "celery.py"

Repository: Kuldeeep18/LeadOrbit

Length of output: 181


🏁 Script executed:

cat -n backend/leads/tasks.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 9052


Treat PENDING jobs as active to enforce single concurrent job correctly.

Line 58 only blocks RUNNING jobs, allowing multiple PENDING jobs to be created concurrently for the same organization before any Celery task starts execution.

Suggested fix
-        active_job = LeadScrapeJob.objects.filter(organization=org, status='RUNNING').exists()
+        active_job = LeadScrapeJob.objects.filter(
+            organization=org,
+            status__in=['PENDING', 'RUNNING'],
+        ).exists()
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
active_job = LeadScrapeJob.objects.filter(organization=org, status='RUNNING').exists()
if active_job:
return Response({"error": "Your organization already has an active lead scraping job running."}, status=status.HTTP_400_BAD_REQUEST)
active_job = LeadScrapeJob.objects.filter(
organization=org,
status__in=['PENDING', 'RUNNING'],
).exists()
if active_job:
return Response({"error": "Your organization already has an active lead scraping job running."}, status=status.HTTP_400_BAD_REQUEST)
πŸ€– Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/leads/views.py` around lines 58 - 60, In the
LeadScrapeJob.objects.filter() query on line 58, the status filter currently
only checks for 'RUNNING' jobs, which allows multiple 'PENDING' jobs to be
created concurrently before any Celery task execution. Modify the filter
condition to check for both 'RUNNING' and 'PENDING' statuses (you can use the
__in lookup with a list of both status values) to ensure only one job in either
state can be active per organization at a time.


# Constraint 2: Enforce 5-minute cooldown period between completions
five_minutes_ago = timezone.now() - timezone.timedelta(minutes=5)
recent_job = LeadScrapeJob.objects.filter(organization=org, status='COMPLETED', completed_at__gte=five_minutes_ago).exists()
if recent_job:
return Response({"error": "Throttled. Please wait 5 minutes between lead generation queries."}, status=status.HTTP_429_TOO_MANY_REQUESTS)

# Create tracking job record
job = LeadScrapeJob.objects.create(
organization=org,
query=query,
limit=limit,
status='PENDING'
)

# Dispatch Celery background worker agent task
from .tasks import scrape_leads_task
scrape_leads_task.delay(job.id, query, limit, org.id)

Comment on lines +77 to +79

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find the backend/leads/views.py file and examine the area around lines 77-79
find . -name "views.py" -path "*/leads/*" | head -10

Repository: Kuldeeep18/LeadOrbit

Length of output: 183


🏁 Script executed:

# List the repository structure to understand the layout
git ls-files | grep -E "(leads|tasks)" | head -20

Repository: Kuldeeep18/LeadOrbit

Length of output: 958


🏁 Script executed:

# Read the views.py file around lines 77-79 to see the actual code
cat -n backend/leads/views.py | sed -n '60,100p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 2506


🏁 Script executed:

# Also get an outline of the views.py file to understand the context
ast-grep outline backend/leads/views.py

Repository: Kuldeeep18/LeadOrbit

Length of output: 507


🏁 Script executed:

# Check the tasks.py file to understand the scrape_leads_task
cat -n backend/leads/tasks.py | head -50

Repository: Kuldeeep18/LeadOrbit

Length of output: 1756


🏁 Script executed:

# Check the LeadScrapeJob model to understand the status field
cat -n backend/leads/models.py | grep -A 20 "class LeadScrapeJob"

Repository: Kuldeeep18/LeadOrbit

Length of output: 1054


🏁 Script executed:

# Search for existing Celery error handling patterns in the codebase
rg "CeleryError|except.*delay|celery.exceptions" --type py

Repository: Kuldeeep18/LeadOrbit

Length of output: 158


🏁 Script executed:

# Check the scrape_leads_task implementation in tasks.py
cat -n backend/leads/tasks.py | grep -A 30 "scrape_leads_task"

Repository: Kuldeeep18/LeadOrbit

Length of output: 1915


🏁 Script executed:

# Look for all imports in views.py to understand current imports and patterns
cat -n backend/leads/views.py | sed -n '1,35p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 1768


🏁 Script executed:

# Search for other places where .delay() is called to see if there's any error handling
rg "\.delay\(" backend --type py -A 2 -B 2

Repository: Kuldeeep18/LeadOrbit

Length of output: 1603


🏁 Script executed:

# Check what exceptions can be raised by Celery's delay() method
python3 << 'PY'
import inspect

# Check what exceptions Celery can raise
print("Common Celery exceptions that delay() might raise:")
print("1. kombu.exceptions.OperationalError - when broker connection fails")
print("2. celery.exceptions.CeleryError - base exception class")
print("3. celery.exceptions.SecurityError - security-related errors")
print("4. Exception - general exceptions")
print()
print("The .delay() method is part of the Task API and can raise:")
print("- celery.exceptions.CeleryError and its subclasses")
print("- kombu exceptions (connection-related)")
print("- General exceptions from the broker")
PY

Repository: Kuldeeep18/LeadOrbit

Length of output: 616


🏁 Script executed:

# Check if there's any Celery configuration or broker setup in the project
rg "CELERY|BROKER" backend --type py | head -20

Repository: Kuldeeep18/LeadOrbit

Length of output: 1358


🏁 Script executed:

# Look at the entire scrape method to get full context
ast-grep run --pattern 'def scrape($_) { $$$ }' backend/leads/views.py --lang python

Repository: Kuldeeep18/LeadOrbit

Length of output: 472


🏁 Script executed:

# Read the full scrape method from views.py
cat -n backend/leads/views.py | sed -n '45,84p'

Repository: Kuldeeep18/LeadOrbit

Length of output: 2256


🏁 Script executed:

# Check how other .delay() calls are handled in the codebase
rg "\.delay\(" backend --type py -B 5 -A 5 | head -50

Repository: Kuldeeep18/LeadOrbit

Length of output: 3214


Handle enqueue failures for Celery task and update job status consistently.

If scrape_leads_task.delay() fails, the endpoint returns HTTP 201 Created but the job remains in PENDING status indefinitely and never executes. Wrap the task enqueue call in a try-except block to catch Celery exceptions, update the job to FAILED status with an error message, and return an appropriate error response.

Suggested fix
+from celery.exceptions import CeleryError
...
-        scrape_leads_task.delay(job.id, query, limit, org.id)
+        try:
+            scrape_leads_task.delay(job.id, query, limit, org.id)
+        except CeleryError:
+            job.status = 'FAILED'
+            job.error_message = 'Failed to enqueue scrape job.'
+            job.completed_at = timezone.now()
+            job.save(update_fields=['status', 'error_message', 'completed_at'])
+            return Response(
+                {"error": "Unable to start background job. Please retry."},
+                status=status.HTTP_503_SERVICE_UNAVAILABLE,
+            )
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
from .tasks import scrape_leads_task
scrape_leads_task.delay(job.id, query, limit, org.id)
from .tasks import scrape_leads_task
from celery.exceptions import CeleryError
try:
scrape_leads_task.delay(job.id, query, limit, org.id)
except CeleryError:
job.status = 'FAILED'
job.error_message = 'Failed to enqueue scrape job.'
job.completed_at = timezone.now()
job.save(update_fields=['status', 'error_message', 'completed_at'])
return Response(
{"error": "Unable to start background job. Please retry."},
status=status.HTTP_503_SERVICE_UNAVAILABLE,
)
πŸ€– Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/leads/views.py` around lines 77 - 79, The scrape_leads_task.delay()
call is not handling potential Celery exceptions, causing the job to remain in
PENDING status if the task enqueue fails while still returning HTTP 201 Created.
Wrap the scrape_leads_task.delay(job.id, query, limit, org.id) call in a
try-except block to catch Celery exceptions, and in the except block update the
job object's status to FAILED with an appropriate error message, then return an
error response (such as HTTP 500) instead of the success response.

return Response({
"message": "AI Lead Generation background agent launched successfully.",
"job_id": str(job.id)
}, status=status.HTTP_201_CREATED)

@action(detail=False, methods=['get'], url_path='scrape/(?P<job_id>[^/.]+)/status')
def scrape_status(self, request, job_id=None):
try:
job = LeadScrapeJob.objects.get(organization=request.user.organization, id=job_id)
serializer = LeadScrapeJobSerializer(job)
return Response(serializer.data, status=status.HTTP_200_OK)
except LeadScrapeJob.DoesNotExist:
return Response({"error": "Scrape job not found."}, status=status.HTTP_404_NOT_FOUND)

@action(detail=False, methods=['get'], url_path='scrape/history')
def scrape_history(self, request):
jobs = LeadScrapeJob.objects.filter(organization=request.user.organization).order_by('-created_at')
serializer = LeadScrapeJobSerializer(jobs, many=True)
return Response(serializer.data, status=status.HTTP_200_OK)

class TagViewSet(viewsets.ModelViewSet):
serializer_class = TagSerializer
Expand Down
Loading