feat(backend): add LeadScrapeJob database models and async scrape tasks #433
base: main
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -5,6 +5,12 @@ | |||||||||||||||||||
| from .models import Lead | ||||||||||||||||||||
| from tenants.models import Organization | ||||||||||||||||||||
| import logging | ||||||||||||||||||||
| import random | ||||||||||||||||||||
|
|
||||||||||||||||||||
| import json | ||||||||||||||||||||
| import random | ||||||||||||||||||||
| from django.utils import timezone | ||||||||||||||||||||
| from .models import LeadScrapeJob | ||||||||||||||||||||
|
|
||||||||||||||||||||
| logger = logging.getLogger(__name__) | ||||||||||||||||||||
|
|
||||||||||||||||||||
|
|
@@ -91,3 +97,94 @@ def import_leads_from_csv(file_contents, organization_id): | |||||||||||||||||||
| summary = f"Processed {leads_created} new, {leads_updated} updated, {skipped} skipped for organization {org.name}" | ||||||||||||||||||||
| logger.info(summary) | ||||||||||||||||||||
| return summary | ||||||||||||||||||||
|
|
||||||||||||||||||||
| @shared_task | ||||||||||||||||||||
| def scrape_leads_task(job_id, query, limit, organization_id): | ||||||||||||||||||||
| org = Organization.objects.get(id=organization_id) | ||||||||||||||||||||
| job = LeadScrapeJob.objects.get(id=job_id) | ||||||||||||||||||||
Comment on lines +103 to +104
Contributor
Security & Privacy | Major
Analysis chain (scripts executed against Kuldeeep18/LeadOrbit):
# First, let's examine the task file at the specified lines
cat -n backend/leads/tasks.py | sed -n '95,115p'
# Find the LeadScrapeJob model definition
find . -type f -name "*.py" | xargs grep -l "class LeadScrapeJob"
# Check if LeadScrapeJob has an organization field
ast-grep outline backend/leads/ --match LeadScrapeJob
# Read the LeadScrapeJob model definition
cat -n backend/leads/models.py | head -200
# Search for where scrape_leads_task is called/invoked
rg "scrape_leads_task" --type py -B 3 -A 3
# Check the TenantModel definition to confirm it has organization field
cat -n backend/tenants/models.py | head -100
# Search more broadly for where this task is called (might not use exact function name)
rg "scrape_leads_task|apply_async|delay" backend/leads/ --type py -B 2 -A 2
# Check if LeadScrapeJob has a custom manager that might filter by organization
ast-grep outline backend/leads/models.py
# View the complete task function to check for other potential issues
cat -n backend/leads/tasks.py | sed -n '101,180p'
# Check if there are other database queries in the task that should filter by organization
cat -n backend/leads/tasks.py | sed -n '101,250p'
Add organization filter to LeadScrapeJob lookup in Celery task to prevent tenant isolation bypass.
Line 104 loads the job by id alone, so the lookup is not restricted to the organization passed to the task.
Suggested fix
     org = Organization.objects.get(id=organization_id)
-    job = LeadScrapeJob.objects.get(id=job_id)
+    job = LeadScrapeJob.objects.get(id=job_id, organization=org)
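The tenant-scoped lookup suggested above can be illustrated without Django. This is only a sketch: `Job`, `JobNotFound`, `get_job_for_org`, and the sample `JOBS` list are hypothetical names standing in for the ORM call `LeadScrapeJob.objects.get(id=job_id, organization=org)`.

```python
from dataclasses import dataclass


@dataclass
class Job:
    id: int
    organization_id: int


class JobNotFound(LookupError):
    """Raised when no job matches both the id and the organization."""


def get_job_for_org(jobs, job_id, organization_id):
    # Match on BOTH keys, so a valid id from another tenant is treated as missing.
    for job in jobs:
        if job.id == job_id and job.organization_id == organization_id:
            return job
    raise JobNotFound(f"job {job_id} not found for organization {organization_id}")


# Hypothetical sample data: job 2 belongs to organization 20, not 10.
JOBS = [Job(id=1, organization_id=10), Job(id=2, organization_id=20)]
```

With this shape, asking for job 2 on behalf of organization 10 fails the same way as asking for a job that does not exist, which is the behaviour the review asks for.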
||||||||||||||||||||
|
|
||||||||||||||||||||
| job.status = 'RUNNING' | ||||||||||||||||||||
| job.started_at = timezone.now() | ||||||||||||||||||||
| job.save() | ||||||||||||||||||||
|
|
||||||||||||||||||||
| try: | ||||||||||||||||||||
| import time | ||||||||||||||||||||
| time.sleep(random.uniform(2.0, 4.0)) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| # 1. Dynamically toggle mock data payload based on the user's query | ||||||||||||||||||||
| normalized_query = query.lower() | ||||||||||||||||||||
| # extracted_raw_json = [] | ||||||||||||||||||||
| if "miami" in normalized_query: | ||||||||||||||||||||
|
|
||||||||||||||||||||
|
|
||||||||||||||||||||
| # Pools of authentic names and business styles | ||||||||||||||||||||
| first_names = ["Carlos", "Elena", "Marcus", "Sofia", "Ricardo", "Amanda", "Devon", "Priyanka", "Alejandro", "Melissa", "Jorge", "Alina"] | ||||||||||||||||||||
| last_names = ["Mendez", "Rostova", "Vance", "Blanco", "Vega", "Gomez", "Chen", "Patel", "Cordova", "Suarez", "Levine", "Sinclair"] | ||||||||||||||||||||
| clinic_styles = ["Dental Smiles", "Bayside Dental Care", "Magic City Orthodontics", "Brickell Dental Studio", "Elite Dental Group", "Ocean Drive Dentistry"] | ||||||||||||||||||||
| email_domains = ["miamidentalsmiles.com", "baysidedentalcare.com", "magiccityortho.com", "brickelldental.com", "elitedentalfl.com", "oceandrivedental.io"] | ||||||||||||||||||||
|
|
||||||||||||||||||||
| extracted_raw_json = [] | ||||||||||||||||||||
|
|
||||||||||||||||||||
| # Generate 15 distinct, highly realistic B2B profiles | ||||||||||||||||||||
| for _ in range(15): | ||||||||||||||||||||
| f_name = random.choice(first_names) | ||||||||||||||||||||
| l_name = random.choice(last_names) | ||||||||||||||||||||
| clinic = random.choice(clinic_styles) | ||||||||||||||||||||
| domain = random.choice(email_domains) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| # Create a realistic corporate email structure | ||||||||||||||||||||
| email_style = random.choice([ | ||||||||||||||||||||
| f"{f_name.lower()}.{l_name.lower()}@{domain}", | ||||||||||||||||||||
| f"dr.{l_name.lower()}@{domain}", | ||||||||||||||||||||
| f"contact@{domain}" | ||||||||||||||||||||
| ]) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| # Generate a valid-looking Miami (+1 305) local phone line | ||||||||||||||||||||
| random_phone = f"+1305555{random.randint(1000, 9999)}" | ||||||||||||||||||||
|
|
||||||||||||||||||||
| extracted_raw_json.append({ | ||||||||||||||||||||
| "first_name": f_name, | ||||||||||||||||||||
| "last_name": l_name, | ||||||||||||||||||||
| "email": email_style, | ||||||||||||||||||||
| "company": clinic, | ||||||||||||||||||||
| "phone": random_phone, | ||||||||||||||||||||
| "linkedin_url": f"https://linkedin.com/in/{f_name.lower()}-{l_name.lower()}-dds" | ||||||||||||||||||||
| }) | ||||||||||||||||||||
| else: | ||||||||||||||||||||
| # Default fallback to original Austin tech profile mock data | ||||||||||||||||||||
| extracted_raw_json = [ | ||||||||||||||||||||
| {"first_name": "Amit", "last_name": "Sharma", "email": "amit.sharma@austintech.com", "company": "Austin Tech Solutions", "phone": "+15125550192", "linkedin_url": "https://linkedin.com/in/amit-sharma-tech"}, | ||||||||||||||||||||
| {"first_name": "Sarah", "last_name": "Jenkins", "email": "sjenkins@apexgrowth.io", "company": "Apex Growth Corp", "phone": "+15125550143", "linkedin_url": "https://linkedin.com/in/sjenkins-growth"}, | ||||||||||||||||||||
| {"first_name": "Rajesh", "last_name": "Patel", "email": "rajesh@lonestarventures.co", "company": "Lone Star Ventures", "phone": "+15125550188", "linkedin_url": "https://linkedin.com/in/rpatel-invest"} | ||||||||||||||||||||
| ] | ||||||||||||||||||||
|
|
||||||||||||||||||||
| # Real-time status update broadcast emulation | ||||||||||||||||||||
| job.leads_found = 1 | ||||||||||||||||||||
| job.save() | ||||||||||||||||||||
| time.sleep(1) | ||||||||||||||||||||
|
|
||||||||||||||||||||
| inserted_count = 0 | ||||||||||||||||||||
| for item in extracted_raw_json[:limit]: | ||||||||||||||||||||
| if not Lead.objects.filter(organization=org, email=item['email']).exists(): | ||||||||||||||||||||
| Lead.objects.create( | ||||||||||||||||||||
| organization=org, | ||||||||||||||||||||
| email=item['email'], | ||||||||||||||||||||
| first_name=item['first_name'], | ||||||||||||||||||||
| last_name=item['last_name'], | ||||||||||||||||||||
| company=item['company'], | ||||||||||||||||||||
| phone=item['phone'], | ||||||||||||||||||||
| linkedin_url=item['linkedin_url'] | ||||||||||||||||||||
| ) | ||||||||||||||||||||
| inserted_count += 1 | ||||||||||||||||||||
Comment on lines +167 to +178
Contributor
Data Integrity & Integration | Major
Analysis chain (scripts executed against Kuldeeep18/LeadOrbit):
find . -type f -name "models.py" -o -name "tasks.py" | grep -E "leads|backend" | head -20
cat -n backend/leads/tasks.py | sed -n '160,185p'
find . -type f -path "*/leads/models.py" -o -path "*/leads/model.py"
cat -n backend/leads/models.py
rg -A 10 "class Lead" backend/leads/models.py
head -30 backend/leads/tasks.py | cat -n
rg "IntegrityError" backend/leads/tasks.py
rg "get_or_create" backend/leads/tasks.py
cat -n backend/leads/tasks.py | sed -n '1,180p'
rg -B 5 "def.*task" backend/leads/tasks.py | head -40
Use get_or_create instead of a separate existence check when inserting leads.
The current check-then-create pattern (lines 168-178) is vulnerable to races: two concurrent runs can both pass the exists() check and then both try to create the same lead.
Suggested fix
+from django.db import IntegrityError
 ...
-        for item in extracted_raw_json[:limit]:
-            if not Lead.objects.filter(organization=org, email=item['email']).exists():
-                Lead.objects.create(
-                    organization=org,
-                    email=item['email'],
-                    first_name=item['first_name'],
-                    last_name=item['last_name'],
-                    company=item['company'],
-                    phone=item['phone'],
-                    linkedin_url=item['linkedin_url']
-                )
-                inserted_count += 1
+        for item in extracted_raw_json[:limit]:
+            try:
+                _, created = Lead.objects.get_or_create(
+                    organization=org,
+                    email=item['email'],
+                    defaults={
+                        'first_name': item['first_name'],
+                        'last_name': item['last_name'],
+                        'company': item['company'],
+                        'phone': item['phone'],
+                        'linkedin_url': item['linkedin_url'],
+                    },
+                )
+            except IntegrityError:
+                continue
+            if created:
+                inserted_count += 1
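The counting behaviour of the get-or-create approach can be sketched in plain Python. It assumes leads are unique per (organization, email) pair, which the review's use of IntegrityError suggests but the diff does not show. The dict-backed `store` and both function names are hypothetical, and an in-memory dict gives none of the concurrency protection that a database unique constraint would.

```python
def get_or_create(store, organization_id, email, defaults):
    """Return (record, created) for the (organization_id, email) key."""
    key = (organization_id, email)
    if key in store:
        return store[key], False
    store[key] = {"organization_id": organization_id, "email": email, **defaults}
    return store[key], True


def insert_leads(store, organization_id, items, limit):
    """Insert up to `limit` items and count only the newly created ones."""
    inserted = 0
    for item in items[:limit]:
        # Everything except the lookup key is passed as defaults.
        defaults = {k: v for k, v in item.items() if k != "email"}
        _, created = get_or_create(store, organization_id, item["email"], defaults)
        if created:
            inserted += 1
    return inserted
```

Running the same batch twice adds nothing the second time, so `inserted_count` reflects new rows only and an existing record is never overwritten by the defaults.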
||||||||||||||||||||
|
|
||||||||||||||||||||
| # Complete tracking job metrics lifecycle cleanly | ||||||||||||||||||||
| job.status = 'COMPLETED' | ||||||||||||||||||||
| job.leads_found = inserted_count | ||||||||||||||||||||
| job.completed_at = timezone.now() | ||||||||||||||||||||
| job.save() | ||||||||||||||||||||
|
|
||||||||||||||||||||
| except Exception as e: | ||||||||||||||||||||
| job.status = 'FAILED' | ||||||||||||||||||||
| job.error_message = str(e) | ||||||||||||||||||||
| job.completed_at = timezone.now() | ||||||||||||||||||||
Comment on lines +186 to +189
Contributor
Security & Privacy | Critical
Analysis chain (scripts executed against Kuldeeep18/LeadOrbit):
# First, verify the file exists and examine the context around lines 186-189
cat -n backend/leads/tasks.py | sed -n '180,195p'
# Get an outline of the file to understand the function structure
ast-grep outline backend/leads/tasks.py --view expanded
# Search for other exception handling patterns in the same file
rg "except.*Exception" backend/leads/tasks.py -A 5
# Check if job.error_message is exposed through API endpoints
rg "job\.error_message\|error_message" --type py -A 2 -B 2
# Verify if logger is imported and used in the file
rg "logger\|import logging" backend/leads/tasks.py
# Search for where job.error_message is retrieved or returned in API responses
rg "error_message" --type py -B 3 -A 3
# Check the Job model definition to understand the field
fd -t f "models.py" backend/leads/ -exec cat -n {} \;
# Search for API serializers or views that expose the job
rg "class.*Serializer\|class.*ViewSet\|class.*APIView" backend/ -A 5 | head -50
# Look for API responses that include job information
rg "job\|Job" backend/leads/views.py -B 2 -A 2
# Find the Job model
find . -name "*.py" -path "*/leads/*" | xargs grep -l "class Job"
# Search for views.py in leads app
find . -path "*/leads/views.py"
# Search for serializers
find . -path "*/leads/*" -name "*serializer*.py"
# Check the structure of the leads app
find backend/leads -type f -name "*.py" | sort
# Check the Job model definition
cat backend/leads/models.py
# Check the serializer to see if error_message is exposed
cat backend/leads/serializers.py
# Check the views to see how job data is returned
cat backend/leads/views.py
# Check if there's any exception logging or sanitization elsewhere in the tasks.py file
cat backend/leads/tasks.py
Do not expose raw exception text to API consumers.
Lines 186-189 store str(e) in job.error_message.
Suggested fix
-    except Exception as e:
+    except Exception:
+        logger.exception("scrape_leads_task failed", extra={"job_id": str(job_id), "organization_id": str(organization_id)})
         job.status = 'FAILED'
-        job.error_message = str(e)
+        job.error_message = "Lead scraping failed. Please retry."
         job.completed_at = timezone.now()
         job.save()
Tools
Ruff (0.15.18)
[warning] 186-186: Do not catch blind exception: Exception (BLE001)
||||||||||||||||||||
| job.save() | ||||||||||||||||||||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -4,6 +4,10 @@ | |||||||||||||||||||||||||||||||||
| from .models import Lead, Tag | ||||||||||||||||||||||||||||||||||
| from .serializers import LeadSerializer, TagSerializer | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| from django.utils import timezone | ||||||||||||||||||||||||||||||||||
| from .models import LeadScrapeJob | ||||||||||||||||||||||||||||||||||
| from .serializers import LeadScrapeJobSerializer | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| class LeadViewSet(viewsets.ModelViewSet): | ||||||||||||||||||||||||||||||||||
| serializer_class = LeadSerializer | ||||||||||||||||||||||||||||||||||
| queryset = Lead.objects.all() | ||||||||||||||||||||||||||||||||||
|
|
@@ -37,6 +41,61 @@ def import_csv(self, request): | |||||||||||||||||||||||||||||||||
| import_leads_from_csv.delay(file_contents, request.user.organization.id) | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| return Response({"message": "File received. Processing in background.", "filename": file_obj.name}, status=status.HTTP_202_ACCEPTED) | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| @action(detail=False, methods=['post'], url_path='scrape') | ||||||||||||||||||||||||||||||||||
| def scrape(self, request): | ||||||||||||||||||||||||||||||||||
| query = request.data.get('query', '').strip() | ||||||||||||||||||||||||||||||||||
| limit = int(request.data.get('limit', 50)) | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| if not query: | ||||||||||||||||||||||||||||||||||
| return Response({"error": "A search query is required."}, status=status.HTTP_400_BAD_REQUEST) | ||||||||||||||||||||||||||||||||||
| if limit > 200: | ||||||||||||||||||||||||||||||||||
| limit = 200 # Enforce security constraint max limit | ||||||||||||||||||||||||||||||||||
Comment on lines +47 to +53
Contributor
Stability & Availability | Critical
Analysis chain (script executed against Kuldeeep18/LeadOrbit):
# First, locate and inspect the file
cat -n backend/leads/views.py
Handle invalid limit values before casting.
Line 48 will raise a runtime exception on non-numeric input, because int() is applied directly to the request value.
Suggested fix
-        query = request.data.get('query', '').strip()
-        limit = int(request.data.get('limit', 50))
+        query = request.data.get('query', '').strip()
+        raw_limit = request.data.get('limit', 50)
+        try:
+            limit = int(raw_limit)
+        except (TypeError, ValueError):
+            return Response({"error": "Limit must be an integer."}, status=status.HTTP_400_BAD_REQUEST)
         if not query:
             return Response({"error": "A search query is required."}, status=status.HTTP_400_BAD_REQUEST)
+        if len(query) > 255:
+            return Response({"error": "Query must be 255 characters or fewer."}, status=status.HTTP_400_BAD_REQUEST)
+        if limit < 1:
+            return Response({"error": "Limit must be at least 1."}, status=status.HTTP_400_BAD_REQUEST)
         if limit > 200:
             limit = 200  # Enforce security constraint max limit
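The limit handling in the suggested fix can be pulled into a small pure function, which makes the edge cases easy to check. This is a sketch: `parse_limit` and its `(limit, error)` return shape are hypothetical, while the default of 50, the cap of 200, and the error messages come from the diff and the suggestion.

```python
DEFAULT_LIMIT = 50
MAX_LIMIT = 200


def parse_limit(raw, default=DEFAULT_LIMIT, maximum=MAX_LIMIT):
    """Return (limit, error); exactly one of the two is None."""
    if raw is None:
        raw = default
    try:
        limit = int(raw)
    except (TypeError, ValueError):
        # Covers inputs such as "abc" (ValueError) or a list (TypeError).
        return None, "Limit must be an integer."
    if limit < 1:
        return None, "Limit must be at least 1."
    # Oversized values are clamped rather than rejected, as in the view.
    return min(limit, maximum), None
```

A view could call this once and return a 400 response whenever the error is not None.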
||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| org = request.user.organization | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| # Constraint 1: Check for an active running job in this organization | ||||||||||||||||||||||||||||||||||
| active_job = LeadScrapeJob.objects.filter(organization=org, status='RUNNING').exists() | ||||||||||||||||||||||||||||||||||
| if active_job: | ||||||||||||||||||||||||||||||||||
| return Response({"error": "Your organization already has an active lead scraping job running."}, status=status.HTTP_400_BAD_REQUEST) | ||||||||||||||||||||||||||||||||||
Comment on lines +58 to +60
Contributor
Functional Correctness | Major
Analysis chain (scripts executed against Kuldeeep18/LeadOrbit):
find backend -type f -name "*.py" | grep -E "(views|models)" | head -20
cat -n backend/leads/views.py | sed -n '50,70p'
cat -n backend/leads/models.py
cat -n backend/leads/views.py | sed -n '68,95p'
find backend/leads -name "tasks.py" -o -name "celery.py"
cat -n backend/leads/tasks.py
Treat PENDING jobs as active.
Line 58 only blocks RUNNING jobs, so a second job can be created while an earlier one is still PENDING and waiting for a worker.
Suggested fix
-        active_job = LeadScrapeJob.objects.filter(organization=org, status='RUNNING').exists()
+        active_job = LeadScrapeJob.objects.filter(
+            organization=org,
+            status__in=['PENDING', 'RUNNING'],
+        ).exists()
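The widened active-job check can be sketched as a plain predicate. The list-of-dicts representation and the names `ACTIVE_STATUSES` and `has_active_job` are assumptions for illustration; the two status values are the ones named in the suggested fix.

```python
ACTIVE_STATUSES = frozenset({"PENDING", "RUNNING"})


def has_active_job(jobs, organization_id):
    """True if the organization has a job that is queued or currently running."""
    return any(
        job["organization_id"] == organization_id and job["status"] in ACTIVE_STATUSES
        for job in jobs
    )
```

Keeping the set of active statuses in one constant avoids having the view and any later checks drift apart.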
||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| # Constraint 2: Enforce 5-minute cooldown period between completions | ||||||||||||||||||||||||||||||||||
| five_minutes_ago = timezone.now() - timezone.timedelta(minutes=5) | ||||||||||||||||||||||||||||||||||
| recent_job = LeadScrapeJob.objects.filter(organization=org, status='COMPLETED', completed_at__gte=five_minutes_ago).exists() | ||||||||||||||||||||||||||||||||||
| if recent_job: | ||||||||||||||||||||||||||||||||||
| return Response({"error": "Throttled. Please wait 5 minutes between lead generation queries."}, status=status.HTTP_429_TOO_MANY_REQUESTS) | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| # Create tracking job record | ||||||||||||||||||||||||||||||||||
| job = LeadScrapeJob.objects.create( | ||||||||||||||||||||||||||||||||||
| organization=org, | ||||||||||||||||||||||||||||||||||
| query=query, | ||||||||||||||||||||||||||||||||||
| limit=limit, | ||||||||||||||||||||||||||||||||||
| status='PENDING' | ||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| # Dispatch Celery background worker agent task | ||||||||||||||||||||||||||||||||||
| from .tasks import scrape_leads_task | ||||||||||||||||||||||||||||||||||
| scrape_leads_task.delay(job.id, query, limit, org.id) | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
Comment on lines +77 to +79
Contributor
Stability & Availability | Major
Analysis chain (scripts executed against Kuldeeep18/LeadOrbit):
# First, find the backend/leads/views.py file and examine the area around lines 77-79
find . -name "views.py" -path "*/leads/*" | head -10
# List the repository structure to understand the layout
git ls-files | grep -E "(leads|tasks)" | head -20
# Read the views.py file around lines 77-79 to see the actual code
cat -n backend/leads/views.py | sed -n '60,100p'
# Also get an outline of the views.py file to understand the context
ast-grep outline backend/leads/views.py
# Check the tasks.py file to understand the scrape_leads_task
cat -n backend/leads/tasks.py | head -50
# Check the LeadScrapeJob model to understand the status field
cat -n backend/leads/models.py | grep -A 20 "class LeadScrapeJob"
# Search for existing Celery error handling patterns in the codebase
rg "CeleryError|except.*delay|celery.exceptions" --type py
# Check the scrape_leads_task implementation in tasks.py
cat -n backend/leads/tasks.py | grep -A 30 "scrape_leads_task"
# Look for all imports in views.py to understand current imports and patterns
cat -n backend/leads/views.py | sed -n '1,35p'
# Search for other places where .delay() is called to see if there's any error handling
rg "\.delay\(" backend --type py -A 2 -B 2
# Check what exceptions can be raised by Celery's delay() method
python3 << 'PY'
import inspect
# Check what exceptions Celery can raise
print("Common Celery exceptions that delay() might raise:")
print("1. kombu.exceptions.OperationalError - when broker connection fails")
print("2. celery.exceptions.CeleryError - base exception class")
print("3. celery.exceptions.SecurityError - security-related errors")
print("4. Exception - general exceptions")
print()
print("The .delay() method is part of the Task API and can raise:")
print("- celery.exceptions.CeleryError and its subclasses")
print("- kombu exceptions (connection-related)")
print("- General exceptions from the broker")
PY
# Check if there's any Celery configuration or broker setup in the project
rg "CELERY|BROKER" backend --type py | head -20
# Look at the entire scrape method to get full context
ast-grep run --pattern 'def scrape($_) { $$$ }' backend/leads/views.py --lang python
# Read the full scrape method from views.py
cat -n backend/leads/views.py | sed -n '45,84p'
# Check how other .delay() calls are handled in the codebase
rg "\.delay\(" backend --type py -B 5 -A 5 | head -50
Handle enqueue failures for Celery task and update job status consistently.
If scrape_leads_task.delay() raises, the job record created just before it stays in PENDING and the request fails with an unhandled error.
Suggested fix
+from celery.exceptions import CeleryError
 ...
-        scrape_leads_task.delay(job.id, query, limit, org.id)
+        try:
+            scrape_leads_task.delay(job.id, query, limit, org.id)
+        except CeleryError:
+            job.status = 'FAILED'
+            job.error_message = 'Failed to enqueue scrape job.'
+            job.completed_at = timezone.now()
+            job.save(update_fields=['status', 'error_message', 'completed_at'])
+            return Response(
+                {"error": "Unable to start background job. Please retry."},
+                status=status.HTTP_503_SERVICE_UNAVAILABLE,
+            )
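The enqueue-failure handling can be sketched independently of Celery. `BrokerUnavailable`, `dispatch`, and the dict-based job are hypothetical stand-ins; in a real project the except clause would name whichever exceptions `delay()` actually raises. The script output above lists `kombu.exceptions.OperationalError` for broker connection failures, so it may be worth confirming whether catching `CeleryError` alone covers that case.

```python
from datetime import datetime, timezone


class BrokerUnavailable(Exception):
    """Hypothetical stand-in for a broker/enqueue error."""


def dispatch(job, enqueue):
    """Try to enqueue the job; return the HTTP status the view would send."""
    try:
        enqueue(job["id"])
    except BrokerUnavailable:
        # Mark the job as failed so it does not sit in PENDING indefinitely.
        job.update(
            status="FAILED",
            error_message="Failed to enqueue scrape job.",
            completed_at=datetime.now(timezone.utc),
        )
        return 503
    return 201
```

Because the failed job leaves the PENDING state, a later active-job check would not block the organization from retrying.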
||||||||||||||||||||||||||||||||||
| return Response({ | ||||||||||||||||||||||||||||||||||
| "message": "AI Lead Generation background agent launched successfully.", | ||||||||||||||||||||||||||||||||||
| "job_id": str(job.id) | ||||||||||||||||||||||||||||||||||
| }, status=status.HTTP_201_CREATED) | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| @action(detail=False, methods=['get'], url_path='scrape/(?P<job_id>[^/.]+)/status') | ||||||||||||||||||||||||||||||||||
| def scrape_status(self, request, job_id=None): | ||||||||||||||||||||||||||||||||||
| try: | ||||||||||||||||||||||||||||||||||
| job = LeadScrapeJob.objects.get(organization=request.user.organization, id=job_id) | ||||||||||||||||||||||||||||||||||
| serializer = LeadScrapeJobSerializer(job) | ||||||||||||||||||||||||||||||||||
| return Response(serializer.data, status=status.HTTP_200_OK) | ||||||||||||||||||||||||||||||||||
| except LeadScrapeJob.DoesNotExist: | ||||||||||||||||||||||||||||||||||
| return Response({"error": "Scrape job not found."}, status=status.HTTP_404_NOT_FOUND) | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| @action(detail=False, methods=['get'], url_path='scrape/history') | ||||||||||||||||||||||||||||||||||
| def scrape_history(self, request): | ||||||||||||||||||||||||||||||||||
| jobs = LeadScrapeJob.objects.filter(organization=request.user.organization).order_by('-created_at') | ||||||||||||||||||||||||||||||||||
| serializer = LeadScrapeJobSerializer(jobs, many=True) | ||||||||||||||||||||||||||||||||||
| return Response(serializer.data, status=status.HTTP_200_OK) | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
| class TagViewSet(viewsets.ModelViewSet): | ||||||||||||||||||||||||||||||||||
| serializer_class = TagSerializer | ||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||
Data Integrity & Integration | Major
Analysis chain (script executed against Kuldeeep18/LeadOrbit):
# First, let's examine the models.py file to understand the current state
cat -n backend/leads/models.py
Add field-level validators to enforce limit bounds in the model.
The limit field accepts any integer value, allowing non-endpoint code paths (e.g., serializer writes, Django admin, direct ORM operations) to persist values exceeding the 200-row contract. Validation should be enforced at the model layer, not just in the view.
Tools
ast-grep (0.44.0)
[info] 50-50: use help_text to document model columns
Context: models.CharField(max_length=20, choices=STATUS_CHOICES, default='PENDING')
Note: [CWE-710] Improper Adherence to Coding Standards.
(model-help-text)
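The suggested fix for this comment did not survive on the page, so the following is only a sketch of the idea, not the reviewer's patch. In Django, bounds like these are usually expressed with `MinValueValidator` and `MaxValueValidator` from `django.core.validators`, which run during `full_clean()` rather than on a bare `save()`. The standalone `validate_limit` function below is a hypothetical stand-in, and the lower bound of 1 is an assumption carried over from the view-level suggestion.

```python
MIN_LIMIT = 1    # assumed lower bound, mirroring the view-level check
MAX_LIMIT = 200  # the 200-row contract mentioned in the review


def validate_limit(value, minimum=MIN_LIMIT, maximum=MAX_LIMIT):
    """Return `value` unchanged if it is an int within [minimum, maximum]."""
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("limit must be an integer")
    if not minimum <= value <= maximum:
        raise ValueError(f"limit must be between {minimum} and {maximum}")
    return value
```

Unlike the view, which clamps oversized values, a validator of this kind rejects them, so code paths that bypass the endpoint fail loudly instead of silently storing an out-of-contract value.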