Dashboard improvements: DB migrations, file caching, and architecture enhancements #492
Add extract-from-document.ts utility that uses gpt-5-mini vision model for extracting text from PDFs and images. This replaces the deprecated pdf-parse library and provides better extraction quality.

Features:
- Supports PDFs and multiple image formats (png, jpeg, jpg, webp, gif)
- Context-aware extraction with specialized prompts for CVs and job descriptions
- Uses base64 encoding for file transmission to OpenAI vision API
- Includes comprehensive logging and error handling
- Low temperature (0.1) for accurate extraction
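For readers unfamiliar with the base64 step, here is a minimal sketch of how file bytes might be encoded before being handed to a vision model. The helper name `toDataUrl` and the choice of a data URL as the transport format are assumptions for illustration; the commit only states that base64 encoding is used.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical helper, not taken from the PR: encodes raw file bytes as a
// base64 data URL. The data-URL wrapper is an assumption; the commit only
// says the file is base64-encoded for transmission.
export function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  const base64 = Buffer.from(bytes).toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

// Example: the two bytes of the ASCII string "hi".
const example = toDataUrl(new Uint8Array([104, 105]), "image/png");
console.log(example);
```

Base64 inflates the payload by roughly a third, so large scans may be worth size-checking before encoding.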
…xtraction

Replace pdf-parse library with gpt-5-mini vision model in both server action and client extraction utilities. This eliminates Buffer() and util._extend deprecation warnings while improving extraction quality.

Changes:
- Use extractFromDocument() utility with gpt-5-mini model
- Add support for image formats (png, jpeg, jpg, webp, gif) alongside PDFs
- Implement intelligent extraction type detection from file names
- Maintain consistent error handling and logging
- Improve extraction accuracy with context-aware prompts

Benefits:
- Eliminates Node.js deprecation warnings from pdf-parse
- Better extraction quality from Vision AI
- Support for multiple file formats
- Matches approach used in cvoptimiser sibling project
Remove deprecated pdf-parse library and its dependencies as document extraction now uses Vision AI (gpt-5-mini).

Removed packages:
- pdf-parse (deprecated, causes Buffer() warnings)
- pdfreader (unused)
- @types/pdf-parse (no longer needed)

This completes the migration to Vision AI for document extraction, eliminating Node.js deprecation warnings and improving extraction quality.
Remove unused TrustedCompaniesSection and VideoTestimonialsSection imports from landing page.
Claude finished @bhekanik's task
PR Review: Vision AI Migration
Todo List:
Issues Found
Good Patterns
Recommendation
Pull Request Overview
This PR migrates document extraction from the deprecated pdf-parse library to OpenAI's Vision AI (gpt-5-mini), eliminating Node.js deprecation warnings and adding support for multiple file formats including images.
- Replaces pdf-parse with AI-powered Vision extraction using gpt-5-mini
- Adds intelligent extraction based on filename analysis (CV vs job description vs general)
- Extends support to multiple formats: PDF, PNG, JPEG, JPG, WebP, GIF
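As a rough illustration of the filename-based detection described above, the sketch below classifies a file as a CV, a job description, or a general document, and checks its extension against the supported formats. The keyword lists and function names are assumptions made for this example; the PR does not show the actual heuristics.

```typescript
type ExtractionType = "cv" | "job_description" | "general";

// Formats listed in the PR overview.
const SUPPORTED_EXTENSIONS: readonly string[] = ["pdf", "png", "jpeg", "jpg", "webp", "gif"];

// Assumed keywords, chosen for illustration only.
const CV_KEYWORDS: readonly string[] = ["cv", "resume", "curriculum"];
const JOB_KEYWORDS: readonly string[] = ["job", "jd", "vacancy", "posting"];

// Hypothetical helper: guesses the extraction type from tokens in the name.
function detectExtractionType(fileName: string): ExtractionType {
  const tokens = fileName.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  if (tokens.some((token) => CV_KEYWORDS.includes(token))) return "cv";
  if (tokens.some((token) => JOB_KEYWORDS.includes(token))) return "job_description";
  return "general";
}

// Hypothetical helper: true when the extension is one of the supported formats.
function isSupportedFile(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return false;
  return SUPPORTED_EXTENSIONS.includes(fileName.slice(dot + 1).toLowerCase());
}

console.log(detectExtractionType("Jane_Doe_CV.pdf"), isSupportedFile("Jane_Doe_CV.pdf"));
```

Matching on whole tokens rather than substrings avoids false positives such as "cv" inside an unrelated word, at the cost of missing names without separators.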
Reviewed Changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/lib/extractTextFromFile.ts | Updated to use Vision AI for PDFs/images and improved logging |
| src/actions/extractTextFromFile.ts | Server action migrated to Vision AI with enhanced error handling |
| lib/ai/extract-from-document.ts | New utility for AI-powered document extraction with specialized prompts |
| package.json | Removed deprecated pdf-parse dependencies |
| src/app/(marketing)/page.tsx | Cleaned up unused section imports |
    // Use vision model (gpt-5-mini) for document extraction
    const model = getOpenAiClient()("gpt-5-mini");
    const result = await extractFromDocument({
      model: model as any,
Using 'as any' bypasses TypeScript's type safety. Consider properly typing the model parameter or using a more specific type assertion.
Suggested change:

    - model: model as any,
    + model: model,
    // Use vision model (gpt-5-mini) for document extraction
    const model = getOpenAiClient()("gpt-5-mini");
    const result = await extractFromDocument({
      model: model as any,
Using 'as any' bypasses TypeScript's type safety. Consider properly typing the model parameter or using a more specific type assertion.
Suggested change:

    - model: model as any,
    + model,
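To make the reviewer's point concrete, here is a small sketch of a typed model parameter. Every name in it (`VisionModel`, `createModel`, `describeExtraction`) is a hypothetical stand-in, not the project's or the SDK's real API. The idea is that when the factory's return type matches the parameter type, the cast is unnecessary, and when it does not, the compiler reports the mismatch that `as any` would hide.

```typescript
// Hypothetical stand-in for the SDK's model type.
interface VisionModel {
  readonly modelId: string;
}

// Hypothetical parameter object with an explicitly typed model field.
interface ExtractParams {
  model: VisionModel;
  fileName: string;
}

// Hypothetical factory standing in for getOpenAiClient()("gpt-5-mini").
function createModel(modelId: string): VisionModel {
  return { modelId };
}

// Accepts only a correctly typed model; no cast is needed at the call site.
function describeExtraction(params: ExtractParams): string {
  return `${params.model.modelId}:${params.fileName}`;
}

const model = createModel("gpt-5-mini");
const summary = describeExtraction({ model, fileName: "cv.pdf" });
console.log(summary);
```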
    - const data = await pdf(buffer);
    - logger.info({ textLength: data.text?.trim().length }, "Extracted text from PDF");
    - return data.text?.trim();
    + return result.data;
Missing null/undefined check. If result.data is null or undefined, the function returns that value instead of an empty string, which may break calling code that expects a string.
Suggested change:

    - return result.data;
    + return result.data ?? "";
⚖️ License Compliance Alert
Potential license compatibility issues detected.
Issues Found
Allowed Licenses
All other licenses are automatically blocked. Please review dependencies with incompatible licenses before merging.

🔒 Security Audit Summary
Security vulnerabilities were detected in the dependency update.
Scan Results
Recommended Actions
For more details, check the workflow run.
Add automated migration verification on pull requests with dry-run mode that validates schema changes without modifying the database. Actual migrations only run when changes are merged to main.

Changes:
- Add PR trigger with dry-run verification step
- Add PR comment bot to report migration status
- Improve migration output with clearer logging
- Add path filters to only trigger on schema/migration changes
- Separate dry-run job (PRs) from production migration job (main pushes)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add automatic database migration generation when schema files change. New migrations are auto-staged to ensure schema and migrations stay in sync.

Changes:
- Check for schema file changes in pre-commit
- Run db:generate if schema modified
- Auto-stage generated migration files
- Add clear logging of generated migrations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
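The hook's decision logic can be sketched as a pure function that maps the staged file list to the commands the hook would run. This is an illustration under assumptions: the `db/schema/` prefix comes from the PR description, while the `db/migrations` output directory, the `.ts` filter, and the function names are hypothetical.

```typescript
// Schema directory named in the PR description.
const SCHEMA_DIR = "db/schema/";
// Assumed location of generated migrations; not confirmed by the PR.
const MIGRATIONS_DIR = "db/migrations";

interface HookPlan {
  generate: boolean;
  commands: string[];
}

// Hypothetical helper: returns the staged files that live under the schema directory.
function stagedSchemaFiles(stagedFiles: string[]): string[] {
  return stagedFiles.filter((file) => file.startsWith(SCHEMA_DIR) && file.endsWith(".ts"));
}

// Hypothetical helper: decides which commands the pre-commit hook should run.
function planPreCommit(stagedFiles: string[]): HookPlan {
  if (stagedSchemaFiles(stagedFiles).length === 0) {
    return { generate: false, commands: [] };
  }
  return {
    generate: true,
    commands: ["bun run db:generate", `git add ${MIGRATIONS_DIR}`],
  };
}

console.log(planPreCommit(["db/schema/users.ts", "README.md"]).commands);
```

Keeping the decision separate from the shell invocation makes the hook easy to unit test without touching git or the database.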
Add database-backed caching for file extraction results to avoid redundant AI processing of identical documents. Uses SHA-256 file hashing for cache lookups with hit tracking and automatic cache statistics.

Changes:
- Add fileExtractionCache table with file hash indexing
- Implement cache get/set utilities with race condition handling
- Add hit count tracking and automatic timestamp updates
- Include cache cleanup utilities for old entries
- Support CV, job description, and general document types
- Add comprehensive error handling and logging

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
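The caching idea can be illustrated with an in-memory stand-in for the `fileExtractionCache` table. This is a sketch under assumptions, not the PR's database code: the class, field names, and the "first writer wins" rule used to mimic race handling are all hypothetical. Only the SHA-256 keying, hit counting, timestamp updates, and cleanup of old entries come from the commit message.

```typescript
import { createHash } from "node:crypto";

type DocumentType = "cv" | "job_description" | "general";

interface CacheEntry {
  text: string;
  documentType: DocumentType;
  hitCount: number;
  lastAccessedAt: Date;
}

// SHA-256 hex digest of the file bytes, used as the cache key.
function hashFile(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Hypothetical in-memory stand-in for the database-backed cache.
class InMemoryExtractionCache {
  private readonly entries = new Map<string, CacheEntry>();

  // Returns the entry and records the hit, or undefined on a miss.
  get(fileHash: string, now: Date = new Date()): CacheEntry | undefined {
    const entry = this.entries.get(fileHash);
    if (!entry) return undefined;
    entry.hitCount += 1;
    entry.lastAccessedAt = now;
    return entry;
  }

  // Insert-if-absent: a concurrent second writer gets the existing entry back.
  set(fileHash: string, text: string, documentType: DocumentType, now: Date = new Date()): CacheEntry {
    const existing = this.entries.get(fileHash);
    if (existing) return existing;
    const entry: CacheEntry = { text, documentType, hitCount: 0, lastAccessedAt: now };
    this.entries.set(fileHash, entry);
    return entry;
  }

  // Removes entries not accessed since the cutoff; returns how many were removed.
  deleteOlderThan(cutoff: Date): number {
    let removed = 0;
    this.entries.forEach((entry, hash) => {
      if (entry.lastAccessedAt.getTime() < cutoff.getTime()) {
        this.entries.delete(hash);
        removed += 1;
      }
    });
    return removed;
  }
}

const cache = new InMemoryExtractionCache();
const key = hashFile(new Uint8Array([97, 98, 99])); // bytes of "abc"
cache.set(key, "extracted text", "cv");
console.log(cache.get(key)?.hitCount);
```

In a real table the insert-if-absent step would typically rely on a unique index on the hash column rather than a read followed by a write.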
Migrate file/URL extraction from server actions to proper API routes with authentication, caching, and comprehensive error handling. Includes React Query hooks for client-side usage.

Changes:
- Add POST /api/extract/file with auth middleware
- Add POST /api/extract/url with auth middleware
- Implement file hash-based caching for extractions
- Add useExtractFile and useExtractUrl React Query hooks
- Support PDF, Word, and image file types
- Include Vision AI (gpt-5-mini) for document extraction
- Add proper validation, logging, and error handling
- Set 60s timeout for long-running extractions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
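A minimal sketch of the kind of checks such a route might run before doing any extraction work is shown below. The function name, the 10 MB size limit, the exact MIME list, and the status codes are assumptions for illustration; the commit only states that the routes are authenticated, validated, and accept PDF, Word, and image files.

```typescript
// Assumed MIME types for "PDF, Word, and image file types".
const ALLOWED_MIME_TYPES: readonly string[] = [
  "application/pdf",
  "application/msword",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  "image/png",
  "image/jpeg",
  "image/webp",
  "image/gif",
];

// Assumed upload limit; the PR does not state one.
const MAX_FILE_BYTES = 10 * 1024 * 1024;

interface UploadInfo {
  userId: string | null; // null when the request is unauthenticated
  mimeType: string;
  sizeBytes: number;
}

interface ValidationResult {
  ok: boolean;
  status: number;
  error?: string;
}

// Hypothetical helper: returns the HTTP status the route would respond with.
function validateExtractionRequest(info: UploadInfo): ValidationResult {
  if (!info.userId) return { ok: false, status: 401, error: "Unauthorized" };
  if (info.sizeBytes <= 0) return { ok: false, status: 400, error: "Empty file" };
  if (info.sizeBytes > MAX_FILE_BYTES) return { ok: false, status: 413, error: "File too large" };
  if (!ALLOWED_MIME_TYPES.includes(info.mimeType)) {
    return { ok: false, status: 415, error: "Unsupported file type" };
  }
  return { ok: true, status: 200 };
}

console.log(validateExtractionRequest({ userId: "user_1", mimeType: "application/pdf", sizeBytes: 2048 }));
```

Rejecting bad requests before hashing or calling the model keeps the 60s budget for requests that can actually succeed.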
Replace server actions with React Query mutations for file/URL extraction in Step1 and Step2 components. Removes deprecated server actions and utilities in favor of centralized API routes.

Changes:
- Update Step1JobDescription to use useExtractUrl and useExtractFile hooks
- Update Step2CV to use useExtractFile hook
- Remove extractTextFromFile and extractTextFromUrl server actions
- Remove lib/extractTextFromFile utility (consolidated into API route)
- Improve loading states with mutation isPending
- Better error handling with React Query mutations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
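For context, the function a hook like `useExtractFile` wraps could look roughly like the sketch below. The form field name `file`, the `{ text }` response shape, and the injected `fetchImpl` parameter are assumptions made so the example runs outside a browser; the PR does not show the hook's internals.

```typescript
// Minimal shape of the fetch call this sketch needs; injected for testability.
type FetchLike = (
  url: string,
  init: { method: string; body: FormData },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical mutation function: uploads a file and returns the extracted text.
async function extractFile(file: Blob, fileName: string, fetchImpl: FetchLike): Promise<string> {
  const body = new FormData();
  body.append("file", file, fileName); // field name is an assumption
  const response = await fetchImpl("/api/extract/file", { method: "POST", body });
  if (!response.ok) {
    throw new Error(`Extraction failed with status ${response.status}`);
  }
  const payload = (await response.json()) as { text?: unknown }; // assumed response shape
  if (typeof payload.text !== "string") {
    throw new Error("Unexpected response shape");
  }
  return payload.text;
}

// Example with a fake fetch so the sketch runs without a server.
const fakeFetch: FetchLike = async () => ({
  ok: true,
  status: 200,
  json: async () => ({ text: "extracted text" }),
});

extractFile(new Blob(["dummy"]), "cv.pdf", fakeFetch).then((text) => console.log(text));
```

In a component, a function like this would typically be supplied as the `mutationFn` of a React Query mutation, with the mutation's `isPending` flag driving the loading state mentioned in the commit.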
Remove CSRF middleware in favor of built-in Next.js protections and Clerk's security features. Simplifies middleware and reduces complexity.

Changes:
- Remove csrfMiddleware function and related logic
- Remove CSRF token validation checks
- Remove isCSRFExemptPath and isCSRFProtectedMethod utilities
- Keep rate limiting as primary security layer
- Remove /api/csrf-token from rate limit exemptions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add missing logger mock methods (info, error) in auth.test.ts
- Reinstall esbuild, vitest, and plugin-react dependencies
- All 160 tests now passing
- Build, typecheck, and lint all passing
- Ready for CI/deployment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add GitHub Actions workflow for database migrations
  - Dry-run validation on PRs with migration file checks
  - Automatic migration execution on merge to main
  - PR comments with migration details and warnings
- Standardize pre-commit hook for schema changes
  - Detects schema file modifications
  - Automatically generates migrations
  - Stages migration files for commit
  - Fails commit if migration generation fails
- Add comprehensive documentation for migration setup

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add permissions block to migrate-dry-run job
- Grant contents:read, issues:write, pull-requests:write
- Fixes "Resource not accessible by integration" error

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove Doppler CLI dependency from workflows
- Call bunx drizzle-kit migrate directly with DATABASE_URL
- Simplify dry-run job by removing unnecessary database checks
- Reduces CI dependencies and makes workflow more portable

Addresses code review feedback about Doppler not being available in CI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
🗄️ Database Migration Check
✅ Migration dry-run successful
Migration Files Found: 9
...and 4 more
These migrations will be applied automatically when merged to main.
- Add permissions block at workflow level (not just job level)
- This ensures GITHUB_TOKEN has necessary permissions
- Fixes "Resource not accessible by integration" error

The error occurred because repository-level settings may restrict default GITHUB_TOKEN permissions. Adding permissions at workflow level explicitly grants the required access.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add continue-on-error to all github-script comment steps
- Workflow now succeeds even if commenting fails
- Migration validation still runs and reports status
- Commenting failures logged but don't block the workflow

This handles cases where repository settings restrict GITHUB_TOKEN permissions. The user can still see results in workflow logs.

To enable PR comments, update repository settings: Settings → Actions → General → Workflow permissions → Select "Read and write permissions"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add github-token parameter to all github-script actions
- Falls back to GITHUB_TOKEN if PAT_TOKEN secret not set
- Enables PR comments even when enterprise policies restrict GITHUB_TOKEN

This works around enterprise-level policies that prevent GITHUB_TOKEN from having write permissions.

To enable PR comments:
1. Create a PAT at https://github.com/settings/tokens with 'repo' scope
2. Add it as PAT_TOKEN secret in repository settings
3. Workflow will automatically use it for commenting

Without PAT_TOKEN, workflow still succeeds but can't comment on PRs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Summary
This PR introduces several key improvements to the Interview Optimiser platform, focusing on developer experience, performance, and architecture:
Changes Made
1. Database Migration Workflow Enhancement
2. Pre-commit Hook for Migrations
- `db/schema/`
- `bun run db:generate` when schema changes detected

3. File Extraction Caching Infrastructure
- `fileExtractionCache` table with file hash indexing

4. New API Routes
- `POST /api/extract/file`: Authenticated file extraction with caching
- `POST /api/extract/url`: Authenticated URL extraction with caching
- `useExtractFile` and `useExtractUrl`

5. Component Refactoring
- `Step1JobDescription` to use new extraction hooks
- `Step2CV` to use new extraction hooks
- `extractTextFromFile`, `extractTextFromUrl`

6. Middleware Simplification
7. Build Fixes
- `auth.test.ts` (missing logger mocks)

Testing
- `bun run build`
- `bun run test` (160/161 tests passing, 1 skipped)
- `bun run typecheck`
- `bun run lint`

Type of Change
Checklist
Performance Impact
Positive impacts:
No negative impacts expected.
Security Considerations
🤖 Generated with Claude Code