Replace pdf-parse with direct PDF upload to OpenAI API #465

## Summary
Replace the current pdf-parse library as our primary PDF path with OpenAI's native PDF upload capability, which became available in March 2025. This should simplify our PDF processing pipeline and may improve extraction accuracy for complex documents.

## Current Implementation
- We use the `pdf-parse` library to extract text from PDFs in `src/actions/extractTextFromFile.ts`
- The extracted text is then sent to the OpenAI API using the Vercel AI SDK
- We already call OpenAI models (GPT-4o-mini) via `generateObject` from the AI SDK

## Proposed Solution
Leverage OpenAI's direct PDF file input support announced in March 2025:
- Send PDFs directly to the OpenAI API without pre-parsing
- Use the existing Vercel AI SDK, which should support file inputs (to be verified against our installed version)
- Keep pdf-parse as a fallback for when direct upload fails (direct upload has known reliability issues)
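
The fallback behaviour described above could be sketched roughly as follows. This is only an illustration: `extractWithFallback`, `PdfExtractor`, and `ExtractionResult` are hypothetical names, and both extractors are injected as plain functions so the sketch does not assume any particular OpenAI or pdf-parse call signature.

```typescript
// Which path produced the result; handy for logging and cost monitoring.
type ExtractionPath = "direct-upload" | "pdf-parse";

interface ExtractionResult<T> {
  data: T;
  via: ExtractionPath;
}

// Hypothetical shape: takes the raw PDF bytes and resolves to the structured result.
type PdfExtractor<T> = (pdf: Uint8Array) => Promise<T>;

async function extractWithFallback<T>(
  pdf: Uint8Array,
  directUpload: PdfExtractor<T>,
  parseThenExtract: PdfExtractor<T>,
  onFallback: (error: unknown) => void = () => {},
): Promise<ExtractionResult<T>> {
  try {
    // Preferred path: hand the PDF to the model without pre-parsing.
    return { data: await directUpload(pdf), via: "direct-upload" };
  } catch (error) {
    // Report the failure, then fall back to the existing pdf-parse flow.
    // If the fallback also fails, its error propagates to the caller.
    onFallback(error);
    return { data: await parseThenExtract(pdf), via: "pdf-parse" };
  }
}
```

Recording `via` on every result would make it easy to see how often the fallback actually fires before deciding whether pdf-parse can be removed entirely.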

## Benefits
1. **Simplified pipeline**: Remove text extraction step
2. **Better accuracy**: OpenAI handles PDF structure internally, preserving tables/layouts
3. **Mixed content support**: Better handling of PDFs with images, tables, complex formatting
4. **Cost optimization**: Can still pre-parse for high-volume scenarios if needed

## Implementation Steps
1. Update `extractTextFromFile.ts` to support direct PDF upload mode
2. Modify API extraction endpoints to accept file buffers alongside text
3. Update OpenAI client configuration to handle file uploads
4. Implement fallback to pdf-parse when direct upload fails
5. Test with various CV formats (simple text, tables, mixed content)
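
For steps 1 and 2, one possible shape for an input that accepts file buffers alongside text is a discriminated union plus a small routing function. The names below (`ExtractionInput`, `Strategy`, `chooseStrategy`) and the `directUploadEnabled` switch are assumptions made for illustration, not existing code in this repo.

```typescript
// Hypothetical input type: either already-extracted text or a raw file buffer.
type ExtractionInput =
  | { kind: "text"; text: string }
  | { kind: "file"; data: Uint8Array; mimeType: string; filename: string };

type Strategy = "text-passthrough" | "direct-pdf-upload" | "extract-text-first";

const PDF_MIME = "application/pdf";

function chooseStrategy(input: ExtractionInput, directUploadEnabled = true): Strategy {
  // Plain text needs no file handling at all.
  if (input.kind === "text") return "text-passthrough";
  // PDFs go straight to the model when direct upload is switched on.
  if (input.mimeType === PDF_MIME && directUploadEnabled) return "direct-pdf-upload";
  // Everything else (e.g. Word documents via mammoth) keeps the extract-then-send flow.
  return "extract-text-first";
}
```

A switch like `directUploadEnabled` would also give a quick way to revert to the current behaviour if direct upload proves unreliable in production.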

## Technical Details
- OpenAI supports PDF upload via `files.create` with `purpose: 'user_data'`
- The uploaded file is then referenced in a chat completion as a `type: 'file'` content part inside the user message
- Our Vercel AI SDK version may need upgrading to support `file` message parts
- Keep extracted text flow for Word documents (mammoth)
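
As a rough sketch of the second point, the user message for a Chat Completions request might be assembled like this once a file ID is available. The local types and the `buildPdfUserMessage` helper are hypothetical. The upload itself (`files.create` with purpose `user_data`) is left out so the snippet stays dependency-free, and the exact payload shape should be checked against the OpenAI PDF docs linked under References.

```typescript
// Minimal local types mirroring the content parts used here (illustrative only,
// not the OpenAI SDK's own type definitions).
type FileContentPart = { type: "file"; file: { file_id: string } };
type TextContentPart = { type: "text"; text: string };

interface PdfUserMessage {
  role: "user";
  content: Array<FileContentPart | TextContentPart>;
}

// fileId is assumed to be the ID returned by an earlier upload with purpose "user_data".
function buildPdfUserMessage(fileId: string, instruction: string): PdfUserMessage {
  if (fileId.trim() === "") {
    throw new Error("fileId must be a non-empty string");
  }
  return {
    role: "user",
    content: [
      // The PDF is referenced by ID rather than inlined as text.
      { type: "file", file: { file_id: fileId } },
      // The extraction instruction travels alongside it as a normal text part.
      { type: "text", text: instruction },
    ],
  };
}
```

If we stay on `generateObject`, the AI SDK would build the equivalent request for us, so a helper like this would only matter if we call the OpenAI client directly.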

## Considerations
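- Monitor API costs (direct upload may be more expensive, since OpenAI's PDF guide describes passing both the extracted text and an image of each page to the model)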
- Handle intermittent "unable to read PDF" errors reported by users
- Maintain backward compatibility with existing data
- Consider hybrid approach: simple PDFs → direct upload, complex → parse first

## References
- [OpenAI announcement](https://community.openai.com/t/direct-pdf-file-input-now-supported-in-the-api/1146647)
- [OpenAI PDF docs](https://platform.openai.com/docs/guides/pdf-files)
- Preliminary research suggests Docling or Unstructured may be better modern alternatives to pdf-parse if we need a fallback

## Priority
Medium - The current system works, but this change could improve accuracy and simplify the code

## Labels
enhancement, ai, infrastructure
