Improve deduplication: catch the same opportunity posted under different URLs #51

## Context
ScoutBot currently deduplicates by **Application Link URL**. This works for most cases, but the same opportunity sometimes appears on several aggregator sites under different URLs, so students can receive it twice (for example, when it is listed on both OpportunityDesk and AfterSchoolAfrica).

## Task
Add a secondary deduplication pass based on **title similarity**.

## Suggested approach
After URL-based dedup, compare the new entry's title against every title already in the sheet. If the similarity ratio is above a threshold (e.g. 0.85, i.e. 85%), skip the entry as a duplicate.

A simple approach that needs no ML, using the standard library's `difflib`:
```python
from difflib import SequenceMatcher

def title_similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()
```
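Building on `title_similarity`, the check against titles already in the sheet could look roughly like this. The helper name `find_similar_title` and the 0.85 default are illustrative assumptions; returning the matched title (rather than a bare boolean) makes it easy to log what the new entry collided with.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def title_similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_similar_title(
    new_title: str, existing_titles: Iterable[str], threshold: float = 0.85
) -> Optional[str]:
    """Hypothetical helper: return the first existing title whose similarity
    to `new_title` is above `threshold`, or None if there is no such title."""
    for existing in existing_titles:
        if title_similarity(new_title, existing) > threshold:
            return existing
    return None
```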

Alternatively, strip common words (Scholarship, Fellowship, Program, 2025, 2026) from both titles before comparing, so that shared boilerplate does not inflate the similarity score.
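A minimal sketch of that normalisation step, applied to both titles before calling `title_similarity`. The stop-word set only mirrors the examples above (plus plural and British-spelling variants) and is an assumption to tune against real sheet data; likewise, treating any standalone year from 2000 to 2099 as noise is an assumption rather than something the current code does.

```python
import re

# Generic words many listings share; mirrors the examples in this issue and
# is an assumption to be tuned.
STOP_WORDS = {
    "scholarship", "scholarships",
    "fellowship", "fellowships",
    "program", "programme",
}

# Assumption: any standalone year 2000-2099 carries no identifying signal.
YEAR_RE = re.compile(r"\b20\d{2}\b")


def normalize_title(title: str) -> str:
    """Lowercase, drop years and punctuation, then remove generic words."""
    text = YEAR_RE.sub(" ", title.lower())
    # Replace anything that is not a word character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)
```

One trade-off: after stripping, titles get much shorter, and two unrelated opportunities can collapse to similar strings (for example, two different "X Fellowship" listings where X is short), so the threshold may need re-tuning if normalisation is used.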

## Files to touch
- `scoutbot/pipelines.py` — where dedup currently happens

## Notes
- Keep URL-based dedup as the primary check (fast)
- Title-based dedup only runs when the URL check finds no match
- Log any title-based skips for visibility
- Add a test in `tests/`
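The notes above could fit together roughly as in the sketch below: an exact URL lookup first, a title-similarity scan only for unseen URLs, and an INFO log line for each title-based skip. It is written as a plain class rather than against any particular pipeline framework, since the structure of `scoutbot/pipelines.py` isn't shown here. `DedupChecker`, `should_skip`, and the `"Title"` column name are hypothetical; only the `"Application Link URL"` column comes from this issue. A pytest-style test of the kind that could live under `tests/` is included at the end.

```python
import logging
from difflib import SequenceMatcher
from typing import Iterable, List, Set

logger = logging.getLogger(__name__)

URL_KEY = "Application Link URL"  # column named in this issue
TITLE_KEY = "Title"  # assumption: the real title column may be named differently


def title_similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


class DedupChecker:
    """Hypothetical two-pass dedup: exact URL match first, title similarity second."""

    def __init__(self, existing_rows: Iterable[dict], threshold: float = 0.85) -> None:
        self.threshold = threshold
        self.seen_urls: Set[str] = set()
        self.seen_titles: List[str] = []
        for row in existing_rows:
            self._remember(row)

    def _remember(self, row: dict) -> None:
        self.seen_urls.add(row[URL_KEY])
        self.seen_titles.append(row[TITLE_KEY])

    def should_skip(self, row: dict) -> bool:
        """Return True for a duplicate; otherwise record the row and return False."""
        # Primary check: constant-time set lookup on the URL.
        if row[URL_KEY] in self.seen_urls:
            return True
        # Secondary check: only reached when the URL has not been seen before.
        for existing in self.seen_titles:
            score = title_similarity(row[TITLE_KEY], existing)
            if score > self.threshold:
                logger.info(
                    "Title-based dedup skipped %r (%.2f similar to %r)",
                    row[TITLE_KEY], score, existing,
                )
                return True
        # Not a duplicate: remember it so later items in the same run are checked too.
        self._remember(row)
        return False


# Example test (names are illustrative).
def test_same_title_under_new_url_is_skipped() -> None:
    checker = DedupChecker([
        {TITLE_KEY: "Chevening Scholarship 2025", URL_KEY: "https://a.example/chevening"},
    ])
    assert checker.should_skip(
        {TITLE_KEY: "Chevening Scholarship 2025", URL_KEY: "https://b.example/chevening"}
    )
    assert not checker.should_skip(
        {TITLE_KEY: "Google Summer of Code", URL_KEY: "https://c.example/gsoc"}
    )
```

The title pass compares against every stored title, so its cost grows linearly with the sheet; keeping it behind the URL check means that cost is only paid for entries with a new URL.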