Reddit scraper violates API Terms of Service — no OAuth, fake browser User-Agent

## Summary
`fetch_reddit_user_about` and `fetch_reddit_recent_comments` in `specific_scrapers.py` access Reddit's JSON API using a fake browser User-Agent without OAuth authentication. This violates Reddit's API Terms of Service, which require all third-party applications to use OAuth 2.0 and a registered app client ID for any programmatic API access.

## Evidence
**File:** `src/adapters/specific_scrapers.py`

```python
async def fetch_reddit_user_about(*, username: str, settings: AppSettings | None = None) -> dict[str, Any] | None:
    ...
    headers = {
        "Accept": "application/json",
        "User-Agent": "Mozilla/5.0 (compatible; OSINT-D2/1.0)",  # ← fake browser UA
    }
    async with build_async_client(settings, extra_headers=headers) as client:
        resp = await client.get(url)   # https://www.reddit.com/user/{username}/about.json
```

The `about.json` and `comments.json` endpoints are part of Reddit's public API surface, but since the June 2023 API changes, Reddit requires API access, including access to public endpoints, to go through OAuth with a registered app. Requesting the JSON endpoints directly with browser-style headers sidesteps that requirement, and it is the kind of unauthenticated scraping those changes were meant to curb.

Reddit's current Terms of Service state: *"You may not use the Reddit Platform... in a way that does not comply with Reddit's API Terms."* The API Terms require OAuth 2.0 for all access.

Scraping with a fake browser UA (`Mozilla/5.0 (compatible; OSINT-D2/1.0)`) also provides no honest identification of the client, which is specifically what Reddit requires via the `User-Agent` format `<platform>:<app ID>:<version string> (by /u/<reddit username>)`.

## Why this matters
1. **Legal risk**: Reddit's ToS explicitly prohibit unauthenticated API access. Distributing a tool that violates these terms exposes the project and its users to cease-and-desist actions.
2. **Reliability**: Reddit has actively blocked unauthenticated scraping since 2023. The endpoints can and do return 403, 429, or incorrect data at any time. The code checks for `status_code != 200` but does not distinguish between "user doesn't exist" and "Reddit blocked the request" — both are silently treated as `None`.
3. **IP reputation**: The tool uses residential proxies via ScrapingAnt; accessing Reddit this way may get proxy IP ranges flagged and could impact other ScrapingAnt users.
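
The reliability point (item 2) can be illustrated with a minimal sketch. `RedditOutcome` and `classify_reddit_status` are hypothetical names, not existing code in `specific_scrapers.py`, and the mapping assumes that 404 means the account does not exist while 401, 403, and 429 indicate that Reddit rejected or throttled the request:

```python
from enum import Enum


class RedditOutcome(Enum):
    """Hypothetical outcome type; the current code returns None for every failure."""

    OK = "ok"
    NOT_FOUND = "not_found"
    BLOCKED = "blocked"
    ERROR = "error"


def classify_reddit_status(status_code: int) -> RedditOutcome:
    """Map an HTTP status code to an outcome instead of collapsing all failures."""
    if status_code == 200:
        return RedditOutcome.OK
    if status_code == 404:
        # Assumption: 404 means the username does not exist.
        return RedditOutcome.NOT_FOUND
    if status_code in (401, 403, 429):
        # Assumption: auth failures and rate limits mean the request was rejected.
        return RedditOutcome.BLOCKED
    return RedditOutcome.ERROR
```

With a distinction like this, a caller could report "blocked" to the operator rather than implying the account is absent.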

## Attack or failure scenario
Reddit begins returning HTTP 200 responses with GDPR-compliant empty payloads for unauthenticated scrapers (a pattern they have used before). The `fetch_reddit_user_about` function receives a well-formed but empty response, returns `None`, and the operator concludes the target has no Reddit presence — a false negative in a privacy-sensitive investigation.
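
One way to guard against this scenario is to validate the payload shape before trusting a 200 response. The sketch below is illustrative only: `AmbiguousRedditResponse` and `extract_user_data` are hypothetical names, and it assumes a genuine `about.json` response carries a non-empty `name` inside a `data` object.

```python
from typing import Any


class AmbiguousRedditResponse(Exception):
    """Raised when a 200 response lacks the fields a real profile would carry."""


def extract_user_data(payload: Any) -> dict[str, Any]:
    """Return the profile dict, or raise if the payload is empty or malformed."""
    data = payload.get("data") if isinstance(payload, dict) else None
    # Assumption: a genuine profile always includes a non-empty "name".
    if not isinstance(data, dict) or not data.get("name"):
        raise AmbiguousRedditResponse("200 response without a user 'name' field")
    return data
```

Raising here, instead of returning `None`, keeps "no usable data" separate from "no Reddit presence" in whatever the operator sees.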

## Root cause
The Reddit scrapers were implemented against the legacy API surface that existed before Reddit enforced OAuth. No OAuth flow was implemented when Reddit changed its policies in 2023.

## Recommended fix
1. Register a Reddit app and add `OSINT_D2_REDDIT_CLIENT_ID` and `OSINT_D2_REDDIT_CLIENT_SECRET` to `AppSettings`.
2. Implement client credentials OAuth flow (`grant_type=client_credentials`) before making API calls.
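3. Use the proper Reddit API User-Agent format, matching the `<platform>:<app ID>:<version string> (by /u/<reddit username>)` pattern quoted above: `python:osint-d2:v0.1 (by /u/<maintainer_username>)`.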
4. If Reddit credentials are not configured, skip the Reddit scan and emit a warning rather than scraping silently.

The PRAW library (or Async PRAW for async code) handles this cleanly. Alternatively, use httpx directly with the OAuth token flow:

```python
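# Assumes `client` is an httpx.AsyncClient and the credentials come from AppSettings.
user_agent = "python:osint-d2:v0.1 (by /u/<maintainer_username>)"
token_resp = await client.post(
    "https://www.reddit.com/api/v1/access_token",
    data={"grant_type": "client_credentials"},
    auth=(client_id, client_secret),  # HTTP Basic auth with the app's credentials
    headers={"User-Agent": user_agent},
)
token_resp.raise_for_status()  # surface 401/403/429 instead of a KeyError below
token = token_resp.json()["access_token"]
# Then call https://oauth.reddit.com/... with these headers:
api_headers = {"Authorization": f"bearer {token}", "User-Agent": user_agent}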
```

## Acceptance criteria
- Reddit API calls use OAuth 2.0 client credentials flow.
- `User-Agent` follows Reddit's required format.
- `OSINT_D2_REDDIT_CLIENT_ID` and `OSINT_D2_REDDIT_CLIENT_SECRET` are documented in `.env.example`.
- If credentials are absent, `RedditScanner` skips gracefully with a warning.

## Suggested labels
security, bug, technical-debt

## Priority
P2

## Severity
**Medium** — Active ToS violation affecting all users of the tool. Produces unreliable results post-2023 Reddit API changes. Legal risk is real but enforcement is typically directed at large-scale abusers.

## Confidence
Confirmed — no OAuth token flow, no client ID, fake UA.

